Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (active filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
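Raw counts are hard to compare across outcomes of very different sizes, so it can help to convert them into shares. The following is a minimal Python sketch, using three rows copied from the table above whose four direction counts add up to the listed Total. For some other rows the Total is larger than the sum of the four direction columns, so the sketch divides by the summed directions rather than by the Total column. The name `direction_shares` is an illustrative choice, not part of the site.

```python
# Rows copied from the table above: (positive, negative, mixed, null).
ROWS = {
    "Wages & Compensation": (78, 37, 25, 6),
    "Job Displacement": (12, 83, 23, 1),
    "Skill Obsolescence": (5, 50, 6, 1),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the summed direction counts."""
    total = sum(counts)
    return {name: count / total for name, count in zip(DIRECTIONS, counts)}


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(counts)
        summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
        print(f"{outcome}: {summary}")
```

Read this way, roughly seven in ten Job Displacement claims report a negative direction, while just over half of Wages & Compensation claims report a positive one.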
Governance
Bifurcation scores were highest in the EU (0.84 ± 0.06).
Reported composite-index results: 'Bifurcation scores were highest in the EU (0.84 ± 0.06)'.
Tri-jurisdictional firms had higher annual revenues (2,310.4 ± 450.2 million USD) than other groups.
Reported descriptive statistics in the paper: 'and higher annual revenues (2,310.4 ± 450.2 million USD) than other groups.'
Tri-jurisdictional firms had larger workforces (5,380 ± 1,245).
Reported descriptive statistics in the paper: 'Tri-jurisdictional firms had larger workforces (5,380 ± 1,245)'.
The paper provides a conceptual foundation for designing AI systems that model expert sensing over time, positioning cognition as an infrastructural, operational, and professional domain in persistent human-AI systems.
Stated contribution of the paper (conceptual/theoretical contribution rather than empirical evidence).
The Cognitive Operations Research and Training Framework (CORTF) is introduced to support research, education, and workforce development.
Conceptual framework proposed in the paper (no empirical implementation or evaluation presented).
The Cognitive Operations Manager is proposed as a prototype AI-native professional role for coordinating tacit signal modelling, semantic modelling, AI system calibration, expert validation, and ethical governance.
Proposal of a new professional role in the paper (conceptual/visionary; no pilot study, job analysis, or workforce data reported).
Long-term Cognitive Operations are defined as the practices required to maintain and govern such systems, including memory curation, semantic organisation, tacit signal modelling, reasoning calibration, and cognitive governance.
Conceptual taxonomy/definition introduced in the paper (theoretical framing; no empirical validation).
Tacit Signal Infrastructure is introduced as a layer for capturing, structuring, modelling, interpreting, and validating expert tacit signals over time.
Conceptual design/proposal presented in the paper (architectural description; no empirical implementation or evaluation reported).
Next-generation AI systems should move beyond explicit knowledge processing toward the longitudinal modelling of expert tacit sensing.
Normative proposal / recommendation made in the paper as part of a vision; supported by conceptual rationale rather than empirical data.
High-level expertise also depends on tacit sensing: perceiving weak signals, recognising emerging tensions, detecting coherence degradation, and anticipating instability before formal indicators appear.
Conceptual claim grounded in cognitive-science-informed argumentation presented in the paper (no empirical study or sample size reported).
Current generative AI systems are increasingly effective at processing explicit knowledge, including retrieving information, summarising documents, generating explanations, and supporting codified workflows.
Asserted in the paper as a descriptive trend; based on literature synthesis and observations of current generative AI capabilities (no empirical sample or experiment reported in the paper).
The framework establishes a principled vocabulary for designing enterprise service platforms that manage human and artificial intelligence labor responsibly, transparently, and at scale.
Paper presents the combined constructs (Workforce Unit Abstraction, Hybrid Capacity Model, Governance-bound Autonomy) as a coherent reference model and vocabulary; described as conceptual contribution arising from the design-science approach.
Governance-bound autonomy constrains AI Workforce Unit actions within a five-level, policy-enforced autonomy ladder supported by six mandatory governance controls.
Conceptual governance artifact described in the paper (five-level autonomy ladder + six governance controls); presented as the proposed governance design, not as an empirically tested intervention in the abstract.
The Hybrid Capacity Model extends demand-to-supply planning across heterogeneous workforce pools, resolving a multi-objective allocation problem that simultaneously optimizes cost, quality, and risk constraints.
Described model/algorithmic artifact in the paper (Hybrid Capacity Model) claiming multi-objective optimization; no empirical benchmark or sample size reported in the provided text.
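To make the idea of allocating work across heterogeneous pools concrete, here is a toy Python sketch. It is not the paper's Hybrid Capacity Model: the pool names, the cost, quality, and risk numbers, the `quality_weight` trade-off, and the brute-force search are all assumptions chosen for illustration. It minimizes cost minus weighted quality, subject to a per-task risk cap and a per-pool capacity.

```python
from itertools import product

# Hypothetical workforce pools: per-task cost, expected quality (0-1),
# risk score (0-1), and the number of tasks the pool can absorb.
POOLS = {
    "human": {"cost": 40.0, "quality": 0.95, "risk": 0.05, "capacity": 2},
    "ai_agent": {"cost": 5.0, "quality": 0.80, "risk": 0.20, "capacity": 3},
}


def allocate(task_risk_caps, quality_weight=50.0):
    """Return the feasible plan minimizing cost - quality_weight * quality.

    task_risk_caps[i] is the highest risk score task i tolerates.
    Returns a tuple of pool names (one per task), or None if infeasible.
    """
    best_score, best_plan = None, None
    for plan in product(POOLS, repeat=len(task_risk_caps)):
        # Capacity constraint: no pool takes more tasks than it can hold.
        if any(plan.count(name) > POOLS[name]["capacity"] for name in POOLS):
            continue
        # Risk constraint: the assigned pool must respect each task's cap.
        if any(POOLS[name]["risk"] > cap for name, cap in zip(plan, task_risk_caps)):
            continue
        score = sum(
            POOLS[name]["cost"] - quality_weight * POOLS[name]["quality"]
            for name in plan
        )
        if best_score is None or score < best_score:
            best_score, best_plan = score, plan
    return best_plan


if __name__ == "__main__":
    # One risk-sensitive task and two tolerant ones.
    print(allocate([0.1, 0.3, 0.3]))
```

Exhaustive search only works for a handful of tasks; a realistic planner would need an integer-programming or heuristic solver.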
The Workforce Unit Abstraction defines a unified seven-attribute operational schema applicable to both human workers and AI agents, enabling consistent representation across planning, scheduling, and governance systems.
Artifact description from the paper (Workforce Unit Abstraction with seven attributes); presented as a designed schema rather than an empirically validated result in the abstract.
This article introduces three constructs as reusable primitives for hybrid workforce platform design.
Design science research methodology producing an artifact (three constructs); described as the paper's contribution. No empirical evaluation or sample size reported in the abstract.
Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52), illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks.
Aggregate/longitudinal result from the simulation after 500 turns: reported cumulative change in in-group trust bias (absolute change +0.014 to +0.100) and reported effect sizes in Cohen's d (0.84–4.52); based on the same experimental setup (6 model families, 20 seeds each).
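For readers unfamiliar with the effect-size measure, the sketch below computes Cohen's d for two independent samples using a pooled standard deviation. The claim does not say which variant of d the paper used, so the pooled two-sample form is an assumption, and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (two independent samples)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical trust scores toward in-group and out-group agents.
    in_group = [0.62, 0.58, 0.65, 0.60, 0.63]
    out_group = [0.55, 0.57, 0.54, 0.58, 0.56]
    print(round(cohens_d(in_group, out_group), 2))
```

Because d divides the mean difference by the spread, a small absolute difference can still produce a large d when run-to-run variance is low, which is consistent with the small absolute biases and large effect sizes reported above.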
Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes.
Statistical analysis reported in the paper: per-turn differential between in-group and out-group targeting measured as percentages (5–16 percentage points); significance assessed with Wilcoxon signed-rank tests and Benjamini-Hochberg correction; applied across six model families each with 20 seeds.
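The Benjamini-Hochberg correction mentioned above can be sketched in a few lines of plain Python. The raw p-values here are hypothetical; in practice they might come from a paired test such as `scipy.stats.wilcoxon`, one per model family. This is an illustration of the standard step-up adjustment, not the paper's analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, one per model family.
    raw = [0.0002, 0.0001, 0.0004, 0.0003, 0.0001, 0.0005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.4f}  adjusted={q:.4f}")
```

A finding counts as significant at level alpha when its adjusted value is at or below alpha, which controls the expected share of false discoveries across the six tests.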
When group labels were visible, we observed network assortativity (absent when labels were hidden).
Reported network-level outcomes from the simulation comparing visible vs hidden label conditions across the experimental runs (6 model families, 20 seeds each, 500 turns).
When group labels were visible, we observed action homophily.
Result reported from the simulation comparing visible versus hidden group label conditions across the described experimental runs (6 model families, 20 seeds each, 500 turns).
When group labels were visible, we observed in-group trust bias.
Result reported from the simulation comparing conditions with visible versus hidden group labels; based on interactions of instruction-tuned LLM agents across the reported experimental runs (6 model families, 20 seeds each, 500 turns).
We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each.
Descriptive methods statement from the paper: controlled multi-agent simulation; instruction-tuned LLM agents; 3 experimental conditions (manipulating group label salience and resource scarcity); 6 model families; 20 random seeds per model; 500 turns per simulation run.
Although AI creates obstacles, it also has the potential to be an important tool for creating innovative opportunities and continued growth if managed with sound practices.
Concluding statement in the paper's abstract presenting a normative/conditional conclusion based on the paper's evaluation and synthesis of evidence (no primary quantified results provided in the supplied text).
AI leads to the creation of new jobs.
The paper explicitly states it examines the creation of new jobs as a ramification of AI (abstract); claim presented qualitatively without reported sample sizes or quantified effect in the provided text.
Operational reasoning paradigms such as ReasonOps may become foundational infrastructure for next-generation trustworthy AI ecosystems.
Author's forward-looking argument / conjecture about the potential future impact and adoption of operational reasoning paradigms; presented as an argument rather than demonstrated empirically in the excerpt.
The paper presents the ReasonOps architecture, demonstrates its workflow using an autonomous braking system analysis example, and discusses its potential role in future safety-critical autonomous AI systems.
Author statement about the paper's content and demonstration (explicitly claims an architecture and an example walkthrough); evidence is the paper's own descriptive content.
The proposed paradigm integrates semantic interpretation, autoformalization, symbolic reasoning, theorem proving, runtime assurance, probabilistic reliability estimation, and adaptive correction into a unified reasoning lifecycle.
Author claim about the architecture and components of ReasonOps; presented as a proposed integrated lifecycle in the paper (no empirical evaluation reported in excerpt).
ReasonOps treats reasoning as a continuously monitored, verifiable, reliability-aware operational process rather than an isolated inference task.
Author description of the ReasonOps paradigm and its operational stance (conceptual framework described in paper).
This paper introduces ReasonOps, a unified operational paradigm for trustworthy verified reasoning systems.
Declarative claim about the paper's contribution (introduction of a named paradigm); supported by the paper itself (architectural description and example claimed).
Recent advances in theorem proving, autoformalization, symbolic reasoning, and tool-augmented language models demonstrate substantial progress toward machine-assisted formal reasoning.
Author statement citing multiple research directions (theorem proving, autoformalization, symbolic reasoning, tool-augmented LMs); no specific empirical results or quantitative studies provided in excerpt.
Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents.
Author assertion in paper's introduction; conceptual argument referencing recent developments in LLMs (no empirical study or sample size reported in text excerpt).
There exists a data supply chain that runs from individual translators through language service providers (LSPs) and platforms to model developers.
Mapping and descriptive analysis of industry supply chains and intermediary roles provided in the paper; conceptual and empirical examples of flows of translation data from translators to model developers. No numerical sample reported.
Article 30-4 of the Japanese Copyright Act legitimates a mode of use the paper terms 'appropriation without consumption'—i.e., mining works for statistical features rather than reading or experiencing them.
Textual/legal analysis of Article 30-4 of the Japanese Copyright Act and its interpretation; comparative legal reading presented in the paper. No numerical sample reported.
The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of translation data (TM/parallel corpora).
Historical and technical literature review linking MT/NLP methodological advances to the availability and use of parallel corpora and TM; comparative analysis of model development histories described in the paper. No numerical sample reported.
Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation.
Conceptual argument and literature review of machine translation practice (discussion of TM/parallel corpora as supervised training data); examples and descriptive evidence from MT research and industry practice presented in the paper. No numerical sample reported.
To balance promotion of innovation with preservation of human creativity, it is essential to revise existing laws and introduce novel approaches such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents.
Normative recommendation derived from the paper's comparative legal analysis and discussion of enforcement challenges (no empirical sample size).
Artificial intelligence systems are capable of autonomously generating artistic, literary, musical works, and even inventions without direct human intervention.
Stated as part of the paper's premise and supported by the paper's literature/theoretical review of advances in AI creative and inventive capabilities (no empirical sample size reported).
The present paper states the primitive contract, the toll identity, the within-boundary no-arbitrage result, and the budget guarantee that the later empirical, mechanism-design, and dynamic-underwriting companion papers depend on.
Paper's stated scope and organization asserting that these formal primitives and theorems are provided as foundations for follow-on empirical and companion studies.
(iv) A conservative runtime gating theorem translates high-probability toll envelopes into an executed-action budget guarantee.
Mathematical theorem in the paper proving that given high-probability bounds (toll envelopes), one can derive a guarantee on executed-action budget consumption (runtime gating).
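One way to picture a conservative gate of this kind is the sketch below. It is not the paper's theorem or construction: the `BudgetGate` class, its fields, and the use of a simple union bound are assumptions for illustration. Each action arrives with an upper bound on its toll that is assumed to hold with probability at least 1 - delta, and the gate admits the action only if the admitted upper bounds still fit within the budget.

```python
class BudgetGate:
    """Admit actions only while their toll upper bounds fit within a budget."""

    def __init__(self, budget):
        self.budget = budget
        self.committed = 0.0     # sum of upper bounds for admitted actions
        self.failure_prob = 0.0  # union bound on any admitted bound failing

    def try_execute(self, toll_upper_bound, delta):
        """Return True if the action is admitted, False to use the safe default.

        toll_upper_bound is assumed to exceed the true toll with
        probability at least 1 - delta.
        """
        if self.committed + toll_upper_bound > self.budget:
            return False
        self.committed += toll_upper_bound
        self.failure_prob += delta
        return True


if __name__ == "__main__":
    gate = BudgetGate(budget=10.0)
    for bound in (4.0, 5.0, 2.0):
        print(bound, gate.try_execute(bound, delta=0.01))
    print("committed:", gate.committed, "failure bound:", gate.failure_prob)
```

Under these assumptions, the realized tolls of the admitted actions stay within the budget with probability at least 1 minus the accumulated `failure_prob`. The gate is conservative because it charges the upper bound rather than the expected toll.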
(iii) An irreversible-authority premium is characterized and splits into a strictly positive action-level component plus an if-and-only-if characterization of the set-level robust capital increase.
Formal decomposition/theorem in the paper proving existence of the irreversible-authority premium, showing the action-level component is strictly positive, and providing an iff condition for set-level robust capital increases.
(ii, corollary) Gaming-resistance of the system is tied to the design of the underwriting boundary (i.e., a corollary linking gaming-resistance to boundary design).
Corollary derived from the no-splitting theorem that links strategic gaming-resistance properties to specific features of the underwriting boundary.
(ii) A no-splitting property holds within an underwriting boundary that telescopes path-decomposed actions into a boundary potential.
Formal theorem in the paper proving a no-splitting property and showing how path-decomposed action contributions aggregate (telescoping) into a boundary potential.
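The telescoping idea can be illustrated with a short derivation. The notation is hypothetical rather than the paper's: suppose an action is decomposed into steps through states s_0, ..., s_n inside the boundary, and the toll of each step is the change in a potential Φ defined on those states.

```latex
\sum_{k=1}^{n} \tau(s_{k-1} \to s_k)
  = \sum_{k=1}^{n} \bigl[ \Phi(s_k) - \Phi(s_{k-1}) \bigr]
  = \Phi(s_n) - \Phi(s_0)
```

Under that assumption the intermediate terms cancel, so the total toll depends only on the endpoints and splitting an action into smaller pieces cannot lower it.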
(i) There exists a well-defined counterfactual toll under a chosen safe-default mapping and continuation policy.
Theoretical derivation / formal proof presented in the paper establishing existence of the toll under specified mappings and policies.
The framework treats per-action insurance as the primary unit of analysis and replaces post-hoc annual liability cover with a pre-action transaction layer.
Conceptual and design claim supported by the paper's theoretical argumentation and proposed contract primitives; no empirical validation reported.
We propose a foundational runtime actuarial layer for autonomous AI agents in which every side-effect-bearing action carries a time-consistent, counterfactual risk toll computed against a contractually fixed safe default, inside an explicit underwriting boundary.
Theoretical proposal and formal description of an actuarial framework presented in the paper (architectural/axiomatic exposition). No empirical sample or experiment reported.
Policy makers and education/training organizations should comprehensively consider AI and EPU to cope with market uncertainty and ensure the stability and sustainability of China’s ETM.
Policy recommendation derived from the paper's empirical findings on causality, quantile dependence, and asymmetric risk spillovers (argumentative/conclusion statement rather than a direct empirical result).
There is an interaction between AI and EPU: EPU promotes AI during periods of economic stability.
Cross-quantilogram analysis indicating quantile-specific causality/interactions, with EPU predicting AI in stable-period quantiles (method reported; sample size not stated).
There is an interaction between AI and EPU: AI promotes EPU in bullish markets.
Cross-quantilogram analysis showing quantile-dependent interaction (method reported; sample size not stated); specific result described for bullish-market quantiles.
The cross-quantilogram indicates quantile dependence among AI, EPU and ETM: the positive predictive effect of AI on ETM is mainly concentrated in bullish markets.
Cross-quantilogram analysis (quantile cross-dependence test) applied to AI and ETM time-series in China (method reported; sample size not stated).
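For orientation, the sketch below computes a sample cross-quantilogram in plain Python: the correlation between "quantile hits" of one series at time t and of another series at time t - lag. It uses unconditional empirical quantiles and hypothetical data, and it omits the significance testing a real analysis would need. The function names are illustrative, not taken from the paper.

```python
import math


def empirical_quantile(values, alpha):
    """Return a simple empirical alpha-quantile (no interpolation)."""
    ordered = sorted(values)
    index = max(0, min(len(ordered) - 1, math.ceil(alpha * len(ordered)) - 1))
    return ordered[index]


def quantile_hit(value, quantile, alpha):
    """psi_alpha(value - quantile) = 1[value < quantile] - alpha."""
    return (1.0 if value < quantile else 0.0) - alpha


def cross_quantilogram(y, x, alpha_y, alpha_x, lag):
    """Correlate quantile hits of y at time t with those of x at time t - lag."""
    q_y = empirical_quantile(y, alpha_y)
    q_x = empirical_quantile(x, alpha_x)
    hits_y = [quantile_hit(v, q_y, alpha_y) for v in y[lag:]]
    hits_x = [quantile_hit(v, q_x, alpha_x) for v in x[: len(x) - lag]]
    numerator = sum(a * b for a, b in zip(hits_y, hits_x))
    denominator = math.sqrt(sum(a * a for a in hits_y) * sum(b * b for b in hits_x))
    return numerator / denominator if denominator else float("nan")


if __name__ == "__main__":
    # Hypothetical index values for an AI series and an ETM series.
    ai = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.5, 3.5]
    etm = [2.0, 3.1, 1.2, 4.2, 1.4, 5.1, 8.7, 2.2, 6.3, 5.2]
    print(cross_quantilogram(etm, ai, alpha_y=0.9, alpha_x=0.9, lag=1))
```

Values near zero indicate no directional predictability at the chosen quantile pair and lag. Evaluating the statistic over a grid of quantiles is what makes it possible to say that an effect is concentrated in, for example, the upper quantiles associated with bullish markets.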
The nonparametric quantile causality test shows a unidirectional causal relationship from AI to China’s education and training market (ETM).
Nonparametric quantile causality test applied to time-series data on AI and ETM in China (method reported; sample size not stated).