Evidence (7198 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding (for example, "what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 8921 |
| Productivity | 8002 |
| Governance (current filter) | 7198 |
| Human-AI Collaboration | 6864 |
| Org Design | 4398 |
| Innovation | 4286 |
| Labor Markets | 3629 |
| Skills & Training | 3001 |
| Inequality | 2141 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 790 | 208 | 103 | 950 | 2117 |
| Governance & Regulation | 869 | 411 | 195 | 126 | 1630 |
| Organizational Efficiency | 817 | 202 | 126 | 87 | 1243 |
| Technology Adoption Rate | 675 | 258 | 128 | 106 | 1178 |
| Research Productivity | 462 | 138 | 64 | 347 | 1023 |
| Output Quality | 501 | 193 | 61 | 52 | 807 |
| Decision Quality | 346 | 180 | 84 | 51 | 668 |
| AI Safety & Ethics | 235 | 285 | 70 | 34 | 630 |
| Firm Productivity | 452 | 58 | 91 | 20 | 627 |
| Market Structure | 184 | 171 | 123 | 24 | 507 |
| Task Allocation | 221 | 65 | 76 | 34 | 401 |
| Skill Acquisition | 176 | 62 | 62 | 17 | 317 |
| Innovation Output | 207 | 28 | 48 | 18 | 303 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Employment Level | 105 | 56 | 108 | 13 | 284 |
| Consumer Welfare | 121 | 67 | 45 | 11 | 244 |
| Firm Revenue | 160 | 50 | 28 | 4 | 242 |
| Task Completion Time | 182 | 33 | 10 | 13 | 239 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 94 | 73 | 23 | 12 | 202 |
| Error Rate | 76 | 98 | 11 | 4 | 189 |
| Regulatory Compliance | 81 | 73 | 17 | 7 | 178 |
| Automation Exposure | 61 | 59 | 26 | 14 | 163 |
| Training Effectiveness | 97 | 21 | 14 | 19 | 153 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 21 | 1 | 117 |
| Hiring & Recruitment | 52 | 8 | 8 | 3 | 71 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 49 | 6 | 1 | 61 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 15 | 14 | — | 3 | 32 |
| Industry | — | — | — | 1 | 1 |
Governance
These cooperation mechanisms become more effective under evolutionary pressures to maximize individual payoffs.
Authors report results from experiments or simulations applying evolutionary-pressure dynamics (selection for payoff-maximizing agents) and observing increased effectiveness of mechanisms; no numeric results or sample sizes in excerpt.
Contracting and mediation are most effective in achieving cooperative outcomes between capable LLM models.
Empirical results from the authors' experiments across four social dilemmas comparing mechanism performance; specifics (which models, quantitative cooperation rates) are not included in the excerpt.
This work contributes to the growing body of research on digital sovereignty and the political economy of AI in frontier markets.
Author's concluding claim about the study's contribution to literature.
Many advanced nations are already integrating AI into their core systems.
General descriptive statement in the paper's background/comparative context; no quantitative enumeration or country-sample provided in the excerpt.
To fund this transition, the paper introduces a blended finance structure designed to attract multilateral banks and private venture capital.
Policy/finance architecture proposed in the paper (design description); no funding rounds, commitments, or empirical investor responses reported in the excerpt.
With coordinated reform, AI could boost Cameroon’s long-term productivity by 1.5% to 2.8% annually.
Result reported from the paper's digital infrastructure modeling; no empirical field trial or sampled population reported in the excerpt.
This model draws on international standards from the OECD, UNESCO, and the African Union, alongside the NIST Risk Management Framework.
Paper text states the model's normative/standards sources; descriptive claim about frameworks referenced.
The study proposes a three-layer framework tailored to Cameroon’s specific political economy using comparative policy analysis and digital infrastructure modeling.
Methodological claim in the paper (description of what the study proposes); based on the authors' analytical work rather than reported empirical validation.
Cameroon should not view AI simply as modernization; it must be treated as a sovereign strategy built on institutional economics, deliberate governance, and a solid blended finance architecture.
Normative policy recommendation derived from the paper's comparative analysis and modeling; no empirical trial or longitudinal data reported in the excerpt.
Artificial Intelligence is ... a structural force that determines national competitiveness and economic resilience.
Author assertion supported by literature review and high-level argumentation (comparative policy analysis); no empirical sample or dataset reported in the excerpt.
CoCoGen+ outperforms baselines in efficiency.
Comparative experiments reported in the paper showing CoCoGen+ versus baseline methods on efficiency metrics; the abstract does not report numeric effect sizes or sample sizes.
Experiments on varying learning tasks validate the feasibility of CoCoGen+.
Simulation/experimental evaluation on multiple learning tasks reported in the paper; abstract does not state dataset sizes, number of tasks, or other experimental details.
To promote long-term collaboration, CoCoGen+ integrates a payoff-redistribution-based incentive mechanism to compensate organizations for their contributions and competition-caused utility degradation.
Mechanism design described in the paper (proposed incentive mechanism); presented theoretically and incorporated into experiments.
We provide a tractable equilibrium characterization of the game and derive implementable synthetic-data generation strategies that maximize social welfare.
Analytical equilibrium characterization and derived strategies reported in the methods/analysis sections of the paper; theoretical derivations rather than randomized trial data.
We introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision.
Design and formal description of the CoCoGen+ framework within the paper (theoretical contribution); no sample size applicable.
We formalize the framework and outline a research agenda, motivated by business and economics, around marketplace simulation, metrics, optimization, and adoption in evaluation campaigns like TREC.
Statement of paper scope and contributions (formalization and research agenda); factual description of the paper's contents rather than an empirical claim.
By simulating repeated interactions and evolving user and agent preferences, the framework enables longitudinal evaluation and marketplace-level metrics, such as retention and market share, that complement and can extend beyond traditional accuracy-based metrics.
Descriptive claim about the capabilities of the proposed simulation-based framework as stated in the paper; described as enabling longitudinal and marketplace-level metrics (no empirical validation or sample size in the abstract).
We introduce Marketplace Evaluation, a simulation-based paradigm that evaluates information access systems as participants in a competitive marketplace.
Author's stated contribution in the paper (introduction of a proposed framework); the paper itself presents the framework (formalization described later), not an external empirical validation.
Modern information access ecosystems consist of mixtures of systems, such as retrieval systems and large language models, and increasingly rely on marketplaces to mediate access to models, tools, and data, making competition between systems inherent to deployment.
Statement in paper abstract/introduction describing current ecosystem architecture and marketplace mediation; conceptual/observational claim (no empirical data or sample size reported).
Successful AI implementation in auditing requires an integrated framework that aligns technological readiness, auditor acceptance, and innovation diffusion to sustainably improve audit quality in Indonesia.
Authors' conclusion and recommendation derived from thematic synthesis of reviewed literature and comparative findings.
Comparative analysis indicates Indonesia remains at the early majority stage of AI adoption in auditing.
Authors' comparative synthesis of the reviewed literature and country-specific discussion classifying Indonesia's adoption stage as early majority.
Comparative analysis indicates global audit firms are positioned at the innovators and early adopters stage of AI adoption.
Authors' comparative synthesis of the reviewed literature classifying global audit firms' diffusion stage (innovation adoption framework) based on patterns in the articles.
AI implementation has been shown to significantly enhance audit efficiency, accuracy, and overall audit quality.
Synthesis of findings across the reviewed articles (thematic analysis) reporting positive effects of AI on efficiency, accuracy, and audit quality.
Global auditing practices increasingly utilize machine learning, natural language processing, and robotic process automation to support risk-based auditing, fraud detection, and continuous auditing.
Thematic analysis of the 15 selected journal articles identifying dominant AI techniques (ML, NLP, RPA) and common use cases (risk-based auditing, fraud detection, continuous auditing).
Substituting subjective human preference with rigorous economic penalties provides a robust methodology for aligning autonomous agents in high-stakes, real-world environments.
Conclusion drawn from the authors' empirical study and the reported final-system performance; presented as a general methodological claim (supporting data referenced in paper but not detailed in excerpt).
The final OOM-RL-aligned system achieved a stable equilibrium with an annualized Sharpe ratio of 2.06 in its mature phase.
Quantitative performance result reported for the mature phase of the system in the paper's abstract; Sharpe ratio provided as a single-number metric (no sample size, number of trading periods, or statistical significance reported in the excerpt).
The MAS abandoned overfitted hallucinations in favor of the Strict Test-Driven Agentic Workflow (STDAW), which enforces a Byzantine-inspired uni-directional state lock (RO-Lock) anchored to a deterministically verified ≥95% code coverage constraint matrix.
Design and outcome claim in the paper: introduction of STDAW/RO-Lock and reported enforcement of a ≥95% code coverage constraint as part of the aligned architecture (qualitative + a coverage threshold stated).
The system evolved from a high-turnover, sycophantic baseline to a robust, liquidity-aware architecture over the course of the study.
Reported longitudinal observations from the 20-month empirical study described in the paper (qualitative system evolution claim; no numeric counts provided in excerpt).
We introduce Out-of-Money Reinforcement Learning (OOM-RL): deploying agents into the non-stationary, high-friction reality of live financial markets to utilize capital depletion as an un-hackable negative gradient.
Methodological claim / novel paradigm introduced by the paper; described as implemented in the study (no numerical sample size given in excerpt).
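For readers unfamiliar with the metric in the Sharpe-ratio claim above, the sketch below shows the conventional way an annualized Sharpe ratio is computed: mean excess return per period divided by its standard deviation, scaled by the square root of the number of periods per year. It is a generic illustration, not the paper's computation. The return series, the zero risk-free rate, and the 252-periods-per-year convention are all assumptions.

```python
import math
import statistics

def annualized_sharpe(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Conventional annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free_per_period for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return mean_excess / volatility * math.sqrt(periods_per_year)

# Hypothetical daily returns, for illustration only.
daily_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002]
sharpe = annualized_sharpe(daily_returns)
```

Because the ratio is a single summary number, it is sensitive to the sampling frequency and the length of the return series, which is why the excerpt's lack of trading-period counts matters when interpreting it.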
Established regional telcos and banks are leveraging proprietary data to develop digital loan products.
Observations and interviews from the nine-month ethnography describing practices of regional telcos and banks in Nairobi developing digital loan products using proprietary data.
A configuration-driven domain model means deploying a new institutional decision domain requires YAML configuration, not engineering capacity.
Design/implementation claim in paper describing deployment approach using YAML configuration rather than engineering work.
We introduce governability — how reliably a system knows when it should not act autonomously — as a primary evaluation axis for institutional AI alongside accuracy.
Conceptual contribution/metric proposed by authors in paper; no empirical validation reported in the excerpt.
Cognitive Core produced zero silent errors while both baselines produced 5-6 silent errors on the evaluation set.
Empirical benchmark reported in paper on the 11-case evaluation set; counts of silent errors given for Cognitive Core and baselines.
Cognitive Core achieves 91% accuracy on the 11-case prior authorization appeal set, versus 55% for ReAct and 45% for Plan-and-Solve.
Empirical benchmark reported in paper on the 11-case evaluation set; accuracies explicitly stated for three systems.
We propose Cognitive Core: a governed decision substrate built from nine typed cognitive primitives (retrieve, classify, investigate, verify, challenge, reflect, deliberate, govern, generate), a four-tier governance model where human review is a condition of execution rather than a post-hoc check, a tamper-evident SHA-256 hash-chain audit ledger endogenous to computation, and a demand-driven delegation architecture supporting both declared and autonomously reasoned epistemic sequences.
Design/proposal described in paper (architectural specification); no empirical evaluation reported for the architecture itself in the excerpt.
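To make the "tamper-evident SHA-256 hash-chain audit ledger" in the claim above concrete, here is a minimal sketch of the general technique: each entry stores the hash of the previous entry, so editing any earlier record invalidates every later hash. This is not the paper's implementation. The entry layout, the `GENESIS` placeholder, and the function names `append` and `verify` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(ledger, payload):
    """Append a record that is chained to the current tail of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev_hash, "payload": payload,
                   "hash": entry_hash(prev_hash, payload)})

def verify(ledger):
    """Recompute every hash in order; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical audit records, for illustration only.
ledger = []
append(ledger, {"step": "retrieve", "case": "A-1"})
append(ledger, {"step": "govern", "tier": 3})
ok_before = verify(ledger)

ledger[0]["payload"]["case"] = "A-2"  # simulate tampering with an old record
ok_after = verify(ledger)
```

A chain like this detects modification but does not prevent it; a real deployment would also need to protect the latest hash, for example by publishing or countersigning it.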
A simple regret-based payout rule is proposed that satisfies three out of the four Shapley axioms and also lies in the core.
Constructive proposal in the paper with accompanying theoretical/axiomatic analysis showing compliance with three Shapley axioms and proof of core-membership.
Convexity (in the homogeneous-agent case) implies a non-empty core that contains the Shapley value and ensures both stability and fairness of payout allocations.
Theoretical implication shown in the paper: proof that convexity leads to non-empty core and that the Shapley value belongs to the core under the stated conditions.
For identical (homogeneous) agents with fixed action sets, the induced TU game is convex under mild algorithmic conditions.
Theoretical result/proof provided in the paper under assumptions of homogeneous agents and fixed action sets and certain algorithmic conditions.
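The three claims above rest on a standard result: in a convex transferable-utility (TU) game the core is non-empty and contains the Shapley value. The sketch below checks this by brute force on a toy game. The characteristic function `v(S) = |S|**2` with three identical agents is an assumption chosen for illustration, not the game studied in the paper.

```python
from fractions import Fraction
from itertools import combinations, permutations

def all_subsets(players):
    return [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(players, r)]

def shapley(players, v):
    """Average marginal contribution of each player over all arrival orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orders) for p in players}

def is_convex(players, v):
    """Supermodularity: v(S | T) + v(S & T) >= v(S) + v(T) for all S, T."""
    subsets = all_subsets(players)
    return all(v(s | t) + v(s & t) >= v(s) + v(t)
               for s in subsets for t in subsets)

def in_core(players, v, payoff):
    """Efficient, and no coalition can do better on its own."""
    if sum(payoff.values()) != v(frozenset(players)):
        return False
    return all(sum(payoff[p] for p in s) >= v(s) for s in all_subsets(players))

# Toy game with three identical agents (illustrative assumption).
players = ("a", "b", "c")
v = lambda coalition: len(coalition) ** 2

phi = shapley(players, v)
convex = is_convex(players, v)
stable = in_core(players, v, phi)
```

Here each agent's Shapley payout is 3 (the grand-coalition value 9 split three ways by symmetry), and no coalition of one or two agents can secure more than its members already receive. Enumeration over all orders and subsets is only practical for small games.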
Relatedness-based simulations identify, when it exists, for each country the Simplest Single Sovereignty Enhancing Technology (SSSET), i.e., the most feasible single new technological direction associated with the largest expected improvement in relative geoeconomic positioning.
Simulation/relatedness analysis described in paper: for each country, relatedness-based (proximity) simulations used to propose the single most feasible technology (SSSET) expected to yield the largest improvement in geoeconomic position.
The United States and Israel consistently occupy a marked 'high-diversity/low-ubiquity' position and lead the GCI ranking, followed by China, France, Japan, and Germany.
Empirical ranking produced by the GCI measure applied to the 17-country sample (paper reports ordering and characterization of US and Israel positions).
Cloud Computing, Cybersecurity Tools, and Medtech exhibit the highest ETGCI values, reflecting concentration of specialization in a small set of leading countries.
Empirical result computed from ETGCI values derived from the RVA specialization matrix; paper reports these domains as having the highest ETGCI (implying concentration among high-GCI countries).
From this matrix we derive two eigenvector-based measures: a Geoeconomic Complexity Index (GCI) that ranks countries by the composition of their venture specializations, and an Emerging Technology Geoeconomic Complexity Index (ETGCI) that ranks domains by the extent to which specialization is concentrated among high-GCI countries.
Methodological claim: eigenvector centrality/complexity approach applied to the RVA-based specialization matrix to derive two indices (GCI for countries, ETGCI for domains).
We construct an RVA-based country-technology specialization matrix for the 17 countries with the highest aggregate VC funding.
Methodological statement in paper: Revealed Venture Advantage (RVA) metric computed and used to build country-by-technology specialization matrix restricted to top 17 countries by aggregate VC funding.
We map venture-backed startups to 18 emerging technology domains via a probabilistic multi-label large-language-model classifier using Crunchbase firm- and deal-level data.
Methodological description in paper: Crunchbase firm- and deal-level data used; classification into 18 domains performed with a probabilistic multi-label LLM classifier (paper states this pipeline).
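The excerpts above do not give the formula for Revealed Venture Advantage (RVA). The sketch below assumes it follows the usual Balassa-style revealed-advantage ratio: a domain's share of a country's VC funding divided by that domain's share of all VC funding in the sample. The funding figures, country labels, and the `rva_matrix` helper are hypothetical, and the eigenvector step that yields GCI and ETGCI is not shown.

```python
def rva_matrix(funding):
    """funding[country][domain] -> amount; returns rva[country][domain].

    Assumes every country and every domain has non-zero total funding.
    """
    countries = list(funding)
    domains = sorted({d for row in funding.values() for d in row})
    grand_total = sum(sum(row.values()) for row in funding.values())
    country_total = {c: sum(funding[c].values()) for c in countries}
    domain_total = {d: sum(funding[c].get(d, 0.0) for c in countries)
                    for d in domains}
    rva = {}
    for c in countries:
        rva[c] = {}
        for d in domains:
            share_in_country = funding[c].get(d, 0.0) / country_total[c]
            share_in_sample = domain_total[d] / grand_total
            rva[c][d] = share_in_country / share_in_sample
    return rva

# Hypothetical funding amounts, for illustration only.
funding = {
    "A": {"cloud": 80.0, "medtech": 20.0},
    "B": {"cloud": 20.0, "medtech": 80.0},
}
rva = rva_matrix(funding)

# A common convention (assumed here) treats RVA >= 1 as "specialized".
specialized = {c: [d for d, value in row.items() if value >= 1.0]
               for c, row in rva.items()}
```

In this toy sample country A has an RVA of 1.6 in cloud and 0.4 in medtech, so it counts as specialized in cloud only, and country B is the mirror image.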
In a test of eight behavioural persuasion strategies, all outperformed the most effective attitudinal persuasion strategy, but differences among the eight were small.
Experimental comparison within the preregistered studies of eight behavioural persuasion strategies versus the best attitudinal persuasion strategy; results reported in paper showing each behavioural strategy exceeded the attitudinal strategy and that variation among the eight behavioural strategies was small.
We replicated prior findings that information provision drove effects on attitudes.
Experimentally manipulating information provision within the preregistered studies and observing effects on attitudinal outcomes, consistent with prior literature (sample reported in paper).
We found sizable AI persuasion effects on these behavioural outcomes (e.g. +19.7 percentage points on petition signing).
Experimental results from the two preregistered studies reported in the paper; example effect explicitly reported as +19.7 percentage points increase in petition signing. Overall sample reported as N=17,950 responses.
The policy’s impact on inclusive green growth is most pronounced in cities with areas between 5,000 and 10,000 square kilometers.
Subgroup analysis by city area within the DID framework showing the largest estimated policy effect for cities whose area is between 5,000 and 10,000 km² (sample size not reported).
The policy exhibits spatial spillover effects on neighboring regions that diminish progressively with distance (spatial decay).
Spatial analysis of policy effects across neighboring regions showing declining effect sizes as geographic distance increases (details and sample size not reported).
Mechanism tests indicate the policy primarily enhances inclusive green growth by strengthening public environmental participation.
Mechanism/mediation tests reported in the study (presumably within the DID framework) showing an increase in measures of public environmental participation associated with the policy and linked to inclusive green growth (sample size not reported).
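The policy claims above are described as estimated within a difference-in-differences (DID) framework. As a reminder of what that estimator measures, the sketch below computes the basic two-group, two-period version: the change in the treated group minus the change in the control group. The outcome values are hypothetical, and the study's actual specification (controls, fixed effects, spatial terms) is not reproduced here.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical inclusive-green-growth scores, for illustration only.
effect = did_estimate(
    treated_pre=[1.0, 1.2],
    treated_post=[1.6, 1.8],
    control_pre=[1.0, 1.0],
    control_post=[1.2, 1.4],
)
```

With these numbers the treated cities improve by 0.6 and the control cities by 0.3, so the estimated policy effect is 0.3. The estimate is only credible if both groups would have followed parallel trends without the policy.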