Evidence (14156 claims)
Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
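Because the categories differ widely in size, it can help to read each direction count as a share of its row total. The sketch below does this for three rows copied from the matrix above. The helper name `direction_shares` is illustrative, not part of any tool behind this page, and it assumes the listed Total is the right denominator (for some rows the Total is larger than the sum of the four direction columns).

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Job Displacement": (12, 81, 21, 1, 115),
    "Inequality Measures": (44, 123, 50, 6, 223),
    "Skill Obsolescence": (5, 47, 6, 1, 59),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(row):
    """Return each direction's share of the row's listed total (assumed denominator)."""
    *counts, total = row
    return {name: count / total for name, count in zip(DIRECTIONS, counts)}


for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(f"{outcome}: {shares['negative']:.1%} negative")
# Job Displacement: 70.4% negative
# Inequality Measures: 55.2% negative
# Skill Obsolescence: 79.7% negative
```

Read this way, these three outcomes are dominated by negative findings even though their raw counts are small next to categories such as Governance & Regulation.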
Current research has largely focused on short-horizon tasks over a limited set of software with limited economic value (e.g., basic e-commerce and OS-configuration tasks).
Narrative literature/field observation reported in paper introduction (no numeric study reported in excerpt).
There is a fundamental gap in current agent capabilities: functional correctness alone is insufficient for design-aware issue resolution, motivating design-aware evaluation beyond functional correctness.
Synthesis of experimental findings: low design-satisfaction despite functional correctness, prevalence of design violations, and only partial improvement from guidance support the conclusion.
Design violations are widespread in agent-produced patches.
Empirical results from experiments on the benchmark showing many patches violate validated design constraints; backed by counts/percentages in evaluation (as summarized in abstract).
Test-based correctness substantially overestimates patch quality: fewer than half of resolved issues are fully design-satisfying.
Experimental evaluation with state-of-the-art LLM-based agents on the benchmark (reported in paper). Sample implicit: benchmark issues (495) used to evaluate agents; comparison between test pass rates and design-satisfaction measured by verifier.
Despite growing investment in data analytics, the decision-making and coordination layers of these workflows remain predominantly manual, reactive, and fragmented across outlets, distribution centers, and supplier networks.
Stated as an observation in the paper (abstract); no quantitative evidence, metrics, or comparative analysis provided in the excerpt.
Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning demand forecasting, procurement, supplier coordination, and inventory replenishment.
Descriptive claim stated in the paper's introduction/abstract; no empirical data, sample, or methods reported to substantiate this characterization within the text provided.
We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years.
Authors' temporal analysis combining industry structure observations (semiconductor manufacturing concentration) with estimated R&D timelines for mechanisms (qualitative/engineering timeline estimates). No empirical time-series sample size provided.
We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns.
Authors' threat analysis (qualitative assessment of technical and geopolitical threat vectors). No quantitative sample size; based on literature and engineering reasoning.
Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature.
Authors' feasibility assessments of mechanisms (qualitative/engineering evaluation across the taxonomy); identification of critical mechanisms for treaty verification and corresponding feasibility ratings. No empirical trial or sample size reported.
The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective.
Authors' framing and literature review presented in the paper (conceptual/qualitative argument; no empirical sample size reported).
The review identifies persistent gaps in population coverage, multimodal integration, equity optimization, explainability, validation, and governance that constrain inclusiveness and robustness of GeoAI applications in urban mobility research.
Authors' gap analysis based on the contents and limitations of the 18 included studies.
Urban mobility is a central challenge for sustainable and inclusive cities, as climate change, congestion, and spatial inequality increasingly reveal mobility patterns as expressions of deeper social and spatial structures.
Introductory framing statement in the paper; general literature/contextual claim (no original empirical test reported in this paper).
In an additive model where human utility and fitness differ, if deception increases fitness beyond genuine utility, then evolution will select for deception.
Mathematical analysis of an additive model in the paper showing selection pressure favors traits (deception) that increase the fitness function even when they reduce true human utility (theoretical derivation).
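The excerpt does not give the paper's equations, so the following is only a minimal sketch of the kind of selection argument described, under assumptions of our own. Two types are assumed, honest and deceptive. Fitness is assumed additive in genuine utility and a deception term, and the population is assumed to follow discrete replicator dynamics. The names `utility`, `cost`, and `bonus` are hypothetical parameters, not the paper's notation.

```python
def deceptive_share_path(p0, utility, cost, bonus, steps):
    """Track the deceptive type's population share under discrete replicator dynamics.

    Assumed additive fitness (illustrative only):
      honest type:    fitness = utility
      deceptive type: fitness = utility - cost + bonus
    where `cost` is the genuine utility lost to deception and `bonus` is the
    extra fitness that deception earns.
    """
    fit_honest = utility
    fit_deceptive = utility - cost + bonus
    p = p0
    path = [p]
    for _ in range(steps):
        mean_fitness = p * fit_deceptive + (1 - p) * fit_honest
        p = p * fit_deceptive / mean_fitness  # replicator update
        path.append(p)
    return path


# Deception lowers genuine utility by 0.2 but raises fitness by 0.5.
path = deceptive_share_path(p0=0.01, utility=1.0, cost=0.2, bonus=0.5, steps=50)
print(f"deceptive share after 50 steps: {path[-1]:.3f}")
```

In this sketch the deceptive share rises whenever `bonus` exceeds `cost` and falls when it does not, which is the qualitative condition the claim states.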
The two margins interact through a self-undermining feedback that can generate low-archive traps (multiple equilibria with low accumulated public archive).
Dynamic equilibrium analysis in the theoretical model showing interacting feedbacks and possible trap equilibria (model-derived result).
Resolution margin: the probability that posted queries are resolved declines because AI raises contributors' outside options, thinning the contributor pool and creating congestion on the platform.
Mechanism and comparative-static implication produced by the paper's theoretical model; no empirical sample provided in the excerpt.
Flow margin: the posted volume of knowledge-enhancing queries declines as AI resolves more problems privately before they reach the platform.
Mechanism derived in the theoretical model; stated as the flow-margin channel (no empirical quantification in the provided text).
AI reduces archive creation through two distinct margins: a flow margin and a resolution margin.
Analytical decomposition derived within the paper's theoretical model (mechanism claimed by the model).
Generative AI resolves user problems without leaving a public trace, so fewer discussions and solutions reach public platforms.
Stated as an empirical motivation in the paper; no empirical sample or quantified measurement reported in the provided text.
The literature remains fragmented, with limited integrative frameworks to explain how AI-human dynamics and decision-making typologies shape outcomes.
Conclusion drawn from the systematic review and bibliometric analysis of the 627-article corpus as reported in the abstract.
Green AI research has largely measured the footprint of models rather than the downstream workflows in which GenAI is a tool.
Literature review / mapping of recent Green AI literature reported in the paper; descriptive claim about the focus of the field (no sample size or numerical counts reported in the abstract).
These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains.
Interpretation and policy recommendation based on empirical patterns found in the audit (consistent hierarchical ratings and differences of up to 25%).
Inter-caste matches are further ordered according to traditional caste hierarchy.
Reported analytic pattern where inter-caste match ratings follow the traditional caste ranking (implied ordering across Brahmin, Kshatriya, Vaishya, Shudra, Dalit).
Existing benchmarks differ from real usage in programming language distribution, prompt style and codebase structure.
Paper asserts mismatch between existing benchmarks and production usage as motivation for producing a production-derived benchmark (stated differences: language distribution, prompt style, codebase structure).
Within robotics subsectors, system integration delivers earlier and stronger carbon-reduction effects than ontology manufacturing.
Subsector analysis in the panel data (277 prefecture-level cities, 2008–2019) comparing effects of system integration versus ontology manufacturing on urban carbon emissions.
The carbon-mitigation effects of robotics manufacturing are more pronounced in the central region of China than in the eastern region, indicating a latecomer advantage in green industrialization.
Heterogeneity analysis across geographic regions (central vs eastern regions) using the same panel of 277 prefecture-level cities (2008–2019).
A stage-dependent sequential mechanism operates: mature robotics manufacturing promotes robot adoption, which improves urban energy efficiency, and ultimately reduces carbon emissions; this channel is inactive at early stages of industry development.
Mechanism/mediation analysis using the panel data of 277 prefecture-level cities (2008–2019), presented as sequential pathway evidence in the paper.
Once robotics manufacturing reaches a moderate scale, further expansion leads to declines in urban carbon emissions.
Same panel dataset (277 prefecture-level cities, 2008–2019); econometric identification of the right-hand (declining) portion of the inverted U-shaped curve.
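The estimating equation is not reproduced here. A common way to identify an inverted U is a quadratic specification, in which the turning point is where the marginal effect changes sign. The sketch below assumes that form, and its coefficients are hypothetical placeholders chosen for illustration, not estimates from the paper.

```python
def turning_point(beta_linear, beta_squared):
    """Scale at which an inverted-U relationship peaks.

    Assumes emissions = beta_linear * x + beta_squared * x**2 + controls,
    so the peak is at x* = -beta_linear / (2 * beta_squared).
    """
    if beta_squared >= 0:
        raise ValueError("an inverted U requires a negative squared-term coefficient")
    return -beta_linear / (2 * beta_squared)


def marginal_effect(x, beta_linear, beta_squared):
    """Derivative of the quadratic with respect to x."""
    return beta_linear + 2 * beta_squared * x


# Hypothetical coefficients, for illustration only.
b1, b2 = 0.8, -0.1
x_star = turning_point(b1, b2)
print(f"turning point: {x_star:.2f}")
print(f"marginal effect below it (x=3): {marginal_effect(3, b1, b2):+.2f}")
print(f"marginal effect above it (x=5): {marginal_effect(5, b1, b2):+.2f}")
```

Past the turning point the marginal effect is negative, which corresponds to the declining right-hand portion of the curve described above.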
Replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release.
Conceptual argument supported by the paper's incident descriptions (e.g., a detected coordinate transformation error); the statement is presented as a general risk rationale.
Up to 25% of routine administrative tasks face high automation risk.
Quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing reporting the share of tasks at high automation risk.
There is a significant deficit in high-demand technical competencies such as data engineering, machine learning maintenance, and AI ethics within the Nigerian workforce.
Findings reported from the quantitative survey of 150 leading Nigerian firms (finance, tech, manufacturing) supplemented by qualitative workforce interviews and policy analysis.
The remaining 26 barriers are carried over from prior digital transformation waves — 22 in amplified form and 4 unchanged.
Comparative coding/classification within the review corpus indicating whether each barrier is novel or carried over, and whether it is amplified versus unchanged.
Three barriers were identified as agentic-specific: error propagation in multi-agent systems, role ambiguity, and accountability diffusion.
Classification of the 29 coded barriers by 'agentic specificity' within the literature review; these three barriers were labeled agentic-specific by the authors.
Acemoglu and Restrepo (2022) attribute 50–70% of the increase in US wage inequality between 1980 and 2016 to displacement of workers from tasks by automation.
Citation to Acemoglu and Restrepo (2022) empirical decomposition reported in the paper.
Dechezleprêtre et al. (2025), exploiting Germany's Hartz reforms, estimate an elasticity of automation innovation to low-skill wages of 2–5 at the firm level.
Citation to Dechezleprêtre et al. (2025) empirical estimate reported in the literature review.
When employers have monopsony power, they choose technologies that expand this power beyond what a social planner would consider optimal.
Model results and discussion in Section 7 on interaction of technological choices and monopsony power.
Profit-maximizing firms pursue innovations that erode workers' market power (make them more replaceable), even at the expense of production efficiency; a social planner would instead prefer technologies that preserve workers' market power.
Theoretical analysis in the paper of firms' profit-maximizing technology choices under market power considerations, plus comparative planner outcome.
A welfare-maximizing planner chooses to automate fewer tasks than a production-efficiency benchmark would dictate when workers' welfare is heavily weighted.
Model analysis of optimal task automation vs. production efficiency under different welfare weights on workers.
Occupations whose AI-exposed steps are more dispersed across the production workflow (higher fragmentation) exhibit a substantially lower share of their steps actually executed by AI, conditional on AI exposure share.
Empirical regression analysis controlling for share of AI-exposed steps; uses dataset linking O*NET tasks, human AI exposure assessments, Anthropic Economic Index execution outcomes, and GPT-generated workflow orderings (details in Sections 5.1 and 7).
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar change in external investment demand between treated and control firms.
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
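Taken together, the dollar figure and the percentage figure imply an approximate control-group level of external capital demand. The back-of-envelope sketch below assumes both numbers describe the same treatment-control comparison and that the 39.5% is measured relative to the control-group mean. Neither assumption is confirmed in the text above, and `implied_control_level` is an illustrative name.

```python
def implied_control_level(absolute_drop, relative_drop):
    """Back out the baseline from an absolute and a relative difference.

    If treated = control - absolute_drop and absolute_drop = relative_drop * control,
    then control = absolute_drop / relative_drop.
    """
    if not 0 < relative_drop <= 1:
        raise ValueError("relative_drop must be a fraction in (0, 1]")
    return absolute_drop / relative_drop


# $220,000 is used as a floor, since the reported drop is "just over" that amount.
baseline = implied_control_level(220_000, 0.395)
print(f"implied control-group demand: at least about ${baseline:,.0f}")
print(f"implied treated-group demand: about ${baseline - 220_000:,.0f}")
```

Under these assumptions the control-group figure comes out at roughly $557,000, which gives a sense of scale for the 39.5% reduction.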
For the private business sector, if the set of automated tasks were frozen in 1950, 87% of TFP growth between 1950 and 2023 would have been eliminated.
Counterfactual growth-accounting exercise that freezes the set of automated tasks at 1950 while allowing capital, labor, and other productivity growth to follow historical rates (simulation based on calibrated accounting).
The sum of "other" TFP growth and average labor productivity growth ($\hat{Z}_t + \hat{\psi}_{\ell t}$) is small: for example, it equals -0.1% per year for the private business sector since 1950.
Growth-accounting decomposition for the private business sector since 1950 using BEA/BLS data in the task-based framework.
Under the rapid scenario, economists forecast the share of wealth held by the wealthiest 10% of households rising to 80.0% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
Conditional on the rapid scenario, economists forecast the labor force participation rate falling from its current level of 62% to 55% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
There are macroeconomic risks associated with AI-led unemployment.
Paper's macroeconomic analysis drawing on labor economics and technology adoption research; no quantitative estimates or sample sizes provided in the summary.
Managerial incentives drive premature workforce contraction during AI adoption.
Analytical claim grounded in labor economics and organizational behavior review; the summary indicates examination of managerial incentives but does not report primary empirical tests or sample sizes.
Premature workforce contraction in response to AI adoption foreshadows deeper structural challenges as AI systems mature.
Forward-looking claim based on synthesis of literature and theoretical projection; no empirical quantification or sample provided in the summary.
This pattern of premature workforce reductions reflects longstanding corporate short-termism rather than genuine technological displacement.
The paper's interpretation drawing on labor economics and organizational behavior literature; no empirical study or sample size reported in the summary.
Organizations face mounting pressure to demonstrate immediate returns on AI investments, often through workforce reductions that outpace actual automation capabilities.
Argument in paper citing accelerating AI adoption across sectors and observed managerial responses; no primary dataset or sample size reported in the text.
In the limiting case of full automation, the model predicts that optimal recombination distance collapses to zero, suggesting that fully AI-driven research would undermine the very knowledge creation that it seeks to accelerate.
Limiting-case analytical result of the model: as the share of AI-automated tasks approaches 1 (full automation), the derived optimal recombination distance converges to zero.