Evidence (14156 claims)
| Category | Claims |
|---|---|
| Adoption | 8625 |
| Productivity | 7686 |
| Governance | 6917 |
| Human-AI Collaboration | 6574 |
| Org Design | 4189 |
| Innovation | 4131 |
| Labor Markets | 3588 |
| Skills & Training | 2985 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
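To read a row of the matrix as proportions, divide each direction count by the row's Total. The four direction columns do not always add up to Total. For Other, 761 + 200 + 101 + 904 = 1966, against a listed Total of 2020. The sketch below therefore reports that residual rather than guessing what it contains. It copies only a few rows, and treating "—" as zero is an assumption.

```python
# A few rows copied from the Evidence Matrix above: (positive, negative, mixed, null, total).
# Treating "—" as 0 is an assumption about how the page encodes empty cells.
ROWS = {
    "Other": (761, 200, 101, 904, 2020),
    "Governance & Regulation": (829, 400, 191, 122, 1566),
    "Job Displacement": (12, 81, 21, 1, 115),
    "Labor Share of Income": (17, 19, 17, 0, 53),
}

def direction_shares(positive, negative, mixed, null, total):
    """Share of each finding direction relative to the listed Total, plus the
    number of claims not accounted for by the four direction columns."""
    residual = total - (positive + negative + mixed + null)
    shares = {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }
    return shares, residual

for outcome, row in ROWS.items():
    shares, residual = direction_shares(*row)
    print(f"{outcome}: positive {shares['positive']:.1%}, "
          f"negative {shares['negative']:.1%}, unaccounted {residual}")
```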
Excessive reliance on AI may reduce the originality of research and lead to duplication of research efforts.
Model implication: as the share of tasks automated by AI increases, the paper shows analytically that originality can decline and firms may duplicate research efforts (due to homogenization of methods or search), reducing novel knowledge creation.
AI increases the aggregate rate of creative destruction, shortening the monopoly duration that rewards radical innovations.
Analytical result from the model: introducing AI raises the aggregate creative-destruction rate in the Schumpeterian framework, which reduces the expected monopoly duration and thus the rents that sustain radical innovation.
Applying the Auditor-Corrector methodology to ELT-Bench reveals that most failed transformation tasks contain benchmark-attributable errors (rigid evaluation scripts, ambiguous specifications, and incorrect ground truth) that penalize correct agent outputs.
Audit results on ELT-Bench identifying categories of benchmark errors (rigid scripts, ambiguous specs, incorrect ground truth) and attributing many failed transformation tasks to these errors; no numeric breakdown or sample count given in the excerpt.
On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility.
Reference to initial evaluation results on ELT-Bench showing low success rates for AI agents; the provided excerpt does not give numerical success rates or sample size.
Such predatory-hiring cases often fall outside the scope of merger control because they fail to meet the applicable thresholds, warranting consideration under the abuse of dominance prohibition in Article 102 TFEU.
Legal analysis stated in abstract referencing merger control thresholds and Article 102 TFEU (no quantitative sample provided in abstract).
When a dominant undertaking in a concentrated market strategically targets and hires a large portion—or the entirety—of a smaller competitor’s key personnel, this behavior can raise significant competition concerns.
Legal argument presented in abstract; draws on relevant case law and scholarship (no empirical sample or experimental method reported in abstract).
LLM uncertainty estimates require statistical correction before they can be used in decision-making.
Empirical finding of severe undercoverage of nominal 95% intervals and demonstration that conformal recalibration is needed to achieve intended coverage.
All models are severely overconfident: their 95% intervals contain the true value only 9–44% of the time, far below the nominal 95%.
Analysis of model-produced 95% credible intervals for elicited population statistics, with empirical coverage rates reported between 9% and 44%.
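The two lines above rely on two computations. The first is empirical coverage, meaning the fraction of intervals that contain the truth. The second is a recalibration step that widens the intervals until coverage reaches the nominal level. The excerpt does not say which conformal procedure the paper uses. The sketch below assumes one common choice, a split-conformal margin in the style of conformalized quantile regression (CQR), and all function names are hypothetical.

```python
import math

def empirical_coverage(intervals, truths):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, truths))
    return hits / len(truths)

def conformal_margin(intervals, truths, alpha=0.05):
    """CQR-style split-conformal margin (assumed method, not necessarily the paper's).

    Nonconformity score: how far the truth lies outside its interval
    (negative when it lies inside). The margin is the ceil((n + 1)(1 - alpha))-th
    smallest score on the calibration set.
    """
    scores = sorted(max(lo - y, y - hi) for (lo, hi), y in zip(intervals, truths))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

def recalibrate(intervals, margin):
    """Widen (or, for a negative margin, shrink) every interval symmetrically."""
    return [(lo - margin, hi + margin) for lo, hi in intervals]

# Hypothetical calibration set: truths 0..99, intervals of half-width 1
# whose centres are shifted away from the truth by 0..9 units.
truths = [float(i) for i in range(100)]
intervals = [(y + i % 10 - 1, y + i % 10 + 1) for i, y in enumerate(truths)]
before = empirical_coverage(intervals, truths)
q = conformal_margin(intervals, truths, alpha=0.05)
after = empirical_coverage(recalibrate(intervals, q), truths)
print(before, q, after)
```

In practice the margin is fitted on a calibration split and then applied to new intervals. The marginal coverage guarantee of at least 1 − α holds for new exchangeable data, not just for the calibration set used here.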
There is a governance window—estimated at 10–15 years—before current deployment trajectories risk path-dependent social, economic, and institutional lock-in.
Forward-looking estimate/projection provided in the paper based on the authors' characterization of deployment trajectories and governance dynamics (no empirical sample size provided in the excerpt).
Societal consequences of labor displacement intensify the governance gap by concentrating consequential AI decision-making among an increasingly narrow class of technical and capital actors.
Analytic/theoretical claim in the paper drawing on the paper's multi-domain argument (no empirical sample size or quantified concentration metrics provided in the excerpt).
This nominal-vs-genuine oversight distinction represents the primary architectural failure mode in deployed AI governance.
Argumentative claim based on the paper's multi-domain synthesis and theoretical analysis; no empirical sample size or quantified causal inference provided in the excerpt.
The distinction between nominal and genuine human oversight is largely absent from current governance frameworks, including the EU AI Act and NIST AI Risk Management Framework 1.0.
Comparative policy/regulatory review claimed in the paper (explicit reference to the EU AI Act and NIST AI RMF 1.0); no sample size—based on textual/regulatory analysis rather than statistical data in the provided excerpt.
There exists a critical and underexamined governance gap between nominal human oversight of AI systems (humans in formal authority positions) and genuine human oversight (humans with cognitive access, technical capability, and institutional authority to understand, evaluate, and override AI outputs).
Conceptual/qualitative analysis and argumentation presented in the paper; implied synthesis of case examples and theoretical considerations rather than a quantified empirical study in the provided excerpt.
The accelerating displacement of human labor by artificial intelligence (AI) and robotic systems represents a structural transformation whose societal consequences extend far beyond conventional labor market analysis.
Stated as a framing claim in the paper; supported by the paper's literature review and multi-domain conceptual argument (no empirical sample size or quantitative data reported in the provided excerpt).
Sustaining such cooperative informational systems has historically proven difficult due to structural incentives that gradually erode transparency and trust.
Historical/analytical assertion in the paper; presented as a high-level observation (no dataset or empirical historical analysis provided in the excerpt).
The interaction between strict algorithmic control and worker counter-strategies leads to persistent limit cycles in strategy frequencies rather than convergence to a stable compliant workforce.
Dynamical systems analysis and simulation trajectories from the EGT model showing limit cycles / oscillatory equilibria in strategy proportions; model-based (no empirical sample).
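The claim concerns replicator dynamics between platforms and workers. The sketch below assumes the simplest two-population, inspection-style version with made-up payoffs (A, B, C, D), not the paper's model. Strict control pays off when many workers resist, and complying pays off when control is strict. Under these assumptions the strategy shares orbit the interior equilibrium (0.5, 0.5) indefinitely instead of converging to a compliant workforce. In this bare specification the orbits are neutrally stable closed cycles. Genuine limit cycles, meaning isolated and attracting orbits, would need extra model structure that is not sketched here.

```python
import math

# Hypothetical payoffs, not taken from the paper.
# x = share of platforms using strict algorithmic control
# y = share of workers who comply rather than use counter-strategies
A, C = 1.0, 0.5   # platform gain from strict control when workers resist; cost of control
B, D = 1.0, 0.5   # worker gain from complying under strict control; cost of complying

def replicator_rhs(x, y):
    dx = x * (1 - x) * (A * (1 - y) - C)   # strict control pays when many resist
    dy = y * (1 - y) * (B * x - D)         # complying pays when control is strict
    return dx, dy

def simulate(x0, y0, dt=0.01, steps=20_000):
    """Integrate the replicator equations with classical fourth-order Runge-Kutta."""
    x, y = x0, y0
    xs, ys = [x], [y]
    for _ in range(steps):
        k1 = replicator_rhs(x, y)
        k2 = replicator_rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = replicator_rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = replicator_rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs.append(x)
        ys.append(y)
    return xs, ys

def invariant(x, y):
    """Quantity conserved along orbits for these parameter values (A = B = 1, C = D = 0.5)."""
    return -0.5 * math.log(x * (1 - x)) - 0.5 * math.log(y * (1 - y))

xs, ys = simulate(0.8, 0.5)
late = xs[len(xs) // 2:]
print(f"strict-control share in second half: {min(late):.2f} to {max(late):.2f}")
```

The conserved `invariant` shows that the oscillation neither grows nor decays in this toy version. Any damping or amplification would have to come from the paper's richer dynamics.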
Policy enforcement reduces total spending by 27.3%.
Quantitative result reported from the paper's experiments across baselines and scenarios (paper reports a 27.3% reduction attributed to policy enforcement).
In many deployment contexts, especially countries with strong real-time fiat systems like UPI, relying on crypto rails is misaligned with regulatory and infrastructure realities.
Contextual/argumentative claim in the paper contrasting crypto reliance with fiat systems such as UPI (no empirical country-level sample reported).
The way we're thinking about generative AI right now is fundamentally individual (this appears in how users interact with models, how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined).
Author's observational/descriptive claim supported by argumentative examples (mentions user interaction patterns, model design and benchmarking practices, and commercial/research strategies); no empirical sample or quantitative analysis reported in the excerpt.
The emission-reduction effect of AI innovation is more significant for firms located in regions with underdeveloped factor markets.
Heterogeneity (regional subsample/interaction) analysis reported in the paper on the 21,428 firm-year sample, indicating larger AI-related emission reductions in regions with less developed factor markets.
The emission-reduction effect of AI innovation is more significant for firms in high-environmental-sensitivity industries.
Heterogeneity (subsample/interaction) analysis in the paper using the 21,428 firm-year observations, showing stronger AI-related emission reductions in industries characterized as high environmental sensitivity.
The emission-reduction effect of AI innovation is more significant for enterprises with a low supply chain concentration.
Heterogeneity (subsample) analysis reported in the paper using the 21,428 firm-year dataset, comparing effects across firms with different supply chain concentration levels.
Executives’ green cognition and government environmental attention together constitute dual internal and external driving forces for corporate carbon emission reduction.
Further analysis reported in the paper (moderation/interaction analysis or additional regressions) on the same 21,428 firm-year sample showing these factors strengthen carbon reduction associated with AI innovation.
AI innovation can significantly reduce corporate carbon emission intensity.
Empirical analysis using panel data of 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022; result reported in the paper's main regressions (method described as micro-level empirical analysis).
Traditional questionnaires yielded slightly higher accuracy in risk assessment.
Result reported from the two experiments comparing traditional questionnaires to adaptive ARQuest versions; no numeric accuracy or sample size provided in the excerpt.
Insurers must blindly trust users' responses, increasing the chances of fraud.
Stated as a motivating problem in the paper; presented as a logical/empirical concern rather than one supported by a study reported within the paper.
Insurance application processes often rely on lengthy and standardized questionnaires that struggle to capture individual differences.
Descriptive claim in paper introduction arguing limitations of standard questionnaires; no experiment or sample size reported for this assertion.
AI's disproportionate benefits for lagging regions help narrow interprovincial emission gaps.
Heterogeneity analysis reported in the provincial panel (2003–2021) showing stronger AI-related reductions in emissions inequality for lagging regions compared to advanced regions.
Green innovation is concentrated in coastal provinces and has not effectively diffused to inland areas, limiting its ability to reduce regional carbon inequality.
Spatial distribution analysis within the provincial panel showing geographic concentration of green innovation activity in coastal provinces and limited diffusion inland.
AI reduces carbon inequality primarily through improved energy efficiency, enhanced environmental monitoring, and more efficient resource allocation, disproportionately benefiting lagging regions and narrowing interprovincial emission gaps.
Mechanism analysis reported in the paper based on the provincial panel (2003–2021) linking AI development to proximate channels (energy efficiency, monitoring, resource allocation) and heterogeneous impacts across regions.
AI development significantly reduces carbon inequality, particularly when measured by the Gini index.
Empirical analysis using a provincial panel dataset covering 2003–2021; carbon inequality measured with the Gini index; reported statistically significant negative association between AI development and Gini-measured carbon inequality.
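Carbon inequality here is measured with a Gini index across provinces. The excerpt does not specify the construction, for example whether emissions are per capita or population-weighted. The sketch below assumes the plain unweighted Gini, computed as the mean absolute difference over all ordered pairs divided by twice the mean, and the provincial values are hypothetical.

```python
def gini(values):
    """Unweighted Gini coefficient: sum of |a - b| over all ordered pairs,
    divided by 2 * n^2 * mean. 0 means perfect equality."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)

# Hypothetical per-capita emissions for four provinces (arbitrary units).
print(gini([1.0, 2.0, 3.0, 4.0]))
```

The double loop is O(n²), which is negligible for roughly thirty provinces. A population-weighted variant would weight each pair by the product of the two provinces' population shares.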
Using a stylised inpatient capacity signalling example and minimal game-theoretic reasoning, the paper argues that task optimisation alone is unlikely to change system outcomes when incentives are unchanged.
Theoretical analysis using a stylised inpatient capacity signalling example and game-theoretic reasoning presented in the paper (no empirical data/sample reported in the abstract).
Deployment of AI systems carries significant costs, including the ongoing cost of monitoring, and it is unclear whether optimism about a deus ex machina solution is well placed.
Conceptual/argumentative claim made by the authors in the paper (no empirical study or sample size reported in the abstract).
Cross-equipment generalization is poor, with 42.7% performance on held-out datasets.
Paper reports held-out dataset evaluation showing 42.7% (presumably accuracy or task completion) for cross-equipment generalization.
Multi-asset reasoning causes a 14.9 percentage point degradation in performance.
Paper reports a 14.9 percentage point performance degradation attributed to multi-asset reasoning in comparative analyses.
There are systematic failures in tool orchestration, with 23% incorrect sequencing.
Paper reports a measured incorrect sequencing rate of 23% during evaluation of agent tool orchestration across scenarios.
Even top-performing configurations achieve only 68% task completion.
Reported aggregated performance result from the benchmark evaluation across the tested frameworks and LLMs (paper statement). The benchmark contains 75 scenarios (used as evaluation instances).
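The benchmark results above mix three kinds of metric: a completion rate (68%), an error rate (23% incorrect sequencing), and an absolute gap in percentage points (14.9 pp for multi-asset reasoning). An absolute percentage-point gap is not the same as a relative drop. The sketch below computes all three from per-run records. The `Run` layout and the data are hypothetical, and the excerpt does not say whether rates are counted per scenario, per run, or per framework–LLM pair.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One agent run on one benchmark scenario (hypothetical record layout)."""
    completed: bool
    multi_asset: bool
    bad_sequence: bool  # tools were called in an incorrect order

def rate(flags):
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")

def summarize(runs):
    single = rate(r.completed for r in runs if not r.multi_asset)
    multi = rate(r.completed for r in runs if r.multi_asset)
    return {
        "completion_rate": rate(r.completed for r in runs),
        "sequencing_error_rate": rate(r.bad_sequence for r in runs),
        "multi_asset_drop_pp": 100 * (single - multi),          # absolute gap
        "multi_asset_drop_relative": (single - multi) / single,  # relative gap
    }

# Hypothetical: 75 scenarios, 40 single-asset (30 completed) and
# 35 multi-asset (21 completed), with 10 multi-asset runs mis-sequenced.
runs = ([Run(i < 30, False, False) for i in range(40)]
        + [Run(i < 21, True, i >= 25) for i in range(35)])
print(summarize(runs))
```

In this made-up data, 51 of 75 runs complete (68%). The 15-point multi-asset gap corresponds to a 20% relative drop from the single-asset rate.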
Improvements in operational resilience (OR) effectively reduce corporate operational risk.
Further analysis reported in the paper linking higher OR to lower operational risk measures for firms in the sample.
AI promotes operational resilience by reducing management agency conflicts.
Mechanism (mediation) tests reported in the paper showing AI associated with reductions in measures of agency/management conflict, which in turn relate to OR improvements.
Mandatory release delays can paradoxically reduce deployed model quality by shifting preemption to the announcement stage, where quality locks in before the mandated waiting period.
Model extension analyzing mandatory waiting periods: equilibrium strategic behavior shifts to earlier announcements and quality commitment, yielding lower quality at deployment than without the delay.
Premature release imposes safety externalities on society that firms do not fully internalize.
Model assumption and subsequent analysis: the paper models a socially harmful safety externality from early deployment that firms ignore (or undervalue) in their private payoff calculations.
Equilibrium release occurs strictly before the social optimum.
Analytic characterization of the symmetric Nash equilibrium in a theoretical preemption game where firms trade off development time (quality) against first-mover advantages; comparative statics show equilibrium release time < socially optimal release time.
Over time the equalizing channel weakened because market valuation (wage exposure) became increasingly unfavorable to female-concentrated occupations, contributing to a renewed widening of the gender wage gap in 2015–2019.
Decomposition results showing a temporal decline in the wage-exposure contribution to equality and a negative wage-exposure trend for female-concentrated occupations, coinciding with gap widening in 2015–2019.
Women experienced greater exposure to displacement compared with men.
Gender-disaggregated results from stacked first-difference estimations and dynamic shift-share decomposition showing higher displacement exposure for female workers.
Routine displacement unfolds episodically rather than simultaneously, with relative contraction in routine cognitive jobs (2001–2005), routine manual jobs (2005–2010), and renewed routine cognitive pressures (2015–2019).
Empirical results from stacked first-difference estimations and a dynamic shift-share decomposition applied to Indonesian formal wage-worker data over 2001–2019.
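The excerpt names a dynamic shift-share decomposition but does not give its formula. One common building block, assumed here, is the between/within decomposition of a change in the aggregate routine-job share over a single sub-period such as 2001–2005. Repeating it for each sub-period is one way to trace episodic pressure. Whether the paper chains the decomposition this way is not stated. The function name and data below are hypothetical.

```python
def shift_share(emp0, emp1, routine0, routine1):
    """Split the change in the aggregate routine-job share into a between-industry
    part (employment moving across industries) and a within-industry part
    (routine share changing inside industries). Period-average weights make
    the two parts add up exactly to the total change.

    emp0 / emp1: industry employment shares (each summing to 1) at start / end.
    routine0 / routine1: routine-job share within each industry at start / end.
    """
    rows = list(zip(emp0, emp1, routine0, routine1))
    between = sum((e1 - e0) * (r0 + r1) / 2 for e0, e1, r0, r1 in rows)
    within = sum((e0 + e1) / 2 * (r1 - r0) for e0, e1, r0, r1 in rows)
    total = sum(e1 * r1 - e0 * r0 for e0, e1, r0, r1 in rows)
    return between, within, total

# Hypothetical two-industry economy over one sub-period.
between, within, total = shift_share([0.5, 0.5], [0.4, 0.6], [0.6, 0.2], [0.5, 0.2])
print(between, within, total)
```

Using the average of the start and end weights avoids the leftover interaction term that appears when start-of-period weights are used alone.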
Enterprise adoption of LLMs is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level.
Framed as the motivating problem in the paper's introduction/abstract (conceptual claim; no empirical test reported here).
No regulatory framework requires disclosure of machine/AI labor output.
Author's assertion in the paper (policy claim; no legislative survey or quantification reported).
No index tracks machine labor output over time.
Author's assertion in the paper (stated lack of existing indices; no systematic review/sample reported).
This labor force is entirely invisible to the economic infrastructure humanity has built to measure work: no standardized unit of measurement exists.
Author's assertion/diagnosis in the paper (argumentative/observational, no empirical survey or sample reported).
Specific occupations such as credit analysts, judges, and sustainability specialists reach ATE scores of 0.43–0.47 by 2030.
Reported model outputs / ATE score estimates for individual occupations within the paper's 2025–2030 regional application.