Evidence (7953 claims)

Claims by topic (topics overlap, so counts sum to more than the total):

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding. Row totals can exceed the sum of the four listed directions, as some claims carry directions outside these categories.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Treated firms complete 12% more tasks.
RCT with 515 firms; weekly progress reports used to measure tasks completed; comparison of completed tasks between treatment (255) and control (260) groups.
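To make the comparison concrete, here is a minimal sketch of the arm-level mean comparison such an RCT implies. The Poisson task counts are invented (the paper's data and estimator are not reproduced here); the `lam` values are simply chosen so the treated mean sits roughly 12% above control, matching the reported lift.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Invented task counts sized to the paper's arms (255 treated, 260 control);
# lam values chosen so treated mean is ~12% above control, per the claim.
control = rng.poisson(lam=25, size=260)
treated = rng.poisson(lam=28, size=255)

lift = treated.mean() / control.mean() - 1
t, p = stats.ttest_ind(treated, control, equal_var=False)  # Welch's t-test
print(f"relative lift = {lift:.1%}, Welch t = {t:.2f}, p = {p:.3f}")
```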
The additional AI use cases discovered by treated firms are concentrated in product development and strategy-related domains.
Analysis of categorized AI use cases reported in weekly progress reports from the randomized accelerator sample (515 firms); comparison of functional distribution of use cases between treated and control firms.
Treated firms discover 2.7 additional AI use cases (a 44% increase).
Randomized field experiment in a 3-month accelerator; sample of 515 high-growth startups, 255 treatment and 260 control; weekly progress reports capturing AI use cases; treatment delivered case-study workshops prompting broader search for AI use cases.
Under an extreme calibration in which AI makes the entire economy grow like the computer industry, growth 'explodes', with incomes becoming infinite in finite time; even in this extreme calibration, infinite income does not arrive until around 2060.
Simulation of the endogenous-automation endogenous-growth model calibrated to the fast-automation (computer industry) scenario.
Simulating the calibrated endogenous-automation model under an 'AI as a continuation of historical patterns' calibration yields growth rates reaching only 2.5% by 2075.
Forward simulations of an endogenous-growth model calibrated to historical private business sector patterns (model + calibration + simulation).
The main benefit of automation is that it allows production of a task to shift from slowly-improving human labor to rapidly-improving machines.
Theoretical argument within the task-based model and supporting historical accounting showing faster capital-augmenting productivity growth relative to labor.
At the task level, capital productivity has grown at least 3 percentage points per year faster than labor productivity.
Historical task-level growth accounting across sectors using BEA/BLS data and the paper's task-based decomposition; statement appears in abstract and introduction summarizing empirical findings across sectors.
Historically, TFP growth is driven primarily by improvements in capital productivity.
Growth accounting using a task-based model applied to aggregate U.S. data (BEA and BLS) and industry-level data; theoretical decomposition separating capital-augmenting, labor-augmenting, and "other" productivity components.
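A stylized version of the accounting logic, offered for orientation (this is a generic task-based growth decomposition; the paper's exact notation is not given in the excerpt):

$$
g_{\mathrm{TFP}} \;\approx\; \theta_K\, g_{A_K} \;+\; \theta_L\, g_{A_L}
$$

where $\theta_K$ and $\theta_L$ are the shares of tasks performed by capital and labor, and $g_{A_K}$ and $g_{A_L}$ are the capital- and labor-augmenting productivity growth rates. If $g_{A_K} - g_{A_L} \ge 3$ percentage points per year, as the task-level accounting above finds, the capital term dominates TFP growth even at moderate capital task shares.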
Economists strongly favor targeted policy interventions such as AI-focused worker retraining (71.8% support) over broad structural interventions like job guarantees (13.7% support) or universal basic income (37.4% support).
Survey items asking respondents to indicate normative support for six policy proposals; reported support percentages for the economist group for specific policies (retraining, job guarantee, UBI).
Economists (as a group) forecast GDP growth of 3.5% under the rapid AI scenario.
Conditional forecasts reported in Key Findings (economist subgroup forecasts under the rapid progress scenario).
The median respondent in each group expects annual U.S. GDP growth of about 2.5% (unconditional forecast).
Unconditional (all-things-considered) survey forecasts of annual GDP growth elicited from respondents across five groups; compared in text to government and private-sector baseline forecasts (typical medium-run 2.0% and long-run 1.7%).
The average economist assigns a 61.4% probability to moderate or rapid AI progress by 2030.
Survey responses from the economist respondent group reporting the mean/average subjective probability for the combined 'moderate' and 'rapid' scenario categories.
The median respondent in each group expects substantial advances in AI capabilities by 2030.
Survey of five respondent groups (academic economists, AI-company employees, AI policy researchers, highly accurate forecasters, and the general public) eliciting unconditional and conditional forecasts about AI capabilities and economic outcomes (details and sample sizes referenced in Section 2.1, not provided in excerpt).
Organizations and policymakers that treat work-time policy as a foundational element of economic planning will better position their economies to harness AI's benefits while mitigating systemic instability.
Policy-prescriptive conclusion based on cross-disciplinary analysis; no empirical trial or quantification offered in the summary.
Work-time reduction can distribute productivity gains more equitably.
Argument supported by examination of historical work-time transitions and pilot programs referenced in the article; no empirical effect sizes or sample details in the summary.
Coordinated reduction in working hours helps maintain aggregate demand.
The paper's synthesis of historical transitions and pilot programs and argument about distribution of productivity gains; no quantitative evidence or sample sizes provided in the summary.
Gradual, policy-led reduction in standard working hours can preserve employment.
Claim based on examination of historical work-time transitions, contemporary pilot programs, and cross-sector implementation strategies referenced in the paper; no specific studies or sample sizes cited in the summary.
Platforms should implement AIGC-sensitive distribution algorithms and precise governance frameworks to ensure the long-term health of online content platforms.
Policy/recommendation derived from the paper's empirical findings on consumption preferences, producer behaviors, and the moderating role of distribution algorithms.
AIGC creators achieve aggregate engagement comparable to that of HGC creators by producing content at high volume (a 'scale-over-preference' dynamic).
Analysis of creation and engagement patterns in the dataset showing that AIGC creators compensate for lower per-item engagement by higher production volume, yielding comparable aggregate engagement levels to HGC creators.
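The 'scale-over-preference' arithmetic in miniature, with invented numbers (the paper's engagement metrics are not reproduced here):

```python
# Invented numbers: AIGC's lower per-item engagement is offset by volume.
hgc_items, hgc_per_item = 40, 100      # fewer items, higher engagement each
aigc_items, aigc_per_item = 100, 40    # more items, lower engagement each

print("HGC aggregate: ", hgc_items * hgc_per_item)    # 4000
print("AIGC aggregate:", aigc_items * aigc_per_item)  # 4000
```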
Consumers show a marked preference for Human-Generated Content (HGC) over Artificial Intelligence-Generated Content (AIGC).
Comparative analysis of consumption behavior in the longitudinal dataset; the paper reports consumption metrics that indicate higher consumer preference for HGC versus AIGC (e.g., relative engagement per item).
AI facilitates access to distant knowledge domains.
Theoretical model (Schumpeterian quality-ladder recombinant-innovation framework). The paper models R&D as recombining ideas across a knowledge space and shows analytically that AI increases firms' ability to combine ideas across longer distances.
Systematic quality auditing should be standard practice for complex agentic tasks.
Normative recommendation based on the authors' methodological and empirical findings that auditing revealed substantial benchmark issues affecting evaluation of agent capabilities.
Re-evaluating on ELT-Bench-Verified yields significant improvement attributable entirely to benchmark correction.
Re-evaluation of agent performance on the revised benchmark which the authors claim shows significant improvement and that this improvement is due to the benchmark corrections; no quantitative effect sizes or sample sizes provided in the excerpt.
Based on these findings, we construct ELT-Bench-Verified, a revised benchmark with refined evaluation logic and corrected ground truth.
Development and release of a revised benchmark (ELT-Bench-Verified) incorporating refined evaluation logic and corrected ground truth as described in the paper.
We develop an Auditor-Corrector methodology that combines scalable LLM-driven root-cause analysis with rigorous human validation (inter-annotator agreement Fleiss' kappa = 0.85) to audit benchmark quality.
Description of a methodology combining LLM root-cause analysis and human validation; human validation reported with inter-annotator agreement Fleiss' kappa = 0.85.
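For concreteness, this is how such an agreement statistic can be computed with statsmodels; the items and annotator labels below are invented, and only the statistic (Fleiss' kappa) matches the paper's methodology.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = audited benchmark items, columns = annotators; each cell is the
# categorical root-cause label an annotator assigned (labels invented).
ratings = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 2],
    [0, 0, 1],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)   # items x categories count matrix
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")
```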
Re-evaluating ELT-Bench with upgraded large language models reveals that the extraction and loading stage is largely solved, while transformation performance improves significantly.
Re-evaluation performed using upgraded LLMs comparing performance across ELT pipeline stages; specific performance metrics or sample sizes not reported in the excerpt.
Constructing Extract-Load-Transform (ELT) pipelines is a labor-intensive data engineering task and a high-impact target for AI automation.
Statement in the paper framing ELT pipeline construction as labor-intensive and high-impact; no empirical data or sample size reported in the provided excerpt.
Competition law assessments of a dominant undertaking’s conduct must consider not only the product market but also the labor market, particularly in cases of significant market structure changes.
Conclusion stated in abstract summarizing the paper’s findings; supported by the paper's legal analysis and referenced case law (no empirical sample provided in abstract).
Poaching employees is an inherent aspect of competition for highly qualified talent and is particularly pronounced among tech giants.
Statement in abstract; general observation supported by literature/case-law references implied in paper (no specific empirical sample or quantitative method reported in abstract).
A statistical recalibration technique called conformal prediction can correct the overconfidence of LLM-generated intervals, expanding them to achieve the intended coverage.
Application of conformal prediction to the LLM interval outputs in the experiment, resulting in expanded intervals that attain the target coverage.
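A minimal split-conformal sketch of the recalibration step, assuming a held-out calibration set with known true values; this is the generic procedure, not necessarily the paper's exact implementation, and the data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.10                                   # target ~90% coverage

# Hypothetical calibration set: noisy point guesses wrapped in
# overconfident +/-0.1 intervals (true error spread is much larger).
y_cal = rng.normal(size=500)
guess = y_cal + rng.normal(scale=1.0, size=500)
lo_cal, hi_cal = guess - 0.1, guess + 0.1

# Nonconformity score: how far the truth lands outside the raw interval.
scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Widen every interval by q on each side to reach the target coverage.
lo_new, hi_new = lo_cal - q, hi_cal + q
cover = np.mean((y_cal >= lo_new) & (y_cal <= hi_new))
print(f"margin q = {q:.2f}, coverage after widening = {cover:.2f}")
# (for brevity, coverage is checked on the calibration set itself;
#  in practice you would evaluate on fresh data)
```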
Larger, more capable models produce more accurate estimates.
Empirical experiment asking eleven LLMs to estimate population statistics (health prevalence rates, personality trait distributions, labor market figures) and comparing accuracy across models of different capability.
Applying the Method of Moments Quantile Regression (MMQR) allows the study to capture heterogeneous impacts of robotics across performance levels.
Authors describe use of MMQR in methodology and justify it as appropriate for detecting heterogeneity across quantiles of the dependent variable (value added).
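As an illustration of quantile-varying coefficients, here is a sketch using statsmodels' classical quantile regression as a stand-in; MMQR itself (Machado and Santos Silva's estimator) is not implemented in statsmodels, and the panel-like data below are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"robot_density": rng.gamma(2.0, 1.0, size=300)})
# Built-in heterogeneity: the marginal effect is larger in the upper
# tail of value added, mimicking the paper's qualitative finding.
df["value_added"] = (1.0
                     + 0.3 * df["robot_density"] * (1 + rng.uniform(size=300))
                     + rng.normal(size=300))

for q in (0.25, 0.50, 0.75, 0.90):
    fit = smf.quantreg("value_added ~ robot_density", df).fit(q=q)
    print(f"q={q:.2f}: beta(robot_density) = {fit.params['robot_density']:.3f}")
```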
The study uses panel data from Eurostat, the International Federation of Robotics (2024), and World Robotics covering three key sectors in selected EU countries.
Data sources explicitly listed in the paper (Eurostat, IFR 2024, World Robotics); the scope is described as three key sectors in selected EU countries.
Policymakers should support automation through fiscal incentives, invest in reskilling programs, and develop innovation strategies tailored to specific sectors to foster inclusive and sustainable growth.
Policy recommendations derived from empirical findings showing heterogeneous effects of robot density, R&D and human capital across sectors; authors explicitly recommend fiscal incentives, reskilling, and sector-targeted innovation strategies.
The paper’s novelty lies in its differentiated, cross-sectoral approach integrating technological adoption (robotics) with sectoral gross value added using advanced econometric techniques (MMQR).
Authors state the study's contribution is differentiated cross-sectoral analysis and use of MMQR to capture heterogeneous impacts; methodological description provided in paper.
The positive effect of robot density on value added is particularly strong in higher-performing sectors (i.e., at higher quantiles of the value-added distribution).
Results from MMQR showing heterogeneous impacts across performance levels/quantiles; authors state larger positive coefficients of robot density at upper quantiles.
Increased robot density significantly enhances value added.
Empirical analysis using panel data (Eurostat, International Federation of Robotics 2024, World Robotics) estimated with Method of Moments Quantile Regression (MMQR); gross value added used as dependent variable and robot density as a core explanatory variable; authors report statistically significant positive coefficients.
The paper proposes five architectural requirements for genuine human oversight systems.
Stated methodological/prescriptive contribution of the paper (a proposal rather than an empirical finding); no sample size or empirical validation reported in the provided excerpt.
The proposed framework outlines a pathway toward large-scale cooperative intelligence and offers a constructive perspective on the coevolution of human and artificial agents in the informational ecosystems of the future.
Claim about the paper's contribution; based on conceptual synthesis and theoretical framing rather than empirical validation.
A voluntary ecosystem of free rational agents, human and artificial, who cooperate through transparent and fair exchange of information maximizes their adaptive capacity and long-term well-being.
Normative proposition in the paper derived from theoretical principles (information theory, collective intelligence); presented as a proposed ideal rather than an empirically tested policy.
Emerging opportunities exist for stabilizing these ecosystems through new forms of informational verification and monitoring made possible by advanced artificial agents.
Forward-looking claim grounded in conceptual analysis of capabilities of advanced agents; proposed as an opportunity in the paper rather than demonstrated empirically.
Systems that preserve diversity of exploration while minimizing barriers to information exchange exhibit superior capacity for discovery and adaptation in complex environments.
Theoretical claim supported by the paper's appeal to principles from information theory, adaptive systems, and collective intelligence; presented as an argument rather than as empirically validated result.
Increasing the strictness of algorithmic control paradoxically increases the evolutionary fitness of coordinated resistance (e.g., coordinated log-offs).
Results from the EGT model and simulations showing fitness/payoff changes for coordinated resistance strategies as platform surveillance strictness parameter increases; model-only (no empirical N reported).
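A toy replicator-dynamics sketch of the mechanism, with invented payoffs rather than the paper's calibration: the coordination benefit of resisting scales with strictness `s`, so stricter control flips the long-run equilibrium toward resistance.

```python
def resist_share(s, steps=20000, dt=0.01, x0=0.05):
    """Long-run share playing 'coordinated resistance' at strictness s."""
    x = x0
    for _ in range(steps):
        # Invented payoffs: resisting pays more under strict control and
        # when enough others resist (the coordination term, 2*s*x).
        f_resist = 1.0 + 2.0 * s * x
        f_comply = 1.5 - 0.8 * s
        # Replicator dynamics: strategies with above-average payoff grow.
        x += dt * x * (1 - x) * (f_resist - f_comply)
        x = min(max(x, 0.0), 1.0)
    return x

for s in (0.2, 0.5, 0.8):
    print(f"strictness {s}: long-run resist share ~ {resist_share(s):.2f}")
```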
The primary contribution is a controlled agent-payment infrastructure and reference architecture that demonstrates how agentic access monetization can be adapted to fiat systems without discarding security and policy guarantees.
Summary of the paper's claimed contribution (architectural demonstration and reference implementation).
Multiple trial runs show low variance across scenarios, indicating high reproducibility; results are reported with 95% confidence intervals.
Reported statistical characterization from repeated trials in the paper (statement of low variance and 95% confidence intervals across scenarios).
Security mechanisms impose low latency overhead (19.6ms average).
Performance measurement reported in the paper's experiments (average latency overhead reported as 19.6ms).
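A sketch of how a mean-latency figure with a 95% confidence interval might be computed from repeated trials (the measurements below are hypothetical, not the paper's data):

```python
import numpy as np
from scipy import stats

# Hypothetical per-trial mean latency overheads (ms), one per trial run.
latency_ms = np.array([19.1, 20.3, 19.8, 18.9, 19.6, 20.1, 19.4, 19.9])
mean = latency_ms.mean()
lo, hi = stats.t.interval(0.95, df=len(latency_ms) - 1,
                          loc=mean, scale=stats.sem(latency_ms))
print(f"mean overhead = {mean:.1f} ms, 95% CI = ({lo:.1f}, {hi:.1f}) ms")
```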
Security mechanisms achieve 100% block rate for both replay attacks and invalid tokens.
Experimental security evaluation reported in the paper (block rate reported at 100% for replay attacks and invalid tokens).
The system uses FastAPI, SQLite, and Python standard libraries, making it transparent, inspectable, and reproducible.
Implementation stack specified in the paper and availability of reference implementation; asserted reproducibility.
APEX implements a challenge–settle–consume lifecycle with HMAC-signed short-lived tokens, idempotent settlement handling, and policy-aware payment approval.
Implementation details described in the methods/architecture section and supported by the provided reference implementation.
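A standard-library sketch in the spirit of that lifecycle: HMAC-signed short-lived tokens with nonce tracking for replay resistance. Function names and token format are illustrative assumptions, not APEX's actual implementation.

```python
import hashlib, hmac, secrets, time

SECRET = secrets.token_bytes(32)   # server-side signing key
TTL = 30                           # token lifetime, seconds
_consumed = set()                  # nonces already spent (replay guard)

def mint(resource: str) -> str:
    """Issue a short-lived, HMAC-signed token scoped to one resource."""
    nonce = secrets.token_hex(8)
    exp = int(time.time()) + TTL
    payload = f"{resource}|{nonce}|{exp}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def consume(token: str, resource: str) -> bool:
    """Verify signature, scope, expiry, and single use."""
    payload, _, sig = token.rpartition("|")
    want = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, want):
        return False                        # invalid token: block
    res, nonce, exp = payload.split("|")
    if res != resource or int(exp) < time.time():
        return False                        # wrong scope or expired
    if nonce in _consumed:
        return False                        # replay: block
    _consumed.add(nonce)
    return True

token = mint("/report/42")
assert consume(token, "/report/42")         # first use succeeds
assert not consume(token, "/report/42")     # second use rejected (replay)
```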
We present APEX, an implementation-complete research system that adapts HTTP 402-style payment gating to UPI-like fiat workflows while preserving policy-governed spend control, tokenized access verification, and replay resistance.
System design and implementation presented in the paper (codebase built using FastAPI, SQLite, Python; demonstration/implementation claimed).
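And a minimal sketch of HTTP 402-style gating with FastAPI, the framework the paper names; the endpoint path, header name, and token check are illustrative assumptions, not APEX's actual API surface.

```python
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

def token_is_valid(token: str | None) -> bool:
    # Stand-in for real verification (e.g., the HMAC check sketched above).
    return token == "paid-demo-token"

@app.get("/premium/data")
def premium_data(x_access_token: str | None = Header(default=None)):
    if not token_is_valid(x_access_token):
        # HTTP 402 Payment Required: the client must settle payment and
        # present a valid access token before the resource is served.
        raise HTTPException(status_code=402, detail="Payment required")
    return {"data": "gated content"}
```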