Evidence (7953 claims)

Claims by topic (topics overlap, so counts sum to more than the total):

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding. Row totals can exceed the sum of the four listed directions, as some claims carry directions outside these categories.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Treated firms complete 12% more tasks.
RCT with 515 firms; weekly progress reports used to measure tasks completed; comparison of completed tasks between treatment (255) and control (260) groups.
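To make the comparison concrete, here is a minimal sketch of the arm-level mean comparison such an RCT implies. The Poisson task counts are invented (the paper's data and estimator are not reproduced here); the `lam` values are simply chosen so the treated mean sits roughly 12% above control, matching the reported lift.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Invented task counts sized to the paper's arms (255 treated, 260 control);
# lam values chosen so treated mean is ~12% above control, per the claim.
control = rng.poisson(lam=25, size=260)
treated = rng.poisson(lam=28, size=255)

lift = treated.mean() / control.mean() - 1
t, p = stats.ttest_ind(treated, control, equal_var=False)  # Welch's t-test
print(f"relative lift = {lift:.1%}, Welch t = {t:.2f}, p = {p:.3f}")
```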
The additional AI use cases discovered by treated firms are concentrated in product development and strategy-related domains.
Analysis of categorized AI use cases reported in weekly progress reports from the randomized accelerator sample (515 firms); comparison of functional distribution of use cases between treated and control firms.
Treated firms discover 2.7 additional AI use cases (a 44% increase).
Randomized field experiment in a 3-month accelerator; sample of 515 high-growth startups, 255 treatment and 260 control; weekly progress reports capturing AI use cases; treatment delivered case-study workshops prompting broader search for AI use cases.
Under an extreme calibration in which AI makes the entire economy grow like the computer industry, growth 'explodes', with incomes becoming infinite in finite time; even in this extreme calibration, infinite income does not arrive until around 2060.
Simulation of the endogenous-automation endogenous-growth model calibrated to the fast-automation (computer industry) scenario.
Simulating the calibrated endogenous-automation model under an 'AI as a continuation of historical patterns' calibration yields growth rates reaching only 2.5% by 2075.
Forward simulations of an endogenous-growth model calibrated to historical private business sector patterns (model + calibration + simulation).
The main benefit of automation is that it allows production of a task to shift from slowly-improving human labor to rapidly-improving machines.
Theoretical argument within the task-based model and supporting historical accounting showing faster capital-augmenting productivity growth relative to labor.
At the task level, capital productivity has grown at least 3 percentage points per year faster than labor productivity.
Historical task-level growth accounting across sectors using BEA/BLS data and the paper's task-based decomposition; statement appears in abstract and introduction summarizing empirical findings across sectors.
Historically, TFP growth is driven primarily by improvements in capital productivity.
Growth accounting using a task-based model applied to aggregate U.S. data (BEA and BLS) and industry-level data; theoretical decomposition separating capital-augmenting, labor-augmenting, and "other" productivity components.
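A stylized version of the accounting logic, offered for orientation (this is a generic task-based growth decomposition; the paper's exact notation is not given in the excerpt):

$$
g_{\mathrm{TFP}} \;\approx\; \theta_K\, g_{A_K} \;+\; \theta_L\, g_{A_L}
$$

where $\theta_K$ and $\theta_L$ are the shares of tasks performed by capital and labor, and $g_{A_K}$ and $g_{A_L}$ are the capital- and labor-augmenting productivity growth rates. If $g_{A_K} - g_{A_L} \ge 3$ percentage points per year, as the task-level accounting above finds, the capital term dominates TFP growth even at moderate capital task shares.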
Economists strongly favor targeted policy interventions such as AI-focused worker retraining (71.8% support) over broad structural interventions like job guarantees (13.7% support) or universal basic income (37.4% support).
Survey items asking respondents to indicate normative support for six policy proposals; reported support percentages for the economist group for specific policies (retraining, job guarantee, UBI).
Economists (as a group) forecast GDP growth of 3.5% under the rapid AI scenario.
Conditional forecasts reported in Key Findings (economist subgroup forecasts under the rapid progress scenario).
The median respondent in each group expects annual U.S. GDP growth of about 2.5% (unconditional forecast).
Unconditional (all-things-considered) survey forecasts of annual GDP growth elicited from respondents across five groups; compared in text to government and private-sector baseline forecasts (typical medium-run 2.0% and long-run 1.7%).
The average economist assigns a 61.4% probability to moderate or rapid AI progress by 2030.
Survey responses from the economist respondent group reporting the mean/average subjective probability for the combined 'moderate' and 'rapid' scenario categories.
The median respondent in each group expects substantial advances in AI capabilities by 2030.
Survey of five respondent groups (academic economists, AI-company employees, AI policy researchers, highly accurate forecasters, and the general public) eliciting unconditional and conditional forecasts about AI capabilities and economic outcomes (details and sample sizes referenced in Section 2.1, not provided in excerpt).
Organizations and policymakers that treat work-time policy as a foundational element of economic planning will better position their economies to harness AI's benefits while mitigating systemic instability.
Policy-prescriptive conclusion based on cross-disciplinary analysis; no empirical trial or quantification offered in the summary.
Work-time reduction can distribute productivity gains more equitably.
Argument supported by examination of historical work-time transitions and pilot programs referenced in the article; no empirical effect sizes or sample details in the summary.
Coordinated reduction in working hours helps maintain aggregate demand.
The paper's synthesis of historical transitions and pilot programs and argument about distribution of productivity gains; no quantitative evidence or sample sizes provided in the summary.
Gradual, policy-led reduction in standard working hours can preserve employment.
Claim based on examination of historical work-time transitions, contemporary pilot programs, and cross-sector implementation strategies referenced in the paper; no specific studies or sample sizes cited in the summary.
Platforms should implement AIGC-sensitive distribution algorithms and precise governance frameworks to ensure the long-term health of online content platforms.
Policy/recommendation derived from the paper's empirical findings on consumption preferences, producer behaviors, and the moderating role of distribution algorithms.
AIGC creators achieve aggregate engagement comparable to that of HGC creators by producing content at high volume (a 'scale-over-preference' dynamic).
Analysis of creation and engagement patterns in the dataset showing that AIGC creators compensate for lower per-item engagement by higher production volume, yielding comparable aggregate engagement levels to HGC creators.
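The 'scale-over-preference' arithmetic in miniature, with invented numbers (the paper's engagement metrics are not reproduced here):

```python
# Invented numbers: AIGC's lower per-item engagement is offset by volume.
hgc_items, hgc_per_item = 40, 100      # fewer items, higher engagement each
aigc_items, aigc_per_item = 100, 40    # more items, lower engagement each

print("HGC aggregate: ", hgc_items * hgc_per_item)    # 4000
print("AIGC aggregate:", aigc_items * aigc_per_item)  # 4000
```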
Consumers show a marked preference for Human-Generated Content (HGC) over Artificial Intelligence-Generated Content (AIGC).
Comparative analysis of consumption behavior in the longitudinal dataset; the paper reports consumption metrics that indicate higher consumer preference for HGC versus AIGC (e.g., relative engagement per item).
AI facilitates access to distant knowledge domains.
Theoretical model (Schumpeterian quality-ladder recombinant-innovation framework). The paper models R&D as recombining ideas across a knowledge space and shows analytically that AI increases firms' ability to combine ideas across longer distances.
Systematic quality auditing should be standard practice for complex agentic tasks.
Normative recommendation based on the authors' methodological and empirical findings that auditing revealed substantial benchmark issues affecting evaluation of agent capabilities.
Re-evaluating on ELT-Bench-Verified yields significant improvement attributable entirely to benchmark correction.
Re-evaluation of agent performance on the revised benchmark which the authors claim shows significant improvement and that this improvement is due to the benchmark corrections; no quantitative effect sizes or sample sizes provided in the excerpt.
Based on these findings, we construct ELT-Bench-Verified, a revised benchmark with refined evaluation logic and corrected ground truth.
Development and release of a revised benchmark (ELT-Bench-Verified) incorporating refined evaluation logic and corrected ground truth as described in the paper.
We develop an Auditor-Corrector methodology that combines scalable LLM-driven root-cause analysis with rigorous human validation (inter-annotator agreement Fleiss' kappa = 0.85) to audit benchmark quality.
Description of a methodology combining LLM root-cause analysis and human validation; human validation reported with inter-annotator agreement Fleiss' kappa = 0.85.
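For concreteness, this is how such an agreement statistic can be computed with statsmodels; the items and annotator labels below are invented, and only the statistic (Fleiss' kappa) matches the paper's methodology.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = audited benchmark items, columns = annotators; each cell is the
# categorical root-cause label an annotator assigned (labels invented).
ratings = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 2],
    [0, 0, 1],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)   # items x categories count matrix
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")
```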
Re-evaluating ELT-Bench with upgraded large language models reveals that the extraction and loading stage is largely solved, while transformation performance improves significantly.
Re-evaluation performed using upgraded LLMs comparing performance across ELT pipeline stages; specific performance metrics or sample sizes not reported in the excerpt.
Constructing Extract-Load-Transform (ELT) pipelines is a labor-intensive data engineering task and a high-impact target for AI automation.
Statement in the paper framing ELT pipeline construction as labor-intensive and high-impact; no empirical data or sample size reported in the provided excerpt.
Competition law assessments of a dominant undertaking’s conduct must consider not only the product market but also the labor market, particularly in cases of significant market structure changes.
Conclusion stated in abstract summarizing the paper’s findings; supported by the paper's legal analysis and referenced case law (no empirical sample provided in abstract).
Poaching employees is an inherent aspect of competition for highly qualified talent and is particularly pronounced among tech giants.
Statement in abstract; general observation supported by literature/case-law references implied in paper (no specific empirical sample or quantitative method reported in abstract).
A statistical recalibration technique called conformal prediction can correct the overconfidence of LLM-generated intervals, expanding them to achieve the intended coverage.
Application of conformal prediction to the LLM interval outputs in the experiment, resulting in expanded intervals that attain the target coverage.
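A minimal split-conformal sketch of the recalibration step, assuming a held-out calibration set with known true values; this is the generic procedure, not necessarily the paper's exact implementation, and the data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.10                                   # target ~90% coverage

# Hypothetical calibration set: noisy point guesses wrapped in
# overconfident +/-0.1 intervals (true error spread is much larger).
y_cal = rng.normal(size=500)
guess = y_cal + rng.normal(scale=1.0, size=500)
lo_cal, hi_cal = guess - 0.1, guess + 0.1

# Nonconformity score: how far the truth lands outside the raw interval.
scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Widen every interval by q on each side to reach the target coverage.
lo_new, hi_new = lo_cal - q, hi_cal + q
cover = np.mean((y_cal >= lo_new) & (y_cal <= hi_new))
print(f"margin q = {q:.2f}, coverage after widening = {cover:.2f}")
# (for brevity, coverage is checked on the calibration set itself;
#  in practice you would evaluate on fresh data)
```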
Larger, more capable models produce more accurate estimates.
Empirical experiment asking eleven LLMs to estimate population statistics (health prevalence rates, personality trait distributions, labor market figures) and comparing accuracy across models of different capability.
Applying the Method of Moments Quantile Regression (MMQR) allows the study to capture heterogeneous impacts of robotics across performance levels.
Authors describe use of MMQR in methodology and justify it as appropriate for detecting heterogeneity across quantiles of the dependent variable (value added).
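As an illustration of quantile-varying coefficients, here is a sketch using statsmodels' classical quantile regression as a stand-in; MMQR itself (Machado and Santos Silva's estimator) is not implemented in statsmodels, and the panel-like data below are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"robot_density": rng.gamma(2.0, 1.0, size=300)})
# Built-in heterogeneity: the marginal effect is larger in the upper
# tail of value added, mimicking the paper's qualitative finding.
df["value_added"] = (1.0
                     + 0.3 * df["robot_density"] * (1 + rng.uniform(size=300))
                     + rng.normal(size=300))

for q in (0.25, 0.50, 0.75, 0.90):
    fit = smf.quantreg("value_added ~ robot_density", df).fit(q=q)
    print(f"q={q:.2f}: beta(robot_density) = {fit.params['robot_density']:.3f}")
```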
The study uses panel data from Eurostat, the International Federation of Robotics (2024), and World Robotics covering three key sectors in selected EU countries.
Data sources explicitly listed in the paper (Eurostat, IFR 2024, World Robotics); the scope is described as three key sectors in selected EU countries.
Policymakers should support automation through fiscal incentives, invest in reskilling programs, and develop innovation strategies tailored to specific sectors to foster inclusive and sustainable growth.
Policy recommendations derived from empirical findings showing heterogeneous effects of robot density, R&D and human capital across sectors; authors explicitly recommend fiscal incentives, reskilling, and sector-targeted innovation strategies.
The paper’s novelty lies in its differentiated, cross-sectoral approach integrating technological adoption (robotics) with sectoral gross value added using advanced econometric techniques (MMQR).
Authors state the study's contribution is differentiated cross-sectoral analysis and use of MMQR to capture heterogeneous impacts; methodological description provided in paper.
The positive effect of robot density on value added is particularly strong in higher-performing sectors (i.e., at higher quantiles of the value-added distribution).
Results from MMQR showing heterogeneous impacts across performance levels/quantiles; authors state larger positive coefficients of robot density at upper quantiles.
Increased robot density significantly enhances value added.
Empirical analysis using panel data (Eurostat, International Federation of Robotics 2024, World Robotics) estimated with Method of Moments Quantile Regression (MMQR); gross value added used as dependent variable and robot density as a core explanatory variable; authors report statistically significant positive coefficients.
The paper proposes five architectural requirements for genuine human oversight systems.
Stated methodological/prescriptive contribution of the paper (a proposal rather than an empirical finding); no sample size or empirical validation reported in the provided excerpt.
The proposed framework outlines a pathway toward large-scale cooperative intelligence and offers a constructive perspective on the coevolution of human and artificial agents in the informational ecosystems of the future.
Claim about the paper's contribution; based on conceptual synthesis and theoretical framing rather than empirical validation.
A voluntary ecosystem of free rational agents, human and artificial, who cooperate through transparent and fair exchange of information maximizes their adaptive capacity and long-term well-being.
Normative proposition in the paper derived from theoretical principles (information theory, collective intelligence); presented as a proposed ideal rather than an empirically tested policy.
Emerging opportunities exist for stabilizing these ecosystems through new forms of informational verification and monitoring made possible by advanced artificial agents.
Forward-looking claim grounded in conceptual analysis of capabilities of advanced agents; proposed as an opportunity in the paper rather than demonstrated empirically.
Systems that preserve diversity of exploration while minimizing barriers to information exchange exhibit superior capacity for discovery and adaptation in complex environments.
Theoretical claim supported by the paper's appeal to principles from information theory, adaptive systems, and collective intelligence; presented as an argument rather than as empirically validated result.
Increasing the strictness of algorithmic control paradoxically increases the evolutionary fitness of coordinated resistance (e.g., coordinated log-offs).
Results from the EGT model and simulations showing fitness/payoff changes for coordinated resistance strategies as platform surveillance strictness parameter increases; model-only (no empirical N reported).
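A toy replicator-dynamics sketch of the mechanism, with invented payoffs rather than the paper's calibration: the coordination benefit of resisting scales with strictness `s`, so stricter control flips the long-run equilibrium toward resistance.

```python
def resist_share(s, steps=20000, dt=0.01, x0=0.05):
    """Long-run share playing 'coordinated resistance' at strictness s."""
    x = x0
    for _ in range(steps):
        # Invented payoffs: resisting pays more under strict control and
        # when enough others resist (the coordination term, 2*s*x).
        f_resist = 1.0 + 2.0 * s * x
        f_comply = 1.5 - 0.8 * s
        # Replicator dynamics: strategies with above-average payoff grow.
        x += dt * x * (1 - x) * (f_resist - f_comply)
        x = min(max(x, 0.0), 1.0)
    return x

for s in (0.2, 0.5, 0.8):
    print(f"strictness {s}: long-run resist share ~ {resist_share(s):.2f}")
```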
The primary contribution is a controlled agent-payment infrastructure and reference architecture that demonstrates how agentic access monetization can be adapted to fiat systems without discarding security and policy guarantees.
Summary of the paper's claimed contribution (architectural demonstration and reference implementation).
Multiple trial runs show low variance across scenarios, indicating high reproducibility; results are reported with 95% confidence intervals.
Reported statistical characterization from repeated trials in the paper (statement of low variance and 95% confidence intervals across scenarios).
Security mechanisms impose low latency overhead (19.6ms average).
Performance measurement reported in the paper's experiments (average latency overhead reported as 19.6ms).
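A sketch of how a mean-latency figure with a 95% confidence interval might be computed from repeated trials (the measurements below are hypothetical, not the paper's data):

```python
import numpy as np
from scipy import stats

# Hypothetical per-trial mean latency overheads (ms), one per trial run.
latency_ms = np.array([19.1, 20.3, 19.8, 18.9, 19.6, 20.1, 19.4, 19.9])
mean = latency_ms.mean()
lo, hi = stats.t.interval(0.95, df=len(latency_ms) - 1,
                          loc=mean, scale=stats.sem(latency_ms))
print(f"mean overhead = {mean:.1f} ms, 95% CI = ({lo:.1f}, {hi:.1f}) ms")
```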
Security mechanisms achieve 100% block rate for both replay attacks and invalid tokens.
Experimental security evaluation reported in the paper (block rate reported at 100% for replay attacks and invalid tokens).
The system uses FastAPI, SQLite, and Python standard libraries, making it transparent, inspectable, and reproducible.
Implementation stack specified in the paper and availability of reference implementation; asserted reproducibility.
APEX implements a challenge–settle–consume lifecycle with HMAC-signed short-lived tokens, idempotent settlement handling, and policy-aware payment approval.
Implementation details described in the methods/architecture section and supported by the provided reference implementation.
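A standard-library sketch in the spirit of that lifecycle: HMAC-signed short-lived tokens with nonce tracking for replay resistance. Function names and token format are illustrative assumptions, not APEX's actual implementation.

```python
import hashlib, hmac, secrets, time

SECRET = secrets.token_bytes(32)   # server-side signing key
TTL = 30                           # token lifetime, seconds
_consumed = set()                  # nonces already spent (replay guard)

def mint(resource: str) -> str:
    """Issue a short-lived, HMAC-signed token scoped to one resource."""
    nonce = secrets.token_hex(8)
    exp = int(time.time()) + TTL
    payload = f"{resource}|{nonce}|{exp}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def consume(token: str, resource: str) -> bool:
    """Verify signature, scope, expiry, and single use."""
    payload, _, sig = token.rpartition("|")
    want = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, want):
        return False                        # invalid token: block
    res, nonce, exp = payload.split("|")
    if res != resource or int(exp) < time.time():
        return False                        # wrong scope or expired
    if nonce in _consumed:
        return False                        # replay: block
    _consumed.add(nonce)
    return True

token = mint("/report/42")
assert consume(token, "/report/42")         # first use succeeds
assert not consume(token, "/report/42")     # second use rejected (replay)
```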
We present APEX, an implementation-complete research system that adapts HTTP 402-style payment gating to UPI-like fiat workflows while preserving policy-governed spend control, tokenized access verification, and replay resistance.
System design and implementation presented in the paper (codebase built using FastAPI, SQLite, Python; demonstration/implementation claimed).
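And a minimal sketch of HTTP 402-style gating with FastAPI, the framework the paper names; the endpoint path, header name, and token check are illustrative assumptions, not APEX's actual API surface.

```python
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

def token_is_valid(token: str | None) -> bool:
    # Stand-in for real verification (e.g., the HMAC check sketched above).
    return token == "paid-demo-token"

@app.get("/premium/data")
def premium_data(x_access_token: str | None = Header(default=None)):
    if not token_is_valid(x_access_token):
        # HTTP 402 Payment Required: the client must settle payment and
        # present a valid access token before the resource is served.
        raise HTTPException(status_code=402, detail="Payment required")
    return {"data": "gated content"}
```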