Evidence (14156 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
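One way to read the matrix is to compute, per outcome, the share of claims in each direction and to check whether the four direction columns account for the Total. The sketch below does this for four rows copied from the table. The names `ROWS`, `summarize`, and `unaccounted` are illustrative and not part of the source. The idea that any shortfall reflects claims with no direction label is an assumption.

```python
# Rows copied from the table: (Positive, Negative, Mixed, Null, Total).
ROWS = {
    "Other": (761, 200, 101, 904, 2020),
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Error Rate": (71, 92, 10, 2, 175),
    "Job Displacement": (12, 81, 21, 1, 115),
}


def summarize(positive, negative, mixed, null, total):
    """Return the negative share of Total and the part of Total that the
    four direction columns do not account for (hypothetical 'unaccounted')."""
    directed = positive + negative + mixed + null
    return {
        "negative_share": negative / total,
        "unaccounted": total - directed,
    }


summary = {name: summarize(*counts) for name, counts in ROWS.items()}
```

In this sample, Error Rate and Job Displacement sum exactly to their Totals, while Other and Firm Productivity fall short by 54 and 6 claims respectively, so shares are computed against Total rather than against the row sum.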

Excessive reliance on AI may reduce the originality of research and lead to duplication of research efforts.

Model implication: as the share of tasks automated by AI increases, the paper shows analytically that originality can decline and firms may duplicate research efforts (due to homogenization of methods or search), reducing novel knowledge creation.

high negative Bridging Distant Ideas: the Impact of AI on R&D and Recombin... originality of research; duplication of research efforts

AI increases the aggregate rate of creative destruction, shortening the monopoly duration that rewards radical innovations.

Analytical result from the model: introducing AI raises the aggregate creative-destruction rate in the Schumpeterian framework, which reduces the expected monopoly duration and thus the rents that sustain radical innovation.

high negative Bridging Distant Ideas: the Impact of AI on R&D and Recombin... aggregate rate of creative destruction and monopoly duration (rents for radical ...

Applying the Auditor-Corrector methodology to ELT-Bench uncovers that most failed transformation tasks contain benchmark-attributable errors — including rigid evaluation scripts, ambiguous specifications, and incorrect ground truth — that penalize correct agent outputs.

Audit results on ELT-Bench identifying categories of benchmark errors (rigid scripts, ambiguous specs, incorrect ground truth) and attributing many failed transformation tasks to these errors; no numeric breakdown or sample count given in the excerpt.

high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... proportion of failed transformation tasks attributable to benchmark errors (qual...

On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility.

Reference to initial evaluation results on ELT-Bench showing low success rates for AI agents; the provided excerpt does not give numerical success rates or sample size.

high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... agent success rate on ELT-Bench (agent capability / practical utility)

Such predatory-hiring cases often fall outside the scope of merger control because they fail to meet the applicable thresholds, warranting consideration under the abuse of dominance prohibition in Article 102 TFEU.

Legal analysis stated in abstract referencing merger control thresholds and Article 102 TFEU (no quantitative sample provided in abstract).

high negative Employee Poaching as An Abuse of Dominance Under Article 102... regulatory coverage (whether conduct falls within merger control or abuse of dom...

When a dominant undertaking in a concentrated market strategically targets and hires a large portion—or the entirety—of a smaller competitor’s key personnel, this behavior can raise significant competition concerns.

Legal argument presented in abstract; draws on relevant case law and scholarship (no empirical sample or experimental method reported in abstract).

high negative Employee Poaching as An Abuse of Dominance Under Article 102... competition concerns arising from strategic hiring of rival personnel

LLM uncertainty estimates require statistical correction before they can be used in decision-making.

Empirical finding of severe undercoverage of nominal 95% intervals and demonstration that conformal recalibration is needed to achieve intended coverage.

high negative Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... adequacy of raw LLM uncertainty estimates for decision-making (calibration/cover...

All models are severely overconfident: their 95% intervals contain the true value only 9–44% of the time, far below the expected 95%.

Analysis of model-produced 95% credible intervals across elicited population statistics, measuring empirical coverage rates reported between 9% and 44%.

high negative Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... empirical coverage rate of 95% credible intervals
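The two claims above rest on empirical coverage and on conformal recalibration. The following is a minimal sketch of both ideas, not the paper's procedure. It assumes a split-conformal widening in the style of conformalized quantile regression. The function names and toy numbers are hypothetical.

```python
import math


def empirical_coverage(intervals, truths):
    """Fraction of true values lying inside their (lower, upper) interval."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)


def conformal_margin(cal_intervals, cal_truths, alpha=0.05):
    """Margin to add on both sides of each interval, from a calibration set.

    The score is the distance by which the truth falls outside the interval
    (negative when it is inside). The margin is the ceil((n + 1) * (1 - alpha))-th
    smallest score; if that rank exceeds n, no finite margin suffices.
    """
    scores = sorted(
        max(lo - t, t - hi) for (lo, hi), t in zip(cal_intervals, cal_truths)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else scores[rank - 1]


def widen(intervals, margin):
    """Apply the same symmetric margin to every interval."""
    return [(lo - margin, hi + margin) for lo, hi in intervals]


# Toy calibration data (made up): four identical intervals, four true values.
toy_intervals = [(0.0, 1.0)] * 4
toy_truths = [0.5, 1.5, 2.0, 3.0]
raw_coverage = empirical_coverage(toy_intervals, toy_truths)
toy_margin = conformal_margin(toy_intervals, toy_truths, alpha=0.5)
recalibrated = empirical_coverage(widen(toy_intervals, toy_margin), toy_truths)
```

In practice the margin would be estimated on held-out questions and then applied to new intervals. The toy data reuses the calibration set only to keep the example short.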

There is a governance window—estimated at 10–15 years—before current deployment trajectories risk path-dependent social, economic, and institutional lock-in.

Forward-looking estimate/projection provided in the paper based on the authors' characterization of deployment trajectories and governance dynamics (no empirical sample size provided in the excerpt).

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... time remaining before risk of path-dependent lock-in of harmful AI governance/st...

Societal consequences of labor displacement intensify the governance gap by concentrating consequential AI decision-making among an increasingly narrow class of technical and capital actors.

Analytic/theoretical claim in the paper drawing on the paper's multi-domain argument (no empirical sample size or quantified concentration metrics provided in the excerpt).

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... concentration of AI decision-making authority and its amplification of governanc...

This nominal-vs-genuine oversight distinction represents the primary architectural failure mode in deployed AI governance.

Argumentative claim based on the paper's multi-domain synthesis and theoretical analysis; no empirical sample size or quantified causal inference provided in the excerpt.

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... dominant failure mode in AI governance architectures

The distinction between nominal and genuine human oversight is largely absent from current governance frameworks, including the EU AI Act and NIST AI Risk Management Framework 1.0.

Comparative policy/regulatory review claimed in the paper (explicit reference to the EU AI Act and NIST AI RMF 1.0); no sample size—based on textual/regulatory analysis rather than statistical data in the provided excerpt.

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... coverage of genuine human oversight concepts within major AI governance framewor...

There exists a critical and underexamined governance gap between nominal human oversight of AI systems (humans in formal authority positions) and genuine human oversight (humans with cognitive access, technical capability, and institutional authority to understand, evaluate, and override AI outputs).

Conceptual/qualitative analysis and argumentation presented in the paper; implied synthesis of case examples and theoretical considerations rather than a quantified empirical study in the provided excerpt.

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... quality/effectiveness of human oversight over AI systems (cognitive access, tech...

The accelerating displacement of human labor by artificial intelligence (AI) and robotic systems represents a structural transformation whose societal consequences extend far beyond conventional labor market analysis.

Stated as a framing claim in the paper; supported by the paper's literature review and multi-domain conceptual argument (no empirical sample size or quantitative data reported in the provided excerpt).

high negative Beyond Symbolic Control: Societal Consequences of AI-Driven ... displacement of human labor and broader societal consequences

Sustaining such cooperative informational systems has historically proven difficult due to structural incentives that gradually erode transparency and trust.

Historical/analytical assertion in the paper; presented as a high-level observation (no dataset or empirical historical analysis provided in the excerpt).

high negative A Case for Coevolution persistence/stability of cooperative informational systems (affected by incentiv...

The interaction between strict algorithmic control and worker counter-strategies leads to persistent limit cycles in strategy frequencies rather than convergence to a stable compliant workforce.

Dynamical systems analysis and simulation trajectories from the EGT model showing limit cycles / oscillatory equilibria in strategy proportions; model-based (no empirical sample).

high negative THE RED QUEEN in the DASHBOARD: CO-EVOLUTIONARY DYNAMICS of ... dynamical behavior of strategy frequencies (limit cycles vs. stable equilibrium)

Policy enforcement reduces total spending by 27.3%.

Quantitative result reported from the paper's experiments across baselines and scenarios (paper reports a 27.3% reduction attributed to policy enforcement).

high negative APEX: Agent Payment Execution with Policy for Autonomous Age... total spending

In many deployment contexts, especially countries with strong real-time fiat systems like UPI, relying on crypto rails is misaligned with regulatory and infrastructure realities.

Contextual/argumentative claim in the paper contrasting crypto reliance with fiat systems such as UPI (no empirical country-level sample reported).

high negative APEX: Agent Payment Execution with Policy for Autonomous Age... alignment between payment-rail assumptions and regulatory/infrastructure realiti...

The way we're thinking about generative AI right now is fundamentally individual (this appears in how users interact with models, how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined).

Author's observational/descriptive claim supported by argumentative examples (mentions user interaction patterns, model design and benchmarking practices, and commercial/research strategies); no empirical sample or quantitative analysis reported in the excerpt.

high negative The Future of AI is Many, Not One conceptual framing and practices around generative AI (individual-focused design...

The emission-reduction effect of AI innovation is more significant for firms located in regions with underdeveloped factor markets.

Heterogeneity (regional subsample/interaction) analysis reported in the paper on the 21,428 firm-year sample, indicating larger AI-related emission reductions in regions with less developed factor markets.

high negative Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (differential effect by regional factor mark...

The emission-reduction effect of AI innovation is more significant for firms in high-environmental-sensitivity industries.

Heterogeneity (subsample/interaction) analysis in the paper using the 21,428 firm-year observations, showing stronger AI-related emission reductions in industries characterized as high environmental sensitivity.

high negative Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (differential effect by industry environment...

The emission-reduction effect of AI innovation is more significant for enterprises with a low supply chain concentration.

Heterogeneity (subsample) analysis reported in the paper using the 21,428 firm-year dataset, comparing effects across firms with different supply chain concentration levels.

high negative Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity (differential effect by supply chain concent...

Executives’ green cognition and government environmental attention together constitute dual internal and external driving forces for corporate carbon emission reduction.

Further analysis reported in the paper (moderation/interaction analysis or additional regressions) on the same 21,428 firm-year sample showing these factors strengthen carbon reduction associated with AI innovation.

high negative Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity / carbon emission reduction

AI innovation can significantly reduce corporate carbon emission intensity.

Empirical analysis using panel data of 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022; result reported in the paper's main regressions (method described as micro-level empirical analysis).

high negative Artificial Intelligence Innovation, Internal Structure Optim... corporate carbon emission intensity

Traditional questionnaires yielded slightly higher accuracy in risk assessment.

Result reported from the two experiments comparing traditional questionnaires to adaptive ARQuest versions; no numeric accuracy or sample size provided in the excerpt.

high negative AI in Insurance: Adaptive Questionnaires for Improved Risk P... risk assessment accuracy

Insurers must blindly trust users' responses, increasing the chances of fraud.

Stated as a motivating problem in the paper; presented as a logical/empirical concern rather than supported by a reported study within the paper.

high negative AI in Insurance: Adaptive Questionnaires for Improved Risk P... fraud risk from self-reported responses

Insurance application processes often rely on lengthy and standardized questionnaires that struggle to capture individual differences.

Descriptive claim in paper introduction arguing limitations of standard questionnaires; no experiment or sample size reported for this assertion.

high negative AI in Insurance: Adaptive Questionnaires for Improved Risk P... ability of standardized questionnaires to capture individual differences

AI's disproportionate benefits for lagging regions help narrow interprovincial emission gaps.

Heterogeneity analysis reported in the provincial panel (2003–2021) showing stronger AI-related reductions in emissions inequality for lagging regions compared to advanced regions.

high negative Artificial intelligence, green innovation, and regional carb... interprovincial emission gaps (carbon inequality)

Green innovation is concentrated in coastal provinces and has not effectively diffused to inland areas, limiting its ability to reduce regional carbon inequality.

Spatial distribution analysis within the provincial panel showing geographic concentration of green innovation activity in coastal provinces and limited diffusion inland.

high negative Artificial intelligence, green innovation, and regional carb... geographic concentration of green innovation (diffusion to inland areas)

AI reduces carbon inequality primarily through improved energy efficiency, enhanced environmental monitoring, and more efficient resource allocation, disproportionately benefiting lagging regions and narrowing interprovincial emission gaps.

Mechanism analysis reported in the paper based on the provincial panel (2003–2021) linking AI development to proximate channels (energy efficiency, monitoring, resource allocation) and heterogeneous impacts across regions.

high negative Artificial intelligence, green innovation, and regional carb... carbon inequality (interprovincial emission gaps)

AI development significantly reduces carbon inequality, particularly when measured by the Gini index.

Empirical analysis using a provincial panel dataset covering 2003–2021; carbon inequality measured with the Gini index; reported statistically significant negative association between AI development and Gini-measured carbon inequality.

high negative Artificial intelligence, green innovation, and regional carb... carbon inequality (Gini index)
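For readers unfamiliar with the measure, the Gini index used above can be computed from a list of non-negative values as follows. This is a generic sketch of the standard formula, not the paper's code. The function name and the example inputs are illustrative, and the paper's exact unit of analysis (for example, per-capita emissions by province) is not given in the excerpt.

```python
def gini(values):
    """Gini index of non-negative values: 0 means perfect equality,
    (n - 1) / n means one unit holds everything."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    # running over the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_case = gini([1, 1, 1, 1])         # all units identical
concentrated_case = gini([0, 0, 0, 1])  # one unit holds everything
```

A falling value of this index over time corresponds to the narrowing of interprovincial emission gaps that the claims describe.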

A stylised inpatient capacity signalling example and minimal game-theoretic reasoning suggest that task optimisation alone is unlikely to change system outcomes when incentives are unchanged.

Theoretical analysis using a stylised inpatient capacity signalling example and game-theoretic reasoning presented in the paper (no empirical data/sample reported in the abstract).

high negative Incentives, Equilibria, and the Limits of Healthcare AI: A G... system-level outcomes in healthcare (response to task optimisation interventions...

Deploying AI systems carries significant costs, including ongoing monitoring costs, and it is unclear whether optimism about a deus ex machina solution is well placed.

Conceptual/argumentative claim made by the authors in the paper (no empirical study or sample size reported in the abstract).

high negative Incentives, Equilibria, and the Limits of Healthcare AI: A G... costs and uncertainty associated with AI deployment (including monitoring costs)

Cross-equipment generalization is poor, with 42.7% performance on held-out datasets.

Paper reports held-out dataset evaluation showing 42.7% (presumably accuracy or task completion) for cross-equipment generalization.

high negative PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... held-out dataset performance (cross-equipment generalization)

Multi-asset reasoning causes a 14.9 percentage point degradation in performance.

Paper reports a 14.9 percentage point performance degradation attributed to multi-asset reasoning in comparative analyses.

high negative PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... performance degradation (percentage points) when reasoning across multiple asset...

There are systematic failures in tool orchestration, with 23% incorrect sequencing.

Paper reports a measured incorrect sequencing rate of 23% during evaluation of agent tool orchestration across scenarios.

high negative PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... rate of incorrect tool sequencing

Even top-performing configurations achieve only 68% task completion.

Reported aggregated performance result from the benchmark evaluation across the tested frameworks and LLMs (paper statement). The benchmark contains 75 scenarios (used as evaluation instances).

high negative PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... task completion rate

Improvements in operational resilience (OR) effectively reduce corporate operational risk.

Further analysis reported in the paper linking higher OR to lower operational risk measures for firms in the sample.

high negative Does Artificial Intelligence Improve the Operational Resilie... corporate operational risk (reduction)

AI promotes operational resilience by reducing management agency conflicts.

Mechanism (mediation) tests reported in the paper showing AI associated with reductions in measures of agency/management conflict, which in turn relate to OR improvements.

high negative Does Artificial Intelligence Improve the Operational Resilie... management agency conflicts (reduction)

Mandatory release delays can paradoxically reduce deployed model quality by shifting preemption to the announcement stage, where quality locks in before the mandated waiting period.

Model extension analyzing mandatory waiting periods: equilibrium strategic behavior shifts to earlier announcements and quality commitment, yielding lower quality at deployment than without the delay.

high negative Optimal Release Timing of AI Systems: A Strategic Analysis w... deployed model quality under mandatory release delays

Premature release imposes safety externalities on society that firms do not fully internalize.

Model assumption and subsequent analysis: the paper models a socially harmful safety externality from early deployment that firms ignore (or undervalue) in their private payoff calculations.

high negative Optimal Release Timing of AI Systems: A Strategic Analysis w... magnitude of uninternalized safety externality / societal harm from premature re...

Equilibrium release occurs strictly before the social optimum.

Analytic characterization of the symmetric Nash equilibrium in a theoretical preemption game where firms trade off development time (quality) against first-mover advantages; comparative statics show equilibrium release time < socially optimal release time.

high negative Optimal Release Timing of AI Systems: A Strategic Analysis w... timing of model release relative to the social optimum

Over time the equalizing channel weakened because market valuation (wage exposure) became increasingly unfavorable to female-concentrated occupations, contributing to a renewed widening of the gender wage gap in 2015–2019.

Decomposition results showing a temporal decline in the wage-exposure contribution to equality and a negative wage-exposure trend for female-concentrated occupations, coinciding with gap widening in 2015–2019.

high negative Routine-Biased Technological Change and the Gender Wage Gap ... change in gender wage gap driven by wage exposure of female-concentrated occupat...

Women experienced greater exposure to displacement compared with men.

Gender-disaggregated results from stacked first-difference estimations and dynamic shift-share decomposition showing higher displacement exposure for female workers.

high negative Routine-Biased Technological Change and the Gender Wage Gap ... exposure to job displacement

Routine displacement unfolds episodically rather than simultaneously, with relative contraction in routine cognitive jobs (2001–2005), routine manual jobs (2005–2010), and renewed routine cognitive pressures (2015–2019).

Empirical results from stacked first-difference estimations and a dynamic shift-share decomposition applied to Indonesian formal wage-worker data over 2001–2019.

high negative Routine-Biased Technological Change and the Gender Wage Gap ... contraction/pressure on routine (cognitive and manual) jobs over specified perio...

Enterprise adoption of LLMs is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level.

Framed as the motivating problem in the paper's introduction/abstract (conceptual claim; no empirical test reported here).

high negative Ontology-Constrained Neural Reasoning in Enterprise Agentic ... hallucination / domain drift / regulatory compliance at reasoning level

No regulatory framework requires disclosure of machine/AI labor output.

Author's assertion in the paper (policy claim; no legislative survey or quantification reported).

high negative HEWU: A Standardized Framework for Measuring Machine-Generat... presence of regulatory disclosure requirements for machine labor

No index tracks machine labor output over time.

Author's assertion in the paper (stated lack of existing indices; no systematic review/sample reported).

high negative HEWU: A Standardized Framework for Measuring Machine-Generat... existence of time-series index for machine labor output

This labor force is entirely invisible to the economic infrastructure humanity has built to measure work: no standardized unit of measurement exists.

Author's assertion/diagnosis in the paper (argumentative/observational, no empirical survey or sample reported).

high negative HEWU: A Standardized Framework for Measuring Machine-Generat... existence of standardized unit for machine labor

Specific occupations such as credit analysts, judges, and sustainability specialists reach ATE scores of 0.43–0.47 by 2030.

Reported model outputs / ATE score estimates for individual occupations within the paper's 2025–2030 regional application.

high negative Agentic AI and Occupational Displacement: A Multi-Regional T... ATE score (automation exposure) for named occupations
