Evidence (14156 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
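
Because each row gives counts by direction, the share of negative findings for an outcome can be read as Negative divided by Total. A minimal Python sketch, using three rows copied from the matrix above; the function names are illustrative, and treating the dash placeholder as zero is an assumption about what an empty cell means.

```python
# Three rows copied from the matrix; cells are tab-separated and "—" marks an empty cell.
ROWS = [
    "Job Displacement\t12\t81\t21\t1\t115",
    "Skill Obsolescence\t5\t47\t6\t1\t59",
    "Labor Share of Income\t17\t19\t17\t—\t53",
]

KEYS = ["positive", "negative", "mixed", "null", "total"]


def parse_row(line):
    """Split one tab-separated row into (outcome, counts dict)."""
    outcome, *cells = line.split("\t")
    # Assumption: an empty cell ("—") means zero claims in that direction.
    counts = {k: 0 if c == "—" else int(c) for k, c in zip(KEYS, cells)}
    return outcome, counts


def negative_share(counts):
    """Negative claims as a fraction of the row's reported total."""
    return counts["negative"] / counts["total"]


for line in ROWS:
    outcome, counts = parse_row(line)
    print(f"{outcome}: {negative_share(counts):.2f}")
```

The share is taken against the reported Total column rather than the sum of the four direction columns, since the two are not guaranteed to match in every row.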

The article argues that the idea of a “Pax Silica” is fragile.

Conclusion drawn from the paper's theoretical framework and comparative analysis; presented as an assessment rather than empirical measurement.

high negative The Logistics of Hegemony: Semiconductor Chokepoints, Global... stability/fragility of a proposed techno-hegemonic order ('Pax Silica')

Contemporary struggles over semiconductor supply chains represent not a new hegemonic order but a logistical adaptation of Pax Americana.

Stated thesis supported by comparative/historical analysis and theoretical argumentation (comparative analysis of historical Pax orders and U.S. techno-security architecture); no quantitative sample size reported in abstract.

high negative The Logistics of Hegemony: Semiconductor Chokepoints, Global... characterization of geopolitical order governing semiconductor supply chains

Employees reported initial challenges in adapting to AI integration.

Participants in semi-structured interviews (n=12) reported initial difficulties adapting to AI tools; themes relating to early adaptation challenges were coded.

high negative AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... initial adaptation challenges to AI

Past machine learning applications to pricing have produced models that adapt slowly to real-time changes, depend heavily on historical data, and struggle to handle multi-agent scenarios.

Stated as literature/related-work critique in paper; no new empirical evidence or sample size provided in the excerpt.

high negative The Application of Adaptive Reinforcement Learning in Dynami... model adaptivity to real-time changes and capability in multi-agent scenarios

Traditional methods, such as rule-based algorithms and statistical scale forecasting, struggle to adapt to rapidly changing market conditions, competitive maneuvers, and evolving consumer strategies, leading to sub-optimal pricing and decreased profitability.

Paper asserts this as background/motivation; no detailed empirical study or sample size provided in the excerpt.

high negative The Application of Adaptive Reinforcement Learning in Dynami... adaptivity of pricing methods and resulting profitability (sub-optimal pricing, ...

In the short term, big data may inhibit welfare growth.

Theoretical comparative-static/dynamic analysis reported in the model showing that initial or short-run effects of increased data sharing can reduce welfare growth (no empirical/sample data).

high negative Study on the impact of big data sharing on individuals’ welf... short-term growth of individuals' welfare

There is a measurement asymmetry in standard LLM evaluation: unconstrained prompts can inflate constraint-adherence scores and mask the practical value of structured prompting.

Analysis of evaluation results from the controlled study showing that unconstrained (simple) prompts sometimes achieve high constraint-adherence scores, leading to misleading evaluation of structured prompts' benefits.

high negative Evaluating 5W3H Structured Prompting for Intent Alignment in... constraint_adherence_scores / evaluation_bias

Traditional paradigms, specifically the resource-based view and the dynamic capabilities framework, operate under closed-system, first-order cybernetic assumptions that fail to capture the dissipative nature of algorithmic agents.

Conceptual critique presented in the paper's theoretical argumentation (literature critique and re-framing); no empirical sample reported.

high negative Governing Human–AI Co-Evolution: Intelligentization Capabili... explanatory_power_of_management_theory (ability to account for AI-driven organiz...

AI usage predicts work disengagement behavior via emotional exhaustion elicited by AI-associated technostressors.

Four-stage longitudinal study (survey) of finance professionals (N=285); mediation analysis testing AI usage -> technostressors -> emotional exhaustion -> work disengagement, based on SOR framework.

high negative Autonomous enhancement or emotional depletion? The dual-path... work disengagement behavior (mediated by emotional exhaustion from technostresso...

These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.

Interpretation of experimental results in the paper: authors conclude that the observed limited gains (particularly on trading-signal/time-series aspects) indicate shortcomings in LLM numerical and time-series reasoning.

high negative FinTradeBench: A Financial Reasoning Benchmark for LLMs LLMs' numerical and time-series reasoning capability (qualitative conclusion fro...

There is a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence.

Conceptual/theoretical claim derived from the framework and discussion in the paper (argument and mathematical framing), no empirical sample or longitudinal data presented in the excerpt.

high negative Cognitive Amplification vs Cognitive Delegation in Human-AI ... long-term human cognitive competence

The Institutional Scaling Law directly contradicts classical scaling laws, which assume monotonic capability gains with model scale.

Comparative theoretical claim in the paper contrasting the Institutional Scaling Law with classical empirical/theoretical scaling laws in ML literature.

high negative Punctuated Equilibria in Artificial Intelligence: The Instit... relationship between model scale and deployment-relevant fitness/capability

The Institutional Scaling Law proves that institutional fitness is non-monotonic in model scale.

Formal mathematical derivation/proof presented in the paper (the 'Institutional Scaling Law').

high negative Punctuated Equilibria in Artificial Intelligence: The Instit... institutional fitness as a function of model scale

AI development proceeds not through smooth advancement but through extended periods of stasis interrupted by rapid phase transitions that reorganize the competitive landscape (punctuated equilibrium pattern).

Argument based on punctuated equilibrium theory from evolutionary biology and historical analysis presented in the paper identifying discrete transitions in AI history; the paper cites and classifies eras/events as evidence.

high negative Punctuated Equilibria in Artificial Intelligence: The Instit... pattern of AI development (stasis vs. phase transitions)

The interaction of artificial intelligence and environmental regulation produces a '1 + 1 < 2' crowding-out effect (their combined effect is less than the sum of individual effects).

Spatial Durbin model with interaction term between AI and environmental regulation as summarized in the abstract; reported as a crowding-out interaction.

high negative How artificial intelligence and environmental regulation inf... UCEE index (interaction effect of AI and environmental regulation)

Environmental regulation significantly inhibits local UCEE.

Spatial Durbin model results reported in the abstract indicating a significant negative local coefficient for environmental regulation.

high negative How artificial intelligence and environmental regulation inf... UCEE index (local/provincial effect of environmental regulation)

Artificial intelligence significantly inhibits local UCEE.

Spatial Durbin model results reported in the abstract indicating a significant negative local coefficient for artificial intelligence.

high negative How artificial intelligence and environmental regulation inf... UCEE index (local/provincial effect of AI)

Progress in agentic AI systems that generate and optimize GPU kernels is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution.

Author argument/observation in paper (conceptual claim about limitations of existing benchmarks); no empirical sample or experiment reported in the provided text.

high negative SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... benchmark_alignment_with_hardware_efficiency

Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.

Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.

high negative AI, Productivity, and Labor Markets: A Review of the Empiric... automation of entry-level/junior tasks and changes to entry-level job content

Algorithmic credit systems are linked to higher levels of financial stress.

Study reports a positive association between algorithmic credit system use and reported financial stress from regression analysis on the 400-user cross-sectional dataset.

high negative Architecting financial well-being in algorithmic credit syst... financial stress

Confirmation bias is a weakness in LLM-based code review, with implications for how AI-assisted development tools are deployed.

Synthesis of findings from Study 1 (framing-induced detection failures) and Study 2 (practical exploitability and partial mitigation via debiasing).

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... reliability/security of LLM-based code review

Adversarial framing succeeds in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success.

Study 2 experiments in real project configurations with iterative adversary refinement evaluated against Claude Code (autonomous agent); reported 88% success rate.

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... attack success rate (vulnerability reintroduction accepted/not detected)

Adversarial pull request framing (e.g., labeled as security improvements or urgent functionality fixes) succeeds in reintroducing known vulnerabilities in 35% of cases against GitHub Copilot under one-shot attacks.

Study 2 experiments simulating adversarial pull requests evaluated against GitHub Copilot (interactive assistant); reported success rate 35% for one-shot attacks.

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... attack success rate (vulnerability reintroduction accepted/not detected)

The framing effect is strongly asymmetric: false negatives increase sharply while false positive rates change little.

Comparison of false negative and false positive rates across framing conditions in Study 1 experiments (250 CVE pairs across models).

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... false negative rate and false positive rate
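
The asymmetry is easier to see when both rates are computed from the same confusion counts. The sketch below uses the standard definitions of false negative rate and false positive rate; the counts are hypothetical values chosen for illustration and are not taken from the study.

```python
def error_rates(tp, fn, fp, tn):
    """Return (false negative rate, false positive rate) from confusion counts."""
    fnr = fn / (tp + fn)  # share of vulnerable changes the reviewer missed
    fpr = fp / (fp + tn)  # share of safe changes the reviewer flagged
    return fnr, fpr


# Hypothetical counts for two framing conditions (NOT from the paper).
neutral = error_rates(tp=80, fn=20, fp=10, tn=90)
bug_free = error_rates(tp=40, fn=60, fp=9, tn=91)

print("neutral framing:  FNR=%.2f FPR=%.2f" % neutral)
print("bug-free framing: FNR=%.2f FPR=%.2f" % bug_free)
```

In this illustrative case the false negative rate triples while the false positive rate barely moves, which is the pattern the claim describes.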

Framing a change as bug-free reduces vulnerability detection rates by 16-93%.

Result reported from Study 1 controlled experiments across models and framing conditions (250 CVE pairs).

high negative Measuring and Exploiting Confirmation Bias in LLM-Assisted S... vulnerability detection rate

AI-only baselines perform near or below the median of competition participants.

Comparison of AI-only baseline performance to the distribution of competition participant results reported in the paper (competition with 29 teams / 80 participants).

high negative AgentDS Technical Report: Benchmarking the Future of Human-A... relative performance rank of AI-only baselines vs participants

The competition results show that current AI agents struggle with domain-specific reasoning.

Outcome of the competition reported in the paper comparing AI-only baselines to participant submissions across the AgentDS tasks (competition data from 29 teams / 80 participants); reported aggregate performance indicating AI weakness on domain-specific tasks.

high negative AgentDS Technical Report: Benchmarking the Future of Human-A... domain-specific reasoning performance

LLM-generated peer reviews place significantly less weight on clarity and significance of the research.

Comparative analysis between LLM-generated reviews and human reviews from the conference dataset; reported as a statistically significant difference but exact statistics and sample size not provided in the excerpt.

high negative How LLMs Distort Our Written Language importance/weight given to clarity and significance in peer review content

Significantly more heavy LLM users reported that the writing was less creative and not in their voice.

Self-reported measures from participants in the human user study comparing heavy LLM users to others; no sample size or exact statistics provided in the excerpt.

high negative How LLMs Distort Our Written Language self-reported creativity and 'in-your-voice' authenticity of writing

In Chicago, the model shows moderate under-detection of Black residents, with a Disparate Impact Ratio (DIR) of 0.22.

Reported DIR value from simulation results on Chicago 2022 data.

high negative Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... Disparate Impact Ratio (DIR) indicating under-detection of Black residents
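
A disparate impact ratio is commonly computed as the rate of an outcome in one group divided by the rate in a reference group, so values well below 1 mean the first group receives the outcome far less often. The sketch below uses that common definition. The paper's exact formula, group definitions, and data are not given in this summary, so the function name and the counts are hypothetical.

```python
def disparate_impact_ratio(group_hits, group_total, ref_hits, ref_total):
    """Ratio of the outcome rate in one group to the rate in a reference group."""
    if group_total <= 0 or ref_total <= 0:
        raise ValueError("group sizes must be positive")
    ref_rate = ref_hits / ref_total
    if ref_rate == 0:
        raise ValueError("reference rate is zero; ratio is undefined")
    return (group_hits / group_total) / ref_rate


# Hypothetical counts chosen only to illustrate a ratio near 0.22 (NOT study data).
ratio = disparate_impact_ratio(group_hits=11, group_total=100,
                               ref_hits=50, ref_total=100)
print(round(ratio, 2))
```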

It is impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings.

Paper assertion / motivating argument (stated as motivation for investigating zero-shot Nash-like behavior); not presented as an empirical finding within the paper.

high negative Reasonably reasoning AI agents can avoid game-theoretic fail... practicality/adoption feasibility of universal alignment methods

The gap between informal natural language requirements and precise program behavior (the 'intent gap') has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale.

Conceptual claim and argumentation in the paper; presented as an observed escalation in the scale of the existing 'intent gap' due to AI code generation. No quantitative evidence or sample size given in the excerpt.

high negative Intent Formalization: A Grand Challenge for Reliable Coding ... mismatch between intended and actual program behavior (intent gap) / resulting c...

The crowding-out effect of AI washing on green innovation is heterogeneous: private enterprises, small and medium-sized enterprises (SMEs), and firms in highly competitive sectors suffer more severe negative impacts.

Subgroup/heterogeneity analysis reported in the paper on the same sample of Chinese A-share listed companies (2006–2024); abstract identifies private firms, SMEs, and firms in highly competitive industries as more affected.

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (heterogeneous treatment effects across firm types and industri...

The negative relationship between AI washing and green innovation is transmitted through dual channels in both product and capital markets.

Mechanism analysis reported in the paper (presumably mediation or channel analysis) using the same dataset of Chinese A-share firms' annual reports and firm-level market data; abstract states product- and capital-market channels convey the crowding-out effect.

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (via product-market and capital-market channels)

Corporate AI washing exerts a significant crowding-out effect on green innovation.

Empirical analysis using semantic measures of 'AI washing' derived from large language model (LLM) analysis of annual reports for Chinese A-share listed companies (2006–2024); paper reports statistically significant negative relationship between AI washing and firms' green innovation (details of regression models not provided in abstract).

high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation

The capital-output elasticity dropped significantly, from 0.42 in 2010–2015 to 0.35 in 2016–2022.

Estimated from an extended Cobb–Douglas production function applied to China's economy over 2010–2022, with period split 2010–2015 vs 2016–2022 (as reported in the study summary).

high negative Analysis of China's Economic Growth Drivers: An Empirical St... capital-output elasticity (elasticity of output with respect to capital)
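
In a standard Cobb-Douglas form, output scales with capital raised to the power of the capital-output elasticity, so the elasticity tells how much output responds to a given percentage change in capital. The sketch below compares the two reported elasticities under that standard form, holding all other inputs fixed. The study's extended specification is not described in this summary, so the function and the 10% capital growth figure are illustrative assumptions.

```python
def output_gain_pct(alpha, capital_growth_pct=1.0):
    """Percent change in output for a percent change in capital, other inputs fixed.

    Assumes Y = A * K**alpha * (other inputs), so Y scales with K**alpha.
    """
    return ((1 + capital_growth_pct / 100) ** alpha - 1) * 100


# Reported elasticities by period; the 10% capital growth is a hypothetical input.
for period, alpha in [("2010-2015", 0.42), ("2016-2022", 0.35)]:
    print(period, round(output_gain_pct(alpha, 10.0), 2))
```

The lower elasticity in the later period means the same capital growth yields a smaller output gain.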

These dynamics amplify initial disparities and produce persistent performance gaps across the population.

Main theoretical conclusion of the paper: analysis of the proposed dynamical system showing amplification and persistence of gaps (authors' demonstrated result).

high negative Actionable Recourse in Competitive Environments: A Dynamic G... magnitude and persistence of performance disparities across population over time

Exclusion-based cohesion can produce state-contingent illusory precision together with effective input concentration and dynamic lock-in simultaneously—i.e., these phenomena co-occur under the model's parameter regimes.

Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... co-occurrence of multiple adverse outcomes: tail bias, observed disagreement, ef...

When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.

Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result, no empirical data.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... delay in regime recognition and magnitude/timing of corrective update

Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.

Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... effective number of independent decision inputs (information concentration)
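
One common way to express an effective number of inputs is the inverse Herfindahl index: one divided by the sum of squared normalized weights. The sketch below uses that measure to show how shrinking and discarding reports pulls the effective count below the nominal headcount. This is a swapped-in, generic measure; the paper's own definition is not given in this summary, and the weights are hypothetical.

```python
def effective_inputs(weights):
    """Inverse-Herfindahl effective count: 1 / sum of squared normalized weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    shares = [w / total for w in weights]
    return 1.0 / sum(s * s for s in shares)


# Ten nominal participants in both cases (hypothetical weights).
full_inclusion = [1.0] * 10                          # all reports count equally
with_exclusion = [1.0] * 4 + [0.2] * 3 + [0.0] * 3   # three shrunk, three discarded

print(effective_inputs(full_inclusion))
print(effective_inputs(with_exclusion))
```

With equal weights the effective count equals the nominal count; once some reports are down-weighted or dropped, it falls to roughly half of the ten nominal participants in this example.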

Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).

Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... observed disagreement (reported variance) versus true estimation error in tail r...

Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).

Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.

high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... estimation bias (especially under regime displacement/tail events)

The establishment of the China–ASEAN Free Trade Area (CAFTA) reduced regional trade policy uncertainty.

Empirical analysis treats CAFTA as an exogenous policy shock and measures a decline in regional trade policy uncertainty using firm‑ and trade‑level data from the China Industrial Enterprise Database and China Customs Database covering 2000–2014; identification via difference‑in‑differences (DID). (Sample sizes not specified in provided summary.)

high negative How regional trade policy uncertainty affects agricultural i... regional trade policy uncertainty (measured at regional/firm level)
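
The logic of a difference-in-differences estimate can be shown in its simplest two-group, two-period form: the change in the treated group minus the change in the comparison group. The sketch below is that textbook version only. The study's actual specification (controls, fixed effects, uncertainty measure) is not described in this summary, and the index values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-period, two-group difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical uncertainty index values before and after the policy (NOT study data).
estimate = did_estimate(
    treated_pre=[0.30, 0.34], treated_post=[0.20, 0.24],
    control_pre=[0.31, 0.33], control_post=[0.29, 0.31],
)
print(round(estimate, 3))
```

A negative estimate corresponds to uncertainty falling more in the treated group than in the comparison group over the same period.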

The authors note three limitations: generalizability may be limited because the study covers a single Fortune 500 lab; ABS results depend on model specification and calibration; and the operational definitions of 'resilience' and 'planning cycle' require careful reading.

Authors' reported limitations based on study design: single lab context (n = 23), dependence of ABS on model choices, and nontrivial operational definitions.

high negative The Algorithmic Canvas: On the Autopoietic Redefinition of S... generalizability and robustness of study findings

Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.

Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.

high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy; perceived meaningfulness (measured post-return to manual work)

Passive use of AI reduces perceived meaningfulness of work.

Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.

high negative Relying on AI at work reduces self-efficacy, ownership, and ... perceived meaningfulness of work

Passive use of AI reduces psychological ownership of the produced outputs.

Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.

high negative Relying on AI at work reduces self-efficacy, ownership, and ... psychological ownership of outputs

Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.

Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.

high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy (confidence to complete tasks without AI)

Securitization of economic dependencies—especially in strategic sectors (semiconductors, telecoms, cloud)—frames partner states as security risks and exposes them to blacklists, de-risking campaigns, and sudden loss of market access.

Process tracing of export controls and blacklisting episodes; chronologies of sanction/policy actions affecting firms and partners; policy documents and public lists (e.g., export-control lists). (Data sources: export-control lists, sanction policy documents, corporate/access denials; sample sizes not specified.)

high negative China-US Trade War and the Challenges for Developing Countri... incidence of blacklisting/sanctions affecting partners, sudden changes in market...

Large-scale AI models have significant energy and resource costs, creating a notable environmental footprint that must be addressed.

Narrative integration of prior empirical studies measuring compute, energy consumption, and embodied emissions of large models (cited literature); the review does not present new quantitative measurements itself.

high negative The Evolution and Societal Impact of Artificial Intelligence... energy consumption, carbon emissions, and resource use associated with large-sca...
