Evidence (11633 claims)
Claim counts by topic category (a claim may fall under multiple categories).

| Category | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding. (In some rows the four direction columns do not sum to the total, which suggests additional direction labels not shown here.)
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Conventional web analytics treats the human user as its fundamental unit of analysis, assuming stable preferences, identifiable intentions, and behavioral patterns that unfold over time.
Conceptual statement supported by literature synthesis and critique of standard web-analytics assumptions (position paper; no primary empirical sample reported).
Consequently, generated artifacts may exhibit brittle behavior and limited deployability.
Paper asserts that lack of production awareness leads to brittle artifacts and limited deployability; no quantitative measures or sample sizes provided in the abstract.
AI-assisted development tools often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments.
Asserted observation in the paper arguing limitations of general-purpose AI code generation when targeting production-ready systems; no empirical sample size or methodological details provided in the excerpt.
Current AI tools are not yet mature enough to replace developers.
Conclusion drawn from the controlled experiment and participant feedback comparing AI-assisted vs traditional task-splitting.
Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development.
Background/introductory statement in the paper describing the problem motivation; no experimental sample size reported for this claim.
Nominally cheaper models can incur higher total cost due to token-intensive reasoning.
Cost and token usage analysis reported in the paper showing that cheaper-per-token models may generate more tokens and thus incur higher total cost in practice.
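A minimal back-of-the-envelope sketch of this effect, using hypothetical per-token prices and token counts (not figures from the paper):

```python
# Hypothetical prices and token counts (not figures from the paper),
# illustrating how a lower per-token price can still mean a higher bill.
models = {
    "cheap_reasoner": {"usd_per_1k_tokens": 0.0005, "tokens_per_task": 12_000},
    "premium_model":  {"usd_per_1k_tokens": 0.0030, "tokens_per_task": 1_500},
}

for name, m in models.items():
    total = m["usd_per_1k_tokens"] * m["tokens_per_task"] / 1_000
    print(f"{name}: ${total:.4f} per task")
# cheap_reasoner: $0.0060 per task  <- nominally cheaper, costlier in total
# premium_model:  $0.0045 per task
```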
Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets.
Stated as background/motivation in the paper (conceptual claim; no empirical sample size reported).
LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
Synthesis/conclusion drawn from the observed prevalence, growth, distribution across fields and authorship patterns, and limited correction by moderation/publication processes described above.
Preprint moderation and journal publication processes capture only a fraction of these errors.
Comparison of hallucinated-reference prevalence in preprints versus versions that underwent moderation or journal publication, showing many errors remain uncaught.
There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both of the preceding modes.
Taxonomy created and justified in the paper via conceptual argument and illustrative discussion; no empirical classification study or prevalence estimates provided.
AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.
Conceptual claim supported by narrative reasoning and illustrative examples; no empirical data on scale, speed, or reach are given.
AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).
Argumentative/theoretical reasoning in the paper; no empirical measurement of incidence, observability, or persuasiveness provided.
When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.
Logical/qualitative argument and definition development in the paper (no empirical validation or measured instances provided).
AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.
Conceptual/theoretical argument presented in the paper; no empirical sample, quantitative data, or experiments reported.
Patent text similarity analysis confirms a 'homogenization trap' (AI-associated increases in patent-text similarity).
Text-similarity analysis of patent documents reported in the paper showing increased patent similarity associated with AI use.
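For illustration, a minimal text-similarity check of the kind such an analysis might run; TF-IDF plus mean pairwise cosine similarity is a standard choice, not necessarily the paper's actual pipeline:

```python
# Illustrative sketch of a patent-text similarity comparison (not the
# paper's pipeline): TF-IDF embeddings plus mean pairwise cosine
# similarity, computed separately for AI-associated and other patents.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mean_pairwise_similarity(texts):
    """Average cosine similarity over all distinct pairs of documents."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf)
    pairs = list(combinations(range(len(texts)), 2))
    return sum(sims[i, j] for i, j in pairs) / len(pairs)

# A 'homogenization trap' would show up as a higher mean similarity in
# the AI-associated group than in a comparable baseline group.
```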
Industry concentration negatively moderates the AI–innovation relationship.
Moderation analysis/interacted fixed-effects models indicating that higher industry concentration weakens the AI→innovation effect.
Cascade performance is limited primarily by structural cost (the cascade pays the cheap model before any escalation decision), rather than by a shortage of intermediate stages.
Synthesis of theoretical insights and empirical results reported in the paper (theoretical analysis of structural costs + empirical comparisons showing limited benefit from additional stages).
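A minimal sketch of a two-stage deterministic threshold cascade, assuming standard mechanics (the paper's exact escalation rule is not reproduced). The structural cost is visible in the control flow: the cheap model is paid on every query, including those that end up escalated.

```python
# Two-stage threshold cascade; each model is a callable returning
# (answer, confidence, cost). The cheap stage is always invoked, so its
# cost is sunk whenever the query escalates.
def cascade(query, cheap_model, expensive_model, threshold):
    answer, confidence, cost = cheap_model(query)  # always incurred
    if confidence >= threshold:
        return answer, cost
    exp_answer, _, exp_cost = expensive_model(query)
    return exp_answer, cost + exp_cost  # cheap-stage cost is sunk

# Example with stub models:
cheap = lambda q: ("draft", 0.4, 0.001)
strong = lambda q: ("final", 0.95, 0.02)
print(cascade("q", cheap, strong, threshold=0.8))  # ('final', 0.021)
```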
Optimized subsequence cascades do not deliver practically meaningful held-out gains over the pairwise envelope.
Empirical evaluation on the five benchmarks comparing optimized subsequence cascades to the pairwise envelope; reported lack of practically meaningful held-out improvement.
Within the deterministic threshold-cascade class, full fixed chains underperform the pairwise envelope.
Empirical comparison across the reported benchmarks and models showing that full fixed chains achieve worse cost-quality tradeoffs than the pairwise envelope (experimental results described in the paper).
The relative importance of AI-related equities as shock transmitters diminishes over time.
Time-varying estimates from the TVP-VAR showing a decline in the net transmitter contribution of the AI equities group across the sample.
The overall level of connectedness declines modestly following the public release of ChatGPT by OpenAI in November 2022.
Time-series comparison of aggregate connectedness measures derived from the TVP-VAR, with a reported modest post-November 2022 decline (event reference: ChatGPT release).
A policy irreversibility result: there exists a critical time before the singularity after which redistribution becomes politically impossible because wealth concentration makes feasible tax rates vanishingly small.
Proof/argument in the paper showing that as time approaches the singularity the set of tax rates that satisfy political-feasibility constraints (workers' budget / feasibility) shrinks to zero, implying a latest feasible intervention time.
Financialization amplifies the exponent of the super-exponential divergence by a factor γ_F/η.
Mathematical derivation in the paper showing that the exponent in the asymptotic growth rate near the singularity is multiplied by γ_F/η when including the financialization term γ_F K_f^2 and its coupling parameter η.
Near the singularity, the wealth ratio between capital owners and workers diverges super-exponentially.
Asymptotic analysis near the finite-time singularity showing that the ratio of capital-owner wealth to worker wealth grows faster than exponential (super-exponentially) as time approaches the blow-up time.
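An illustrative asymptotic form consistent with these two claims; here t_*, A, and p are placeholder symbols, and only γ_F and η come from the paper:

```latex
% Illustrative only: A, p, t_* are placeholders; \gamma_F and \eta are
% the paper's coupling parameters.
\[
\frac{K_{\text{owners}}(t)}{W_{\text{workers}}(t)}
  \;\sim\; \exp\!\left(\frac{A}{(t_\ast - t)^{\,p}}\right)
  \quad \text{as } t \to t_\ast^{-},
\qquad
p \;\longmapsto\; \frac{\gamma_F}{\eta}\,p \ \text{ with financialization.}
\]
```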
AI adoption deepens the negative indirect effect of CEO–TMT faultlines on green innovation via reduced eco-attention (moderated mediation).
Reported moderated mediation analysis on the panel dataset (35,347 firm-year observations) showing that AI moderates the indirect path from CEO–TMT faultlines to green innovation through eco-attention, making the indirect effect more negative when AI adoption is greater.
AI technology strengthens the negative relationship between CEO–TMT faultlines and eco-attention (AI exacerbates the adverse effect of faultlines on eco-attention).
Moderation/interaction analysis reported in the paper using the same panel dataset (35,347 firm-year observations) indicating a significant interaction between AI adoption and CEO–TMT faultlines on eco-attention.
CEO–TMT faultlines reduce eco-attention (organizational attention to environmental issues).
Direct association reported in the paper from regression/mediation models using the panel dataset (35,347 firm-year observations) showing a negative relationship between CEO–TMT faultlines and eco-attention.
CEO–TMT faultlines negatively affect green innovation through reduced eco-attention.
Empirical mediation analysis on the panel dataset (35,347 firm-year observations, 2010–2023) testing CEO–TMT faultlines -> eco-attention -> green innovation.
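Schematically, these four results correspond to a standard moderated-mediation specification; the linear form and variable names below are illustrative, not the paper's exact equations:

```latex
% Illustrative specification; variable names and linear form are ours.
\[
\begin{aligned}
\text{EcoAttention}_{it} &= a_0 + a_1\,\text{Faultline}_{it} + a_2\,\text{AI}_{it}
  + a_3\,(\text{Faultline}_{it}\times\text{AI}_{it}) + \gamma' X_{it} + \varepsilon_{it},\\
\text{GreenInnov}_{it} &= b_0 + b_1\,\text{Faultline}_{it}
  + b_2\,\text{EcoAttention}_{it} + \delta' X_{it} + u_{it},
\end{aligned}
\qquad
\text{conditional indirect effect} = (a_1 + a_3\,\text{AI})\, b_2 .
\]
```

Under this reading, the reported pattern (a_1 < 0, a_3 < 0, b_2 > 0) would make the indirect effect increasingly negative at higher AI adoption.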
Municipal 311 call centers and complaint intake systems face a structural mismatch between incoming volume and classification capacity, producing a bottleneck and differential service quality along income and racial lines.
Stated in the paper's introduction; cites prior work (Liu 2024 SLA) as support for the differential service-quality / demographic claim. No sample size or quantitative result reported in the excerpt.
Organizational resistance to technological change hinders AI adoption in logistics operations.
Qualitative synthesis of 31 reviewed publications identifying organizational and cultural barriers to AI uptake.
Data security concerns are a key barrier to adopting AI in global supply chains.
Synthesis of themes from 31 scholarly sources in the structured literature review highlighting data/security-related implementation issues.
High initial investment costs are a significant barrier to AI implementation in logistics.
Synthesis of literature (31 sources) reporting implementation challenges and barriers identified across studies.
AGI (Artificial General Intelligence) is problematic both conceptually and definitionally.
Authorial assertion in the paper stating AGI is problematic as a concept and definition; framed as a conditioning assumption that shapes the subsequent analysis.
The paper argues we should avoid assuming the inevitability of the current situation relating to AI (i.e., the current commercial AI development trajectory is not inevitable).
Authorial methodological claim in the paper's framing/introductory text; presented as a normative methodological stance rather than empirical evidence.
There is an absence of agreed-upon benchmarks for evaluating AI systems.
Introductory chapter notes lack of standardized evaluation benchmarks as a cross-cutting concern; presented as an analytical observation by the task force.
AI systems exhibit bias.
Introductory chapter points to bias in AI systems as a recurring theme; supported by the broader literature cited in the report (no numerical sample reported in the introduction).
AI model outputs are often opaque and non-replicable.
Introductory chapter identifies opacity and non-replicability of AI outputs as a cross-cutting theme; claim is based on literature synthesis and conceptual critique in the report.
A small number of AI corporations have unprecedented power.
Introductory chapter highlights the theme of concentrated corporate power in AI; asserted as an observational claim in the report's framing rather than derived from a presented empirical sample in the introduction.
Existing coordination approaches often occupy two extremes: highly structured methods that rely on fixed roles/pipelines assigned a priori, and fully unstructured teams that enable adaptability but suffer inefficiencies like error propagation, inter-agent conflicts, and wasted resources.
Framing/background claim made in the paper (conceptual argument motivating LATTE).
The price-setter for cognitive labor is no longer the labor market.
Central normative/conceptual claim of the paper, supported by the analytical model and the CAW bound: the authors argue that the compute capital market (through the rental price of compute) sets the effective price for cognitive labor. Stated as the paper's concise position; based on theoretical derivation and argumentation.
Compute-Anchored Wage (CAW) bound: on tasks where human and agent cognitive labor are substitutes, the competitive human wage is bounded above by λ · k · r_c (where r_c is the rental rate of compute capital, k is the compute intensity of one effective agent-labor unit, and λ is the relative human-to-agent productivity).
Formal analytical result presented in the paper (mathematical derivation within the factor-pricing model). This is a theoretical bound derived from the model rather than an empirical estimate.
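The bound follows from a one-line substitution argument; the sketch below uses the claim's own notation, since any competitive wage above λ·k·r_c would be undercut by renting compute instead:

```latex
% Sketch of the substitution argument behind the CAW bound.
\[
\text{cost of one agent-labor unit} = k\,r_c, \qquad
\text{one human labor unit} \equiv \lambda \text{ agent units}
\;\Longrightarrow\;
w \;\le\; \lambda\, k\, r_c .
\]
```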
Once agents are recognized as a production technology, the elastic-supply margin that anchors the equilibrium wage migrates from the labor market to the compute capital market.
Analytical derivation using a textbook factor-pricing framework (citing Mankiw 2020) within the paper's theoretical model; derivation and verbal argument linking supply-elasticity margins to compute capital market. No empirical data reported in the excerpt.
Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels.
Empirical experiments reported in the paper evaluating three frontier large language models on three task domains (short stories, marketing slogans, alternative-uses) and finding ρ < 1 (below parity) across crowding kernels. The abstract specifies three models but does not report the number of generated samples per model or other sample-size details.
This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding.
Theoretical/conceptual claim in the paper arguing that improvements at the individual-output level can still increase similarity (crowding) at the population level; no empirical numbers given in the abstract.
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones.
Conceptual argument presented in the paper's introduction motivating a population-level perspective on creative outputs (no empirical sample size reported).
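A hedged sketch of what a population-level crowding penalty can look like; the Gaussian kernel and the quality-discounting rule here are illustrative, not the paper's crowding kernels or its parity statistic ρ:

```python
# Hedged sketch of a population-level crowding penalty. The Gaussian
# kernel and the discounting rule are illustrative assumptions.
import numpy as np

def population_value(quality, embeddings, bandwidth=1.0):
    """Sum of per-output qualities, each discounted by how crowded its
    neighborhood in embedding space is (self-similarity counts as 1)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    kernel = np.exp(-(diffs ** 2).sum(-1) / (2 * bandwidth ** 2))
    crowding = kernel.sum(axis=1)
    return float((quality / crowding).sum())

# Duplicating a high-quality output raises mean individual quality but
# inflates every duplicate's crowding term, so population value can fall.
spread = population_value(np.array([1.0, 1.0]), np.array([[0.0], [5.0]]))
cloned = population_value(np.array([1.0, 1.0]), np.array([[0.0], [0.0]]))
print(spread, cloned)  # ~2.0 vs exactly 1.0
```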
The Price of Fairness can be large even when group distributions are nearly identical.
Theoretical result/constructive example in the paper showing instances where PoF is large despite near-identical group distributions.
Enforcing static fairness constraints may exacerbate long-run disparities.
Statement referencing recent prior theoretical results and motivating literature; framed as background/motivation in the paper.
Any metric that scores variants directly is manipulable as soon as two equivalent variants in a harmful class disagree in score.
Formal theoretical result/proof presented in the paper based on the transformation-graph semantic-class model.
Once announced, such a metric becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing true harm.
Modeling argument in the paper (transformation graph / semantic classes) and supported by formal analysis and experimental checks described in the paper.
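The core step of the argument, restated schematically (the class and metric notation here is ours, not necessarily the paper's):

```latex
% v_1, v_2: semantically equivalent variants in harmful class H;
% m: variant-level metric; h: true harm, defined on the class [v].
\[
v_1 \sim v_2 \in H,\ \ m(v_1) < m(v_2)
\;\Longrightarrow\;
\text{routing via } v_1 \text{ lowers the measured score by } m(v_2) - m(v_1),
\ \text{ while true harm } h([v_1]) = h([v_2]) \text{ is unchanged.}
\]
```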
We contribute a non-additive harm decomposition (welfare loss W, coverage loss C) that exposes how attrition shifts harm from the regulator-accountable surface to a regulator-invisible one.
Methodological contribution in the paper: definition of welfare loss W and coverage loss C and analysis showing attrition reallocates observable vs. unobservable harm; supported by theoretical exposition and simulation examples.