Evidence (11633 claims)
Claim counts by topic category (a claim may fall under multiple categories).

| Category | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding. (In some rows the four direction columns do not sum to the total, which suggests additional direction labels not shown here.)
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Conventional web analytics treats the human user as its fundamental unit of analysis, assuming stable preferences, identifiable intentions, and behavioral patterns that unfold over time.
Conceptual statement supported by literature synthesis and critique of standard web-analytics assumptions (position paper; no primary empirical sample reported).
Consequently, generated artifacts may exhibit brittle behavior and limited deployability.
Paper asserts that lack of production awareness leads to brittle artifacts and limited deployability; no quantitative measures or sample sizes provided in the abstract.
AI-assisted development tools often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments.
Asserted observation in the paper arguing limitations of general-purpose AI code generation when targeting production-ready systems; no empirical sample size or methodological details provided in the excerpt.
Current AI tools are not yet mature enough to replace developers.
Conclusion drawn from the controlled experiment and participant feedback comparing AI-assisted vs traditional task-splitting.
Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development.
Background/introductory statement in the paper describing the problem motivation; no experimental sample size reported for this claim.
Nominally cheaper models can incur higher total cost due to token-intensive reasoning.
Cost and token usage analysis reported in the paper showing that cheaper-per-token models may generate more tokens and thus incur higher total cost in practice.
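A minimal back-of-the-envelope sketch of this effect, using hypothetical per-token prices and token counts (not figures from the paper):

```python
# Hypothetical prices and token counts (not figures from the paper),
# illustrating how a lower per-token price can still mean a higher bill.
models = {
    "cheap_reasoner": {"usd_per_1k_tokens": 0.0005, "tokens_per_task": 12_000},
    "premium_model":  {"usd_per_1k_tokens": 0.0030, "tokens_per_task": 1_500},
}

for name, m in models.items():
    total = m["usd_per_1k_tokens"] * m["tokens_per_task"] / 1_000
    print(f"{name}: ${total:.4f} per task")
# cheap_reasoner: $0.0060 per task  <- nominally cheaper, costlier in total
# premium_model:  $0.0045 per task
```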
Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets.
Stated as background/motivation in the paper (conceptual claim; no empirical sample size reported).
LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
Synthesis/conclusion drawn from the observed prevalence, growth, distribution across fields and authorship patterns, and limited correction by moderation/publication processes described above.
Preprint moderation and journal publication processes capture only a fraction of these errors.
Comparison of hallucinated-reference prevalence in preprints versus versions that underwent moderation or journal publication, showing many errors remain uncaught.
There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both of the preceding modes.
Taxonomy created and justified in the paper via conceptual argument and illustrative discussion; no empirical classification study or prevalence estimates provided.
AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.
Conceptual claim supported by narrative reasoning and illustrative examples; no empirical data on scale, speed, or reach are given.
AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).
Argumentative/theoretical reasoning in the paper; no empirical measurement of incidence, observability, or persuasiveness provided.
When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.
Logical/qualitative argument and definition development in the paper (no empirical validation or measured instances provided).
AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.
Conceptual/theoretical argument presented in the paper; no empirical sample, quantitative data, or experiments reported.
Patent text similarity analysis confirms a 'homogenization trap' (AI-associated increases in patent-text similarity).
Text-similarity analysis of patent documents reported in the paper showing increased patent similarity associated with AI use.
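For illustration, a minimal text-similarity check of the kind such an analysis might run; TF-IDF plus mean pairwise cosine similarity is a standard choice, not necessarily the paper's actual pipeline:

```python
# Illustrative sketch of a patent-text similarity comparison (not the
# paper's pipeline): TF-IDF embeddings plus mean pairwise cosine
# similarity, computed separately for AI-associated and other patents.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mean_pairwise_similarity(texts):
    """Average cosine similarity over all distinct pairs of documents."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf)
    pairs = list(combinations(range(len(texts)), 2))
    return sum(sims[i, j] for i, j in pairs) / len(pairs)

# A 'homogenization trap' would show up as a higher mean similarity in
# the AI-associated group than in a comparable baseline group.
```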
Industry concentration negatively moderates the AI–innovation relationship.
Moderation analysis/interacted fixed-effects models indicating that higher industry concentration weakens the AI→innovation effect.
Cascade performance is limited primarily by structural cost (the cascade pays the cheap model before any escalation decision), rather than by a shortage of intermediate stages.
Synthesis of theoretical insights and empirical results reported in the paper (theoretical analysis of structural costs + empirical comparisons showing limited benefit from additional stages).
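A minimal sketch of a two-stage deterministic threshold cascade, assuming standard mechanics (the paper's exact escalation rule is not reproduced). The structural cost is visible in the control flow: the cheap model is paid on every query, including those that end up escalated.

```python
# Two-stage threshold cascade; each model is a callable returning
# (answer, confidence, cost). The cheap stage is always invoked, so its
# cost is sunk whenever the query escalates.
def cascade(query, cheap_model, expensive_model, threshold):
    answer, confidence, cost = cheap_model(query)  # always incurred
    if confidence >= threshold:
        return answer, cost
    exp_answer, _, exp_cost = expensive_model(query)
    return exp_answer, cost + exp_cost  # cheap-stage cost is sunk

# Example with stub models:
cheap = lambda q: ("draft", 0.4, 0.001)
strong = lambda q: ("final", 0.95, 0.02)
print(cascade("q", cheap, strong, threshold=0.8))  # ('final', 0.021)
```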
Optimized subsequence cascades do not deliver practically meaningful held-out gains over the pairwise envelope.
Empirical evaluation on the five benchmarks comparing optimized subsequence cascades to the pairwise envelope; reported lack of practically meaningful held-out improvement.
Within the deterministic threshold-cascade class, full fixed chains underperform the pairwise envelope.
Empirical comparison across the reported benchmarks and models showing that full fixed chains achieve worse cost-quality tradeoffs than the pairwise envelope (experimental results described in the paper).
The relative importance of AI-related equities as shock transmitters diminishes over time.
Time-varying estimates from the TVP-VAR showing a decline in the net transmitter contribution of the AI equities group across the sample.
The overall level of connectedness declines modestly following the public release of ChatGPT by OpenAI in November 2022.
Time-series comparison of aggregate connectedness measures derived from the TVP-VAR, with a reported modest post-November 2022 decline (event reference: ChatGPT release).
A policy irreversibility result: there exists a critical time before the singularity after which redistribution becomes politically impossible because wealth concentration makes feasible tax rates vanishingly small.
Proof/argument in the paper showing that as time approaches the singularity the set of tax rates that satisfy political-feasibility constraints (workers' budget / feasibility) shrinks to zero, implying a latest feasible intervention time.
Financialization amplifies the exponent of the super-exponential divergence by a factor γ_F/η.
Mathematical derivation in the paper showing that the exponent in the asymptotic growth rate near the singularity is multiplied by γ_F/η when including the financialization term γ_F K_f^2 and its coupling parameter η.
Near the singularity, the wealth ratio between capital owners and workers diverges super-exponentially.
Asymptotic analysis near the finite-time singularity showing that the ratio of capital-owner wealth to worker wealth grows faster than exponential (super-exponentially) as time approaches the blow-up time.
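An illustrative asymptotic form consistent with these two claims; here t_*, A, and p are placeholder symbols, and only γ_F and η come from the paper:

```latex
% Illustrative only: A, p, t_* are placeholders; \gamma_F and \eta are
% the paper's coupling parameters.
\[
\frac{K_{\text{owners}}(t)}{W_{\text{workers}}(t)}
  \;\sim\; \exp\!\left(\frac{A}{(t_\ast - t)^{\,p}}\right)
  \quad \text{as } t \to t_\ast^{-},
\qquad
p \;\longmapsto\; \frac{\gamma_F}{\eta}\,p \ \text{ with financialization.}
\]
```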
AI adoption deepens the negative indirect effect of CEO–TMT faultlines on green innovation via reduced eco-attention (moderated mediation).
Reported moderated mediation analysis on the panel dataset (35,347 firm-year observations) showing that AI moderates the indirect path from CEO–TMT faultlines to green innovation through eco-attention, making the indirect effect more negative when AI adoption is greater.
AI technology strengthens the negative relationship between CEO–TMT faultlines and eco-attention (AI exacerbates the adverse effect of faultlines on eco-attention).
Moderation/interaction analysis reported in the paper using the same panel dataset (35,347 firm-year observations) indicating a significant interaction between AI adoption and CEO–TMT faultlines on eco-attention.
CEO–TMT faultlines reduce eco-attention (organizational attention to environmental issues).
Direct association reported in the paper from regression/mediation models using the panel dataset (35,347 firm-year observations) showing a negative relationship between CEO–TMT faultlines and eco-attention.
CEO–TMT faultlines negatively affect green innovation through reduced eco-attention.
Empirical mediation analysis on the panel dataset (35,347 firm-year observations, 2010–2023) testing CEO–TMT faultlines -> eco-attention -> green innovation.
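Schematically, these four results correspond to a standard moderated-mediation specification; the linear form and variable names below are illustrative, not the paper's exact equations:

```latex
% Illustrative specification; variable names and linear form are ours.
\[
\begin{aligned}
\text{EcoAttention}_{it} &= a_0 + a_1\,\text{Faultline}_{it} + a_2\,\text{AI}_{it}
  + a_3\,(\text{Faultline}_{it}\times\text{AI}_{it}) + \gamma' X_{it} + \varepsilon_{it},\\
\text{GreenInnov}_{it} &= b_0 + b_1\,\text{Faultline}_{it}
  + b_2\,\text{EcoAttention}_{it} + \delta' X_{it} + u_{it},
\end{aligned}
\qquad
\text{conditional indirect effect} = (a_1 + a_3\,\text{AI})\, b_2 .
\]
```

Under this reading, the reported pattern (a_1 < 0, a_3 < 0, b_2 > 0) would make the indirect effect increasingly negative at higher AI adoption.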
Municipal 311 call centers and complaint intake systems face a structural mismatch between incoming volume and classification capacity, producing a bottleneck and differential service quality along income and racial lines.
Stated in the paper's introduction; cites prior work (Liu 2024 SLA) as support for the differential service-quality / demographic claim. No sample size or quantitative result reported in the excerpt.
Organizational resistance to technological change hinders AI adoption in logistics operations.
Qualitative synthesis of 31 reviewed publications identifying organizational and cultural barriers to AI uptake.
Data security concerns are a key barrier to adopting AI in global supply chains.
Synthesis of themes from 31 scholarly sources in the structured literature review highlighting data/security-related implementation issues.
High initial investment costs are a significant barrier to AI implementation in logistics.
Synthesis of literature (31 sources) reporting implementation challenges and barriers identified across studies.
AGI (Artificial General Intelligence) is problematic both conceptually and definitionally.
Authorial assertion in the paper stating AGI is problematic as a concept and definition; framed as a conditioning assumption that shapes the subsequent analysis.
The paper argues we should avoid assuming the inevitability of the current situation relating to AI (i.e., the current commercial AI development trajectory is not inevitable).
Authorial methodological claim in the paper's framing/introductory text; presented as a normative methodological stance rather than empirical evidence.
There is an absence of agreed-upon benchmarks for evaluating AI systems.
Introductory chapter notes lack of standardized evaluation benchmarks as a cross-cutting concern; presented as an analytical observation by the task force.
AI systems exhibit bias.
Introductory chapter points to bias in AI systems as a recurring theme; supported by the broader literature cited in the report (no numerical sample reported in the introduction).
AI model outputs are often opaque and non-replicable.
Introductory chapter identifies opacity and non-replicability of AI outputs as a cross-cutting theme; claim is based on literature synthesis and conceptual critique in the report.
A small number of AI corporations have unprecedented power.
Introductory chapter highlights the theme of concentrated corporate power in AI; asserted as an observational claim in the report's framing rather than derived from a presented empirical sample in the introduction.
Existing coordination approaches often occupy two extremes: highly structured methods that rely on fixed roles/pipelines assigned a priori, and fully unstructured teams that enable adaptability but suffer inefficiencies like error propagation, inter-agent conflicts, and wasted resources.
Framing/background claim made in the paper (conceptual argument motivating LATTE).
The price-setter for cognitive labor is no longer the labor market.
Central normative/conceptual claim of the paper, supported by the analytical model and the CAW bound: the authors argue that the compute capital market (through the rental price of compute) sets the effective price for cognitive labor. Stated as the paper's concise position; based on theoretical derivation and argumentation.
Compute-Anchored Wage (CAW) bound: on tasks where human and agent cognitive labor are substitutes, the competitive human wage is bounded above by λ · k · r_c (where r_c is the rental rate of compute capital, k is the compute intensity of one effective agent-labor unit, and λ is the relative human-to-agent productivity).
Formal analytical result presented in the paper (mathematical derivation within the factor-pricing model). This is a theoretical bound derived from the model rather than an empirical estimate.
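The bound follows from a one-line substitution argument; the sketch below uses the claim's own notation, since any competitive wage above λ·k·r_c would be undercut by renting compute instead:

```latex
% Sketch of the substitution argument behind the CAW bound.
\[
\text{cost of one agent-labor unit} = k\,r_c, \qquad
\text{one human labor unit} \equiv \lambda \text{ agent units}
\;\Longrightarrow\;
w \;\le\; \lambda\, k\, r_c .
\]
```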
Once agents are recognized as a production technology, the elastic-supply margin that anchors the equilibrium wage migrates from the labor market to the compute capital market.
Analytical derivation using a textbook factor-pricing framework (citing Mankiw 2020) within the paper's theoretical model; derivation and verbal argument linking supply-elasticity margins to compute capital market. No empirical data reported in the excerpt.
Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels.
Empirical experiments reported in the paper evaluating three frontier large language models on three task domains (short stories, marketing slogans, alternative-uses) and finding ρ < 1 (below parity) across crowding kernels. The abstract specifies three models but does not report the number of generated samples per model or other sample-size details.
This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding.
Theoretical/conceptual claim in the paper arguing that improvements at the individual-output level can still increase similarity (crowding) at the population level; no empirical numbers given in the abstract.
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones.
Conceptual argument presented in the paper's introduction motivating a population-level perspective on creative outputs (no empirical sample size reported).
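A hedged sketch of what a population-level crowding penalty can look like; the Gaussian kernel and the quality-discounting rule here are illustrative, not the paper's crowding kernels or its parity statistic ρ:

```python
# Hedged sketch of a population-level crowding penalty. The Gaussian
# kernel and the discounting rule are illustrative assumptions.
import numpy as np

def population_value(quality, embeddings, bandwidth=1.0):
    """Sum of per-output qualities, each discounted by how crowded its
    neighborhood in embedding space is (self-similarity counts as 1)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    kernel = np.exp(-(diffs ** 2).sum(-1) / (2 * bandwidth ** 2))
    crowding = kernel.sum(axis=1)
    return float((quality / crowding).sum())

# Duplicating a high-quality output raises mean individual quality but
# inflates every duplicate's crowding term, so population value can fall.
spread = population_value(np.array([1.0, 1.0]), np.array([[0.0], [5.0]]))
cloned = population_value(np.array([1.0, 1.0]), np.array([[0.0], [0.0]]))
print(spread, cloned)  # ~2.0 vs exactly 1.0
```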
The Price of Fairness can be large even when group distributions are nearly identical.
Theoretical result/constructive example in the paper showing instances where PoF is large despite near-identical group distributions.
Enforcing static fairness constraints may exacerbate long-run disparities.
Statement referencing recent prior theoretical results and motivating literature; framed as background/motivation in the paper.
Any metric that scores variants directly is manipulable as soon as two equivalent variants in a harmful class disagree in score.
Formal theoretical result/proof presented in the paper based on the transformation-graph semantic-class model.
Once announced, such a metric becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing true harm.
Modeling argument in the paper (transformation graph / semantic classes) and supported by formal analysis and experimental checks described in the paper.
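The core step of the argument, restated schematically (the class and metric notation here is ours, not necessarily the paper's):

```latex
% v_1, v_2: semantically equivalent variants in harmful class H;
% m: variant-level metric; h: true harm, defined on the class [v].
\[
v_1 \sim v_2 \in H,\ \ m(v_1) < m(v_2)
\;\Longrightarrow\;
\text{routing via } v_1 \text{ lowers the measured score by } m(v_2) - m(v_1),
\ \text{ while true harm } h([v_1]) = h([v_2]) \text{ is unchanged.}
\]
```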
We contribute a non-additive harm decomposition (welfare loss W, coverage loss C) that exposes how attrition shifts harm from the regulator-accountable surface to a regulator-invisible one.
Methodological contribution in the paper: definition of welfare loss W and coverage loss C and analysis showing attrition reallocates observable vs. unobservable harm; supported by theoretical exposition and simulation examples.