Evidence (11633 claims)

Claim counts by topic:

- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
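As a rough reading aid, the direction columns can be converted into a positive-claim share per outcome. A minimal sketch using three rows whose four shown direction columns sum exactly to the Total (for several other rows the shown columns sum to slightly less than the Total, which suggests direction labels not displayed in this table):

```python
# Illustrative only: positive-claim share for selected rows of the
# evidence matrix above (figures copied from the table).
rows = {
    # outcome: (positive, negative, mixed, null)
    "Job Displacement": (11, 71, 16, 1),
    "Error Rate": (64, 78, 8, 1),
    "Inequality Measures": (36, 105, 40, 6),
}

for outcome, (pos, neg, mixed, null) in rows.items():
    total = pos + neg + mixed + null
    print(f"{outcome}: total={total}, positive share={pos / total:.1%}")
```

On these rows the positive share ranges from about 11% (Job Displacement) to about 42% (Error Rate), which matches the qualitative pattern that harm-framed outcomes skew negative.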
Firms do not internalize the congestion externality they impose on the retraining queue, the irreversibility of permanent exit, or the wage depression borne by non-routine incumbents — explaining why market adoption speed exceeds the social optimum.
Model-based mechanism: normative/comparative analysis showing omitted externalities in firm-level optimization relative to social planner, leading to divergence between private and social adoption speeds.
Social welfare is strictly concave in adoption speed and is maximized at an interior optimum below the market rate of adoption.
Analytical welfare optimization in the theoretical model: social-welfare function as a function of adoption speed yields strict concavity and an interior social optimum; comparison with market equilibrium adoption speed indicates market rate exceeds social optimum.
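The concavity claim can be illustrated with a toy quadratic welfare function. This is our own stylized choice, not the paper's specification: benefits linear in adoption speed `a`, congestion and displacement costs convex, and a hypothetical externality share `phi` that private firms ignore.

```python
# Toy sketch (not the paper's model): welfare strictly concave in
# adoption speed a, with an interior social optimum below the market rate.
B, C = 10.0, 2.0           # hypothetical benefit and convex-cost parameters

def welfare(a):
    # W(a) = B*a - C*a^2, so W''(a) = -2C < 0: strictly concave
    return B * a - C * a * a

a_social = B / (2 * C)      # first-order condition: W'(a*) = B - 2C*a* = 0
# Firms ignore the share phi of costs borne by others (the congestion
# externality), so they optimize as if the cost were (1 - phi) * C.
phi = 0.5
a_market = B / (2 * (1 - phi) * C)

print(a_social, a_market)
assert a_market > a_social                      # market over-adopts
assert welfare(a_social) > welfare(a_market)    # welfare loss at market speed
```

With these numbers the social optimum is a = 2.5 while the market rate is a = 5.0, reproducing the divergence the claim describes.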
Faster adoption causes a sustained compression of the labor share throughout the transition window.
Model result showing time-path of labor's income share under varying adoption speeds in the theoretical framework.
Faster adoption produces a steeper and more persistent decline in labor force participation.
Dynamic model trajectories and comparative statics showing time path of labor force participation under different adoption-speed parameters.
Faster adoption overwhelms the retraining pipeline and generates permanent labor-force exit through worker discouragement.
Model mechanism: finite-capacity retraining queue in the dynamic model leads to queue congestion, producing a discouraged stock of permanently exited workers (analytical result in the theoretical model).
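The congestion mechanism can be sketched as a discrete-time queue. All parameters here are hypothetical, not the paper's calibration: displaced workers arrive at a rate scaled by adoption speed, the retraining queue has fixed capacity, and arrivals who find the queue full exit permanently.

```python
# Minimal sketch (hypothetical parameters): a finite-capacity retraining
# queue in which overflow arrivals become permanently discouraged exits.
def simulate(adoption_speed, periods=50, capacity=100, service_rate=20):
    queue, exited = 0, 0
    for _ in range(periods):
        arrivals = int(adoption_speed * 30)     # displacement scales with speed
        admitted = min(arrivals, max(capacity - queue, 0))
        queue += admitted
        exited += arrivals - admitted           # congestion -> permanent exit
        queue = max(queue - service_rate, 0)    # retraining completions
    return exited

slow, fast = simulate(0.5), simulate(1.5)
print(slow, fast)
assert fast > slow   # faster adoption yields a larger discouraged stock
```

At the slow speed the queue never fills and no one exits; at triple the speed the queue saturates within a few periods and overflow exits accumulate every period thereafter, mirroring the analytical mechanism.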
A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound.
Controlled experiments comparing inline evaluation vs simulated and real agentic tool-use on GPT-5.1; reported 0% trust in inline mode vs 100% trust in agentic modes (authors' reported results).
Every tested model trusts poisoned data at 100% at moderate attacker sophistication (L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries.
Primary experimental results across 270 directed-query trials (9 models × 30 each); authors report 269 of 270 trials accepted fabricated security claims under attacker sophistication level L2.
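A quick consistency check on the reported counts (all figures taken from the text above):

```python
# 9 models x 30 directed-query trials each = 270 total trials;
# 269 of 270 accepted fabricated security claims at level L2.
models, trials_per_model = 9, 30
total = models * trials_per_model
accepted = 269
print(total, f"{accepted / total:.1%}")  # acceptance rate ~99.6%
```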
We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system.
Empirical demonstrations described in paper: six distinct attack scenarios executed against a production knowledge graph containing 42 million nodes (authors' reported experimental setup).
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning.
Conceptual definition presented by the authors in the paper (theoretical framing and distinction from prompt injection).
Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens.
Argument in paper that existing governance/audit tools designed for ranked-list or older UIs do not cover the new single-sentence prose-recommendation surface; no empirical test reported in excerpt.
Common failures include replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns.
Error-mode analysis described in the paper/abstract showing that models substitute complex CAD operations (sweep, loft, twist-extrude) with simpler sketch-and-extrude sequences.
Common failures include misinterpreting industrial design parameters.
Reported error analysis in the paper/abstract indicating models often misinterpret engineering/design parameters when generating CAD programs.
Common failures include missing fine 3D structure.
Qualitative and quantitative analysis of model outputs on BenchCAD reported in the paper/abstract noting missing fine 3D structural details as a frequent error mode.
Current AI development trajectory reflects value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability.
Normative/critical analysis in the paper highlighting design priorities and trade-offs; no empirical measurement provided.
Sustained investment in large-scale chatbot infrastructures increases environmental costs.
Paper asserts environmental impacts from infrastructure investment (energy, resource use) as part of systemic critique; no quantified environmental measurements or sample size reported.
Chatbot-driven AI development contributes to concentration of economic power.
Argumentation about industry dynamics and infrastructure centralization in the paper; no empirical market-concentration metrics or sample provided.
The normalization of chatbots contributes to labor displacement.
Theoretical argument linking widespread chatbot adoption to changes in work and employment; no empirical displacement estimates provided.
Normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise.
Analytical reasoning and literature-informed claims in the paper; no quantitative measurement or sample reported.
Chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority.
Qualitative argumentation and illustrative examples in the paper; no reported controlled empirical study or sample size.
The chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems.
Conceptual argument and synthesis in the paper (theoretical analysis); no empirical sample or quantitative data reported.
This stance frequently produces excessive reliance on mechanistic interpretability to address a deployment challenge beyond its intended scope.
Author argument drawing on conceptual critique and cited empirical distinctions (paper's argumentative content).
AI deployment in sensitive domains (health care, credit, employment, criminal justice) is often treated as unsafe to authorize until model internals can be explained.
Author assertion based on observed regulatory and institutional tendencies described in the paper (argumentative / contextual evidence within the paper).
A scoping review found that only 9.0% of FDA-approved AI/ML device documents contained a prospective post-market surveillance study.
Paper references a scoping review that examined FDA-approved AI/ML device documents and reported the 9.0% figure.
A 53-percentage-point gap between internal representations and output correction shows that understanding may not translate into action.
Paper cites a recent empirical finding reporting a 53 percentage-point gap between models' internal representations and their ability to correct outputs (described as 'recent evidence').
Human capital and technological innovation channels show weaker or even negative effects on Lae, attributed to short-term resource misallocation and skill mismatches.
Spatial mediation analysis (channel analysis) using panel data for 30 provincial regions (2012–2022) assessing mediating roles of human capital and technological innovation.
Functional deployment and operational investment in AI are associated with employment declines.
Regression analyses from the BTOS AI supplement linking measures of functional AI deployment and operational AI investment to firm-reported employment changes; observational associations (sample size and exact model specification not shown in excerpt).
Employment reductions attributable to AI are rare: only 2% of firms report them.
Firm self-reports on employment outcomes related to AI from the BTOS AI supplement (Nov 2025–Jan 2026); descriptive statistic reported; sample size not excerpted.
Among firms with worker-level AI use, 65% restrict use to three or fewer tasks.
Descriptive statistic from BTOS AI supplement giving distribution of number of worker tasks using AI among firms that report worker-level use; sample size not shown.
Among adopter firms, scope remains limited: 57% use AI in three or fewer functions.
Descriptive distribution of number of business functions using AI among adopter firms in the BTOS AI supplement (Nov 2025–Jan 2026); sample restricted to adopter firms (sample size not provided).
In labor-intensive industries, industrial robots shorten the backward linkage length.
Heterogeneity analysis in the paper comparing effects across labor-intensive sub-sectors within the panel of 14 manufacturing sub-sectors; reported finding of a negative effect on backward linkage length in labor-intensive industries.
Institutional inertia in property valuation poses risks to asset pricing, collateral risk modelling and investor confidence.
Analytical inference from interview findings and theoretical synthesis highlighting implications for property investment and financial market stability.
Despite advances in automation, data analytics and AI, the sector has been slow to digitise.
Background statement supported by interview data and sector observation reported in the study.
The IDOI framework provides a transferable model for understanding digital transformation in regulated, high-trust professions and highlights the market-level risks of institutional inertia in property valuation.
Development of the IDOI conceptual framework from qualitative data and theoretical integration; authors' claim about transferability and implications.
Generational divides, protectionist attitudes and fears of automation reinforce digital resistance.
Qualitative interview evidence reporting attitudes across cohorts of valuers and firm personnel; thematic analysis identifying cultural and attitudinal themes.
The Valuers Act (1948), fragmented infrastructure and sovereignty concerns limit innovation.
Interview data from practitioners, firm leaders and regulators in New Zealand citing specific regulatory and infrastructure constraints; thematic analysis.
Barriers to adoption arise primarily from institutional conservatism, outdated regulation and weak data governance rather than technical shortcomings.
Qualitative semi-structured interviews with valuers, firm leaders and regulators in New Zealand; thematic analysis guided by Rogers' diffusion of innovations and institutional theory synthesised into the IDOI framework.
Taken together, AI’s effects on labor and capital may strain democracy unless a set of policies we outline here is gradually implemented.
Paper's normative/predictive claim linking labor- and capital-market effects of AI to political strain on democratic institutions and proposing policy remedies (presented as contingent and prescriptive; no empirical test of democratic outcomes provided in the excerpt).
AI’s training and computing needs are intensifying the technological sector’s interest in regulatory capture.
Paper's causal/inferential claim that increased capital concentration and fixed investments raise incentives for regulatory capture in the tech sector (asserted reasoning; no political-economy empirical test reported in the excerpt).
AI’s current training and computing needs have magnified capital concentration and business investment in fixed assets.
Paper's economic claim linking AI compute/training requirements to increased capital concentration and fixed-asset investment (no quantitative investment or market-concentration data provided in the excerpt).
Many fear AI may displace them from their jobs.
Paper reports survey-style finding about public fear of job displacement (no specific surveys, question wording, dates, or sample sizes given in the excerpt).
AI may affect nonroutine jobs in particular.
Statement in paper; asserted as a general finding about which types of jobs AI impacts (no specific dataset, sample size, or empirical method reported in the excerpt).
The welfare equivalence property is unique to the Brier score: for every non-Brier strictly proper scoring rule, the welfare gap under smooth C^1 oversight is bounded below by Ω(Var(1/G'') (γ/β)^2).
Mathematical lower-bound result proved in the paper comparing welfare under smooth C^1 oversight for non-Brier scoring rules; the bound is expressed as Ω(Var(1/G'') (γ/β)^2) in the paper.
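In display form, with the symbols exactly as quoted in the claim (G'', γ, and β are the paper's notation; their definitions are not given in this excerpt):

$$
\text{welfare gap (non-Brier rule)} \;\geq\; \Omega\!\left(\operatorname{Var}\!\left(\frac{1}{G''}\right)\left(\frac{\gamma}{\beta}\right)^{2}\right)
$$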
The impossibility (that non-affine approval undermines truthful reporting) holds for all strictly proper scoring rules, and the paper provides a closed-form perturbation formula.
General theoretical result proved across the class of strictly proper scoring rules, accompanied by a closed-form formula for the perturbation in the paper.
Any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable — the principal cannot avoid the perturbation that undermines calibration.
Analytical impossibility theorem in the paper's formal model showing that non-affine approvals create incentives for non-truthful reports when deviations are undetectable (mathematical proof).
Even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irreducible, and probabilistic targeting inevitably misallocates some resources.
Theoretical argument in the paper (conceptual/theoretical result about irreducible aleatoric uncertainty and its implications for probabilistic targeting).
Opaque agent objectives, synthetic traffic loops, and the indistinguishability between human-originated and agent-mediated signals are critical measurement problems examined in the paper.
Conceptual examination and literature synthesis; the paper discusses these as open problems rather than providing primary empirical solutions.
The paper identifies three properties of LLM agents that distinguish the present challenge from prior bot-detection problems: identity discontinuity by design, task-based instantiation, and agent-to-agent loops.
Analytic claim based on synthesis of agent architecture literature; presented as conceptual identification rather than empirically tested properties.
A click may reflect an optimization routine, a proxy objective, or a recursive agent-to-agent exchange rather than meaningful human intent, and traditional inference frameworks cannot reliably distinguish among these possibilities.
Theoretical claim derived from literature on agent behaviors, agent-to-agent interactions, and limitations of existing inference frameworks; no empirical discrimination test reported in this paper excerpt.
The presence of autonomous AI agents weakens the interpretive value of core web analytics metrics, including sessions, engagement, conversion, and retention.
Argument based on conceptual synthesis of how non-human, non-persistent actors generate signals that undermine standard metric interpretations (position paper; no original empirical test included).
Unlike crawlers and traditional bots, these agents do not possess persistent identities or psychologically grounded motivations; they are task-specific, dynamically instantiated processes whose behaviors are contingent and often orchestrated by external systems.
Conceptual analysis informed by literature on agent architecture and LLM-based agents; no primary empirical measurement presented in this paper excerpt.