Evidence (11633 claims)

Claim counts by topic:

- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
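As a rough reading aid, the direction columns can be converted into a positive-claim share per outcome. A minimal sketch using three rows whose four shown direction columns sum exactly to the Total (for several other rows the shown columns sum to slightly less than the Total, which suggests direction labels not displayed in this table):

```python
# Illustrative only: positive-claim share for selected rows of the
# evidence matrix above (figures copied from the table).
rows = {
    # outcome: (positive, negative, mixed, null)
    "Job Displacement": (11, 71, 16, 1),
    "Error Rate": (64, 78, 8, 1),
    "Inequality Measures": (36, 105, 40, 6),
}

for outcome, (pos, neg, mixed, null) in rows.items():
    total = pos + neg + mixed + null
    print(f"{outcome}: total={total}, positive share={pos / total:.1%}")
```

On these rows the positive share ranges from about 11% (Job Displacement) to about 42% (Error Rate), which matches the qualitative pattern that harm-framed outcomes skew negative.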
Firms do not internalize the congestion externality they impose on the retraining queue, the irreversibility of permanent exit, or the wage depression borne by non-routine incumbents — explaining why market adoption speed exceeds the social optimum.
Model-based mechanism: normative/comparative analysis showing omitted externalities in firm-level optimization relative to social planner, leading to divergence between private and social adoption speeds.
Social welfare is strictly concave in adoption speed and is maximized at an interior optimum below the market rate of adoption.
Analytical welfare optimization in the theoretical model: social-welfare function as a function of adoption speed yields strict concavity and an interior social optimum; comparison with market equilibrium adoption speed indicates market rate exceeds social optimum.
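The concavity claim can be illustrated with a toy quadratic welfare function. This is our own stylized choice, not the paper's specification: benefits linear in adoption speed `a`, congestion and displacement costs convex, and a hypothetical externality share `phi` that private firms ignore.

```python
# Toy sketch (not the paper's model): welfare strictly concave in
# adoption speed a, with an interior social optimum below the market rate.
B, C = 10.0, 2.0           # hypothetical benefit and convex-cost parameters

def welfare(a):
    # W(a) = B*a - C*a^2, so W''(a) = -2C < 0: strictly concave
    return B * a - C * a * a

a_social = B / (2 * C)      # first-order condition: W'(a*) = B - 2C*a* = 0
# Firms ignore the share phi of costs borne by others (the congestion
# externality), so they optimize as if the cost were (1 - phi) * C.
phi = 0.5
a_market = B / (2 * (1 - phi) * C)

print(a_social, a_market)
assert a_market > a_social                      # market over-adopts
assert welfare(a_social) > welfare(a_market)    # welfare loss at market speed
```

With these numbers the social optimum is a = 2.5 while the market rate is a = 5.0, reproducing the divergence the claim describes.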
Faster adoption causes a sustained compression of the labor share throughout the transition window.
Model result showing time-path of labor's income share under varying adoption speeds in the theoretical framework.
Faster adoption produces a steeper and more persistent decline in labor force participation.
Dynamic model trajectories and comparative statics showing time path of labor force participation under different adoption-speed parameters.
Faster adoption overwhelms the retraining pipeline and generates permanent labor-force exit through worker discouragement.
Model mechanism: finite-capacity retraining queue in the dynamic model leads to queue congestion, producing a discouraged stock of permanently exited workers (analytical result in the theoretical model).
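The congestion mechanism can be sketched as a discrete-time queue. All parameters here are hypothetical, not the paper's calibration: displaced workers arrive at a rate scaled by adoption speed, the retraining queue has fixed capacity, and arrivals who find the queue full exit permanently.

```python
# Minimal sketch (hypothetical parameters): a finite-capacity retraining
# queue in which overflow arrivals become permanently discouraged exits.
def simulate(adoption_speed, periods=50, capacity=100, service_rate=20):
    queue, exited = 0, 0
    for _ in range(periods):
        arrivals = int(adoption_speed * 30)     # displacement scales with speed
        admitted = min(arrivals, max(capacity - queue, 0))
        queue += admitted
        exited += arrivals - admitted           # congestion -> permanent exit
        queue = max(queue - service_rate, 0)    # retraining completions
    return exited

slow, fast = simulate(0.5), simulate(1.5)
print(slow, fast)
assert fast > slow   # faster adoption yields a larger discouraged stock
```

At the slow speed the queue never fills and no one exits; at triple the speed the queue saturates within a few periods and overflow exits accumulate every period thereafter, mirroring the analytical mechanism.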
A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound.
Controlled experiments comparing inline evaluation vs simulated and real agentic tool-use on GPT-5.1; reported 0% trust in inline mode vs 100% trust in agentic modes (authors' reported results).
Every tested model trusts poisoned data at 100% at moderate attacker sophistication (L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries.
Primary experimental results across 270 directed-query trials (9 models × 30 each); authors report 269 of 270 trials accepted fabricated security claims under attacker sophistication level L2.
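A quick consistency check on the reported counts (all figures taken from the text above):

```python
# 9 models x 30 directed-query trials each = 270 total trials;
# 269 of 270 accepted fabricated security claims at level L2.
models, trials_per_model = 9, 30
total = models * trials_per_model
accepted = 269
print(total, f"{accepted / total:.1%}")  # acceptance rate ~99.6%
```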
We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system.
Empirical demonstrations described in paper: six distinct attack scenarios executed against a production knowledge graph containing 42 million nodes (authors' reported experimental setup).
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning.
Conceptual definition presented by the authors in the paper (theoretical framing and distinction from prompt injection).
Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens.
Argument in paper that existing governance/audit tools designed for ranked-list or older UIs do not cover the new single-sentence prose-recommendation surface; no empirical test reported in excerpt.
Common failures include replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns.
Error-mode analysis described in the paper/abstract showing that models substitute complex CAD operations (sweep, loft, twist-extrude) with simpler sketch-and-extrude sequences.
Common failures include misinterpreting industrial design parameters.
Reported error analysis in the paper/abstract indicating models often misinterpret engineering/design parameters when generating CAD programs.
Common failures include missing fine 3D structure.
Qualitative and quantitative analysis of model outputs on BenchCAD reported in the paper/abstract noting missing fine 3D structural details as a frequent error mode.
Current AI development trajectory reflects value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability.
Normative/critical analysis in the paper highlighting design priorities and trade-offs; no empirical measurement provided.
Sustained investment in large-scale chatbot infrastructures increases environmental costs.
Paper asserts environmental impacts from infrastructure investment (energy, resource use) as part of systemic critique; no quantified environmental measurements or sample size reported.
Chatbot-driven AI development contributes to concentration of economic power.
Argumentation about industry dynamics and infrastructure centralization in the paper; no empirical market-concentration metrics or sample provided.
The normalization of chatbots contributes to labor displacement.
Theoretical argument linking widespread chatbot adoption to changes in work and employment; no empirical displacement estimates provided.
Normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise.
Analytical reasoning and literature-informed claims in the paper; no quantitative measurement or sample reported.
Chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority.
Qualitative argumentation and illustrative examples in the paper; no reported controlled empirical study or sample size.
The chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems.
Conceptual argument and synthesis in the paper (theoretical analysis); no empirical sample or quantitative data reported.
This stance frequently produces excessive reliance on mechanistic interpretability to address a deployment challenge beyond its intended scope.
Author argument drawing on conceptual critique and cited empirical distinctions (paper's argumentative content).
AI deployment in sensitive domains (health care, credit, employment, criminal justice) is often treated as unsafe to authorize until model internals can be explained.
Author assertion based on observed regulatory and institutional tendencies described in the paper (argumentative / contextual evidence within the paper).
A scoping review found that only 9.0% of FDA-approved AI/ML device documents contained a prospective post-market surveillance study.
Paper references a scoping review that examined FDA-approved AI/ML device documents and reported the 9.0% figure.
A 53-percentage-point gap between internal representations and output correction shows that understanding may not translate into action.
Paper cites a recent empirical finding reporting a 53 percentage-point gap between models' internal representations and their ability to correct outputs (described as 'recent evidence').
Human capital and technological innovation channels show weaker or even negative effects on Lae, attributed to short-term resource misallocation and skill mismatches.
Spatial mediation analysis (channel analysis) using panel data for 30 provincial regions (2012–2022) assessing mediating roles of human capital and technological innovation.
Functional deployment and operational investment in AI are associated with employment declines.
Regression analyses from the BTOS AI supplement linking measures of functional AI deployment and operational AI investment to firm-reported employment changes; observational associations (sample size and exact model specification not shown in excerpt).
Employment reductions attributable to AI are rare: only 2% of firms report them.
Firm self-reports on employment outcomes related to AI from the BTOS AI supplement (Nov 2025–Jan 2026); descriptive statistic reported; sample size not excerpted.
Among firms with worker-level AI use, 65% restrict use to three or fewer tasks.
Descriptive statistic from BTOS AI supplement giving distribution of number of worker tasks using AI among firms that report worker-level use; sample size not shown.
Among adopter firms, scope remains limited: 57% use AI in three or fewer functions.
Descriptive distribution of number of business functions using AI among adopter firms in the BTOS AI supplement (Nov 2025–Jan 2026); sample restricted to adopter firms (sample size not provided).
In labor-intensive industries, industrial robots shorten the backward linkage length.
Heterogeneity analysis in the paper comparing effects across labor-intensive sub-sectors within the panel of 14 manufacturing sub-sectors; reported finding of a negative effect on backward linkage length in labor-intensive industries.
Institutional inertia in property valuation poses risks to asset pricing, collateral risk modelling and investor confidence.
Analytical inference from interview findings and theoretical synthesis highlighting implications for property investment and financial market stability.
Despite advances in automation, data analytics and AI, the sector has been slow to digitise.
Background statement supported by interview data and sector observation reported in the study.
The IDOI framework provides a transferable model for understanding digital transformation in regulated, high-trust professions and highlights the market-level risks of institutional inertia in property valuation.
Development of the IDOI conceptual framework from qualitative data and theoretical integration; authors' claim about transferability and implications.
Generational divides, protectionist attitudes and fears of automation reinforce digital resistance.
Qualitative interview evidence reporting attitudes across cohorts of valuers and firm personnel; thematic analysis identifying cultural and attitudinal themes.
The Valuers Act (1948), fragmented infrastructure and sovereignty concerns limit innovation.
Interview data from practitioners, firm leaders and regulators in New Zealand citing specific regulatory and infrastructure constraints; thematic analysis.
Barriers to adoption arise primarily from institutional conservatism, outdated regulation and weak data governance rather than technical shortcomings.
Qualitative semi-structured interviews with valuers, firm leaders and regulators in New Zealand; thematic analysis guided by Rogers' diffusion of innovations and institutional theory synthesised into the IDOI framework.
Taken together, AI’s effects on labor and capital may strain democracy unless a set of policies we outline here is gradually implemented.
Paper's normative/predictive claim linking labor- and capital-market effects of AI to political strain on democratic institutions and proposing policy remedies (presented as contingent and prescriptive; no empirical test of democratic outcomes provided in the excerpt).
AI’s training and computing needs are intensifying the technological sector’s interest in regulatory capture.
Paper's causal/inferential claim that increased capital concentration and fixed investments raise incentives for regulatory capture in the tech sector (asserted reasoning; no political-economy empirical test reported in the excerpt).
AI’s current training and computing needs have magnified capital concentration and business investment in fixed assets.
Paper's economic claim linking AI compute/training requirements to increased capital concentration and fixed-asset investment (no quantitative investment or market-concentration data provided in the excerpt).
Many fear AI may displace them from their jobs.
Paper reports survey-style finding about public fear of job displacement (no specific surveys, question wording, dates, or sample sizes given in the excerpt).
AI may affect nonroutine jobs in particular.
Statement in paper; asserted as a general finding about which types of jobs AI impacts (no specific dataset, sample size, or empirical method reported in the excerpt).
The welfare equivalence property is unique to the Brier score: for every non-Brier strictly proper scoring rule, the welfare gap under smooth C^1 oversight is bounded below by Ω(Var(1/G'') (γ/β)^2).
Mathematical lower-bound result proved in the paper comparing welfare under smooth C^1 oversight for non-Brier scoring rules; the bound is expressed as Ω(Var(1/G'') (γ/β)^2) in the paper.
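In display form, with the symbols exactly as quoted in the claim (G'', γ, and β are the paper's notation; their definitions are not given in this excerpt):

$$
\text{welfare gap (non-Brier rule)} \;\geq\; \Omega\!\left(\operatorname{Var}\!\left(\frac{1}{G''}\right)\left(\frac{\gamma}{\beta}\right)^{2}\right)
$$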
The impossibility (that non-affine approval undermines truthful reporting) holds for all strictly proper scoring rules, and the paper provides a closed-form perturbation formula.
General theoretical result proved across the class of strictly proper scoring rules, accompanied by a closed-form formula for the perturbation in the paper.
Any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable — the principal cannot avoid the perturbation that undermines calibration.
Analytical impossibility theorem in the paper's formal model showing that non-affine approvals create incentives for non-truthful reports when deviations are undetectable (mathematical proof).
Even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irreducible, and probabilistic targeting inevitably misallocates some resources.
Theoretical argument in the paper (conceptual/theoretical result about irreducible aleatoric uncertainty and its implications for probabilistic targeting).
Opaque agent objectives, synthetic traffic loops, and the indistinguishability between human-originated and agent-mediated signals are critical measurement problems examined in the paper.
Conceptual examination and literature synthesis; the paper discusses these as open problems rather than providing primary empirical solutions.
The paper identifies three properties of LLM agents that distinguish the present challenge from prior bot-detection problems: identity discontinuity by design, task-based instantiation, and agent-to-agent loops.
Analytic claim based on synthesis of agent architecture literature; presented as conceptual identification rather than empirically tested properties.
A click may reflect an optimization routine, a proxy objective, or a recursive agent-to-agent exchange rather than meaningful human intent, and traditional inference frameworks cannot reliably distinguish among these possibilities.
Theoretical claim derived from literature on agent behaviors, agent-to-agent interactions, and limitations of existing inference frameworks; no empirical discrimination test reported in this paper excerpt.
The presence of autonomous AI agents weakens the interpretive value of core web analytics metrics, including sessions, engagement, conversion, and retention.
Argument based on conceptual synthesis of how non-human, non-persistent actors generate signals that undermine standard metric interpretations (position paper; no original empirical test included).
Unlike crawlers and traditional bots, these agents do not possess persistent identities or psychologically grounded motivations; they are task-specific, dynamically instantiated processes whose behaviors are contingent and often orchestrated by external systems.
Conceptual analysis informed by literature on agent architecture and LLM-based agents; no primary empirical measurement presented in this paper excerpt.