The Commonplace

Evidence (4560 claims)

Adoption (5267 claims)
Productivity (4560 claims)
Governance (4137 claims)
Human-AI Collaboration (3103 claims)
Labor Markets (2506 claims)
Innovation (2354 claims)
Org Design (2340 claims)
Skills & Training (1945 claims)
Inequality (1322 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Active filter: Productivity
The four-variable account (produced output, underlying understanding, calibration accuracy, self-assessed ability) better explains phenomena like overconfidence, over- and under-reliance on AI, 'crutch' effects, and weak transfer than the simpler claim that generative AI merely amplifies the Dunning–Kruger effect.
Argumentative synthesis in the paper comparing explanatory power of the proposed four-variable framework against the more general Dunning–Kruger metaphor; draws on examples and empirical patterns from the reviewed literature rather than a single empirical test.
high mixed Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... explanatory fit for phenomena such as overconfidence, reliance patterns, crutch ...
A useful working model is 'AI-mediated metacognitive decoupling': LLM use widens the gap among produced output, underlying understanding, calibration accuracy, and self-assessed ability.
Conceptual synthesis and theoretical proposal grounded in reviewed empirical findings from multiple literatures (human–AI interaction, learning research, model evaluation); presented as the paper's working model rather than as a single empirical estimate.
high mixed Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... degree of alignment/decoupling between produced output, underlying understanding...
There is a fundamental trade-off between operational stability and theoretical deliberation across multi-agent coordination frameworks.
Empirical results from controlled benchmarks comparing agent architectures under fixed computational time budgets, as reported in the paper (no numeric sample size or statistical details provided in the abstract).
high mixed An Empirical Study of Multi-Agent Collaboration for Automate... operational stability versus depth/quality of theoretical deliberation
As technological progress devalues labor, the welfare benefits of steering at first increase; beyond a critical threshold they decline, and optimal policy shifts toward greater redistribution.
Theoretical model extension analyzing planner's optimal choice as labor's economic value changes; the paper states a non-monotonic relationship with a critical threshold.
high mixed NBER WORKING PAPER SERIES welfare benefits of steering; optimal policy (steering vs redistribution)
Using pre-existing exposure as an instrument for ChatGPT adoption in a long-difference IV design, ChatGPT adoption causes households to spend more time on digital leisure activities while leaving total time spent on productive online activities unchanged.
IV long-difference empirical design: instrumenting household adoption with pre-ChatGPT exposure (2021 browsing); outcome measured as changes in categorized browsing durations (LLM-based classification into 'leisure' vs 'productive' sites); controls include demographic-by-region fixed effects and browsing composition controls.
high mixed https://arxiv.org/pdf/2603.03144 change in time spent on digital leisure activities and total time on productive ...
Once efficiency is made explicit, the main practical question becomes how many efficiency doublings are required to keep scaling productive despite diminishing returns.
Framing/forecasting claim in the paper presenting an operational research question (conceptual; no empirical sample in excerpt).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI required number of efficiency doublings to sustain productive scaling
The practical burden of scaling depends on how efficiently real resources are converted into that (logical) compute.
Argument in the paper linking conceptual 'logical compute' to real-world conversion efficiency (qualitative claim; no empirical sample in excerpt).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI efficiency of converting real resources into logical compute
The compute variable is best understood as logical compute, an implementation-agnostic notion of model-side work.
Conceptual argument presented in the paper reframing 'compute' as an abstract, implementation-agnostic quantity (no empirical sample provided).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI definition/interpretation of the 'compute' variable
These patterns are consistent with a reorganization of the scientific production process rather than immediate efficiency gains, in line with theories of general-purpose technologies.
Interpretation linking observed changes in budget allocation, team size, and task breadth (from the proposal dataset and task-level analyses) to theoretical predictions about general-purpose technologies (GPTs); empirical findings show organizational change rather than large average short-run productivity gains.
high mixed Artificial Intelligence in Science: Returns, Reallocation, a... organizational reorganization vs efficiency gains (qualitative interpretation)
This paper offers a forward-looking framework that emphasizes the decentralizing potential of AI on labor markets, moving beyond the traditional displacement-versus-creation dichotomy.
Paper's stated contribution; based on conceptual framework and synthesis of historical and contemporary analyses (no empirical validation presented in the abstract).
high mixed AI Civilization and the Transformation of Work conceptual framing of AI's labor-market effects
The emergence of artificial intelligence and robotics is catalyzing a profound transformation in the nature of human labor.
Stated as a central premise in the paper's abstract; supported by the paper's synthesis of economic history, contemporary labor market data, and analysis of digital platform growth (no specific datasets or sample sizes reported in the abstract).
high mixed AI Civilization and the Transformation of Work nature of human labor / structure of labor markets
The resulting AI safety profile is asymmetric: AI is bottlenecked on frontier research (novel tasks) but unbottlenecked on exploiting existing knowledge.
Theoretical implication of the novelty-bottleneck model distinguishing novel (human-judgment) vs. routine (covered by agent prior) components of tasks.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... AI capability bottlenecks in frontier research vs. exploitation
Wall-clock time can be reduced to O(√E) through team parallelism, but total human effort remains O(E).
Model-derived result showing parallelism across humans can speed wall-clock completion time while aggregate human effort does not drop asymptotically.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... wall-clock task completion time and total human effort
Better agents improve the coefficient on human effort but not the exponent (i.e., they reduce the constant factor but do not change the asymptotic scaling class).
Analytic result from the stylized model under the paper's assumptions about task decomposition and novelty fraction ν.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... human effort (coefficient vs. asymptotic scaling exponent)
India's systematic investment plan (SIP) flows provide a high-frequency observable for the model's endogenous participation rate and constitute the natural empirical laboratory for the displacement–participation mechanism.
Empirical suggestion in the paper proposing SIP flows as an observable proxy for the modelled participation rate and recommending India as a lab to test the displacement–participation channel (no empirical test reported in the excerpt).
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity market participation rate (proxied by SIP flows)
Three analytical results characterise non-linear financial fragility, regime-contingent risk premium divergence, and the general equilibrium alignment squeeze.
Stated analytical results in the paper derived from the theoretical model describing three named phenomena (non-linear fragility, regime-contingent divergence, alignment squeeze).
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... financial fragility / risk premium behaviour / alignment-induced output effects
Whether AI is equity-bullish or equity-bearish depends on which channel dominates—a condition that differs sharply between deep financial markets, where the ARP is the dominant driver of elevated risk premia (Regime D), and shallow markets, where participation compression dominates (Regime E).
Model regime analysis in the paper distinguishing Regime D (deep markets, ARP-dominated) and Regime E (shallow markets, participation-compression-dominated) and stating comparative dominance determines net bullish/bearish outcome.
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... net effect of AI on equity returns / ERP
The equilibrium equity risk premium decomposes into three additively separable terms corresponding to these three channels (Proposition 1).
Formal proposition (Proposition 1) in the paper deriving an additive decomposition of the equilibrium ERP into the productivity, participation compression, and alignment risk terms.
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity risk premium (ERP) decomposition
We develop a heterogeneous-agent framework in which AI-driven labour displacement affects the equity risk premium (ERP) through three co-equal channels.
Stated model contribution in the paper: a theoretical heterogeneous-agent framework that posits three channels linking AI-driven labour displacement to the ERP (productivity, participation compression, alignment risk).
The top four models are statistically indistinguishable (mean score 0.147–0.153) while a clear tier gap separates them from the remaining four models (mean score ≤ 0.113).
Reported mean performance scores across 8 models and statement of statistical indistinguishability for the top four vs lower-tier four; numerical means provided.
high mixed SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... mean model performance score
Behavioral factors — specifically trust calibration, cognitive load, and affective reactions — shape the transition of corporate AI initiatives from pilot deployments to scalable, sustained use.
Synthesis of human-AI interaction literature integrated with adoption frameworks (TAM and TOE); conceptual linkage rather than new empirical testing in this paper.
high mixed Behavioral Factors as Determinants of Successful Scaling of ... success of pilot-to-production transition (scalability and sustained use)
AI accelerates value-chain maturation while creating distinct risks — including professional responsibility tensions and potential system-level externalities.
Conceptual argument and risk analysis in the Article (theoretical reasoning and synthesis of management/ethics literature). No empirical causal estimate reported in the excerpt.
high mixed Rewired: Reconceptualizing Legal Services for the AI Age acceleration of value-chain maturation and emergence of professional responsibil...
The legal profession is at a crossroads, caught between intensifying fears of AI-driven displacement and a generational opportunity for transformation.
Author's synthesis and framing in the Article (conceptual assessment; literature/contextual synthesis). No empirical sample or experiment reported in the excerpt.
high mixed Rewired: Reconceptualizing Legal Services for the AI Age risk of AI-driven displacement and opportunity for transformation in the legal p...
This advantage is contingent upon robust AI governance, ethical frameworks, and the transition from 'pilot-lite' projects to integrated, data-driven 'AI-first' business models.
Conditional claim in the paper linking success to governance, ethics, and organizational integration; appears to be normative/analytical rather than empirical in the abstract.
high mixed The AI Advantage: Strategic Innovation and Global Expansion ... dependency of AI-driven advantage on governance, ethics, and organizational inte...
Machine-readable metrics and open scholarly infrastructure are reshaping scholarly profiles and incentives.
Conceptual and historical discussion referring to platforms and metrics (e.g., arXiv, Google Scholar, ORCID) as mechanisms changing incentives; no new empirical estimates provided.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... changes in scholarly incentives and profile construction due to machine-readable...
That interconnected ecosystem is fundamentally restructuring who can do science (access), how fast discoveries propagate, and what counts as a valid scientific contribution.
Argumentative claim linking infrastructural and tool changes to changes in access, dissemination speed, and norms of contribution. The paper presents examples and narrative but no systematic empirical evaluation or sample.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... access to scientific practice, speed of discovery dissemination, and norms of sc...
The most consequential development is not any single tool but the emergence of an interconnected ecosystem—AI agents, preprint platforms, open source codebases, and citation infrastructure—that forms a feedback loop.
Synthesis/argument based on multiple examples (LLM agents, preprint servers like arXiv, open-source code repositories, citation indices). No quantitative measurement or causal identification reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... emergence of an interconnected scientific infrastructure ecosystem
The central tension in AI for science is between automation (building systems that replace human researchers) and augmentation (tools that amplify human creativity and judgement).
Analytical claim based on the paper's review of historical examples and conceptual discussion; no primary data or experimental design reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... relationship between automation and augmentation in research practice
Science has repeatedly delegated its bottlenecks to machines—first inference, then search, then measurement, then the full workflow—and each delegation solves one problem while exposing a harder one underneath.
Interpretive historical argument drawing on examples across AI-for-science milestones (e.g., DENDRAL, search and inference systems, measurement automation, and contemporary end-to-end workflows). No quantitative sample or experimental method reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... pattern of delegation and emergent bottlenecks in research workflows
Testing revealed AI excels at computational tasks but consistently misses nuanced factors like new construction rent premiums and infrastructure proximity impacts, validating the framework's hybrid structure as essential for professional-grade underwriting.
Findings from the controlled ChatGPT-4 test on the single 150-unit scenario: qualitative and comparative observations showing AI handled computations well but failed to capture specific local-market nuances, leading authors to endorse a hybrid human-AI framework.
Phase Two requires human-led professional validation to correct AI limitations, apply local market knowledge, and integrate risk factors.
Framework description supported by observations from the controlled test where human review was used to correct AI outputs and apply local knowledge (e.g., adjusting for nuanced market factors).
Traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles.
Simulation results comparing Fundamental Diagrams across scenarios with different distributions of safe time gaps and shares of RL-controlled vehicles. Number of simulation runs or replicates not stated in the claim text.
high mixed Macroscopic Characteristics of Mixed Traffic Flow with Deep ... traffic performance (e.g., flow, capacity) sensitivity to time-gap distribution ...
AUROC_2 and M-ratio produce fully inverted model rankings, demonstrating these metrics answer fundamentally different evaluation questions.
Metric comparison across models showing that AUROC_2-based ranking and M-ratio-based ranking are fully inverted in the reported results on the evaluated dataset.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... model ranking by AUROC_2 versus model ranking by M-ratio
Temperature manipulation shifts Type-2 criterion while meta-d' remains stable for two of four models, dissociating confidence policy from metacognitive capacity.
Experimental manipulation (temperature changes) applied to models; reported result that Type-2 criterion shifted with temperature while meta-d' was stable for two models (out of four) in the 224,000-trial dataset.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... Type-2 criterion (confidence policy) and meta-d' (metacognitive capacity)
Metacognitive efficiency is domain-specific, with different models showing different weakest domains, invisible to aggregate metrics.
Domain-level analyses reported in the paper showing per-domain M-ratio results and identification of different weakest domains per model, contrasted with aggregate metric behavior.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... domain-specific metacognitive efficiency (M-ratio) across task domains
Metacognitive efficiency varies substantially across models even when Type-1 sensitivity is similar — Mistral achieves the highest d' but the lowest M-ratio.
Empirical comparison of Type-1 sensitivity (d') and metacognitive efficiency (M-ratio) across the four evaluated LLMs on the 224,000 QA trials; explicit statement that Mistral had highest d' but lowest M-ratio.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... Type-1 sensitivity (d') and metacognitive efficiency (M-ratio)
The paper's primary contribution is to combine established ingredients—attention scarcity, free-entry dilution, superstar effects, and preferential attachment—into a unified framework directed at claims about AI-enabled entrepreneurship.
Stated contribution and methodological description in the paper (synthesis and applied formalisation); this is a descriptive/methodological claim rather than an empirical result.
high mixed The Economics of Builder Saturation in Digital Markets n/a (methodological contribution)
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior.
Statement in paper's introduction/abstract summarizing prior capabilities and limitations of pretrained time-series foundation models (no experimental sample or numeric evidence provided in the excerpt).
high mixed GARP-EFM: Improving Foundation Models with Revealed Preferen... ability of pretrained time-series models to forecast and degree to which they in...
The governance risk-mitigation effects of AI operate through increasing financial risk exposure.
Authors' mechanism tests indicate a relationship between AI adoption and changes in financial risk exposure measures, which they interpret as a channel affecting executive behavior.
high mixed The risk-mitigation effects of artificial intelligence adopt... financial risk exposure (financial risk/proxy metrics)
Organizational culture and technological readiness moderate the effectiveness of generative AI integration in decision-making processes.
The paper reports moderation effects tested in the SEM framework using survey data from senior managers, decision-makers, and AI adoption specialists (SmartPLS). No numeric moderator effect sizes or sample size provided in the excerpt.
high mixed The Strategic Impact of Generative Artificial Intelligence o... effectiveness of generative AI integration in decision-making (moderation effect...
Small language models offer privacy-preserving alternatives to frontier models, but their specialization is hindered by fragmented development pipelines that separate tool integration, data generation, and training.
Background claim stated in paper/abstract; no experimental data provided for this statement within the abstract.
high mixed EnterpriseLab: A Full-Stack Platform for developing and depl... privacy-preserving capability and ease of specialization of small LMs (vs fronti...
Extensive synthetic experiments show that policy regularizations reshape the narrative about which DRL method is best for inventory management.
Paper states results from extensive synthetic experiments that change which DRL methods are considered best under policy regularization; abstract does not provide the experimental sample size, specific methods, or quantitative comparisons.
high mixed DeepStock: Reinforcement Learning with Policy Regularization... relative performance/ranking of DRL methods for inventory management
Implementation of human-replacing technologies leads to significant transformations in skill demand: it reduces reliance on low-skilled labour while increasing demand for qualified engineers, system operators and specialists in digital technologies.
Sector-specific analysis and review of international labour-market studies cited in the article documenting skill-biased effects of automation and digitalization; qualitative assessment for Ukraine's mining and metallurgical sector under workforce shortage conditions.
high mixed Human-replacing technologies as a driver of labour productiv... skill demand composition (shift from low-skilled to high-skilled roles)
Foreign direct investment (FDI) shows an insignificantly positive direct effect on local TFCP but a significantly negative indirect (spillover) effect, attributed to a 'pollution haven' effect.
Spatial Durbin Model estimates for FDI on panel (30 provinces, 2010–2023): direct coefficient positive but not significant; indirect coefficient significantly negative; interpretation given as pollution-haven mechanism.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
Industrial intelligence exhibits regional heterogeneity: a significantly negative direct effect in the east, a significantly positive direct effect in the central region, an insignificant direct effect in the west, and positive indirect (spillover) effects in the east and west.
Regional/subsample Spatial Durbin Model analyses dividing the sample into east, central, and west regions (30 provinces, 2010–2023); reported region-specific direct and indirect coefficients and significance levels.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
Industrial intelligence has an insignificantly negative direct effect on local TFCP, but its positive spatial spillover effect is significant at the 1% level, producing a significantly positive total effect.
Spatial Durbin Model results for industrial intelligence on panel (30 provinces, 2010–2023): direct coefficient negative and not statistically significant; indirect coefficient positive and significant at 1%; total effect positive and significant.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
China's TFCP rose overall from 2010 to 2023 but exhibited a widening regional gap of 'higher in the east, lower in the west'.
Panel data of 30 Chinese provincial-level regions (2010–2023); TFCP measured using an undesirable-output super-efficiency SBM model and summarized temporal and spatial patterns.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
The study identifies the main AI-enabled mechanisms advancing CE principles in smart manufacturing, waste valorisation, supply-chain transparency, and sustainable design.
Bibliometric network analysis of 196 peer-reviewed articles (2023–2024) and systematic review of 104 studies, per the abstract; identification is presented as a product of these analyses.
high mixed Artificial intelligence as a catalyst for the circular econo... AI-enabled mechanisms advancing circular economy principles (e.g., in smart manu...
Governmental structures, labor supply and demand, and incorporation of financial measures act as key intervening variables affecting achieved ROI from GenAI implementations.
Qualitative synthesis and theoretical analysis reported in the paper identifying contextual/intervening variables.
high mixed Measuring Business ROI of Generative AI Adoption on Azure Cl... influence of governance and labor market factors on ROI
Generative AI serves as an effective 'wingman' for employment lawyers, capable of replacing substantial junior associate work while requiring continued human expertise for client counseling, supervision, and final legal advice preparation.
Authors' synthesis of experimental results showing AI-produced substantive analysis plus discussion about remaining limitations (e.g., citation errors) and required human oversight; qualitative assertion about substitutability for junior associate tasks.
high mixed Robot Wingman: Using AI to Assess an Employment Termination potential replacement of junior associate tasks and required human oversight