Evidence (7156 claims)

Claims by topic:

- Adoption: 5126 claims
- Productivity: 4409 claims
- Governance: 4049 claims
- Human-AI Collaboration: 2954 claims
- Labor Markets: 2432 claims
- Org Design: 2273 claims
- Innovation: 2215 claims
- Skills & Training: 1902 claims
- Inequality: 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
Claims and supporting evidence:

- Data protection and privacy (especially sensitive health data) complicate open-data DAO models.
  Evidence: Conceptual analysis referencing privacy/data-protection concerns for health data (e.g., GDPR-like regimes); no empirical evaluation of privacy breaches within DAOs provided.
- Significant barriers remain for DAOs in pharma: regulatory uncertainty about tokenized securities, IP fractionalization, and clinical data sharing.
  Evidence: Legal/regulatory analysis and literature synthesis highlighting unclear classifications and open regulatory questions; no new regulatory rulings provided.
- Pharmaceutical R&D faces rising costs, long approval timelines, supply-chain inefficiencies, and low patient involvement.
  Evidence: Literature review and synthesis of well-documented industry challenges cited in the paper (secondary sources); no new primary data presented in this study.
- There is limited reporting on privacy safeguards, model interpretability, and external validity in the reviewed studies.
  Evidence: Review observed sparse reporting on privacy protections, interpretability analyses, and external validation across included studies.
- Misclassification risks (false positives and false negatives) are a common limitation and can harm consumers by incorrectly restricting access or by failing to detect harm.
  Evidence: Review notes model error rates reported via precision/recall and AUC; discusses harms from false positives/negatives as a recurrent limitation in the literature.
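The precision/recall framing referenced in the evidence maps directly onto the two consumer harms named in the claim. A minimal sketch; the function and all confusion counts are invented for illustration, not drawn from the reviewed studies:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision penalizes false positives (consumers wrongly restricted);
    recall penalizes false negatives (harm that goes undetected)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical classifier: 40 true harms caught, 10 consumers wrongly
# flagged, 20 harms missed.
p, r = precision_recall(tp=40, fp=10, fn=20)
print(round(p, 2), round(r, 2))  # 0.8 0.67
```

Raising the decision threshold trades false positives for false negatives, which is why the review treats the two error types as a joint limitation rather than reporting either metric alone.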
- Privacy and ethical concerns are substantial: continuous monitoring and sensitive behavioural inference raise privacy, surveillance, and misuse risks.
  Evidence: Multiple included studies and the review discussion explicitly identify privacy, ethical, and potential misuse concerns with continuous monitoring and behavioural inference.
- The black-box nature of many deep learning models undermines scientific interpretability and experimental trust, limiting adoption in materials research.
  Evidence: Cited concerns and methodological papers advocating interpretable architectures and post hoc explanation methods reviewed in the paper; synthesis of community critique.
- Insufficient attention to model reliability, particularly uncertainty miscalibration, reduces real-world utility because experimentalists need reliable confidence estimates, not only point predictions.
  Evidence: Survey of literature on uncertainty estimation and calibration (Bayesian NNs, ensembles, temperature scaling, conformal prediction) and papers reporting calibration issues; recommendations drawn from these sources.
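Of the calibration methods named in the evidence, temperature scaling is the simplest. A minimal sketch under invented assumptions; the logits and the temperature value are illustrative, not from any surveyed paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature: T > 1 softens the probabilities,
    which is the core move in temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

overconfident = [4.0, 1.0, 0.5]           # hypothetical raw model logits
before = max(softmax(overconfident))      # top-class confidence as-is
after = max(softmax(overconfident, temperature=2.5))
print(before > after)  # True: scaled confidence is less extreme
```

In practice the temperature is a single scalar fit by minimizing negative log-likelihood on a held-out validation set; it rescales confidence without changing the predicted class, which is why it is a common first remedy for the miscalibration the claim describes.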
- Progress of DL-driven materials discovery is limited by the scarcity of high-quality, diverse labeled datasets; small, noisy, or biased datasets limit model generalization.
  Evidence: Review and synthesis of empirical studies and methodological papers documenting dataset size/quality issues and their impact on model performance; no new dataset analysis in this paper.
- Traditional ESG ratings often suffered from data inconsistency, subjectivity, and limited coverage of unstructured sustainability information.
  Evidence: Literature review and citations in the paper (e.g., Berg et al. 2022 and other ESG-rating divergence studies); presented as established background evidence rather than a new empirical finding in the study.
- Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.
  Evidence: Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than a quantified effect.
- Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
  Evidence: Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
- At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions, where the control group performed best.
  Evidence: Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; the paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
- Common barriers to ERM adoption in MSMEs include resource constraints and lack of expertise.
  Evidence: Findings from the literature review identifying determinants and barriers reported across studies (survey and qualitative studies commonly cited in such reviews); specific sample sizes/methods not provided in the summary.
- MSMEs are particularly vulnerable to external shocks because of limited financial resources, weak internal controls, and heavy dependence on owner-managers' intuition.
  Evidence: Background literature summarized in the review describing common structural and governance characteristics of MSMEs; drawn from multiple sources in the literature (specific studies not cited in the summary).
- The article identifies and lays out several concerns regarding the government's approach to regulating AI.
  Evidence: Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings), based on the author's review and argumentation rather than primary empirical data.
- Environmental regulations weaken the beneficial influence of generative AI on a company's ESG performance.
  Evidence: Moderation/interaction tests in the panel-data econometric model using measures of environmental regulation (on the same 2012–2024 Chinese A-share firm sample), showing a statistically significant negative interaction effect.
- When incentive signals depend non-trivially on persistent environmental memory, the resulting dynamics generically cannot be reduced to a static global objective defined solely over the agent state space (i.e., no global potential function over agents exists in the generic case).
  Evidence: A genericity theorem/argument in the paper (a mathematical demonstration that, for nontrivial dependence on environmental memory, the closed-loop vector field is, for a generic set of parameterizations, not the gradient of any scalar function on agent space).
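The standard integrability condition underlying arguments of this kind (stated generically here; the paper's exact formulation may differ): a smooth closed-loop field $F$ on the agent state space is the gradient of some potential only if its Jacobian is symmetric everywhere, and feedback through persistent environmental memory generically breaks that symmetry.

```latex
F(x) = -\nabla V(x)
\quad \Longrightarrow \quad
\frac{\partial F_i}{\partial x_j}(x) = \frac{\partial F_j}{\partial x_i}(x)
\quad \text{for all } i, j \text{ and all } x .
```

Since symmetry of the Jacobian is a measure-zero constraint on parameterizations, a memory-coupled field fails it for a generic parameter set, which is the sense in which no global potential over agents exists.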
- Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro-enterprises.
  Evidence: Cross-study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
- Entrenched societal inequities mean that women and girls are often disproportionately held back from achieving their potential.
  Evidence: Broad claim referencing societal inequities and their effects on women and girls; stated in the introduction without specific empirical citations in the excerpt.
- Only 24.4% of at-risk workers have viable transition pathways, where 'viable' is defined as sharing at least 3 skills and achieving at least 50% skill transfer.
  Evidence: Analysis of job-to-job transitions on the validated knowledge graph using an operational definition of viable pathways (>=3 shared skills and >=50% skill transfer); the proportion of at-risk workers meeting that criterion is reported as 24.4% (underlying at-risk worker count not given in the excerpt).
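The viability rule reduces to a set computation. A sketch under stated assumptions: the job names and skill sets are invented, and measuring transfer as the share of the *source* job's skills found in the target is one plausible reading of "skill transfer", not necessarily the paper's exact operationalization:

```python
def is_viable(source_skills: set[str], target_skills: set[str],
              min_shared: int = 3, min_transfer: float = 0.5) -> bool:
    """Viable transition per the stated rule: at least 3 shared skills
    and at least 50% of the source job's skills carrying over."""
    if not source_skills:
        return False
    shared = source_skills & target_skills
    transfer = len(shared) / len(source_skills)
    return len(shared) >= min_shared and transfer >= min_transfer

# Invented example jobs.
cashier = {"customer service", "cash handling", "inventory", "scheduling"}
teller = {"customer service", "cash handling", "compliance", "scheduling"}
print(is_viable(cashier, teller))  # True: 3 shared skills, 75% transfer
```

Applying this predicate over all (at-risk job, candidate job) edges of the knowledge graph yields the share of at-risk workers with at least one viable pathway, the 24.4% figure reported above.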
- 20.9% of jobs in the dataset face high automation risk.
  Evidence: Risk classification applied to the jobs represented in the knowledge graph (sample size: 9,978 job postings); the proportion of jobs labeled 'high automation risk' is reported as 20.9%.
- Japan's population is shrinking, the share of working-age people is falling, and the number of elderly is growing fast.
  Evidence: Statement grounded in official national statistics referenced by the paper (demographic time series used to initialize and calibrate the system dynamics model).
- AI notably reduces customer stability in sports enterprises (SEs).
  Evidence: Empirical estimation using the DML model on the same panel dataset of 45 Chinese listed SEs (2012–2023); the authors report a statistically significant negative effect of AI on customer stability.
- Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.
  Evidence: Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis); the excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.
- The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.
  Evidence: Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; the authors explicitly note this as a limitation.
- Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.
  Evidence: Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy, and the dataset does not include micro-level project workflow or iteration logs.
- AI adoption in Slovakia consistently remained below the EU27 average over the 2021–2024 period.
  Evidence: Gap analysis comparing Slovak enterprise AI adoption indicators to EU27 averages using harmonised Eurostat data for 2021–2024.
- A preregistered, nationally representative replication experiment in the United States (N = 1,200) replicates the causal finding that a labor-replacing (vs. labor-creating) AI frame reduces willingness to politically engage with future AI developments.
  Evidence: Preregistered randomized experiment (nationally representative US sample, N = 1,200) replicating the UK manipulation and measuring willingness to engage politically regarding AI.
- A preregistered, nationally representative experiment in the United Kingdom (N = 1,202) shows that exposure to a labor-replacing (vs. labor-creating) AI frame causally reduces trust in democracy.
  Evidence: Preregistered randomized experiment (nationally representative UK sample, N = 1,202) manipulating AI framing (labor-replacing vs. labor-creating) and measuring trust/satisfaction with democratic institutions.
- Large-scale survey data indicate that the public tends to view AI as labor-replacing rather than labor-creating.
  Evidence: Cross-sectional survey (N = 37,079 respondents across 38 European countries); descriptive analysis of responses about AI's labor-market impact.
- Only 12% of gig workers participate in retirement savings programs.
  Evidence: Survey and administrative measures of retirement-savings participation among gig workers in the 24-country sample.
- Only 23% of gig workers report access to employer-provided health insurance.
  Evidence: Self-reported benefits coverage from labor force surveys and linked administrative records for gig workers across the 24 OECD countries (2015–2025).
- The environmental footprint of healthcare systems is growing, and persistent inequities in access and outcomes have intensified calls for procurement reform.
  Evidence: Contemporary literature review and synthesis of sector reports and studies documenting healthcare emissions/footprint and health inequities (no original empirical data reported in this paper).
- There is a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.
  Evidence: Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.
- Societal and ethical risks, such as bias, misuse, and skill erosion, constrain GenAI adoption.
  Evidence: Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.
- Technical unreliability, manifesting as hallucinations and performance drift, is a major constraint on GenAI adoption.
  Evidence: Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and the authors' aggregation of technical risks.
- Adoption of GenAI is constrained by multiple interrelated challenges.
  Evidence: Cross-paper synthesis from the systematic review of 28 studies identifying recurring barriers and constraints reported in the literature.
- Ongoing issues remain, such as data access, model transparency, ethical concerns, and varying relevance across Global North and Global South contexts.
  Evidence: Critical synthesis within the review drawing on discussions and critiques in the literature about barriers and ethical challenges; based on reported limitations and regional comparisons in reviewed studies (no numerical breakdown provided).
- Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
  Evidence: Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
- Ireland exhibits the largest gender gap in advanced digital task use: approximately 44% of men versus 18% of women perform advanced digital tasks, a 26-percentage-point gap, close to double the European average.
  Evidence: Country-level descriptive statistics from the ESJS for Ireland reporting shares of men and women performing advanced digital tasks (exact Irish sample size not provided in the excerpt).
- Across Europe, women are around 15 percentage points less likely than men to perform advanced digital tasks in their jobs.
  Evidence: Empirical analysis of the European Skills and Jobs Survey (ESJS) (Cedefop, 2021) using regression-based estimates and descriptive statistics across European countries (exact sample size and country count not provided in the excerpt).
- There are significantly negative spatial spillover effects between digital–real integration and New Quality Productive Forces (i.e., each variable has negative spillover impacts on the other across regions).
  Evidence: Spatial spillover coefficients estimated in the GS3SLS spatial simultaneous equations model using panel data for 30 provinces (2011–2022) are reported as statistically significant and negative.
- AI substitutes for many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
  Evidence: Task-based substitution reasoning within the SBTC framework and cross-sectoral task analysis; the paper provides conceptual synthesis rather than new microdata or quantified task-level estimates.
- Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
  Evidence: Identified by the paper through literature review and practitioner reports; presented as recurring barriers rather than quantified against a specific sample.
- Many studies on serious-game DSTs are small-scale or experimental, and long-term impact data on behavioral change and emissions outcomes are sparse, limiting generalizability.
  Evidence: Review of the literature summarized in the chapter showing a predominance of case studies, prototypes, and short-term evaluations rather than longitudinal or large-sample studies.
- Ensuring the scientific validity of game models, scaling co-design processes, measuring real-world behavioral change, and aligning incentives (policy/subsidies, markets) remain challenges to using serious games for DST uptake.
  Evidence: Chapter discussion of limitations and gaps identified in the reviewed literature; the absence or sparsity of long-term validation studies and large-scale co-design implementations is documented in existing research.
- Current uptake of DSTs for net zero remains limited because of issues of trust, usability, a lack of evidence linking actions to farm profitability, and poor integration into farmer workflows.
  Evidence: Literature synthesis, qualitative interviews and surveys, and case studies documenting low adoption and barriers; multiple practice reports and studies cited in the chapter report limited or uneven adoption across contexts.
- Nearby business closures increased perceived impediments to growth, amplifying pessimism via local exposure (a social contagion effect).
  Evidence: Empirical comparison of perceived impediments to growth across variation in local exposure to nearby business closures (survey measures of local closures correlated with respondents' perceived impediments), using the cross-country survey sample.
- Of the two regimes that emerge, the inequality-decreasing regime arises when AI behaves like a broadly available commodity technology or when labor-market institutions share rents widely (high ξ).
  Evidence: Model regime characterization and calibrated counterfactuals showing falling wage dispersion and a negative ΔGini under commodity-like AI assumptions or a higher rent-sharing elasticity.
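The ΔGini outcome above can be made concrete with a minimal Gini coefficient computed from mean absolute differences; the wage vectors here are invented for illustration, not the paper's calibration:

```python
def gini(values):
    """Gini coefficient: mean absolute difference over all pairs,
    normalized by twice the mean (0 = perfect equality)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    abs_diffs = sum(abs(x - y) for x in values for y in values)
    return abs_diffs / (2 * n * n * mean)

equal_wages = [50.0, 50.0, 50.0, 50.0]
dispersed_wages = [20.0, 40.0, 60.0, 120.0]
print(gini(equal_wages))                        # 0.0
print(gini(dispersed_wages) > gini(equal_wages))  # True
```

In the model's terms, the inequality-decreasing regime corresponds to counterfactuals where the post-AI wage vector yields a lower Gini than the baseline (negative ΔGini).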