Evidence (8066 claims)
Claims by topic: Adoption, 5586; Productivity, 4857; Governance, 4381; Human-AI Collaboration, 3417; Labor Markets, 2685; Innovation, 2581; Org Design, 2499; Skills & Training, 2031; Inequality, 1382.
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Heterogeneity across universities implies that targeting high-performing institutions and diffusing their practices could be more effective than uniform expansion of AI training.
Observed variation in employment effectiveness, placement outcomes, and wages across the 191 universities; policy implication drawn from comparative performance patterns.
Labor market institutions (unions, collective bargaining), education and training systems, social safety nets, and regulations substantially mediate distributional and aggregate outcomes of AI adoption.
Comparative institutional analysis and equilibrium models linking institutional settings to wage-setting and reallocation dynamics, supported by empirical cross-jurisdiction comparisons where available.
Developing economies face different trade-offs from AI adoption than advanced economies, due to different occupational structures and complementarities.
Comparative analyses and sectoral studies drawing on cross-country microdata and institutional comparisons; theoretical models highlighting differences in task composition and absorptive capacity.
Occupational reallocation occurs: declines in some routine occupations alongside growth in AI-complementary roles (e.g., AI maintenance, oversight, and creative tasks).
Administrative and household employment data analyzed with occupational breakdowns, supplemented by task-mapping methods and panel/event-study approaches documenting shifting occupational shares over time.
Lower-skill roles experience mixed outcomes: some see adverse effects from automation while others benefit where AI is complementary to their tasks.
Microdata analyses and case studies showing heterogeneous effects by task complementarity; task-based exposure measures that differentiate which low-skill tasks are automatable versus augmentable.
AI contributes to wage polarization: earnings grow at the top of the distribution and stagnate or fall for middle occupations.
Wage distribution decompositions and panel regression studies that examine percentile-level wage changes, combined with task-based exposure measures linking AI adoption to differential impacts across the wage distribution.
The employment impact of automation depends crucially on labour-market structure (formal vs informal), availability of alternative employment, and social protections.
Theoretical framing supported by secondary literature comparing institutional contexts and their mediating effects on automation outcomes; no primary causal estimates in this paper.
Standard policy responses focused on retraining and active labor-market programs are necessary but insufficient to fully offset structural job losses where K_T substitutes broadly for tasks.
Model simulations and policy experiments in the calibrated dynamic model comparing scenarios with aggressive retraining versus structural fiscal/interventionist reforms; discussion of empirical limits from case studies and historical reskilling outcomes.
Automation of routine drafting tasks by GLAI may reduce demand for junior drafting labor while increasing demand for skilled reviewers, auditors, and legal technologists.
Labor-market reasoning based on task automation literature and illustrative vignettes; no labor-force survey or longitudinal employment data provided.
Unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience to LLM-driven automation (i.e., very low OAI), a pattern the paper quantifies as a 'Cognitive Risk Asymmetry.'
Empirical classification from computed OAIs showing low exposure for unstructured physical trades and high-stakes caretaking roles; the excerpt does not provide specific OAI values or counts.
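The excerpt does not define how the OAIs were computed, but an occupational automation index of this kind is commonly a task-weighted mean of per-task automation probabilities. A minimal sketch under that assumption (all occupation names, task weights, and probabilities below are invented for illustration, not the paper's data):

```python
# Hypothetical sketch: an occupational automation index (OAI) as the
# task-weighted mean of per-task automation probabilities.
# Weights and probabilities are illustrative, not from the paper.

def oai(tasks):
    """tasks: list of (weight, p_automatable) pairs."""
    total_w = sum(w for w, _ in tasks)
    return sum(w * p for w, p in tasks) / total_w

# Mostly unstructured physical work -> low exposure.
electrician = [(0.6, 0.05), (0.3, 0.10), (0.1, 0.40)]
# Mostly routine drafting/review -> high exposure.
paralegal = [(0.5, 0.80), (0.3, 0.70), (0.2, 0.30)]

print(round(oai(electrician), 2))  # → 0.1
print(round(oai(paralegal), 2))    # → 0.67
```

Under this construction, an occupation dominated by hard-to-automate tasks receives a low OAI regardless of how automatable its minor tasks are, which is the mechanism behind the asymmetry the claim describes.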
Variance-based Human-in-the-Loop (HITL) validation with an expert panel demonstrates a profound cognitive gap: isolated algorithmic probabilities fail to encapsulate the "institutional premium" imposed by experts bounded by professional liability.
Empirical validation procedure reported: variance-based HITL validation involving an expert panel that compared algorithmic scores and expert adjustments, concluding a systematic difference attributed to institutional liability considerations. The excerpt does not give panel size or quantitative variance statistics.
Industry self-regulation has demonstrably failed, motivating the need for IASCA.
Proposal asserts a 'demonstrated failure of industry self-regulation' as rationale for IASCA; no specific empirical studies, incidents, or metrics are cited in the provided text.
Roughly half of the projected LFPR decline to 55% by 2050 is attributable to AI—equivalent to around 10 million lost jobs.
Authors' decomposition/interpretation of conditional forecast results under the rapid scenario reported in the abstract (ties LFPR decline to job-count equivalents).
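The job-count equivalence can be reproduced with back-of-envelope arithmetic. A sketch in which the working-age population and baseline LFPR are hypothetical placeholders, not figures from the paper:

```python
# Back-of-envelope: translate an LFPR decline into a job-count equivalent.
# Population and baseline LFPR are illustrative assumptions.
working_age_pop = 400e6   # assumed working-age population
lfpr_baseline = 0.60      # assumed baseline LFPR
lfpr_2050 = 0.55          # projected 2050 LFPR (from the claim)
ai_share = 0.5            # "roughly half" attributed to AI

total_decline = lfpr_baseline - lfpr_2050          # 5 percentage points
lost_jobs = working_age_pop * total_decline * ai_share
print(f"{lost_jobs / 1e6:.0f} million")            # → 10 million
```

Any combination of baseline LFPR and population consistent with the paper's forecast would scale this figure accordingly; the arithmetic only shows how a percentage-point decomposition maps to a headcount.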
Our findings echo observations of pervasive annotation errors in text-to-SQL benchmarks, suggesting quality issues are systemic in data engineering evaluation.
Comparative claim referencing prior observations in text-to-SQL literature and the authors' audit results on ELT-Bench; no new cross-benchmark quantitative analysis reported in the excerpt.
The measured machine-equivalent work appeared on no financial statement, workforce report, or government statistical return.
Claim about absence of reporting for the deployment's measured work (asserted in the paper for the deployment case).
The AI-as-advisor approach has limitations: people frequently ignore accurate advice, rely too much on inaccurate advice, and their decision-making skills may deteriorate over time.
Paper asserts these limitations in motivation/background and/or derives them from observed behavior in experiments (stated in abstract as known problems with AI-as-advisor).
When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one.
Experimental condition where subjects chose which source (prompt vs revealed-preference data) to provide to an AI agent; reported result that a large portion did not choose the more informative source.
The gap in predictive accuracy is driven by subjects' difficulty in translating their own preferences into written instructions.
Further analysis reported in the experiment attributing the observed accuracy gap to subjects' difficulty converting their preferences into prompts (presumably via analysis comparing content of prompts to revealed choices).
The emergence and diffusion of these technologies create an era of labor displacement.
Framed in the paper as a premise motivating policy proposals; presented as a conceptual claim rather than supported by original empirical estimates in the text provided.
Many automotive firms, especially those developing new energy and intelligent vehicles, have suffered financial distress and even exited the market.
Descriptive statement in the paper's introduction/motivation citing observed industry outcomes (financial distress and market exit) among automotive firms focused on NEV and intelligent vehicles.
The dominant mechanism behind the performance drop is a collapse of Type2_Contextual issue detection at config_B, consistent with attention dilution in long contexts.
Analysis of issue-type specific detection rates shows Type2_Contextual detection collapses at config_B; interpretation ties this to attention dilution in longer contexts.
Technological transformation in agentic finance is economically inevitable, and proactive intervention is critically urgent.
Author claim synthesizing the paper's argument and modeling results (normative conclusion based on earlier analysis and assertions, not a validated empirical finding).
Surveillance intensity is associated with hyper-vigilance (reported effect = -4.213).
One of the six propositions from the paper's trilevel framework; the abstract reports an effect value of '-4.213' associated with surveillance intensity → hyper-vigilance.
Platform workers receive 36.3% more third-party ratings than traditional workers.
Quantitative synthesis/summary reported in the paper (no primary sample size in abstract); likely aggregated from included studies.
Platform workers experience 59.6% higher digital speed determination than traditional workers.
Quantitative synthesis/summary reported in the paper (no primary sample size given in the abstract); presumably aggregated from included studies comparing platform and traditional workers.
Our findings surface practical limits on the complexity people can manage in human-AI negotiation.
Synthesis claim based on the empirical study varying number of issues and observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.
Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers.
Analytic argument and empirical/simulation results reported in the paper showing that competition among arbitrageurs lowers prices faced by consumers and decreases marginal revenue for model providers.
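The competitive mechanism can be illustrated with a stylized Bertrand-style undercutting dynamic. This is a toy sketch, not the paper's model; all prices are in cents and purely hypothetical:

```python
# Stylized Bertrand-style undercutting among competing arbitrageurs.
# All values are in cents and purely hypothetical.
provider_price = 100   # assumed direct price for a model query
arb_cost = 40          # assumed arbitrageur marginal cost (API + serving)
step = 5               # undercutting increment

price = provider_price
while price - step >= arb_cost:
    price -= step      # some arbitrageur undercuts the best standing offer

print(f"equilibrium consumer price: {price} cents")  # pinned to arb_cost
```

Once undercutting halts at the arbitrageurs' marginal cost, the provider's direct margin is capped by that resale price, which is the revenue-compression effect the claim describes.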
Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue.
Experiments or analyses involving model distillation reported in the paper showing that distilled/student models enable profitable arbitrage and may reduce revenue captured by the original teacher model.
The pre-existing AI community dissolved as the tools went mainstream, and the new vocabulary was absorbed into existing careers rather than binding a new occupation.
Interpretation of resume-data patterns: observed dispersion of previously coherent AI practitioners and spread of AI-related vocabulary into other occupational records rather than consolidation into a new occupational cluster.
Beyond an environment-specific optimum, scaling further degrades institutional fitness because trust erosion and cost penalties outweigh marginal capability gains.
Analytical argument from the Institutional Scaling Law together with illustrative examples and discussion of mechanisms (trust erosion, cost penalties) in the paper.
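The inverted-U shape implied by such a scaling law can be sketched with a toy functional form: concave capability gains minus a penalty that grows with scale. The specification and parameters below are hypothetical, not the paper's model:

```python
import math

# Toy institutional-fitness curve: concave capability gains minus a
# cost/trust penalty growing linearly with scale. The functional form
# and parameters are illustrative assumptions.
def fitness(scale, gain=1.0, penalty=0.02):
    return gain * math.log(1 + scale) - penalty * scale

best = max(range(1, 200), key=fitness)
print(best)  # → 49: interior optimum; fitness declines at larger scales
```

Because marginal gains shrink while penalties do not, fitness peaks at an environment-specific scale (here 49 under these toy parameters) and then falls, matching the claim that scaling past the optimum degrades fitness.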
Bias effects vary by vulnerability type, with injection flaws being more susceptible to framing bias than memory corruption bugs.
Subgroup analysis in Study 1 comparing framing sensitivity across vulnerability classes (injection vs memory corruption) within the experiment dataset.
Model convergence in DRL can lead to crowded trades, threatening market stability and motivating a robust regulatory framework that balances innovation against that systemic risk.
Analytical argument in the paper linking convergence/crowding to systemic effects; the excerpt does not include empirical market-impact studies, simulations, or measured incidence rates of crowding.
Deploying DRL at scale requires socio-technical infrastructure considerations including algorithmic governance, systemic risk management, and accounting for the environmental cost of large-scale computational finance.
Conceptual and system-level analysis presented in the paper; no empirical auditing data, carbon-footprint measurements, or governance case studies are provided in the excerpt.
Two sources of spurious performance addressed are memorization bias from ticker-specific pre-training and survivorship bias from flawed backtesting.
Problem identification and methodological focus: the paper names memorization bias and survivorship bias as primary confounders it aims to mitigate. The excerpt does not detail experiments that quantify the magnitude of those biases or the degree to which they were reduced.
Traditional ex ante regulatory approaches struggle to keep pace with AI development, exacerbating the 'pacing problem' and the Collingridge dilemma.
Theoretical/legal literature review and conceptual argument presented in the paper (no empirical sample or quantitative data reported in the abstract).
Low internal conflict or unanimity can be diagnostic of variance depletion (i.e., exclusion) rather than healthy integration, so governance systems should treat low conflict as a potential red flag until heterogeneity integration is verified.
Interpretive policy implication derived from the model's demonstration that exclusionary processes can produce deceptively low observed disagreement while increasing fragility; this recommendation is based on theoretical reasoning without empirical validation in the paper.
Most existing candidate matching systems act as keyword filters, failing to handle skill synonyms and nonlinear careers, resulting in missed candidates and opaque match scores.
Paper's introductory assertion about limitations of most current systems. The excerpt does not cite empirical studies, statistics, or systematic reviews to substantiate this claim.
TDD (test-driven development) prompting alone increased regressions to 9.94%.
Empirical result reported in the paper comparing a TDD prompting intervention against other workflows on the benchmark (values given in the excerpt).
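A regression rate of this kind is typically the share of previously passing tests that fail after the model's patch. A minimal sketch of that bookkeeping (data shape and names are hypothetical, not the benchmark's schema):

```python
# Sketch: regression rate over benchmark tasks.
# A "regression" = a test that passed before the patch but fails after.
# The data shape and test names below are hypothetical.

def regression_rate(runs):
    """runs: list of dicts with sets 'pass_before' and 'pass_after'."""
    regressed = total = 0
    for run in runs:
        total += len(run["pass_before"])
        regressed += len(run["pass_before"] - run["pass_after"])
    return regressed / total if total else 0.0

runs = [
    {"pass_before": {"t1", "t2", "t3"}, "pass_after": {"t1", "t3"}},  # t2 regressed
    {"pass_before": {"t4", "t5"}, "pass_after": {"t4", "t5"}},        # no regressions
]
print(f"{regression_rate(runs):.2%}")  # → 20.00% (1 of 5 previously passing)
```

Measuring the denominator over previously passing tests, rather than all tests, is what distinguishes regression behavior from the resolution rate most benchmarks report.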
Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied.
Paper's critique of existing benchmark literature and practices (asserted by authors in background; no specific benchmark survey details in the excerpt).
The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).
AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
Underprovision of verification is likely if left to market forces because information quality has positive externalities and misinformation imposes negative externalities, justifying public funding, subsidies, or regulation.
Economic reasoning and policy implications drawn from the study's findings and the literature on public goods/externalities.
Censorship, restricted data flows, and government interference fragment markets, limit economies of scale, and favor well-resourced, internationally connected actors—widening capacity gaps.
Interpretive economic analysis grounded in observed access constraints and comparative case material across the three platforms.
Limited data access and censorship reduce the efficacy of AI tools by creating training and validation gaps; legal risks complicate use of proprietary platforms and cloud services.
Interviews describing constraints on data availability and legal/operational barriers to using some platforms and cloud services; interpretive analysis of implications for AI training/validation.
Generative AI increases the volume and sophistication of misinformation (deepfakes, fabricated documents), raises false-positive risks, and can be weaponized by state or nonstate actors.
Interview accounts and qualitative analysis noting observed or anticipated misuse of generative models and associated verification challenges.
Resource constraints—limited staff time, funding, and technical capacity—are recurring operational challenges for these platforms.
Staff and stakeholder interviews plus analysis of organizational reports indicating staffing, funding, and technical limitations.
Platforms experience difficulty building and retaining audience trust and engagement, especially in contexts of high public skepticism or polarization.
Interview data from platform staff describing audience engagement challenges, supported by analysis of audience-focused platform formats and community-reporting strategies.
Platforms face limited or asymmetric access to primary data sources such as platform APIs, state data, and archives.
Interview accounts and document analysis noting restricted API access and barriers to state-held data and archives across the three cases.
Censorship and legal risks constrain reporting and distribution for these fact-checking platforms.
Consistent reports from interview subjects and corroborating document analysis indicating legal/censorship-related limitations on publishing and distribution.
Political instability, legal pressure, and censorship strongly shape what platforms can investigate, publish, and access in the region.
Thematic findings from semi-structured interviews with platform staff and document analysis of public reports and policy statements across the three country cases.