Evidence (4857 claims)
| Category | Claims |
|---|---|
| Adoption | 5586 |
| Productivity | 4857 |
| Governance | 4381 |
| Human-AI Collaboration | 3417 |
| Labor Markets | 2685 |
| Innovation | 2581 |
| Org Design | 2499 |
| Skills & Training | 2031 |
| Inequality | 1382 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
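In several rows the Total column exceeds the sum of the four direction columns (for Other, 417 + 113 + 67 + 480 = 1077 against a reported 1091), so some claims are presumably counted under a direction the matrix does not display. As a sketch, the Python below copies a few rows from the matrix, treats the dash cells as zero (an assumption), and computes that unclassified residual together with each row's positive share of its reported total.

```python
# A few rows copied from the evidence matrix; dash cells are treated as 0 (assumption).
ROWS = {
    # outcome: (positive, negative, mixed, null, reported_total)
    "Other": (417, 113, 67, 480, 1091),
    "Firm Productivity": (307, 38, 70, 12, 432),
    "Task Completion Time": (89, 7, 4, 3, 103),
    "Firm Revenue": (98, 35, 24, 0, 157),
    "Developer Productivity": (34, 4, 3, 1, 42),
}

def summarize(row):
    """Return (claims not in the four shown directions, positive share of reported total)."""
    pos, neg, mixed, null, total = row
    residual = total - (pos + neg + mixed + null)
    return residual, pos / total

for outcome, row in ROWS.items():
    residual, pos_share = summarize(row)
    print(f"{outcome:<24} residual={residual:>3}  positive share={pos_share:.1%}")
```

Rows such as Task Completion Time sum exactly to their totals, while Other and Firm Productivity leave residuals of 14 and 5 claims respectively.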
Productivity
Broader conclusion: AI has the potential to raise productivity and create value, but without proactive policy the benefits risk being concentrated among skilled workers and firms, exacerbating inequality and regional disparities.
Integrative interpretation drawing on productivity and distributional findings from the 17 studies and theoretical considerations about differential complementarities and adoption patterns.
Whether AI is net job‑creating depends on context (sector, country, policy environment, and workforce skill composition).
Observed heterogeneity across the 17 studies by sectoral setting, country context, and policy environment; studies report differing net employment outcomes depending on these factors.
AI contributes to labor‑market polarization: growth in high‑skill opportunities alongside contraction in many middle- and low‑skill roles.
Comparative synthesis of occupational and wage-composition findings across the 17 studies shows recurring patterns of expansion at the high-skill end and reductions in middle/low-skill employment.
Expected differential wage pressure: wages are likely to fall for routine/low‑skill occupations and rise or remain stable for high‑skill workers who possess complementary AI skills.
Econometric studies summarized in the review (cross‑sectional and panel regressions) and theoretical consistency with skill-biased technological change (SBTC); the review highlights heterogeneity in findings and limited long‑run causal certainty.
AI contributes to skills polarization: demand rises for advanced cognitive, digital, and socio‑emotional skills while routine cognitive and manual task demand declines.
Theoretical integration (SBTC), task decomposition studies showing shifts in task demand by skill content, and labour‑market analyses reporting changes in occupational skill mixes; evidence comes from cross‑sectional and panel studies summarized in the review.
AI/ML has a dual, sector- and skill-dependent effect on labor: widespread displacement of routine and lower-skilled tasks coexists with augmentation of professional and cognitive work and the creation of new labor forms (gig, platform-mediated, and human–AI hybrid roles).
Systematic synthesis of peer‑reviewed empirical studies, industry and policy reports, task‑based analyses, and firm/establishment case studies, spanning cross‑country and sectoral analyses; empirical approaches include econometric (cross‑sectional and panel) studies linking automation/AI adoption to employment and wages, task decomposition analyses, and surveys of firm adoption and restructuring. The review notes heterogeneity across studies and limited long‑run causal evidence.
AI technical capability in the U.S. labor market is substantially larger and far more geographically diffuse than visible adoption suggests.
Agent-based simulation that maps thousands of AI tools to a skills taxonomy and a synthetic population representing the U.S. workforce (151 million agents), covering 32,000+ skills and ~3,000 counties; comparison of the Iceberg Index (skills-based exposure) to a visible-adoption wage-share metric.
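The contrast between skills-based exposure and visible adoption can be illustrated with a toy calculation. The source does not give the Iceberg Index formula, so this sketch assumes exposure is the wage-weighted share of each worker's skills that some AI tool covers, and that the visible-adoption metric is the wage share of workers whose jobs show AI use. The `Worker` type, its fields, the skill names, counties, and wages are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Worker:
    county: str
    wage: float
    skills: frozenset
    adopts_ai: bool  # hypothetical flag: AI use is visible in this worker's job

def exposure(worker, ai_skills):
    """Fraction of the worker's skills that at least one AI tool covers."""
    return len(worker.skills & ai_skills) / len(worker.skills) if worker.skills else 0.0

def skills_exposure_wage_share(workers, ai_skills):
    """Wage-weighted skills exposure (an Iceberg-style measure, as assumed here)."""
    total = sum(w.wage for w in workers)
    return sum(w.wage * exposure(w, ai_skills) for w in workers) / total

def visible_adoption_wage_share(workers):
    """Wage share of workers whose jobs show visible AI adoption."""
    total = sum(w.wage for w in workers)
    return sum(w.wage for w in workers if w.adopts_ai) / total

workers = [
    Worker("County A", 60_000, frozenset({"drafting", "scheduling", "negotiation"}), False),
    Worker("County B", 90_000, frozenset({"coding", "code_review"}), True),
    Worker("County C", 40_000, frozenset({"data_entry", "customer_calls"}), False),
]
ai_skills = frozenset({"drafting", "scheduling", "coding", "data_entry"})

print(round(skills_exposure_wage_share(workers, ai_skills), 3))  # 0.553
print(round(visible_adoption_wage_share(workers), 3))            # 0.474
exposed_counties = {w.county for w in workers if exposure(w, ai_skills) > 0}
adopting_counties = {w.county for w in workers if w.adopts_ai}
print(len(exposed_counties), len(adopting_counties))             # 3 1
```

In this toy population, exposure reaches every county while visible adoption appears in only one, which is the kind of gap the claim describes; the real simulation operates on 151 million agents and 32,000+ skills.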
Standard policy responses focused on retraining and active labor-market programs are necessary but insufficient to fully offset structural job losses where K_T substitutes broadly for tasks.
Model simulations and policy experiments in the calibrated dynamic model comparing scenarios with aggressive retraining versus structural fiscal/interventionist reforms; discussion of empirical limits from case studies and historical reskilling outcomes.
Automation of routine drafting tasks by GLAI may reduce demand for junior drafting labor while increasing demand for skilled reviewers, auditors, and legal technologists.
Labor-market reasoning based on task automation literature and illustrative vignettes; no labor-force survey or longitudinal employment data provided.
Roughly half of the projected decline in the labor force participation rate (LFPR) to 55% by 2050 is attributable to AI, equivalent to around 10 million lost jobs.
Authors' decomposition/interpretation of conditional forecast results under the rapid scenario reported in the abstract (ties LFPR decline to job-count equivalents).
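The two headline numbers can be connected by a back-of-the-envelope conversion. The excerpt does not spell out how participation-rate changes become job counts, so the following assumes that job-count equivalents equal the AI-attributed share s_AI of the LFPR decline times the projected working-age population P_2050; ℓ_0 denotes the starting LFPR, which the excerpt does not give.

```latex
\begin{aligned}
\Delta\ell &= \ell_0 - \ell_{2050}, \qquad \ell_{2050} = 0.55,\\
J_{\text{AI}} &\approx s_{\text{AI}}\,\Delta\ell\,P_{2050}, \qquad s_{\text{AI}} \approx 0.5,\\
J_{\text{AI}} \approx 10^{7} &\;\Longrightarrow\; \Delta\ell\,P_{2050} \approx 2\times 10^{7}.
\end{aligned}
```

Under this reading, the full projected decline corresponds to roughly 20 million fewer labor-force participants than the starting rate would give.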
Our findings echo observations of pervasive annotation errors in text-to-SQL benchmarks, suggesting quality issues are systemic in data engineering evaluation.
Comparative claim referencing prior observations in text-to-SQL literature and the authors' audit results on ELT-Bench; no new cross-benchmark quantitative analysis reported in the excerpt.
The deployment's measured machine-equivalent work appeared on no financial statement, workforce report, or government statistical return.
The paper asserts, for the deployment case it studies, that this measured work was absent from all such reporting.
Many automotive firms, especially those developing new energy and intelligent vehicles, have suffered financial distress and even exited the market.
Descriptive statement in the paper's introduction/motivation citing observed industry outcomes (financial distress and market exit) among automotive firms focused on new energy vehicles (NEVs) and intelligent vehicles.
The dominant mechanism behind the performance drop is a collapse of Type2_Contextual issue detection at config_B, consistent with attention dilution in long contexts.
Analysis of issue-type specific detection rates shows Type2_Contextual detection collapses at config_B; interpretation ties this to attention dilution in longer contexts.
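An issue-type breakdown of this kind can be sketched as follows: compute detection rates per (configuration, issue type), take the drop between configurations, and find the type with the largest drop. Only `config_B` and `Type2_Contextual` come from the source; `config_A`, `Type1_Local`, the helper names, and all counts are hypothetical toy data.

```python
from collections import defaultdict

def detection_rates(records):
    """Map (config, issue_type) -> share of issues of that type that were detected."""
    hits, totals = defaultdict(int), defaultdict(int)
    for config, issue_type, detected in records:
        totals[(config, issue_type)] += 1
        hits[(config, issue_type)] += int(detected)
    return {key: hits[key] / totals[key] for key in totals}

def drop_by_type(rates, base, target):
    """Detection-rate drop for each issue type when moving from `base` to `target`."""
    types = {issue_type for (_, issue_type) in rates}
    return {t: rates[(base, t)] - rates[(target, t)]
            for t in types if (base, t) in rates and (target, t) in rates}

def outcomes(config, issue_type, detected, missed):
    """Build `detected` hits and `missed` misses for one cell (toy data helper)."""
    return [(config, issue_type, True)] * detected + [(config, issue_type, False)] * missed

# Toy counts: 10 issues per cell, so unweighted drops are directly comparable.
records = (
    outcomes("config_A", "Type1_Local", 9, 1) + outcomes("config_A", "Type2_Contextual", 8, 2)
    + outcomes("config_B", "Type1_Local", 8, 2) + outcomes("config_B", "Type2_Contextual", 2, 8)
)
rates = detection_rates(records)
drops = drop_by_type(rates, "config_A", "config_B")
dominant = max(drops, key=drops.get)
print(dominant)  # Type2_Contextual
```

When issue types have unequal counts, each type's drop would need to be weighted by its share of issues before calling it the dominant contributor to the overall decline.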
Technological transformation in agentic finance is economically inevitable, and proactive intervention is critically urgent.
Author claim synthesizing the paper's argument and modeling results (normative conclusion based on earlier analysis and assertions, not a validated empirical finding).
Our findings surface practical limits on the complexity people can manage in human-AI negotiation.
Synthesis claim based on the empirical study varying number of issues and observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.
TDD (test-driven development) prompting alone increased regressions to 9.94%.
Empirical result reported in the paper comparing a TDD prompting intervention against other workflows on the benchmark (values given in the excerpt).
Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied.
Paper's critique of existing benchmark literature and practices (asserted by authors in background; no specific benchmark survey details in the excerpt).
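Why resolution rate alone can hide regressions is easiest to see with two workflows that resolve the same tasks but break different amounts of existing behavior. This sketch assumes each task records whether the patch makes the issue's target tests pass and how many previously passing tests it breaks; the `TaskResult` type and the data are hypothetical, and the paper's exact regression definition is not given in the excerpt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskResult:
    resolved: bool  # the patch makes the issue's target tests pass
    broken: int     # previously passing tests that fail after the patch

def resolution_rate(results):
    return sum(r.resolved for r in results) / len(results)

def regression_rate(results):
    """Share of tasks whose patch breaks at least one previously passing test."""
    return sum(r.broken > 0 for r in results) / len(results)

# Two hypothetical workflows with identical resolution rates.
workflow_a = [TaskResult(True, 0), TaskResult(True, 0), TaskResult(False, 0), TaskResult(False, 1)]
workflow_b = [TaskResult(True, 1), TaskResult(True, 0), TaskResult(False, 3), TaskResult(False, 1)]

for name, results in [("A", workflow_a), ("B", workflow_b)]:
    print(name, resolution_rate(results), regression_rate(results))
# A 0.5 0.25
# B 0.5 0.75
```

A leaderboard ranked only by resolution rate would treat A and B as equivalent, even though B breaks existing tests three times as often.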
The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).
AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
Underprovision of verification is likely if left to market forces because information quality has positive externalities and misinformation imposes negative externalities, justifying public funding, subsidies, or regulation.
Economic reasoning and policy implications drawn from the study's findings and the literature on public goods/externalities.
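The underprovision argument is a standard positive-externality condition. As an illustrative sketch (notation assumed, not from the source): a verifier chooses a quantity of verification q with private benefit B(q), cost C(q), and external benefit E(q) to the public, with B and E increasing and concave and C increasing and strictly convex.

```latex
\begin{aligned}
\text{private optimum: } & B'(q_p) = C'(q_p),\\
\text{social optimum: } & B'(q^*) + E'(q^*) = C'(q^*) \;\Longrightarrow\; q^* > q_p \text{ whenever } E' > 0,\\
\text{corrective subsidy: } & \sigma = E'(q^*) \text{ per unit, so } B'(q) + \sigma = C'(q) \text{ holds at } q = q^*.
\end{aligned}
```

The market stops at q_p because the verifier ignores E; a per-unit subsidy equal to the marginal external benefit, or equivalent public funding, closes the gap.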
Censorship, restricted data flows, and government interference fragment markets, limit economies of scale, and favor well-resourced, internationally connected actors—widening capacity gaps.
Interpretive economic analysis grounded in observed access constraints and comparative case material across the three platforms.
Limited data access and censorship reduce the efficacy of AI tools by creating training and validation gaps; legal risks complicate use of proprietary platforms and cloud services.
Interviews describing constraints on data availability and legal/operational barriers to using some platforms and cloud services; interpretive analysis of implications for AI training/validation.
Generative AI increases the volume and sophistication of misinformation (deepfakes, fabricated documents), raises false-positive risks, and can be weaponized by state or nonstate actors.
Interview accounts and qualitative analysis noting observed or anticipated misuse of generative models and associated verification challenges.
Resource constraints—limited staff time, funding, and technical capacity—are recurring operational challenges for these platforms.
Staff and stakeholder interviews plus analysis of organizational reports indicating staffing, funding, and technical limitations.
Platforms experience difficulty building and retaining audience trust and engagement, especially in contexts of high public skepticism or polarization.
Interview data from platform staff describing audience engagement challenges, supported by analysis of audience-focused platform formats and community-reporting strategies.
Platforms face limited or asymmetric access to primary data sources such as platform APIs, state data, and archives.
Interview accounts and document analysis noting restricted API access and barriers to state-held data and archives across the three cases.
Censorship and legal risks constrain reporting and distribution for these fact-checking platforms.
Consistent reports from interview subjects and corroborating document analysis indicating legal/censorship-related limitations on publishing and distribution.
Political instability, legal pressure, and censorship strongly shape what platforms can investigate, publish, and access in the region.
Thematic findings from semi-structured interviews with platform staff and document analysis of public reports and policy statements across the three country cases.
Investments in alignment interventions (pluralistic evaluation, transparency) produce public‑good benefits that private firms may underinvest in absent regulation, standards, or procurement incentives.
Economic reasoning about public goods and incentives, supported by conceptual synthesis of firm behavior literature, not by original empirical investment data.
Misalignment generates negative externalities (misinformation, biased decisions, harms to vulnerable groups) that markets may underprovide solutions for, motivating public‑interest interventions.
Economic argumentation and literature synthesis on externalities and public goods; supported by referenced examples in prior work though not quantified here.
AI can augment measurement (e.g., collaboration patterns, output tracking) but if poorly designed may reinforce visibility biases that disadvantage remote workers.
Theoretical reasoning and literature citations about algorithmic bias and monitoring; illustrated with secondary examples rather than primary empirical tests.
Hybrid arrangements can exacerbate inequities in access to informal networks and career advancement, often privileging co-located or better-networked employees.
Theoretical integration of sociological and management studies with comparative case illustrations; secondary data examples referenced but no new causal empirical tests reported.
Hybrid and remote work create risks of professional invisibility, fragmented social networks, and unequal access to workplace social capital.
Literature synthesis and illustrative case studies drawn from secondary sources; qualitative/comparative case evidence rather than primary quantitative data.
HACCA proliferation increases negative externalities and public-good failure risks, meaning private markets will underinvest in mitigation absent public intervention.
Public-goods and externality economic theory applied to cybersecurity; policy analysis (qualitative).
Widespread HACCA availability compresses the capability gap between resource-rich and resource-poor actors, empowering criminal groups and smaller states and concentrating harms in less-protected sectors and geographies.
Diffusion and strategic externalities analysis; scenario reasoning about capability democratization (qualitative).
Firms will shift investment toward cybersecurity and away from other productive uses; small and medium enterprises (SMEs) will be disproportionately affected due to limited defenses.
Investment-allocation reasoning and distributional analysis of firm capabilities (qualitative; no firm-level panel data).
Cyber insurance markets will face increased premium pressure and uncertainty; insurers may raise prices, restrict coverage, or withdraw from some lines.
Economic analysis of risk pricing under higher uncertainty and tail risks; analogy to prior insurance market reactions to emerging risks (qualitative).
Automation lowers fixed and marginal costs of conducting high-skill cyber operations, changing the supply-side economics and enabling a rapid expansion in the number of attackers.
Cost-structure reasoning about automation effects on labor and tool costs; conceptual economic analysis (no empirical cost data provided).
Widespread diffusion of HACCAs will raise the baseline cyber threat and reduce the monopoly of advanced states and groups on high-end offensive capabilities.
Capability diffusion assessment and historical analogies to proliferation of technologies (qualitative; no large-scale empirical diffusion model).
HACCAs would intensify interstate cyber competition by increasing operational tempo and reducing attribution certainty, complicating deterrence and crisis management.
Strategic scenario analysis and expert judgment linking automation features (speed, scale, opacity) to deterrence and attribution challenges (qualitative).
Automation via HACCAs lowers the barrier to entry for conducting sophisticated cyber operations, enabling criminal groups, non-state actors, and less-resourced states to perform high-tier attacks.
Economic reasoning about fixed and marginal cost reductions, capability-diffusion analysis, and analogy to automation in other domains (qualitative; no empirical cost-study sample).
HACCAs would sustain operations using five core operational tactics: autonomous infrastructure setup; credential and access harvesting; advanced detection evasion; adaptive shutdown-avoidance; and operational persistence and scaling.
Attack-lifecycle mapping, review of APT case studies, and red-team threat-modeling to extrapolate automated equivalents of human-led tactics (qualitative categorization).
HACCAs would materially change the threat environment by enabling top-tier offensive cyber operations to be automated and widely proliferable, creating large strategic, economic, and systemic security risks.
Scenario-based forecasting, capability-trajectory assessment, review of APT case studies, and threat-modeling/red-team reasoning (qualitative synthesis; no large-n empirical quantification).
Counterfactual simulations show that modest salary increases have a smaller effect on predicted attrition than eliminating overtime (in this dataset and model).
Comparative counterfactual experiments run on the calibrated logistic model: simulations altering salary vs. altering overtime feature; reported that overtime elimination outperforms modest pay increases in retained headcount and probability reductions (exact salary-change amounts and comparative numbers not given in the summary).
In the dataset used, eliminating overtime could retain about 31 employees, a larger effect than modest salary increases.
Aggregated counterfactual simulation on the IBM HR Analytics dataset: after setting overtime to zero for applicable records, the model-predicted net retained headcount ≈ 31; compared to simulations of modest salary increases which yielded smaller retained headcount (exact salary-change magnitude and headcount numbers not provided).
Eliminating overtime could lower predicted attrition probability by 17.35% for affected employees (per the model's counterfactual simulation).
Counterfactual policy simulation using the calibrated logistic model on the IBM HR Analytics dataset: set overtime feature to zero for affected employees and compute change in each employee's calibrated attrition probability; reported average reduction = 17.35%.
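The counterfactual procedure described above (set the overtime feature to zero for affected employees, rescore with the logistic model, compare probabilities) can be sketched as follows. Every coefficient, the income feature and its scaling, the three employees, and the 5% raise are hypothetical; the calibration step is omitted; and "retained headcount" is assumed here to be the sum of per-employee probability drops (expected departures avoided), which the source does not confirm. The source also does not say whether 17.35% is a relative or a percentage-point reduction, so this sketch simply reports the absolute drop.

```python
import math

# Hypothetical coefficients for illustration only; not the source's fitted model.
INTERCEPT = -2.0
COEF_OVERTIME = 1.5
COEF_INCOME_PER_1K = -0.1  # effect per 1,000 of monthly income (hypothetical)

def attrition_probability(overtime, monthly_income):
    """Logistic model: P(attrition) = 1 / (1 + exp(-z))."""
    z = INTERCEPT + COEF_OVERTIME * overtime + COEF_INCOME_PER_1K * monthly_income / 1000
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual(employees, policy):
    """Return (mean probability drop, expected departures avoided) under `policy`."""
    drops = [attrition_probability(**e) - attrition_probability(**policy(e)) for e in employees]
    return sum(drops) / len(drops), sum(drops)

def no_overtime(employee):
    return {**employee, "overtime": 0}

def salary_raise(pct):
    return lambda employee: {**employee, "monthly_income": employee["monthly_income"] * (1 + pct)}

# Hypothetical employees currently working overtime (the "affected" group).
affected = [
    {"overtime": 1, "monthly_income": 3000},
    {"overtime": 1, "monthly_income": 5000},
    {"overtime": 1, "monthly_income": 8000},
]
ot_mean, ot_retained = counterfactual(affected, no_overtime)
pay_mean, pay_retained = counterfactual(affected, salary_raise(0.05))
print(f"no overtime: mean drop {ot_mean:.3f}, retained {ot_retained:.2f}")
print(f"5% raise:    mean drop {pay_mean:.3f}, retained {pay_retained:.2f}")
```

With these made-up coefficients the overtime switch moves each employee's linear score far more than a small raise does, which mirrors the direction of the reported result without reproducing its magnitudes.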
Traditional STP showed a 67% performance decline after six months in unstable market conditions.
Empirical observation reported in the study—likely derived from simulation scenarios and/or longitudinal analysis of behavioral data; precise data source (simulation vs. observed field data), statistical tests, and sample framing are not specified in the summary.
The persistence of interpretive, human-in-the-loop evaluation implies ongoing labor requirements (annotation, sense-making, governance roles), affecting forecasts of automation and labor substitution in sectors adopting LLMs.
Interview reports describing continued manual work for evaluation tasks across participants; authors draw implications for labor demand.
Environmental and informational externalities from AI (energy use, privacy harms, bias) justify regulatory and Pigouvian-style interventions to correct market failures.
Conceptual and policy literature reviewed, combined with empirical observations about environmental impacts and privacy/bias incidents reported in prior studies; the paper does not provide new causal estimates of externality magnitudes.
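A Pigouvian correction for a negative externality can be written as a pair of first-order conditions. Illustratively (notation assumed, not from the source), let x be an AI provider's activity level, for example compute used, with private benefit B(x), private cost C(x), and external damage D(x) from energy use, privacy harm, or bias, where B is concave and C and D are increasing and convex.

```latex
\begin{aligned}
\text{private optimum: } & B'(x_p) = C'(x_p),\\
\text{social optimum: } & B'(x^*) = C'(x^*) + D'(x^*) \;\Longrightarrow\; x^* < x_p \text{ whenever } D' > 0,\\
\text{Pigouvian tax: } & t = D'(x^*) \text{ per unit, so } B'(x) = C'(x) + t \text{ holds at } x = x^*.
\end{aligned}
```

Setting the tax requires an estimate of the marginal damage D'(x^*), which is exactly the quantity the paper notes it does not estimate causally.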