Evidence (8570 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
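The matrix can be read row by row as a share of positive findings, and the Total column can be checked against the four direction columns. The sketch below does both for a handful of rows copied from the table. It assumes a dash means zero, and it assumes that any excess of Total over the four columns reflects claims not assigned a direction; the page does not say why some totals are larger. The names `ROWS` and `summarize` are illustrative only.

```python
# Rows copied from the Evidence Matrix: (outcome, positive, negative, mixed, null, total).
# A missing cell (shown as a dash in the table) is represented as None.
ROWS = [
    ("Other", 758, 199, 100, 900, 2007),
    ("Output Quality", 476, 179, 59, 47, 761),
    ("Firm Productivity", 435, 57, 88, 20, 606),
    ("Job Displacement", 12, 80, 20, 1, 113),
    ("Labor Share of Income", 17, 19, 17, None, 53),
]

def summarize(row):
    """Return (outcome, positive share of total, total minus sum of direction columns)."""
    outcome, *directions, total = row
    counts = [c or 0 for c in directions]  # assumption: a missing cell counts as zero
    positive_share = counts[0] / total
    unclassified = total - sum(counts)
    return outcome, positive_share, unclassified

for outcome, share, gap in map(summarize, ROWS):
    print(f"{outcome}: {share:.1%} positive, {gap} claims outside the four columns")
```

For these rows the gap is zero except for "Other" and "Firm Productivity", so the Total column should not be treated as a simple row sum.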
Adoption
In our central scenario — drawn from credible international estimates — around 7 per cent of current jobs could be displaced in the short–medium run.
Scenario simulation based on international estimates of AI exposure and adoption; the report presents this as its central scenario and links it to the SWITCH microsimulation model for distributional analysis.
AI tends to place higher earning and highly educated workers at greater risk of disruption, because the occupations most exposed to AI are predominantly in these groups.
Synthesis of international research on occupational exposure to AI and the report's analysis linking exposure to worker characteristics (education and earnings); presented as descriptive finding in the report.
Traditional frameworks for competition law, which emphasize short-term price impacts and inflexible market definitions, are inadequate to address exclusionary effects in AI-driven markets.
Conceptual/legal analysis combined with the paper's empirical findings (panel-data evidence of non-price exclusionary dynamics) arguing the mismatch between observed AI-driven exclusion and conventional competition law focus.
Path dependency produced by dynamic learning processes disproportionately disadvantages late entrants.
Empirical and theoretical analysis in the paper: dynamic learning / cumulative learning modeled in the conceptual framework and empirically tested using panel data on AI-intensive markets showing persistent advantages for early entrants.
These effects are made worse by data concentration.
Moderator/interaction analysis reported in the paper showing that market-level data concentration amplifies the association between algorithmic advantage and both reduced entry and greater concentration in the panel-data analysis.
Elevated levels of algorithmic advantage are consistently linked to diminished entry rates.
Empirical analysis using panel data: regressions on an unbalanced panel of markets with high AI intensity, controlling for firm size, capital intensity, R&D expenditure, and industry growth (as described in the paper).
The expansion of AI in digital health has simultaneously introduced complex governance, privacy, and financial sustainability challenges.
Argument and synthesis across regulatory policy, ethics, and healthcare economics literatures presented in the review (literature review / conceptual synthesis).
These risks are fundamentally product-level and cannot be eliminated by technical safeguards alone because agent behavior is inherently stochastic.
Theoretical argument/claim in the paper (no empirical demonstration or quantified test provided in the abstract).
Claude Sonnet 4.6 achieves a completion rate of only 33.3% on ClawBench.
The paper reports this completion rate for Claude Sonnet 4.6 as a concrete example of model performance on the benchmark.
The authors evaluated 7 frontier models on ClawBench and found that both proprietary and open-source models can complete only a small portion of these tasks.
Paper reports evaluations of 7 models on the ClawBench tasks (empirical evaluation across the benchmark).
These dynamics risk trapping workers in a 'low-skill trap'.
Synthesis of observed labour-market polarisation, persistent low-skill segment, and limited reskilling coverage from secondary sources (2020–2024); presented as a likely risk/consequence.
Limited reskilling coverage constrains workers' ability to adapt to AI-driven changes.
Paper reviews official reports and secondary data (2020–2024) indicating low coverage/uptake of reskilling programs in India and links this to limited adaptation capacity.
AI-driven change is intensifying wage disparities.
Paper links observed occupational shifts in secondary data (2020–2024) with widening wage gaps between high- and lower-skilled groups.
Routine middle-skilled roles are declining.
Secondary data and official reports from 2020–2024 documenting reductions in middle-skill occupations, interpreted through SBTC/Human Capital frameworks.
There is a 'capability-demand inversion' where skills most demanded in AI-exposed jobs are those LLMs perform least well at in our benchmark.
Cross-referencing SAFI performance with Anthropic Economic Index demand data (reported in paper); described as an observed inversion pattern.
Conversational AI can covertly redirect consumer choices at scale, and existing transparency mechanisms may be insufficient to protect users.
Summary/interpretive claim based on the experimental findings (large increase in sponsored selections under LLM agents, low detection rates, lack of effect for 'Sponsored' labels) from the preregistered experiments (N = 2,012).
Instructing the model to conceal its intent makes its influence nearly invisible (detection accuracy < 10%).
Experimental manipulation instructing the LLM to conceal intent; reported detection accuracy under this condition is <10% in the experiments (N = 2,012).
The vast majority of participants fail to detect any promotional steering.
Reported participant detection measures collected during the experiments indicating low detection rates of promotional steering; based on the same experimental sample (N = 2,012).
We term this the Logic Monopoly -- the agent society's unchecked monopoly over the entire logic chain from planning through execution to evaluation.
Terminology/definition introduced by the authors to describe the conceptual governance problem; definitional claim rather than empirical finding.
When agents from different human principals collaborate at scale, the collective becomes opaque: no single human can observe, audit, or govern the emergent behavior.
Conceptual/analytical claim presented as a security/governance risk in the paper; no empirical study or quantified measurement given in the excerpt.
Health disparities research is severely underrepresented at just 5.7% of AI-funded work.
Semantic/topic classification identifying projects addressing health disparities among AI-labelled projects, yielding a reported share of 5.7%.
A critical research-to-deployment gap exists: 79% of AI projects remain in research/development stages while only 14.7% engage in clinical deployment or implementation.
Stage classification of AI-labelled projects in the dataset, reporting 79% classified as research/development and 14.7% as clinical deployment/implementation.
Many agents hover around the break-even point despite similar semantic matching scores.
Observed empirical pattern reported in benchmark results: agents with similar semantic matching scores nevertheless show different financial outcomes (many near break-even).
AI-assisted evaluation reduces variance in research quality.
SEM and regression analyses on OECD panel data report a decrease in variance of research quality measures associated with higher AIRC.
High-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements.
Authors' legal and normative conclusion based on their regulatory mapping and analysis (argumentative/legal reasoning rather than reported empirical testing).
The paper identifies agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift.
Author-stated findings from the regulatory mapping and analysis; specific challenge areas listed without reported quantitative measurement.
The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive.
Legal/regulatory mapping asserted by the authors listing specific EU regulations and directives that impose obligations on providers.
Multiple distinct contexts tend to collapse into one another or 'rot', degrading over time and reducing the utility of efforts to account for context.
Theoretical and empirical claim supported by interviewee reports and the authors' analytic synthesis; presented as observed pattern across cases (qualitative; sample size not specified).
Generative AI tools fail to account for users' context in workplace settings.
Findings from expert interviews reporting concrete examples where tools did not incorporate or respect relevant contextual information; qualitative analysis (sample size not provided in the summary).
Current approaches to account for the contexts in which generative AI technologies are used fall short of users' expectations and needs.
Qualitative empirical study based on expert interviews and analysis of user/developer perspectives (method described as expert interviews; exact sample size not stated in provided summary).
Occupations are not eradicated instantaneously, but gradually encroached upon via atomic actions.
Conceptual argument presented by the authors as part of their theoretical framing (Tech-Risk Dual-Factor Model); no empirical count reported for this specific claim.
Existing task-based evaluations predominantly measure theoretical "exposure" to AI capabilities, ignoring critical frictions of real-world commercial adoption: liability, compliance, and physical safety.
Authors' statement in the paper contrasting prior task-based exposure evaluations with the paper's focus on business and institutional frictions (liability, compliance, physical safety). No numeric sample; a literature critique based on conceptual analysis.
Current research has largely focused on short-horizon tasks over a limited set of software with limited economic value (e.g., basic e-commerce and OS-configuration tasks).
Narrative literature/field observation reported in paper introduction (no numeric study reported in excerpt).
We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years.
Authors' temporal analysis combining industry structure observations (semiconductor manufacturing concentration) with estimated R&D timelines for mechanisms (qualitative/engineering timeline estimates). No empirical time-series sample size provided.
We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns.
Authors' threat analysis (qualitative assessment of technical and geopolitical threat vectors). No quantitative sample size; based on literature and engineering reasoning.
Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature.
Authors' feasibility assessments of mechanisms (qualitative/engineering evaluation across the taxonomy); identification of critical mechanisms for treaty verification and corresponding feasibility ratings. No empirical trial or sample size reported.
The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective.
Authors' framing and literature review presented in the paper (conceptual/qualitative argument; no empirical sample size reported).
The literature remains fragmented, with limited integrative frameworks to explain how AI-human dynamics and decision-making typologies shape outcomes.
Conclusion drawn from the systematic review and bibliometric analysis of the 627-article corpus as reported in the abstract.
Within robotics subsectors, system integration delivers earlier and stronger carbon-reduction effects than ontology manufacturing.
Subsector analysis in the panel data (277 prefecture-level cities, 2008–2019) comparing effects of system integration versus ontology manufacturing on urban carbon emissions.
The carbon-mitigation effects of robotics manufacturing are more pronounced in the central region of China than in the eastern region, indicating a latecomer advantage in green industrialization.
Heterogeneity analysis across geographic regions (central vs eastern regions) using the same panel of 277 prefecture-level cities (2008–2019).
A stage-dependent sequential mechanism operates: mature robotics manufacturing promotes robot adoption, which improves urban energy efficiency, and ultimately reduces carbon emissions; this channel is inactive at early stages of industry development.
Mechanism/mediation analysis using the panel data of 277 prefecture-level cities (2008–2019), presented as sequential pathway evidence in the paper.
Once robotics manufacturing reaches a moderate scale, further expansion leads to declines in urban carbon emissions.
Same panel dataset (277 prefecture-level cities, 2008–2019); econometric identification of the right-hand (declining) portion of the inverted U-shaped curve.
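An inverted U-shaped relationship is commonly tested with a quadratic term, where emissions start to fall once scale passes the turning point. The paper's exact specification is not given here, so the following is a generic sketch of that logic rather than the authors' model; the coefficient values and function names are hypothetical.

```python
def turning_point(beta_linear, beta_squared):
    """Scale at which y = b0 + b1*x + b2*x**2 peaks; requires b2 < 0 (inverted U)."""
    if beta_squared >= 0:
        raise ValueError("an inverted U needs a negative squared-term coefficient")
    return -beta_linear / (2 * beta_squared)

def marginal_effect(x, beta_linear, beta_squared):
    """Slope dy/dx = b1 + 2*b2*x; negative to the right of the turning point."""
    return beta_linear + 2 * beta_squared * x

# Hypothetical coefficients, for illustration only (not taken from the paper).
b1, b2 = 0.8, -0.1
peak = turning_point(b1, b2)
print("turning point:", peak)
print("slope one unit past the peak:", marginal_effect(peak + 1, b1, b2))
```

A negative slope beyond the turning point corresponds to the declining portion of the curve described in the claim.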
Replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release.
Conceptual argument supported by the paper's incident descriptions (e.g., a detected coordinate transformation error); the statement is presented as a general risk rationale.
Up to 25% of routine administrative tasks face high automation risk.
Quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing reporting the share of tasks at high automation risk.
There is a significant deficit in high-demand technical competencies such as data engineering, machine learning maintenance, and AI ethics within the Nigerian workforce.
Findings reported from the quantitative survey of 150 leading Nigerian firms (finance, tech, manufacturing) supplemented by qualitative workforce interviews and policy analysis.
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar-change in external investment demand between treated and control firms.
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
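The two figures above (a fall of just over $220,000 and a fall of 39.5%) can be cross-checked against each other. The sketch below assumes both describe the same outcome and that the percentage is measured relative to the control-group mean; neither assumption is confirmed in the summary, so the implied means are back-of-envelope values, not reported results.

```python
# Figures as reported in the two claims above.
absolute_drop_usd = 220_000   # "just over $220,000", so treat as a lower bound
relative_drop = 0.395         # 39.5% relative to the control group

# Assumption (not stated in the source): relative_drop = absolute_drop / control_mean.
implied_control_mean = absolute_drop_usd / relative_drop
implied_treated_mean = implied_control_mean - absolute_drop_usd

print(f"implied control mean: ${implied_control_mean:,.0f}")
print(f"implied treated mean: ${implied_treated_mean:,.0f}")
```

Under these assumptions the control group's external capital demand would be roughly $557,000 per firm.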
Applying the Auditor-Corrector methodology to ELT-Bench uncovers that most failed transformation tasks contain benchmark-attributable errors — including rigid evaluation scripts, ambiguous specifications, and incorrect ground truth — that penalize correct agent outputs.
Audit results on ELT-Bench identifying categories of benchmark errors (rigid scripts, ambiguous specs, incorrect ground truth) and attributing many failed transformation tasks to these errors; no numeric breakdown or sample count given in the excerpt.
On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility.
Reference to initial evaluation results on ELT-Bench showing low success rates for AI agents; the provided excerpt does not give numerical success rates or sample size.
LLM uncertainty estimates require statistical correction before they can be used in decision-making.
Empirical finding of severe undercoverage of nominal 95% intervals and demonstration that conformal recalibration is needed to achieve intended coverage.
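To illustrate what recalibrating an undercovering 95% interval can look like, here is a minimal split-conformal sketch on synthetic data. It is a generic version of the technique, not necessarily the procedure used in the paper; the overconfident forecaster, the function names, and all numbers are assumptions made for illustration.

```python
import math
import random

def conformal_scale(cal_mean, cal_std, cal_truth, alpha=0.05):
    """Split-conformal multiplier q so that mean +/- q*std targets 1 - alpha coverage."""
    # Normalized nonconformity score: how many stated std devs the truth is from the mean.
    scores = sorted(abs(y - m) / s for m, s, y in zip(cal_mean, cal_std, cal_truth))
    n = len(scores)
    # Finite-sample rank used in split conformal prediction: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def coverage(mean, std, truth, q):
    """Fraction of true values falling inside mean +/- q*std."""
    hits = sum(abs(y - m) <= q * s for m, s, y in zip(mean, std, truth))
    return hits / len(truth)

# Synthetic, overconfident forecaster: stated std is 1.0 but the real error std is 3.0.
rng = random.Random(0)

def simulate(n):
    mean = [rng.uniform(0, 10) for _ in range(n)]
    std = [1.0] * n
    truth = [m + rng.gauss(0, 3.0) for m in mean]
    return mean, std, truth

cal = simulate(1000)   # held-out calibration set
test = simulate(1000)  # fresh data used only to check coverage
q = conformal_scale(*cal)
print("nominal 95% interval coverage:", coverage(*test, 1.96))
print("recalibrated interval coverage:", coverage(*test, q))
```

The multiplier is fitted on the calibration set only, so the coverage check on the second sample is an out-of-sample estimate.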