Evidence (8066 claims)

Claim counts by topic. Topic counts sum to more than the 8,066 total because a claim can fall under multiple topics.

| Topic | Claims |
|---|---|
| Adoption | 5586 |
| Productivity | 4857 |
| Governance | 4381 |
| Human-AI Collaboration | 3417 |
| Labor Markets | 2685 |
| Innovation | 2581 |
| Org Design | 2499 |
| Skills & Training | 2031 |
| Inequality | 1382 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
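For readers who want to work with the matrix programmatically, a minimal sketch, assuming the table above has been exported to a CSV whose headers match the column names (the filename and export step are hypothetical):

```python
import pandas as pd

# Hypothetical CSV export of the evidence matrix above; headers assumed
# to match the table, with the "—" cells treated as missing values.
df = pd.read_csv("evidence_matrix.csv", na_values="—")

# Share of claims with a positive direction, per outcome category.
df["positive_share"] = df["Positive"] / df["Total"]

# Outcomes where negative findings outnumber positive ones, e.g.
# AI Safety & Ethics (179 vs 118) and Inequality Measures (76 vs 25).
skewed = df[df["Negative"] > df["Positive"]].sort_values("positive_share")
print(skewed[["Outcome", "Positive", "Negative", "positive_share"]])
```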
Claims & Evidence
Each claim is paired with the type of evidence reported for it.

Claim: Appropriate, solutions-oriented, and user-friendly tools for practitioners and decision-makers are insufficiently available; their availability should be increased.
Evidence: Tool development and iterative testing with end users within MYRIAD-EU, plus stakeholder feedback pointing to demand for more usable tools.

Claim: Methods are needed to generate both present-day and future multi-hazard and multi-risk scenarios that integrate climate change, socio-economic change, and cascading effects.
Evidence: Project development and testing of scenario methods, plus identification of remaining methodological gaps in scenario integration.

Claim: Concepts, definitions, and terminologies for multi-hazard and multi-risk work must be mainstreamed and harmonized to enable comparability and communication across disciplines and stakeholders.
Evidence: Stakeholder feedback and the project's synthesis of interdisciplinary outputs, highlighting conceptual fragmentation and communication barriers.

Claim: If quantum advantages accrue initially to well-capitalized incumbents (cloud providers, financial firms, pharmaceutical companies), we should expect increased market power and higher rents.
Evidence: Scenario analysis and historical analogs in which early compute advantages concentrated market power; qualitative market-structure modeling.

Claim: Benefits of quantum diffusion are likely to be uneven across countries, firms, and workers, boosting regions with strong innovation ecosystems and possibly increasing market concentration among compute-capable incumbents.
Evidence: Multi-region/sectoral modeling with heterogeneous adoption and capability parameters; historical analogs showing concentration following early compute advantages; scenario comparisons.

Claim: Without coordinated investments and governance, large theoretical gains may remain unrealized or be very unevenly distributed.
Evidence: Policy counterfactual scenarios in which underinvestment, fragmented governance, or restrictive export regimes reduce adoption elasticities and infrastructure readiness, producing lower and more concentrated macro gains than coordinated-investment scenarios.

Claim: On its own, high executive digital cognition tends to weaken the policy's positive effect on energy-utilization efficiency (interpreted as short-run adjustment costs of digital transformation).
Evidence: Interaction tests between policy treatment and an executive-level digital-cognition measure show a negative interaction coefficient in DID regressions; the authors interpret this as evidence of short-run adjustment costs.
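As a sketch of the kind of specification behind the finding above (all variable and file names are hypothetical; the authors' exact model is not reproduced here), a two-way fixed-effects DID with a policy × digital-cognition interaction:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical firm-year panel: 'treated_post' is the DID policy indicator,
# 'exec_digital_cognition' the executive-level digital-cognition measure.
panel = pd.read_csv("firm_panel.csv")

# Two-way fixed effects via firm and year dummies; the interaction
# coefficient is the quantity of interest (reported negative in the study).
model = smf.ols(
    "energy_efficiency ~ treated_post + treated_post:exec_digital_cognition"
    " + C(firm_id) + C(year)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["firm_id"]})
print(model.summary())
```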
Claim: The under-use of external text sources in the reviewed literature may be due to privacy, legal/regulatory uncertainty, or integration costs.
Evidence: Authors' interpretation linking the observed low coverage of external text sources (social media, news, reviews) in the 109 articles to plausible barriers (privacy/regulation/integration); no direct empirical test in the review.

Claim: Restrictions on cross-border data flows or fragmented privacy rules reduce the training data available to AI systems, lowering the quality and scalability of AI services exported internationally.
Evidence: Theoretical linkage and literature on AI training-data needs synthesized in the paper; no original empirical measurement of AI performance loss presented.
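One way to make this mechanism concrete is the standard power-law form from the neural scaling-law literature; this is an illustration under that stylized model, not an estimate from the paper:

```python
# Illustrative only: loss as a power law in dataset size, L(D) ~ D**(-beta),
# with beta ≈ 0.095 for language models (Kaplan et al., 2020).
BETA = 0.095

def relative_loss(data_fraction: float) -> float:
    """Loss relative to an unrestricted baseline when only `data_fraction`
    of the training corpus remains usable."""
    return data_fraction ** (-BETA)

# If data-flow restrictions cut usable training data in half, loss rises
# by roughly 7% under this stylized model.
print(f"{relative_loss(0.5):.3f}")  # ~1.068
```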
Claim: Support systems for digital-services exporters, especially SMEs, are inadequate in China.
Evidence: Review of policy documents and literature highlighting gaps in finance, legal support, and standards-compliance assistance for SME internationalization (qualitative).

Claim: China's platform firms show uneven internationalization, and their platform infrastructure is not consistently competitive internationally.
Evidence: Case examples and synthesis of domestic and international studies on platform internationalization included in the review (qualitative evidence).

Claim: China has limited influence in high-level trade rule formation.
Evidence: Policy review and comparative institutional analysis within the literature review; descriptive assessment of China's participation in multilateral rule-making (no formal measurement of influence).

Claim: Current institutional, technological, and market shortcomings limit China's ability to close the gap with economies operating under high-standard trade regimes.
Evidence: Qualitative comparative analysis of policy and institutional frameworks against high-standard trade members; literature and case examples (no new microdata).

Claim: Widespread deployment of similar models could create correlated failures or fraud vectors, implying systemic risk that may warrant macroprudential attention.
Evidence: Analytic caution based on model homogeneity and case/literature discussion; a speculative systemic-risk concern rather than an empirically demonstrated one.
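The systemic-risk intuition behind the model-homogeneity claim can be sketched with simple probability arithmetic (the numbers are assumptions for illustration, not from the paper):

```python
# Illustrative numbers (assumed, not from the paper).
p_fail = 0.01  # probability a model mishandles a given adverse input
n = 10         # number of institutions deploying a model

# Diverse, independent models: system-wide failure needs n independent errors.
p_all_fail_independent = p_fail ** n   # 1e-20

# One shared model: a single error replicates across every deployment,
# so the system-wide failure probability stays at the per-model rate.
p_all_fail_shared = p_fail             # 0.01

print(p_all_fail_independent, p_all_fail_shared)
```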
Claim: There is regulatory uncertainty around AI-generated filings and around responsibility/liability for automated outputs.
Evidence: Analysis and literature review discussing unclear regulatory positions and legal risks noted in case organizations' deployment considerations.

Claim: Integration complexity with legacy ERP/financial systems and shared-services-center processes is a significant implementation challenge.
Evidence: Case-study narratives describing integration work and friction points; analytic framing highlighting ERP compatibility issues.

Claim: Model hallucinations, lack of explainability, and limited audit trails constrain safe adoption.
Evidence: The paper cites literature and case observations about model reliability and explainability issues; examples and discussion are qualitative.

Claim: Data privacy, confidentiality, and cross-border data-transfer concerns are important barriers to deployment.
Evidence: Challenges enumerated from case studies and literature; specific organizational concerns cited in the cases (Xiaomi, Deloitte) and in the regulatory discussion.

Claim: Automation and human-robot assemblages can reproduce subjugation and vulnerability affecting care workers and marginalized users, requiring attention to distributional justice and labor-market impacts.
Evidence: Illustrative vignettes from healthcare robotics and literature synthesis on care ethics and labor impacts; no quantitative labor-market analysis presented.

Claim: Legal liability regimes and insurance products may systematically under- or mis-assign the costs of harm in socio-technical assemblages when primordial ethical demands are considered.
Evidence: Conceptual argument and suggested modeling directions; no empirical simulation or insurance-market data presented.

Claim: Treating responsibility as a Levinasian, asymmetrical moral obligation implies that it operates as a non-contractible externality that markets and contracts may fail to internalize, creating persistent externalities in AI deployment that standard economic models may miss.
Evidence: Theoretical implication derived from philosophical argument applied to economic concepts; suggested consequences, but no formal models or empirical validation in the paper.

Claim: Simple pluralist or multi-principle balancing approaches risk reproducing structural subordination by failing to foreground the asymmetrical ethical demand toward vulnerable Others.
Evidence: Normative critique supported by cross-disciplinary literature (care ethics, mediation, STS) and illustrative examples; no empirical test of pluralist approaches' effects.

Claim: The Levinasian framework helps reveal how human-robot interactions can both expose and reproduce systemic vulnerabilities, subjugation, and unaddressed harms (termed "Problem C": attribution of responsibility and distributed agency).
Evidence: Theoretical diagnosis supported by interdisciplinary literature synthesis and illustrative vignettes from healthcare robotics, autonomous vehicles, and algorithmic governance; no quantitative prevalence data.

Claim: Absent interoperability, divergence in data and AI rules will raise transaction costs, reduce trade gains, and create opportunities for regulatory arbitrage.
Evidence: Economic reasoning and scenario-based projections; asserted as an outcome of mechanism analysis rather than demonstrated with quantitative estimates.

Claim: Explainability, auditability, or data-localization requirements could favor larger vendors with compliance capacity, increasing market concentration and affecting competition among AI suppliers.
Evidence: Market-structure argument grounded in regulatory-compliance-burden analysis and comparative examples; not supported by empirical market data in the study.

Claim: Legal uncertainty and strict procedural requirements increase compliance costs and regulatory risk, which can slow AI adoption by firms and public agencies.
Evidence: Theoretical economic implications drawn from legal analysis and comparative observations; no empirical measurement of costs or adoption rates in the study.

Claim: AI can restrict or reshape human administrative discretion in legally sensitive ways.
Evidence: Doctrinal analysis of statutory specificity and formal procedural requirements in civil-law contexts, illustrated with Vietnam as the exemplar case; comparative observations.

Claim: Physical constraints (power-grid reliability, water consumption for cooling, and data-center capacity), together with diminishing marginal returns on scaling, make continued monolithic scaling economically and environmentally risky.
Evidence: Conceptual argumentation using known infrastructure constraints and economic reasoning about diminishing returns; no new empirical assessment or quantified risk analysis included.

Claim: Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems.
Evidence: Argument based on architectural patterns (multi-step reasoning, retrieval augmentation, multiple model passes) and per-query compute multipliers reported anecdotally in auxiliary literature; the paper provides no new benchmarked per-query compute measurements.
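A back-of-the-envelope sketch of the multiplier. The ~2 × params FLOPs-per-token rule of thumb for a transformer forward pass is standard; the model size and workload numbers are assumptions, not measurements from the paper:

```python
# Rule of thumb: a transformer forward pass costs ~2 * params FLOPs per token.
PARAMS = 70e9  # model size in parameters (assumed)

def query_flops(tokens_generated: int, passes: int = 1) -> float:
    return 2 * PARAMS * tokens_generated * passes

direct = query_flops(tokens_generated=200)  # plain single-pass answer
# Hypothetical reasoning pattern: 2,000 chain-of-thought tokens,
# sampled 10 times for re-ranking.
reasoning = query_flops(tokens_generated=2_000, passes=10)

print(reasoning / direct)  # 100.0 -- two orders of magnitude per query
```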
Claim: The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic.
Evidence: Synthesis of industry observations and early/anecdotal quantitative reports on operational workloads; no original empirical time-series or workload measurements provided in this paper.

Claim: Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns).
Evidence: Conceptual synthesis and argumentation drawing on observed industry trends (training/inference cost growth), infrastructure constraints (grid reliability, data-center cooling and water use), and theoretical diminishing marginal returns on model/data scaling; no new empirical dataset or controlled experiments reported in the paper.

Claim: Five qualitatively distinct D3 reflexive failure modes were identified in model responses, including categorical self-misidentification and false-positive self-attribution.
Evidence: Qualitative coding and taxonomy reported in Results: five D3 categories cataloged with examples; identification based on analysis of model responses to narrative dilemmas (sample drawn from the study runs).

Claim: A probe composed of deliberately unresolvable moral dilemmas embedded in literary (science-fiction) narrative resists surface performance and exposes a measurable gap between performed and authentic moral reasoning.
Evidence: Experimental application of the probe to 13 distinct LLM systems across 24 experimental conditions (13 blind, 4 declared re-tests, 7 ceiling-probe runs), with scoring and qualitative coding showing discriminating failure modes and a measurable gap in responses.

Claim: Existing AI moral-evaluation benchmarks largely measure surface-level, correct-sounding answers rather than genuine moral-reasoning capacity.
Evidence: Comparative argument based on study results showing a measurable gap when applying the authors' narrative-based probe (unresolvable SF dilemmas) versus standard benchmarks; empirical support comes from experiments across 24 conditions and 13 systems showing that systems produce plausible-sounding but reflexive/invalid reasoning on the narrative probe.

Claim: Capabilities and data advantages for certain vendors could lead to market concentration and platform dominance in AI-driven educational feedback.
Evidence: Expert concern about market dynamics synthesized from the workshop of 50 scholars; a theoretical warning without empirical market-structure analysis in the report.

Claim: Differential access to high-quality AI feedback systems and bias in training data can exacerbate educational inequalities and harm marginalized groups.
Evidence: Expert consensus and thematic analysis from the 50-scholar workshop, raising equity and bias risks; no empirical subgroup effectiveness estimates included.

Claim: Learners may over-rely on AI feedback or game systems to obtain desirable responses, reducing effortful learning.
Evidence: Workshop participant concerns synthesized qualitatively; cited as a risk and an open empirical question; no experimental data provided.

Claim: Field observations from an enterprise deployment demonstrate production failure modes traceable to missing identity propagation, timeout/budgeting policies, and machine-readable error semantics.
Evidence: Empirical context described as field lessons from an enterprise agent platform integrated with a major cloud provider's MCP servers; production failure vignettes and operational log analysis (client redacted).

Claim: MCP lacks three protocol-level primitives needed for reliable, production-scale agent operation: identity propagation, adaptive tool budgeting, and structured error semantics.
Evidence: Observational analysis and classification of production failures from an enterprise agent deployment; a taxonomy of failure modes identifying gaps in these specific areas.
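To make the three gaps concrete, a hypothetical envelope for a tool call, sketching what protocol-level support might look like. None of these fields exist in the actual MCP specification; they are invented here for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallEnvelope:
    """Hypothetical wrapper around an MCP tool call; every field is
    invented for illustration and is NOT part of the MCP specification."""

    # 1. Identity propagation: whom the agent is acting for, end to end,
    #    rather than collapsing everything into one service account.
    on_behalf_of: str
    delegation_chain: list[str] = field(default_factory=list)

    # 2. Adaptive tool budgeting: limits a runtime could enforce and
    #    degrade against instead of hanging or fanning out unboundedly.
    max_wall_clock_ms: int = 30_000
    max_downstream_calls: int = 25

    # 3. Structured error semantics: machine-readable outcomes such as
    #    {"code": "RATE_LIMITED", "retriable": True, "retry_after_ms": 1200}
    #    instead of free-text error strings an agent must guess at.
    last_error: dict | None = None
```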
Claim: Reliance on single-agent outputs or non-diverse agent ensembles can understate substantive uncertainty and bias conclusions in automated policy evaluation or AI-assisted empirical research.
Evidence: Observed substantial agent-to-agent variability (non-standard errors, NSEs) across the experiment's 150 agents, demonstrating that single-agent results do not capture between-agent methodological uncertainty; imbalance between model families further implies potential bias if only one family is used.
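A minimal sketch of the between-agent dispersion statistic implied here, on synthetic data (the study's actual estimates and family labels are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: one point estimate per agent for the same empirical
# question, tagged by model family (150 agents, echoing the study design).
families = rng.choice(["family_a", "family_b", "family_c"], size=150)
estimates = rng.normal(loc=0.12, scale=0.05, size=150)

# Between-agent dispersion (the "non-standard error"): variability of
# conclusions across agents, distinct from any one agent's sampling error.
nse = estimates.std(ddof=1)

# Family-level means show whether relying on one family would bias results.
for fam in np.unique(families):
    print(fam, round(float(estimates[families == fam].mean()), 4))
print("between-agent SD:", round(float(nse), 4))
```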
Claim: The post-exemplar convergence largely reflected imitation of exemplar choices rather than demonstrated understanding or principled correction by agents.
Evidence: Qualitative and behavioral analysis of agents' post-exposure outputs showing direct adoption of exemplar measures/procedures and a lack of substantive justification or mechanistic reasoning indicating comprehension; inference based on the content of agent code and writeups after exposure.

Claim: Chat-like interfaces commonly activate misleading beliefs, including overtrust in correctness/robustness, attribution of goals or moral agency, and underestimation of hallucination/bias/privacy risks.
Evidence: Aggregated observations from the HCI and ethics literature; suggested examples rather than empirical prevalence estimates; no sample size given.

Claim: Natural conversational style creates the impression that the system is human-like, intentional, or reliably knowledgeable.
Evidence: Conceptual claim supported by synthesis of prior work on anthropomorphism and conversational interfaces; no new quantitative data provided.

Claim: Reliance on preference signals risks learning spurious proxies and produces unstable behavior under distribution shift.
Evidence: Theoretical argument supported by examples of spurious proxies in ML and by observations in RLHF-trained models; the paper cites literature showing proxy behavior but does not present a unified empirical quantification specific to RLHF across many tasks.

Claim: Positive preference signals are continuous, context-dependent, and entangled with surface correlates (e.g., agreement with the user), which causes models trained on them to pick up spurious proxies and exhibit sycophancy and brittleness.
Evidence: Conceptual/theoretical argument describing structural properties of preference spaces, supported by cited observations of sycophantic behavior in models trained with preference-based objectives; no single definitive empirical quantification is provided within the paper, and supporting examples are drawn from recent literature.

Claim: Agents that attempt to infer others' reasoning depth may be vulnerable to strategic misrepresentation (partners could behave so as to induce incorrect ToM estimates).
Evidence: Conceptual analysis and discussion of strategic incentives in the paper, which identifies the risk and suggests potential mitigations (e.g., conservatism, verification, meta-reasoning).

Claim: Both too little and too much recursive reasoning (i.e., too shallow or too deep ToM) can produce poor joint behavior; miscalibrated anticipation harms coordination.
Evidence: Observed non-monotonic effects in the reported experiments, where fixed-order agents at either low or high ToM orders performed worse in mismatched pairings; evidence comes from the same multi-environment evaluation using joint-payoff and success-rate metrics.

Claim: Misalignment in Theory-of-Mind (ToM) order between agents (i.e., agents using different recursive reasoning depths) degrades coordination performance.
Evidence: Empirical experiments using LLM-driven agents with configurable ToM depth across four coordination environments (a repeated matrix game, two grid navigation tasks, and an Overcooked task); comparisons of matched (same-order) vs. mismatched (different-order) pairings using task-specific joint payoffs and success rates as metrics; see the harness sketch below.
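A skeletal version of the matched-vs-mismatched evaluation loop described above. The agents and environments are stubbed out with a noise-only placeholder; this sketches only the pairing design, not the paper's implementation or its results:

```python
from itertools import product
from statistics import mean
import random

def evaluate(run_episode, depths=(0, 1, 2), episodes=20):
    """Average joint payoff for every ordered pairing of ToM depths,
    then contrast matched (equal-depth) with mismatched cells."""
    cells = {
        (da, db): mean(run_episode(da, db) for _ in range(episodes))
        for da, db in product(depths, repeat=2)
    }
    matched = mean(v for (da, db), v in cells.items() if da == db)
    mismatched = mean(v for (da, db), v in cells.items() if da != db)
    return cells, matched, mismatched

# Dummy stand-in for the LLM-driven agents and the four environments;
# returns noise only and carries no empirical content.
dummy_episode = lambda depth_a, depth_b: random.random()

cells, matched, mismatched = evaluate(dummy_episode)
print(f"matched={matched:.2f} mismatched={mismatched:.2f}")
```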
Claim: There is a risk of manipulation and misinformation if argument mining/synthesis is unregulated or misaligned with social incentives, creating externalities that may justify public intervention.
Evidence: Conceptual risk assessment combining known misinformation dynamics and AI capabilities; no empirical incident data provided.

Claim: Increased error risk and weaker explainability from GLAI will raise malpractice and liability exposure for firms and lawyers, driving up insurance and compliance costs.
Evidence: Legal-risk analysis and economic reasoning connecting explainability and liability to insurance costs; no empirical cost studies presented.