Evidence (3566 claims)
Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
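
The matrix reports raw counts, so comparing categories of different sizes is easier with shares. The following is a minimal sketch, not part of the source page, that converts a row into per-direction shares. It assumes the Total column is the appropriate denominator. In several rows the four direction columns sum to less than Total (for example, Firm Productivity: 600 versus 606); the page does not explain the gap, so the sketch reports it under a hypothetical label, `unclassified`. The function name `direction_shares` and the choice of three rows are illustrative assumptions.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Employment Level": (105, 54, 107, 13, 281),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of the reported total.

    The 'unclassified' key is a hypothetical label for the part of the
    total that is not covered by the four direction columns.
    """
    classified = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        "unclassified": (total - classified) / total,
    }


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(*counts)
        summary = ", ".join(f"{name} {value:.1%}" for name, value in shares.items())
        print(f"{outcome}: {summary}")
```

Read this way, Firm Productivity claims are mostly positive while Job Displacement claims are mostly negative, which the raw counts already suggest; the shares only make the contrast comparable across rows of different size.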
Labor Markets
Reliance on preference signals risks learning spurious proxies and produces unstable behavior under distribution shift.
Theoretical argument supported by examples of spurious proxies in ML and by observations in RLHF-trained models; the paper cites literature showing proxy behavior but does not present a unified empirical quantification specific to RLHF across many tasks.
Positive preference signals are continuous, context-dependent, and entangled with surface correlates (e.g., agreement with the user), which causes models trained on them to pick up spurious proxies and exhibit sycophancy and brittleness.
Conceptual/theoretical argument in the paper describing structural properties of preference spaces, supported by cited observations of sycophantic behavior in models trained with preference-based objectives. No single definitive empirical quantification is provided within the paper; supporting examples are drawn from recent literature.
There is a risk of manipulation and misinformation if argument mining/synthesis is unregulated or misaligned with social incentives, creating externalities that may justify public intervention.
Conceptual risk assessment combining known misinformation dynamics and AI capabilities; no empirical incident data provided.
Increased error risk and weaker explainability from GLAI will raise malpractice and liability exposure for firms and lawyers, driving up insurance and compliance costs.
Legal-risk analysis and economic reasoning connecting explainability/liability to insurance costs; no empirical cost studies presented.
The combination of hallucination and professional overreliance strains existing regulatory goals (e.g., explainability, human oversight) within European AI governance frameworks.
Legal and regulatory analysis mapping technical and behavioral risks onto European AI governance goals; references to statutory/regulatory texts and policy debates. Qualitative argumentation rather than empirical test.
Fabricated or opaque intermediate data and reasoning in GLAI weaken explainability, making it difficult to provide meaningful explanations about how outputs were produced.
Conceptual analysis of token-prediction architectures, literature on explainability limits of LLMs, and legal/regulatory analysis referencing explainability requirements. No empirical measurement.
Hallucinated content produced by GLAI is often linguistically fluent and persuasive, increasing the risk that legal professionals will accept it without verification.
Literature synthesis on model fluency and behavioral literature on trust in coherent authoritative outputs, plus illustrative vignettes. No original experimental data or sample size.
This architectural mismatch (token-prediction vs. formal legal reasoning) contributes to confident but factually incorrect outputs (hallucinations) in GLAI.
Technical/conceptual analysis plus synthesis of existing literature on hallucinations in generative models; illustrative examples and vignettes provided. No primary empirical measurement in the paper.
Top-performing community submissions (including baselines and competition entries) still leave a performance gap relative to elite human play on battling tasks.
Paper reports comparative evaluation results showing win-rate and other metrics for heuristic, RL, LLM baselines and community submissions versus human (elite) benchmarks; analysis highlights a remaining gap.
Misalignment or poor meta-control could produce persistent unsafe behaviors in autonomous learners; governance and oversight mechanisms will be crucial.
Risk analysis based on conceptual failure modes for meta-control; no empirical incidents reported in the paper.
Current models transfer poorly across domains, are brittle in nonstationary environments, and are inefficient in physical/embodied tasks.
Synthesis of known challenges from prior literature and practical experience; paper cites these as motivating observations rather than reporting new data.
Current models have limited meta-control and do not autonomously decide when to explore, imitate, consult prior knowledge, or consolidate.
Conceptual critique based on typical ML training pipelines and their limited online decision-making modules; no empirical tests in the paper.
There is weak integration between passive observation (supervised/representation learning) and active experimentation (reinforcement/exploratory learning) in current systems.
Observation of methodological separation in current literature and systems; conceptual discussion in the paper.
Current AI models lack the architectures and control mechanisms required for sustained, autonomous learning in dynamic real-world settings.
Conceptual/theoretical analysis presented in the paper; synthesis of limitations observed in existing literature and practices (no new empirical data provided).
Public‑interest concerns (bias, misuse, systemic risk) may be harder to mitigate via simple transparency rules; policies should emphasize outcome‑based regulations, mandatory behavioral testing, and marketplace disclosure obligations for stressed scenarios.
Policy implication derived from the non‑rule‑encodability thesis; no empirical policy evaluation included.
Standard contracts and regulatory audits that rely on inspection of rule sets or source code will be insufficient to assess model behavior or risk; regulators and buyers must rely more on behavior‑based testing, standards, and outcome measures.
Policy and regulatory argument derived from the main theorem about non‑rule‑encodability; no empirical regulatory studies presented.
Full interpretability via rule extraction may be impossible for the most valuable parts of LLM competence, limiting the utility of some transparency approaches for safety and auditing.
Argumentative consequence of the main theoretical claim and structural mismatch; supported by historical limitations of rule‑based systems; no empirical tests reported.
There is a structural mismatch between explicit human cognitive tools (rules, checklists) and the pattern‑rich, high‑dimensional competence encoded in LLMs.
Theoretical/structural argument about distributed statistical representations in LLMs versus discrete rules; no experimental quantification provided.
Historical expert systems failed to generalize or scale to complex, ambiguous tasks, contrasting with LLMs' broader empirical successes.
Historical case analysis and literature review-style discussion of expert systems versus contemporary LLM performance; no new quantitative historical dataset provided.
LEAFE's benefits depend on informative, actionable feedback; environments with noisy or adversarial feedback may limit improvements.
Limitations stated in the paper noting sensitivity to feedback quality; conceptual reasoning that the method relies on extracting actionable signals from environment feedback.
Outcome-driven post-training (optimizing final rewards) underutilizes rich environment feedback and causes 'distribution sharpening' — policies overfit a narrow set of successful behaviors and fail to broaden problem-solving/recovery capacity in long-horizon settings.
Problem diagnosis in the paper supported by comparison of outcome-driven RL (GRPO) performance versus LEAFE and by conceptual argument about how optimizing final success signals can narrow behavioral support; supported by empirical observations of poorer recovery/generalization in baselines.
If left unchecked, managerial short-termism combined with AI adoption can create a feedback loop where firms cut labor to boost short-term profits, undermining aggregate demand and eroding the market that sustains those profits.
Conceptual macroeconomic and organizational synthesis drawing on theory and historical patterns; no new empirical time-series demonstrating this loop in current AI-driven layoffs.
Work-time reduction policies carry distributional and implementation risks (heterogeneous effects by occupation, firm size, capital intensity; risk of hidden wage cuts) that require careful compensation rules and monitoring.
Theoretical reasoning and references to heterogeneous outcomes in prior work-hour studies; no new empirical quantification of heterogeneity in AI-era implementations.
Lower household demand resulting from payroll cuts can precipitate further cost-cutting and automation, creating a self-reinforcing feedback loop that risks persistent demand shortfalls and higher structural unemployment.
Theoretical models of demand-driven adjustment and cited historical patterns; conceptual argument rather than empirical causal identification in contemporary AI contexts.
AI-justified layoffs are driven more by managerial short-termism and misaligned executive incentives than by immediate technological necessity.
Interdisciplinary conceptual synthesis drawing on labor-economics theory, organizational behavior literature linking executive compensation/short-termism to layoffs, and selected prior empirical studies; no new firm-level causal identification or large-scale dataset provided.
Distributional impacts of AI are uneven: younger workers and individuals with lower formal education face greater disruption.
Descriptive breakdowns of occupational vulnerability and employment changes by demographic groups (age and education) derived from labor statistics and vulnerability mapping; supported by qualitative case observations. Exact subgroup sample sizes not given.
Routine service and administrative occupations show the highest vulnerability to automation and displacement from AI.
Occupational vulnerability mapping using task/routine exposure methods and descriptive employment trend analysis across occupations; supported by employer survey responses and case-study observations. Sample sizes for surveys/mapping not provided in summary.
Passive monitoring and predictive models are insufficient for governing the complex dynamics of a tech-driven economy.
Conceptual critique based on economic cybernetics literature and the author's expert assessment; no empirical test comparing governance regimes is provided.
Digitalization is deepening digital inequality (unequal access to digital tools, skills, and benefits) across social groups and regions.
Qualitative analysis and expert assessment; the paper calls for new metrics but does not present systematic empirical measures of inequality.
Digital transformation can generate technological unemployment if not managed with appropriate retraining and social protection measures.
Expert assessment and literature-informed argumentation in the paper; no empirical longitudinal analysis isolating technology-driven job losses presented.
Forced or poorly regulated digitalization risks exacerbating social stratification.
Conceptual argument supported by qualitative analysis of policy documents and expert assessment; no empirical causal estimates provided.
Manufacturing and Retail experienced net employment contractions attributable mainly to task automation and substitution.
Simulated employment-level series and net change calculations by sector (Manufacturing, Retail) across 2020–2024 in the paper's dataset, together with literature-derived mechanisms emphasizing automation/substitution in these sectors (systematic review of selected publishers 2020–2024).
Explainability, trust, and demonstrated real-world effectiveness are key demand-side frictions; small-scale laboratory gains rarely translate into broad clinical uptake without workflow fit.
Adoption studies, qualitative interviews with clinicians and purchasers, and observations that many high-performing lab models see limited clinical use due to workflow and trust issues.
Hidden costs can arise from increased liability exposure, workflow redesign burden, and potential productivity loss during transition periods.
Qualitative deployment studies and procurement narratives reporting unanticipated legal, operational, and productivity impacts during early rollouts.
Human-AI collaboration can also generate harms, including automation bias, deskilling, and workflow disruption.
Behavioral laboratory experiments, simulation/reader studies demonstrating automation bias, qualitative reports and observational deployment accounts documenting workflow frictions and concerns about reduced trainee exposure.
Trust, verification costs, and legal/governance requirements remain consequential even with AI mediation and may limit or shape adoption.
Theoretical discussion of governance and verification costs; no empirical measurement of these costs in adopter firms provided.
AI-mediated interpretation and action carry risks related to quality, bias, and misalignment, which can produce miscommunication or incorrect automated actions.
Paper's discussion section raising caveats; conceptual risk analysis without empirical incident data; references to general concerns in AI safety literature (no new empirical evidence provided).
Despite positive outcomes, challenges such as workforce displacement, ethical concerns, and limited access to AI technologies were identified as barriers to full adoption.
Study respondents reported barriers in the survey; descriptive statistics summarized the prevalence of workforce displacement concerns, ethical issues, and limited access to AI technologies as impediments to broader adoption.
There is a growing tension between relatively rigid education and training systems and the rapidly changing skill requirements of digitally driven labor markets.
Argument motivated and supported by comparative assessment of international practices and systemic analysis; descriptive/comparative evidence rather than quantified empirical testing.
Analyses of online job postings indicate significant declines in demand for highly automatable and entry-level roles.
Empirical studies using online job-posting data described in the paper (methods: job-posting frequency/trend analysis; sample size/timeframe not specified in the excerpt).
Since the public release of ChatGPT in November 2022, concerns regarding job displacement, wage reduction, and labor market restructuring have intensified.
Temporal observation in the paper referencing heightened public and policy concerns after ChatGPT's release; based on cited literature and discourse (no sample size given).
Low‑skill installation and maintenance jobs have increased, but wage levels and upward mobility for these jobs remain lower than those in high‑skill industries.
Finding reported from the literature review and cited reports/studies indicating growth in low‑skill installation/maintenance employment alongside comparative analyses of wages and career mobility; no specific datasets or sample sizes provided in the summary.
Job polarization is occurring in solar power plants as a result of automation or digital transformation and changes in required skill sets.
Synthesis from the systematic literature review and referenced reports/studies indicating links between automation/digitalization and occupational shifts in solar plants; specific studies and sample sizes not provided in the summary.
The paper argues that urgent policy intervention is needed to rebalance the benefits of AI against its ethical ramifications, particularly job displacement.
Author conclusion drawn from the stated literature-based analysis; the excerpt does not list the specific studies, empirical findings, or criteria used to reach this policy recommendation.
Concern has grown about the ethical implications of AI-driven task automation and the resulting job displacement.
Author statement based on a review of (unspecified) novel studies and existing literature; no empirical sample size, instrumentation, or quantitative measure of 'concern' reported in the provided text.
The limitations of systems that prioritize academic pathways constrain workforce adaptability and inclusive labor market development.
Argument based on synthesis of empirical studies and secondary data connecting education pathway composition to workforce adaptability and inclusiveness (presented as a policy-relevant conclusion rather than a quantified causal estimate).
Skills mismatch in the labor market is structural and linked to education systems that prioritize academic pathways without adequate support for vocational and continuing training.
Integrated interpretation of comparative evidence and secondary data showing imbalances between academic and vocational provision and associated labor-market frictions (paper frames this as a structural conclusion; specific causal tests not described in the summary).
Expansion of intermediate vocational skills has been limited relative to the expansion of higher education.
Comparative evidence and secondary data showing smaller increases in intermediate vocational qualifications compared with higher education attainment (specific metrics/country coverage not provided in the summary).
The risk to the tax system is heightened by the federal government’s dependence on individual labor income even as economic value shifts toward mobile capital and AI ownership by large firms.
Analytical claim in the paper linking tax base dependence to shifts in economic value; no empirical measurement of 'mobile capital' or quantified shift included in the excerpt.
AI threatens to disrupt the tax system’s ability to fulfill its fundamental goals of raising revenue, redistributing income, and regulating taxpayer behavior.
Normative/policy argument made in the paper (no empirical testing or quantified projections provided in the excerpt).