Evidence (6869 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
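As a reading aid for the matrix above, the sketch below computes each direction's share of the listed counts for three rows copied from the table. The row selection and the `direction_shares` helper name are illustrative assumptions. So is the choice of denominator: the sum of the four listed directions rather than the Total column, which in several rows is larger than that sum.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20),
    "AI Safety & Ethics": (218, 277, 65, 33),
    "Job Displacement": (12, 80, 20, 1),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the four listed counts.

    Assumption: the denominator is the sum of the four listed directions,
    not the table's Total column.
    """
    total = sum(counts)
    return {name: n / total for name, n in zip(DIRECTIONS, counts)}


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{outcome}: {summary}")
```

Comparing shares rather than raw counts makes it easier to see which outcomes lean negative (for example Job Displacement) regardless of how many claims each row contains.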
Governance
Information-quality externalities from misinformation and reduced trust impose social costs that are not internalized by producers, justifying policy interventions such as liability rules or provenance standards.
Theoretical externality reasoning and policy literature reviewed; the paper includes no empirical quantification of social-welfare effects.
Economies of scale, data-driven advantages, and compute costs may concentrate market power in a few platforms or studios, raising entry barriers.
Market-structure reasoning and referenced industry analyses in the literature review; no empirical market-concentration metrics computed in the paper.
Cross-border enforcement difficulties and divergent national rules produce legal fragmentation in regulation and judiciary responses to generative audiovisual AI.
Comparative review of international statutes and judicial approaches included in the paper; qualitative legal analysis rather than empirical cross-jurisdictional enforcement metrics.
Process-stage risks include concentration of capabilities among a few platforms/actors and deficits in control, governance and transparency (e.g., limited explainability and restricted model access).
Policy and market-structure literature reviewed; descriptive evidence of platform concentration cited qualitatively but no original market-share analysis or sample sizes.
Key data challenges in African contexts are measurement error, censoring, selection bias (informal actors absent from official datasets), privacy/ethical concerns, and limited digital trace coverage in some regions.
Methodological critique synthesised from literature in the paper.
Key constraints on realized gains include governance complexity, model reliability limits (errors, brittleness, distribution shifts), orchestration challenges integrating agents across systems, and ongoing need for human oversight for safety, fairness, and quality control.
Qualitative observations and limitations reported from the Alfred AI deployments and authors' analysis of operational experience; evidence comes from live deployments but is descriptive rather than quantitative.
Data-driven agritech platforms exhibit network effects and potential for market power, implying a policy need for data portability and interoperability to preserve competition.
Economic reasoning, policy reports, and case study examples summarized in the review; the claim is grounded in market analysis rather than large-scale causal studies.
If left unregulated and untargeted, AI and digital agritech platforms risk concentrating surplus with technology providers and capital owners, potentially increasing rural inequality and weakening smallholder bargaining power.
Theoretical market-structure analysis, case studies of platform markets, and policy analyses cited in the paper; empirical causal evidence on long-run distributional effects is limited.
Data ownership, lack of interoperability, privacy concerns, and concentration of digital agritech platforms create risks for competition and equitable value capture in agricultural value chains.
Policy reports, market analyses, and case studies discussed in the paper; the claim is supported by descriptive evidence and theoretical assessments rather than large causal estimates.
Accumulated latent defects from unchecked AI outputs create negative externalities across dependent systems, complicating pricing and insurance; liability and cyber insurance markets may need to adapt.
Policy and economics argumentation drawing on externality theory; no actuarial or insurance-market empirical analysis provided.
Measured productivity gains from AI-assisted development may overstate welfare gains if verification costs, defect externalities, and long-run fragility are omitted from accounting.
Economic reasoning and accounting argument; no empirical accounting studies or welfare analyses presented.
The harm from latent defects is diffuse and slow-moving, making it easy for decision-makers to underweight these risks in adoption choices.
Descriptive argument drawing on behavioral economics concepts (discounting, salience); no empirical decision-making data included.
Small, unverified changes accumulate over time into system-level fragility, hidden bugs, and security vulnerabilities (latent risk accumulation).
Causal reasoning and illustrative examples; no longitudinal empirical measurement of defect accumulation presented.
AI-assisted code generation produces a throughput asymmetry: generation capacity rises much faster than human or automated verification capacity.
Synthesis of conceptual arguments and illustrative scenarios; no quantitative empirical evidence or sample-based analysis included in the paper.
Verification (human review, testing, security analysis) does not scale at the same rate as AI-assisted generation and becomes the bottleneck.
Mechanism reasoning and qualitative argumentation; illustrative examples showing mismatch between generation and verification capacity. No empirical scaling measurements provided.
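The throughput asymmetry described in the claims above can be illustrated with a toy backlog model. This is a minimal sketch, not a model from the cited paper: the function name `unverified_backlog`, the weekly time step, and all rates are hypothetical values chosen only to show how a gap between generation and verification capacity accumulates.

```python
def unverified_backlog(generated_per_week, verified_per_week, weeks):
    """Track the number of changes awaiting verification, week by week.

    Each week new changes arrive and reviewers clear at most their capacity;
    anything not cleared carries over. All inputs are hypothetical.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += generated_per_week
        backlog -= min(backlog, verified_per_week)
        history.append(backlog)
    return history


# Hypothetical rates: generation triples with AI assistance, review stays flat.
before = unverified_backlog(generated_per_week=40, verified_per_week=50, weeks=4)
after = unverified_backlog(generated_per_week=120, verified_per_week=50, weeks=4)
print(before)  # [0, 0, 0, 0]
print(after)   # [70, 140, 210, 280]
```

In this toy setting the backlog stays at zero while verification capacity exceeds generation, and grows linearly once generation outpaces it. That growth is the bottleneck the claims describe.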
Overreliance on generative AI risks eroding workers' critical thinking and causing a loss of tacit expertise.
Conceptual arguments supported by observational reports and theoretical concerns in the literature synthesis; limited empirical evidence cited.
Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
Security analyses, documented incidents, and economic externality reasoning synthesized from the literature; empirical quantification of social cost is limited.
Generated code may incidentally reproduce copyrighted or licensed snippets from training data.
Analyses detecting verbatim or near-verbatim reproductions of licensed/copyrighted code in model outputs in selected tests and audits; evidence heterogeneous and depends on prompts and model/data.
Outputs often lack deep, project-level contextual reasoning (e.g., design tradeoffs, architecture constraints).
Qualitative failure-mode analyses, user studies, and benchmark tasks showing limitations in system-level reasoning and context-aware design decisions; evidence from short-horizon labs and case studies.
There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
Educational studies and observational analyses indicating reduced engagement with underlying concepts for some learners using AI assistance, plus qualitative reports from instructors; studies often short-term.
There is a significant political-economy risk that dominant states or firms (an "AI superpower" veto) could block or undermine coordination on token taxes.
Political-economy discussion identifying veto risks and possible deterrent mechanisms; conceptual argumentation without empirical probability estimates.
FLOP taxes face measurement, enforceability, and leakage challenges, and they tax inputs rather than the point where value is realized.
Comparative critique presented in the paper; conceptual analysis without empirical measurement of FLOP-tax implementations.
Conversely, lack of standards or failed validation can create regulatory setbacks, reputational risk, and stranded R&D spending.
Case reports and regulatory analysis in the narrative review describing negative outcomes from failed validation or non-aligned AI tools (qualitative evidence).
Productivity gains from deploying agentic AI may be overstated if alignment costs, monitoring overhead, and coordination inefficiencies are ignored.
Conceptual economic accounting argument; recommends new accounting categories and empirical studies to quantify these factors.
Agentic systems generate tail risks and endogenous systemic correlations (multiple systems converging on similar failure modes), creating new insurability challenges.
Theoretical risk analysis and analogy to systemic risk literature; proposed implications for insurance markets but no empirical testing.
Coordination and control mechanisms (hierarchies, protocols, monitoring) face scalability and specification problems when agents generate unforeseen actions.
Theoretical analysis and examples from multi-agent/organizational theory; no empirical measurement included.
Human cognitive learning processes (calibration, error-correction) may misalign with agentic AIs because humans and AIs learn from different signals and on different horizons.
Conceptual argument supported by cross-disciplinary literature synthesis; empirical tests are proposed but not conducted in the paper.
Relational interaction mechanisms (trust, norms, mutual adjustment) can break down when AI objectives diverge or are opaque, reducing effective teaming.
Argument drawing on human factors and HAT literature; no new experimental data presented.
Agreement on bounded outputs (specifications, short-term goals) is insufficient for maintaining alignment with agentic AI.
Theoretical critique of specification-based alignment approaches; literature on limits of bounded specifications applied to open-ended systems.
Agentic AI undermines key assumptions that shared awareness will reliably stabilize coordinated action over time.
Theoretical argument showing mismatches in representation, timescales, and learning dynamics between humans and agentic AIs; drawn from literature synthesis rather than empirical tests.
Under agentic conditions, alignment cannot be treated as a one-time agreement over bounded outputs; it must be continuously sustained as plans and priorities evolve.
Conceptual argument and modeling in the paper; literature synthesis highlighting limits of specification-based alignment approaches; no empirical validation presented.
Agentic AI creates a new kind of structural uncertainty for human–AI teaming (HAT).
Theoretical/conceptual synthesis across literature on HAT, Team Situation Awareness (Team SA), human factors, multi-agent systems, and AI alignment; no new empirical data.
Regulators can operationalize 'human oversight' through auditable handover architectures like DAR, but this will increase compliance and record-keeping costs for firms and public bodies.
Policy implication argued in the paper: coupling Reversal Register and hysteresis parameters to regulatory enforcement; no empirical cost estimates provided.
Current AI tooling often mismatches existing team workflows and CI/CD pipelines, reducing seamless adoption.
Qualitative observations and practitioner reports from the Netlight study describing tooling and workflow frictions; specific integrations or lack thereof discussed but not quantitatively evaluated.
Generated code can introduce security vulnerabilities and licensing/IP ambiguity, raising quality, security, and IP concerns.
Practitioner concerns and examples documented in interviews and observations at Netlight; paper cites security and IP uncertainty as recurring themes; no systematic security scans or legal analyses reported.
Compliance with GDPR/CCPA and auditing for bias/harms imposes non-trivial technical and legal costs; implementing federated learning and DP increases engineering complexity and compute cost.
Paper's policy and cost discussion; cites increased engineering complexity and compute demands for privacy-preserving deployments but does not present quantified cost estimates.
Firms need complementary investments (data pipelines, monitoring tools, feedback loops, human oversight systems) which materially affect the economics of adoption.
Industry case studies and practitioner reports synthesized in the review describing necessary complementary investments; no quantified investment sample or ROI analysis provided here.
Regulatory attention is likely to focus on transparency, liability for factual errors, data privacy, and nondiscrimination; compliance and auditing will add to adoption costs.
Policy and regulatory analyses aggregated in the review and references to ongoing regulatory discussions; no primary regulatory impact study conducted in this paper.
Generative AI currently lacks genuine empathy and relational capabilities necessary for high-stakes or sensitive interactions.
Conceptual analyses and practitioner case examples aggregated in the review; limited direct quantitative measurement cited in this brief review.
Generative models exhibit contextual misunderstandings and cannot reliably infer nuanced customer intent in all cases.
Synthesis of empirical studies and practitioner observations documenting misinterpretation and intent-detection failures; no new testing reported in this review.
There is substitution risk: routine ideation and drafting tasks may be automated, altering task-level labor demand and wage structure.
Task-automation literature and empirical studies of LLMs performing routine drafting/ideation tasks summarized in the review; no long-run labor-market causality established in the paper.
Generative AI lacks reliable situational judgment on ambiguous problems and on ethical trade-offs, making it insufficient for autonomous decision-making in such contexts.
Case examples and experimental studies cited in the synthesis showing inconsistent or inappropriate responses to ambiguous/ethical scenarios; no large-scale causal evidence provided.
LLMs are prone to bias, mediocrity, and factual or logical errors when domain-specific context or experiential knowledge is absent.
Review of empirical evaluations documenting biased outputs, superficial or mediocre suggestions, and factual errors in open-ended tasks and domain-specific prompts; evidence comes from multiple short-term studies and applied examples.
LLMs are predominantly recombinative: they tend to rework and recombine existing material rather than produce deeply novel insights.
Analytical synthesis of output analyses and creativity assessments from multiple empirical studies demonstrating frequent recombination of existing concepts and lower rates of highly original novelty; studies and measures vary.
Proliferation of low-quality or biased AI-generated ideas creates externalities: increased filtering and reputational costs for firms and risks of poor product designs, ethical lapses, or regulatory violations if evaluation is insufficient.
Case studies and qualitative reports documenting filtering burdens and instances of biased/misleading outputs; theoretical reasoning about reputational and regulatory risks; direct quantification of these externalities is limited.
Standard productivity metrics (e.g., TFP) may undercount the value of ideation and creative augmentation provided by generative AI, making attribution between human and AI contributions difficult.
Methodological discussion in the review supported by heterogeneity in outcome measures across studies and challenges in measuring implemented idea quality and long-run impacts.
Generative models exhibit recombination bias: they tend to remix existing patterns rather than produce deeply original, paradigm-shifting insights.
Synthesis of output analyses across studies showing frequent recombination of known patterns and limited evidence of wholly novel, paradigm-changing ideas; claim based on qualitative and comparative analyses in reviewed literature.
Integration complexity (data access, context continuity, privacy/security, workflow alignment) raises implementation costs and time-to-value.
Deployment case studies and vendor reports documenting engineering effort, data plumbing, compliance work, and multi-month integration timelines; no aggregated cost meta-analysis provided.
Lack of genuine empathy and emotional intelligence undermines performance on complex or emotionally charged interactions.
Qualitative assessments and noisy measurements from pilot studies and customer feedback in complex cases; limited experimental validation and heterogeneous metrics.
Time/resource costs for re-running analyses and lack of computational environment capture (e.g., Docker/conda containers) increase the difficulty of reproducing results.
Empirical notes from reproduction attempts about compute/time burdens and survey/interview responses highlighting absence of containerized or captured environments as an obstacle.
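The environment-capture obstacle named above can be partly addressed even without containers. The sketch below records the interpreter version, platform string, and installed package versions using only the Python standard library. It is an illustrative assumption, not a method from the cited study. The function name `capture_environment` is hypothetical, and a snapshot like this does not replace a Docker image or conda environment file, since it omits system libraries and hardware details.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Collect interpreter, OS, and installed-package versions as a dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON; redirect it to a file kept with the results.
    print(json.dumps(capture_environment(), indent=2))
```

Saving this output alongside analysis results gives a later reproducer a concrete list of versions to install, which lowers the cost of a re-run but does not remove it.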