Evidence (6869 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Governance
Remove filter
AI integration creates challenges such as workforce displacement that must be addressed.
Authors raise workforce displacement as a challenge/consideration in the paper's discussion; this appears as a qualitative claim rather than an empirically quantified result in the supplied text.
AI integration creates challenges such as algorithmic bias that must be addressed.
Authors identify algorithmic bias as a notable challenge in the discussion/conclusion; presented qualitatively rather than as an estimated empirical outcome in the supplied text.
Responsible AI research typically focuses on examining the use and impacts of deployed AI systems, and there is currently limited visibility into the pre-deployment decisions to pursue building such systems.
Argument and literature framing presented in the paper based on a scoping review of academic literature, civil society resources, and grey literature.
This concentration can diffuse responsibility and raise the probability of irreversible system-level loss even when local per-action error rates remain low.
Theoretical result/argument from the model linking concentrated decision-energy to increased systemic risk despite low local error rates.
Efficiency pressure, path dependence, scale feedback, and weak boundary constraints concentrate decision-energy in the most efficient node.
Derived from the paper's formal model and argumentation about system dynamics (efficiency and feedback mechanisms); theoretical rather than empirical evidence.
Declining deployment friction changes the safety problem at its root: safety is not only local output correctness or preference alignment, but the control of irreversibility under rising decision density.
Main theoretical argument of the paper; supported by conceptual framing and a formal model that introduces decision-density considerations.
Recent AI systems compress the distance between capability growth and capability deployment.
Conceptual and descriptive claim in the paper's introduction; supported by theoretical argumentation and illustrative examples rather than empirical measurement.
Of these four, integration capacity is the least developed for scientific institutions and the most binding: no improvement in AI tooling can buy it.
Normative/diagnostic claim in the paper about relative scarcity and irreducibility of integration capacity; no empirical measures or sample provided in the excerpt.
Four complements then become scarce and load-bearing for AI-augmented science: verified signal, legitimacy, authentic provenance, and integration capacity (the community's tolerance for delegated cognition).
Theoretical framework proposed by the paper; list of four complements presented as an argument without empirical quantification in the excerpt.
The most valuable AI capabilities (reasoning, judgment, intuition) are precisely those we cannot verify with current methods.
Argumentative claim in the position paper linking capability value to unverifiability; no empirical validation or measurement of 'value' or verifiability included.
Current reliability methods can only verify explicit knowledge against sources, creating a fundamental gap in verifying AI's implicit knowledge.
Conceptual critique in the paper of existing verification/validation approaches; no systematic review or empirical comparison provided.
Implicit knowledge remains unexternalized because documentation cost exceeds perceived value.
Presented as an economic/theoretical explanation in the paper; no empirical study, sample, or cost estimates provided.
Specification discipline, not model capability, is the binding constraint on AI-assisted software dependability.
Synthesis conclusion by the authors based on the multivocal literature review, telemetry findings, conceptual modeling (PRP/SGM), and the four-month pilot evaluation.
These conflicting findings constitute the Productivity-Reliability Paradox (PRP): a systematic phenomenon emerging from non-deterministic code generators and insufficient specification discipline.
Conceptual synthesis and interpretation by the paper's authors, based on the multivocal literature review, telemetry, and experimental evidence summarized above.
Telemetry across 10,000+ developers shows 91% longer code review times.
Observational telemetry data aggregated across >10,000 developers reported in the paper; metric reported is percent increase in review time.
The most rigorous randomized controlled trial (RCT) documents a 19% slowdown for experienced developers.
A single RCT cited in the paper described as the most rigorous trial; result reported as a 19% slowdown for experienced developers. Sample size for the RCT is not provided in the summary statement.
Whether it is the periodic compulsory recoinage in medieval Europe or Gesell's stamp scrip, both are essentially mechanisms for taxing money holdings.
Interpretive/historical claim presented by the authors; no empirical testing or sample reported in the excerpt.
The devaluation of money runs through almost the whole process of history, from the weight reduction and purity decrease of metallic coin to the unanchored over-issuance of paper currency.
Historical summary/claim by the authors referencing long-run monetary history; no specific empirical study or sample size given in the excerpt.
Disparities may lead to AI bias and governance challenges that potentially leave the poorest communities excluded from the Fourth Industrial Revolution.
Paper lists AI bias and governance challenges as potential consequences of uneven AI development; presented as conceptual/ethical/political risks without empirical quantification in the excerpt.
These disparities risk causing economic isolation and social inequality.
Qualitative claim in the paper listing potential socio-economic risks of uneven AI adoption; no supporting empirical estimates in the excerpt.
These disparities carry the risk of a deepening digital divide.
Stated as a consequence/risk in the paper; presented qualitatively without empirical quantification in the excerpt.
Projections indicate that without additional measures, these disparities are likely to increase.
Paper reports forward-looking projections or scenario analysis (methods, assumptions, and quantitative projection details not given in the excerpt).
Low-income regions (in particular parts of Africa and South Asia) lag significantly behind in both education and access to digital technologies.
Statement in the paper based on comparative assessment of education levels and digital access across regions; the excerpt provides no numeric data or described sample.
Keeping humans in the loop can sometimes make the decision worse.
Argumentative/diagnostic statement in the paper (theoretical assertion; no experimental or observational effect sizes reported in the excerpt).
Leaders may believe oversight remains meaningful when it has become ceremonial.
Conceptual warning in the paper about erosion of meaningful oversight (no empirical validation provided in the excerpt).
The central risk is misrecognition: leaders may keep a human-centered story in place after decision-shaping authority has shifted elsewhere (e.g., to AI).
Analytic/diagnostic claim in the paper (conceptual warning; no empirical sample or measured incidence provided).
Current AI agents implement only the first half of CLS (fast exemplar/hippocampal-style storage) and lack the slow weight-consolidation half.
Analytic claim in paper comparing current AI agent designs to CLS; no empirical evaluation reported in abstract.
Agents that rely only on lookup are structurally vulnerable to persistent memory poisoning as injected content propagates across all future sessions.
Theoretical/security argument presented in paper; claims about propagation of injected content across sessions; no empirical attack experiments detailed in abstract.
Conflating the two produces agents that face a provable generalization ceiling on compositionally novel tasks that no increase in context size or retrieval quality can overcome.
Formal claim asserted in paper (formalization of limitations and proofs claimed); no empirical sample detailed in abstract.
Conflating retrieval and weight-based memory produces agents that accumulate notes indefinitely without developing expertise.
Theoretical argument/formalization presented in paper; claim based on analysis of how lookup-only systems fail to consolidate abstract knowledge; no empirical sample reported in abstract.
Treating lookup as memory is a category error with provable consequences for security.
Theoretical/formal argument and formalization in paper; security consequences (e.g., persistent poisoning) claimed; no empirical sample reported in abstract.
Treating lookup as memory is a category error with provable consequences for long-term learning.
Theoretical/formal argument asserted in the paper, drawing on formalization and Complementary Learning Systems theory; no empirical sample reported in abstract.
Treating lookup as memory is a category error with provable consequences for agent capability.
Theoretical/formal argument asserted in the paper (formalization and proofs claimed); no empirical sample reported in abstract.
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup.
Conceptual/analytic claim stated in paper; supported by comparison of existing agent memory mechanisms (vector stores, RAG, scratchpads, context-window management) to the paper's definition of 'memory'. No empirical sample reported.
Existing approaches address data quality but not data valuation.
Literature review / background discussion in paper contrasting prior work on data quality with lack of approaches for data valuation.
Existing approaches, runtime guardrails, training-time alignment, and post-hoc auditing treat governance as an external constraint rather than an internalized behavioral principle, leaving agents vulnerable to unsafe and irreversible actions.
Author's conceptual/literature critique presented in the paper (argumentative claim, no empirical sample or experiment reported for this statement).
Obstacles exist for healthcare workers in rural areas that limit the benefits of technology.
Review conclusion noting persistent obstacles for rural healthcare workers drawn from the literature; synthesis of qualitative/quantitative sources (no sample size in excerpt).
Indian healthcare faces barriers to technological integration such as financial issues, poor infrastructure, and regulatory problems.
Review-identifed barriers drawn from the literature (qualitative and quantitative studies summarized by the authors); no aggregate sample size reported in the excerpt.
Algorithmic collusion is a new form of market failure arising from the agentic economy.
Theoretical claim and analysis of market failure mechanisms; no empirical antitrust cases or simulation evidence included in the provided text.
AIOs are less robust to minor query edits.
Experiments applying small edits to queries and measuring changes in AIO outputs; observed larger changes for AIOs compared to traditional search.
AIOs are less consistent when processing two runs of the same query.
Repeated-query experiments (running the same query multiple times) comparing AIO outputs across runs and measuring variability; paper reports greater run-to-run inconsistency for AIOs.
Websites that block Google's AI crawler are significantly less likely to be retrieved by AIOs, despite having access to the content.
Comparison of retrieval frequency in AIOs for domains that block Google's AI crawler versus domains that do not, using the benchmark set of queries and observed crawl/access signals.
AI governance, ethical concerns, openness, workforce adjustment, and integration complexity are crucial concerns that managers must consider when implementing AI.
Synthesis of risks and challenges reported across the reviewed literature (paper's discussion/conclusion); no specific counts of studies or empirical measures provided in the abstract.
Conventional managerial practices usually encounter difficulties dealing with the flow of information, ineffectiveness of workflow, slow decision making, and redundant administrative processes.
Background statement in the paper's introduction / literature review (narrative claim based on surveyed literature); no specific empirical study or sample size reported in the abstract.
The research also identifies policy loopholes and unequal AI preparedness on the continent.
Findings from the paper's systematic review highlighting gaps in policy frameworks and uneven preparedness across Sub‑Saharan African countries; no country‑level counts or indices provided in the summary.
Results indicate rising job displacement, industrial change, and inequality.
Aggregate findings reported from the systematic review pointing to increases in job displacement, structural industrial change, and inequality across studies; no aggregated numerical magnitudes provided in the summary.
They are a threat to semi-and unskilled jobs, particularly in manufacturing.
Conclusion from the systematic review synthesizing studies on automation risk to semi- and unskilled positions, especially in manufacturing; no numerical risk estimate provided in the summary.
Vulnerable populations—including low-skill workers, aging labour forces, and developing economies—are especially affected by AI-driven changes.
Abstract highlights special attention to vulnerable populations in the review and asserts differential impacts; no specific empirical estimates or sample sizes provided in abstract.
AI displaces routine cognitive and manual tasks.
Explicit finding reported in abstract based on the paper's systematic review of empirical studies (no individual study sample sizes or quantitative estimates provided in abstract).
In resource-dependent regional economies, AI adoption can transform seasonal industries into continuous economic infrastructure and replace intermediate coordination roles and traditional employment structures.
Illustrative case analysis used in the paper to show how the framework applies to resource-dependent regions; described as an illustrative argument rather than an empirically validated causal estimate in the provided text.