Evidence (6869 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
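The matrix above can be read more easily as shares than as raw counts. The following is a minimal sketch, assuming shares are computed over the sum of the four listed directions rather than over the Total column. The function name `summarize` and the field name `gap` are illustrative, not part of the source. The sketch also reports the difference between Total and the sum of the four directions, because the two do not always match in the table and the source does not explain why.

```python
# A few rows copied from the Evidence Matrix:
# (positive, negative, mixed, null, total).
ROWS = {
    "Governance & Regulation": (826, 400, 191, 122, 1563),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def summarize(row):
    """Return direction shares and the Total-minus-directions gap for one row.

    Assumption: shares use the sum of the four listed directions as the
    denominator, since the Total column can be larger than that sum.
    """
    positive, negative, mixed, null, total = row
    directional = positive + negative + mixed + null
    return {
        "positive_share": positive / directional,
        "negative_share": negative / directional,
        "gap": total - directional,  # claims counted in Total but in no listed direction
    }


if __name__ == "__main__":
    for outcome, row in ROWS.items():
        s = summarize(row)
        print(
            f"{outcome}: {s['positive_share']:.1%} positive, "
            f"{s['negative_share']:.1%} negative, gap {s['gap']}"
        )
```

For the Governance & Regulation row, the four directions sum to 1539 against a Total of 1563, so the reported gap is 24; for Job Displacement the gap is 0.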
Governance
Authors propose the 'AI orchestra' concept: future development will involve coordinated ensembles of specialized AI agents (code generation, test generation, dependency analysis, security scanning) orchestrated by humans and higher-level controllers.
Theoretical/conceptual argument by the authors grounded in qualitative findings from Netlight (practitioner reports of multiple tools and coordination frictions); this is a forward-looking synthesis rather than an empirically established fact.
Modular and cell‑free platforms could enable decentralized, localized manufacturing of specialty compounds, potentially altering trade flows away from centralized petrochemical hubs.
Conceptual synthesis plus small-scale demonstrations of modular/cell-free units in the reviewed literature; limited pilot projects and discussion of potential scalability and portability.
Product teams evaluating LLM-powered features rely on a spectrum of practices—from informal “vibe checks” to organizational meta-work—to cope with LLMs’ unpredictability.
Qualitative interview study with 19 practitioners; thematic coding of transcripts produced descriptions of a range of evaluation practices used by teams.
Platform design choices (property rights, portability, reputation, tokenization, escrowed memories) will shape incentives for contributions to shared knowledge and agent improvement.
Policy and mechanism-design implications drawn from observed phenomena (shared memories, contributions, and trust) in the qualitative dataset; recommendation rather than empirically tested claim.
Shared memory architectures create public-good–like externalities (knowledge diffusion and spillovers) that may be underprovided absent coordination or platform governance.
Qualitative observations of shared memories and diffusion patterns plus theoretical economic interpretation; no empirical quantification of spillover magnitudes provided.
Easier specification of constraints can reduce some harms (clear safety violations) but centralizes normative power (who defines constraints) and creates international/cultural externalities and risks of regulatory capture.
Normative and economic argument in the paper combining technical tractability of constraints with governance concerns; this is an inference about likely distributional effects rather than empirically established fact.
Because failure modes such as definition misalignment and hypothesis creep were observed, the authors argue for regulation/standards around disclosure of AI-assisted scientific claims and archival of verification artifacts.
Policy recommendation in the paper derived from the documented process-level failure modes in the single project; recommendation is prescriptive, not empirically validated beyond the project.
Improved throughput and lower travel costs can induce additional travel demand (rebound), partially offsetting congestion/emissions gains unless paired with demand-management measures.
Theoretical economic reasoning presented in the paper as a caveat; not directly measured in the simulation experiments (no induced-demand dynamic experiments reported).
There is a social welfare trade‑off between personalization value (higher AAR) and normative/social risk (higher MR); optimal policy and product design should balance these using BenchPreS metrics.
Analytical argument combining empirical findings (trade‑off between AAR and MR) with economic welfare considerations; the paper does not present formal welfare estimates or market experiments.
AI in higher education is not simply a technological shift but a structural transformation requiring deliberate, critically informed governance grounded in equity and human agency.
Normative/conceptual conclusion drawn by the author from the thematic analysis and the critical AI media literacy framing; presented as the paper's principal argument or recommendation. (Supported qualitatively by themes from the analyzed discussions rather than quantitative causal evidence.)
The adoption of AI governance programmes by military institutions will have strategic implications.
Hypothesis stated by the author; presented as forward-looking analysis without accompanying empirical modeling, historical analogues, or measured strategic outcomes in the provided text.
The expansion of the gig economy reflects both genuine labor-market innovation enabling worker flexibility and cost shifting from firms to workers that policy intervention may appropriately address.
Synthesis and interpretation of the study's empirical findings (prevalence, heterogeneity, earnings gaps, distributional effects, and social protection measures) from administrative data, labor force surveys, and platform transaction records across 24 OECD countries (2015–2025).
Standard productivity metrics (e.g., output per hour) may misprice value if temporal quality matters; firms will face trade‑offs between maximizing throughput and preserving richer subjective temporality that affects long‑run creativity, morale, and retention.
Conceptual economic reasoning and literature synthesis on attention and productivity; no empirical studies or longitudinal workplace data presented.
Investors and firms may need to include metrics of experiential quality (subjective well‑being, sustained attention quality) alongside productivity metrics when valuing neurotech and human–AI platforms.
Normative/economic implication argued from the framework; no empirical valuation studies or survey of investor behavior included.
AI raises returns to platformization and can change the distribution of financial intermediation rents (potentially concentrating returns among platform incumbents).
Theoretical and economic reasoning in the 'Implications for AI Economics' section; conceptual discussion of platform effects and rents rather than empirical measurement in the paper summary.
Reported pilot gains, if scaled, could shift firm‑level returns and industry productivity measures, but gains are contingent on coordinated adoption; uneven uptake may produce winner‑takes‑more dynamics among technologically advanced firms.
Inference from pilot results and economic reasoning in the reviewed literature; no large‑scale empirical validation provided in the review.
Topology is the dominant factor for price stability and scalability compared to other swept variables (load, presence of hybrid integrator, governance constraints).
Factor-ablation analysis within the 1,620-run simulation study showing the largest explanatory effect (largest changes in volatility and scalability metrics) attributable to graph topology rather than load, hybrid flag, or governance settings.
Demand for mid-level, routine-focused developer roles could compress while demand rises for verification, security, and AI–human orchestration skills.
Theoretical task-replacement argument based on observed capabilities of LLMs and synthesized user study evidence; limited direct labor-market empirical evidence in the reviewed literature.
Routine coding tasks may be partially automated, shifting human labor toward verification, integration, architecture, and domain-specific tasks.
Task-composition studies, user studies showing LLMs handle boilerplate/routine work, and economic inference synthesized across studies.
Societal acceptance of AI-generated audiovisual media is uncertain and could range from widespread uptake to broad rejection.
Discussion drawing on mixed empirical studies and scenario construction in the review; the paper notes contradictory findings in existing studies but does not provide primary survey data or sample sizes.
If cognitive interlocks are widely adopted, many negative externalities can be internalized and AI-driven productivity gains can be realized more sustainably; absent such controls, equilibrium may drift toward higher error rates and systemic incidents.
Long-run equilibrium argument based on theoretical reasoning and conditional claims; no longitudinal or cross-firm empirical evidence presented.
Labor demand effects are ambiguous: junior/entry-level demand may be reduced for some tasks while demand for verification and higher-skill roles may rise.
Economic reasoning, early observational signals, and theoretical task-reallocation frameworks; empirical longitudinal evidence is limited or absent.
The effectiveness of generative AI depends critically on human-AI workflows: prompt design, iterative refinement, and human vetting materially affect outcomes.
Qualitative analyses of interaction patterns and experiments manipulating prompting/iteration showing variation in outcomes; many studies report improved outputs after iterative prompting and human-in-the-loop refinement.
Market demand is likely to bifurcate: high-value clinical markets will require rigorous explainability and neuroscientific grounding (higher willingness-to-pay), while research and consumer segments may tolerate black-box models (lower margins).
Market segmentation argument built from differing end-user requirements and tolerance for opaque models; presented as a projected implication rather than an empirically tested market study.
Teams often produce evaluation outputs (tests, metrics, user feedback) but lack mechanisms, processes, or technical levers to convert those outputs into actionable engineering or product changes—a novel “results-actionability gap.”
Recurring theme from the 19 practitioner interviews and coding; authors explicitly articulate and label this gap based on participants' reports.
The study confirms several previously documented evaluation challenges with LLMs: model unpredictability, metric mismatch, high human-evaluation costs, and difficulty reproducing failures.
Interview data from 19 practitioners; thematic analysis flagged these recurring problems as reported by participants and aligned with prior literature.
Emergent quality hierarchies among agents imply winner-take-most dynamics in informational value and potential market concentration in agent quality.
Observed formation of quality hierarchies in agent interactions and documented economic interpretation; this is a hypothesis/implication drawn from qualitative patterns rather than measured market outcomes.
Security of LLM-based MASs functions as an economic externality: failures can impose social costs (misinformation, poor collective decisions), and absent liability or market incentives providers may underinvest in robustness.
Economic reasoning and implication section in the paper—conceptual argument linking the technical vulnerability to economic externality and incentive misalignment. No empirical economic data provided in the summary.
Analytical conditions on stubbornness and influence weights identify when a single adversary can dominate network dynamics (i.e., influence propagation criteria derived from FJ fixed-point analysis).
Mathematical/theoretical analysis of FJ model fixed points and influence propagation in the paper; derivation of conditions relating agent stubbornness and interpersonal trust weights to steady-state influence.
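For reference, the standard Friedkin-Johnsen (FJ) update and its fixed point can be written as below. This is the textbook form, which may differ from the exact variant analysed in the paper; the symbols W, Λ, x(0) and V are notation assumed here, not taken from the source.

```latex
% Textbook Friedkin-Johnsen model (assumed notation, not taken from the paper):
%   W        row-stochastic matrix of interpersonal influence (trust) weights
%   \Lambda  diagonal matrix of susceptibilities \lambda_i \in [0,1];
%            1 - \lambda_i is agent i's stubbornness
%   x(0)     vector of initial opinions
\begin{align}
x(t+1) &= \Lambda W\, x(t) + (I - \Lambda)\, x(0), \\
x^{*}  &= V\, x(0), \qquad V = (I - \Lambda W)^{-1} (I - \Lambda),
\end{align}
% The fixed point exists when the spectral radius of \Lambda W is below 1.
```

Each row of V sums to 1, so the entry V_ij is the share of agent i's steady-state opinion attributable to agent j's initial opinion. In this form, a fully stubborn agent a (λ_a = 0) keeps its initial opinion forever, and column a of V measures how far that opinion propagates through the network.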
If models frequently leak or misuse preferences in third‑party contexts, users and organizations will discount the value of personalization or demand stronger controls, increasing costs for deploying memory features and reducing consumer surplus.
Economic reasoning and implication drawn from the observed misapplication behavior; no empirical user adoption or market data provided in the study to directly support this claim.
The failure mode (misapplication of preferences to third parties) creates negative externalities (privacy violations, normative harms, misinformation, contractual breaches) that markets and platforms may not internalize without regulation or design changes.
Economic interpretation and argumentation building on the empirical failure mode; these harms are hypothesized implications rather than measured outcomes in the paper.
Unclear liability frameworks increase perceived and real costs and can slow adoption by hospitals and insurers.
Policy analyses and procurement narratives that cite liability uncertainty as a barrier to procurement and deployment.
Up-front implementation costs commonly include procurement, integration with PACS/EMR, UI/UX development, regulatory compliance, and staff training; recurring costs include monitoring, data labeling, software updates, and cybersecurity.
Implementation reports, vendor and hospital accounts, and qualitative studies documenting cost categories (specific dollar amounts vary across settings and are rarely published in detail).
Without continuous support for upskilling/reskilling and inclusive policies, AI risks becoming a source of exclusion rather than an enabler of human advancement.
Normative conclusion derived from reviewed literature and thematic interpretation in the qualitative study (literature-based; evidence is secondary and not quantified).
Research literature synthesis has an estimated 70-75% automation potential.
Quantitative estimate (70-75%) offered by the authors as part of a function-by-function analysis; no empirical evaluation or sample is described in support of the figure.
Knowledge transmission (teaching/lecturing) shows 75-80% AI substitutability.
Authors' quantitative estimate presented in the analysis (75-80%); the paper does not detail empirical methods or validation samples for this percentage.
Administrative tasks face 75-80% disruption risk from AI.
Paper provides a quantitative estimate (75-80%) as part of its functional disruption assessment; no empirical methodology, dataset, or sample size is described to support the numeric range.
Aggregation and linkage across data sources can reveal intimate, predictive traits that were not foreseeable to the data subject at the time of sale.
Conceptual argument with references to documented cases and literature on data linkage and inference; relies on illustrative examples rather than original empirical experiments.
The United States shows a more market-driven (firm-dominated) patenting profile and comparatively weaker integration between AI and robotics patent trajectories.
Country-level and actor-type decomposition for U.S. patent filings (1980–2019), showing higher firm share of patents and weaker long-run association/cointegration between core AI and AI-enhanced robotics series compared with China (as reported in the paper).
There is a risk of a two‑tier market where high‑quality temporal‑preserving enhancements are costly, increasing inequality in experiential welfare and cognitive capital.
Speculative socioeconomic implication based on cost/access arguments and distributional concerns; no inequality modeling or empirical pricing data provided.
Technical expansion without an accompanying theory of lived temporality risks increasing capabilities while degrading the qualitative depth of human experience (presence, attentional flow, felt meaning).
Argumentative claim supported by philosophical analysis and literature synthesis (neurophenomenology, attention economics); no empirical test reported.
High-quality, equitable climate information displays public-good characteristics (nonrival, nonexcludable at scale), so private incentives alone will underprovide geographically representative data and shared infrastructure.
Economic reasoning supported by observed concentration of compute and model development (mapping) and standard public-goods theory; no formal empirical market model estimated in the paper.
Full replacement of physicians would require breakthroughs in robust generalization, embodied capabilities, and legal/regulatory change—currently lacking.
Conceptual inference based on documented limitations (OOD generalization, lack of embodied/sensorimotor capability, unsettled legal/regulatory environment) summarized in the review.
Acquisition workforce capacity is a critical scarce input in defense AI economics, and it is shrinking; reduced human capital lowers the Department's ability to extract value from AI investments and to internalize externalities, decreasing effective returns to AI procurement.
Institutional trend evidence of workforce reductions combined with economic analysis treating institutional capacity as an input factor. No empirical quantification of returns or elasticity provided—this is analytical inference.
Ambiguous standards increase uncertainty for contracting officers, raising the risk that they will either over-rely on vendor claims or inconsistently enforce requirements, both of which harm procurement integrity.
Policy-text analysis identifying vague criteria combined with qualitative analysis of procurement decision workflows; argument based on measurement and enforcement friction literature. No empirical study of contracting officer behavior provided.
Lower governance barriers and ambiguous procurement criteria (e.g., undefined 'model objectivity') can skew market competition toward suppliers that prioritize rapid iteration and opaque practices over rigorous assurance, harming traceability and quality.
Market-effects reasoning grounded in policy changes (document analysis) and qualitative institutional analysis of measurement/enforcement frictions. No market-share or supplier-behavior data provided.
Mandating permissive contract terms and enabling waivers reduces private incentives for contractors to invest in safety and compliance, creating classical moral-hazard problems in defense AI procurement.
Economic reasoning and principal–agent analysis applied to the documented contractual changes (primary-source policy text). No empirical measurement of contractor investment behavior provided; claim is theoretical/inferential.
A mismatch between expanded waiver authority (Barrier Removal Board) and declining acquisition oversight capacity creates procurement-integrity and systemic risks: faster acquisition concurrent with weakened institutional checks increases likelihood of improper procurement decisions and unchecked deployment of unsafe or unvetted AI models.
Synthesis of primary-source policy analysis, institutional staffing trend evidence, and qualitative risk/scenario assessment using principal–agent and moral-hazard frameworks. This is a conceptual risk projection rather than an empirically derived probability estimate.
Emerging agentic/AGI capabilities introduce new failure modes and governance challenges that standard ML oversight may not cover.
Emerging literature, theoretical analyses, and expert opinion summarized in the synthesis; authors note limited empirical long-term data and characterize this as an emergent risk.
Centralized provision of high-quality coding models by a few vendors could produce vendor lock-in and increase platform power in software development inputs.
Market-structure analysis and industry observations synthesized in the paper; the claim is forward-looking and not established by longitudinal market data within the review.