Evidence (4333 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Governance
Faster iterative experimental cycles enabled by LLM orchestration may increase returns to experimental R&D and change the optimal allocation between computation, instrumentation, and labor.
Economic argumentation about iterative cycles and returns to capital/labor; proposed rather than empirically demonstrated.
The paper provides an initial mapping from diagnosis to intervention strategies (therapeutics) — i.e., treatment planning for model dysfunctions.
Conceptual mapping and proposed intervention strategies documented in the therapeutics section (initial mappings; not claimed as exhaustive).
AI should serve precision and purpose in public policy — improving foresight, enabling better trade-offs, and preserving democratic accountability.
Normative policy prescription and conceptual argumentation in the book; no empirical testing or quantified outcomes reported.
AI-driven systems should empower people with knowledge and pathways to participate in global markets rather than concentrate gains.
Normative recommendation derived from policy analysis and value judgments in the book; not supported by empirical evidence in the blurb.
Algorithmic transparency and auditability can reduce systemic risk from opaque automated lending decisions and improve regulator oversight and macroprudential policy.
Conceptual/systemic-risk argument in the "Systemic risk & governance externalities" section; no empirical systemic-risk analysis provided.
Improved algorithmic transparency could reduce information asymmetries, lowering adverse selection and moral hazard over time and potentially expanding credit to underserved populations.
Conceptual economic argument in the "Credit allocation & pricing" section; based on theory rather than empirical testing.
If properly designed and enforced, the protocol measures can improve credit access for underserved populations and reduce biased exclusion, supporting inclusive growth.
Normative claim supported by doctrinal arguments, comparative regulatory literature and technical fairness literature synthesized in the audit (no controlled empirical evaluation reported).
Firms that effectively implement governed hyperautomation may realize sustainable efficiency and reliability advantages, potentially increasing market concentration in some sectors unless governance costs level the playing field.
Strategic and competitive-dynamics argument derived from case examples and best-practice synthesis; no sector-level empirical concentration measures presented.
Standardized governance patterns reduce information asymmetries, enabling insurers and regulators to better price and manage enterprise AI risks.
Policy implication argued from the existence of standardized governance artifacts (audit trails, certifications) and industry practice; conceptual, no empirical insurer/regulator data presented.
Embedding governance reduces downside risks (compliance fines, data breaches), improving expected net returns of automation investments and lowering the adoption threshold for risk-averse firms.
Conceptual cost-benefit argument and industry best-practice examples; lacking quantitative measurement of returns or threshold shifts.
Incentives for human‑augmenting AI (e.g., subsidies or tax incentives tied to task redesign and training) can promote inclusive adoption patterns.
Policy analysis and comparative case studies; theoretical models that predict firm adoption responses to incentives, but limited causal empirical evidence specific to AI-targeted incentives.
By synthesizing computer science, engineering, and financial policy insights, DRL should be viewed not merely as a mathematical tool but as a transformative agent within the global socio-technical infrastructure of capital markets.
High-level synthesis and interdisciplinary argumentation in the paper; no empirical evidence or longitudinal studies are cited in the excerpt to demonstrate systemic transformation.
Research agenda items include quantifying social returns to different alignment interventions, studying market equilibria under participatory vs. opaque strategies, and modeling optimal regulatory mixes under uncertainty about harms and capability growth.
Prescriptive research agenda derived from the paper's economic analysis and identified knowledge gaps; presented as proposed studies rather than completed research.
If conformal filtering produces vacuous outputs at factuality levels customers demand, adoption in knowledge-intensive domains may be limited until methods simultaneously provide robustness and informativeness; vendors using efficient verifiers and robust calibration may gain competitive advantage.
Paper's market/economic discussion drawing on empirical trade-offs (informativeness vs. factuality) and cost comparisons; this is an applied implication rather than a direct experimental result.
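The informativeness-versus-factuality trade-off behind this claim can be sketched as a toy calibration exercise. This is a schematic stand-in, not the paper's method: the verifier scores, the correctness model, and the Beta-distributed data below are all hypothetical, and the calibration rule is a simplified selective-prediction analogue of conformal filtering.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration set: a verifier assigns each claim a confidence
# score, and higher-scored claims are more likely to be factually correct.
scores = rng.beta(2, 2, 2000)
correct = rng.random(2000) < scores

def calibrate_threshold(scores, correct, target):
    """Smallest score cutoff whose retained calibration claims reach the
    target factuality rate; None if no cutoff achieves it."""
    for t in np.sort(scores):
        kept = scores >= t
        if kept.any() and correct[kept].mean() >= target:
            return t
    return None

retained = {}
for target in (0.80, 0.95, 0.999):
    t = calibrate_threshold(scores, correct, target)
    retained[target] = float((scores >= t).mean()) if t is not None else 0.0
    print(f"target factuality {target}: fraction retained {retained[target]:.3f}")
```

As the demanded factuality level rises, the retained fraction shrinks toward zero, which is the "vacuous output" regime the claim describes: robustness is achieved by saying less and less.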
Authors propose the 'AI orchestra' concept: future development will involve coordinated ensembles of specialized AI agents (code generation, test generation, dependency analysis, security scanning) orchestrated by humans and higher-level controllers.
Theoretical/conceptual argument by the authors grounded in qualitative findings from Netlight (practitioner reports of multiple tools and coordination frictions); this is a forward-looking synthesis rather than an empirically established fact.
Modular and cell‑free platforms could enable decentralized, localized manufacturing of specialty compounds, potentially altering trade flows away from centralized petrochemical hubs.
Conceptual synthesis plus small-scale demonstrations of modular/cell-free units in the reviewed literature; limited pilot projects and discussion of potential scalability and portability.
Product teams evaluating LLM-powered features rely on a spectrum of practices—from informal “vibe checks” to organizational meta-work—to cope with LLMs’ unpredictability.
Qualitative interview study with 19 practitioners; thematic coding of transcripts produced descriptions of a range of evaluation practices used by teams.
Platform design choices (property rights, portability, reputation, tokenization, escrowed memories) will shape incentives for contributions to shared knowledge and agent improvement.
Policy and mechanism-design implications drawn from observed phenomena (shared memories, contributions, and trust) in the qualitative dataset; recommendation rather than empirically tested claim.
Shared memory architectures create public-good–like externalities (knowledge diffusion and spillovers) that may be underprovided absent coordination or platform governance.
Qualitative observations of shared memories and diffusion patterns plus theoretical economic interpretation; no empirical quantification of spillover magnitudes provided.
Easier specification of constraints can reduce some harms (clear safety violations) but centralizes normative power (who defines constraints) and creates international/cultural externalities and risks of regulatory capture.
Normative and economic argument in the paper combining technical tractability of constraints with governance concerns; this is an inference about likely distributional effects rather than empirically established fact.
Because failure modes such as definition misalignment and hypothesis creep were observed, the authors argue for regulation/standards around disclosure of AI-assisted scientific claims and archival of verification artifacts.
Policy recommendation in the paper derived from the documented process-level failure modes in the single project; recommendation is prescriptive, not empirically validated beyond the project.
Improved throughput and lower travel costs can induce additional travel demand (rebound), partially offsetting congestion/emissions gains unless paired with demand-management measures.
Theoretical economic reasoning presented in the paper as a caveat; not directly measured in the simulation experiments (no induced-demand dynamic experiments reported).
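The rebound reasoning above can be made concrete with a toy constant-elasticity demand calculation; the elasticity value and baseline trip volume below are hypothetical illustrations, not figures from the paper.

```python
# Toy induced-demand (rebound) illustration: under constant-elasticity
# travel demand, a generalized-cost reduction of dc% raises trip volume
# by a factor of (1 + dc/100) ** elasticity.
def rebound_demand(baseline_trips, cost_change_pct, elasticity=-0.5):
    """Trip volume after a generalized-cost change, constant-elasticity demand."""
    return baseline_trips * (1 + cost_change_pct / 100) ** elasticity

new_trips = rebound_demand(1_000_000, -20)  # assume a 20% cut in generalized cost
rebound_pct = 100 * (new_trips / 1_000_000 - 1)
print(f"induced extra travel: {rebound_pct:.1f}%")  # ~11.8% more trips
```

Even a modest elasticity converts a sizable share of the throughput gain into new travel, which is why the paper pairs the efficiency claim with a demand-management caveat.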
There is a social welfare trade‑off between personalization value (higher AAR) and normative/social risk (higher MR); optimal policy and product design should balance these using BenchPreS metrics.
Analytical argument combining empirical findings (trade‑off between AAR and MR) with economic welfare considerations; the paper does not present formal welfare estimates or market experiments.
AI in higher education is not simply a technological shift but a structural transformation requiring deliberate, critically informed governance grounded in equity and human agency.
Normative/conceptual conclusion drawn by the author from the thematic analysis and the critical AI media literacy framing; presented as the paper's principal argument or recommendation. (Supported qualitatively by themes from the analyzed discussions rather than quantitative causal evidence.)
The adoption of AI governance programmes by military institutions will have strategic implications.
Hypothesis stated by the author; presented as forward-looking analysis without accompanying empirical modeling, historical analogues, or measured strategic outcomes in the provided text.
The expansion of the gig economy reflects both genuine labor-market innovation enabling worker flexibility and cost shifting from firms to workers that policy intervention may appropriately address.
Synthesis and interpretation of the study's empirical findings (prevalence, heterogeneity, earnings gaps, distributional effects, and social protection measures) from administrative data, labor force surveys, and platform transaction records across 24 OECD countries (2015–2025).
Standard productivity metrics (e.g., output per hour) may misprice value if temporal quality matters; firms will face trade‑offs between maximizing throughput and preserving richer subjective temporality that affects long‑run creativity, morale, and retention.
Conceptual economic reasoning and literature synthesis on attention and productivity; no empirical studies or longitudinal workplace data presented.
Investors and firms may need to include metrics of experiential quality (subjective well‑being, sustained attention quality) alongside productivity metrics when valuing neurotech and human–AI platforms.
Normative/economic implication argued from the framework; no empirical valuation studies or survey of investor behavior included.
AI raises returns to platformization and can change the distribution of financial intermediation rents (potentially concentrating returns among platform incumbents).
Theoretical and economic reasoning in the 'Implications for AI Economics' section; conceptual discussion of platform effects and rents rather than empirical measurement in the paper summary.
Reported pilot gains, if scaled, could shift firm‑level returns and industry productivity measures, but gains are contingent on coordinated adoption; uneven uptake may produce winner‑takes‑more dynamics among technologically advanced firms.
Inference from pilot results and economic reasoning in the reviewed literature; no large‑scale empirical validation provided in the review.
Topology is the dominant factor for price stability and scalability compared to other swept variables (load, presence of hybrid integrator, governance constraints).
Factor-ablation analysis within the 1,620-run simulation study showing the largest explanatory effect (largest changes in volatility and scalability metrics) attributable to graph topology rather than load, hybrid flag, or governance settings.
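A one-way factor ablation of this kind can be sketched as a variance-explained comparison across the swept variables. The data below is a synthetic stand-in for the 1,620-run sweep (the study's actual outputs and factor names are not reproduced here), deliberately constructed so that topology drives the outcome, mirroring the claimed finding.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1620

# Hypothetical run-level records: each run's swept factor levels.
factors = {
    "topology":   rng.choice(["ring", "star", "scale_free"], n),
    "load":       rng.choice([0.5, 1.0, 2.0], n),
    "hybrid":     rng.choice([0, 1], n),
    "governance": rng.choice(["none", "strict"], n),
}
# Synthetic volatility outcome dominated by topology, plus a small load effect.
topo_effect = np.array([{"ring": 0.1, "star": 0.5, "scale_free": 0.9}[t]
                        for t in factors["topology"]])
volatility = topo_effect + 0.05 * factors["load"] + rng.normal(0, 0.05, n)

def eta_squared(levels, outcome):
    """Share of outcome variance explained by a factor's level means
    (between-group sum of squares over total sum of squares)."""
    grand = outcome.mean()
    ss_total = ((outcome - grand) ** 2).sum()
    ss_between = sum(
        (levels == lv).sum() * (outcome[levels == lv].mean() - grand) ** 2
        for lv in np.unique(levels)
    )
    return ss_between / ss_total

shares = {name: eta_squared(vals, volatility) for name, vals in factors.items()}
print(max(shares, key=shares.get))  # "topology" dominates in this synthetic sweep
```

Ranking factors by this variance share is one simple way to operationalize "dominant factor" in an ablation over a full-factorial sweep.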
Demand for mid-level, routine-focused developer roles could compress while demand rises for verification, security, and AI–human orchestration skills.
Theoretical task-replacement argument based on observed capabilities of LLMs and synthesized user study evidence; limited direct labor-market empirical evidence in the reviewed literature.
Routine coding tasks may be partially automated, shifting human labor toward verification, integration, architecture, and domain-specific tasks.
Task-composition studies, user studies showing LLMs handle boilerplate/routine work, and economic inference synthesized across studies.
Societal acceptance of AI-generated audiovisual media is uncertain and could range from widespread uptake to broad rejection.
Discussion drawing on mixed empirical studies and scenario construction in the review; the paper notes contradictory findings in existing studies but does not provide primary survey data or sample sizes.
If cognitive interlocks are widely adopted, many negative externalities can be internalized and AI-driven productivity gains can be realized more sustainably; absent such controls, equilibrium may drift toward higher error rates and systemic incidents.
Long-run equilibrium argument based on theoretical reasoning and conditional claims; no longitudinal or cross-firm empirical evidence presented.
Labor demand effects are ambiguous: junior/entry-level demand may be reduced for some tasks while demand for verification and higher-skill roles may rise.
Economic reasoning, early observational signals, and theoretical task-reallocation frameworks; empirical longitudinal evidence is limited or absent.
The effectiveness of generative AI depends critically on human-AI workflows: prompt design, iterative refinement, and human vetting materially affect outcomes.
Qualitative analyses of interaction patterns and experiments manipulating prompting/iteration showing variation in outcomes; many studies report improved outputs after iterative prompting and human-in-the-loop refinement.
Market demand is likely to bifurcate: high-value clinical markets will require rigorous explainability and neuroscientific grounding (higher willingness-to-pay), while research and consumer segments may tolerate black-box models (lower margins).
Market segmentation argument built from differing end-user requirements and tolerance for opaque models; presented as a projected implication rather than an empirically tested market study.
Teams often produce evaluation outputs (tests, metrics, user feedback) but lack mechanisms, processes, or technical levers to convert those outputs into actionable engineering or product changes—a novel “results-actionability gap.”
Recurring theme from the 19 practitioner interviews and coding; authors explicitly articulate and label this gap based on participants' reports.
The study confirms several previously documented evaluation challenges with LLMs: model unpredictability, metric mismatch, high human-evaluation costs, and difficulty reproducing failures.
Interview data from 19 practitioners; thematic analysis flagged these recurring problems as reported by participants and aligned with prior literature.
Emergent quality hierarchies among agents imply winner-take-most dynamics in informational value and potential market concentration in agent quality.
Observed formation of quality hierarchies in agent interactions and documented economic interpretation; this is a hypothesis/implication drawn from qualitative patterns rather than measured market outcomes.
Security of LLM-based MASs functions as an economic externality: failures can impose social costs (misinformation, poor collective decisions), and absent liability or market incentives providers may underinvest in robustness.
Economic reasoning and implication section in the paper—conceptual argument linking the technical vulnerability to economic externality and incentive misalignment. No empirical economic data provided in the summary.
Analytical conditions on stubbornness and influence weights identify when a single adversary can dominate network dynamics (i.e., influence propagation criteria derived from FJ fixed-point analysis).
Mathematical/theoretical analysis of FJ model fixed points and influence propagation in the paper; derivation of conditions relating agent stubbornness and interpersonal trust weights to steady-state influence.
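The fixed-point analysis referenced here can be illustrated with the standard Friedkin-Johnsen (FJ) update x(t+1) = ΛWx(t) + (I − Λ)x(0), whose steady state is x* = (I − ΛW)^(-1)(I − Λ)x(0). The sketch below computes that steady-state influence matrix for a hypothetical three-agent network (the network, trust weights, and stubbornness values are illustrative, not from the paper).

```python
import numpy as np

def fj_influence(W, stubbornness):
    """Steady-state FJ influence matrix V with x* = V @ x0, where W is a
    row-stochastic trust matrix and Lambda = I - diag(stubbornness).
    Column j of V is agent j's influence on every agent's final opinion."""
    n = W.shape[0]
    Lam = np.eye(n) - np.diag(stubbornness)
    return np.linalg.solve(np.eye(n) - Lam @ W, np.eye(n) - Lam)

# Toy case: agent 0 is a fully stubborn adversary that the others trust heavily.
W = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
])
stubbornness = np.array([1.0, 0.05, 0.05])  # adversary never updates its opinion

V = fj_influence(W, stubbornness)
print(V[:, 0])  # each agent's steady-state weight on the adversary's initial opinion
```

With high trust in the adversary and low stubbornness elsewhere, the adversary's column of V approaches 1 for every agent, which is the kind of dominance condition the analytical criteria characterize.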
If models frequently leak or misuse preferences in third‑party contexts, users and organizations will discount the value of personalization or demand stronger controls, increasing costs for deploying memory features and reducing consumer surplus.
Economic reasoning and implication drawn from the observed misapplication behavior; no empirical user adoption or market data provided in the study to directly support this claim.
The failure mode (misapplication of preferences to third parties) creates negative externalities (privacy violations, normative harms, misinformation, contractual breaches) that markets and platforms may not internalize without regulation or design changes.
Economic interpretation and argumentation building on the empirical failure mode; these harms are hypothesized implications rather than measured outcomes in the paper.
Unclear liability frameworks increase perceived and real costs and can slow adoption by hospitals and insurers.
Policy analyses and procurement narratives noting liability uncertainty cited as a barrier to procurement and deployment.
Up-front implementation costs commonly include procurement, integration with PACS/EMR, UI/UX development, regulatory compliance, and staff training; recurring costs include monitoring, data labeling, software updates, and cybersecurity.
Implementation reports, vendor and hospital accounts, and qualitative studies documenting cost categories (specific dollar amounts vary across settings and are rarely published in detail).
Without continuous support for upskilling/reskilling and inclusive policies, AI risks becoming a source of exclusion rather than an enabler of human advancement.
Normative conclusion derived from reviewed literature and thematic interpretation in the qualitative study (literature-based; evidence is secondary and not quantified).
The authors' synthesis of the research literature estimates 70-75% automation potential.
Quantitative estimate offered by the authors (70-75%) as part of function-by-function analysis; no described empirical evaluation or sample supporting the figure.
Knowledge transmission (teaching/lecturing) is estimated at 75-80% AI substitutability.
Authors' quantitative estimate presented in the analysis (75-80%); the paper does not detail empirical methods or validation samples for this percentage.