Evidence (5539 claims)

Claims by category:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Adoption
Widespread unverifiable income claims and promotional framing create noisy signals about viable earnings, complicating entrants’ investment decisions and labor market expectations.
Analytical inference based on the documented prevalence of unverifiable earnings claims in the 377 videos and theory about market signaling; not quantitatively tested in the paper.
GenAI lowers the time and skill cost of producing many types of creative outputs, which can increase content supply and exert downward pressure on wages for routine creative tasks.
Inference drawn as an implication from observed practices (e.g., mass production workflows) in the 377 videos and existing literature; not directly measured in this study.
Creators and the community knowledge base document shifting norms around authorship and attribution: GenAI blurs who is considered the creator and complicates labor recognition and rights.
Coding captured explicit discussion and contested norms about authorship, attribution, and creator identity across the 377 videos.
Some creators recommend or describe synthetic engagement practices (e.g., automated posting, synthetic comments/engagement) as tactics to inflate visibility.
Thematic coding noted advice or descriptions of engagement-inflating tactics across videos in the 377-video corpus.
Creators surface and often employ practices that raise content misappropriation concerns (use of copyrighted or third-party material in synthetic outputs).
Instances and discussions captured in the 377-video sample where creators show or recommend synthesizing, transforming, or repurposing third‑party content.
Many videos advertise earnings or income claims that are unverifiable within the content, producing noisy market signals.
Qualitative observations from coding the 377 videos noting frequent asserted earnings without reproducible evidence or transparent accounting.
Interpretation: observed behavior is best explained by ambiguity aversion over data-leak likelihoods — uncertainty about leak probabilities drives avoidance of personalized AI more than baseline privacy preferences alone.
Comparative pattern of results across the Risk and Ambiguity conditions in the randomized experiment (N = 610): no privacy-threat effect when probability is known (Risk), but large privacy-threat effect when probability is ambiguous (Ambiguity), leading authors to attribute effects to ambiguity aversion.
The ambiguity-driven reduction in adoption occurs whether the privacy-threatening label is applied to sensitive demographic data or to anonymized preference data; ambiguity reduces adoption regardless of the data-sensitivity label.
Experimental arms varied the data-type/privacy label (sensitive demographic data vs anonymized preference data) within the 2×3 design (N = 610). The paper reports that the negative effect of ambiguity on adoption was observed across these different data-type labels.
Platform-mediated visibility measures used in policy assessments, business analytics, and research (e.g., estimating market share, referral importance, or favoritism) are at risk of misestimation if measurement stochasticity is not incorporated.
Empirical demonstration that citation shares and domain ranks vary across repeated samples and that many apparent differences disappear once uncertainty is quantified; argument linking visibility stochasticity to downstream inference and decision risks.
The heavy-tailed nature of citation distributions implies long tails and high variance, meaning achieving tight uncertainty bounds can require substantially more sampling than would be expected under thin-tailed assumptions.
Observed power-law / heavy-tailed citation-count distributions from repeated-sample data; theoretical implication and empirical guidance from variance estimates and pilot-sample analyses described in the paper.
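The sampling implication can be illustrated with a toy simulation (hypothetical parameters, not the paper's data): when counts follow a Pareto distribution with tail exponent below 2, the empirical standard error of a sample mean shrinks far more slowly with sample size than it does for a thin-tailed distribution with the same mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_of_mean(draw, n, reps=2000):
    """Empirical standard error of the sample mean over many repeated samples."""
    means = np.array([draw(n).mean() for _ in range(reps)])
    return means.std()

# Heavy-tailed: classical Pareto with shape alpha = 1.5 (variance is infinite for alpha <= 2).
# numpy's pareto is the Lomax form, so we add 1 to get support on [1, inf); mean = alpha/(alpha-1) = 3.
alpha = 1.5
pareto = lambda n: rng.pareto(alpha, n) + 1.0
# Thin-tailed comparator with the same mean of 3.
thin = lambda n: rng.exponential(3.0, n)

for n in (100, 1000, 10000):
    print(n, round(se_of_mean(pareto, n), 3), round(se_of_mean(thin, n), 3))
```

Under the thin-tailed draw the standard error falls like 1/sqrt(n); under the infinite-variance Pareto it decays much more slowly, so matching a target confidence-interval width requires disproportionately more samples.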
Numerical simulations using calibrated parameter sets produce phase diagrams and time-paths that show when gradual adjustment transitions into explosive demand collapse and financial stress under different combinations of capability growth, diffusion speed, and reinstatement rate.
Calibrated numerical simulation experiments described in the methods and results sections, using FRED, BLS, and occupational AI-exposure inputs and varying key model parameters.
Because consumption is concentrated and top incomes have high AI exposure, shocks to top-income labor/income disproportionately affect aggregate consumption and thereby threaten private credit and mortgage markets — the paper maps plausible exposures to roughly $2.5 trillion of global private credit and about $13 trillion of mortgages.
Calibration exercise linking household-level demand shocks (based on concentration and AI-exposure mapping) to aggregate credit and mortgage aggregates; reported dollar-amount mappings in the paper's scenarios.
Top-quintile households are also the cohort with the highest measured AI exposure (i.e., incomes/occupations most exposed to AI substitution), increasing the concentration of AI-driven demand risk.
Mapping occupation-level AI-exposure indices to household income quantiles using BLS occupation employment and wage data; used in calibration and scenario analysis.
Intermediation collapse: AI agents reduce information frictions and automate advice/coordination tasks, compressing intermediary margins toward logistics/execution costs and repricing business models across SaaS, payments, consulting, insurance, and financial advisory, with knock-on effects for firm valuations and collateral values that underpin credit markets.
Modeling of intermediary margins and information rents within the macro-financial framework; calibrated scenarios and sectoral discussion mapping margin compression to valuation and collateral effects.
Ghost GDP: AI output that replaces labor-intensive output can create a wedge between measured GDP (which may rise) and consumption-relevant income (which can fall) because a declining labor share reduces monetary velocity absent proportionate transfers — producing hidden demand shortfalls.
Formalization in the paper linking labor share to monetary velocity and thus to consumption-relevant income; calibration using FRED macro time series and monetary-aggregate/velocity proxies.
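One way to sketch the wedge (an illustrative decomposition assuming household demand tracks labor income plus transfers; the symbols $s_t$, $T_t$, and $\Delta_t$ are ours, not the paper's notation):

```latex
C_t \approx s_t Y_t + T_t, \qquad
\Delta_t \equiv Y_t - C_t = (1 - s_t)\,Y_t - T_t .
```

Measured GDP $Y_t$ can rise while consumption-relevant income $C_t$ falls whenever the labor share $s_t$ declines faster than output grows and transfers $T_t$ do not expand proportionately; the widening wedge $\Delta_t$ is the hidden demand shortfall.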
When firms rationally substitute AI for labor, aggregate labor income can fall and lower demand, which accelerates further AI substitution — a 'displacement spiral' whose net feedback is either self-limiting (convergent) or explosive (runaway adoption + demand collapse) depending on AI capability growth rate, diffusion speed across firms/sectors, and the reinstatement rate (rate at which new paid human roles or demand reappear).
Formal model derivations that identify key parameters and inequalities separating convergent vs explosive regimes; calibrated simulations that vary capability growth, diffusivity, and reinstatement elasticity to produce different phase outcomes.
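The convergent-versus-collapse logic can be sketched with a toy recursion (our own illustrative parameters and functional forms, not the paper's calibrated model): the labor share feeds demand, falling demand accelerates substitution, and the reinstatement rate `r` determines where the feedback settles.

```python
def simulate(r, s0=0.05, k=3.0, T=200):
    """Toy displacement spiral: labor share ell drives demand, weak demand
    accelerates AI substitution, and reinstatement r creates new paid roles."""
    ell = 1.0  # labor income share, starting fully labor-based
    for _ in range(T):
        demand = ell                          # demand proxy tied to labor income
        s = s0 * (1 + k * (1 - demand))       # substitution speeds up as demand falls
        ell = ell * (1 - s) + r * (1 - ell)   # displacement vs reinstatement
    return ell

print(simulate(r=0.06))   # higher reinstatement: labor share settles at a positive level
print(simulate(r=0.005))  # low reinstatement: labor share collapses toward zero
```

The same substitution dynamics produce qualitatively different regimes depending only on `r`, mirroring the paper's point that the reinstatement rate separates self-limiting from runaway outcomes.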
Rapid AI adoption can create a macro-financial stress scenario not primarily through productivity collapse or existential risk but via a distribution-and-contract mismatch: AI-generated abundance reduces the need for human cognitive labor while institutions (wage contracts, credit, consumption patterns, financial intermediation) remain anchored to the scarcity of human cognition. The result is a self-reinforcing downward spiral in labor income, demand, and intermediary margins that can tip into an explosive crisis unless offset by sufficiently fast reinstatement of human-paid demand or by deliberate policy and market responses.
Analytical macro-financial model coupling firm-level substitution decisions, aggregate demand mapping, and financial-sector balance-sheet propagation; calibrated numerical simulations using U.S. macro time series (FRED), BLS occupation-level employment and wages, and published occupation-level AI-exposure indices; phase diagrams and scenario time-paths reported in the paper.
Distributional shifts and regime changes require periodic revalidation or TSFM updates to maintain reliable performance.
Paper discussion of limitations and recommended operational procedures (revalidation and periodic TSFM updates) to handle non-stationarity and regime shifts; rationale based on time-series modeling risks.
If the TSFM produces biased or poor forecasts in certain regimes, those errors can propagate into the downstream regression and harm performance.
Stated caveat in the paper (theoretical/empirical rationale); logical consequence of using TSFM-generated features as inputs—error propagation risk discussed in analysis/limitations section.
Instability of agent rankings across configurations makes procurement and deployment decisions based on narrow benchmarks risky; firms should evaluate agents under their own scaffolds, datasets, and workflows before committing.
Empirical finding of ranking instability across models, scaffolds, and datasets; methodological recommendation derived from that instability.
Claims that AI will imminently replace human auditors are overstated; real-world economic benefits are more likely to come from complementary automation (breadth + triage) rather than full substitution.
Interpretation based on empirical failures in end-to-end exploitation, instability across configurations, and scaffold sensitivity observed in this study.
Detection and exploitation rankings are unstable: rankings shift across model configurations, tasks, and datasets, so results are not robust to evaluation choices.
Observed variability in detection/exploitation rankings across the expanded matrix of models, scaffolds, and datasets in the study's experiments.
Standardized platforms and benchmarks may create network effects and lock-in around dominant hardware–software stacks; antitrust and standards policy will matter to preserve competition.
Workshop participants' market-structure analysis and policy discussion included in the summary recommendations (NSF workshop, Sept 26–27, 2024).
The sphere + dislodgement-threshold material approximation may not capture all real-world mechanical and adhesive properties, limiting generalization.
Authors note/modeling limitation: summary explicitly states the material physics are approximated and may not capture all real-world properties; this is presented as a limitation rather than an empirical result.
Key technical and organizational risks include model brittleness, privacy and IP concerns in code generation (training-data provenance), and increased governance and QA burdens.
Literature review highlighting known risks and survey responses reporting practitioner concerns; no quantified incident rates provided.
Practitioners report barriers to adoption including integration costs, lack of trust/explainability, poor data quality, and skills gaps.
Thematic analysis / coding of open-ended survey responses and literature review identifying common adoption barriers; survey sample size not specified.
GDP and productivity metrics that ignore interpretive labor risk understating the inputs to creative and knowledge work; RATs offer a means to measure previously invisible inputs.
Policy argument in the measurement/productivity subsection; no empirical re-estimation of GDP/productivity presented.
Algorithmic feeds and AI summarizers tend to compress or automate interpretive traces, potentially erasing signals of reasoning, context, and tacit knowledge.
Conceptual claim supported by argumentation and examples in the paper; no empirical comparison between RATs and existing summarizers is presented.
Human ratings and preference-trained metrics reward vivid, exaggerated color and contrast, producing outputs that are less photorealistic even when photorealism is the intended objective.
Reported experiments in the paper comparing human preference ratings and preference-trained evaluators against a color-fidelity-focused ground truth (CFD). The authors state these existing evaluators favor high saturation/contrast and qualitatively and quantitatively select images that are 'too vivid' relative to photographic realism (paper reports qualitative examples and quantitative comparisons; exact sample sizes and statistical values are described in paper but not provided in the summary).
Prior work often conflates feedback source and feedback model; this study isolates them through controlled experiments.
Authors' literature review and the paper's experimental design explicitly constructed to disentangle source and model effects.
QCSC systems are capital- and skill-intensive, favoring well-resourced incumbents (large tech firms, national labs, major pharma/materials companies), potentially increasing concentration in compute-enabled domains.
Economic and industry-structure reasoning based on anticipated capital costs, specialized skills required, and comparison to existing capital-intensive compute infrastructures; no empirical market-share data.
Recent quantum advantage demonstrations for quantum-system simulation show utility, but practical applied research requires hybrid workflows that neither QPUs nor classical HPC can efficiently execute alone.
Review and synthesis of published quantum-simulation demonstrations and known performance/scaling limits of classical HPC; qualitative analysis of hybrid algorithm requirements; no new experiments.
Under realistic limitations (distribution shift, very large prompt inventories, or severe cold starts), DPS’s realized rollout savings and performance gains may be reduced.
Authors list these scenarios as potential limitations and caveats in the Discussion/Limitations section; no quantification provided in the summary.
Contracts and incentives based on expected performance can incentivize strategies that deliver high expected returns but poor or unreliable time-average outcomes; incentive design should account for path-dependent risks.
Theoretical/incentive argument and examples in the paper linking objective mismatch to adverse incentives; illustrative reasoning rather than empirical contract studies.
Economic evaluations and deployment decisions that rely on ensemble expectations can misstate economic value and risk because firms and users experience single time-averaged trajectories; regulators and decision-makers should therefore prefer objectives reflecting single-run guarantees when relevant.
Conceptual mapping of the theoretical results to economic decision-making and deployment risk; policy and incentive discussion in the paper (argumentative, not empirical).
The paper's illustrative example shows that a policy maximizing expected reward can produce trajectories that lock into high- or low-reward regimes, so an agent's long-term realized reward is highly uncertain and not captured by the expectation.
Constructed example provided in the paper; demonstration of divergent single-trajectory outcomes under a single policy; no empirical sample size (example-based).
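The ensemble-versus-time-average gap behind this kind of example can be reproduced with a standard multiplicative gamble (our own illustration, not the paper's exact construction): wealth is multiplied by 1.5 or 0.6 with equal probability, so the per-step expectation grows (1.05) while the typical trajectory shrinks (geometric mean ≈ 0.95).

```python
import numpy as np

rng = np.random.default_rng(1)

# Multiplicative gamble: each step, wealth is multiplied by 1.5 or 0.6 with equal probability.
up, down, steps, paths = 1.5, 0.6, 1000, 10000
factors = rng.choice([up, down], size=(paths, steps))
wealth = factors.prod(axis=1)

ensemble_mean_growth = 0.5 * (up + down)   # 1.05 per step: the expectation grows
time_avg_growth = (up * down) ** 0.5       # ~0.949 per step: the typical path decays

print(ensemble_mean_growth, time_avg_growth)
print(np.median(wealth))  # typical realized wealth after 1000 steps: effectively zero
```

A single policy thus has an ever-growing expected reward while almost every individual trajectory collapses, which is exactly why ensemble expectations can misrepresent what any one firm or user will experience.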
In contexts analogous to AI markets, a firm at a network/geographic disadvantage would need exponentially greater scale (users/data/compute) to match the probability of early discovery achieved by a better-positioned rival.
Interpretation/translation of the model's analytic scaling result to market-relevant quantities; this is a theoretical implication rather than an empirically tested claim.
Expect diminishing returns from AI investments if parallel investments in organizational change and data governance are not made.
Synthesis of case evidence and theoretical argument: instances where additional AI investment produced limited marginal benefit absent organizational complements.
Legacy systems and siloed organizational structures produce persistent forecasting inaccuracies, operational disconnects, and constrained responsiveness.
Cross-case interview narratives documenting continued forecasting issues and operational misalignment in firms with legacy IT and functional silos.
MLOps and governance provisions shift costs from one-off implementation to ongoing maintenance, implying recurring costs that should be captured in economic evaluations.
Analytical/economic argument presented in the paper as an implication of including an MLOps layer (conceptual; no empirical cost accounting provided).
Adoption complementarities (AI tools + developer skill + organizational processes) favor larger incumbents and well‑funded firms, possibly increasing concentration in tech sectors.
Theoretical argument about complementarities and returns to scale; illustrative examples; lacks firm‑level empirical testing.
In the near term, displacement risks concentrate on junior or highly routine roles; mobility and retraining will determine realized unemployment impacts.
Task automatability mapping indicating routine tasks more automatable and qualitative reasoning on labor mobility; no empirical unemployment projections.
Adoption will be heterogeneous: larger firms and well‑resourced teams will capture more gains earlier, producing competitive advantages.
Theoretical argument about adoption complementarities (AI tools + developer skill + organizational processes) and illustrative examples; no cross‑firm empirical analysis.
Differential adoption across firms (due to modular, scalable designs and data advantages) may create winner‑takes‑most effects and increase market concentration, benefiting early adopters with rich data/integration capabilities.
Market-structure claim supported by economic reasoning about scale and data advantages; no cross-firm empirical adoption study or market concentration time‑series is provided.
Initial investment, integration, and ongoing maintenance/compliance costs can be substantial and affect short-term ROI.
Interviewed administrators and implementation reports citing upfront and recurring costs (integration, model maintenance, compliance); quantitative budget figures not standardized across sites in the paper.
Risk of deskilling or reduced empathy if human roles are overly automated.
Thematic analysis of staff interviews and surveys reporting concerns about loss of practice, reduced patient contact, and potential diminishment of empathetic skills; no longitudinal measures of skill loss presented.
Technical and organizational integration with legacy hospital IT systems is nontrivial.
Implementation reports and interviews describing integration work, time, and resource needs; descriptive accounts of technical and organizational barriers (no universal timelines/costs reported).
Algorithmic bias in NLP models can misclassify complaints from underrepresented groups.
Observations from system classification error analyses (disparities reported by demographic group) and corroborating qualitative concerns from staff and administrators; specific subgroup sample sizes and effect magnitudes not provided.
Data privacy and security risks arise from centralizing complaint text and metadata.
Stakeholder interviews, thematic coding of concerns, and risk assessment commentary based on centralized logs and metadata aggregation; no measured breach incidents reported here.
Organizations will incur additional governance and procurement costs (diversity audits, recalibration of reward models, multi-model infrastructures) to mitigate homogenization, shifting some economic benefits of AI toward governance spending.
Cost implication argued from the need for auditing and multi-model procurement described in recommendations; not supported by quantified cost analyses in the paper.