The Commonplace

Evidence (4560 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
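For readers who want to query the matrix rather than scan it, here is a minimal Python sketch using a few rows copied from the table above. Shares are computed over the four shown direction columns; note that the printed Total column sometimes exceeds their sum, presumably reflecting directions not broken out here.

```python
# Direction counts copied from three rows of the Evidence Matrix above:
# (Positive, Negative, Mixed, Null).
rows = {
    "Firm Productivity": (277, 34, 68, 10),
    "AI Safety & Ethics": (117, 177, 44, 24),
    "Task Completion Time": (78, 5, 4, 2),
}

for outcome, counts in rows.items():
    pos, neg, mixed, null = counts
    shown = pos + neg + mixed + null
    # Share of positive findings among the four shown directions.
    print(f"{outcome}: {pos / shown:.0%} positive of {shown} shown claims")
```

This is only a reading aid; the dashboard's own Total column is reported as-is above.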
Active filter: Productivity
XChronos reframes transhumanist technology evaluation in experiential terms, creating both market opportunities and measurement/regulatory challenges for AI economics.
Synthesis and concluding argument in the paper summarizing proposed implications; conceptual reasoning without empirical tests.
Confidence: high · Direction: mixed · Source: XChronos and Conscious Transhumanism: A Philosophical Framew... · Outcome measures: shift in evaluation criteria toward experiential measures and resultant market/r...
The methodological landscape of the evidence base is heterogeneous, consisting of cross-sectional surveys, case studies, quasi-experimental designs, and a limited number of longitudinal analyses.
Study design information was extracted from the 145 included studies revealing a mix of designs and relatively few longitudinal or experimental studies.
Confidence: high · Direction: mixed · Source: Digital transformation and its relationship with work produc... · Outcome measures: study design types (cross-sectional, case study, quasi-experimental, longitudina...
Human factors (training, trust calibration, workflows) determine whether clinicians accept, override, or ignore GenAI suggestions.
Qualitative and quantitative human-AI interaction studies and pilot deployments discussed in the paper; specific sample sizes and effect sizes are not reported in the paper.
Confidence: high · Direction: mixed · Source: GenAI and clinical decision making in general practice · Outcome measures: override/acceptance rates; clinician-reported trust and cognitive load; adherenc...
Safety and net benefit of GenAI CDS hinge on deployment details: user interface, real-time feedback, uncertainty quantification, calibration, and how recommendations are presented (strong vs. suggestive).
Human factors and implementation studies referenced; early A/B tests and human-AI interaction research suggest interface and presentation affect acceptance and error rates; no large-scale standardized implementation trial data cited.
Confidence: high · Direction: mixed · Source: GenAI and clinical decision making in general practice · Outcome measures: acceptance/override rates; error rates; calibration metrics; clinician trust
Reimbursement models (fee-for-service vs. capitation) will influence whether cost savings from GenAI are realized or offset by increased service volume.
Economic incentive framework and prior health-economics literature cited; the paper does not provide direct empirical tests but references plausible incentive channels.
Confidence: high · Direction: mixed · Source: GenAI and clinical decision making in general practice · Outcome measures: total spending; per-patient cost; service volume under different payment models
RL and adaptive methods are well suited to real-time adaptation but can be myopic, require large amounts of interaction data, and struggle to incorporate long-term preference structure and ethical constraints.
Surveyed properties of reinforcement learning and adaptive methods in HRI/RS literature; no new empirical evaluation in this paper.
Confidence: high · Direction: mixed · Source: Reimagining Social Robots as Recommender Systems: Foundation... · Outcome measures: real-time adaptation effectiveness, sample efficiency (amount of interaction dat...
Key tradeoffs in contemporary financing models include speed/flexibility versus regulatory coverage and long‑term cost, and data reliance versus privacy/fairness.
Multi‑criteria comparative evaluation and conceptual analysis across financing models; synthesis draws on regulatory context and observed product features rather than primary quantitative tradeoff estimation.
Confidence: high · Direction: mixed · Source: Traditional vs. contemporary financing models for MSMEs and ... · Outcome measures: tradeoff between speed/flexibility and regulatory protection/cost; tradeoff betw...
Performance of structure prediction models scales with data, model size, and compute; there are tradeoffs between accuracy and inference speed/simplicity.
Paper explicitly states scaling behavior and tradeoffs in 'Compute and training' and 'Representative models' sections; no precise scaling curves or thresholds are provided in the text.
Confidence: high · Direction: mixed · Source: Protein structure prediction powered by artificial intellige... · Outcome measures: model predictive performance as a function of training data volume, model size, ...
The community knowledge functions both as practical how-to guidance and as collective experimentation with platform rules and revenue mechanisms.
Observed dual nature in the 377-video corpus: instructional workflows alongside demonstrations/testing of platform-tailored monetization tactics and workarounds.
Confidence: high · Direction: mixed · Source: Monetizing Generative AI: YouTubers' Collective Knowledge on... · Outcome measures: co-occurrence of instructional content and platform-experimentation practices
Typical practices emphasized by creators include rapid mass production of content, productizing prompt engineering, repurposing existing material via synthesis/localization, and packaging AI outputs as sellable creative services or assets.
Recurring practices surfaced through qualitative coding of workflows, tools, and pipelines described in the 377 videos.
Confidence: high · Direction: mixed · Source: Monetizing Generative AI: YouTubers' Collective Knowledge on... · Outcome measures: presence and frequency of recommended production and productization practices
Across the 377 videos, creators converge on a set of repeatable use cases and platform‑tailored monetization tactics.
Thematic coding of 377 videos produced a catalog of recurring use cases and tactics; the paper reports convergence across that sample.
Confidence: high · Direction: mixed · Source: Monetizing Generative AI: YouTubers' Collective Knowledge on... · Outcome measures: frequency and recurrence of specific use cases and monetization tactics in the s...
YouTube creators have collectively constructed and circulated a practical knowledge repository about how to monetize GenAI-driven creative work.
Systematic qualitative content analysis (thematic coding) of 377 publicly available YouTube videos in which creators promote GenAI workflows and monetization strategies.
Confidence: high · Direction: mixed · Source: Monetizing Generative AI: YouTubers' Collective Knowledge on... · Outcome measures: presence and characteristics of a community knowledge repository (practical guid...
The topology of service-dependency graphs (modelled as DAGs of compute stages) is a first-order determinant of whether decentralised, price-based resource allocation will be stable and scalable.
Systematic ablation study using simulation: 1,620 runs total across six experiment types, sweeping graph topology (hierarchical vs cross-cutting), load, hybrid integrator presence, and governance constraints; metrics included price convergence/volatility and allocation throughput/quality. Effect sizes reported in the paper show topology had the largest impact on price stability and scalability.
Confidence: high · Direction: mixed · Source: Real-Time AI Service Economy: A Framework for Agentic Comput... · Outcome measures: price convergence / price volatility and system scalability (throughput and allo...
Choice of scaffold materially affects outcomes: an open-source scaffold outperformed vendor-provided scaffolds by up to approximately 5 percentage points.
Comparative experiments across three scaffolding approaches (vendor scaffolds and at least one open-source scaffold) showing up to ~5 percentage point differences in measured outcomes.
Confidence: high · Direction: mixed · Source: Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... · Outcome measures: performance_difference_across_scaffolds (detection/exploitation_rates_difference...
Adoption of NFD approaches in regulated domains will depend on standards for validation, auditability, and update procedures.
Implications and governance discussion emphasizing regulatory constraints (finance, healthcare) and the need for validation/audit standards; logical/normative claim rather than empirical finding.
Confidence: high · Direction: mixed · Source: Nurture-First Agent Development: Building Domain-Expert AI A... · Outcome measures: adoption rate in regulated domains conditional on available validation/audit sta...
Absence of irreducibility, positive recurrence, or aperiodicity in the state dynamics can produce non-ergodic reward behavior.
Theoretical argument and examples in the paper illustrating how breakdowns of these chain conditions lead to multiple invariant measures or absorbing regimes; analysis-based evidence.
Confidence: high · Direction: mixed · Source: Ergodicity in reinforcement learning · Outcome measures: presence of non-ergodic long-run reward behavior (e.g., multiple invariant measu...
Standard Markov chain ergodicity conditions (irreducibility, positive recurrence, aperiodicity) imply ergodic reward processes when rewards depend only on the chain state.
Formal mapping in the paper between Markov-chain ergodicity properties and reward-process ergodicity; theoretical derivation (no empirical sample).
Confidence: high · Direction: mixed · Source: Ergodicity in reinforcement learning · Outcome measures: ergodicity of reward process (equivalence to chain ergodicity when rewards are s...
Non-ergodic processes admit path-dependent long-run behavior (e.g., absorbing sets, multiple invariant measures, path-dependent reinforcement), so different runs with the same policy can have different long-run averages.
Analytic discussion of Markov-chain examples and theory plus the paper's illustrative constructed example showing path-dependent locking into regimes; theoretical and example-driven evidence.
Confidence: high · Direction: mixed · Source: Ergodicity in reinforcement learning · Outcome measures: variance across realized long-run average rewards across trajectories under the ...
Ergodic reward processes are those where time averages along almost every long trajectory converge to the same value as the ensemble average.
Formal definition and discussion in the paper mapping ergodicity concepts from stochastic processes to reward processes; theoretical exposition.
Confidence: high · Direction: mixed · Source: Ergodicity in reinforcement learning · Outcome measures: convergence of time-average reward to ensemble average
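The distinction the ergodicity entries above draw (time averages versus ensemble averages, and path-dependent locking into absorbing regimes) can be illustrated with a toy simulation. A minimal sketch with two hypothetical chains, not examples taken from the paper:

```python
import random

def time_average_reward(chain, start, steps, rng):
    """Simulate a Markov reward process; return the time-average reward."""
    state, total = start, 0.0
    for _ in range(steps):
        total += chain["reward"][state]
        states, probs = chain["next"][state]
        state = rng.choices(states, probs)[0]
    return total / steps

rng = random.Random(0)

# Ergodic chain: irreducible, aperiodic, positive recurrent. Every long
# trajectory's time average converges to the same ensemble average (0.5).
ergodic = {
    "reward": {0: 1.0, 1: 0.0},
    "next": {0: ([0, 1], [0.5, 0.5]), 1: ([0, 1], [0.5, 0.5])},
}

# Non-ergodic chain: two absorbing states with different rewards, so the
# long-run average is path-dependent (locks into +1 or -1).
absorbing = {
    "reward": {0: 0.0, 1: 1.0, 2: -1.0},
    "next": {0: ([1, 2], [0.5, 0.5]), 1: ([1], [1.0]), 2: ([2], [1.0])},
}

erg = [time_average_reward(ergodic, 0, 5000, rng) for _ in range(5)]
non = [time_average_reward(absorbing, 0, 5000, rng) for _ in range(5)]
print("ergodic runs:    ", [round(x, 2) for x in erg])
print("non-ergodic runs:", [round(x, 2) for x in non])
```

In the ergodic case every run's time average lands near 0.5; in the non-ergodic case each run lands near +1.0 or -1.0 depending on which absorbing regime the trajectory entered, so no single trajectory's time average matches the ensemble average of 0.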
Some patients value human contact for sensitive cases; automated interactions can feel impersonal.
Semi-structured interviews with patients/staff and open-ended survey responses documenting preferences for human interaction in sensitive/complex complaints.
Confidence: high · Direction: mixed · Source: The Role of Artificial Intelligence in Healthcare Complaint ... · Outcome measures: patient-reported preference for human contact and perceived interpersonal qualit...
India’s reported post-harvest loss is relatively low (3.2%) despite poor food-security outcomes (Global Hunger Index rank 111/125).
Reported statistics cited in the paper (FAO/Kaggle for post-harvest loss; Global Hunger Index ranking referenced).
Confidence: high · Direction: mixed · Source: AI in food inequality: Leveraging artificial intelligence to... · Outcome measures: post-harvest loss (percent) and Global Hunger Index rank
Data‑driven policies can either amplify or mitigate inequalities depending on data representativeness, model design, and deployment governance.
Multiple empirical examples and theoretical analyses in the review highlighting cases of both harm (bias amplification) and mitigation, identified across the 103 items.
Confidence: high · Direction: mixed · Source: Models, applications, and limitations of the responsible ado... · Outcome measures: distributional equity outcomes (inequality amplification or mitigation)
Citizen acceptance, transparency, and perceived fairness strongly shape adoption trajectories and the political feasibility of AI tools in government.
Repeated empirical findings in the reviewed literature linking public trust, transparency measures, and fairness perceptions to successful or failed deployments (drawn from multiple case studies in the 103 items).
Confidence: high · Direction: mixed · Source: Models, applications, and limitations of the responsible ado... · Outcome measures: adoption trajectory/political feasibility of government AI tools (measured via d...
Adoption of AI and data-driven governance is highly uneven across jurisdictions and sectors, driven by institutional capacity, governance frameworks, and public trust.
Cross‑regional and cross‑sector comparisons in the review corpus (103 items) showing varying maturity levels and repeated identification of institutional capacity, governance arrangements, and trust factors as determinants.
Confidence: high · Direction: mixed · Source: Models, applications, and limitations of the responsible ado... · Outcome measures: adoption level/maturity of AI-driven governance systems
Governance approaches are emerging at global, regional and national levels; they vary widely across sectors and jurisdictions, creating opportunities for regulatory experimentation but also risks of fragmentation and regulatory arbitrage.
Cross-jurisdictional comparison of existing/global/regional/national governance instruments and sectoral guidance; gap analysis highlighting heterogeneity.
Confidence: high · Direction: mixed · Source: AI Governance and Data Privacy: Comparative Analysis of U.S.... · Outcome measures: degree of regulatory heterogeneity, instances of fragmentation/regulatory arbitr...
Weak formal institutions often coexist with strong informal institutions in African contexts, shaping governance, trust, and enforcement mechanisms in supply chains.
Cross-disciplinary literature review presented in the paper; conceptual argumentation rather than primary empirical analysis.
Confidence: high · Direction: mixed · Source: Continental shift: operations and supply chain management re... · Outcome measures: relative strength of formal vs informal institutions and their effects on govern...
Technology effectiveness depends on institutional support (extension, property rights), finance, and local knowledge; technology alone is not a silver bullet.
Conceptual frameworks and comparative analysis in the review; supporting case studies and program evaluations linking adoption and impact to institutional factors (extension reach, tenure security, access to credit).
Confidence: high · Direction: mixed · Source: MODERN APPROACHES TO SUSTAINABLE AGRICULTURAL TRANSFORMATION · Outcome measures: technology adoption rates, realized productivity gains, distribution of benefits...
Productivity gains from generative AI depend on task mix, integration design, and the availability of complementary human skills.
Theoretical evaluation and synthesis of heterogeneous empirical findings; authors highlight variation across firms, sectors, and tasks.
Confidence: high · Direction: mixed · Source: The Use of ChatGPT in Business Productivity and Workflow Opt... · Outcome measures: productivity change conditional on task mix/integration/human skills (productivi...
Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, varied study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.
Meta-observation from the review: documented methodological limitations across the literature (variation in models, tasks, metrics; prevalence of short-term studies).
Confidence: high · Direction: mixed · Source: ChatGPT as a Tool for Programming Assistance and Code Develo... · Outcome measures: generalizability and comparability of empirical findings (study heterogeneity)
Real‑time and LLM‑based methods improve responsiveness but raise governance, transparency, and reproducibility challenges that BLS must manage (audit trails, uncertainty communication).
Operational tradeoff discussion in the paper identifying governance risks; no case studies or incident analyses provided.
Confidence: high · Direction: mixed · Source: Enhancing BLS Methodologies for Projecting AI's Impact on Em... · Outcome measures: tradeoff between responsiveness (timeliness/accuracy) and governance metrics (tr...
Distinguishing automation versus augmentation using causal methods changes policy responses (e.g., income support versus reskilling).
Policy implication drawn from conceptual separation of substitution and complementarity effects; logical inference rather than empirical demonstration in the paper.
Confidence: high · Direction: mixed · Source: Enhancing BLS Methodologies for Projecting AI's Impact on Em... · Outcome measures: policy prescriptions chosen contingent on causal classification (automation vs a...
Methodological caveats across the literature (heterogeneity of tasks/measures, publication bias, short-term studies) limit the generalizability of current findings.
Meta-level critique within the synthesis noting study heterogeneity, likely publication/short-term biases, and variable domain-specific performance dependent on user expertise and workflows.
Confidence: high · Direction: mixed · Source: ChatGPT as an Innovative Tool for Idea Generation and Proble... · Outcome measures: generalizability and external validity of LLM-assisted creativity findings
Standard productivity metrics are likely to undercount the value generated by AI-augmented ideation; quality-adjusted measures of creative output are required.
Measurement critique based on the mismatch between existing productivity statistics and the kinds of upstream idea-generation gains observed in empirical studies; supported by the review's methodological discussion.
Confidence: high · Direction: mixed · Source: ChatGPT as an Innovative Tool for Idea Generation and Proble... · Outcome measures: measured productivity vs. true quality-adjusted creative output
Realized value from AI methods (ML, predictive analytics, anomaly detection, XAI) is conditional: these technical methods deliver capabilities only when combined with strong data governance, standardized processes, and change management.
Thematic synthesis across the systematic review (2020–2025) showing repeated case-study and practitioner-report evidence that technical gains failed to scale without governance, process standardization, and organizational change efforts.
Confidence: high · Direction: mixed · Source: Integrating Artificial Intelligence and Enterprise Resource ... · Outcome measures: magnitude and durability of ERP-AI benefits (e.g., sustained accuracy gains, ado...
Despite laboratory and pilot successes, many engineered bioprocesses remain at bench or pilot scale and require techno‑economic validation before industrial competitiveness can be established.
Review aggregate noting scale and validation status of case studies (many reported at lab or pilot fermenter scale) and explicit references to the need for TEA and LCA for industrial assessment.
Confidence: high · Direction: mixed · Source: Harnessing Microbial Factories: Biotechnology at the Edge of... · Outcome measures: technology readiness level (lab/pilot vs commercial), presence/absence of publis...
Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested.
Authors’ reported sample (164 law students) and explicit caution about generalizability in the study summary; the intervention and outcome are specific to one exam and one ~10-minute training.
Confidence: high · Direction: mixed · Source: Training for Technology: Adoption and Productive Use of Gene... · Outcome measures: Generalizability/applicability to other populations and tasks
Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide.
Authors report wide confidence intervals for mechanism decomposition (principal stratification) results based on the randomized sample of 164 students.
Confidence: high · Direction: mixed · Source: Training for Technology: Adoption and Productive Use of Gene... · Outcome measures: Precision of mechanism estimates (confidence interval width for adoption vs prod...
There is no consensus in the literature on net job effects: studies diverge on whether AI produces net job gains or losses.
Direct finding from the review: the 17 peer‑reviewed studies produce heterogeneous results on net employment impacts (some positive, some negative, some neutral).
The effects of K_T adoption are heterogeneous across industries, firms, countries, and cohorts — early adopters and capital-rich firms/countries gain most — implying important transition dynamics for political economy.
Cross-country comparisons, industry- and firm-level panel heterogeneity analyses, and case studies demonstrating variation in adoption timing and gains; model simulations emphasizing transition path dependence.
Confidence: high · Direction: mixed · Source: The Macroeconomic Transition of Technological Capital in the... · Outcome measures: industry-/firm-/country-level productivity, income, employment, and adoption tim...
Aggregate productivity (output per worker or per unit of inputs) can rise while labor’s share and employment decline due to substitution toward K_T.
Macro growth-accounting exercises decomposing output growth into contributions from labor, traditional capital, and technological capital; model simulations showing productivity gains coexisting with falling labor shares under substitution elasticities.
Confidence: high · Direction: mixed · Source: The Macroeconomic Transition of Technological Capital in the... · Outcome measures: productivity (e.g., TFP or output per worker) and labor share
A weak manager directing a weak worker achieves a 42% success rate, performing worse than the weak agent alone, which achieves 44%.
Empirical comparison across the same 200 SWE-bench Lite instances and pipeline configurations, comparing weak-manager+weak-worker pipeline to weak single-agent baseline.
Confidence: high · Direction: negative · Source: Can AI Models Direct Each Other? Organizational Structure as... · Outcome measures: task success rate (percentage of tasks solved)
Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation.
Calibration of the model to O*NET tasks + expert survey + GPT-4o decompositions; implementation results reported for computer vision showing substitution varies with task complexity.
Confidence: high · Direction: negative · Source: Economics of Human and AI Collaboration: When is Partial Aut... · Outcome measures: degree of labor substitution as a function of task complexity
AI systems exhibit predictable but diminishing returns to data, compute, and model size (scaling-law experiments), implying the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly.
Scaling-law experiments estimating performance as a function of data, compute, and model size; described experimental estimation of production function.
Confidence: high · Direction: negative · Source: Economics of Human and AI Collaboration: When is Partial Aut... · Outcome measures: marginal returns to inputs (data, compute, model size) and marginal cost of accu...
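The convex-cost claim in the scaling-law entry above can be made concrete with a generic power-law error curve; the exponent and constant below are hypothetical illustrations, not estimates from the paper.

```python
# Illustrative power-law scaling curve: error(C) = a * C**(-b).
# a and b are hypothetical constants, not fitted to any real model.
a, b = 1.0, 0.3

def compute_for_error(err):
    """Invert error = a * C**(-b): the compute C needed for a target error."""
    return (a / err) ** (1.0 / b)

# With b < 1, each 10x reduction in error multiplies the required compute
# by 10**(1/b) (~2154 here): good performance is cheap, near-perfect is not.
for err in (0.10, 0.01, 0.001):
    print(f"target error {err}: compute ~ {compute_for_error(err):.3g}")
```

The same convexity argument applies to data or model size in place of compute; only the exponent changes.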
The common claim that generative AI simply amplifies the Dunning–Kruger effect is too coarse to capture the available evidence.
Paper's synthesis of heterogenous empirical findings from human–AI interaction, learning research, and model evaluation used to critique the uniform-amplification interpretation; no single empirical countertest reported.
Confidence: high · Direction: negative · Source: Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... · Outcome measures: validity of the 'amplified Dunning–Kruger' interpretation
LLM use degrades metacognitive accuracy and flattens the classic competence–confidence gradient across skill groups (i.e., reduces calibration and narrows differences in self-assessed confidence by skill level).
Synthesis of studies from human–AI interaction and learning research reported in the paper that document worsened calibration and a reduction in the competence–confidence gradient when users rely on LLM outputs; the paper does not report a single combined sample size.
Confidence: high · Direction: negative · Source: Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... · Outcome measures: metacognitive accuracy / calibration and competence–confidence gradient
The agent-team topology exhibits higher operational fragility than the other architectures compared, owing to multi-author code generation.
Reported empirical observation from experiments comparing architectures, attributing increased fragility/errors to multi-author code generation in the agent team setup (stated qualitatively; no quantitative failure rates provided in the abstract).
Confidence: high · Direction: negative · Source: An Empirical Study of Multi-Agent Collaboration for Automate... · Outcome measures: operational fragility / error-proneness associated with multi-author code genera...
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate and document that US commuting zones with higher labor market concentration experienced more robot adoption.
Citation reported in the paper summarizing Azar et al. (2023); empirical analysis across US commuting zones (no sample size provided here).
Confidence: high · Direction: negative · Source: NBER WORKING PAPER SERIES · Outcome measures: robot adoption correlated with labor market concentration; incentives to automat...
Acemoglu and Restrepo (2022) attribute 50–70% of the increase in US wage inequality between 1980 and 2016 to displacement of workers from tasks by automation.
Citation reported in the paper summarizing Acemoglu and Restrepo (2022)'s attribution of the rise in wage inequality to automation-driven task displacement.
Confidence: high · Direction: negative · Source: NBER WORKING PAPER SERIES · Outcome measures: contribution of automation-driven displacement to rise in wage inequality (1980–...
Dechezleprêtre et al. (2025) exploit Germany's Hartz reforms to estimate an elasticity of automation innovation to low-skill wages of 2–5 at the firm level.
Citation reported in the paper summarizing Dechezleprêtre et al. (2025)'s empirical estimate (elasticity 2–5); the paper states this was estimated at the firm level.
Confidence: high · Direction: negative · Source: NBER WORKING PAPER SERIES · Outcome measures: elasticity of automation innovation to low-skill wages
Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.
Citation reported in the paper summarizing Eloundou et al. (2024)'s prediction; no sample size provided in the excerpt.
Confidence: high · Direction: negative · Source: NBER WORKING PAPER SERIES · Outcome measures: share of US jobs exposed to generative AI