The Commonplace

Evidence (3062 claims)

Adoption
5227 claims
Productivity
4503 claims
Governance
4100 claims
Human-AI Collaboration
3062 claims
Labor Markets
2480 claims
Innovation
2320 claims
Org Design
2305 claims
Skills & Training
1920 claims
Inequality
1311 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 373 105 59 439 984
Governance & Regulation 366 172 115 55 718
Research Productivity 237 95 34 294 664
Organizational Efficiency 364 82 62 34 545
Technology Adoption Rate 293 118 66 30 511
Firm Productivity 274 33 68 10 390
AI Safety & Ethics 117 178 44 24 365
Output Quality 231 61 23 25 340
Market Structure 107 123 85 14 334
Decision Quality 158 68 33 17 279
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 88 31 38 9 166
Firm Revenue 96 34 22 152
Innovation Output 105 12 21 11 150
Consumer Welfare 68 29 35 7 139
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 71 10 29 6 116
Worker Satisfaction 46 38 12 9 105
Error Rate 42 47 6 95
Training Effectiveness 55 12 11 16 94
Task Completion Time 76 5 4 2 87
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 16 9 5 48
Job Displacement 5 29 12 46
Social Protection 19 8 6 1 34
Developer Productivity 27 2 3 1 33
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 8 4 9 21
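The direction counts in the matrix do not always add up to the Total column, and a few rows list only three direction counts. A minimal sketch, assuming the rows are available as plain strings in the flattened form shown above (the `parse_row` and `check_rows` helpers and the choice of sample rows are illustrative and not part of the site), checks each row's arithmetic:

```python
import re

# A few rows copied verbatim from the matrix above; the last number is Total.
ROWS = [
    "Other 373 105 59 439 984",
    "Governance & Regulation 366 172 115 55 718",
    "Firm Revenue 96 34 22 152",
    "Task Completion Time 76 5 4 2 87",
]


def parse_row(line):
    """Split a flattened row into (outcome, direction_counts, total)."""
    match = re.match(r"^(.*?)((?:\s+\d+)+)$", line)
    if match is None:
        raise ValueError(f"no trailing counts in row: {line!r}")
    numbers = [int(token) for token in match.group(2).split()]
    return match.group(1).strip(), numbers[:-1], numbers[-1]


def check_rows(lines):
    """Return (outcome, number_of_direction_counts, total_minus_sum) per row."""
    report = []
    for line in lines:
        outcome, counts, total = parse_row(line)
        report.append((outcome, len(counts), total - sum(counts)))
    return report


if __name__ == "__main__":
    for outcome, n_counts, gap in check_rows(ROWS):
        print(f"{outcome}: {n_counts} direction counts, gap to Total = {gap}")
```

A positive gap means the Total is larger than the sum of the direction counts shown; the page does not say how the remaining claims are classified. Rows with only three direction counts leave one cell blank, and the flattened text does not show which direction that is.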
Filter: Human-AI Collaboration
Large language models (LLMs) risk reproducing, and in some cases amplifying, gender stereotypes and bias already present in the labour market.
Framed as an assertion supported by prior literature and used as motivation for the study; partially evaluated empirically in this paper via the GPT-5 experiment.
medium | negative | Gender Bias in Generative AI-assisted Recruitment Processes | presence and amplification of gender stereotypes/bias in LLM outputs
The inability of models to reliably self-author useful Skills implies that models typically cannot produce the procedural knowledge they would benefit from consuming.
Interpretation based on the empirical finding that self-generated Skills provided no average benefit; inferred conclusion about model-authored procedural content quality. The paper's claim is supported by the comparative experimental results but the inference about broader capabilities is derived from those results rather than a direct separate measurement.
medium | negative | SkillsBench: Benchmarking How Well Agent Skills Work Across ... | quality/usefulness of model-authored Skills as measured by downstream task pass ...
In some tasks, curated Skills worsened performance: 16 of 84 tasks showed negative deltas.
Per-task delta analysis reported in the paper: authors report 16 tasks with negative deltas where curated Skills reduced pass rate. (Note: the paper elsewhere reports 86 tasks in the benchmark; the negative-task count is reported as 16 of 84 in the paper's per-task summary.)
medium | negative | SkillsBench: Benchmarking How Well Agent Skills Work Across ... | task pass rate (per-task delta)
Access to digital learning and credential portability could unevenly benefit those with connectivity or prior skills, creating distributional effects and digital divides that should be measured.
Conceptual risk analysis and distributional reasoning based on digital access differentials; no empirical subgroup analysis reported.
medium | negative | Training as corridor governance: TVET alignment, skills reco... | differential program benefits across connectivity/skill/gender subgroups; measur...
Corridor governance is fragmented, with uneven implementation capacity across sending and receiving actors.
Governance gap analysis and desk review of corridor institutional arrangements; qualitative identification of capacity and accountability shortfalls.
medium | negative | Training as corridor governance: TVET alignment, skills reco... | implementation capacity and inter-actor coordination in corridor governance
Current mandatory pre-departure training is typically delivered late, generically, and with weak assessment, limiting its capacity to change recruitment choices or support migrants after arrival.
Structured desk review of policy and program materials and corridor process mapping identifying timing, actors, and touchpoints; qualitative/administrative evidence rather than quantitative outcome measurement.
medium | negative | Training as corridor governance: TVET alignment, skills reco... | timing and quality of training delivery; ability to affect recruitment choices a...
Platforms optimized for engagement can produce externalities that distort lived temporality (loss of presence and meaning) beyond standard attention‑capture harms.
Argument synthesizing platform literature and phenomenological concerns; no new empirical analysis of platform effects provided.
medium | negative | XChronos and Conscious Transhumanism: A Philosophical Framew... | welfare externalities expressed as reductions in presence and perceived meaning ...
Contemporary transhumanist and neurotechnology developments (BCIs, neural digital twins, human–AI collaboration) have advanced technologically but lack a robust conceptual core focused on lived experience and temporality.
Survey and synthesis of existing literatures reported in the paper (conceptual review); no systematic empirical content analysis or coded sample size provided.
medium | negative | XChronos and Conscious Transhumanism: A Philosophical Framew... | extent to which existing transhumanist/neurotech work centers lived temporality ...
LLM-generated participants are particularly risky in strategic and game-theoretic settings because they may misrepresent incentives, dynamic strategic thinking, and bounded rationality.
Review highlights examples and theoretical concerns from multiple studies indicating misrepresentation of strategic behavior; grouped under risks for strategic settings.
medium | negative | Synthetic Participants Generated by Large Language Models: A... | accuracy of strategic decisions, equilibrium behavior, and incentive-respecting ...
The absence of level‑4 evidence (organizational/patient outcomes) limits the ability of health systems and payers to conduct cost‑benefit or return‑on‑investment analyses for upskilling investments in AI.
No included study reported level‑4 outcomes; the paper reasons that without organizational/patient outcome data, economic evaluation is hampered.
medium | negative | Assessing the effectiveness of artificial intelligence educa... | availability of evidence linking training to organizational/patient outcomes for...
Because most programs were short, introductory, and assessed only short‑term learner outcomes, they likely produce modest increases in individual AI literacy but are insufficient to build advanced clinical AI competencies that would change clinical task allocation or productivity.
Synthesis combining program characteristics (short duration, introductory content, academic delivery) and outcome mapping to only Kirkpatrick levels 1–3 in the 27 studies; interpretation drawn in the paper.
medium | negative | Assessing the effectiveness of artificial intelligence educa... | individual AI literacy gains and capacity to generate advanced clinical AI compe...
Workplace stress is associated with reduced job performance.
PLS-SEM analysis on the same N = 350 sample. Reported direct path: Stress → Performance, β = 0.158, p < 0.001. (Note: the study interprets this as stress reducing performance; sign/coding conventions are not detailed in the summary.)
High upfront and maintenance costs create scale advantages for larger institutions or centralized providers, potentially concentrating market power among well-resourced curriculum developers.
Economic inference from cost structure described in paper; no market concentration empirical data provided.
medium | negative | Curriculum engineering: organisation, orientation, and manag... | costs (upfront and maintenance), market concentration metrics among curriculum p...
Disadvantages and risks include significant resource investment, complexity aligning multiple standards, and a high demand for continuous updates and audits.
Paper's risks section (author assertion); no quantified cost or burden data.
medium | negative | Curriculum engineering: organisation, orientation, and manag... | implementation cost, complexity of standards alignment, frequency and cost of up...
Implementing this program requires substantial resources and ongoing governance.
Author assertions in disadvantages/risks section; no cost accounting or empirical costing data provided.
medium | negative | Curriculum engineering: organisation, orientation, and manag... | resource requirements and governance burden (cost/time/staffing)
Proprietary models trained on large clinical datasets can create high entry barriers, concentrating market power among a few platform firms and increasing prices for hospitals.
Market-structure and platform economics analysis in the paper; empirical evidence of concentration in GenAI healthcare is limited and no firm-level market-share data are provided.
medium | negative | GenAI and clinical decision making in general practice | market concentration metrics (HHI); vendor pricing; hospital switching costs
Liability and accountability gaps exist for AI-suggested errors: it is unclear whether vendors, hospitals, or clinicians are responsible for harms resulting from GenAI CDS recommendations.
Policy and legal analysis discussed in the paper; this is a structural/legal observation rather than an empirical finding and no case-law sample size is provided.
medium | negative | GenAI and clinical decision making in general practice | existence of legal/ liability/ accountability clarity; number of resolved liabil...
Current simulation practice is insufficiently integrated with enabling technologies (digital twins, data analytics, AI/ML) and with relevant government policy constraints.
Synthesis of literature and gap analysis in the paper; assertions are conceptual and not empirically tested within the paper.
medium | negative | A Review of Manufacturing Operations Research Integration in... | level of integration between simulation models and enabling technologies/policy ...
Current simulation practice has limited strategic orientation, often focusing more on tactical and operational questions than on firm strategy.
Literature review and analysis in the paper highlighting the emphasis in existing studies on tactical/operational problems.
medium | negative | A Review of Manufacturing Operations Research Integration in... | strategic relevance of simulation research and models
Current simulation practice lacks contextualization to firm‑ and industry‑specific realities.
Findings from the paper's literature review and critique sections; no new empirical measurement provided.
medium | negative | A Review of Manufacturing Operations Research Integration in... | degree of firm/industry contextualization in simulation models
Current manufacturing and supply‑chain simulation practices are insufficiently contextualized, strategically focused, or integrated with modern technologies and policy considerations.
Literature review and critique of existing simulation practice presented in the paper; no original empirical data or case studies.
medium | negative | A Review of Manufacturing Operations Research Integration in... | simulation relevance (contextualization, strategic alignment, technology and pol...
Personalization raises distributional concerns and risks of manipulation or biased treatment; regulators may need to set transparency, fairness, and data-use standards.
Policy analysis and normative recommendation based on known risks of personalization systems; not empirically demonstrated in robotic deployments here.
medium | negative | Reimagining Social Robots as Recommender Systems: Foundation... | incidence of biased treatment, transparency compliance, regulatory adoption rate...
LLM-based personalization generates context-aware responses but often fails to model long-term preferences and fine-grained user/item relations needed for consistent, proactive personalization.
Conceptual critique based on surveyed limitations of LLM-based approaches; no new experimental data reported.
medium | negative | Reimagining Social Robots as Recommender Systems: Foundation... | consistency of personalization over time, representation of long-term user prefe...
ANN analysis ranks need-for-human-interaction barriers as the most important predictor of GAICS adoption outcome.
ANN feature-importance analysis reported in the paper that ranks predictors for adoption outcome and finds the human-interaction barrier as the top predictor; paper abstract does not include details on ANN implementation or sample characteristics.
medium | negative | Reimagining Stakeholder Engagement Through Generative AI: A ... | GAICS adoption (likelihood/decision to adopt)
Students raised concerns about ChatGPT producing factual errors, the risk of overreliance that could reduce independent thinking, and functional constraints of free ChatGPT versions.
Qualitative analysis of open-ended student survey responses; concerns consistently reported across responses in the sample of 254 students.
medium | negative | Expanding the lens: multi-institutional evidence on student ... | student-reported concerns and perceived risks
Biased or unrepresentative AI outputs produce negative externalities, including maladaptation and inefficient investments in vulnerable regions.
Conceptual analysis and illustrative cases linking misleading model outputs to maladaptive decisions; the paper notes risks rather than providing quantified incidence or cost estimates.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Incidence of maladaptation and associated economic inefficiencies attributable t...
Returns to scale in compute and data favor incumbents; without intervention this dynamic can entrench inequality in the global climate-information market.
Economic theory of returns to scale combined with observed compute concentration; no empirical elasticity or returns-to-scale estimates provided.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Degree to which compute/data scale advantages increase incumbents' market share ...
Concentration of compute and model development creates market power for Northern institutions and companies, likely leading to unequal pricing, control over standards, and capture of high-value climate services.
Descriptive mapping of concentration plus economic analysis of market structure and returns to scale; illustrative rather than quantitatively proven across markets.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Market power indicators (pricing, standard-setting control, market share in clim...
Rapid AI adoption without a shift from model-centric to data- and equity-centric development risks producing systematically worse performance and misleading recommendations for the most climate-vulnerable, data-sparse regions.
Synthesis of domain-specific case studies (weather/climate, impact models, LLMs) and conceptual causal tracing demonstrating how infrastructure asymmetry can degrade outputs in vulnerable regions; evidence illustrative rather than causal-estimate based.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Model performance and recommendation quality in climate-vulnerable, data-sparse ...
Large language models (LLMs) that rely on dominant, textualized climate knowledge tend to foreground Northern epistemologies and marginalize local or indigenous knowledge, reinforcing biases in climate narratives and recommendations.
Case studies and analysis of training-corpus composition and output examples illustrating the dominance of Northern textual sources and examples of sidelining local knowledge; no large-scale audit results provided.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Representation of local/indigenous knowledge in LLM outputs and bias in generate...
In climate impact modelling, sparse and unrepresentative exposure and vulnerability data combined with inadequate validation generate high uncertainty and risk of misleading interventions and maladaptation in vulnerable locales.
Targeted case studies and literature synthesis showing gaps in exposure/vulnerability datasets and validation failures; argument is illustrated rather than quantified across all systems.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Uncertainty in impact estimates and likelihood of misleading policy/intervention...
In weather and climate modelling, historically and spatially biased observational data produce systematic performance gaps in under-observed tropical and low-income regions, reducing forecast fidelity where adaptive capacity is lowest.
Comparative, domain-specific case studies and literature review documenting observational data sparsity and illustrative empirical performance gaps; no single cross-system statistical estimate provided.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Forecast fidelity/accuracy in under-observed tropical and low-income regions (mo...
The geographic concentration of compute and model development creates path dependence: model design, training datasets, and validation reflect Northern priorities and contexts.
Conceptual analysis supported by cross-disciplinary synthesis and illustrative case studies showing dataset selection, validation practices, and model design choices aligned with Northern contexts rather than global representativeness.
medium | negative | The Rise of AI in Weather and Climate Information and its Im... | Degree of alignment between model design/validation choices and Northern (vs. lo...
At the organizational scale, AI adoption is constrained and shaped by compliance requirements, formal policies, and prevailing norms.
Participants' accounts in workshops (n=15) noting compliance and policy considerations; thematic analysis classified these as organizational-level constraints.
medium | negative | The Values of Value in AI Adoption: Rethinking Efficiency in... | organizational-level constraints on adoption (compliance, policy, norms) and res...
Creators who systematize high-throughput AI workflows or control distribution channels may capture outsized returns, potentially increasing winner-take-most dynamics on platforms.
Theoretical implication extrapolated from observed high-throughput practices and monetization strategies in the 377 videos; not directly measured or quantified in the dataset.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | earnings concentration / market concentration effects (suggested, not measured)
Widespread unverifiable income claims and promotional framing create noisy signals about viable earnings, complicating entrants’ investment decisions and labor market expectations.
Analytical inference based on the documented prevalence of unverifiable earnings claims in the 377 videos and theory about market signaling; not quantitatively tested in the paper.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | information quality / market signaling affecting entrant decisions (hypothesized...
GenAI lowers the time and skill cost of producing many types of creative outputs, which can increase content supply and exert downward pressure on wages for routine creative tasks.
Inference drawn as an implication from observed practices (e.g., mass production workflows) in the 377 videos and existing literature; not directly measured in this study.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | potential change in labor costs, content supply, and wage pressure (not empirica...
Creators and the community knowledge base document shifting norms around authorship and attribution: GenAI blurs who is considered the creator and complicates labor recognition and rights.
Coding captured explicit discussion and contested norms about authorship, attribution, and creator identity across the 377 videos.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | frequency and content of discussions about authorship and attribution
Some creators recommend or describe synthetic engagement practices (e.g., automated posting, synthetic comments/engagement) as tactics to inflate visibility.
Thematic coding noted advice or descriptions of engagement-inflating tactics across videos in the 377-video corpus.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | presence of recommendations for synthetic or automated engagement tactics
Creators surface and often employ practices that raise content misappropriation concerns (use of copyrighted or third-party material in synthetic outputs).
Instances and discussions captured in the 377-video sample where creators show or recommend synthesizing, transforming, or repurposing third‑party content.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | occurrence of recommendations or demonstrations involving third-party/copyrighte...
Many videos advertise earnings or income claims that are unverifiable within the content, producing noisy market signals.
Qualitative observations from coding the 377 videos noting frequent asserted earnings without reproducible evidence or transparent accounting.
medium | negative | Monetizing Generative AI: YouTubers' Collective Knowledge on... | presence of unverifiable income/earnings claims in videos
These methodological adaptations reduce but do not eliminate validity threats; they often increase complexity and cost while leaving unresolved issues of generalizability and time-dependence.
Practitioner accounts (n=16) describing limits/tradeoffs of adaptations; authors' synthesis concluding residual threats remain despite adaptations.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | effectiveness and tradeoffs of mitigation strategies for validity threats
External validity is limited: results from a given trial may not generalize across model versions, populations, tasks, or to temporally distant deployments.
Interview-derived themes (16 practitioners) and authors' analytic mapping to external validity concerns; supported by examples of model/version dependence discussed in interviews.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | generalizability/external validity of trial results across versions, populations...
Construct validity is threatened because commonly used outcome measures can misrepresent the constructs of interest when AI changes task structure or human strategies.
Practitioners' reports in semi-structured interviews (n=16) and authors' synthesis illustrating cases where metrics no longer capture intended constructs after AI introduction.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | construct validity of outcome measures (accuracy of metrics in capturing intende...
Common internal validity threats in uplift studies of frontier AI include violations of treatment fidelity and SUTVA (e.g., contamination, time-varying treatments).
The paper's validity-consequences section, based on thematic analysis of 16 interviews and mapping practitioner-reported problems to internal validity constructs.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | treatment fidelity and SUTVA adherence in RCTs measuring uplift
Porous real-world settings cause spillovers and contamination across experimental arms, violating SUTVA and threatening internal validity.
Multiple practitioners (n=16) reported examples of spillovers and contamination during deployment-like studies; thematic analysis mapped these to SUTVA/treatment-fidelity concerns.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | internal validity (SUTVA, treatment contamination) of uplift trials
Shifting baselines (changes in tools, protocols, or knowledge during and across studies) complicate defining an appropriate control or status quo.
Interview data (16 practitioners) and thematic analysis identifying shifting baselines as a recurring challenge reported by participants.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | construct validity of the control/status-quo definition in uplift studies
Rapidly evolving models (nonstationarity) make any single trial a moving target, undermining the temporal stability of measured uplift.
Practitioner reports from semi-structured interviews (n=16) describing model updates and performance changes during/after trials; thematic coding indicating nonstationarity as a common concern.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | temporal stability/generalizability of measured uplift across model versions
Properties of frontier AI — rapid model evolution, shifting baselines, heterogeneous and changing users, and porous real-world settings — regularly strain internal, construct, and external validity of human uplift studies.
Recurring themes identified via qualitative analysis of 16 practitioner interviews; mapped to internal/construct/external validity dimensions in the paper's results.
medium | negative | RCTs & Human Uplift Studies: Methodological Challenges and P... | internal, construct, and external validity of human uplift RCTs
Instability of agent rankings across configurations makes procurement and deployment decisions based on narrow benchmarks risky; firms should evaluate agents under their own scaffolds, datasets, and workflows before committing.
Empirical finding of ranking instability across models, scaffolds, and datasets; methodological recommendation derived from that instability.
medium | negative | Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... | robustness_of_benchmark_based_procurement (risk_of_misleading_benchmarks)