The Commonplace

Evidence (4793 claims)

Adoption (5539 claims)
Productivity (4793 claims)
Governance (4333 claims)
Human-AI Collaboration (3326 claims)
Labor Markets (2657 claims)
Innovation (2510 claims)
Org Design (2469 claims)
Skills & Training (2017 claims)
Inequality (1378 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 402 112 67 480 1076
Governance & Regulation 402 192 122 62 790
Research Productivity 249 98 34 311 697
Organizational Efficiency 395 95 70 40 603
Technology Adoption Rate 321 126 73 39 564
Firm Productivity 306 39 70 12 432
Output Quality 256 66 25 28 375
AI Safety & Ethics 116 177 44 24 363
Market Structure 107 128 85 14 339
Decision Quality 177 76 38 20 315
Fiscal & Macroeconomic 89 58 33 22 209
Employment Level 77 34 80 9 202
Skill Acquisition 92 33 40 9 174
Innovation Output 120 12 23 12 168
Firm Revenue 98 34 22 154
Consumer Welfare 73 31 37 7 148
Task Allocation 84 16 33 7 140
Inequality Measures 25 77 32 5 139
Regulatory Compliance 54 63 13 3 133
Error Rate 44 51 6 101
Task Completion Time 88 5 4 3 100
Training Effectiveness 58 12 12 16 99
Worker Satisfaction 47 32 11 7 97
Wages & Compensation 53 15 20 5 93
Team Performance 47 12 15 7 82
Automation Exposure 24 22 9 6 62
Job Displacement 6 38 13 57
Hiring & Recruitment 41 4 6 3 54
Developer Productivity 34 4 3 1 42
Social Protection 22 10 6 2 40
Creative Output 16 7 5 1 29
Labor Share of Income 12 5 9 26
Skill Obsolescence 3 20 2 25
Worker Turnover 10 12 3 25
Active filter: Productivity
Construct validity is threatened because commonly used outcome measures can misrepresent the constructs of interest when AI changes task structure or human strategies.
Practitioners' reports in semi-structured interviews (n=16) and authors' synthesis illustrating cases where metrics no longer capture intended constructs after AI introduction.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... construct validity of outcome measures (accuracy of metrics in capturing intende...
Common internal validity threats in uplift studies of frontier AI include violations of treatment fidelity and SUTVA (e.g., contamination, time-varying treatments).
The paper's validity-consequences section, based on thematic analysis of 16 interviews and mapping practitioner-reported problems to internal validity constructs.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... treatment fidelity and SUTVA adherence in RCTs measuring uplift
Porous real-world settings cause spillovers and contamination across experimental arms, violating SUTVA and threatening internal validity.
Multiple practitioners (n=16) reported examples of spillovers and contamination during deployment-like studies; thematic analysis mapped these to SUTVA/treatment-fidelity concerns.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... internal validity (SUTVA, treatment contamination) of uplift trials
Shifting baselines (changes in tools, protocols, or knowledge during and across studies) complicate defining an appropriate control or status quo.
Interview data (16 practitioners) and thematic analysis identifying shifting baselines as a recurring challenge reported by participants.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... construct validity of the control/status-quo definition in uplift studies
Rapidly evolving models (nonstationarity) make any single trial a moving target, undermining the temporal stability of measured uplift.
Practitioner reports from semi-structured interviews (n=16) describing model updates and performance changes during/after trials; thematic coding indicating nonstationarity as a common concern.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... temporal stability/generalizability of measured uplift across model versions
Properties of frontier AI — rapid model evolution, shifting baselines, heterogeneous and changing users, and porous real-world settings — regularly strain internal, construct, and external validity of human uplift studies.
Recurring themes identified via qualitative analysis of 16 practitioner interviews; mapped to internal/construct/external validity dimensions in the paper's results.
medium negative RCTs & Human Uplift Studies: Methodological Challenges and P... internal, construct, and external validity of human uplift RCTs
Instability of agent rankings across configurations makes procurement and deployment decisions based on narrow benchmarks risky; firms should evaluate agents under their own scaffolds, datasets, and workflows before committing.
Empirical finding of ranking instability across models, scaffolds, and datasets; methodological recommendation derived from that instability.
medium negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... robustness_of_benchmark_based_procurement (risk_of_misleading_benchmarks)
Claims that AI will imminently replace human auditors are overstated; real-world economic benefits are more likely to come from complementary automation (breadth + triage) rather than full substitution.
Interpretation based on empirical failures in end-to-end exploitation, instability across configurations, and scaffold sensitivity observed in this study.
medium negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... economic_value_of_automation (qualitative_assessment_of_substitution_vs_compleme...
Detection and exploitation rankings are unstable: rankings shift across model configurations, tasks, and datasets, so results are not robust to evaluation choices.
Observed variability in detection/exploitation rankings across the expanded matrix of models, scaffolds, and datasets in the study's experiments.
medium negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... ranking_stability (consistency_of_model_rankings_across_configs_and_datasets)
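The ranking-instability concern above can be checked mechanically. A minimal sketch in Python (agent names and scores are hypothetical, not taken from the paper) compares leaderboards produced by two evaluation configurations using a simple Kendall rank correlation; a low tau means a procurement decision based on one configuration's leaderboard would flip under the other.

```python
from itertools import combinations

# Hypothetical scores for the same agents under two evaluation
# configurations (different scaffold/dataset choices); numbers are
# illustrative only.
scores_cfg_a = {"agent1": 0.62, "agent2": 0.58, "agent3": 0.41, "agent4": 0.39}
scores_cfg_b = {"agent1": 0.44, "agent2": 0.57, "agent3": 0.52, "agent4": 0.31}

def ranking(scores):
    """Agents ordered best-first by score."""
    return sorted(scores, key=scores.get, reverse=True)

def kendall_tau(r1, r2):
    """Simplified Kendall tau for tie-free rankings: +1 identical, -1 reversed."""
    pos1 = {a: i for i, a in enumerate(r1)}
    pos2 = {a: i for i, a in enumerate(r2)}
    concordant = discordant = 0
    for a, b in combinations(r1, 2):
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

tau = kendall_tau(ranking(scores_cfg_a), ranking(scores_cfg_b))
print(ranking(scores_cfg_a), ranking(scores_cfg_b), round(tau, 3))
```

Here the top-ranked agent differs between configurations and tau is well below 1, which is exactly the kind of instability that makes single-benchmark procurement risky.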
High within-person variability and statement-dependent ambiguity imply noisy sentiment labels that can attenuate estimated effects in econometric analyses (measurement error / attenuation bias).
Empirical findings of moderate within-person stability and strong statement dependence in a sample of 81 students labeling decontextualized statements; combined with standard measurement-error theory (paper’s implication for applied analyses).
medium negative Exploring Indicators of Developers' Sentiment Perceptions in... expected bias (attenuation) in estimated associations when using noisy sentiment...
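The attenuation mechanism invoked here follows from classical measurement-error theory: noise in a regressor shrinks the OLS slope toward zero by the reliability ratio sigma_x^2 / (sigma_x^2 + sigma_e^2). A minimal simulation (stdlib only; all numbers hypothetical, not the paper's data) illustrates it:

```python
import random

random.seed(0)
n = 20000
beta = 1.0                    # true effect of sentiment on the outcome
sigma_x, sigma_e = 1.0, 1.0   # signal and label-noise standard deviations

x = [random.gauss(0, sigma_x) for _ in range(n)]        # true sentiment
y = [beta * xi + random.gauss(0, 0.5) for xi in x]      # outcome
x_noisy = [xi + random.gauss(0, sigma_e) for xi in x]   # noisy labels

def ols_slope(xs, ys):
    """Slope of the least-squares fit of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

b_true = ols_slope(x, y)
b_noisy = ols_slope(x_noisy, y)
reliability = sigma_x**2 / (sigma_x**2 + sigma_e**2)  # expected shrinkage factor
print(round(b_true, 2), round(b_noisy, 2), reliability)
```

With equal signal and noise variances the estimated effect is roughly halved, matching the reliability ratio of 0.5.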
The sphere + dislodgement-threshold material approximation may not capture all real-world mechanical and adhesive properties, limiting generalization.
Authors note/modeling limitation: summary explicitly states the material physics are approximated and may not capture all real-world properties; this is presented as a limitation rather than an empirical result.
medium negative Learning Adaptive Force Control for Contact-Rich Sample Scra... generalization/physical fidelity of the simulation model (limitation)
Key technical and organizational risks include model brittleness, privacy and IP concerns in code generation (training-data provenance), and increased governance and QA burdens.
Literature review highlighting known risks and survey responses reporting practitioner concerns; no quantified incident rates provided.
medium negative Artificial Intelligence as a Catalyst for Innovation in Soft... reported incidence or concern levels about risks (qualitative)
Practitioners report barriers to adoption including integration costs, lack of trust/explainability, poor data quality, and skills gaps.
Thematic analysis / coding of open-ended survey responses and literature review identifying common adoption barriers; survey sample size not specified.
medium negative Artificial Intelligence as a Catalyst for Innovation in Soft... prevalence of reported barriers in survey responses
GDP and productivity metrics that ignore interpretive labor risk understating the inputs to creative and knowledge work; RATs offer a means to measure previously invisible inputs.
Policy argument in the measurement/productivity subsection; no empirical re-estimation of GDP/productivity presented.
medium negative Chasing RATs: Tracing Reading for and as Creative Activity completeness of productivity/GDP measurement with respect to interpretive labor
Algorithmic feeds and AI summarizers tend to compress or automate interpretive traces, potentially erasing signals of reasoning, context, and tacit knowledge.
Conceptual claim supported by argumentation and examples in the paper; no empirical comparison between RATs and existing summarizers is presented.
medium negative Chasing RATs: Tracing Reading for and as Creative Activity loss of interpretive trace signals (reasoning/context/tacit knowledge) when usin...
Prior work often conflates feedback source and feedback model; this study isolates them through controlled experiments.
Authors' literature review and the paper's experimental design explicitly constructed to disentangle source and model effects.
medium negative A Systematic Study of Pseudo-Relevance Feedback with LLMs Degree to which prior studies separate PRF design dimensions (methodological ass...
QCSC systems are capital- and skill-intensive, favoring well-resourced incumbents (large tech firms, national labs, major pharma/materials companies), potentially increasing concentration in compute-enabled domains.
Economic and industry-structure reasoning based on anticipated capital costs, specialized skills required, and comparison to existing capital-intensive compute infrastructures; no empirical market-share data.
medium negative Reference Architecture of a Quantum-Centric Supercomputer market concentration and firm advantage in compute-enabled R&D domains
Recent quantum advantage demonstrations for quantum-system simulation show utility, but practical applied research requires hybrid workflows that neither QPUs nor classical HPC can efficiently execute alone.
Review and synthesis of published quantum-simulation demonstrations and known performance/scaling limits of classical HPC; qualitative analysis of hybrid algorithm requirements; no new experiments.
medium negative Reference Architecture of a Quantum-Centric Supercomputer ability of standalone QPUs or classical HPC to execute full applied-research hyb...
Under realistic limitations (distribution shift, very large prompt inventories, or severe cold starts), DPS’s realized rollout savings and performance gains may be reduced.
Authors list these scenarios as potential limitations and caveats in the Discussion/Limitations section; no quantification provided in the summary.
medium negative Dynamics-Predictive Sampling for Active RL Finetuning of Lar... magnitude of rollout savings and performance gains under adverse conditions
Contracts and incentives based on expected performance can incentivize strategies that deliver high expected returns but poor or unreliable time-average outcomes; incentive design should account for path-dependent risks.
Theoretical/incentive argument and examples in the paper linking objective mismatch to adverse incentives; illustrative reasoning rather than empirical contract studies.
medium negative Ergodicity in reinforcement learning alignment/misalignment of incentives with reliable long-run (time-average) perfo...
Economic evaluations and deployment decisions that rely on ensemble expectations can misstate economic value and risk, because firms and users experience single trajectories (time averages) rather than ensembles; regulators and decision-makers should therefore prefer objectives reflecting single-run guarantees when relevant.
Conceptual mapping of the theoretical results to economic decision-making and deployment risk; policy and incentive discussion in the paper (argumentative, not empirical).
medium negative Ergodicity in reinforcement learning accuracy of economic valuation and risk assessment when using ensemble expectati...
The paper's illustrative example shows that a policy maximizing expected reward can produce trajectories that lock into persistently high- or low-reward regimes, so an agent's long-term realized reward is highly uncertain and not captured by the expectation.
Constructed example provided in the paper; demonstration of divergent single-trajectory outcomes under a single policy; no empirical sample size (example-based).
medium negative Ergodicity in reinforcement learning distribution (uncertainty) of long-term realized reward across individual trajec...
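The ensemble-versus-time-average divergence these claims rest on can be reproduced with a standard multiplicative-reward toy model (parameters are hypothetical, not the paper's constructed example): the expected per-step multiplier exceeds 1, yet the per-step log growth is negative, so almost every individual trajectory shrinks.

```python
import math
import random

random.seed(1)
up, down, p = 1.5, 0.6, 0.5   # hypothetical multiplicative reward dynamics

ensemble_growth = p * up + (1 - p) * down                      # expected multiplier > 1
time_avg_growth = p * math.log(up) + (1 - p) * math.log(down)  # per-step log growth < 0

def final_wealth(steps=1000):
    """One trajectory of the same policy: multiply wealth each step."""
    w = 1.0
    for _ in range(steps):
        w *= up if random.random() < p else down
    return w

runs = [final_wealth() for _ in range(500)]
frac_ruined = sum(w < 1.0 for w in runs) / len(runs)
print(ensemble_growth, round(time_avg_growth, 3), frac_ruined)
```

The ensemble average grows 5% per step, yet nearly all simulated runs end below their starting wealth: a contract rewarding expected performance would rate this policy highly even though almost every realized trajectory loses.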
Expect diminishing returns from AI investments if parallel investments in organizational change and data governance are not made.
Synthesis of case evidence and theoretical argument: instances where additional AI investment produced limited marginal benefit absent organizational complements.
medium negative Optimizing integrated supply planning in logistics: Bridging... marginal returns to AI (performance per unit AI investment)
Legacy systems and siloed organizational structures produce persistent forecasting inaccuracies, operational disconnects, and constrained responsiveness.
Cross-case interview narratives documenting continued forecasting issues and operational misalignment in firms with legacy IT and functional silos.
medium negative Optimizing integrated supply planning in logistics: Bridging... forecasting accuracy, operational alignment, responsiveness (lead times)
MLOps and governance provisions shift costs from one-off implementation to ongoing maintenance, implying recurring costs that should be captured in economic evaluations.
Analytical/economic argument presented in the paper as an implication of including an MLOps layer (conceptual; no empirical cost accounting provided).
medium negative ALGORITHM FOR IMPLEMENTING AI IN THE MANAGEMENT LOOP OF SMES... cost structure (recurring maintenance costs vs one-off implementation costs)
Adoption complementarities (AI tools + developer skill + organizational processes) favor larger incumbents and well‑funded firms, possibly increasing concentration in tech sectors.
Theoretical argument about complementarities and returns to scale; illustrative examples; lacks firm‑level empirical testing.
medium negative How AI Will Transform the Daily Life of a Techie within 5 Ye... market concentration measures (market share, concentration ratios) and different...
In the near term, displacement risks concentrate on junior or highly routine roles; mobility and retraining will determine realized unemployment impacts.
Task automatability mapping indicating routine tasks more automatable and qualitative reasoning on labor mobility; no empirical unemployment projections.
medium negative How AI Will Transform the Daily Life of a Techie within 5 Ye... employment outcomes for junior/highly routine roles (displacement rates, unemplo...
Adoption will be heterogeneous: larger firms and well‑resourced teams will capture more gains earlier, producing competitive advantages.
Theoretical argument about adoption complementarities (AI tools + developer skill + organizational processes) and illustrative examples; no cross‑firm empirical analysis.
medium negative How AI Will Transform the Daily Life of a Techie within 5 Ye... heterogeneity in productivity gains and market advantage by firm size/resource l...
Differential adoption across firms (due to modular, scalable designs and data advantages) may create winner‑takes‑most effects and increase market concentration, benefiting early adopters with rich data/integration capabilities.
Market-structure claim supported by economic reasoning about scale and data advantages; no cross-firm empirical adoption study or market concentration time‑series is provided.
medium negative Next-Generation Financial Analytics Frameworks for AI-Enable... market concentration metrics (e.g., HHI), firm market shares, adoption timing di...
Initial investment, integration, and ongoing maintenance/compliance costs can be substantial and affect short-term ROI.
Interviewed administrators and implementation reports citing upfront and recurring costs (integration, model maintenance, compliance); quantitative budget figures not standardized across sites in the paper.
medium negative The Role of Artificial Intelligence in Healthcare Complaint ... implementation and maintenance costs; short-term return on investment (ROI)
Risk of deskilling or reduced empathy if human roles are overly automated.
Thematic analysis of staff interviews and surveys reporting concerns about loss of practice, reduced patient contact, and potential diminishment of empathetic skills; no longitudinal measures of skill loss presented.
medium negative The Role of Artificial Intelligence in Healthcare Complaint ... staff-reported empathy/skill levels and qualitative indicators of deskilling
Technical and organizational integration with legacy hospital IT systems is nontrivial.
Implementation reports and interviews describing integration work, time, and resource needs; descriptive accounts of technical and organizational barriers (no universal timelines/costs reported).
medium negative The Role of Artificial Intelligence in Healthcare Complaint ... integration difficulty/time/cost (implementation burden)
Algorithmic bias in NLP models can misclassify complaints from underrepresented groups.
Observations from system classification error analyses (disparities reported by demographic group) and corroborating qualitative concerns from staff and administrators; specific subgroup sample sizes and effect magnitudes not provided.
medium negative The Role of Artificial Intelligence in Healthcare Complaint ... differential misclassification rates by demographic group (bias in NLP classific...
Data privacy and security risks arise from centralizing complaint text and metadata.
Stakeholder interviews, thematic coding of concerns, and risk assessment commentary based on centralized logs and metadata aggregation; no measured breach incidents reported here.
medium negative The Role of Artificial Intelligence in Healthcare Complaint ... privacy/security risk (qualitative risk indicators; potential exposure of compla...
Organizations will incur additional governance and procurement costs (diversity audits, recalibration of reward models, multi-model infrastructures) to mitigate homogenization, shifting some economic benefits of AI toward governance spending.
Cost implication argued from the need for auditing and multi-model procurement described in recommendations; not supported by quantified cost analyses in the paper.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... governance and procurement costs associated with LLM deployment
Inter-model convergence undermines product differentiation across AI providers and could accelerate commoditization of base LLM outputs.
Market-structure inference built on empirical finding of high cross-model output similarity across 70+ models and theoretical discussion of vendor differentiation; no market-level price or adoption time-series analyzed in the paper.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... vendor product differentiation / commoditization of base outputs
Homogenized AI outputs reduce the value of AI as a source of varied cognitive complements to human labor, potentially lowering productivity gains from human–AI collaboration in tasks requiring creativity and exploration.
Economic argument drawing on measured decreases in model output diversity and theoretical literature on complementarities between diverse AI outputs and human creativity; no direct measured productivity changes reported in field settings within the paper.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... productivity gains from human–AI collaboration (theoretical implication inferred...
Reward-model and evaluation miscalibration can cause organizations to prefer models that maximize apparent evaluation scores at the expense of useful stylistic or cognitive diversity.
Comparative analyses between automated evaluation/reward-model rankings and human preference/diversity assessments reported in the paper; examples where high-scoring models produced more consensus-style outputs.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... model selection bias driven by automated evaluation scores; reduction in diversi...
Homogenized outputs increase organizational susceptibility to groupthink and correlated errors across teams using different models.
Argument based on observed inter-model convergence (high similarity across models) implying correlated outputs and thus correlated mistakes across teams; no randomized organizational field experiment reported, this is an inferred risk from the empirical convergence data.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... risk of correlated errors / susceptibility to groupthink (conceptual risk inferr...
Homogenization of LLM outputs erodes creative diversity in AI-assisted work and reduces the variety of solutions produced.
Inference drawn from measured decreases in response diversity (entropy/distinct-n) and the observed inter-model convergence across real-world queries; argument linking lower measured diversity to fewer distinct solution proposals in AI-augmented workflows.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... creative diversity / number of distinct solution variants produced
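The diversity measures the evidence cites (distinct-n, entropy) are straightforward to compute. A minimal sketch, with example responses invented purely for illustration:

```python
import math
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of n-grams that are unique across a set of responses."""
    grams = []
    for t in texts:
        toks = t.lower().split()
        grams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0

def token_entropy(texts):
    """Shannon entropy (bits) of the pooled token distribution."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical model outputs: a homogenized set vs a more varied set.
homogenized = ["the answer is to improve communication"] * 4
diverse = ["improve communication channels",
           "rotate team roles quarterly",
           "pilot an anonymous feedback survey",
           "pair juniors with mentors"]

print(distinct_n(homogenized), distinct_n(diverse))
print(token_entropy(homogenized), token_entropy(diverse))
```

Lower distinct-n and entropy for the repeated responses capture, in miniature, the reduced variety of solution proposals the claim describes.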
Current reward models and automated evaluation metrics are biased toward consensus/high-probability responses, preferring consensus-style outputs even when stylistically diverse alternatives are judged equally high-quality by humans.
Reported human preference assessments and comparisons between human judgments and automated/reward-model scores showing cases where reward models favor higher-probability/consensus outputs despite no human-quality advantage; analyses described comparing reward-model scores to human judgments on stylistically diverse outputs.
medium negative The Artificial Hivemind: Rethinking Work Design and Leadersh... alignment between reward-model/automated evaluation scores and human quality jud...
Feedback effects from physical capital and labor onto AI capital are weak: both estimated coefficients (physical capital → AI and labor → AI) are small in magnitude and weakly negative.
Estimated interaction coefficients from the 2016–2023 calibration showing small-magnitude, negative feedback terms from physical capital and labor onto AI.
medium negative Governance of Technological Transition: A Predator-Prey Anal... AI capital growth/stock (feedback strength)
Introducing 'agent capital' (AI capital modeled as reducing coordination frictions) compresses coordination costs inside firms, an effect the paper terms 'coordination compression'.
Definition and central assumption of the paper's formal task-based model; analytical setup assumes agent capital parametrically reduces coordination frictions.
medium negative AI as Coordination-Compressing Capital: Task Reallocation, O... coordination costs (firm-internal coordination friction parameter)
Extremely high reported model performance (R² = 0.999) raises concerns about overfitting, data leakage, or measurement artifacts and the need for transparency, out-of-sample validation, and field trials.
Paper (or the paper's discussion/implications as summarized) notes model-risk and external validity concerns and recommends replication and validation before policy adoption.
medium negative AI in food inequality: Leveraging artificial intelligence to... model robustness / external validity concerns (qualitative)
Uneven inclusion in digital/AI deployments risks exacerbating digital divides and creating distributional harms.
Descriptive and case-based studies report differential access and uptake among demographic groups; limited causal quantification and varying measurement approaches across studies.
medium negative Digital Transformation and AI Adoption in Government: Evalua... service coverage across demographic groups, measures of digital divide (access, ...
Limited auditability and explainability of AI systems increase trust and legitimacy risks.
Technical governance literature and case reports show challenges in model explainability and external audit; evidence is technical and illustrative rather than based on large-sample causal studies.
medium negative Digital Transformation and AI Adoption in Government: Evalua... auditability metrics, transparency indicators, public trust measures
Inadequate regulatory frameworks raise privacy, accountability, and fairness concerns for AI in government.
Governance reviews and risk assessments documented in the literature highlight regulatory gaps and associated incidents/risks; empirical incident counts are not comprehensively tabulated in the review.
medium negative Digital Transformation and AI Adoption in Government: Evalua... privacy breaches, accountability/audit findings, measures of fairness/bias incid...
Procurement, budgeting rules, and siloed incentives discourage cross-cutting transformation and modular iterative deployments.
Policy and institutional analyses in the reviewed literature point to rigid procurement cycles, capital budgeting practices, and siloed funding as obstacles; examples and case narratives are provided but systematic quantification is limited.
medium negative Digital Transformation and AI Adoption in Government: Evalua... frequency of modular/iterative procurements, number of cross-cutting projects fu...
Organizational resistance and fragmented coordination block integrated rollouts of cross-cutting digital reforms.
Qualitative case studies and governance analyses repeatedly identify intra-governmental silos, conflicting incentives, and change-resistance as implementation barriers; evidence is primarily descriptive.
medium negative Digital Transformation and AI Adoption in Government: Evalua... degree of cross-agency integration, completion rates of integrated projects, imp...
Skills shortages (technical, managerial, data literacy) impede adoption and maintenance of digital and AI systems.
Multiple surveys, policy briefs and qualitative studies cited in the review report workforce capacity gaps; often based on targeted assessments or organizational audits rather than representative sampling.
medium negative Digital Transformation and AI Adoption in Government: Evalua... adoption rates, system maintenance capacity, time-to-value for deployments