The Commonplace

Evidence (5157 claims)

Adoption
7395 claims
Productivity
6507 claims
Governance
5877 claims
Human-AI Collaboration
5157 claims
Innovation
3492 claims
Org Design
3470 claims
Labor Markets
3224 claims
Skills & Training
2608 claims
Inequality
1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
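A minimal sketch of the arithmetic implied by the matrix above: computing per-row direction shares. The counts are copied from two rows of the table; the helper function and row selection are illustrative, and the Total column is used as the denominator because for some rows it exceeds the sum of the four listed directions.

```python
# Counts copied from the Evidence Matrix above; helper name is illustrative.
rows = {
    #                    (positive, negative, mixed, null, total)
    "Firm Productivity": (385, 46, 85, 17, 539),
    "Job Displacement":  (11, 71, 16, 1, 99),
}

def direction_shares(row):
    pos, neg, mixed, null, total = row
    return {
        "positive": round(pos / total, 2),
        "negative": round(neg / total, 2),
        "mixed": round(mixed / total, 2),
        "null": round(null / total, 2),
    }

for outcome, row in rows.items():
    print(outcome, direction_shares(row))
# Firm Productivity: positive share ~0.71; Job Displacement: negative share ~0.72
```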
Active filter: Human-AI Collaboration
Most programs were delivered in academic settings: 56% of evaluated programs reported an academic setting.
Setting information extracted from the 27 included programs, with 56% reported as delivered in academic settings.
high mixed Assessing the effectiveness of artificial intelligence educa... program delivery setting (academic vs non-academic)
A plurality of programs were short in duration: 44% of programs were categorized as short courses.
Extraction of program length from the 27 included studies; 44% were classified as short courses per the review's categorization.
high mixed Assessing the effectiveness of artificial intelligence educa... program duration (short vs longer formats)
Most programs were introductory in content: 67% of included programs taught introductory AI concepts rather than advanced/technical AI skills.
Program content extraction across the 27 included studies yielded that 67% were classified as teaching introductory AI.
high mixed Assessing the effectiveness of artificial intelligence educa... program content focus (introductory vs advanced/technical AI skills)
The methodological landscape of the evidence base is heterogeneous, consisting of cross-sectional surveys, case studies, quasi-experimental designs, and a limited number of longitudinal analyses.
Study design information was extracted from the 145 included studies, revealing a mix of designs and relatively few longitudinal or experimental studies.
high mixed Digital transformation and its relationship with work produc... study design types (cross-sectional, case study, quasi-experimental, longitudina...
Human factors (training, trust calibration, workflows) determine whether clinicians accept, override, or ignore GenAI suggestions.
Qualitative and quantitative human-AI interaction studies and pilot deployments discussed in the paper; specific sample sizes and effect sizes are not reported in the paper.
high mixed GenAI and clinical decision making in general practice override/acceptance rates; clinician-reported trust and cognitive load; adherenc...
Safety and net benefit of GenAI CDS hinge on deployment details: user interface, real-time feedback, uncertainty quantification, calibration, and how recommendations are presented (strong vs. suggestive).
Human factors and implementation studies referenced; early A/B tests and human-AI interaction research suggest interface and presentation affect acceptance and error rates; no large-scale standardized implementation trial data cited.
high mixed GenAI and clinical decision making in general practice acceptance/override rates; error rates; calibration metrics; clinician trust
Reimbursement models (fee-for-service vs. capitation) will influence whether cost savings from GenAI are realized or offset by increased service volume.
Economic incentive framework and prior health-economics literature cited; the paper does not provide direct empirical tests but references plausible incentive channels.
high mixed GenAI and clinical decision making in general practice total spending; per-patient cost; service volume under different payment models
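A toy arithmetic sketch of the offset mechanism in the claim above. The percentages are illustrative assumptions, not estimates from the paper, and the capitation case is stylized as "volume held at baseline."

```python
# Illustrative assumptions only; not estimates from the paper.
baseline_visits = 100
baseline_cost_per_visit = 100.0            # baseline spending: 10,000

cost_reduction = 0.15                      # assumed per-visit saving from GenAI
volume_increase = 0.20                     # assumed extra volume under fee-for-service

new_cost_per_visit = baseline_cost_per_visit * (1 - cost_reduction)

# Fee-for-service: freed capacity becomes additional billed visits,
# so total spending can rise despite the per-visit saving.
ffs_spending = baseline_visits * (1 + volume_increase) * new_cost_per_visit   # 10,200

# Capitation (stylized): visit volume held at baseline, so the saving is retained.
capitated_spending = baseline_visits * new_cost_per_visit                     # 8,500

print(ffs_spending, capitated_spending)
```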
RL and adaptive methods are good for real-time adaptation but can be myopic, require large amounts of interaction data, and struggle to incorporate long-term preference structure and ethical constraints.
Surveyed properties of reinforcement learning and adaptive methods in HRI/RS literature; no new empirical evaluation in this paper.
high mixed Reimagining Social Robots as Recommender Systems: Foundation... real-time adaptation effectiveness, sample efficiency (amount of interaction dat...
The community knowledge functions both as practical how-to guidance and as collective experimentation with platform rules and revenue mechanisms.
Observed dual nature in the 377-video corpus: instructional workflows alongside demonstrations/testing of platform-tailored monetization tactics and workarounds.
high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... co-occurrence of instructional content and platform-experimentation practices
Typical practices emphasized by creators include rapid mass production of content, productizing prompt engineering, repurposing existing material via synthesis/localization, and packaging AI outputs as sellable creative services or assets.
Recurring practices surfaced through qualitative coding of workflows, tools, and pipelines described in the 377 videos.
high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... presence and frequency of recommended production and productization practices
Across the 377 videos, creators converge on a set of repeatable use cases and platform‑tailored monetization tactics.
Thematic coding of 377 videos produced a catalog of recurring use cases and tactics; the paper reports convergence across that sample.
high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... frequency and recurrence of specific use cases and monetization tactics in the s...
YouTube creators have collectively constructed and circulated a practical knowledge repository about how to monetize GenAI-driven creative work.
Systematic qualitative content analysis (thematic coding) of 377 publicly available YouTube videos in which creators promote GenAI workflows and monetization strategies.
high mixed Monetizing Generative AI: YouTubers' Collective Knowledge on... presence and characteristics of a community knowledge repository (practical guid...
Choice of scaffold materially affects outcomes: an open-source scaffold outperformed vendor-provided scaffolds by up to approximately 5 percentage points.
Comparative experiments across three scaffolding approaches (vendor scaffolds and at least one open-source scaffold) showing up to ~5 percentage point differences in measured outcomes.
high mixed Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... performance_difference_across_scaffolds (detection/exploitation_rates_difference...
Adoption of NFD approaches in regulated domains will depend on standards for validation, auditability, and update procedures.
Implications and governance discussion emphasizing regulatory constraints (finance, healthcare) and the need for validation/audit standards; logical/normative claim rather than empirical finding.
high mixed Nurture-First Agent Development: Building Domain-Expert AI A... adoption rate in regulated domains conditional on available validation/audit sta...
Limitations include generalizability beyond Chatbot Arena data, calibration of priors on novel tasks, audit costs/latency, user comprehension/cognitive load, and strategic manipulation.
Authors' stated limitations and open questions; these are candid acknowledgements rather than empirical findings.
high mixed Task-Aware Delegation Cues for LLM Agents generalizability, calibration, audit cost/latency, user comprehension, susceptib...
RAD requires estimating cost distributions and choosing a reference policy and quantile-weighting function; these choices determine the method's conservatism and sample efficiency.
Methodological and practical considerations discussed in the paper; noted dependency on estimation and design choices (no quantitative sample-efficiency results provided in the summary).
high mixed Safe RLHF Beyond Expectation: Stochastic Dominance for Unive... method conservatism (relative safety level) and sample efficiency (amount of dat...
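To make the conservatism point concrete, a generic sketch of a quantile-weighted cost criterion follows. The sampled distribution and the two weighting functions are illustrative assumptions and do not reproduce the paper's RAD formulation; the point is only that shifting weight toward upper quantiles makes the criterion more conservative.

```python
import numpy as np

rng = np.random.default_rng(0)
costs = rng.exponential(scale=1.0, size=10_000)      # stand-in cost distribution

taus = np.linspace(0.01, 0.99, 99)                   # quantile levels
quantiles = np.quantile(costs, taus)

# Risk-neutral weighting: all quantiles weighted equally (roughly the mean cost).
uniform_w = np.full_like(taus, 1.0 / len(taus))

# Conservative weighting: mass concentrated on the upper tail (CVaR-style).
tail_w = np.where(taus >= 0.90, 1.0, 0.0)
tail_w /= tail_w.sum()

print("risk-neutral cost:", quantiles @ uniform_w)
print("conservative cost:", quantiles @ tail_w)
```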
Explanations change workflows, shift responsibilities between humans and machines, and can reshape power dynamics—creating both opportunities (better oversight) and risks (over-reliance, gaming).
Qualitative and conceptual studies synthesized in the review, including socio-technical analyses and case studies reporting observed or theorized workflow and responsibility shifts; no meta-analytic causal estimate.
high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... workflows, responsibility allocation, power dynamics, oversight quality
Explanations increase user trust principally when they are understandable, actionable, and aligned with users’ domain knowledge; opaque or overly technical explanations can fail to build trust or even decrease it.
Thematic synthesis of empirical and conceptual studies in the reviewed literature reporting conditional effects of explanation form and comprehensibility on trust; review notes heterogeneity in study designs and contexts.
high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... user trust / changes in trust toward AI outputs
Explainability improves perceived legitimacy, user trust, and organizational accountability only when technical transparency is paired with human-centered explanation design and governance mechanisms.
Synthesis of studies from the reviewed literature showing conditional effects of algorithmic interpretability combined with explanation design and governance; derived via thematic coding across technical and social-science sources (no new primary experimental data reported).
high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... perceived legitimacy, user trust, organizational accountability
Explainability is a necessary but not sufficient condition for trustworthy AI in high-stakes domains.
Systematic literature review (thematic coding and synthesis) of interdisciplinary scholarship (peer-reviewed research, technical reports, policy documents); the paper synthesizes conceptual and empirical studies rather than presenting new primary data. Emphasis on high-stakes domains (healthcare, finance, public sector).
high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... overall trustworthiness of AI systems in high-stakes domains (multidimensional c...
Some patients value human contact for sensitive cases; automated interactions can feel impersonal.
Semi-structured interviews with patients/staff and open-ended survey responses documenting preferences for human interaction in sensitive/complex complaints.
high mixed The Role of Artificial Intelligence in Healthcare Complaint ... patient-reported preference for human contact and perceived interpersonal qualit...
Data‑driven policies can either amplify or mitigate inequalities depending on data representativeness, model design, and deployment governance.
Multiple empirical examples and theoretical analyses in the review highlighting cases of both harm (bias amplification) and mitigation, identified across the 103 items.
high mixed Models, applications, and limitations of the responsible ado... distributional equity outcomes (inequality amplification or mitigation)
Citizen acceptance, transparency, and perceived fairness strongly shape adoption trajectories and the political feasibility of AI tools in government.
Repeated empirical findings in the reviewed literature linking public trust, transparency measures, and fairness perceptions to successful or failed deployments (drawn from multiple case studies in the 103 items).
high mixed Models, applications, and limitations of the responsible ado... adoption trajectory/political feasibility of government AI tools (measured via d...
Adoption of AI and data-driven governance is highly uneven across jurisdictions and sectors, driven by institutional capacity, governance frameworks, and public trust.
Cross‑regional and cross‑sector comparisons in the review corpus (103 items) showing varying maturity levels and repeated identification of institutional capacity, governance arrangements, and trust factors as determinants.
high mixed Models, applications, and limitations of the responsible ado... adoption level/maturity of AI-driven governance systems
Productivity gains from generative AI depend on task mix, integration design, and the availability of complementary human skills.
Theoretical evaluation and synthesis of heterogeneous empirical findings; authors highlight variation across firms, sectors, and tasks.
high mixed The Use of ChatGPT in Business Productivity and Workflow Opt... productivity change conditional on task mix/integration/human skills (productivi...
Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, heterogeneous study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.
Meta-observation from the review: documented methodological limitations across the literature (variation in models, tasks, metrics; prevalence of short-term studies).
high mixed ChatGPT as a Tool for Programming Assistance and Code Develo... generalizability and comparability of empirical findings (study heterogeneity)
Methodological caveats across the literature (heterogeneity of tasks/measures, publication bias, short-term studies) limit the generalizability of current findings.
Meta-level critique within the synthesis noting study heterogeneity, likely publication/short-term biases, and variable domain-specific performance dependent on user expertise and workflows.
high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... generalizability and external validity of LLM-assisted creativity findings
Standard productivity metrics are likely to undercount the value generated by AI-augmented ideation; quality-adjusted measures of creative output are required.
Measurement critique based on the mismatch between existing productivity statistics and the kinds of upstream idea-generation gains observed in empirical studies; supported by the review's methodological discussion.
high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... measured productivity vs. true quality-adjusted creative output
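A toy illustration of the measurement gap described above, with invented quality scores (not data from the review): a count-based metric registers no gain, while a quality-weighted measure does.

```python
# Scores are invented for illustration, not data from the review.
baseline_quality = [0.5, 0.5, 0.5]     # three ideas without AI assistance
assisted_quality = [0.8, 0.8, 0.8]     # same number of ideas, higher rated quality

count_based_gain = len(assisted_quality) / len(baseline_quality)        # 1.0 -> no measured gain
quality_adjusted_gain = sum(assisted_quality) / sum(baseline_quality)   # 1.6 -> gain appears

print(count_based_gain, quality_adjusted_gain)
```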
Realized value from AI methods (ML, predictive analytics, anomaly detection, XAI) is conditional: these technical methods deliver capabilities only when combined with strong data governance, standardized processes, and change management.
Thematic synthesis across the systematic review (2020–2025) showing repeated case-study and practitioner-report evidence that technical gains failed to scale without governance, process standardization, and organizational change efforts.
high mixed Integrating Artificial Intelligence and Enterprise Resource ... magnitude and durability of ERP-AI benefits (e.g., sustained accuracy gains, ado...
The hybrid estimator (GA+SQP) is computationally more intensive than single-stage MLE/local optimization, implying a trade-off between estimation reliability and runtime cost.
Reported runtime and computational cost comparisons in estimation experiments: the paper notes longer runtimes for GA+SQP versus standard optimizers while documenting improvements in objective values and convergence behavior.
high mixed k-QREM: Integrating Hierarchical Structures to Optimize Boun... computation time / runtime, convergence reliability
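For readers unfamiliar with the two-stage pattern behind that trade-off, a minimal generic sketch follows, using SciPy's differential evolution as a stand-in for the GA stage and SLSQP for the local refinement. The placeholder objective and bounds are assumptions; this is not the paper's k-QREM estimator.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def neg_log_likelihood(theta):
    # Placeholder objective; a real application would substitute the model's
    # negative log-likelihood here.
    return float(np.sum((theta - np.array([1.0, -0.5])) ** 2))

bounds = [(-5.0, 5.0), (-5.0, 5.0)]

# Stage 1: global evolutionary search (many objective evaluations -> the extra
# runtime cost noted above).
global_fit = differential_evolution(neg_log_likelihood, bounds, seed=0)

# Stage 2: gradient-based local refinement started from the global solution
# (fast, but only reliable when started in a good basin).
local_fit = minimize(neg_log_likelihood, global_fit.x, method="SLSQP", bounds=bounds)

print(local_fit.x, local_fit.fun)
```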
Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested.
Authors’ reported sample (164 law students) and explicit caution about generalizability in the study summary; the intervention and outcome are specific to one exam and one ~10-minute training.
high mixed Training for Technology: Adoption and Productive Use of Gene... Generalizability/applicability to other populations and tasks
Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide.
Authors report wide confidence intervals for mechanism decomposition (principal stratification) results based on the randomized sample of 164 students.
high mixed Training for Technology: Adoption and Productive Use of Gene... Precision of mechanism estimates (confidence interval width for adoption vs prod...
No existing AI system replicates this: conversational recommenders treat recommendation as a terminal act, while general-purpose LLMs hallucinate product claims and default to generic promotional templates that fail to engage or persuade.
Author assertion/diagnosis comparing existing conversational recommenders and general-purpose LLMs; no empirical comparisons or quantified evaluation provided in the excerpt.
high negative VerbalValue: A Socially Intelligent Virtual Host for Sales-D... quality of recommendations / engagement and persuasion
Accuracy is not a sufficient proxy for governance in regulated AI systems.
Empirical results from synthetic banking experiments showing divergence between task accuracy and governance-quality metrics across architectures, as summarized in the abstract.
high negative Mechanical Enforcement for LLM Governance: Evidence of Govern... sufficiency of task accuracy as a proxy for governance/auditability
Under text-only governance, 27% of deferrals carry no decision-relevant information.
Experimental evaluation in a synthetic banking domain comparing text-only governance to mechanical enforcement; reported statistic in paper abstract. Specific sample size not stated in abstract.
high negative Mechanical Enforcement for LLM Governance: Evidence of Govern... fraction of deferrals that contain no decision-relevant information
In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision.
Behavioral measures derived from chat logs in the randomized experiment comparing worker actions post-escalation across escalation types; reported differences in message counts, share of rounds, and proxies for proactivity.
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... worker engagement measures (message count, share of chat rounds, proactivity ind...
Human intervention is less effective in algorithm-triggered emotional escalations (where customers express frustration or dissatisfaction).
Experimental subgroup analysis comparing intervention outcomes for algorithm-triggered emotional escalations versus technical escalations; emotional escalations showed worse post-intervention outcomes.
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... service quality after emotional escalations
AI deployment substantially lowers ratings for AI-eligible chats.
Randomized field experiment measuring customer ratings for AI-eligible chats; treated condition (AI + human oversight) produced substantially lower ratings relative to control (humans only).
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... customer ratings for AI-eligible chats
AI deployment reduces average chat duration.
Randomized field experiment on Alibaba's Taobao platform: workers in treatment supervised an agentic AI resolving AI-eligible chats while handling AI-ineligible chats; control workers resolved all chats without AI. Effect observed on average chat duration in experiment data.
Rather than restoring stability, this cycle intensifies anxiety, undermines mastery, and erodes professional confidence.
Theoretical claim about psychological outcomes from the conceptual reskilling loop; paper provides argumentation but no empirical measurements.
high negative AI-driven skill volatility and the emergence of re-skilling ... anxiety, sense of mastery, professional confidence
Based on Job Demands–Resources (JD-R) theory and Conservation of Resources (COR) theory, the paper conceptualizes an AI-induced reskilling loop in which ongoing technological change leads to skill erosion, continuous reskilling demands, cognitive and emotional depletion, and reinforced learning as a defensive response to perceived obsolescence.
Theoretical model/loop derived from applying JD-R and COR frameworks; no empirical test or sample reported in the paper.
high negative AI-driven skill volatility and the emergence of re-skilling ... cognitive/emotional depletion and defensive learning responses
The paper introduces the concept of 'reskilling fatigue' to explain the human consequences of persistent skill volatility among Established Knowledge Professionals (EKPs).
Conceptual/theoretical contribution presented by the authors; definition and argumentation rather than empirical validation.
high negative AI-driven skill volatility and the emergence of re-skilling ... experience of reskilling fatigue among EKPs
Continuous reskilling is widely promoted as a solution to AI-driven disruption, but little attention has been paid to its cumulative psychological costs.
Argument from literature review/observation in the paper; no empirical measurement or sample reported in the paper.
high negative AI-driven skill volatility and the emergence of re-skilling ... psychological costs of continuous reskilling (e.g., fatigue, stress)
Unless labour law evolves to address digitally mediated control and platform-based asymmetry, the gig economy risks normalising exploitative labour conditions under the guise of innovation and flexibility.
Predictive/theoretical claim based on the paper's synthesis of platform practices, legal gaps, and normative concerns; argued through comparative analysis and conceptual reasoning rather than quantitative forecasting.
high negative Corporate Accountability in the Gig Economy: Re-examining La... future trajectory of labour conditions and normalization of exploitative practic...
The paper uses the concept of 'digital slavery' as a normative framework to describe labour conditions shaped by coercive algorithmic management, absence of bargaining power, and structural precarity.
Conceptual and normative framing within the paper, using the 'digital slavery' metaphor to interpret observed platform labour practices and their implications; theoretical argumentation rather than empirical measurement.
high negative Corporate Accountability in the Gig Economy: Re-examining La... characterisation of labour conditions under algorithmic management
While several jurisdictions (UK, US, EU, India) have attempted to regulate gig work, most regulatory responses remain incomplete and fail to fully address platform accountability.
Comparative policy/regulatory analysis of the United Kingdom, United States, European Union and India assessing statutes, litigation and policy measures; qualitative assessment rather than statistical evaluation (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... completeness/effectiveness of regulatory responses to platform accountability
Platform companies rely on contractual misclassification, corporate structuring, and the legal fiction of neutrality to separate control from liability.
Legal and corporate-structure analysis across jurisdictions, examining contracts, corporate forms and legal doctrines; based on comparative statutory and case-law review (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... allocation of legal liability and regulatory accountability
The platform economy produces a deeply unequal labour structure marked by algorithmic control, economic dependency, surveillance, and lack of social protection.
Synthesis and critical analysis combining literature, policy review and comparative jurisdictional study to argue systemic effects on labour structure; primarily qualitative evidence and theoretical framing (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... distributional labour outcomes and social protection coverage
Gig workers, though formally classified as independent contractors, are functionally subjected to pricing control, performance monitoring, automated penalties, and deactivation mechanisms that closely resemble managerial authority.
Descriptive/qualitative evidence in the paper: examples and analysis of platform design and management practices (algorithmic pricing, monitoring, penalties, deactivation); based on platform policy documents, case examples and comparative review (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... degree of algorithmic/managerial control over workers
Digital labour platforms exercise employer-like control while avoiding employer-like legal responsibilities.
Argument and comparative legal analysis across jurisdictions (United Kingdom, United States, European Union, India) demonstrating platform practices and legal/regulatory responses; based on documentary/legal review and critical analysis (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... legal employment classification and control/responsibility