The Commonplace

Evidence (6869 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
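The matrix is easiest to compare across outcomes as shares of each row's claims by direction. Below is a minimal Python sketch using a few rows copied from the table. The four direction columns do not always add up to Total (for Other, 758 + 199 + 100 + 900 = 1957 against a reported 2007), so the sketch divides by the reported Total rather than by the column sum. The names `MATRIX`, `direction_shares`, and `unaccounted` are illustrative, not part of the site.

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
MATRIX = {
    "Other": (758, 199, 100, 900, 2007),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Skill Obsolescence": (5, 46, 6, 1, 58),
}


def direction_shares(row):
    """Share of an outcome's claims in each direction, relative to the reported Total."""
    positive, negative, mixed, null, total = row
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


def unaccounted(row):
    """Claims counted in Total but not in any of the four direction columns."""
    positive, negative, mixed, null, total = row
    return total - (positive + negative + mixed + null)


if __name__ == "__main__":
    # Rank the sampled outcomes by their share of negative findings.
    ranked = sorted(MATRIX, key=lambda k: direction_shares(MATRIX[k])["negative"], reverse=True)
    for name in ranked:
        share = direction_shares(MATRIX[name])["negative"]
        print(f"{name}: {share:.1%} negative, {unaccounted(MATRIX[name])} unaccounted")
```

Dividing by the reported Total keeps the shares consistent with the headline counts; dividing by the column sum instead would overstate every direction's share for rows such as Other.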
Active filter: Governance
Real-world deployment will require representative data coverage and online adaptation despite the method’s robustness mechanisms.
Authors' discussion/limitations section: theoretical requirements for persistently exciting/representative trajectories for DeePC and recommendation for online adaptation and continual data collection for deployment.
high negative Data-driven generalized perimeter control: Zürich case study data representativeness and need for online adaptation (deployment readiness/ris...
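The persistency-of-excitation requirement is usually stated as a rank condition, the one behind Willems' fundamental lemma on which DeePC builds: a scalar input sequence u is persistently exciting of order L if its L-row Hankel matrix has full row rank. The sketch below checks that condition exactly with rational arithmetic. It is an illustrative check only, not the Zürich study's pipeline; the function names are hypothetical, and for DeePC the required order typically exceeds the prediction horizon alone (it also covers the initial-condition window and the system order), which the sketch leaves to the caller.

```python
from fractions import Fraction


def hankel(u, depth):
    """Return the depth x (len(u) - depth + 1) Hankel matrix of a scalar signal u."""
    cols = len(u) - depth + 1
    if cols < 1:
        raise ValueError("signal too short for the requested depth")
    return [[u[i + j] for j in range(cols)] for i in range(depth)]


def rank(matrix):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_persistently_exciting(u, order):
    """True if the order-row Hankel matrix of u has full row rank."""
    return rank(hankel(u, order)) == order


if __name__ == "__main__":
    print(is_persistently_exciting([1, 0, 0, 1, 0, 0, 0, 1, 1, 0], 3))
    print(is_persistently_exciting([1] * 20, 3))  # a constant input excites only one direction
```

Exact fractions avoid the tolerance choice a floating-point rank would need; with measured plant data, a numerical rank (for example from singular values) would be the practical substitute.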
Agent performance degrades markedly as environment complexity, stochasticity, and non-stationarity increase, revealing core limitations of current LLM-based agents for long-horizon, multi-factor decision problems.
Experimental results across progressively harder RetailBench environments showing performance falloff for multiple LLMs under increased task complexity and non-stationarity.
high negative RetailBench: Evaluating Long-Horizon Autonomous Decision-Mak... overall agent performance across increasing environment complexity (e.g., fulfil...
Behavioral memorization probe (TS‑Guessing) signaled memorization above chance for 72.5% of prompts across all models and items.
Experiment 3 — TS‑Guessing behavioral probe applied exhaustively to all 513 MMLU questions × six models (total prompts = 513×6); statistical thresholds used to classify above-chance memorization signals, yielding 72.5% of prompts flagged.
high negative Are Large Language Models Truly Smarter Than Humans? fraction of prompt-model pairs with statistically significant memorization signa...
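For scale, 513 questions × 6 models is 3,078 prompts, so 72.5% corresponds to roughly 2,230 flagged prompts. One common way to call a probe result "above chance" is a one-sided binomial test: given n independent attempts with chance success probability p0, compute the probability of at least k successes. The sketch below is an assumption about how such a threshold could be set, not the paper's stated procedure; the repeat count, p0, and alpha are hypothetical, and the right p0 depends on how the probe is scored.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the one-sided p-value for k successes."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def above_chance(successes, trials, p0, alpha=0.05):
    """Hypothetical rule: flag a prompt if its success count is unlikely under guessing."""
    return binomial_upper_tail(successes, trials, p0) < alpha


if __name__ == "__main__":
    # Hypothetical: 10 repeated probes of one prompt, chance success rate 0.25.
    print(binomial_upper_tail(6, 10, 0.25))
    print(above_chance(6, 10, 0.25))
```

Running one such test per prompt across 3,078 prompts also raises a multiple-comparisons question; the excerpt does not say whether the thresholds were corrected for it.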
Paraphrase / indirect-reference diagnostic: on a 100-question subset, average accuracy dropped by 7.0 percentage points under indirect referencing.
Experiment 2 — paraphrase/indirect-reference diagnostic applied to a 100-question subset of MMLU; measured delta between original and paraphrased question accuracy averaged to 7.0 percentage points.
high negative Are Large Language Models Truly Smarter Than Humans? mean accuracy drop (percentage points) under paraphrase/indirect prompts
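The 7.0-point figure is a difference of accuracies on the same items under two wordings. A minimal sketch, assuming per-item correctness is recorded for each model under both wordings and that the average is taken across models (the excerpt does not say whether averaging is over models, items, or both). The function names and the toy results are hypothetical.

```python
def accuracy(correct_flags):
    """Fraction of items answered correctly."""
    return sum(correct_flags) / len(correct_flags)


def paraphrase_drop_pp(original, paraphrased):
    """Accuracy drop in percentage points from original to paraphrased wording.

    Both lists hold per-item correctness (True/False) for the same items, in the same order.
    """
    if len(original) != len(paraphrased):
        raise ValueError("original and paraphrased results must cover the same items")
    return 100 * (accuracy(original) - accuracy(paraphrased))


def mean_drop_pp(results_by_model):
    """Average per-model drops; results_by_model maps model -> (original, paraphrased)."""
    drops = [paraphrase_drop_pp(o, p) for o, p in results_by_model.values()]
    return sum(drops) / len(drops)


if __name__ == "__main__":
    toy = {
        "model_a": ([True, True, True, False], [True, True, False, False]),
        "model_b": ([True, True, False, False], [True, True, False, False]),
    }
    print(mean_drop_pp(toy))
```

A positive mean drop means accuracy falls when the question is reworded, which is the pattern expected if some original-wording answers were recalled rather than reasoned out.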
STEM items show higher lexical contamination (18.1%) relative to the overall rate.
Category-level results from Experiment 1 (lexical matching) on the MMLU dataset (513 questions), aggregated by subject domain to compute an 18.1% contamination rate for STEM categories.
high negative Are Large Language Models Truly Smarter Than Humans? category-level contamination prevalence (STEM)
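The STEM rate and the overall rate come from the same per-item flags, grouped differently. A sketch assuming each benchmark item carries a subject-domain label and a boolean flag from the lexical matcher; the function names and the category labels in the example are illustrative.

```python
from collections import defaultdict


def contamination_by_category(items):
    """Per-category fraction of items flagged as contaminated.

    items: iterable of (category, flagged) pairs, one per benchmark question.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for category, is_flagged in items:
        total[category] += 1
        flagged[category] += int(is_flagged)
    return {c: flagged[c] / total[c] for c in total}


def overall_rate(items):
    """Fraction of all items flagged, regardless of category."""
    items = list(items)
    return sum(int(f) for _, f in items) / len(items)


if __name__ == "__main__":
    sample = [("STEM", True), ("STEM", False), ("Humanities", False), ("Humanities", False)]
    print(contamination_by_category(sample))
    print(overall_rate(sample))
```

Because the overall rate is an item-weighted average of the category rates, a category above the overall figure (as STEM is here) implies at least one category below it.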
Overall lexical contamination: 13.8% of MMLU items show evidence of exposure in training data.
Experiment 1 — lexical contamination detection pipeline that searched public corpora from the models' training era, and the open web, for literal or near-literal occurrences of the 513 MMLU questions/answers; per-item contamination flags were aggregated to produce the 13.8% figure.
high negative Are Large Language Models Truly Smarter Than Humans? contamination prevalence (fraction of benchmark items with lexical matches)
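Literal or near-literal matching is often implemented as word n-gram overlap between a benchmark item and a reference corpus. The sketch below flags an item when at least a given fraction of its word n-grams occur in the corpus. The default n = 8 and the 0.5 threshold are assumptions for illustration, not the paper's settings, and a real audit would index a far larger corpus than an in-memory set.

```python
import re


def word_ngrams(text, n):
    """Set of lower-cased word n-grams of text, with punctuation stripped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(corpus_docs, n):
    """Set of all word n-grams appearing anywhere in the reference corpus."""
    index = set()
    for doc in corpus_docs:
        index |= word_ngrams(doc, n)
    return index


def is_contaminated(item_text, index, n=8, threshold=0.5):
    """Flag an item if at least `threshold` of its n-grams appear in the index.

    The index must have been built with the same n.
    """
    grams = word_ngrams(item_text, n)
    if not grams:
        return False  # item shorter than n words: no evidence either way
    overlap = len(grams & index) / len(grams)
    return overlap >= threshold


if __name__ == "__main__":
    corpus = ["Q: What is the capital of France? A: Paris is the capital of France."]
    idx = build_index(corpus, 4)
    print(is_contaminated("What is the capital of France?", idx, n=4))
```

A smaller n catches looser paraphrases but also flags common phrasing; a larger n is stricter and closer to "literal" reuse.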
Public leaderboards overstate modern LLM capabilities because substantial portions of benchmark QA items appear in (or are memorized from) training data, inflating measured accuracy.
Multi-method contamination audit across six frontier LLMs (GPT-4o, GPT-4o-mini, DeepSeek-R1, DeepSeek-V3, Llama-3.3-70B, Qwen3-235B) evaluated on the MMLU benchmark (513 questions, 57 subjects), using lexical matching, paraphrase sensitivity, and behavioral memorization probes that together show systematic leakage.
high negative Are Large Language Models Truly Smarter Than Humans? inflation of measured benchmark accuracy / overstatement of model capability
Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.
Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.
high negative DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... risks to transparency, accountability, privacy and potential for misuse
There are few randomized controlled trials or longitudinal evaluations, and few studies measure patient-relevant outcomes or economic impacts.
Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.
high negative Human-AI interaction and collaboration in radiology: from co... number of RCTs/longitudinal studies, frequency of patient outcome and economic o...
Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.
Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).
high negative Human-AI interaction and collaboration in radiology: from co... proportion of studies reporting standalone algorithm metrics versus those report...
Ethical and legal issues—patient privacy, algorithmic bias, intellectual property, and equitable access—pose risks to AI deployment in drug development.
Ethics and legal analyses, policy reports, and documented case examples collated in the review that identify these recurring concerns.
high negative From Algorithm to Medicine: AI in the Discovery and Developm... ethical/legal risk incidence; privacy breaches; bias outcomes; access inequities
Regulatory uncertainty about validation standards and liability for AI tools raises investment risk and may slow deployment.
Regulatory and policy reports included in the narrative review describing evolving standards and open questions about validation, explainability, and liability for ML-based tools.
high negative From Algorithm to Medicine: AI in the Discovery and Developm... regulatory clarity; investment risk and deployment timelines
Adoption of AI in drug R&D requires high upfront investment in data curation, compute infrastructure, and specialized talent.
Industry reports and economic analyses summarized in the review reporting capital and operational needs for building AI capabilities; qualitative synthesis rather than quantitative costing across firms.
high negative From Algorithm to Medicine: AI in the Discovery and Developm... fixed upfront costs (data curation, compute, hiring/training)
Limited transparency and interpretability of many AI algorithms (black-box models) complicate clinical and regulatory trust and adoption.
Regulatory reports, methodological critiques, and case examples in the review highlighting interpretability concerns and their impact on clinical/regulatory acceptance.
high negative From Algorithm to Medicine: AI in the Discovery and Developm... clinical/regulatory acceptance, trust, and adoption rates; explainability metric...
Performance of AI models in drug R&D depends on large, high-quality, and representative biomedical datasets; dataset bias or gaps substantially undermine model performance and generalizability.
Methodological literature and case studies cited in the review documenting failures or limited generalization when training data are biased, sparse, or non-representative; thematic synthesis rather than pooled quantification.
high negative From Algorithm to Medicine: AI in the Discovery and Developm... model performance/generalizability across populations and contexts
High-quality, standardized, interoperable data (clean, annotated, connected across modalities) is a critical limiting factor for translating AI capability into sustained impact.
Conceptual emphasis and domain knowledge argument in the editorial; no empirical measurement of data quality's causal effect included.
high negative AI as the Catalyst for a New Paradigm in Biomedical Research ability to translate AI capability into sustained impact (dependent on data qual...
The paper's evidence base is limited by its focus on early-stage projects with little longitudinal outcome data and by its dependence on publicly available project information, which may be incomplete or biased.
Methods and limitations explicitly stated in the paper (qualitative review; reliance on secondary sources; two case studies; absence of large-scale quantitative evaluation).
high negative Decentralized Autonomous Organizations in the Pharmaceutical... completeness and robustness of empirical evidence supporting claims about DAO ef...
Data protection and privacy (especially sensitive health data) complicate open-data DAO models.
Conceptual analysis referencing privacy/data-protection concerns for health data (e.g., GDPR-like regimes); no empirical evaluation of privacy breaches within DAOs provided.
high negative Decentralized Autonomous Organizations in the Pharmaceutical... data privacy risk level, feasibility of open-data sharing for clinical data
Significant barriers remain for DAOs in pharma: regulatory uncertainty about tokenized securities, IP fractionalization, and clinical data sharing.
Legal/regulatory analysis and literature synthesis highlighting unclear classifications and open regulatory questions; no new regulatory rulings provided.
high negative Decentralized Autonomous Organizations in the Pharmaceutical... regulatory clarity/status for tokenized securities and IP models; legal risk ind...
Pharmaceutical R&D faces rising costs, long approval timelines, supply-chain inefficiencies, and low patient involvement.
Literature review and synthesis of well-documented industry challenges cited in the paper (secondary sources); no new primary data presented in this study.
high negative Decentralized Autonomous Organizations in the Pharmaceutical... R&D cost per approved drug, average time-to-approval, supply-chain performance m...
There is limited reporting on privacy safeguards, model interpretability, and external validity in the reviewed studies.
Review observed sparse reporting on privacy protections, interpretability analyses and external validation across included studies.
high negative Deep technologies and safer gambling: A systematic review. frequency/extent of reporting on privacy safeguards and interpretability (qualit...
Misclassification risks (false positives and false negatives) are a common limitation and can harm consumers by incorrectly restricting access or by failing to detect harm.
Review notes model error rates reported via precision/recall and AUC; discusses harms from false positives/negatives as a recurrent limitation in the literature.
high negative Deep technologies and safer gambling: A systematic review. model error rates and downstream consumer harm risk (false positive/negative imp...
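Precision and recall map onto the two harms named here: false positives (customers wrongly restricted) lower precision, and false negatives (harm missed) lower recall. A sketch with hypothetical confusion-matrix counts, not figures from the review, illustrating how a rare outcome can pair high recall with low precision.

```python
def classification_report(tp, fp, fn, tn):
    """Precision, recall, and error rates from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    false_negative_rate = fn / (fn + tp) if fn + tp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
    }


if __name__ == "__main__":
    # Hypothetical: 1,000 customers, 50 actually at risk; the model flags 100.
    print(classification_report(tp=40, fp=60, fn=10, tn=890))
```

With these made-up counts the model catches 80% of at-risk customers, yet 60 of its 100 flags are false alarms, which is the kind of trade-off behind the consumer-harm concern.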
Privacy and ethical concerns are substantial: continuous monitoring and sensitive behavioural inference raise privacy, surveillance, and misuse risks.
Multiple included studies and the review discussion explicitly identify privacy, ethical, and potential misuse concerns with continuous monitoring and behavioural inference.
high negative Deep technologies and safer gambling: A systematic review. privacy/ethical risk (qualitative concerns reported across studies)
Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.
Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than quantified effect.
high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... audit reliability and the exercise of professional judgement in presence of opaq...
Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
high negative Audit 5.0 and the Digital Transformation of Auditing: The Ro... barriers to adoption/readiness factors (data quality, explainability, regulatory...
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
high negative LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy on easy questions when presented with incorrect chatbot sugg...
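The "two-thirds" figure is a relative reduction, which is easy to confuse with a drop in percentage points. A sketch with hypothetical accuracies (the excerpt reports neither group's actual rates); the function names are illustrative.

```python
def relative_reduction(baseline, treated):
    """Relative drop from baseline, e.g. 0.9 -> 0.3 is a reduction of two-thirds."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (baseline - treated) / baseline


def point_drop(baseline, treated):
    """Absolute drop in percentage points, for contrast with the relative figure."""
    return 100 * (baseline - treated)


if __name__ == "__main__":
    # Hypothetical: control 90% correct, incorrect-suggestion condition 30% correct.
    print(relative_reduction(0.9, 0.3))
    print(point_drop(0.9, 0.3))
```

The same two-thirds relative reduction would be a 60-point drop from a 90% baseline but only a 20-point drop from a 30% baseline, so the denominators matter for interpreting the result.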
Common barriers to ERM adoption in MSMEs include resource constraints and lack of expertise.
Findings from the literature review identifying determinants and barriers reported across studies (survey and qualitative studies commonly cited in such reviews); specific sample sizes/methods not provided in the summary.
high negative A Literature Review: Effect of Enterprise Risk Management (E... ERM adoption/implementation (barriers and determinants)
MSMEs are particularly vulnerable to external shocks because of limited financial resources, weak internal controls, and heavy dependence on owner-managers’ intuition.
Background literature summarized in the review describing common structural and governance characteristics of MSMEs; drawn from multiple sources in the literature (specific studies not cited in the summary).
high negative A Literature Review: Effect of Enterprise Risk Management (E... vulnerability to external shocks
The article identifies and lays out several concerns regarding the government's approach to regulating AI.
Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings). Based on the author's review and argumentation rather than primary empirical data.
high negative Regulation and governance of artificial intelligence in Indi... adequacy and risks of the government's AI regulatory approach
Environmental regulations weaken the beneficial influence of generative AI on a company's ESG performance.
Moderation/interaction tests in the panel-data econometric model using measures of environmental regulation (on the same 2012–2024 Chinese A-share firm sample) showing a statistically significant negative interaction effect.
high negative How Can Generative AI Promote Corporate ESG Performance? Evi... corporate ESG performance (effect of generative AI moderated by environmental re...
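A negative interaction effect of this kind is usually read through a specification like the one below, in which the marginal effect of generative AI on ESG performance depends on the level of environmental regulation. The variable names, the control vector X, and the firm and year fixed effects are assumptions for illustration; the paper's exact specification is not given in this excerpt.

```latex
\begin{aligned}
\mathrm{ESG}_{it} &= \beta_0 + \beta_1\,\mathrm{GenAI}_{it} + \beta_2\,\mathrm{Reg}_{it}
  + \beta_3\,(\mathrm{GenAI}_{it} \times \mathrm{Reg}_{it})
  + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it} \\
\frac{\partial\,\mathrm{ESG}_{it}}{\partial\,\mathrm{GenAI}_{it}} &= \beta_1 + \beta_3\,\mathrm{Reg}_{it}
\end{aligned}
```

With a positive main effect and a significantly negative interaction coefficient, the estimated benefit of generative AI shrinks as regulation intensity rises, which is what "weaken" means here.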
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
high negative Role of AI in Enhancing Work Efficiency and Opportunities fo... barriers to AI adoption (infrastructure readiness, digital awareness, policy inc...
Entrenched societal inequities mean that women and girls are often disproportionately held back from achieving their potential.
Broad claim referencing societal inequities and their effects on women and girls; stated in the introduction without specific empirical citations in the excerpt.
high negative Social Protection and Gender: Policy, Practice, and Research socioeconomic attainment of women and girls (e.g., income, education, empowermen...
Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.
Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis). The excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.
high negative Grid-Scale Battery Energy Storage and AI-Driven Intelligent ... Barriers to effective AI application and large-scale GS-BESS deployment (data av...
A preregistered, nationally representative replication experiment in the United States (N = 1,200) replicates the causal finding that a labor-replacing (vs. labor-creating) AI frame reduces willingness to politically engage with future AI developments.
Preregistered randomized experiment (nationally representative US sample, N = 1,200) replicating the UK manipulation and measuring willingness to engage politically regarding AI.
high negative Perceiving AI as labor-replacing reduces democratic legitima... willingness to politically engage with future AI developments (self-reported)
A preregistered, nationally representative experiment in the United Kingdom (N = 1,202) shows that exposure to a labor-replacing (vs. labor-creating) AI frame causally reduces trust in democracy.
Preregistered randomized experiment (nationally representative UK sample, N = 1,202) manipulating AI framing (labor-replacing vs. labor-creating) and measuring trust/satisfaction with democratic institutions.
high negative Perceiving AI as labor-replacing reduces democratic legitima... trust in democracy / satisfaction with democratic institutions (post-manipulatio...
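Both experiments estimate a causal effect of the labor-replacing frame by comparing randomized groups. A minimal sketch of that comparison, assuming a numeric outcome scale: a treatment-minus-control difference in means with a Welch (unequal-variance) standard error. The paper's actual estimator, covariates, and preregistered tests are not described here, and the scores in the example are made up.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(treated, control):
    """Treatment-minus-control difference in means and its Welch standard error."""
    diff = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


if __name__ == "__main__":
    # Made-up trust-in-democracy scores (1-7 scale) for the two framing conditions.
    labor_replacing = [3, 4, 4, 5, 3, 4]
    labor_creating = [4, 5, 5, 5, 4, 6]
    diff, se = difference_in_means(labor_replacing, labor_creating)
    print(f"effect = {diff:.2f}, SE = {se:.2f}")
```

Random assignment is what licenses reading this difference causally; the standard error only conveys how precisely the difference is estimated.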
Large-scale survey data indicate that the public tends to view AI as labor-replacing rather than labor-creating.
Cross-sectional survey (N = 37,079 respondents across 38 European countries); descriptive analysis of responses about AI's labor market impact.
high negative Perceiving AI as labor-replacing reduces democratic legitima... public perception of AI's labor-market impact (labor-replacing vs. labor-creatin...
Only 12% of gig workers participate in retirement savings programs.
Survey and administrative measures of retirement-savings participation among gig workers in the 24-country sample.
high negative The Gig Economy and Labor Market Restructuring: Platform Wor... proportion of gig workers participating in retirement savings programs (%)
Only 23% of gig workers report access to employer-provided health insurance.
Self-reported benefits coverage from labor force surveys and linked administrative records for gig workers across the 24 OECD countries (2015–2025).
high negative The Gig Economy and Labor Market Restructuring: Platform Wor... proportion of gig workers reporting access to employer-provided health insurance...
The environmental footprint of healthcare systems is growing, and persistent inequities in access and outcomes have intensified calls for procurement reform.
Contemporary literature review and synthesis of sector reports and studies documenting healthcare emissions/footprint and health inequities (no original empirical data reported in this paper).
high negative Greening the Medicaid Supply Chain: An ESG-Integrated Framew... environmental footprint of healthcare systems; inequities in access and health o...
There exists a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.
Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.
high negative The Landscape of Generative AI in Information Systems: A Syn... adequacy of governance mechanisms for privacy, accountability, and intellectual ...
Societal and ethical risks—such as bias, misuse, and skill erosion—constrain GenAI adoption.
Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.
high negative The Landscape of Generative AI in Information Systems: A Syn... societal-ethical risk level associated with GenAI (bias incidence, misuse potent...
Technical unreliability—manifesting as hallucinations and performance drift—is a major constraint on GenAI adoption.
Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and authors' aggregation of technical risks.
high negative The Landscape of Generative AI in Information Systems: A Syn... technical reliability of GenAI systems (frequency/severity of hallucinations and...
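Performance drift here means a deployed model's output quality declining over time. A hypothetical monitoring sketch, assuming outputs can be graded correct or incorrect after the fact: drift is flagged when accuracy over a rolling window falls below a baseline minus a tolerance. The class name and the baseline, window, and tolerance values are illustrative, not drawn from the reviewed studies.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when rolling accuracy falls below baseline minus tolerance."""

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the most recent results

    def update(self, correct):
        """Record one graded output (True if correct); return True if drift is detected."""
        self.outcomes.append(bool(correct))
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # wait for a full window before judging
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = DriftMonitor(baseline=0.9, window=10, tolerance=0.05)
    stream = [True] * 10 + [False, False]
    print([monitor.update(x) for x in stream])
```

A longer window smooths out noise but detects drift later; the tolerance sets how large a decline counts as drift rather than ordinary variation.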
Adoption of GenAI is constrained by multiple interrelated challenges.
Cross-paper synthesis from the systematic review of 28 studies identifying recurring barriers and constraints reported in the literature.
high negative The Landscape of Generative AI in Information Systems: A Syn... level/extent of GenAI adoption (barriers to adoption)
Ongoing issues remain, such as data access, model transparency, ethical concerns, and varying relevance across Global North and Global South contexts.
Critical synthesis within the review drawing on discussions and critiques in the literature about barriers and ethical challenges; based on reported limitations and regional comparisons in reviewed studies (no numerical breakdown provided).
high negative Advancing Urban Analytics: GeoAI Applications in Spatial Dec... barriers to GeoAI adoption and trustworthy use: data accessibility, model interp...
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
high negative Reframing Organizational Decision-Making in the Age of Artif... human judgment accuracy/quality and cognitive processing capacity
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
high negative Role of Artificial Intelligence in the Accounting Sector incidence/severity of implementation barriers (data quality scores, integration ...
Many studies on serious-game DSTs are small-scale or experimental, and long-term impact data on behavioral change and emissions outcomes are sparse, limiting generalizability.
Review of the literature summarized in the chapter showing predominance of case studies, prototypes, and short-term evaluations rather than longitudinal or large-sample studies.
high negative Serious games and decision support tools: Supporting farmer ... Study scale/sample size, duration of follow-up, evidence on long-term behavior c...
Ensuring scientific validity of game models, scaling co-design processes, measuring real-world behavioral change, and aligning incentives (policy/subsidies, markets) are remaining challenges to using serious games for DST uptake.
Chapter discussion of limitations and gaps identified in the reviewed literature; absence or sparsity of long-term validation studies and large-scale co-design implementations documented in existing research.
high negative Serious games and decision support tools: Supporting farmer ... Model validity (accuracy vs. empirical data), scalability of co-design processes...
Current uptake of DSTs for net zero remains limited because of issues of trust, usability, lack of evidence linking actions to farm profitability, and poor integration into farmer workflows.
Literature synthesis, qualitative interviews and surveys, case studies documenting low adoption and barriers; multiple practice reports and studies cited in the chapter. Many studies report limited or uneven adoption across contexts.
high negative Serious games and decision support tools: Supporting farmer ... DST adoption/use rates; reported barriers (trust, usability, integration)
Using LLM participants without rigorous validation can undermine external validity and bias causal inference in economic research.
Review documents cognitive misalignments and distortions that can bias estimated behaviors, preferences, or treatment effects; authors highlight this as a risk.
high negative Synthetic Participants Generated by Large Language Models: A... bias in estimated behaviors, preferences, or causal effects when using synthetic...