The Commonplace

Evidence (7156 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
Some patients value human contact for sensitive cases; automated interactions can feel impersonal.
Semi-structured interviews with patients/staff and open-ended survey responses documenting preferences for human interaction in sensitive/complex complaints.
high mixed The Role of Artificial Intelligence in Healthcare Complaint ... patient-reported preference for human contact and perceived interpersonal qualit...
The benefits of FDI (jobs, productivity, skills) are uneven and often conditional on institutional quality, labor regulation, and sectoral composition of investments.
Mechanism mapping and thematic synthesis linking heterogeneous empirical findings to contextual moderators (governance, regulation, sector); review emphasizes consistent role of these moderators across studies.
high mixed Foreign Direct Investment, Labor Markets, and Income Distrib... spillovers (productivity, employment quality, wage gains), distributional outcom...
FDI’s effects on employment, wages, and income distribution in Sub‑Saharan Africa are mixed and highly context‑dependent.
Conceptual literature review synthesizing theoretical frameworks and empirical findings across micro, firm, sectoral, and macro studies; no new primary data. Review notes heterogeneous identification strategies and results across studies and contexts.
high mixed Foreign Direct Investment, Labor Markets, and Income Distrib... employment levels, wages, income distribution
India’s reported post-harvest loss is relatively low (3.2%) despite poor food-security outcomes (Global Hunger Index rank 111/125).
Reported statistics cited in the paper (FAO/Kaggle for post-harvest loss; Global Hunger Index ranking referenced).
high mixed AI in food inequality: Leveraging artificial intelligence to... post-harvest loss (percent) and Global Hunger Index rank
Data‑driven policies can either amplify or mitigate inequalities depending on data representativeness, model design, and deployment governance.
Multiple empirical examples and theoretical analyses in the review highlighting cases of both harm (bias amplification) and mitigation, identified across the 103 items.
high mixed Models, applications, and limitations of the responsible ado... distributional equity outcomes (inequality amplification or mitigation)
Citizen acceptance, transparency, and perceived fairness strongly shape adoption trajectories and the political feasibility of AI tools in government.
Repeated empirical findings in the reviewed literature linking public trust, transparency measures, and fairness perceptions to successful or failed deployments (drawn from multiple case studies in the 103 items).
high mixed Models, applications, and limitations of the responsible ado... adoption trajectory/political feasibility of government AI tools (measured via d...
Adoption of AI and data-driven governance is highly uneven across jurisdictions and sectors, driven by institutional capacity, governance frameworks, and public trust.
Cross‑regional and cross‑sector comparisons in the review corpus (103 items) showing varying maturity levels and repeated identification of institutional capacity, governance arrangements, and trust factors as determinants.
high mixed Models, applications, and limitations of the responsible ado... adoption level/maturity of AI-driven governance systems
Governance approaches are emerging at global, regional and national levels; they vary widely across sectors and jurisdictions, creating opportunities for regulatory experimentation but also risks of fragmentation and regulatory arbitrage.
Cross-jurisdictional comparison of existing/global/regional/national governance instruments and sectoral guidance; gap analysis highlighting heterogeneity.
high mixed AI Governance and Data Privacy: Comparative Analysis of U.S.... degree of regulatory heterogeneity, instances of fragmentation/regulatory arbitr...
Weak formal institutions often coexist with strong informal institutions in African contexts, shaping governance, trust, and enforcement mechanisms in supply chains.
Cross-disciplinary literature review presented in the paper; conceptual argumentation rather than primary empirical analysis.
high mixed Continental shift: operations and supply chain management re... relative strength of formal vs informal institutions and their effects on govern...
Technology effectiveness depends on institutional support (extension, property rights), finance, and local knowledge — technologies are not a silver bullet alone.
Conceptual frameworks and comparative analysis in the review; supporting case studies and program evaluations linking adoption and impact to institutional factors (extension reach, tenure security, access to credit).
high mixed MODERN APPROACHES TO SUSTAINABLE AGRICULTURAL TRANSFORMATION technology adoption rates, realized productivity gains, distribution of benefits...
Productivity gains from generative AI depend on task mix, integration design, and the availability of complementary human skills.
Theoretical evaluation and synthesis of heterogeneous empirical findings; authors highlight variation across firms, sectors, and tasks.
high mixed The Use of ChatGPT in Business Productivity and Workflow Opt... productivity change conditional on task mix/integration/human skills (productivi...
Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, heterogeneous study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.
Meta-observation from the review: documented methodological limitations across the literature (variation in models, tasks, metrics; prevalence of short-term studies).
high mixed ChatGPT as a Tool for Programming Assistance and Code Develo... generalizability and comparability of empirical findings (study heterogeneity)
Real‑time and LLM‑based methods improve responsiveness but raise governance, transparency, and reproducibility challenges that BLS must manage (audit trails, uncertainty communication).
Operational tradeoff discussion in the paper identifying governance risks; no case studies or incident analyses provided.
high mixed Enhancing BLS Methodologies for Projecting AI's Impact on Em... tradeoff between responsiveness (timeliness/accuracy) and governance metrics (tr...
Distinguishing automation versus augmentation using causal methods changes policy responses (e.g., income support versus reskilling).
Policy implication drawn from conceptual separation of substitution and complementarity effects; logical inference rather than empirical demonstration in the paper.
high mixed Enhancing BLS Methodologies for Projecting AI's Impact on Em... policy prescriptions chosen contingent on causal classification (automation vs a...
Methodological caveats across the literature (heterogeneity of tasks/measures, publication bias, short-term studies) limit the generalizability of current findings.
Meta-level critique within the synthesis noting study heterogeneity, likely publication/short-term biases, and variable domain-specific performance dependent on user expertise and workflows.
high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... generalizability and external validity of LLM-assisted creativity findings
Standard productivity metrics are likely to undercount the value generated by AI-augmented ideation; quality-adjusted measures of creative output are required.
Measurement critique based on the mismatch between existing productivity statistics and the kinds of upstream idea-generation gains observed in empirical studies; supported by the review's methodological discussion.
high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... measured productivity vs. true quality-adjusted creative output
The authors were able to fully reproduce the reported results for 49% of CHI papers that had publicly shared study data and analysis code.
Empirical reproduction attempts performed by the authors on all CHI papers that had publicly shared study data and analysis code (exact number and time window not specified in the summary).
high mixed On the Computational Reproducibility of Human-Computer Inter... proportion of papers whose reported results could be fully reproduced from the s...
Realized value from AI methods (ML, predictive analytics, anomaly detection, XAI) is conditional: these technical methods deliver capabilities only when combined with strong data governance, standardized processes, and change management.
Thematic synthesis across the systematic review (2020–2025) showing repeated case-study and practitioner-report evidence that technical gains failed to scale without governance, process standardization, and organizational change efforts.
high mixed Integrating Artificial Intelligence and Enterprise Resource ... magnitude and durability of ERP-AI benefits (e.g., sustained accuracy gains, ado...
Evaluation of the equivalency system should use metrics such as concordance between claimed competencies and verified inputs, predictive validity versus labor-market integration outcomes, and false positive/negative rates in automated decisions.
Methodological recommendation in the paper outlining specific evaluation metrics; this is a prescriptive claim (no empirical implementation reported).
high mixed Establishes a technical and academic bridge between the educ... concordance rate, predictive validity (e.g., accuracy, AUC), false positive/nega...
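The evaluation metrics named in the entry above can be illustrated with a minimal Python sketch. It is not taken from the paper, which reports no implementation. The function names and the boolean "accept/reject" framing of an automated decision are assumptions made for illustration only.

```python
def decision_error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate) for boolean decisions.

    predicted: automated accept/reject decisions (hypothetical framing).
    actual: ground-truth outcomes, e.g. from manual verification.
    """
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    # Guard against empty classes so the rates are always defined.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def concordance_rate(claimed, verified):
    """Share of records where the claimed competency matches the verified one."""
    if not claimed:
        return 0.0
    matches = sum(1 for c, v in zip(claimed, verified) if c == v)
    return matches / len(claimed)
```

For example, `decision_error_rates([True, True, False, False], [True, False, True, False])` counts one false positive among two true negatives and one false negative among two true positives.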
The hybrid estimator (GA+SQP) is computationally more intensive than single-stage MLE/local optimization, implying a trade-off between estimation reliability and runtime cost.
Reported runtime and computational cost comparisons in estimation experiments: the paper notes longer runtimes for GA+SQP versus standard optimizers while documenting improvements in objective values and convergence behavior.
high mixed k-QREM: Integrating Hierarchical Structures to Optimize Boun... computation time / runtime, convergence reliability
Despite laboratory and pilot successes, many engineered bioprocesses remain at bench or pilot scale and require techno‑economic validation before industrial competitiveness can be established.
Review aggregate noting scale and validation status of case studies (many reported at lab or pilot fermenter scale) and explicit references to the need for TEA and LCA for industrial assessment.
high mixed Harnessing Microbial Factories: Biotechnology at the Edge of... technology readiness level (lab/pilot vs commercial), presence/absence of publis...
Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested.
Authors’ reported sample (164 law students) and explicit caution about generalizability in the study summary; the intervention and outcome are specific to one exam and one ~10-minute training.
high mixed Training for Technology: Adoption and Productive Use of Gene... Generalizability/applicability to other populations and tasks
Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide.
Authors report wide confidence intervals for mechanism decomposition (principal stratification) results based on the randomized sample of 164 students.
high mixed Training for Technology: Adoption and Productive Use of Gene... Precision of mechanism estimates (confidence interval width for adoption vs prod...
Overall, the protocol reframes AI governance in finance as a rights‑centered institutional design problem with direct economic consequences for market structure, credit allocation, compliance costs, and incentives shaping AI model development.
High-level synthesis claim made by the author, supported by the corpus audit (~4,200 texts), 12 years of legal research, doctrinal/comparative analysis, and the economics implications section.
high mixed Diego Saucedo Portillo Sauceport Research measurable economic consequences across market structure (concentration), credit...
Machine learning, recommender systems, NLP, computer vision, causal inference, reinforcement learning, federated learning/differential privacy/secure computation, and algorithmic governance tools are co-deployed in modern ad-tech.
Technical methods inventory drawn from literature and industry reports; no new experimental sample reported.
high mixed Artificial Intelligence for Personalized Digital Advertising... set of methods deployed in advertising systems
Personalization now spans data infrastructures, real-time bidding markets, recommender systems, creative generation, attribution pipelines, privacy tools, and governance regimes — all tightly coupled.
Survey of technical components and industry practice (system-analysis level); descriptive synthesis of common ad-tech stacks and interdependencies; no single-sample empirical audit provided.
high mixed Artificial Intelligence for Personalized Digital Advertising... presence and coupling of personalization components
AI has transformed personalized digital advertising from a narrow prediction task into a complex socio-technical infrastructure.
System-level conceptual analysis and literature synthesis presented in the paper; no single empirical dataset or sample size reported (review of industry components such as RTB, recommender systems, identity graphs).
high mixed Artificial Intelligence for Personalized Digital Advertising... scope and complexity of advertising systems (infrastructure breadth)
Applying differential privacy to model updates provides a bounded formal guarantee on information leakage, but DP noise budgets and communication constraints create accuracy and latency trade-offs that must be managed.
Analytical treatment of DP's impact on learning (trade-off modeling) and qualitative simulation examples showing accuracy degradation under DP noise; no numeric privacy-utility curves from field deployments provided.
high mixed Privacy-Aware AI Advertising Systems: A Federated Learning F... information leakage (DP privacy budget), model accuracy (loss/utility), communic...
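The trade-off described in the entry above can be made concrete with a generic clip-and-noise (Gaussian mechanism) sketch in Python. This is a common pattern for differentially private model updates, not the paper's specific framework. The parameter names `clip_norm` and `noise_multiplier` are assumptions, and the sketch does not compute the resulting privacy budget.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a model update, then add Gaussian noise to each coordinate.

    The noise standard deviation is noise_multiplier * clip_norm, so a larger
    multiplier means stronger privacy but a noisier (less accurate) update.
    """
    rng = rng or random.Random()
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]
```

Raising `noise_multiplier` tightens the bound on information leakage at the cost of accuracy, which is the tension the entry describes.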
There is no consensus in the literature on net job effects — studies diverge on whether AI produces net job gains.
Direct finding from the review: the 17 peer‑reviewed studies produce heterogeneous results on net employment impacts (some positive, some negative, some neutral).
Effects of AI adoption are heterogeneous across industries, firm sizes, regions, and worker characteristics (education, experience, occupation).
Microdata and firm-level studies exploiting cross-sectional and panel variation, quasi-experimental designs leveraging differential adoption across firms/regions, and comparative institutional analyses showing variation by context.
high mixed Intelligence and Labor Market Transformation: A Critical Ana... heterogeneity in employment and wage outcomes by industry, firm size, region, an...
The effects of K_T adoption are heterogeneous across industries, firms, countries, and cohorts — early adopters and capital-rich firms/countries gain most — implying important transition dynamics for political economy.
Cross-country comparisons, industry- and firm-level panel heterogeneity analyses, and case studies demonstrating variation in adoption timing and gains; model simulations emphasizing transition path dependence.
high mixed The Macroeconomic Transition of Technological Capital in the... industry-/firm-/country-level productivity, income, employment, and adoption tim...
Aggregate productivity (output per worker or per unit of inputs) can rise while labor’s share and employment decline due to substitution toward K_T.
Macro growth-accounting exercises decomposing output growth into contributions from labor, traditional capital, and technological capital; model simulations showing productivity gains coexisting with falling labor shares under substitution elasticities.
high mixed The Macroeconomic Transition of Technological Capital in the... productivity (e.g., TFP or output per worker) and labor share
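A growth-accounting decomposition of the kind mentioned in the entry above can be sketched in Python as follows. The sketch assumes a constant-returns production function in which each factor's contribution is its income share times its growth rate, with the remainder attributed to a TFP residual. The paper's actual model is not given here, and the factor names and numbers in the usage note are made up for illustration.

```python
def growth_accounting(output_growth, factor_growth, factor_shares):
    """Split output growth into factor contributions and a TFP residual.

    factor_growth and factor_shares are dicts keyed by factor name.
    Shares are assumed to sum to 1 (constant returns to scale).
    """
    if abs(sum(factor_shares.values()) - 1.0) > 1e-9:
        raise ValueError("factor shares must sum to 1")
    contributions = {
        name: factor_shares[name] * factor_growth[name] for name in factor_shares
    }
    # Whatever the measured factors do not explain is the residual.
    contributions["tfp_residual"] = output_growth - sum(contributions.values())
    return contributions
```

With illustrative inputs such as 3% output growth, shrinking labor input, and fast-growing technological capital, the labor contribution comes out negative while the total still grows, which mirrors the pattern the claim describes.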
More informative search can degrade both learning and consumer surplus unless the market learns as much as consumers (for example, by "reading the transcripts" of agentic conversations).
Analytical comparative statics in the paper's theoretical model showing how increasing the informativeness of consumer-side signals affects learning dynamics and welfare; relies on model assumptions about what information the market collects versus consumers.
high negative Agentic Markets: Equilibrium Effects of Improving Consumer S... consumer surplus (and market learning about product fit)
Performance degradation persists even when context is provided via structured semantic layers including AST-extracted function context and import graph resolution.
Experiments comparing unstructured versus structured context provision; structured semantic layers (AST context, import graph resolution) were evaluated and models still degraded with more context.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... model detection/performance when given structured semantic context
Models' performance degrades monotonically from diff-only (config_A) to diff+file content (config_B) to full context (config_C) across all 8 models.
Systematic ablation across three frozen context configurations (config_A, config_B, config_C) reported; all 8 evaluated models show monotonic performance decline as more context is provided.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... model performance score across context-provision configurations
Eight frontier models detect only 15–31% of human-flagged issues on the diff-only configuration (config_A).
Empirical evaluation across 8 models on SWE-PRBench (350 PRs) under the diff-only configuration; reported detection rates of 15–31% relative to human-flagged issues.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... detection rate of human-flagged issues
There is a growing gap between rapid experimentation with AI tools and limited organizational capability to institutionalize them in everyday workflows.
Argument supported by targeted literature synthesis and review of recent scholarly and institutional sources; no primary empirical sample reported in this paper.
high negative Behavioral Factors as Determinants of Successful Scaling of ... organizational capability to institutionalize AI initiatives (pilot-to-productio...
The data show that less than 0.7% of the Indian population uses AI-induced ride services.
Empirical statistic reported in the paper (declared as data) quantifying the share of the population using AI-induced ride services.
high negative Artificial Intelligence, Demand Switching and Sectoral Wage ... share of population using AI-induced ride services
The lack of a significant worsening in transportation-sector inequality can be attributed to sluggish demand switching from non-AI to AI-based services in India.
Argument in the paper linking empirical finding (no significant increase in inequality) to low observed adoption rates of AI-based ride services; supported by reported adoption statistic.
high negative Artificial Intelligence, Demand Switching and Sectoral Wage ... rate of demand switching / adoption
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 55.0% accuracy on help prediction.
Experimental evaluation reported in the paper comparing eight multimodal models on the Help Prediction task with reported accuracy metric.
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 44.6% accuracy on behavior state detection.
Experimental evaluation reported in the paper comparing eight multimodal models on the Behavior State Detection task with reported accuracy metric.
high negative GUIDE: A Benchmark for Understanding and Assisting Users in ... behavior state detection accuracy
Technological proximity has a noteworthy negative effect on collaboration, underscoring the importance of complementary knowledge in AI innovation.
SAOM estimates from longitudinal patent collaboration data (2013–2024) showing a statistically negative coefficient for technological proximity (implying organizations closer in technology space are less likely to form ties).
high negative The evolutionary mechanism of artificial intelligence indust... tie formation / collaboration probability (as a function of technological proxim...
Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem.
Framing statement in the paper's introduction/abstract describing the problem motivation; conceptual argument rather than empirical test.
high negative Causal Reconstruction of Sentiment Signals from Sparse News ... reliability of temporal sentiment series reconstructed from article-level news
Ikema is a severely endangered Ryukyuan language spoken in Okinawa, Japan, with approximately 1,300 remaining speakers, most of whom are over 60 years old.
Demographic/descriptive claim reported in the paper's background (likely citing prior surveys or census estimates); the abstract states the ~1,300 speakers figure and age distribution.
high negative Automatic Speech Recognition for Documenting Endangered Lang... number and age distribution of speakers
The financial planning and investment management profession is undergoing a radical transformation driven by Generative AI (GenAI) and Agentic AI, creating urgent workforce displacement challenges that require coordinated government policy intervention alongside educational reform.
Author assertion in the paper's introduction/abstract; framing argument based on the paper's synthesized analysis (no empirical sample, no reported statistical test).
high negative STRENGTHENING FINANCIAL WORKFORCE COMPETITIVENESS: A CURRICU... rate of workforce displacement in the financial planning and investment manageme...
Within the set of agentic-mention filings, autonomy evidence remains rare.
Empirical statement derived from analysis of the identified agentic-mention filings (small number of such filings reported across 2024–2025).
high negative Measuring agentic AI adoption and control frameworks in fina... presence/rarity of autonomy-related evidence within agentic-mention filings
LLM design agents can fixate on existing paradigms and fail to explore alternatives when solving design challenges, potentially leading to suboptimal solutions (a pathology analogous to human designers).
Literature/background claim and authors' characterization of observed agent behavior; motivated the proposed metacognitive interventions. No numerical sample size reported.
high negative Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regul... tendency to fixate on existing paradigms / lack of exploration leading to subopt...
Current closed models are generally ill-suited for scientific purposes (with some notable exceptions).
Argumentative and evaluative reasoning in the paper comparing features of closed models to scientific needs; no empirical sample size reported in abstract.
high negative How Open Must Language Models be to Enable Reliable Scientif... suitability of models for scientific research / quality of scientific inference
Restrictions on information about model construction and deployment threaten reliable inference in research that involves those models.
Conceptual argument and analysis presented in the paper (no empirical sample or randomized evaluation reported in abstract). The paper analyzes how specific types of information restrictions (about model construction and deployment) create threats to inference.
high negative How Open Must Language Models be to Enable Reliable Scientif... reliable inference / scientific inference
This inefficiency directly undermines UN Sustainable Development Goals 13 (Climate Action) and 10 (Reduced Inequalities) by hindering equitable AI access in resource-constrained regions.
Normative/analytic claim in the paper linking energy inefficiency to negative impacts on specific UN SDGs (argumentative, not empirically quantified in the abstract).
high negative EcoThink: A Green Adaptive Inference Framework for Sustainab... equitable AI access / progress toward SDGs 13 and 10