The Commonplace

Evidence (8542 claims)

Adoption: 5831 claims
Productivity: 5063 claims
Governance: 4582 claims
Human-AI Collaboration: 3625 claims
Labor Markets: 2749 claims
Innovation: 2704 claims
Org Design: 2667 claims
Skills & Training: 2126 claims
Inequality: 1429 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 448 118 70 511 1163
Governance & Regulation 458 217 125 67 884
Research Productivity 274 103 35 303 720
Organizational Efficiency 444 106 78 43 675
Technology Adoption Rate 347 130 76 45 603
Firm Productivity 324 39 73 13 454
Output Quality 273 76 27 30 406
AI Safety & Ethics 122 188 46 27 385
Market Structure 119 134 86 14 358
Decision Quality 182 79 41 20 326
Fiscal & Macroeconomic 95 58 34 22 216
Employment Level 78 37 80 9 206
Skill Acquisition 104 37 41 9 191
Innovation Output 127 12 26 14 180
Firm Revenue 101 38 24 163
Task Allocation 95 18 36 8 159
Consumer Welfare 77 38 37 7 159
Inequality Measures 29 81 33 6 149
Regulatory Compliance 54 61 13 3 131
Task Completion Time 92 8 4 3 107
Worker Satisfaction 49 36 13 8 106
Error Rate 45 53 6 104
Training Effectiveness 60 13 12 16 102
Wages & Compensation 56 16 20 5 97
Team Performance 51 13 15 8 88
Automation Exposure 28 29 12 7 79
Job Displacement 7 45 13 65
Hiring & Recruitment 42 4 7 3 56
Developer Productivity 38 5 4 3 50
Social Protection 22 12 7 2 43
Creative Output 17 8 6 1 32
Skill Obsolescence 3 26 2 31
Labor Share of Income 12 7 10 29
Worker Turnover 10 12 3 25
BenchPreS can be used as an evaluative tool for mechanism designers and regulators to measure and compare models' context‑sensitivity to guide incentives, penalties, or certification regimes.
Methodological claim about the benchmark's applicability: BenchPreS produces MR and AAR metrics that can be used for comparisons; paper suggests use in policy/design contexts.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Usability of BenchPreS metrics (MR, AAR) for model comparison and regulatory eva...
BenchPreS provides a benchmark and evaluation protocol that systematically varies stored user preference, interaction partner (self vs third party), and normative requirement to assess appropriate suppression or application of preferences.
Dataset construction and evaluation procedure described: scenario generation varying preference, partner, and normative appropriateness; MR and AAR computed across the scenario set.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Benchmark coverage and experimental protocol (design dimensions: preference, par...
Historical transitions in standard work hours (e.g., six-day to five-day week) show that phased implementation, collective bargaining, and complementary policies can make work-time reductions feasible and economically beneficial.
Historical analyses and case studies of past industrialized-country workweek transitions cited in the synthesis; evidence drawn from historical institutional records and prior economic histories rather than a unified econometric analysis.
high positive A Shorter Workweek as a Policy Response to AI-Driven Labor D... feasibility and economic outcomes of phased work-time reductions (employment, pr...
The paper advances a replicable interdisciplinary synthesis method and provides a simulated dataset and transparent protocols enabling other researchers to adapt the approach.
Methods section detailing systematic literature search protocols (ACM/IEEE/Springer, 2020–2024), inclusion criteria, simulation parameterization for the cross-sectoral dataset (seven industries, 2020–2024), and stated reproducibility materials.
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Availability and description of reproducible methods and a simulated dataset (re...
AI adoption is strongly associated with workforce skill transformation (reported correlation r = 0.71).
Correlational analysis reported in the paper using the simulated cross-sectoral dataset that mirrors employment trends across seven industries (Manufacturing, Healthcare, Finance, Education, Transportation, Retail, IT Services) over 2020–2024. This corresponds to sector-year observations (7 sectors × 5 years = 35 observations) and is triangulated with findings from a systematic literature synthesis (ACM, IEEE, Springer publications 2020–2024).
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Skill shift index (measure of changes in required skills and task composition)
The evaluation compared models on multiple metrics (accuracy, precision, recall, F1, AUC) across repeated trials and cross-company tests, and reported gains for AI methods across these metrics.
Evaluation protocol described: repeated trials, cross-validation, holdout sets, cross-company tests; reported performance improvements for AI models on the listed metrics.
high positive Adoption of AI-Based HR Analytics and Its Impact on Firm Pro... Classification evaluation metrics (accuracy, precision, recall, F1, AUC)
Ensemble methods and deep learning models show the largest and most consistent improvements in predictive performance relative to classic statistical models.
Aggregate results across repeated trials and evaluation metrics indicate Random Forests and Gradient Boosting (ensembles) and deep neural networks outperform linear/logistic regression and other baselines on the publicly available datasets used.
high positive Adoption of AI-Based HR Analytics and Its Impact on Firm Pro... Predictive performance (accuracy, F1, AUC, etc.)
Modern AI-driven prediction methods (especially ensemble models and deep neural networks) systematically outperform traditional statistical approaches at predicting job performance in publicly available workforce datasets.
Direct model comparison reported in the paper: baseline statistical models (linear/logistic regression) versus machine learning models (Random Forest, Gradient Boosting, SVM, deep neural networks) evaluated on multiple publicly available workforce datasets using cross-validation and holdout sets; performance reported on accuracy, precision, recall, F1, and AUC across repeated trials.
high positive Adoption of AI-Based HR Analytics and Its Impact on Firm Pro... Job performance prediction (classification performance metrics: accuracy, precis...
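The threshold-based metrics named in the three claims above (accuracy, precision, recall, F1) all derive from a confusion matrix. The sketch below shows those standard definitions only; the counts are hypothetical and are not taken from the paper, and AUC is left out because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (binary classification)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# HYPOTHETICAL counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing a baseline and an AI model on the same holdout set would mean calling the function once per model and comparing the resulting values.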
Research priorities include rigorous real-world trials assessing patient outcomes, cost-effectiveness, and labor impacts; comparative studies of integration strategies; measurement of long-run workforce effects; and development of standard metrics and monitoring frameworks.
Explicit recommendations from the narrative review based on identified gaps: scarcity of RCTs, economic analyses, and long-term workforce studies.
high positive Human-AI interaction and collaboration in radiology: from co... number and quality of real-world trials, existence of standardized monitoring fr...
Economists and researchers should measure organizational mediators (governance, mentoring practices, learning processes) alongside AI adoption. To identify causal effects, they should use empirical designs such as difference-in-differences with phased rollouts, randomized mentoring/training interventions, matched employer–employee panels, and instrumental-variable (IV) strategies exploiting exogenous shocks to innovation backing.
Methodological recommendations and proposed empirical designs contained in the paper; no implementation or empirical results reported.
high positive Revolutionizing Human Resource Development: A Theoretical Fr... feasibility and validity of empirical identification strategies for causal effec...
The integrated framework links multi-level outcomes: micro (individual skills, task performance), meso (team coordination, workflows), and macro (organizational strategy, innovation, productivity) effects to adaptive structuration processes and affordance actualization.
Framework specification and theoretical mapping across levels in the conceptual paper; no empirical validation or sample.
high positive Revolutionizing Human Resource Development: A Theoretical Fr... individual skills and performance; team coordination and workflow quality; organ...
The paper develops a conceptual framework that integrates Adaptive Structuration Theory (AST) and Affordance Actualization Theory (AAT) to explain how effective human–AI collaboration can be structured within organizations.
Conceptual/theoretical synthesis and literature integration combining AST and AAT streams; no original empirical data or sample reported (theoretical development).
high positive Revolutionizing Human Resource Development: A Theoretical Fr... explanatory power / conceptual framework for human–AI collaboration
As the competition progressed, teams relied more on the AI for larger subtasks (increasing delegation and reliance).
Time-series instrumentation of AI interactions and participant behavior during the live CTF with 41 participants showing increased frequency and scope of delegated tasks later in the event.
high positive Understanding Human-AI Collaboration in Cybersecurity Compet... frequency of delegation and average scope/complexity of delegated tasks over com...
One autonomous agent finished second overall on the fresh challenge set.
Final ranking/scoreboard from benchmarking the four autonomous agents against the live CTF challenge set and human teams; agent achieved overall 2nd place.
high positive Understanding Human-AI Collaboration in Cybersecurity Compet... overall ranking (2nd place) on the challenge set
In a live onsite Capture-the-Flag (CTF) study (41 participants), human teams increasingly delegated larger subtasks to an instrumented AI as the competition progressed.
Empirical observation and instrumentation of AI interactions during a live, onsite CTF with 41 human participants/teams; delegation and task-size metrics tracked over time during the event.
high positive Understanding Human-AI Collaboration in Cybersecurity Compet... degree/size of subtasks delegated to the AI over time (delegation rate and subta...
Reward shaping at the assignment layer enables an explicit trade-off between diagnostic accuracy and human labor by incorporating penalties for human involvement.
Methodology section describing reward shaping and experimental comparisons showing different accuracy/human-effort trade-offs (results reported in paper; exact experimental details not provided in the summary).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... diagnostic accuracy vs human effort (as controlled by reward shaping)
Masked reinforcement learning techniques constrain or mask action spaces, reducing exploration over huge symptom/action spaces.
Paper describes use of masked RL to limit action options during training and execution; used in both assignment and execution layers (methodological claim supported by algorithmic description and experiments).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... action-space reduction / sample efficiency / learning stability (as applied to s...
The upper layer ('master') learns turn-by-turn human–machine assignment using masked reinforcement learning with reward shaping to balance accuracy and human cost.
Methodological description in the paper and empirical results from experiments using masked RL and reward-shaped objectives at the assignment layer (implementation and experimental setup reported; dataset/sample size not specified in summary).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... assignment policy performance; human effort allocation; diagnostic accuracy unde...
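The two mechanisms described in the three claims above, action masking and a reward penalty for human involvement, can be sketched generically as follows. This is not the paper's implementation: the function names, the per-turn penalty form, and the value 0.1 are all assumptions made for illustration.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over allowed actions only; masked-out actions get probability 0."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("mask must allow at least one action")
    top = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def shaped_reward(correct, human_turns, human_penalty=0.1):
    """ASSUMED shaping: accuracy reward minus a per-turn cost for human effort."""
    return (1.0 if correct else 0.0) - human_penalty * human_turns

# Three candidate actions; the second is masked out and can never be sampled.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
print(probs)
print(shaped_reward(correct=True, human_turns=3))
```

Raising `human_penalty` pushes a learned assignment policy toward machine-handled turns, and lowering it favors accuracy. That is the trade-off the first claim describes.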
Service empathy mediates the relationship between employee emotion and collaboration proficiency.
Mediation analysis conducted on the experimental sample (n = 861) showing that measured 'service empathy' accounts for (part of) the effect of employee emotion on collaboration proficiency.
high positive Adoption of AI partners in temporary tasks: exploring the ef... collaboration proficiency
The paper advances augmentation debates by articulating the leader’s practical role when decision lead‑agency shifts between humans and AI and by detailing systemic HR changes needed to sustain performance, legitimacy and well‑being.
Stated contribution of the conceptual synthesis comparing existing augmentation and leadership literatures and providing an HR‑focused framework; descriptive of the paper's intellectual contribution.
high positive Symbiarchic leadership: leading integrated human and AI cybe... clarity of leader role; specification of HR system changes
Core practice 4 — Embed governance: make accountability, bias testing, privacy safeguards, audit trails, escalation thresholds and human oversight explicit and routine.
Prescriptive governance practice grounded in literature on algorithmic accountability and risk management and in practitioner examples; presented without original empirical validation.
high positive Symbiarchic leadership: leading integrated human and AI cybe... bias incidence; privacy breaches; auditability and compliance metrics
Core practice 3 — Manage the human–AI relationship: build adoption, psychological safety and calibrated trust; address automation anxiety and misuse.
Framework recommendation synthesizing organizational‑psychology and technology adoption literature plus practitioner observations; not tested empirically in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... adoption rates; psychological safety; calibrated trust; misuse incidents
Core practice 2 — Treat AI outputs as hypotheses: require human sensemaking and validation rather than blind adoption of model outputs.
Prescriptive practice derived from reviewed research and practitioner cases emphasizing human oversight; presented as framework guidance rather than empirically validated intervention.
high positive Symbiarchic leadership: leading integrated human and AI cybe... decision quality; error rates; incidence of blind automation
Core practice 1 — Allocate work by comparative advantage: assign tasks to humans or AI based on relative strengths (e.g., speed, pattern detection, contextual judgement).
Conceptual component of the framework drawn from synthesis of empirical findings in prior human–AI and task allocation literature and practitioner examples; no new empirical testing in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... task assignment efficiency; productivity from task allocation
AI methods have improved molecular property prediction, protein structure modelling, ADME/Tox prediction, NLP-based extraction from literature, virtual screening, and generative chemistry, accelerating early-stage tasks.
Compilation of benchmarking results, method-comparison studies, and applied case studies cited in the paper across these specific application areas.
high positive Has AI Reshaped Drug Discovery, or Is There Still a Long Way... accuracy/quality of property and structure predictions, throughput/speed of virt...
AI has materially improved efficiency, decision-making, and early-stage productivity in drug discovery, especially in hit discovery, property prediction, and protein modelling.
Synthesis of published benchmarking studies and industry case studies reported in the paper (e.g., improvements in virtual screening throughput, property-prediction benchmarks, and protein-structure prediction results such as those from folding competitions and tool evaluations).
high positive Has AI Reshaped Drug Discovery, or Is There Still a Long Way... efficiency and productivity in early-stage drug discovery (hit discovery rate, t...
Molecule operates a marketplace for decentralized clinical and preclinical assets, focusing on tokenizing drug assets and enabling investors to finance development.
Case-study description based on Molecule's public materials and marketplace listings; demonstrates platform design and transactions rather than long-term outcomes.
high positive Decentralized Autonomous Organizations in the Pharmaceutical... number of assets tokenized, capital deployed via the marketplace
VitaDAO is a community-driven organization funding and acquiring IP for longevity-related research, emphasizing open science and community governance.
Detailed case-study description drawing on VitaDAO's public documentation, governance records, and whitepaper materials.
high positive Decentralized Autonomous Organizations in the Pharmaceutical... IP acquisitions by VitaDAO, funding rounds executed, degree of open-science publ...
Research agenda priorities include: empirically quantifying the value of digital twins on R&D productivity; studying complementarities between AI tools and tacit sensory knowledge; measuring cultural translation costs; and analyzing market concentration risks from proprietary sensory models.
List of recommended empirical research directions derived from conceptual analysis and gap identification; no primary empirical work conducted within the paper itself.
high positive At the table with Wittgenstein: How language shapes taste an... future empirical metrics: R&D productivity changes, complementarity estimates, m...
The collection highlights resolving methodological challenges such as ecological validity, generalization across environments, and integrating domain knowledge rather than purely optimizing benchmarks.
Methodological-focus summary from the collection indicating emphasis on ecological validity, generalization, and domain-knowledge integration across multiple papers.
high positive Towards ‘digital ecology’: Advances in integrating artificia... methodological robustness (ecological validity, cross-site generalization, domai...
Early applications focused on automating straightforward, repetitive tasks (e.g., filtering blank camera‑trap images); current work aims for deeper integration with ecological questions.
Historical-arc observation drawn from the collection's examples and classifications of papers (descriptive review of prior vs. current papers in the collection).
high positive Towards ‘digital ecology’: Advances in integrating artificia... complexity and integration depth of AI applications in ecology (task automation ...
The AI–ecology interface is maturing from simple, task‑automation proofs of concept into genuinely interdisciplinary work that advances both AI methods and ecological science.
Synthesis of the paper collection (mix of methodological, empirical, and translational papers) and the paper's summary of trends across those contributions (no single-sample experiment; claim based on cross-paper review).
high positive Towards ‘digital ecology’: Advances in integrating artificia... advancement of AI methods and ecological science (depth of interdisciplinary int...
Seed 2.0 Lite achieved 75.7% success rate with-skill, an increase of +18.9 percentage points over baseline.
Model-specific reported result in the paper: Seed 2.0 Lite with-skill success rate (75.7%) and reported improvement (+18.9pp); reported from the benchmark runs.
high positive SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... task success rate (percentage) and absolute percent-point lift
GLM-5 Turbo achieved 78.4% success rate with-skill, an increase of +5.4 percentage points over baseline.
Model-specific reported result in the paper: GLM-5 Turbo with-skill success rate (78.4%) and reported improvement (+5.4pp); based on the benchmark evaluation.
high positive SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... task success rate (percentage) and absolute percent-point lift
Nemotron 120B achieved 78.4% success rate with-skill, an increase of +18.9 percentage points over baseline.
Model-specific reported result in the paper: Nemotron 120B with-skill success rate (78.4%) and reported improvement (+18.9pp); results drawn from the benchmark runs.
high positive SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... task success rate (percentage) and absolute percent-point lift
MiniMax M2.5 achieved 81.1% success rate with-skill, an increase of +13.5 percentage points over baseline.
Model-specific reported result in the paper: MiniMax M2.5 with-skill success rate (81.1%) and reported improvement (+13.5pp); based on a subset of the 185 scenario-runs across the evaluated models.
high positive SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... task success rate (percentage) and absolute percent-point lift
Results across 5 open-weight model conditions and 185 scenario-runs show consistent skill lift across all models.
Aggregate experimental results reported in the paper: evaluation over 5 model conditions and 185 scenario-runs, with cross-model improvement when SKILL is provided.
high positive SKILLS: Structured Knowledge Injection for LLM-Driven Teleco... skill lift measured as change in task success rate (percentage point improvement...
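Each with-skill rate and percentage-point lift above implies a baseline rate (with-skill minus lift). The sketch below does that arithmetic for the four models listed here. The even split of the 185 scenario-runs into 37 per condition is an assumption, not something the summaries state.

```python
# With-skill success rate (%) and reported lift (pp), copied from the claims above.
reported = {
    "Seed 2.0 Lite": (75.7, 18.9),
    "GLM-5 Turbo": (78.4, 5.4),
    "Nemotron 120B": (78.4, 18.9),
    "MiniMax M2.5": (81.1, 13.5),
}

def implied_baseline(with_skill_pct, lift_pp):
    """Baseline success rate implied by a with-skill rate and an absolute lift."""
    return round(with_skill_pct - lift_pp, 1)

baselines = {model: implied_baseline(rate, lift)
             for model, (rate, lift) in reported.items()}

# ASSUMPTION: 185 scenario-runs split evenly over 5 conditions = 37 per condition.
RUNS_PER_CONDITION = 185 // 5

def implied_successes(pct, runs=RUNS_PER_CONDITION):
    """Nearest whole number of successful runs consistent with a percentage."""
    return round(pct / 100 * runs)

for model, (rate, lift) in reported.items():
    print(model, baselines[model], implied_successes(rate))
```

Under that assumption the reported rates line up with whole run counts (for example, 28 of 37 is about 75.7%). The implied baselines are 56.8%, 73.0%, 59.5% and 67.6% respectively.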
Returns to advanced digital skills vary by firm size/type: the wage return in large Chaebol conglomerates is approximately 18.7%, significantly higher than the ~9.5% return in Small and Medium-sized Enterprises (SMEs), indicating a 'skills–scale' complementarity effect.
Heterogeneity analysis within the extended Mincerian wage regression framework using KLIPS micro-data, comparing estimated returns across firm types (Chaebol vs SMEs). (Sample size and exact model specification not provided in the excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premiums by firm type: Chaebol ≈ 18.7%...
Workers with only general digital literacy receive a wage premium of approximately 5.8% (after controlling for education, experience, and demographics).
Same empirical framework: extended Mincerian wage equation on KLIPS micro-data with controls for education, experience, and demographic characteristics. (Sample size not specified in the provided excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premium ≈ 5.8%)
Workers possessing specialized digital skills (e.g., data analysis, programming, automation control) enjoy a significant wage premium of approximately 14.2% after controlling for years of education, work experience, and demographic characteristics.
Empirical estimation using an extended Mincerian wage equation on micro-data from the Korean Labor and Income Panel Study (KLIPS); models control for years of education, work experience, and demographic covariates. (Sample size not specified in the provided excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premium ≈ 14.2%)
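The excerpts do not give the paper's exact specification. A typical extended Mincerian wage equation with skill indicators might look like the following, where every symbol is an assumption made for illustration: $S_i$ is years of education, $X_i$ is work experience, $D_i^{\text{spec}}$ and $D_i^{\text{gen}}$ flag specialized digital skills and general digital literacy, and $\mathbf{Z}_i$ collects demographic controls.

```latex
\ln w_i = \alpha + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2
        + \gamma_s D_i^{\text{spec}} + \gamma_g D_i^{\text{gen}}
        + \boldsymbol{\delta}^{\top} \mathbf{Z}_i + \varepsilon_i,
\qquad
\text{premium}_k = 100\,\bigl(e^{\gamma_k} - 1\bigr)\ \%, \quad k \in \{s, g\}.
```

Under the exact conversion, a 14.2% premium corresponds to $\gamma_s = \ln(1.142) \approx 0.133$. The excerpts do not say whether the reported percentages use this conversion or the common approximation $100\,\gamma_k$.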
AI-adopting firms increase R&D expenditures following adoption.
Firm financial data analyzed in the same difference-in-differences framework, showing higher R&D spending for adopters relative to nonadopters in post-adoption periods.
high positive AI and Productivity: The Role of Innovation R&D expenditures (absolute or relative change)
Post-adoption patents by AI adopters receive more citations than those of nonadopters.
Difference-in-differences estimates comparing citation counts per patent before and after AI installation versus nonadopters; patent citation data used as the dependent variable.
high positive AI and Productivity: The Role of Innovation citations per patent (average citation count)
Firms that adopt AI subsequently increase patenting relative to nonadopters.
Firm-level analysis using a novel AI adoption measure based on timing of AI product installations and a stacked difference-in-differences design exploiting staggered adoption; dependent variable = firm patent counts (patenting rate). (Sample size and exact time period not specified in the provided text.)
high positive AI and Productivity: The Role of Innovation firm patent counts / patenting rate
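All three claims above rest on difference-in-differences comparisons. The sketch below shows the basic two-group, two-period version of that estimator. It is not the paper's stacked design, which as described would repeat a comparison like this for each adoption cohort and pool the results. The firm-level numbers are hypothetical.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# HYPOTHETICAL patent counts per firm, for illustration only (not from the paper).
adopters_pre = [4, 6, 5]
adopters_post = [8, 9, 10]
nonadopters_pre = [3, 5, 4]
nonadopters_post = [4, 6, 5]

effect = did_estimate(adopters_pre, adopters_post,
                      nonadopters_pre, nonadopters_post)
print(effect)
```

The estimate reads as a causal effect only if adopters and nonadopters would have followed parallel trends without adoption. That is an identifying assumption, not something the arithmetic can check.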
Programming experience significantly improved code security.
Association found in the study between participants' programming experience (general programming experience measured for each participant) and the security of their submitted code; statistical analysis in the sample (n = 159) showed a significant positive effect of experience on code security.
high positive The Impact of AI-Assisted Development on Software Security: ... code security (security quality of participants' solutions) as a function of pro...
Using distributed systems as a principled foundation is a useful approach for creating and evaluating LLM teams.
Primary methodological proposal of the paper; supported by conceptual argument and (per the paper) mappings between distributed-systems concepts and LLM team design (specific experimental validation not detailed in the excerpt).
high positive Language Model Teams as Distributed Systems suitability of distributed-systems framework for designing/evaluating LLM teams
Large language models (LLMs) are growing increasingly capable.
Statement in the paper's introduction/abstract summarizing the field; based on observed progress in LLM development cited by the authors (no experimental sample size provided in the excerpt).
high positive Language Model Teams as Distributed Systems capability of LLMs (general competence/capacity)
Only seven of the 49 specialized skills produce meaningful gains (up to +30%).
Empirical results showing that 7 out of 49 skills yielded meaningful positive improvements in acceptance-test pass rates, with gains up to 30%.
high positive SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... number of skills with meaningful positive pass-rate gains and magnitude (up to +...
The average gain from injecting skills is only +1.2% in pass rate.
Aggregated pass-rate differences computed across the benchmark tasks comparing with-skill vs without-skill conditions, reported as an average +1.2% gain.
high positive SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... average change in acceptance-test pass rate (+1.2%)
Analysis of benchmark data (n = 667) reveals substantial synergy effects: Llama-3.1-8B improves human performance by 23 percentage points.
Empirical analysis of the same benchmark dataset (n = 667) using the Bayesian IRT model; reported improvement in human performance with Llama-3.1-8B assistance of +23 percentage points.
high positive Quantifying and Optimizing Human-AI Synergy: Evidence-Based ... human task performance (accuracy, measured in percentage points) when assisted b...
Analysis of benchmark data (n = 667) reveals substantial synergy effects: GPT-4o improves human performance by 29 percentage points.
Empirical analysis of a benchmark dataset of n = 667 using the paper's Bayesian IRT framework; reported improvement in human performance with GPT-4o assistance of +29 percentage points.
high positive Quantifying and Optimizing Human-AI Synergy: Evidence-Based ... human task performance (accuracy, measured in percentage points) when assisted b...