The Commonplace

Evidence (3103 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Active filter: Human-AI Collaboration
The paper's conclusions are drawn from a mix of evidence types, including literature review, surveys/interviews, case studies, usage-log or publication-metric analyses, and controlled experiments, although the abstract does not specify which of these were actually used or report sample sizes.
Explicitly noted in the Data & Methods summary as the likely underlying evidence types; the paper's abstract itself does not document original data or detailed methods.
[high, null result] Artificial Intelligence for Improving Research Productivity ... | Outcome: methodological provenance (types of evidence used; presence/absence of original ...
There is a lack of large‑scale causal evidence on generative AI’s effects; the paper recommends RCTs, difference‑in‑differences, matched employer–employee panels, and longitudinal studies to fill empirical gaps.
Methodological critique and research agenda provided in the review; observation based on the authors' survey of the literature.
[high, null result] The Use of ChatGPT in Business Productivity and Workflow Opt... | Outcome: n/a (research design recommendation; outcome is future evidence generation)
Policy interventions are needed for data protection, bias mitigation, model transparency, accountability, and public investments in workforce retraining to smooth transitions and reduce inequality.
Normative policy recommendations grounded in the review's synthesis of risks and distributional concerns; not an empirical claim but a recommendation.
[high, null result] The Use of ChatGPT in Business Productivity and Workflow Opt... | Outcome: policy adoption (existence of regulations, programs), outcomes: retraining parti...
New productivity metrics are needed to capture AI impacts, including time‑use changes, quality‑adjusted output, and accounting for intangible AI capital.
Methodological recommendation from the conceptual synthesis, motivated by limitations of existing measures discussed in the paper.
[high, null result] The Use of ChatGPT in Business Productivity and Workflow Opt... | Outcome: n/a (recommendation for metrics: time use, quality‑adjusted output, AI capital a...
Further empirical calibration and validation against observed behavioral and economic data are necessary; the framework primarily demonstrates method and emergent phenomena rather than ready predictive deployment.
Paper explicitly notes the necessity of further empirical calibration and frames results as demonstration of method and emergent phenomena. This is an explicit limitation statement in the summary.
[high, null result] An LLM-Driven Multi-Agent Simulation Framework for Coupled E... | Outcome: level of empirical calibration/validation (current framework not yet empirically...
Further quantitative research is needed to measure task‑level productivity effects, skill‑depreciation trajectories, and market impacts of differential GenAI adoption; structural models could incorporate TGAIF to predict labor demand and wage effects.
Authors' stated research agenda and limitations acknowledged in the paper; this is a call for future empirical work rather than an empirical claim.
[high, null result] Where Automation Meets Augmentation: Balancing the Double-Ed... | Outcome: task-level productivity, skill-depreciation trajectories, market impacts, labor ...
ChatGPT was used as the generative engine for the MLLM in the system implementation described in the paper.
Methods section: integration of AR overlays with an MLLM, with ChatGPT used as the generative engine (explicit in the summary).
[high, null result] Augmented Reality-Based Training System Using Multimodal Lan... | Outcome: identity of generative model used (ChatGPT)
The paper proposes measurable metrics such as projection congruence indices, alignment persistence measures, monitoring/oversight burden, and outcome variability/tail risks attributable to agentic autonomy.
Explicit metric proposals in the methods and metrics section of the paper; presented as part of a research agenda rather than empirically implemented.
[high, null result] Visioning Human-Agentic AI Teaming: Continuity, Tension, and... | Outcome: proposed measurement constructs (projection congruence, alignment persistence, m...
The paper proposes specific empirical and analytic follow-ups — multi-agent simulations, lab experiments with humans and adaptive agents, field case studies, econometric analyses, and formal economic models — to test the conceptual claims.
Explicit methods and research agenda listed in the paper; these are recommended future methods, not evidence.
[high, null result] Visioning Human-Agentic AI Teaming: Continuity, Tension, and... | Outcome: feasibility and design of empirical/analytic methods for studying agentic HAT
Agentic AI is characterized by three properties that drive structural uncertainty: open-ended action trajectories, generative representations/outputs, and evolving objectives.
Definitions and taxonomy developed in the paper based on conceptual synthesis; presented as framing rather than empirically measured properties.
[high, null result] Visioning Human-Agentic AI Teaming: Continuity, Tension, and... | Outcome: presence of specified agentic properties
The framework provides sector-specific implementation guidance tailored to healthcare and public administration, accounting for existing governance and regulatory structures.
Case/sector guidance sections offering practical recommendations and considerations for deployment in those sectors; design-oriented, not empirically piloted in the paper.
[high, null result] Human–AI Handovers: A Dynamic Authority Reversal Framework f... | Outcome: implementation_guidance_presence; sector_adaptation_features
DAR identifies four trigger classes that govern transitions between authority states: data superiority, contextual judgment requirements, risk thresholds, and ethics/legal overrides.
Conceptual derivation and classification in the framework; mapping of trigger types to transition rules. Theoretical, no empirical data.
[high, null result] Human–AI Handovers: A Dynamic Authority Reversal Framework f... | Outcome: trigger_class (categorical) and resulting authority_state_transitions
The Dynamic Authority Reversal (DAR) framework formalizes four discrete intra-episode authority states: Human-Leader/AI-Follower (HL), AI-Leader/Human-Follower (AL), Co-Leadership (CO), and Mutual Override (MO).
Formal conceptual specification and formal modeling within the paper; definitions of the four states and their roles. No empirical sample; theoretical/design artifact.
[high, null result] Human–AI Handovers: A Dynamic Authority Reversal Framework f... | Outcome: authority_state (categorical: HL, AL, CO, MO)
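The four authority states and trigger classes lend themselves to a small state-machine encoding. A minimal Python sketch, with the caveat that the transition mapping below is a hypothetical illustration; the paper specifies the triggers conceptually, not this rule table:

```python
from enum import Enum

class AuthorityState(Enum):
    HL = "Human-Leader/AI-Follower"
    AL = "AI-Leader/Human-Follower"
    CO = "Co-Leadership"
    MO = "Mutual Override"

class Trigger(Enum):
    DATA_SUPERIORITY = "AI holds a decisive data advantage"
    CONTEXTUAL_JUDGMENT = "situation requires human contextual judgment"
    RISK_THRESHOLD = "risk crosses a preset threshold"
    ETHICS_LEGAL = "ethics/legal override applies"

# Hypothetical transition rules keyed by (current state, trigger);
# unlisted pairs leave authority unchanged.
TRANSITIONS = {
    (AuthorityState.HL, Trigger.DATA_SUPERIORITY): AuthorityState.AL,
    (AuthorityState.AL, Trigger.CONTEXTUAL_JUDGMENT): AuthorityState.HL,
    (AuthorityState.AL, Trigger.RISK_THRESHOLD): AuthorityState.CO,
    (AuthorityState.CO, Trigger.ETHICS_LEGAL): AuthorityState.MO,
}

def next_state(state, trigger):
    """Return the authority state after a trigger fires."""
    return TRANSITIONS.get((state, trigger), state)

print(next_state(AuthorityState.HL, Trigger.DATA_SUPERIORITY).name)
```

The point of the encoding is only that intra-episode authority is a discrete state updated by categorical triggers, which is what makes the framework testable.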
Further quantitative and comparative research is needed to measure net productivity effects, skill trajectories, and generalizability across firm types and industries.
Authors' methodological assessment and limitations section noting single-firm qualitative design (Netlight) and rapidly evolving toolchains; recommendation for future empirical work.
[high, null result] Rethinking How IT Professionals Build IT Products with Artif... | Outcome: gaps in current empirical evidence (lack of quantitative, longitudinal, cross-fi...
Another important gap is quantifying complementarities between AI and different skill types (evaluative vs. generative tasks).
Review observation that existing empirical work has not systematically quantified how AI productivity gains vary with worker skill composition and complementary roles.
[high, null result] ChatGPT as an Innovative Tool for Idea Generation and Proble... | Outcome: magnitude of complementarities between AI assistance and various human skill typ...
Key research gaps include a lack of long-run causal evidence on the effects of LLMs on firm-level innovation rates, business formation, and industry structure.
Explicit identification of gaps in the literature within the nano-review; the review states that most studies are short-term, task-level, or descriptive.
[high, null result] ChatGPT as an Innovative Tool for Idea Generation and Proble... | Outcome: long-run causal impacts of LLM adoption on firm innovation, business formation, ...
High-priority research includes randomized controlled trials on hybrid vs. automated routing, long-run studies on labor markets in service sectors, and models quantifying trust externalities and governance costs.
Paper's stated research agenda based on identified evidence gaps and limitations (lack of randomized long-run studies).
[high, null result] The Effectiveness of ChatGPT in Customer Service and Communi... | Outcome: research output (RCTs, long-run studies, models) addressing the specified gaps
Current evidence is promising but early: case studies, pilot deployments, and short-run experiments dominate; long-run causal evidence on labor and welfare effects is limited.
Explicit methodological assessment in the paper noting source types (deployments, pilots, vendor reports, short-run experiments) and limitations (heterogeneity, lack of randomized controls, short horizons).
[high, null result] The Effectiveness of ChatGPT in Customer Service and Communi... | Outcome: quality and duration of evidence (study types, presence of randomized controls)
Study limitations include reliance on perceptual rather than purely objective performance measures, heterogeneity across institutional samples, and identification that is likely correlational rather than strictly causal.
Authors' own noted limitations in the paper's methods section: mixed-methods design using perceptions from questionnaires and interviews, sample heterogeneity across multinational institutions, and quantitative analyses that are associative rather than strictly causal.
[high, null result] Human-AI Synergy in Financial Decision-Making: Exploring Tru... | Outcome: validity/causal identification of study findings
Statistical analyses reported improvements across metrics, but specific effect sizes and detailed statistical results were not provided in the summary.
Summary indicates statistical analyses were performed and improvements reported, but it also states that specific effect sizes were not included in the provided summary.
[high, null result] Context-Rich Adaptive Embodied Agents: Enhancing LLM-Powered... | Outcome: presence/absence of detailed statistical effect-size reporting
Measurement and research gaps (data scarcity, informality) complicate robust economic assessment of AI impacts; improved metrics, granular labour and firm‑level data, and mixed‑methods evaluation are required.
Methodological critique based on reviewed literature and identified gaps; no new data collection in the paper.
[high, null result] Towards Responsible Artificial Intelligence Adoption: Emergi... | Outcome: availability and granularity of labour and firm-level datasets, prevalence of mi...
There is a lack of causal evidence on the long-run impacts of AI-driven HRM on employment, wages, and firm survival—this is a key research gap identified by the review.
Explicitly stated research gap in the review based on assessment of methodologies and findings across the 47 included studies.
[high, null result] Data-Driven Strategies in Human Resource Management: The Rol... | Outcome: availability of causal studies on long-run employment, wage, and firm survival i...
A systematic review following PRISMA identified 47 peer-reviewed studies (2012–2024) on data-driven HRM and workforce resilience from Scopus, Web of Science, and Google Scholar.
Explicit review protocol and search/screening results reported by the paper (PRISMA-based), final sample size = 47 studies.
[high, null result] Data-Driven Strategies in Human Resource Management: The Rol... | Outcome: number of studies included in the review
There is a need for causal, longitudinal studies quantifying economic returns of ERP-AI integration and for measurement frameworks for quality-adjusted decision improvements.
Stated limitation and research opportunity in the review; reviewers found scarcity of longitudinal causal studies in the 2020–2025 literature.
[high, null result] Integrating Artificial Intelligence and Enterprise Resource ... | Outcome: existence/volume of longitudinal causal studies and quality-adjusted measurement...
There is a need for causal, longitudinal studies on how AI‑enabled fintech affects women's portfolio outcomes and on algorithmic interventions designed to reduce gender gaps.
Explicit statement in the paper noting limitations of existing literature (heterogeneity, limited longitudinal causal evidence, possible platform sample selection).
[high, null result] Women's Investment Behaviour and Technology: Exploring the I... | Outcome: existence/absence of causal longitudinal evidence on fintech impacts by gender
Empirical validation on experimental or field data is needed to fully establish k-QREM's practical applicability; current results are based on numerical examples and simulations.
Paper's methodology and validation section: validation confined to two numerical example datasets and simulation studies; authors acknowledge lack of real experimental/field validation and propose it as future work.
[high, null result] k-QREM: Integrating Hierarchical Structures to Optimize Boun... | Outcome: extent of empirical validation (numerical + simulation only; no field/experiment...
Extensions such as Bayesian hierarchical estimation and integration with multi-agent reinforcement learning are promising future directions but not implemented in the paper.
Authors' discussion of future work and limitations; no empirical or methodological implementation presented for these extensions in the current paper.
[high, null result] k-QREM: Integrating Hierarchical Structures to Optimize Boun... | Outcome: status of proposed extensions (not implemented)
k-QREM explicitly models heterogeneity both across cognitive levels (different proportions of players at each level) and within levels (stochastic variability among players assigned to the same level).
Model specification: the paper defines level-specific quantal response functions and allows distributions over player types within each level (theoretical/modeling choices demonstrated in equations and architecture).
[high, null result] k-QREM: Integrating Hierarchical Structures to Optimize Boun... | Outcome: model structure (within- and across-level heterogeneity representation)
k-QREM is a hierarchical quantal-response model that nests the Cognitive Hierarchy Model (CHM) and Quantal Response Equilibrium (QRE) as special or limiting cases.
Analytical model construction in the paper: k-level hierarchical formulation showing CHM (discrete levels, deterministic best-response limit) and QRE (single-level stochastic best-response) arise as special/limiting parameterizations of k-QREM (model derivation/proofs provided).
[high, null result] k-QREM: Integrating Hierarchical Structures to Optimize Boun... | Outcome: model relationship / representational inclusion (theoretical nesting)
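The nesting claim can be sketched in notation assumed here (not taken verbatim from the paper). A level-k player chooses action $a$ with a logit quantal response,

```latex
\sigma_k(a) \;=\; \frac{\exp\{\lambda_k \, U_k(a)\}}{\sum_{b} \exp\{\lambda_k \, U_k(b)\}},
```

where $U_k(a)$ is the expected payoff of $a$ given level-$k$ beliefs about lower levels and $\lambda_k$ is a level-specific precision parameter. With a single level and finite $\lambda$, the model collapses to QRE (stochastic best response in equilibrium); letting every $\lambda_k \to \infty$ makes each $\sigma_k$ a deterministic best response over a hierarchy of levels, recovering CHM.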
Evaluation methods reported commonly include visual inspection by researchers/clinicians, correlation with known biomarkers/frequency bands, and ablation/perturbation faithfulness tests; few studies report standardized quantitative metrics for robustness, stability, or neuroscientific fidelity.
Survey of evaluation practices across the literature compiled in the review.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: types of evaluation methods used to assess explanations
Modeling approaches in the literature include end-to-end deep models operating on raw or time–frequency representations, recurrent architectures for temporal dynamics, attention mechanisms, and hybrid feature-based classifiers.
Summary of modeling choices described across reviewed studies.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: specific modeling strategies applied to EEG
Typical datasets used in EEG XAI research include public collections such as the TUH EEG Corpus, BCI Competition datasets, PhysioNet sleep databases, CHB-MIT for pediatric seizures, as well as many small/clinical cohorts.
Listing of commonly referenced datasets across the surveyed literature.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: datasets employed in EEG XAI studies
A common taxonomy emphasized in EEG XAI work distinguishes local vs global explanations, model-specific vs model-agnostic methods, and post-hoc vs intrinsically interpretable models.
Conceptual organization presented in the review synthesizing common taxonomic distinctions used by authors in the field.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: taxonomic classification of explanation types
XAI methods applied to EEG in the literature include gradient-based saliency methods, Integrated Gradients, layer-wise relevance propagation (LRP), CAM/Grad-CAM, occlusion/perturbation analyses, LIME, SHAP, TCAV, and counterfactual explanations.
Cataloging of explanation techniques reported across surveyed EEG papers.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: types of XAI techniques used
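One of the catalogued techniques, occlusion/perturbation analysis, is simple to illustrate: mask part of the input and measure how much the model's score drops. A hedged Python sketch on a 1-D signal; the scoring function is a toy stand-in for a classifier, not taken from any reviewed study:

```python
def occlusion_importance(signal, score_fn, window=2):
    """Zero out each window of `signal` and record the drop in score.
    Larger drops mark segments the model relies on more."""
    base = score_fn(signal)
    importances = []
    for start in range(0, len(signal), window):
        occluded = (signal[:start]
                    + [0.0] * len(signal[start:start + window])
                    + signal[start + window:])
        importances.append(base - score_fn(occluded))
    return importances

# Toy score: sum of absolute amplitudes (a stand-in for a model logit).
score = lambda s: sum(abs(x) for x in s)
print(occlusion_importance([0.5, -1.0, 2.0, 0.1], score))
```

The same loop generalizes to time-frequency patches or channels; the EEG-specific question the review raises is how faithfully such drops track neuroscientific relevance.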
Models used in EEG XAI work include deep learning architectures (CNNs, RNNs, attention/transformers), classical machine learning, and hybrid pipelines combining feature extraction with classifiers.
Summary of modeling approaches reported across reviewed studies.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: model architectures applied to EEG tasks
The literature on EEG XAI covers tasks including seizure detection, sleep staging, brain–computer interfaces (BCI), cognitive/emotional state recognition, and diagnostic/supportive tools.
Descriptive review of topical coverage across surveyed papers; specific task categories enumerated in the review.
[high, null result] Explainable Artificial Intelligence (XAI) for EEG Analysis: ... | Outcome: task domains addressed by EEG XAI studies
Analyses were conducted as intent-to-treat comparisons across arms, with hypothesis tests reported (including p-values) and principal stratification used for mechanism decomposition.
Methods statement: intent-to-treat comparisons, reported p-values for score differences, and use of principal stratification for separating total effect into adoption and effectiveness channels in the randomized trial (n = 164).
[high, null result] Training for Technology: Adoption and Productive Use of Gene... | Outcome: analysis methods (ITT, hypothesis tests, principal stratification)
The primary outcomes analyzed were LLM adoption (use), exam score (grade points), and answer length.
Study’s stated primary outcomes in methods: adoption indicator, exam score on an issue-spotting exam, and answer length (measured). Sample size n = 164.
[high, null result] Training for Technology: Adoption and Productive Use of Gene... | Outcome: adoption; exam score; answer length
The study used a randomized controlled design with three arms: no LLM access, optional LLM access, and optional LLM access plus brief training.
Study methods description: randomized assignment of 164 law students to three experimental conditions as listed.
[high, null result] Training for Technology: Adoption and Productive Use of Gene... | Outcome: study design (randomization and arm definitions)
The intervention consisted of roughly a ten-minute training focused on how to use the LLM effectively.
Study description of the intervention in the randomized experiment (three-arm design with one arm receiving ~10-minute targeted training).
[high, null result] Training for Technology: Adoption and Productive Use of Gene... | Outcome: intervention duration/content (training implementation)
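The intent-to-treat logic of this design (compare outcomes by assigned arm, ignoring whether the LLM was actually used) can be sketched in a few lines. Arm labels and scores below are illustrative assumptions, not the study's data (n = 164):

```python
from statistics import mean

def itt_effects(records, control="no_llm"):
    """Mean outcome by *assigned* arm, and each arm's difference
    from the control arm -- the intent-to-treat estimand."""
    by_arm = {}
    for arm, score in records:
        by_arm.setdefault(arm, []).append(score)
    means = {arm: mean(scores) for arm, scores in by_arm.items()}
    return {arm: round(m - means[control], 2) for arm, m in means.items()}

# Illustrative records: (assigned arm, exam score).
records = [
    ("no_llm", 60), ("no_llm", 64),
    ("llm", 63), ("llm", 67),
    ("llm_training", 68), ("llm_training", 72),
]
print(itt_effects(records))
```

The principal-stratification step the card mentions goes further, splitting the ITT effect into an adoption channel and an effectiveness channel, which needs assumptions beyond this sketch.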
Empirical validation of the book’s proposals would require complementary case studies, model documentation, and outcome measurements.
Author/reviewer recommendation in the blurb about methodological limitations and next steps; not an empirical finding.
[high, null result] Governing The Future | Outcome: need for empirical case studies, documented models, and outcome metrics to valid...
The book is predominantly conceptual and policy-analytic and uses illustrative case vignettes rather than presenting a single empirical study.
Explicit methodological description in the Data & Methods blurb: synthesis of technical ideas, governance requirements, and illustrative vignettes; no empirical sample or experimental protocol described.
[high, null result] Governing The Future | Outcome: presence or absence of empirical methodology in the book
The paper proposes two conceptual models (AI/ML‑Driven Labor Market Transformation Model and Sectoral Impact and Resilience Model) to organize heterogeneous findings and generate testable hypotheses about how AI reshapes labor across sectors and skill levels.
Conceptual synthesis integrating Technological Determinism, Socio‑Technical Systems Theory (STS), and Skill‑Biased Technological Change (SBTC); the models are theoretical outputs of the review used to map mechanisms and heterogeneity rather than empirical findings.
[high, null result] The Impact of AI Machine Learning on Human Labor in the Work... | Outcome: conceptual mapping of mechanisms (task automation vs augmentation, sectoral expo...
There are substantial measurement and identification gaps in the literature: heterogeneity in measuring 'AI adoption', limited long‑run causal evidence, and geographic bias toward advanced economies.
Methodological assessment within the review noting variability across studies in AI measures (patents, investment, task exposure proxies), paucity of long‑run causal designs, and concentration of empirical studies in advanced economies; this is a meta‑evidence limitation statement.
[high, null result] The Impact of AI Machine Learning on Human Labor in the Work... | Outcome: quality and robustness of empirical evidence on AI's labor‑market impacts
Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.
Overall conclusion drawn from the empirical comparisons across datasets and analyses described in the paper (summary statement in abstract).
[high, positive] Beyond AI advice -- independent aggregation boosts human-AI ... | Outcome: overall decision-making performance / robustness / transparency
Using signal detection theory, the paper finds that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice.
Analysis in the paper applying signal detection theory to the empirical results (as stated in abstract).
[high, positive] Beyond AI advice -- independent aggregation boosts human-AI ... | Outcome: discriminability between correct and incorrect AI advice (signal detection metri...
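The signal-detection framing can be made concrete with d', the standardized gap between the rate of accepting correct advice and the rate of accepting incorrect advice. A minimal sketch; the hit and false-alarm rates below are illustrative assumptions, not the paper's estimates:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate); higher means better
    discrimination between correct and incorrect advice."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# E.g. following correct advice 70% of the time, but also following
# incorrect advice 55% of the time, yields a small d'.
print(round(d_prime(0.70, 0.55), 2))
```

A d' near zero is the paper's diagnosis in quantitative form: advisees cannot tell good advice from bad, so conditioning decisions on the advice adds little.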
The HCT also performed better in almost all cases in which the AI offered an explanation of its judgment.
Empirical results on the subset of four datasets with AI explanations (abstract reports HCT performed better in 'almost all' of these cases).
[high, positive] Beyond AI advice -- independent aggregation boosts human-AI ... | Outcome: decision accuracy when AI provides explanations
The HCT outperformed the AI-as-advisor approach in all datasets.
Empirical comparisons reported across the 10 datasets (statement in abstract that HCT 'outperformed' in all datasets). Specific performance metrics not provided in abstract.
[high, positive] Beyond AI advice -- independent aggregation boosts human-AI ... | Outcome: decision accuracy / task performance
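The contrast these cards draw is between advice-taking (humans see and may follow the AI's answer) and independent aggregation, where human judgments are formed without seeing the advice and only pooled with the AI's vote afterward. A sketch of the aggregation step, assuming a simple plurality rule; the paper's exact aggregation scheme is not specified in the abstract:

```python
from collections import Counter

def independent_aggregation(human_votes, ai_vote):
    """Pool independently formed human judgments with the AI's own
    judgment by plurality vote (no one saw the AI's answer first)."""
    votes = list(human_votes) + [ai_vote]
    return Counter(votes).most_common(1)[0][0]

# Three of four independent humans say "A"; the AI says "B".
print(independent_aggregation(["A", "A", "B", "A"], "B"))
```

Because each vote is independent, a wrong AI answer cannot anchor the humans; it is simply outvoted, which is the transparency and robustness advantage the abstract claims.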
An AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts.
Online experiment in which subjects provided written instructions (prompts) and revealed preferences via choices in a series of binary lottery questions; AI agents were given either the revealed-preference data or the stated-preference prompts and their prediction accuracy on subjects' choices was compared.
[high, positive] Should I State or Should I Show? Aligning AI with Human Pref... | Outcome: prediction accuracy of AI agent for subjects' choices
Under economy-wide deployment, the share of computer-vision-exposed labor compensation that is cost-effectively automatable rises sharply (relative to the firm-level 11% estimate).
Model counterfactuals or calibration scenarios comparing firm-level deployment vs economy-wide deployment; qualitative statement that share increases substantially.
[high, positive] Economics of Human and AI Collaboration: When is Partial Aut... | Outcome: share of labor compensation automatable under economy-wide deployment