The Commonplace

Evidence (4560 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Active filter: Productivity
The paper is a position/normative paper (not an empirical study), relying on conceptual analysis, literature synthesis, and prescriptive roadmapping rather than new quantitative experiments or datasets.
Explicit methodological statement in the paper summarizing genre and methods used; absence of reported original data or controlled evaluations.
high null result LLM Alignment should go beyond Harmlessness–Helpfulness and ... presence or absence of original empirical data / controlled evaluation in the pa...
There is a need for longitudinal and cross‑country empirical research to measure how hybrid work and AI tools affect promotion rates, network centrality, productivity, privacy harms, trust, and long‑term career trajectories.
Statement of research gaps derived from the paper's methodological approach (conceptual synthesis and secondary case studies) and absence of longitudinal/cross‑cultural primary data.
high null result The Sociology of Remote Work and Organisational Culture: How... research gap existence (need for longitudinal and cross‑country empirical studie...
Highly Autonomous Cyber-Capable Agents (HACCAs) are AI systems able to plan and execute multi-stage cyber campaigns across the full attack lifecycle with minimal or no human direction.
Conceptual definition provided in the report; constructed via literature review and threat-framework formulation (no empirical sample; definitional/analytic).
high null result Highly Autonomous Cyber-Capable Agents: Anticipating Capabil... agent autonomy across reconnaissance, exploitation, lateral movement, persistenc...
Potential risks of deploying such models include fairness/bias, privacy concerns from employee-level predictions, and adverse morale effects if interventions are unevenly applied.
Authors' discussion of risks and ethical considerations when applying predictive XAI models to employee data; this is a stated limitation/risk discussion rather than an empirical finding.
high null result Explainable AI for Employee Retention in Green Human Resourc... risk categories (fairness, privacy, morale)—qualitative concerns
Generalizability is limited: results based on the IBM dataset may differ for real green-workforce populations, industries, or countries.
Authors' stated limitation regarding external validity and representativeness of the IBM HR Analytics dataset as a proxy for sustainability roles.
high null result Explainable AI for Employee Retention in Green Human Resourc... external validity / generalizability
Counterfactual simulations reported are predictive rather than causal; estimated effects require causal validation (e.g., randomized trials) before large-scale policy rollout.
Authors' methodological caveat noting that simulation-based changes in model-predicted probabilities do not establish causality and recommending causal evaluation methods for policy adoption.
high null result Explainable AI for Employee Retention in Green Human Resourc... validity of counterfactual policy effect estimates (predictive vs causal)
The IBM HR Analytics dataset was used as a proxy for sustainability-focused (green) roles, relying on objective HR records rather than self-report surveys.
Data statement in the paper: model trained and evaluated on the IBM HR Analytics dataset; authors explicitly treat it as a proxy for sustainability-oriented roles for purposes of demonstration.
high null result Explainable AI for Employee Retention in Green Human Resourc... data source / representativeness (proxy use)
The study shifts retention analysis from descriptive correlations and surveys toward actionable, employee-level predictions and policy evaluation.
Combination of objective HR records (IBM dataset), predictive modeling (logistic regression), calibration, XAI tools (SHAP, LIME), and counterfactual policy simulations to evaluate intervention effects at individual and aggregate levels.
high null result Explainable AI for Employee Retention in Green Human Resourc... operationalization of predictive, actionable attrition estimates (methodological...
Local explainability (SHAP and LIME) can identify employee-specific intervention levers for targeted retention actions.
Use of SHAP and LIME for local explanations of individual predictions; counterfactual simulations applied at the employee level to estimate impact of feature changes on that employee's calibrated attrition probability.
high null result Explainable AI for Employee Retention in Green Human Resourc... employee-level change in predicted attrition probability (used to prioritize int...
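The employee-level counterfactual logic described above can be sketched in a few lines: score an employee with a calibrated-style logistic model, alter one feature, and report the shift in predicted attrition risk. Everything here (feature names, weights, values) is illustrative, not the paper's actual model, and the shift is predictive rather than causal, as the authors themselves caution.

```python
import math

# Hypothetical coefficients for a toy logistic attrition model; the paper's
# actual features and fitted weights are not reproduced here.
WEIGHTS = {"overtime": 1.2, "years_at_company": -0.15, "training_hours": -0.02}
BIAS = -0.5

def attrition_probability(employee: dict) -> float:
    """Predicted attrition probability from a logistic model (sketch)."""
    z = BIAS + sum(WEIGHTS[k] * employee[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual_effect(employee: dict, feature: str, new_value: float) -> float:
    """Change one feature and report the shift in predicted risk.
    Predictive only: it does not establish a causal intervention effect."""
    altered = {**employee, feature: new_value}
    return attrition_probability(altered) - attrition_probability(employee)

emp = {"overtime": 1.0, "years_at_company": 2.0, "training_hours": 10.0}
delta = counterfactual_effect(emp, "training_hours", 40.0)
print(f"baseline risk {attrition_probability(emp):.3f}, delta {delta:+.3f}")
```

In practice the local attributions that pick which feature to change would come from SHAP or LIME applied to the trained model, as the entry describes.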
Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.
Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.
high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended organisational and policy actions
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended methodological directions for future empirical and theoretical resea...
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
high null result Generative AI and the algorithmic workplace: a bibliometric ... methodological limitation (inability to infer causality from bibliometric mappin...
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
high null result Generative AI and the algorithmic workplace: a bibliometric ... number and thematic composition of conceptual clusters (six clusters linking tec...
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
high null result Generative AI and the algorithmic workplace: a bibliometric ... types of bibliometric analyses applied (performance trends, co‑word structures, ...
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
high null result Generative AI and the algorithmic workplace: a bibliometric ... size and timeframe of bibliometric corpus (number of publications, 2018–2025)
Outcomes reported are primarily self-reported psychological measures rather than objective productivity metrics.
Paper reports measurement instruments focused on self-reported self-efficacy, psychological ownership, meaningfulness, and enjoyment/satisfaction; no primary objective productivity metrics reported.
high null result Relying on AI at work reduces self-efficacy, ownership, and ... measurement type (self-reported psychological outcomes)
The experiment was pre-registered, used occupation-specific writing tasks, and employed a between-subjects design with three conditions (No-AI, Passive AI, Active collaboration).
Study design reported in the paper: pre-registration statement, N = 269, between-subjects assignment to three conditions using occupation-specific writing tasks.
high null result Relying on AI at work reduces self-efficacy, ownership, and ... n/a (methodological claim)
Active, collaborative AI use preserves perceived meaningfulness of work at levels comparable to independent work and does not produce the lasting psychological costs seen with passive use.
Pre-registered experiment (N = 269) with post-manipulation and post-return measures; Active-collaboration condition matched No-AI on meaningfulness and showed no persistent declines after returning to manual tasks.
high null result Relying on AI at work reduces self-efficacy, ownership, and ... perceived meaningfulness of work (including post-return)
Active, collaborative AI use preserves psychological ownership of outputs at levels comparable to independent work.
Pre-registered experiment (N = 269); Active-collaboration condition reported ownership levels similar to No-AI condition on self-report scales.
high null result Relying on AI at work reduces self-efficacy, ownership, and ... psychological ownership of outputs
Active, collaborative AI use (human drafts first, then uses AI to refine) preserves self-efficacy at levels comparable to independent (no-AI) work.
Pre-registered experiment (N = 269) comparing Active-collaboration and No-AI conditions; no statistically meaningful differences in self-efficacy between them (self-reported measures).
high null result Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy (confidence to complete tasks without AI)
The authors propose research priorities for economists: quantify productivity gains from closing the actionability gap; estimate firm-level heterogeneity in evaluation capability and its effect on adoption; and model investment trade-offs between building evaluation-to-action pipelines versus accepting reduced LLM performance.
Paper's concluding recommendations for future research directions (explicitly listed by the authors).
high null result Results-Actionability Gap: Understanding How Practitioners E... recommended research agenda topics
The paper's primary outputs are a taxonomy of ten evaluation practices, an articulation of the results-actionability gap, and strategies recommended on the basis of successful teams' practice.
Authors report these as the main outcomes of their thematic analysis and syntheses from the 19 interviews.
high null result Results-Actionability Gap: Understanding How Practitioners E... reported study outputs (taxonomy, articulated gap, recommended strategies)
The study method consisted of semi-structured qualitative interviews with 19 practitioners across multiple industries and roles, analyzed via thematic coding.
Explicit methods section of the paper stating sample size (n=19), participant diversity, interview approach, and coding/analysis procedure.
high null result Results-Actionability Gap: Understanding How Practitioners E... study design and sample size
AI-economics research should treat quantum capability as a distinct, gradually diffusing factor of production with sectoral specificity, and should model complementarities and policy counterfactuals endogenously.
Modeling recommendations grounded in sensitivity of macro outcomes to diffusion patterns, complementarities, and policy choices observed in the scenario and counterfactual analyses.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... quality of AI-economic forecasts and policy evaluation (model realism)
Model parameters are calibrated using historical diffusion of enabling technologies (cloud computing, GPUs, AI toolchains), industry case studies, and expert elicitation where hard data are lacking.
Empirical grounding section describing calibration sources: historical diffusion, case studies (materials discovery, optimization), and expert elicitation.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... calibrated model parameters (diffusion rates, adoption elasticities, complementa...
Uncertainty quantification is performed by running Monte Carlo or scenario ensembles and conducting sensitivity and robustness checks.
Methodological claim in the uncertainty quantification section describing Monte Carlo/scenario ensemble approach.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... sensitivity of results to parameter uncertainty; distribution of model outcomes
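The Monte Carlo ensemble approach described above amounts to drawing uncertain parameters from assumed distributions, running the model for each draw, and summarizing the resulting distribution of outcomes. A minimal sketch, with a placeholder functional form and parameter ranges that are assumptions rather than the paper's calibration:

```python
import random
import statistics

random.seed(0)

def output_gain(diffusion_rate: float, tfp_advantage: float, years: int = 10) -> float:
    """Toy outcome: logistic adoption share at `years` times a TFP uplift.
    Placeholder functional form; the paper's actual model is far richer."""
    adoption = 1.0 / (1.0 + pow(2.718281828, -diffusion_rate * (years - 5)))
    return adoption * tfp_advantage

# Ensemble over uncertain parameters (illustrative ranges, not calibrated).
draws = [output_gain(random.uniform(0.2, 1.0), random.gauss(0.03, 0.01))
         for _ in range(10_000)]
draws_sorted = sorted(draws)
mean = statistics.mean(draws)
lo, hi = draws_sorted[500], draws_sorted[9500]  # approximate 90% interval
print(f"mean gain {mean:.4f}, 90% interval [{lo:.4f}, {hi:.4f}]")
```

Sensitivity checks then repeat the ensemble while holding one parameter fixed, to see which uncertainty dominates the spread.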
Sectoral TFP shocks are integrated into computational general equilibrium (CGE) or multi-sector growth models (and optionally DSGE variants) to simulate GDP, sector output, trade impacts, and labor reallocation.
Method section stating integration of sectoral TFP shocks into CGE/multi-sector growth models with optional DSGE short-run dynamics.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... GDP, sectoral output, trade flows, labor reallocation
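Before any CGE machinery, the first step of the integration above is mechanical: apply sector-specific Hicks-neutral TFP shocks to baseline sector outputs and aggregate. The sectors, shares, and uplift values below are assumptions for illustration; a real CGE or DSGE run would additionally endogenize prices, trade, and labor reallocation.

```python
# Toy multi-sector aggregation (assumed sectors, baseline shares, and shocks).
baseline = {"manufacturing": 40.0, "finance": 30.0, "logistics": 30.0}
tfp_shock = {"manufacturing": 0.04, "finance": 0.01, "logistics": 0.06}

# Hicks-neutral shock: scale each sector's output by (1 + its TFP uplift).
shocked = {s: y * (1.0 + tfp_shock[s]) for s, y in baseline.items()}
gdp_gain = sum(shocked.values()) / sum(baseline.values()) - 1.0
print(f"aggregate output gain: {gdp_gain:.4f}")  # prints 0.0370
```

The aggregate gain is just the share-weighted average of the sector shocks here; general-equilibrium feedbacks would move it away from that simple average.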
Sectoral adoption is translated into total factor productivity (TFP) shocks or sector-specific Hicks-neutral productivity improvements based on micro evidence of quantum advantages.
Methodological description of productivity mapping linking adoption to TFP shocks using micro evidence and case studies.
The paper uses empirical diffusion functions (logistic/S-curve, Bass model) calibrated to analogous technologies to project uptake over time.
Methodological description: diffusion modeling section explicitly states use of logistic/S-curve and Bass models and calibration to past technologies (cloud, GPUs).
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... projected adoption curves over time
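The Bass model named above has a closed-form cumulative adoption curve, F(t) = (1 - e^-(p+q)t) / (1 + (q/p) e^-(p+q)t). A minimal sketch using common illustrative coefficients (p, q values here are textbook defaults, not calibrated to quantum or cloud/GPU uptake):

```python
import math

def bass_cumulative(t: float, p: float = 0.03, q: float = 0.38) -> float:
    """Cumulative adoption share at time t under the Bass diffusion model.
    p = innovation (external) coefficient, q = imitation (internal) coefficient;
    these defaults are illustrative, not fitted to any dataset."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# The curve is S-shaped: slow start, steep middle, saturation near 1.
curve = [bass_cumulative(t) for t in range(0, 21)]
print([round(x, 3) for x in curve[::5]])
```

Calibration to analogous technologies means choosing p and q (or the logistic growth rate and midpoint) to fit observed historical uptake of cloud computing or GPUs, then reusing those parameters for projection.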
The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.
Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.
high null result Natural language processing in bank marketing: a systematic ... analytic techniques applied to article abstracts/text (embedding + dimensionalit...
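The embed-then-project pipeline above can be sketched without the heavy dependencies. This stand-in uses toy bag-of-words vectors in place of sentence-transformer embeddings and a power-iteration principal component in place of UMAP; both substitutions are mine, chosen only to show the pipeline's shape (dense vectors in, low-dimensional thematic coordinates out).

```python
import math
import random

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words embedding (the paper uses sentence-transformers)."""
    words = text.lower().split()
    return [words.count(w) / max(len(words), 1) for w in vocab]

def project_1d(vectors: list[list[float]]) -> list[float]:
    """Project onto the first principal component via power iteration
    (a crude stand-in for UMAP's low-dimensional thematic map)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    random.seed(1)
    w = [random.random() for _ in range(dim)]
    for _ in range(100):
        xw = [sum(r[i] * w[i] for i in range(dim)) for r in centered]
        w = [sum(xw[k] * centered[k][i] for k in range(len(centered)))
             for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        w = [c / norm for c in w]
    return [sum(r[i] * w[i] for i in range(dim)) for r in centered]

docs = ["chatbot marketing bank", "bank customer chatbot",
        "credit risk model", "risk model default"]
vocab = sorted({w for d in docs for w in d.split()})
coords = project_1d([embed(d, vocab) for d in docs])
# Thematically similar articles land near each other on the axis,
# which is what makes cluster identification and gap detection possible.
```

In the actual review, the clusters and gaps fall out of inspecting which regions of the 2-D UMAP map are dense (established themes) versus sparse (under-researched intersections such as NLP in bank marketing).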
The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.
Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.
high null result Natural language processing in bank marketing: a systematic ... methodological protocol adherence and sample size
Twenty‑seven papers study marketing in banking without using NLP methods.
PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on marketing in banking that do not use NLP
Seventy‑four papers study NLP in marketing more broadly (not specifically banking).
Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on NLP in marketing (general)
Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).
Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles focused on NLP in bank marketing
The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.
Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.
high null result Explore the Impact of Generative AI on Finance and Taxation external validity/generalizability of results
The analysis in the paper is primarily qualitative and descriptive; it does not empirically quantify AI’s effects on trade flows or welfare.
Explicit statement in the methods/data description noting a mixed qualitative approach (theoretical analysis, comparative legal analysis, case studies, scenario reasoning) and absence of empirical quantification.
high null result Path Analysis of Digital Economy and Reconstruction of Inter... empirical quantification of AI's effect on trade flows and welfare (not provided...
The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.
Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... study design/data type (qualitative, doctrinal, comparative; absence of primary ...
The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.
Methodological recommendation arising from the paper's normative and comparative analysis.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... recommended empirical metrics (processing time per case; appeal reversal rates; ...
The paper's argument is principally theoretical and prescriptive and requires empirical validation across domains and at scale.
Author-stated limitation in the Data & Methods section noting that the work is primarily conceptual and that empirical validation is needed.
high null result An Alternative Trajectory for Generative AI existence/absence of empirical validation (current lack of cross-domain, large-s...
Operationalizing DSS requires building domain ontologies/knowledge graphs, designing synthetic curricula, training compact domain models, benchmarking against monolithic LLMs, and measuring total cost-of-ownership (energy, latency, bandwidth, infrastructure).
Paper's recommended experimental and measurement agenda (procedural/methodological prescriptions); this is a proposed research plan rather than an empirical result.
high null result An Alternative Trajectory for Generative AI validation metrics proposed by the paper (benchmark performance, energy/inferenc...
Field experiments (A/B testing) and willingness-to-pay experiments are necessary to quantify monetary benefits, adoption curves, and optimal pricing for alignment capabilities.
Paper explicitly recommends these empirical approaches in the recommendations for economists and product teams; this is a methodological recommendation rather than an empirical finding.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... adoption rates, willingness-to-pay, retention, task completion differences acros...
Recommended evaluation directions include automatic metrics (embedding similarity, task success, turn counts), human evaluation (satisfaction, perceived collaboration), and A/B testing in deployed settings (latency, compute, retention).
Paper's explicit evaluation proposals and recommended metrics listed in the Data & Methods and Evaluation Directions sections; these are prescriptive recommendations rather than executed experiments.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... specified evaluation metrics (task success rate, turn counts, retention, latency...
The paper focuses on architecture and conceptual arguments rather than reporting large-scale empirical datasets or results.
Data & Methods section and overall document framing emphasize architecture description and proposed evaluations; explicitly notes absence of large-scale empirical results in the provided summary.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... presence/absence of large-scale empirical evaluation
Alignment verification can be implemented using semantic embeddings (cosine similarity) or learned classifiers with threshold-based decision branching.
Paper describes these as recommended implementation approaches for the alignment verification component; no empirical benchmark comparing methods is reported.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... similarity scores, classifier accuracy, false positive/negative rates for drift ...
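The embedding-based variant of the verification step above reduces to a cosine similarity plus a threshold branch. A minimal sketch with hand-made vectors; the 0.7 threshold is an illustrative choice, and in the proposed system the vectors would be learned embeddings of the response and the aligned context.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def verify_alignment(response_vec: list[float], context_vec: list[float],
                     threshold: float = 0.7) -> str:
    """Threshold-based decision branch: pass the response through,
    or flag drift for correction."""
    return "aligned" if cosine(response_vec, context_vec) >= threshold else "drift"

print(verify_alignment([1.0, 0.9, 0.1], [1.0, 1.0, 0.0]))  # prints "aligned"
print(verify_alignment([0.0, 0.1, 1.0], [1.0, 1.0, 0.0]))  # prints "drift"
```

A learned classifier would replace `cosine` with a model score; the paper reports no benchmark comparing the two, so threshold and false-positive/negative trade-offs remain open evaluation questions.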
Temporal decay in the retrieval component can be modeled with functions such as exponential decay and a tunable half-life parameter applied to dialogue-turn embeddings.
Methodological description in the paper specifying temporal decay modeling options (exponential decay example) and tunable parameters; descriptive claim about intended implementation (no empirical comparison of decay functions provided).
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... decay parameter values / impact of decay function on retrieval weighting
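The exponential decay with a tunable half-life described above has a one-line form: a turn's weight halves every `half_life` turns. The half-life value below is an arbitrary example, not a recommendation from the paper.

```python
def decay_weight(age_turns: int, half_life: float = 4.0) -> float:
    """Retrieval weight of a past dialogue turn: halves every `half_life`
    turns. Equivalent to exp(-ln(2) * age / half_life); half_life is the
    tunable parameter the paper mentions."""
    return 0.5 ** (age_turns / half_life)

# Most recent turn (age 0) keeps full weight; older turns fade geometrically.
weights = [decay_weight(age) for age in range(0, 13, 4)]
print([round(w, 3) for w in weights])  # prints [1.0, 0.5, 0.25, 0.125]
```

These weights would multiply the similarity scores of stored turn embeddings at retrieval time, biasing the pre-processor toward recent context without discarding older turns outright.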
Research agenda items for economists include: quantifying willingness-to-pay for verifiable reasoning, studying labor-market impacts for validators, designing contracts/mechanisms to incentivize truthful argument provision, and evaluating regulatory interventions.
Paper's stated research and policy agenda; prescriptive rather than empirical.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... existence and prioritization of empirical research on WTP, labor impacts, mechan...
Evaluation currently lacks metrics and benchmarks for argument quality, fidelity, contestability, and human trust; developing these is necessary.
Paper notes the gap and proposes evaluation metrics and experimental designs; no new benchmarks introduced.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... availability and maturity of evaluation metrics and benchmarks
Evaluation metrics for the architecture should include sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed, lifelong retention, and safety/constraint adherence.
Explicit recommendations for evaluation metrics in the paper.
high null result Why AI systems don't learn and what to do about it: Lessons ... listed evaluation metrics (sample efficiency; generalization; robustness; autono...
This paper is a conceptual/theoretical architecture proposal rather than an empirical study; empirical validation should follow via suggested experiments.
Explicit statement in the paper about nature of contribution.
high null result Why AI systems don't learn and what to do about it: Lessons ... N/A (no empirical outcomes reported)
Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLM models.
Authors' caveats and limitations stated in the paper noting external validity concerns and the experimental context (role-play, short-term follow-up).
high null result Practicing with Language Models Cultivates Human Empathic Co... generalizability/external validity (not directly measured)