The Commonplace

Evidence (2432 claims)

Adoption
5126 claims
Productivity
4409 claims
Governance
4049 claims
Human-AI Collaboration
2954 claims
Labor Markets
2432 claims
Org Design
2273 claims
Innovation
2215 claims
Skills & Training
1902 claims
Inequality
1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
Filtered by: Labor Markets
This paper uses panel data on non-financial A-share companies listed in Shanghai and Shenzhen from 2010 to 2022 to study AI's effects.
Explicit data description in the paper (sample frame and period stated).
high null result THE IMPACT OF ARTIFICIAL INTELLIGENCE ON ENTERPRISE INCOME D... n/a (methodological/data claim)
Capital income taxes, worker equity participation, universal basic income, upskilling, and Coasian bargaining cannot eliminate the excess automation.
Model-based policy counterfactuals evaluated in the paper showing these interventions fail to achieve the social optimum in the theoretical framework; no empirical sample.
high null result The AI Layoff Trap effectiveness of listed policies at preventing excessive automation / preserving...
Wage adjustments and free entry cannot eliminate the excess automation.
Analytical result in the model showing endogenous wage changes and free entry do not restore the socially optimal level of employment; theoretical equilibrium analysis, no empirical data.
high null result The AI Layoff Trap ability of wage adjustments and free entry to correct excessive automation / res...
The research methodology is based on the envelope model ("input" orientation) to assess the level of transformation of labor resources and labor markets due to the spread of artificial intelligence.
Methodological statement in the paper specifying the use of an input-oriented envelope model applied to a sample of European Union countries.
high null result Artificial intelligence as a driver of economic growth: Chal... method of measurement / assessment approach
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
high null result AI, Productivity, and Labor Markets: A Review of the Empiric... aggregate employment / labor-market disruption
The analysis extends the dynamic taxation setup of Slavik and Yazici (2014).
Methodological claim: the model and solution approach build on and modify the framework from Slavik and Yazici (2014) (reference to prior theoretical framework rather than empirical data).
high null result Workers' Incentives and the Optimal Taxation of AI scope and structure of the theoretical model (extension of the referenced dynami...
We characterize the optimal tax policy in an economy with human manual and cognitive labor, physical capital, and artificial intelligence (AI).
Theoretical/analytical work: the paper develops and analyzes a dynamic general-equilibrium model that includes manual and cognitive human labor, physical capital, and AI. (No empirical sample; model-based characterization.)
high null result Workers' Incentives and the Optimal Taxation of AI form and properties of the optimal tax policy in the specified theoretical econo...
Potential risks of deploying such models include fairness/bias, privacy concerns from employee-level predictions, and adverse morale effects if interventions are unevenly applied.
Authors' discussion of risks and ethical considerations when applying predictive XAI models to employee data; this is a stated limitation/risk discussion rather than an empirical finding.
high null result Explainable AI for Employee Retention in Green Human Resourc... risk categories (fairness, privacy, morale)—qualitative concerns
Generalizability is limited: results based on the IBM dataset may differ for real green-workforce populations, industries, or countries.
Authors' stated limitation regarding external validity and representativeness of the IBM HR Analytics dataset as a proxy for sustainability roles.
high null result Explainable AI for Employee Retention in Green Human Resourc... external validity / generalizability
Counterfactual simulations reported are predictive rather than causal; estimated effects require causal validation (e.g., randomized trials) before large-scale policy rollout.
Authors' methodological caveat noting that simulation-based changes in model-predicted probabilities do not establish causality and recommending causal evaluation methods for policy adoption.
high null result Explainable AI for Employee Retention in Green Human Resourc... validity of counterfactual policy effect estimates (predictive vs causal)
The IBM HR Analytics dataset was used as a proxy for sustainability-focused (green) roles, relying on objective HR records rather than self-report surveys.
Data statement in the paper: model trained and evaluated on the IBM HR Analytics dataset; authors explicitly treat it as a proxy for sustainability-oriented roles for purposes of demonstration.
high null result Explainable AI for Employee Retention in Green Human Resourc... data source / representativeness (proxy use)
The study shifts retention analysis from descriptive correlations and surveys toward actionable, employee-level predictions and policy evaluation.
Combination of objective HR records (IBM dataset), predictive modeling (logistic regression), calibration, XAI tools (SHAP, LIME), and counterfactual policy simulations to evaluate intervention effects at individual and aggregate levels.
high null result Explainable AI for Employee Retention in Green Human Resourc... operationalization of predictive, actionable attrition estimates (methodological...
Local explainability (SHAP and LIME) can identify employee-specific intervention levers for targeted retention actions.
Use of SHAP and LIME for local explanations of individual predictions; counterfactual simulations applied at the employee level to estimate impact of feature changes on that employee's calibrated attrition probability.
high null result Explainable AI for Employee Retention in Green Human Resourc... employee-level change in predicted attrition probability (used to prioritize int...
Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.
Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.
high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended organisational and policy actions
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended methodological directions for future empirical and theoretical resea...
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
high null result Generative AI and the algorithmic workplace: a bibliometric ... methodological limitation (inability to infer causality from bibliometric mappin...
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
high null result Generative AI and the algorithmic workplace: a bibliometric ... number and thematic composition of conceptual clusters (six clusters linking tec...
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
high null result Generative AI and the algorithmic workplace: a bibliometric ... types of bibliometric analyses applied (performance trends, co‑word structures, ...
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
high null result Generative AI and the algorithmic workplace: a bibliometric ... size and timeframe of bibliometric corpus (number of publications, 2018–2025)
Research agenda: causal studies (panel data, quasi-experiments) are needed to estimate effects of AI exposure on employment outcomes and to evaluate retraining/income-support interventions for pre-retirement populations.
Authors’ stated recommendation based on limits of cross-sectional regression results from the n=889 survey and the identified need to move from association to causation.
Study limitations: cross-sectional design, self-reported intentions, potential unobserved confounders, and generalizability limited by a sample drawn from only three cities (Beijing, Guangzhou, Lanzhou).
Explicit methodological statements in the paper describing data and design: cross-sectional survey of 889 respondents from three cities and reliance on self-reported employment intentions.
The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.
Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.
high null result Natural language processing in bank marketing: a systematic ... analytic techniques applied to article abstracts/text (embedding + dimensionalit...
The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.
Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.
high null result Natural language processing in bank marketing: a systematic ... methodological protocol adherence and sample size
Twenty‑seven papers study marketing in banking without using NLP methods.
PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on marketing in banking that do not use NLP
Seventy‑four papers study NLP in marketing more broadly (not specifically banking).
Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on NLP in marketing (general)
Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).
Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles focused on NLP in bank marketing
The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.
Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.
high null result Explore the Impact of Generative AI on Finance and Taxation external validity/generalizability of results
The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.
Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... study design/data type (qualitative, doctrinal, comparative; absence of primary ...
The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.
Methodological recommendation arising from the paper's normative and comparative analysis.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... recommended empirical metrics (processing time per case; appeal reversal rates; ...
The work is qualitative and exploratory: it presents naturalistic phenomena rather than causal empirical estimates and is intended to be hypothesis-generating rather than definitive.
Methodology explicitly stated: naturalistic, qualitative daily observations over one month across multiple platforms; comparative observational documentation without experimental manipulation or causal identification.
high null result When Openclaw Agents Learn from Each Other: Insights from Em... nature of evidence (qualitative/exploratory vs. causal inference)
CoMAI is a modular, four-agent interview-assessment framework coordinated by a centralized finite-state machine.
System design and implementation described in the paper: a pipeline of four specialized agents (question generation, security/validation, scoring by rubric, summarization/reporting) with a centralized finite-state machine enforcing workflow and information flow constraints.
high null result CoMAI: A Collaborative Multi-Agent Framework for Robust and ... system architecture (agent decomposition and FSM coordination)
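As a rough illustration of the coordination pattern described in this entry, the sketch below wires four stub agents to a centralized finite-state machine. It is not the CoMAI implementation: the stage names, the strictly linear transition table, and the `Coordinator` class are all hypothetical, chosen only to show how a single state machine can enforce the order in which agents run.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names mirroring the four agents listed in the entry.
    QUESTION_GENERATION = auto()
    SECURITY_CHECK = auto()
    SCORING = auto()
    SUMMARY = auto()
    DONE = auto()


# Assumed linear workflow; the paper's actual transition rules are not given here.
TRANSITIONS = {
    Stage.QUESTION_GENERATION: Stage.SECURITY_CHECK,
    Stage.SECURITY_CHECK: Stage.SCORING,
    Stage.SCORING: Stage.SUMMARY,
    Stage.SUMMARY: Stage.DONE,
}


class Coordinator:
    """Central FSM: only the agent for the current state may run."""

    def __init__(self, agents):
        missing = set(TRANSITIONS) - set(agents)
        if missing:
            raise ValueError(f"no agent registered for: {missing}")
        self.agents = agents
        self.state = Stage.QUESTION_GENERATION
        self.trace = []

    def step(self, payload):
        if self.state is Stage.DONE:
            raise RuntimeError("workflow already finished")
        result = self.agents[self.state](payload)
        self.trace.append(self.state)
        self.state = TRANSITIONS[self.state]
        return result

    def run(self, payload):
        # Each agent receives only the previous agent's output.
        while self.state is not Stage.DONE:
            payload = self.step(payload)
        return payload


def make_stub(label):
    """Stub agent that appends its label to the payload list."""
    return lambda payload: payload + [label]


agents = {stage: make_stub(stage.name.lower()) for stage in TRANSITIONS}
coordinator = Coordinator(agents)
result = coordinator.run([])
```

Because the coordinator owns the state, an agent cannot be invoked out of turn, which is one way to read the "workflow and information flow constraints" mentioned above.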
Field experiments (A/B testing) and willingness-to-pay experiments are necessary to quantify monetary benefits, adoption curves, and optimal pricing for alignment capabilities.
Paper explicitly recommends these empirical approaches in the recommendations for economists and product teams; this is a methodological recommendation rather than an empirical finding.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... adoption rates, willingness-to-pay, retention, task completion differences acros...
Recommended evaluation directions include automatic metrics (embedding similarity, task success, turn counts), human evaluation (satisfaction, perceived collaboration), and A/B testing in deployed settings (latency, compute, retention).
Paper's explicit evaluation proposals and recommended metrics listed in the Data & Methods and Evaluation Directions sections; these are prescriptive recommendations rather than executed experiments.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... specified evaluation metrics (task success rate, turn counts, retention, latency...
The paper focuses on architecture and conceptual arguments rather than reporting large-scale empirical datasets or results.
Data & Methods section and overall document framing emphasize architecture description and proposed evaluations; explicitly notes absence of large-scale empirical results in the provided summary.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... presence/absence of large-scale empirical evaluation
Alignment verification can be implemented using semantic embeddings (cosine similarity) or learned classifiers with threshold-based decision branching.
Paper describes these as recommended implementation approaches for the alignment verification component; no empirical benchmark comparing methods is reported.
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... similarity scores, classifier accuracy, false positive/negative rates for drift ...
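The embedding-based option in this entry can be sketched as a cosine-similarity check followed by a threshold branch. This is a minimal illustration, not the paper's code: the function names, the plain-list vectors, and the default threshold of 0.7 are assumptions, and a real system would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as carrying no alignment signal
    return dot / (norm_a * norm_b)


def check_alignment(context_vec, response_vec, threshold=0.7):
    """Return ("aligned" | "drift", score); the threshold value is hypothetical."""
    score = cosine_similarity(context_vec, response_vec)
    label = "aligned" if score >= threshold else "drift"
    return label, score
```

For example, `check_alignment([1.0, 0.0], [0.0, 1.0])` takes the "drift" branch because orthogonal vectors have similarity 0. The learned-classifier alternative would replace `cosine_similarity` with a model score and keep the same branching.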
Temporal decay in the retrieval component can be modeled with functions such as exponential decay, using a tunable half-life parameter applied to dialogue-turn embeddings.
Methodological description in the paper specifying temporal decay modeling options (exponential decay example) and tunable parameters; descriptive claim about intended implementation (no empirical comparison of decay functions provided).
high null result A Context Alignment Pre-processor for Enhancing the Coherenc... decay parameter values / impact of decay function on retrieval weighting
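A minimal sketch of exponential decay with a half-life, assuming age is measured in dialogue turns and the decay weight multiplies a per-turn retrieval score. The function names and the choice to weight scores (rather than the embeddings themselves) are illustrative assumptions, not details from the paper.

```python
def decay_weight(turns_ago, half_life):
    """Weight that halves every `half_life` turns: 0.5 ** (age / half_life)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if turns_ago < 0:
        raise ValueError("turns_ago must be non-negative")
    return 0.5 ** (turns_ago / half_life)


def decayed_scores(similarities, half_life):
    """Apply decay to per-turn scores ordered from oldest to newest turn."""
    newest = len(similarities) - 1
    return [
        score * decay_weight(newest - index, half_life)
        for index, score in enumerate(similarities)
    ]
```

With a half-life of 3 turns, a turn that is 3 turns old keeps half its weight and one that is 6 turns old keeps a quarter; a shorter half-life makes retrieval favor recent turns more strongly.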
Research agenda items for economists include: quantifying willingness-to-pay for verifiable reasoning, studying labor-market impacts for validators, designing contracts/mechanisms to incentivize truthful argument provision, and evaluating regulatory interventions.
Paper's stated research and policy agenda; prescriptive rather than empirical.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... existence and prioritization of empirical research on WTP, labor impacts, mechan...
Evaluation currently lacks metrics and benchmarks for argument quality, fidelity, contestability, and human trust; developing these is necessary.
Paper notes the gap and proposes evaluation metrics and experimental designs; no new benchmarks introduced.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... availability and maturity of evaluation metrics and benchmarks
Evaluation metrics for the architecture should include sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed, lifelong retention, and safety/constraint adherence.
Explicit recommendations for evaluation metrics in the paper.
high null result Why AI systems don't learn and what to do about it: Lessons ... listed evaluation metrics (sample efficiency; generalization; robustness; autono...
This paper is a conceptual/theoretical architecture proposal rather than an empirical study; empirical validation should follow via suggested experiments.
Explicit statement in the paper about nature of contribution.
high null result Why AI systems don't learn and what to do about it: Lessons ... N/A (no empirical outcomes reported)
Suggested empirical research directions for AI economists include: comparing LLM performance and economic outcomes on rule‑encodable vs tacit tasks; quantifying performance decline when forcing LLMs into interpretable rule representations; studying contracting/pricing where buyers cannot verify internal rules; and measuring returns to scale attributable to tacit capabilities.
Explicitly enumerated recommended research agenda items in the paper; these are proposed studies rather than executed work.
high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... proposed empirical research topics and corresponding outcomes to measure
New metrics are needed to value tacit capabilities, e.g., measures of transfer, generalization under distribution shifts, ease of integrating with human workflows, and irreducibility to compressed rule representations.
Methodological recommendation in the paper listing specific metric categories for future empirical work.
high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... proposed metrics for assessing tacit LLM capabilities
Suggested empirical validations (not performed) include benchmarking LLMs versus rule systems on allegedly rule‑encodable tasks, attempting rule extraction and measuring fidelity loss, and compression/distillation studies to quantify irreducible task performance.
Recommendations and proposed experimental directions listed in the paper; these are proposals, not executed studies.
high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... types of empirical tests recommended for validating the thesis
The paper contains mostly qualitative and historically grounded empirical content and reports no primary datasets or large‑scale experimental results in support of the formal thesis.
Explicit declaration in the Data & Methods section that empirical content is qualitative/historical and no new datasets were collected.
high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... extent of empirical/quantitative evidence presented
The paper's core methodological approach is conceptual and theoretical argumentation (formal/logical proof, historical examples, and philosophical framing), not empirical experimentation.
Stated Data & Methods description indicating reliance on formal logic, historical case analysis, and philosophical argument; absence of primary datasets.
high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... presence/absence of empirical experiments in the paper
The LEAFE algorithmic procedure: summarize environment feedback into compact experience items; backtrack to earlier decision points causally linked to failures and re-explore corrective action branches; distill corrected trajectories into the policy via supervised fine-tuning.
Method section / algorithm description in paper specifying the reflective/backtracking and distillation pipeline as the core of LEAFE.
high null result Internalizing Agency from Reflective Experience N/A (algorithmic procedure description rather than an outcome)
Evaluation used seven benchmarks spanning online computer-use, offline computer-use, and multimodal tool-use reasoning tasks.
Benchmarks section in the summary states seven benchmarks covering those categories; no benchmark names or dataset sizes provided in the summary.
high null result Anticipatory Planning for Multimodal AI Agents benchmark task performance (task success, generalization)
Objectives combine trajectory-level rewards (for global consistency) with stepwise grounded rewards derived from execution outcomes.
Method summary explicitly lists these objectives as part of the TraceR1 training procedure.
high null result Anticipatory Planning for Multimodal AI Agents global plan consistency and stepwise execution outcomes
TraceR1 focuses on short-horizon trajectory forecasting to keep predictions tractable while capturing near-term consequences of actions.
Framework description in summary that emphasizes 'short-horizon trajectory forecasting' as a design choice.
high null result Anticipatory Planning for Multimodal AI Agents forecast horizon (short-horizon) / tractability of predictions
During grounded fine-tuning, tools are treated as frozen agents and only the policy is adjusted using execution feedback (tools are not modified).
Explicit statement in Data & Methods section of the summary describing tool handling during grounded fine-tuning.
high null result Anticipatory Planning for Multimodal AI Agents policy adaptation to tool execution feedback / tool-compatibility of executed ac...