The Commonplace
Direction, evidence grade, and study type are AI-generated labels (gpt-5-mini), not human-verified. Syntheses are LLM-written. "Tensions" are machine-detected candidates, not confirmed contradictions. A research-acceleration tool, not peer review.

Evidence (7278 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Adoption: 9047 claims
Productivity: 8066 claims
Governance: 7278 claims (active filter)
Human-AI Collaboration: 6912 claims
Org Design: 4439 claims
Innovation: 4359 claims
Labor Markets: 3652 claims
Skills & Training: 3018 claims
Inequality: 2160 claims

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome Positive Negative Mixed Null Total
Other 795 210 105 955 2131
Governance & Regulation 886 414 197 126 1654
Organizational Efficiency 826 204 129 87 1257
Technology Adoption Rate 681 259 128 110 1189
Research Productivity 464 138 65 349 1028
Output Quality 503 196 61 53 813
Decision Quality 351 180 84 51 673
AI Safety & Ethics 238 288 71 34 637
Firm Productivity 455 58 92 20 631
Market Structure 186 172 123 25 511
Task Allocation 222 70 76 34 407
Innovation Output 238 28 48 18 334
Skill Acquisition 177 62 62 17 318
Employment Level 107 57 108 13 287
Fiscal & Macroeconomic 135 72 44 26 284
Firm Revenue 172 50 28 5 256
Consumer Welfare 121 68 45 12 246
Task Completion Time 183 33 10 13 240
Inequality Measures 45 126 50 6 227
Worker Satisfaction 95 74 23 12 204
Error Rate 77 98 11 4 190
Regulatory Compliance 84 73 17 7 181
Automation Exposure 61 61 27 14 166
Training Effectiveness 98 21 14 19 154
Wages & Compensation 78 37 25 6 146
Developer Productivity 105 18 14 6 144
Team Performance 87 17 28 10 143
Job Displacement 12 83 23 1 119
Hiring & Recruitment 53 8 8 3 72
Social Protection 39 17 8 2 66
Creative Output 32 20 8 3 64
Skill Obsolescence 5 50 6 1 62
Labor Share of Income 17 20 17 54
Worker Turnover 15 15 3 33
Industry 1 1
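
The rows above are flattened to one line each, so anyone who wants to work with the counts has to split them back into columns. The following is a minimal Python sketch, assuming the column order given in the header row (Positive, Negative, Mixed, Null, Total). The names `parse_row` and `label_counts` are hypothetical helpers, not part of the site. Rows with fewer than five numbers (Labor Share of Income, Worker Turnover, Industry) have blank cells whose position cannot be recovered from the text, so the sketch leaves them unlabeled rather than guessing.

```python
COLUMNS = ["Positive", "Negative", "Mixed", "Null", "Total"]

def parse_row(line):
    """Split a flattened row into (outcome name, list of integer counts).

    The outcome name may contain spaces, so numbers are peeled off
    from the right-hand end of the line until a non-number is reached.
    """
    tokens = line.split()
    counts = []
    while tokens and tokens[-1].isdigit():
        counts.append(int(tokens.pop()))
    counts.reverse()
    return " ".join(tokens), counts

def label_counts(counts):
    """Map counts to column names, or return None when cells are missing.

    With fewer than five numbers there is no way to tell from the text
    which column was blank, so no assignment is attempted.
    """
    if len(counts) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, counts))

name, counts = parse_row("Governance & Regulation 886 414 197 126 1654")
labeled = label_counts(counts)
```

The four direction counts need not add up to Total: for Other they sum to 2065 against a Total of 2131, so the sketch does not assume that they do.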
Active filter: Governance
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended methodological directions for future empirical and theoretical resea...
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
high null result Generative AI and the algorithmic workplace: a bibliometric ... methodological limitation (inability to infer causality from bibliometric mappin...
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
high null result Generative AI and the algorithmic workplace: a bibliometric ... number and thematic composition of conceptual clusters (six clusters linking tec...
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
high null result Generative AI and the algorithmic workplace: a bibliometric ... types of bibliometric analyses applied (performance trends, co‑word structures, ...
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
high null result Generative AI and the algorithmic workplace: a bibliometric ... size and timeframe of bibliometric corpus (number of publications, 2018–2025)
Research agenda: causal studies (panel data, quasi-experiments) are needed to estimate effects of AI exposure on employment outcomes and to evaluate retraining/income-support interventions for pre-retirement populations.
Authors’ stated recommendation based on limits of cross-sectional regression results from the n=889 survey and the identified need to move from association to causation.
Study limitations: cross-sectional design, self-reported intentions, potential unobserved confounders, and limited generalizability beyond the three surveyed cities (Beijing, Guangzhou, Lanzhou).
Explicit methodological statements in the paper describing data and design: cross-sectional survey of 889 respondents from three cities and reliance on self-reported employment intentions.
The paper identifies future research directions, including empirical causal studies on how DPP+AI interventions change recycling rates, second‑hand market prices, and firm investment in circular processes; and modeling firm strategy around proprietary vs shared DPP data.
Stated research agenda and gaps in the paper informed by the study's findings and limitations; these are recommendations rather than empirical claims.
high null result Integrating knowledge management and digital product passpor... proposed empirical and modeling research outcomes (not measured in current study...
The study used a mixed-methods design focused on the Italian fashion and cosmetics industries, employing two online surveys, k‑means clustering (consumer segmentation), principal component analysis (to identify underlying dimensions of DPP functionalities and sustainability practices), and logistic regression (to identify adoption drivers).
Methods section summary provided in the paper; explicit statement of methods and industry context. Note: sample sizes and survey instrument details are not provided in the summary.
high null result Integrating knowledge management and digital product passpor... methodological descriptors (survey-based measurements, clustering, PCA, regressi...
Two consumer segments were identified: 'aware' consumers (environmentally attuned and receptive to digital innovation and sustainability information) and 'unaware' consumers (prioritize immediate, tangible benefits like price and convenience over sustainability information).
K‑means cluster analysis applied to consumer responses from one of the online surveys in the Italian fashion and cosmetics context; summary identifies two clusters; sample sizes not reported.
high null result Integrating knowledge management and digital product passpor... consumer segmentation / cluster membership (attitudes and preferences toward sus...
This work is a conceptual/policy analysis rather than an original empirical study.
Explicit statement in the paper's Data & Methods section.
high null result A golden opportunity: Corporate sustainability reporting as ... study design/type (conceptual/policy analysis)
Study limitations include single-country (China) listed‑firm sample and reliance on secondary/administrative proxies for digitalization and innovation, which may miss internal qualitative aspects and introduce measurement error.
Authors’ stated limitations: sample restricted to Chinese A-share listed firms (2012–2022) and measures of digitalization/innovation derived from administrative/secondary data rather than direct observation/survey of internal practices.
high null result Supply Chain Digitalization and its Impact on Green Innovati... external validity and measurement quality of SCD and innovation proxies
No new primary empirical tests were performed in this paper; conclusions are based on secondary analysis and are broad and diagnostic rather than demonstrating causal mechanisms.
Explicit methodological statement in the Data & Methods and Limitations sections of the paper describing it as a qualitative literature review and synthesis.
high null result SUSTAINABILITY ISSUES IN FINANCIAL ACCOUNTING RESEARCH presence/absence of new primary empirical evidence in this paper
Research should prioritize causal identification (IV, difference‑in‑differences, regression discontinuity) to disentangle whether ESG causes better financial outcomes or instead proxies for unobserved firm quality.
Methodological recommendation based on limitations in the reviewed literature (many observational/correlational studies); the paper argues for stronger causal designs going forward.
high null result SUSTAINABILITY ISSUES IN FINANCIAL ACCOUNTING RESEARCH causal effect of ESG on financial outcomes (causal identification quality)
The authors propose research priorities for economists: quantify productivity gains from closing the actionability gap; estimate firm-level heterogeneity in evaluation capability and its effect on adoption; and model investment trade-offs between building evaluation-to-action pipelines versus accepting reduced LLM performance.
Paper's concluding recommendations for future research directions (explicitly listed by the authors).
high null result Results-Actionability Gap: Understanding How Practitioners E... recommended research agenda topics
The paper produces as primary outcomes a taxonomy of ten evaluation practices, the articulation of the results-actionability gap, and recommended strategies observed among successful teams.
Authors report these as the main outcomes of their thematic analysis and syntheses from the 19 interviews.
high null result Results-Actionability Gap: Understanding How Practitioners E... reported study outputs (taxonomy, articulated gap, recommended strategies)
The study method consisted of semi-structured qualitative interviews with 19 practitioners across multiple industries and roles, analyzed via thematic coding.
Explicit methods section of the paper stating sample size (n=19), participant diversity, interview approach, and coding/analysis procedure.
high null result Results-Actionability Gap: Understanding How Practitioners E... study design and sample size
AI-economics research should treat quantum capability as a distinct, gradually diffusing factor of production with sectoral specificity, and should model complementarities and policy counterfactuals endogenously.
Modeling recommendations grounded in sensitivity of macro outcomes to diffusion patterns, complementarities, and policy choices observed in the scenario and counterfactual analyses.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... quality of AI-economic forecasts and policy evaluation (model realism)
Model parameters are calibrated using historical diffusion of enabling technologies (cloud computing, GPUs, AI toolchains), industry case studies, and expert elicitation where hard data are lacking.
Empirical grounding section describing calibration sources: historical diffusion, case studies (materials discovery, optimization), and expert elicitation.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... calibrated model parameters (diffusion rates, adoption elasticities, complementa...
Uncertainty quantification is performed by running Monte Carlo or scenario ensembles and conducting sensitivity and robustness checks.
Methodological claim in the uncertainty quantification section describing Monte Carlo/scenario ensemble approach.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... sensitivity of results to parameter uncertainty; distribution of model outcomes
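
The Monte Carlo approach named in this entry can be illustrated with a small Python sketch. It is not the paper's model: the gain formula, the uniform parameter ranges, and the 20% sector share are all hypothetical placeholders chosen only to show the mechanics of drawing uncertain inputs many times and summarizing the spread of outcomes.

```python
import random
import statistics

def simulate_gain(rng):
    """One draw of an aggregate output gain under uncertain inputs.

    All ranges below are illustrative assumptions, not calibrated values.
    """
    adoption = rng.uniform(0.05, 0.40)   # share of the sector that adopts
    tfp_shock = rng.uniform(0.01, 0.10)  # productivity gain for adopters
    sector_share = 0.20                  # sector's share of total output
    return sector_share * adoption * tfp_shock

def run_ensemble(n_draws=10_000, seed=0):
    """Repeat the draw and report the 5th percentile, median, and 95th percentile."""
    rng = random.Random(seed)            # fixed seed keeps the run reproducible
    draws = [simulate_gain(rng) for _ in range(n_draws)]
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"p5": cuts[0], "median": statistics.median(draws), "p95": cuts[-1]}

summary = run_ensemble()
```

A sensitivity check in the same spirit would rerun `run_ensemble` with one input range narrowed or shifted and compare the resulting intervals.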
Sectoral TFP shocks are integrated into computable general equilibrium (CGE) or multi-sector growth models (and optionally DSGE variants) to simulate GDP, sector output, trade impacts, and labor reallocation.
Method section stating integration of sectoral TFP shocks into CGE/multi-sector growth models with optional DSGE short-run dynamics.
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... GDP, sectoral output, trade flows, labor reallocation
Sectoral adoption is translated into total factor productivity (TFP) shocks or sector-specific Hicks-neutral productivity improvements based on micro evidence of quantum advantages.
Methodological description of productivity mapping linking adoption to TFP shocks using micro evidence and case studies.
The paper uses empirical diffusion functions (logistic/S-curve, Bass model) calibrated to analogous technologies to project uptake over time.
Methodological description: diffusion modeling section explicitly states use of logistic/S-curve and Bass models and calibration to past technologies (cloud, GPUs).
high null result Modeling Macroeconomic Output Gains from Quantum-Driven Prod... projected adoption curves over time
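
The two diffusion curves named in this entry have standard closed forms, sketched below in Python. The Bass model gives cumulative adoption as F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), where p is the coefficient of innovation and q the coefficient of imitation. The logistic curve is 1 / (1 + e^{-r(t - t0)}). The parameter values used here are illustrative placeholders, not the paper's calibration, and the function names are hypothetical.

```python
import math

def bass_cumulative(t, p, q):
    """Cumulative adoption share F(t) under the Bass diffusion model.

    p: coefficient of innovation (external influence), p > 0
    q: coefficient of imitation (internal influence), q >= 0
    """
    if t <= 0:
        return 0.0
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)

def logistic_share(t, rate, midpoint):
    """Logistic S-curve: adoption share reaches 0.5 at t == midpoint."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

# Illustrative 20-period adoption path with placeholder parameters.
path = [bass_cumulative(t, p=0.03, q=0.38) for t in range(21)]
```

Calibrating to an analogous technology would mean choosing p and q (or rate and midpoint) so that the curve fits that technology's observed adoption series; that fitting step is not shown here.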
The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.
Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.
high null result Natural language processing in bank marketing: a systematic ... analytic techniques applied to article abstracts/text (embedding + dimensionalit...
The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.
Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.
high null result Natural language processing in bank marketing: a systematic ... methodological protocol adherence and sample size
Twenty‑seven papers study marketing in banking without using NLP methods.
PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on marketing in banking that do not use NLP
Seventy‑four papers study NLP in marketing more broadly (not specifically banking).
Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on NLP in marketing (general)
Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).
Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.
high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles focused on NLP in bank marketing
The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.
Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.
high null result Explore the Impact of Generative AI on Finance and Taxation external validity/generalizability of results
The methodology is normative-philosophical argumentation supplemented by interdisciplinary synthesis (phenomenology, deconstruction, OOO, STS/material turn); this is not an empirical causal study and contains no quantitative datasets.
Author-declared methods and limits: statement that the intervention is theory-driven and qualitative; absence of quantitative analysis reported.
high null result Examining ethical challenges in human–robot interaction usin... study type and presence/absence of quantitative data (methodological)
The paper’s empirical grounding consists of illustrative case studies and vignettes from healthcare robotics, autonomous vehicles, and algorithmic governance used to demonstrate distributed agency and responsibility.
Author-stated methodology: qualitative vignettes/case illustrations across three domains; no reported sample sizes or systematic data collection.
high null result Examining ethical challenges in human–robot interaction usin... use of illustrative case material (methodological/descriptive)
The analysis in the paper is primarily qualitative and descriptive; it does not empirically quantify AI’s effects on trade flows or welfare.
Explicit statement in the methods/data description noting a mixed qualitative approach (theoretical analysis, comparative legal analysis, case studies, scenario reasoning) and absence of empirical quantification.
high null result Path Analysis of Digital Economy and Reconstruction of Inter... empirical quantification of AI's effect on trade flows and welfare (not provided...
The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.
Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... study design/data type (qualitative, doctrinal, comparative; absence of primary ...
The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.
Methodological recommendation arising from the paper's normative and comparative analysis.
high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... recommended empirical metrics (processing time per case; appeal reversal rates; ...
Analysis compared responses across 16 predefined dimension pairs (ethical dimensions or response axes) and used repeated measures and qualitative coding to characterize system behavior.
Methods and Analysis sections reporting use of 16 dimension-pair comparisons, repeated-measures tests for delta between blind and declared administrations, and qualitative coding to derive D3 failure taxonomy.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... analytic procedures applied (16 dimension pairs; repeated measures; qualitative ...
Probe administration included operational controls: runs were administered by two human raters across three machines to ensure operational consistency.
Methods statement describing administration by two human raters on three machines.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... operational administration procedure (two human raters, three machines)
The ceiling discrimination probe used Gemini Pro (Google) and Copilot Pro (Microsoft) as independent judges.
Methods: reported use of Gemini Pro and Copilot Pro as independent judges for the ceiling probe.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... agents used for ceiling-probe adjudication (Gemini Pro, Copilot Pro)
Primary blind scoring was performed by Claude (Anthropic) used as an LLM judge.
Methods: primary blind scoring explicitly performed by Claude.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... agent used for primary blind scoring (Claude)
Re-administration under declared conditions produced zero delta across all 16 dimension-pair comparisons (no measurable change when declaration status changed).
Reported repeated-measures comparisons across 16 predefined dimension pairs between blind and declared administrations, with reported zero delta.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... difference (delta) in scores across 16 dimension-pair comparisons between blind ...
Series 2 consisted of local and API open-source systems (n = 6) administered blind and declared, with four systems re-administered under declared conditions.
Methods description detailing Series 2 composition, modes (blind and declared), and that four systems were re-tested under declared conditions.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... count of systems in Series 2 (n=6) and number re-administered under declared con...
Series 1 consisted of frontier commercial systems administered blind (n = 7).
Methods description specifying Series 1 composition and blind administration.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... count of systems in Series 1 (n=7) and administration mode (blind)
The study employed 24 experimental conditions spanning 13 distinct LLM systems across two series.
Study design reported in Methods: Series 1 (frontier commercial, blind, n=7), Series 2 (local/API open-source, blind and declared, n=6), plus re-administered declared runs and ceiling-probe runs summing to 24 conditions.
high null result Literary Narrative as Moral Probe : A Cross-System Framework... number of experimental conditions and distinct systems tested (study scope)
The experiment used NYSE TAQ transaction and quote data for SPY covering 2015–2024 and tested six pre-specified hypotheses about market-quality trends.
Data and methods section specifying dataset (NYSE TAQ SPY, 2015–2024), the number of pre-specified hypotheses (six), and experimental protocol with 150 autonomous agents.
high null result Nonstandard Errors in AI Agents dataset and experimental design variables (data coverage, number of hypotheses t...
Agents' methodological choices and resulting effect estimates were systematically recorded and used to quantify dispersion and measure switching across stages.
Study design description: recorded agents' methodological choices (measure selection, estimation procedures), resulting estimates, and tracked switching and dispersion metrics (IQR) across the three-stage protocol applied to SPY TAQ data (2015–2024) with 150 agents.
high null result Nonstandard Errors in AI Agents recorded methodological choices (categorical), effect estimates (continuous), di...
AI peer review (agents exchanging written critiques) produced minimal reduction in dispersion of estimates.
Three-stage protocol: after stage 1 (independent analyses) and stage 2 (AI peer review), measured dispersion (e.g., IQR) across agents showed little change following the peer-review stage across the six hypotheses and agent pool (n=150).
high null result Nonstandard Errors in AI Agents change in dispersion (IQR) of estimates between independent-analysis stage and p...
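
The dispersion comparison in this entry can be illustrated with a short Python sketch, assuming dispersion is measured as the interquartile range (IQR) of agents' estimates for one hypothesis. The quantile convention (`method="inclusive"`) and the function names are assumptions, and the sample numbers are made up for illustration; they are not the study's data.

```python
import statistics

def iqr(estimates):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    return q3 - q1

def dispersion_change(before, after):
    """Change in IQR between two stages; negative means estimates converged."""
    return iqr(after) - iqr(before)

# Made-up estimates from five agents at two stages of a protocol.
stage_1 = [1, 2, 3, 4, 5]
stage_2 = [2, 2.5, 3, 3.5, 4]
change = dispersion_change(stage_1, stage_2)
```

A change close to zero would correspond to the "minimal reduction in dispersion" pattern the entry describes for the peer-review stage.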
The work is qualitative and exploratory: it presents naturalistic phenomena rather than causal empirical estimates and is intended to be hypothesis-generating rather than definitive.
Methodology explicitly stated: naturalistic, qualitative daily observations over one month across multiple platforms; comparative observational documentation without experimental manipulation or causal identification.
high null result When Openclaw Agents Learn from Each Other: Insights from Em... nature of evidence (qualitative/exploratory vs. causal inference)
Future empirical work should measure calibration (user trust vs. model accuracy), hallucination rate, user comprehension of capability limits, and behavioral dependence on system recommendations.
Explicit methodological recommendations and suggested metrics in the paper; these are proposed future measurements rather than reported findings.
high null result Why We Need to Destroy the Illusion of Speaking to A Human: ... calibration metrics, hallucination rates, user comprehension, behavioral depende...
Conversational AI differs from interpersonal conversation: it has no true beliefs/intentions or accountability and produces probabilistic, sometimes inconsistent outputs with opaque training/data provenance.
Analytical/distinctive claim based on properties of LLMs and machine learning models discussed in the paper; conceptual analysis, no empirical testing.
high null result Why We Need to Destroy the Illusion of Speaking to A Human: ... ontological status of AI outputs (beliefs/intentions/accountability) and propert...
Research agenda items for economists include: quantifying willingness-to-pay for verifiable reasoning, studying labor-market impacts for validators, designing contracts/mechanisms to incentivize truthful argument provision, and evaluating regulatory interventions.
Paper's stated research and policy agenda; prescriptive rather than empirical.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... existence and prioritization of empirical research on WTP, labor impacts, mechan...
Evaluation currently lacks metrics and benchmarks for argument quality, fidelity, contestability, and human trust; developing these is necessary.
Paper notes the gap and proposes evaluation metrics and experimental designs; no new benchmarks introduced.
high null result Argumentative Human-AI Decision-Making: Toward AI Agents Tha... availability and maturity of evaluation metrics and benchmarks