The Commonplace
Home Dashboard Papers Evidence Digests 🎲

Evidence (4560 claims)

Adoption
5267 claims
Productivity
4560 claims
Governance
4137 claims
Human-AI Collaboration
3103 claims
Labor Markets
2506 claims
Innovation
2354 claims
Org Design
2340 claims
Skills & Training
1945 claims
Inequality
1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Clear
Productivity Remove filter
Results are sensitive to model and prompt choice; researchers should perform robustness checks across LLMs, soft prompts, and embedding models.
Caveat explicitly stated in the paper summary noting model and prompt sensitivity; recommended validation steps include robustness checks across models and prompts.
high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... sensitivity of clustering/labeling results to LLM, prompt design, and embedding ...
Measurement issues (task-based output measurement, attributing output changes to AI) and selection into early adoption bias estimated productivity gains upward.
Methodological robustness checks reported in the paper: task-based measures, bounding exercises, placebo tests, and analysis of pre-trends; discussions of selection on unobservables and potential upward bias.
high negative S-TCO: A Sustainable Teacher Context Ontology for Educationa... validity/bias of estimated productivity effects
Implementing the governed hyperautomation pattern raises upfront costs (governance tooling, monitoring, validation, compliance processes).
Economic and cost-structure discussion in the paper, based on qualitative reasoning and industry experience; no quantified cost estimates or sample-based cost analysis provided.
high negative Governed Hyperautomation for CRM and ERP: A Reference Patter... upfront implementation costs (governance tooling, validation, compliance overhea...
VIS inherits the limitations of input–output assumptions (fixed coefficients, no price feedbacks); AI-driven structural change may violate those assumptions, so dynamic extensions or calibration are needed.
Paper explicitly cautions about input–output model limitations and the need for dynamic extensions/calibration under structural/technological change.
high negative Measuring labor productivity dynamics in U.S. industrial and... validity/applicability of VIS estimates under structural/AI-driven change
Increases in K_T reduce employment levels in affected firms and industries even when aggregate productivity rises.
Panel econometric estimates at firm and industry levels relating K_T intensity to employment outcomes, controlling for demand, input prices, and firm characteristics; difference-in-differences specifications and instrumental-variable robustness checks; corroborated by sectoral case studies.
high negative The Macroeconomic Transition of Technological Capital in the... employment (firm- and industry-level employment counts or employment growth)
Rising technological capital (K_T) — proxied by robot/automation density, software and intangible capital accumulation, AI adoption surveys, and AI-related patenting — leads to a decline in labor’s share of output.
Firm- and industry-level panel regressions linking constructed K_T intensity measures to labor shares, supported by macro growth-accounting decompositions; robustness checks include difference-in-differences and instrumenting adoption with plausibly exogenous shocks (e.g., cross-border technology diffusion, trade shocks); validated with cross-country comparisons and case studies.
high negative The Macroeconomic Transition of Technological Capital in the... labor share of income (share of output paid to labor)
Fuel subsidy reform imposed an enormous fiscal burden that peaked at 2.8% of GDP in 2022, limiting the macroeconomic leverage of AI-driven efficiency gains.
Reported fiscal statistic in the paper (2.8% of GDP in 2022) and its role in analysis of why AI savings do not translate into large macro gains.
high negative (constraint) AI-Based Technological Transformation as a Driver for Develo... fiscal burden of fuel subsidies (% of GDP) and its moderating effect on GDP/trad...
The oil and gas trade balance remained in deficit at -1.55 billion USD in May 2025 and -1.58 billion USD in July 2025 despite an overall national trade surplus.
Reported trade-balance figures in the paper (monthly trade statistics for May and July 2025).
high negative (deficit persists) AI-Based Technological Transformation as a Driver for Develo... oil & gas trade balance (USD, monthly values)
The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and implemented in computer vision.
Calibration and empirical implementation using O*NET, a domain expert survey (n=3,778), and GPT-4o task decompositions; applied to computer vision tasks.
high neutral Economics of Human and AI Collaboration: When is Partial Aut... validity of calibration / empirical grounding of the framework
We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.
New metric proposed in the paper (entropy-based task complexity) and mapping procedure from accuracy to substitution ratio; implemented in the framework.
high neutral Economics of Human and AI Collaboration: When is Partial Aut... labor substitution ratio (human labor displaced per unit accuracy)
Costinot and Werning (2023) develop a sufficient-statistic approach and find optimal technology taxes of 1–3.7% on robots.
Citation reported in the paper summarizing Costinot and Werning (2023)'s quantitative sufficient-statistic estimate.
high neutral NBER WORKING PAPER SERIES optimal robot tax rate
Guerreiro et al. (2022) characterize optimal Mirrleesian tax system with automation and find that robot taxes should be transitional—high when incumbent workers cannot retrain, converging to zero as new cohorts adjust skill investments.
Citation reported in the paper summarizing Guerreiro et al. (2022)'s theoretical result on transitional robot taxes.
high neutral NBER WORKING PAPER SERIES optimal robot tax path over time
If labor becomes economically redundant, the policy focus shifts from steering innovation to redesigning public finance and redistribution (e.g., new tax instruments, redistribution mechanisms).
Theoretical scenario analysis in the paper with references to related works (Korinek and Juelfs 2024; Korinek and Lockwood 2026).
high neutral NBER WORKING PAPER SERIES policy priority shift (steering -> public finance/redistribution)
Evaluation is carried out under three frozen context configurations (diff only: config_A; diff with file content: config_B; full context: config_C) enabling systematic ablation of context provision strategies.
Methodological description: three fixed context configurations defined and used for ablation experiments.
high neutral SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... effect of context-provision design on model performance
Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.
Description of experimental/evaluation setup in the paper: macroscopic evaluation via Fundamental Diagram across varied scenario parameters. No numeric sample size provided in the claim text.
high neutral Macroscopic Characteristics of Mixed Traffic Flow with Deep ... traffic performance (via Fundamental Diagram) under varied heterogeneity and RL ...
CriQ is a sister app to Dream11, India's largest fantasy sports platform with over 250 million users.
Descriptive statement in the paper providing context about the application domain and user base.
We performed an extensive evaluation of 37 state-of-the-art Vision-Language Models on MultihopSpatial.
Empirical evaluation described in the paper listing the number of models evaluated (37).
high neutral MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... benchmark coverage across models evaluated
The paper treats data as a new type of production factor and endogenizes it within the production function.
Theoretical/methodological: the paper constructs a macro-level theoretical model that explicitly includes data as an endogenous input in the production function (no empirical/sample data).
high neutral Study on the impact of big data sharing on individuals’ welf... inclusion of data as a production factor (model specification)
Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.
Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... comprehensiveness of economic evaluations (inclusion of externalities vs. narrow...
Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.
Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size—qualitative/technical review.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... underlying model architecture type (token-prediction vs. formal-reasoning)
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.
Formal mapping in the paper that treats prompts as shaping prior over paths; conceptual argument and illustrative examples.
high neutral Runtime Governance for AI Agents: Policies on Paths degree of control over execution path (distributional shaping vs. path-specific ...
Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.
Authors' stated method and findings: thematic review (the scope/number of reviewed papers not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... themes in incentive design practices and reported impacts on empirical study out...
A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.
Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... participant recruitment source (e.g., crowdsourcing) and its influence on study ...
Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.
Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... human behaviour and decision outcomes when assisted by AI (empirical study outco...
The study provides empirical evidence specific to a small open EU economy (Slovakia) on the relationship between AI adoption and labour productivity.
Use of harmonised Eurostat enterprise and productivity data for Slovakia and EU27 over 2021–2024, analysed with descriptive statistics, gap analysis, dynamics of change, correlation, and an illustrative regression model.
high neutral Artificial Intelligence Adoption and Labour Productivity in ... Empirical characterization of AI adoption and labour productivity relationship f...
Returns to AI are heterogeneous across firms; estimating treatment effects requires attention to selection, complementarities, and dynamic adoption pipelines.
Methodological argument referencing treatment-effect literature and observed firm heterogeneity; supported by conceptual examples rather than a single empirical treatment-effect estimate.
high neutral Modern Management in the Age of Artificial Intelligence: Str... heterogeneity in returns to AI adoption (firm-level productivity or performance ...
The study adopted a positivist philosophy and a descriptive-correlational design.
Methods section statement in the paper describing the research philosophy and study design.
high null result Technology Innovation Strategy and the Competitiveness of Ke... research design / methodology
Data were collected from innovation-focused executives across 39 licensed Kenyan commercial banks.
Paper statement specifying sample source: 'Using data from innovation-focused executives across 39 licensed banks.'
high null result Technology Innovation Strategy and the Competitiveness of Ke... sample composition / data source
Technological innovation was assessed via adoption of new systems, integration of digital channels, and use of Artificial Intelligence and data analytics.
Measurement description provided in the paper listing the components used to operationalize technological innovation.
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of technological innovation
Competitiveness in the study was measured through market share, return on equity and customer satisfaction.
Measurement description provided in the paper describing dependent variable operationalization (explicit list of three indicators).
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of competitiveness
The user study had N=50 participants.
Reported user study sample size (N=50) used to evaluate AI-assisted intent expansion in ecologically valid settings.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... user study sample size
Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.
Controlled comparison between three structured frameworks (5W3H, CO-STAR, RISEN) across the evaluated outputs, with no meaningful differences reported between them.
The study evaluated 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks) using an independent judge (DeepSeek-V3).
Reported experimental design and evaluation: 3 languages, 6 conditions, 3 models, 3 domains, 20 tasks; judged by DeepSeek-V3.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... number of model outputs evaluated / evaluation procedure
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... experimental reproducibility and isolation (testbed design)
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... comparative performance of agent architectures (benchmarking setup)
Limitations: the Comscore data observe household internet activity on home (non-mobile) devices and do not capture offline or mobile device activities, so extrapolation to total at-home activities should be done with caution.
Authors' explicit limitation discussion in paper stating data do not include mobile devices or offline activities.
high null result https://arxiv.org/pdf/2603.03144 data coverage (mobile/offline activities not observed)
ChatGPT adoption leaves the total time spent on productive online activities (including any time spent using ChatGPT) unchanged.
Same IV long-difference estimates as above; authors state 'leaving time spent on productive digital tasks unchanged' and that total productive activity time does not decline significantly.
high null result https://arxiv.org/pdf/2603.03144 total time spent on productive online activities
The analysis uses detailed Internet browsing microdata from over 200,000 U.S. households' home devices from 2021 to 2024.
Comscore web browsing panel described in paper; authors state dataset covers 'over 200,000 U.S. households' across 2021-2024; data provides timestamps, visit durations, URLs, demographic bins, etc.
high null result https://arxiv.org/pdf/2603.03144 size and coverage of browsing panel
We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.
Paper states that the anonymized Asta Interaction Dataset, accompanying analysis, and a new query intent taxonomy are being released publicly.
high null result Understanding Usage and Engagement in AI-Powered Scientific ... data and taxonomy release
The Asta Interaction Dataset comprises over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform.
Statement in paper describing dataset composition: >200,000 user queries and interaction logs collected from two deployed tools (literature discovery and scientific Q&A) within an RAG platform. Dataset release described in methods/dataset section.
high null result Understanding Usage and Engagement in AI-Powered Scientific ... size and composition of dataset (number of queries, tools included)
Methods combine targeted literature synthesis, comparative conceptual analysis, and framework building (with recent scholarly and institutional sources reviewed).
Explicit methodological statement in the paper describing the review and analytic approach; no primary-data methods used.
high null result Behavioral Factors as Determinants of Successful Scaling of ... methodological approach (literature synthesis and conceptual framework developme...
AI coding assistants are a high-visibility class of corporate AI and are given special attention as an illustrative case in the paper.
Paper specifically calls out AI coding assistants as a focal example in the conceptual analysis and discussion; based on literature review rather than original measurement.
high null result Behavioral Factors as Determinants of Successful Scaling of ... role of coding assistants as illustrative case for scaling and behavioral dynami...
The Article translates these insights into risk-sensitive guideposts for modernizing governance of AI-enabled tools and emerging modalities, from agentic systems to blockchain-deployed smart contracts.
Prescriptive/conceptual policy guidance presented in the Article (normative recommendations; governance framework).
high null result Rewired: Reconceptualizing Legal Services for the AI Age provision of governance guideposts for AI-enabled legal technologies
The Innovation Frontier traces LegalTech’s evolution from 2000s-vintage e-discovery to generative AI.
Historical/chronological analysis in the Article (literature review/history of LegalTech provided by authors).
high null result Rewired: Reconceptualizing Legal Services for the AI Age narrative/historical scope of LegalTech evolution covered in the Article
The Legal Services Value Chain disaggregates the lifecycle of a legal matter into five distinct nodes of activity.
Model description in the Article (conceptual architecture; decomposition of legal work).
high null result Rewired: Reconceptualizing Legal Services for the AI Age number and structure of nodes in the proposed value-chain model
The Article develops two core organizing models: the Legal Services Value Chain and the Innovation Frontier.
Explicit claim in the Article describing conceptual/model contributions (theoretical/model-building).
high null result Rewired: Reconceptualizing Legal Services for the AI Age presence of two organizing conceptual models in the Article
This Article provides a practical framework for navigating the shifting terrain of legal innovation and AI.
Statement of purpose in the Article (conceptual contribution; framework development). No empirical validation reported in the excerpt.
high null result Rewired: Reconceptualizing Legal Services for the AI Age existence of a practical framework for legal-AI governance and strategy
The SRL did not generate designs with significantly better performance than RWL, even though it explored a different region of the design space.
Empirical comparison on the battery pack design task showing no significant performance improvement of SRL over RWL despite differing exploration; exact statistical tests, p-values, and sample sizes are not provided in the excerpt.
high null result Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regul... design performance (SRL vs RWL)
The empirical analysis is based on A-share listed companies from 2015 to 2023.
Data description in the paper stating the study sample and time period (A-share listed firms, 2015–2023).
high null result The Impact of Digital Economy Pilot Zones on Corporate New Q... study sample/timeframe
Three interlocking threads characterize AI for science: (1) AI as research instrument, (2) AI for research infrastructure, and (3) the reshaping of scholarly profiles and incentives by machine-readable metrics.
Conceptual framework presented in the paper; organization of topics rather than empirical measurement. The paper indicates these threads are followed through historical and contemporary examples.
high null result A Brief History of AI for Scientific Discovery: Open Researc... conceptual decomposition of AI-for-science developments