The Commonplace

Evidence (7448 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Guerreiro et al. (2022) characterize the optimal Mirrleesian tax system with automation and find that robot taxes should be transitional: high while incumbent workers cannot retrain, then converging to zero as new cohorts adjust their skill investments.
The paper cites and summarizes the theoretical result of Guerreiro et al. (2022) on transitional robot taxes.
high neutral NBER WORKING PAPER SERIES optimal robot tax path over time
If labor becomes economically redundant, the policy focus shifts from steering innovation to redesigning public finance and redistribution (e.g., new tax instruments, redistribution mechanisms).
Theoretical scenario analysis in the paper with references to related works (Korinek and Juelfs 2024; Korinek and Lockwood 2026).
high neutral NBER WORKING PAPER SERIES policy priority shift (steering -> public finance/redistribution)
Evaluation is carried out under three frozen context configurations (diff only: config_A; diff with file content: config_B; full context: config_C) enabling systematic ablation of context provision strategies.
Methodological description: three fixed context configurations defined and used for ablation experiments.
high neutral SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... effect of context-provision design on model performance
Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.
Description of experimental/evaluation setup in the paper: macroscopic evaluation via Fundamental Diagram across varied scenario parameters. No numeric sample size provided in the claim text.
high neutral Macroscopic Characteristics of Mixed Traffic Flow with Deep ... traffic performance (via Fundamental Diagram) under varied heterogeneity and RL ...
CriQ is a sister app to Dream11, India's largest fantasy sports platform with over 250 million users.
Descriptive statement in the paper providing context about the application domain and user base.
We performed an extensive evaluation of 37 state-of-the-art Vision-Language Models on MultihopSpatial.
Empirical evaluation described in the paper listing the number of models evaluated (37).
high neutral MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... benchmark coverage across models evaluated
We critically compare LLM-generated rulings against 10,000 real-world court judgments from China Judgments Online (CJOL).
Dataset statement: the paper compares model outputs to a corpus of 10,000 CJOL labor dispute judgments.
high neutral LLM Safety in Judicial AI: A Stress Test of Social Media Inf... agreement / deviation between LLM-generated rulings and CJOL judgments
We introduce a novel stress test that evaluates LLM-generated labor dispute outcomes by injecting social media sentiment as an external pressure.
Methodological description in the paper: a designed stress test where social media sentiment is used to perturb LLM outputs for labor dispute cases.
high neutral LLM Safety in Judicial AI: A Stress Test of Social Media Inf... sensitivity of LLM-generated labor dispute outcomes to injected social media sen...
The paper treats data as a new type of production factor and endogenizes it within the production function.
Theoretical/methodological: the paper constructs a macro-level theoretical model that explicitly includes data as an endogenous input in the production function (no empirical/sample data).
high neutral Study on the impact of big data sharing on individuals’ welf... inclusion of data as a production factor (model specification)
In the near term, the most plausible equilibrium is bounded autonomy, in which AI agents operate as supervised co-pilots, monitoring systems, and constrained execution modules embedded within human decision processes.
Theoretical argument and forward-looking assessment by the authors based on the proposed framework and plausibility considerations; not presented as the result of a causal empirical study in the excerpt.
high neutral AI Agents in Financial Markets: Architecture, Applications, ... expected equilibrium mode of AI agent autonomy in finance (bounded autonomy / su...
Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.
Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... comprehensiveness of economic evaluations (inclusion of externalities vs. narrow...
Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.
Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size—qualitative/technical review.
high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... underlying model architecture type (token-prediction vs. formal-reasoning)
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.
Formal mapping in the paper that treats prompts as shaping a prior over paths; conceptual argument and illustrative examples.
high neutral Runtime Governance for AI Agents: Policies on Paths degree of control over execution path (distributional shaping vs. path-specific ...
Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.
Authors' stated method and findings: thematic review (the scope/number of reviewed papers not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... themes in incentive design practices and reported impacts on empirical study out...
A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.
Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... participant recruitment source (e.g., crowdsourcing) and its influence on study ...
Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.
Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).
high neutral Incentive-Tuning: Understanding and Designing Incentives for... human behaviour and decision outcomes when assisted by AI (empirical study outco...
The study provides empirical evidence specific to a small open EU economy (Slovakia) on the relationship between AI adoption and labour productivity.
Use of harmonised Eurostat enterprise and productivity data for Slovakia and EU27 over 2021–2024, analysed with descriptive statistics, gap analysis, dynamics of change, correlation, and an illustrative regression model.
high neutral Artificial Intelligence Adoption and Labour Productivity in ... Empirical characterization of AI adoption and labour productivity relationship f...
Returns to AI are heterogeneous across firms; estimating treatment effects requires attention to selection, complementarities, and dynamic adoption pipelines.
Methodological argument referencing treatment-effect literature and observed firm heterogeneity; supported by conceptual examples rather than a single empirical treatment-effect estimate.
high neutral Modern Management in the Age of Artificial Intelligence: Str... heterogeneity in returns to AI adoption (firm-level productivity or performance ...
A subset of four datasets included settings in which the AI provided explanations of its decision.
Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).
high null result Beyond AI advice -- independent aggregation boosts human-AI ... presence_of_AI_explanation
The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.
Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).
The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree, that decision is accepted, and if they disagree, a second human breaks the tie.
Description of the HCT method in the paper (procedural/design specification).
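The HCT rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `hybrid_confirmation_tree` and the choice to pass the second human as a callable (so they are consulted only on disagreement) are assumptions made here for clarity.

```python
from typing import Callable, Hashable


def hybrid_confirmation_tree(
    first_human: Hashable,
    ai: Hashable,
    second_human: Callable[[], Hashable],
) -> Hashable:
    """Return the final decision under the HCT rule (hypothetical helper)."""
    if first_human == ai:
        # Independent judgments agree: accept the shared decision.
        return first_human
    # Judgments disagree: a second human is consulted and breaks the tie.
    return second_human()


# Agreement: the second human is never asked.
agreed = hybrid_confirmation_tree("benign", "benign", lambda: "malignant")
# Disagreement: the second human's judgment decides.
tie_broken = hybrid_confirmation_tree("benign", "malignant", lambda: "malignant")
```

Passing the second human as a callable reflects that the tie-breaker is needed only when the first two judgments differ.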
This chapter is based on a systematic literature review using the PRISMA framework and includes a thematic analysis followed by a bibliometric coupling of 23 documents from the Scopus database.
Methodological statement in the paper: systematic literature review using PRISMA, thematic analysis, bibliometric coupling; sample drawn from Scopus; 23 documents.
high null result Tackling the rational decision-making in ethical consumption methodological approach (systematic review of literature)
The cross-sectional, self-reported survey design prevents strong causal claims about the effect of algorithms or selective exposure on polarization.
Authors explicitly note methodological limitations: cross-sectional survey of N = 450, reliance on self-reported consumption, and lack of platform log or longitudinal/experimental data.
high null result Echo Chambers, Filter Bubbles, and Selective Exposure: Media... causal inference ability (limitation due to design)
The study adopted a positivist philosophy and a descriptive-correlational design.
Methods section statement in the paper describing the research philosophy and study design.
high null result Technology Innovation Strategy and the Competitiveness of Ke... research design / methodology
Data were collected from innovation-focused executives across 39 licensed Kenyan commercial banks.
Paper statement specifying sample source: 'Using data from innovation-focused executives across 39 licensed banks.'
high null result Technology Innovation Strategy and the Competitiveness of Ke... sample composition / data source
Technological innovation was assessed via adoption of new systems, integration of digital channels, and use of Artificial Intelligence and data analytics.
Measurement description provided in the paper listing the components used to operationalize technological innovation.
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of technological innovation
Competitiveness in the study was measured through market share, return on equity and customer satisfaction.
Measurement description provided in the paper describing dependent variable operationalization (explicit list of three indicators).
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of competitiveness
The research method is normative legal research using statutory, conceptual, and comparative approaches, supported by a literature analysis of SINTA-indexed national journals and reputable international journals.
Method statement given explicitly in the paper's abstract/methodology.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... research methodology (normative legal research and literature review)
The study assesses the adequacy of the legal protection available to workers laid off as a result of AI adoption.
Statement of the research aim and analytical approach (normative, comparative), supported by a literature review of selected journals.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... adequacy of legal protection for workers affected by AI
This study aims to analyse how the Job Creation Law (Undang-Undang Cipta Kerja) and its implementing regulations classify and justify termination of employment (Pemutusan Hubungan Kerja, PHK) resulting from AI adoption.
Statement of the research aim given in the methodology/introduction section; statutory approach within normative legal research.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... classification and justification of PHK under the Job Creation Law (UU Cipta Kerja)
The user study had N=50 participants.
Reported user study sample size (N=50) used to evaluate AI-assisted intent expansion in ecologically valid settings.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... user study sample size
Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.
Controlled comparison between three structured frameworks (5W3H, CO-STAR, RISEN) across the evaluated outputs, with no meaningful differences reported between them.
The study evaluated 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks) using an independent judge (DeepSeek-V3).
Reported experimental design and evaluation: 3 languages, 6 conditions, 3 models, 3 domains, 20 tasks; judged by DeepSeek-V3.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... number of model outputs evaluated / evaluation procedure
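The reported total of 3,240 outputs follows directly from the full factorial design. A quick check, assuming every combination of factors was evaluated exactly once (the dictionary keys are labels chosen here for illustration):

```python
from math import prod

# Factor sizes as reported in the claim above.
design = {
    "languages": 3,
    "conditions": 6,
    "models": 3,
    "domains": 3,
    "tasks": 20,
}

# A full factorial design yields one output per combination of factor levels.
total_outputs = prod(design.values())
print(total_outputs)  # 3240
```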
The paper frames the LLM-politician relationship through principal-agent theory and bounded rationality, conceptualizing the legislator as a principal delegating advisory tasks to a boundedly rational agent under structural information asymmetry.
Explicit theoretical framing described in the introduction or theory section of the paper.
Model outputs were evaluated using a dual framework combining LLM-as-Judge semantic scoring and programmatic text similarity metrics.
Paper describes the evaluation methodology: semantic scoring via LLM-as-Judge plus programmatic text similarity measures applied to model-generated rationales vs official memoranda.
high null result Can Commercial LLMs Be Parliamentary Political Companions? C... evaluation method / scoring approach
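A dual evaluation of this kind could be recorded as sketched below. The excerpt does not name the similarity metrics or the judge prompt, so everything here is an assumption: `difflib.SequenceMatcher` stands in for the programmatic metric, and `judge_score` is a number supplied by a separate LLM-as-Judge step that is not shown.

```python
from difflib import SequenceMatcher


def text_similarity(generated: str, reference: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric, not the paper's."""
    return SequenceMatcher(None, generated, reference).ratio()


def dual_score(generated: str, reference: str, judge_score: float) -> dict:
    """Keep the semantic (judge) and programmatic scores side by side."""
    return {
        "judge": judge_score,  # assumed to come from an external LLM judge
        "similarity": text_similarity(generated, reference),
    }


# Hypothetical rationale and memorandum snippets, for illustration only.
result = dual_score(
    "The bill updates labour rules.",
    "The bill updates labour rules for remote work.",
    judge_score=0.8,
)
```

Reporting the two scores separately, rather than averaging them, keeps it visible when a rationale is semantically adequate but textually far from the official memorandum.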
Six LLMs were evaluated: GPT-5-mini, GPT-5-chat (OpenAI), Claude Haiku 4.5 (Anthropic), and Llama 4 Maverick, Llama 3.3 70B, Llama 3.1 8B (Meta).
Paper explicitly lists the six evaluated models spanning three provider families and multiple capability tiers.
The study uses a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive).
Explicit statement in the paper describing the dataset composition: 15 Romanian Senate law proposals each paired with its official explanatory memorandum.
high null result Can Commercial LLMs Be Parliamentary Political Companions? C... dataset size / data corpus
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... experimental reproducibility and isolation (testbed design)
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... comparative performance of agent architectures (benchmarking setup)
Rather than proposing new recognition models, the contribution focuses on a system-level comparison of both paradigms under realistic edge constraints.
Stated scope in the abstract: the paper emphasizes system-level comparison instead of introducing new recognition models, demonstrated via the described hybrid system and evaluations.
high null result From Skeletons to Semantics: Design and Deployment of a Hybr... scope of contribution (system-level comparison vs model development)
The system is implemented on a GPU-enabled edge device and evaluated with respect to latency, resource usage, and operational trade-offs using a demonstrator-based setup.
Authors state implementation on a GPU-enabled edge device and describe evaluation of latency, resource usage, and operational trade-offs in a demonstrator-based experiment. The abstract does not include numeric metrics or sample sizes.
high null result From Skeletons to Semantics: Design and Deployment of a Hybr... latency and resource usage
Data construction: The authors treat Wikipedia technology pages as distinct technologies and trace them across patents and job postings from 1976 to 2007, using technical bigrams to identify technologies in texts.
Description of dataset construction building on Kalyani et al. (2025) in Section 2; methodological description of linking Wikipedia pages, patent text, and job postings.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE coverage and method of technology identification in data
Proposition 1: With a constant pace of technology creation (m(b)=m), the model admits a unique balanced growth path (BGP) along which real wages and output grow at rate g, and the skill premium remains constant and is independent of m.
Analytical result (proposition) proved in the paper's model appendix under model assumptions.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE skill premium dependence on pace parameter m along BGP
The modal technology in the top 1% densest locations (e.g., New York, San Francisco) is 34 years old, while the modal technology in the bottom 50% lowest-density locations is 48 years old, indicating sizable diffusion gaps.
Empirical measurement from the text-based technology dataset tracking vintage of technologies across locations; reported modal ages by location density percentile.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE modal technology age by location density
Limitations: the Comscore data observe household internet activity on home (non-mobile) devices and do not capture offline or mobile device activities, so extrapolation to total at-home activities should be done with caution.
Authors' explicit limitation discussion in paper stating data do not include mobile devices or offline activities.
high null result https://arxiv.org/pdf/2603.03144 data coverage (mobile/offline activities not observed)
ChatGPT adoption leaves the total time spent on productive online activities (including any time spent using ChatGPT) unchanged.
Same IV long-difference estimates as above; authors state 'leaving time spent on productive digital tasks unchanged' and that total productive activity time does not decline significantly.
high null result https://arxiv.org/pdf/2603.03144 total time spent on productive online activities
The analysis uses detailed Internet browsing microdata from over 200,000 U.S. households' home devices from 2021 to 2024.
Comscore web browsing panel described in paper; authors state dataset covers 'over 200,000 U.S. households' across 2021-2024; data provides timestamps, visit durations, URLs, demographic bins, etc.
high null result https://arxiv.org/pdf/2603.03144 size and coverage of browsing panel
The present review examined the intersection of artificial intelligence, sustainable finance, ESG performance, FinTech, climate risk analytics, algorithmic governance, and responsible investing.
Statement of the paper's scope and aims (description of the review content and topics covered).
high null result Artificial intelligence in sustainable finance and Environme... topics covered by the review
The literature on AI-based ESG scoring, green finance, and data-driven sustainability reporting is disjointed across finance, management, and technology fields and requires application of the PRISMA framework to provide transparency and methodological rigor in systematic reviews.
Paper's methodological assessment and recommendation based on the authors' systematic review process and literature mapping (statement about the state of the literature and methodological needs). No numeric evidence provided in the excerpt.
high null result Artificial intelligence in sustainable finance and Environme... transparency and methodological rigor of literature reviews in the field
The analysis draws on data from 170 countries for 2020–2024 for the Government AI Readiness Index (GAIRI)–EGDI comparison.
Data description in abstract explicitly reporting the GAIRI–EGDI sample coverage as 170 countries for 2020–2024.
high null result E-government development: Artificial intelligence vibrancy a... E-Government Development Index (EGDI)