The Commonplace

Evidence (14055 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Each of the four published papers used in the experiments contained an error that I helped identify or correct.
Author statement that the 4 papers each contained an error; author involvement in identification/correction is asserted.
high null result Can AI Refute Economic Theory? Evidence from Beyond the Know... presence of errors in the 4 target papers
I conducted experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory.
Author reports running direct experiments: prompted listed models to check 4 published economic-theory papers.
high null result Can AI Refute Economic Theory? Evidence from Beyond the Know... existence of experiments using specified models on 4 papers
The paper proposes a five-pillar diagnostic framework combining fundamental valuation, residual-exuberance tests, SADF/GSADF explosive-root procedures, LPPL/HLPPL price-pattern diagnostics, sentiment and issuance measures, and capex-payback analysis.
Methodological proposal presented in the paper (framework description); this is a stated contribution rather than an empirical result.
high null result Boom, Bubble, or Buildout? A Multi-Method Evaluation of Whet... diagnostic framework components for bubble assessment
From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.
Empirical construction from CF submission histories (pattern: increased first-try accepts, fewer retries). Method: analysis of historical submission logs; sample size not stated in abstract.
high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... submission patterns (first-attempt acceptances, attempts, retries)
The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.
Descriptive factual claim about contest rules and formats (institutional description in paper); based on contest rules and organizational formats referenced by authors.
high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... institutional design (proctoring and entry requirements)
Future research should adopt a more intersectional approach exploring how race, class, and geography interact with gender to shape platform work experiences.
Research limitations and implications section of the paper recommends more intersectional research directions.
high null result Empowerment or Inequality? A Feminist Political Economy Anal... research scope / intersectional coverage
This paper conducted a systematic literature review and thematic synthesis of 48 peer‑reviewed studies (2010–2024) to analyze the gendered dynamics of AI‑mediated digital labor.
Methods statement in the paper: systematic literature review and thematic synthesis; explicitly reports reviewing 48 peer‑reviewed studies covering 2010–2024.
high null result Empowerment or Inequality? A Feminist Political Economy Anal... scope of review (number of studies and timeframe)
We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels.
Paper's stated evaluation methodology: operator feedback + production question set, graded by humans and automated panels.
high null result Archi: Agentic Operations at the CMS Experiment evaluation methodology (feedback and graded question set)
There is a need to examine the impacts of LLMs on workers in jobs where the technology is prominent.
Recommendation in the paper's conclusion based on the observed concentration of LLM exposure in lower-precarity occupations.
high null result Large language model exposure and precarious occupations: Un... research/policy need (recommendation)
These occupations (those with higher LLM exposure and lower precariousness) have previously been sheltered from technological change.
Statement in the paper's conclusion asserting that occupations with higher LLM exposure are ones historically sheltered from technological change (no specific empirical evidence provided in abstract).
high null result Large language model exposure and precarious occupations: Un... historical exposure to technological change (asserted)
The study used Canada's Labour Force Survey, developed a multidimensional index summarizing occupational exposure to precarity (contractual instability, earnings inadequacy, schedule unpredictability, working-time mismatch), and estimated associations using four multivariate linear regression models with cluster-robust standard errors plus a fifth model for the multidimensional index.
Methods description in abstract specifying data source (Canada's Labour Force Survey), index construction, and multivariate linear regression models with cluster-robust standard errors.
high null result Large language model exposure and precarious occupations: Un... multidimensional precarity index / methodological approach
This study benchmarks Algeria’s readiness to adopt AI against Morocco, Egypt, and Turkey using data from the World Bank (2022), the Oxford Insights Government AI Readiness Index, and sector-specific studies.
Methodological statement in the paper specifying data sources used for the comparative assessment (World Bank 2022, Oxford Insights index, sector studies).
high null result Artificial Intelligence and Economic Productivity: A Compara... AI readiness / readiness indicators
Over 100 participants collaborated with one of four frontier models (Claude-Opus-4.6, GPT-5.4, Gemini-3.1-Pro, and MiniMax-M2.7) on a long-horizon coding task lasting around five hours.
Study description: experimental participants (reported as "Over 100 participants") each paired with one of four named models on a ~5-hour coding task designed to mimic real-world workflows.
high null result Coding with "Enemy": Can Human Developers Detect AI Agent Sa... study sample and experimental setup (models used, task duration)
We conduct the first large-scale study of human oversight in AI coding sabotage.
Authors state they ran a large-scale user study; described as the first such study focused on human oversight in AI coding sabotage (methodological claim).
high null result Coding with "Enemy": Can Human Developers Detect AI Agent Sa... existence/scale of study (methodological claim)
The study uses four waves of data from the China Family Panel Studies (CFPS) from 2022 to 2025, constructs an individual-level indicator of the skill wage gap, and adopts an occupational task automation exposure index as a proxy variable for technological shocks.
Authors report using four waves of CFPS (2022–2025); they state they constructed an individual-level skill-wage-gap indicator and use an occupational task automation exposure index as the proxy for technological shocks (methodological description in paper).
high null result Dynamic Evolution and Configurational Heterogeneity of the S... construction and use of dataset/variables (individual-level skill wage gap; occu...
The article aims to provide systematic literature support for subsequent research and adaptive policy formulation.
Statement of the paper's stated objective; methodological and policy-intent claim from the authors.
high null result Influence of Artificial Intelligence in the Labor Market policy formulation support
This article is based on a systematic literature review and summarizes the four core theoretical mechanisms of substitution, complementarity, new task creation, and skill mismatch.
Methodological claim from the paper: the authors conducted a systematic literature review and identified these four theoretical mechanisms.
high null result Influence of Artificial Intelligence in the Labor Market theoretical mechanisms
Traditional software and agentic systems are distinct: in traditional software code is the carrier of decision logic, whereas in agentic systems code is ephemeral tooling used by an LLM-driven reasoning loop.
Formalization and conceptual definitions developed in the paper (first-principles formal distinction; no empirical sample size reported).
high null result The End of Software Engineering: How AI Agents Are Fundament... architectural role of code (carrier of logic vs ephemeral tool)
For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve.
Historical/descriptive claim presented in the paper's framing and literature review; citation of longstanding software engineering practices (qualitative, no empirical sample size reported).
high null result The End of Software Engineering: How AI Agents Are Fundament... software development practice (human-driven decomposition and static code mainte...
We implement a two-stage processing architecture separating document-level extraction (Stage 1) from claim-level synthesis (Stage 2).
Implementation description in paper: architecture design and pipeline stages described by the authors.
high null result Leveraging LLMs for Unstructured Claims Data Analysis system architecture (document-level vs claim-level processing)
The study introduces a methodological framework for evaluating LLM citation behaviors, integrating information retrieval theory, semantic search optimization, and structured content engineering.
Explicit claim about the paper's contribution: introduction of a methodological framework combining IR theory, semantic search, and structured content engineering. This is a factual statement about the paper's content (no sample size reported in excerpt).
high null result SEARCH ENGINE OPTIMIZATION: HOW LLM-GENERATED SUMMARIES ARE ... methodological framework for evaluating LLM citation behaviors
Traditional SEO strategies have historically focused on keyword density, backlink authority, and ranking positions within search engine results pages (SERPs).
Descriptive claim about historical SEO practices presented as background/context in the paper; based on domain knowledge and literature references (no new empirical data reported in the excerpt).
high null result SEARCH ENGINE OPTIMIZATION: HOW LLM-GENERATED SUMMARIES ARE ... features of historical SEO strategies (keyword density, backlink authority, SERP...
We extend the representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features.
Methodological extension described in paper (approach for device cold-start handled via cohort-based demographic embeddings).
high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... device cold-start embedding construction (cohort-based demographics)
We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow and encodes content solely from intrinsic features.
Model architecture description in paper (design specification; no numeric evaluation included in excerpt).
high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... model architecture behavior (device tower uses message passing; content tower sh...
We formulate cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph.
Methodological framing presented in the paper (problem formulation).
high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... problem formulation (inductive graph-completion on temporal bipartite graph)
In Tubi's production retrieval system, new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval.
Description of production serving constraints in Tubi stated in paper (system design / operational constraint).
high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... serving/operational constraint: immediate standalone content embedding and devic...
In neither unit did internal control mechanisms identify any information-security incident, sensitive-data leakage, or formal compliance challenge from external oversight bodies during the period examined.
Author reports absence of recorded incidents in internal control mechanisms and no external oversight challenges for both units over the study period; based on internal records and SEI-GDF auditable indicators.
high null result The Main Barrier to AI Adoption in the Public Sector is Lack... information-security incidents / sensitive-data leakage / formal compliance chal...
Verified word-count analysis of the Executive Order shows the word 'security' appears 17× and the word 'cyber' appears 14×, while there are zero mentions of 'labor', 'education', 'culture', 'fairness', 'transparency', 'attribution', 'provenance', 'meaning', or 'commons'.
Automated/count-based analysis of the EO text (single-document word-count reported in the paper).
high null result The Security Frame Is a Selection Kernel: Trump's AI Executi... term frequency (presence/absence of specific domain terms)
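A term-frequency check of this kind can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual procedure: the function name `count_terms`, the case-insensitive whole-word matching rule, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def count_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-word occurrences of each term in text."""
    # Split into lowercase alphabetic words (assumed matching rule).
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Counter returns 0 for words that never appear.
    return {term: words[term.lower()] for term in terms}


# Hypothetical sample text; this is NOT the Executive Order.
sample = "Cyber security and national security come first. Security matters."
counts = count_terms(sample, ["security", "cyber", "labor"])
print(counts)
```

Under whole-word matching, a compound such as "cybersecurity" counts toward neither "cyber" nor "security". The entry does not say which matching rule the paper used, so reproduced counts may differ depending on that choice.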
The aggregate Stanford HAI AI Vibrancy Score shows no significant within-country effect on tourism’s direct GDP share after controlling for macroeconomic factors.
Fixed-effects estimation with clustered standard errors on panel data from 33 countries (2017–2023); reported coefficient β = 0.061, p = 0.622, with macroeconomic controls.
high null result Which dimensions of AI development shape tourism’s direct co... tourism’s direct GDP share
The study integrates ICT4D, socio-technical systems theory, and the capability approach as its theoretical framing.
Methodological/theoretical statement in the paper describing the integrative framework used for analysis.
high null result Compressed professionalization in informal economies: a soci... theoretical integration
While grounded in the DRC, the findings offer broader insights into AI adoption dynamics across informal economies in Sub-Saharan Africa and beyond.
Authors' claim of broader relevance/generalizability based on the DRC case study and theoretical framing.
high null result Compressed professionalization in informal economies: a soci... generalizability of findings to informal economies in Sub-Saharan Africa and bey...
AI adoption in the DRC emerges through hybrid socio-technical interactions between bottom-up youth innovation and weakly coordinated institutional frameworks, rather than following policy-led or infrastructure-first trajectories.
Theoretical integration (ICT4D, socio-technical systems, capability approach) and qualitative interview evidence used to characterize observed adoption pathways.
high null result Compressed professionalization in informal economies: a soci... adoption pathways (hybrid socio-technical, bottom-up)
The article introduces 'compressed professionalization', defined as the accelerated acquisition and immediate market enactment of professional-level digital capabilities outside formal institutional pathways.
Conceptual/theoretical contribution presented and defined in the paper, supported by illustrative field observations from the interviews.
high null result Compressed professionalization in informal economies: a soci... compressed professionalization (conceptual construct)
The study drew on 125 semi-structured interviews conducted in Kinshasa, Lubumbashi, and Goma.
Primary qualitative fieldwork reported in the paper: 125 semi-structured interviews across three DRC cities (Kinshasa, Lubumbashi, Goma).
The research is grounded in the Resource-Based View (RBV) and Dynamic Capabilities Theory (DCT) to explain how technological and managerial resources contribute to organizational performance.
Author statement in the paper describing the theoretical framework (RBV and DCT) used to frame the study.
The study adopts a quantitative research design and analyzes collected data using Partial Least Squares Structural Equation Modeling (PLS-SEM).
Author statement in the paper describing research design and analytical method.
Digital Leadership did not demonstrate a statistically significant direct effect on Employee Productivity (β = -0.094, p = 0.275).
Reported quantitative result from the study using PLS-SEM; β and p-value provided in the paper showing a non-significant direct effect. Sample size not reported in the excerpt.
We scored over 2.1 million twin responses on 500 participants and 183 held-out questions.
Reported evaluation counts in the paper: 2.1M responses, 500 participants, 183 held-out questions.
high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... number of evaluated twin responses / evaluation scale
The construction-method grid covers three open-weight LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes.
Paper's experimental design specification (methods section).
high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... experimental factorization of model types, information depths, embedding methods...
We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a 3 × 5 × 2 × 2 construction-method grid.
Methodological description of the study: experimental construction and evaluation on SOEP data.
high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... feasibility of constructing and evaluating detailed individual-level twins from ...
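The size of a 3 × 5 × 2 × 2 grid can be checked by enumerating it. The sketch below is illustrative only: the entries give the factor sizes but not the names of the models, embedding methods, or reasoning modes, so every label here is a placeholder assumption.

```python
from itertools import product

# Placeholder labels: only the factor sizes come from the entries above.
models = ["llm_a", "llm_b", "llm_c"]            # three open-weight LLMs
depths = [1, 2, 3, 4, 5]                        # five cumulative information depths
embeddings = ["embed_a", "embed_b"]             # two embedding methods
reasoning = ["reasoning_off", "reasoning_on"]   # two reasoning modes (labels assumed)

# Every combination of one level per factor is one construction method.
grid = list(product(models, depths, embeddings, reasoning))
print(len(grid))  # 3 * 5 * 2 * 2 = 60
```

Each tuple in `grid` corresponds to one twin-construction configuration, so the full design has 60 cells.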
These are mechanism-oriented synthetic results, not estimates of real firm behavior in a jurisdiction or industry.
Explicit qualification in the abstract stating the scope and limits of inference (paper text).
high null result When Firms Learn to Game the Rules external validity / scope of inference
The study uses a synthetic agent-based reinforcement-learning simulation that separates actual conduct near a legal threshold from proximity in the computable enforcement signal.
Methodological description in abstract: ABM/RL simulation with explicit separation of conduct vs. computable signal; run counts reported (150 seed-level scenario runs, 378 computability-sweep runs, 288 Latin-hypercube runs) and a 2,880,000-row firm-period panel.
high null result When Firms Learn to Game the Rules methodological separation of conduct vs enforcement signal (model design)
Ordinary adaptive updates do not reliably reduce boundary search.
ABM/RL simulation experiments reported in the paper (multiple runs and the firm-period panel); qualitative comparative statement from simulation outputs.
high null result When Firms Learn to Game the Rules boundary search (conduct boundary mass / firms' proximity to legal thresholds)
There is no evidence of improved win rates for AI-flagged complaints; AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases.
Outcome analysis linking AI-flag status to litigation outcomes (win rates, dismissal rates, termination phase) using case metadata.
high null result The New Pro Se: Generative AI and the Surge in Federal Civil... win rate; dismissal rate; procedural termination phase
A large-scale empirical study on Harvey LAB used 12,510 agent trajectories.
Paper states an empirical study run on Harvey LAB with a sample described as 12,510 agent trajectories.
high null result Parthenon Law: A Self-Evolving Legal-Agent Framework agent trajectories (dataset size)
The paper analyzes multiple dimensions of scientific creativity and impact, specifically recombinant novelty, object novelty, 3-year short-run citation impact, and 10-year long-run citation impact.
Methodological description in paper listing the specific dependent variables and time horizons used to measure novelty and impact.
high null result Does Artificial Intelligence Advance Science? measures used (recombinant novelty, object novelty, 3-year citations, 10-year ci...
The analysis draws on over one million publications from OpenAlex.
Descriptive statement in paper specifying dataset source (OpenAlex) and sample size of publications used for analysis.
high null result Does Artificial Intelligence Advance Science? sample of publications (dataset size)
This study uses panel data from 281 Chinese cities between 2005 and 2022, treats establishment of national GIPs as a quasi‑natural experiment, and applies a double machine learning approach.
Methods description in the paper explicitly states data coverage (281 Chinese cities, 2005–2022), research design (quasi‑natural experiment), and estimation strategy (double machine learning).
high null result Does green industrialization enhance urban industrial chain ... research design / methodological approach
Experts rated 24 AI risks on harm probability and severity, sector and actor vulnerability, actor responsibility, and overall concern.
Study design described in paper: set of 24 defined AI risks rated across several dimensions by Delphi panel participants (n=272).
high null result Prioritization of Risks from Artificial Intelligence: A Delp... risk ratings across multiple dimensions (probability, severity, vulnerability, r...
We conducted a three-round Delphi study in late 2025 with 272 international AI experts.
Methodological description in the paper: three-round Delphi study, timing reported as late 2025, sample size reported as 272 international AI experts.
high null result Prioritization of Risks from Artificial Intelligence: A Delp... study participation / sample characterization