The Commonplace

Evidence (6917 claims)

Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Filter: Governance
At the design layer, Codex matches human methodological diversity.
Comparison of methodological specifications produced by Codex (20 independent executions) to the many-analysts human baseline; reported similarity in diversity metrics between Codex outputs and human analysts.
high null result AI Coding Agents in Social Science: Methodologically Diverse... methodological diversity (variety of model/specification choices)
We run 20 independent executions of Claude Code and Codex on a prominent immigration and social-policy problem and compare them against a many-analysts human baseline.
Experimental method described in the paper: 20 independent runs/executions of each agent model (Claude Code and Codex), compared to an existing many-analysts human baseline.
high null result AI Coding Agents in Social Science: Methodologically Diverse... execution/sample of agent analyses compared to human many-analysts baseline
Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap
Claim of artifact availability hosted on GitHub (URL provided) as part of the paper's resources.
high null result Collaborative Human-Agent Protocol (CHAP) availability of specification and accompanying artifacts
Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability.
Factual claim referencing existing standards (MCP and A2A) and their scopes; no citations or supporting documentation included in the provided excerpt.
high null result Collaborative Human-Agent Protocol (CHAP) scope of existing protocol standards
Production deployments are no longer one human supervising one model; they are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries.
Stated as a general characterization of modern production deployments; no quantitative data or case counts provided in the excerpt.
high null result Collaborative Human-Agent Protocol (CHAP) structure of production deployments (multi-human, multi-agent)
The value of an in-band cooperative deny signal (Recuse Signal) is an empirical question: it was previously unmeasured and the paper measures whether compliant LLM agents honor such a signal.
Motivation and framing in the paper; they position their controlled experiment as the measurement addressing this previously unmeasured question.
high null result Will the Agent Recuse Itself? Measuring LLM-Agent Compliance... degree to which LLM agents honor an in-band cooperative deny signal
We searched seven databases (plus backward and forward citation searching) and synthesised 13 empirical studies published between 2018 and 2025.
Methods reported in abstract: PRISMA-ScR scoping review with a preregistered protocol; explicit count of included studies and publication date range.
high null result Artificial intelligence applications supporting women’s care... number of empirical studies identified and synthesized
From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.
Empirical construction from CF submission histories (pattern: increased first-try accepts, fewer retries). Method: analysis of historical submission logs; sample size not stated in abstract.
high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... submission patterns (first-attempt acceptances, attempts, retries)
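The submission-pattern measures named in this claim can be sketched from a chronological submission log. The example below is a minimal illustration under stated assumptions: `practice_metrics` is a hypothetical helper, the history is made-up data, the "OK" verdict label is assumed, and the paper's actual signature construction may differ.

```python
from collections import defaultdict

def practice_metrics(submissions):
    """Summarize (problem_id, verdict) pairs given in chronological order."""
    by_problem = defaultdict(list)
    for problem_id, verdict in submissions:
        by_problem[problem_id].append(verdict)

    # Keep only problems that were eventually accepted ("OK" is an assumed label).
    solved = [verdicts for verdicts in by_problem.values() if "OK" in verdicts]
    if not solved:
        return {"first_attempt_rate": 0.0, "mean_attempts": 0.0}

    # Problems accepted on the very first submission.
    first_try = sum(1 for verdicts in solved if verdicts[0] == "OK")
    # Submissions up to and including the first accepted one.
    attempts = sum(verdicts.index("OK") + 1 for verdicts in solved)
    return {
        "first_attempt_rate": first_try / len(solved),
        "mean_attempts": attempts / len(solved),
    }

# Made-up history, not real contest data.
history = [
    ("A", "OK"),
    ("B", "WRONG_ANSWER"),
    ("B", "OK"),
    ("C", "OK"),
    ("D", "WRONG_ANSWER"),
]
metrics = practice_metrics(history)
print(metrics)
```

A higher first-attempt rate together with a lower mean attempt count is the direction the claim associates with AI-assisted practice; thresholds for flagging an account are not specified here.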
The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.
Descriptive factual claim about contest rules and formats (institutional description in paper); based on contest rules and organizational formats referenced by authors.
high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... institutional design (proctoring and entry requirements)
Future research should adopt a more intersectional approach exploring how race, class, and geography interact with gender to shape platform work experiences.
Research limitations and implications section of the paper recommends more intersectional research directions.
high null result Empowerment or Inequality? A Feminist Political Economy Anal... research scope / intersectional coverage
This paper conducted a systematic literature review and thematic synthesis of 48 peer‑reviewed studies (2010–2024) to analyze the gendered dynamics of AI‑mediated digital labor.
Methods statement in the paper: systematic literature review and thematic synthesis; explicitly reports reviewing 48 peer‑reviewed studies covering 2010–2024.
high null result Empowerment or Inequality? A Feminist Political Economy Anal... scope of review (number of studies and timeframe)
This study benchmarks Algeria’s readiness to adopt AI against Morocco, Egypt, and Turkey using data from the World Bank (2022), the Oxford Insights Government AI Readiness Index, and sector-specific studies.
Methodological statement in the paper specifying data sources used for the comparative assessment (World Bank 2022, Oxford Insights index, sector studies).
high null result Artificial Intelligence and Economic Productivity: A Compara... AI readiness / readiness indicators
Over 100 participants collaborated with one of four frontier models (Claude-Opus-4.6, GPT-5.4, Gemini-3.1-Pro, and MiniMax-M2.7) on a long-horizon coding task lasting around five hours.
Study description: experimental participants (reported as "Over 100 participants") each paired with one of four named models on a ~5-hour coding task designed to mimic real-world workflows.
high null result Coding with "Enemy": Can Human Developers Detect AI Agent Sa... study sample and experimental setup (models used, task duration)
We conduct the first large-scale study of human oversight in AI coding sabotage.
Authors state they ran a large-scale user study; described as the first such study focused on human oversight in AI coding sabotage (methodological claim).
high null result Coding with "Enemy": Can Human Developers Detect AI Agent Sa... existence/scale of study (methodological claim)
Verified word-count analysis of the Executive Order shows the word 'security' appears 17× and the word 'cyber' appears 14×, while there are zero mentions of 'labor', 'education', 'culture', 'fairness', 'transparency', 'attribution', 'provenance', 'meaning', or 'commons'.
Automated/count-based analysis of the EO text (single-document word-count reported in the paper).
high null result The Security Frame Is a Selection Kernel: Trump's AI Executi... term frequency (presence/absence of specific domain terms)
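A term count of this kind can be reproduced in a few lines. The sketch below is illustrative only: `count_terms` is a hypothetical helper, the sample string is a placeholder rather than the Executive Order's text, and case-insensitive whole-word matching is an assumption, since the paper's tokenization is not described here.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term."""
    # Assumption: words are runs of letters, so punctuation and hyphens
    # split tokens ("cyber-attack" contributes one "cyber").
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Counter returns 0 for terms that never appear.
    return {term: words[term.lower()] for term in terms}

# Placeholder text, NOT the Executive Order.
sample = "Security first. Cyber security and cyber defense matter."
counts = count_terms(sample, ["security", "cyber", "labor"])
print(counts)
```

Whether derived forms such as "cybersecurity" or "securities" should count toward a term is a design choice; this sketch counts exact words only.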
The aggregate Stanford HAI AI Vibrancy Score shows no significant within-country effect on tourism’s direct GDP share after controlling for macroeconomic factors.
Fixed-effects estimation with clustered standard errors on panel data from 33 countries (2017–2023); reported coefficient β = 0.061, p = 0.622, with macroeconomic controls.
high null result Which dimensions of AI development shape tourism’s direct co... tourism’s direct GDP share
These are mechanism-oriented synthetic results, not estimates of real firm behavior in a jurisdiction or industry.
Explicit qualification in the abstract stating the scope and limits of inference (paper text).
high null result When Firms Learn to Game the Rules external validity / scope of inference
The study uses a synthetic agent-based reinforcement-learning simulation that separates actual conduct near a legal threshold from proximity in the computable enforcement signal.
Methodological description in abstract: ABM/RL simulation with explicit separation of conduct vs. computable signal; run counts reported (150 seed-level scenario runs, 378 computability-sweep runs, 288 Latin-hypercube runs) and a 2,880,000-row firm-period panel.
high null result When Firms Learn to Game the Rules methodological separation of conduct vs enforcement signal (model design)
Ordinary adaptive updates do not reliably reduce boundary search.
ABM/RL simulation experiments reported in the paper (multiple runs and the firm-period panel); qualitative comparative statement from simulation outputs.
high null result When Firms Learn to Game the Rules boundary search (conduct boundary mass / firms' proximity to legal thresholds)
There is no evidence of improved win rates for AI-flagged complaints; AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases.
Outcome analysis linking AI-flag status to litigation outcomes (win rates, dismissal rates, termination phase) using case metadata.
high null result The New Pro Se: Generative AI and the Surge in Federal Civil... win rate; dismissal rate; procedural termination phase
This study uses panel data from 281 Chinese cities between 2005 and 2022, treats establishment of national GIPs as a quasi‑natural experiment, and applies a double machine learning approach.
Methods description in the paper explicitly states data coverage (281 Chinese cities, 2005–2022), research design (quasi‑natural experiment), and estimation strategy (double machine learning).
high null result Does green industrialization enhance urban industrial chain ... research design / methodological approach
Experts rated 24 AI risks on harm probability and severity, sector and actor vulnerability, actor responsibility, and overall concern.
Study design described in paper: set of 24 defined AI risks rated across several dimensions by Delphi panel participants (n=272).
high null result Prioritization of Risks from Artificial Intelligence: A Delp... risk ratings across multiple dimensions (probability, severity, vulnerability, r...
We conducted a three-round Delphi study in late 2025 with 272 international AI experts.
Methodological description in the paper: three-round Delphi study, timing reported as late 2025, sample size reported as 272 international AI experts.
high null result Prioritization of Risks from Artificial Intelligence: A Delp... study_participation / sample characterization
This study constructs a comprehensive evaluation system of urban ecological resilience from three dimensions: potential, elasticity, and stability.
Methodological description in the paper: authors state they constructed a composite resilience evaluation system composed of three specified dimensions for prefecture-level cities.
high null result The impact of artificial intelligence on urban ecological re... urban ecological resilience index (constructed measure)
Explicit commercial content (product placement) shows no engagement premium (−3.8%, not significant).
Analysis comparing videos labeled for explicit commercial content (product placement) to others; reported percent difference and non-significance.
high null result Auditing Engagement Incentives in the Kidfluencer Ecosystem:... view counts (percent difference)
We conducted a multimodal AI audit of 5,051 videos across 79 kidfluencer channels using weak supervision (LLM-based classification of titles and GPT-4 Vision analysis of thumbnails and descriptions across six literature-grounded dimensions) to assign a probabilistic exploitation score to each video.
Described dataset and methods in paper: multimodal automated pipeline combining weak supervision labeling functions (LLM classifiers on titles, GPT-4 Vision on thumbnails/descriptions) applied to 5,051 videos from 79 channels.
high null result Auditing Engagement Incentives in the Kidfluencer Ecosystem:... probabilistic exploitation score (automated)
The study uses listed companies in China's manufacturing industry from 2010 to 2023 as the research sample.
Authors explicitly state the empirical sample: listed manufacturing firms in China covering 2010–2023.
high null result Big data technology application and carbon emission efficien... research sample/time period (data description)
The positive relationship between BDTA and CEE remains robust after a series of robustness tests and endogeneity tests.
Authors state they conducted robustness checks and endogeneity tests (unspecified in the summary) and report that the main regression results remain robust.
high null result Big data technology application and carbon emission efficien... carbon emission efficiency (CEE) (robustness of main effect)
Brain privacy has both personal and social attributes; its protection therefore implicates individual interests and technological development.
Normative/legal argumentation and conceptual analysis presented in the paper (no empirical data reported).
high null result Empowerment or behavioral regulation? governing brain–comput... scope of brain-privacy (personal vs. social) and implicated interests
New York City’s Local Law 144 mandates annual bias audits to increase transparency.
Statement of law/policy in paper (factual claim about NYC Local Law 144); legal requirement as described in the text.
high null result Towards Using Ai Bias Audits As Inputs For Red Teaming And P... annual bias audit mandate (LL144)
The fairness of AI-enabled hiring systems remains uncertain.
Statement in paper (background/interpretive claim); no direct empirical measure provided in the excerpt.
high null result Towards Using Ai Bias Audits As Inputs For Red Teaming And P... fairness of AI-enabled hiring systems
The study employs a comparative mixed-methods approach (comparative institutional analysis) of leading financial systems in China, the United States, and the United Kingdom (2022–2025), integrating secondary quantitative indicators with qualitative documentary evidence.
Direct methodological statement in the abstract describing the study design and data sources.
high null result Artificial Intelligence in Financial Security Markets: Catal... methodological approach (comparative mixed-methods)
The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows.
Conceptual argument in the paper articulating difference between two defined concepts (Agentic Technical Debt vs Stochastic Tax); no empirical demonstration.
high null result Governing Technical Debt in Agentic AI Systems conceptual distinction between liability (stock) and operating cost (flow)
Stochastic Tax is the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds.
Paper provides a formal definition / conceptual framing of 'Stochastic Tax'; stated as an operational concept (no empirical quantification provided).
high null result Governing Technical Debt in Agentic AI Systems operating burden from probabilistic agent behavior
Agentic Technical Debt is the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed.
Paper provides a formal definition / conceptual framing of 'Agentic Technical Debt'; presented as a definitional contribution rather than an empirically measured quantity.
high null result Governing Technical Debt in Agentic AI Systems conceptual definition of a technical/governance liability
Agentic AI systems reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback.
Descriptive/definitional statement in the paper; presented as characteristics of agentic systems rather than supported by empirical measurement.
high null result Governing Technical Debt in Agentic AI Systems architectural/behavioral characteristics of agentic AI systems
Agentic AI systems are increasingly being explored as production infrastructure.
Stated as an observation in the paper's introduction/abstract; no empirical data, sample, or formal measurement provided (conceptual/observational claim).
high null result Governing Technical Debt in Agentic AI Systems exploration/adoption of agentic AI as production infrastructure
We identify four archetypes (data orchestrators, aggregators, niche specialists, and cloud orchestrators).
Paper states it develops a taxonomy and explicitly lists four archetypes; based on the taxonomy development and conceptual classification reported in the paper (no sample size or quantitative empirical test reported in abstract).
high null result An Ai Economy Beyond Big Tech Hyperscalers? A Taxonomy Of Ma... presence_of_archetypes (data orchestrators, aggregators, niche specialists, clou...
Regression models and moderation analyses were performed in R to examine associations between governance exposure, AI maturity, and adaptation intensity.
Methods statement: 'Regression models and moderation analyses were performed in R (R Computing, Austria) to examine associations between governance exposure, AI maturity, and adaptation intensity.'
high null result Research on the adaptation path of corporate strategy based ... associations_between_governance_exposure_AI_maturity_and_adaptation_indices
Path-specific composite indices for bifurcation, modularity, ethical signaling, and compartmentalization were quantified using validated scales.
Methods description in the paper: 'Path-specific composite indices ... were quantified using validated scales.'
high null result Research on the adaptation path of corporate strategy based ... composite_adaptation_indices (bifurcation, modularity, ethical signaling, compar...
The study coded 500 adaptation events.
Explicit statement: 'and 500 coded adaptation events.'
high null result Research on the adaptation path of corporate strategy based ... adaptation_event_count
The qualitative dataset included 48 executive and technical informants.
Explicit statement: 'including 48 executive and technical informants'.
The study uses a comparative multi-case dataset of 12 multinational firms (4 tri-jurisdictional, 4 Atlantic, 4 China-primary).
Explicit dataset description in the paper: 'A comparative multi-case dataset of 12 multinational firms (4 tri-jurisdictional, 4 Atlantic, 4 China-primary) was analyzed.'
This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions.
Comparison of action-recipient patterns vs action-type distributions across the experimental conditions in the simulation; reported observation that action-type distributions did not show increased negative actions and that audits of action logs (action types) failed to reveal the bias.
high null result Human-like in-group bias in instruction-tuned language model... action-type distribution (no increase in negative actions) and detectability of ...
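The mechanism can be illustrated with a toy log. The sketch below uses made-up data and hypothetical names (`action_type_share`, `negative_rate_by_recipient`); it is not the paper's simulation. Both logs have the same action-type distribution, yet in the second every negative action falls on one group, which only a recipient-level breakdown reveals.

```python
from collections import Counter

def action_type_share(log):
    """Share of each action type, ignoring who received it."""
    counts = Counter(action for action, _ in log)
    total = len(log)
    return {action: n / total for action, n in counts.items()}

def negative_rate_by_recipient(log, negative="penalize"):
    """Fraction of the actions aimed at each group that are negative."""
    totals, negatives = Counter(), Counter()
    for action, recipient in log:
        totals[recipient] += 1
        if action == negative:
            negatives[recipient] += 1
    return {group: negatives[group] / totals[group] for group in totals}

# Made-up logs of (action, recipient_group) pairs.
neutral = [("help", "in"), ("penalize", "in"), ("help", "out"), ("penalize", "out")]
biased = [("help", "in"), ("help", "in"), ("penalize", "out"), ("penalize", "out")]

# An audit of action types alone sees no difference between the logs.
same_types = action_type_share(neutral) == action_type_share(biased)
neutral_rates = negative_rate_by_recipient(neutral)
biased_rates = negative_rate_by_recipient(biased)
print(same_types, neutral_rates, biased_rates)
```

An audit that tabulates only action types would pass both logs; conditioning on the recipient group is what separates them.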
(i, continued) The counterfactual toll is explicitly non-unique: the paper demonstrates that more than one valid toll can exist.
Mathematical argument in the paper identifying conditions or constructions that lead to multiple valid tolls (formal counterexample or theorem on non-uniqueness).
high null result Foundations of a Time-Consistent Counterfactual Actuarial Ru... non-uniqueness property of the counterfactual toll
The paper proposes a policy framework consisting of six groups of solutions for Vietnam to both promote AI development and control risks in the digital age.
Declared in abstract: the paper presents a six-group policy framework for Vietnam; the framework itself is the paper's output (proposal), not empirically tested in the paper.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence of a six-group policy framework aimed at promoting AI development and ...
This study employs document synthesis and comparative analysis of international policies.
Methodological statement in the paper abstract describing the research approach; no sample size specified beyond document sources.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... research method used (document synthesis and comparative policy analysis)
The rise of artificial intelligence (AI) is shaping a new Agent Economy (AE), in which autonomous AI agents represent humans in performing a wide range of complex tasks.
Statement in paper abstract/intro (conceptual definition); no empirical data or sample reported.
high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence/definition of Agent Economy (autonomous AI agents representing humans ...
The study contributes a taxonomy of AI workforce impact, a Workforce Resilience Readiness Score (WRRS), an AI Workforce Trust Index (AWTI), an Ethical Automation Boundary concept, and a pilot empirical validation design.
Declared methodological and conceptual contributions in the paper (these are presented as deliverables of the study; no validated results reported in the excerpt).
high null result From Automation Panic to Workforce Resilience: A Governance ... new measurement/conceptual tools (taxonomy, WRRS, AWTI, Ethical Automation Bound...
The International Labour Organization's 2025 update highlights the need to assess exposure to generative AI at the task level using task data, expert input, and AI model predictions.
Reference to ILO 2025 update recommendation described in the paper (policy/technical guidance rather than primary empirical data in the excerpt).
high null result From Automation Panic to Workforce Resilience: A Governance ... recommended assessment methods for AI exposure (task-level approach)