The Commonplace

Evidence (13870 claims)

Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
All artifacts associated with this study are publicly available at https://zenodo.org/records/18489222.
Statement in the paper providing a Zenodo link to artifacts.
high null result The Impact of LLM-Assistants on Software Developer Productiv... availability of study artifacts
This review identifies key research gaps and provides recommendations for future research and practice.
Authors' discussion and conclusion sections synthesizing gaps and offering recommendations based on the mapping results.
high null result The Impact of LLM-Assistants on Software Developer Productiv... research gaps and recommendations (qualitative synthesis)
Satisfaction, Performance, and Efficiency are the most frequently investigated SPACE dimensions, whereas Communication and Activity remain underexplored.
Frequency counts and synthesis across the 39 included studies mapped to SPACE dimensions as reported by the authors.
high null result The Impact of LLM-Assistants on Software Developer Productiv... frequency of SPACE dimensions studied
Only 15% of the reviewed studies extend beyond three SPACE dimensions.
Authors' coding of included studies against the SPACE framework with reported proportion.
high null result The Impact of LLM-Assistants on Software Developer Productiv... proportion of studies examining >3 SPACE dimensions
90% of the reviewed studies adopt a multi-dimensional perspective by examining at least two SPACE dimensions.
Authors' coding of included studies against the SPACE framework, yielding the reported proportion.
high null result The Impact of LLM-Assistants on Software Developer Productiv... proportion of studies examining >=2 SPACE dimensions
This paper is a systematic review and mapping of 39 peer-reviewed studies published between January 2014 and December 2024 that examine the impact of LLM-assistants on software developer productivity.
Authors conducted a systematic review and mapping exercise covering peer-reviewed studies within the stated date range; the paper reports the count of included studies as 39.
high null result The Impact of LLM-Assistants on Software Developer Productiv... scope of literature reviewed (count of studies)
Long-running agents accumulated thousands of sequential decisions; continuously active agents reached 6,000+ prompt-state-action cycles.
Agent activity traces showing sequential decision counts per agent (trace-level telemetry).
high null result Operating-Layer Controls for Onchain Language-Model Agents U... number of prompt-state-action cycles per agent
The system consumed roughly 70B inference tokens across the deployment.
API/inference telemetry reporting total token usage.
high null result Operating-Layer Controls for Onchain Language-Model Agents U... inference token consumption
More than 5,000 ETH was deployed by agents during the experiment.
Accounting of ETH held/deployed by agent-controlled vaults during deployment.
Agents executed about $20M in trading volume over the deployment.
Aggregate trading-volume accounting from the bounded onchain market during deployment.
The deployment produced roughly 300K onchain actions.
Onchain transaction logs aggregated over the deployment.
high null result Operating-Layer Controls for Onchain Language-Model Agents U... onchain actions (transactions executed)
The system produced 7.5M agent invocations during the deployment.
System invocation logs reporting total agent calls across the deployment.
high null result Operating-Layer Controls for Onchain Language-Model Agents U... agent invocations (usage)
DX Terminal Pro was deployed for 21 days with 3,505 user-funded agents trading real ETH in a bounded onchain market.
Deployment logs and system telemetry from a 21-day field deployment reporting the number of user-funded agents.
high null result Operating-Layer Controls for Onchain Language-Model Agents U... number of active agents
Fears of AI automation do not primarily increase support for traditional interventions such as unemployment benefits and training programs.
Comparative analysis of policy preference responses in the 2024 OECD 'Risks that Matter' survey as reported in the paper.
high null result AI, the Future of Work, and the Politics of the Welfare Stat... public support for unemployment benefits and training programs
Cross-stage correlations are very weak: parsing->retrieval r = 0.14, parsing->generation r = 0.17, retrieval->generation r = 0.02.
Reported Pearson (or Spearman) correlation coefficients between stage-level metrics in the benchmark; exact correlation method not specified in excerpt.
high null result Benchmarking Complex Multimodal Document Processing Pipeline... correlation between stage-level quality metrics
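
As a rough illustration of how a cross-stage correlation such as the ones quoted above could be computed, here is a minimal sketch of a Pearson coefficient. The excerpt does not say whether Pearson or Spearman was used, so Pearson is an assumption, and the per-document scores below are hypothetical values invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-document quality scores for two pipeline stages.
parsing_scores = [0.90, 0.70, 0.80, 0.60, 0.95]
retrieval_scores = [0.50, 0.60, 0.40, 0.55, 0.45]
r = pearson_r(parsing_scores, retrieval_scores)
```

A coefficient near zero, like the reported retrieval->generation value of 0.02, would mean that a document's score at one stage says little about its score at the next.
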
We evaluate SecMate in a controlled study with 144 participants and 711 conversations.
Reported experimental study sample and conversation counts in the paper.
high null result SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... study sample size and conversation count
Given the limited sample size, the results should be interpreted as exploratory.
Authors explicitly note limited sample size (20 decks) and label findings exploratory.
high null result Algorithmic personalities and the myth of neutrality: financ... interpretation caveat regarding sample size
Reliability (stability across repeated runs) varies substantially across models, with ICC values ranging from 0.240 to 0.930.
Paper reports intraclass correlation coefficient (ICC) analysis of model output reliability across runs, giving a range of ICC values from 0.240 to 0.930.
high null result Algorithmic personalities and the myth of neutrality: financ... output reliability (ICC)
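
To make the reliability measure concrete, the sketch below computes a one-way random-effects ICC, often written ICC(1,1), from a table of repeated runs. The record does not state which ICC form the paper used, so this variant is an assumption, and the scores are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of rows, one row per rated item (e.g. a pitch deck),
    one column per repeated run under identical conditions.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 items and 2 runs, with equal row lengths")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-item and within-item mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denom


# Hypothetical scores: 4 items, each evaluated in 5 repeated runs.
example_runs = [
    [7.0, 7.5, 7.0, 6.5, 7.0],
    [4.0, 4.5, 5.0, 4.0, 4.5],
    [8.5, 8.0, 8.5, 9.0, 8.5],
    [5.5, 6.0, 5.0, 5.5, 6.5],
]
example_icc = icc_oneway(example_runs)
```

Values close to 1 indicate that repeated runs agree closely for each item; values near 0 indicate that run-to-run noise is as large as the differences between items.
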
To account for stochastic variation in outputs, each model pair was evaluated five times under identical conditions.
Paper states that each model pair was run five times under identical conditions to distinguish one-off variation from persistent tendencies.
high null result Algorithmic personalities and the myth of neutrality: financ... number of repeated runs (methodological)
Each model evaluated 20 real startup pitch decks spanning multiple industries and funding stages.
Paper reports a controlled simulation design in which each model assessed 20 real pitch decks.
high null result Algorithmic personalities and the myth of neutrality: financ... number of pitch decks evaluated (methodological)
The study used three leading models: GPT-4o, Claude 3.5 Sonnet, and DeepSeek-V2.
Explicit statement in the paper describing the experimental subjects: three named LLMs were evaluated.
high null result Algorithmic personalities and the myth of neutrality: financ... models evaluated (methodological)
The paper develops a typology of enterprise applications by their sensitivity to AI-induced shifts in make-or-buy economics.
Paper's stated contribution (conceptual typology based on analysis of application categories and AI sensitivity).
high null result The Buy-or-Build Decision, Revisited: How Agentic AI Changes... classification (typology) of enterprise applications by sensitivity to AI
This paper adopts a conceptual research approach, combining transaction cost economics and the resource-based view with an assessment of current AI capabilities, to systematically re-evaluate the factors underlying the make-or-buy decision.
Paper's stated methodology and theoretical framing (methodological claim about the paper itself).
high null result The Buy-or-Build Decision, Revisited: How Agentic AI Changes... methodological approach to studying make-or-buy decisions
Empirically, the decomposition eliminates evidence of speculation in the 2020–2025 AI rally.
Empirical application of the proposed decomposition and bubble test to asset price data covering the 2020–2025 period associated with the AI rally (data analysis reported in the paper).
high null result General-Purpose Technology and Speculative Bubble Detection presence (or absence) of speculative bubble evidence in the 2020–2025 AI rally
At this stage, AI adoption in Israel does not result in widespread layoffs; its primary impact lies in restructuring the labor market through a slowdown in recruitment, changes in job composition, and the emergence of new AI-related roles.
Empirical claim reported in the paper; the excerpt does not specify datasets, time periods, or sample sizes supporting this observation.
high null result Artificial Intelligence in Israel, Trends, Developments, and... employment changes attributable to AI adoption (layoffs, recruitment rates, job ...
Our architecture combines a two-layer Graph Convolutional Network (GCN) encoder, twin critics, and a value network that drives the adversary.
Model architecture description in the paper specifying a 2-layer GCN encoder, twin critics, and a value network used for adversary control.
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... model architecture components (2-layer GCN encoder, twin critics, adversary-driv...
The robust backup uses the Kantorovich-Rubinstein dual, a projected subgradient inner loop, and a primal-dual risk-budget update.
Algorithmic description in the paper detailing the robust backup solver components (Kantorovich-Rubinstein dual, projected subgradient, primal-dual update).
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... robust backup algorithm design and optimization procedure
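
For reference, the standard Kantorovich-Rubinstein duality for the Wasserstein-1 distance can be written as below. This is the textbook statement, not the paper's specific robust backup; the symbols P, Q, d, f, and Γ(P, Q) are notation chosen here for illustration.

```latex
W_1(P, Q)
  \;=\; \inf_{\gamma \in \Gamma(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}\big[ d(x, y) \big]
  \;=\; \sup_{\|f\|_{\mathrm{Lip}} \le 1}
        \Big( \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \Big)
```

Here Γ(P, Q) is the set of couplings with marginals P and Q, d is the ground metric, and the supremum ranges over functions that are 1-Lipschitz with respect to d. The dual form replaces a search over couplings with a search over Lipschitz test functions.
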
To mitigate distributional shifts, we optimize a Soft Actor-Critic (SAC) agent against a Wasserstein-1 ambiguity set with a graph-aligned Mahalanobis ground metric that captures spatial correlations.
Methodological description of a robust training objective: SAC optimized under a Wasserstein-1 ambiguity set using a graph-aligned Mahalanobis metric to encode spatial correlations.
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... robustness to distributional shift via Wasserstein-1 ambiguity set with graph-al...
These intentions are projected at every decision step through a time-limited rolling mixed-integer linear program (MILP) that strictly enforces state-of-charge, port, and feeder constraints.
Method/algorithm description in the paper: a rolling MILP projection component implemented to enforce physical constraints (state-of-charge, charger port limits, feeder limits) at each decision step.
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... constraint compliance via MILP projection (state-of-charge, port, feeder constra...
The policy learns over high-level intentions produced by a masked, temperature-annealed actor.
Method/algorithm description in the paper describing the actor design (masked, temperature-annealed) and the high-level intentions used for policy learning.
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... policy representation (high-level intentions from masked, temperature-annealed a...
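
One common way to realize a masked, temperature-annealed actor is a softmax over action logits with infeasible actions zeroed out and a temperature that decays during training. The sketch below follows that general pattern; the function names, the linear schedule, and its default values are assumptions for illustration and are not taken from the paper.

```python
import math


def masked_softmax(logits, mask, temperature):
    """Softmax over the allowed actions only; masked actions get probability 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")

    # Subtract the largest allowed scaled logit for numerical stability.
    peak = max(l / temperature for l, m in zip(logits, mask) if m)
    exps = [
        math.exp(l / temperature - peak) if m else 0.0
        for l, m in zip(logits, mask)
    ]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, start=1.0, end=0.1, decay_steps=1000):
    """Hypothetical linear schedule from `start` to `end` over `decay_steps`."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + (end - start) * frac


# Three candidate intentions; the second is infeasible and therefore masked.
early = masked_softmax([1.0, 2.0, 3.0], [True, False, True], annealed_temperature(0))
late = masked_softmax([1.0, 2.0, 3.0], [True, False, True], annealed_temperature(1000))
```

Lowering the temperature concentrates probability on the highest-scoring allowed action, so the policy explores early and becomes more deterministic later.
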
We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed actions (discrete actions for serving, repositioning, and charging, together with continuous charging power) and variable action durations.
Methodological description in the paper presenting the model formulation (hex-grid semi-MDP) and action space design; no external dataset required.
high null result Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... problem formulation (hex-grid semi-MDP with mixed and continuous actions and var...
The analysis employs rigorous econometric methods including difference-in-differences estimation and propensity score matching to control for confounding variables across industry (NAICS 2-digit), firm size, geographic location, occupation-level characteristics, and macroeconomic conditions.
Methodological description in the paper specifying DiD and propensity score matching and listed covariates/controls.
high null result The Generative AI Revolution: Early Evidence of Structural T... methodological controls / identification strategy
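
The core arithmetic of a difference-in-differences estimate can be sketched in its simplest two-group, two-period form. This omits the propensity score matching and the covariate controls the paper describes, and the outcome values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("cannot average an empty sequence")
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences.

    Change in the treated group minus change in the control group. The
    control group's change stands in for what would have happened to the
    treated group without treatment (the parallel-trends assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values before and after adoption.
effect = did_estimate(
    treated_pre=[10, 12],
    treated_post=[15, 17],
    control_pre=[9, 11],
    control_post=[11, 13],
)
```

In this toy example the treated group rises by 5 and the control group by 2, so the estimated effect is 3.
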
The study uses U.S. Census Bureau Business Trends and Outlook Survey data tracking over 1.2 million businesses.
Paper statement that it incorporates the Census Bureau Business Trends and Outlook Survey covering >1,200,000 businesses.
high null result The Generative AI Revolution: Early Evidence of Structural T... business-level observations (adoption/behavior)
The analysis integrates the Anthropic Economic Index capturing approximately one million AI usage interactions.
Paper statement that the Anthropic Economic Index was used and captures ~1,000,000 AI usage interactions.
high null result The Generative AI Revolution: Early Evidence of Structural T... AI usage interactions (adoption/usage)
We run over 1,100 games with over 16,000 private conversations totaling 15.2 million tokens and over 150,000 player actions.
Dataset and experimental log statistics reported in the paper.
high null result Cooperate to Compete: Strategic Coordination in Multi-Agent ... dataset size metrics (games, conversations, tokens, actions)
We run AI-only games and conduct a user study pitting human players against AI opponents.
Method statement in the paper describing experiments with both AI-only and human-vs-AI games.
high null result Cooperate to Compete: Strategic Coordination in Multi-Agent ... experimental setup (AI-only games and user study)
Players have asymmetric objectives and negotiations are non-binding, allowing alliances to form and break as players' short-term interests align and diverge.
Specification of game mechanics and rules in the paper (design features of C2C).
high null result Cooperate to Compete: Strategic Coordination in Multi-Agent ... game mechanic: objective asymmetry and non-binding negotiation
We introduce Cooperate to Compete (C2C), a multi-agent environment where players can engage in private negotiations while competing to be the first to achieve their secret objective.
Description of a newly developed environment (paper introduces the game and its rules/design).
high null result Cooperate to Compete: Strategic Coordination in Multi-Agent ... environmental features (private negotiations, secret objectives)
Overall, robot exposure is only weakly related to job-quality outcomes once controls and fixed effects are included.
Individual-level data from the European Working Conditions Telephone Survey (EWCTS) 2021 merged with country–industry robot exposure measures from International Federation of Robotics (IFR) statistics; weighted logistic regression models including individual and job controls and country and industry fixed effects.
high null result Gendered Effects of Robotisation on Job Quality job-quality outcomes (aggregate across dimensions)
There is no decrease in coding skills among new hires associated with GitHub Copilot (GHC) adoption.
Comparison of coding-skill indicators on LinkedIn profiles for new hires at GHC-adopting firms versus non-adopting firms; finding of no measurable decline in coding-skill measures.
high null result Firms' GitHub Copilot adoption and labor market outcomes for... coding skills among new hires
Semantic search maintained comparable inter-rater agreement while reducing chart abstraction time.
Clinical utility evaluation reports that inter-rater agreement was comparable between semantic-search-assisted abstraction and clinician-performed chart review.
The authors optimized embedding model and chunking strategy using a physician-authored benchmark dataset.
Methods: experiment described as optimization of embedding model and chunking using a physician-authored benchmark dataset.
high null result Health System Scale Semantic Search Across Unstructured Clin... model_and_chunking_configuration
The system uses instruction-tuned qwen3-embedding-0.6B embeddings, stores vectors in a managed database with storage-optimized indexing, maintains full-text metadata in a low-latency key-value store, and operates within a HIPAA-compliant governance framework.
Methods description of system architecture and governance provided in the paper.
high null result Health System Scale Semantic Search Across Unstructured Clin... system_architecture / governance_compliance
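
The split between a vector index and a separate key-value store for text can be illustrated with a small in-memory sketch. It is not the deployed system: the two-dimensional vectors, chunk identifiers, and placeholder texts are hypothetical, and plain dictionaries stand in for the managed vector database and the key-value store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, vector_index, text_store, k=2):
    """Rank chunks by similarity, then look up their text by chunk id."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), chunk_id)
         for chunk_id, vec in vector_index.items()),
        reverse=True,
    )
    return [(chunk_id, score, text_store[chunk_id]) for score, chunk_id in scored[:k]]


# Hypothetical toy data: one note can yield several chunks, hence several vectors.
vector_index = {
    "note1#0": [1.0, 0.0],
    "note1#1": [0.0, 1.0],
    "note2#0": [0.7, 0.7],
}
text_store = {
    "note1#0": "placeholder chunk A",
    "note1#1": "placeholder chunk B",
    "note2#0": "placeholder chunk C",
}
hits = top_k([1.0, 0.1], vector_index, text_store, k=2)
```

Keeping only vectors in the similarity index and fetching text by key afterwards keeps the index small, which is one plausible reason for separating the two stores.
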
We deployed a semantic search system indexing 166 million clinical notes (484 million vectors) from 1.68 million patients.
Paper reports a production deployment at a large children's hospital and gives exact index counts: 166 million clinical notes, 484 million vectors, 1.68 million patients.
high null result Health System Scale Semantic Search Across Unstructured Clin... number_of_notes_indexed / index_size
We develop an analytical model in which a firm jointly chooses AI deployment and cybersecurity investment under this governance-capability gap.
Methodological claim: the paper presents an analytical (theoretical) model describing joint choice of deployment and cybersecurity investment.
high null result The Security Cost of Intelligence: AI Capability, Cyber Risk... model of joint choice (AI deployment and cybersecurity investment)
Through a rigorous sensitivity analysis of resource scarcity and temporal dominance, we quantify the coordination gap.
Methodological description in the paper indicating the authors performed a systematic sensitivity analysis across environmental parameters (resource scarcity and temporal dominance) to measure performance differences between training modalities.
The need for foundational research on AI identity is the central conclusion of this report.
Authors' stated conclusion of the paper.
high null result AI Identity: Standards, Gaps, and Research Directions for AI... priority recommendation for future research
We define AI Identity as the continuous relationship between what an AI agent is declared to be and what it is observed to do, bounded by the confidence that those two things correspond at any given moment.
Conceptual definition presented by the authors (conceptual/terminological contribution rather than empirical evidence).
high null result AI Identity: Standards, Gaps, and Research Directions for AI... conceptualization of AI agent identity
The sign reversal is a structural consequence of the reviewer effort collapse under log-concave quality distributions; this is proved analytically.
Formal analytical proofs in the paper that use the assumption of log-concave quality distributions to show the mechanism producing the sign reversal.
high null result Buying the Right to Monitor: Editorial Design in AI-Assisted ... existence of sign reversal as a robust structural model implication under log-co...
We formalize the distinction between compensatory and non-compensatory decision regimes and define a pre-execution legitimacy boundary.
Theoretical formalization presented in the paper (definitions and conceptual framework). No empirical evidence or sample size provided.
high null result Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... formal definitions distinguishing decision regimes and the notion of a pre-execu...