The Commonplace

Evidence (6507 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                      Positive  Negative  Mixed  Null  Total
Other                             609       159     77   736   1615
Governance & Regulation           664       329    160    99   1273
Organizational Efficiency         624       143    105    70    949
Technology Adoption Rate          502       176     98    78    861
Research Productivity             348       109     48   322    836
Output Quality                    391       120     44    40    595
Firm Productivity                 385        46     85    17    539
Decision Quality                  275       143     62    34    521
AI Safety & Ethics                183       241     59    30    517
Market Structure                  152       154    109    20    440
Task Allocation                   158        50     56    26    295
Innovation Output                 178        23     38    17    257
Skill Acquisition                 137        52     50    13    252
Fiscal & Macroeconomic            120        64     38    23    252
Employment Level                   93        46     96    12    249
Firm Revenue                      130        43     26     3    202
Consumer Welfare                   99        51     40    11    201
Inequality Measures                36       105     40     6    187
Task Completion Time              134        18      6     5    163
Worker Satisfaction                79        54     16    11    160
Error Rate                         64        78      8     1    151
Regulatory Compliance              69        64     14     3    150
Training Effectiveness             81        15     13    18    129
Wages & Compensation               70        25     22     6    123
Team Performance                   74        16     21     9    121
Automation Exposure                41        48     19     9    120
Job Displacement                   11        71     16     1     99
Developer Productivity             71        14      9     3     98
Hiring & Recruitment               49         7      8     3     67
Social Protection                  26        14      8     2     50
Creative Output                    26        14      6     2     49
Skill Obsolescence                  5        37      5     1     48
Labor Share of Income              12        13     12     0     37
Worker Turnover                    11        12      3     0     26
Industry                            1         0      0     0      1
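Read as rows of (positive, negative, mixed, null) counts, the matrix can be queried programmatically. A minimal sketch: the counts below are copied from a few rows of the table above; the `matrix` and `negative_share` names are our own, and null findings are excluded from the denominator by choice.

```python
# Share of directional findings that are negative, for a few rows
# of the evidence matrix: (positive, negative, mixed, null) counts.
matrix = {
    "AI Safety & Ethics": (183, 241, 59, 30),
    "Inequality Measures": (36, 105, 40, 6),
    "Job Displacement": (11, 71, 16, 1),
    "Firm Productivity": (385, 46, 85, 17),
}

def negative_share(counts):
    """Fraction of directional claims (excluding null results) that are negative."""
    pos, neg, mixed, _null = counts
    directional = pos + neg + mixed
    return neg / directional if directional else 0.0

for outcome, counts in sorted(matrix.items(), key=lambda kv: -negative_share(kv[1])):
    print(f"{outcome}: {negative_share(counts):.0%} negative")
```

On this slice, Job Displacement comes out around 72% negative while Firm Productivity sits near 9%, which matches the intuition that displacement outcomes skew pessimistic and firm-level productivity outcomes skew optimistic.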
Active filter: Productivity
Claim: The study's findings are subject to design limitations including an AM/PM session confound, differential attrition, and LLM grading sensitivity to document length.
Evidence: Authors' reported limitations section citing specific threats to internal validity and measurement (session timing confound, differential attrition across conditions, and grading biases of the LLM used to evaluate documents).
Source: Scaffolding Human-AI Collaboration: A Field Experiment on Be... (high confidence, negative finding). Outcome: threats to validity (confounds and measurement sensitivity)

Claim: The behavioral scaffolding intervention was associated with substantially lower document production.
Evidence: Same field experiment (N=388); the behavioral scaffolding required joint AI use within pairs and was compared to unstructured use, with reported reductions in document production in the behavioral condition.
Source: Scaffolding Human-AI Collaboration: A Field Experiment on Be... (high confidence, negative finding). Outcome: document production (quantity of documents produced)

Claim: A behavioral scaffolding intervention (a structured protocol requiring joint AI use within pairs) was associated with lower document quality relative to unstructured use.
Evidence: Field experiment with 388 employees at a Fortune 500 retailer; random/experimental assignment to scaffolding conditions while all participants had access to the same AI tool; comparison reported between the behavioral scaffolding condition and unstructured use.
Claim: LLMs lag behind humans in sustaining heterogeneity when divergence is rewarded.
Evidence: Empirical comparison from the experiment showing humans are better able than LLMs to maintain diverse actions when the payoff structure rewards divergence; stated qualitatively in the abstract without numeric effect sizes or sample sizes.
Source: Strategic Algorithmic Monoculture: Experimental Evidence from... (high confidence, negative finding). Outcome: ability to sustain heterogeneity/divergence under incentives
Claim: Latent-outcome estimation faces a within-study noncomparability challenge: different indicators within a study may have different, and possibly nonlinear, relationships with the same latent outcome, making them not directly comparable.
Evidence: Theoretical exposition in the paper describing heterogeneous indicator-to-latent mappings and potential nonlinearity; illustrated with examples (no empirical sample size).
Source: Nonparametric Identification and Estimation of Causal Effect... (high confidence, negative finding). Outcome: comparability of different indicators for the same latent outcome within a study

Claim: Latent-outcome estimation faces a cross-study noncomparability challenge: different measurement systems across studies may cause estimators to target different empirical quantities even when the underlying latent treatment effect is the same.
Evidence: Conceptual and theoretical argumentation in the paper describing identification issues across studies due to differing measurement systems; supported by examples and discussion (no empirical sample size).
Source: Nonparametric Identification and Estimation of Causal Effect... (high confidence, negative finding). Outcome: comparability of estimated latent treatment effects across studies
Claim: Lower survival rates among BDA (big data analytics) adopters are driven by greater uncertainty in sales.
Evidence: The paper states that greater uncertainty in sales is an interrelated factor explaining lower survival for BDA adopters, based on empirical analysis of German start-ups.
Source: Big data-based management decisions and start-up performance (high confidence, negative finding). Outcome: uncertainty in sales (sales volatility/variance)

Claim: Lower survival rates among BDA adopters are driven by higher operating costs.
Evidence: The paper reports that higher operating costs are an interrelated factor explaining lower survival among BDA adopters, based on the same empirical sample of German start-ups.

Claim: Start-ups using BDA face lower survival rates.
Evidence: Empirical comparison of BDA adopters versus non-adopters in a large sample of German start-ups (survival analysis implied by the reported outcome).
Source: Big data-based management decisions and start-up performance (high confidence, negative finding). Outcome: survival (firm exit / failure)
Claim: Enterprise sales organizations are systematically hampered by what this paper terms 'Revenue Friction': the accumulative productivity loss caused by fragmented, human-mediated data entry across disconnected CRM, ERP, and quoting systems.
Evidence: Statement/definition presented in the paper excerpt; no empirical method, sample size, or quantitative evidence reported in the provided text.
Source: From CRM to Cognition: Autonomous Revenue Operations Systems... (high confidence, negative finding). Outcome: accumulative productivity loss (termed 'Revenue Friction') resulting from fragme...
Claim: AI intensity is associated with lower prices charged to purchasers.
Evidence: Empirical analysis reported in the paper linking measures of AI intensity to observed output prices (details of data sources, sample size, and specific methods not provided in the excerpt).
Source: Early Evidence on the Relationship Between AI, Costs, and Pr... (high confidence, negative finding). Outcome: prices charged to purchasers (output prices)

Claim: Some of this reduced price is related to reduced input cost contributions, in particular labor and materials costs.
Evidence: Decomposition/mediation analysis reported in the paper attributing part of the observed price reductions to declines in input cost contributions (labor and materials); exact methods, sample size, and statistical estimates not provided in the excerpt.
Source: Early Evidence on the Relationship Between AI, Costs, and Pr... (high confidence, negative finding). Outcome: input cost contributions (labor costs and materials costs)
Claim: Foundation-model usage can increase compute-related emissions.
Evidence: Conceptual/environmental concern highlighted in the paper about the carbon footprint of heavy model use and persistent storage; no quantified emissions analysis or lifecycle assessment presented.
Source: Remote-Capable Knowledge Work Should Default to AI-Enabled F... (high confidence, negative finding). Outcome: compute-related (carbon) emissions associated with foundation-model usage

Claim: These systems can cause skill atrophy.
Evidence: Theoretical risk articulated in the paper that reliance on AI assistance may degrade human skills over time; no longitudinal skill measurement or experimental evidence provided.
Source: Remote-Capable Knowledge Work Should Default to AI-Enabled F... (high confidence, negative finding). Outcome: degradation or atrophy of worker skills

Claim: The same foundation-model systems can also intensify surveillance.
Evidence: Cautionary claim in the paper noting the surveillance risk of durable, queryable traces and integrated tooling; presented as a conceptual risk rather than an empirically measured increase in surveillance.
Source: Remote-Capable Knowledge Work Should Default to AI-Enabled F... (high confidence, negative finding). Outcome: increase in workplace surveillance capability/use
Claim: Baseline (non-structured) interactions had 16 of 50 accepted on first pass.
Evidence: Reported counts in the paper for the baseline group (16 accepted of 50 baseline interactions).
Source: Context Engineering: A Practitioner Methodology for Structur... (high confidence, negative finding). Outcome: first-pass acceptances (count and rate)

Claim: In an observational study of documented interactions across four AI tools (Claude, ChatGPT, Cowork, Codex), incomplete context was associated with 72% of iteration cycles.
Evidence: Observational study reported in the paper covering interactions across four AI tools; the paper reports the 72% figure.
Source: Context Engineering: A Practitioner Methodology for Structur... (high confidence, negative finding). Outcome: iteration cycles associated with incomplete context
Claim: Job insecurity emerges as a critical mediating factor influencing employee attitudes and behavioural responses to generative AI, including upskilling intentions and resistance to technological change.
Evidence: Review-level synthesis identifying job insecurity, reported in included studies as mediating relationships between AI adoption and employee attitudes/behaviours (e.g., upskilling, resistance).
Source: Generative AI in the Workplace: A Systematic Review of Produ... (high confidence, negative finding). Outcome: upskilling intentions and resistance to technological change (mediated by job in...

Claim: Employees express concerns about role displacement (job loss or role changes) associated with generative AI adoption.
Evidence: Reported across multiple studies included in the review; the review summarises these concerns as part of mixed employee perceptions.
Source: Generative AI in the Workplace: A Systematic Review of Produ... (high confidence, negative finding). Outcome: perceived risk of role displacement / job loss

Claim: Positive perceptions of generative AI coexist with employee concerns about skill obsolescence.
Evidence: Synthesis of studies included in the review documenting worker concerns about skills becoming obsolete due to AI-driven changes.
Source: Generative AI in the Workplace: A Systematic Review of Produ... (high confidence, negative finding). Outcome: concerns about skill obsolescence
Claim: Income inequality, measured by the Gini index, rises moderately in every scenario we examine due to the polarising effect of job losses and wage and capital income increases on the income distribution.
Evidence: Calculation of the Gini index across multiple simulated scenarios using the SWITCH-linked distributional analysis; reported in the report.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: Gini index (income inequality)

Claim: The largest average losses are experienced by middle and higher income households, for whom job displacement outweighs any wage or capital income gains. Lower income households also lose, but by much less.
Evidence: Distributional results from microsimulation (SWITCH) applying scenario-led job displacement, wage and capital effects across income groups; reported in the report.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: change in household disposable income by income group

Claim: When these effects are combined, we find an average decline in household disposable income as a result of AI adoption.
Evidence: Combined scenario simulations incorporating job displacement, wage effects and capital income effects linked to the Irish tax-benefit system using SWITCH; result reported in the report's main findings.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: household disposable income (average change)

Claim: These wage gains are not large enough to counterbalance the average fall in income due to job displacement.
Evidence: Combined simulation results (displacement + wage effects) using scenario assumptions and microsimulation (SWITCH), reported in the report's distributional analysis.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: net effect on household income (wages versus displacement losses)

Claim: Those most likely to experience this disruption are found in higher income households, where the share of workers transitioning into unemployment is substantially larger than in lower income families.
Evidence: Microsimulation (SWITCH) linking simulated job displacement scenarios to household income groups; results reported in the report.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: share of workers transitioning into unemployment by household income

Claim: In our central scenario, drawn from credible international estimates, around 7 per cent of current jobs could be displaced in the short–medium run.
Evidence: Scenario simulation based on international estimates of AI exposure/adoption; central scenario reported in the report (linked to SWITCH microsimulation for distributional analysis).
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: share of jobs displaced

Claim: AI tends to place higher earning and highly educated workers at greater risk of disruption, because the occupations most exposed to AI are predominantly in these groups.
Evidence: Synthesis of international research on occupational exposure to AI and the report's analysis linking exposure to worker characteristics (education and earnings); presented as a descriptive finding in the report.
Source: Artificial Intelligence and income inequality in Ireland (high confidence, negative finding). Outcome: risk of job disruption / occupational exposure to AI
Claim: Result 1: Even a decision-maker who fully anticipates skill erosion rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, producing steady-state loss: the worker ends up less productive than before adoption.
Evidence: Analytical result from the dynamic model showing the optimal adoption choice can lead to a steady state where worker productivity is lower than pre-adoption (model-based comparative statics).
Source: The Augmentation Trap: AI Productivity and the Cost of Cogni... (high confidence, negative finding). Outcome: steady-state worker productivity (relative to pre-adoption)

Claim: Result 2: When managers are short-termist or worker skill has external value, the decision-maker's optimal policy can produce the augmentation trap, leaving the worker worse off than if AI had never been adopted.
Evidence: Analytical result from the dynamic model comparing planner/objective variations (short-termist manager or externalities) and showing an outcome labeled the 'augmentation trap'.
Source: The Augmentation Trap: AI Productivity and the Cost of Cogni... (high confidence, negative finding). Outcome: worker welfare/productivity relative to non-adoption

Claim: Experimental evidence shows that sustained use of AI tools can erode the expertise on which productivity gains depend (deskilling).
Evidence: Statement in the paper referencing experimental studies (no specific study, method, or sample size reported in the excerpt).
Source: The Augmentation Trap: AI Productivity and the Cost of Cogni... (high confidence, negative finding). Outcome: worker expertise / skill level
Claim: Claude Sonnet 4.6 achieves only a 33.3% completion rate on ClawBench.
Evidence: The paper gives a concrete example performance result for Claude Sonnet 4.6 (reported completion percentage on the benchmark).
Source: ClawBench: Can AI Agents Complete Everyday Online Tasks? (high confidence, negative finding). Outcome: task_completion_rate (percentage of tasks completed)

Claim: The authors evaluated 7 frontier models on ClawBench and found that both proprietary and open-source models can complete only a small portion of these tasks.
Evidence: The paper reports evaluations of 7 models on the ClawBench tasks (empirical evaluation across the benchmark).
Source: ClawBench: Can AI Agents Complete Everyday Online Tasks? (high confidence, negative finding). Outcome: task_completion_rate / automation_exposure (how many tasks models can complete)
Claim: Aggressive compression increased total session cost by 67% despite reducing input tokens by 17%, because it shifted interpretive burden to the model's reasoning phase.
Evidence: Result reported from the controlled experiment comparing log-format conditions; four conditions described, but the specific number of sessions/replications is not provided in the abstract.
Source: Beyond Human-Readable: Rethinking Software Engineering Conve... (high confidence, negative finding). Outcome: total session cost (primary) and input token count (secondary)
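The direction of that result is easy to reproduce with back-of-the-envelope pricing arithmetic. The per-token prices and token counts below are invented for illustration (reasoning/output tokens are typically priced several times higher than input tokens); they are chosen to mirror the reported 17% input reduction and 67% cost increase, and are not taken from the paper:

```python
# Hypothetical per-token prices: reasoning/output tokens cost more than input.
PRICE_IN, PRICE_OUT = 1.0, 5.0  # arbitrary units

def session_cost(input_tokens, reasoning_tokens):
    return input_tokens * PRICE_IN + reasoning_tokens * PRICE_OUT

# Baseline: verbose, human-readable logs; little model-side interpretation.
baseline = session_cost(input_tokens=100_000, reasoning_tokens=10_000)
# Aggressive compression: 17% fewer input tokens, but the interpretive
# burden shifts into the (more expensive) reasoning phase.
compressed = session_cost(input_tokens=83_000, reasoning_tokens=33_500)

print(f"cost change: {compressed / baseline - 1:+.0%}")  # +67%
```

The point is structural rather than numeric: trimming cheap input tokens while inflating expensive reasoning tokens can raise, not lower, total session cost.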
Claim: Evaluation of 17 models reveals severe limitations: no model exceeds 66% overall.
Evidence: The paper reports an evaluation across 17 models and states that the maximum overall score observed was below 66%.
Source: ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... (high confidence, negative finding). Outcome: overall accuracy on the implicit memory benchmark

Claim: Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory, where experience becomes automated behavior without conscious retrieval.
Evidence: Statement in the paper's introduction contrasting prior benchmarks' focus on explicit recall with a claimed gap in evaluating implicit (non-declarative) memory; no systematic literature review or quantitative survey reported in the excerpt.
Claim: OpenAI o3 achieves only 17% of optimal collective performance.
Evidence: Experimental measurement of collective performance for OpenAI o3 in the paper's multi-agent setup (value reported in the abstract; no sample size provided there).
Source: More Capable, Less Cooperative? When LLMs Fail At Zero-Cost ... (high confidence, negative finding). Outcome: collective performance (percent of optimal group revenue)
Claim: The study observed errors and limitations in both phases (test generation and refactoring), and manual intervention was necessary at times.
Evidence: Case study observations reported in the paper describing observed model errors/limitations and instances requiring manual developer intervention.
Source: AI-Assisted Unit Test Writing and Test-Driven Code Refactori... (high confidence, negative finding). Outcome: occurrence of errors and need for manual intervention
Claim: Current AI coding assistants, such as GitHub Copilot and Amazon CodeWhisperer, emphasize developer speed and convenience, with energy impact not yet a primary focus.
Evidence: Stated as an observation in the paper; no specific empirical comparison or quantification provided in this excerpt.
Source: EcoAssist: Embedding Sustainability into AI-Assisted Fronten... (high confidence, negative finding). Outcome: design priorities of AI coding assistants (speed/convenience vs. energy impact)

Claim: Frontend code, replicated across millions of page views, consumes significant energy and contributes directly to digital emissions.
Evidence: Asserted in the paper's introduction; no specific empirical data or sample reported in this excerpt.
Source: EcoAssist: Embedding Sustainability into AI-Assisted Fronten... (high confidence, negative finding). Outcome: energy consumption / digital emissions from frontend code
Claim: We posit that persistence is reduced because AI conditions people to expect immediate answers, denying them the experience of working through challenges on their own.
Evidence: Authors' proposed psychological mechanism inferred from observed behavior; presented as a hypothesis rather than a directly proven causal mediator.
Source: AI Assistance Reduces Persistence and Hurts Independent Perf... (high confidence, negative finding). Outcome: mechanistic explanation for reduced persistence (expectation of immediate answer...

Claim: These negative effects (reduced persistence and impaired unassisted performance) emerge after only brief interactions with AI (approximately 10 minutes).
Evidence: Experimental manipulation/exposure in RCTs where participants interacted with AI for about 10 minutes and subsequent outcomes were measured.
Source: AI Assistance Reduces Persistence and Hurts Independent Perf... (high confidence, negative finding). Outcome: onset/time to observable effect (persistence and unassisted performance after ~1...

Claim: People are more likely to give up after interacting with AI (increased likelihood of quitting tasks unassisted).
Evidence: Randomized controlled trials (N = 1,222) measuring rates of task abandonment/giving up after AI interaction versus control.
Source: AI Assistance Reduces Persistence and Hurts Independent Perf... (high confidence, negative finding). Outcome: likelihood of giving up / task abandonment

Claim: AI assistance impairs unassisted performance: although AI improves short-term performance, people perform significantly worse without AI after interacting with it.
Evidence: Randomized controlled trials (N = 1,222) comparing performance with and without AI assistance across tasks; causal inference from randomized assignment.
Source: AI Assistance Reduces Persistence and Hurts Independent Perf... (high confidence, negative finding). Outcome: unassisted task performance (accuracy/quality when working without AI after prio...

Claim: Through a series of randomized controlled trials on human-AI interactions (N = 1,222), we provide causal evidence that AI assistance reduces persistence.
Evidence: Randomized controlled trials (RCTs) on human-AI interactions with total sample size N = 1,222; persistence measured after AI interaction across tasks.
Source: AI Assistance Reduces Persistence and Hurts Independent Perf... (high confidence, negative finding). Outcome: persistence (willingness to continue working on tasks without AI)
Claim: AI-assisted evaluation reduces variance in research quality.
Evidence: SEM and regression analyses on OECD panel data report a decrease in the variance of research quality measures associated with higher AIRC.
Source: AI-Augmented Peer Review and Scientific Productivity: A Cros... (high confidence, negative finding). Outcome: variance in research quality
Claim: Current research has largely focused on short-horizon tasks over a limited set of software with limited economic value (e.g., basic e-commerce and OS-configuration tasks).
Evidence: Narrative literature/field observation reported in the paper's introduction (no numeric study reported in the excerpt).
Source: Gym-Anything: Turn any Software into an Agent Environment (high confidence, negative finding). Outcome: scope and horizon of existing research tasks
Claim: There is a fundamental gap in current agent capabilities: functional correctness alone is insufficient for design-aware issue resolution, motivating design-aware evaluation beyond functional correctness.
Evidence: Synthesis of experimental findings: low design-satisfaction despite functional correctness, the prevalence of design violations, and only partial improvement from guidance support the conclusion.
Source: Does Pass Rate Tell the Whole Story? Evaluating Design Const... (high confidence, negative finding). Outcome: agent capability for design-aware issue resolution

Claim: Design violations are widespread in agent-produced patches.
Evidence: Empirical results from experiments on the benchmark showing many patches violate validated design constraints; backed by counts/percentages in the evaluation (as summarized in the abstract).
Source: Does Pass Rate Tell the Whole Story? Evaluating Design Const... (high confidence, negative finding). Outcome: number/occurrence of design violations

Claim: Test-based correctness substantially overestimates patch quality: fewer than half of resolved issues are fully design-satisfying.
Evidence: Experimental evaluation with state-of-the-art LLM-based agents on the 495 benchmark issues (as reported in the paper); comparison between test pass rates and design-satisfaction measured by a verifier.
Source: Does Pass Rate Tell the Whole Story? Evaluating Design Const... (high confidence, negative finding). Outcome: design-satisfaction of patches (design compliance)
Claim: Despite growing investment in data analytics, the decision-making and coordination layers of these workflows remain predominantly manual, reactive, and fragmented across outlets, distribution centers, and supplier networks.
Evidence: Stated as an observation in the paper's abstract; no quantitative evidence, metrics, or comparative analysis provided in the excerpt.
Source: Flowr -- Scaling Up Retail Supply Chain Operations Through A... (high confidence, negative finding). Outcome: degree of manual decision-making and coordination (fragmentation/reactivity)