The Commonplace

Evidence (8570 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
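
As a rough illustration of how the matrix can be read, the sketch below computes the share of negative findings for three rows copied from the table. The `rows` dictionary and the `negative_share` helper are hypothetical names introduced here. The denominator is assumed to be the sum of the four direction columns rather than the Total column, since the two do not agree on every row.

```python
# Counts copied from the Evidence Matrix: (positive, negative, mixed, null).
rows = {
    "Job Displacement": (12, 80, 20, 1),
    "Skill Obsolescence": (5, 46, 6, 1),
    "Inequality Measures": (44, 122, 49, 6),
}


def negative_share(counts):
    """Fraction of directional claims that report a negative finding."""
    positive, negative, mixed, null = counts
    return negative / (positive + negative + mixed + null)


for outcome, counts in rows.items():
    print(f"{outcome}: {negative_share(counts):.1%} negative")
```

Under this assumption, the three outcomes shown are dominated by negative findings, whereas most of the larger categories in the table lean positive.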
Filter: Adoption
In our central scenario — drawn from credible international estimates — around 7 per cent of current jobs could be displaced in the short–medium run.
Scenario simulation based on international estimates of AI exposure and adoption; the report presents this as its central scenario and links it to the SWITCH microsimulation for distributional analysis.
high negative Artificial Intelligence and income inequality in Ireland share of jobs displaced
AI tends to place higher earning and highly educated workers at greater risk of disruption, because the occupations most exposed to AI are predominantly in these groups.
Synthesis of international research on occupational exposure to AI, combined with the report's analysis linking exposure to worker characteristics (education and earnings); presented as a descriptive finding in the report.
high negative Artificial Intelligence and income inequality in Ireland risk of job disruption / occupational exposure to AI
Traditional frameworks for competition law, which emphasize short-term price impacts and inflexible market definitions, are inadequate to address exclusionary effects in AI-driven markets.
Conceptual/legal analysis combined with the paper's empirical findings (panel-data evidence of non-price exclusionary dynamics); the authors argue that the conventional focus of competition law does not match the AI-driven exclusion they observe.
high negative Algorithmic Advantage and Barriers to Entry in AI-Driven Mar... adequacy of competition-law frameworks
Path dependence produced by dynamic learning processes disproportionately disadvantages late entrants.
Empirical and theoretical analysis in the paper: dynamic learning / cumulative learning modeled in the conceptual framework and empirically tested using panel data on AI-intensive markets showing persistent advantages for early entrants.
high negative Algorithmic Advantage and Barriers to Entry in AI-Driven Mar... relative disadvantage / entry probability of late entrants
These effects are made worse by data concentration.
Moderator/interaction analysis reported in the paper showing that market-level data concentration amplifies the association between algorithmic advantage and both reduced entry and greater concentration in the panel-data analysis.
high negative Algorithmic Advantage and Barriers to Entry in AI-Driven Mar... entry rates (and market concentration)
Elevated levels of algorithmic advantage are consistently linked to diminished entry rates.
Empirical analysis using panel data: regressions on an unbalanced panel of markets with high AI intensity, controlling for firm size, capital intensity, R&D expenditure, and industry growth (as described in the paper).
The expansion of AI in digital health has simultaneously introduced complex governance, privacy, and financial sustainability challenges.
Argument and synthesis across regulatory policy, ethics, and healthcare economics literatures presented in the review (literature review / conceptual synthesis).
high negative Conceptual framework for AI governance, data privacy complia... governance complexity / privacy compliance burden / financial sustainability ris...
These risks are fundamentally product-level and cannot be eliminated by technical safeguards alone because agent behavior is inherently stochastic.
Theoretical argument/claim in the paper (no empirical demonstration or quantified test provided in the abstract).
high negative Quantifying Trust: Financial Risk Management for Trustworthy... eliminability of product-level agent risks by technical safeguards
Claude Sonnet 4.6 achieves a completion rate of only 33.3% on ClawBench.
Paper gives a concrete example performance result for Claude Sonnet 4.6 (reported completion percentage on the benchmark).
high negative ClawBench: Can AI Agents Complete Everyday Online Tasks? task_completion_rate (percentage of tasks completed)
The authors evaluated 7 frontier models on ClawBench and found that both proprietary and open-source models can complete only a small portion of these tasks.
Paper reports evaluations of 7 models on the ClawBench tasks (empirical evaluation across the benchmark).
high negative ClawBench: Can AI Agents Complete Everyday Online Tasks? task_completion_rate / automation_exposure (how many tasks models can complete)
These dynamics risk trapping workers in a 'low-skill trap'.
Synthesis of observed labour-market polarisation, persistent low-skill segment, and limited reskilling coverage from secondary sources (2020–2024); presented as a likely risk/consequence.
high negative Artificial Intelligence and labour market polarisation in In... entrenchment of low-skill employment and reduced upward mobility
Limited reskilling coverage constrains workers' ability to adapt to AI-driven changes.
Paper reviews official reports and secondary data (2020–2024) indicating low coverage/uptake of reskilling programs in India and links this to limited adaptation capacity.
high negative Artificial Intelligence and labour market polarisation in In... coverage/effectiveness of reskilling and workers' adaptive capacity
AI-driven change is intensifying wage disparities.
Paper links observed occupational shifts in secondary data (2020–2024) with widening wage gaps between high- and lower-skilled groups.
high negative Artificial Intelligence and labour market polarisation in In... wage disparities between skill groups
Routine middle-skilled roles are declining.
Secondary data and official reports from 2020–2024 documenting reductions in middle-skill occupations, interpreted through SBTC/Human Capital frameworks.
high negative Artificial Intelligence and labour market polarisation in In... decline in middle-skill jobs / job displacement in routine roles
There is a 'capability-demand inversion' where skills most demanded in AI-exposed jobs are those LLMs perform least well at in our benchmark.
Cross-referencing SAFI performance with Anthropic Economic Index demand data (reported in paper); described as an observed inversion pattern.
high negative The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... relationship between skill demand in AI-exposed jobs and SAFI performance
Conversational AI can covertly redirect consumer choices at scale, and existing transparency mechanisms may be insufficient to protect users.
Summary/interpretive claim based on the experimental findings (large increase in sponsored selections under LLM agents, low detection rates, lack of effect for 'Sponsored' labels) from the preregistered experiments (N = 2,012).
high negative Commercial Persuasion in AI-Mediated Conversations ability of conversational AI to influence consumer choices and effectiveness of ...
Instructing the model to conceal its intent makes its influence nearly invisible (detection accuracy < 10%).
Experimental manipulation instructing the LLM to conceal intent; reported detection accuracy under this condition is <10% in the experiments (N = 2,012).
high negative Commercial Persuasion in AI-Mediated Conversations participant detection accuracy of concealed promotional intent
The vast majority of participants fail to detect any promotional steering.
Reported participant detection measures collected during the experiments indicating low detection rates of promotional steering; based on the same experimental sample (N = 2,012).
high negative Commercial Persuasion in AI-Mediated Conversations participant detection of promotional steering
We term this the Logic Monopoly -- the agent society's unchecked monopoly over the entire logic chain from planning through execution to evaluation.
Terminology/definition introduced by the authors to describe the conceptual governance problem; definitional claim rather than empirical finding.
high negative AgentCity: Constitutional Governance for Autonomous Agent Ec... concentration of control over planning, execution, and evaluation logic
When agents from different human principals collaborate at scale, the collective becomes opaque: no single human can observe, audit, or govern the emergent behavior.
Conceptual/analytical claim presented as a security/governance risk in the paper; no empirical study or quantified measurement given in the excerpt.
high negative AgentCity: Constitutional Governance for Autonomous Agent Ec... observability/auditability/governability of multi-principal agent collectives
Health disparities research is severely underrepresented at just 5.7% of AI-funded work.
Semantic/topic classification identifying projects addressing health disparities among AI-labelled projects, yielding a reported share of 5.7%.
high negative An Analysis of Artificial Intelligence Adoption in NIH-Funde... share of AI-funded projects focused on health disparities
A critical research-to-deployment gap exists: 79% of AI projects remain in research/development stages while only 14.7% engage in clinical deployment or implementation.
Stage classification of AI-labelled projects in the dataset, reporting 79% classified as research/development and 14.7% as clinical deployment/implementation.
high negative An Analysis of Artificial Intelligence Adoption in NIH-Funde... stage of project (research/development vs clinical deployment/implementation)
Many agents hover around the break-even point despite similar semantic matching scores.
Observed empirical pattern reported in benchmark results: agents with similar semantic matching scores nevertheless show different financial outcomes (many near break-even).
high negative Market-Bench: Benchmarking Large Language Models on Economic... profitability relative to semantic matching score
AI-assisted evaluation reduces variance in research quality.
SEM and regression analyses on OECD panel data report a decrease in variance of research quality measures associated with higher AIRC.
high negative AI-Augmented Peer Review and Scientific Productivity: A Cros... variance in research quality
High-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements.
Authors' legal and normative conclusion based on their regulatory mapping and analysis (argumentative/legal reasoning rather than reported empirical testing).
high negative AI Agents Under EU Law compliance feasibility of high-risk agentic systems with untraceable behavioral ...
The paper identifies agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift.
Author-stated findings from the regulatory mapping and analysis; specific challenge areas listed without reported quantitative measurement.
high negative AI Agents Under EU Law compliance challenges (cybersecurity, human oversight, transparency, runtime dri...
The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive.
Legal/regulatory mapping asserted by the authors listing specific EU regulations and directives that impose obligations on providers.
high negative AI Agents Under EU Law regulatory obligations faced by AI agent providers
Multiple distinct contexts tend to collapse into one another or 'rot', degrading over time and reducing the utility of efforts to account for context.
Theoretical and empirical claim supported by interviewee reports and the authors' analytic synthesis; presented as observed pattern across cases (qualitative; sample size not specified).
high negative Context Collapse: Barriers to Adoption for Generative AI in ... durability and distinctness of contextual representations and their utility for ...
Generative AI tools fail to account for users' context in workplace settings.
Findings from expert interviews reporting concrete examples where tools did not incorporate or respect relevant contextual information; qualitative analysis (sample size not provided in the summary).
high negative Context Collapse: Barriers to Adoption for Generative AI in ... degree to which tools incorporate relevant contextual factors
Current approaches to account for the contexts in which generative AI technologies are used fall short of users' expectations and needs.
Qualitative empirical study based on expert interviews and analysis of user/developer perspectives (method described as expert interviews; exact sample size not stated in provided summary).
high negative Context Collapse: Barriers to Adoption for Generative AI in ... fit between system behavior and users' expectations/needs (contextual appropriat...
Occupations are not eradicated instantaneously, but gradually encroached upon via atomic actions.
Conceptual argument presented by the authors as part of their theoretical framing (Tech-Risk Dual-Factor Model); no empirical count reported for this specific claim.
high negative Bounded by Risk, Not Capability: Quantifying AI Occupational... process of occupational change / displacement
Existing task-based evaluations predominantly measure theoretical "exposure" to AI capabilities, ignoring critical frictions of real-world commercial adoption: liability, compliance, and physical safety.
Authors' statement in the paper contrasting prior task-based exposure evaluations with the paper's focus on business and institutional frictions (liability, compliance, physical safety). No numeric sample; literature critique based on conceptual analysis.
high negative Bounded by Risk, Not Capability: Quantifying AI Occupational... theoretical automation exposure measurement practices
Current research has largely focused on short-horizon tasks over a limited set of software with limited economic value (e.g., basic e-commerce and OS-configuration tasks).
Narrative literature/field observation reported in paper introduction (no numeric study reported in excerpt).
high negative Gym-Anything: Turn any Software into an Agent Environment scope and horizon of existing research tasks
We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years.
Authors' temporal analysis combining industry structure observations (semiconductor manufacturing concentration) with estimated R&D timelines for mechanisms (qualitative/engineering timeline estimates). No empirical time-series sample size provided.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... temporal feasibility window for hardware-level governance
We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns.
Authors' threat analysis (qualitative assessment of technical and geopolitical threat vectors). No quantitative sample size; based on literature and engineering reasoning.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... threats to feasibility and effectiveness of compute-based governance
Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature.
Authors' feasibility assessments of mechanisms (qualitative/engineering evaluation across the taxonomy); identification of critical mechanisms for treaty verification and corresponding feasibility ratings. No empirical trial or sample size reported.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... maturity/feasibility of treaty-relevant hardware mechanisms
The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective.
Authors' framing and literature review presented in the paper (conceptual/qualitative argument; no empirical sample size reported).
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... hardware-level governance examination / policy-technical gap
The literature remains fragmented, with limited integrative frameworks to explain how AI-human dynamics and decision-making typologies shape outcomes.
Conclusion drawn from the systematic review and bibliometric analysis of the 627-article corpus as reported in the abstract.
high negative Advancing Decision-Making through AI-Human Collaboration: A ... degree of integration/coherence of the academic literature; presence of integrat...
Within robotics subsectors, system integration delivers earlier and stronger carbon-reduction effects than ontology manufacturing.
Subsector analysis in the panel data (277 prefecture-level cities, 2008–2019) comparing effects of system integration versus ontology manufacturing on urban carbon emissions.
high negative Exploring the nonlinear relationship between robotics manufa... urban carbon emissions (subsector-differentiated effects)
The carbon-mitigation effects of robotics manufacturing are more pronounced in the central region of China than in the eastern region, indicating a latecomer advantage in green industrialization.
Heterogeneity analysis across geographic regions (central vs eastern regions) using the same panel of 277 prefecture-level cities (2008–2019).
high negative Exploring the nonlinear relationship between robotics manufa... urban carbon emissions (heterogeneous effect by region)
A stage-dependent sequential mechanism operates: mature robotics manufacturing promotes robot adoption, which improves urban energy efficiency, and ultimately reduces carbon emissions; this channel is inactive at early stages of industry development.
Mechanism/mediation analysis using the panel data of 277 prefecture-level cities (2008–2019), presented as sequential pathway evidence in the paper.
high negative Exploring the nonlinear relationship between robotics manufa... robot adoption; urban energy efficiency; urban carbon emissions
Once robotics manufacturing reaches a moderate scale, further expansion leads to declines in urban carbon emissions.
Same panel dataset (277 prefecture-level cities, 2008–2019); econometric identification of the right-hand (declining) portion of the inverted U-shaped curve.
Replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release.
Conceptual argument supported by the paper's incident descriptions (e.g., a detected coordinate transformation error); the statement is presented as a general risk rationale.
high negative Exploring Robust Multi-Agent Workflows for Environmental Dat... propensity for plausible-but-incorrect outputs to bypass checks and propagate to...
Up to 25% of routine administrative tasks face high automation risk.
Quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing reporting the share of tasks at high automation risk.
high negative Human Capital and the AI-Powered Future of Work: (Training, ... share of routine administrative tasks at high automation risk
There is a significant deficit in high-demand technical competencies such as data engineering, machine learning maintenance, and AI ethics within the Nigerian workforce.
Findings reported from the quantitative survey of 150 leading Nigerian firms (finance, tech, manufacturing) supplemented by qualitative workforce interviews and policy analysis.
high negative Human Capital and the AI-Powered Future of Work: (Training, ... availability/deficit of technical competencies (data engineering, ML maintenance...
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar-change in external investment demand between treated and control firms.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... change in external capital investment demand (USD)
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... demand for external capital investment
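
The two claims above report the same effect once in dollars and once as a percentage. If both figures refer to the same outcome and the same treatment-control comparison (an assumption, not something the summaries confirm), they jointly imply a control-group baseline. The sketch below works through that arithmetic; the variable names are hypothetical, and "just over $220,000" is approximated as 220,000.

```python
# Figures quoted in the two claims above; 220_000 approximates "just over $220,000".
absolute_drop_usd = 220_000
relative_drop = 0.395  # 39.5% fall relative to the control group

# If drop = relative_drop * baseline, then baseline = drop / relative_drop.
implied_control_baseline = absolute_drop_usd / relative_drop
implied_treated_level = implied_control_baseline - absolute_drop_usd

print(f"Implied control-group demand: ${implied_control_baseline:,.0f}")
print(f"Implied treated-group demand: ${implied_treated_level:,.0f}")
```

Under these assumptions the control group's demand for external capital would be roughly $557,000 and the treated group's roughly $337,000. These are back-of-envelope values, not figures reported by the paper.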
Applying the Auditor-Corrector methodology to ELT-Bench uncovers that most failed transformation tasks contain benchmark-attributable errors — including rigid evaluation scripts, ambiguous specifications, and incorrect ground truth — that penalize correct agent outputs.
Audit results on ELT-Bench identifying categories of benchmark errors (rigid scripts, ambiguous specs, incorrect ground truth) and attributing many failed transformation tasks to these errors; no numeric breakdown or sample count given in the excerpt.
high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... proportion of failed transformation tasks attributable to benchmark errors (qual...
On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility.
Reference to initial evaluation results on ELT-Bench showing low success rates for AI agents; the provided excerpt does not give numerical success rates or sample size.
high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... agent success rate on ELT-Bench (agent capability / practical utility)
LLM uncertainty estimates require statistical correction before they can be used in decision-making.
Empirical finding of severe undercoverage of nominal 95% intervals and demonstration that conformal recalibration is needed to achieve intended coverage.
high negative Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... adequacy of raw LLM uncertainty estimates for decision-making (calibration/cover...
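
The evidence note above mentions conformal recalibration. As a minimal sketch of what such a correction can look like, the code below uses split conformal scaling, a generic textbook technique that is not necessarily the paper's procedure. The function names and the calibration data are hypothetical. Stated intervals are widened by one factor chosen so that, on held-out calibration cases with known answers, the widened intervals would have covered the truth at the target rate.

```python
import math


def conformal_scale(calibration, alpha=0.05):
    """Return the factor by which stated intervals should be widened.

    calibration: list of (lower, upper, truth) triples from held-out cases.
    alpha: target miscoverage rate (0.05 targets 95% coverage).
    """
    scores = []
    for lower, upper, truth in calibration:
        if upper <= lower:
            raise ValueError("each interval needs upper > lower")
        center = (lower + upper) / 2
        half_width = (upper - lower) / 2
        # Distance of the truth from the center, in units of the stated half-width.
        scores.append(abs(truth - center) / half_width)
    scores.sort()
    n = len(scores)
    # Finite-sample conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration cases for this alpha
    return scores[rank - 1]


def recalibrate(lower, upper, scale):
    """Widen (or shrink) an interval around its center by the given factor."""
    center = (lower + upper) / 2
    half_width = (upper - lower) / 2 * scale
    return center - half_width, center + half_width


# Hypothetical calibration set: every stated interval is [-1, 1], but the
# truths often fall far outside it, mimicking overconfident intervals.
truths = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
calibration = [(-1.0, 1.0, t) for t in truths]

scale = conformal_scale(calibration, alpha=0.2)
print(scale, recalibrate(-1.0, 1.0, scale))
```

A scale above 1 indicates that the raw intervals were too narrow on the calibration cases. The coverage guarantee of this approach relies on calibration and new cases being exchangeable, which is itself an assumption.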