The Commonplace

Evidence (6507 claims)

Adoption (7395 claims)
Productivity (6507 claims)
Governance (5877 claims)
Human-AI Collaboration (5157 claims)
Innovation (3492 claims)
Org Design (3470 claims)
Labor Markets (3224 claims)
Skills & Training (2608 claims)
Inequality (1835 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
Filter: Productivity
AGI could fundamentally alter the global distribution of economic and military power.
Paper's geopolitical analysis drawing on capability trends and scenario reasoning (as stated in abstract); no empirical quantification provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
Increased levels of AI assistance may degrade productivity, leading to potentially significant shortfalls under the model's identified conditions.
Model-based comparative-statics and steady-state analysis showing scenarios where marginal increases in AI assistance reduce expected task output; examples/parameter illustrations provided in the paper (theoretical, no empirical sample).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... expected task output / productivity shortfalls associated with increased AI assi...
Introducing AI unreliability (errors/noise in AI outputs) in the model can also generate a productivity paradox: greater AI assistance may lower productivity.
Analytical/theoretical model incorporating AI unreliability; model derivations and examples demonstrating conditions under which unreliability leads to reduced productivity (no empirical data).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... agent productivity (task output) as influenced by AI assistance and AI unreliabi...
Incorporating endogeneity in skill development into the model can induce a productivity paradox where increased AI assistance reduces productivity.
Analytical/theoretical model of human-AI interaction with utility-maximizing human agents and endogenous skill development; steady-state and comparative-static analysis reported in the paper (no empirical sample).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... agent productivity (task output) as a function of AI assistance and endogenous s...
AI integration simultaneously increases workers' concerns about skill obsolescence by 33%.
Reported as a survey/result in the paper; the study includes surveys of 800 marketers (self-reported concerns about skill obsolescence are likely derived from that survey sample).
high negative Augmented Intelligence: Resolving the AI integration-obsoles... worker concerns about skill obsolescence
Rising data velocity renders legacy systems obsolete—threatening approximately $3.4 trillion in global marketing spending.
Paper reports an estimate/claim about threatened global marketing spending tied to legacy systems becoming obsolete (derivation likely from the study's quantitative analysis or economic estimate described in the paper).
high negative Augmented Intelligence: Resolving the AI integration-obsoles... value of global marketing spending at risk
62% of teams suffer from "AI paralysis," unable to scale pilot initiatives beyond isolated implementations.
Reported as a finding in the paper's mixed-methods study (paper states AI adoption audits of 120 organizations and surveys of 800 marketers as part of the study).
high negative Augmented Intelligence: Resolving the AI integration-obsoles... AI paralysis / inability to scale AI pilots
Autonomous software-engineering agents remain unreliable in realistic development settings.
Assertion in abstract summarizing the observed current state; likely based on prior literature and/or authors' observations (no empirical sample size given in abstract).
high negative AI Harness Engineering: A Runtime Substrate for Foundation-M... reliability of autonomous software-engineering agents (ability to perform correc...
Individuals low in trait self-efficacy experienced the steepest ownership erosion (i.e., AI-authorship reduced psychological ownership most for low self-efficacy participants).
Reported moderation analysis in the preregistered experiment showing trait self-efficacy moderated the authorship effect on psychological ownership; preregistered N = 470. (No numeric effect size reported in the abstract.)
high negative Optimized but Unowned: How AI-Authored Goals Undermine the M... change/erosion in psychological ownership as moderated by trait self-efficacy
Participants in the LLM condition reported lower perceived importance (d = 1.13).
Same preregistered experiment; reported effect size d = 1.13; preregistered N = 470.
high negative Optimized but Unowned: How AI-Authored Goals Undermine the M... perceived importance of goals (self-reported)
Participants in the LLM condition reported lower commitment (d = 1.19).
Same preregistered experiment comparing self-authored vs LLM-authored goals; reported effect size d = 1.19; preregistered N = 470.
high negative Optimized but Unowned: How AI-Authored Goals Undermine the M... commitment (self-reported)
Participants in the LLM condition reported lower psychological ownership (d = 1.38).
Same preregistered experiment (between-subjects comparison of authorship); reported effect size d = 1.38; preregistered N = 470.
high negative Optimized but Unowned: How AI-Authored Goals Undermine the M... psychological ownership (self-reported)
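The effect sizes reported for this experiment are Cohen's d values from a between-subjects comparison. As a rough illustration of how such a value is computed, the sketch below uses the common pooled-standard-deviation form of d. The abstract does not say which variant the authors used, so the formula choice is an assumption, and the ratings are hypothetical numbers invented for the example, not data from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical 1-7 ownership ratings (illustrative only, not from the study).
self_authored = [6, 7, 5, 6, 7, 6]
llm_authored = [4, 5, 3, 5, 4, 4]

d = cohens_d(self_authored, llm_authored)
print(f"Cohen's d = {d:.2f}")
```

A positive d here means the first group scored higher. By the usual rule of thumb, values above 0.8 are considered large, which puts the reported d = 1.13 to 1.38 well into large-effect territory.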
The paper identifies five fundamental architectural mismatches between conventional APIs and autonomous agent requirements: exact-identifier dependence, rendering-oriented responses, single-shot interaction assumptions, user-equivalent authorization, and opaque error semantics.
Conceptual analysis and problem-framing presented in the paper (qualitative identification of five mismatch categories).
high negative Agent-First Tool API: A Semantic Interface Paradigm for Ente... architectural_mismatches_between_conventional_APIs_and_autonomous_agent_requirem...
Using LLMs led to fewer creative moments observed in participants (p=0.002).
Within-subject comparison between LLM-assisted and unassisted conditions with reported p-value p=0.002. Study sample N=20.
high negative "Like Taking the Path of Least Resistance": Exploring the Im... count of creative moments
Participants using LLMs had significantly shorter idea-generation periods (p=0.0004).
Within-subject comparison between LLM-assisted and unassisted conditions reported in paper; p-value reported as p=0.0004. Sample size N=20.
high negative "Like Taking the Path of Least Resistance": Exploring the Im... idea-generation period (time spent generating ideas)
AI-assisted engineering teams concurrently face a 19% risk of skills obsolescence.
Empirical finding reported by the study, presumably based on the mixed-methods data (survey/Delphi/case studies) described in abstract.
high negative The AI-engineering imperative - Navigating synergy and obsol... risk of skills obsolescence
Forecasts indicate that automation may supplant as much as 45% of traditional tasks by 2030.
Statement in paper referencing external forecasts (no specific source or sample reported in abstract).
high negative The AI-engineering imperative - Navigating synergy and obsol... percentage of traditional tasks automated by 2030
Existing AI assistants (e.g., ChatGPT, Copilot) utilize pre-defined user preferences and chat interaction histories and are therefore confined to reactive exchanges lacking sufficient adaptability to users' psychophysiological states.
Authorial characterization/argument about current AI assistant behavior; no empirical data reported in abstract to substantiate beyond description.
high negative AwareLLM: A Proactive Multimodal Ecosystem for Personalized ... adaptability of AI assistants
Producing hardened, production-grade agent workflows may require extra compute and time, and these costs must be amortized through reuse across a broad user community.
Argument in paper reasoning that added rigor entails higher compute/time costs and that reuse across users is needed to amortize these costs; no empirical cost estimates provided.
high negative Engineering Robustness into Personal Agents with the AI Work... resource_costs (compute/time) and implications for amortization/adoption
By focusing on rapid, real-time synthesis, AI agents are effectively delivering users improvised prototypes rather than systems fit for high-stakes scenarios in which users may unwittingly apply them.
Conceptual argument presented in the paper asserting a qualitative mismatch between on-the-fly agents and high-stakes production needs; no empirical validation reported.
high negative Engineering Robustness into Personal Agents with the AI Work... suitability for high-stakes use / risk to users
The on-the-fly paradigm short-circuits disciplined software engineering processes—iterative design, rigorous testing, adversarial evaluation, staged deployment, and more—that have delivered relatively reliable and secure systems.
Argumentative claim in paper linking the on-the-fly loop to reduced application of standard SE processes; no empirical study, sample, or quantitative evidence provided.
high negative Engineering Robustness into Personal Agents with the AI Work... reliability and security (degree to which SE processes are applied)
These findings underscore the insufficiency of current agents for interdependent workflows, positioning ComplexMCP as a critical testbed for the next generation of resilient autonomous systems.
Synthesis of empirical results (low agent success rates, identified bottlenecks) presented by authors to make a broader claim about agent readiness and the benchmark's relevance.
high negative ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... agent suitability/readiness for interdependent workflows
The third bottleneck identified is strategic defeatism, a tendency to rationalize failure rather than pursue recovery.
Qualitative/quantitative trajectory analysis indicating agents often choose rationalization/explanatory actions over recovery or retry strategies after failures.
high negative ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... rate of recovery/persistence actions vs rationalization actions after failure
The second bottleneck identified is over-confidence, where agents skip essential environment verifications.
Trajectory analyses showing agents often omit verification steps leading to failed interactions; reported as an identified failure mode.
high negative ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... frequency of environment verification checks performed by agents
Granular trajectory analysis identifies three fundamental bottlenecks, the first being tool retrieval saturation as action spaces scale.
Trajectory analyses of agent interactions with the benchmark reported by authors; observational claim from analysis of agent action sequences as action space increases.
high negative ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... tool retrieval performance / selection accuracy as action space scales
We evaluate various LLMs across full-context and RAG paradigms, revealing a stark performance gap: even top-tier models fail to exceed a 60% success rate, trailing far behind human performance of 90%.
Empirical evaluation reported by authors comparing multiple LLM agents (full-context and RAG) against human performance on benchmark tasks; specific reported success rates: <=60% for top models, 90% for humans.
high negative ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... task success rate (agent vs human)
Common failures include replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns.
Error-mode analysis described in the paper/abstract showing that models substitute complex CAD operations (sweep, loft, twist-extrude) with simpler sketch-and-extrude sequences.
high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... use_of_appropriate_CAD_operations_in_generated_code
Common failures include misinterpreting industrial design parameters.
Reported error analysis in the paper/abstract indicating models often misinterpret engineering/design parameters when generating CAD programs.
high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... accuracy_of_inferred_design_parameters
Common failures include missing fine 3D structure.
Qualitative and quantitative analysis of model outputs on BenchCAD reported in the paper/abstract noting missing fine 3D structural details as a frequent error mode.
high negative BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... completeness_of_3D_structure_in_generated_models
Human capital and technological innovation channels show weaker or even negative effects on Lae, attributed to short-term resource misallocation and skill mismatches.
Spatial mediation analysis (channel analysis) using panel data for 30 provincial regions (2012–2022) assessing mediating roles of human capital and technological innovation.
high negative A study of the impact of artificial intelligence on the low-... mediated effect of human capital and technological innovation on Lae
Functional deployment and operational investment in AI are associated with employment declines.
Regression analyses from the BTOS AI supplement linking measures of functional AI deployment and operational AI investment to firm-reported employment changes; observational associations (sample size and exact model specification not shown in excerpt).
high negative The Microstructure of AI Diffusion: Evidence from Firms, Bus... employment change associated with functional deployment and operational investme...
Employment reductions attributable to AI are rare: only 2% of firms report employment reductions.
Firm self-reports on employment outcomes related to AI from the BTOS AI supplement (Nov 2025–Jan 2026); descriptive statistic reported; sample size not excerpted.
high negative The Microstructure of AI Diffusion: Evidence from Firms, Bus... reported employment reductions due to AI
Among firms with worker-level AI use, 65% restrict use to three or fewer tasks.
Descriptive statistic from BTOS AI supplement giving distribution of number of worker tasks using AI among firms that report worker-level use; sample size not shown.
high negative The Microstructure of AI Diffusion: Evidence from Firms, Bus... breadth of worker-task AI use per firm (number of tasks)
Among adopter firms, scope remains limited: 57% use AI in three or fewer functions.
Descriptive distribution of number of business functions using AI among adopter firms in the BTOS AI supplement (Nov 2025–Jan 2026); sample restricted to adopter firms (sample size not provided).
high negative The Microstructure of AI Diffusion: Evidence from Firms, Bus... number of business functions using AI per adopting firm (breadth of functional d...
Institutional inertia in property valuation poses risks to asset pricing, collateral risk modelling and investor confidence.
Analytical inference from interview findings and theoretical synthesis highlighting implications for property investment and financial market stability.
high negative Exploring barriers to valuation technology adoption in prope... risks to asset pricing, collateral risk modelling and investor confidence
Despite advances in automation, data analytics and AI, the sector has been slow to digitise.
Background statement supported by interview data and sector observation reported in the study.
high negative Exploring barriers to valuation technology adoption in prope... pace of digitisation in the property valuation sector
The IDOI framework provides a transferable model for understanding digital transformation in regulated, high-trust professions and highlights the market-level risks of institutional inertia in property valuation.
Development of the IDOI conceptual framework from qualitative data and theoretical integration; authors' claim about transferability and implications.
high negative Exploring barriers to valuation technology adoption in prope... transferability of the framework and market-level risks from institutional inert...
Generational divides, protectionist attitudes and fears of automation reinforce digital resistance.
Qualitative interview evidence reporting attitudes across cohorts of valuers and firm personnel; thematic analysis identifying cultural and attitudinal themes.
high negative Exploring barriers to valuation technology adoption in prope... cultural/attitudinal resistance to VTech
The Valuers Act (1948), fragmented infrastructure and sovereignty concerns limit innovation.
Interview data from practitioners, firm leaders and regulators in New Zealand citing specific regulatory and infrastructure constraints; thematic analysis.
high negative Exploring barriers to valuation technology adoption in prope... regulatory and infrastructure constraints on innovation
Barriers to adoption arise primarily from institutional conservatism, outdated regulation and weak data governance rather than technical shortcomings.
Qualitative semi-structured interviews with valuers, firm leaders and regulators in New Zealand; thematic analysis guided by Rogers' diffusion of innovations and institutional theory synthesised into the IDOI framework.
high negative Exploring barriers to valuation technology adoption in prope... barriers to VTech adoption
Consequently, generated artifacts may exhibit brittle behavior and limited deployability.
Paper asserts that lack of production awareness leads to brittle artifacts and limited deployability; no quantitative measures or sample sizes provided in the abstract.
high negative Architectural Constraints Alignment in AI-assisted, Platform... brittleness of artifacts and deployability
AI-assisted development tools often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments.
Asserted observation in the paper arguing limitations of general-purpose AI code generation when targeting production-ready systems; no empirical sample size or methodological details provided in the excerpt.
high negative Architectural Constraints Alignment in AI-assisted, Platform... awareness of architectural constraints / suitability for production
Current AI tools are not yet mature enough to replace developers.
Conclusion drawn from the controlled experiment and participant feedback comparing AI-assisted vs traditional task-splitting.
high negative Splitting User Stories Into Tasks with AI -- A Foe or an All... suitability of AI to replace developers
Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development.
Background/introductory statement in the paper describing the problem motivation; no experimental sample size reported for this claim.
high negative Splitting User Stories Into Tasks with AI -- A Foe or an All... time required to split user stories (descriptive claim about time consumption)
Nominally cheaper models can incur higher total cost due to token-intensive reasoning.
Cost and token usage analysis reported in the paper showing cheaper-per-token models may generate more tokens and thus higher total cost in practice.
high negative Switchcraft: AI Model Router for Agentic Tool Calling total inference cost as a function of token usage and per-token price
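The arithmetic behind this claim can be sketched as total cost = tokens generated × per-token price. The example below is a minimal illustration under assumed numbers: the token counts and prices are hypothetical and are not taken from the paper, and the function name is made up for this sketch.

```python
def total_cost(tokens_generated: int, price_per_million_tokens: float) -> float:
    """Total output cost in dollars for one request."""
    return tokens_generated * price_per_million_tokens / 1_000_000


# Hypothetical figures (assumptions for illustration, not from the paper):
# a cheaper model that reasons at length versus a pricier, terser model.
cheap_model_cost = total_cost(tokens_generated=12_000, price_per_million_tokens=2.0)
large_model_cost = total_cost(tokens_generated=1_500, price_per_million_tokens=10.0)

print(f"cheap model: ${cheap_model_cost:.4f}")
print(f"large model: ${large_model_cost:.4f}")
```

With these assumed numbers the model that is five times cheaper per token ends up costing more per request, because it emits eight times as many tokens.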
Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets.
Stated as background/motivation in the paper (conceptual claim; no empirical sample size reported).
high negative Switchcraft: AI Model Router for Agentic Tool Calling inference cost / developer tendency to use large models
Cascade performance is limited primarily by structural cost (a cascade pays for the cheap model before any escalation decision is made), rather than by a shortage of intermediate stages.
Synthesis of theoretical insights and empirical results reported in the paper (theoretical analysis of structural costs + empirical comparisons showing limited benefit from additional stages).
high negative Is Escalation Worth It? A Decision-Theoretic Characterizatio... primary constraint on cascade performance (structural cost vs availability of in...
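The structural cost described here can be illustrated with a simplified two-stage cost model: the cheap model is always paid for, and the large model is paid for only when the query escalates. This is a sketch under stated assumptions, not the paper's formalization. It ignores answer quality entirely, and the function names and cost figures are hypothetical.

```python
def cascade_expected_cost(cheap_cost: float, large_cost: float, escalation_rate: float) -> float:
    """Expected per-query cost of a two-stage cascade.

    The cheap model runs on every query; the large model runs only on escalation.
    """
    return cheap_cost + escalation_rate * large_cost


def break_even_escalation_rate(cheap_cost: float, large_cost: float) -> float:
    """Escalation rate at which the cascade costs the same as calling the large model directly.

    Follows from cheap_cost + p * large_cost = large_cost, so p = 1 - cheap_cost / large_cost.
    """
    return 1.0 - cheap_cost / large_cost


# Hypothetical per-query costs in dollars (assumptions for illustration).
cheap_cost, large_cost = 0.002, 0.010
threshold = break_even_escalation_rate(cheap_cost, large_cost)

for rate in (0.5, 0.8, 0.9):
    cost = cascade_expected_cost(cheap_cost, large_cost, rate)
    verdict = "cheaper than" if cost < large_cost - 1e-12 else "not cheaper than"
    print(f"escalation rate {rate:.1f}: ${cost:.4f} per query, {verdict} the large model alone")
```

Under this model, once the escalation rate passes the break-even threshold, the cascade costs more than calling the large model directly, however many intermediate stages are available.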
Optimized subsequence cascades do not deliver practically meaningful held-out gains over the pairwise envelope.
Empirical evaluation on the five benchmarks comparing optimized subsequence cascades to the pairwise envelope; reported lack of practically meaningful held-out improvement.
high negative Is Escalation Worth It? A Decision-Theoretic Characterizatio... held-out performance gains of optimized subsequence cascades relative to the pai...
Within the deterministic threshold-cascade class, full fixed chains underperform the pairwise envelope.
Empirical comparison across the reported benchmarks and models showing that full fixed chains achieve worse cost-quality tradeoffs than the pairwise envelope (experimental results described in the paper).
high negative Is Escalation Worth It? A Decision-Theoretic Characterizatio... relative cost-quality performance of full fixed-chain cascades versus the pairwi...
Municipal 311 call centers and complaint intake systems face a structural mismatch between incoming volume and classification capacity that produces a bottleneck and differential service quality that follows income and racial lines.
Stated in the paper's introduction; cites prior work (Liu 2024 SLA) as support for the differential service-quality / demographic claim. No sample size or quantitative result reported in the excerpt.
high negative Scaling the Queue: Reinforcement Learning for Equitable Call... differential service quality by income and race