The Commonplace

Evidence (5157 claims)

Filter by topic:
Adoption (7395 claims)
Productivity (6507 claims)
Governance (5877 claims)
Human-AI Collaboration (5157 claims)
Innovation (3492 claims)
Org Design (3470 claims)
Labor Markets (3224 claims)
Skills & Training (2608 claims)
Inequality (1835 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome | Positive | Negative | Mixed | Null | Total
Other | 609 | 159 | 77 | 736 | 1615
Governance & Regulation | 664 | 329 | 160 | 99 | 1273
Organizational Efficiency | 624 | 143 | 105 | 70 | 949
Technology Adoption Rate | 502 | 176 | 98 | 78 | 861
Research Productivity | 348 | 109 | 48 | 322 | 836
Output Quality | 391 | 120 | 44 | 40 | 595
Firm Productivity | 385 | 46 | 85 | 17 | 539
Decision Quality | 275 | 143 | 62 | 34 | 521
AI Safety & Ethics | 183 | 241 | 59 | 30 | 517
Market Structure | 152 | 154 | 109 | 20 | 440
Task Allocation | 158 | 50 | 56 | 26 | 295
Innovation Output | 178 | 23 | 38 | 17 | 257
Skill Acquisition | 137 | 52 | 50 | 13 | 252
Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252
Employment Level | 93 | 46 | 96 | 12 | 249
Firm Revenue | 130 | 43 | 26 | 3 | 202
Consumer Welfare | 99 | 51 | 40 | 11 | 201
Inequality Measures | 36 | 105 | 40 | 6 | 187
Task Completion Time | 134 | 18 | 6 | 5 | 163
Worker Satisfaction | 79 | 54 | 16 | 11 | 160
Error Rate | 64 | 78 | 8 | 1 | 151
Regulatory Compliance | 69 | 64 | 14 | 3 | 150
Training Effectiveness | 81 | 15 | 13 | 18 | 129
Wages & Compensation | 70 | 25 | 22 | 6 | 123
Team Performance | 74 | 16 | 21 | 9 | 121
Automation Exposure | 41 | 48 | 19 | 9 | 120
Job Displacement | 11 | 71 | 16 | 1 | 99
Developer Productivity | 71 | 14 | 9 | 3 | 98
Hiring & Recruitment | 49 | 7 | 8 | 3 | 67
Social Protection | 26 | 14 | 8 | 2 | 50
Creative Output | 26 | 14 | 6 | 2 | 49
Skill Obsolescence | 5 | 37 | 5 | 1 | 48
Labor Share of Income | 12 | 13 | 12 | 0 | 37
Worker Turnover | 11 | 12 | 3 | 0 | 26
Industry | 1 | 0 | 0 | 0 | 1
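The matrix rows are easy to sanity-check programmatically. A minimal sketch (Python; numbers copied from three rows above whose four direction counts sum exactly to the row total; note that several other rows, e.g. Firm Productivity at 533 of 539, total more than their listed counts, suggesting additional directions not shown):

```python
# Three rows of the evidence matrix as (positive, negative, mixed, null)
# counts. For these rows the reported total equals the sum of the four
# direction counts, so the reconciliation below passes.
matrix = {
    "Task Completion Time": (134, 18, 6, 5),
    "Error Rate": (64, 78, 8, 1),
    "Hiring & Recruitment": (49, 7, 8, 3),
}
reported_totals = {
    "Task Completion Time": 163,
    "Error Rate": 151,
    "Hiring & Recruitment": 67,
}

for outcome, counts in matrix.items():
    assert sum(counts) == reported_totals[outcome]

# Share of claims with a positive direction, per outcome.
positive_share = {k: v[0] / sum(v) for k, v in matrix.items()}
```

The same check applied to every row would surface the rows whose totals exceed the four listed counts.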
Active filter: Human-AI Collaboration
Claim: AI adoption in peripheral economies is not a purely technological or financial challenge but a social and human-capital challenge, embedded in a biocultural environment shaped by brain drain, institutional thinness, and weak civic intermediation.
Evidence: Synthesis of interview findings using Bitsani's Biocultural City framework; qualitative evidence from 12 interviews supports this argument.
Rating: high · Direction: negative · Source: Artificial Intelligence, Social Capital, and Sustainable Emp... · Outcome: nature_of_challenges_to_AI_adoption

Claim: Knowledge deficits and financial constraints emerge as the primary barriers to AI adoption.
Evidence: Thematic analysis of the twelve semi-structured interviews, which report these themes as the primary barriers.

Claim: Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens.
Evidence: Argument in the paper that existing governance/audit tools designed for ranked-list or older UIs do not cover the new single-sentence prose-recommendation surface; no empirical test reported in the excerpt.
Rating: high · Direction: negative · Source: TourMart: A Parametric Audit Instrument for Commission Steer... · Outcome: coverage/effectiveness of existing governance tools for prose recommendations
Claim: Common failures include replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns.
Evidence: Error-mode analysis described in the paper/abstract showing that models substitute complex CAD operations (sweep, loft, twist-extrude) with simpler sketch-and-extrude sequences.
Rating: high · Direction: negative · Source: BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... · Outcome: use_of_appropriate_CAD_operations_in_generated_code

Claim: Common failures include misinterpreting industrial design parameters.
Evidence: Reported error analysis in the paper/abstract indicating that models often misinterpret engineering/design parameters when generating CAD programs.
Rating: high · Direction: negative · Source: BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... · Outcome: accuracy_of_inferred_design_parameters

Claim: Common failures include missing fine 3D structure.
Evidence: Qualitative and quantitative analysis of model outputs on BenchCAD, reported in the paper/abstract, noting missing fine 3D structural details as a frequent error mode.
Rating: high · Direction: negative · Source: BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... · Outcome: completeness_of_3D_structure_in_generated_models
Claim: The current AI development trajectory reflects value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability.
Evidence: Normative/critical analysis in the paper highlighting design priorities and trade-offs; no empirical measurement provided.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Relative prioritization of conversational generality versus domain specificity, ...

Claim: Sustained investment in large-scale chatbot infrastructures increases environmental costs.
Evidence: The paper asserts environmental impacts from infrastructure investment (energy, resource use) as part of its systemic critique; no quantified environmental measurements or sample size reported.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Environmental costs associated with energy/resource use of chatbot infrastructur...

Claim: Chatbot-driven AI development contributes to the concentration of economic power.
Evidence: Argumentation about industry dynamics and infrastructure centralization in the paper; no empirical market-concentration metrics or sample provided.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Concentration of economic power among firms/platforms producing and hosting chat...

Claim: The normalization of chatbots contributes to labor displacement.
Evidence: Theoretical argument linking widespread chatbot adoption to changes in work and employment; no empirical displacement estimates provided.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Labor displacement (job losses attributable to chatbot adoption)

Claim: Normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise.
Evidence: Analytical reasoning and literature-informed claims in the paper; no quantitative measurement or sample reported.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Levels of skill retention/acquisition (deskilling), diversity of knowledge (hom...

Claim: Chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority.
Evidence: Qualitative argumentation and illustrative examples in the paper; no controlled empirical study or sample size reported.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Adequacy of chatbot responses to user needs in complex/high-stakes contexts and ...

Claim: The chatbot paradigm is not a neutral interface choice but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems.
Evidence: Conceptual argument and synthesis in the paper (theoretical analysis); no empirical sample or quantitative data reported.
Rating: high · Direction: negative · Source: What if AI systems weren't chatbots? · Outcome: Degree to which chatbot adoption reshapes social, economic, legal, and environme...
Claim: This frequently leads to excessive reliance on mechanistic interpretability to address a deployment challenge beyond its intended scope.
Evidence: Author argument drawing on conceptual critique and cited empirical distinctions (the paper's argumentative content).
Rating: high · Direction: negative · Source: The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... · Outcome: appropriateness of mechanistic interpretability as a gate for deployment

Claim: AI deployment in sensitive domains (health care, credit, employment, criminal justice) is often treated as unsafe to authorize until model internals can be explained.
Evidence: Author assertion based on regulatory and institutional tendencies described in the paper (argumentative/contextual evidence within the paper).
Rating: high · Direction: negative · Source: The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... · Outcome: authorization policy stance toward AI in sensitive domains (requirement for inte...

Claim: A scoping review found that only 9.0% of FDA-approved AI/ML device documents contained a prospective post-market surveillance study.
Evidence: The paper references a scoping review that examined FDA-approved AI/ML device documents and reported the 9.0% figure.
Rating: high · Direction: negative · Source: The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... · Outcome: presence of prospective post-market surveillance study in FDA AI/ML device docum...

Claim: A 53-percentage-point gap between internal representations and output correction shows that understanding may not translate into action.
Evidence: The paper cites a recent empirical finding of a 53-percentage-point gap between models' internal representations and their ability to correct outputs (described as "recent evidence").
Rating: high · Direction: negative · Source: The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... · Outcome: gap between internal model representations and ability to correct outputs
Claim: In labor-intensive industries, industrial robots shorten backward linkage length (a component of global value chain length).
Evidence: Heterogeneity analysis in the paper comparing effects across labor-intensive sub-sectors within the panel of 14 manufacturing sub-sectors; a negative effect on backward linkage length is reported for labor-intensive industries.
Rating: high · Direction: negative · Source: Research on the impact of industrial robot application on th... · Outcome: backward linkage length (a component of global value chain length) in labor-inte...
Claim: Institutional inertia in property valuation poses risks to asset pricing, collateral risk modelling and investor confidence.
Evidence: Analytical inference from interview findings and theoretical synthesis highlighting implications for property investment and financial market stability.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: risks to asset pricing, collateral risk modelling and investor confidence

Claim: Despite advances in automation, data analytics and AI, the sector has been slow to digitise.
Evidence: Background statement supported by interview data and sector observation reported in the study.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: pace of digitisation in the property valuation sector

Claim: The IDOI framework provides a transferable model for understanding digital transformation in regulated, high-trust professions and highlights the market-level risks of institutional inertia in property valuation.
Evidence: Development of the IDOI conceptual framework from qualitative data and theoretical integration; the authors' claim about transferability and implications.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: transferability of the framework and market-level risks from institutional inert...

Claim: Generational divides, protectionist attitudes and fears of automation reinforce digital resistance.
Evidence: Qualitative interview evidence on attitudes across cohorts of valuers and firm personnel; thematic analysis identifying cultural and attitudinal themes.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: cultural/attitudinal resistance to VTech

Claim: The Valuers Act (1948), fragmented infrastructure and sovereignty concerns limit innovation.
Evidence: Interview data from practitioners, firm leaders and regulators in New Zealand citing specific regulatory and infrastructure constraints; thematic analysis.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: regulatory and infrastructure constraints on innovation

Claim: Barriers to adoption arise primarily from institutional conservatism, outdated regulation and weak data governance rather than technical shortcomings.
Evidence: Qualitative semi-structured interviews with valuers, firm leaders and regulators in New Zealand; thematic analysis guided by Rogers' diffusion of innovations and institutional theory, synthesised into the IDOI framework.
Rating: high · Direction: negative · Source: Exploring barriers to valuation technology adoption in prope... · Outcome: barriers to VTech adoption
Claim: Even access to the true conditional vulnerability probability cannot eliminate misallocation: aleatoric uncertainty over individual vulnerability status is irreducible, and probabilistic targeting inevitably misallocates some resources.
Evidence: Theoretical argument in the paper (a conceptual result about irreducible aleatoric uncertainty and its implications for probabilistic targeting).
Rating: high · Direction: negative · Source: The Limits of AI-Driven Allocation: Optimal Screening under ... · Outcome: misallocation of resources (allocation error due to aleatoric uncertainty)
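The irreducibility argument can be made concrete with a toy calculation (our illustration, not the paper's model): a planner who allocates on the true probability p still decides before the outcome resolves, so the per-person expected misallocation is min(p, 1 - p), which is positive whenever 0 < p < 1.

```python
# Toy illustration (assumption: deterministic threshold allocation on the
# true vulnerability probability p). Even a planner who knows p exactly
# faces an expected per-person error of min(p, 1 - p).
probs = [0.1, 0.3, 0.5, 0.7, 0.9]

def expected_misallocation(p, threshold=0.5):
    # Allocate iff p >= threshold; the error probability is the
    # probability mass on the other side of the decision.
    return (1 - p) if p >= threshold else p

errors = [expected_misallocation(p) for p in probs]
total = sum(errors)  # about 1.3 expected misallocations across 5 people
```

Randomizing the allocation rule does not help: any rule that must commit before the outcome resolves inherits the same floor.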
Claim: Generated artifacts may consequently exhibit brittle behavior and limited deployability.
Evidence: The paper asserts that lack of production awareness leads to brittle artifacts and limited deployability; no quantitative measures or sample sizes provided in the abstract.
Rating: high · Direction: negative · Source: Architectural Constraints Alignment in AI-assisted, Platform... · Outcome: brittleness of artifacts and deployability

Claim: AI-assisted development tools often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments.
Evidence: Asserted observation in the paper arguing limitations of general-purpose AI code generation when targeting production-ready systems; no empirical sample size or methodological details provided in the excerpt.
Rating: high · Direction: negative · Source: Architectural Constraints Alignment in AI-assisted, Platform... · Outcome: awareness of architectural constraints / suitability for production
Claim: Current AI tools are not yet mature enough to replace developers.
Evidence: Conclusion drawn from the controlled experiment and participant feedback comparing AI-assisted and traditional task-splitting.
Rating: high · Direction: negative · Source: Splitting User Stories Into Tasks with AI -- A Foe or an All... · Outcome: suitability of AI to replace developers

Claim: Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development.
Evidence: Background/introductory statement in the paper describing the problem motivation; no experimental sample size reported for this claim.
Rating: high · Direction: negative · Source: Splitting User Stories Into Tasks with AI -- A Foe or an All... · Outcome: time required to split user stories (descriptive claim about time consumption)
Claim: There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both.
Evidence: Taxonomy created and justified in the paper via conceptual argument and illustrative discussion; no empirical classification study or prevalence estimates provided.
Rating: high · Direction: negative · Source: Vibe Econometrics and the Analysis Contract · Outcome: types of inferential failure modes arising in AI-assisted causal analysis

Claim: AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.
Evidence: Conceptual claim supported by narrative reasoning and illustrative examples; no empirical data on scale, speed, or reach are given.
Rating: high · Direction: negative · Source: Vibe Econometrics and the Analysis Contract · Outcome: scale/speed/polish of dissemination of weak analyses (i.e., reach/adoption of lo...

Claim: AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).
Evidence: Argumentative/theoretical reasoning in the paper; no empirical measurement of incidence, observability, or persuasiveness provided.
Rating: high · Direction: negative · Source: Vibe Econometrics and the Analysis Contract · Outcome: governance challenge arising from changed incidence, observability, and persuasi...

Claim: When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.
Evidence: Logical/qualitative argument and definition development in the paper (no empirical validation or measured instances provided).
Rating: high · Direction: negative · Source: Vibe Econometrics and the Analysis Contract · Outcome: observability/detectability of invalid inference and requirement of expert knowl...

Claim: AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.
Evidence: Conceptual/theoretical argument presented in the paper; no empirical sample, quantitative data, or experiments reported.
Rating: high · Direction: negative · Source: Vibe Econometrics and the Analysis Contract · Outcome: democratization of domain-specific inferential failure modes (i.e., more widespr...
Claim: AI adoption deepens the negative indirect effect of CEO–TMT faultlines on green innovation via reduced eco-attention (moderated mediation).
Evidence: Reported moderated mediation analysis on the panel dataset (35,347 firm-year observations) showing that AI adoption moderates the indirect path from CEO–TMT faultlines to green innovation through eco-attention, making the indirect effect more negative when AI adoption is greater.
Rating: high · Direction: negative · Source: When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... · Outcome: green innovation (indirect effect via eco-attention)

Claim: AI technology strengthens the negative relationship between CEO–TMT faultlines and eco-attention (AI exacerbates the adverse effect of faultlines on eco-attention).
Evidence: Moderation/interaction analysis reported in the paper using the same panel dataset (35,347 firm-year observations), indicating a significant interaction between AI adoption and CEO–TMT faultlines on eco-attention.

Claim: CEO–TMT faultlines reduce eco-attention (organizational attention to environmental issues).
Evidence: Direct association reported in the paper from regression/mediation models using the panel dataset (35,347 firm-year observations), showing a negative relationship between CEO–TMT faultlines and eco-attention.

Claim: CEO–TMT faultlines negatively affect green innovation through reduced eco-attention.
Evidence: Empirical mediation analysis on the panel dataset (35,347 firm-year observations, 2010–2023) testing CEO–TMT faultlines → eco-attention → green innovation.
Rating: high · Direction: negative · Source: When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... · Outcome: green innovation (mediated by eco-attention)
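The reported design (faultlines → eco-attention → green innovation, with AI moderating the first stage) can be sketched as a two-stage moderated mediation. This is a synthetic-data illustration with made-up coefficients, not the authors' specification or data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic stand-ins (not the paper's variables): faultlines F, AI adoption A.
F = rng.normal(size=n)
A = rng.normal(size=n)

# Stage 1: eco-attention M falls with faultlines, and AI deepens that fall
# (negative F*A interaction), mirroring the reported direction of effects.
M = -0.4 * F - 0.3 * F * A + rng.normal(size=n)

# Stage 2: green innovation Y rises with eco-attention (coefficient 0.5 here).
Y = 0.5 * M + rng.normal(size=n)

# Recover stage-1 coefficients by OLS: M ~ 1 + F + A + F*A.
X = np.column_stack([np.ones(n), F, A, F * A])
b = np.linalg.lstsq(X, M, rcond=None)[0]

# Conditional indirect effect of F on Y at AI level a: (b_F + b_FA * a) * 0.5.
# It is more negative at higher a, which is the moderated-mediation pattern.
indirect_low = (b[1] + b[3] * -1.0) * 0.5
indirect_high = (b[1] + b[3] * +1.0) * 0.5
```

In practice such models add firm and year controls and bootstrap the conditional indirect effects; this sketch only shows the structural shape of the claim.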
Claim: Municipal 311 call centers and complaint-intake systems face a structural mismatch between incoming volume and classification capacity, producing a bottleneck and differential service quality that follows income and racial lines.
Evidence: Stated in the paper's introduction; cites prior work (Liu 2024 SLA) as support for the differential service-quality / demographic claim. No sample size or quantitative result reported in the excerpt.
Rating: high · Direction: negative · Source: Scaling the Queue: Reinforcement Learning for Equitable Call... · Outcome: differential service quality by income and race

Claim: There is an absence of agreed-upon benchmarks for evaluating AI systems.
Evidence: The introductory chapter notes the lack of standardized evaluation benchmarks as a cross-cutting concern; presented as an analytical observation by the task force.
Rating: high · Direction: negative · Source: Introduction: Artificial Intelligence, Politics, and Politic... · Outcome: existence of standardized evaluation benchmarks for AI

Claim: AI systems exhibit bias.
Evidence: The introductory chapter points to bias in AI systems as a recurring theme; supported by the broader literature cited in the report (no numerical sample reported in the introduction).
Rating: high · Direction: negative · Source: Introduction: Artificial Intelligence, Politics, and Politic... · Outcome: bias and fairness issues in AI system outputs and decisions

Claim: AI model outputs are often opaque and non-replicable.
Evidence: The introductory chapter identifies opacity and non-replicability of AI outputs as a cross-cutting theme; the claim rests on literature synthesis and conceptual critique in the report.
Rating: high · Direction: negative · Source: Introduction: Artificial Intelligence, Politics, and Politic... · Outcome: transparency and replicability of AI model outputs

Claim: A small number of AI corporations have unprecedented power.
Evidence: The introductory chapter highlights the theme of concentrated corporate power in AI; asserted as an observational claim in the report's framing rather than derived from an empirical sample presented in the introduction.
Rating: high · Direction: negative · Source: Introduction: Artificial Intelligence, Politics, and Politic... · Outcome: concentration of corporate power in the AI industry (market control, platform in...
Claim: GPT-4.1 exhibits hidden workflow shortcuts despite achieving perfect TSR and HF1.
Evidence: Model-level observation from the ASR analysis within the experiment (the paper reports that GPT-4.1 achieved perfect TSR and HF1 but failed trajectory-level fidelity).
Rating: high · Direction: negative · Source: Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... · Outcome: trajectory fidelity vs. standard metrics (TSR, HF1)

Claim: Applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances, ASR reveals that 10 of 18 models systematically skip a confirmation checkpoint during payment checkout, a deviation invisible to both TSR and HF1, while 8 models enforce the checkpoint perfectly.
Evidence: Empirical evaluation reported in the paper: HMASP tested across 18 LLMs and 90,000 task instances; ASR analysis shows checkpoint-skipping behavior for 10 models and correct enforcement for 8.
Rating: high · Direction: negative · Source: Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... · Outcome: adherence to expected workflow transitions (confirmation checkpoint adherence)
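The distinction these findings rest on can be illustrated with a toy trajectory check (state names are hypothetical, not the HMASP schema, and ASR itself is richer than this): task success alone cannot see a skipped confirmation checkpoint, while inspecting the transition sequence can.

```python
# Minimal sketch of a trajectory-level fidelity check. Both trajectories
# below end in a completed payment (identical task success), but only the
# transition-level check flags the skipped checkpoint.
REQUIRED_BEFORE_PAYMENT = "confirm_order"

def skipped_checkpoint(trajectory):
    """True if the agent reaches 'execute_payment' without first
    visiting the confirmation checkpoint."""
    seen_confirm = False
    for state in trajectory:
        if state == REQUIRED_BEFORE_PAYMENT:
            seen_confirm = True
        if state == "execute_payment" and not seen_confirm:
            return True
    return False

faithful = ["browse", "add_to_cart", "confirm_order", "execute_payment"]
shortcut = ["browse", "add_to_cart", "execute_payment"]  # still "succeeds"
```

An outcome-only metric scores both trajectories identically, which is exactly the blind spot the paper attributes to TSR and HF1.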
Claim: From an information-theoretic perspective, this transition corresponds to an emergent information bottleneck in the human-AI loop, where entropy reduction reflects loss of diversity and support under closed-loop feedback rather than beneficial compression.
Evidence: Theoretical / information-theoretic analysis in the paper linking the observed dynamics to entropy reduction and information-bottleneck concepts.
Rating: high · Direction: negative · Source: Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Sy... · Outcome: entropy (diversity/support) of the human-AI data loop and its interpretation as ...

Claim: Through a simple simulation, we demonstrate that increasing reliance on AI can induce a transition toward a low-diversity, suboptimal equilibrium.
Evidence: Computational simulation reported in the paper (described as a "simple simulation"); no sample size or experimental dataset reported in the provided text.
Rating: high · Direction: negative · Source: Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Sy... · Outcome: system transitioning to a low-diversity, suboptimal equilibrium as reliance on A...
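The claimed dynamic can be reproduced qualitatively with a toy closed-loop simulation (our construction under stated assumptions, not the paper's model): humans choose uniformly among options, an "AI" recommends the historically most common choice, and higher reliance feeds the mode back into the data, collapsing the entropy of the choice distribution.

```python
import math
import random

random.seed(0)
K = 20  # number of distinct ideas/options

def run(reliance, rounds=3000):
    """Toy closed loop: with probability `reliance` the next choice is the
    AI's recommendation (the current mode of past choices); otherwise it is
    an independent uniform human choice."""
    counts = [1] * K  # start from a uniform history
    for _ in range(rounds):
        if random.random() < reliance:
            choice = counts.index(max(counts))  # AI echoes the current mode
        else:
            choice = random.randrange(K)        # independent human choice
        counts[choice] += 1
    total = sum(counts)
    # Shannon entropy (bits) of the resulting choice distribution.
    return -sum(c / total * math.log2(c / total) for c in counts)

low_reliance_entropy = run(0.1)
high_reliance_entropy = run(0.9)
# Entropy is markedly lower under heavy reliance: the loop settles on a
# low-diversity equilibrium.
```

The entropy drop here is a feedback artifact, not useful compression, which is the interpretation the information-bottleneck framing above argues for.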
Claim: DePAI entails risks including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, requiring value-sensitive design and continuously adaptive governance.
Evidence: Risk analysis and conceptual argument in the paper identifying possible failure modes and recommended design/governance responses; no empirical incidence data provided.
Rating: high · Direction: negative · Source: DAO-enabled decentralized physical AI: A new paradigm for hu... · Outcome: security, centralization, incentive failure, legal exposure, and intrinsic motiv...
Claim: Experimental results show that current agents remain far from reliable workspace learning.
Evidence: The authors' interpretation of reported agent performance (best agent 68.7% vs. human 80.7%; average across agents 47.4%).
Rating: high · Direction: negative · Source: Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tas... · Outcome: reliability of agents on workspace learning tasks

Claim: The average performance across evaluated agents is only 47.4%.
Evidence: Reported mean performance across agents in the experiments (the authors' aggregated result).
Rating: high · Direction: negative · Source: Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tas... · Outcome: average benchmark score across agents