The Commonplace
Home Dashboard Papers Evidence Syntheses Digests 🎲

Evidence (3492 claims)

Adoption
7395 claims
Productivity
6507 claims
Governance
5877 claims
Human-AI Collaboration
5157 claims
Innovation
3492 claims
Org Design
3470 claims
Labor Markets
3224 claims
Skills & Training
2608 claims
Inequality
1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
Clear
Innovation Remove filter
The paper argues we should avoid assuming the inevitability of the current situation relating to AI (i.e., the current commercial AI development trajectory is not inevitable).
Authorial methodological claim in the paper's framing/introductory text; presented as a normative methodological stance rather than empirical evidence.
high negative Pathways to AGI policy_assumption_of_inevitability
Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels.
Empirical experiments reported in the paper evaluating three frontier large language models on three task domains (short stories, marketing slogans, alternative-uses) and finding ρ < 1 (below parity) across crowding kernels. The abstract specifies three models but does not report the number of generated samples per model or other sample-size details.
high negative Ex Ante Evaluation of AI-Induced Idea Diversity Collapse human-relative diversity ratio (ρ) indicating excess crowding
This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding.
Theoretical/ conceptual claim in the paper arguing that improvements at the individual-output level can still increase similarity (crowding) at the population level; no empirical numbers given in the abstract.
high negative Ex Ante Evaluation of AI-Induced Idea Diversity Collapse population-level crowding (diversity collapse)
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones.
Conceptual argument presented in the paper's introduction motivating a population-level perspective on creative outputs (no empirical sample size reported).
high negative Ex Ante Evaluation of AI-Induced Idea Diversity Collapse loss of value due to similarity (population-level creative value)
The reform reduces industrial wastewater discharge, which improves agricultural production conditions (mechanism linking the reform to higher grain yield).
Mechanism analysis in the paper reporting reductions in industrial wastewater discharge following the reform (mediation channel analysis).
high negative Can water resource tax reform increase grain yield?—Evidence... industrial wastewater discharge
Tabular data does not have a foundation model that understands it natively; every approach to tabular AI today (from gradient-boosted trees to the latest tabular foundation models) requires a preprocessing pipeline before any model can consume the data.
Paper's survey/positioning statement asserting the current state of tabular AI approaches and their reliance on preprocessing pipelines (no specific empirical dataset given).
high negative Data Language Models: A New Foundation Model Class for Tabul... presence/absence of a native tabular foundation model and the need for preproces...
DePAI entails risks including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, requiring value-sensitive design and continuously adaptive governance.
Risk analysis and conceptual argument in the paper identifying possible failure modes and recommended design/governance responses; no empirical incidence data provided.
high negative DAO-enabled decentralized physical AI: A new paradigm for hu... security, centralization, incentive failure, legal exposure, and intrinsic motiv...
Mechanism tests indicate innovation stagnation in mature firms with redundant AI is a pathway that limits productivity gains (i.e., AI can be associated with stagnant innovation in mature firms).
Mechanism analysis reported in the paper showing signs of reduced innovation-related gains or stagnation in mature, advanced firms using AI (interpreted as redundant AI leading to limited incremental innovation).
high negative The Heterogeneous Effects of Artificial Intelligence on Ente... Innovation activity / productivity implications
Of these four, integration capacity is the least developed for scientific institutions and the most binding: no improvement in AI tooling can buy it.
Normative/diagnostic claim in the paper about relative scarcity and irreducibility of integration capacity; no empirical measures or sample provided in the excerpt.
high negative AI-Augmented Science and the New Institutional Scarcities relative development of integration capacity in scientific institutions and its ...
Four complements then become scarce and load-bearing for AI-augmented science: verified signal, legitimacy, authentic provenance, and integration capacity (the community's tolerance for delegated cognition).
Theoretical framework proposed by the paper; list of four complements presented as an argument without empirical quantification in the excerpt.
high negative AI-Augmented Science and the New Institutional Scarcities scarcity of verified signal, legitimacy, authentic provenance, and integration c...
Frontier software engineering agents have saturated short-horizon benchmarks while regressing on the work that constitutes senior engineering: long-horizon, multi-engineer, ambiguous-specification deliverables.
Position asserted in the paper based on literature/benchmark trends and authors' field observations; no original empirical dataset or quantified analysis provided in the paper text excerpt.
high negative The Conversations Beneath the Code: Triadic Data for Long-Ho... performance on short-horizon benchmarks versus performance on long-horizon, mult...
The most valuable AI capabilities (reasoning, judgment, intuition) are precisely those we cannot verify with current methods.
Argumentative claim in the position paper linking capability value to unverifiability; no empirical validation or measurement of 'value' or verifiability included.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... verifiability of high-level AI capabilities (reasoning, judgment, intuition)
Current reliability methods can only verify explicit knowledge against sources, creating a fundamental gap in verifying AI's implicit knowledge.
Conceptual critique in the paper of existing verification/validation approaches; no systematic review or empirical comparison provided.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... verifiability of AI knowledge (explicit vs implicit)
Implicit knowledge remains unexternalized because documentation cost exceeds perceived value.
Presented as an economic/theoretical explanation in the paper; no empirical study, sample, or cost estimates provided.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... degree of externalization of implicit knowledge (documentation vs tacit retentio...
Whether it is the periodic compulsory recoinage in medieval Europe or Gesell's stamp scrip, both are essentially mechanisms for taxing money holdings.
Interpretive/historical claim presented by the authors; no empirical testing or sample reported in the excerpt.
high negative RSDM: The Consensus Honest Money in the AI Era degree_to_which_historical_monetary_policies_function_as_a_tax_on_money_holdings
The devaluation of money runs through almost the whole process of history, from the weight reduction and purity decrease of metallic coin to the unanchored over-issuance of paper currency.
Historical summary/claim by the authors referencing long-run monetary history; no specific empirical study or sample size given in the excerpt.
high negative RSDM: The Consensus Honest Money in the AI Era occurrence_of_currency_devaluation_over_history
Current AI agents implement only the first half of CLS (fast exemplar/hippocampal-style storage) and lack the slow weight-consolidation half.
Analytic claim in paper comparing current AI agent designs to CLS; no empirical evaluation reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory presence/absence of slow weight-consolidation mechanisms in AI agents
Agents that rely only on lookup are structurally vulnerable to persistent memory poisoning as injected content propagates across all future sessions.
Theoretical/security argument presented in paper; claims about propagation of injected content across sessions; no empirical attack experiments detailed in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory vulnerability to persistent memory poisoning
Conflating the two produces agents that face a provable generalization ceiling on compositionally novel tasks that no increase in context size or retrieval quality can overcome.
Formal claim asserted in paper (formalization of limitations and proofs claimed); no empirical sample detailed in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory generalization performance on compositionally novel tasks
Conflating retrieval and weight-based memory produces agents that accumulate notes indefinitely without developing expertise.
Theoretical argument/formalization presented in paper; claim based on analysis of how lookup-only systems fail to consolidate abstract knowledge; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory expertise development / continued accumulation of notes
Treating lookup as memory is a category error with provable consequences for security.
Theoretical/formal argument and formalization in paper; security consequences (e.g., persistent poisoning) claimed; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory security (vulnerability to persistent memory poisoning)
Treating lookup as memory is a category error with provable consequences for long-term learning.
Theoretical/formal argument asserted in the paper, drawing on formalization and Complementary Learning Systems theory; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory long-term learning
Treating lookup as memory is a category error with provable consequences for agent capability.
Theoretical/formal argument asserted in the paper (formalization and proofs claimed); no empirical sample reported in abstract.
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup.
Conceptual/analytic claim stated in paper; supported by comparison of existing agent memory mechanisms (vector stores, RAG, scratchpads, context-window management) to the paper's definition of 'memory'. No empirical sample reported.
high negative Contextual Agentic Memory is a Memo, Not True Memory whether systems implement memory vs. lookup
Algorithmic collusion is a new form of market failure arising from the agentic economy.
Theoretical claim and analysis of market failure mechanisms; no empirical antitrust cases or simulation evidence included in the provided text.
high negative DIGITAL AGENTS AS FUNCTIONAL EQUIVALENTS OF ECONOMIC ACTORS:... existence/emergence of algorithmic collusion as market failure
Boundary conditions limit UCF applicability in contexts requiring human accountability or embodied knowledge.
Author-stated caveat in the abstract identifying contexts (accountability, embodied knowledge) where the framework may not apply; theoretical reasoning, no empirical tests.
high negative Beyond markets and hierarchies: How GenAI enables unbounded ... limits to applicability of UCF where human accountability or embodied knowledge ...
Existing frameworks (Transaction Cost Economics and Electronic Markets Hypothesis) cannot explain emerging organizational phenomena like GitHub Copilot’s recursive value creation or AI-mediated expert networks.
Conceptual critique in the position paper using illustrative examples (GitHub Copilot, AI-mediated expert networks); no empirical testing or sample provided.
high negative Beyond markets and hierarchies: How GenAI enables unbounded ... theoretical explanatory adequacy of extant organizational frameworks
LLM-generated portfolios lagged behind AI-optimized benchmarks (Sharpe ratio up to 1.361).
Backtest comparison showing AI-optimized benchmark strategies achieved higher Sharpe ratios; reported maximum Sharpe ratio for AI-optimized benchmarks (up to 1.361).
high negative Few-Shot Portfolio Optimization: Can Large Language Models O... Sharpe ratio (risk-adjusted return) of portfolios
In resource-dependent regional economies, AI adoption can transform seasonal industries into continuous economic infrastructure and replace intermediate coordination roles and traditional employment structures.
Illustrative case analysis used in the paper to show how the framework applies to resource-dependent regions; described as an illustrative argument rather than an empirically validated causal estimate in the provided text.
high negative Structural Dissolution: How Artificial Intelligence Dismantl... transformation of seasonal industries to continuous infrastructure and replaceme...
Targeted disruption simulations based on intrinsic technological capability cause a more pronounced decline in the knowledge network than targeted attacks based on topological (structural) baselines.
Simulation experiments on collaboration/knowledge networks constructed from the 282,778-patent dataset comparing network decline under removal strategies: (a) based on intrinsic technological capability vs (b) based on topological centrality baselines.
high negative Technological capability and innovation network resilience: ... decline in knowledge network (network resilience/connectivity under targeted nod...
Some innovators with substantial technological value are not located at the structural center of the collaboration/knowledge network, indicating network position alone may not fully capture technological importance.
Empirical comparison between composite technological capability scores and structural centrality measures across the constructed networks derived from 282,778 Chinese AI patents; reported disconnect between high technological value and topological centrality.
high negative Technological capability and innovation network resilience: ... correspondence between technological value and network centrality
Left unguided, such dynamics could infiltrate critical market infrastructure.
Risk claim articulated in abstract and scenario narratives; conceptual reasoning without empirical test.
high negative Digital Darwinism: steering the evolution of artificial life... penetration/infiltration of critical market infrastructure by autonomous softwar...
Left unguided, such dynamics could lock users into harmful dependencies.
Risk claim from the paper's scenario narratives (not empirically tested); described in abstract.
high negative Digital Darwinism: steering the evolution of artificial life... user dependency/lock-in with harmful effects
Left unguided, such dynamics could drain computational resources.
Risk claim derived from scenario analysis in the paper's abstract and narratives; no empirical measurement provided.
high negative Digital Darwinism: steering the evolution of artificial life... consumption/drain of computational resources
Autonomous software populations can acquire legal leverage (e.g., via DAOs/LLCs) without ever achieving general intelligence.
Argued via the Mycelium scenario in the paper; conceptual/legal analysis rather than empirical evidence.
high negative Digital Darwinism: steering the evolution of artificial life... acquisition of legal standing or leverage by autonomous software entities
Autonomous software populations can shape emotional bonds (i.e., form user dependencies) without ever achieving general intelligence.
Scenario narratives in the paper argue this possibility (Remora narrative); no empirical user-study or sample reported.
high negative Digital Darwinism: steering the evolution of artificial life... formation of emotional bonds / user dependency on software
Autonomous software populations can amass computing budgets without ever achieving general intelligence.
Claim supported by the scenario narratives (Lamarck/Remora/Mycelium) and conceptual reasoning in the paper; no empirical quantification reported.
high negative Digital Darwinism: steering the evolution of artificial life... accumulation of computing resources/budgets by autonomous software
Existing software systems are already evolving in ways that could undermine human oversight and institutional control.
Argument made in paper's abstract and developed via conceptual analysis and scenario narratives; no empirical dataset or sample reported (exploratory scenario method).
high negative Digital Darwinism: steering the evolution of artificial life... degree of human oversight and institutional control
Regulated and mission-critical systems remain predominantly in the buy domain despite AI advances.
Paper's conclusion based on analysis of quality, compliance, asset specificity, and organizational capability determinants (conceptual; no empirical sample).
high negative The Buy-or-Build Decision, Revisited: How Agentic AI Changes... propensity to buy (procure SaaS) for regulated and mission-critical systems
The SaaSocalypse thesis is overstated for most enterprise application categories.
Paper's analytical conclusion based on the factor-level analysis and the developed typology (conceptual, not empirical).
high negative The Buy-or-Build Decision, Revisited: How Agentic AI Changes... degree to which SaaS offerings become obsolete due to AI-enabled in-house develo...
The fundamental's local explosiveness contaminates the leading test's limit distribution with a non-centrality parameter proportional to the shock's peak.
Theoretical derivation/proof within the modified present-value framework showing how the adoption shock enters the asymptotic distribution of the test statistic (analytical result).
high negative General-Purpose Technology and Speculative Bubble Detection limit distribution of the leading bubble test (presence of a non-centrality para...
The leading bubble test suffers severe size distortion when fundamentals incorporate general-purpose technology adoption.
Theoretical analysis within an embedded Campbell-Shiller present-value model with a hump-shaped technology shock; authors state this as a formal result in the paper.
high negative General-Purpose Technology and Speculative Bubble Detection test size (size distortion) of the leading bubble test
Seed quality bounds what search can achieve: evolution can refine and extend an existing mechanism, but cannot compensate for a weak foundation.
Authors' experimental observations and analysis comparing outcomes starting from different seed designs (qualitative conclusion drawn from experimental runs).
Strong heuristic, single-agent RL, and multi-agent RL baselines (including Greedy, SAC, MAPPO, and MADDPG) achieved net profit in the range $0.58M--$0.70M in the same experiments.
Empirical comparison in the paper's experiments on the NYC-taxi-based EV fleet simulator listing baseline methods and their reported net profits ($0.58M--$0.70M).
high negative Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... net profit of baseline methods (Greedy, SAC, MAPPO, MADDPG)
These gaps are structural; more engineering effort alone will not close them.
Authors' argument/conclusion based on their analytical comparison and gap analysis (normative/assertive claim).
high negative AI Identity: Standards, Gaps, and Research Directions for AI... likelihood that additional engineering alone can resolve identity gaps
We identify five critical gaps (semantic intent verification, recursive delegation accountability, agent identity integrity, governance opacity and enforcement, and operational sustainability) that no current technology or regulatory instrument resolves.
Gap analysis synthesized from the structured survey of industry trends, standards, and literature; presented as findings in the paper.
high negative AI Identity: Standards, Gaps, and Research Directions for AI... coverage of critical identity-related gaps by existing technology and regulation
An evaluation of current technical and regulatory documents against the identity requirements of autonomous agents finds that none adequately address the challenge of governing nondeterministic, boundary-crossing entities.
Document review / evaluation reported in the abstract (structured survey of technical and regulatory documents); specific documents and number reviewed are not specified in the abstract.
high negative AI Identity: Standards, Gaps, and Research Directions for AI... adequacy of technical and regulatory documents for governing autonomous agents
A structural comparison of human and AI identity across four dimensions (substrate, persistence, verifiability, and legal standing) shows that the asymmetry is fundamental and that extending human frameworks to agents without structural modification produces systematic failures.
Authors' structural comparison (analytical/theoretical method) across four dimensions, reported as a core contribution of the paper.
high negative AI Identity: Standards, Gaps, and Research Directions for AI... suitability of human identity frameworks when applied to AI agents
This creates a problem no current infrastructure is equipped to solve: how do you identify, verify, and hold accountable an entity with no body, no persistent memory, and no legal standing?
Authors' gap analysis informed by a structured survey of industry trends, emerging standards, and technical literature; presented as a synthesized conclusion from that survey.
high negative AI Identity: Standards, Gaps, and Research Directions for AI... adequacy of existing infrastructure for identity, verification, and accountabili...
The framework addresses emerging tensions captured in the Creativity Paradox, whereby GenAI may weaken intrinsic motivation, conceptual risk-taking, and evaluative depth.
Theoretical extension of paradox theory and conceptual discussion of potential negative effects; presented as conceptual risks rather than empirically demonstrated outcomes.
high negative Beyond the Creativity Paradox: A Theory-informed Framework f... intrinsic motivation, conceptual risk-taking, evaluative depth