The Commonplace

Evidence (6869 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Active filter: Governance
Many fear AI may displace them from their jobs.
Paper reports survey-style finding about public fear of job displacement (no specific surveys, question wording, dates, or sample sizes given in the excerpt).
high negative AI’s Economy and Its Political and Institutional Consequence... perceived risk of job displacement
AI may affect nonroutine jobs in particular.
Statement in paper; asserted as a general finding about which types of jobs AI impacts (no specific dataset, sample size, or empirical method reported in the excerpt).
high negative AI’s Economy and Its Political and Institutional Consequence... vulnerability of nonroutine jobs to AI
The welfare equivalence property is unique to the Brier score: for every non-Brier strictly proper scoring rule, the welfare gap under smooth C^1 oversight is bounded below by Ω(Var(1/G'') (γ/β)^2).
Mathematical lower-bound result proved in the paper comparing welfare under smooth C^1 oversight for non-Brier scoring rules; the bound is expressed as Ω(Var(1/G'') (γ/β)^2) in the paper.
high negative The Endogeneity of Miscalibration: Impossibility and Escape ... welfare gap between second-best and first-best under smooth C^1 oversight for no...
The impossibility (that non-affine approval undermines truthful reporting) holds for all strictly proper scoring rules, and the paper provides a closed-form perturbation formula.
General theoretical result proved across the class of strictly proper scoring rules, accompanied by a closed-form formula for the perturbation in the paper.
high negative The Endogeneity of Miscalibration: Impossibility and Escape ... existence and magnitude of perturbation from truthful reporting under arbitrary ...
Any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable — the principal cannot avoid the perturbation that undermines calibration.
Analytical impossibility theorem in the paper's formal model showing that non-affine approvals create incentives for non-truthful reports when deviations are undetectable (mathematical proof).
high negative The Endogeneity of Miscalibration: Impossibility and Escape ... truthfulness of agent reports (report calibration/truthfulness)
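To make the term "strictly proper" in the three claims above concrete, the sketch below checks numerically that an agent with belief p minimizes expected Brier loss, and expected log loss, only by reporting q = p. It is a minimal illustration, not the paper's model: the function names, the grid search, and the binary-outcome setting are assumptions, and the approval term and the parameters γ and β are not modeled.

```python
import math

def expected_brier_loss(p, q):
    """Expected Brier loss of reporting q when the event has true probability p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

def expected_log_loss(p, q):
    """Expected logarithmic loss of reporting q under true probability p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def best_report(expected_loss, p, grid_size=999):
    """Grid-search the report in (0, 1) that minimizes expected loss (hypothetical helper)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda q: expected_loss(p, q))
```

For a belief of 0.3, both rules are minimized at a report of 0.3 on this grid, which is what properness requires. The claims above concern what happens once an additional approval term enters the objective, which this sketch leaves out.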
Opaque agent objectives, synthetic traffic loops, and the indistinguishability between human-originated and agent-mediated signals are critical measurement problems examined in the paper.
Conceptual examination and literature synthesis; the paper discusses these as open problems rather than providing primary empirical solutions.
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... degree of opacity and indistinguishability of agent-mediated versus human-origin...
The paper identifies three properties of LLM agents that distinguish the present challenge from prior bot-detection problems: identity discontinuity by design, task-based instantiation, and agent-to-agent loops.
Analytic claim based on synthesis of agent architecture literature; presented as conceptual identification rather than empirically tested properties.
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... distinctive properties of LLM agents relevant to detection and measurement
A click may reflect an optimization routine, a proxy objective, or a recursive agent-to-agent exchange rather than meaningful human intent, and traditional inference frameworks cannot reliably distinguish among these possibilities.
Theoretical claim derived from literature on agent behaviors, agent-to-agent interactions, and limitations of existing inference frameworks; no empirical discrimination test reported in this paper excerpt.
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... reliability of attribution of click events to meaningful human intent
The presence of autonomous AI agents weakens the interpretive value of core web analytics metrics, including sessions, engagement, conversion, and retention.
Argument based on conceptual synthesis of how non-human, non-persistent actors generate signals that undermine standard metric interpretations (position paper; no original empirical test included).
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... interpretive validity of core web analytics metrics (sessions, engagement, conve...
Unlike crawlers and traditional bots, these agents do not possess persistent identities or psychologically grounded motivations; they are task-specific, dynamically instantiated processes whose behaviors are contingent and often orchestrated by external systems.
Conceptual analysis informed by literature on agent architecture and LLM-based agents; no primary empirical measurement presented in this paper excerpt.
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... identity persistence and motivational structure of autonomous AI agents (vs. tra...
Conventional web analytics treats the human user as its fundamental unit of analysis, assuming stable preferences, identifiable intentions, and behavioral patterns that unfold over time.
Conceptual statement supported by literature synthesis and critique of standard web-analytics assumptions (position paper; no primary empirical sample reported).
high negative The Vanishing User: Web Analytics in an Agent-Dominated Inte... validity of web analytics' human-centered unit-of-analysis assumption (stability...
There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both.
Taxonomy created and justified in the paper via conceptual argument and illustrative discussion; no empirical classification study or prevalence estimates provided.
high negative Vibe Econometrics and the Analysis Contract types of inferential failure modes arising in AI-assisted causal analysis
AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.
Conceptual claim supported by narrative reasoning and illustrative examples; no empirical data on scale, speed, or reach are given.
high negative Vibe Econometrics and the Analysis Contract scale/speed/polish of dissemination of weak analyses (i.e., reach/adoption of lo...
AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).
Argumentative/theoretical reasoning in the paper; no empirical measurement of incidence, observability, or persuasiveness provided.
high negative Vibe Econometrics and the Analysis Contract governance challenge arising from changed incidence, observability, and persuasi...
When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.
Logical/qualitative argument and definition development in the paper (no empirical validation or measured instances provided).
high negative Vibe Econometrics and the Analysis Contract observability/detectability of invalid inference and requirement of expert knowl...
AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.
Conceptual/theoretical argument presented in the paper; no empirical sample, quantitative data, or experiments reported.
high negative Vibe Econometrics and the Analysis Contract democratization of domain-specific inferential failure modes (i.e., more widespr...
The relative importance of AI-related equities as shock transmitters diminishes over time.
Time-varying estimates from the TVP-VAR showing a decline in the net transmitter contribution of the AI equities group across the sample.
high negative Artificial Intelligence and Financial Market Connectedness: ... relative contribution of AI equities to spillovers
The overall level of connectedness declines modestly following the public release of ChatGPT by OpenAI in November 2022.
Time-series comparison of aggregate connectedness measures derived from the TVP-VAR, with a reported modest post-November 2022 decline (event reference: ChatGPT release).
high negative Artificial Intelligence and Financial Market Connectedness: ... aggregate connectedness level
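The connectedness measures named in the two claims above can be sketched as follows, assuming the usual Diebold-Yilmaz-style definitions (the excerpt does not confirm which variant the paper uses). The input is a row-normalized forecast error variance decomposition, where theta[i][j] is the share of variable i's forecast error variance attributed to shocks in variable j. The matrix values and the function name are hypothetical.

```python
def connectedness(theta):
    """Compute directional, net, and total connectedness from a row-normalized FEVD matrix."""
    n = len(theta)
    # Spillovers each variable receives from all others (off-diagonal row sums).
    from_others = [sum(theta[i][j] for j in range(n) if j != i) for i in range(n)]
    # Spillovers each variable transmits to all others (off-diagonal column sums).
    to_others = [sum(theta[i][j] for i in range(n) if i != j) for j in range(n)]
    # Positive net value: the variable is a net transmitter of shocks.
    net = [to_others[k] - from_others[k] for k in range(n)]
    # Aggregate connectedness: average off-diagonal share.
    total = sum(from_others) / n
    return {"from": from_others, "to": to_others, "net": net, "total": total}

# Hypothetical 3-variable decomposition; index 0 stands in for an "AI equities" group.
theta_example = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.4, 0.1, 0.5],
]
result = connectedness(theta_example)
```

In a time-varying setting, one such matrix is obtained per period, so a declining net value for a group over time corresponds to the diminishing transmitter role described above.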
A policy irreversibility result: there exists a critical time before the singularity after which redistribution becomes politically impossible because wealth concentration makes feasible tax rates vanishingly small.
Proof/argument in the paper showing that as time approaches the singularity the set of tax rates that satisfy political-feasibility constraints (workers' budget / feasibility) shrinks to zero, implying a latest feasible intervention time.
high negative The Economic Singularity: Core Mathematical Model political_feasibility_of_redistribution (feasible tax rates over time)
Financialization amplifies the exponent of the super-exponential divergence by a factor γ_F/η.
Mathematical derivation in the paper showing that the exponent in the asymptotic growth rate near the singularity is multiplied by γ_F/η when including the financialization term γ_F K_f^2 and its coupling parameter η.
high negative The Economic Singularity: Core Mathematical Model growth_exponent of wealth_ratio (asymptotic)
Near the singularity, the wealth ratio between capital owners and workers diverges super-exponentially.
Asymptotic analysis near the finite-time singularity showing that the ratio of capital-owner wealth to worker wealth grows faster than exponential (super-exponentially) as time approaches the blow-up time.
high negative The Economic Singularity: Core Mathematical Model wealth_ratio (capital owners vs. workers)
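Finite-time blow-up of the kind described in the claims above can be illustrated with the textbook equation dx/dt = γx², whose solution x(t) = x0 / (1 - γ·x0·t) diverges at t* = 1/(γ·x0). This is a generic sketch, not the paper's model: the quadratic form is assumed only because the excerpt mentions a term γ_F K_f^2, and all function names are hypothetical.

```python
def blowup_time(x0, gamma):
    """Time at which the solution of dx/dt = gamma * x**2 diverges."""
    return 1.0 / (gamma * x0)

def path(x0, gamma, t):
    """Closed-form solution x(t) = x0 / (1 - gamma * x0 * t), valid before blow-up."""
    if not 0 <= t < blowup_time(x0, gamma):
        raise ValueError("t must lie in [0, blow-up time)")
    return x0 / (1.0 - gamma * x0 * t)

def log_growth_rate(x0, gamma, t):
    """d/dt log x(t) = gamma * x(t): constant under exponential growth, rising here."""
    return gamma * path(x0, gamma, t)
```

Under plain exponential growth the log growth rate is constant; here it rises without bound as t approaches the blow-up time, which is the sense in which the divergence is faster than exponential.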
AGI (Artificial General Intelligence) is problematic both conceptually and definitionally.
Authorial assertion in the paper stating AGI is problematic as a concept and definition; framed as a conditioning assumption that shapes the subsequent analysis.
high negative Pathways to AGI conceptual_and_definitional_soundness_of_AGI
The paper argues we should avoid assuming the inevitability of the current situation relating to AI (i.e., the current commercial AI development trajectory is not inevitable).
Authorial methodological claim in the paper's framing/introductory text; presented as a normative methodological stance rather than empirical evidence.
high negative Pathways to AGI policy_assumption_of_inevitability
There is an absence of agreed-upon benchmarks for evaluating AI systems.
Introductory chapter notes lack of standardized evaluation benchmarks as a cross-cutting concern; presented as an analytical observation by the task force.
high negative Introduction: Artificial Intelligence, Politics, and Politic... existence of standardized evaluation benchmarks for AI
AI systems exhibit bias.
Introductory chapter points to bias in AI systems as a recurring theme; supported by the broader literature cited in the report (no numerical sample reported in the introduction).
high negative Introduction: Artificial Intelligence, Politics, and Politic... bias and fairness issues in AI system outputs and decisions
AI model outputs are often opaque and non-replicable.
Introductory chapter identifies opacity and non-replicability of AI outputs as a cross-cutting theme; claim is based on literature synthesis and conceptual critique in the report.
high negative Introduction: Artificial Intelligence, Politics, and Politic... transparency and replicability of AI model outputs
A small number of AI corporations have unprecedented power.
Introductory chapter highlights the theme of concentrated corporate power in AI; asserted as an observational claim in the report's framing rather than derived from a presented empirical sample in the introduction.
high negative Introduction: Artificial Intelligence, Politics, and Politic... concentration of corporate power in the AI industry (market control, platform in...
The Price of Fairness can be large even when group distributions are nearly identical.
Theoretical result/constructive example in the paper showing instances where PoF is large despite near-identical group distributions.
high negative Price of Fairness in Short-Term and Long-Term Algorithmic Se... utility loss due to fairness constraints (PoF)
Enforcing static fairness constraints may exacerbate long-run disparities.
Statement referencing recent prior theoretical results and motivating literature; framed as background/motivation in the paper.
high negative Price of Fairness in Short-Term and Long-Term Algorithmic Se... long-run disparities between groups
Any metric that scores variants directly is manipulable as soon as two equivalent variants in a harmful class disagree in score.
Formal theoretical result/proof presented in the paper based on the transformation-graph semantic-class model.
Once announced, such a metric becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing true harm.
Modeling argument in the paper (transformation graph / semantic classes) and supported by formal analysis and experimental checks described in the paper.
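The manipulation described in the two claims above can be sketched with a toy example. Two variants belong to the same harmful semantic class but receive different scores, so a platform can lower its reported score by swapping variants without changing true harm. All names, classes, and score values below are hypothetical, and the class-level score at the end is shown only as a contrast, not as the paper's proposed remedy.

```python
from collections import defaultdict

# Hypothetical catalogue: v1 and v2 are semantically equivalent harmful variants.
semantic_class = {"v1": "harmful", "v2": "harmful", "v3": "benign"}
variant_score = {"v1": 0.9, "v2": 0.2, "v3": 0.0}

def reported_score(recs, score_of):
    """Average metric score over a list of recommended variants."""
    return sum(score_of[v] for v in recs) / len(recs)

def true_harm(recs):
    """Share of recommendations that fall in the harmful class."""
    return sum(semantic_class[v] == "harmful" for v in recs) / len(recs)

def game(recs):
    """Swap each item for the lowest-scoring variant in its semantic class."""
    by_class = defaultdict(list)
    for v, c in semantic_class.items():
        by_class[c].append(v)
    return [min(by_class[semantic_class[v]], key=variant_score.get) for v in recs]

def class_invariant_scores():
    """Score every variant by the worst score in its class, so equivalent variants agree."""
    worst = defaultdict(float)
    for v, c in semantic_class.items():
        worst[c] = max(worst[c], variant_score[v])
    return {v: worst[c] for v, c in semantic_class.items()}
```

With recommendations ["v1", "v3"], gaming yields ["v2", "v3"]: the variant-level score falls while the harmful share stays the same, and a score that is constant within each class leaves nothing to gain from the swap.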
We contribute a non-additive harm decomposition (welfare loss W, coverage loss C) that exposes how attrition shifts harm from the regulator-accountable surface to a regulator-invisible one.
Methodological contribution in the paper: definition of welfare loss W and coverage loss C and analysis showing attrition reallocates observable vs. unobservable harm; supported by theoretical exposition and simulation examples.
high negative A Benchmark for Strategic Auditee Gaming Under Continuous Co... distribution of harm (welfare loss vs coverage loss) and effect of sample attrit...
An audit-aware OffAuditDrift strategy that exploits Stackelberg commitment defeats both auditor extensions (Periodic-with-floor and history-conditioned suspicion-escalation).
Construction of the OffAuditDrift auditee strategy in the paper and simulation/theoretical demonstration that it can evade both proposed auditor policies by exploiting auditor commitment.
high negative A Benchmark for Strategic Auditee Gaming Under Continuous Co... effectiveness of an audit-aware auditee strategy at defeating auditor policies
We identify a structural feature of any noise-aware static-auditor design: a cover regime in which coverage gaps and granularity gaps cannot be closed simultaneously (formalized as Observation 1).
Theoretical observation/proposition in the paper (Observation 1) derived from the formal model of continuous auditing under noise-aware static auditing rules.
high negative A Benchmark for Strategic Auditee Gaming Under Continuous Co... trade-off between coverage gaps and granularity gaps in static auditing designs
Regulated systems can delay outcome reporting, drift their reports within plausible noise envelopes, exploit longitudinal sample attrition, and cherry-pick among ambiguous metric definitions.
Specification and enumeration of auditee strategies in the paper (Delay, Drift, Cherry-pick, Attrition, OffAuditDrift); conceptual examples and inclusion in simulator.
high negative A Benchmark for Strategic Auditee Gaming Under Continuous Co... types of auditee strategic behaviors available under continuous audits
Continuous post-deployment compliance audits, mandated by emerging regulations such as the EU AI Act and Digital Services Act, create a class of strategic gaming distinct from the one-shot input/output gaming studied in prior work.
Conceptual and theoretical argument in the paper, motivated by regulatory context; formalization of continuous auditing as a multi-round interaction (T-round Stackelberg game).
high negative A Benchmark for Strategic Auditee Gaming Under Continuous Co... existence of a distinct class of strategic gaming (audit-evasion behaviors) unde...
The reform reduces industrial wastewater discharge, which improves agricultural production conditions (mechanism linking the reform to higher grain yield).
Mechanism analysis in the paper reporting reductions in industrial wastewater discharge following the reform (mediation channel analysis).
high negative Can water resource tax reform increase grain yield?—Evidence... industrial wastewater discharge
A key finding is that higher exact action accuracy can worsen aggregate trace alignment when the target is distributional.
Empirical comparison in simulator experiments indicating that optimizing for exact action accuracy (matching individual actions) can harm higher-level trace distribution alignment; observed in the studies contrasting deterministic copying/value-based approaches with Trace-Prior RL.
high negative Market-Alignment Risk in Pricing Agents: Trace Diagnostics a... exact action accuracy vs. aggregate trace alignment (distributional match)
Deterministic value-based RL and deterministic copying collapse this unresolved uncertainty into shortcut behavior.
Empirical observation in simulator experiments comparing deterministic value-based RL and deterministic copying agents to other approaches; observed collapsed/shortcut pricing behaviors when uncertainty is unresolved.
high negative Market-Alignment Risk in Pricing Agents: Trace Diagnostics a... policy action distribution / pricing choices (shortcut behavior)
This failure is a Goodhart-style failure under partial observability: Hotel A cannot observe the competitor's remaining inventory, booking curve, or pricing rule, so the same Hotel A-visible state maps to multiple plausible Hotel B prices.
Theoretical diagnosis supported by simulator setup and observed ambiguity in agent-visible states mapping to multiple competitor prices; derived from the two-hotel simulator design where key competitor variables are hidden from Hotel A.
high negative Market-Alignment Risk in Pricing Agents: Trace Diagnostics a... policy robustness / correctness under partial observability (mapping from observ...
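The tension between exact action accuracy and distributional alignment in the claims above can be worked through with a small example. Suppose one visible state maps to two plausible competitor prices. A deterministic policy that always picks the most likely price scores higher on exact matches but collapses the distribution, while a policy that samples from the target distribution matches it exactly at the cost of fewer exact hits. The prices, probabilities, and function names are hypothetical and are not taken from the paper's simulator.

```python
def expected_exact_accuracy(policy, target):
    """Probability that the policy's action equals the realized target action,
    assuming the two are drawn independently given the visible state."""
    return sum(prob * target.get(action, 0.0) for action, prob in policy.items())

def total_variation(policy, target):
    """Total variation distance between the policy and target action distributions."""
    actions = set(policy) | set(target)
    return 0.5 * sum(abs(policy.get(a, 0.0) - target.get(a, 0.0)) for a in actions)

# Hypothetical: the same visible state is followed by price 100 (60%) or 120 (40%).
target = {100: 0.6, 120: 0.4}
mode_policy = {100: 1.0}        # deterministic shortcut: always the most likely price
matching_policy = dict(target)  # stochastic policy that reproduces the distribution
```

Here the deterministic policy has the higher expected exact accuracy (0.6 against 0.52) yet sits at a total variation distance of 0.4 from the target, while the matching policy has distance 0.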
GPT-4.1 exhibits hidden workflow shortcuts despite achieving perfect TSR and HF1.
Model-level observation from the ASR analysis within the experiment (paper reports GPT-4.1 had perfect TSR and HF1 but failed trajectory-level fidelity).
high negative Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... trajectory fidelity vs. standard metrics (TSR, HF1)
Applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances, ASR reveals that 10 of 18 models systematically skip a confirmation checkpoint during payment checkout, a deviation invisible to both TSR and HF1, while 8 models enforce the checkpoint perfectly.
Empirical evaluation reported in the paper: HMASP tested across 18 LLMs and 90,000 task instances; analysis via ASR showing checkpoint-skipping behavior for 10 models and correct enforcement for 8 models.
high negative Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... adherence to expected workflow transitions (confirmation checkpoint adherence)
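The gap between end-state success and workflow fidelity in the claims above can be sketched with a toy trace check. The step names and both functions are hypothetical; this is not the paper's ASR metric, only an illustration of why an outcome-only check cannot see a skipped confirmation checkpoint.

```python
# Hypothetical required order of workflow steps for a payment checkout.
REQUIRED_ORDER = ["cart", "confirm", "pay"]

def task_succeeded(trace):
    """Outcome-only check: did the trace end in the final payment step?"""
    return bool(trace) and trace[-1] == "pay"

def follows_workflow(trace, required=REQUIRED_ORDER):
    """Trajectory check: do the required steps appear in the trace, in order?"""
    remaining = iter(trace)
    # `step in remaining` consumes the iterator, so order is enforced.
    return all(step in remaining for step in required)
```

A trace such as ["cart", "pay"] passes the outcome check and fails the trajectory check, whereas ["cart", "confirm", "pay"] passes both.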
With strong exposure of low-wealth, high-MPC households and concentrated ownership, privately chosen automation can be excessive even though it raises high-skilled labor income.
Theoretical welfare/comparison analyses in the model with heterogeneous households (differing in wealth and marginal propensities to consume) and ownership concentration; shows private incentives lead to automation choices that are suboptimal from a social perspective under these parameter constellations.
high negative The Demand Externality of Automation extent of automation chosen relative to social optimum (welfare-relevant automat...
Automation reduces paid human labor.
Model comparative statics in the same equilibrium framework showing substitution away from paid human labor as firms choose automation; result reported in the paper's static benchmark and general-equilibrium analysis.
high negative The Demand Externality of Automation paid human labor (labor share / labor employed in production)
DePAI entails risks including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, requiring value-sensitive design and continuously adaptive governance.
Risk analysis and conceptual argument in the paper identifying possible failure modes and recommended design/governance responses; no empirical incidence data provided.
high negative DAO-enabled decentralized physical AI: A new paradigm for hu... security, centralization, incentive failure, legal exposure, and intrinsic motiv...
These dynamics may produce an asymmetric barbell-shaped structure of value capture in advanced economies: high-volume synthetic production controlled by owners of AI infrastructure at one pole, and scarce, high-status human labor valued for verified human presence at the other.
Conceptual projection and economic argument in the paper (no empirical decomposition, distributional statistics, or sample reported in the excerpt).
high negative Human-Provenance Verification should be Treated as Labor Inf... concentration of value capture across economic actors (inequality / distribution...
AI compresses the value of standardized middle-tier labor by making good-enough synthetic substitutes scalable at low marginal cost, hollowing out the middle of the skill distribution currently categorized as knowledge work.
Conceptual/theoretical argument presented in the paper (no reported empirical sample, statistical analysis, or quantified experiment in the excerpt).
high negative Human-Provenance Verification should be Treated as Labor Inf... value of standardized middle-tier knowledge work (wages / scarcity premiums)
The cultural and technical misalignment of the data center and electric power sectors makes coordination difficult.
Analytic claim in the paper describing differing design principles, operational philosophies, and economic incentives as sources of misalignment; presented as conceptual analysis without empirical measurement in the excerpt.
high negative From Barrier to Bridge: The Case for AI Data Center/Power Gr... ease/difficulty of coordination between sectors
A single hyperscale training campus can draw power comparable to a mid-sized city, driven by one tightly synchronized job whose demand swings by hundreds of megawatts in seconds.
Concrete illustrative assertion in the paper about facility-level power draw and rapid demand swings; no numeric source, dataset, or case-study details provided in the excerpt.
high negative From Barrier to Bridge: The Case for AI Data Center/Power Gr... power draw (MW) and rapid demand swing magnitude/timescale
AI training data centers break the load-diversity assumption, under which many uncorrelated loads smooth aggregate grid demand.
Argumentative claim in the paper asserting that characteristics of AI training workloads violate the load-diversity assumption; no quantitative study included in the excerpt.
high negative From Barrier to Bridge: The Case for AI Data Center/Power Gr... degree to which aggregate grid demand is smoothed by uncorrelated loads (i.e., l...
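The load-diversity argument above can be illustrated with a standard variance calculation. For N independent loads with the same mean and standard deviation, the standard deviation of the total grows with the square root of N, so relative variability shrinks as loads are added. For N perfectly synchronized loads it grows with N, so relative variability does not shrink at all. The function name and the numbers in the usage note are hypothetical; no figures from the paper are used.

```python
import math

def aggregate_cv(n_loads, mean, sd, synchronized):
    """Coefficient of variation (sd / mean) of total demand from n identical loads.

    synchronized=False assumes the loads are statistically independent;
    synchronized=True assumes they move in perfect lockstep.
    """
    total_mean = n_loads * mean
    total_sd = n_loads * sd if synchronized else math.sqrt(n_loads) * sd
    return total_sd / total_mean
```

For example, with 100 loads of mean 10 and standard deviation 2, the independent case gives a relative variability of 0.02 and the synchronized case 0.2, a tenfold difference.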