AI tools have collapsed the gap between naming and executing causal methods, letting polished but potentially invalid analyses spread widely; the author proposes an 'Analysis Contract' — a pre-commitment regime requiring a method-data contract, a data audit, and explicit disconfirmation criteria — to restore auditability and trust.

Vibe Econometrics and the Analysis Contract

Lydia Ashton · May 08, 2026

arXiv · theoretical · evidence: n/a · relevance: 8/10

The paper argues that AI-assisted 'vibe' methodology democratizes both execution and domain-specific failure modes in causal analysis, and proposes an 'Analysis Contract' (method-data contract, data audit, pre-commitment on disconfirmation) to mitigate the resulting governance problem.

"Vibe coding" and "vibe analytics" have been framed as a democratization of technical capability. This paper argues that AI-assisted methodology more broadly, or what I call "vibe methodology," also democratizes the failure modes specific to each domain. When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone (a class I call "vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses. I focus on "vibe econometrics," the subset of AI-assisted causal analysis where identification can be named faster than it can be audited. The claim of this paper is not that AI invents inferential failures that did not previously exist, but that it changes their incidence, observability, and persuasive force enough to create a practically distinct governance problem. This results in three failure modes: method-data mismatch, where AI bypasses expertise at execution; confidence laundering, where AI amplifies the credibility of formatted output; and invisible forking, which spans both. What is new is not the failure modes but AI's industrialization of their packaging. The barrier between naming a method and executing it has collapsed, and weak foundations, dressed as rigorous analysis, now reach audiences at a scale, speed, and polish that previously required expertise. I propose the Analysis Contract, a pre-commitment framework that adapts the logic of pre-analysis plans and the Causal Roadmap to the AI-assisted setting. The contract imposes three conditions before a causal claim is made: a method-data contract, a data audit, and a pre-commitment statement defining what would count as a disconfirming result. The framework generalizes across domains of vibe inference through domain-specific instantiation.

Summary

Main Finding

The paper argues that AI-assisted causal analysis (“vibe econometrics”) does not create new inferential failures but materially changes their incidence, observability, and persuasive force. This creates a distinct governance problem: AI collapses the barrier between naming a method and executing it, which enables three structurally different failure modes (method-data mismatch, confidence laundering, and invisible forking). The paper proposes an “Analysis Contract,” a pre-commitment framework made up of a method-data contract, a data audit, and pre-specified disconfirmation criteria, to reintroduce methodological friction and reduce these failure modes.

Key Points

Vibe methodology = using AI to execute domain-specific methods via natural‑language prompts. Vibe inference = those uses where validity depends on assumptions not verifiable from output alone (e.g., causal claims, legal reasoning, scientific inference).
Vibe econometrics = AI-assisted causal inference (DiD, RDD, propensity scores, ITS, etc.). It is a sharp example because method names confer credibility while their identifying assumptions are invisible in polished outputs.
Two workflow stages matter:
- Production: AI executes the method, removing the friction that coding and forced assumption checks used to impose.
- Reception: Polished AI outputs (tables, plots, interpretive text) reach audiences who may not detect foundational problems.
Three failure modes:
- Method-data mismatch: methods applied to data that do not meet identifying assumptions (valid arithmetic, invalid causal claim). Examples: DiD on aggregated claims with incompatible aggregation; propensity matching on unharmonized job titles across legacy firms; ITS across a measurement-instrument change.
- Confidence laundering: AI formats weak or invalid analyses into authoritative, persuasive presentations (regression tables, diagnostics, confident prose), increasing audience trust even when evidence is invalid.
- Invisible forking: rapid, unlogged specification search via prompts; default lack of audit trail; iterations framed as refinement rather than selective reporting.
AI amplifies these through model miscalibration (overconfident phrasing), sycophancy (models echo what the user seems to want), and human cognitive surrender (users adopt AI outputs with little scrutiny).
Downstream disclosure (journals/policies) has limited effect; upstream structural commitments are needed.
Proposal: the Analysis Contract, three pre-commitment conditions to meet before making a causal claim:
- Method-data contract: explicit statement matching identification strategy to data features and assumptions.
- Data audit: independent checks of data provenance, measurement compatibility, and pre-processing that could violate assumptions.
- Pre-commitment disconfirmation criteria: what results would count as falsifying evidence (analogous to pre-analysis plans / Causal Roadmap).
Pre-commitment must be complemented by adversarial review and mechanical checks because AIs can produce confident affirmations that mislead even compliant analysts.
The framework is domain-general but requires domain-specific instantiation (templates, checklists, automated audits).
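
One way to picture the three conditions is as a small machine-readable record that gates a causal claim. The sketch below is only an illustration, not the paper's template: every class, field, and check name (for example `same_instrument_pre_post`) is a hypothetical choice, and the rule that criteria must be dated strictly before the analysis run is an assumed reading of "pre-commitment."

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MethodDataContract:
    """Identification strategy matched to the data it will be run on."""
    method: str                     # e.g. "interrupted time series"
    identifying_assumptions: tuple  # assumptions the method needs
    data_features: tuple            # data properties claimed to support them


@dataclass(frozen=True)
class DataAudit:
    """Outcome of an independent check of provenance and pre-processing."""
    auditor: str
    checks_passed: dict             # hypothetical check name -> bool


@dataclass(frozen=True)
class Disconfirmation:
    """Results that, if observed, would count against the causal claim."""
    criteria: tuple
    committed_on: date


@dataclass
class AnalysisContract:
    method_data: Optional[MethodDataContract] = None
    audit: Optional[DataAudit] = None
    disconfirmation: Optional[Disconfirmation] = None

    def unmet_conditions(self, analysis_run_on: date) -> list:
        """Return the reasons a causal claim should not yet be made."""
        problems = []
        if self.method_data is None or not self.method_data.identifying_assumptions:
            problems.append("method-data contract missing or lists no assumptions")
        if self.audit is None:
            problems.append("data audit missing")
        else:
            failed = sorted(k for k, ok in self.audit.checks_passed.items() if not ok)
            if failed:
                problems.append("data audit failed: " + ", ".join(failed))
        if self.disconfirmation is None or not self.disconfirmation.criteria:
            problems.append("no disconfirmation criteria")
        elif self.disconfirmation.committed_on >= analysis_run_on:
            # Assumed rule: criteria must be dated before the analysis is run.
            problems.append("disconfirmation criteria not committed before the analysis")
        return problems


# Illustrative case: an ITS design where the measurement instrument changed.
contract = AnalysisContract(
    method_data=MethodDataContract(
        method="interrupted time series",
        identifying_assumptions=("outcome measured consistently across the interruption",),
        data_features=("monthly test scores",),
    ),
    audit=DataAudit(
        auditor="independent reviewer",
        checks_passed={"same_instrument_pre_post": False},
    ),
    disconfirmation=Disconfirmation(
        criteria=("a placebo break in the pre-period is as large as the estimate",),
        committed_on=date(2026, 1, 5),
    ),
)
problems = contract.unmet_conditions(analysis_run_on=date(2026, 2, 1))
```

Here `problems` names the failed audit check, so the claim is blocked even though the method ran and the other two conditions were filled in. A gate like this only checks that the artifacts exist and are mutually consistent; whether the stated assumptions actually hold still needs human or adversarial review.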

Data & Methods

Type: Conceptual working paper (May 2026) combining theory, literature synthesis, and composite illustrative cases drawn from the author’s applied experience and AI assistance.
Methodology:
- Taxonomy construction: maps “vibe methodology” landscape along two axes (failure observability from output alone; feedback loop speed), isolating “vibe inference” and focusing on “vibe econometrics.”
- Failure‑mode analysis: develops three failure modes with process diagrams (production vs reception) and concrete vignettes across sectors (healthcare claims, post‑merger pay audits, state education ITS).
- Governance proposal: adapts logic from pre-analysis plans and the Causal Roadmap (Petersen & van der Laan, 2014; Dang et al., 2023) into the Analysis Contract.
- Literature grounding: integrates recent work on AI and scientific practice (Lin & Sohail 2026; Arbour et al. 2026; Shen & Tamkin 2026), LLM behavior (miscalibration, RLHF sycophancy), and classic methodology concerns (Gelman & Loken 2013).
Use of AI in producing the paper: author disclosed use of Claude Opus 4, Sonnet 4, ChatGPT 5, and Perplexity for drafting, revision, simulated reviews, and building appendices; primary sources were downloaded and verified by the author.
No primary empirical dataset; claims are supported via illustrative examples and synthesis of existing empirical and experimental literature (e.g., Becker et al. 2025 on perceived speedups; studies on LLM confidence).

Implications for AI Economics

Research practice: Economists must treat AI as a change to workflow structure, not merely a productivity tool. Methodological competence (assumption checking, data provenance) remains essential; training should emphasize meta‑skills (critical discernment, causal literacy, pre-commitment practices).
Replicability & credibility: Polished AI outputs increase the risk of convincing but invalid causal claims entering policy/firms. Journals, funders, and agencies should require upstream commitments (e.g., Analysis Contract artifacts) and machine‑readable audit trails, not only retrospective disclosure.
Tooling and standards: Developers of analytical AI should build features that preserve provenance (prompt and specification logging, reproducible notebooks, automated assumption checks) and support mechanized audits (e.g., data-schema checks, instrument-change detection, harmonization checks).
Organizational governance: Firms and public agencies should adopt institutional processes that require method‑data matching and independent data audits before decisions or public claims. Analytics teams need role separation (producer, auditor, decision reviewer) to counter cognitive surrender and invisible forking.
Policy & regulation: Regulators who rely on AI-produced analyses (e.g., pay equity audits, program evaluations) must require documented pre-commitments and independent verification. Disclosure policies alone are insufficient.
Research agenda: Empirical measurement of the prevalence and impact of vibe econometrics (surveys, audits of organizational analytics outputs), RCTs testing effectiveness of Analysis Contract components, development of automated specification‑logging systems, and experiments on adversarial review workflows.
Broader economic consequences: By changing how persuasive evidence is produced and consumed, AI may influence market behavior and policy allocation—heightening the social value of robust governance to prevent misallocation based on spurious but polished causal claims.
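
The prompt and specification logging mentioned above could, as one possible design, be an append-only log in which each entry is hashed together with the previous entry's digest. This is a minimal sketch under assumptions: the `SpecificationLog` class and its methods are hypothetical names, and hash chaining is a technique chosen here for illustration rather than something the paper prescribes.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 of a JSON-serialized entry with a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class SpecificationLog:
    """Append-only record of every specification that was run."""

    FIELDS = ("prompt", "spec", "estimate", "prev")

    def __init__(self):
        self.entries = []

    def record(self, prompt: str, spec: dict, estimate: float) -> str:
        """Log one run, chaining it to the digest of the previous entry."""
        prev = self.entries[-1]["digest"] if self.entries else ""
        body = {"prompt": prompt, "spec": spec, "estimate": estimate, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "digest": digest})
        return digest

    def verify(self) -> bool:
        """Check that no logged entry was edited, reordered, or dropped mid-chain."""
        prev = ""
        for entry in self.entries:
            body = {k: entry[k] for k in self.FIELDS}
            if entry["prev"] != prev or entry["digest"] != _digest(body):
                return False
            prev = entry["digest"]
        return True

    def summary(self) -> dict:
        """Counts a reader can use to see how many forks were explored."""
        distinct = {json.dumps(e["spec"], sort_keys=True) for e in self.entries}
        return {"specifications_run": len(self.entries), "distinct_specs": len(distinct)}


# Illustrative use: two prompts, two specifications, both kept on record.
log = SpecificationLog()
log.record("run DiD on claims", {"method": "did", "controls": ["age"]}, 0.12)
log.record("add region controls", {"method": "did", "controls": ["age", "region"]}, 0.31)
intact = log.verify()
counts = log.summary()
```

Reporting `counts` alongside the headline estimate makes the number of specifications tried visible to the audience. The chain detects edits and mid-chain deletions but not removal of the most recent entries, so in practice the latest digest would need to be stored somewhere the analyst cannot rewrite.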

Assessment

Paper Type: theoretical
Evidence Strength: n/a. This is a conceptual/theoretical paper that does not present empirical tests or causal estimates; it develops an argument about changed failure modes and proposes a governance framework rather than providing empirical evidence.
Methods Rigor: medium. The paper offers a coherent taxonomy of failure modes and adapts established practices (pre-analysis plans, the Causal Roadmap) into a concrete 'Analysis Contract', showing clear logical structure and domain-aware prescriptions; however, it lacks formal modeling, empirical evaluation, or case studies demonstrating the framework's effectiveness in practice.
Sample: No empirical sample or dataset; the paper is argumentative and conceptual, drawing on examples and prior methodological literature to illustrate failure modes and to motivate the proposed Analysis Contract.
Themes: governance, human_ai_collab
Generalizability:
- No empirical validation: applicability and impact are untested in real-world settings.
- Effectiveness depends on institutional adoption and compliance (e.g., journals, funders, firms).
- Framework assumes availability of data and parties willing to pre-commit, which may not hold in proprietary or adversarial settings.
- Practical implementation details will vary by domain, toolchain, and regulatory context.
- Does not quantify prevalence or economic magnitude of the identified failures across sectors.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
AI-assisted methodology ("vibe methodology") democratizes the failure modes specific to each domain.	Automation Exposure	negative	high	democratization of domain-specific inferential failure modes (i.e., more widespread occurrence of those failures)	0.02
When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone ("vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses.	Decision Quality	negative	high	observability/detectability of invalid inference and requirement of expert knowledge to detect invalidity	0.02
AI changes the incidence, observability, and persuasive force of inferential failures enough to create a practically distinct governance problem (even if it does not invent previously nonexistent inferential failures).	Governance And Regulation	negative	high	governance challenge arising from changed incidence, observability, and persuasiveness of inferential failures	0.02
AI industrializes the packaging of existing inferential failure modes: the barrier between naming a method and executing it has collapsed, allowing weak foundations, dressed as rigorous analysis, to reach audiences at a scale, speed, and polish that previously required expertise.	Adoption Rate	negative	high	scale/speed/polish of dissemination of weak analyses (i.e., reach/adoption of low-quality analyses)	0.02
There are three practical failure modes produced or amplified by AI-assisted causal analysis: (1) method-data mismatch, where AI bypasses expertise at execution; (2) confidence laundering, where AI amplifies the credibility of formatted output; and (3) invisible forking, which spans both.	Error Rate	negative	high	types of inferential failure modes arising in AI-assisted causal analysis	0.02
The novel governance problem is not that AI creates new failure modes, but that AI changes their incidence, observability, and persuasive force enough to require different governance responses.	Governance And Regulation	mixed	high	need for adapted governance responses to AI-mediated inferential failures	0.02
The Analysis Contract, a proposed pre-commitment framework, can adapt the logic of pre-analysis plans and the Causal Roadmap to the AI-assisted setting by imposing three conditions before a causal claim is made: a method-data contract, a data audit, and a pre-commitment statement defining what would count as a disconfirming result.	Governance And Regulation	positive	high	governance of AI-assisted causal claims / credibility of causal claims under AI assistance	0.02
The Analysis Contract framework generalizes across domains of vibe inference through domain-specific instantiation.	Governance And Regulation	positive	high	applicability/generalizability of the Analysis Contract across domains	0.02