The Commonplace

Inference-layer gatekeepers can quietly foreclose rivals by degrading latency, throughput, or routing. The model finds that discrimination grows as inference quality matters more and downstream margins rise, and the paper proposes a four-pillar 'Neutral Inference' policy to restore parity.

The Inference Bottleneck: A Formal Model of Vertical Foreclosure in AI Markets
Gaston Besanson · April 19, 2026
arXiv · theoretical · evidence: n/a · relevance: 8/10
A formal model shows inference-layer providers can foreclose downstream rivals through QoS discrimination and routing bias, with the QoS gap rising in inference-quality importance and downstream margins and falling with API price and rival entry elasticity, and proposes a 'Neutral Inference' conduct framework alongside illustrative welfare gains.

As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing. This paper develops a formal game-theoretic model of vertical foreclosure in inference markets, as the formal-model companion to Besanson and Celani (2026). The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals via latency, throughput, context limits, or feature access; and routing bias in assistant-layer interfaces. An extension motivated by Anthropic's April 2026 release of Claude Opus 4.7 alongside the restricted-access Claude Mythos Preview introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa). The main result gives an explicit local equilibrium characterization of the QoS gap. Under logit demand and symmetric rivals, the gap is strictly increasing in inference-quality importance (alpha) and downstream margins, and strictly decreasing in API price and rival entry elasticity. Discrimination vanishes at a joint boundary rather than at a simple threshold in alpha alone. A stylized calibration to four providers using April 2026 data treats parameter values as inputs to a comparative risk mapping, not structural estimates. The mapping suggests Google and OpenAI face conditions most conducive to foreclosure; Microsoft's realized routing bias has been voluntarily constrained by a March 2026 multi-model pivot; Anthropic shows low consumer-channel risk and elevated risk in enterprise coding-agent segments. The policy section proposes Neutral Inference, a four-pillar conduct framework: QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline. Illustrative welfare calculations suggest net gains in the tens of billions annually.

Summary

Main Finding

The paper develops a formal game-theoretic model of vertical foreclosure in AI inference markets and shows that vertically integrated inference providers can profitably degrade rivals without predatory pricing via three non-price mechanisms: (i) QoS discrimination (latency, throughput, context limits, feature gating), (ii) routing bias in assistant-layer interfaces, and (iii) tier-based access discrimination (distinct model classes with partner-exclusivity). The core analytic result gives an explicit local characterization of the equilibrium QoS gap (qU − qi), with clear comparative statics: the gap rises with the importance of inference to final quality (α) and the integrated downstream margin (mU), and falls with API access price (p) and rival-entry elasticity (η). The paper couples the model to a transparent, sensitivity-tested calibration for four providers (Google, OpenAI, Microsoft, Anthropic, April 2026) and proposes a four‑pillar conduct framework ("Neutral Inference") to operationalize enforcement.
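
In symbols, the comparative statics stated above can be summarized as follows. This block only restates the reported signs for the gap at a local interior equilibrium; it is not the explicit expression from the paper, which this summary does not reproduce. The shorthand Δq for qU − qi is the only notation introduced.

```latex
\Delta q \equiv q_U - q_i, \qquad
\frac{\partial \Delta q}{\partial \alpha} > 0, \quad
\frac{\partial \Delta q}{\partial m_U} > 0, \quad
\frac{\partial \Delta q}{\partial p} < 0, \quad
\frac{\partial \Delta q}{\partial \eta} < 0 .
```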

Key Points

  • Model architecture
    • Two-layer game: one upstream inference provider U (also vertically integrated downstream) and N downstream rivals. Timing: U sets API price and QoS commitments; rivals decide entry and effort; consumers choose under logit preferences.
    • Final application quality Qi = α·qi + (1−α)·ei, where qi is inference QoS (controlled by U for rivals), ei is downstream effort, α∈(0,1).
    • Upstream infrastructure cost is quadratic in QoS with a scope parameter ϕ that penalizes serving rivals at higher QoS than the first‑party app.
    • Rival entry is modeled reduced-form with elasticity η to QoS.
  • Main analytical result (Proposition 1)
    • Equilibrium QoS gap ∆q = qU − qi has an explicit local form (equation (7) in the paper). Discrimination (∆q>0) occurs iff the downstream business-stealing gains exceed foregone API revenue at the entry margin. Comparative statics:
      • Gap increases with α and mU.
      • Gap decreases with API price p and entry elasticity η.
      • The discrimination boundary is a joint condition on (mU, p, η, sU, s, qi), not a simple α threshold.
  • Welfare and externalities
    • The private optimum typically elevates first‑party QoS above the social optimum and depresses rivals’ QoS, which produces a welfare loss.
    • Welfare loss decomposes into direct quality loss (reduced qi for rivals), an effort/innovation multiplier (lower downstream investment), and business‑stealing deadweight loss.
  • Dynamic foreclosure (Proposition 3)
    • "Open early, closed late": providers can attract entry with generous access then degrade QoS later if rival switching costs are large, extracting rents dynamically.
  • Routing bias (Proposition 4)
    • When the upstream provider controls assistant-layer routing, routing probabilities incorporate self‑preferencing; bias increases with mU, consumer inattention, and difficulty of ex‑post observation.
  • Tier-based access discrimination (Section 4.3 / Proposition 5)
    • Two-class model: generally-available models vs restricted-capability frontier models. Parameterized by tier gap τ and partner-exclusivity κ.
    • Tier gating generates more durable rents (less eroded by competition) and distinct empirical signatures (bimodal capability distributions across customer classes).
  • Calibration and firm-level mapping
    • Stylized calibration to Google, OpenAI, Microsoft, Anthropic using public adoption data (April 2026).
    • Parameters separated into observed, inferred, and judgment‑based; results presented as comparative risk mappings with sensitivity analysis, not structural estimates.
    • Baseline assessment: Google and OpenAI face conditions most conducive to foreclosure; Microsoft’s routing bias constrained after a March 2026 multi‑model pivot; Anthropic low risk on consumer channels but elevated risk in enterprise coding-agent segment.
  • Policy prescription: Neutral Inference (four pillars)
    • Pillar 1: QoS parity (conduct obligation to avoid discriminatory QoS).
    • Pillar 2: Routing transparency.
    • Pillar 3: FRAND-style non-discrimination on access terms.
    • Pillar 4: Tier transparency and release-pathway discipline (addressing tier‑gating).
    • Illustrative welfare calculations imply potential net gains in the tens of billions of dollars annually under plausible parameter scenarios (reported as order‑of‑magnitude, not point estimates).
  • Model-to-observables bridge and audit agenda
    • The paper maps latent primitives (α, η, τ, κ, etc.) to measurable objects regulators could audit (QoS metrics, latency/throughput logs, routing logs, capability distributions, contractual exclusivity).
    • Empirical predictions: observable QoS gaps, bimodal capability distributions when tiers are gated, dynamic patterns consistent with "open early, closed late", and routing logs showing biased weighting.
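
The quality index and logit choice step from the model architecture above can be sketched numerically. This is a minimal illustration, not the paper's equation (7): it assumes no outside option, a unit logit scale, and equal downstream effort across apps, and the function names `quality`, `logit_shares`, and `share_gain_from_gap` are hypothetical. Under those assumptions it shows that a given QoS gap shifts more share to the first-party app when α is larger.

```python
import math


def quality(alpha: float, q: float, e: float) -> float:
    """Final application quality Q = alpha*q + (1 - alpha)*e, as in the summary."""
    return alpha * q + (1.0 - alpha) * e


def logit_shares(alpha, q_u, q_i, e_u, e_i, n_rivals):
    """Return (first-party share, per-rival share) under plain logit choice.

    Assumptions (not from the paper): no outside option, logit scale of 1,
    and n_rivals symmetric rivals that all receive the same QoS q_i.
    """
    w_u = math.exp(quality(alpha, q_u, e_u))
    w_i = math.exp(quality(alpha, q_i, e_i))
    denom = w_u + n_rivals * w_i
    return w_u / denom, w_i / denom


def share_gain_from_gap(alpha, gap, n_rivals=3, q_u=1.0, effort=1.0):
    """First-party share gained by serving rivals at q_u - gap instead of q_u."""
    s_parity, _ = logit_shares(alpha, q_u, q_u, effort, effort, n_rivals)
    s_gap, _ = logit_shares(alpha, q_u, q_u - gap, effort, effort, n_rivals)
    return s_gap - s_parity


if __name__ == "__main__":
    # Illustrative alpha values only; none are calibrated.
    for alpha in (0.2, 0.5, 0.8):
        print(alpha, round(share_gain_from_gap(alpha, gap=0.5), 4))
```

The sketch covers only the demand side. The provider's actual trade-off also involves API revenue, infrastructure cost, and rival entry, which are left out here.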

Data & Methods

  • Methodology
    • Theoretical: game-theoretic, two-stage model with logit consumer demand, symmetric downstream rivals, quadratic infrastructure costs, endogenous downstream effort, reduced‑form entry elasticity.
    • Analytical results: closed-form local equilibrium for QoS gap (interior solution), comparative statics, and several extensions (dynamic two‑period model, routing bias, tier gating).
  • Calibration
    • Uses public adoption and product-release data as of April 2026 for four major providers.
    • Parameter classification:
      • Observed: market shares/adoption proxies, public pricing, known product releases.
      • Inferred: implied shares, rough margins, entry sensitivity.
      • Judgment-based: α (importance of inference), cost curvature γ, ϕ (scope economies), τ and κ for tiering.
    • Calibration objective: comparative risk mapping and sensitivity analysis. Explicitly not a structural estimation exercise; results depend on interpretive inputs and are presented with robustness checks.
  • Limitations and identification caveats
    • Baseline model assumes a single upstream provider (extension to oligopoly noted but left for future work).
    • Logit demand and symmetric rivals used for tractability; alternative demand forms could change some quantitative comparative statics.
    • Reduced-form η bundles several margins (exit, upstream defection, multi‑homing); decomposition left for future work.
    • Calibration outputs are illustrative; welfare figures are order‑of‑magnitude.
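
The observed/inferred/judgment split and the sensitivity analysis described above could be organized along the following lines. This is a hedged sketch only: the `Param` class, the parameter values, and especially `toy_risk_index` are hypothetical placeholders, not the paper's calibration or its equation (7). The toy index is chosen merely to move in the directions the summary reports (up in α and mU, down in p and η).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    value: float
    source: str  # "observed", "inferred", or "judgment"


def toy_risk_index(v: dict) -> float:
    """Hypothetical index, NOT the paper's formula: rises with alpha and m_U,
    falls with p and eta, matching only the reported signs."""
    return v["alpha"] * v["m_U"] / (v["p"] * (1.0 + v["eta"]))


def one_at_a_time(params: dict, index, rel_step: float = 0.2) -> dict:
    """Scale each judgment-based parameter by (1 +/- rel_step), holding the
    others at baseline, and report the resulting (low, high) index values."""
    base = {k: p.value for k, p in params.items()}
    out = {}
    for name, p in params.items():
        if p.source != "judgment":
            continue
        lo = dict(base, **{name: p.value * (1.0 - rel_step)})
        hi = dict(base, **{name: p.value * (1.0 + rel_step)})
        out[name] = (index(lo), index(hi))
    return out


# Placeholder values for illustration only; none are taken from the paper.
baseline = {
    "alpha": Param(0.6, "judgment"),
    "m_U": Param(0.4, "inferred"),
    "p": Param(1.0, "observed"),
    "eta": Param(0.5, "inferred"),
}

if __name__ == "__main__":
    print(one_at_a_time(baseline, toy_risk_index))
```

Restricting the sweep to judgment-based inputs reflects the idea that those are the parameters whose values are most open to dispute; a fuller analysis would likely vary inferred inputs as well.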

Implications for AI Economics

  • Mechanisms shaping competition
    • Vertical control of inference can substitute for price-based exclusion: QoS degradation, routing self‑preference, and tier gating are powerful non‑price foreclosure tools.
    • The magnitude of foreclosure incentives depends jointly on downstream margins, the value of inference to final product (α), API pricing, and how responsive rival entry is to QoS (η).
    • Middleware, orchestration layers, and better multi‑homing raise effective η (rivals respond more readily to QoS degradation) and thus attenuate foreclosure incentives, consistent with the gap falling in η; policy and standardization efforts that lower switching costs can therefore be competition‑enhancing.
  • Measurement and enforcement
    • Regulators should monitor multiple quantitative observables: per‑customer QoS logs (latency, throughput, context window), routing probability tables and assistant logs, contractual exclusivity terms, and distributional patterns of capability access (to detect tier gating).
    • Different mechanisms warrant different remedies: QoS parity and FRAND obligations for shared‑model discrimination; tier transparency and release-pathway constraints for durable tiered gating.
  • Welfare and policy tradeoffs
    • Conduct-based remedies (Neutral Inference pillars) could plausibly yield substantial aggregate welfare gains; the paper quantifies orders of magnitude but stresses parameter uncertainty.
    • Policies must balance safety carve-outs and legitimate staged rollouts versus exclusionary tiering—formal distinctions (e.g., documented safety release pathways) are necessary to avoid perverse blocking of innovations.
  • Directions for empirical work
    • Testable predictions: measurable QoS gaps between first‑party and third‑party traffic, increased downstream exit or migration after QoS degradations, bimodal capability access across customer classes when tiering occurs, and temporal patterns consistent with dynamic foreclosure.
    • Future empirical work should try to decompose η into exit, defection, and multi‑homing channels, and should extend the model to oligopolistic upstream competition to assess coordinated or asymmetric tier-gating risks.
  • Policy design guidance
    • Enforcement should not rely solely on price-based indicators: antitrust frameworks must incorporate technical QoS metrics, routing observability, and access to capability‑level release information.
    • Remedies should be mechanism‑targeted: QoS parity obligations and auditability for shared services; transparency and non‑exclusive release commitments for capability tiers.
  • Overall takeaway
    • The paper formalizes how control over inference infrastructure creates non‑price foreclosure incentives in generative AI markets, provides a tractable mapping from economic primitives to auditables, and offers a practical conduct framework (Neutral Inference) that can be deployed—with careful empirical work and design—to mitigate these risks while preserving legitimate safety and product-development pathways.
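
The measurement idea above (comparing QoS for first-party and third-party traffic) can be illustrated with a small sketch. The log schema, the traffic-class labels, and the function name `latency_gap_ms` are assumptions made for illustration; this summary does not specify the paper's own audit metrics. Median latency is used here as one possible robust statistic.

```python
from statistics import median


def latency_gap_ms(records):
    """Median third-party latency minus median first-party latency.

    `records` is an iterable of (traffic_class, latency_ms) pairs, where
    traffic_class is "first_party" or "third_party" (hypothetical labels).
    A positive result means third-party requests were slower.
    """
    by_class = {"first_party": [], "third_party": []}
    for traffic_class, latency in records:
        by_class[traffic_class].append(latency)
    if not by_class["first_party"] or not by_class["third_party"]:
        raise ValueError("need at least one record per traffic class")
    return median(by_class["third_party"]) - median(by_class["first_party"])


# Made-up sample log, for illustration only.
sample_log = [
    ("first_party", 110.0), ("first_party", 120.0), ("first_party", 130.0),
    ("third_party", 150.0), ("third_party", 170.0), ("third_party", 400.0),
]

if __name__ == "__main__":
    print(latency_gap_ms(sample_log))
```

A raw gap of this kind would not by itself establish discrimination; differences in workload mix, region, or request size would need to be controlled for before drawing conclusions.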

Assessment

Paper Type: theoretical
Evidence Strength: n/a. Paper is a formal game-theoretic model with illustrative calibration rather than an empirical causal study; it generates theoretical comparative statics and policy implications but does not identify causal effects from observed variation.
Methods Rigor: high. Provides an explicit equilibrium characterization under clear assumptions (logit demand, symmetric rivals), derives monotonic comparative statics, and augments the theory with a stylized calibration and welfare calculations; limitations arise from simplifying assumptions and local-equilibrium focus rather than structural estimation.
Sample: No micro-level sample; uses a stylized calibration mapped to four major inference providers (Google, OpenAI, Microsoft, Anthropic) using public April 2026 product, pricing, and release information to set parameter inputs for comparative risk mapping rather than structural estimation.
Themes: governance, innovation
Generalizability:
  • Relies on specific model assumptions (logit demand, symmetric rivals) that may not hold across markets.
  • Main results are local equilibrium characterizations and may not extend to global dynamics or multi-period competition.
  • Calibration is illustrative and not a structural estimate; quantitative welfare numbers are sensitive to input choices.
  • Ignores some real-world frictions (heterogeneous user preferences, multi-homing costs, regulatory heterogeneity) and firm-specific strategies beyond the four providers.
  • Does not model predatory pricing or dynamic entry/innovation explicitly.
  • Findings are conditioned on April 2026 product configurations and may change as firms alter access or pricing.

Claims (13)

  • As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing. [Market Structure]
    • Direction: positive · Confidence: high
    • Outcome: shift in source of competitive advantage (training -> inference/distribution/routing)
    • Details: 0.06
  • The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals (via latency, throughput, context limits, or feature access) and routing bias in assistant-layer interfaces. [Market Structure]
    • Direction: positive · Confidence: high
    • Outcome: presence of foreclosure mechanisms (QoS discrimination, routing bias)
    • Details: 0.12
  • An extension motivated by Anthropic's April 2026 release introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa). [Market Structure]
    • Direction: positive · Confidence: high
    • Outcome: tier-based access discrimination (parameterized by tau and kappa)
    • Details: 0.12
  • The main theoretical result provides an explicit local equilibrium characterization of the QoS gap under logit demand and symmetric rivals. [Market Structure]
    • Direction: positive · Confidence: high
    • Outcome: QoS gap (equilibrium characterization)
    • Details: 0.12
  • Under logit demand and symmetric rivals, the QoS gap is strictly increasing in inference-quality importance (alpha) and downstream margins. [Market Structure]
    • Direction: positive · Confidence: high
    • Outcome: QoS gap
    • Details: 0.12
  • Under logit demand and symmetric rivals, the QoS gap is strictly decreasing in API price and rival entry elasticity. [Market Structure]
    • Direction: negative · Confidence: high
    • Outcome: QoS gap
    • Details: 0.12
  • Discrimination (QoS gap) vanishes at a joint boundary rather than at a simple threshold in alpha alone. [Market Structure]
    • Direction: null_result · Confidence: high
    • Outcome: presence/absence of QoS discrimination
    • Details: 0.12
  • A stylized calibration to four providers using April 2026 data treats parameter values as inputs to a comparative risk mapping, not structural estimates. [Market Structure]
    • Direction: null_result · Confidence: high
    • Outcome: comparative risk mapping across providers
    • Details: n=4; 0.12
  • The calibration mapping suggests Google and OpenAI face conditions most conducive to foreclosure. [Market Structure]
    • Direction: positive · Confidence: medium
    • Outcome: conduciveness to foreclosure
    • Details: n=4; 0.04
  • Microsoft's realized routing bias has been voluntarily constrained by a March 2026 multi-model pivot. [Market Structure]
    • Direction: negative · Confidence: medium
    • Outcome: routing bias (degree realized/constrained)
    • Details: 0.04
  • Anthropic shows low consumer-channel risk and elevated risk in enterprise coding-agent segments in the authors' comparative mapping. [Market Structure]
    • Direction: mixed · Confidence: medium
    • Outcome: conduciveness to foreclosure by channel (consumer vs enterprise coding-agent)
    • Details: n=4; 0.04
  • The policy section proposes 'Neutral Inference', a four-pillar conduct framework consisting of QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline. [Governance And Regulation]
    • Direction: positive · Confidence: high
    • Outcome: regulatory/conduct framework (Neutral Inference) components
    • Details: 0.02
  • Illustrative welfare calculations suggest net gains in the tens of billions annually from the proposed policies/interventions. [Consumer Welfare]
    • Direction: positive · Confidence: high
    • Outcome: aggregate welfare gains (annual)
    • Details: net gains in the tens of billions annually; 0.06
