Simple pricing algorithms that ignore rivals can drive persistent supra-competitive pricing — and under symmetric exploration can reach monopoly levels. Calibrated simulations show these collusive-like outcomes arise robustly even with finite horizons and product heterogeneity.

Misspecified Explore-then-Exploit Leads to Supra-Competitive Prices

Jackie Baek, Vivek F. Farias, Farrell Wu · May 15, 2026

arXiv · theoretical · medium evidence · 8/10 relevance

Algorithms that explore then exploit using misspecified, monopoly-style demand estimates can systematically converge to supra-competitive — even monopoly — prices, and simulations calibrated to rental-market data show these collusive-like outcomes are robust.

We study whether simple algorithmic pricing systems can systematically produce collusive-like prices in multi-firm markets. We consider firms using an explore-then-exploit pipeline: they randomize prices during an initial exploration phase, then estimate demand from their own historical data and set prices myopically thereafter. The estimation step relies on a misspecified, monopoly-style model that omits competitors' prices. We characterize when this pipeline converges to supra-competitive prices above the Nash equilibrium, via a fluid-limit ordinary differential equation analysis. We show that supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price. Moreover, prices can be substantially above the Nash price; we show that prices can reach monopoly levels under symmetric exploration. Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond our theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.

Summary

Main Finding

Simple explore-then-exploit pricing pipelines that fit a misspecified (monopoly-style) demand model using only a firm’s own price–sales history can systematically converge to supra-competitive prices. When firms’ initial exploration means lie in a “best-response cone” (i.e., firms experiment with similar prices on the same side of the Nash price), terminal prices lie above the Nash equilibrium and can be substantially higher — in symmetric cases they can converge all the way to the monopoly price. These supra-competitive outcomes arise without explicit coordination, punishments, or communication; the misspecification and the exploration pattern suffice.

Key Points

Model sketch
- N symmetric firms with linear true demand: Q_{i,t} = a − b·P_{i,t} + (c/(N−1)) Σ_{j≠i} P_{j,t} + ε_{i,t}, with b > c.
- Explore-then-exploit policy: K periods of independent randomized pricing (distribution D_exp with mean µ and covariance Σ_exp), then an exploitation phase in which each firm repeatedly fits a monopoly-style linear demand q(p) = α + βp by OLS using only its own price–sales history and sets the myopic profit-maximizing price.
- Firms treat competitors’ prices as unobserved noise (misspecification).
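The pipeline in the model sketch can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the paper's code: marginal cost is taken to be zero, exploration draws are independent normals, and the parameter values, the price cap `p_max`, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(a=10.0, b=2.0, c=1.0, mu=(3.5, 3.6), explore_sd=0.2,
             K=200, T=2000, noise_sd=0.1, p_max=20.0, seed=0):
    """Explore-then-exploit with a misspecified own-price OLS demand model.

    Assumptions (not from the source): zero marginal cost, normal
    exploration draws, prices clipped to [0, p_max].
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    prices = np.empty((T, n))
    # Running sums for each firm's simple OLS of own sales on own price.
    sx = np.zeros(n)
    sy = np.zeros(n)
    sxx = np.zeros(n)
    sxy = np.zeros(n)
    for t in range(T):
        if t < K:
            # Exploration: independent random prices around each firm's mean.
            p = mu + explore_sd * rng.standard_normal(n)
        else:
            # Exploitation: fit q = alpha + beta * p on own history only.
            beta = (t * sxy - sx * sy) / (t * sxx - sx ** 2)
            alpha = (sy - beta * sx) / t
            # Guard against a non-negative slope estimate (unbounded optimum).
            beta = np.minimum(beta, -1e-6)
            # Myopic profit maximizer of p * (alpha + beta * p) at zero cost.
            p = -alpha / (2 * beta)
        p = np.clip(p, 0.0, p_max)
        # True demand depends on the average rival price, which firms ignore.
        rivals = (p.sum() - p) / (n - 1)
        q = a - b * p + c * rivals + noise_sd * rng.standard_normal(n)
        sx += p
        sy += q
        sxx += p * p
        sxy += p * q
        prices[t] = p
    return prices
```

Calling `simulate()[-100:].mean(axis=0)` gives each firm's average price over the last 100 periods, which can then be compared against the Nash and monopoly benchmarks for the chosen parameters.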
Analytical approach
- Fluid-scaling limit (K, T → ∞ with appropriate scaling) yields a deterministic price-moments ODE that tracks running price means U(t) and accumulated covariances V(t); posted prices P(t) are determined from U and V via closed-form expressions.
- The ODE analysis yields the main theoretical results about limiting prices.
Main theoretical results
- Theorem (supra-competitive cones): If the vector of exploration means µ lies inside a best-response cone (all firms exploring with similar prices on the same side of the Nash price and within a cone determined by demand parameters), the limiting prices are supra-competitive. In duopoly, these cones cover at least one quarter of the feasible (µ_1, µ_2) space.
- Duopoly dynamics: Under any initial conditions the duopoly limit lies either in a best-response cone or at the Nash price; re-exploring around resulting prices makes supra-competitive outcomes likely in subsequent rounds.
- Probability of landing in a cone: Under a natural clustered prior (means sampled from a common interval), the probability of being in a best-response cone is at least 1/4 and increases with N.
- Symmetric exploration characterization (Theorem 3.3): With all firms' exploration means equal to s and vanishing exploration variance, the limiting price is characterized exactly: if s ≤ p_NE (the Nash price) or s ≥ p_MNP (the monopoly price), prices converge to the monopoly price; if p_NE < s < p_MNP, prices converge to s. Thus limiting prices equal the monopoly price when the exploration mean is at or below Nash or at or above monopoly, and otherwise freeze at the exploration mean.
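The symmetric-exploration case above can be written as a small piecewise function. The benchmark formulas below assume zero marginal cost (an assumption not stated in this summary) and follow from the linear demand in the model sketch; the function names are hypothetical, and `symmetric_limit` simply encodes the statement of Theorem 3.3 as summarized here.

```python
def nash_price(a, b, c):
    # Symmetric Nash price at zero marginal cost: the first-order condition
    # a - 2*b*p + c*p = 0 gives p = a / (2b - c).
    return a / (2 * b - c)

def monopoly_price(a, b, c):
    # Joint-profit-maximizing symmetric price at zero marginal cost:
    # maximize p * (a - (b - c) * p), giving p = a / (2(b - c)).
    return a / (2 * (b - c))

def symmetric_limit(s, a, b, c):
    """Limiting price for common exploration mean s, as summarized above."""
    p_ne = nash_price(a, b, c)
    p_mnp = monopoly_price(a, b, c)
    if s <= p_ne or s >= p_mnp:
        return p_mnp
    return s
```

For example, with a = 10, b = 2, c = 1 the Nash price is 10/3 and the monopoly price is 5, so an exploration mean of 4 lies strictly between them and is returned unchanged.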
Mechanism
- Key invariant: after sufficient exploration within a cone, each firm’s posted price maintains a common sign relative to its trailing mean (e.g., always weakly above or weakly below the running mean). This enforces positive correlation across firms’ price changes (in the fluid sense), which, under the misspecified estimate-then-optimize update, produces upward pressure on prices and supra-competitive limits.
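Why positive co-movement pushes prices up can be seen from the textbook omitted-variable-bias identity. This is a standard population-OLS calculation applied to the linear demand in the model sketch, not the paper's price-moments ODE. It assumes the noise is uncorrelated with prices and, for the pricing rule, zero marginal cost (an assumption not stated in this summary); the symbol for the rivals' average price is introduced here for illustration.

```latex
\begin{aligned}
\bar{P}_{-i,t} &= \frac{1}{N-1}\sum_{j \neq i} P_{j,t}, \qquad
Q_{i,t} = a - b\,P_{i,t} + c\,\bar{P}_{-i,t} + \varepsilon_{i,t}, \\
\hat{\beta}_i &\;\to\; -b + c\,\frac{\operatorname{Cov}\!\left(P_i, \bar{P}_{-i}\right)}{\operatorname{Var}(P_i)}, \\
p_i^{\text{myopic}} &= -\frac{\hat{\alpha}_i}{2\,\hat{\beta}_i}.
\end{aligned}
```

When own and rival prices co-move positively, the estimated slope is less negative than −b, so the firm perceives less elastic demand and prices higher. In the extreme where the covariance-to-variance ratio equals 1, the slope becomes −(b − c), which is the slope of demand when all firms move prices together.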
Robustness evidence
- Simulations calibrated to a real multifamily rental market (Greater Boston, using Calder-Wang & Kim 2024 demand estimates) show supra-competitive terminal prices arise robustly beyond the linear-asymptotic assumptions: finite horizons, heterogeneous products, nonlinear logit demand, and product-specific costs. Effects appear quickly and are strongest when exploration prices are clustered on the same side of Nash.

Data & Methods

Theoretical analysis
- Core analytical tool: a price-moments ODE (Definition 3.1) describing deterministic limits of running means U(t) and accumulated covariances V(t) under the fluid scaling; posted prices P(t) are functions of U and V with explicit formulas capturing the OLS misspecification effect.
- Rigorous results: convergence of terminal prices to ODE solution; characterization of best-response cones; duopoly and symmetric-exploration theorems; probability bounds for cone membership under clustered priors.
Simulations
- Synthetic simulations (numerical ODE evaluation and stochastic runs) to illustrate and quantify the gap above Nash under various exploration parameters.
- Empirical calibration: simulations using a calibrated multifamily rental market model (Greater Boston) from Calder-Wang & Kim (2024), incorporating heterogeneous products, heterogeneous customers, nonlinear logit demand, and calibrated shadow costs. Firms run the same explore-then-exploit pipeline (OLS misspecified model, myopic pricing) and outcomes are compared to Nash and monopoly benchmarks.
Assumptions and limitations
- Analytical results rely on linear demand, symmetric firms, independent exploration draws, OLS estimation, and fluid (large-horizon) scaling.
- Simulations test departures from many assumptions and confirm qualitative robustness, but exact quantitative outcomes depend on market specifics and finite-horizon effects.

Implications for AI Economics

Algorithmic pricing risk: Widely used, simple model-based pricing pipelines (estimate-then-optimize using only a firm’s own data) can induce sustained supra-competitive pricing without explicit collusion. Misspecification — here ignoring competitors’ prices — plus correlated exploration is a sufficient mechanism.
Public policy and antitrust
- The results strengthen the case that antitrust concerns about algorithmic pricing do not require sophisticated multi-agent RL or explicit coordination; even simple automated systems can produce collusive-like outcomes.
- Detection and enforcement need to account for learning-induced coupling (not just explicit communication or rule-based coordination).
Platform and firm design
- Firms or platform providers deploying pricing algorithms should be aware that omitting competitor prices from demand estimation can induce harmful market outcomes. Incorporating competitor-price information or structural models that account for strategic interactions may mitigate upward bias.
- Design options to reduce risk: force or encourage diversity in exploration (reduce clustering on one side of Nash), incorporate competitor-price features, use models that acknowledge strategic interactions, add explicit regularization toward competitive benchmarks, or implement guardrails that cap long-run prices relative to market benchmarks.
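The effect of adding a competitor-price feature can be checked numerically. The sketch below is illustrative only: the data-generating process (a duopoly with jointly normal prices of correlation `rho`), the parameter values, and the function name `fit_slopes` are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_slopes(a=10.0, b=2.0, c=1.0, rho=0.8, n=20000, seed=0):
    """Own-price slope from a misspecified vs. an opponent-aware OLS fit."""
    rng = np.random.default_rng(seed)
    # Correlated own and rival prices with unit variances (hypothetical setup).
    cov = [[1.0, rho], [rho, 1.0]]
    p_own, p_rival = rng.multivariate_normal([4.0, 4.0], cov, size=n).T
    # True linear demand for the duopoly case.
    q = a - b * p_own + c * p_rival + 0.1 * rng.standard_normal(n)
    # Misspecified model: regress sales on own price only.
    X_mis = np.column_stack([np.ones(n), p_own])
    slope_mis = np.linalg.lstsq(X_mis, q, rcond=None)[0][1]
    # Opponent-aware model: include the rival's price as a regressor.
    X_full = np.column_stack([np.ones(n), p_own, p_rival])
    slope_full = np.linalg.lstsq(X_full, q, rcond=None)[0][1]
    return slope_mis, slope_full
```

With these settings the misspecified slope should sit near −b + c·rho, while the opponent-aware slope should sit near −b, so the second fit no longer understates how sensitive the firm's own demand is to its own price.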
Research directions
- Extend analysis to broader demand systems, asymmetric firms, endogenously determined exploration, richer estimation procedures (Bayesian, penalized regression), and strategic learning rules.
- Study detection methods to distinguish learning-induced supra-competitive outcomes from explicit collusion, and design policy interventions (randomized audits, mandated information sharing, algorithmic certification).
Cautionary note
- While the analytical results are derived under stylized assumptions (linear demand, fluid limits), the paper’s calibrated simulations show the mechanism persists in realistic settings, suggesting practical relevance. Nonetheless, quantitative magnitudes will depend on market structure, demand curvature, estimation details, and firms’ algorithmic designs.

Assessment

Paper Type: theoretical
Evidence Strength: medium. The paper provides a formal theoretical characterization (fluid-limit ODE analysis) that demonstrates how a particular algorithmic pricing pipeline can produce supra-competitive outcomes, and supports those claims with simulations calibrated to real multifamily rental data; however, it lacks causal empirical identification from real-world interventions or natural experiments, and results depend on model assumptions and the specific algorithmic architecture studied.
Methods Rigor: high. The analysis uses formal asymptotic techniques (fluid-limit ODEs) to derive conditions for convergence and conducts simulation experiments to test robustness across finite horizons, heterogeneous products, and nonlinear demand, indicating careful and appropriate methodological choices for the research question.
Sample: Theoretical model of competing firms using an explore-then-exploit pricing pipeline, with simulation experiments calibrated to a real multifamily rental market (calibration uses observed price ranges and demand features; simulations include finite horizons, product heterogeneity, and logit demand), but no randomized or observational causal identification from field data.
Themes: governance, adoption
Generalizability
- Relies on a specific explore-then-exploit algorithm and a misspecified monopoly-style estimation step; other learning algorithms (e.g., multi-agent RL, opponent-aware estimators) may behave differently.
- Assumes firms explore within similar price ranges and on the same side of the Nash price; real-world firms may have heterogeneous exploration strategies.
- Calibration to a multifamily rental market may not generalize to other industries with different dynamics (retail, platforms, frequent-price settings).
- Ignores regulatory responses, legal constraints, and explicit collusion incentives, which can alter outcomes.
- Theoretical results use asymptotic/fluid-limit arguments that may not fully capture small-sample or highly stochastic settings.

Claims (6)

Claim	Direction	Confidence	Outcome	Details
Simple algorithmic pricing systems can systematically produce collusive-like (supra-competitive) prices in multi-firm markets. Market Structure	positive	high	price level relative to the Nash equilibrium	supra-competitive prices (above Nash equilibrium) 0.12
Supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price. Market Structure	positive	high	whether equilibrium prices are supra-competitive (above Nash)	supra-competitive prices when exploration ranges are similar and on same side of Nash 0.2
Under symmetric exploration, prices can reach monopoly levels. Market Structure	positive	high	price level (specifically reaching the monopoly price)	prices can reach monopoly levels 0.2
Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond the theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand. Market Structure	positive	high	occurrence of supra-competitive prices in simulated market environments	supra-competitive outcomes observed robustly in simulations 0.12
Firms following an explore-then-exploit pipeline randomize prices during an initial exploration phase, then estimate demand from their own historical data and set prices myopically thereafter; the estimation relies on a misspecified, monopoly-style model that omits competitors' prices. Other	null_result	high	pricing algorithm structure (exploration then myopic exploitation based on misspecified demand model)	0.02
The convergence properties of the explore-then-exploit pricing pipeline can be characterized via a fluid-limit ordinary differential equation (ODE) analysis. Market Structure	null_result	high	convergence behavior of prices under the pricing pipeline	0.06