Arbitrageurs can stitch together cheap models to undercut AI providers—simple strategies delivered up to 40% profit margins in a GitHub issue-resolution benchmark—pushing consumer prices down, squeezing provider revenues, and changing incentives around distillation and market entry.

Computational Arbitrage in AI Model Markets

Ricardo Olmedo, Bernhard Schölkopf, Moritz Hardt · March 23, 2026

arXiv · descriptive · medium evidence · 8/10 relevance

In a verifiable GitHub issue-resolution benchmark, simple arbitrage strategies that allocate queries across models can undercut providers and yield net profit margins up to 40%, lowering consumer prices and reducing provider marginal revenue while also easing entry for smaller models.

Consider a market of competing model providers selling query access to models with varying costs and capabilities. Customers submit problem instances and are willing to pay up to a budget for a verifiable solution. An arbitrageur efficiently allocates inference budget across providers to undercut the market, thus creating a competitive offering with no model-development risk. In this work, we initiate the study of arbitrage in AI model markets, empirically demonstrating the viability of arbitrage and illustrating its economic consequences. We conduct an in-depth case study of SWE-bench GitHub issue resolution using two representative models, GPT-5 mini and DeepSeek v3.2. In this verifiable domain, simple arbitrage strategies generate net profit margins of up to 40%. Robust arbitrage strategies that generalize across different domains remain profitable. Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue. Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers. At the same time, arbitrage reduces market segmentation and facilitates market entry for smaller model providers by enabling earlier revenue capture. Our results suggest that arbitrage can be a powerful force in AI model markets with implications for model development, distillation, and deployment.

Summary

Main Finding

Computational arbitrage — purchasing generations from multiple model providers and reselling them — is practically viable in AI model markets. Simple cascade-based arbitrage strategies can source a target level of performance at lower cost than any single provider, yielding profit margins up to ~40%. Arbitrage is cheap to discover, robust across realistic distribution shifts, amplified by model distillation, and economically consequential: it lowers consumer prices, reduces providers’ marginal revenue, and collapses market segmentation, while also enabling revenue capture by smaller/cheaper models.

Key Points

Formalization
- Providers are characterized by cost-performance curves: C_p(u) = minimum budget for provider p to reach performance u.
- Market price: C_P(u) = min_p C_p(u), the cheapest single-provider cost of reaching performance u.
- Arbitrage opportunity: existence of an arbitrage policy q with C_q(u) < C_P(u). Profit per performance level: Π_q(u) = max{C_P(u) − C_q(u), 0}.
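A small numeric sketch of the definitions above. The cost curves are made-up illustrative numbers, not figures from the paper, and `market_price` and `arbitrage_profit` are hypothetical helper names introduced only for this example.

```python
import math

# Hypothetical cost curves: minimum budget (USD) needed to reach solve rate u.
# math.inf means the provider cannot reach that performance level at all.
PROVIDER_COSTS = {
    "cheap_model": {0.4: 0.05, 0.6: 0.40, 0.7: math.inf},
    "strong_model": {0.4: 0.20, 0.6: 0.30, 0.7: 0.80},
}

# Hypothetical cost curve of an arbitrage policy q that mixes both providers.
ARBITRAGE_COSTS = {0.4: 0.05, 0.6: 0.22, 0.7: 0.55}


def market_price(u):
    """Market price at u: cheapest single-provider cost of reaching u."""
    return min(curve[u] for curve in PROVIDER_COSTS.values())


def arbitrage_profit(u):
    """Profit at u: max(market price - arbitrage cost, 0)."""
    return max(market_price(u) - ARBITRAGE_COSTS[u], 0.0)


for u in (0.4, 0.6, 0.7):
    print(f"u={u}: market={market_price(u):.2f}, profit={arbitrage_profit(u):.2f}")
```

In this toy market the arbitrageur earns nothing at u = 0.4, where it merely matches the cheapest provider, and earns a positive margin wherever its mixed policy undercuts the best single provider.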
Empirical proof-of-concept
- Case study on SWE-bench Verified (500 GitHub issues with unit-test verification).
- Models compared: GPT-5 mini, DeepSeek v3.2 (primary), plus later experiments with Qwen Coder (30B, 480B), Claude Sonnet 4.5, and a distilled mini-coder 4B.
- Example arbitrage policy (cascade): query GPT-5 mini up to $0.08 per issue; if unsuccessful, spend remaining budget on DeepSeek ($0.92 under a $1 cap). That policy achieves >68% solve rate at lower cost than either model alone.
- Profitability: net profit margins up to ~40% in the SWE-bench setting.
Robustness and cost of discovery
- Arbitrage policies can be fitted with very small search budgets: policies are profitable in expectation with search budgets as low as $1 and consistently profitable with ~$10–$30.
- Policies generalize across distribution shifts (e.g., Django issues vs. other repositories).
Market dynamics & economic effects
- Competition among arbitrageurs (Bertrand-style undercutting) drives market price down toward arbitrageurs’ marginal cost, eliminating arbitrage profits in equilibrium — benefiting consumers.
- Arbitrage breaks segmentation: cheaper models earn revenue across more performance tiers because arbitrage relies on them; a small/cheap model can capture frontier-driven revenue indirectly.
- Providers’ marginal revenue can fall substantially (up to ~40% in the study); lost provider surplus is captured by arbitrageurs or passed to consumers as lower prices.
Distillation
- Distillation improves cost-to-solution and thus increases scope for arbitrage.
- The authors demonstrate distillation can directly undermine a teacher model’s revenue: their distilled mini-coder 4B outperforms Qwen Coder 30B on cost-to-solution, creating new arbitrage paths and reducing teacher revenue.

Data & Methods

Benchmark and tasks
- SWE-bench Verified: 500 real GitHub software issues with unit tests to verify correctness (pass/fail).
Performance measurement
- Repeated-sampling protocol: models queried repeatedly until a correct patch or budget exhaustion; map dollar budget b to expected number of independent attempts k = b / (mean cost per attempt).
- Use pass@k estimator per issue; aggregate across issues to get expected solve rate ū_i(b).
- Expected total cost at budget b: c_i(b) = |J| ∫_0^b (1 − ū_i(x)) dx (survival-function identity).
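The measurement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the standard unbiased pass@k estimator 1 − C(n−c, k)/C(n, k), rounds the attempt count k down to an integer and caps it at the number of recorded attempts n, and approximates the integral with a midpoint sum. The cost formula rests on the identity E[min(T, b)] = ∫_0^b P(T > x) dx, where T is the spend at which an issue is first solved. All function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k given n recorded attempts, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


def solve_rate(budget, correct_counts, mean_cost, n):
    """Mean solve rate across issues at a per-issue dollar budget.

    correct_counts holds, per issue, how many of the n attempts were correct.
    """
    # Assumption: floor to whole attempts (epsilon guards float error), cap at n.
    k = min(int(budget / mean_cost + 1e-9), n)
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


def expected_total_cost(budget, correct_counts, mean_cost, n, steps=1000):
    """|J| * integral over [0, budget] of (1 - solve rate), midpoint rule."""
    dx = budget / steps
    integral = sum(
        (1.0 - solve_rate((s + 0.5) * dx, correct_counts, mean_cost, n)) * dx
        for s in range(steps)
    )
    return len(correct_counts) * integral
```

As a sanity check, an issue that is never solved costs the full budget, while an issue solved on every attempt costs exactly one attempt.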
Arbitrage policy class
- Cascades: sequential querying of providers with per-provider caps τ_i; remaining budget allocated forward.
- Solve probability for cascade computed per-issue as 1 − ∏_i (1 − u_{i,j}(b_i(τ))).
- Profit-maximizing τ found by search over small datasets (search budgets of $0.5 per query allowed during fitting).
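A two-provider cascade can be sketched as below. Several simplifications are assumptions of this example rather than details from the paper: each issue has a fixed per-attempt success probability for each provider, attempts are independent, each provider has a flat per-attempt cost, and the grid search maximizes solve rate at a fixed total budget as a simple stand-in for the profit objective. The names `attempt_success`, `cascade_solve_rate`, and `best_cap` are hypothetical.

```python
def attempt_success(p, cost, budget):
    """P(at least one success) in floor(budget / cost) independent attempts."""
    k = int(budget / cost + 1e-9)  # epsilon guards against float rounding
    return 1.0 - (1.0 - p) ** k


def cascade_solve_rate(tau, total_budget, issues, cost_a, cost_b):
    """Mean over issues of 1 - (1 - u_A(tau)) * (1 - u_B(total_budget - tau)).

    issues is a list of (p_a, p_b) per-attempt success probabilities.
    Provider A is queried first with cap tau; B gets the remaining budget.
    """
    total = 0.0
    for p_a, p_b in issues:
        fail_a = 1.0 - attempt_success(p_a, cost_a, tau)
        fail_b = 1.0 - attempt_success(p_b, cost_b, total_budget - tau)
        total += 1.0 - fail_a * fail_b
    return total / len(issues)


def best_cap(total_budget, issues, cost_a, cost_b, grid=100):
    """Grid search over the first provider's cap tau in [0, total_budget]."""
    caps = [total_budget * g / grid for g in range(grid + 1)]
    return max(
        caps,
        key=lambda t: cascade_solve_rate(t, total_budget, issues, cost_a, cost_b),
    )
```

With real data, the per-issue probabilities would be replaced by measured solve-rate curves, and the search objective by the gap between the market price and the cascade's expected cost.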
Experiments
- Primary two-provider comparison: GPT-5 mini vs DeepSeek v3.2; cascaded arbitrage yields lower cost curves than either alone.
- Search-budget sensitivity: profitability vs. amount spent to fit the arbitrage policy; bootstrapped CIs reported.
- Distribution-shift tests: fit on Django subset, evaluate on non-Django (and vice versa).
- Multi-provider markets: add Qwen 30B/480B, Claude Sonnet 4.5, and distilled mini-coder 4B; analyze cost-frontiers and revenue splits.
- Competition model: two arbitrageurs repeatedly undercut prices (Bertrand-style) to show price erosion and vanishing arbitrage profits.
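The undercutting dynamic in the competition model can be illustrated with a toy simulation. This is a sketch under simple assumptions (both arbitrageurs share the same marginal sourcing cost, prices move in fixed steps, and integer cents are used to avoid floating-point drift); `bertrand_undercut` is a hypothetical name, and the numbers are not from the paper.

```python
def bertrand_undercut(start_cents, marginal_cost_cents, step_cents=1):
    """Return the price path as two sellers alternately undercut each other.

    Each round the rival posts a price step_cents lower, as long as the new
    price still covers the shared marginal cost.
    """
    if step_cents <= 0:
        raise ValueError("step_cents must be positive")
    price = start_cents
    path = [price]
    while price - step_cents >= marginal_cost_cents:
        price -= step_cents
        path.append(price)
    return path


path = bertrand_undercut(start_cents=100, marginal_cost_cents=60)
print(path[-1] - 60)  # remaining per-unit profit in cents
```

Starting from a price of $1.00 with a marginal cost of $0.60, one-cent undercutting stops only when the price reaches marginal cost, at which point the per-unit arbitrage profit is zero.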
Reproducibility
- Code, data, and models provided at authors’ GitHub repository (link in paper).

Implications for AI Economics

Pricing and consumer surplus
- Arbitrage tends to lower end-user prices (good for consumers) because competing arbitrageurs erode markups until price ≈ marginal sourcing cost.
Provider incentives and revenue
- Providers face reduced marginal revenue; frontier models may lose revenue even as they remain necessary to source hard cases.
- Cheap/efficient models become strategically valuable: being “good enough and cheap” lets smaller providers participate in high-performance demand via arbitrage-mediated bundling.
Distillation and model-release strategy
- Distillation amplifies arbitrage by enabling low-cost models that replicate higher-tier performance; this can cannibalize teacher revenue.
- Providers may need to reconsider public API pricing, access controls, or strategic release schedules to protect revenue (e.g., rate limits, usage-based tiers, verification/credentialing for resellers).
Market structure and regulation
- Platforms may need mechanisms to detect and manage arbitrage resale (billing attribution, reseller contracts, or differentiated verification of resold outputs).
- Policy discussion: arbitrage is economically rational and, by construction, carries no model-development risk; regulatory and market-design choices will determine whether benefits (lower prices, greater access) or harms (reduced provider incentives for frontier investment) dominate.
Short-term vs long-term effects
- Short term: consumers benefit from lower prices and wider effective access to performance.
- Long term: persistent arbitrage could reduce incentives for providers to invest in higher-margin capabilities unless business models adapt (e.g., bundling services, proprietary verification tasks, or locked features).


Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper provides clear, reproducible empirical demonstrations that arbitrage strategies can be profitable in a verifiable benchmark and explores multiple robustness checks (different strategies, distillation). However, evidence is limited to a narrow, synthetic/benchmarked domain and a small set of representative models; claims about broader market dynamics rely on modeled counterfactuals rather than observed real-world market behavior.
Methods Rigor: medium. Experiments appear systematic: they use a verifiable task, compare concrete pricing/costs, test multiple strategies and distillation effects, and report margins and revenue impacts. But rigor is constrained by limited scope (one benchmark, two primary models), potential simplifications (perfect verification, frictionless market access, simplified pricing and rate-limit assumptions), and lack of empirical validation against real market data or endogenous provider responses.
Sample: SWE-bench GitHub issue resolution dataset (verifiable task instances) evaluated using two representative models (GPT-5 mini and DeepSeek v3.2); experiments simulate customer budgets, provider prices/costs, arbitrage strategies that split inference budget across models, and additional simulations for distillation and multiple competing arbitrageurs; some robustness checks across other domains noted.
Themes: adoption, innovation
Identification: No formal causal identification; the paper uses simulation/empirical experiments in a verifiable task (SWE-bench GitHub issue resolution) to compare costs, revenues and profits when an arbitrageur allocates inference budget across providers versus single-provider purchases; also conducts counterfactuals (distillation, multiple arbitrageurs) via modelling.
Generalizability:
- Single verifiable domain (GitHub issue resolution) may not represent less-verifiable or open-ended tasks.
- Only two primary models tested; results may differ with other model architectures, quality/cost tradeoffs, or larger model pools.
- Assumes frictionless access to providers and ability to combine responses (no contractual, legal, or rate-limit constraints).
- Assumes reliable, cheap verification of solutions; many real tasks lack easy verifiability.
- Market dynamics modeled rather than observed; provider strategic responses (price changes, rate-limiting, API terms) could alter outcomes.
- Ignores non-price competition dimensions (latency, data privacy, brand, integrated features) that affect real adoption.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
We conduct an in-depth case study of SWE-bench GitHub issue resolution using two representative models, GPT-5 mini and DeepSeek v3.2.	Other	null_result	high	execution of a case study on SWE-bench GitHub issue resolution with two named models	0.3
In this verifiable domain, simple arbitrage strategies generate net profit margins of up to 40%.	Firm Revenue	positive	high	net profit margin of arbitrage strategies	up to 40% 0.18
Arbitrage is viable in AI model markets (we empirically demonstrate the viability of arbitrage and illustrate its economic consequences).	Adoption Rate	positive	medium	viability/profitability and economic impact of arbitrage strategies	0.11
Robust arbitrage strategies that generalize across different domains remain profitable.	Firm Revenue	positive	medium	profitability of arbitrage strategies across multiple domains	0.11
Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue.	Firm Revenue	negative	medium	arbitrage profitability enabled by distilled models and impact on teacher model revenue	0.11
Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers.	Firm Revenue	negative	medium	consumer prices and marginal revenue of model providers	0.11
Arbitrage reduces market segmentation and facilitates market entry for smaller model providers by enabling earlier revenue capture.	Market Structure	positive	medium	market segmentation and ease of market entry for smaller model providers	0.11
An arbitrageur can efficiently allocate inference budget across providers to undercut the market, creating a competitive offering with no model-development risk.	Adoption Rate	positive	medium	ability to undercut market prices and create competitive offering without model development	0.11
Robust arbitrage strategies remain profitable even when generalized across different domains (claim reiteration emphasizing cross-domain profitability and robustness).	Firm Revenue	positive	medium	cross-domain profitability of arbitrage strategies	0.11
Our results suggest that arbitrage can be a powerful force in AI model markets with implications for model development, distillation, and deployment.	Market Structure	mixed	medium	overall economic influence of arbitrage on model development, distillation, and deployment practices	0.11