Arbitrageurs can stitch together cheap models to undercut AI providers—simple strategies delivered up to 40% profit margins in a GitHub issue-resolution benchmark—pushing consumer prices down, squeezing provider revenues, and changing incentives around distillation and market entry.
Consider a market of competing model providers selling query access to models with varying costs and capabilities. Customers submit problem instances and are willing to pay up to a budget for a verifiable solution. An arbitrageur efficiently allocates inference budget across providers to undercut the market, thus creating a competitive offering with no model-development risk. In this work, we initiate the study of arbitrage in AI model markets, empirically demonstrating the viability of arbitrage and illustrating its economic consequences. We conduct an in-depth case study of SWE-bench GitHub issue resolution using two representative models, GPT-5 mini and DeepSeek v3.2. In this verifiable domain, simple arbitrage strategies generate net profit margins of up to 40%. Robust arbitrage strategies that generalize across different domains remain profitable. Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue. Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers. At the same time, arbitrage reduces market segmentation and facilitates market entry for smaller model providers by enabling earlier revenue capture. Our results suggest that arbitrage can be a powerful force in AI model markets with implications for model development, distillation, and deployment.
Summary
Main Finding
Computational arbitrage — purchasing generations from multiple model providers and reselling them — is practically viable in AI model markets. Simple cascade-based arbitrage strategies can source a target level of performance at lower cost than any single provider, yielding profit margins up to ~40%. Arbitrage is cheap to discover, robust across realistic distribution shifts, amplified by model distillation, and economically consequential: it lowers consumer prices, reduces providers’ marginal revenue, and collapses market segmentation, while also enabling revenue capture by smaller/cheaper models.
Key Points
- Formalization
- Providers are characterized by cost-performance curves: C_p(u) is the minimum budget provider p needs to reach performance u.
- Market price: C_P(u) = min_p C_p(u).
- Arbitrage opportunity: an arbitrage policy q exists with C_q(u) < C_P(u). Profit per performance level: Π_q(u) = max{C_P(u) − C_q(u), 0}.
- Empirical proof-of-concept
- Case study on SWE-bench Verified (500 GitHub issues with unit-test verification).
- Models compared: GPT-5 mini, DeepSeek v3.2 (primary), plus later experiments with Qwen Coder (30B, 480B), Claude Sonnet 4.5, and a distilled mini-coder 4B.
- Example arbitrage policy (cascade): query GPT-5 mini up to $0.08 per issue; if unsuccessful, spend the remaining budget on DeepSeek v3.2 ($0.92 under a $1 cap). This policy achieves a solve rate above 68% at lower cost than either model alone.
- Profitability: net profit margins up to ~40% in the SWE-bench setting.
- Robustness and cost of discovery
- Arbitrage policies can be fitted with very small search budgets: policies are profitable in expectation with search budgets as low as $1 and consistently profitable with roughly $10–$30.
- Policies generalize across distribution shifts (e.g., Django issues vs. other repositories).
- Market dynamics & economic effects
- Competition among arbitrageurs (Bertrand-style undercutting) drives market price down toward arbitrageurs’ marginal cost, eliminating arbitrage profits in equilibrium — benefiting consumers.
- Arbitrage breaks segmentation: cheaper models earn revenue across more performance tiers because arbitrage relies on them; a small/cheap model can capture frontier-driven revenue indirectly.
- Providers’ marginal revenue can fall substantially (up to ~40% in the study); lost provider surplus is captured by arbitrageurs or passed to consumers as lower prices.
- Distillation
- Distillation improves cost-to-solution and thus increases scope for arbitrage.
- The authors demonstrate distillation can directly undermine a teacher model’s revenue: their distilled mini-coder 4B outperforms Qwen Coder 30B on cost-to-solution, creating new arbitrage paths and reducing teacher revenue.
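The formalization in the key points can be made concrete with a minimal sketch. The cost curves below are made-up illustrative numbers, not results from the paper, and `market_price` and `arbitrage_profit` are hypothetical helper names introduced only for this example.

```python
import math

# Hypothetical cost-performance curves: performance level u -> minimum budget
# (in dollars) a provider needs to reach u. Illustrative numbers only.
PROVIDER_COSTS = {
    "provider_a": {0.5: 0.10, 0.6: 0.30, 0.7: 0.90},
    "provider_b": {0.5: 0.20, 0.6: 0.25, 0.7: 0.60},
}

def market_price(u, provider_costs=PROVIDER_COSTS):
    """C_P(u) = min_p C_p(u): the cheapest single provider that reaches u."""
    # A provider with no entry for u is treated as unable to reach it.
    return min(curve.get(u, math.inf) for curve in provider_costs.values())

def arbitrage_profit(u, policy_cost, provider_costs=PROVIDER_COSTS):
    """Pi_q(u) = max{C_P(u) - C_q(u), 0} for a policy whose cost is C_q(u)."""
    return max(market_price(u, provider_costs) - policy_cost, 0.0)
```

With these toy numbers, a policy that sources u = 0.7 for $0.45 against a market price of $0.60 earns roughly $0.15, while a policy that costs more than the market price earns nothing.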
Data & Methods
- Benchmark and tasks
- SWE-bench Verified: 500 real GitHub software issues with unit tests to verify correctness (pass/fail).
- Performance measurement
- Repeated-sampling protocol: each model is queried repeatedly until it produces a correct patch or the budget is exhausted. A dollar budget b maps to an expected number of independent attempts k = b / (mean cost per attempt).
- Use pass@k estimator per issue; aggregate across issues to get expected solve rate ū_i(b).
- Expected total cost at budget b: c_i(b) = |J| ∫_0^b (1 − ū_i(x)) dx (survival-function identity).
- Arbitrage policy class
- Cascades: sequential querying of providers with per-provider caps τ_i; remaining budget allocated forward.
- Solve probability for a cascade is computed per issue j as 1 − ∏_i (1 − u_{i,j}(b_i(τ))).
- Profit-maximizing τ found by search over small datasets (search budgets of $0.5 per query allowed during fitting).
- Experiments
- Primary two-provider comparison: GPT-5 mini vs DeepSeek v3.2; cascaded arbitrage yields lower cost curves than either alone.
- Search-budget sensitivity: profitability vs. amount spent to fit the arbitrage policy; bootstrapped CIs reported.
- Distribution-shift tests: fit on Django subset, evaluate on non-Django (and vice versa).
- Multi-provider markets: add Qwen 30B/480B, Claude Sonnet 4.5, and distilled mini-coder 4B; analyze cost-frontiers and revenue splits.
- Competition model: two arbitrageurs repeatedly undercut prices (Bertrand-style) to show price erosion and vanishing arbitrage profits.
- Reproducibility
- Code, data, and models provided at authors’ GitHub repository (link in paper).
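The measurement steps listed above (budget to attempts, pass@k, cascade solve probability, and the survival-function cost integral) can be sketched roughly as follows. This is not the authors' code. It assumes the commonly used unbiased pass@k estimator (the summary only says "pass@k estimator"), floors k to an integer, treats cascade stages as independent, and uses a simple midpoint Riemann sum for the integral. All function names are hypothetical.

```python
from math import comb, prod

def pass_at_k(n, c, k):
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k).

    Assumption: this is the commonly used estimator; the source only says
    "pass@k estimator".
    """
    k = min(k, n)  # cannot evaluate more attempts than were sampled
    if k <= 0:
        return 0.0
    return 1.0 - comb(n - c, k) / comb(n, k)  # comb is 0 when k > n - c

def solve_prob(budget, mean_cost, n, c):
    """Map a dollar budget to k attempts (floored, an assumption), then pass@k."""
    k = int(budget / mean_cost + 1e-9)  # epsilon guards against float round-off
    return pass_at_k(n, c, k)

def cascade_solve_prob(stage_probs):
    """Per-issue cascade solve probability: 1 - prod_i (1 - u_i)."""
    return 1.0 - prod(1.0 - u for u in stage_probs)

def expected_total_cost(mean_solve_rate, budget, num_issues, steps=1000):
    """|J| * integral_0^b (1 - u_bar(x)) dx, via a midpoint Riemann sum."""
    dx = budget / steps
    unsolved_mass = sum(1.0 - mean_solve_rate((i + 0.5) * dx) for i in range(steps))
    return num_issues * unsolved_mass * dx
```

For instance, two cascade stages that each solve a given issue with probability 0.5 give a combined solve probability of 0.75 under the independence assumption. `mean_solve_rate` is any callable that returns the aggregate solve rate at a given dollar budget.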
Implications for AI Economics
- Pricing and consumer surplus
- Arbitrage tends to lower end-user prices (good for consumers) because competing arbitrageurs erode markups until price ≈ marginal sourcing cost.
- Provider incentives and revenue
- Providers face reduced marginal revenue; frontier models may lose revenue even as they remain necessary to source hard cases.
- Cheap/efficient models become strategically valuable: being “good enough and cheap” lets smaller providers participate in high-performance demand via arbitrage-mediated bundling.
- Distillation and model-release strategy
- Distillation amplifies arbitrage by enabling low-cost models that replicate higher-tier performance; this can cannibalize teacher revenue.
- Providers may need to reconsider public API pricing, access controls, or strategic release schedules to protect revenue (e.g., rate limits, usage-based tiers, verification/credentialing for resellers).
- Market structure and regulation
- Platforms may need mechanisms to detect and manage arbitrage resale (billing attribution, reseller contracts, or differentiated verification of resold outputs).
- Policy discussion: arbitrage is economically rational and carries no model-development risk for the arbitrageur; regulatory and market-design choices will determine whether benefits (lower prices, greater access) or harms (reduced provider incentives for frontier investment) dominate.
- Short-term vs long-term effects
- Short term: consumers benefit from lower prices and wider effective access to performance.
- Long term: persistent arbitrage could reduce incentives for providers to invest in higher-margin capabilities unless business models adapt (e.g., bundling services, proprietary verification tasks, or locked features).
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We conduct an in-depth case study of SWE-bench GitHub issue resolution using two representative models, GPT-5 mini and DeepSeek v3.2. | Other | null_result | high | execution of a case study on SWE-bench GitHub issue resolution with two named models | 0.3 |
| In this verifiable domain, simple arbitrage strategies generate net profit margins of up to 40%. | Firm Revenue | positive | high | net profit margin of arbitrage strategies | up to 40%; 0.18 |
| Arbitrage is viable in AI model markets (we empirically demonstrate the viability of arbitrage and illustrate its economic consequences). | Adoption Rate | positive | medium | viability/profitability and economic impact of arbitrage strategies | 0.11 |
| Robust arbitrage strategies that generalize across different domains remain profitable. | Firm Revenue | positive | medium | profitability of arbitrage strategies across multiple domains | 0.11 |
| Distillation further creates strong arbitrage opportunities, potentially at the expense of the teacher model's revenue. | Firm Revenue | negative | medium | arbitrage profitability enabled by distilled models and impact on teacher model revenue | 0.11 |
| Multiple competing arbitrageurs drive down consumer prices, reducing the marginal revenue of model providers. | Firm Revenue | negative | medium | consumer prices and marginal revenue of model providers | 0.11 |
| Arbitrage reduces market segmentation and facilitates market entry for smaller model providers by enabling earlier revenue capture. | Market Structure | positive | medium | market segmentation and ease of market entry for smaller model providers | 0.11 |
| An arbitrageur can efficiently allocate inference budget across providers to undercut the market, creating a competitive offering with no model-development risk. | Adoption Rate | positive | medium | ability to undercut market prices and create competitive offering without model development | 0.11 |
| Robust arbitrage strategies remain profitable even when generalized across different domains (claim reiteration emphasizing cross-domain profitability and robustness). | Firm Revenue | positive | medium | cross-domain profitability of arbitrage strategies | 0.11 |
| Our results suggest that arbitrage can be a powerful force in AI model markets with implications for model development, distillation, and deployment. | Market Structure | mixed | medium | overall economic influence of arbitrage on model development, distillation, and deployment practices | 0.11 |