Neither laissez-faire nor blanket IP protects the future of creative content for AI: permissive access starves creators of compensation, while strong rights blunt originality — and even a high‑quality model can erode itself by causing creators to homogenize their output. A data intermediary that internalizes cross-creator externalities and subsidizes novel contributions can both preserve incentives and improve long-run model quality.

Market Design for AI: Beyond the Copyright Binary

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani · June 10, 2026

arXiv · theoretical · evidence: n/a · relevance: 8/10

Through static and dynamic theoretical models the paper shows that both permissive 'free-for-all' and strong IP regimes fail to sustain high-quality, innovative human-generated content for AI training — producing an 'originality penalty' under strong IP and a 'curse of precision' where good models induce homogenization that degrades future performance — and proposes a data intermediary that internalizes externalities and subsidizes innovation to restore efficiency.

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail. Free-for-all does not compensate creators, and strong intellectual property rights, which we model as a static Stackelberg game, also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades model performance (a "curse of precision"). We further propose a market design with a data intermediary that internalizes cross-creator externalities and subsidizes innovative contributions, thereby restoring efficiency.

Summary

Main Finding

The paper shows that the two prevailing policy extremes for sourcing human-created content to train AI—(1) an unrestricted “free-for-all” (fair use) and (2) a strict individual intellectual-property (IP) rights regime—both fail to produce socially desirable outcomes. Free-for-all destroys creators’ incentives. Surprisingly, strict IP also under-provisions original creative effort because AI firms’ monopsonistic buying power and correlation across creators’ output depress prices and impose an “originality penalty” that disproportionately discourages novel creators. In a dynamic extension, a high‑quality model causes humans to rely more on AI, which homogenizes future human-created data and feeds back to degrade model performance—a “curse of precision.” The authors propose a data intermediary that bundles creators, internalizes cross‑creator externalities, and distributes payments (using Aumann–Shapley–inspired weights) via simple two‑part tariffs to restore efficiency (under full information).

Key Points

Free-for-all (blanket fair use): Leads to underinvestment in human-created content because creators are uncompensated while AI outputs substitute for original content.
Strict IP / individual bargaining: Fails even without transaction costs, because:
- AI firms have monopsony power and optimally buy lower‑quality (cheaper) content, reducing creators’ incentives.
- Correlation across creators means selling one creator’s data partially reveals others’ content, creating negative externalities that depress the marginal price creators receive.
- Originality penalty: Highly novel/innovative creators are undercompensated relatively more than typical creators because AI prefers representative data; scarcity of originality does not command a premium in equilibrium.
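The information-leakage channel can be sketched with a toy Gaussian model (an illustrative assumption, not the paper's specification): the AI firm wants to learn a common target theta, each creator's content is a noisy signal theta + eps_i, and the noise terms share a pairwise correlation rho. A creator's marginal value to the firm is the extra reduction in the firm's posterior variance. The higher rho is, the less each additional creator reveals that the firm could not already infer from the others, so the marginal value, and any price tied to it, falls.

```python
def posterior_variance(n: int, prior_var: float = 1.0, noise_var: float = 1.0,
                       rho: float = 0.0) -> float:
    """Variance of the firm's estimate of theta after buying n creators' signals.

    Toy assumption: s_i = theta + eps_i with equicorrelated noise, Corr(eps_i, eps_j) = rho.
    The all-ones vector is an eigenvector of the noise covariance, so the simple mean is a
    sufficient statistic, with noise variance noise_var * (rho + (1 - rho) / n).
    """
    if n == 0:
        return prior_var
    mean_noise = noise_var * (rho + (1.0 - rho) / n)
    return 1.0 / (1.0 / prior_var + 1.0 / mean_noise)


def marginal_value(n: int, rho: float) -> float:
    """Variance reduction contributed by the n-th creator, given n - 1 already bought."""
    return posterior_variance(n - 1, rho=rho) - posterior_variance(n, rho=rho)


for rho in (0.0, 0.5):
    print(rho, round(marginal_value(5, rho), 4))
# 0.0 0.0333
# 0.5 0.0096
```

In this toy the fifth creator's content is worth roughly a third as much to the firm once signals are moderately correlated, even though that creator's own effort is unchanged.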
Dynamic "curse of precision":
- Better models substitute for costly human creativity, increasing human reliance on AI-assisted generation.
- This increases homogeneity of human-produced data, which is then used for retraining, degrading the model over time—even if fresh human data continues to be supplied.
- This is a distinct economic pathway to phenomena similar to empirically observed “model collapse.”
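A discrete-time toy loop makes the feedback visible. Every functional form here is a hypothetical stand-in for the paper's continuous-time model: human reliance on AI equals current model precision p, the diversity of fresh human data is 1 - kappa * reliance, and each retraining round moves precision partway toward q * diversity. With these forms the loop has a single steady state p* = q / (1 + kappa * q), so a model that starts above it drifts down even though new human data arrives every round.

```python
from typing import List


def simulate_precision(p0: float, steps: int = 50, q: float = 1.0,
                       kappa: float = 0.8, lam: float = 0.3) -> List[float]:
    """Toy loop between model precision p and the diversity of fresh human data.

    Hypothetical functional forms:
      reliance  = p                       (better models are relied on more)
      diversity = 1 - kappa * reliance    (AI-assisted content is more homogeneous)
      p_next    = (1 - lam) * p + lam * q * diversity   (partial retraining on fresh data)
    """
    path = [p0]
    p = p0
    for _ in range(steps):
        reliance = p
        diversity = 1.0 - kappa * reliance
        p = (1.0 - lam) * p + lam * q * diversity
        path.append(p)
    return path


steady = 1.0 / (1.0 + 0.8 * 1.0)        # p* = q / (1 + kappa * q) for the default parameters
high_start = simulate_precision(0.9)    # an initially good model
low_start = simulate_precision(0.2)     # an initially weak model
print([round(p, 3) for p in high_start[:6]], round(steady, 3))
```

The drift runs toward the same steady state from either side: the weak model improves and the strong one degrades, so in this toy initial precision does not protect long-run quality.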
Data intermediary solution:
- A single intermediary negotiates with the AI firm, pooling creators to overcome monopsony power and internalize information‑leakage externalities.
- Payments allocated according to marginal contributions (weights inspired by Aumann–Shapley) can reverse the originality penalty.
- A two‑part tariff (affine lump-sum and affine per-effort transfers) is sufficient to restore efficiency under full information.
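The division of labor between the two parts can be seen in a single-creator sketch with hypothetical functional forms (value ALPHA * log(1 + e) from effort e, cost e**2 / 2; not the paper's specification). The per-unit rate b is set to the marginal social value at the efficient effort e*, so the creator's privately optimal effort coincides with e*. The lump sum a then divides surplus, here holding the creator at a reservation utility without affecting the effort choice.

```python
import math
from typing import Tuple

ALPHA = 3.0  # hypothetical scale of the value the AI firm derives from effort


def social_value(e: float) -> float:
    return ALPHA * math.log1p(e)


def cost(e: float) -> float:
    return 0.5 * e * e


def efficient_effort() -> float:
    # First-order condition ALPHA / (1 + e) = e, i.e. e**2 + e - ALPHA = 0 (positive root).
    return (-1.0 + math.sqrt(1.0 + 4.0 * ALPHA)) / 2.0


def design_tariff(reservation_utility: float = 0.0) -> Tuple[float, float]:
    """Return (a, b) for the transfer T(e) = a + b * e."""
    e_star = efficient_effort()
    b = ALPHA / (1.0 + e_star)                              # rate = marginal social value at e*
    a = reservation_utility - (b * e_star - cost(e_star))   # lump sum sets the creator's payoff
    return a, b


def best_response(a: float, b: float, grid: int = 10_000, e_max: float = 5.0) -> float:
    """Creator's effort choice: maximize a + b*e - cost(e) over a grid on [0, e_max]."""
    efforts = (e_max * k / grid for k in range(grid + 1))
    return max(efforts, key=lambda e: a + b * e - cost(e))


a, b = design_tariff()
e_star = efficient_effort()
print(f"a={a:.3f}, b={b:.3f}, e*={e_star:.3f}, chosen effort={best_response(a, b):.3f}")
```

With reservation utility 0 the lump sum comes out negative, meaning the creator pays a fixed fee; raising the reservation utility shifts a up one-for-one while leaving b and the induced effort unchanged.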
The proposed intermediary is a benchmark: it solves the market failures in a full‑information setting but does not yet address informational asymmetries or agency frictions, which are left for future work.

Data & Methods

Modeling approach:
- Static model: Stackelberg game in which an AI firm (buyer/monopsonist) first chooses price(s) for content/quality and individual creators then choose effort/quality. The analysis compares equilibrium effort with socially optimal effort.
- Dynamic model: Continuous‑time framework where the AI firm repeatedly trains on human-generated data; humans in turn use AI to assist creation. The model captures feedback effects between model quality and human effort/heterogeneity over time.
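The static Stackelberg logic can be illustrated with a deliberately minimal sketch. The functional forms are assumptions for illustration, not the paper's specification: a single creator with effort cost e**2 / 2, an AI firm that values each unit of effort at a hypothetical constant ALPHA, and a posted per-unit price w. Solving by backward induction (the creator's best response first, then the firm's price), the firm marks the price down to ALPHA / 2 and effort lands at half the first-best level; a free-for-all corresponds to w = 0, where the toy creator supplies nothing.

```python
ALPHA = 2.0  # hypothetical value to the AI firm of one unit of creator effort


def creator_effort(w: float) -> float:
    """Follower's best response: maximize w*e - e**2/2, giving e = w (floored at 0)."""
    return max(w, 0.0)


def firm_profit(w: float) -> float:
    """Leader's payoff when it posts price w and anticipates the creator's response."""
    e = creator_effort(w)
    return ALPHA * e - w * e


def stackelberg_price(grid: int = 20_000) -> float:
    """Backward induction by grid search over prices in [0, ALPHA]."""
    prices = (ALPHA * k / grid for k in range(grid + 1))
    return max(prices, key=firm_profit)


def welfare(e: float) -> float:
    """Total surplus: value created minus the creator's effort cost."""
    return ALPHA * e - 0.5 * e * e


w_ip = stackelberg_price()                 # strong IP: the firm sets a monopsony price
regimes = {
    "free-for-all": creator_effort(0.0),   # no payment, so no effort in this toy
    "strong IP": creator_effort(w_ip),
    "first best": ALPHA,                   # argmax of welfare(e)
}
for name, e in regimes.items():
    print(f"{name:12s} effort={e:.2f} welfare={welfare(e):.2f}")
```

In this toy the firm's value equals social value, so the gap between strong IP and the first best comes purely from the monopsony markdown; the correlation and originality channels described in the paper are not captured here.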
Key assumptions:
- Creators expend costly effort to produce content; content signals are statistically correlated across creators (substitutability).
- AI firm has market power (monopsony/oligopsony) in acquiring training data.
- Full information in the intermediary benchmark (no asymmetric information or principal‑agent frictions).
- Social planner objective: aggregate surplus, including the value of human production and AI output.
Analytical tools:
- Game-theoretic equilibrium analysis (Stackelberg and continuous‑time dynamics).
- Comparative static analysis to identify originality penalty and underinvestment.
- Cooperative-division ideas (Aumann–Shapley style weighting) to apportion payments among creators in the intermediary solution.
- Formal propositions and proofs provided in the Appendix.
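For a differentiable value function v over creators' efforts x, the textbook Aumann–Shapley share of creator i is x_i times the integral over t in [0, 1] of the partial derivative of v with respect to x_i at t * x. The paper's weights are described only as inspired by this rule, so the sketch below is the standard formula, not the authors' exact scheme. The value function is a hypothetical example in which creators 0 and 1 produce similar content (their efforts are substitutes inside one log term) while creator 2 is novel.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical grouping: creators 0 and 1 make similar content, creator 2 is novel.
CLUSTERS = [[0, 1], [2]]


def data_value(effort: Sequence[float]) -> float:
    """Toy value of a data pool: diminishing returns within each cluster of similar content."""
    return sum(math.log1p(sum(effort[i] for i in c)) for c in CLUSTERS)


def aumann_shapley(v: Callable[[Sequence[float]], float], x: Sequence[float],
                   steps: int = 400, h: float = 1e-6) -> List[float]:
    """Share_i = x_i * integral_0^1 dv/dx_i(t * x) dt.

    The integral uses the midpoint rule; partial derivatives use central differences.
    """
    n = len(x)
    shares = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [t * xi for xi in x]          # a point on the ray from 0 to x
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += h
            down[i] -= h
            grad_i = (v(up) - v(down)) / (2.0 * h)
            shares[i] += x[i] * grad_i / steps
    return shares


effort = [1.0, 1.0, 1.0]
shares = aumann_shapley(data_value, effort)
print([round(s, 3) for s in shares], round(sum(shares), 3), round(data_value(effort), 3))
# [0.549, 0.549, 0.693] 1.792 1.792
```

The shares exhaust v(x) - v(0), and the novel creator earns more per unit of effort than either typical creator, which is the direction needed to offset an originality penalty.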
Literature connections:
- Extends informational‑economics literature on correlated data and data markets (Acemoglu et al., Bergemann et al.).
- Links to ML literature on model collapse and self‑training dynamics (Alemohammad et al., Shumailov et al.).
- Situates results in IP policy and fair-use legal debates and ongoing litigation.

Implications for AI Economics

Policy design:
- Neither blanket fair use nor uncoordinated individual IP bargaining is sufficient; policy should enable compensation mechanisms that internalize cross‑creator externalities and mitigate AI monopsony power.
- Data intermediaries (or collective bargaining platforms) are promising institutional solutions to align incentives between creators and AI firms.
- Regulatory attention should target market structure (monopsony power of AI buyers) and mechanisms for collective payments and attribution.
Market design:
- Markets for training data should reward originality explicitly (reverse the originality penalty) to preserve diversity and innovation in creative inputs.
- Simple two‑part tariffs can implement efficient transfers under full information; practical implementations must also handle informational asymmetries and agency problems.
Dynamic risk management:
- Economic incentives, not only technical fixes, are central to preventing deleterious feedback loops (homogenization → degraded models).
- Platforms and regulators should monitor and counteract incentives that push creators toward homogenized, AI‑assisted outputs (e.g., by subsidizing novel content or limiting overreliance on model outputs in training pipelines).
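A back-of-the-envelope version of the subsidy argument, under purely hypothetical functional forms: suppose human reliance on AI equals model precision p, a subsidy share s offsets part of the pull toward AI-assisted creation so that fresh-data diversity is 1 - kappa * (1 - s) * p, and retraining settles where p = q * diversity. The steady state p* = q / (1 + kappa * (1 - s) * q) then rises with s, which is the sense in which subsidizing novel content can counteract homogenization in this toy.

```python
def steady_state_precision(q: float = 1.0, kappa: float = 0.8, subsidy: float = 0.0) -> float:
    """Fixed point p* of p = q * (1 - kappa * (1 - subsidy) * p) in a hypothetical toy loop."""
    if not 0.0 <= subsidy <= 1.0:
        raise ValueError("subsidy must lie in [0, 1]")
    effective_kappa = kappa * (1.0 - subsidy)   # subsidy dampens the pull toward AI-assisted content
    return q / (1.0 + effective_kappa * q)


for s in (0.0, 0.25, 0.5):
    print(f"subsidy={s:.2f} -> steady-state precision {steady_state_precision(subsidy=s):.3f}")
```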
Research directions:
- Study intermediary designs under asymmetric information and principal–agent constraints.
- Empirically measure content correlation, originality value, and the strength of monopsony power in data markets.
- Explore hybrid legal/regulatory approaches combining clearer property rules, mandated intermediaries, or antitrust remedies to monopsonistic data buying.
- Investigate platform-level mechanisms (recommendation systems, attribution, payment rules) that preserve content heterogeneity and long‑run model health.

Overall, the paper reframes the AI-data policy debate from a binary copyright question to a market‑design problem: effective institutions (not just stronger IP or unfettered access) are required to sustain both technological progress and human creative incentives.

Assessment

Paper Type: theoretical
Evidence Strength: n/a. The paper is purely theoretical and provides no empirical or experimental evidence; conclusions follow from internal model assumptions and analytical derivations rather than observed causal estimates.
Methods Rigor: medium. The work uses standard and appropriate theoretical tools (Stackelberg game, dynamic modeling) and highlights novel mechanisms (originality penalty, curse of precision), but the strength of conclusions depends on key simplifying assumptions (e.g., functional forms, agent rationality, how training feedback is modeled) and lacks robustness checks or empirical calibration in the abstract.
Sample: No empirical sample. The analysis is based on analytical models: (1) a static Stackelberg game between the AI model/trainer and content creators; (2) a dynamic model of repeated training in which human reliance on AI affects future training-data composition. Theoretical agents and payoff structures are specified abstractly rather than estimated from data.
Themes: governance, innovation, human_ai_collab
Identification: Analytical game-theoretic modeling, with a static Stackelberg game to analyze creator incentives under IP regimes and a dynamic model capturing feedback between model performance and human content production; results are derived via equilibrium and comparative-statics analysis (no empirical identification).
Generalizability:
- Results depend on model-specific assumptions (utility/payoff functional forms, information structure, and timing) that may not hold across real markets.
- The abstract representation of AI training and learning dynamics may omit important technical details (e.g., model architecture, data heterogeneity, fine-tuning practices).
- The model ignores many institutional and legal constraints (copyright litigation costs, platform rules, licensing markets) that shape real-world outcomes.
- Agents are assumed rational and homogeneous in some dimensions; real creator heterogeneity and strategic platform behavior may alter predictions.
- There is no empirical calibration, so quantitative magnitudes and policy thresholds may not generalize to specific sectors or jurisdictions.

Claims (6)

Claim	Category	Direction	Confidence	Outcome	Details
A "free-for-all" model based on fair use fails because it does not compensate creators for their contributions.	Wages	negative	high	creator compensation	0.12
A regime of strong intellectual property rights, modeled as a static Stackelberg game, also fails to provide adequate creative incentives (it underpowers creative incentives).	Creativity	negative	high	creative incentives / creator payoff	0.12
More innovative creators are especially harmed under the strong-IP regime, a phenomenon the paper terms the "originality penalty."	Creativity	negative	high	relative payoff/incentive for innovative creators	0.12
In a dynamic model, an initially good AI model induces greater human reliance on AI-assisted creation, which homogenizes content and creates a feedback loop that degrades the model's own performance, a phenomenon termed the "curse of precision."	Output Quality	negative	high	AI model performance (degradation over time due to homogenized training data)	0.12
Even a high-quality initial model can be undermined over time because human creators, relying on the model, produce more homogeneous content that harms subsequent training and lowers model performance.	Output Quality	negative	high	degree of content homogeneity and downstream model performance	0.12
A market design with a data intermediary that internalizes cross-creator externalities and subsidizes innovative contributions can restore efficiency in the content-for-training market.	Market Structure	positive	high	market efficiency / restoration of incentives and model performance	0.02