Neither laissez-faire nor blanket IP protects the future of creative content for AI: permissive access starves creators of compensation, while strong rights blunt originality — and even a high‑quality model can erode itself by causing creators to homogenize their output. A data intermediary that internalizes cross-creator externalities and subsidizes novel contributions can both preserve incentives and improve long-run model quality.
How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.
Summary
Main Finding
The paper shows that the two prevailing policy extremes for sourcing human-created content to train AI—(1) an unrestricted “free-for-all” (fair use) and (2) a strict individual intellectual-property (IP) rights regime—both fail to produce socially desirable outcomes. Free-for-all destroys creators’ incentives. Surprisingly, strict IP also under-provisions original creative effort because AI firms’ monopsonistic buying power and correlation across creators’ output depress prices and impose an “originality penalty” that disproportionately discourages novel creators. In a dynamic extension, a high‑quality model causes humans to rely more on AI, which homogenizes future human-created data and feeds back to degrade model performance—a “curse of precision.” The authors propose a data intermediary that bundles creators, internalizes cross‑creator externalities, and distributes payments (using Aumann–Shapley–inspired weights) via simple two‑part tariffs to restore efficiency (under full information).
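The leakage externality and the pooled split can be illustrated with a minimal Python sketch. This is not the paper's model. The jointly Gaussian content, the covariance matrix `cov`, the value function `coalition_value` (total content variance explained by the purchased data), and the use of discrete Shapley values as a stand-in for Aumann–Shapley weights are all assumptions made for illustration. The sketch does not attempt to reproduce the originality penalty, which depends on the paper's specific setup.

```python
from itertools import permutations

import numpy as np


def coalition_value(cov, members):
    """Assumed value function: total variance of ALL creators' content
    that is explained by observing the content of `members`."""
    idx = list(members)
    if not idx:
        return 0.0
    cross = cov[:, idx]                              # cov(everyone, observed set)
    inner = np.linalg.inv(cov[np.ix_(idx, idx)])     # inverse cov of observed set
    return float(np.trace(cross @ inner @ cross.T))


def last_in_marginals(cov):
    """What each creator adds when the buyer already holds everyone else's data."""
    everyone = list(range(cov.shape[0]))
    total = coalition_value(cov, everyone)
    return np.array([
        total - coalition_value(cov, [j for j in everyone if j != i])
        for i in everyone
    ])


def shapley(cov):
    """Discrete Shapley values: average marginal contribution over all arrival orders."""
    n = cov.shape[0]
    phi = np.zeros(n)
    orders = list(permutations(range(n)))
    for order in orders:
        seen = []
        for i in order:
            phi[i] += coalition_value(cov, seen + [i]) - coalition_value(cov, seen)
            seen.append(i)
    return phi / len(orders)


# Hypothetical example: creators 0 and 1 are highly correlated, creator 2 is independent.
cov = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
marginals = last_in_marginals(cov)
pooled = shapley(cov)
print("last-in marginal contributions:", marginals.round(3))
print("Shapley split of pooled value: ", pooled.round(3))
```

In this toy, each correlated creator's last-in marginal contribution is small because the other's data already reveals most of it, so the marginals sum to less than the pooled value. The Shapley split distributes the full pooled value by construction.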
Key Points
- Free-for-all (blanket fair use): Leads to underinvestment in human-created content because creators are uncompensated while AI outputs substitute for original content.
- Strict IP / individual bargaining: Fails even without transaction costs because:
  - AI firms have monopsony power and optimally buy lower-quality (cheaper) content, reducing creators’ incentives.
  - Correlation across creators means selling one creator’s data partially reveals others’ content, creating negative externalities that depress the marginal price creators receive.
- Originality penalty: Highly novel/innovative creators are undercompensated relatively more than typical creators because AI prefers representative data; scarcity of originality does not command a premium in equilibrium.
- Dynamic "curse of precision":
  - Better models substitute for costly human creativity, increasing human reliance on AI-assisted generation.
  - This increases homogeneity of human-produced data, which is then used for retraining, degrading the model over time, even if fresh human data continues to be supplied.
  - This is a distinct economic pathway to phenomena similar to empirically observed “model collapse.”
- Data intermediary solution:
  - A single intermediary negotiates with the AI firm, pooling creators to overcome monopsony power and internalize information-leakage externalities.
  - Payments allocated according to marginal contributions (weights inspired by Aumann–Shapley) can reverse the originality penalty.
  - A two-part tariff (affine lump-sum and affine per-effort transfers) is sufficient to restore efficiency under full information.
  - The proposed intermediary is a benchmark: it solves the market failures in a full-information setting but does not yet address informational asymmetries or agency frictions, which are left for future work.
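The "curse of precision" loop listed above can be caricatured in a few lines of Python. The paper's dynamic model is in continuous time; the sketch below is a discrete-time toy with linear functional forms, and every name and parameter in it (`simulate_feedback`, `reliance_strength`, `learning_rate`) is a hypothetical assumption rather than something taken from the paper.

```python
def simulate_feedback(q0, reliance_strength, learning_rate=0.2, steps=200):
    """Iterate a toy feedback loop and return the path of model quality q_t in [0, 1].

    Assumed functional forms (illustrative only):
      reliance  = reliance_strength * q   (better model -> more AI-assisted creation)
      diversity = 1 - reliance            (reliance homogenizes human content)
      q_next    = q + learning_rate * (diversity - q)
                                          (retraining pulls quality toward data diversity)
    """
    path = [q0]
    q = q0
    for _ in range(steps):
        reliance = reliance_strength * q
        diversity = 1.0 - reliance
        q = q + learning_rate * (diversity - q)
        path.append(q)
    return path


# Same high-quality starting model, with and without the human-reliance channel.
with_feedback = simulate_feedback(q0=0.95, reliance_strength=1.0)
no_feedback = simulate_feedback(q0=0.95, reliance_strength=0.0)
print(f"with reliance feedback:    {with_feedback[0]:.2f} -> {with_feedback[-1]:.2f}")
print(f"without reliance feedback: {no_feedback[0]:.2f} -> {no_feedback[-1]:.2f}")
```

Under these assumptions the fixed point is `1 / (1 + reliance_strength)`, so a model that starts near the top of the quality range drifts down once reliance is switched on, while the same model improves when the reliance channel is shut off. The point of the toy is only the direction of the feedback, not its magnitude.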
Data & Methods
- Modeling approach:
  - Static model: A Stackelberg game in which an AI firm (the buyer and monopsonist) first chooses price(s) for content/quality and individual creators then choose effort/quality. The analysis compares equilibrium effort with the socially optimal level.
  - Dynamic model: A continuous-time framework in which the AI firm repeatedly trains on human-generated data and humans in turn use AI to assist creation. The model captures feedback effects between model quality and human effort/heterogeneity over time.
- Key assumptions:
  - Creators expend costly effort to produce content; content signals are statistically correlated across creators (substitutability).
  - The AI firm has market power (monopsony/oligopsony) in acquiring training data.
  - Full information in the intermediary benchmark (no asymmetric information or principal-agent frictions).
  - The social planner's objective is aggregate surplus, including the value of human production and of AI output.
- Analytical tools:
  - Game-theoretic equilibrium analysis (Stackelberg and continuous-time dynamics).
  - Comparative static analysis to identify the originality penalty and underinvestment.
  - Cooperative-division ideas (Aumann–Shapley style weighting) to apportion payments among creators in the intermediary solution.
  - Formal propositions and proofs are provided in the Appendix.
- Literature connections:
  - Extends the information-economics literature on correlated data and data markets (Acemoglu et al., Bergemann et al.).
  - Links to the ML literature on model collapse and self-training dynamics (Alemohammad et al., Shumailov et al.).
  - Situates results in IP policy and fair-use legal debates and ongoing litigation.
Implications for AI Economics
- Policy design:
  - Neither blanket fair use nor individual IP bargaining is sufficient; policy should enable compensation mechanisms that internalize cross-creator externalities and mitigate AI monopsony power.
  - Data intermediaries (or collective bargaining platforms) are promising institutional solutions to align incentives between creators and AI firms.
  - Regulatory attention should target market structure (monopsony power of AI buyers) and mechanisms for collective payments and attribution.
- Market design:
  - Markets for training data should reward originality explicitly (reverse the originality penalty) to preserve diversity and innovation in creative inputs.
  - Simple two-part tariffs can implement efficient transfers under full information; practical implementations must also handle informational asymmetries and agency problems.
- Dynamic risk management:
  - Economic incentives, not only technical fixes, are central to preventing deleterious feedback loops (homogenization → degraded models).
  - Platforms and regulators should monitor and counteract incentives that push creators toward homogenized, AI-assisted outputs (e.g., by subsidizing novel content or limiting overreliance on model outputs in training pipelines).
- Research directions:
  - Study intermediary designs under asymmetric information and principal–agent constraints.
  - Empirically measure content correlation, originality value, and the strength of monopsony power in data markets.
  - Explore hybrid legal/regulatory approaches combining clearer property rules, mandated intermediaries, or antitrust remedies to monopsonistic data buying.
  - Investigate platform-level mechanisms (recommendation systems, attribution, payment rules) that preserve content heterogeneity and long-run model health.
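How a two-part tariff can implement efficient effort under full information can be sketched as follows. The example uses a single creator with linear value and quadratic effort cost, which are assumptions for illustration rather than the paper's setup; `two_part_tariff` and `creator_share` are hypothetical names.

```python
def two_part_tariff(value, cost, creator_share):
    """Toy two-part tariff: payment to the creator is lump_sum + rate * effort.

    Assumed primitives (illustrative only):
      value of effort to the buyer: value * e
      creator's effort cost:        0.5 * cost * e**2
    """
    rate = value                      # per-effort transfer set to the marginal value of effort
    effort = rate / cost              # creator's best response; the lump sum does not affect it
    surplus = value * effort - 0.5 * cost * effort**2
    # The lump sum only divides the surplus; a negative value is a fixed fee paid by the creator.
    lump_sum = (creator_share - 1.0) * surplus
    creator_payoff = lump_sum + rate * effort - 0.5 * cost * effort**2
    buyer_payoff = value * effort - lump_sum - rate * effort
    return {
        "rate": rate,
        "lump_sum": lump_sum,
        "effort": effort,
        "surplus": surplus,
        "creator_payoff": creator_payoff,
        "buyer_payoff": buyer_payoff,
    }


result = two_part_tariff(value=1.0, cost=2.0, creator_share=0.5)
for name, amount in result.items():
    print(f"{name}: {amount:.3f}")
```

The design choice is the usual one: the per-effort rate fixes the creator's marginal incentive at the efficient level, and the lump-sum component is then free to split the surplus in any proportion. With asymmetric information the buyer would not know `cost`, which is one reason the full-information result is only a benchmark.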
Overall, the paper reframes the AI-data policy debate from a binary copyright question to a market‑design problem: effective institutions (not just stronger IP or unfettered access) are required to sustain both technological progress and human creative incentives.
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| A "free-for-all" model based on fair use fails because it does not compensate creators for their contributions. [Wages] | negative | high | creator compensation | 0.12 |
| A regime of strong intellectual property rights, modeled as a static Stackelberg game, also fails to provide adequate creative incentives (it underpowers creative incentives). [Creativity] | negative | high | creative incentives / creator payoff | 0.12 |
| More innovative creators are especially harmed under the strong-IP regime, a phenomenon the paper terms the "originality penalty." [Creativity] | negative | high | relative payoff/incentive for innovative creators | 0.12 |
| In a dynamic model, an initially good AI model induces greater human reliance on AI-assisted creation, which homogenizes content and creates a feedback loop that degrades the model's own performance, a phenomenon termed the "curse of precision." [Output Quality] | negative | high | AI model performance (degradation over time due to homogenized training data) | 0.12 |
| Even a high-quality initial model can be undermined over time because human creators, relying on the model, produce more homogeneous content that harms subsequent training and lowers model performance. [Output Quality] | negative | high | degree of content homogeneity and downstream model performance | 0.12 |
| A market design with a data intermediary that internalizes cross-creator externalities and subsidizes innovative contributions can restore efficiency in the content-for-training market. [Market Structure] | positive | high | market efficiency / restoration of incentives and model performance | 0.02 |