Tweak incentives, not the model: paying writers for originality—rather than quality—prevents generative AI from making outputs more alike, because originality payoffs prompt users to lean on the model for brainstorming and targeted edits instead of copying suggestions verbatim.
Generative AI is quickly becoming an integral part of people's everyday workflows. Early evidence has shown that while generative AI can increase individual-level productivity, it does so at the cost of collective diversity, potentially narrowing the set of ideas and perspectives produced. Our research stands in contrast to this concern: through a pre-registered randomized controlled trial, we show that incentives mediate AI's homogenizing force in a creative writing task where participants can use AI interactively. Participants rewarded for originality relative to peers produce collectively more diverse writing than those rewarded for quality alone. This divergence is driven not by abandoning AI, but by how participants use it: those incentivized for originality incorporate fewer AI suggestions verbatim, relying on the model more selectively for brainstorming, proofreading, and targeted edits. Our results reveal that the effects of generative AI depend not only on the technology itself, but also on the behavioral strategies and incentive structures surrounding its use.
Summary
Main Finding
In a pre-registered randomized trial (n = 200), generative AI (GPT5-Mini) tends to homogenize creative outputs when it produces full drafts, but human co-creation substantially offsets that homogenizing force. Crucially, incentive structures matter: rewarding originality (vs. technical quality) increases the diversity of final stories when people have access to AI. This increase in diversity arises not from abandoning the tool but from different strategies of using it—participants incentivized for originality incorporate fewer AI suggestions verbatim and use the model more selectively (brainstorming, proofreading, targeted edits). However, even with originality incentives, AI-assisted outputs remain less diverse than purely human-written stories.
Key Points
- Experimental design: 2 × 2 between-subjects study (AI vs. SELF) × (Originality (O) vs. Technical quality (T)), 50 participants per cell, pre-registered.
- Task: write a 250–350 word short story in a 25-minute session (5-minute brainstorming, 20-minute writing); rich interaction logs (editor snapshots every 5s and full AI transcripts).
- AI usage: participants in the AI condition were required to use the tool at least once. 91% requested at least one full draft; mean conversation turns = 3.57.
- Time use: AI users wrote slightly faster on average (18.6 vs. 20.6 minutes), but differences are modest.
- Diversity measurement: a pre-specified suite of metrics (cosine similarities from multiple embedding models, style embeddings, byte-level embeddings, and n-gram/token-based metrics including compression ratio, BLEU, ROUGE-L, and n-gram diversity score). The participant-level outcome is the leave-one-out average similarity within each cell, i.e. each story's mean similarity to every other story in the same experimental cell.
- Main empirical patterns:
  - AI first drafts are highly similar to each other (homogenizing).
  - Human editing and iterative prompting substantially increase diversity relative to raw AI drafts; final stories move closer to the human-only distribution.
  - Incentivizing originality increases diversity among AI-assisted writers compared to those incentivized for technical quality.
  - The mechanism: originality incentives reduce direct adoption/anchoring on AI drafts (fewer verbatim incorporations); users rely more selectively on the model.
  - Conditional on adopting AI suggestions, participants incentivized for originality spent more time prompting but did not achieve additional diversity gains.
  - Participants with higher prior AI experience tended to rely more on AI suggestions and produced more homogeneous stories.
- Statistical approach: pre-registered hypotheses tested with leave-one-out similarity measures and Welch’s t-tests across multiple similarity/diversity metrics. Results are consistent across metrics and robust to multiple-hypothesis correction (reported as a correlated set of views on latent diversity).
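The leave-one-out outcome described in the key points can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each story has already been mapped to an embedding vector, and the function names (`cosine`, `leave_one_out_similarity`) and the toy 2-D vectors are hypothetical. Lower scores mean a story is less like its peers, so a cell with lower average scores is more diverse.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def leave_one_out_similarity(embeddings):
    """For each story, the mean similarity to all other stories in its cell."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two stories per cell")
    scores = []
    for i, emb in enumerate(embeddings):
        others = [cosine(emb, embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(others) / (n - 1))
    return scores

# Toy "embeddings" for one hypothetical cell (made-up values, not study data):
# two identical stories and one orthogonal story.
cell = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loo = leave_one_out_similarity(cell)
print(loo)  # [0.5, 0.5, 0.0]
```

In the toy cell, the third story shares nothing with the other two, so its score is 0.0, while each of the two identical stories averages one perfect match and one zero match.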
Data & Methods
- Sample: 200 Prolific participants (50 per experimental cell). Attention controls: full-screen-only interface, inactivity checks, minimum time thresholds, post-hoc filtering for low-effort sessions.
- Treatments:
  - AI vs. SELF: AI condition had an integrated LLM assistant (GPT5-Mini); SELF wrote without the tool.
  - Incentives: Originality (bonus for top 25% in originality; $2–7) vs. Technical Quality (rubric-based grade bonuses; no cap; A=$2.5, B=$1).
- Interaction logging: editor snapshots every 5 seconds, full AI conversation transcripts, and manual flagging of AI-produced valid drafts.
- Classification of AI uses (from transcripts): editing requests (38%), drafting (30%), creative consulting (14%), mechanical assistance (3%), others (15%).
- Diversity measurement: ensemble of similarity/diversity metrics (embedding cosine similarities from several models, style embeddings, byte-level embeddings, compression ratio, BLEU, ROUGE-L, n-gram diversity). Outcome is average within-cell similarity (leave-one-out). Hypotheses:
  - H1: AI reduces diversity vs. SELF.
  - H2: Originality incentive increases diversity vs. Technical-quality incentive.
- Inferential tests: two-sample difference-in-means (Welch’s t-test) for each metric; interpret pattern across correlated metrics rather than a single p-value. Pre-registration and analysis plan published on OSF.
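The two-sample comparison named above can be illustrated with a short sketch of Welch's t-test, which does not assume equal variances across groups. The function name `welch_t` and the sample values are hypothetical and are not data from the study; the sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom, and a p-value would then come from a Student's t distribution with that many degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Made-up leave-one-out similarity scores for two hypothetical cells.
technical = [0.62, 0.58, 0.65, 0.60, 0.63]
originality = [0.50, 0.55, 0.48, 0.53, 0.51]
t_stat, dof = welch_t(technical, originality)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Under the summary's design, one such test would be run per metric, with the pattern across correlated metrics interpreted jointly rather than relying on any single p-value.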
Implications for AI Economics
- Incentives shape aggregate outcomes from human-AI co-creation. Market or organizational incentive structures (bonuses, grading, promotion criteria) materially influence whether AI increases productivity at the cost of reduced collective diversity.
- Product differentiation and cultural variety depend not only on model capabilities but on how agents are rewarded and how they strategically use AI. Firms/platforms that reward distinctiveness (e.g., editorial or commissioning incentives emphasizing novelty) can partially mitigate AI-induced homogenization.
- Platform/UI design matters as a policy lever: defaults, templates, or recommendation systems that encourage selective use, iterative editing, or explicit originality signals can change adoption behavior and thus the distribution of outputs.
- Labor-market implications: more experienced AI users may increase homogeneity if their standard practice is heavy reliance on model outputs. This could compress observable stylistic differentiation across workers, affecting reputational signaling, wage dispersion, and returns to creative skill.
- Trade-offs: rewarding originality increases diversity but may involve higher effort, slower productivity, or lower measured “technical quality” under some rubrics. Policymakers and firms should weigh the social value of diversity (innovation externalities, cultural variety) against short-term productivity gains from standardized AI assistance.
- Regulation & platform policy: interventions that change incentives (e.g., disclosure rules, copyright and attribution regimes, reward structures for human-authored originality) could influence the macro-level cultural and economic impacts of generative AI.
- Research gaps for policy and practice: need field experiments with stronger/real-world incentives, longer-term studies of learning and habituation, and exploration of how product markets respond to shifts in diversity (consumer surplus, innovation rates, competition).
Limitations noted by authors: controlled Prolific sample and lab-like payoffs (bonuses) are not perfect proxies for real-world stakes; short-term task frame (one session) does not reveal long-run dynamics; choice of AI model and similarity metrics influences measured effects. The study is pre-registered and transparent about exploratory analyses.
Assessment
Claims (8)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Early evidence has shown that generative AI can increase individual-level productivity. (Organizational Efficiency) | positive | high | individual-level productivity | 0.6 |
| Early evidence suggests generative AI increases productivity but does so at the cost of collective diversity, potentially narrowing the set of ideas and perspectives produced. (Creativity) | negative | high | collective diversity of produced ideas/perspectives | 0.6 |
| Through a pre-registered randomized control trial, we show that incentives mediate AI's homogenizing force in a creative writing task where participants can use AI interactively. (Creativity) | mixed | high | extent to which incentives alter AI's homogenizing effect (mediating effect) | 0.6 |
| Participants rewarded for originality relative to peers produce collectively more diverse writing than those rewarded for quality alone. (Creativity) | positive | high | collective diversity of writing | 0.6 |
| The divergence in collective outputs is not driven by participants abandoning AI, but by how participants use it. (Task Allocation) | null_result | high | continued use of AI (vs. abandonment) | 0.6 |
| Participants incentivized for originality incorporate fewer AI suggestions verbatim. (Task Allocation) | negative | high | rate of verbatim incorporation of AI suggestions | 0.6 |
| Those incentivized for originality rely on the model more selectively for brainstorming, proofreading, and targeted edits. (Task Allocation) | positive | high | types of tasks for which AI is used (brainstorming, proofreading, targeted edits) | 0.6 |
| The effects of generative AI depend not only on the technology itself, but also the behavioral strategies and incentive structures surrounding its use. (Organizational Efficiency) | mixed | high | impact of incentives and strategies on AI outcomes | 0.6 |