Large language models sharpen individual outputs but shrink the pool of distinct ideas: across short stories, marketing slogans, and alternative-uses prompts, three frontier LLMs generate systematically less diverse idea sets than matched human samples, implying higher redundancy costs, though targeted generation protocols can reduce that crowding.
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient $Δ$ and a human-relative diversity ratio $ρ$. We show that $ρ\ge1$ is the no-excess-crowding parity condition and connect $Δ$ to an adoption game with exposure-dependent redundancy costs. Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. Estimates stabilize with feasible model-only sample sizes. Importantly, generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.
Summary
Main Finding
The paper introduces an ex ante, human-relative framework to quantify AI-induced human diversity collapse from model-only and matched unaided human samples. It defines interpretable source-level metrics—an excess-crowding coefficient (∆) and a human-relative diversity ratio (ρ)—that are identifiable from within-distribution comparisons. Theoretical results link these source metrics to a population adoption (congestion) game: if ρ < 1 a model imposes an adoption-dependent externality (higher redundancy cost) on creators; if ρ ≥ 1 the model introduces no excess crowding. Empirically, across three creative tasks (short stories, alternative-uses, marketing slogans) and three frontier LLMs (GPT-5.4, Claude Sonnet 4.5, Gemini 2.5 Flash), neutral model-conditions fall below parity (ρ < 1), indicating positive excess crowding vs. matched human baselines. The paper also shows estimates stabilize with feasible model-only sample sizes and that prompt/protocol interventions (temperature, persona mixtures) can reduce crowding.
Key Points
- Conceptual framing
- Ideas-as-congestible-resources: inspiration sources act like shared resources whose repeated use reduces downstream distinctiveness.
- Human-relative benchmark: contextualize model crowding against task-matched unaided human crowding to avoid conflating task-constrained convergence with model effects.
- Source-level metrics (identifiable from samples)
- κ_{H,k} = E_{h,h'~H_k}[K_k(h,h')] (human pairwise crowding)
- κ_{A,m,k} = E_{a,a'~A_{m,k}}[K_k(a,a')] (model pairwise crowding)
- ∆_{m,k} = max{0, κ_{A,m,k} − κ_{H,k}} (excess-crowding coefficient)
- ρ_{m,k} = (1 − κ_{A,m,k}) / (1 − κ_{H,k}) (human-relative diversity ratio)
- Parity condition: ∆ = 0 ⇔ ρ ≥ 1 (no excess crowding)
- Links to economic/adoption theory
- Redundancy cost (exposure-dependent): C_{m,k}(X_{−i}) = γ_k (1 − exp{−X_{−i} ∆_{m,k}}), where γ_k is the value of distinctiveness and X_{−i} is the number of other adopters.
- Critical-benefit adoption threshold: a creator adopts iff the private AI benefit B_{i,m,k} exceeds the redundancy cost; thus, at a given exposure, lower ρ increases the benefit required for rational adoption.
- Mass-adoption limit: if ρ < 1, the redundancy cost approaches the full distinctiveness penalty γ_k as the number of adopters grows; if ρ ≥ 1, ∆ = 0 and no excess penalty arises at any exposure.
- Empirical findings
- Neutral prompting (T = 1), 50 model-only draws per task-condition: all nine model×task combinations had ρ < 1 under the primary semantic kernel; bootstrap CIs for ρ were below 1.
- Example: GPT-5.4 in the slogan condition showed a large deficit (ρ̂ ≈ 0.179).
- Rarefaction diagnostics: pairwise crowding estimates stabilize with feasible model-only sample sizes, supporting practical development-time evaluation.
- Task-specific kernels (plot-synopsis, concept-bucket, lexical-template) corroborate results across representational levels.
- Generation-protocol variants (temperature sweeps, persona-mixture prompting) can move model-conditions toward parity; crowding is not immutable to prompting/design.
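The source-level metrics and the redundancy-cost expression listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the inputs κ_H and κ_A are taken as already-estimated numbers in [0, 1].

```python
import math


def excess_crowding(kappa_model: float, kappa_human: float) -> float:
    """Excess-crowding coefficient: max{0, kappa_A - kappa_H}."""
    return max(0.0, kappa_model - kappa_human)


def diversity_ratio(kappa_model: float, kappa_human: float) -> float:
    """Human-relative diversity ratio: (1 - kappa_A) / (1 - kappa_H)."""
    if kappa_human >= 1.0:
        # Assumption: the ratio is left undefined when humans are fully crowded.
        raise ValueError("kappa_human must be below 1 for the ratio to be defined")
    return (1.0 - kappa_model) / (1.0 - kappa_human)


def redundancy_cost(gamma: float, exposure: float, delta: float) -> float:
    """Exposure-dependent cost: gamma * (1 - exp(-exposure * delta)).

    gamma    -- value of distinctiveness
    exposure -- number of other adopters (X_{-i})
    delta    -- excess-crowding coefficient
    """
    return gamma * (1.0 - math.exp(-exposure * delta))


def adopts(benefit: float, gamma: float, exposure: float, delta: float) -> bool:
    """A creator adopts iff the private benefit exceeds the redundancy cost."""
    return benefit > redundancy_cost(gamma, exposure, delta)
```

With purely illustrative inputs such as κ_A = 0.8 and κ_H = 0.6, the ratio falls below 1 and the coefficient is positive, so the cost rises with exposure toward γ. With κ_A ≤ κ_H the coefficient is 0 and the cost stays at 0 for any exposure, which is the parity condition stated above.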
Data & Methods
- Tasks and human baselines
- Short stories: 3 compact-fiction prompts from WritingPrompts; 87 human authors (one story each).
- Alternative Uses Task (AUT): socialmuse dataset; 109 human contributors generating 3,047 unaided ideas across five objects (primary uses excluded).
- Smartphone slogans: IRB-approved study with 95 contributors producing 659 slogans (650 unique).
- Each prompt/object/slogan context treated as a task condition k; estimates are computed within-condition and then equally aggregated across conditions.
- Models & generation protocols
- Models: GPT-5.4, Claude Sonnet 4.5, Gemini 2.5 Flash.
- Main protocol: neutral prompting, temperature T = 1.0, 50 independent model-only generations per condition.
- Deployment variants: temperature sweeps and a persona-mixture protocol (2^5 = 32-persona grid based on binary Big Five dimensions).
- Crowding kernels
- Primary kernel (semantic): Ksem(x,y) = (1 + cos(embed(x), embed(y))) / 2, mapping cosine similarity to [0,1].
- Task-specific kernels: plot-synopsis similarity for stories, concept-bucket co-membership for AUT, lexical-template overlap for slogans.
- Same kernel applied to both human and model samples for comparability.
- Estimation procedure
- Matched-sample bootstrap: for each condition, draw b_{m,k} = min(n^H_k, n^A_{m,k}) human units and the same number of model generations with replacement; compute mean off-diagonal pairwise K values to estimate κ_{H,k} and κ_{A,m,k}.
- Compute ∆ and ρ per condition; aggregate equally across conditions in a task family; use percentile bootstrap intervals for uncertainty.
- Participant-aware sampling when humans contributed multiple responses (sample participant, then one response) to avoid domination by prolific contributors.
- Rarefaction curves used to assess finite-sample stability.
- Theoretical results and assumptions
- Independent-exposure and mean-field approximations used to derive adoption-cost expressions; crowding kernel bounded in [0,1].
- Decision-theoretic interpretation requires estimates of γk, adoption probability p, and population size N for translating ∆ into expected costs.
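The semantic kernel and the matched-sample bootstrap described above can be sketched as follows. This is a hedged, standard-library-only illustration: the embedding step is assumed to have happened already (items are plain numeric vectors), `bootstrap_rho` and its parameters are hypothetical names, the matched size is taken as the number of human participants, and the percentile interval uses a simple index rule rather than whatever the authors used.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def k_sem(u: Vector, v: Vector) -> float:
    """Semantic kernel: (1 + cosine similarity) / 2, which lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return (1.0 + cos) / 2.0


def mean_pairwise(items: Sequence, kernel: Callable) -> float:
    """Mean kernel value over all off-diagonal (position-distinct) pairs."""
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items")
    total = sum(kernel(items[i], items[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def bootstrap_rho(human_by_participant: List[List[Vector]],
                  model_items: List[Vector],
                  kernel: Callable = k_sem,
                  n_boot: int = 200,
                  seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Matched-sample bootstrap of rho for one task condition.

    Humans are sampled participant-first (pick a participant, then one of
    their responses) so prolific contributors do not dominate.
    Returns (mean rho, (2.5th, 97.5th) percentile interval).
    """
    rng = random.Random(seed)
    b = min(len(human_by_participant), len(model_items))  # matched sample size
    rhos = []
    for _ in range(n_boot):
        humans = [rng.choice(rng.choice(human_by_participant)) for _ in range(b)]
        models = [rng.choice(model_items) for _ in range(b)]
        kappa_h = mean_pairwise(humans, kernel)
        kappa_a = mean_pairwise(models, kernel)
        if kappa_h < 1.0:  # skip degenerate resamples where rho is undefined
            rhos.append((1.0 - kappa_a) / (1.0 - kappa_h))
    if not rhos:
        raise ValueError("all bootstrap resamples were degenerate")
    rhos.sort()
    lo = rhos[int(0.025 * (len(rhos) - 1))]
    hi = rhos[int(0.975 * (len(rhos) - 1))]
    return sum(rhos) / len(rhos), (lo, hi)
```

To follow the aggregation step, one would call `bootstrap_rho` once per condition and average the per-condition estimates with equal weight within a task family. Re-running it on growing subsets of the model generations gives a rough rarefaction check.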
Implications for AI Economics
- Externalities and adoption dynamics
- Shared use of generative models can produce negative externalities (excess crowding) that reduce private returns to distinctiveness and alter aggregate welfare.
- The framework cleanly separates a model-intrinsic crowding parameter (∆ or ρ) from population context (N, adoption prevalence p) and value of distinctiveness (γ), enabling modular welfare and adoption analyses.
- In markets where distinctiveness is valuable (high γ) and adoption rates are high, models with ρ < 1 raise the private benefit threshold for adoption and can produce large aggregate redundancy costs.
- Policy, platform, and firm strategy applications
- Ex ante auditing: developers and platforms can estimate ∆ and ρ from model-only samples before deployment to audit crowding risk and compare model-conditions.
- Product design & mitigation: generation-protocol choices (e.g., higher temperature, persona mixtures, diversity-promoting decoding) are actionable levers to reduce crowding and move toward parity.
- Pricing and market design: knowledge of crowding externalities could inform subscription pricing, feature segmentation, or differentiation (e.g., offering diversified-generation modes as a premium privacy/uniqueness feature).
- Regulation and standards: ρ and ∆ provide candidate metrics for assessing population-level cultural/creative impacts and could inform guidelines around the deployment of ideation tools in domains where distinctiveness matters (journalism, marketing, patent ideation).
- Research and measurement implications
- Practicality: the method requires only model-only and matched human-only samples and stabilizes with modest sample sizes, making it feasible for routine use in model development cycles.
- Decision support: combining source-level ρ estimates with market parameters (p, N, γ) yields quantitative predictions of expected redundancy costs and critical benefit thresholds for adoption—useful for forecasting adoption and welfare impacts.
- Limitations and caveats relevant to economic interpretation
- Kernel dependence: results depend on the choice of crowding kernel; different representational levels may yield different quantitative conclusions—careful kernel selection is essential for domain-relevant policy.
- Behavioral assumptions: the adoption game uses independent-exposure and mean-field approximations; real-world strategic behavior, network structure, and feedback loops (e.g., personalization, model updates) can complicate dynamics.
- Mapping to realized human outputs: source-level excess crowding is necessary but not sufficient for realized human-AI diversity collapse—users may selectively use or transform model outputs; empirical human–AI interaction data remains necessary to validate realized effects in deployment contexts.
- Value of distinctiveness (γ) and beliefs about others’ adoption (p) are context-dependent and may be hard to estimate; welfare conclusions require domain-specific calibration.
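The decision-support use described above, combining a source-level ∆ with market parameters (p, N, γ), can be illustrated with a small calculation. The sketch below rests on an assumption that is not confirmed by the summary: each of the other N − 1 creators adopts independently with probability p, so exposure is Binomial(N − 1, p). The function names are hypothetical. The mean-field variant simply plugs the expected exposure p(N − 1) into the cost expression.

```python
import math


def _check(p: float, n_population: int) -> None:
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n_population < 1:
        raise ValueError("population must contain at least the focal creator")


def expected_cost_binomial(gamma: float, delta: float,
                           p: float, n_population: int) -> float:
    """E[gamma * (1 - exp(-X * delta))] with X ~ Binomial(N - 1, p).

    Uses the binomial identity E[exp(-X * delta)] = (1 - p + p * exp(-delta))^(N - 1).
    """
    _check(p, n_population)
    others = n_population - 1
    return gamma * (1.0 - (1.0 - p + p * math.exp(-delta)) ** others)


def expected_cost_mean_field(gamma: float, delta: float,
                             p: float, n_population: int) -> float:
    """Mean-field approximation: evaluate the cost at expected exposure p * (N - 1)."""
    _check(p, n_population)
    return gamma * (1.0 - math.exp(-p * (n_population - 1) * delta))


def critical_benefit(gamma: float, delta: float,
                     p: float, n_population: int) -> float:
    """Smallest private benefit at which adoption breaks even in expectation."""
    return expected_cost_binomial(gamma, delta, p, n_population)
```

Because the cost is concave in exposure, the mean-field value is never below the binomial expectation under these assumptions, so it acts as a slightly conservative estimate. Both return 0 when ∆ = 0, consistent with the parity case.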
Overall, the paper provides a usable, theoretically grounded ex ante tool for measuring model-driven crowding risk and links that measurement to adoption incentives and aggregate externalities—offering developers, platforms, and policymakers a concrete pathway to audit and mitigate population-level harms from creative AI.
Assessment
Claims (8)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. [Creativity] | negative | high | loss of value due to similarity (population-level creative value) | 0.08 |
| This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. [Creativity] | negative | high | population-level crowding (diversity collapse) | 0.08 |
| We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. [Creativity] | positive | high | ability to benchmark AI-induced diversity collapse (method performance) | 0.48 |
| By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient Δ and a human-relative diversity ratio ρ. [Creativity] | positive | high | identifiability of source-level crowding; definition of Δ and ρ | 0.48 |
| We show that ρ ≥ 1 is the no-excess-crowding parity condition and connect Δ to an adoption game with exposure-dependent redundancy costs. [Creativity] | neutral | high | parity condition for no-excess-crowding (ρ ≥ 1) and economic/game-theoretic relation of Δ | 0.48 |
| Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. [Creativity] | negative | high | human-relative diversity ratio (ρ) indicating excess crowding | n=3; 0.48 |
| Estimates stabilize with feasible model-only sample sizes. [Creativity] | positive | medium | stability/convergence of crowding estimates as model-only sample size increases | 0.14 |
| Generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI. [Creativity] | positive | medium | change in crowding (Δ or ρ) under generation-protocol variants | 0.29 |