The Commonplace

Large language models sharpen individual outputs but shrink the pool of distinct ideas: across stories, slogans, and alternative-use prompts, three frontier LLMs generate systematically less diverse idea sets than comparable human samples, implying higher redundancy costs, though targeted generation protocols can reduce that crowding.

Ex Ante Evaluation of AI-Induced Idea Diversity Collapse
Nafis Saami Azad, Raiyan Abdul Baten · May 07, 2026
arxiv · quasi_experimental · medium evidence · 7/10 relevance
Using a congestible-resource framework, the paper shows that three frontier LLMs produce less population-level creative diversity than matched human baselines (ρ<1), quantifying excess crowding with an identifiable coefficient Δ and demonstrating that protocol design can mitigate diversity collapse.

Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient $Δ$ and a human-relative diversity ratio $ρ$. We show that $ρ\ge1$ is the no-excess-crowding parity condition and connect $Δ$ to an adoption game with exposure-dependent redundancy costs. Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. Estimates stabilize with feasible model-only sample sizes. Importantly, generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.

Summary

Main Finding

The paper introduces an ex ante, human-relative framework to quantify AI-induced human diversity collapse from model-only and matched unaided human samples. It defines interpretable source-level metrics—an excess-crowding coefficient (∆) and a human-relative diversity ratio (ρ)—that are identifiable from within-distribution comparisons. Theoretical results link these source metrics to a population adoption (congestion) game: if ρ < 1 a model imposes an adoption-dependent externality (higher redundancy cost) on creators; if ρ ≥ 1 the model introduces no excess crowding. Empirically, across three creative tasks (short stories, alternative-uses, marketing slogans) and three frontier LLMs (GPT-5.4, Claude Sonnet 4.5, Gemini 2.5 Flash), neutral model-conditions fall below parity (ρ < 1), indicating positive excess crowding vs. matched human baselines. The paper also shows estimates stabilize with feasible model-only sample sizes and that prompt/protocol interventions (temperature, persona mixtures) can reduce crowding.

Key Points

  • Conceptual framing
    • Ideas-as-congestible-resources: inspiration sources act like shared resources whose repeated use reduces downstream distinctiveness.
    • Human-relative benchmark: contextualize model crowding against task-matched unaided human crowding to avoid conflating task-constrained convergence with model effects.
  • Source-level metrics (identifiable from samples)
    • κ_{H,k} = E_{h,h'~H_k}[K_k(h,h')] (human pairwise crowding)
    • κ_{A,m,k} = E_{a,a'~A_{m,k}}[K_k(a,a')] (model pairwise crowding)
    • ∆_{m,k} = max{0, κ_{A,m,k} − κ_{H,k}} (excess-crowding coefficient)
    • ρ_{m,k} = (1 − κ_{A,m,k}) / (1 − κ_{H,k}) (human-relative diversity ratio)
    • Parity condition: ∆ = 0 ⇔ ρ ≥ 1 (no excess crowding)
  • Links to economic/adoption theory
    • Redundancy cost (exposure-dependent): C_{m,k}(X_{−i}) = γ_k (1 − exp{−X_{−i} ∆_{m,k}}), where γ_k is the value of distinctiveness and X_{−i} is the number of other adopters.
    • Critical-benefit adoption threshold: a creator adopts iff the private AI benefit B_{i,m,k} exceeds the redundancy cost; thus a lower ρ raises the benefit required for rational adoption.
    • Mass-adoption limit: if ρ < 1, the redundancy cost approaches the full distinctiveness penalty γ_k as adoption grows; if ρ ≥ 1, no excess penalty arises at any exposure.
  • Empirical findings
    • Neutral prompting (T = 1), 50 model-only draws per task-condition: all nine model×task combinations had ρ < 1 under the primary semantic kernel; bootstrap CIs for ρ were below 1.
    • Example: GPT-5.4 in the slogan condition showed a large deficit (ρ̂ ≈ 0.179).
    • Rarefaction diagnostics: pairwise crowding estimates stabilize with feasible model-only sample sizes, supporting practical development-time evaluation.
    • Task-specific kernels (plot-synopsis, concept-bucket, lexical-template) corroborate results across representational levels.
    • Generation-protocol variants (temperature sweeps, persona-mixture prompting) can move model-conditions toward parity; crowding is not immutable to prompting/design.
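
The source-level metrics and the redundancy-cost expression listed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the κ values passed in are assumed to have been estimated elsewhere.

```python
import math


def excess_crowding(kappa_a: float, kappa_h: float) -> float:
    """Excess-crowding coefficient: Delta = max{0, kappa_A - kappa_H}."""
    return max(0.0, kappa_a - kappa_h)


def diversity_ratio(kappa_a: float, kappa_h: float) -> float:
    """Human-relative diversity ratio: rho = (1 - kappa_A) / (1 - kappa_H)."""
    if kappa_h >= 1.0:
        # rho is undefined when the human baseline is fully crowded.
        raise ValueError("kappa_H must be below 1")
    return (1.0 - kappa_a) / (1.0 - kappa_h)


def redundancy_cost(x_others: int, delta: float, gamma: float) -> float:
    """Exposure-dependent cost: C(X) = gamma * (1 - exp(-X * Delta))."""
    return gamma * (1.0 - math.exp(-x_others * delta))


def adopts(benefit: float, x_others: int, delta: float, gamma: float) -> bool:
    """A creator adopts iff the private benefit exceeds the redundancy cost."""
    return benefit > redundancy_cost(x_others, delta, gamma)
```

With made-up inputs such as κ_A = 0.75 and κ_H = 0.60, `excess_crowding` is positive and `diversity_ratio` falls below 1, while swapping the two values gives Δ = 0 and ρ ≥ 1, matching the parity condition. When Δ > 0, `redundancy_cost` rises toward γ as the number of other adopters grows; when Δ = 0 it stays at zero for any exposure.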

Data & Methods

  • Tasks and human baselines
    • Short stories: 3 compact-fiction prompts from WritingPrompts; 87 human authors (one story each).
    • Alternative Uses Task (AUT): socialmuse dataset; 109 human contributors generating 3,047 unaided ideas across five objects (primary uses excluded).
    • Smartphone slogans: IRB-approved study with 95 contributors producing 659 slogans (650 unique).
    • Each prompt/object/slogan context treated as a task condition k; estimates are computed within-condition and then equally aggregated across conditions.
  • Models & generation protocols
    • Models: GPT-5.4, Claude Sonnet 4.5, Gemini 2.5 Flash.
    • Main protocol: neutral prompting, temperature T = 1.0, 50 independent model-only generations per condition.
    • Deployment variants: temperature sweeps and a persona-mixture protocol (2^5-persona grid, i.e., 32 personas, based on Big Five binary dimensions).
  • Crowding kernels
    • Primary kernel (semantic): Ksem(x,y) = (1 + cos(embed(x), embed(y))) / 2, mapping cosine similarity to [0,1].
    • Task-specific kernels: plot-synopsis similarity for stories, concept-bucket co-membership for AUT, lexical-template overlap for slogans.
    • Same kernel applied to both human and model samples for comparability.
  • Estimation procedure
    • Matched-sample bootstrap: for each condition, draw b_{m,k} = min(n_{H,k}, n_{A,m,k}) human units and model generations with replacement; compute mean off-diagonal pairwise K values to estimate κ_{H,k} and κ_{A,m,k}.
    • Compute ∆ and ρ per condition; aggregate equally across conditions in a task family; use percentile bootstrap intervals for uncertainty.
    • Participant-aware sampling when humans contributed multiple responses (sample participant, then one response) to avoid domination by prolific contributors.
    • Rarefaction curves used to assess finite-sample stability.
  • Theoretical results and assumptions
    • Independent-exposure and mean-field approximations used to derive adoption-cost expressions; crowding kernel bounded in [0,1].
    • Decision-theoretic interpretation requires estimates of γk, adoption probability p, and population size N for translating ∆ into expected costs.
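
The estimation procedure above can be sketched for a single task condition as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: embeddings are assumed to be precomputed lists of floats (the embedding model is not specified here), every human contributes one response (so participant-aware sampling is omitted), point estimates are taken as bootstrap means, and all function names are hypothetical.

```python
import math
import random
import statistics


def semantic_kernel(u, v):
    """K_sem = (1 + cos(u, v)) / 2, mapping cosine similarity to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))
    return (1.0 + cos) / 2.0


def kernel_matrix(embs):
    """Full pairwise kernel matrix for one source (human or model)."""
    return [[semantic_kernel(u, v) for v in embs] for u in embs]


def mean_offdiag(k, idx):
    """Mean kernel value over ordered off-diagonal pairs of the resample."""
    b = len(idx)
    total = sum(k[i][j] for a, i in enumerate(idx)
                for c, j in enumerate(idx) if a != c)
    return total / (b * (b - 1))


def bootstrap_crowding(human_embs, model_embs, n_boot=1000, seed=0):
    """Matched-sample bootstrap for kappa_H, kappa_A, Delta, and rho."""
    rng = random.Random(seed)
    k_h, k_a = kernel_matrix(human_embs), kernel_matrix(model_embs)
    n_h, n_a = len(human_embs), len(model_embs)
    b = min(n_h, n_a)  # matched resample size
    kap_h, kap_a, rhos = [], [], []
    for _ in range(n_boot):
        ih = [rng.randrange(n_h) for _ in range(b)]  # with replacement
        ia = [rng.randrange(n_a) for _ in range(b)]
        kh, ka = mean_offdiag(k_h, ih), mean_offdiag(k_a, ia)
        kap_h.append(kh)
        kap_a.append(ka)
        rhos.append((1.0 - ka) / (1.0 - kh))
    kh_bar, ka_bar = statistics.fmean(kap_h), statistics.fmean(kap_a)
    cuts = statistics.quantiles(rhos, n=40)  # cut points at 2.5%, ..., 97.5%
    return {
        "kappa_h": kh_bar,
        "kappa_a": ka_bar,
        "delta": max(0.0, ka_bar - kh_bar),
        "rho": (1.0 - ka_bar) / (1.0 - kh_bar),
        "rho_ci": (cuts[0], cuts[-1]),  # 95% percentile interval
    }
```

Because resampling is with replacement, a resample can contain the same item twice; such pairs contribute a kernel value of 1 to both the human and the model estimate. Per-condition results from this function would then be averaged equally across the conditions of a task family.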

Implications for AI Economics

  • Externalities and adoption dynamics
    • Shared use of generative models can produce negative externalities (excess crowding) that reduce private returns to distinctiveness and alter aggregate welfare.
    • The framework cleanly separates a model-intrinsic crowding parameter (∆ or ρ) from population context (N, adoption prevalence p) and value of distinctiveness (γ), enabling modular welfare and adoption analyses.
    • In markets where distinctiveness is valuable (high γ) and adoption rates are high, models with ρ < 1 raise the private benefit threshold for adoption and can produce large aggregate redundancy costs.
  • Policy, platform, and firm strategy applications
    • Ex ante auditing: developers and platforms can estimate ∆ and ρ from model-only samples before deployment to audit crowding risk and compare model-conditions.
    • Product design & mitigation: generation-protocol choices (e.g., higher temperature, persona mixtures, diversity-promoting decoding) are actionable levers to reduce crowding and move toward parity.
    • Pricing and market design: knowledge of crowding externalities could inform subscription pricing, feature segmentation, or differentiation (e.g., offering diversified-generation modes as a premium privacy/uniqueness feature).
    • Regulation and standards: ρ and ∆ provide candidate metrics for assessing population-level cultural/creative impacts and could inform guidelines around the deployment of ideation tools in domains where distinctiveness matters (journalism, marketing, patent ideation).
  • Research and measurement implications
    • Practicality: the method requires only model-only and matched human-only samples and stabilizes with modest sample sizes, making it feasible for routine use in model development cycles.
    • Decision support: combining source-level ρ estimates with market parameters (p, N, γ) yields quantitative predictions of expected redundancy costs and critical benefit thresholds for adoption—useful for forecasting adoption and welfare impacts.
  • Limitations and caveats relevant to economic interpretation
    • Kernel dependence: results depend on the choice of crowding kernel; different representational levels may yield different quantitative conclusions—careful kernel selection is essential for domain-relevant policy.
    • Behavioral assumptions: the adoption game uses independent-exposure and mean-field approximations; real-world strategic behavior, network structure, and feedback loops (e.g., personalization, model updates) can complicate dynamics.
    • Mapping to realized human outputs: source-level excess crowding is necessary but not sufficient for realized human-AI diversity collapse—users may selectively use or transform model outputs; empirical human–AI interaction data remains necessary to validate realized effects in deployment contexts.
    • Value of distinctiveness (γ) and beliefs about others’ adoption (p) are context-dependent and may be hard to estimate; welfare conclusions require domain-specific calibration.
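
To illustrate how a source-level ∆ might be combined with the market parameters (p, N, γ), the sketch below computes an expected redundancy cost in two ways. The binomial exposure model, in which each of the other N − 1 creators adopts independently with probability p, is an assumption made here for illustration; the summary names independent-exposure and mean-field approximations but does not give the paper's exact expressions. Function names and parameter values are hypothetical.

```python
import math


def expected_cost_binomial(delta: float, gamma: float, p: float, n_pop: int) -> float:
    """E[gamma * (1 - exp(-X * Delta))] for X ~ Binomial(n_pop - 1, p).

    Uses the binomial moment generating function:
    E[exp(-Delta * X)] = (1 - p + p * exp(-Delta)) ** (n_pop - 1).
    """
    survive = (1.0 - p * (1.0 - math.exp(-delta))) ** (n_pop - 1)
    return gamma * (1.0 - survive)


def expected_cost_mean_field(delta: float, gamma: float, p: float, n_pop: int) -> float:
    """Mean-field version: plug the expected exposure p * (n_pop - 1) into C(X)."""
    return gamma * (1.0 - math.exp(-delta * p * (n_pop - 1)))


def critical_benefit(delta: float, gamma: float, p: float, n_pop: int) -> float:
    """Benefit a creator needs before adoption pays off in expectation."""
    return expected_cost_binomial(delta, gamma, p, n_pop)
```

Under these assumptions the mean-field value is never smaller than the binomial expectation (by Jensen's inequality, since exp(−∆x) is convex in x), and both are zero when ∆ = 0 and bounded above by γ. The two coincide when p = 1.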

Overall, the paper provides a usable, theoretically grounded ex ante tool for measuring model-driven crowding risk and links that measurement to adoption incentives and aggregate externalities—offering developers, platforms, and policymakers a concrete pathway to audit and mitigate population-level harms from creative AI.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The paper provides a clear theoretical identification argument and applies it empirically across multiple creative tasks and three frontier LLMs, with robustness checks on sample size and generation protocols; however, it does not observe real-world adoption or downstream economic outcomes and rests on structural assumptions (crowding kernels, representativeness of human baselines) that limit causal claims about economic impact.
Methods Rigor: high. The authors derive formal identifiability results, define transparent summary statistics (Δ and ρ), test across multiple tasks and models, report stability with feasible sampling, and explore protocol/prompt variants to probe mechanisms; remaining concerns are explicit assumptions about kernels and external validity rather than weaknesses in implementation or inference.
Sample: Model-only generations from three frontier large language models across three creative tasks (short stories, marketing slogans, alternative-uses), with matched unaided human baseline samples; multiple generation-protocol variants and crowding-kernel specifications used to test robustness; sample sizes reported sufficient for estimator stability.
Themes: innovation adoption
Identification: Models ideas as congestible resources and compares model-only generation distributions to matched unaided human baselines; within-distribution contrasts identify an excess-crowding coefficient (Δ) and a human-relative diversity ratio (ρ) under structural assumptions about crowding kernels, giving an ex ante, model-only estimand for population-level crowding without requiring observed human-AI interaction.
Generalizability:
  • Tasks limited to three creative domains (short stories, slogans, alternative-uses) and may not generalize to technical or domain-specific idea markets
  • Only three frontier LLMs evaluated; results may differ for smaller or future models
  • Model-only generation comparison does not capture real-world adoption dynamics, market incentives, or human-AI interactive workflows
  • Identifiability relies on assumed crowding kernels and on matched human baselines being representative of population creativity
  • Cultural, language, and domain-specific variation not extensively explored

Claims (8)

Claim: Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. [Creativity]
Direction: negative · Confidence: high
Outcome: loss of value due to similarity (population-level creative value)
Details: 0.08

Claim: This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. [Creativity]
Direction: negative · Confidence: high
Outcome: population-level crowding (diversity collapse)
Details: 0.08

Claim: We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. [Creativity]
Direction: positive · Confidence: high
Outcome: ability to benchmark AI-induced diversity collapse (method performance)
Details: 0.48

Claim: By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient Δ and a human-relative diversity ratio ρ. [Creativity]
Direction: positive · Confidence: high
Outcome: identifiability of source-level crowding; definition of Δ and ρ
Details: 0.48

Claim: We show that ρ ≥ 1 is the no-excess-crowding parity condition and connect Δ to an adoption game with exposure-dependent redundancy costs. [Creativity]
Direction: neutral · Confidence: high
Outcome: parity condition for no-excess-crowding (ρ ≥ 1) and economic/game-theoretic relation of Δ
Details: 0.48

Claim: Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. [Creativity]
Direction: negative · Confidence: high
Outcome: human-relative diversity ratio (ρ) indicating excess crowding
Details: n=3; 0.48

Claim: Estimates stabilize with feasible model-only sample sizes. [Creativity]
Direction: positive · Confidence: medium
Outcome: stability/convergence of crowding estimates as model-only sample size increases
Details: 0.14

Claim: Generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI. [Creativity]
Direction: positive · Confidence: medium
Outcome: change in crowding (Δ or ρ) under generation-protocol variants
Details: 0.29

Notes