
A game-theoretic framework finds that firms in competitive federated learning strategically trade off generating synthetic data against strengthening rivals, and that a payoff-redistribution mechanism can restore cooperation and improve collective model performance.

Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning

Thanh Linh Nguyen, Nguyen Van Huynh, Quoc-Viet Pham · April 16, 2026

arXiv · theoretical · low evidence · 8/10 relevance

CoCoGen+ models firms' strategic GenAI-based synthetic data generation in competitive cross-silo federated learning and shows that an equilibrium-aware payoff-redistribution mechanism can align incentives, sustain collaboration, and raise social welfare under non-IID data.

In data-sensitive domains such as healthcare, cross-silo federated learning (CFL) allows organizations to collaboratively train AI models without sharing raw data. However, practical CFL deployments are inherently coopetitive, in which organizations cooperate during model training while competing in downstream markets. In such settings, training contributions, including data volume, quality, and diversity, can improve the global model yet inadvertently strengthen rivals. This dilemma is amplified by non-IID data, which leads to asymmetric learning gains and undermines sustained participation. While existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions, they fail to account for the costs of strengthening competitors. In this paper, we introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision. Specifically, CoCoGen+ formulates each training round as a weighted potential game, where organizations strategically decide how much synthetic data to generate by balancing learning performance gains against computational costs and competition-caused utility losses. We then provide a tractable equilibrium characterization and derive implementable generation strategies to maximize social welfare. To promote long-term collaboration, we integrate a payoff redistribution-based incentive mechanism to compensate organizations for their contributions and competition-caused utility degradation. Experiments on varying learning tasks validate the feasibility of CoCoGen+. The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare, while CoCoGen+ outperforms baselines in efficiency.

Summary

Main Finding

CoCoGen+ is a unified framework that treats GenAI-based synthetic data generation as a strategic decision in coopetitive cross-silo federated learning (CFL). By modeling per-round interactions as a weighted potential game, deriving Nash equilibria via KKT conditions and a convergent fixed-point iteration, and coupling these with a payoff-redistribution incentive, CoCoGen+ internalizes competitive externalities and non‑IID effects. Empirically (Fashion-MNIST, CIFAR-10, CIFAR-100), CoCoGen+ yields higher system-wide social welfare and more efficient generation strategies than baseline methods; its outcomes depend strongly on data heterogeneity, competition intensity, and incentive strength (which can have non‑monotonic, task-dependent effects).

Key Points

Problem context
- Cross-silo FL often involves coopetition: organizations cooperate to train a shared model but compete in downstream markets.
- Non-IID local data produces asymmetric learning gains and can destabilize participation.
- GenAI can mitigate non-IID effects by supplying synthetic data, but generation is costly and also strengthens rivals (a competitive externality).
Main components of CoCoGen+
- Strategic decision: each organization chooses how much synthetic data to generate locally (d_gen_n) each training round.
- Utility specification: organizations trade off learning-performance gains, computational/generation costs, and utility losses from competitors benefiting (competition intensity modeled).
- Game-theoretic formulation: per-round interaction is a weighted potential game → existence and characterization of pure-strategy Nash equilibrium.
- Solution methods: derive equilibrium conditions using KKT and compute strategies via a provably convergent fixed-point iteration.
- Incentive mechanism: payoff redistribution that (i) rewards marginal contributions to the global model and (ii) redistributes payoffs from low-contribution organizations to higher contributors to offset competition-caused losses; mechanism is designed to satisfy individual rationality and budget balance.
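For reference, the standard definition of a weighted potential game is written out below. The notation is an assumption for illustration and is not taken from the paper: d_n stands for organization n's synthetic-data amount d_gen_n, d_{-n} for the other organizations' amounts, u_n for organization n's utility, Φ for the potential function, and w_n for a positive weight.

```latex
u_n(d_n', d_{-n}) - u_n(d_n, d_{-n})
  = w_n \left[ \Phi(d_n', d_{-n}) - \Phi(d_n, d_{-n}) \right],
\qquad w_n > 0, \quad \forall n,\ \forall d_n, d_n', d_{-n}.
```

Every unilateral change in utility is a positive multiple of the change in Φ, so any strategy profile that maximizes Φ is a pure-strategy Nash equilibrium. This is why the formulation yields existence of a pure-strategy equilibrium.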
Empirical findings
- CoCoGen+ outperforms baselines on social welfare across datasets of increasing task complexity.
- Stronger market competition and milder data heterogeneity tend to stimulate more synthetic-data generation and improve welfare.
- The strength of payoff redistribution interacts with task complexity and heterogeneity in a non-monotonic way—effective incentives must be co-tuned to task characteristics.
- Incorporating redistribution helps sustain participation and mitigates free-riding and competitive externalities.

Data & Methods

System model
- N organizations indexed by n; each holds a local dataset D_loc_n and may add synthetic data D_gen_n, giving a mixed dataset D_mix_n = D_loc_n ∪ D_gen_n of size d_mix_n = d_loc_n + d_gen_n.
- Training proceeds in rounds; organizations send model updates to a central server (cross-silo CFL setting).
- Utilities depend on training loss improvements (learning gains), generation/computation costs, and negative externalities from competitors gaining in downstream market share.
Game & solution
- Per-round strategic interaction framed as a weighted potential game over synthetic-data amounts {d_gen_n}.
- Nash equilibrium characterized analytically; KKT conditions used for tractable solutions.
- A fixed-point iteration algorithm is proved to converge to implementable equilibrium strategies.
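The equilibrium computation above can be illustrated with a minimal sketch. It is not the paper's algorithm or utility model. It assumes a hypothetical utility u_n = b_n * log(1 + D) - 0.5 * c_n * d_gen_n^2, where D is the total data across all organizations, b_n (`net_gain`) is an assumed net weight (learning benefit minus competition-caused loss), and c_n (`cost`) is an assumed generation-cost coefficient. With all b_n > 0, this toy game is a weighted potential game with weights b_n, and the KKT stationarity condition b_n / (1 + D) = c_n * d_gen_n gives a closed-form best response that the loop applies to one organization at a time until the strategies stop changing.

```python
import math


def best_response(net_gain, cost, others_total):
    """Maximize net_gain*log(1 + others_total + d) - 0.5*cost*d**2 over d >= 0.

    others_total is all data except this organization's own synthetic data.
    """
    if net_gain <= 0.0:
        return 0.0  # KKT: the non-negativity constraint binds, so generate nothing
    k = 1.0 + others_total
    # Positive root of cost*d**2 + cost*k*d - net_gain = 0 (stationarity condition)
    return 0.5 * (-k + math.sqrt(k * k + 4.0 * net_gain / cost))


def solve_equilibrium(d_loc, net_gain, cost, tol=1e-10, max_iter=1000):
    """Sequential best-response (fixed-point) iteration over synthetic-data amounts."""
    d_gen = [0.0] * len(d_loc)
    total_loc = sum(d_loc)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(len(d_gen)):
            others = total_loc + sum(d_gen) - d_gen[i]
            new = best_response(net_gain[i], cost[i], others)
            max_change = max(max_change, abs(new - d_gen[i]))
            d_gen[i] = new
        if max_change < tol:
            break  # no organization wants to change its strategy
    return d_gen


def utility(i, d_gen, d_loc, net_gain, cost):
    """Hypothetical utility of organization i under the assumed functional form."""
    total = sum(d_loc) + sum(d_gen)
    return net_gain[i] * math.log(1.0 + total) - 0.5 * cost[i] * d_gen[i] ** 2


# Illustrative values only (data sizes in arbitrary units)
d_loc = [1.0, 2.0, 1.5]
net_gain = [1.0, 0.8, 0.6]
cost = [0.5, 0.4, 0.6]
d_star = solve_equilibrium(d_loc, net_gain, cost)
print([round(d, 4) for d in d_star])
```

Sequential updates are used here because exact coordinate-wise maximization of a strictly concave potential converges. The paper's own iteration and convergence proof may differ.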
Incentive design
- Payoff-redistribution mechanism: compensates organizations for marginal contributions and redistributes from lower to higher contributors to internalize competitive externalities.
- Mechanism constraints: individual rationality (each participant is at least as well off joining as abstaining) and budget balance (transfers net to zero, so no external subsidy is required).
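The two mechanism constraints can be checked on a toy transfer rule. The rule below is a hypothetical illustration, not the paper's mechanism: each organization receives `strength` times the gap between its contribution score and the average score, so transfers flow from below-average to above-average contributors and sum to zero by construction. The names `redistribute`, `strength`, and `outside_options` are assumptions introduced for this sketch.

```python
def redistribute(utilities, contributions, strength):
    """Zero-sum transfers proportional to deviation from the mean contribution."""
    mean_c = sum(contributions) / len(contributions)
    transfers = [strength * (c - mean_c) for c in contributions]
    payoffs = [u + t for u, t in zip(utilities, transfers)]
    return transfers, payoffs


def is_budget_balanced(transfers, tol=1e-12):
    """Budget balance: transfers net to zero, so no external subsidy is needed."""
    return abs(sum(transfers)) < tol


def is_individually_rational(payoffs, outside_options):
    """Individual rationality: nobody does worse than their outside option."""
    return all(p >= o for p, o in zip(payoffs, outside_options))


# Illustrative values only
utilities = [2.0, 1.5, 1.0]        # pre-transfer utilities
contributions = [0.5, 0.3, 0.1]    # assumed contribution scores
outside_options = [0.5, 0.5, 0.5]  # assumed utility from not participating

transfers, payoffs = redistribute(utilities, contributions, strength=1.0)
print(is_budget_balanced(transfers), is_individually_rational(payoffs, outside_options))
```

In this rule, budget balance holds for any `strength`, but individual rationality does not: a large enough `strength` pushes low contributors below their outside option. That illustrates why the redistribution strength has to be tuned rather than maximized.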
Experiments
- Datasets: Fashion-MNIST, CIFAR-10, CIFAR-100 (increasing task complexity).
- Metrics: social welfare (sum of organization utilities), synthetic-data generation amounts, equilibrium strategies, sensitivity to heterogeneity, competition intensity, and redistribution strength.
- Baseline comparisons: standard CFL strategies and prior incentive/augmentation schemes (details in paper).

Implications for AI Economics

Coopetition matters: In federated settings where participants are market rivals, contribution incentives must account for negative competitive externalities. Classical marginal-contribution rewards can be insufficient and may discourage beneficial investments (e.g., costly GenAI data generation).
Designing incentives for public-good vs. competitive spillovers: Redistribution-style mechanisms that reallocate surplus from lower to higher contributors can internalize externalities and sustain participation without external subsidies (budget-balanced), but they must be calibrated to task complexity and data heterogeneity to avoid perverse effects.
GenAI as a strategic economic resource: Organizations will treat synthetic-data generation as an economic decision (costs vs. rival-strengthening). Policy or consortium rules that subsidize or tax generation, or that shape redistribution, can materially change equilibrium behaviors and welfare.
Heterogeneity and market intensity guide optimal policy: Mild heterogeneity and stronger competition can paradoxically increase cooperative investments (more synthetic data) and improve welfare; hence mechanism design should consider the empirical distribution of data heterogeneity and market competition intensity.
Practical deployment considerations: For regulated, data-sensitive sectors (healthcare, finance), CoCoGen+ suggests a pathway to ethically and economically viable pooled model training—provided there is a trustworthy coordination entity and well-designed, budget-balanced incentives.
Future economic research directions: quantifying welfare trade-offs under alternative redistribution rules, extending to dynamic multi-round markets (long-term strategic behavior), and integrating asymmetric access to external generative services or heterogeneous generation costs.

Assessment

Paper Type: theoretical
Evidence Strength: low. Findings rely on a formal theoretical model and simulation experiments rather than empirical measurement in real-world cross-silo deployments, so claims about real organizational behavior and causal impacts on market outcomes are not directly validated.
Methods Rigor: medium. The paper provides a tractable equilibrium characterization and implements an incentive mechanism with comparative simulations, suggesting solid analytical work; however, rigor is limited by modeling assumptions (e.g., utility specifications, agent rationality, synthetic data fidelity) and lack of empirical validation.
Sample: Simulated cross-silo federated learning environments with multiple organizations holding non-IID data partitions; GenAI-based synthetic data generation is endogenized and evaluated across several learning tasks in experiments comparing CoCoGen+ to baseline incentive/generation strategies (specific datasets and real-world deployment details not provided in the abstract).
Themes: org_design, governance
Identification: Game-theoretic model that frames each training round as a weighted potential game in which organizations choose synthetic data generation levels; equilibrium characterization and mechanism design (payoff redistribution) are used to analyze incentives and welfare. No causal identification from observational or experimental real-world data.
Generalizability
- Simulated experiments may not capture real-world organizational behavior, legal/regulatory constraints, or frictions in deployments.
- The model assumes fully rational agents and specific utility/cost functional forms that may not match firms' preferences.
- Results depend on the fidelity of synthetic data and on assumptions about how synthetic data affects model performance.
- Competition structure and market impacts are abstracted away; applicability to diverse sectors (beyond healthcare-like settings) is uncertain.
- Scalability and privacy/ethical constraints of GenAI-generated data in practice are not empirically tested.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
Cross-silo federated learning (CFL) deployments in data-sensitive domains are inherently coopetitive: organizations cooperate during model training while competing in downstream markets, so training contributions can inadvertently strengthen rivals.	Adoption Rate	negative	high	strengthening_of_rivals / participation incentives	0.12
Non-IID data amplifies this coopetition dilemma by producing asymmetric learning gains across organizations and undermining sustained participation.	Adoption Rate	negative	high	asymmetry_in_learning_gains / sustained_participation	0.12
Existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions but fail to account for the costs of strengthening competitors.	Governance And Regulation	negative	high	adequacy_of_incentive_design (accounting for competitor-strengthening costs)	0.12
We introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision.	Task Allocation	positive	high	strategic_data_generation_decisions	0.2
CoCoGen+ formulates each training round as a weighted potential game in which organizations strategically decide how much synthetic data to generate by balancing learning performance gains against computational costs and competition-caused utility losses.	Task Allocation	neutral	high	synthetic_data_generation_quantity (strategy)	0.2
We provide a tractable equilibrium characterization of the game and derive implementable synthetic-data generation strategies that maximize social welfare.	Organizational Efficiency	positive	high	social_welfare / efficiency_of_strategies	0.12
To promote long-term collaboration, CoCoGen+ integrates a payoff-redistribution-based incentive mechanism to compensate organizations for their contributions and competition-caused utility degradation.	Governance And Regulation	positive	high	compensation_for_contributions / mitigation_of_utility_loss	0.12
Experiments on varying learning tasks validate the feasibility of CoCoGen+.	Organizational Efficiency	positive	high	feasibility_of_framework (empirical validation)	0.12
The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare.	Organizational Efficiency	mixed	high	organizational_strategies / social_welfare	0.12
CoCoGen+ outperforms baselines in efficiency.	Organizational Efficiency	positive	high	efficiency (presumably social welfare or utility per cost)	0.12