A sampling algorithm can reliably find broadly acceptable AI-generated statements while protecting minority blocking rights, and its sample requirements are provably optimal; matching lower bounds show no algorithm can achieve the same guarantees with fewer user queries.
We study the problem of selecting a statement that finds common ground across diverse population preferences. Generative AI is uniquely suited for this task because it can access a practically infinite set of statements, but AI systems like the Habermas machine leave the choice of generated statement to a voting rule. What it means for this rule to find common ground, however, is not well-defined. In this work, we propose a formal model for finding common ground in the infinite alternative setting based on the proportional veto core from social choice. To provide guarantees relative to these infinitely many alternatives and a large population, we wish to satisfy a notion of proportional veto core using only query access to the unknown distribution of alternatives and voters. We design an efficient sampling-based algorithm that returns an alternative in the (approximate) proportional veto core with high probability and prove matching lower bounds, which show that no algorithm can do the same using fewer queries. On a synthetic dataset of preferences over text, we confirm the effectiveness of our sampling-based algorithm and compare other social choice methods as well as LLM-based methods in terms of how reliably they produce statements in the proportional veto core.
Summary
Main Finding
The paper formalizes “finding common ground” over an infinite set of candidate statements using the proportional veto core, a social choice concept that gives each group of voters blocking power proportional to its size. It gives an efficient sampling-based algorithm that, using only query access to an unknown distribution of voters and alternatives, returns an alternative in the (approximate) proportional veto core with high probability. The authors prove matching lower bounds showing that no algorithm can achieve the same guarantee with fewer queries, and they validate the approach on a synthetic text-preference dataset while comparing it with other social-choice and LLM-based methods.
Key Points
- Problem: pick a single statement that represents common ground when the space of possible statements (alternatives) is effectively infinite and voter preferences are unknown.
- Solution concept: the proportional veto core, a social choice concept in which any group of voters can block alternatives in proportion to its size (protecting minority interests while seeking broad acceptability).
- Access model: only query (sampling) access to the unknown joint distribution of voters and alternatives (i.e., you can draw samples of voters’ preferences over sampled alternatives).
- Algorithmic contribution: a sampling-based algorithm that, with high probability, returns an alternative in the approximate proportional veto core using a number of queries that the authors upper-bound.
- Optimality: matching lower bounds are proved, showing no algorithm can guarantee the same with fewer queries (sample-optimality).
- Empirics: experiments on a synthetic dataset of text preferences show that the algorithm reliably finds alternatives in the proportional veto core; the authors also compare its performance against standard social-choice rules and LLM-based heuristics.
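For orientation, the classical finite-case definition of the proportional veto core (usually attributed to Moulin) can be written as below. This is background recalled from the social choice literature, not a formula quoted from the paper; the symbols $n$, $m$, $v(T)$ and $\succ_i$ are notation chosen here, and the paper's infinite-alternative and approximate versions may be stated differently.

```latex
% Finite case: voter set N with |N| = n, alternative set A with |A| = m,
% and a strict ranking \succ_i for each voter i.

% Veto power of a coalition T: how many alternatives T may rule out.
v(T) = \left\lceil \frac{m\,|T|}{n} \right\rceil - 1

% T blocks a if some set B is unanimously preferred to a by T,
% and the complement of B is small enough for T to veto.
T \text{ blocks } a \iff \exists\, B \subseteq A :\;
  \bigl( b \succ_i a \;\; \forall i \in T,\ \forall b \in B \bigr)
  \ \text{ and } \ m - |B| \le v(T)

% The core is the set of alternatives that no coalition blocks.
\mathrm{PVC} = \{\, a \in A : \text{no nonempty } T \subseteq N \text{ blocks } a \,\}
```

Under this definition a coalition's blocking power grows with its share of the population, which is the sense in which minority groups are protected.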
Data & Methods
- Formal model:
  - Infinite alternative space; voters have preferences over alternatives drawn from an unknown distribution.
  - Formalization of the proportional veto core in this infinite setting and a definition of an approximate proportional veto core appropriate for sampling.
- Query/sampling framework:
  - Algorithms may sample voters and candidate alternatives to elicit preference information; no full knowledge of the distributions is assumed.
- Algorithm:
  - A concrete sampling procedure that identifies a candidate alternative meeting the approximate proportional veto-core condition with high probability. (Paper gives the algorithmic steps and proves correctness.)
- Theoretical analysis:
  - Upper bounds on the number of samples/queries needed as a function of accuracy/confidence and population/alternative-space parameters.
  - Lower bounds proving those sample-complexity rates are tight.
- Experiments:
  - Synthetic dataset modeling preferences over text statements (to simulate generative-AI outputs and heterogeneous voter tastes).
  - Baselines: classical social-choice rules (e.g., plurality, median-like rules) and LLM-based selection heuristics.
  - Evaluation: frequency with which methods produce outcomes in the proportional veto core and robustness to population heterogeneity.
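The sample-then-select idea described in this list can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the classical finite-case veto power `ceil(m * |T| / n) - 1`, a toy one-dimensional preference model, and a brute-force core check that is exponential in the number of sampled voters. The names `veto_power`, `in_veto_core`, `query_ranking`, and `sample_and_select` are hypothetical.

```python
import random
from itertools import combinations


def veto_power(coalition_size, n_voters, n_alts):
    """Finite-case veto power, ceil(m * |T| / n) - 1, in integer arithmetic."""
    return (n_alts * coalition_size + n_voters - 1) // n_voters - 1


def in_veto_core(candidate, rankings):
    """Return True if no coalition of voters blocks `candidate`.

    `rankings` holds one list per voter, ordering the same alternatives
    from most to least preferred. Brute force over all coalitions.
    """
    n, m = len(rankings), len(rankings[0])
    # For each voter, the alternatives ranked strictly above the candidate.
    better = [set(r[: r.index(candidate)]) for r in rankings]
    for size in range(1, n + 1):
        # A coalition blocks if it unanimously prefers at least this many.
        needed = m - veto_power(size, n, m)
        for coalition in combinations(range(n), size):
            common = set.intersection(*(better[i] for i in coalition))
            if len(common) >= needed:
                return False
    return True


def query_ranking(voter, alts):
    """Toy preference query (assumption): a voter is an ideal point in
    [0, 1] and prefers statements whose position is closer to it."""
    return sorted(alts, key=lambda a: abs(a - voter))


def sample_and_select(n_voters, n_alts, rng):
    """Draw voters and alternatives, then return an alternative in the
    veto core of the sampled instance (None if the check finds none)."""
    alts = [rng.random() for _ in range(n_alts)]
    voters = [rng.random() for _ in range(n_voters)]
    rankings = [query_ranking(v, alts) for v in voters]
    for a in alts:
        if in_veto_core(a, rankings):
            return a
    return None


# Two voters with opposite rankings: each can veto one alternative,
# so only the shared middle option survives.
toy = [["a", "b", "c"], ["c", "b", "a"]]
toy_core = [x for x in "abc" if in_veto_core(x, toy)]
print(toy_core)  # ['b']

choice = sample_and_select(n_voters=8, n_alts=6, rng=random.Random(0))
print(choice)
```

The sample sizes here are arbitrary; how many voters and alternatives must be drawn for a given accuracy and confidence is exactly what the paper's upper and lower bounds address.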
Implications for AI Economics
- Generative-AI-enabled choice sets: When AI can propose essentially unlimited alternatives (statements, messages, product descriptions), the binding cost is not generating candidates but gathering the information needed to aggregate preferences. This paper characterizes that information cost through sample-complexity upper bounds and matching lower bounds.
- Efficient elicitation: Sampling-based aggregation gives a principled, sample-efficient way to find broadly acceptable outputs without exhaustively presenting or evaluating all AI-generated options—important for scalable human-AI coordination and platform design.
- Representation and fairness trade-offs: The proportional veto core encodes a formal protection for minority blocs (proportional blocking power). Using it helps align algorithmic content or policy selection with proportional fairness norms rather than simple majoritarian outcomes.
- Mechanism design and deployment: Results inform how many user interactions (or how much preference feedback) platforms need to collect to reliably produce consensual or non-controversial AI-generated statements; the lower bounds set fundamental limits on minimally required feedback.
- Practical adoption caveats:
  - The model assumes truthful, exogenous preferences and sampling access; strategic behavior, manipulation, or costly reporting could change the information requirements.
  - Empirical validation is on synthetic text-preference data; field testing with real user populations and richer preference models remains necessary.
- Research directions for AI economics:
  - Integrate strategic reporting and incentive design to ensure truthful feedback under sampling constraints.
  - Extend to dynamic settings where statements and preferences co-evolve (e.g., opinion formation or repeated interactions).
  - Conduct cost-benefit analyses that weigh sampling cost against social-welfare or platform objectives when deploying generative-AI-driven common-ground selection.
Assessment
Claims (12)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| The paper formalizes the proportional veto core for settings with an infinite alternative space and voters whose preferences are drawn from an unknown distribution. | Research Productivity | positive | high | formal definition / existence of an appropriate approximate proportional veto-core concept for infinite alternatives | 0.2 |
| Under only query (sampling) access to the unknown joint distribution of voters and alternatives, there is an efficient sampling-based algorithm that, with high probability, returns an alternative in the approximate proportional veto core. | Research Productivity | positive | high | probability that the algorithm's output lies in the approximate proportional veto core | high probability (unspecified); 0.2 |
| The authors prove an upper bound on the number of samples/queries required by their algorithm as a function of accuracy, confidence, and problem parameters. | Research Productivity | positive | high | sample/query complexity required for the algorithm to achieve specified accuracy and confidence | 0.2 |
| Matching information-theoretic lower bounds are proved, establishing that no algorithm can guarantee finding an (approximate) proportional veto-core element with fewer queries than the stated bounds (i.e., the sample complexity is optimal). | Research Productivity | negative | high | information-theoretic lower bound on sample/query complexity (optimality claim) | 0.2 |
| On a synthetic text-preference dataset, the proposed algorithm reliably finds alternatives that lie in the proportional veto core. | Research Productivity | positive | medium | frequency/proportion of experimental trials producing outcomes in the proportional veto core | 0.12 |
| The authors compare their sampling algorithm against classical social-choice rules and LLM-based heuristics and report superior core-attainment frequency for their method. | Research Productivity | positive | medium | relative frequency/proportion of outputs that lie in the proportional veto core across methods | 0.12 |
| The proposed algorithm's performance is robust to heterogeneous populations in the synthetic experiments (i.e., it continues to find core alternatives under varying degrees of population heterogeneity). | Research Productivity | positive | medium | frequency/proportion of core outcomes as a function of population heterogeneity | 0.12 |
| The paper characterizes the information cost of aggregating preferences when AI can generate essentially unlimited candidate alternatives by providing tight sample-complexity bounds and lower bounds. | Other | positive | high | sample/query complexity as the measure of information cost | 0.2 |
| Using the proportional veto core provides formal protection for minority blocs by giving them proportional blocking power, thus encoding a proportional fairness guarantee compared to simple majoritarian rules. | Other | positive | high | existence of proportional blocking power / protection for minority groups as formalized by the core definition | 0.2 |
| The theoretical results (algorithms and sample-complexity bounds) assume truthful, exogenous preferences and simple sampling access; strategic behavior or costly reporting could change the information requirements. | Other | negative | high | applicability limitations given model assumptions (truthful sampling access vs. strategic reporting) | 0.2 |
| The empirical validation is performed only on synthetic text-preference data rather than real-world user populations, so field deployment effects and richer preference models remain to be tested. | Other | negative | high | scope of empirical validation (synthetic dataset vs. real-world data) | 0.2 |
| The paper suggests (as future work) integrating incentive design for truthful reporting and extending the model to dynamic settings where statements and preferences co-evolve. | Research Productivity | speculative | medium | research agenda items (proposed extensions, not empirically measured outcomes) | 0.12 |