A sampling algorithm can reliably find broadly acceptable AI-generated statements while protecting minority blocking rights, and its sample requirements are provably optimal; matching lower bounds show no algorithm can achieve the same guarantees with fewer user queries.

Finding Common Ground in a Sea of Alternatives

Jay Chooi, Paul Gölz, Ariel D. Procaccia, Benjamin Schiffer, Shirley Zhang · March 17, 2026

arXiv · theoretical · high evidence · 7/10 relevance

The paper gives a sampling-based algorithm that, with provable correctness and information-theoretically optimal sample complexity, returns an alternative in the (approximate) proportional veto core for an infinite alternative space using only query access, and validates this approach on synthetic text-preference data.

We study the problem of selecting a statement that finds common ground across diverse population preferences. Generative AI is uniquely suited for this task because it can access a practically infinite set of statements, but AI systems like the Habermas machine leave the choice of generated statement to a voting rule. What it means for this rule to find common ground, however, is not well-defined. In this work, we propose a formal model for finding common ground in the infinite alternative setting based on the proportional veto core from social choice. To provide guarantees relative to these infinitely many alternatives and a large population, we wish to satisfy a notion of proportional veto core using only query access to the unknown distribution of alternatives and voters. We design an efficient sampling-based algorithm that returns an alternative in the (approximate) proportional veto core with high probability and prove matching lower bounds, which show that no algorithm can do the same using fewer queries. On a synthetic dataset of preferences over text, we confirm the effectiveness of our sampling-based algorithm and compare other social choice methods as well as LLM-based methods in terms of how reliably they produce statements in the proportional veto core.

Summary

Main Finding

The paper formalizes “finding common ground” over an infinite set of candidate statements using the proportional veto core (a social-choice concept that gives every coalition of voters veto power proportional to its size). It gives an efficient, sampling-based algorithm that, using only query access to an unknown distribution of voters and alternatives, returns an alternative in the (approximate) proportional veto core with high probability. The authors prove matching lower bounds showing the sample complexity is information-theoretically optimal, and they validate the approach on a synthetic text-preference dataset, comparing it against other social-choice and LLM-based methods.

Key Points

Problem: pick a single statement that represents common ground when the space of possible statements (alternatives) is effectively infinite and voter preferences are unknown.
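Solution concept: the proportional veto core, which gives every coalition of voters veto power proportional to its size. Roughly, in the finite formulation, an alternative x is blocked if a coalition containing a fraction α of the voters can name a set of alternatives, more than a 1 − α fraction of all alternatives, that every coalition member prefers to x; the core consists of the unblocked alternatives. This protects minority interests while still seeking broad acceptability.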
Access model: only query (sampling) access to the unknown joint distribution of voters and alternatives (i.e., you can draw samples of voters’ preferences over sampled alternatives).
Algorithmic contribution: a sampling-based algorithm that, with high probability, returns an alternative in the approximate proportional veto core, with a proven upper bound on the number of queries it needs.
Optimality: matching lower bounds are proved, showing no algorithm can guarantee the same with fewer queries (sample-optimality).
Empirics: experiments on a synthetic dataset of text preferences show that the algorithm reliably finds alternatives in the proportional veto core; the authors also compare its performance against standard social-choice rules and LLM-based heuristics.
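To make the finite version of the proportional veto core concrete, here is a minimal brute-force sketch. It assumes a small finite profile (each voter's full ranking over alternatives numbered 0 to m − 1, best first) and uses the standard finite blocking condition: alternative x is blocked if some set B of other alternatives, together with the coalition T of voters who rank every member of B above x, satisfies |T|/n + |B|/m > 1. This is an illustrative check, not the paper's infinite-setting definition or its algorithm; the function names are hypothetical, and enumerating subsets is only practical for a handful of alternatives.

```python
from itertools import combinations


def is_blocked(x, rankings, m):
    """Brute-force test of the finite proportional veto core condition.

    Hypothetical helper for illustration. rankings: one list per voter,
    alternatives 0..m-1 ordered best first. x is blocked if some set B of
    other alternatives and the coalition T of voters who rank every member
    of B above x satisfy |T|/n + |B|/m > 1.
    """
    n = len(rankings)
    # For each voter, the alternatives she ranks strictly above x.
    above = [set(r[: r.index(x)]) for r in rankings]
    others = [a for a in range(m) if a != x]
    for size in range(1, m):
        for B in combinations(others, size):
            # Largest coalition that unanimously prefers all of B to x;
            # checking only this maximal coalition suffices.
            t = sum(1 for s in above if s.issuperset(B))
            # Integer form of t/n + size/m > 1, avoiding floating point.
            if t * m + size * n > n * m:
                return True
    return False


def proportional_veto_core(rankings, m):
    """All alternatives not blocked by any coalition (exponential in m)."""
    return [x for x in range(m) if not is_blocked(x, rankings, m)]
```

For example, with rankings [[0, 1, 2], [1, 0, 2], [2, 0, 1]], alternative 2 is blocked because two of the three voters rank both 0 and 1 above it (2/3 + 2/3 > 1), so `proportional_veto_core` returns [0, 1].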

Data & Methods

Formal model:
- Infinite alternative space; voters have preferences over alternatives drawn from an unknown distribution.
- Formalization of the proportional veto core in this infinite setting and a definition of an approximate proportional veto core appropriate for sampling.
Query/sampling framework:
- Algorithms may sample voters and candidate alternatives to elicit preference information; no full knowledge of the distributions is assumed.
Algorithm:
- A concrete sampling procedure that identifies a candidate alternative meeting the approximate proportional veto-core condition with high probability. (The paper gives the algorithmic steps and proves correctness.)
Theoretical analysis:
- Upper bounds on the number of samples/queries needed as a function of accuracy/confidence and population/alternative-space parameters.
- Lower bounds proving those sample-complexity rates are tight.
Experiments:
- Synthetic dataset modeling preferences over text statements (to simulate generative-AI outputs and heterogeneous voter tastes).
- Baselines: classical social-choice rules (e.g., plurality, median-like rules) and LLM-based selection heuristics.
- Evaluation: frequency with which methods produce outcomes in the proportional veto core and robustness to population heterogeneity.
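The summary does not spell out the paper's procedure, so the following is only a minimal sketch of what a sampling-based pipeline can look like, under loudly stated assumptions: draw a finite set of alternatives and voters through query oracles, elicit each sampled voter's ranking over the sampled alternatives, and run a finite veto-core rule on that sample. The oracles `sample_alternative` and `sample_voter`, the wrapper `select_common_ground`, and the one-dimensional opinion-spectrum preference model are all hypothetical, and the sample sizes are free parameters rather than the paper's bounds. The finite rule is Moulin's veto by consumption, a standard method for picking an alternative in the finite proportional veto core; whether the paper uses this rule is not stated here.

```python
import random
from fractions import Fraction


def veto_by_consumption(rankings, m):
    """Veto by consumption on a finite profile.

    rankings: one list per voter, alternatives 0..m-1 ordered best first.
    Every alternative starts with the same capacity; every voter "eats" at
    unit speed from her least-preferred alternative not yet fully eaten.
    Returns an alternative exhausted last (smallest index on ties).
    """
    n = len(rankings)
    remaining = {a: Fraction(n, m) for a in range(m)}  # exact arithmetic
    alive = set(range(m))
    while True:
        # Count how many voters are currently eating each alive alternative.
        rate = {a: 0 for a in alive}
        for r in rankings:
            worst = next(a for a in reversed(r) if a in alive)
            rate[worst] += 1
        # Advance time until the next alternative is fully consumed.
        dt = min(remaining[a] / rate[a] for a in alive if rate[a] > 0)
        for a in alive:
            remaining[a] -= rate[a] * dt
        finished = [a for a in alive if remaining[a] == 0]
        alive.difference_update(finished)
        if not alive:
            return min(finished)


# --- Hypothetical query oracles (assumptions, not the paper's data model) ---
def sample_alternative(rng):
    # Toy model: a generated statement is a point on a 1-D opinion spectrum.
    return rng.random()


def sample_voter(rng):
    # Toy model: a voter is an ideal point; nearer statements are preferred.
    return rng.gauss(0.5, 0.2)


def select_common_ground(num_alternatives, num_voters, seed=0):
    """Hypothetical pipeline: sample, elicit rankings, run veto by consumption."""
    rng = random.Random(seed)
    alternatives = [sample_alternative(rng) for _ in range(num_alternatives)]
    voters = [sample_voter(rng) for _ in range(num_voters)]
    # Each sampled voter ranks the sampled alternatives, closest first.
    rankings = [
        sorted(range(num_alternatives), key=lambda a: abs(alternatives[a] - v))
        for v in voters
    ]
    winner = veto_by_consumption(rankings, num_alternatives)
    return alternatives[winner]
```

On the three-voter profile [[0, 1, 2], [1, 0, 2], [2, 0, 1]], alternative 2 is exhausted first, then 1, and `veto_by_consumption` returns 0. Exact `Fraction` arithmetic is used so that alternatives exhausted at the same instant are detected as ties rather than separated by floating-point error.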

Implications for AI Economics

Generative-AI-enabled choice sets: When AI can propose essentially unlimited alternatives (statements, messages, product descriptions), the bottleneck is no longer the size of the candidate space but the information needed to aggregate preferences over it. This paper characterizes that information cost precisely through sample-complexity upper bounds and matching lower bounds.
Efficient elicitation: Sampling-based aggregation gives a principled, sample-efficient way to find broadly acceptable outputs without exhaustively presenting or evaluating all AI-generated options—important for scalable human-AI coordination and platform design.
Representation and fairness trade-offs: The proportional veto core encodes a formal protection for minority blocs (proportional blocking power). Using it helps align algorithmic content or policy selection with proportional fairness norms rather than simple majoritarian outcomes.
Mechanism design and deployment: Results inform how many user interactions (or how much preference feedback) platforms need to collect to reliably produce consensual or non-controversial AI-generated statements; the lower bounds establish the minimum amount of feedback that any method needs to achieve the same guarantee.
Practical adoption caveats:
- The model assumes truthful, exogenous preferences and sampling access; strategic behavior, manipulation, or costly reporting could change the information requirements.
- Empirical validation is on synthetic text-preference data; field testing with real user populations and richer preference models remains necessary.
Research directions for AI economics:
- Integrate strategic reporting and incentive design to ensure truthful feedback under sampling constraints.
- Extend to dynamic settings where statements and preferences co-evolve (e.g., opinion formation or repeated interactions).
- Cost-benefit analyses that trade the sampling cost against social-welfare or platform objectives when deploying generative-AI-driven common-ground selection.
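The paper's actual bounds are not reproduced in this summary. As a generic illustration of why feedback requirements grow with the desired accuracy ε and confidence 1 − δ, the sketch below computes the standard two-sided Hoeffding sample size for estimating a single fraction, such as the share of voters who prefer one statement to another. It is not the paper's sample-complexity bound, and `hoeffding_sample_size` is a hypothetical helper name.

```python
import math


def hoeffding_sample_size(eps, delta):
    """Samples needed so an empirical fraction is within eps of the true
    fraction with probability at least 1 - delta (two-sided Hoeffding):
    2 * exp(-2 * n * eps**2) <= delta  <=>  n >= ln(2 / delta) / (2 * eps**2).
    Generic illustration only; NOT the paper's sample-complexity bound.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


for eps in (0.1, 0.05):
    print(eps, hoeffding_sample_size(eps, delta=0.05))
```

With δ = 0.05 it returns 185 samples for ε = 0.1 and 738 for ε = 0.05: halving the tolerated error roughly quadruples the required queries, while tightening δ only adds a logarithmic term.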

Assessment

Paper Type: theoretical
Evidence Strength: high. Main claims are supported by formal definitions and rigorous proofs (existence/approximation results, an algorithm with upper bounds, and matching information-theoretic lower bounds), and the algorithm is validated on synthetic text-preference experiments; theoretical completeness and tight lower bounds give strong evidence for the paper's core claims, although empirical validation is limited to simulations.
Methods Rigor: high. The paper provides a precise formal model for an infinite alternative space, proves correctness and sample-complexity upper bounds for the proposed algorithm, and proves matching lower bounds showing optimality; experiments compare to reasonable baselines on synthetic data to illustrate empirical behavior.
Sample: Formal setting: a population of voters and an effectively infinite alternative space, with only query/sampling access to the joint distribution of voters and alternatives. Empirical evaluation: a synthetic text-preference dataset simulating heterogeneous voter tastes over AI-generated statements, with baselines including classical social-choice rules and LLM-based heuristics.
Themes: human_ai_collab, org_design
Generalizability:
- Empirical results are on synthetic text-preference data and may not reflect the behavior of real user populations.
- Assumes truthful, exogenous preferences and unrestricted sampling access to voter–alternative pairs (no strategic reporting or costly elicitation).
- Static preference model; does not handle dynamic or endogenous preference formation.
- Assumes alternatives can be sampled from the space; performance may differ when candidates are produced adaptively by models or users.
- Validation limited to text-statement preferences; other domains (images, products) may pose different challenges.

Claims (12)

Claim	Category	Direction	Confidence	Outcome	Details
The paper formalizes the proportional veto core for settings with an infinite alternative space and voters whose preferences are drawn from an unknown distribution.	Research Productivity	positive	high	formal definition / existence of an appropriate approximate proportional veto-core concept for infinite alternatives	0.2
Under only query (sampling) access to the unknown joint distribution of voters and alternatives, there is an efficient sampling-based algorithm that, with high probability, returns an alternative in the approximate proportional veto core.	Research Productivity	positive	high	probability that the algorithm's output lies in the approximate proportional veto core	high probability (unspecified) 0.2
The authors prove an upper bound on the number of samples/queries required by their algorithm as a function of accuracy, confidence, and problem parameters.	Research Productivity	positive	high	sample/query complexity required for the algorithm to achieve specified accuracy and confidence	0.2
Matching information-theoretic lower bounds are proved, establishing that no algorithm can guarantee finding an (approximate) proportional veto-core element with fewer queries than the stated bounds (i.e., the sample complexity is optimal).	Research Productivity	negative	high	information-theoretic lower bound on sample/query complexity (optimality claim)	0.2
On a synthetic text-preference dataset, the proposed algorithm reliably finds alternatives that lie in the proportional veto core.	Research Productivity	positive	medium	frequency/proportion of experimental trials producing outcomes in the proportional veto core	0.12
The authors compare their sampling algorithm against classical social-choice rules and LLM-based heuristics and report superior core-attainment frequency for their method.	Research Productivity	positive	medium	relative frequency/proportion of outputs that lie in the proportional veto core across methods	0.12
The proposed algorithm's performance is robust to heterogeneous populations in the synthetic experiments (i.e., it continues to find core alternatives under varying degrees of population heterogeneity).	Research Productivity	positive	medium	frequency/proportion of core outcomes as a function of population heterogeneity	0.12
The paper characterizes the information cost of aggregating preferences when AI can generate essentially unlimited candidate alternatives by providing tight sample-complexity bounds and lower bounds.	Other	positive	high	sample/query complexity as the measure of information cost	0.2
Using the proportional veto core provides formal protection for minority blocs by giving them proportional blocking power, thus encoding a proportional fairness guarantee compared to simple majoritarian rules.	Other	positive	high	existence of proportional blocking power / protection for minority groups as formalized by the core definition	0.2
The theoretical results (algorithms and sample-complexity bounds) assume truthful, exogenous preferences and simple sampling access; strategic behavior or costly reporting could change the information requirements.	Other	negative	high	applicability limitations given model assumptions (truthful sampling access vs. strategic reporting)	0.2
The empirical validation is performed only on synthetic text-preference data rather than real-world user populations, so field deployment effects and richer preference models remain to be tested.	Other	negative	high	scope of empirical validation (synthetic dataset vs. real-world data)	0.2
The paper suggests (as future work) integrating incentive design for truthful reporting and extending the model to dynamic settings where statements and preferences co-evolve.	Research Productivity	speculative	medium	research agenda items (proposed extensions, not empirically measured outcomes)	0.12