Large language models replicate South Asian caste hierarchies in matchmaking: same-caste profiles score up to 25% higher, and inter-caste pairings are ranked according to traditional status across GPT, Gemini, Llama, Qwen, and BharatGPT—raising risks that AI-mediated matchmaking could reinforce historical exclusion.
Social and personal decisions in relational domains such as matchmaking are deeply entwined with cultural norms and historical hierarchies, and can be shaped by algorithmic and AI-mediated assessments of compatibility, acceptance, and stability. In South Asian contexts, caste remains a central aspect of marital decision-making, yet little is known about how contemporary large language models (LLMs) reproduce or disrupt caste-based stratification in such settings. In this work, we conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles. We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets, and evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT). Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility. Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably, with average ratings up to 25% higher (on a 10-point scale) than inter-caste matches, which are further ordered according to traditional caste hierarchy. These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains, where such systems risk reinforcing historical forms of exclusion.
Summary
Main Finding
LLMs used to evaluate matrimonial profiles reproduce and amplify existing caste hierarchies. In a controlled audit using real matrimonial profiles (Shaadi.com), five LLM families (GPT, Gemini, Llama, Qwen, BharatGPT) rated same‑caste matches most favorably and produced a graded ordering of cross‑caste matches that aligns with traditional caste hierarchy (Brahmin → Kshatriya → Vaishya → Shudra → Dalit). Average ratings for same‑caste matches were up to ~25% higher on a 10‑point scale than inter‑caste matches. These patterns were consistent across model families and three social evaluation dimensions (social acceptance, marital stability, cultural compatibility).
Key Points
- Audit setting: matchmaking as a socially and culturally grounded domain where algorithmic judgments matter for relational outcomes.
- Treatment variables: caste (Brahmin, Kshatriya, Vaishya, Shudra, Dalit) and income (five buckets) systematically varied across real anonymized matrimonial profiles.
- Evaluation targets: models asked to rate profiles on social acceptance, marital stability, and cultural compatibility (numeric ratings).
- Models tested: five state‑of‑the‑art LLM families — GPT, Gemini, Llama, Qwen, BharatGPT — including proprietary, open, and regional models.
- Main empirical result: robust same‑caste preference (endogamy signal) and hierarchical ordering across castes, with upper‑caste candidates rated higher and lower‑caste candidates rated lower, holding across models and evaluation dimensions.
- Magnitude: same‑caste advantages up to ~25% on a 10‑point scale (a 25% relative gap means, e.g., a same‑caste pairing rated 8.0 where an otherwise identical inter‑caste pairing is rated 6.4); clear graded disparities rather than only binary discrimination.
- Framing of harm: identifies “relational and hierarchical bias” — not merely representational stereotypes but structured ranking that shapes who is deemed acceptable or stable as a partner.
- Ethics & privacy: used anonymized secondary data; team engaged reflexively and noted positionality and the need for culturally grounded interpretation.
Data & Methods
- Data source: real matrimonial profiles originally from Shaadi.com, anonymized and used as a seed dataset.
- Experimental design: controlled audit in which caste and income labels were systematically varied while all other profile content was held constant to isolate caste effects (a generation sketch follows this list).
- Caste categories: five major caste groups (Brahmin, Kshatriya, Vaishya, Shudra, Dalit).
- Income variation: five income buckets included to assess intersectional effects with socioeconomic status.
- Models: evaluated five LLM families (GPT, Gemini, Llama, Qwen, BharatGPT).
- Prompting & tasks: models prompted to rate candidate pairings on three dimensions (social acceptance, marital stability, cultural compatibility), producing numeric scores on a 10‑point scale; a prompt-and-parsing sketch follows this list.
- Analysis: regression‑based statistical analysis to estimate the effect of caste (and income) on ratings and to test for ordered/hierarchical patterns across caste labels (a regression sketch follows this list).
- Robustness checks: comparative analysis across model families and dimensions showed consistent hierarchical patterns; details on standard errors, covariates, and model specification are in the paper (full methods/appendix).
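The generation step can be pictured as a small factorial expansion. Below is a minimal Python sketch, assuming a dict-based profile representation; the caste list follows the paper, while the income bucket labels and field names are illustrative placeholders, not taken from the paper.

```python
from itertools import product

CASTES = ["Brahmin", "Kshatriya", "Vaishya", "Shudra", "Dalit"]
# Bucket labels are hypothetical; the paper specifies five buckets but not their names.
INCOME_BUCKETS = ["bucket_1", "bucket_2", "bucket_3", "bucket_4", "bucket_5"]

def make_variants(base_profile: dict) -> list[dict]:
    """Expand one anonymized seed profile into a 5x5 grid of caste x income
    variants, holding every other field constant to isolate treatment effects."""
    variants = []
    for caste, income in product(CASTES, INCOME_BUCKETS):
        v = dict(base_profile)  # shallow copy keeps all non-treatment fields fixed
        v["caste"], v["income"] = caste, income
        variants.append(v)
    return variants
```

The rating task is then a templated prompt plus numeric extraction. This is an illustrative sketch only; the paper's exact prompt wording is not reproduced, so the template and parser below are assumptions.

```python
import re

DIMENSIONS = ["social acceptance", "marital stability", "cultural compatibility"]

# Hypothetical prompt shape for the three-dimension rating task.
PROMPT_TEMPLATE = (
    "Consider a proposed match between the two matrimonial profiles below.\n"
    "Profile A: {profile_a}\n"
    "Profile B: {profile_b}\n"
    "On a scale of 1 to 10, rate the pairing's {dimension}. "
    "Reply with the number only."
)

def parse_rating(response: str) -> float | None:
    """Extract the first number from a model reply; None if unparsable."""
    m = re.search(r"\d+(?:\.\d+)?", response)
    return float(m.group()) if m else None
```

For the analysis stage, a regression of roughly this shape would estimate caste effects. The column names and the clustering choice are our assumptions; the paper's exact specification (covariates, standard errors) may differ. A sketch using statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

def caste_effects(df: pd.DataFrame):
    """OLS of rating on caste and income dummies, controlling for model family
    and evaluation dimension; Brahmin is the reference level, so caste
    coefficients read as gaps relative to the top of the traditional hierarchy.
    Standard errors are clustered by seed profile to account for reuse."""
    return smf.ols(
        "rating ~ C(caste, Treatment('Brahmin')) + C(income_bucket)"
        " + C(model_family) + C(dimension)",
        data=df,
    ).fit(cov_type="cluster", cov_kwds={"groups": df["profile_id"]})
```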
Implications for AI Economics
- Market outcomes and matching efficiency
  - Algorithmic reproduction of caste preferences can structurally segment matchmaking markets, reinforcing assortative matching by caste and reducing cross‑caste matches even where broader matching would raise social welfare.
  - Biased evaluations distort the information set available to users (or firms), lowering aggregate matching efficiency, crowding certain candidate pools, and leaving surplus unrealized among lower‑rated groups.
- Distributional and welfare effects
  - Systematic downgrading of lower‑status groups (e.g., Dalits) imposes welfare losses and opportunity costs, compounding historical disadvantage and potentially reducing the labor and social mobility that is intertwined with marriage markets.
  - The perceived objectivity of AI lends legitimacy to discriminatory outcomes, making exclusionary equilibria more persistent.
- Platform incentives, competition, and regulation
  - Platforms that deploy LLM evaluators gain market power to shape norms; if users trust AI assessments, platform design choices (whether to expose caste, to allow caste filters, or to surface AI scores) will materially affect demand and revenues.
  - There are reputational, legal, and regulatory risks for platforms whose AI perpetuates discriminatory outcomes; these create economic incentives to audit, disclose, or redesign algorithms.
- Policy and intervention economics
  - Cost–benefit analyses are needed for mitigation strategies (data curation, constrained optimization, post‑hoc calibration, human oversight, hiding caste attributes). Mitigation has direct costs (development, reduced model utility) and social benefits (reduced discrimination, higher aggregate welfare).
  - Market design interventions (e.g., removing caste filters, anonymizing caste in initial matching, or subsidizing diversity-promoting features) could be evaluated with field experiments to quantify effects on matches and welfare.
- Measurement and metric design
  - Standard fairness metrics (parity, calibration) may be insufficient; culturally grounded metrics are needed to capture hierarchical and relational harms such as ordered disparities, endogamy bias, and intergroup ranking effects (see the metric sketch after this list).
  - Dynamic feedbacks: AI‑shaped norms can create path dependence, since models trained on platform data shaped by prior biased matches will perpetuate and amplify that bias; this necessitates longitudinal audits and interventions.
- Research agenda for AI economics
  - Quantify welfare losses from algorithmic caste bias in matching markets and estimate efficiency gains from corrective policies.
  - Study equilibrium responses: how users, families, and competing platforms adapt to AI evaluators (e.g., signal inflation, strategic labeling).
  - Compare costs and effectiveness of technical (model‑level) versus design/regulatory (platform‑level) mitigation strategies in field deployments.
  - Investigate cross‑domain spillovers: similar hierarchical biases in other relational markets (housing, lending, hiring networks) could produce broader economic externalities.
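To make "ordered disparities" and "endogamy bias" concrete, here is a minimal Python sketch of two hierarchy-aware metrics: an endogamy gap (same-caste minus cross-caste mean rating) and a rank correlation between mean ratings and traditional caste rank. Column names (caste_a, caste_b, rating) and the rank coding (1 = highest traditional status) are illustrative assumptions, not the paper's definitions.

```python
import pandas as pd
from scipy.stats import kendalltau

HIERARCHY_RANK = {"Brahmin": 1, "Kshatriya": 2, "Vaishya": 3, "Shudra": 4, "Dalit": 5}

def endogamy_gap(df: pd.DataFrame) -> float:
    """Mean rating advantage of same-caste over cross-caste pairings."""
    same = df.loc[df["caste_a"] == df["caste_b"], "rating"].mean()
    cross = df.loc[df["caste_a"] != df["caste_b"], "rating"].mean()
    return same - cross

def hierarchy_alignment(df: pd.DataFrame) -> float:
    """Kendall's tau between candidates' traditional rank and their mean
    rating. Tau near -1 means ratings fall as the rank number rises,
    i.e. the model's ordering closely tracks the traditional hierarchy."""
    means = df.groupby("caste_b")["rating"].mean()
    tau, _ = kendalltau([HIERARCHY_RANK[c] for c in means.index], means.values)
    return tau
```

Unlike binary parity metrics, both quantities are sensitive to the graded ordering the audit found: a model could pass demographic parity on average while still exhibiting a large endogamy gap and near-perfect hierarchy alignment.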
Takeaway: LLMs encode and reproduce culturally specific hierarchical biases that have measurable economic consequences for matching markets, equity, and welfare. Addressing these harms requires combining culturally informed auditing, platform‑level design choices, regulation, and economic evaluation of mitigation tradeoffs.
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles. | Other | other | high | presence of caste bias in LLM-mediated matchmaking evaluations | 0.18 |
| We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets. | Other | other | high | manipulation of profile attributes (caste, income) | 0.18 |
| We evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT). | Other | other | high | model set evaluated | 0.18 |
| Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility. | Decision Quality | other | high | ratings for social acceptance, marital stability, cultural compatibility | 0.18 |
| Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably. | Decision Quality | positive | high | favorability ratings for same-caste vs inter-caste matches | 0.18 |
| Average ratings [for same-caste matches were] up to 25% higher (on a 10-point scale) than inter-caste matches. | Decision Quality | positive | high | average rating on a 10-point scale | "up to 25% higher (on a 10-point scale)"; 0.18 |
| Inter-caste matches are further ordered according to traditional caste hierarchy. | Inequality | negative | high | ordinal rating/order of inter-caste matches by caste | 0.18 |
| These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains. | Inequality | negative | high | risk of reinforcing historical exclusion through LLM decision-making | 0.03 |