Unobserved counterfactuals in sequential decision systems systematically under-expose marginalized groups and can amplify exclusion; explicitly modelling model and feedback uncertainty and using uncertainty-aware exploration can reduce outcome variance for disadvantaged groups while preserving the institution's expected utility.

Fairness under uncertainty in sequential decisions

Michelle Seng Ah Lee, Kirtan Padh, David Watson, Niki Kilbertus, Jatinder Singh · April 23, 2026

arXiv · theoretical · low evidence · 7/10 relevance

The paper formalizes how model, feedback, and prediction uncertainty in sequential decision systems can unevenly harm under-represented groups, and shows via RL-formulated models and simulations that uncertainty-aware exploration can reduce disparities while preserving institutional objectives.

Fair machine learning (ML) methods help identify and mitigate the risk that algorithms encode or automate social injustices. Algorithmic approaches alone cannot resolve structural inequalities, but they can support socio-technical decision systems by surfacing discriminatory biases, clarifying trade-offs, and enabling governance. Although fairness is well studied in supervised learning, many real ML applications are online and sequential, with prior decisions informing future ones. Each decision is taken under uncertainty due to unobserved counterfactuals and finite samples, with dire consequences for under-represented groups, who are systematically under-observed due to historical exclusion and selective feedback. A bank cannot know whether a denied loan would have been repaid, and may have less data on marginalized populations. This paper introduces a taxonomy of uncertainty in sequential decision-making (model, feedback, and prediction uncertainty), providing a shared vocabulary for assessing systems where uncertainty is unevenly distributed across groups. We formalize model and feedback uncertainty via counterfactual logic and reinforcement learning, and illustrate the harms that policies ignoring the unobserved space cause to decision makers (unrealized gains/losses) and to subjects (compounding exclusion, reduced access). Algorithmic examples show it is possible to reduce outcome variance for disadvantaged groups while preserving institutional objectives (e.g. expected utility). Experiments on data simulated with varying bias show how unequal uncertainty and selective feedback produce disparities, and how uncertainty-aware exploration alters fairness metrics. The framework equips practitioners to diagnose, audit, and govern fairness risks. Where uncertainty drives unfairness rather than incidental noise, accounting for it is essential to fair and effective decision-making.
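The loan example can be written in potential-outcome notation. The block below is one common way to formalize selective labels, assuming a binary approve/deny decision D_i, a binary repayment outcome Y_i(1) (whether applicant i would repay if approved), features x_i, and group g_i; the symbols are illustrative and are not the paper's own notation.

```latex
% D_i = 1 if applicant i is approved; Y_i(1) = 1 if i would repay when approved
\begin{aligned}
Y_i^{\mathrm{obs}} &=
  \begin{cases}
    Y_i(1) & \text{if } D_i = 1, \\
    \text{unobserved} & \text{if } D_i = 0,
  \end{cases} \\
\mathcal{D}_g &= \{ (x_i, Y_i(1)) : g_i = g,\ D_i = 1 \}, \qquad n_g = |\mathcal{D}_g|.
\end{aligned}
```

Under this formulation, any estimate of P(Y(1) = 1 | x, g) for group g can learn only from the n_g approved applicants, so a group that was rarely approved in the past has both little data and no record of how its denied applicants would have behaved.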

Summary

Main Finding

Unequal uncertainty in sequential (online or reinforcement-learning) decision systems, especially uneven epistemic and selective-feedback uncertainty across groups, systematically compounds disparities. Explicitly accounting for uncertainty (e.g., uncertainty-aware exploration that targets high-uncertainty subgroups) can reduce variance in outcomes for historically disadvantaged groups and shift observed fairness metrics without necessarily sacrificing the decision-maker’s expected utility. The paper provides a taxonomy and a formal framework for diagnosing where uncertainty arises and how it drives fairness harms in sequential settings.

Key Points

Taxonomy: The authors introduce a lifecycle-based taxonomy of uncertainty in sequential decision systems, distinguishing global uncertainties (systemic, e.g., model and data generation processes) from local uncertainties (individual or subgroup-level, e.g., prediction and feedback uncertainty). Three categories recur throughout the paper: model uncertainty, feedback uncertainty, and prediction uncertainty.
Unequal uncertainty matters: Marginalized or historically excluded groups often face higher epistemic uncertainty (less data, selective observation), which inflates the risk attributed to them and can lead to systematically worse decisions (e.g., more loan denials).
Selective feedback / selective labels: When outcomes are only observed conditional on past decisions (e.g., only seeing repayment behavior for approved loans), feedback uncertainty interacts with representation gaps to create self-reinforcing exclusionary dynamics.
Formalization: The paper formalizes model and feedback uncertainty using counterfactual logic and reinforcement-learning (bandit/RL) techniques to show how naively ignoring unobserved counterfactuals biases policies and outcomes.
Illustrative mechanism: The authors propose and simulate targeted, uncertainty-proportional exploration (increasing the probability of a favorable action when prediction uncertainty is high) as a principled alternative to naive policies. Because it conditions on uncertainty rather than on group membership, it is distinct from group-based preference rules such as affirmative action.
Simulation results: Experiments on synthetic datasets with controlled degrees of bias show that unequal uncertainty and selective feedback produce group disparities, and that uncertainty-aware exploration changes fairness metrics relative to naive policies. The paper's algorithmic examples show that it is possible to reduce outcome variance for disadvantaged groups while retaining institutional objectives such as expected utility.
Governance & legal considerations: RL-style exploration introduces stochasticity that raises legal, reputational, and operational concerns (e.g., individual fairness, non-discrimination law, and acceptability of randomness). The taxonomy is positioned as a diagnostic tool for auditing and governance.
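The uncertainty-proportional exploration idea can be sketched as a single decision rule. This is a minimal illustration, not the paper's algorithm: it assumes each subgroup's repayment rate has a Beta(alpha, beta) posterior, a payoff of +gain for a repaid loan and -loss for a default, and two hypothetical tuning parameters, `kappa` (exploration per unit of posterior standard deviation) and `cap` (maximum exploration probability).

```python
import math


def beta_mean_std(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) posterior."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


def approval_probability(alpha: float, beta: float, gain: float = 1.0,
                         loss: float = 1.0, kappa: float = 2.0,
                         cap: float = 0.5) -> float:
    """Probability of approving an applicant whose subgroup repayment rate
    has a Beta(alpha, beta) posterior (hypothetical rule, for illustration).

    Approve outright if the posterior-mean expected payoff is non-negative;
    otherwise explore with probability kappa * posterior std, capped at cap.
    """
    p, std = beta_mean_std(alpha, beta)
    if p * gain - (1 - p) * loss >= 0:
        return 1.0
    return min(cap, kappa * std)


# Same posterior mean (0.4), very different amounts of evidence.
print(approval_probability(40, 60))  # well-observed subgroup
print(approval_probability(2, 3))    # under-observed subgroup
```

Both subgroups have the same posterior mean, so a greedy rule would deny both; under this sketch the under-observed subgroup (std 0.2) is explored with probability 0.4, roughly four times as often as the well-observed one, without the rule ever reading a group label.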

Data & Methods

Conceptual / theoretical tools:
- Counterfactual logic to formalize unobserved outcomes and feedback uncertainty (what would have happened under alternate decisions).
- Reinforcement learning / online learning (including bandit-style setups) to model sequential decision-making with exploration–exploitation trade-offs.
Taxonomy construction:
- Survey of ML uncertainty literature mapped to stages of the ML lifecycle; six uncertainty types are organized into global (systemic/model-level) and local (individual/subgroup-level) categories.
Experiments:
- Synthetic/simulated datasets generated to include varying degrees of historical bias and selective observation (selective labels).
- Implementation of simple algorithmic policies including baseline (naïve) policies and uncertainty-aware exploration policies that allocate exploration proportional to estimated uncertainty.
- Evaluation metrics include decision-maker utility (expected reward), fairness metrics, and variance in group outcomes; outcomes compared across policies and bias regimes.
Key methodological claim: the paper is not primarily an algorithmic innovation paper; it offers a diagnostic/analytic framework plus illustrative, simple exploration strategies to demonstrate the effects of unequal uncertainty.
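The experimental design (synthetic data with a biased history, selective labels, naive versus uncertainty-aware policies) can be mimicked with a toy simulation. Everything here is an assumption for illustration rather than the paper's set-up: two groups with the same true repayment rate of 0.6, group "B" starting from a small pessimistic Beta(1, 3) history, one applicant per group per round, unit payoffs, and the hypothetical `kappa`/`cap` exploration parameters.

```python
import math
import random


def beta_mean_std(a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    n = a + b
    return a / n, math.sqrt(a * b / (n * n * (n + 1)))


def simulate(policy: str, rounds: int = 500, seed: int = 0,
             gain: float = 1.0, loss: float = 1.0,
             kappa: float = 2.0, cap: float = 0.5):
    """Toy lender that sees one applicant per group per round.

    Both groups repay at the same true rate, but group "B" starts from a
    small, pessimistic history. Repayment is observed only for approved
    applicants (selective labels).
    """
    if policy not in ("naive", "uncertainty_aware"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    true_rate = {"A": 0.6, "B": 0.6}
    # Beta pseudo-counts [repaid, defaulted] encoding each group's history.
    posterior = {"A": [60.0, 40.0], "B": [1.0, 3.0]}
    approvals = {"A": 0, "B": 0}
    utility = 0.0
    for _ in range(rounds):
        for g in ("A", "B"):
            p, std = beta_mean_std(*posterior[g])
            if p * gain - (1 - p) * loss >= 0:
                approve = True  # greedy: expected payoff is non-negative
            elif policy == "uncertainty_aware":
                # Explore with probability proportional to posterior std.
                approve = rng.random() < min(cap, kappa * std)
            else:
                approve = False  # naive: trust the point estimate
            if not approve:
                continue  # a denied applicant's outcome is never observed
            approvals[g] += 1
            repaid = rng.random() < true_rate[g]
            utility += gain if repaid else -loss
            posterior[g][0 if repaid else 1] += 1  # update from approved only
    return approvals, utility, posterior


for policy in ("naive", "uncertainty_aware"):
    approvals, utility, posterior = simulate(policy)
    print(policy, approvals, round(utility, 1), posterior["B"])
```

Under these assumptions the naive lender never approves anyone from group B, so group B's pessimistic Beta(1, 3) belief is frozen even though its true repayment rate equals group A's. The exploring lender accepts some expected losses on exploratory approvals and keeps updating; varying `rounds`, `kappa`, and the initial history shows how the utility and disparity trade-off shifts.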

Implications for AI Economics

Distributional dynamics and market access:
- Uneven uncertainty produces dynamic exclusion: under-observed groups get fewer positive decisions, reducing future data and reinforcing higher uncertainty — analogous to persistent adverse selection and market segmentation that lowers labor/credit access for disadvantaged groups.
- This can create long-run aggregate inefficiencies by under-allocating productive opportunities and underestimating demand in excluded segments.
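The self-reinforcing loop can be made concrete with a Beta-Bernoulli belief about group g's repayment rate θ_g (an illustrative modelling assumption, not the paper's model). Let A_{g,t} be the number of group-g applicants approved in round t and R_{g,t} the number of those who repay; only these outcomes update the belief.

```latex
\begin{aligned}
&\theta_g \mid \mathcal{D}_{g,t} \sim \mathrm{Beta}(\alpha_{g,t}, \beta_{g,t}), \qquad n_{g,t} = \alpha_{g,t} + \beta_{g,t} \\
&\alpha_{g,t+1} = \alpha_{g,t} + R_{g,t}, \qquad \beta_{g,t+1} = \beta_{g,t} + (A_{g,t} - R_{g,t}) \\
&\operatorname{Var}(\theta_g \mid \mathcal{D}_{g,t}) = \frac{\alpha_{g,t}\,\beta_{g,t}}{n_{g,t}^{2}\,(n_{g,t} + 1)} = \frac{m_{g,t}(1 - m_{g,t})}{n_{g,t} + 1}, \qquad m_{g,t} = \frac{\alpha_{g,t}}{n_{g,t}}
\end{aligned}
```

If A_{g,t} = 0 in every round, α_{g,t}, β_{g,t}, and hence n_{g,t} and the posterior variance stay fixed: the group's uncertainty never shrinks, which is the data-poverty dynamic described in the bullets above.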
Firm incentives and strategic behavior:
- Firms maximizing short-term expected utility may rationally avoid exploring high-uncertainty submarkets, producing socially suboptimal “data poverty traps.” Understanding the exploration cost-benefit is crucial for incentive design.
- Uncertainty-aware exploration can, in some settings, reduce group disparities while preserving firm utility, which suggests private incentives for its adoption may exist; frictions, legal risk, and reputational costs, however, complicate adoption.
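The cost side of the exploration decision can be made concrete with a one-step calculation. This is a minimal sketch with hypothetical numbers (unit gain and loss, a believed repayment rate of 25% against a true rate of 60%); the benefit side, the value of the information gained, depends on future decisions and is not modelled here.

```python
def expected_payoff(repay_prob: float, gain: float = 1.0,
                    loss: float = 1.0) -> float:
    """Expected payoff of one approval: +gain if repaid, -loss if not."""
    return repay_prob * gain - (1 - repay_prob) * loss


def break_even_rate(gain: float = 1.0, loss: float = 1.0) -> float:
    """Repayment rate at which approving has zero expected payoff."""
    return loss / (gain + loss)


# Hypothetical numbers: the lender believes a subgroup repays 25% of the
# time, but its true repayment rate is 60%.
believed_rate, true_rate = 0.25, 0.60
# Cost of one exploratory approval, as the lender perceives it.
perceived_cost = -expected_payoff(believed_rate)
# What that approval actually yields on average.
realised_payoff = expected_payoff(true_rate)
print(break_even_rate(), perceived_cost, realised_payoff)
```

With these numbers the lender perceives each exploratory approval as costing 0.5 units, yet in expectation it earns 0.2: when beliefs about an under-observed group are biased, the apparent cost of exploring partly reflects unrealized gains rather than genuine risk.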
Policy and regulation:
- Regulation should account for dynamic, uncertainty-driven harms (not just static fairness metrics). Possible approaches:
  - Mandates or incentives for firms to deploy uncertainty-aware exploration (or subsidize exploration) where public-interest services are at stake (e.g., lending, credit scoring).
  - Requirements for auditing pipelines to document sources of uncertainty (taxonomy-based documentation) and to monitor selective feedback effects.
  - Time-limited safe exploration regimes, human-in-the-loop guardrails, or evidence-based pilot programs to limit legal/reputational exposures while collecting data.
Welfare, cost-benefit, and measurement:
- Cost–benefit analyses of ML deployment must include the long-run welfare effects of perpetuated data gaps and reduced access to markets for marginalized groups.
- Measuring firm-level utility alone can be misleading; social welfare accounting should include effects of compounded exclusion and reduced downstream opportunities.
Research and market design questions for economics:
- Optimal exploration policy design under regulatory constraints and private cost structures (how much exploration should firms do, who should pay).
- Market-level effects when multiple firms compete: do competitive pressures induce exploration that reduces exclusion, or do firms free-ride on others’ exploration?
- Mechanism design for corrective subsidies, data-pooling, or public-data provision to mitigate representation gaps.
- Empirical estimation of the magnitude of selective-feedback externalities in real markets (credit, hiring, health) and the welfare gains from uncertainty-aware policies.
Auditing and governance implications:
- Regulators and auditors should track uncertainty heterogeneity across groups and require counterfactual analyses and monitoring of selective feedback loops.
- Documentation and disclosure (e.g., uncertainty maps, exploration policies) can help align firm incentives, consumer expectations, and legal standards.
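An auditor tracking uncertainty heterogeneity needs little more than a decision log. The sketch below is one hypothetical way to build a per-group "uncertainty map": the log schema (`group`, `approved`, `repaid`), the uniform Beta(1, 1) prior, and the `ratio_threshold` flagging rule are all assumptions for illustration, not a prescribed audit standard.

```python
import math
from collections import defaultdict


def uncertainty_report(log, ratio_threshold: float = 2.0) -> dict:
    """Per-group observation and uncertainty summary from a decision log.

    `log` is a hypothetical list of records {"group": str, "approved": bool,
    "repaid": bool or None}; outcomes exist only for approved applicants.
    Each group's repayment rate gets a Beta(1 + repaid, 1 + defaulted)
    posterior (uniform prior); groups whose posterior std exceeds
    `ratio_threshold` times the smallest one are flagged.
    """
    counts = defaultdict(lambda: {"decisions": 0, "approved": 0, "repaid": 0})
    for rec in log:
        c = counts[rec["group"]]
        c["decisions"] += 1
        if rec["approved"]:
            c["approved"] += 1
            c["repaid"] += int(bool(rec["repaid"]))
    report = {}
    for group, c in counts.items():
        a = 1 + c["repaid"]
        b = 1 + c["approved"] - c["repaid"]
        n = a + b
        report[group] = {
            "approval_rate": c["approved"] / c["decisions"],
            "labelled_outcomes": c["approved"],
            "posterior_std": math.sqrt(a * b / (n * n * (n + 1))),
        }
    if not report:
        return report  # empty log: nothing to compare
    lowest = min(r["posterior_std"] for r in report.values())
    for r in report.values():
        r["flag"] = r["posterior_std"] > ratio_threshold * lowest
    return report


# Hypothetical log: group B was approved only twice in 32 decisions.
log = ([{"group": "A", "approved": True, "repaid": True}] * 40
       + [{"group": "A", "approved": True, "repaid": False}] * 20
       + [{"group": "B", "approved": True, "repaid": True}] * 2
       + [{"group": "B", "approved": False, "repaid": None}] * 30)
for group, row in uncertainty_report(log).items():
    print(group, row)
```

Group B is flagged because its two labelled outcomes leave a posterior standard deviation more than three times group A's; a perfect repayment record on two loans says little, and the 30 denials contribute no information at all.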

Suggested next steps for economists interested in this area:
- Model the firm’s optimization including exploration costs, legal/reputational risk, and long-run market access externalities.
- Empirically estimate feedback uncertainty and the impact of targeted exploration on access and default/repayment rates in real-world lending or hiring datasets.
- Design and evaluate policy instruments (subsidies, data trusts, mandated audits) to correct market failures arising from unequal uncertainty.

Assessment

Paper Type: theoretical
Evidence Strength: low. Evidence is primarily conceptual and simulation-based rather than empirical or causal in real-world settings; the paper shows mechanisms and proof-of-concept algorithmic effects in controlled synthetic environments but does not establish causal impacts on real economic outcomes or validate results on observational or experimental field data.
Methods Rigor: medium. Theoretical formalization via counterfactual logic and reinforcement-learning frameworks appears careful and coherent, and simulation experiments are useful for illustrating mechanisms; however, the lack of real-world datasets, external validation, or robustness checks across diverse empirical settings limits methodological rigor for applied inference.
Sample: Synthetic/simulated datasets and stylized sequential decision-making environments generated with varying degrees of bias and selective feedback (e.g., loan-denial style settings where counterfactual repayment is unobserved); algorithmic experiments evaluate uncertainty-aware exploration policies versus baseline policies under controlled simulated conditions.
Themes: inequality, governance, human_ai_collab, adoption
Identification: No causal identification from observational data; the paper develops a conceptual taxonomy and formal models (counterfactual logic and reinforcement learning environments) to characterize sources of uneven uncertainty, and uses algorithmic constructions and simulations (synthetic data with controlled bias) to demonstrate mechanisms and evaluate proposed uncertainty-aware policies.
Generalizability:
- Results are from simulations and may not generalize to messy, high-dimensional real-world administrative or firm data.
- Formal models rely on assumptions (specified counterfactual structure, reward models, feedback-generation mechanisms) that may not hold across institutions or domains.
- Institutional, legal, and social complexities (e.g., strategic behavior, policy constraints, multi-stakeholder incentives) are not fully modeled.
- Practical deployment challenges (data collection, measurement error, user responses) and costs of exploration are not empirically evaluated.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
This paper introduces a taxonomy of uncertainty in sequential decision-making consisting of three types: model uncertainty, feedback uncertainty, and prediction uncertainty.	AI Safety and Ethics	positive	high	categories of uncertainty in sequential decision-making	0.12
The authors formalize model and feedback uncertainty using counterfactual logic and reinforcement learning.	AI Safety and Ethics	positive	high	formalization of uncertainty types	0.12
Algorithmic examples in the paper demonstrate it is possible to reduce outcome variance for disadvantaged groups while preserving institutional objectives such as expected utility.	Inequality	positive	high	outcome variance for disadvantaged groups; expected utility (institutional objective)	0.12
Experiments on simulated data with varying bias show that unequal uncertainty and selective feedback produce disparities across groups.	Inequality	negative	high	group disparities (fairness metrics)	0.12
Uncertainty-aware exploration (in algorithms) alters fairness metrics compared to policies that ignore uncertainty.	AI Safety and Ethics	mixed	high	fairness metrics	0.12
Policies that ignore the unobserved (counterfactual) space can harm decision makers (via unrealized gains or losses) and subjects (via compounding exclusion and reduced access).	Inequality	negative	high	unrealized gains/losses for decision makers; compounding exclusion and reduced access for subjects	0.12
Many practical machine learning applications are online and sequential, meaning prior decisions inform future ones, a setting in which fairness challenges differ from standard supervised learning.	AI Safety and Ethics	neutral	high	characterization of ML application setting (online/sequential)	0.12
Under-represented groups tend to be systematically under-observed because of historical exclusion and selective feedback, which exacerbates uncertainty for those groups.	Inequality	negative	high	observation frequency/data availability for under-represented groups; resulting uncertainty	0.12
The proposed framework can help practitioners diagnose, audit, and govern fairness risks in socio-technical decision systems.	Governance and Regulation	positive	high	practitioner ability to diagnose/audit/govern fairness risks	0.06
When unfairness is driven by uncertainty (rather than incidental noise), accounting for uncertainty is essential to achieving fair and effective decision-making.	AI Safety and Ethics	positive	high	fairness and effectiveness of decision-making when uncertainty is accounted for	0.12