A principled probability cutoff can stop exploitative Pascal-style gambles without breaking rational decision-making, offering implementable design norms for safer AI agents; practical calibration and deployment remain context-dependent and untested.
This paper takes a long-standing issue of decision theory and reframes it as a design problem for intelligent agents. It provides a principled cutoff for ultra-low-probability, extreme-utility outcomes that can prevent the exploitability of autonomous agents. A vulnerability class is characterised for expected-utility maximisers, and a rationally negligible probability threshold, rooted in an epistemically sceptical standpoint, is introduced; this threshold blocks adversarial gambles (Pascal-type offers) while preserving dominance and tractability. The formal analysis motivates design norms for AI agents (utility bounding, calibrated priors, and epsilon-screening) and guidance on selecting context-sensitive thresholds that ensure preferences do not undergo dramatic changes. The result is a safety-oriented inductive bias for rational AI decision-makers that aligns theorists' desiderata with implementable policy constraints in high-stakes, low-signal situations.
Summary
Main Finding
Reframing a classical decision-theory problem as an AI design problem, the paper proposes a principled, context-sensitive cutoff (a "rationally negligible probability" threshold) that excludes ultra-low-probability, extreme-utility outcomes from expected-utility calculations. This cutoff prevents a class of exploitations (Pascal-type adversarial gambles) that make autonomous expected-utility maximisers manipulable, while preserving dominance relations and computational tractability. The formal analysis motivates concrete design norms—utility bounding, calibrated priors, and epsilon-screening—constituting a safety-oriented inductive bias for rational AI agents applicable in high-stakes, low-signal environments.
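A minimal formalisation of the cutoff, in our own notation (the paper's exact definitions may differ): for an action $a$ with outcome set $O$, probability function $P$, and utility function $U$, the epsilon-screened expected utility is

$$
EU_\varepsilon(a) \;=\; \sum_{\substack{o \in O \\ P(o \mid a) \,\ge\, \varepsilon}} P(o \mid a)\, U(o),
$$

and the agent maximises $EU_\varepsilon$ rather than the unscreened $EU$. Any Pascal-type offer with $P(o \mid a) < \varepsilon$ contributes exactly zero, no matter how large $U(o)$ is.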
Key Points
- Problem framing: A long-standing pathology in decision theory (extreme low-probability, high-utility outcomes dominating decisions) is recast as an engineering/design vulnerability for autonomous agents.
- Vulnerability class: Expected-utility maximisers can be exploited by adversarial offers that attach tiny probabilities to enormous utilities (Pascal-type gambles), causing unstable or manipulable preferences.
- Rationally negligible probability threshold: Introduces an explicit epsilon cutoff beneath which probabilities are treated as effectively zero. This cutoff is grounded in a sceptical/epistemic stance about ultra-low-probability claims.
- Preservation properties:
  - Dominance preserved: Choices that dominate others at relevant probability scales remain preferred.
  - Tractability preserved: Decision computations become bounded and implementable because extreme tails are screened out.
- Safety design norms:
  - Utility bounding: Limit the effective utility scale to avoid runaway weight on extreme payoffs.
  - Calibrated priors: Use priors that reflect realistic epistemic uncertainty and are resistant to adversarial overfitting of tiny-probability claims.
  - Epsilon-screening: Apply the rationally negligible threshold in decision rules to ignore outcomes below the cutoff (a minimal code sketch follows this list).
- Context sensitivity: The epsilon should be chosen relative to contextual factors (stakes, information quality, agent resources) to avoid large preference reversals from minor changes in context.
- Policy alignment: The proposed inductive bias aligns normative theorist desiderata with implementable constraints for safety in high-stakes, low-signal settings.
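To make the screening and bounding norms concrete, here is a minimal, self-contained sketch in Python. All names and numbers (`Gamble`, `EPSILON`, `U_MAX`, the toy offer) are illustrative assumptions, not the paper's notation or a definitive implementation.

```python
# Minimal sketch of an epsilon-screened, utility-bounded decision rule.
# EPSILON and U_MAX are illustrative values, not recommendations.

from dataclasses import dataclass

EPSILON = 1e-9   # rationally negligible probability threshold
U_MAX = 1e6      # utility bound: payoffs are clipped to [-U_MAX, U_MAX]

@dataclass
class Gamble:
    """An offer: a list of (probability, utility) outcome pairs."""
    outcomes: list[tuple[float, float]]

def screened_eu(g: Gamble) -> float:
    """Expected utility with epsilon-screening and utility bounding."""
    total = 0.0
    for p, u in g.outcomes:
        if p < EPSILON:                           # epsilon-screening: ignore the tail
            continue
        total += p * max(-U_MAX, min(U_MAX, u))   # utility bounding
    return total

# Pascal-type adversarial offer: pay 1 util for a 1e-12 chance of 1e15 utils.
pascal = Gamble([(1e-12, 1e15), (1 - 1e-12, -1.0)])
decline = Gamble([(1.0, 0.0)])

# Unscreened EU of `pascal` is about +999, so a naive maximiser accepts and
# is exploitable; the screened EU is about -1, so the agent declines.
assert screened_eu(pascal) < screened_eu(decline)
```

In this toy case either norm alone already defuses the offer: screening zeroes the tail term, while bounding caps its contribution at $10^{-12} \cdot 10^6 = 10^{-6}$.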
Data & Methods
- Nature of work: The paper is formal/theoretical (no empirical dataset). It develops definitions, lemmas, and proofs to characterize vulnerabilities and to show how cutoff rules affect decision properties.
- Methods used:
  - Decision-theoretic modeling of expected-utility maximisers and adversarial offer mechanisms.
  - Formal definition of a vulnerability class and of a rationally negligible probability threshold (epsilon).
  - Analytical proofs that epsilon-screening blocks Pascal-type manipulations while preserving dominance and feasible computation (an illustrative reconstruction of the preservation argument follows this list).
  - Normative argumentation linking epistemic scepticism about tiny-probability claims to practical design rules (utility bounds, prior calibration).
  - Guidance for selecting context-sensitive epsilons via sensitivity/robustness considerations (qualitative/mathematical criteria rather than empirical estimation).
- Limitations noted: The approach is normative and structural; it requires choices (utility bounds, epsilon levels, prior calibration) that introduce design trade-offs and may reduce sensitivity to genuinely rare but real catastrophic risks if misapplied.
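One way to see why screening cannot reverse dominance, under a state-based framing with a common prior $P$ over states $S$ (our illustrative reconstruction, not a quotation of the paper's lemma): if $U(a, s) \ge U(b, s)$ for every state $s$, then the screened sums compare term by term,

$$
EU_\varepsilon(a) = \sum_{s \,:\, P(s) \ge \varepsilon} P(s)\, U(a, s) \;\ge\; \sum_{s \,:\, P(s) \ge \varepsilon} P(s)\, U(b, s) = EU_\varepsilon(b),
$$

because both sides range over the same screened states. Tractability follows similarly: at most $1/\varepsilon$ states can satisfy $P(s) \ge \varepsilon$, so the screened computation is bounded.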
Implications for AI Economics
- Robustness of autonomous agents: Implementing epsilon-screening and utility bounds reduces agents' exploitability in economic environments where adversaries can propose skewed gambles or craft tiny-probability, huge-payoff claims.
- Market and mechanism design: Designers of markets, auctions, and contract mechanisms that include or interact with autonomous agents should account for susceptibility to Pascal-type offers and require agents to use calibrated priors and negligible-probability cutoffs.
- Regulatory policy: Regulators can adopt or mandate inductive biases (bounded utility scales, minimum probability thresholds, documentation of prior calibration) as part of safety standards for high-stakes AI systems to limit manipulable decision rules.
- Welfare and distributional trade-offs: Truncating tails can prevent pathological behaviour but may underweight genuine low-probability catastrophic risks; economic analysis must weigh robustness against potential welfare losses from ignoring real rare events.
- Practical implementation: Economists and engineers should choose epsilon relative to signal quality, decision stakes, and agent capacity; perform sensitivity checks to ensure preferences do not flip with small epsilon adjustments (a sketch of such a check follows this list); and integrate these norms into agent design, testing, and certification.
- Research directions: Empirical calibration of context-sensitive epsilons, quantifying welfare trade-offs from tail-truncation, and developing standards for prior calibration that are robust to strategic manipulation.
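A minimal sketch of the sensitivity check mentioned above, assuming the same toy screening rule as earlier; the band endpoints, helper names, and gambles are illustrative assumptions only.

```python
# Minimal sketch of an epsilon sensitivity check: confirm that the preferred
# action does not flip anywhere in a band of candidate thresholds.

import numpy as np

def screened_eu(outcomes, eps):
    """Expected utility ignoring outcomes below the epsilon cutoff."""
    return sum(p * u for p, u in outcomes if p >= eps)

def preferred(actions, eps):
    """Name of the action with the highest screened expected utility."""
    return max(actions, key=lambda name: screened_eu(actions[name], eps))

def preference_is_stable(actions, eps_low, eps_high, n=50):
    """True iff the same action is preferred at every epsilon in the band."""
    band = np.geomspace(eps_low, eps_high, n)   # log-spaced candidate epsilons
    return len({preferred(actions, e) for e in band}) == 1

# Toy check: a safe action vs. a Pascal-type offer.
actions = {
    "decline": [(1.0, 0.0)],
    "pascal":  [(1e-12, 1e15), (1 - 1e-12, -1.0)],
}
print(preference_is_stable(actions, eps_low=1e-10, eps_high=1e-8))  # True
```

If the check returns False, the chosen epsilon sits near a preference-reversal boundary and should be re-examined against stakes and signal quality before deployment.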
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| The long-standing issue in decision theory is reframed as a design problem for intelligent agents. | AI Safety and Ethics | positive | high | reframing of a theoretical issue as a design problem for agents | 0.12 |
| The paper provides a principled cutoff (a rationally negligible probability threshold) that can exclude ultra-low-probability extreme-utility outcomes and thereby prevent the exploitability of autonomous agents. | AI Safety and Ethics | positive | high | prevention of exploitability by excluding ultra-low-probability extreme-utility outcomes | 0.2 |
| A vulnerability class is characterised for expected-utility maximisers that makes them susceptible to adversarial gambles. | AI Safety and Ethics | negative | high | vulnerability of expected-utility maximisers to adversarial gambles | 0.2 |
| The introduced rationally negligible probability threshold preserves dominance and tractability while blocking adversarial gambles (Pascal-type offers). | AI Safety and Ethics | positive | high | preservation of dominance and tractability; blocking of adversarial gambles | 0.2 |
| The formal analysis motivates specific design norms for AI agents: utility bounding, calibrated priors, and epsilon-screening. | AI Safety and Ethics | positive | high | adoption of design norms (utility bounding, calibrated priors, epsilon-screening) | 0.12 |
| The paper gives guidance on the selection of context-sensitive thresholds (negligibility thresholds) that ensure an agent's preferences do not undergo dramatic changes due to ultra-rare hypotheses. | AI Safety and Ethics | positive | high | stability of agent preferences under thresholding | 0.12 |
| The paper proposes a safety-oriented inductive bias for rational AI decision-makers whose desiderata align with implementable policy constraints in high-stakes, low-signal situations. | Governance and Regulation | positive | medium | alignment of a proposed inductive bias with implementable policy constraints; improved decision-making in high-stakes low-signal contexts | 0.01 |