An actuarial runtime contract lets AI 'buy' authority: pricing and reserving capital for side-effecting actions reveals how much autonomy can be safely released. Tests across four environments and a live Postgres panel show a shared pattern of refusal at low reserve and partial release at intermediate reserve, while required reserve capital varies widely (22×) across domains and models.
Autonomous AI agents increasingly issue side-effect-bearing actions: database mutations, refunds, payments, external commitments. We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget. We then develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital. The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k. We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces) and report a live Postgres panel in which three Azure-hosted models propose actions through the same contract. The frontier exhibits a common low-reserve refusal and intermediate-release pattern across domains, with saturation only where the budget grid reaches full reserve demand; required reserve capital varies by 22x (Capital@50 from 289 to 6457). The framework does not force domains into the same shape; it surfaces each domain's actuarial geometry. In the live panel the contract prevents realized loss across all three models at low budget while differing in underwriting persistence under denial: model identity is an actuarial underwriting variable. The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects.
Summary
Main Finding
The paper introduces the Actuarial Action Interface (AAI), a deterministic runtime contract that prices every side-effect-bearing agent action against a fixed safe default and gates execution against a per-boundary reserve capital budget. Around AAI the author defines the Authority Frontier: the fraction of side-effect-bearing authority released as a function of reserve capital. Empirical instantiations across four agentic environments and a live Postgres panel show a common low-budget refusal and intermediate authority-release pattern, domain-specific saturation where budgets hit full reserve demand, and large cross-domain heterogeneity in required reserve capital (Capital@50 ranges 289–6,457, a 22× spread). Model identity matters as an underwriting variable: the same contract prevents realized loss at low budgets across models, but models differ in persistence (how many additional priced proposals they submit) after denial.
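The release curve and the Capital@k summaries described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the greedy in-order gating rule, the reading of C_full as the sum of all tolls, the function names, and the toy tolls and authority units are all assumptions made here.

```python
from typing import List, Optional, Sequence, Tuple


def release_fraction(tolls: Sequence[float], authority: Sequence[float],
                     budget: float) -> float:
    """Share of total authority released at reserve level `budget`.

    Assumed gating rule: actions arrive in proposal order and an action is
    released only if its toll fits in the remaining reserve; otherwise it
    falls back to its safe default and releases no authority.
    """
    remaining = budget
    released = 0.0
    for toll, units in zip(tolls, authority):
        if toll <= remaining:
            remaining -= toll
            released += units
    total = sum(authority)
    return released / total if total > 0 else 0.0


def authority_frontier(tolls: Sequence[float], authority: Sequence[float],
                       budget_grid: Sequence[float]) -> List[Tuple[float, float]]:
    """Release curve: one (budget, release fraction) point per grid value."""
    return [(b, release_fraction(tolls, authority, b)) for b in sorted(budget_grid)]


def capital_at_k(frontier: Sequence[Tuple[float, float]], k: float) -> Optional[float]:
    """Smallest grid budget whose release fraction reaches k percent, else None."""
    for budget, fraction in frontier:
        if fraction >= k / 100.0:
            return budget
    return None


# Hypothetical tolls and authority units, for illustration only.
tolls = [10.0, 40.0, 50.0]
authority = [1.0, 1.0, 2.0]
c_full = sum(tolls)  # assumed reading of C_full: reserve that releases everything
frontier = authority_frontier(tolls, authority, [0, 10, 50, 100])

print(frontier)
print({k: capital_at_k(frontier, k) for k in (50, 75, 90)})
print([(b / c_full, f) for b, f in frontier])  # budgets normalized by C_full

# The reported spread follows directly from the two Capital@50 endpoints.
spread = 6457 / 289  # roughly 22.3, reported as 22x
```

On this toy input nothing is released at zero budget, half of the authority is released at a budget of 50, and full release occurs only when the grid reaches C_full = 100, so Capital@50 is 50 and Capital@75 and Capital@90 are both 100.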
Key Points
- Actuarial Action Interface (AAI)
  - A deterministic quote–bind–commit runtime contract that parses an LLM-proposed action, maps it to a safe default, prices a conservative per-action toll, and gates or downgrades execution according to the remaining reserve.
  - Five formal properties: bitwise replay determinism; pathwise reserve coverage under alpha-spending; a universal seven-class action taxonomy with an idempotent safe-default compiler; toll-bounded capability tokens; and a deterministic ambiguity-set reserve for interface failures.
- Seven-class universal action taxonomy
  - Classes: read-only, additive write, modify write, destructive, monetary-low, monetary-high, external-commit.
  - Classification uses typed predicates (monetary flag and amount, external binding, mutation level, reversibility), not string matching.
  - Each class is paired with a contractually fixed safe default (e.g., dry-run diff, log proposed op, escalate to human).
- Pricing and risk machinery
  - Per-action conservative tolls are computed against a contractually fixed safe default under a time-consistent convex risk mapping.
  - Conformal envelopes and alpha-spending provide finite-sample, pathwise reserve coverage guarantees.
  - Toll-bounded capability tokens limit realized authority and scope.
- Authority Frontier as evaluation primitive
  - For each reserve level B, reports the fraction of authority released (release curve).
  - Cross-domain normalization uses full reserve demand C_full and capital summary metrics Capital@k (k ∈ {50, 75, 90}).
  - Descriptive curve distances (Kolmogorov–Smirnov and inverse-Wasserstein) are used for shape comparison (not hypothesis testing).
- Empirical findings
  - Experiments on four environments: database-mutation paired-replay panel, controlled refund/customer-service domain, and trace-only mappings of the public τ-bench retail and airline trajectories.
  - Live Postgres panel with three Azure-hosted LLMs proposing actions through the same AAI contract.
  - Across domains, the Authority Frontier shows a common pattern of refusal at low reserve and partial release at intermediate reserve; saturation occurs only where the budget grid reaches C_full.
  - Capital@50 ranges from 289 to 6,457 (22× variation).
  - The runtime contract prevented realized loss at low budgets across all three models in the live panel, but models differed in underwriting persistence after denials: model identity is an actuarial underwriting variable.
- Positioning vs prior work
  - AAI is distinct from constitutional AI, RLHF, tool sandboxing, capability-based access, and outcome-based guardrails: it is side-effect aware, budget-indexed, deterministic, counterfactual (fixed safe default), and per-action priced. The paper claims that this combination is unique.
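The typed-predicate classification in the taxonomy above can be illustrated with a small sketch. The seven class names and the four predicates come from the summary; everything else is an assumption made here for illustration: the `Action` fields, the precedence order of the checks (external binding, then monetary, then mutation level), the rule that irreversible mutations count as destructive, and the pairing of each class with one of the example safe defaults.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ_ONLY = "read-only"
    ADDITIVE_WRITE = "additive write"
    MODIFY_WRITE = "modify write"
    DESTRUCTIVE = "destructive"
    MONETARY_LOW = "monetary-low"
    MONETARY_HIGH = "monetary-high"
    EXTERNAL_COMMIT = "external-commit"


class Mutation(Enum):
    NONE = 0
    ADD = 1
    MODIFY = 2
    DELETE = 3


@dataclass(frozen=True)
class Action:
    """Typed view of a tool call (hypothetical field names)."""
    mutation: Mutation
    reversible: bool = True
    monetary: bool = False
    amount: float = 0.0
    external_binding: bool = False


def classify(a: Action, monetary_cap: float) -> ActionClass:
    """Map an action to one of the seven classes using typed predicates only.

    The precedence below is an assumption, not taken from the paper.
    """
    if a.external_binding:
        return ActionClass.EXTERNAL_COMMIT
    if a.monetary:
        # The monetary cap is a contract parameter separating low from high.
        if a.amount > monetary_cap:
            return ActionClass.MONETARY_HIGH
        return ActionClass.MONETARY_LOW
    if a.mutation is Mutation.NONE:
        return ActionClass.READ_ONLY
    if a.mutation is Mutation.DELETE or not a.reversible:
        return ActionClass.DESTRUCTIVE
    if a.mutation is Mutation.ADD:
        return ActionClass.ADDITIVE_WRITE
    return ActionClass.MODIFY_WRITE


# Hypothetical pairing of classes with the safe defaults named in the summary.
SAFE_DEFAULT = {
    ActionClass.READ_ONLY: "execute as-is",
    ActionClass.ADDITIVE_WRITE: "dry-run diff",
    ActionClass.MODIFY_WRITE: "dry-run diff",
    ActionClass.DESTRUCTIVE: "log proposed op",
    ActionClass.MONETARY_LOW: "log proposed op",
    ActionClass.MONETARY_HIGH: "escalate to human",
    ActionClass.EXTERNAL_COMMIT: "escalate to human",
}

refund = Action(mutation=Mutation.MODIFY, monetary=True, amount=250.0)
print(classify(refund, monetary_cap=100.0), SAFE_DEFAULT[classify(refund, 100.0)])
```

Because the classifier reads only typed fields, two tool calls with different names but the same predicates land in the same class, which is what makes authority units comparable across domains.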
Data & Methods
- Environments / datasets
  - Database mutation environment: paired-replay experiments (real Postgres live panel).
  - Controlled refund / customer-service domain: closed environment with monetary actions and operator escalation.
  - τ-bench bridges (public): trace-only retail and airline historical tool-use trajectories mapped into AAI action space.
  - Live LLM underwriting panel: three Azure-hosted models proposing actions to the same Postgres instance via AAI.
- Core protocol and components
  - Quote–bind–commit protocol: model proposes action → deterministic parsing/canonicalization → taxonomy classification τ(a) → safe-default mapping σ(a) → conservative toll quote → bind budget if toll ≤ remaining reserve → execute, or downgrade and record an interface failure.
  - The safe-default compiler σ is idempotent on safe actions; the monetary cap θ^(d)_cap is a contract parameter used to distinguish monetary-low from monetary-high.
  - Toll computation uses a time-consistent convex risk mapping; finite-sample conservatism is obtained from a conformal envelope on counterfactual increments, with alpha-spending for sequential coverage.
  - Deterministic canonicalization and canonical state hashing ensure bitwise replay determinism; capability tokens limit exposure.
- Metrics and evaluation
  - Authority Frontier: release fraction vs reserve capital B.
  - Normalization: C_full (full reserve demand) used to permit cross-domain comparability of curve shapes.
  - Summary statistics: Capital@k (capital required to obtain k% authority release) for k = 50, 75, 90.
  - Shape comparisons: KS distance and inverse-Wasserstein between normalized release curves.
  - Additional telemetry: per-class downgrade rates ρ^(k)_{κ,B}(D_i), downgrade families, and underwriting persistence (expected additional proposals after denial).
- Robustness & scope
  - Replay determinism, pathwise reserve coverage, and ambiguity reserves are described; adversarial stress tests and limitations are discussed (contract parameter sensitivity, domain ontology limits, adaptive alpha-spending left to future work).
  - The paper is an evaluation framework (benchmark-ready) rather than an established community benchmark.
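The quote–bind–commit flow and the role of canonicalization in replay determinism can be sketched as follows. This is a simplified illustration under stated assumptions: the toll here is a fixed lookup table standing in for the paper's convex risk mapping and conformal envelope, and the names `Boundary`, `table_quote`, `TOLLS`, and `replay` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def canonicalize(action: Dict) -> str:
    """Key-order-independent JSON encoding of a proposed action."""
    return json.dumps(action, sort_keys=True, separators=(",", ":"))


def action_hash(action: Dict) -> str:
    """SHA-256 of the canonical form, used as a stable log key."""
    return hashlib.sha256(canonicalize(action).encode("utf-8")).hexdigest()


@dataclass
class Boundary:
    """One reserve boundary with its remaining capital and decision log."""
    reserve: float
    log: List[Tuple[str, float, str]] = field(default_factory=list)

    def submit(self, action: Dict, quote: Callable[[Dict], float]) -> str:
        toll = quote(action)              # quote: price the proposed action
        if toll <= self.reserve:          # bind: reserve the toll if it fits
            self.reserve -= toll
            decision = "commit"           # commit: the action may execute
        else:
            decision = "downgrade"        # otherwise fall back to the safe default
        self.log.append((action_hash(action), toll, decision))
        return decision


# Hypothetical fixed tolls; a stand-in for the paper's risk-based pricing.
TOLLS = {"read": 0.0, "update": 60.0, "refund": 60.0}


def table_quote(action: Dict) -> float:
    return TOLLS[action["op"]]


def replay(actions: List[Dict], reserve: float) -> List[Tuple[str, float, str]]:
    """Run a proposal sequence against a fresh boundary and return its log."""
    boundary = Boundary(reserve=reserve)
    for a in actions:
        boundary.submit(a, table_quote)
    return boundary.log


proposals = [
    {"op": "update", "table": "orders"},
    {"table": "orders", "op": "refund"},
    {"op": "read"},
]
first = replay(proposals, reserve=100.0)
second = replay(proposals, reserve=100.0)
print([decision for _, _, decision in first])
print(first == second)
```

With a reserve of 100, the update binds 60, the refund is downgraded because only 40 remains, and the zero-toll read commits. Replaying the same proposals yields an identical log because every step is a pure function of the canonical action and the remaining reserve.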
Implications for AI Economics
- New micro-insurance unit: Treating single agent actions as insurable micro-exposures enables per-action reserve pricing and pathwise capital accounting — a shift from coarse-period insurance/firm-level models to transaction-level actuarial control.
- Capital allocation and pricing heterogeneity: Large cross-domain heterogeneity (22× Capital@50 spread) implies insurers and platform underwriters must price and allocate capital by domain and by agent/model identity, not just by task label.
- Underwriting variables include model behavior: Model identity affects underwriting persistence and reserve demand; insurers can use observed model persistence and post-denial behavior as underwriting signals to set premiums, capacity, or stricter gating rules.
- Operational risk management: Deterministic, auditable runtime contracts that gate authority according to remaining reserve help translate risk appetite and capital constraints into operational guardrails, facilitating compliance, auditing, and contractual enforcement.
- Product and market design opportunities
  - Per-action premiums, per-boundary reserve allocations, and capability-token-based products for different exposure classes.
  - Authority Frontier curves as standardized risk disclosures for agent deployments, enabling buyers, auditors, and regulators to compare how much authority is released at given capital levels.
  - Integration with mechanism-design (companion work) can reduce gaming and enable marketable underwriting contracts with strategic operators.
- Policy and regulatory relevance: AAI provides a transparent mapping from capital to permitted authority and can serve as evidence for regulators that deployments are run within provable budgeted risk constraints.
- Limitations and open questions important for economics
  - Choice of monetary-cap parameter, safe defaults, and alpha-spending schedule materially affect pricing and released authority; standardization is needed for market comparability.
  - Adversarial or strategic agents may change behavior; companion mechanism-design work is required to ensure incentive compatibility in market settings.
  - Scaling to complex multi-agent, multi-boundary systems and aggregation risk across portfolios of agents requires further actuarial research (correlation, accumulation, systemic exposure).
  - Empirical calibration of loss distributions for rare but high-cost external-commit actions will be crucial for realistic pricing.
Overall, AAI operationalizes an actuarial approach to runtime control of agent side effects: it creates a contractible, auditable, and comparable object (the Authority Frontier) that maps reserve capital into released authority, opening a pathway for per-action pricing, underwriting, and market products in the economics of autonomous AI agents.
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget. | AI Safety and Ethics | positive | high | ability to price actions and gate execution via a deterministic runtime contract | 0.02 |
| We develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital. | Adoption Rate | positive | high | amount of autonomous authority released as a function of reserve capital | 0.02 |
| The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k. | AI Safety and Ethics | positive | high | availability of protocol, taxonomy, determinism properties, and normalization metrics | 0.02 |
| We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces). | Adoption Rate | positive | high | successful instantiation of AAI across multiple agentic environments | n=4; 0.12 |
| We report a live Postgres panel in which three Azure-hosted models propose actions through the same contract. | Adoption Rate | positive | high | models proposing actions under the contract in a live Postgres setup | n=3; 0.12 |
| The frontier exhibits a common low-reserve refusal and intermediate-release pattern across domains, with saturation only where the budget grid reaches full reserve demand. | Adoption Rate | mixed | high | pattern of authority release (refusal at low-reserve, release at intermediate-reserve, saturation at full-reserve) | n=4; 0.12 |
| Required reserve capital varies by 22x (Capital@50 from 289 to 6457). | Adoption Rate | mixed | high | required reserve capital (Capital@50) | n=4; Capital@50 from 289 to 6457 (22x); 0.12 |
| The framework does not force domains into the same shape; it surfaces each domain's actuarial geometry. | Adoption Rate | mixed | high | variation in actuarial geometry (frontier shape) across domains | n=4; 0.12 |
| In the live panel the contract prevents realized loss across all three models at low budget while differing in underwriting persistence under denial: model identity is an actuarial underwriting variable. | AI Safety and Ethics | positive | medium | realized loss prevention and underwriting persistence under denial across models | n=3; 0.07 |
| The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects. | Governance and Regulation | positive | high | availability of a benchmark-ready evaluation framework | 0.02 |