A layered insurance contract can make actuarial runtimes robust to strategic operators: Paper A's minimal-authority and no-splitting clauses close two attack surfaces, while common-control aggregation, escalation fees and a model-identity menu close the other three. The authors prove joint incentive compatibility and show that individual rationality and weak budget balance hold at the truthful equilibrium.
Paper A defines a time-consistent actuarial runtime that prices each side-effect-bearing action against a contractually fixed safe default and gates execution against a reserve budget. It treats the operator as passive. This paper makes the operator strategic. We characterise a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. Two attack surfaces -- post-toll safe-default selection and within-boundary action splitting -- are closed by Paper A's minimal-authority and no-splitting clauses. The remaining three require new contract clauses. First, common-control aggregation prevents cross-boundary re-routing from reducing toll below the boundary potential applied to total exposure. Second, interface failures such as invalid JSON are contract-relevant events, not safety wins: treating them as zero-toll safe defaults can reward unreliable models, while escalation fees reverse the incentive. We validate this interface-compliance theorem on committed cross-model traces from the companion empirical paper. Third, a model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model weakly dominant. We then compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. Finally, a two-parameter premium family discharges operator individual rationality and weak budget balance at the truthful equilibrium. The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects.
Summary
Main Finding
The paper extends Paper A’s actuarial runtime to a strategic-operator setting and characterises a complete set of contractual clauses that eliminate five natural operator gaming strategies. Key constructive results:
- Two attack surfaces are already closed by Paper A (no-splitting; ex-ante safe-default commitment).
- Three additional clauses are necessary and sufficient (in combination with Paper A) to obtain strategy-proofness across the five-attack space:
1. Common-control aggregation + aggregate (or max-) settlement (Theorem 5) to prevent cross-boundary re-routing arbitrage.
2. Interface-failure adjudication that treats parsing/tool-call failures as escalation events with a per-failure fee κesc (Theorem 8); this deters intentionally unreliable interfaces that evade tolls.
3. A model-identity menu with a componentwise-minimum penalty schedule that makes truthful reporting of the deployed model weakly dominant (Theorem 13).
- Composing these clauses with Paper A’s runtime yields joint incentive compatibility across the five attack classes, and a two-parameter premium family can ensure operator individual rationality and weak budget balance at the truthful equilibrium.
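The re-routing result (Theorem 5) can be illustrated with a minimal sketch. The two potentials below are hypothetical stand-ins chosen for illustration, not the paper's Φ: one is convex (super-additive), the other concave (a misspecified, sub-additive case). All function names are assumptions introduced here.

```python
import math
from typing import Callable, Sequence

Potential = Callable[[float], float]


def convex_potential(exposure: float) -> float:
    """Hypothetical super-additive potential (convex, zero at zero)."""
    return exposure ** 2 / 100.0


def concave_potential(exposure: float) -> float:
    """Hypothetical misspecified potential (concave, hence sub-additive)."""
    return 10.0 * math.sqrt(exposure)


def toll(phi: Potential, exposure: float) -> float:
    """Toll for one boundary: potential at the exposure minus potential at zero."""
    return phi(exposure) - phi(0.0)


def split_sum_toll(phi: Potential, exposures: Sequence[float]) -> float:
    """Sum of tolls when each sub-boundary is priced separately."""
    return sum(toll(phi, e) for e in exposures)


def aggregate_toll(phi: Potential, exposures: Sequence[float]) -> float:
    """Toll on total exposure after common-control aggregation."""
    return toll(phi, sum(exposures))


def max_settlement_toll(phi: Potential, exposures: Sequence[float]) -> float:
    """Conservative fallback: charge the larger of the two settlements."""
    return max(aggregate_toll(phi, exposures), split_sum_toll(phi, exposures))


if __name__ == "__main__":
    # Three routings of the same total exposure (100 units).
    routings = [[100.0], [50.0, 50.0], [25.0, 25.0, 50.0]]
    for routing in routings:
        print(
            routing,
            "split:", split_sum_toll(convex_potential, routing),
            "aggregate:", aggregate_toll(convex_potential, routing),
            "max (concave case):", max_settlement_toll(concave_potential, routing),
        )
```

Under the convex potential, splitting lowers the per-boundary sum but leaves the aggregate toll unchanged, which is the arbitrage that aggregate settlement removes. Under the concave potential the split sum exceeds the aggregate charge, and max-settlement picks the larger of the two.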
Key Points
- Strategic model of operator: operator type θ = (model family Mθ, preferred safe-default a0θ, committed boundary Bθ, max-authority A+θ). Operator chooses deployment/reporting strategy σ to maximize business value minus expected tolls minus premium.
- Five attack classes (Definition 2):
a. Within-boundary splitting: closed by Paper A’s no-splitting clause.
b. Post-toll safe-default selection: closed by Paper A’s ex-ante safe-default commitment.
c. Cross-boundary re-routing (routing the same economic exposure across multiple boundaries to reduce the toll): solved by common-control aggregation plus aggregate settlement (Theorem 5). Max-settlement gives a conservative backup when super-additivity of potentials is uncertain.
d. Interface-compliance gaming (using an unreliable model that produces invalid outputs so that actions never get priced): handled by the choice of adjudication rule, ξsafe (treat the failure as a safe default, zero toll) vs ξescalate (escalate and charge κesc). Theorem 8 shows when ξsafe rewards unreliability and when ξescalate deters it, and gives the conservative, implementable rule κesc ≥ µc.
e. Model-identity misreporting (declaring one model but deploying another): handled by a model-identity menu with a componentwise-min penalty schedule that makes truthful reporting weakly dominant (formalised in Theorem 13).
- Theorem 5 (Cross-boundary no-arbitrage): With an observable aggregation map π and aggregate settlement, the payable toll equals the aggregate-boundary potential ΦB⋆(E⋆) − ΦB⋆(0) and cannot be reduced by any finite re-routing across sub-boundaries. Max-settlement guarantees the payable toll is at least the aggregate charge and at least the split-sum, protecting against misspecification.
- Theorem 8 (Interface-compliance adjudication incentives): Under a one-step affine utility model,
- ξsafe makes raising interface-failure probability f privately attractive iff µc > Vθ + Cfail (i.e., avoided toll exceeds value lost + failure cost).
- ξescalate makes raising f weakly unattractive if κesc ≥ µc − Vθ − Cfail; a conservative implementable rule is κesc ≥ µc when Vθ + Cfail ≥ 0.
- Empirical corroboration (companion Paper B): committed cross-model traces show extreme interface differences (e.g., Kimi-K2.5 produced invalid JSON in 100% of sampled trajectories; reliable models had f ≈ 0). Operational numbers reported: mean gate-computed side-effect reserve µc ≈ 7,131; per-event destructive loss ≈ 1,350. In a sweep of plausible business parameters, the perverse-incentive regime (µc > Vθ + Cfail) holds in 45 of 48 grid cells under ξsafe, which illustrates the practical importance of the adjudication choice and of κesc calibration.
- Practical contract drafting tools: common-control attribution, related-party rules, audit clauses, currency/normalisation, and max-settlement as conservative fallback.
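The Theorem 8 conditions listed above can be reproduced with a short sketch. The utility form below is an assumption chosen to be consistent with the stated conditions (a successful step earns Vθ and pays µc; a failed step costs Cfail plus any escalation fee), not the paper's exact model. The values for Vθ and Cfail are hypothetical; only µc ≈ 7,131 comes from the reported statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepParams:
    value: float      # V_theta: business value of a successful step (hypothetical)
    fail_cost: float  # C_fail: operator's own cost of an interface failure (hypothetical)
    mean_toll: float  # mu_c: expected toll on a priced step


def expected_utility(p: StepParams, f: float, kappa_esc: float = 0.0) -> float:
    """One-step expected utility at failure rate f.

    kappa_esc = 0 models xi_safe (failure is a zero-toll safe default);
    kappa_esc > 0 models xi_escalate (failure is charged an escalation fee).
    """
    success = p.value - p.mean_toll
    failure = -p.fail_cost - kappa_esc
    return (1.0 - f) * success + f * failure


def marginal_gain_from_failure(p: StepParams, kappa_esc: float = 0.0) -> float:
    """Slope of expected utility in f; positive means unreliability pays."""
    return p.mean_toll - p.value - p.fail_cost - kappa_esc


def min_deterrent_fee(p: StepParams) -> float:
    """Smallest non-negative kappa_esc that makes raising f weakly unattractive."""
    return max(0.0, p.mean_toll - p.value - p.fail_cost)


if __name__ == "__main__":
    p = StepParams(value=2000.0, fail_cost=500.0, mean_toll=7131.0)
    print("slope under xi_safe:", marginal_gain_from_failure(p))
    print("slope under xi_escalate with kappa_esc = mu_c:",
          marginal_gain_from_failure(p, kappa_esc=p.mean_toll))
    print("tight deterrent fee:", min_deterrent_fee(p))
```

Because utility is affine in f, the sign of the slope settles the incentive at every failure rate. Setting κesc = µc overshoots the tight threshold whenever Vθ + Cfail ≥ 0, which is what makes the rule usable without observing the operator's private parameters.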
Data & Methods
- Methods: formal mechanism-design/contract-theoretic model + rigorous theorems and constructive clauses. Core techniques:
- Formal operator utility model: Ubiz − E[toll] − premium; operator strategies σ map types and contract terms to deployment/reporting plans.
- Formal definitions for aggregation maps, exposure vectors, boundary potentials Φ, adjudication rules ξ, and attack classes.
- Analytical proofs: Theorem 5 (seven-step proof showing exposure aggregation and routing-independence), Theorem 8 (affine comparative statics in interface-failure rate f), and a separate constructive proof for the model-identity menu (Theorem 13).
- Empirical validation (companion Paper B):
- Cross-model committed runs (“formal-core” and “pilot” panels). Formal-core: typically 10 trajectories per model–baseline cell (≈30 per model across B0/B2/B3); pilot cells smaller.
- Observed interface-failure rates: Kimi-K2.5 f = 1.0 (100% invalid JSON); gpt-4.1-mini, gpt-5.4-mini, DeepSeek-V3.1 f ≈ 0.
- Operational statistics: µc ≈ 7,131 (mean expected positive toll per baseline), destructive loss ≈1,350.
- These empirical quantities drive calibration recommendations (e.g., κesc ≥ µc as conservative rule when Vθ + Cfail ≥ 0).
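A calibration sweep of this kind might look like the sketch below. The grid values and trajectory counts are hypothetical placeholders, not the 48-cell grid or the trace data from Paper B; only µc ≈ 7,131 is taken from the reported statistics, and the function names are assumptions introduced here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def failure_rate(invalid_outputs: int, trajectories: int) -> float:
    """Empirical interface-failure rate f from committed trace counts."""
    if trajectories <= 0:
        raise ValueError("need at least one trajectory")
    return invalid_outputs / trajectories


def perverse_cells(
    mu_c: float, values: Sequence[float], fail_costs: Sequence[float]
) -> List[Tuple[float, float]]:
    """Grid cells (V, C_fail) where xi_safe rewards unreliability: mu_c > V + C_fail."""
    return [(v, c) for v, c in product(values, fail_costs) if mu_c > v + c]


MU_C = 7131.0                              # reported mean reserve
VALUES = [500.0, 2000.0, 5000.0, 8000.0]   # hypothetical business values
FAIL_COSTS = [0.0, 500.0, 1000.0]          # hypothetical failure costs

if __name__ == "__main__":
    cells = perverse_cells(MU_C, VALUES, FAIL_COSTS)
    total = len(VALUES) * len(FAIL_COSTS)
    print(f"perverse-incentive cells under xi_safe: {len(cells)} of {total}")
    # Illustrative counts only: every sampled trajectory failed to parse.
    print("failure rate:", failure_rate(10, 10))
```

Sweeping a grid rather than fixing point estimates reflects that Vθ and Cfail are private to the operator; the insurer only needs to know how much of the plausible region falls in the perverse regime.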
Implications for AI Economics
- Insurance contracts for AI agents must internalise operator strategic behavior (moral hazard and misreporting). Contracts that ignore operator incentives risk perverse equilibria (e.g., deliberately deploying unreliable models to evade tolls).
- Design levers with economic interpretation:
- Auditability and common-control aggregation internalise externalities and prevent arbitrage via entity/session splitting — important for pricing systemic exposure correctly.
- Adjudication rules convert an otherwise unpriced interface failure into a contract-relevant event; the escalation fee κesc is a public lever that can be set without observing private operator parameters if calibrated conservatively (κesc ≥ µc), trading off slack against implementability.
- Model-identity menus and penalty schedules align information revelation incentives and reduce adverse selection/misreporting.
- Trade-offs and costs:
- Strategy-proofness requires extra clauses and can reduce allocative efficiency or require conservative pricing (reflecting Myerson–Satterthwaite / budget-balance trade-offs). Max-settlement and conservative κesc choices can overcharge reliable operators if mis-specified.
- Enforcement/observability: the approach depends on verifiable common-control attribution and reliable detection of interface failures vs. exogenous infra outages. Audit and monitoring costs matter.
- Dynamic and collusion risks remain: multi-period dynamics, operator collusion, and incomplete observability of Vθ and Cfail may complicate tight calibration.
- Practical recommendations for insurers and regulators:
- Include common-control aggregation / related-party clauses and aggregate or max-settlement to block routing arbitrage.
- Treat interface failures as contract events; prefer escalation adjudication with a calibrated κesc (use µc as conservative baseline when operator private costs are unknown).
- Require signed model-identity menus and employ componentwise-min penalty schedules to induce truthful reporting.
- Use sandboxed cross-model testing (as in Paper B) to estimate µc, f by model, and loss statistics to calibrate premiums, κesc, and boundary potentials Φ.
- Open questions for AI economics research:
- How to balance conservatism vs. competitiveness in premiums when µc is large and heterogeneous across operators?
- How should contracts be designed under partial observability of common control (false positives and negatives in attribution) and measurement noise in interface-failure rates?
- How should dynamic contracting handle evolving operator types, strategic multi-period manipulation, and collusion across operators?
Overall, the paper provides a clear, implementable set of contractual clauses — backed by formal proofs and empirical evidence — that together form an incentive-compatibility layer for actuarial control of autonomous-agent side effects.
Assessment
Claims (9)
| Claim | Direction | Outcome | Confidence & Evidence | Details |
|---|---|---|---|---|
| We characterise a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. [Governance And Regulation] | positive | gaming-resistance of the actuarial runtime | Reading fidelity: high; Study strength: high | |
| Two attack surfaces -- post-toll safe-default selection and within-boundary action splitting -- are closed by Paper A's minimal-authority and no-splitting clauses. [Governance And Regulation] | positive | closure/elimination of specified attack surfaces | Reading fidelity: high; Study strength: high | |
| Common-control aggregation prevents cross-boundary re-routing from reducing toll below the boundary potential applied to total exposure. [Governance And Regulation] | positive | prevention of toll-reduction via cross-boundary re-routing | Reading fidelity: high; Study strength: medium | |
| Interface failures such as invalid JSON are contract-relevant events, not safety wins: treating them as zero-toll safe defaults can reward unreliable models, while escalation fees reverse the incentive. [Governance And Regulation] | positive | incentives for interface compliance (treatment of interface failures) | Reading fidelity: high; Study strength: medium | |
| We validate this interface-compliance theorem on committed cross-model traces from the companion empirical paper. [Governance And Regulation] | positive | empirical support for interface-compliance theorem | Reading fidelity: medium; Study strength: low | |
| A model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model weakly dominant. [Governance And Regulation] | positive | truthful reporting (strategy dominance) | Reading fidelity: high; Study strength: high | |
| We compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. [Governance And Regulation] | positive | joint incentive compatibility across attack vectors | Reading fidelity: high; Study strength: medium | |
| A two-parameter premium family discharges operator individual rationality and weak budget balance at the truthful equilibrium. [Governance And Regulation] | positive | operator individual rationality and weak budget balance at equilibrium | Reading fidelity: high; Study strength: medium | |
| The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects. [Governance And Regulation] | positive | existence of an incentive-compatibility layer enabling actuarial control | Reading fidelity: high; Study strength: medium | |