AI agents’ risks hinge on their execution histories, so static prompts and access controls cannot reliably enforce path-dependent rules; firms must evaluate actions at runtime, a change that raises latency, engineering and compliance costs and reshapes markets for governance, insurance and enterprise adoption.

Runtime Governance for AI Agents: Policies on Paths

Maurits Kaptein, Vassilis-Javed Khan, Andriy Podstavnychy · March 17, 2026

arXiv · theoretical · evidence: n/a · relevance: 8/10

Because agent behavior is path-dependent, effective governance requires runtime evaluation of proposed actions against the execution path—prompts and static access controls are only limited special cases.

AI agents (systems that plan, reason, and act using large language models) produce non-deterministic, path-dependent behavior that cannot be fully governed at design time. By "governed" we mean striking the right balance between the highest possible rate of successful task completion and the legal, data-breach, reputational, and other costs of running agents. We argue that the execution path is the central object for effective runtime governance and formalize compliance policies as deterministic functions mapping agent identity, partial path, proposed next action, and organizational state to a policy violation probability. We show that prompt-level instructions (including "system prompts") and static access control are special cases of this framework: the former shape the distribution over paths without actually evaluating them; the latter evaluates deterministic policies that ignore the path (i.e., it can account only for a specific subset of all possible paths). In our view, runtime evaluation is the general case, and it is necessary for any path-dependent policy. We develop a formal framework for analyzing AI agent governance, present concrete policy examples (inspired by the EU AI Act), discuss a reference implementation, and identify open problems including risk calibration and the limits of enforced compliance.

Summary

Main Finding

Runtime governance that evaluates proposed actions conditional on their full execution path is necessary to govern AI agents effectively. The paper formalizes compliance policies as deterministic functions that map agent identity, partial execution path, proposed next action, and organizational state to a policy-violation probability, and shows that common governance measures (prompts, access control, agent-embedded guardrails, content filters, human approvals) are either special cases or insufficient. Effective governance therefore requires an external Policy Engine that performs per-step, path-aware evaluation and manages an organizational risk objective (trade-off between task success and expected violation costs).

Key Points

Problem framing
- AI agents (LLM-driven planners that invoke tools, code, other agents) produce non-deterministic, path-dependent behavior: violations often arise only from sequences of actions, not single steps.
- Traditional governance (RBAC, content filters, design-time verification, prompt-level control) cannot express or reliably enforce path-dependent constraints.
Formal contribution
- Defines an execution path P = (s1, s2, ..., sn), with each step si = (τi, din,i, dout,i) and step types including stochastic LLM calls, deterministic tool invocations, and human interactions.
- Introduces a compliance policy as a deterministic function mapping (agent identity, partial path, proposed next action, organizational state) -> violation probability.
- Frames prompting as shaping the distribution over paths (but not evaluating them), and access control as a context-free restriction that removes actions unconditionally (a special case of the policy that ignores path history).
- Positions runtime evaluation (per-step, path-aware policy checks executed outside the agent) as the general mechanism required for path-dependent policies.
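
The formal objects above can be written down as plain data types. The following is a minimal sketch, not the paper's reference implementation: the class and field names, the tool names `read_ticket` and `shell`, and the 0.9 score are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class Step:
    """One step s_i = (tau_i, d_in_i, d_out_i) of an execution path."""
    step_type: str        # tau_i: "llm_call", "tool_call" or "human_interaction"
    data_in: Any          # d_in,i
    data_out: Any = None  # d_out,i (None for a proposed, not yet executed step)


Path = Sequence[Step]
OrgState = Mapping[str, Any]

# A policy deterministically maps (agent identity, partial path, proposed next
# action, organizational state) to a violation probability in [0, 1].
Policy = Callable[[str, Path, Step, OrgState], float]


def no_shell_after_ticket(agent_id: str, partial_path: Path,
                          proposed: Step, org_state: OrgState) -> float:
    """Hypothetical policy: a shell call is risky once ticket text was read."""
    read_ticket = any(
        s.step_type == "tool_call" and s.data_in.get("tool") == "read_ticket"
        for s in partial_path
    )
    is_shell = (proposed.step_type == "tool_call"
                and proposed.data_in.get("tool") == "shell")
    return 0.9 if (read_ticket and is_shell) else 0.0
```

The same proposed shell call scores 0.0 on a path that never read a ticket and 0.9 on one that did, which is the path dependence that neither a prompt nor a static permission list can express.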
Policy Engine & organizational objective
- Proposes a Policy Engine that:
  - Evaluates policies for proposed next actions across a fleet of agents,
  - Decides to allow, block, require approval, or modify actions,
  - Keeps the expected organizational cost of violations within a prescribed bound while preserving productivity.
- Connects per-step evaluations to a fleet-level risk optimization: maximize expected successful task completions subject to a constraint on expected violation costs (or, in the mirrored formulation, minimize expected violation costs subject to productivity targets).
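
One way to write the fleet-level objective down is shown below. The notation is an assumption made for this sketch and is not taken from the paper: \pi is the Policy Engine's enforcement rule, \mathcal{A} the fleet of agents, S_a(\pi) an indicator of successful task completion by agent a, C_a(\pi) the violation cost it incurs, B the prescribed cost bound, and T a productivity target.

```latex
\[
\max_{\pi}\ \sum_{a \in \mathcal{A}} \mathbb{E}\big[S_a(\pi)\big]
\quad \text{subject to} \quad
\sum_{a \in \mathcal{A}} \mathbb{E}\big[C_a(\pi)\big] \le B
\]

\[
\min_{\pi}\ \sum_{a \in \mathcal{A}} \mathbb{E}\big[C_a(\pi)\big]
\quad \text{subject to} \quad
\sum_{a \in \mathcal{A}} \mathbb{E}\big[S_a(\pi)\big] \ge T
\]
```

The second form swaps objective and constraint; which one an organization uses depends on whether the cost bound or the productivity target is treated as fixed.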
Concrete examples & relevance
- Worked scenarios: a ticket injection that leads to data disclosure; a report-preparing agent that reads restricted data and then emails externally; two agents whose individually authorized actions combine into an information-barrier breach.
- Discusses policy examples inspired by the EU AI Act and how runtime governance supports regulatory compliance.
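
The report-agent scenario can be sketched as a path-dependent policy. This is an illustrative sketch under stated assumptions: the step record layout, the tool names `read_file` and `send_email`, the `label` field, and the internal domain `example.com` are hypothetical, and the hard 0/1 scores stand in for whatever probability a real policy would return.

```python
from typing import Any, Mapping, Sequence

Step = Mapping[str, Any]  # hypothetical record: {"tool": str, "args": dict}

INTERNAL_DOMAIN = "example.com"  # assumed internal mail domain


def read_restricted(path: Sequence[Step]) -> bool:
    """True if any earlier step read a file labelled 'restricted'."""
    return any(
        s["tool"] == "read_file" and s["args"].get("label") == "restricted"
        for s in path
    )


def exfiltration_policy(agent_id: str, partial_path: Sequence[Step],
                        proposed: Step, org_state: Mapping[str, Any]) -> float:
    """Flag an external email only when the path already touched restricted data."""
    if proposed["tool"] != "send_email":
        return 0.0
    external = not proposed["args"]["to"].endswith("@" + INTERNAL_DOMAIN)
    return 1.0 if (external and read_restricted(partial_path)) else 0.0
```

Both the file read and the external email may be individually authorized; the policy only fires on the sequence, so the same email is scored 0.0 on a path with no restricted read.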
Practical architecture & limitations
- Discusses reference implementation choices, trade-offs (latency vs. fidelity of path evaluation), policy authoring, shared state management, and auditability.
- Identifies open problems: calibrating violation risk probabilities, limits of enforceable compliance (e.g., self-modifying agents), scaling cross-agent visibility, and robustly handling delegation and long partial paths.
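
Cross-agent visibility depends on organizational state that no single agent's path contains. The sketch below illustrates the idea for an information barrier; the `SharedOrgState` class, the side names, and the rule that a message transfers all of the sender's exposure are simplifying assumptions, not the paper's design.

```python
from typing import Any, Dict, Mapping, Sequence, Set


class SharedOrgState:
    """Hypothetical shared state: which barrier sides each agent has been exposed to."""

    def __init__(self) -> None:
        self.exposure: Dict[str, Set[str]] = {}  # agent_id -> sides seen

    def record_read(self, agent_id: str, side: str) -> None:
        self.exposure.setdefault(agent_id, set()).add(side)

    def record_message(self, sender: str, receiver: str) -> None:
        # Assumption: a message carries everything the sender has seen.
        inherited = set(self.exposure.get(sender, set()))
        self.exposure.setdefault(receiver, set()).update(inherited)


def barrier_policy(agent_id: str, partial_path: Sequence[Any],
                   proposed: Mapping[str, Any], org_state: SharedOrgState) -> float:
    """Flag an action on one side if the agent was exposed to any other side."""
    seen = org_state.exposure.get(agent_id, set())
    return 1.0 if any(side != proposed["side"] for side in seen) else 0.0
```

In this sketch the receiving agent's own path is clean, so the policy ignores `partial_path` and decides from shared state alone; keeping that state consistent across a large fleet is the scaling problem listed above.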

Data & Methods

Nature of the paper
- Conceptual / theoretical framework paper — no empirical dataset or experimental evaluation is presented.
- Methods are formal definitions and system design arguments rather than statistical or experimental methods.
Formal elements
- Execution path formalism: agent identity + finite sequence of steps (step types, inputs, outputs).
- Policy function: a deterministic function f(agent_id, partial_path, proposed_action, org_state) -> Pr(violation), which serves as the primitive unit of evaluation.
- Policy Engine: an organizational runtime component that queries policy functions for each proposed step, and takes enforcement actions (allow/block/require human approval/modify).
- Risk objective: an organizational-level optimization balancing expected utility (task success/productivity) and expected costs of policy violations; the Policy Engine acts to keep expected violations below bounds.
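
A Policy Engine along these lines could be sketched as follows. The thresholds (0.2 and 0.8), the choice to aggregate policies by taking the maximum score, and all names are assumptions for illustration; the "modify" enforcement action is omitted to keep the sketch short.

```python
from enum import Enum
from typing import Any, Callable, Mapping, Sequence, Tuple

Policy = Callable[[str, Sequence[Any], Any, Mapping[str, Any]], float]


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


class PolicyEngine:
    """Queries every policy for a proposed step and maps the risk to a decision."""

    def __init__(self, policies: Sequence[Policy],
                 approval_threshold: float = 0.2,
                 block_threshold: float = 0.8) -> None:
        if not 0.0 <= approval_threshold <= block_threshold <= 1.0:
            raise ValueError("need 0 <= approval_threshold <= block_threshold <= 1")
        self.policies = list(policies)
        self.approval_threshold = approval_threshold
        self.block_threshold = block_threshold

    def evaluate(self, agent_id: str, partial_path: Sequence[Any],
                 proposed: Any, org_state: Mapping[str, Any]) -> Tuple[Decision, float]:
        # Assumption: the worst-case policy score drives the decision.
        risk = max(
            (p(agent_id, partial_path, proposed, org_state) for p in self.policies),
            default=0.0,
        )
        if risk >= self.block_threshold:
            return Decision.BLOCK, risk
        if risk >= self.approval_threshold:
            return Decision.REQUIRE_APPROVAL, risk
        return Decision.ALLOW, risk
```

Moving the two thresholds is how such an engine would trade productivity against expected violation cost: lowering them sends more steps to human approval or blocking, raising them lets more risk through.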
Relationship to existing mechanisms
- Prompting = modifies prior distribution over paths but does not perform explicit per-step path evaluation.
- Access control = a path-agnostic policy that deterministically rejects disallowed actions (violation score 1) and permits all others (score 0), whatever the execution history.
- Guardrails/content filters = either agent-internal heuristics or per-step content checks; they do not capture cross-step trajectory violations unless integrated in a path-aware external engine.
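
To make the special-case claim concrete, a static access control list can be lifted into the same four-argument policy signature. This is a sketch under assumptions: actions are reduced to plain strings, the agent and action names are made up, and the score is read as a violation probability, so an action outside the list scores 1.0 and a permitted one 0.0.

```python
from typing import Any, Callable, FrozenSet, Mapping, Sequence

Policy = Callable[[str, Sequence[str], str, Mapping[str, Any]], float]


def acl_policy(allowed: Mapping[str, FrozenSet[str]]) -> Policy:
    """Lift a static ACL (agent -> permitted actions) into the policy signature."""

    def policy(agent_id: str, partial_path: Sequence[str],
               proposed: str, org_state: Mapping[str, Any]) -> float:
        # partial_path and org_state are deliberately ignored: that is what
        # makes access control a path-agnostic special case.
        return 0.0 if proposed in allowed.get(agent_id, frozenset()) else 1.0

    return policy
```

Because the returned function never reads `partial_path`, it gives `send_email` the same score whether or not the agent has just read a restricted file, so it cannot represent a constraint that depends on what happened earlier.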
Implementation discussion
- Outlines architectures for integrating the Policy Engine (inline interception of tool calls, asynchronous monitoring, approvals workflow).
- Practical constraints considered: latency, observability, policy expressiveness, need for shared state across agents, audit logs for compliance.
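
Inline interception of tool calls can be sketched as a wrapper around each tool. Everything here is hypothetical: the `governed` decorator, the in-memory `AUDIT_LOG` list standing in for an audit store, the `risk_fn(path, proposed)` signature, and the 0.8 blocking threshold.

```python
import functools
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[Dict[str, Any]] = []  # stand-in for an append-only audit store


class ActionBlocked(Exception):
    """Raised when the policy check refuses a proposed tool call."""


def governed(tool_name: str,
             risk_fn: Callable[[List[Dict[str, Any]], Dict[str, Any]], float],
             path: List[Dict[str, Any]],
             block_threshold: float = 0.8) -> Callable:
    """Wrap a tool so every call is scored against the path before it runs."""

    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:
            proposed = {"tool": tool_name, "args": kwargs}
            risk = risk_fn(path, proposed)
            allowed = risk < block_threshold
            # Log every decision, including refusals, for later audit.
            AUDIT_LOG.append({"proposed": proposed, "risk": risk, "allowed": allowed})
            if not allowed:
                raise ActionBlocked(f"{tool_name} blocked (risk={risk:.2f})")
            result = tool(**kwargs)
            # Only executed steps extend the execution path.
            path.append({**proposed, "result": result})
            return result

        return wrapper

    return decorator
```

The check sits on the critical path of every tool call, which is where the latency cost mentioned above comes from; an asynchronous monitor would avoid that delay but could only react after the action has already run.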

Implications for AI Economics

Governance as an economic constraint on agent deployment
- Runtime governance imposes direct costs (engineering, latency, human review), and indirect costs (reduced agent productivity due to blocking or added friction). These costs must be balanced against the productivity gains from agent automation.
- Firms will face trade-offs: looser governance increases expected violation costs (legal, reputational, regulatory fines), tighter governance increases operational costs and reduces throughput. Optimal governance design becomes an economic optimization problem.
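
The trade-off can be illustrated with a toy calculation. The model below is an assumption made for this example, not something the paper specifies: each proposed action has a violation probability and a task value, a blocked action yields nothing, an allowed action yields its value minus probability times a fixed violation cost, and governance overhead is ignored.

```python
from typing import Iterable, Sequence, Tuple

Action = Tuple[float, float]  # (violation probability, task value)


def expected_net_value(actions: Iterable[Action], threshold: float,
                       violation_cost: float) -> float:
    """Sum of value minus expected violation cost over actions below the threshold."""
    return sum(v - p * violation_cost for p, v in actions if p < threshold)


def best_threshold(actions: Sequence[Action], candidates: Sequence[float],
                   violation_cost: float) -> float:
    """Pick the candidate blocking threshold with the highest expected net value."""
    return max(candidates,
               key=lambda t: expected_net_value(actions, t, violation_cost))


# Made-up numbers: four actions worth 10 each, a violation costs 100.
ACTIONS = [(0.01, 10.0), (0.05, 10.0), (0.2, 10.0), (0.6, 10.0)]
CANDIDATES = [0.02, 0.1, 0.3, 1.0]
```

With these made-up numbers, `best_threshold(ACTIONS, CANDIDATES, 100.0)` selects 0.1: the strictest threshold gives up a profitable action, while the looser ones admit actions whose expected violation cost exceeds their value.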
Compliance and regulatory cost structures
- The framework clarifies how regulatory regimes (e.g., EU AI Act) map to operational requirements: organizations must invest in path-aware runtime controls to credibly demonstrate compliance for high-risk agent use.
- Governance spending becomes a predictable line item affecting marginal economics of agent substitution for human labor, especially in high-regulation sectors (finance, healthcare).
Market and adoption effects
- Firms with better runtime governance (lower expected violation costs at equal productivity) will enjoy competitive advantage. This creates demand for shared Policy Engines, governance-as-a-service, standardized policy languages, and audit tools—new markets and business models.
- Smaller firms or those with limited compliance budgets may delay or limit agent adoption in regulated tasks, shaping uneven diffusion across industries.
Insurance, liability, and externalities
- Quantified path-aware violation probabilities enable insurers to price agent-related risk more accurately; this could lower capital costs for compliant firms and raise premiums for those with weak governance.
- Systemic externalities arise from cross-agent and cross-organization interactions (delegation, shared datasets). Externalities may justify industry-wide standards or shared governance infrastructure to internalize systemic risk.
Labor and productivity
- Runtime governance changes the effective productivity of agents versus humans. When governance requires frequent human approvals for difficult path-dependent checks, the net labor substitution is reduced; conversely, strong automated policy engines that keep violation risk low enable greater automation and labor displacement.
- Investments in governance tooling (policy authoring, path auditing, monitoring) become complementary capital to agent deployments.
Research and measurement needs
- Economically useful metrics: distribution over execution paths, conditional violation probabilities, expected violation cost per task, approval burden per agent, latency impact on throughput. These are required to perform cost-benefit analyses for governance investments.
- Calibration of policy-violation probabilities is both a technical and economic problem: overconfident policies underprice risk; overly conservative policies reduce productivity and raise marginal costs.
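
Calibration can be checked once predicted violation probabilities are paired with audited outcomes. The sketch below uses the Brier score and a binned reliability table, which are standard tools and not something the paper prescribes; the function names and the default of five bins are arbitrary choices.

```python
from typing import List, Sequence, Tuple


def brier_score(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    if not predicted or len(predicted) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(predicted)


def reliability_table(predicted: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed frequency, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows
```

A bin whose observed frequency sits well above its mean prediction points to an overconfident policy that underprices risk; the reverse points to an overly conservative one that blocks more than it needs to.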
Long-run competition and standards
- Firms that develop reusable, verifiable policy languages and scalable Policy Engines can capture rents as platform providers or security vendors.
- Standards (regulatory or industry) that define minimum path-aware governance will shift costs from bespoke solutions to standardized implementations, lowering entry barriers for compliant automation.

Overall, the paper reframes governance as a runtime, path-aware control problem and thereby highlights a class of economic trade-offs (productivity vs. violation risk, upfront governance investment vs. expected downstream costs) that will shape agent adoption, market structure, regulatory compliance costs, and the emergence of governance service markets.

Assessment

Paper Type: theoretical
Evidence Strength: n/a. The paper is a formal/theoretical contribution that provides definitions, formal mappings, and architectural designs but does not present empirical tests or causal inference from observed data.
Methods Rigor: medium. The work offers a clear formal execution model, formalization of policies, and logical arguments showing that prompts and static ACLs are special cases; it also supplies a reference implementation and worked examples. However, it lacks empirical validation, stress-testing in adversarial settings, and measurement of operational costs, which limits assessment of practical effectiveness.
Sample: No empirical sample or observational dataset; the paper uses an abstract execution model for LLM-based agents, formal proofs/arguments, illustrative regulatory-style policy examples, worked examples and possibly small-scale simulations or demonstration traces rather than large-scale field data.
Themes: governance, adoption, org_design, productivity, innovation
Generalizability:
- Relies on full observability of execution paths and reliable logging; may not hold when internal states/actions are partially hidden or encrypted.
- Assumes the policy evaluator can deterministically map (agent, partial_path, proposed_action, org_state) to a meaningful violation score; may be infeasible in complex or ambiguous domains.
- Does not evaluate adversarial agents that intentionally evade logging or craft paths to obscure violations.
- Operational costs (latency, compute) and scalability constraints may limit applicability in high-throughput production systems.
- Focused on LLM-based agent architectures; applicability to other classes of AI (e.g., robotics with rich continuous state) may require adaptation.
- Regulatory and legal enforceability will vary across jurisdictions and industries; the formalism does not resolve all compliance/legal interpretation issues.

Claims (19)

Claim	Category	Direction	Confidence	Outcome	Details
LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.	AI Safety And Ethics	negative	high	path-dependent compliance/safety risk (probability of policy violation conditional on full execution path)	0.02
Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.	Regulatory Compliance	negative	high	ability to detect/enforce path-dependent policy violations (yes/no / coverage of constraints)	0.02
Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.	Governance And Regulation	positive	high	governance effectiveness for path-dependent policies (qualitative/coverage)	0.02
Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.	Governance And Regulation	positive	high	expressiveness of policy formalism (ability to represent targeted constraints)	0.02
Runtime policy evaluation can intercept, score, log, allow/modify/block actions, and update organizational state as part of an agent's execution loop (reference implementation architecture).	Governance And Regulation	positive	high	feasibility of integrating runtime policy evaluator into agent loops (architectural feasibility)	0.02
Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.	Firm Productivity	negative	high	marginal deployment cost (compute/latency/engineering overhead)	0.02
Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).	Task Completion Time	mixed	medium	tradeoff curve between task completion rate and compliance risk (expected violations or penalties)	0.01
A market will develop for third-party governance tools, auditors, and insurers providing policy evaluators, risk calibration, and certification services.	Adoption Rate	positive	medium	emergence of third-party governance services (market development; presence/size of market)	0.01
Path-dependent policies complicate ex post auditing and simple rule-based regulation; regulators may prefer standards requiring runtime evaluation and logging to be enforceable in practice.	Regulatory Compliance	negative	medium	enforceability of regulation (ease of ex post compliance verification)	0.01
Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.	Governance And Regulation	mixed	medium	appropriateness of liability frameworks given probabilistic enforcement (policy suitability)	0.01
Risk calibration—mapping violation probabilities to enforcement actions and thresholds—is a key unsolved operational problem for runtime governance.	Governance And Regulation	null_result	high	existence of calibrated thresholds and procedures (presence/absence)	0.02
Path-dependent behavior increases the complexity of principal–agent contracting and moral hazard between platforms, enterprise customers, and downstream users, requiring richer contract terms (acceptable paths, logging, audit rights).	Governance And Regulation	negative	medium	complexity of contractual arrangements (number/complexity of contract clauses or monitoring requirements)	0.01
High governance costs in regulated/high-risk domains can slow adoption of agentic systems, concentrating deployment in less regulated uses or among large firms that can afford governance infrastructure.	Adoption Rate	negative	medium	rate of adoption of agentic systems across firm sizes and regulated domains	0.01
Standardized runtime governance frameworks could lower per-deployment compliance engineering costs and increase diffusion of agentic systems.	Adoption Rate	positive	medium	per-deployment compliance cost and diffusion rate (adoption)	0.01
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.	Governance And Regulation	neutral	high	degree of control over execution path (distributional shaping vs. path-specific enforcement)	0.02
Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.	Regulatory Compliance	negative	high	coverage of organizational constraints by static ACLs (proportion of constraints capturable)	0.02
The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.	Governance And Regulation	positive	high	representability of regulation-inspired policies in the formalism (yes/no; example coverage)	0.02
No large empirical dataset or large-scale field experiments were used; the work is primarily theoretical/formal with simulations and worked examples rather than empirical validation.	Research Productivity	null_result	high	use of empirical data (presence/absence of large-scale empirical evaluation)	0.02
Measuring the marginal cost of runtime governance, the tradeoff curve between task completion and compliance risk, and calibrating violation probabilities are open empirical research questions identified by the paper.	Research Productivity	null_result	high	existence of empirical research gaps (identified/not identified)	0.02