A formal leverage ratio shows AI agents can only replace human labor up to hard bounds set by information-transfer limits and irreducible planning, while repeated tasks can yield greater displacement only to the extent of prior planning investment.
We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result. The denominator decomposes into three channels through which a conserved per-task information requirement must flow, each with its own time-cost scalar. We show that information density itself is directional and bounded by separate ceilings on human-to-agent and agent-to-human flow, and that the asymptotic behavior of leverage decomposes into two scaling axes (capability and memory) with a non-zero floor on the planning term set by irreducible task novelty bounded by human throughput. We extend this per-task analysis to a windowed leverage measure that accommodates recurring tasks, spawned subtasks, and amortized system-design investment. The per-task ceiling does not bind the windowed measure, though both remain bounded: $L_{\text{task}}$ by per-task novelty, $L_{\text{window}}$ by the stock of accumulated planning investment that pays out within the window. The framework operationalizes aspects of earlier qualitative work on supervisory control (Sheridan, 1992), common ground (Clark & Brennan, 1991), and mixed-initiative interaction (Horvitz, 1999) within a single normative ratio, and produces a list of testable empirical questions that we leave as open problems.
Summary
Main Finding
The paper proposes a normative, per-task leverage ratio for human–agent collaboration:

L_task = H_displaced / (t_planning + sum_j t_interrupt_j + t_review)

It shows how that ratio decomposes into three information-exchange channels (planning, interrupt resolution, review) with directional information-density ceilings (ρ_in for human→agent, ρ_out for agent→human).

Key theoretical results:
- Information conservation: task information I_task must be allocated across the three channels, so channel allocations trade off against one another.
- Directional asymmetry: ρ_in (human→agent, limited by how fast a human can produce input) has a much tighter ceiling (≈ human speech/typing rates) than ρ_out (agent→human, delivered in agent-chosen modalities).
- Asymptotics: as shared memory M and capability grow, interrupts and review can shrink to zero, but planning approaches a nonzero floor set by irreducible task novelty I_novel. Per-task leverage therefore has a finite ceiling: L_max_task = H_displaced * ρ_in_max / I_novel.
- Windowed leverage: L_window(T), the total H_displaced in a time window divided by operator hours, allows amortization of one-time investments (templates, memories) and can grow beyond the per-task ceiling. It is bounded by the accumulated planning investment that pays out in the window.
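The ratio and its ceiling can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names and all numeric inputs are assumptions, and the ρ_in_max value simply converts the roughly 20 bits/sec speech figure quoted in the summary into bits per hour.

```python
def per_task_leverage(h_displaced, t_planning, t_interrupts, t_review):
    """L_task = H_displaced / (t_planning + sum_j t_interrupt_j + t_review).

    All arguments are in hours; t_interrupts is a list of per-interrupt durations.
    """
    human_time = t_planning + sum(t_interrupts) + t_review
    if human_time <= 0:
        raise ValueError("total human time must be positive")
    return h_displaced / human_time


def per_task_ceiling(h_displaced, rho_in_max, i_novel):
    """L_max_task = H_displaced * rho_in_max / I_novel.

    rho_in_max is in bits/hour and i_novel in bits, so the planning floor
    i_novel / rho_in_max is in hours.
    """
    if i_novel <= 0:
        raise ValueError("irreducible novelty must be positive")
    return h_displaced * rho_in_max / i_novel


# Illustrative inputs (assumed, not taken from the paper).
leverage = per_task_leverage(
    h_displaced=8.0, t_planning=0.5, t_interrupts=[0.125, 0.125], t_review=0.25
)  # 8.0 hours displaced / 1.0 hour of human time = 8.0

ceiling = per_task_ceiling(
    h_displaced=8.0, rho_in_max=72_000, i_novel=18_000
)  # planning floor is 0.25 hours, so the ceiling is 32.0

print(leverage, ceiling)
```

In this example the observed leverage (8.0) sits well below the ceiling (32.0); the gap is the headroom that better memory and workflow design could in principle recover before the planning floor binds.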
Key Points
- Definitions and primitives:
- H_displaced: human work (hours) removed by the agent.
- t_planning, t_interrupt_j, t_review: human hours spent in each phase.
- Information density ρ = I_conveyed / t_exchange (bits/hour); directional decomposition into ρ_in and ρ_out and an effective ρ_eff depending on the split α of information direction.
- Channel cost scalars c_p, c_i,j, c_r capture overheads beyond raw bit transfer (context switching, rework).
- Structural claims:
- The numerator (H_displaced) is driven by capability; the denominator is driven by workflow and information flow.
- ρ_out has greater headroom than ρ_in; raising ρ_out won’t help exchanges bottlenecked on ρ_in.
- Investments map cleanly to variables: capability → H_displaced; workflow design → channel cost scalars c_term; memory/templates → ρ(M) and the distribution of I_task across channels.
- Empirical and falsifiable claims:
- The directional asymmetry prediction: interventions that increase ρ_in (e.g., faster input modalities) should reduce time more in high-α phases (planning); interventions that increase ρ_out (better visualizations) should reduce time more in low-α phases (review).
- The existence of a planning floor (I_novel / ρ_eff) that cannot be eliminated by capability alone.
- Practical regimes beyond single tasks:
- Recurring tasks: planning amortizes across runs, so per-run leverage can exceed the per-task ceiling.
- Spawned-task hierarchies: parent tasks transfer memory M to children, lowering child planning costs.
- System-design investments (templates, skills, memory) are one-time costs that pay back over windows.
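The amortization effect in the recurring-task regime can be illustrated with a small Python sketch. It assumes a simple accounting in which L_window is total displaced hours divided by total operator hours, with any one-time system-design investment added to the denominator; the function name, the `(h_displaced, operator_hours)` pair representation, and the numbers are hypothetical.

```python
def windowed_leverage(runs, one_time_hours=0.0):
    """L_window = total H_displaced in the window / total operator hours.

    runs: list of (h_displaced, operator_hours) pairs completed in the window.
    one_time_hours: system-design investment (templates, memory) paid once.
    """
    total_displaced = sum(h for h, _ in runs)
    total_operator = one_time_hours + sum(t for _, t in runs)
    if total_operator <= 0:
        raise ValueError("operator hours must be positive")
    return total_displaced / total_operator


# Assumed scenario: the first run pays full planning (1.0 h of operator time);
# four repeats reuse the plan and need only 0.25 h each.
first_run = (8.0, 1.0)
repeat_run = (8.0, 0.25)

single = windowed_leverage([first_run])  # 8.0 / 1.0 = 8.0
window = windowed_leverage([first_run] + [repeat_run] * 4, one_time_hours=2.0)
# 40.0 displaced hours / (2.0 + 1.0 + 1.0) operator hours = 10.0

print(single, window)
```

Even after charging two hours of up-front design, the window as a whole outperforms the single run, because the planning cost is paid once and reused.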
Data & Methods
- This is a theoretical/analytical paper (no empirical dataset). Methods:
- Formal definitions and algebraic decomposition of human–agent interaction into information-theoretic and time-cost terms.
- Use of Shannon bits to define information density and algebraic combination (harmonic mean for mixed-direction exchanges).
- Asymptotic analysis with two axes: capability (affects H_displaced) and memory M (affects ρ_in, ρ_out and hence denominator).
- Proposal of measurement protocols and open empirical problems (10 listed), including:
- How to operationalize and measure ρ(M) (functional form).
- Measuring channel cost scalars c_p, c_i, c_r via field studies and timing.
- Two measurement strategies for H_displaced: ensemble estimation and A/B timing.
- Falsification protocol for directional asymmetry (within-subjects interventions changing ρ_in vs ρ_out and measuring per-phase time deltas).
- Open technical problems: formal scheduling for L_window with dependencies, recursive task hierarchies, failure-mode analysis when p_fail > 0.
- Assumptions & limitations:
- Uses “useful” bits and assumes tasks succeed (p_fail = 0 in base formula).
- Human input rate ρ_in constrained by physiological/behavioral ceilings (speech ≈ 20 bits/sec).
- Memory is treated as reliable; memory quality/staleness is a separate engineering concern.
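The harmonic-mean combination and the directional-asymmetry prediction can be sketched as follows. The exact functional form is an assumption: here α is taken to be the share of an exchange's bits flowing human→agent, so the time per bit is α/ρ_in + (1 − α)/ρ_out and ρ_eff is the corresponding weighted harmonic mean. The function names, rates, and α values are illustrative only.

```python
def rho_eff(alpha, rho_in, rho_out):
    """Weighted harmonic mean of directional densities (bits/hour).

    alpha is the assumed fraction of bits flowing human -> agent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 / (alpha / rho_in + (1.0 - alpha) / rho_out)


def exchange_hours(bits, alpha, rho_in, rho_out):
    """Human time needed to move `bits` through a mixed-direction exchange."""
    return bits / rho_eff(alpha, rho_in, rho_out)


def hours_saved(bits, alpha, rho_in, rho_out, in_gain=1.0, out_gain=1.0):
    """Time saved when rho_in and/or rho_out are multiplied by a gain factor."""
    before = exchange_hours(bits, alpha, rho_in, rho_out)
    after = exchange_hours(bits, alpha, rho_in * in_gain, rho_out * out_gain)
    return before - after


# Assumed rates: rho_in = 72,000 bits/h (20 bits/sec); rho_out four times higher.
RHO_IN, RHO_OUT, BITS = 72_000.0, 288_000.0, 18_000.0

# Planning is modeled as a high-alpha phase, review as a low-alpha phase.
for phase, alpha in [("planning", 0.9), ("review", 0.1)]:
    from_in = hours_saved(BITS, alpha, RHO_IN, RHO_OUT, in_gain=2.0)
    from_out = hours_saved(BITS, alpha, RHO_IN, RHO_OUT, out_gain=2.0)
    print(f"{phase}: doubling rho_in saves {from_in:.4f} h, "
          f"doubling rho_out saves {from_out:.4f} h")
```

Under these assumptions, doubling ρ_in saves more time in the high-α phase and doubling ρ_out saves more in the low-α phase, which is the pattern the falsification protocol is designed to test. Because ρ_in is the slower channel, the low-α phase has to be strongly agent→human dominated before ρ_out becomes the better lever.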
Implications for AI Economics
- Investment prioritization and ROI
- Distinguish investments that raise numerator (model capability, skills, tooling) from those that reduce denominator (workflow design, templates, input/output modalities).
- To raise per-task ceilings, prioritize interventions that increase ρ_in (better input modalities, intent compression). To scale across many tasks or windows, invest in memory, templates, and system design that amortize planning cost.
- Different interventions have phase-specific returns: improving ρ_out (visualizations, structured outputs) yields biggest payoff where agent→human flow dominates (review); improving ρ_in yields biggest payoff for planning-dominated phases.
- Measurement of labor displacement and productivity accounting
- Provides a unit of analysis (L_task and L_window) useful for estimating displaced human-hours per operator-hour spent, a micro-founded complement to aggregate productivity metrics.
- Highlights the measurement challenge: H_displaced is counterfactual and must be estimated via ensemble counterfactuals or matched A/B human-vs-agent timing, each with biases. Policy and firm-level measurement should triangulate methods.
- Dynamics of automation and labor demand
- The per-task ceiling implies that many tasks are intrinsically limited by irreducible novelty; these tasks are less prone to full automation unless breakthroughs in input bandwidth or intent compression occur.
- However, window-level amortization and recurring tasks imply large-scale automation gains are feasible via one-time investments (templates, memories), producing strong scale economies and potentially rapid labor displacement in sectors with many recurring or isomorphic tasks.
- Accumulating shared memory (building M) becomes an economic lever: firms that invest in shared memory, templates, and orchestration can extract outsized leverage per operator-hour, yielding competitive advantages and possibly winner-take-most dynamics.
- Policy and distributional considerations
- Heterogeneous task structure across occupations matters: jobs with low I_novel relative to H_displaced are most automatable; jobs with high irreducible novelty remain resilient.
- Returns to automation investment depend on the ability to amortize planning costs (task recurrence, spawned subtasks, platformization). Labor-market impacts will therefore be uneven across firms and sectors.
- Regulatory/measurement implications: accurate public statistics on AI-driven labor displacement require standardized ways to estimate H_displaced, I_novel, and amortized system-design investments.
- Research agenda for economists
- Empirically quantify ρ_in and ρ_out in real workflows, measure functional form ρ(M), and estimate channel cost scalars c_term across occupations.
- Model firm-level decisions: optimize tradeoffs between capability investments vs. workflow/memory investments given their different returns to L_task and L_window.
- Incorporate the per-task ceiling and amortization effects into macro models of automation, wage-setting, and transition dynamics (employment reallocation, retraining needs).
- Operational recommendations for practitioners
- Measure per-phase time and bits transferred where feasible; prioritize interventions according to which phase is bottlenecked by which ρ.
- Invest in system-level memory and templates to maximize L_window; prioritize ρ_in improvements when seeking higher per-task ceilings.
- Use the falsification protocol and the proposed measurements to decide whether to invest in input modalities, output modalities, capability, or memory/system design.
Overall, the framework supplies a micro-founded, testable unit (per-task and windowed leverage) that connects information-theoretic constraints and workflow engineering to economic outcomes around automation, offering concrete hypotheses and an empirical program for AI economists and firms to prioritize investments and measure displacement.
Assessment
Claims (8)
| Claim | Topic | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result. | Task Allocation | positive | high | human work displaced per unit human time (per-task leverage) | 0.02 |
| The denominator decomposes into three channels through which a conserved per-task information requirement must flow, each with its own time-cost scalar (specify the task, resolve mid-run interrupts, and review the result). | Task Allocation | positive | high | components of human time cost (specification, interrupt resolution, review) | 0.02 |
| Information density itself is directional and bounded by separate ceilings on human-to-agent and agent-to-human flow. | Automation Exposure | positive | high | directional information flow bounds between human and agent | 0.02 |
| The asymptotic behavior of leverage decomposes into two scaling axes (capability and memory) with a non-zero floor on the planning term set by irreducible task novelty bounded by human throughput. | Developer Productivity | positive | high | leverage scaling behavior and lower bound on planning term | 0.02 |
| We extend this per-task analysis to a windowed leverage measure that accommodates recurring tasks, spawned subtasks, and amortized system-design investment. | Task Allocation | positive | high | windowed leverage (aggregated leverage over a time window accounting for amortization and spawned tasks) | 0.02 |
| The per-task ceiling does not bind the windowed measure, though both remain bounded: L_task by per-task novelty, L_window by the stock of accumulated planning investment that pays out within the window. | Task Allocation | positive | high | bounds on L_task and L_window (per-task novelty and accumulated planning investment) | 0.02 |
| The framework operationalizes aspects of earlier qualitative work on supervisory control (Sheridan, 1992), common ground (Clark & Brennan, 1991), and mixed-initiative interaction (Horvitz, 1999) within a single normative ratio. | Organizational Efficiency | positive | high | conceptual operationalization of supervisory control/common ground/mixed-initiative into leverage metric | 0.12 |
| The framework produces a list of testable empirical questions that we leave as open problems. | Research Productivity | positive | high | set of testable empirical research questions derived from the framework | 0.02 |