AI drafts can make workers look faster while slowing down the system: under congestion, reviewers cut scrutiny and escaped AI errors cause costly rework, so AI reduces backlogs only when it handles enough tasks and review plus rework takes less human time than doing the task manually.
Quantifying the workplace productivity effects of Generative Artificial Intelligence is now central to economics, management, and public policy. The deployment of AI tools in customer service, writing, software development, and consulting operations has been reported to generate large per-task productivity gains, typically measured as tasks completed per worker-hour or reductions in mean handle time. We argue that such mean-based metrics can misrepresent AI's effects in workflows where tasks accumulate and compete for scarce human attention. AI assistance can generate a deceptive productivity signature: average completion times fall because AI tools typically supply a fast first draft, yet workflow-level performance deteriorates when a subset of AI errors escapes review and returns as costly downstream rework. We call this divergence between mean task speed and system-level delay the variance wedge. Depending on the operational parameters, the most time-efficient way to complete a workflow may undergo a transition between two task-processing regimes, a fully AI-assisted and a fully manual one. We formalize the mechanism as a queueing model and derive two main implications analytically. First, under congestion, reviewers rationally raise the risk threshold for checking AI outputs, reducing scrutiny precisely when it would matter the most. Second, AI assistance can stabilize an overloaded workflow only when (i) the fraction of tasks handled by AI exceeds a critical threshold, and (ii) the human attention required for review and expected rework is lower than the attention for manual completion, a requirement substantially more stringent than faster draft generation. These results suggest that AI deployment should be evaluated not only by average task speed, but by its overall effects on congestion, rework, and the robustness of human oversight under load.
Summary
Main Finding
AI can reduce average per-task human time while increasing workflow-level delays when tasks compete for scarce human attention. In an M/G/1 queueing model with AI-produced drafts and imperfect human review, AI creates a “variance wedge”: a lower mean service time but a heavier right tail (rework), and because queueing delay depends on the second moment of service time, occasional costly rework cases can lengthen waiting times despite faster typical task completion.
Key Points
Model setup
- Tasks arrive at rate λ and are routed to manual handling with probability 1−x or AI-assisted handling with probability x.
- Manual service time TH: mean τH and variance σH^2.
- AI route: the reviewer spends r hours on each draft; the residual miss probability is p(r) = p∞ + (p0 − p∞) e^{−κ r}; each escaped error produces rework R with mean µR and second moment µR,2. Total human-attention time on the AI route is TA = r + M R, where M is the indicator of an escaped error, so its mean is τA = r + p(r) µR.
- The system is an M/G/1 queue with server capacity C (human-attention hours per calendar time).
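The setup above can be turned into the two moments that the queueing formulas need. The following is a minimal sketch, not the paper's code: it assumes the review time r is deterministic and that the escape indicator M is independent of the rework time R, and all function names are hypothetical.

```python
import math

def miss_probability(r, p0, p_inf, kappa):
    """Residual miss probability p(r) = p_inf + (p0 - p_inf) * exp(-kappa * r)."""
    return p_inf + (p0 - p_inf) * math.exp(-kappa * r)

def ai_route_moments(r, p, mu_R, mu_R2):
    """First and second moments of T_A = r + M * R.

    Assumes M ~ Bernoulli(p) is independent of R and r is deterministic,
    so E[T_A^2] = r^2 + 2 r p mu_R + p mu_R2 (using M^2 = M).
    """
    mean = r + p * mu_R
    second = r ** 2 + 2 * r * p * mu_R + p * mu_R2
    return mean, second

def mixed_moments(x, tau_H, sigma_H2, mean_A, second_A):
    """Moments m(x; r) and q(x; r) when a share x of tasks takes the AI route."""
    m = (1 - x) * tau_H + x * mean_A
    q = (1 - x) * (sigma_H2 + tau_H ** 2) + x * second_A
    return m, q

# Illustrative (made-up) parameters: half an hour of review, rework mean 4 hours.
p = miss_probability(r=0.5, p0=0.3, p_inf=0.05, kappa=2.0)
mean_A, second_A = ai_route_moments(r=0.5, p=p, mu_R=4.0, mu_R2=20.0)
print(mixed_moments(x=0.5, tau_H=1.0, sigma_H2=0.25, mean_A=mean_A, second_A=second_A))
```

The second moment is where rework enters most strongly: the term p µR,2 can dominate q even when p is small.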
Central analytical expression (Pollaczek–Khinchine)
- Average queue waiting time: Wq(x;r) = λ q(x;r) / [2 C (C − λ m(x;r))], or equivalently Wq = [ρ/(1−ρ)] · [(1 + c_s^2)/2] · (m/C), where m = E[T], q = E[T^2], ρ = λ m / C, and c_s^2 = q/m^2 − 1 is the squared coefficient of variation (CV) of service time.
- Waiting time is jointly determined by congestion (ρ/(1−ρ)), mean service time m, and service-time variability c_s^2.
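As a quick numerical check, the two forms of the waiting-time expression can be coded side by side. This is a sketch with hypothetical function names; the choice to return infinity for an unstable queue (λ m ≥ C) is ours, not the paper's.

```python
import math

def pk_wait(lam, m, q, C):
    """Mean queueing delay W_q = lam * q / (2 C (C - lam * m))."""
    if lam * m >= C:
        return math.inf  # unstable: the backlog grows without bound
    return lam * q / (2 * C * (C - lam * m))

def pk_wait_cv_form(lam, m, q, C):
    """Same quantity written as [rho/(1-rho)] * [(1 + c_s^2)/2] * (m/C)."""
    rho = lam * m / C
    if rho >= 1:
        return math.inf
    cs2 = q / m ** 2 - 1  # squared coefficient of variation of service time
    return rho / (1 - rho) * (1 + cs2) / 2 * (m / C)

# Exponential service with mean 0.5 has second moment 2 * 0.5**2 = 0.5.
print(pk_wait(1.0, 0.5, 0.5, 1.0), pk_wait_cv_form(1.0, 0.5, 0.5, 1.0))
```

Both functions agree because q = m^2 (1 + c_s^2); the second form simply separates the congestion, variability, and mean-service factors.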
The variance wedge (main tradeoff)
- Comparing pure manual (x=0) and pure-AI (x=1) systems, AI yields lower average waiting time only if (1 + c_A^2) / (1 + c_H^2) < (τH / τA)^2 · (1 − λ τA / C) / (1 − λ τH / C).
- Interpretation: AI must not only lower mean human time (τA < τH) but also keep variability small enough. If AI creates a sufficiently heavier tail (large c_A^2), waiting time can increase even when τA < τH.
- The queue’s tolerance for extra variability (the “AI variance budget”) grows when AI produces large mean savings and when the manual system is near saturation.
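The comparison condition can be recovered from the Pollaczek–Khinchine expression in a few lines. The sketch below assumes both pure systems are stable (λ τ_j < C for j ∈ {H, A}) and writes the second moment of each system as τ_j^2 (1 + c_j^2).

```latex
\begin{align*}
W_q^{(j)} &= \frac{\lambda\,\tau_j^{2}\,(1 + c_j^{2})}{2C\,(C - \lambda\tau_j)},
  \qquad j \in \{H, A\}, \\[4pt]
W_q^{(A)} < W_q^{(H)}
  &\iff \frac{\tau_A^{2}\,(1 + c_A^{2})}{C - \lambda\tau_A}
      < \frac{\tau_H^{2}\,(1 + c_H^{2})}{C - \lambda\tau_H} \\[4pt]
  &\iff \frac{1 + c_A^{2}}{1 + c_H^{2}}
      < \left(\frac{\tau_H}{\tau_A}\right)^{2}
        \frac{1 - \lambda\tau_A / C}{1 - \lambda\tau_H / C}.
\end{align*}
```

The right-hand side is the variance budget: it grows with the mean saving τH/τA and diverges as the manual system approaches saturation (λ τH → C).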
Reviewer behavior under congestion
- Because review effort competes with all queued tasks, congestion raises the opportunity cost of checking individual drafts. Optimal (rational) behavior is to raise the risk threshold for inspection under heavier load—i.e., check fewer drafts and reduce scrutiny precisely when errors are most consequential.
Stability / adoption threshold
- AI can stabilize an overloaded workflow only if (i) the AI adoption share x exceeds a critical threshold x*, and (ii) the AI-route expected human attention is lower than manual handling.
- The stability boundary solves λ m(x; r) = C. With m(x;r) = τH + x(τA − τH), the critical share (when τA ≠ τH) is x* = (C/λ − τH) / (τA − τH), so a feasible stabilizing adoption requires x > x* (and x* ∈ [0,1]). Concretely, a necessary (but not sufficient) condition is τA = r + p(r) µR < τH.
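The threshold is easy to evaluate for given parameters. The sketch below uses made-up numbers for an overloaded manual workflow (λ τH > C) and a hypothetical function name; it also shows why τA < τH is necessary but not sufficient.

```python
def critical_share(lam, C, tau_H, tau_A):
    """Share x* solving lam * m(x) = C, with m(x) = tau_H + x * (tau_A - tau_H)."""
    if tau_A == tau_H:
        raise ValueError("x* is undefined when tau_A equals tau_H")
    return (C / lam - tau_H) / (tau_A - tau_H)

# Overloaded manual system: 10 tasks/hour at 1 hour each against capacity 8.
lam, C, tau_H = 10.0, 8.0, 1.0

# Large saving per task: at least about half of all tasks must take the AI route.
print(critical_share(lam, C, tau_H, tau_A=0.6))

# Small saving per task: x* exceeds 1, so even full adoption cannot stabilize
# the queue although tau_A < tau_H.
print(critical_share(lam, C, tau_H, tau_A=0.9))
```

In the second case the AI route is faster on average, yet no adoption share in [0, 1] brings λ m(x) below C.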
Relation to feedback queues
- Rework acts as Bernoulli feedback: completed AI-routed tasks can re-enter the queue for correction, amplifying endogenous traffic and congestion even if exogenous arrival λ is unchanged.
Data & Methods
- This is a theoretical / modeling paper; no new field experiment or administrative dataset is introduced.
- Analytical methods:
  - Queueing-theoretic analysis of an M/G/1 system using the Pollaczek–Khinchine formula for mean queue waiting time.
  - Kingman's approximation is noted for non-Poisson or bursty arrival processes; it replaces the variability factor (1 + c_s^2)/2 with (c_a^2 + c_s^2)/2, where c_a^2 is the squared CV of interarrival times.
  - Bernoulli-feedback interpretation for rework; closed-form expressions for m(x;r) and q(x;r) are derived for the mixed system (manual + AI).
- Illustrations / calibration:
  - Simulated example distributions (spike-at-review-time plus long rework tail) show realistic parameterizations where τA < τH but c_A^2 ≫ c_H^2, producing longer queues under AI.
- Key parameters to measure/estimate in practice:
  - τH, σH^2 (manual mean and variance)
  - review time r and skill κ (shape of p(r))
  - residual miss probabilities p0, p∞
  - rework mean µR and higher moments
  - arrival rate λ and reviewer capacity C
  - AI routing share x
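A spike-plus-tail service distribution of the kind described above can be simulated with the standard library alone. This is an illustrative sketch, not the paper's simulation: the gamma and exponential distributions, every parameter value, and the function names are assumptions, and capacity is normalized to C = 1. With these numbers the Pollaczek–Khinchine formula gives mean waits of roughly 2.5 (manual) and 6.0 (AI) in the same time units, even though the AI route has the lower mean service time (0.8 versus 1.0).

```python
import random

def lindley_mean_wait(draw_service, lam, n, rng):
    """Mean queueing delay over n arrivals of a FIFO single-server queue.

    Uses the Lindley recursion W_next = max(0, W + S - A) with Poisson
    arrivals at rate lam and capacity normalized to C = 1.
    """
    wait, total = 0.0, 0.0
    for _ in range(n):
        total += wait                  # delay experienced by this task
        service = draw_service(rng)    # human-attention time it consumes
        gap = rng.expovariate(lam)     # time until the next arrival
        wait = max(0.0, wait + service - gap)
    return total / n

def draw_manual(rng):
    # Assumed manual handling: gamma with mean 1.0 and variance 0.25.
    return rng.gammavariate(4.0, 0.25)

def draw_ai(rng):
    # Assumed AI route: fixed review r = 0.3, then with probability 0.1
    # an escaped error triggers exponential rework with mean 5.0.
    r, p, mean_rework = 0.3, 0.1, 5.0
    rework = rng.expovariate(1.0 / mean_rework) if rng.random() < p else 0.0
    return r + rework

rng = random.Random(0)
lam, n = 0.8, 200_000
wait_manual = lindley_mean_wait(draw_manual, lam, n, rng)
wait_ai = lindley_mean_wait(draw_ai, lam, n, rng)
print(f"manual mean wait: {wait_manual:.2f}, AI mean wait: {wait_ai:.2f}")
```

Because the rework tail is heavy, the AI estimate converges slowly; longer runs or several seeds give a steadier comparison.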
Implications for AI Economics
Measurement and empirical work
- Mean-based productivity metrics (tasks per hour, mean handle time) can be misleading for workflow-level performance when tasks queue for scarce human attention. Analysts should report full service-time distributions (or at least second moments and rework rates), not just means.
- Macro/metering frameworks that multiply task exposure by mean cost savings risk overstating gains if they omit rework and variance effects; second-moment terms should be included.
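In practice, the extra statistics recommended above can be computed from the same task logs that produce mean handle time. The sketch below assumes a hypothetical log of per-task human-attention hours plus a flag for tasks that came back as rework; the function name and the data are illustrative only.

```python
from statistics import fmean

def attention_summary(times, reworked):
    """Summarize per-task human-attention times beyond the mean.

    times: human-attention hours per task (including any rework).
    reworked: parallel booleans, True if the task returned as rework.
    """
    m = fmean(times)
    q = fmean([t * t for t in times])  # second moment E[T^2]
    return {
        "mean": m,
        "second_moment": q,
        "squared_cv": q / m ** 2 - 1,
        "rework_rate": sum(reworked) / len(reworked),
    }

# Nine quick reviews and one task that needed five extra hours of rework.
times = [0.3] * 9 + [5.3]
reworked = [False] * 9 + [True]
print(attention_summary(times, reworked))
```

A single reworked task out of ten leaves the mean at 0.8 hours but pushes the squared CV above 3, which is the quantity the waiting-time formula responds to.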
Organizational policy and deployment
- Simply speeding up upstream generation (faster drafts) is not sufficient; success requires reducing expected downstream human attention (r + p(r) µR) below the manual benchmark τH.
- Firms should measure rework rates and tail risks and invest in either (i) improving AI reliability (lower p(r) for given r), (ii) more effective review tools that lower r for the same miss rate, or (iii) increasing review capacity C (especially for bottleneck senior reviewers).
- Partial adoption can worsen congestion unless adoption exceeds the critical share x*; rollout plans should consider this threshold or phase capacity expansion with adoption.
Labor markets and tasks
- Bottlenecks may shift toward verification, review, and incident-handling roles; demand for skilled reviewers and specialists may rise even as average generation time falls.
- Wages and staffing models should reflect a shift in attention needs and tail-risk management rather than only average time savings.
Policy and regulation
- In high-stakes domains (medical, legal, safety-critical engineering), the variance wedge highlights systemic risk from rare but costly AI errors that escape review. Regulation and auditing should target not only average error rates but also tail behavior, rework/feedback pathways, and capacity constraints on oversight.
- Disclosure/benchmarking standards for AI in workflows should include residual error after feasible review, expected rework burden, and recommended review effort r for target miss probabilities.
Bottom line: Evaluate AI deployment by its effect on the full distribution of human-attention demand (mean and variance, including rework feedback), the bottleneck review capacity, and the adoption share — not by mean per-task speed alone.
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Mean-based metrics (e.g., tasks completed per worker-hour or mean handle time) can misrepresent AI's effects in workflows where tasks accumulate and compete for scarce human attention. | Task Completion Time | negative | high | task_completion_time | 0.12 |
| AI assistance can generate a deceptive productivity signature: average completion times fall because AI tools typically supply a fast first draft, yet workflow-level performance can deteriorate when a subset of AI errors escapes review and returns as costly downstream rework. | Task Completion Time | mixed | high | task_completion_time | 0.12 |
| The divergence between mean task speed and system-level delay caused by AI assistance is labeled the 'variance wedge'. | Task Completion Time | neutral | high | task_completion_time | 0.12 |
| Depending on operational parameters, the most time-efficient way to complete a workflow may undergo a transition between two task-processing regimes: a fully AI-assisted regime and a fully manual regime. | Task Allocation | mixed | high | task_allocation | 0.12 |
| Under congestion, reviewers rationally raise the risk threshold for checking AI outputs, reducing scrutiny precisely when it would matter the most. | Decision Quality | negative | high | decision_quality | 0.12 |
| AI assistance can stabilize an overloaded workflow only when (i) the fraction of tasks handled by AI exceeds a critical threshold, and (ii) the human attention required for review and expected rework is lower than the attention required for manual completion. | Organizational Efficiency | positive | high | organizational_efficiency | 0.12 |
| The requirement that review + expected rework attention be lower than manual completion attention is substantially more stringent than the requirement that AI merely generate faster drafts. | Developer Productivity | negative | high | developer_productivity | 0.12 |
| AI deployment should be evaluated not only by average task speed, but by its overall effects on congestion, rework, and the robustness of human oversight under load. | Governance And Regulation | neutral | high | organizational_efficiency | 0.12 |