Because AI performance improves with diminishing returns to data, compute, and model size, near-perfect accuracy is disproportionately costly, so firms often choose partial human-AI collaboration rather than full automation; calibrated to computer vision, cost-effective automation covers roughly 11% of exposed wages at the firm level but can scale far higher when AI services spread fixed costs across users.

Economics of Human and AI Collaboration: When is Partial Automation More Attractive than Full Automation?

Wensu Li, Atin Aboutorabi, Harry Lyu, Kaizhi Qian, Martin Fleming, Brian C. Goehring, Neil Thompson · March 31, 2026

Source: arXiv · Paper type: theoretical · Evidence strength: medium · Relevance: 8/10

Modeling automation intensity as a continuous choice and calibrating with scaling laws, task data, expert surveys, and GPT-4o decompositions, the paper shows that convex costs of higher AI accuracy typically make partial human-AI collaboration cost-optimal; cost-effective automation covers about 11% of computer-vision-exposed labor compensation at the firm level and a much larger share under widespread deployment.

This paper develops a unified framework for evaluating the optimal degree of task automation. Moving beyond binary automate-or-not assessments, we model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation. On the supply side, we estimate an AI production function via scaling-law experiments linking performance to data, compute, and model size. Because AI systems exhibit predictable but diminishing returns to these inputs, the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly. Full automation is therefore often not cost-minimizing; partial automation, where firms retain human workers for residual tasks, frequently emerges as the equilibrium. On the demand side, we introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level. We calibrate the framework with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, implementing it in computer vision. Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation. Scale of deployment is a key determinant: AI-as-a-Service and AI agents spread fixed costs across users, sharply expanding economically viable tasks. At the firm level, cost-effective automation captures approximately 11% of computer-vision-exposed labor compensation; under economy-wide deployment, this share rises sharply. Since other AI systems exhibit similar scaling-law economics, our mechanisms extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.

Summary

Main Finding

Partial automation — where firms choose an intermediate AI accuracy and humans handle the residual uncertainty — is frequently the cost‑minimizing long‑run outcome. Because model performance follows scaling laws with sharply diminishing returns, the marginal cost of pushing AI from “good” toward near‑perfect accuracy is convex and often exceeds the marginal labor savings. As a result, firms optimally stop at interior solutions (human–AI collaboration) rather than pursue full automation in many tasks. Scale of deployment (e.g., AI-as-a-Service or shared agents) materially expands the set of economically viable automations by spreading fixed development costs.
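The convexity argument can be illustrated with a minimal sketch. It assumes a single-input power law, error = coeff * compute^(-alpha), which is a simplification chosen for illustration; the function name `compute_needed` and the values of `coeff` and `alpha` are hypothetical and are not the paper's estimates. Inverting the power law gives the compute needed for a target accuracy, and the cost per additional accuracy point rises steeply as accuracy approaches 1.

```python
def compute_needed(accuracy, coeff=1.0, alpha=0.3):
    """Compute budget needed to reach `accuracy`.

    Inverts a hypothetical power law: error = coeff * compute ** (-alpha).
    """
    error = 1.0 - accuracy
    return (coeff / error) ** (1.0 / alpha)


levels = [0.80, 0.90, 0.95, 0.99]
costs = [compute_needed(a) for a in levels]

# Average compute cost per additional accuracy point between consecutive levels.
pairs = list(zip(levels, costs))
marginal = [
    (c_hi - c_lo) / ((a_hi - a_lo) * 100)
    for (a_lo, c_lo), (a_hi, c_hi) in zip(pairs, pairs[1:])
]

for (a_lo, _), (a_hi, _), m in zip(pairs, pairs[1:], marginal):
    print(f"{a_lo:.2f} -> {a_hi:.2f}: {m:,.0f} compute units per accuracy point")
```

Under these illustrative parameters each step toward higher accuracy costs more per point than the previous one, which is the pattern that makes stopping at an intermediate accuracy attractive.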

Key Points

Framework: Automation is modeled as a continuous choice of AI accuracy (not a binary decision). Firms minimize costs by choosing an accuracy level that trades off AI development costs against labor savings.
Supply side (costs): Fine‑tuning scaling‑law experiments show performance increases predictably with data, training steps, and model size but with diminishing returns. This produces a convex cost function: small gains in accuracy can be cheap, but close‑to‑perfect accuracy becomes disproportionately expensive.
Demand side (labor substitution): An entropy/information‑theoretic mapping translates model accuracy into a labor‑substitution ratio: higher accuracy reduces residual uncertainty and thus human processing time. This gives a quantitative mapping from accuracy to how much human work AI displaces.
Three possible optima per task: no automation, partial automation (interior solution), or full automation. Partial automation occupies a large share of task space because of the convex cost structure.
Task complexity matters: tasks with few subtasks and low entropy (low complexity) have high substitution rates and are more likely to be highly automated; tasks with many subtasks/high complexity favor limited partial automation.
Scale effects: Sharing fixed costs across many users (AI-as-a-Service, economy‑wide agents) lowers per‑user cost, increases optimal model quality, and raises automation rates. Under shared/economy‑wide deployment the economically viable share of automation rises sharply.
Quantitative result (computer vision calibration): At typical firm‑level deployment, roughly 11% of labor compensation tied to computer‑vision‑exposed tasks is economically attractive to automate; most of that saving comes from partial rather than full automation. The share would be larger when including other modalities (LLMs, multimodal models).
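The three optima and the scale effect listed above can be sketched as a small grid search. Everything here is an assumption made for illustration: the cost form in `development_cost`, the substitution ratio `accuracy / required_accuracy`, the idea that "full automation" means reaching a task's required accuracy, and all parameter values. None of it is taken from the paper's calibration.

```python
import numpy as np


def development_cost(accuracy, scale=1_000.0, exponent=1.5):
    """Hypothetical convex fixed cost of building a model with the given accuracy."""
    return scale * ((1.0 - accuracy) ** (-exponent) - 1.0)


def optimal_accuracy(wage_bill, n_users=1, required_accuracy=0.95, grid_size=2001):
    """Grid-search the accuracy that minimizes cost per user.

    Cost per user = development cost shared across n_users
                    + wages paid for the residual human work.
    The substitution ratio accuracy / required_accuracy is a stand-in assumption.
    """
    grid = np.linspace(0.0, required_accuracy, grid_size)
    ai_cost = development_cost(grid) / n_users
    labor_cost = wage_bill * (1.0 - grid / required_accuracy)
    best = float(grid[np.argmin(ai_cost + labor_cost)])

    if best == 0.0:
        regime = "none"
    elif np.isclose(best, required_accuracy):
        regime = "full"
    else:
        regime = "partial"
    return best, regime


# Same task, different wage bills and deployment scales (illustrative values).
for wage_bill, n_users in [(1_000.0, 1), (50_000.0, 1), (50_000.0, 10), (50_000.0, 100)]:
    acc, regime = optimal_accuracy(wage_bill, n_users)
    print(f"wage_bill={wage_bill:>8,.0f} users={n_users:>3} -> accuracy={acc:.3f} ({regime})")
```

With these made-up numbers a small wage bill yields no automation, a larger one yields an interior (partial) optimum, and spreading the same development cost over more users pushes the chosen accuracy upward until the required accuracy is reached.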

Data & Methods

Theoretical model: Microeconomic, task‑level cost minimization where firms choose AI accuracy. Supply side modeled via an estimated AI production function; demand side modeled by an entropy‑based accuracy→labor substitution mapping. Optimality determined by comparing marginal cost of accuracy improvements with marginal labor savings.
AI production function: Estimated from fine‑tuning scaling‑law experiments linking performance to (i) additional task data, (ii) training steps, and (iii) model size. Results document performance elasticities and substitutability across inputs and confirm convex cost structure at high performance.
Entropy mapping: Uses information theory to map remaining uncertainty (entropy) at a given model accuracy to human processing time required to resolve residual tasks, yielding a formal labor substitution ratio.
Calibration and empirical implementation (computer vision domain):
- O*NET: Identified 420 computer‑vision‑exposed tasks across 263 occupations and used task→time allocations.
- Expert survey: A survey of 3,778 domain experts elicited task‑specific required accuracies and validated task characteristics.
- GPT‑4o decompositions: Automated extraction of number of vision subtasks, classes per subtask, and visual share per task; outputs were manually validated by human coders.
- Administrative data: Wages, employment, and firm‑size distributions from U.S. agencies were used to scale task-level decisions to the occupation, firm, industry, and economy levels.
Empirical findings: Using the estimated cost function and entropy mapping, the authors compute per‑task optimal automation levels and aggregate outcomes under different deployment scales (firm‑level, AI‑as‑a‑Service, economy‑wide).
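The entropy mapping described above can be illustrated with a stand-in formula. This sketch assumes a single classification subtask with `n_classes` equally likely classes, a model that is correct with probability `accuracy` and spreads its errors evenly over the other classes, and human processing time proportional to the remaining uncertainty in bits. These assumptions and the function names are hypothetical; the paper's actual mapping may differ, and the sketch does not attempt to reproduce its task-complexity results.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, treating 0 * log(0) as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def residual_entropy(accuracy, n_classes):
    """Bits of uncertainty left after the model's prediction.

    Assumes probability `accuracy` on the predicted class and the rest
    spread evenly over the remaining n_classes - 1 classes.
    """
    return binary_entropy(accuracy) + (1 - accuracy) * math.log2(n_classes - 1)


def substitution_ratio(accuracy, n_classes):
    """Share of the initial uncertainty (log2 of n_classes bits) removed by the model."""
    return 1.0 - residual_entropy(accuracy, n_classes) / math.log2(n_classes)


baseline_hours = 10.0  # hypothetical human time for the task without AI
for acc in [0.25, 0.50, 0.90, 0.99, 1.00]:
    ratio = substitution_ratio(acc, n_classes=4)
    remaining = baseline_hours * (1.0 - ratio)
    print(f"accuracy={acc:.2f} substitution={ratio:.3f} residual human hours={remaining:.2f}")
```

In this stand-in the ratio is zero at chance-level accuracy (1 / `n_classes`), rises as accuracy improves, and reaches one only at perfect accuracy, so any accuracy below 1 leaves some residual human work.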

Implications for AI Economics

Reconceptualize exposure metrics: Technical feasibility alone is insufficient — economic viability requires modeling convex scaling costs and the accuracy→labor mapping. Measures of “exposure” should incorporate the cost of attaining required accuracy and task complexity.
Human–AI collaboration is likely durable: Partial automation is not merely transitional; for many tasks the cost structure makes human oversight or residual human work optimal long term. Policies and models should anticipate extensive hybrid workflows.
Scale and market structure matter: Large firms, platforms, and shared AI providers can broaden the automation frontier by amortizing fixed development costs. This helps explain concentrated early adoption and incentivizes centralized AI services.
Distributional and labor implications: Partial automation implies task redesign rather than wholesale job elimination. Effects on wages and employment depend on which subtasks are automated (expert vs. inexpert work) and on within‑firm task heterogeneity. Models of labor market adjustment should incorporate task‑level partial substitution and complementarities.
Modeling guidance for macro and policy work: Aggregate projections of automation impacts must account for (i) scaling‑law convexities in AI development, (ii) task complexity/entropy, and (iii) deployment scale. Ignoring these factors will overstate the pace and extent of full automation.
Research directions: Extend empirical calibration beyond computer vision (LLMs, multimodal models), incorporate organizational/implementation costs (beyond model development), study dynamic evolution as model/data/compute costs change, and analyze distributional impacts across firm sizes and worker skill groups.

Assessment

Paper Type: theoretical
Evidence Strength: medium. The paper combines a theoretical framework with empirical calibration using scaling-law experiments, a large expert survey (N=3,778), O*NET task data, and GPT-4o-based task decompositions to demonstrate plausibility and quantify magnitudes; however, it does not use exogenous variation or causal identification to estimate realized effects in markets, and key mappings (accuracy -> substitution) are model-derived and survey-informed rather than observed in field experiments.
Methods Rigor: high. Methods integrate rigorous elements (direct scaling-law experiments, systematic task-level data from O*NET, a large domain-expert survey, and algorithmic task decomposition), and the paper calibrates and implements the model in a concrete domain (computer vision); nevertheless, the approach depends on structural assumptions (entropy-based complexity, the functional form linking accuracy to substitution, and calibration choices) that condition the quantitative conclusions.
Sample: Task universe from O*NET task descriptors across occupations; a cross-domain survey of 3,778 domain experts eliciting task difficulty/substitutability judgments; task decompositions generated using GPT-4o; scaling-law experiments run to map performance to data, compute, and model size in computer-vision tasks; calibration connecting model outputs to labor compensation exposures in occupations with computer-vision tasks.
Themes: human_ai_collab, productivity, labor_markets, adoption, innovation
Generalizability:
- Calibrated primarily in computer vision; quantitative results may not hold for NLP or other modalities without re-estimating scaling parameters and task mappings.
- Relies on scaling-law parameters that vary across model architectures, training regimes, and time; future models could exhibit different marginal costs of accuracy.
- Survey sample may suffer from selection or framing biases and may not represent employer-level substitution decisions.
- GPT-4o-derived task decompositions reflect the model's internal priors and may mischaracterize human task structure.
- The entropy-based mapping from accuracy to substitution is a modeling choice; alternative mappings could change magnitudes.
- Macro labor-market feedbacks, adjustment costs, heterogeneous firm behavior, and institutional constraints are simplified or abstracted away.

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
We model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation.	Task Allocation	positive	high	degree of automation (accuracy level chosen by firms)	0.12
AI systems exhibit predictable but diminishing returns to data, compute, and model size (scaling-law experiments), implying the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly.	Firm Productivity	negative	high	marginal returns to inputs (data, compute, model size) and marginal cost of accuracy	0.2
Because higher accuracy is disproportionately costly (convex cost), full automation is often not cost-minimizing; partial automation, where firms retain human workers for residual tasks, frequently emerges as the equilibrium.	Task Allocation	positive	high	prevalence of partial automation vs full automation as cost-minimizing choices	0.12
We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.	Automation Exposure	neutral	high	labor substitution ratio (human labor displaced per unit accuracy)	0.12
The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and implemented in computer vision.	Other	neutral	high	validity of calibration / empirical grounding of the framework	n=3778 0.2
Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation.	Automation Exposure	negative	high	degree of labor substitution as a function of task complexity	n=3778 0.12
Scale of deployment is a key determinant: AI-as-a-Service and AI agents spread fixed costs across users, sharply expanding economically viable tasks.	Adoption Rate	positive	high	number/coverage of economically viable tasks (adoption potential) as a function of deployment scale	0.12
At the firm level, cost-effective automation captures approximately 11% of computer-vision-exposed labor compensation.	Labor Share	positive	high	share of computer-vision-exposed labor compensation captured by cost-effective automation	approximately 11% 0.12
Under economy-wide deployment, the share of computer-vision-exposed labor compensation that is cost-effectively automatable rises sharply (relative to the firm-level 11% estimate).	Labor Share	positive	high	share of labor compensation automatable under economy-wide deployment	0.12
Because other AI systems exhibit similar scaling-law economics, the mechanisms identified extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.	Task Allocation	positive	medium	prevalence of partial automation across AI application domains	0.01