Large experiment finds a 'speedup illusion': users expect LLMs to be faster but actual completion times on simple tasks are unchanged, even as subjective effort falls; the bias is specific to AI and not seen when imagining human help.

Cognitive offloading and the speedup illusion in human-AI interaction

Sunny Yu, Myra Cheng, Ahmad Jabbar, Ilia Sucholutsky, Katherine M. Collins, Dan Jurafsky, Robert D. Hawkins · May 22, 2026

arXiv · RCT · medium evidence · 7/10 relevance

In a preregistered experiment with 1,237 participants, people overestimated the time-saving benefits of LLM assistance on simple cognitive tasks. Actual completion times did not differ between the independent and AI-assisted conditions. Even so, participants predicted that AI-assisted completion would be faster, and those who used AI reported lower subjective effort.

Large language models (LLMs) have the potential to boost human productivity by speeding up task completion -- provided users know when to offload cognitive work to them. But we do not know if users are well-calibrated in estimating these potential time savings. We conducted a preregistered large-scale behavioral study (N = 1237) to characterize mismatches between expectations and reality, with a focus on simple cognitive tasks. While actual completion times between independent completion and AI-assisted completion did not differ, participants predicted AI to be significantly faster. The same bias was not observed when imagining help from another human participant. We identify a speedup illusion where people have accurate forecasts of independent completion times but significantly underestimate AI-assisted times. Additionally, time and effort dissociate: participants reported lower subjective effort with AI despite equivalent completion times. This suggests that completion time itself is not sufficient to characterize efficiency gains.

Summary

Main Finding

People systematically overestimate how much time large language models (LLMs) will save on simple cognitive tasks, a "speedup illusion." In a preregistered behavioral study (N = 1,237), predicted AI-assisted completion times were much shorter than the AI-assisted times actually measured. Predictions of independent (no-AI) completion times, by contrast, were well calibrated. AI did not significantly reduce actual completion time on easy tasks, but it did reduce subjective mental effort, both overall and on most tasks (15 of 24).

Key Points

Sample and design
- Total N = 1,237 (prediction sample n = 401, completion sample n = 836) recruited on Prolific (US-representative); preregistered; IRB-approved; data/code available.
- Between-subject design: separate prediction vs. completion samples to avoid contamination.
- Tasks: 24 tasks across 4 categories (C1 Information Seeking, C2 Processing & Synthesis, C3 Procedural Guidance & Execution, C4 Content Creation & Transformation), spanning two difficulty levels (easy / difficult). Only correct, high-quality responses were analyzed; 6.3% of independent and 4.0% of AI-assisted responses were excluded.
Main quantitative results
- People expected AI to reduce completion time by ~68.5 seconds relative to independent completion (β = 68.5, SE = 3.37, p < 0.001).
- Predicted vs. actual calibration:
  - Independent condition: no reliable difference between predicted and actual completion times (β = −2.52, SE = 8.83, p = 0.775) — i.e., good calibration.
  - AI condition: actual AI-assisted completion time was significantly longer than predicted AI-assisted time (difference ≈ +57.8 seconds; β = 57.8, SE = 8.79, p < 0.001) — the speedup illusion.
- Actual time savings: AI assistance sped up only difficult tasks (mean ≈ 26.1 seconds faster; SE = 11.9, p < 0.05) and produced significant time savings on only 3 of the 24 tasks.
- Subjective effort (NASA-TLX, 5-item version):
  - AI reduced perceived effort across tasks: average NASA-TLX decreased by 0.61 points on a 7-point scale (SE = 0.059, t = −10.38, p < 0.001).
  - 15 out of 24 tasks showed significant reductions in subjective effort with AI assistance.
- Individual differences:
  - Lower Need for Cognition (NfC), i.e., less enjoyment of effortful thinking, predicted a larger expected AI speedup (greater susceptibility to the speedup illusion).
  - Frequency of AI use and general AI assessments did not predict calibration error.
- Prompting and interaction:
  - 70% of user–LLM interactions were single-turn; the longest interaction had 5 turns.
  - Model generation time was short (≈ 2.89 seconds).
  - Prompt composition time and post-response processing time were similar on average, but varied by task:
    - For C2 (Processing & Synthesis) tasks, prompting took ~33.4 seconds longer than post-response processing.
    - For some tasks (e.g., a logic problem), post-response reading and processing took about 114 seconds longer than prompting. The gap was driven by verbose or hard-to-parse model outputs.
  - Copy-pasted prompts (18.5% of prompts) were associated with slightly lower NASA-TLX scores (~0.14 points) but not with shorter completion times.
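The timing breakdown above can be made concrete with a sketch that splits one AI-assisted session into phases. The event schema (`Turn`, `Session`, their timestamp fields) and the extra "orientation" phase (time before the first prompt) are hypothetical. The study's actual logging and phase boundaries may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    """One prompt/response exchange; times are seconds since the task was shown (hypothetical schema)."""
    prompt_start: float   # participant begins typing or pasting a prompt
    prompt_submit: float  # prompt is sent to the model
    response_done: float  # model finishes generating its reply

@dataclass
class Session:
    turns: List[Turn]
    task_submit: float    # participant submits the final answer

def decompose(session: Session) -> dict:
    """Split one AI-assisted session into orientation, prompting, generation and processing time."""
    if not session.turns:
        raise ValueError("no model turns: this is an independent-completion session")
    parts = {"orientation": session.turns[0].prompt_start, "prompting": 0.0,
             "generation": 0.0, "processing": 0.0}
    for i, turn in enumerate(session.turns):
        parts["prompting"] += turn.prompt_submit - turn.prompt_start
        parts["generation"] += turn.response_done - turn.prompt_submit
        # Reading/processing lasts until the next prompt begins, or until submission.
        if i + 1 < len(session.turns):
            end = session.turns[i + 1].prompt_start
        else:
            end = session.task_submit
        parts["processing"] += end - turn.response_done
    parts["turns"] = len(session.turns)
    return parts

# Two-turn example with made-up timestamps.
example = Session(turns=[Turn(10.0, 40.0, 43.0), Turn(70.0, 85.0, 88.0)], task_submit=150.0)
print(decompose(example))
# → {'orientation': 10.0, 'prompting': 45.0, 'generation': 6.0, 'processing': 89.0, 'turns': 2}
```

In this toy session, processing (89 s) far exceeds generation (6 s). The four phases always sum to the total session time, so each of the overheads discussed above can be read off separately.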

Data & Methods

Two-sample design:
- Prediction sample: participants estimated how long each task would take independently and with external assistance (AI or another highly intelligent human). They also stated whether they'd choose to offload.
- Completion sample: participants either completed tasks independently or with AI assistance (embedded GPT‑4o chat); hidden timers recorded completion times; NASA-TLX measured subjective effort after each task.
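The effort measure can be sketched as an unweighted ("raw") TLX score, which skips the pairwise weighting step of the original instrument. This is an assumption: the section says only that a 5-item NASA-TLX on a 7-point scale was used. Both the five item names below and the choice of a simple mean are hypothetical.

```python
from statistics import mean

# Hypothetical subscale names: which five NASA-TLX subscales were kept is assumed here.
ITEMS = ("mental_demand", "temporal_demand", "performance", "effort", "frustration")

def raw_tlx(ratings: dict, scale_max: int = 7) -> float:
    """Unweighted ("raw") TLX score: the mean of the item ratings.

    Assumes every item is coded so that a higher rating means more workload.
    """
    missing = [item for item in ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in ITEMS:
        if not 1 <= ratings[item] <= scale_max:
            raise ValueError(f"{item} rating outside 1..{scale_max}")
    return float(mean(ratings[item] for item in ITEMS))

# Invented ratings for one participant-task pair in each condition.
independent = {"mental_demand": 5, "temporal_demand": 4, "performance": 3, "effort": 5, "frustration": 3}
assisted = {"mental_demand": 4, "temporal_demand": 3, "performance": 3, "effort": 4, "frustration": 2}
print(raw_tlx(independent), raw_tlx(assisted))  # → 4.0 3.2
```

A condition-level effect such as the reported 0.61-point reduction would then be estimated from many such scores. Per the Analyses below, that estimate comes from a mixed-effects model rather than from a single pair.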
Analyses:
- Linear mixed-effects models with random intercepts for participant and task; main contrasts: (prediction vs. completion) × (independent vs. AI-assisted) controlling for task difficulty and task category.
- Focused analyses on correct, high-quality responses.
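A stripped-down version of the main 2×2 contrast can be computed from cell means. This is a simplification, not the paper's model. It drops the participant and task random intercepts and the difficulty and category controls, so it only approximates the mixed-effects estimates. The record fields and toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cell_means(records):
    """Mean seconds per (sample, condition) cell.

    Hypothetical record fields: 'sample' ('prediction' or 'completion'),
    'condition' ('independent' or 'ai'), 'seconds', and optional 'valid'
    (False marks an incorrect or low-quality completion to be excluded).
    """
    cells = defaultdict(list)
    for r in records:
        if r.get("valid", True):
            cells[(r["sample"], r["condition"])].append(r["seconds"])
    return {key: mean(vals) for key, vals in cells.items()}

def calibration(records):
    m = cell_means(records)
    pred_ind, pred_ai = m[("prediction", "independent")], m[("prediction", "ai")]
    act_ind, act_ai = m[("completion", "independent")], m[("completion", "ai")]
    gap_ind = act_ind - pred_ind  # actual minus predicted; > 0 means time was underestimated
    gap_ai = act_ai - pred_ai
    return {
        "predicted_speedup": pred_ind - pred_ai,
        "actual_speedup": act_ind - act_ai,
        "gap_independent": gap_ind,
        "gap_ai": gap_ai,
        # 2x2 interaction; algebraically equal to predicted_speedup - actual_speedup.
        "interaction": gap_ai - gap_ind,
    }

# Invented toy data, not the study's data.
TOY_RECORDS = [
    {"sample": "prediction", "condition": "independent", "seconds": 120},
    {"sample": "prediction", "condition": "independent", "seconds": 140},
    {"sample": "prediction", "condition": "ai", "seconds": 60},
    {"sample": "prediction", "condition": "ai", "seconds": 70},
    {"sample": "completion", "condition": "independent", "seconds": 125, "valid": True},
    {"sample": "completion", "condition": "independent", "seconds": 135, "valid": True},
    {"sample": "completion", "condition": "independent", "seconds": 400, "valid": False},
    {"sample": "completion", "condition": "ai", "seconds": 125, "valid": True},
    {"sample": "completion", "condition": "ai", "seconds": 130, "valid": True},
]
print(calibration(TOY_RECORDS))
```

The identity in the last comment shows what the speedup illusion measures. A calibrated independent forecast (gap near zero) combined with a large AI gap means the expected speedup exceeds the realized one by roughly the size of that AI gap.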
Key robustness & exclusions:
- Preregistered; IRB-approved.
- Excluded low-effort or incorrect responses and reported the share excluded in each condition.
- Sample balanced on demographics (reported percentages).

Implications for AI Economics

Productivity measurement: time-to-completion alone can misrepresent AI-driven productivity. LLMs may not reduce objective time on many simple tasks but do reduce subjective effort. Economic assessments of AI productivity gains should incorporate both objective time and subjective/psychic costs (effort, cognitive load).
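One way to make this concrete is a simple cost accounting, offered as an illustration rather than a model from the paper. Let $w$ be the value of a second of working time and $\kappa$ the cost of one point of subjective effort $e$; both are hypothetical parameters. Let $t_H(\tau)$ and $t_A(\tau)$ be the independent and AI-assisted completion times for task $\tau$:

```latex
\begin{aligned}
C_m(\tau) &= w\, t_m(\tau) + \kappa\, e_m(\tau), \qquad m \in \{H, A\} \\
\Delta C(\tau) = C_H(\tau) - C_A(\tau) &= w\,\bigl[t_H(\tau) - t_A(\tau)\bigr] + \kappa\,\bigl[e_H(\tau) - e_A(\tau)\bigr]
\end{aligned}
```

On the simple tasks studied here, $t_H(\tau) - t_A(\tau) \approx 0$, so $\Delta C(\tau) \approx \kappa\,[e_H(\tau) - e_A(\tau)]$. A time-only metric therefore reports no gain. The effort term, by contrast (an average 0.61-point TLX reduction in this study), is positive whenever $\kappa > 0$.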
Adoption dynamics and demand:
- Miscalibrated user beliefs (speedup illusion) can inflate adoption of LLMs for tasks where they do not actually save time. This may create a self-reinforcing feedback loop: people use AI because it feels less effortful and expect time savings, which further normalizes offloading even absent objective efficiency gains.
- Models of diffusion and firm adoption should allow for demand driven by perceived (not realized) gains and by reductions in subjective effort that alter labor allocation or task willingness.
Labor-market effects:
- Substitution vs. augmentation is heterogeneous by task complexity. AI produced measurable time savings primarily on difficult tasks, with significant savings on only 3 of 24 tasks. This suggests complementarity for complex tasks and limited substitution for simple ones.
- Lower subjective effort with unchanged completion times could change workers' willingness to perform certain tasks, increasing labor supply for some activities. It could also enable higher throughput in longer workflows where subjective effort, rather than per-task time, was the binding constraint.
Measurement & policy recommendations:
- When estimating macro or firm-level productivity impacts, include measures of interaction overhead (prompting, processing verbose outputs), post-response cognitive processing, and user calibration errors.
- Interface and transparency interventions (e.g., showing expected vs. empirical time-to-complete, concise model outputs, better summarization) may reduce miscalibration and improve resource-rational offloading decisions.
- Training and nudges targeted at users with low Need for Cognition may be especially important, since they are more prone to overestimating AI time savings.
Research and modeling suggestions:
- Incorporate subjective effort reductions as a separate utility term in economic models of AI adoption and labor supply.
- Allow adoption to be influenced by beliefs that may diverge from realized productivity gains and study dynamic belief-updating from repeated use.
- Account for heterogeneity in task-level returns: estimate distributions of tH(τ) and tA(τ) by task complexity, and include interaction costs (prompting + post-processing).
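The modeling suggestions above can be illustrated with a toy agent. It holds a belief about AI-assisted time, values effort savings, and updates that belief only when it actually offloads. Everything here is hypothetical. Exponential smoothing is a stand-in for belief updating, and `kappa` (seconds of time a user would trade for one TLX point) is an illustrative parameter, not an estimate from the paper.

```python
from dataclasses import dataclass

@dataclass
class User:
    """Toy agent; all parameters are hypothetical, not estimates from the study."""
    believed_t_ai: float        # current belief about AI-assisted time (seconds)
    learning_rate: float = 0.3  # weight placed on each new observation

    def update(self, observed_t_ai: float) -> None:
        # Exponential smoothing: move the belief part of the way toward the observation.
        self.believed_t_ai += self.learning_rate * (observed_t_ai - self.believed_t_ai)

def offload(user: User, t_h: float, effort_saving: float, kappa: float) -> bool:
    """Offload when believed time saved plus valued effort saved is positive.

    t_h: independent time in seconds (forecasts of it were well calibrated in the study).
    effort_saving: e_H - e_A in TLX points; kappa: seconds traded per TLX point.
    """
    return (t_h - user.believed_t_ai) + kappa * effort_saving > 0

def simulate(user: User, t_h: float, actual_t_ai: float,
             effort_saving: float, kappa: float, rounds: int) -> list:
    """Repeated choices; the user only observes AI-assisted time when offloading."""
    decisions = []
    for _ in range(rounds):
        choice = offload(user, t_h, effort_saving, kappa)
        decisions.append(choice)
        if choice:
            user.update(actual_t_ai)
    return decisions

# Hypothetical task: 120 s alone, 130 s with AI once interaction overhead is counted.
time_only = simulate(User(60.0), t_h=120.0, actual_t_ai=130.0,
                     effort_saving=0.61, kappa=0.0, rounds=10)
with_effort = simulate(User(60.0), t_h=120.0, actual_t_ai=130.0,
                       effort_saving=0.61, kappa=30.0, rounds=10)
print(time_only)    # → six True, then four False
print(with_effort)  # → ten True
```

With `kappa = 0`, the agent stops offloading once experience pushes its belief past `t_h`. Once effort savings carry value, it keeps offloading even though each use costs 10 extra seconds in this toy setup. That is the kind of persistent, effort-driven demand the adoption-dynamics points describe.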

Caveats
- Tasks were short, discrete cognitive tasks in an experimental setting using GPT‑4o; generalization to complex, multi-step real-world workflows, other LLMs, or integrated systems may be limited.
- The between-subject design avoids practice effects but does not measure within-person calibration change after feedback.
- The study filtered for correct, high-quality responses, so results concern tasks where outcomes were achieved, not cases with incorrect AI output or error-correction costs.

Assessment

Paper Type: RCT
Evidence Strength: medium. Internal validity is strong due to preregistration, a large sample (N = 1,237), experimental manipulation, and objective time measures, which credibly identify a mismatch between expected and realized completion times on simple tasks. External validity is limited for four reasons:
- Tasks were simple and cognitive.
- Assistance came from a single LLM (GPT-4o) in one experimental interface.
- Human help was imagined rather than enacted.
- Outcomes are short-term lab measures rather than real-world firm- or worker-level productivity.
Methods Rigor: high. The study is preregistered and uses a large sample. It includes an experimental control (imagined human help), measures objective completion times alongside subjective effort, and contrasts forecasts with realized outcomes. Possible limitations include task choice (simple tasks), potential differences between imagined and actual human assistance, and single-session testing, but these do not substantially undermine the internal experimental design.
Sample: Preregistered large-scale behavioral sample of 1,237 adults recruited online via Prolific (US-representative). A prediction sample (n = 401) forecast completion times with and without assistance, including a condition imagining help from another human. A separate completion sample (n = 836) completed simple cognitive tasks independently or with LLM assistance; their completion times were recorded and they rated subjective effort.
Themes: productivity, human_ai_collab, adoption
Identification: Preregistered randomized behavioral experiment comparing predicted and actual completion times across conditions (independent vs. LLM-assisted completion), with an additional imagined-human-help control. Causal claims rely on random assignment and objective timing of task completion.
Generalizability:
- The online panel sample may not represent workers in organizations or expert users.
- Tasks were simple, short cognitive tasks, not complex, creative, or domain-specific work.
- A single-session experiment does not capture learning, habituation, or long-run effects.
- Assistance was provided via a specific LLM, interface, and configuration; effects may differ for other models or tool integrations.
- The human-help condition was imagined rather than actual collaborative assistance.
- The study does not measure firm-level productivity, output quality in real work settings, wages, or labor-market outcomes.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
Large language models (LLMs) have the potential to boost human productivity by speeding up task completion -- provided users know when to offload cognitive work to them.	Task Completion Time	positive	high	task completion speed (potential)	0.1
We conducted a preregistered large-scale behavioral study (N = 1237) to characterize mismatches between expectations and reality, with a focus on simple cognitive tasks.	Other	null_result	high	study design / sample size (methodological claim)	n=1237 1.0
Actual completion times between independent completion and AI-assisted completion did not differ.	Task Completion Time	null_result	high	actual completion time	n=1237 1.0
Participants predicted AI to be significantly faster.	Task Completion Time	positive	high	predicted completion time	n=1237 1.0
The same bias was not observed when imagining help from another human participant.	Task Completion Time	null_result	high	predicted completion time when imagining help from another human	n=1237 1.0
There is a 'speedup illusion' where people have accurate forecasts of independent completion times but significantly underestimate AI-assisted times.	Task Completion Time	negative	high	calibration of predicted vs actual completion time	n=1237 1.0
Time and effort dissociate: participants reported lower subjective effort with AI despite equivalent completion times.	Worker Satisfaction	positive	high	subjective effort (self-reported); actual completion time also measured	n=1237 1.0
Completion time itself is not sufficient to characterize efficiency gains.	Organizational Efficiency	mixed	high	adequacy of completion time as a measure of efficiency	n=1237 0.6