Heavy use of on-demand AI can undermine learning in a controlled reasoning task, while informative AI boosts short-term performance and—on average—does not reduce later independent performance; outcomes, however, vary substantially across users.
Artificial intelligence (AI) is increasingly integrated into human problem-solving, yet its effects on individual skill development remain unclear. We examine how AI usage and informativeness shape learning in a controlled logical reasoning task with on-demand access to AI assistance. We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. These patterns are mediated by AI informativeness. Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. High-information AI, by contrast, improves short-run performance without reducing post-AI outcomes on average in our experiments, but with heterogeneous effects. Our findings suggest that AI can, depending on context, either complement human skill development by amplifying independent reasoning or act as a substitute that undermines such reasoning, which implies that regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.
Summary
Main Finding
Greater reliance on on-demand AI in a time-limited logical-reasoning task is associated with weaker subsequent skill development: heavy AI users underperform matched non-users after AI is removed, whereas light users perform similarly to (or slightly better than) matched peers. Crucially, AI informativeness moderates these effects: low-information AI (reveals 1 item) produces no short-run gains and harms post-AI learning, while high-information AI (reveals 3 items) improves immediate performance and, on average, does not reduce post-AI outcomes. Its effects are heterogeneous (polarizing), however: it benefits higher-ability users more and amplifies ability gaps.
Key Points
- Usage behavior matters: Heavy AI usage (usage fraction > 0.4 of Phase 2 problems) correlates with longer response times and lower correctness in the post-AI assessment relative to matched controls; light users do not show this deficit.
- Informativeness matters:
  - Low-information AI: no meaningful immediate performance improvement; associated with reduced post-AI reward rates (especially among lower-ability participants).
  - High-information AI: immediate performance increases during AI exposure; on average no significant post-AI decline, but widens the gap between high- and low-ability participants (polarization).
- Mechanisms:
  - Cognitive offloading: heavier and earlier requests reduce “solo thinking” time, displacing independent effort.
  - Miscalibration: lower-ability participants report inflated perceived ability when AI is available, possibly reducing effort and impairing learning.
- Design levers (timing of assistance, informativeness, cost per request) influence reliance and learning outcomes.
- Robustness: analyses control for baseline ability using propensity-score matching and split-sample analyses by initial ability; AI accuracy was held at 100% (simulated) to isolate informativeness effects.
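The grouping used in the key points above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the 0.4 usage-fraction threshold and the median split on Phase 1 reward rate come from the summary, while the function names, the separate "none" label for participants who never request help, and the handling of ties at the median are assumptions.

```python
from statistics import median

# Threshold from the summary: usage fraction > 0.4 of Phase 2 problems is "heavy".
HEAVY_THRESHOLD = 0.4


def usage_fraction(problems_with_ai: int, problems_attempted: int) -> float:
    """Share of Phase 2 problems on which the participant requested AI help."""
    if problems_attempted <= 0:
        raise ValueError("problems_attempted must be positive")
    return problems_with_ai / problems_attempted


def usage_group(fraction: float) -> str:
    """Label a participant by usage fraction.

    Assumption: participants with zero requests get their own "none" label.
    """
    if fraction == 0:
        return "none"
    return "heavy" if fraction > HEAVY_THRESHOLD else "light"


def ability_split(phase1_reward_rates: dict[str, float]) -> dict[str, str]:
    """Median split on Phase 1 reward rate.

    Assumption: values equal to the median are placed in the "low" group.
    """
    cutoff = median(phase1_reward_rates.values())
    return {
        pid: "high" if rate > cutoff else "low"
        for pid, rate in phase1_reward_rates.items()
    }


if __name__ == "__main__":
    print(usage_group(usage_fraction(5, 10)))
    print(ability_split({"p1": 0.2, "p2": 0.3, "p3": 0.5, "p4": 0.6}))
```

A participant who used AI on 5 of 10 problems has a usage fraction of 0.5 and falls in the heavy group under this rule; one at exactly 0.4 is still light, because the threshold in the summary is a strict inequality.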
Data & Methods
- Task: Time-constrained logic puzzles in which participants determine the unique order of six objects from a set of logical constraints. Participants could submit up to two attempts per problem and always saw the correct solution afterward as feedback.
- Experimental phases:
  - Phase 1: Pre-AI baseline (8 minutes; at least 4 problems).
  - Phase 2: Treatment (20 minutes): No-AI, Low-information AI (reveals 1 object per request), or High-information AI (reveals 3 objects per request). One AI request was allowed per problem; each request cost 0.2 points, charged only if the problem was solved correctly.
  - Phase 3: Post-AI assessment (same as Phase 1, no assistance).
- Sample: 160 recruited via Prolific; final N = 132 after exclusions (42 No-AI, 47 Low-info AI, 43 High-info AI). Participants were U.S.-based adults with at least a BA; mean age ~40.
- AI: Simulated assistant with perfect (100%) accuracy to separate informativeness from reliability.
- Primary outcome: Reward rate = correctness (correct objects out of 6) per minute (speed–accuracy tradeoff). Other metrics: response time, correctness, AI usage fraction, timing of AI requests, solo thinking ratio, initial ability (Phase 1 reward rate).
- Analyses:
  - Between-condition comparisons over phases.
  - Propensity-score matching (PSM) on Phase 1 correctness and response time, used to compare Light and Heavy AI users with matched No-AI controls.
  - Subgroup analyses by initial ability (median split) to assess heterogeneity.
  - Time-course plots of reward rate across phase halves to inspect dynamics.
- Limitations noted by authors: lab-style puzzle domain (abstraction), short-term post-AI assessment (no long-run follow-up), simulated perfect AI (doesn’t capture trust dynamics with imperfect accuracy), sample skewed to educated adults.
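The primary outcome and the request cost described above can be made concrete with a short sketch. It rests on several assumptions that the summary does not settle: correctness is treated as the fraction of the six objects placed correctly, a phase-level reward rate is computed as total correctness divided by total minutes (rather than a mean of per-problem rates), and "solved correctly" is read as all six objects correct. All function names are illustrative.

```python
OBJECTS_PER_PROBLEM = 6   # from the task description
AI_REQUEST_COST = 0.2     # points per request, from the Phase 2 description


def correctness(correct_objects: int) -> float:
    """Fraction of the six objects placed correctly (assumed definition)."""
    if not 0 <= correct_objects <= OBJECTS_PER_PROBLEM:
        raise ValueError("correct_objects must be between 0 and 6")
    return correct_objects / OBJECTS_PER_PROBLEM


def reward_rate(correct_objects: int, seconds: float) -> float:
    """Correctness per minute for a single problem."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return correctness(correct_objects) / (seconds / 60.0)


def phase_reward_rate(problems: list[tuple[int, float]]) -> float:
    """Total correctness divided by total minutes over one phase.

    Each entry is (correct_objects, seconds). The aggregation rule is an
    assumption, not something the summary specifies.
    """
    total_minutes = sum(seconds for _, seconds in problems) / 60.0
    if total_minutes <= 0:
        raise ValueError("total time must be positive")
    return sum(correctness(c) for c, _ in problems) / total_minutes


def ai_request_cost(used_ai: bool, correct_objects: int) -> float:
    """Cost of a request, charged only when the problem is fully solved."""
    solved = correct_objects == OBJECTS_PER_PROBLEM
    return AI_REQUEST_COST if used_ai and solved else 0.0


if __name__ == "__main__":
    print(reward_rate(6, 120))
    print(phase_reward_rate([(6, 120), (3, 60)]))
    print(ai_request_cost(True, 6))
```

Because the rate divides by time, a participant who slows down after losing AI access sees a lower reward rate even at unchanged accuracy, which is the speed–accuracy tradeoff the metric is meant to capture.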
Implications for AI Economics
- Short-run vs long-run trade-offs: Firms and educators should weigh immediate productivity gains from AI against potential depreciation of human capital when AI is heavily used. Policies that prioritize only short-run output may underinvest in durable skill formation.
- Informational design matters for welfare and inequality:
  - Well-designed, information-rich assistants can raise productivity without uniformly degrading skills, but they risk amplifying disparities if higher-ability workers exploit them more effectively. Adoption of such tools may increase within-occupation inequality.
  - Low-information tools that are easy to invoke but not very helpful can harm learning and productivity in the medium run, introducing a negative externality from apparent short-run access.
- Pricing and access regulation as policy levers:
  - Small user costs (monetary or behavioral nudges) per request can shape reliance; optimal pricing could discourage overreliance and preserve skill while allowing beneficial selective use.
  - Access controls or delayed-assistance designs (encouraging independent effort before help) may sustain learning while still enabling assistance.
- Human capital accumulation and labor markets:
  - Persistent use of low-value AI could reduce future worker capabilities, lowering long-term productivity and increasing training costs for firms/governments.
  - Firms should monitor usage patterns: heavy reliance signals potential skill erosion and may warrant targeted training or adjusted task assignments.
- Product and interface design:
  - Designers should prioritize informativeness and promote selective, late-stage assistance (e.g., reveal hints only after users have attempted or after a timeout) to encourage cognitive engagement.
  - Transparent feedback about a user’s dependence and calibrated performance signals could reduce miscalibrated self-assessment and sustain learning.
- Measurement and evaluation:
  - Evaluations of AI’s economic impact should incorporate post-assistance skill retention (not just in-assistance productivity) and heterogeneity by baseline ability to capture distributional effects.
  - Cost–benefit analyses for AI deployment should internalize potential long-term skill depreciation and inequality effects.
Suggested directions for policymakers, firms, and researchers: test incentive structures (pricing, delayed hints), evaluate long-term retention in real work/education settings, extend to imperfect AI accuracy, and design interventions that encourage productive (selective) use especially among lower-ability users.
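As an illustration of the delayed-assistance idea raised above (hints only after an attempt or a timeout), the following sketch shows one way such a gate could be expressed. It is not something the study tested: the experiment offered on-demand help, and the `HintPolicy` name, the one-attempt minimum, and the 60-second timeout are hypothetical values chosen for the example. Only the one-request-per-problem cap mirrors the study design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HintPolicy:
    """Hypothetical gating rule for a delayed-assistance interface."""
    min_attempts: int = 1              # hypothetical: require one attempt first
    timeout_seconds: float = 60.0      # hypothetical: or wait this long
    max_requests_per_problem: int = 1  # mirrors the study's one-request limit


def hint_allowed(
    policy: HintPolicy,
    attempts_made: int,
    seconds_elapsed: float,
    requests_used: int,
) -> bool:
    """Return True if a hint may be shown on the current problem."""
    if requests_used >= policy.max_requests_per_problem:
        return False  # request budget for this problem is spent
    attempted = attempts_made >= policy.min_attempts
    timed_out = seconds_elapsed >= policy.timeout_seconds
    return attempted or timed_out


if __name__ == "__main__":
    policy = HintPolicy()
    print(hint_allowed(policy, attempts_made=0, seconds_elapsed=10, requests_used=0))
    print(hint_allowed(policy, attempts_made=1, seconds_elapsed=10, requests_used=0))
```

Keeping the thresholds in a single policy object makes it straightforward to vary the delay or attempt requirement across experimental arms, which is the kind of incentive test the suggested directions call for.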
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. (Skill Acquisition) | negative | high | skill development / performance after AI assistance removed | 0.48 |
| Light AI users perform similarly to matched users who do not use AI. (Skill Acquisition) | null_result | high | post-AI performance / skill development | 0.48 |
| AI informativeness mediates the relationship between AI usage and learning outcomes. (Skill Acquisition) | mixed | medium | learning outcomes / performance (mediated by AI informativeness) | 0.29 |
| Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. (Skill Acquisition) | negative | high | immediate performance and post-AI performance (skill retention/learning) | 0.48 |
| High-information AI improves short-run (immediate) performance without reducing post-AI outcomes on average in the experiments, but effects are heterogeneous across participants. (Skill Acquisition) | mixed | high | short-run (immediate) performance and post-AI performance (average and heterogeneous effects) | 0.48 |
| Depending on context, AI can either complement human skill development by amplifying independent reasoning or act as a substitute that undermines such reasoning; therefore regulating AI access and usage will be important for promoting skill development in the presence of AI assistance. (Governance And Regulation) | positive | medium | policy relevance for skill development (recommendation to regulate AI access/usage) | 0.05 |