Heavy use of on-demand AI can undermine learning in a controlled reasoning task, while informative AI boosts short-term performance and—on average—does not reduce later independent performance; outcomes, however, vary substantially across users.

The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning

Shang Wu, Hongyu Yao, Catarina Belem, Shuyuan Fu, Mark Steyvers, Padhraic Smyth · May 20, 2026

arXiv · quasi-experimental · medium evidence · 7/10 relevance

In a controlled reasoning task with on-demand AI, heavy AI users learned less than comparable peers, low-information AI failed to aid learning, and high-information AI improved immediate performance without consistently harming post-AI outcomes on average, though effects varied across users.

Artificial intelligence (AI) is being increasingly integrated into human problem-solving, yet its effects on individual skill development remain unclear. We examine how both AI usage and informativeness can shape learning in the context of a controlled logical reasoning task with on-demand access to AI assistance. We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. We also find in our study that these patterns are mediated by AI informativeness. Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. On the other hand, high-information AI was found to improve short-run performance without reducing post-AI outcomes on average in our experiments, but with heterogeneous effects. Our findings in general suggest that AI can, depending on context, either complement human skill development by amplifying independent reasoning or can act as a substitute that undermines such reasoning, with the implication that regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.

Summary

Main Finding

Greater reliance on on-demand AI in a time-limited logical-reasoning task is associated with weaker subsequent skill development: heavy AI users underperform matched non-users after AI is removed, whereas light users perform similarly to (or slightly better than) matched peers. Crucially, AI informativeness moderates these effects. Low-information AI (reveals 1 object per request) produces no short-run gains and harms post-AI learning. High-information AI (reveals 3 objects per request) improves immediate performance and, on average, does not reduce post-AI outcomes, but its effects are heterogeneous (polarizing): it benefits higher-ability users more and amplifies ability gaps.

Key Points

Usage behavior matters: Heavy AI usage (AI requested on more than 40% of Phase 2 problems, i.e., usage fraction > 0.4) is associated with longer response times and lower correctness in the post-AI assessment relative to matched controls; light users do not show this deficit.
Informativeness matters:
- Low-information AI: no meaningful immediate performance improvement; associated with reduced post-AI reward rates (especially among lower-ability participants).
- High-information AI: immediate performance increases during AI exposure; on average no significant post-AI decline, but widens the gap between high- and low-ability participants (polarization).
Mechanisms:
- Cognitive offloading: heavier and earlier requests reduce “solo thinking” time, displacing independent effort.
- Miscalibration: lower-ability participants report inflated perceived ability when AI is available, possibly reducing effort and impairing learning.
Design levers (timing of assistance, informativeness, cost per request) influence reliance and learning outcomes.
Robustness: analyses control for baseline ability using propensity-score matching and split-sample analyses by initial ability; AI accuracy was held at 100% (simulated) to isolate informativeness effects.
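
The usage-behavior and offloading points above rest on two per-participant quantities. A minimal sketch of how they could be computed from Phase 2 logs follows, assuming a hypothetical `ProblemLog` record. The 0.4 heavy-use cutoff comes from the summary; treating every fraction at or below 0.4 as "light" and defining the solo thinking ratio as time before the AI request divided by total time on the problem (with no-AI problems counted as 1.0) are our assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import Optional

HEAVY_THRESHOLD = 0.4  # heavy users: AI requested on more than 40% of Phase 2 problems


@dataclass
class ProblemLog:
    """One Phase 2 problem for one participant (hypothetical log format)."""
    response_time: float              # seconds from problem onset to final submission
    ai_request_time: Optional[float]  # seconds from onset to the AI request; None if no request


def usage_fraction(logs: list[ProblemLog]) -> float:
    """Share of Phase 2 problems on which the participant requested AI."""
    if not logs:
        return 0.0
    return sum(p.ai_request_time is not None for p in logs) / len(logs)


def usage_group(logs: list[ProblemLog]) -> str:
    """Assumed split: 'heavy' above the threshold, otherwise 'light'."""
    return "heavy" if usage_fraction(logs) > HEAVY_THRESHOLD else "light"


def solo_thinking_ratio(logs: list[ProblemLog]) -> float:
    """Mean share of time spent working alone before asking the AI.

    Assumption: a problem solved without AI counts as 1.0 (all solo thinking),
    and each per-problem ratio is capped at 1.0.
    """
    if not logs:
        return 1.0
    ratios = [
        1.0 if p.ai_request_time is None
        else min(p.ai_request_time / p.response_time, 1.0)
        for p in logs
    ]
    return sum(ratios) / len(ratios)


logs = [ProblemLog(60, 10), ProblemLog(80, None), ProblemLog(50, 25),
        ProblemLog(70, None), ProblemLog(40, None)]
print(usage_fraction(logs), usage_group(logs))  # 0.4 light
```

Earlier requests shrink `ai_request_time` relative to `response_time`, so the ratio falls as offloading rises, which is the pattern the cognitive-offloading point describes.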

Data & Methods

Task: Time-constrained logic puzzles in which participants determine the unique order of six objects from a set of logical constraints; participants could submit up to two attempts per problem and always saw the correct solution afterward (feedback).
Experimental phases:
- Phase 1: Pre-AI baseline (8 minutes; at least 4 problems).
- Phase 2: Treatment (20 minutes): No-AI, Low-information AI (reveals 1 object per request), or High-information AI (reveals 3 objects per request). Each AI request costs 0.2 points, deducted only if the problem is solved correctly; at most one AI request is allowed per problem.
- Phase 3: Post-AI assessment (same as Phase 1, no assistance).
Sample: 160 participants recruited via Prolific; final N = 132 after exclusions (42 No-AI, 47 Low-info AI, 43 High-info AI). Participants were U.S.-based adults with at least a bachelor's degree; mean age ~40.
AI: Simulated assistant with perfect (100%) accuracy to separate informativeness from reliability.
Primary outcome: Reward rate = correctness (correct objects out of 6) per minute (speed–accuracy tradeoff). Other metrics: response time, correctness, AI usage fraction, timing of AI requests, solo thinking ratio, initial ability (Phase 1 reward rate).
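
To make the primary outcome concrete, here is a sketch of a per-phase reward rate under stated assumptions: correctness is scored as the fraction of the six objects placed correctly, the denominator is the time actually spent on problems, and the 0.2-point AI cost is subtracted only when a problem is fully solved (our reading of the cost rule). The `Attempt` type and function names are hypothetical.

```python
from dataclasses import dataclass

N_OBJECTS = 6   # each puzzle orders six objects
AI_COST = 0.2   # points per AI request (Phase 2 AI conditions only)


@dataclass
class Attempt:
    """One problem's final submission (hypothetical record)."""
    correct_positions: int  # objects in the correct position, 0..6
    seconds: float          # time spent on the problem
    used_ai: bool = False


def problem_reward(a: Attempt) -> float:
    """Correctness as a fraction of six, minus the AI cost when it applies.

    Assumption: the cost is charged only when the problem is fully solved.
    """
    reward = a.correct_positions / N_OBJECTS
    if a.used_ai and a.correct_positions == N_OBJECTS:
        reward -= AI_COST
    return reward


def reward_rate(attempts: list[Attempt]) -> float:
    """Total reward per minute of problem-solving time within one phase."""
    minutes = sum(a.seconds for a in attempts) / 60
    if minutes == 0:
        return 0.0
    return sum(problem_reward(a) for a in attempts) / minutes


# Two minutes of work: one full solve (1.0) and one half-correct order (0.5).
print(reward_rate([Attempt(6, 60), Attempt(3, 60)]))  # 0.75
```

Dividing by minutes is what builds in the speed–accuracy trade-off: answering faster at the same correctness raises the rate, while rushing into wrong orders lowers it.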
Analyses:
- Between-condition comparisons over phases.
- Propensity-score matching (PSM) to compare Light and Heavy AI users with matched No-AI controls on Phase 1 correctness and response time.
- Subgroup analyses by initial ability (median split) to assess heterogeneity.
- Time-course plots of reward rate across phase halves to inspect dynamics.
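
The matching step can be sketched as follows: fit a logistic regression predicting group membership (e.g., heavy AI user vs. No-AI) from Phase 1 correctness and response time, pair each treated participant with the No-AI participant whose propensity score is closest (1-nearest-neighbor, with replacement, optional caliper), and average the Phase 3 outcome differences. This is a generic propensity-score-matching sketch in plain Python, not the authors' implementation; the gradient-descent settings, function names, and toy data are illustrative.

```python
import math


def _standardize(X: list[list[float]]) -> list[list[float]]:
    """Z-score each covariate column so gradient descent is well conditioned."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def propensity_scores(X, treated, lr=0.1, epochs=2000):
    """P(treated | covariates) from a logistic regression fit by batch gradient descent."""
    Z = _standardize(X)
    w = [0.0] * (len(Z[0]) + 1)  # intercept followed by one weight per covariate
    n = len(Z)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, treated):
            err = _sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) - t
            grad[0] += err
            for j, zj in enumerate(z):
                grad[j + 1] += err * zj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return [_sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) for z in Z]


def matched_difference(X, treated, outcome, caliper=None):
    """1-nearest-neighbour propensity-score matching with replacement.

    Returns (mean treated-minus-control outcome difference, number of matched
    treated units), i.e. an ATT-style estimate.
    """
    ps = propensity_scores(X, treated)
    controls = [i for i, t in enumerate(treated) if t == 0]
    diffs = []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if caliper is not None and abs(ps[j] - ps[i]) > caliper:
            continue  # drop treated units with no control inside the caliper
        diffs.append(outcome[i] - outcome[j])
    return (sum(diffs) / len(diffs) if diffs else float("nan")), len(diffs)


# Covariates: [Phase 1 correctness, Phase 1 mean response time in s] (toy values).
X = [[0.3, 100], [0.4, 95], [0.5, 90],             # heavy AI users
     [0.6, 70], [0.7, 65], [0.8, 60], [0.4, 96]]   # No-AI controls
treated = [1, 1, 1, 0, 0, 0, 0]
phase3 = [0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6]       # Phase 3 reward rates (toy values)
est, n_matched = matched_difference(X, treated, phase3)
```

Matching with replacement keeps every treated participant but can reuse a few controls; with only 42 No-AI participants in the study, checking covariate balance after matching matters at least as much as the choice of estimator.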
Limitations noted by authors: an abstract, lab-style puzzle domain; a short-term post-AI assessment with no long-run follow-up; a simulated perfect AI that does not capture trust dynamics under imperfect accuracy; and a sample skewed toward educated adults.

Implications for AI Economics

Short-run vs long-run trade-offs: Firms and educators should weigh immediate productivity gains from AI against potential depreciation of human capital when AI is heavily used. Policies that prioritize only short-run output may underinvest in durable skill formation.
Informational design matters for welfare and inequality:
- Well-designed, information-rich assistants can raise productivity without uniformly degrading skills, but they risk amplifying disparities if higher-ability workers exploit them more effectively. Adoption of such tools may increase within-occupation inequality.
- Low-information tools that are easy to invoke but not very helpful can harm learning and productivity in the medium run, so access that looks costless in the short run can carry a hidden cost in lost skill.
Pricing and access regulation as policy levers:
- Small user costs (monetary or behavioral nudges) per request can shape reliance; optimal pricing could discourage overreliance and preserve skill while allowing beneficial selective use.
- Access controls or delayed-assistance designs (encouraging independent effort before help) may sustain learning while still enabling assistance.
Human capital accumulation and labor markets:
- Persistent use of low-value AI could reduce future worker capabilities, lowering long-term productivity and increasing training costs for firms/governments.
- Firms should monitor usage patterns: heavy reliance signals potential skill erosion and may warrant targeted training or adjusted task assignments.
Product and interface design:
- Designers should prioritize informativeness and promote selective, late-stage assistance (e.g., reveal hints only after users have attempted or after a timeout) to encourage cognitive engagement.
- Transparent feedback about a user’s dependence and calibrated performance signals could reduce miscalibrated self-assessment and sustain learning.
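
The informativeness and delayed-assistance levers can be illustrated with a toy hint policy: a perfectly accurate assistant reveals k correct positions (k = 1 or 3, mirroring the low- and high-information conditions), at most once per problem, optionally only after a first attempt or a timeout. The function names, parameters, and gating rule are hypothetical and only show how the levers compose; they are not the study's software.

```python
import random


def reveal_hint(solution: list[str], k: int, rng: random.Random) -> dict[int, str]:
    """Simulated, perfectly accurate assistant: reveal k (position -> object) pairs.

    k=1 mirrors the low-information condition, k=3 the high-information one.
    """
    positions = rng.sample(range(len(solution)), k)
    return {pos: solution[pos] for pos in sorted(positions)}


def hint_allowed(already_requested: bool, attempts_made: int, elapsed_s: float,
                 min_attempts: int = 0, timeout_s: float = 0.0) -> bool:
    """Gate an AI request.

    At most one request per problem (as in the experiment). With the default
    arguments access is on demand; setting min_attempts and/or timeout_s gives a
    hypothetical delayed-assistance policy that unlocks help after either a
    first attempt or a timeout, whichever comes first.
    """
    if already_requested:
        return False
    if min_attempts == 0 and timeout_s == 0.0:
        return True
    after_attempt = min_attempts > 0 and attempts_made >= min_attempts
    after_timeout = timeout_s > 0 and elapsed_s >= timeout_s
    return after_attempt or after_timeout


rng = random.Random(0)
order = ["A", "B", "C", "D", "E", "F"]
print(reveal_hint(order, 3, rng))                     # three correct placements
print(hint_allowed(False, 0, 20.0, timeout_s=60.0))   # False: help still locked
```

A per-request cost such as the study's 0.2 points would sit on top of this gate as a third lever, charged whenever `hint_allowed` returns True and the user actually asks.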
Measurement and evaluation:
- Evaluations of AI’s economic impact should incorporate post-assistance skill retention (not just in-assistance productivity) and heterogeneity by baseline ability to capture distributional effects.
- Cost–benefit analyses for AI deployment should internalize potential long-term skill depreciation and inequality effects.
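
A minimal way to report heterogeneity by baseline ability is a median split on Phase 1 reward rate followed by within-group pre-to-post changes; the high-minus-low gap is one crude polarization measure. Because the split and the change score both use Phase 1 performance, regression to the mean pushes the high group's change down and the low group's up, so such gaps are best compared against the same quantity in the No-AI condition. The function names, toy values, and tie-handling rule are assumptions.

```python
from statistics import mean, median


def ability_split(initial: list[float]) -> list[str]:
    """Median split on initial ability (Phase 1 reward rate).

    Assumption: participants exactly at the median are placed in the low group.
    """
    m = median(initial)
    return ["high" if x > m else "low" for x in initial]


def retention_by_ability(pre: list[float], post: list[float]) -> dict[str, float]:
    """Mean post-AI minus pre-AI reward rate per ability group, plus the gap."""
    groups = ability_split(pre)
    out: dict[str, float] = {}
    for g in ("high", "low"):
        changes = [b - a for a, b, lab in zip(pre, post, groups) if lab == g]
        out[g] = mean(changes) if changes else float("nan")
    out["gap"] = out["high"] - out["low"]  # > 0: high-ability participants gained more
    return out


pre = [0.2, 0.4, 0.6, 0.8]    # Phase 1 reward rates (toy values)
post = [0.2, 0.4, 0.8, 1.0]   # Phase 3 reward rates (toy values)
print(retention_by_ability(pre, post))
```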

Suggested directions for policymakers, firms, and researchers: test incentive structures (pricing, delayed hints), evaluate long-term retention in real work/education settings, extend to imperfect AI accuracy, and design interventions that encourage productive (selective) use especially among lower-ability users.

Assessment

Paper Type: quasi-experimental
Evidence Strength: medium. Randomized variation in AI informativeness yields credible short-run causal evidence about the informativeness treatment, but the main result linking usage intensity to weaker learning rests on endogenous, self-selected use (albeit with matching and controls), leaving concerns about residual selection and unobserved confounding; the setting is an artificial reasoning task with likely limited external validity.
Methods Rigor: medium. The controlled experimental protocol, pre/post measurement, and randomized informativeness are methodologically strong, and matching and mediation analyses probe the usage effects; however, reliance on on-demand uptake (an endogenous treatment), a modest, non-representative online sample (N = 132), and the domain (a logical-reasoning task rather than real-world work) weaken rigor relative to a fully powered field RCT.
Sample: 132 Prolific participants (160 recruited), U.S.-based adults with at least a bachelor's degree, completing a time-limited logical-reasoning task; participants were randomized across No-AI, low-information AI, and high-information AI conditions, with pre- and post-AI performance measured and AI usage intensity recorded.
Themes: skills_training, human_ai_collab, productivity
Identification: Randomized manipulation of AI informativeness combined with a controlled pre/post testing protocol; causal claims about informativeness come from the randomized assignment, while claims about AI usage intensity rely on observational comparisons (matching and controls) among participants with on-demand access.
Generalizability:
- The artificial task (logical reasoning) may not map to workplace or real-world skill domains (e.g., writing, coding, negotiation).
- Short-term experimental horizon: limited evidence on long-term skill development or retention.
- The participant pool is non-representative (a Prolific online panel of U.S. adults with at least a bachelor's degree), which limits population external validity.
- The simulated, perfectly accurate on-demand assistant may differ from production LLMs and from real-world integration into workflows.
- Usage intensity is endogenous in the study, so patterns may differ when access or incentives differ in real settings.

Claims (6)

Claim	Category	Direction	Confidence	Outcome	Details
Greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI.	Skill Acquisition	negative	high	skill development / performance after AI assistance removed	0.48
Light AI users perform similarly to matched users who do not use AI.	Skill Acquisition	null_result	high	post-AI performance / skill development	0.48
AI informativeness mediates the relationship between AI usage and learning outcomes.	Skill Acquisition	mixed	medium	learning outcomes / performance (mediated by AI informativeness)	0.29
Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall.	Skill Acquisition	negative	high	immediate performance and post-AI performance (skill retention/learning)	0.48
High-information AI improves short-run (immediate) performance without reducing post-AI outcomes on average in the experiments, but effects are heterogeneous across participants.	Skill Acquisition	mixed	high	short-run (immediate) performance and post-AI performance (average and heterogeneous effects)	0.48
Depending on context, AI can either complement human skill development by amplifying independent reasoning or act as a substitute that undermines such reasoning; therefore regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.	Governance And Regulation	positive	medium	policy relevance for skill development (recommendation to regulate AI access/usage)	0.05