Generative-AI-shaped practice habits rose among competitive programmers, but their payoff depends on screening: in open online contests, AI-style practice predicts smaller rating gains among unscreened users, whereas among entrants vetted through AI-prohibited, proctored gates it predicts stronger unaided performance. This suggests that proctored credentials separate true skill accelerants from mere substitutes.

When the Scaffold Stays On: AI, Practice Style, and Screening in Elite Skill Formation

Song Yao · June 04, 2026

arXiv · quasi_experimental · medium evidence · 8/10 relevance

Using an AI-style practice signature derived from Codeforces histories, and contrasting open contests with AI-prohibited, proctored gates (ICPC/IOI), the paper finds that AI-style practice increased across cohorts spanning generative-AI rollouts. It predicts smaller rating gains in unproctored contests for unscreened users, but stronger non-AI-aided performance among AI-era entrants who pass AI-prohibited selection. This implies that institutions can screen substitute-users apart from complement-users.

Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. Whether this substitution erodes frontier skill, the skill behind top-tail non-AI-aided performance, is an open question of rising stakes. The sharper question is whether selection mechanisms can screen apart two coexisting types: substitute-users, who use AI in place of deliberate practice, and complement-users, who use it to accelerate skill development. In elite programming, the International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all. From CF histories we build an AI-prompt signature (more first-attempt acceptances, fewer attempts and retries) consistent with AI-assisted practice. Three patterns triangulate institutional screening. First, CF practice shifted toward this signature across cohorts over two AI rollouts. Second, in open CF contests a stronger signature predicts smaller rating gains for users with no ICPC/IOI affiliation, but not for those who qualified for the AI-prohibited contests. Third, inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants. The same practice input carries opposite signs depending on whether the environment screens for it. The contrast points to two levers: how AI is integrated into training, since within the screened pool AI-style practice coincides with stronger non-AI-aided performance; and the design of AI-prohibited evaluation gates as a type-separating institution. Both extend beyond programming to credentialing systems (medical and legal boards, professional certification) that certify skill in a workforce increasingly shaped by AI.

Summary

Main Finding

The paper finds evidence that generative-AI-style practice (an "AI-prompt" submission signature on Codeforces) comes from two distinct user types: substitute-users, who let AI replace deliberate practice and lose unaided performance, and complement-users, who use AI to accelerate learning and maintain or improve unaided performance. AI-prohibited qualification gates (ICPC/IOI) screen out substitute-users, so the same AI-style practice predicts smaller open-contest rating gains among unscreened users but better non-AI-aided contest outcomes for AI-era entrants inside the screened elite pool.

Key Points

AI-prompt signature: defined from CF submission histories by (i) a higher share of first-attempt accepts, (ii) fewer attempts per solved problem, and (iii) fewer rapid debugging retries, a pattern consistent with a prompt-and-accept AI workflow.
Descriptive shift (Prediction 1): CF practice has trended toward the AI-prompt signature across entry cohorts spanning two rollouts (GitHub Copilot 2021, ChatGPT 2022). Example: the fraction of problems solved on the first attempt rose by 0.0586 (SE 0.0063, p < 0.001) for AI-era cohorts relative to pre-AI cohorts.
Cross-subpopulation asymmetry (Prediction 2): In open CF contests, a stronger AI-prompt signature predicts smaller CF rating gains for non-rostered users (−12.24 rating points per signature unit, p < 0.001) but shows no significant association for users who passed ICPC/IOI qualification (−2.41, p = 0.58). The cross-group difference is −9.82 (p = 0.048; N = 6,690).
- Interpretation: where screening is absent, AI-style practice signals substitution and lower unaided improvement; where screening removed substitutes, the negative association disappears.
- Caveat: this result is suggestive (the cross-group difference clears the 5% threshold only narrowly) and is subject to selection/attrition bounds.
Within-screen positive association (Prediction 3): Inside AI-prohibited ICPC contests, within-user shifts toward AI-style practice predict higher unaided contest scores for entrants whose careers began in the AI era. Interaction coefficient = 0.190 (p = 0.028; 95% CI [0.020, 0.360]; N = 579 contest-years, 444 users). Pre-AI cohort slope is null (−0.048).
- Interpretation: after screening out substitute-users, AI-style practice among the remaining users tracks complement use that improves unaided performance.
Conceptual framing: Extends macro models (e.g., Acemoglu et al. 2026) by introducing heterogeneous agent responses (substitute vs complement) and treats AI-prohibited gates as screening instruments that shape which equilibrium obtains.
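The point estimates above can be cross-checked for internal consistency. The sketch below is a minimal check under assumptions that are ours, not the paper's: a normal approximation and symmetric 95% intervals (the paper may use clustered or t-based inference). The helper names `se_from_ci`, `two_sided_p`, and `se_from_p` are hypothetical. It backs out the standard error implied by the Prediction 3 interval, recomputes p-values, and infers the standard error implied by the roster-linked estimate in Prediction 2.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Two-sided 95% critical value under a normal approximation (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)

def se_from_ci(lower, upper, z=Z95):
    """Standard error implied by a symmetric normal-approximation confidence interval."""
    return (upper - lower) / (2 * z)

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = abs(estimate) / se
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))

def se_from_p(estimate, p):
    """Standard error implied by a reported two-sided p-value (normal approximation)."""
    return abs(estimate) / NormalDist().inv_cdf(1 - p / 2)

# Prediction 3: AI-era interaction 0.190 with 95% CI [0.020, 0.360].
se_p3 = se_from_ci(0.020, 0.360)
p_p3 = two_sided_p(0.190, se_p3)

# Prediction 1: first-attempt shift 0.0586 with SE 0.0063.
p_p1 = two_sided_p(0.0586, 0.0063)

# Prediction 2, roster-linked users: -2.41 with p = 0.58.
se_p2_rostered = se_from_p(-2.41, 0.58)

print(f"P3: SE {se_p3:.4f}, p {p_p3:.3f}")
print(f"P1: z {0.0586 / 0.0063:.1f}, p {p_p1:.1e}")
print(f"P2 rostered: implied SE {se_p2_rostered:.2f}")
```

Under these assumptions the Prediction 3 interval implies a standard error of roughly 0.087 and a p-value of roughly 0.028, in line with the reported figure. Because reported p-values are rounded, `se_from_p` gives only an approximate standard error.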

Data & Methods

Data sources:
- Codeforces (CF) submission histories (panel of 10,419 users, 2018–2025) to measure practice inputs and construct the AI-prompt signature.
- Roster and contest-score linkages to ICPC and IOI for a subset of users (used to identify screened vs unscreened subpopulations and to analyze performance in AI-prohibited contests).
- Two AI rollouts used for timing/context: GitHub Copilot (2021) and ChatGPT (2022).
Key variables:
- AI-prompt signature: first principal component of the three practice-style measures (first-attempt accepts, attempts per solved problem, rapid retries).
- Outcomes: CF rating changes (open, unproctored contests) and ICPC contest scores (AI-prohibited, in-person).
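A first-principal-component signature of the three practice measures can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `ai_prompt_signature`, the per-user input arrays, the standardization step, and the sign convention are assumptions, and the data are synthetic.

```python
import numpy as np

def ai_prompt_signature(first_attempt_share, attempts_per_solve, rapid_retry_rate):
    """Hypothetical helper: PC1 scores of three practice measures, oriented so higher = more AI-prompt-like."""
    X = np.column_stack([first_attempt_share, attempts_per_solve, rapid_retry_rate]).astype(float)
    # Standardize each measure so PCA is not driven by differences in units or scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized data = correlation matrix; eigh suits symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    loading = eigvecs[:, np.argmax(eigvals)]
    # A principal component is defined only up to sign; orient it to rise with first-attempt accepts.
    if loading[0] < 0:
        loading = -loading
    return Z @ loading, loading

# Synthetic users: one latent "AI-prompt" tendency raises first-attempt accepts
# and lowers attempts per solve and rapid retries.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
first = 0.5 + 0.1 * latent + rng.normal(0, 0.05, 500)
attempts = 2.0 - 0.3 * latent + rng.normal(0, 0.2, 500)
retries = 0.3 - 0.05 * latent + rng.normal(0, 0.03, 500)

scores, loading = ai_prompt_signature(first, attempts, retries)
print("loadings:", np.round(loading, 2))
print("corr(signature, latent):", round(float(np.corrcoef(scores, latent)[0, 1]), 2))
```

Standardizing first keeps attempts per solve, which is measured in counts, from dominating the two rate-type measures; explicit sign orientation makes "higher signature" mean "more prompt-and-accept-like" regardless of the solver's arbitrary sign choice.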
Identification strategy:
- Use behavioral proxies (submission patterns) as indicators of AI-assisted practice rather than direct logs of AI use.
- Exploit institutional variation: CF is open/unproctored (mixes types), ICPC/IOI are AI-prohibited with qualification gates (expected to filter substitute-users).
- Tests include cohort contrasts, cross-subpopulation interactions (rostered vs non-rostered), and within-contest within-user analyses to control for time-invariant heterogeneity and career stage.
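The within-user logic behind the ICPC interaction test can be sketched as a fixed-effects regression estimated by demeaning each user's observations. This is a simplified sketch under stated assumptions: it omits the paper's career-stage controls, contest-year effects, and any clustering, and the names `within_demean` and `fe_interaction` and the synthetic panel are hypothetical.

```python
import numpy as np

def within_demean(values, groups):
    """Subtract each user's mean: the 'within' transformation behind user fixed effects."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for grp in np.unique(groups):
        mask = groups == grp
        out[mask] -= values[mask].mean(axis=0)
    return out

def fe_interaction(score, signature, ai_era, user):
    """Fit score = b1*sig + b2*(sig * ai_era) + user effect + error by within-user OLS.

    ai_era is constant within a user, so its main effect is absorbed by the user
    effects; b1 is the pre-AI-cohort slope and b1 + b2 the AI-era slope.
    """
    X = np.column_stack([signature, signature * ai_era])
    coef, *_ = np.linalg.lstsq(within_demean(X, user), within_demean(score, user), rcond=None)
    return coef

# Synthetic panel: signature is correlated with a fixed "ability" term, which
# would bias pooled OLS but is removed by demeaning within user.
rng = np.random.default_rng(1)
n_users, n_years = 400, 3
user = np.repeat(np.arange(n_users), n_years)
ai_era = np.repeat(rng.integers(0, 2, n_users), n_years).astype(float)
ability = np.repeat(rng.normal(0, 1, n_users), n_years)
signature = rng.normal(0, 1, user.size) + 0.5 * ability
score = ability + 0.19 * signature * ai_era + rng.normal(0, 0.2, user.size)

b1, b2 = fe_interaction(score, signature, ai_era, user)
print(f"pre-AI slope b1 = {b1:.3f}, AI-era interaction b2 = {b2:.3f}")
```

Only the interaction is identified from within-user variation, which is why the AI-era cohort indicator enters only through its product with the signature.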
Sample sizes & key estimates:
- N = 10,419 CF users tracked overall.
- CF rating cross-subpopulation test: N = 6,690 (5,283 non-rostered, 1,407 roster-linked); coefficient −12.24 (non-rostered), −2.41 (rostered).
- ICPC within-contest test: 579 contest-years, 444 users; AI-era cohort interaction = 0.190 (p = 0.028).
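The cross-group difference in Prediction 2 is consistent with the group slopes: −12.24 − (−2.41) = −9.83, matching the reported −9.82 up to rounding. One standard way to obtain such a difference is a fully interacted pooled regression, whose interaction coefficient equals the difference of separately fitted group slopes. The sketch below assumes that specification (the paper's controls and standard errors may differ); the helper names are hypothetical, and the data are synthetic, reusing only the reported group sizes and point estimates.

```python
import numpy as np

def group_slope(x, y):
    """OLS slope of y on x, fitted with an intercept."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def interaction_difference(x, y, non_rostered):
    """Fully interacted OLS: y = a + b1*x + b2*g + b3*(x*g), g = 1 for non-rostered users.

    With no other regressors, b3 equals the non-rostered slope minus the rostered slope.
    """
    g = np.asarray(non_rostered, dtype=float)
    X = np.column_stack([np.ones_like(x), x, g, x * g])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

# Synthetic illustration: group sizes from the paper, slopes set to the reported point estimates.
rng = np.random.default_rng(2)
n_non, n_ros = 5283, 1407
grp = np.r_[np.ones(n_non), np.zeros(n_ros)]
x = rng.normal(size=n_non + n_ros)  # signature (hypothetical units)
true_slope = np.where(grp == 1, -12.24, -2.41)
y = 15.0 + true_slope * x + rng.normal(0, 60, size=x.size)  # rating gain

b3 = interaction_difference(x, y, grp)
diff = group_slope(x[grp == 1], y[grp == 1]) - group_slope(x[grp == 0], y[grp == 0])
print(f"pooled interaction b3 = {b3:.3f}; difference of group slopes = {diff:.3f}")
```

The two numbers agree to numerical precision because the fully interacted model lets each group keep its own intercept and slope; adding shared (non-interacted) controls would break that exact equality.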
Robustness and limits:
- The AI-prompt signature is a proxy consistent with AI use but not a verified measure of actual AI prompting.
- Cohort practice shifts are descriptive and not cleanly timed to AI rollouts; causality is strongest for within-ICPC analyses.
- Cross-group contrast is near conventional significance and sensitive to selection/attrition assumptions; the paper reports selection bounds.

Implications for AI Economics

Human-capital formation and heterogeneity:
- Generative AI can compress lower-tail performance but threaten frontier, unaided skill if used as a substitute for deliberate practice. Models of labor and human capital should allow heterogeneous engagement (substitute vs complement) and account for selection into credentials.
Credentialing and screening as policy levers:
- AI-prohibited evaluation gates (in-person attendance, proctoring without internet access, peer selection) act as mechanism-design tools that can preserve the signal value of credentials by filtering out substitute-users. Economic policy debates should treat gate design, not just prohibition, as central.
Training design:
- The same AI tool can be scaffolded to complement learning (e.g., hint-based or explanation-focused modes) rather than substitute for it; experimental evidence cited in the paper suggests that scaffolded AI raises practice outcomes without degrading unaided performance.
Labor-market signaling and redistribution:
- Widespread AI adoption may change the information content of common signals (grades, contest ratings). Credentialing institutions that fail to adjust may see erosion of their signaling power at the frontier, affecting hiring and education incentives.
External validity and domain dependence:
- Generalization depends on how substitutable AI is for core skill in a domain. Competitive programming is a setting where AI can directly produce scored outputs (high substitutability). Domains where AI functions primarily as feedback (e.g., some uses of engines in chess training) or where strong institutional boundaries already exist may be less vulnerable.
Research and policy priorities:
- Empirical work should measure engagement heterogeneity (not just access) and evaluate gate designs across professions (medicine, law, certification bodies).
- Policy experiments could test targeted prohibitions, scaffolded AI tools, or mixed-evaluation designs that preserve non-AI-aided practice for frontier skill assessment.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The paper triangulates evidence across multiple comparisons (pre/post rollouts, open vs proctored environments, within-screened entrants), which strengthens causal claims relative to simple correlations; however, it lacks a randomized intervention or direct observation of AI use, relies on an inferred signature that may be noisy, and cannot fully rule out residual selection or confounding.
Methods Rigor: medium. Methods are thoughtful: a behavioral signature built from submission patterns, multiple institutional contrasts, cohort comparators, and two outcome measures (rating gains and non-AI-aided contest scores). Limitations include measurement error in the proxy for AI use, potential unobserved heterogeneity across participants and institutions, and limited robustness checks reported (no randomized assignment or instrumental variable to fully address endogeneity).
Sample: Longitudinal contest histories from Codeforces users (practice/problem submissions, first-attempt accepts, attempts/retries, contest ratings and rating changes) across cohorts spanning pre- and post-generative-AI rollouts, plus entrant rosters and non-AI-aided contest scores from AI-prohibited, proctored competitions (ICPC and IOI), used to identify entrants who passed screening.
Themes: skills_training, human_ai_collab, adoption, governance
Identification: Construct an AI-prompt usage signature from Codeforces (CF) practice histories (more first-attempt accepts, fewer attempts/retries); exploit the timing of two generative-AI rollouts; contrast behavior and outcomes across institutional environments that differ in AI-use constraints (open, unproctored CF contests versus AI-prohibited, proctored ICPC/IOI gates); and compare within-cohort and within-gate changes to infer effects and screening (difference-in-differences / cohort comparison with selection-by-institution logic and within-sample heterogeneity analyses).
Generalizability:
- Elite, self-selected sample of competitive programmers; not representative of general workers or typical learners.
- Contest/problem-solving tasks differ from many workplace tasks (structured problems, immediate feedback).
- The AI signature is an indirect proxy for AI use and may not generalize across platforms or domains with different interaction patterns.
- Outcomes cover short- to medium-term contest performance; long-term skill trajectories are unobserved.
- Cultural and institutional differences in other credentialing systems (medicine, law) may limit transferability.

Claims (9)

Claim	Category	Direction	Confidence	Outcome	Details
Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own.	Developer Productivity	positive	high	short-term productivity (task completion of practice items)	0.24
The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.	Governance And Regulation	null_result	high	institutional design (proctoring and entry requirements)	0.8
From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.	Skill Acquisition	null_result	high	submission patterns (first-attempt acceptances, attempts, retries)	0.48
Codeforces practice shifted toward this AI-style signature across cohorts over two AI rollouts.	Skill Acquisition	positive	high	prevalence of AI-style practice signature in CF cohorts	0.48
In open Codeforces contests a stronger AI-style signature predicts smaller rating gains for users with no ICPC/IOI affiliation, but not for those who qualified for the AI-prohibited contests.	Skill Acquisition	mixed	high	rating gains in open CF contests	0.48
Inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants.	Skill Acquisition	positive	medium	non-AI-aided ICPC scores	0.29
The same practice input carries opposite signs depending on whether the environment screens for it.	Skill Acquisition	mixed	high	effect of AI-style practice on performance (rating gains or non-AI scores)	0.48
Two levers follow from the contrast: (1) how AI is integrated into training, since within the screened pool AI-style practice coincides with stronger non-AI-aided performance; and (2) the design of AI-prohibited evaluation gates as a type-separating institution.	Governance And Regulation	positive	medium	policy levers affecting skill certification and training outcomes	0.05
These findings and institutional lessons extend beyond programming to credentialing systems (medical and legal boards, professional certification) that certify skill in a workforce increasingly shaped by AI.	Governance And Regulation	positive	medium	applicability of findings to credentialing systems' design and certification outcomes	0.05