Generative-AI-shaped practice habits rose sharply among competitive programmers, but their payoff depends on screening: in open online contests, AI-style practice predicts smaller rating gains, whereas among entrants vetted through AI-prohibited, proctored gates it predicts stronger unaided performance, suggesting that proctored credentials separate true skill accelerants from mere substitutes.
Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. Whether this substitution erodes frontier skill, the skill behind top-tail non-AI-aided performance, is an open question of rising stakes. The sharper question is whether selection mechanisms can screen apart two coexisting types: substitute-users, who use AI in place of deliberate practice, and complement-users, who use it to accelerate skill development. In elite programming, the International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all. From CF histories we build an AI-prompt signature (more first-attempt acceptances, fewer attempts and retries) consistent with AI-assisted practice. Three patterns triangulate institutional screening. First, CF practice shifted toward this signature across cohorts over two AI rollouts. Second, in open CF contests a stronger signature predicts smaller rating gains for users with no ICPC/IOI affiliation, but not for those who qualified for the AI-prohibited contests. Third, inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants. The same practice input carries opposite signs depending on whether the environment screens for it. The contrast points to two levers: how AI is integrated into training, since within the screened pool AI-style practice coincides with stronger non-AI-aided performance; and the design of AI-prohibited evaluation gates as a type-separating institution. Both extend beyond programming to credentialing systems (medical and legal boards, professional certification) that certify skill in a workforce increasingly shaped by AI.
Summary
Main Finding
The paper shows that generative-AI-style practice (an "AI-prompt" submission signature on Codeforces) reflects two distinct user types: substitute-users, who let AI replace deliberate practice and whose unaided performance suffers, and complement-users, who use AI to accelerate learning and maintain or improve unaided performance. AI-prohibited qualification gates (ICPC/IOI) screen out substitute-users, so the same AI-style practice predicts worse open-contest outcomes among unscreened users but better non-AI-aided contest outcomes inside the screened elite pool.
Key Points
- AI-prompt signature: Defined from CF submission histories as (i) higher share of first-attempt accepts, (ii) fewer attempts per solved problem, (iii) fewer rapid debugging retries — consistent with a prompt-and-accept AI workflow.
- Descriptive shift (Prediction 1): CF practice has trended toward the AI-prompt signature across entry cohorts spanning two rollouts (GitHub Copilot 2021, ChatGPT 2022). Example: first-attempt solved fraction rose by 0.0586 (SE 0.0063, p < 0.001) for AI-era cohorts vs pre-AI.
- Cross-subpopulation asymmetry (Prediction 2): In open CF contests, a stronger AI-prompt signature predicts smaller CF rating gains for non-rostered users (−12.24 rating pts per signature unit, p < 0.001) but shows no significant association for users who passed ICPC/IOI qualification (−2.41, p = 0.58). The cross-group difference is −9.82 (p = 0.048; N = 6,690).
  - Interpretation: where screening is absent, AI-style practice signals substitution and lower unaided improvement; where screening has removed substitute-users, the negative association disappears.
  - Caveat: this result is suggestive (p = 0.048 is only just below the conventional 0.05 threshold) and subject to selection/attrition bounds.
- Within-screen positive association (Prediction 3): Inside AI-prohibited ICPC contests, within-user shifts toward AI-style practice predict higher unaided contest scores for entrants whose careers began in the AI era. Interaction coefficient = 0.190 (p = 0.028; 95% CI [0.020, 0.360]; N = 579 contest-years, 444 users). The pre-AI cohort slope is null (−0.048).
  - Interpretation: after screening out substitute-users, AI-style practice among the remaining users tracks complement use that improves unaided performance.
- Conceptual framing: Extends macro models (e.g., Acemoglu et al. 2026) by introducing heterogeneous agent responses (substitute vs complement) and treats AI-prohibited gates as screening instruments that shape which equilibrium obtains.
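The headline numbers above can be cross-checked with simple arithmetic. The sketch below backs out a standard error from the reported 95% CI for the Prediction 3 interaction and recomputes a two-sided p-value, then differences the two rounded Prediction 2 slopes. It assumes a symmetric normal-approximation interval, which may not match the paper's actual inference (for example, clustered or t-based standard errors); the function name `two_sided_p_from_ci` is hypothetical.

```python
import math

def two_sided_p_from_ci(estimate, ci_low, ci_high, z_crit=1.96):
    """Back out a standard error from a 95% CI and return (se, z, p).

    Assumption: the interval is symmetric and normal-based. The paper may
    have used t-based or clustered inference, so this is a rough check only.
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Prediction 3: interaction coefficient and CI as reported in the summary.
se, z, p = two_sided_p_from_ci(0.190, 0.020, 0.360)

# Prediction 2: gap between the two rounded slopes reported in the summary.
slope_gap = -12.24 - (-2.41)

print(f"implied SE = {se:.4f}, z = {z:.2f}, p = {p:.3f}")
print(f"slope gap from rounded inputs = {slope_gap:.2f} (reported: -9.82)")
```

Under these assumptions the implied p-value lands close to the reported 0.028, and the slope gap computed from the rounded coefficients differs from the reported −9.82 by 0.01, which is what rounding of the two inputs would produce.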
Data & Methods
- Data sources:
  - Codeforces (CF) submission histories (panel of 10,419 users, 2018–2025) to measure practice inputs and construct the AI-prompt signature.
  - Roster and contest-score linkages to ICPC and IOI for a subset of users (used to identify screened vs unscreened subpopulations and to analyze performance in AI-prohibited contests).
  - Two AI rollouts used for timing/context: GitHub Copilot (2021) and ChatGPT (2022).
- Key variables:
  - AI-prompt signature: first principal component of the three practice-style measures (first-attempt accepts, attempts per solved problem, rapid retries).
  - Outcomes: CF rating changes (open, AI-permissive contests) and ICPC contest scores (AI-prohibited, in-person).
- Identification strategy:
  - Use behavioral proxies (submission patterns) as indicators of AI-assisted practice rather than direct logs of AI use.
  - Exploit institutional variation: CF is open/unproctored (mixes types), ICPC/IOI are AI-prohibited with qualification gates (expected to filter substitute-users).
  - Tests include cohort contrasts, cross-subpopulation interactions (rostered vs non-rostered), and within-contest within-user analyses to control for time-invariant heterogeneity and career stage.
- Sample sizes & key estimates:
  - N = 10,419 CF users tracked overall.
  - CF rating cross-subpopulation test: N = 6,690 (5,283 non-rostered, 1,407 roster-linked); coefficient −12.24 (non-rostered), −2.41 (rostered).
  - ICPC within-contest test: 579 contest-years, 444 users; AI-era cohort interaction = 0.190 (p = 0.028).
- Robustness and limits:
  - The AI-prompt signature is a proxy consistent with AI use but not a verified measure of actual AI prompting.
  - Cohort practice shifts are descriptive and not cleanly timed to AI rollouts; causality is strongest for within-ICPC analyses.
  - Cross-group contrast is near conventional significance and sensitive to selection/attrition assumptions; the paper reports selection bounds.
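To make the signature construction concrete, here is a minimal sketch of taking the first principal component of the three practice-style measures. It is not the paper's code: the data are synthetic, the standardization, the sign orientation (first-attempt share loading positively), and the helper names `zscore` and `first_pc` are all assumptions made for illustration, and power iteration stands in for whatever PCA routine the authors used.

```python
import math
import random

def zscore(values):
    """Standardize to mean 0 and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sd for x in values]

def first_pc(rows, orient_col=0, iters=200):
    """Return (loadings, scores) for the first principal component.

    rows: one tuple of measures per user (hypothetical layout).
    orient_col: column forced to load positively, which fixes the PC sign.
    """
    cols = [zscore(list(col)) for col in zip(*rows)]
    k, n = len(cols), len(rows)
    # Correlation matrix of the standardized measures.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(k)] for i in range(k)]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector, i.e. the PC1 loadings.
    v = [1.0 if j == orient_col else 0.0 for j in range(k)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[orient_col] < 0:  # safeguard: keep the orienting column positive
        v = [-x for x in v]
    scores = [sum(v[j] * cols[j][r] for j in range(k)) for r in range(n)]
    return v, scores

# Synthetic users: a latent "AI-style" tendency drives all three measures.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(500)]
rows = [(0.50 + 0.10 * t + random.gauss(0, 0.05),   # first-attempt accept share
         2.00 - 0.30 * t + random.gauss(0, 0.15),   # attempts per solved problem
         1.00 - 0.20 * t + random.gauss(0, 0.10))   # rapid debugging retries
        for t in latent]

loadings, signature = first_pc(rows)
print("PC1 loadings:", [round(x, 2) for x in loadings])
```

With this orientation a higher score means more first-attempt accepts and fewer attempts and retries, matching the direction described above. On real submission histories the measures would first have to be aggregated per user and period, a step this sketch leaves out.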
Implications for AI Economics
- Human-capital formation and heterogeneity:
  - Generative AI can compress lower-tail performance but threaten frontier, unaided skill if used as a substitute for deliberate practice. Models of labor and human capital should allow heterogeneous engagement (substitute vs complement) and account for selection into credentials.
- Credentialing and screening as policy levers:
  - AI-prohibited evaluation gates (in-person, no-internet/proctoring, peer selection) act as mechanism-design tools that can preserve the signal value of credentials by filtering substitute-users. Economic policy debates should treat gate design (not just prohibition) as central.
- Training design:
  - The same AI tool can be scaffolded to complement learning (e.g., hint-based or explanation-focused modes) rather than substitute it; experiments (and evidence cited in the paper) suggest scaffolded AI raises practice outcomes without degrading unaided performance.
- Labor-market signaling and redistribution:
  - Widespread AI adoption may change the information content of common signals (grades, contest ratings). Credentialing institutions that fail to adjust may see erosion of their signaling power at the frontier, affecting hiring and education incentives.
- External validity and domain dependence:
  - Generalization depends on how substitutable AI is for core skill in a domain. Competitive programming is a setting where AI can directly produce scored outputs (high substitutability). Domains where AI functions primarily as feedback (e.g., some chess uses) or where strong institutional boundaries already exist may be less vulnerable.
- Research and policy priorities:
  - Empirical work should measure engagement heterogeneity (not just access) and evaluate gate designs across professions (medicine, law, certification bodies).
  - Policy experiments could test targeted prohibitions, scaffolded AI tools, or mixed-evaluation designs that preserve non-AI-aided practice for frontier skill assessment.
Assessment
Claims (9)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. (Developer Productivity) | positive | high | short-term productivity (task completion of practice items) | 0.24 |
| The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all. (Governance And Regulation) | null_result | high | institutional design (proctoring and entry requirements) | 0.8 |
| From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice. (Skill Acquisition) | null_result | high | submission patterns (first-attempt acceptances, attempts, retries) | 0.48 |
| Codeforces practice shifted toward this AI-style signature across cohorts over two AI rollouts. (Skill Acquisition) | positive | high | prevalence of AI-style practice signature in CF cohorts | 0.48 |
| In open Codeforces contests a stronger AI-style signature predicts smaller rating gains for users with no ICPC/IOI affiliation, but not for those who qualified for the AI-prohibited contests. (Skill Acquisition) | mixed | high | rating gains in open CF contests | 0.48 |
| Inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants. (Skill Acquisition) | positive | medium | non-AI-aided ICPC scores | 0.29 |
| The same practice input carries opposite signs depending on whether the environment screens for it. (Skill Acquisition) | mixed | high | effect of AI-style practice on performance (rating gains or non-AI scores) | 0.48 |
| Two levers follow from the contrast: (1) how AI is integrated into training, since within the screened pool AI-style practice coincides with stronger non-AI-aided performance; and (2) the design of AI-prohibited evaluation gates as a type-separating institution. (Governance And Regulation) | positive | medium | policy levers affecting skill certification and training outcomes | 0.05 |
| These findings and institutional lessons extend beyond programming to credentialing systems (medical and legal boards, professional certification) that certify skill in a workforce increasingly shaped by AI. (Governance And Regulation) | positive | medium | applicability of findings to credentialing systems' design and certification outcomes | 0.05 |