The Commonplace

AI coding co-pilots speed routine development and lower barriers for junior programmers, but they often produce incorrect or insecure code that requires new verification practices and governance to avoid costly externalities.

ChatGPT as a Tool for Programming Assistance and Code Development
Horn Sarun · March 26, 2026 · Zenodo (CERN European Organization for Nuclear Research)
LLM-based coding assistants materially speed many routine programming tasks and lower entry barriers for novices but introduce correctness, security, and IP risks that make human verification, new workflows, and governance essential to realize net benefits.

Abstract

The integration of generative artificial intelligence, specifically large language models like ChatGPT, is instigating a foundational shift in software engineering practices and pedagogy. This nano review critically examines its emergent role as a collaborative coding assistant, evaluating its transformative potential in augmenting developer productivity, debugging, and code documentation. It synthesizes empirical findings on how these tools enhance efficiency and lower barriers to entry for novices, while simultaneously dissecting their critical limitations—including the generation of erroneous or insecure code ("hallucinations"), a lack of deep contextual reasoning, and significant risks related to software security and intellectual property. The analysis posits that the future of programming lies in a synergistic "co-pilot" paradigm, where the strategic augmentation of human expertise with AI-generated suggestions necessitates robust verification protocols, enhanced security practices, and a renewed focus on cultivating fundamental computational thinking skills.

Keywords: ChatGPT, generative AI, software development, programming assistance, code generation, AI pair programming, developer productivity, software security, computational thinking

Summary

Main Finding

Large language models (LLMs) such as ChatGPT are catalyzing a shift toward an AI “co-pilot” model in software engineering: AI-generated suggestions materially augment developer workflows—improving productivity and lowering barriers for novices—while also introducing important limits and risks (erroneous/insecure code, weak contextual reasoning, intellectual‑property and security externalities). Realizing net benefits requires systematic verification, stronger security practices, and sustained emphasis on core computational thinking skills.

Key Points

  • Productivity and workflow

    • LLMs can speed up many programming tasks (boilerplate, code completion, documentation, simple debugging) and change how developers iterate.
    • They are most effective when used interactively as assistants rather than as autonomous code authors.
  • Effects on learning and entry

    • These tools lower initial barriers for novices by giving example code, explanations, and templates, potentially accelerating onboarding.
    • There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
  • Reliability and correctness

    • LLMs can produce plausible‑looking but incorrect or insecure code (so‑called “hallucinations”).
    • Outputs often lack deep, project‑level contextual reasoning (design tradeoffs, architecture constraints).
  • Security, IP, and legal risks

    • Generated code can introduce security vulnerabilities and may incidentally reproduce copyrighted or licensed snippets.
    • Liability and intellectual‑property ownership around AI‑assisted code are unresolved practical and legal concerns.
  • Human–AI complementarity

    • The highest value arises when human developers verify, adapt, and integrate AI suggestions—requiring new workflows and verification protocols.
    • Organizations will need to build processes and tools (automated testing, static analysis, code review augmented for AI outputs).
  • Educational implications

    • Curricula should emphasize computational thinking, debugging skills, and verification practices rather than rote coding alone.
    • Training should teach how to prompt, validate, and correct AI outputs.
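The verification workflow the review calls for (automated tests and static analysis applied to AI suggestions before they are merged) can be sketched as follows. This is a minimal illustration, not a real tool: every function name and check here is an assumption, and production gates would use a full static analyzer and the project's own test suite.

```python
# Minimal sketch of a verification gate for AI-generated code, in the
# spirit of the review's recommendation: never accept a suggestion that
# has not passed automated checks. Names and checks are illustrative.
import ast

def static_check(source: str) -> list[str]:
    """Cheap static screen: reject code that does not parse, and flag
    constructs commonly associated with insecure suggestions."""
    issues = []
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in {"eval", "exec"}:  # classic injection risk
                issues.append(f"risky call: {node.func.id}() at line {node.lineno}")
    return issues

def verify_suggestion(source: str, tests) -> bool:
    """Gate: static screen first, then the project's unit tests."""
    if static_check(source):
        return False
    namespace: dict = {}
    exec(source, namespace)  # load the suggestion in isolation
    return all(test(namespace) for test in tests)

# A plausible-looking suggestion with an off-by-one divisor: it parses
# cleanly and "looks right", but the unit test catches it.
suggestion = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"
tests = [lambda ns: ns["mean"]([2, 4, 6]) == 4]
print(verify_suggestion(suggestion, tests))  # False
```

The point of the sketch is the ordering: cheap static screening rejects obviously malformed or risky output before any tests run, mirroring how organizations can layer existing tooling around AI-generated code.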

Data & Methods

  • Scope: Nano review / synthesis of emerging empirical literature and practitioner reports on LLMs used as coding assistants.
  • Evidence types synthesized:
    • Controlled experiments and benchmark tasks comparing developer speed/accuracy with and without LLM assistance.
    • User studies and observational analyses of developer workflows and learning outcomes.
    • Security analyses evaluating vulnerabilities in AI‑generated code and instances of reproduced licensed code.
    • Qualitative interviews and case studies documenting organizational adoption and workflow changes.
  • Methodological limitations noted:
    • Rapidly evolving models produce time‑sensitive results.
    • Heterogeneous study designs, tasks, and metrics across the literature limit direct comparability.
    • Many studies focus on short‑term lab or microtask settings rather than long‑horizon, production deployments.
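The controlled experiments synthesized above reduce to a two-sample comparison of task completion times with and without LLM assistance. A minimal sketch of that analysis follows; all timings are invented for illustration, and none of the numbers come from the reviewed studies.

```python
# Illustrative with/without-assistant speed comparison, the basic design
# of the benchmark experiments this review synthesizes. Data invented.
from math import sqrt
from statistics import mean, stdev

baseline = [38.0, 42.5, 51.0, 45.5, 40.0, 48.5]  # minutes, no assistant
assisted = [29.0, 35.5, 33.0, 31.5, 38.0, 30.0]  # minutes, with assistant

# Headline metric most studies report: mean speedup ratio.
speedup = mean(baseline) / mean(assisted)

# Welch's t statistic: difference in means over the unpooled standard error.
se = sqrt(stdev(baseline) ** 2 / len(baseline)
          + stdev(assisted) ** 2 / len(assisted))
t = (mean(baseline) - mean(assisted)) / se

print(f"mean speedup: {speedup:.2f}x, Welch t = {t:.2f}")
```

Even this toy version makes the review's caveat concrete: with six microtasks per arm, a sizable speedup can coexist with wide uncertainty, which is why heterogeneous short-term studies are hard to compare directly.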

Implications for AI Economics

  • Productivity and labor composition

    • Short‑run: measurable productivity gains on many coding tasks imply higher effective output per developer, which could raise demand for higher‑level engineering work.
    • Task reallocation: routine, boilerplate, and debugging tasks are most automatable/complemented; value shifts toward design, verification, and systems thinking.
    • Labor demand effects are ambiguous—junior/entry‑level demand may be reduced for some tasks but demand for verification and higher‑skill roles may rise.
  • Skills, wages, and training

    • Skill premiums may shift toward workers who can effectively collaborate with AI (prompting, verification, security auditing).
    • Education and on‑the‑job training should reprioritize computational thinking, software verification, security best practices, and AI literacy.
  • Market structure and incumbent advantages

    • Firms that integrate LLMs effectively (tooling, testing, governance) could capture outsized productivity gains, raising firm‑level dispersion.
    • Model and platform providers may capture significant rents (value capture through API platforms, integrated dev tools).
  • Externalities and social costs

    • Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
    • Public goods problems for verification tools and open benchmarks: need incentives for robust testing and auditing infrastructure.
  • Policy and measurement recommendations

    • Economists and policymakers should invest in better measurement: task‑level productivity metrics, longitudinal studies of employment outcomes, and datasets linking AI use to code quality/security incidents.
    • Consider regulatory and liability frameworks addressing IP, provenance of training data, and responsibilities for insecure/generated code.
    • Support public investments in verification infrastructure (automated testing, formal methods, benchmarks) and workforce retraining programs.
  • Research agenda for AI economics

    • Quantify net labor market impacts (substitution vs complementarity) across occupations and firm sizes.
    • Measure distributional effects within firms (who benefits) and across the industry (market concentration).
    • Evaluate long‑run effects on innovation rates, software quality, and security incidents attributable to AI assistance.

Assessment

Paper Type: review_meta
Evidence Strength: medium — Synthesizes multiple controlled experiments, user studies, and security analyses that consistently show task-level productivity gains and common failure modes, but the underlying empirical work is heterogeneous, often short-term or lab-based, and models/tools evolve rapidly—limiting confidence in long-run, production-scale effects.
Methods Rigor: medium — The piece systematically brings together diverse evidence types (experiments, observational studies, qualitative case studies, security audits) and notes limitations, but it is a compact synthesis rather than a pre-registered or exhaustive systematic review or meta-analysis and does not resolve heterogeneity in methods or metrics across studies.
Sample: A nano-review of emerging empirical literature and practitioner reports on LLMs as coding assistants, drawing on controlled experiments and benchmark tasks comparing developer performance with/without LLMs, lab and observational user studies, security analyses of AI-generated code, and qualitative interviews/case studies of organizational adoption; many underlying studies focus on microtasks, short-session lab experiments, or early production pilots using recent LLMs (e.g., ChatGPT, Copilot).
Themes: productivity, human_ai_collab, skills_training, labor_markets, adoption, governance
Generalizability:
  • Findings are time-sensitive because LLM capabilities and tool integrations evolve rapidly.
  • Many studies use short-term lab or microtask settings, which may not map to long-horizon production engineering work.
  • Evidence is concentrated on certain languages, tasks (boilerplate, completion, simple debugging), and developer populations (novices, crowdworkers), limiting applicability across all software domains.
  • Organizational heterogeneity (firm size, development processes, tooling) may alter realized productivity and security outcomes.
  • Geographic and regulatory contexts (IP regimes, data-protection laws) could change incentive structures around adoption and governance.

Claims (20)

Each entry lists the claim, the outcome it bears on, the claimed direction of effect, the digest's confidence, the measured outcome, and the numeric weight assigned.

  • LLMs can speed up many programming tasks (boilerplate, code completion, documentation, simple debugging) and change how developers iterate.
    Outcome: Developer Productivity · Direction: positive · Confidence: medium (0.14) · Measures: developer productivity (task completion time, throughput) and task iteration frequency
  • LLMs are most effective when used interactively as assistants rather than as autonomous code authors.
    Outcome: Output Quality · Direction: positive · Confidence: medium (0.14) · Measures: task success rate and code quality when used interactively versus autonomous generation
  • These tools lower initial barriers for novices by giving example code, explanations, and templates, potentially accelerating onboarding.
    Outcome: Skill Acquisition · Direction: positive · Confidence: medium (0.14) · Measures: novice task performance and onboarding time
  • There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
    Outcome: Skill Acquisition · Direction: negative · Confidence: medium (0.14) · Measures: depth of conceptual understanding and learning outcomes
  • LLMs can produce plausible-looking but incorrect or insecure code (so-called "hallucinations").
    Outcome: Error Rate · Direction: negative · Confidence: high (0.24) · Measures: code correctness/error rate and frequency of insecure code returned
  • Outputs often lack deep, project-level contextual reasoning (e.g., design tradeoffs, architecture constraints).
    Outcome: Decision Quality · Direction: negative · Confidence: medium (0.14) · Measures: ability to produce context-appropriate architectural/design decisions
  • Generated code can introduce security vulnerabilities.
    Outcome: Error Rate · Direction: negative · Confidence: high (0.24) · Measures: incidence of security vulnerabilities in AI-generated code
  • Generated code may incidentally reproduce copyrighted or licensed snippets from training data.
    Outcome: Regulatory Compliance · Direction: negative · Confidence: medium (0.14) · Measures: frequency of reproduced copyrighted/licensed code in outputs
  • Liability and intellectual-property ownership around AI-assisted code are unresolved practical and legal concerns.
    Outcome: Governance And Regulation · Direction: mixed · Confidence: medium (0.14) · Measures: legal clarity and risk exposure (qualitative/legal status)
  • The highest value arises when human developers verify, adapt, and integrate AI suggestions—human–AI complementarity.
    Outcome: Output Quality · Direction: positive · Confidence: medium (0.14) · Measures: task success rate, final code quality, and error rates when human verification is applied
  • Organizations will need to build processes and tools (automated testing, static analysis, code review augmented for AI outputs) to realize net benefits safely.
    Outcome: Organizational Efficiency · Direction: positive · Confidence: medium (0.14) · Measures: adoption of verification tooling and process changes (qualitative/operational readiness)
  • Computer science curricula should emphasize computational thinking, debugging skills, and verification practices rather than rote coding alone.
    Outcome: Training Effectiveness · Direction: positive · Confidence: low (0.07) · Measures: curricular emphasis and student competency in verification/debugging (recommended)
  • Short-run: measurable productivity gains for many coding tasks imply higher effective output per developer.
    Outcome: Developer Productivity · Direction: positive · Confidence: medium (0.14) · Measures: effective output per developer (productivity metrics)
  • Routine, boilerplate, and debugging tasks are most automatable or complemented by LLMs, shifting value toward design, verification, and systems thinking.
    Outcome: Task Allocation · Direction: mixed · Confidence: medium (0.14) · Measures: time allocation across task types and relative automatability
  • Labor demand effects are ambiguous: junior/entry-level demand may be reduced for some tasks while demand for verification and higher-skill roles may rise.
    Outcome: Employment · Direction: mixed · Confidence: speculative (0.02) · Measures: labor demand by skill level and occupation (employment levels, hiring rates)
  • Skill premiums may shift toward workers who can effectively collaborate with AI (prompting, verification, security auditing).
    Outcome: Wages · Direction: positive · Confidence: low (0.07) · Measures: wage/skill premium for AI-collaboration skills
  • Firms that integrate LLMs effectively (tooling, testing, governance) could capture outsized productivity gains, raising firm-level dispersion.
    Outcome: Firm Productivity · Direction: mixed · Confidence: low (0.07) · Measures: firm productivity dispersion and performance differences between adopters and non-adopters
  • Model and platform providers may capture significant rents through APIs and integrated developer tooling.
    Outcome: Firm Revenue · Direction: positive · Confidence: low (0.07) · Measures: value capture/revenue concentration among model/platform providers
  • Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
    Outcome: Consumer Welfare · Direction: negative · Confidence: medium (0.14) · Measures: social costs from security breaches and IP disputes (incidence and severity)
  • Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, heterogeneous study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.
    Outcome: Research Productivity · Direction: mixed · Confidence: high (0.24) · Measures: generalizability and comparability of empirical findings (study heterogeneity)
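The numeric weight beneath each claim appears to track its confidence label one-to-one (high 0.24, medium 0.14, low 0.07, speculative 0.02, as read directly off the table above). A minimal sketch of that mapping follows; it is an observation about this digest's scoring, not a documented scheme.

```python
# Assumed confidence-to-weight mapping, inferred from the claims table
# above. These values are read off the page, not from any specification.
CONFIDENCE_WEIGHT = {
    "high": 0.24,
    "medium": 0.14,
    "low": 0.07,
    "speculative": 0.02,
}

def claim_weight(confidence: str) -> float:
    """Return the numeric weight this digest assigns a confidence label."""
    return CONFIDENCE_WEIGHT[confidence]

print(claim_weight("medium"))  # 0.14, matching the medium-confidence rows
```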
