The Commonplace

AI coding co-pilots speed routine development and lower barriers for junior programmers, but they often produce incorrect or insecure code that requires new verification practices and governance to avoid costly externalities.

ChatGPT as a Tool for Programming Assistance and Code Development
Horn Sarun · March 26, 2026 · Zenodo (CERN European Organization for Nuclear Research)
LLM-based coding assistants materially speed many routine programming tasks and lower entry barriers for novices but introduce correctness, security, and IP risks that make human verification, new workflows, and governance essential to realize net benefits.

Abstract

The integration of generative artificial intelligence, specifically large language models like ChatGPT, is instigating a foundational shift in software engineering practices and pedagogy. This nano review critically examines its emergent role as a collaborative coding assistant, evaluating its transformative potential in augmenting developer productivity, debugging, and code documentation. It synthesizes empirical findings on how these tools enhance efficiency and lower barriers to entry for novices, while simultaneously dissecting their critical limitations—including the generation of erroneous or insecure code ("hallucinations"), a lack of deep contextual reasoning, and significant risks related to software security and intellectual property. The analysis posits that the future of programming lies in a synergistic "co-pilot" paradigm, where the strategic augmentation of human expertise with AI-generated suggestions necessitates robust verification protocols, enhanced security practices, and a renewed focus on cultivating fundamental computational thinking skills.

Keywords: ChatGPT, generative AI, software development, programming assistance, code generation, AI pair programming, developer productivity, software security, computational thinking

Summary

Main Finding

Large language models (LLMs) such as ChatGPT are catalyzing a shift toward an AI “co-pilot” model in software engineering: AI-generated suggestions materially augment developer workflows—improving productivity and lowering barriers for novices—while also introducing important limits and risks (erroneous/insecure code, weak contextual reasoning, intellectual‑property and security externalities). Realizing net benefits requires systematic verification, stronger security practices, and sustained emphasis on core computational thinking skills.

Key Points

  • Productivity and workflow

    • LLMs can speed up many programming tasks (boilerplate, code completion, documentation, simple debugging) and change how developers iterate.
    • They are most effective when used interactively as assistants rather than as autonomous code authors.
  • Effects on learning and entry

    • These tools lower initial barriers for novices by giving example code, explanations, and templates, potentially accelerating onboarding.
    • There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
  • Reliability and correctness

    • LLMs can produce plausible‑looking but incorrect or insecure code (so‑called “hallucinations”).
    • Outputs often lack deep, project‑level contextual reasoning (design tradeoffs, architecture constraints).
  • Security, IP, and legal risks

    • Generated code can introduce security vulnerabilities and may incidentally reproduce copyrighted or licensed snippets.
    • Liability and intellectual‑property ownership around AI‑assisted code are unresolved practical and legal concerns.
  • Human–AI complementarity

    • The highest value arises when human developers verify, adapt, and integrate AI suggestions—requiring new workflows and verification protocols.
    • Organizations will need to build processes and tools (automated testing, static analysis, code review augmented for AI outputs).
  • Educational implications

    • Curricula should emphasize computational thinking, debugging skills, and verification practices rather than rote coding alone.
    • Training should teach how to prompt, validate, and correct AI outputs.
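The verification workflow the review calls for (automated tests and static analysis applied to AI suggestions before they are merged) can be sketched as follows. This is a minimal illustration, not a real tool: every function name and check here is an assumption, and production gates would use a full static analyzer and the project's own test suite.

```python
# Minimal sketch of a verification gate for AI-generated code, in the
# spirit of the review's recommendation: never accept a suggestion that
# has not passed automated checks. Names and checks are illustrative.
import ast

def static_check(source: str) -> list[str]:
    """Cheap static screen: reject code that does not parse, and flag
    constructs commonly associated with insecure suggestions."""
    issues = []
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in {"eval", "exec"}:  # classic injection risk
                issues.append(f"risky call: {node.func.id}() at line {node.lineno}")
    return issues

def verify_suggestion(source: str, tests) -> bool:
    """Gate: static screen first, then the project's unit tests."""
    if static_check(source):
        return False
    namespace: dict = {}
    exec(source, namespace)  # load the suggestion in isolation
    return all(test(namespace) for test in tests)

# A plausible-looking suggestion with an off-by-one divisor: it parses
# cleanly and "looks right", but the unit test catches it.
suggestion = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"
tests = [lambda ns: ns["mean"]([2, 4, 6]) == 4]
print(verify_suggestion(suggestion, tests))  # False
```

The point of the sketch is the ordering: cheap static screening rejects obviously malformed or risky output before any tests run, mirroring how organizations can layer existing tooling around AI-generated code.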

Data & Methods

  • Scope: Nano review / synthesis of emerging empirical literature and practitioner reports on LLMs used as coding assistants.
  • Evidence types synthesized:
    • Controlled experiments and benchmark tasks comparing developer speed/accuracy with and without LLM assistance.
    • User studies and observational analyses of developer workflows and learning outcomes.
    • Security analyses evaluating vulnerabilities in AI‑generated code and instances of reproduced licensed code.
    • Qualitative interviews and case studies documenting organizational adoption and workflow changes.
  • Methodological limitations noted:
    • Rapidly evolving models produce time‑sensitive results.
    • Heterogeneous study designs, tasks, and metrics across the literature limit direct comparability.
    • Many studies focus on short‑term lab or microtask settings rather than long‑horizon, production deployments.
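The controlled experiments synthesized above reduce to a two-sample comparison of task completion times with and without LLM assistance. A minimal sketch of that analysis follows; all timings are invented for illustration, and none of the numbers come from the reviewed studies.

```python
# Illustrative with/without-assistant speed comparison, the basic design
# of the benchmark experiments this review synthesizes. Data invented.
from math import sqrt
from statistics import mean, stdev

baseline = [38.0, 42.5, 51.0, 45.5, 40.0, 48.5]  # minutes, no assistant
assisted = [29.0, 35.5, 33.0, 31.5, 38.0, 30.0]  # minutes, with assistant

# Headline metric most studies report: mean speedup ratio.
speedup = mean(baseline) / mean(assisted)

# Welch's t statistic: difference in means over the unpooled standard error.
se = sqrt(stdev(baseline) ** 2 / len(baseline)
          + stdev(assisted) ** 2 / len(assisted))
t = (mean(baseline) - mean(assisted)) / se

print(f"mean speedup: {speedup:.2f}x, Welch t = {t:.2f}")
```

Even this toy version makes the review's caveat concrete: with six microtasks per arm, a sizable speedup can coexist with wide uncertainty, which is why heterogeneous short-term studies are hard to compare directly.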

Implications for AI Economics

  • Productivity and labor composition

    • Short‑run: measurable productivity gains on many coding tasks imply higher effective output per developer, which could raise demand for higher‑level engineering work.
    • Task reallocation: routine, boilerplate, and debugging tasks are most automatable/complemented; value shifts toward design, verification, and systems thinking.
    • Labor demand effects are ambiguous—junior/entry‑level demand may be reduced for some tasks but demand for verification and higher‑skill roles may rise.
  • Skills, wages, and training

    • Skill premiums may shift toward workers who can effectively collaborate with AI (prompting, verification, security auditing).
    • Education and on‑the‑job training should reprioritize computational thinking, software verification, security best practices, and AI literacy.
  • Market structure and incumbent advantages

    • Firms that integrate LLMs effectively (tooling, testing, governance) could capture outsized productivity gains, raising firm‑level dispersion.
    • Model and platform providers may capture significant rents (value capture through API platforms, integrated dev tools).
  • Externalities and social costs

    • Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
    • Public goods problems for verification tools and open benchmarks: need incentives for robust testing and auditing infrastructure.
  • Policy and measurement recommendations

    • Economists and policymakers should invest in better measurement: task‑level productivity metrics, longitudinal studies of employment outcomes, and datasets linking AI use to code quality/security incidents.
    • Consider regulatory and liability frameworks addressing IP, provenance of training data, and responsibilities for insecure/generated code.
    • Support public investments in verification infrastructure (automated testing, formal methods, benchmarks) and workforce retraining programs.
  • Research agenda for AI economics

    • Quantify net labor market impacts (substitution vs complementarity) across occupations and firm sizes.
    • Measure distributional effects within firms (who benefits) and across the industry (market concentration).
    • Evaluate long‑run effects on innovation rates, software quality, and security incidents attributable to AI assistance.

Assessment

Paper Type: review_meta
Evidence Strength: medium — Synthesizes multiple controlled experiments, user studies, and security analyses that consistently show task-level productivity gains and common failure modes, but the underlying empirical work is heterogeneous, often short-term or lab-based, and models/tools evolve rapidly—limiting confidence in long-run, production-scale effects.
Methods Rigor: medium — The piece systematically brings together diverse evidence types (experiments, observational studies, qualitative case studies, security audits) and notes limitations, but it is a compact synthesis rather than a pre-registered or exhaustive systematic review or meta-analysis and does not resolve heterogeneity in methods or metrics across studies.
Sample: A nano-review of emerging empirical literature and practitioner reports on LLMs as coding assistants, drawing on controlled experiments and benchmark tasks comparing developer performance with/without LLMs, lab and observational user studies, security analyses of AI-generated code, and qualitative interviews/case studies of organizational adoption; many underlying studies focus on microtasks, short-session lab experiments, or early production pilots using recent LLMs (e.g., ChatGPT, Copilot).
Themes: productivity, human_ai_collab, skills_training, labor_markets, adoption, governance
Generalizability:
  • Findings are time-sensitive because LLM capabilities and tool integrations evolve rapidly.
  • Many studies use short-term lab or microtask settings, which may not map to long-horizon production engineering work.
  • Evidence is concentrated on certain languages, tasks (boilerplate, completion, simple debugging), and developer populations (novices, crowdworkers), limiting applicability across all software domains.
  • Organizational heterogeneity (firm size, development processes, tooling) may alter realized productivity and security outcomes.
  • Geographic and regulatory contexts (IP regimes, data-protection laws) could change incentive structures around adoption and governance.

Claims (20)

Each entry lists the claim, the outcome it bears on, the claimed direction of effect, the digest's confidence, the measured outcome, and the numeric weight assigned.

  • LLMs can speed up many programming tasks (boilerplate, code completion, documentation, simple debugging) and change how developers iterate.
    Outcome: Developer Productivity · Direction: positive · Confidence: medium (0.14) · Measures: developer productivity (task completion time, throughput) and task iteration frequency
  • LLMs are most effective when used interactively as assistants rather than as autonomous code authors.
    Outcome: Output Quality · Direction: positive · Confidence: medium (0.14) · Measures: task success rate and code quality when used interactively versus autonomous generation
  • These tools lower initial barriers for novices by giving example code, explanations, and templates, potentially accelerating onboarding.
    Outcome: Skill Acquisition · Direction: positive · Confidence: medium (0.14) · Measures: novice task performance and onboarding time
  • There is a risk of shallow learning if learners over-rely on AI outputs without understanding fundamentals.
    Outcome: Skill Acquisition · Direction: negative · Confidence: medium (0.14) · Measures: depth of conceptual understanding and learning outcomes
  • LLMs can produce plausible-looking but incorrect or insecure code (so-called "hallucinations").
    Outcome: Error Rate · Direction: negative · Confidence: high (0.24) · Measures: code correctness/error rate and frequency of insecure code returned
  • Outputs often lack deep, project-level contextual reasoning (e.g., design tradeoffs, architecture constraints).
    Outcome: Decision Quality · Direction: negative · Confidence: medium (0.14) · Measures: ability to produce context-appropriate architectural/design decisions
  • Generated code can introduce security vulnerabilities.
    Outcome: Error Rate · Direction: negative · Confidence: high (0.24) · Measures: incidence of security vulnerabilities in AI-generated code
  • Generated code may incidentally reproduce copyrighted or licensed snippets from training data.
    Outcome: Regulatory Compliance · Direction: negative · Confidence: medium (0.14) · Measures: frequency of reproduced copyrighted/licensed code in outputs
  • Liability and intellectual-property ownership around AI-assisted code are unresolved practical and legal concerns.
    Outcome: Governance And Regulation · Direction: mixed · Confidence: medium (0.14) · Measures: legal clarity and risk exposure (qualitative/legal status)
  • The highest value arises when human developers verify, adapt, and integrate AI suggestions—human–AI complementarity.
    Outcome: Output Quality · Direction: positive · Confidence: medium (0.14) · Measures: task success rate, final code quality, and error rates when human verification is applied
  • Organizations will need to build processes and tools (automated testing, static analysis, code review augmented for AI outputs) to realize net benefits safely.
    Outcome: Organizational Efficiency · Direction: positive · Confidence: medium (0.14) · Measures: adoption of verification tooling and process changes (qualitative/operational readiness)
  • Computer science curricula should emphasize computational thinking, debugging skills, and verification practices rather than rote coding alone.
    Outcome: Training Effectiveness · Direction: positive · Confidence: low (0.07) · Measures: curricular emphasis and student competency in verification/debugging (recommended)
  • Short-run: measurable productivity gains for many coding tasks imply higher effective output per developer.
    Outcome: Developer Productivity · Direction: positive · Confidence: medium (0.14) · Measures: effective output per developer (productivity metrics)
  • Routine, boilerplate, and debugging tasks are most automatable or complemented by LLMs, shifting value toward design, verification, and systems thinking.
    Outcome: Task Allocation · Direction: mixed · Confidence: medium (0.14) · Measures: time allocation across task types and relative automatability
  • Labor demand effects are ambiguous: junior/entry-level demand may be reduced for some tasks while demand for verification and higher-skill roles may rise.
    Outcome: Employment · Direction: mixed · Confidence: speculative (0.02) · Measures: labor demand by skill level and occupation (employment levels, hiring rates)
  • Skill premiums may shift toward workers who can effectively collaborate with AI (prompting, verification, security auditing).
    Outcome: Wages · Direction: positive · Confidence: low (0.07) · Measures: wage/skill premium for AI-collaboration skills
  • Firms that integrate LLMs effectively (tooling, testing, governance) could capture outsized productivity gains, raising firm-level dispersion.
    Outcome: Firm Productivity · Direction: mixed · Confidence: low (0.07) · Measures: firm productivity dispersion and performance differences between adopters and non-adopters
  • Model and platform providers may capture significant rents through APIs and integrated developer tooling.
    Outcome: Firm Revenue · Direction: positive · Confidence: low (0.07) · Measures: value capture/revenue concentration among model/platform providers
  • Security vulnerabilities and IP leakage create negative externalities; absent internalization, social costs (breaches, legal disputes) may rise.
    Outcome: Consumer Welfare · Direction: negative · Confidence: medium (0.14) · Measures: social costs from security breaches and IP disputes (incidence and severity)
  • Existing evidence is time-sensitive and heterogeneous: rapidly evolving models, heterogeneous study designs, and many short-term lab/microtask studies limit direct comparability and long-run inference.
    Outcome: Research Productivity · Direction: mixed · Confidence: high (0.24) · Measures: generalizability and comparability of empirical findings (study heterogeneity)
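The numeric weight beneath each claim appears to track its confidence label one-to-one (high 0.24, medium 0.14, low 0.07, speculative 0.02, as read directly off the table above). A minimal sketch of that mapping follows; it is an observation about this digest's scoring, not a documented scheme.

```python
# Assumed confidence-to-weight mapping, inferred from the claims table
# above. These values are read off the page, not from any specification.
CONFIDENCE_WEIGHT = {
    "high": 0.24,
    "medium": 0.14,
    "low": 0.07,
    "speculative": 0.02,
}

def claim_weight(confidence: str) -> float:
    """Return the numeric weight this digest assigns a confidence label."""
    return CONFIDENCE_WEIGHT[confidence]

print(claim_weight("medium"))  # 0.14, matching the medium-confidence rows
```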
