

ChatGPT as a Tool for Programming Assistance and Code Development
Horn Sarun · March 26, 2026 · Zenodo (CERN European Organization for Nuclear Research)
openalex · review_meta · medium evidence · 8/10 relevance · DOI · Source · PDF
Generative LLMs function effectively as developer co‑pilots that can raise productivity and lower entry barriers, but their net economic value depends on rigorous verification, security practices, and maintaining core computational skills because of hallucination, vulnerability, and IP risks.

Abstract

The integration of generative artificial intelligence, specifically large language models like ChatGPT, is instigating a foundational shift in software engineering practices and pedagogy. This nano review critically examines its emergent role as a collaborative coding assistant, evaluating its transformative potential in augmenting developer productivity, debugging, and code documentation. It synthesizes empirical findings on how these tools enhance efficiency and lower barriers to entry for novices, while simultaneously dissecting their critical limitations—including the generation of erroneous or insecure code ("hallucinations"), a lack of deep contextual reasoning, and significant risks related to software security and intellectual property. The analysis posits that the future of programming lies in a synergistic "co-pilot" paradigm, where the strategic augmentation of human expertise with AI-generated suggestions necessitates robust verification protocols, enhanced security practices, and a renewed focus on cultivating fundamental computational thinking skills.

Keywords: ChatGPT, generative AI, software development, programming assistance, code generation, AI pair programming, developer productivity, software security, computational thinking

Summary

Main Finding

Generative AI (notably large language models like ChatGPT) is reshaping software engineering toward a human–AI "co-pilot" model that can raise developer productivity and lower barriers for novices, but its value is conditional on robust verification, security practices, and preserved emphasis on core computational thinking due to risks from hallucinated, insecure, or IP-sensitive code.

Key Points

  • Productivity and workflow
    • LLMs can accelerate coding tasks, debugging, and documentation; they function effectively as collaborative coding assistants.
    • The most promising mode is augmentation (AI suggestions + human oversight), not full automation.
  • Learning and access
    • These tools lower entry barriers for novices and can speed learning, but can also mask gaps in foundational skills.
  • Limitations and risks
    • Hallucinations: models sometimes generate incorrect, nonsensical, or insecure code.
    • Lack of deep contextual reasoning: models may fail on tasks requiring long-term design thinking or deep domain knowledge.
    • Security vulnerabilities and IP concerns: generated code can introduce vulnerabilities and raise licensing/intellectual‑property issues.
  • Recommended practice
    • Adopt rigorous verification/QA protocols and security audits for AI-generated code.
    • Emphasize human oversight and training in computational thinking alongside tool use.
    • Treat AI as a collaborator that changes task boundaries (more verification, design, integration work for humans).
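The verification gate recommended above can be made concrete. The sketch below is not from the paper and all names (`verify_candidate`, `clamp`, the test cases) are illustrative: an AI-generated snippet is admitted into a codebase only after passing a predefined acceptance-test suite, and a production pipeline would add sandboxing, static analysis, and security scanning on top.

```python
# Minimal acceptance gate for AI-generated code: compile the candidate
# source, then check it against (args, expected) test pairs.
# All names here are illustrative, not from the reviewed paper.

def verify_candidate(candidate_src: str,
                     tests: list[tuple[tuple, object]],
                     func_name: str) -> bool:
    """Return True only if the generated function passes every test."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # run the generated definition
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:                    # syntax error, crash, etc. => reject
        return False

# Example: an AI-suggested `clamp` implementation and its acceptance tests.
suggested = """
def clamp(x, lo, hi):
    return max(lo, min(x, hi))
"""
tests = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
accepted = verify_candidate(suggested, tests, "clamp")  # True here
```

The same gate rejects a candidate that fails any test or does not even parse, which is the human-in-the-loop default the review argues for: nothing generated is trusted until it is checked.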

Data & Methods

  • Approach: nano review / critical synthesis of empirical literature on LLMs in software development.
  • Evidence types synthesized:
    • User studies and productivity measurements (task completion time, developer workflows).
    • Benchmarks and code-generation accuracy tests.
    • Case studies and incident analyses (security and correctness failures).
    • Pedagogical assessments of learning outcomes for novices.
  • Analysis method: cross-study synthesis highlighting consistent empirical patterns and pinpointing methodological gaps (e.g., long-term effects, firm-level impacts, security cost quantification).
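For context on the code-generation accuracy tests mentioned above: such benchmarks are commonly scored with the pass@k metric, the probability that at least one of k sampled generations for a problem passes its tests. This is general benchmark practice, not a method introduced by this paper; a sketch of the standard unbiased estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples for a problem,
    of which c pass the tests, estimate P(at least one of k samples passes).
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:          # fewer failures than k => some draw must succeed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 generations per problem, 50 correct, reported as pass@10
score = pass_at_k(n=200, c=50, k=10)
```

Benchmark suites average this quantity over all problems, which is why reported accuracy depends heavily on the chosen k.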

Implications for AI Economics

  • Labor demand and task composition
    • Routine coding tasks may be partially automated, reducing time spent on boilerplate work and shifting human labor toward verification, integration, architecture, and domain-specific tasks.
    • Demand for mid-level, routine-focused developer roles could compress; demand increases for skills in verification, security, and AI–human orchestration.
  • Wages and skill premiums
    • Potential reallocation of wage premia: higher returns to developers who can supervise AI, secure systems, and perform complex design; downward pressure on purely routine coding wages.
  • Productivity and firm incentives
    • Firms can realize productivity gains from adoption, but net value depends on the costs of verification, security remediation, and IP risk management.
    • Strong incentives for firms to integrate LLMs into development pipelines, invest in internal guardrails, and retrain staff.
  • Market structure and concentration
    • Centralized provision of high‑quality coding models by a few vendors could create vendor lock-in and platform power, affecting competition and input costs.
  • Security externalities and systemic risk
    • Widespread adoption without adequate verification raises systemic cybersecurity risks with economic spillovers (breaches, downstream failures).
  • Policy and regulation
    • Need for standards on provenance, licensing, and security auditing of AI-generated code; potential roles for certification and liability rules.
  • Research and measurement needs
    • Empirical work needed on long-run impacts: wage trajectories, employment composition, firm-level returns, costs of verification, and the public‑good risks of insecure code proliferation.

Suggested priorities for economists: measure how adoption changes task mixes and wages, quantify verification and remediation costs, estimate productivity gains net of security/IP costs, and study market dynamics from centralized model providers.
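The "net of security/IP costs" framing can be illustrated with a back-of-the-envelope calculation. Every figure below is an invented placeholder, not an estimate from the review:

```python
# Illustrative net-value calculation for firm-level LLM adoption.
# All dollar amounts are made-up placeholders, not data from the paper.

def net_annual_value(gross_productivity_gain: float,
                     verification_cost: float,
                     remediation_cost: float,
                     ip_risk_cost: float,
                     tooling_cost: float) -> float:
    """Net value = gross productivity gain minus verification,
    security-remediation, IP-risk, and tooling costs."""
    return (gross_productivity_gain
            - verification_cost - remediation_cost
            - ip_risk_cost - tooling_cost)

net = net_annual_value(
    gross_productivity_gain=500_000,  # e.g. time saved on routine coding
    verification_cost=150_000,        # extra review/QA labor
    remediation_cost=80_000,          # fixing insecure generated code
    ip_risk_cost=30_000,              # licensing audits, legal exposure
    tooling_cost=40_000,              # model subscriptions, integration
)
# net = 200_000: adoption pays off only if gains exceed guardrail costs
```

The point of the exercise is the sign, not the magnitude: with heavy verification and remediation burdens the same gross gain can turn negative, which is exactly the measurement gap the review flags.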

Assessment

Paper Type: review_meta

Evidence Strength: medium — Synthesizes multiple empirical strands (lab and field user studies, benchmarks, case reports, pedagogical tests) that consistently show short-term productivity and learning effects, but lacks strong, long-run causal identification at firm or labor-market scale and relies on small samples, heterogeneous measures, and case-based evidence for risks.

Methods Rigor: medium — Careful cross-study synthesis that identifies consistent patterns and gaps, but it is a compact ("nano") review rather than a systematic review or meta-analysis: no formal search protocol, quantitative aggregation, or new causal estimation is presented.

Sample: A synthesis of published and emergent evidence including short-term lab and field user studies of developers (often N in the tens to low hundreds), code-generation benchmarks and accuracy tests, case studies and incident analyses from industry, and pedagogical assessments of novice learners; data are heterogeneous in scope, geography, and measurement.

Themes: productivity, human_ai_collab, skills_training, labor_markets, adoption, governance

Generalizability:

  • Most empirical studies are short-term task experiments or small-scale field studies, limiting inference about long-run effects.
  • Participants are often early adopters or volunteers, creating selection bias relative to broader developer populations.
  • Findings vary by task type (boilerplate/routine vs. architecture/design) and programming language/toolchain, limiting cross-context transfer.
  • Firm-level incentives, team processes, and industry sectors are underrepresented; results may not generalize to large enterprise software development.
  • Rapid evolution of LLM capabilities and product integrations means results may be time-sensitive and model-specific.
  • Security, legal, and IP cost estimates are largely case-based and not broadly quantified, restricting economic generalizability.

Claims (19)

Each entry lists the claim, then its outcome category, claimed direction, confidence (with numeric score), and the outcome measures it concerns.

  • Large language models (LLMs) can accelerate coding tasks, debugging, and documentation, functioning effectively as collaborative coding assistants.
    Developer Productivity · positive · medium (0.14) · Measures: developer productivity (task completion time, throughput, time-to-debug, documentation time)
  • The most promising deployment mode is augmentation (AI suggestions plus human oversight) rather than full automation.
    Error Rate · positive · medium (0.14) · Measures: task success rate and error rate under human-in-the-loop workflows versus fully automated workflows
  • Generative AI tools lower entry barriers for novices and can speed learning of programming tasks.
    Skill Acquisition · positive · medium (0.14) · Measures: novice learning outcomes (time-to-complete tasks, accuracy, self-reported confidence)
  • Use of these tools can mask gaps in foundational computational skills among novices.
    Skill Acquisition · negative · medium (0.14) · Measures: measures of foundational skill (conceptual quiz scores, ability to solve novel/unassisted problems)
  • LLMs sometimes generate incorrect, nonsensical, or insecure code (hallucinations).
    Error Rate · negative · high (0.24) · Measures: code correctness/error rate; incidence of hallucinated outputs (false or fabricated code/claims)
  • Models lack deep contextual reasoning and may fail on tasks requiring long-term design thinking or deep domain knowledge.
    Decision Quality · negative · medium (0.14) · Measures: task success on long-horizon design tasks, reasoning/chain-of-thought benchmark scores
  • AI-generated code can introduce security vulnerabilities and raise licensing/intellectual-property concerns.
    AI Safety and Ethics · negative · high (0.24) · Measures: incidence of security vulnerabilities in generated code; instances of license or IP violations attributable to generated code
  • Rigorous verification, QA protocols, and security audits are necessary when integrating AI-generated code into production systems.
    Error Rate · positive · medium (0.14) · Measures: adoption of verification/QA practices; reduction in post-deployment defects and security incidents
  • Human oversight and continued emphasis on computational thinking should be preserved alongside AI tool use.
    Skill Acquisition · positive · medium (0.14) · Measures: continuing competency in computational thinking (assessment scores) and reliance on human review in development workflows
  • Routine coding tasks may be partially automated, shifting human labor toward verification, integration, architecture, and domain-specific tasks.
    Task Allocation · mixed · speculative (0.02) · Measures: time allocation across task types (routine coding vs. verification/architecture), changes in job task composition
  • Demand for mid-level, routine-focused developer roles could compress while demand rises for verification, security, and AI–human orchestration skills.
    Employment · mixed · speculative (0.02) · Measures: employment demand by role/skill category; hiring trends and vacancy composition
  • Wage premia may reallocate: higher returns for developers who can supervise AI and secure systems, and downward pressure on pure routine-coding wages.
    Wages · mixed · low (0.07) · Measures: wage changes by skill level (supervisory/verification vs. routine coding)
  • Firms can realize productivity gains from adopting LLMs, but net value depends on verification, security remediation, and IP-management costs.
    Firm Productivity · mixed · medium (0.14) · Measures: firm productivity metrics (output per developer) net of verification/remediation costs; ROI on AI tool adoption
  • Firms have strong incentives to integrate LLMs into development pipelines and to invest in internal guardrails and retraining.
    Adoption Rate · positive · medium (0.14) · Measures: rates of LLM integration into pipelines; investment in guardrails/training; internal policy adoption
  • Centralized provision of high-quality coding models by a few vendors could produce vendor lock-in and increase platform power in software development inputs.
    Market Structure · negative · speculative (0.02) · Measures: market concentration measures (e.g., HHI), indicators of vendor lock-in (switching costs, exclusive integration)
  • Widespread adoption of LLMs without adequate verification increases systemic cybersecurity risks with potential economic spillovers.
    AI Safety and Ethics · negative · medium (0.14) · Measures: frequency/severity of security breaches attributable to AI-generated code; downstream economic costs of such breaches
  • There is a need for standards on provenance, licensing, and security auditing of AI-generated code, and potential roles for certification and liability frameworks.
    Regulatory Compliance · positive · medium (0.14) · Measures: existence and adoption of provenance/licensing/security standards; implementation of certification/liability regimes
  • Significant empirical gaps remain on long-term impacts (wage trajectories, employment composition, firm-level returns), verification/remediation cost quantification, and public-good risks of insecure code proliferation.
    Research Productivity · null_result · high (0.24) · Measures: absence or paucity of longitudinal studies and firm-level quantitative measurements on the listed outcomes
  • Recommended research priorities for economists include measuring how adoption changes task mixes and wages, quantifying verification/remediation costs, estimating productivity gains net of security/IP costs, and studying market dynamics from centralized model providers.
    Research Productivity · positive · high (0.24) · Measures: generation of targeted empirical studies addressing task mix, wage impacts, verification costs, net productivity, and market dynamics

Notes