LLM coding assistants speed up developers and cut routine work, but their effect on code quality and teamwork remains unresolved; most studies are short-term and exploratory, leaving long-run and team-level impacts unclear.
Large language model assistants (LLM-assistants) present new opportunities to transform software development. Developers are increasingly adopting these tools across tasks, including coding, testing, debugging, documentation, and design. Yet despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. In this paper, we present a systematic review and mapping of 39 peer-reviewed studies published between January 2014 and December 2024 that examine this impact. Our analysis reveals that the majority of studies report considerable benefits from LLM-assistants, though a notable subset identifies critical risks. Commonly reported gains include accelerated development, minimized code search, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading and reduced team collaboration. We also find that whether LLM-assistants improve or degrade code quality remains unresolved: existing studies report contradictory outcomes contingent on context and evaluation criteria. While the majority of studies (90%) adopt a multi-dimensional perspective by examining at least two SPACE dimensions, reflecting increased awareness of the complexity of developer productivity, only 15% extend beyond three dimensions, indicating substantial room for more integrated evaluations. Satisfaction, Performance, and Efficiency are the most frequently investigated dimensions, whereas Communication and Activity remain underexplored. Most studies are exploratory (59%) and methodologically diverse, but lack longitudinal and team-based evaluations. This review surfaces key research gaps and provides recommendations for future research and practice. All artifacts associated with this study are publicly available at https://zenodo.org/records/18489222.
Summary
Main Finding
This systematic review of 39 peer‑reviewed studies (2014–Dec 2024) finds that LLM‑based coding assistants generally produce measurable short‑term productivity benefits for software developers (faster development, less code search, automation of repetitive tasks, reduced task initiation overhead), but also introduce meaningful risks (cognitive offloading, reduced team collaboration, flow disruption). The effect on code quality remains unresolved—existing studies report contradictory outcomes depending on context, tasks, and evaluation criteria. Most primary studies are exploratory, methodologically diverse, and short‑term; team‑level and longitudinal evidence is scarce.
Reference: Amr Mohamed, Maram Assi, Mariam Guizani — “The Impact of LLM‑Assistants on Software Developer Productivity: A Systematic Review and Mapping Study” (39 studies synthesized; replication artifacts: https://zenodo.org/records/18489222).
Key Points
- Evidence base
  - 39 primary studies (published 2014–Dec 2024), selected from 9,756 records across 6 databases after screening 228 full texts.
  - Authors followed the Kitchenham & Charters SLR protocol with a PRISMA flow, and validated the search against control papers.
- Overall reported effects
  - Common benefits: accelerated task completion, reduced code search, automation of trivial/repetitive tasks, lower task-initiation overhead, better support for code-adjacent work (documentation, tests).
  - Common risks: over-reliance/cognitive offloading, decreased team communication or collaboration, flow disruption, possible propagation of subtle bugs or insecure patterns.
  - Code quality: mixed and context-dependent; some studies find improvements, others degradation or no effect.
- Productivity conceptualization and measurement
  - Studies were mapped to the SPACE framework: Satisfaction & well-being, Performance, Activity, Communication & collaboration, Efficiency & flow.
  - 90% of studies treat productivity multi-dimensionally (≥2 SPACE dimensions); only ~15% examine more than three dimensions (a counting sketch follows this list).
  - Most frequently studied: Satisfaction, Performance, Efficiency/Flow. Understudied: Communication and Activity.
- Methodology and gaps
  - 59% of studies are exploratory; methods are heterogeneous (lab experiments, controlled tasks, surveys, observational analyses).
  - Major gaps: few longitudinal studies, few team- or organization-level studies, limited real-world deployment studies, inconsistent metrics for productivity and quality.
- Artifacts and transparency
  - Authors released the replication package and selection decisions publicly (Zenodo link above).
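To make the SPACE-coverage statistics concrete, here is a minimal Python sketch that computes the share of studies covering at least two (or more than three) dimensions from a study-to-dimensions mapping. The study IDs and dimension assignments below are hypothetical placeholders; the review's actual coding lives in its replication package.

```python
from collections import Counter

# Hypothetical study-to-dimension coding; the review's real assignments
# are in its replication package.
space_coverage = {
    "S01": {"Satisfaction", "Performance"},
    "S02": {"Satisfaction", "Performance", "Efficiency", "Activity"},
    "S03": {"Performance", "Efficiency"},
    "S04": {"Satisfaction", "Communication", "Efficiency"},
    "S05": {"Performance"},
}

n = len(space_coverage)
multi = sum(len(dims) >= 2 for dims in space_coverage.values())
broad = sum(len(dims) > 3 for dims in space_coverage.values())
print(f"at least 2 dimensions: {multi / n:.0%}; more than 3: {broad / n:.0%}")

# Per-dimension frequency (in the review: Satisfaction, Performance, and
# Efficiency dominate, while Communication and Activity lag).
freq = Counter(dim for dims in space_coverage.values() for dim in dims)
print(freq.most_common())
```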
Data & Methods
- Search & selection
  - Databases: ACM DL, IEEE Xplore, ScienceDirect, Web of Science, Scopus, SpringerLink.
  - Initial hits: 9,756; after deduplication: 8,953 records screened; full texts screened: 228; final included: 39 (the funnel is sketched in code after this list).
  - The query strategy used iterative refinement, proximity operators where supported, and validation against 17 control papers.
- Inclusion/exclusion highlights
  - Included: peer-reviewed English-language papers (2014 onward) that investigate the effect of AI/LLMs on developer productivity.
  - Excluded: secondary studies, short papers (<4 pages), non-peer-reviewed grey literature, inaccessible texts, out-of-scope/out-of-focus works.
- Analysis frameworks
  - Each study was mapped onto the SPACE productivity framework; the discussion is augmented with McLuhan's Tetrad to interpret socio-technical implications.
- Characteristics of primary studies
  - Study designs: lab experiments, controlled tasks, surveys, observational analyses of IDE/tool usage, case studies.
  - Focus: individual developer interactions with LLM-assistants predominate; few team studies.
  - Temporal scope: largely short-term snapshots; almost no long-horizon follow-ups.
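The selection funnel above can be sanity-checked mechanically. A minimal sketch using the counts reported in the review (the stage labels are our own shorthand, not the authors' exact wording):

```python
# PRISMA-style funnel with the counts reported in the review; stage
# labels are shorthand, not the authors' exact wording.
funnel = [
    ("records retrieved (6 databases)", 9_756),
    ("records screened after deduplication", 8_953),
    ("full texts screened", 228),
    ("studies included", 39),
]

for (prev_label, prev_n), (label, n) in zip(funnel, funnel[1:]):
    assert n <= prev_n, "a selection stage can never gain records"
    print(f"{prev_label} -> {label}: {prev_n - n:,} excluded")
```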
Implications for AI Economics
- Labor demand and task composition
  - Short-term productivity gains suggest reduced time per task and potential reallocation of developer effort toward higher-value tasks (design, architecture, coordination).
  - Economists should model task-level substitution: LLMs substitute for routine coding/search tasks while complementing higher-skill activities, implying shifts in the relative demand for skills (a stylized model sketch follows this list).
- Wage and skill-premium effects
  - If LLMs automate routine tasks, demand for mid-level routine coding may fall while demand for senior/architectural, verification, and coordination skills rises, potentially increasing skill premia and polarization within software labor markets.
  - Effects are heterogeneous by task, experience, and sector: junior developers may gain or lose depending on adoption, oversight requirements, and supervision structures.
- Productivity vs. quality tradeoffs and externalities
  - Mixed code-quality findings imply an ambiguous impact on product reliability and consumer welfare. Economists should account for possible negative externalities (security/bug propagation) that raise downstream costs.
  - Firms may face tradeoffs between short-term throughput gains and longer-term maintenance costs; general-equilibrium impacts depend on how quality is verified and regulated.
- Organizational complementarities and team effects
  - Reduced communication/collaboration signals changed complementarities between tools and human coordination. Team-level complementarities could amplify or dampen productivity gains; firm-level models therefore need to include coordination frictions and knowledge spillovers.
- Measurement and empirical research needs
  - Multi-dimensional productivity: researchers should move beyond single proxies (LOC, task time) and integrate SPACE dimensions into empirical work (satisfaction, activity, communication, efficiency, performance).
  - There is an urgent need for longitudinal, team-level, and field experiments (or administrative/firm panel data) to estimate durable effects on employment, wages, promotion, churn, and firm performance.
  - Administrative IDE/tool logs, matched employer-employee data, and randomized encouragement designs or staged rollouts would improve causal identification (an estimation sketch appears after the summary takeaway).
- Policy and market design
  - Regulation, standards, and certification for LLM-generated code may be needed if negative externalities are nontrivial (security, liability).
  - Training and reskilling policies should emphasize supervisory, verification, and collaborative skills that complement LLMs.
- Practical research recommendations for economists
  - Build task-based models that explicitly separate routine from non-routine software tasks and model complementarities with human capital (see the sketch after this list).
  - Estimate heterogeneous treatment effects by experience level, team structure, and sector; test interactions between tool adoption and managerial practices.
  - Quantify welfare effects including product quality, maintenance costs, and consumer risk; include dynamic adjustment (retraining, task creation).
  - Leverage the reviewed replication package and SPACE mapping as a taxonomy for constructing multi-dimensional outcome variables.
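To illustrate the task-based modeling recommendation, here is a deliberately stylized two-task sketch in Python: LLM capability substitutes for routine work and, by assumption, complements non-routine work, so reallocating labor toward non-routine tasks becomes profitable as capability rises. Every functional form and parameter below is an illustrative assumption, not an estimate from the reviewed studies.

```python
def output(l_routine: float, l_complex: float, a_llm: float,
           rho: float = 0.5) -> float:
    """Stylized CES aggregate over routine and non-routine software tasks.

    Assumptions (illustrative only): LLM capability a_llm adds directly to
    routine-task output (a substitute for routine labor) and scales
    non-routine labor (a complement to skilled work).
    """
    routine = l_routine + a_llm
    non_routine = l_complex * (1.0 + a_llm / 2.0)
    return (0.5 * routine**rho + 0.5 * non_routine**rho) ** (1.0 / rho)

# Compare keeping a 50/50 labor split against reallocating all labor to
# non-routine work as LLM capability grows.
for a in (0.0, 0.5, 1.0, 2.0):
    split = output(l_routine=1.0, l_complex=1.0, a_llm=a)
    shifted = output(l_routine=0.0, l_complex=2.0, a_llm=a)
    print(f"a_llm={a:.1f}: keep split={split:.2f}, reallocate={shifted:.2f}")
```

In this toy parameterization the even split wins at low capability, while full reallocation wins once a_llm is large enough to cover routine work, which is exactly the substitution-plus-reallocation mechanism described above.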
Summary takeaway: LLM‑assistants appear to reallocate developer effort and raise short‑term productivity along several dimensions, but the long‑run labor, organizational, and product‑quality consequences are uncertain. Economists should prioritize task‑level, team‑level, and longitudinal empirical strategies and incorporate multi‑dimensional productivity metrics to estimate durable impacts and guide policy.
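As a concrete instance of the randomized-encouragement identification strategy recommended under "Measurement and empirical research needs," the following simulation sketch shows why naive adopter/non-adopter comparisons mislead and how the simple Wald (ITT-ratio) estimator recovers the effect among compliers. All data and effect sizes here are simulated, not drawn from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated randomized encouragement: Z nudges adoption of the assistant (D),
# which shifts a productivity outcome (Y). All effect sizes are made up.
z = rng.integers(0, 2, n)                       # encouragement (randomized)
ability = rng.normal(0, 1, n)                   # confounder: affects D and Y
d = (0.3 * z + 0.2 * ability + rng.normal(0, 1, n) > 0.5).astype(float)
y = 1.0 * d + 0.8 * ability + rng.normal(0, 1, n)

# The naive comparison is biased by ability; the Wald estimator
# (ITT on Y divided by ITT on D) recovers the LATE among compliers.
naive = y[d == 1].mean() - y[d == 0].mean()
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_d = d[z == 1].mean() - d[z == 0].mean()
print(f"naive: {naive:.2f}, Wald/IV: {itt_y / itt_d:.2f}  (true effect: 1.00)")
```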
Assessment
Claims (14)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| This paper is a systematic review and mapping of 39 peer-reviewed studies published between January 2014 and December 2024 that examine the impact of LLM-assistants on software developer productivity. | Other | null_result | high | scope of literature reviewed (count of studies) | n=39; 0.4 |
| The majority of reviewed studies report considerable benefits from LLM-assistants. | Developer Productivity | positive | high | overall reported impact on developer productivity | n=39; 0.24 |
| A notable subset of studies identifies critical risks associated with LLM-assistants. | Other | negative | high | reported risks and negative impacts | n=39; 0.24 |
| Commonly reported gains from LLM-assistants include accelerated development (faster task completion). | Task Completion Time | positive | high | task completion time / development speed | n=39; 0.24 |
| Commonly reported gains include minimized code search due to LLM assistance. | Developer Productivity | positive | high | time/effort spent searching for code or information | n=39; 0.24 |
| Commonly reported gains include the automation of trivial and repetitive tasks. | Developer Productivity | positive | high | automation of low-complexity tasks / developer time freed | n=39; 0.24 |
| Studies highlight concerns around cognitive offloading and reduced team collaboration when using LLM-assistants. | Team Performance | negative | high | cognitive processes and team collaboration | n=39; 0.24 |
| Whether LLM-based assistants improve or degrade code quality remains unresolved: existing studies report contradictory outcomes contingent on context and evaluation criteria. | Output Quality | mixed | high | code quality (e.g., correctness, maintainability, defects) | n=39; 0.24 |
| 90% of the reviewed studies adopt a multi-dimensional perspective by examining at least two SPACE dimensions. | Other | null_result | high | proportion of studies examining ≥2 SPACE dimensions | n=39; 90%; 0.4 |
| Only 15% of the reviewed studies extend beyond three SPACE dimensions. | Other | null_result | high | proportion of studies examining >3 SPACE dimensions | n=39; 15%; 0.4 |
| Satisfaction, Performance, and Efficiency are the most frequently investigated SPACE dimensions, whereas Communication and Activity remain underexplored. | Other | null_result | high | frequency of SPACE dimensions studied | n=39; 0.4 |
| Most studies are exploratory (59%) and methodologically diverse, but there is a lack of longitudinal and team-based evaluations. | Other | negative | high | study design types and presence/absence of longitudinal or team-based evaluations | n=39; 59%; 0.4 |
| This review identifies key research gaps and provides recommendations for future research and practice. | Other | null_result | high | research gaps and recommendations (qualitative synthesis) | n=39; 0.04 |
| All artifacts associated with this study are publicly available at https://zenodo.org/records/18489222. | Other | null_result | high | availability of study artifacts | 0.4 |