The Commonplace

AI assistant Gemini did not improve code security in a controlled developer experiment; developers' own programming experience—rather than the tool—drove more secure outcomes.

The Impact of AI-Assisted Development on Software Security: A Study of Gemini and Developer Experience
Nadine Jost, Benjamin Berens, Manuel Karl, Stefan Albert Horstmann, Martin Johns, Alena Naiakshina · March 16, 2026
arXiv · RCT · medium evidence · 7/10 relevance
In a randomized experiment with 159 developers, using Google's Gemini (free or paid) did not significantly alter code security, whereas higher general programming experience substantially improved security outcomes and was not fully substitutable by Gemini.

The ongoing shortage of skilled developers, particularly in security-critical software development, has led organizations to increasingly adopt AI-powered development tools to boost productivity and reduce reliance on limited human expertise. These tools, often based on large language models, aim to automate routine tasks and make secure software development more accessible and efficient. However, it remains unclear how developers' general programming and security-specific experience, and the type of AI tool used (free vs. paid) affect the security of the resulting software. Therefore, we conducted a quantitative programming study with software developers (n=159) exploring the impact of Google's AI tool Gemini on code security. Participants were assigned a security-related programming task using either no AI tools, the free version, or the paid version of Gemini. While we did not observe significant differences between using Gemini in terms of secure software development, programming experience significantly improved code security and cannot be fully substituted by Gemini.

Summary

Main Finding

  • In a quantitative study of 159 software developers given a security-related programming task, use of Google's Gemini (free or paid) did not produce significant differences in code security compared to no-AI assistance.
  • By contrast, developers' general programming experience was a significant predictor of more secure code; AI assistance could not fully substitute for that experience.

Key Points

  • Context: Organizations are adopting AI-powered development tools to mitigate shortages of skilled developers and to speed secure software development.
  • Experimental conditions: participants were assigned to one of three conditions — no AI, Gemini free, or Gemini paid.
  • Outcome: No significant security improvements from using Gemini (free or paid) relative to no-AI. No significant difference between the free and paid Gemini conditions was reported.
  • Human capital matters: greater general programming experience materially improved security outcomes; reliance on Gemini alone did not close that gap.

Data & Methods

  • Sample: n = 159 software developers.
  • Design: Participants were experimentally assigned to one of three conditions (no AI tool, free Gemini, or paid Gemini) and completed a security-related programming task under that condition.
  • Measurement: Code security was assessed from the task submissions (summary does not provide specific metrics or statistical tests used).
  • Limitations (implicit from design): single AI tool (Gemini), single task domain (security-focused programming), and lab/experimental context—these constrain external validity.
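
The three-arm design described above can be illustrated with a short sketch. This is not the authors' procedure: the summary does not give the randomization protocol or the arm sizes, so the equal-sized arms, the arm labels, the seed, and the helper name `assign_arms` are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical arm labels; the study's conditions were no AI, free Gemini, paid Gemini.
ARMS = ("no_ai", "gemini_free", "gemini_paid")

def assign_arms(participant_ids, arms=ARMS, seed=0):
    """Shuffle participants, then deal them round-robin into the arms.

    Assumption: balanced arms. The source does not report arm sizes.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: arms[i % len(arms)] for i, pid in enumerate(shuffled)}

# 159 participants, as in the study; the IDs 0..158 are placeholders.
assignment = assign_arms(range(159))
arm_sizes = Counter(assignment.values())
print(arm_sizes)
```

Dealing a shuffled list round-robin keeps the arms as balanced as possible, which is one common choice; simple per-participant randomization would be another.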

Implications for AI Economics

  • Complementarity > Substitution for skilled labor: Results indicate AI assistance (Gemini) did not replace the value of programming experience for producing secure code. Investments in human capital (training, hiring experienced developers) remain economically important.
  • Pricing and product-market implications: No observed security advantage for paid vs. free Gemini suggests that, at least for security outcomes on this task, willingness to pay for premium model access may not translate into better security, affecting firms’ ROI calculations for paid developer tools.
  • Labor market effects: Widespread adoption of LLM-based dev tools may not substantially reduce demand for experienced developers in security-critical roles; instead, tools may shift the task composition toward less-experienced labor supplemented by supervision from experienced staff.
  • Procurement and regulation: Buyers of AI-assisted development tools should not assume these tools reduce the need for experienced engineers in security-sensitive contexts; procurement decisions and regulatory guidance should account for persistent reliance on human expertise.
  • Policy and training: Public and private investment in developer training and security education remains warranted; subsidizing upskilling could be more effective for improving software security than subsidizing tool adoption alone.
  • Research priorities: Economists should study heterogeneous effects across task types, tool capabilities, team settings, and long-run dynamics (learning effects, model improvements) to better estimate the substitutability between AI tools and developer experience.

Assessment

Paper Type: rct
Evidence Strength: medium. Internal validity is strong thanks to the experimental design and randomized treatment assignment, allowing causal inference about the tool's short-term effect in the studied setting; external validity is limited by a single tool (Gemini), a single task environment, and a modest sample size, which reduces confidence in generalizing results to real-world development teams and longer-run outcomes.
Methods Rigor: medium. The study uses a clear experimental design and measures both tool usage and developer experience, but the description omits key methodological details (e.g., pre-registration, blinding, exact randomization protocol, compliance measures, and robustness checks), and security outcomes derive from an artificial task environment that may be sensitive to measurement choices.
Sample: n = 159 software developers who completed a security-related programming task; participants varied in general programming and security-specific experience and were assigned to one of three conditions (no AI tool, free Gemini, paid Gemini).
Themes: human_ai_collab, skills_training
Identification: Randomized assignment of participants to three experimental arms (no AI, free Gemini, paid Gemini); causal effects inferred by comparing security-relevant code quality metrics across arms, with additional analyses controlling for participants' general programming and security-specific experience.
Generalizability:
  • Single AI product and specific versions (Google Gemini): results may not generalize to other LLM-based coding assistants or future Gemini updates.
  • Single security-related programming task in a controlled, short-term setting: may not reflect complex, long-run codebase work.
  • Participant recruitment details not provided here: sample may not represent the broader population of professional software engineers.
  • Paid vs. free distinctions are time- and feature-dependent and may change, limiting external validity.
  • Security outcome metrics in an experiment may not map directly to real-world vulnerability discovery, exploitability, or operational security consequences.
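
The identification strategy noted in this assessment, comparing a security outcome across the three arms, can be sketched as a Pearson chi-square test on a 3×2 table. This is only an illustration under stated assumptions: the summary does not say which metric or statistical test the authors used, so the binary secure/insecure outcome, the function name `chi_square_3x2`, and the counts in the usage lines are hypothetical placeholders, not results from the paper.

```python
import math

def chi_square_3x2(secure_counts, arm_sizes):
    """Pearson chi-square test of independence for 3 arms x (secure, insecure).

    Assumption: each submission is scored as secure or insecure.
    Returns (statistic, p_value).
    """
    if len(secure_counts) != 3 or len(arm_sizes) != 3:
        raise ValueError("expected exactly three arms")
    total = sum(arm_sizes)
    total_secure = sum(secure_counts)
    total_insecure = total - total_secure
    if total_secure == 0 or total_insecure == 0:
        raise ValueError("both outcomes must occur at least once")

    statistic = 0.0
    for secure, size in zip(secure_counts, arm_sizes):
        observed = (secure, size - secure)
        # Expected counts if the outcome were independent of the arm.
        expected = (size * total_secure / total, size * total_insecure / total)
        statistic += sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Degrees of freedom = (3 - 1) * (2 - 1) = 2, and for 2 degrees of
    # freedom the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

# Placeholder counts for illustration only (NOT the study's data).
placeholder_secure = (10, 10, 10)
placeholder_sizes = (50, 50, 50)
statistic, p_value = chi_square_3x2(placeholder_secure, placeholder_sizes)
print(statistic, p_value)
```

A test like this only addresses the arm comparison; the additional analyses controlling for programming and security experience would need a regression-style model, which this sketch does not attempt.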

Claims (8)

  • Claim: We conducted a quantitative programming study with software developers (n = 159) exploring the impact of Google's AI tool Gemini on code security. [Output Quality]
    Direction: null_result
    Confidence: high
    Outcome: impact of Gemini on code security (security of code produced in the study)
    Details: n=159; 0.6

  • Claim: Participants were assigned a security-related programming task using either no AI tools, the free version, or the paid version of Gemini. [Output Quality]
    Direction: null_result
    Confidence: high
    Outcome: experimental condition (tool used) as it relates to subsequent code security outcomes
    Details: n=159; 0.6

  • Claim: We did not observe significant differences between using Gemini (free or paid) and not using Gemini in terms of secure software development. [Output Quality]
    Direction: null_result
    Confidence: medium
    Outcome: secure software development / code security (e.g., detected vulnerabilities or security score of submitted solutions)
    Details: n=159; no significant difference; 0.36

  • Claim: Programming experience significantly improved code security. [Output Quality]
    Direction: positive
    Confidence: high
    Outcome: code security (security quality of participants' solutions) as a function of programming experience
    Details: n=159; significant positive effect of programming experience; 0.6

  • Claim: Programming experience cannot be fully substituted by Gemini. [Output Quality]
    Direction: mixed
    Confidence: medium
    Outcome: degree to which Gemini use offsets the effect of programming experience on code security
    Details: n=159; programming experience effect remained despite Gemini; 0.36

  • Claim: Organizations increasingly adopt AI-powered development tools to boost productivity and reduce reliance on limited human expertise, especially in security-critical software development. [Adoption Rate]
    Direction: positive
    Confidence: medium
    Outcome: adoption of AI-powered development tools (general trend; not measured in this study)
    Details: 0.36

  • Claim: AI-powered developer tools (often based on large language models) aim to automate routine tasks and make secure software development more accessible and efficient. [Developer Productivity]
    Direction: positive
    Confidence: medium
    Outcome: intended goals of AI tools (automation of routine tasks; accessibility/efficiency in secure development)
    Details: 0.36

  • Claim: It remains unclear how developers' general programming and security-specific experience, and the type of AI tool used (free vs. paid), affect the security of the resulting software, which motivates this study. [Other]
    Direction: null_result
    Confidence: high
    Outcome: the combined effect of developer experience and AI tool type on code security (identified as an open question prior to the study)
    Details: 0.6
