Interviews with 22 developers suggest that how much control developers retain over LLMs shapes whether these tools boost or erode software work: excessive reliance risks skill atrophy, while underuse forfeits productivity gains. The paper offers a preliminary 'reliance–control' framework to guide tool design, training, and policy.

Towards an Appropriate Level of Reliance on AI: A Preliminary Reliance-Control Framework for AI in Software Engineering

Samuel Ferino, Rashina Hoda, John Grundy, Christoph Treude · April 12, 2026

arXiv · descriptive · low evidence · relevance 7/10

From 22 developer interviews the authors propose a preliminary reliance-control framework showing that levels of control over LLMs help identify overreliance and underreliance and guide recommendations for research, practice, and policy.

How software developers interact with Artificial Intelligence (AI)-powered tools, including Large Language Models (LLMs), plays a vital role in how these AI-powered tools impact them. While overreliance on AI may lead to long-term negative consequences (e.g., atrophy of critical thinking skills), underreliance might deprive software developers of potential gains in productivity and quality. Based on twenty-two interviews with software developers on using LLMs for software development, we propose a preliminary reliance-control framework where the level of control can be used as a way to identify AI overreliance and underreliance. We also use it to recommend future research to further explore the different control levels supported by the current and emergent LLM-driven tools. Our paper contributes to the emerging discourse on AI overreliance and provides an understanding of the appropriate degree of reliance as essential to developers making the most of these powerful technologies. Our findings can help practitioners, educators, and policymakers promote responsible and effective use of AI tools.

Summary

Main Finding

The paper proposes a preliminary two-dimensional "reliance–control" framework for AI in software engineering (SE). It characterises degrees of (a) control developers hold over AI tools (self-control → losing control) and (b) reliance developers place on AI (self-reliance → full automation). The authors argue the optimal outcome (“sweet spot”) is balanced control combined with appropriate reliance, and they illustrate how different interaction modes (e.g., IDE-integrated agents vs. conversational chatbots) map to different positions in this space. They use interview evidence to show that tool design and organizational choices influence whether teams gain productivity or risk skill atrophy, reduced team communication, and bugs/vulnerabilities from overreliance.

Key Points

Reliance and control are related but distinct dimensions. Appropriate reliance is not just trusting accurate outputs but choosing when and how much to rely across tasks.
Degrees of control (Table 1): self-control, taking control, balanced control, handing over control, losing control.
Degrees of reliance (Table 2): self-reliance, reliance on colleagues, appropriate reliance, overreliance, full automation.
The “sweet spot” is balanced control + appropriate reliance (e.g., using LLMs for test-driven development while supervising outputs).
Risk scenarios: handing over control + overreliance (prototype generation without checks), losing control + full automation (multi-file agent changes introducing vulnerabilities).
Empirical background: 22 semi-structured interviews with practitioners across regions; examples reference tools and interaction modes (ChatGPT, Claude, Cursor, Windsurf, GitHub Copilot, agent modes).
Prior empirical findings cited: incidents of operational harm (e.g., Replit-agent deleting live data), cognitive/behavioral impacts (Kosmyna et al.’s EEG study showing neural/behavioral underperformance after LLM use), and mixed effectiveness of interventions (Bo et al. and Collins et al. on reliance interventions).
Practical recommendations include aligning tool selection with team expertise, documenting norms about when automation is acceptable, and curricular choices to avoid premature automation for novices.
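
The two dimensions and the named combinations above can be written down as a small lookup, which may help when coding interview excerpts or tool logs against the framework. This is a minimal sketch in Python, not the authors' instrument: the enum names mirror the labels in Tables 1 and 2, the integer ordering simply follows the order in which the levels are listed, and the fallback labels in `classify` are an assumption added for illustration.

```python
from enum import IntEnum


class Control(IntEnum):
    """Degrees of control (Table 1), in the order listed above."""
    SELF_CONTROL = 0
    TAKING_CONTROL = 1
    BALANCED_CONTROL = 2
    HANDING_OVER_CONTROL = 3
    LOSING_CONTROL = 4


class Reliance(IntEnum):
    """Degrees of reliance (Table 2), in the order listed above."""
    SELF_RELIANCE = 0
    RELIANCE_ON_COLLEAGUES = 1
    APPROPRIATE_RELIANCE = 2
    OVERRELIANCE = 3
    FULL_AUTOMATION = 4


# Only these three combinations are named in the summary.
NAMED = {
    (Control.BALANCED_CONTROL, Reliance.APPROPRIATE_RELIANCE): "sweet spot",
    (Control.HANDING_OVER_CONTROL, Reliance.OVERRELIANCE):
        "risk: handing over control + overreliance",
    (Control.LOSING_CONTROL, Reliance.FULL_AUTOMATION):
        "risk: losing control + full automation",
}


def classify(control: Control, reliance: Reliance) -> str:
    """Label a position in the reliance-control space."""
    named = NAMED.get((control, reliance))
    if named is not None:
        return named
    # Assumption (not from the paper): fall back on the reliance axis alone.
    if reliance < Reliance.APPROPRIATE_RELIANCE:
        return "possible underreliance (assumed label)"
    if reliance > Reliance.APPROPRIATE_RELIANCE:
        return "possible overreliance (assumed label)"
    return "appropriate reliance, control off balance (assumed label)"


if __name__ == "__main__":
    print(classify(Control.BALANCED_CONTROL, Reliance.APPROPRIATE_RELIANCE))
    print(classify(Control.LOSING_CONTROL, Reliance.FULL_AUTOMATION))
```

Keeping the named combinations in a dictionary separates what the summary states from the assumed fallback, so the fallback can be replaced once the paper's tables are consulted directly.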

Data & Methods

Primary data: twenty-two semi-structured interviews with software developers collected in three rounds (Oct 2024–Sep 2025) across Asia, Europe, North America, South America, and Oceania.
Participants completed a pre-interview questionnaire covering demographics and AI attitudes.
Analysis approach: socio-technical grounded theory (STGT) — open coding, constant comparison, memoing. First author led coding with early collaborative sessions with a grounded-theory specialist.
Supplementary materials (protocol, interview guide, coding examples, memos) are provided by the authors.
Framework construction: emergent coding revealed reliance and control as core concepts; authors mapped observed developer practices and tool interaction types to the two dimensions and illustrated practical examples.
External grounding: framework linked to prior taxonomies of developer–AI interaction types (Treude & Gerosa) to show how interaction modes cluster by control/reliance.

Implications for AI Economics

Labor market and productivity trade-offs
- Short-run productivity gains may be real (reduced effort, faster prototyping), but long-run human capital depreciation (skill atrophy, reduced critical thinking) poses a risk to worker productivity and employability. Economists should model both immediate output gains and negative dynamic effects on skills.
- Task-level complementarity vs. substitution: interaction modes that preserve developer control (balanced control) imply complementarity (AI + human), whereas agent-driven, multi-file automation pushes toward substitution and potential job-task reallocation.
Firm-level adoption and organizational choices
- Tool selection and governance (which interaction mode to deploy) are strategic decisions with measurable returns and risks. Firms may optimally choose different automation levels depending on workforce skill composition, product criticality, and regulatory exposure.
- Team structures and communication externalities: reduced peer discussion (observed in practice) can lower overall team knowledge diffusion and raise coordination costs—important for models of team productivity.
Human capital and education policy
- Curriculum design matters: restricting certain AI features early in training may preserve foundational skills. Economists and policymakers should assess the social returns to investing in retraining and in-place checks (code review, TDD practices).
Measurement and empirical strategy suggestions
- Key variables to measure: reliance level, control mode, task criticality, error rates in AI outputs, developer skill trajectories (short- and long-run), team communication frequency/quality, incidence of costly failures (security/availability).
- Suggested empirical designs: longitudinal panel studies of developers (to capture skill dynamics), randomized controlled trials (RCTs) of interaction modes (e.g., agent mode vs. conversational assist) to estimate causal impacts on productivity and skill retention, within-subjects experiments for cognitive outcomes (replicating/expanding EEG and behavioral measures).
- Natural experiments / difference-in-differences: exploit staggered firm-level rollouts of agent-mode tools or policy changes restricting certain AI features.
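
As an illustration of the difference-in-differences idea in its simplest form, the sketch below computes the two-group, two-period estimate from cell means. It is a toy example under stated assumptions: the function name `did_estimate`, the row layout, and every number in `SYNTHETIC_ROWS` are made up for illustration and do not come from the paper. Staggered rollouts, where firms adopt at different times, generally call for estimators designed for varying treatment timing rather than this two-by-two comparison.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences from cell means.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Synthetic outcomes (hypothetical units), purely for illustration.
SYNTHETIC_ROWS = [
    (True, False, 10.0), (True, False, 12.0),    # adopters, before rollout
    (True, True, 15.0), (True, True, 17.0),      # adopters, after rollout
    (False, False, 9.0), (False, False, 11.0),   # non-adopters, before
    (False, True, 11.0), (False, True, 13.0),    # non-adopters, after
]

if __name__ == "__main__":
    # (16 - 11) - (12 - 10) = 3.0 for the synthetic rows above
    print(did_estimate(SYNTHETIC_ROWS))
```

The estimate is only interpretable as a causal effect if adopters and non-adopters would have followed parallel trends without the rollout, which is an assumption to be argued for in any real study.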
Policy, liability, and regulation
- Externalities from overreliance (e.g., system failures, security breaches) create regulatory and insurance angles. Regulators may mandate logging, human-in-the-loop requirements for critical systems, or certification standards for AI-driven code changes.
- Incentive design: firms may need to internalize the long-term cost of skill erosion (through training subsidies, enforced review workflows, or UX friction where appropriate).
Research agenda for AI economists
- Quantify the threshold where agent automation becomes net welfare-improving vs. welfare-reducing when accounting for skill depreciation and externalities.
- Model firm decisions on tool adoption as a dynamic optimization (short-run gains vs. long-run human capital loss).
- Estimate wage and labor demand impacts by occupation and experience level as agent modes diffuse.
- Evaluate policy levers (e.g., mandated human oversight, certification, liability rules) via calibrated models or field experiments.
- Test design interventions (frictions, uncertainty indicators, multi-model cross-check prompts) not just for immediate calibration of reliance but for their long-run impact on skills and productivity.
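
One way the dynamic-optimization item above might be formalized is as a Bellman equation in which a firm trades current output against future skill. This is a sketch under assumptions, not a model from the paper: $a_t \in [0,1]$ is the share of work delegated to AI, $h_t$ is the developers' skill stock, $f$ is per-period output, $\beta$ is a discount factor, $\delta(a_t)$ is a skill depreciation rate assumed to increase in $a_t$, and $g$ is skill gained from hands-on work.

```latex
V(h_t) = \max_{a_t \in [0,1]} \Big\{ f(a_t, h_t) + \beta\, V(h_{t+1}) \Big\},
\qquad
h_{t+1} = \big(1 - \delta(a_t)\big)\, h_t + g(1 - a_t)
```

In this formulation the short-run gain enters through $f$ and the long-run human capital loss through $\delta$ and $g$, so the optimal $a_t$ depends on how heavily the firm discounts the future.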

Practical quick suggestions for economists planning empirical work
- Collect microdata at developer level: tool logs (interaction mode, frequency), code quality metrics, review churn, and career progression.
- Pair controlled lab experiments (productivity, cognitive load) with field data (firm rollouts) for external validity.
- Use Treude & Gerosa’s interaction taxonomy and this reliance–control mapping as a coding scheme to classify tool–task interactions in datasets.

Concluding note
This framework provides a parsimonious way to map interaction modes and organizational choices to economic outcomes (productivity, skill formation, risk). It suggests that policies and firm strategies that preserve “balanced control” while achieving “appropriate reliance” may maximize durable gains from AI, a hypothesis well-suited for empirical testing by AI economists.

Assessment

Paper Type: descriptive
Evidence Strength: low. Findings are based on 22 qualitative interviews and describe perceptions and a proposed framework rather than measuring causal effects or estimating magnitudes; conclusions are hypothesis-generating rather than definitive.
Methods Rigor: medium. Uses established qualitative methods (semi-structured interviews analyzed with socio-technical grounded theory to build a framework), which are appropriate for exploratory work, but the sample is small and likely nonrepresentative, with limited triangulation or quantitative validation.
Sample: Twenty-two software developers who use LLMs for software development were interviewed (semi-structured qualitative interviews); participants likely vary in role and experience but were recruited via purposive/convenience sampling and provided self-reported accounts of tool use; data were coded using socio-technical grounded theory to propose a reliance-control framework.
Themes: human_ai_collab, productivity, skills_training
Generalizability: small_nonrepresentative_sample, self_report_and_social_desirability_bias, likely convenience/purposive recruitment (selection bias), rapidly_evolving_LLM capabilities limit temporal generalizability, unclear geographic/industry spread of respondents

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
How software developers interact with AI-powered tools, including Large Language Models (LLMs), plays a vital role in how these AI-powered tools impact them.	Developer Productivity	mixed	high	impact of AI tools on developers (broadly: productivity, skills, quality)	n=22 0.18
Overreliance on AI may lead to long-term negative consequences (e.g., atrophy of critical thinking skills).	Skill Obsolescence	negative	high	atrophy of critical thinking skills / skill degradation	n=22 0.03
Underreliance on AI might deprive software developers of potential gains in productivity and quality.	Developer Productivity	negative	high	productivity and output quality	n=22 0.03
We propose a preliminary reliance-control framework where the level of control can be used to identify AI overreliance and underreliance.	Task Allocation	positive	high	ability to identify overreliance and underreliance (framework applicability)	n=22 0.03
The reliance-control framework can be used to recommend future research to explore different control levels supported by current and emergent LLM-driven tools.	Research Productivity	positive	high	research directions and scope (exploration of control levels)	n=22 0.03
Our paper contributes to the emerging discourse on AI overreliance and provides an understanding of the appropriate degree of reliance as essential to developers making the most of these powerful technologies.	Developer Productivity	positive	high	developers' ability to effectively use AI tools (appropriate degree of reliance)	n=22 0.09
Our findings can help practitioners, educators, and policymakers promote responsible and effective use of AI tools.	Governance And Regulation	positive	medium	promotion of responsible and effective AI use (policy/education/practice guidance)	n=22 0.05