Generative AI helps break user stories into finer, more complete task lists but can’t yet replace human planners; developers prefer hybrid workflows combining GitLab Duo suggestions with manual review.
In agile software development, breaking down user stories into actionable tasks is a critical yet time-consuming process. This paper investigates the potential of Generative AI tools to assist in task splitting, aiming to enhance planning efficiency. We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo. Our findings indicate that while current AI tools are not yet mature enough to replace developers, they can aid in generating more granular task lists and ensuring no important tasks are overlooked. Participants favored a hybrid approach, combining AI tools with conventional methods to maintain high accuracy in planning. This study highlights the potential benefits and limitations of integrating Generative AI into agile development processes, suggesting that AI tools can serve as valuable aids in task splitting, provided there is human oversight to filter out irrelevant tasks.
Summary
Main Finding
Generative AI (here: GitLab Duo) cannot yet replace developers for task-splitting in agile planning but can be a useful assistive tool. AI-generated breakdowns are more granular and surface tasks humans sometimes omit (e.g., tests, docs, refactors), but they also produce irrelevant or context-insensitive tasks. Participants strongly prefer a hybrid workflow where AI suggestions are curated by humans.
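The curation step in such a hybrid workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions. It is not the study's procedure and not a GitLab Duo API. It assumes that tasks are plain strings, that the AI suggestions have already been exported as a list, and that the human reviewer's decisions arrive as a set of accepted titles. The names `Task` and `hybrid_task_list` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    title: str
    source: str  # "manual" or "ai" (hypothetical labels)


def hybrid_task_list(manual: list[str], ai_suggested: list[str],
                     accepted: set[str]) -> list[Task]:
    """Combine a manual task list with human-reviewed AI suggestions.

    Every manual task is kept. An AI suggestion is added only if the reviewer
    accepted it and it is not a case-insensitive duplicate of a task already
    in the list.
    """
    result = [Task(title, "manual") for title in manual]
    seen = {title.strip().lower() for title in manual}
    for suggestion in ai_suggested:
        key = suggestion.strip().lower()
        if suggestion in accepted and key not in seen:
            result.append(Task(suggestion, "ai"))
            seen.add(key)
    return result


# Toy example (invented): one user story, split manually and by an AI assistant.
manual = ["Create login form", "Validate credentials"]
ai_suggested = ["Create login form", "Write unit tests for login",
                "Add dark mode", "Document login endpoint"]
accepted = {"Write unit tests for login", "Document login endpoint"}

tasks = hybrid_task_list(manual, ai_suggested, accepted)
for task in tasks:
    print(f"[{task.source}] {task.title}")
```

The sketch keeps the manual list as the base and admits an AI suggestion only after explicit acceptance. The AI can widen coverage (tests, documentation), while the reviewer filters out unrequested items such as `Add dark mode`.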
Key Points
- Controlled experiment: AI-assisted (GitLab Duo) vs conventional task-splitting.
- Sample: 42 higher-year CS students (39 in the final analysis), working in teams of 3: 6 experimental teams and 7 control teams.
- Workload: each team was given 8 user stories (48 split by the experimental teams in total, 56 by the control teams).
- Output differences:
  - AI group produced 260 tasks (avg 5.4 tasks/user story).
  - Control group produced 184 tasks (avg 3.2 tasks/user story).
  - Teams working from AI-generated plans implemented, on average, 59% of the generated tasks; teams working from manual plans implemented all of their tasks.
  - AI lists commonly included testing, documentation, and refactoring tasks that manual lists sometimes missed, but also unrelated items.
- Perceptions:
  - 100% of participants preferred a hybrid AI+human approach for future task splitting.
  - Only 10% judged AI as producing more relevant tasks than conventional methods; 55% disagreed, 35% were uncertain.
- Additional note from the authors’ earlier related work: automated effort estimation by generative AI in this context showed poor accuracy (≈16%).
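A few lines of arithmetic show that the sample and workload figures above are mutually consistent. Thirteen teams of 3 account for the 39 analysed participants, and 8 stories per team give the 48 and 56 split stories. The same arithmetic puts the control average at 184 / 56 ≈ 3.29 tasks per story. The only inputs are the counts stated above.

```python
# Inputs copied from the summary above.
TEAM_SIZE = 3
EXPERIMENTAL_TEAMS, CONTROL_TEAMS = 6, 7
STORIES_PER_TEAM = 8
AI_TASKS, CONTROL_TASKS = 260, 184

participants = TEAM_SIZE * (EXPERIMENTAL_TEAMS + CONTROL_TEAMS)  # 39
ai_stories = EXPERIMENTAL_TEAMS * STORIES_PER_TEAM               # 48
control_stories = CONTROL_TEAMS * STORIES_PER_TEAM               # 56

ai_per_story = AI_TASKS / ai_stories                 # about 5.42
control_per_story = CONTROL_TASKS / control_stories  # about 3.29

print(f"participants in teams: {participants}")
print(f"stories split: {ai_stories} (AI) vs {control_stories} (control)")
print(f"tasks per story: {ai_per_story:.2f} (AI) vs {control_per_story:.2f} (control)")
print(f"tasks-per-story ratio (AI / control): {ai_per_story / control_per_story:.2f}")
```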
Data & Methods
- Design: One-factor controlled experiment comparing two task-splitting methods (conventional vs GitLab Duo).
- Participants: Voluntary, experienced students (91% had 1–3 years dev experience); 82% had prior exposure to AI dev tools.
- Procedure: a simulated sprint over three sessions:
  - Session 1: setup, pre-test, and assignment to groups; teams created task lists from the provided user stories.
  - Session 2: implementation of selected user stories, with progress tracked in GitLab.
  - Session 3: acceptance testing and post-test questionnaire.
- Measures collected:
  - Number and content of generated tasks per user story/team.
  - Implementation outcomes (which generated tasks were executed).
  - Participant attitudes via pre/post questionnaires.
- Key quantitative results: 260 vs 184 tasks; 5.4 vs 3.2 tasks per story; 59% implementation rate for AI-generated tasks.
- Threats to validity noted by authors: small/short controlled setting; student sample (not full-time industry devs); single AI tool (GitLab Duo) and no domain fine-tuning; rapid evolution of generative AI could change results.
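The task-level measures listed above can be derived from a simple per-task log. This sketch assumes a hypothetical record schema (`TaskRecord` and the helper names are illustrative), because the study's actual data format is not described here. It reads "on average 59%" as a per-team rate averaged over teams. That reading is an assumption, and a pooled rate (all implemented tasks over all generated tasks) can differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical schema: one row per generated task.
    team: str
    story: str
    method: str        # "ai" or "conventional"
    implemented: bool  # was the task carried out during the sprint?


def tasks_per_story(records: list[TaskRecord], method: str) -> float:
    """Average number of generated tasks per (team, user story) pair."""
    rows = [r for r in records if r.method == method]
    stories = {(r.team, r.story) for r in rows}
    return len(rows) / len(stories)


def mean_team_implementation_rate(records: list[TaskRecord], method: str) -> float:
    """Per-team share of generated tasks that were implemented, averaged over teams."""
    counts = defaultdict(lambda: [0, 0])  # team -> [implemented, generated]
    for r in records:
        if r.method == method:
            counts[r.team][0] += int(r.implemented)
            counts[r.team][1] += 1
    rates = [done / total for done, total in counts.values()]
    return sum(rates) / len(rates)


# Toy log, invented for illustration only.
records = [
    TaskRecord("T1", "US1", "ai", True),
    TaskRecord("T1", "US1", "ai", False),
    TaskRecord("T1", "US2", "ai", True),
    TaskRecord("T2", "US1", "ai", True),
    TaskRecord("T2", "US1", "ai", True),
    TaskRecord("T3", "US1", "conventional", True),
]

print(f"AI tasks per story: {tasks_per_story(records, 'ai'):.2f}")
print(f"AI implementation rate: {mean_team_implementation_rate(records, 'ai'):.0%}")
```

On this toy log the team-averaged rate is about 83% (2/3 and 2/2), while the pooled rate is 80% (4 of 5 tasks). This is why the definition behind a reported average matters.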
Implications for AI Economics
- Complementarity, not substitution (for now): AI increases task granularity and surfaces neglected activities, implying it complements developers by augmenting planning comprehensiveness. However, imperfect relevance means human oversight remains essential, so AI shifts (rather than eliminates) labor toward verification, curation, and higher-level coordination.
- Productivity and taskization effects: More granular task lists could (a) reduce missed work and rework (raising effective throughput/quality) and (b) increase apparent administrative overhead (more issues to track), potentially changing how labor is allocated across roles (more time spent triaging/closing micro-tasks). Net productivity gains depend on the balance between avoided rework and added task-management friction.
- Skill-biased demand: Adoption favors workers skilled at prompt engineering, AI oversight, prioritization, and integrating AI outputs—skills that may command a wage premium. Routine estimation or low-level decomposition tasks may decline in value relative to supervisory and integrative skills.
- Platform- and tool-specific lock-in and market power: Integration of AI assistants into development platforms (e.g., GitLab Duo) can strengthen platform lock-in. Firms may face switching costs as AI workflows and artifacts become embedded in process tooling; market power implications deserve attention when platforms bundle increasingly capable AI assistants.
- Measurement and valuation challenges: Firms should not equate more tasks with greater work value. Economic assessments need to measure time-to-delivery, defect rates, rework, and managerial overhead to establish whether AI-assisted splitting delivers cost savings or just more tracked tasks.
- Transition dynamics and policy considerations: Short-term labor impacts are likely modest because human oversight is required. Over time, as LLMs improve and can be fine-tuned for domain/context, some lower-skill planning tasks could be automated—affecting entry-level roles and internship training. Policies and firm strategies should emphasize reskilling toward oversight, tooling integration, and quality assurance.
- Research priorities for applied AI economics:
  - Field experiments in industry settings measuring time saved, defect reduction, and rework avoidance.
  - Cost–benefit analyses accounting for increased task counts and curation costs.
  - Studies on wage effects for roles that supervise or integrate AI outputs.
  - Investigation of platform competition and lock-in as AI assistants proliferate.
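A back-of-the-envelope cost model makes the productivity trade-off above concrete. This is a hypothetical sketch, not a model from the paper. It nets rework avoided against two costs: the time spent reviewing AI suggestions and the overhead of tracking issues beyond the manual baseline. The two task-rate inputs echo the study's averages. Every cost parameter (`review_hours_per_suggestion`, `tracking_hours_per_task`, `rework_hours_avoided`, `kept_share`) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningCosts:
    # Every field is a hypothetical input that would need to be measured in practice.
    stories: int
    ai_tasks_per_story: float
    manual_tasks_per_story: float
    review_hours_per_suggestion: float  # human time to accept or reject one AI task
    tracking_hours_per_task: float      # overhead of creating, updating, closing an issue
    rework_hours_avoided: float         # rework prevented by tasks humans would have missed


def net_hours_saved(c: PlanningCosts, kept_share: float) -> float:
    """Rework avoided minus curation cost and extra tracking overhead.

    kept_share is the fraction of AI suggestions that survive review and are
    tracked as issues; only tasks beyond the manual baseline count as extra
    tracking overhead.
    """
    suggestions = c.stories * c.ai_tasks_per_story
    review_cost = suggestions * c.review_hours_per_suggestion
    extra_tasks = max(0.0, suggestions * kept_share - c.stories * c.manual_tasks_per_story)
    tracking_cost = extra_tasks * c.tracking_hours_per_task
    return c.rework_hours_avoided - review_cost - tracking_cost


costs = PlanningCosts(
    stories=8,
    ai_tasks_per_story=5.4,      # echoes the study's AI average
    manual_tasks_per_story=3.2,  # echoes the study's control average
    review_hours_per_suggestion=0.1,
    tracking_hours_per_task=0.05,
    rework_hours_avoided=6.0,
)
print(f"net hours saved per sprint: {net_hours_saved(costs, kept_share=0.6):.2f}")
```

With these invented inputs, reviewing 43.2 suggestions costs about 4.3 hours and absorbs most of the 6 hours of avoided rework. Whether the net result is positive therefore depends on quantities that only the field studies listed above could measure.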
Summary takeaway: Generative AI is an effective augmenting technology in agile task decomposition—improving coverage and granularity but introducing noise—so its economic impact will primarily be through complementarity (reshaping tasks and skills), changes in productivity conditional on oversight costs, and platform-dependent adoption dynamics.
Assessment
Claims (8)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Breaking down user stories into actionable tasks is a critical yet time-consuming process in agile software development. (Task Completion Time) | negative | high | time required to split user stories (descriptive claim about time consumption) | 0.3 |
| We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo. (Other) | null_result | high | method comparison (experimental design) | 1.0 |
| Current AI tools are not yet mature enough to replace developers. (Job Displacement) | negative | high | suitability of AI to replace developers | 0.6 |
| AI-assisted approaches can generate more granular task lists than traditional methods. (Output Quality) | positive | high | task list granularity | 0.6 |
| AI-assisted approaches can help ensure no important tasks are overlooked during task-splitting. (Error Rate) | positive | high | task omission rate / completeness of task lists | 0.6 |
| Participants favored a hybrid approach, combining AI tools with conventional methods to maintain high accuracy in planning. (Output Quality) | positive | high | participant preference for planning approach / planning accuracy | 0.6 |
| AI tools can serve as valuable aids in task splitting, provided there is human oversight to filter out irrelevant tasks. (Developer Productivity) | positive | high | effectiveness of AI-assisted task-splitting under human oversight | 0.6 |
| Integrating Generative AI into agile development processes has potential benefits and limitations for planning efficiency. (Organizational Efficiency) | mixed | high | planning efficiency (benefits and limitations) | 0.6 |