An LLM-driven graph workflow cut API implementation time at Volvo from roughly five hours to under seven minutes per endpoint while achieving a 93.7% F1 score; the production deployment saved an estimated 979 engineering hours, and all participants reported full satisfaction with communication efficiency.
Multidisciplinary Software Development (MSD) requires domain experts and developers to collaborate across incompatible formalisms and separate artifact sets. Today, even with AI coding assistants like GitHub Copilot, this process remains inefficient; individual coding tasks are semi-automated, but the workflow connecting domain knowledge to implementation is not. Developers and experts still lack a shared view, resulting in repeated coordination, clarification rounds, and error-prone handoffs. We address this gap through a graph-based workflow optimization approach that progressively replaces manual coordination with LLM-powered services, enabling incremental adoption without disrupting established practices. We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains. The automated workflow achieves 93.7% F1 score while reducing per-API development time from approximately 5 hours to under 7 minutes, saving an estimated 979 engineering hours. In production, the system received high satisfaction from both domain experts and developers, with all participants reporting full satisfaction with communication efficiency.
Summary
Main Finding
Automating the document-to-code translation in multidisciplinary software development (MSD) with an LLM-powered, graph-based workflow reduces coordination overhead and dramatically increases productivity. In a production automotive case (Volvo’s spapi), a three-stage automated pipeline produced 192 real endpoints with 93.7% F1, cut per-API development time from ~5 hours to under 7 minutes (~43× speedup), and saved an estimated 979 engineering hours while maintaining high stakeholder satisfaction.
Key Points
- Problem framed: MSD requires translating heterogeneous, domain-specific artifacts (OpenAPI specs, CAN signal defs, mapping docs) into code. Manual handoffs create fragmentation, ambiguity-driven churn, manual-translation overload, and coordination bottlenecks that scale poorly.
- Solution: Represent the MSD workflow as a directed graph G = (V, R) of artifact nodes and dependency edges. Apply iterative graph transformations that replace manual translation nodes with LLM-powered automated services while keeping domain experts in the loop.
- Implementation: Three coordinated LLM services (signal read/write synthesis, signal-property alignment, and property→endpoint assembly) plus automated validation tests synthesized from artifacts. A node-level function f_LLM maps (input materials, instructions) → code artifact; a validation step Φ_test ensures artifacts pass their tests before substitution.
- Incremental adoption: Automate individual nodes and validate with experts rather than attempting end-to-end replacement; remove redundant coordination edges when automated services can directly invoke one another.
- Evaluation (production deployment on spapi):
  - Scope: 192 endpoints, 420 properties, 776 CAN signals across six functional domains.
  - Quality: 93.7% F1 compared with semi-automated baseline (developers assisted by GitHub Copilot).
  - Time: per-API time reduced from ≈5 hours to <7 minutes; estimated 979 engineering hours saved.
  - User satisfaction: domain experts 4.80/5, developers 4.67/5; all participants reported full satisfaction with communication efficiency.
- Failure modes & mitigations: LLM outputs validated via synthesized test cases; experts remain in loop to review and refine; incremental rollout controls disruption.
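The node-substitution idea in the points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `WorkflowGraph`, `stub_llm`, and `passes_read_test` are hypothetical names, the "LLM" is a deterministic stand-in rather than a model call, and the validator only plays the role that Φ_test plays in the summary (a node is swapped for a service only when the generated artifact passes its test).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# Hypothetical type for f_LLM: (input materials, instructions) -> code artifact.
Service = Callable[[Dict[str, str], str], str]
# Hypothetical type for the validation step: True when the artifact passes its tests.
Validator = Callable[[str], bool]


@dataclass
class WorkflowGraph:
    """Directed graph G = (V, R): artifact nodes and dependency edges."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    automated: Dict[str, Service] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        self.nodes.update((src, dst))
        self.edges.add((src, dst))

    def substitute(self, node: str, service: Service, inputs: Dict[str, str],
                   instructions: str, validate: Validator) -> bool:
        """Replace a manual node with a service only if its output validates."""
        if node not in self.nodes:
            raise KeyError(f"unknown node: {node}")
        artifact = service(inputs, instructions)
        if not validate(artifact):
            return False  # node stays manual; an expert reviews and refines
        self.automated[node] = service
        return True


def stub_llm(inputs: Dict[str, str], instructions: str) -> str:
    # Deterministic stand-in for an LLM call (assumption for this sketch).
    signal = inputs["signal"]
    return f"def read_{signal}(bus):\n    return bus['{signal}']\n"


def passes_read_test(artifact: str) -> bool:
    # Toy synthesized test: the generated reader must return the bus value.
    namespace: dict = {}
    try:
        exec(artifact, namespace)
        return namespace["read_speed"]({"speed": 42}) == 42
    except Exception:
        return False


graph = WorkflowGraph()
graph.add_edge("can_signals", "signal_io")
graph.add_edge("signal_io", "properties")
accepted = graph.substitute("signal_io", stub_llm, {"signal": "speed"},
                            "generate a signal reader", passes_read_test)
print(accepted, sorted(graph.automated))
```

Gating the swap on the validator is what allows one node to be automated at a time: a service that fails its test leaves the graph unchanged, so the manual step remains in place until an expert has refined the inputs or instructions.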
Data & Methods
- Case-study setting: spapi, an in-vehicle REST API server at Volvo Group used by mobile and backend services.
- Data: production artifacts and metadata for 192 APIs, 420 properties, and 776 CAN signals from six vehicle domains.
- Modeling: Formal workflow graph (documents as nodes, dependency relations as edges). Node-to-service substitution is formalized as d_i ⇒ s_i, using f_LLM for generation and Φ_test for validation.
- Implementation details: Three automated services composed into a production pipeline (depicted as three servers in the paper); fuzzy matching and embedding-based retrieval used to align signal descriptions with API properties; automated test synthesis for validation.
- Baselines & metrics:
  - Baseline: semi-automated developer implementations using GitHub Copilot.
  - Metrics: F1 score measuring correctness of generated endpoints; measured developer time per API; aggregate engineering hours saved; stakeholder survey scores.
- Iterative process: Node-level transformations followed by graph-level restructuring, repeated until workflow converged to G* with redundant edges removed.
- Deployment: System used in production; recorded quantitative metrics and qualitative feedback from participating roles.
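To make the alignment and scoring steps above more concrete, here is a small Python sketch. It covers only the fuzzy-matching half (using the standard library's `difflib`); the embedding-based retrieval mentioned in the summary is omitted. The signal names, descriptions, property names, and the 0.6 threshold are all invented for illustration, and the F1 helper scores property-to-signal pairs, which may differ from how the paper scores generated endpoints.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Sequence, Set, Tuple


def normalize(name: str) -> str:
    """Lowercase and replace separators so naming styles compare more fairly."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align(properties: Sequence[str], signals: Dict[str, str],
          threshold: float = 0.6) -> Dict[str, Optional[str]]:
    """Map each property to its best-matching signal, or None below threshold."""
    mapping: Dict[str, Optional[str]] = {}
    for prop in properties:
        best_name, best_score = None, 0.0
        for signal_name, description in signals.items():
            score = similarity(prop, description)
            if score > best_score:
                best_name, best_score = signal_name, score
        mapping[prop] = best_name if best_score >= threshold else None
    return mapping


def f1_score(predicted: Set[Tuple[str, str]],
             reference: Set[Tuple[str, str]]) -> float:
    """F1 = harmonic mean of precision and recall over matched pairs."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical CAN signals (name -> description) and API property names.
signals = {"VehSpd": "vehicle speed", "EngOilTemp": "engine oil temperature"}
properties = ["vehicleSpeed", "engine_oil_temperature", "cabinHumidity"]

mapping = align(properties, signals)
predicted = {(p, s) for p, s in mapping.items() if s is not None}
reference = {("vehicleSpeed", "VehSpd"),
             ("engine_oil_temperature", "EngOilTemp"),
             ("cabinHumidity", "CabHum")}
print(mapping)
print(round(f1_score(predicted, reference), 3))
```

Returning `None` for low-similarity properties, rather than forcing a best guess, keeps uncertain matches visible so that a domain expert can resolve them, which fits the expert-in-the-loop design described above.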
Implications for AI Economics
- Productivity and cost structure
  - Large productivity multiplier (≈43× per-API speedup) implies a large reduction in marginal labor cost for routine translation tasks in MSD. Estimated 979 hours saved is direct cost-avoidance and redeployable labor.
  - Enables faster feature rollouts and lower time-to-market for API-driven capabilities; can shift firm investment from repetitive implementation labor toward higher-value activities (design, testing, validation, product strategy).
- Task reallocation and labor composition
  - LLM automation substitutes for repetitive coordination/transcription tasks but augments the role of domain experts and validation engineers. Demand shifts toward roles that design/oversee automated services, synthesize validation artifacts, and handle exception cases.
  - Workforce effects are likely partial substitution: developers and coordination staff can be redeployed to higher-skill tasks (system architecture, verification), while a smaller set of staff maintains and improves the automation pipeline.
- Transaction costs and information frictions
  - The graph-based approach reduces coordination transaction costs by programmatically encoding information flows. This reduces costly clarification rounds and knowledge bottlenecks, effectively lowering the internal costs of complex product development.
  - By embedding domain knowledge into validated artifacts and services, the firm reduces reliance on tacit person-to-person coordination, improving resilience to staff turnover.
- Quality, safety, and liability considerations
  - High F1 and expert-in-loop validation mitigate risks, but LLM-driven automation in safety-critical domains (automotive, aerospace, medical) creates regulatory and liability considerations. Firms must invest in rigorous validation, monitoring, and explainability to meet compliance and liability standards.
- Organizational design and capital investment
  - Firms incur initial fixed costs to build and validate the automated workflow (engineering to build services, test scaffolding, and governance). The large per-unit savings make such investments attractive for large-scale MSD portfolios.
  - The approach encourages modularization and standardization of artifacts (better interfaces, more machine-readable specs), which can generate compounding efficiency gains and platform effects across products.
- Market dynamics and competition
  - Early adopters can reduce development lead times and lower costs for complex, multidisciplinary software, creating competitive advantage. Over time, standardized automated pipelines could become a source of scale economies, raising entry barriers for firms lacking the data, artifacts, or expertise to train/validate domain-specific automation.
- Externalities and distributional effects
  - Positive externalities: faster innovation, reduced development waste, improved documentation and traceability.
  - Potential negative effects: concentration of specialized tooling expertise, and transitional displacement of coordination/implementation roles if not managed through retraining and role evolution.
- Generalizability and limits
  - The method generalizes to other MSD settings where artifacts are structured and validation tests can be synthesized (e.g., aerospace, energy systems, medical devices, industrial control). It is less applicable where artifacts are highly informal or where LLM hallucination risk cannot be mitigated by testable specifications.
  - Economic value scales with portfolio size and artifact regularity; small projects may not justify fixed costs.
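The speedup figure and the fixed-cost argument above can be checked with simple arithmetic. The sketch below uses the reported per-API times (about 5 hours, under 7 minutes) as inputs; the 400-hour fixed cost is a purely hypothetical number chosen to illustrate a break-even calculation, and the reported 979-hour saving is taken from the paper rather than re-derived here.

```python
import math


def speedup(baseline_minutes: float, automated_minutes: float) -> float:
    """Ratio of manual to automated time per API."""
    return baseline_minutes / automated_minutes


def break_even_apis(fixed_cost_hours: float, baseline_hours: float,
                    automated_hours: float) -> int:
    """Smallest number of APIs at which cumulative savings cover the fixed cost."""
    saving_per_api = baseline_hours - automated_hours
    if saving_per_api <= 0:
        raise ValueError("automation must be faster than the baseline")
    return math.ceil(fixed_cost_hours / saving_per_api)


# Reported figures: ~5 hours (300 minutes) manual, under 7 minutes automated.
# Because 7 minutes is an upper bound, the result is a lower bound on speedup.
print(round(speedup(300, 7), 1))

# Hypothetical fixed cost of 400 engineering hours to build and validate the pipeline.
print(break_even_apis(fixed_cost_hours=400, baseline_hours=5, automated_hours=7 / 60))
```

Under these assumed numbers the pipeline would pay for itself well within a 192-endpoint portfolio, whereas a project with only a few dozen endpoints would not recover the same fixed cost, which is the portfolio-size point made above.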
Summary takeaway: LLM-powered, graph-aware automation can sharply reduce coordination frictions and marginal implementation costs in multidisciplinary software development, producing substantial productivity gains and changing firm labor composition and organizational processes. Realizing these gains requires investment in validation, governance, and incremental integration to manage risk in safety- and regulation-sensitive domains.
Assessment
Claims (9)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Multidisciplinary Software Development (MSD) requires domain experts and developers to collaborate across incompatible formalisms and separate artifact sets. | Organizational Efficiency | negative | high | collaboration/workflow efficiency between domain experts and developers | 0.08 |
| Even with AI coding assistants like GitHub Copilot, individual coding tasks are semi-automated, but the workflow connecting domain knowledge to implementation is not. | Organizational Efficiency | negative | high | degree of automation of coding tasks vs. end-to-end workflow automation | 0.08 |
| Developers and experts still lack a shared view, resulting in repeated coordination, clarification rounds, and error-prone handoffs. | Team Performance | negative | high | frequency of coordination rounds / error-prone handoffs | 0.24 |
| We address this gap through a graph-based workflow optimization approach that progressively replaces manual coordination with LLM-powered services, enabling incremental adoption without disrupting established practices. | Adoption Rate | positive | high | ability to reduce manual coordination and enable incremental adoption | 0.48 |
| We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains. | Other | null_result | high | evaluation dataset scale and scope (endpoints, properties, CAN signals, domains) | n=192; 192 endpoints, 420 properties, and 776 CAN signals across six functional domains; 0.8 |
| The automated workflow achieves 93.7% F1 score. | Output Quality | positive | high | F1 score (accuracy/quality of automated workflow outputs) | n=192; 93.7% F1 score; 0.48 |
| The automated workflow reduces per-API development time from approximately 5 hours to under 7 minutes. | Task Completion Time | positive | high | per-API development time | n=192; per-API development time from approximately 5 hours to under 7 minutes; 0.48 |
| The automated workflow saved an estimated 979 engineering hours. | Organizational Efficiency | positive | high | total engineering hours saved | n=192; saving an estimated 979 engineering hours; 0.48 |
| In production, the system received high satisfaction from both domain experts and developers, with all participants reporting full satisfaction with communication efficiency. | Worker Satisfaction | positive | high | participant-reported satisfaction with communication efficiency | all participants reporting full satisfaction with communication efficiency; 0.24 |