
An agentic multi-agent system speeds product experimentation roughly tenfold in simulations, reducing time-to-validated-learning while keeping statistical rigor and decision traceability. By instrumenting a SaaS codebase and logging feature-level behaviour, the study turns agentic AI into a disciplined experimentation infrastructure rather than a generic assistant.

Multi Agent Systems In The Lean Startup Cycle: Operationalising Dynamic Capabilities

Elias Jelinek, Hannes Rothe · June 14, 2026 · Journal of the Association for Information Systems

OpenAlex · Descriptive · Medium evidence · Relevance 7/10 · Full text usable (extracted)

Structured author observations

Linked only from stored provider relations; the raw author line above is never matched by name.

OpenAlex

Latest observation: July 23, 2026

Jelinek, Elias provider ID
Rothe, Hannes provider ID

A designed multi-agent system that operationalizes the Build–Measure–Learn cycle cuts time-to-validated-learning by roughly an order of magnitude in controlled simulations while preserving statistical rigor, traceability, and nuanced iterate/persevere decisions.

Citation observations

Cumulative provider counts captured on specific dates; providers are never combined.

0 cumulative citations

OpenAlex · Observed July 22, 2026


Generative, agentic AI promises to accelerate venture learning, yet we lack concrete designs for embedding them into entrepreneurial experimentation. This design science study proposes a multi-agent artefact that operationalises the Build–Measure–Learn (B-M-L) cycle as a closed-loop control system. Drawing on the Dynamic Capabilities View, we derive fifteen meta-requirements and thirty-three design principles (consolidated into seven goal-directed groups) for sensing, seizing, reconfiguring, orchestration, and governance. We instantiate them in a Node.js package instrumenting a production-grade SaaS codebase. Controlled simulations compare agentic and manual B-M-L cycles on feature ideas. The Multi Agent System reduces time-to-validated-learning by roughly an order of magnitude while preserving statistical rigour, traceability, and nuanced Persevere/Iterate decisions. Logs render capabilities observable at the feature level, turning “agentic AI” into a disciplined experimentation infrastructure rather than a generic assistant. We discuss implications for IS design and future field evaluations.

Summary

Main Finding

A purpose-built multi-agent system (MAS) that operationalises the Lean Startup Build–Measure–Learn (B–M–L) loop as a closed‑loop control system materially accelerates entrepreneurial learning. In controlled simulations on a production-grade SaaS codebase, the MAS reduced time‑to‑validated‑learning (TTVL) by roughly an order of magnitude while retaining statistical rigour, traceability, and nuanced Persevere/Iterate decisions. The artefact turns “agentic AI” into a disciplined experimentation infrastructure (with observable, auditable capabilities at the feature level) rather than a generic assistant.
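The closed-loop framing can be made concrete with a small sketch. It is written in TypeScript because the artefact ships as a Node.js package, but it is not the authors' code. The `ExperimentResult` shape, the `decide` and `runLoop` functions, and the use of a pooled two-sided two-proportion z-test at α = 0.05 as the Persevere/Iterate rule are all illustrative assumptions, because the summary does not say which test the Learn agent applies.

```typescript
// Illustrative sketch only: types and function names are hypothetical.
interface ExperimentResult {
  controlUsers: number;
  controlConversions: number;
  variantUsers: number;
  variantConversions: number;
}

type Decision = "persevere" | "iterate" | "inconclusive";

// Standard normal CDF using the Abramowitz–Stegun 7.1.26 approximation of erf.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Learn step (assumed rule): pooled two-proportion z-test on conversion rates.
// Significant lift -> persevere; significant drop -> iterate; otherwise keep measuring.
function decide(r: ExperimentResult, alpha = 0.05): { decision: Decision; pValue: number } {
  const p1 = r.controlConversions / r.controlUsers;
  const p2 = r.variantConversions / r.variantUsers;
  const pooled = (r.controlConversions + r.variantConversions) / (r.controlUsers + r.variantUsers);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / r.controlUsers + 1 / r.variantUsers));
  if (se === 0) return { decision: "inconclusive", pValue: 1 };
  const z = (p2 - p1) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  if (pValue >= alpha) return { decision: "inconclusive", pValue };
  return { decision: p2 > p1 ? "persevere" : "iterate", pValue };
}

// Closed loop: each Measure round is fed back into the Learn step until a
// decisive outcome is reached or the round budget is exhausted.
function runLoop(
  measure: (round: number) => ExperimentResult,
  maxRounds: number
): { decision: Decision; rounds: number } {
  for (let round = 1; round <= maxRounds; round++) {
    const { decision } = decide(measure(round));
    if (decision !== "inconclusive") return { decision, rounds: round };
  }
  return { decision: "inconclusive", rounds: maxRounds };
}
```

Re-testing after every round, as `runLoop` does, inflates the false-positive rate (the "peeking" problem). A rule meant to preserve statistical rigour would need a fixed horizon or a sequential correction.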

Key Points

Research question: How can multi-agent systems operationalise the Lean Startup cycle to accelerate entrepreneurial learning under Knightian uncertainty?
Theoretical framing: combines the Dynamic Capabilities View (sense, seize, reconfigure) with the Lean Startup B–M–L routine; treats the B–M–L cycle as a routinised dynamic capability that can be algorithmically instantiated.
Design outputs:
- 15 meta‑requirements (design obligations required to avoid collapse of validated learning).
- 33 concrete design principles, consolidated into seven goal‑directed groups covering sensing, seizing, reconfiguring, orchestration, governance and cross‑cutting needs (e.g., traceability, human oversight).
- A four‑agent MAS implementation: Build, Measure, Learn, and an Orchestrator layer that enforces guardrails (feature flags, experiment contracts, rollback rules, ethical checks, mandatory human approvals for high‑risk actions).
Implementation details:
- Node.js package / CLI instantiated on a production‑grade SaaS codebase.
- Agents operate over a human‑readable markdown “database” capturing ideas, features, experiments, telemetry, and learning reports to ensure traceability.
- Agents run on an LLM-based agent framework (Claude Code in the paper) and share a common data substrate and orchestration layer.
Evaluation:
- Design Science Research process: literature review → derive MRs → translate to design principles → build MAS → evaluate.
- Literature search (2015–2025): 312 initial hits (Scopus, Web of Science, AIS eLibrary) → 58 papers retained for full-text coding.
- Simulation-based evaluation comparing agentic MAS vs a scripted manual baseline across three B–M–L cycles on realistic feature ideas.
- Measured outcomes: TTVL, experiment throughput, rollback events, traceability, and completeness of learning documentation.
Key empirical results:
- Approximately an order‑of‑magnitude reduction in TTVL in simulations.
- Preservation of statistical rigour and documented, auditable decisions.
- Accumulation of institutional memory outside individual founders, enabling “learning to learn.”
Governance/design safeguards: mandatory human decision gates, feature flags and rollback policies, explicit experiment contracts, bias‑awareness checks.
Limitations noted by authors: evaluation limited to controlled simulations and a single codebase; dependence on current LLM tooling; field validation remains future work.
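The Orchestrator's gatekeeping can be pictured with a hypothetical TypeScript sketch of an experiment contract, a launch gate that refuses high-risk experiments without a named human approver, and a rollback rule that switches a feature flag off when a guardrail metric degrades beyond a contracted tolerance. The field names, the three-level risk scale, and the relative-drop rule are assumptions; the summary names these primitives but not their schema.

```typescript
// Hypothetical contract schema: field names and risk scale are illustrative.
type RiskLevel = "low" | "medium" | "high";

interface ExperimentContract {
  id: string;
  hypothesis: string;
  featureFlag: string;
  riskLevel: RiskLevel;
  guardrailBaseline: number; // guardrail metric value before launch (e.g. retention rate)
  maxDegradation: number;    // tolerated relative drop, e.g. 0.05 = 5 %
}

type GateResult = { allowed: true } | { allowed: false; reason: string };

// Launch gate: no launch without a stated hypothesis, and high-risk
// experiments additionally require a named human approver.
function canLaunch(c: ExperimentContract, approvedBy?: string): GateResult {
  if (c.hypothesis.trim() === "") {
    return { allowed: false, reason: "contract states no hypothesis" };
  }
  if (c.riskLevel === "high" && !approvedBy) {
    return { allowed: false, reason: "high-risk experiment requires human approval" };
  }
  return { allowed: true };
}

// Rollback rule: true if the guardrail metric has fallen further below its
// baseline than the contract tolerates.
function shouldRollback(c: ExperimentContract, observed: number): boolean {
  if (c.guardrailBaseline <= 0) {
    throw new Error("a relative drop needs a positive baseline");
  }
  return (c.guardrailBaseline - observed) / c.guardrailBaseline > c.maxDegradation;
}

// Orchestrator step: switch the flag off when the rollback rule fires.
function enforce(flags: Record<string, boolean>, c: ExperimentContract, observed: number): boolean {
  const rollback = shouldRollback(c, observed);
  if (rollback) flags[c.featureFlag] = false;
  return rollback;
}
```

Keeping the gate and the rollback rule as pure functions over the contract makes each decision reproducible from the logged contract and observation, which is the kind of traceability the paper emphasises.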

Data & Methods

Literature review:
- Databases: Scopus, Web of Science, AIS eLibrary.
- Search constrained to works mentioning dynamic capabilities, agentic AI, and Lean Startup concepts.
- Screening reduced the 312 initial results to 58 papers, which were coded for the presence and implementation of sensing, seizing, and reconfiguring, as well as for autonomy and governance.
- Outcome: 15 meta‑requirements and 33 design principles.
Artefact development:
- Methodology: Design Science Research (following Peffers et al. and Hevner et al.).
- Implemented MAS as a Node.js CLI package with four agents (Build, Measure, Learn, Orchestrator) and markdown-based datastore.
- Agents use an LLM-based engine (Claude Code in the paper) and operate under orchestration/gatekeeping.
Evaluation:
- Ex‑ante technical mapping: verified implemented functions against meta‑requirements and design principles.
- Simulation study:
  - Environment: a market/behaviour simulator controlling user responses; deployed against a realistic SaaS product codebase.
  - Regimes compared: scripted manual workflow (developer + analyst) vs MAS orchestration.
  - Each regime executed three B–M–L cycles.
  - Recorded metrics: loop latency (TTVL), rollback events, traceability/audit logs, completeness of learning documents.
- Results: the MAS reached validated learning faster while maintaining documentation and governance artefacts; the reported TTVL improvement is roughly an order of magnitude (~10x).
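The summary does not give the paper's exact TTVL definition, so the following is a minimal sketch under stated assumptions: TTVL is the elapsed time between a hypothesis being logged and a Persevere/Iterate decision being recorded, and two regimes are compared by the ratio of their median TTVL over the cycles run. The `CycleLog` shape and the helper names are hypothetical.

```typescript
// Hypothetical per-cycle log entry; timestamps are ISO-8601 strings.
interface CycleLog {
  cycle: number;
  hypothesisLoggedAt: string;
  decisionRecordedAt?: string; // undefined if no validated decision was reached
}

const HOUR_MS = 60 * 60 * 1000;

// TTVL in hours for one cycle, or undefined if the cycle never reached a decision.
function ttvlHours(log: CycleLog): number | undefined {
  if (!log.decisionRecordedAt) return undefined;
  const start = Date.parse(log.hypothesisLoggedAt);
  const end = Date.parse(log.decisionRecordedAt);
  if (isNaN(start) || isNaN(end) || end < start) {
    throw new Error(`invalid timestamps in cycle ${log.cycle}`);
  }
  return (end - start) / HOUR_MS;
}

function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of empty list");
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Speed-up factor: median manual TTVL divided by median agentic TTVL.
// Undecided cycles are excluded, which can flatter the regime that leaves more cycles undecided.
function ttvlSpeedup(manual: CycleLog[], agentic: CycleLog[]): number {
  const decided = (logs: CycleLog[]) =>
    logs.map(ttvlHours).filter((h): h is number => h !== undefined);
  return median(decided(manual)) / median(decided(agentic));
}
```

With made-up manual cycles of 40, 48, and 56 hours and agentic cycles of 4, 5, and 6 hours, `ttvlSpeedup` returns 9.6. These numbers only exercise the code; they are not the study's measurements.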

Implications for AI Economics

Lower cost and faster speed of experimentation
- Temporal compression via MAS reduces the time and therefore cost required to validate feature hypotheses. This lowers the marginal cost of search and discovery for digital ventures, changing the economics of early-stage experimentation.
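The "time and therefore cost" step can be written out under simplifying assumptions that are ours, not the paper's: let T be the manual TTVL, k the speed-up factor, w a cost rate per hour of cycle time (assumed equal across regimes), and c_m, c_a per-cycle costs that do not scale with elapsed time in the manual and agentic regimes (for instance tooling or model-inference charges).

```latex
\begin{aligned}
C_{\text{manual}} &= w\,T + c_m, \qquad C_{\text{agentic}} = w\,\frac{T}{k} + c_a,\\
\frac{C_{\text{manual}}}{C_{\text{agentic}}} &= \frac{w\,T + c_m}{w\,T/k + c_a}
  \approx k \quad \text{only if } c_m \ll w\,T \text{ and } c_a \ll w\,T/k.
\end{aligned}
```

For example, with k = 10, negligible c_m, and c_a equal to wT/10, the cost ratio is wT / (2wT/10) = 5, so a tenfold speed-up yields only a fivefold cost reduction.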
Reduced barriers to entry and altered firm formation dynamics
- By enabling high-quality experimentation with fewer human resources (supporting “one‑person” or small teams), agentic MAS could lower the fixed and variable costs of starting digital ventures. This could raise startup formation rates and speed iteration toward product–market fit.
Redistribution of comparative advantage
- Firms that embed algorithmic dynamic capabilities (MAS + instrumentation + institutional memory) can learn and adapt faster, potentially concentrating advantages among ventures that invest in these infrastructures. This may increase returns to scale and intensify winner‑take‑most dynamics in some digital markets.
Changed investment and valuation signals
- Faster validated learning and richer traceability create new measurable signals (e.g., TTVL, experiment throughput, audit logs) that investors can use to assess capability and de‑risk ventures. Conventional metrics and due diligence may need adaptation.
Labour and task reconfiguration
- Cognitive and coordination tasks in early product experimentation may shift from human labour to agentic systems, altering demand for certain developer and analyst roles while increasing demand for roles in governance, orchestration, and human oversight.
New forms of intangible capital
- The MAS artefact embeds institutional memory and routinised dynamic capabilities in codified artefacts (logs, experiment contracts, templates). These are firm‑specific intangible assets that raise switching costs and may be partially transferable, for example if productised as platform services.
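Much of this codified capital lives in the markdown datastore the paper describes. A minimal sketch of rendering one learning record as human-readable markdown and reading its decision back out; the `LearningRecord` fields and the heading layout are assumptions, since the paper's file format is not reproduced in this summary.

```typescript
// Hypothetical record shape; the paper's markdown schema is not reproduced here.
interface LearningRecord {
  experimentId: string;
  hypothesis: string;
  metric: string;
  decision: "persevere" | "iterate";
  rationale: string;
  approvedBy?: string;
  evidence: string[]; // IDs or links of telemetry snapshots
}

// Render a record as markdown so founders and agents read the same file.
function toMarkdown(r: LearningRecord): string {
  const lines: string[] = [
    `# Learning report: ${r.experimentId}`,
    "",
    `- **Hypothesis:** ${r.hypothesis}`,
    `- **Primary metric:** ${r.metric}`,
    `- **Decision:** ${r.decision}`,
    `- **Approved by:** ${r.approvedBy ?? "pending human review"}`,
    "",
    "## Rationale",
    "",
    r.rationale,
    "",
    "## Evidence",
    "",
  ];
  for (const e of r.evidence) lines.push(`- ${e}`);
  return lines.join("\n") + "\n";
}

// Read the decision back out, showing the file stays machine-readable.
function parseDecision(markdown: string): "persevere" | "iterate" | undefined {
  const m = /^- \*\*Decision:\*\* (persevere|iterate)$/m.exec(markdown);
  return m ? (m[1] as "persevere" | "iterate") : undefined;
}
```

Plain markdown keeps the record diffable under version control and readable without tooling, which is one plausible reason for choosing it over a database.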
Risk, externalities and policy implications
- Faster cycles increase market churn and may amplify systemic risks (e.g., rapid rollouts with subtle harms). The paper’s governance primitives (feature flags, rollback, human gates, audit trails) point to the kinds of regulatory and compliance features regulators and policy makers should require for agentic experimentation systems.
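Feature flags and rollback are the most mechanical of these primitives. A hypothetical sketch of a percentage rollout with a kill switch: each user is hashed into one of 100 buckets with 32-bit FNV-1a (chosen here only because it is short and dependency-free, not because the paper uses it), so assignment is stable across requests, and rollback is a single state change that returns everyone to the control experience. None of the names below come from the paper.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Fine for bucketing, not for security.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    // Multiply by the FNV prime 16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 1, mod 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h;
}

// Hypothetical flag state.
interface FlagState {
  name: string;
  rolloutPercent: number; // 0 to 100: share of users who see the variant
  killed: boolean;        // kill switch: true sends everyone back to control
}

// Deterministic assignment: the same user always lands in the same bucket for a flag.
function isEnabled(flag: FlagState, userId: string): boolean {
  if (flag.killed) return false;
  return fnv1a(`${flag.name}:${userId}`) % 100 < flag.rolloutPercent;
}

// Rollback as a state transition: a copy of the flag with the kill switch set.
function kill(flag: FlagState): FlagState {
  return { ...flag, killed: true };
}
```

Salting the hash with the flag name keeps bucket assignments independent across flags, so the same users are not always the first exposed to every experiment.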
Measurement & research agenda
- The paper introduces TTVL and documented experiment traceability as candidate economic performance metrics for AI‑enabled ventures. Empirical work should estimate aggregate effects (market formation, concentration, productivity) and distributional impacts (who benefits, who loses).
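"Documented experiment traceability" needs an operational definition before it can serve as a metric. One hypothetical version: the share of decisions whose chain back to an experiment and a hypothesis is complete and backed by at least one telemetry reference. The `Trace` shape and the all-or-nothing scoring rule are assumptions, not the paper's metric.

```typescript
// Hypothetical link record for one decision.
interface Trace {
  decisionId: string;
  experimentId?: string;
  hypothesisId?: string;
  telemetryRefs: string[];
}

// A decision counts as traceable only if every link in the chain is present.
function isTraceable(t: Trace): boolean {
  return Boolean(t.experimentId) && Boolean(t.hypothesisId) && t.telemetryRefs.length > 0;
}

// Share of decisions with a complete trace (1.0 = fully auditable).
// An empty log scores 0 by convention in this sketch.
function traceabilityScore(traces: Trace[]): number {
  if (traces.length === 0) return 0;
  return traces.filter(isTraceable).length / traces.length;
}

// List the decisions an auditor would need to chase, with what is missing.
function gaps(traces: Trace[]): string[] {
  return traces
    .filter((t) => !isTraceable(t))
    .map((t) => {
      const missing: string[] = [];
      if (!t.experimentId) missing.push("experiment");
      if (!t.hypothesisId) missing.push("hypothesis");
      if (t.telemetryRefs.length === 0) missing.push("telemetry");
      return `${t.decisionId}: missing ${missing.join(", ")}`;
    });
}
```

An all-or-nothing rule is strict by design: a partially linked decision cannot be audited end to end, so it earns no credit. A weighted score would be a reasonable alternative if partial links still carry value.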
Limits to generalisability
- Current results derive from simulation and a single SaaS codebase with a specific LLM stack. Field experiments and broader industry studies are needed to quantify macroeconomic impacts, contestability effects, and labour market adjustments.


Assessment

Paper type: descriptive
Evidence strength: medium. The controlled simulations provide internally consistent, mechanistic evidence that the multi-agent system can speed experiment cycles and preserve decision traceability, but the results rely on a simulated/benchmarked setting and a single instantiated codebase rather than live, randomized field deployments across firms and contexts, limiting external validity.
Methods rigor: medium. The study systematically derives meta-requirements and design principles from Dynamic Capabilities, implements a concrete Node.js artefact, and runs controlled comparisons with statistical rigour and detailed logging; however, methodological limits include reliance on simulation scenarios, potential implementation-specific optimizations, unclear sample size/power reporting for the simulations, and no field randomization or multi-site replication.
Sample: A multi-agent artefact implemented as a Node.js package and instrumented on a production-grade SaaS codebase; controlled simulations compare agentic vs manual Build–Measure–Learn cycles on a set of feature ideas (logs and feature-level metrics collected); experiments appear to be synthetic or replay-based within the instrumented environment rather than live field trials across multiple companies.
Themes: productivity, innovation, human_ai_collab, org_design, governance
Identification: Controlled simulation experiments comparing an instantiated multi-agent B-M-L system against manual B-M-L cycles on feature ideas instrumented in a production-grade SaaS codebase; the outcome is time-to-validated-learning, with statistical comparisons and trace/log analysis (simulation environment, not a randomized field trial).
Generalizability:
- Results are based on controlled simulations rather than live field deployments, so human-in-the-loop behaviors in production may differ.
- A single SaaS codebase/infrastructure was used for instantiation, which may bias results toward that tech stack and product architecture.
- Performance may depend on specific agent designs, prompts, and tuning choices that may not generalize to other agent implementations.
- The types and complexity of feature ideas tested may not represent the full range of product experiments (e.g., hardware, regulated domains).
- Organizational factors (team structure, governance, incentives) and market dynamics in other firms could alter effectiveness.
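The missing power reporting is easy to state concretely. A sketch of the standard normal-approximation sample size per arm for detecting a change between two conversion rates, n = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)², with the z-values hard-coded for α = 0.05 (two-sided) and 80% power to keep it dependency-free. The constant and function names are hypothetical, and the rates in the usage note are illustrative, not taken from the study.

```typescript
const Z_ALPHA_TWO_SIDED_05 = 1.959964; // z_{0.975}
const Z_POWER_80 = 0.841621;           // z_{0.80}

// Users needed per arm to detect a change from p1 to p2
// (normal approximation, unpooled variance, alpha = 0.05 two-sided, 80% power).
function sampleSizePerArm(p1: number, p2: number): number {
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || p1 === p2) {
    throw new Error("rates must lie strictly between 0 and 1 and differ");
  }
  const z = Z_ALPHA_TWO_SIDED_05 + Z_POWER_80;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil((z * z * variance) / ((p1 - p2) ** 2));
}
```

Detecting a lift from 10% to 12% conversion needs `sampleSizePerArm(0.10, 0.12)` = 3,839 users per arm; a simulation report could state figures like this alongside TTVL.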

Claims (8)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
We propose a multi-agent artefact that operationalises the Build–Measure–Learn (B-M-L) cycle as a closed-loop control system.	Other	positive	operationalisation of the Build–Measure–Learn cycle as a closed-loop control system	Reading fidelity: high; study strength: low	not reported; 0.09
Drawing on the Dynamic Capabilities View, we derive fifteen meta-requirements and thirty-three design principles (consolidated into seven goal-directed groups) for sensing, seizing, reconfiguring, orchestration, and governance.	Other	positive	number and organization of derived meta-requirements and design principles	Reading fidelity: high; study strength: low	not reported; 0.09
We instantiate them in a Node.js package instrumenting a production-grade SaaS codebase.	Other	positive	existence and instantiation of a Node.js package that instruments a SaaS codebase	Reading fidelity: high; study strength: low	not reported; 0.09
Controlled simulations compare agentic and manual B-M-L cycles on feature ideas.	Task Completion Time	positive	comparison of agentic vs manual B-M-L cycles (experimentation performance metrics)	Reading fidelity: high; study strength: medium	not reported; 0.18
The Multi Agent System reduces time-to-validated-learning by roughly an order of magnitude while preserving statistical rigour, traceability, and nuanced Persevere/Iterate decisions.	Task Completion Time	positive	time-to-validated-learning (and preservation of statistical rigour, traceability, decision quality)	Reading fidelity: high; study strength: medium	roughly an order of magnitude reduction in time-to-validated-learning; 0.18
Logs render capabilities observable at the feature level, turning 'agentic AI' into a disciplined experimentation infrastructure rather than a generic assistant.	Organizational Efficiency	positive	feature-level observability/traceability of experimentation activities	Reading fidelity: high; study strength: low	not reported; 0.09
The approach preserves statistical rigour, traceability, and nuanced Persevere/Iterate decisions when accelerating experimentation.	Decision Quality	positive	statistical rigour, traceability, and decision quality in experimentation (Persevere/Iterate decisions)	Reading fidelity: high; study strength: medium	not reported; 0.18
We discuss implications for Information Systems (IS) design and propose future field evaluations.	Governance And Regulation	positive	proposed implications and future research directions	Reading fidelity: high; study strength: speculative	not reported; 0.03