LLMs turn bug-finding into a volume problem rather than a scarcity problem: cheaply produced candidate reports flood defenders, moving value toward evidence-rich remediation, triage, and release capacity, with open-source maintainers especially strained.

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

Alfredo Pesoli, Herman Errico, Lorenzo Cavallaro · May 23, 2026

arXiv · Descriptive · Medium evidence · Relevance 7/10

LLM assistance is lowering the cost of generating candidate vulnerability reports at scale, shifting the economic bottleneck toward defender-side validation, triage, patching, and release capacity—especially in open-source projects.

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.

Summary

Main Finding

LLM-assisted workflows are changing the economics of vulnerability discovery not by collapsing offensive exploit prices, but by massively lowering the cost of generating candidate reports and evidence that defenders can act on. The economic effect is a shift from "zero-day asymmetry" (offense-dominated, high-price exploit markets) toward increased defender remediation throughput: many low-signal, cheaply generated candidates arrive at machine speed, making validation, impact assessment, remediation packaging, and maintainer triage the new bottlenecks.

Key Points

Distinct artifact categories matter. Candidate report, accepted vulnerability, exploitable vulnerability, proof-of-impact, remediation package, and production exploit chain are different products with different buyers, prices, and roles in workflows. Conflating them (as public debate often does) hides important economic distinctions.
LLMs reduce costs unevenly. They most strongly reduce costs for: understanding unfamiliar code, generating hypotheses, writing harnesses/tests, producing minimal reproducers, drafting reports/patches, and scaling across many targets in parallel. They do not automatically make production-grade exploit chains cheap or plentiful.
Defender-side repricing is likely. Cheaper candidate generation increases the supply of reports; defenders will value evidence-rich remediation packages and proofs-of-impact more highly (they reduce recipient validation and triage cost). VRPs, internal budgets, and maintainer priorities will adjust.
New bottlenecks: validation (confirming true positives), impact/exploitability assessment, remediation packaging (tests/patches/review rationale), and shipping (triage, review, release) become the scarce, costly resources as candidate volume scales.
Offense remains constrained by operational requirements. Historical exploit-market prices (multi-million-dollar payouts for production exploit chains) reflect scarcity, exclusivity, reliability, stealth, and operational constraints that LLM candidate volume does not eliminate.
Open-source maintainers are especially exposed. LLM-assisted discovery can increase report volume far faster than maintainer capacity to validate, triage, fund, and ship fixes, creating practical overload and technical-debt pressure.
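The overload mechanism in the last point can be illustrated with a minimal fluid-queue sketch. It assumes, purely for illustration, a constant weekly report arrival rate and a fixed weekly maintainer capacity; the function name and all numbers are hypothetical and not taken from the paper. Whenever arrivals exceed capacity, the untriaged backlog grows linearly, however cheap each report was to produce.

```python
def simulate_backlog(arrivals_per_week: float, capacity_per_week: float, weeks: int) -> list[float]:
    """Return the untriaged backlog at the end of each week.

    Deterministic fluid approximation (an illustrative assumption, not the
    paper's model): each week new reports arrive, then maintainers process
    up to `capacity_per_week` of the reports waiting in the queue.
    """
    backlog = 0.0
    history = []
    for _ in range(weeks):
        backlog += arrivals_per_week                # new candidate reports land
        backlog -= min(backlog, capacity_per_week)  # maintainers clear what they can
        history.append(backlog)
    return history


# Hypothetical project whose maintainers can validate 10 reports per week.
before = simulate_backlog(arrivals_per_week=8, capacity_per_week=10, weeks=12)
after = simulate_backlog(arrivals_per_week=40, capacity_per_week=10, weeks=12)
print(before[-1])  # 0.0: capacity keeps up with arrivals
print(after[-1])   # 360.0: backlog grows by 30 reports every week
```

In this toy model only raising effective capacity (for example, evidence-rich reports that cut validation time per report) stops the growth; cheaper candidate generation just raises the arrival rate.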

Data & Methods

Sources used:
- Anthropic Mythos Preview public report: campaign-scale claims (thousands of candidate vulnerabilities across major OS and browser targets); a stated campaign cost of under $20,000 for about 1,000 runs, with some demonstration runs costing under $50 (a hindsight-dependent figure).
- Mozilla / Firefox public reports: Claude Opus 4.6 found 22 Firefox vulnerabilities in two weeks (14 high severity); later reporting indicated 271 Mythos-identified bugs for Firefox 150 (180 sec-high) and 423 total Firefox security bugs fixed across sources in April 2026.
- Public exploit-price anchors: Crowdfense figures and press reporting showing multi-million-dollar payouts for iPhone, Android, Chrome, Safari, and messenger exploits; RAND zero-day analyses.
- Defender-side signals: Google VRP payouts (~$17M in 2025, ≈40% more than in 2024); Google Project Zero work; incident baselines (Verizon DBIR 2025: vulnerability exploitation in ~20% of breaches; Mandiant M-Trends 2025: exploitation in ~33% of incidents where an initial vector was identified; Google tracked ~90 zero-days exploited in 2025).
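Several back-of-envelope ratios follow directly from the figures quoted above. The sketch below simply recomputes them; it treats the quoted values as exact (in the source they are approximate or upper bounds), and the implied 2024 VRP total is an inference from the ≈40% growth figure, not a number reported in the paper. Variable names are illustrative.

```python
# Figures quoted in the source list above; several are approximate or upper bounds.
CAMPAIGN_COST_USD = 20_000               # Mythos campaign, stated as under ~$20,000
CAMPAIGN_RUNS = 1_000                    # ~1,000 runs
OPUS_VULNS, OPUS_HIGH = 22, 14           # Claude Opus 4.6 on Firefox, two weeks
MYTHOS_BUGS, MYTHOS_SEC_HIGH = 271, 180  # Mythos-identified bugs for Firefox 150
VRP_2025_USD = 17_000_000                # Google VRP payouts in 2025 (~$17M)
VRP_GROWTH = 0.40                        # ≈40% increase vs 2024

avg_cost_per_run = CAMPAIGN_COST_USD / CAMPAIGN_RUNS  # upper bound: spend was "under" the cap
opus_high_share = OPUS_HIGH / OPUS_VULNS
mythos_high_share = MYTHOS_SEC_HIGH / MYTHOS_BUGS
implied_vrp_2024 = VRP_2025_USD / (1 + VRP_GROWTH)    # inferred here, not reported in the source

print(f"average cost per run:   at most ${avg_cost_per_run:.0f}")  # $20
print(f"Opus 4.6 high share:    {opus_high_share:.0%}")            # 64%
print(f"Mythos sec-high share:  {mythos_high_share:.0%}")          # 66%
print(f"implied 2024 VRP total: ~${implied_vrp_2024 / 1e6:.1f}M")  # $12.1M
```

The two severity shares are similar (roughly two thirds), but the samples come from different models and periods, so the comparison is descriptive only.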
Method: conceptual, empirical, and accounting analysis (not an equilibrium market model). The authors introduce a simple workflow cost model separating stages:
- \(C_{\text{total}} = C_G + C_V + C_I + C_R + C_T\), where:
  - \(C_G\): candidate-generation cost (tokens plus infrastructure); as an example of token prices, Anthropic lists Opus 4.6 at about $5 per million input tokens and $25 per million output tokens.
  - \(C_V\): validation cost (time per candidate × hourly validator cost × number of candidates).
  - \(C_I\): impact/exploitability assessment cost (time per finding × hourly cost × number of accepted findings).
  - \(C_R\): remediation packaging cost (tests, patches, review rationale).
  - \(C_T\): triage, maintainer review, and shipping cost.
- Per-outcome cost formulas (\(C_{\text{finding}}\), \(C_{\text{impact}}\), \(C_{\text{accepted}}\)) show how low confirmation rates (\(\pi_s\), the fraction of candidates confirmed) and low exploitability rates (\(\pi_e\)) inflate the cost per accepted finding or per exploit.
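The stage-cost decomposition can be sketched as a short script. The per-outcome divisions here are an assumed, simplified reading (cumulative cost divided by the expected number of confirmed, or confirmed-and-exploitable, findings) and may differ in detail from the paper's own per-outcome equations. The class, field names, the one-run-per-candidate assumption, and every input value are hypothetical; only the token prices come from the text.

```python
from dataclasses import dataclass

# Opus 4.6 list prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 25.0


@dataclass
class Workflow:
    """Hypothetical campaign parameters; assumes one model run per candidate."""
    n_candidates: int                  # candidate reports generated
    input_tokens_per_candidate: float
    output_tokens_per_candidate: float
    infra_usd: float                   # scaffolding/compute outside token spend
    hourly_rate: float                 # loaded cost of one validator/maintainer hour
    validate_hours: float              # per candidate          -> C_V
    impact_hours: float                # per confirmed finding  -> C_I
    package_usd: float                 # per confirmed finding  -> C_R
    ship_hours: float                  # per confirmed finding  -> C_T
    pi_s: float                        # fraction of candidates confirmed
    pi_e: float                        # fraction of confirmed findings that are exploitable

    def stage_costs(self) -> dict[str, float]:
        """C_G, C_V, C_I, C_R, C_T under simple per-unit cost assumptions."""
        confirmed = self.pi_s * self.n_candidates
        token_usd = self.n_candidates * (
            self.input_tokens_per_candidate * INPUT_PRICE_PER_M
            + self.output_tokens_per_candidate * OUTPUT_PRICE_PER_M
        ) / 1e6
        return {
            "C_G": token_usd + self.infra_usd,
            "C_V": self.n_candidates * self.validate_hours * self.hourly_rate,
            "C_I": confirmed * self.impact_hours * self.hourly_rate,
            "C_R": confirmed * self.package_usd,
            "C_T": confirmed * self.ship_hours * self.hourly_rate,
        }

    def per_outcome(self) -> dict[str, float]:
        """Spread cumulative cost over the expected number of surviving outcomes."""
        c = self.stage_costs()
        confirmed = self.pi_s * self.n_candidates
        exploitable = self.pi_e * confirmed
        return {
            "total": sum(c.values()),
            "per_confirmed": (c["C_G"] + c["C_V"]) / confirmed,
            "per_exploitable": (c["C_G"] + c["C_V"] + c["C_I"]) / exploitable,
            "per_shipped_fix": sum(c.values()) / confirmed,
        }


campaign = Workflow(
    n_candidates=1_000, input_tokens_per_candidate=2e6, output_tokens_per_candidate=2e5,
    infra_usd=0.0, hourly_rate=100.0, validate_hours=2.0, impact_hours=4.0,
    package_usd=500.0, ship_hours=6.0, pi_s=0.05, pi_e=0.2,
)
costs = campaign.per_outcome()
print({k: round(v) for k, v in costs.items()})
# {'total': 290000, 'per_confirmed': 4300, 'per_exploitable': 23500, 'per_shipped_fix': 5800}
```

With these made-up inputs, token spend is about 5% of the total while validation alone is about 69%, which is the shape of the bottleneck argument above. Because \(C_G\) and \(C_V\) are paid on every candidate, halving \(\pi_s\) doubles the cost per confirmed finding.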
Empirical arithmetic example:
- Anthropic campaign claim: under $20,000 for about 1,000 runs. If the campaign is taken to have produced 24–48 reported findings, the token and scaffolding cost per reported finding is roughly $417–$833 ($20,000 ÷ 48 to $20,000 ÷ 24). That figure is an upper bound on candidate-generation cost only, not the total cost of producing accepted, exploitable, or remediated vulnerabilities.
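The per-finding range is a single division; the short check below reproduces it, treating $20,000 as the spending ceiling and 24 and 48 as the assumed finding counts from the text. The function name is illustrative.

```python
CAMPAIGN_CEILING_USD = 20_000  # stated as "< $20,000" for ~1,000 runs


def max_cost_per_finding(findings: int, spend_usd: float = CAMPAIGN_CEILING_USD) -> float:
    """Upper bound on token/scaffold spend per reported finding.

    Covers candidate generation only; validation, impact assessment,
    remediation packaging, and shipping are not included.
    """
    return spend_usd / findings


for n in (48, 24):
    print(f"{n} findings -> at most ${max_cost_per_finding(n):,.0f} per finding")
# 48 findings -> at most $417 per finding
# 24 findings -> at most $833 per finding
```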
Limitations:
- Public reports lack full disclosure of validation, deduplication, acceptance rates (\(\pi_s\)), exploitability rates (\(\pi_e\)), maintainer triage hours, and true remediation costs; the authors therefore avoid inferring exploit-market prices from LLM outputs and present conservative, campaign-level accounting.

Implications for AI Economics

Repricing of defender investments: as candidate-generation costs fall, the marginal value of defender spending shifts toward validation, automated triage, evidence generation (executable reproducers, PoCs), remediation packaging, and release engineering. Financial incentives (VRPs, maintainer funding, grants) may need to move downstream to pay for packaging and shipping fixes, not just discovery.
Economic specialization and differentiation of outputs: defender markets will prize evidence-rich artifacts (minimal reproducers, working patches, regression tests, clear exploitability assumptions). Offensive markets will continue to prize reliability, stealth, chaining, and exclusivity; LLMs do not negate those premium attributes.
Operational policy and governance:
- Maintain and invest in orchestration: compose LLMs with fuzzers, static/dynamic analysis, sandboxing, and CI to produce high-evidence outputs and reduce validator and triager workload.
- Fund maintainer capacity and remediation pipelines, particularly in open source, to avoid backlogs and a "denial-of-service" effect from noisy report floods.
- Adapt VRPs and disclosure incentives to reward remediation packaging and shipping (higher premiums for reproducers + patches).
- Monitor diffusion of capable models and open-weight alternatives; defenders cannot rely solely on a small set of frontier providers.
Metrics for defenders and economists:
- Measure validated accepted findings per maintainer-hour and cost-per-accepted-fix rather than raw candidate counts.
- Track \(\pi_s\) and \(\pi_e\) (confirmation and exploitability rates) empirically to understand true throughput and to price defensive activities.
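Both recommended metrics, and the two rates, can be computed from an ordinary triage log. The sketch assumes a hypothetical log format (one record per candidate with confirmed/exploitable/fixed flags, maintainer hours, and attributed spend); real trackers would need a mapping onto these fields. It estimates \(\pi_s\) as confirmed/candidates and \(\pi_e\) as exploitable/confirmed, which is one reasonable convention rather than necessarily the paper's, and it counts confirmed findings as "accepted".

```python
from dataclasses import dataclass


@dataclass
class TriageRecord:
    """One candidate report in a hypothetical triage log."""
    confirmed: bool          # validated as a true security bug
    exploitable: bool        # impact assessment judged it exploitable
    fixed: bool              # fix reviewed and shipped
    maintainer_hours: float  # validation, triage, and review time spent on it
    cost_usd: float          # all spend attributed to this candidate


def throughput_metrics(log: list[TriageRecord]) -> dict[str, float]:
    """Estimate pi_s, pi_e, and the two throughput metrics from a triage log."""
    n = len(log)
    confirmed = sum(r.confirmed for r in log)
    exploitable = sum(r.confirmed and r.exploitable for r in log)
    fixed = sum(r.fixed for r in log)
    hours = sum(r.maintainer_hours for r in log)
    cost = sum(r.cost_usd for r in log)
    return {
        "pi_s": confirmed / n if n else 0.0,
        "pi_e": exploitable / confirmed if confirmed else 0.0,
        "accepted_per_maintainer_hour": confirmed / hours if hours else 0.0,
        "cost_per_accepted_fix": cost / fixed if fixed else float("inf"),
    }


# Made-up log: eight noisy candidates and two real bugs, both fixed.
log = [TriageRecord(False, False, False, 1.0, 50.0) for _ in range(8)]
log += [
    TriageRecord(True, True, True, 10.0, 2000.0),
    TriageRecord(True, False, True, 10.0, 2000.0),
]
metrics = throughput_metrics(log)
print(metrics["pi_s"], metrics["pi_e"], metrics["cost_per_accepted_fix"])  # 0.2 0.5 2200.0
```

Because the $50 spent on each noisy candidate is charged to the two shipped fixes, cost per accepted fix rises with noise even though no single noisy report is expensive.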
Attack-defense substitution dynamics: attackers will keep substituting cheaper paths (credential theft, known CVEs, supply-chain), so an increase in candidate generation does not translate linearly into attacker capability. High-end exploit value (operational chains) likely remains scarce and priced by operational utility.
Research and tooling priorities: invest in automated validation, deduplication, impact-assessment heuristics, and robust remediation-patch generation; design reward systems that pay for shipping fixes and not merely for candidate reports.

Summary takeaway: LLMs change the cost structure of vulnerability production primarily by making the front end (candidate generation and preliminary evidence) cheap and scalable. The economically decisive constraints move downstream to validation, impact assessment, remediation packaging, and release, especially in open source, and defenders, incentives, and tooling must follow those constraints to turn more findings into better security outcomes.

Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper triangulates public data from the Anthropic Mythos Preview and Mozilla Firefox collaborations with public price and reward data to document changes in vulnerability-report volume and the shifting bottlenecks. This provides timely empirical support, but it is observational, limited in scope, and vulnerable to selection and reporting biases, so it cannot support strong causal claims about broad market or long-run effects.
Methods Rigor: medium. The authors assemble and compare multiple data sources and use economic reasoning (price anchors, reward-program data) to interpret changes in incentives and workload, which is appropriate for a first-order "bugonomics" analysis. However, they do not use quasi-experimental or causal-identification strategies, sample sizes and coverage are limited, and key variables (e.g., unreported vulnerability discovery, private exploit trades) are unobserved or approximated.
Sample: public and collaborative data from Anthropic's Mythos Preview outputs and interactions, documented vulnerability reports and remediation interactions from the Mozilla Firefox collaborations, supplemented with public exploit-market price anchors and vulnerability-reward program data. These are primarily observational, public-facing reports and price signals rather than comprehensive market transaction datasets.
Themes: org_design, governance
Generalizability:
- Focused on specific collaborations (Anthropic Mythos Preview, Mozilla Firefox); may not generalize to other LLMs, closed-source software, or commercial cloud services.
- Open-source bias: maintainer and release-capacity constraints in OSS differ from vendor and enterprise settings.
- Early-stage, preview-model behavior may differ from that of later, more capable, or differently regulated LLMs.
- Public reporting and bounty data omit clandestine offensive markets and private exploit transactions.
- Exploit-market price anchors and reward-program figures are noisy proxies for true economic incentives.
- The analysis is short-term and near-term, and may not capture longer-run equilibrium responses (hiring, tooling, regulation).

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
LLM-assisted systems make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale.	Task Completion Time	positive	high	cost/effort to produce candidate vulnerabilities (generation, comprehension, harnessing, PoI drafting, reporting)	0.18
Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation rather than serving the same role they did in high-end offensive workflows.	Decision Quality	mixed	high	role of exploits/PoCs in remediation/prioritization decisions	0.18
The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports.	Organizational Efficiency	negative	high	capacity/throughput for absorbing, validating, triaging, patching, and shipping vulnerability reports	0.18
The near-term shift is not simply more zero-days; it is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work.	Task Allocation	mixed	high	distribution of effort across discovery vs. validation/triage/remediation; relative importance of evidence-rich reports	0.03
LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale, an effect that is acute in open source.	Task Allocation	negative	high	vulnerability report volume vs. maintainer validation/triage/funding/release capacity	0.18
Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors.	Market Structure	null_result	high	price/scarcity of production-grade zero-days and exploit chains in exploit markets	0.18
Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution.	Task Allocation	mixed	high	scale and distribution of defender-side vulnerability discovery and remediation activities	0.18
Public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, support the argument that the near-term shift is toward increased defender remediation throughput rather than simply more zero-days.	Adoption Rate	mixed	high	empirical basis for the paper's central thesis (data sources cited)	0.18