LLMs turn bug-finding into a volume problem rather than a scarcity problem: cheaply produced candidate reports flood defenders, moving value toward evidence-rich remediation, triage, and release capacity, with open-source maintainers especially strained.
Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.
Summary
Main Finding
LLM-assisted workflows are changing the economics of vulnerability discovery not by collapsing offensive exploit prices but by sharply lowering the cost of generating candidate reports and the evidence defenders can act on. The economic effect is a shift away from "zero-day asymmetry" (offense-dominated, high-price exploit markets) toward greater defender remediation throughput: many low-signal, cheaply generated candidates arrive at machine speed, making validation, impact assessment, remediation packaging, and maintainer triage the new bottlenecks.
Key Points
- Distinct artifact categories matter. Candidate report, accepted vulnerability, exploitable vulnerability, proof-of-impact, remediation package, and production exploit chain are different products with different buyers, prices, and roles in workflows. Conflating them (as public debate often does) hides important economic distinctions.
- LLMs reduce costs unevenly. They most strongly reduce costs for: understanding unfamiliar code, generating hypotheses, writing harnesses/tests, producing minimal reproducers, drafting reports/patches, and scaling across many targets in parallel. They do not automatically make production-grade exploit chains cheap or plentiful.
- Defender-side repricing is likely. Cheaper candidate generation increases the supply of reports, so defenders will value evidence-rich remediation packages and proofs of impact more highly because they reduce the recipient's validation and triage costs. VRPs (vulnerability reward programs), internal budgets, and maintainer priorities will adjust.
- New bottlenecks: validation (confirming true positives), impact/exploitability assessment, remediation packaging (tests/patches/review rationale), and shipping (triage, review, release) become the scarce, costly resources as candidate volume scales.
- Offense remains constrained by operational requirements. Historical exploit-market prices (multi-million-dollar payouts for production exploit chains) reflect scarcity, exclusivity, reliability, stealth, and operational constraints that LLM candidate volume does not eliminate.
- Open-source maintainers are especially exposed. LLM-assisted discovery can increase report volume far faster than maintainer capacity to validate, triage, fund, and ship fixes, creating practical overload and technical-debt pressure.
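The overload described in the last points can be illustrated with a deterministic backlog model. This is a minimal sketch, assuming a fixed weekly triage capacity, no prioritization, and equal processing effort per report; the function name and all numbers are hypothetical, not taken from the paper.

```python
def backlog_over_time(arrivals_per_week: list[float], capacity_per_week: float) -> list[float]:
    """Unprocessed reports left at the end of each week under a fixed triage capacity."""
    backlog, history = 0.0, []
    for arrivals in arrivals_per_week:
        # Maintainers clear at most `capacity_per_week` reports; the rest carries over.
        backlog = max(0.0, backlog + arrivals - capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical project: capacity of 10 reports/week; arrivals jump from 8 to 30
# per week once LLM-assisted reporters start submitting in week 3.
weekly_arrivals = [8, 8, 30, 30, 30, 30]
result = backlog_over_time(weekly_arrivals, capacity_per_week=10)
print(result)
```

Once arrivals exceed capacity, the backlog grows by the difference every week and does not drain on its own; only more maintainer capacity, or reports that take less effort to process, change the slope.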
Data & Methods
- Sources used:
  - Anthropic Mythos Preview public report: campaign-scale claims (thousands of candidate vulnerabilities across major OS/browser targets); a stated campaign cost of under $20,000 for ~1,000 runs, with some demonstration runs costing under $50 (hindsight-dependent).
  - Mozilla / Firefox public reports: Claude Opus 4.6 found 22 Firefox vulnerabilities in two weeks (14 high severity); later reporting indicated 271 Mythos-identified bugs for Firefox 150 (180 sec-high) and 423 total Firefox security bugs fixed across sources in April 2026.
  - Public exploit-price anchors: Crowdfense and related reporting showing multi-million-dollar payouts for iPhone, Android, Chrome, Safari, and messenger exploits; RAND zero-day analyses.
  - Defender-side signals: Google VRP payouts (~$17M in 2025, ≈40% increase vs 2024); Google Project Zero work; incident baselines (Verizon DBIR 2025 showing vulnerability exploitation as the initial access vector in ~20% of breaches; Mandiant M-Trends 2025 showing exploitation as ~33% of incidents where a vector was identified; Google tracked ~90 zero-days exploited in 2025).
- Method: conceptual, empirical, and accounting analysis (not an equilibrium market model). The authors introduce a simple workflow cost model separating stages:
  - $C_{\text{total}} = C_G + C_V + C_I + C_R + C_T$
  - $C_G$: candidate-generation cost (tokens + infrastructure); example Anthropic token prices for Opus 4.6 are about USD 5 per million input tokens and USD 25 per million output tokens.
  - $C_V$: validation (time × hourly validator cost × number of candidates)
  - $C_I$: impact/exploit assessment (time × cost × number of accepted findings)
  - $C_R$: remediation packaging (tests, patches, review rationale)
  - $C_T$: triage, maintainer review, and shipping
  - Per-outcome cost formulas show how low confirmation rates ($\pi_s$, the fraction of candidates confirmed) and low exploitability rates ($\pi_e$) inflate the cost per accepted finding or per exploit (equations for $C_{\text{finding}}$, $C_{\text{impact}}$, and $C_{\text{accepted}}$).
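The stage decomposition can be made concrete as a small calculator. This is a minimal sketch, assuming $C_R$ and $C_T$ are supplied as lump sums and that per-outcome cost is simply $C_{\text{total}}$ divided by the number of candidates ($N$), accepted findings ($N\pi_s$), or exploitable findings ($N\pi_s\pi_e$); the paper's own $C_{\text{finding}}$, $C_{\text{impact}}$, and $C_{\text{accepted}}$ equations are not reproduced here and may differ. The `Campaign` dataclass, its field names, and every input value are hypothetical; only the token prices come from the text.

```python
from dataclasses import dataclass

# Example token prices quoted in the text for Opus 4.6, in USD per million tokens.
PRICE_IN_PER_M = 5.0
PRICE_OUT_PER_M = 25.0


@dataclass
class Campaign:
    """Hypothetical campaign inputs; names are illustrative, not the paper's."""
    n_candidates: int        # N: candidate reports produced
    pi_s: float              # fraction of candidates confirmed
    pi_e: float              # fraction of confirmed findings that are exploitable
    input_tokens: float      # total input tokens across all runs
    output_tokens: float     # total output tokens across all runs
    infra_usd: float         # non-token infrastructure/scaffold spend
    validate_hours: float    # validator hours per candidate
    assess_hours: float      # assessment hours per accepted finding
    hourly_usd: float        # loaded labor cost per hour
    remediation_usd: float   # C_R, supplied as a lump sum (assumption)
    triage_ship_usd: float   # C_T, supplied as a lump sum (assumption)


def stage_costs(c: Campaign) -> dict[str, float]:
    """Return C_G, C_V, C_I, C_R, C_T following the stage definitions above."""
    accepted = c.n_candidates * c.pi_s
    c_g = (c.input_tokens * PRICE_IN_PER_M + c.output_tokens * PRICE_OUT_PER_M) / 1e6 + c.infra_usd
    c_v = c.validate_hours * c.hourly_usd * c.n_candidates  # scales with every candidate
    c_i = c.assess_hours * c.hourly_usd * accepted          # scales with accepted findings
    return {"G": c_g, "V": c_v, "I": c_i, "R": c.remediation_usd, "T": c.triage_ship_usd}


def per_outcome_costs(c: Campaign) -> dict[str, float]:
    """Spread C_total over each outcome tier; low pi_s or pi_e inflates the later tiers."""
    total = sum(stage_costs(c).values())
    accepted = c.n_candidates * c.pi_s
    exploitable = accepted * c.pi_e
    return {
        "total": total,
        "per_candidate": total / c.n_candidates,
        "per_accepted": total / accepted,
        "per_exploitable": total / exploitable,
    }


# Made-up numbers chosen only to show the mechanics.
example = Campaign(
    n_candidates=1000, pi_s=0.1, pi_e=0.25,
    input_tokens=2e9, output_tokens=2e8, infra_usd=0.0,
    validate_hours=0.5, assess_hours=2.0, hourly_usd=100.0,
    remediation_usd=10_000.0, triage_ship_usd=5_000.0,
)
print(stage_costs(example))
print(per_outcome_costs(example))
```

With these invented inputs, token spend ($C_G$) is 15% of the total while validation ($C_V$) alone is 50%, and the cost per exploitable finding is 40 times the cost per candidate. The exact shares mean nothing; the structure shows why front-end cost alone understates the cost of an accepted or exploitable finding.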
Empirical arithmetic example:
- Anthropic campaign claim: under $20,000 for ~1,000 runs. If those runs are interpreted as yielding 24–48 reported findings, the token/scaffold cost per reported finding is ≈ $417–$833 ($20,000/48 to $20,000/24); that is an upper bound on candidate-generation cost only, not the total cost of producing accepted, exploitable, or remediated vulnerabilities.
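The per-finding figures follow from dividing the campaign ceiling by the assumed finding counts. A quick check using only the numbers stated above; the 24–48 range is the interpretation given in the text, not an independently reported count, and the constant and function names are illustrative.

```python
# Campaign figures as stated in the text.
CAMPAIGN_COST_CEILING_USD = 20_000   # "under $20,000", so everything below is a ceiling
CAMPAIGN_RUNS = 1_000                # "~1,000 runs"
FINDINGS_LOW, FINDINGS_HIGH = 24, 48


def cost_per_finding_bounds(cost: float, low: int, high: int) -> tuple[float, float]:
    """Upper-bound candidate-generation cost per reported finding."""
    # More findings spread the same spend thinner, so the high count gives the low cost.
    return cost / high, cost / low


PER_RUN = CAMPAIGN_COST_CEILING_USD / CAMPAIGN_RUNS  # at most ~$20 per run
lo, hi = cost_per_finding_bounds(CAMPAIGN_COST_CEILING_USD, FINDINGS_LOW, FINDINGS_HIGH)
print(f"${PER_RUN:,.0f} per run; ${lo:,.0f}–${hi:,.0f} per reported finding")
```

None of these numbers include validation, impact assessment, remediation packaging, or triage, which is why they bound $C_G$ only.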
Limitations:
- Public reports lack full disclosure of validation, deduplication, acceptance rates ($\pi_s$), exploitability rates ($\pi_e$), maintainer triage hours, and true remediation costs; the authors therefore avoid inferring exploit-market prices from LLM outputs and present conservative, campaign-level accounting.
Implications for AI Economics
- Repricing of defender investments: as candidate generation costs fall, marginal defender value shifts toward spending on validation, automated triage, evidence generation (executable reproducers, PoCs), remediation packaging, and release engineering. Financial incentives (VRPs, maintainer funding, grants) may need to move downstream to pay for packaging and shipping fixes, not just discovery.
- Economic specialization and differentiation of outputs: defender markets will prize evidence-rich artifacts (minimal reproducers, working patches, regression tests, clear exploitability assumptions). Offensive markets will continue to prize reliability, stealth, chaining, and exclusivity; LLMs do not negate those premium attributes.
- Operational policy and governance:
  - Maintain and invest in orchestration: compose LLMs with fuzzers, static/dynamic analysis, sandboxing, and CI to produce high-evidence outputs and reduce validator and triager workload.
  - Fund maintainer capacity and remediation pipelines, particularly in open source, to avoid backlogs and a de facto "denial of service" from noisy report floods.
  - Adapt VRPs and disclosure incentives to reward remediation packaging and shipping (e.g., higher premiums for reports that include reproducers and patches).
  - Monitor diffusion of capable models and open-weight alternatives; defenders cannot rely solely on a small set of frontier providers.
- Metrics for defenders and economists:
  - Measure validated accepted findings per maintainer-hour and cost per accepted fix rather than raw candidate counts.
  - Track $\pi_s$ and $\pi_e$ (confirmation and exploitability rates) empirically to understand true throughput and to price defensive activities.
- Attack-defense substitution dynamics: attackers will keep substituting cheaper paths (credential theft, known CVEs, supply-chain), so an increase in candidate generation does not translate linearly into attacker capability. High-end exploit value (operational chains) likely remains scarce and priced by operational utility.
- Research and tooling priorities: invest in automated validation, deduplication, impact-assessment heuristics, and robust remediation-patch generation; design reward systems that pay for shipping fixes and not merely for candidate reports.
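The defender metrics recommended above need only a few counts per reporting period. This is a minimal sketch, assuming a project records the tallies below; `TriageLedger`, its fields, and the example values are hypothetical, and the all-in cost figure would have to come from the project's own accounting.

```python
import math
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    # Undefined rates are reported as NaN rather than raising on an empty period.
    return num / den if den else math.nan


@dataclass
class TriageLedger:
    """Hypothetical per-period tallies a project might record."""
    candidates: int          # reports received
    accepted: int            # reports validated and accepted as vulnerabilities
    exploitable: int         # accepted findings judged exploitable
    fixes_shipped: int       # accepted findings with a released fix
    maintainer_hours: float  # hours on validation, triage, review, and release
    total_usd: float         # all-in cost for the same period

    @property
    def pi_s(self) -> float:
        return _ratio(self.accepted, self.candidates)

    @property
    def pi_e(self) -> float:
        return _ratio(self.exploitable, self.accepted)

    @property
    def accepted_per_maintainer_hour(self) -> float:
        return _ratio(self.accepted, self.maintainer_hours)

    @property
    def cost_per_accepted_fix(self) -> float:
        return _ratio(self.total_usd, self.fixes_shipped)


ledger = TriageLedger(candidates=400, accepted=60, exploitable=15,
                      fixes_shipped=45, maintainer_hours=300.0, total_usd=90_000.0)
print(f"pi_s={ledger.pi_s:.2f} pi_e={ledger.pi_e:.2f} "
      f"accepted/hour={ledger.accepted_per_maintainer_hour:.2f} "
      f"cost/fix=${ledger.cost_per_accepted_fix:,.0f}")
```

Dividing cost by shipped fixes rather than by candidates keeps a flood of unvalidated reports from making a period look cheaper than it was.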
Summary takeaway: LLMs change the cost structure of vulnerability production primarily by making the front end (candidate generation and preliminary evidence) cheap and scalable. The economically decisive constraints move downstream to validation, impact assessment, remediation packaging, and release, especially in open source. Defenders, incentives, and tooling must follow if improved security outcomes are to be realized.
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| LLM-assisted systems make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. | Task Completion Time | positive | high | cost/effort to produce candidate vulnerabilities (generation, comprehension, harnessing, PoI drafting, reporting) | 0.18 |
| Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation rather than serving the same role they did in high-end offensive workflows. | Decision Quality | mixed | high | role of exploits/PoCs in remediation/prioritization decisions | 0.18 |
| The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. | Organizational Efficiency | negative | high | capacity/throughput for absorbing, validating, triaging, patching, and shipping vulnerability reports | 0.18 |
| The near-term shift is not simply more zero-days; it is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. | Task Allocation | mixed | high | distribution of effort across discovery vs. validation/triage/remediation; relative importance of evidence-rich reports | 0.03 |
| LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale; the effect is acute in open source. | Task Allocation | negative | high | vulnerability report volume vs. maintainer validation/triage/funding/release capacity | 0.18 |
| Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. | Market Structure | null_result | high | price/scarcity of production-grade zero-days and exploit chains in exploit markets | 0.18 |
| Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. | Task Allocation | mixed | high | scale and distribution of defender-side vulnerability discovery and remediation activities | 0.18 |
| Public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, support the argument that the near-term shift is toward increased defender remediation throughput rather than simply more zero-days. | Adoption Rate | mixed | high | empirical basis for the paper's central thesis (data sources cited) | 0.18 |