The web's human-first assumption is broken by AI intermediaries; to preserve truth and fair economics the authors propose making agents first-class: agent identity headers and rate limits, a tokenized, intent-based subscription model that charges agents like their human principals, and ATML plus cryptographic provenance to stop self-reinforcing AI content loops.

Towards an Agent-First Web: Redesigning the Web for AI Agents

Eranga Bandara, Ross Gore, Ravi Mukkamala, Asanga Gunaratna, Safdar H. Bouk, Xueping Liang, Peter Foytik, Abdul Rahman, Sachini Rajapakse, Isurunima Kularathna, Pramoda Karunarathna, Chalani Rajapakse, Ng Wee Keong, Kasun De Zoysa, Tharaka Hewa, Amin Hass, Wathsala Herath, Aruna Withanage, Nilaan Loganathan, Atmaram Yarlagadda, Sachin Shetty · June 17, 2026

arXiv · theoretical · evidence: n/a · relevance: 7/10

The paper argues the web must be redesigned so AI agents are first-class citizens—via agent-identifying access headers, intent-aligned tokenized economics, and provenance-markup (ATML)—to preserve human-grounded content and fair web incentives.

The World Wide Web was built on an assumption held for three decades: the primary consumer of web content is a human being. This permeates every layer; its access model presumes human visitors, its economics rest on human attention, and its content targets human perception. The rapid emergence of AI agents as intermediaries between humans and web content invalidates this assumption. Yet the web resists agents through blanket blocking, CAPTCHA-based exclusion, and economic models that treat agent access as extraction rather than legitimate interaction. This paper proposes a principled redesign across three layers. At the access layer, agents acting for humans should inherit equivalent access rights, governed by rate limiting and agent identification metadata in HTTP requests, analogous to browser headers, alongside a dual-layer architecture serving human-readable and agent-optimized content from the same domain. At the economic layer, we propose an intent-based tier framework grounded in the agent-as-human-proxy principle: an agent's economic obligation mirrors that of the human it represents. A token-based subscription model meters content in tokens rather than pageviews, alongside a commissioned content economy anchoring AI content production in human intentionality. At the content layer, we identify epistemic recursion, the self-referential loop in which AI-generated content is consumed by agents to produce further content, progressively detaching web knowledge from human ground truth. We propose the Agent Text Markup Language (ATML), a four-level human supervision tier model, and a cryptographic provenance chain to counter this threat. Together these constitute ten design principles for an agent-first internet, one in which agents are first-class citizens whose integration requires renegotiating the web's foundational social contract across access, economics, and content.

Summary

Main Finding

The paper argues that the web must be redesigned across three interdependent layers — access, economics, and content — to treat AI agents acting on behalf of humans as first‑class web citizens. Doing so requires new protocol primitives (agent identification and agents.txt), an intent‑based economic model (token subscriptions, commissioned content), and content standards (Agent Text Markup Language, human supervision tiers, cryptographic provenance) to avoid economic harm to publishers and the epistemic risk of AI‑generated content spiraling into self‑referential “epistemic recursion.”

Key Points

Diagnosis
- The web was built on the assumption that humans are the primary consumers of web content; that assumption breaks under large‑scale AI agent use.
- Empirical signals of failure include infrastructure‑level blocking of agents (Cloudflare moved to block AI crawlers by default), large crawl‑to‑referral asymmetries (e.g., Anthropic at roughly 73,000:1), rapid growth in agent requests (GPTBot +147%, Meta‑ExternalAgent +843% over the referenced periods), a rising share of zero‑click queries (≈60% overall, ≈93% in AI‑native modes), and reported publisher traffic declines of 70–80%.
- The paper identifies a new structural threat, epistemic recursion: AI agents consume AI‑generated content to produce more AI content, progressively detaching web knowledge from human‑anchored ground truth and risking model collapse.
Access layer proposals
- Standardized agent identification metadata (HTTP headers) where agents declare identity, represented human, and intent (analogous to User‑Agent).
- agents.txt: a replacement for (or upgrade of) robots.txt that is machine‑readable, intent‑aware, and supports graduated access policies.
- Dual‑layer web architecture: coexisting human‑readable and agent‑optimized content during an orderly migration to agent‑first publishing.
- Prefer rate limiting and intent‑aware policing over blanket blocking.
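
The access-layer ideas above can be made concrete with a minimal sketch. It assumes three hypothetical header names (`Agent-Identity`, `Agent-Principal`, `Agent-Intent`) and an in-memory, intent-keyed policy standing in for an agents.txt file; the summary does not give the paper's actual header names or file syntax, so every name and limit here is illustrative. The sketch answers over-limit agents with a 429 rather than blocking them outright, and treats requests without agent metadata as ordinary browser traffic, which is also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical header names; the actual names are not specified in this summary.
IDENTITY_HEADER = "Agent-Identity"
PRINCIPAL_HEADER = "Agent-Principal"
INTENT_HEADER = "Agent-Intent"

# Hypothetical graduated policy keyed by declared intent:
# maximum requests allowed per 60-second window.
POLICY = {"personal": 60, "commercial": 10}
UNDECLARED_LIMIT = 5  # agents that identify themselves but declare no known intent
WINDOW_SECONDS = 60.0


@dataclass
class RateLimiter:
    """Fixed-window request counter per (agent, principal) pair."""
    windows: dict = field(default_factory=dict)

    def allow(self, key, limit, now):
        start, count = self.windows.get(key, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # window expired: start a new one
        if count >= limit:
            self.windows[key] = (start, count)
            return False
        self.windows[key] = (start, count + 1)
        return True


def decide(headers, limiter, now):
    """Return an HTTP-style status code for one request."""
    agent = headers.get(IDENTITY_HEADER)
    if agent is None:
        # No agent metadata: treated here as an ordinary browser request.
        return 200
    intent = headers.get(INTENT_HEADER, "")
    limit = POLICY.get(intent, UNDECLARED_LIMIT)
    key = (agent, headers.get(PRINCIPAL_HEADER, "anonymous"))
    # Over-limit agents are throttled (429), not blanket-blocked.
    return 200 if limiter.allow(key, limit, now) else 429
```

Passing `now` explicitly keeps the example deterministic; a server would normally supply a monotonic clock reading instead.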
Economic layer proposals
- Agent‑as‑human‑proxy principle: an agent’s economic obligations should mirror the human it represents.
- Intent‑based tiered framework: tiers tied to agent intent (personal vs. commercial), with a token‑based metering model compatible with existing AI API pricing.
- Commissioned content economy: payments/commissions tied to human‑intent‑anchored AI content generation to prevent a self‑reinforcing loop of AI‑only content supply.
- Free tiers and rate limits to preserve public goods and mirror open‑source/proprietary distinctions.
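
As a rough illustration of token-based metering under intent-based tiers, the sketch below charges per token served rather than per pageview. The tier names follow the personal/commercial distinction above, but the free allowances, prices, and the whitespace "tokenizer" are all assumptions made for the example; the summary gives no figures.

```python
from dataclasses import dataclass

# Hypothetical tiers: free token allowance per billing period and the
# price per 1,000 tokens beyond it. All figures are illustrative only.
TIERS = {
    "personal": {"free_tokens": 50_000, "price_per_1k": 0.01},
    "commercial": {"free_tokens": 0, "price_per_1k": 0.02},
}


def count_tokens(text):
    """Crude stand-in for a real tokenizer: whitespace-separated words."""
    return len(text.split())


@dataclass
class Meter:
    """Tracks token usage for one agent acting under one declared intent."""
    intent: str
    used: int = 0

    def charge(self, text):
        """Record usage for one served document and return its cost."""
        tier = TIERS[self.intent]
        tokens = count_tokens(text)
        # Tokens still covered by the free allowance before this request.
        remaining_free = max(tier["free_tokens"] - self.used, 0)
        billable = max(tokens - remaining_free, 0)
        self.used += tokens
        return billable * tier["price_per_1k"] / 1000
```

Under the agent-as-human-proxy principle, the `intent` passed to `Meter` would be the tier the represented human falls into, so a personal assistant stays within the free allowance for light use while a commercial agent is billed from the first token.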
Content layer proposals
- Agent Text Markup Language (ATML): semantic format optimized for agent consumption with explicit provenance and supervision metadata.
- Four‑level human supervision tiers (machine‑readable/verifiable) plus a cryptographic provenance chain so consumers (agents) can verify degree of human oversight.
- These measures aim to break epistemic recursion by making provenance and supervision explicit and enforceable.
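
One way to picture a provenance chain with supervision tiers is a hash-linked list of records, each carrying a supervision level and an authentication tag. The sketch below is an assumption-laden illustration: the four level names are invented (the summary says only that there are four levels), and HMAC with a shared key stands in for the public-key signatures a real deployment would more plausibly use.

```python
import hashlib
import hmac
import json
from enum import IntEnum


class Supervision(IntEnum):
    """Hypothetical labels for four supervision levels (names are assumed)."""
    AUTONOMOUS = 0
    HUMAN_COMMISSIONED = 1
    HUMAN_REVIEWED = 2
    HUMAN_AUTHORED = 3


FIELDS = ("content_hash", "supervision", "prev")


def _digest(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def _tag(key, entry_hash):
    return hmac.new(key, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()


def append_entry(chain, content, supervision, key):
    """Append a provenance entry linked to the previous entry's hash."""
    record = {
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "supervision": int(supervision),
        "prev": chain[-1]["hash"] if chain else None,
    }
    entry_hash = _digest(record)
    chain.append({**record, "hash": entry_hash, "tag": _tag(key, entry_hash)})


def verify(chain, key):
    """Check every link, hash, and tag; any tampering returns False."""
    prev = None
    for entry in chain:
        expected = _digest({k: entry[k] for k in FIELDS})
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        if not hmac.compare_digest(entry["tag"], _tag(key, expected)):
            return False
        prev = entry["hash"]
    return True


def min_supervision(chain):
    """Weakest human oversight anywhere along the derivation chain."""
    return Supervision(min(e["supervision"] for e in chain))
```

A consuming agent could call `verify` and then use `min_supervision` to decide whether a document is sufficiently human-anchored to rely on, which is the check that is meant to interrupt epistemic recursion.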
Synthesis
- The authors present a framework of ten design principles (across access, economics, content) that together treat agents as legitimate web actors and renegotiate the web’s social contract.

Data & Methods

Methodological approach
- Conceptual synthesis and architecture design: the paper combines literature review, technical proposals, and policy/architecture arguments rather than reporting a new experimental system.
- Empirical diagnosis draws on publicly reported industry data and prior studies (Cloudflare reports on crawler blocking and crawl/referral ratios; Semrush and SISTRIX data on search behavior and click‑through rates; other published work on model collapse and provenance).
Evidence cited (representative)
- Cloudflare: infrastructure shifts to block AI crawlers by default; crawl‑to‑referral ratios (e.g., Anthropic ~73,000:1).
- Growth metrics: GPTBot requests +147%, Meta‑ExternalAgent +843% over referenced periods.
- Search behavior: zero‑click share ≈60% overall, ≈93% in AI‑native search; CTR at position one falling from 27% to 11%.
- Prior academic work: model collapse and provenance/watermarking studies (Shumailov et al.; Kirchenbauer et al.; C2PA).
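
To put the cited figures on a common footing, the short calculation below converts them into multipliers and relative changes. It uses only the numbers listed above and assumes the 73,000:1 ratio means crawl requests per referred visit.

```python
def growth_multiplier(percent_increase):
    """Convert a '+N%' increase into a multiple of the starting level."""
    return 1 + percent_increase / 100


def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a decline)."""
    return (after - before) / before


gptbot = growth_multiplier(147)          # roughly 2.5 times the earlier volume
meta_agent = growth_multiplier(843)      # roughly 9.4 times the earlier volume
ctr_change = relative_change(27, 11)     # position-one CTR, in percent
referrals_per_crawl = 1 / 73_000         # assumed reading of the 73,000:1 ratio

print(f"GPTBot: x{gptbot:.2f}, Meta-ExternalAgent: x{meta_agent:.2f}")
print(f"Position-one CTR change: {ctr_change:.1%}")
print(f"Referrals per crawl: {referrals_per_crawl:.6%}")
```

The drop from 27% to 11% is a 16-point absolute decline but a relative decline of about 59%, which is the more relevant figure for a publisher whose revenue scales with clicks.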
What the paper contributes methodologically
- A multi‑layer architectural/design proposal (concrete protocol ideas: agent headers, agents.txt, ATML, token meter) and a normative economic principle (agent‑as‑human‑proxy) derived from the diagnosis and literature synthesis.

Implications for AI Economics

New pricing primitives and revenue allocation
- Token‑based metering aligns web access pricing with how LLMs/APIs are already priced, enabling publishers to monetize agent access in a measurable way.
- Intent‑based tiers (personal vs. commercial use) offer a finer‑grained alternative to blunt pay‑per‑query pricing and avoid double charging (where both the agent platform and the publisher try to extract value from the same access).
- Commissioned content payments create a new revenue stream when agents commission AI‑generated content tied to explicit human intent, potentially compensating creators and publishers for derivative use.
Effects on existing monetization models
- Ad‑driven publisher revenue is threatened by zero‑click/agent mediation; the proposed tiered token economics is an attempt to internalize that externality.
- Subscription/token flows could shift bargaining power toward large agent platforms (who buy bulk access or negotiate enterprise tiers), risking concentration unless standards enable interoperability and competitive access.
Market structure and incentives
- Clear agent identity and intent metadata make it feasible to differentiate between benign personal assistants and large‑scale commercial scrapers, enabling differentiated pricing and potentially lowering friction for personal uses.
- Publishers can adopt agent‑optimized offerings (ATML) that command premium token prices, creating incentives to produce agent‑friendly structured content.
- There is a risk of rent extraction by gatekeepers (CDNs, cloud providers) who control agent access enforcement; regulatory or standardization interventions may be needed to prevent anti‑competitive pricing.
Externalities and public goods
- Free/limited tiers are necessary to preserve public information access and innovation (research, small creators). Designing these tiers properly is an economic and political challenge.
- Provenance and supervision metadata increase verification costs but generate positive externalities (improved epistemic integrity), which may justify collective funding or platform subsidization.
Training data markets and epistemic quality
- Preventing epistemic recursion preserves the quality of training data over time; without interventions, model collapse imposes negative externalities on all downstream models, which supports collective action (standards, payments, regulation).
- Commissioned content and provenance chains could create a market for high‑quality, human‑anchored datasets, changing the economics of model pretraining and fine‑tuning.
Policy and governance implications
- Standards (agents.txt, agent headers, ATML, provenance chains) must be interoperable and governance must prevent capture by a few firms.
- Regulators may need to clarify obligations for agent platforms (disclosure, payments to publishers, provenance verification) to protect publishers and public knowledge.
- Antitrust and platform regulation considerations: pricing regimes and access controls must be monitored to avoid exclusionary practices that entrench dominant agent platforms.

Overall, the paper reframes the economics of the web under agent mediation: it argues for economically coherent, intent‑aware pricing and provenance mechanisms that align incentives across users, agents, publishers, and platforms to sustain both commercial supply and epistemic integrity.

Assessment

Paper Type: theoretical

Evidence Strength: n/a. This is a conceptual and design paper without empirical tests, experiments, or causal identification; claims are argued theoretically rather than demonstrated with data.

Methods Rigor: n/a. The paper develops principled architectures, protocols, and economic frameworks but does not present formal models, simulations, pilots, or empirical evaluation to validate feasibility or effects.

Sample: No empirical sample. The paper is a normative, architecture-and-policy proposal drawing on descriptive characterization of the web, a threat model (epistemic recursion), and design ideas (HTTP metadata, token subscriptions, ATML, provenance chains).

Themes: governance, adoption

Generalizability
- No empirical validation; practical performance, costs, and incentives are untested.
- Depends on broad technical adoption across web servers, CDNs, browsers, and AI platforms; adoption barriers vary by ecosystem.
- Legal and regulatory differences across jurisdictions may limit implementability (privacy, liability, intermediary rules).
- Economic feasibility is uncertain for small publishers and long-tail content creators.
- Assumes agents can be reliably identified and authenticated without creating new privacy or surveillance harms.
- Might not generalize to all agent types (closed-source LLMs, peer-to-peer agents, enterprise crawlers).

Claims (14)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
The World Wide Web was built on an assumption held for three decades: the primary consumer of web content is a human being.	Adoption Rate	null_result	primary consumer of web content (human vs agent)	Reading fidelity: high; study strength: low	0.06
This human-centric assumption permeates every layer of the web: its access model presumes human visitors.	Adoption Rate	null_result	access model's intended client (human vs agent)	Reading fidelity: high; study strength: low	0.06
The web's economics rest on human attention.	Market Structure	null_result	basis of web economic models (human attention reliance)	Reading fidelity: high; study strength: low	0.06
Web content is targeted to human perception.	Consumer Welfare	null_result	target audience of web content (human perception)	Reading fidelity: high; study strength: low	0.06
The rapid emergence of AI agents as intermediaries between humans and web content invalidates the web's human-first assumption.	Automation Exposure	negative	validity of human-first assumption given agent intermediaries	Reading fidelity: high; study strength: low	0.06
The web resists agents through blanket blocking, CAPTCHA-based exclusion, and economic models that treat agent access as extraction rather than legitimate interaction.	Adoption Rate	negative	mechanisms of web resistance to agents (blocking, CAPTCHAs, economic treatment)	Reading fidelity: high; study strength: low	0.06
Agents acting for humans should inherit equivalent access rights, governed by rate limiting and agent identification metadata in HTTP requests (analogous to browser headers).	Governance and Regulation	positive	access rights for agents relative to humans	Reading fidelity: high; study strength: speculative	0.02
A dual-layer architecture should serve human-readable and agent-optimized content from the same domain.	Adoption Rate	positive	architecture serving human vs agent-optimized content	Reading fidelity: high; study strength: speculative	0.02
An intent-based tier framework should ground agent economics in the agent-as-human-proxy principle: an agent's economic obligation mirrors that of the human it represents.	Market Structure	positive	economic obligations of agents relative to humans	Reading fidelity: high; study strength: speculative	0.02
A token-based subscription model can meter content in tokens rather than pageviews.	Firm Revenue	positive	metering method for content consumption (tokens vs pageviews)	Reading fidelity: high; study strength: speculative	0.02
A commissioned content economy can anchor AI content production in human intentionality.	Innovation Output	positive	degree to which AI content production is anchored in human intentionality	Reading fidelity: high; study strength: speculative	0.02
Epistemic recursion (AI-generated content consumed by agents to produce further content) progressively detaches web knowledge from human ground truth.	AI Safety and Ethics	negative	divergence of web knowledge from human-grounded truth due to recursive agent consumption/production	Reading fidelity: high; study strength: low	0.06
Agent Text Markup Language (ATML), a four-level human supervision tier model, and a cryptographic provenance chain can counter the epistemic recursion threat.	AI Safety and Ethics	positive	mechanisms to ensure provenance and human supervision for agent-mediated content	Reading fidelity: high; study strength: speculative	0.02
Together these proposals constitute ten design principles for an agent-first internet that requires renegotiating the web's foundational social contract across access, economics, and content.	Governance and Regulation	positive	need for renegotiation of web governance/social contract to integrate agents	Reading fidelity: high; study strength: speculative	0.02