Collaborative Human-Agent Protocol (CHAP)

Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions. Production deployments are no longer one human supervising one model. They are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries. The technical surface for this collaboration remains weakly specified. When an agent drafts a response and a human edits it before it ships, the moment of human judgement is the most valuable signal in the system. In current practice it is recorded, if at all, in application code, chat threads, ticket comments, and tribal memory. Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability. Neither defines the shared workspace in which humans and agents perform accountable work together. This paper presents CHAP, the Collaborative Human-Agent Protocol. Under CHAP, the override that used to vanish into a chat thread becomes a structured event carrying a diff, a rationale, and a content hash. The handoff between shifts becomes a portable envelope rather than a pinned message. The human approval of an agent's draft becomes a non-repudiable signed decision that can be replayed years later. The protocol achieves this through a small Core (workspaces, participants, tasks, artefacts, and an append-only evidence log) together with composable profiles that add review, modes, routing, deliberation, handoff, identity, signatures, and transparency-backed audit as deployments require them. Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap

Summary

Main Finding

CHAP (Collaborative Human-Agent Protocol) defines a portable, auditable collaboration layer for multi-human, multi-agent operational work. It formalises the workspace, typed task lifecycle, structured human review (overrides as diffs + rationales), append‑only evidence logs, and composable profiles (review, modes, routing, handoff, identity/signatures, SCITT-backed audit). CHAP fills a protocol gap not addressed by MCP, A2A, workflow engines, or identity systems, and is positioned to materially change how organisations govern, audit, and scale human–AI work.

Key Points

Problem statement
- Wave I/II agent paradigms (assistant, agentic tool use) leave the collaboration semantics between humans and agents underspecified.
- Typical practice fragments evidence of human judgement across UI state, chat threads, and application code, which impairs auditability, governance, and replayability.
Core proposal
- Core primitives: workspace, participants (human/agent/service/group/bridge), tasks, artefacts, append-only evidence log.
- JSON‑RPC‑2.0 inspired envelope model for events; typed lifecycle events (assign, accept, progress, review, approve, reject, override, abstain, escalate, handoff, snapshot, rollback).
- Structured overrides: diffs + rationale + content hashes; non‑repudiable signed decisions where profiles require.
- Profiles: composable extensions for review, modes/promotion, routing, whisper channels, handoff, deliberation, identity binding, signatures, and transparency/audit (e.g., SCITT).
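The structured override described above (a diff, a rationale, and a content hash carried in a JSON‑RPC‑2.0 inspired envelope) can be sketched as follows. This is a minimal illustration and not the CHAP wire format: the method name chap.task.override, the field names, and the "sha256:" prefix are assumptions made for this example, and the actual schemas are defined in the specification repository.

```python
import difflib
import hashlib
import json


def build_override_envelope(task_id, actor, before, after, rationale):
    """Build a JSON-RPC-2.0-style notification describing a human override.

    All field names here are hypothetical; they only mirror the three
    elements the summary names: diff, rationale, and content hash.
    """
    # Unified diff between the agent's draft and the human's edited text.
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="agent_draft",
            tofile="human_edit",
        )
    )
    # Hash of the final content, so the event can later be matched to the artefact.
    content_hash = "sha256:" + hashlib.sha256(after.encode("utf-8")).hexdigest()
    return {
        "jsonrpc": "2.0",
        "method": "chap.task.override",  # hypothetical method name
        "params": {
            "task_id": task_id,
            "actor": actor,
            "diff": diff,
            "rationale": rationale,
            "content_hash": content_hash,
        },
    }


envelope = build_override_envelope(
    task_id="task-001",
    actor="human:reviewer-1",
    before="Refund approved for 200 EUR.\n",
    after="Refund approved for 120 EUR.\n",
    rationale="Policy caps refunds at the purchase price.",
)
print(json.dumps(envelope, indent=2))
```

Keeping the diff and the hash in the same event means a later reader can see both what changed and which exact content was approved, without consulting the chat thread where the edit was discussed.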
Implementation status
- v0.2 working draft (public): specification, a single reference implementation (@chap/coordinator), draft conformance suite, worked examples, and repository: https://github.com/BrightbeamAI/chap.
- Not yet standards‑track stable; production pilots encouraged but full conformance claims await more interoperable implementations and exhaustive tests.
Security, trust, compliance
- Threat model, signing & canonicalisation rules, identity binding, privacy/retention/redaction practices described.
- Designed to compose with existing identity (OIDC/OAuth), credentialing (W3C VC), tool access (MCP), agent interoperability (A2A), and audit supply‑chains (SCITT).
Runtime semantics & observability
- Informative algorithms for envelope acceptance, review depth decision, mode promotion, override analytics.
- Conformance levels and suggested evaluation questions to assess deployments.
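One common way to make an append-only evidence log tamper-evident is to chain entries by hash, so that altering any earlier event invalidates every later entry. The sketch below illustrates that general technique only. The summary does not say that CHAP uses hash chaining, and the sorted-key JSON serialisation here is a stand-in for the protocol's own canonicalisation rules; the class and field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


class EvidenceLog:
    """Illustrative append-only log in which each entry commits to its predecessor."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        """Serialise the event, chain it to the previous entry, and return its hash."""
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else GENESIS_HASH
        # Sorted keys and fixed separators give a stable byte representation.
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append(
            {"prev_hash": prev_hash, "event": payload, "entry_hash": entry_hash}
        )
        return entry_hash

    def verify(self):
        """Recompute the chain from the start; False if any entry was altered."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["event"]).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
                return False
            prev_hash = entry["entry_hash"]
        return True


log = EvidenceLog()
log.append({"type": "assign", "task_id": "task-001"})
log.append({"type": "override", "task_id": "task-001"})
print(log.verify())
```

A chain like this detects modification but does not by itself prove who wrote an entry; that is the role the summary assigns to the identity and signature profiles.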

Data & Methods

Nature of contribution: protocol specification + reference implementation + evaluation guidance (not empirical/experimental research).
Artifacts produced:
- Core data model schemas: workspace descriptor, participant descriptor, task descriptor, artefact descriptor.
- Envelope specification (JSON‑RPC‑style) for exchanging collaboration events.
- Profile catalogue and method surface (Core methods + profile methods).
- Reference implementation: @chap/coordinator (public GitHub).
- Draft conformance/evaluation harness and worked user journeys (13+ journeys, appendices with case studies).
- Informative runtime algorithms and deployment patterns (coordinator‑mediated, peer‑to‑peer, federated, on‑prem).
Interoperability & assurance approach:
- Progressive profile model (deploy minimal Core then add profiles as required).
- Conformance levels with suggested evaluation criteria and analytics (override frequency, review depth, escalation patterns).
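The analytics named above can be computed directly from typed lifecycle events. The sketch below assumes each event is a dictionary with a "type" field drawn from the lifecycle vocabulary (approve, reject, override, escalate). The metric definitions are assumptions chosen for illustration: override rate is taken over review decisions, and escalation rate over all events. The specification may define them differently.

```python
from collections import Counter


def review_metrics(events):
    """Summarise override and escalation frequency for a list of typed events."""
    counts = Counter(event["type"] for event in events)
    # Review decisions are the events in which a reviewer ruled on a draft.
    decisions = counts["approve"] + counts["reject"] + counts["override"]
    return {
        "override_rate": counts["override"] / decisions if decisions else 0.0,
        "escalation_rate": counts["escalate"] / len(events) if events else 0.0,
    }


sample_events = [
    {"type": "approve"},
    {"type": "approve"},
    {"type": "override"},
    {"type": "escalate"},
]
print(review_metrics(sample_events))
```

Because the inputs are protocol events rather than application-specific records, the same function could in principle be run over logs from different tools, which is what makes the metrics comparable across deployments.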
Limitations of the methods:
- v0.2: single reference implementation, draft test vectors; no exhaustive interoperability or field evaluation yet.
- Focus is on protocol design and practical engineering guidance rather than field trials or large‑scale empirical validation.

Implications for AI Economics

Transaction costs and coordination
- CHAP reduces frictions in multi‑actor workflows by standardising what counts as evidence and how decisions are recorded. Lower coordination costs can raise effective productivity of mixed human–agent teams and make cross‑tool collaboration more tractable.
Liability, compliance, and regulatory economics
- Structured, replayable evidence (signed overrides, provenance chains) lowers uncertainty about who did what and why, reducing legal/regulatory verification costs and potentially lowering insurer/creditor risk premia for AI‑augmented processes.
- However, producing and storing rich append‑only logs increases compliance and data‑retention costs; firms will face tradeoffs between auditability and storage/privacy costs.
Markets and modularity
- A stable collaboration protocol creates modular interfaces between agents, tool providers, identity/policy services, and audit services. That modularity can:
  - Reduce vendor lock‑in and switching costs for orchestration and review tooling.
  - Enable markets for specialised “review”, “audit”, and “compliance” services (e.g., third‑party auditors that consume CHAP logs).
  - Foster ecosystems of certified agent components and profiles, increasing competition and specialization.
Labour and task decomposition
- CHAP makes the moment and content of human judgement explicit and structured. This can:
  - Shift human roles from generating content to auditing/reviewing/overriding agent outputs.
  - Make overrides valuable training data (explicit diffs + rationales), lowering costs to retrain/improve agents and altering the returns to human reviewers (higher skill premium for those producing high‑quality rationales).
  - Enable finer‑grained measurement of human vs agent contributions, facilitating performance‑based compensation or pricing.
Pricing and contracting
- With standardized evidence and metrics (override analytics, review depth), buyers and sellers of agent services can contract on observable outcomes (e.g., override rate, escalation frequency). This supports outcome‑linked pricing models and service‑level agreements that incorporate human review burdens.
Network effects and standard adoption
- Widespread CHAP adoption would create positive network effects (shared profiles, off‑the‑shelf governance policies, interoperable audit logs). Early movers could gain scale benefits, but with only a draft spec and a single implementation, the standard risks capture unless it is opened to multiple implementers.
Compliance-driven demand and new markets
- Regulated industries (healthcare, finance, insurance, regulated manufacturing, government) are likely early adopters. Demand for CHAP‑compatible tooling — secure evidence stores, identity bindings, SCITT integration, and certified profile bundles — could become a measurable new market segment in enterprise AI.
Risks & externalities
- Privacy and retention tradeoffs: append‑only evidence logs are valuable for audit but raise privacy/regulatory burdens (GDPR, sectoral rules). Firms may face higher compliance costs and greater attack surface.
- Potential for increased surveillance of workers: fine‑grained audit logs can be used for productivity monitoring, with labor‑market and welfare implications.
- Uneven bargaining power: large platform vendors could integrate CHAP and bundle compliant ecosystems, potentially reinforcing platform dominance despite the protocol’s intent to reduce lock‑in.
Research & measurement opportunities
- Economists can study: how CHAP adoption affects (a) override rates and hence agent trust calibration; (b) costs of compliance and litigation; (c) market structure and vendor competition in agent/tool markets; (d) labor reallocation between drafting vs review tasks; and (e) price formation for mixed human‑agent services.
Policy implications
- Regulators can leverage CHAP‑style standards to specify minimum audit/evidence requirements for production agentic systems, lowering enforcement costs. But regulators must also manage privacy/retention tradeoffs and avoid standards that entrench particular vendors.

Short caveats
- CHAP is a protocol design and engineering contribution (v0.2); its economic effects are prospective and depend on adoption, interoperability, and regulatory uptake.
- Empirical validation (field pilots, cross‑firm studies) is needed to quantify impacts on transaction costs, liability exposure, labor demand, and market structure.

Assessment

Paper Type: descriptive
Evidence Strength: n/a. The paper is a standards/protocol specification and engineering proposal; it does not present empirical or causal evidence, experiments, or observational analyses to support claims about economic or productivity impacts.
Methods Rigor: medium. The work provides a clear core specification, composable profiles, a reference implementation, a conformance suite, and worked examples, indicating solid engineering rigor; however, it lacks formal verification, threat modeling, security proofs, user studies, or field deployments to validate real-world behavior and usability.
Sample: No empirical sample; this is a protocol/specification paper. Artifacts include the CHAP specification, composable profile definitions, a reference implementation, a conformance test suite, and illustrative examples hosted on the project's GitHub repository.
Themes: human_ai_collab, org_design, governance, adoption
Generalizability:
- Depends on ecosystem adoption and integration with existing toolchains and standards (MCP, A2A, proprietary systems).
- Unclear applicability across regulated sectors (healthcare, finance, legal) without domain-specific extensions or compliance proof.
- Relies on identity, signing, and cryptographic infrastructure that may vary across organizations and jurisdictions.
- Does not quantify or validate productivity, cost, or safety impacts across different organizational sizes or cultures.
- May require cultural and workflow changes that limit adoption in organizations with entrenched practices.

Claims (14)

Claim	Category	Direction	Confidence	Outcome	Details
Foundation models are moving from response generation into operational roles.	Adoption Rate	positive	high	movement of foundation models into operational roles	0.03
Agents plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions.	Decision Quality	positive	high	extent of agent capabilities and responsibilities affecting operational outputs	0.03
Production deployments are no longer one human supervising one model; they are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries.	Team Performance	null_result	high	structure of production deployments (multi-human, multi-agent)	0.03
The technical surface for this collaboration remains weakly specified.	Organizational Efficiency	negative	high	degree of specification/standardization of collaboration interfaces	0.09
When an agent drafts a response and a human edits it before it ships, the moment of human judgement is the most valuable signal in the system.	Decision Quality	positive	high	value of human judgement signal in human-agent workflows	0.03
In current practice the human judgement is recorded, if at all, in application code, chat threads, ticket comments, and tribal memory.	Organizational Efficiency	negative	high	location and durability of records of human judgement in workflows	0.09
Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability.	Governance And Regulation	null_result	high	scope of existing protocol standards	0.18
Neither MCP nor A2A defines the shared workspace in which humans and agents perform accountable work together.	Governance And Regulation	negative	high	presence/absence of specifications for shared workspace in existing standards	0.18
This paper presents CHAP, the Collaborative Human-Agent Protocol.	Governance And Regulation	positive	high	introduction of a new protocol	0.3
Under CHAP, the override that used to vanish into a chat thread becomes a structured event carrying a diff, a rationale, and a content hash.	Governance And Regulation	positive	high	nature of recorded overrides (structured events with metadata)	0.18
The handoff between shifts becomes a portable envelope rather than a pinned message under CHAP.	Organizational Efficiency	positive	high	formality and portability of handoff artifacts	0.18
The human approval of an agent's draft becomes a non-repudiable signed decision that can be replayed years later under CHAP.	Governance And Regulation	positive	high	non-repudiability and auditability of human approvals	0.18
The protocol achieves this through a small Core (workspaces, participants, tasks, artefacts, and an append-only evidence log) together with composable profiles that add review, modes, routing, deliberation, handoff, identity, signatures, and transparency-backed audit as deployments require them.	Governance And Regulation	positive	high	protocol architecture and composability of features	0.18
Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap	Other	null_result	high	availability of specification and accompanying artifacts	0.3

CHAP formalises human–agent collaboration by turning informal edits, approvals and handoffs into structured, signed, auditable events; the protocol and reference implementation aim to make multi-agent, multi-human production workflows accountable and portable across teams.