The Commonplace

Seventeen ways to pair humans with LLMs: the chosen interaction pattern materially alters recommendations, error types, and who bears responsibility. In clinical diagnostics, different archetypes produce systematic differences in outputs and reliance, with nontrivial consequences for productivity, liability, and adoption.

Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making
S. Chappidi, Jatinder Singh, A. Krauze · Fetched March 15, 2026
Source: semantic_scholar · Paper type: descriptive · Evidence: medium · Relevance: 7/10
The paper defines 17 reusable human–LLM interaction archetypes and shows that which archetype is adopted meaningfully changes LLM recommendations, error profiles, and downstream decisions in clinical diagnostic cases, with attendant tradeoffs for control, accountability, and information needs.

LLMs are increasingly supporting decision-making across high-stakes domains, requiring critical reflection on the socio-technical factors that shape how humans and LLMs are assigned roles and interact during human-in-the-loop decision-making. This paper introduces the concept of human-LLM archetypes -- defined as recurring socio-technical interaction patterns that structure the roles of humans and LLMs in collaborative decision-making. We describe 17 human-LLM archetypes derived from a scoping literature review and thematic analysis of 113 LLM-supported decision-making papers. Then, we evaluate these diverse archetypes across real-world clinical diagnostic cases to examine the potential effects of adopting distinct human-LLM archetypes on LLM outputs and decision outcomes. Finally, we present relevant tradeoffs and design choices across human-LLM archetypes, including decision control, social hierarchies, cognitive forcing strategies, and information requirements. Through our analysis, we show that selection of human-LLM interaction archetype can influence LLM outputs and decisions, bringing important risks and considerations for the designers of human-AI decision-making systems.

Summary

Main Finding

The paper introduces "human-LLM archetypes" — recurring socio-technical interaction patterns that structure how humans and large language models (LLMs) share roles in decision-making — and shows that which archetype is chosen meaningfully affects LLM outputs and downstream decisions. The authors identify 17 archetypes from a scoping review of the literature and demonstrate, using real-world clinical diagnostic cases, that archetype selection produces systematic differences in outputs, with important tradeoffs for control, accountability, and information needs.

Key Points

  • Concept: Human-LLM archetypes are reusable interaction patterns (who leads, who verifies, how information flows, control and veto rights) that shape human–LLM collaboration in decision-making.
  • Evidence base: The authors conducted a scoping literature review and thematic analysis of 113 papers to derive 17 archetypes spanning modes from purely assistive/advisory roles to delegated automated decision-making.
  • Empirical evaluation: The paper evaluates these archetypes across actual clinical diagnostic cases to show that archetype choice influences model outputs and final decisions (e.g., differences in recommendations, error patterns, and human reliance).
  • Tradeoffs identified: Four main design dimensions differentiate archetypes and drive tradeoffs:
    • Decision control (who makes final decisions; veto rights)
    • Social hierarchies and perceived authority (how human users defer to LLMs)
    • Cognitive forcing strategies (structures that reduce bias/automation bias)
    • Information requirements (what data/context the LLM and human need)
  • Risks called out: automation bias, over-/under-reliance, opaque delegation of responsibility, mismatched information flows, and variability of outcomes across archetypes.
  • Design implications: Choosing an archetype is a substantive design choice that affects performance, accountability, and the socio-technical fit for a domain.
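The four design dimensions above can be pictured as a small configuration record attached to each archetype. A minimal Python sketch, where the field names, example archetypes, and accountability rule are my own illustrations rather than the paper's formalism:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Archetype:
    """One interaction pattern, described by the four design dimensions."""
    name: str
    final_decider: str        # decision control: "human", "llm", or "shared"
    human_veto: bool          # veto rights: can the human override the LLM?
    cognitive_forcing: bool   # e.g., LLM output withheld until a human draft exists
    info_needs: tuple         # information the LLM must receive for this role

advisor = Archetype(
    name="LLM-as-advisor",
    final_decider="human",
    human_veto=True,
    cognitive_forcing=True,
    info_needs=("case summary", "human's draft decision"),
)

delegate = Archetype(
    name="delegated-decision",
    final_decider="llm",
    human_veto=False,
    cognitive_forcing=False,
    info_needs=("case summary",),
)

def accountability_rests_with(a: Archetype) -> str:
    """Toy rule: whoever holds final control (or a veto) bears primary accountability."""
    return "human" if a.final_decider == "human" or a.human_veto else "llm/system"

print(accountability_rests_with(advisor))   # human
print(accountability_rests_with(delegate))  # llm/system
```

The point of the sketch is that swapping one record for another changes who decides, what the model must see, and where responsibility lands, without touching the underlying model.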

Data & Methods

  • Scoping literature review: Surveyed 113 papers on LLM-supported decision-making to identify recurring interaction patterns.
  • Thematic analysis: Qualitative coding of the literature to derive a taxonomy of 17 human-LLM archetypes and to characterize the dimensions that distinguish them.
  • Empirical evaluation: Applied the archetypes to a set of real-world clinical diagnostic cases to observe how adopting different archetypes changes LLM outputs and decision outcomes. The evaluation compared archetype-conditioned behaviors and highlighted domain-specific effects (diagnostics chosen as a high-stakes testbed).
  • Analysis focused on comparative patterns and tradeoffs rather than producing a single “best” archetype; emphasis on demonstrating sensitivity of outputs to interaction design choices.
  • Limitations: The empirical testbed is clinical diagnostics (domain-specific); results may vary across domains and with different LLMs, user populations, or incentive structures.
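One way to picture the archetype-conditioned evaluation is as role-specific wrapping of the same case before it reaches the LLM. The templates, archetype names, and clinical vignette below are illustrative assumptions, not the paper's actual protocol:

```python
# Hypothetical sketch: the same clinical case is framed differently
# depending on which archetype governs the interaction.
ARCHETYPE_PROMPTS = {
    "advisor": (
        "You are a second-opinion assistant. A clinician has proposed a "
        "diagnosis; list supporting and contradicting evidence, but do not "
        "issue your own diagnosis."
    ),
    "delegate": (
        "You are the deciding diagnostician. State a single most likely "
        "diagnosis and the next management step."
    ),
}

def build_prompt(archetype: str, case_text: str, human_draft=None) -> str:
    """Wrap one case in the role instructions of the chosen archetype."""
    parts = [ARCHETYPE_PROMPTS[archetype], f"Case: {case_text}"]
    if human_draft is not None:  # some archetypes require the human's draft
        parts.append(f"Clinician's draft: {human_draft}")
    return "\n\n".join(parts)

case = "58-year-old with acute chest pain and ST elevation on ECG."
p1 = build_prompt("advisor", case, human_draft="Acute MI")
p2 = build_prompt("delegate", case)

# Same case, different role framing: the inputs to the model already differ,
# so outputs, error patterns, and human reliance can plausibly differ too.
print(p1 != p2)  # True
```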

Implications for AI Economics

  • Labor allocation and complementarities: Archetype choice affects task reallocation between humans and LLMs (substitution vs. augmentation). Economic models of automation should incorporate archetype heterogeneity when predicting labor demand and skill premiums.
  • Productivity vs. quality tradeoffs: Different archetypes shift the marginal productivity and error profiles of human–AI teams. Economists should account for how archetype-driven changes in accuracy and error types affect welfare, costs, and adoption incentives.
  • Incentives, liability, and contract design: Archetypes that delegate more control to LLMs change liability exposure and moral hazard. Procurement, contracts, and insurance pricing will need to reflect who holds final decision authority and verification duties.
  • Information asymmetries and signaling: Archetypes alter information flows and observability (e.g., whether human actions or model reasoning are visible). This affects signaling, trust, and market mechanisms (hiring, certification, pricing of services).
  • Market structure and adoption dynamics: Perceived authority of LLMs (social hierarchy effects) can speed adoption but increase systemic risk. Platform providers and firms will choose archetypes also based on liability and reputation tradeoffs, influencing market concentration and standards.
  • Regulation and governance: Regulatory frameworks should consider archetype-specific risks (e.g., mandatory human-in-the-loop vs. human-on-the-loop) rather than treating all AI deployment modes the same.
  • Valuation and measurement: When valuing AI contributions (for billing, revenue sharing, or performance pay), economists need metrics that capture the joint human–LLM contribution under different archetypes.
  • Research agenda for economists: Empirically quantify welfare impacts of archetypes across sectors, model principal–agent problems induced by archetype selection, design incentives and contract forms that mitigate adverse reliance, and evaluate externalities from archetype-driven systemic errors.
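As a toy illustration of the productivity and liability tradeoff (all numbers invented, not taken from the paper), a per-case expected-cost comparison between a reviewed "advisor" archetype and an unreviewed "delegated" archetype:

```python
# Toy model: errors slip through at rate llm_error_rate * (1 - human_catch_rate);
# every reviewed case also pays the cost of human review time.
def expected_cost(llm_error_rate, human_catch_rate, error_cost, review_cost):
    """Per-case expected cost of an archetype under the assumptions above."""
    residual_error = llm_error_rate * (1.0 - human_catch_rate)
    return residual_error * error_cost + review_cost

# Advisor archetype: human reviews every case, catching 70% of LLM errors.
advisor = expected_cost(llm_error_rate=0.10, human_catch_rate=0.7,
                        error_cost=1000.0, review_cost=20.0)

# Delegated archetype: no human review, so no review cost and no catches.
delegated = expected_cost(llm_error_rate=0.10, human_catch_rate=0.0,
                          error_cost=1000.0, review_cost=0.0)

print(round(advisor, 1), round(delegated, 1))  # 50.0 100.0
```

Even in this crude form, which archetype is cheaper flips as error cost, catch rate, or review cost shift, which is the economic sense in which archetype choice is a substantive parameter rather than a UI detail.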

Actionable takeaways for AI economists

  • When modeling automation impacts, include interaction-design choices (archetypes) as key parameters that affect substitution, productivity, and risk.
  • In policy and contract design, distinguish deployment archetypes to allocate liability and incentives appropriately.
  • Prioritize empirical studies across multiple domains to measure how archetype heterogeneity alters labor demand, error externalities, and adoption paths.

Assessment

Paper Type: descriptive
Evidence Strength: medium — Provides multimethod evidence: a systematic scoping review (113 papers) that grounds a taxonomy, plus empirical tests showing consistent differences across archetypes in a high‑stakes domain; however, the empirical evaluation is domain‑limited (clinical diagnostics), sample size and model/user variation are not broadly representative, and there is no experimental randomization or economic outcome measurement to establish external causal effects on labor or productivity.
Methods Rigor: medium — Review and thematic coding appear systematic and comprehensive for the stated scope, and the applied tests use real-world cases to demonstrate sensitivity to interaction design; nevertheless, the empirical component lacks strong identification features (e.g., RCT or natural experiment), details on case/sample size and LLM variation are limited, and qualitative coding may involve subjective judgment without reported interrater reliability in the summary.
Sample: Scoping review of 113 papers on LLM-supported decision-making (qualitative coding to derive archetypes); empirical evaluation applied those archetypes to a set of real-world clinical diagnostic cases (number of cases and specific LLM(s)/human participants not specified in the summary), comparing archetype-conditioned LLM outputs, error patterns, and human reliance across cases.
Themes: human_ai_collab, labor_markets, productivity, governance
Identification: Scoping literature review and qualitative thematic analysis to derive 17 interaction "archetypes", plus a comparative empirical evaluation that applies different archetypes to a set of real-world clinical diagnostic cases to observe systematic differences in LLM outputs and downstream decisions; no randomized assignment or formal causal identification strategy.
Generalizability:
  • Empirical tests are confined to clinical diagnostics — high-stakes and domain-specific, so results may not generalize to other sectors (e.g., finance, customer service, manufacturing).
  • Limited information on the diversity and number of cases, LLM models, and user populations reduces external validity.
  • No randomized or longitudinal design to assess long-run adoption, labor reallocation, or productivity impacts across firms or workers.
  • The archetypes are derived from existing literature and may be biased by publication and domain sampling in the reviewed corpus.
  • Qualitative coding and taxonomy construction may reflect authors' interpretive choices; cultural and institutional contexts could alter archetype behavior.

Claims (7)

Claim · Direction · Confidence · Outcome · Details

  • The paper introduces the concept of human-LLM archetypes, defined as recurring socio-technical interaction patterns that structure the roles of humans and LLMs in collaborative decision-making.
    AI Safety And Ethics · null_result · high · Outcome: conceptual framework (existence and definition of human-LLM archetypes) · 0.18
  • We describe 17 human-LLM archetypes derived from a scoping literature review and thematic analysis of 113 LLM-supported decision-making papers.
    AI Safety And Ethics · null_result · high · Outcome: number and characterization of human-LLM archetypes (17 archetypes identified) · n=113 · 17 archetypes identified · 0.18
  • We evaluate these diverse archetypes across real-world clinical diagnostic cases to examine the potential effects of adopting distinct human-LLM archetypes on LLM outputs and decision outcomes.
    Decision Quality · mixed · medium · Outcome: LLM outputs and decision outcomes in clinical diagnostic cases · 0.11
  • Selection of human-LLM interaction archetype can influence LLM outputs and decisions.
    Decision Quality · mixed · medium · Outcome: changes in LLM outputs and decision outcomes associated with different human-LLM archetypes · 0.11
  • Selection of a human-LLM archetype brings important risks and considerations for the designers of human-AI decision-making systems.
    AI Safety And Ethics · negative · medium · Outcome: identified risks and design considerations for system designers · 0.11
  • The paper presents relevant tradeoffs and design choices across human-LLM archetypes, including decision control, social hierarchies, cognitive forcing strategies, and information requirements.
    AI Safety And Ethics · null_result · high · Outcome: catalog of tradeoffs and design considerations across archetypes (decision control, social hierarchies, cognitive forcing strategies, information requirements) · 0.18
  • LLMs are increasingly supporting decision-making across high-stakes domains, requiring critical reflection on the socio-technical factors that shape how humans and LLMs are assigned roles and interact during human-in-the-loop decision-making.
    AI Safety And Ethics · positive · medium · Outcome: trend of increased use of LLMs in high-stakes decision-making domains (motivation for study) · 0.11
