A blueprint for human–AI complementarity: firms that invest in team composition, shared mental models, attention/orchestration, and continuous training can achieve team performance that exceeds that of humans or AI alone; without these sociotechnical investments, AI’s productivity gains will be limited and uneven.
As artificial intelligence (AI) becomes embedded in critical decisions involving health, safety, finance, and governance, the key challenge is no longer whether humans and AI will collaborate, but how to structure this collaboration to achieve true complementarity: conditions under which Human–AI teams outperform either humans or AI alone. This paper advances the science of Human–AI teaming for decision-making by integrating insights from cognitive science, AI, human factors, organizational behavior, and ethics. We propose a framework grounded in collective intelligence and anchored in the foundational processes of reasoning, memory, and attention for understanding and engineering effective Human–AI teams. We examine how Human–AI teams can achieve complementarity and identify the sociotechnical factors that shape their effectiveness, including team composition, trust calibration, shared mental models, training, and task structure. We then outline design principles for achieving complementarity: defining goals and constraints, partitioning roles, orchestrating attention and interrogation, building knowledge infrastructures, and establishing continuous training and evaluation. We conclude with theoretical, practical, and policy implications, emphasizing alignment with human values, accountability, and equity. Taken together, these insights offer a roadmap for building Human–AI teams that are not only high-performing and adaptive but also transparent, trustworthy, and fundamentally human-centered.
Summary
Main Finding
The paper develops a unifying, interdisciplinary framework for engineering Human–AI teams that achieve true complementarity — i.e., teams whose joint performance exceeds that of humans or AI alone. Grounded in collective intelligence and centered on the cognitive processes of reasoning, memory, and attention, the framework identifies sociotechnical levers (team composition, trust calibration, shared mental models, training, task structure) and design principles (define goals/constraints, partition roles, orchestrate attention/interrogation, build knowledge infrastructures, continuous training/evaluation) needed to build high-performing, transparent, trustworthy, and equitable Human–AI teams.
Key Points
- Complementarity focus: The central challenge is structuring interactions so humans and AI amplify each other’s strengths rather than substitute for one another.
- Foundational cognitive lens: Effective teaming depends on aligning AI capabilities with human reasoning, memory, and attention processes.
- Sociotechnical determinants: Effectiveness is shaped not only by AI algorithms but by team composition, trust calibration, shared mental models, training regimes, and how tasks are structured.
- Practical design principles:
  - Explicitly define goals and constraints that guide joint decision-making.
  - Partition roles to allocate tasks based on relative strengths (e.g., pattern detection vs. normative judgment).
  - Orchestrate attention and interrogation: design interfaces and workflows that manage what humans and AI focus on and how they challenge/verify each other (see the code sketch after this list).
  - Build knowledge infrastructures that capture, curate, and make accessible team knowledge and provenance.
  - Institute continuous training, evaluation, and feedback loops to adapt teams over time.
- Ethics and governance: Emphasizes alignment with human values, clear accountability, transparency, and equity across users and stakeholders.
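To make the "partition roles" and "orchestrate attention and interrogation" principles concrete, here is a minimal, hypothetical sketch, not drawn from the paper itself: an AI model screens cases for patterns, and cases that are low-confidence or require normative judgment are escalated to a human reviewer, with every step logged for audit and retraining. All names, thresholds, and fields below are illustrative assumptions.

```python
# Hypothetical sketch of role partitioning and attention orchestration.
# The AI handles pattern detection; low-confidence or normatively sensitive
# cases are escalated to a human, and each step is logged for auditability.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Decision:
    case_id: str
    label: str
    decided_by: str                 # "ai" or "human"
    ai_confidence: float
    provenance: List[str] = field(default_factory=list)


def route_case(case_id: str,
               ai_predict: Callable[[str], Tuple[str, float]],
               human_review: Callable[[str, str, float], str],
               needs_normative_judgment: bool,
               confidence_threshold: float = 0.90) -> Decision:
    """Partition roles: AI screens; humans decide contested or value-laden cases."""
    label, confidence = ai_predict(case_id)
    provenance = [f"ai_suggestion={label} (p={confidence:.2f})"]

    if needs_normative_judgment or confidence < confidence_threshold:
        # Orchestrated interrogation: the human sees the AI's suggestion and
        # confidence, may overrule it, and the override is logged for retraining.
        final_label = human_review(case_id, label, confidence)
        provenance.append(f"human_review={final_label}")
        return Decision(case_id, final_label, "human", confidence, provenance)

    return Decision(case_id, label, "ai", confidence, provenance)


if __name__ == "__main__":
    # Stub model and reviewer, purely for illustration.
    ai = lambda cid: ("approve", 0.72)
    reviewer = lambda cid, label, conf: "deny"
    print(route_case("case-001", ai, reviewer, needs_normative_judgment=False))
```

The escalation threshold is itself a teaming parameter: set too high it overloads human attention, set too low it invites automation bias, which is why it belongs in the continuous training and evaluation loop described above.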
Data & Methods
- Methodological type: Conceptual and integrative. The paper synthesizes literatures across cognitive science, AI, human factors, organizational behavior, and ethics to produce a theoretical framework and design guidelines.
- Typical methods used or recommended:
  - Cross-disciplinary literature review and conceptual modeling to identify processes and levers.
  - Design heuristics and principled prescriptions rather than empirical causal claims.
  - Suggested empirical evaluation strategies (implied): controlled experiments, field trials, simulation studies, and continuous measurement for deployment.
- What is and isn’t present: The contribution is primarily theoretical and prescriptive (framework + design principles). It appears to propose evaluation approaches but does not report large-scale empirical datasets or causal identification of economic outcomes in deployments.
Implications for AI Economics
- Rethinking complementarity in production models:
  - Move beyond simple task-automation substitution models to include team-level complementarities (interaction effects between human skill and AI capabilities).
  - Incorporate cognitive-process primitives (reasoning, memory, attention) as inputs or modifiers in production functions where feasible (an illustrative specification is sketched after these bullets).
- Labor markets and skill demand:
  - Demand will increase for skills that support effective teaming (interpretation, interrogation of AI, systems orchestration, shared-model building) rather than routine task execution alone.
  - Training and re-skilling investments become critical; firms capture value not only from AI tech but from organizational practices that realize complementarity.
- Measurement and empirical strategies:
  - Empirical work should quantify team-level performance gains, trust calibration, error types, and distributional outcomes. Suggested methods include randomized interventions (training, interface designs), difference-in-differences on phased rollouts, lab/field experiments, and structural models that allow interaction terms between human skill and AI quality (an estimation sketch follows after these bullets).
  - Key measurable outcomes: accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
- Firm strategy and adoption:
  - Adoption returns depend on sociotechnical investments (training, redesign, knowledge infrastructure); the price/performance of AI alone is an incomplete predictor.
  - Optimal division of labor requires analysis of comparative advantage at the task and subtask level; task decomposition and orchestration are economically consequential.
- Policy and regulation:
  - Policies should incentivize transparency, auditability, and standards for human–AI interfaces to ensure accountability and equitable outcomes.
  - Support for workforce development, certification of teaming practices, and liability frameworks that reflect shared human–AI decision processes will shape distributional effects.
- Research agenda for economists:
  - Estimate magnitudes of team-level complementarities across industries and tasks.
  - Study how organizational practices and regulations mediate firm-level returns to AI.
  - Analyze dynamic effects: how learning, trust calibration, and evolving shared models change productivity and inequality over time.
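As a hedged illustration of the production-function point above, one way to move beyond pure substitution models is to let human teaming skill and AI quality enter output with an interaction term. The notation below is ours, not the paper's.

```latex
% Illustrative specification (our notation, not the paper's): output of firm i
% at time t with capital K, labor L, human teaming skill H, and AI quality Q.
% A positive interaction coefficient gamma_{HQ} captures team-level complementarity.
Y_{it} = A_{it}\, K_{it}^{\alpha} L_{it}^{\beta}
         \exp\!\bigl(\gamma_{H} H_{it} + \gamma_{Q} Q_{it} + \gamma_{HQ}\, H_{it} Q_{it}\bigr),
\qquad \gamma_{HQ} > 0 \;\Rightarrow\; \text{complementarity}, \quad
       \gamma_{HQ} \le 0 \;\Rightarrow\; \text{substitution or no interaction.}
```

Cognitive-process primitives (reasoning, memory, attention support) could enter such a specification as shifters of H or Q, though how to measure them is an open empirical question.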
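And as a sketch of the measurement point above, the following minimal difference-in-differences setup on synthetic data mimics a phased AI rollout and probes complementarity with a treatment x post x human-skill interaction. Every variable name, coefficient, and number is an illustrative assumption, not a result from the paper; the statsmodels formula API is assumed for estimation.

```python
# Hedged sketch: DiD on a phased AI rollout with a human-skill interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_units, n_periods = 200, 8
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_periods),
    "period": np.tile(np.arange(n_periods), n_units),
})
df["treated"] = (df["unit"] < n_units // 2).astype(int)   # these units get AI access
df["post"] = (df["period"] >= 4).astype(int)              # phased rollout at period 4
df["human_skill"] = np.repeat(rng.normal(0.0, 1.0, n_units), n_periods)

# Synthetic outcome: a level effect of AI plus an extra gain where human skill is
# high -- the team-level complementarity the regression should recover.
df["accuracy"] = (
    0.70
    + 0.05 * df["treated"] * df["post"]
    + 0.03 * df["treated"] * df["post"] * df["human_skill"]
    + rng.normal(0.0, 0.02, len(df))
)

# DiD with unit and period fixed effects and a treated x post x human_skill term;
# standard errors clustered by unit.
model = smf.ols(
    "accuracy ~ treated:post + treated:post:human_skill + C(unit) + C(period)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(model.params.filter(like="treated"))
```

In a real deployment the same regression frame extends to the other outcomes listed above (robustness, decision consistency, misuse rates, inequity indicators) by swapping the dependent variable.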
Takeaway: For economists, this paper reframes AI’s economic impact from a technology-versus-labor story to an organizational and cognitive one — returns to AI depend crucially on how firms and institutions structure Human–AI collaboration, invest in complementary capabilities, and govern teams to realize durable, equitable productivity gains.
Assessment
Claims (16)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Human–AI teams can achieve true complementarity such that joint team performance exceeds that of humans or AI alone. (Team Performance) | positive | medium | joint team performance (overall accuracy/quality of decisions compared to individual human or AI performance) | 0.01 |
| Aligning AI capabilities with human cognitive processes (reasoning, memory, and attention) is foundational to effective Human–AI teaming. (Decision Quality) | positive | medium | team effectiveness (decision quality, error rate) as mediated by alignment with human reasoning, memory, and attention | 0.01 |
| Sociotechnical determinants (team composition, trust calibration, shared mental models, training regimes, and task structure) materially shape Human–AI team effectiveness beyond algorithmic performance alone. (Team Performance) | mixed | medium | team effectiveness/productivity (accuracy, robustness, decision consistency) conditioned on sociotechnical factors | 0.01 |
| Design principles (define goals/constraints, partition roles, orchestrate attention/interrogation, build knowledge infrastructures, continuous training/evaluation) are necessary design levers to build high-performing, transparent, trustworthy, and equitable Human–AI teams. (Team Performance) | positive | low | team performance metrics (performance, transparency/trust measures, equity indicators) when design principles are implemented | 0.01 |
| Complementarity requires structuring interactions so humans and AI amplify each other's strengths rather than substitute for one another. (Task Allocation) | positive | medium | degree of complementarity (interaction effects between human skill and AI capability on output/productivity) | 0.01 |
| Partitioning roles (assigning pattern-detection tasks to AI and normative or contextual judgment to humans) improves task allocation based on comparative strengths. (Task Allocation) | positive | medium | task performance (accuracy, speed, decision quality) under role-partitioned workflows | 0.01 |
| Orchestrating attention and interrogation through interface and workflow design helps manage what humans and AI focus on and how they challenge/verify each other, thereby reducing errors and misuse. (Error Rate) | positive | low | error detection rates, misuse rates, verification frequency, and decision accuracy | 0.01 |
| Building knowledge infrastructures that capture, curate, and make provenance accessible is necessary for team knowledge continuity, accountability, and learning. (Organizational Efficiency) | positive | medium | knowledge availability, traceability/provenance metrics, learning/adaptation speed, auditability | 0.01 |
| Instituting continuous training, evaluation, and feedback loops is required to adapt Human–AI teams over time and maintain performance. (Training Effectiveness) | positive | medium | performance trajectories over time (learning curves), calibration of trust, adaptability to novel cases | 0.01 |
| Economic models of AI impact should move beyond simple task-automation/substitution frameworks to incorporate team-level complementarities and cognitive-process primitives (reasoning, memory, attention). (Other) | mixed | medium | accuracy of production-function or labor-impact models when team-level interaction terms and cognitive-process inputs are included | 0.01 |
| Labor demand will increasingly favor skills that support effective Human–AI teaming (interpretation, interrogation of AI, systems orchestration, shared-model building) rather than routine task execution. (Employment) | positive | medium | labor demand by skill type (employment shares, wage growth for non-routine teaming skills vs. routine tasks) | 0.01 |
| Firm returns to AI adoption depend crucially on sociotechnical investments (training, redesign, knowledge infrastructure), so AI price/performance alone is an incomplete predictor of adoption returns. (Firm Productivity) | mixed | medium | firm-level productivity/returns to AI adoption conditional on investments in sociotechnical assets | 0.01 |
| Empirical evaluation strategies for Human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality. (Research Productivity) | null_result | high | appropriate empirical identification of team-level complementarities and causal impacts (measured via RCTs, DiD, structural estimates) | 0.02 |
| Key measurable outcomes to assess Human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators. (Output Quality) | null_result | high | accuracy, efficiency, robustness, consistency, trust/misuse rates, training costs, equity measures | 0.02 |
| Policy should incentivize transparency, auditability, standards for human–AI interfaces, workforce development, certification of teaming practices, and liability frameworks to ensure accountability and equitable outcomes. (Governance And Regulation) | positive | low | policy outcomes such as levels of transparency, auditability, workforce skill development, liability clarity, and distributional equity | 0.01 |
| The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes. (Research Productivity) | null_result | high | presence/absence of empirical datasets or causal identification studies in the paper | 0.02 |