How an AI looks matters. In an escape-room field experiment, teams working with embodied agents performed unevenly: some mixed teams excelled while others fared worse. All-human teams were more consistent and more likely to finish every task, but they were slower and made more errors. More embodied agents elicited conversational dynamics closer to human–human interaction, suggesting that social cues from embodiment reshape collaboration.

Teaming Up with Artificial Agents in Non-routine Analytical Tasks

Lorenzo Cominelli, F. Galatolo, Caterina Giannetti, F. Dell’orletta, Cristiano Ciaccio, Philipp Chapkovski, Giulia Venturi · Fetched May 30, 2026 · ACM Transactions on Human-Robot Interaction

Source: Semantic Scholar · Paper type: RCT · Evidence: medium · Relevance: 7/10 · Summary only (PDF status: not found)

Structured author observations

Linked only from stored provider relations; the raw author line above is never matched by name.

OpenAlex

Latest observation: July 23, 2026

Lorenzo Cominelli exact ORCID
Federico Galatolo exact ORCID
Caterina Giannetti exact ORCID
Felice Dell’Orletta exact ORCID
Cristiano Ciaccio exact ORCID
Philipp Chapkovski exact ORCID
Giulia Venturi exact ORCID

Semantic Scholar

Latest observation: July 23, 2026

Lorenzo Cominelli provider ID
F. Galatolo provider ID
Caterina Giannetti provider ID
F. Dell’orletta provider ID
Cristiano Ciaccio provider ID
Philipp Chapkovski provider ID
Giulia Venturi provider ID

Embodiment of artificial agents alters team performance and conversational patterns in an escape-room experiment: mixed human–AI teams show greater outcome variance and more human-like dialogue as embodiment increases, while human-only teams are more consistent but slower and make more errors.

Citation observations

Cumulative provider counts captured on specific dates; providers are never combined.

0 cumulative citations

OpenAlex · Observed July 22, 2026

0 cumulative citations

Semantic Scholar · Observed July 22, 2026

Although AI systems are becoming increasingly common in the workplace, research on their integration into human teams remains limited. In particular, little is known about how the embodiment of artificial agents shapes collaboration and performance in non-routine analytical tasks. To address this gap, we examine how different degrees of embodiment affect team performance and conversational dynamics in a real-life escape room. Teams composed of either three humans or two humans and an artificial agent (a Box, an Avatar, or a hyper-realistic humanoid) worked together to escape the room within a time limit. Our findings show that artificial agents have an uneven impact on team outcomes, with some mixed human–AI teams performing exceptionally well and others markedly worse. Human-only teams, by contrast, display more consistent performance: they are more likely to complete all tasks successfully, although they take longer and commit more errors. We also document a suggestive non-linear relationship between embodiment and team performance. Teams interacting with more embodied agents display conversational patterns that more closely resemble human–human dialogue. Together, these findings show that embodied AI shapes collaboration in complex ways, reinforcing evidence that social cues critically guide teamwork dynamics.

Summary

Main Finding

Embedding artificial agents into small teams changes collaboration in complex, non-monotonic ways. Mixed human–AI teams show higher variance in outcomes (some outperform human-only teams, others underperform), while human-only teams are more consistent—more likely to complete all tasks but slower and with more observable errors. Higher degrees of physical/social embodiment in the agent make conversational dynamics more similar to human–human dialogue, suggesting social cues from embodiment materially shape teamwork.

Key Points

Study context: real-world escape room with a time limit; teams were either three humans or two humans plus one artificial agent.
Embodiment treatments: low (Box), medium (Avatar), high (hyper-realistic humanoid).
Outcome heterogeneity: mixed teams produced a wide spread of performance (some excelled, others performed poorly), whereas human-only teams performed more consistently, although more slowly and with more errors.
Completion vs efficiency trade-off: human-only teams tended to complete all tasks more often but required more time and committed more errors; mixed teams could be faster and occasionally better, but also riskier.
Non-linear embodiment effects: relationship between embodiment level and team performance is suggestive of non-linearity (not simply “more embodiment = better”).
Conversational dynamics: teams paired with more embodied agents exhibited dialogue patterns that more closely resembled human–human interaction (e.g., turn-taking, grounding, backchanneling), suggesting that embodiment triggers social modes of interaction.

Data & Methods

Field experiment in a naturalistic, task-oriented setting (escape room) to study non-routine analytical teamwork under time pressure.
Between-subjects design comparing:
- Human-only teams (3 humans)
- Mixed teams (2 humans + 1 artificial agent) with three embodiment conditions (Box, Avatar, humanoid)
Performance metrics included task completion, time-to-complete, and error counts; conversational measures captured interaction patterns and coordination dynamics.
Analysis focused on outcome distributions (variance, completion rates), comparisons across embodiment conditions, and conversational feature similarity to human–human dialogue.
Limitations: context-specific (escape-room tasks, short-duration interactions), potential novelty effects from interacting with embodied agents, and suggestive (rather than definitive) evidence of non-linearity — replication and broader task domains needed.
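
The outcome-distribution comparison described above (completion rates, time, errors, and their spread by condition) can be sketched in Python. This is a minimal illustration, not the authors' analysis code: the `TeamOutcome` record, its field names, and the condition labels are hypothetical, and using population variance for the spread is an arbitrary choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class TeamOutcome:
    # Hypothetical per-team record; field names are illustrative, not from the paper.
    condition: str   # e.g. "human_only", "box", "avatar", "humanoid"
    completed: bool  # solved every task before the time limit
    minutes: float   # time used
    errors: int      # observed errors


def summarize(teams):
    """Group teams by condition; report completion rate, mean/variance of time, mean errors."""
    groups = defaultdict(list)
    for team in teams:
        groups[team.condition].append(team)

    summary = {}
    for condition, rows in groups.items():
        times = [r.minutes for r in rows]
        summary[condition] = {
            "n": len(rows),
            "completion_rate": sum(r.completed for r in rows) / len(rows),
            "mean_minutes": mean(times),
            # Population variance of time: a simple measure of outcome spread.
            "var_minutes": pvariance(times),
            "mean_errors": mean(r.errors for r in rows),
        }
    return summary
```

Comparing `var_minutes` and `completion_rate` across conditions is one simple way to surface the kind of heterogeneity described for mixed teams.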

Implications for AI Economics

Productivity vs. Risk: Introducing embodied AI can sharply raise productivity in some teams, but it also widens outcome variance because other mixed teams perform markedly worse. Firms should weigh the potential upside against heightened downside risk (failed tasks, coordination breakdowns).
Complementarity and Team Composition: Embodiment affects social complementarity. Firms should consider which tasks and team compositions benefit from embodied agents versus those better served by human-only teams.
Investment decisions: Embodiment is an investable feature (hardware/appearance, interface design). Returns may be non-linear — incremental spending on embodiment may yield disproportionate changes in interaction dynamics; cost–benefit assessments should account for variance, not just mean gains.
Measurement and incentives: Standard productivity metrics (means) may hide important distributional effects. Contracting and incentives should account for tails (e.g., reliability bonuses, insurance against coordination failures).
Adoption strategy and training: Because social cues matter, deploying embodied agents may require complementary investments in human training, onboarding, and workflow redesign to reduce failure modes and stabilize performance.
Labor markets and task allocation: Embodied AI could shift which tasks are assigned to machines vs humans, altering skill demand. Highly social or coordination-heavy roles may change in value depending on agent embodiment quality.
Policy and organizational design: Regulators and firms should monitor not just average productivity impacts but also inequality of outcomes, error risk, and accountability when agents introduce stochastic team performance.
Research priorities: Evaluate long-run adaptation (do humans learn to coordinate reliably with embodied agents?), expand to varied tasks and settings, quantify the sources of non-linearity, and model economic trade-offs of embodiment investments.
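
The recommendation that cost–benefit assessments weigh variance, not just mean gains, can be illustrated with a standard mean-variance score, U = mean − (λ/2)·variance. This is a sketch under stated assumptions: the payoff figures are invented for illustration, and the risk-aversion parameter λ (`risk_aversion`) is a hypothetical choice, not something the study estimates.

```python
from statistics import mean, pvariance


def mean_variance_score(payoffs: list[float], risk_aversion: float) -> float:
    """Mean payoff minus a variance penalty: U = mean - (risk_aversion / 2) * variance.

    risk_aversion is a hypothetical parameter; 0 means a risk-neutral evaluator.
    """
    return mean(payoffs) - 0.5 * risk_aversion * pvariance(payoffs)


# Hypothetical per-team payoffs, invented for illustration only.
steady_human_only = [8.0, 9.0, 10.0]  # lower mean, low spread
volatile_mixed = [2.0, 10.0, 18.0]    # higher mean, high spread

for lam in (0.0, 0.1):
    print(lam,
          round(mean_variance_score(steady_human_only, lam), 3),
          round(mean_variance_score(volatile_mixed, lam), 3))
```

With a risk-neutral evaluator (λ = 0) the higher-mean but volatile option ranks first; with even modest risk aversion the steadier option can rank higher, which is why mean gains alone can mislead.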

Assessment

Paper Type: rct
Evidence Strength: medium. The study uses experimental variation in embodiment, which supports causal claims about effects within the escape-room context; however, external validity is limited (single task type, short interactions, likely modest sample sizes, and potential novelty or Hawthorne effects), so evidence for broader economic impacts is suggestive rather than definitive.
Methods Rigor: medium. A field experiment in a real-world task is a rigorous design choice, and the conversational/process measures add depth, but likely limitations include unclear randomization details, a possibly small N, no blinding, no reported pre-registration or power calculations, and limited controls for participant selection and prior familiarity with agents.
Sample: Participants worked in teams in a real-life escape room, either three humans or two humans plus one artificial agent of one of three embodiment types (a Box, an Avatar, or a hyper-realistic humanoid). Measured outcomes include task completion, time to escape, errors, and conversational dynamics.
Themes: human_ai_collab, productivity, org_design
Identification: Between-subjects experimental manipulation of team composition and agent embodiment (teams of three humans, or two humans plus an artificial agent in one of three embodiment forms), with causal comparisons of task outcomes and conversational dynamics across conditions.
Generalizability:
- Single task type (escape-room puzzles) may not represent typical workplace analytical tasks.
- The short-duration, high-pressure setting limits applicability to ongoing team work.
- Small teams (three people) may not scale to larger organizational teams.
- The specific agent designs (Box, Avatar, humanoid) and their implementations may not generalize to other AI systems.
- Novelty or selection effects are possible if participants were volunteers or unfamiliar with embodied agents.
- A likely single-site, single-culture context limits transferability across countries or industries.

Claims (8)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
We examined how different degrees of embodiment affect team performance and conversational dynamics in a real-life escape room; teams were composed of either three humans or two humans and an artificial agent (a Box, an Avatar, or a hyper-realistic humanoid).	Other	null_result	team composition / experimental manipulation (embodiment)	Reading fidelity: high; study strength: high	not reported; 1.0
Artificial agents have an uneven impact on team outcomes, with some mixed human–AI teams performing exceptionally well and others markedly worse.	Team Performance	mixed	team outcomes / performance variability	Reading fidelity: high; study strength: medium	not reported; 0.6
Human-only teams are more likely to complete all tasks successfully (higher task completion success) than mixed human–AI teams.	Team Performance	positive	task completion / success rate	Reading fidelity: high; study strength: medium	not reported; 0.6
Human-only teams take longer to complete the escape room than mixed human–AI teams.	Task Completion Time	negative	time to complete task	Reading fidelity: high; study strength: medium	not reported; 0.6
Human-only teams commit more errors than mixed human–AI teams.	Error Rate	negative	errors committed	Reading fidelity: high; study strength: medium	not reported; 0.6
There is a suggestive non-linear relationship between embodiment and team performance.	Team Performance	mixed	team performance as a function of embodiment	Reading fidelity: high; study strength: speculative	not reported; 0.1
Teams interacting with more embodied agents display conversational patterns that more closely resemble human–human dialogue.	Team Performance	positive	conversational pattern similarity to human–human dialogue	Reading fidelity: high; study strength: medium	not reported; 0.6
Embodied AI shapes collaboration in complex ways, and social cues critically guide teamwork dynamics.	Team Performance	positive	influence of social cues/embodiment on teamwork dynamics	Reading fidelity: medium; study strength: speculative	not reported; 0.06