How an AI looks matters: in an escape-room field experiment, mixed human–AI teams performed unevenly (some excelled, others fared markedly worse), whereas all-human teams were more consistent but slower and made more errors. More embodied agents produced conversational dynamics closer to human–human interaction, suggesting that the social cues embodiment provides reshape collaboration.
Although AI systems are becoming increasingly common in the workplace, research on their integration into human teams remains limited. In particular, little is known about how the embodiment of artificial agents shapes collaboration and performance in non-routine analytical tasks. To address this gap, we examine how different degrees of embodiment affect team performance and conversational dynamics in a real-life escape room. Teams composed of either three humans or two humans and an artificial agent (a Box, an Avatar, or a hyper-realistic humanoid) worked together to escape the room within a time limit. Our findings show that artificial agents have an uneven impact on team outcomes, with some mixed human–AI teams performing exceptionally well and others markedly worse. Human-only teams, by contrast, display more consistent performance: they are more likely to complete all tasks successfully, although they take longer and commit more errors. We also document a suggestive non-linear relationship between embodiment and team performance. Teams interacting with more embodied agents display conversational patterns that more closely resemble human–human dialogue. Together, these findings show that embodied AI shapes collaboration in complex ways, reinforcing evidence that social cues critically guide teamwork dynamics.
Summary
Main Finding
Embedding artificial agents into small teams changes collaboration in complex, non-monotonic ways. Mixed human–AI teams show higher variance in outcomes (some outperform human-only teams, others underperform), while human-only teams are more consistent: more likely to complete all tasks, but slower and with more observable errors. Higher degrees of physical/social embodiment in the agent make conversational dynamics more similar to human–human dialogue, suggesting that social cues from embodiment materially shape teamwork.
Key Points
- Study context: real-world escape room with a time limit; teams were either three humans or two humans plus one artificial agent.
- Embodiment treatments: low (Box), medium (Avatar), high (hyper-realistic humanoid).
- Outcome heterogeneity: mixed teams showed a wide spread of performance (some excelled, others performed poorly), whereas human-only teams performed more consistently, though more slowly and with more errors.
- Completion vs efficiency trade-off: human-only teams tended to complete all tasks more often but required more time and committed more errors; mixed teams could be faster and occasionally better, but also riskier.
- Non-linear embodiment effects: the relationship between embodiment level and team performance suggests non-linearity (not simply “more embodiment = better”), though the evidence is suggestive rather than conclusive.
- Conversational dynamics: teams paired with more embodied agents exhibited dialogue patterns that resembled human–human interaction (e.g., turn-taking, grounding, backchanneling), indicating embodiment triggers social interaction modes.
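One way to make "resembles human–human dialogue" measurable is to summarize each team's conversation as a vector of interaction rates and compare it with the average human-only team. The sketch below is a hypothetical illustration only: the feature choice (for example turn switches, backchannels, and grounding acts per minute) and the use of cosine similarity are assumptions, not the study's documented method.

```python
import math
from typing import Sequence

# Hypothetical feature order, e.g. (turn switches/min, backchannels/min, grounding acts/min).
# These features are illustrative assumptions, not the study's coding scheme.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean; used here as the human-only baseline profile."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity_to_baseline(
    team_features: Sequence[float],
    human_only_features: Sequence[Sequence[float]],
) -> float:
    """Similarity of one mixed team's profile to the average human-only team."""
    baseline = mean_vector(human_only_features)
    return cosine_similarity(team_features, baseline)
```

A score near 1 means a team's mix of interaction behaviors is proportionally close to the human-only average. Cosine similarity ignores overall talk volume; if volume matters, a distance on standardized features would be a reasonable alternative.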
Data & Methods
- Field experiment in a naturalistic, task-oriented setting (escape room) to study non-routine analytical teamwork under time pressure.
- Between-subjects design comparing:
  - Human-only teams (3 humans)
  - Mixed teams (2 humans + 1 artificial agent) with three embodiment conditions (Box, Avatar, humanoid)
- Performance metrics included task completion, time-to-complete, and error counts; conversational measures captured interaction patterns and coordination dynamics.
- Analysis focused on outcome distributions (variance, completion rates), comparisons across embodiment conditions, and conversational feature similarity to human–human dialogue.
- Limitations: context-specific (escape-room tasks, short-duration interactions), potential novelty effects from interacting with embodied agents, and suggestive (rather than definitive) evidence of non-linearity; replication and broader task domains are needed.
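The distributional comparisons described above (completion rates, variance, ordering across embodiment conditions) can be sketched as follows. This is a minimal illustration, assuming a hypothetical per-team record with a condition label, a completion flag, minutes used, and an error count; it is not the authors' analysis code. The monotonicity check on group means is only a descriptive heuristic, not a statistical test of non-linearity.

```python
from dataclasses import dataclass
from statistics import mean, pvariance

# Assumed ordering of agent conditions from least to most embodied.
EMBODIMENT_ORDER = ["Box", "Avatar", "Humanoid"]


@dataclass
class TeamResult:
    condition: str   # hypothetical labels: "HumanOnly", "Box", "Avatar", "Humanoid"
    completed: bool  # did the team finish all tasks?
    minutes: float   # time used
    errors: int      # observed error count


def summarize(results: list[TeamResult]) -> dict[str, dict[str, float]]:
    """Per-condition completion rate plus mean/variance of time and mean errors."""
    by_cond: dict[str, list[TeamResult]] = {}
    for r in results:
        by_cond.setdefault(r.condition, []).append(r)
    summary = {}
    for cond, rows in by_cond.items():
        minutes = [r.minutes for r in rows]
        errors = [r.errors for r in rows]
        summary[cond] = {
            "n": len(rows),
            "completion_rate": sum(r.completed for r in rows) / len(rows),
            "mean_minutes": mean(minutes),
            # Population variance; statistics.variance gives the sample estimator (n >= 2).
            "var_minutes": pvariance(minutes),
            "mean_errors": mean(errors),
        }
    return summary


def embodiment_profile(summary: dict[str, dict[str, float]], metric: str) -> list[float]:
    """Values of one metric ordered from least to most embodied agent."""
    return [summary[c][metric] for c in EMBODIMENT_ORDER]


def is_monotone(values: list[float]) -> bool:
    """True if the values never decrease, or never increase, along the list."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)
```

For example, invented group mean times of 55, 40, and 55 minutes for Box, Avatar, and Humanoid would fail the monotonicity check, the kind of pattern the summary describes as suggestive non-linearity. Formal inference would require a model that accounts for sampling uncertainty, such as a regression with condition indicators.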
Implications for AI Economics
- Productivity vs. Risk: Introducing embodied AI can raise performance in some teams but increases the variance of outcomes across teams. Firms should weigh the potential upside against heightened downside risk (failed tasks, coordination breakdowns).
- Complementarity and Team Composition: Embodiment affects social complementarity. Firms should consider which tasks and team compositions benefit from embodied agents versus those better served by human-only teams.
- Investment decisions: Embodiment is an investable feature (hardware/appearance, interface design). Returns may be non-linear: incremental spending on embodiment may yield disproportionate changes in interaction dynamics, so cost–benefit assessments should account for variance, not just mean gains.
- Measurement and incentives: Standard productivity metrics (means) may hide important distributional effects. Contracting and incentives should account for tails (e.g., reliability bonuses, insurance against coordination failures).
- Adoption strategy and training: Because social cues matter, deploying embodied agents may require complementary investments in human training, onboarding, and workflow redesign to reduce failure modes and stabilize performance.
- Labor markets and task allocation: Embodied AI could shift which tasks are assigned to machines vs humans, altering skill demand. Highly social or coordination-heavy roles may change in value depending on agent embodiment quality.
- Policy and organizational design: Regulators and firms should monitor not just average productivity impacts but also inequality of outcomes, error risk, and accountability when agents introduce stochastic team performance.
- Research priorities: Evaluate long-run adaptation (do humans learn to coordinate reliably with embodied agents?), expand to varied tasks and settings, quantify the sources of non-linearity, and model economic trade-offs of embodiment investments.
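The variance-versus-mean point in the Productivity vs. Risk, Investment decisions, and Measurement and incentives bullets can be made concrete with the textbook mean-variance certainty equivalent, CE = mean - (λ/2)·variance, plus a simple lower-tail measure. This is a sketch under stated assumptions: the risk-aversion parameter λ (`risk_aversion`) and the failure `threshold` are hypothetical inputs, not quantities estimated in the study.

```python
from statistics import mean, pvariance


def certainty_equivalent(outcomes: list[float], risk_aversion: float) -> float:
    """Mean-variance certainty equivalent: mean - (risk_aversion / 2) * variance.

    risk_aversion is a hypothetical parameter; larger values penalize spread more.
    """
    return mean(outcomes) - 0.5 * risk_aversion * pvariance(outcomes)


def share_below(outcomes: list[float], threshold: float) -> float:
    """Fraction of teams whose outcome falls below a (hypothetical) failure threshold."""
    return sum(o < threshold for o in outcomes) / len(outcomes)
```

For instance, two invented sets of team scores with the same mean of 5, [5, 5, 5, 5] and [2, 8, 2, 8], give certainty equivalents of 5.0 and 2.75 at `risk_aversion = 0.5`, and the second set has half its teams below a threshold of 3. Equal means can therefore hide a distribution that a risk-averse firm values less.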
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We examined how different degrees of embodiment affect team performance and conversational dynamics in a real-life escape room; teams were composed of either three humans or two humans and an artificial agent (a Box, an Avatar, or a hyper-realistic humanoid). | Other | null_result | high | team composition / experimental manipulation (embodiment) | 1.0 |
| Artificial agents have an uneven impact on team outcomes, with some mixed human–AI teams performing exceptionally well and others markedly worse. | Team Performance | mixed | high | team outcomes / performance variability | 0.6 |
| Human-only teams are more likely to complete all tasks successfully (higher task completion success) than mixed human–AI teams. | Team Performance | positive | high | task completion / success rate | 0.6 |
| Human-only teams take longer to complete the escape room than mixed human–AI teams. | Task Completion Time | negative | high | time to complete task | 0.6 |
| Human-only teams commit more errors than mixed human–AI teams. | Error Rate | negative | high | errors committed | 0.6 |
| There is a suggestive non-linear relationship between embodiment and team performance. | Team Performance | mixed | high | team performance as a function of embodiment | 0.1 |
| Teams interacting with more embodied agents display conversational patterns that more closely resemble human–human dialogue. | Team Performance | positive | high | conversational pattern similarity to human–human dialogue | 0.6 |
| Embodied AI shapes collaboration in complex ways, and social cues critically guide teamwork dynamics. | Team Performance | positive | medium | influence of social cues/embodiment on teamwork dynamics | 0.06 |