The Quiet Path from Seemingly Minor Design Errors to Workplace AI Incidents

Recent human-computer interaction (HCI) research has revealed a widespread misalignment between how developers design workplace artificial intelligence (AI) systems and what workers actually need from them. Yet, little research has examined the effects of this gap, or how it may cause harm. We analyzed 1,524 reports of incidents in which AI systems were used to perform 171 occupational tasks across 12 industry sectors. Using a Large Language Model (LLM)-as-an-expert approach, we extracted the main traits of the AI systems involved in those incidents using an established framework of twelve traits. We then compared them with the traits that 202 workers highly familiar with those tasks would have preferred. We found that as many as 83% of workplace incidents stem from worker-AI misalignments. In most cases, workers wanted systems that are precise, insightful, or personal, but instead received systems that are basic, simple, or general. Over the years, fast AI caused a considerable number of incidents, yet these declined, and imaginative AI, with the mass introduction of generative AI, started to cause incidents. We also compared the traits causing the incidents with the traits that 197 developers building AI systems for those tasks would have preferred. If the traits causing the incidents were the same as those designed by developers, then developers may be responsible for those incidents. We found that 74% of task misalignments could be attributed to developers who tended to overfocus on efficiency and speed, especially for systems performing tasks in people-facing occupations such as those in the human resources sector. Our results call for design interventions that better align AI development with workers' needs, as without such corrections, workplace AI incidents are likely to persist, causing the invisible erosion of worker agency and organizational productivity.

Summary

Main Finding

Workplace AI incidents are overwhelmingly driven by design misalignment between the traits workers need from AI and the traits built into deployed systems. In the authors’ analysis of AI incident reports and task-level preferences, 83% of workplace incidents involved AI trait misalignment, and 74% of the task-level misalignments could be attributed to developer design choices (not incidental system behavior). Common mismatches involve workers wanting precision, insight, or personalization while receiving systems that are basic, simple, or general; these mismatches cascade into real workplace harms that quietly erode worker agency and organizational productivity.

Key Points

Scope and scale
- Source: AI Incident Database (AIID). Authors collected ~1,256 incidents (6,163 news sources) up to Nov 2025 and used an LLM pipeline to identify workplace incidents; 286 incidents were classified as workplace occurrences.
- Task coverage: 171 occupational tasks drawn from O*NET spanning 12 industry sectors.
- Human data: preference surveys from 202 domain-familiar workers and 197 developers on how AI should behave for those tasks.
Analytical approach
- Used an LLM-as-expert (GPT-5) for systematic extraction/classification of tasks and incident details, with human validation and improved prompts (Cohen’s κ used for validation).
- Adopted a 12-pair trait framework (pairs of opposite psychological/functional traits, e.g., imaginative vs. practical, precise vs. basic) to characterize AI system behavior and preferences.
- Defined misalignment: a gap >0.5 on a 5-point Likert scale between worker preferred trait and observed trait in the incident; developer attribution determined by comparing developer preferences and a counterfactual question (“Would the incident have occurred if the AI had the worker-preferred trait?”).
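The misalignment rule in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `misaligned_traits`, the dictionary layout, the trait-pair keys, and the example scores are all assumptions. Only the 5-point scale and the 0.5 threshold come from the summary.

```python
# Illustrative sketch only; names and data are hypothetical.
THRESHOLD = 0.5  # gap on a 5-point Likert scale, per the definition above


def misaligned_traits(worker_pref, observed, threshold=THRESHOLD):
    """Return trait pairs whose worker-vs-observed gap exceeds the threshold.

    Both arguments map a trait-pair name to a score in [1, 5], where 1 is one
    pole of the pair (e.g. "basic") and 5 is the other (e.g. "precise").
    """
    gaps = {}
    for trait, preferred in worker_pref.items():
        if trait not in observed:
            continue  # no observed score for this pair, so no comparison
        gap = abs(preferred - observed[trait])
        if gap > threshold:
            gaps[trait] = round(gap, 2)
    return gaps


# Hypothetical scores for one task occurrence in one incident.
worker_pref = {"basic_vs_precise": 4.6, "fast_vs_explainable": 3.2}
observed = {"basic_vs_precise": 2.0, "fast_vs_explainable": 3.0}

gaps = misaligned_traits(worker_pref, observed)
print(gaps)  # only the basic_vs_precise pair exceeds the threshold
```

Under this reading, an incident would count toward the misalignment share if `misaligned_traits` returns at least one pair for any of its task occurrences.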
Core quantitative findings
- 83.4% of workplace incidents involved at least one AI trait misalignment with worker needs.
- 74% of those task-level misalignments were attributable to developers’ design choices (developers’ preferred traits matched those causing the incident).
- Typical pattern: workers preferred systems that are precise, insightful, or personal; deployed systems tended to be basic, simple, or general.
- Temporal/technology trends: incidents tied to “fast”/efficiency-oriented systems rose early in the sample and later declined; incidents tied to “imaginative” traits rose with the mass adoption of generative AI.
- Sectoral variation: legal-sector incidents often involved imaginative AI (e.g., hallucinations in drafting), HR and people-facing occupations saw incidents linked to speed/efficiency-focused systems (e.g., resume screeners).
Conceptual contribution
- Reframes workplace AI incidents as “trait misalignment” problems, linking HCI-style trait frameworks to incident analysis and sociotechnical accounts (miscalibration, automation surprise, cascading failures).

Data & Methods

Incident data and preprocessing
- Primary corpus: AI Incident Database (AIID) incident entries and supporting news sources (2013–2025).
- LLM (GPT-5) prompt-based pipeline to: (1) classify incidents as workplace vs. non-workplace (iteratively prompt-tuned and validated against human annotators; Cohen’s κ improved to 0.85), and (2) extract task occurrences in each incident (rated 0–3; retained only those scored 3).
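For readers unfamiliar with the agreement statistic mentioned above, the following is a minimal sketch of how Cohen's κ between human and LLM labels could be computed, followed by the score-3 retention filter. The label lists, task names, and scores are hypothetical, and the function is a textbook implementation rather than the authors' pipeline.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: 1 = workplace incident, 0 = non-workplace.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
llm = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
kappa = cohens_kappa(human, llm)
print(round(kappa, 2))

# Hypothetical task-occurrence ratings on the 0-3 scale; keep only score 3.
occurrences = [("screen resumes", 3), ("schedule interviews", 2)]
retained = [task for task, score in occurrences if score == 3]
print(retained)
```

In this toy data the raters disagree on one of ten items, which gives an observed agreement of 0.9 against a chance agreement of 0.5.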
Task selection and participant recruitment
- Task universe: 18,796 O*NET tasks filtered to computer-use, frequently performed/core tasks and high worker familiarity → 171 tasks across 12 sectors.
- Participants: 202 workers recruited through Prolific (screened for domain expertise) and 197 U.S.-based developers with AI experience; all respondents rated task-level trait preferences on a 5-point Likert scale.
Trait framework and operationalization
- Used a validated expanded framework of 12 opposing trait pairs (e.g., imaginative vs. practical, fast vs. explainable).
- Worker and developer preferences captured per task; misalignment defined as worker–developer gap >0.5 points.
- Incident causality and developer attribution: for a given task occurrence in an incident, coders assessed whether the AI exhibited the opposite trait to worker preference and whether making the system worker-preferred would have prevented the incident (counterfactual-based causal judgment). If developers preferred the trait that caused the incident, the incident was attributed to developer design choices.
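The attribution logic in the last bullet can be read as a small decision rule. The sketch below is one possible encoding under stated assumptions: scores on a 1 to 5 scale with 3 as the neutral midpoint, a hypothetical `TaskOccurrence` record, and hypothetical outcome labels. The counterfactual judgment is taken as a given input because the summary describes it as a coder's assessment.

```python
from dataclasses import dataclass

MIDPOINT = 3.0  # assumed neutral point of the 5-point scale


def pole(score):
    """Map a 1-5 score to the pole it leans toward: -1, 0 (neutral), or +1."""
    return (score > MIDPOINT) - (score < MIDPOINT)


@dataclass
class TaskOccurrence:
    worker_pref: float             # worker rating for one trait pair
    developer_pref: float          # developer rating for the same pair
    observed: float                # trait the AI exhibited in the incident
    counterfactual_prevents: bool  # coder judgment: preferred trait would have prevented it


def classify(occ):
    """Return 'not_trait_caused', 'trait_caused', or 'developer_attributed'."""
    observed_pole = pole(occ.observed)
    # The AI must lean toward the pole opposite to the one workers prefer.
    opposite = observed_pole != 0 and observed_pole == -pole(occ.worker_pref)
    if not (opposite and occ.counterfactual_prevents):
        return "not_trait_caused"
    # Developers preferring the same pole the AI exhibited implies attribution.
    if pole(occ.developer_pref) == observed_pole:
        return "developer_attributed"
    return "trait_caused"


# Hypothetical example: workers want "precise" (high), developers and the
# deployed system lean "basic" (low), and the counterfactual check passes.
example = TaskOccurrence(worker_pref=4.5, developer_pref=1.8,
                         observed=1.5, counterfactual_prevents=True)
print(classify(example))
```

Treating the counterfactual flag as a required condition keeps a mere trait mismatch from being counted as a cause, which mirrors the two-part test described above.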
Validation and limitations acknowledged by authors
- Multiple human-LLM validation steps (Cohen’s κ reported); LLM used as an “expert” labeler but human oversight retained.
- Limitations: reliance on AIID (news/reporting bias); trait framework developed in a U.S. context (potential cultural sensitivity); subjective counterfactual judgments for attribution; GPT-5 classifications not infallible though validated.

Implications for AI Economics

Hidden costs and mismeasured productivity
- Apparent efficiency gains from AI can be offset by “patchwork” labor—extra human effort required to compensate for misaligned systems. Firms that focus ROI solely on throughput/time-savings risk underestimating total costs (rework, verification, oversight, error remediation).
- Incidents (e.g., biased hiring, welfare misclassification, legal hallucinations) create direct costs (remediation, litigation, reputational damage) and indirect costs (lower worker morale, turnover, slower uptake of AI due to distrust).
Incentives and principal-agent problems in AI deployment
- Developers and procurement teams may over-prioritize speed, efficiency, or generic solutions (cost- and time-saving incentives), producing externalities borne by frontline workers and affected stakeholders. This is a classical principal-agent misalignment: developers/management optimize different objectives than workers.
- Economic models of automation adoption should incorporate alignment risk as an explicit factor influencing adoption thresholds and expected returns.
Labor market and task-shift dynamics
- The qualitative nature of tasks matters: worker preference for precision or insight implies complementarities between workers and AI (AI as augmenting tool), while generic/fast-only AI displaces or degrades meaningful aspects of work.
- Misaligned AI can reduce effective human capital (erosion of agency and professional skill use), affecting compensation bargaining, job quality, and potentially long-run labor demand for certain occupations.
Policy, procurement, and firm-level actions
- Risk management: include trait-alignment metrics in procurement, impact assessments, and ex-ante cost–benefit analyses. Measure alignment-adjusted productivity, not raw speed gains.
- Internalize externalities: firms should account for downstream remediation costs, staff time for oversight, and the value of task-appropriate AI traits when choosing systems.
- Investment priorities: allocate resources to worker-centered design, developer training in domain practices, human-in-the-loop verification, and usability studies—these are investments that reduce incident risk and hidden labor costs.
- Regulation and standards: regulators can require alignment demonstrations or audits that test trait fit across representative tasks and worker groups (especially for people-facing or high-stakes sectors), and mandate incident reporting that includes trait-alignment diagnostics.
Research and measurement gaps for economists
- The paper documents prevalence and attribution but does not monetize impacts. Future work should estimate:
  - Time and wage costs of compensatory labor (“patchwork”).
  - Incident remediation costs, litigation, and lost output.
  - Long-run effects on productivity growth and adoption dynamics when alignment risk is internalized.
  - Sector-specific elasticities: how do alignment requirements shift the marginal ROI of automation across occupations?
- Econometric or structural models could incorporate misalignment probabilities and expected incident costs into firm-level adoption and investment decisions.
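As a rough illustration of how alignment risk might enter an adoption decision, the sketch below subtracts an expected incident cost and a "patchwork" labor cost from gross savings. The paper does not monetize impacts, so the functional form, the parameter names, and every number here are hypothetical placeholders rather than estimates.

```python
def alignment_adjusted_benefit(gross_savings, p_misalignment,
                               p_incident_given_misalignment, incident_cost,
                               patchwork_hours, hourly_wage):
    """Net benefit of adopting an AI system after alignment-related costs.

    All inputs are per-period values chosen by the analyst (hypothetical here).
    """
    # Expected remediation, litigation, and lost-output cost from incidents.
    expected_incident_cost = (p_misalignment
                              * p_incident_given_misalignment
                              * incident_cost)
    # Compensatory human effort needed to work around a misaligned system.
    patchwork_cost = patchwork_hours * hourly_wage
    return gross_savings - expected_incident_cost - patchwork_cost


# Purely illustrative inputs, not drawn from the study.
net = alignment_adjusted_benefit(
    gross_savings=100_000,
    p_misalignment=0.5,
    p_incident_given_misalignment=0.1,
    incident_cost=400_000,
    patchwork_hours=500,
    hourly_wage=40,
)
adopt = net > 0
print(round(net), adopt)
```

A firm that looks only at `gross_savings` would overstate the return in this toy case; the adjusted figure is what an alignment-aware adoption threshold would compare against zero.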

Overall, the study argues that seemingly small trait-design errors are economically meaningful: they generate frequent, often avoidable incidents that impose hidden costs and distort incentives. For AI economics, that implies adjusting models of automation benefits to include alignment risk, rethinking procurement and investment criteria, and quantifying the returns to worker-centered design and developer incentives that internalize alignment.

Assessment

Paper Type: correlational

Evidence Strength: low. The study is observational and relies on heuristic rule-based attribution (trait mismatch → cause), LLM-based coding rather than human-validated coding, and non-random samples of incident reports and surveyed participants; these factors leave substantial room for measurement error, selection bias, and confounding, so causal claims about responsibility and harm are weak.

Methods Rigor: medium. Strengths include a large set of incident reports (1,524), an explicit 12-trait framework, and parallel surveys of workers and developers, plus use of an LLM to scale coding; weaknesses include reliance on an LLM without clear validation/benchmarking of coding accuracy, potential subjectivity in mapping incidents to traits, limited information on incident sampling, and absence of exogenous variation or robustness checks to support causal attribution.

Sample: 1,524 workplace AI incident reports covering 171 occupational tasks across 12 industry sectors; trait extraction from those reports using an LLM mapped to a 12-trait framework; survey samples of 202 workers highly familiar with those tasks and 197 developers who build AI systems for those tasks; temporal analysis referencing changes with generative AI adoption (e.g., rise in 'imaginative' AI incidents).

Themes: human_ai_collab, org_design, productivity

Identification: Compare traits extracted from 1,524 incident reports (coded using an LLM-as-expert against a 12-trait framework) to trait preferences elicited from 202 workers and 197 developers for the same tasks; classify an incident as caused by worker-AI misalignment when the incident-system traits differ from worker preferences, and attribute task-level misalignments to developers when the traits causing incidents match developers' stated preferences.

Generalizability
- Incident reports may be non-representative (self-reports, specific reporting channels, or media-selected incidents) and biased toward salient failures.
- Worker and developer survey samples (n≈200 each) may be self-selected and not representative across regions, firm sizes, or occupations.
- LLM-based trait coding may not generalize across languages, report styles, or subtle contextual cues without validation.
- The 12-trait framework is a modeling choice that may omit important dimensions of worker needs or system behavior.
- Temporal patterns (e.g., effects of generative AI) may not generalize across firms/regions with different adoption timing.
- Causal attribution rules (mismatch → cause; match to developer preferences → developer responsibility) assume simplified causal pathways that may not hold in complex organizational contexts.

Claims (9)

Claim	Category	Direction	Confidence	Outcome	Details
We analyzed 1,524 reports of incidents in which AI systems were used to perform 171 occupational tasks across 12 industry sectors.	Other	null_result	high	scope and coverage of analyzed incident reports (number of incidents, tasks, and sectors)	n=1524 0.3
We used an LLM-as-an-expert approach to extract the main traits of the AI systems involved in those incidents using an established framework of twelve traits.	Other	null_result	high	trait classification of AI systems involved in incidents	n=1524 0.3
We compared the extracted traits with the traits that 202 workers highly familiar with those tasks would have preferred.	Worker Satisfaction	null_result	high	workers' preferred AI system traits (self-reported preferences)	n=202 0.3
As many as 83% of workplace incidents stem from worker-AI misalignments.	Error Rate	negative	high	proportion of workplace AI incidents attributable to worker-AI trait misalignment	n=1524 83% 0.3
In most cases, workers wanted systems that are precise, insightful, or personal, but instead received systems that are basic, simple, or general.	Worker Satisfaction	negative	high	mismatch between worker-preferred AI traits and deployed AI traits (trait-level preference vs. delivered)	n=202 0.3
Over the years, fast AI caused a considerable number of incidents, yet these declined, and imaginative AI, with the mass introduction of generative AI, started to cause incidents.	Error Rate	mixed	medium	time trends in incident counts by AI trait category (e.g., 'fast' vs 'imaginative')	n=1524 0.18
We compared the traits causing the incidents with the traits that 197 developers building AI systems for those tasks would have preferred.	Other	null_result	high	developers' preferred AI system traits (self-reported)	n=197 0.3
Seventy-four percent of task misalignments could be attributed to developers who tended to overfocus on efficiency and speed, especially for systems performing tasks in people-facing occupations such as those in the human resources sector.	Task Allocation	negative	high	proportion of task misalignments attributable to developer-design choices (overemphasis on efficiency/speed)	n=197 74% 0.3
Without design corrections that better align AI development with workers' needs, workplace AI incidents are likely to persist, causing the invisible erosion of worker agency and organizational productivity.	Organizational Efficiency	negative	medium	persistence of incidents and resulting erosion of worker agency and organizational productivity (projected, not empirically measured longitudinally)	0.03

Three-quarters of AI design mismatches trace back to developers’ focus on speed: 83% of workplace AI incidents arise when systems don’t match worker needs, and developers’ emphasis on efficiency explains 74% of task-level misalignments, especially in people-facing roles.