A Human–AI team can deliver near-expert online diagnoses while cutting required doctor involvement to roughly one-tenth; hierarchical reinforcement learning allocates human attention sparingly to preserve accuracy and lower labor per consultation.
Online medical consultation is one of the most important mobile services worldwide, allowing patients to consult doctors conveniently from their phones anytime and anywhere. However, expert-level online consultations are expensive due to the shortage of medical professionals, while purely AI-driven models are unreliable because they carry unpredictable risks. We therefore introduce human-machine collaboration into online medical consultation and focus on symptom inquiry, the basis of disease diagnosis. Two key issues arise: 1) how to design an intelligent assignment strategy that determines which doctors or models participate in each turn, and 2) how to design an effective execution strategy that improves the machine's ability to inquire over a large space of symptoms. To address these issues, we propose the Human-AI Diagnostic Team (HADT) framework based on hierarchical reinforcement learning, which aims to achieve high accuracy at low labor cost. Specifically, HADT has two layers. The upper layer is responsible for assignment: a module called the master makes intelligent human-machine assignments through masked reinforcement learning with reward shaping. The lower layer is responsible for execution and consists of a doctor and a module called the machine, which can effectively ask about symptoms through masked hierarchical reinforcement learning with bottom-up training.
Experiments on public datasets show that HADT can achieve up to 89.4% accuracy with only 10.9% human effort, as confirmed by real clinical doctors using the designed mobile-device interface.
Summary
Main Finding
The Human-AI Diagnostic Team (HADT) framework — a two-layer hierarchical reinforcement learning system combining an assignment “master” and an execution “machine” plus human doctors — can deliver near-expert-level online symptom inquiry and diagnosis while using very little human labor. On public medical-consultation datasets and in clinical-interface validation with real doctors, HADT reached up to 89.4% diagnostic accuracy while requiring only 10.9% human effort.
Key Points
- Problem: expert online medical consultation is costly due to scarce medical professionals; pure AI is unreliable. Symptom inquiry is the critical front-line task for accurate diagnosis.
- Two central design questions addressed:
  - Which actor (human doctor or AI module) should act at each turn? (assignment)
  - How should the machine execute symptom inquiry effectively from a large symptom space? (execution)
- Architecture:
  - Upper layer (“master”): learns turn-by-turn human–machine assignment using masked reinforcement learning with reward shaping to balance accuracy and human cost.
  - Lower layer: execution team composed of a doctor and a “machine” module. The machine uses masked hierarchical reinforcement learning with bottom-up training to ask informative symptom questions.
- Training innovations: masked RL to constrain and guide action spaces, reward shaping to trade off diagnostic accuracy against human labor, and bottom-up training for the machine execution module to improve question selection over many possible symptoms.
- Empirical validation: experiments on public datasets show strong accuracy/human-effort trade-offs (89.4% accuracy at 10.9% human effort). The system and interface were also tested with real clinical doctors on mobile devices to confirm practical viability.
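The turn-level collaboration described above can be sketched as a simple episode loop: an upper-layer master picks the actor for each turn, and the chosen doctor or machine asks the next question. This is a minimal illustration under our own assumptions — `run_consultation`, the toy policies, and the effort ratio are illustrative names, not the paper's implementation:

```python
def run_consultation(master_policy, doctor, machine, max_turns=10):
    """One consultation episode: the upper-layer master picks an actor each
    turn; the chosen doctor or machine asks the next symptom question."""
    state, human_turns = [], 0
    for _ in range(max_turns):
        actor = master_policy(state)        # "doctor" or "machine"
        if actor == "doctor":
            question = doctor(state)
            human_turns += 1
        else:
            question = machine(state)
        state.append(question)              # record the inquiry so far
    return state, human_turns / max_turns   # dialogue + human-effort ratio

# Toy policies: the master delegates to the machine except on the first turn,
# so only 1 of 10 turns requires a doctor (10% human effort).
master = lambda s: "doctor" if not s else "machine"
_, effort = run_consultation(master, lambda s: "q_doc", lambda s: "q_ai")
print(effort)   # -> 0.1
```

The design point is that human effort is an explicit, measurable output of the episode, which is what lets the assignment layer be trained against it.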
Data & Methods
- Data: experiments were run on publicly available online medical-consultation datasets for symptom-inquiry and diagnosis tasks.
- Methodology:
  - Hierarchical reinforcement learning with two layers: a master (assignment) and an execution team (doctor + machine).
  - Masked RL techniques restrict actions to relevant subsets, reducing exploration over the huge symptom/action space.
  - Reward shaping at the assignment layer incorporates penalties for human involvement and incentives for diagnostic accuracy.
  - Bottom-up training in the execution module progressively learns effective symptom-question policies.
- Evaluation:
  - Primary metrics: diagnostic accuracy and human effort (proportion of turns or time requiring human doctors).
  - Comparative evaluation against (implied) baselines: fully human, fully automated, and simpler assignment strategies; the paper reports superior trade-offs.
  - Clinical validation: physicians used the designed mobile interface to confirm usability and real-world performance.
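The masking and reward-shaping ideas above can be sketched concretely. This is a hedged illustration, not the paper's implementation: `masked_action`, `shaped_reward`, and the trade-off weight `lam` are our own illustrative names and choices.

```python
import numpy as np

def masked_action(q_values, asked):
    """Greedy symptom selection; already-asked symptoms are masked out
    so exploration never wastes turns on invalid actions."""
    q = q_values.astype(float)
    q[asked] = -np.inf               # masked actions can never be chosen
    return int(np.argmax(q))

def shaped_reward(correct, human_turns, total_turns, lam=0.5):
    """Terminal reward: diagnosis bonus minus a penalty proportional to the
    fraction of turns that required a human doctor (lam sets the trade-off)."""
    return (1.0 if correct else -1.0) - lam * human_turns / total_turns

# Symptom 1 was already asked, so the next-best symptom (index 3) is chosen;
# a correct diagnosis with 2 of 10 human turns earns 1.0 - 0.5 * 0.2 = 0.9.
print(masked_action(np.array([0.2, 0.9, 0.5, 0.7]), [1]))  # -> 3
print(shaped_reward(True, 2, 10))                          # -> 0.9
```

Raising `lam` would push the learned assignment policy toward fewer human turns at some cost in accuracy, which is exactly the trade-off the evaluation metrics measure.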
Implications for AI Economics
- Labor-cost reduction: HADT demonstrates a concrete way to substitute expensive human diagnostic labor with AI assistance while preserving high accuracy — lowering marginal cost per consultation.
- Allocation efficiency: intelligent turn-level assignment can reduce costly human attention to only the high-value moments, improving overall system productivity.
- Pricing and market expansion: lower per-consultation costs could enable broader access to medical advice, new pricing models (tiered or dynamic pricing based on human involvement), and expanded demand in underserved regions.
- Incentive/design considerations: reward-shaping and assignment rules embody implicit incentive structures; similar mechanisms could be used to design payment and staffing contracts that balance quality and cost.
- Workforce effects: partial substitution may reduce routine diagnostic workload, shifting clinicians toward oversight, complex cases, and supervision — raising questions about retraining, job design, and labor market transitions.
- Reliability and regulation: economic gains depend on robustness and trust. Regulators and payers will require clinical validation, safety guarantees, and clear liability frameworks for human-AI shared decision-making.
- Research opportunities: formal cost–benefit analyses, mechanism design for optimal assignment pricing, heterogeneity across specialties, generalizability to other high-skill services, and long-run effects on supply of medical professionals.
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| HADT reached up to 89.4% diagnostic accuracy while requiring only 10.9% human effort. | Output Quality | positive | medium (0.11) | diagnostic accuracy; human effort (proportion of turns/time requiring human doctors) | 89.4% diagnostic accuracy; 10.9% human effort required |
| The Human-AI Diagnostic Team (HADT) framework can deliver near-expert-level online symptom inquiry and diagnosis while using very little human labor. | Output Quality | positive | medium (0.11) | quality of symptom inquiry / diagnostic performance (compared to expert-level) | near-expert-level performance with low human labor |
| A two-layer hierarchical reinforcement learning system (an assignment 'master' and an execution 'machine', plus human doctors) effectively balances accuracy and human cost. | Task Allocation | positive | medium (0.11) | trade-off between diagnostic accuracy and human effort | hierarchical RL balances accuracy and human cost |
| The upper layer ('master') learns turn-by-turn human–machine assignment using masked reinforcement learning with reward shaping to balance accuracy and human cost. | Task Allocation | positive | high (0.18) | assignment policy performance; human effort allocation; diagnostic accuracy under assignment policy | assignment-layer policy learned to trade off accuracy and human cost |
| The execution machine uses masked hierarchical reinforcement learning with bottom-up training to ask informative symptom questions from a large symptom space. | Output Quality | positive | medium (0.11) | quality/informativeness of symptom questions; downstream diagnostic accuracy | execution module asks informative symptom questions improving downstream accuracy |
| Masked reinforcement learning techniques constrain or mask action spaces, reducing exploration over huge symptom/action spaces. | Other | positive | high (0.18) | action-space reduction / sample efficiency / learning stability (as applied to symptom-action space) | masked RL constrains action space, improving sample efficiency/stability |
| Reward shaping at the assignment layer enables an explicit trade-off between diagnostic accuracy and human labor by incorporating penalties for human involvement. | Task Allocation | positive | high (0.18) | diagnostic accuracy vs human effort (as controlled by reward shaping) | reward shaping encodes penalties for human involvement to trade off accuracy/human effort |
| On public datasets HADT achieves superior accuracy/human-effort trade-offs compared to baselines (fully human, fully automated, and simpler assignment strategies). | Output Quality | positive | medium (0.11) | diagnostic accuracy and human effort relative to baseline methods | superior accuracy/human-effort trade-offs vs baselines on public datasets |
| Clinical-interface validation with real physicians on mobile devices confirmed the practical viability and usability of the HADT system and interface. | Adoption Rate | positive | medium (0.11) | practical viability / usability in clinical-interface testing (physician interaction) | clinical-interface validation confirmed practical viability/usability (physician tests) |
| HADT demonstrates a concrete way to substitute expensive human diagnostic labor with AI assistance while preserving high accuracy, implying reductions in marginal cost per consultation. | Firm Productivity | positive | low (0.05) | implied marginal cost per consultation (not directly measured) | implied reductions in marginal cost per consultation (partial substitution of human labor) |
| Intelligent turn-level assignment can reduce costly human attention to only high-value moments, improving overall system productivity. | Organizational Efficiency | positive | low (0.05) | distribution of human attention / system productivity (conceptual, not directly measured) | intelligent turn-level assignment reduces human attention to high-value moments, improving productivity (conceptual) |
| Partial substitution of routine diagnostic work by HADT may shift clinicians toward oversight, complex cases, and supervision, raising workforce and retraining considerations. | Skill Acquisition | mixed | speculative (0.02) | clinician workload composition / need for retraining (speculative) | partial substitution shifts clinicians toward oversight/complex cases, implying retraining needs |
| Regulators and payers will require clinical validation, safety guarantees, and clear liability frameworks for human–AI shared decision-making before widescale deployment. | Regulatory Compliance | null_result | speculative (0.02) | regulatory requirements / safety validation (anticipated, not measured) | anticipation that regulators/payers will require validation, safety guarantees, liability frameworks |