An AI trained on ~7,400 paired CBCT studies produces draft oral/maxillofacial reports at intermediate-radiologist quality and, as a co-authoring tool, systematically improves final report quality—helping novices reach intermediate standards, lifting intermediates toward senior quality, and cutting omission-related errors among seniors.
Generative AI has advanced rapidly in medical report generation; however, its application to oral and maxillofacial CBCT reporting remains limited, largely because of the scarcity of high-quality paired CBCT-report data and the intrinsic complexity of volumetric CBCT interpretation. To address this, we introduce CBCTRepD, a bilingual oral and maxillofacial CBCT report-generation system designed for integration into routine radiologist-AI co-authoring workflows. We curated a large-scale, high-quality paired CBCT-report dataset comprising approximately 7,408 studies, covering 55 oral disease entities across diverse acquisition settings, and used it to develop the system. We further established a clinically grounded, multi-level evaluation framework that assesses both direct AI-generated drafts and radiologist-edited collaboration reports using automatic metrics together with radiologist- and clinician-centered evaluation. Using this framework, we show that CBCTRepD achieves superior report-generation performance and produces drafts with writing quality and standardization comparable to those of intermediate radiologists. More importantly, in radiologist-AI collaboration, CBCTRepD provides consistent and clinically meaningful benefits across experience levels: it helps novice radiologists improve toward intermediate-level reporting, enables intermediate radiologists to approach senior-level performance, and even assists senior radiologists by reducing omission-related errors, including clinically important missed lesions. By improving report structure, reducing omissions, and promoting attention to co-existing lesions across anatomical regions, CBCTRepD shows strong and reliable potential as a practical assistant for real-world CBCT reporting across multi-level care settings.
Summary
Main Finding
CBCTRepD is a bilingual system for generating oral and maxillofacial cone-beam CT (CBCT) reports that was trained on a curated paired CBCT–report dataset (~7,408 studies, 55 disease entities). Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), CBCTRepD produces draft reports whose writing quality and standardization match intermediate radiologists, and—when used in a radiologist-AI co-authoring workflow—consistently improves report quality across experience levels: it helps novices reach intermediate quality, helps intermediates approach senior quality, and reduces omission-related errors (including clinically important missed lesions) for seniors. The system improves structure, reduces omissions, and promotes attention to multi-region co-existing lesions, demonstrating practical potential for real-world CBCT reporting in diverse care settings.
Key Points
- Dataset scale and scope: ~7,408 paired CBCT studies; bilingual; covers 55 oral/maxillofacial disease entities across diverse acquisition settings.
- Problem addressed: scarcity of high-quality paired CBCT–report data and complexity of volumetric CBCT interpretation that have limited automated CBCT reporting.
- System purpose: designed for integration into routine radiologist-AI co-authoring workflows (draft generation + human editing).
- Evaluation framework: clinically grounded, multi-level assessment covering both AI-generated drafts and radiologist-edited collaborative reports; combines automatic metrics with radiologist- and clinician-centered evaluation.
- Performance highlights:
  - Draft reports reach writing quality and standardization comparable to intermediate radiologists.
  - In collaborative use, CBCTRepD yields consistent, clinically meaningful improvements across experience tiers:
    - Novices → improvements toward intermediate-level reporting.
    - Intermediates → improvements toward senior-level reporting.
    - Seniors → fewer omission-related errors, including missed clinically important lesions.
- Clinical effects: better report structure, fewer omissions, and more systematic attention to co-existing lesions across anatomical regions—features important for diagnostic completeness and downstream care decisions.
Data & Methods
- Data curation:
  - Large-scale, high-quality paired dataset of CBCT studies and corresponding radiology reports (≈7,408 studies).
  - Bilingual reports (language details not specified).
  - Coverage includes 55 distinct oral and maxillofacial disease entities and a range of acquisition settings to increase heterogeneity and clinical realism.
- Model / system:
  - CBCTRepD: a report-generation system built using the curated dataset and intended for human-in-the-loop co-authoring workflows.
  - Emphasis on producing clinically usable drafts that radiologists can edit.
  - (Paper does not specify model architecture in the provided text; primary contribution is dataset plus workflow-focused evaluation.)
- Evaluation:
  - Multi-level framework that separately assesses:
    - Raw AI-generated drafts (automatic metrics + clinician review).
    - Radiologist-AI collaborative/final reports (how radiologists edit and the downstream clinical effects).
  - Evaluation modalities: automatic metrics (not named in the provided text; likely text-similarity or clinical-concept measures), radiologist-centered review (accuracy, omissions, structure), and clinician-centered assessment (clinical importance of findings, missed lesions).
  - Comparative analyses across radiologist experience levels (novice, intermediate, senior).
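The omission comparison across experience tiers described above can be illustrated with a minimal Python sketch. It assumes each report has been reduced to a set of finding labels that is compared against a reference set. The function names, record fields, and toy values are hypothetical and are not taken from the paper, whose actual metrics are not specified in the provided text.

```python
from collections import defaultdict


def omission_rate(reference_findings, reported_findings):
    """Fraction of reference findings that are absent from the report."""
    reference = set(reference_findings)
    if not reference:
        return 0.0  # nothing to miss
    missed = reference - set(reported_findings)
    return len(missed) / len(reference)


def mean_omission_by_group(records):
    """Average omission rate per (experience tier, workflow) pair."""
    rates = defaultdict(list)
    for rec in records:
        key = (rec["tier"], rec["workflow"])
        rates[key].append(omission_rate(rec["reference"], rec["reported"]))
    return {key: sum(vals) / len(vals) for key, vals in rates.items()}


# Toy records invented purely for illustration; NOT data from the paper.
records = [
    {"tier": "novice", "workflow": "unaided",
     "reference": ["finding_a", "finding_b"], "reported": ["finding_b"]},
    {"tier": "novice", "workflow": "ai_assisted",
     "reference": ["finding_a", "finding_b"],
     "reported": ["finding_a", "finding_b"]},
]

for group, rate in sorted(mean_omission_by_group(records).items()):
    print(group, rate)
```

Grouping by tier and workflow keeps the unaided and AI-assisted conditions separate, which mirrors the summary's distinction between raw drafts, collaborative reports, and radiologist experience levels.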
Implications for AI Economics
- Productivity and labor augmentation:
  - CBCTRepD can raise effective reporting quality per radiologist, particularly boosting less-experienced clinicians to higher-quality output; this implies productivity gains and more consistent report quality per unit of labor.
  - The system is an augmenting technology (human-in-the-loop), likely increasing output quality and throughput rather than fully substituting radiologists.
- Skill-biased effects and distributional impacts:
  - Strong complementarities with low- and mid-skill radiologists (largest relative gains for novices and intermediates) could compress quality-based wage differentials or alter demand for experience-specific tasks.
  - Seniors benefit mainly via reduced omissions and fewer high-consequence errors, which affects top-end quality assurance and liability exposure.
- Training and human capital:
  - Radiologist trainees may learn faster and produce higher-quality reports earlier, potentially shortening training time or changing training emphases (more focus on interpretation of edge/corner cases).
  - Conversely, overreliance on AI drafts could create new skill-atrophy risks unless training is adapted.
- Adoption economics:
  - Value proposition includes improved report quality, fewer missed lesions (which can reduce downstream costs/complications), and potential time savings per report; quantifying these gains vs implementation costs (integration, validation, workflow change, and regulatory compliance) will determine adoption speed.
  - Bilingual and multi-setting dataset improves generalizability, increasing expected adoption across regions and reducing localization costs.
- Liability, regulation, and reimbursement:
  - Reduced omission rates and systematic structure may lower medicolegal risk, but responsibility for final reports remains with clinicians; regulatory clarity and standards for AI-assisted reporting will influence deployment costs and insurer/health-system uptake.
  - Potential for new billing or reimbursement pathways for AI-augmented reporting quality needs exploration (e.g., quality-based incentives).
- Data and market dynamics:
  - The creation of a large paired dataset is valuable: data scarcity is a major barrier in specialized medical imaging domains, so datasets like this raise entry costs for competing products and can be a source of competitive advantage.
  - If such datasets are scarce and proprietary, market concentration around dataset holders or service providers is possible; open datasets or standards could mitigate that.
- Broader system effects:
  - More standardized, complete reports can improve downstream decision-making, referrals, and treatment planning; expected externalities include better patient outcomes and more efficient care pathways, potentially lowering total costs.
  - However, integration costs, need for human oversight, and potential false confidence in AI outputs must be managed to realize net economic benefits.
Caveats and open questions:
- Generalizability beyond the dataset population and imaging protocols used remains to be validated prospectively and across healthcare systems.
- Quantitative estimates of time saved, cost reductions, and effects on employment/wages were not provided; economic modeling and field trials are needed to quantify net impacts.
- Regulatory, liability, and clinician training adaptations will materially affect real-world economic outcomes.
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| The authors curated a paired CBCT–report dataset of approximately 7,408 CBCT studies covering 55 oral and maxillofacial disease entities that is bilingual and includes diverse acquisition settings. | Other | null_result | high | Dataset composition (number of studies, disease-entity coverage, bilingual status, and acquisition heterogeneity) | n=7408; 0.18 |
| CBCTRepD is a report-generation system trained on this curated paired dataset to produce bilingual CBCT radiology draft reports intended for radiologist-in-the-loop (co-authoring) workflows. | Other | null_result | high | System capability: generation of bilingual CBCT draft reports for human editing | 0.18 |
| Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists. | Output Quality | positive | medium | Writing quality and standardization of draft reports (AI drafts vs intermediate radiologists) | 0.11 |
| When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality. | Output Quality | positive | medium | Final report quality for novice radiologists in a co-authoring workflow | 0.11 |
| In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD. | Output Quality | positive | medium | Final report quality for intermediate radiologists in a co-authoring workflow | 0.11 |
| Senior radiologists using CBCTRepD produce collaborative reports with reduced omission-related errors, including fewer clinically important missed lesions. | Error Rate | positive | medium | Omission-related errors and clinically important missed lesions in final reports by senior radiologists | 0.11 |
| CBCTRepD improves report structure, reduces omissions, and promotes more systematic attention to co-existing lesions across anatomical regions in CBCT reports. | Output Quality | positive | medium | Report structure, omission rate, and documentation of multi-region co-existing lesions | 0.11 |
| The paper used a clinically grounded, multi-level evaluation framework that separately assessed raw AI drafts (automatic metrics + clinician review) and radiologist-AI collaborative final reports (how radiologists edit and downstream clinical effects), including comparisons across radiologist experience levels. | Other | null_result | high | Evaluation framework components (draft assessment, collaborative report assessment, automatic and clinician-centered modalities, experience-level comparisons) | 0.18 |
| The dataset and model are bilingual and cover varied acquisition settings, which the authors claim increases heterogeneity and clinical realism and should improve generalizability across care settings. | Output Quality | positive | high (for dataset composition claim); medium (for the implication about improved generalizability) | Dataset heterogeneity and implied generalizability across settings | 0.02 |
| The paper does not provide quantitative estimates of time saved per report, cost reductions, or effects on employment/wages; such economic impacts remain to be quantified. | Other | null_result | high | Absence of quantitative economic impact estimates (time saved, cost reduction, employment/wage effects) | 0.18 |