Performance of open-source and proprietary large language models in generating patient-friendly radiology chest CT reports

Prucker P, Busch F, Dorfner F, Mertens CJ, Bayerl N, Makowski MR, Bressem KK, Adams LC (2025)


Publication Type: Journal article

Publication year: 2025

Journal: Clinical Imaging

Book Volume: 125

Article Number: 110557

DOI: 10.1016/j.clinimag.2025.110557

Abstract

Rationale and objectives: Large language models (LLMs) show promise for generating patient-friendly radiology reports, but how open-source models compare with proprietary ones remains to be assessed. This study compared open-source and proprietary LLMs in generating patient-friendly radiology reports from chest CT reports, using quantitative readability metrics and qualitative assessments by radiologists.

Materials and methods: Fifty chest CT reports were processed by seven LLMs: three open-source models (Llama-3-70b, Mistral-7b, Mixtral-8x7b) and four proprietary models (GPT-4, GPT-3.5-Turbo, Claude-3-Opus, Gemini-Ultra). Simplification was evaluated using five quantitative readability metrics. Three radiologists rated patient-friendliness on a five-point Likert scale across five criteria. Content and coherence errors were counted. Inter-rater reliability and differences among models were assessed statistically.

Results: Inter-rater reliability was substantial to near perfect (κ = 0.76–0.86). Qualitatively, Llama-3-70b was non-inferior to leading proprietary models in 4/5 categories. GPT-3.5-Turbo showed the best overall readability, outperforming GPT-4 in two metrics. Llama-3-70b outperformed GPT-3.5-Turbo on the Coleman-Liau Index (CLI) (p = 0.006). Claude-3-Opus and Gemini-Ultra scored lower on readability but were rated highly in qualitative assessments. Claude-3-Opus maintained perfect factual accuracy. Claude-3-Opus and GPT-4 outperformed Llama-3-70b in emotional sensitivity (90.0 % vs 46.0 %, p < 0.001).

Conclusions: Llama-3-70b shows strong potential for generating high-quality, patient-friendly radiology reports, challenging proprietary models. With further adaptation, open-source LLMs could advance patient-friendly reporting technology.

How to cite

APA:

Prucker, P., Busch, F., Dorfner, F., Mertens, C.J., Bayerl, N., Makowski, M.R., ... Adams, L.C. (2025). Performance of open-source and proprietary large language models in generating patient-friendly radiology chest CT reports. Clinical Imaging, 125, Article 110557. https://doi.org/10.1016/j.clinimag.2025.110557

MLA:

Prucker, Philipp, et al. "Performance of open-source and proprietary large language models in generating patient-friendly radiology chest CT reports." Clinical Imaging 125 (2025).