Model soups improve performance of dermoscopic skin cancer classifiers

Maron RC, Hekler A, Haggenmüller S, von Kalle C, Utikal JS, Müller V, Gaiser M, Meier F, Hobelsberger S, Gellrich FF, Sergon M, Hauschild A, French LE, Heinzerling L, Schlager JG, Ghoreschi K, Schlaak M, Hilke FJ, Poch G, Korsing S, Berking C, Heppt M, Erdmann M, Haferkamp S, Schadendorf D, Sondermann W, Goebeler M, Schilling B, Kather JN, Fröhling S, Lipka DB, Krieghoff-Henning E, Brinker TJ (2022)

Publication Type: Journal article

Publication year: 2022

Journal

European Journal of Cancer (Elsevier)

Book Volume: 173

Pages Range: 307-316

DOI: 10.1016/j.ejca.2022.07.002

Abstract

Background: Image-based cancer classifiers suffer from a variety of problems which negatively affect their performance. For example, variation in image brightness or the use of different cameras can be enough to degrade performance. Ensemble solutions, where multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single-model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance.

Objective: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties.

Methods: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components.

Results: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par.

Conclusions: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement, and combining multiple models into a single model reduces complexity. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
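The weight-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes a uniform (unweighted) average over models that share one architecture; the abstract does not state which averaging scheme the authors used. The name `uniform_soup`, the `Weights` type and the toy parameter names are hypothetical, and plain Python lists stand in for the tensors a deep learning framework would provide.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a framework state dict: parameter name -> flat values.
Weights = Dict[str, List[float]]


def uniform_soup(models: Sequence[Weights]) -> Weights:
    """Average parameters element-wise across models with the same architecture."""
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("all models must share the same parameter names")
    soup: Weights = {}
    for name in names:
        tensors = [m[name] for m in models]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Assumption: uniform average, every model contributes equally.
        soup[name] = [sum(vals) / len(models) for vals in zip(*tensors)]
    return soup


# Toy example: three "fine-tuned" models with two parameter groups each.
model_a = {"conv.weight": [1.0, 2.0], "fc.bias": [0.0]}
model_b = {"conv.weight": [3.0, 4.0], "fc.bias": [3.0]}
model_c = {"conv.weight": [5.0, 6.0], "fc.bias": [6.0]}

soup = uniform_soup([model_a, model_b, model_c])
print(soup)  # {'conv.weight': [3.0, 4.0], 'fc.bias': [3.0]}
```

The result is a single set of weights, so inference costs the same as for one model, whereas an ensemble would need one forward pass per member.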

Authors with CRIS profile

Carola Berking, Lehrstuhl für Haut- und Geschlechtskrankheiten

Markus Heppt, Department of Dermatology

Michael Erdmann, Department of Dermatology

Involved external institutions

Charité - Universitätsmedizin Berlin, Germany (DE)

Nationales Centrum für Tumorerkrankungen (NCT), Germany (DE)

Universitätsklinikum Essen, Germany (DE)

Universitätsklinikum Würzburg, Germany (DE)

Universitätsklinikum Schleswig-Holstein (UKSH), Germany (DE)

Deutsches Krebsforschungszentrum (DKFZ), Germany (DE)

Universitätsklinikum Mannheim / University Medical Centre Mannheim (Universitätsmedizin Mannheim), Germany (DE)

Klinikum der Universität München (LMU Klinikum), Germany (DE)

Universitätsklinikum Regensburg, Germany (DE)

Berliner Institut für Gesundheitsforschung in der Charité / Berlin Institute of Health at Charité (BIH), Germany (DE)

Universitätsklinikum Aachen (UKA), Germany (DE)

How to cite

APA:

Maron, R.C., Hekler, A., Haggenmüller, S., von Kalle, C., Utikal, J.S., Müller, V., ... Brinker, T.J. (2022). Model soups improve performance of dermoscopic skin cancer classifiers. European Journal of Cancer, 173, 307-316. https://doi.org/10.1016/j.ejca.2022.07.002

MLA:

Maron, Roman C., et al. "Model soups improve performance of dermoscopic skin cancer classifiers." European Journal of Cancer 173 (2022): 307-316.
