
Research Article
An Automated Method of Identifying Incorrectly Labelled Images Based on the Sequences of Loss Functions of Deep Learning Networks
@INPROCEEDINGS{10.1007/978-3-030-67514-1_21,
  author={Zhipeng Zhang and Wenhui Shou and Wengting Ma and Dongjia Xing and Qingqing Xu and Li-Qun Xu and Qingxia Fan and Ling Xu},
  title={An Automated Method of Identifying Incorrectly Labelled Images Based on the Sequences of Loss Functions of Deep Learning Networks},
  proceedings={IoT as a Service. 6th EAI International Conference, IoTaaS 2020, Xi’an, China, November 19--20, 2020, Proceedings},
  proceedings_a={IOTAAS},
  year={2021},
  month={1},
  keywords={Deep learning, Medical image classification, Incorrectly labelled sample identification},
  doi={10.1007/978-3-030-67514-1_21}
}
Zhipeng Zhang
Wenhui Shou
Wengting Ma
Dongjia Xing
Qingqing Xu
Li-Qun Xu
Qingxia Fan
Ling Xu
Year: 2021
IOTAAS
Springer
DOI: 10.1007/978-3-030-67514-1_21
Abstract
Deep learning has been widely applied to medical image analysis tasks. Because labelled medical images are the foundation for training, validating, and testing deep learning classification models, the quality of the labelling process directly affects model performance. However, it has been estimated that up to ten percent of manually labelled medical images may be incorrectly labelled. This paper proposes an automated method of identifying incorrectly labelled medical images that uses the sequences of loss function values produced by deep learning classification networks over multiple training epochs. The labels of the identified images can then be reviewed and updated by senior, experienced physicians, ultimately improving both the quality of labelled medical image datasets and the performance of the deep learning models trained on them.
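The abstract does not say how the loss sequences are turned into a decision, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a logistic-regression stand-in for the deep network, synthetic two-dimensional data in place of fundus images, and a simple "highest mean loss across epochs" ranking rule. The names `per_sample_loss_sequences` and `flag_suspects` are hypothetical.

```python
import numpy as np

def per_sample_loss_sequences(x, y, epochs=20, lr=0.5):
    """Train a logistic-regression stand-in and record every sample's
    cross-entropy loss at each epoch (one row per sample)."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    history = np.zeros((n, epochs))
    for epoch in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        p = np.clip(p, 1e-7, 1 - 1e-7)          # avoid log(0)
        history[:, epoch] = -(y * np.log(p) + (1 - y) * np.log(1 - p))
        grad = p - y                             # dLoss/dlogit per sample
        w -= lr * (x.T @ grad) / n               # full-batch gradient step
        b -= lr * grad.mean()
    return history

def flag_suspects(history, fraction):
    """Assumed rule: flag the samples with the highest mean loss."""
    scores = history.mean(axis=1)
    k = int(round(fraction * len(scores)))
    return np.argsort(scores)[::-1][:k]

# Synthetic data: two well-separated classes, 6% of labels flipped.
rng = np.random.default_rng(0)
n = 1000
true_labels = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, 2)) + np.where(true_labels[:, None] == 1, 2.0, -2.0)
flipped = rng.choice(n, size=60, replace=False)
noisy_labels = true_labels.copy()
noisy_labels[flipped] = 1 - noisy_labels[flipped]

history = per_sample_loss_sequences(x, noisy_labels.astype(float))
suspects = flag_suspects(history, fraction=0.06)
recall = len(set(suspects) & set(flipped)) / len(flipped)
print(f"share of flipped labels among flagged samples found: {recall:.2f}")
```

Samples whose labels contradict what the model learns from the rest of the data tend to keep a high loss from epoch to epoch, which is why a ranking over the recorded sequence can surface them for review.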
Two experiments on a fundus image dataset for referable diabetic retinopathy screening were carried out to validate the proposed method.
a) The first experiment verified that the method accurately identifies incorrectly labelled samples within a labelled dataset. For a fundus image dataset comprising 10788 samples with gold-standard labels (5394 non-referable diabetic retinopathy samples and 5384 referable diabetic retinopathy samples), the labels of a small portion (6%, 648) of the images were intentionally flipped to the opposite class to simulate real-world labelling errors. The proposed method successfully identified 75.31% (488) of the incorrectly labelled samples, while only 4.85% (492) of the correctly labelled samples were wrongly flagged as incorrectly labelled.
b) In the second experiment, the 980 samples (only 9.1% of the whole dataset) identified as incorrectly labelled were reviewed and their labels corrected, and the deep learning classification model for referable diabetic retinopathy screening was then retrained. Tested on an independent test dataset with completely correct labels (700 non-referable diabetic retinopathy samples and 700 referable diabetic retinopathy samples), the best accuracy of the model increased from 95.93% (trained on the dataset with 6% incorrectly labelled samples) to 96.50% (trained on the revised dataset with 1.5% incorrectly labelled samples), approaching the ideal value of 96.57% (trained on the original dataset with 0% incorrectly labelled samples). This demonstrates the effectiveness of the proposed method in improving the performance of deep learning models.
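The percentages reported for the two experiments follow from the sample counts, and the arithmetic can be reproduced as a quick check. The variable names are illustrative, and the precision figure is derived here rather than stated in the abstract. The residual-noise figure assumes that review corrects every detected flipped label and leaves the false alarms unchanged.

```python
total = 10788        # samples in the training dataset
flipped = 648        # labels intentionally changed
detected = 488       # flipped samples the method identified
false_alarms = 492   # correctly labelled samples flagged by mistake

clean = total - flipped                          # correctly labelled samples
recall = detected / flipped                      # share of flipped labels found
false_positive_rate = false_alarms / clean       # share of clean samples flagged
reviewed = detected + false_alarms               # samples sent for review
reviewed_share = reviewed / total                # reviewing workload
residual_noise = (flipped - detected) / total    # noise left after review
precision = detected / reviewed                  # derived, not in the abstract

print(f"recall              {recall:.2%}")
print(f"false positive rate {false_positive_rate:.2%}")
print(f"reviewed            {reviewed} ({reviewed_share:.1%} of the dataset)")
print(f"residual noise      {residual_noise:.1%}")
print(f"precision           {precision:.1%}")
```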