Semantic segmentation of chest X-ray images based on the severity of COVID-19 infected patients



Introduction
Coronavirus disease (COVID-19) is an infectious disease caused by the newly discovered SARS-CoV-2, first identified in China [1]. Although the virus is derived from animals, currently available epidemiological data indicate that it spreads relatively quickly and easily among humans [2].
Various portals and researchers have published numerous studies about the new coronavirus and its consequences for human health, especially for the human respiratory system, which is most exposed to this virus [3][4][5][6]. People with pre-existing comorbidities, in particular lung disease, are at much higher risk of severe COVID-19 symptoms since they have reduced lung capacity and decreased oxygen saturation (SpO2) [7]. Early diagnosis of COVID-19 severity may be crucial in planning the patient's hospitalization site or securing respiratory aids [8]. For that reason, healthcare systems urgently require decision-making tools that assist clinicians with proper information in real time [9]. AI-based tools are commonly used in a variety of medical fields because they enable automated medical diagnosis with high performance [10,11]. Additionally, such tools can be used in other fields of science, economy, and technology [12][13][14].

Research Article - EAI Endorsed Transactions on Bioengineering and Bioinformatics, 03 2021 - 08 2021 | Volume 1 | Issue 3 | e3

The main application of AI-based approaches during the current pandemic is in predicting the spread and dynamics of the disease [15], developing vaccines [16], monitoring treatment [17], and medical imaging and diagnosis of infection [18]. Medical imaging is the process of obtaining images of body parts for diagnostic analysis to facilitate the detection and classification of disease. It offers a noninvasive, touch-free, and comparatively safer alternative for diagnosis [19]. Several studies have reported the successful implementation of AI algorithms for COVID-19 detection and prognosis using digital images such as lung computed tomography (CT) scans, chest X-rays, and lung ultrasounds.
Jaiswal et al. [20] propose automatic analysis of CT scans based on AI methods. The dataset used in the research is publicly available and consists of 2492 CT scans (1262 positive for COVID-19 and 1230 negative). Comparative studies show that the proposed classification model based on deep transfer learning outperforms the alternatives with an accuracy of 97%. Alom et al. [21] used multiple deep learning models to identify COVID-19 patients. The authors used CT scan and X-ray datasets, both publicly available, to evaluate the models. For the COVID-19 detection task, the Inception Recurrent Residual Neural Network (IRRCNN) was used, and the NABLA-N model was used for the segmentation task. The models achieved accuracies of 98.78% and 99.56%, respectively. In their work, Mangal et al. [22] propose the use of chest X-rays to prioritize the selection of patients for further RT-PCR testing. Moreover, such an approach may be beneficial in finding patients at high risk of COVID-19 infection who need to be retested after a false-negative RT-PCR. The proposed pre-trained AI model resulted in 95% accuracy with 100% sensitivity. Alqudah et al. [23] used different AI hybrid models to distinguish COVID-19 from non-COVID-19 patients on chest X-ray images. The best-performing model achieved a testing accuracy of 95.2%. Based on chest X-ray scans, Jain et al. [24] compared the performance of different deep learning models in classifying healthy and COVID-19-affected patients. Compared to the other models, Xception achieved the highest accuracy of 97.97%. Ozturk et al. [25], with the aid of deep neural networks, developed a model for binary (COVID-19 vs. No-Findings) and multiclass classification (COVID-19 vs. No-Findings vs. Pneumonia). For binary classification, the proposed model achieved an accuracy of 98.08%, and for multiclass classification, 87.02%.
An extensive literature analysis was conducted to gather the most up-to-date information on Artificial Intelligence (AI) based tools for COVID-19 and to determine their potential application for this disease. Many researchers have trained deep learning classification models to determine a binary state of infected or non-infected patients. However, literature on semantic segmentation is particularly scarce. With the high number of X-ray images obtained every day in hospitals from COVID-19 patients, image quality can vary for several reasons, which is a major problem for clinicians in establishing a diagnosis. For that reason, segmentation of anatomical structures plays an important role in many tasks relevant to accurate diagnosis, where an early determination of severity may be necessary for planning patients' hospitalization and respiratory aids [26].
The aim of this research is to implement an AI-based method to perform lung segmentation from chest X-rays of COVID-19-positive patients and to compare the performance of three different convolutional neural network (CNN) architectures.
The proposed AI approach is the first step towards a detailed analysis of the affected areas of the lungs and the development of an appropriate individual treatment plan for the patient. Such a system could aid clinicians in faster, more cost-effective decision-making.

Materials and Methods
The workflow of the proposed AI approach is organised as follows: first, the data obtained by Clinical Centre in Kragujevac will be augmented and used for training and validation of the AI-based models. The overall performance of the trained models will be estimated utilizing 5-fold cross-validation. The obtained results, in terms of performance and robustness, will be examined and discussed. Afterwards, the best-performing CNN architecture will be used to perform lung segmentation on X-ray data of COVID-19-positive patients obtained by Clinical Hospital Centre in Rijeka. Predicted masks will be graphically presented along with the corresponding X-ray images. The framework of the proposed approach is shown in Figure 1.

Dataset description
For this research, two X-ray datasets are used. The first dataset consists of 183 X-ray images of 1062 × 870-pixel size and has been used for training and validation purposes. The images were retrieved from Clinical Centre in Kragujevac and represent the lungs of 21 patients diagnosed with coronavirus. The patients were adults with a mean age of 59; approximately 33% of them were female and 67% male. The second dataset is the testing set and consists of 62 X-ray images of 1062 × 870-pixel size. The images were collected in Clinical Hospital Centre in Rijeka (KBC Ri) from 49 COVID-19-infected patients. Approximately 37% of the hospitalised patients in KBC Ri were female and 63% male, with a mean age of 61.
Originally, in the case of both datasets, the images have been divided into four classes based on the clinical picture of the patient: mild clinical picture, moderate clinical picture, severe clinical picture, and critical clinical picture as shown in Figure 2.
In order to perform lung segmentation from chest X-rays, a ground truth mask needs to be used as model input along with the corresponding X-ray image. A sample X-ray image and its corresponding ground truth mask are shown in Figure 3.
Since domains such as medical image analysis usually have limited availability of data, it is important to use augmentation techniques to substantially increase the number of samples [27]. The geometrical transformations used for the augmentation procedure are: 90-degree, 180-degree, and 270-degree anticlockwise rotations, horizontal flip, horizontal flip combined with a 90-degree anticlockwise rotation, vertical flip, and vertical flip combined with a 90-degree anticlockwise rotation.
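As a minimal sketch, the seven transformations above could be generated with NumPy, assuming each X-ray is loaded as a 2-D array; in practice the same transform must also be applied to the corresponding ground truth mask so the pair stays aligned:

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return the seven augmented variants used in this work."""
    return [
        np.rot90(image, 1),             # 90-degree anticlockwise rotation
        np.rot90(image, 2),             # 180-degree rotation
        np.rot90(image, 3),             # 270-degree anticlockwise rotation
        np.fliplr(image),               # horizontal flip
        np.rot90(np.fliplr(image), 1),  # horizontal flip + 90-degree rotation
        np.flipud(image),               # vertical flip
        np.rot90(np.flipud(image), 1),  # vertical flip + 90-degree rotation
    ]

xray = np.random.rand(870, 1062)  # placeholder for a loaded X-ray image
variants = augment(xray)
print(len(variants))  # 7 additional samples per original image
```

Note that `np.rot90` rotates anticlockwise by default, matching the rotations listed above; the 90- and 270-degree variants swap the image dimensions.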
Due to the limited availability of the data and the high imbalance of clinical picture classes, stratified 5-fold cross-validation is utilized to estimate the performance and robustness of the AI-based models. This way, within each fold, the training and validation sets have approximately the same proportion of each class as the original dataset. Considering the aforementioned augmentation procedures, a new training set with an additional 1022 images has been generated, resulting in a total of 1168 images, while the validation set remained unchanged. Since the second dataset, with the images collected in Clinical Hospital Centre in Rijeka, is used as the testing set, the augmentation procedure is not applied to it.
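The stratified split can be sketched with scikit-learn's `StratifiedKFold`; the severity labels below are hypothetical, since the paper does not report the per-class counts of the Kragujevac dataset:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical severity labels for the 183 training images; the real
# class proportions are not given in the paper, so these are made up.
rng = np.random.default_rng(42)
labels = rng.choice(["mild", "moderate", "severe", "critical"],
                    size=183, p=[0.40, 0.30, 0.20, 0.10])
image_ids = np.arange(183)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(image_ids, labels))
for k, (train_idx, val_idx) in enumerate(folds):
    # Within each fold, the class proportions of the training and
    # validation subsets approximately match the full dataset.
    print(f"fold {k}: {len(train_idx)} train, {len(val_idx)} val")
```

Augmentation would then be applied only to the training indices of each fold, never to the validation or testing images.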

DeepLabv3+
DeepLab is a deep learning model for semantic segmentation, intended to assign a semantic label to every pixel of the input image. The role of semantic segmentation, also known as image segmentation, is to group together sections of an image that belong to the same object class. DeepLab achieves dense prediction by computing a pixel-wise loss and up-sampling the output of the last convolutional layer; for up-sampling, it uses atrous convolution. Chen et al. [28] proposed the newest version of DeepLab, called DeepLabv3+. It adds a simple yet effective decoder module to DeepLabv3 in order to refine segmentation results, particularly along object boundaries. The framework of the proposed architecture is given in Figure 4. In the latest implementation, DeepLabv3+ supports the following network backbones:
• PNASNet - In their paper, Liu et al. [29] describe a more efficient method for learning the structure of CNNs. Progressive Neural Architecture Search (PNASNet) uses a sequential model-based optimization strategy to increase complexity progressively. Compared to techniques that search directly in the space of fully defined structures, such an architecture has several advantages. To begin with, simple structures train faster, providing initial results with which to quickly train the surrogate. The surrogate is then only applied to estimate the quality of structures that are slightly different from the ones it has already seen (trust-region methods). Lastly, the search space is factored into a product of smaller search spaces, making it possible to search models with several more blocks. On the ImageNet and CIFAR-10 datasets, the proposed method achieves state-of-the-art classification accuracies.
• ResNet - Since training deeper neural network models is more difficult due to the well-known vanishing gradient problem, He et al. [30] introduce a residual learning framework for training networks that are significantly deeper than previously used networks.
They improved the residual block as well as its pre-activation version, allowing gradients to flow unhindered to any previous layer through shortcut connections. Furthermore, they evaluated residual nets with a depth of up to 152 layers on ImageNet, which resulted in a 3.57% error.
• MobileNetV2 - Unlike traditional residual models that use an expanded input representation, MobileNetV2 uses thin bottleneck layers as the input to the residual block. Its architecture includes a fully convolutional layer with 32 filters followed by 19 residual bottleneck layers [10]. The described architecture improves the state of the art for a wide range of performance points on the ImageNet dataset. Howard et al. [33] demonstrated the next generation of MobileNets, called MobileNetV3, which combines novel architecture designs with complementary search techniques. Based on the presented results, the architecture outperforms MnasNet, ProxylessNAS, and MobileNetV2.
• Xception - Chollet [34] presents an Inception-inspired deep CNN architecture called Xception. In this architecture, Inception modules are replaced with depthwise separable convolutions. Convolutional layers in a typical convolutional neural network look for correlations in both depth and space. Xception goes a step further by mapping the spatial correlations for each channel separately with depthwise convolutions and capturing cross-channel correlations with 1x1 pointwise convolutions [10]. On the ImageNet dataset, Xception demonstrates slight improvements in classification performance compared to Inception V3, but significant gains on the JFT dataset.
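Atrous convolution, the mechanism behind DeepLab's dense prediction mentioned above, can be illustrated with a one-dimensional sketch: spacing the kernel taps `rate` samples apart enlarges the receptive field without adding weights (2-D atrous convolution works the same way along each spatial axis):

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the receptive field at no parameter cost."""
    k = len(kernel)
    span = rate * (k - 1) + 1  # effective receptive field size
    out = np.array([
        sum(kernel[j] * x[i + j * rate] for j in range(k))
        for i in range(len(x) - span + 1)
    ])
    return out, span

signal = np.arange(16, dtype=float)
kernel = [1.0, 1.0, 1.0]
out1, span1 = atrous_conv1d(signal, kernel, rate=1)  # span 3, like a plain conv
out4, span4 = atrous_conv1d(signal, kernel, rate=4)  # span 9, same 3 weights
print(span1, span4)  # 3 9
```

The Atrous Spatial Pyramid Pooling used in the experiments below applies several such dilated filters in parallel (e.g. rates 12, 24, 36) to capture context at multiple scales.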

Evaluation criteria
In order to evaluate and interpret the results obtained with semantic segmentation, Mean Intersection-Over-Union (mIOU) is used. IOU, also known as the Jaccard Index, is one of the most widely used metrics for segmentation tasks. IOU can be calculated as follows [35]:

IOU = TP / (TP + FP + FN), (1)

where TP represents true positives, FP false positives, and FN false negatives. In the multiclass case, mIOU is calculated by summing the IOU values of each semantic class and dividing by the total number of classes, i.e. averaging the per-class IOU values.
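A minimal NumPy implementation of Eq. (1) and its multiclass average might look like this (binary masks assumed; helper names are illustrative):

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Jaccard index TP / (TP + FP + FN) for binary masks, per Eq. (1)."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    return tp / (tp + fp + fn)

def mean_iou(pred: np.ndarray, truth: np.ndarray, classes) -> float:
    """mIOU: average of the per-class IOU values."""
    return float(np.mean([iou(pred == c, truth == c) for c in classes]))

pred  = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
print(iou(pred, truth))  # TP=1, FP=1, FN=0 -> 0.5
```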
Other metrics used for semantic segmentation are Dice Coefficient (F1), Accuracy (ACC), Precision, Sensitivity, and Specificity. The aforementioned metrics can be calculated as follows [10]:

F1 = 2TP / (2TP + FP + FN), (2)
ACC = (TP + TN) / (TP + TN + FP + FN), (3)
Precision = TP / (TP + FP), (4)
Sensitivity = TP / (TP + FN), (5)
Specificity = TN / (TN + FP), (6)

where TN represents true negatives. Higher values of the performance measures defined by Eqs. (1)-(6) mean better overall performance of the model.
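The same pixel-level confusion counts give Eqs. (2)-(6) directly; a small helper for illustration (the counts below are made up, not taken from the experiments):

```python
def segmentation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Pixel-level metrics of Eqs. (2)-(6) from confusion counts."""
    return {
        "F1":          2 * tp / (2 * tp + fp + fn),
        "ACC":         (tp + tn) / (tp + tn + fp + fn),
        "Precision":   tp / (tp + fp),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
    }

# Illustrative counts only.
m = segmentation_metrics(tp=80, tn=100, fp=10, fn=10)
print(m["ACC"])  # (80 + 100) / 200 = 0.9
```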

Results and discussion
In this section, the experimental results achieved with DeepLabv3+ and three different backbones are demonstrated and discussed. The backbones used in this research are Xception_65, MobileNetV2, and ResNet101. Additionally, each model was pre-trained on the Cityscapes dataset before training on the X-ray dataset. X-ray images from the first dataset were used along with the corresponding ground truth masks for the training and validation process. In the case of the Xception_65 and ResNet101 backbones, Atrous Spatial Pyramid Pooling (ASPP) was used in a configuration with atrous rates of 12, 24, and 36. However, in the case of MobileNetV2, atrous rates were not used. Moreover, in all three approaches the output stride was set to 8, and the decoder output stride to 4. During training, Adam was used as the optimizer with a learning rate of 0.001. The weight decay for the Xception_65 and MobileNetV2 model variants was set to 0.00004, while for ResNet101 it was set to 0.0001. The performances of each of the aforementioned models achieved with 5-fold cross-validation are shown in Table 1. The robustness of the models is reflected in the standard deviation: a lower standard deviation indicates a more robust model. DeepLabv3+ with ResNet101 as backbone resulted in the lowest standard deviation of ± 0.013 in the case of the mIOU performance measure, while in the case of precision, sensitivity, and specificity it was the worst-performing, with values of ± 0.046, ± 0.042, and ± 0.014, respectively. Xception_65 resulted in the lowest standard deviations of ± 0.028, ± 0.008, and ± 0.008 for precision, sensitivity, and specificity, respectively. However, in the case of the F1 measure, it was the worst-performing, with a value of ± 0.014.
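For context, each mean ± standard deviation entry in Table 1 is computed over the five validation folds; with hypothetical per-fold mIOU scores (not the paper's actual fold results), the calculation looks like this:

```python
import numpy as np

# Hypothetical per-fold mIOU scores for one backbone; Table 1 reports
# cross-validation results aggregated as mean +/- standard deviation.
fold_miou = np.array([0.905, 0.921, 0.898, 0.913, 0.908])
mean, std = fold_miou.mean(), fold_miou.std()
print(f"mIOU: {mean:.3f} +/- {std:.3f}")
```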
When all results and standard deviations are considered together, it can be concluded that the best performance with the highest robustness was achieved when Xception_65 was used as the DeepLabv3+ backbone in the aforementioned configuration of model parameters.
Considering the results presented in Table 1, the best performing approach was utilized in order to perform lung segmentation on the X-ray images from the second dataset. One sample from each class was afterwards used to graphically demonstrate the performance of the proposed approach on new, unseen data, as shown in Figure 5.
By overlapping the predicted mask and the X-ray image, as shown in the third column of Figure 5, it can be observed that the AI-based algorithm is capable of performing accurate lung segmentation.

Figure 5. Visual representation of a chest X-ray image, semantic segmentation result, predicted mask combined with the X-ray image, cropped image, and cropped and segmented image. The first column shows samples of lung X-ray images obtained by clinicians in Clinical Hospital Centre in Rijeka. The second column shows the semantic segmentation results, i.e. the predicted masks, while the third column shows the original X-ray images combined with the predicted masks. The fourth column shows the images cropped to the segmented area. Finally, the last column shows the segmented lungs isolated from the other regions.

Moreover, the X-ray images were afterwards cropped to the segmented area in order to discard surrounding parts of the image that are not relevant for further analysis. With the aim of focusing only on the lung regions, the segmented lungs were further isolated, as shown in the last column of Figure 5.
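The cropping and isolation steps of the last two columns can be sketched as follows, assuming the predicted mask is a binary array of the same size as the X-ray (function name is illustrative):

```python
import numpy as np

def crop_and_isolate(xray: np.ndarray, mask: np.ndarray):
    """Crop the X-ray to the bounding box of the predicted lung mask,
    then zero out every pixel outside the mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]   # first/last rows containing lung
    c0, c1 = np.where(cols)[0][[0, -1]]   # first/last columns containing lung
    cropped = xray[r0:r1 + 1, c0:c1 + 1]
    isolated = cropped * mask[r0:r1 + 1, c0:c1 + 1]
    return cropped, isolated

xray = np.random.rand(870, 1062)          # placeholder X-ray
mask = np.zeros((870, 1062), dtype=int)   # placeholder predicted mask
mask[100:700, 200:900] = 1
cropped, isolated = crop_and_isolate(xray, mask)
print(cropped.shape)  # (600, 700)
```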
Due to local minima caused by shading effects, non-rigid shape variations of the lungs due to differing heart dimensions, and the presence of strong edges from the rib cage, lung field segmentation in X-ray images tends to be difficult. Extensive literature searches show that deep learning is the most effective machine learning tool for analysing X-ray images and can have a significant impact on the screening of COVID-19. Rahman et al. [36] composed a dataset consisting of 3616 COVID-19 chest X-ray images along with corresponding ground truth lung masks in order to perform semantic segmentation. The proposed U-Net model resulted in Accuracy, IOU, and F1 of 98.63%, 94.3%, and 96.94%, respectively. Ankalaki et al. [37] used DeepLabv3 with ResNet101 as the backbone to segment the COVID-19-affected region. The authors concluded that the research could be extended by improving the performance of the model, conducting experiments with different backbone networks, and fine-tuning the atrous convolution parameters. Compared to Ankalaki et al.'s paper, our model based on DeepLabv3+ shows strong semantic segmentation results and is the first step towards the development of an automated system for individual patient treatment planning. In future work, the segmented areas will be used for a detailed analysis of the COVID-19-affected area of the lungs.

Conclusion
Due to the extreme clinical significance during the current pandemic and the complexity of X-ray images, researchers are actively investigating image processing and machine learning techniques to improve computational methods that assist radiologists in interpreting chest images.
The first step required to compose an automated computer-aided system is segmentation. In this research, the authors demonstrate semantic segmentation of the lung area on X-ray images. DeepLabv3+ with Xception_65 as the backbone resulted in the highest performance measures of 0.910 ± 0.015 mIOU, 0.925 ± 0.014 F1, 0.968 ± 0.005 accuracy, 0.916 ± 0.028 precision, 0.935 ± 0.008 sensitivity, and 0.977 ± 0.008 specificity.
Based on the results of the proposed approach, the AI model has shown to be effective in terms of lung segmentation from chest X-ray images and has a lot of potential for clinical use in detecting COVID-19 lung abnormalities. Due to limited data availability, future work should use a dataset with more X-ray images to increase the effectiveness and robustness of the system.