Semi-supervised Learning for COVID-19 Image Classification via ResNet

Coronavirus disease 2019 (COVID-19) is an ongoing global pandemic affecting over 200 countries and territories, and it has become a major public health concern across the international community. Analysis of X-ray imaging data can play a critical role in timely and accurate screening for COVID-19. Supervised deep learning has been successfully applied to recognize COVID-19 pathology in X-ray imaging datasets. However, it requires a substantial number of annotated X-ray images to train models, and such annotations are often unavailable for emerging events such as the COVID-19 outbreak, especially in its early stage. To address this challenge, this paper proposes a two-path semi-supervised deep learning model, ssResNet, based on the Residual Neural Network (ResNet) for COVID-19 image classification, where the two paths are a supervised path and an unsupervised path. Moreover, we design a weighted supervised loss that assigns a higher weight to the minority classes during training to mitigate the data imbalance. Experimental results on the large-scale X-ray image dataset COVIDx demonstrate that the proposed model can achieve promising performance even when trained on very few labeled training images.


I. INTRODUCTION
The coronavirus disease 2019 (COVID-19) outbreak has caused heavy losses to the world's economy and to human life. To reduce the spread of COVID-19 and the death rate, it is essential to detect the disease at an early stage with effective and timely screening/testing and to place COVID-19 infected patients in quarantine immediately [1], [2]. Artificial intelligence (AI), an emerging technology for medical image processing, has actively contributed to the fight against COVID-19 [3]. Compared to the traditional imaging workflow that relies heavily on human interpretation, AI enables safer, more accurate, and more efficient imaging solutions.
Recent AI-empowered applications in COVID-19 detection include the dedicated imaging platform, the lung and infection region segmentation, as well as the clinical assessment and diagnosis [4], [5], [6]. Moreover, commercial products integrate AI to combat COVID-19 and demonstrate the capability of the AI technology [4]. All of these examples show the tremendous enthusiasm cast by the public for AI-empowered progress in the medical imaging field, especially during the ongoing COVID-19 pandemic.
Within AI-based COVID-19 research, COVID-19 image classification, which separates COVID-19 patients from non-COVID-19 subjects using features extracted from medical images, has attracted growing attention. In particular, supervised deep learning such as convolutional neural networks (CNN) has been very popular in this research area. For example, Wang et al. proposed a 2D CNN supervised model that analyzes delineated region patches to classify between COVID-19 and typical viral pneumonia [7]. Similarly, Xu et al. utilized candidate infection regions to perform COVID-19 classification via a supervised ResNet-18 [8].
In addition, as a powerful deep learning model for medical image analysis, UNet [9] was employed for COVID-19 image classification and segmentation. For example, Zheng et al. employed UNet to obtain lung segmentation and predicted the probability of COVID-19 with 3D CNN on segmentation features [6]. Jin et al. proposed a UNet++ based segmentation model for locating lesions and built a ResNet-50 based classification model for COVID-19 diagnosis [10]. Chen et al. implemented COVID-19 classification with the patterns of segmented lesions extracted by supervised UNet++ [11], [12]. Moreover, they employed a 2D Deeplab model for the lung segmentation and a 2D ResNet-152 model for lung-mask slice based identification of positive COVID-19 cases [13]. Although supervised deep learning presents impressive performance on COVID-19 image classification, it requires a large amount of annotated medical images to train models, which is not practical with respect to limited data resources related to COVID-19, due to huge costs of labeling medical images, and labeling noise [14].
Fig. 1. Framework of the proposed semi-supervised learning model. The input x is a medical image. Labels y are available only for the labeled inputs. The shared ResNet evaluates the input to obtain low-level representations, which serve as inputs to the supervised ResNet and the unsupervised ResNet. These three ResNets are built with residual blocks, and N, M, and K are their respective numbers of residual blocks. $z^{sup}$ and $z^{unsup}$ are the outputs of the supervised ResNet and the unsupervised ResNet, respectively. $z^{sup}$ and y are used to calculate a weighted cross entropy loss $l_{WCEL}$, where w is the weight assigned to the different classes of samples, whereas $z^{sup}$ and $z^{unsup}$ are used to calculate a mean squared error loss $l_{MSEL}$. We jointly optimize the combined losses, where λ is the weight for the unsupervised loss. ⊕ is the shortcut connection in the residual operation.

To reduce the effort of labeling medical images for COVID-19 image classification, we build a two-path semi-supervised deep learning model, based on residual neural networks (ResNet) [15], that is able to learn from both labeled and unlabeled medical images. ResNet is an artificial neural network developed by mimicking pyramidal cells in the cerebral cortex. It introduces a so-called "identity shortcut connection" that skips one or more layers, based on the idea that stacking layers should not degrade network performance. With ResNet, we implement a two-path semi-supervised learning model composed of three components, namely, a shared ResNet, a supervised ResNet, and an unsupervised ResNet.
The framework of the proposed model is shown in Fig. 1. One path is composed of the shared ResNet and the supervised ResNet, while the other path consists of the shared ResNet and the unsupervised ResNet. All data (labeled and unlabeled) are used to calculate the unsupervised loss, which is the mean squared error loss (MSEL), while only labeled data are used to calculate the supervised loss, which is the cross entropy loss (CEL). Specifically, we design a weighted cross entropy loss (WCEL) that assigns more weight to the COVID-19 class to address the data imbalance. Reducing MSEL enhances the image representation, while decreasing WCEL enhances classification performance. We validate the proposed model on the large-scale X-ray image dataset COVIDx, and experimental results demonstrate that the proposed model can accomplish COVID-19 image classification with promising performance even when trained on an extremely limited number of labeled X-ray images.
The contributions of this study are as follows.
• We propose a semi-supervised deep learning model with ResNet through jointly training a supervised ResNet and an unsupervised ResNet. We observed that the proposed model can learn from unlabeled and labeled images jointly for COVID-19 image classification with high performance.
• The proposed model is validated on a large-scale COVID-19 image dataset. Experimental results indicate that the proposed model is able to effectively recognize COVID-19 images by learning from very few labeled medical images, for example, less than 10% of the samples in the training data. This meets the constraint of few available labeled data in the medical domain for real applications [14], especially at the early stage of such a global pandemic.

II. PROPOSED METHODOLOGY
We propose a semi-supervised ResNet to address the lack of labeled data for COVID-19 image classification; the detailed framework is shown in Fig. 1. The shared ResNet generates a new representation z of the input x by composing three kinds of operations: the convolutional operation $f_{conv}(\cdot)$, N sequential residual operations $f_{Resblock_N} \cdots f_{Resblock_1}(\cdot)$, where $f_{Resblock}(\cdot)$ is the residual operation [15], and the pooling operation $f_{pooling}(\cdot)$. Introducing this shared ResNet is inspired by deep multi-task learning [16], [17], in which different tasks share a low-level feature representation extracted from the input x. The reason for learning low-level feature representations instead of directly using x is that the original representation may not have enough expressive power for multiple tasks [18]. With the training data of all tasks, a more powerful representation can be learned for all tasks, and this representation improves performance. As shown in Fig. 1, our proposed model has two "tasks", namely, a supervised task and an unsupervised task, which is similar to the framework of deep multi-task learning. Therefore, the shared ResNet is needed to feed the low-level representations to these two tasks.
The output z from the shared ResNet is evaluated by two ResNets, namely, a supervised ResNet and an unsupervised ResNet. The supervised ResNet learns the deep features of labeled samples. Its output $z^{sup}$ is computed from z with residual operations of the form

$$f^{sup}_{Resblock}(z) = z + f^{sup}_{conv}(f^{sup}_{conv}(z)).$$
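As a rough illustration of the residual operation, which adds the input z to the result of two stacked convolutions, the sketch below works on a toy 1-D signal. It is not the authors' implementation: the 1-D setting, the kernel values, and the helper names `conv1d_same` and `residual_block` are assumptions made for illustration, and the batch normalization and activation layers of a standard residual block [15] are omitted.

```python
from typing import List, Sequence


def conv1d_same(z: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; assumes an odd-length kernel
    so that the output has the same length as the input."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(z) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(z))
    ]


def residual_block(
    z: Sequence[float],
    kernel_a: Sequence[float],
    kernel_b: Sequence[float],
) -> List[float]:
    """Toy residual operation: z + conv(conv(z)), i.e. an identity shortcut
    around two stacked convolutions."""
    inner = conv1d_same(conv1d_same(z, kernel_a), kernel_b)
    return [z_i + f_i for z_i, f_i in zip(z, inner)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    # With all-zero kernels the convolutional branch vanishes and the block
    # reduces to the identity mapping, which is the point of the shortcut.
    print(residual_block(signal, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

When the convolutional branch contributes nothing, the block passes its input through unchanged, which is why stacking such blocks should not degrade the representation.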
As in the shared ResNet, we employ the pooling operation $f^{sup}_{pooling}(\cdot)$, the convolutional operation $f^{sup}_{conv}(\cdot)$, and M sequential residual operations $f^{sup}_{Resblock}(\cdot)$. Moreover, we build the unsupervised ResNet to generate another representation, $z^{unsup}$, of all inputs, including both labeled and unlabeled data. Similarly, we employ the pooling operation $f^{unsup}_{pooling}(\cdot)$, the convolutional operation $f^{unsup}_{conv}(\cdot)$, and K sequential residual operations $f^{unsup}_{Resblock}(\cdot)$ to build the unsupervised ResNet. Then, we use the two vectors $z^{sup}$ and $z^{unsup}$ to calculate the weighted cross entropy loss (WCEL) for the supervised path and the mean squared error loss (MSEL) for the unsupervised path. In these losses, y is the ground truth of the input, w is the corresponding weight, and φ(·) is the softmax activation function. $l_{WCEL}$ is the weighted cross entropy loss that accounts for the loss on labeled inputs. To enhance classification performance for the minority class (COVID-19 class), we assign more weight to the COVID-19 class, so that during learning the classifier pays more attention to this class and the learning bias caused by data imbalance is reduced. $l_{MSEL}$ measures the differences between $z^{sup}$ and $z^{unsup}$. Since training ResNets with dropout regularization and gradient-based optimization is a stochastic process, the two ResNets will have different link weights after training. In other words, there will be differences between the two prediction vectors $z^{sup}$ and $z^{unsup}$ produced by the supervised ResNet and the unsupervised ResNet. These differences can be treated as an error in the classification, and thus minimizing this loss is a goal of the proposed model, which is inspired by the Π model [19].
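The two losses can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the averaging conventions (mean over labeled samples for the weighted cross entropy, mean over all vector elements for the squared error), the use of `None` to mark unlabeled samples, and the function names are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the activation written as phi in the text)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(
    z_sup: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    class_weights: Sequence[float],
) -> float:
    """Weighted cross entropy over labeled samples only.
    A label of None marks an unlabeled sample (assumed convention)."""
    losses = [
        -class_weights[y] * math.log(softmax(z)[y])
        for z, y in zip(z_sup, labels)
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


def mean_squared_error(
    z_sup: Sequence[Sequence[float]],
    z_unsup: Sequence[Sequence[float]],
) -> float:
    """Mean squared difference between the two paths' outputs, computed
    over every sample, labeled or not."""
    total, count = 0.0, 0
    for a, b in zip(z_sup, z_unsup):
        for a_i, b_i in zip(a, b):
            total += (a_i - b_i) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    z_sup = [[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]]
    z_unsup = [[1.8, 0.6, 0.0], [0.1, 0.3, 1.2]]
    labels = [0, None]                 # second sample is unlabeled
    weights = [1.0, 1.0, 5.0]          # hypothetical: heavier COVID-19 class
    print(weighted_cross_entropy(z_sup, labels, weights))
    print(mean_squared_error(z_sup, z_unsup))
```

Note how the unlabeled sample contributes to the squared error but not to the cross entropy, which mirrors the split between the unsupervised and supervised paths.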
Based on these two losses, the total loss is defined by

$$Loss = l_{WCEL} + \lambda \, l_{MSEL},$$

where λ is the weight for $l_{MSEL}$. Training the proposed model amounts to minimizing Loss on the training data. At the beginning of training, the total loss and the learning gradients are dominated by the supervised loss component, i.e., by the labeled data only. At later stages of training, unlabeled data contribute more than labeled data. This transition is controlled by tuning λ [19]. The detailed learning steps of the proposed model are shown in Algorithm 1. $f_{\theta_{shared}}(\cdot)$ learns the low-level features from the medical images. The parameters of the shared ResNet, $\theta_{shared}$, include the weights learned for the pooling operation $f_{pooling}(\cdot)$, the convolutional operation $f_{conv}(\cdot)$, and the residual operation $f_{Resblock}(\cdot)$.

Algorithm 1. Learning procedure of the proposed model.
    for each minibatch B do
        $z_{i \in B} \leftarrow f_{\theta_{shared}}(x_{i \in B})$          ▷ shared representation
        $z^{sup}_{i \in B} \leftarrow f_{\theta_{sup}}(z_{i \in B})$          ▷ supervised representation
        $z^{unsup}_{i \in B} \leftarrow f_{\theta_{unsup}}(z_{i \in B})$      ▷ unsupervised representation
        update $\theta_{shared}$, $\theta_{sup}$, $\theta_{unsup}$ using an optimizer, e.g., ADAM
    return $\theta_{shared}$, $\theta_{sup}$, $\theta_{unsup}$

After extracting low-level feature representations from the inputs, we use $f_{\theta_{sup}}(\cdot)$ and $f_{\theta_{unsup}}(\cdot)$ to obtain the higher-level representations $z^{sup}$ and $z^{unsup}$, where $z^{sup}$ is used to perform COVID-19 classification. In addition, $z^{sup}$ and $z^{unsup}$ are together employed to enhance the image representations. The parameters of the supervised ResNet, $\theta_{sup}$, include the weights learned for the pooling operation $f^{sup}_{pooling}(\cdot)$, the convolutional operation $f^{sup}_{conv}(\cdot)$, and the residual operation $f^{sup}_{Resblock}(\cdot)$, while those of the unsupervised ResNet, $\theta_{unsup}$, consist of the weights learned for the pooling operation $f^{unsup}_{pooling}(\cdot)$, the convolutional operation $f^{unsup}_{conv}(\cdot)$, and the residual operation $f^{unsup}_{Resblock}(\cdot)$. In particular, during training we address the data imbalance by assigning more weight $w_i$ to samples of the minority class (COVID-19 class). Finally, we employ the ADAM optimizer to jointly optimize the total loss.
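The shift from supervised to unsupervised influence during training can be illustrated with a schedule for λ. The text only states that λ is tuned following [19]; the sketch below assumes the Gaussian ramp-up used in the Π-model work, and the parameter names `rampup_epochs` and `lambda_max`, together with the values in the usage example, are hypothetical.

```python
import math


def rampup_lambda(epoch: int, rampup_epochs: int, lambda_max: float) -> float:
    """Gaussian ramp-up: lambda starts near zero and reaches lambda_max
    after `rampup_epochs` epochs (assumed schedule, borrowed from the
    Pi-model training recipe)."""
    if rampup_epochs <= 0:
        return lambda_max
    t = min(max(epoch / rampup_epochs, 0.0), 1.0)
    return lambda_max * math.exp(-5.0 * (1.0 - t) ** 2)


def total_loss(l_wcel: float, l_msel: float, lam: float) -> float:
    """Supervised loss plus the lambda-weighted unsupervised loss."""
    return l_wcel + lam * l_msel


if __name__ == "__main__":
    # Hypothetical loss values, only to show how the weighting shifts.
    for epoch in (0, 20, 40, 80):
        lam = rampup_lambda(epoch, rampup_epochs=80, lambda_max=1.0)
        print(epoch, lam, total_loss(l_wcel=0.9, l_msel=0.4, lam=lam))
```

With a small λ early on, the gradients come almost entirely from the labeled data; as λ grows, the consistency term between the two paths, which is computed on all samples, gains influence.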

III. EXPERIMENTS

A. Dataset
We employ the large-scale chest X-ray dataset COVIDx [20] to validate the proposed model. It comprises 18,543 chest radiography images across 13,725 cases. Example chest X-ray images belonging to the normal, pneumonia, and COVID-19 classes of the COVIDx dataset are shown in Fig. 2. These examples can be differentiated by the features within the area marked by the blue circle, where some lighter areas indicate COVID-19 infected regions.
Additionally, when examining the class distributions of the training and testing data, we noticed that the class distribution of the training set is significantly different from that of the testing set. Hence, we rebuild the data by splitting the dataset into training and testing sets that share similar class distributions, where 70% and 30% of the data are used for training and testing, respectively. The sample distribution of the rebuilt dataset is shown in Table I.
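A split that preserves class proportions can be sketched as follows. The paper does not describe how the split was implemented, so the per-class shuffling, the truncation of the training count, the seed, and the function name `stratified_split` are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that each class contributes roughly
    `train_fraction` of its samples to the training set."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Truncate toward zero; the remainder goes to the testing set.
        cut = int(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


if __name__ == "__main__":
    # Class totals taken from Table I.
    labels = ["Normal"] * 8851 + ["Pneumonia"] * 9584 + ["COVID-19"] * 108
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```

Applied to the class totals in Table I, truncating 70% of each class yields 6,195, 6,708, and 75 training samples, which agrees with the table, although this agreement alone does not confirm the authors' procedure.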
TABLE I. Sample distribution of the rebuilt dataset.

Dataset     Normal    Pneumonia    COVID-19    Total
Training    6,195     6,708        75          12,978
Testing     2,656     2,876        33          5,565
Total       8,851     9,584        108         18,543

We can observe that the sample distribution is extremely unbalanced regarding the number of samples of the COVID-19 class. This poses a great challenge for obtaining a classifier with high performance. We overcome this challenge with the weighted cross entropy loss in the proposed model, which assigns more weight to the minority class (COVID-19 class) during training; the details are presented in Section II.
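To give a feel for the scale of the imbalance, the sketch below derives class weights from the training counts in Table I using inverse class frequency. This heuristic is an assumption: the paper does not state how its class weights were chosen (different weights are compared later in the experiments), and the function name `inverse_frequency_weights` is hypothetical.

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by N / (C * n_c), where N is the total number of
    samples, C the number of classes, and n_c the class count. Rare
    classes receive proportionally larger weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {
        name: total / (num_classes * count)
        for name, count in counts.items()
    }


if __name__ == "__main__":
    # Training counts from Table I.
    training_counts = {"Normal": 6195, "Pneumonia": 6708, "COVID-19": 75}
    for name, weight in inverse_frequency_weights(training_counts).items():
        print(f"{name}: {weight:.2f}")
```

Under this heuristic the COVID-19 class would be weighted almost two orders of magnitude more heavily than the other two classes, which illustrates why the choice of weight needs care.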

B. Experimental settings
The settings of the proposed model are listed in Table II, where the residual block is the standard one [15]. Specifically, the output of the proposed model contains two parts: the image class $\varphi(z^{sup})$ and a new representation $z^{unsup}$. We employ COVID-Net [20] as a baseline supervised model that represents the state-of-the-art performance of COVID-19 image classification. Furthermore, we compare the proposed model with SRC-MT [21], the state of the art in semi-supervised learning, since it outperformed the Π model [19] and the mean teacher model [22] in medical image classification.

C. Evaluation metric
We applied several evaluation metrics to assess the performance of our proposed model. Since our task is a multi-class classification problem, we use accuracy, macro-average Precision (MacroP), macro-average Recall (MacroR), and macro-average F-score (MacroF) [23], [24], [25]. Accuracy is calculated by dividing the number of correctly identified medical images by the total number of testing medical images.
Macro-averaging [26] calculates metrics such as Precision, Recall, and F-score independently for each image class and then takes the average of these per-class values. It evaluates the overall performance across image classes:

$$MacroP = \frac{1}{C}\sum_{c=1}^{C} Precision_c, \qquad MacroR = \frac{1}{C}\sum_{c=1}^{C} Recall_c, \qquad MacroF = \frac{1}{C}\sum_{c=1}^{C} Fscore_c,$$

where C denotes the total number of image classes and $Fscore_c$, $Precision_c$, and $Recall_c$ are the F-score, Precision, and Recall values for the c-th image class, with

$$Fscore = \frac{2 \times Precision \times Recall}{Precision + Recall}.$$

Precision measures the capability of a model to return only correct image classes, and Recall measures its ability to return all corresponding correct image classes:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN},$$

where TP (True Positive) counts the medical images whose predicted class matches the annotation, FP (False Positive) counts the images assigned to a class that does not match their annotation, and FN (False Negative) counts the images of a class that the model fails to assign to that class. The ideal case of learning from imbalanced datasets such as COVIDx is to improve recall without hurting precision. However, the recall and precision goals often conflict: when the number of true positives (TP) for the minority class increases, the number of false positives (FP) can also increase, which reduces precision [27]. In addition, we employ confusion matrices to check the detailed performance for each class, especially the COVID-19 class.
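The metrics above can be computed directly from a confusion matrix, as in the following sketch. The matrix in the usage example is made up for illustration and is not a result from the paper; the function name `macro_metrics` and the convention that rows are true classes and columns are predicted classes are assumptions.

```python
from typing import Dict, Sequence


def macro_metrics(confusion: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall, and F-score.
    confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    precisions, recalls, fscores = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, wrong
        fn = sum(confusion[c]) - tp                       # true c, missed
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        fscores.append(f)

    return {
        "accuracy": correct / total,
        "macro_p": sum(precisions) / n,
        "macro_r": sum(recalls) / n,
        "macro_f": sum(fscores) / n,
    }


if __name__ == "__main__":
    # Hypothetical 3-class matrix (Normal, Pneumonia, COVID-19).
    example = [
        [90, 8, 2],
        [10, 85, 5],
        [3, 4, 3],
    ]
    print(macro_metrics(example))
```

Because every class counts equally in the macro averages, poor recall on a small class such as COVID-19 lowers MacroR and MacroF noticeably even when overall accuracy stays high.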

D. Experimental results
We evaluated the performance of the proposed model in four steps. The first step examines the performance of supervised learning baselines to determine whether ResNet is a reasonable supervised model for COVID-19 image classification. A competitive supervised baseline provides a useful reference for demonstrating the effectiveness of the proposed semi-supervised model. In this step, we also check whether fewer labeled data lead to lower performance. The second step compares the proposed model with state-of-the-art semi-supervised learning. The third step examines whether the hyper-parameter setting significantly affects the performance of the proposed model. Finally, we discuss why the proposed model cannot classify certain COVID-19 cases.
1) Supervised learning for COVID-19 classification: Table III presents the comparison of supervised baselines built with ResNet. We observed that ResNet (100%) outperforms COVID-Net (100%) in accuracy, macro-average precision, and macro-average F-score. This means that ResNet is a competitive supervised baseline for COVID-19 image classification. Additionally, for learning on fewer labeled data, we focus only on the cases of 5%, 7%, and 9% labeled data, since labeled data are very scarce in the medical domain [14] during the early stage of a global pandemic such as the COVID-19 outbreak. We observed that the classification accuracy improves as more labeled data are used to train ResNet. Meanwhile, performance measures such as accuracy and MacroF are reduced significantly compared with ResNet (100%), which demonstrates that more labeled data are imperative for building high-performance supervised models. Moreover, we observed that the weighted ResNet does not improve the performance, possibly because inappropriate weights were assigned to the different classes.
In addition, we compare their confusion matrices in Fig. 3 to examine the performance in detail. The comparison indicates that ResNet (100%) is a promising supervised baseline when compared to COVID-Net in terms of the accuracies on the normal and pneumonia classes. For the COVID-19 class, the accuracy of ResNet is lower than that of COVID-Net, since COVID-Net employed transfer learning to enhance performance.
To check the performance for each class when learning on fewer labeled data, we present the detailed performance with the confusion matrices shown in Fig. 4. When low ratios of labeled training data are used to train the models, ResNet cannot recognize COVID-19 images effectively, which is due to insufficient labeled COVID-19 samples. In the training sets of these cases, only a few images belong to the COVID-19 class. For example, in the case of ResNet (5%), there are only three images of the COVID-19 class in the training data, which means that most training images belong to the Normal and Pneumonia classes. Learning on such data leads to classification bias. The weighted ResNet was not sufficient to enhance the performance: with supervised learning, even assigning more weight to the COVID-19 class cannot compensate for the lack of labeled samples needed to learn distinguishing features that differentiate COVID-19 patients from non-COVID-19 patients on X-ray images.
2) Comparing the proposed model with state-of-the-art semi-supervised learning: In this section, we examine whether the proposed model is able to effectively identify COVID-19 samples when trained on a very limited number of annotated images. Table IV presents the comparison of classification performance between SRC-MT and the proposed model (SS-ResNet). The overall accuracies of SRC-MT are better than those of the proposed model. However, when only 5% of the labeled samples are used for training, the MacroF of our proposed model is higher than that of SRC-MT, which indicates that the proposed model is more effective in detecting COVID-19 samples. This means that, compared to SRC-MT, the unsupervised path enhances the data representation more effectively for improving COVID-19 classification.
In addition, we examine the detailed performance for each class with the confusion matrices shown in Fig. 5. We observed that the accuracy of recognizing COVID-19 by the proposed model is higher than that of SRC-MT, which means that SSResNet can learn more effective features from unlabeled data to recognize COVID-19 samples. Furthermore, as the ratio of labeled data increases, the accuracy of recognizing COVID-19 is enhanced significantly. In other words, unlabeled data contribute significantly to the COVID-19 classification performance by enhancing the image representations through the unsupervised path of the SSResNet.
3) Hyper-parameter setting: In addition to comparing the proposed model with the baselines, we need to determine whether the proposed model is sensitive to its hyper-parameters. Various hyper-parameters are involved in the learning procedure of the proposed model. Here, we examine the class weight, since different weights lead to different performance in recognizing COVID-19 samples. Table V shows the comparison results for different weights of the three classes. We observe that different weights result in significant differences in accuracy. On the other hand, compared to accuracy and MacroP, MacroR and MacroF are less sensitive to the weight of the COVID-19 class. In general, the weight for the COVID-19 class has to be selected carefully to obtain the optimal performance.
4) Error Analysis: Fig. 6 presents three COVID-19 samples that are classified into the Normal, COVID-19, and Pneumonia classes, respectively. X-ray images of COVID-19 patients show various features at different stages of the disease. At the early stage of COVID-19, X-ray images do not present significant features (Fig. 6 (a)) that can be used to differentiate COVID-19 from non-COVID-19 patients, which leads to the incorrect classification result for this sample. This is consistent with the expectation that X-ray images are not ideal evidence to support the diagnosis of COVID-19 for patients at the early stage.
However, as COVID-19 develops, X-ray images present obvious features such as multifocal lung airspace opacities, nodules, and consolidation (Fig. 6 (b)), which contribute to the correct classification result. Unfortunately, if the patients are at the late stage of COVID-19, X-ray images present lobar diffuse consolidation (see Fig. 6 (c)) that is similar to the features of pneumonia. These features are confusing to the proposed model and lead to the incorrect result for the sample shown in Fig. 6 (c). In summary, in terms of the samples shown in Fig. 6, the proposed model is effective for patients whose COVID-19 is developing rather than for those at the early or late stage of the disease.

IV. RELATED WORK
Deep learning techniques have shown their power in the classification of COVID-19. Ghoshal et al. [28] proposed a Bayesian convolutional neural network to estimate the diagnosis uncertainty in COVID-19 prediction, where the dataset includes 70 lung X-ray images of patients with COVID-19 from an online COVID-19 dataset [29] and non-COVID-19 images from Kaggle's Chest X-Ray data (Pneumonia). Narin et al. [30] detected COVID-19 infection from X-ray images by comparing three different deep learning models, namely, ResNet50, InceptionV3, and Inception-ResNetV2. The evaluation results show that the ResNet50 model outperformed the other two models. Zhang et al. [31] also utilized ResNet to perform COVID-19 classification on X-ray images and estimated an anomaly score to optimize the COVID-19 score for the classification. In addition, Wang et al. [20] proposed COVID-Net to detect COVID-19 cases using X-ray images. In general, most current studies use X-ray images to differentiate COVID-19 from other pneumonia and from healthy subjects.
In addition to COVID-19 image classification, it is imperative to identify the regions of COVID-19 infection, since they provide detailed information on COVID-19 for diagnosis. Semantic segmentation helps recognize these regions and the corresponding patterns to assess and quantify COVID-19, where the regions of interest (ROIs) include the lung, lobes, bronchopulmonary segments, and infected regions or lesions in chest X-ray or CT images. Moreover, segmented regions can be further used to extract handcrafted or self-learned features for diagnosis and other applications. Deep learning has significantly promoted the development of semantic image segmentation [9], [32]. To segment ROIs in CT, the segmentation networks used for COVID-19 include the classic U-Net [6], [33], [5], UNet++ [12], and VB-Net [34]. The segmentation methods related to COVID-19 can be classified into two groups: 1) lung-region-oriented methods and 2) lung-lesion-oriented methods. The first group aims at separating lung regions, i.e., the whole lung and lung lobes, from other (background) regions in CT or X-ray images [10], [35].
For example, Jin et al. [10] detected the whole lung region with UNet++. The second group detects lesions (or metal and motion artifacts) within the lung regions [36], [37]. The experimental results indicate that the segmentation of X-ray images is even more challenging because the ribs are projected onto soft tissues in 2D. Although supervised deep learning outperforms other models on these two tasks, it requires a substantial amount of labeled data to train the model, which is not practical in real applications.
Semi-supervised deep learning has attracted much attention because of its strong ability to generalize by learning from both labeled and unlabeled data [38], [39], [40], [19]. Generally, it trains deep neural networks by jointly optimizing a standard supervised classification loss on labeled samples and an unsupervised loss on unlabeled data [38], [19]. The rationale of these semi-supervised learning models is to enrich the supervision signals by exploiting the knowledge learned from unlabeled data [41], or to regularize the network by enforcing smooth and consistent classification boundaries [40]. In COVID-19 research such as image classification and image segmentation, semi-supervised learning has been employed to address the lack of labeled data [42], [43], [44], [45], [46], [47]. However, for COVID-19 image classification, these studies [42], [43], [44] have not comprehensively examined model performance on a large-scale X-ray image dataset such as COVIDx [20] in comparison with the state of the art, especially for the case of very few labeled data, such as less than 10% labeled data. This paper proposes a semi-supervised deep learning model for COVID-19 image classification and systematically evaluates its performance on the COVIDx [20] dataset.

V. CONCLUSION AND FUTURE WORK
In this paper, a novel semi-supervised deep learning framework is proposed for COVID-19 image classification on chest X-ray images. Supervised COVID-19 classification on X-ray datasets can provide useful information to medical staff and facilitate the diagnosis of COVID-19 in an effective and efficient manner. Unfortunately, it relies on the availability of a large number of labeled medical images, which are not available in practice during the early outbreak of such a global pandemic. Hence, we propose a semi-supervised learning model based on ResNet that can utilize unlabeled images to enhance classification performance. The model has two paths, which reduce the supervised cross entropy loss and the unsupervised mean squared error loss, respectively. Training is performed by jointly optimizing these two losses, which allows the proposed scheme to take advantage of the information in both labeled and unlabeled images. Experimental results demonstrate that the proposed model can recognize COVID-19 lung pathology effectively by learning from very limited labeled images and substantial unlabeled images. In future work, we plan to extend the proposed model to other tasks such as COVID-19 image segmentation.