Explanation of the Convolutional Neural Network Classifying Chest X-Ray Images Supporting Pneumonia Diagnosis

Medical images are valuable sources for disease diagnosis, and recent advances in deep learning have enabled image-based diagnosis methods to achieve notable results. However, deep learning algorithms still operate as black boxes, so their outputs are difficult to interpret. In this study, we propose a convolutional network architecture to classify chest X-ray images and apply explanation approaches to the trained models to support disease diagnosis. The proposed method provides insights into medical images to support pneumonia diagnosis.


Introduction
In recent years, the rapid development of technology and the large amount of available medical data have contributed to improving treatment outcomes, making treatment more specific and effective. Medical data are diverse and abundant, with images and other digital data among the most common forms. Medical imaging, specifically X-ray imaging, magnetic resonance imaging (MRI), and computed tomography (CT), is widely used in clinical analysis and medical intervention, and it makes anomalies easier to detect by creating visual representations of the inner structures of the body. The outbreak of SARS-CoV-2 in 2019 had unintended consequences, and attempts to contain the disease with the help of medical images are highly necessary [1,2]. Chest X-ray (also commonly referred to as chest radiography or CXR) may be considered the most common method for identifying abnormalities of the chest and nearby structures. Furthermore, the American College of Radiology notes that chest X-ray can be used to minimize the risk of cross-infection [3]; according to [4], millions of chest X-ray scans are performed globally every year. Advances in deep learning have produced promising results in medicine, for instance in medical event prediction [5,6], antibiotic discovery [7], and the analysis of electronic health records [8]. Moreover, improvements in image classification and image segmentation [9,10] offer plenty of encouragement for the development of medical imaging. Several deep learning approaches in medical imaging have been proposed for disease detection and diagnosis, such as skin cancer classification [11] and a deep encoder-decoder architecture for 3D biomedical image segmentation [12]. However, the annotation process for medical images depends on professional medical knowledge, medical industry standards, and the medical system [13]. Thus, it requires substantial human and material resources.
EAI Endorsed Transactions on Context-aware Systems and Applications Online First Thanh Hai Nguyen et al.

A literature review on medical image processing
In the field of medical image processing, the experimental results in [10] show improved performance on hyperspectral image classification tasks using a deep feature manifold embedding method. The discriminant manifold structure of the extracted features is first discovered with an intrinsic graph and a penalty graph, and the features are then mapped into a low-dimensional embedding space. The results demonstrated that the proposed method is effective. Another approach to hyperspectral images uses sparse representation and dictionary learning [14]. This approach can effectively learn a sub-dictionary for each class and uses a multi-scale strategy. In the multi-scale sparse representation based on context structure, the shape of the regions is not adapted. There are two stages: a dictionary learning stage and a sparse representation stage. The dictionary learning stage ensures that the dictionary contains more discriminative information, while the sparse representation stage exploits the spatial context. This combination can extract more information from hyperspectral images and performs well. It can also deal with the imbalanced data problem in hyperspectral images. The authors in [13] proposed semi-supervised learning for the analysis of CT pathological images of the brain and chest. A generative adversarial network was trained with a small amount of labelled data, and the extracted features were combined for classification. This approach reached higher performance than a Convolutional Neural Network (CNN) and other traditional image classification models.
Several different approaches have been proposed for chest radiography. The study in [15] aims to detect abnormalities and assess changes in findings over serial radiographs from 724 patients. Two thoracic radiologists assessed all the images for abnormalities, and four test radiologists assessed the presence of radiographic abnormalities to establish a standard of reference (SOR). The authors used QureAI, which is based on a CNN, in their evaluation; the results showed no statistical difference between the CNN and the SOR, although the areas under the curve (AUC) differed noticeably. The accuracy of the CNN is negatively affected by the presence of medical devices in the chest. Thus, deep learning, with its limitations, is unlikely to replace radiologists, but it can be helpful for the interpretation of CXR findings. The authors in [16] detected pneumoconiosis from chest X-ray images in the form of digital radiography by applying a CNN. The dataset contains 1881 images collected from subjects who had worked in a dusty environment. Furthermore, certified radiologists were involved in this study, and their performance was compared with that of the CNN. The results demonstrated that the CNN performed well and can be a solution for screening pneumoconiosis.
In this study, we present a novel method based on the explanation of a CNN to support pneumonia diagnosis on chest X-ray images. Our contributions include:
• We propose a method to classify pneumonia patients and normal subjects from X-ray images with a shallow Convolutional Neural Network.
• We combine Gradient-weighted Class Activation Mapping (Grad-CAM) [18] with our CNN as a novel method for detecting abnormal regions in chest X-ray images and aiding the interpretation of the decision by visualizing them in the images. The method is expected to be a good tool for disease diagnosis based on medical images.
The remainder of this paper is organized as follows. Section 3 introduces the architecture of our Convolutional Neural Network for supporting pneumonia diagnosis and the details of the dataset used. We explain the predictions and present our experimental results in Sections 4 and 5, respectively. Finally, we discuss and summarize our study in Section 6.

Dataset
We used a dataset containing 5840 chest X-ray images, with 1575 images of normal chests and 4265 images of pneumonia patients. The chest X-ray images were collected from retrospective cohorts of pediatric patients aged one to five years at the Guangzhou Women and Children's Medical Center, Guangzhou [23]. Initially, all chest X-ray images were screened for quality control by removing all low-quality or unreadable scans. Then, two expert physicians graded the diagnoses for the images before they were cleared. Samples of normal and pneumonia chest X-ray images are shown in Figure 1. The normal chest (left image) shows clear lungs without any abnormal regions, whereas the pneumonia case (right image) exhibits a focal lobar consolidation.

Learning model
We trained our Convolutional Neural Network with the structure illustrated in Figure 2. The CNN contains one convolutional layer, followed by a max-pooling layer and a fully connected layer. More specifically, the convolutional layer performs feature extraction: it generates feature maps by performing convolution operations between the input and the filters. Our convolutional layer contains 64 filters (kernels), each of which is a 3 × 3 matrix. The max-pooling layer reduces the dimensions of the generated feature maps by extracting the maximum value from each set of values; we used a max-pooling layer with a size of 2 × 2. The output of the max-pooling layer is flattened into a 1D array, which forms the input for the fully connected layer. The CNN is trained with the Adam optimizer [17], using the default learning rate of 0.001 and a batch size of 16. The proposed CNN architecture has been used in disease prediction tasks [19,20] and achieved promising results. Furthermore, since we used inputs of size 64 × 64, a shallow architecture should be able to work well in this task.
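To make the size of this shallow network concrete, the following sketch works through the layer shapes and parameter counts. The text does not state the stride, the padding, the number of input channels, or the width of the output layer, so the values below (stride 1, no padding, one grayscale channel, a single sigmoid output unit) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def window_output_size(size, window, stride=1, padding=0):
    """Spatial size after sliding a convolution or pooling window."""
    return (size + 2 * padding - window) // stride + 1


def shallow_cnn_summary(input_size=64, channels=1, filters=64, kernel=3, pool=2):
    """Shapes and parameter counts for conv -> max-pool -> dense.

    Assumptions (not stated in the paper): stride 1 and no padding in the
    convolution, one grayscale input channel, one sigmoid output unit.
    """
    conv_size = window_output_size(input_size, kernel)
    # Each filter has kernel*kernel*channels weights plus one bias.
    conv_params = filters * (kernel * kernel * channels + 1)
    # Max-pooling has no trainable parameters; stride equals the pool size.
    pool_size = window_output_size(conv_size, pool, stride=pool)
    flattened_units = pool_size * pool_size * filters
    # One output unit: one weight per flattened value plus one bias.
    dense_params = flattened_units + 1
    return {
        "conv_size": conv_size,
        "pool_size": pool_size,
        "flattened_units": flattened_units,
        "conv_params": conv_params,
        "dense_params": dense_params,
        "total_params": conv_params + dense_params,
    }


summary = shallow_cnn_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the 64 × 64 input becomes 62 × 62 after the convolution and 31 × 31 after pooling, so most of the trainable parameters sit in the fully connected layer rather than in the convolution.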
We computed the binary cross-entropy loss during training using formula 1. The goal of the cross-entropy loss is to compare the predicted probability distribution with the true label and evaluate how good the predicted probabilities are. For binary classification tasks, the binary cross-entropy is the typical loss function:
$$L(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right] \qquad (1)$$
where $y$ and $\hat{y}$ denote the label and the predicted result, respectively.
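As a small illustration, the binary cross-entropy described above can be computed in plain Python as follows. Averaging over a batch and clipping the predictions away from 0 and 1 are assumptions made here for numerical safety; the function name and the `eps` parameter are hypothetical and not taken from the paper.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the probability so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Confident, correct predictions give a small loss ...
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
# ... while confident, wrong predictions give a large one.
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```

The loss grows quickly as a prediction moves away from its label, which is why confident mistakes are penalized much more heavily than uncertain ones.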
Alongside the loss function, the choice of optimization algorithm is also important. Adam is an extension of stochastic gradient descent that offers advantages in training speed. Furthermore, Adam combines the characteristics of AdaGrad [21] and RMSProp [22], so it can handle sparse gradients on noisy problems. To avoid overfitting, we also used early stopping.
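For readers unfamiliar with Adam, a single update for one scalar parameter can be sketched as below. The learning rate of 0.001 is the value given in the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumed here, since the paper does not list them. The function name and the toy objective are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for initializing m and v at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimize f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)
```

Because the step is scaled by the running estimate of the squared gradient, the very first update has a size close to the learning rate regardless of the gradient's magnitude.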

Explanations for the predictions
To visualize the abnormal areas in the images, we combined Grad-CAM with our CNN to detect characteristic features in X-ray images. In other words, Grad-CAM can reveal the areas of the image on which the CNN focused. Specifically, let $A^k$ represent the $k$-th of the $K$ generated feature maps, let $A^k_{ij}$ refer to its activation at location $(i,j)$, and let $y^c$ denote the score for class $c$. The heatmap is generated by computing the importance weight $\alpha^c_k$, which captures the importance of feature map $k$ for class $c$, by formula 2:
$$\alpha^c_k = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}} \qquad (2)$$
where $Z$ is the number of pixels in the feature map.
Figure 3b and Figure 3h are not class-discriminative; they are very similar and were generated by Guided Backpropagation [24] and Deconvolution [25], respectively. In contrast, Figure 3c and Figure 3i focus on "cat" and "dog" owing to the Grad-CAM method. The authors in [18] combined Grad-CAM with the Guided Backpropagation and Deconvolution methods to create Guided Grad-CAM and visualized the results in Figure 3d and Figure 3j. The heatmap presents the critical regions on X-ray images and helps clinicians make faster and more accurate diagnoses. In summary, the consistency of the heatmap visualizations depends on the accuracy of the model, because Grad-CAM combines features based precisely on the generated feature maps.
Figure 4 shows normal chest images and their heatmaps in alternating rows: the first row contains the chest X-ray images, the second row presents the visualizations of the abnormal regions, and so on. The images with tags 79, 81, and 82 are correctly classified as Normal, and their heatmaps are dark blue. The others are misclassified owing to the imbalanced dataset: the CNN judged those images to belong to the Pneumonia class, and Grad-CAM generated heatmaps with abnormal regions shown in light green.
Similarly, in Figure 5 the CNN correctly classified the images as Pneumonia and generated feature maps for Grad-CAM. The highlighted regions show the areas on which the CNN focused, also known as the abnormal regions. These regions discriminate between the Normal and Pneumonia classes.
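The Grad-CAM computation described in this section can be sketched in a few lines of plain Python. The sketch assumes that the feature maps and the gradients of the class score with respect to them have already been obtained from the trained network, which in practice requires a deep learning framework. The final ReLU over the weighted sum follows the Grad-CAM formulation in [18]. The function name and the toy inputs are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients.

    feature_maps[k][i][j] is the activation of map k at location (i, j);
    gradients[k][i][j] is the gradient of the class score with respect to it.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    z = height * width  # number of pixels in one feature map
    # Importance weight of each map: global average of its gradients.
    alphas = [sum(sum(row) for row in grad) / z for grad in gradients]
    # Weighted sum of the feature maps.
    heatmap = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only locations with a positive influence on the class.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]
    return alphas, heatmap


# Toy input: two 2 x 2 feature maps with constant gradients.
maps = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
grads = [[[1, 1], [1, 1]], [[-2, -2], [-2, -2]]]
alphas, heatmap = grad_cam(maps, grads)
print(alphas)
print(heatmap)
```

In a real pipeline the resulting low-resolution heatmap would be upsampled to the input size and overlaid on the X-ray image; that step is omitted here.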

Metrics for Comparison
We computed the overall accuracy and the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) to evaluate the generalization of the classifiers. The ROC curve was originally designed as a tool for distinguishing noise from non-noise; it is a probability curve, and the AUC represents the degree of separability. In other words, it tells how well the model is capable of distinguishing between classes.
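One way to see the AUC as a measure of separability is its rank interpretation: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below computes the AUC this way in plain Python; it is an illustrative implementation, not the code used in the experiments, and the function name and sample scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

A value of 1.0 means that every positive sample is scored above every negative one, while 0.5 corresponds to scores that carry no information about the class.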

Execution time
The training phase is time-consuming and can take hours or days with complicated models. In this study, we used a shallow CNN and images of size 64 × 64, and training finished after 5 hours. We implemented this system in TensorFlow, and trained and tested it on a 64-bit Windows system equipped with an NVIDIA GeForce GTX 1070 and 8 GB of memory.

Results
We trained our CNN model with 10-fold cross-validation and reached an overall training accuracy of 99.3% and a validation accuracy of 84.8%. The Pneumonia cases were classified almost entirely correctly in each validation fold, but misclassifications occurred in the Normal class. The training and validation accuracy during training are visualized in Figure 6. In general, the difference between training and validation accuracy is not significant, and the classifiers focused on Pneumonia features when generating the feature maps used by Grad-CAM. For imbalanced datasets, accuracy is considered a misleading metric, and the ROC-AUC seems to be a more effective metric for evaluating performance. We achieved AUC values of 0.830 and 0.802 for training and validation, respectively, which is quite good for classification tasks. The AUC is illustrated in Figure 7. These results indicate that the performance of the CNN is acceptable for classification tasks. As mentioned above, a good model yields well-formed visualizations of the abnormal regions in chest X-ray images. The abnormal regions in Figure 5 were built from the findings of the CNN, but in Figure 4 the same kind of regions still appears on several images owing to the poor performance of the CNN on the Normal class. Figure 4 and Figure 5 show explanations generated by Grad-CAM, using heatmap colors to point out signals that can help distinguish normal subjects from pneumonia patients. The output of the explanation method highlights important areas (in red) in the images so that doctors can identify regions of injury in patients' lungs.
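The remark that accuracy can mislead on an imbalanced dataset can be illustrated with the class counts reported for this dataset (1575 normal and 4265 pneumonia images). The sketch below computes the accuracy of a trivial classifier that always predicts the majority class; the function name is hypothetical and the calculation is only a reference point, not a result from the experiments.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy of always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


counts = {"normal": 1575, "pneumonia": 4265}
baseline = majority_baseline_accuracy(counts)
print(f"total images: {sum(counts.values())}")
print(f"majority-class accuracy: {baseline:.3f}")
```

A classifier that labels every image as Pneumonia would already be correct on roughly 73% of this dataset while never recognizing a normal chest, which is why a threshold-independent metric such as the ROC-AUC is reported alongside accuracy.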

Conclusion
We presented an approach that uses a Convolutional Neural Network together with explanations from the Grad-CAM algorithm to support pneumonia diagnosis. The experiments provide interesting results in image analysis for predicting this disease.
There are significant differences in medical advancement among countries and regions of the world, and doctors' abilities also differ. Research on computer-based disease diagnosis can help reduce adverse events in image-based medical diagnosis.
As this is an initial attempt to use explanations for supporting disease diagnosis, and owing to limited computational resources, we deployed only a shallow CNN architecture for image classification on small chest X-ray images. In the future, deeper architectures should be investigated on larger images.
Although a detailed medical discussion of the areas visualized in the X-ray images and of the quality of the obtained signatures is beyond the scope of this study and is left to doctors conducting pre-clinical research, we expect that our framework can help to better identify signals of pneumonia and to develop methods for image-based diagnosis.