Texture-based Feature Extraction for COVID-19 Pneumonia Classification using Chest Radiography

INTRODUCTION: The identification of COVID-19 pneumonia using chest radiography is challenging. OBJECTIVES: We investigate classification models to differentiate COVID-19-based and typical pneumonia in chest radiography. METHODS: We use 136 segmented chest X-rays to train and evaluate support vector machine (SVM), random forest (RF), AdaBoost (AB), and logistic regression (LR) classification methods. We use the PyRadiomics library to extract statistical texture-based features from the right and left lungs and from six lung zones. We use stratified k-fold (k = 5) cross-validation within the training dataset, selecting the most relevant features by validation accuracy and relative feature importance. RESULTS: The AB model appears to be the best discriminant method, using six lung zones (AUC = 0.98). CONCLUSION: Our study shows a predominance of radiomic texture-based features related to COVID-19 pneumonia within the right lung, with a tendency toward the upper lung zone.


Introduction
The outbreak of COVID-19 pneumonia, caused by the coronavirus strain SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), has caused global turmoil and was declared a pandemic by the World Health Organization on March 11, 2020. By February 19, 2021, more than 109 million cases had been confirmed worldwide, more than 9.9 million of them in Brazil.
The incubation period of COVID-19 is 5.2 days and can last up to 14 days [1]. Clinical features include respiratory symptoms, fever, cough, dyspnea, and viral pneumonia. COVID-19 appears to be more transmissible when people display symptoms [2]; however, there are several cases in which subjects are asymptomatic. Radiological chest examinations, such as chest X-ray and computed tomography (CT), already play a fundamental role in monitoring COVID-19 [3].
* Corresponding author. Email: ana.marques@pucrs.br
Observation of radiological lung patterns can reveal different types of pulmonary diseases. These patterns can be described based on the disease and the affected tissue region. Knowledge of disease-related patterns is very important for the differentiation and follow-up of pulmonary diseases. Studies have shown, for instance, that COVID-19 induces an atypical pneumonia that leads to a bilateral, peripheral, ground-glass opacity pattern [4]. Beyond visual analysis, lung disease patterns can be studied through feature analysis, using a technique called radiomics.
Computer-based texture analysis numerically quantifies specific features of an image. Quantitative analysis of morphological, intensity, and texture features is helpful in diagnosis and prognosis. Texture analysis can be categorized into structural, model-based, transform-based, and statistical approaches [5].
EAI Endorsed Transactions
Most recent COVID-19 radiological studies focus on CT scans, which have better sensitivity than X-ray. However, CT is more expensive and less available than conventional radiography, and it requires a more complex decontamination process after a COVID-19 patient is scanned. The American College of Radiology (ACR) recommends that CT be used sparingly and reserved for hospitalized, symptomatic COVID-19 patients with specific clinical indications. Portable chest X-ray is suggested as a viable option to minimize the risk of cross-infection and to avoid overloading and disrupting radiology departments [6].
Several classification and prediction models for COVID-19-based pneumonia, using binary or multilabel classification techniques, have been developed to aid diagnosis from CT and X-ray images [7]–[13]. Researchers have been trying to identify patterns and features related to this new disease.
Our goal in this article is to investigate classification models that differentiate chest X-ray images of COVID-19-based pneumonia from typical pneumonia, and to provide grounds for understanding the distinctive radiographic features of COVID-19. Our study analyzes texture features under two lung segmentation approaches and shows that radiomic features are predominantly selected in the right lung, with a tendency toward the upper lung zone.

Image Dataset
Figure 1 shows a sample of the images we use in this study. A total of 136 anteroposterior (AP) and posteroanterior (PA) chest X-rays from two public databases are used to train and evaluate the classification methods we investigate here. We use 68 COVID-19 images from the COVID-19 Image Data Collection [14], which comprises images and information from multiple centers, including some subjects with longitudinal studies; we use only the image from the first time point of each subject. Image matrix sizes range from 156 × 156 pixels to 3,520 × 4,280 pixels, stored in JPEG and PNG formats.
The CheXpert dataset [15] provides the remaining 68 chest X-ray images of pneumonia. In this dataset, images are labeled as certain, uncertain, or no finding for pneumonia; here we use only images labeled as certain for pneumonia, randomly selected from the data source. Image sizes vary from 320 × 320 pixels to 320 × 394 pixels, all in JPEG.
The entire dataset is split into training (80%, n = 108, with 54 COVID-19 images) and test (20%, n = 28, with 14 COVID-19 images) sets. The test set is not seen by the model until the final metric evaluation.
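The split procedure is not detailed beyond its proportions. Assuming a stratified split with scikit-learn's `train_test_split` (consistent with the balanced 54 and 14 COVID-19 counts reported above), a minimal sketch looks like this; the label array, the placeholder identifiers, and `random_state` are illustrative, not the authors' values.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 68 COVID-19 (1) and 68 typical pneumonia (0) images.
labels = np.array([1] * 68 + [0] * 68)
image_ids = np.arange(labels.size)  # placeholder identifiers, not real file names

# Stratifying on the label keeps the 50/50 class balance in both subsets.
train_ids, test_ids, y_train, y_test = train_test_split(
    image_ids, labels, test_size=0.2, stratify=labels, random_state=0
)

print(len(train_ids), int(y_train.sum()))  # 108 54
print(len(test_ids), int(y_test.sum()))    # 28 14
```

With 136 images, a 20% test fraction rounds up to 28 test images, and stratification places exactly half of them (14) in the COVID-19 class.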

Lung Segmentation
We rescale the chest X-ray images to 256 × 256 pixels because of computational constraints and apply histogram equalization to them. We segment the lungs using an open-source, pre-trained, U-Net-inspired segmentation model that generates lung masks. Two different lung segmentation approaches are used: left and right lung (L-R), and a further division of each lung into upper, middle, and bottom zones (see Figure 2). All lung segmentations were visually inspected.
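The paper does not name the equalization variant or the library used. As an illustration, classic global histogram equalization of an 8-bit image can be sketched in NumPy as below; the function name `equalize_histogram` is hypothetical, and resizing to 256 × 256 is assumed to happen beforehand.

```python
import numpy as np


def equalize_histogram(image: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image (uint8)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest gray level present
    total = image.size
    if total == cdf_min:  # constant image: nothing to equalize
        return image.copy()
    # Look-up table mapping each gray level so the output CDF is roughly uniform.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]


low_contrast = np.array([[100, 101], [102, 103]], dtype=np.uint8)
print(equalize_histogram(low_contrast))  # levels spread to 0, 85, 170, 255
```

Because the mapping depends only on the cumulative histogram, the narrow 100 to 103 range is stretched over the full 0 to 255 range.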
Lung masks are resized back to 512 × 512 pixels. To remove background clusters and fill holes in the lung mask, we apply a morphological opening operator with a structuring element and an 8-connected neighborhood. We remove clusters with fewer than 5 pixels. We then separate the connected areas and exclude areas with fewer than 75 pixels.
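A possible implementation of this clean-up with SciPy's `ndimage` module is sketched below. It is an assumption, not the authors' code: the hypothetical `clean_lung_mask` uses a 3 × 3 structuring element, adds an explicit hole-filling step (opening by itself removes small specks but does not fill holes), and applies only the 75-pixel area threshold, since removing components smaller than 5 pixels is subsumed by it when both thresholds act on the same labeled components.

```python
import numpy as np
from scipy import ndimage

# 3 x 3 structuring element; its full square shape also defines 8-connectivity for labeling.
EIGHT_CONNECTED = np.ones((3, 3), dtype=bool)


def clean_lung_mask(mask: np.ndarray, min_area: int = 75) -> np.ndarray:
    """Morphological clean-up of a binary lung mask (illustrative sketch)."""
    # Opening (erosion followed by dilation) removes isolated specks and thin bridges.
    opened = ndimage.binary_opening(mask.astype(bool), structure=EIGHT_CONNECTED)
    # Assumption: holes are filled explicitly, because opening alone does not fill them.
    filled = ndimage.binary_fill_holes(opened)
    # Label 8-connected areas and drop those smaller than `min_area` pixels.
    labeled, n_areas = ndimage.label(filled, structure=EIGHT_CONNECTED)
    if n_areas == 0:
        return filled
    sizes = np.bincount(labeled.ravel())  # sizes[0] counts background pixels
    keep = sizes >= min_area
    keep[0] = False  # never keep the background
    return keep[labeled]
```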
To split the left and right lungs, we use the centroid of each of the two areas: if a centroid lies in the first half of the image matrix (from left to right), we consider the area part of the right lung, since chest X-rays are displayed in radiological convention (the patient's right side appears on the left of the image). Each lung is then divided into upper, middle, and bottom zones by splitting the vertical distance between its topmost and bottommost points into thirds.
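The centroid rule and the division into thirds can be sketched as follows, assuming a cleaned mask that contains exactly two 8-connected lung areas and an image in radiological convention. The function `split_lungs_and_zones` and its region names are hypothetical, and zone boundaries are rounded to whole rows.

```python
import numpy as np
from scipy import ndimage


def split_lungs_and_zones(mask: np.ndarray) -> dict:
    """Split a two-area lung mask into sides and upper/middle/bottom zones (sketch)."""
    labeled, n_areas = ndimage.label(mask, structure=np.ones((3, 3), dtype=bool))
    if n_areas != 2:
        raise ValueError(f"expected 2 lung areas, found {n_areas}")
    half_width = mask.shape[1] / 2
    regions = {}
    for lab in (1, 2):
        area = labeled == lab
        _, col_centroid = ndimage.center_of_mass(area.astype(float))
        # Radiological convention: the patient's right lung lies in the image's left half.
        side = "right" if col_centroid < half_width else "left"
        if side in regions:
            raise ValueError("both areas fall on the same side of the image")
        regions[side] = area
        rows = np.flatnonzero(area.any(axis=1))
        top, bottom = rows[0], rows[-1] + 1  # extreme rows of this lung
        cuts = np.linspace(top, bottom, 4)   # boundaries of three equal-height bands
        for zone, lo, hi in zip(("upper", "middle", "bottom"), cuts[:-1], cuts[1:]):
            band = np.zeros_like(area)
            band[int(round(lo)):int(round(hi)), :] = True
            regions[f"{side}_{zone}"] = area & band
    return regions
```

Each zone is the intersection of the lung area with a horizontal band, so the three zones of a lung always add up to the whole lung.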
Lung masks are applied to the corresponding chest X-ray and divided into left and right sides (L-R), and then into upper, middle, and bottom zones (lung zones). This separation into lung zones is similar to the one proposed in [16] for CT lung images. Both approaches are used independently for all images in our experiments.
The segmentation model we use is trained on two different chest X-ray databases, JSRT and Montgomery County. The images in these datasets come from patients with other conditions, such as tuberculosis, and hence the model is not specific to COVID-19-based or typical pneumonia lung segmentation.

Radiomic Feature Extraction and Selection
We use the PyRadiomics library to extract first- and second-order statistical texture-based features from each lung mask. The radiomic features are divided into five classes [17], described in Table 1.

Table 1. Radiomic feature classes and number of features per class.
First order (18 features): based on the first-order histogram; related to the distribution of pixel intensities.
Gray-level co-occurrence matrix, GLCM (24 features): gives information about the spatial distribution of gray levels, considering the relationship between pixel pairs and the frequency of each intensity within an 8-connected neighborhood.
Gray-level run length matrix, GLRLM (16 features): similar to GLCM; a run is defined as the number of contiguous pixels with the same gray level, considering a 4-connected neighborhood, indicating the homogeneity of pixel intensity.
Gray-level size zone matrix, GLSZM (16 features): used for texture characterization; it provides a statistical representation by estimating a bivariate conditional probability density function of the image values. It is invariant to image rotation.
Gray-level dependence matrix, GLDM (14 features): quantifies the dependence of image gray levels by counting the connected pixels within a certain distance whose intensity differs from that of the central pixel by less than 1.
For feature selection, we use the Random Forest (RF) method, a tree-based ensemble approach that performs well on high-dimensional data and provides an internal measure of feature importance. This measure is based on Gini values, which allow the features to be ranked during model training. In a nutshell, Gini is an impurity measure that indicates class heterogeneity in each node of a tree. To generate validation data for feature selection, we use a stratified k-fold (k = 5) cross-validation procedure within the training set. Each feature is independently scaled to [0, 1], and an RF model is trained on four of the folds and validated on the fifth. We measure the validation accuracy and the relative feature importance, and we average the relative feature importance over the five validation folds. To obtain statistically robust estimates, we repeat this procedure 100 times; the relative feature importance is then averaged, and its confidence intervals are calculated. The feature selection pipeline is shown in Figure 3. Since we use 136 images and choose to keep one feature for every 10 images, we select the 13 most relevant features.
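The repeated cross-validated importance ranking can be sketched with scikit-learn as follows. The number of trees, the seeding, and the normal-approximation 95% confidence interval are assumptions, because the paper does not specify them, and `rank_features` is a hypothetical helper. The scaler is fitted on the training folds only, to avoid leaking validation data; min-max scaling does not change tree splits, but it is kept to mirror the described procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler


def rank_features(X, y, n_repeats=100, n_splits=5, seed=0):
    """Mean Gini importance over repeated stratified k-fold CV (sketch)."""
    rng = np.random.RandomState(seed)
    importances, accuracies = [], []
    for _ in range(n_repeats):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=rng.randint(1 << 30))
        for train_idx, val_idx in skf.split(X, y):
            # Scale each feature to [0, 1] using statistics of the training folds only.
            scaler = MinMaxScaler().fit(X[train_idx])
            rf = RandomForestClassifier(n_estimators=50, random_state=rng.randint(1 << 30))
            rf.fit(scaler.transform(X[train_idx]), y[train_idx])
            accuracies.append(rf.score(scaler.transform(X[val_idx]), y[val_idx]))
            importances.append(rf.feature_importances_)  # Gini-based, sums to 1
    importances = np.asarray(importances)
    mean = importances.mean(axis=0)
    # Assumed normal-approximation 95% confidence half-width of the mean importance.
    half_ci = 1.96 * importances.std(axis=0, ddof=1) / np.sqrt(len(importances))
    return mean, half_ci, float(np.mean(accuracies))
```

With the mean importances available, keeping one feature per 10 images amounts to `np.argsort(mean)[::-1][:136 // 10]`, that is, the 13 highest-ranked features.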

Machine Learning Models
We train four classification models on the selected radiomic features. The classification methods we use in this paper are Support Vector Machines (SVMs, more specifically the SVC implementation [18]), Random Forest (RF) [19], Adaptive Boosting or AdaBoost (AB) [20], and Logistic Regression (LR) [21]. Each method is briefly presented next:
• Logistic regression (LR) is a linear classification model that applies the logistic (sigmoid) function to predict binary or multiclass dependent variables, with parameters estimated by maximum likelihood [22]. Its decision boundary is a hyperplane in feature space; when interaction terms are included, the model provides more flexible decision boundaries. Regularization is applied to adjust the decision boundaries and avoid overfitting [23], usually in the form of ℓ1 or ℓ2 norms: the ℓ1 norm results in sparser solutions, while the ℓ2 norm shrinks the coefficients without setting them exactly to zero.
• Support Vector Machine (SVM) is, in its basic form, a linear classifier that seeks the hyperplane that best separates the two classes in the training data. The hyperplane is placed to maximize the margin to the nearest data points of each class (the support vectors), and kernels extend the method to non-linear boundaries. Because the solution depends only on the support vectors, the method tends to generalize well even for high-dimensional and small datasets [24][25].
• Random Forest (RF) is a tree-based ensemble learning algorithm that induces a pre-specified number of decision trees to solve a classification problem. Each tree is built using a bootstrap subsample of the training data, and each node searches for the best split among a random subset of the original features. The assumption here is that combining the results of several weak classifiers (the individual trees) via majority voting yields a strong classifier with enhanced generalization ability [26].
• Adaptive Boosting or AdaBoost (AB) is also an ensemble method, whose base learner here is also a decision tree. It first fits a classifier on the original dataset with a weight for each sample. Subsequent classifiers are trained on reweighted samples, in which the weights of misclassified samples are increased, making the method focus on the difficult cases. The final classification is given by a weighted majority vote of the trained classifiers [27][28].
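The claim that an ℓ1 penalty yields sparser solutions than an ℓ2 penalty can be checked quickly with scikit-learn's `LogisticRegression`. This is an illustrative sketch on synthetic data from `make_classification`; the regularization strength `C=0.1` and the data shape are arbitrary choices, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 30 features, only 3 of which carry class information.
X, y = make_classification(n_samples=200, n_features=30, n_informative=3,
                           n_redundant=0, random_state=0)

# Same regularization strength, different norms (liblinear supports both).
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)

nz_l1 = np.count_nonzero(l1.coef_)  # many coefficients driven exactly to zero
nz_l2 = np.count_nonzero(l2.coef_)  # coefficients shrunk but kept non-zero
print(nz_l1, nz_l2)
```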
We tune the hyperparameters of all models via grid search with cross-validation, looking for the set of hyperparameters that provides the best sensitivity. We tune the following hyperparameters: kernel, polynomial degree, and regularization term for SVC; number of estimators, criterion, and maximum depth for RF; number of estimators and learning rate for AB; and penalty, regularization term, and maximum number of iterations for LR. All models are implemented with the scikit-learn library [29] in Python 3.6.5.
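A sketch of the sensitivity-driven grid search with scikit-learn's `GridSearchCV` follows. The hyperparameter names follow the list above, but the candidate values in `SEARCH_SPACES` are hypothetical because the paper does not report them; `scoring="recall"` computes sensitivity for the positive class (COVID-19 encoded as 1).

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical grids: parameter names match the text, candidate values are illustrative.
SEARCH_SPACES = {
    "SVC": (SVC(), {"kernel": ["linear", "rbf", "poly"], "degree": [2, 3], "C": [0.1, 1, 10]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [25, 50], "criterion": ["gini", "entropy"], "max_depth": [None, 3]}),
    "AB": (AdaBoostClassifier(random_state=0),
           {"n_estimators": [25, 50, 100], "learning_rate": [0.1, 1.0]}),
    "LR": (LogisticRegression(solver="liblinear"),
           {"penalty": ["l1", "l2"], "C": [0.1, 1, 10], "max_iter": [100, 1000]}),
}


def tune_all(X, y, seed=0):
    """Return the best hyperparameters and cross-validated sensitivity per model."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, (estimator, grid) in SEARCH_SPACES.items():
        # "recall" on the positive class is sensitivity, TP / (TP + FN).
        search = GridSearchCV(estimator, grid, scoring="recall", cv=cv)
        search.fit(X, y)
        results[name] = (search.best_params_, search.best_score_)
    return results
```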
For model evaluation, we use the sensitivity (Eq. 1), the accuracy (Eq. 2), and the area under the receiver operating characteristic (ROC) curve (AUC). The final model is selected based on the best sensitivity achieved on the validation data (the held-out folds of the inner cross-validation procedure). Each metric is calculated as follows [30]:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{1}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
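The sensitivity, accuracy, and AUC can be computed with scikit-learn as sketched below. The `evaluate` helper is hypothetical and assumes that COVID-19 is encoded as the positive label 1 and that `y_score` holds predicted probabilities or decision scores for that class.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate(y_true, y_pred, y_score):
    """Sensitivity, accuracy and ROC AUC, with 1 = COVID-19 as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)                # Eq. 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. 2
    auc = roc_auc_score(y_true, y_score)        # threshold-free ranking quality
    return sensitivity, accuracy, auc


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
sens, acc, auc = evaluate(y_true, y_pred, y_score)
print(sens, acc, auc)  # 0.75 0.75 0.9375
```

Here 3 of 4 COVID-19 cases and 3 of 4 typical cases are predicted correctly, so sensitivity and accuracy are both 0.75, while the AUC (15 of 16 positive-negative pairs ranked correctly) uses the scores rather than the hard predictions.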

Selected Features
For each approach (lung zones and L-R), we extract 88 features from each region. The relative importance of each feature in each approach is measured over 100 repetitions of stratified k-fold cross-validation (k = 5) with the RF model, and is shown in Figure 4.
The thirteen most relevant radiomic features for each approach are detailed in Appendix A.

Classification
After selecting the best hyperparameters, all models are trained on the entire training dataset and their performance is evaluated on the test dataset. Table 2 shows the AUC, sensitivity, and accuracy of each model. Based on the sensitivity values, the best model was AdaBoost (AB), with a learning rate of 1 and 50 estimators. Figure 5 shows the receiver operating characteristic (ROC) curve of each classification model for the lung zones and left-right (L-R) approaches. Figure 6 shows the time needed to run each step of the proposed pipeline to predict a new case using the AB model.
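The final AdaBoost configuration (learning rate 1, 50 estimators) coincides with the defaults of scikit-learn's `AdaBoostClassifier`. A minimal sketch of refitting it on a training split and timing a single prediction follows; the synthetic features are a stand-in for the 13 selected radiomic features, and the timing covers only the classifier, not segmentation or feature extraction, so it is not comparable to the full-pipeline timing in Figure 6.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 136 cases described by 13 selected features (hypothetical data).
X, y = make_classification(n_samples=136, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Reported configuration: learning rate 1 and 50 estimators.
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X_train, y_train)

start = time.perf_counter()
score = model.predict_proba(X_test[:1])[:, 1]  # probability of the positive class, one case
elapsed = time.perf_counter() - start
print(f"P(class 1) = {score[0]:.2f}; prediction took {elapsed * 1000:.2f} ms")
```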

Discussion
A wide variety of computer-aided methods are being applied to aid the diagnosis and prognosis of COVID-19 [7], [8], [32], [33]. Clinical evaluation of symptomatic COVID-19 patients reveals pulmonary involvement and atypical pneumonia. Deep learning (DL) "black box" techniques are currently the most common strategy. These techniques are not inherently able to explain their predictions in a way that humans can understand [34]. In our approach, radiomics and machine learning models are used to differentiate pneumonia patterns in chest X-ray images. Even though the models we use are not completely explainable, every feature involved in the classification process has a definition, and some can be associated with known radiological patterns. In this way, it is possible to know which aspects of the image are relevant to our models for the proposed classification.
Our analysis aimed to find a group of meaningful radiomic features and the best classification model to differentiate between COVID-19-based pneumonia and typical pneumonia. Two lung segmentation approaches are used to assess their influence on differentiating the pneumonia types: left and right lung sides (L-R), and zones within each lung (upper, middle, bottom). Restricting feature extraction to masks of specific regions confines the evaluation to lung tissue and avoids features unrelated to the lung disease pattern, such as those arising from lung borders, the heart, muscles, and bones.
The lung zones approach, which separates the upper, middle, and bottom regions, achieves better performance than L-R. This might be associated with the smaller regions used in the feature analysis step, which may better represent small structures that are suppressed in the L-R approach by the prevalence of larger homogeneous regions. Our best result reached 94% sensitivity in differentiating COVID-19-based pneumonia from typical pneumonia, using the AdaBoost model with separate lung zones.
GLSZM features are selected in 5 of the 6 lung zones and comprise 11 of the 13 selected features. These features are based on gray-level zones, where a zone is defined as a group of 8-connected pixels sharing the same gray-level intensity. They are invariant to rotation, since the matrix is calculated in all directions at once [35].
When the importance of each feature in the classification process (Figure 4) is compared with the features (Table 3), two GLSZM features appear among the most relevant in both segmentation approaches: size zone non-uniformity normalized and small area emphasis. The first quantifies the variability of zone sizes, with a lower value representing greater homogeneity among zone sizes. The second quantifies the distribution of small zones in the image, with higher values when fine textures are present [17].
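To make the two GLSZM features concrete, the sketch below computes them from 8-connected zones, following the definitions in the PyRadiomics documentation: small area emphasis is the mean of 1/s² over all zones of size s, and size zone non-uniformity normalized is the sum over sizes of (number of zones of that size)² divided by the squared number of zones. PyRadiomics discretizes gray levels (by default into bins of fixed width) before building the matrix, so the hypothetical `glszm_features` assumes an already-discretized integer image.

```python
import numpy as np
from scipy import ndimage


def glszm_features(image: np.ndarray):
    """Small Area Emphasis and Size-Zone Non-Uniformity Normalized (2D sketch)."""
    zone_sizes = []  # one entry per zone; the gray level itself is not needed here
    for level in np.unique(image):
        # Zones: 8-connected groups of pixels sharing this gray level.
        labeled, _ = ndimage.label(image == level, structure=np.ones((3, 3), dtype=bool))
        zone_sizes.extend(np.bincount(labeled.ravel())[1:])  # skip background label 0
    sizes = np.asarray(zone_sizes, dtype=float)
    n_zones = sizes.size
    # SAE = (1 / Nz) * sum_i sum_j P(i, j) / j^2
    sae = np.sum(1.0 / sizes ** 2) / n_zones
    # SZNN = (1 / Nz^2) * sum_j (sum_i P(i, j))^2, the column sums are zones per size
    _, zones_per_size = np.unique(sizes, return_counts=True)
    sznn = np.sum(zones_per_size.astype(float) ** 2) / n_zones ** 2
    return float(sae), float(sznn)


image = np.array([[0, 0, 1],
                  [0, 0, 2]])
print(glszm_features(image))  # zones of sizes 4, 1, 1
```

The example has one 4-pixel zone and two 1-pixel zones, giving a small area emphasis of (1/16 + 1 + 1)/3 = 0.6875 and a size zone non-uniformity normalized of (1² + 2²)/3² = 5/9.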
In the GLDM class, which quantifies gray-level dependencies in an image, defined by the difference between neighboring pixels connected to the central pixel, the dependence non-uniformity feature is among the most relevant features. It measures the similarity of dependences throughout the image, with lower values representing greater homogeneity [36].
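A direct, unoptimized sketch of dependence non-uniformity in 2D is shown below, assuming a distance of 1 (the 8-neighborhood) and α = 0, so that on integer gray levels a neighbor is dependent when its intensity differs from the central pixel by less than 1. The feature is the sum over dependence counts of (number of pixels with that count)² divided by the number of pixels; whether the central pixel itself is counted only shifts the dependence index and does not change the result. The name `dependence_non_uniformity` is hypothetical.

```python
import numpy as np


def dependence_non_uniformity(image: np.ndarray, alpha: int = 0) -> float:
    """GLDM Dependence Non-Uniformity with distance 1 (8-neighborhood), 2D sketch."""
    img = image.astype(int)
    n_rows, n_cols = img.shape
    dependence_counts = []  # number of dependent neighbors of every pixel
    for r in range(n_rows):
        for c in range(n_cols):
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        count += int(abs(img[r, c] - img[rr, cc]) <= alpha)
            dependence_counts.append(count)
    # Column sums of P(i, j): how many pixels have each dependence count.
    _, pixels_per_count = np.unique(dependence_counts, return_counts=True)
    n_z = len(dependence_counts)  # one dependence zone per pixel
    return float(np.sum(pixels_per_count.astype(float) ** 2) / n_z)


print(dependence_non_uniformity(np.zeros((3, 3))))       # counts 3, 5, 8
print(dependence_non_uniformity(np.arange(9).reshape(3, 3)))  # every count is 0
```

In a uniform 3 × 3 image the corners, edges, and center have 3, 5, and 8 dependent neighbors, giving (4² + 4² + 1²)/9 = 33/9; when all gray levels differ, every pixel has zero dependents, giving 9²/9 = 9.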
Four different machine learning models (SVC, RF, AB, and LR) were used to classify COVID-19-based pneumonia versus typical pneumonia with the 13 previously selected features. The highest sensitivity was obtained using the AB model with the lung zones approach. Other studies report AdaBoost as the best option to boost the performance of decision trees on binary classification problems. AdaBoost creates a collection of weak learners by maintaining a set of weights over the training samples and adaptively adjusting these weights after each weak learning cycle: the weights of the samples misclassified by the current weak learner are increased, while the weights of the correctly classified samples are decreased [20].
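The reweighting step described above can be sketched for the classic binary (discrete) AdaBoost formulation. This is an illustration of the idea only: scikit-learn's `AdaBoostClassifier` implements the SAMME variant, whose learner weight differs by a constant factor, and the helper `adaboost_reweight` is hypothetical.

```python
import numpy as np


def adaboost_reweight(weights: np.ndarray, misclassified: np.ndarray):
    """One round of classic discrete AdaBoost sample reweighting (binary case)."""
    weights = weights / weights.sum()
    error = weights[misclassified].sum()         # weighted error of the current weak learner
    alpha = 0.5 * np.log((1.0 - error) / error)  # learner's vote weight (assumes 0 < error < 0.5)
    # Misclassified samples are scaled by e^alpha, correctly classified ones by e^-alpha.
    updated = weights * np.exp(np.where(misclassified, alpha, -alpha))
    return updated / updated.sum(), alpha


w = np.full(10, 0.1)
miss = np.array([True, True] + [False] * 8)
new_w, alpha = adaboost_reweight(w, miss)
print(alpha, new_w[:3])  # alpha = ln 2; weights 0.25, 0.25, 0.0625
```

With a weighted error of 0.2, each misclassified sample's weight grows from 0.1 to 0.25 while each correct one shrinks to 0.0625, so the misclassified samples jointly carry half of the total weight in the next round, which is what forces the next weak learner to concentrate on them.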
When we use L-R lung segmentation, all models perform similarly. This might reflect a limit on classification accuracy when the whole lung is segmented, since the radiomic features are then extracted from a large area.
In our study, we achieve 93% accuracy and 95% sensitivity. Although this is a binary classification problem, our study uses lung segmentation, so the features relate only to pulmonary tissue patterns.
Other studies classifying COVID-19-based pneumonia versus other pulmonary diseases have emerged in the last few months. Accuracies between 90% and 98% in pneumonia classification have been achieved with DL models [9]–[12]. Apostolopoulos et al. [7] extracted biomarkers from chest X-ray images to differentiate seven classes: COVID-19-based pneumonia, edema, pleural effusion, emphysema, fibrosis, pneumonia, and normal. DL was used to extract high-order features, and a MobileNet v2 network performed the multilabel classification. Overall accuracy was 87.7%, with 99.18% accuracy and 97.36% sensitivity for the COVID-19 pneumonia class alone.
Asnaoui and Chawki [8] used X-ray and CT images with DL models to classify atypical COVID-19-based pneumonia, typical pneumonia, and normal subjects. The DL models they used were VGG16, VGG19, DenseNet201, Inception_ResNet_V2, Inception_V3, Resnet50, and MobileNet_V2. They obtained 82.8% sensitivity in COVID-19 classification. However, it is important to note that the typical pneumonia and normal images were taken from a pediatric database, while the COVID-19 dataset contains only images of adult subjects. Thus, this model might be differentiating adult from pediatric X-rays rather than the pulmonary diseases. Rahimzadeh and Attar [37] avoided this population bias by using the same database and groups for classification, but only adult chest X-rays. In that study, the authors concatenated two DL networks (Xception and ResNet50V2) for high-order feature extraction and classification. The average accuracy was 80.5%, with a sensitivity of 99.5% for COVID-19 cases.
Despite encouraging classification accuracy and sensitivity, the use of DL to aid disease diagnosis raises concerns about model explainability. DL models are "black boxes", with fundamental difficulties in explaining their decision-making process and which aspects of the input data drive the network's decisions. Furthermore, Maguolo and Nanni [38] showed that many proposed DL models for COVID-19 identification might be influenced by their datasets. Images from the same dataset usually share characteristics, since most come from similar equipment and the same medical center. Because DL models use all the information available in the image, they might learn to discriminate the datasets instead of the diseases, leading to biased COVID-19 identification.
Several studies have shown that radiological findings in CT images of COVID-19 patients are evenly distributed between the left and right lungs [39]–[41]. However, findings are more frequent in the right lower lobe, followed by the left upper and left lower lobes [4], [42], [43].
It is interesting to note that, among the most important features selected by our method, two are related to the bottom zone of the right lung. For the left lung, we only have features related to the middle and upper lung zones. We hypothesize that the lack of selected features in the lower part of the left lung might be due to the overlapping shadow of the heart, which makes this lung region harder to segment. This tissue overlay does not occur in CT images; in chest X-rays, however, it may cause a large variation of the features in this anatomical region, so that they are not selected for classification.
It is important to emphasize that the computational cost of ML models should be low enough for them to be potentially applied in clinical use. Classification with our best model, including all steps, took less than two seconds on a single core of a 2.3 GHz Intel Xeon processor, making it suitable for clinical use.
Some limitations of this study relate to the low spatial resolution of some images in the COVID-19 dataset and to the small number of chest X-ray images of subjects with COVID-19-based pneumonia in public datasets. Another limitation of our method is the segmentation step, which needs further work to improve its reliability. Currently, chest X-ray segmentation models are trained on images of subjects with pulmonary diseases that do not involve severe lung obstruction or lesions, which limits their generalization to more aggressive pulmonary diseases.

Conclusion
This paper investigates classification models that differentiate chest X-ray images of COVID-19-based pneumonia from typical pneumonia. Our analysis showed that AdaBoost is the best method for discriminating COVID-19-based pneumonia from typical pneumonia when the lungs are segmented into six distinct zones. Our study showed a predominance of features selected in the right lung, with a tendency toward the upper zone.
Further studies with more chest X-ray images of COVID-19-based pneumonia are required to investigate features related to radiological findings. A more in-depth evaluation of the radiomic features related to COVID-19 in chest X-ray and CT images will be required to analyze whether there is a radiomic signature of COVID-19.