Gray-level co-occurrence matrix and Schmidt neural network for COVID-19 diagnosis

INTRODUCTION: As COVID-19 spread to most of the world, chest CT imaging came to be widely regarded as a convenient and feasible method for the diagnosis of suspected patients. In the traditional diagnostic procedure, doctors and experts examine these CT images and draw conclusions. However, with the surge in the number of suspected patients, relying solely on manual diagnosis can no longer meet the demand for efficiency. OBJECTIVES: A number of previous studies have shown that machine learning methods can help diagnose suspected SARS-CoV-2 patients. However, the accuracy of existing schemes still needs to be improved. METHODS: In order to diagnose suspected SARS-CoV-2 patients more accurately, we propose a new model. We first preprocess the chest CT (CCT) images of the test subjects, then extract texture features from the processed images, and finally use a Schmidt neural network to divide the CCT images into two categories: sick and normal. RESULTS: The accuracy of the proposed model is 76.33±1.18%, while the accuracies of the existing models RBFNN, ELM-BA, WEBBO and GLCM-SVM are 73.45±0.69%, 74.88±0.86%, 70.48±0.81% and 64.42±0.88%, respectively. The accuracy of our proposed model is thus 1.45% higher than that of the best existing model, ELM-BA. More importantly, the proposed model has better stability. CONCLUSION: The proposed model is feasible for the diagnosis of suspected patients. This is not only conducive to timely treatment of patients; more importantly, prompt and effective isolation of confirmed patients can prevent the further spread of the epidemic.


Background
In December 2019, the first case of SARS-CoV-2 was detected, and just a few months later the virus had spread to most countries and regions in the world [1]. On 11 February 2020, the disease caused by the novel coronavirus was officially named COVID-19, and on 11 March 2020 it was declared a global pandemic [2]; the numbers of confirmed cases of pneumonia worldwide and of new deaths (at one point 11,548) were still increasing. The epidemic has had a huge political and economic impact on the world, and has fragmented countless families [3].
Through the continuous efforts of researchers, some countries have now developed COVID-19 vaccines, and in some of them the epidemic has been further controlled through vaccination [4]. However, a considerable number of countries and regions do not have the economic and technological strength to conduct vaccine research, development and vaccination [5]. Therefore, isolation of infected persons is still the main means of preventing the spread of the epidemic at this stage. Timely diagnosis of suspected patients can break the chain of transmission and is essential to prevent further large-scale spread [6,7]. Clinical observation of a large number of SARS-CoV-2 patients shows that most infected patients develop fever, dry cough, fatigue and other typical symptoms after the incubation period. However, a small number of infected people still do not show any clinical symptoms after the incubation period [8]. In most cases the incubation period is 3-14 days, but in a few cases it is as long as 24 days. The COVID-19 virus is therefore highly infectious, highly pathogenic and has a long incubation period [9], so it is not advisable to rely only on routine symptoms to determine whether a patient has the disease. The identification of suspected COVID-19 patients is very difficult compared with patients infected with conventional viruses. At present, the most commonly used test for suspected SARS-CoV-2 patients is nucleic acid testing. This method first collects a nasopharyngeal swab from the subject, and then uses RNA reverse transcription and the polymerase chain reaction (RT-PCR) to determine whether the subject is infected [10]. When the subject's nucleic acid test result is positive, we classify them as high risk and immediately isolate and treat them accordingly.
Conversely, when the nucleic acid test result is negative, we temporarily classify the subject as low risk. However, according to existing studies, the result of a nucleic acid test cannot be regarded as the sole criterion for whether the subject is ill. Moreover, nucleic acid testing requires specialized testing personnel and reagents, which some economically underdeveloped countries and regions lack. Nucleic acid test results are also relatively slow to arrive, and during the waiting period the subject risks spreading the epidemic. Therefore, we urgently need an accurate and efficient detection method [11]. Since the lung images of SARS-CoV-2 patients show features such as pulmonary consolidation and ground-glass opacities, the chest CT (CCT) images of the tested subjects are very helpful for diagnosis. Moreover, compared with nucleic acid testing, diagnosis from the subject's CCT image is faster and safer. Chest imaging has therefore become a better method of diagnosing potential COVID-19 pneumonia [12].

Related work
Although diagnosis from the subject's CCT image is safer and faster than nucleic acid testing, it still has shortcomings. Conventionally, experts and doctors examine the images and give their conclusions, a method that depends heavily on the doctor's current state. Some researchers have therefore begun to consider using artificial intelligence to help diagnose the CCT images of subjects. In [13], the combination of wavelet entropy (WE) and biogeography-based optimization (BBO) was used to diagnose SARS-CoV-2. In [14], the gray-level co-occurrence matrix (GLCM) and a support vector machine (SVM) were used to classify SARS-CoV-2 cases. [15] proposed a radial basis function neural network (RBFNN) and applied it to the detection of human brain pathology; in this article we use it as a comparison baseline for COVID-19 detection. [16] combined the extreme learning machine (ELM) with the bat algorithm (BA); this ELM-BA is also used as a comparison baseline in our experiments. [17] proposed an optimized method for the diagnosis of SARS-CoV-2 based on three-stage biogeography. [18] presented CCSHNet for COVID-19 diagnosis. [19] presented an attention network for COVID-19 (ANC) which can provide explainable diagnoses. In [20], to study the influence of a CNN's depth and degree of fine-tuning on transfer learning, a detailed comparison experiment was carried out using VGG-16 and VGG-19; the results show that greater depth gives better results. In [21], a data set containing 5000 images was collected, four networks (ResNet18, ResNet50, SqueezeNet and DenseNet-121) were trained on it using transfer learning, and a technique was then used to generate heat maps of the lungs of people infected with COVID-19. In [22], a new convolutional neural network, CapsNet, was proposed for the detection of suspected SARS-CoV-2 patients.
The two-class and multi-class accuracies of that model reached 97.24% and 84.22%, respectively. [23] introduced a convolutional neural network, COVID-Net, which detects SARS-CoV-2 from chest X-ray (CXR) images of suspected patients, and released a public data set containing 13,975 CXR images. [24] proposed a new framework, CGNet, to classify CXR images into two categories, normal and diseased; its best accuracy is 98.72%, with a sensitivity of 100% and a specificity of 97.95%. [25] proposed a lightweight CNN based on the SqueezeNet architecture to classify the CCT images of suspected SARS-CoV-2 patients, with a final accuracy of 85.03%. [26] proposed a computer-aided diagnosis (CAD) system for SARS-CoV-2 detection, consisting of feature extractors, classification methods and content-based image retrieval (CBIR); its accuracy on the CT and CXR data sets reached 93.20% and 99.38%, respectively. It is an indisputable fact in this research field that most neural network training is based on gradient descent, and gradient-descent-based algorithms share the same shortcoming: all parameters must be updated and adjusted during training, which greatly reduces training speed. Therefore, considering the need for efficiency and speed, we no longer use a traditional gradient-descent-based neural network, but instead use a Schmidt neural network, whose input weights and biases are initialized randomly. We first use the gray-level co-occurrence matrix to extract texture features from the preprocessed images, and then use the Schmidt neural network to classify them. We then compare our model with the existing RBFNN, ELM-BA, WEBBO and GLCM-SVM models.
The results show that our proposed model improves significantly on the existing models. The remainder of this article is organized as follows: in the second section, we introduce the data set used and the corresponding preprocessing work; in the third section, we describe the structure of the model, the main methods applied and the evaluation indicators; in the fourth section, we summarize and discuss the experimental results; in the fifth section, we give the final conclusions, the limitations of the model, and the work to be done in the future.

Dataset
We obtained CCT scan images of a total of 284 study subjects, 142 patients with COVID-19 pneumonia and 142 healthy controls, from local hospitals. The specific acquisition settings were as follows: Philips Craft 64-slice spiral CT, kV: 120, mAs: 240, slice thickness 3 mm, slice spacing 3 mm, pitch 1.5; lung window (W: 1500 HU, L: -500 HU), mediastinal window (W: 350 HU, L: 60 HU). According to the lesion display, lung window images with a slice thickness and slice spacing of 1 mm were obtained. The patient lies flat on the scanner and is asked to breathe deeply; scanning then proceeds from the apex of the lung to the costophrenic angle.
For SARS-CoV-2 patients we adopted a slice selection method, choosing the slices with larger lesion areas and more lesions. For healthy subjects, images of any slice can be used, without a slice selection step. Table 1 shows the demographic data of the subjects used in this study. We finally obtained a total of 640 CCT images of COVID-19 patients and healthy subjects. The resolution of each image is 1024×1024.
When the two conclusions (C1, C2) given for a CCT image conflicted, we asked senior doctors and experts to help us reach the final conclusion. Let I denote a CCT image scan and Ck(I) the conclusion given by the k-th expert or doctor. The final annotation is determined by the following formula: C(I) = MV{C1(I), C2(I), C3(I)}, where MV represents the majority vote over the opinions given by the three experts.
We denote the original image data set as D1. D1 contains 320 CCT images of SARS-CoV-2 patients and 320 CCT images of healthy subjects.
The grayscale data set is D2 = {I2(i) = g(I1(i)), i = 1, 2, …, 640}, where g stands for the grayscale operation. In order to enhance the contrast of the gray images, we used the histogram stretching method. For each picture I2(i), i = 1, 2, …, 640, we calculate its minimum gray value min[I2(i)] and maximum gray value max[I2(i)] over all pixel coordinates (x, y) of the image I2(i), and the histogram-stretched image I3(i) is obtained by the formula

I3(i)(x, y) = 255 × (I2(i)(x, y) − min[I2(i)]) / (max[I2(i)] − min[I2(i)]).
The data set D3 can then be expressed as D3 = {I3(i), i = 1, 2, …, 640}. After that, we cropped each image to remove the blank edge and bottom areas, obtaining the cropped data set D4 = {I4(i) = c(I3(i)), i = 1, 2, …, 640}.
Here c represents the cropping operation. We set the margins removed from the top, bottom, left and right of each image all to 150 pixels, so each image is cropped from 1024×1024 to 724×724.
Finally, we apply a subsampling operation to all the images and obtain the resized image data set D5 = {I5(i) = ↓(I4(i)), i = 1, 2, …, 640}.
Here ↓ represents the down-sampling operation. After this fourth step, we obtain the image data set D5 of 640 images, each with one channel and a size of 256×256. Table 2 shows the image size and required memory space after each preprocessing stage. After these four preprocessing steps, each image requires only 2.08% of the original memory space, so using the preprocessed images saves a great deal of memory and greatly improves the efficiency of our experiments.
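The four preprocessing steps above can be sketched in NumPy as follows. The 150-pixel margin and 256×256 target size follow the text; the channel-averaging grayscale conversion, the nearest-neighbour resize and all function names are illustrative assumptions, since the paper does not specify its grayscale or resize method.

```python
import numpy as np

def to_gray(img):
    """Step 1: collapse colour channels (if any) to a single gray channel."""
    return img.mean(axis=2) if img.ndim == 3 else img.astype(np.float64)

def histogram_stretch(img):
    """Step 2: stretch gray values to the full [0, 255] range."""
    g = img.astype(np.float64)
    lo, hi = g.min(), g.max()
    return (g - lo) / (hi - lo) * 255.0

def nearest_resize(img, size):
    """Nearest-neighbour resize; stands in for the paper's down-sampling."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def preprocess(img, margin=150, out_size=256):
    """Grayscale -> stretch -> crop 150 px per side -> resize to 256x256."""
    stretched = histogram_stretch(to_gray(img))
    cropped = stretched[margin:-margin, margin:-margin]  # 1024 -> 724
    return nearest_resize(cropped, out_size).astype(np.uint8)

# toy stand-in for one 1024x1024 CCT slice
rng = np.random.default_rng(0)
slice_ = rng.integers(20, 200, size=(1024, 1024)).astype(np.uint8)
out = preprocess(slice_)
print(out.shape)
```

A real pipeline would use an interpolating resize (e.g. bilinear) for the 724 to 256 step; nearest-neighbour keeps the sketch dependency-free.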

Methodology
In this section, we will elaborate on the proposed model. The principle of gray level co-occurrence matrix for image feature extraction and the basic concept of Schmidt neural network are included. In addition, this section also introduces the required ten-fold cross-validation methods and the indicators required for model performance evaluation.

GLCM
Feature extraction refers to a method of transforming a piece of data to find its typical features. In the classification research, in order to reduce the burden of the classifier, we will perform corresponding feature extraction operations. Extracting the texture feature information in the image is an excellent method for X-ray image feature extraction.
Research on existing data shows that texture features are particularly suitable for X-ray image classification. Texture features also take into account the quantization and spatial relationships of pixel groups with the same intensity. Any grayscale image can be viewed as a curved surface in three-dimensional space. In this space, two pixels separated by some distance may have the same gray level or different gray levels. The GLCM [27] starts from a pixel (x, y) of the image whose gray level is i, and counts the probability P(i, j, d, θ) that the pixel at distance d from it has gray level j. It follows from this definition that the element in the i-th row and j-th column of the GLCM is the frequency of occurrence of a pixel pair in which one pixel has gray value i and the other has gray value j, at angle θ and interval d. Here θ is the included angle between the X-axis and the line through the two pixels in the clockwise direction [28], and is generally 0°, 45°, 90° or 135°. We created the corresponding GLCM from each CCT image with a fixed offset; the creation process is shown in Fig 7. We first convert the preprocessed image to its corresponding grayscale image [29], in which the intensity of each pixel ranges from 0 to 255, and then define the offset for the co-occurrence comparison. Since the gray-level co-occurrence matrix is two-dimensional and our images are quantized to at most 8 gray levels, the matrix is an 8×8 matrix. The gray-level co-occurrence matrix reflects the directional distribution of gray levels in the image as well as comprehensive information about gray-level changes between adjacent intervals, and we can use it to analyze the local patterns and arrangement structure of the image. However, we generally do not use the GLCM itself directly as the feature quantity for texture analysis.
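As a concrete illustration of this definition, the following sketch counts co-occurring gray-level pairs for a fixed offset on a tiny 4-level image. The `glcm` helper and the example values are ours, not from the paper; in practice the matrix is often symmetrized by adding its transpose and normalized to probabilities.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Count co-occurring gray-level pairs at offset (dx, dy).

    Entry (i, j) is the number of pixel pairs in which the first pixel
    has level i and the pixel at offset (dx, dy) has level j.
    """
    h, w = img.shape
    m = np.zeros((levels, levels), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [2, 2, 3]])
# offset (1, 0) corresponds to theta = 0 degrees, d = 1:
# the 6 horizontal neighbour pairs of the 3x3 image are counted
print(glcm(img, dx=1, dy=0, levels=4))
```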
When extracting texture features, we instead use secondary statistics computed from the GLCM. Equations (14)-(18) give the calculation of each feature quantity, and Table 3 provides a detailed explanation of the symbols. CON (contrast): a measure of how a pixel value compares with the pixel values of the surrounding area. This metric also reflects the clarity of the image and the depth of its texture: the sharper the image, the deeper its texture; the lower the sharpness, the shallower the texture.
COR (correlation): expresses the degree of local correlation in grayscale images. When the value distribution of the gray-level co-occurrence matrix within a certain image range is relatively uniform, the correlation is larger; conversely, when the distribution is chaotic, the correlation is smaller. When texture features appear in a certain direction, the correlation in that direction is greater than in other directions.
ASM (angular second moment): a measure of whether the grayscale distribution in the image is uniform, and also of the depth of texture. It is calculated by squaring all the elements of the gray-level co-occurrence matrix and summing them, giving the energy of the image. If the range of gray-value fluctuation in the matrix is relatively small, the energy value of the image is relatively large; conversely, if the range of fluctuation is relatively large, the energy value is relatively small.
Entropy: Entropy is a measure of the non-uniformity of image texture. When the element values of the matrix are randomly distributed, the entropy value of the image is larger. Conversely, when the element values of the matrix are distributed regularly, the entropy value of the image is smaller.
IDM (inverse difference moment): reflects the clarity and regularity of the texture. When the texture of the image is clearer, more regular and easier to describe, the value of IDM is larger; conversely, when the texture is messy and harder to describe, the value of IDM is smaller.
When we calculate the texture features of these images, the means μx, μy and standard deviations σx, σy of the marginal distributions px and py have to be calculated as follows: px(i) is the i-th entry in the marginal probability vector obtained by summing the rows of p(i, j), and py(j) is the j-th entry in the marginal probability vector obtained by summing the columns of p(i, j).
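Under the usual Haralick definitions (the paper's equations (14)-(18) did not survive extraction, so the exact symbols below are assumptions), the five features can be computed from a count-valued GLCM like this:

```python
import numpy as np

def texture_features(counts):
    """CON, COR, ASM, entropy and IDM from a GLCM of raw pair counts.

    p is the normalized GLCM; mu_x, mu_y, sd_x, sd_y are the means and
    standard deviations of the row and column marginal distributions.
    """
    p = counts / counts.sum()
    n = p.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    px, py = p.sum(axis=1), p.sum(axis=0)          # marginal probabilities
    mu_x, mu_y = (np.arange(n) * px).sum(), (np.arange(n) * py).sum()
    sd_x = np.sqrt(((np.arange(n) - mu_x) ** 2 * px).sum())
    sd_y = np.sqrt(((np.arange(n) - mu_y) ** 2 * py).sum())
    con = ((i - j) ** 2 * p).sum()                          # contrast
    cor = ((i - mu_x) * (j - mu_y) * p).sum() / (sd_x * sd_y)  # correlation
    asm = (p ** 2).sum()                                    # energy
    ent = -(p[p > 0] * np.log(p[p > 0])).sum()              # entropy
    idm = (p / (1.0 + (i - j) ** 2)).sum()                  # homogeneity
    return con, cor, asm, ent, idm

# a uniform 2x2 matrix: maximal entropy, zero correlation
con, cor, asm, ent, idm = texture_features(np.ones((2, 2)))
```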

Schmidt Neural Network
Similar to a traditional single-hidden-layer neural network, the extreme learning machine has only a three-layer structure: an input layer, a hidden layer in the middle, and an output layer at the end. However, compared with traditional neural networks based on gradient descent, the extreme learning machine updates its parameters very differently. The core of the traditional back-propagation algorithm is to use gradient descent to update all the corresponding parameters at each step of training [30]; this training method is very time-consuming and cannot meet the need for high efficiency. The extreme learning machine uses a different training algorithm: it randomly generates the input weights and biases, and derives the output weights in closed form from the generalized inverse matrix. Therefore, compared with a traditional gradient-descent-based network, it greatly shortens the time required for training while keeping the accuracy almost unchanged. The structure of the ELM is shown in Fig 8. The structure of the Schmidt neural network (SNN) is similar to that of the extreme learning machine (ELM), and the training principle is the same. The only difference between the two is that the output bias of the extreme learning machine is always 0, whereas the output bias of the Schmidt neural network may be non-zero. The structure of the Schmidt neural network is shown in Fig 9. We assume that the training set contains N samples, each of which can be expressed as (xi, ti), where xi = [xi1, xi2, …, xin]T ∈ Rn represents the i-th input and ti = [ti1, ti2, …, tim]T ∈ Rm represents the label corresponding to the i-th input. A neural network can then simply be regarded as a function.
Between the input layer and the output layer there is a hidden layer, which is fully connected and contains L nodes. Given the N data samples, the output of the hidden layer is obtained as follows: the input is multiplied by the input weights, the corresponding bias is added, and a nonlinear activation function is applied; the hidden nodes' activation functions need not all be identical. Here wi = [wi,1, wi,2, …, wi,n] represents the set of input weights of the i-th hidden node, βi represents the corresponding hidden-layer output weights, and bi represents the corresponding bias of the i-th hidden node.
wi · xj is the inner product of wi and xj, and f(·) is the activation function of the hidden layer. The activation function is generally nonlinear; common choices include the sigmoid function and the Gaussian function.
In order to obtain output weights βi that fit the training sample set well, we must ensure that the error reaches its minimum during training. We set the squared difference between the label values and the outputs of the output layer as the objective function; to minimize the training error, we only need to minimize this objective. By definition, the objective function can be written as E = Σj ||oj − tj||², where oj is the network output for the j-th sample. So there exist βi, wi and bi such that the targets are reproduced, which can be represented in matrix form as

Hβ = T,   (28)

where H represents the output matrix of the hidden layer, T represents the target matrix of the training data, and β represents the output weights of the hidden layer.
E(w1, …, wL, b1, …, bL, β1, …, βL) = ||Hβ − T||². In order to train the whole single-hidden-layer network, we hope to obtain ŵi, b̂i and β̂, i = 1, …, L, such that E(ŵ1, …, ŵL, b̂1, …, b̂L, β̂) = min w,b,β ||Hβ − T||², i.e. the loss function attains its minimum.
Since the input weights and biases are generated randomly and then fixed, minimizing E reduces to solving the linear system Hβ = T, whose minimum-norm least-squares solution can be written as β = H†T, where H† is the Moore-Penrose generalized inverse of H.
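The closed-form training just described, with random input weights and biases and the output weights (plus the SNN's output bias) solved by the pseudoinverse, can be sketched as follows. The toy data, layer size and function names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def snn_train(X, T, hidden=40, seed=0):
    """Schmidt-style training: random input weights/biases, then the
    output weights and output bias solved in closed form via the
    Moore-Penrose pseudoinverse -- no gradient descent."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # sigmoid hidden layer
    H1 = np.hstack([H, np.ones((X.shape[0], 1))])     # 1s column -> output bias
    beta = np.linalg.pinv(H1) @ T                     # least-squares solution
    return W, b, beta

def snn_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    H1 = np.hstack([H, np.ones((X.shape[0], 1))])
    return H1 @ beta

# toy two-class problem on 2-D points
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
T = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, beta = snn_train(X, T)
acc = ((snn_predict(X, W, b, beta) > 0.5) == T).mean()
```

An ELM corresponds to the same code without the appended column of ones, i.e. with the output bias fixed at 0.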

K-fold Cross Validation
When training a model, the data set is usually divided into two parts: a training set and a test set. The training set is responsible for training and updating the parameters of the model, and the test set is responsible for the final evaluation of the model's performance [31]. However, due to the particularities of the field, medical research often faces missing or insufficient data sets. If the data set is still divided into two parts in the usual way, the model will face the problem of overfitting [32]: it performs well on the training set, but its performance on the test set is unsatisfactory. If data from the test set were instead moved into the training set, the measured accuracy of the model would improve, but its true accuracy could no longer be guaranteed; given a new test set, the model's generalization ability might be poor.
In order to solve this problem, we generally separate part of the training data as a validation set, whose role is to select among the generated models. At the same time, to prevent overfitting, we use the k-fold cross-validation method. The principle of this method is to divide the data set evenly into k equal parts, take one of them as the validation set each time, and use the remaining k-1 parts as the training data, repeating this k times. In this way, each part of the data eventually serves as validation data, and we retain the average of the k results as the final result. This effectively alleviates the problem of model overfitting.
One of the methods most often used to test the accuracy of an algorithm is ten-fold cross-validation. Following the principle described above, we first divide the data set into ten equal parts; each time, nine of them are used as the training set to update the model parameters, and the remaining one is used as the validation set. We repeat this step ten times, so that each part of the data set has a chance to act as the validation set, and finally keep the average of the ten results. There is some theoretical evidence that dividing the data set into 10 segments yields the best error estimates.
In this experiment, we used a ten-fold cross-validation method to evaluate the model. Fig 10 is a detailed illustration of the ten-fold cross-validation method.
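The procedure in Fig 10 can be sketched generically as below. The nearest-centroid classifier and the Gaussian toy data merely stand in for the GLCM+SNN pipeline so that the example is self-contained; function names are ours.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle the n sample indices and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validate(X, y, fit, predict, k=10):
    """Each fold serves once as the validation set, the other k-1 train.
    Returns the mean and standard deviation of the k fold accuracies."""
    scores = []
    for fold in kfold_indices(len(y), k):
        train = np.setdiff1d(np.arange(len(y)), fold)
        model = fit(X[train], y[train])
        scores.append(float((predict(model, X[fold]) == y[fold]).mean()))
    return np.mean(scores), np.std(scores)

# toy nearest-centroid classifier standing in for the real pipeline
def fit(X, y):
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict(model, X):
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(4, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
mean_acc, std_acc = cross_validate(X, y, fit, predict, k=10)
```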

Measurement
In order to verify the practical value of our proposed model, we use sensitivity, specificity, precision, accuracy, MCC, F1-score and FMI as performance measures. To obtain these indicators, we use a visualization tool: the confusion matrix. Its basic principle is to compare the predicted results with the actual results, producing four possible outcomes. A true positive means that the predicted result is positive and the actual label is also positive; conversely, a false positive means that the predicted result is positive but the actual label is negative. A true negative means that the predicted result is negative and the actual label is also negative; conversely, a false negative means that the predicted result is negative but the actual label is positive. The sum of all false negatives and false positives represents all samples that were predicted incorrectly, and the sum of all true positives and true negatives represents all samples that were predicted correctly [33]. In the confusion matrix, each column represents the set of samples predicted to belong to that class, and each row represents the true label of those samples.
More advanced classification indicators can be obtained from the confusion matrix: accuracy, precision, specificity, sensitivity, MCC, F1-score and FMI. For the two-class problem, the samples can be divided according to the combination of their real category and the model's predicted category, so that N = TP + TN + FP + FN.
1. Accuracy: the basic indicator of the model's ability to make correct predictions. We add the number of true positives the model predicts correctly to the number of true negatives it predicts correctly, and divide by the total number of samples; the resulting value measures the model's ability to predict samples correctly.
2. Sensitivity: we want the model to judge as many of the positive samples as possible to be positive [34]. The sensitivity is therefore the ratio of the number of true positives to the total number of positive samples.
3. Specificity: in contrast to sensitivity, specificity represents the model's ability to identify true negatives among the negative samples. The specificity is therefore the number of true negatives divided by the total number of samples actually labeled negative. 4. Precision: the ratio of the number of true positives to the total number of samples predicted to be positive. 5. MCC: the Matthews correlation coefficient, which takes all four entries of the confusion matrix into account and ranges from -1 to 1. 6. F1-score: a common indicator of the performance of a classification model; its maximum value is 1 and its minimum value is 0.
7. The Fowlkes-Mallows index (FMI) [35] is defined as the geometric mean of the pairwise precision and recall.
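All seven indicators can be computed directly from the four confusion-matrix counts. The helper below follows the standard formulas (with positive = COVID-19), which match the descriptions above; the function name and dictionary keys are ours.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, accuracy, F1, MCC and FMI
    from the four confusion-matrix counts (positive class = 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sen = tp / (tp + fn)                                   # sensitivity/recall
    spe = tn / (tn + fp)                                   # specificity
    pre = tp / (tp + fp)                                   # precision
    acc = (tp + tn) / (tp + tn + fp + fn)                  # accuracy
    f1 = 2 * pre * sen / (pre + sen)                       # F1-score
    mcc = (tp * tn - fp * fn) / np.sqrt(                   # Matthews corr.
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = np.sqrt(pre * sen)     # geometric mean of precision and recall
    return dict(sen=sen, spe=spe, pre=pre, acc=acc, f1=f1, mcc=mcc, fmi=fmi)
```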

Statistical Analysis
In this experiment, we cooperated with local institutions and recruited a total of 284 volunteers. Our final data set contains 640 lung CT images, including 320 images of SARS-CoV-2-infected subjects and 320 images of healthy subjects. We then preprocessed these images as described above. During the experiment, the texture features of the processed images are extracted through the generated GLCM, and the extracted features are classified by the Schmidt neural network to obtain the experimental results. In order to evaluate the performance of our proposed model, we obtain the sensitivity, specificity, precision, accuracy, F1-score, MCC and FMI of the model through the confusion matrix. The detailed data are shown in

SNN against Fully-connected layer
We replaced the SNN in our model with a fully connected layer network; the results after ten runs of ten-fold cross-validation show that the proposed model is superior to the replaced model in terms of sensitivity, specificity, precision, accuracy, F1-score, MCC and FMI. Therefore, compared with a traditional neural network, the SNN not only learns faster; in some specific tasks its overall performance is also better. Fig 12 shows the trend of each indicator in detail. The comparison with existing models is given in Table 6, and Fig 13 is a three-dimensional display of the performance of the models. Our proposed model performs best on most of the indicators: in terms of sensitivity it is 1.72% higher than WEBBO, and in terms of specificity it is 0.62% higher than GLCM-SVM. In terms of precision and accuracy, it improves on GLCM-SVM by 1.02% and 1.45%, respectively, and in terms of F1-score and FMI it improves on GLCM-SVM by 2.47% and 1.68%, respectively. Unlike the models above, our proposed model uses the gray-level co-occurrence matrix for feature extraction, which can more comprehensively reflect the direction, adjacent-interval and variation-range information of the image.
We then used the Schmidt neural network to classify the extracted features. Thanks to the random initialization of weights and biases, the network not only improves classification performance compared with traditional feed-forward networks, but also offers a considerable improvement in classification efficiency.

Conclusion
In this study, we introduced a model for COVID-19 detection. We first use the GLCM to extract features from the preprocessed CCT images, and then use the SNN to classify the images based on the extracted features. We divide the images into two categories: CCT images of healthy controls (HC) and CCT images of people infected with COVID-19. To verify the effectiveness of our proposed model, we conducted comparative experiments with several existing methods: RBFNN, ELM-BA, WEBBO and GLCM-SVM. The experimental results show that the proposed model not only outperforms the existing models in terms of specificity, accuracy and precision, but also performs best in terms of stability.
In the field of medical imaging, a lack of data is a common phenomenon, and without sufficient data for training a model cannot achieve the desired effect. In future work, we will collect more CCT images of COVID-19 patients to expand the data set. In addition, we will continue to improve the model, or develop a model with higher accuracy for the diagnosis of COVID-19, to help humanity overcome this difficulty as soon as possible.