Cervix Image Classification for Prognosis of Cervical Cancer using Deep Neural Network with Transfer Learning


Sanjeev Dhawan, Kulvinder Singh and Mamta Arora

A report published by the World Health Organization states that about 10 million new cancer cases are reported worldwide per year, a figure expected to double to 20 million by the year 2020 [1]. This figure can be reduced by a quarter with proper screening and awareness campaigns. It is also reported that a quarter of cancers are caused by infections, including hepatitis B, which is linked to liver cancer, and the human papillomavirus, which is linked to cancer of the cervix. The most important fact about cervical cancer is that it is 100% curable and takes decades to develop. It can therefore be prevented by timely screening and even cured with proper treatment. The effectiveness of treatment depends strongly on the anatomical type of the patient's cervix. Under the current classification, the cervix is divided into three types (Type 1, Type 2, and Type 3), which can be distinguished visually using colposcopy [2]. An exact determination of a patient's cervix type is critical. If a physician mistakenly identifies a different type of cervix and prescribes treatment accordingly, the chance that malignant tissue remains after treatment increases, which can become life-threatening [2]. The treatment proves effective if the right type of cervix is identified at an early stage. According to cervical cancer statistics [3], early identification of cervical cancer can increase the 5-year survival rate to 66%. The 5-year survival rate indicates how many women are alive at least 5 years after the cancer is found. The commonly used cervical cancer screening methods are the Pap test, HPV testing, colposcopy, and digital cervicography. These are effective methods of screening but are time-consuming and also suffer from low sensitivity in detecting CIN2/3+.
Moreover, these tests require expert personnel and a laboratory setup. All of the above-mentioned screening tests require human intervention and are therefore prone to human error. Deep learning and computer vision have proven effective in the health care domain for the classification of medical images. To aid health care providers, we propose an algorithm using deep learning and transfer learning that classifies cervix images into three classes. In this work, we present a novel deep learning-based predictive model that diagnoses the stage of cervical cancer by classifying cervix images into one of three classes (Type 1, Type 2, or Type 3). The proposed model is trained and evaluated using cervigram images from the competition hosted by Intel & MobileODT on Kaggle [4]. The major contribution of the proposed work is a predictive model for cervical cell recognition and classification with an accuracy of 97.1%. Use of the predictive model will help healthcare providers deliver timely and cost-effective results.

Related Work
With the advent of artificial intelligence technology, deep learning is gradually becoming a state-of-the-art technique for solving problems in the medical domain. Recently published studies demonstrate its applicability in identifying the stage of various types of cancer. The traditional method widely used in the early diagnosis of cervical cancer is cytology-based screening [5]. Its main limitation is that it requires domain expertise and is time-consuming [6]. Serving a large population therefore calls for computer-based algorithms. In 2015, Song et al. published a paper proposing an algorithm based on a "multiscale convolution neural network (MSCN) and graph-partitioning-based algorithm" [7]. The proposed network was used for the segmentation of nuclei and cytoplasm, whereas the graph-based method was used for enhancing the segmentation results. In 2016, Mithlesh et al. published a paper [8] demonstrating an approach that uses multiple and overlapped cells from Pap smear images. The Pap smear images were in JPEG format and were collected from Jaipur pathology labs. In 2017, Song et al. extended their work by introducing a new approach, namely an overlapping constraint level set, for refining the boundary of overlapped cells [9].
In 2020, Anshu Malhotra and Rajni Jindal proposed a multimodal deep learning-based framework [10] that detects depression and suicidal behaviour of users by analysing their social media posts. The input to this model is any social post, which can contain text, images, video, or emotions. Each type of message is extracted and interpreted using a different deep learning modality. For example, features from images are extracted using the VGG16 network, while a Region Convolutional Neural Network (RCNN) can be used for extracting features from videos. Finally, the combined weighted score is used for predicting user behaviour. In [11], M. Sharma and N. Romero presented "Future prospective of soft computing techniques to diagnose psychiatric disorder". They also emphasize using various deep learning techniques for better results and fast processing. Bibhuprasad Sahu et al. [12] performed an experimental investigation on the Wisconsin Breast Cancer Dataset obtained from the UCI Repository of Machine Learning Databases. They proposed a hybrid model using an artificial neural network (ANN) and principal component analysis (PCA), which gives more promising results than other machine learning models. PCA is used for minimizing the noise in the dataset, and the ANN is used for the classification task. Their analysis reports 97% accuracy using the hybrid model. In [13], Manik Sharma et al. presented a comprehensive review of the diagnosis of cancer and diabetes using five different insect-based techniques: Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Glow-Worm Swarm Optimization (GSO), Firefly Algorithm (FA), and Ant Lion Optimization (ALO). Their study reveals that higher predictive accuracy can be obtained using a neural network with the ACO optimization technique.
Loh and Then [37] proposed another methodology for cardiac diagnosis in rural areas and discussed the advantages, issues, and solutions for implementing deep learning in this area. Dai and Wang [38] proposed a framework for medical applications that uses artificial intelligence techniques to reduce the burden on healthcare providers. Their work demonstrates that pattern recognition and deep learning algorithms are sufficient to diagnose health. Their experiment proved most efficient as the amount of statistical data increased. For the experiment, they used a dataset consisting of a health state representation space of 9 body constitutional types.
The survey of recently published studies justifies the use of deep learning techniques not only for cancer prediction but also for other chronic diseases. After analysing the results and architectures discussed in previously published studies, the best-suited algorithms were chosen for this research so that a reliable deep learning model can be developed to predict cervical cancer patients. Table 1 summarizes recent publications that are based on machine learning and deep learning models. The major attributes of the table include reference, application, modality, and technique used during the experiment.

Dataset
The image set provided on Kaggle consists of a total of 5278 labelled images, with labels Type I/II/III, and 512 test images without any label. The labelled images are further divided into two sets, namely the train set and the additional set. The images in the train set are of high quality, as shown in figure 1, whereas the images in the additional set are redundant and of low quality, as shown in figure 2.
Experiments carried out on the entire image set take longer to converge and also result in high validation loss. Therefore, we selected the images from the main data set and split them into a train set and a test set in the ratio of 8:2. The experimental work demonstrated in this paper is carried out on the images available in the train set. These images are in JPEG format with an aspect ratio of 3:4. Most of the images are 2448×3264 or 3096×4128 pixels, as shown in figure 4. The images are uniformly resized to 224×224 pixels.

Network Architecture and implementation
The proposed architecture is shown in figure 3. It takes the set of cervix images as input for the training stage. Because the images in the training set are of different sizes, they are cropped randomly to 224×224 pixels. The entire dataset is then split into two portions, namely the train set and the test set. The images from the train set are used for tuning the hyperparameters of the proposed model, after which the model is evaluated on the test set. The detailed workflow is given in the form of the algorithm below.
Step 1: All the cervix images are cropped into 224x224 pixels for faster processing.
Step 2: The dataset is then split into training and test set. The test set is further split into validation and test set.
Step 4: The model performance is evaluated using accuracy, recall, precision, and F1-score.

Data Pre-processing
The images available in the main set are of different sizes, as shown in figure 4. The x-axis indicates the size of the images in each type, whereas the y-axis indicates the count of images. The figure shows that the largest number of images are of size 2448×3264 pixels. For the sake of fast processing and uniformity, all the images are resized to 224×224 pixels. These images are pre-processed using the preprocess_input() function, which is available in the Keras API, a Python library.
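As a rough illustration of what this pre-processing step does, the sketch below re-implements in NumPy the two scaling conventions that, to our understanding, the Keras preprocess_input() functions apply: scaling to [-1, 1] for Inception v3, and RGB-to-BGR conversion with ImageNet mean subtraction for VGG19 and ResNet50. The function names and the random stand-in images are assumptions made for illustration; an actual pipeline would call the Keras functions directly.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the "caffe"-style
# pre-processing of VGG and ResNet models.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_inception(batch):
    """Scale uint8 RGB pixels from [0, 255] to [-1, 1] (Inception-style)."""
    return batch.astype(np.float32) / 127.5 - 1.0


def preprocess_vgg_resnet(batch):
    """Convert RGB to BGR and subtract the ImageNet channel means."""
    bgr = batch.astype(np.float32)[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN


# Hypothetical stand-in for a batch of two resized 224 x 224 RGB cervigrams.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)

x_inception = preprocess_inception(images)
x_vgg = preprocess_vgg_resnet(images)
print(x_inception.min(), x_inception.max())
print(x_vgg.shape)
```

Which convention applies depends on the pre-trained network that receives the images, so each model should be paired with its own pre-processing function.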

Dataset split
The proposed model is tested and evaluated using three split ratios. The images are split into three sets, namely a training set, a validation set, and a testing set. The three ratios used for splitting are:
• 60% train, 20% validation, and 20% test
• 70% train, 15% validation, and 15% test
• 80% train, 10% validation, and 10% test
The dataset is split into the three sets using the following Python script:
    from sklearn.model_selection import train_test_split
    # First split: training data versus everything else.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1 - train_ratio)
    # Second split: divide the remainder into validation and test sets.
    X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size=test_ratio / (test_ratio + validation_ratio))
The data is split using the train_test_split() method of the sklearn library. The entire dataset is first divided into train and test sets. The test set is then divided into test and validation sets.
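The arithmetic of the two-stage split can be checked with a small dependency-free sketch. It mirrors the logic of the script above with plain index lists rather than sklearn; the function name three_way_split, the sample count of 1000, and the seed are hypothetical choices for illustration only.

```python
import random


def three_way_split(n_samples, train_ratio, val_ratio, test_ratio, seed=42):
    """Return shuffled index lists for the train, validation, and test sets."""
    assert abs(train_ratio + val_ratio + test_ratio - 1.0) < 1e-9
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Stage 1: hold out everything that is not training data.
    n_train = round(n_samples * train_ratio)
    train, held_out = indices[:n_train], indices[n_train:]

    # Stage 2: divide the held-out part between validation and test,
    # giving the test set the fraction test / (test + validation) of it.
    test_share = test_ratio / (test_ratio + val_ratio)
    n_test = round(len(held_out) * test_share)
    test, val = held_out[:n_test], held_out[n_test:]
    return train, val, test


for ratios in [(0.6, 0.2, 0.2), (0.7, 0.15, 0.15), (0.8, 0.1, 0.1)]:
    train, val, test = three_way_split(1000, *ratios)
    print(ratios, len(train), len(val), len(test))
```

Because validation and test ratios are equal in all three configurations, the second stage always divides the held-out portion in half.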

Deep Learning Pre-Trained Models
The most widely used deep learning network for image classification is the convolutional neural network (ConvNet). Like an ordinary neural network, a ConvNet consists of an input layer, hidden layers, and an output layer; stacking many hidden layers makes it a deep neural network. The input to the input layer is the raw pixel values of an image, whereas the output layer has one neuron for each output class. For example, in the cervix classification problem, the input is the cervigram image and the output is the probability that the image belongs to Type I, II, or III. The last layer of the network is a fully connected layer with a SoftMax activation. Because the number of images in the available dataset is not enough to give good results, our proposed model uses the weights of pre-trained models. It uses the three pre-trained models listed below:
• Inception v3: Inception v3 is a widely used image classification model. It attains an accuracy of 78.1% on the ImageNet dataset. It is based on the original paper "Rethinking the Inception Architecture for Computer Vision" by Szegedy et al. [23].
• VGG19: This is another very popular deep learning model for image classification, named after the Visual Geometry Group. It has several variants, such as VGG16 and VGG19. VGG19 has 19 weight layers (16 convolution layers and 3 fully connected layers), together with 5 max-pool layers and 1 SoftMax layer.
• ResNet50: ResNet50 is based on the residual learning framework, which makes the training of deeper networks easier. In 2015, ResNet won first place in the ImageNet [24] classification challenge with a top-5 error rate of 3.57%. Top-5 accuracy means that the true class is among the five classes to which the model assigns the highest probabilities.
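To make the notion of top-5 accuracy concrete, the following minimal sketch counts a prediction as correct whenever the true label is among the k most probable classes. The function name top_k_accuracy and the small probability table are hypothetical and serve only as an illustration.

```python
def top_k_accuracy(probabilities, labels, k=5):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical predictions over six classes for three images.
probabilities = [
    [0.05, 0.40, 0.25, 0.15, 0.10, 0.05],  # true class 1 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5 is ranked last
    [0.10, 0.15, 0.35, 0.20, 0.12, 0.08],  # true class 3 is ranked second
]
labels = [1, 5, 3]

print(top_k_accuracy(probabilities, labels, k=1))
print(top_k_accuracy(probabilities, labels, k=5))
```

With only three output classes, as in the cervix problem, top-5 accuracy is not informative; the measure matters for the 1000-class ImageNet benchmark on which the pre-trained models are compared.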

Transfer Learning
Transfer learning is another approach commonly used in building convolutional neural networks. In this approach, the weights of pre-trained models are reused for creating new models. In this experiment, the activations of three pre-trained models, namely Inception v3, VGG19, and ResNet50, are used. These models were previously trained on the ImageNet dataset, which consists of over 14 million images.
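The idea of reusing pre-trained weights can be illustrated with a deliberately small NumPy sketch. It is not the Inception v3, VGG19, or ResNet50 pipeline used in the experiments: a fixed random projection stands in for the pre-trained convolutional base, and only a new three-class SoftMax head is trained on top of its frozen features. All names, sizes, the learning rate, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_inputs, n_features, n_classes = 120, 16, 8, 3

# Stand-in for a pre-trained base: its weights are fixed and never updated.
W_base = rng.normal(size=(n_inputs, n_features)) / np.sqrt(n_inputs)
W_base_before = W_base.copy()

# Synthetic inputs and labels for the new three-class task.
X = rng.normal(size=(n_samples, n_inputs))
y = rng.integers(0, n_classes, size=n_samples)
Y = np.eye(n_classes)[y]  # one-hot labels


def base_features(x):
    """Forward pass through the frozen base (linear map followed by ReLU)."""
    return np.maximum(x @ W_base, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, onehot):
    return float(-np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1)))


# New classification head: the only trainable parameters.
W_head = np.zeros((n_features, n_classes))
b_head = np.zeros(n_classes)

F = base_features(X)  # computed once, because the base is frozen
initial_loss = cross_entropy(softmax(F @ W_head + b_head), Y)

for _ in range(200):
    P = softmax(F @ W_head + b_head)
    grad_logits = (P - Y) / n_samples
    W_head -= 0.1 * (F.T @ grad_logits)
    b_head -= 0.1 * grad_logits.sum(axis=0)

final_loss = cross_entropy(softmax(F @ W_head + b_head), Y)
print(initial_loss, final_loss)
```

Freezing the base keeps the number of trainable parameters small, which is the main reason transfer learning suits a dataset of only a few thousand images.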

Evaluation Metrics
The proposed model is then evaluated on the test data set. The parameters used for performance evaluation are accuracy, precision, recall, and F1-score. Accuracy is considered the prime performance metric. However, accuracy alone does not give the real picture of model performance, so recall, precision, and F1-score are also considered.
• Accuracy: It evaluates the overall performance of the model, that is, the share of all predictions that are correct.
• Recall: It is the ratio of correctly diagnosed cancer patients to all patients who actually have cancer. For example, 90% recall means that 90% of the cancer patients are correctly diagnosed with cancer.
• Precision: It is the ratio of correctly diagnosed cancer patients to all patients whom the model diagnoses as having cancer.
• F1-Score: It is the harmonic mean of precision and recall, and is considered a better performance indicator than the regular accuracy measure.
The formulas used for calculating the performance are given in table 3. The terms TP, TN, FP, and FN refer to true positive, true negative, false positive, and false negative, respectively.
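The four metrics can be computed from the TP, TN, FP, and FN counts as in the minimal sketch below, which uses the standard definitions. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 of 100 cancer patients detected, 30 false alarms.
metrics = classification_metrics(tp=90, tn=870, fp=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example accuracy is high even though a quarter of the positive diagnoses are false alarms, which is why precision, recall, and F1-score are reported alongside accuracy. For the three-class cervix problem, the same counts would be computed per class in a one-versus-rest fashion.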

Experimental Result and Discussion
We trained and evaluated the proposed model for the cervix image classification problem. Several optimizers are available, such as Adam, SGD, and RMSprop; the proposed model uses the Adam optimizer. Adam is an extension of stochastic gradient descent [26] and is widely used in computer vision applications. The learning rate controls the size of the learning steps, and its value decides how fast or how slow the model converges. The attributes beta_1 and beta_2 represent the exponential decay rates for the first- and second-moment estimates, respectively. Epsilon is a very small number used to prevent division by zero in the implementation. The batch_size is set to 32, so training proceeds in batches of 32 images. The accuracy and loss reported during the training of the various models are depicted in figure 5 and figure 6, respectively. The highest training accuracy, 97% with a loss of 0.11, is reported by Inception v3 (reduced 8 modules). Similarly, the accuracy and loss reported on the test set for the various models are depicted in figure 7 and figure 8, respectively. The highest test accuracy, 86% with a loss of 0.5, is reported by Inception v3 (reduced 6 modules).
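The role of the learning rate, beta_1, beta_2, and epsilon can be seen in a minimal sketch of the Adam update rule applied to a one-dimensional toy problem. The default values shown are those commonly quoted for Adam and are not necessarily the settings used in our experiments; the function name, the toy objective, and the learning rate of 0.1 are assumptions for illustration.

```python
import math


def adam_minimize(grad_fn, theta, learning_rate=0.001, beta_1=0.9,
                  beta_2=0.999, epsilon=1e-8, steps=1000):
    """Minimize a scalar function of one variable with the Adam update rule."""
    m, v = 0.0, 0.0  # first- and second-moment estimates
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta_1 * m + (1 - beta_1) * g       # decaying mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g   # decaying mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)           # bias correction
        v_hat = v / (1 - beta_2 ** t)
        # epsilon keeps the denominator away from zero
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2 * (th - 3.0), theta=0.0,
                           learning_rate=0.1, steps=500)
print(theta_star)
```

In a deep learning framework the same update is applied element-wise to every trainable weight, with the gradient estimated from one batch of images at a time.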

Saliency Map Visualization of the cervix classification model
In the deep learning era, Saliency maps were first presented in the paper "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" [21]. These maps play a very important role in image classification problems. For example, the images available in the data set contain a transformation zone as well as a speculum. Using the map, it can be observed whether the neural network focuses on an important feature such as the transformation zone rather than on the speculum. This section focuses on the steps to create a Saliency map for learning algorithms such as convolutional neural networks. The model takes the colours, intensity, and orientations of the image as feature input. Using these features, the model produces the visualization of Saliency maps. It works on the winner-takes-all principle for plotting the Saliency map.
(i) The three features, namely color, intensity, and orientation, are extracted from the input images.
(ii) The color feature is transformed to red-green-blue-yellow space, and the intensity feature is converted to grayscale space.
(iii) The orientation feature is converted using Gabor filters at four angles.
(iv) All of these processed images are used to create Gaussian pyramids, from which feature maps are created.
(v) The feature maps are created for each of the three features. The Saliency map is the mean of all the feature maps.
We plot the Saliency map to visualize the neural activation for convolution and activation layers. The Saliency map tells us the degree to which each pixel in the image affects the classification score for that image [27]. The Saliency map for convolution and activation layers is shown in figure 9(a) and 9(b) respectively.
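The statement that a Saliency map measures how strongly each pixel affects the classification score can be sketched as follows. This is a gradient-style illustration rather than the feature-based procedure listed above, and it assumes a toy linear class score in place of the trained network; the gradient is approximated by central differences, and the maximum magnitude over the colour channels is kept for each pixel. All names and sizes are hypothetical.

```python
import numpy as np


def saliency_map(score_fn, image, eps=1e-3):
    """Approximate |d score / d pixel| by central differences, then take the
    maximum over colour channels to obtain one value per pixel."""
    grad = np.zeros_like(image, dtype=np.float64)
    for idx in np.ndindex(image.shape):
        plus, minus = image.copy(), image.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (score_fn(plus) - score_fn(minus)) / (2 * eps)
    return np.abs(grad).max(axis=-1)


# Toy stand-in for a trained network: a linear class score over a 4 x 4 RGB image.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4, 3))
image = rng.random((4, 4, 3))


def class_score(x):
    return float(np.sum(weights * x))


saliency = saliency_map(class_score, image)
print(saliency.shape)
```

For a real ConvNet the gradient would be obtained by backpropagation in the deep learning framework instead of finite differences, which would be far too slow for 224×224 images.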

Comparisons with Existing Methodologies
After comparing the performance of various pre-trained deep neural networks, we found that the best performance is achieved using Inception v3. This model is therefore compared with other architectures from the literature, as shown in figure 10. All of the compared experiments use image datasets. C. Asawa et al. [28] used the Intel & MobileODT dataset from Kaggle and achieved their best performance using a convolutional neural network with transfer learning from a pre-trained ResNet network. J. Payette et al. [29] used the same Kaggle dataset and obtained an accuracy of 58.8%. K. Fernandes et al. [30] achieved an AUC of 68.75 using supervised machine learning and deep learning methods. Z. Alyafeai [31] used two pre-trained deep learning networks: the first model identifies the cervix region, while the other classifies the cervix tumour. The model attains a detection accuracy of 0.68 in terms of the intersection over union (IoU) measure. T. Chen et al. [32] used multiple modalities for the diagnosis of cervical dysplasia and attained an accuracy of 87.4% (88.6% sensitivity and 86.1% specificity). P. Guo et al. [33] worked on 30,000 smartphone-captured images and used an ensemble of deep learning models to classify images as cervix or non-cervix, achieving an accuracy of 91.6%. It can be observed from the above analysis that our proposed model performs well in comparison with existing methodologies, as it achieves an accuracy of 97.1%.

Conclusion
The work presented in this paper demonstrates the development and implementation of various deep learning algorithms to address the problem of cervix classification. The algorithm used for the experiment is based on the transfer learning approach. The performance of the model improves progressively as the values of various parameters are tuned.
Based on the success of our methodology, we found that transfer learning yields good outcomes, as this was a fine-grained discrimination task without much data. In this paper, we also presented a visualization technique for deep classification ConvNets that computes an image-specific class Saliency map, highlighting the areas of the given image that are discriminative for the given class. In our future research, we plan to incorporate image-specific Saliency maps into learning formulations in a more principled manner.