AiCNNs (Artificially-integrated Convolutional Neural Networks) for Brain Tumor Prediction

INTRODUCTION: Accurate analysis of brain MRI images is vital for diagnosing brain tumors in their nascent stages, and automated classification of brain tumors is an important step towards accurate diagnosis. OBJECTIVES: This paper proposes a model named Artificially-integrated Convolutional Neural Networks (AiCNNs) that classifies brain MRI scans into 3 classes of brain tumor and a negative diagnosis. METHODS: The AiCNNs model integrates 5 already trained models: two simple convolutional neural networks (one a plain CNN, the other utilizing data augmentation) and three pre-trained networks whose weights are transferred from the ImageNet dataset. RESULTS: The AiCNNs model was trained on 3501 augmented T1-weighted contrast-enhanced MRI (CE-MRI) brain images. It achieved a validation accuracy of 99.49% (loss=0.0303) on a set of 1167 images, outperforming its contemporaries, which reached up to 97.81% (loss=0.1794) and 97.79% (loss=0.1787). CONCLUSION: AiCNNs obtained a test accuracy of 98.89% on a set of 1167 images.


Introduction
The brain is the central organ of the nervous system and is responsible for all activities originating in, or from, the body. When abnormal cells start forming within the brain, the resulting condition is known as a brain tumor. A tumor in this organ can be extremely threatening to a person's life. Brain tumors can be of two types, either cancerous or noncancerous (i.e. malignant or benign) [1]. They can further be divided into primary and secondary tumors. Primary brain tumors start within the brain, while secondary brain tumors spread to the brain from somewhere else. Secondary brain tumors are among the most complicated neurological cancers [2]. The symptoms of a brain tumor may range from severe headaches and seizures to problems with vision and mental changes, depending on the part of the brain involved [3]. Diagnosing brain tumors on the basis of symptoms alone is an arduous task. Even with the advent of CT (computed tomography) and MRI (magnetic resonance imaging), the amount of data that has to be analysed for detection has increased. Many mathematical and image processing techniques have therefore been proposed to counter this problem. These algorithms work mostly on T1, T1-weighted, T2, and FLAIR MRI scans and on CT scans, and may not be accurate enough to detect or classify brain tumors [4]. A brain tumor in a T1-weighted CE-MRI brain image has been depicted in Figure 1.
The rest of the paper is organized as follows. In section 2, the works related to the AiCNNs model are studied. These are then used for comparison analysis in section 3, which compares the performance of state-of-the-art algorithms and most conventional algorithms after extracting global features from the image dataset. Section 4 delineates the proposed AiCNNs model in detail, along with the dataset used and the augmentations done. Section 5 lists the results obtained when the AiCNNs model was trained and compares them, in a table, to some common deep learning models. Finally, the research is concluded by stating some inferences derived from Table 3 about the AiCNNs model and by outlining a possible future scope.

Related Work
The earliest works in MRI image evaluation, processing, and classification date back to November 1999 [32]. This research essentially made it possible for MRI simulators to efficiently create 3D brain images. MRI scans had been in use in medical imaging for a while by then, since their advent in 1971 by Paul Lauterbur [33]. Research like [34] provided a direction for machine learning algorithms in the discipline of medicine. It briefly discussed the significance of decision trees, neural networks, and the Naive Bayes classifier and hence became a trailblazer in the aforementioned domains. Later, in 2003, Fatemizadeh proposed the MGNG (modified neural-gas) algorithm for automated landmark extraction. This was a neural network-based unsupervised algorithm that split and then merged SOMs (Self Organizing Maps) [35], which were used for CT scans. In 2006, Shang used Principal Component Analysis (PCA) and neural networks to register several computed tomography (CT) and magnetic resonance (MR) brain images automatically. This was done by first finding the principal directions using both of the aforementioned algorithms.
In 2009, revolutionary research [36] shaped the application of machine learning in the discipline of brain tumors. It used pattern classification to differentiate between gliomas and metastases while determining the stages of gliomas in brain MRI scans. It used several steps for classification, such as region-of-interest (ROI) (which is essentially the ratio of the area of the intersection to the area of the union), feature extraction, and feature selection. The classification was done using support vector machines on a set of 102 images. The accuracy was found to be 85 % for binary classification of metastases and gliomas. Here, only 4 images of meningiomas were found. However, this model performed badly, as is discussed later. Research aimed at the classification of meningioma tumors was conducted in 2010. It used the optimum channel amongst the RGB color channels of histopathological images to obtain optimum texture combinations. The texture features were extracted using 4 feature extractors, of which 2 were model-based and 2 were statistics-based. These features were then fused in different combinations while excluding correlated features, to reduce redundancy and in turn keep the model from overfitting. A Bayesian classifier then classified whether or not the images represented a meningioma tumor. Finally, a model-based Gaussian Markov random field and a statistics-based run-length matrix texture were used to obtain an accuracy of 92.50 % [37]. Benign and malignant brain tumors, or normal and abnormal images, were classified by El-Dahshan et al. [38]. Although they used just 101 images, they were able to achieve a train and test accuracy of 99 % on this set. Their approach utilized feedback pulse-coupled neural networks, discrete wavelet transforms, principal component analysis, and a feed-forward backpropagation neural network for image segmentation, feature extraction, reducing the dimensionality of the wavelet coefficients, and final classification, respectively. Later, an SVM-based approach for medical diagnosis and automated tumor detection was introduced [39]. This approach was developed keeping in mind that large volumes of MR images cannot be inspected manually. An anisotropic filter was used as a means of preprocessing. Connected-component pixels and edge segmentation were used together for the image segmentation tasks. Further, feature extraction was conducted using global histogram features, while the SVR algorithm analysed these features to give ~95 % accuracy, whereas the feedforward backpropagation neural network (BPNN), gradient descent (GD), and Levenberg-Marquardt (LM) got an accuracy of ~93.15 % on average, 92.21 %, and 94.14 %, respectively. In 2015, Yan Xu et al. [40] proposed a deep convolutional activation feature (CNNs) model for classification and segmentation for the MICCAI 2014 brain tumor digital pathology challenge. The proposed model transferred features from the ImageNet dataset to get 97.5 % classification accuracy and 84 % segmentation accuracy. Later, Jun Cheng et al. [41] performed brain tumor classification using ROI (as described in [36]) on 3 classes (i.e. meningioma, glioma, and pituitary brain tumors). This was performed on the dataset that the present model is trained on. They began by augmenting the tumor region with image dilation and used the result as the ROI instead of the original tumor region. They then split the tumor region into concentric ring-form sub-regions. Three feature extraction methods were used: intensity histogram, bag-of-words (BoW), and gray level co-occurrence matrix, which gave them accuracies of 87.54 %, 89.72 %, and 91.28 %. These works hold vital significance for our research, as will be seen in section 3 and later when the results are discussed.
The authors of [42] proposed deep convolutional neural networks with significant dropouts, leaky rectified linear units, and small convolution kernels for the 2015 Multimodal Brain Tumor Segmentation (BraTS) challenge. They aimed at distinguishing LGG (Low-Grade Gliomas) from HGG (High-Grade Gliomas). In [43], Jocelyn Barker et al. introduced a method to overcome inefficiencies in the computerized analysis of whole-slide pathology images and the misleading of diagnosis algorithms (caused by diverse tissue regions in the whole slide). By analysing coarse regions of the slide images, then extracting localized features of the tiled regions of the slide images, and then reducing their dimensions, they achieved an accuracy of 93.1 % on binary classification between gliomas and glioblastoma multiforme. Later, Liya Zhao [44] introduced a multiscale CNN for the BraTS challenge that was designed to combine information from regions of different sizes around the ROI. This model also combined features from T1, T1-enhanced, T2, and FLAIR MRI images, and it achieved an accuracy of about 90 % (0.9). The research in [45][46][47][48][49] should be referred to for a detailed analysis of other segmentation-related research that utilized deep learning models. Kaur B et al. proposed a Beaming Edge SAlient (BE-SAL) approach for the segmentation of images [50]. In 2017, R. Anjali and S. Priya [51] proposed a model with different stages, such as image pre-processing (or noise removal), enhancement of texture features, feature segmentation, feature selection, and classification. They utilized a hybrid of CART (Classification and Regression Trees) and an ensemble of SVMs (Support Vector Machines) to classify whether or not an MRI scan has a tumor. This hybrid achieved an accuracy of 92.31 %. Later, Ali Ari and Davut Hanbay [52] proposed a model with 3 stages. The first stage cleaned the data of noise, while in the second stage cranial MRI images were classified as benign or malignant using an extreme learning machine with local receptive fields (ELM-LRF). Finally, the tumors were segmented.

Methodology
Various machine learning algorithms have been used for brain tumor classification, as studied in [53] (which used the Random Forest approach and gave an accuracy of 79.67 %), [54] (which used Support Vector Classification and resulted in an accuracy of 94 %), and [55] (which used Naïve Bayes and had a detection rate of up to 90.63 %). These algorithms were tested on a different dataset, which consisted of only 50 brain MRI scans. In this section, a comparative analysis of existing machine learning algorithms has been done for brain tumor classification on the dataset that we used. It compares the accuracies obtained on a dataset combined from 2 different datasets. The first dataset consists of three types of tumors, namely Glioma, Meningioma, and Pituitary [56]. A fourth class depicting negative results has been added to the dataset from [57].
Figure 2.
Global features such as Moments, Haralick Texture Features, and color histograms have then been extracted from the dataset. A comparative analysis of various machine learning algorithms, namely Logistic Regression, LDA, K-nearest neighbours, Classification and Regression Trees (CART), Random Forest, Gaussian Naive Bayes, and Support Vector Machine (SVM), has been done, and the results have been depicted in Table 1. The aim was to determine which algorithm performed well on the aforementioned dataset. Table 1 has been further visualized in Figure 3. It shows that the Random Forest Classifier achieved a maximum accuracy of up to ~86.08 %. On the other hand, SVC (Support Vector Classification) got a maximum accuracy of 50.57 %, only slightly above a simple random classifier, which may give an accuracy of 50 %. Figure 3 represents a comparative analysis of the algorithms using a box plot, which depicts the maximum and minimum accuracy that they achieved. This analysis leads to the conclusion that, instead of using global features, a better classification can be done using the pixel values of the brain MRI scans. Hence, a deep learning-based approach utilizing data augmentation and simple convolutions has been used to classify tumors. The algorithms comprise CNNs and different transfer learning methods such as VGG16, VGG19, Xception, and VGG19 (with ELU). The results are shown in Table 2. This analysis suggests that most of the trained models were subject to overfitting of some kind, except those that used data augmentation (see the "DA used" column). However, the models that used data augmentation gave a lower accuracy. Hence, it can be inferred from Table 2 that the models with data augmentation suffer from low accuracy while avoiding overfitting. Examples of this can be seen in the simple CNN, VGG16, and VGG19, which obtain training accuracies of 90.02 %, 86.62 %, and 81.74 %, with validation and test accuracies of 94.09 % and 94.60 %, 91.95 % and 89.97 %, and 91.26 % and 88.95 %, respectively. On the other hand, all other models have been shown to suffer from overfitting, while they usually achieve high training accuracy. The best example is the Xception transfer-learned model, which achieves a training accuracy of 99.03 %, a validation accuracy of 68.55 %, and a test accuracy of 66.32 %. We observe that model (6) is an exception to the rule (acc_train > acc_validation > acc_test) among the models not using data augmentation. This may be due to the exponential linear unit (ELU) activation function (described later in eq. (5) and Figure 7(e)). A final comparison of all these algorithms with AiCNNs (whose framework is proposed in the next section) is made in Section 5.

Dataset Description
Step i. The Matlab data files had been read through an h5py [58] object.
Step ii. That object had been used to get its Image array file.
Step iii. The Image array file had been converted to a NumPy [59] array.
Step iv. The label had been extracted to a Label object from cjdata/label.
Step v. The Label object had been used to check the type of tumor and save the image to the Meningioma, Glioma, or Pituitary folder using SciPy [60] (i.e. the misc.imsave() function).
[...] Pituitary class. Finally, a set of 111 images for Negative tumors has been extracted from the dataset.
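The extraction steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' script: the dataset path cjdata/image, the label coding (1 = Meningioma, 2 = Glioma, 3 = Pituitary), and the helper names folder_for_label and export_mat_file are assumptions introduced here. Pillow is used for saving in place of misc.imsave(), which is no longer available in recent SciPy releases.

```python
from pathlib import Path

# Assumed label coding of the .mat files (not stated in the text above).
LABEL_TO_FOLDER = {1: "Meningioma", 2: "Glioma", 3: "Pituitary"}


def folder_for_label(label):
    """Map a numeric tumor label to the name of its class folder."""
    try:
        return LABEL_TO_FOLDER[int(label)]
    except KeyError:
        raise ValueError(f"unexpected tumor label: {label!r}")


def export_mat_file(mat_path, out_root):
    """Read one MATLAB v7.3 file and save its image under the class folder."""
    # Third-party imports are local so folder_for_label works without them.
    import h5py
    import numpy as np
    from PIL import Image

    with h5py.File(mat_path, "r") as f:                            # Step i
        # "cjdata/image" is an assumed path; float64 avoids integer overflow.
        image = np.array(f["cjdata/image"], dtype=np.float64)      # Steps ii-iii
        label = int(np.array(f["cjdata/label"]).squeeze())         # Step iv

    # Rescale intensities linearly to 0..255 before writing an 8-bit PNG.
    span = image.max() - image.min()
    if span == 0:
        span = 1.0
    scaled = (255.0 * (image - image.min()) / span).astype(np.uint8)

    out_dir = Path(out_root) / folder_for_label(label)             # Step v
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (Path(mat_path).stem + ".png")
    Image.fromarray(scaled).save(out_path)
    return out_path
```

Calling export_mat_file on every .mat file in the download folder would populate the three class folders; the Negative images come from a separate source and are not covered by this sketch.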

Data Augmentation
To overcome class imbalances in the image datasets, an augmenter module had been implemented. The number of image files in the dataset had been artificially increased by using transformations that did not change the classes to which the images belonged. Each image had been transformed according to the steps mentioned below.
Step i. PIL (Python Imaging Library) [61] had been used to open the files in the class folders and save them as PNG files (as the JPG format doesn't support RGBA channels while PNG does).
Step ii. A pipeline object had been defined using Augmentor [62], which applied the following augmentations to the images.
a. Rotation: rotating an image with a probability of 0.9, with a maximum left rotation of 10° and a maximum right rotation of 15°.
b. Flipping: flipping the image horizontally and vertically with a probability of 0.5 each.
c. Resize: resizing to 512×512 with a probability of 35 %.
Step iii. This had been done until the following numbers of images had been obtained.
a. 1402 images for Meningioma Brain Tumor (after adding 600 images from the augmentor module);
b. 1486 original images for Glioma Tumor;
c. 1536 images for Pituitary Brain Tumor (after adding 600 images from the augmentor module); and
d. 1411 images for Negative (after adding 1300 images from the augmentor module).
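A pipeline with the settings listed in Step ii might look like the sketch below, which also checks the per-class counts of Step iii. The function and table names (build_pipeline, FINAL_AND_ADDED, and so on) are illustrative assumptions; only the probabilities, angles, target size, and image counts come from the text.

```python
# (final image count, images added by the augmentor) per class, from Step iii.
FINAL_AND_ADDED = {
    "Meningioma": (1402, 600),
    "Glioma": (1486, 0),
    "Pituitary": (1536, 600),
    "Negative": (1411, 1300),
}


def original_counts(table=FINAL_AND_ADDED):
    """Images per class before augmentation (final count minus added)."""
    return {name: final - added for name, (final, added) in table.items()}


def total_images(table=FINAL_AND_ADDED):
    """Total number of images after augmentation."""
    return sum(final for final, _ in table.values())


def build_pipeline(class_dir):
    """Augmentor pipeline with the rotation, flip, and resize settings above."""
    import Augmentor  # third-party; imported here so the helpers above run without it

    p = Augmentor.Pipeline(class_dir)
    p.rotate(probability=0.9, max_left_rotation=10, max_right_rotation=15)
    p.flip_left_right(probability=0.5)
    p.flip_top_bottom(probability=0.5)
    p.resize(probability=0.35, width=512, height=512)
    return p
```

For example, build_pipeline("Meningioma").sample(600) would generate 600 additional Meningioma images, and total_images() reproduces the 5835 images reported for the augmented dataset.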
The class distribution for the different classes (those mentioned in Figure 4) after passing the images through the augmentor pipeline has been depicted in Figure 5, given below. The total number of images after data augmentation was 5835, with the class distribution given above. In addition to artificially increasing the size of the data set, data augmentation can make the resulting model more invariant to rotation, reflection, translation, and small noise in the pixel values. After this, each image has been resized to 250×250 pixels and given as input to the input layers Conv2d_16_input, Conv2d_24_input, xception_input, vgg16_input, and vgg19_input of models (1), (2), (5), (7), and (8) from Table 2, respectively.

Figure 5. Uniform Class Distribution of Brain Tumor Images after the dataset has been pipelined to the Augmentor module.

Dropouts
To reduce overfitting of the individual models (1-8), a regularization technique called dropout had been employed at the fully connected layers, after three input layers, as can be seen from Figure 6. Dropout had been set to 0.25 for most of the models, which means that the outputs of one-quarter of the neurons are randomly set to zero while the remaining neurons pass on their inputs multiplied by the weights. This ensures that the neurons in the convolution and ANN layers are mostly independent of each other within the layer in which dropout is introduced.
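As a rough illustration of what a dropout rate of 0.25 does during training, the sketch below zeroes each value with probability 0.25. It uses the "inverted" form, in which the surviving values are rescaled by 1/(1 - rate), which is how the Keras Dropout layer behaves at training time; the function name and the list-based interface are assumptions made for this example only.

```python
import random


def dropout(activations, rate=0.25, rng=None):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Surviving values are divided by `keep` so the expected sum is unchanged.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```

At inference time dropout is switched off and the values pass through unchanged, which corresponds to calling the function with rate=0.0.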

Model Architecture
The final architecture of AiCNNs has been shown in Figure 8. AiCNNs utilizes models (1), (2), (5), (7), and (8) from Table 2, which are described below along with Figures 6(a-e). These models are cascaded and concatenated together as an input to the Artificial Neural Network layer of AiCNNs. A kernel size of 3×3 has been used uniformly, with an input image size of 250×250 and a batch size of 32. Figures 6 and 8 have been generated using Matplotlib [63] and Keras [64].
Ansh Mittal, Deepika Kumar
Figure 6. (b) The Architecture of model (2) of AiCNNs from Table 2. This is a normal CNN model that utilizes Data Augmentation.
Model (1) has been depicted in Figure 6(a). This model has its first convolutional layer (i.e. Conv2d_16) as an input with a kernel size of (3, 3), along with tanh(s) (the hyperbolic tangent) as an activation function for introducing non-linearity. The hyperbolic tangent activation function, as used in [65], is given in equation (1).
\[ \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \tag{1} \]
where s is the output value of pixels from the previous layer of the Convolutional Neural Network for individual images. This function creates an S-shaped curve (as shown in Figure 7(a)) due to its exponential terms (i.e. e^{s} and e^{-s}): it asymptotes at +1 and -1 as the value of s increases or decreases, respectively. This further helps map negative inputs to strongly negative outputs and near-zero inputs to near-zero outputs. Furthermore, this function is differentiable and monotonic, while its derivative is non-monotonic. Here, this function plays an integral role to classify the images into positive and negative classes of brain tumor.
The layer discussed earlier (i.e. Conv2d_16) gives 16 feature maps in all. In the second convolutional layer (i.e. Conv2d_17), the same kernel size has been used, and it gives 32 feature maps of 125×125 when passed through the leaky rectified linear unit as an activation function. The Leaky Rectified Linear Unit (LReLU) activation function [66] is described by equation (2).
\[ \mathrm{LReLU}(s) = \begin{cases} s, & s > 0 \\ \alpha s, & s \le 0 \end{cases} \tag{2} \]
where α had been kept at a value of 0.08 for optimality purposes. This function extends the range of the ReLU function (discussed for the next layer). It is used to counter the dying ReLU problem, in which the model becomes unfit to be trained on the data due to the loss of important features in negative pixels. It and its derivative are both monotonic in nature.
Subsequently, the first max-pooling layer (i.e. max_pooling2d_16) with a 2×2 kernel has been utilized to get 32 feature maps, each of 62×62 pixels. Then, a dropout of 0.25 (in dropout_20) has been applied, i.e. 25 % of the values are randomly set to zero. Similarly, the third convolutional layer (i.e. conv2d_18) gives 64 feature maps of 31×31 pixels each, and the second max-pooling layer (i.e. max_pooling2d_17) with a 2×2 kernel has been utilized to get 64 feature maps of 15×15 pixels. The convolutional layer described here utilizes a ReLU (Rectified Linear Unit) activation function, described in equation (3).
\[ \mathrm{ReLU}(s) = \max(0, s) \tag{3} \]
As can be seen, ReLU has been used to introduce non-linearity in the form of half rectification (from the bottom). ReLU gives a zero output for any value less than 0, while it behaves as an identity function for any value equal to or greater than 0. Both it and its derivative are monotonic functions. Despite this, the fact that all negative values become 0 reduces the ability of the models to train from the data. Hence, it has been used here in conjunction with other activation functions.
Further, a dropout of 0.25 has been introduced again after this layer (in dropout_21). Another convolution-maxpool-dropout block (conv2d_19, max_pooling2d_18, and dropout_22, respectively) with the same configuration as the previous block has been utilized to obtain 128 feature maps of 4×4 pixels each. These are then flattened (in flatten_5) to get 2048 values, which are passed through a fully connected layer of 128 neurons (in dense_9). The last dropout of 0.25 has been introduced at this point, and its output has then been passed through a fully connected or dense layer (dense_10) to get the output, as per the one-hot encoding defined earlier, using the softmax function. The softmax activation function is described mathematically in equation (4).
\[ \sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{4} \]
where i = 1, 2, …, K and z = (z_1, z_2, …, z_K) ∈ ℝ^K. The main reason for using the softmax function is that it gives a probability for each class in multiclass classification. This probability is calculated using the equation above, where the exponential of the value of a specific neuron in the last layer is the numerator and the sum of the exponentials of the values of all neurons in the last layer is the denominator. An image is thus assigned to the class whose corresponding neuron has the highest value. This function is monotonic and differentiable, but its derivative is not monotonic.
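The feature-map sizes quoted for model (1) can be cross-checked with a few lines of arithmetic. The sketch below assumes 'same' padding throughout, a stride of 1 in the first convolution, and a stride of 2 in the remaining three; these strides are not stated in the text and are inferred only because they reproduce the reported sizes (125, 62, 31, 15, 4, and 2048 flattened values). The function names are illustrative.

```python
import math


def conv_same(size, stride):
    """Output side length of a 'same'-padded convolution."""
    return math.ceil(size / stride)


def max_pool(size, pool=2):
    """Output side length of a non-overlapping max-pooling layer (no padding)."""
    return size // pool


def model1_shapes(side=250):
    """Trace (layer name, feature maps, side length) through model (1)."""
    # (layer name, number of feature maps, assumed stride); None marks pooling.
    layers = [
        ("conv2d_16", 16, 1),
        ("conv2d_17", 32, 2),
        ("max_pooling2d_16", 32, None),
        ("conv2d_18", 64, 2),
        ("max_pooling2d_17", 64, None),
        ("conv2d_19", 128, 2),
        ("max_pooling2d_18", 128, None),
    ]
    trace = []
    for name, maps, stride in layers:
        side = max_pool(side) if stride is None else conv_same(side, stride)
        trace.append((name, maps, side))
    return trace
```

The last entry of the trace gives 128 feature maps of 4×4, and 128 × 4 × 4 = 2048 matches the number of values entering dense_9.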
Model (2), shown in Figure 6(b), has the same layout as model (1), except that it utilizes the ImageDataGenerator of the Keras [64] API, with a shear range, width shift range, height shift range, and zoom range of 0.1 and a rotation range of 20°. It horizontally flips the images as well. Due to the aforementioned augmentations, the test and validation images had been rescaled as well. Figures 6(c-e) show the transfer learning models (corresponding to models (5), (7), and (8)) that have been utilized in the AiCNNs model. They are briefly described below.
Model (5) has been depicted in Figure 6(c). Here, the Xception model [67] has been loaded from the keras.applications package, pre-trained on the ImageNet dataset [68][69]. The features obtained from Xception (xception_input in Figure 8) for this dataset (2048 feature maps of 8×8) have been flattened (flatten_11) to get 131072 features, which are then passed through a fully connected or dense layer (dense_32) of 128 neurons and a dropout of 0.25 (in dropout_22). This dense-dropout configuration has been repeated (as dense_33 and dropout_23, respectively). This model utilizes only the rectified linear unit (eq. (3)) as the activation function in the 2 blocks mentioned above. Finally, the output from 64 neurons has been passed through the final fully connected layer (i.e. dense_34) with a softmax activation function (eq. (4)).
Model (7) has been depicted in Figure 6(d). Here, the VGG16 model has been loaded from the keras.applications package, pre-trained on the ImageNet dataset. The features obtained from VGG16 (vgg16_input in Figure 8) for this dataset (512 feature maps of 7×7) have been flattened (flatten_9) to get 25088 features, which are then passed through a fully connected or dense layer (dense_25) of 128 neurons and a dropout of 0.4 (in dropout_16). This model utilizes only a rectified linear unit (eq. (3)) as the activation function in the block mentioned earlier. Finally, the output from 128 neurons has been passed through the final fully connected layer (i.e. dense_26) with a softmax activation function (eq. (4)).
Figure 6. (e) The Architecture of model (8) of AiCNNs from Table 2. This is a transfer-learned VGG19 model. None of these models use data augmentation, as it tends to increase the overfitting, which will be discussed later.

Model (8) has been depicted in Figure 6(e). Here, the VGG19 model has been loaded from the keras.applications package, pre-trained on the ImageNet dataset. The features obtained from the VGG19 model (vgg19_input in Figure 8) for this dataset (512 feature maps of 7×7) have been flattened (flatten_11) to get 25088 features, which are then passed through a fully connected or dense layer (dense_29) of 256 neurons and a dropout of 0.25 (in dropout_18). This dense-dropout configuration has been repeated (as dense_30 and dropout_19, respectively). This model utilizes only a rectified linear unit (eq. (3)) as the activation function in the 2 blocks mentioned above. Finally, the output from 64 neurons has been passed through the final fully connected layer (i.e. dense_31) with a softmax activation function (eq. (4)).
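The three transfer-learned models share one pattern: a pre-trained convolutional base, a flatten step, one or two dense-dropout blocks, and a softmax output. The sketch below illustrates that pattern with tensorflow.keras and checks the flattened sizes quoted above. It is a sketch under assumptions, not the authors' code: freezing the base, placing dropout after each dense layer, and the function and parameter names are all choices made here for illustration.

```python
def flattened_size(feature_maps, side):
    """Number of values after flattening `feature_maps` maps of side x side."""
    return feature_maps * side * side


def build_transfer_model(base_name="VGG16", dense_units=(128,),
                         dropout_rate=0.4, n_classes=4):
    """Pre-trained base + dense/dropout head, as described for models (5), (7), (8)."""
    # Imported here so flattened_size can be used without TensorFlow installed.
    from tensorflow.keras import applications, layers, models

    base_cls = getattr(applications, base_name)  # "VGG16", "VGG19" or "Xception"
    base = base_cls(weights="imagenet", include_top=False,
                    input_shape=(250, 250, 3))
    base.trainable = False  # assumption: the ImageNet weights are kept fixed

    x = layers.Flatten()(base.output)
    for units in dense_units:
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(base.input, outputs)
```

With these helpers, flattened_size(2048, 8) gives the 131072 features reported for Xception and flattened_size(512, 7) gives the 25088 features reported for VGG16 and VGG19.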
Finally, model (6) utilized the exponential linear unit (ELU) as the activation function in all its dense layers, the formula for which is given in equation (5).
\[ \mathrm{ELU}(s) = \begin{cases} s, & s > 0 \\ \alpha \left(e^{s} - 1\right), & s \le 0 \end{cases} \tag{5} \]
where α had been kept at a value of 1, as per the Keras [64] documentation of TensorFlow. ELU is very similar to ReLU, but it saturates smoothly until its output equals -α, whereas ReLU cuts off sharply at zero. ELU also accounts for negative values and hence can be comparatively better than ReLU. It is monotonic and differentiable.
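For reference, the five activation functions named in this section (tanh, leaky ReLU with α = 0.08, ReLU, softmax, and ELU with α = 1) can be written in their standard textbook forms as below. This is a plain-Python sketch for scalar inputs (a list for softmax), intended only to make the definitions concrete; in practice the Keras implementations would be used.

```python
import math


def tanh(s):
    """Hyperbolic tangent; overflows for very large |s| in this naive form."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))


def leaky_relu(s, alpha=0.08):
    """Identity for positive inputs, a small slope alpha for the rest."""
    return s if s > 0 else alpha * s


def relu(s):
    """Zero for negative inputs, identity otherwise."""
    return max(0.0, s)


def softmax(z):
    """Class probabilities from a list of scores."""
    m = max(z)  # subtracting the maximum avoids overflow without changing the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def elu(s, alpha=1.0):
    """Identity for positive inputs, smooth saturation towards -alpha otherwise."""
    return s if s > 0 else alpha * (math.exp(s) - 1.0)
```

For a four-class output such as the one used here, softmax returns four non-negative values that sum to 1, and the predicted class is the index of the largest value.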
The graphs for all 5 of the activation functions mentioned in eq. (1)-(5) have been depicted in Figure 7(a-e). The AiCNNs model, depicted in Figure 8, has been cascaded and concatenated from models (1), (2), (5), (7), and (8) of Table 2. Here, all 5 models discussed above had been loaded using the load_model() function of the keras.models package. The inputs fed to all models remained the same, except for model (2), whose inputs had been scaled by a factor of 1/255.0 so that the pixel values are normalized to the range 0 to 1. The outputs of these models had then been concatenated using the concatenate() function of the keras.layers.merge package. This concatenated layer of outputs had been passed through 2 hidden layers, the first having 32 neurons (dense_1) and the second having 16 neurons (dense_2). Both hidden layers used the ReLU (eq. (3)) activation. Finally, the output layer has 4 neurons, corresponding to the 4 classes discussed earlier. This model is trained with categorical cross-entropy loss and the AdaDelta optimizer. AiCNNs has been inspired by the integrated stacking model [70], which utilizes a similar cascade of classifiers, where the ANN layer is used as the meta-learner or level 1 learner (or model), while the CNNs are used as sub-models or level 0 learners (or models).
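The stacking arrangement described above could be assembled along the following lines with tensorflow.keras. This is a hedged sketch, not the authors' implementation: the function names, freezing the five sub-models, the softmax output activation, and the requirement that the saved sub-models carry distinct names are assumptions; only the layer sizes, the loss, and the optimizer are taken from the text.

```python
def meta_input_width(n_models=5, n_classes=4):
    """Length of the concatenated vector fed to the ANN (meta-learner) layers."""
    return n_models * n_classes


def build_aicnns(model_paths, input_shape=(250, 250, 3), n_classes=4):
    """Concatenate the outputs of saved sub-models and add the ANN layers."""
    # Imported here so meta_input_width can be used without TensorFlow installed.
    from tensorflow.keras import layers, models

    inputs, outputs = [], []
    for path in model_paths:
        sub = models.load_model(path)   # level 0 learner
        sub.trainable = False           # assumption: sub-model weights stay fixed
        inp = layers.Input(shape=input_shape)
        inputs.append(inp)
        outputs.append(sub(inp))

    merged = layers.concatenate(outputs)
    x = layers.Dense(32, activation="relu")(merged)
    x = layers.Dense(16, activation="relu")(x)
    # Softmax on the 4-neuron output layer is an assumption.
    out = layers.Dense(n_classes, activation="softmax")(x)

    model = models.Model(inputs=inputs, outputs=out)
    model.compile(optimizer="adadelta",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because each sub-model has its own input, the same batch of images would be passed five times when fitting, with the copy destined for model (2) divided by 255.0. With five sub-models and four classes, the meta-learner sees a 20-element vector per image.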

Results & Analysis
Initially, the 5 models (i.e. models (1), (2), (5), (7), and (8) of Table 2) have been trained on the same dataset, with only model (2) taking scaled values as input (divided by 255.0 for proper normalization). These were trained on 3501 images, validated on 1167 images, and tested on 1167 images. The training curves for these models have been given below. A detailed comparison of these models, along with models (3), (4), and (6), has been done in Table 3, while their training history has been depicted in Figure 9(a-e) and Figure 10(a-e).
Figure 9(a-e) and Figure 10(a-e). [...] models (1), (2), (5), (7) and (8), resp., for the classification of MRI scans.
The AiCNNs model had been run for 100 epochs with a batch size of 32. This resulted in a training accuracy of 99.20 % and a training loss (categorical cross-entropy) of 0.0306. The validation and testing accuracies were approximately 99.486 % and 98.886 %, respectively. Also, the validation loss (categorical cross-entropy) of the model has been noted to be 0.0303, which indicates that this model doesn't overfit the dataset used.
AiCNNs had been trained on a set of 3501 images, validated on a set of 1167 images, and tested on a set of 1167 images. The validation and training history curves have been displayed in Figure 11 and Figure 12. These curves represent the training history of AiCNNs (not including the 5 models that were trained earlier). The confusion matrices for all the models whose training history was shown in Figure 9(a-e) and Figure 10(a-e) have been given in Figure 13(a-e). We can observe that the simple CNN model (1) inaccurately classifies 14 instances of Meningioma brain tumor, 1 instance of Glioma, and 3 instances of Pituitary brain tumor as Negative. On the other hand, model (2), which utilizes data augmentation, has misclassified only 8 instances of brain tumors as Negative, a much more desirable result than that of the earlier model. All these wrongly classified instances are known as False Negatives (FNs). In real-world scenarios, fewer false negatives are a requirement for a medical system, as a system producing FNs may give a negative result to those suffering from a brain tumor, which can prove fatal.
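The false-negative count used in this comparison is simply the sum of the Negative column over the three tumor rows of a confusion matrix. A small sketch follows; the function name and class ordering are assumptions, and the example matrix fills in only the Negative column reported for model (1) (14, 1, and 3), with every other entry left at zero as a placeholder rather than real data.

```python
CLASSES = ["Meningioma", "Glioma", "Pituitary", "Negative"]


def false_negatives(confusion, negative="Negative", classes=CLASSES):
    """Count tumor scans predicted as negative.

    confusion[i][j] is the number of scans of true class i predicted as class j.
    """
    neg = classes.index(negative)
    return sum(row[neg] for i, row in enumerate(confusion) if i != neg)


# Only the last column of the tumor rows comes from the text for model (1);
# all other entries are zero placeholders, not reported values.
model1_negative_column_only = [
    [0, 0, 0, 14],  # true Meningioma
    [0, 0, 0, 1],   # true Glioma
    [0, 0, 0, 3],   # true Pituitary
    [0, 0, 0, 0],   # true Negative
]
```

For model (1) this gives 14 + 1 + 3 = 18 false negatives, against the 8 reported for model (2).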
Model (5), whose confusion matrix has been plotted alongside those of models (1), (2), (7), and (8), performs the worst of the 5 models. This has been attributed to its low accuracy and high overfitting of the data. This model also has poor validation accuracy and loss, as can be seen in Figure 9(c) and Figure 10(c). The real-life value and performance of models (7) and (8) lie between those of models (1) and (2), as they classify 9 and 11 MRI scans (20 in total) as FNs. So, these confusion matrices have been used for cross-checking the test accuracies (acc_test) that had been described in Table 2 and are finally stated in Table 3. Figure 14 depicts the confusion matrix of AiCNNs (whose architecture is defined in Figure 8 and whose training history, with accuracy and categorical cross-entropy loss, is depicted in Figure 11 and Figure 12). AiCNNs has reduced the FN instances (for all 3 types of brain tumor) to 3, the best among all the models discussed and worked upon so far. It also has fewer misclassified points and better validation accuracy and validation loss than models (1), (2), (5), (7), and (8) (discussed earlier in Table 2), as discussed with reference to Table 3. A detailed comparison of all the implemented models has been done in Table 3. Note that AiCNNs has been one of the best models among all those mentioned here. This is because the 5 models compute the values given to the concatenation layer, which is a vital part of AiCNNs, as discussed in section 4.
The calculation of features by the 5 models (namely models (1), (2), (5), (7), and (8)) makes AiCNNs more robust. If the model were fine-tuned, it might work even better than it currently does. A quick inspection of Table 3 shows that model (5) (i.e. the Xception model) converges the fastest on the dataset described in section 4, but it provides the lowest validation and test accuracy. It can also be observed that only AiCNNs utilizes data augmentation partially, owing to its cascading nature, in which model (2) (which uses data augmentation) has been used together with models (1), (5), (7), and (8) (which don't use data augmentation). Model (8) (i.e. the VGG19 transfer-learned model) has achieved the maximum training accuracy (ACC_TRAIN), while the minimum training loss (LOSS_TRAIN) has been observed with model (7) (i.e. the VGG16 transfer-learned model). AiCNNs has been observed to achieve the best ACC_VALID, LOSS_VALID, and ACC_TEST. Four models, namely model (2), model (3), model (4), and AiCNNs, are the only ones that don't follow the general trend of (ACC_TRAIN > ACC_VALID) and (LOSS_VALID > LOSS_TRAIN). This indicates that these models have been more robust to overfitting when trained on the combined dataset. It is important to note that the introduction of only a single model robust to overfitting (i.e. model (2)) has made AiCNNs robust to it as well.
[...] (8) (VGG19 model)) as discussed earlier in section 4 and depicted in Figures 6 and 8. All these models had been trained with a 3:1:1 ratio for the train, validation, and test sets of images (with 5835 being the total number of images), which is similar to AiCNNs as discussed above. It used 4 types of activation function, which are best described by eq. (1)-(4) and Figure 7(a-d). It has also been observed that the introduction of only one data-augmented model (i.e. model (2)) resulted in the model becoming robust to overfitting.
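The 3:1:1 ratio and the reported set sizes are consistent with each other, as the short check below shows. The helper name and the choice to give any remainder to the training set are assumptions made for illustration; with 5835 images the division is exact, so no remainder arises.

```python
def split_sizes(total, ratio=(3, 1, 1)):
    """Split `total` items by an integer ratio; any remainder goes to the first part."""
    parts = sum(ratio)
    sizes = [total * r // parts for r in ratio]
    sizes[0] += total - sum(sizes)
    return tuple(sizes)
```

Calling split_sizes(5835) yields 3501 training, 1167 validation, and 1167 test images, the figures used throughout this section.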

Discussions & Conclusion
This model achieves the highest testing accuracy (ACCTEST) and validation accuracy (ACCVALID) and the least validation loss (LOSSVALID) compared to the models mentioned earlier. This has been ascribed to the fact that the 5 models first have to calculate 4 vectors by a 5-way cascaded process, and these are then used to train the hidden layer (ANN layer) weights.
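The cascaded process mentioned above can be illustrated with a minimal sketch. It rests on several assumptions that are not confirmed by the text: each of the 5 base models is taken to output one 4-class probability vector, the concatenation layer simply joins these into 20 features, and the ANN stage is collapsed into a single dense layer followed by softmax. All names, the random weights and the random base outputs are hypothetical stand-ins, not the trained AiCNNs.

```python
import math
import random

NUM_CLASSES = 4  # assumed: meningioma, glioma, pituitary, negative
NUM_MODELS = 5   # models (1), (2), (5), (7) and (8)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def concatenate(vectors):
    """Join the per-model output vectors into one flat feature list."""
    return [x for vec in vectors for x in vec]


def dense(inputs, weights, biases):
    """One fully connected layer: weights has one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def ensemble_predict(base_outputs, weights, biases):
    """Concatenate base-model outputs, then apply dense layer + softmax."""
    features = concatenate(base_outputs)  # 5 * 4 = 20 features
    return softmax(dense(features, weights, biases))


rng = random.Random(0)
# Stand-ins for the outputs of the 5 frozen base models on one scan.
base_outputs = [
    softmax([rng.uniform(-1, 1) for _ in range(NUM_CLASSES)])
    for _ in range(NUM_MODELS)
]
# Untrained placeholder weights for the ANN layer.
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(NUM_CLASSES * NUM_MODELS)]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

probabilities = ensemble_predict(base_outputs, weights, biases)
print(probabilities)
```

In this arrangement only `weights` and `biases` would be trained, while the base models stay fixed, which is one way to read the statement that the base-model outputs are used to train the ANN layer weights.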
This work can further be extended through the utilization of all the models. Some more areas where this work can be extended are described below. AiCNNs used convolution functions that were designed through a manual trial-and-error method, which is not an efficient means of defining a good classification model. Genetic algorithms may be utilized to help design an optimal CNN for classification purposes [71][72]. Along with this, the generation of ANN layers can also be conducted through genetic algorithms [73][74] and evolutionary strategies [75] such as CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [76][77] and PEPG (Parameter-Exploring Policy Gradients) [78][79], which need further investigation. The AiCNNs model can further be extended to include more types of brain tumors, such as Neurofibroma and Osteoma. It can also be extended to certain subtypes of Meningioma, Glioma and Pituitary tumors, such as Convexity meningioma and Skull base meningioma, Astrocytoma and Brain stem glioma, and Craniopharyngioma and Pituitary adenoma [80], respectively.

Figure 3. Comparative Analysis (using Boxplot) of different machine learning algorithms for classification.

Figure 4. Initial Class Distribution of Brain Tumor Images

AiCNNs (Artificially-integrated Convolutional Neural Networks) for Brain Tumor Prediction. EAI Endorsed Transactions on Pervasive Health and Technology, 10 2018 - 02 2019 | Volume 5 | Issue 17 | e5

Figure 6(a) The Architecture of model (1) of AiCNNs from Table 2. This is a normal CNN model that doesn't utilize Data Augmentation. (b) The Architecture of model (2) of AiCNNs from Table
classify the images into positive and negative classes of brain tumor.

Figure 6(c) The Architecture of model (5) of AiCNNs (i.e. Artificially integrated Convolution Neural Networks) from Table 2. This is a transfer-learned Xception model. (d) The Architecture of model (7) of AiCNNs (i.e. Artificially integrated Convolution Neural Networks) from Table 2. This is a transfer-learned VGG16 model. (e) The Architecture of model (8) of AiCNNs (i.e. Artificially integrated Convolution Neural Networks) from Table 2. This is a transfer-learned VGG19 model. None of these models use data augmentation, as it tends to increase overfitting, which will be discussed later.

Figure 7. (a) A representation of the hyperbolic tangent activation function; (b) Leaky Rectified Linear Unit activation function; (c) A representation of the Exponential Linear Unit activation function; (d) A representation of the Rectified Linear Unit activation function; and (e) A representation of the Softmax function.

Figure 11. Validation (in orange) & Training Accuracy (in blue) during the training of AiCNNs for the classification of MRI scans.

Figure 12. Validation (in orange) & Training Loss (in blue) during the training of AiCNNs for classification of MRI scans.

The confusion matrices for all the models whose training history has been shown earlier in figure 9 (a-e) and figure 10 (a-e) are given in figure 13 (a-e). We can observe that the simple CNN model (1) incorrectly classifies 14 instances of Meningioma, 1 instance of Glioma and 3 instances of Pituitary brain tumor as negative. On the other hand, model (2), which utilizes data augmentation, has misclassified only 8 instances of brain tumors as negative, which is a much more desirable result than that of the earlier model. All these wrongly classified instances are known as False Negatives (FNs). In real-world scenarios, a low number of false negatives is a requirement for a medical system, as a system producing FNs may give a negative result to those suffering from a brain tumor, which can prove fatal.
The model introduced in this research, AiCNNs, achieved a training accuracy (ACCTRAIN) of 99.20 % and a training loss (i.e. categorical cross-entropy loss) (LOSSTRAIN) of 0.0306 on a set of 3501 images. It was robust to overfitting and achieved a validation accuracy (ACCVALID) of 99.49 % and a validation loss (i.e. categorical cross-entropy loss) (LOSSVALID) of 0.0303 on a set of 1167 images. The model was tested on a further 1167 images and achieved an accuracy (ACCTEST) of 98.89 %, as can be seen from figure 14. It used a cascaded ensemble of 5 models (i.e. model (1) (a simple CNN), model (2) (CNN with data augmentation), model (5) (Xception model), model (7) (VGG16 model) and model (

Table 1. Comparative Analysis of various machine learning algorithms

Table 2. Comparative Analysis of different CNN models.

Table 3. A holistic comparison of different CNN models for the classification of different types of tumors. These algorithms are compared on the basis of whether data augmentation was used, training accuracy (ACCTRAIN), training loss (LOSSTRAIN), validation accuracy (ACCVALID), validation loss (LOSSVALID), and test accuracy (ACCTEST).
‡ It partially uses data augmentation, as Model (2) (which also utilizes data augmentation) has been cascaded into it.