Deep Medical Image Reconstruction with Autoencoders using Deep Boltzmann Machine Training

INTRODUCTION: Deep learning-based image compression has achieved promising results in recent years compared with traditional transform coding. The autoencoder, an unsupervised learning algorithm whose target output is the same as its input, is considered in this research work for effective medical image reconstruction. OBJECTIVES: Medical data needs to be reconstructed without distorting the details it contains. An autoencoder achieves this with a deep neural network that accepts the data, passes it through several layers, and reconstructs it. METHODS: A deep autoencoder is implemented because it is well suited to reducing high-dimensional data. Layer-by-layer pretraining is performed with a Deep Boltzmann Machine, which relies on approximate inference. RESULTS: The proposed method proves to be efficient when compared with other autoencoders such as Deep Autoencoder with multiple Backpropagation (DA-MBP), Deep Autoencoder with RBM (DA-RBM) and Deep Convolutional Autoencoder with RBM (DCA-RBM). CONCLUSION: Performance is measured in terms of Mean Square Error (MSE), Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR).


Introduction
Medical imaging modalities produce huge volumes of data as images of internal body organs. CT, MRI and X-ray are widely used modalities in the medical field. This data demands very large storage space and transmission bandwidth. Medical image compression plays a vital role in reducing the data size for storage and transmission. Since medical data is evaluated by physicians in potentially life-saving scenarios, it should be compressed without loss of information. Deep learning has evolved as a branch of machine learning based on neural networks that process data through successive layers, loosely emulating the thinking process. An autoencoder is a kind of artificial neural network that learns efficient data representations in an unsupervised way. The objective of an autoencoder is to learn a representation, called an encoding, typically for dimensionality reduction [1], by training the network.
*Corresponding author. Email: saranrulz671@gmail.com
Dimensionality reduction is a standard application of representation learning and deep learning. An autoencoder copies its input to its output in an unsupervised fashion and is also called a neural replicator network. Autoencoders are broadly classified as undercomplete or overcomplete [2]. They are constructed from three components: encoder, code and decoder. When the input has a higher dimension than the code, the autoencoder is said to be undercomplete. Conversely, when the input has a lower dimension than the code, it is overcomplete. Figure 1 a) depicts the general structure of a traditional autoencoder, which maps an input (x) to an output (r) through an internal representation or code (h).

EAI Endorsed Transactions on Pervasive Health and Technology
Figure 1 b) depicts linear and non-linear relationships in terms of dimensionality reduction. Compared to principal component analysis (PCA), which discovers a low-dimensional linear representation of the data, autoencoders can learn non-linear, continuous, non-intersecting surfaces. With suitable dimensionality and sparsity constraints, autoencoders can learn data projections that are more informative than those of PCA. Autoencoders are often trained with one hidden layer, but depending on the application they can also have multiple hidden layers. An autoencoder with more than one hidden layer is referred to as a stacked autoencoder or deep autoencoder. Multiple hidden layers can represent non-linear relationships with low computational time. Hinton et al. [1] showed that adding more hidden layers yields a better compression rate.
The architecture of an autoencoder has three components: the encoder, the code (also called the latent space representation) and the decoder. The encoder compresses the input into a latent space representation in the hidden layer. The code is the part of the network that holds the reduced representation of the input and passes it to the decoder. The decoder has a structure similar to the encoder and reconstructs the input data back to its original dimensions. A simple single-layer autoencoder is depicted in Figure 2. Multilayer autoencoders are implemented in this article. Related work is reviewed in Section 2. The proposed approach and image reconstruction using autoencoders are explained in Section 3. Experimental analysis is presented in Section 4. Section 5 closes with a conclusion and future scope.

Related Works
Balle et al. [3] proposed a compression algorithm that applies a non-linear analysis transform followed by a uniform quantizer. A convolutional neural network serves as the non-linear transform. Distortion is measured as the mean square error (MSE) between the original input and the reconstructed output. The entire scheme is optimized end to end with a parameter (λ), resulting in different operating points. A limitation of this method is that it requires a separate model, and therefore new training, for each value of λ. Toderici et al. [4] presented an approach using a recurrent neural network (RNN), which analyzes streams of data by means of hidden units. The compression model is combined with binarization so that a single model can be used. Traditional image compression algorithms, such as transform-based compression [5], [6], have also been combined with neural networks in hybrid schemes [7], which likewise achieve efficient results.

Figure 2. Simple single layer autoencoder
Valenzise et al. [8] analyzed deep learning-based image compression algorithms using subjective and objective parameters. Performance is compared with the JPEG and JPEG2000 lossless algorithms. The authors suggest using deep learning methods for better visual quality in image compression applications. Navamani et al. [9] presented a detailed survey of different types of autoencoders and suggested that a denoising autoencoder can achieve a better compression rate and also excel in feature extraction. They note that training a multilayer neural network leads to a difficult optimization problem with no assured convergence. However, deep learning models can overcome this by using pretraining and fine-tuning to train the network efficiently.

Saravanan.S and Sujitha Juliet
Razzak et al. [10] presented a survey with a deep analysis of medical image processing applications, in which deep autoencoders are discussed along with their advantage of not needing labelled data and their drawback of requiring a pretraining step. The survey also supports the sparse autoencoder, which is efficient for feature extraction in identifying Alzheimer's disease. Bo Zhu et al. [11] proposed automated transform by manifold approximation (AUTOMAP), which recasts image reconstruction as a data-driven supervised learning task that learns the mapping between the sensor domain and the image domain. Performance is compared with that of conventional transforms, and AUTOMAP is shown to be efficient. Chuxi Yang et al. [7] proposed a wavelet-based deep image compression algorithm using a DNN. The algorithm uses a wavelet transform to decompose the input images into frequency sub-bands, each of which is compressed with a deep autoencoder.
Performance is analyzed in comparison with the traditional compression methods JPEG, JPEG2000 and BPG. Chun Chet Tan et al. [12] proposed deep compression of mammogram images. In this method, selected patches of an image are used for training in order to reduce the computational time of training. Mean square error and SSIM are the performance metrics considered for this algorithm. The method suggests using multiple layers in an autoencoder to reduce high-dimensional input data to a smaller code space. Their method trains with multiple rounds of backpropagation (MBP) and is compared with Restricted Boltzmann Machine (RBM) pretraining.
Haisheng Fu et al. [13] proposed a hybrid model that uses a convolutional neural network to obtain a compact representation of the input image, which is then encoded by the FLIF codec. Mu Li et al. [14] proposed a lossy image compression model using learned convolutional networks. Based on a content map of the image, local and global variations are determined, and a binarizer encodes the output of the quantizer. Nguyen et al. [15] presented a method to identify pneumonia using chest X-ray images.
Pneumonia and normal chest X-rays are classified using a convolutional neural network. Krishnaraj et al. [16] proposed real-time image compression models for the Internet of Underwater Things. A convolutional neural network (CNN) is used as both encoder and decoder in this compression model, and the experiments combine the discrete wavelet transform (DWT) with a CNN for effective validation. Ibraheem et al. [17] proposed an efficient image compression algorithm using a logarithmic discrete wavelet transform. It achieves lossless image compression by using a logarithmic number system to compute the DWT with integer results. CT, MRI and X-ray modalities are considered in building an efficient image compression model. A finding from this survey of deep autoencoder algorithms is that high-dimensional data such as medical images needs to be divided into patches or blocks to make training efficient. Challenges in training include controlling the trade-off between distortion and bit rate and handling non-differentiable quantization in backpropagation (BP). Moreover, the quality of the reconstructed images depends on the dataset on which the algorithm is trained. Our proposed methodology uses the Deep Boltzmann Machine (DBM) to train on the medical dataset for compression and a deep autoencoder to achieve efficient reconstruction of medical images.

Proposed Approach
As proposed by Hinton et al. [1], an autoencoder with more hidden layers can reduce high-dimensional input data to a smaller code space. However, training a neural network with many hidden layers is difficult, since the parameters of deep hidden layers are hard to optimize. This can be addressed by training with the Deep Boltzmann Machine (DBM). A DBM has the potential to learn internal representations of complex data and is widely used in object recognition applications. With a Deep Boltzmann Machine training model, a high-level representation can be obtained from unlabelled input.

Deep autoencoder
An autoencoder is a feed-forward neural network that is trained to copy its input to its output by way of a hidden layer (h). The network consists of two parts. The first is the encoder, which compresses the input data into a latent space representation or code, as represented in Equation (1). The second is the decoder, which aims to reconstruct the input from the latent space representation, as denoted in Equation (2).
$$h = f(x) \tag{1}$$
$$r = g(h) \tag{2}$$
where h denotes the hidden layer, x denotes the input layer, r denotes the reconstructed layer, and f and g denote the encoder and decoder respectively. As a composite function, the autoencoder is denoted in Equation (3).
$$r = g(f(x)) \tag{3}$$
The learning process is described as minimizing L(x, g(f(x))), where L is a loss function that penalizes g(f(x)) for being dissimilar from x, here taken as the MSE (Mean Square Error). Autoencoders can have multiple hidden layers, in which case they are called stacked autoencoders or deep autoencoders [18]. Additional layers give the autoencoder the advantage of representing multiple complex, non-linear relations with low computational time, as Hinton et al. [1] showed for deep autoencoders. Figure 3 shows the deep autoencoder neural network structure with multiple layers. This network suffers from the vanishing gradient problem during training. To overcome it, initial weights are assigned randomly in the pretraining phase, after which backpropagation (BPN) is applied to tune the parameters. Figure 4 depicts the different symmetric versions of deep autoencoders.
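The encoder, decoder and MSE loss described above can be sketched in a few lines of NumPy. This is only an illustrative forward pass under stated assumptions: the layer widths, the sigmoid activation, the random initialization and the helper names (`init_layers`, `forward`, `reconstruct`, `mse_loss`) are hypothetical choices, not the configuration used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """Small random weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply each (W, b) layer followed by a sigmoid non-linearity."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

def reconstruct(x, encoder, decoder):
    h = forward(x, encoder)   # code: h = f(x)
    r = forward(h, decoder)   # reconstruction: r = g(h)
    return h, r

def mse_loss(x, r):
    """Mean square error between input and reconstruction."""
    return float(np.mean((x - r) ** 2))

rng = np.random.default_rng(0)
sizes = [64, 32, 16, 8]                  # hypothetical layer widths (undercomplete)
encoder = init_layers(sizes, rng)
decoder = init_layers(sizes[::-1], rng)  # symmetric decoder
x = rng.random((4, 64))                  # 4 flattened toy patches in [0, 1)
h, r = reconstruct(x, encoder, decoder)
loss = mse_loss(x, r)
```

Training would then adjust the weights to reduce `loss`; the sketch stops at the forward pass because the pretraining and fine-tuning procedure is described separately.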

Deep Boltzmann Machines Training
The difficulty of optimizing the weights of a multilayer non-linear autoencoder is overcome by introducing a pretraining phase. The pretraining phase in the proposed method is adapted from Salakhutdinov et al. [19]. A Deep Boltzmann Machine (DBM) is a type of Markov random field with undirected connections between layers. It has the potential to learn internal representations that become progressively more complex [20]. It follows a layer-by-layer training procedure that pretrains on unlabeled data and is then fine-tuned with the desired data. In contrast to deep belief networks and deep convolutional networks, the inference procedure incorporates top-down feedback in addition to the bottom-up pass, allowing the DBM to better incorporate uncertainty about ambiguous inputs. The layers are optimized jointly using an approximate gradient of a variational lower bound, which facilitates learning a better generative model. Pretraining a DBM with three hidden layers consists of learning a stack of RBMs that are then composed to create a Deep Boltzmann Machine. The structure of the DBM can be made more explicit by specifying its energy function, which for the 2-layer model is given in Equation (4).
Here θ denotes the set of model parameters, including the weights of layers (1) and (2). A DBM can be viewed as a bipartite graph between two sets of vertices. Maximizing the lower bound with respect to the mean-field distribution yields Equations (5) and (6), which are iterated until convergence [19], [20], [21], [22]. A DBM is a deep generative undirected model that consists of several hidden layers. It uses top-down connectivity to influence the learning of lower-level features. R1, R2 and R3 denote the weights of the recognition model, which are doubled at each layer to compensate for the lack of top-down feedback. When the three modules depicted in Figure 5 a) are composed to form a single model, the layer copies are removed and the total inputs coming into the first and second hidden layers are halved. The pretraining algorithm for a deep Boltzmann machine with 3 layers is shown in the algorithm.
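For orientation, the two-layer DBM energy and the mean-field fixed-point updates are commonly written in the DBM literature in the form below. The notation is an assumption made here for illustration: v is the visible vector, h^(1) and h^(2) are the hidden layers, W^(1) and W^(2) are the layer weights, μ^(1) and μ^(2) are the mean-field parameters, σ is the logistic sigmoid, and bias terms are omitted.

```latex
\begin{align}
E\big(v, h^{(1)}, h^{(2)}; \theta\big)
  &= -\,v^{\top} W^{(1)} h^{(1)} \;-\; h^{(1)\top} W^{(2)} h^{(2)} \\
\mu^{(1)}_{j}
  &\leftarrow \sigma\Big( \sum_{i} W^{(1)}_{ij}\, v_{i}
      + \sum_{k} W^{(2)}_{jk}\, \mu^{(2)}_{k} \Big) \\
\mu^{(2)}_{k}
  &\leftarrow \sigma\Big( \sum_{j} W^{(2)}_{jk}\, \mu^{(1)}_{j} \Big)
\end{align}
```

The first hidden layer receives both bottom-up input from v and top-down input from the second hidden layer, which is the feedback that the doubled recognition weights compensate for during layer-wise pretraining.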

Algorithm: Pretraining algorithm for a deep Boltzmann machine with 3 layers
Step 1: Initialize the visible vector by duplicating it into two copies, each connected to the hidden layer with weights W1

Database
In this implementation, the NIH Clinical Center's chest X-ray image dataset is used to analyze the image compression performance of our model. The dataset contains high-quality X-ray images at 1024 × 1024 resolution, a collection of 112,110 images covering a wide variety of X-rays from more than 30,000 patients. The network was trained for 100 epochs using one-step contrastive divergence.
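One-step contrastive divergence (CD-1), mentioned above, can be illustrated for a single binary RBM as follows. This is a minimal sketch, not the training code used in the experiments: the layer sizes, learning rate, toy binary data and the function name `cd1_update` are all hypothetical, and a full DBM pretraining would stack several such RBMs with the weight doubling described earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 step on a batch v0 of shape (batch, n_vis); updates in place."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visibles and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Gradient estimate: data correlations minus reconstruction correlations.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * np.mean(v0 - p_v1, axis=0)
    b_hid += lr * np.mean(p_h0 - p_h1, axis=0)
    # Reconstruction error, returned only for monitoring.
    return float(np.mean((v0 - p_v1) ** 2))

rng = np.random.default_rng(0)
n_vis, n_hid = 36, 12                                   # hypothetical sizes
W = rng.normal(0.0, 0.01, size=(n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
data = (rng.random((64, n_vis)) < 0.3).astype(np.float64)  # toy binary batch
errors = [cd1_update(data, W, b_vis, b_hid, lr=0.1, rng=rng)
          for _ in range(100)]
```

The loop of 100 updates mirrors the epoch count stated above only loosely, since each call here processes one toy batch rather than a pass over the dataset.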

Setup
Due to computational constraints, 4999 images from the NIH dataset are used in the experiments. As implemented by Asif et al. [23], the images are resized into patches of 512 × 512 resolution. The methodology is implemented in MATLAB 2019b with the Deep Learning Toolbox. To confirm that the proposed methodology learns patient-invariant features, the training, validation and test images are drawn from the same set of patients' images.
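The patch preparation described above can be sketched as follows. The text does not state how the 512 × 512 patches are derived from the 1024 × 1024 images, so non-overlapping tiling is an assumption made here, and the function names `extract_patches` and `assemble_patches` are hypothetical. The sketch uses Python with NumPy for illustration rather than the MATLAB environment used in the experiments.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split a 2-D image into non-overlapping square patches (row-major order)."""
    height, width = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    return (image.reshape(rows, patch_size, cols, patch_size)
                 .swapaxes(1, 2)
                 .reshape(rows * cols, patch_size, patch_size))

def assemble_patches(patches, rows, cols):
    """Inverse of extract_patches: stitch patches back into one image."""
    patch_size = patches.shape[1]
    return (patches.reshape(rows, cols, patch_size, patch_size)
                   .swapaxes(1, 2)
                   .reshape(rows * patch_size, cols * patch_size))

# Toy 1024 x 1024 image with distinct pixel values.
image = np.arange(1024 * 1024, dtype=np.float32).reshape(1024, 1024)
patches = extract_patches(image, 512)        # four 512 x 512 patches
restored = assemble_patches(patches, 2, 2)   # lossless round trip
```

Under this tiling, each image yields four patches, and reconstructed patches can be stitched back in the same order to evaluate the full image.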

Methodology
The proposed method is depicted in Figure 4. Experimental results show that adopting a Deep Boltzmann Machine instead of a Restricted Boltzmann Machine for training achieves better results in terms of PSNR and SSIM. The reconstructed images are nearly identical to the input images, as confirmed by an SSIM of 99%. Performance is evaluated in terms of PSNR, MSE, SSIM and compression ratio with respect to the number of training epochs. Table 1 shows that deep autoencoders provide lower reconstruction error than the other architectures at the same compression ratio on 1024 × 1024 X-ray images. The experimental results in Figures 6, 8 and 9 show that the Deep Autoencoder with Deep Boltzmann Machine training performs better than the other architectures.
As depicted in Figure 6 and Figure 8, the proposed method proves to be a good pretraining method, as it leads to faster convergence. The test error is consistently higher than the training error by a small margin, and both error curves decrease with the number of epochs, which indicates that the model does not overfit. With the proposed DBM training, the training error reaches 0.01 at 18 epochs for the input image patches, and it remains consistent relative to the other algorithms. The training loss is the average of the losses over each batch of training data. Because the model changes over time, the loss over the first batches of an epoch is generally higher than over the last batches. The testing loss for an epoch, on the other hand, is computed using the model as it is at the end of the epoch, resulting in a lower loss. The multi-scale structural similarity index (MS-SSIM) [25] calculates the structural similarity by combining the SSIM index of several versions of the image at various scales. Figure 9 illustrates the distortion curves obtained in comparison with the existing algorithms at different bit rates with respect to the Peak Signal-to-Noise Ratio (PSNR). In comparison, the proposed method reconstructs the image with good retention of structural quality and achieves a higher PSNR than the other algorithms. Experiments are analyzed using the performance metrics PSNR, MSE, CR and SSIM. PSNR is a parameter for evaluating the quality of the compressed image. It is defined by Equation (7).
Mean Square Error is defined by Equation (8). Furthermore, as an essential parameter, the compression ratio (CR) is the size of the input image divided by the size of the compressed image, as shown in Equation (9).
Structural similarity (SSIM) is evaluated using Equation (10). The results obtained for the different metrics indicate that the proposed method is efficient in reconstructing medical images.
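The four metrics can be computed as in the following sketch, which uses their standard textbook definitions. It assumes 8-bit images (peak value 255) and, for brevity, evaluates SSIM once over the whole image instead of with the usual sliding window, so its values will differ from windowed or multi-scale SSIM. The function names and the constants 0.01 and 0.03 are conventional choices, not taken from this article.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two images of equal shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, reconstructed)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))

def compression_ratio(original_bytes, compressed_bytes):
    """Size of the input image divided by the size of the compressed image."""
    return original_bytes / compressed_bytes

def global_ssim(x, y, max_value=255.0):
    """SSIM computed once over the whole image (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # toy image
noise = rng.integers(-5, 6, size=original.shape)
reconstructed = np.clip(original.astype(np.int64) + noise, 0, 255).astype(np.uint8)

scores = {
    "mse": mse(original, reconstructed),
    "psnr": psnr(original, reconstructed),
    "ssim": global_ssim(original, reconstructed),
    "cr": compression_ratio(1024 * 1024, 64 * 1024),
}
```

Lower MSE and higher PSNR and SSIM indicate a reconstruction closer to the input, while a higher CR indicates stronger compression.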
As an insight and recommendation from the experiments, deep autoencoders are efficient in reconstructing medical images without loss of quality. The analysis shows the feasibility of using image patches for training, which saves training time and also reduces the architecture size. The experiments show that bigger patches yield better recognition rates than smaller patches. However, computational complexity may rise as the size of the architecture increases.

Conclusion
The experimental results for medical image compression using a deep autoencoder show the feasibility of training autoencoders on image patches with Deep Boltzmann Machines. Compared with existing image reconstruction algorithms, the proposed Deep Autoencoder with Deep Boltzmann Machine training is able to retain the structural quality of medical images. This matters because the details present in medical images are vital and need to be reconstructed without degrading image quality. Training with Deep Boltzmann Machines proves to be efficient on a large set of medical data and achieves a higher peak signal-to-noise ratio. The proposed architecture also achieves a higher similarity index with less error. It needs to be trained with images of the appropriate individual medical modality in order to achieve a higher similarity. Future investigation of image compression using different deep autoencoder architectures is needed to improve the compression rate.