A smart cropping pipeline to improve prostate’s peripheral zone segmentation on MRI using Deep Learning

INTRODUCTION: Although accurate segmentation of the prostatic subregions is a crucial step for prostate cancer diagnosis, it remains a challenge. OBJECTIVES: To propose a deep learning (DL)-based cropping pipeline to improve the performance of DL networks for segmenting the prostate’s peripheral zone. METHODS: A U-net network was trained to crop the area around the peripheral zone on MRI in order to reduce the class imbalance between foreground and background pixels. The DL-cropping was compared with the standard center-cropping using three segmentation networks. RESULTS: The DL-cropping significantly improved the segmentation performance in terms of Dice score, Sensitivity, Hausdorff Distance, and Average Surface Distance for all three networks. The improvement in Dice Score was 34%, 13% and 16% for the U-net, Dense U-net and Bridged U-net, respectively. CONCLUSION: For all the evaluated networks, the proposed DL-cropping technique outperformed the standard center-cropping.


Introduction
Prostate cancer is one of the most malignant tumors and the second leading cause of cancer-related death in males. Nevertheless, early detection and staging of the disease are associated with a nearly 100% 5-year survival rate [1]. One of the critical steps for accurate prostate cancer diagnosis and efficient treatment is the precise delineation of the prostate gland and its sub-divisions. Today, T2-weighted (T2w) magnetic resonance imaging (MRI) is considered the state-of-the-art imaging modality for prostate segmentation. Nonetheless, the wide range of prostate shape variation among patients and the heterogeneous pixel representation surrounding the boundary of the peripheral zone (PZ) render automatic PZ segmentation a daunting task [5].
Over the past decade, advances in deep learning, and particularly in convolutional neural network (CNN)-based concepts, have significantly improved the performance of automatic prostate segmentation. In the field of medical imaging in general, a plethora of deep learning (DL) architectures have been proposed, with the original U-net [6] being a remarkable achievement that today forms the backbone of several more sophisticated models. Despite recent advances in DL-based segmentation methods, the performance of existing models for prostate segmentation, and particularly for PZ segmentation, is not considered sufficient to enable their transfer and deployment in clinical practice.
Novel image preprocessing approaches are commonly applied to boost network performance and to enhance segmentation accuracy. A common but critical issue that may hamper a model's performance is the presence of imbalanced data in the image labels [7]. Specifically in medical imaging, class imbalance refers to the situation where the number of pixels in the region of interest (ROI) is significantly smaller than the number of pixels in the background. The presence of class imbalance in the dataset used for model training might result in inconsistent segmentation networks. A workaround for this problem is to crop the image around the ROI before training the DL network. To automate this process, conventional center cropping is commonly used under the premise that the ROI is located at the image center [8]. While this practice is well suited to large and consistent ROIs, for PZ segmentation it risks producing faulty segmentation results [9].
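To make the imbalance concrete, the following minimal sketch measures the foreground fraction of a toy binary mask before and after center cropping, and shows how a fixed central window can cut off an off-center ROI. The mask sizes, the ROI positions and the helper names (`foreground_fraction`, `center_crop`) are illustrative assumptions and are not taken from the dataset.

```python
import numpy as np


def foreground_fraction(mask: np.ndarray) -> float:
    """Share of pixels labelled as foreground (non-zero) in a binary mask."""
    return float(np.count_nonzero(mask)) / mask.size


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Crop a size x size window around the image center."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


# Toy 384x384 annotation with a small ROI near the center (hypothetical numbers).
central = np.zeros((384, 384), dtype=np.uint8)
central[230:260, 150:210] = 1

# A second toy ROI displaced towards the lower image border.
displaced = np.zeros((384, 384), dtype=np.uint8)
displaced[300:340, 150:210] = 1

fraction_full = foreground_fraction(central)                       # ROI share in the full frame
fraction_cropped = foreground_fraction(center_crop(central, 256))  # ROI share after cropping

# Part of the displaced ROI falls outside the fixed central window.
retained = np.count_nonzero(center_crop(displaced, 256)) / np.count_nonzero(displaced)

print(f"foreground fraction: {fraction_full:.4f} -> {fraction_cropped:.4f}")
print(f"displaced ROI retained after center crop: {retained:.0%}")
```

In this toy case the crop raises the foreground share only modestly, and half of the displaced ROI is lost, which is the failure mode that motivates cropping around the ROI itself.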
In this paper, we propose a smart DL-cropping pipeline that crops around the PZ of the prostate on T2w MR images, with the aim of improving the segmentation accuracy of DL models. The efficacy of DL-cropping for improving segmentation performance was assessed on three state-of-the-art segmentation networks and compared to the conventional center-cropping approach.

Dataset
For the purpose of this study, the publicly available Cancer Imaging Archive (TCIA) PROSTATEx dataset was used [10]. This includes T2w MRI images of 98 patients along with their annotations, manually delineated by experts. The main MRI vendor was Siemens (MAGNETOM Trio and Skyra models), with a magnetic field strength of 3 Tesla. In total, the number of annotated 2D frames on the PZ was 1319, the slice thickness was 3.6 mm, and the number of slices ranged from 15 to 22. The frames were 384×384 pixels in size before being resized to 256×256 to match the models' specifications. To increase model variability and generalizability, data augmentation was used to apply a set of affine transformations to the original image, including (i) image rotation by predetermined angles (-20, -10, -5, 5, 10 and 20 degrees), and (ii) image shifting in any direction by a factor of 0.5.

DL cropping pipeline
In this paper, we suggest a deep learning cropping strategy for reducing the class imbalance between the pixels of the prostate's PZ and the background pixels in the frame. This can be considered a preprocessing step for PZ segmentation in which a bounding box enclosing the ROI is created on each frame. The ROI box is then extended by 40 pixels both horizontally and vertically from the original mask, and the training frames are clipped around the bounding box region [12]. A U-net network [6] was trained for this purpose to crop the region surrounding the PZ in the testing dataset, resulting in a more balanced mix of foreground and background pixels. Fig. 1 shows the pipeline that was used to define the bounding box. A sample of the center-cropped images and the related annotation are presented in steps (i) and (ii). These frames are fed into the U-net model (step (iii)), which was previously trained to detect the region surrounding the PZ using the larger bounding boxes. The predictions obtained for the initial frames in step (iv) approximately define the region of interest. In step (v), the bounding boxes are constructed from the minimum and maximum coordinates of these amorphous masks on the x and y axes. This approach ensures that the original annotations are always included in the final cropped image. In step (vi), the frames are resampled to 256×256 pixels so that the input frames meet the network's requirements. Finally, the cropped frames and annotations are displayed in steps (vii) and (viii), and the resulting data may be used to train the networks.
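The box construction and cropping steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the 40-pixel extension is applied per side (one reading of the text), the network prediction is replaced by a stand-in equal to its training target, and the nearest-neighbour `resize_nearest` helper is a placeholder because the resampling method is not specified. All function and variable names are hypothetical.

```python
import numpy as np


def bounding_box(mask: np.ndarray, margin: int = 0) -> tuple:
    """Return (top, bottom, left, right) enclosing all non-zero pixels.

    The box is enlarged by `margin` pixels on every side and clipped to the
    image borders; `bottom` and `right` are exclusive slice ends.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no foreground pixels")
    h, w = mask.shape
    top = max(int(rows.min()) - margin, 0)
    bottom = min(int(rows.max()) + margin + 1, h)
    left = max(int(cols.min()) - margin, 0)
    right = min(int(cols.max()) + margin + 1, w)
    return top, bottom, left, right


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to size x size (assumed resampling method)."""
    h, w = image.shape
    row_idx = np.arange(size) * h // size
    col_idx = np.arange(size) * w // size
    return image[np.ix_(row_idx, col_idx)]


# Toy center-cropped frame and PZ annotation (hypothetical values).
frame = np.random.default_rng(0).random((256, 256))
annotation = np.zeros((256, 256), dtype=np.uint8)
annotation[150:170, 110:160] = 1

# Training target for the cropping network: annotation box enlarged by 40 px.
top, bottom, left, right = bounding_box(annotation, margin=40)
target = np.zeros_like(annotation)
target[top:bottom, left:right] = 1

# Stand-in for the network output; a real prediction would be an irregular blob.
prediction = target.copy()

# Steps (v) and (vi): box from the prediction's extreme coordinates, crop, resample.
box = bounding_box(prediction)
t, b, l, r = box
cropped_frame = frame[t:b, l:r]
cropped_annotation = annotation[t:b, l:r]
resized_frame = resize_nearest(cropped_frame, 256)
resized_annotation = resize_nearest(cropped_annotation, 256)

print("crop box:", box, "crop size:", cropped_frame.shape)
```

Because the box is derived from a mask that was trained to cover the enlarged region, the whole annotation stays inside the crop in this example. Resampling a non-square crop to a square input changes the aspect ratio, which the sketch does not attempt to correct.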

Deep Learning segmentation networks
For the comparison of the DL-cropping technique with the traditional center-cropping method, the original versions of three state-of-the-art segmentation networks were used. The first is the U-net model [6], which employs an encoder-decoder layer combination, with the layers coupled in serial and parallel to increase the network's capacity to learn spatial features. The second, Dense U-net [13], is an encoder-decoder network in which dense blocks [14] are used to propagate information from previous layers forward, while transitional blocks flatten the feature maps and keep the most important features, lowering the network's computational cost. The third network, Bridged U-net [11], is comprised of two inter-connected U-nets, with cross linkages across levels from the first U-net to the second, enabling the models to interact and collaborate further on feature extraction.
EAI Endorsed Transactions on Bioengineering and Bioinformatics 08 2021 - 04 2022 | Volume 1 | Issue 4 | e3

Network training
Two pipelines were used to train and test the network designs. In the first, the networks were trained on the frames obtained after conventional center-cropping (Fig. 1, steps i, ii). In the second, the proposed DL-cropping technique was used to crop the frames into variable-size slices ranging from 90 to 140 pixels, and the generated images (Fig. 1, steps vi, vii) were used to train the networks.
Regarding the training parameters of the DL models, binary cross-entropy was used as the loss function and the training accuracy was monitored as a performance metric. The Adam algorithm [16] was used for optimization instead of stochastic gradient descent [17], since the former has been shown to converge faster. All architectures were trained for 120 epochs. To reduce computation time, checkpointing and early stopping were used, and TensorFlow's TensorBoard was used to monitor the training and validation process.
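The interplay of checkpointing and early stopping can be illustrated with a framework-free sketch that replays a sequence of validation losses. The `patience` value and the loss sequence are hypothetical, since the stopping criterion is not reported; only the 120-epoch limit comes from the text.

```python
def run_early_stopping(val_losses, max_epochs=120, patience=10):
    """Replay validation losses and return (best_epoch, best_loss, last_epoch).

    A checkpoint is "saved" whenever the loss improves; training stops once
    `patience` consecutive epochs pass without improvement.
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    stale_epochs = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            # A real training loop would write the model weights to disk here.
            best_loss, best_epoch = loss, epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_epoch, best_loss, last_epoch


# Hypothetical validation curve: improves for three epochs, then stagnates.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9, 0.6]
result = run_early_stopping(losses, patience=3)
print(result)
```

With a patience of three, the run stops at epoch 6 and keeps the epoch-3 checkpoint, so the later improvement at epoch 8 is never seen. This is the usual trade-off between saved computation and the risk of stopping too early.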
Training was performed in a 5-fold cross-validation pipeline to properly evaluate the generalizability of the models. The partition of the images within each fold was done patient-wise to ensure unbiased model training by preserving the intra-individual size and shape variations of the prostate's PZ. For all models and for both the DL-crop and center-crop techniques, the patients were assigned to the folds in an identical way. A total of 78 patients were used in each training and validation set, with the remaining 20 patients being used for testing. In terms of the distribution of frames (2D slices), each training fold had 891 slices prior to data augmentation and 1692 slices after data augmentation, with an image size of 256×256 pixels. There were roughly 152 slices in each validation set and 276 slices in each testing fold.
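A patient-wise split can be sketched as below. The fold assignment, the fixed seed and the helper names are assumptions for illustration; the source does not describe how patients were allocated, and the split between training and validation patients inside each fold is left out. With 98 patients, this scheme yields 19 or 20 test patients per fold, close to the 78/20 partition described above.

```python
import random


def patient_wise_folds(patient_ids, n_folds=5, seed=0):
    """Return a list of (train_val_ids, test_ids); each patient is tested exactly once."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> identical folds for every run
    chunks = [ids[i::n_folds] for i in range(n_folds)]
    folds = []
    for k, test_ids in enumerate(chunks):
        train_val_ids = [p for j, chunk in enumerate(chunks) if j != k for p in chunk]
        folds.append((train_val_ids, test_ids))
    return folds


def select_slices(slices, patients):
    """Keep the (patient_id, slice_index) pairs that belong to the given patients."""
    wanted = set(patients)
    return [s for s in slices if s[0] in wanted]


patients = list(range(98))                               # 98 patients, as in the dataset
slices = [(p, z) for p in patients for z in range(15)]   # hypothetical: 15 slices each

folds = patient_wise_folds(patients)
for train_val_ids, test_ids in folds:
    train_val_slices = select_slices(slices, train_val_ids)
    test_slices = select_slices(slices, test_ids)
    # No patient contributes slices to both sides of the split.
    assert not {p for p, _ in train_val_slices} & {p for p, _ in test_slices}
    print(len(train_val_ids), "train/val patients,", len(test_ids), "test patients")
```

Splitting on patient identifiers rather than on slices prevents neighbouring slices of the same prostate from appearing in both the training and the test data, and reusing the same seed gives both cropping pipelines identical folds.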

Performance evaluation
Several metrics were used to assess the segmentation performance, including the Dice Score coefficient, the Balanced Accuracy, the Hausdorff distance, the Average Surface Distance, the Rand Error index, the Sensitivity and the Specificity [18]. The performance of the trained models was calculated by averaging the 5-fold cross validation results over the test-sets. The non-parametric Wilcoxon matched-pairs signed rank test was used to compare the center-crop and DL-crop for each metric and architecture.
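As an illustration of the overlap-based metrics, the sketch below computes the Dice score, sensitivity, specificity and balanced accuracy from the confusion counts of two binary masks, using their standard definitions. The toy masks and the crop window are hypothetical. The example also evaluates the same prediction on a tighter window to show how the amount of background affects each metric.

```python
import numpy as np


def segmentation_metrics(truth, pred) -> dict:
    """Dice, sensitivity, specificity and balanced accuracy for two binary masks."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = int(np.sum(truth & pred))     # foreground predicted as foreground
    tn = int(np.sum(~truth & ~pred))   # background predicted as background
    fp = int(np.sum(~truth & pred))    # background predicted as foreground
    fn = int(np.sum(truth & ~pred))    # foreground predicted as background
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "dice": dice,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy 100x100 frame: a 10x10 ROI and a prediction shifted by five columns.
truth = np.zeros((100, 100), dtype=bool)
truth[40:50, 40:50] = True
pred = np.zeros((100, 100), dtype=bool)
pred[40:50, 45:55] = True

full = segmentation_metrics(truth, pred)

# Same masks evaluated on a tighter window that still contains both regions.
window = (slice(30, 60), slice(30, 65))
tight = segmentation_metrics(truth[window], pred[window])

for name in full:
    print(f"{name}: full frame {full[name]:.3f}, tight window {tight[name]:.3f}")
```

The Dice score and sensitivity are identical in both evaluations because they ignore true negatives, whereas specificity is higher on the full frame simply because it contains more background pixels.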

Results
The performance of the three segmentation algorithms was evaluated for both the proposed DL-cropping and the standard center-cropping using seven performance metrics. The average values and standard deviations are shown in Table 1. Comparing the scores in Table 1, the DL-cropping resulted in better performance for all the metrics except specificity. In addition, the proposed pipeline has the lowest standard deviation values for all the metrics except specificity. It is worth mentioning that the higher specificity achieved with center-cropping is directly associated with the greater class imbalance in the resulting images in favor of the background pixels, which leads to a considerably larger number of true negative predictions.
Overall, the proposed pipeline outperformed the center-cropping method with significant improvements for all architectures (p<0.001) in terms of Dice score, Sensitivity, Hausdorff Distance and Average Surface Distance. The improvement in mean Dice Score was 34%, 13% and 16% for the U-net, Dense U-net and Bridged U-net, respectively. Regarding the Hausdorff Distance, there was an improvement of 53%, 76% and 39% for the U-net, Dense U-net and Bridged U-net, respectively. The corresponding boxplots of the seven metrics for the three segmentation networks using center-cropping and DL-cropping are shown in Fig. 2.
Fig. 3 provides an indicative example of the segmentation performance of the three DL networks after DL-cropping and center-cropping. The blue contours originate from the original PZ mask, while the predicted mask is depicted in orange. The qualitative assessment through visual inspection of the segmented regions using the three networks also confirmed that image preprocessing with DL-cropping instead of center-cropping improved the performance of the algorithms.

Discussion
The current study presents a novel preprocessing technique for increasing the efficiency and effectiveness of established PZ segmentation DL architectures on T2w MR images. To address the problem of class imbalance between background and foreground pixels in the image, a DL-based framework for image cropping is presented. As shown, the suggested DL-cropping approach outperformed the traditional center-cropping for all of the prostate segmentation networks considered in this work. The prostate gland, and particularly the PZ, constitutes only a small part of the typical pelvic MRI. At the same time, it is well documented that machine-learning and deep-learning algorithms trained on unbalanced data may suffer from restricted prediction accuracy [19]. When two classes are unevenly represented in the training image (i.e. foreground and background pixels), the most frequently occurring class will be favored during training. A potential solution is to tackle this issue during model training by opting for a loss function able to compensate for the presence of class imbalance. With the weighted cross-entropy loss, for example, class weights inversely related to the incidence of each class are assigned, thereby penalizing the most frequently occurring class. Nonetheless, the choice of the most efficient weighting function is cumbersome and application-specific [20]. In [21], the authors compared the effects of some of the most popular loss functions on PZ segmentation.
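As an illustration of the weighted cross-entropy idea mentioned above (an alternative to cropping, not the method used in this work), the sketch below derives inverse-frequency class weights from a toy mask and compares the weighted and unweighted losses for a prediction that labels almost everything as background. The weighting rule, one weight per class equal to 1 / (2 × class frequency), is one common choice and is an assumption here, as are all names and values.

```python
import numpy as np


def inverse_frequency_weights(mask) -> tuple:
    """Return (w_background, w_foreground), each inversely proportional to class frequency."""
    fg = float(np.mean(np.asarray(mask, dtype=float)))
    if fg <= 0.0 or fg >= 1.0:
        raise ValueError("both classes must be present in the mask")
    return 1.0 / (2.0 * (1.0 - fg)), 1.0 / (2.0 * fg)


def weighted_bce(truth, prob, w_bg=1.0, w_fg=1.0, eps=1e-7) -> float:
    """Mean binary cross-entropy with a separate weight for each class."""
    truth = np.asarray(truth, dtype=float)
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    per_pixel = -(w_fg * truth * np.log(prob) + w_bg * (1.0 - truth) * np.log(1.0 - prob))
    return float(per_pixel.mean())


# Toy 100x100 mask in which 2% of the pixels are foreground (hypothetical).
mask = np.zeros((100, 100), dtype=np.uint8)
mask[0:10, 0:20] = 1

# A degenerate prediction that assigns every pixel a 1% foreground probability.
prob = np.full(mask.shape, 0.01)

w_bg, w_fg = inverse_frequency_weights(mask)
unweighted = weighted_bce(mask, prob)
weighted = weighted_bce(mask, prob, w_bg=w_bg, w_fg=w_fg)

print(f"weights: background {w_bg:.3f}, foreground {w_fg:.1f}")
print(f"loss: unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

The unweighted loss of the all-background prediction is small because the missed foreground pixels are rare, while the weighted loss is more than twenty times larger, which is what discourages the network from ignoring the minority class.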
Some authors have leveraged object detection approaches to improve segmentation performance in various medical imaging applications. Jaeger et al. [22] proposed the Retina U-Net, which combines RetinaNet's one-stage detector with the standard U-Net architecture. Training the segmentation and detection tasks simultaneously improved the detection rate compared to U-net-like models. Based on the 3D Retina U-Net, a fully automatic DL-based model was recently proposed to perform prostate cancer detection, segmentation and Gleason Grade estimation at the same time, achieving a state-of-the-art performance level [23]. Nevertheless, the original U-Net architecture has 30 million parameters and the ResNet model, which is the backbone of RetinaNet, has 25 million parameters. In contrast, our proposed smart-crop U-Net has only 1.94 million parameters and still crops the area of interest efficiently. Apart from two-stage detectors, some works have also implemented one-stage detectors to directly localize ROIs without requiring candidate regions to be proposed [24]. These detectors have been shown to be more flexible, straightforward, and computationally efficient. Additionally, novel object detection approaches, such as the CaraNet [25], are particularly attractive for the segmentation of small objects in medical images, with some recent works demonstrating promising results. In future studies, it would be of particular interest to compare emerging object detection methods with the proposed DL-based smart cropping for addressing the challenging task of prostate cancer segmentation.
Figure 2. Boxplots of the PZ segmentation performance for center-cropping and DL-cropping using three networks.
The present work has some limitations. First, we were not able to reach the performance scores reported in the literature for the different DL segmentation networks, possibly due to the lower number of patients included in our study. For instance, using a training sample of 141 patients for PZ segmentation, the U-net and Dense U-net models achieved 75% and 78% Dice scores, respectively [21]. Herein, the maximum model performance was 61%, but model training was performed on a dataset of 78 patients. These differences in model performance can also be attributed to the fact that in [21] the authors estimated segmentation performance only on the mid-gland region of the prostate, where segmentation models tend to perform better than at the apex, since the latter may present important shape and size variations. Nevertheless, additional evidence is required to prove that the suggested approach is superior to conventional methods like center-cropping.

Conclusion
A preprocessing technique is proposed to effectively overcome the class imbalance problem in prostate MRI segmentation tasks. The improvement in PZ segmentation performance of DL networks was significant when the proposed method was employed in comparison with the conventional center-cropping method. In the future, the generalizability of the proposed pipeline needs to be demonstrated on independent populations through external validation including images acquired by different MRI vendors, field properties and acquisition protocols.