
Segmentation and Classification of CT Renal Images using Deep Networks

Anil Kumar Reddy, Sai Vikas, R. Raghunatha Sarma, Gurudat Shenoy, and
Ravi Kumar*

Sri Sathya Sai Institute of Higher Learning, Prashanti Nilayam


{anilkumarreddy.sssihl@gmail.com, saivikas23uniq@gmail.com,
rraghunathasarma@sssihl.edu.in, s.gurudat@gmail.com,
ravikumar.s@sssihms.org.in}
http://www.sssihl.edu.in/

Abstract. The study of deep learning [1] models, in particular Convolutional
Neural Networks (CNNs), has played a key role in various applications in the
medical domain over the last decade. These models have demonstrated high
accuracy, which motivates sophisticated diagnosis tools in the healthcare
domain. We study the use of CNN models, namely U-net and AlexNet, on a renal
dataset for segmentation and classification of renal images. Data
preprocessing on kidney images is carried out using the U-net architecture [2].
We carry out a detailed study on fine-tuning the hyperparameters that govern
the models' performance and test accuracy. We achieved a Dice coefficient of
83% in creating masks for renal data using U-net. In our experiments with
AlexNet, the best accuracy achieved is 94.75%. Finally, we visualize the
convolutional layers using saliency maps.

Keywords: Renal data, segmentation, classification, convolutional neural
networks, U-net, AlexNet, TensorFlow, saliency maps, visualization

1 Introduction

Deep networks such as convolutional neural networks (CNNs), deep belief
networks (DBNs), and recurrent neural networks (RNNs) are used in current
research fields such as computer vision and natural language processing, where
these architectures have produced results competitive with those of human
experts [3]. In this paper, we are mainly interested in an experimental study
for optimizing the hyperparameters of prominent CNN models, namely U-net for
segmentation and AlexNet for classification of CT renal images. We have
specifically focused on the axial dimension CT images to find whether the
kidney has stones or not.
The rest of the paper is organized as follows. Section 2 briefly describes the
background of all the frameworks and the related work. Section 3 describes the
experimental setup and presents and discusses the results. Section 4 concludes
with a summary and immediate future directions.

* We thank Sri Sathya Sai Institute of Higher Medical Sciences for providing
the dataset and the details about it.

2 Background and Related Work


With the rapid proliferation of deep learning techniques, a number of deep
neural networks (DNNs), such as fully connected neural networks (FCNs) and
convolutional neural networks (CNNs), have been developed for various
applications. In this work we segmented the kidney from the renal images using
U-net and classified the resulting images using AlexNet.

2.1 DNN Training


DNN training starts with forward propagation. The outputs of the forward
propagation step are compared against the known labels to calculate the error
value. In backward propagation, the error propagates back through the
network's layers, and the weights are updated using gradient descent. It is
very common to batch hundreds of training inputs and operate on them
simultaneously during DNN training, both to help prevent overfitting and, more
importantly, to amortize the cost of loading weights from GPU memory across
multiple inputs, which increases computational efficiency.
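The training loop described above can be illustrated with a minimal sketch. It is not the training code used in this work: it fits a one-parameter logistic model in plain Python, and the function name `train_minibatch`, the toy data, and all hyperparameter values are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_minibatch(xs, ys, batch_size=4, lr=0.5, epochs=200, seed=0):
    """Fit y ~ sigmoid(w * x + b) with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                pred = sigmoid(w * xs[i] + b)  # forward propagation
                err = pred - ys[i]             # compare with the known label
                grad_w += err * xs[i]          # backward propagation (cross-entropy loss)
                grad_b += err
            # One weight update per batch, using the batch-averaged gradient.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Toy, linearly separable data (hypothetical values).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_minibatch(xs, ys)
print(sigmoid(w * 2.0 + b), sigmoid(w * -2.0 + b))
```

Averaging the gradient over a batch means the weights are read and updated once per batch rather than once per input, which is the efficiency argument made above.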

2.2 Frameworks
We worked with the Keras framework with TensorFlow as the backend. TensorFlow
[4] is an open-source software library, developed by Google, for numerical
computation using data flow graphs. It provides an ecosystem for developing
deep learning models.
Keras [5] is a high-level, open-source neural network application programming
interface written in Python. Rather than implementing all functionality
itself, it runs on top of Theano, TensorFlow, or CNTK.

2.3 Deep Networks


U-net U-net [2] is a convolutional network architecture for fast and precise
segmentation of images. It outperformed the prior best method on the ISBI
challenge for segmentation of neuronal structures in electron microscopic
stacks.
We added batch normalization after every convolution in the network,
initialized the weights with Glorot uniform initialization [6], and applied an
L2 kernel regularizer with a lambda of 0.0001. U-net consists of a contracting
path and an expansive path. The contracting path follows the usual
convolutional neural network architecture. In the expansive path, each step
up-samples the feature map and applies a 2x2 convolution (known as an
up-convolution), followed by two 3x3 convolutions, each with a ReLU. After all
these convolutional layers there is a single 1x1 convolution at the end. In
total, the U-net architecture consists of 23 convolutional layers.
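The layer count of 23 can be checked with a short calculation. The sketch below assumes the layout of the original U-net paper [2] (four down-sampling steps, a two-convolution bottleneck, four up-sampling steps, and a final convolution); the function name `count_unet_conv_layers` is hypothetical, and the batch normalization layers added in this work are not counted.

```python
def count_unet_conv_layers(depth=4):
    """Count convolutional layers in a U-net with `depth` down/up steps."""
    contracting = depth * 2      # two 3x3 convolutions per down-sampling step
    bottleneck = 2               # two 3x3 convolutions at the lowest resolution
    expansive = depth * (1 + 2)  # one 2x2 up-convolution + two 3x3 convolutions per step
    final = 1                    # single convolution producing the output mask
    return contracting + bottleneck + expansive + final


print(count_unet_conv_layers())  # 8 + 2 + 12 + 1 = 23
```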

AlexNet AlexNet [7] won the 2012 ILSVRC image classification challenge. This
work is widely regarded as one of the most influential publications in the
field. AlexNet consists of five convolutional layers and three fully connected
(FC) layers.

2.4 Visualization Technique

Saliency maps A saliency map is an image that shows how distinctive each pixel
is. Its main aim is to change the given image representation into a more
meaningful one that helps in analysis. Saliency can be seen as a kind of image
segmentation: the output of these maps is a set of contours extracted from the
original image. Formally, given an image $x$, the class $c$ to which it
belongs, and a classification network (in our case AlexNet) with class score
function $S_c(x)$, we can rank the pixels of $x$ by their influence on the
class score. If the class score function is piecewise differentiable, then for
any given image we can construct the saliency map $M_c(x)$ simply by
differentiating $S_c(x)$ with respect to the input $x$, as shown in Eq. (1) [8].

$$M_c(x) = \frac{dS_c(x)}{dx} \qquad (1)$$

These maps [9] depict what the convolutional network is doing in each layer.
Some highlighted pixels might seem to be scattered randomly, but across the
image they are central to the prediction, which shows how the CNN is making
decisions in each layer.
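As a rough illustration of Eq. (1), the sketch below approximates the derivative of a class score with respect to each pixel and ranks the pixels by the magnitude of that derivative. The score function is a toy stand-in (a tanh of a weighted sum), not AlexNet, and the gradient is estimated by central finite differences; a deep learning framework would normally obtain it by automatic differentiation. All names and values here are assumptions.

```python
import math


def class_score(x, w):
    """Toy differentiable class score S_c(x) = tanh(w . x); a stand-in for a CNN."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def saliency_map(x, w, eps=1e-5):
    """Approximate M_c(x) = dS_c/dx by central finite differences, pixel by pixel."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((class_score(hi, w) - class_score(lo, w)) / (2 * eps))
    return grads


x = [0.2, 0.8, 0.5, 0.1]   # flattened 2x2 "image" (hypothetical values)
w = [0.1, -1.5, 0.7, 0.0]  # hypothetical model weights
m = saliency_map(x, w)

# Rank pixels by their influence on the class score, most influential first.
ranking = sorted(range(len(x)), key=lambda i: abs(m[i]), reverse=True)
print(ranking)  # [1, 2, 0, 3]
```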

2.5 Dataset description

The renal dataset consists of computed tomography (CT) images in their native
format, which are converted to JPEG format for training. We collected the
dataset at the Sri Sathya Sai Institute of Higher Medical Sciences (SSSIHMS)
hospital, Puttaparthi, Andhra Pradesh, India, under the direct supervision of
Dr. Gurudat, Department of Radiology. The dataset can be used for problems
such as diagnosing whether a kidney has stones, or whether stones are present
in the urinary bladder or anywhere between the chest and the urinary bladder.
Sample healthy and stone images are shown in Figure 1. In total, we collected
1800 healthy and non-healthy images.

Fig. 1. Coronal view (a) Healthy & (b) Stone Renal samples.

3 Experimental Results and Discussion

3.1 Experimental Setup

We conducted experiments on two different GPU cards, namely the GeForce GTX
TITAN X and the Tesla K20c.

3.2 Challenges with original renal dataset

While our goal is to classify a renal image as healthy or non-healthy based on
the presence or absence of a stone, the stone and bone pixels have a similar
shade in the image corpus. This makes the classification task difficult for
the CNN. As a result, when we trained AlexNet on the original kidney images, we
could only achieve around 50% accuracy on the test data. Given the large depth
of the network, the ability to backpropagate the gradient to all layers was
also a concern. This can be mitigated by adding regularization terms and
techniques such as batch normalization and Xavier initialization. When we
included these techniques in the CNN models, the accuracy reached about 60%.
Such accuracies are inadequate for the medical domain, so we manually created
masks of the kidneys so that the bone part of the image is not taken into
account. Masks for close to 1200 images were created, just as is done for
ultrasound nerve images in [3].

3.3 U-net Results

We trained the U-net architecture using the Adam optimizer with an initial
learning rate of 0.001 on the 1200 images, with a batch size of 32 for 50
epochs, and achieved a Dice coefficient close to 83%. We reached this value by
including optimization techniques such as batch normalization and Xavier
initialization. U-net then predicted the masks for all the remaining test
images (600 images) with the aforementioned Dice coefficient. Some of the
masks predicted by the network are shown in Figure 2.
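For reference, the Dice coefficient reported above measures the overlap between a predicted mask and the ground-truth mask. The following is a minimal sketch on flattened binary masks; the `smooth` term is an assumption (a common way to avoid division by zero on empty masks) and is not something stated in this paper.

```python
def dice_coefficient(pred, target, smooth=1.0):
    """Dice = 2|A and B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)


# Hypothetical 6-pixel masks: 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
print(dice_coefficient(pred, target, smooth=0.0))  # 2*2 / (3+3) = 0.666...
```

A value of 1 means the two masks coincide exactly, and a value near 0 means they barely overlap.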
At this point we have only the shape of the kidney, not the image information
inside it. The final step is to extract only the kidney information, which is
achieved by applying the predicted kidney mask to the corresponding image.
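Applying a mask to an image amounts to an element-wise selection. The sketch below uses nested lists for clarity; the threshold of 0.5 on the predicted mask and the name `apply_mask` are assumptions, and an array library would normally be used for real images.

```python
def apply_mask(image, mask, threshold=0.5):
    """Keep pixels where the predicted mask exceeds `threshold`; zero out the rest."""
    return [
        [pix if m > threshold else 0 for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical 2x3 grayscale image and the mask probabilities predicted for it.
image = [[12, 200, 180], [15, 190, 30]]
mask = [[0.1, 0.9, 0.8], [0.0, 0.7, 0.2]]
kidney_only = apply_mask(image, mask)
print(kidney_only)  # [[0, 200, 180], [0, 190, 0]]
```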

Fig. 2. Masks predicted by the U-net architecture for given test images.

The graphs in Figures 3 and 4 show the training loss, training accuracy,
training Dice coefficient, validation loss, validation accuracy, and
validation Dice coefficient.


Fig. 3. Metrics (a) Train loss & (b) Training Dice Coefficient


Fig. 4. Metrics (a) Validation loss & (b) Validation Dice Coefficient

3.4 Complete work flow of data pre-processing on renal images

Segmenting the kidneys from the renal image is effectively a data
pre-processing step for the classification. Here we show the overall system
design for how we achieved segmentation using U-net. Figure 5 sums up the
whole process.

Fig. 5. Complete work flow of data pre-processing on Renal images

3.5 Experimental results on renal dataset using AlexNet architecture

To attain higher accuracy, we worked with the AlexNet model. The model is
trained from scratch, after data preprocessing with the U-net architecture as
discussed previously. We implemented the model with added features such as
batch normalization, dropout, Xavier initialization, He uniform
initialization, and regularization terms to improve our results. We worked
with images of size 224x224x1; the training set consists of 580 healthy and
580 stone images, and the validation set of 150 healthy and 150 stone images.
To enlarge the dataset we used the ImageDataGenerator class in Keras.
We conducted experiments on a single Titan X GPU, unlike [7]. We trained the
AlexNet model from scratch with the Adam optimizer, ReLU non-linearity, and an
initial learning rate of 0.001 for 300 epochs. We achieved an accuracy of
0.9475 and a loss of 0.2613 on the test dataset.

Test Loss  Test Accuracy  Sensitivity  FPR     Precision  Classification Error  F1 Score
0.2613     0.9475         0.9506       0.0555  0.944      0.0524                0.9468

Table 1. Results of AlexNet with ReLU activation on test dataset

All the classification results are shown in Table 1. The false positive rate
is 0.0555 (9 out of 170 images are predicted wrongly). Using TensorBoard we
tracked the metrics for the renal data; the training loss, training accuracy,
validation loss, and validation accuracy are shown in Figures 6 and 7. In
these figures the training and validation losses are close to 0.30, whereas
the training and validation accuracies go up to 95%.
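The quantities in Table 1 follow from the confusion matrix by their standard definitions. The sketch below shows those definitions; the counts passed in are purely illustrative and are not the confusion matrix of this experiment, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "fpr": fpr,
        "precision": precision,
        "accuracy": accuracy,
        "classification_error": 1.0 - accuracy,
        "f1": f1,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the classification error is simply one minus the accuracy, which is consistent with the values 0.9475 and 0.0524 reported in Table 1 up to rounding.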

3.6 Visualization results on renal dataset using saliency maps


We visualized each convolutional layer in the AlexNet architecture to see what
each layer does for a given image. For this we constructed saliency maps for
each layer, in which the gradient is multiplied by the input. This highlights
only the key regions of the image, indicating the features that the
corresponding layer has identified. Figure 8 shows the saliency maps produced
at different layers.

Fig. 6. Metrics (a) Train loss & (b) Training accuracy


Fig. 7. Metrics (a) Validation loss & (b) Validation accuracy


Fig. 8. Visualizations of first and second convolutional layers for the given (a) Healthy
& (b) Non healthy images

When dealing with image data, each pixel in the image is a feature. In the
above figures we can see the input images and the corresponding attributions.
The main aim of attribution is to assign a real value to every input feature
with respect to a target neuron of interest. When the attributions of all
input features are arranged in the same shape as the input, we speak of
attribution maps, which are shown in the above figures. Red indicates features
that contribute positively to the activation of the output, whereas blue
indicates features that have a suppressing effect on it. These maps also show
the features that contribute to the desired output, the layers that are
responsible for it, and the main reasons for misclassifications.
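The gradient-times-input attribution and its red/blue reading can be sketched as follows. The example assumes a toy linear score, for which the gradient with respect to the input is just the weight vector; the function names and all values are hypothetical and are not taken from the experiments.

```python
def gradient_times_input(x, grad):
    """Attribution of each feature: gradient of the target score times the input value."""
    return [g * xi for g, xi in zip(grad, x)]


def color_for(attribution):
    """Map the sign of an attribution to the colors used in the attribution maps."""
    if attribution > 0:
        return "red"      # contributes positively to the target activation
    if attribution < 0:
        return "blue"     # has a suppressing effect on the target activation
    return "neutral"


# For a linear score S(x) = w . x the gradient dS/dx equals w (hypothetical values).
x = [0.2, 0.8, 0.5, 0.0]
w = [1.0, -0.5, 0.4, 2.0]
attributions = gradient_times_input(x, w)
colors = [color_for(a) for a in attributions]
print(colors)  # ['red', 'blue', 'red', 'neutral']
```

The last pixel has a large gradient but zero intensity, so its attribution is zero; this is the practical difference between a plain gradient map and a gradient-times-input map.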
4 Conclusion & Future Work

We performed segmentation and classification of renal images by experimenting
with U-net and AlexNet. U-net is used for the prediction of kidney masks. We
conducted a series of experiments on AlexNet to tune the hyperparameters. We
then experimented with saliency maps to visualize each layer's output. In
future we would like to consider sagittal and axial renal images to achieve a
more general model. Along with segmentation of the stone, predicting its size
and its severity can also be very useful.

References
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553,
p. 436, 2015.
2. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for
biomedical image segmentation,” in International Conference on Medical Image
Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
3. D. C. Ciresan, L. M. Gambardella, A. Giusti, and J. Schmidhuber, “Deep neural
networks segment neuronal membranes in electron microscopy images,” in NIPS,
2012, pp. 2852–2860.
4. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat,
G. Irving, M. Isard et al., “Tensorflow: A system for large-scale machine learning,”
arXiv preprint arXiv:1605.08695, 2016.
5. F. Chollet, “keras,” https://github.com/fchollet/keras, 2015.
6. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep
feedforward neural networks,” in Proceedings of the Thirteenth International
Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
7. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in Neural Information Processing
Systems, 2012, pp. 1097–1105.
8. K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks:
Visualising image classification models and saliency maps,” arXiv preprint
arXiv:1312.6034, 2013.
9. D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, “Smoothgrad:
removing noise by adding noise,” arXiv preprint arXiv:1706.03825, 2017.
