Anil Kumar Reddy, Sai Vikas, R. Raghunatha Sarma, Gurudat Shenoy, and
Ravi Kumar
1 Introduction
Deep networks such as convolutional neural networks (CNNs), deep belief networks (DBNs), and recurrent neural networks (RNNs) are widely used in current research fields such as computer vision and natural language processing, where these architectures have produced results competitive with, and sometimes better than, human experts [3]. In this paper we present an experimental study on optimizing the hyperparameters of prominent CNN models: U-net for segmentation and AlexNet for classification of renal CT images. We focus specifically on axial CT images to determine whether a kidney contains stones.
We thank Sri Satya Sai Institute of Higher Medical Sciences for providing the dataset and the details about it.
The rest of the paper is organized as follows. Section 2 briefly describes the background of the frameworks and the related work. Section 3 describes the experimental setup. In Section 4 we present and discuss the results. Section 5 concludes with a summary and immediate future directions.
2.2 Frameworks
We worked with the Keras framework using TensorFlow as the backend. TensorFlow [4] is an open-source software library, developed by Google, for numerical computation using data flow graphs; it provides an ecosystem for developing deep learning models.
Keras [5] is a high-level open-source neural-network application programming interface written in Python. Rather than implementing all low-level functionality itself, it runs on top of backends such as Theano, TensorFlow and CNTK.
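To make the data-flow-graph idea concrete, here is a toy illustration in plain Python (this is not TensorFlow's actual API, only a sketch of the concept): each node stores an operation and its input nodes, and evaluating the output node pulls values through the graph.

```python
# Toy data flow graph: a node holds an operation and its input nodes;
# evaluating a node recursively evaluates its inputs first.
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # callable combining input values; None for constants
        self.inputs = inputs  # upstream nodes
        self.value = value    # set for constant/placeholder nodes

    def eval(self):
        if self.op is None:
            return self.value
        return self.op(*(n.eval() for n in self.inputs))

# Build the graph for y = (a + b) * c, then run it.
a = Node(None, value=2.0)
b = Node(None, value=3.0)
c = Node(None, value=4.0)
s = Node(lambda u, v: u + v, (a, b))
y = Node(lambda u, v: u * v, (s, c))
print(y.eval())  # (2 + 3) * 4 = 20.0
```

Frameworks like TensorFlow build on the same separation between graph construction and graph execution, which is what enables automatic differentiation and device placement.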
Alexnet AlexNet [7] won the 2012 ILSVRC image classification challenge and is widely regarded as one of the most influential publications in the field. AlexNet consists of five convolutional layers and three fully connected (FC) layers.
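As a quick sanity check on the layer arithmetic, the spatial size of a convolutional output follows out = (in - kernel + 2*pad) / stride + 1; a small sketch using the commonly cited AlexNet first-layer dimensions (227x227 input, 11x11 kernels, stride 4, no padding):

```python
def conv_out_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (in - k + 2p) // s + 1."""
    return (in_size - kernel + 2 * pad) // stride + 1

# AlexNet's first convolutional layer (commonly cited dimensions):
# 227x227 input, 11x11 kernel, stride 4, no padding -> 55x55 feature maps.
print(conv_out_size(227, 11, stride=4))  # 55

# Followed by 3x3 overlapping max pooling with stride 2 -> 27x27.
print(conv_out_size(55, 3, stride=2))  # 27
```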
Saliency maps A saliency map is an image that shows the importance of each pixel. Its main aim is to transform the given image representation into a more meaningful one that aids analysis; saliency can be viewed as a kind of image segmentation, and the output of these maps is a set of contours extracted from the original image. Formally, for a given image x belonging to class c, and a classification network (in our case AlexNet) with class score function S_C(x), we can rank the image pixels by their influence on the class score. If the class score function is piecewise differentiable, the saliency map M_C(x) can be constructed for any given image simply by differentiating S_C(x) with respect to the input x, as shown in Eq. (1) [8].
M_C(x) = dS_C(x)/dx    (1)
These maps [9] depict what the convolutional network is doing in each layer. Some pixels in a map may appear randomly scattered, but taken across the whole image they concentrate on the relevant regions, which shows how the CNN makes its decisions at each layer.
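A minimal sketch of Eq. (1): here the gradient dS_C(x)/dx is approximated by central finite differences rather than backpropagation, and the toy quadratic score function stands in for a network's class score (it is only an illustration, not AlexNet).

```python
def saliency_map(score_fn, x, eps=1e-5):
    """Approximate M_C(x) = dS_C(x)/dx entrywise via central differences."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((score_fn(xp) - score_fn(xm)) / (2 * eps))
    return grad

# Toy class score S_C(x) = 3*x0 + x1^2, standing in for a network output.
score = lambda x: 3 * x[0] + x[1] ** 2
print(saliency_map(score, [1.0, 2.0]))  # approximately [3.0, 4.0]
```

In practice the gradient is obtained in a single backward pass through the network rather than one forward pass per pixel, but the quantity computed is the same.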
Fig. 1. Coronal view (a) Healthy & (b) Stone Renal samples.
Fig. 2. Masks predicted by the U-net architecture for given test images.
The graphs in Figures 3 and 4 show the training loss, training accuracy and training dice coefficient, and the validation loss, validation accuracy and validation dice coefficient.
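For reference, the dice coefficient tracked above measures the overlap between a predicted mask and the ground-truth mask, 2|A ∩ B| / (|A| + |B|); a minimal sketch on flat binary masks:

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|A intersect B| / (|A| + |B|) for flat binary masks (0/1 lists)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

pred  = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, truth))  # 2*2 / (3+3) = 0.666...
```

A dice value of 1 means the predicted and true masks coincide exactly; 0 means they do not overlap at all.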
Fig. 3. Metrics (a) Train loss & (b) Training Dice Coefficient
Fig. 4. Metrics (a) Validation loss & (b) Validation Dice Coefficient
Segmenting the kidneys from the renal image is effectively a data pre-processing step for classification. Here we show the overall system design for how we achieved segmentation using U-net; Figure 5 sums up the whole process.
Fig. 5. Complete workflow of data pre-processing on renal images
Test Loss  Test Accuracy  Sensitivity  FPR     Precision  Classification Error  F1 Score
0.2613     0.9475         0.9506       0.0555  0.944      0.0524                0.9468
Table 1. Results of AlexNet with ReLU activation on the test dataset
All classification results are shown in Table 1. The false positive rate is 0.0555 (9 of the 170 test images were predicted incorrectly). Using TensorBoard we tracked the metrics on the renal data, such as training loss, training accuracy, validation loss and validation accuracy, shown in Figures 6 and 7. In those figures the training and validation losses are close to 0.30, while the training and validation accuracies reach about 95%.
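The metrics in Table 1 all follow from the confusion-matrix counts; a sketch with hypothetical counts for a 170-image test set (illustrative numbers, not the paper's exact counts):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall / true positive rate
    fpr = fp / (fp + tn)                         # false positive rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "fpr": fpr, "precision": precision,
            "accuracy": accuracy, "error": 1 - accuracy, "f1": f1}

# Hypothetical counts for a 170-image test set.
m = classification_metrics(tp=77, fp=5, tn=84, fn=4)
print(m)
```

Note that accuracy, error, sensitivity, FPR, precision and F1 are all redundant given the four counts, which is a useful consistency check on any reported results table.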
Fig. 8. Visualizations of the first and second convolutional layers for the given (a) Healthy & (b) Non-healthy images
When dealing with image data, each pixel in the image is a feature. In the figures above we see the input images and their corresponding attributions. The aim of attribution is to assign a relevance value to every input feature with respect to a target neuron of interest. When the attributions of all input features are grouped together into the same shape as the input, we obtain the attribution maps shown in the figures above: red indicates features that contribute positively to the activation of the output, whereas blue represents features that have a suppressing effect on it. These maps also show which features contribute to the desired output, which layers are responsible for it, and the main reasons for misclassifications.
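One common way to compute such per-feature attributions is gradient times input; a toy sketch for a linear "network" (the weights below are illustrative only), where positive attributions would be drawn red and negative ones blue:

```python
def gradient_x_input(weights, x):
    """Gradient-times-input attribution for a linear score w . x:
    the gradient w.r.t. feature i is w[i], so its attribution is w[i] * x[i]."""
    return [w * xi for w, xi in zip(weights, x)]

# Toy linear model standing in for the target neuron's score.
weights = [0.5, -1.0, 2.0]
x = [2.0, 3.0, 1.0]
attr = gradient_x_input(weights, x)
print(attr)  # [1.0, -3.0, 2.0]: positive -> red, negative -> blue
```

For a real CNN the gradient is taken through the whole network by backpropagation, but the attribution map is read the same way.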
4 Conclusion & Future Work
We performed segmentation and classification of renal images by experimenting with U-net and AlexNet. U-net was used to predict kidney masks, and we conducted a series of experiments on AlexNet to tune its hyperparameters. We then experimented with saliency maps to visualize each layer's output. In the future we would like to consider sagittal and axial renal images to obtain a more general model. Along with segmenting the stone, predicting its size and severity could also be very useful.
References
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553,
p. 436, 2015.
2. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for
biomedical image segmentation,” in International Conference on Medical image
computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
3. D. C. Ciresan, L. M. Gambardella, A. Giusti, and J. Schmidhuber, “Deep neural
networks segment neuronal membranes in electron microscopy images,” in NIPS,
2012, pp. 2852–2860.
4. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat,
G. Irving, M. Isard et al., “Tensorflow: A system for large-scale machine learning,”
arXiv preprint arXiv:1605.08695, 2016.
5. F. Chollet, “keras,” https://github.com/fchollet/keras, 2015.
6. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedfor-
ward neural networks,” in Proceedings of the Thirteenth International Conference
on Artificial Intelligence and Statistics, 2010, pp. 249–256.
7. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in neural information processing sys-
tems, 2012, pp. 1097–1105.
8. K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional net-
works: Visualising image classification models and saliency maps,” arXiv preprint
arXiv:1312.6034, 2013.
9. D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, “Smoothgrad: re-
moving noise by adding noise,” arXiv preprint arXiv:1706.03825, 2017.