Abstract
Image segmentation has made important advances in recent years. Recent work builds to a great
extent on Deep Learning techniques, which have brought about groundbreaking improvements in
segmentation accuracy. Because image segmentations are a mid-level representation, they have the
potential to make a major contribution across the wide field of visual understanding, from image
classification to interactive pursuit. Medical image segmentation is a sub-area of image
segmentation with many essential applications in medical image evaluation and diagnosis. In this
paper, distinct strategies for medical image segmentation are classified along with their
sub-techniques and sub-fields. The paper presents useful Deep Learning approaches to medical
image segmentation and attempts to summarize the long-term scope of the work.
I Introduction
In image segmentation, an image is partitioned into a series of non-overlapping regions. In
medical images, these regions correspond to human tissues with very different structures, and
each tissue can then be passed to a suitable technique for accurate clinical identification.
Automatic segmentation of medical images is difficult because structures vary in shape and size
among patients [1]. Moreover, tissue with poor contrast against its surroundings makes automated
segmentation troublesome. Recently, deep learning based methods have provided effective
classification and learn features directly from the image. In particular, improvements in
Convolutional Neural Networks (CNNs) have advanced the state of the art in medical image
segmentation [2].
A CNN contains more than one convolutional layer, interleaved with sub-sampling layers. The
convolutional layers capture spatial correlation in the input image by constructing feature maps,
sharing the same kernel weights across the image [3]. CNNs are effective because the
hierarchical feature representation of an image is learned in a purely data-driven manner. The
challenges of using CNNs are as follows.
1. A CNN does not generalize to previously unseen objects that are not present in the
training set. In medical image segmentation, annotations are expensive to gather because
they require both expert knowledge and considerable time to produce accurately. This
restricts the ability of CNNs to segment images whose annotations are not present in the
training set [4].
2. Recent methods do not adapt to different test images and require image-specific
learning to deal with the large context variations among images.
Deep learning is a set of algorithms in machine learning [5] that can automatically and
efficiently analyze medical images for disease diagnosis. In recent years, deep learning has
become popular because it facilitates higher levels of abstraction, which yields better
predictions from a given data set. In this paper, proposed deep learning based medical image
segmentation techniques are compared in order to identify challenges for future research.
The rest of the paper is organized as follows. Section II describes the architecture of deep
learning techniques in detail. Section III addresses the application of deep learning in the
medical field and briefly discusses the pros and cons of each algorithm. Finally, the conclusion
is drawn with suggestions for future research.
To show that deep learning technologies have permeated the entire field of medical image
segmentation
To identify the difficulties in using deep learning effectively in medical image
segmentation
U-Net
The U-Net [7] architecture is built upon the FCN and adapted so that it yields better
segmentation of medical images. An important innovation of the U-Net architecture is that
upsampling and downsampling layers are used in equal numbers. U-Net differs from FCN-8 in two
ways: (1) U-Net is symmetric, and (2) the skip connections between convolution and deconvolution
layers apply a concatenation operator between the contracting and expanding paths. The aim of the
skip connections is to provide both local and global information during upsampling. From the
training viewpoint, U-Net produces the segmentation map directly by processing the image in a
single forward pass. Owing to its symmetry, U-Net has a large number of feature maps in the
upsampling layers, which permits information to be transferred. The U-Net architecture is shown
in Fig. 2.
In U-Net, the contracting path consists of 4 blocks, each of which contains 3×3
convolutional layers and an activation function with batch normalization. U-Net starts with 64
feature maps and doubles the number of feature maps at each pooling step. To segment a medical
image, the contracting path captures the context of the input image and transfers it to the
expanding path.
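The feature-map arithmetic described above can be illustrated with a short sketch. It only
counts channels and does not build a network; the function names `contracting_channels` and
`concat_skip` are hypothetical helpers introduced for this illustration, and the concatenation
example assumes the decoder map has as many channels as the encoder map it is joined with.

```python
def contracting_channels(start=64, blocks=4):
    """Return the number of feature maps produced by each contracting block."""
    channels = []
    current = start
    for _ in range(blocks):
        channels.append(current)
        current *= 2  # the number of feature maps doubles at each pooling step
    return channels


def concat_skip(encoder_channels, decoder_channels):
    """Channel count after a skip connection concatenates two feature maps."""
    return encoder_channels + decoder_channels


print(contracting_channels())  # [64, 128, 256, 512]
print(concat_skip(64, 64))     # 128
```

Because the skip connection concatenates rather than adds, the convolution that follows it in
the expanding path sees the channels of both paths.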
V-Net
In medical image segmentation, diagnostic images often come in 3D format, so the ability to
perform volumetric segmentation by considering the entire volume content directly is of
particular relevance. The aim of the V-Net [8] architecture is to segment prostate MRI volumes.
This is a difficult task because the prostate can look different across scans owing to
deformation and to variations in the intensity distribution. V-Net uses the power of fully
convolutional neural networks to process MRI volumes. In contrast to other recent approaches,
V-Net avoids processing the input volumes slice-wise and maximizes the Dice coefficient to
obtain an accurate segmentation. V-Net segments MRI volumes more quickly and efficiently than
other networks.
In training, the Dice overlap coefficient is computed between the predicted segmentation and
the ground truth annotation. Fig. 3 shows a graphical representation of the V-Net architecture.
The compression path is on the left side and the decompression path is on the right side. The
network uses volumetric kernels of size 5×5×5 voxels to perform the convolution at each stage.
The left side of the network is further divided into a number of stages that operate at
different resolutions.
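As a rough illustration of the Dice overlap used as a training objective, the sketch below
computes one common soft formulation, with squared terms in the denominator, over flattened
voxel lists. The function name `dice_coefficient` and the smoothing constant `eps` are
assumptions made for this example and are not taken from the text above.

```python
def dice_coefficient(pred, truth, eps=1e-7):
    """Soft Dice overlap between predicted probabilities and binary ground truth.

    pred and truth are flat sequences of equal length, one value per voxel.
    eps (an assumption of this sketch) avoids division by zero on empty masks.
    """
    intersection = sum(p * g for p, g in zip(pred, truth))
    denominator = sum(p * p for p in pred) + sum(g * g for g in truth)
    return (2.0 * intersection + eps) / (denominator + eps)


prediction = [1.0, 1.0, 0.0, 0.0]
ground_truth = [1.0, 0.0, 0.0, 0.0]
print(dice_coefficient(prediction, ground_truth))  # close to 2/3
```

Maximizing this value, or equivalently minimizing one minus it, rewards overlap directly and
is therefore less sensitive to the imbalance between foreground and background voxels than a
per-voxel loss.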
GoogleNet
GoogleNet [9] was developed by Google. It contains 22 layers and introduces the inception
module. This module consists of a number of very small convolutional layers whose purpose is to
reduce the number of required parameters.
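To see how small convolutions reduce the parameter count, the following sketch compares a
direct 5×5 convolution with a 1×1 reduction followed by a 5×5 convolution. The channel sizes
(192 in, 16 reduced, 32 out) are illustrative assumptions chosen for this example, and
`conv_params` is a hypothetical helper.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Direct 5x5 convolution from 192 to 32 channels.
direct = conv_params(192, 32, 5)

# 1x1 convolution down to 16 channels, then the 5x5 convolution.
reduced = conv_params(192, 16, 1) + conv_params(16, 32, 5)

print(direct)   # 153632
print(reduced)  # 15920
```

Under these assumed sizes the reduced branch needs roughly a tenth of the parameters of the
direct convolution.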
ResNet
ResNet [11] builds its architecture on the novel approach of skip connections together with
heavy use of batch normalization. It can train a network with 152 layers while keeping lower
complexity than previous networks. ResNet is constructed from multiple residual modules. An
illustration of the residual module follows.
The residual module has two choices: it may either carry out a set of operations on its input
or skip this step altogether. Much as in GoogleNet, these modules are stacked to construct a
complete end-to-end network. Some additional novel methods introduced by ResNet are described as
follows.
This can be carried out together with a reasonable initialization function that
keeps the training stable.
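The residual module described above can be sketched as an identity shortcut added to a small
learned transformation. This is a simplified illustration: it uses plain matrix products on
vectors in place of the convolutions and batch normalization of the real network, and the
helper names `relu`, `linear`, and `residual_module` are assumptions of this example.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_module(x, w1, w2):
    """Return relu(F(x) + x), where F is two linear maps with a ReLU between them."""
    fx = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the unchanged input back to the transformed signal.
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_module([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
```

With all weights at zero the module passes its input through unchanged, which corresponds to
the option of skipping the step altogether.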
RCNN
RCNN computes CNN features for each object by using selective search to extract a large
number of object proposals. It uses class-specific SVM classifiers to classify each region in
the image. RCNN addresses the more challenging tasks of image segmentation and object
detection. In RCNN image segmentation, two kinds of features are extracted from each region:
foreground features and full-region features. The two are combined to achieve good segmentation
performance. At the testing stage, the region-based predictions are converted into pixel-based
predictions: each pixel is classified according to the highest-scoring region that contains it.
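The conversion from region-based to pixel-based predictions can be sketched as follows. The
data layout, a list of `(pixels, label, score)` tuples, and the function name
`regions_to_pixel_labels` are assumptions made for this illustration; pixels not covered by any
region receive a background label.

```python
def regions_to_pixel_labels(height, width, regions, background="background"):
    """Assign each pixel the label of the highest-scoring region that covers it.

    regions is a list of (pixels, label, score), where pixels is a set of
    (row, col) coordinates belonging to the region.
    """
    labels = [[background] * width for _ in range(height)]
    best = [[float("-inf")] * width for _ in range(height)]
    for pixels, label, score in regions:
        for r, c in pixels:
            if score > best[r][c]:
                best[r][c] = score
                labels[r][c] = label
    return labels


regions = [
    ({(0, 0), (0, 1)}, "organ", 0.6),
    ({(0, 1), (1, 1)}, "lesion", 0.9),
]
print(regions_to_pixel_labels(2, 2, regions))
# [['organ', 'lesion'], ['background', 'lesion']]
```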
SegNet
SegNet [13] is a deep learning architecture for image segmentation that has an encoder and a
corresponding decoder followed by a pixel-wise classification layer. The encoder generates
feature maps by convolving its input with a filter bank. The decoder upsamples its input feature
maps using the max-pooling indices recorded in the corresponding encoder feature maps. As a
result, sparse feature maps are generated.
These feature maps are then convolved with a trainable decoder filter bank to create dense
feature maps. Finally, batch normalization is applied to the feature maps. The decoded feature
maps are passed to a soft-max classifier that generates independent class probabilities for
every pixel. The output of the soft-max classifier is an image with k channels of probabilities.
SegNet retains high-frequency details efficiently in the segmented image by passing the pooling
indices of the encoder to the corresponding decoder. The comparison of deep learning
architectures is shown in Table 1.
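The role of the pooling indices in SegNet can be illustrated with a small sketch of 2×2 max
pooling that records where each maximum came from, followed by an unpooling step that places
the values back at those positions. This is a minimal illustration on a single feature map with
even height and width; the function names are assumptions of this example.

```python
def max_pool_with_indices(fmap):
    """2x2 max pooling with stride 2; returns pooled values and argmax positions."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        row_values, row_positions = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            row_values.append(value)
            row_positions.append(position)
        pooled.append(row_values)
        indices.append(row_positions)
    return pooled, indices


def unpool(pooled, indices, height, width):
    """Place each pooled value at its recorded position; all other entries are zero."""
    sparse = [[0] * width for _ in range(height)]
    for row_values, row_positions in zip(pooled, indices):
        for value, (r, c) in zip(row_values, row_positions):
            sparse[r][c] = value
    return sparse


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 0, 3, 1]]
pooled, indices = max_pool_with_indices(feature_map)
print(pooled)                         # [[4, 5], [6, 7]]
print(unpool(pooled, indices, 4, 4))  # sparse map with 4, 5, 7, 6 restored in place
```

The unpooled map is sparse, which is why the decoder follows it with trainable convolutions
that turn it into a dense feature map.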
Table 1. Comparison of Deep Learning Architectures
DeepLab
DeepLab [15] performs semantic image segmentation by applying atrous convolution, that is,
convolution with upsampled filters, to extract dense features. Atrous convolution allows the
resolution at which feature responses are computed within the Deep Convolutional Neural Network
(DCNN) to be controlled explicitly. DeepLab further extends this to atrous spatial pyramid
pooling, which captures objects as well as image context at multiple scales. The localization of
object boundaries is improved by combining probabilistic graphical models with DCNNs: the DCNN
is combined with a fully connected conditional random field to produce accurate semantic
predictions and segmentation maps that follow the boundary of each object.
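Atrous convolution can be illustrated in one dimension: the kernel taps are spread apart by a
rate, so the same number of weights covers a wider field of view. The sketch below is a
simplified 1D version without padding; the function name `atrous_conv1d` and the example signal
are assumptions of this illustration.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1D atrous convolution (no padding, no kernel flip).

    Consecutive kernel taps are applied to samples that are `rate` positions apart.
    """
    span = (len(kernel) - 1) * rate  # distance covered by the dilated kernel
    return [
        sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(atrous_conv1d(signal, kernel, rate=1))  # [-2, -2, -2, -2, -2]
print(atrous_conv1d(signal, kernel, rate=2))  # [-4, -4, -4]
```

With rate 2 the three-tap kernel spans five samples instead of three while using the same
three weights, which is the sense in which the filter is upsampled.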
DeepIGeoS
The Deep Interactive Geodesic framework (DeepIGeoS) [16] is a deep learning based interactive
framework for 2D and 3D image segmentation. To make user interaction more effective, DeepIGeoS
uses two stages. In the first stage, an initial segmentation is obtained automatically by
P-Net. In the second stage, the user interacts to indicate mis-segmented regions, and R-Net
refines the result. The user interactions are filtered, converted, and integrated into the input
of R-Net. Both P-Net and R-Net use a resolution-preserving structure, so features at each level
are captured without any loss in resolution. Once R-Net has been trained on user interactions,
retraining the convolutional neural network on a large training set takes a reasonable amount of
time.
DeepCut
DeepCut [17] integrates iterative graphical optimization with Deep Learning to obtain
pixel-wise segmentation. Medical images are segmented given an image data set with corresponding
bounding-box annotations. The training targets are updated iteratively using a CNN model, and
the segmentations are regularized with a fully connected conditional random field. The training
targets are defined per voxel location, with each voxel described by an image patch. DeepCut is
therefore formulated in a generic form and is readily applicable to any medical images.
DeepMedic
DeepMedic [18] presents a dual-pathway, eleven-layer 3D DCNN for segmenting brain lesions in
multi-modal MRI. It contains two important components: (1) a 3D Convolutional Neural Network,
which produces an accurate segmentation map, and (2) a fully connected 3D CRF, which imposes
regularization constraints on the output of the CNN and generates the final segmentation labels.
The dual pathway processes the input images at different scales concurrently in order to
integrate both local and global contextual information. DeepMedic is assessed on three different
brain MRI lesion segmentation tasks: (1) traumatic brain injury, (2) ischemic stroke, and
(3) brain tumors. In the training stage, a dense training scheme is used to analyze each image.
DeepOrgan
DeepOrgan [19] provides a probabilistic bottom-up approach for pancreas segmentation in
abdominal CT scans using multi-level DCNNs. It proposes a coarse-to-fine classification that
moves from densely labeled image patches to regions and whole organs. It generates initial
superpixels for the medical images, and these superpixels act as local contextual information
with low precision. DeepOrgan provides a dense classification of local image patches through
P-ConvNet and nearest-neighbor fusion. Class probabilities are then assigned to each superpixel
region of the image.
DCAN
Deep Contour-Aware Networks (DCAN) [20] extract multi-level contextual features from a
hierarchical architecture and perform gland segmentation with auxiliary supervision. The
discriminative capability of intermediate features is improved when multi-level regularization
is integrated during training. DCAN handles three important challenges of gland segmentation.
First, the probability map is generated directly in a single forward propagation, which suits
the analysis of large quantities of images. Second, glandular structure can be analyzed easily
in biopsy samples, both benign and malignant. Finally, gland contours and gland objects are
handled as separate tasks within a multi-task training framework. The comparison of deep
learning applications is shown in Table 2.
IV Conclusion
This paper provides a brief overview of Deep Learning architectures and their application to
image segmentation in the medical field. The aim of this survey is to present the applications
of deep learning in medical image analysis in a crisp and accessible way. The medical field
involves many issues in analyzing the diseases of different kinds of patients. Therefore, this
paper concentrates on the architectures of DL and the applications of DL in the medical field.
It also discusses in detail the difficulties of analyzing particular diseases in specific
organs. The survey concludes that DL can provide sensible support for specialists in the medical
field. In future, applications in various other fields will be considered.
References
[1] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. “Deep Learning.” MIT Press (2016).
[2] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, “Deep learning for visual
understanding: a review,” Neurocomputing, vol. 187, pp. 27–48, 2016.
[3] Bahrampour, Soheil, et al. “Comparative study of deep learning software frameworks.” arXiv
preprint arXiv:1511.06435 (2015).
[4] Dou, Q.; Yu, L.; Chen, H.; Jin, Y.; Yang, X.; Qin, J.; Heng, P.A. 3D deeply supervised network
for automated segmentation of volumetric medical images. Med Image Anal. 2017, 41, 40–54.
[5] Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Larochelle, H.
Brain tumor segmentation with Deep Neural Networks. Med. Image Anal. 2017, 35, 18–31.
[6] Mehta, R.; Majumdar, A.; Sivaswamy, J. BrainSegNet: A convolutional neural network
architecture for automated segmentation of human brain structures. J. Med. Imaging (Bellingham)
2017, 4, 024003.
[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for
Biomedical Image Segmentation,” Computer Vision and Pattern Recognition, 2015.
[8] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi, “V-Net: Fully Convolutional Neural
Networks for Volumetric Medical Image Segmentation”,