
2020 International Conference on Virtual Reality and Visualization (ICVRV)

SKIN LESION SEGMENTATION BASED ON MASK R-CNN

Cheng Huang
School of Information and Software Engineering, UESTC
Chengdu, P.R. China
Email: 784796744@qq.com

Yiwen Wang
School of Information and Software Engineering, UESTC
Chengdu, P.R. China
Email: 869513083@qq.com

Anyuan Yu
School of Information and Software Engineering, UESTC
Chengdu, P.R. China
Email: andrewyuanyuan@163.com

Honglin He
School of Software Engineering, CQUPT
Chengdu, P.R. China
Email: 452662218@qq.com

978-0-7381-4252-4/20/$31.00 ©2020 IEEE
DOI 10.1109/ICVRV51359.2020.00024

Abstract—Dermatological segmentation has always been a hot topic in medical imaging. At present, many algorithms have achieved good results in the segmentation of skin diseases, such as super-pixel segmentation and the U-Net network. The method used in this paper is an improvement on the instance segmentation model Mask R-CNN. First, we trained the classification branch of Mask R-CNN in advance. Second, we adjusted some of the parameters of Mask R-CNN. These two changes give our method higher segmentation accuracy and detection accuracy than the traditional Mask R-CNN. The data set used in this paper comes from ISIC (International Skin Imaging Collaboration). Experimental results demonstrate that our method segments skin lesion images better than the traditional Mask R-CNN.

Keywords—Deep learning; Skin Segmentation; Mask R-CNN; CNN

I. INTRODUCTION

Today, deep learning models are widely used in various fields. They have made enormous contributions to medical imaging, especially to dermatological segmentation. Many computer vision laboratories are involved in research on dermatological segmentation algorithms. As a result, many classic algorithms have appeared, such as the Fully Convolutional Network (FCN) [1], the super-pixel segmentation algorithm [2], and U-Net [3], which are milestones in dermatological segmentation. Many later dermatological segmentation algorithms are based on them. Nevertheless, they still have problems to be solved. Because of the down-sampling caused by pooling and the large receptive fields of the convolutional layers, the lesion segmentation predicted by FCN is sometimes vague and lacks lesion boundary detail [2]. The super-pixel segmentation algorithm does not take into account differences from the ground-truth segmentation masks of the training data set. For U-Net, overfitting is hard to avoid as the number of iterations increases. These problems have always made it difficult to improve dermatological segmentation algorithms. As for the skin disease itself, it is difficult to determine the boundary of the affected area accurately by eye. The type of skin disease is also difficult to judge because of the visual similarity between types. The resulting delay may cause the patient's condition to deteriorate. Nevi and melanomas (as shown in Fig. 1) are so similar in physical characteristics that even experienced doctors cannot distinguish them at first sight.

Fig. 1. The left image is a nevus, and the right image is a melanoma. They do not differ much in color, shape, or distribution area, so judging them manually takes a certain amount of time and effort.

If skin disease develops into skin cancer, the roughly 5% of cases that are malignant account for around 75% of deaths [4], which seriously endangers human health. At the same time, the development of methods and tools for the automated diagnosis of skin lesions could provide low-cost medical help around the world and eventually benefit humanity, especially people in less-developed areas that lack professional dermatologists and where professional help is expensive [5]. Medical experts in South Korea have obtained good results by using deep learning algorithms to detect skin diseases [6]; the accuracy even exceeds that of the corresponding medical experts. We verified their results by reproducing the code of their paper and using it to detect skin diseases. On this basis, we also studied the deep learning model they adopted, Faster R-CNN [7], replacing the original non-maximum suppression algorithm (NMS) [8] with Soft-NMS [9] to optimize the selection of candidate boxes. The successful application of their deep learning model in the medical field prompted us to go deeper in this direction. We also found a more powerful model based on Faster R-CNN, called Mask R-CNN [10]. Mask R-CNN originates from the R-CNN [11] series. R-CNN builds on convolutional neural networks (CNN) [12] and was the first deep learning algorithm applied to object detection. Based on R-CNN, Fast R-CNN [13], Faster R-CNN, and Mask R-CNN
came out one after another. The model used in this paper is based on Mask R-CNN, with certain improvements.
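
To illustrate the difference between NMS and the Soft-NMS [9] mentioned above, the following is a minimal NumPy sketch of Gaussian Soft-NMS. It is an illustration under assumptions, not the code used in this paper: the function names `iou_one_to_many` and `soft_nms`, the `[x1, y1, x2, y2]` box layout, the values of `sigma` and `score_threshold`, and the example boxes are all hypothetical. Instead of deleting a box that overlaps the current best box, Soft-NMS multiplies its score by exp(-IoU^2 / sigma).

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes."""
    scores = scores.astype(float).copy()  # do not modify the caller's array
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Pick the remaining box with the highest (possibly decayed) score.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        if remaining:
            idx = np.array(remaining)
            overlaps = iou_one_to_many(boxes[best], boxes[idx])
            # Gaussian penalty: heavy overlap means strong score decay.
            scores[idx] *= np.exp(-(overlaps ** 2) / sigma)
            # Discard boxes whose score has decayed below the threshold.
            remaining = [i for i in remaining if scores[i] >= score_threshold]
    return keep, scores


# Hypothetical candidate boxes: the first two overlap heavily.
boxes = np.array([
    [10.0, 10.0, 110.0, 110.0],
    [12.0, 12.0, 112.0, 112.0],
    [200.0, 200.0, 260.0, 260.0],
])
scores = np.array([0.90, 0.80, 0.70])
keep, new_scores = soft_nms(boxes, scores)
print(keep, new_scores)
```

In this sketch the heavily overlapping second box is not removed; it is only ranked lower, which is the behavior that distinguishes Soft-NMS from the hard removal of classic NMS.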
[Figure 2 diagram; labels: Stage 1 (FPN, feature map, RPN, regression, proposals); Stage 2 (NMS, feature map, classification, bounding-box regression, mask prediction).]

Fig. 2. The architecture of the Mask R-CNN

Mask R-CNN is a model that is easy to extend and improve, and it is robust and stable. It extends Faster R-CNN with a mask branch that adds only a small computational overhead, enabling a fast system and rapid experimentation [10]. Mask R-CNN is also quite successful in the field of instance segmentation and is the basis for many later instance segmentation models. These models will be compared with Mask R-CNN. Our method pre-trains the classification branch of Mask R-CNN and modifies some of its settings to adapt it to the skin lesion segmentation task.

The dermatological data set we used is from ISIC (International Skin Imaging Collaboration) [14], which has 23906 images of skin lesions. We selected and downloaded the data set used in the dermatological segmentation competition for training and testing our method.

The rest of the paper is organized as follows. Section II introduces related work. Section III describes our methodology. Section IV presents experimental results on a skin lesion data set. Section V concludes the paper.

II. RELATED WORK

Following Faster R-CNN, Mask R-CNN became a landmark in the field of instance segmentation. Many researchers have built on Mask R-CNN in their own research projects. For example, [15], [16], and [17] improve Mask R-CNN for the recognition and segmentation of ships, lung nodules, and remote sensing images, respectively.

We borrow their improvement methods and have adjusted Mask R-CNN after studying the characteristics of skin diseases in the skin disease data set, while also making use of the capabilities of Mask R-CNN itself. To address the difficulty of judging the type of skin disease, we pre-trained the classification network of Mask R-CNN; to address the segmentation regions and their candidate boxes, we adjusted some settings of Mask R-CNN after studying its structure.

At the same time, we read literature on skin diseases and on the corresponding segmentation algorithms to understand skin diseases better. Taking the characteristics of skin diseases into account, we can tailor Mask R-CNN to them more specifically.

To verify the effectiveness of our method, we adopted the ISIC skin disease data set; the experimental results and related conclusions are described later.

III. METHODOLOGY

A. Mask R-CNN

We constructed our framework based on Mask R-CNN, as shown in Fig. 2. Mask R-CNN, a general framework for object instance segmentation, accurately detects objects in an image and simultaneously generates a segmentation mask for each instance. In skin lesion data sets, most images contain only one instance to segment, but we hold the segmented area of that instance to a higher standard of accuracy. Mask R-CNN contains two stages. The first stage proposes candidate object bounding boxes with the Region Proposal Network (RPN). The second stage consists of a Fast R-CNN classifier and a binary mask prediction branch. The detailed steps of each stage are as follows:

Stage 1: The original image enters the Feature Pyramid Network (FPN) [18], a vital part of the feature extractor of Mask R-CNN, which extracts features and produces a feature map. The feature map passes through the RPN to generate candidate boxes, which are then combined with the feature map to give a feature map with candidate boxes. The number of candidate boxes is quite large, so a selection is needed in the next stage.

Stage 2: The feature map with candidate boxes is screened by the NMS algorithm to obtain the best candidate boxes. NMS is widely used in object detection algorithms to eliminate redundant candidate boxes and find the best object detection position. The feature map then passes through three branches, classification, pixel-level segmentation, and candidate box refinement, to obtain the final result. Pixel-level segmentation takes place in the mask prediction branch, which judges at the pixel level whether each pixel belongs to the target object: if it does, the pixel is marked as part of the segmentation; otherwise, it is not. The mask prediction branch consists of four convolution layers and one deconvolution layer, and by feeding the features into it, the model predicts the target mask of the skin lesion area.

B. Our Method

Owing to the characteristics of skin diseases, moles and melanoma are not easy to distinguish by their physical features.
We pre-trained the Mask R-CNN classification network and assigned corresponding weights before the formal training.
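
To make the NMS screening described for Stage 2 in Section III-A more concrete, here is a minimal NumPy sketch of greedy non-maximum suppression. It is a sketch under assumptions rather than the implementation used in this paper: the function names `box_iou` and `nms`, the `[x1, y1, x2, y2]` box layout, and the example boxes are hypothetical, and the threshold of 0.75 is simply borrowed from the IoU row of Table 1 for illustration.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = box_iou(boxes[best], boxes[rest])
        # Only boxes that do not overlap the kept box too much survive.
        order = rest[overlaps <= iou_threshold]
    return keep


# Hypothetical candidate boxes: the first two cover the same lesion.
boxes = np.array([
    [10.0, 10.0, 110.0, 110.0],
    [12.0, 12.0, 112.0, 112.0],
    [200.0, 200.0, 260.0, 260.0],
])
scores = np.array([0.90, 0.80, 0.70])
kept = nms(boxes, scores, iou_threshold=0.75)
print(kept)
```

A higher threshold suppresses fewer boxes, because two candidates must overlap more before one of them is treated as redundant.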

Fig. 3. Comparison of segmentation results of skin diseases. The first line is the original image, the second line is the Mask R-CNN segmentation map, and
the third line is the result of our method.

Meanwhile, according to the distribution area and color characteristics of the skin disease, we modified the settings of some Mask R-CNN parameters. The specific modifications and comparisons are given in Table 1:

Table 1: The parameters' adjustments and comparisons

Parameter          Mask R-CNN  Our Method
IoU                0.7         0.75
learning rate      0.001       0.001/0.0001
NUM_CLASSES        40          3
WEIGHT_DECAY       0.0001      0.00001
learning_momentum  0.9         0.95

Changing IoU adapts well to changes in the candidate boxes. A proper learning rate helps the objective function converge to a local minimum in a suitable time, so we lowered it to 0.0001 in the later stage of training. Since there is usually only one dermatological object in an image of the ISIC dermatological data set, and never more than three, we changed NUM_CLASSES to three to improve the accuracy. Excessive WEIGHT_DECAY will lead to overfitting, which is a problem of U-Net, so we reduced it. As for learning momentum, a proper value buffers gradient changes better. We also changed the backbone framework of Mask R-CNN and compared the results obtained with different frameworks.

We referred to [19] when making all of these parameter modifications. We chose the values above because of their good experimental results and the corresponding supporting data.

IV. EXPERIMENT RESULTS

A. Dataset

To demonstrate the effectiveness of our method, we downloaded skin lesion data sets from ISIC. There are two types of skin disease in the dermatology data set: nevus and melanoma. Each image has associated metadata that helps researchers understand the characteristics and pathology of the skin disease shown. We downloaded two dermatology data sets, one for training and the other for testing.

The training data set contains 500 skin lesion images of two types: melanoma (238 images) and nevus (262 images).

The testing data set mixes these two types (500 = 250 melanoma + 250 nevus).

The image sizes for training and testing are 1022×767, 1504×1129, and 962×762.

B. Dataset Pre-processing

We downloaded the metadata file from the ISIC website to learn more about skin diseases. Manually labeling an image with LabelMe generates a .json file. Using the corresponding conversion code, this .json file can be converted into five files: the original image, the original image with annotations, the label image, a .txt file, and a .yaml file.

C. Experiment Results

As shown in Fig. 3, we compared the effects of dermatological segmentation. Among them, the first line is the original image, the second line is the Mask R-CNN

segmentation map, and the third line is the result of our method. We can see that our method is more precise and detailed than the traditional Mask R-CNN. Our method can also frame more of the areas where the diseased region is not obvious.

The segmentation accuracy of our method and Mask R-CNN is shown in Table 2:

Table 2: Segmentation performance of our method

Model        AP
Mask R-CNN   0.9085
Our method   0.9192

Compared with the traditional Mask R-CNN, our method improves segmentation accuracy by 1.07 percentage points. The comparison of the segmentation maps in Fig. 3 also shows that our method handles details better. There are many fuzzy boundaries where the diseased area is not obvious, but our method can still roughly delineate them. Meanwhile, the segmentation performance of our method with different backbones is compared in Table 3:

Table 3: Segmentation performance among different backbones

Backbone    AP
LeNet       0.9118
AlexNet     0.9134
VGG-16      0.9157
ResNet50    0.9081
ResNet101   0.9192

With ResNet101 as the backbone, our method performs better than with the other backbones, which also provides a reference for running our subsequent higher-level experiments under this framework. As for the detection accuracy of dermatological types, the results are shown in Table 4:

Table 4: Detection performance of our method

Model        Precision  Recall  AP
Mask R-CNN   0.8837     0.9064  0.8936
Our method   0.9021     0.9187  0.9085

Our method has higher detection and segmentation accuracy than the original Mask R-CNN, which also verifies the effectiveness of our model.

V. CONCLUSION

In this paper, we propose an improved Mask R-CNN, and our new method segments and detects skin lesions more effectively than Mask R-CNN. It segments the skin disease area more accurately and effectively, without producing incomplete segmentation when the skin disease area is not obvious.

In future work, we will try other models for segmenting skin diseases and improve them to adapt to new needs and meet new challenges. In particular, dermatosis on dark skin is hard to identify and segment because of the subtle color differences. We will continue our research in this direction.

REFERENCES

[1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3431-3440.
[2] B. Bozorgtabar, S. Sedai, P. Kanti Roy, and R. Garnavi, "Skin lesion segmentation using deep convolution networks guided by local unsupervised learning," IBM Journal of Research and Development, pp. 6:1-6:8, September 2017.
[3] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," arXiv:1505.04597v1 [cs.CV], 18 May 2015.
[4] S. Anand, R. Verma, C. Vaja, A. Shah, and K. Gaikwad, "Metastatic Malignant Melanoma: A Case Study," International Journal of Scientific Study, Vol. 4, Issue 6, pp. 2321-6379, September 2016.
[5] Tatyana Polevaya, Roman Ravodin, and Andrey Filchenkov, "Skin Lesion Primary Morphology Classification With End-To-End Deep Learning Network," 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Feb. 2019, pp. 247-250.
[6] Seung Seog Han, Myoung Shin Kim, Woohyung Lim, Gyeong Hun Park, Ilwoo Park, and Sung Eun Chang, "Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm," Journal of Investigative Dermatology, Vol. 138, Issue 7, pp. 1529-1538, July 2018.
[7] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, Issue 6, pp. 1137-1149, June 2017.
[8] R. Girshick, F. Iandola, T. Darrell, and J. Malik, "Deformable part models are convolutional neural networks," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 437-446, June 2015.
[9] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, "Soft-NMS: Improving object detection with one line of code," in 2017 IEEE International Conference on Computer Vision (ICCV), October 2017, pp. 5562-5570.
[10] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, "Mask R-CNN," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2980-2988, October 2017.
[11] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, June 2014.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, Vol. 86, Issue 11, pp. 2278-2324, November 1998.
[13] Ross Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440-1448, December 2015.
[14] ISIC, International Skin Imaging Collaboration, Available: https://www.isic-archive.com/#!/topWithHeader/onlyHeaderTop/gallery
[15] Shanlan Nie, Zhiguo Jiang, Haopeng Zhang, Bowen Cai, and Yuan Yao, "Inshore Ship Detection Based on Mask R-CNN," IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, July 2018, pp. 693-696.
[16] Menglu Liu, Junyu Dong, Xinghui Dong, Hui Yu, and Lin Qi, "Segmentation of Lung Nodule in CT Images Based on Mask R-CNN," 2018 9th International Conference on Awareness Science and Technology (iCAST), Sept. 2018, pp. 99-100.
[17] Hao Su, Shunjun Wei, Min Yan, Chen Wang, Jun Shi, and Xiaoling Zhang, "Object Detection and Instance Segmentation in Remote Sensing Imagery Based on Precise Mask R-CNN," IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, November 2019, pp. 1454-1457.
[18] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 936-944.
[19] Guofeng Lv, Ke Wen, Zheng Wu, Xu Jin, Hong An, and Jie He, "Nuclei R-CNN: Improve Mask R-CNN for Nuclei Segmentation," 2019 IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP), September 2019, pp. 357-362.
