ARTICLE INFO

Keywords: Deep learning; Convolutional neural network; Breast cancer; FNAC

ABSTRACT

Fine needle aspiration cytology (FNAC) entails using a narrow-gauge (25–22 G) needle to collect a sample of a lesion for microscopic examination. It allows a minimally invasive, rapid diagnosis of tissue but does not preserve its histological architecture. FNAC is commonly used for the diagnosis of breast cancer, with traditional practice based on the subjective visual assessment of breast cytopathology cell samples under a microscope to evaluate the state of various cytological features. There are therefore many challenges in maintaining consistency and reproducibility of findings. However, the advent of digital imaging and computational aids to diagnosis can improve diagnostic accuracy and reduce the effective workload of pathologists. This paper presents a comparison of several fine-tuned, transfer-learned deep convolutional neural network (CNN) classification approaches for the diagnosis of the cell samples. The proposed approach has been tested using the VGG16, VGG19, ResNet-50 and GoogLeNet-V3 (aka Inception V3) CNN architectures on an image dataset of 212 images (99 benign and 113 malignant), later augmented and cleansed to 2120 images (990 benign and 1130 malignant), where each network was trained on 80% of the cell-sample images and tested on the rest. This paper presents a comparative assessment of the models, giving a new dimension to FNAC study, where GoogLeNet-V3 (fine-tuned) achieved a highly satisfactory accuracy of 96.25%.
⁎ Corresponding author.
E-mail addresses: ar5saikia@gmail.com (A.R. Saikia), kangkana.bora89@gmail.com (K. Bora), dranupdas@gmail.com (A.K. Das).
https://doi.org/10.1016/j.tice.2019.02.001
Received 13 December 2018; Received in revised form 30 January 2019; Accepted 2 February 2019
Available online 05 February 2019
0040-8166/ © 2019 Elsevier Ltd. All rights reserved.
A.R. Saikia et al. Tissue and Cell 57 (2019) 8–14
using breast cytopathology samples and CNN based breast FNAC cell sample classification.

The limited number of data samples collected did not limit the study, as various data augmentation and cleansing methods were employed. Comparing various segmentation methods and moving forward with the most promising one, after the formation of a proper dataset, enabled us to move forward with our research.

The paper is organized as follows: Section 2 describes the prior art for computer vision and machine learning techniques used in breast cancer diagnosis by FNAC image analysis, and deep learning techniques used in cytopathology image analysis; Section 3 describes the materials and methods used, including the image dataset developed by us and used during experimentation for this paper, and also presents details of the experimental setup; Section 4 presents the classification results obtained, along with a discussion of the findings. Conclusions for the study are presented in Section 5.

2. Prior art

Advancement in machine learning and AI has shown us new promising paths in the field of medical imaging. Researchers and practitioners of pathology have been using quantitative analysis to improve diagnostic accuracy and to reduce the effective workload of a pathologist. With recent developments in cost-effective and high-performance computer technology, digital pathology has become amenable to the application of quantitative analysis in the form of decision support systems (West and West, 2000; Wolberg and Mangasarian, 1990) and CAD systems based on computer vision and machine learning techniques.

CAD systems for digital pathology applications have been developed and deployed for some time now (Irshad et al., 2014; Saha et al., 2016). Review of the prior art shows that computer vision systems commonly used in cytological diagnosis apply the bottom-up approach of diagnostic reasoning from evidence to hypothesis (Patel and Groen, 1986). It involves segmentation of primitives such as clusters, cells, and nuclei using image processing and segmentation techniques (Garud et al., 2017); quantification of diagnostically significant cytological criteria (Garud et al., 2012) using techniques that extract morphometric, densitometric, textural and structural features (Rodenacker and Bengtsson, 2003); followed by pattern recognition techniques for prediction of abnormalities and anomalies (Langer et al., 2015).

Recently, deep neural networks such as autoencoders (Xu et al., 2016) are increasingly finding their way into solving whole-slide cytopathology image analysis challenges while jointly learning the representative feature space and classification margin. Some contributions related to radiological and interventional image analysis include Havaei et al. (2017), Liu et al. (2016) and Sirinukunwattana et al. (2016). The prior art on the breast cancer histopathological image classification dataset (BreakHis database) (Spanhol et al., 2016b) had earlier used an AlexNet-based CNN architecture for classifying whole-slide histopathology and achieved 84–90% accuracy (Spanhol et al., 2016a). Das et al. (2017) combine predictions from a transfer-learned GoogLeNet (Szegedy et al., 2015) CNN architecture over random multiple images of a breast histopathology sample acquired at multiple magnifications to arrive at a whole-slide diagnosis, achieving 94.67% accuracy in multifold validation over the BreakHis database. Garud et al. (Garud, 2017) achieved 89.7% mean accuracy in 8-fold validation using the GoogLeNet architecture of CNN on an FNAC image dataset.

3. Material and methodology

3.1. FNAC database generation

For this study, FNAC cell sample images were generated at Ayursundra Healthcare Pvt. Ltd, Guwahati, India. Experienced doctors collected the samples from patients and prepared the slides. The processing and staining of the cell samples was done at their laboratory. The cell samples were collected from 20 patients following all ethical protocols. Images were captured by us using a Leica ICC50 HD microscope at 400× magnification and 24-bit colour depth, with the 5-megapixel camera attached to the microscope. The digitized images were then reviewed by experienced certified cytopathologists, who selected a total of 212 images for this study. The database can be downloaded from the link: https://1drv.ms/u/s!Al-T6d-_ENf6axsEbvhbEc2gUFs. Fig. 2 gives a glimpse of five such benign and five such malignant cell samples. We are considering only Pap stained images for the study.
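As noted in the abstract, each network was trained on 80% of the cell-sample images and tested on the remaining 20%. A minimal stratified-split sketch using the post-augmentation class counts quoted in the abstract (the file names and the `stratified_split` helper are illustrative, not the authors' code):

```python
import random

def stratified_split(items, labels, train_frac=0.8, seed=42):
    """Split (item, label) pairs so each class keeps ~train_frac in training."""
    rng = random.Random(seed)
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    train, test = [], []
    for lab, group in by_class.items():
        rng.shuffle(group)
        cut = int(round(train_frac * len(group)))
        train += [(g, lab) for g in group[:cut]]
        test += [(g, lab) for g in group[cut:]]
    return train, test

# Class counts from the abstract: 990 benign and 1130 malignant after augmentation.
items = [f"img_{i}" for i in range(2120)]      # hypothetical file identifiers
labels = ["benign"] * 990 + ["malignant"] * 1130
train, test = stratified_split(items, labels)
```

Stratifying per class keeps the benign/malignant ratio identical in both partitions, which matters when the classes are slightly imbalanced, as here.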
Fig. 2. Column I showing five benign samples and column II showing five malignant samples.
The generated database comprised 212 FNAC images (99 benign and 113 malignant). Ground truth was confirmed by a pathologist marking the cell sample images as benign or malignant. Along with the images, the corresponding original reports of the patients were also collected, which were later used for labelling the images.

For a smear-level study of the samples with deep learning, we require more data than the current count to feed into the CNN models. Being limited by cell samples, data augmentation was the prime focus to be implemented on the current data. Augmentation techniques such as cropping, shearing, rotation, mirroring, skewing, inverting and zooming were implemented, and the new count stood at 2120 images (990 benign and 1130 malignant). The image data, now better in count than before, was cleaned to focus specifically on the region of interest (ROI), i.e. the nucleus. To do so, an image was split into its RGB (red green blue) and CMY (cyan magenta yellow) colour channels, and we proceeded only with the red channel of the image due to its promising clear boundary highlighting only the ROI, as shown in Fig. 3. Working with the red channel also makes noise removal efficient. From Fig. 3 we can observe that debris such as red blood cells is removed automatically in the red channel output, and the nucleus regions are clear with their prominent boundaries. As can be seen, the row 2 and row 4 images of Fig. 3 contain many RBCs, which were completely removed in the red channel output. The output of colour channel decomposition of four images (two normal and two abnormal) is displayed in Fig. 3. This channel decomposition is used as a pre-processing step in the proposed approach.

Fig. 3. Output of all six channels after colour channel decomposition. Rows 1 and 2 are normal images and rows 3 and 4 are abnormal images. Each column shows the output of a different channel.

The actual data is in the ROI, i.e., the nucleus. The cytoplasm and red blood cells in the images are considered as noise and hence removed completely. Histogram equalization is also used on the red channel output, as a pre-processing step for complete removal of cytoplasm. The images are then auto-segmented to achieve this goal. Two basic thresholding techniques, namely global thresholding and Otsu thresholding, are employed to keep it simple. Between the two, Otsu thresholding performs better than global thresholding based on visual investigation; hence Otsu's threshold was selected for segmenting the whole dataset. We are not giving much focus to overlapped cell segmentation; rather, importance is given to finding the regions where nuclei are present. Highly overlapped areas are avoided in the study. Fig. 4(b) exemplifies the result of Otsu thresholding on a sample image, which displays better segmentation output than the global threshold in Fig. 4(a), due to its clear visual boundary around the nucleus and better ROI selection.

3.2. Training CNN for breast FNAC image classification

CNNs comprise a feed-forward family of deep networks, where in-between layers receive as input the features generated by the previous layer and pass their outputs to the subsequent layer. The strength of this network lies in learning hierarchical layers of concept representation, corresponding to different levels of abstraction. For image data, the lower levels of abstraction might describe the different oriented edges in the image; middle levels might describe parts of an object, while higher layers refer to larger object parts and even the object itself (Garud, 2017). In this study, several deep learning models, viz. VGG16
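Before any network sees an image, it passes through the Section 3.1 chain: keep the red channel, equalize its histogram, binarize with Otsu's threshold. A numpy-only sketch of those steps (a simplified stand-in for the authors' pipeline; whether nuclei fall below or above the threshold in the equalized red channel is an assumption here):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of an 8-bit grayscale image: exhaustively search the
    cut that maximizes between-class variance of the intensity histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty: no valid split at this level
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def equalize_hist(gray):
    """Classic histogram equalization for an 8-bit image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    return (cdf[gray] * 255).astype(np.uint8)

def segment_nuclei(rgb):
    """Red channel -> histogram equalization -> Otsu binarization.
    Nuclei are assumed darker than background in the red channel."""
    red = rgb[..., 0]
    red_eq = equalize_hist(red)
    return red_eq <= otsu_threshold(red_eq)
```

In practice a library routine (e.g. OpenCV's Otsu mode of `cv2.threshold`) would replace these hand-rolled functions; the sketch only makes the sequence of operations concrete.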
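The fine-tuned transfer learning applied to these architectures follows a standard recipe: the pretrained convolutional base is kept (frozen, or updated only gently) while a new classification head is trained on the FNAC data. A toy numpy illustration of that recipe, with a fixed random projection standing in for the pretrained base and synthetic two-class data standing in for the benign/malignant images (not the authors' actual training code or hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "base": a fixed random projection + ReLU stands in for a
# pretrained convolutional feature extractor; its weights never update.
W_base = rng.normal(size=(64, 32))

def base_features(x):
    return np.maximum(x @ W_base, 0.0)

# Synthetic two-class data standing in for benign/malignant images.
x = rng.normal(size=(200, 64))
y = (rng.random(200) < 0.5).astype(float)
x[y == 1] += 1.0  # shift class 1 so the classes are separable

# Extract features once (the base is frozen), then standardize them.
feats = base_features(x)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# New head: logistic regression trained by gradient descent.
w = np.zeros(32)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid predictions
    w -= lr * feats.T @ (p - y) / len(y)        # cross-entropy gradient step
    b -= lr * (p - y).mean()

p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
train_accuracy = ((p > 0.5) == (y == 1)).mean()
```

With a real framework the same pattern is typically expressed by loading an ImageNet-pretrained base, marking its layers non-trainable, and attaching a fresh dense softmax head; "fine-tuning" then unfreezes some top layers at a small learning rate.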
5. Conclusion

In this paper, we presented an FNAC breast cytopathology sample image dataset along with the performance of the VGG16, VGG19, ResNet50 and GoogLeNet-V3 CNN architectures in breast FNAC cell sample classification.

References

Bora, K., Chowdhury, M., Mahanta, L.B., Kundu, M.K., Das, A.K., 2016. Pap smear image classification using convolutional neural network. ACM International Conference Proceeding Series, ICVGIP, 18–22 December 2016, IIT Guwahati.
U.S. Breast Cancer Statistics, Technical Report. https://www.breastcancer.org/symptoms/understand_bc/statistics.
Das, K., Karri, S.P.K., Guha Roy, A., Chatterjee, J., Sheet, D., 2017. Classifying histopathology whole-slides using fusion of decisions from deep convolutional network on a collection of random multi-views at multi-magnification. IEEE International Symposium on Biomedical Imaging.
Demir, C., Yener, B., 2005. Automated Cancer Diagnosis Based on Histopathological Images: A Systematic Survey. Rensselaer Polytechnic Institute Tech. Rep.
Doreswamy, H., Umme Salma, M., 2015. Fast modular artificial neural network for the classification of breast cancer data. Proceedings of the Third International Symposium on Women in Computing and Informatics 66–72.
Ducatman, B.S., Wang, H.H., 2009. Chapter 8 – Breast. In: Cibas, E.S., Ducatman, B.S. (Eds.), Cytology, 3rd ed. pp. 221–254.
Garud, H.T., Sheet, D., Mahadevappa, M., Chatterjee, J., Ray, A.K., Ghosh, A., 2012. Breast fine needle aspiration cytology practices and commonly perceived diagnostic significance of cytological features: a pan-India survey. J. Cytol. 29 (3), 183.
Garud, H., Karri, S.P.K., Sheet, D., Maity, A.K., Chatterjee, J., Mahadevappa, M., Ray, A.K., 2017. Methods and system for segmentation of isolated nuclei in microscopic images of breast fine needle aspiration cytology images. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
Garud, H., 2017. High-magnification multi-views based classification of breast fine needle aspiration cytology cell samples using fusion of decisions from deep convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 828–833.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.-M., Larochelle, H., 2017. Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition 770–778.
Irshad, H., Veillard, A., Roux, L., Racoceanu, D., 2014. Methods for nuclei detection, segmentation, and classification in digital histopathology: a review, current status and future potential. IEEE Rev. Biomed. Eng. 7, 97–114.
Kocjan, G., 2006. Diagnostic dilemmas in FNAC cytology: difficult breast lesions. Fine Needle Aspiration Cytology. Springer, Berlin, Heidelberg, pp. 181–211.
Langer, L., Binenbaum, Y., Gugel, L., Amit, M., Gil, Z., Dekel, S., 2015. Computer-aided diagnostics in digital pathology: automated evaluation of early-phase pancreatic cancer in mice. Int. J. Comput. Assist. Radiol. Surg. 10, 1043–1054.
Liu, D.Y., Gan, T., Rao, N.N., Xing, Y.W., Zheng, J., Li, S., Luo, C.S., Zhou, Z.J., Wan, Y.L., 2016. Identification of lesion images from gastrointestinal endoscope based on feature extraction of combinational methods with and without learning process. Med. Image Anal. 32, 281–294.
Mathers, C., Fat, D.M., Boerma, J.T., 2008. The Global Burden of Disease: 2004 Update. World Health Organization.
Patel, V.L., Groen, G.J., 1986. Knowledge based solution strategies in medical reasoning. Cognit. Sci. 10, 91–116.
Rodenacker, K., Bengtsson, E., 2003. A feature set for cytometry on digitized microscopic images. Anal. Cell. Pathol. 25, 1–36.
Saha, M., Mukherjee, R., Chakraborty, C., 2016. Computer-aided diagnosis of breast cancer using cytological images: a systematic review. Tissue Cell 48 (5), 461–474. ISSN: 1532-3072.
Sharma, G.N., Dave, R., Sanadya, J., Sharma, P., Sharma, K.K., 2010. Various types and management of breast cancer: an overview. J. Adv. Pharm. Technol. Res. 1, 109–126.
Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition. 3rd IAPR Asian Conference on Pattern Recognition (ACPR) 730–734.
Sirinukunwattana, K., Raza, S.E.A., Tsang, Y.W., Snead, D.R.J., Cree, I.A., Rajpoot, N.M., 2016. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 35, 1196–1206.
Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L., 2016a. Breast cancer histopathological image classification using convolutional neural networks. Proc. Int. Jt. Conf. Neural Net.
Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L., 2016b. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63, 1455–1462.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. Proc. IEEE Conf. Comp. Vis. Patt. Recognit. 1–9.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2818–2826.
West, D., West, V., 2000. Model selection for a medical diagnostic decision support system: a breast cancer detection case. Artif. Intell. Med. 20, 183–204.
Wolberg, W.H., Mangasarian, O.L., 1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. 87, 9193–9196.
Xu, J., Xiang, L., Liu, Q., Gilmore, H., Wu, J., Tang, J., Madabhushi, A., 2016. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 35, 119–130.