
Tissue and Cell 57 (2019) 8–14

Comparative assessment of CNN architectures for classification of breast FNAC images

Amartya Ranjan Saikia (a), Kangkana Bora (b), Lipi B. Mahanta (b,*), Anup Kumar Das (c)

a Department of Computer Science and Engineering, Assam Engineering College, Guwahati 781013, Assam, India
b Centre for Computational and Numerical Sciences, Institute of Advanced Study in Science and Technology, Guwahati 781035, Assam, India
c Arya Wellness Center, Guwahati 781032, Assam, India

ARTICLE INFO

Keywords: Deep learning; Convolutional neural network; Breast cancer; FNAC

ABSTRACT

Fine needle aspiration cytology (FNAC) entails using a narrow gauge (25–22 G) needle to collect a sample of a lesion for microscopic examination. It allows a minimally invasive, rapid diagnosis of tissue but does not preserve its histological architecture. FNAC is commonly used for diagnosis of breast cancer, with traditional practice being based on the subjective visual assessment of breast cytopathology cell samples under a microscope to evaluate the state of various cytological features. Therefore, there are many challenges in maintaining consistency and reproducibility of findings. However, the advent of digital imaging and computational aid in diagnosis can improve diagnostic accuracy and reduce the effective workload of pathologists. This paper presents a comparison of several deep convolutional neural network (CNN) based, fine-tuned, transfer-learned classification approaches for the diagnosis of such cell samples. The approach has been tested using the VGG16, VGG19, ResNet-50 and GoogLeNet-V3 (aka Inception V3) CNN architectures on an image dataset of 212 images (99 benign and 113 malignant), later augmented and cleansed to 2120 images (990 benign and 1130 malignant), where the networks were trained on 80% of the cell sample images and tested on the rest. This comparative assessment of the models gives a new dimension to FNAC study; GoogLeNet-V3 (fine-tuned) achieved an accuracy of 96.25%, which is highly satisfactory.

1. Introduction

Breast malignancy is the second most common type of malignancy after lung cancer in general, the second most common in women after skin cancer, and the fifth most common cause of cancer death worldwide (Sharma et al., 2010). Breast cancer is also prevalent among men: according to recent statistics, about 2550 new cases of invasive breast cancer were expected to be diagnosed in men in 2018 (Breast Cancer Statistics, 2019). According to World Health Organization projections, breast cancer caused 559,000 deaths worldwide in the year 2008 (Mathers et al., 2008). The incidence rate of breast cancer is increasing rapidly in developing countries, and with India under the radar of high incidence, there is an ardent need to work on this area with cutting-edge deep learning approaches. Fine needle aspiration cytology (FNAC) is one of the most commonly used pathological investigations for screening and diagnosis of breast cancer. The traditional practice of breast FNAC is based on subjective assessment, in which the microscopic appearance of the aspirates is visually evaluated against various cytological criteria. Therefore, challenges in maintaining consistency and ensuring reproducibility of findings are inevitable. Moreover, inadequate or non-representative sampling may lead to an equivocal diagnosis, and there exists an overlap in the state of various cytological criteria between benign and malignant lesions (Ducatman and Wang, 2009; Kocjan, 2006). Advancements in AI, digital imaging and computational aid in diagnosis can help to improve diagnostic accuracy and to reduce the effective workload of a pathologist. In this regard, researchers and practitioners of pathology have been using quantitative analysis for computer-aided diagnosis (CAD) of pathology samples, including breast FNAC (Demir and Yener, 2005; Irshad et al., 2014; Saha et al., 2016).

This paper implements and analyses several deep convolutional neural network (CNN) based classification models for the diagnosis of breast FNAC cells and presents a comparison-based study of the same. The candidate CNN models used in this study are VGG16, VGG19, ResNet50 and GoogLeNet-V3; fine-tuned versions of all of these are also explored. Finally, the best model is proposed for further use or study. To our knowledge, this is the first comparative study of deep learning techniques for FNAC image classification. Fig. 1 summarizes the contribution of this paper, reflecting all the steps followed using breast cytopathology samples and CNN-based breast FNAC cell sample classification.


* Corresponding author.
E-mail addresses: ar5saikia@gmail.com (A.R. Saikia), kangkana.bora89@gmail.com (K. Bora), dranupdas@gmail.com (A.K. Das).

https://doi.org/10.1016/j.tice.2019.02.001
Received 13 December 2018; Received in revised form 30 January 2019; Accepted 2 February 2019

Fig. 1. Overview of the proposed work.

Limited data samples did not constrain the study, as various data augmentation and cleansing methods were employed. A comparison of various segmentation methods, moving forward with the most promising one after the formation of a proper dataset, enabled us to proceed with our research.

The paper is organized as follows. Section 2 describes the prior art for computer vision and machine learning techniques used in breast cancer diagnosis by FNAC image analysis, and deep learning techniques used in cytopathology image analysis. Section 3 describes the materials and methods used, including the image dataset developed by us for the experimentation in this paper, and also presents details of the experimental setup. Section 4 presents the classification results obtained, along with a discussion of the findings. Conclusions for the study are presented in Section 5.

2. Prior art

Advancements in machine learning and AI have shown us new promising paths in the field of medical imaging. Researchers and practitioners of pathology have been using quantitative analysis to improve diagnostic accuracy and to reduce the effective workload of a pathologist. With recent developments in cost-effective and high-performance computer technology, digital pathology has become amenable to the application of quantitative analysis in the form of decision support systems (West and West, 2000; Wolberg and Mangasarian, 1990) and CAD systems based on computer vision and machine learning techniques. CAD systems for digital pathology applications have been developed and deployed for some time now (Irshad et al., 2014; Saha et al., 2016). A review of the prior art shows that computer vision systems commonly used in cytological diagnosis apply the bottom-up approach of diagnostic reasoning from evidence to hypothesis (Patel and Groen, 1986). It involves segmentation of primitives such as clusters, cells and nuclei using image processing and segmentation techniques (Garud et al., 2017); quantification of diagnostically significant cytological criteria (Garud et al., 2012) using techniques that extract morphometric, densitometric, textural and structural features (Rodenacker and Bengtsson, 2003); followed by pattern recognition techniques for prediction of abnormalities and anomalies (Langer et al., 2015).

Recently, deep neural networks such as autoencoders (Xu et al., 2016) are increasingly finding their way into whole-slide cytopathology image analysis challenges, jointly learning the representative feature space and the classification margin. Some contributions related to radiological and interventional image analysis include (Havaei et al., 2017; Liu et al., 2016; Sirinukunwattana et al., 2016). The prior art (Spanhol et al., 2016a) on the breast cancer histopathological image classification dataset (BreakHis database) (Spanhol et al., 2016b) had earlier used an AlexNet-based CNN architecture for classifying whole-slide histopathology and achieved 84–90% accuracy. Das et al. (2017) combine predictions from a transfer-learned GoogLeNet (Szegedy et al., 2015) CNN architecture over random multiple images of a breast histopathology sample acquired at multiple magnifications to arrive at a whole-slide diagnosis, achieving 94.67% accuracy in multifold validation over the BreakHis database. Garud et al. (Garud, 2017) achieved 89.7% mean accuracy in 8-fold validation using the GoogLeNet architecture of CNN on an FNAC image dataset.

3. Material and methodology

3.1. FNAC database generation

For this study, FNAC cell sample images were generated at Ayursundra Healthcare Pvt. Ltd, Guwahati, India. Experienced doctors collected the samples from patients and prepared the slides; the processing and staining of cell samples was done at their laboratory. The cell samples were collected from 20 patients following all ethical protocols. Images were captured by us using a Leica ICC50 HD microscope at 400× magnification and 24-bit colour depth, with the 5-megapixel camera associated with the microscope. The digitized images were then reviewed by experienced certified cytopathologists, who selected a total of 212 images for this study. The database can be downloaded from the link: https://1drv.ms/u/s!Al-T6d-_ENf6axsEbvhbEc2gUFs. Fig. 2 shows a glimpse of five such benign and malignant cell samples. We consider only Pap-stained images for the study.


Fig. 2. Column I showing five benign samples and column II showing five malignant samples.

The generated database comprised 212 FNAC images (99 benign and 113 malignant). Ground truth was then confirmed by a pathologist marking the cell sample images as benign or malignant. Along with the images, the corresponding original reports of the patients were also collected, which were later used for labelling the images.

For smear-level study of the samples with deep learning, we require more data than the current count to feed into the CNN models. Being limited by cell samples, data augmentation was the prime focus for the current data. Augmentation techniques such as cropping, shearing, rotation, mirroring, skewing, inverting and zooming were implemented, and the new count stood at 2120 images (990 benign and 1130 malignant); a sketch of such an augmentation script is given below.
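The augmentation script itself is not given in the paper; the following is only a minimal sketch of how a subset of the listed transforms (shearing, rotation, mirroring, zooming) could be applied offline with Keras. The directory names, target size and parameter values are illustrative assumptions.

from keras.preprocessing.image import ImageDataGenerator

# Hypothetical offline augmentation sketch; paths and factors are assumptions.
datagen = ImageDataGenerator(
    rotation_range=40,      # random rotation
    shear_range=0.2,        # shearing
    zoom_range=0.2,         # zooming
    horizontal_flip=True,   # mirroring
    vertical_flip=True)     # vertical mirroring

# One sub-directory per class, e.g. originals/benign and originals/malignant.
flow = datagen.flow_from_directory(
    "originals",              # hypothetical root folder of the 212 source images
    target_size=(256, 256),
    batch_size=16,
    class_mode="binary",
    save_to_dir="augmented",  # augmented copies are written here
    save_prefix="aug")

# Draw enough batches to grow the set roughly tenfold (212 -> ~2120 images).
for _ in range(10 * len(flow)):
    next(flow)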
The image data count being better than before, the data were then cleansed to focus specifically on the region of interest (ROI), i.e. the nucleus. To do so, each image was split into its RGB (red, green, blue) and CMY (cyan, magenta, yellow) colour channels, and we proceeded only with the red channel of the image due to its promisingly clear boundary highlighting only the ROI, as shown in Fig. 3. Dealing with the red channel also makes noise removal efficient: from Fig. 3 we can observe that debris such as red blood cells is removed automatically in the red channel output, and the nucleus regions are clear with prominent boundaries. As can be seen, the images in rows 2 and 4 of Fig. 3 contain many RBCs, which are completely removed in the red channel output. The output of the colour channel decomposition of four images (two normal and two abnormal) is displayed in Fig. 3. This channel decomposition is used as a pre-processing step in the proposed approach.

Fig. 3. Output of all six channels after colour channel decomposition. Rows 1 and 2 are normal images and rows 3 and 4 contain abnormal images. Each column indicates the output of a different channel.
The actual data is in the ROI, i.e., the nucleus. The cytoplasm and red blood cells in the images are considered noise and hence are removed completely. Histogram equalization is also used on the red channel output as a pre-processing step for complete removal of the cytoplasm. The images are then auto-segmented. Two basic thresholding techniques, namely global thresholding and Otsu thresholding, are employed to keep this step simple. Between the two, Otsu thresholding performs better than global thresholding based on visual investigation; hence Otsu's threshold was selected for segmenting the whole dataset. We do not give much focus to overlapped cell segmentation; rather, importance is given to finding the region where nuclei are present, and highly overlapped areas are avoided in the study. Fig. 4(b) exemplifies the result of Otsu thresholding on a sample image, which displays a better segmentation output than the global threshold in Fig. 4(a), due to its clear visual boundary around the nucleus and better ROI selection.

Fig. 4. Comparison of global thresholding vs Otsu's thresholding.
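As a rough illustration of the pipeline described above (red channel, histogram equalization, then thresholding), the following OpenCV sketch reproduces the steps under stated assumptions: the file names are placeholders, and whether the binary mask must be inverted depends on whether nuclei appear dark or bright after equalization.

import cv2

img = cv2.imread("sample_fnac.png")   # OpenCV loads images in BGR order
red = img[:, :, 2]                    # red channel (index 2 in BGR)
red_eq = cv2.equalizeHist(red)        # histogram equalization

# Otsu's method picks the threshold automatically (the 0 below is ignored).
_, otsu_mask = cv2.threshold(red_eq, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# A fixed global threshold for comparison, as in Fig. 4(a).
_, global_mask = cv2.threshold(red_eq, 127, 255, cv2.THRESH_BINARY)

cv2.imwrite("otsu_mask.png", otsu_mask)
cv2.imwrite("global_mask.png", global_mask)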
3.2. Training CNN for breast FNAC image classification

CNNs comprise a feed-forward family of deep networks, where in-between layers receive as input the features generated by the previous layer and pass their outputs to the subsequent layer. The strength of this family of networks lies in learning hierarchical layers of concept representation, corresponding to different levels of abstraction. For image data, the lower levels of abstraction might describe the differently oriented edges in the image; middle levels might describe parts of an object, while higher layers refer to larger object parts and even the object itself (Garud, 2017).


In this study, several deep learning models, viz. VGG16 (Simonyan and Zisserman, 2015), VGG19 (Simonyan and Zisserman, 2015), ResNet50 (He et al., 2016) and GoogLeNet-V3 (Szegedy et al., 2016), were trained to represent breast FNAC features. Fine-tuning of the classification margin was also done for these models. Transfer learning from ImageNet was deployed to use transferred learned features along with the newly trained features from the FNAC cell sample dataset. As for Inception-V3, it is a variant of Inception-V2 (Szegedy et al., 2016; Doreswamy and Umme Salma, 2015) which adds BN-auxiliary; BN-auxiliary refers to the version in which the fully connected layer of the auxiliary classifier is also batch-normalized, not just the convolutions. GoogLeNet-V3 performs better than the other models due to its smaller kernels, efficient grid-size reduction and the presence of several inception grids. All architectures are already available in the cited literature; we have recreated those works and trained and tested them with our own generated database. The parameters considered for each model are as follows. VGG 16/VGG 19/ResNet 50: batch_size = 32, epochs = 12, verbose = 1, with a training sample size of 1695 and a validation sample size of 425. For the fine-tuned versions, we flattened the last layer and fine-tuned with ReLU activation and the Adadelta optimizer. For the GoogLeNet-V3 fine-tuned model, the hyperparameters of the trained model were: batch_size = 32, number_of_epochs = 12, sgd_learning_rate = 1e-4, momentum = 0.9, transformation_ratio = 0.05, training_sample = 1695 and validation_sample = 425. A sketch of this fine-tuning setup is given below.
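The authors' training script is not published, so the following Keras sketch only illustrates a fine-tuning setup consistent with the hyperparameters above; the 256-unit dense head, the input size and the data generators are assumptions, and the optimizer argument name (lr) follows the older Keras API.

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import SGD

# ImageNet-pretrained Inception V3 (GoogLeNet-V3) without its top classifier.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))

x = Flatten()(base.output)               # "flattened the last layer"
x = Dense(256, activation="relu")(x)     # ReLU head; the width 256 is assumed
out = Dense(1, activation="sigmoid")(x)  # benign vs malignant

model = Model(inputs=base.input, outputs=out)

# SGD with the reported learning rate and momentum.
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# train_flow and val_flow would be generators over the 1695 training and
# 425 validation images (see the repository sketch in Section 3.3):
# model.fit_generator(train_flow, epochs=12, validation_data=val_flow)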
3.3. Experimental setup

Experimentation was carried out in Intel-optimized Python 3 with Keras (Intel-optimized TensorFlow backend) using Intel AI DevCloud, a cluster comprising Intel Xeon Gold 6128 processors. First, an image repository was created where all the collected images were stored in two different directories representing the two classes. Images were arranged according to the collected patient reports and in proper consultation with the pathologist. A minimal sketch of such a repository and its generators is given below.
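The loading code is likewise not given; this sketch merely shows one way such a two-directory repository could be read with Keras generators, assuming hypothetical folder names and approximating the reported 1695/425 split of the 2120 images with an 80/20 validation split.

from keras.preprocessing.image import ImageDataGenerator

# repository/
#   benign/     <- 990 images (hypothetical folder name)
#   malignant/  <- 1130 images (hypothetical folder name)
gen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_flow = gen.flow_from_directory(
    "repository", target_size=(299, 299), batch_size=32,
    class_mode="binary", subset="training")    # ~1695 images

val_flow = gen.flow_from_directory(
    "repository", target_size=(299, 299), batch_size=32,
    class_mode="binary", subset="validation")  # ~425 images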
The accuracy of classification was then evaluated. Accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined; the higher the accuracy, the higher the rate of correctly classified cases.
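Formally, this standard definition can be written as

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.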
4. Result and discussion

The graphical representations of training loss vs validation loss and training accuracy vs validation accuracy for all the models and their fine-tuned versions are displayed in Figs. 5–8. The lower the loss, the better the model; the higher the accuracy, the more satisfactory the classification results. From Fig. 8 it is observed that the fine-tuned version of GoogLeNet-V3 displays highly satisfactory performance, with the loss decreasing and the accuracy increasing with each epoch. It achieved an accuracy of 96.25% at epoch 12 and the lowest validation loss, 0.0828, compared with all the other models' results.

Fig. 5. Results of VGG 16 and fine-tuned VGG 16.


Fig. 6. Results of VGG 19 and fine-tuned VGG 19.

Fig. 7. Results of ResNet 50 and fine-tuned ResNet 50.


Fig. 8. Results of GoogLeNet V3 and fine-tuned GoogLeNet V3.

Table 1
Classification accuracy of different CNN models (bold values indicating the best results).

Model                       Accuracy (%)    Loss
VGG 16                      63.2            0.6395
VGG 16 fine-tuned           88.67           0.2967
VGG 19                      60.84           0.8606
VGG 19 fine-tuned           88.2            0.2875
ResNet 50                   85.61           0.3414
ResNet 50 fine-tuned        90.56           0.2571
GoogLeNet V3                71.88           0.3295
GoogLeNet V3 fine-tuned     96.25           0.0828
Results for the multi-experiment classification are shown in Table 1. The table presents the classification accuracy performance across each experiment, along with the details of the training and test data in each experiment. From the results, it can be observed that in general GoogLeNet-V3 (Szegedy et al., 2016) can learn visual features and a classification margin from different training samples during transfer learning, and it achieves a smear-level classification accuracy of 96.25%. The fine-tuned VGG16 achieved 90.09%, the fine-tuned VGG19 achieved 72.53%, and ResNet50 achieved 89.15%. This accuracy is comparable with the 89.7% mean accuracy in 8-fold validation from (Bora et al., 2016). This shows that GoogLeNet-V3 is by far the best deep learning method for FNAC cell image classification. The final results are summarized in Table 1.
5. Conclusion

In this paper, we presented an FNAC breast cytopathology sample image dataset along with the performance of the VGG16, VGG19, ResNet50 and GoogLeNet-V3 CNN architectures in classifying breast FNAC cell samples into malignant or benign categories. It was observed that in general GoogLeNet-V3 can learn visual features and a classification margin from the different training samples included in the dataset, and it achieves a smear-level classification accuracy of 96.25%, which is less than the classification accuracies achieved with conventionally defined computed-feature datasets and statistical classifiers. It is known that a computed-feature dataset is a slave of the segmentation technique used, so results obtained on one dataset may not be consistent on another dataset; a deep learning technique, by contrast, learns the features itself, and the user does not have to spend too much time on significant feature selection. This shows that GoogLeNet-V3 is by far the best deep learning method for FNAC cell image classification. The proposed scheme can be considered as a baseline for future research. Data augmentation by adding more samples and data replication, transfer learning along with better CNN models, and hyperparameter tweaking can be used to improve the results.

Acknowledgements

The authors would like to thank the Director of the Institute of Advanced Study in Science and Technology, Guwahati, India, and the Principal of Assam Engineering College, Guwahati, India, for giving us the platform to carry this work forward. Kangkana Bora would like to thank the Inspire Fellowship scheme of DST, Govt. of India, for providing the fellowship to pursue her PhD. This work has the consent of all the co-authors and the authorities of the institutes where this study was carried out, and there exists no conflict of interest.

References

Bora, K., Chowdhury, M., Mahanta, L.B., Kundu, M.K., Das, A.K., 2016. Pap smear image classification using convolutional neural network. ACM International Conference Proceeding Series, ICVGIP, 18–22 December 2016, IIT Guwahati.
U.S. Breast Cancer Statistics, Technical Report. https://www.breastcancer.org/symptoms/understand_bc/statistics.

Das, K., Karri, S.P.K., Guha Roy, A., Chatterjee, J., Sheet, D., 2017. Classifying histopathology whole-slides using fusion of decisions from deep convolutional network on a collection of random multi-views at multi-magnification. IEEE International Symposium on Biomedical Imaging.
Demir, C., Yener, B., 2005. Automated Cancer Diagnosis Based on Histopathological Images: A Systematic Survey. Rensselaer Polytechnic Institute Tech. Rep.
Doreswamy, H., Umme Salma, M., 2015. Fast modular artificial neural network for the classification of breast cancer data. Proceedings of the Third International Symposium on Women in Computing and Informatics 66–72.
Ducatman, B.S., Wang, H.H., 2009. Chapter 8 – Breast. In: Cibas, E.S., Ducatman, B.S. (Eds.), Cytology, 3rd ed., pp. 221–254.
Garud, H.T., Sheet, D., Mahadevappa, M., Chatterjee, J., Ray, A.K., Ghosh, A., 2012. Breast fine needle aspiration cytology practices and commonly perceived diagnostic significance of cytological features: a pan-India survey. J. Cytol. 29 (3), 183.
Garud, H., Karri, S.P.K., Sheet, D., Maity, A.K., Chatterjee, J., Mahadevappa, M., Ray, A.K., 2017. Methods and system for segmentation of isolated nuclei in microscopic images of breast fine needle aspiration cytology images. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
Garud, H., 2017. High-magnification multi-views based classification of breast fine needle aspiration cytology cell samples using fusion of decisions from deep convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 828–833.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.-M., Larochelle, H., 2017. Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition 770–778.
Irshad, H., Veillard, A., Roux, L., Racoceanu, D., 2014. Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential. IEEE Rev. Biomed. Eng. 7, 97–114.
Kocjan, G., 2006. Diagnostic dilemmas in FNAC cytology: difficult breast lesions. In: Fine Needle Aspiration Cytology. Springer, Berlin, Heidelberg, pp. 181–211.
Langer, L., Binenbaum, Y., Gugel, L., Amit, M., Gil, Z., Dekel, S., 2015. Computer-aided diagnostics in digital pathology: automated evaluation of early-phase pancreatic cancer in mice. Int. J. Comput. Assist. Radiol. Surg. 10, 1043–1054.
Liu, D.Y., Gan, T., Rao, N.N., Xing, Y.W., Zheng, J., Li, S., Luo, C.S., Zhou, Z.J., Wan, Y.L., 2016. Identification of lesion images from gastrointestinal endoscope based on feature extraction of combinational methods with and without learning process. Med. Image Anal. 32, 281–294.
Mathers, C., Fat, D.M., Boerma, J.T., 2008. The Global Burden of Disease: 2004 Update. World Health Organization.
Patel, V.L., Groen, G.J., 1986. Knowledge based solution strategies in medical reasoning. Cognit. Sci. 10, 91–116.
Rodenacker, K., Bengtsson, E., 2003. A feature set for cytometry on digitized microscopic images. Anal. Cell. Pathol. 25, 1–36.
Saha, M., Mukherjee, R., Chakraborty, C., 2016. Computer-aided diagnosis of breast cancer using cytological images: a systematic review. Tissue Cell 48 (5), 461–474.
Sharma, G.N., Dave, R., Sanadya, J., Sharma, P., Sharma, K.K., 2010. Various types and management of breast cancer: an overview. J. Adv. Pharm. Technol. Res. 1, 109–126.
Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition. 3rd IAPR Asian Conference on Pattern Recognition (ACPR) 730–734.
Sirinukunwattana, K., Raza, S.E.A., Tsang, Y.W., Snead, D.R.J., Cree, I.A., Rajpoot, N.M., 2016. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 35, 1196–1206.
Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L., 2016a. Breast cancer histopathological image classification using convolutional neural networks. Proc. Int. Jt. Conf. Neural Net.
Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L., 2016b. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63, 1455–1462.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. Proc. IEEE Conf. Comp. Vis. Patt. Recognit. 1–9.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2818–2826.
West, D., West, V., 2000. Model selection for a medical diagnostic decision support system: a breast cancer detection case. Artif. Intell. Med. 20, 183–204.
Wolberg, W.H., Mangasarian, O.L., 1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. 87, 9193–9196.
Xu, J., Xiang, L., Liu, Q., Gilmore, H., Wu, J., Tang, J., Madabhushi, A., 2016. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 35, 119–130.

