ABSTRACT: Breast cancer is the most threatening and lethal disease among women, and it is more prevalent in developed countries. According to a report of the World Health Organization (WHO), as of 2015 about 2.1 million women had been diagnosed with benign or malignant breast cancer. Of 268,600 women diagnosed with cancer, about 41,760 deaths were reported. Records from various oncologists show that early detection of a tumour can increase the survival rate of patients. Although cancer is initially diagnosed by radiology experts, this diagnosis is not always accurate. Modern technologies such as mammography, computerized tomography, breast MRI and computer-aided detection (CAD) mammography are used in the diagnostic procedure. However, because CAD reports contain false positives and false negatives, the efficiency of these systems must be improved. The efficiency of an algorithm depends on the type and quality of the image data employed. Machine learning plays a vital role in detecting cancer by processing mammogram images. This paper presents different technologies and their limitations in the detection and classification of breast cancer images.
Keywords: Breast Cancer, CART, KNN, NB, RF, SVM
International Journal of Inventions in Computer Science and Engineering, Volume 6 Issue 5-7 May-July 2019
Fig 1.1. Mammogram images from new and old technologies
Technology that automates the diagnosis of breast cancer from mammogram images, without requiring a radiologist or oncologist, is growing steadily, and machine learning has become an essential part of this development. The diagnostic accuracy of experienced physicians has been reported as 79.97%, compared with 91.1% for machine learning [3].
II. MACHINE LEARNING APPROACHES
Training machines on large amounts of data and then testing them to behave in a desired way reduces the time and effort needed to perform the same task manually.
Two types of machine learning are relevant here:
Supervised Learning
Reinforcement Learning
This paper focuses on classifying breast cancer as benign or malignant. Algorithms for classifying such images include K-nearest neighbours, Support Vector Machines (SVM) and decision trees, among others. Before an algorithm is applied, however, the data must go through several preparation steps to make it ready for training. Once the data is prepared, it can be split into training and testing sets.
Machine learning offers a range of approaches, or algorithms, for classification, and the task lies in choosing the best algorithm for the dataset at hand. The machine learning approaches for classifying a given dataset are as follows:
1) Linear Classifier
a. Logistic Regression
b. Naïve Bayes
2) K-Nearest Neighbour
3) Support Vector Machine
4) Decision Trees
5) Boosted Trees
6) Random Forest
7) Neural Networks
2.1. LINEAR CLASSIFIER – NAÏVE BAYES:
III. Create a summary of each feature for each class.
IV. Find the probability of each feature.
V. Find the probability of each class as the product of the probabilities of all its features.
VI. Predict the class of the instance.
Limitation: One important problem with this algorithm is the zero-probability problem. It occurs when the estimated probability of a feature value is zero; because the class probability is a product, it then becomes zero as well, and the algorithm fails to give a valid prediction.
2.2. K-NEAREST NEIGHBOUR:
This approach stores a set of labelled points, and when a new point arrives it looks for the nearest labelled points [7]. In contrast to Naïve Bayes, this algorithm classifies data by feature similarity. It works well for noisy inputs.
Algorithm:
I. Split the input dataset into training and testing data, typically 80% for training and 20% for testing.
II. Pick an instance from the testing set and compute its distance to every instance in the training set.
III. List the distances in ascending order.
IV. Assign the instance the most common class among the first 3 (nearest) training instances.
Limitation: K denotes the number of nearest neighbours that are considered. The distance between each test instance and every training sample has to be computed, which increases the computational cost.
2.3. DECISION TREES:
A large dataset is broken down into smaller and smaller subsets until each branch ends in a leaf node with the least cost. The predicted class is taken from the leaf node that an instance reaches.
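The Naïve Bayes procedure of Section 2.1, together with its zero-probability limitation, can be illustrated with a minimal Python sketch. It assumes discrete feature values (for example small integer scores) and multiplies the class prior into the product, as is standard for Naïve Bayes. The smoothing parameter `alpha` (Laplace smoothing) is an addition not discussed in the survey: with `alpha=0` the raw frequencies are used and an unseen feature value drives every class score to zero. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Summarise the training data per class and per feature (value counts)."""
    class_counts = Counter(labels)
    # value_counts[c][j][v]: number of class-c rows whose feature j has value v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # distinct values observed for each feature, used by the smoothing denominator
    feature_values = defaultdict(set)
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values


def class_scores(model, row, alpha=1.0):
    """Class prior multiplied by the probability of every feature value, per class."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for j, v in enumerate(row):
            k = len(feature_values[j])
            # Smoothed estimate of P(feature j = v | c); alpha=0 gives the raw frequency
            score *= (value_counts[c][j][v] + alpha) / (n_c + alpha * k)
        scores[c] = score
    return scores


def predict_nb(model, row, alpha=1.0):
    """Predict the class with the highest score."""
    scores = class_scores(model, row, alpha)
    return max(scores, key=scores.get)


# Toy data with two discrete features (hypothetical values)
rows = [(1, 1), (2, 1), (1, 2), (8, 7), (9, 8), (8, 8)]
labels = ["benign"] * 3 + ["malignant"] * 3
model = train_nb(rows, labels)
print(predict_nb(model, (1, 1)))             # prints: benign
print(class_scores(model, (1, 8), alpha=0))  # every score is 0.0: the zero-probability problem
```

With the default `alpha=1.0`, the value 8 for the second feature, which never occurs in the benign class, no longer forces the benign score to zero.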
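The K-nearest-neighbour steps of Section 2.2 can be sketched in plain Python as follows. The survey fixes only the 80/20 split and the 3 nearest training instances; Euclidean distance, the helper names and the toy data are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, train_fraction=0.8, seed=0):
    """Step I: shuffle, then keep train_fraction of the rows for training."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def knn_predict(train_rows, train_labels, query, k=3):
    """Steps II-IV for a single test instance."""
    # Step II: distance from the query to every training instance
    distances = [(math.dist(query, row), label)
                 for row, label in zip(train_rows, train_labels)]
    # Step III: list the distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # Step IV: most common class among the k nearest training instances
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (hypothetical data)
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),
        (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5)]
labels = ["benign"] * 5 + ["malignant"] * 5
train_x, train_y, test_x, test_y = train_test_split(rows, labels)
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(accuracy)  # prints: 1.0
```

Every prediction computes a distance to all training instances, which is the computational cost noted in the limitation of Section 2.2.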
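Section 2.3 describes a tree that keeps splitting the data into subsets of lower cost. The Python sketch below assumes Gini impurity as that cost, which is the measure CART (one of the keywords) uses for classification, and numeric features compared against a threshold. The helper names, the depth limit and the toy data are hypothetical; it is a CART-style illustration, not the exact method of any surveyed paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, cost) of the split with the lowest weighted Gini cost."""
    n, best = len(rows), None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # both subsets must be non-empty
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or cost < best[2]:
                best = (j, threshold, cost)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively; a leaf stores the majority class of the rows reaching it."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    j, threshold, _ = split
    left = [i for i, row in enumerate(rows) if row[j] <= threshold]
    right = [i for i, row in enumerate(rows) if row[j] > threshold]
    return (j, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))


def predict_tree(node, row):
    """Follow the thresholds from the root until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] <= threshold else right
    return node


rows = [(1, 1), (2, 1), (2, 3), (7, 8), (8, 9), (9, 7)]
labels = ["benign"] * 3 + ["malignant"] * 3
tree = build_tree(rows, labels)
print(tree)                        # prints: (0, 2, 'benign', 'malignant')
print(predict_tree(tree, (8, 8)))  # prints: malignant
```

An internal node is stored as a `(feature, threshold, left, right)` tuple and a leaf as a plain class label, which keeps prediction a simple loop from the root to a leaf.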
reduced the dimension of the WDBC data to 5 and achieved a good accuracy of 98.2%.
P. Hamsagayathri [6] in 2017 performed decision tree classification with the J48 classifier in order to reduce the size of the tree and the number of leaf nodes. The method also eliminated repetitive sub-trees by implementing attribute priority. The dataset was taken from SEER, and the J48 classifier showed an accuracy of 98.5%, reducing computational cost and complexity while increasing the memory size.
Meriem Armane [7] in 2018 compared the Naïve Bayes and KNN algorithms on the same dataset, WBCD from UCI. This dataset describes characteristics assessed by pathologists for classifying breast cancer, such as clump thickness, uniformity of cell shape or size, and bare nuclei. NB is most often combined with other algorithms for classification. From the simulation results, the author noted that the accuracy of KNN, 97.5%, is higher than that of NB, 96.19%.
Youness Khourdifi and Mohamed Bahaj [8] in 2018 presented a study on selecting the best classifier for breast cancer prediction by comparing various data mining algorithms. They used 10-fold cross-validation to evaluate the predictive models. Simulating each algorithm gave different values for the time required to build the model and for accuracy. Among the 4 classifiers, SVM had the highest accuracy, 97.9%, with a processing time of 0.08 s, while KNN had the lowest computation time, 0 s, because it is a lazy learner that does almost no work before prediction. Random Forest and Naïve Bayes were estimated to have the highest error rates.
Anusha Barath [19] in 2018 used the WBCD dataset, with 80% of the data for training and the remaining 20% for testing. Applying NB, KNN, SVM and CART without standardization gave accuracies above 92%. After the dataset was standardized, however, the accuracy of SVM increased sharply to 99.1%. This indicates that careful preprocessing and tuning improve the accuracy of the classifier.
IV. COMPARISON OF ML ALGORITHMS
Table 4.1 compares different approaches to the classification of breast cancer. It lists the advantages and disadvantages of each algorithm, enabling the selection of an appropriate algorithm for a given dataset and application.
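Comparisons between algorithms depend on how each one is evaluated. Two practices reported in this survey, 10-fold cross-validation in the study of Khourdifi and Bahaj [8] and standardization of the features in the study of Anusha Barath [19], can be combined in a short Python sketch. It is illustrative only: the function names are hypothetical, a simple nearest-centroid classifier stands in for the algorithms compared in the survey, and the mean and standard deviation are estimated on the training folds only, a common precaution against test-set leakage that the cited papers do not describe.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_standardizer(rows):
    """Per-feature mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant features
    return means, stdevs


def standardize(rows, means, stdevs):
    """Z-score every feature: (value - mean) / standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def nearest_centroid_fit(rows, labels):
    """Stand-in classifier: one mean vector (centroid) per class."""
    centroids = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return centroids


def nearest_centroid_predict(centroids, row):
    """Class whose centroid has the smallest squared Euclidean distance to row."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def cross_validate(rows, labels, k=10):
    """Mean accuracy over k folds; each fold is scaled with training-fold statistics."""
    scores = []
    for fold in k_fold_indices(len(rows), k):
        test = set(fold)
        train_rows = [rows[i] for i in range(len(rows)) if i not in test]
        train_labels = [labels[i] for i in range(len(rows)) if i not in test]
        means, stdevs = fit_standardizer(train_rows)
        model = nearest_centroid_fit(standardize(train_rows, means, stdevs), train_labels)
        test_rows = standardize([rows[i] for i in fold], means, stdevs)
        correct = sum(nearest_centroid_predict(model, r) == labels[i]
                      for r, i in zip(test_rows, fold))
        scores.append(correct / len(fold))
    return statistics.fmean(scores)


# Hypothetical data whose two features are on very different scales
rows = ([(1 + i % 3, 100 + 10 * (i % 4)) for i in range(10)]
        + [(8 + i % 3, 500 + 10 * (i % 4)) for i in range(10)])
labels = ["benign"] * 10 + ["malignant"] * 10
print(cross_validate(rows, labels))  # prints: 1.0
```

Fitting the scaler inside the loop means each test fold is transformed with statistics it did not contribute to, so the averaged accuracy is not inflated by information from the test data.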
TABLE 4.1. COMPARISON OF DIFFERENT MACHINE LEARNING TECHNIQUES
From the table, it is evident that each algorithm is suited to particular situations and that the choice depends on the dataset. The data considered in classification may be images or purely numerical values, and the size of the dataset also matters when choosing the right algorithm. Across the surveyed literature, Support Vector Machine algorithms are used often and provide good accuracy. Moreover, SVM can be combined with other algorithms to improve efficiency and accuracy [4]. Because breast cancer classification usually involves only two categories, benign or malignant, a binary SVM classifier is more efficient to use than other classifiers. To achieve better results, proper pre-processing techniques should be applied.
CONCLUSION
This paper is a survey of the machine learning algorithms available for classifying breast cancer. It was compiled from a collection of standard papers published by researchers and from content studied on authoritative websites. It provides insight into selecting a suitable algorithm for classifying breast mammogram images. Based on the survey, SVM is found to be best suited for the binary classification of breast cancer.
REFERENCES
[1] T. Jinshan, R. R., X. Jun, I. El Naqa, Y. Yongyi, Computer-Aided Detection and Diagnosis of Breast Cancer With Mammography: Recent Advances, IEEE Transactions on Information Technology in Biomedicine, vol. 13, pp. 236-251, 2009.
[2] O. Chapelle, B. Schölkopf, and A. Z., Eds., Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews], IEEE Transactions on Neural Networks, vol. 20, no. 3, p. 542, 2009.
[3] Li Rong, Sun Yuvan, Diagnosis of Breast Tumour Using SVM-KNN Classifier, 2010 Second WRI Global Congress on Intelligent Systems.
[4] A. M. Krishnan, R. Banerjee, S. Chakraborty and C. Chakraborty, Statistical analysis of mammographic features and its classification using support vector machine, Expert Systems with Applications, vol. 37, pp. 470-478, 2010.
[6] … on Multi-scale Blob Detection Algorithm in Automated Breast Ultrasound Images, 2011.
[7] Ahmet Mert, Niyazi Kilic, Aydın Akan, Breast Cancer Classification by Using Support Vector Machines with Reduced Dimension, IEEE Proceedings ELMAR-2011.
[8] I. Witten, E. Frank, M. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011.
[9] S. Aruna, S. P. Rajagopalan, L. V. Nandakishore, Knowledge based analysis of various statistical tools in detecting breast cancer, Computer Science & Information Technology, vol. 2, pp. 37-45, 2011.
[10] Evanthia E. Tripoliti et al., Automated Diagnosis of Diseases Based on Classification: Dynamic Determination of the Number of Trees in Random Forests Algorithm, IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 4, July 2012.
[11] Xiufeng Yang, Hui Peng, Mingrui Shi, SVM with Multiple Kernels based on Manifold Learning for Breast Cancer Diagnosis, Proceedings of the IEEE International Conference on Information and Automation, Yinchuan, China, August 2013.
[12] Mitko Veta, Josien P. W. Pluim, Paul J. van Diest, Max A. Viergever, Breast Cancer Histopathology Image Analysis: A Review, IEEE Transactions on Biomedical Engineering, vol. 61, no. 5, May 2014.
[13] A. Alarabeyyat, A. M., Breast Cancer Detection Using K-Nearest Neighbor Machine Learning Algorithm, 9th International Conference on Developments in eSystems Engineering (DeSE), IEEE, pp. 35-39, 2016.
[14] M. H. Asri, H. A. Moatassime, Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis, Procedia Computer Science, vol. 83, pp. 1064-1073, 2016.
[15] S. Kanta Sarkar, A. N., Identifying patients at risk of breast cancer through decision trees, International Journal of Advanced Research in Computer Science, vol. 8, pp. 88-96, 2017.