
Implementation of Artificial Intelligence to Diagnose and Predict Breast Cancer Disease: Review

Irfan1, Himawan2, Ni Wayan Parwati3


1 Computer Science Faculty, Budi Luhur University, Jakarta, 12260. Telp: (021) 5853753 ext 253, Fax: (021). E-mail: irfan03092008@gmail.com
2 Computer Science Faculty, Budi Luhur University, Jakarta, 12260. Telp: (021) 5853753 ext 253, Fax: (021). E-mail: himawanawan10@gmail.com
3 Computer Science Faculty, Budi Luhur University, Jakarta, 12260. Telp: (021) 5853753 ext 253, Fax: (021). E-mail: wayan.parwati@gmail.com

Abstract
Breast cancer is one of the most common and deadly diseases among women worldwide. Detecting breast cancer at an early stage is the key to curing it, so automatic diagnosis is important. Artificial intelligence is now a challenging research area in medicine; in this paper we show how artificial intelligence is used to diagnose breast cancer. Neural networks and fuzzy logic have been successfully applied to the problem of breast cancer diagnosis, so that people can be checked for the disease quickly and at an early stage. This review indicates that various artificial intelligence techniques can be used effectively for breast cancer diagnosis. The prediction can help a doctor plan better medication and give the patient an early diagnosis. The experiments in the papers we reviewed use the Wisconsin Breast Cancer Dataset (WBCD).
Keywords: Breast cancer diagnosis, Artificial intelligence, fuzzy logic, neural network
1. INTRODUCTION
Breast cancer is a disease that first appears as a tumor in or around the breast. Tumors are classified as benign (non-cancerous) or malignant (cancerous). Malignant tumors are cancer: their cells can invade and damage tissues and organs near the tumor. Breast cancer is the second leading cause of cancer deaths among women in the world [2]. According to World Health Organization reports, breast cancer is detected in 1.3 million women worldwide every year. Improvements in diagnostic procedures and effective medical aid have greatly reduced the breast cancer death rate. A major problem in medical science is the diagnosis of disease based on various tests performed on the patient [17]. Accurate diagnosis and early detection can improve the survival rate of breast cancer patients [3].
Breast cancer classification, diagnosis, and prediction techniques have been a widely researched area in medical informatics. Several articles have been published that successfully classify breast cancer datasets using techniques such as fuzzy logic and neural networks. Because malignant and benign breast tumors share structural similarities, diagnosing them manually is an extremely tedious and time-consuming task. Accurate classification is important because the potent cytotoxic drugs administered during treatment can be life-threatening or may lead to another cancer. Manual laboratory analysis and biopsies are accurate but time-consuming; hence, an automated system that provides faster and more reliable diagnosis for patients is needed. Classifier systems help the medical community to a great extent in cancer detection. Their advantages are that they are fast, capable of detailed examination, free from subjective errors, and cause minimal patient inconvenience.
2. LITERATURE REVIEW
A large amount of research on breast cancer diagnosis can be found in the literature, and much of it reports good classification accuracy. Quinlan [4] achieved 94.74% classification accuracy using 10-fold cross-validation with the C4.5 decision tree method. Hamilton, Shan, and Cercone [5] obtained 94.99% with the RIAC method. Ster and Dobnikar [6] obtained 96.8% with linear discriminant analysis. Nauck and Kruse [7] obtained 95.06% with neuro-fuzzy techniques. Pena-Reyes and Sipper [8] used a fuzzy-GA classification method, reaching a classification accuracy of 97.36%. Setiono [9] employed classification based on a feed-forward neural network rule extraction algorithm and reported an accuracy of 98.10%. Goodman, Boggess, and Watkins [10] used three different methods, optimized learning vector quantization (LVQ), big LVQ, and artificial immune recognition system (AIRS), and obtained accuracies of 96.7%, 96.8%, and 97.2%, respectively. Albrecht, Lappas, Vinterbo, Wong, and Ohno-Machado [11] applied a learning algorithm that combines logarithmic simulated annealing with the Perceptron algorithm and reported an accuracy of 98.8%. Abonyi and Szeifert [12] obtained an accuracy of 95.57% with a supervised fuzzy clustering technique. Polat and Gunes [13] used a least square SVM and obtained an accuracy of 98.53%. Mehmet Fatih Akay [14] increased the accuracy to 99.51% by combining SVM with feature selection.
Table 1: Classification accuracies of classifiers reported in the literature

| No | Author (year) | Method | Classification Accuracy (%) |
|----|---------------|--------|-----------------------------|
| 1 | Quinlan (1996) | C4.5 | 94.74 |
| 2 | Hamilton et al. (1996) | RIAC | 94.99 |
| 3 | Ster and Dobnikar (1996) | Linear Discriminant Analysis (LDA) | 96.80 |
| 4 | Nauck and Kruse (1999) | Neuro-fuzzy | 95.06 |
| 5 | Pena-Reyes and Sipper (1999) | Fuzzy GA | 97.36 |
| 6 | Setiono (2000) | Feed-forward neural network rule extraction algorithm | 98.10 |
| 7 | Goodman et al. (2002) | LVQ | 96.70 |
| 8 | Goodman et al. (2002) | Big LVQ | 96.80 |
| 9 | Goodman et al. (2002) | AIRS | 97.20 |
| 10 | Albrecht et al. (2002) | Learning algorithm that combines logarithmic simulated annealing with the Perceptron algorithm | 98.80 |
| 11 | Abonyi and Szeifert (2003) | Supervised fuzzy clustering | 95.57 |
| 12 | Polat and Gunes (2007) | Least square SVM | 98.53 |
| 13 | Mehmet Fatih Akay (2009) | SVM combined with feature selection | 99.51 |
| 14 | N. Kumaravel, J. Palanivel (2011) | Fuzzy C-Means Clustering | 99.87 |

3. METHODOLOGY
3.1. Breast Cancer Dataset
Ten features are computed from a digital image [1], as follows:
1. Radius (mean of distances from center to points on the perimeter)
2. Texture (standard deviation of gray-scale values)
3. Perimeter
4. Area
5. Smoothness (local variation in radius lengths)
6. Compactness (perimeter^2 / area - 1.0)
7. Concavity (severity of concave portions of the contour)
8. Concave points (number of concave portions of the contour)
9. Symmetry
10. Fractal dimension ("coastline approximation" - 1)
Many researchers use the WBCD in their research, which makes it possible to compare the results of the reviewed papers. A. Ahirwar and R.S. Jadon [15] extract features using gray level co-occurrence matrices (GLCM). The extracted features are listed in Table 2:
Table 2: Feature extraction on GLCM

| No | Method | Symbol |
|----|--------|--------|
| 1 | Contrast | S_c |
| 2 | Entropy | S_e |
| 3 | Energy | S_en |
| 4 | Mean | S_m |
| 5 | Inverse Difference Moment | S_idm |
| 6 | Standard Deviation | S_d |
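As an illustration of the GLCM features named in Table 2, the following sketch computes them with commonly used textbook definitions. These definitions are assumptions and may differ in detail from the formulas used in [15]; in particular, the mean and standard deviation are taken here over the row index of the normalized matrix, and the entropy uses a base-2 logarithm. The function names `glcm` and `glcm_features` and the offset parameters `dx`, `dy` are hypothetical.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for a non-negative offset (dy, dx)."""
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            # Count how often gray level a is followed by gray level b at the offset.
            counts[image[r, c], image[r + dy, c + dx]] += 1
    return counts / counts.sum()

def glcm_features(p):
    """Texture features of a normalized co-occurrence matrix p (assumed definitions)."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]                      # avoid log(0) in the entropy
    mean = np.sum(i * p)
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
        "energy": float(np.sum(p ** 2)),
        "mean": float(mean),
        "idm": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "std": float(np.sqrt(np.sum((i - mean) ** 2 * p))),
    }

# Toy 2-level image: every horizontal neighbor pair is (0, 1).
toy_image = np.array([[0, 1], [0, 1]])
toy_features = glcm_features(glcm(toy_image, levels=2))
```

A library such as scikit-image offers comparable routines; the hand-written version is used here only to keep the definitions visible.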

3.2. Fuzzy C-Means Clustering
Fuzzy C-means (FCM) is a clustering method that allows one piece of data to belong to two or more clusters. The method was developed by Dunn [16] in 1973, was later improved, and is frequently used in pattern recognition. The FCM algorithm proceeds as follows:
1. Start
2. Initialize the membership matrix
3. Calculate the fuzzy cluster centers
4. Compute the cost function
5. Compute a new membership matrix
6. Show the result
7. Finish
Figure 1. The algorithm of Fuzzy C-means (FCM)
3.4. The Generalized Regression Neural Networks (GRNN)
A GRNN has four layers: input layer, pattern layer, summation layer, and output layer. The input layer is fully connected to the pattern layer. Each pattern unit is connected to the S-summation and D-summation neurons of the summation layer. The S-summation neuron computes the sum of the weighted outputs of the pattern layer, while the D-summation neuron computes the sum of the unweighted outputs of the pattern neurons [17]. The connection weight between the i-th neuron in the pattern layer and the S-summation neuron is Y_i, the target output value of the i-th training sample. For the D-summation neuron, the connection weight is unity. The output layer divides the output of the S-summation neuron by that of the D-summation neuron, yielding the predicted value for an unknown input vector [18].
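The GRNN forward pass described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in [17]: the Gaussian pattern-layer activation and the smoothing parameter `sigma` are assumptions taken from the standard GRNN formulation, and the function name `grnn_predict` is hypothetical.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """Predict targets for X_query with a GRNN built from (X_train, y_train)."""
    # Pattern layer: one Gaussian unit per training sample (assumed activation).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # S-summation neuron: pattern outputs weighted by the training targets Y_i.
    s = k @ y_train
    # D-summation neuron: pattern outputs with unit weights.
    d = k.sum(axis=1)
    # Output layer: ratio of the two summation neurons.
    return s / d

X_train = np.array([[0.0], [1.0]])
y_train = np.array([0.0, 1.0])
midpoint_prediction = grnn_predict(X_train, y_train, np.array([[0.5]]))
```

Because there is no iterative weight training, the only quantity to tune in this sketch is `sigma`: small values make the prediction follow the nearest training sample, large values average over all of them.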

Figure 2. GRNN built up in a way such that it can be used as a parallel neural network
3.5. Probabilistic Neural Network
A probabilistic neural network (PNN) is an implementation of a statistical algorithm called kernel discriminant analysis, in which the operations are organized into a multilayered feed-forward network with four layers: input layer, pattern layer, summation layer, and output layer. This method gives a fast training process and is guaranteed to converge to an optimal classifier as the size of the representative training set increases. In a PNN, training samples can be added or removed without extensive retraining.
Pattern layer: there is one pattern node for each training example. Each pattern node forms the product of its weight vector and the example to be classified, where the weights entering a node come from a particular training example. The product is passed through the activation function.
Summation layer: each summation node receives the outputs from the pattern nodes associated with a given class.
Output layer: the output nodes are binary neurons that produce the classification decision.
4. RESULT AND DISCUSSION
4.1. Fuzzy Logic
The data was used for training an SVM [1]. In this work there are two classes: the first is benign (non-cancerous) and the other is malignant (cancerous). The classification performance of the classifier is calculated using fuzzy C-means clustering. The highest classification accuracy achieved is 97.007%; for comparison, the poorest classification has an accuracy of 54.401%. Experiments are conducted with six different epochs of the data. The first epoch contains 5-10% of the data, the second 10-20%, the third 20-40%, the fourth 40-50%, the fifth 50-70%, and the sixth 70-100%. The results are shown in Table 3.
Table 3. Experiments conducted with six different epochs of the data

| Epoch | Epoch percentage | Sensitivity | Specificity | Positive predictive value | Negative predictive value |
|-------|------------------|-------------|-------------|---------------------------|---------------------------|
| 1 | 5-10% | 96.69 | 95.54 | 96.9 | 97.97 |
| 2 | 10-20% | 98.6 | 97.05 | 98.41 | 94.28 |
| 3 | 20-40% | 97.83 | 96.84 | 92.30 | 96.61 |
| 4 | 40-50% | 98.93 | 98.27 | 97.67 | 99.74 |
| 5 | 50-70% | 98.74 | 95.54 | 95.45 | 98.71 |
| 6 | 70-100% | 99.82 | 99.71 | 99.73 | 99.83 |
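The fuzzy C-means iteration of Section 3.2 and the four measures reported in Table 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the adaptive SVM system of [1]: the fuzzifier `m = 2`, the stopping tolerance, the random initialization, and the names `fuzzy_c_means` and `diagnostic_metrics` are all hypothetical choices, and the toy data is made up for the example.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Return (centers, memberships) for the data matrix X."""
    rng = np.random.default_rng(seed)
    # Initialize the membership matrix; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    prev_cost = np.inf
    for _ in range(max_iter):
        W = U ** m
        # Fuzzy cluster centers: membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # guard against division by zero
        # Cost function: membership-weighted sum of squared distances.
        cost = float((W * d2).sum())
        # New membership matrix: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return centers, U

def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy data: two well-separated groups standing in for benign and malignant cases.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, U = fuzzy_c_means(X)
hard_labels = U.argmax(axis=1)   # assign each sample to its strongest cluster
```

The memberships in `U` are graded rather than binary, which is what distinguishes FCM from hard clustering; the `argmax` step is only needed when a single benign/malignant decision has to be reported.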

4.2. Neural Network
The material used in the research [17] was obtained from the website of the University of California, Irvine (UCI) Machine Learning Repository. The files contain medical data on breast cancer classification cases categorized as benign or malignant. Table 4 describes the three datasets:
Table 4. Data Descriptions

| Dataset Number | Number of Instances | Number of Attributes | Data Type | Class Distribution |
|----------------|---------------------|----------------------|-----------|--------------------|
| 1 | 699 (missing values = 16) | 10 | Integer | Benign: 458, Malignant: 241 |
| 2 | 569 | 32 | Real | Benign: 357, Malignant: 212 |
| 3 | 198 | 35 | Real | Benign: 151, Malignant: 47 |

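Table 5. Classes and their data descriptions

| Dataset Number | Train/Test | Class A Train | Class B Train | Class A Test | Class B Test |
|----------------|------------|---------------|---------------|--------------|--------------|
| 1 | 477/203 | 311 | 166 | 133 | 73 |
| 2 | 398/171 | 250 | 148 | 107 | 64 |
| 3 | 136/58 | 32 | 104 | 14 | 44 |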

Selecting a proper neural network structure is one of the most difficult problems in neural network modeling. Three different neural network structures, multilayer perceptron (MLP), generalized regression neural network (GRNN), and probabilistic neural network (PNN), were applied to three Wisconsin Breast Cancer Datasets (WBCD) to show the performance of neural networks on breast cancer data. The following table shows the performance of the ANNs:

Table 6. ANNs Performance [17]

| Data Set | MLP Performance (%) | GRNN Performance (%) | PNN Performance (%) |
|----------|---------------------|----------------------|---------------------|
| 1 | 99 | 97 | 99 |
| 2 | 98 | 95 | 96 |
| 3 | 70 | 75 | 75 |
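The PNN decision rule of Section 3.5 can be sketched as follows. This is a minimal illustration, not the network evaluated in [17]. As an assumption, the pattern layer uses a Gaussian kernel on the Euclidean distance instead of the dot product followed by an activation function; the two forms are equivalent when all vectors are normalized to unit length. The name `pnn_classify`, the smoothing parameter `sigma`, and the toy data are hypothetical.

```python
import numpy as np

def pnn_classify(X_train, labels, X_query, sigma=1.0):
    """Classify each row of X_query with a probabilistic neural network."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Pattern layer: one Gaussian node per training example (assumed kernel).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summation layer: average the pattern outputs belonging to each class.
    scores = np.stack([k[:, labels == c].mean(axis=1) for c in classes], axis=1)
    # Output layer: pick the class with the largest summed activation.
    return classes[scores.argmax(axis=1)]

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = ["benign", "benign", "malignant", "malignant"]
predicted = pnn_classify(X_train, labels, np.array([[0.2, 0.3], [5.1, 5.4]]))
```

Adding or removing a training sample only adds or removes a column of `k`, which reflects the remark in Section 3.5 that samples can be changed without extensive retraining.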

5. CONCLUSION
The Fuzzy C-Means Clustering method produced results with higher classification accuracies; the 80-20% training-test partition gives the highest classification accuracy, 99.87%. Neural networks have been applied to pattern classification and recognition problems. The performance of three networks, MLP, GRNN, and PNN, was investigated for breast cancer diagnosis using three datasets. The results on the third dataset differ because its number of instances is smaller than in the other sets. Overall, the results show that the most suitable neural networks for classifying WBCD data are MLP and GRNN.

REFERENCES
[1] J. Palanivel, N. Kumaravel, An Efficient Breast Cancer Screening System Based on Adaptive Support Vector Machines with Fuzzy C-Means Clustering, European Journal of Scientific Research, ISSN 1450-216X, Vol.51 No.1 (2011), pp.115-123.
[2] http://napavalley.patch.com/articles/healthy-living-can-prevent-breast-cancer-napa-valley-resources
[3] D. West, P. Mangiameli, R. Rampal, and V. West, Ensemble strategies for a medical diagnosis decision support system: A breast cancer diagnosis application. European Journal of Operational Research, 2005, vol.162, pp.532-551.
[4] J.R. Quinlan, Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 1996, vol.4, pp.77-90.
[5] H.J. Hamilton, N. Shan, and N. Cercone, RIAC: A rule induction algorithm based on approximate classification. Technical Report CS, University of Regina, 1996, pp.96-06.
[6] B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. Proceedings of the international conference on engineering applications of neural networks, 1996, pp.427-430.
[7] D. Nauck and R. Kruse, Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine, 1999, vol.16, pp.149-169.
[8] C.A. Pena-Reyes and M. Sipper, A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 1999, vol.17, pp.131-155.
[9] R. Setiono, Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine, 2000, vol.18(3), pp.205-217.
[10] D.E. Goodman, L. Boggess, and A. Watkins, Artificial immune system classification of multiple-class problems. Proceedings of the artificial neural networks in engineering, 2002, pp.179-183.
[11] A.A. Albrecht, G. Lappas, S.A. Vinterbo, C.K. Wong, and L. Ohno-Machado, Two applications of the LSA machine. Proceedings of the 9th international conference on neural information processing, 2002, pp.184-189.
[12] J. Abonyi and F. Szeifert, Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, 2003, vol.24(14), pp.2195-2207.
[13] K. Polat and S. Gunes, Breast cancer diagnosis using least square support vector machine. Digital Signal Processing, 2007, vol.17(4), pp.694-701.
[14] Mehmet Fatih Akay, Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications, Elsevier, 2009, Vol.36, pp.3240-324.
[15] A. Ahirwar and R.S. Jadon, Characterization of tumor region using SOM and Neuro Fuzzy techniques in Digital Mammography, International Journal of Computer Science & Information Technology (IJCSIT), Feb 2011, Vol 3, No 1, pp.199-211.
[16] J.C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics, 1973, vol.3, pp.32-57.
[17] A.A.E. Howida, H.H. Mohammed, Breast Cancer Diagnosis Using Intelligence Neural Network, J.Sc. Tech, (1) 2011, pp.159-171.
[18] Kerem H. Cığızoğlu, Pınar Akın, Artificial neural network models in rainfall-runoff modeling of Turkish rivers. 1990.
