
Innovative Computing Review (ICR)

Volume 3 Issue 1, Spring 2023


ISSN(P): 2791-0024 ISSN(E): 2791-0032
Homepage: https://journals.umt.edu.pk/index.php/ICR

Title: Comparative Analysis of Breast Cancer Detection using Cutting-edge Machine Learning Algorithms (MLAs)

Author(s): Tanzeel Sultan Rana1*, Imran Saleem2, Rabia Naseer Rao1, Maryam Shabbir2, Laiba Wahid Chaudary1
Affiliation(s): 1 University of Management and Technology, Lahore, Pakistan
2 Bahria University, Lahore, Pakistan
3 Octans Digital Pvt Ltd., 92-cc, Ex-Park View, Lahore, Pakistan

DOI: https://doi.org/10.32350.icr.31.01

History: Received: April 12, 2023, Revised: April 29, 2023, Accepted: May 29, 2023, Published: June 24, 2023
Citation: T. Sultan, R. I. Saleem, R. N. Rao, M. Shabbir, and L. W. Chaudary,
“Comparative analysis of breast cancer detection using cutting-edge machine
learning algorithms (MLAs),” Innov. Comput. Rev., vol. 3, no. 1, pp. 01–15,
2023, doi: https://doi.org/10.32350.icr.31.01

Copyright: © The Authors


Licensing: This article is open access and is distributed under the terms of
Creative Commons Attribution 4.0 International License
Conflict of Interest: Author(s) declared no conflict of interest

A publication of
School of Systems and Technology
University of Management and Technology, Lahore, Pakistan
Comparative Analysis of Breast Cancer Detection using Cutting-edge
Machine Learning Algorithms (MLAs)
Tanzeel Sultan Rana1*, Imran Saleem2, Rabia Naseer Rao1, Maryam Shabbir2, Laiba Wahid Chaudary1

1 School of Professional Advancement, University of Management and Technology, Lahore, Pakistan
2 Department of Computer Sciences, Bahria University, Lahore, Pakistan
3 Octans Digital Pvt Ltd., 92-cc, Ex-Park View, Lahore, Pakistan

ABSTRACT Recently, machine learning (ML) techniques have gained popularity in medical diagnosis. Medical professionals use this approach to learn about and detect the abnormalities of life-threatening chronic diseases. The increasing use of ML approaches may be due in part to the better disease diagnosis enabled by improved symptom detection. The current study deployed different machine learning algorithms, including Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Random Forest (RF) classifiers, for the early prediction of the disease and its symptoms. These models were capable of differentiating between benign and malignant cancer cells. Benign tumours, which are non-cancerous and in most cases non-lethal, are mostly confined to the area where they originate. Malignant cancer, however, starts with abnormal cell growth in the human body. This abnormal cell growth can quickly spread to nearby tissues and infiltrate adjacent cells, resulting in a potentially fatal condition. The Multilayer Perceptron (MLP) model provided the highest accuracy, 86%, when compared with all the other techniques.
INDEX TERMS Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Machine Learning (ML), Multilayer Perceptron (MLP), Random Forest (RF)
I. INTRODUCTION

Cancer is a complex disease characterized by the uncontrolled growth of abnormal tissue in the body. Normally, old or damaged cells are replaced by new and healthy cells to maintain the healthy functioning of the body. In contrast, some damaged tissue grows incessantly and becomes a mass of tissue known as a tumour. There are two types of tumours: malignant and benign. The current study focused on breast cancer tumours along with Machine Learning Algorithms (MLAs). The breast is primarily composed of two main types of tissue: glandular tissue and connective tissue. Glandular tissues are responsible for producing milk, whereas connective tissues provide structural support and shape the breast. Glandular tissues may turn into malignant tumours with the passage of time. Most breast cancers originate in the cells of the lobules, the anatomical structures that


Corresponding Author: tanzeelsultanrana1@gmail.com

consist of milk-producing glands, or in the ducts embedded within the breast tissue, which act as the passageways that deliver milk from the lobules to the nipples. Breast stromal tissues, which are composed of fatty and fibrous connective tissues, can also give rise to breast cancer. The overall structural deformation of a woman's breast tissue caused by the presence of a malignant tumour is influenced by age-related changes in the amount of fatty and fibrous tissues in her body [1]. The annual death rate has increased significantly due to the rising number of breast cancer cases, and breast cancer has recently been observed to account for a large share of deaths [2].

Most female patients do not have a good prognosis because the disease is recognised as a cancer of breast tissue only at a late stage, which accounts for its high lethality. This high fatality rate is also due to low awareness of the disease, which leads to last-stage detection and a poor prognosis and allows the malignancy to progress in the breast tissue. Breast cancer usually develops from a genetic abnormality, or defect, in the genetic code. Such an abnormality can cause erroneous gene expression, which becomes a prime reason for the onset and development of the disease. Only 5–10% of cancer cases are attributed to abnormalities inherited from the mother or father. Instead, 85–90% of breast cancer cases are due to genetic abnormalities that arise with aging [3].

Accurate diagnosis of cancer is essential to optimize various aspects of therapy and to increase patients' chances of survival. Many researchers have put forth various approaches for automatic cell classification to diagnose breast cancer in recent years. Among them, machine learning techniques stand out for the classification and prediction of breast cancer among female patients [4]. Machine learning methods have been used to identify cancer and to determine whether a tumour is present, which may be useful in the study of breast cancer. Additionally, these tumours can be predicted using Machine Learning Algorithms (MLAs). With conventional methods of diagnosis, these tumours may go undetected for a long time [5], thereby increasing the proportion of deaths caused by cancer.

Machine learning (ML) techniques have recently emerged as a highly relevant area of practical research and are very useful in the prompt diagnosis of breast cancer. Over the past ten years, the use of machine learning techniques for medical diagnosis has gained popularity. This increased use of ML techniques can be partly attributed to the better disease identification and symptom detection that they enable [6].

The rest of the study is organized as follows. The literature review is given in Section II. Section III explains the experiment performed and the algorithms adopted to obtain the results. These results are presented in Section IV. Section V discusses the results and concludes the research.

II. LITERATURE REVIEW

Machine Learning (ML) can be used to propose fresh diagnostic hypotheses, helping to create a more customized therapeutic proposal. Several different MLAs, such as Decision Trees, Multilayer Perceptrons, K-Nearest Neighbors, and Support Vector Machines (SVM), have been used in this study. The CSV file of the breast mammograms was first taken, and classification models trained to achieve the aforementioned objective were then used. Benign tumours are non-cancerous and non-lethal and are confined to the area where they originate, whereas malignant cancer starts with abnormal cell growth and abnormal behaviour of the messages between cells [7]. There is a vast body of literature on this subject; the current study aims to deploy a computational method that predicts breast cancer accurately.

On the Wisconsin datasets, practitioners analyzed the performance of four classifiers: Naive Bayes (NB), SVM, Decision Tree (DT), and K-Nearest Neighbors (K-NN) [8]. SVM was identified as the best of the four, achieving an accuracy of 97.13% with the lowest error rate according to the confusion matrices. This number is important because conventional diagnostic methods are more prone to errors, which would significantly impact the treatment strategies and staging of breast cancer. Currently, biopsy is the most accurate method for diagnosing breast cancer, and it directly affects medical professionals' decisions.

Haifeng developed an SVM-based ensemble model for the early prediction of cancer. The proposed ensemble model was made up of six different types of kernel functions and two different types of SVM structures, namely ν-SVM and C-SVM [9]. In that study, two datasets from Wisconsin, namely WBC and WDBC, were used together with SEER to test the model. The proposed model improved the diagnosis accuracy compared with other studies using a single SVM. The major disadvantage of this strategy is that it requires a long training period and is computationally expensive. As a result, its practical value is doubtful, because patients with breast cancer are constantly at risk of the disease spreading.

Several deep learning and data mining techniques for the prediction of cancer were examined by several practitioners [10]. According to previous research, only a few papers used genetics, while imaging was used in the majority of the publications. A variety of algorithms, such as CNN and Naive Bayes, were used with imaging techniques. Among Machine Learning Techniques (MLT), Decision Trees, SVM, and Random Forests (RF) were the most popular.

Researchers validated and applied different neural network (NN) techniques, especially in early-stage cancer classification. They found that most NNs were capable of identifying cancerous cells, which are typically difficult to distinguish because they frequently resemble part of healthy breast tissue. Furthermore, they showed that these cells must be observed for a specific amount of time by experts to identify

their distinctive characteristics, including their structure and growth rate, which can differ from those of normal breast cells. The imaging method, however, required a lot of processing power to preprocess the images [11].

Sudarshan Nayak used 3D images to illustrate the use of Machine Learning Algorithms (MLAs) to categorize breast cancer. Before classifying an abnormal mass in the breast parenchyma, many factors were taken into consideration. Based on the overall results, it was concluded that SVM is the best model for this purpose [12].

Similarly, Youness Khoudfi and Mohamed Bahaj, in a comparison of machine learning algorithms, found that SVM is the best classifier, with an accuracy of 97.9%, when compared to K-NN, RF, NB, and a multilayer perceptron (MLP) with 5 layers, using 10-fold cross-validation [13]. Moreover, Ahmed [14] proposed a method for prediction on WBCD that combined a clustering strategy with a probabilistic support vector machine, reaching a prediction rate of 99.10% with the SVM technique.

A new approach was introduced by David A. [15], which employed linear discriminant analysis (LDA) to reduce the feature dimensionality and then applied SVM to the new, reduced feature set. On the BCWD dataset, the practitioner applied five ML algorithms: SVM, Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and KNN. A performance assessment and comparison of these classifiers was carried out after obtaining the results. The main objective of this study was to develop a confusion matrix to assess precision and accuracy [16]. Support vector machines outperformed all other classifiers, with the highest accuracy of 97.2%.

Using the WEKA program (Waikato Environment for Knowledge Analysis), Valentina Mikhailova compared classification algorithms. The dataset for this study comprised 286 cases and 10 attributes. The J48, Naive Bayes, Random Forest (RF), MLP, K*, and SVM models were compared using various parameters, and machine learning metrics helped to assess the performance of the developed models [17]. The SVM algorithm and the J48 model offered the highest levels of accuracy, at 75.5% and 79.6%, respectively.

To classify the WBCD, the practitioner used Naive Bayes, SVM, and Decision Trees (DT); SVM produced the best results, with an accuracy score of 96.99% [18]. In another study, clinical information from medical intensive care units was used, and machine learning techniques were applied for the early detection of disease in patients inside the hospital over the course of 24 hours. KNN and logistic regression produced the highest accuracy when applied to the training data [19].

The biggest problem with breast cancer, according to Nithya et al. [20], is classifying breast tumours because of the structural distortion these tumours cause. It is crucial to determine the type of tumour when considering the prognosis, because it determines the impact of the tumour on the breast tissue. Computer-aided diagnosis (CAD) was used to check the significance of breast cancer tumours in the patient. Their main goal was to use


data mining technologies to improve breast cancer prediction. The classification performance of many machine learning algorithms was enhanced using Bagging, Multiboot, Random Subspace, and Multilayer Perceptron.

Kourou et al. examined the classification of patient groups into high-risk and low-risk patients using Artificial Neural Networks (ANN), Decision Trees (DT), and SVM to present a model for cancer risks [21].

A study has been conducted on mammogram-based diagnosis and biopsy of breast cancer [22]. The study used traditional classification techniques such as LR, LDA, QDA, RF, and SVM in its practical work. Md. Milon Islam compared supervised machine learning algorithms such as SVM, ANNs, and LR [23]. The WBC dataset was taken from the UCI machine learning database, a well-known machine learning resource. The performance was evaluated using the confusion matrix and correlational factors. Additionally, the receiver operating characteristic curves and the areas under the precision-recall curves of the different approaches were assessed. The findings indicated that SVM had the highest accuracy among all applied algorithms, while ANNs had the highest values of accuracy, precision, and F1-score.

Different versions of the DT algorithm for the diagnosis of cancer were employed using MATLAB, Python, and WEKA. CART implemented in Python gave 97.4% accuracy and 98.9% sensitivity, while in WEKA both DT algorithms achieved 95.3% accuracy [24] [25].

In the testing phase, NSVM, LPSVM, SSVM, and LPSVM all achieved the highest accuracy, sensitivity, and specificity, which were 96.5517%, 98.2456%, 96.5517%, and 97.1429%, respectively. Yue et al. provided thorough evaluations of different models using the standard WBCD dataset and Decision Tree (DT) methods, which were used to predict breast cancer. According to the practitioners, the highest accuracy can be achieved by combining two deep learning models. This architecture had a classification accuracy of 99.68%, whereas the SVM method combined with a clustering algorithm had a classification accuracy of 99.10%. They also studied the ensemble method, which employed voting over the J48, SVM, and Naive Bayes models. With the ensemble method, an accuracy of 97.13% was achieved [26]. Infrared imaging coupled with an agent previously administered to a patient can lead to a very accurate tumour detector, using a thermally sensitive camera and a model of the breast [27]. In another study, remote health care systems used technological paradigms and enablers to fulfil the needs of remote monitoring and remote aid, and research gaps were identified to stimulate future research [28].


TABLE I
SUMMARY OF LITERATURE REVIEW
Reference | Model | Method
[8] | Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB), and k Nearest Neighbors (k-NN) | SVM improves breast cancer diagnosis and treatment strategies.
[9] | Support vector machine ensemble algorithm | Haifeng's ensemble model demonstrated improved diagnosis accuracy, but its long training time and computational expense raise concerns.
[10] | Deep learning and machine learning algorithms | Haifeng's ensemble model demonstrated improved diagnosis accuracy, but its long training time and computational expense raise concerns.
[11] | Artificial Neural Network (ANN) | Neural networks can identify cancerous cells but require observation over time and significant processing power for image preprocessing.
[12] | Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision tree (C4.5), and K-Nearest Neighbours (KNN) | SVM is the best supervised machine learning algorithm for breast cancer classification.
[13] | Random Forest, Naïve Bayes, Support Vector Machines (SVM), and K-Nearest Neighbors (K-NN) | SVM is the best classifier with 97.9% accuracy.
[14] | Support Vector Machine (SVM) | Clustering and SVM combined to predict WBCD with 99.10% accuracy.
[15] | Pulse-Coupled Neural Networks (PCNN) and Deep Convolutional Neural Networks (CNN) | A new approach uses LDA to reduce feature dimensionality and applies five ML algorithms to the BCWD dataset.
[16] | Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision tree (C4.5), and K-Nearest Neighbours (KNN) | Support vector machines had the highest accuracy (97.2%).
[17] | Naïve Bayes, J48, K*, Random Forest, Multilayer Perceptron (MLP), and Support Vector Machine (SVM) | SVM and J48 models offer accuracy at 75.5% and 79.6% respectively.
[18] | Naïve Bayes, Support vector machines, Radial basis neural networks, Decision trees J48, and simple CART | SVM produced the best results with 96.99% accuracy.
[19,20] | Randomized Controlled Trial (RCT) | Used data mining technologies to improve breast cancer prediction, using bagging, multiboot, random subspace, and multilayer perceptron to improve the classification performance of machine learning algorithms.

III. MATERIALS AND METHODS

Mammography is a radiological method used for screening breast cancer. Diagnostic mammography is a special kind of mammogram used to detect abnormalities in women who have been referred by a medical professional because of breast problems or suspected cancer. Women over the age of 45 are advised to undergo screening mammography to rule out any breast tumours or malignancies, but this diagnostic procedure itself carries risks related to radiation exposure, both for young women and for women with malignancies. However, mammography only achieves a true-positive rate of about 70%, and many unnecessary biopsies are performed to confirm the indication of this chronic disease.

Recently, several Computer-Aided Diagnosis (CAD) approaches have been put forth to lessen the number of unnecessary screening biopsies. These systems help clinicians to choose, on the basis of a mammogram, between performing a biopsy for a suspicious patient and performing a comprehensive follow-up instead. Due to some important limitations, a biopsy is not usually used to diagnose breast masses.

This data set can be used to classify masses as benign or malignant based on BI-RADS (Breast Imaging-Reporting and Data System) features and the patient's age, which is considered a benchmark classification for breast cancer diagnosis. This system expresses the likelihood of malignant breast cancer. Sensitivity and the associated specificity can be determined by assuming that all instances with BI-RADS ratings greater than or equal to a specific value (ranging from 1 to 5) are malignant and all other cases are benign. These values can demonstrate a CAD system's performance in comparison to radiologists. The previous literature shows that different practitioners use different methods, whereas machine learning methods were employed in the current study to check the significance of the dataset, which fills the gap. Previously, these types of models have not been applied by any researcher to check the accuracy of cancer genome, but in this paper novel model techniques have been used to aid the research in this field.

A. DATA PREPARATION

Data preprocessing locates or eliminates outliers and removes inconsistencies. The sample code number (id) is removed from the dataset; its removal is justified by the fact that it has no bearing on the disease. The dataset contains 16 missing attribute values, which are replaced by the mean. Random selection from the dataset was employed to ensure that the data is distributed correctly. The dataset contains the following features: id, diagnosis (M = malignant, B = benign), radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Ten real-valued features are computed for each cell nucleus, and for each of them the mean, the standard deviation, and the worst (largest) value are recorded. The dataset is described in Table II below.

TABLE II
DESCRIPTION OF DATASET
Feature | Role
Diagnosis | the main feature of classification
Radius | mean of distances from the center
Texture | the standard deviation of gray-scale values
Smoothness | local variation in radius lengths
Compactness | perimeter^2 / area - 1.0
Concavity | the severity of concave portions of the contour
Concave points | number of concave portions of the contour
Fractal dimension | "coastline approximation" - 1
Class distribution | 357 benign, 212 malignant

B. MODEL SELECTION

The learning process in machine learning can be divided into two main categories: supervised learning and unsupervised learning. The following machine learning models were used in this study:

Support Vector Machine (SVM)
Multilayer Perceptron (MLP)
Decision Tree (DT)
Random Forest (RF)
K-Nearest Neighbor (KNN)
Logistic Regression (LR)

Different models were used to find the best result for the dataset. The aforementioned methods were used in the research because they provide better classification and interpretation. Moreover, a convolutional neural network can be used in the future to compare this work with a deep learning model. The methods employed in this study are widely used in classification problems, which gave clear directions for conducting a better analysis. A generic flow diagram of the framework is given below:


FIGURE 1. Proposed Framework
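
The first stages of the framework, as the text describes them (removing the id column, replacing missing values by the column mean, and partitioning the data into folds), can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `drop_column`, `impute_mean`, and `k_fold_indices`, the toy records, and the fixed random seed are all hypothetical.

```python
import random
from statistics import mean


def drop_column(rows, name):
    """Return copies of the rows without the given column (e.g. the id)."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def impute_mean(rows, columns):
    """Replace missing values (None) in each listed column by the column mean."""
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        col_mean = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random selection of samples
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Toy records standing in for the real dataset (the values are made up).
records = [
    {"id": 1, "radius": 12.0, "texture": 18.0, "diagnosis": "B"},
    {"id": 2, "radius": None, "texture": 21.0, "diagnosis": "M"},
    {"id": 3, "radius": 20.0, "texture": None, "diagnosis": "M"},
    {"id": 4, "radius": 13.0, "texture": 15.0, "diagnosis": "B"},
]
clean = impute_mean(drop_column(records, "id"), ["radius", "texture"])
splits = list(k_fold_indices(len(clean), k=4))  # k=10 in the study
```

With Scikit-learn, which the study reports using, the same two steps are usually expressed with `sklearn.impute.SimpleImputer(strategy="mean")` and `sklearn.model_selection.KFold`; the plain-Python version is shown only to make each step explicit.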


C. TRAINING AND TESTING PHASE

First, the dataset was split into two chunks for training and testing. In the training phase, the main features were extracted to classify the cancer type, whereas in the testing phase, the predicted cases were checked in the confusion matrix. In K-fold cross-validation, one fold is used for testing while the other k − 1 folds are used for training, and this is repeated for every fold. Cross-validation was used to guard against overfitting. The data was partitioned using a ten-fold cross-validation technique: each iteration used nine folds for training and the remaining fold for testing.

D. SETUP FOR EXPERIMENTATION

Six machine learning approaches were utilized in this study: SVM, KNN, DT, RF, MLP, and LR. The predictions concerned the benign or cancerous nature of cells. A machine with an Intel(R) Core(TM) i3-1111G4 CPU @ 3.00GHz (2995 MHz) and 8 GB RAM was used in this study. Scikit-learn, an open-source library written in Python, was employed for the analysis. Reports containing narrative text, live code, equations, and graphics were created using Colab.

Many performance metrics were employed to measure the effectiveness of the Machine Learning Algorithms (MLAs). For the evaluation, confusion matrices were used, whose entries TP, TN, FN, and FP compare the predicted data with the real data. For all the methods employed, the calculated confusion matrices are as follows:

TABLE III
SUPPORT VECTOR MACHINE (SVM) CONFUSION MATRIX
          Negative  Positive
Negative  TN=84     FP=33
Positive  FN=13     TP=78

In the case of SVM, TN, FN, FP, and TP are 84, 13, 33, and 78, respectively, whereas for KNN these are 81, 16, 24, and 87.

TABLE IV
K-NEAREST NEIGHBORS (KNN) CONFUSION MATRIX
          Negative  Positive
Negative  TN=81     FP=24
Positive  FN=16     TP=87

For the Decision Tree (DT), TN, FN, FP, and TP are calculated as 72, 25, 34, and 77, respectively.

TABLE V
DECISION TREE (DT) CONFUSION MATRIX
          Negative  Positive
Negative  TN=72     FP=34
Positive  FN=25     TP=77

For the Multilayer Perceptron (MLP), TN, FN, FP, and TP are calculated as 20, 9, 21, and 88, respectively.

TABLE VI
MULTILAYER PERCEPTRON (MLP) CONFUSION MATRIX
          Negative  Positive
Negative  TN=20     FP=21
Positive  FN=9      TP=88

For Logistic Regression (LR), TN, FN, FP, and TP are calculated as 88, 11, 20, and 89, respectively.

TABLE VII
LOGISTIC REGRESSION (LR) CONFUSION MATRIX
          Negative  Positive
Negative  TN=88     FP=20
Positive  FN=11     TP=89

For the Random Forest (RF), TN, FN, FP, and TP are calculated as 80, 19, 21, and 88, respectively.

TABLE VIII
RANDOM FOREST (RF) CONFUSION MATRIX
          Negative  Positive
Negative  TN=80     FP=21
Positive  FN=19     TP=88

IV. RESULTS

The study showed that MLP achieved the highest accuracy, 86%, and that the second-highest accuracy, 85.2%, was obtained through Logistic Regression (LR). Additionally, the highest precision values for class 0 were achieved by MLP and LR, and MLP achieved a precision of 81% for class 1. The lowest precision for class 0 was 75%, obtained by SVM. LR provided the highest recall for class 0, whereas the lowest values were obtained through the Decision Tree (DT) and SVM. For class 1, MLP provided the highest recall, and the lowest values were again given by DT and SVM.

In terms of the F1 score for class 0, the highest value was achieved by MLP and LR, whereas SVM and DT provided the lowest value.

TABLE IX
ACCURACY ATTAINED BY MODELS
Model          Accuracy (%)
SVM            78
KNN            81
Decision Tree  72
LR             85
Random Forest  81
MLP            86

Beyond accuracy, the precision, recall, and F1 score are needed to check the significance of the obtained results. Hence, these results are discussed in the discussion section for further interpretation.
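
As a cross-check, the accuracy values of Table IX can be recomputed from the confusion-matrix counts printed in Tables III, IV, V, VII, and VIII. The sketch below is illustrative only; the function name `accuracy` and the dictionary layout are assumptions made for this example. The counts printed in Table VI add up to 138 test cases rather than the 208 of the other matrices, so the MLP is left out.

```python
def accuracy(tn, fp, fn, tp):
    """Share of test cases on the diagonal of a 2x2 confusion matrix."""
    return (tn + tp) / (tn + fp + fn + tp)


# Counts copied from Tables III, IV, V, VII and VIII.
confusion = {
    "SVM": dict(tn=84, fp=33, fn=13, tp=78),
    "KNN": dict(tn=81, fp=24, fn=16, tp=87),
    "DT": dict(tn=72, fp=34, fn=25, tp=77),
    "LR": dict(tn=88, fp=20, fn=11, tp=89),
    "RF": dict(tn=80, fp=21, fn=19, tp=88),
}

# Accuracy in percent, rounded to the nearest integer as in Table IX.
accuracy_pct = {model: round(100 * accuracy(**c)) for model, c in confusion.items()}
print(accuracy_pct)
# {'SVM': 78, 'KNN': 81, 'DT': 72, 'LR': 85, 'RF': 81}
```

Each of these five matrices contains 208 test cases, and the rounded values agree with the corresponding rows of Table IX.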


FIGURE 2. Comparison of models


The MLP model provided the highest accuracy when compared to all other techniques, as shown in Figure 2. The remaining metrics are discussed in the discussion section below.

V. DISCUSSION AND CONCLUSION

To evaluate the models, the following measures, each defined by its own formula, were used in the investigation. Taken together, these characteristics give an adequate measure of a system's performance. The experimental results are given in Table X.
TABLE X
MACHINE LEARNING AND DEEP LEARNING MODELS RESULTS
Accuracy Precision(0) Precision(1) Recall(0) Recall(1) F1score(0) F1score(1)
SVM 78% 75% 68% 69% 74% 72% 71%
KNN 81% 84% 77% 78% 84% 81% 80%
DT 73% 75% 68% 69% 74% 72% 71%
LR 85% 89% 81% 82% 89% 85% 85%
RFT 81% 82% 79% 81% 81% 81% 80%
MLP 86% 91% 81% 81% 91% 85% 86%
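
Table X reports accuracy together with per-class precision, recall, and F1 score. The formulas are not printed in the text, so the block below states the standard definitions in terms of the confusion-matrix entries TP, TN, FP, and FN. It assumes that class 1 is the positive class and class 0 the negative class, which is an interpretation of the column labels rather than something the paper states.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}(1) &= \frac{TP}{TP + FP}, &
\text{Precision}(0) &= \frac{TN}{TN + FN} \\
\text{Recall}(1) &= \frac{TP}{TP + FN}, &
\text{Recall}(0) &= \frac{TN}{TN + FP} \\
\text{F1}(c) &= \frac{2\,\text{Precision}(c)\,\text{Recall}(c)}
                     {\text{Precision}(c) + \text{Recall}(c)},
\qquad c \in \{0, 1\}
\end{align}
```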

For each technique, the confusion matrix was calculated. For the dataset with 831 instances, 75% of the data instances were employed for training the models, whereas 25% of the data was used for testing. The confusion matrices of these machine learning algorithms, shown in the tables above, provide the results for SVM, DT, RF, K-NN, MLP, and LR.

Among the six techniques, Logistic Regression (LR) predicted the greatest number of true positives and, at the same time, the smallest number of false positives. LR therefore had the lowest false-positive rate, whereas the highest true-positive values were achieved by LR and MLP.

The Decision Tree (DT) gave the highest number of false negatives, with the lowest being achieved by MLP. The F1 scores of the techniques range from 71% to 86% (Table X). LR predicted the highest number of true negatives, whereas MLP provided the lowest value.

REFERENCES
[1] D. Lazaro-Pacheco, A. M. Shaaban, S. Rehman, and I. Rehman, “Raman spectroscopy of breast cancer,” Appl. Spectrosc. Rev., vol. 55, no. 6, pp. 439–475, 2020, doi: https://doi.org/10.1080/05704928.2019.1601105
[2] M. Arnold et al., “Current and future burden of breast cancer: Global statistics for 2020 and 2040,” The Breast, vol. 66, pp. 15–23, 2022, doi: https://doi.org/10.1016/j.breast.2022.08.010
[3] F. Gorunescu, “Fighting breast cancer with the aid of artificial intelligence: A big challenge,” J. Cancer Clin. Res., vol. 3, no. 1, Art. no. 1069.
[4] N. A. Mashudi, S. A. Rossli, N. Ahmad, and N. M. Noor, “Comparison on some machine learning techniques in breast cancer classification,” in 2020 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), IEEE, 2021, pp. 499–504, doi: https://doi.org/10.1109/IECBES48179.2021.9398837
[5] S. Saeed, N. Z. Jhanjhi, M. Naqvi, M. Humyun, M. Ahmad, and L. Gaur, “Optimized breast cancer premature detection method with computational segmentation: A systematic review mapping,” Approaches Appl. Deep Learn. Virtual Med. Care, pp. 24–51, 2022, doi: https://doi.org/10.4018/978-1-7998-8929-8.ch002
[6] W. Yue, Z. Wang, H. Chen, A. Payne, and X. Liu, “Machine learning with applications in breast cancer diagnosis and prognosis,” Designs, vol. 2, no. 2, Art. no. 13, May 2018, doi: https://doi.org/10.3390/designs2020013
[7] L. Galluzzi et al., “Autophagy in malignant transformation and cancer progression,” EMBO J., vol. 34, no. 7, pp. 856–880, 2015, doi: https://doi.org/10.15252/embj.201490784
[8] H. Asri, H. Mousannif, H. Al Moatassime, and T. Noel, “Using machine learning algorithms for breast cancer risk prediction and diagnosis,” Procedia Comput. Sci., vol. 83, pp. 1064–1069, 2016, doi: https://doi.org/10.1016/j.procs.2016.04.224
[9] H. Wang, B. Zheng, S. W. Yoon, and H. S. Ko, “A support vector machine-based ensemble algorithm for breast cancer diagnosis,” Eur. J. Oper. Res., vol. 267, no. 2, pp. 687–699, 2018, doi:

School of System and Technology


13
Volume 3 Issue 1, Spring 2023
Comparative Analysis of Breast Cancer...

https://doi.org/10.1016/j.ejor.2017.12. classification techniques for breast


001 cancer diagnosis,” in IOP Conf. Series:
Mat. Sci. Eng., 2019, Art. no. 12033,
[10] N. Fatima, L. Liu, S. Hong, and H.
doi: https://doi.org/10.1088/1757-
Ahmed, “Prediction of breast cancer,
899X/495/1/012033
comparative review of machine
learning techniques, and their [16] M. A. Naji, S. El Filali, K. Aarika, E.
analysis,” IEEE Access, vol. 8, pp. L. H. Benlahmar, R. A. Abdelouhahid,
150360–150376, 2020, doi: and O. Debauche, “Machine learning
https://doi.org/10.1109/ACCESS.2020 algorithms for breast cancer prediction
.3016715 and diagnosis,” Procedia Comput. Sci.,
vol. 191, pp. 487–492, 2021, doi:
[11] M. Mahmood, B. Al-Khateeb, and W.
https://doi.org/10.1016/j.procs.2021.0
M. Alwash, “A review on neural
7.062
networks approach on classifying
cancers,” IAES Int. J. Artif. Intell., vol. [17] V. Mikhailova and G. Anbarjafari,
9, no. 2, p. 317, 2020, doi: “Comparative analysis of
https://doi.org/1010.11591/ijai.v9.i2.p classification algorithms on the breast
p317-326 cancer recurrence using machine
learning,” Med. Biol. Eng. Comput.,
[12] B. M. Gayathri and C. P. Sumathi,
vol. 60, no. 9, pp. 2589–2600, July
“Comparative study of relevance
2022, doi:
vector machine with various machine
https://doi.org/10.1007/s11517-022-
learning techniques used for detecting
02623-y
breast cancer,” in 2016 IEEE Int. Conf.
Comput. Intell. Comput. Res., IEEE, [18] S. Aruna, S. P. Rajagopalan, and L. V
2016, pp. 1–5, doi: Nandakishore, “Knowledge based
https://doi.org/10.1109/ICCIC.2016.7 analysis of various statistical tools in
919576 detecting breast cancer,” Comput. Sci.
Inf. Technol., vol. 2, no. 2011, pp. 37–
[13] Y. Khourdifi and M. Bahaj, “Applying
45, 2011, doi:
best machine learning algorithms for
https://doi.org/10.5121/csit.2011.1205
breast cancer prediction and
classification,” in 2018 Int. conf. Elec. [19] J. L. Bernal, S. Cummins, and A.
Cont. Optim. Comput. Sci., IEEE, Gasparrini, “Interrupted time series
2018, pp. 1–5, doi: regression for the evaluation of public
https://doi.org/10.1109/ICECOCS.201 health interventions: a tutorial,” Int. J.
8.8610632 Epidemiol., vol. 46, no. 1, pp. 348–
355, Feb. 2017, doi:
[14] A. H. Osman, “An enhanced breast
https://doi.org/10.1093/ije/dyw098
cancer diagnosis scheme based on two-
step-SVM technique,” Int. J. Adv. [20] R. Nithya and B. Santhi,
Comput. Sci. Appl., vol. 8, no. 4, 2017, “Classification of normal and
doi: abnormal patterns in digital
https://doi.org/10.14569/IJACSA.201 mammograms for diagnosis of breast
7.080423 cancer,” Int. J. Comput. Appl., vol. 28,
no. 6, pp. 21–25, Aug. 2011.
[15] D. A. Omondiagbe, S. Veeramani, and
A. S. Sidhu, “Machine learning
Innovative Computing Review
14
Volume 3 Issue 1, Spring 2023
Rana et al.

[21] K. Kourou, T. P. Exarchos, K. P. Transac. Software Eng., vol. 9, no. 2,


Exarchos, M. V Karamouzis, and D. I. June 2021.
Fotiadis, “Machine learning
[27] D. Lazaro-Pacheco, A. M. Shaaban, S.
applications in cancer prognosis and
Rehman, and I. Rehman, “Raman
prediction,” Comput. Struct.
spectroscopy of breast cancer,” Appl.
Biotechnol. J., vol. 13, pp. 8–17, 2015,
Spectrosc. Rev., vol. 55, no. 6, pp. 439–
doi:
475, 2020, doi:
https://doi.org/10.1016/j.csbj.2014.11.
https://doi.org/10.1080/05704928.201
005
9.1601105
[22] D. Oyewola, D. Hakimi, K. Adeboye,
[28] E. U. Khadim, S. A. Shah and R. A.
and M. D. Shehu, “Using five machine
Wagan, "Evaluation of Activation
learning for breast cancer biopsy
Functions in CNN Model for
predictions based on mammographic
Detection of Malaria Parasite using
diagnosis,” Int. J. Eng. Technol. IJET,
Blood Smear Images," 2021
vol. 2, no. 4, pp. 142–145, 2016, doi:
International Conference on
https://doi.org/10.19072/ijet.280563
Innovative Computing (ICIC), Lahore,
[23] M. K. Hasan, M. M. Islam, and M. M. Pakistan, 2021, pp. 1-6, doi:
A. Hashem, “Mathematical model 10.1109/ICIC53490.2021.9693056
development to detect breast cancer
[29] I. Ahmad et al., "Emerging
using multigene genetic
Technologies for Next Generation
programming,” in 2016 5th Int. Conf.
Remote Health Care and Assisted
Info. Elect. Vision, IEEE, 2016, pp.
Living," in IEEE Access, vol. 10, pp.
574–579, doi:
56094-56132, 2022, doi:
https://doi.org/10.1109/ICIEV.2016.7
10.1109/ACCESS.2022.3177278
760068
[24] E. Venkatesan and T. Velmurugan,
“Performance analysis of decision tree
algorithms for breast cancer
classification,” Indian J. Sci. Technol.,
vol. 8, no. 29, pp. 1–8, Nov. 2015, doi:
https://doi.org/10.17485/ijst/2015/v8i
1/84646
[25] T. S. Rana, H. M. Usman, and S.
Naseer, “Static handwritten signature
verification using convolution neural
network,” in 2019 Int. Conf. Inn.
Comput., IEEE, 2019, pp. 1–6, doi:
https://doi.org/10.1109/ICIC48496.20
19.8966696
[26] T. S. Rana and A. Ashraf, “Bladder and
kidney cancer genome classification
using neural network,” VFAST

School of System and Technology


15
Volume 3 Issue 1, Spring 2023

You might also like