
A novel SVM Kernel Classifier Technique using Support Vector Machine for Breast Cancer Classification
G S Pradeep Ghantasala
Chitkara University Institute of engineering and Technology, Chitkara University
Yaswanth Raparthi
Vellore Institute of Technology
Venkateswarulu Naik. B
Narasimha Reddy Engineering College
Amal Al-Rasheed
Princess Nourah bint Abdulrahman University
Mohammed S. Alqahtani
King Khalid University
Mohamed Abbas
King Khalid University
Ben Othman Soufiene (soufiene.benothman@isim.rnu.tn)
University of Sousse

Article

Keywords: Breast cancer, Classification, Support Vector Machine, Chemotherapy, Kernels

Posted Date: April 24th, 2023

DOI: https://doi.org/10.21203/rs.3.rs-2820379/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: No competing interests reported.


A novel SVM Kernel Classifier Technique using Support Vector Machine for Breast Cancer Classification
G S Pradeep Ghantasala 1, Yaswanth Raparthi 2, Venkateswarulu Naik. B 3, Amal Al-Rasheed 4,*, Mohammed S. Alqahtani 5,6, Mohamed Abbas 7, Ben Othman Soufiene 8

1 Department of Computer Science and Engineering, Chitkara University Institute of engineering and Technology, Chitkara University, Punjab, India; ggspradeep@gmail.com
2 School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu, India; yaswanthraparthi@gmail.com
3 Department of Computer Science and Engineering, Narasimha Reddy Engineering College, Secunderabad, India; b.v.naik681@gmail.com
4 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia; aaalrasheed@pnu.edu.sa
5 Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia; mosalqhtani@kku.edu.sa
6 BioImaging Unit, Space Research Centre, Michael Atiyah Building, University of Leicester, Leicester, LE17RH, U.K.
7 Electrical Engineering Department, College of Engineering, King Khalid University, Abha 61421, Saudi Arabia; mabas@kku.edu.sa
8 PRINCE Laboratory Research, ISITcom, Hammam Sousse, University of Sousse, Tunisia;

* Correspondence: Ben Othman Soufiene

Abstract: Breast cancer prediction is an important topic in healthcare. Breast cancer is one of the most common cancers in women, and early detection greatly improves the chances of successful treatment and long-term survival. There are several methods for predicting breast cancer, including imaging studies, genetic testing, and risk assessment models. One approach to detecting breast cancer is to use machine learning algorithms such as support vector machine (SVM) classifiers. SVMs are a popular type of supervised learning algorithm that can be used for classification or regression analysis. To use SVMs for breast cancer classification, the data must first be divided into training and testing sets. The training set is used to train the SVM model, and the testing set is used to evaluate the performance of the model. The SVM model learns to classify the data by adjusting the parameters of the kernel function. In this paper, the performance of the Linear, Polynomial, Gaussian and Sigmoid kernels in the Support Vector Machine method was investigated to determine which kernel classifier is better at diagnosing breast cancer. The study made use of the Wisconsin Breast Cancer (Diagnostic) dataset, which contains 569 instances and 32 features. The major objective of this study is to compare a variety of kernel classifiers to identify the one that provides the best accuracy. The linear-kernel support vector machine was shown to have the highest accuracy (97.90%) and the lowest false discovery rate in this investigation. In contrast, the other kernels and classification algorithms showed lower performance and may therefore be less accurate in breast cancer prediction.

Keywords: Breast cancer; Classification; Support Vector Machine; Chemotherapy; Kernels

1. Introduction
Women of childbearing age in the United States and worldwide are dying from breast cancer at an alarming rate. According to the WHO, cancer will afflict 19.3 million individuals globally by 2025 [1]. Because this disease can spread if it is not kept under control, controlling the growth of breast cancer in women has become a top priority. Breast cancer must be properly diagnosed for the patient to receive the appropriate treatment. It is a cancerous tumor that originates in breast tissue and can spread throughout the body. As a rule, the majority of breast malignancies are classified as either invasive or non-invasive. Chemotherapy patients with non-invasive breast cancer can expect tumors that have not yet progressed to other parts of the body. According to some research organizations, an additional 400,000 women die of the disease each year. Obesity, hormones, radiation therapy, reproductive variables, or even a family history of the disease are all potential causes of breast cancer. Early detection of breast cancer decreases mortality by 25% and allows for easier treatment with fewer side effects. Techniques for analyzing the breast have improved significantly over the past decade, but new and improved approaches must still be created to save the lives of breast cancer patients. Data gathering, selecting a model, training the model, and testing the model are all common steps in machine learning for the classification of cancer. The machine learning algorithm SVM was considered in this research as a potential method for early identification of breast cancer. Machine learning technologies are used to mine large datasets for new information, but it is difficult to deduce what the retrieved data means. Because the patient's data must be classified to determine whether the malignancy is invasive or non-invasive, classification techniques are used; machine learning classifiers, however, can need considerable time to deal with such categorization problems. Based on statistical learning, the Support Vector Machine (SVM) classifier is an approach to supervised pattern classification. SVMs solve pattern recognition problems and are commonly employed in classification and regression tasks. They classify data in high-dimensional feature spaces accurately and efficiently. A wide variety of data can be classified using machine learning classifiers these days, and such classifiers were employed early on to classify cancer patient data. These classifiers are frequently employed because they allow for strong inferences and help in deciding which class unlabeled data in a given dataset belongs to. Predictions of breast cancer using several classifiers have been reported, with the most effective model showing excellent accuracy. An approach based on multi-classifiers was found to be effective in detecting and treating breast cancer.
This paper attempts to predict breast cancer for early diagnosis in the health industry. The principal contributions made by this paper are listed below.
• We investigated existing research on early breast cancer diagnosis in healthcare and examined machine learning models for breast cancer prediction, as well as previous and modern techniques in healthcare applications.
• A novel SVM kernel classifier technique is adopted, and the various phases of breast cancer prediction are clearly explained. The kernels considered are the Linear, Polynomial, Gaussian and Sigmoid kernels; selected features were given as input, and the technique achieves promising performance.
• The various phases are briefly explained, including the features and the data pre-processing that are required to predict breast cancer.
Finally, the breast cancer predictions were evaluated using the following metrics: training accuracy, test accuracy, recall, precision, F1 score and ROC curve, which can help physicians in early diagnosis.
The rest of this paper is organized as follows: related work is presented in Section 2, the methodology and the Support Vector Machine kernel functions are described in Sections 3 and 4, the performance metrics are given in Section 5, and Sections 6 and 7 contain the experimental findings, a discussion, and a conclusion.

2. Related Work
Digital mammograms may be used to detect breast cancer using SVM and KNN, two of the most promising machine learning algorithms [2]. Although the AUC score of SVM was somewhat higher than that of KNN, the results demonstrate that the accuracy of both techniques was equivalent. They also show that KNN and SVM may be used effectively for the identification of breast cancer, which could help clinicians interpret mammography more accurately. Among the approaches in [3], Support Vector Classification, Logistic Regression, and Multilayer Perceptron all perform fairly well. The accuracy ratings for all available approaches are more than 90% [4], even after a significant reduction in the number of features employed. Random Forest, K-Nearest Neighbors, Naive Bayes and Decision Tree [5] are a few of the classification algorithms now in use. The area under the receiver operating characteristic curve, the confusion matrix, the recall score and the accuracy were the most informative evaluation measures. Overfitting may be avoided by using cross-validation with a k-fold value of 3. Deep learning methods have received a lot of interest for object identification, picture recognition, and computer vision [6]. When cancer is diagnosed and categorized early, patients have a far better chance of surviving. The most modern CNN models were designed to assist radiologists in detecting even the tiniest of lesions at the earliest opportunity. Breast cancer is diagnosed via mammography and Fuzzy C-Means (FCM) image segmentation. Data is brought in, and features are extracted and learned as the regions are segmented. Thereafter, a mammography classification system is utilized to categorize the trained images. Gray-level co-occurrence matrix (GLCM), multi-level discrete wavelet transform (MWT), and Principal Component Analysis (PCA) are some of the techniques used to extract texture from images. Only 5% of the previous data's [7] Accuracy, Specificity, and Significance levels remained after the multi pre-processing, indicating that the proposed approach outperformed the other algorithms. A comparison of data mining classification algorithms, focusing on the learning approach, is presented in [8]. There are many options in this field when it comes to imaging modalities. The sources of information for this review included a wide range of research databases [9] that provided access to a diverse range of publications in the field.
Finally, this research examines how breast cancer is currently being classified and diagnosed, as well as the open issues that remain. It is possible to save lives and improve the diagnosis and treatment of DCIS and LCIS if they are detected early enough. For normal and abnormal mammography, the CVSM results in [10] show accuracy rates of 98.95% and 98.01%, respectively. A detailed investigation of machine learning algorithms combined with the prominent methods for dimensionality reduction was carried out in [33]. For routing in BASNs, Muhammad Faheem [34] found optimal dynamic cluster-based solutions between input and output by removing the challenges of local search. Table 1 presents a comparison of the related works presented in this section.

Table 1. Previous work.

Year | Author | Purpose | Pre-eminence
--- | --- | --- | ---
2021 | Vartika Mishra et al. [11] | The reduction of features and classification of thermograms can be used to distinguish breast cancer tumours. | The features in this research were chosen using unsupervised feature reduction methodologies. Malignancy was detected by comparing the subset of features to various classifiers. The most accurate algorithm was Random Forest.
2021 | Amin Ul Haq et al. [12] | Supervised and unsupervised feature selection techniques for the recognition of breast cancer in clinical data | Classifier support vector machines trained and tested on the characteristics selected by Relief algorithm, Autoencoder, and PCA algorithm to determine if they can accurately diagnose breast cancer.
2021 | Pronab Ghosh, Sami Azam et al. [13] | Deep Learning Algorithms for Accurate Breast Cancer Expectation Based on Performance | Wisconsin Breast Cancer (Diagnostic) Dataset is used and compared many Deep Learning methodologies. Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) are the finest options.
2021 | Naveed Chouhan, Asifullah Khan et al. [14] | Breast cancer recognition utilising digital mammography and a deep convolutional neural network based on emotional learning | Detection of Breast cancer is being developed using an automated system that recognizes abnormalities in mammograms.
2021 | Jun Bai, Russell Posner et al. [15] | Breast cancer recognition by using deep learning in digital tomosynthesis: a review. | DBT is an effective method for integrating deep learning into breast cancer screening procedures. Diagnostic and non-diagnostic DBT photos are used to gather data for this study, which examines the theoretical underpinnings of deep learning algorithms.
2021 | Abeer Saber, Mohamed Sakr et al. [16] | Automatic Detection and Classification of Breast Cancer Using Transfer-Learning Methodology. | Built on transfer learning a new deep-learning model can now make it easier to automatically identify and diagnose BC (TL). To extract features from the MIAS dataset, a convolutional neural network (CNN) was used.
2021 | Neda Shahbazi, Rouholah Zare-Dorabei et al. [17] | A review of optical biosensing nanoparticles for breast cancer detection | This work describes and illustrates nanoparticle-based optical biosensors for the recognition of breast cancer. The benefits and drawbacks of more accurate breast cancer detection are discussed, as well as the obstacles, possibilities, and prospects for breast cancer biomarkers.
2021 | Tsehay Admassu Assegie [18] | K-Nearest Neighbor optimization is used in a breast cancer diagnostic system. | A grid search and an enhanced KNN-based breast cancer detection model were used in this study to find the optimal parameter.
2021 | Rajesh Saturi, Parvataneni Premchand [19] | Breast Cancer Detection Using ACO and PSO Algorithms in Multi-Objective Feature Selection | Breast cancer can be detected using multi-objective feature selection utilising Ant Colony Optimization (ACO) then Particle Swarm Optimization (PSO). The proposed strategy, for example, decreases the number of breast tumours incorrectly identified by focusing on the most relevant features.
2021 | Engin Bozada, Gizem Solmaz et al. [20] | Breast Cancer Histopathology Image Nuclei Detection Using RetinaNet | To locate nuclei in histopathological images of breast cancer, researchers have employed RetinaNet and a ResNet-152 feature extractor.
2021 | Gul Shaira Banu Jahangeer et al. [21] | Series networks and VGG-16 can be used to identify breast cancer at an initial stage. | It has been shown that the Gradient Descent Decision Tree Classification and the hybrid VGG-16 network segmentation have produced better accurate classifications.
2021 | Ebrahim Mohammed Senan et al. [22] | Using deep learning to identify breast cancer in its earliest stages using histological images | The CNN (AlexNet) approach is used to categorize breast cancer as either benign or malignant. It was employed in four distinct trials, each with a different factor of magnification.
2021 | Saad Awadh Alanazi et al. [23] | Increasing the Detection of Breast Cancer Using CNN | ML algorithms employed CNN architectures to compare their accuracy in detecting breast cancer. This system's findings outperform those of ML algorithms by 9 percent.
2021 | Anbia Sohail et al. [24] | Mitosis detection in breast cancer histopathology images using a multi-phase deep CNN | In this paper, a CNN classifier called MitosRes-CNN is proposed and its act is compared to that of the most recent up-to-date CNNs. In expressions of recall, F-score, and accuracy, the results reveal high discrimination ability.
2021 | G. Meenalochini et al. [25] | Examining methods for a diagnosis of the breast cancer from mammograms using machine learning | Taking care of mammography pictures before they are fed into the classifier has been shown in this study to progress the accurateness of the classifier.
2018 | Milosevic, Marina, Jankovic et al. [26] | Early diagnosis and detection of breast cancer | Thermal imaging in the pre-screening phase would considerably reduce the number of women who would need screening mammography if doctors used a computer system for tumor identification based on picture processing.
2019 | Ahmet Hasim Yurttakal et al. [27] | Recognition of breast cancer via deep convolution neural networks consuming MRI images | CNN's ability to classify MRI scans of lesions as malignant or benign tumours shows great promise in medical imaging.
2019 | Francisco Javier Fernandez-Ovies et al. [28] | Detecting of Breast cancer by electromagnetic thermography and deep neural networks | Researchers were able to achieve good classification accuracy with minimal processing and technology resources by employing deep neural networks intended for breast cancer diagnosis.
2018 | Mehmet Ufuk Dalmis et al. [29] | Fully Automated Detection of breast cancer in screening MRI using CNN | Deep Learning is used to build a system that can make use of early-phase scans' spatial information.
2012 | Jini R. Marsilin and G. Wiselin Jiji [30] | An Efficient CBIR Method for Diagnosing the Phases of Breast Cancer Using KNN | The retrieval process concludes with the computation of pattern similarity and the instantiation of patterns, as well as feature extraction and kNN classification. With Content-Based Image Retrieval, it is now possible to recover photos from large databases and diagnose the true phase of breast cancer.
2020 | Reddy, G. Thippa, et al. [32] | Early heart disease prediction and diagnosis | Early heart disease prediction is achieved with fuzzy classifiers using genetic algorithm. This assists medical practitioners in diagnosis at earlier stage.
BASN healthcare monitoring applications could benefit from a safe routing system. Syed Arslan Ali [35] developed a genetic approach that stacks two genetic algorithms to obtain the ideal DBN configuration. An RBM and DBN training session is examined to understand better how the system operates. The proposed approach was evaluated in terms of accuracy, Matthews correlation coefficient, sensitivity, specificity, precision, and F1. The time complexity of the OCI-proposed DBN, one of the essential variables in healthcare, will be studied in the future. In [36], the newly proposed YOLOv5 model has been shown to detect and classify breast cancer, surpassing the limitations of previous studies. In [37], the BCD-WERT approach outperformed support vector machines (SVM), Random Forest, kernel SVM, Decision Trees, Logistic Regression, Stochastic Gradient Descent, and Gaussian Naive Bayes on a widely used state-of-the-art dataset. BCD-WERT had the greatest accuracy rate of 99.30%, followed by SVM at 98.60%. Experiments have also demonstrated that feature selection tactics boost prediction accuracy.
The study in [38] begins with a review of the various imaging modalities, utilizing data from a variety of databases, including those for ultrasound, histology, and mammography, to gain access to a large number of papers. Its second section looks at a range of machine learning practices to assess the chance of breast cancer recurring. The initial data preparation comprises the elimination of missing values and data noise as well as the application of transformations. The dataset is split so that the training dataset accounts for 60% of the overall dataset and the testing dataset accounts for 40%. To improve accuracy and sensitivity, the authors aim to reduce the false positive rates (FPRs) and false negative rates (FNRs). They employ machine learning methods such as LR, SVM, and K-nearest neighbour (KNN) in their new model to improve accuracy in breast cancer classification. They attain 97.7 percent accuracy with a ROC AUC score of 0.99, FPRs of 0.01 and 0.03, and FNRs of 0.03 and 0.01.
Smart homes are increasingly being used to monitor and analyse daily routines. The mission of the researchers in [39] is to make smart homes more ecologically friendly, and their research attempts to improve classification performance for both easy and difficult activities of ordinary life. They confirmed the algorithm's accuracy by using scores granted by a neuropsychologist who conducted in-depth firsthand observation, and achieved an accuracy of 96.02 percent to 99.6 percent with their method [39]. The validity and applicability of the evaluation approach have improved as a result of research in this field. The Grey Filter Bayesian Convolution Neural Network was used to construct an AI-driven, IoT-based eHealth architecture that decreases time and overhead while increasing accuracy [40]. The research in [41] attempts to create support systems and medical device monitoring by integrating big data and the Internet of Things to provide better performance in communication, core scheduling, medical devices and resource management.
Figure 1. Cancer Patients worldwide.

The data in Figure 1 were collected from Google Trends and cover cancer patients worldwide over the past five years, from 2017 to 2022 [42]. As technology grows, there is a chance to forecast breast cancer in the early stage and cure the patients. The literature gap addressed in this research is that the study of normal breast growth, as well as of the transition to cancer, demands an understanding of the roles that genetic and epigenetic modifications play and of the context interactions that they produce. Expert bioinformatics support is necessary to maximize the utility of clinical evidence for translational research collected from primary, recurrent, metastatic, and drug-resistant malignancies, as well as normal breast and blood cancers. Understanding and studying the various machine learning SVM kernel techniques, and performing predictive analysis on breast cancer, can help physicians make early predictions.

3. Methodology
Having a mammogram every two years for 20 years reduces the possibility of dying from breast cancer by 7%. Chemotherapy can often be avoided if cancers are found early enough during screening. Screening also allows women to keep a check on the health of their breasts. Most women will be cancer-free if their mammograms and other tests show no signs of the disease. Nearly half of the women screened for 20 years or more need at least one more checkup (453 in 1,000). This is an increase of 156 women over the initial 1,000 women screened. Early prediction is therefore much needed for the diagnosis of breast cancer.

3.1. Dataset
The data pre-processing module includes an initial analysis of the dataset. Figure 2 is made up of three stages. The first stage is gathering the data from the dataset. The second stage analyzes all aspects that affect data quality. In the third stage, the data are prepared for the following module by removing noise and non-content data. The Wisconsin Diagnosis Breast Cancer dataset is used in this study [31]. The dataset, taken from the Kaggle repository, has 569 instances and 32 attributes with no missing or incorrect data, as shown in Figure 3. Each instance is labelled either malignant (212 observations) or benign (357 observations). Predominant attributes include the diagnosis, mean texture, mean radius, mean area, etc. One class is used to label malignant cases, whereas the other is used to label benign ones.
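The class counts quoted above can be checked with a short sketch. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset (`load_breast_cancer`) rather than the Kaggle CSV used in this study; that copy exposes the 30 numeric features only, so the ID and diagnosis columns that bring the CSV to 32 attributes do not appear in its feature matrix.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled WDBC copy stands in for the Kaggle CSV.
data = load_breast_cancer()
X, y = data.data, data.target

# Map the integer targets back to their class names.
class_counts = Counter(str(data.target_names[label]) for label in y)

print(X.shape)             # (569, 30)
print(dict(class_counts))  # malignant: 212, benign: 357
```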
Figure 2. Collection of Dataset.

Figure 3. Breast Cancer Dataset.


3.2. Analysing the Dataset

Figure 4. Preprocessing the dataset.

Figure 4 depicts the three steps of the design for the first module. After a first inspection confirms it, the cleansed breast cancer dataset is placed in a log file. The goal of module one is to carry out the processing steps that the dataset requires. The output of this processing is saved in a log file and then sent to module two. Data preprocessing involves label encoding, which assigns numeric labels to the diagnosis feature, and the StandardScaler, which is used to scale the values of the training and testing data.
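The two preprocessing steps can be sketched as follows. The toy values are hypothetical and chosen only for illustration, and fitting the scaler on the training rows alone is an assumption of this sketch; the text does not say how the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: diagnosis labels and two numeric features per sample.
diagnosis = np.array(["M", "B", "B", "M", "B", "M"])
features = np.array([[17.9, 10.4], [13.5, 14.4], [12.4, 15.7],
                     [20.6, 17.8], [11.4, 20.4], [19.7, 21.3]])

# Label encoding: classes are sorted alphabetically, so B -> 0 and M -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(diagnosis)

# Assumption: the scaler is fitted on the training rows only and then
# applied unchanged to the test rows, so no test statistics leak into training.
X_train, X_test = features[:4], features[4:]
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(y.tolist())  # [1, 0, 0, 1, 0, 1]
```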

3.3. Categorizing Dataset

Figure 5. Categorizing the dataset.

The Classification of Breast Cancer Detector, shown in Figure 5, is the data module for categorizing data [43,44]. In this third module, the breast cancer dataset is separated into three groups (training, testing, and validation datasets), and an SVM is used to distinguish between benign and malignant breast cancer masses.
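A three-way split of this kind might be sketched as below. The 60/20/20 proportions, the stratification and the random seed are assumptions made for illustration, since the text does not give the split sizes, and scikit-learn's bundled copy of the dataset is used in place of the study's CSV.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Assumed proportions: hold out 20% for testing, then take 25% of the
# remainder for validation, giving roughly 60/20/20 overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))
```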
4. Support Vector Machine Kernel functions
Supervised machine learning can be employed to solve classification and regression problems. The support vector machine is an important and adaptable machine learning technique, capable of performing linear and nonlinear classification, regression, and outlier identification in a variety of contexts. Classification algorithms that use less computation and are more accurate are preferred over those that require more computation and are less accurate. The SVM provides accurate results even when data are scarce. The various SVM kernel functions presented in Figure 6 are explained below:

Figure 6. Performance of SVM kernel classifiers.

4.1. Kernel function

$E(\bar{x}) = \begin{cases} 1 & \text{if } \|\bar{x}\| \le 1 \\ 0 & \text{otherwise} \end{cases}$ (1)
In the context of support vector machines, a "kernel" is a mathematical function that provides the foundation for the method. Kernel functions are commonly used to transform the training set so that a non-linear decision surface becomes a linear one in a higher-dimensional space. The purpose of doing so is to raise precision. A kernel function provides access to the inner product between two points in a common feature space; the function in Equation (1) takes one of two values, 0 or 1. Linear, Polynomial, RBF and Sigmoid are the four most often used kernels for SVM research in this field.
They are described below:
1. Linear Kernel
In the linear kernel, the kernel function is the following linear function:
$E(x_m, x_n) = x_m^{T} x_n$ (2)
Here $x_m$ and $x_n$ are the input vectors and E is the linear kernel. It is used when the data can be separated linearly, that is, when the classes can be segregated using a single line. It is one of the most widely used kernels and is frequently employed when a dataset contains a large number of features, which makes it a popular choice for text classification. To train with a linear kernel, only the C regularization parameter has to be optimized. When working with other kernels, the $\gamma$ parameter must also be optimized.
2. Polynomial Kernel
The polynomial kernel represents the similarity between training samples (vectors) in a feature space over polynomials of the original variables. To determine the similarity of input samples, it examines both their given features and combinations of those features. For degree-g polynomials, the polynomial kernel is defined as:
$E(x_m, x_n) = (\gamma x_m^{T} x_n + s)^{g}, \quad \gamma > 0$ (3)
Polynomial kernels are particularly common in Natural Language Processing. The most commonly used degree is g = 2 (quadratic), because larger degrees tend to overfit on NLP problems.
3. Radial Basis Function Kernel
The radial basis function (RBF) kernel is a general-purpose kernel. It is used when there is no prior knowledge about the data. The RBF kernel for two samples m and n is calculated as:
$E(m, n) = \exp\left(-\frac{\|m - n\|^{2}}{2\sigma^{2}}\right)$ (4)

4. Sigmoid Kernel
The sigmoid kernel originates from neural networks, and it can be used to model a neural network. It is given by the following equation:
$E(m, n) = \tanh(\alpha m^{T} n + c)$ (5)
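As an illustration, Equations (2) to (5) can be written directly in NumPy. This is a minimal sketch: the function names and the default values chosen for gamma, s, g, sigma, alpha and c are assumptions made for the example, not settings reported in this study.

```python
import numpy as np


def linear_kernel(x_m, x_n):
    """Equation (2): plain inner product of the two vectors."""
    return float(np.dot(x_m, x_n))


def polynomial_kernel(x_m, x_n, gamma=1.0, s=1.0, g=2):
    """Equation (3): (gamma * <x_m, x_n> + s) ** g, with gamma > 0."""
    return float((gamma * np.dot(x_m, x_n) + s) ** g)


def rbf_kernel(m, n, sigma=1.0):
    """Equation (4): exp(-||m - n||^2 / (2 * sigma^2))."""
    diff = np.asarray(m, dtype=float) - np.asarray(n, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def sigmoid_kernel(m, n, alpha=0.5, c=0.0):
    """Equation (5): tanh(alpha * <m, n> + c)."""
    return float(np.tanh(alpha * np.dot(m, n) + c))


a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
print(linear_kernel(a, b))      # 1*3 + 2*4 = 11.0
print(polynomial_kernel(a, b))  # (11 + 1) ** 2 = 144.0
print(rbf_kernel(a, a))         # identical points give 1.0
print(sigmoid_kernel(a, b))     # tanh(5.5), just below 1
```

The RBF value falls from 1 towards 0 as the two samples move apart, which is why it behaves as a similarity measure.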
Pseudocode

Breast Cancer Prediction Using Support Vector Machines


Input: Wisconsin Diagnosis Breast Cancer dataset 'D = {(X1, y1), (X2, y2), …, (Xn, yn)}'

Output: Improved accuracy in predicting breast cancer

Step 1: Begin
Step 2: Import all the required packages ∈ 'P'
Step 3: Load the Wisconsin Diagnosis Breast Cancer dataset using the Pandas library
input = pd.read_csv("Wisconsin.csv")
Step 4: input.shape # To know the size of the dataset
Step 5: Use the LabelEncoder library function to encode the feature column named diagnosis (Malignant/Benign)
Step 6: X = input.iloc[:, 1:] # Select the feature columns using .iloc
Step 7: y = input.iloc[:, 0] # Select the diagnosis column
Step 8: Import train_test_split from sklearn.model_selection and split the dataset 'D'
Step 9: Apply StandardScaler to the training and testing data
Step 10: Build a support vector machine model
Step 11: from sklearn.svm import SVC
Step 12: Create a model and assign kernel = "linear"
Step 13: Fit the model to X_train, y_train and predict the test set results
Step 14: Calculate the accuracies for the training and testing data
# Training accuracy: model.score(X_train, y_train)
# Test accuracy: model.score(X_test, y_test)
Step 15: Calculate the performance metrics by importing the functions from sklearn.metrics
Step 16: To calculate the Precision, pass y_test, y_predict to precision_score
Where Precision is equal to True Positive / (True Positive + False Positive)
Step 17: To calculate the Recall, pass y_test, y_predict to recall_score
Where Recall is equal to True Positive / (True Positive + False Negative)
Step 18: To calculate the F-score, pass y_test, y_predict to f1_score. The F-score is a statistical measure that takes both precision and recall into account
Step 19: To calculate the area under the ROC curve, pass y_test, y_predict to roc_auc_score. Binary classification problems are evaluated using the Receiver Operating Characteristic (ROC) curve
Step 20: Repeat the same procedure from step 11 to step 19 for the other kernels (Polynomial, Radial Basis Function and Sigmoid) to find the accuracy and performance metrics
Step 21: End
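The steps above can be sketched as one runnable script. Several details are assumptions made for illustration and are not taken from this study: scikit-learn's bundled copy of the dataset replaces "Wisconsin.csv" (so the label-encoding step is already done), the split is 75/25 and stratified with a fixed seed, and every kernel uses scikit-learn's default hyperparameters. In the bundled copy the positive class (label 1) is benign, so precision and recall below refer to the benign class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (f1_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Steps 3-7 (assumption: bundled dataset instead of the CSV file).
X, y = load_breast_cancer(return_X_y=True)

# Step 8 (assumption: 75/25 stratified split with a fixed seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 9: fit the scaler on the training data, apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Steps 10-20: train one SVC per kernel and collect the metrics.
results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[kernel] = {
        "train_accuracy": model.score(X_train, y_train),
        "test_accuracy": model.score(X_test, y_test),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_pred),
    }

for kernel, scores in results.items():
    print(kernel, {name: round(value, 4) for name, value in scores.items()})
```

The numbers printed by this sketch depend on the assumed split and defaults, so they need not match the accuracies reported in the paper.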
5. Performance Matrix

Figure 7. Preprocessing the dataset.

This method has various phases, as seen in the figures above. Data were first obtained from the Wisconsin Diagnosis Breast Cancer study; they were then pre-processed and the relevant features selected before prediction models were developed with SVM-based kernel machine learning, as shown in Figure 7.

a) Accuracy
The accuracy of a classifier is the fraction of samples for which it predicts the correct class. Training accuracy indicates how often the trained model correctly detects the cancer in the data that were used for training, whereas test accuracy indicates how often the model correctly detects the cancer in data that were not used in training.
b) Precision
Precision is the percentage of correctly predicted positive events among all
predicted positive outcomes. It is the ratio of true positives (TP) to the sum
of true and false positives (TP + FP).
High precision means that few negative cases are wrongly flagged as positive,
which is a favorable outcome. The positive class is more important than the
negative class to this system.

$$\mathrm{Precision} = \frac{tp}{tp + fp} \quad (6)$$
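As a small worked example of Equation (6), the sketch below counts true and false positives from two label lists and returns their ratio. The labels are hypothetical, and returning 0.0 when nothing is predicted positive is a convention chosen here for illustration.

```python
def precision(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # convention chosen here: no positive predictions at all
    return tp / (tp + fp)


# Hypothetical labels: 1 = malignant (positive), 0 = benign (negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]

# Three true positives and one false positive give 3 / 4.
print(precision(y_true, y_pred))  # 0.75
```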

c) Recall
Recall is the percentage of correctly predicted positive events among all
actual positive cases. The actual positives consist of the true positives (TP)
and the false negatives (FN). The terms "sensitivity" and "recall" are used
interchangeably. Mathematically, recall can be expressed as the ratio of TP to
(TP + FN).

$$\mathrm{Recall} = \frac{tp}{tp + fn} \quad (7)$$
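The sketch below illustrates why recall has to be read alongside precision: a classifier can be right almost every time it predicts the positive class and still miss most positive cases. The counts are hypothetical; they were chosen so that the ratios round to the precision and recall listed for the polynomial kernel in Table 2, and the paper itself does not report a confusion matrix.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 if there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts (an assumption, not reported in the paper).
tp, fp, fn = 22, 1, 30

print(round(precision(tp, fp), 4))  # 0.9565: almost every positive call is correct
print(round(recall(tp, fn), 4))     # 0.4231: but most actual positives are missed
```

In a screening setting, the second number is the one that counts missed cancers, so a high precision alone would be misleading here.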
d) F1-score
The F1-score is the harmonic mean of precision and recall. Its best value is
1.0 and its worst is zero. Because both precision and recall enter the
calculation, the F1-score penalizes false positives as well as false negatives,
and on imbalanced data it is often lower than accuracy. For model comparisons,
the F1-score should be used rather than the global accuracy. The ratio of TP to
TP + 1/2 (FP + FN) can be used to determine the F1-score mathematically.

$$\mathrm{F1\_score} = \frac{tp}{tp + \frac{1}{2}(fp + fn)} \quad (8)$$
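To connect the harmonic-mean description with Equation (8), the short derivation below substitutes Equations (6) and (7) into the harmonic mean. The symbols P and R for precision and recall are shorthand introduced here.

```latex
\begin{align*}
P &= \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn} \\[4pt]
\mathrm{F1\_score} &= \frac{2PR}{P + R}
  = \frac{\dfrac{2\,tp^{2}}{(tp + fp)(tp + fn)}}
         {\dfrac{tp\,(2\,tp + fp + fn)}{(tp + fp)(tp + fn)}} \\[4pt]
  &= \frac{2\,tp}{2\,tp + fp + fn}
  = \frac{tp}{tp + \frac{1}{2}(fp + fn)}
\end{align*}
```

The true negatives do not appear anywhere in the result, which is why the F1-score can differ noticeably from accuracy when the negative class dominates.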

e) ROC Curve

Classification models may be visualized using the ROC curve. "ROC" stands for
Receiver Operating Characteristic. This graph shows how well a classification
model works at various threshold levels by plotting the True Positive Rate
(TPR) against the False Positive Rate (FPR).
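The threshold sweep behind the ROC curve can be sketched in a few lines of plain Python. The function names and the example scores are hypothetical; the area is computed with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = positive) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc(roc_points(y_true, scores)))  # 8/9, about 0.889

# With hard 0/1 predictions the curve has a single interior point,
# and the area reduces to (TPR + 1 - FPR) / 2.
y_pred = [1, 1, 1, 0, 0, 0]
print(auc(roc_points(y_true, y_pred)))  # 2/3, about 0.667
```

The second call mirrors Step 19, where predicted labels rather than continuous scores are passed to roc_auc_score; in that case the reported value summarizes one operating point instead of the full curve.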

6. Experimental results and discussion


Table 2 compares the kernel-based SVM classifiers using train accuracy, test accuracy,
precision, recall, F1-score, and ROC score. The linear kernel reaches a train accuracy of
98.59% and a test accuracy of 97.90%, with a precision of 98.03%, a recall of 96.15%, an
F1-score of 97.08%, and a ROC score of 97.52%. On every test-set metric, the linear kernel
therefore performs better than the remaining kernel functions.

Table 2. Results.

Performance of SVM Algorithm

Evaluation metrics   Linear Kernel   Polynomial Kernel   Gaussian Kernel   Sigmoid Kernel
Train Accuracy       0.9859          0.8122              0.9882            0.9553
Test Accuracy        0.979           0.7832              0.965             0.944
Precision            0.9803          0.9565              0.9607            0.94
Recall               0.9615          0.423               0.9423            0.9038
F_Score              0.9708          0.5866              0.9514            0.9215
ROC curve            0.9752          0.706               0.9601            0.9358
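As a quick consistency check on Table 2, the F-score of each kernel can be recomputed from its reported precision and recall using the harmonic mean. The sketch below only re-uses the figures from the table; the dictionary layout and function name are illustrative.

```python
# (precision, recall, reported F-score) for each kernel, copied from Table 2.
table2 = {
    "linear": (0.9803, 0.9615, 0.9708),
    "polynomial": (0.9565, 0.423, 0.5866),
    "gaussian": (0.9607, 0.9423, 0.9514),
    "sigmoid": (0.94, 0.9038, 0.9215),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for kernel, (p, r, reported) in table2.items():
    computed = f1_from(p, r)
    print(f"{kernel:10s} computed={computed:.4f} reported={reported:.4f}")
```

The recomputed values agree with the reported F-scores to within rounding, so the three rows are mutually consistent.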
Figure 8(a). Performance of Linear kernel
Figure 8(b). Performance of Polynomial kernel
Figure 8(c). Performance of Gaussian kernel
Figure 8(d). Performance of Sigmoid kernel
Figure 9. Comparison graph for the SVM kernels.

Figures 8(a), 8(b), 8(c), and 8(d) show the results of the linear, polynomial,
Gaussian, and sigmoid kernels of the SVM algorithm. The evaluation parameters (train and
test accuracy, precision, recall, F-score, and ROC curve) are used to assess the breast
cancer prediction performance of the model.
Figure 9 compares the performance metrics of the kernel-based machine learning
algorithm across the four kernels for breast cancer prediction. Because the dataset was
prepared by eliminating inappropriate features, standardizing the data, and selecting the
class label properly, the linear SVM classifier achieved the highest accuracy for
detecting cancer, about 98%, compared with the other kernels.

7. Conclusion
In this work, a variety of SVM kernel classifiers were used to classify breast cancer data.
Among the kernels analyzed, the linear SVM kernel showed the highest precision and
sensitivity, together with the best F-measure and ROC values. According to the findings in
Table 2, the linear kernel is followed by the RBF (also known as Gaussian) kernel, the
sigmoid kernel, and the polynomial kernel. To reduce diagnostic test expenses and
diagnostic errors, SVMs with linear kernels can be used to design and automate systems
that predict breast cancer in patients.
More machine learning algorithms, such as neural network algorithms and genetic
algorithms, could be incorporated into the study, and the final accuracy assessment could
be performed using optimization algorithms.
Author Contributions: All authors contributed equally to the conceptualization, formal analysis,
investigation, methodology, and writing and editing of the original draft. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was financially supported by Princess Nourah bint Abdulrahman
University Researchers Supporting Project number (PNURSP2023R235), Princess Nourah bint
Abdulrahman University, Riyadh, Saudi Arabia. The authors extend their appreciation to the
Deanship of Scientific Research at King Khalid University (KKU) for funding this work through
the Research Group Program under grant number (R.G.P.1/224/43).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset used is a public dataset. The Wisconsin Diagnosis
Breast Cancer data collection is used in this paper:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/version/2
Conflicts of Interest: The authors declare no conflict of interest.
Acknowledgments: This work was supported by Princess Nourah bint Abdulrahman University
Researchers Supporting Project number (PNURSP2023R235), Princess Nourah bint Abdulrahman
University, Riyadh, Saudi Arabia. The authors extend their appreciation to the Deanship of
Scientific Research at King Khalid University (KKU) for funding this work through the Research
Group Program under grant number (R.G.P.1/224/43).

References
1. WHO. WHO breast cancer scope. https://www.who.int/news-room/fact-sheets/detail/breast-cancer, 2022. [Online; accessed
2022]
2. Singh, Laxman, Sovers Singh Bisht, and V. K. Pandey. "Comparative Study of Machine Learning Techniques for Breast Cancer
Diagnosis." In Healthcare and Knowledge Management for Society 5.0, pp. 151-167. CRC Press, 2021.
3. Das, Arijit, Tanisha Khan, Subhram Das, and D. K. Bhattacharya. "Proper Choice of a Machine Learning Algorithm for Breast
Cancer Prediction." In Computational Advancement in Communication, Circuits and Systems, pp. 1-12. Springer, Singapore,
2022.
4. Guru Sai Sarma Chilukuri, N. V. S., Shahana Bano, Guru Sree Ram Tholeti, Sai Pavan Kamma, and Gorsa Lakshmi Niharika.
"An Analytical Prediction of Breast Cancer Using Machine Learning." In ICDSMLA 2020, pp. 185-202. Springer, Singapore, 2022.
5. Jawad, M. Abdul, and Farida Khursheed. "Machine Learning-Aided Automatic Detection of Breast Cancer: A Survey." In Hand-
book of Research on Applied Intelligence for Health and Clinical Informatics, pp. 274-290. IGI Global, 2022.
6. Venugeetha, Y., B. M. Harshitha, K. P. Charitha, K. Shwetha, and V. Keerthana. "Breast Cancer Prediction and Trail Using
Machine Learning and Image Processing." In ICDSMLA 2020, pp. 957-966. Springer, Singapore, 2022.
7. Parekh, Mayank V., and Dushyantsinh Rathod. "Breast Cancerous Tumor Detection Using Supervised Machine Learning Tech-
niques." In ICT with Intelligent Applications, pp. 671-678. Springer, Singapore, 2022.
8. Mishra, Anshul, M. H. Khan, Waris Khan, Mohammad Zunnun Khan, and Nikhil Kumar Srivastava. "A Comparative Study on
Data Mining Approach Using Machine Learning Techniques: Prediction Perspective." In Pervasive Healthcare, pp. 153-165.
Springer, Cham, 2022.
9. Houssein, Essam H., Marwa M. Emam, Abdelmgeid A. Ali, and Ponnuthurai Nagaratnam Suganthan. "Deep and machine
learning techniques for medical imaging-based breast cancer: A comprehensive review." Expert Systems with Applications 167
(2021): 114161.
10. Mehmood, Mavra, Ember Ayub, Fahad Ahmad, Madallah Alruwaili, Ziyad A. Alrowaili, Saad Alanazi, Mamoona Humayun,
Muhammad Rizwan, Shahid Naseem, and Tahir Alyas. "Machine learning enabled early detection of breast cancer by structural
analysis of mammograms." Computers, Materials and Continua 67, no. 1 (2021): 641-657.
11. Mishra, Vartika, and Santanu Kumar Rath. "Detection of breast cancer tumours based on feature reduction and classification of
thermograms." Quantitative InfraRed Thermography Journal 18, no. 5 (2021): 300-313.
12. Haq, Amin Ul, Jian Ping Li, Abdus Saboor, Jalaluddin Khan, Samad Wali, Sultan Ahmad, Amjad Ali, Ghufran Ahmad Khan,
and Wang Zhou. "Detection of breast cancer through clinical data using supervised and unsupervised feature selection tech-
niques." IEEE Access 9 (2021): 22090-22105.
13. Ghosh, Pronab, Sami Azam, Khan Md Hasib, Asif Karim, Mirjam Jonkman, and Adnan Anwar. "A performance based study
on deep learning algorithms in the effective prediction of breast cancer." In 2021 International Joint Conference on Neural Net-
works (IJCNN), pp. 1-8. IEEE, 2021.
14. Chouhan, Naveed, Asifullah Khan, Jehan Zeb Shah, Mazhar Hussnain, and Muhammad Waleed Khan. "Deep convolutional
neural network and emotional learning based breast cancer detection using digital mammography." Computers in Biology and
Medicine 132 (2021): 104318.
15. Bai, Jun, Russell Posner, Tianyu Wang, Clifford Yang, and Sheida Nabavi. "Applying deep learning in digital breast tomosyn-
thesis for automatic breast cancer detection: A review." Medical image analysis 71 (2021): 102049.
16. Saber, Abeer, Mohamed Sakr, Osama M. Abo-Seida, Arabi Keshk, and Huiling Chen. "A novel deep-learning model for auto-
matic detection and classification of breast cancer using the transfer-learning technique." IEEE Access 9 (2021): Intelligence 13,
no. 2 (2020): 185-196.
17. Shahbazi, Neda, Rouholah Zare-Dorabei, and Seyed Morteza Naghib. "Multifunctional nanoparticles as optical biosensing
probe for breast cancer detection: A review." Materials Science and Engineering: C 127 (2021): 112249.
18. Assegie, Tsehay Admassu. "An optimized K-Nearest Neighbor based breast cancer detection." Journal of Robotics and Control
(JRC) 2, no. 3 (2021): 115-118.
19. Saturi, Rajesh, and Parvataneni Premchand. "Multi-Objective Feature Selection Method by Using ACO with PSO Algorithm for
Breast Cancer Detection." International Journal of Intelligent Engineering and Systems 14, no. 5 (2021): 359-368.
20. Bozaba, Engin, Gizem Solmaz, Çisem Yazıcı, Gülşah Özsoy, Fatma Tokat, Leonardo O. Iheme, Sercan Çayır, Samet Ayaltı, Cavit
Kerem Kayhan, and Ümit İnce. "Nuclei Detection on Breast Cancer Histopathology Images Using RetinaNet." In 2021 29th
Signal Processing and Communications Applications Conference (SIU), pp. 1-4. IEEE, 2021.
21. Jahangeer, Gul Shaira Banu, and T. Dhiliphan Rajkumar. "Early detection of breast cancer using hybrid of series network and
VGG-16." Multimedia Tools and Applications 80, no. 5 (2021): 7853-7886.
22. Senan, Ebrahim Mohammed, Fawaz Waselallah Alsaade, Mohammed Ibrahim Ahmed Al-Mashhadani, H. H. Theyazn, and
Mosleh Hmoud Al-Adhaileh. "Classification of histopathological images for early detection of breast cancer using deep learn-
ing." Journal of Applied Science and Engineering 24, no. 3 (2021): 323-329.
23. Alanazi, Saad Awadh, M. M. Kamruzzaman, Md Nazirul Islam Sarker, Madallah Alruwaili, Yousef Alhwaiti, Nasser Alsham-
mari, and Muhammad Hameed Siddiqi. "Boosting breast cancer detection using convolutional neural network." Journal of
Healthcare Engineering 2021 (2021).
24. Sohail, Anabia, Asifullah Khan, Noorul Wahab, Aneela Zameer, and Saranjam Khan. "A multi-phase deep CNN based mitosis
detection framework for breast cancer histopathological images." Scientific Reports 11, no. 1 (2021): 1-18.
25. Meenalochini, G., and S. Ramkumar. "Survey of machine learning algorithms for breast cancer detection using mammogram
images." Materials Today: Proceedings 37 (2021): 2738-2743.
26. Milosevic, Marina, Dragan Jankovic, Aleksandar Milenkovic, and Dragan Stojanov. "Early diagnosis and detection of breast
cancer." Technology and Health Care 26, no. 4 (2018): 729-759.
27. Yurttakal, Ahmet Haşim, Hasan Erbay, Türkan İkizceli, and Seyhan Karaçavuş. "Detection of breast cancer via deep convolution
neural networks using MRI images." Multimedia Tools and Applications 79, no. 21 (2020): 15555-15573.
28. Fernández-Ovies, Francisco Javier, Edwin Santiago Alférez-Baquero, Enrique Juan de Andrés-Galiana, Ana Cernea, Zulima
Fernández-Muñiz, and Juan Luis Fernández-Martínez. "Detection of breast cancer using infrared thermography and deep neu-
ral networks." In International Work-Conference on Bioinformatics and Biomedical Engineering, pp. 514-523. Springer, Cham,
2019.
29. Dalmış, Mehmet Ufuk, Suzan Vreemann, Thijs Kooi, Ritse M. Mann, Nico Karssemeijer, and Albert Gubern-Mérida. "Fully
automated detection of breast cancer in screening MRI using convolutional neural networks." Journal of Medical Imaging 5, no.
1 (2018): 014502.
30. Marsilin, Jini R., and G. Wiselin Jiji. "An efficient cbir approach for diagnosing the stages of breast cancer using knn classifier."
Bonfring International Journal of Advances in Image Processing 2, no. 1 (2012): 01-05.
31. Wisconsin Diagnosis Breast Cancer data set: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/version/2
32. Reddy, G. Thippa, M. Reddy, Kuruva Lakshmanna, Dharmendra Singh Rajput, Rajesh Kaluri, and Gautam Srivastava. "Hybrid
genetic algorithm and a fuzzy logic classifier for heart disease diagnosis." Evolutionary Intelligence 13, no. 2 (2020): 185-196.
33. Reddy, G. Thippa, M. Praveen Kumar Reddy, Kuruva Lakshmanna, Rajesh Kaluri, Dharmendra Singh Rajput, Gautam Sri-
vastava, and Thar Baker. "Analysis of dimensionality reduction techniques on big data." IEEE Access 8 (2020): 54776-54788.
34. Faheem, Muhammad, Rizwan Aslam Butt, Basit Raza, Hani Alquhayz, Muhammad Zahid Abbas, Md Asri Ngadi, and Vehbi
Cagri Gungor. "A multiobjective, lion mating optimization inspired routing protocol for wireless body area sensor network
based healthcare applications." Sensors 19, no. 23 (2019): 5072.
35. Ali, Syed Arslan, Basit Raza, Ahmad Kamran Malik, Ahmad Raza Shahid, Muhammad Faheem, Hani Alquhayz, and Yogan
Jaya Kumar. "An optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction
based on Ruzzo–Tompa and stacked genetic algorithm." IEEE Access 8 (2020): 65947-65958.
36. Mohiyuddin, Aqsa, Asma Basharat, Usman Ghani, Veselý Peter, Sidra Abbas, Osama Bin Naeem, and Muhammad Rizwan.
"Breast Tumor Detection and Classification in Mammogram Images Using Modified YOLOv5 Network." Computational and
Mathematical Methods in Medicine 2022 (2022).
37. Abbas, Shafaq, Zunera Jalil, Abdul Rehman Javed, Iqra Batool, Mohammad Zubair Khan, Abdulfattah Noorwali, Thippa Reddy
Gadekallu, and Aqsa Akbar. "BCD-WERT: a novel approach for breast cancer detection using whale optimization based efficient
features and extremely randomized tree algorithm." PeerJ Computer Science 7 (2021): e390.
38. Safdar, Sadia, Muhammad Rizwan, Thippa Reddy Gadekallu, Abdul Rehman Javed, Mohammad Khalid Imam Rahmani, Khur-
ram Jawad, and Surbhi Bhatia. "Bio-Imaging-Based Machine Learning Algorithm for Breast Cancer Detection." Diagnostics 12,
no. 5 (2022): 1134.
39. Javed, Abdul Rehman, Labiba Gillani Fahad, Asma Ahmad Farhan, Sidra Abbas, Gautam Srivastava, Reza M. Parizi, and Mo-
hammad S. Khan. "Automated cognitive health assessment in smart homes using machine learning." Sustainable Cities and
Society 65 (2021): 102572.
40. Patan, Rizwan, GS Pradeep Ghantasala, Ramesh Sekaran, Deepak Gupta, and Manikandan Ramachandran. "Smart healthcare
and quality of service in IoT using grey filter convolutional based cyber physical system." Sustainable Cities and Society 59
(2020): 102141.
41. Rizwan, Patan, M. Rajasekhara Babu, B. Balamurugan, and K. Suresh. "Real-time big data computing for internet of things and
cyber physical system aided medical devices for better healthcare." In 2018 Majan International Conference (MIC), pp. 1-8. IEEE,
2018.
42. Google. Metaverse Vs. Virtual Reality. https://trends.google.com/trends/explore?date=today%205-y&q=Breast%20caner, 2022.
[Online; accessed 2022]
43. Sachdeva, R. K., Bathla, P., Rani, P., Kukreja, V., & Ahuja, R. (2022, April). A systematic method for breast cancer classification
using RFE feature selection. In 2022 2nd International Conference on Advance Computing and Innovative Technologies in
Engineering (ICACITE) (pp. 1673-1676). IEEE.
44. Sachdeva, R. K., & Bathla, P. (2022). A Machine Learning-Based Framework for Diagnosis of Breast Cancer. International Journal
of Software Innovation (IJSI), 10(1), 1-11.
