
Feature selection and dimensionality reduction techniques for effective breast cancer predictions

Table of Contents
Abstract......................................................................................................................................................
1 Introduction........................................................................................................................................
1.1 Traditional Methods used to Diagnose Breast Cancer in the medical domain..........................................
1.2 Role of Machine Learning techniques used to predict Breast Cancer...........................................
1.3 Hybrid Machine Learning models used to predict Breast Cancer.................................................
1.4 Scope for Research...........................................................................................................................
1.5 Aim of the research...........................................................................................................................
1.6 Research question.............................................................................................................................
1.7 Research Hypothesis.........................................................................................................................
1.8 Objectives.........................................................................................................................................
1.9 Limitations........................................................................................................................................
1.10 Ethical Issues.....................................................................................................................................
1.11 Project Planning and Timescales.......................................................................................................
1.12 Risk Analysis......................................................................................................................................
1.13 Ethical Approval................................................................................................................................
2 Literature Review..............................................................................................................................
2.1 Feature selection and dimensionality reduction usage in breast cancer predictions........................
2.2 Dimensionality Reduction Techniques for Feature Selection and Feature Extraction.......................
2.3 Breast cancer prediction using feature selection..............................................................................
2.4 Analysis and Conclusions..................................................................................................................
3 Methodology....................................................................................................................................
3.1 Choice of Methods............................................................................................................................
3.2 Dataset Description...........................................................................................................................
3.3 Feature selection Methods...............................................................................................................
3.3.1 Chi-square...........................................................................................................................
3.3.2 L1-based feature selection..................................................................................................
3.3.3 Recursive Feature Elimination.............................................................................................
3.4 Dimensionality reduction techniques...............................................................................................
3.4.1 Principal Component Analysis (PCA)...................................................................................
3.4.2 Latent Dirichlet Allocation (LDA).........................................................................................
3.5 Machine Learning algorithm.............................................................................................................
3.5.1 SVM classifier......................................................................................................................
3.5.2 Random Forest classifier.....................................................................................................
3.5.3 MLP.....................................................................................................................................
3.5.4 Passive Aggressive classifier (PAC)......................................................................................
3.6 Evaluation metrics.............................................................................................................................
3.6.1 Confusion Report................................................................................................................
3.6.2 Time taken for training, validating & testing the data........................................................
4 Experiments & results.......................................................................................................................
4.1 Breast Cancer Data Collection...........................................................................................................
4.2 Breast Cancer Data Description........................................................................................................
4.3 Breast Cancer Data Preprocessing....................................................................................................
4.4 Breast Cancer Data Visualization.......................................................................................................
4.5 Label Encoding Process.....................................................................................................................

4.6 Splitting the Breast Cancer Data........................................................................................................
4.7 ML model implementation...............................................................................................................
4.7.1 SVM model..........................................................................................................................
4.7.2 Random Forest Classifier.....................................................................................................
4.7.3 Decision Tree.......................................................................................................................
4.7.4 MLP Classifier......................................................................................................................
4.7.5 Passive Classifier.................................................................................................................
5 Results & conclusion..........................................................................................................................
5.1 Technical Challenges faced & its solution.........................................................................................
5.1.1 Name error.........................................................................................................................
5.1.2 Attribute Error.....................................................................................................................
5.2 Interpretation of Results...................................................................................................................
5.2.1 Accuracy.............................................................................................................................
5.3 Critical Analysis.................................................................................................................................
5.3.1 Research Findings...............................................................................................................
5.3.2 Comparison with other research work................................................................................
5.4 Conclusion.........................................................................................................................................
5.4.1 Addressing Research Questions..........................................................................................
5.5 Future Enhancement........................................................................................................................
6 Reference..........................................................................................................................................
Appendix..................................................................................................................................................

List of tables
Table 1 Parameters of SVM.........................................................................................................46
Table 2 Results of SVM................................................................................................................46
Table 3 Parameters of random forest.........................................................................................47
Table 4 Results of random forest................................................................................................48
Table 5 Parameters of decision tree...........................................................................................49
Table 6 Results of decision tree..................................................................................................49
Table 7 Parameters of MLP.........................................................................................................50
Table 8 Results of MLP................................................................................................................51
Table 9 Parameters of passive classifier......................................................................................52
Table 10 Results of passive classifier...........................................................................................52
Table 11 Overall results...............................................................................................................55

List of figures
Figure 1 Support Vector Machine Classifier [19].........................................................................33
Figure 2 Architecture of the Random Forest algorithm [24].......................................................34
Figure 3 Architecture of the Decision Tree algorithm [24]..........................................................35
Figure 4 Multi-Layer Perceptron [31]..........................................................................................37
Figure 5 Passive Aggressive Classifier [36]..................................................................................38
Figure 6 Visualization of the first 10 rows..................................................42
Figure 7 Data visualization..........................................................................................................43
Figure 8 Data visualization of radius and perimeter....................................................................44
Figure 9 Label encoding code......................................................................................................44
Figure 10 Label encoding graph..................................................................................................45
Figure 11 Data splitting...............................................................................................................45

Abstract

With the advancement of biomedical and computer technologies, a vast amount of data
on various clinical factors related to breast cancer has been collected, providing a new
opportunity for accurate predictions of the disease. The problem with using high-
dimensional medical data to predict breast cancer is that this data can be difficult to
interpret and analyze. Due to its complexity, traditional techniques such as logistic
regression and decision trees may not be able to accurately capture the underlying
relationships between the various clinical factors. Furthermore, the prediction accuracy
of such models can be limited due to the high-dimensional nature of the data, which can
lead to over- or under-fitting of the model. As such, new methods must be developed to
effectively utilize the high-dimensional medical data in order to improve the accuracy of
breast cancer predictions. This article investigates the usefulness of feature selection
and dimensionality reduction strategies in enhancing the precision of breast cancer
prediction. Five machine learning (ML) algorithms, namely the support vector machine,
random forest, decision tree, passive aggressive classifier, and multi-layer perceptron
(MLP), were studied and contrasted, together with feature selection techniques, namely
L1-based feature selection, recursive feature elimination (RFE), and wrapper-based
forward selection and backward elimination, and two dimensionality reduction methods,
principal component analysis and linear discriminant analysis. The study provides
insights into the importance of feature selection and dimensionality reduction for
obtaining better accuracy in breast cancer predictions.

1 Introduction

The focus of this study is on contrasting the usefulness of feature selection and
dimensionality reduction methods for enhancing the precision of breast cancer forecasts.
Machine learning methods, namely the support vector machine, random forest, decision
tree, passive aggressive classifier, and multi-layer perceptron (MLP), will be tested to
see which ones work best, in combination with feature selection techniques, namely
L1-based feature selection, recursive feature elimination (RFE), and wrapper-based
forward selection and backward elimination, and with two dimensionality reduction
techniques, principal component analysis and linear discriminant analysis.

Breast cancer is the most common form of cancer in women worldwide, accounting for
about 30 percent of all occurrences of cancer in females. More than 1.5 million women
are diagnosed with breast cancer yearly, and 500,000 women lose their lives as a result
of this disease in nations all over the world. Breast cancer is also regarded as a
multifactorial disease. This condition has become more prevalent during the last three
decades, despite a concurrent decline in the mortality rate. On the other hand, it is
anticipated that mammography screening will result in a 20% decrease in fatalities,
while improvements in cancer therapy will result in a 60% increase [1].

1.1 Traditional Methods used to Diagnose Breast Cancer in the medical domain

Diagnostic mammography makes it possible to analyse abnormal breast tissue in
patients who show only subtle or inconspicuous signs of having cancer. This is a
significant advance in cancer detection. However, due to the need for a significant
number of images, this method cannot be used to precisely analyse the areas of the body
in which cancer may be present. According to the findings of one study, mammograms
of women with very dense breast tissue failed to detect nearly fifty percent of the breast
cancers that were later identified in those patients. On the other hand, within the first
couple of years after screening, approximately a quarter of women with breast cancer
receive a diagnosis that is negative for the disease. Because of this, obtaining a breast
cancer diagnosis as quickly as possible is of the utmost significance [2].

All women should have a mammogram at regular intervals, either once every year or
once every two years, since this kind of screening for breast cancer is essential and
effective. However, a fixed screening programme for everyone is inefficient at
recognising malignancy at the individual level and has the potential to undermine the
effectiveness of screening initiatives. On the other end of the spectrum, medical
specialists are of the opinion that a more precise diagnosis of women who are at risk
may be reached by taking other risk factors into consideration along with mammography
screening. The recognition of patients who are at the greatest risk of developing the
disease can be aided by accurate risk prediction through modelling, which may also help
radiation oncologists in organising personalised screening for patients and advising
them to take part in early detection programmes [3].

1.2 Role of Machine Learning techniques used to predict Breast Cancer

In recent years, machine learning has gained traction in the healthcare industry for the
purpose of disease prediction. This modelling approach involves gathering details from
data and finding hidden relationships. While some research has relied only on
demographic risk indicators (such as lifestyle and laboratory data) to make breast cancer
predictions, other studies have included mammographic features or data from patient
biopsies. Others have demonstrated the prediction of breast cancer using genetic
information [4].

One of the most challenging components of breast cancer prediction is the process of
creating a model that takes into consideration all of the known characteristics that
increase the likelihood of developing breast cancer. The most recent prediction models
may focus exclusively on mammographic images or demographic risk variables, so
other important aspects may not be taken into account. In addition, these models, which
are precise enough to identify women who are at high risk, might lead to frequent
screenings and invasive sampling using magnetic resonance imaging (MRI) and
ultrasound. Patients are at risk of bearing both the financial and psychological burdens
of the situation [5].

In order to accurately forecast a woman's chance of developing breast cancer, a number
of criteria, such as demographic, laboratory, and mammographic risk factors, need to be
considered. As a result, multifactorial models, which incorporate a number of potential
risk variables into their assessment, have the potential to be helpful in accurately estimating
the likelihood of developing breast cancer [6].

It is possible for breast cancer to form in either the fatty tissue or the fibrous connective
tissue of the breast. Cancer of the breast is a deadly malignant tumor that rapidly spreads
throughout the body. Breast cancer scores highly in terms of fatality rates among
females. Risk for breast cancer can be increased by a number of other variables,
including being older and having a family history of the disease. The medical
community faces a significant hurdle when attempting to make a correct diagnosis of
breast cancer. The wide variety of tests available adds unnecessary complexity to
diagnosis and makes it harder to draw meaningful conclusions. Therefore, it is necessary
to implement computational diagnostic approaches with the help of AI and ML. However,
there are challenges associated with using high dimensional data for breast cancer
prediction using machine learning. These include data sparsity, class imbalance, and
overfitting. Additionally, the high number of features can make it difficult to interpret
the models and identify the most important features for predicting cancer. The intelligent
system has to perform dimensionality reduction by mapping the high-dimensional input
into a low-dimensional subspace so it can assess the depth of the relationships involved.
Reducing the high dimensional medical data through feature ranking and dimensionality
reduction techniques may be useful in dealing with high dimensionality. As a result,
these techniques can boost the effectiveness of classification algorithms in terms of, for
example, projected accuracy, speed of prediction, and clarity of results, all while
reducing operational expenses.

1.3 Hybrid Machine Learning models used to predict the Breast


Cancer

The analysis of data is undergoing a sea change at breakneck speed. Analysis begins
with a basic study of the raw data and then moves on to employing intelligent
techniques. Common approaches that are used to develop an understanding of a dataset
include machine learning, fuzzy logic, and artificial intelligence, to name just a few. In
order to learn from the data, a technique that is based on machine learning uses both
supervised and unsupervised methods. Once the data have been learnt, the suggested
technique is able to make predictions about those data with a high level of accuracy [7].
Methods based on ML may be used to make predictions about the data based on the
label of the dataset. In the event that the label of the data is categorical, classification
techniques will be used. On the other hand, the regression technique of prediction may
be used if the data being analyzed are continuous. The data may also be clustered by the
application of clustering algorithms, and clustering allows for the prediction of new
instances on the basis of the approach that is used [8]. Classification techniques are
broken down into linear and non-linear categories according to whether or not they
provide linear or non-linear results when applied to data sets. Using a linear model, one
is able to make predictions based on data that is linearly distributed. On the other hand,
methods such as decision trees, neural networks, SVM, and KNN may be used for
processing non-linear data. During the classification phase, some of the most common
methods utilized include decision trees, neural networks, support vector machines, and
k-nearest neighbours [9]. The neural network-based approach offers significant
untapped potential. In light of the benefits it offers, it has also evolved into an advanced
neural network-based learning approach known as the deep learning method. The
phenomenon known as the dimensionality curse is one that often occurs with gene
expression data. In a dataset like this, the number of dimensions might range anywhere
from a few hundred to thousands [10]. When faced with such obstacles, the application
of any model becomes not only difficult but also time consuming. It is not possible to
sketch the most important functionality with the assistance of visualization, and no
model can handle such a vast number of dimensions in a clear and concise manner.
Under these circumstances, it is necessary to cut down on the number of dimensions
without jeopardizing the fundamental properties of the dataset. Techniques for feature
selection and dimension reduction may be used to accomplish this goal successfully.
When procedures such as feature selection and dimension reduction are applied to a
dataset, the essential qualities of the dataset are not lost [11].

1.4 Scope for Research

The majority of cancers are curable if caught early, however many women are
nonetheless diagnosed with advanced forms of the disease. In addition to aiding in the

management and prevention of cancer recurrence, improved diagnostic methods play a
crucial role in patient-specific treatment plans. An accurate breast tumour classification
system that can distinguish between malignant and benign breast tumours is necessary.
While diagnosing and prognosticating breast cancer, doctors typically compile
information from a variety of sources, including patient histories, laboratory results, and
research on the disease. Due to the sheer volume of data, managing and analysing high
dimensional medical data can be difficult and lead to a number of problems. One of the
biggest issues with high dimensional medical data is the curse of dimensionality. This
leads to overfitting, which is when the model is overly complex and only works well on
the training data but not on unseen data. Another problem with high dimensional
medical data is the problem of selecting relevant features. Due to the sheer number of
variables, it can be difficult to identify which ones are important and which are not. This
can lead to biased results as only the most important features are used for analysis.
Feature selection and dimensionality reduction techniques can help to address these
issues. Selecting the most relevant features from a dataset is the goal of feature selection
methods like L1-based feature selection, RFE, and wrapper-based forward selection
and backward elimination. Dimensionality reduction techniques lower the number of
dimensions by combining variables; principal component analysis (PCA) and linear
discriminant analysis (LDA) are two examples. These methods have the potential to
simplify the dataset and boost the reliability of the resulting model.

1.5 Aim of the research

This research is focused on developing an efficient model that can predict breast cancer
based on medical data. To successfully build the model, the research mainly focuses on
feature selection methods and dimensionality reduction techniques applied to the data to
reduce its dimension, and it will verify the efficiency of these two approaches by
applying the dimension-reduced data to machine learning algorithms.

1.6 Research question

What are the relative performance differences between feature selection and
dimensionality reduction techniques in improving the accuracy of breast cancer
predictions?

1.7 Research Hypothesis

Improved breast cancer prediction accuracy may be achieved via the use of
dimensionality reduction techniques rather than feature selection methods.

1.8 Objectives

 To implement machine learning algorithms such as support vector machine, decision tree, random forest, passive classifier and MLP classifier.
 To implement feature selection methods such as L1-based feature selection, chi-square and recursive feature elimination methods.
 To implement dimensionality reduction techniques such as PCA and LDA.
 To evaluate the results with and without feature selection methods and dimensionality reduction techniques.

1.9 Limitations

One of the study's potential flaws is that it doesn't take into consideration that there are
numerous varieties of breast cancer, and that each type may call for a unique approach to
therapy. There are likely more variables, such as those related to one's lifestyle and the
surrounding environment, that contribute to cancer development, but these were not
explored in this research. The study does not address how to ensure that the model is
able to handle highly imbalanced data sets.

1.10 Ethical Issues

There are a few ethical considerations associated with this study.

Data accuracy, completeness, and timeliness should be prioritised first. Medical records
may include private information about patients, so it is important that they are kept
secure and that the data is not used without the patient’s consent. In addition, it is
crucial to verify that the data is not utilised in a manner that might result in injury or
prejudice.

Second, it is important to ensure that the algorithms used are fair and unbiased.
Algorithms can be biased if they are built on data that has been collected in a way that is

biased. It is essential to gather data in a manner that is representative of the whole
community.

1.11 Project Planning and Timescales

1.12 Risk Analysis

Risk: Data not properly prepared and anonymised
Likelihood: High | Severity: High
Mitigation: Data should be properly prepared and anonymised before analysis.

Risk: Machine learning algorithms not properly tuned and evaluated
Likelihood: Medium | Severity: High
Mitigation: Machine learning algorithms should be properly tuned and evaluated to ensure accuracy.

Risk: Feature selection and dimensionality reduction techniques not applied correctly
Likelihood: Medium | Severity: High
Mitigation: Feature selection and dimensionality reduction techniques must be applied correctly to ensure the best performance of the machine learning algorithms.

1.13 Ethical Approval

The ethical considerations discussed above do not necessarily require ethical approval,
as the data is being collected from an open source website, Kaggle, and the
programming language used is Python. As the data is freely available, it does not require
the consent of the patients for its use. In addition, the Kaggle data set is collected from
reliable sources and is regularly updated. Furthermore, Python is a general purpose
programming language and does not require any special permissions for its use. Thus,
the use of Kaggle and Python ensures that the data is accurate, complete, and up-to-date.
Moreover, Python programming can be used to create algorithms that are designed to be
fair and unbiased, by avoiding the use of any data that may be biased. Therefore, no
ethical approval is required in this study.

2 Literature Review

This literature review aims to analyse existing research on the use of feature selection
and dimensionality reduction techniques for effective breast cancer predictions. This
review will discuss the findings of existing studies and draw conclusions on
which methods need to be explored in this study for accurate prediction of breast cancer.

2.1 Feature selection and dimensionality reduction usage in breast cancer predictions

The primary goal of this study [12] was to employ correlation analysis and variance of
input features to pick feature selection strategies, then feed these relevant features to a
classification algorithm. In order to enhance breast cancer categorization, the authors
adopted an ensemble approach. The publicly available Wisconsin Breast Cancer Dataset
(WBCD) was used to test the suggested method. Dimensionality reduction was
accomplished by correlation analysis and principal component analysis. Many
machine learning techniques were tested, and their results compared and contrasted: LR,
SVM, NB, KNN, RF, DT and SGD. The effectiveness of the classifiers was enhanced
by tweaking their hyper-parameters. Two distinct voting methods were used in
combination with the top performing classification algorithms. The class chosen by the
majority of voters is the one predicted by a hard vote, whereas the class with the highest
probability is the one predicted by a soft vote. The suggested technique exceeded
the state-of-the-art by a wide margin, with an accuracy of 98.24%, high precision of
99.29%, and recall of 95.89%.

In this study [13], the SVM and the Extreme Gradient Boosting approach, both machine
learning algorithms, will be evaluated against one another in a classification setting. To
make classification easier, PCA is used to extract features from the raw data and limit
the amount of data attributes. In addition to PCA, K-Means is employed for
dimensionality reduction as a clustering technique. In this work, the results are examined
by applying four distinct models to the Wisconsin Breast Cancer Dataset, each of which
makes use of a different dimensionality reduction technique and one of two different
classifiers. Accuracy, sensitivity, and specificity metrics evaluated from the confusion
matrices will be used to make the comparison. Results from experiments demonstrate

that the less popular K-Means approach for dimensionality reduction is on par with the
more popular Principal Component Analysis.

In order to predict breast cancer, the authors of this study [14] offer a synthetic model
with a collection of features optimised with a genetic algorithm (CHFS-BOGA). In
place of chance and random selection, the authors suggest OGA by enhancing the
initialization generation and genetic algorithms with the C4.5 decision tree classifier as
the fitness function. The updated data, consisting of 569 rows and 32 columns, was
gathered from the Wisconsin dataset in the UCI machine learning repository. Weka, an open-source data mining
programme, was used to conduct an evaluation of the dataset using its explorer module.
The results demonstrate that when compared to the single filter techniques and PCA, the
suggested hybrid feature selection approach provides superior results. These factors help
in forecasting future profits. Previous iterations of the proposed system (CHFS-BOGA)
utilising support vector machine (SVM) classifiers reached an accuracy of 97.3 percent.
Using (CHFS-BOGA-SVM), they achieved a best-in-class 98.25% accuracy on a data
set composed of 70.0% training data and 30.0% testing data, and a perfect 100%
accuracy on the whole training set. Not only that, but the ROC curve had a value of 1.0.
The findings demonstrated that the suggested (CHFSBOGA-SVM) system successfully
distinguished between malignant and benign breast tumours.

Dimensionality reduction, feature ranking, fuzzy logic, and an artificial neural network
are all used in this study [15] to develop a new approach to data classification. The
purpose of this research is to evaluate the current integrated methods to breast cancer
detection and prognosis and to draw conclusions about their relative merits. The best
diagnostic classification accuracy is provided by principal component analysis (PCA)
using a neural network (NN), however gain ratio and chi-square also perform well
(85.78 percent). These findings pave the way for the creation of a Medical Expert
System model, which may be utilised for the automated diagnosis of additional diseases
and high-dimensional medical datasets.

2.2 Dimensionality Reduction Techniques for Feature Selection and Feature Extraction

Medical disease analysis, racial profiling and gene classification are just some of the
topics covered in this paper [16]. In addition, for each feature selection and feature

extraction method, the authors summarise the techniques/algorithms, datasets, classifier
approaches, and achieved findings relating to the accuracy and computational time.
Around half of the examined studies relied on various optimization strategies, and this
tendency was seen by the authors when they looked at how researchers reduced
dimensionality based on feature selection methods. Both the SVM and the KNN
classifiers are quite common, however the SVM has better accuracy. Yet, 7 of the
analysed research methodologies rely heavily on CNN and DNN algorithms for feature
extraction. In feature extraction, principal component analysis (PCA) is still widely
utilised, having been implemented in 8 different approaches so far. In addition, the
optimised PCA had the potential for enhanced efficiency in terms of both computational
time and the number of features that were eliminated in the process.

2.3 Breast cancer prediction using feature selection

The goal of this research [17] is to create a system for early-stage breast cancer
prediction using the minimal amount of features that can be extracted from the clinical
information. The planned experiment has been carried out using the Wisconsin breast
cancer dataset (WBCD). When using the most predictive factors, the KNN classifier has been
found to produce the highest classification accuracy of 99.28%. Detecting breast cancer
at an early stage using the suggested method drastically reduces medical costs and
improves quality of life.

The goal of this effort [18] is to categorise breast cancer patients as having a recurrence
or not. In this study, the authors used a breast cancer categorization dataset to identify
the optimal feature set for prediction. As feature selection methods, Chi-squared and the
Mutual Information method have been employed. Afterward, the Logistic Regression
model utilised the decided-upon features to arrive at its conclusion. Specifically, it was
shown that the Mutual Information method was more effective and yielded more reliable
forecasts.

Recently, researchers have increased their efforts to work on a dataset with a big number
of attributes that is known as Big Data. This is a direct result of the revolution in
technology and also the growth in the field of data science. This data, which consists of
many different factors, may be analysed using dimension reduction technology's
methodologies, which are efficient, effective, and influential. The value of the
technology known as "data processing, pattern recognition, machine learning, and data

mining" rests in a variety of industries, including those listed above. This study
examines the similarities and differences between two important techniques for reducing
dimensionality—namely, feature extraction and feature selection—both of which are
used often in machine learning models. The authors used a variety of classifiers, such as
Support vector machines, k-nearest neighbours, Decision tree, and Naive Bayes, on the
data from the anthropometric survey of US Army personnel (ANSUR 2) in order to
categorise the data and test the relevance of features by determining particular
characteristics in US Army personnel. The results showed that k-nearest neighbours
achieved high accuracy (83%) in prediction after the dimensions were reduced using a
number of techniques. The findings of this study make it abundantly evident that the
effectiveness of strategies for dimension reduction varies depending on the kind of data
being dealt with. When it comes to text data, some methods are more effective than
others, while other methods perform better when dealing with images [19].

2.4 Analysis and Conclusions

Based on the reviewed literature, it appears that only a small fraction of studies use
both feature selection and dimensionality reduction techniques to boost the accuracy of
their breast cancer prediction models. The techniques used include correlation analysis,
principal component analysis (PCA), Chi-squared method, Mutual Information method,
gain ratio, K-Means clustering and chi-square. Unfortunately, studies comparing feature
selection and dimensionality reduction methods to better breast cancer forecasts are
limited. L1-based selection, Recursive Feature Elimination (RFE), and wrapper-based
forward selection and backward elimination have all been studied insufficiently, and
there are few other feature selection approaches that have been thoroughly investigated.
Most studies have focused on PCA as a dimensionality reduction technique, while none
have explored LDA. In addition, Logistic regression, SVM, NB, KNN, RF, DT, and
stochastic gradient descent learning, as well as fuzzy logic and ANN are just some of the
machine learning techniques put to the test in these investigations. However, none of
these studies investigate the potential of algorithms like passive classifiers and MLP for
breast cancer prediction. The goal of this study is to compare the performance of the
multilayer perceptron (MLP) and the passive classifier (PC) in breast cancer prediction
with that of other more established methods, such as the support vector machine, the

random forest, and the decision tree. Moreover, this study analyses the effectiveness of
feature selection techniques, namely L1-based feature selection, recursive feature
elimination (RFE), and wrapper-based forward selection and backward elimination,
together with dimensionality reduction methods such as PCA and LDA; their
performances will be compared to determine which is more effective in predicting
breast cancer.

3 Methodology

3.1 Choice of Methods

The techniques and algorithms below were chosen to design effective breast cancer
prediction models, based on their advantages as discussed below.

Feature selection reduces the data used for categorization. To simplify categorization,
feature selection removes unnecessary and noisy data and selects a representative
subset. Chi-square, L1-based feature selection, and recursive feature elimination are
employed in this work [20]. Chi-square (CS) statistics for feature selection improve
classification model performance by selecting variables. Lasso regression (L1) discards
redundant and unnecessary features to minimize the cost function. Recursive Feature
Elimination searches greedily to discover the best feature subset: in each iteration it
builds models and chooses the best or worst feature, and it continues building models
with the remaining features until all features have been examined [21]. The
Dimensionality Reduction (DR) approach facilitates effective transfer learning by
decreasing the metric space distance between distributions of different data sets [22].
Latent Dirichlet Allocation (LDA) is a popular method for determining the distribution
of latent topics across a big corpus. As a result, it is able to recognize sub-topics for a
technical area that comprises a great number of patents and depict each of the patents in
a variety of subject distributions [23]. Machine Learning (ML) uses algorithms to
automatically train a computer for certain tasks. Sets of algorithms analyse data to
uncover and filter generic rules in massive data sets, automatically learning user
preferences. The support vector machine (SVM) is a discriminative classifier based on a
hyperplane that divides multidimensional data. This method outputs an ideal hyperplane
that can predict future states via supervised learning [24]. An ensemble of decision trees
forms a Random Forest (RF). In RF, the classification prediction is made by the majority
vote of the trees in the model. A decision tree is a type of classification tree made up of
nodes and branches; it sorts attribute values and groups them [25]. MLPs, also known as
multilayer perceptrons, are a type of feedforward artificial neural network. An MLP is
made up of multiple layers of nodes that are coupled as a directed graph connecting the
input and output layers, and back propagation is utilized in the training process. Online
learning systems like the passive aggressive classifier algorithm can handle big datasets
and adjust their weights as new data comes in [26].
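As an illustration of how these algorithms could be set up, the following sketch instantiates the five classifiers with scikit-learn in Python; the hyper-parameter values shown are illustrative defaults and not the tuned settings used later in this study.

# Illustrative instantiation of the five classifiers discussed above.
# Hyper-parameter values are placeholder defaults, not tuned settings.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import PassiveAggressiveClassifier

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    "Passive Aggressive": PassiveAggressiveClassifier(max_iter=1000, random_state=42),
}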

3.2 Dataset Description

Follow the link below to access the Breast Cancer Wisconsin (Diagnostic) Data Set, a
database containing detailed information about breast cancer in Wisconsin.

https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

Expert medical opinion on the benign or malignant nature of a breast mass is included.
Features computed from digitised images of a fine needle aspirate (FNA) of a breast
lump make up the first portion of the data set, and the remaining part consists of the
cancer diagnosis. The 569 records fall into two categories: 357 benign (non-cancerous)
cases and 212 malignant (cancerous) ones.

Due to its size and diversity, this dataset is ideal for research into feature selection and
dimensionality reduction strategies for accurate breast cancer forecasts. The dataset can
be mined for information that could prove useful in deciding whether a given case of
breast cancer is malignant or not. Furthermore, the dataset can be utilized to create
algorithms that efficiently cut down on the requisite number of attributes for accurate
patient diagnostic prediction. The diagnostic procedure for breast cancer may benefit
from this.
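As a minimal sketch of how this dataset could be loaded for the experiments, the snippet below assumes the Kaggle file has been downloaded locally as data.csv; the file name and column name are assumptions based on the Kaggle page rather than values fixed by this study.

# Minimal sketch for loading the downloaded Kaggle file (assumed name: data.csv).
import pandas as pd

df = pd.read_csv("data.csv")
print(df.shape)                          # expected: 569 rows
print(df["diagnosis"].value_counts())    # expected: B (benign) 357, M (malignant) 212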

3.3 Feature selection Methods

Feature selection (FS) is currently being utilized in a variety of real-world challenges,
the majority of which are classified as classification and regression problems. Problems
that arise in various domains, such as image categorization, microarray analysis, text
classification, and facial recognition, have been successfully tackled by employing the
strategy of feature selection. Medicine is quickly becoming one of the most productive
and innovative disciplines for the application of feature selection and machine learning
[27]. The primary focus of research in this area is to lessen the scope of problems while
simultaneously cutting expenditures. Applications in this discipline include, but are not
limited to, information extraction from images and investigating the causes of
differences among experts in disease diagnosis using image recognition [28].

During the process of developing a predictive model, the number of input
characteristics required is reduced as a result of this activity. Problems with processing
speed, complexity, and memory storage may be alleviated by selecting features. To put
it another way, feature selection eliminates features that provide insufficient information
in order to narrow the dimensions down to a secondary set. As a result, the feature
selection process identifies the best possible subset of characteristics from the extensive
dataset [29]. During the feature selection process, the dimensionality of the new set
drops while most of the information from the source dataset is still maintained. Because
of this, redundant features should be deleted while the important information is retained.
Two things are important for the selection procedure to take into account: (1) there
should be no impact on accuracy and performance, and (2) the final subset should be
comparable to the primary dataset. When it comes to supporting data visualization and
data interpretation while also cutting down on storage space, the optimum feature
selection mechanism consists of two primary processes: subset generation and feature
evaluation. The first step involves constructing a subset from the massive quantity of
data that was collected. Second, as its name suggests, feature evaluation assesses
whether the subset that was produced meets the criteria [30].

The technique for selecting features consists of a mix of search procedures, each of
which chooses a distinct subset of characteristics, and an evaluation metric, which
assigns scores to each of the distinct subsets of features. This needs a lot of processing
resources, and depending on the kind of machine learning model being used, a different
subset of characteristics could yield the best results. This indicates that there is not a
single ideal collection of features but rather several optimal feature sets depending on
the ML technique that is meant to be used. There is a distinction to be made between
supervised and unsupervised feature selection approaches [31]. This categorization is
accomplished by taking into account, or not taking into account, the target variable
throughout the feature selection method. When reducing redundant variables using the
correlation approach, unsupervised methods do not take into consideration the variable
that is being targeted. Supervised methods must be employed in order to exclude the
variables and elements that are not connected to the one that is being studied. It is
commonly known that some methods, such as filter methods and wrapper methods, are
supervised [32].

The selection of variables in filter techniques is determined by the properties of the
features themselves. They do not use any kind of machine learning method, instead
relying only on the qualities of the features themselves. Filter techniques are not
dependent on any particular model and require little to no computing effort. However,
they often result in performance that is poorer when compared to that produced by
different feature selection approaches. Moreover, they are excellent tools for quickly
filtering out and excluding features that are not associated with the problem that is
being addressed [33].

Wrapper approaches identify an appropriate feature subset by using a predictive
machine learning algorithm. Wrapper approaches train a machine learning model for
each candidate subset and then choose the subset of characteristics that generates the
maximum performance. This gives them a very high computational cost, yet they often
yield the most effective subset of features for the machine learning technique that is
being used. It also means that the characteristics chosen for the subset may not be the
best ones for producing the best possible outcome with a different ML method [34].
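A hedged sketch of this idea, using scikit-learn's SequentialFeatureSelector for wrapper-based forward selection and backward elimination, is shown below; the logistic regression estimator and the target of ten features are illustrative assumptions, and the scikit-learn copy of the Wisconsin Diagnostic data stands in for the Kaggle file.

# Wrapper-based forward selection and backward elimination (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)        # 569 samples, 30 features
estimator = LogisticRegression(max_iter=5000)

forward = SequentialFeatureSelector(estimator, n_features_to_select=10,
                                    direction="forward", cv=5).fit(X, y)
backward = SequentialFeatureSelector(estimator, n_features_to_select=10,
                                     direction="backward", cv=5).fit(X, y)
print(forward.get_support())   # boolean mask of the features kept by forward selection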

The embedded approach selects features to be included in the model during the process
of creating the model itself. The models are equipped with built-in feature selection
methods, which choose and include the variables that result in the highest possible level
of precision. The embedded technique takes into account the interaction that occurs
between the model and its features. In contrast to the feature selection techniques
mentioned earlier, which remove variables from the dataset, the dimensionality
reduction approach involves the creation of a new projection of the data together with
an entirely new set of variables [35].

Chi-square, L1-based feature selection and Recursive Feature Elimination are the
feature selection methods chosen for this study.

3.3.1 Chi-square

In information theory, the Chi-square statistic is connected to feature selection functions
that aim to reflect the intuition that the most dissimilar terms t_k are the most appropriate
for a given class c_i.

Chi-square(t_k, c_i) = Σ (Observed values − Expected values)² / Expected values

The chi-square statistic is expressed in terms of the expected frequency and the
observed frequency of the data items [36].

Both the input variables and the target variable in the breast cancer dataset are treated
as categorical data, and the input variables are broken down into subcategories. The
problem we are dealing with is one of classification predictive modelling, in which the
system attempts to classify the data as belonging to either the recurrent or the non-
recurrent class. Using the statistical approach known as Pearson's Chi-squared test, one
may determine whether or not the variable serving as an input also influences the
variable serving as an output [37].

A test statistic is computed by drawing a comparison between the observed values and
the theoretical ones. In order to determine whether the distributions of the observed and
expected categorical variables differ from one another, the chi-squared test is
carried out. In the event that the value exceeds a certain threshold, the null hypothesis is
rejected, and the significance of the finding is established. This means that the variable
is seen as being reliant on the outcome. If this is not the case, the value being reported is
not significant; hence, the null hypothesis should not be rejected, and the variable in
question should be considered independent [38].
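To make this concrete, the hedged sketch below applies chi-square feature selection with scikit-learn; the bundled copy of the Wisconsin Diagnostic dataset stands in for the Kaggle file, and keeping the ten highest-scoring features is an illustrative choice rather than a value fixed by this study.

# Chi-square feature selection (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)     # all features are non-negative, as chi2 requires
selector = SelectKBest(score_func=chi2, k=10)  # keep the 10 highest-scoring features
X_chi2 = selector.fit_transform(X, y)
print(selector.scores_)                        # chi-square statistic per feature
print(X_chi2.shape)                            # (569, 10)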

3.3.2 L1-based feature selection

The regularization denoted by the notation "L1" is also known as "lasso" regularization,
whose full name is the Least Absolute Shrinkage and Selection Operator. This method
has the capability of shrinking some of the coefficients down to zero. In other words,
lasso can penalise a feature and make its coefficient 0 if it determines that the feature is
not essential. This means that when attempting to forecast the target variable, some of
the attributes will have coefficients of zero. Because these characteristics will not
contribute to the model's final predictions, it is possible to exclude them. Consequently,
this results in a streamlined collection of characteristics for the completed model [39].

L1 is a regularization constraint added to the target function of linear models to stop the
prediction model from being overfit to the data. The L1-based feature selection method
uses the sparse solutions produced by penalising the model with the L1 norm. A linear
SVC is used as the classifier in this linear model [40].
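A minimal sketch of L1-based selection with a linear SVC, as described above, is given below; the regularisation strength C=0.01 is an illustrative assumption, and the scikit-learn copy of the Wisconsin Diagnostic data again stands in for the Kaggle file.

# L1-based feature selection via a sparse linear SVC (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=10000).fit(X, y)
selector = SelectFromModel(lsvc, prefit=True)   # keep features with non-zero coefficients
X_l1 = selector.transform(X)
print(X_l1.shape)                               # fewer than the original 30 columns remain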

3.3.3 Recursive Feature Elimination

Recursive feature elimination, often known as RFE, is a feature selection technique that
attempts to choose the feature subset that provides the greatest level of classification
accuracy based on the learnt model. After constructing a classification model, traditional
RFE works by first identifying and then systematically removing the poorest feature,
that is, the one responsible for the largest decrease in classification accuracy. Recent
developments have led to a novel approach to RFE, which analyses feature (variable)
importance rather than classification accuracy using a support vector machine (SVM)
model, and then selects the least significant features for deletion [41].

RFE is a wrapper approach to feature selection. To boost the model's generalization
performance, it discards the dependent and weak features while retaining the
independent and strong ones. It employs a form of backward feature removal, an
iterative process for ranking features. The method initially constructs the model using
all of the features and ranks them in order of significance [42]. The model is then rebuilt
with the least-important feature removed and the importance of the remaining features
recalculated. The ranking of the features can be stored in a sequence, denoted by T.
Each T_i records the highest-scoring features from one iteration of backward feature
elimination so that the model's fitness and performance can be checked. The best-
performing features are used to fit the output model, and the optimal value of T_i is
calculated [43].
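The hedged sketch below illustrates this procedure with scikit-learn's RFE and a linear SVM as the ranking model; selecting ten features and removing one feature per iteration are illustrative choices, not values fixed by this study.

# Recursive Feature Elimination with a linear SVM ranking model (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=1)
rfe.fit(X, y)
print(rfe.ranking_)        # rank 1 marks the retained features
X_rfe = rfe.transform(X)   # data reduced to the 10 selected features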

3.4 Dimensionality reduction techniques

The technique known as computer vision employs the process of ML in order to


examine and extract the characteristics of digital pictures in order to recognize, identify,
and deduce the meaning of those images. The vast majority of computer vision
approaches make use of classification algorithms, which are an essential component in
determining the advantages and disadvantages of various feature representations.
Extraction of features and selection of features are two common techniques that are used

to increase classification accuracy and performance by removing redundant and
superfluous data. This is accomplished by incorporating a new pixel-based classification
algorithm. When doing data analysis on a huge amount of characteristics, it is also used
to decrease the dimension of the data to a lower dimension. This is because smaller
dimensions are easier to analyse [44]. The principal component analysis, often known as
PCA, is a useful pre-processing technique for the constructive dimensionality reduction
methods that determine the association between the feature variables. PCA is a
technique that has been developed and utilized in various domains, such as natural
language processing, audio recognition, geography, bioinformatics, and computer
vision, to name a few. Because the task of image processing poses challenges in both
computation and memory consumption, PCA is frequently employed in the research
area of image processing. However, variable selection methods are ineffective when all
of the variables are correlated, but they perform effectively when working with
informative variables [45].

On the other hand, the identification of the class levels of informativeness is also a very
significant notion. In the past, many strategies for the extraction of features were
researched in an effort to enhance categorization via feature selection. The feature
extraction process makes use of local features rather than global features to more
correctly depict unique attributes of relevant information. This is possible due to the
fact that local features are picked separately depending on the characteristics of the
dataset. It begins by gleaning features from each of the feature component sets, after
which it applies those features to the input data in order to convert it into a new
collection of informative features. The impact of dimension reduction using principal
component analysis (PCA) alone is inferior to the effect of using PCA in conjunction
with entropy. Because of this, a number of academics have focused their attention on
feature weighting in order to enhance feature selection techniques that include weighting
with inter-class and intra-class distances [46]. In order to enhance classification and
clustering while maintaining the class-related features of a problem, the classes that were
utilized in statistically weighted feature approaches were also explored. On the other
hand, a number of academics have proposed a class-weighted measurement based on a
comparable
distance that represents the qualities that are associated with classes. In addition, the
number of dataset features that have a significant distinction may be minimized by using
strategies for feature selection as well as extraction. The process of picking and

26
eliminating certain aspects without altering them is referred to as feature selection,
whereas the process of transforming data into a lower dimension is referred to as
dimensionality reduction. Finding a dimensionality reduction approach that uses
principal component analysis (PCA) that can carry out feature extraction and feature
selection without sacrificing picture quality and without losing important features due to
dominating class characteristics is one of the challenges that must be overcome [47].

Dimensionality reduction (DR) involves selecting useful features and discarding irrelevant and superfluous ones. Reducing the dimensionality of the input can improve performance by reducing learning time and model complexity, and by enhancing generalization capacity and classification accuracy. Selecting suitable features also reduces measurement cost and improves understanding of the problem [42].

DR converts high-dimensional data into low-dimensional representations. Owing to the tremendous growth in high-dimensional data, DR techniques are used in several fields, and new methods continue to emerge. DR strategies convert high-dimensional datasets into low-dimensional datasets while preserving the semantics of the data. A low-dimensional representation alleviates the curse of dimensionality, and low-dimensional data is simpler to analyse, process, and visualize [48].

Dimensionality reduction of a dataset has many benefits: (i) data storage space decreases as the number of dimensions decreases; (ii) less computing time is required; (iii) redundant, unnecessary, and noisy data are removed; (iv) data quality improves; (v) some algorithms fail in higher dimensions, so reducing the dimensionality improves algorithm efficiency and accuracy; (vi) higher-dimensional data is difficult to visualize, so lowering the dimensionality helps in designing and analysing patterns; and (vii) it streamlines and enhances classification [49].

3.4.1 Principal Component Analysis (PCA)

PCA is the most popular dimensionality reduction (DR) technique. It aims to find the best trade-off between preserving the variance of the information and reducing the dimensionality of the vectors. PCA is an unsupervised learning technique that can help to make sense of the data: it was developed to reduce the number of dimensions a dataset has, concentrating the description of the dataset on its most informative subset [50]. PCA is an orthogonal statistical transformation that converts a set of correlated observations into a set of linearly uncorrelated values. Face recognition and medical data correlation are two general uses of PCA, but it is also used in other areas such as quantitative finance and spike-triggered covariance analysis in neuroscience [22].

Principal component analysis (PCA) is the most extensively used unsupervised linear dimension reduction method. Its goal is to minimize the dimensionality of a dataset while preserving as much of its inherent variability as possible. "Preserving as much variability as possible" means finding new variables that are linear functions of those in the original dataset, that are uncorrelated with one another, and that successively maximize the variance they capture. PCA describes the data without requiring any distributional assumptions on the part of the analyst [51].

Principal component analysis (PCA) has a wide variety of applications, including machine learning, image and speech processing, computer vision, text mining, visualisation, biometrics, robotic sensor data, and facial recognition [52].

The purpose of PCA is to find the principal components (PCs), a collection of characteristics that are uncorrelated with one another. The first PC captures the largest amount of variation in the data set, and the remaining PCs are found in order of decreasing variance. Although PCA is a reliable approach for dimension reduction, it has certain limitations. In spite of its broad use, the PCA transformation is based only on second-order statistics; because the principal components may be strongly statistically dependent on one another while remaining uncorrelated, PCA may not succeed in finding the data's most concise description. PCA represents the data as a hyperplane embedded in an ambient space, so when the data components exhibit non-linear relationships, a richer representation is needed that only a non-linear method can reveal. As a result, non-linear alternatives to principal component analysis have been developed. Furthermore, because they rely on least-squares estimation, PCA methods do not account for outliers, which are ubiquitous in practical training sets [53].
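As an illustration of the variance-preserving projection described above, the short sketch below applies scikit-learn's PCA to a standardized feature matrix. The placeholder data and the number of retained components are assumptions made purely for demonstration.

```python
# Minimal PCA sketch: project standardized features onto the leading principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed: X is a (n_samples, n_features) array standing in for the breast cancer features.
X = np.random.rand(569, 30)

X_std = StandardScaler().fit_transform(X)    # PCA is sensitive to feature scale
pca = PCA(n_components=10)                   # assumed number of retained components
X_reduced = pca.fit_transform(X_std)

# explained_variance_ratio_ shows how much variance each component preserves.
print(X_reduced.shape, pca.explained_variance_ratio_.cumsum()[-1])
```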

3.4.2 Linear Discriminant Analysis (LDA)

The abbreviation LDA is also used for latent Dirichlet allocation, a probabilistic framework for a corpus in which documents are represented as random mixtures over latent topics, each topic being a distribution over words; topics can then be inferred by working backwards from the probabilities of the words that occur most frequently in each category [54]. In this work, however, LDA refers to linear discriminant analysis, the supervised dimensionality reduction technique described below.

LDA is another prominent dimensionality reduction method that may be used in the pre-processing phase of data mining and machine learning applications. Its primary objective is to map a dataset with a vast number of characteristics onto a space with fewer dimensions and a high degree of class separability, which in turn reduces the computational cost. The approach taken by LDA is quite similar to that of PCA: in addition to maximizing the variance of the data (as PCA does), LDA maximizes the separation between multiple classes. The objective of linear discriminant analysis is therefore to project a high-dimensional space onto a more compact subspace while maintaining the integrity of the class information [55].

LDA uses a linear combination of the features as a linear classifier, which allows both feature extraction and dimension reduction. By translating the characteristics into a lower-dimensional space, it ensures the greatest possible degree of class separability by maximizing the ratio of the between-class variance to the within-class variance. One benefit of this statistical approach is LDA's capacity to combine the information from the features to build a new axis that minimizes the variance within each class while maximizing the distance between the classes [56].

Although LDA is one of the data reduction techniques used on a regular basis, it has a number of drawbacks that should be taken into consideration. Linear discriminant analysis (LDA) is unable to find the lower-dimensional space when the number of dimensions is significantly larger than the number of samples in the data matrix, because the within-class scatter matrix becomes singular. This challenge is referred to as the small sample size (SSS) problem, and a number of solutions have been proposed. The first strategy eliminates the null space of the within-class matrix. The second converts the data through an intermediate subspace, such as one obtained with PCA, so that the within-class matrix becomes full rank. A further limitation is linearity: when different classes cannot be separated linearly from one another, LDA is unable to differentiate between them; this issue can be addressed by using kernel functions. A third, well-known option is to apply regularization to handle the resulting singular linear systems [57].
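The following sketch shows the supervised, class-separating projection discussed above using scikit-learn's LinearDiscriminantAnalysis. The placeholder features and labels are assumptions; for a two-class problem such as benign versus malignant, at most one discriminant component is available, which is reflected in n_components.

```python
# Minimal linear discriminant analysis (LDA) sketch for a two-class problem.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumed: X holds the features and y the binary labels (0 = benign, 1 = malignant).
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))               # placeholder features
y = rng.integers(0, 2, size=569)             # placeholder labels

# With two classes, LDA can produce at most one discriminant direction.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)                           # (569, 1): a single class-separating axis
```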

3.5 Machine Learning algorithm

ML is the process of solving problems by analysing the patterns present in the available data, rather than being explicitly programmed to do so. ML models are used in various activities, including classification, regression, clustering, anomaly detection, ranking, recommendation, and forecasting [58].

The process of determining which category a certain piece of data belongs to is referred to as classification. Classification algorithms work on labelled data, with each label defining a class or category into which the data may be placed. There are two possible approaches to classification: binary classification and multi-class classification. The first predicts that unlabelled data will fall into one of two available classes, whereas the second predicts that the data will fall into one of N different groups or categories. Regression, in turn, is the process of identifying a continuous-valued label based on a collection of related characteristics. The regression process is applied to a collection of labelled features, and a function is used to estimate the value of unlabelled data based on the labelled features [59].

The process of organizing individual instances of data into distinct groups determined by how alike they are is referred to as clustering. These distinct groupings are clusters, and the members of a cluster share traits that are unique to that cluster; clusters may be further broken down into subclusters. A rare occurrence or observation that deviates markedly from the other observations is known as an anomaly, and anomalies may occur at any frequency. Anomaly detection makes it easier to detect fraudulent transactions, anomalous clusters, patterns that suggest network infiltration, outliers, and other irregularities. During the ranking process, the labelled data are organised into instances and given scores, which the ranker then uses to determine rankings for examples that have not yet been observed. Recommendation is the process of suggesting items or services to a user on the basis of that user's previous activity. Forecasting is the act of looking at the past and making predictions about the future [60].

The majority of ML models are founded on the concept of predictive modelling. Predictive models are typically "trained" on historical data so that they can accurately anticipate future events, and a model's performance depends on how well the chosen approach suits the problem. ML models achieve impressive levels of performance when given enough relevant data, and feature selection procedures are quite useful in this context: they help not only in reducing the required computational power but also in boosting the model's overall performance [61]. Selecting the features or variables from a dataset to use in machine learning models is known as feature selection or variable selection. It is essential for constructing ML models that are quicker, simpler, and more dependable. Models with fewer variables are easier to understand and need less time to train; a model with a handful of variables is simpler to grasp than one with a hundred, since there are fewer moving parts. A decrease in the number of variables not only speeds up model construction but also lowers the computational cost. Feature selection also improves generalization and thereby reduces overfitting: in many cases, a significant number of the variables contribute only noise and have little or no predictive value, and the ML models learn from this background noise, which inhibits generalization and leads to overfitting. Eliminating this noise can significantly enhance generalization and greatly reduce overfitting. With fewer factors to take into account, there is also a lower likelihood of mistakes during data gathering. Finally, variable redundancy can be eliminated by choosing only the essential characteristics and excluding highly correlated features, without the risk of losing critical information [62].

3.5.1 SVM classifier

SVM is a important approach utilized in the area of ML. The goal of the algorithm is to
locate an N-dimensional hyperplane that may be used to categorize the data. The core of
this technique is locating the optimal plane in terms of margin. Depending on the total
number of features, N can take on different forms. It was simple to evaluate two
characteristics side by side. However, this is not always the case if there are multiple
features to classify. Increasing the margin leads to more precise predictions [63]. SVM
is depicted graphically in the figure.

Figure 1 Support Vector Machine Classifier [64]


In SVM there is a trade-off between margin size and classification precision. If every individual sample must be classified exactly, without compromising on any of them, a lower level of precision on new data is possible. To improve accuracy, however, the margin between classes can be increased, taking into account support vectors from other classes that lie close to the hyperplane [65].

3.5.1.1 Hyperparameters Used


Here, kernel, gamma, C, degree, and tol are used as hyperparameters. The kernel parameter defines the kernel used by the algorithm, and the gamma parameter controls the width of the kernel function. The value of C sets the classifier's penalty: as C increases, training errors become more costly and the margin shrinks, whereas a small C makes errors cheap and the margin wide [67]. The degree hyperparameter controls the degree of the polynomial kernel function ("poly"); it must be non-negative and is ignored by all other kernels. The tol parameter defines the tolerance of the stopping criterion [68].
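A minimal sketch of tuning these SVM hyperparameters with a grid search is shown below. The candidate values and the use of scikit-learn's bundled copy of the dataset are assumptions chosen only to illustrate the mechanism described in this section.

```python
# Sketch: grid search over the SVC hyperparameters named above (kernel, gamma, C, degree, tol).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)    # stand-in for the preprocessed dataset

param_grid = {
    "kernel": ["linear", "poly", "rbf"],      # assumed candidate kernels
    "gamma": ["scale", "auto"],
    "C": [0.1, 1.0, 10.0],
    "degree": [3, 5, 7],                      # only used by the polynomial kernel
    "tol": [1e-3, 1e-4],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```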

3.5.2 Random Forest classifier

The Random Forest algorithm is one of the most widely used supervised ML techniques. It is flexible enough to solve both classification and regression problems. It is an ensemble learning technique in which a group of relatively weak learners works together to produce a more robust model; the method generates a forest of trees. A key benefit of this approach is that, unlike many other algorithms, it can deal with missing values and outliers [69].

Figure 2 Architecture of the Random Forest algorithm [70]


It excels in high-dimensional settings involving huge datasets. As the forest grows, each tree provides its own classification, or "vote", which helps prevent the model from overfitting. In a classification problem, the majority voting approach places a new instance in the group that received the most votes, while in a regression problem the average of the results from all trees is used. The lack of transparency into the algorithm's inner workings makes it a "black box" for statistical modellers [70].

RF takes a non-parametric approach to classification. It applies a large number of decision trees to the data, classifying each data set at a high rate. A random subset of the input variables is used in each tree, and the results of all the trees are then combined to reach a more accurate conclusion based on those variables [71].

3.5.2.1 Hyperparameters Used


The following parameters are used to tune the model developed using RF.

Criterion: the function that measures the quality of a split. Supported criteria are "gini", "log_loss" and "entropy".

max_features: the number of features considered when looking for the best split in the random forest.

n_estimators: the number of trees used to construct the RF [72].

min_samples_split: the minimum number of observations a node must contain before the decision trees in the random forest are allowed to split it.

min_samples_leaf: the minimum number of samples that must be present in each leaf node after a node is split into two separate nodes [73].
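The sketch below instantiates a random forest with the hyperparameters listed above. The specific values and the bundled dataset are illustrative assumptions rather than the exact tuned settings reported later.

```python
# Sketch: a random forest configured with the hyperparameters described above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)    # stand-in for the preprocessed dataset

rf = RandomForestClassifier(
    criterion="entropy",        # split-quality function
    max_features="sqrt",        # features considered at each split
    n_estimators=100,           # number of trees in the forest
    min_samples_split=2,        # minimum samples needed to split a node
    min_samples_leaf=1,         # minimum samples required at a leaf
    random_state=0,
)
print(cross_val_score(rf, X, y, cv=5).mean())
```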

3.5.2.2 Decision tree


DT can be applied to both regression and classification tasks, and to input and output variables that are either discrete or continuous. As the tree is built, the dataset is partitioned into subsets. The resulting tree is made up of three kinds of nodes: the root, decision nodes, and leaves. A node that can split into further branches is called a decision node, and a node that cannot be split any further is called a leaf node. Internal nodes represent the attributes on which the data is split, while leaf nodes represent the class labels [74].

Figure 3 Architecture of the Decision Tree algorithm [70]
In order to construct the tree, a DT first divides the data into two or more groups. Entropy and the resulting information gain are used to determine the partitioning: attributes are prioritised for splitting based on how much information they promise to provide. The entropy of each variable can be obtained using the following equation, and from it the information gain can be derived.

Entropy(S) = - p log2(p) - q log2(q)

Gain(S, V) = Entropy(S) - Σv (|Sv| / |S|) Entropy(Sv)

where S is a collection of both positive and negative instances, p is the proportion of positive-class samples in S, q is the proportion of negative-class samples in S, and the sum runs over the values v of the attribute V whose information gain is being determined [75].
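To make the equations concrete, the small sketch below computes entropy and information gain for a toy binary split; the sample counts are invented purely for illustration.

```python
# Toy illustration of entropy and information gain for a binary class problem.
import math

def entropy(p, q):
    """Entropy of a set with class proportions p (positive) and q (negative)."""
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

# Assumed parent set S: 9 positive and 5 negative samples.
parent = entropy(9 / 14, 5 / 14)

# Assumed split on attribute V into two subsets: (6+, 2-) and (3+, 3-).
left = entropy(6 / 8, 2 / 8)
right = entropy(3 / 6, 3 / 6)
gain = parent - (8 / 14) * left - (6 / 14) * right

print(round(parent, 3), round(gain, 3))   # parent entropy and Gain(S, V)
```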

3.5.2.3 Hyperparameters Used


The following hyperparameters are used to tune the decision tree.

Splitter: the strategy used to choose the split at each node. Its accepted values are "best" and "random".

Criterion: the function that measures the quality of a split. Accepted criteria are "gini", "log_loss" and "entropy" [75].

max_features: the number of features considered when looking for the best split in the DT.

class_weight: the weight associated with the classes, i.e. the weight given to each output class.

min_samples_split: the minimum number of samples required to split a decision node [47].

3.5.3 MLP

MLP stands for multi-layer perceptron. It is a highly effective modelling technique that uses a supervised training approach with data samples whose outcomes are already known, producing a nonlinear function model that allows output data to be predicted from given input data. In an MLP design, the output of one layer is the input of the following layer, and so on. The first and last layers of the network are referred to as the input and output layers, and the other layers are called hidden layers [76]. The MLP is a kind of multilayer neural network known as a feedforward neural network, because information flows in one direction from the input layer through the hidden layers to the output layer. Every connection between neurons carries its own weight. Perceptrons belonging to the same layer share the same activation function; for the hidden layers this is typically a sigmoid function, while the output layer may use a sigmoid or a linear function, depending on the requirements of the application [77].

Figure 4 Multi-Layer Perceptron [78]


MLP is an advanced kind of ANN: an input layer receives the signal, an output layer produces predictions, and a set of hidden layers in between does the actual computation. Multilayer perceptrons (MLPs) are supervised networks trained using the backpropagation approach. Information in this network travels from the input nodes to the output nodes; if there is an error at the output, that error must be propagated back from the output towards the input so that the weights can be corrected, and the backpropagation algorithm is the approach most often used for this purpose.

An MLP with a single hidden layer is already capable of representing a nonlinear function, although with reduced precision, while networks with more hidden layers are more likely to overfit the training data. The learning rate and the momentum are the primary determinants of the speed and performance of the learning process. MLPs can address problems that are not linearly separable and are intended to approximate any continuous function. Pattern classification, recognition, prediction, and approximation are the primary applications of MLP [80].

3.5.3.1 Hyperparameters Used


The hyperparameters activation, solver, learning_rate, learning_rate_init, and tol are used for the MLP. The activation parameter defines the activation function of the hidden layers, and the solver parameter specifies the algorithm used for weight optimization. The learning_rate parameter determines the schedule for weight updates, while learning_rate_init sets the initial learning rate that controls the step size of the updates. The tol parameter provides the tolerance for the optimization [81].

3.5.4 Passive Aggressive classifier (PAC)

PAC is an effective online ML algorithm that remains passive when it produces a correct classification but becomes aggressive in the event of a miscalculation or irrelevant input. It is one of the particularly useful and successful applications in ML, and its principal use is in settings where a vast amount of data must be processed continuously. PACs are related to the perceptron framework in the sense that they do not require a learning rate to be specified [82]. If the prediction turns out to be correct, the model is not updated or changed in any way; the data is not sufficient to trigger any modification, so the algorithm remains passive. If the input results in an inaccurate prediction, the model is modified to account for it; such inputs are the most important ones for making effective adjustments to the model being built [83].

Figure 5 Passive Aggressive Classifier [83]


This is extremely helpful in situations where there is an excessive quantity of data, making it computationally infeasible to train on the whole dataset, and it can turn out to be a highly efficient solution. Fake news detection on a social media platform such as Twitter, where new data is uploaded virtually every second, is an excellent illustration of this concept in action. Dynamically reading data from Twitter on a continual basis would produce an enormous amount of data, so an online learning algorithm is well suited to the task [84].

3.5.4.1 Hyperparameters Used


The passive aggressive classifier uses C, max_iter, validation_fraction, loss, and tol as its tuning parameters.

C: determines the maximum step size used for regularization. The default value is 1.0.

max_iter: specifies the maximum number of passes over the training data.

validation_fraction: the proportion of the training data set aside as a validation set for early stopping. It must lie between 0 and 1 and is only used if the early_stopping flag has been set to true [85].

loss: the loss function to be used, either hinge or squared hinge loss.

tol: the stopping criterion that specifies when to stop [86].
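The sketch below illustrates the online character of the passive aggressive classifier: the model is updated batch by batch with partial_fit, using the hyperparameters named above. The batch size, parameter values and bundled dataset are assumptions for demonstration.

```python
# Sketch: online training of a passive aggressive classifier with partial_fit.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import PassiveAggressiveClassifier

X, y = load_breast_cancer(return_X_y=True)    # stand-in for a data stream

pac = PassiveAggressiveClassifier(
    C=1.0,                     # maximum regularization step size
    max_iter=1000,
    validation_fraction=0.2,   # only used when early_stopping=True
    loss="hinge",              # or "squared_hinge"
    tol=1e-4,
)

# Feed the data in small batches, as an online learner would receive it.
classes = np.unique(y)
for start in range(0, len(X), 64):
    pac.partial_fit(X[start:start + 64], y[start:start + 64], classes=classes)

print(pac.score(X, y))
```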

3.6 Evaluation metrics

This study uses the confusion matrix, accuracy, F1-score, precision, and recall, as well as the time taken for training, validating, and testing the data, as evaluation metrics to assess the efficiency of the models.

3.6.1 Confusion Report

A confusion matrix is a table that can be used to evaluate the efficiency of a classification system: it demonstrates how good the classifier is and provides a summary of its behaviour [87].

Precision, Recall, F1-score, Accuracy

Accuracy is the percentage of occurrences in the data set that were correctly classified. The precision of a classification model reflects how well it can filter out irrelevant instances, while the recall of a model is its ability to locate all relevant instances within a data set. Instead of measuring a model's overall performance, as accuracy does, the F1-score evaluates how well it performs within each individual class [88].
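The metrics described here can be obtained directly from scikit-learn, as in the brief sketch below; the train/test split and the placeholder classifier are assumptions rather than the configuration evaluated later.

```python
# Sketch: confusion matrix, accuracy, precision, recall and F1-score with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC().fit(X_train, y_train)             # placeholder classifier
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))       # rows: true classes, columns: predictions
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision, recall and F1-score
```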

3.6.2 Time taken for training, validating & testing the data

The training set is the dataset used to train the model and give it the ability to detect features and trends it has not seen before. The validation set is a separate collection of data used to check how well the model is doing during its training phase. After the training phase has finished, the model is evaluated using data from the test set, which helps to ensure that the model is accurate. At this point, the total amount of time required to train, validate, and test the model is recorded [89].

4 Experiments & results

Early detection and treatment are essential for improving the chances of survival. ML models have been shown to be effective in predicting breast cancer, and they have the potential to be used as a screening tool. This chapter discusses the Breast Cancer Wisconsin (Diagnostic) dataset, its description, and the data preprocessing and visualization steps used. Further, the results obtained after implementing the ML models with and without feature selection and dimensionality reduction techniques, together with a critical analysis, are detailed in this chapter.

4.1 Breast Cancer Data Collection

The Breast Cancer Wisconsin (Diagnostic) Data Set is a publicly available dataset that
contains data about breast cancer tumors. The dataset was collected by Dr. William H.
Wolberg, W. Nick Street, Olvi L. Mangasarian, and Harold E. Wechsler at the
University of Wisconsin Hospitals, Madison, Wisconsin, USA. The dataset contains 569
observations, each of which represents a breast cancer tumor. The observations are
described by 30 features. Follow the link below to access the Breast Cancer Wisconsin
(Diagnostic) Data Set, a database containing detailed information about breast cancer in
Wisconsin.

https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

Expert medical opinion on whether the nucleus of a breast mass is benign or malignant is included. The first portion of the data set consists of features computed from digitized images of a fine needle aspirate (FNA) of a breast lump, and the remaining part consists of the cancer diagnosis. Of the 569 records in total, 357 are noncancerous cases and 212 are cancerous ones.

Due to its size and diversity, this dataset is ideal for research into feature selection and
dimensionality reduction strategies for accurate breast cancer forecasts. The dataset can
be mined for information that could prove useful in deciding whether a given case of
breast cancer is malignant or not.

4.2 Breast Cancer Data Description

The Breast Cancer Wisconsin (Diagnostic) dataset has been imported with the pandas package by reading the 'data.csv' file. After importing the dataset, its first 10 rows have been visualized as shown below.

Figure 6 visualization of the first 10 number of rows


After importing, the dataset contains 33 columns and 569 rows. The "diagnosis" column has been renamed "cancer_target".

4.3 Breast Cancer Data Preprocessing

The preprocessing steps involved in the Breast Cancer Wisconsin (Diagnostic) Dataset
are as follows:

Removing unwanted columns: the Breast Cancer Wisconsin (Diagnostic) dataset contains two columns, 'id' and 'Unnamed: 32', which are not useful for this research work. These columns have been removed, after which the dataset contains 569 rows and 31 columns.

Removing null values: the Breast Cancer Wisconsin (Diagnostic) dataset contains no null values, so no further processing is required.

Removing duplicates: there are no duplicates present in the dataset, so no further processing is required. A minimal sketch of these steps is given below.
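The following sketch reproduces the loading and cleaning steps described above in pandas; it assumes the Kaggle 'data.csv' file is available locally, and the expected shapes and counts mirror the figures reported in the text.

```python
# Sketch of the preprocessing steps: drop unused columns, check for nulls and duplicates.
import pandas as pd

df = pd.read_csv("data.csv")                       # Breast Cancer Wisconsin (Diagnostic) file
df = df.rename(columns={"diagnosis": "cancer_target"})

# Remove columns that are not useful for modelling.
df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")

print(df.shape)                                    # expected: (569, 31)
print(df.isnull().sum().sum())                     # expected: 0 null values
print(df.duplicated().sum())                       # expected: 0 duplicate rows
```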

4.4 Breast Cancer Data Visualization

The Breast Cancer Wisconsin (Diagnostic) dataset has been visualized using a count plot, dist plot, line plot and box plot. The output column 'cancer_target' has been visualized using a count plot, and the input columns 'area_mean' and 'value' have been visualized using a dist plot. The visualizations of these input and output columns are shown below.

Figure 7 Data visualization


Based on the above visualizations, the benign count is higher than the malignant count in the breast cancer results. It is also inferred that the mean cell nucleus area differs from patient to patient. Further, the input columns 'radius_mean' and 'perimeter_mean' have been visualized using a line plot, and the output column 'cancer_target' and input column 'concavity_se' have been visualized using a box plot, as displayed below.

Figure 8 Data visualization of radius and perimeter
Based on the above visualizations, the radius mean and perimeter mean of the cell nucleus are directly proportional. It is also inferred that the severity of the concave-portions value is lower for malignant cancer patients.

4.5 Label Encoding Process

Label encoding is the process of translating categorical information into numeric information by giving each category a distinct integer value. In this research work, a label encoder has been used to convert the categorical data into numerical data. The code for label encoding the data is shown below.

Figure 9 Label encoding code
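Since the figure above is an image of the code, a minimal sketch of the same idea is reproduced here; the column and output file names follow those mentioned in the text, while the remaining details are assumptions.

```python
# Sketch: label-encode the categorical target column and save the result.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data.csv").rename(columns={"diagnosis": "cancer_target"})
df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")

# Map the categorical labels (B/M) to integers (0/1).
df["cancer_target"] = LabelEncoder().fit_transform(df["cancer_target"])

df.to_csv("Effe_cancerbre.csv", index=False)       # file name used later in the text
```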


After label encoding, the data is saved in a separate file named 'Effe_cancerbre.csv'. The resulting data looks as shown below.

Figure 10 Label encoding graph

4.6 Splitting the Breast Cancer Data

The breast cancer data has been split for the purposes of training, validation and testing. In this research, 341 records have been used for training, 114 for validation and 114 for testing. The code for splitting the dataset is shown below.

Figure 11 Data splitting
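Because the figure is an image of the code, a sketch of a 341/114/114 split of the 569 samples using two successive train_test_split calls is given here; the file name follows the previous step and the random seed is an assumption.

```python
# Sketch: split the 569 samples into 341 training, 114 validation and 114 test rows.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Effe_cancerbre.csv")             # label-encoded data from the previous step
y = df["cancer_target"]
X = df.drop(columns=["cancer_target"])

# First carve off the 114-row test set, then split the rest into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=114, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=114, random_state=42)

print(len(X_train), len(X_val), len(X_test))       # 341 114 114
```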

4.7 ML model implementation

For effective breast cancer prediction, the ML models SVM, RF, DT, MLP Classifier and Passive Classifier are implemented in this section. Before any ML model is implemented, the best-fit parameters are determined using the grid search method.

This research work aims to predict breast cancer through the use of dimensionality reduction techniques and feature selection methods, and it also investigates which ML model, combined with these techniques, predicts breast cancer most effectively.

4.7.1 SVM model

For predicting breast cancer, the SVM model has been implemented with the parameters kernel, gamma, C, degree and tol, with values chosen through the grid search method. Based on this method, the parameters and their values have been chosen for implementing SVM with the feature selection techniques Chi-square FS, L1-based FS and recursive feature elimination, and with the dimensionality reduction techniques PCA and LDA. The parameters and the values chosen for each method are as follows.

Table 1 Parameters of SVM

Parameters | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
kernel | poly | poly | linear | poly | poly | poly
gamma | auto | auto | auto | auto | auto | scale
C | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
degree | 7 | 7 | 7 | 3 | 3 | 3
tol | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001

Based on the above parameters and chosen values, the support vector machine model has been implemented and evaluated with the feature selection techniques Chi-square FS, L1-based FS and recursive feature elimination, and with the dimensionality reduction techniques PCA and LDA. The assessed results are tabulated below.

Table 2 Results of SVM

Algorithm | Validation Accuracy | Testing Accuracy | Training Time (s) | Validation Time (s) | Testing Time (s)
Without FS | 0.90 | 0.94 | 0.00 | 0.06 | 0.04
Chi-square FS | 0.90 | 0.94 | 0.01 | 0.06 | 0.07
L1 based FS | 0.92 | 0.91 | 0.01 | 0.10 | 0.07
Recursive Feature Elimination | 0.93 | 0.96 | 0.01 | 0.19 | 0.06
PCA | 0.88 | 0.93 | 0.01 | 0.12 | 0.07
LDA | 0.61 | 0.77 | 0.00 | 0.06 | 0.06

Based on the evaluated results, the SVM model using the most essential features selected by the recursive feature elimination technique gives the maximum validation accuracy of 93% and testing accuracy of 96%, but the times taken for validation and testing are comparatively long, at 0.19 s and 0.06 s respectively. The combination of RFE and SVM is a powerful technique for improving the accuracy of ML models, because RFE helps to remove features that are not important while SVM identifies the optimal hyperplane that separates the two classes in the dataset. Comparing performance in terms of time taken, the SVM model with all features and the SVM model with LDA dimensionality reduction take the least time to validate and test the model, at 0.06 s each.

4.7.2 Random Forest Classifier

The random forest ML model has been constructed to predict breast cancer, and values for the parameters criterion, max_features, n_estimators, min_samples_split, and min_samples_leaf have been selected using the grid search method. The parameters and their selected values for the various methods are as follows.

Table 3 Parameters of random forest

Parameters | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
criterion | entropy | entropy | entropy | entropy | entropy | entropy
max_features | sqrt | sqrt | sqrt | sqrt | sqrt | sqrt
n_estimators | 100 | 100 | 100 | 100 | 100 | 100
min_samples_split | 2 | 2 | 2 | 2 | 2 | 2
min_samples_leaf | 1 | 1 | 1 | 1 | 1 | 1

The random forest ML model has been built and assessed with the feature selection techniques Chi-square FS, L1-based FS and recursive feature elimination, and with the dimensionality reduction techniques PCA and LDA, using the aforementioned parameters and their chosen values. The evaluated results are tabulated below.

Table 4 Results of random forest

Algorithm | Validation Accuracy | Testing Accuracy | Training Time (s) | Validation Time (s) | Testing Time (s)
Without FS | 0.96 | 0.95 | 0.11 | 0.04 | 0.05
Chi-square FS | 0.93 | 0.91 | 0.46 | 0.13 | 0.07
L1 based FS | 0.95 | 0.96 | 0.48 | 0.21 | 0.07
Recursive Feature Elimination | 0.96 | 0.95 | 0.24 | 0.08 | 0.06
PCA | 0.93 | 0.90 | 0.33 | 0.11 | 0.12
LDA | 0.92 | 0.93 | 0.33 | 0.15 | 0.08

By selecting the most important features with L1-based FS, the RF model achieves the highest validation accuracy of 95% and the highest testing accuracy of 96%, but the training, validation, and testing times are relatively long compared with the other configurations, at 0.48 s, 0.21 s, and 0.07 s, respectively. L1-based feature selection is effective at picking out the most crucial characteristics of a model because it penalizes the absolute value of the feature coefficients. In terms of time, the RF model without any feature selection technique requires the least amount of time to validate and test the model, at 0.04 s and 0.05 s, respectively.

4.7.3 Decision Tree

The Decision Tree ML model for predicting breast cancer has been constructed with
parameters like splitter, criterion, max_features, class_weight, and min_samples_split
with variable values selected using GridSearch method. The parameters and values
selected for each distinct technique using the Grid search method are detailed below.

Table 5 Parameters of decision tree

Parameters | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
splitter | random | random | best | best | random | best
criterion | gini | gini | gini | gini | gini | gini
max_features | log2 | log2 | auto | auto | auto | auto
class_weight | balanced | balanced | balanced | balanced | balanced | balanced
min_samples_split | 5 | 7 | 5 | 9 | 11 | 8

The Decision Tree ML model was implemented and evaluated using feature selection
techniques including Chi-square FS, L1-based FS, Recursive feature elimination
techniques, and dimensionality reduction techniques including PCA and LDA
techniques. The tabulated results of the evaluation are shown below.

Table 6 Results of decision tree

Algorithm | Validation Accuracy | Testing Accuracy | Training Time (s) | Validation Time (s) | Testing Time (s)
Without FS | 0.95 | 0.95 | 0.00 | 0.04 | 0.04
Chi-square FS | 0.90 | 0.96 | 0.01 | 0.35 | 0.06
L1 based FS | 0.93 | 0.94 | 0.02 | 0.26 | 0.08
Recursive Feature Elimination | 0.95 | 0.96 | 0.01 | 0.08 | 0.06
PCA | 0.77 | 0.82 | 0.00 | 0.06 | 0.06
LDA | 0.89 | 0.95 | 0.01 | 0.05 | 0.06

Based on the evaluated results, the DT model that selects the most important features
using Chi-Square FS and the Recursive feature elimination technique provides the
highest testing accuracy of 96%. However, the validation accuracy of Chi-square FS is
significantly lower than that of RFE, at 90%. The combination of RFE and DT is a
potent technique for increasing the accuracy of ML models, as RFE can be used to
eliminate unimportant features and DT can be utilized to determine the optimal decision
tree for the model. In terms of time, the DT model without any feature selection
techniques takes the least amount of time to validate and evaluate the model, 0.04
seconds for each.

4.7.4 MLP Classifier

For predicting breast cancer, the MLP Classifier model has been implemented with the parameters activation, solver, learning_rate, learning_rate_init and tol, with values chosen through the grid search method. The parameters and the values chosen for each method are as follows.

Table 7 Parameters of MLP

Parameters | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
activation | identity | identity | identity | identity | identity | identity
solver | adam | sgd | lbfgs | lbfgs | lbfgs | lbfgs
learning_rate | constant | constant | constant | constant | constant | constant
learning_rate_init | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001
tol | 0.0001 | 0.0001 | 0.0001 | 1e-07 | 0.000001 | 1e-05

Based on the above parameters and chosen values, the MLP Classifier model has been implemented and evaluated with the feature selection techniques Chi-square FS, L1-based FS and recursive feature elimination, and with the dimensionality reduction techniques PCA and LDA. The assessed results are tabulated below.

Table 8 Results of MLP

Algorithm | Validation Accuracy | Testing Accuracy | Training Time (s) | Validation Time (s) | Testing Time (s)
Without FS | 0.94 | 0.96 | 0.20 | 0.05 | 0.06
Chi-square FS | 0.80 | 0.79 | 0.06 | 0.07 | 0.08
L1 based FS | 0.96 | 0.95 | 1.58 | 0.22 | 0.09
Recursive Feature Elimination | 0.94 | 0.96 | 0.70 | 0.14 | 0.08
PCA | 0.94 | 0.96 | 0.99 | 0.15 | 0.09
LDA | 0.93 | 0.90 | 0.15 | 0.22 | 0.10

Based on the assessed results, the MLP Classifier model gives the maximum validation accuracy of 94% and testing accuracy of 96% when the most important features are selected with the recursive feature elimination technique, with the PCA dimensionality reduction technique, and when no feature selection is applied, because the MLP Classifier is an effective ML algorithm that can learn complex relationships between the features and the target variable. Comparing efficiency in terms of time taken, the MLP Classifier model with the Chi-square feature selection technique takes the least time to train, validate and test the model, at 0.06 s, 0.07 s and 0.08 s respectively.

4.7.5 Passive Classifier

The Passive Classifier model for predicting breast cancer has been implemented with
GridSearch-determined parameter values for C, max_iter, validation_fraction, loss, and
tol. The parameters and their values selected for each distinct method using Grid search
are as follows:

Table 9 Parameters of passive classifier

Parameters | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
C | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0
max_iter | 1000 | 1000 | 1000 | 1000 | 1000 | 1000
validation_fraction | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
loss | hinge | hinge | hinge | hinge | hinge | hinge
tol | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001

The Passive Classifier model was implemented and evaluated using feature selection
techniques including Chi-square FS, L1-based FS, Recursive feature elimination
techniques, and dimensionality reduction techniques including PCA and LDA
techniques. The tabulated results of the evaluation are shown below.

Table 10 Results of passive classifier

Algorithm | Validation Accuracy | Testing Accuracy | Training Time (s) | Validation Time (s) | Testing Time (s)
Without FS | 0.89 | 0.92 | 0.01 | 0.05 | 0.05
Chi-square FS | 0.91 | 0.90 | 0.01 | 0.06 | 0.09
L1 based FS | 0.89 | 0.92 | 0.01 | 0.06 | 0.21
Recursive Feature Elimination | 0.84 | 0.82 | 0.02 | 0.22 | 0.06
PCA | 0.94 | 0.94 | 0.00 | 0.07 | 0.07
LDA | 0.92 | 0.92 | 0.01 | 0.07 | 0.09

Based on the evaluated results, the Passive Classifier model that selects the most
important features using PCA dimensionality reduction techniques achieves the highest
validation & testing accuracy of 94%. This is because PCA can help to decrease the
dimensionality of the dataset without losing too much information, which makes the
model easier to train and can improve the model's accuracy, while also requiring the
least amount of time to train, validate, and test the model, with respective times of
0.00secs, 0.07secs, and 0.07secs.

5 Results & conclusion

5.1 Technical Challenges faced & its solution

The technological difficulties encountered and their resolutions are described in the
following sections.

5.1.1 Name error

Error corrected code:

5.1.2 Attribute Error

Error corrected code:

5.2 Interpretation of Results

5.2.1 Accuracy

Based on the results, the accuracy of the ML models SVM, RF, DT, MLP Classifier and Passive Classifier, combined with the FS techniques Chi-square FS, L1-based FS and RFE and the dimensionality reduction techniques PCA and LDA, is compared to analyse the effectiveness of these models in predicting breast cancer. The comparison results are shown below.

Table 11 Overall results

Model | Without FS | Chi-square FS | L1 based FS | RFE | PCA | LDA
SVM Model | 0.94 | 0.94 | 0.91 | 0.96 | 0.93 | 0.97
Random Forest | 0.95 | 0.91 | 0.96 | 0.95 | 0.90 | 0.93
Decision Tree | 0.95 | 0.96 | 0.94 | 0.96 | 0.82 | 0.95
MLP Classifier | 0.96 | 0.79 | 0.95 | 0.96 | 0.96 | 0.90
Passive Classifier | 0.92 | 0.90 | 0.92 | 0.82 | 0.94 | 0.92

Among all the feature selection techniques, recursive feature elimination (RFE) gives the maximum accuracy of 96%, the highest of the feature selection techniques. It outperforms the other approaches because it works by recursively removing features that are not important: a model is built iteratively and the features with the least impact on its performance are removed at each step.

Among all the dimensionality reduction techniques, the LDA technique combined with the SVM model gives the maximum accuracy of 97%, the highest of the dimensionality reduction techniques considered. LDA is better at finding the directions in the dataset that best separate the two classes, and hence it gives higher accuracy than the PCA technique in predicting breast cancer.

Among all ML models, the SVM model with LDA has the highest accuracy of 97%, followed by the SVM model with RFE feature selection, the RF model with L1-based feature selection, the DT model with Chi-square FS and RFE, and the MLP Classifier with RFE and PCA, which each give a maximum accuracy of 96% in predicting breast cancer.

5.3 Critical Analysis

The detailed analysis of this study is broken down into two distinct sections, titled "Research Findings" and "Comparison with other research work"; the specifics of each are provided below.

5.3.1 Research Findings

Why does the SVM combined with the LDA technique give the maximum accuracy in predicting breast cancer?

The SVM with LDA gives the maximum accuracy in predicting breast cancer because linear discriminant analysis helps to choose the most class-discriminative representation of the features for the classification task. This reduces the dimensionality of the data, which can help to enhance the accuracy of the model. The SVM model with LDA has the highest accuracy of 97%, which is noticeably greater than the accuracy of the other models.

Why could the MLP Classifier with Chi-square FS, the Decision Tree with PCA and the Passive Classifier with RFE not predict breast cancer as well?

The MLP Classifier, Decision Tree, and Passive Classifier are all supervised learning models, which means that they learn from labelled data. It is possible, however, that the features selected by these particular feature selection and reduction techniques are not informative enough for these models to learn from.

Why do RFE combined with SVM, DT and the MLP Classifier give the maximum accuracy of 96% in predicting breast cancer?

Recursive feature elimination (RFE) is a feature selection technique that iteratively removes the features that are least important for predicting the target variable. This helps to decrease the data's dimensionality, which in turn can enhance the model's accuracy. The SVM, DT, and MLP Classifier models are all supervised learning models known for their high accuracy in classification tasks; when combined with RFE feature selection, they can achieve even higher accuracy in predicting breast cancer.

5.3.2 Comparison with other research work

The research work of (Kumari.M et al., 2018) created a system for early-stage breast cancer prediction using the minimal number of features that can be extracted from clinical information. The planned experiment was carried out using the Wisconsin breast cancer dataset (WBCD). Using the most predictive factors, the KNN classifier was found to produce the highest classification accuracy of 99.28%. Detecting breast cancer at an early stage with the suggested method drastically reduces medical costs and improves quality of life.

Dimensionality reduction, feature ranking, fuzzy logic, and an artificial neural network
are all used in this study (Gupta.K et al., 2019) to develop a new approach to data
classification. The purpose of this research is to evaluate the current integrated methods
to breast cancer detection and prognosis and to draw conclusions about their relative
merits. The best diagnostic classification accuracy is provided by principal component analysis (PCA) combined with a neural network (NN); however, gain ratio and chi-square also perform well (85.78 percent). These findings pave the way for the creation of a Medical
Expert System model, which may be utilised for the automated diagnosis of additional
diseases and high-dimensional medical datasets.

The main objective of this study (Ibrahim, S et al., 2021) was to use correlation analysis and the variance of input features to select feature selection strategies and feed them to a classification algorithm, with an ensemble technique improving breast cancer categorization. The suggested method was tested on the public WBCD dataset, and correlation analysis and principal component analysis were used to reduce the dimension. LR, SVM, NB, KNN, RF, DT, and SGD were examined and compared, and tuning the hyper-parameters improved classification performance. Two voting approaches were applied to the top classification algorithms: hard voting selects the majority class, whereas soft voting selects the class with the highest probability. With 98.24% accuracy, 99.29% precision, and 95.89% recall, the proposed method outperformed the current standard.

This research work (Karimi.K et al., 2022) uses a genetic-algorithm-optimized hybrid model (CHFS-BOGA) to predict breast cancer, in which an optimized genetic algorithm with initialization generation and the C4.5 decision tree classifier as the fitness function replaces chance and random selection. The Wisconsin dataset from the UCI machine learning repository provided 569 rows and 32 columns, and the dataset was evaluated with the explorer module of Weka, an open-source data mining tool. The hybrid feature selection approach outperforms single filter approaches and PCA. CHFS-BOGA iterations based on a support vector machine (SVM) classifier reach 97.3 percent accuracy; with CHFS-BOGA-SVM the authors achieved a best-in-class 98.25% accuracy on a split of 70.0% training data and 30.0% testing data, 100% accuracy on the entire training set, and an area under the ROC curve of 1.0. The CHFS-BOGA-SVM system distinguished malignant from benign breast tumours.

This research work focuses on predicting breast cancer using feature selection and dimensionality reduction techniques. For this purpose, the Breast Cancer Wisconsin (Diagnostic) dataset was chosen, collected, and imported into a pandas data frame. The data was then preprocessed, visualized and split before the ML models were implemented. The ML models SVM, RF, DT, MLP Classifier and Passive Classifier were implemented and evaluated both without any feature selection and with the feature selection techniques Chi-square, L1-based FS and RFE and the dimensionality reduction techniques PCA and LDA. Comparing the performance of the ML models with these techniques, the SVM model with LDA has the highest accuracy of 97% in predicting breast cancer. Overall, both feature selection and dimensionality reduction techniques can be effective in enhancing the accuracy of breast cancer predictions.

5.4 Conclusion

Breast cancer is a disease in which breast cells grow uncontrollably, and there are various types of breast cancer depending on which breast cells become cancerous. Through blood vessels and lymph vessels, breast cancer can spread outside the breast. Importantly, early detection of this fatal disease reduces the mortality rate and increases the survival
period of breast cancer patients. ML models are capable of autonomously learning and
adjusting actions for breast cancer prediction based on historical data without requiring
human intervention. When using high-dimensional medical data to predict breast cancer,
it can be difficult to interpret and analyze these data. Due to their complexity, traditional
methods may be incapable of effectively capturing the underlying relationships between
the various clinical factors. Due to the high-dimensionality of the data, which can result
in over- or under-fitting of the model, the predictive accuracy of such models may also
be limited. Thus, the focus of this research is on predicting breast cancer through the use
of feature selection and dimensionality reduction techniques. Breast Cancer Wisconsin
(Diagnostic) Dataset has been collected and imported into the pandas data frame for this
research project. In addition, the data is preprocessed, visualized, and split before ML
models are implemented. Then, ML models such as SVM, RF, DT, MLP Classifier, and
Passive Classifier were implemented and evaluated with and without feature selection
techniques, including Chi-square, L1-based FS and RFE, and dimensionality reduction
techniques including PCA and LDA. Comparing the efficacy of ML models with feature
selection and dimensionality reduction techniques, the SVM model with LDA feature
selection has the highest predictive accuracy for breast cancer, at 97%. Overall, both
feature selection and dimensionality reduction can improve the accuracy of breast cancer
predictions.

5.4.1 Addressing Research Questions

RQ 1: What are the relative performance differences between feature selection and
dimensionality reduction techniques in improving the accuracy of breast cancer
predictions?

Feature selection approaches and dimensionality reduction strategies both have the potential to increase the accuracy of breast cancer predictions: feature selection techniques do this by eliminating unnecessary characteristics from the dataset, while dimensionality reduction methods reduce the number of features present in the dataset. When evaluating the performance of the different models, the SVM model that uses LDA has the greatest accuracy, at 0.97. This shows that, for this specific dataset, the LDA dimensionality reduction approach is more successful than the other feature selection and reduction strategies at boosting the accuracy of breast cancer predictions. In general, the accuracy of breast cancer forecasts may be improved using strategies such as feature selection as well as dimensionality reduction, both of which can be useful.

5.5 Future Enhancement

The results analysis shows that integrating multidimensional data with different classification, feature selection, and dimensionality reduction strategies could provide useful inference tools for this field. Further study is needed to better understand how to tune model hyperparameters to increase the accuracy of the classification methods. In the future, it is anticipated that multiple datasets and deep learning models will be used to attain high precision.

Future research could concentrate on devising personalized treatment plans based on the
characteristics and genetic profile of the patient's tumor. Once breast cancer has been
identified, it is essential to devise an individualized treatment plan for each patient.
Currently, the majority of breast cancer detection models only utilize a single data type,
such as tissue samples or medical images. Using multimodal data, such as tissue
samples, medical images, and patient history, could enhance detection accuracy.

6 Reference

[1] A. Ashraf and R. Yadav, “Integrative computational approach for gene


expression profiling of metastatic breast cancer,” Curr. Med. Res. Pract., vol. 13, no. 3,
p. 100, 2023.
[2] R. Jalloul et al., “A review of machine learning techniques for the classification
and detection of breast cancer from medical images,” Diagnostics (Basel), vol. 13, no.
14, p. 2460, 2023.
[3] K. Lång et al., “Artificial intelligence-supported screen reading versus standard
double reading in the Mammography Screening with Artificial Intelligence trial (Masai):
A clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded,
screening accuracy study,” Lancet Oncol., vol. 24, no. 8, pp. 936-944, 2023.
[4] R. Gurumoorthy and M. Kamarasan, “Computer aided breast cancer detection
and classification using optimal deep learning” in International Conference on
Sustainable Computing and Data Communication Systems (ICSCDS), vol. 2023. IEEE,
2023, Mar., pp. 143-150.
[5] V. Jaiswal et al., “A breast cancer risk predication and classification model with
ensemble learning and big data fusion,” Decis. Anal. J., p. 100298, 2023.
[6] R. Rabiei et al., “Prediction of breast cancer using machine learning
approaches,” J. Biomed. Phys. Eng., vol. 12, no. 3, p. 297-308, 2022.
[7] D. Bressan et al., “The dawn of spatial omics,” Science, vol. 381, no. 6657,
p. eabq4964, 2023.
[8] S. P. Jakkaladiki and F. Maly, “An efficient transfer learning based cross model
classification (TLBCM) technique for the prediction of breast cancer,” PeerJ Comput.
Sci., vol. 9, p. e1281, 2023.
[9] A. Sharma et al., “Improved technique to improve breast cancer prediction,” J.
Data Acquisition Process., vol. 38, no. 2, p. 1889, 2023.
[10] F. Hamedani-KarAzmoudehFar et al., “Breast cancer classification by a new
approach to assessing deep neural network-based uncertainty quantification methods,”
Biomed. Signal Process. Control, vol. 79, p. 104057, 2023.
[11] X. Li et al., “Automatic breast cancer diagnosis based on hybrid dimensionality
reduction technique and ensemble classification,” J. Cancer Res. Clin. Oncol., vol. 149,
no. 10, pp. 7609-7627, 2023.

[12] S. Ibrahim et al., “Feature selection using correlation analysis and principal
component analysis for accurate breast cancer diagnosis,” J. Imaging, vol. 7, no. 11, p.
225, 2021.
[13] A. Jamal et al., “Dimensionality reduction using pca and k-means clustering for
breast cancer prediction,” LKJITI, vol. 9, no. 3, pp. 192-201, 2018.
[14] K. Karimi et al., “Two new feature selection methods based on learn-heuristic
techniques for breast cancer prediction: A comprehensive analysis,” Ann. Oper. Res.,
pp. 1-36, 2022.
[15] K. Gupta and R. R. Janghel, “Dimensionality reduction-based breast cancer
classification using machine learning” in Computational Intelligence: Theories,
Applications and Future Directions, vol. I. Singapore: Springer, 2019, pp. 133-146.
[16] R. Zebari et al., “A comprehensive review of dimensionality reduction
techniques for feature selection and feature extraction,” J. Appl. Sci. Technol. Trends,
vol. 1, no. 2, pp. 56-70, 2020.
[17] M. Kumari and V. Singh, “Breast cancer prediction system,” Procedia Comput.
Sci., vol. 132, pp. 371-376, 2018.
[18] L. Nesamani and S. N. S. Rajini, “Predictive modeling for classification of breast
cancer dataset using feature selection techniques” in Research Anthology on Medical
Informatics in Breast and Cervical Cancer. IGI Global, 2023, pp. 166-177.
[19] H. K. Malik et al., “Comparison of feature selection and feature extraction role in
dimensionality reduction of big data,” J. Tech., vol. 5, no. 1, pp. 184-192, 2023.
[20] G. Kou et al., “Evaluation of feature selection methods for text classification
with small datasets using multiple criteria decision-making methods,”, Applied Soft
Computing, vol. 86, p. 105836, 2020.
[21] U. Das et al., “Accurate recognition of coronary artery disease by applying
machine learning classifiers” in 23rd International Conference on Computer and
Information Technology (ICCIT), Dec. 2020, 2020, pp. 1-6.
[22] B. M. S. Hasan and A. M. Abdulazeez, A Review of Principal Component
Analysis Algorithm for Dimensionality Reduction, 2021, pp. 20-30.
[23] H. Jelodar et al., “Latent Dirichlet allocation (LDA) and topic modeling: Models,
applications, a survey,”, Multimed. Tools Appl., vol. 78, no. 11, pp. 15169-15211, 2019.
[24] I. Ibrahim and A. Abdulazeez, “The role of machine learning algorithms for
diagnosing diseases,”, JASTT, vol. 2, no. 1, pp. 10-19, 2021.

[25] S. Buschjager et al., “Realization of random forest for real-time evaluation
through tree framing” in IEEE International Conference on Data Mining (ICDM), Nov.
2018, 2018, pp. 19-28.
[26] H. Taud and J. F. Mas, “Multilayer Perceptron (MLP)” in Geomatic Approaches
for Modeling Land Change Scenarios, 2018, pp. 451-455.
[27] B. Remeseiro and V. Bolon-Canedo, “A review of feature selection methods in
medical applications,”, Comput. Biol. Med., vol. 112, p. 103375, 2019.
[28] B. Venkatesh and J. Anuradha, “A review of feature selection and its methods,”,
Cybernetics and Information Technologies, vol. 19, no. 1, pp. 3-26, 2019.
[29] V. Jaiswal et al., “A breast cancer risk predication and classification model with
ensemble learning and big data fusion,” Decis. Anal. J., p. 100298, 2023.
[30] H. K. Malik et al., “Comparison of feature selection and feature extraction role in
dimensionality reduction of big data,” J. Tech., vol. 5, no. 1, pp. 184-192, 2023.
[31] A. García-Domínguez et al., “Diabetes detection models in Mexican patients by
combining machine learning algorithms and feature selection techniques for clinical and
paraclinical attributes: A comparative evaluation,” J. Diabetes Res., vol. 2023, 9713905,
2023.
[32] X. Sun and A. Qourbani, “Combining ensemble classification and integrated
filter-evolutionary search for breast cancer diagnosis,” J. Cancer Res. Clin. Oncol., pp.
1-17, 2023.
[33] L. Guo et al., “Breast cancer prediction model based on clinical and biochemical
characteristics: Clinical data from patients with benign and malignant breast tumors
from a single center in South China,” J. Cancer Res. Clin. Oncol., pp. 1-13, 2023.
[34] S. Rostamzadeh et al., “A comparative investigation of machine learning
algorithms for predicting safety signs comprehension based on socio-demographic
factors and cognitive sign features,” Sci. Rep., vol. 13, no. 1, p. 10843, 2023.
[35] L. Nesamani and S. N. S. Rajini, “Predictive modeling for classification of breast
cancer dataset using feature selection techniques” in Research Anthology on Medical
Informatics in Breast and Cervical Cancer. IGI Global, 2023, pp. 166-177.
[36] A. Y. Ikram and C. Loqman, “Arabic text classification in the legal domain,” in
Third International Conference on Intelligent Computing in Data Sciences (ICDS),
Oct. 2019, pp. 1-6.

[37] E. Strelcenia and S. Prakoonwit, “Effective feature engineering and classification
of breast cancer diagnosis: A comparative study,” BioMedInformatics, vol. 3, no. 3, pp.
616-631, 2023.
[38] S. S. Travers et al., “Breast cancer brain metastases localization and risk of
hydrocephalus: A single institution experience,” J. Neurooncol., vol. 163, no. 1, pp. 115-
121, 2023.
[39] M. M. Hassan et al., “A comparative assessment of machine learning algorithms
with the Least Absolute Shrinkage and Selection Operator for breast cancer detection
and prediction,” Decis. Anal. J., vol. 7, p. 100245, 2023.
[40] R. Shang et al., “Local discriminative based sparse subspace learning for feature
selection,”, Pattern Recognition, vol. 92, pp. 219-230, 2019.
[41] J. Abdollahi et al., Diabetes Data Classification Using Deep Learning Approach
and Feature Selection Based on Genetic, 2023.
[42] R. Lamba et al., “A hybrid feature selection approach for Parkinson’s detection
based on mutual information gain and recursive feature elimination,”, Arab. J. Sci. Eng.,
vol. 47, no. 8, pp. 10263-10276, 2022.
[43] P. Misra and A. S. Yadav, Improving the Classification Accuracy Using
Recursive Feature Elimination with Cross-Validation, 2020, pp. 659-665.
[44] N. Tikher, Brain tumor detection model using digital image processing and
transfer learning, 2023 ([Doctoral dissertation]. St. Mary’s University).
[45] R. Hu et al., “Evaluation of customs supervision competitiveness using principal
component analysis,” Sustainability, vol. 15, no. 3, p. 1833, 2023.
[46] F. Abbas et al., Assessing the Dimensionality Reduction of the Geospatial
Dataset Using Principal Component Analysis (PCA) and Its Impact on the Accuracy and
Performance of Ensembled and Non-Ensembled Algorithm, 2023.
[47] J. P. Bharadiya, “A tutorial on principal component analysis for dimensionality
reduction in machine learning,” Int. J. Innov. Sci. Res. Technol., vol. 8, no. 5, pp. 2028-
2032, 2023.
[48] S. Ayesha et al., “Overview and comparative study of dimensionality reduction
techniques for high dimensional data,”, Information Fusion, vol. 59, pp. 44-58, 2020.
[49] R. Zebari et al., “A comprehensive review of dimensionality reduction
techniques for feature selection and feature extraction,”, JASTT, vol. 1, no. 2, pp. 56-70,
2020.

[50] L. Zhang et al., “Hyperspectral dimensionality reduction based on multiscale
superpixelwise kernel principal component analysis,”, Remote Sensing, vol. 11, no. 10,
p. 1219, 2019.
[51] J. P. Bharadiya, “A tutorial on principal component analysis for dimensionality
reduction in machine learning,” Int. J. Innov. Sci. Res. Technol., vol. 8, no. 5, pp. 2028-
2032, 2023.
[52] V. Tomar et al., “Single sample face recognition using deep learning: A survey,”
Artif. Intell. Rev., pp. 1-49, 2023.
[53] I. Babikir et al., “Evaluation of principal component analysis for reducing
seismic attributes dimensions: Implication for supervised seismic facies classification of
a fluvial reservoir from the Malay Basin, offshore Malaysia,” J. Petrol. Sci. Eng., vol.
217, p. 110911, 2022.
[54] C. Schwarz, “ldagibbs: A command for topic modeling in Stata using latent
Dirichlet allocation,”, The. Stata Journal, vol. 18, no. 1, pp. 101-117, 2018.
[55] P. N. Thotad et al., “Diabetes disease detection and classification on Indian
demographic and health survey data using machine learning methods,” Diabetes Metab.
Syndr., vol. 17, no. 1, p. 102690, 2023.
[56] C. Schwarz, “ldagibbs: A command for topic modeling in Stata using latent
Dirichlet allocation,”, The. Stata Journal, vol. 18, no. 1, pp. 101-117, 2018.
[57] W. Jia et al., “Feature dimensionality reduction: A review,” Complex Intell.
Syst., vol. 8, no. 3, pp. 2663-2693, 2022.
[58] A. K. Tyagi and P. Chahal, “Artificial intelligence and machine learning
algorithms” in Research Anthology on Machine Learning Techniques, Methods, and
Applications. IGI Global, 2022, pp. 188-219.
[59] I. Izonin et al., “A two-step data normalization approach for improving
classification accuracy in the medical diagnosis domain,” Mathematics, vol. 10, no. 11,
p. 1942, 2022.
[60] S. Chaudhury et al., “Effective image processing and segmentation-based
machine learning techniques for diagnosis of breast cancer,” Comp. Math. Methods
Med., vol. 2022, 6841334, 2022.
[61] A. D. Sendek et al., “Machine learning modeling for accelerated battery
materials design in the small data regime,” Adv. Energy Mater., vol. 12, no. 31, p.
2200553, 2022.

[62] W. Zheng et al., “Interpretability application of the Just-in-Time software defect
prediction model,” J. Syst. Softw., vol. 188, p. 111245, 2022.
[63] D. Jalal and T. Ezzedine, “Decision tree and support vector machine for anomaly
detection in water distribution networks,” in 2020 International Wireless
Communications and Mobile Computing (IWCMC), Jun. 2020, pp. 1320-1323.
[64] M. F. Ak, “A comparative analysis of breast cancer detection and diagnosis
using data visualization and machine learning applications,”, Healthcare (Basel), vol. 8,
no. 2, p. 111, 2020.
[65] Y. Wang et al., “Deep learning-based socio-demographic information
identification from smart meter data,”, IEEE Trans. Smart Grid, vol. 10, no. 3, pp. 2593-
2602, 2018.
[66] M. C. Gomes et al., “Tool wear monitoring in micromilling using support vector
machine with vibration and sound sensors,”, Precision Engineering, vol. 67, pp. 137-
151, 2021.
[67] S. Liang, “Comparative analysis of SVM, XGBoost and neural network on hate
speech classification,”, RESTI, vol. 5, no. 5, pp. 896-903, 2021.
[68] M. Schonlau and R. Y. Zou, “The random forest algorithm for statistical
learning,”, The. Stata Journal, vol. 20, no. 1, pp. 3-29, 2020.
[69] V. Kadam et al., “Enhancing surface fault detection using machine learning for
3D printed products,”, ASI, vol. 4, no. 2, p. 34, 2021.
[70] D. Ramayanti and U. Salamah, Text Classification on Dataset of Marine and
Fisheries Sciences Domain Using Random Forest Classifier, 2018, pp. 1-7.
[71] L. Ren et al., “An adaptive Laplacian weight random forest imputation for
imbalance and mixed-type data,” Inf. Syst., vol. 111, p. 102122, 2023.
[72] C. Faria et al., “A tree-based approach to forecast the total nitrogen in
wastewater treatment plants” in International Symposium on Distributed Computing and
Artificial Intelligence, Sept. 2021, pp. 137-147.
[73] B. Charbuty and A. Abdulazeez, “Classification based on decision tree algorithm
for machine learning,”, JASTT, vol. 2, no. 1, pp. 20-28, 2021.
[74] L. Abhishek, “Optical character recognition using ensemble of SVM, MLP and
extra trees classifier” in International Conference for Emerging Technology (INCET),
Jun. 2020, 2020, pp. 1-4.
[75] J. Anmala and V. Turuganti, “Comparison of the performance of decision tree
(DT) algorithms and extreme learning machine (ELM) model in the prediction of water

quality of the Upper Green River watershed,”, Water Environ. Res., vol. 93, no. 11, pp.
2360-2373, 2021.
[76] A. H. Fath et al., Implementation of Multilayer Perceptron (MLP) and Radial
Basis Function (RBF) Neural Networks to Predict Solution Gas-Oil Ratio of Crude Oil
Systems, 2020, pp. 80-91.
[77] L. E. McCoubrey et al., “Machine learning uncovers adverse drug effects on
intestinal bacteria,”, Pharmaceutics, vol. 13, no. 7, p. 1026, 2021.
[78] A. N. Ahmed et al., Machine Learning Methods for Better Water Quality
Prediction, 2019, p. 124084.
[79] A. Al Bataineh et al., “Multi-layer Perceptron training optimization using nature
inspired computing,” IEEE Access, vol. 10, pp. 36963-36977, 2022.
[80] S. Baressi Šegota et al., “Frigate speed estimation using CODLAG propulsion
system parameters and multilayer Perceptron,” Naše More, vol. 67, no. 2, pp. 117-125,
2020.
[81] P. N. Kumar, Detection of Textual Propaganda Using Passive Aggressive
Classifiers, 2023, pp. 73-79.
[82] K. Varada Rajkumar et al., “Detection of fake news using natural language
processing techniques and passive aggressive classifier” in Intelligent Systems and
Sustainable Computing, Proc. ICISSC 2021, Singapore: Springer Nature Singapore,
2022, pp. 593-601.
[83] S. A. Krishnan et al., SQL Injection Detection Using Machine Learning, p. 11.
[84] S.M. TS, P. S. Sreeja, and R. P. Ram, "Fake News Article classification using
Random Forest, Passive Aggressive, and Gradient Boosting," in 2022 International
Conference on Connected Systems & Intelligence (CSI), Aug. 2022, pp. 1-6.
[85] S. Sievert et al., “Better and faster hyperparameter optimization with Dask,” in
Proc. 18th Python in Science Conference, pp. 118-125, Jul. 2019.
[86] L. V. Von Krannichfeldt et al., “Online ensemble approach for probabilistic wind
power forecasting,”, IEEE Trans. Sustain. Energy, vol. 13, no. 2, pp. 1221-1233, 2021.
[87] D. Chicco and G. Jurman, “The advantages of the Matthews correlation
coefficient (MCC) over F1 score and accuracy in binary classification evaluation,”,
BMC Genomics, vol. 21, no. 1, pp. 6, 2020.

[88] R. Yacouby and D. Axman, “Probabilistic extension of precision, recall, and f1
score for more thorough evaluation of classification models” in Proc. First Workshop on
Evaluation and Comparison of NLP Systems, Nov. 2020, pp. 79-91.
[89] M. Mudassir et al., “Time-series forecasting of Bitcoin prices using high-
dimensional features: A machine learning approach,”, Neural Comput. Appl., pp. 1-15,
2020.

Appendix

1- Breast cancer data collection

import warnings as Effe_cancerbre__w

Effe_cancerbre__w.filterwarnings("ignore")

####### Load the breast cancer CSV file

import pandas as Effe_cancerbre__p

Effe_cancerbre = Effe_cancerbre__p.read_csv('data.csv')

Effe_cancerbre.head(n=10)

Effe_cancerbre.tail(n=10)

Effe_cancerbre.shape

Effe_cancerbre.mean()

Effe_cancerbre.max()

Effe_cancerbre=Effe_cancerbre.rename(columns={"diagnosis":"cancer_target"})

Effe_cancerbre.select_dtypes(include=['object']).dtypes

** Output: 'cancer_target' is the only object-type column.

Effe_cancerbre.info()

## drop the unused 'id' and 'Unnamed: 32' columns

Effe_cancerbre= Effe_cancerbre.drop(['id', 'Unnamed: 32'], axis=1)

Effe_cancerbre.isna().any() ##### @@@@ NULL condition @@@@@

Effe_cancerbre[Effe_cancerbre.duplicated()] ##### @@@@ duplicates condition @@@@@

#### Plottings- EDA

import seaborn as Effe_cancerbre_bn

import matplotlib.pyplot as Effe_cancerbre_oy

Effe_cancerbre_bn.countplot(x='cancer_target', data=Effe_cancerbre, palette="Set2", saturation=0.3)

Effe_cancerbre_oy.title("breast cancer result")

***** Benign breast cancer results are the majority.

Effe_cancerbre_bn.distplot(Effe_cancerbre['area_mean'], color='orange')

Effe_cancerbre_oy.title("area mean of cell nucleus")

Effe_cancerbre_oy.ylabel("value")

Effe_cancerbre_oy.show()

***** The cell nucleus area mean value differs from patient to patient.

Effe_cancerbre_bn.lineplot(x="radius_mean", y="perimeter_mean", color='brown', data=Effe_cancerbre)

Effe_cancerbre_oy.title("radius vs perimeter mean of cell nucleus")

****** The radius mean and perimeter mean of the cell nucleus are directly proportional.

Effe_cancerbre.boxplot(by ='cancer_target', column =['concavity_se'], grid = False)

Effe_cancerbre_oy.ylabel("severity of concave portions of the contour")

**** The severity of concave portions value is lower for malignant cancer patients.

## *** Label-encode the cancer_target feature ***

from sklearn import preprocessing as Effe_cancerbre_ing
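# LabelEncoder converts the categorical diagnosis labels into the integers 0 and 1 so the classifiers can use them as the target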

Effe_cancerbre_ingk = Effe_cancerbre_ing.LabelEncoder()

Effe_cancerbre['cancer_target'] = Effe_cancerbre_ingk.fit_transform(Effe_cancerbre['cancer_target'])

Effe_cancerbre['cancer_target']

Effe_cancerbre.to_csv('Effe_cancerbre.csv', index=False)

Effe_cancerbre

2- Breast cancer- ML algorithms

Effe_cancerbre = Effe_cancerbre__p.read_csv('Effe_cancerbre.csv')

Effe_cancerbre

Effe_cancerbre['cancer_target'].value_counts()

Effe_cancerbre__X = Effe_cancerbre.drop('cancer_target',axis=1)

Effe_cancerbre__Y = Effe_cancerbre['cancer_target']

Effe_cancerbre__X

Effe_cancerbre__Y

from sklearn.model_selection import train_test_split as Effe_cancerbretrits

## 60% of data for training

tst_s=0.4

rdm_ste=55

tst_s1=0.5
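# Two-stage split: 40% of the data is first held out, then split in half,
# giving a 60% training / 20% validation / 20% test partition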

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y, test_size=tst_s, random_state=rdm_ste)

## 20% test and 20% validation data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst, test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

### --- ML algorithms-breast cancer--

from sklearn.metrics import confusion_matrix as Effe_cancerbrecfusx

from sklearn.metrics import classification_report as Effe_cancerbreclfctr

from sklearn.metrics import ConfusionMatrixDisplay as Effe_cancerbrecfmrd

import time as Effe_cancerbretiim

from sklearn.model_selection import GridSearchCV as Effe_cancerbregidr
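# GridSearchCV exhaustively evaluates every combination in the parameter grid with 2-fold cross-validation and keeps the best-scoring one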

#### SVM classifier

from sklearn.svm import SVC as Effe_cancerbresvvm

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1= Effe_cancerbresvvm( C=1.0, degree=7, gamma=


'auto',kernel='poly',tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

from sklearn.ensemble import RandomForestClassifier as Effe_cancerbrerdmrt
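# Random forest: an ensemble of decision trees whose majority vote gives the final prediction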

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(5,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(5,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2= Effe_cancerbrerdmrt( criterion='entropy', max_features='sqrt',


min_samples_leaf= 1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn.sample(50,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(50,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

from sklearn.tree import DecisionTreeClassifier as Effe_cancerbredcinre
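# Decision tree: recursively splits the feature space on the attribute that best separates the classes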

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(100,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(100,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3= Effe_cancerbredcinre( class_weight='balanced', criterion='gini',


max_features= 'log2', min_samples_split=5, splitter='random')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

from sklearn.neural_network import MLPClassifier as Effe_cancerbremllp
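# Multilayer perceptron: a feed-forward neural network trained with backpropagation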

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4= Effe_cancerbremllp( activation='identity', learning_rate='constant',


learning_rate_init= 0.001, solver='adam',tol=0.0001)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### passive classifier

from sklearn.linear_model import PassiveAggressiveClassifier as Effe_cancerbrepsvea
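# Passive-aggressive classifier: an online linear model that updates its weights only when a sample is misclassified or violates the margin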

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5= Effe_cancerbrepsvea( C=1.0, loss='hinge', max_iter= 1000, tol=0.0001,


validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

3- Chi-square -Feature selection

from sklearn.feature_selection import chi2 as Effe_cancerbre__lc2

from sklearn.feature_selection import SelectKBest as Effe_cancerbre__lsk
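# Chi-square filter: each feature is scored against the class label and
# SelectKBest keeps the k highest-scoring features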

Effe_cancerbre__fsfs_k = Effe_cancerbre__lsk(score_func=Effe_cancerbre__lc2, k=13)

Effe_cancerbre__fsfs_M = Effe_cancerbre__fsfs_k.fit(Effe_cancerbre__X,
Effe_cancerbre__Y)

Effe_cancerbre__fsfs_sr =
Effe_cancerbre__p.DataFrame(Effe_cancerbre__fsfs_M.scores_)

Effe_cancerbre__fsfs_co = Effe_cancerbre__p.DataFrame(Effe_cancerbre__X.columns)

Effe_cancerbre__N = Effe_cancerbre__p.concat([Effe_cancerbre__fsfs_sr,
Effe_cancerbre__fsfs_co],axis=1)

Effe_cancerbre__N.columns = ['Effe_cancerbre__fsfs_sr', 'Effe_cancerbre__fsfs_co']

Effe_cancerbre__N[:]

Effe_cancerbre__X=Effe_cancerbre__X[['area_worst', 'area_mean', 'area_se',


'perimeter_worst', 'perimeter_mean', 'radius_worst', 'radius_mean', 'perimeter_se']]

Effe_cancerbre__X

from sklearn.model_selection import train_test_split as Effe_cancerbretrits

## 60% of data for training

tst_s=0.4

rdm_ste=55

tst_s1=0.5

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y, test_size=tst_s, random_state=rdm_ste)

## 20% test and 20% validation data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst, test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

#### SVM classifier

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1= Effe_cancerbresvvm( C=1.0, degree=7, gamma=


'auto',kernel='poly',tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(5,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(5,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2= Effe_cancerbrerdmrt( criterion='entropy', max_features='sqrt',


min_samples_leaf= 1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn.sample(50,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(50,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(100,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(100,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3= Effe_cancerbredcinre( class_weight='balanced', criterion='gini',


max_features= 'log2', min_samples_split=7, splitter='random')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4= Effe_cancerbremllp( activation='identity', learning_rate='constant',


learning_rate_init= 0.001, solver='sgd',tol=0.0001)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### passive classifier

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5= Effe_cancerbrepsvea( C=1.0, loss='hinge', max_iter= 1000, tol=0.0001,


validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

4- L1-based -Feature selection

from sklearn.svm import LinearSVC as Effe_cancerbre__lis

from sklearn.feature_selection import SelectFromModel as Effe_cancerbre__fsfs
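# L1-based selection: the L1 penalty drives the coefficients of weak features to zero,
# and SelectFromModel keeps only the features with non-zero coefficients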

Effe_cancerbre__lisk = Effe_cancerbre__lis(penalty="l1", dual=False)

Effe_cancerbre__fsfs_k= Effe_cancerbre__fsfs(estimator=Effe_cancerbre__lisk)

Effe_cancerbre__fsfs_M = Effe_cancerbre__fsfs_k.fit(Effe_cancerbre__X,
Effe_cancerbre__Y)

Effe_cancerbre__fsfs_M.transform(Effe_cancerbre__X)

Effe_cancerbre__FS = [Effe_cancerbre__X.columns[o] for o in
range(len(Effe_cancerbre__fsfs_M.get_support())) if
Effe_cancerbre__fsfs_M.get_support()[o] == True]

print(' Columns : ', Effe_cancerbre__FS)

print('Count::: ', len(Effe_cancerbre__FS))

Effe_cancerbre__X=Effe_cancerbre__X[Effe_cancerbre__FS]

Effe_cancerbre__X

## 60% of data for training

tst_s=0.4

rdm_ste=55

tst_s1=0.5

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y, test_size=tst_s, random_state=rdm_ste)

## 20% test and 20% validation data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst, test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

#### SVM classifier

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1= Effe_cancerbresvvm( C=1.0, degree=7, gamma=


'auto',kernel='poly',tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn.sample(20,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(20,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(5,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(5,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2= Effe_cancerbrerdmrt( criterion='entropy', max_features='sqrt',


min_samples_leaf= 1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn.sample(50,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(50,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(100,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(100,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3= Effe_cancerbredcinre( class_weight='balanced', criterion='gini',


max_features= 'auto', min_samples_split=5, splitter='best')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4= Effe_cancerbremllp( activation='identity', learning_rate='constant',


learning_rate_init= 0.001, solver='lbfgs',tol=1e-06)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### passive classifier

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5= Effe_cancerbrepsvea( C=1.0, loss='hinge', max_iter= 1000, tol=0.0001,


validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

5- Recursive Feature Elimination -Feature selection

from sklearn.feature_selection import RFECV as Effe_cancerbre__rfc

from sklearn.model_selection import StratifiedKFold as Effe_cancerbre__strkfd

from sklearn.tree import DecisionTreeClassifier as Effe_cancerbredcinre

import numpy as Effe_cancerbredcinmy
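# RFECV: recursively removes features (step=3 at a time) using a decision tree
# and keeps the feature count that maximises cross-validated accuracy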

eva_met='accuracy'

Effe_cancerbre__fsfs_k = Effe_cancerbre__rfc(Effe_cancerbredcinre(random_state=4),
step=3, cv=Effe_cancerbre__strkfd(2), scoring=eva_met)

Effe_cancerbre__fsfs_k.fit(Effe_cancerbre__X, Effe_cancerbre__Y)

print('count: {}'.format(Effe_cancerbre__fsfs_k.n_features_))

Effe_cancerbre__X.drop(Effe_cancerbre__X.columns[Effe_cancerbredcinmy.where(Effe_cancerbre__fsfs_k.support_ == False)[0]], axis=1, inplace=True)

Effe_cancerbre__X

## 60% of data for training

tst_s=0.4

rdm_ste=55

tst_s1=0.5

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y, test_size=tst_s, random_state=rdm_ste)

## 20% test and 20% validation data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst, test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

#### SVM classifier

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1= Effe_cancerbresvvm( C=1.0, degree=3, gamma=


'auto',kernel='poly',tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(5,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(5,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2= Effe_cancerbrerdmrt( criterion='entropy', max_features='sqrt',


min_samples_leaf= 1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn.sample(50,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(50,random_state=rdm_ste))

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(100,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(100,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3= Effe_cancerbredcinre( class_weight='balanced', criterion='gini',


max_features= 'auto', min_samples_split=9, splitter='best')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4 = Effe_cancerbremllp(activation='identity', learning_rate='constant',
                                learning_rate_init=0.001, solver='lbfgs', tol=0.0001)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Passive Aggressive classifier

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn.sample(10,random_state=rdm_ste),
Effe_cancerbre__Ytrn.sample(10,random_state=rdm_ste))

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5 = Effe_cancerbrepsvea(C=1.0, loss='hinge', max_iter=1000, tol=0.0001,
                                 validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

6 - PCA - Dimensionality reduction technique

from sklearn.decomposition import PCA as Effe_cancerbre__dimep

Effe_cancerbre__dimep_k = Effe_cancerbre__dimep(n_components = 15)

Effe_cancerbre__X = Effe_cancerbre__dimep_k.fit_transform(Effe_cancerbre__X)

Effe_cancerbre__X

Effe_cancerbre__X.shape
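
A quick way to check whether 15 principal components are a reasonable choice is to inspect the explained variance ratio of the fitted PCA object. This is a minimal sketch, not part of the original pipeline; it assumes the fitted Effe_cancerbre__dimep_k object from the cell above is still in scope and imports numpy itself.

import numpy as np

# Hedged sketch: report per-component and cumulative variance explained by the
# 15 retained principal components of the PCA object fitted above.
print("variance explained per component:",
      np.round(Effe_cancerbre__dimep_k.explained_variance_ratio_, 3))
print("cumulative variance explained:",
      np.round(np.cumsum(Effe_cancerbre__dimep_k.explained_variance_ratio_), 3))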

##60% of trn data

tst_s=0.4

rdm_ste=55

tst_s1=0.5

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = \
    Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y,
                        test_size=tst_s, random_state=rdm_ste)

## 20% of tst and 20% of valdtin data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = \
    Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst,
                        test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

#### SVM classifier

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1 = Effe_cancerbresvvm(C=1.0, degree=3, gamma='auto', kernel='linear', tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:5], Effe_cancerbre__Ytrn[:5])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2 = Effe_cancerbrerdmrt(criterion='entropy', max_features='sqrt',
                                 min_samples_leaf=1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn[:50], Effe_cancerbre__Ytrn[:50])

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:100], Effe_cancerbre__Ytrn[:100])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3 = Effe_cancerbredcinre(class_weight='balanced', criterion='gini',
                                  max_features='auto', min_samples_split=11, splitter='random')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4 = Effe_cancerbremllp(activation='identity', learning_rate='constant',
                                learning_rate_init=0.001, solver='lbfgs', tol=1e-07)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Passive Aggressive classifier

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5 = Effe_cancerbrepsvea(C=1.0, loss='hinge', max_iter=1000, tol=0.0001,
                                 validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

7 - LDA - Dimensionality reduction technique

from sklearn.decomposition import LatentDirichletAllocation as Effe_cancerbre__dimep

Effe_cancerbre__dimep_k = Effe_cancerbre__dimep(n_components = 15)
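
Since sklearn's LatentDirichletAllocation only accepts non-negative input, a quick sanity check on the feature matrix before the fit_transform call below can avoid a confusing error. This is a minimal sketch added for illustration; it reuses the variable names already defined in this appendix and imports numpy itself.

import numpy as np

# Hedged sketch: confirm every feature value is non-negative before fitting
# LatentDirichletAllocation, which rejects negative entries.
print("all features non-negative:",
      bool(np.all(np.asarray(Effe_cancerbre__X) >= 0)))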

Effe_cancerbre__X = Effe_cancerbre__dimep_k.fit_transform(Effe_cancerbre__X)

Effe_cancerbre__X

Effe_cancerbre__X.shape

##60% of trn data

tst_s=0.4

rdm_ste=55

tst_s1=0.5

Effe_cancerbre__Xtrn, Effe_cancerbre__Xtst, Effe_cancerbre__Ytrn, Effe_cancerbre__Ytst = \
    Effe_cancerbretrits(Effe_cancerbre__X, Effe_cancerbre__Y,
                        test_size=tst_s, random_state=rdm_ste)

## 20% of tst and 20% of valdtin data

Effe_cancerbre__Xvld, Effe_cancerbre__Xtst, Effe_cancerbre__Yvld, Effe_cancerbre__Ytst = \
    Effe_cancerbretrits(Effe_cancerbre__Xtst, Effe_cancerbre__Ytst,
                        test_size=tst_s1, random_state=rdm_ste)

print("train-size : ", Effe_cancerbre__Xtrn.shape[0])

print("validation-size : ", Effe_cancerbre__Xvld.shape[0])

print("test-size : ", Effe_cancerbre__Xtst.shape[0])

Effe_cancerbre__Xtrn

Effe_cancerbre__Xvld

Effe_cancerbre__Xtst

#### SVM classifier

cncr_hyp = { 'kernel': ['linear','poly','sigmoid','rbf'],

'gamma': ['auto','scale'],

'C':[1.0,1.5,2.0,2.5,3.0,3.5,4.0],

'degree':[3,5,7,8,10,13],

'tol':[1e-3,1e-5,1e-7,1e-9]}

cncr_hyp_Vb = Effe_cancerbresvvm(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul1 = Effe_cancerbresvvm(C=1.0, degree=3, gamma='scale', kernel='poly', tol=0.001)

cncr_moul1.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul1.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Random forest classifier

cncr_hyp = { 'criterion': ['entropy','gini','log_loss'],

'max_features': ['sqrt','log2',None],

'n_estimators':[100,150,200,250,300,350,400],

'min_samples_split':[2,4,6,8,10,12,14],

'min_samples_leaf':[1,2,3,4,5,6]}

cncr_hyp_Vb = Effe_cancerbrerdmrt(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:5], Effe_cancerbre__Ytrn[:5])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul2 = Effe_cancerbrerdmrt(criterion='entropy', max_features='sqrt',
                                 min_samples_leaf=1, min_samples_split=2, n_estimators=100)

cncr_moul2.fit(Effe_cancerbre__Xtrn[:50], Effe_cancerbre__Ytrn[:50])

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul2.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Decision tree classifier

cncr_hyp = { 'splitter': ['best','random'],

'criterion': ['gini','entropy','log_loss'],

'max_features':['auto','sqrt','log2'],

'class_weight':['dict','balanced'],

'min_samples_split':[5,6,7,8,9,10,11]}

cncr_hyp_Vb = Effe_cancerbredcinre(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:100], Effe_cancerbre__Ytrn[:100])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul3 = Effe_cancerbredcinre(class_weight='balanced', criterion='gini',
                                  max_features='auto', min_samples_split=8, splitter='best')

cncr_moul3.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul3.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### MLP classifier

cncr_hyp = { 'activation': ['identity','tanh','relu','logistic'],

'solver': ['lbfgs','sgd','adam'],

'learning_rate':['constant','invscaling','adaptive'],

'learning_rate_init':[0.001,0.0001,0.00001],

'tol':[1e-4,1e-5,1e-6,1e-7]}

cncr_hyp_Vb = Effe_cancerbremllp(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul4 = Effe_cancerbremllp(activation='identity', learning_rate='constant',
                                learning_rate_init=0.001, solver='lbfgs', tol=1e-05)

cncr_moul4.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul4.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

#### Passive Aggressive classifier

cncr_hyp = { 'C': [1.0, 5.0, 2.0, 4.0, 3.0],

'max_iter': [1000, 400, 200],

'validation_fraction':[0.2, 0.4, 0.6],

'loss':['hinge', 'huber'],

'tol':[1e-4,1e-5,1e-6,1e-7,1e-3]}

cncr_hyp_Vb = Effe_cancerbrepsvea(random_state=rdm_ste)

cncr_hyp_Vb = Effe_cancerbregidr(cncr_hyp_Vb, cncr_hyp,

cv=2, verbose=1)

cncr_hyp_Vb.fit(Effe_cancerbre__Xtrn[:10], Effe_cancerbre__Ytrn[:10])

print("check breast cancer hypr_par :", cncr_hyp_Vb.best_params_)

print("cancer score: ", cncr_hyp_Vb.best_score_)

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_moul5 = Effe_cancerbrepsvea(C=2.0, loss='hinge', max_iter=1000, tol=0.0001,
                                 validation_fraction=0.2)

cncr_moul5.fit(Effe_cancerbre__Xtrn, Effe_cancerbre__Ytrn)

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of training data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xvld)

print(Effe_cancerbreclfctr(Effe_cancerbre__Yvld, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Yvld,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of validation data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

cncr_Pdcct1 = Effe_cancerbretiim.time()

cncr_pdt= cncr_moul5.predict(Effe_cancerbre__Xtst)

print(Effe_cancerbreclfctr(Effe_cancerbre__Ytst, cncr_pdt))

P_os = Effe_cancerbrecfusx(Effe_cancerbre__Ytst,cncr_pdt)

R_op = Effe_cancerbrecfmrd(confusion_matrix = P_os, display_labels = [0,1])

R_op.plot()

cncr_Pdcct2 = Effe_cancerbretiim.time()

print("\n period of testing data:", cncr_Pdcct2-cncr_Pdcct1,"\n")

# **Comparing Feature selection and dimensionality reduction techniques for effective breast cancer predictions**

Effe_cancerbre_R = {'befo_dim_red': {'SVM': 94, 'RF': 95, 'DT': 95, 'MLP': 96, 'PAC': 92},
                    'Chi_sq':       {'SVM': 94, 'RF': 91, 'DT': 96, 'MLP': 79, 'PAC': 90},
                    'L1_FS':        {'SVM': 91, 'RF': 96, 'DT': 94, 'MLP': 95, 'PAC': 92},
                    'RFE_FS':       {'SVM': 96, 'RF': 95, 'DT': 96, 'MLP': 96, 'PAC': 82},
                    'PCA_dim_red':  {'SVM': 93, 'RF': 90, 'DT': 82, 'MLP': 96, 'PAC': 94},
                    'LDA_dim_red':  {'SVM': 77, 'RF': 93, 'DT': 95, 'MLP': 90, 'PAC': 92}}

Effe_cancerbre_Re = Effe_cancerbre__p.DataFrame(Effe_cancerbre_R)

Effe_cancerbre_Re.plot.bar()

Effe_cancerbre__oty.xticks()

Effe_cancerbre__oty.ylabel('Result')

Effe_cancerbre__oty.legend(loc='lower left')

Effe_cancerbre__oty.title('Feature selection and dimensionality reduction techniques for effective breast cancer predictions')

Effe_cancerbre__oty.show()
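
As a follow-up to the bar chart, the comparison table built above can also be queried directly for the best technique per classifier. This is a minimal sketch that simply reuses the Effe_cancerbre_Re DataFrame constructed above; it adds no results beyond the scores already listed.

# Hedged sketch: for each classifier (row), print the technique (column) with
# the highest recorded score, alongside that score.
print(Effe_cancerbre_Re.idxmax(axis=1))
print(Effe_cancerbre_Re.max(axis=1))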

