

Archives of Computational Methods in Engineering
https://doi.org/10.1007/s11831-021-09679-3

SURVEY ARTICLE

A Comparative Analysis of Deep Learning Approaches for Predicting Breast Cancer Survivability
Surbhi Gupta1,2 · Manoj K. Gupta1

Received: 5 August 2021 / Accepted: 30 October 2021


© CIMNE, Barcelona, Spain 2021

Abstract
Breast cancer is the second-largest cause of mortality among women. Breast cancer patients in developed nations have a relative survival rate of more than five years due to early detection and treatment. Deep learning approaches can help enhance the identification of breast cancer cells, lower the risk of detection mistakes, and minimize the time it takes to diagnose breast cancer compared with manual methods. This paper examines the accuracy of artificial neural networks, Restricted Boltzmann Machines, Deep Autoencoders, and Convolutional Neural Networks (CNN) for post-operative survival analysis of breast cancer patients. A thorough examination of each network's operation and design is carried out to determine which network outperforms the others, followed by an analysis based on each network's prediction accuracy. The experimental results assert that all the deep learning techniques can predict the survival of breast cancer patients. The Restricted Boltzmann Machine achieved the highest accuracy score (0.97), followed by Deep Autoencoders, which attained an accuracy score of 0.96. CNN achieved an accuracy score of 0.92, while artificial neural networks attained the lowest accuracy score (0.89). The prediction performance of the models has been evaluated using distinct parameters such as accuracy, area under the curve, F1 score, Matthew's correlation coefficient, sensitivity, and specificity. Also, the models have been validated using fivefold cross-validation. However, there is still a need for complete analysis and research using deep learning methods to determine the design that provides superior accuracy.
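The evaluation metrics named above (accuracy, area under the curve, F1 score, Matthew's correlation coefficient, sensitivity, specificity) can all be derived from true labels and model outputs. The following scikit-learn sketch uses made-up labels purely for illustration; it is not the paper's code or results.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score, f1_score,
                             matthews_corrcoef, confusion_matrix)

# Made-up ground truth and model scores, purely for illustration
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.6, 0.7, 0.1, 0.3])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the scores at 0.5

accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)       # AUC uses scores, not hard labels
f1 = f1_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

# Sensitivity and specificity come from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

Note that AUC is computed from the continuous scores, while the remaining metrics are computed from the thresholded predictions.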

1 Introduction

Breast cancer is the most commonly occurring malignancy in females worldwide, with 606,520 fatalities estimated in 2020 [1]. Breast carcinoma afflicts more than 2 million women worldwide each year (24.2 per cent of all cancer patients in 2018) [2]. Breast cancer is predicted to account for 30% of all new cancer cases in women worldwide in 2020 [3]. Breast cancer is metastatic cancer that can spread to other organs, making it incurable. An excellent prognosis and a high survival percentage can be achieved if the diagnosis is made early. Mammography is a frequently deployed screening method for detecting breast cancer shown to reduce mortality significantly. Other detection techniques have also been used and explored throughout the previous decade. Several causes of cancer, including gender, ageing, estrogens, genetic factors, and genetic conditions, are considered prime risk factors [4]. The mortality rates can be reduced with early detection and better surgical procedures. Biological therapy for breast cancer is effective [5]. Figure 1 shows the cancer statistics (estimated new cancer cases) of 2020.

The term "survival" refers to how long a patient lives after being diagnosed with an illness. Women with invasive breast cancer have a 91 per cent 5-year survival rate [6]. If breast cancer is detected entirely in the breast, women have a 99 per cent 5-year survival rate; 62 per cent of breast cancer patients are diagnosed at this stage. This disparity could be attributable in part to the screening delays experienced by younger women. This emphasizes the need for early breast cancer screening for a greater survival rate [7].

* Surbhi Gupta, sur7312@gmail.com
  Manoj K. Gupta, manoj.gupta@smvdu.ac.in
1 School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Kakryal, Katra, Jammu and Kashmir, India
2 Department of Computer Science and Engineering, Model Institute of Engineering & Technology, Jammu, J&K, India

The healthcare system is an essential aspect of society since it ensures that everyone receives proper diagnosis and treatment, leading to better treatment and prescription


Fig. 1  Cancer cases and deaths in 2020

medications; it also reduces the requirement for skilled labor. Integrating new technologies in healthcare has become vital and inevitable in today's technologically advanced and fast-paced environment. As a result, incorporating diverse technology in healthcare is a critical step in medical research growth. The application of big data analysis to design survival prediction models became the latest research hotspot, thanks to the influx of medical data and the advancement of artificial intelligence [8]. However, automated learning strategies can automatically learn data structures and do not require any underlying assumptions. The deep learning approach can handle inter-dependence and non-linear relationships among variables [9]. It excels at handling the significant, complex, higher-order interactions found in medical data. As a result, machine/deep learning techniques have much potential for everyday medical practice as central healthcare technology systems. Diagnoses [10], illness prediction models, recurrence predictions [11], and symptom predictions [12, 13] have all benefited from ML research. Furthermore, while survival prediction improves over time, there are significant differences in the databases, modelling procedures, methodological quality, measures of performance, and modelling of related potential predictors [14, 15].

This study intends to explore the performance of neural-learning-based techniques to develop and validate cancer survival prediction models. The predictive models have been employed to predict breast cancer 5-year survival. The 5-year mark is critical for standardizing reporting and determining survivability. Many research works have employed a 5-year criterion to determine survivability, as it takes at least five years to label a patient as alive or dead [16]. The accuracy (acc.) of the distinct neural models in predicting the 5-year survival rate for breast cancer hints towards establishing a better theoretical basis for applying machine learning/deep learning-based techniques in breast cancer survival prediction [6]. Breast cancer is a complicated disease, and assessing survivability has become a significant focus of modern breast cancer research. The advancement of technology aids in providing a more efficient and less time-consuming diagnosis [17].

The remaining paper is organized as follows: Sect. 2 puts forward the existing literature for the 5-year survival prediction of breast cancer. Section 3 outlines the contributions of this paper. Section 4 deliberates the materials and methods explored in this work: data gathering, compilation, preprocessing, and deep neural networks. Section 5 highlights the empirical analysis, simulation outcomes and comparison with the existing literature. Section 6 discusses the study by explaining the acquired results and comparing them. Finally, the paper is concluded in the last section, along with the further scope of enhancement.

2 Related Work

Existing research on the survival prediction of breast cancer patients is extensive. The clinical trial data, treatment, and tumor-related information contribute to predicting the 5-year survival of a cancer patient. This section provides the literature of existing works on the survival prediction of breast cancer patients. People of all ages and health conditions who have been diagnosed with breast cancer, including those diagnosed very early and those diagnosed very late, are included in the overall survival rate.

For survival analysis of breast cancer patients, a research study conducted in 2008 [18] employed seven classification techniques, i.e., Linear Regression, Artificial Neural Networks, Naive Bayes, Bayesian network, Decision Trees + NB, and Decision Trees, and 37,256 cancer cases were considered. The models evaluated using different parameters performed well, and the highest acc (85.8%) was achieved using logistic regression. Another study predicted the breast cancer survival rate of 162 K cancer patients with an acc of 85% using fuzzy decision trees. Similar studies in 2009 [19] aimed to predict the breast cancer survival of 294 K and 182 K patients, respectively, and the datasets were derived from the SEER data repository. Choi 2009 [20] deployed Artificial


Neural Networks, Bayesian network, and Hybrid Bayesian network for survival analysis and achieved an acc score of 88.8%. However, Liu 2009 [21] deployed decision trees for the same purpose and achieved 79% acc. Decision Trees and Linear Regression were also used in another study, Wang 2013 [22], to analyze the survival of breast cancer cases, and after undersampling, decision trees achieved an 89.6% acc score.

Kim, 2013 [23] and Park, 2013 [24] employed Support Vector Classifier and Artificial Neural Networks on data of cancer patients collected from SEER (1973–2003). These studies affirmed the superior performance of neural networks for survival prediction. Shin, 2014 [25] also analyzed the same breast cancer data and used multiple learning techniques like Decision Trees, Artificial Neural Networks, and Support Vector Classifiers for constructing the prediction model. This study inferred that semi-supervised learning could predict the survival rate of breast cancer patients with excellent efficiency and achieved an 81% area under the curve (AUC) score. Also, Wang, 2014 [26] explored the cancer data gathered during 1973–2007 and proposed learning techniques like Linear Regression, Decision Trees, and 1-nearest neighbor for classifying cancer patients into different risk groups, with decision trees achieving the highest acc of 94%. Lotfnezhad, 2015 [7] used Support Vector Classifier, Bayesian network, and Chi-squared Automatic Interaction Detection on the SEER breast cancer dataset (1999–2004). This study advocated using support vector classifiers to estimate breast cancer patients' survival; the acc achieved by support vector classifiers was the highest, i.e., 97%. Shawky 2017 [27] explored four thousand cancer patients' data and proposed multiple learning approaches like Artificial Neural Networks, KNN, Support Vector Classifier, and Linear Regression for breast cancer survival prediction. The highest prediction acc was achieved using Artificial Neural Networks (91%). In 2019, two retrospective studies, Lu 2019 [28] and Abdikenov 2019 [29], intended to predict the survival time of cancer patients. The first study [28] employed a Genetic algorithm, Extreme Learning Machine, Gradient Boosting, and Multi-Layer Perceptron on a dataset of 82,707 patients and achieved the highest acc score of 75%. Abdikenov, 2019 [29] explored the dataset of 659 K cancer cases and used Deep Neural Networks, Linear Regression, Support Vector Classifier, Random Forest, and Gradient Boosting techniques for the classification of breast cancer patients into risk groups and attained appreciable prediction outcomes (acc = 97%). The studies affirmed the supremacy of machine-learning-based techniques to predict breast cancer survival. Significant research studies were conducted on the SEER breast cancer dataset in 2010 [30] that employed multi-layer perceptrons for the survival prediction of breast cancer patients. Simsek, 2020 [31] used Artificial Neural Networks and Linear Regression for breast cancer survival prediction. Salehi, 2020 [32] employed Multi-Layer Perceptron, Multi-Layer Perceptron experts, and Multi-Layer Perceptron stacked generalization, whereas Hussain, 2020 [33] employed Artificial Neural Networks, Decision Trees, and Linear Regression on a breast cancer dataset. All the studies have proposed automated learning approaches to predict breast cancer survival rates and affirmed the superior performance of automated learning techniques for breast cancer survival prediction. The research studies that have proposed predictive models using the SEER Breast cancer dataset have been analyzed in Table 1.

Table 1 highlights the research analysis and the limitations of the existing literature on the SEER Breast cancer dataset. The existing literature is quite significant. Also, the current research works that enable efficient survival prediction mainly rely on convenient machine learning techniques, and only a few of the studies have evaluated neural network algorithms. However, there is scope for improvement in the field of survival prediction. The significant contributions made in this work are highlighted in Sect. 3 as follows.

3 Contribution Outline

The study's main purpose is to evaluate the performance of deep learning techniques and validate them on a fairly large amount of data for the 5-year survival analysis of breast cancer patients. The contributions of the study are stated below:

i. This paper provides a survey of research works that performed the survival analysis of breast cancer using the SEER Breast cancer survival dataset.
ii. Since the dataset is large-sized, deep learning algorithms can significantly improve the survival prediction outcomes compared to the conventional machine learning techniques applied throughout the literature. The deep learning architectures/models, namely artificial neural networks, Restricted Boltzmann Machine, Deep Autoencoders, and Convolutional Neural Networks (CNN), are employed for survival prediction to enhance the prediction accuracy (ref. Section: Results). Further, the study provides a comparative analysis of multiple deep learning strategies.
iii. As depicted in Fig. 2, the proposed approach works by gathering the breast cancer dataset by downloading the data files from a trusted repository (https://seer.cancer.gov/), followed by preprocessing and consequent survival prediction (based on five years) through deep learning architectures. The proposed approach has been validated using a fivefold cross-validation technique.

Table 1  Research analysis

Study | Cases | Features | Analysis | Specifications | Limitations | Results
Endo et al. [18] | 37,256 | 10 | The study employed multiple learning techniques like linear regression, artificial neural networks, NB, Bayesian network, decision trees + NB, and decision trees on SEER breast cancer data collected between 1972 and 1997 | Missing values are not addressed; nor is the feature selection technique described; no parameter tuning | The study can be extended by using a better feature selection technique | Accuracy = 0.858
Khan et al. [19] | 162,500 | 16 | This study explored fuzzy decision trees on SEER breast cancer data collected between 1973 and 2003 | Feature selection technique is not described and no hyperparameter tuning | The complexity of the fuzzy decision trees is high | Accuracy = 0.85
Choi et al. [20] | 294,275 | 14 | Artificial neural networks, Bayesian network, and Hybrid Bayesian network were used for breast cancer survival prediction using the SEER breast cancer dataset (1973–2003) | Feature selection technique is not described and no hyperparameter tuning | Hybrid Bayesian network model has higher complexity and takes more processing time | Accuracy = 0.88
Liu et al. [21] | 182,517 | 16 | This study employed the decision trees algorithm for survival prediction of breast cancer patients registered in the SEER database during 1973–2004 | No hyperparameter tuning | The study can be extended using ensemble or deep learning techniques | Accuracy = 0.79
Wang et al., 2013 [22] | 215,221 | 9 | The survival of breast cancer patients was predicted using decision trees and linear regression on the SEER dataset recorded during 1973–2007 | No hyperparameter tuning | The prediction accuracy can be increased using deep learning techniques | Accuracy = 0.89
Kim et al. [23] | 162,500 | 16 | Support Vector Classifier and Artificial Neural Networks were used to predict the survival of breast cancer patients on the SEER breast cancer dataset | Missing values are not addressed; feature selection technique is not described and no hyperparameter tuning | The accuracy achieved by the proposed model is insignificant | Accuracy = 0.76
Park et al. [24] | 162,500 | 16 | This study used Support Vector Classifier on the SEER database (1973–2003) | Missing values are not addressed; feature selection technique is not described | There is scope to use more exhaustive learning models to increase the prediction score | Accuracy = 0.78
Shin et al. [25] | 162,500 | 16 | Three learning approaches, i.e., Decision Trees, Artificial Neural Networks, and Support Vector Classifier, were used for the survival analysis of breast cancer using the SEER database from 1973 to 2003 | Missing values are not addressed; feature selection technique is not described | The study has not achieved significant accurateness | Area Under the Curve = 0.81
Table 1  (continued)

Study | Cases | Features | Analysis | Specifications | Limitations | Results
Wang et al. [26] | 215,221 | 5 | Three machine-learning techniques, i.e., Linear Regression, Decision Trees, and 1-nearest neighbor, were used for the post-operative survival prediction task. The dataset used by the study was the SEER 1973–2007 breast cancer dataset | Missing values are not addressed; feature selection technique is not described | A very small number of predictors is used to train the models | Accuracy = 0.940
Lotfnezhad et al. [7] | 22,763 | 18 | This study explored the prediction performance of Support Vector Classifier, Bayesian network, and Chi-squared Automatic Interaction Detection to forecast the life expectancy of breast cancer patients on the SEER 1999–2004 dataset | Feature selection technique is not described and no hyperparameter tuning | This research work has explored very few instances; a more detailed analysis on larger-sized data would have provided better insights | Accuracy = 0.967
Shawky et al. [27] | 4490 | 14 | This study explored Artificial Neural Networks, K-nearest neighbor, Support Vector Classifier, and Linear Regression to analyze breast cancer survivability on the SEER 2010 dataset | Feature selection technique is not described and no hyperparameter tuning | The study needs to carry out a more exhaustive analysis to justify the results | Accuracy = 0.89
Lu et al. [28] | 82,707 | 14 | Multiple learning approaches like Genetic algorithm, Extreme Learning Machine, Gradient Boosting, and Multi-Layer Perceptron were employed to predict breast cancer survival. The dataset used by the study was the SEER 1973–2014 breast cancer dataset | Feature selection technique is not described | The prediction outcomes of the study are not significant enough | Accuracy = 0.75
Abdikenov et al. [29] | 659,802 | 19 | Deep Neural Networks, Linear Regression, Support Vector Classifier, Random Forest, and Gradient Boosting classification algorithms were used for the survival prediction of breast cancer patients | Missing values are not addressed; feature selection technique is not described | The study has not described the procedure through which features are selected | Accuracy = 0.96
Fotouhi et al. [30] | 26,092 | 17 | This study used algorithms like k-nearest neighbor, Multi-Layer Perceptron, and Decision Trees for breast cancer survival prediction on the SEER 1993–2014 dataset | Missing values and hyperparameter tuning are not addressed | The study can be improved by carrying out an analysis of hybrid models | Accuracy = 0.84

Table 1  (continued)

Study | Cases | Features | Analysis | Specifications | Limitations | Results
Simsek et al. [31] | 90,308 | 17 | This study used Artificial Neural Networks and Linear Regression for predicting the life expectancy of breast cancer patients. The records of the patients were registered between 1973 and 2001 | No hyperparameter tuning | The evaluation of the proposed study with enhanced learning strategies is recommended | Accuracy = 0.84
Salehi et al. [32] | 141,254 | 35 | This study proposed different learning architectures like Multi-Layer Perceptron, Multi-Layer Perceptron experts, and Multi-Layer Perceptron stacked generalization to predict breast cancer survival on the SEER 2004–2013 dataset | Feature selection technique and hyperparameter tuning are not described | The study has compared the proposed Multi-Layer Perceptron architecture with its own variants only; a comparison with other architectures would provide a fair comparative analysis | Accuracy = 0.93
Hussain et al. [33] | 53,732 | 17 | This study used Artificial Neural Networks, Decision Trees, and Linear Regression for predicting the survival of breast cancer patients. The data of the cancer patients was collected between 1973 and 2013 | Feature selection technique is not described and no hyperparameter tuning | The study has not described the parameters used for building the prediction models | Accuracy = 0.75

iv. As no state-of-the-art works describe the compilation and preprocessing of such a large amount of data (163,413 cancer cases), we have made available the Python code used for the preprocessing and for developing the deep learning architecture to predict breast cancer survival: https://github.com/surbhigupta24/Breast-Cancer-Survival-Prediction.
v. The features selected as the candidate predictors for training the models have also been discussed and approved by the medical expert (oncologist) as the most important prediction indicators. Thus, the features approved by the medical practitioner are used for predicting the survival of breast cancer patients.

4 Materials and Methods

The purpose of this article is to examine the performance of deep learning classification algorithms for predicting the survival rate of breast cancer patients. The procedure's objective is to compare the efficacy of different strategies on cancer data sets. The Python language is used on the Spyder framework to develop all experiments. The workflow of the study is depicted in Fig. 2.

The data collection and preprocessing are explained in Sect. 4.1. In this study, we have used data standardization, as the information gathered includes features of various dimensions and scales. The modelling of a dataset is adversely affected by different scales of data features; in terms of misclassification error and accuracy rates, it leads to a skewed outcome of predictions. As a result, the data must be scaled before modelling, as the input values in general have different scales. Standardizing the dataset implies transforming the distribution of values such that the mean (µ) of observed values is 0 and the standard deviation (σ) is 1. Standardization is achieved using Eq. (1):

Z = (x − µ) / σ   (1)

After the data preprocessing, the dataset is divided in a 3:1 ratio for training and testing: 75% of the data (122,559 records) is used to train the models, and 25% of the data instances (40,854) are used to test them. The data sets are classified using distinct classification techniques, namely artificial neural networks [35], Restricted Boltzmann Machine [36], Deep Autoencoders [37], and Convolutional Neural Networks (CNN) [38]. The description of the deep learning approaches is elaborated in Sect. 4.3. In this study, we have used deep learning algorithms for the survival prediction of breast cancer patients. The prediction performance of the models is validated using fivefold cross-validation [39]. The performance

Fig. 2  Work flow of the proposed study

parameters used to evaluate the models are described in Sect. 5.1.

4.1 Data Source

The Surveillance, Epidemiology, and End Results (SEER) database is a dependable and valuable source that integrates detailed data of patients, including cancer stage, metastasis information, the patient's pathology, treatment history, and the reason of death [16]. SEER, established and preserved by the National Cancer Institute (NCI) of the United States, is a reliable source of information on cancer incidence and patient survival in the United States. SEER collects data of the cancer patient. The National Center for Health Statistics has reported death statistics from SEER. The data sets, which span the years 1973 to 2014, contain 133 cancer-related characteristics and are accessible via the SEER website at https://seer.cancer.gov.

The Breast Cancer data set is utilized in the investigative study. SEER provided the raw data for this investigation, which includes the breast cancer dataset. Data preprocessing covers data cleansing, data integration, data transformation, and data reduction in this study. However, according to studies published in journals [30, 34], the set of features is trimmed to 18. Table 2 summarizes the description of the attributes used in the study.


Table 2  Description of attributes


Feature Description

Marital status Defines the marital status of the patient at the time of tumor diagnosis
Race/ethnicity Describes the race/ethnicity of the patient
Sex Gender of the patient
Age at diagnosis Age of patient at the time of tumor diagnosis
Sequence number Keeps the record of tumors that arise during a patient's lifespan
Primary site Site in which tumor originated
Histologic type The microscopic constitution of cells associated with a specific primary
Behavior code Cancers are classified as benign /0, borderline /1, in situ /2 or malignant /3 according to ICD-O-3
Grade Extra codes indicating the presence of T-cells, B-cells, or null cells
Diagnostic confirmation Tests used for confirming the presence of a malignancy that has been reported
CS Tumor size Provides the details regarding tumor size
CS Extension Provides info regarding Extension of tumor
CS_Lymph_nodes briefs whether lymph nodes are involved or not
Regional_nodes (Positive) The actual number of regional lymph nodes inspected by the pathologist and found to contain metastases
Regional_nodes (Examined) The actual number of regional lymph nodes removed and inspected by the pathologist
Reason_for_no_surgery The data element details the rationale for not performing surgery at the primary site
SEER_historic_stage It is a condensed version of the term "stage": in situ, localized, regional, remote, and unknown
First_malignant_primary_indicator Based on the total number of tumors in SEER. Tumors that have not been reported to SEER are presumed to be malignant
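The preprocessing described in Sect. 4 (standardization via Eq. (1), a 75%/25% train–test split, and fivefold cross-validation) can be sketched with scikit-learn as below. The feature matrix here is synthetic, and logistic regression is only a dependency-light stand-in for the paper's deep models; treat this as an illustrative sketch, not the authors' released code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(200, 17))  # 17 candidate predictors (synthetic)
y = rng.integers(0, 2, size=200)                  # 1 = survived >= 5 years (synthetic)

# Eq. (1): z = (x - mu) / sigma, fitted per feature
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# 75% of the records for training, 25% for testing, as in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.25, random_state=42)

# Fivefold cross-validation of a placeholder classifier
scores = cross_val_score(LogisticRegression(max_iter=1000), X_std, y, cv=5)
```

In a real run, the scaler should be fit on the training split only and then applied to the test split, to avoid leaking test statistics into Eq. (1).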

4.2 Data Pre‑Processing

Pre-classification is done as per the suggestion of recent studies. The categorization of survival (target column) is shown in Fig. 3 below. Other situations, such as live patients with a survival time of fewer than 60 months who die of a reason other than that type of cancer, are excluded from this classification. Following this stage, all other extraneous data associated with any record is removed. The remaining data sets are corrected to include only 17 chosen features. After these steps, 163,413 records were filtered out of 590,646 records, out of which 81,019 belong to the 'alive' category and 82,394 belong to the 'not alive' category used for further analysis. Data is split randomly using the 'train test split' function.

4.3 Deep Learning Techniques

In this study, we have proposed to employ multiple deep learning approaches to predict the survival of breast cancer patients. The deep learning approaches used in the studies are described below.

4.3.1 Artificial Neural Network (ANN) [40]

ANN is a neural/deep learning method derived from the Biological Neural Networks concept in the human brain. The

Step 1: If ((Survival_Months >= 60) && (Vital_Status_Recode == alive)):
            Label Survival_Status = 'alive'
Step 2: Else if ((Survival_Months < 60) && (Cause_of_Death == cancer)):
            Label Survival_Status = 'not survived'
Step 3: Else:
            Exclude the record from further analysis

Fig. 3  Categorization of survival (target column)
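The pre-classification rule of Fig. 3 can be expressed as a small pandas filter. The column names (Survival_Months, Vital_Status_Recode, Cause_of_Death) follow the pseudocode above; the actual SEER export uses its own field codes, so this is a sketch of the labelling logic, not the authors' released preprocessing script.

```python
import pandas as pd

def label_survival(df):
    """Apply the pre-classification rule of Fig. 3 to a SEER-style DataFrame.
    Column names follow the pseudocode and may differ from the real export."""
    alive = (df["Survival_Months"] >= 60) & (df["Vital_Status_Recode"] == "alive")
    not_survived = (df["Survival_Months"] < 60) & (df["Cause_of_Death"] == "cancer")
    df = df.copy()
    df["Survival_Status"] = None
    df.loc[alive, "Survival_Status"] = "alive"
    df.loc[not_survived, "Survival_Status"] = "not survived"
    # Step 3: every other record is excluded from further analysis
    return df.dropna(subset=["Survival_Status"])
```

Records matching neither branch (e.g. a patient alive at under 60 months of follow-up) are simply dropped, mirroring Step 3.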


creation of ANN was the outcome of an attempt to mimic the human brain's functions. The workings of ANN are pretty analogous to those of biological neural networks. However, they are not identical. The structure of an artificial neural network is shown in Fig. 4.

The ANN algorithm accepts only numeric and structured data. A perceptron is a neural network with only one layer, and Artificial Neural Networks are multi-layer perceptrons. A neural network can have distinct layers. One or more neurons or units can be found in each layer. Each neuron is connected to every other neuron in the system. Different activation functions could be assigned to each layer. Forward propagation and back-propagation are the two phases of ANN. Multiplication of weights and addition of bias, followed by applying an activation function to the inputs and propagating the result forward, is the main task of forward propagation. Each neuron k in layer (l) performs the calculation given in Eq. (2):

y_k^[l] = q_k^[l] · t^[l−1] + b_k^[l],   t_k^[l] = g^[l](y_k^[l])   (2)

Here b_k represents the bias term and t represents the activation of the corresponding layer. For better understanding, Eqs. (3)–(8) represent the working of the six neurons of layer 2:

y_k^[2] = q_k^[2] · t^[1] + b_k^[2],   t_k^[2] = g^[2](y_k^[2]),   k = 1, …, 6   (3)–(8)

ANN can be used for both regression and classification applications by adjusting the activation functions of the output layers. For binary classification, use the sigmoid activation function; for multi-class classification, use the Softmax activation function; and for regression, use the linear activation function. Since the target attribute, i.e., survival status, was a binary class problem, we used the sigmoid activation function. The sigmoid activation function is described in Eq. (9):

g(y) = 1 / (1 + e^−y)   (9)

The most crucial stage is backpropagation, which entails determining optimal parameters for the model by propagating the neural network layers backwards. Backpropagation requires an optimization function to discover the model's ideal weights.

Fig. 4  Artificial neural networks

4.3.2 Convolutional Neural Network (CNN) [41]

A CNN differs from a traditional neural network as it functions over a volume of inputs. Each layer looks for a pattern in the data. Figure 5 presents the structure of a Convolutional Neural Network.

Each Conv layer is composed of numerous planes, allowing for the construction of several feature maps at each point. The CNN model passes the kernel over the input; consequently, the output matrices are calculated using Eq. (10):

Z[x, y] = (t ∗ w)[x, y] = Σ_i Σ_j w[i, j] · t[x − i, y − j]   (10)

where t denotes the input, w denotes the kernel, and [x, y] is the [row, column] index of the resultant matrix.

We have employed the CONV1D model for the survival prediction of cancer patients. The activation function used for constructing the model is sigmoid, while the 'Root Mean Square Propagation' optimizer is used to optimize the model. Also, we have used 20 epochs and a batch size of 100 units to construct the model. A CNN receives text as a sequence. The embedding layer receives the embedding matrix as an argument. Each remark is subjected to five various filter sizes, as well as GlobalMaxPooling1D layers. Following that, all of the outputs are combined. A Dropout layer is placed first, followed by a dense layer, followed by another Dropout layer, and finally a final Dense layer. We utilize a pooling layer between the convolutional layers to

13
S. Gupta, M. K. Gupta

Fig. 5  Convolutional neural networks

( )
reduce dimensional complexity while still retaining the con- ( ) ∑
1
volutions' significant information. P vi = 1|Y = wij YJ + pi (11)
1 + e−x j
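The sum in Eq. (10) is a full 2-D convolution (the kernel is flipped relative to cross-correlation). A direct, loop-based NumPy sketch with illustrative values:

```python
import numpy as np

def conv2d_full(t, w):
    """Eq. (10): Z[x, y] = sum_i sum_j w[i, j] * t[x - i, y - j]."""
    th, tw = t.shape
    wh, ww = w.shape
    Z = np.zeros((th + wh - 1, tw + ww - 1))
    for x in range(Z.shape[0]):
        for y in range(Z.shape[1]):
            for i in range(wh):
                for j in range(ww):
                    # only sum terms whose input index lies inside t
                    if 0 <= x - i < th and 0 <= y - j < tw:
                        Z[x, y] += w[i, j] * t[x - i, y - j]
    return Z

t = np.array([[1., 2.], [3., 4.]])   # illustrative input
w = np.array([[1., 0.], [0., 1.]])   # illustrative kernel
Z = conv2d_full(t, w)
# Z == [[1, 2, 0], [3, 5, 2], [0, 3, 4]]
```

The same result is produced by `scipy.signal.convolve2d(t, w, mode='full')`; deep learning libraries typically compute the closely related cross-correlation instead, which differs only in the kernel orientation.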

4.3.3 Restricted Boltzmann Machine [36]

The Boltzmann Machine is an unsupervised modelling approach. This model generates a probabilistic model from a dataset and predicts new data instances. The model is designed using an input layer (visible layer) and single or multiple hidden layers. The Boltzmann Machine uses neural networks in which neurons are connected both to neurons at different levels and to those in the same layer. Connections are bidirectional, with visible neurons and hidden neurons both connected. The structure of a Restricted Boltzmann Machine is given in Fig. 6.

Fig. 6  Restricted Boltzmann machine

The RBM is an undirected, probabilistic, and generative approach. The RBM also forms a symmetrical bipartite graph, since it contains two layers, an input layer and a hidden layer, and every visible node is linked to every hidden node. The technique aims to identify the joint probability distribution that maximizes the logarithmic-likelihood function. The logistic sigmoid activation function of the input determines the conditional probability distribution of each unit, as given in Eqs. (11) and (12).

P(v_i = 1 | Y) = 1 / (1 + e^−x),   with x = Σ_j w_ij Y_j + p_i   (11)

P(Y_j = 1 | v) = 1 / (1 + e^−x),   with x = Σ_i w_ij v_i + q_j   (12)

In Eqs. (11) and (12), p and q represent the intercept vectors for the visible and hidden layers, respectively. There is no intra-layer connectivity between the nodes of the input layer or of the hidden layer; the RBM is known as a Restricted Boltzmann Machine because it restricts these intra-layer connections. As RBMs are undirected, they do not use gradient descent or backpropagation to alter their weights; contrastive divergence is the method through which they modify their weights. The distribution of weights for the nodes in the input layer is arbitrarily generated and utilized in the hidden layer nodes at the initial stage.

Further, the nodes in the hidden layer recreate the visible nodes using the same weights. The recreated nodes are not identical to the originals because the units are disconnected from one another, so an RBM forms a symmetric bipartite graph with no connections between units in the same group.
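Sampling with Eqs. (11) and (12) alternates between the two layers; a minimal NumPy sketch of one such Gibbs step, which is the core operation inside contrastive divergence, follows. The weights here are randomly initialized illustrative values, not trained parameters from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# illustrative RBM: 4 visible units, 3 hidden units
W = rng.normal(scale=0.1, size=(4, 3))   # weights w_ij between the layers
p = np.zeros(4)                          # visible intercepts, as in Eq. (11)
q = np.zeros(3)                          # hidden intercepts, as in Eq. (12)

def gibbs_step(v):
    # Eq. (12): P(hidden_j = 1 | v) via the logistic sigmoid
    h_prob = sigmoid(v @ W + q)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Eq. (11): reconstruct the visible units from the sampled hidden state
    v_prob = sigmoid(h @ W.T + p)
    v_recon = (rng.random(v_prob.shape) < v_prob).astype(float)
    return h_prob, v_recon

h_prob, v_recon = gibbs_step(np.array([1.0, 0.0, 1.0, 0.0]))
```

Contrastive divergence compares statistics of the data with statistics of such reconstructions to update W, p, and q, without any backpropagated gradient.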
Multiple RBMs can be stacked and fine-tuned using gradient descent and backpropagation algorithms. The RBM is a stochastic neural network with two additional bias units (a hidden bias and a visible bias). The hidden bias helps the RBM generate activations on the forward pass, whereas the visible bias aids the RBM in reconstructing the input on the backward pass. The regenerated input differs from the actual input, since there are no connections between the units in the input/visible layer and hence no method of transmitting information between them.

4.3.4 Deep Autoencoders [35]

An autoencoder is made up of an encoder and a decoder. The former compresses the input data, while the latter reverses the process to regenerate the uncompressed data, allowing for the most accurate possible reconstruction of the input. The autoencoder architecture aims to construct an output-layer representation that is as close (similar) to the input as feasible; in other words, autoencoders determine a compressed version of the input data with the slightest degree of data loss. The structure of a deep autoencoder is given in Fig. 7, which depicts the input, hidden, and output layers of a deep autoencoder.

Fig. 7  Deep autoencoders

The encoder component of the design compresses the input data, ensuring that vital information is preserved while the total size of the data is significantly decreased; Dimensionality Reduction is the term for this notion. The loss is computed using the 'binary_crossentropy' function. The cross-entropy (CE) is computed using Eq. (13).

CE = −(1/j) Σ_{i=1}^{j} [M_i log(K_i) + (1 − M_i) log(1 − K_i)]   (13)

The term K_i denotes the predicted probability for the i-th instance, and M_i represents the ground-truth value of the i-th of the j instances. The activation function used in the deep autoencoders, i.e., 'ReLU', is f(y) = max(0, y), where y is the input. Also, we have used the 'Root Mean Square Propagation' optimizer to optimize the algorithm. The RMSE score is calculated using Eq. (14).

RMSE = √( Σ_{i=1}^{n} (α_i − β_i)² / n )   (14)

Here, α_1, α_2, …, α_n are the predicted values, β_1, β_2, …, β_n are the observed values, and n denotes the number of observations.

The disadvantage of this approach is that the organization of the data in its compressed form cannot be determined. To be precise, the encoder does not simply eliminate some of the parameters; instead, it uses them to generate the compressed version with fewer parameters.
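Under these definitions, Eqs. (13) and (14) can be sketched directly in NumPy (note the leading minus sign that the binary cross-entropy of Eq. (13) requires); the truth values and predictions below are illustrative.

```python
import numpy as np

def binary_cross_entropy(M, K):
    # Eq. (13): CE = -(1/j) * sum(M*log(K) + (1 - M)*log(1 - K))
    M, K = np.asarray(M, float), np.asarray(K, float)
    return -np.mean(M * np.log(K) + (1 - M) * np.log(1 - K))

def rmse(alpha, beta):
    # Eq. (14): sqrt of the mean squared difference over n observations
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    return np.sqrt(np.mean((alpha - beta) ** 2))

# illustrative truth values M and predicted probabilities K
M = [1, 0, 1, 1]
K = [0.9, 0.1, 0.8, 0.7]
ce = binary_cross_entropy(M, K)
err = rmse(K, M)
```

Both quantities shrink toward zero as the predictions K approach the observed values M, which is what the optimizer drives during training.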
4.4 Validation [39]

A five-fold cross-validation technique has been utilized so that the classification results could be compared objectively and the generation of random outcomes avoided. Cross-validation is an approach for evaluating the prediction performance of automated learning techniques; it also demonstrates how well the machine learning model we have constructed generalizes to an independent data set.

As illustrated in Fig. 8, 5-fold cross-validation divides the data into five sections. Each time, one of these five subsets is used for validation, while the remaining four are utilized for learning. This procedure is repeated until every subset has been used precisely once for validation and four times for learning. Finally, the average of the five validation results is used as the final estimate [39, 42, 43].

Fig. 8  5-Fold cross-validation

5 Result Analysis

All investigations were performed on a computer with a Windows 10 operating system and 8 GB of RAM, using Python 3.6.9. Table 3 contains the detailed specifications.

Table 3  Hardware and software specifications

Name                      Parameter
Operating system          Windows 10, 64 bit
Processor                 Intel(R) Core i5-3230M @ 2.60 GHz
RAM                       8 GB
Programming language      Python 3.6.9
Development environment   Anaconda, Spyder Notebook

5.1 Evaluation Parameters

The performance of the algorithms was measured on various parameters: accuracy, AUC score, Matthews correlation coefficient (MCC), sensitivity score, specificity score, and F1 score [44–46]. The description of the evaluation parameters is given below.

The confusion matrix is used to assess the performance of these techniques. A confusion matrix provides information about a classifier's actual and predicted classifications, and a classifier's performance is often evaluated using the confusion matrix of its outputs. Table 4 shows the confusion matrix.

Table 4  Confusion matrix

     P    N
P    TP   FP
N    FN   TN

In Table 4, P stands for positive, N stands for negative, T stands for true, and F stands for false. Consequently, TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative. Analyzing and interpreting data is a necessary component of the evaluation, and numerous evaluation methods are available to arrange and provide visible, understandable outcomes that may be used and improved upon.

• Accuracy is the measurement parameter used to determine the effectiveness of a classifier. Accuracy is defined as the proportion of correctly categorized values in a set and is calculated using Eq. (15).

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (15)

• The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems: a probability curve that displays the true-positive rate against the false-positive rate at different threshold values. The AUC score summarizes the ROC curve and measures a model's ability to separate the classes.
• The F1 score is calculated as the harmonic mean of precision and recall; the F1 measure combines the characteristics of both metrics into a single value. The score is calculated from the precision and recall values using Eq. (16).

F1 Score = (2 ∗ Precision ∗ Recall) / (Precision + Recall)   (16)

• Matthews's correlation coefficient (MCC) measures the quality of binary (two-class) classifications. It ranges between −1 and +1, where +1 signifies a perfect prediction and −1 indicates total imperfection [47]. MCC is calculated using Eq. (17).

MCC = (TP ∗ TN − FP ∗ FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))   (17)

• Sensitivity/recall is calculated by dividing the true positives by the sum of the true positives and false negatives, as shown in Eq. (18). A high recall indicates that the classifier is accurate and produces few false negatives [44–46]. The sensitivity scores of the models are calculated using Eq. (18).

Sensitivity/Recall = TP / (TP + FN)   (18)

• Specificity quantifies the proportion of negative instances predicted correctly by a classifier. As seen in Eq. (19), it divides the number of true negatives by the sum of all true negatives and false positives. A high specificity indicates that a positive prediction is reliable and the likelihood of a false positive is low [44–46]. The specificity scores of the models are calculated using Eq. (19).

Specificity = TN / (TN + FP)   (19)

5.2 Comparison of Accuracy Scores of Prediction Models

The deep learning models are compared on the basis of the prediction accuracy reported in Table 5. From Table 5, it is clear that Bernoulli RBM performed the best, attaining the highest average accuracy (96.8%), followed by Deep Autoencoders with 96.2% accuracy. Convolutional neural networks achieved 92.4% accuracy, whereas artificial neural networks attained an 88.5% accuracy score.

Table 5  Accuracy of deep learning models

K-Folds   Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.87                         0.93                            0.97                0.97
K=2       0.88                         0.92                            0.97                0.96
K=3       0.89                         0.92                            0.95                0.96
K=4       0.90                         0.94                            0.96                0.97
K=5       0.885                        0.91                            0.96                0.98
Average   0.885                        0.924                           0.962               0.968

According to Table 5, based on the accuracy of the models, their order can be stated as Restricted Boltzmann Machine > Deep Autoencoders > Convolutional Neural Networks > Artificial Neural Networks. Hence, Bernoulli RBM performed the best while ANN performed the worst in terms of accuracy score.

5.3 Comparison of AUC Scores of Prediction Models

Table 6 shows the comparison of the deep learning techniques based on the AUC scores achieved. From Table 6, it can be inferred that Bernoulli RBM fared the best, with a 96.8 per cent average AUC score, followed by Deep Autoencoders with a score of 96.2 per cent. Artificial neural networks had an AUC score of 88.5 per cent, whereas convolutional neural networks had an AUC score of 92 per cent. Thus, Bernoulli RBM was the most accurate, whereas ANN was the least accurate.

Table 6  AUC of deep learning models

Models    Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.87                         0.92                            0.97                0.98
K=2       0.88                         0.93                            0.97                0.97
K=3       0.89                         0.91                            0.95                0.95
K=4       0.90                         0.94                            0.96                0.98
K=5       0.885                        0.90                            0.96                0.96
Average   0.885                        0.92                            0.962               0.968

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

5.4 Comparison of F1 Scores of Prediction Models

Table 7 illustrates the comparison between the multiple learning approaches; the highest F1 scores achieved in each of the folds are bold-faced. Table 7 confirms that Bernoulli RBM and Deep Autoencoders performed better than the other techniques, attaining average F1 scores of 96.2 and 96 per cent, respectively. Artificial neural networks achieved the lowest F1 score of 87.2 per cent, whereas convolutional neural networks had an F1 score of 92 per cent. In terms of F1 score, Bernoulli RBM outperformed the other three learning techniques.
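The metric definitions in Eqs. (15)–(19) can be collected into one small helper; the confusion-matrix counts below are illustrative, not results from the study.

```python
import math

def metrics(TP, FP, FN, TN):
    """Eqs. (15)-(19): accuracy, F1, MCC, sensitivity, specificity."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)            # Eq. (15)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)                               # sensitivity, Eq. (18)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (16)
    mcc = (TP * TN - FP * FN) / math.sqrt(
        (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))    # Eq. (17)
    specificity = TN / (TN + FP)                          # Eq. (19)
    return accuracy, f1, mcc, recall, specificity

# illustrative confusion counts
acc, f1, mcc, sens, spec = metrics(TP=90, FP=10, FN=5, TN=95)
```

Computing all five from the same counts makes the trade-offs visible: a classifier can score well on accuracy while MCC, which uses all four cells of the confusion matrix, remains more conservative.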
Table 7  F1 score of deep learning models

Models    Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.88                         0.92                            0.97                0.97
K=2       0.84                         0.93                            0.94                0.95
K=3       0.87                         0.93                            0.95                0.95
K=4       0.89                         0.92                            0.98                0.98
K=5       0.88                         0.90                            0.96                0.96
Average   0.872                        0.92                            0.96                0.962

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

5.5 Comparison of MCC Scores of Prediction Models

Table 8 depicts the comparison between the multiple learning approaches in terms of the MCC score obtained. Table 8 verifies that Bernoulli RBM attained the highest average MCC score of 93.8 per cent. Deep Autoencoders attained an MCC score of 92.4%, and convolutional neural networks had an MCC score of 89 per cent. Artificial neural networks achieved the lowest MCC score of 78 per cent. In comparison to the other three learning approaches, Bernoulli RBM demonstrated the best performance.

Table 8  MCC score of deep learning models

Models    Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.79                         0.88                            0.94                0.95
K=2       0.77                         0.89                            0.93                0.94
K=3       0.77                         0.91                            0.93                0.95
K=4       0.78                         0.88                            0.90                0.95
K=5       0.79                         0.89                            0.92                0.90
Average   0.78                         0.89                            0.924               0.938

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

5.6 Comparison of Sensitivity Scores of Prediction Models

The sensitivity/recall scores obtained in each fold by the deep learning techniques are summarized in Table 9. Table 9 demonstrates that the Artificial Neural Networks achieved the highest average sensitivity score, i.e., 96.8%, followed by the Restricted Boltzmann Machine (94.6%) and Deep Autoencoders (94%), while convolutional neural networks attained a sensitivity score of 88 per cent. In terms of sensitivity, Artificial Neural Networks outperformed the other three methods.

Table 9  Sensitivity score of deep learning models

Models    Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.97                         0.88                            0.93                0.93
K=2       0.98                         0.87                            0.94                0.95
K=3       0.96                         0.89                            0.96                0.97
K=4       0.95                         0.88                            0.95                0.96
K=5       0.98                         0.88                            0.92                0.92
Average   0.968                        0.88                            0.94                0.946

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

5.7 Comparison of Specificity Scores of Prediction Models

Table 10 shows a comparison of the deep learning approaches in terms of the attained specificity score. Table 10 shows that Bernoulli RBM attained the highest specificity score, i.e., 98%. Deep Autoencoders had a specificity score of 97.8 per cent, whereas Convolutional Neural Networks had a specificity score of 87 per cent, and artificial neural networks achieved the lowest specificity score, i.e., 80.2%. In terms of specificity, Bernoulli RBM outperformed the other three methods while artificial neural networks underperformed. Analyzing the specificity scores, we can rank Restricted Boltzmann Machine > Deep Autoencoders > Convolutional Neural Networks > Artificial Neural Networks.

Table 10  Specificity score of deep learning models

Models    Artificial neural networks   Convolutional neural networks   Deep autoencoders   Restricted Boltzmann machine
K=1       0.82                         0.88                            0.98                0.99
K=2       0.81                         0.84                            0.99                0.98
K=3       0.78                         0.88                            0.96                0.96
K=4       0.80                         0.87                            0.98                0.99
K=5       0.80                         0.88                            0.98                0.98
Average   0.802                        0.87                            0.978               0.98

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

5.8 Comparison with Existing Studies

The proposed models have been compared with previous studies based on the accuracy and AUC scores obtained by the learning algorithms of the respective studies.
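The "Average" rows in Tables 5 through 10 are plain means over the five folds; for example, the averages of Table 5 can be reproduced as follows.

```python
import numpy as np

# per-fold accuracies from Table 5
# (columns: ANN, CNN, deep autoencoders, RBM)
folds = np.array([
    [0.87,  0.93, 0.97, 0.97],
    [0.88,  0.92, 0.97, 0.96],
    [0.89,  0.92, 0.95, 0.96],
    [0.90,  0.94, 0.96, 0.97],
    [0.885, 0.91, 0.96, 0.98],
])
averages = folds.mean(axis=0).round(3)
# averages == [0.885, 0.924, 0.962, 0.968], the "Average" row of Table 5
```

Averaging over folds, rather than reporting a single split, is what gives the comparisons in these tables their robustness to a lucky or unlucky partition.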
Table 11 presents the comparative analysis of the proposed study with previous works.

Table 11  Comparison with previous works

Year             Learning technique                              Accuracy
2008 [18]        Logistic regression                             0.858
2008 [19]        Fuzzy decision trees                            0.85
2009 [20]        Hybrid Bayesian network                         0.88
2011 [21]        Decision trees                                  0.79
2013 [22]        Decision trees, linear regression               0.89
2013 [23]        Semi-supervised learning                        0.76
2013 [24]        Artificial neural networks                      0.78
2014 [26]        Decision trees                                  0.940
2015 [7]         Support vector classifier                       0.967
2017 [27]        Artificial neural networks                      0.89
2019 [28]        Genetic algorithm and gradient boosting         0.75
2019 [30]        Multi-layer perceptron                          0.84
2020 [31]        Artificial neural networks                      0.75
2020 [32]        Multi-layer perceptron stacked generalization   0.84
2020 [33]        Decision trees                                  0.93
Proposed study   Restricted Boltzmann Machine                    0.97
                 Deep autoencoders                               0.962

Bold indicates, for each deep learning algorithm, the highest scores achieved by the specific model.

All the studies compared in Table 11 have explored the same breast cancer dataset from the SEER repository. Also, most of the studies compared in Table 11 employed decision trees; the highest accuracy achieved by decision trees is 94% [26]. Many of the studies employed artificial neural networks or multi-layer perceptrons and obtained at most 89% prediction accuracy. However, the deep learning model (Restricted Boltzmann Machine) proposed in the current study has attained the highest prediction outcome (0.97) and has outperformed the existing literature on breast cancer survival prediction.

6 Discussion

The number of cancer cases reported each year globally is on the rise, and employing neural networks for cancer diagnosis can help reduce this burden while also improving healthcare system efficiency. Advancements have been achieved in understanding the survival rate of breast cancer in the last decade; consequently, the chances of survival have increased. Because AI can perform several functions, including risk assessment, diagnosis and detection, prognosis, and therapy response, it is employed in cancer diagnosis, and because of its vast variety of applications, AI in cancer detection is acceptable and valuable. Our analysis of the literature shows that the artificial neural network is the most frequently used automated learning technique to predict breast cancer; hence, we have elected multiple neural techniques for breast cancer survival prediction.

In this work, we executed deep learning techniques using data from the SEER database of breast cancer incidence after 2004. A dataset of 163,413 records with 18 features was created after preprocessing 590,646 records with 109 features. The neural learning techniques employed in this study are artificial neural networks, Restricted Boltzmann Machine, Deep Autoencoders, and Convolutional Neural Networks. The performance of the neural learning architectures is validated using a fivefold cross-validation technique. This study uses multiple evaluation parameters, namely accuracy, AUC, MCC, F1 score, sensitivity, and specificity, to evaluate the models' performances, whereas most previous studies used different parameters to assess their respective proposed models. Overall, all the neural learning approaches performed well and predicted the survival of breast cancer patients with good accuracy. However, the simulation results also affirmed that the Restricted Boltzmann Machine, Deep Autoencoders, and Convolutional Neural Networks performed better than artificial neural networks (which were proposed in most of the recent studies). Another highlight is that the Restricted Boltzmann Machine outperformed the other learning architectures.

Neural learning-based techniques have a bright future in relieving a single doctor of the strain of examining thousands of patients. Various functions, procedures, and training can all help to increase the efficiency of this process. Neural architectures are self-learning systems that can reduce errors over time while also reducing the time required for diagnosis and categorization.
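The preprocessing described above, reducing 590,646 records with 109 features to 163,413 records with 18 features, amounts to row filtering plus column selection. A hypothetical sketch of that kind of step follows; the field names and values are assumed for illustration, not the study's actual SEER schema.

```python
# hypothetical records; field names are illustrative stand-ins,
# not the authors' actual SEER attributes
records = [
    {"year": 2001, "age": 63, "stage": "II",  "survival_months": 48},
    {"year": 2006, "age": 55, "stage": "I",   "survival_months": 72},
    {"year": 2010, "age": 47, "stage": "III", "survival_months": 30},
]

# stand-ins for the 18 retained features
KEPT_FEATURES = ["age", "stage", "survival_months"]

filtered = [
    {k: r[k] for k in KEPT_FEATURES}
    for r in records
    if r["year"] >= 2004  # keep incidence after 2004, as stated above
]
# filtered keeps 2 of the 3 records, each reduced to 3 features
```

In practice such filtering is done with pandas over the exported SEER case listing, but the logic, dropping out-of-range records and unused attributes before model construction, is the same.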
Physicians can use survivability prediction to help them determine the optimal treatment option for a given patient. It can also help patients avoid unnecessary treatments and operations, along with their costs and adverse effects. Machine learning algorithms for estimating cancer patient survivability produce more accurate outcomes than human prediction. Consequently, clinicians can use these predictors, in addition to their professional and clinical investigations, to determine the best course of treatment for their patients. Clinicians can also utilize the predictors to forecast tumor surgery and its impact on a patient's survivability before the procedure.

7 Conclusion

The primary purpose of this research was to investigate the performance of deep learning algorithms that could predict breast cancer survival. This study presents an elaborate experiment using multiple neural-learning techniques on the SEER breast cancer dataset. This research analyses the prediction performance of distinct deep learning approaches in predicting the post-operative life expectancy of breast cancer patients. The models have been evaluated on real-world breast cancer patient data and validated using a fivefold cross-validation technique. We have used multiple performance evaluation parameters to ascertain the significance of the performance of the proposed approaches.

The Restricted Boltzmann Machine performed the best, and Deep Autoencoders displayed the second-best performance as per the accuracy, AUC, MCC, F1, and specificity scores. In terms of sensitivity scores, Artificial Neural Networks, followed by Deep Autoencoders, performed the best. Incorporating neural approaches in breast cancer diagnosis and classification is critical to reducing the workload on doctors who must diagnose multiple patients daily with improved efficiency. The success of deep learning methods in predicting breast cancer survival suggests that similar models could be used in other prediction applications in the future. Furthermore, using these predictors allows doctors to educate their patients about their chances of survival, so patients can make more informed decisions about their lifestyle and treatment options. Hence, these techniques can be a cutting-edge tool in breast cancer diagnosis and survival prediction that will benefit patients and doctors.

Acknowledgements We express our gratitude to Dr. Rajeev Saini, MD, DNB (Medical Oncology), Narayana Multispecialty Clinic, Jammu & Kashmir (India), for providing us with consistent guidance on breast cancer. Dr. Saini also assisted in confirming that the prediction indicators used in the study are highly important from a medical point of view.

Data Availability The datasets and Python code for data preprocessing and constructing the deep learning models are publicly available online on GitHub (on request): https://github.com/surbhigupta24/Breast-Cancer-Survival-Prediction

Declarations

Conflict of interests There is no conflict to be declared.

References

1. Sung H et al (2020) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71(3):209–249
2. Islami F et al (2018) Proportion and number of cancer cases and deaths attributable to potentially modifiable risk factors in the United States. CA Cancer J Clin 68(1):31–54
3. Kumar Y, Gupta S, Singla R, Hu YC (2021) A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch Comput Methods Eng 1–28
4. Gupta S, Gupta MK (2021) Computational model for prediction of malignant mesothelioma diagnosis. Comput J
5. Chang CH, Sibala JL, Fritz SL, Dwyer SJ 3rd, Templeton AW, Lin F, Jewell WR (1980) Computed tomography in detection and diagnosis of breast cancer. Cancer 46(4 Suppl):939–946
6. Tapak L, Shirmohammadi-Khorram N, Amini P, Alafchi B (2019) Prediction of survival and metastasis in breast cancer patients using machine learning classifiers. Clin Epidemiol Glob Health 7(3):293–299
7. Afshar HL, Ahmadi M, Roudbari M, Sadoughi F (2015) Prediction of breast cancer survival through knowledge discovery in databases. Glob J Health Sci 7(4):392–398
8. Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23(1):89–109
9. Zhu W, Xie L, Han J, Guo X (2020) The application of deep learning in cancer prognosis prediction. Cancers 12(3):603
10. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17
11. Kim W et al (2012) Development of novel breast cancer recurrence prediction model using support vector machine. J Breast Cancer 15(2):230–238
12. Gupta S, Gupta MK (2021) Computational prediction of cervical cancer diagnosis using ensemble-based classification algorithm. Comput J
13. Gupta S, Gupta MK (2021) A comprehensive data-level investigation of cancer diagnosis on imbalanced data. Comput Intell
14. Chen Y, Ke W, Chiu H (2014) Risk classification of cancer survival using ANN with gene expression data from multiple laboratories. Comput Biol Med 48:1–7
15. Zhu W, Fang K, He J, Cui R, Zhang Y, Le H (2019) A prediction rule for overall survival in non-small-cell lung cancer patients with a pathological tumor size less than 30 mm. Dis Markers 2019:1–9
16. Duggan MA, Anderson WF, Altekruse S, Penberthy L, Sherman ME (2016) The surveillance, epidemiology and end results (SEER) program and pathology: towards strengthening the critical relationship. Am J Surg Pathol 40(12):e94
17. Shimizu H, Nakayama KI (2020) Artificial intelligence in oncology. Cancer Sci 111(5):1–9
18. Endo A, Shibata T, Tanaka H (2008) Comparison of seven algorithms to predict breast cancer survival. Biomed Soft Comput Human Sci 13(2):11–16
19. Khan MU, Choi JP, Shin H, Kim M (2008) Predicting breast cancer survivability using fuzzy decision trees for personalized healthcare. In: 2008 30th annual international conference of the IEEE engineering in medicine and biology society. IEEE, pp 5148–5151
20. Choi JP, Han TH, Park RW (2009) A hybrid Bayesian network model for predicting breast cancer prognosis. J Korean Soc Med Inform 15(1):49–57
21. Fan C, Chang P, Lin J, Hsieh JC (2011) A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Appl Soft Comput 11:632–644
22. Wang KJ, Makond B, Wang KM (2013) An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data. BMC Med Inform Decis Mak 13(1):1–14
23. Kim J, Shin H (2013) Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data. J Am Med Inform Assoc 20(4):613–618
24. Park K, Ali A, Kim D, An Y, Kim M, Shin H (2013) Robust predictive model for evaluating breast cancer survivability. Eng Appl Artif Intell 26(9):2194–2205
25. Shin H, Nam Y (2014) A coupling approach of a predictor and a descriptor for breast cancer prognosis. BMC Med Genomics 7(Suppl 1):1–12
26. Wang K, Makond B, Chen K, Wang K (2014) A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients. Appl Soft Comput J 20:15–24
27. Shawky DM, Seddik AF (2017) On the temporal effects of features on the prediction of breast cancer survivability. Curr Bioinform 12(4):378–384
28. Li Y, Ge D, Gu J, Xu F, Zhu Q, Lu C (2019) A large cohort study identifying a novel prognosis prediction model for lung adenocarcinoma through machine learning strategies. BMC Cancer 19(1):1–14
29. Abdikenov B, Iklassov Z, Sharipov A, Hussain S, Jamwal PK (2019) Analytics of heterogeneous breast cancer data using neuroevolution. IEEE Access 7:18050–18060
30. Fotouhi S, Asadi S, Kattan MW (2019) A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform 90:103089
31. Simsek S, Kursuncu U, Kibis E, Anisabdellatif M (2020) A hybrid data mining approach for identifying the temporal effects of variables associated with breast cancer survival. Expert Syst Appl 139:112863
32. British T, Society C (2020) A novel data mining on breast cancer survivability using MLP ensemble learners. Comput J 63(3):435–447
33. Delen D, Walker G, Kadam A (2005) Predicting breast cancer survivability: a comparison of three data mining methods. Artif Intell Med 34(2):113–127
34. Id JL et al (2021) Predicting breast cancer 5-year survival using machine learning: a systematic review. PLoS ONE 16:1–23
35. Fine TL, Hassoun MH (1996) Fundamentals of artificial neural networks. IEEE Trans Inf Theory 42(4):1322–1324
36. Larochelle H (2012) Learning algorithms for the classification restricted Boltzmann machine. J Mach Learn Res 13:643–669
37. Chicco D, Sadowski P, Baldi P (2014) Deep autoencoder neural networks for gene ontology annotation predictions. In: ACM BCB 2014 - 5th ACM conference on bioinformatics, computational biology and health informatics, pp 533–540
38. Kim P (2012) Convolutional neural network. MATLAB deep learning. Apress, Berkeley, pp 121–147
39. Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
40. Aziz R, Verma CK, Jha M, Srivastava N (2017) Artificial neural network classification of microarray data using new hybrid gene selection method. Int J Data Min Bioinform 17(1):42–65
41. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
42. Polat K, Güneş S (2007) Breast cancer diagnosis using least square support vector machine. Digit Signal Process 17(4):694–701
43. Er O, Tanrikulu AC, Abakay A, Temurtas F (2012) An approach based on probabilistic neural network for diagnosis of Mesothelioma's disease. Comput Electr Eng 38(1):75–81
44. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159
45. Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1):1–13
46. Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Min 10(1):1–17
47. Halimu C, Kasem A, Shah Newaz SH (2019) Empirical comparison of area under ROC curve (AUC) and Mathew correlation coefficient (MCC) for evaluating machine learning algorithms on imbalanced datasets for binary classification. In: Proceedings of the 3rd international conference on machine learning and soft computing, pp 1–6

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
