You are on page 1of 7

Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Breast Cancer Survival Prediction using Machine


Learning
P.Lakshmi Sai Saran 1 P.Hemanth Kumar2 Md.Sohail3
UG Scholar, Dept. Of CSM, NRI UG Scholar, Dept. Of CSE, NRI UG Scholar, Dept. Of CSM, NRI
Institute of Technology, A.P-521212 Institute of Technology, A.P-521212 Institute of Technology, A.P-521212

Abstract:- Breast cancer is still a major worldwide I. INTRODUCTION


health issue, highlighting the demand for accurate
prognostic tools to support individualised treatment Thousands of women globally are affected by breast
choices. In this article, we describe a unique method for cancer, which continues to be a serious public health issue.
reliably predicting breast cancer survival outcomes that Even with improvements in detection and therapy,
synergistically combines multimodal biomarkers with determining an accurate prognosis for breast cancer survival
state-of-the-art machine learning approaches. is still a difficult task. Traditional prognostic models
frequently focus on a small number of clinical variables and
This study makes use of a large dataset that fall short of capturing the complex interactions between
includes clinical, histological, genetic, and imaging data many factors influencing disease development and treatment
collected from a heterogeneous group of breast cancer response. The development of machine learning techniques
patients. We use sophisticated feature engineering in recent years has opened up a promisingpath for improving
techniques to extract relevant data form each modality prognosis accuracy and enabling individualised treatment
and assure robust depiction of the fundamental plans.
biological processes by utilising this wealth of data.
A branch of artificial intelligence called machine
We investigate a broad spectrum of cutting-edge learning uses computing algorithms to analyse huge, diverse
machine learning algorithms, such as ensemble datasets and spot relationships and trends that would not be
approaches, deep learning architectures, and explainable obvious using conventional statistical techniques.
AI models, in order to improve model performance and
improve interpretability. We determine the best Researchers have the opportunity to uncover hidden
algorithmic framework that maximises predicted patterns in challenging breast cancer datasets and create
accuracy while offering valuable insights into the reliable models for predicting survival by utilising machine
underlying causes of survival differences through learning methods.
rigorous cross-validation and model selection
approaches. This study paper's goal is to investigate and create a
cutting-edge machine learning framework for predicting
Furthermore, in order to pinpoint the most useful breast cancer survival that makes use of a variety of data
biomarkers influencing prognosis, we examine the sources. To reflect the complex nature of breast cancer
effects of various feature selection strategies and progression, we integrate clinical, histological, genetic, and
dimensionality reduction techniques. As a result, it is imaging aspects. Our objective is to develop a prognostic
possible to identify prospective therapeutic targets and model that transcends the constraints of current techniques,
create individualised treatment plans.On a sizable and offers higher accuracy, and offers useful insights into the
diverse breast cancer dataset, numerous experiments are underlying biological mechanisms causing survival
carried out to verify the efficacy of our suggested inequalities by utilising this extensive dataset. likewise, our
architecture. The results show much higher precision, research aims to address the interpretability issue that is
specificity, and sensitivity than those of existing frequently connected to machine learning models. Although
prognostic models, demonstrating superior predictive these models have shown impressive predictive power,
ability. Additionally, extensive internal and external comprehension of the reasoning behind their predictions is
verification processes have proven that our model still essential for clinical use. Therefore,
achieves great stability and generalizability. our strategy focuses on both obtaining high accuracy and
creating understandable AI models that can clarify how
Keywords:- Breast Cancer, Machine Learning, AI Models, different biomarkers and attributes contribute to survival
Prognostic Tools, Extensive Internal and External outcomes. Building trust among healthcare professionals and
Verification Processes. easing the incorporation of machine learning into clinical
decision-making procedures depend on this interpretability
element.

IJISRT23JUL1355 www.ijisrt.com 1744


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
We will use a strict methodology that includes data  Breast Cancer
pretreatment, feature engineering, algorithm selection, and
model evaluation to achieve our research goals. To ensure
the robustness and generalizability of our findings, we will
use a broad and representative dataset that includes a large
number of breast cancer patients. Additionally, in- depth
validation procedures will be carried out to evaluate the
functionality and stability of our suggested model.

By providing more precise prognostication, facilitating


personalised treatment approaches, and ultimately
improving patient outcomes, the findings of this study have
the potential to have a considerable impact on breast cancer
care. Our strategy seeks to push the limits of breast cancer
survival prediction by utilising machine learning techniques
and combining various data sources, opening up new doors
for precision medicine and improving clinical decision- Fig 3 Breast Cancer
making in the battle against breast cancer.
II. TECHNOLOGIES USED
 Cancer :
 Python
Python, a flexible and high-level programming
language, has been widely embraced by a wide range of
industries. Python is renowned for its ease of use,
readability, and broad library support. It provides developers
of all skill levels with a seamless writing experience. Python
enables researchers, data professionals, and developers to
effectively tackle complicated issues and create creative
solutions because to its clean syntax and clear design
principles. Python's enormous ecosystem of libraries,
including NumPy, Pandas, and TensorFlow, offers a
comprehensive toolkit that speeds up development and
encourages innovation, covering everything from data
analysis and processing to machine learning and artificial
intelligence. Additionally, Python's free and open-source
nature and active community support its ongoing progress by
guaranteeing that new features and improvements are easily
accessible.

Fig 1 Normal Cell Development  Machine Learning:


The dynamic subject of machine learning, which
combines statistics and computer science, has completely
changed how we approach challenging issues. It includes a
wide variety of algorithms and methods that enable
machines to gain knowledge from data and form wise
judgements or predictions without being explicitly
programmed. Researchers and practitioners can find hidden
patterns, glean important insights, and create complex
models that propel breakthroughs across a range of fields by
utilising machine learning.

Machine learning has evolved into an essential


technology that offers creative solutions and transforms
industries, ranging from healthcare and finance to
transportation and cybersecurity.

It is a useful tool in the pursuit of discovering


Fig 2 Cancer knowledge and problem-solving because to its capacity to
manage large-scale and high-dimensional information, adapt
to changing contexts, and continually improve performance.
The potential of machine learning extends beyond predicted
accuracy, as it prepares the way for artificial intelligence

IJISRT23JUL1355 www.ijisrt.com 1745


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
(AI) where models can explain their reasoning in a clear and opening up previously unimaginable opportunities across a
comprehensible manner, inspiring confidence and wide range of fields thanks to its vast capabilities and
facilitating informed decision-making. Machine learning is boundless potential.
positioned to change the future by revealing intricacies and

Fig 4 Machine Learning

III. SOFTWARE REQUIREMENTS  User Interface and Visualisation:


SPECIFICATION
 To interact with the system, researchers and doctors
 Machine Learning Algorithms: should have a user-friendly interface provided by the
programme. This will make it easier to explore data,
 A wide variety of machine learning methods, including choose models, and view results.
decision trees, random forests, support vector machines  To support comprehension and decision-making, it
(SVM), logistic regression, gradient boosting, and deep should provide intuitive visualisations of prediction
learning architectures, should be included in the outcomes, feature relevance, and model performance.
programme.
 It should enable users to test out various algorithms, IV. EXISTING SYSTEM
fine-tune hyperparameters, and assess their effectiveness
using the proper metrics (such as accuracy, sensitivity, Multiple crucial elements make up the current cutting-
specificity, and area under the receiver operating edge method for machine learning-based breast cancer
characteristic curve, for example). survival prediction.

 Model Training and Evaluation: First, patient records are gathered from a variety of
sources, including hospitals, research facilities, and open
 The software should make it easy to train models on the databases, to create a complete dataset that includes clinical,
pre-processed dataset using the chosen machine learning histopathological, genomic, and imaging data. To deal with
algorithms. missing values, outliers, and inconsistent data, the dataset is
 It should allow cross-validation methods for evaluating pre-processed.
the models' generalizability and stability.

IJISRT23JUL1355 www.ijisrt.com 1746


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
By utilising statistical approaches and domain  Problems with Data Heterogeneity and Integration:
knowledge, feature engineering techniques are used to Compatibility, inconsistency, and heterogeneity are
extract pertinent features from the dataset. Tumour size, problems when integrating different datasets from different
lymph node involvement, hormone receptor status, gene sources. The effectiveness and generalizability of the models
expression patterns, and radiological imaging features are might be impacted by various data quality across sources,
only a few of the many aspects covered by these features. missing variables, and different data formats.
Additionally, clinical factors like age, menopausal status,
and prior treatments are taken into account.  Handling Missing Data:
Missing data, including those related to breast cancer,
Predictive models are then trained using machine are a significant problem in medical databases. The current
learning methods. Decision trees, random forests, support system might not be able to handle missing values correctly,
vector machines (SVM), logistic regression, gradient which could lead to inaccurate findings and the possible loss
boosting, and deep learning architectures are common of important data. Effective handling of missing data is
techniques used in breast cancer survival prediction. These essential for ensuring the dependability and precision of the
models' hyperparameters are tweaked to maximise predictive models.
performance once they are trained on the pre-processed
dataset.  Model Generalisation:
The current system may have trouble applying the
Area under the receiver operating characteristic curve prediction models to other populations, racial groups, or
(AUC-ROC), accuracy, sensitivity, and specificity are some healthcare environments. When used to different patient
suitable performance indicators used in model evaluation. To cohorts or geographical areas, the models developed on one
evaluate the generalizability and robustness of the models, set of data may not function as well. To evaluate the
cross-validation methods are used. generalizability of the models, adequate validation and
outside testing are required.
Post-hoc techniques are used to examine the internal
workings of the models, highlight the most important V. PROPOSED SYSTEM
features contributing to survival predictions, and improve
interpretability and explainability. To advance the field of breast cancer survival
prediction, we propose a novel system that integrates cutting-
The current system often needs computational edge techniques to enhance accuracy, interpretability, and
resources to efficiently handle huge datasets. The models personalized treatment strategies. The proposed system
are frequently trained and tested using high- performance encompasses several key components aimed at addressing
computer clusters and cloud computing platforms. For the limitations of the existing approaches.
smooth data access and real-time prediction, integration
with current healthcare systems and databases is also  Multi-modal Data Integration:
crucial. The suggested system will make use of a wide variety of
data sources, such as clinical records, histopathological
Although the current system has shown encouraging findings, genetic profiles, and imaging data. By combining
outcomes, there are still issues to be resolved. Improved different modalities, we hope to obtain a more complete
feature selection methods are required to find the most picture of the underlying biological processes and pinpoint
insightful biomarkers, as well as to correct class imbalances important prognostic elements that would have been missed
in the dataset and make complex machine learning models in single- modal methods.
easier to understand.
 Advanced Feature Engineering and Selection:
In conclusion, the current approach provides a basis for To extract useful features from the combined dataset,
machine learning-based breast cancer survival prediction, the proposed system will make use of advanced feature
emphasising the significance of thorough data integration, engineering approaches. We will investigate cutting- edge
feature engineering, algorithm selection, model validation, techniques, such as deep learning-based feature extraction
and interpretability techniques. and transfer learning, in addition to conventional statistical
approaches, to capture complex patterns and latent
 Disadvantages of Existed System: representations that are essential for precise survival
prediction.
 Data Bias and Quality:
The reliability and accuracy of the predictions  Ensemble of Machine Learning Algorithms:
significantly depend on the correctness and To build reliable and precise prediction models, the
representativeness of the training data. Existing proposed system would make use of an ensemble of
discrepancies in healthcare outcomes can be widened by machine learning algorithms. This group of algorithms will
inherent biases in the data, such as underrepresentation of include decision trees, random forests, gradient boosting, and
some communities or restrictions on data gathering. deep neural networks, among others. We seek to increase
the models' generalizability and predictive power by fusing
the advantages of several techniques.

IJISRT23JUL1355 www.ijisrt.com 1747


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
 Interpretability and Explainable AI: forecasts by utilising post-hoc methodologies and
The suggested system will concentrate on creating visualisation tools. The predictive models can be used in
models with high interpretability in order to give doctors and actual clinical settings with greater knowledge, confidence,
researchers clear and useful information. Modern and adoption because to this transparency.
explainable AI approaches will be used, including feature
importance analysis, SHAP (SHapley Additive  Individualised Treatment Plans:
exPlanations), and model-independent The correct prognostication of breast cancer survival
interpretability techniques. enables the creation of individualised treatment plans. The
suggested approach gives clinicians the ability to customise
 Advantages of Proposed System: treatment programmes by taking into account the unique
patient's traits, risks, and biomarkers, optimising outcomes
 Increased Accuracy: and reducing pointless interventions. The personalised
To increase the accuracy of breast cancer survival approach has the power to transform breast cancer treatment
forecasts, the proposed method makes use of cutting-edge and enhance patient well-being.
machine learning algorithms and feature engineering
approaches. The models are better able to depict the  Integration of Multiple Data Sources:
complexity of breast cancer growth and treatment response The suggested system incorporates multiple datasets,
because they combine several datasets and extract pertinent including imaging, genomic, histopathological, and clinical
information. data. It provides a full perspective of the disease and allows
for a more thorough and precise prediction of survival
 Improved Interpretability: outcomes by integrating various disparate data sources.
In contrast to the current approach, the suggested Through this integration, the models' ability to anticipate the
system places a higher priority on the model's explain ability future is improved, and a deeper comprehension of breast
and interpretability. It offers physicians and academics cancer progression is made possible.
useful insights into the key components driving survival

VI. SYSTEM ARCHITECTURE

Fig 5 System Architecture

IJISRT23JUL1355 www.ijisrt.com 1748


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
 Data Collection leveraging the strength of various datasets and sophisticated
 Feature Engineering algorithms. Our findings pave the way for future
 Feature Selection developments in breast cancer research and clinical
 Model Selection judgement, as well as newdirections for precision treatment.
 Training and Validation
 Model Evaluation REFERENCES
 Model Interpretability
[1]. Seyyid Ahmed Medjahed, Tamazouzt Ait Saadi,
 Deployment and Integration
Abdelkader benyettou,”Breast Cancer Diagnosis by
 Performance Optimization
using k-Nearest Neighbor with Different Distances and
Classification Rules”,International Journal of
VII. CONCLUSION Computer Applications (0975 - 8887) Volume 62 - No.
1, January 2013
We have introduced a novel method for predicting
[2]. Shemim Begum, Debasis Chakraborty, Ram Sarkar,
breast cancer survival using machine learning techniques in “Data Classification Using Feature Selection And K-
this research study. We wanted to capture the complicated NEAREST NEIGHBOUR Machine Learning
nature of breast cancer progression and therapy response by Approach”, ISSN: 978-1-5090-0076-0/15, 2015 IEEE
integrating various datasets spanning clinical, histological,
[3]. Hamada R. H. AI-Absi, Brahim Belhaouari Samir,
genomic, and imaging data. We determined which
Khaled Bashir Shaban, Suziah Sulaiman,”Computer
biomarkers were the most insightful through careful feature Aided Diagnosis System based on Machine Learning
engineering and selection, and we created reliable predictive Techniques for Lung Cancer”, ISSN: 978-1-4673-
models. 1938-6/12 IEEE
[4]. Baraa M. Abed , Khalid Shaker, Hamid A. Jalab,
Our results show that a variety of machine learning Hothefa Shaker, Ali Mohammed Mansoor, Ahmad F.
methods, such as gradient boosting, decision trees, random
Alwan, Ihsan Salman Al-Gburi, “A Hybrid
forests, support vector machines, logistic regression, and
Classification AlgorithmApproach for Breast Cancer
deep learning architectures, are capable of accurately
Diagnosis”, ISSN: 978-1-5090-0925- 1/16, 2016 IEEE
forecasting breast cancer survival rates. In comparison to [5]. H. Song, H. Watanabe, X. Xiao and T. Kikkawa,
conventional prognostic methods, these models performed
"Influence of Air-gaps between Antennasand Breast on
better, reaching greater accuracy, sensitivity, specificity, and
Impulse-RadarBased Breast Cancer Detection," 2019
AUC-ROC values.
13th European Conference on Antennas and
Propagation (EuCAP), Krakow, Poland, 2019, pp. 1-2
Additionally, we stress the significance of model
[6]. N. Aibe et al., "Results of a nationwide survey on
interpretability and explainability in the context of
Japanese clinical practice in breast- conserving
predicting breast cancer survival. We gave insights into the
radiotherapy for breast cancer," in Journal of Radiation
underlying elements causing survival disparities by using
Research, vol. 60, no. 1,pp. 142-149, Jan. 2019.doi:
post-hoc methodologies and visualisation tools, enabling 10.1093/jrr/rry095
doctors and researchers to make wise judgements and have [7]. Biomedical Engineering, vol. 66, no. 7, pp. 2053-
faith in the predictive models. 2064, July 2019.doi: 10.1109/TBME.2018.2882867
[8]. E. A. Bayrak, P. Kırcı and T. Ensari, "Comparison of
Our study's ramifications are significant. Accurate Machine Learning Methods for Breast Cancer
survival prediction for breast cancer can result in more Diagnosis," 2019 Scientific Meeting on Electrical-
effective therapeutic approaches, individualised therapies,
Electronics & BiomedicalEngineering and Computer
and better patient outcomes. We open the door to precision Science (EBBT), Istanbul, Turkey, 2019, pp. 1-3. doi:
medicine by utilising machine learning techniques, where 10.1109/EBBT.2019.8741990
treatment plans can be customised based on unique traits
and risks.

However, there are still issues with improving feature


selection methods, dealing with class imbalance, and
improving model interpretability. To improve the prediction
capability of the models, future research should concentrate
on incorporating other data sources, such as transcriptomics
and proteomics. Collaborations with doctors and healthcare
professionals are also essential to ensuring the successful
application of these predictive models in actual clinical
settings.

In conclusion, our study adds to the body of


knowledge in machine learning-based breast cancer survival
prediction. We have shown the ability to transform
prognostic models and enhance patient outcomes by

IJISRT23JUL1355 www.ijisrt.com 1749


Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
BIOGRAPHIES

P.Lakshmi Sai Saran is currently studying B.Tech with


specification of computer science and engineering in NRI
Institute of technology. He has done a project on Prediction
of Breast Cancer using Machine Learning.He has finished
an internship at Blackbucks and he has done 1 NPTEL,
CISCO, Microsoft Azure,Devops Engineer expert and
Azure administrator associate certificates.

P.Hemanth kumar is currently studying B.Tech with


specification of computer science and engineering in NRI
Institute of Technology . He has done a project on future
sales prediction . He has finished an internship at
Blackbucks and he has done 1 NPTEL certificates.

Md. Sohail is currently studying B. Tech with


specification of Computer Science Engineering (Artificial
Intelligence and Machine Learning) in NRI Institute of
Technology. He has done a project on Prediction of Breast
Cancer using Machine Learning. He has finished an
Internship at Blackbuck. He also got a Microsoft Certificate
on Azure.

IJISRT23JUL1355 www.ijisrt.com 1750

You might also like