
Heart attack analysis via supervised machine learning methods
Student Name and roll no.
Introduction
• Heart attack, also known as myocardial infarction, is a serious condition that affects many
people worldwide.
• It occurs when the blood flow to a part of the heart muscle is blocked, leading to damage or
death of the affected tissue.
• Heart attack is a major cause of illness and death, and understanding its risk factors,
diagnosis, and treatment is crucial.
• Machine learning algorithms can be used to predict the likelihood of a heart attack in a
person based on a labelled dataset.
• Risk factors for heart attack can be classified as modifiable and non-modifiable.
• Modifiable risk factors include hypertension, high cholesterol, smoking, obesity, physical
inactivity, poor diet, diabetes, and stress.
• Understanding the role of modifiable risk factors is crucial for implementing preventive
strategies and interventions to reduce the risk of heart attack.
Background of study
• Heart attacks are a leading cause of death worldwide
• Traditional risk assessment methods may overlook complex correlations and
interactions between risk factors
• Machine learning algorithms offer advantages for heart attack analysis, including
analyzing multiple risk indicators at once and detecting complex relationships
between risk factors

Problem statement
• Early identification of those at risk of having a heart attack is critical for implementing
preventative strategies
• Traditional methods for assessing the risk of a heart attack may lead to erroneous forecasts
• The usefulness of supervised machine learning algorithms for heart attack analysis is
unknown and requires further research
• Research problem statement: Compare the performance of supervised machine learning
methods to traditional methods of heart attack risk assessment to determine whether they
can improve accuracy and efficacy of heart attack risk assessment and inform therapeutic
treatment.
Aim
The purpose of this research is to develop and test supervised machine
learning algorithms for assessing heart attack data and forecasting the risk
of a heart attack, as well as to identify the most relevant features that
contribute to prediction accuracy.

Objectives
• To identify gaps in current research by reviewing the available literature on supervised machine learning
approaches for heart attack analysis.
• To perform exploratory data analysis on the heart attack dataset in order to detect any trends or patterns.
• To determine, using feature selection approaches, the most important features that contribute to prediction
accuracy.
• To compare and assess the performance of several supervised machine learning models, including logistic
regression, decision trees, and support vector machines.
• To optimize the performance of the best-performing machine learning model using hyperparameter
tuning.
• To assess the results and determine the most important features that contribute to prediction accuracy.
• To make recommendations on the use of supervised machine learning algorithms for healthcare practitioners.
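The model comparison and tuning objectives can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the heart attack dataset; the parameter grids, fold count, and helper name `compare_models` are illustrative choices, not the study's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 303 rows, 13 predictors (assumption).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

# The three model families named in the objectives.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Hypothetical search grids; each one contains the model's starting settings.
param_grids = {
    "logistic_regression": {"logisticregression__C": [0.1, 1.0, 10.0]},
    "decision_tree": {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]},
    "svm": {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.01]},
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


cv_scores = compare_models(models, X, y)
best_name = max(cv_scores, key=cv_scores.get)

# Tune only the best-performing model, as the objectives describe.
search = GridSearchCV(models[best_name], param_grids[best_name], cv=5, scoring="accuracy")
search.fit(X, y)

for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
print(f"best model: {best_name}, tuned accuracy: {search.best_score_:.3f}")
print(f"best parameters: {search.best_params_}")
```

Cross-validation is used here because a single split of about 300 rows can give noisy accuracy estimates; a held-out test set would still be needed for a final, unbiased figure.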
Scope of study
• Use of supervised machine learning techniques to assess heart attack data and anticipate the risk of a heart attack
• Models: logistic regression, decision trees, support vector machines
• Feature selection strategies: recursive feature elimination, principal component analysis
• Use of hyperparameter tuning methodologies to improve model performance
• Investigation of heart attack datasets and variables such as demographic information, medical history, physical
examination findings, and laboratory results
• Evaluation of supervised machine learning algorithms in improving cardiac illness detection and therapy and
integration with clinical decision-making
• Study of potential limitations and ethical issues associated with using machine learning in heart attack analysis
and risk prediction

Significance of study
• Improved Prediction Accuracy: Heart attack ML prediction models can achieve greater accuracy than standard
statistical methods. This helps healthcare providers identify patients at risk of having a heart attack and provide
timely therapies to prevent heart disease.
• Early Heart Disease Detection: ML models can assist healthcare practitioners in detecting heart disease at an
early stage, resulting in better patient outcomes and lower healthcare expenditures.
• Identification of Significant Features: Identifying the most important features that contribute to the accuracy of
heart attack prediction helps practitioners prioritize certain risk factors in their patient assessments.
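The two feature selection strategies named in the scope, recursive feature elimination (RFE) and principal component analysis (PCA), could be sketched as follows. This assumes scikit-learn and synthetic data in place of the heart attack dataset; keeping five features or components is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 303 rows, 13 predictors (assumption).
X, y = make_classification(n_samples=303, n_features=13, n_informative=5, random_state=0)

# Both techniques are scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# RFE: repeatedly fit the estimator and drop the weakest feature until 5 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X_scaled, y)
selected = [index for index, keep in enumerate(rfe.support_) if keep]

# PCA: project onto the 5 directions of greatest variance (ignores the labels).
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

print("columns kept by RFE:", selected)
print(f"variance explained by 5 components: {explained:.2%}")
```

RFE keeps a subset of the original columns, so its output stays interpretable in clinical terms, whereas PCA produces new combined axes that are harder to map back to individual risk factors.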
Literature Review
Predicting Heart Attack Using Machine Learning Techniques (Ali et al., 2020)
• Dataset: 303 patients with heart disease
• Algorithms: Logistic Regression, K-Nearest Neighbors, Decision Tree, and Random Forest
• Highest accuracy: Random Forest with 89.76%
• Potential of machine learning to improve heart attack prediction accuracy

Heart Attack Prediction Using Machine Learning Algorithms: A Comparative Analysis (Sadiq et al., 2021)
• Dataset: 303 patients with heart disease
• Algorithms: Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine, and
Artificial Neural Network
• Highest accuracy: Random Forest with 91.42%
• Importance of assessing multiple algorithms to determine the most effective one
• Potential of machine learning to increase accuracy in early diagnosis and prevention of heart disease

A Comparison of Supervised Machine Learning Techniques for Heart Attack Prediction (Al-Shayea et al., 2021)
• Dataset: 303 patients with heart disease
• Algorithms: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector
Machine, and Artificial Neural Network
• Highest accuracy: Artificial Neural Network with 92.77%
• Random Forest accuracy: 91.89%; sensitivity: 93.94%; specificity: 91.24%; area under ROC curve: 0.976
• Potential of machine learning to improve accuracy in

A Hybrid Machine Learning Approach for Heart Attack Prediction (Farghaly et al., 2020)
• Dataset: 303 patients with heart disease
• Algorithms: Decision Tree and Artificial Neural Network
• Accuracy: 96.7%; sensitivity: 97.8%; specificity: 96.2%
• Value of using multiple algorithms to improve heart attack prediction accuracy
• Potential of hybrid technique for early detection and prevention of heart disease
Research Methodology
• The heart attack assessment study requires a thorough
research methodology.
• The proposed method is described in detail in the
study's research section.
• The research design, data collection, pre-processing,
transformation, interactive visual analytics, class balancing,
data mining approaches, and interpretation and evaluation
of outcomes will be reviewed.
• The proposed method for heart attack classification will
be thoroughly explained, along with its justification and
implementation.
• The primary aim is to create a cleaned dataset based on
patient data, perform feature selection, and use various
modelling and classification techniques to predict the
disease and report accuracy measures.
Research Design
In this heart attack study, a descriptive research design was used, as the aim was to describe and
analyze the relationship between various features related to heart health and the occurrence of heart attacks.

• Data Selection: Selecting a suitable dataset is crucial for the validity of the research study. The dataset contains 303
samples and 14 features and meets the criteria for dataset size, diversity, quality, balance, and data protection.

• Data Pre-processing: In this study, the heart attack dataset was pre-processed by handling missing values, encoding
categorical variables, dealing with anomalies, standardizing features, and ensuring data quality and consistency.

• Data Transformation: The pre-processed heart attack dataset was transformed through feature selection, feature
engineering, and data splitting, using methods such as data cleaning, normalization, encoding, selection, and
scaling.

• Interactive Visual Analytics: Interactive visual analytics is a method that combines visualization and user interaction
to analyze complex datasets and gain insights, including EDA, hypothesis testing, and model evaluation.

• Class Balancing: Class balancing is crucial for developing classification models, and techniques such as resampling
and synthetic data generation were used in this study to address class imbalance issues in the heart attack dataset.
Interactive visual analytics was used to evaluate the effectiveness of these techniques.

• Data Mining: Data mining techniques were used to develop a heart attack prediction model through algorithm
selection, model preparation, validation, hyperparameter tuning, and interpretation.

• Interpretation/Evaluation: The heart attack classification model was evaluated using various techniques including
visualization, feature importance analysis, decision boundary analysis, prediction explanation, model robustness
analysis, model comparison, and ethical considerations.
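The pre-processing and data splitting steps listed above (handling missing values, encoding categorical variables, standardizing features) could be sketched as below. This assumes pandas and scikit-learn; the column names and the synthetic values are hypothetical placeholders, not the real dataset's schema.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in table; column names are hypothetical.
rng = np.random.default_rng(0)
n_rows = 303
df = pd.DataFrame({
    "age": rng.integers(29, 78, n_rows).astype(float),
    "cholesterol": rng.normal(246, 52, n_rows),
    "chest_pain_type": rng.choice(
        ["typical", "atypical", "non_anginal", "asymptomatic"], n_rows
    ),
    "target": rng.integers(0, 2, n_rows),
})
# Blank out a few values to simulate missing data.
df.loc[df.index[::50], "cholesterol"] = np.nan

numeric_cols = ["age", "cholesterol"]
categorical_cols = ["chest_pain_type"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: one-hot encode, ignoring categories unseen in training.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df["target"],
    test_size=0.3, stratify=df["target"], random_state=0,
)

# Fit the transformations on the training split only, then reuse them on the test split.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the imputer and scaler on the training split alone avoids leaking test-set statistics into the model.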
Proposed Method
Data selection: Selecting a relevant dataset with required features and adequate sample size.

Data pre-processing: Handling missing values and outliers, and encoding categorical features.

Feature engineering: Transforming raw data into meaningful features that can capture relevant information.

Interactive visual analysis: Exploring data, identifying patterns, and anomalies.

Class balancing: Resolving class imbalance by oversampling the minority class or undersampling the majority class.

Data mining: Applying different machine learning algorithms to train and evaluate the model.

Interpretation/assessment: Evaluating model performance using various performance metrics and interpretation
techniques.

Model optimization: Refining the model by tweaking hyperparameters or changing classification thresholds.

Model comparison: Comparing the model with other existing or benchmark models.

Ethical considerations: Addressing bias, privacy, transparency, and fairness concerns in model development and
evaluation.
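As an illustration of the class balancing step, the sketch below performs random oversampling of the minority class with `sklearn.utils.resample`. The data are synthetic, the class ratio is invented for the example, and the helper name `oversample_minority` is hypothetical; the study may have used a different resampling technique.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic, deliberately imbalanced stand-in data (assumption): 203 negatives, 100 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 13))
y = np.array([0] * 203 + [1] * 100)


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    target_count = int(counts.max())
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        if count < target_count:
            # Sample with replacement so the class grows to the target size.
            X_cls = resample(
                X_cls, replace=True, n_samples=target_count, random_state=random_state
            )
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


X_balanced, y_balanced = oversample_minority(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_balanced))
```

In practice the resampling would be applied to the training split only, so that duplicated rows never appear in the data used for evaluation.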
Analysis Data Description
• The matplotlib.pyplot library is used to create various plots and charts to
aid in gaining insights from the data.
• Data visualization is a powerful method that allows for visually
representing complex data in a more understandable form to
identify patterns, trends, relationships, and anomalies in the data.
• The Axes Subplot functionality allows for creating multiple plots
within a single figure, facilitating side-by-side comparison and
analysis of different aspects of the data.
• The heart.csv dataset consists of 14 different variables with 303
rows, and the analysis may involve plotting histograms, scatter
plots, bar charts, and line plots to explore the distribution,
relationships, and trends in the data.
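A minimal sketch of the subplot approach described above follows. It assumes matplotlib and NumPy, and it draws synthetic values rather than reading heart.csv, so the variable names and distributions are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for three columns of the dataset (assumption).
rng = np.random.default_rng(0)
age = rng.normal(54, 9, 303)
cholesterol = rng.normal(246, 52, 303)
max_heart_rate = rng.normal(150, 23, 303)

# One figure, three side-by-side Axes for easy comparison.
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].hist(age, bins=20)
axes[0].set_title("Age distribution")
axes[0].set_xlabel("Age (years)")

axes[1].boxplot(cholesterol)
axes[1].set_title("Serum cholesterol")
axes[1].set_ylabel("mg/dl")

axes[2].scatter(age, max_heart_rate, s=10)
axes[2].set_title("Age vs. maximum heart rate")
axes[2].set_xlabel("Age (years)")
axes[2].set_ylabel("Beats per minute")

fig.tight_layout()
```

Calling `fig.savefig(...)` or, in an interactive session without the `Agg` line, `plt.show()` would render the figure.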
Analysis (EDA)
• The article describes various data visualization techniques
for exploratory data analysis on the heart.csv dataset
using the Axes Subplot functionality from the
matplotlib.pyplot library.
• Histograms and box plots were used to visualize the
distributions of variables such as age, cholesterol levels,
and blood pressure. Subplots facilitated easy comparison
across different variables.
• Bar charts were used to visualize the distribution of
categorical variables such as sex, chest pain type, and
target variable.
• A heat map was created to explore the correlations
between variables, providing insights into potential
Data Cleaning & analysis
• The dataset includes information on patients' age, sex, chest pain type, resting blood pressure, serum
cholesterol level, fasting blood sugar level, resting electrocardiographic results, maximum heart rate
achieved during exercise, and whether they have exercise-induced angina.
• The mean age of patients is around 54.37 years, and the dataset has slightly more male patients
than female patients.
• The most common type of chest pain experienced by the patients is typical angina, followed by
atypical angina, non-anginal pain, and asymptomatic.
• The mean resting blood pressure is approximately 131.62 mm Hg, and the mean serum cholesterol level
is around 246.26 mg/dl.
• Most of the patients have a fasting blood sugar level under 120 mg/dl, and the proportion of patients
with exercise-induced angina is relatively small at approximately 0.33.
• The dataset provides information on the maximum heart rate achieved during exercise, with a
mean value of around 149.65 beats per minute.
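Summary figures like those above could be produced with a few pandas calls. The sketch below assumes pandas and NumPy and builds a synthetic table with hypothetical column names, so the numbers it prints will not match the values reported in the slides.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in table; column names are hypothetical, not the real heart.csv schema.
rng = np.random.default_rng(42)
n_rows = 303
df = pd.DataFrame({
    "age": rng.integers(29, 78, n_rows),
    "sex": rng.integers(0, 2, n_rows),
    "resting_bp": rng.normal(131, 17, n_rows).round(),
    "cholesterol": rng.normal(246, 52, n_rows).round(),
    "exercise_angina": rng.integers(0, 2, n_rows),
})
# Blank out five cholesterol values to simulate missing data.
df.loc[df.sample(5, random_state=0).index, "cholesterol"] = np.nan

# Cleaning: count missing values per column, then fill gaps with the column median.
missing_before = df.isna().sum()
df["cholesterol"] = df["cholesterol"].fillna(df["cholesterol"].median())

# Analysis: means of continuous variables and shares of binary ones.
means = df[["age", "resting_bp", "cholesterol"]].mean().round(2)
angina_share = df["exercise_angina"].mean()
sex_counts = df["sex"].value_counts()

print(missing_before)
print(means)
print(f"share with exercise-induced angina: {angina_share:.2f}")
print(sex_counts)
```

For a binary 0/1 column the mean equals the proportion of ones, which is how a figure such as the exercise-induced angina share can be read directly from the data.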
Results & Discussion
• This section presents a thorough evaluation and interpretation of the results
obtained from the experiments conducted, using visualizations
such as plots, charts, and tables to present the information effectively.
• Discussions on the consequences and significance of the findings in the
context of the study topic or application field are presented, including
explanations, interpretations, and insights into the results, as well as their
usefulness in addressing the research questions or objectives.
• The blue color map ('Blues') is used to represent the confusion matrix and a
color bar is added for better interpretation.
• The accuracy of the model is computed and reported using the
accuracy_score() function from the sklearn.metrics module, which provides a
quantitative measure of the model's overall correctness in its predictions.
• By analyzing the confusion matrix visualization and interpreting the accuracy
score, we can assess the overall performance of the Naive Bayes classifier
on the validation dataset and compare it to other evaluation metrics for a
comprehensive understanding of the model's performance.
• The dataset was split into training and testing datasets using the
train_test_split function from the sklearn.model_selection module.
• The testing dataset was set to 30% of the total dataset, while the remaining
70% was used for training the model.
• The Gaussian Naive Bayes classifier from the sklearn.naive_bayes module
was used to train the model on the training dataset using the fit function,
with the input data containing feature values and labels.
• A confusion matrix is used to evaluate the effectiveness of a machine
learning model by comparing its predicted output to the actual output, and
provides information on true/false positives and true/false negatives.
• Visualization of the confusion matrix is crucial in quickly and easily
identifying model errors and evaluating its performance using several
evaluation criteria.
• The plt.imshow() function from the matplotlib.pyplot module is used to plot
the confusion matrix, with the cm variable as input, and a blue color map
('Blues') and color bar added for better interpretation. The visualization
provides a visual representation of the model's performance, making it
easier to interpret and analyze the results.
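The workflow described in this section (70/30 split, Gaussian Naive Bayes, confusion matrix, accuracy score, and a 'Blues' heat map with a color bar) could look roughly like the sketch below. It assumes scikit-learn and matplotlib and uses synthetic data in place of heart.csv, so the printed figures are not the study's results; the sensitivity and specificity lines are an added illustration of what the confusion matrix makes available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data: 303 rows, 13 predictors (assumption).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

# Hold out 30% of the rows for testing, train on the remaining 70%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

# For a binary problem the matrix unpacks to TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn)  # share of actual positives that were caught
specificity = tn / (tn + fp)  # share of actual negatives that were cleared

plt.imshow(cm, cmap="Blues")
plt.colorbar()
plt.title("Confusion matrix (Gaussian Naive Bayes)")
plt.xlabel("Predicted label")
plt.ylabel("Actual label")
plt.xticks([0, 1])
plt.yticks([0, 1])
# Write the count inside each cell so the plot can be read without the color bar.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, str(cm[i, j]), ha="center", va="center")

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Reporting sensitivity alongside accuracy matters in this setting because a missed heart attack risk (false negative) is usually costlier than a false alarm.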
Conclusion
• The Naive Bayes classifier was used to analyze a heart dataset and predict heart attack risk based on
available features, with promising results.
• The confusion matrix helped to identify areas for improvement in the model, such as accurately
forecasting the risk of heart attack.
• Supervised machine learning approaches have shown potential in enhancing the accuracy and
efficiency of heart attack diagnosis and prediction.
• The data selection process and data transformation techniques can improve data quality and
model performance.
• Limitations and considerations to keep in mind include the assumptions of the Naive Bayes
classifier and the amount and level of detail of data used for training and testing.
Recommendations
• Machine learning algorithms such as decision trees, support vector machines,
and deep learning models should be explored to compare their performance in
predicting heart attack risk.
• Research and development is needed to improve the accuracy and interpretability
of machine learning models, and to develop more robust algorithms that can
handle complex information and predict heart attacks more accurately.
• Future studies could examine the effects of different feature engineering and
selection approaches on model performance. Different hyperparameter settings
and model optimization techniques could also be explored to fine-tune
the Naive Bayes classifier and improve its performance.
• Testing the performance of the Naive Bayes classifier in a real-world setting could
provide insights into its practical applicability and effectiveness.
References
• Ali, M. M., Paul, B. K., Ahmed, K., Bui, F. M., Quinn, J. M., & Moni, M. A. (2021). Heart disease
prediction using supervised machine learning algorithms: Performance analysis and
comparison. Computers in Biology and Medicine, 136, 104672.

• Krishnani, D., Kumari, A., Dewangan, A., Singh, A., & Naik, N. S. (2019, October). Prediction of
coronary heart disease using supervised machine learning algorithms. In TENCON 2019-2019
IEEE Region 10 Conference (TENCON) (pp. 367-372). IEEE.

• Princy, R. J. P., Parthasarathy, S., Jose, P. S. H., Lakshminarayanan, A. R., & Jeganathan, S.
(2020, May). Prediction of cardiac disease using supervised machine learning algorithms. In
2020 4th international conference on intelligent computing and control systems (ICICCS) (pp.
570-575). IEEE.
Thank you.
