
DEPARTMENT OF BIOMEDICAL ENGINEERING

FACULTY OF ENGINEERING

UNIVERSITY OF MALAYA

KIB 4009: PATTERN RECOGNITION

GROUP 7

ASSIGNMENTS AND VIDEO PRESENTATION

CLASSIFICATIONS OF CARDIAC DISEASES

Group Members                Matric. No

WINNY WONG WEN NI            U2004012
LIM YUN HUI                  S2026317
Introduction

Cardiovascular diseases (CVDs) remain a significant global contributor to
mortality (McNamara & Alzubaidi, 2019). Heart failure (HF) is characterized by
suboptimal myocardial function. Symptoms include extreme fatigue, dyspnoea,
reduced stamina, and signs of systemic or pulmonary congestion (Malik & Brito, 2023).
The etiology and phenotype of heart failure vary significantly, and consensus on a
universal classification system is lacking due to substantial overlap between potential
categories (Tripoliti & Papadopoulos, 2017).

Heart failure is a clinical condition marked by identifiable symptoms such as
shortness of breath, swelling in the ankles, and fatigue, possibly coupled with
observable signs such as increased jugular venous pressure, crackling sounds in the
lungs, and swelling in the extremities (Ponikowski & Voors, 2016). These
manifestations result from an underlying structural and/or functional abnormality in the
heart, causing diminished cardiac output and/or increased pressure within the heart.
Abnormalities of the valves (stenosis, regurgitation), pericardium, endocardium, or
heart rhythm/conduction, or a combination of these alterations, may also initiate heart
failure (Schwinger, 2021). Heart failure (HF) may be accompanied by considerable
alterations of left ventricular (LV) volume. All variants of HF show substantially
elevated LV filling pressures, which tend to induce changes in LV size and shape. In
patients with reduced contraction, the more severe the systolic dysfunction due to LV
filling pressures, the greater the end-diastolic volume (EDV) and end-systolic volume
(ESV) (Kerkhof, 2015).

Aortic stenosis (AS) is a common valvular disorder (Pujari & Agasthi, 2023)
which involves the narrowing or stenosis of the aortic valve, leading to left ventricular
outflow obstruction. It ranks as the primary valvular heart disease in Europe and North
America, following arterial hypertension and coronary artery disease (Rana, 2022). AS
etiologies include congenital (bicuspid/unicuspid), calcific, and rheumatic diseases.
Classic symptoms encompass angina, syncope, and dyspnea, with manifestations of
heart failure indicating a more severe form and a worse prognosis (Johnston &
Zeeshan, 2018). Many predisposing factors, such as age, hypertension, and the
turbulence of perivalvular blood flow, contribute to degenerative aortic stenosis. In
microscopic observations, Otto and Prendergast (2014) noted that in stenotic valves
the initial aortic lesions contain disorganized collagen fibres, chronic inflammatory cells,
lipids, and proteins of extracellular bone matrix and bone minerals, causing the
narrowing of the aortic valve.

Deviation in the elastic properties of the aorta gives rise to an elevation in the
velocity of the aortic pulse wave and the reflected wave. This leads to premature return
of reflected waves from the periphery to the root of the aorta, causing them to fuse with
the systolic portion of the pulse wave and ultimately contributing to the onset of systolic
hypertension (Boudoulas & Konstantinos, 2018).

Recent progress in magnetic resonance imaging (MRI), especially 4-dimensional
(4D) flow MRI, provides a thorough visualization and quantification of the dynamics
of aortic flow patterns in vivo. This encompasses advanced measures of
hemodynamics, including helicity and vorticity, wall shear stress, pressure gradients,
flow displacement, turbulent kinetic energy, and viscous energy loss (Garcia & Barker,
2019).

Machine learning (ML), as a pattern recognition technology, finds application in
medical imaging (Zhang & Li, 2021). ML, which learns parameters from examples, excels
in tasks such as detecting and differentiating patterns in data (Leiner & Rueckert, 2019).
Various ML techniques, including Extreme Gradient Boost, AdaBoost,
Decision Tree, Gradient Boost, Linear Regression, Support Vector Machine (SVM),
Multilayer Perceptron (MLP), Extra Trees, Random Forest, and K-Nearest Neighbours
are utilized to recognize signs of heart irregularities (Mohi & Ripa, 2023). ML algorithms
analyze large volumes of clinical data to uncover links and patterns not immediately
apparent to human practitioners (Mahmud & Kabir, 2023).

In the analysis of cardiac disease diagnosis, multiple ML techniques have been
employed, with SVM achieving an accuracy of 96.72% and the Decision Tree
algorithm achieving 99.16% accuracy (Mohi & Ripa, 2023). ML algorithms were also
utilised to construct a cardiovascular disease detection model whose data were
examined using four techniques: Decision Tree and Random Forest methods
showed a precision of 99.83%, while K-Nearest Neighbour and Support Vector
Machine methods exhibited precisions of 84.49% and 85.32% respectively (Hassan &
Iqbal, 2022). Heart disease detection models using the CHSLB dataset from Kaggle
achieved accuracies of 99.03%, 96.10%, 100%, and 100% with the RF, DT, AB, and
KNN models respectively (Khateeb & Usman, 2017). These results suggest that
machine learning can be highly accurate in the diagnosis of cardiovascular diseases.

Problem Statement

Cardiovascular diseases, including heart failure and aortic stenosis, pose
significant health challenges, necessitating accurate and timely diagnostic methods.
This study addresses the need for an advanced machine learning classifier capable
of distinguishing between normal patients, those with heart failure, and those with
aortic stenosis based on comprehensive medical data. The goal is to enhance
diagnostic precision, streamline patient stratification, and contribute to more effective
clinical decision-making in the field of cardiovascular medicine.

LITERATURE REVIEW
PRE-PROCESSING
Raw data are vulnerable to noise, corruption, missing values, and inconsistencies,
so pre-processing steps are necessary. Poor data quality primarily affects
accuracy and leads to false predictions (Maharana & Mondal, 2022). By far the most
common approach to missing data is simply to omit the cases with missing values
and analyze the remaining data. This approach is known as complete case
(or available case) analysis, or listwise deletion. However, it is only acceptable if it
causes a relatively small loss of data (Kang, 2013).
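As a concrete illustration, listwise deletion can be sketched in a few lines of Python (the study itself works in MATLAB; the records and field names below are hypothetical):

```python
# Listwise deletion: keep only the records in which every field is present.
# A minimal sketch; the patient records and field names are invented.

def listwise_delete(records):
    """Return only the complete cases (no missing values)."""
    return [r for r in records if all(v is not None for v in r.values())]

patients = [
    {"LVEF": 62.0, "EDV": 120.0},
    {"LVEF": None, "EDV": 210.0},   # missing LVEF -> dropped
    {"LVEF": 35.0, "EDV": 180.0},
]

complete_cases = listwise_delete(patients)
print(len(complete_cases))  # 2 of the 3 records survive
```

The trade-off noted above is visible even here: one third of the (tiny) dataset is lost.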

FEATURE EXTRACTION
Left ventricular ejection fraction (EF) is a key measure in the diagnosis and
treatment of heart failure (HF), and many patients experience changes in EF over time
(Adekkanattu & Rasmussen, 2023). HF falls into two major categories: heart failure
with preserved ejection fraction (HFpEF) and heart failure with reduced ejection
fraction (HFrEF). Preserved EF refers to heart failure with a normal ejection fraction
that nevertheless exhibits the signs and symptoms of heart failure. Preserved EF is
more common in women as well as the elderly, with a regular left-ventricular cavity
volume but a dense and stiff LV wall (Inamdar & Inamdar, 2016). HF with preserved
ejection fraction has an LV ejection fraction of more than 50%, while HF with improved
ejection fraction has an LV ejection fraction of more than 40%. On the other hand,
reduced EF is marked by a dilated LV cavity, with the ratio of LV mass to end-diastolic
volume being either normal or reduced (Inamdar & Inamdar, 2016). In a mild condition,
reduced ejection fraction lies in the range of 41% to 49%, while in a more serious
condition the left ventricular ejection fraction is less than 40% (Malik & Brito, 2023).
As the incidence of HF with reduced ejection fraction (HFrEF) increases with age, AS
often co-exists with LV systolic dysfunction. Approximately one-third of patients
diagnosed with severe AS have LV systolic dysfunction, defined as an LV ejection
fraction (LVEF) of less than 50% (Spilias, 2022).
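The EF arithmetic and the category boundaries described above can be sketched in Python (the thresholds follow the text; the volume values are invented for illustration):

```python
# Ejection fraction from end-diastolic and end-systolic volumes, and the
# EF bands described in the text: preserved > 50%, mildly reduced 41-49%,
# reduced below that. A sketch with hypothetical volumes, not patient data.

def ejection_fraction(edv_ml, esv_ml):
    """EF (%) = stroke volume / end-diastolic volume * 100."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

def ef_category(ef_percent):
    if ef_percent > 50:
        return "preserved"
    if ef_percent >= 41:
        return "mildly reduced"
    return "reduced"

print(round(ejection_fraction(120, 50), 1))          # 58.3 -> preserved
print(ef_category(ejection_fraction(160, 110)))      # 31.25% -> reduced
```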

As mentioned, an HFpEF patient has an ejection fraction like that of a healthy
patient. The diagnosis of HFpEF therefore requires the following criteria to be satisfied:
the patient must have signs and symptoms of heart failure, and evidence of
diastolic LV dysfunction (Paulus & Tschöpe, 2007). HFpEF is a complex and extremely
common syndrome, accounting for more than 50% of HF patients (LeWinter & Meyer,
2013). In heart-failure patients, there was a paradoxical increase in left ventricular
end-diastolic volume in association with an expected decrease in right ventricular
end-diastolic volume during lower-body suction (Kerkhof, 2015).

CLASSIFICATIONS
There are several techniques used in the classification stages such as support vector
machine (SVM), decision tree, and random forest.
(i) Support Vector Machine (SVM)
Based on the study of B. Manoj Kumar (2022), the research aims to
detect heart disease using a Support Vector Machine (SVM) classifier and
compare its performance with a Linear Regression (LR) model. The dataset,
collected from the UCI machine learning repository, consisted of 60 samples.
The SVM classifier exhibited a significantly higher accuracy rate of 90.43%
compared to the LR model's 78.56%. Statistical analysis of the two models,
covering mean accuracy rate, standard deviation, standard error of the mean,
evaluation metrics, and significance levels, confirmed the superiority of the
SVM classifier. The study concludes that the SVM classifier outperforms the
LR model in accurately detecting heart disease and provides valuable insights
into the efficient prediction of heart disease using SVM classification
algorithms.
According to the study of Elsedimy, AboHashish, and Algarni (2023), a new
approach for predicting cardiovascular disease (CVD) combines Quantum-Behaved
Particle Swarm Optimization (QPSO) with a Support Vector Machine (SVM),
motivated by the importance of early CVD detection in reducing the risk of
heart attack and increasing the chance of recovery. The QPSO algorithm
selects the optimal values of the SVM parameters to improve classification
accuracy, and the pre-processing involves transforming nominal data into
numerical data and applying effective scaling techniques, together with a
self-adaptive threshold method. The QPSO-SVM model was trained on public
heart disease datasets and achieved a high accuracy of 96.31% on the
Cleveland heart disease dataset, outperforming other state-of-the-art
prediction models in sensitivity, specificity, precision, G-Mean, and F1
score; a statistical analysis further confirmed its significance compared
with other models. The study thus demonstrates the potential of the
QPSO-SVM model in predicting heart disease risk.
(ii) Naive Bayes
Based on V Sai Krishna Reddy (2022), the study discusses the application of
machine learning (ML) techniques, specifically Decision Tree and Naïve
Bayes classifiers, to predict cardiovascular disease using the Heart Failure
Dataset. The performance evaluation provides the accuracy values and
confusion matrices for both classifiers, demonstrating the effectiveness of
the Naïve Bayes algorithm, which achieved an accuracy of 86%, compared to
the Decision Tree's 82%. The study concludes that the Naïve Bayes algorithm
performed well and emphasizes the importance of ML techniques in predicting
and preventing heart disease.
(iii) KNN
Based on the study of C (2021), heart disease is predicted using different
K-nearest neighbour (KNN) classifiers. The study applies machine learning
classification algorithms to heart disease data and predicts results using
various versions of the KNN classifier in MATLAB, evaluating performance on
accuracy, misclassification rate, prediction speed, and training time, and
discussing the distance metrics and distance weights used by the KNN
algorithms. The performance of the proposed KNN classifiers varies with the
specific model and its configuration. The Optimizable KNN model achieved
the highest accuracy of 69%, with a prediction speed of approximately 5600
observations per second, though it also had a relatively high training time
of about 50.125 seconds. Other KNN models, such as Fine KNN, Medium KNN,
Coarse KNN, and Weighted KNN, demonstrated varying levels of accuracy,
misclassification cost, and prediction speed.
SUMMARY
To classify subjects into healthy, aortic stenosis (AS), and heart failure (HF)
classes, feature extraction is a necessary pre-processing step to obtain the significant
cardiac features to be used with the classifiers. From the study of feature extraction,
left ventricular ejection fraction (EF) is believed to be an important cardiac feature for
classification. Several classifiers can then be used to differentiate whether a subject
is healthy or has heart disease, such as k-Nearest Neighbour (KNN), Naïve Bayes,
and Support Vector Machine (SVM).

METHODOLOGY
The proposed algorithm is implemented in MATLAB (MathWorks) and classifies
heart disease conditions using SVM, Naïve Bayes, and KNN. The process is divided
into two phases: training and testing. After segmentation of the left and right
ventricles using the Segment software, cardiac features such as LVEF (left ventricular
ejection fraction), EDV (end-diastolic volume), EF (ejection fraction), LVEDV (left
ventricular end-diastolic volume), RVEF (right ventricular ejection fraction), RVEDV
(right ventricular end-diastolic volume), ESV (end-systolic volume), SV (stroke
volume), and HR (heart rate) are extracted into an Excel file. Heart conditions are
classified into three classes: HF (heart failure), AS (aortic stenosis), and Healthy.
There is a total of 66 data samples across the three classes, with some samples
removed due to missing values. Following the study by Devare (2023) on heart
disease prediction using binary classification, the dataset is split 70% for training
and 30% for testing.
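The 70/30 split can be sketched in Python (the study performs this in MATLAB; the 66-record dataset here is just a stand-in list of indices):

```python
import random

# A reproducible 70/30 train/test split, as described in the text.
def split_70_30(samples, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # shuffle before splitting
    cut = int(round(0.7 * len(samples)))      # 70% boundary
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

data = list(range(66))                        # 66 records, as in this study
train, test = split_70_30(data)
print(len(train), len(test))                  # 46 20
```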
Feature selection is performed with ANOVA in the MATLAB Classification
Learner App. ANOVA (one-way analysis of variance) is a statistical method used to
analyze the differences among group means in a sample. From the original 9 features,
only the 4 features with the highest influence are selected: LVEF, EDV, EF, and
LVEDV. The classification of heart conditions into the three classes (HF, AS, and
Healthy) is therefore based on these 4 cardiac features. The 3 classifiers chosen for
the classification in the MATLAB Classification Learner App are Support Vector
Machine (SVM), KNN, and Naïve Bayes.
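The ANOVA ranking can be illustrated with the one-way F-statistic computed directly in Python; this is a sketch of the idea behind the Classification Learner's ranking, and the per-class values below are invented, not the study's data:

```python
# One-way ANOVA F-statistic: between-group variance over within-group
# variance. A larger F means the feature separates the classes better.
# The LVEF-like values per class are hypothetical.

def anova_f(groups):
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # between-group and within-group sums of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

healthy = [60, 62, 58, 65]
hf      = [30, 28, 35, 33]
aspat   = [55, 50, 52, 57]
print(round(anova_f([healthy, hf, aspat]), 1))   # a large F: LVEF is informative
```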
SVMs are among the most powerful and robust classification algorithms
and play a significant role in pattern recognition (Cervantes, 2020). SVMs
distinctively afford balanced predictive performance, even in studies where sample
sizes may be limited (Pisner & Schnyer, 2020). SVM is a sparse technique: like
nonparametric methods, it requires all the training data to be available, that is,
stored in memory, during the training phase, when the parameters of the model
are learned. However, once the model parameters are identified, SVM depends only
on a subset of these training instances, called support vectors, for future prediction;
the support vectors define the margins of the hyperplanes (Awad & Khanna, 2015).
Contrary to the general assumption that a nonlinear decision boundary is more
effective than a linear one, rigorous kernel analysis has shown that SVM can
overfit with nonlinear kernels (Han & Jiang, 2014). We therefore use a linear-kernel
SVM.
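To make the margin idea concrete, a minimal linear SVM can be trained with the Pegasos stochastic subgradient method; this Python sketch illustrates the hinge-loss principle, not the solver the Classification Learner App actually uses, and the two toy clusters are invented:

```python
import random

# Pegasos-style training of a linear SVM: shrink the weights toward a
# larger margin, and push misclassified / within-margin points outward.
def train_linear_svm(X, y, lam=0.01, epochs=200, seed=1):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b, t = 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # decaying step size
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1 - eta * lam) * wj for wj in w]      # regularization shrink
            if y[i] * score < 1:                        # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Two well-separated toy classes (labels -1 and +1)
X = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [4.0, 4.0], [4.5, 3.5], [3.5, 4.5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])   # predictions on the training points
```

Only the points lying on or inside the margin (the support vectors) ever trigger the hinge-loss update, which is the sparsity property described above.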
Naive Bayes is a probabilistic classification algorithm rooted in Bayes' theorem,
a method that computes the probability of a predictor given a specific class. It operates
under the assumption that a predictor's impact on a class remains independent of the
values of other predictors, a principle termed class conditional independence. This
algorithm finds notable application in medical science, particularly in the diagnosis of
heart patients, owing to its straightforwardness and capacity to deliver precise
outcomes. Despite its simplicity, the Naive Bayes classifier frequently demonstrates
robust performance and enjoys widespread usage, often surpassing more intricate
classification methods. It leverages probability theory to ascertain the most probable
classification for an unseen instance and proves effective with categorical data, though
it may exhibit suboptimal performance with numerical data in the training set. In the
realm of heart disease prediction, the Naive Bayes algorithm has been effectively
employed to categorize medical data, yielding accurate predictions of heart diseases
with a noteworthy level of precision (K. Vembandasamy, 2015).
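A compact Python sketch of a Gaussian Naive Bayes classifier of the kind described, fitting a per-class mean and variance for each feature and picking the class with the largest log-posterior (the feature values are hypothetical; the report's actual classifier is MATLAB's):

```python
import math
from collections import defaultdict

# Fit per-class priors and per-feature Gaussian parameters.
def fit_gnb(X, y):
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        log_prior = math.log(len(rows) / len(X))
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[c] = (log_prior, stats)
    return model

# Classify by the largest log-posterior (features assumed independent).
def predict_gnb(model, x):
    def log_post(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats))
    return max(model, key=log_post)

X = [[60, 120], [62, 110], [30, 210], [28, 200]]     # (LVEF-like, EDV-like)
y = ["Healthy", "Healthy", "HF", "HF"]
model = fit_gnb(X, y)
print(predict_gnb(model, [58, 125]))   # closer to the Healthy profile
```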
KNN is employed in machine learning for tasks related to classification and
regression. As a non-parametric approach, KNN does not assume a specific data
distribution and operates without a dedicated training phase. Instead, it categorizes
new data points by assessing their resemblance to existing data points within the
feature space. The algorithm assigns a label to a fresh observation based on the
predominant class among its closest neighbours, with "closeness" determined by a
distance metric such as Euclidean distance or cosine similarity. KNN finds extensive
application in tasks such as pattern recognition, anomaly detection, and
recommendation systems, thanks to its simplicity and effectiveness in managing
intricate datasets (Jabbar, Deekshatulu, & Chandra, 2013).
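The majority-vote idea can be sketched in a few lines of Python with Euclidean distance (the feature values are invented; the study itself uses MATLAB's KNN variants):

```python
import math
from collections import Counter

# Classify a point by the majority class among its k nearest neighbours.
def knn_predict(train_X, train_y, x, k=3):
    dists = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [[60, 120], [62, 110], [58, 130], [30, 200], [28, 210], [33, 190]]
train_y = ["Healthy"] * 3 + ["HF"] * 3
print(knn_predict(train_X, train_y, [59, 118], k=3))   # Healthy
```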

RESULTS

(i) Support Vector Machine (SVM)

The linear SVM reached a testing accuracy of 80%.

Figure: Summary of Validation and Testing results of Linear SVM.

Figure: Confusion matrix of Linear SVM.


(ii) Naïve Bayes
Gaussian Naïve Bayes is used as the classifier. The Naïve Bayes model
reached a testing accuracy of 70.0%.

Figure: Summary of Validation and Testing results of Gaussian Naïve Bayes.

Figure: Confusion matrix of Gaussian Naïve Bayes.

(iii) KNN
Cubic KNN reached a testing accuracy of 75.0%.
Figure: Summary of Validation and Testing results of Cubic KNN.

Figure: Confusion matrix of Cubic KNN.


DISCUSSIONS
Among the models trained in the MATLAB Classification Learner App, the
Support Vector Machine (SVM) shows the highest accuracy of 80%, compared with
KNN at 75% and Naïve Bayes at 70%. SVM performs best because it can handle
high-dimensional data, is effective with limited training samples, and can handle
non-linear classification using kernel functions (Saini, 2024). SVM is widely used in
classification and regression problems. It finds the optimal line or decision boundary
that divides an n-dimensional space into sections so that subsequent data points can
be classified conveniently (Sheth, Tripathi, & Sharma, 2022). SVM can handle
structured, semi-structured, and unstructured data, and kernel functions can reduce
the complexity of the data type. Its downsides are that it is time-consuming with
larger datasets, and that locating an appropriate kernel function can be difficult. Naïve
Bayes, on the other hand, is easy to set up, produces good results, and scales
proportionally with the number of predictors and data points. Moreover, Naïve Bayes
needs less training data and can manage both discrete and continuous data. Its
pitfalls are that the resulting model may be too simple, so properly trained and
optimized models often outperform it; the complexity of implementing Naïve Bayes
increases if a continuous variable such as time is used; and, compared with SVM or
simple logistic regression, Naïve Bayes needs more runtime memory for prediction
and is therefore more time-consuming (Sheth et al., 2022). Finally, KNN (K-Nearest
Neighbour) can estimate the value of any new data point using feature similarity.
KNN is a simple technique that can be implemented quickly; the algorithm is
inexpensive and well suited to multi-modal classes. Its drawbacks are that classifying
unknown records is relatively costly, the computational cost grows with the size of
the training set, and accuracy degrades in the presence of noisy or irrelevant
features.
SVM outperforms Naïve Bayes and KNN due to its effectiveness in high-
dimensional feature spaces. For heart disease classification there are several
significant features, such as LVEF, EDV, EF, and LVEDV, with which SVM performs
well by finding a hyperplane that effectively separates the different classes.
Compared with KNN, SVM is less sensitive to noisy data: its maximum-margin
property helps mitigate the impact of outliers and noise on the decision boundary.
SVM does, however, require tuning of hyperparameters, such as the regularization
parameter (C) and the choice of kernel; properly tuned hyperparameters contribute
to its superior performance.
A confusion matrix is used in machine learning and statistics to assess the
performance of a classification model. It summarizes the classification results by
showing the counts of true positive, true negative, false positive, and false negative
predictions. The confusion matrix supports the evaluation of a model's accuracy,
precision, recall, and F1 score, which are crucial metrics for understanding how well
the model classifies data into different categories.
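These derived metrics follow directly from the four counts; the Python sketch below uses illustrative counts, not this study's actual confusion matrices:

```python
# Accuracy, precision, recall, and F1 from a binary confusion matrix.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)           # of predicted positives, how many are right
    recall = tp / (tp + fn)              # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=8, fp=2, fn=2, tn=8)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))  # 0.8 0.8 0.8 0.8
```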
To further improve the accuracy of the heart disease classification model, an
ensemble technique that combines several machine learning algorithms can be
used. According to the study of Latha and Jeeva (2019), an ensemble of Naïve
Bayes, Random Forests, Multilayer Perceptrons (MLP), and Bayesian networks
based on majority voting (MV) was used for the detection of heart disease. Moreover,
missing values and outliers in the dataset need to be treated. The missing values can
be eliminated, but elimination reduces the size of the dataset and risks losing
valuable information. Alternatively, imputation methods such as Pearson correlation
coefficients or KNN imputation can replace the missing values, predicting them from
the other variables in the dataset. Furthermore, normalisation and standardization
can be used to address outliers: standardization subtracts the mean and divides by
the standard deviation, while normalisation scales the feature values to a uniform
range.
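The two scaling operations just described can be sketched in Python (the sample EDV-like column is illustrative):

```python
import statistics

# Standardization (z-score): subtract the mean, divide by the standard deviation.
def standardize(values):
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Min-max normalisation: scale the values into a uniform range [lo, hi].
def normalize(values, lo=0.0, hi=1.0):
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

edv = [120.0, 140.0, 160.0, 180.0, 200.0]
print([round(v, 2) for v in standardize(edv)])   # [-1.41, -0.71, 0.0, 0.71, 1.41]
print(normalize(edv))                            # [0.0, 0.25, 0.5, 0.75, 1.0]
```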

CONCLUSION

Support Vector Machine (SVM) stands out as the preferred choice for heart
disease classification compared with the other machine learning techniques, due to
its ability to handle high-dimensional feature spaces and to accommodate non-linear
classification through kernel functions. To enhance SVM's performance,
normalisation and the tuning of hyperparameters, particularly the regularization
parameter and kernel selection, are useful for optimal results. Current innovations in
SVM applications involve ensemble techniques combining various algorithms, such
as Naïve Bayes and Random Forests, showcasing the potential for even more robust
and accurate diagnostic models (Biswas, 2023). Ultimately, the diagnosis of heart
diseases can be significantly improved by integrating SVM with advanced ensemble
methods, addressing missing values, and employing normalization techniques,
ensuring precise and early detection for effective intervention.

REFLECTIONS
Winny Wong Wen Ni
In this assignment, the most interesting work I did was applying my MATLAB
knowledge to perform feature extraction, run the ANOVA ranking mechanism on the
cardiac features, and then feed the features into the Classification Learner App to
detect the presence of heart disease. The journey
commenced with a thorough exploration of the dataset and a profound analysis of
heart disease attributes. Grasping the significance of each feature laid a robust
foundation for subsequent model development. Employing the Naive Bayes algorithm
for heart disease classification illuminated the efficacy of probabilistic models. The
simplicity of the approach and its ability to handle numerous features with ease made
it a compelling choice. However, the assumption of feature independence, albeit
"naive," highlighted the delicate balance between model simplicity and accuracy.
Transitioning to KNN, a proximity-based model, offered a fresh perspective. The
intuitive concept of classifying a data point based on its neighbors' consensus
resonated well. Experimenting with different values of k and observing their impact on
model performance underscored the importance of meticulous hyperparameter tuning.
Implementing SVM, with its capacity to handle non-linear decision boundaries, stood
out as a project highlight. The kernel trick, particularly the radial basis function (RBF)
kernel, empowered the model to capture intricate data relationships. However, the
model's sensitivity to hyperparameter settings underscored the necessity of a precise
optimization process. The comparative analysis of the three models shed light on the
trade-offs between simplicity and complexity. Naive Bayes excelled in simplicity, KNN
in intuitiveness, and SVM in capturing intricate patterns. The choice among them, I
realized, hinges on the problem's nature, dataset characteristics, and the desired level
of interpretability. In summary, the completion of machine learning models for heart
disease classification has been a gratifying experience. It has not only fortified my
technical skills but also instilled an appreciation for the interdisciplinary nature of
applying machine learning in healthcare. This reflection stands as a testament to the
iterative and dynamic nature of the learning process, encouraging a continuous pursuit
of knowledge and improvement.

Lim Yun Hui


In this assignment, I have learned about the characteristics of a normal heart, heart
failure and aortic stenosis, and thus recognising the features that can be extracted to
be used in classifying the 3 different classes (healthy, heart failure and aortic stenosis)
of them. Understanding the relationships between ejection fraction and end diastolic
volume enables me to determine the type of classifiers to be used to efficiently
separate the 3 classes. We have used SVM, Naive Bayes and KNN model as
classifiers. SVM is used to define a hyperplane that separates different classes. SVM
uses the support vectors which are the data points that lie closest to the decision
boundary (hyperplane) between different classes to determine the optimal hyperplane.
While the support vectors are essential for defining the decision boundary, they alone
are not enough to determine the predictions. The entire model, trained on the complete
dataset, contributes to making the accurate categorical predictions. Naive Bayes is a
classification algorithm that makes predictions based on probabilities. For Naive Bayes
classification, we make the assumption that features are independent given the class
label (hence, "naive"). For this example, the Naive Bayes equation is P(healthy |
EF = high, EDV = low) = [P(EF = high | healthy) · P(EDV = low | healthy) · P(healthy)]
/ P(EF = high, EDV = low). When a new instance, such as a patient with specific EF
and EDV values, is presented to the trained Naive Bayes model, the algorithm applies
the learned probabilities. It uses Bayes' theorem to calculate the probability of each
possible class (healthy or HF or AS) given the observed features. The class with the
highest calculated probability is then predicted as the most likely class for the new
instance. Lastly, "K" refers to the number of nearest neighbors that the algorithm
considers when making predictions or classifications. There is no one-size-fits-all rule
for determining the best value for K. MATLAB often gives users the ability to experiment
and select the optimal K value through various means. Since MATLAB provides
functions for cross-validation, it allows us to evaluate the performance of our KNN
model with different values of K. Cross-validation helps in assessing how well the
model generalizes to new, unseen data.
REFERENCES

Adekkanattu, P., Rasmussen, L. V., Pacheco, J. A., Kabariti, J., Stone, D. J., Yu, Y.,
Jiang, G., Luo, Y., Brandt, P. S., Xu, Z., Vekaria, V., Xu, J., Wang, F., Benda, N.
C., Peng, Y., Goyal, P., Ahmad, F. S., & Pathak, J. (2023). Prediction of left
ventricular ejection fraction changes in heart failure patients using machine
learning and electronic health records: a multi-site study. Scientific reports, 13(1),
294. https://doi.org/10.1038/s41598-023-27493-8

Awad, M., Khanna, R. (2015). Support Vector Machines for Classification. In: Efficient
Learning Machines. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-
5990-9_3

Biswas, A. K., et al. (2023). "An ensemble learning model for predicting the intention
to quit among employees using classification algorithms." Decision Analytics
Journal 9: 100335.

Boudoulas, Konstantinos D., Triposkiadis, F., & Boudoulas, H. (2018). The Aortic
Stenosis Complex. Cardiology, 140(3), 194-198. doi:10.1159/000486363

B. Manoj Kumar, P. S. U. P. (2022). Efficient Prediction of Heart Disease using SVM
Classification Algorithm and Compare its Performance with Linear Regression in
Terms of Accuracy. Journal of Pharmaceutical Negative Results, 1430-1437.
doi:10.47750/pnr.2022.13.S04.171

Cervantes, J., et al. (2020). "A comprehensive survey on support vector machine
classification: Applications, challenges and trends." Neurocomputing 408: 189-
215.

C, C. (2021, May 6-8). Prediction of Heart Disease using Different KNN Classifier.
Paper presented at the 2021 5th International Conference on Intelligent
Computing and Control Systems (ICICCS).

Devare, V. S. (2023). Heart Disease Prediction Using Binary Classification. California
State University, San Bernardino.

Garcia, J., Barker, A., & Markl, M. (2019). The Role of Imaging of Flow Patterns by
4D Flow MRI in Aortic Stenosis. JACC: Cardiovascular Imaging, 12(2), 252-266.
https://doi.org/10.1016/j.jcmg.2018.10.034

Han, H., & Jiang, X. (2014). Overcome support vector machine diagnosis overfitting.
Cancer informatics, 13(Suppl 1), 145–158. https://doi.org/10.4137/CIN.S13875

Hassan, C. A. U., Iqbal, J., Irfan, R., Hussain, S., Algarni, A. D., Bukhari, S. S. H.,
Alturki, N., & Ullah, S. S. (2022). Effectively Predicting the Presence of Coronary
Heart Disease Using Machine Learning Classifiers. Sensors (Basel, Switzerland),
22(19), 7227. https://doi.org/10.3390/s22197227

Inamdar, A. A., & Inamdar, A. C. (2016). Heart Failure: Diagnosis, Management and
Utilization. Journal of clinical medicine, 5(7), 62.
https://doi.org/10.3390/jcm5070062

Johnston, D. R., Zeeshan, A., & Caraballo, B. A. (2018). Chapter 29 - Aortic Stenosis.
In G. N. Levine (Ed.), Cardiology Secrets (Fifth Edition, pp. 269-276). Elsevier.

Kang H. (2013). The prevention and handling of the missing data. Korean journal of
anesthesiology, 64(5), 402–406. https://doi.org/10.4097/kjae.2013.64.5.402

Kerkhof, P. L. (2015). Characterizing heart failure in the ventricular volume domain.
Clinical Medicine Insights: Cardiology, 9(Suppl 1), 11-31.
https://doi.org/10.4137/CMC.S18744

Khateeb, N., & Usman, M. (2017). Efficient Heart Disease Prediction System using
K-Nearest Neighbor Classification Technique.

Leiner, T., Rueckert, D., Suinesiaputra, A., Baeßler, B., Nezafat, R., Išgum, I., & Young,
A. A. (2019). Machine learning in cardiovascular magnetic resonance: basic
concepts and applications. Journal of Cardiovascular Magnetic Resonance,
21(1), 61. doi:10.1186/s12968-019-0575-y

LeWinter, M. M., & Meyer, M. (2013). Mechanisms of diastolic dysfunction in heart
failure with a preserved ejection fraction: If it's not one thing it's another.
Circulation: Heart Failure, 6(6), 1112-1115.
https://doi.org/10.1161/CIRCHEARTFAILURE.113.000825

Mahmud, I., Kabir, M. M., Mridha, M. F., Alfarhood, S., Safran, M., & Che, D. (2023).
Cardiac Failure Forecasting Based on Clinical Data Using a Lightweight Machine
Learning Metamodel. Diagnostics (Basel, Switzerland), 13(15), 2540.
https://doi.org/10.3390/diagnostics13152540

Malik, A., Brito, D., Vaqar, S., et al. (2023). Congestive Heart Failure. In StatPearls.
Treasure Island (FL): StatPearls Publishing.
https://www.ncbi.nlm.nih.gov/books/NBK430873/

Mc Namara, K., Alzubaidi, H., & Jackson, J. K. (2019). Cardiovascular disease as a
leading cause of death: how are pharmacists getting involved? Integrated
Pharmacy Research & Practice, 8, 1-11. https://doi.org/10.2147/IPRP.S133088

Mohi Uddin, K. M., Ripa, R., Yeasmin, N., Biswas, N., & Dey, S. K. (2023). Machine
learning-based approach to the diagnosis of cardiovascular vascular disease
using a combined dataset. Intelligence-Based Medicine, 7, 100100.
https://doi.org/10.1016/j.ibmed.2023.100100

Otto, C. M., & Prendergast, B. (2014). Aortic-valve stenosis--from patients at risk to
severe valve obstruction. The New England Journal of Medicine, 371(8), 744-756.
https://doi.org/10.1056/NEJMra1313875

Paulus, W. J., Tschöpe, C., Sanderson, J. E., Rusconi, C., Flachskampf, F. A.,
Rademakers, F. E., Marino, P., Smiseth, O. A., De Keulenaer, G., Leite-Moreira,
A. F., Borbély, A., Edes, I., Handoko, M. L., Heymans, S., Pezzali, N., Pieske, B.,
Dickstein, K., Fraser, A. G., & Brutsaert, D. L. (2007). How to diagnose diastolic
heart failure: a consensus statement on the diagnosis of heart failure with normal
left ventricular ejection fraction by the Heart Failure and Echocardiography
Associations of the European Society of Cardiology. European heart journal,
28(20), 2539–2550. https://doi.org/10.1093/eurheartj/ehm037

Pisner, D. A., & Schnyer, D. M. (2020). Chapter 6 - Support vector machine. In
A. Mechelli & S. Vieira (Eds.), Machine Learning (pp. 101-121). Academic Press.

Ponikowski, P., Voors, A. A., Anker, S. D., Bueno, H., Cleland, J. G. F., Coats, A. J.
S., . . . van der Meer, P. (2016). 2016 ESC Guidelines for the diagnosis and
treatment of acute and chronic heart failure. European Heart Journal, 37(27),
2129-2200. https://doi.org/10.1093/eurheartj/ehw128

Pujari, S. H., & Agasthi, P. (2023). Aortic Stenosis. In StatPearls. Treasure Island (FL):
StatPearls Publishing.

Rana, M. (2022). Aortic Valve Stenosis: Diagnostic Approaches and
Recommendations of the 2021 ESC/EACTS Guidelines for the Management of
Valvular Heart Disease - A Review of the Literature. Cardiology and
Cardiovascular Medicine, 6(3), 315-324. https://doi.org/10.26502/fccm.92920267

Saini, A. (2024). Guide on Support Vector Machine (SVM) Algorithm. Analytics Vidhya.
https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/

Schwinger, R. H. G. (2021). Pathophysiology of heart failure. Cardiovascular Diagnosis
and Therapy, 11(1), 263-276. https://doi.org/10.21037/cdt-20-302

Sheth, V., Tripathi, U., & Sharma, A. (2022). A Comparative Analysis of Machine
Learning Algorithms for Classification Purpose. Procedia Computer Science,
215, 422-431. https://doi.org/10.1016/j.procs.2022.12.044

Spilias, N., et al. (2022). Left Ventricular Systolic Dysfunction in Aortic Stenosis:
Pathophysiology, Diagnosis, Management, and Future Directions. Structural
Heart, 6(5), 100089.

Tripoliti, E. E., Papadopoulos, T. G., Karanasiou, G. S., Naka, K. K., & Fotiadis, D. I.
(2017). Heart Failure: Diagnosis, Severity Estimation and Prediction of Adverse
Events Through Machine Learning Techniques. Computational and Structural
Biotechnology Journal, 15, 26-47. https://doi.org/10.1016/j.csbj.2016.11.001

Zhang, Z., Li, G., Xu, Y., & Tang, X. (2021). Application of Artificial Intelligence in the
MRI Classification Task of Human Brain Neurological and Psychiatric Diseases:
A Scoping Review. Diagnostics (Basel, Switzerland), 11(8), 1402.
https://doi.org/10.3390/diagnostics11081402

APPENDIX

Table of contributions

No Tasks Contributors
1 Introduction + Problem Statement Lim Yun Hui
2 Literature Review (Pre-Processing + Feature Extraction) Lim Yun Hui
3 Literature Review (Classification + Summary) Winny Wong Wen Ni
4 Methodology Winny Wong Wen Ni
5 Results Winny Wong Wen Ni
6 Discussion (SVM) Lim Yun Hui
7 Discussion (Naive Bayes + KNN) Winny Wong Wen Ni
8 Conclusion Lim Yun Hui
