FACULTY OF ENGINEERING
UNIVERSITY OF MALAYA
GROUP 7
Aortic stenosis (AS) is a common valvular disorder (Pujari & Agasthi, 2023)
which involves the narrowing or stenosis of the aortic valve, leading to left ventricular
outflow obstruction. It is the most common valvular heart disease in Europe and North
America, ranking third among cardiovascular diseases after arterial hypertension and
coronary artery disease (Rana, 2022). AS etiologies include congenital
(bicuspid/unicuspid), calcific, and rheumatic diseases.
Classic symptoms encompass angina, syncope, and dyspnea, with manifestations of
heart failure indicating a more severe form and a worse prognosis (Johnston &
Zeeshan, 2018). Many predisposing factors, such as age, hypertension and the
turbulence of perivalvular blood flow, contribute to degenerative aortic stenosis. In
In microscopic observations, Otto and Prendergast (2014) noted that in stenotic valves,
the initial aortic lesions contain disorganized collagen fibres, chronic inflammatory cells,
lipids and proteins of extracellular bone matrix and bone minerals, causing the
narrowing of the aortic valve.
A deviation in the elastic properties of the aorta elevates the velocity of the
aortic pulse wave and of the reflected wave. The reflected wave therefore returns
prematurely from the periphery to the root of the aorta, fusing with the systolic
portion of the pulse wave and ultimately contributing to the onset of systolic
hypertension (Boudoulas et al., 2018).
Problem Statement
LITERATURE REVIEW
PRE-PROCESSING
Raw data are vulnerable to noise, corruption, missing values, and inconsistencies,
so pre-processing steps are necessary. Poor-quality data can degrade accuracy and
lead to false predictions (Maharana & Mondal, 2022). By far the most common
approach to missing data is simply to omit the cases with missing values and
analyse the remaining data. This approach is known as complete-case (or
available-case) analysis, or listwise deletion. However, it is acceptable only if it
causes a relatively small loss of data (Kang, 2013).
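The listwise-deletion step described above can be sketched in a few lines of Python. The records, field names, and `listwise_delete` helper below are illustrative stand-ins, not the study's actual data or code.

```python
# Minimal sketch of listwise deletion (complete-case analysis):
# any record containing a missing value (None) is dropped entirely.
# Records and feature names here are illustrative, not the study's data.

records = [
    {"LVEF": 62.0, "EDV": 120.0, "label": "Healthy"},
    {"LVEF": 35.0, "EDV": None,  "label": "HF"},      # missing EDV -> dropped
    {"LVEF": 55.0, "EDV": 180.0, "label": "AS"},
]

def listwise_delete(rows):
    """Keep only rows in which every field is present (complete cases)."""
    return [r for r in rows if all(v is not None for v in r.values())]

complete = listwise_delete(records)
print(len(complete))  # 2 of the 3 records survive
```

Listwise deletion is only safe when, as Kang (2013) notes, the discarded fraction is small; here one of three records is lost.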
FEATURE EXTRACTION
Left ventricular ejection fraction (EF) is a key measure in the diagnosis and
treatment of heart failure (HF), and many patients experience changes in EF over time
(Adekkanattu & Rasmussen, 2023). HF falls into two major categories: heart failure
with preserved ejection fraction (HFpEF) and heart failure with reduced ejection
fraction (HFrEF). Preserved EF refers to heart failure in which the ejection fraction
is normal, more than 50%, yet the patient exhibits signs and symptoms of heart failure.
HFpEF is more common in women and in the elderly, with a regular left-ventricular (LV)
cavity volume but a dense, stiff LV wall (Inamdar & Inamdar, 2016). A related category,
HF with improved ejection fraction, describes patients whose previously reduced LVEF
has recovered to more than 40%. On the other hand, reduced EF is marked by a dilated
LV cavity, with the ratio of LV mass to end-diastolic volume being either normal or
reduced (Inamdar & Inamdar, 2016). In its milder form, reduced EF corresponds to an
LV ejection fraction in the range of 41% to 49%, while in the more serious condition
the LV ejection fraction is 40% or less (Malik & Brito, 2023). As the incidence of
HFrEF increases with age, AS often co-exists with LV systolic dysfunction:
approximately one-third of patients diagnosed with severe AS have LV systolic
dysfunction, defined as an LV ejection fraction (LVEF) of less than 50% (Spilias, 2022).
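The EF thresholds above can be summarised as a small lookup function. This is an illustrative sketch: the function name and category labels are our own choices, and borderline conventions may differ slightly between guidelines.

```python
# Hedged sketch mapping an LVEF value (%) to the heart-failure categories
# described above (thresholds per Malik & Brito, 2023; Inamdar & Inamdar, 2016).
# Function name and return labels are illustrative, not clinical terminology.

def ef_category(lvef: float) -> str:
    """Classify left ventricular ejection fraction (%) into HF EF categories."""
    if lvef >= 50.0:
        return "preserved (HFpEF range)"   # normal EF, >= 50%
    if lvef >= 41.0:
        return "mildly reduced"            # 41-49%
    return "reduced (HFrEF)"               # <= 40%

print(ef_category(58))  # preserved (HFpEF range)
print(ef_category(45))  # mildly reduced
print(ef_category(30))  # reduced (HFrEF)
```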
As mentioned, an HFpEF patient has an ejection fraction like that of a healthy
subject. The diagnosis of HFpEF therefore requires two conditions to be satisfied:
the patient must have signs and symptoms of heart failure, and there must be
evidence of diastolic LV dysfunction (Paulus & Tschöpe, 2007). HFpEF is a complex
and extremely common syndrome, accounting for more than 50% of HF patients
(LeWinter & Meyer, 2013). In heart-failure patients, a paradoxical increase in left
ventricular end-diastolic volume has been observed in association with the expected
decrease in right ventricular end-diastolic volume during lower-body suction
(Kerkhof, 2015).
CLASSIFICATIONS
Several techniques are used in the classification stage, such as support vector
machine (SVM), Naïve Bayes, and k-nearest neighbour (KNN).
(i) Support Vector Machine (SVM)
Based on the study of B. Manoj Kumar (2022), the research aimed to detect
heart disease using a Support Vector Machine (SVM) classifier and to compare its
performance with a Linear Regression (LR) model. The dataset, collected from the
UCI machine learning repository, consisted of 60 samples. The SVM classifier
achieved a significantly higher accuracy of 90.43%, compared to 78.56% for the LR
model. A statistical analysis of the two models, covering mean accuracy, standard
deviation, and standard error of the mean, together with a comparison of evaluation
metrics and significance levels, confirmed the superiority of the SVM classifier in
heart disease classification. In conclusion, the study demonstrated that the SVM
classifier outperforms the LR model in accurately detecting heart disease, with an
accuracy of 90.43%, and provides valuable insights into the efficient prediction of
heart disease using SVM classification algorithms.
According to Elsedimy, AboHashish, and Algarni (2023), a new approach for
predicting cardiovascular disease (CVD) was presented using a hybrid model of
Quantum-Behaved Particle Swarm Optimization (QPSO) and Support Vector Machine
(SVM), motivated by the importance of early CVD detection in reducing the risk of
heart attack and increasing the chance of recovery. In the proposed QPSO-SVM
model, the QPSO algorithm selects the optimal values of the SVM parameters to
improve classification accuracy; data pre-processing transforms nominal data into
numerical form and applies effective scaling techniques, and a self-adaptive
threshold method is employed. Trained on public heart disease datasets to forecast
patients' heart disease status from their current state, the model achieved a high
accuracy of 96.31% on the Cleveland heart disease dataset. When evaluated against
previous studies using metrics such as accuracy, sensitivity, specificity, precision,
G-mean, and F1 score, the QPSO-SVM model outperformed other state-of-the-art
prediction models, and a statistical analysis was presented to establish its
significance. Overall, the study demonstrates the potential of the proposed
QPSO-SVM model in predicting cardiovascular disease.
(ii) Naive Bayes
V Sai Krishna Reddy (2022) discusses the application of machine learning
(ML) techniques, specifically Decision Tree and Naïve Bayes classifiers, to predict
cardiovascular disease using the Heart Failure Dataset. The performance evaluation
provides the accuracy values and confusion matrices for both classifiers,
demonstrating the effectiveness of the Naïve Bayes algorithm, which achieved an
accuracy of 86% compared to the Decision Tree's 82%. The authors conclude that the
Naïve Bayes algorithm performed well and emphasize the importance of ML
techniques in predicting and preventing heart disease.
(iii) KNN
C (2021) presents a study on predicting heart disease using different
k-nearest neighbour (KNN) classifiers. The study applies machine learning
classification algorithms to heart disease data and predicts results using various
versions of the KNN classifier in MATLAB, evaluating each on accuracy,
misclassification rate, prediction speed, and training time, and discussing the
distance metrics and distance weights used by the algorithms. Performance varies
with the specific KNN model and its configuration: the Optimizable KNN model
achieved the highest accuracy, 69%, with a prediction speed of approximately 5600
observations per second, although it also had a relatively high training time of
about 50.125 seconds. Other KNN models, such as Fine KNN, Medium KNN, Coarse
KNN, and Weighted KNN, demonstrated varying levels of accuracy, misclassification
cost, and prediction speed.
SUMMARY
To classify subjects into healthy, aortic stenosis (AS), and heart failure (HF)
groups, feature extraction is a necessary pre-processing step to obtain the
significant cardiac features to be used with the classifiers. From the reviewed
studies on feature extraction, left ventricular ejection fraction (EF) is believed to
be an important cardiac feature for classification purposes. In addition, several
classifiers can be used to differentiate healthy subjects from those with heart
disease, such as k-Nearest Neighbour (KNN), Naïve Bayes, and Support Vector
Machine (SVM).
METHODOLOGY
The proposed algorithm is implemented in MATLAB (MathWorks) and classifies
heart disease conditions using SVM, Naïve Bayes, and KNN. The process is divided
into two phases: training and testing. After segmentation of the left and right
ventricles using the Segment software, cardiac features such as LVEF (left
ventricular ejection fraction), EDV (end-diastolic volume), EF (ejection fraction),
LVEDV (left ventricular end-diastolic volume), RVEF (right ventricular ejection
fraction), RVEDV (right ventricular end-diastolic volume), ESV (end-systolic
volume), SV (stroke volume), and HR (heart rate) are extracted into an Excel file.
Heart conditions are classified into three classes: HF (heart failure), AS (aortic
stenosis), and Healthy. There is a total of 66 records across the three classes,
after some records were removed due to missing values. Following the study by
Devare (2023) on heart disease prediction using binary classification, the dataset
is split into 70% for training and 30% for testing.
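As a rough illustration of the 70/30 split on a 66-record dataset, here is a standard-library Python sketch; the report itself performed the split in MATLAB, and the `split_70_30` helper, the fixed seed, and the stand-in dataset are our own assumptions.

```python
# Sketch of the 70/30 train/test split described above (after Devare, 2023),
# using only the Python standard library; the 66-record dataset is simulated.

import random

def split_70_30(data, seed=0):
    """Shuffle records and split them into 70% training and 30% testing sets."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = data[:]                 # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

dataset = list(range(66))              # stand-in for the 66 extracted records
train, test = split_70_30(dataset)
print(len(train), len(test))  # 46 20
```

Shuffling before the split matters here because the records are grouped by class; a contiguous split would leave one class under-represented in the test set.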
Feature selection is performed using ANOVA in the MATLAB Classification
Learner App. ANOVA, which stands for one-way analysis of variance, is a statistical
method used to analyse the differences among group means in a sample. Of the
original nine features, only the four with the highest influence are selected: LVEF,
EDV, EF, and LVEDV. The classification of heart conditions into the three classes
(HF, AS, and Healthy) is therefore based on these four cardiac features. The three
classifiers chosen for the classification task in the MATLAB Classification Learner
App are Support Vector Machine (SVM), KNN, and Naïve Bayes.
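To illustrate how one-way ANOVA ranks features, the following self-contained Python sketch computes an F-statistic per feature and keeps the highest-scoring ones. The per-class sample values and the `anova_f` helper are synthetic illustrations, not the study's data; the report performed this step inside the MATLAB Classification Learner App.

```python
# Illustrative ANOVA-based feature ranking: for each feature, a one-way ANOVA
# F-statistic compares its means across the classes; features with the largest
# F-values are retained. All sample values below are synthetic.

def anova_f(groups):
    """One-way ANOVA F-statistic for a feature split into per-class groups."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # between-group sum of squares (group size times squared mean deviation)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares (spread around each group's own mean)
    ssw = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# synthetic per-class samples for two candidate features (3 classes each)
lvef_by_class = [[60, 62, 58], [30, 33, 35], [55, 54, 57]]  # separates classes
hr_by_class   = [[70, 80, 75], [72, 78, 74], [71, 79, 76]]  # barely separates

scores = {"LVEF": anova_f(lvef_by_class), "HR": anova_f(hr_by_class)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # LVEF
```

A feature whose class means differ widely relative to its within-class spread (here, the LVEF-like feature) gets a large F and ranks first, mirroring how the four top features were chosen from the original nine.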
SVMs are among the most powerful and robust classification algorithms and
have long played a significant role in pattern recognition (Cervantes, 2020). They
afford distinctively balanced predictive performance, even in studies where sample
sizes may be limited (Pisner & Schnyer, 2020). SVM is a sparse technique: like
nonparametric methods, it requires all the training data to be available, that is,
stored in memory, during the training phase, when the parameters of the SVM model
are learned. However, once the model parameters are identified, prediction depends
only on a subset of the training instances, called support vectors, which define the
margins of the hyperplanes (Awad & Khanna, 2015). Contrary to the general
assumption that a nonlinear decision boundary is more effective in SVM
classification than a linear one, Han and Jiang (2014) found through rigorous kernel
analysis that SVM is prone to overfitting with nonlinear kernels. We therefore use a
linear-kernel SVM.
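As a rough sketch of what a linear-kernel SVM optimises, the snippet below trains a soft-margin linear SVM by sub-gradient descent on the hinge loss. The tiny synthetic dataset, the learning rate, and the penalty setting are illustrative assumptions, not the MATLAB implementation used in the report.

```python
# Minimal linear SVM trained by sub-gradient descent on the soft-margin
# hinge loss: minimise (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))).
# Labels are +1 / -1; the two-feature data below is synthetic.

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Return weights w and bias b for a soft-margin linear SVM."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # point inside the margin: hinge term is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # correctly classified: only the regulariser acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# synthetic, linearly separable two-feature data (e.g. scaled LVEF, EDV)
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

Only the points that end up on or inside the margin influence the final `w`; those are the support vectors the paragraph above refers to.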
Naive Bayes is a probabilistic classification algorithm rooted in Bayes' theorem,
a method that computes the probability of a predictor given a specific class. It operates
under the assumption that a predictor's impact on a class remains independent of the
values of other predictors, a principle termed class conditional independence. This
algorithm finds notable application in medical science, particularly in the diagnosis of
heart patients, owing to its straightforwardness and capacity to deliver precise
outcomes. Despite its simplicity, the Naive Bayes classifier frequently demonstrates
robust performance and enjoys widespread usage, often surpassing more intricate
classification methods. It leverages probability theory to ascertain the most probable
classification for an unseen instance and proves effective with categorical data, though
it may exhibit suboptimal performance with numerical data in the training set. In the
realm of heart disease prediction, the Naive Bayes algorithm has been effectively
employed to categorize medical data, yielding accurate predictions of heart diseases
with a noteworthy level of precision (K. Vembandasamy, 2015).
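A minimal Gaussian Naive Bayes classifier, illustrating the class-conditional independence assumption described above, might look like the following; the synthetic samples and helper names are our own, not the cited study's.

```python
# Hedged sketch of Gaussian Naive Bayes: within each class, every feature is
# modelled by an independent normal distribution (the "naive" assumption),
# and the class with the highest log-posterior wins. Data is synthetic.

import math

def fit_gnb(X, y):
    """Estimate per-class feature means/variances and class priors."""
    model = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        means = [sum(col) / len(col) for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (means, varis, len(rows) / len(X))
    return model

def predict_gnb(model, x):
    """Pick the class maximising log prior + sum of per-feature log-likelihoods."""
    def log_post(c):
        means, varis, prior = model[c]
        ll = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                 for xi, m, v in zip(x, means, varis))
        return math.log(prior) + ll
    return max(model, key=log_post)

# synthetic (LVEF, EDV)-like samples for two classes
X = [[60, 120], [62, 115], [58, 125], [30, 200], [33, 210], [35, 190]]
y = ["Healthy", "Healthy", "Healthy", "HF", "HF", "HF"]
model = fit_gnb(X, y)
print(predict_gnb(model, [61, 118]))  # Healthy
print(predict_gnb(model, [32, 205]))  # HF
```

Because the log-likelihood is a simple sum over features, the classifier stays cheap even with many features, which is part of the straightforwardness noted above.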
KNN is employed in machine learning for tasks related to classification and
regression. As a non-parametric approach, KNN does not assume a specific data
distribution and operates without a dedicated training phase. Instead, it categorizes
new data points by assessing their resemblance to existing data points within the
feature space. The algorithm assigns a label to a fresh observation based on the
predominant class among its closest neighbours, with "closeness" determined by a
distance metric such as Euclidean distance or cosine similarity. KNN finds extensive
application in tasks such as pattern recognition, anomaly detection, and
recommendation systems, thanks to its simplicity and effectiveness in managing
intricate datasets (Jabbar, Deekshatulu, & Chandra, 2013).
RESULTS
(iii) KNN
Cubic KNN reached a testing accuracy of 75.0%.
Figure: Summary of Validation and Testing results of Cubic KNN.
CONCLUSION
Support Vector Machine (SVM) stands out as the preferred choice for heart
disease classification when compared to other machine learning techniques, owing
to its ability to handle high-dimensional feature spaces and to accommodate
non-linear classification through kernel functions. To enhance SVM's performance,
normalisation and hyperparameter tuning, particularly of the regularisation
parameter and the kernel selection, are useful for optimal results. Current
innovations in SVM applications involve ensemble techniques that combine various
algorithms, such as Naïve Bayes and random forests, showcasing the potential for
even more robust and accurate diagnostic models (Biswas, 2023). Ultimately, the
diagnosis of heart disease can be significantly improved by integrating SVM with
advanced ensemble methods, addressing missing values, and employing
normalisation techniques, ensuring precise and early detection for effective
intervention.
REFLECTIONS
Winny Wong Wen Ni
In this assignment, the most interesting work I did was applying my MATLAB
knowledge to perform feature extraction, run a ranking mechanism (ANOVA) on the
cardiac features, and then feed those features into the Classification Learner App
to detect the presence of heart disease. The journey
commenced with a thorough exploration of the dataset and a profound analysis of
heart disease attributes. Grasping the significance of each feature laid a robust
foundation for subsequent model development. Employing the Naive Bayes algorithm
for heart disease classification illuminated the efficacy of probabilistic models. The
simplicity of the approach and its ability to handle numerous features with ease made
it a compelling choice. However, the assumption of feature independence, albeit
"naive," highlighted the delicate balance between model simplicity and accuracy.
Transitioning to KNN, a proximity-based model, offered a fresh perspective. The
intuitive concept of classifying a data point based on its neighbors' consensus
resonated well. Experimenting with different values of k and observing their impact on
model performance underscored the importance of meticulous hyperparameter tuning.
Implementing SVM, with its capacity to handle non-linear decision boundaries, stood
out as a project highlight. The kernel trick, particularly the radial basis function (RBF)
kernel, empowered the model to capture intricate data relationships. However, the
model's sensitivity to hyperparameter settings underscored the necessity of a precise
optimization process. The comparative analysis of the three models shed light on the
trade-offs between simplicity and complexity. Naive Bayes excelled in simplicity, KNN
in intuitiveness, and SVM in capturing intricate patterns. The choice among them, I
realized, hinges on the problem's nature, dataset characteristics, and the desired level
of interpretability. In summary, the completion of machine learning models for heart
disease classification has been a gratifying experience. It has not only fortified my
technical skills but also instilled an appreciation for the interdisciplinary nature of
applying machine learning in healthcare. This reflection stands as a testament to the
iterative and dynamic nature of the learning process, encouraging a continuous pursuit
of knowledge and improvement.
Adekkanattu, P., Rasmussen, L. V., Pacheco, J. A., Kabariti, J., Stone, D. J., Yu, Y.,
Jiang, G., Luo, Y., Brandt, P. S., Xu, Z., Vekaria, V., Xu, J., Wang, F., Benda, N.
C., Peng, Y., Goyal, P., Ahmad, F. S., & Pathak, J. (2023). Prediction of left
ventricular ejection fraction changes in heart failure patients using machine
learning and electronic health records: a multi-site study. Scientific reports, 13(1),
294. https://doi.org/10.1038/s41598-023-27493-8
Awad, M., Khanna, R. (2015). Support Vector Machines for Classification. In: Efficient
Learning Machines. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-
5990-9_3
Biswas, A. K., et al. (2023). "An ensemble learning model for predicting the intention
to quit among employees using classification algorithms." Decision Analytics
Journal 9: 100335.
Boudoulas, Konstantinos D., Triposkiadis, F., & Boudoulas, H. (2018). The Aortic
Stenosis Complex. Cardiology, 140(3), 194-198. doi:10.1159/000486363
Cervantes, J., et al. (2020). "A comprehensive survey on support vector machine
classification: Applications, challenges and trends." Neurocomputing 408: 189-
215.
C, C. (2021, May 6-8). Prediction of Heart Disease using Different KNN Classifier.
Paper presented at the 2021 5th International Conference on Intelligent
Computing and Control Systems (ICICCS).
Devare, V. S. (2023). Heart Disease Prediction Using Binary Classification California
State University, San Bernardino.
Garcia, J, Barker, A, Markl, M. The Role of Imaging of Flow Patterns by 4D Flow MRI
in Aortic Stenosis. J Am Coll Cardiol Img. 2019 Feb, 12 (2) 252–266.
https://doi.org/10.1016/j.jcmg.2018.10.034
Han, H., & Jiang, X. (2014). Overcome support vector machine diagnosis overfitting.
Cancer informatics, 13(Suppl 1), 145–158. https://doi.org/10.4137/CIN.S13875
Hassan, C. A. U., Iqbal, J., Irfan, R., Hussain, S., Algarni, A. D., Bukhari, S. S. H.,
Alturki, N., & Ullah, S. S. (2022). Effectively Predicting the Presence of Coronary
Heart Disease Using Machine Learning Classifiers. Sensors (Basel, Switzerland),
22(19), 7227. https://doi.org/10.3390/s22197227
Inamdar, A. A., & Inamdar, A. C. (2016). Heart Failure: Diagnosis, Management and
Utilization. Journal of clinical medicine, 5(7), 62.
https://doi.org/10.3390/jcm5070062
Johnston, D. R., Zeeshan, A., & Caraballo, B. A. (2018). Chapter 29 - Aortic Stenosis.
In G. N. Levine (Ed.), Cardiology Secrets (Fifth Edition) (pp. 269-276): Elsevier.
Kang H. (2013). The prevention and handling of the missing data. Korean journal of
anesthesiology, 64(5), 402–406. https://doi.org/10.4097/kjae.2013.64.5.402
Khateeb, N., & Usman, M. (2017). Efficient Heart Disease Prediction System using K-
Nearest Neighbor Classification Technique.
Leiner, T., Rueckert, D., Suinesiaputra, A., Baeßler, B., Nezafat, R., Išgum, I., & Young,
A. A. (2019). Machine learning in cardiovascular magnetic resonance: basic
concepts and applications. Journal of Cardiovascular Magnetic Resonance,
21(1), 61. doi:10.1186/s12968-019-0575-y
Mahmud, I., Kabir, M. M., Mridha, M. F., Alfarhood, S., Safran, M., & Che, D. (2023).
Cardiac Failure Forecasting Based on Clinical Data Using a Lightweight Machine
Learning Metamodel. Diagnostics (Basel, Switzerland), 13(15), 2540.
https://doi.org/10.3390/diagnostics13152540
Malik A, Brito D, Vaqar S, et al. Congestive Heart Failure. [Updated 2023 Nov 5]. In:
StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2023 Jan-.
Available from: https://www.ncbi.nlm.nih.gov/books/NBK430873/
Mohi Uddin, K. M., Ripa, R., Yeasmin, N., Biswas, N., & Dey, S. K. (2023). Machine
learning-based approach to the diagnosis of cardiovascular vascular disease
using a combined dataset. Intelligence-Based Medicine, 7, 100100.
doi:https://doi.org/10.1016/j.ibmed.2023.100100
Ponikowski, P., Voors, A. A., Anker, S. D., Bueno, H., Cleland, J. G. F., Coats, A. J.
S., . . . van der Meer, P. (2016). Eur Heart J, 37(27), 2129-2200.
doi:10.1093/eurheartj/ehw128
Pujari, S. H., & Agasthi, P. (2023). Aortic Stenosis. In StatPearls. Treasure Island
(FL): StatPearls Publishing.
Saini, A. (2024). Guide on Support Vector Machine (SVM) Algorithm. Retrieved from
https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-
a-complete-guide-for-
beginners/#:~:text=A.%20SVM%20is%20considered%20one,linear%20classif
ication%20using%20kernel%20functions.
Schwinger R. H. G. (2021). Pathophysiology of heart failure. Cardiovascular diagnosis
and therapy, 11(1), 263–276. https://doi.org/10.21037/cdt-20-302
Sheth, V., Tripathi, U., & Sharma, A. (2022). A Comparative Analysis of Machine
Learning Algorithms for Classification Purpose. Procedia Computer Science,
215, 422-431. doi:https://doi.org/10.1016/j.procs.2022.12.044
Spilias, N., et al. (2022). "Left Ventricular Systolic Dysfunction in Aortic Stenosis:
Pathophysiology, Diagnosis, Management, and Future Directions." Structural
Heart 6(5): 100089.
Tripoliti, E. E., Papadopoulos, T. G., Karanasiou, G. S., Naka, K. K., & Fotiadis, D. I.
(2017). Heart Failure: Diagnosis, Severity Estimation and Prediction of Adverse
Events Through Machine Learning Techniques. Computational and Structural
Biotechnology Journal, 15, 26-47. doi:https://doi.org/10.1016/j.csbj.2016.11.001
Zhang, Z., Li, G., Xu, Y., & Tang, X. (2021). Application of Artificial Intelligence in the
MRI Classification Task of Human Brain Neurological and Psychiatric Diseases:
A Scoping Review. Diagnostics (Basel, Switzerland), 11(8), 1402.
https://doi.org/10.3390/diagnostics11081402
APPENDIX
Table of contributions
No  Tasks                                                    Contributors
1   Introduction + Problem Statement                         Lim Yun Hui
2   Literature Review (Pre-Processing + Feature Extraction)  Lim Yun Hui
3   Literature Review (Classification + Summary)             Winny Wong Wen Ni
4   Methodology                                              Winny Wong Wen Ni
5   Results                                                  Winny Wong Wen Ni
6   Discussion (SVM)                                         Lim Yun Hui
7   Discussion (Naive Bayes + KNN)                           Winny Wong Wen Ni
8   Conclusion                                               Lim Yun Hui