Heart Disease Prediction - Ipynb
"cells": [
{
"cell_type": "markdown",
"id": "cbe3afff",
"metadata": {},
"source": [
"# Prediction of Heart Disease\n",
"\n",
"In this project, we build a Machine Learning (ML) model that is capable of
predicting whether a person has heart disease or not, using the medical attributes
of the person. Thus, we have a binary classification problem at hand and we solve
it via supervised Machine Learning."
]
},
{
"cell_type": "markdown",
"id": "c51e4bef",
"metadata": {},
"source": [
"## Data\n",
"\n",
"The dataset to train and test the model is taken from the `UC Irvine Machine
Learning Repository`. We shall use the Cleveland dataset
(https://archive.ics.uci.edu/dataset/45/heart+disease), which has the following
medical attributes:\n",
"\n",
"* `age`: age in years.\n",
"* `sex`: biological sex.\n",
" * value 1 = male\n",
" * value 0 = female\n",
"* `cp`: chest pain type.\n",
" * value 1 = typical angina\n",
" * value 2 = atypical angina\n",
" * value 3 = non-anginal pain\n",
" * value 4 = asymptomatic\n",
"* `trestbps`: resting blood pressure in mm Hg on admission to the hospital.\
n",
"* `chol`: serum cholestoral in mg/dl.\n",
"* `fbs`: fasting blood sugar > 120 mg/dl.\n",
" * value 1 = true\n",
" * value 0 = false\n",
"* `restecg`: resting electrocardiographic results.\n",
" * value 0 = normal\n",
" * value 1 = having ST-T wave abnormality\n",
" * value 2 = showing probable or definite left ventricular hypertrophy by
Estes' criteria\n",
"* `thalach`: maximum heart rate achieved.\n",
"* `exang`: exercise induced angina.\n",
" * value 1 = yes\n",
" * value 0 = no\n",
"* `oldpeak`: ST depression induced by exercise relative to rest.\n",
"* `slope`: the slope of the peak exercise ST segment.\n",
" * value 1 = upsloping\n",
" * value 2 = flat\n",
" * value 3 = downsloping\n",
"* `ca`: number of major vessels (0-3) colored by flourosopy.\n",
"* `thal`: An inherited blood disorder (thalassemia)\n",
" * value 3 = normal\n",
" * value 6 = fixed defect\n",
" * value 7 = reversable defect\n",
"* `num`: diagnosis of heart disease (angiographic disease status).\n",
" * value 0 = has no heart disease\n",
" * value $\\in$ {1,2,3,4} = has heart disease"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a04e28ec",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import classification_report, ConfusionMatrixDisplay,
RocCurveDisplay \n",
"from sklearn.model_selection import cross_val_score, GridSearchCV,
train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.svm import SVC\n",
"from xgboost import XGBClassifier\n",
"\n",
"# render plots in the notebook.\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "9b32ef4c",
"metadata": {},
"source": [
"## Data Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5fe200fd",
"metadata": {},
"outputs": [],
"source": [
"# names of the medical attributes.\n",
"medical_attributes =
[\"age\", \"sex\", \"cp\", \"trestbps\", \"chol\", \"fbs\", \"restecg\",\n",