
Auto-ECG Cardiac Monitoring Project: Heart Attack Risk Assessment Algorithm Based on Machine Learning Models - Logistic Regression, KNN, and Random Forest Classifier.

The study assessing the model’s validity, which was published this week in Nature
Medicine, indicates that distinct patterns in the peaks and valleys of ECG readings are
crucial for clinicians to recognize a heart attack, but up to two-thirds of heart attacks do
not have these patterns.

For patients with serious heart attacks caused by total blockages of a coronary artery,
known as ST-elevation myocardial infarction (STEMI), failure to recognize clues
in the ECG can lead to life-threatening complications.

The ML tool is designed to help detect subtle patterns in the ECG that may be clinically
relevant but are easily missed by clinicians. The model also classifies patients based on
whether they are at low, intermediate, or high risk of heart attack.

The model was developed using ECG data from 4,026 patients with chest pain at three
hospitals in Pittsburgh, Pennsylvania. The tool was then validated using data from 3,287
patients at a separate health system.

To test the algorithm, the researchers compared its performance to three gold standards
for evaluating cardiac events: experienced clinician interpretation of ECG, commercial
ECG algorithms, and the History, Electrocardiogram, Age, Risk factors, Troponin
(HEART) score.

The researchers indicated that they had hoped to match the accuracy of the HEART score with their model, but they found that the ML model significantly exceeded the performance of the gold-standard approaches. The tool was able to accurately reclassify one in three chest pain patients as low, intermediate, or high risk.
These findings led the research team to conclude that the tool has the potential to help
emergency medical services (EMS) and emergency department personnel flag people
having a heart attack in a more robust way than with conventional ECG analysis.

“This information can help guide EMS medical decisions such as initiating certain
treatments in the field or alerting hospitals that a high-risk patient is incoming,”
explained study co-author Christian Martin-Gill, MD, MPH, chief of the EMS division at
UPMC. “On the flip side, it’s also exciting that it can help identify low-risk patients who
don’t need to go to a hospital with a specialized cardiac facility, which could improve
prehospital triage.”

Moving forward, the researchers are partnering with the City of Pittsburgh Bureau of
Emergency Medical Services to integrate the model into a cloud-based system that
would allow hospital command centers to analyze and risk stratify patients based on
ECG readings received from EMS, which could support medical decision-making in real time.

Other AI systems are also being developed to predict heart attack risk.

Last year, researchers at Johns Hopkins University created Survival Study of Cardiac
Arrhythmia Risk (SSCAR), an AI system designed to improve the accuracy of predicting
cardiac arrest using raw images of patients’ hearts and demographic data.

Using this information, the system analyzes patterns not visible to the naked eye to
detect important indicators of ten-year cardiac arrest risk.

To predict this, we use 14 medical attributes of a patient and classify whether the patient is likely to have heart disease. These medical attributes are used to train three algorithms: Logistic Regression, KNN, and Random Forest Classifier.
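As a minimal sketch of that tabular setup (assuming a hypothetical heart.csv with 14 feature columns and a binary target column; the file name and column names are illustrative, not from the original data):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical layout: 14 medical attributes plus a binary 'target' column.
df = pd.read_csv('heart.csv')
X, y = df.drop(columns=['target']), df['target']
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# The three algorithms named above, with default hyperparameters.
for name, clf in [('Logistic regression', LogisticRegression(max_iter=1000)),
                  ('KNN', KNeighborsClassifier()),
                  ('Random Forest', RandomForestClassifier(random_state=42))]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))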
INTRODUCTION

In this notebook I want to predict different arrhythmias from ECG signals. We have two different datasets, but I will consider only one at the start: mitbih. The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4,000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well represented in a small random sample.

Source: https://physionet.org/content/mitdb/1.0.0/

Arrhythmia Dataset

Number of Samples: 109446

Number of Categories: 5

Sampling Frequency: 125Hz

Data Source: Physionet's MIT-BIH Arrhythmia Dataset


Classes: {'N': 0, 'S': 1, 'V': 2, 'F': 3, 'Q': 4}

- N: Non-ectopic beats (normal beats)
- S: Supraventricular ectopic beats
- V: Ventricular ectopic beats
- F: Fusion beats
- Q: Unknown beats
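For later plots it is convenient to keep this mapping as a plain Python dict (a small helper of my own, not part of the dataset files):

# Integer label in column 187 -> class symbol.
label_names = {0: 'N', 1: 'S', 2: 'V', 3: 'F', 4: 'Q'}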

What is an ECG?

An electrocardiogram (ECG) is a simple test that can be used to check your heart's rhythm and
electrical activity.

Sensors attached to the skin are used to detect the electrical signals produced by your heart each
time it beats.

These signals are recorded by a machine and are looked at by a doctor to see if they're unusual.

An ECG may be requested by a heart specialist (cardiologist) or any doctor who thinks you might have a problem with your heart, including your GP. It is the results of this test that we will analyze.

Load Data

In [1]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.
/kaggle/input/heartbeat/mitbih_test.csv
/kaggle/input/heartbeat/mitbih_train.csv
/kaggle/input/heartbeat/ptbdb_normal.csv
/kaggle/input/heartbeat/ptbdb_abnormal.csv
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix
from keras.utils.np_utils import to_categorical
from sklearn.utils import class_weight
import warnings
warnings.filterwarnings('ignore')
Using TensorFlow backend.
In [3]:
train_df=pd.read_csv('/kaggle/input/heartbeat/mitbih_train.csv',header=None)
test_df=pd.read_csv('/kaggle/input/heartbeat/mitbih_test.csv',header=None)
Balance of dataset

In [4]:
train_df[187]=train_df[187].astype(int)
equilibre=train_df[187].value_counts()
print(equilibre)
plt.bar(equilibre.index,equilibre)
0 72471
4 6431
2 5788
1 2223
3 641
Name: 187, dtype: int64
Out[4]:
<BarContainer object of 5 artists>

We can see a huge imbalance between the classes. After some experimentation I decided to use the resampling technique rather than class weights for the algorithms.

In [5]:
from sklearn.utils import resample
# Keep each minority class as-is and downsample the majority class (normal beats) to 20,000 rows.
df_1=train_df[train_df[187]==1]
df_2=train_df[train_df[187]==2]
df_3=train_df[train_df[187]==3]
df_4=train_df[train_df[187]==4]
df_0=(train_df[train_df[187]==0]).sample(n=20000,random_state=42)

# Upsample each minority class (with replacement) to 20,000 rows as well.
df_1_upsample=resample(df_1,replace=True,n_samples=20000,random_state=123)
df_2_upsample=resample(df_2,replace=True,n_samples=20000,random_state=124)
df_3_upsample=resample(df_3,replace=True,n_samples=20000,random_state=125)
df_4_upsample=resample(df_4,replace=True,n_samples=20000,random_state=126)

train_df=pd.concat([df_0,df_1_upsample,df_2_upsample,df_3_upsample,df_4_upsample])
In [6]:
equilibre=train_df[187].value_counts()
print(equilibre)
plt.bar(equilibre.index,equilibre)
4 20000
3 20000
2 20000
1 20000
0 20000
Name: 187, dtype: int64
Out[6]:
<BarContainer object of 5 artists>

Resampling works perfectly, so we can go on.
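For reference, the class-weight alternative mentioned above would have looked roughly like this sketch (it reuses the class_weight utility imported earlier; raw_labels is my own name for the labels before resampling, not a variable from the notebook):

# Balanced weights computed from the original, unresampled label column.
raw_labels = pd.read_csv('/kaggle/input/heartbeat/mitbih_train.csv', header=None)[187].astype(int)
weights = class_weight.compute_class_weight(class_weight='balanced',
                                            classes=np.unique(raw_labels),
                                            y=raw_labels)
weight_dict = dict(enumerate(weights))
# The dict could then be passed to Keras: model.fit(..., class_weight=weight_dict)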

Classes

In this part I want to study the different classes.


In [7]:
c=train_df.groupby(187,group_keys=False).apply(lambda train_df : train_df.sample(1))
I take one sample per class and store it in a dataframe in order to have an example.

In [8]:
c
Out[8]:

[DataFrame output: one sampled heartbeat per class (labels 0-4 in column 187); the 186 amplitude columns are omitted here.]

5 rows × 188 columns

In [9]:
plt.plot(c.iloc[0,:186])
Out[9]:
[<matplotlib.lines.Line2D at 0x7fee779a95f8>]

Here is a normal beat. I don't have anything particular to say about that class.

In [10]:
def plot_hist(class_number,size,min_):
    # Collect every signal of the requested class and keep sample points min_..size-1.
    img=train_df.loc[train_df[187]==class_number].values
    img=img[:,min_:size]
    img_flatten=img.flatten()

    # Build matching x-coordinates: one arange(min_, size) per signal.
    final1=np.arange(min_,size)
    for i in range(img.shape[0]-1):
        tempo1=np.arange(min_,size)
        final1=np.concatenate((final1, tempo1), axis=None)
    print(len(final1))
    print(len(img_flatten))
    plt.hist2d(final1,img_flatten, bins=(80,80),cmap=plt.cm.jet)
    plt.show()
In [11]:
plot_hist(0,70,5)
1300000
1300000

Here is a representation of all the signals in the class: we take every signal and map them into a 2D histogram. This gives an estimate of what the signal can look like.

In [12]:
plt.plot(c.iloc[1,:186])
Out[12]:
[<matplotlib.lines.Line2D at 0x7fee778f3358>]

In [13]:
plot_hist(1,50,5)
900000
900000

In [14]:
plt.plot(c.iloc[2,:186])
Out[14]:
[<matplotlib.lines.Line2D at 0x7fee7782da58>]

In [15]:
plot_hist(2,60,30)
600000
600000

Here is an example of the two classes: in the second and third lines you have classes 2 and 3.

In [16]:
plt.plot(c.iloc[3,:186])
Out[16]:
[<matplotlib.lines.Line2D at 0x7fee77766320>]

In [17]:
plot_hist(3,60,25)
700000
700000

Fusion beat:

I don't really see the difference from the previous one, but I'm not an ECG expert!

In [18]:
plt.plot(c.iloc[4,:186])
Out[18]:
[<matplotlib.lines.Line2D at 0x7fee776a10b8>]

I will not comment much on this one, because it corresponds to the unknown class.

In [19]:
plot_hist(4,50,18)
640000
640000

Preprocessing

In this part I will explain what I do to transform the data.

In [20]:
def add_gaussian_noise(signal):
    # Zero-mean Gaussian noise, std 0.05, one value per sample point of the beat.
    noise=np.random.normal(0,0.05,186)
    return (signal+noise)
I use a function (this may depend on the version) that adds noise to the data to help my training generalize.

In [21]:
tempo=c.iloc[0,:186]
bruiter=add_gaussian_noise(tempo)
plt.subplot(2,1,1)
plt.plot(c.iloc[0,:186])

plt.subplot(2,1,2)
plt.plot(bruiter)

plt.show()

In [22]:
target_train=train_df[187]
target_test=test_df[187]
y_train=to_categorical(target_train)
y_test=to_categorical(target_test)
In [23]:
X_train=train_df.iloc[:,:186].values
X_test=test_df.iloc[:,:186].values
for i in range(len(X_train)):
    X_train[i,:186]= add_gaussian_noise(X_train[i,:186])
X_train = X_train.reshape(len(X_train), X_train.shape[1],1)
X_test = X_test.reshape(len(X_test), X_test.shape[1],1)
Network

In [24]:
def network(X_train,y_train,X_test,y_test):

    im_shape=(X_train.shape[1],1)
    inputs_cnn=Input(shape=(im_shape), name='inputs_cnn')
    # Three Conv1D blocks (Conv -> BatchNorm -> MaxPool), then two dense layers.
    conv1_1=Convolution1D(64, (6), activation='relu', input_shape=im_shape)(inputs_cnn)
    conv1_1=BatchNormalization()(conv1_1)
    pool1=MaxPool1D(pool_size=(3), strides=(2), padding="same")(conv1_1)
    conv2_1=Convolution1D(64, (3), activation='relu', input_shape=im_shape)(pool1)
    conv2_1=BatchNormalization()(conv2_1)
    pool2=MaxPool1D(pool_size=(2), strides=(2), padding="same")(conv2_1)
    conv3_1=Convolution1D(64, (3), activation='relu', input_shape=im_shape)(pool2)
    conv3_1=BatchNormalization()(conv3_1)
    pool3=MaxPool1D(pool_size=(2), strides=(2), padding="same")(conv3_1)
    flatten=Flatten()(pool3)
    dense_end1 = Dense(64, activation='relu')(flatten)
    dense_end2 = Dense(32, activation='relu')(dense_end1)
    main_output = Dense(5, activation='softmax', name='main_output')(dense_end2)

    model = Model(inputs=inputs_cnn, outputs=main_output)

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Stop when validation loss stalls and keep the best checkpoint on disk.
    callbacks = [EarlyStopping(monitor='val_loss', patience=8),
                 ModelCheckpoint(filepath='best_model.h5', monitor='val_loss',
                                 save_best_only=True)]

    history=model.fit(X_train, y_train, epochs=40, callbacks=callbacks,
                      batch_size=32, validation_data=(X_test,y_test))
    model.load_weights('best_model.h5')
    return(model,history)
In [25]:
def evaluate_model(history,X_test,y_test,model):
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Accuracy: %.2f%%" % (scores[1]*100))

    print(history)
    fig1, ax_acc = plt.subplots()
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Model - Accuracy')
    plt.legend(['Training', 'Validation'], loc='lower right')
    plt.show()

    fig2, ax_loss = plt.subplots()
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Model - Loss')
    # The legend must come after the plot calls so it can pick up the two lines.
    plt.legend(['Training', 'Validation'], loc='upper right')
    plt.show()

    target_names=['0','1','2','3','4']

    # Recover integer labels from the one-hot test targets.
    y_true=[]
    for element in y_test:
        y_true.append(np.argmax(element))
    prediction_proba=model.predict(X_test)
    prediction=np.argmax(prediction_proba,axis=1)
    cnf_matrix = confusion_matrix(y_true, prediction)

In [26]:
from keras.layers import Dense, Convolution1D, MaxPool1D, Flatten, Dropout
from keras.layers import Input
from keras.models import Model
from keras.layers.normalization import BatchNormalization
import keras
from keras.callbacks import EarlyStopping, ModelCheckpoint

model,history=network(X_train,y_train,X_test,y_test)
Train on 100000 samples, validate on 21892 samples
Epoch 1/40
100000/100000 [==============================] - 107s 1ms/step - loss: 0.2446 -
accuracy: 0.9106 - val_loss: 0.1766 - val_accuracy: 0.9467
Epoch 2/40
100000/100000 [==============================] - 105s 1ms/step - loss: 0.1237 -
accuracy: 0.9549 - val_loss: 0.2360 - val_accuracy: 0.9143
Epoch 3/40
100000/100000 [==============================] - 107s 1ms/step - loss: 0.0915 -
accuracy: 0.9668 - val_loss: 0.1536 - val_accuracy: 0.9503
Epoch 4/40
100000/100000 [==============================] - 103s 1ms/step - loss: 0.0753 -
accuracy: 0.9733 - val_loss: 0.2048 - val_accuracy: 0.9327
Epoch 5/40
100000/100000 [==============================] - 104s 1ms/step - loss: 0.0625 -
accuracy: 0.9775 - val_loss: 0.1682 - val_accuracy: 0.9478
Epoch 6/40
100000/100000 [==============================] - 104s 1ms/step - loss: 0.0528 -
accuracy: 0.9811 - val_loss: 0.1629 - val_accuracy: 0.9519
Epoch 7/40
100000/100000 [==============================] - 103s 1ms/step - loss: 0.0467 -
accuracy: 0.9835 - val_loss: 0.2352 - val_accuracy: 0.9350
Epoch 8/40
100000/100000 [==============================] - 104s 1ms/step - loss: 0.0413 -
accuracy: 0.9859 - val_loss: 0.1797 - val_accuracy: 0.9549
Epoch 9/40
53216/100000 [==============>...............] - ETA: 46s - loss: 0.0331 -
accuracy: 0.9879
In [27]:
evaluate_model(history,X_test,y_test,model)
y_pred=model.predict(X_test)
Accuracy: 96.63%
<keras.callbacks.callbacks.History object at 0x7fee4db2b1d0>

I take the next function from: https://www.kaggle.com/coni57/model-from-arxiv-1805-00794

In [28]:
import itertools
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Compute confusion matrix
cnf_matrix = confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
np.set_printoptions(precision=2)

# Plot normalized confusion matrix
plt.figure(figsize=(10, 10))
plot_confusion_matrix(cnf_matrix, classes=['N', 'S', 'V', 'F', 'Q'], normalize=True,
                      title='Confusion matrix, with normalization')
plt.show()
Normalized confusion matrix

We note that two classes (supraventricular and fusion) are weaker than the others, maybe due to fewer examples in the original dataset. I will try to improve this in the next version.
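To make this measurable, the per-class precision, recall, and F1 can be printed with classification_report, which was already imported at the top (this call is a small addition, not in the original run):

# Per-class metrics; the S and F rows should show the lower scores.
print(classification_report(y_test.argmax(axis=1),
                            y_pred.argmax(axis=1),
                            target_names=['N', 'S', 'V', 'F', 'Q']))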
