
Expert Systems With Applications 122 (2019) 75–84


Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

A robust deep convolutional neural network with batch-weighted loss for heartbeat classification
Ali Sellami, Heasoo Hwang∗
Department of Computer Science and Engineering, University of Seoul, 02504 Korea

Article info

Article history: Received 14 June 2018; Revised 24 November 2018; Accepted 19 December 2018; Available online 21 December 2018

Abstract

The early detection of abnormal heart rhythm has become crucial due to the spike in the rate of deaths caused by cardiovascular diseases. While many existing works tried to classify heartbeats accurately, they suffered from the imbalance between heartbeat classes in the available ECG datasets, since abnormal heartbeats appear much less frequently than normal ones. In addition, most existing methods rely heavily on data preprocessing such as noise removal and feature extraction, which is computationally expensive and thus limits their use on low-cost portable ECG devices.

We present a novel deep convolutional neural network based on state-of-the-art deep learning techniques for accurate heartbeat classification. We suggest a batch-weighted loss function to better quantify the loss in order to overcome the imbalance between classes. The loss weights dynamically change as the distribution of classes in each batch changes. Also, we propose to use multiple heartbeats for more effective heartbeat classification.

Even though we use ECG signal from one lead only without any data preprocessing, our method consistently outperforms existing methods of 5-class heartbeat classification. Our accuracy, positive productivity, sensitivity and specificity under intra-patient paradigm are 99.48%, 98.83%, 96.97% and 99.87%, and those under inter-patient paradigm are 88.34%, 48.25%, 90.90% and 88.51% respectively.
© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

According to the World Health Organization (WHO), cardiovascular diseases take the lion’s share of death causes globally by approximately 31%, out of which 82% are in low or middle-income countries, due to pricey high quality electrocardiograms (ECGs) and shortage in medical experts to read and interpret the signal (WHO Cardiovascular Diseases Factsheet, 2017).

Recently, various types of one-lead portable ECG devices such as chest patches and wristbands have become more widely available. As the amount of ECG data that is continuously collected using these devices grows rapidly, both the interest in and the opportunity for the effective and robust detection of important arrhythmias such as atrial fibrillation, one of the leading causes of stroke, are growing as well. Research on heartbeat classification methods involving deep learning techniques is also attracting more attention than ever before, since more ECG data for training means that deeper neural networks with better classification performance can be constructed. At the same time, more adaptive and robust classification methods that can easily be applied to datasets from different domains are in high demand, considering the increasing amount of time series data available in various domains.

In this paper, we focus on classifying important types of cardiovascular diseases, arrhythmias. According to the American Heart Association (“AHA”), arrhythmias are any change from the normal sequence of electrical impulses, causing abnormal heart rhythms, and can be either life-threatening or require medical therapy to prevent future problems. The Association for the Advancement of Medical Instrumentation (AAMI) (American National Standards Institute, 2012) provides a clear guideline for grouping heartbeat types under 5 super classes as shown in Table 1.

Many methods have been suggested for automatic detection of arrhythmia (Acharya et al., 2017; deChazal, O’Dwyer, & Reilly, 2004; Huang, Liu, Zhu, Wang, & Hu, 2014; Martis, Acharya, Lim, & Suri, 2013; Ye, Coimbra, & Vijaya Kumar, 2010; Yu & Chen, 2007), but none fully addressed both effectiveness and real-time applicability at the same time.

∗ Corresponding author.
E-mail addresses: asellami1@gmail.com (A. Sellami), hwang@uos.ac.kr (H. Hwang).

This paper aims at providing a highly robust and efficient heartbeat classification method that can be used on low-cost portable ECG monitors for early detection of arrhythmia. Our approach can perform accurate classification using raw ECG signal from single lead without any data preprocessing such as noise removal or fea-

https://doi.org/10.1016/j.eswa.2018.12.037
0957-4174/© 2018 Elsevier Ltd. All rights reserved.
76 A. Sellami and H. Hwang / Expert Systems With Applications 122 (2019) 75–84

Table 1
Various types of heartbeats grouped under five super-classes (N, S, V, F and Q) defined by the AAMI.

AAMI class | Heartbeat types
N (Nonectopic) | Normal beat; Left bundle branch block beat; Right bundle branch block beat; Atrial escape beat; Nodal (junctional) escape beat
S (Supraventricular ectopic beat) | Atrial premature beat; Aberrated atrial premature beat; Nodal (junctional) premature beat; Supraventricular premature beat
V (Ventricular ectopic beat) | Premature ventricular contraction; Ventricular escape beat
F (Fusion beat) | Fusion of ventricular and normal beat
Q (Unknown beat) | Paced beat; Fusion of paced and normal beat; Unclassifiable beat
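
The grouping in Table 1 amounts to a lookup from heartbeat type to AAMI super-class. The following is a minimal sketch of that lookup; the beat-type names are taken from Table 1, while the dictionary layout and the helper name `aami_class` are illustrative assumptions rather than part of the described method.

```python
# Beat-type names follow Table 1. The data structure and the helper
# name `aami_class` are hypothetical, chosen only for illustration.
AAMI_CLASSES = {
    "N": [
        "Normal beat",
        "Left bundle branch block beat",
        "Right bundle branch block beat",
        "Atrial escape beat",
        "Nodal (junctional) escape beat",
    ],
    "S": [
        "Atrial premature beat",
        "Aberrated atrial premature beat",
        "Nodal (junctional) premature beat",
        "Supraventricular premature beat",
    ],
    "V": [
        "Premature ventricular contraction",
        "Ventricular escape beat",
    ],
    "F": ["Fusion of ventricular and normal beat"],
    "Q": [
        "Paced beat",
        "Fusion of paced and normal beat",
        "Unclassifiable beat",
    ],
}

# Invert the grouping so that each beat type maps to its super-class.
BEAT_TO_CLASS = {
    beat: cls for cls, beats in AAMI_CLASSES.items() for beat in beats
}


def aami_class(beat_type: str) -> str:
    """Return the AAMI super-class (N, S, V, F or Q) of a beat type."""
    return BEAT_TO_CLASS[beat_type]
```

For example, `aami_class("Ventricular escape beat")` returns `"V"`.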

ture extraction, which is essential for real-time heartbeat classification on portable ECG sensors. Also, our approach can be applied to classify imbalanced time series datasets from various domains in which normal patterns are largely predominant. Such datasets include various types of sensor data collected from biological sources such as human hearts or brains and non-biological sources such as autonomous vehicles and manufacturing lines.

The contributions of this paper are as follows:

• Allow a robust real-time classification of heartbeats using raw ECG signal without any data preprocessing.
• Propose a novel loss weights formula calculated dynamically for each class according to its occurrences in each batch.
• Design and build a robust convolutional neural network model that shows high classification performance under both intra-patient and inter-patient evaluation paradigms.

In the next section, we discuss two major evaluation paradigms of heartbeat classification methods, intra-patient and inter-patient paradigms, and present existing approaches showing highest performance under each paradigm. In Section 3, we explain the ECG dataset for our experiments and propose our heartbeat classification method. In Section 4, we analyze the experimental results of our approach and the state-of-the-art approaches under both paradigms and conclude in Section 5.

2. Related works

Heartbeat classification has been the subject of much research, and there are two paradigms used for performance evaluation of heartbeat classification methods, intra-patient paradigm and inter-patient paradigm (Da Silva Luz, Schwartz, Cámara-Chávez, & Menotti, 2016; De Lannoy, François, Delbeke, & Verleysen, 2012; deChazal et al., 2004).

2.1. Evaluation paradigms

Under intra-patient paradigm, heartbeats of the same patient are used for both training and testing of heartbeat classifiers (Acharya et al., 2017; Martis et al., 2013; Yu & Chen, 2007). deChazal et al. (2004) demonstrated that intra-patient paradigm produces biased results: the classifier learns the characteristics of each patient during the training phase and hence shows almost 100% classification accuracy in the testing phase. However, in real-world scenarios, the trained model must deal with heartbeats from patients that are unseen during training.

Under inter-patient paradigm, researchers use heartbeats from totally different patients for training and testing phases (deChazal et al., 2004; Huang et al., 2014; Li & Zhou, 2016), allowing a more realistic evaluation of heartbeat classification methods.

Luz and Menotti (2011) showed that the same heartbeat classification methods evaluated under intra-patient paradigm show significantly higher accuracy than under inter-patient paradigm, as shown in Table 2.

2.2. Existing methods

Acharya et al. (2017) used a 9-layer convolutional neural network classifier to build four variations of their method: (1) without noise removal and without data balancing, (2) with noise removal and without data balancing, (3) without noise removal and with data balancing and (4) with noise removal and with data balancing. Since (2) and (4) outperformed (1) and (3), we compare our method with the two variations with noise removal in Table 6. To overcome the imbalanced data problem, they generated synthetic data by varying the standard deviation and mean of Z-score calculated from original normalized ECG signal. However, this may increase the probability of generating biased results because the synthetic data is generated directly from the original data.

Martis et al. (2013) applied a wavelet-based denoising technique on ECG signal followed by QRS complex detection (Pan & Tompkins, 1985). Then, they generated the discrete cosine transform of each heartbeat (Ahmed, Natarajan, & Rao, 1974) and reduced its dimensionality using principal component analysis (PCA) (Duda, Hart, & Stork, 2001). Afterwards, they chose discriminatory PCA features as an input for their classifiers. In their study, they used a five-layer feed-forward neural network, a least square SVM and a probabilistic neural network, of which the last performed the best.

deChazal et al. (2004) preprocessed ECG signal from two leads by removing noises such as baseline wander, power line interference and high frequency signal. From each lead, 15 domain-specific features were extracted and used as an input for two classifiers. The outputs from the classifiers were combined by a classifier combiner. They tried to address the imbalanced data issue by using a weighted likelihood function, but it remained static throughout all the batches.

The best performing variants of Acharya et al. (2017) and all the other methods above used denoising, which may remove some important details from the heartbeat waveform and affect the subsequent classification. Furthermore, Acharya et al. (2017) and Martis et al. (2013) did not evaluate their methods under inter-patient paradigm (Da Silva Luz et al., 2016) while the method proposed by Huang et al. (2014) was evaluated under inter-patient paradigm without performing denoising. How-

Table 2
The difference of classification performances of each method evaluated using two different evaluation paradigms.

Method | Inter-patient accuracy (%) | Intra-patient accuracy (%)
Ye et al. (2010) | 75.15 | 96.53
Yu and Chen (2007) | 73.87 | 81.10
Yu and Chou (2008) | 75.21 | 95.39
Güler and Übeyli (2005) | 66.70 | 89.06
Song, Lee, Cho, Lee, and Yoo (2005) | 76.29 | 98.66

Table 3
ECG recordings included in training dataset (DS1) and test dataset (DS2)
proposed by (deChazal et al., 2004).

Dataset MIT-BIH recordings in the dataset

DS1 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124,
201, 203, 205, 207, 208, 209, 215, 220, 223, 230
DS2 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212,
213, 214, 219, 221, 222, 228, 231, 232, 233, 234
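
The inter-patient split of Table 3 can be written down directly as two lists of record numbers. The sketch below is only an illustration of that split: the record numbers come from Table 3, while the function name `split_by_record` and the `(record_id, beat)` pair layout are assumptions made for the example.

```python
# Record numbers copied from Table 3 (split proposed by deChazal et al., 2004).
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124,
       201, 203, 205, 207, 208, 209, 215, 220, 223, 230]
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212,
       213, 214, 219, 221, 222, 228, 231, 232, 233, 234]


def split_by_record(beats):
    """Split (record_id, beat) pairs into training and test lists.

    Hypothetical helper: beats from DS1 records go to the training set,
    beats from DS2 records go to the test set, and beats from any other
    record (for example the excluded paced-beat records) are dropped.
    """
    ds1, ds2 = set(DS1), set(DS2)
    train = [beat for record, beat in beats if record in ds1]
    test = [beat for record, beat in beats if record in ds2]
    return train, test
```

Because no record number appears in both lists, no patient recording contributes beats to both training and testing, which is the defining property of the inter-patient paradigm.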

ever, their input data was a 101-dimensional feature vector generated by concatenating 100 random projections of 2-lead ECG data and the RR-interval of each heartbeat. Also, their method could classify only three classes out of 5 recommended by the AAMI in Table 1.

Notice that it is hard to use the classification methods that heavily rely on domain-specific features (deChazal et al., 2004; Huang et al., 2014) to classify time series data from different domains.

Fig. 1. An example showing the impact of the number of heartbeats to the classification performance, where a fusion beat (F) is (1) misclassified as (V) using a single heartbeat as input but (2) correctly classified as (F) using two heartbeats as input.

3. Materials and methodology

3.1. ECG dataset

The ECG dataset used in our experiments is from the MIT-BIH arrhythmia database (Moody & Mark, 2001) publicly available on PhysioNet (Goldberger et al., 2000). We use MIT-BIH dataset to compare the classification performance with other existing methods whose performance was measured using it. The MIT-BIH dataset contains 48 30-minute-long records from 47 patients. The ECG records were digitized at 360 samples per second and obtained from MLII, V1, V2, V4 and/or V5 leads. Cardiologists annotated each record.

For fair comparison, we split the ECG dataset into DS1 and DS2 just as described by deChazal et al. (2004), and use DS1 for training and DS2 for testing for experiments under inter-patient evaluation paradigm. On the other hand, under intra-patient evaluation paradigm we apply a 10-fold cross validation throughout all records in DS1 and DS2 combined. Table 3 lists the MIT-BIH records in DS1 and DS2 separately. Among the 48 records available in the MIT-BIH dataset, DS1 and DS2 contain 22 records each and four records with paced beats were excluded from our experimental datasets as recommended by AAMI.

In contrast to many existing methods that used ECG signal from multiple leads, we build our classifier using heartbeats extracted from a single ECG lead, the modified limb lead II (MLII). We do so because we want to simulate a practical and severe environment where a user wears a small one-lead portable ECG device on which a real-time arrhythmia detection algorithm is running. An increasing popularity of continuous ECG monitoring using single-lead portable ECG devices would soon allow us to use a much larger collection of one-lead ECG data for real-time heartbeat classification. We want to mention that our method can be easily extended to classify heartbeat types using data from multiple leads by increasing the number of nodes in the input layer of our CNN architecture and adding corresponding connections to accept multi-lead ECG data as input. Later, we will show through experiments that our heartbeat classification method performs as well as or better than the state-of-the-art methods using multi-lead ECG data.

3.2. Input data generation

For a fair comparison with existing works, we use the MIT-BIH arrhythmia database, a standard ECG database that is widely referenced in the literature (Da Silva Luz et al., 2016). It is highly heterogeneous in that it contains heartbeats from different patients, different ages, different genders and even different countries. It provides the annotations for heartbeat classification and R-peak locations that were verified by at least two independent experts. Therefore, we do not perform any preprocessing or feature extraction over the ECG signal, but use the raw noisy waveform corresponding to each heartbeat directly as input for training and testing.

After browsing various heartbeat waveforms in the MIT-BIH database, we set the extraction window size to 170 samples to capture the most important waves that define a heartbeat. From the raw ECG signal from MLII lead, we extract heartbeats using a fixed-size window of 170 samples around each R-peak location by taking 70 samples before an R-peak and 100 samples after. In our real-time heartbeat classification scenario, a fast R-peak detection algorithm such as variations of Pan and Tompkins (1985) can be used to detect R-peak locations for heartbeat segmentation. However, since heartbeat segmentation is a common step in heartbeat classification methods, we focus on classifying heartbeats assuming that the R-peak location of each heartbeat is given.

After the heartbeat extraction, we can use each individual heartbeat as input data. However, we observed that there are cases in which neighboring heartbeats of the target heartbeat can help our classifier to predict the label of the target more accurately as depicted in Fig. 1. Fig. 1(1) is an example in which a target heartbeat of F class is misclassified as V when single heartbeats are given as input. In fact, some F beats have similar morphology to V beats, so it is difficult to classify them correctly based on the morphology of individual heartbeats. However, we observed that

the class of a heartbeat is determined not only by the morphology of a heartbeat but also by the heartbeat rhythm composed of its neighboring beats. In Fig. 1(2), we can see that a given target beat is correctly classified as F when we give its previous beat as input.

In this paper, we choose the most effective input for the proposed method by comparing 3 inputs, each containing a different number of heartbeats, shown in Fig. 2. Let us suppose we want to classify the heartbeat in the middle of the three beats in Fig. 2(a)–(c). Fig. 2(a) shows the situation where no neighboring heartbeats are used while Fig. 2(b) depicts 2-beat input by appending the previous heartbeat to the target heartbeat. Fig. 2(c) shows 3-beat input data in which the target beat is surrounded by one previous beat and one next beat. Notice that we do not provide the class labels of neighboring heartbeats, but the label of the target beat only. Later, we present the experimental results to show the impact of neighboring heartbeats on the performance of heartbeat classification. The inputs we compare in our experiments are the following:

• Input(a): a set of target heartbeats and their class labels as shown in Fig. 2(a).
• Input(b): a set of target heartbeats preceded by their previous heartbeat and the class labels of target heartbeats as shown in Fig. 2(b).
• Input(c): a set of target heartbeats with their neighboring heartbeats and the class labels of target heartbeats as shown in Fig. 2(c).

Fig. 2. Examples of three input types for the classification of the target heartbeat in our experiments. (a) No neighboring heartbeats, (b) one previous heartbeat and (c) two neighboring heartbeats.

3.3. Batch-weighted loss for imbalanced data

The class distribution of heartbeats in the MIT-BIH arrhythmia database in Table 4 shows that the dataset has a problem of imbalanced classes. Class N occupies 89.47% of the full dataset, while the other four classes (S, V, F, Q) account for 2.76%, 6.96%, 0.80% and 0.01% respectively.

Table 4
The number of heartbeats per heartbeat class and their distribution in DS1, DS2 and DS1 + DS2. DS1 + DS2 is used for intra-patient paradigm and DS1 and DS2 are used for inter-patient paradigm.

Dataset | Measure | N | S | V | F | Q | Total
DS1 + DS2 | Number of beats | 90,070 | 2781 | 7007 | 802 | 15 | 100,675
DS1 + DS2 | % | 89.47 | 2.76 | 6.96 | 0.80 | 0.01 | 100.00
DS1 | Number of beats | 45,839 | 944 | 3788 | 414 | 8 | 50,993
DS1 | % | 89.89 | 1.85 | 7.43 | 0.81 | 0.02 | 100.00
DS2 | Number of beats | 44,231 | 1837 | 3219 | 388 | 7 | 49,682
DS2 | % | 89.03 | 3.70 | 6.48 | 0.78 | 0.01 | 100.00

To address this issue of imbalanced dataset, Acharya et al. (2017) enlarged the size of their input data by incorporating synthetic data. However, this may generate biased results and requires more training time. Instead, our method uses the original input data without adding any extra data.

To achieve high classification performance with the original input data only, we propose to use a novel batch-weighted loss function that is defined below. Firstly, we define the set of labels of M heartbeats in the ith batch as $\mathrm{Batch\_labels}_i$ and the jth label in $\mathrm{Batch\_labels}_i$ as $y_{i,j}$.

$\mathrm{Batch\_labels}_i = \{y_{i,1}, y_{i,2}, y_{i,3}, \ldots, y_{i,M}\}$, (1)

where $y_{i,j} \in \{N, S, V, F, Q\}$. Then, let us define the loss weight of the kth class in the ith batch, $cw_{i,\mathrm{class}_k}$, in Eq. (2). Here, $\mathrm{class}_k \in \{N, S, V, F, Q\}$.

$cw_{i,\mathrm{class}_k} = 1 - \frac{\sum_{j=1}^{M} \mathbb{1}_{y_{i,j} = \mathrm{class}_k}}{M} + \varepsilon$, (2)

where M is the batch size and $\varepsilon$ is set to 0.02 to prevent having a loss weight equal to 0 when $\mathrm{Batch\_labels}_i$ contains only one class.

Finally, a weighted cross entropy loss function is calculated for the ith batch as in Eq. (3).

$L_i = -\sum_{j=1}^{M} cw_{i,y_{i,j}} \, y_{i,j} \log \hat{y}_{i,j} + \lambda \lVert W \rVert_2^2$, (3)

where $\hat{y}_{i,j}$ is the predicted probability of the jth training instance in the ith batch, W denotes the weight matrices of all the layers and $\lambda$ is the L2 regularization parameter which is set to 0.01 in our experiments.

3.4. Classification model architecture

We propose a novel classification model architecture of an end-to-end deep convolutional neural network (CNN) to classify heartbeats of 5 classes, {N, S, V, F, Q}, as shown in Fig. 3. The following architectural details were carefully chosen to achieve high classification performance.

• Input: our input data is composed of target heartbeats and their class labels. We can provide an arbitrary number of neighboring beats with the target heartbeat for better classification performance. In our experiments, we measure the classification performance with three different inputs as depicted in Fig. 2. The number of nodes in the input layer corresponds to the number of samples in each input, e.g. 170 (target beat only) and 340 (target beat with a previous beat). Notice that our method does not require expensive data preprocessing such as denoising or domain-specific feature extraction, which makes our method more robust and easily applicable to datasets from other domains.
• Convolution layers: our model contains 9 convolutional layers. Each convolution layer has 64k kernels with a length of 16 where k starts from 1 and increments by 1 per 2 convolutional

layers. This choice allows the network to focus on small details of the input (heartbeats) and then construct the whole picture at the end. Let us denote $x_{C_i}$ as the output of the ith convolutional layer $C_i$.
• Batch normalization: Using a neural network with more layers is helpful in improving the classification performance of high dimensional time series data like ECG datasets. However, learning the weights in a deep network such as our 9-layer CNN in Fig. 3 may require a long training time. To shorten the training stage, we perform batch normalization after each convolutional layer. Eq. (4) is the batch normalization formula (Ioffe & Szegedy, 2015) we use for our method.

$BN_{C_i} = \frac{x_{C_i} - \mu_\beta(x_{C_i})}{\sqrt{\sigma_\beta^2(x_{C_i})}}$, (4)

where $\mu_\beta(x_{C_i})$ and $\sigma_\beta^2(x_{C_i})$ are the mean and the variance of $x_{C_i}$, and $BN_{C_i}$ is the output after applying batch normalization.

• Activation functions: We use “Tanh” function after the first convolution while “ReLU” is used after each of the next 8 convolutions. If we use “ReLU” after the first convolution, it is highly likely that important heartbeat features from the first convolutional layer are largely lost from the beginning of the network since it will convert all the negative values into 0. Therefore, in order to preserve important information generated by the first convolutional layer, we choose to use “Tanh” function so that a large amount of information can be well preserved in the kernels and fed to the following convolutional layers.
• Dropout: Deep convolutional neural networks may overfit the training data, which seriously degrades the classification performance. To prevent overfitting, we choose to use 75% dropout (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014) before layers that are prone to overfitting, which are convolutional layers 3 through 9. We picked 75% dropout by a detailed analysis of the behavior of our network during training.
• Residual connections: Deep convolutional neural networks come with the issue of losing the original information over deep layers. To address this issue, we add skip connections (He, Zhang, Ren, & Sun, 2016) that enable useful information to propagate better through the deep networks during training without negatively affecting the learning curve.
• Likelihood estimation: after the last skip connection, we apply a fully connected layer followed by a softmax function. The softmax outputs the predicted probability of each class, and the network is trained by minimizing the weighted cross entropy loss given in Eq. (3).
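
The batch-weighted loss of Eqs. (2) and (3), which the softmax output is trained against, can be illustrated with a small plain-Python sketch. It is not the authors' implementation. It assumes that each prediction is given as a mapping from class to softmax probability, that only the probability of the true class enters the cross entropy term (the usual one-hot reading of Eq. (3)), and that the regularization term $\lambda \lVert W \rVert_2^2$ is computed elsewhere and passed in as `l2_penalty`. The function names are hypothetical.

```python
import math
from collections import Counter

CLASSES = ("N", "S", "V", "F", "Q")
EPSILON = 0.02  # value of epsilon stated in Section 3.3


def batch_class_weights(batch_labels, epsilon=EPSILON):
    """Eq. (2): weight of a class = 1 - (its share of the batch) + epsilon."""
    m = len(batch_labels)
    counts = Counter(batch_labels)
    # Counter returns 0 for classes that are absent from the batch.
    return {c: 1.0 - counts[c] / m + epsilon for c in CLASSES}


def batch_weighted_loss(batch_labels, predicted_probs, l2_penalty=0.0):
    """Eq. (3) for one batch, under the assumptions named in the lead-in.

    predicted_probs[j] maps each class to the softmax probability
    predicted for the j-th beat; l2_penalty stands for the already
    computed regularization term lambda * ||W||_2^2.
    """
    weights = batch_class_weights(batch_labels)
    cross_entropy = 0.0
    for label, probs in zip(batch_labels, predicted_probs):
        # Rare classes in this batch get a larger weight, so their
        # errors contribute more to the loss.
        cross_entropy -= weights[label] * math.log(probs[label])
    return cross_entropy + l2_penalty
```

For a batch of ten beats with eight N, one S and one V, the weights are about 0.22 for N, 0.92 for S and V, and 1.02 for the absent classes F and Q. The weights are recomputed for every batch, so they follow the class distribution of that batch rather than that of the whole dataset.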

Fig. 3. The architecture of an end-to-end convolutional neural network consisting of four residual blocks each encapsulating two convolutional layers and two activation functions; a special convolutional layer at the beginning and a dense layer at the end followed by a softmax.

4. Results and discussion

We evaluate our heartbeat classification method under both intra-patient and inter-patient paradigms by comparing with the state-of-the-art methods of each paradigm, specifically those that followed the AAMI guidelines and grouped the classes as shown in Table 1. Extensive experimental results show that our method generates highly accurate heartbeat classification results under both paradigms.

In contrast to existing methods that heavily rely on expensive data preprocessing such as noise removal, feature extraction and synthetic data generation, note that our method simply uses raw 1-lead ECG signal as input.

We used the Python programming language on a machine with an Nvidia GeForce GTX 1070 GPU, 32 GB RAM and the Microsoft Windows 10 operating system.
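
Because the method consumes the raw single-lead signal, preparing an input reduces to slicing a fixed window around each given R-peak, as described in Section 3.2. The sketch below illustrates this for the single-beat input and for inputs with previous beats. Several details are assumptions, not statements from the paper: the R-peak sample is counted as the first of the 100 "after" samples, beats whose window does not fit inside the record are skipped, and the previous beat's window is placed before the target's window. The function names are hypothetical.

```python
BEFORE = 70   # samples taken before the R-peak (Section 3.2)
AFTER = 100   # samples taken after the R-peak (Section 3.2)
WINDOW = BEFORE + AFTER  # 170 samples per heartbeat


def extract_beat(signal, r_peak):
    """Return the 170-sample window around r_peak, or None if it does not fit.

    Assumption: the window covers indices [r_peak - 70, r_peak + 100).
    """
    start, end = r_peak - BEFORE, r_peak + AFTER
    if start < 0 or end > len(signal):
        return None
    return signal[start:end]


def build_inputs(signal, r_peaks, n_previous=1):
    """Build (target_index, samples) pairs for each usable target beat.

    n_previous=0 gives single-beat inputs of 170 samples; n_previous=1
    gives two-beat inputs of 340 samples (previous beat, then target).
    """
    inputs = []
    for idx in range(n_previous, len(r_peaks)):
        peaks = r_peaks[idx - n_previous: idx + 1]
        windows = [extract_beat(signal, r) for r in peaks]
        if any(w is None for w in windows):
            continue  # skip beats too close to the edges of the record
        samples = [s for w in windows for s in w]
        inputs.append((idx, samples))
    return inputs
```

Only the label of the target beat would be attached to each input; the neighboring beat contributes samples but no label, in line with Section 3.2.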

Table 5
The confusion matrix of our method under intra-patient paradigm obtained by performing 10-fold cross validation.

True label | Predicted N | Predicted S | Predicted V | Predicted F | Predicted Q
N | 89,944 | 68 | 31 | 22 | 0
S | 250 | 2526 | 4 | 1 | 0
V | 31 | 2 | 6941 | 33 | 0
F | 38 | 1 | 40 | 723 | 0
Q | 0 | 0 | 0 | 1 | 14

4.1. Performance metrics

We follow the guidelines provided by AAMI to calculate four performance metrics, accuracy, positive productivity, sensitivity and specificity, as follows:

$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (5)

$\mathrm{positive\ productivity} = \frac{TP}{TP + FP}$ (6)

$\mathrm{sensitivity} = \frac{TP}{TP + FN}$ (7)

$\mathrm{specificity} = \frac{TN}{TN + FP}$, (8)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives respectively. We compute these performance measures per heartbeat class and provide them in Tables 6 and 10.

For performance comparison with existing methods, we mainly use the aggregate performance measures (the gross statistics) that assign equal weight to each heartbeat, as in many prior works such as Acharya et al. (2017), deChazal et al. (2004) and Bani-Hasan, El-Hefnawi, and Kadah (2011). When computing the aggregate performance measures, TP is defined as the number of abnormal beats correctly classified while TN is the number of normal beats classified as normal. FP represents normal beats classified as abnormal and FN represents abnormal beats classified as normal. The aggregate performance measures are mainly used throughout this paper for comparison with existing methods.

However, some state-of-the-art methods such as Martis et al. (2013) and Li and Zhou (2016) did not provide the aggregate performance results or the confusion matrix, but only the performance results per heartbeat class or the averages. For fair comparison with such methods, we provide the average performance results of our method as well in Tables 6 and 10. Note that the performance results underlined are the averages across classes.

4.2. Performance evaluation under intra-patient paradigm

Under intra-patient paradigm, we performed 10-fold cross-validation as Acharya et al. (2017) did to obtain a confusion matrix for our method and compare with the state-of-the-art intra-patient heartbeat classification approaches. Firstly, we randomly shuffle the 100,675 beats in DS1 + DS2 dataset and partition them into 10 folds. For each fold, we train our CNN model by using the remaining 9 folds as training set and test the trained model with the fold. Once 10 confusion matrices are obtained, we perform element-wise addition to obtain the final confusion matrix as shown in Table 5. Since some existing approaches did not report all the four performance evaluation metrics of AAMI, we re-calculated them using their confusion matrices for fair comparison. Our confusion matrix is in Table 5 and the comparison results are in Table 6.

In Table 6, we compare all the performance metric results of our method with those of Acharya et al. (2017) with imbalanced data, Acharya et al. (2017) with balanced data and Martis et al. (2013).

Acharya et al. (2017) constructed a deep convolutional neural network of 9 layers with input data obtained after noise removal. As shown in Table 6, they measured the heartbeat classification performance with both imbalanced and balanced input data. They obtained balanced input data by adding synthetic data generated by varying the standard deviation and mean of Z-score from the original imbalanced ECG data.

Our method significantly outperforms the method by Acharya et al. (2017) with imbalanced input data. While all the performance metrics are improved by using balanced input data, our method still shows a superior performance to Acharya et al. (2017) with balanced input data in all aspects. In addition, they added synthetic data directly generated from the original input data, which may produce biased results. At the same time, it increases the training time significantly, especially when the degree of imbalance in the original input data is high.

Since Martis et al. (2013) did not provide the confusion matrix needed to compute their aggregate performance metrics, we compare Martis et al. (2013) and our method using the average performance results across 5 classes instead. Compared to the method proposed by Martis et al. (2013), our method shows similar accuracy (99.79% vs. 99.52%) and specificity (99.36% vs. 99.91%), but their method is better in positive productivity (97.71% vs. 99.58%) and sensitivity (94.65% vs. 98.69%).

However, the method proposed by Martis et al. (2013) is computationally expensive since they performed heavy data preprocessing steps: noise removal and QRS complex detection followed by discrete cosine transform and a principal component analysis (PCA) to select the most discriminatory PCA features to construct a probabilistic neural network classifier. In contrast, our method simply uses the raw ECG signal after heartbeat extraction, which makes it applicable to real-time heartbeat classification.

In fact, the existing methods in Table 6 did not evaluate their performance under inter-patient paradigm, which is more realistic and challenging. In the next section, we evaluate our method under inter-patient paradigm to show that our method is robust enough to achieve high classification performance in real life scenarios.

4.3. Performance evaluation under inter-patient paradigm

Under inter-patient paradigm, the classification performance of each method is measured by training with heartbeats from ECG recordings in DS1 and then testing with those from DS2 as in Table 3. In addition, we measured the performance of our approach with inputs containing different numbers of heartbeats, Input(a), Input(b) and Input(c). Table 7 shows the classification performance of four variants of our approach to show the impact of multi-beat inputs and the batch-weighted loss function.

By comparing the performance of A1, A2 and A3 in Table 7, we can see that our approach shows better classification performance when we use multiple heartbeats as input. In fact, our classification model constructed with two-beat inputs (A2 in Table 7) performs the best.

In addition, to show the impact of our batch-weighted loss, we measured the performance of our approach with Input(b) without using the batch-weighted loss formula as in A4 in Table 7. It turned out that the absence of the batch-weighted loss resulted in a significant drop in accuracy (from 88.34% to 84.03%), positive productivity (from 48.25% to 35.25%) and sensitivity (from 90.90% to 60.46%). In particular, a drop of about 30 percentage points in sensitivity can be interpreted as the result of one
A. Sellami and H. Hwang / Expert Systems With Applications 122 (2019) 75–84 81

Table 6
Performance results per class, the averages across classes, and the aggregate performance results of our approach. We compare our approach with three state-of-the-art approaches under the intra-patient paradigm.

                                        Accuracy   Positive productivity   Sensitivity   Specificity
Class N                                   99.56%          99.65%              99.87%        96.99%
Class S                                   99.68%          97.27%              90.83%        99.93%
Class V                                   99.86%          98.93%              99.06%        99.92%
Class F                                   99.86%          92.69%              90.15%        99.94%
Class Q                                  100.00%         100.00%              93.33%       100.00%
Our approach (average across classes)     99.79%          97.71%              94.65%        99.36%

Our approach (aggregate)                  99.48%          98.83%              96.97%        99.87%
Acharya et al. (2017) (Imbalanced)        89.03%          62.29%              95.74%        88.39%
Acharya et al. (2017) (Balanced)          94.03%          97.81%              96.64%        91.54%
Martis et al. (2013) a                    99.52%          99.58%              98.69%        99.91%

a Martis et al. (2013) did not provide the confusion matrix, so the aggregate performance results cannot be computed.
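
The fold-wise evaluation described in Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `majority_class_model` is a hypothetical stand-in for CNN training, the toy `beats` list replaces the 100,675 beats of DS1 + DS2, and all function names are chosen here for illustration. The sketch covers only the shuffle, the partition into 10 folds, and the element-wise addition of the per-fold confusion matrices.

```python
import random

CLASSES = ["N", "S", "V", "F", "Q"]  # the five AAMI heartbeat classes


def confusion_matrix(true_labels, predicted_labels):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def add_elementwise(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def majority_class_model(train_beats):
    """Hypothetical placeholder for CNN training (assumption, not the paper's model):
    returns a predictor that always outputs the most common training label."""
    labels = [label for _, label in train_beats]
    most_common = max(set(labels), key=labels.count)
    return lambda signal: most_common


def cross_validate(beats, train_fn, k=10, seed=0):
    """Shuffle, split into k folds, test on each fold, and sum the k confusion matrices."""
    beats = list(beats)
    random.Random(seed).shuffle(beats)
    folds = [beats[i::k] for i in range(k)]
    total = [[0] * len(CLASSES) for _ in CLASSES]
    for i, test_fold in enumerate(folds):
        train = [b for j, fold in enumerate(folds) if j != i for b in fold]
        predict = train_fn(train)
        truth = [label for _, label in test_fold]
        preds = [predict(signal) for signal, _ in test_fold]
        total = add_elementwise(total, confusion_matrix(truth, preds))
    return total


# Toy data (signal, label): 90 normal beats and 10 ventricular beats.
beats = [([0.0], "N")] * 90 + [([1.0], "V")] * 10
final = cross_validate(beats, majority_class_model)
for name, row in zip(CLASSES, final):
    print(name, row)
```

Because every beat is tested exactly once, the entries of the summed matrix add up to the dataset size, which is what allows the summed matrix to be read as one confusion matrix over all beats.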

Table 7
Classification performance results (gross statistics) of four variants of our approach. Three variants, A1, A2 and A3, used the batch-weighted loss (BWL) formula, with Input(a), Input(b) and Input(c) respectively. A4 used Input(b) but did not use BWL.

                             Accuracy   Positive productivity   Sensitivity   Specificity
A1: Input(a) with BWL         84.36%          37.04%              65.13%        87.26%
A2: Input(b) with BWL         88.34%          48.25%              90.90%        88.51%
A3: Input(c) with BWL         87.17%          43.32%              59.88%        91.14%
A4: Input(b) without BWL      84.03%          35.25%              60.46%        87.67%
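
To make the idea behind A2 versus A4 concrete, the following is a rough sketch of a loss whose class weights are recomputed from the label distribution of each batch. The weighting rule used here (weight inversely proportional to the class count within the batch) is an assumption made for illustration only; it is not the batch-weighted loss formula of this paper, which is defined in an earlier section. The function names are likewise hypothetical.

```python
import math
from collections import Counter


def batch_class_weights(batch_labels, num_classes):
    """ASSUMED rule (illustration only): weight_c = batch_size / (classes_present * count_c).
    Classes absent from the batch get weight 0."""
    counts = Counter(batch_labels)
    present = len(counts)
    n = len(batch_labels)
    return [n / (present * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]


def batch_weighted_cross_entropy(probabilities, batch_labels, num_classes):
    """Cross-entropy in which each sample is scaled by the weight of its true class,
    with weights recomputed for every batch."""
    weights = batch_class_weights(batch_labels, num_classes)
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probabilities, batch_labels)]
    return sum(losses) / len(losses)


# A batch dominated by class 0 (e.g. normal beats) with a single class-1 beat.
labels = [0, 0, 0, 1]
probs = [[0.5, 0.5, 0.0]] * 4  # predicted class probabilities per sample
weights = batch_class_weights(labels, num_classes=3)
loss = batch_weighted_cross_entropy(probs, labels, num_classes=3)
print(weights, loss)
```

Under this assumed rule, the single minority beat contributes as much to the batch loss as the three majority beats together, so the majority class cannot dominate the gradient simply by being frequent.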

Table 8
Comparison of four variants of our approach. 'CNN with 9 layers' corresponds to A2 in Table 7 (the proposed approach), and all the other network architectures use the same settings as A2 (Input(b) with BWL) except for the number of convolutional layers.

Network architecture   Number of weights   Training epoch time (s)   Model size (MB)   DS2 classification time (s)   DS2 classification accuracy (acc, +p, sen, sp)
CNN with 5 layers            149,184               21.63                  ∼17                   12.02                (82.13%, 33.16%, 68.38%, 84.62%)
CNN with 7 layers            433,600               47.41                  ∼60                   25.63                (82.40%, 32.92%, 64.96%, 85.38%)
CNN with 9 layers          1,395,648              132.64                  ∼219                  64.18                (88.33%, 48.24%, 90.90%, 88.51%)
CNN with 11 layers         4,892,608              424.96                  ∼840                 214.66                (89.11%, 50.26%, 76.61%, 91.33%)

class (the N class) dominating the other classes due to the absence of the batch-weighted loss formula, which deals with the imbalanced data issue. When the N class dominates the others, the number of abnormal beats misclassified as N increases significantly (from 474 to 1942), while the number of abnormal beats correctly classified into their respective classes decreases (from 4736 to 2969). With more FN and fewer TP, the sensitivity of A4 drops to 60.46%. Hence, we conclude that A2 showed the best classification performance with the given imbalanced input data.

In Table 8, we compare our approach A2 (CNN with 9 convolutional layers) with three variants that have different numbers of convolutional layers, to analyze the impact of network architecture on the training stage and the classification stage. As the number of layers increases, the number of weights, the training epoch time and the model size also increase. It also takes more time to classify the 49,682 heartbeats in DS2. The classification accuracy also improves until we use 9 layers. When 11 layers are used, however, sensitivity drops significantly from 90.90% to 76.61% due to overfitting. Therefore, we use 9 convolutional layers for our approach A2.

Table 9 presents the confusion matrix of our approach A2, obtained with 2-beat inputs containing target heartbeats and their previous ones. We then calculated the four performance metrics per class and their averages in Table 10. We will mainly compare our approach A2 and its 3-class variant A2′ with other approaches in Table 11.

Table 9
The confusion matrix of our method (A2 in Table 7) under inter-patient paradigm.

                      Predicted label
True label         N       S       V       F       Q
N             39,151    3414    1063     602       1
S                294    1507      29       7       0
V                107      26    2963     123       0
F                 72       3      48     265       0
Q                  1       0       5       0       1

The classification performance of our method is better than the one proposed in deChazal et al. (2004) in terms of accuracy (88.34% vs. 85.88%), positive productivity (48.25% vs. 42.21%) and specificity (88.51% vs. 86.86%), but their method performed better in sensitivity (90.90% vs. 92.85%). Recall that our method involves no feature extraction and uses the raw 1-lead ECG signal as input. Unlike our approach, deChazal et al. (2004) constructed their classifier by us-

Table 10
Performance results of our method (A2 in Table 7) per class and the averages.

                                            Accuracy   Positive productivity   Sensitivity   Specificity
Class N                                       88.82%          98.80%              88.51%        91.30%
Class S                                       92.41%          30.44%              82.04%        92.80%
Class V                                       97.18%          72.13%              92.05%        97.54%
Class F                                       98.28%          26.58%              68.30%        98.52%
Class Q                                       99.99%          50.00%              14.29%       100.00%
Our approach (A2), average across classes     95.33%          55.59%              69.04%        96.03%
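
The per-class values in Table 10 can be re-derived from the confusion matrix in Table 9 by treating each class in turn as the positive class (one-vs-rest). The sketch below assumes the usual definitions of accuracy, positive productivity, sensitivity and specificity in terms of TP, FP, FN and TN; the function and variable names are ours, not the paper's.

```python
CLASSES = ["N", "S", "V", "F", "Q"]

# Table 9: rows are true labels, columns are predicted labels (order N, S, V, F, Q).
CONFUSION = [
    [39151, 3414, 1063, 602, 1],
    [294, 1507, 29, 7, 0],
    [107, 26, 2963, 123, 0],
    [72, 3, 48, 265, 0],
    [1, 0, 5, 0, 1],
]


def per_class_metrics(matrix, k):
    """One-vs-rest metrics (in percent) for the class with index k."""
    total = sum(map(sum, matrix))
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                    # beats of class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp     # beats of other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "positive_productivity": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


metrics = {c: per_class_metrics(CONFUSION, i) for i, c in enumerate(CLASSES)}
averages = {
    name: sum(m[name] for m in metrics.values()) / len(CLASSES)
    for name in metrics["N"]
}

for c in CLASSES:
    print(c, {name: round(value, 2) for name, value in metrics[c].items()})
print("average", {name: round(value, 2) for name, value in averages.items()})
```

The low positive productivity of the S and F classes follows directly from the first row of Table 9: the N beats misclassified as S (3414) or F (602) outnumber the correctly classified S (1507) and F (265) beats.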

Table 11
The aggregate performance results of our approach (A2 and A2′) and three state-of-the-art approaches under inter-patient paradigm. Performance results marked with * are the averages across 5 classes, not the gross statistics.

                                      Accuracy   Positive productivity   Sensitivity   Specificity
Our approach (A2)                      88.34%          48.25%              90.90%        88.51%
deChazal et al. (2004)                 85.88%          42.21%              92.85%        86.86%
Li and Zhou (2016)                     94.61%          38.03% *            51.77% *        –
Huang et al. (2014) a                  94.55%          66.88%              96.91%        94.74%
Our approach (A2′ for 3 classes)       95.08%          68.65%              96.26%        95.00%

a Huang et al. (2014) can classify only 3 classes, so it cannot identify F and Q heartbeats.
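
The gross statistics reported for A2 can be reproduced from Table 9 under the following assumption, which is ours and is stated here only because it matches the counts quoted in the text: the N class is treated as negative, the abnormal classes (S, V, F, Q) as positive, TP counts abnormal beats assigned to their correct class, and FN counts abnormal beats labeled N. The last lines repeat the sensitivity calculation for A4 from the two counts given in the text (2969 and 1942).

```python
# Table 9: rows are true labels, columns are predicted labels (order N, S, V, F, Q).
CONFUSION = [
    [39151, 3414, 1063, 602, 1],
    [294, 1507, 29, 7, 0],
    [107, 26, 2963, 123, 0],
    [72, 3, 48, 265, 0],
    [1, 0, 5, 0, 1],
]


def gross_statistics(matrix, normal=0):
    """Aggregate metrics (in percent), assuming N is negative and all other classes positive."""
    total = sum(map(sum, matrix))
    tn = matrix[normal][normal]                        # normal beats labeled N
    fp = sum(matrix[normal]) - tn                      # normal beats labeled abnormal
    fn = sum(row[normal] for row in matrix) - tn       # abnormal beats labeled N
    tp = sum(matrix[k][k] for k in range(len(matrix)) if k != normal)
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "accuracy": 100.0 * (tp + tn) / total,
        "positive_productivity": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


stats = gross_statistics(CONFUSION)
print({name: round(value, 2) for name, value in stats.items()})

# A4 (no batch-weighted loss): counts quoted in the text.
a4_tp, a4_fn = 2969, 1942
a4_sensitivity = 100.0 * a4_tp / (a4_tp + a4_fn)
print(round(a4_sensitivity, 2))
```

With these definitions, TP and FN for A2 come out as 4736 and 474, the same counts the text uses when it explains the sensitivity drop of A4.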

ing many domain-specific features extracted from the 2-lead ECG signal. The extracted features were used as input for a statistical classifier model for each signal lead, and a third classifier was built to combine the outputs of the previous two classifiers. Therefore, their method is computationally expensive and hence not suitable for real-time heartbeat classification on 1-lead ECG monitoring devices.

Li and Zhou (2016) decomposed the ECG signal by wavelet packet decomposition, calculated entropy from the decomposed coefficients as representative features, and built a classification model using random forests. They showed the highest aggregate accuracy (94.61%) among the three approaches that classify 5 classes. Since the positive productivity and sensitivity of Li and Zhou (2016) are presented per class and no confusion matrix is provided, we use the average positive productivity and the average sensitivity across classes for our comparison. Specificity results were not provided. The average positive productivity and sensitivity of Li and Zhou (2016) are only 38.03% and 51.77%, while our simpler approach achieves 55.59% and 69.04% respectively. Notice that, given an imbalanced dataset, one can achieve high accuracy by sacrificing positive productivity and sensitivity, just as we can observe in the results of A4 in Table 7.

Huang et al. (2014) showed the highest performance results among all the approaches. However, recall that Huang et al. (2014) constructed a 3-class heartbeat classifier, while the other approaches targeted all 5 classes suggested by AAMI. For a fair comparison with our approach, we constructed a new CNN model A2′ that considers only 3 classes, just as Huang et al. (2014) does. We can see from Table 11 that A2′ outperforms Huang et al. (2014) except in sensitivity (96.26% vs. 96.91%). Notice that Huang et al. (2014) used 2-lead ECG data as input, performed 100 random projections and extracted the RR-interval of each heartbeat to construct 101-dimensional input vectors, while our method simply uses the single-lead ECG data without feature extraction.

4.4. Discussion

The aim of this research was to propose a novel single-lead heartbeat classification method that needs little to no preprocessing effort and still performs at the same level as or better than the state-of-the-art methods. Here, we summarize the key characteristics of our approach and discuss the impact and implication of each characteristic on the performance and applicability of our heartbeat classification method.

Firstly, we proposed a robust deep convolutional neural network architecture with state-of-the-art deep learning techniques such as residual connections, dropout and batch normalization to achieve high classification performance. This novel architecture allowed us to achieve high classification performance with raw ECG data. Therefore, our method did not involve any expensive data preprocessing steps such as noise removal using signal processing techniques and domain-specific feature extraction. This greatly improves the applicability of our approach to various domains other than heartbeat classification. Unlike our method, many existing methods, including the three state-of-the-art methods in our comparison experiments, relied on various preprocessing steps to achieve high classification performance. Notice that, even though these methods (Acharya et al., 2017; deChazal et al., 2004; Martis et al., 2013) executed expensive data preprocessing steps, none of them showed consistently better experimental results than our approach in all four AAMI performance evaluation metrics in Tables 6 and 11.

Secondly, in order to further improve our classification performance with the raw ECG data, we suggested using multiple heartbeats, each target beat and its neighboring beats, as input for training. The impact of multi-beat inputs was illustrated in Table 7, in which the classification models obtained with multi-beat inputs (A2 and A3) showed significantly higher experimental results than the model built with 1-beat inputs (A1). This verifies our intuition that a heartbeat can be classified more effectively by considering both its morphology and the heartbeat rhythm composed of its neighboring beats. In our experiments, the classification model obtained using 2-beat inputs, each target heartbeat preceded by its previous beat (A2), performed the best among the three variants of our approach in Table 7. It outperformed the models obtained from 1-beat inputs (A1) or 3-beat inputs (A3) by up to 31 percentage points (in sensitivity).

Thirdly, we used the original input ECG data as it is, with its imbalance between heartbeat classes. To overcome the problem of imbalanced data, we optimized the parameters of the proposed convolutional neural network using a novel batch-weighted loss formula. Many existing methods, including Acharya et al. (2017), tried to address this issue by balancing the dataset with synthetic data or data from other sources. However, the size of such balanced data could become very large when the degree of imbalance in the given input data is high, which would greatly increase the train-
ing time. Also, it is highly likely that this generates biased classification results, since the synthetic data is constructed directly from the given original data. In Table 6, we can see that our approach outperformed Acharya et al. (2017) with balanced input data except for positive productivity. deChazal et al. (2004) used a weighted likelihood function as a solution for the imbalanced data issue. Although it is used after heavy data preprocessing steps, it still was not effective enough to beat our approach in all four AAMI performance evaluation metrics, as shown in Table 11, since the loss weights remained static throughout batches. Our novel batch-weighted loss formula enabled our approach to dynamically adapt to the changing class distribution across batches. Without the batch-weighted loss formula, one class might dominate the other classes during training, especially for small datasets. In Table 7, we demonstrated the impact of the batch-weighted loss formula by comparing the performance of two classification models obtained using 2-beat inputs with batch-weighted loss (A2) and 2-beat inputs without batch-weighted loss (A4). Without batch-weighted loss, A4 showed lower classification performance than A2 in all aspects.

We demonstrated the robustness of our approach through comparison experiments with the state-of-the-art methods under both intra- and inter-patient evaluation paradigms. Under the intra-patient paradigm, our method achieved the highest accuracy of 99.48%, as shown in Table 6. Also, under the inter-patient paradigm, it showed superior performance to the state-of-the-art methods that classify 5 classes and 3 classes, as shown in Table 11.

Now, we discuss the classification time of our method. We need to train our CNN model during the training stage, which usually requires a long training time on a single GPU. However, the training time can be significantly reduced simply by using multiple GPUs. Once our CNN model is trained, we can perform heartbeat classification using the trained CNN model. The actual heartbeat classification with the trained CNN model executes very fast. During the classification stage, given an ECG signal, we need to extract heartbeats just as any heartbeat classification method needs to do. However, unlike other methods, we then skip signal denoising and feature extraction, which can be very time consuming. Given a raw ECG waveform corresponding to a heartbeat, our method can classify it directly with a fixed number of matrix multiplications.

In contrast, signal preprocessing techniques that are frequently used in heartbeat classification methods usually have high time complexity, increasing the overall classification time. For instance, principal component analysis (PCA), used in Martis et al. (2013), has a time complexity of O(N^3), as explained in Johnstone and Lu (2009). Also, an efficient implementation of the discrete wavelet transform (DWT) can achieve O(N), but other popular signal processing techniques such as the discrete Fourier transform (DFT) and the fast Fourier transform (FFT) are O(N^2) and O(N log N) respectively.

Overall, we achieved high classification performance with our novel convolutional neural network model trained using the proposed batch-weighted loss formula with 2-beat raw ECG inputs. In addition, notice that we do not heavily rely on¹ domain-specific data preprocessing steps or synthetic data generation that would significantly increase the training time and may generate biased results. Therefore, our approach can perform real-time heartbeat classification efficiently and effectively using single-lead raw ECG data directly. Furthermore, it can be easily applied to time series datasets from other domains, since raw time series data can be used directly as input to our approach without domain-specific feature extraction or noise removal.

¹ Given a long ECG signal, R-peak locations are used to extract heartbeats to classify. However, our approach uses only the raw heartbeat waves as input features for classification.

However, we also want to mention two limitations. First, during the supervised learning stage, our method requires a large amount of heartbeat data annotated by clinical experts. In the medical domain, it is very difficult to obtain such datasets with many abnormal patterns. Secondly, our method can classify heartbeat types but not the types of abnormal rhythms, such as atrial fibrillation, which is a major cause of stroke.

5. Conclusion

We presented a novel deep convolutional neural network optimized with a dynamic batch-weighted loss function. It performs highly effective heartbeat classification using the raw ECG data as it is, without heavy data preprocessing such as noise removal and feature extraction. Our experiments also revealed that the classification performance of our approach is further improved by using 2-beat inputs. The robustness of our method is illustrated by its consistently high classification performance under both intra- and inter-patient paradigms, even though we use single-lead raw ECG data only.

We need larger annotated heartbeat databases to improve the classification performance of our method. However, there is a serious lack of publicly available annotated heartbeat databases, and annotation of heartbeat types by clinical experts is very expensive and time-consuming. As future work, we want to develop a semi-supervised heartbeat classification method that uses a large amount of unannotated heartbeat data for automatic feature learning, so that we can use the learned features as additional input to our CNN model or for fine-tuning of the network weights. Next, we want to extend our classification method, which currently classifies beat types, to predict heartbeat rhythms of multiple heartbeats. Lastly, we plan to apply our approach to time series datasets from other domains that suffer from the imbalanced data issue, especially where normal patterns occur much more frequently than abnormal ones.

Acknowledgment

This work was supported by the National Research Foundation of Korea grant funded by the Korea Government (NRF-2016R1C1B2015528 and No.2015R1A2A2A04005646).

References

Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., Adam, M., & Gertych, A. (2017). A deep convolutional neural network model to classify heartbeats. Computers in Biology and Medicine, 89, 389–396. https://doi.org/10.1016/j.compbiomed.2017.08.022.
AHA. American Heart Association https://www.heart.org/en/health-topics/arrhythmia.
Ahmed, N., Natarajan, T., & Rao, K. R. (1974). Discrete cosine transform. Computers, IEEE Transactions On, C-23(1), 90–93. https://doi.org/10.1109/T-C.1974.223784.
American National Standards Institute. (2012). Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms: ANSI/AAMI EC57. Association for the Advancement of Medical.
Bani-Hasan, A. M., El-Hefnawi, M. F., & Kadah, M. Y. (2011). Model-based parameter estimation applied on electrocardiogram signal. Journal of Computational Biology and Bioinformatics Research, 3(2), 25–28.
da Silva Luz, E. J., Schwartz, W. R., Cámara-Chávez, G., & Menotti, D. (2016). ECG-based heartbeat classification for arrhythmia detection: A survey. Computer Methods and Programs in Biomedicine, 127, 144–164. https://doi.org/10.1016/j.cmpb.2015.12.008.
De Lannoy, G., François, D., Delbeke, J., & Verleysen, M. (2012). Weighted conditional random fields for supervised interpatient heartbeat classification. IEEE Transactions on Biomedical Engineering, 59(1), 241–247. https://doi.org/10.1109/TBME.2011.2171037.
deChazal, P., O’Dwyer, M., & Reilly, R. B. (2004). Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Transactions on Biomedical Engineering, 51(7), 1196–1206. https://doi.org/10.1109/TBME.2004.827359.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: John Wiley Section https://doi.org/10.1007/BF01237942.

Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. C., & Mark, R. G. (2000). PhysioBank, PhysioToolkit, and PhysioNet. Circulation, 101(23), E215–E220. https://doi.org/10.1161/01.CIR.101.23.e215.
Güler, I., & Übeyli, E. D. (2005). ECG beat classifier designed by combined neural network model. Pattern Recognition, 38(2), 199–208. https://doi.org/10.1016/j.patcog.2004.06.009.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on computer vision and pattern recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90.
Huang, H., Liu, J., Zhu, Q., Wang, R., & Hu, G. (2014). A new hierarchical method for inter-patient heartbeat classification using random projections and RR intervals. BioMedical Engineering Online, 13(1). https://doi.org/10.1186/1475-925X-13-90.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Data Mining with Decision Trees, 7(6), 1–9. https://doi.org/10.1007/s13398-014-0173-7.2.
Johnstone, I. M., & Lu, A. Y. (2009). Sparse principal components analysis. Computers & Geosciences. https://doi.org/10.1198/jasa.2009.0121.
Li, T., & Zhou, M. (2016). ECG classification using wavelet packet entropy and random forests. Entropy, 18(8). https://doi.org/10.3390/e18080285.
Luz, E., & Menotti, D. (2011). How the choice of samples for building arrhythmia classifiers impact their performances. In Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS (pp. 4988–4991). https://doi.org/10.1109/IEMBS.2011.6091236.
Martis, R. J., Acharya, U. R., Lim, C. M., & Suri, J. S. (2013). Characterization of ECG beats from cardiac arrhythmia using discrete cosine transform in PCA framework. Knowledge-Based Systems, 45, 76–82. https://doi.org/10.1016/j.knosys.2013.02.007.
Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine. https://doi.org/10.1109/51.932724.
Pan, J., & Tompkins, W. J. (1985). A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering. https://doi.org/10.1109/TBME.1985.325532.
Song, M. H., Lee, J., Cho, S. P., Lee, K. J., & Yoo, S. K. (2005). Support vector machine-based arrhythmia classification using reduced features. International Journal of Control, Automation and Systems, 3(4), 571–579. https://doi.org/10.1016/j.artmed.2008.04.007.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958. https://doi.org/10.1214/12-AOS1000.
WHO Cardiovascular Diseases Factsheet 2017: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds).
Ye, C., Coimbra, M. T., & Vijaya Kumar, B. K. (2010). Arrhythmia detection and classification using morphological and dynamic features of ECG signals. In Conference proceedings: annual international conference of the IEEE engineering in medicine and biology society. IEEE engineering in medicine and biology society. Conference, 2010 (pp. 1918–1921). https://doi.org/10.1109/IEMBS.2010.5627645.
Yu, S. N., & Chou, K. T. (2008). Integration of independent component analysis and neural networks for ECG beat classification. Expert Systems with Applications, 34(4), 2841–2846. https://doi.org/10.1016/j.eswa.2007.05.006.
Yu, S., & Chen, Y. (2007). Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recognition Letters, 28(10), 1142–1150. https://doi.org/10.1016/j.patrec.2007.01.017.
