Biomedical Signal Processing and Control 34 (2017) 1–8

Technical Note

Evaluation of effect of unsupervised dimensionality reduction techniques on automated arrhythmia classification
Rekha Rajagopal ∗ , Vidhyapriya Ranganathan
Department of Information Technology, PSG College of Technology, Coimbatore 641 004, India

Article info

Article history:
Received 9 December 2015
Received in revised form 16 September 2016
Accepted 20 December 2016

Keywords:
Biomedical signal processing
Decision support systems
Feature extraction
Supervised learning

Abstract

Automation in cardiac arrhythmia classification helps medical professionals to make accurate decisions about the patient's health. The aim of this work is to evaluate the performance of five different linear and nonlinear unsupervised dimensionality reduction (DR) techniques, namely principal component analysis (PCA), fast independent component analysis (fastICA) with tangential, kurtosis and Gaussian contrast functions, kernel PCA (KPCA) with polynomial kernel, hierarchical nonlinear PCA (hNLPCA) and principal polynomial analysis (PPA), on classification of cardiac arrhythmias using a probabilistic neural network (PNN) classifier. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, dimensionality reduction through unsupervised DR techniques, and arrhythmia classification using PNN. PCA is a widely used DR technique for mapping high dimensional data to its low dimensional representation. But real world data like electrocardiogram (ECG) signals are complex and nonlinear in nature. This work concentrates on performance analysis of four nonlinear DR techniques and the conventional linear PCA technique on classification of cardiac arrhythmias. The entire MIT-BIH arrhythmia database is used for experimentation. The experimental results demonstrate that the combination of PNN classifier (at spread parameter σ = 0.4) and fastICA DR technique with tangential contrast function exhibits the highest F score of 99.83% with a minimum of 10 dimensions. hNLPCA and KPCA require more computation time for low dimensional mapping. PPA performs about 10% better than PCA and serves as an intermediate between linear and nonlinear techniques.
© 2016 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail addresses: rekha.psgtech@gmail.com, jp.psgtech@gmail.com (R. Rajagopal), rvidhyapriya@gmail.com (V. Ranganathan).

http://dx.doi.org/10.1016/j.bspc.2016.12.017
1746-8094/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Cardiovascular disease is a leading cause of global mortality. Hence, there is a need to develop automation strategies for the management of sudden cardiac death [1]. Abnormality in the normal rhythm of heartbeat causes arrhythmia. The ANSI/AAMI EC57: 1998 classification standard categorizes arrhythmias into five classes, namely: non ectopic beat (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F) and unknown beat (Q). The diagnosis of a specific class of arrhythmia is done by careful monitoring of the long term electrocardiograph (ECG) signal. Automation in ECG arrhythmia classification is essential in order to make fast and accurate decisions on the arrhythmia class. The important requirements of an automated system are reduced complexity, fast decision making and less memory. This can be accomplished with the use of dimensionality reduction techniques.

Dimensionality reduction is the mapping of data in a higher dimensional space to a meaningful representation in a lower dimensional space such that the dimensionality of data is reduced to a manageable size. Dimensionality reduction is very important in a classification system in order to mitigate the curse of dimensionality, computational complexity and memory requirement. Curse of dimensionality means that for a given sample size, there is a maximum number of features above which the performance of a classifier gets degraded. The objective of this work is to evaluate the performance of five different unsupervised dimensionality reduction techniques in the classification of ECG arrhythmias.

2. Related works

Several research works have been carried out for automation in arrhythmia classification. In general, the algorithm used for automated classification includes (i) preprocessing, (ii) feature extraction, (iii) dimensionality reduction and (iv) classification. The preprocessing of recorded ECG signals is done in order to eliminate the main noise sources that degrade the classifier performance, such as baseline wandering, motion artifact, power line interference and high frequency noise. Currently, researchers use many filtering techniques like morphological filtering [2], integral coefficient band stop filtering [3], finite impulse response filtering [4], 5–20 Hz band pass filtering [5,6], median filtering [7] and wavelet based denoising [7–9] for preprocessing.

Table 1
Number of heartbeats in each class.

Heartbeat type    N        S       V       F      Q
Full database     87643    2646    6792    794    15
Commonly extracted ECG features include (i) temporal features of the heartbeat, such as the P–Q interval, QRS interval, S–T interval, Q–R interval, R–S interval and the R–R interval between adjacent heartbeats; (ii) amplitude based features, such as the P peak, Q peak, R peak, S peak and T peak amplitudes; (iii) wavelet transform based features, which include the Haar wavelet, Daubechies wavelet and discrete Meyer wavelet at decomposition levels of 4, 6 and 8; and (iv) Stockwell transform based features, which include statistical features taken from the complex matrix of the Stockwell transform, the time–frequency contour and the time–maximum amplitude contour.

Dimensionality reduction is applied in order to remove redundancy and extract useful information from captured features. Supervised dimensionality reduction techniques require a training set with class label information to learn the lower dimensional representation. Commonly used supervised techniques are linear discriminant analysis (LDA) [10], generalized discriminant analysis (GDA) [5], neighborhood component analysis and metric learners. Unsupervised techniques do not require class label information. A few examples of unsupervised techniques are principal component analysis, independent component analysis, canonical correlation analysis, partial least squares, isomap, kernel PCA, maximum variance unfolding, diffusion maps, locally linear embedding, Laplacian eigenmaps, Hessian LLE, local tangent space analysis, Sammon mapping and multilayer autoencoders [11].

Support vector machine (SVM) [5,12–17], probabilistic neural network (PNN) [9,15,18,19], multilayer perceptron neural network (MLPNN) [6,15,20], linear discriminant classifier [7], mixture of experts [15] and unsupervised clustering [21,22] are commonly used by researchers for classification of ECG arrhythmia. Parameters like accuracy, sensitivity and specificity are used in the literature for evaluating the performance of a classifier.

Most of the research works have compared the performance of classifiers using PCA [2,9], LDA [9,10] and ICA [9,23,24] for dimensionality reduction. Very few research works use nonlinear DR techniques in arrhythmia classification. This work concentrates on evaluating the performance of an automated classification system when DR techniques such as fast ICA, Kernel PCA, hNLPCA, PPA and PCA are used. The next section discusses the materials and methods used in this work.

3. Materials used

The MIT BIH arrhythmia database [25] is used in this work. It contains 48 half-hour excerpts of two channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH arrhythmia laboratory. The recordings were digitized at 360 samples per second per channel with 11 bit resolution over a 10 mV range. The reference annotations for each beat are included in the database. Four records containing paced beats (102, 104, 107 and 217) were removed from analysis as specified by AAMI. The total number of heartbeats in each class is given in Table 1.

4. Methodology

Fig. 1 shows the architecture of the proposed work. All experiments are carried out using Matlab R2014a. The details of the methodology are summarized below.

Fig. 1. Architecture of the proposed work.
[Flowchart: Continuous ECG recordings from the MIT-BIH arrhythmia database → ECG beat segmentation → Wavelet based feature extraction → Dimensionality reduction (linear: PCA; nonlinear: fast ICA, Kernel PCA, hNLPCA, PPA) → Divide entire samples into 10 equal sets (Set 1, Set 2, Set 3, ..., Set 10) → 90% training, 10% testing → PNN classifier → Ten fold cross validation.]

4.1. Data preprocessing

The records contain continuous ECG recordings of 30 min duration. The raw ECG signals include baseline wander, motion artifact and power line interference noise. The discrete wavelet transform (DWT) is used for denoising the ECG signal and also for extracting the important features from the original ECG signal [26,27]. DWT captures both temporal and frequency information. The DWT of the original ECG signal is computed by successive high pass and
low pass filtering of that signal. This can be represented mathematically as in Eqs. (1) and (2),

$$y_{\mathrm{high}}[k] = \sum_{n=-\infty}^{\infty} x[n]\, g[2k - n] \tag{1}$$

$$y_{\mathrm{low}}[k] = \sum_{n=-\infty}^{\infty} x[n]\, h[2k - n] \tag{2}$$

where $x[n]$ is the original ECG signal samples, $g$ and $h$ are the impulse responses of the high pass and low pass filters respectively, and $y_{\mathrm{high}}[k]$, $y_{\mathrm{low}}[k]$ are the outputs of the high pass and low pass filters after subsampling by 2. This procedure is repeated until the required decomposition level is reached. The low frequency component is called the approximation and the high frequency component is called the detail.

In this work, the raw ECG signals are decomposed into approximation and detail sub bands up to level 9 using the Daubechies ('db8') wavelet basis function [28]. The detail coefficients contain most of the noise information. Hence, soft thresholding is applied to the detail coefficients at each level. The denoised ECG signal is computed from the original approximation coefficients at the 9th level and the modified detail coefficients of levels 1 to 9. After denoising, the continuous ECG waveform is segmented into individual heartbeats. This segmentation is done by identifying the R peaks using the Pan Tompkins algorithm [29] and by considering 99 samples before the R peak and 100 samples after the R peak. This choice of 200 samples, including the R peak, is made because it constitutes one cardiac cycle with P, QRS and T waves. Figs. 2 and 3 show a segment of the recorded ECG waveform of patient identifier 209 before and after preprocessing respectively.

Fig. 2. Segment of continuous ECG waveform before preprocessing.

Fig. 3. Segment of continuous ECG waveform after preprocessing.

4.2. Feature extraction

The entire database (97,890 heartbeats) is divided into ten sets each containing 9789 heartbeats. Nine sets are used for training (88,101 heartbeats) and one set for testing (9789 heartbeats). From each heartbeat, wavelet based features are extracted by using the Daubechies wavelet ('db4'). The Daubechies wavelet with level 4 decomposition is selected in this work after making performance comparisons with the discrete Meyer wavelet and other Daubechies wavelets including 'db2' and 'db6'. The power spectral density of ECG beats showed that most of the ECG signal variations occur in the fourth sub band frequency ranges of (0–11.25 Hz) and (11.25–22.5 Hz). A total of 107 features are produced by the 4th level approximation sub band coefficients and another 107 features by the 4th level detail sub band coefficients. Fig. 4 shows the decomposition of the heartbeat signal (sampled at 360 Hz) into approximation and detail sub bands up to level 2. This procedure can be repeated for further decomposition.

4.3. Dimensionality reduction

The coefficients of the approximation and detail sub bands at level 4 decomposition of the Daubechies wavelet transform are used as features to represent each heartbeat. Hence each heartbeat is represented by 214 features (107 coefficients from the approximation sub band plus 107 coefficients from the detail sub band). In order to extract useful information from the features and to remove redundancy, the following dimensionality reduction techniques are applied.
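As a rough illustration of how the approximation and detail coefficients used as features arise, the sketch below implements one level of the filter bank of Eqs. (1) and (2) together with soft thresholding of the detail coefficients. It is a minimal Python sketch and not the Matlab implementation used in this work. The Haar filter pair, the zero padding at the signal borders, the toy signal and the threshold value are all assumptions made for brevity (the work itself uses 'db8' for denoising and 'db4' for feature extraction), and the names analysis_step and soft_threshold are hypothetical.

```python
import math

# Assumption: Haar analysis filters, chosen only because they are the shortest
# orthogonal pair; the work itself uses Daubechies 'db8' and 'db4'.
S = 1.0 / math.sqrt(2.0)
H = [S, S]    # low pass impulse response h
G = [S, -S]   # high pass impulse response g


def analysis_step(x, h=H, g=G):
    """One filter bank level: y[k] = sum_n x[n] * f[2k - n], as in Eqs. (1)-(2).

    Samples outside the signal are treated as zero (an assumption).
    Returns (approximation, detail).
    """
    def filter_and_subsample(taps):
        # Even-indexed samples of the full convolution of x with taps.
        n_out = (len(x) + len(taps)) // 2
        out = []
        for k in range(n_out):
            acc = 0.0
            for j, tap in enumerate(taps):  # j plays the role of 2k - n
                n = 2 * k - j
                if 0 <= n < len(x):
                    acc += x[n] * tap
            out.append(acc)
        return out

    return filter_and_subsample(h), filter_and_subsample(g)


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


# Toy signal (hypothetical): a flat segment followed by a step.
x = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
approx, detail = analysis_step(x)
denoised_detail = soft_threshold(detail, t=0.5)  # t = 0.5 is an arbitrary choice
```

Applying analysis_step again to the approximation output gives the next decomposition level, which is how the deeper levels described in Section 4.1 are reached.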
Fig. 4. Two level wavelet filter bank.
[Diagram: the ECG signal samples x[n] (sampled at 360 Hz) pass through h[n] and g[n], each followed by downsampling by 2, giving the level 1 detail coefficients (90 to 180 Hz); the low pass branch passes through h[n] and g[n] again, each followed by downsampling by 2, giving the level 2 approximation coefficients (0 to 45 Hz) and the level 2 detail coefficients (45 to 90 Hz).]

4.3.1. Linear technique

Linear mapping of high dimensional data to low dimensional data is done with this technique. The important advantages of linear techniques are their simple geometric interpretation and attractive computational properties [11].

4.3.1.1. Principal component analysis (PCA). PCA, which maximizes the amount of variance retained in the data, is used in this work. The steps followed for dimensionality reduction of a data matrix X consisting of n ECG samples $x_i$ ($i \in \{1, 2, \dots, n\}$) with D features (D is 214 in the proposed work, since 214 features are extracted from each heartbeat using the wavelet transform) are given below,

1 Subtraction of the mean

This step produces a data set whose mean $\bar{x}$ is zero.

2 Calculation of covariance matrix C is shown in (3).

$$C = (x - \bar{x})(x - \bar{x})^{T} \tag{3}$$

3 Calculation of eigenvectors V and eigenvalues E is shown in (4)

$$V^{-1} C V = E \tag{4}$$

4 Sorting of eigenvectors

The eigenvector with the highest eigenvalue is the principal component of the data. Eigenvectors are ordered by eigenvalue, highest to lowest.

5 Projection of data is shown in (5)

$$\text{Projected data} = \left[ V^{T} (x - \bar{x})^{T} \right]^{T} \tag{5}$$

Arrhythmia classification is done by varying the number of selected components from 1 to 10 in the projected data and the results are analyzed.

4.3.2. Nonlinear techniques

Nonlinear techniques have the ability to deal with complex nonlinear real world data. The following nonlinear techniques are tested to analyze their performance in comparison with traditional PCA.

4.3.2.1. Fast independent component analysis (Fast ICA). ICA produces a data representation in which the transformed components are statistically independent of each other. For computational simplicity and low memory use, the fixed point ICA algorithm is used in this work [30]. The steps involved are,

1 Data centering

This step produces a data set whose mean ($\bar{x}$) is zero and is shown in (6).

$$X_C = X - E\{X\} \tag{6}$$

where X is the input data matrix.

2 Whitening

The eigenvectors V and eigenvalues E are computed from the covariance matrix of the centered data, and whitening is performed as in (7),

$$Z = E^{-1/2} V^{T} X_C \tag{7}$$

3 Fixed point iteration for one unit

It estimates one row of the demixing matrix as a vector $w^{T}$. Estimation of w proceeds iteratively as in (8) until convergence is achieved.

• Selection of initial random vector w

$$w_p = E\{x\, g(w^{T} x)\} - E\{g'(w^{T} x)\}\, w \tag{8}$$

where $w_p^{*} = w_p / \|w_p\|$, $\|w_p\|$ is the norm of $w_p$ and g is the contrast function.

4 Evaluation of next independent components

To estimate the other independent components, step 3 is repeated to obtain the weight vectors $w_i$, $i = 2, 3, \dots, n$. To prevent different vectors from converging to the same optimum, the weight vectors are decorrelated using Gram-Schmidt orthogonalization, given by

$$w_{p+1} = w_{p+1} - \sum_{j=1}^{p} \left( w_{p+1}^{T} w_j \right) w_j$$

and $w_{p+1}$ is renormalized as

$$w_{p+1} = w_{p+1} / \sqrt{w_{p+1}^{T} w_{p+1}}$$

4.3.2.2. Kernel principal component analysis (Kernel PCA). Using Kernel PCA [31], the linear operations of PCA are carried out in a feature space obtained through a nonlinear mapping. Kernel PCA computes the principal eigenvectors of the kernel matrix instead of the covariance matrix. The steps involved are,

1 Computation of kernel matrix K

For the data samples $\{x_i\}$, the kernel matrix is computed. The entries in the kernel matrix are given by $k_{i,j} = k(x_i, x_j)$, where k is a kernel function such as Gaussian or polynomial.

2 Double centering of K

It means subtracting the mean of the data in the feature space defined by the kernel function and is shown in (9).

$$k_{i,j} = -\frac{1}{2}\left( k_{i,j} - \frac{1}{n}\sum_{l} k_{il} - \frac{1}{n}\sum_{l} k_{jl} + \frac{1}{n^{2}}\sum_{l,m} k_{lm} \right) \tag{9}$$

where n is the number of data samples.
3 Computation of eigenvectors $v_i$ of the centered kernel matrix.
4 Computation of eigenvectors $a_i$ of the covariance matrix in the feature space constructed by k.
5 Projection of data is shown in (10)

$$y_i = \left\{ \sum_{j=1}^{n} a_1^{(j)}\, k(x_j, x_i),\ \dots,\ \sum_{j=1}^{n} a_d^{(j)}\, k(x_j, x_i) \right\} \tag{10}$$

where $a_1^{(j)}$ indicates the jth value in the vector $a_1$ and k is the kernel function. Arrhythmia classification is done by varying the number of selected components from 1 to 10 in the projected data and the results are analyzed.

4.3.2.3. Hierarchical nonlinear PCA (hNLPCA). hNLPCA [32] is an extension of PCA in which both the principal component values and the mapping function are provided by a neural network. The layers involved are the input, hidden and output layers. The hidden layer performs nonlinear mapping of high dimensional data to lower dimensions. The input and output layers have D nodes and the hidden layer has d nodes, where D is the actual dimension and d is the reduced dimension. The term hierarchical means that the same hierarchical order as the linear components of standard PCA is maintained. The steps involved are,

1 A feed forward neural network is trained such that the mean squared error between the input and output is minimized. hNLPCA tries to minimize the hierarchical error as well.

The hierarchical error function is given in (11),

$$E_H = E_1 + E_{1,2} + E_{1,2,3} + \dots + E_{1,2,3,\dots,d} \tag{11}$$

and $E = \frac{1}{2}\|\hat{x} - x\|^{2}$, where x is the data input with dimension D and $\hat{x}$ is its reconstructed value.

2 Optimal network weights are found using the conjugate gradient descent approach.
3 A weight decay term is added in order to penalize large network weights w.

$E_{\mathrm{total}} = E + v \sum_{i} w_i^{2}$, where w is the network weight and v is the weight decay coefficient.

4 At each iteration, single error terms such as $E_1$, $E_{1,2}$, ... are calculated separately and the d-dimensional space with minimal mean square error is found.

4.3.2.4. Principal polynomial analysis (PPA). PPA models the directions of maximal variance by means of curves instead of the straight lines used by PCA [33]. PPA performs simple univariate regression, which keeps it computationally feasible. The steps involved are,

1 Data centering:

This step produces a data set whose mean ($\bar{x}$) is zero and is shown in (12).

$$X_C = X - E\{X\} \tag{12}$$

2 The best vector for data projection is located by finding the leading eigenvector of conventional PCA.
3 The conditional mean $\hat{m}$ of the D dimensional data sample x is estimated from the projection by a polynomial fitted to minimize the residual $|x - \hat{m}|$.
4 The conditional mean is subtracted from every data sample.
5 Steps 3 and 4 are repeated for the remaining (D-1) dimensions. Data of reduced dimension will be the input for the next step in the sequence.

Properties of the DR techniques discussed above are shown in Table 2.

Table 2
Properties of DR techniques.

Technique     Mixing model   Parameters   Computational complexity
PCA           Linear         None         O(D^3)
FastICA       Non linear     g, ε         O(2(D+1)ni)
Kernel PCA    Non linear     k(., .)      O(n^3)
NLPCA         Non linear     Net size     O(inw)
PPA           Non linear     γ            O(D^3 + (D−1)(γ+1)^3)

D: dimensionality of input sample; g: contrast function; ε: convergence parameter; n: number of data points; i: number of iterations; k: kernel function; w: number of weights in a neural network; γ: polynomial degree.

4.4. Probabilistic neural network classifier

A probabilistic neural network (PNN) is used for classification of ECG beats. It is a feed forward network with input, hidden, summation and output layers. When an input is given, the hidden layer computes the distances between the input vector and the training input vectors and produces a vector whose elements indicate how close the input is to each training input. The summation layer sums these contributions for each class of inputs to produce as its net output a vector of probabilities. The output layer picks the maximum of these probabilities, and produces a 1 for that class and a 0 for the other classes. A radial basis function (RBF) is used as the transfer function. The PNN is trained with 88101 ECG beat samples, which include training examples from all five classes. The training and testing matrices are constructed such that each row represents an ECG heartbeat and the features occupy the columns.

The performance of the arrhythmia classifier system is evaluated using performance metrics such as accuracy, sensitivity, specificity, false positive rate and F score. These metrics are computed from the TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) counts and are defined as follows: Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP), False positive rate = FP/(TN + FP), F score = 2TP/(2TP + FP + FN) and Accuracy = (TP + TN)/(TP + FP + FN + TN). The process is repeated ten times such that each set is used once for testing. The overall performance of the classifier is computed by taking the average of all ten folds.

5. Results and discussion

The goodness of a classifier in accurately classifying the test heartbeat class is measured mainly by the sensitivity and F score. Accuracy is not considered because even a poor classifier can show good accuracy by favoring the class with more training examples. Fig. 5 shows the average accuracy of the PNN classifier taken from 10 fold cross validation for different DR techniques and dimensions. The result shows that kernel PCA with polynomial nonlinearity is able to get the highest accuracy of 99.88% even at lower dimensions. But the time taken for dimensionality reduction is higher than for PCA, fastICA and PPA because KPCA computes the eigenvectors of the kernel matrix instead of the covariance matrix.

Sensitivity is an important classification parameter which indicates the number of correctly identified arrhythmias from patients. Fig. 6 shows the results of average sensitivity taken from 10 fold
cross validation of a PNN classifier for different DR techniques and dimensions. Experimental results show that fast ICA with Gaussian nonlinearity is able to achieve 99.54% sensitivity with only 5 dimensions. PCA, which performs a linear mapping, is able to produce only a maximum sensitivity of 78.74% even when 10 dimensions are used.

Specificity indicates the true negative rate and is shown in Fig. 7. The result shows that kernel PCA with polynomial nonlinearity is able to get the highest specificity of 99.9% even at lower dimensions. The average false positive rate for different DR techniques and dimensions taken from 10 fold cross validation is shown in Fig. 8. FastICA with Gaussian nonlinearity produces the least false positive rate of 0.05% even with two dimensions. F score is an important parameter that combines both precision and recall. The average F score results of various DR techniques and different dimensions are shown in Fig. 9. Results show that fastICA with Gaussian nonlinearity gives the highest F-score of 99.67% with only 5 dimensions.

Table 3 shows the variation of average F scores with respect to different spread values during PNN classification using fastICA and Kernel PCA DR techniques. PNN with fastICA (tangential contrast function) coefficients yielded the highest F-score of 99.83% at spread value 0.4. PNN with Kernel PCA (polynomial kernel) coefficients yielded the highest F-score of 93.05% at spread value 0.2. Table 4 shows the variation of average F scores with respect to different spread values during PNN classification using PCA, hNLPCA and PPA DR techniques. PNN with PCA coefficients yielded the highest F-score of 78.82% at spread value 0.08. PNN with hNLPCA coefficients yielded the highest F-score of 92.53% at spread value 0.02 and PNN with PPA coefficients yielded the highest F-score of 91.51% at spread value 0.07.

The time taken by the various dimensionality reduction techniques for mapping the high dimensional representation of ECG samples (97,890 × 214, i.e. 97,890 heartbeats with 214 features) to the low dimensional representation (97,890 × 10, i.e. reduced to 10 features) is shown in Table 5.

Fig. 5. Results of average accuracy of PNN classifier for different DR techniques and dimensions.
[Plot: accuracy in % versus number of dimensions (1 to 10) for PCA, KPCA-POLY, PPA, fastICA-pow3, fastICA-tanh, fastICA-gauss and hNLPCA.]

Fig. 6. Results of average sensitivity of PNN classifier for different DR techniques and dimensions.
[Plot: sensitivity in % versus number of dimensions (1 to 10) for the same seven techniques.]

Fig. 7. Results of average specificity of PNN classifier for different DR techniques and dimensions.
[Plot: specificity in % versus number of dimensions (1 to 10) for the same seven techniques.]

Fig. 8. Results of average false positive rate of PNN classifier for different DR techniques and dimensions.
[Plot: false positive rate in % versus number of dimensions (1 to 10) for the same seven techniques.]

Fig. 9. Results of average F score of PNN classifier for different DR techniques and dimensions.
[Plot: F score in % versus number of dimensions (1 to 10) for the same seven techniques.]

Table 3
Variation of average F scores with respect to different spread values during PNN classification using fastICA and Kernel PCA DR techniques.

Spread parameter    Average F score
                    FastICA–tanh    KernelPCA–poly
0.1                 78.11           92.92
0.2                 99.62           93.05
0.3                 99.75           93.05
0.4                 99.83           92.97
0.5                 99.83           92.97
0.6                 95.87           92.97
0.7                 95.87           92.9
0.8                 95.87           92.74
0.9                 95.87           92.52
1                   95.72           91.66

Table 4
Variation of average F scores with respect to different spread values during PNN classification using PCA, hNLPCA and PPA DR techniques.

Spread parameter    Average F score
                    PCA      hNLPCA    PPA
0.01                61.77    79.13     61.21
0.02                76.74    92.53     89.19
0.03                78.19    90.51     91.2
0.04                78.6     85.91     91.49
0.05                78.74    80.98     91.49
0.06                78.74    73.5      91.49
0.07                78.74    66.63     91.51
0.08                78.82    59        91.28
0.09                78.6     51.34     91.28
0.1                 78.5     53.22     91.43
R. Rajagopal, V. Ranganathan / Biomedical Signal Processing and Control 34 (2017) 1–8 7

Table 5 evaluated by using various linear and nonlinear dimensionality


Computation time required by various DR techniques for the input ECG data samples.
reduction techniques with different dimensions of data represen-
DR Technique Computation time (Seconds) tations. Experimental results revealed that the conventional linear
PCA 0.6 DR technique like PCA is easy to apply, but is not able to capture
fastICA–gauss 2 important information required for class discrimination from very
KernelPCA–poly 170 lower dimensional data representation. Nonlinear DR techniques
hNLPCA 4576 are able to capture significant information from the lower dimen-
PPA 0.8
sional representation itself. At the same time, nonlinear techniques
face the important drawback of tuning the parameters. The time
Table 6 required for computation is also high compared to linear DR tech-
Comparison of arrhythmia classifiers classifying classes N, S, V and F. niques. hNLPCA requires comparatively very huge time for low
F Score of each arrhythmia class
dimensional mapping than other techniques because of the tedious
training procedure and slow convergence. PPA is computationally
Class N Class S Class V Class F
feasible, but not able to achieve high sensitivity and F score as
Chazal et al. [10] 92.60 51.09 79.76 0.15 other nonlinear techniques since it is just a nonlinear generaliza-
Llamedo et al. [34] 87.15 53.66 85.38 8.10 tion of PCA. FastICA in combination with PNN classifier provides
Ye et al. [24] 92.80 56.25 70.01 4.43
Llamedo et al. [35] 97.46 58.43 88.58 5.33
the highest sensitivity and F score with reduced computation time.
Zhang et al. [36] 93.69 49.45 88.96 23.95
Herry et al. [37] 95.18 12.19 75.15 0
Proposed 99.91 80.97 91.54 86.76 References

[1] R. Mehra, Global public health problem of sudden cardiac death, J.


Based on experimental results obtained, it is found that fast Electrocardiol. (2007) 118–122, http://dx.doi.org/10.1016/j.jelectrocard.2007.
06.023.
ICA dimensionality reduction technique with ‘tanh’ non-linearity [2] J. Kim, S. Shin H, K. Shin, M. Lee, Robust algorithm for arrhythmia
produced the best average F-score of 99.83% when tenfold cross classification in ECG using extreme learning machine, Biomed. Eng. Online 8
validation was done by dividing the entire dataset into ten equal (1) (2009) 1–12, http://dx.doi.org/10.1186/1475-925x-8-31.
[3] W. Liang, Y. Zhang, J. Tan, Y. Li, A novel approach to ECG classification based
sets. The problem with this cross validation approach is that there upon two-layered HMMs in body sensor networks, Sensors (2014)
is a chance for training and testing matrix to get ECG beats from 5994–6011, http://dx.doi.org/10.3390/s140405994.
same patient. Hence experiments were repeated by dividing entire [4] H. Kim, R.F. Yazicioglu, P. Merken, C. Van hoof, H.J. Yoo, ECG signal
compression and classification algorithm with quad level vector for ECG
patient records into ten instead of dividing entire ECG beats into holter system, IEEE Trans. Inf. Technol. Biomed. (2010) 93–100, http://dx.doi.
ten partitions, so that 10% of patient records are tested against 90% org/10.1109/TITB.2009.2031638.
of patient records. The results obtained using the fast ICA (tanh non-linearity) DR technique are compared with six arrhythmia classifiers, and the results are shown in Table 6. The best F Score obtained for each class is shown in bold.

An automatic heartbeat classification system was proposed by [10]. They used morphological and temporal features and achieved 75.9% sensitivity, 38.5% positive predictivity and 4.7% false positive rate for the S class. They used two ECG leads for feature extraction, and the final decision was obtained by combining the outputs of two linear discriminant classifiers. The proposed method uses the ECG signal from only one lead. Random selection of training samples was done by [24]. Cluster based classification was done by [35], who improved classifier performance with expert assistance. The proposed method does not require any expert assistance. Feature selection was performed through a one-versus-one feature ranking stage by [36], who achieved sensitivities of 88.94%, 79.06%, 85.48% and 93.81% for classes N, S, V and F respectively using two ECG leads. The synchrosqueezing transform was used to enhance R peak detection by [37], who achieved 91.02% overall accuracy, but the classification performance for classes other than N and V was very poor. The proposed model uses only DWT features reduced using fast ICA. The results show that the F Score of the classifier is 80.97% for class S and 86.76% for class F. The low F Score is due to the heavy overlap of class S with class N and the few training examples available for certain classes. Future work is to overcome the class overlap and class imbalance problems in arrhythmia classification. The proposed system is compared with existing systems that followed the AAMI recommended practice of arrhythmia classification, in which training and testing data do not contain ECG beats from the same patient.

6. Conclusion

Classification of cardiac arrhythmias into five different classes as per AAMI standard is performed. Classifier performance is

[5] B.M. Asl, S.K. Setarehdan, M. Mohebbi, Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal, Artif. Intell. Med. 44 (2008) 51–64, http://dx.doi.org/10.1016/j.artmed.2008.04.007.
[6] J.J. Oresko, Z. Jin, S. Huang, Y. Sun, Heather Duschland Allen Cheng C. A wearable smart phone based platform for real time cardiovascular disease detection via electrocardiogram processing, IEEE Trans. Inf. Technol. Biomed. 14 (2010) 734–740, http://dx.doi.org/10.1109/TITB.2010.2047865.
[7] Kai Huang, Liqing Zhang, Cardiology knowledge free ECG feature extraction using generalized tensor rank one discriminant analysis, EURASIP J. Appl. Signal Process. 2014 (1) (2014) 1–15, http://dx.doi.org/10.1186/1687-6180-2014-2.
[8] B. Zhu, Y. Ding, K. Hao, A novel automatic detection for ECG arrhythmias using maximum margin clustering with immune evolutionary algorithm, Comput. Math. Methods Med. 2013 (April) (2013) 1–8, http://dx.doi.org/10.1155/2013/453402.
[9] R.J. Martis, U. Rajendra Acharya, Min Lim Choo, ECG beat classification using PCA, LDA, ICA and discrete wavelet transform, Biomed. Signal Process. Control 8 (2013) 437–448, http://dx.doi.org/10.1016/j.bspc.2013.01.005.
[10] P. Chazal, M.O. Dwyer, Automatic classification of heartbeats using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. 51 (2004), http://dx.doi.org/10.1109/tbme.2004.827359.
[11] Laurens van der Maaten, Eric Postma, Dimensionality Reduction: A Comparative Review, TiCC, Tilburg University, 2009.
[12] M.H. Song, Jeon Lee, Sungpilcho, Kyoung Joung Lee, Sun Kook Yoo, Support vector machine based arrhythmia classification using reduced features, Int. J. Control Automat. Syst. 3 (2005) 571–579.
[13] Z. Zidelmal, A. Amirou, D. Ould Abdeslamand, J. Merckle, ECG beat classification using a cost sensitive classifier, Comput. Method Prog. Biomed. 111 (2013) 570–577, http://dx.doi.org/10.1016/j.cmpb.2013.05.011.
[14] A. Daamouche, L. Hamami, N.A. Farid Melgani, A wavelet optimization approach for ECG signal classification, Biomed. Signal Process. Control 7 (2012) 342–349, http://dx.doi.org/10.1016/j.bspc.2011.07.001.
[15] E.D. Ubeyli, Usage of Eigen vector methods in implementation of automated diagnostic systems for ECG beats, Digit. Signal Process. 18 (2008) 33–48, http://dx.doi.org/10.1016/j.dsp.2007.05.005.
[16] F. Melgani, Y. Bazi, Classification of electrocardiogram signals with support vector machines and particle swarm optimization, IEEE Trans. Inf. Technol. Biomed. 12 (2008) 667–677, http://dx.doi.org/10.1109/TITB.2008.923147.
[17] J. Wiensand, J.V. Guttag, Patient adaptive ectopic beat classification using active learning, Proceedings of the 2010 Computing in Cardiology, IEEE (2010) 109–112, http://hdl.handle.net/1721.1/73888.
[18] Manab Kumar Das, Samit Ari, ECG beats classification using mixture of features, Int. Sch. Res. Notices 2014 (2014).
[19] Manab Kumar Das, Samit Ari, Electrocardiogram beat classification using S-transform based feature set, J. Mech. Med. Biol. (2014) 14.
[20] Hari Mohan Rai, Anurag Trivedi, Shailja Shukla, ECG signal processing for abnormalities detection using multi-resolution wavelet transform and artificial neural network classifier, Measurement 46 (2013) 3238–3246, http://dx.doi.org/10.1016/j.measurement.2013.05.021.
[21] Behnaz Ghoraani, Sridhar Krishnan, Discriminant non-stationary signal features clustering using hard and fuzzy cluster labeling, EURASIP J. Adv. Signal Process. 2012 (1) (2012) 1–20, http://dx.doi.org/10.1186/1687-6180-2012-250.
[22] Fahim Sufi, Ibrahim Khalil, Abdun Naser Mahmood, A clustering based system for instant detection of cardiac abnormalities from compressed ECG, Expert Syst. Appl. 38 (2011) 4705–4713, http://dx.doi.org/10.1016/j.eswa.2010.08.149.
[23] Sung Nien Yu, Combining independent component analysis and back propagation neural network for ECG beat classification, Proc. Conf. IEEE Eng. Med. Biol. Soc. (2006) 3090–3093.
[24] Can Ye, B.V.K. Vijayakumar, M.T. Coimbra, Heartbeat classification using morphological and dynamic features of ECG signals, IEEE Trans. Biomed. Eng. (2012) 59.
[25] G.B. Moody, R.G. Mark, The impact of the MIT BIH arrhythmia database, IEEE Eng. Med. Biol. 20 (2001) 45–50, http://dx.doi.org/10.1109/51.932724.
[26] E.B. Mazomenos, D. Biswas, A. Acharyya, Taihai Chen, A low-complexity ECG feature extraction algorithm for mobile healthcare applications, IEEE J. Biomed. Health Inform. 17 (2013) 459–569, http://dx.doi.org/10.1109/TITB.2012.2231312.
[27] Li Deqiang, Pedrycz Witold, J. Nicolino Pizzi, Fuzzy wavelet packet based feature extraction method and its application to biomedical signal classification, IEEE Trans. Biomed. Eng. 52 (2005) 1132–1139, http://dx.doi.org/10.1109/TBME.2005.848377.
[28] B.N. Singh, A.K. Tiwari, Optimal selection of wavelet basis function applied to ECG signal denoising, Digit. Signal Process. 16 (2006) 275–287, http://dx.doi.org/10.1016/j.dsp.2005.12.003.
[29] J. Pan, J.W. Tompkins, A real time QRS detection algorithm, IEEE Trans. Biomed. Eng. (1985) 32, http://dx.doi.org/10.1109/tbme.1985.325532.
[30] Aapo Hyvarinen, Fast and robust fixed point algorithms for independent component analysis, IEEE Trans. Neural Netw. (1999) 626–634, http://dx.doi.org/10.1109/72.761722.
[31] J. Shawe Taylor, N. Christianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK, 2004, http://dx.doi.org/10.1017/CBO9780511809682.
[32] M. Scholz, R. Vigario, Nonlinear PCA: A new hierarchical approach, Proc. ESANN (2002) 439–444.
[33] V. Laparra, S. Jimenez, D. Tuia, G. Camps-Valls, J. Malo, Principal polynomial analysis, Int. J. Neural Syst. (2014) 26.
[34] M. Llamedo, Martinez Juan Pablo, Heartbeat classification using feature selection driven by database generalization criteria, IEEE Trans. Biomed. Eng. 58 (2011) 616–625, http://dx.doi.org/10.1109/TBME.2010.2068048.
[35] M. Llamedo, J.P. Martinez, An automatic patient adapted ECG heartbeat classifier allowing expert assistance, IEEE Trans. Biomed. Eng. 59 (2012) 2312–2320, http://dx.doi.org/10.1109/TBME.2012.2202662.
[36] Z. Zhang, J. Dong, X. Luo, K. Choi, X. Wu, Heartbeat classification using disease specific feature selection, Comput. Biol. Med. 46 (2014) 79–89, http://dx.doi.org/10.1016/j.compbiomed.2013.11.019.
[37] C.L. Herry, M. Frasch, H. Wu, Characterization of ECG patterns with the synchrosqueezing transform, (2015), arXiv:1510.02541.
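
The comparison with existing systems quotes per-class sensitivity, positive predictivity, false positive rate and F Score. As an illustration only, the sketch below shows one common way to derive these figures from a multi-class confusion matrix. It assumes the usual one-versus-rest definitions (F Score taken as the harmonic mean of sensitivity and positive predictivity); the function name `per_class_metrics` and the beat counts are hypothetical and are not results from this work.

```python
def per_class_metrics(confusion, labels):
    """Per-class metrics from a square confusion matrix.

    confusion[i][j] is the number of beats whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                         # class k beats labelled k
        fn = sum(confusion[k]) - tp                  # class k beats labelled otherwise
        fp = sum(row[k] for row in confusion) - tp   # other beats labelled k
        tn = total - tp - fn - fp                    # everything else
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        predictivity = tp / (tp + fp) if (tp + fp) else 0.0
        false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
        denom = sensitivity + predictivity
        f_score = 2 * sensitivity * predictivity / denom if denom else 0.0
        metrics[label] = {
            "sensitivity": sensitivity,
            "positive_predictivity": predictivity,
            "false_positive_rate": false_positive_rate,
            "f_score": f_score,
        }
    return metrics


# Hypothetical beat counts for three of the AAMI classes (illustrative only).
example_labels = ["N", "S", "V"]
example_confusion = [
    [90, 8, 2],   # true N
    [10, 30, 0],  # true S
    [1, 1, 18],   # true V
]
example_metrics = per_class_metrics(example_confusion, example_labels)

for name, values in example_metrics.items():
    print(name, {key: round(value, 4) for key, value in values.items()})
```

In this made-up matrix, class S loses a quarter of its beats to class N. This mirrors the kind of S/N overlap that the discussion identifies as the cause of the lower F Score for class S.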