
Chapter 3

ECG Signal Classification

The previous chapter reviewed existing methods for ECG signal and image classification. In this chapter, methods for ECG signal pre-processing, feature extraction and classification are presented. Support Vector Machines (SVM), Autoassociative Neural Network (AANN) and Gaussian Mixture Models (GMM) are the models used for classification. Section 3.2 explains the method for feature extraction of the ECG signal. Section 3.2.1 explains the procedure for ECG signal preprocessing. Section 3.2.2 describes the Morphological feature extraction, and Sections 3.2.3 and 3.2.4 describe LPC and MFCC computation respectively. Modeling techniques used for classification are presented in Section 3.3. Section 3.4 presents the performance measures used in the proposed work. The experimental results for normal/abnormal classification are given in Section 3.5.1. Section 3.5.2 discusses the experimental results for disease classification. In Section 3.5.3 the disease subcategory classification performance is presented.

3.1 Introduction

Feature extraction plays an important role in any classification task. It is the process of

finding the most informative and yet compact set of features, so that the effectiveness

of machine learning task can be enhanced. The main objective of the ECG feature

extraction process is to derive a set of parameters that best characterizes the ECG

signal. These parameters should contain maximum information about the ECG signal.

Hence the selection of these parameters is an important criterion to be considered for

proper classification. Cardiac disease classification, therefore, involves determination

of several characteristic features of the ECG signal [128]. More importantly, the most

common and appropriate ways of representing data for any classification and regression

problems are feature vectors.

In this work, ECG signals are classified as normal or abnormal, and three major conditions are considered within the abnormal category: Arrhythmia, Myocardial Infarction and Conduction Blocks. Each disease is further classified into three subcategories: Supraventricular tachycardia, Atrial fibrillation and Ventricular tachycardia for Arrhythmia; Anteroseptal infarction, Anterior infarction and Inferior infarction for Myocardial Infarction; and Atrioventricular blocks, Left bundle branch blocks and Right bundle branch blocks for Conduction Blocks.

Morphological features in an ECG are the essential features for diagnosing various cardiac diseases. Morphological analysis of the ECG signal has adopted various signal processing strategies over the past two decades [62]. Linear prediction is one of the most powerful signal analysis techniques for encoding good quality signals at a low bit rate, and is widely used in a variety of fields such as medical signal, speech and audio signal processing. ECG is often contaminated by noise and artifacts; therefore, denoising is the most significant step of the preprocessing stage. After pre-processing, the second stage towards classification is to extract features from the signals. Because ECG is a non-stationary signal, its irregularities may not be periodic and may show up at different intervals. Selection of an efficient technique to analyze these types of signals is an important task. The wavelet transform has proven to be a useful tool for ECG signal analysis [6] and is widely used in biomedical signal processing and denoising applications [7], [8], [9].

In the proposed work three types of features are used for ECG signal classification.

They are Morphological features of ECG, Linear Prediction Coefficients (LPC) and

Mel Frequency Cepstral Coefficients (MFCC). In the preprocessing stage the noise and artifacts in the signal are removed using DWT. This chapter addresses preprocessing, feature extraction and classification methods for the ECG signal.

3.2 ECG Signal Feature Extraction

3.2.1 Preprocessing

The ECG recordings retrieved from the databases may contain various kinds of noise caused by muscle contraction, electrode displacement, patient movement and so on. Hence the signal should be pre-processed to remove these artifacts before further processing. ECG signals are preprocessed using the filtering process.

In this work, the Daubechies filter of Discrete Wavelet Transform (DWT) is used for

denoising and Morphological feature extraction of the ECG signal. Throughout this

work, a sampling rate of 360 Hz, 16 bit monophonic, Pulse Code Modulation (PCM)

format of ECG signal is adopted.

Discrete Wavelet Transform

An ECG signal changes over time with respect to heartbeat events. The wavelet transform is a method for localizing the frequency content of a signal in time. It may use long sampling intervals where low-frequency information is needed and shorter sampling intervals where high-frequency information is available. The major advantage of the wavelet transform is its ability to perform multiresolution analysis, so that events can be localised in time with respect to all frequency components in the data. Thus, wavelet analysis is capable of revealing aspects of data that other signal analysis techniques miss, such as breakdown points and discontinuities in higher derivatives [129].

Daubechies wavelets are compactly supported orthonormal wavelets that make discrete wavelet analysis practicable [129]. The Daubechies wavelet is conceptually more complex, but it picks up detail that is missed by the Haar wavelet algorithm [130]. Choosing a wavelet function which closely matches the signal to be processed is of extreme importance in wavelet applications [131]. Here the wavelet is used to obtain the characteristic waves of the ECG signal, from which a set of features is derived. The 8 level wavelet decomposition based on the Daubechies 6 wavelet function is considered here. The Daubechies 6 wavelet is chosen based on its shape and its ability to analyze the signal in this particular application. The shape of Daubechies wavelets is similar to that of the QRS complex and their energy spectrum is concentrated around low frequencies. Wavelet processing is based on the idea of sub-band decomposition and coding. The two basic wavelet processes are decomposition and reconstruction. Decomposition expresses a signal in terms of a set of basis functions called wavelets. Signal decomposition using DWT is shown in Fig. 3.1. LoD and HiD are low pass and high pass decomposition filters respectively. 2 ↓ 1 or 1 ↓ 2 represents down sampling by 2. cA and cD are the approximation and detail coefficients.

Fig. 3.1: Signal decomposition using DWT

Wavelet transform theory uses two major concepts: scaling and shifting. Scaling, through dilation or compression, provides the capability to analyse a signal over different windows (sampling periods) in the data, while shifting, through delay or advancement, provides translation of the wavelet kernel over the entire signal. The wavelet transform is based on the principle of linear series expansion of a signal using a set of orthonormal basis functions. Through linear series expansion, a signal f(t) can be uniquely decomposed as a weighted combination of orthonormal basis functions as

$$ f(t) = \sum_{n} a_n \, \phi_n(t) \tag{3.1} $$

where n is an integer index with n ∈ Z (Z is the set of all integers), $a_n$ are weights and $\phi_n(t)$ are orthonormal basis functions.

A single level decomposition puts a signal through two complementary low-pass and high-pass filters. The output of the low-pass filter gives the Approximation (A) coefficients, while the high pass filter gives the Detail (D) coefficients. The A and D coefficients can be used to reconstruct the signal exactly when passed through the mirror reconstruction filters of the wavelet family. The Daubechies-6 wavelet family is used for filtering the noise. The output of the filter is the down sampled version of the 8th level coefficients, which is reconstructed to find the R peak, since the largest variations are visible at the 8th level. Minor variations are visible at the lower levels, so the other peaks such as P, Q, S and T are detected at those levels of reconstruction. To obtain the ECG features, the characteristic points P, Q, R, S and T are obtained at different decomposition levels [128].
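The filter-and-downsample step described above can be illustrated with a minimal sketch. It is an assumption-laden toy: the two-tap Haar filters stand in for the longer Daubechies 6 filters used in this work, and the names `dwt_level`, `idwt_level` and `wavedec` are hypothetical helpers, not part of any library.

```python
import math

# Haar analysis filters; an assumption made to keep the sketch short
# (the chapter itself uses Daubechies 6, whose filters are longer).
LO_D = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass decomposition filter (LoD)
HI_D = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass decomposition filter (HiD)


def dwt_level(signal):
    """One decomposition level: filter with LoD/HiD and downsample by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    cA, cD = [], []
    for i in range(0, len(signal), 2):
        pair = signal[i:i + 2]
        cA.append(sum(h * v for h, v in zip(LO_D, pair)))   # approximation
        cD.append(sum(g * v for g, v in zip(HI_D, pair)))   # detail
    return cA, cD


def idwt_level(cA, cD):
    """Invert dwt_level with the mirror (reconstruction) filters."""
    out = []
    for a, d in zip(cA, cD):
        out.append((a + d) / math.sqrt(2))
        out.append((a - d) / math.sqrt(2))
    return out


def wavedec(signal, levels):
    """Repeat the single-level split on the approximation branch."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = dwt_level(approx)
        details.append(d)          # details[0] is D1, details[-1] is the coarsest
    return approx, details


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # toy samples, not ECG data
cA, cD = dwt_level(x)
x_rec = idwt_level(cA, cD)                        # matches x up to rounding
```

With this toy scheme the signal length must be divisible by 2 raised to the number of levels; an 8-level decomposition of a 10 second clip would in practice rely on a wavelet library that handles boundary extension.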

3.2.2 Morphological Feature Extraction

In this work Morphological features such as R peak count, QRS interval, PR interval,

QT interval, ST interval, RR interval, PP interval and TT interval are extracted.

R-peak has the highest amplitude in an ECG signal. R-peak is detected by using

the Daubechies 8th level reconstructed coefficients. The heart rate for each patient is

calculated by finding the distance between two R peaks (R-R). The other peaks are

identified by traversing the windowing function on either side of R peak. The Q and

S peaks are found by traversing on the left and right side of the R peak within the

specified window and locating the minimum or negative peak values. By traversing

the left side of the Q peak the maximum value is found to be the P peak. Similarly

by traversing the right side of the S peak, the maximum value is found to be the T

peak. The onset and offset of all points are calculated. Depending upon these data

points the Morphological features are extracted. The steps in the 8-level wavelet decomposition are given as follows:

8-level wavelet decomposition using Daubechies 6

• R peak: The R peak is detected by combining detail components D3, D4 and D5 and applying an adaptive threshold. Values greater than the threshold are taken as R peaks.

$$ \text{Threshold value} = \max[\text{signal}] \times \text{mean}[\text{signal}] \tag{3.2} $$

• Q peak: The Q peak is detected as the local minimum point within the 25 samples to the left of the R peak (before the R peak), by combining detail components D2, D3, D4 and D5.

• P peak: The P peak is detected as the maximum point within the 5 samples to the left of the Q wave (before the Q wave), by combining detail components D6 and D7.

• S peak: The S peak is detected as the local minimum point within the 5 samples to the right of the R peak (after the R peak), by combining detail components D2, D3, D4 and D5.

• T peak: The T peak is detected as the maximum point within 90 samples from the P peak, by combining detail components D6 and D7.
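The R, Q and S rules in the list above can be sketched as follows. This is only an illustration on a synthetic trace: it assumes the combined detail signal is already available as a plain list, it adds a local-maximum test so that each beat yields one R index, and the helper names are hypothetical.

```python
def detect_r_peaks(x):
    """Indices of local maxima that exceed the threshold of Eq. (3.2)."""
    threshold = max(x) * (sum(x) / len(x))        # max[signal] * mean[signal]
    return [i for i in range(1, len(x) - 1)
            if x[i] > threshold and x[i] > x[i - 1] and x[i] >= x[i + 1]]


def argmin_in(x, start, stop):
    """Index of the smallest sample in x[start:stop], clipped to the signal."""
    start, stop = max(start, 0), min(stop, len(x))
    return min(range(start, stop), key=lambda i: x[i])


def q_and_s(x, r, q_window=25, s_window=5):
    """Q: minimum in the 25 samples before R; S: minimum in the 5 samples after R."""
    q = argmin_in(x, r - q_window, r)
    s = argmin_in(x, r + 1, r + 1 + s_window)
    return q, s


# Synthetic trace with two beats (assumed values, not real ECG data).
x = [0.0] * 120
for r in (30, 90):
    x[r - 2], x[r], x[r + 2] = -0.2, 1.0, -0.3

r_peaks = detect_r_peaks(x)
qs = [q_and_s(x, r) for r in r_peaks]
```

The sketch assumes that no R peak lies at the very start or end of the trace, so that both search windows are non-empty.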

Based on all peak values obtained, the wavelet domain parameters of the ECG have been calculated. They are

1. P-P interval ($I_{PP}$) is the mean of P-P interval durations. The P-P interval is obtained by

$$ I_{PP} = P_{i+1} - P_i, \quad i = 1, 2, \ldots, N-1 \tag{3.3} $$

2. R-R interval ($I_{RR}$) is the mean of R-R interval durations. The R-R interval is obtained by

$$ I_{RR} = R_{i+1} - R_i, \quad i = 1, 2, \ldots, N-1 \tag{3.4} $$

3. P-R interval ($I_{PR}$) is the time duration between successive P and R waves in each beat. The P-R interval is obtained by

$$ I_{PR} = R - P_{onset} \tag{3.5} $$

4. QRS duration ($I_{QRS}$) is the time duration from the beginning of the Q wave to the end of the S wave. The QRS duration is calculated by

$$ I_{QRS} = T_S - T_Q \tag{3.6} $$

5. QT interval ($I_{QT}$) is the time from the beginning of the Q wave to the end of the T wave. It is obtained by

$$ I_{QT} = T_{offset} - Q \tag{3.7} $$

6. T-T interval ($I_{TT}$) is the mean of T-T interval durations, obtained by

$$ I_{TT} = T_{i+1} - T_i, \quad i = 1, 2, \ldots, N-1 \tag{3.8} $$

7. ST interval ($I_{ST}$) is

$$ I_{ST} = T_{i+1} - T_i, \quad i = 1, 2, \ldots, N-1 \tag{3.9} $$

The heart rate, calculated from the RR interval time series, is given by (3.10)

$$ HR = \frac{60}{\text{R-R interval (seconds)}} \tag{3.10} $$
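As a worked illustration of the interval and heart rate computations, the sketch below converts R peak sample indices into a mean R-R interval and a heart rate. The peak positions are invented for the example, the heart rate is assumed to be expressed in beats per minute, and the function names are hypothetical.

```python
FS = 360  # sampling rate in Hz, as adopted throughout this work


def mean_interval_seconds(indices, fs=FS):
    """Mean spacing between successive peaks, cf. Eqs. (3.3), (3.4) and (3.8)."""
    diffs = [b - a for a, b in zip(indices, indices[1:])]
    return sum(diffs) / len(diffs) / fs


def heart_rate_bpm(rr_seconds):
    """Heart rate in beats per minute from the R-R interval in seconds."""
    return 60.0 / rr_seconds


r_peaks = [100, 388, 676, 964]        # assumed R peak sample indices
rr = mean_interval_seconds(r_peaks)   # 288 samples at 360 Hz, i.e. about 0.8 s
hr = heart_rate_bpm(rr)               # about 75 beats per minute
```

The same `mean_interval_seconds` helper applies unchanged to P peak or T peak indices.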

3.2.3 Linear Prediction Coefficients

Linear Predictive Coefficient (LPC) analysis is one of the most powerful tools in signal processing, especially for speech signals, where it is able to extract the dominant features of the signal. The capability of precise estimation of signal parameters and the high speed of computation are the advantages of this technique, which has become a basis for using these coefficients in the evaluation of ECG signal changes [53]. A pth order LP analysis is used to capture the properties of the signal.

In the LP analysis of ECG, each sample is predicted as a linear weighted sum of the past p samples, where p represents the order of prediction [47], [48]. If s(n) is the present sample, then it is predicted by the past p samples as

$$ \hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k) \tag{3.11} $$

The LPC are obtained using the Levinson-Durbin recursive algorithm. This is known as LPC analysis. The difference between the actual and the predicted sample value is termed the prediction error or residual, and is given by

$$ e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k) \tag{3.12} $$

$$ \phantom{e(n)} = -\sum_{k=0}^{p} a_k \, s(n-k), \quad a_0 = -1 \tag{3.13} $$

For an ECG frame of size m samples, the mean square of prediction error over the whole frame is given by,

$$ E = \sum_{m} e^2(m) = \sum_{m} \left[ s(m) - \sum_{k=1}^{p} a_k \, s(m-k) \right]^2 \tag{3.14} $$

Optimal predictor coefficients will minimize this mean square error. At minimum value of E,

$$ \frac{\partial E}{\partial a_k} = 0, \quad k = 1, 2, \ldots, p \tag{3.15} $$

Differentiating (3.14) and equating to zero we get,

$$ R a = r \tag{3.16} $$

where $a = [a_1 \; a_2 \; \cdots \; a_p]^T$, $r = [r(1) \; r(2) \; \cdots \; r(p)]^T$, and R is a Toeplitz symmetric autocorrelation matrix given by,

$$ R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix} \tag{3.17} $$

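Equation (3.16) can be solved for the predictor coefficients using Durbin's algorithm as follows:

$$ E^{(0)} = r(0) \tag{3.18} $$

$$ k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} \, r(|i-j|)}{E^{(i-1)}}, \quad 1 \le i \le p \tag{3.19} $$

$$ \alpha_i^{(i)} = k_i \tag{3.20} $$

$$ \alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \, \alpha_{i-j}^{(i-1)} \tag{3.21} $$

$$ E^{(i)} = (1 - k_i^2) \, E^{(i-1)} \tag{3.22} $$

The above set of equations is solved recursively for i = 1, 2, ..., p. The final solution is given by

$$ a_m = \alpha_m^{(p)}, \quad 1 \le m \le p \tag{3.23} $$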

where the $a_m$ are the linear prediction coefficients. In this work, 14th order LP coefficients are extracted. As previously mentioned, linear predictive coefficients are used to directly estimate the parameters of a speech signal, so speaker recognition, speech recognition, speech classification and signal dereverberation are important applications of this analysis. Because of the time varying nature of the signal, the coefficients must be calculated from short segments [132]. Consequently, 14 LPC coefficients are determined from an interval of 100 samples containing the QRS complex, and these coefficients are used as inputs to the classification block.
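The recursion (3.18) to (3.23) can be sketched in a few lines. The version below is a minimal illustration that solves Ra = r of (3.16) for a frame given as a plain list; the autocorrelation estimate, the test frame and the function names are assumptions made for the example, not the exact implementation used in this work.

```python
def autocorrelation(frame, p):
    """r(0..p) of one frame (autocorrelation method, no windowing)."""
    n = len(frame)
    return [sum(frame[m] * frame[m - k] for m in range(k, n))
            for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve R a = r via Eqs. (3.18)-(3.23); returns (a_1..a_p, final error E)."""
    alpha = [0.0] * (p + 1)            # alpha[j] holds alpha_j; index 0 unused
    err = r[0]                         # E^(0) = r(0)
    for i in range(1, p + 1):
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / err
        updated = alpha[:]
        updated[i] = k                 # alpha_i^(i) = k_i
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        err *= (1.0 - k * k)           # E^(i) = (1 - k_i^2) E^(i-1)
    return alpha[1:], err


# Hand-checkable case: r(k) = 0.5**k gives a_1 = 0.5, a_2 = 0 and E = 0.75.
a_small, e_small = levinson_durbin([1.0, 0.5, 0.25], 2)

# A 100-sample toy frame (assumed values) with 14th order analysis, as in the text.
frame = [((n * 37) % 11) - 5.0 for n in range(100)]
r = autocorrelation(frame, 14)
lpc, residual_energy = levinson_durbin(r, 14)
```

The recursion needs on the order of p squared operations instead of the p cubed of a general linear solver, which is part of why it suits short-segment analysis.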

3.2.4 Mel Frequency Cepstral Coefficients

Mel Frequency Cepstral Coefficients (MFCC) are short-term spectral features and are widely used in the area of audio and speech processing. The mel frequency cepstrum has proven to be highly effective in recognizing the structure of audio signals and in modeling the subjective pitch and frequency content of audio signals. MFCCs have been applied in a range of audio mining tasks, and have shown good performance compared to other features [49]. Different authors compute MFCCs in different ways. Although MFCCs have been used in music identification, very little work has been done on heart sound analysis using MFCCs. In this work the MFCC features are used for analysing the Electrocardiogram signal.

MFCC filters are spaced linearly at low frequencies and logarithmically at high

frequencies to capture the important characteristics of ECG signal. To obtain MFCCs,

the ECG signals are segmented and windowed into frames of 10 seconds. Fig. 3.2

describes the procedure for extracting the MFCC features.

Fig. 3.2: Block diagram of MFCC computation

• Mel frequency wrapping: Magnitude spectrum is computed for each of these frames using FFT and converted into a set of mel scale filter bank outputs.

The filterbank analysis provides a much more straightforward route to obtain

the desired non-linear frequency resolution. However, filterbank amplitudes are

highly correlated and hence, the use of a cepstral transformation in this case is

virtually mandatory. A simple Fourier transform based filterbank is designed

to give approximately equal resolution on a mel-scale. Fig. 3.3 illustrates the

general form of this filterbank. As can be seen, the filters used are triangular

and they are equally spaced along the mel-scale which is defined by

$$ \text{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{3.24} $$

To implement this filterbank, the window of ECG data is transformed using a

Fourier transform and the magnitude is taken. The magnitude coefficients are

then binned by correlating them with each triangular filter. Here binning means

that each FFT magnitude coefficient is multiplied by the corresponding filter

gain and the results are accumulated. Thus, each bin holds a weighted sum

representing the spectral magnitude in that filterbank channel.

Normally the triangular filters are spread over the whole frequency range from

zero upto the Nyquist frequency. However, band-limiting is often useful to

reject unwanted frequencies or avoid allocating filters to frequency regions in

which there is no useful signal energy. For filterbank analysis, lower and upper

frequency cut-offs can be set. When low and high pass cut-offs are set in this

way, the specified number of filterbank channels are distributed equally on the

mel-scale across the resulting pass-band.


Fig. 3.3: Mel scale filter bank (filter outputs m1, ..., mj, ..., mP give the energy in each band)

• Cepstrum: Logarithm is then applied to the filter bank outputs, followed by the discrete cosine transformation (DCT), to obtain the MFCCs. Because the mel spectrum coefficients are real numbers (and so are their logarithms), they may be converted to the time domain using the DCT. In practice the last step of taking the inverse DFT is replaced by taking the DCT for computational efficiency. Typically, the first 13 MFCCs are used as features [49], [133].
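Two of the steps above lend themselves to a short sketch: placing filter centres equally on the mel scale of (3.24), and turning log filter bank energies into cepstral coefficients with a DCT. The code below is a simplified illustration; the number of filters, the 0 to 180 Hz band (the Nyquist range at 360 Hz), the unnormalised DCT-II starting at index 0 and all function names are assumptions rather than the exact configuration used in this work.

```python
import math


def hz_to_mel(f):
    """Mel scale of Eq. (3.24)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centres(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of triangular filters equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (j + 1)) for j in range(n_filters)]


def dct_of_log_energies(energies, n_coeffs):
    """Unnormalised DCT-II of the log filter bank energies."""
    p = len(energies)
    logs = [math.log(e) for e in energies]
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / p)
                for j in range(p))
            for i in range(n_coeffs)]


centres = filter_centres(20, 0.0, 180.0)          # 20 filters is an assumed count
flat = dct_of_log_energies([2.0] * 8, 3)          # flat spectrum: only c0 is non-zero
```

A flat set of band energies produces a single non-zero coefficient, which shows how the DCT decorrelates the highly correlated filter bank amplitudes.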

In the proposed work 360 values are reduced to 8 dimensional Morphological features,

14 dimensional LPC and 13 dimensional MFCC features respectively. These features

are extracted as described for different categories of cardiac diseases.

3.3 Modeling Techniques

SVM, AANN and GMM classifiers are the models used in this work for classifying the ECG data based on the Morphological features, LPC and MFCC extracted from the ECG signal.


3.3.1 Support Vector Machine

Support Vector Machine (SVM) is a statistical machine learning technique that has been successfully applied in the pattern recognition area and is based on the principle of structural risk minimization [73], [74], [134], [135]. Fig. 3.4 shows the architecture of the SVM. SVM constructs a linear model to estimate the decision function using non-linear class boundaries based on support vectors. If the data are linearly separable, SVM trains linear machines for an optimal hyperplane that separates the data without error and with the maximum distance between the hyperplane and the closest training points. The training points that are closest to the optimal separating hyperplane are called support vectors.

Fig. 3.4: Architecture of the SVM (Ns is the number of support vectors).

SVM maps the input patterns into a higher dimensional feature space through

some nonlinear mapping chosen a priori. A linear decision surface is then constructed

in this high dimensional feature space. Thus, SVM is a linear classifier in the parameter

space, but it becomes a non-linear classifier as a result of the non-linear mapping of

the space of the input patterns into the high dimensional feature space.

Fig. 3.5: An example for SVM kernel function Φ(x) maps 2-dimensional input space to
higher 3-dimensional feature space. (a) Nonlinear problem. (b) Linear problem.

For linearly separable data, SVM finds a separating hyperplane which separates the data with the largest margin. For linearly inseparable data, it maps the data in the input space into a high dimensional space, $x \in \mathbb{R}^I \mapsto \Phi(x) \in \mathbb{R}^H$, with the kernel function Φ(x), to find the separating hyperplane. An example in which the SVM kernel function Φ(x) maps a 2-dimensional input space to a higher 3-dimensional feature space is shown in Fig. 3.5. SVM was originally developed for two class classification problems. The N class classification problem can be solved using N SVMs. Each SVM separates a single class from all the remaining classes (one-vs-rest approach).

SVM generally applies to linear boundaries. In the case where a linear boundary is

inappropriate SVM can map the input vector into a high dimensional feature space. By

choosing a non-linear mapping, the SVM constructs an optimal separating hyperplane

in this higher dimensional space as shown in Fig. 3.5. The function K is defined as the

kernel function for generating the inner products to construct machines with different

types of non-linear decision surfaces in the input space.

$$ K(x, x_i) = \Phi(x) \cdot \Phi(x_i) \tag{3.25} $$
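A small numerical check makes the kernel identity concrete. For 2-dimensional inputs and a polynomial kernel of degree 2, one explicit feature map Φ has six components, and the kernel value equals the inner product of the mapped vectors. The map `phi` below is one standard choice written out for illustration, and the input vectors are arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p evaluated in the input space."""
    return (dot(x, y) + 1.0) ** p


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2]


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


x, y = [1.0, 2.0], [3.0, -1.0]          # arbitrary example vectors
k_input_space = poly_kernel(x, y)       # (1*3 + 2*(-1) + 1)^2 = 4
k_feature_space = dot(phi(x), phi(y))   # same value via the explicit map
```

The kernel evaluates the feature space inner product without ever forming Φ(x), which is what keeps very high dimensional (or, for the Gaussian kernel, infinite dimensional) feature spaces tractable.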

Table 3.1: Types of SVM inner product kernels.

Types of kernels | Inner product kernel K(x^T, x_i) | Details
Polynomial | (x^T x_i + 1)^p | p is the degree of the polynomial
Gaussian | exp(−‖x − x_i‖² / (2σ²)) | σ² is the variance
Sigmoidal | tanh(β0 (x^T x_i) + β1) | β0, β1 are constant values

In all kernels, x is the input pattern, x_i are the support vectors, 1 ≤ i ≤ Ns, and Ns is the number of support vectors.

The kernel function may be any of the symmetric functions that satisfy Mercer's conditions. There are several SVM kernel functions, as given in Table 3.1. The dimension of the feature space vector Φ(x) for the polynomial kernel of degree p and for the input pattern dimension of d is given by

$$ \frac{(p + d)!}{p! \, d!} \tag{3.26} $$
For sigmoidal kernel and Gaussian kernel, the dimension of feature space vectors is

shown to be infinite. Finding a suitable kernel for a given task is an open research

problem. Given a set of ECG corresponding to N categories for training, N SVMs

are trained. Each SVM is trained to distinguish between one category and all other

categories in the training set. During testing, the class label l of an ECG x can be

determined using (3.27).

$$ l = \begin{cases} n, & \text{if } d_n(x) + t > 0 \\ 0, & \text{if } d_n(x) + t \le 0 \end{cases} \tag{3.27} $$

where $d_n(x) = \max \{ d_i(x) \}_{i=1}^{N}$, and $d_i(x)$ is the distance from x to the SVM hyperplane corresponding to category i. The classification threshold is t, and the class label l = 0 stands for unknown.
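The decision rule (3.27) amounts to picking the category with the largest signed distance and rejecting the sample when even that distance is too small. A minimal sketch, with made-up distance values and a hypothetical function name:

```python
def classify(distances, t=0.0):
    """Label per Eq. (3.27): distances[i - 1] is d_i(x) for category i = 1..N."""
    best = max(range(len(distances)), key=lambda i: distances[i])
    return best + 1 if distances[best] + t > 0 else 0   # 0 means unknown


label_a = classify([-0.4, 1.2, 0.3])             # category 2 has the largest distance
label_b = classify([-0.4, -0.2, -0.9])           # no positive distance, so unknown
label_c = classify([-0.4, -0.2, -0.9], t=0.5)    # a positive threshold accepts category 2
```

Raising t makes the classifier more willing to assign a category, while lowering it sends more borderline samples to the unknown label.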

3.3.2 Autoassociative Neural Network Model

The five layer Autoassociative Neural Network (AANN) model is used to capture the

distribution of the ECG feature vectors. The general topology of AANN is discussed

in this section. In this network, the second and fourth layers have more units than

the input layer. The third layer has fewer units than the first or fifth. The processing

units in the first and third hidden layer are non-linear, and the units in the second

compression/hidden layer can be linear or non-linear. The network is trained using

backpropagation algorithm [84], [136]. As the error between the actual and the desired

output vectors is minimized, the cluster of points in the input space determines the

shape of the hypersurface obtained by the projection onto the lower dimensional space.

Fig. 3.6(b) shows the space spanned by the one dimensional compression layer for the

2 dimensional data shown in Fig. 3.6(a) for the network structure 2L 10N 1N 10N 2L,

where L denotes a linear unit and N denotes a non-linear unit. The non-linear units

use tanh(s) as the activation function, where s is the activation value of the unit. The

integer value indicates the number of units used in that layer. The backpropagation

learning algorithm is used to adjust the weights of the network to minimize the mean

square error for each feature vector. The solid lines shown in Fig. 3.6(b) indicate

mapping of the given input points due to the one dimensional compression layer. Thus,

one can say that the AANN captures the distribution of the input data depending on

the constraints imposed by the structure of the network.

In order to visualize the distribution better, one can plot the error for each input data point in the form of some probability surface as shown in Fig. 3.6(c). The error $E_i$ for the data point i in the input space is plotted as $p_i = \exp(-E_i/\alpha)$, where α is a constant. Note that $p_i$ is not strictly a probability density function, but the resulting surface is called probability surface. The plot of the probability surface shows a large amplitude for smaller error $E_i$, indicating better match of the network for that data point. The constraints imposed by the network can be seen by the shape the error surface takes in both the cases. One can use the probability surface to study the characteristics of the distribution of the input data captured by the network. Ideally, one would like to achieve the best probability surface, best defined in terms of some measure corresponding to a low average error.

Fig. 3.6: Distribution capturing ability of AANN model. From [1]. (a) Artificial 2 dimensional data. (b) 2 dimensional output of AANN model with the structure 2L 10N 1N 10N 2L. (c) Probability surfaces realized by the network structure 2L 10N 1N 10N 2L.

During AANN training, the weights of the network are adjusted to minimize the

mean square error obtained for each feature vector. If the adjustment of weights is done

for all feature vectors once, then the network is said to be trained for one epoch. For

successive epochs, the mean square error is averaged over all feature vectors. During

testing phase, the features extracted from the test data are given to the trained AANN

model to obtain the average error.

The standard backpropagation neural network training algorithm is used to adjust the

weights in AANN. All the initial weights are randomly chosen by the backpropagation

training algorithm. Only the number of epochs is to be specified.
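The mapping from AANN error to the value $p_i = \exp(-E_i/\alpha)$ can be sketched as below. The sketch assumes that the error is the squared difference between a feature vector and its reconstruction by an already trained network; the reconstructions are supplied by hand here, and the function names and the averaging over test vectors are illustrative assumptions.

```python
import math


def squared_error(x, x_hat):
    """Squared reconstruction error between an input and the network output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def confidence(error, alpha=1.0):
    """p = exp(-E / alpha): close to 1 for small error, close to 0 for large error."""
    return math.exp(-error / alpha)


def average_confidence(pairs, alpha=1.0):
    """Mean confidence over (input, reconstruction) pairs of one test clip."""
    scores = [confidence(squared_error(x, x_hat), alpha) for x, x_hat in pairs]
    return sum(scores) / len(scores)


# Hypothetical reconstructions: the first model fits the data better.
good_fit = [([1.0, 2.0], [1.1, 1.9]), ([0.5, 0.0], [0.4, 0.1])]
poor_fit = [([1.0, 2.0], [2.0, 0.5]), ([0.5, 0.0], [1.5, 1.0])]
score_good = average_confidence(good_fit)
score_poor = average_confidence(poor_fit)
```

With one AANN trained per category, the category whose model yields the highest score for a test clip would be a natural choice of label.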

3.3.3 Gaussian Mixture Models

The probability distribution of feature vectors is modeled by parametric or non-

parametric methods. Models which assume the shape of probability density func-

tion are termed parametric. In non-parametric modeling, minimal or no assumptions

are made regarding the probability distribution of feature vectors. The potential of

Gaussian mixture models to represent an underlying set of ECG classes by individual

Gaussian components, in which the spectral shape of the ECG class is parameterized

by the mean vector and the covariance matrix, is significant. Also, these models have

the ability to form a smooth approximation to the arbitrarily-shaped observation den-

sities in the absence of other information [137]. With Gaussian mixture models, each

ECG is modeled as a mixture of several Gaussian clusters in the feature space. The

basis for using GMM is that the distribution of feature vectors extracted from a class

can be modeled by a mixture of Gaussian densities as shown in Fig. 3.7. For a D dimensional feature vector x, the mixture density function for category s is defined as

$$ p(x / \lambda^s) = \sum_{i=1}^{M} \alpha_i^s \, f_i^s(x) $$

The mixture density function is a weighted linear combination of M component unimodal Gaussian densities $f_i^s(\cdot)$. Each Gaussian density function $f_i^s(\cdot)$ is parameterized by the mean vector $\mu_i^s$ and the covariance matrix $\Sigma_i^s$ using

$$ f_i^s(x) = \frac{1}{\sqrt{(2\pi)^D \, |\Sigma_i^s|}} \exp\left( -\frac{1}{2} (x - \mu_i^s)^T (\Sigma_i^s)^{-1} (x - \mu_i^s) \right), $$

where $(\Sigma_i^s)^{-1}$ and $|\Sigma_i^s|$ denote the inverse and determinant of the covariance matrix $\Sigma_i^s$, respectively. The mixture weights $(\alpha_1^s, \alpha_2^s, \ldots, \alpha_M^s)$ satisfy the constraint $\sum_{i=1}^{M} \alpha_i^s = 1$. Collectively, the parameters of the model $\lambda^s$ are denoted as $\lambda^s = \{\alpha_i^s, \mu_i^s, \Sigma_i^s\}$, i = 1, 2, ..., M. The number of mixture components is chosen empirically for a given data set. The parameters of GMM are estimated using the iterative expectation-maximization algorithm [138].

Fig. 3.7: Gaussian mixture models
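Evaluating the mixture density for a given parameter set is straightforward once the parameters are known. The sketch below assumes diagonal covariance matrices, so that the determinant is a product of variances and the inverse is element-wise; this restriction, the parameter values and the function names are assumptions made to keep the example free of matrix libraries.

```python
import math


def gaussian_diag(x, mean, var):
    """D-dimensional Gaussian density with a diagonal covariance matrix."""
    d = len(x)
    det = math.prod(var)                               # |Sigma| for a diagonal Sigma
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** d * det)


def gmm_density(x, weights, means, variances):
    """Weighted sum of M component densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Two-component model in two dimensions (made-up parameters).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
p_near = gmm_density([0.1, -0.2], weights, means, variances)   # close to component 1
p_far = gmm_density([10.0, 10.0], weights, means, variances)   # far from both components
```

For classification, one such density per category would be evaluated on the test vectors and the category with the highest likelihood selected.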


3.4 Performance Measures

Performance measurement is the process of collecting, analyzing and/or reporting in-

formation regarding the performance of a system or component. Sensitivity and speci-

ficity are statistical measures of the performance of a binary classification test, also

known in statistics as classification function.


Sensitivity (also called the true positive rate, or the recall in some fields) measures

the proportion of positives that are correctly identified as such. It refers to the test’s

ability to correctly detect patients who do have the condition. Mathematically, this

can be expressed as:

$$ \text{Sensitivity} = \frac{\text{No. of True Positives}}{\text{No. of True Positives} + \text{No. of False Negatives}} \tag{3.28} $$


Specificity (also called the true negative rate) measures the proportion of negatives

that are correctly identified as such. It relates to the test’s ability to correctly detect

patients without a condition. Mathematically, this can also be written as:

$$ \text{Specificity} = \frac{\text{No. of True Negatives}}{\text{No. of True Negatives} + \text{No. of False Positives}} \tag{3.29} $$


In a classification task, the precision for a class is the number of true positives (i.e.

the number of items correctly labeled as belonging to the positive class) divided by

the total number of elements labeled as belonging to the positive class (i.e. the sum

of true positives and false positives, which are items incorrectly labeled as belonging

to the class).
$$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{3.30} $$


Recall is defined as the number of true positives divided by the total number of elements

that actually belong to the positive class (i.e. the sum of true positives and false

negatives, which are items which were not labeled as belonging to the positive class

but should have been).

$$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{3.31} $$

F- Measure:

F-Measure is a measure of a test’s accuracy that considers both the precision and the

recall of the test to compute the score (Harmonic mean).

$$ \text{F-Measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.32} $$


The accuracy of a measurement system is a level of measurement that yields true (no systematic errors) and consistent (no random errors) results.

$$ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total No. of Samples}} \tag{3.33} $$
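The measures (3.28) to (3.33) all follow from the four counts of a binary confusion matrix. The sketch below computes them together; the counts are invented for illustration and the function name is hypothetical.

```python
def performance_measures(tp, tn, fp, fn):
    """Sensitivity, specificity, precision, recall, F-measure and accuracy."""
    sensitivity = tp / (tp + fn)                     # Eq. (3.28)
    specificity = tn / (tn + fp)                     # Eq. (3.29)
    precision = tp / (tp + fp)                       # Eq. (3.30)
    recall = tp / (tp + fn)                          # Eq. (3.31), same as sensitivity
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (3.32)
    accuracy = (tp + tn) / (tp + tn + fp + fn)       # Eq. (3.33)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up counts: 50 abnormal and 50 normal test clips.
scores = performance_measures(tp=45, tn=40, fp=10, fn=5)
```

For these counts the sensitivity is 0.9, the specificity 0.8 and the accuracy 0.85, while the F-measure of about 0.857 sits between the precision (about 0.818) and the recall (0.9).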

3.5 Experimental Results

Experiments were carried out using different features from signal and image processing, but the morphological, LPC and MFCC features have given better performance compared with other features.

Dataset: In order to validate the effectiveness of the proposed algorithms, assessment

of the performance in terms of accuracy should be made. As there is no golden rule to

determine the peak, onset and offset of the ECG waves, validation of the ECG feature

detection algorithms must be done using databases with manual annotations. The

basic idea here is to compare manually annotated results of the clinical features of a

specific heartbeat to the ones generated by the algorithms.

The experiments are conducted on ECG data in WAV format collected from different
Physiobank databases, covering different age groups of both men (1186) and women (814).
Each ECG recording lasts from 30 minutes to 1 hour, is sampled at 360 Hz and is encoded
in 16-bit Pulse Code Modulation (PCM) format. An ECG segment of 10 seconds duration is
taken from each recording for experimentation. In total, 2000 ECG clips covering all
categories are used. For each disease, 200 ECG clips are considered, of which 150 are used
for training and 50 for testing. The accuracy obtained for different ratios of training and
testing data is shown in Table 3.4. The datasets collected from different sources are listed
in Tables 3.2 and 3.3. Along with the Physiobank dataset, real-time datasets collected from
Raja Muthaiah Medical College Hospital (RMMCH), Annamalai University and Mahatma
Gandhi Medical College Hospital (MGMCH), Pondicherry are also used. The database and
the hospital datasets are used in order to evaluate the proposed algorithms on standard
12-lead ECG for various disease categories and to demonstrate their effectiveness.
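The clip length and the split sizes quoted above follow from simple arithmetic: 10 seconds at 360 Hz gives 3600 samples per clip, and 200 clips split 150/50. The sketch below illustrates this under stated assumptions; the function names and the choice of start offset are hypothetical, since the text does not say where in each recording the 10-second segment is taken, and the recording is a synthetic stand-in.

```python
SAMPLING_RATE_HZ = 360  # sampling rate stated in the text
CLIP_SECONDS = 10       # clip duration stated in the text


def extract_clip(signal, start_second=0, fs=SAMPLING_RATE_HZ, seconds=CLIP_SECONDS):
    """Return one fixed-length clip from a longer recording (a list of samples)."""
    start = start_second * fs
    stop = start + seconds * fs
    if stop > len(signal):
        raise ValueError("recording is too short for the requested clip")
    return signal[start:stop]


def split_clips(clips, n_train=150):
    """Split the clips of one disease into training and testing parts."""
    return clips[:n_train], clips[n_train:]


# Synthetic stand-in for a 30-minute recording (all zeros, illustration only).
recording = [0] * (30 * 60 * SAMPLING_RATE_HZ)
clip = extract_clip(recording, start_second=60)

# 200 placeholder clip identifiers for one disease, split 150 / 50.
train, test = split_clips(list(range(200)))
print(len(clip), len(train), len(test))
```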

Evaluation using SVM, AANN and GMM:

The ECG signal is preprocessed, and the Morphological features, Linear Prediction
Coefficients (LPC) and Mel Frequency Cepstral Coefficients (MFCC) are extracted with
8, 14 and 13 dimensions respectively. ECG recordings of about 30 minutes to 1 hour
duration are taken for experimentation. The sampling rate is 360 Hz and the duration of
each training clip is 10 seconds.

Table 3.2: Dataset I

| S.No | Category | Data Source |
|------|----------|-------------|
| 1 | Atrial Fibrillation | MIT-BIH Atrial Fibrillation Database (afdb), St. Petersburg INCART 12-lead Arrhythmia database (incartdb), MIT-BIH Supra Ventricular Arrhythmia database and Hospitals |
| 2 | Supra Ventricular | MIT-BIH Supra Ventricular Arrhythmia database (svdb), MIT-BIH Arrhythmia database and Hospitals |
| 3 | Ventricular Tachycardia | CU Ventricular Tachyarrhythmia Database (cudb), MIT-BIH Arrhythmia database and Hospitals |
| 4 | Antero Septal Infarction | St. Petersburg INCART 12-lead Arrhythmia database (incartdb), PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals |
| 5 | Anterior Infarction | PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals |
| 6 | Inferior Infarction | PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals |
| 7 | | St. Petersburg INCART 12-lead Arrhythmia database (incartdb), MIT-BIH Arrhythmia Database and Hospitals |

Table 3.3: Dataset II

| S.No | Category | Data Source |
|------|----------|-------------|
| 8 | Left Bundle Branch | St. Petersburg INCART 12-lead Arrhythmia database (incartdb), MIT-BIH Arrhythmia database and Hospitals |
| 9 | Right Bundle Branch | St. Petersburg INCART 12-lead Arrhythmia database (incartdb), MIT-BIH Arrhythmia database and Hospitals |
| 10 | Normal | MIT-BIH Database and Hospitals |

Table 3.4: Ratio of training and testing data in terms of accuracy

| Ratio of training and testing data | Accuracy |
|------------------------------------|----------|
| 70 : 30 | 97.40% |
| 80 : 20 | 98.60% |
| 60 : 40 | 96.00% |

Initially, the Morphological features with 8 dimensions, LPC features with 14 dimensions
and MFCC features with 13 dimensions are trained using SVM. N SVMs are created for
each feature of the ECG samples. For training, 1000 ECG samples are considered, which
include the normal and nine abnormal categories. 100 feature vectors from each category
are considered for the Morphological, LPC and MFCC features. The training process
analyzes the ECG training data to find an optimal way to classify ECG frames into their
respective classes.

The derived support vectors are used to classify the subcategories of the disease from
ECG data. For testing, 1000 ECG samples are considered. During testing, the 8-dimensional
Morphological features, 14-dimensional LPC features and 13-dimensional MFCC features
are given as input to the SVM models, and the distance between each feature vector and
the SVM hyperplane is obtained.

The average distance is calculated for each model. The disease corresponding to the ECG
is decided based on the maximum distance. The same process is repeated for all the
subcategories of the diseases, and the performance is studied. The performance of ECG
classification is studied for Polynomial, Gaussian and Sigmoidal kernels. From the
analysis, the Gaussian kernel function in SVM using MFCC features provides improved
performance for the three levels of classification. Hence the Gaussian kernel is applied in
this work. The performance of the kernel functions for normal/abnormal classification is
shown in Table 3.5.

Table 3.5: Performance of SVM for normal/abnormal classification

| Performance (in %) | Polynomial | Gaussian | Sigmoidal |
|--------------------|------------|----------|-----------|
| Morphological | 92.72 | 97.90 | 94.30 |
| LPC | 82.30 | 94.00 | 91.21 |
| MFCC | 91.40 | 98.50 | 96.19 |
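The SVM decision rule described in the text (average the distances of all feature vectors to each model's hyperplane, then pick the model with the maximum average) can be sketched as follows. This is an illustration under assumptions, not the implementation used in this work: the function name, the class labels used as keys and the distance values are hypothetical. In practice the signed distances could come from a trained SVM, for example the decision values returned by scikit-learn's `SVC.decision_function`, although the text does not name a library.

```python
def classify_by_average_distance(distances_per_model):
    """Pick the class whose SVM gives the largest average signed distance.

    distances_per_model maps a class label to the list of signed distances
    between each test feature vector and that class's hyperplane.
    """
    averages = {
        label: sum(values) / len(values)
        for label, values in distances_per_model.items()
    }
    best_label = max(averages, key=averages.get)
    return best_label, averages


# Hypothetical signed distances of five test frames to three class models.
distances = {
    "Arrhythmia": [0.8, 1.1, 0.4, 0.9, 0.7],
    "Myocardial Infarction": [-0.3, 0.2, -0.5, 0.1, -0.2],
    "Conduction Blocks": [-1.0, -0.6, -0.9, -0.4, -0.8],
}
label, averages = classify_by_average_distance(distances)
print(label, averages)
```

Averaging over frames before taking the maximum makes the decision depend on the whole clip rather than on any single frame.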

The distribution of the Morphological, LPC and MFCC feature vectors in the

feature space is captured using an AANN model. Separate AANN models are used to

capture the distribution of feature vectors of each class, and the network is trained for

400 epochs. One epoch of training is a single presentation of all the training vectors

to the network. For evaluating the performance of the system, the feature vector is

given as input to each of the models. The output of the model is compared with the

input to compute the normalized squared error.

The normalized squared error (E) for the feature vector y is given by
$E = \frac{\|y - o\|^2}{\|y\|^2}$, where o is the output vector given by the model. The error
(E) is transformed into a confidence score (C) using C = exp(−E). The average confidence
score is calculated for each model, and the class is decided based on the highest confidence
score. The performance of the system is evaluated, and the method achieves about 98.80%
classification rate using MFCC for normal/abnormal classification. The structure of the
AANN model plays an important role in capturing the distribution of the feature vectors.
The network structures obtained by trial and error for the three features are shown in
Table 3.6. These structures give good performance in terms of classification accuracy.
For testing, the feature vectors extracted from the various classes are given as input to
each model, and the class whose model gives the maximum confidence score is selected.

Table 3.6: Structure of AANN for ECG signal classification

| Feature | Dimension | AANN structure |
|---------|-----------|----------------|
| Morphological | 8 | 8L 16N 3N 16N 8L |
| LPC | 14 | 14L 28N 7N 28N 14L |
| MFCC | 13 | 13L 26N 6N 26N 13L |
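The AANN scoring step described above can be illustrated with a short sketch. It assumes the normalized squared error has the form E = ||y − o||² / ||y||² and uses C = exp(−E) as stated in the text. The two "models" are toy stand-ins that merely scale their input; they are hypothetical and are not trained networks.

```python
import math


def normalized_squared_error(y, o):
    """E = ||y - o||^2 / ||y||^2 for input vector y and model output o."""
    numerator = sum((yi - oi) ** 2 for yi, oi in zip(y, o))
    denominator = sum(yi ** 2 for yi in y)  # assumes y is not the zero vector
    return numerator / denominator


def confidence(y, o):
    """Confidence score C = exp(-E)."""
    return math.exp(-normalized_squared_error(y, o))


def classify_by_confidence(features, models):
    """Average the confidence per model and return the best class label."""
    scores = {
        label: sum(confidence(y, model(y)) for y in features) / len(features)
        for label, model in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for trained AANN models (hypothetical, illustration only).
models = {
    "normal": lambda y: [0.9 * v for v in y],    # reconstructs the input well
    "abnormal": lambda y: [0.5 * v for v in y],  # reconstructs the input poorly
}
features = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
label, scores = classify_by_confidence(features, models)
print(label, scores)
```

A model that reconstructs its input closely yields a small E and therefore a confidence close to 1, which is why the maximum average confidence selects the class.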

The number of units in the third layer (compression layer) determines the number

of components captured by the network. The AANN model projects the input vectors

onto the subspace spanned by the number of units (Nc ) in the compression layer. If

there are Nc units in the compression layer, then the ECG feature vectors are projected

onto the subspace spanned by Nc components to realize them at the output layer. The

effect of changing the value of Nc on the performance of normal/abnormal (Level-I)

classification is studied. There is no major change in the performance for MFCC features
when Nc and Ne are in the ranges 4 ≤ Nc ≤ 6 and 24 ≤ Ne ≤ 26, as shown in Tables 3.7
and 3.8.

Outside these ranges the performance of the system decreases, because there may not be
a clear boundary between the components representing the disease information, and the
training ECG samples may not be sufficient for capturing the distribution of the feature
vectors.

Table 3.7: Performance in terms of number of units in the compression layer (Nc) for
normal/abnormal classification (Level-I)

| Feature | Measure | Nc = 2 | Nc = 4 | Nc = 6 | Nc = 8 |
|---------|---------|--------|--------|--------|--------|
| MFCC | Accuracy (in %) | 97.23 | 97.95 | 98.80 | 96.82 |

Table 3.8: Performance in terms of number of units in the expansion layer (Ne) for
normal/abnormal classification.

| Feature | Measure | Ne = 22 | Ne = 24 | Ne = 26 | Ne = 28 |
|---------|---------|---------|---------|---------|---------|
| MFCC | Accuracy (in %) | 96.11 | 96.20 | 98.80 | 96.84 |

Table 3.6 shows the structure of the AANN used in this work. The general topology of
the AANN is shown in Fig. 3.6. The AANN performs identity mapping, and hence the input
and output layers contain the same number of nodes.
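To make the five-layer topology concrete, the sketch below builds a forward pass from a structure string such as the MFCC one (13 inputs, expansion to 26, compression to 6, expansion to 26, 13 outputs). Several points are assumptions not confirmed by the text: "L" is read as a linear unit and "N" as a nonlinear unit, tanh is used as the nonlinearity, and the weights are random and untrained. All function names are hypothetical.

```python
import math
import random


def parse_structure(spec):
    """Parse a string such as '13L 26N 6N 26N 13L' into (size, kind) pairs."""
    return [(int(token[:-1]), token[-1]) for token in spec.split()]


def init_weights(layers, seed=0):
    """Create small random weights and zero biases between consecutive layers."""
    rng = random.Random(seed)
    weights = []
    for (n_in, _), (n_out, _) in zip(layers, layers[1:]):
        matrix = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        weights.append((matrix, [0.0] * n_out))
    return weights


def forward(x, layers, weights):
    """Propagate x through the network; 'N' layers apply tanh, 'L' layers do not."""
    activation = list(x)
    for (matrix, bias), (_, kind) in zip(weights, layers[1:]):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(matrix, bias)]
        activation = [math.tanh(v) for v in z] if kind == "N" else z
    return activation


layers = parse_structure("13L 26N 6N 26N 13L")
weights = init_weights(layers)
output = forward([0.1] * 13, layers, weights)
print(len(output))  # same dimension as the input, as identity mapping requires
```

Training (for example by backpropagation over 400 epochs, as mentioned in the text) is deliberately left out; the sketch only shows how the layer sizes fit together.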

In GMM, the ECG samples in the database are modeled by fitting each category with its
own mixture components. A setting of 4 or more components provides better accuracy
than fewer components. The subcategories are analysed based on the characteristics of
each disease. Different numbers of components in GMM using Morphological, LPC and
MFCC features are analyzed for the three levels. The number of Gaussian mixtures is
increased from 2 to 10 and the performance in terms of classification accuracy is studied.

When the number of mixtures is 2, the performance is very low. When the number of
mixtures is increased from 2 to 4, the classification performance increases. When the
number of mixtures varies from 4 to 10, there is no considerable increase in the
performance, and the maximum performance is already achieved. There is also no
considerable increase in the performance when the number of mixtures is above 10. With
GMM, the best performance is achieved with 4 Gaussian mixtures for the three levels of
classification.

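Table 3.9: Performance of GMM for normal/abnormal classification

| Performance (in %) | 2 mixtures | 4 mixtures | 6 mixtures | 8 mixtures |
|--------------------|------------|------------|------------|------------|
| Morphological | 80.79 | 95.64 | 94.92 | 94.89 |
| LPC | 79.50 | 82.50 | 82.46 | 82.11 |
| MFCC | 78.43 | 83.30 | 82.80 | 81.92 |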
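The text does not spell out how a class is chosen from the per-category GMMs. A common approach, assumed here, is to score the feature vectors under each category's mixture and pick the category with the highest average log-likelihood. The sketch below uses diagonal covariances and two hand-set components per class purely to stay short; the parameters, labels and function names are hypothetical, and no fitting (for example by EM) is shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
    peak = max(terms)
    # Log-sum-exp for numerical stability.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_by_gmm(features, class_gmms):
    """Return the class with the highest average log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in features) / len(features)
        for label, gmm in class_gmms.items()
    }
    return max(scores, key=scores.get), scores


# Hand-set two-component mixtures in two dimensions (illustration only).
class_gmms = {
    "normal": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "abnormal": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}
features = [[0.2, 0.1], [0.9, 1.2]]
label, scores = classify_by_gmm(features, class_gmms)
print(label, scores)
```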

3.5.1 Normal/Abnormal Classification using SVM, AANN and GMM (Level - I)

The classification is carried out in three levels in this work. The first level focuses on
classifying ECG samples into the normal or abnormal category. The performance for
normal/abnormal classification is shown in Table 3.10. From the experimental analysis
it is observed that MFCC with the AANN classifier gives a better result than the other
techniques for the ECG signal. In normal/abnormal classification, only the characteristics
of the normal ECG samples are considered; the characteristics of individual diseases are
not considered. The accuracy for the normal/abnormal categories is shown in Fig. 3.8.

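Table 3.10: Performance of SVM, AANN and GMM for normal/abnormal classification

| Performance (in %) | SVM Spec | SVM Sen | SVM Acc | AANN Spec | AANN Sen | AANN Acc | GMM Spec | GMM Sen | GMM Acc |
|--------------------|----------|---------|---------|-----------|----------|----------|----------|---------|---------|
| Morphological | 93.71 | 97.93 | 97.55 | 98.45 | 98.28 | 97.90 | 93.12 | 95.64 | 93.10 |
| LPC | 93.90 | 94.73 | 94.00 | 95.01 | 95.90 | 96.00 | 81.71 | 82.40 | 82.50 |
| MFCC | 98.21 | 97.95 | 98.50 | 98.79 | 98.64 | 98.80 | 86.90 | 83.30 | 87.60 |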

Fig. 3.8: Performance of SVM, AANN and GMM for normal / abnormal classification

3.5.2 Disease Classification using SVM, AANN and GMM (Level - II)

The second level focusses on the classification of three cardiac diseases. The performance
for the three diseases, namely Arrhythmia, Myocardial Infarction and Conduction Blocks,
using different classifiers is discussed in this section. From the analysis it is observed
that AANN provides better performance than the other classifiers, as shown in Table 3.11.
The accuracy of AANN for the three major cardiac diseases is shown in Fig. 3.9.


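Table 3.11: Performance (in %) of AANN for disease classification (Level - II)

| Disease | Feature | Precision | Recall | F-score | Accuracy |
|---------|---------|-----------|--------|---------|----------|
| Arrhythmia | Morphological | 97.50 | 97.50 | 97.50 | 98.50 |
| Arrhythmia | LPC | 94.97 | 94.50 | 94.73 | 96.02 |
| Arrhythmia | MFCC | 95.52 | 96.00 | 95.76 | 97.16 |
| Myocardial Infarction | Morphological | 97.97 | 97.00 | 97.48 | 98.33 |
| Myocardial Infarction | LPC | 95.50 | 95.50 | 95.50 | 96.23 |
| Myocardial Infarction | MFCC | 96.01 | 96.50 | 96.25 | 97.50 |
| Conduction Blocks | Morphological | 96.50 | 97.50 | 96.99 | 98.00 |
| Conduction Blocks | LPC | 95.01 | 95.50 | 95.25 | 95.45 |
| Conduction Blocks | MFCC | 96.46 | 95.50 | 95.97 | 97.33 |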
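As a worked check, the F-scores reported for the morphological features in Table 3.11 can be recomputed from the precision and recall columns using the harmonic-mean formula. The sketch below is only a consistency check on the tabulated values; the tolerance of 0.01 is an assumption that allows for rounding or truncation to two decimals.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent here)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for the morphological rows of Table 3.11.
rows = {
    "Arrhythmia": (97.50, 97.50, 97.50),
    "Myocardial Infarction": (97.97, 97.00, 97.48),
    "Conduction Blocks": (96.50, 97.50, 96.99),
}

differences = {
    name: abs(f_measure(p, r) - reported)
    for name, (p, r, reported) in rows.items()
}
for name, diff in differences.items():
    print(f"{name}: |recomputed - reported| = {diff:.4f}")
```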

3.5.3 Disease Subcategory Classification using SVM, AANN and GMM (Level - III)

In the third level, the subcategories of Arrhythmia (Arr), Myocardial Infarction (MI) and
Conduction Blocks (CB) are considered. The performance of ECG classification for the
disease subcategories using SVM, AANN and GMM is shown in Figs. 3.10, 3.11 and 3.12.
Table 3.11 shows the performance of AANN for level-II classification, where the
morphological features show a high precision of 97.50% for Arrhythmia, 97.97% for
Myocardial Infarction and 96.50% for Conduction Blocks.

Fig. 3.9: Performance of AANN for disease classification

In the literature, F-measure and Accuracy are the two main performance measures for
classification. Hence they are used in this work. F-Measure is a measure of a test's
accuracy that combines the precision and the recall of the test into a single score,
namely their harmonic mean.

$$\text{F-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.34}$$

Accuracy is the degree to which a measurement system yields true (no systematic errors)
and consistent (no random errors) results. For a classifier, it is the fraction of all
samples that are classified correctly.

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total No. of Samples}} \tag{3.35}$$

In this work, hierarchical classification is performed. Level I separates normal from
abnormal ECG. Level II classifies the three major cardiac diseases, namely Arrhythmia,
Myocardial Infarction and Conduction Blocks. Level III classifies the disease
subcategories.

Fig. 3.10: Performance of Arrhythmia

Fig. 3.11: Performance of Myocardial Infarction

Fig. 3.12: Performance of Conduction Blocks

Three subcategories are classified for each disease. For Arrhythmia, these are
Supraventricular Arrhythmia, Atrial Fibrillation and Ventricular Tachycardia. For
Myocardial Infarction, these are Anteroseptal Infarction, Anterior Infarction and Inferior
Infarction. For Conduction Blocks, these are atrioventricular blocks, left bundle branch
blocks and right bundle branch blocks.
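The three-level hierarchy described above can be sketched as a simple cascade. This is an illustration under assumptions: the routing (levels II and III are applied only to ECGs flagged abnormal at level I) is inferred from the description, and the three level classifiers are trivial stubs, not the SVM, AANN or GMM models of this work. All function names are hypothetical.

```python
HIERARCHY = {
    "Arrhythmia": [
        "Supraventricular Arrhythmia", "Atrial Fibrillation", "Ventricular Tachycardia",
    ],
    "Myocardial Infarction": [
        "Anteroseptal Infarction", "Anterior Infarction", "Inferior Infarction",
    ],
    "Conduction Blocks": [
        "Atrioventricular Block", "Left Bundle Branch Block", "Right Bundle Branch Block",
    ],
}


def classify_hierarchically(features, level1, level2, level3):
    """Run the Level I, II and III classifiers in sequence."""
    if level1(features) == "normal":
        return ("normal",)
    disease = level2(features)
    subcategory = level3[disease](features)
    return ("abnormal", disease, subcategory)


# Stub classifiers (hypothetical); real ones would be trained models.
def stub_level1(features):
    return "abnormal" if max(features) > 1.0 else "normal"


def stub_level2(features):
    return "Arrhythmia"


def make_stub_level3(subcategories):
    return lambda features: subcategories[0]


stub_level3 = {d: make_stub_level3(subs) for d, subs in HIERARCHY.items()}

result_normal = classify_hierarchically([0.2, 0.5], stub_level1, stub_level2, stub_level3)
result_abnormal = classify_hierarchically([0.2, 1.5], stub_level1, stub_level2, stub_level3)
print(result_normal, result_abnormal)
```

Keeping one subcategory classifier per disease means each Level III model only has to separate three classes.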

Novelty of the Work: The novelty of the work is that the ECG signal is processed using
image processing techniques.

3.6 Summary

This chapter discusses the Morphological, LPC and MFCC feature extraction and the
classification of ECG data using SVM, AANN and GMM classifiers. Nine categories of
cardiac diseases are classified in the proposed work, and the performance of the system
is studied for all nine categories. The performance is evaluated on a large dataset
collected from the Physiobank database and on real-time datasets from hospitals, covering
normal ECG and nine types of diseases. Most of the samples are correctly detected, and it
is observed that AANN with morphological features gives better performance than the
other techniques.