5, SEPTEMBER 2015
Abstract—Continuous followup of heart condition through long-term electrocardiogram monitoring is an invaluable tool for diagnosing some cardiac arrhythmias. In such context, providing tools for fast locating alterations of normal conduction patterns is mandatory and still remains an open issue. This paper presents a real-time method for adaptive clustering of QRS complexes from multilead ECG signals that provides the set of QRS morphologies that appear during an ECG recording. The method processes the QRS complexes sequentially by grouping them into a dynamic set of clusters based on the information content of the temporal context. The clusters are represented by templates which evolve over time and adapt to the QRS morphology changes. Rules to create, merge, and remove clusters are defined along with techniques for noise detection in order to avoid their proliferation. To cope with beat misalignment, derivative dynamic time warping is used. The proposed method has been validated against the MIT-BIH Arrhythmia Database and the AHA ECG Database showing a global purity of 98.56% and 99.56%, respectively. Results show that our proposal not only provides better results than previous offline solutions but also fulfills real-time requirements.
Index Terms—Adaptive clustering, dominant points, dynamic time warping (DTW), electrocardiogram (ECG), QRS clustering.

Manuscript received February 17, 2014; revised July 4, 2014 and September 1, 2014; accepted September 26, 2014. Date of publication October 8, 2014; date of current version September 1, 2015. This work was supported by the Spanish Ministry of Science and Innovation under Grant TIN2009-14372-C03-03.
The authors are with the Centro de Investigación en Tecnoloxías da Información, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain (e-mail: daniel.castro@usc.es; paulo.felix@usc.es; jesus.presedo@usc.es).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JBHI.2014.2361659
2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
Nowadays, the surface electrocardiogram (ECG) is recognized as an invaluable tool for monitoring heart condition, since its analysis provides decisive information that can reveal critical deviations from normal cardiac behavior. Recent developments in mobile sensors and mobile computing have enabled new scenarios for continuous ECG monitoring as an inexpensive tool for the early detection of some cardiac events [1], especially in those cases where symptoms appear intermittently.
As the monitoring period increases, the interpretation task becomes more time consuming and decision support tools are needed to help cardiologists to reduce the time spent on it. If a continuous followup is required, these tools become imperative. Their main aim is to provide the cardiologists with a summary of all the acquired signals, enhanced with a fast locating of those anomalies detected.
Cardiac arrhythmias are the most relevant among the ECG findings. There are two main sources of arrhythmias: an automatism disorder, that is, a set of alterations in the beat activation point due to changes in its location or activation frequency; or a conduction disorder, that is, an abnormal propagation of the beat wavefront through the cardiac tissue. They both have an effect on the ECG, affecting the beat morphology and/or beat rhythm. In order to support their identification, a method for separating the beats by their activation point and conduction pattern should be provided.
Beat classification arises as the task of assigning each beat in an ECG a label identifying its physiological nature. Machine learning techniques have been applied to this task by estimating the underlying mechanisms that produce the data of a training set. The main drawback of this approach is its strong dependence on the pattern diversity present in the training set. Thus, interpatient differences show that it cannot be assumed that a classifier trained on data of a large set of patients will yield valid results on a new patient [2]–[4], and intrapatient differences show that this cannot be assumed even for the same patient throughout time. In addition, class labels only provide gross information about the origin of the beats in the cardiac tissue, losing all the information about their conduction pathways. This approach does not distinguish the multiple morphological families present in a given class, as occurs in multifocal arrhythmias.
In contrast, beat clustering aims at dividing the ECG recording into a set of beat clusters, each one of them preserving some similar properties. Previous proposals have focused on an offline approach from a priori maximum number of clusters [5]–[8] and they imply processing the ECG signal once the acquisition has been completed. This approach has given good noise robustness but as a side effect, a single morphology is usually replicated in several clusters and rare beat morphologies can be missed. It also omits the dynamic aspect of ECG and, in particular, ignores the temporal evolution of morphologies. Furthermore, the detection of critical events can be deferred too long to provide timely attention. For all these reasons, a dynamic online approach must be considered.
In this paper, we present a real-time method for adaptive beat clustering, with a potential application not only as a previous step for classification [9], but also as a summary about those beat morphologies present in a certain period, their temporal evolution and variability, or even to detect the presence of alternating morphologies. The proposed method emulates the expert's behavior in exploiting the temporal context for assigning each new beat to the most appropriate cluster. To this end, clusters are continuously adapting to the temporal evolution of beat morphologies, and they can be dynamically
CASTRO et al.: METHOD FOR CONTEXT-BASED ADAPTIVE QRS CLUSTERING IN REAL TIME 1661
Fig. 1. Flowchart of the method proposed for QRS clustering. The different stages involved in beat processing are shown and the creation of a new cluster is
exemplified. Each block refers to the corresponding section.
created, merged or modified, resulting in a variable number of clusters.
Beat clustering requires extracting from the ECG a set of representative measurements for every beat. The literature shows a variety of proposals for beat representation that can be grouped into four categories: morphological features where the signal amplitude is directly used [2], [4], [7], [9]; segmentation features like area, amplitude, or interval duration from beat waves [2], [4], [7], [10] or amplitude and angle values from vectorcardiogram [3]; statistical features derived using high-order cumulants [4], [11], [12]; and finally, transformed space features, defined in an alternative space using different transforms like Karhunen–Loève transform [13], Hermite basis functions [4], [5], [12], discrete Fourier transform [9], [14] or wavelet transform [3], [14]. All the previous works complete their feature sets with rhythm information to complement their description capabilities.
This paper proposes a new approach to represent a beat by reducing the QRS complex to a set of relevant points and support regions. This representation has some nice properties for beat clustering: it is stable against the usual variability and the presence of noise in the ECG, and it explicitly represents the temporal location of some QRS features.
The proposed method processes a real-time multilead ECG signal through a set of data-driven stages as shown in Fig. 1. In order to obtain comparable results, signals from two standard ECG signal databases are used as data source (see Section II). The preprocessing stage comprises real-time beat detection and baseline filtering (see Section III). Then, a fixed-length signal segment is selected for extracting and characterizing the QRS complex (see Section IV-A). QRS complexes are compared to the current set of clusters following a context-based criterion to obtain the best matching cluster (see Sections IV-B–D). Afterwards, the current cluster set is updated in one of three ways: creating a new cluster, modifying the most similar one or merging two or more clusters (see Section IV-E). The next stage performs a noise analysis for each lead in order to detect noisy intervals and avoid the processing of noisy beats or discard the clusters created from them (see Section V). Finally, the beats are classified by their rhythm type and a set of groups with common morphology and rhythm is obtained (see Section VI).
The ECG databases have been processed and their beat class labels have been used to validate the purity of the final cluster and group sets (see Section VII). These results are discussed in Section VIII along with the conclusions of this paper.

II. ECG SIGNAL DATABASES
The ECG databases recommended by the ANSI/AAMI EC57 [15] standard for reporting the performance of arrhythmia detectors were used for validation purposes: the MIT-BIH Arrhythmia database and the AHA ECG database.
The MIT-BIH Arrhythmia database [16], [17] can be referred to as the gold standard for beat clustering and classification tasks and it is the reference database for almost all the literature in this field. This database is composed of 48 recordings of ambulatory ECG, obtained from 47 different patients, which comprise a very complete set of examples of common and rare arrhythmias. Each record has a duration of 30 min, and includes two channels with the same leads in almost all of them: a modified-lead II (MLII) in the first one and lead V1 in the second one. MLII was replaced by lead V5 in three records and V1 was replaced by MLII, V2, V4 or V5 in eight records. The signals were digitized at fs = 360 Hz and bandpass filtered with cutoff frequencies at 0.1 and 100 Hz. All beats present in the
1662 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 19, NO. 5, SEPTEMBER 2015
III. PREPROCESSING
The major drawback of processing long-term ECG signals is the presence of a high level of noise with multiple manifestations (baseline wandering, power line interference or electromyographic activity), so an initial filtering stage needs to be performed. The efforts have been focused on filtering the baseline wandering as this is the most relevant source affecting the reliability of the clustering algorithm because of the distortion it can cause on the QRS morphology.
Fig. 2. Example of a subsequence $\mathbf{q}^l_n$ for lead l = 2, of a beat at sample 4102, n = 13, from record 108 of the MIT-BIH database. The relevant points detected by the algorithm are marked with *. The parameters involved in the relevant point detection are also shown.
where $q^l_j = s^l_{t - w^- - 1 + j}$. We set $w^- = 0.1 \times f_s$ and $w^+ = 0.2 \times f_s$, which are wide enough to capture the longest QRS of abnormal beats [19] (V, F, and f beat types, typically).
Most of this section describes operations over one lead; thus, for the sake of notation simplicity, the $l$ superscript will be obviated unless multiple leads are involved.
We define the curvature $K$ at $q_j \in \mathbf{q}_n$ as

$$K(q_j, \mathbf{q}_n) = \max_{i \in I_j^-,\; k \in I_j^+} \cos \widehat{q_i q_j q_k} \qquad (2)$$

where $2 \le j \le w - 1$. The terms $I_j^-$ and $I_j^+$ denote the intervals used for calculating the curvature and they are given by

$$I_j^- = \{\, i \mid (j - \theta) \le i < j \;\wedge\; \forall a \in (i, j),\ (\max_{b \in (a, j)} \Delta q_{j,b} - \Delta q_{j,a}) < \rho_{min} \,\} \qquad (3)$$

$$I_j^+ = \{\, k \mid j < k \le (j + \theta) \;\wedge\; \forall a \in (j, k),\ (\max_{b \in (j, a)} \Delta q_{j,b} - \Delta q_{j,a}) < \rho_{min} \,\} \qquad (4)$$

where $\Delta q_{j,x} = |q_j - q_x|$. The term $\theta$ is the maximum physiologically meaningful width of a QRS wave between its peak location and its left or right end so that it is the upper limit of $I_j^-$ and $I_j^+$. The term $\rho_{min}$ is the minimum height for a signal deflection to be considered physiologically relevant and, therefore, to be excluded from the calculation of curvature.
We define the dominance region of a sample $q_j$ as

$$\mathrm{dominance}(q_j) = [r^-, r^+] \qquad (5)$$

where $r^- = \min(\arg\max_{i \in I_j^-} \cos \widehat{q_i q_j q_k})$ for any fixed $k \in I_j^+$

In consequence, adjacent support regions can now overlap.
A relevant point $p_j$ is said to be in a concave wave if $q_j > q_{j^-}$ and $q_j > q_{j^+}$. Otherwise, it is said to be in a convex wave. The wave height is defined as $\Delta p_j = \min(\Delta q_{j,j^-}, \Delta q_{j,j^+})$.
Finally, the nth beat is represented by the QRS signal segment and the set of its relevant points and support regions $P_n = \{(p_j, \mathrm{support}(p_j)) \mid p_j \in R_n\}$ and denoted by

$$B_n = \langle \mathbf{q}_n, P_n \rangle. \qquad (11)$$

There is no consensus in the literature about the limits for width and height of the QRS complex or individual QRS waves. The AAMI standard [20] recommends a minimum amplitude of 50 μV and a duration of 10 ms for a QRS wave to be detected, and 150 μV for peak-to-peak QRS amplitude with a minimum duration of 70 ms. On the other hand, the AHA [21] and CSE [19] report lower amplitude for QRS waves (down to 20 μV and 10 ms) based on measures over averaged beats with increased signal-to-noise ratio. These limits were not established for physiological reasons but for signal noise level or instrumentation limitations. Nothing is stated about maximum QRS width beyond a reference to case-based duration values (e.g., the CSE study [19] reports a maximum QRS width of 210 ms). In our case, due to the signal-to-noise ratio present in the ambulatory signals, the value of $\rho_{QRS}$ is set to 150 μV in order to avoid the detection of small waves caused by noise and the value of $\theta$ is set to 100 ms to accept QRS waves with a maximum width of 200 ms. The value of $\rho_{min}$ is set to 50 μV following the AAMI standard [20] and will be useful to detect noise contaminated QRS complexes.
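The curvature in (2) can be illustrated with a minimal sketch. It is a simplified reading under stated assumptions, not the authors' implementation: the function names are hypothetical, the `x_scale` factor that makes sample steps commensurable with amplitude is an assumption (this section does not specify one), and the $\rho_{min}$ test of (3) and (4) is omitted, so the intervals are taken as the plain windows of width $\theta$ on each side of $j$.

```python
import math

def angle_cosine(q, i, j, k, x_scale=1.0):
    """Cosine of the angle at q[j] formed by the samples q[i] and q[k].

    Each sample is treated as the 2-D point (index * x_scale, amplitude).
    x_scale is an assumed factor, not taken from the paper.
    """
    ax, ay = (i - j) * x_scale, q[i] - q[j]
    bx, by = (k - j) * x_scale, q[k] - q[j]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def curvature(q, j, theta, x_scale=1.0):
    """Maximum angle cosine over i in [j - theta, j) and k in (j, j + theta].

    Simplification: the rho_min condition of (3)-(4) is not applied.
    j must have at least one neighbour on each side.
    """
    left = range(max(0, j - theta), j)
    right = range(j + 1, min(len(q), j + theta + 1))
    return max(angle_cosine(q, i, j, k, x_scale) for i in left for k in right)

# A sharp peak yields a cosine close to 1, a flat segment yields -1.
peak = curvature([0, 0, 10, 0, 0], j=2, theta=2)
flat = curvature([0, 0, 0, 0, 0], j=2, theta=2)
```

With the values given in this section, $\theta = 100$ ms corresponds to 36 samples at $f_s = 360$ Hz.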
C. Template Matching
In order to assign a beat $B_n$ to an existent cluster $C^c_{n-1}$, we design a similarity calculation that only considers the difference between signals over the support region of each relevant point, thus limiting the comparison to the constituent waves of the QRS. Given a pair $(p_j, [j^-, j^+]) \in P_n$, we are interested in verifying whether $T^c_{n-1}$ contains a similar wave in the same location of the QRS. In order to do so, the interval $[\check{j}^-, \check{j}^+]$ of $\mathbf{q}^c_{n-1}$ aligned with the interval $[j^-, j^+]$ of $\mathbf{q}_n$ must be obtained (see Fig. 4). To this end, the warping path $m$ is used to map $j$ and $[j^-, j^+]$ from $\mathbf{q}_n$ into $\tilde{\mathbf{q}}_n$, obtaining the equivalent index $\tilde{j}$ for the relevant point, and $[\tilde{j}^-, \tilde{j}^+]$ for its support region, where $\tilde{j} = \max\{k \mid x_k = j\}$, $\tilde{j}^- = \min\{k \mid x_k = j^-\}$, and $\tilde{j}^+ = \max\{k \mid x_k = j^+\}$, being $(x_k, y_k) \in m$. Afterwards, since $\tilde{\mathbf{q}}^c_{n-1}$ and $\tilde{\mathbf{q}}_n$ are already aligned by the application of DDTW, the same interval $[\tilde{j}^-, \tilde{j}^+]$ from $\tilde{\mathbf{q}}^c_{n-1}$ is selected and mapped into $\mathbf{q}^c_{n-1}$ using the warping path. The interval $[\check{j}^-, \check{j}^+]$ is given by $\check{j}^- = y_{\tilde{j}^-}$ and $\check{j}^+ = y_{\tilde{j}^+}$.
Once the aligned intervals are obtained, we proceed to evaluate the concordance of both vectors. The segment $\mathbf{q}^c_{n-1}$ is said to concord with $\mathbf{q}_n$ at $p_j$, denoted by $\mathbf{q}^c_{n-1} \approx_{p_j} \mathbf{q}_n$, if $\mathbf{q}^c_{n-1}$ contains a deflection in $[\check{j}^-, \check{j}^+]$ with height $\Delta p^c_{\check{j}} > \rho_{min}$ likely to be considered a significant
Fig. 4. Parameters involved in calculating the concordance and local dissimilarity of a cluster with respect to a beat at a relevant point $p_j$. Solid and dashed lines represent beat and template data, respectively. (a) Shows parameters involved in concordance checking; the support region $[j^-, j^+]$ and the derived intervals $[\tilde{j}^-, \tilde{j}^+]$ and $[\check{j}^-, \check{j}^+]$ are drawn; (b) signal segments within the interval $[\tilde{j}^-, \tilde{j}^+]$ with a vertical alignment performed in the subintervals $[\tilde{j}^-, \tilde{j}]$ and $[\tilde{j}, \tilde{j}^+]$ independently; $\Delta A_j^-$ and $\Delta A_j^+$ areas are shaded; (c) the same signal segments shown in (b) with $A_j^-$ and $A_j^+$ areas shaded.
Fig. 5. Flowchart of the cluster selection and cluster set updating processes.
When $B_n$ and $C^{sim}_{n-1}$ are not similar enough, a new comparison is performed within the subset $C_{n-1} - C^{ctx}_{n-1}$, obtaining the most similar cluster $C^{sim*}_{n-1}$. If $B_n$ and $C^{sim*}_{n-1}$ are similar enough, the beat is assigned and $C^{win}_{n-1} = C^{sim*}_{n-1}$. Otherwise, the beat is not assigned to any existing cluster and its most similar cluster $C^{win}_{n-1}$ is selected between $C^{sim}_{n-1}$ and $C^{sim*}_{n-1}$ using the same voting criteria previously seen.

E. Cluster Set Updating
In order to adaptively respond to the changing behavior of the ECG, clusters must be dynamically created, modified, or merged whenever a new beat is detected.
1) Cluster Creation: If $B_n$ is not assigned to $C^{win}_{n-1}$, a new cluster $C^{new}_n$ is created and its template is initialized for each lead using the beat representation: $T^{new,l}_n = B^l_n$. Then the cluster set is updated as $C_n = C_{n-1} \cup \{C^{new}_n\}$.
2) Cluster Modification: If $B_n$ is assigned to $C^{win}_{n-1}$, then the cluster is updated to $C^{win}_n = C^{win}_{n-1} \cup \{B_n\}$ and the template $T^{win,l}_{n-1} = \langle \mathbf{q}^{win,l}_{n-1}, P^{win,l}_{n-1} \rangle$ is modified to $T^{win,l}_n$ for each lead. To this end, $\mathbf{q}^{win,l}_n$ is calculated from $\mathbf{q}^l_n$ and $\mathbf{q}^{win,l}_{n-1}$

$$\dot{q}^{win,l}_{y,n} = (1 - \beta)\, \dot{q}^{win,l}_{y,n-1} + \beta\, \mathrm{mean}(\{\dot{q}^l_{x,n} \mid (x, y) \in m\}) \qquad (23)$$

$$q^{win,l}_{y,n} = q^{win,l}_{y-1,n} + \dot{q}^{win,l}_{y,n} \qquad (24)$$

where $q^{win,l}_{1,n} = q^{win,l}_{1,n-1}$. The term $\beta$ is the coefficient of the exponential update. Setting a value for $\beta$ implies a tradeoff between plasticity and stability of the cluster template. We set $\beta = 1/8$ so the last 16 beats assigned to the cluster provide

and its most similar cluster may change. The closest relation is updated when multiple clusters in the same set as $C^{win}_{n-1}$, be it either $C^{ctx}_{n-1}$ or $C_{n-1} - C^{ctx}_{n-1}$, fulfill the assignment condition $\bar{S}(\mathbf{q}^l_n, \mathbf{q}^{c,l}_{n-1}) > \gamma$ in all leads (see Section IV-D). Since $C^{win}_n$ has been updated, getting more similar to $B_n$, the best matching cluster $C^s_{n-1}$ for $B_n$ is selected within the remaining subset of clusters, where $s = \arg\max_{C^c_{n-1} \ne C^{win}_{n-1}} \left( \sum_{l=1}^{L} S(\mathbf{q}^l_n, \mathbf{q}^{c,l}_{n-1}) / L \right)$, and the oldest of both clusters, $C^{win}_{n-1}$ or $C^s_{n-1}$, is set as the new most similar cluster for the other. Then, the clusters $C^{win}_n$ and $C^s_n$ are checked for possible merging, since their templates have evolved to be similar enough to the last beat (see Fig. 5). The condition for merging those clusters will be analogous to the condition for beat assignment: $\bar{S}(\mathbf{q}^{s,l}_n, \mathbf{q}^{win,l}_n) > \gamma'$, where $\gamma' = 0.40$, which corresponds to the maximum contribution of a point with local dissimilarity over 20%. The threshold value is increased with respect to $\gamma$ since both signal segments are averaged templates with increased signal-to-noise ratio.
Afterwards, the special case of clusters within their transient period is considered. We establish the duration of this transitory state in terms of the number of assigned beats. Hence, if $B_n$ is assigned to $C^{win}_{n-1}$ and $|C^{win}_n| < \mu$, the cluster is checked for merging with its closest one $C^s_n = \mathrm{closest}(C^{win}_n)$ (see Fig. 5). We set $\mu$ as the minimum number of beats assigned to the cluster to confirm that it represents an independent morphology, and we consider $\mu = 10$ enough for this purpose.
When two clusters $C^{win}_n$ and $C^c_n$ are merged, the cluster set, cluster template, and closest relation are updated accordingly (we suppose that $C^c_n = \mathrm{closest}(C^{win}_n)$).
1) $C^c_n$ is updated to $C^c_n = C^c_n \cup C^{win}_n$.
2) $\mathbf{q}^{c,l}_n$ is calculated by merging $\mathbf{q}^{c,l}_{n-1}$ and $\mathbf{q}^{win,l}_n$

$$\dot{q}^{c,l}_{y,n} = (1 - \beta)\, \dot{q}^{c,l}_{y,n} + \beta\, \mathrm{mean}(\{\dot{q}^{win,l}_{x,n} \mid (x, y) \in m\}) \qquad (25)$$

$$q^{c,l}_{y,n} = q^{c,l}_{y-1,n} + \dot{q}^{c,l}_{y,n} \qquad (26)$$

where $q^{c,l}_{1,n}$ remains unmodified.
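The exponential template update of (23) and (24) can be sketched as follows. This is an illustrative reading under stated assumptions rather than the authors' code: the function name `update_template` is hypothetical, the derivative $\dot{q}$ is approximated by a first difference (the section does not define the derivative estimate), indices are 0-based, and the warping path is given as a list of `(x, y)` pairs with `x` indexing the beat and `y` the template.

```python
from statistics import mean

def update_template(template, beat, path, beta=1 / 8):
    """Blend the beat's slope into the template's slope, then re-integrate.

    template, beat: lists of amplitudes for one lead.
    path: warping path as (x, y) pairs (beat index, template index).
    Assumption: derivatives are first differences, defined for index >= 1.
    """
    d_tpl = {y: template[y] - template[y - 1] for y in range(1, len(template))}
    d_beat = {x: beat[x] - beat[x - 1] for x in range(1, len(beat))}

    updated = [template[0]]  # first sample stays fixed, as stated after (24)
    for y in range(1, len(template)):
        matched = [d_beat[x] for (x, yy) in path if yy == y and x in d_beat]
        slope = d_tpl[y]
        if matched:  # eq. (23): exponential update of the derivative
            slope = (1 - beta) * slope + beta * mean(matched)
        updated.append(updated[-1] + slope)  # eq. (24): cumulative sum
    return updated

# A flat template moves 1/8 of the way towards the slope of the new beat.
new_tpl = update_template([0, 0, 0], [0, 8, 16], [(0, 0), (1, 1), (2, 2)])
```

Updating the derivative instead of the amplitude keeps the update independent of any constant offset between beat and template, which matches the use of derivative DTW for the alignment.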
TABLE I
RULES FOR RHYTHM LABEL SELECTION
ΔRRn GP P N− N N+ C D
∈ (3
σ n , ∞] D6 D2,6 D D 1 |1 0 D 2 |1 1 D D
C C N+ N+
∈ (2
σ n , 3
σn ] C C N+ N+ N+ D5,6 D 6 |9
N+ N+
∈ [−2
σ n , 2
σn ] N N N N N N N
σ n , −2
∈ [−3 σn ) GP 8 N −1 N− P 3,4,6 N− P6 P4
N− GP GP 3,8 GP 8 N−
N− N−
∈ [−∞, −3
σn ) GP GP P 3,4,7 P 3,4,7 P4 P 4,7 P4
GP 3,12 GP 3 GP GP GP
N− N−
The absence of an analysis of the P wave leads to the inability to discriminate premature, normal, or ectopic beats which share a common QRS morphology. All previous clustering proposals include rhythm information within their beat characterization and claim its separation capabilities for this kind of arrhythmias. In order to make our results comparable, we include a rhythm processing stage that allows us to separate those beats into different groups.

A. RR-Interval Characterization
The beat-to-beat interval (RR) between normal beats, commonly known as NN interval, is the result of a nonstationary stochastic process regulated by the sympathetic and parasympathetic nervous systems. This implies that the RR value for beat $B_n$, denoted by $RR_n$, should be put in context using the rhythm of the surrounding beats to analyze its normality. To this end, we model the NN series as a stochastic process with marginal distribution $\bar{N}(\widehat{NN}_n, \hat{\sigma}_n^2)$ at the nth beat. The mean is estimated as $\widehat{NN}_n = \theta RR_{n-1} + (1 - \theta)\widehat{NN}_{n-1}$, with $\theta = 0.2$ for $RR_{n-1}$ values labeled as normal (as explained in the next section). Otherwise, $\widehat{NN}_n = \widehat{NN}_{n-1}$. The standard deviation $\hat{\sigma}_n$ is estimated from the squared deviations $(NN_i - \widehat{NN}_i)^2$ using the last $\tau$ beats with normal rhythm. Fig. 6 shows the evolution of the RR in an excerpt with a premature beat from record 117 of the MIT-BIH database to illustrate this point.
The first complete context, $\tau\text{-ctx}^-(B_{\tau+1})$, is used to initialize $\widehat{NN}_n$ and $\hat{\sigma}_n$. Let $TC$ denote the set of the first $\tau$ values of RR and let $K_i$ denote any subset of RR values from consecutive beats. A value $RR_j \in TC$ is said to be normal if $\exists K_i$ such that $RR_j \in K_i \wedge \sigma_{K_i} < 0.1$, where $\sigma_{K_i}$ is the normalized standard deviation of the $K_i$ set. Let $TC_N = \{RR_j \mid RR_j \text{ is normal}\}$. If $TC_N = \emptyset$, then $TC_N = \{RR_j \mid RR_j \in [\overline{TC} - 2\sigma_{TC}, \overline{TC} + 2\sigma_{TC}]\}$, where $\overline{TC}$ and $\sigma_{TC}$ are the mean and standard deviation of $TC$. If $TC_N = \emptyset$, then the context in $TC$ is repeatedly moved forward one beat until $TC_N \ne \emptyset$ for a context $\tau\text{-ctx}^-(B_{\tau+1+i})$. Finally, we set $\widehat{NN}_n = \overline{TC_N}$ and $\hat{\sigma}_n = \sigma_{TC_N}$ $\forall n \in [1, \tau + 1 + i]$.

B. Rhythm Labeling
The aim of rhythm processing is the discrimination of those abnormal RR values associated with arrhythmic beats from the normal ones. To this end, the rhythm of the $B_n$ beat is characterized through a vector $\mathbf{rr}_n = (RR_n, RR_n^-, RR_n^+, \widehat{NN}_n, \hat{\sigma}_n)$ where $RR_n^-$ and $RR_n^+$ are the RR values for the previous and next beat, respectively. Then, the model described in the previous section is used to establish a range of validity for the $RR_n$ value which allows us to detect any alteration in the normal rhythm.
We use seven rhythm labels to discern between four beat rhythm types: normal, with (C) or without (N, N−, N+) compensatory pause, premature (P), group of prematures (GP), and delayed (D). The explicit domain knowledge contained in Table I models, for each rhythm type, the relation of an RR value with the normal rhythm from its temporal context. It also reflects the dependence of the rhythm type for an RR value on the rhythm type of its adjacent beats. This model allows us to assign a rhythm label to the beat $B_n$ based on the $\mathbf{rr}_n$ values and the rhythm label of the previous beat.

VII. RESULTS
We have applied our clustering method to all the records of the MIT-BIH database. The parameters and threshold values of this method have been neither trained nor adjusted to fit this database. These values have been justified by physiological reasons, or by the expertise or intuition of experienced cardiologists. The method shows low sensitivity to small changes in parameter values; the results either improve or worsen slightly.
TABLE II
NUMBER OF CLUSTERS PER RECORD IN MIT-BIH DB

Rec.   N    N_RR   N^c_RR  |  Rec.   N    N_RR   N^c_RR
100    4    7      7       |  201    15   32     10
101    4    6      6       |  202    9    18     18
102    10   13     13      |  203    33   87     25
103    10   12     12      |  205    14   20     20
104    16   25     25      |  207    61   96     25
105    10   16     16      |  208    28   63     22
106    27   49     18      |  209    10   26     9
107    11   21     21      |  210    27   65     13
108    22   35     7       |  212    5    8      8
109    13   18     18      |  213    17   29     11
111    8    10     10      |  214    21   36     11
112    4    7      7       |  215    16   32     8
113    5    9      9       |  217    28   50     19
114    8    13     13      |  219    14   24     24
115    11   11     11      |  220    2    5      5
116    10   15     15      |  221    14   24     24
117    4    5      5       |  222    8    19     19
118    3    8      8       |  223    23   43     17
119    6    13     13      |  228    14   25     25
121    5    7      7       |  230    3    5      5
122    1    1      1       |  231    5    6      6
123    3    5      5       |  232    4    9      9
124    14   18     18      |  233    24   48     15
200    20   46     16      |  234    5    7      7

N column indicates the number of clusters created using QRS morphology; N_RR, the number of groups after using rhythm labels; and N^c_RR, the number of groups after the merging process.

Although better results could be obtained by fine tuning these parameters for each specific database, this is not the aim of this paper; the aim is to prove the validity of the present approach for continuous ECG monitoring.
For each record, a set of clusters is obtained reflecting the QRS morphologies present in it. Afterwards, the rhythm labels are used to split each cluster into groups of beats with the same rhythm type. In order to compare our results with the proposal in [5] under equivalent conditions, we adopted a fixed number of clusters (25) as the maximum number of groups. Thus, if that limit is exceeded for a record, a merging process is applied to obtain a reduced set of groups with the most prevalent rhythm and morphologies. If necessary, we keep merging the groups with the lowest number of assigned beats until the maximum is reached. Table II shows the results before and after the merging process.
Each group is labeled with the majority class label of the beats assigned to this group from the database. An assigned beat is considered as correctly grouped if the class label in the database matches the label of the group. A confusion matrix is obtained for each record comparing both labels for every beat. These matrices are summed to obtain the global confusion matrix for the whole validation set shown in Table III. Table IV shows the results using the AAMI class labels obtained from MIT-BIH labels as described in [15]. Table V shows the results on the AHA database.

A. Real-Time Considerations
The proposed method processes an ECG recording with a bounded time delay comprising the intrinsic latency and computation time for the different stages. Only the baseline filtering and rhythm labeling stages present an intrinsic latency due to noncausality: 400 ms and one beat, respectively. Since they both are executed concurrently, their contribution to the delay is given by the maximum of both. The time complexity of the method is constant in all stages but two: QRS alignment and template matching. In both cases, time complexity is constant for the best case, corresponding to beats assigned to a cluster in the context; this happens in 95.35% of the total number of beats in the MIT-BIH database. Time complexity is linear ($O(|C_n|)$) for the worst case, which corresponds to beats assigned to a new or an out-of-context cluster, representing the remaining 4.65% of the total number of beats. Given the high degree of parallelism in both stages and the computational cost of processing a single cluster, it can be guaranteed that a beat is clustered before the next one arrives even with a set of hundreds of clusters. In order to support this claim, the MIT-BIH Arrhythmia database was processed with a nonoptimized, nonparallelized MATLAB implementation of the method using a single core of an Intel Q9550 CPU. The following computation times for a single beat were obtained: QRS characterization (maximum, mean): 4 ms, 3 ms; cluster set selection: 323 ms, 16 ms; cluster set updating: 110 ms, 5 ms; noise analysis: 9 ms, <1 ms; and rhythm-based labelling: 10 ms, 5 ms. Globally, the whole method summed a maximum of 358 ms and mean of 32 ms. Additionally, reducing groups for validation required a maximum of 190 ms.

B. Clustering Performance Measures
Purity is usually used to measure the goodness of a clustering method. Nevertheless, in a multiclass problem like this one, after characterizing the clusters, the values of sensitivity (Se), positive predictivity (+P), and false positive rate (FPR) should also be provided for each class, in order to obtain a multidimensional measure of the quality of the results from the perspective of each class involved. A global purity of 98.56% is obtained for the MIT-BIH Arrhythmia database (98.84% with AAMI class labels) and 99.56% for the AHA ECG database. The other values are shown in the last row of Tables III–V.

VIII. DISCUSSION AND CONCLUSION
We propose a new clustering method to dynamically separate QRS morphologies as they appear in a multichannel ECG signal, representing them with a dynamic number of clusters. This objective has not been previously addressed in the literature and only some partial solutions can be found, all of them restricted to offline processing [5]–[7] and some of them even limited to single channel signals and to a fixed subset of beat classes [6], [7].
The performance of the QRS clustering technique without using rhythm data shows a global purity of 97.15% and 99.43% for MIT-BIH and AHA databases, respectively. This confirms the validity of our approach since no method, as far as we know, either offline or online, achieves this performance without using RR derived information. As expected, the main source of error is the group of supraventricular classes A, N, J, j, and e, that can only be separated using P wave and rhythm information.
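The purity figures quoted here follow from the majority-label rule described in Section VII: each group takes the most frequent reference label among its beats, and a beat counts as correctly grouped when its own label matches. A minimal sketch of that computation is given below; the function name `purity` and the list-of-lists data layout are assumptions made for illustration, not part of the paper.

```python
from collections import Counter

def purity(groups):
    """Fraction of beats whose reference label equals their group's majority label.

    groups: list of groups, each a list of reference class labels
            (hypothetical layout chosen for this example).
    """
    total = sum(len(g) for g in groups)
    # For each non-empty group, count the beats carrying the majority label.
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups if g)
    return correct / total

# Three toy groups: one V beat sits in a group dominated by N beats.
example = purity([["N", "N", "V"], ["V", "V"], ["A"]])
```

In this toy case five of the six beats agree with their group label, so the purity is 5/6.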
TABLE III
CONFUSION MATRIX RESULTING FROM CLUSTERING MIT-BIH ARRHYTHMIA DATABASE
N L R a V F J A S E j / e f Q !
The first row corresponds to the annotation labels of the database, and the first column, to the dominant annotation label in the clusters. Se, + P , and F P R denote
the sensitivity, positive predictivity, and false positive rate for each beat class, respectively.
TABLE IV
CONFUSION MATRIX FOR MIT-BIH DB USING AAMI CLASS LABELS
N S V F Q
TABLE V
CONFUSION MATRIX FOR AHA ECG DATABASE
N V F E / Q
Classes from the MIT-BIH label set without assigned beats are omitted.
This is the reason for the difference between both results since the AHA database does not contain this type of beats.
The validation results after using rhythm labels show a high sensitivity and positive predictivity for almost all classes while the purity increases. We observe, in accordance with [5], that the largest number of errors in Table III is caused by beats with similar morphology. Fusion (F) of ventricular (V) and normal (N) beats are included with N or V clusters and vice versa. The same occurs with N, paced (/) and fusion of N and / beats (f). Finally, beats with supraventricular or nodal activation points (A, N, J, j, and e) with similar QRS are wrongly clustered when the rhythm information is not decisive for their discrimination. This kind of error represents 66% of the total.
Besides the clustering performance, the analysis of the number of clusters (N) generated for each record shows that it remains small: 34 records (71%) have 15 or fewer clusters; 12 records (25%) have between 16 and 30 clusters, both included; and only two records (4%) have more than 30 clusters. The high number of clusters for record 207 is caused by the presence of an episode of ventricular flutter with QRS complexes replaced by irregular waves. The clustering results would be improved if a specific detection method were used for this kind of arrhythmia with absent QRS complex.
The N_RR column of Table II shows a general increase in the number of groups in all records with a mean of 24 groups per record. This increment reflects the presence of different rhythm labels in the beats assigned to the clusters although they do not always belong to different beat classes. Records with irregular rhythm, like those with atrial fibrillation or sudden rhythm change, will render the RR information useless to discriminate premature or delayed beats. In such cases, the RR does not provide information about the beat activation point and the discrimination cannot be performed without an analysis of the P wave.
Since no other method for online clustering has been reported, we compare our clustering performance with existing offline proposals. Only the work of Lagerholm et al. [5] provides comparable results since others [6], [7] are designed to deal with a concrete subset of beat types and perform the evaluation over a single channel (usually the one with lower noise). Compared to [5], our method provides a slightly better purity (98.56% versus 98.49%). The sensitivity is improved in our paper for 11 out of the 16 beat classes and slightly worsened for 3. Special mention should be made for classes a, A, R, j, and E where the
improvement is remarkable. Let us remember that Lagerholm et al. [5] rely on a SOM with 25 clusters to represent the different beat classes. This approach has two main drawbacks. First, the clusters get saturated with the dominant morphologies present in the learning stage, while rare morphologies are ignored, as are new morphologies that appear afterwards. Second, the generated clusters are redundant in those records with a low number of morphologies. In contrast, our method dynamically adapts the number of clusters to the number of morphologies detected.

The results of our proposal confirm the relevance of the temporal context for beat clustering. It allows us to switch from an offline to an online analysis achieving the same or even better results, and to address the temporal evolution of a beat morphology that otherwise would be projected into multiple clusters. The results of the experiments on ECG standard databases also show the adequacy of the present method for real-time ECG monitoring.

Our proposal provides cardiologists with information about the morphological diversity within a desired time frame and its temporal evolution. This information allows them to promptly detect the different conduction patterns and evaluate their relevance. It can also be useful for arrhythmia detection and classification, which can later be addressed either automatically by classification algorithms or manually by the cardiologists.

In conclusion, we have presented an adaptive, multichannel, context-based method for clustering beat morphologies in real time that has been validated over the whole MIT-BIH Arrhythmia and AHA ECG databases, with results that outperform those of its offline counterparts in the field.

REFERENCES
[1] R. Sutton, “Remote monitoring as a key innovation in the management of cardiac patients including those with implantable electronic devices,” Europace, vol. 15, no. 1, pp. i3–i5, 2013.
[2] P. DeChazal, M. O’Dwyer, and R. B. Reilly, “Automatic classification of heartbeats using ECG morphology and heartbeat interval features,” IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1196–1206, Jul. 2004.
[3] M. Llamedo and J. P. Martínez, “Heartbeat classification using feature selection driven by database generalization criteria,” IEEE Trans. Biomed. Eng., vol. 58, no. 3, pp. 616–625, Mar. 2011.
[4] G. de Lannoy, D. François, J. Delbeke, and M. Verleysen, “Weighted conditional random fields for supervised interpatient heartbeat classification,” IEEE Trans. Biomed. Eng., vol. 59, no. 1, pp. 241–247, Jan. 2012.
[5] M. Lagerholm, C. Peterson, G. Braccini, L. Edenbrandt, and L. Sornmo, “Clustering ECG complexes using Hermite functions and self-organizing maps,” IEEE Trans. Biomed. Eng., vol. 47, no. 7, pp. 838–848, Jul. 2000.
[6] D. Cuesta-Frau, J. C. Pérez-Cortes, and G. A. García, “Clustering of electrocardiograph signals in computer-aided holter analysis,” Comput. Methods Prog. Bio., vol. 72, no. 3, pp. 179–196, 2003.
[7] D. Cuesta-Frau, M. O. Biagetti, R. A. Quinteiro, P. Micó-Tormos, and M. Aboy, “Unsupervised classification of ventricular extrasystoles using bounded clustering algorithms and morphology matching,” Med. Biol. Eng. Comput., vol. 45, no. 3, pp. 229–239, 2007.
[8] J. L. Rodríguez-Sotelo, D. Cuesta-Frau, and G. Castellanos-Domínguez, “Unsupervised classification of atrial heartbeats using a prematurity index and wave morphology features,” Med. Biol. Eng. Comput., vol. 47, no. 7, pp. 731–741, 2009.
[9] V. Krasteva and I. Jekova, “QRS template matching for recognition of ventricular ectopic beats,” Ann. Biomed. Eng., vol. 35, pp. 2065–2076, Sep. 2007.
[10] I. Christov, G. Gómez-Herrero, V. Krasteva, I. Jekova, A. Gotchev, and K. Egiazarian, “Comparative study of morphological and time-frequency ECG descriptors for heartbeat classification,” Med. Eng. Phys., vol. 28, no. 9, pp. 876–887, 2006.
[11] S. Osowski and T. H. Linh, “ECG beat recognition using fuzzy hybrid neural network,” IEEE Trans. Biomed. Eng., vol. 48, no. 11, pp. 1265–1271, Nov. 2001.
[12] S. Osowski, L. T. Hoai, and T. Markiewicz, “Support vector machine-based expert system for reliable heartbeat recognition,” IEEE Trans. Biomed. Eng., vol. 51, no. 4, pp. 582–589, Apr. 2004.
[13] Y. H. Hu, S. Palreddy, and W. J. Tompkins, “A patient-adaptable ECG beat classifier using a mixture of experts approach,” IEEE Trans. Biomed. Eng., vol. 44, no. 9, pp. 891–900, Sep. 1997.
[14] Z. Dokur and T. Olmez, “ECG beat classification by a novel hybrid neural network,” Comput. Methods Prog. Bio., vol. 66, nos. 2/3, pp. 167–181, 2001.
[15] Testing and Reporting Performance Results of Cardiac Rhythm and ST-segment Measurement Algorithms. ANSI/AAMI Standard EC57:1998/(R)2008, 2008.
[16] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[17] G. Moody and R. Mark, “The impact of the MIT-BIH Arrhythmia Database,” IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp. 45–50, May/Jun. 2001.
[18] W.-Y. Wu, “An adaptive method for detecting dominant points,” Pattern Recog., vol. 36, no. 10, pp. 2231–2237, 2003.
[19] The CSE Working Party, “Recommendations for measurement standards in quantitative electrocardiography,” Eur. Heart. J., vol. 6, no. 10, pp. 815–825, 1985.
[20] Cardiac Monitors, Heart Rate Meters, and Alarms. ANSI/AAMI Standard EC13:2002/(R)2007, 2007.
[21] P. Kligfield, L. S. Gettes, J. J. Bailey, R. Childers, B. J. Deal, E. W. Hancock, G. van Herpen, J. A. Kors, P. Macfarlane, D. M. Mirvis, O. Pahlm, P. Rautaharju, and G. S. Wagner, “Recommendations for the standardization and interpretation of the electrocardiogram: Part I,” Circulation, vol. 115, no. 10, pp. 1306–1324, 2007.
[22] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoust., Speech, Signal Process., vol. 26, no. 1, pp. 43–49, Feb. 1978.
[23] E. J. Keogh and M. J. Pazzani, “Derivative dynamic time warping,” presented at the 1st SIAM Int. Conf. Data Mining, Chicago, IL, USA, 2001.

Daniel Castro received the B.Sc. and M.Sc. degrees in physics from the University of Santiago de Compostela, Santiago de Compostela, Spain, in 1998 and 1999, respectively. He is a Researcher and is currently working toward the Ph.D. degree at the Centro Singular de Investigación en Tecnoloxías da Información, University of Santiago de Compostela.
His research interests include signal processing and its application to biomedical signals.

Paulo Félix received the B.Sc. and Ph.D. degrees in physics from the University of Santiago de Compostela, Santiago de Compostela, Spain, in 1993 and 1999, respectively.
He is currently a Researcher at the Centro Singular de Investigación en Tecnoloxías da Información, University of Santiago de Compostela. His research interests include temporal reasoning, machine learning, and signal processing.

Jesús Presedo received the B.Sc. and Ph.D. degrees in physics from the University of Santiago de Compostela, Santiago de Compostela, Spain, in 1989 and 1994, respectively.
He is currently a Researcher at the Centro Singular de Investigación en Tecnoloxías da Información, University of Santiago de Compostela. His research interests include biomedical digital signal processing, heart rate variability, nonlinear dynamics, soft computing, and the development of ubiquitous healthcare systems.