
/typeset1:/sco4/jobs2/ELSEVIER/com/week.15/Pcom2212.

001101 Tue May 7 10:25:30 2002 Page


ARTICLE IN PRESS

Computer Methods and Programs in Biomedicine 000 (2002) 000– 000


www.elsevier.com/locate/cmpb

An automated data extraction system from 12 lead ECG images

Sucharita Mitra a, M. Mitra b,*

a Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700 035, India
b Department of Applied Physics, University College of Technology, 92 APC Road, Kolkata 700 009, India

Received 14 September 2001; received in revised form 17 January 2002; accepted 4 March 2002

Abstract

A software-based system for acquiring normalized ECG data is developed for both normal and abnormal ECG records. The system transfers wave data recorded on paper to a digital time database. A flatbed scanner is used to form an image database of each 12 lead ECG signal. These TIF-formatted gray tone images are then converted into two-tone binary images with the help of histogram analysis. A runlength smearing technique is used to remove the vertical and horizontal line segments of the graph paper. A thinning algorithm is applied to each image to obtain its skeleton (a 1 pixel wide representation), which is essential to avoid excess data points in the database. After the pixel-to-pixel co-ordinate information has been extracted from the image of each signal of the 12 lead ECG records, the data are sorted to regenerate the signal. The consistency of the database is examined through a graphical analysis of its standard deviation. © 2002 Published by Elsevier Science Ireland Ltd.

Keywords: Electrocardiogram; Database; Skeletonization; Smearing runlength; Images


* Corresponding author.
E-mail addresses: susa68@hotmail.com (S. Mitra), m – mitra@vsnl.net (M. Mitra).

0169-2607/02/$ - see front matter © 2002 Published by Elsevier Science Ireland Ltd.
PII: S0169-2607(02)00032-9

1. Introduction

In this era of Information Technology (IT), advances have been made in storing, retrieving and processing information of any kind in any field. The creation of a database is considered the most essential factor in making use of IT tools. A completely computerized data extraction and processing system is therefore much more acceptable today because of its wider application potential. The added advantages of such a system include flexibility of data processing, reduction of noise, improved precision of control, data management and various other capabilities. In this paper we concentrate on developing a purely computer-based data extraction and processing system for 12 lead ECG signals.

An instrumentation scheme using a Computer Aided Design and Drafting (AUTOCAD) application package is already being used to generate an ECG database. There, a single-channel strip chart recorder and a digitizer with tablet attached to the RS 422/432 port of the computer are used as the input device, and a digital plotter/printer-plotter is used as the output device [1].

The CAD-based data acquisition system was used to study the polar phase response property of the monopolar chest lead (V1–V6) ECG voltages. The spin harmonic constituent of the ECG voltages is evaluated at each harmonic plane and the polar phase responses are studied at each plane [2].

Some systems have been developed that transfer wave data recorded on paper to digital time data corresponding to those obtained by an A/D converter [3,4].

Another system was developed to scan ECG waveforms stored on paper and convert them to digital data stored in a computer. This system enabled quick and reliable measurement of QT intervals [5].

In our system, a high resolution flatbed scanner is first used to capture the image of each ECG signal recorded on a single-channel chart recorder (voltage in mV versus time in s). These TIF-formatted gray tone scanned images are then converted into two-tone binary images by a binarization technique. The binary images are fed to the next stage of the system, which removes the horizontal and vertical line segments of the graph paper by applying the runlength smearing technique in a modified manner. The resulting image is passed to the next stage, where skeletonization (thinning) of the ECG signal region is performed. In the following step the raw database is generated in ASCII format (Table 1). These data are then sorted and passed to the regeneration stage of the system, where the captured pattern is checked against the original waveform. A time (in s) versus voltage (in mV) data file is thus obtained for each of the 12 lead ECG signals after processing. A block schematic of the system is given in Fig. 1.

The creation of this off-line data acquisition package is essential because our final goal is to develop a digital ECG database for subjects of different ages and food habits, for rural and urban people, and for normal and diseased subjects. The database will be further processed for feature extraction, frequency plane analysis and, finally, human identification. In rural areas especially, most doctors have a conventional ECG machine with a paper plotter, so the easiest way to create a digital time database is to use an appropriate off-line data acquisition package.

2. Materials and method

To prepare the database of ECG signals, we first collected 50 ECG reports of different patients. Of these, 30 reports were of normal patients and the rest were of patients suffering from some kind of heart disease. The ECG voltages of the usual 12 lead locations of the human body were obtained from a single-channel chart recorder with a paper speed of 25 mm s−1 and a sensitivity of 10 mm mV−1.

Table 1
A portion of the extracted database in ASCII format

X (s)      Y (mV)
0          0.742
0.0008     0.742
0.0016     0.742
0.0024     0.742
0.0032     0.742
0.004      0.742
0.0048     0.742
0.0056     0.742
0.0064     0.742
0.0072     0.742
0.008      0.744
0.0088     0.744
0.0096     0.744
0.0104     0.744
0.0112     0.744
0.012      0.744
0.0128     0.744
0.0136     0.744
0.0144     0.746
0.0152     0.746
0.016      0.746
0.0168     0.746
0.0176     0.746
0.0192     0.746
0.02       0.746
0.0208     0.746
0.0216     0.746
0.0224     0.746

Paper speed = 25 mm s−1, Calibration factor = 10 mm mV−1, Total number of points = 2730, Heart rate = 68 min−1.

Fig. 1. Block schematic of the proposed system.

A high resolution flatbed scanner (HP ScanJet 4C) was set at a spatial resolution of 1200 dpi and a gray tone resolution of 256 levels so that the ECG reports could be scanned with very high accuracy. After scanning, the images of the ECG signals were fed to a Personal Computer (PC) running a system for extracting pixel-to-pixel co-ordinate information. The following steps were involved.

2.1. Binarization of the image

This process converts a gray tone image into a two-tone (binary) image after a suitable threshold limit has been selected. A histogram-based approach is used to determine this threshold, and the histogram of each scanned image is routinely reviewed before the threshold is set. For example, after scanning we obtain gray tone images whose pixel levels vary from 0 to 255. To mark only the black or nearly black pixels, the threshold is chosen from the peak nearest to pixel level 0 of the histogram. All pixels other than black are then given one value (say 0) and the black pixels are given the value 1, resulting in a two-tone (binary) image. This approach also increases the robustness of the system.

2.2. Removal of grid lines of graphical papers

The histogram-based threshold selection also goes part of the way towards removing the grid line segments, namely those whose color is other than blackish. When the lines are blackish, however, they are not totally removed by binarization. There are several traditional ways of removing thin continuous lines, such as median filters and morphological operators, but we observed that in most cases the grid lines are dotted rather than continuous. For this reason another technique, based on the runlength smearing algorithm, is used to remove those lines. The method is more effective at removing only the line portions while keeping the signal portion of the image intact.

Runlength smearing algorithm: The runlength smearing algorithm was first proposed by Wong et al. (1982). This operation connects two non-adjacent runs into one merged run if the distance between them is smaller than a threshold limit [9].

The technique applied here merges two non-adjacent runs of zeros into one run if the distance between them is smaller than a predefined threshold. This method is used to separate the ECG signal from the paper record.


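The merging of zero runs described in Section 2.2 can be sketched in Python as follows. A run of ones shorter than a threshold that lies between two runs of zeros is overwritten with zeros, so that the two zero runs become one. Applying the pass first to every row and then to every column, the parameter name `max_gap` and the handling of runs that touch the image border are assumptions of this sketch, not details given in the text.

```python
def smear_zero_runs(line, max_gap):
    """Merge zero runs that are separated by fewer than max_gap ones.

    `line` is a list of 0/1 values (1 = black). Runs of ones that touch
    either end of the line are left untouched.
    """
    out = list(line)
    n = len(out)
    i = 0
    while i < n:
        if out[i] == 1:
            start = i
            while i < n and out[i] == 1:
                i += 1
            # Erase the run only if zeros flank it and it is short enough.
            if start > 0 and i < n and (i - start) < max_gap:
                for k in range(start, i):
                    out[k] = 0
        else:
            i += 1
    return out


def remove_grid(image, max_gap):
    """Apply the zero-run merge along rows, then along columns."""
    rows = [smear_zero_runs(row, max_gap) for row in image]
    cols = [smear_zero_runs(list(col), max_gap) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    # Column 1 is a thin grid line; the 3x3 block stands in for the trace.
    page = [[0, 1, 0, 0, 0, 0, 0],
            [0, 1, 0, 1, 1, 1, 0],
            [0, 1, 0, 1, 1, 1, 0],
            [0, 1, 0, 1, 1, 1, 0],
            [0, 1, 0, 0, 0, 0, 0]]
    for row in remove_grid(page, max_gap=3):
        print(row)
```

The sketch only works if `max_gap` is larger than the thickness (or dot length) of the grid lines and no larger than the thickness of the scanned trace; otherwise parts of the signal would be erased as well.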

2.3. Thinning of the input signal

Thinning of the input images is necessary to avoid repetition of co-ordinate information in the data-set of the normalized database. For this purpose a thinning algorithm [7] is used in which region points are assumed to have a value of 1 and background points a value of 0. The method consists of successive passes of two basic steps applied to the contour points of the given region. A contour point is any pixel with value 1 that has at least one eight-neighbor valued 0. In step 1 the contour point p1 (see Fig. 2) is flagged for deletion if the following conditions are satisfied:
(i) 2 ≤ N(p1) ≤ 6, where N(p1) is the number of non-zero neighbors of p1;
(ii) S(p1) = 1, where S(p1) is the number of 0 to 1 transitions in the ordered sequence p2, p3, ..., p8, p9, p2;
(iii) p2 · p4 · p6 = 0;
(iv) p4 · p6 · p8 = 0.

In step 2 the first two conditions are the same but the other two are changed to:
(v) p2 · p4 · p8 = 0;
(vi) p2 · p6 · p8 = 0.

One iteration of this algorithm consists of (1) applying step 1 to flag border points for deletion, (2) deleting those border points, (3) applying step 2 to flag the remaining border points for deletion and (4) deleting those points. These four operations are repeated until no further points are deleted, which yields the skeleton of the region.

Fig. 2. Eight neighbors of P1.

2.4. Raw data extraction

In this step the co-ordinates of each black pixel are extracted, and the values are then calibrated according to the co-ordinate system of the chart. Electrocardiograph paper is a graph in which horizontal and vertical lines are drawn at 1 mm intervals; a heavier line marks every 5 mm interval. Time is measured along the horizontal lines, where the smallest division, 1 mm, equals 0.04 s. Voltage is measured along the vertical lines and is expressed in mm (10 mm = 1 mV). In routine cardiographic practice the recording speed is 25 mm s−1, and the usual calibration is a 1 mV signal that produces a 10 mm deflection. We use a resolution of 1200 dpi in both the vertical and the horizontal direction, so that 1 pixel = 25.4/1200 = 2.12e−2 mm. This calibration factor is used in both the X and Y directions to obtain the data-set. At a paper speed of 25 mm s−1, one pixel corresponds to 2.12e−2/25 ≈ 8.5e−4 s, which is the maximum time precision and matches the spacing of about 0.0008 s between successive entries of Table 1.

The pixel co-ordinates are easily converted into a time-mV data-set from the relations mentioned above.

This calibration step is essential for standardizing the ECG database. The most important parameter of an ECG signal is the patient's heart rate; we therefore also include the heart rate of each patient in each data file of the database.

2.5. Data sorting and regeneration of the ECG signal

Since the raw data are extracted from the image, they are arranged according to their ordinate value. To maintain the similarity between the captured data and the paper record, a bubble sort is performed to arrange the extracted co-ordinates according to their abscissa.

The sorted data are then passed to the regeneration stage of the system, where the captured pattern is checked against the original wave shape. The original ECG image is shown in Fig. 3, and the ECG signals reproduced before and after thinning are shown in Figs. 4 and 5, respectively.

2.6. Result

In the present investigation, the analysis has been carried out for about 30 normal subjects and for 20 abnormal subjects, most of whom had Myocardial Infarction and Ischemia. The normal subjects were males in the age group 40–50 years. The Standard Deviation (S.D.) is estimated for the whole database.
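The two-step thinning procedure of Section 2.3 can be sketched in Python as below. The neighbor layout is an assumption taken from the usual presentation of this algorithm in [7]: p2 lies directly above p1 and p3 to p9 follow clockwise, so that p4 is the right-hand, p6 the lower and p8 the left-hand neighbor. The function names and the one-pixel zero border added around the image are also choices made for this example.

```python
def neighbours(img, r, c):
    """Return [p2, ..., p9]: the 8 neighbours, clockwise from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(nb):
    """Count 0 -> 1 transitions in the cyclic sequence p2, p3, ..., p9, p2."""
    return sum(1 for a, b in zip(nb, nb[1:] + nb[:1]) if a == 0 and b == 1)


def thin(image):
    """Thin a 0/1 image (1 = region) until no more pixels can be deleted."""
    width = len(image[0])
    # Pad with zeros so every region pixel has eight neighbours.
    img = [[0] * (width + 2)]
    img += [[0] + list(row) + [0] for row in image]
    img += [[0] * (width + 2)]
    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            flagged = []
            for r in range(1, len(img) - 1):
                for c in range(1, width + 1):
                    if img[r][c] != 1:
                        continue
                    nb = neighbours(img, r, c)
                    p2, p4, p6, p8 = nb[0], nb[2], nb[4], nb[6]
                    # Conditions (i) and (ii) are shared by both steps.
                    if not (2 <= sum(nb) <= 6 and transitions(nb) == 1):
                        continue
                    if step == 1:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        flagged.append((r, c))
            # Delete only after the whole image has been examined.
            for r, c in flagged:
                img[r][c] = 0
            changed = changed or bool(flagged)
    return [row[1:-1] for row in img[1:-1]]


if __name__ == "__main__":
    bar = [[1, 1, 1, 1, 1] for _ in range(3)]
    for row in thin(bar):
        print(row)
```

Flagging all candidates first and deleting them afterwards matters: deleting pixels while the image is still being scanned would change the neighborhoods seen by later pixels within the same pass.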


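The extraction, calibration and sorting steps of Sections 2.4 and 2.5 can be combined in a short Python sketch. It scans a thinned 0/1 image row by row, converts each black pixel to a (time, voltage) pair using the scan resolution, paper speed and sensitivity quoted in the text, and then orders the pairs by time. The function and parameter names are hypothetical, the origin is assumed to be the bottom-left pixel of the image, and Python's built-in sort is used in place of the bubble sort mentioned in the text.

```python
MM_PER_INCH = 25.4


def extract_signal(skeleton, dpi=1200, paper_speed_mm_s=25.0, gain_mm_per_mv=10.0):
    """Convert the 1-valued pixels of a skeleton into sorted (time_s, mv) pairs."""
    mm_per_pixel = MM_PER_INCH / dpi
    height = len(skeleton)
    points = []
    # Row-major scan: the points come out ordered by image row, not by time.
    for row, line in enumerate(skeleton):
        for col, value in enumerate(line):
            if value == 1:
                x_mm = col * mm_per_pixel
                # Image rows grow downwards, chart voltage grows upwards.
                y_mm = (height - 1 - row) * mm_per_pixel
                points.append((x_mm / paper_speed_mm_s, y_mm / gain_mm_per_mv))
    # Re-order by abscissa (time) so the samples follow the paper record.
    points.sort(key=lambda p: p[0])
    return points


if __name__ == "__main__":
    rising = [[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0]]
    for t, mv in extract_signal(rising):
        print(f"{t:.6f} s  {mv:.6f} mV")
```

With the default values one pixel corresponds to about 8.5e−4 s and 2.1e−3 mV, which agrees with the increments of roughly 0.0008 s and 0.002 mV visible in Table 1.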

Fig. 3. Image of original ECG signal from chart record.

Fig. 4. Extracted ECG signal before thinning.

Fig. 5. Reproduced ECG signal from extracted database after thinning.

The S.D. is a measure of how widely values are dispersed from the average value (the mean). It is calculated using the 'nonbiased' or 'n − 1' method from the following formula:

\[ \mathrm{S.D.} = \sqrt{\frac{n \sum x^{2} - \left(\sum x\right)^{2}}{n(n-1)}} \]

where x is a variable and n is the total number of samples.

To estimate the S.D. of the database, a test sample of the data-set was taken and the whole process was repeated 50 times on the same image to obtain 50 different data-sets of the same sample.

The mean S.D. is calculated, and graphs are plotted against the number of points for both abscissa and ordinate to examine the consistency of the database. From this graphical analysis it is noted that the data-set deviates slightly in the peak region, whereas it is almost steady at the base line.

Fig. 6. Graphical analysis of mean S.D. (for ordinate).

3. Discussions

A software-based ECG data extraction system is developed for both normal and abnormal subjects to obtain a normalized ECG database that supports further processing, analysis, control and decision-making applications. This database is made compatible with existing database management systems. For this purpose the ECG images are processed to extract the pixel-to-pixel co-ordinate information of the ECG signal. The consistency of the database is also checked through a graphical analysis of the mean S.D. of the data-sets.
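The 'n − 1' S.D. formula of Section 2.6 translates directly into Python. The sketch below also shows one plausible reading of the consistency check, namely computing the S.D. of each point across the repeated extractions of the same image; this point-by-point interpretation and the function names are assumptions, since the text does not spell out how the 50 data-sets are combined.

```python
import math


def sample_sd(values):
    """S.D. by the 'n - 1' method: sqrt((n*sum(x^2) - (sum x)^2) / (n*(n-1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are required")
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # Clamp at zero to absorb tiny negative rounding residue.
    return math.sqrt(max(n * total_sq - total * total, 0.0) / (n * (n - 1)))


def pointwise_sd(runs):
    """S.D. of every point across repeated extractions (one list per run)."""
    return [sample_sd(column) for column in zip(*runs)]


if __name__ == "__main__":
    # Three hypothetical repetitions of the same four ordinate values (mV).
    runs = [[0.742, 0.744, 0.900, 0.746],
            [0.742, 0.744, 0.904, 0.746],
            [0.742, 0.746, 0.898, 0.746]]
    print(pointwise_sd(runs))
```

Plotting the values returned by `pointwise_sd` against the point index would give a curve of the kind described for Fig. 6, under the stated assumption about how the repetitions are combined.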

4. Uncited references

[6,8,10–12].

References

[1] B. Goswami, T.K. Mitra, M. Mitra, B. Nag, S.K. Basu, D.K. Basu, Data base generation from ECG records using AUTOCAD application package, IETE Technical Rev. 11 (1994) 67–69.
[2] B. Goswami, M. Mitra, B. Nag, T.K. Mitra, The polar phase response property of monopolar ECG voltages using a computer aided design and drafting (CAD) based data acquisition system, Int. J. Biomed. Comput. 33 (1993) 209–217.
[3] A. Tezuka, A study on transformation system of wave data recorded on paper into digital time data, Rec. Electr. Commun. Eng. Conversazione 57 (1989) 343–344.
[4] L.E. Wideman, G.L. Freeman, A to D conversion from paper records with a desktop scanner and a microcomputer, Comput. Biomed. Res. (USA) 22 (1989) 393.
[5] H.K. Bhullar, D.P. deBono, J.C. Fothergill, N.B. Jones, A computer based system for the study of QT intervals, Proceedings Computers in Cardiology, Venice, Italy, 23–26 September 1991, pp. 533–536.
[6] O. Pahlm, L. Sornmo, Data processing of exercise ECG, IEEE Trans. Biomed. Eng. BME 34 (1987) 158–165.
[7] R.C. Gonzalez, R.E. Woods, Digital Image Processing, third edn, Addison Wesley/Longman, New York, 2000, pp. 491–495.
[8] A.M. Goon, M.K. Gupta, B. Dasgupta, Fundamentals of Statistics, sixth edn, The World Press Pvt Ltd, India, 1993, pp. 449–454.
[9] A. Rosenfeld, A.C. Kak, Digital Picture Processing, vol. 2, Academic Press, New York, 1982.
[10] G. Wang, M. Takigawa, Development of data reconstruction system of paper recorded ECG: method and its evaluation, Electroencephalogr. Clin. Neurophysiol. 83 (1992) 398–401.
[11] W. Gall, H. Heinzl, P. Sachs, Extracting a statistical data matrix from electronic patient records, Comput. Methods Programs Biomed. 66 (2001) 153–166.
[12] S. Madhvanath, S. McCauliff, K.M. Mohiuddin, Extracting Patron Data from Check Images, in: Proceedings of the Fifth International Conference on Document Analysis and Recognition, Bangalore, India, 1999, pp. 519–522.
