
Stream Query Processing for Healthcare Bio-sensor Applications

Chung-Min Chen, Hira Agrawal, Munir Cochinwala, David Rosenbluth


Telcordia Technologies, One Telcordia Drive, Piscataway, NJ 08854-4157
chungmin@research.telcordia.com

1. Introduction

The need for a data stream management system (DSMS) capable of querying continuous data streams is well understood by the database research community. The proliferation of related publications in this area reflects this (see, e.g., [2] for a partial survey). Example applications abound in many domains, from environmental and military applications consuming streams of sensor data to telecommunications and data network assurance systems analyzing real-time network traffic data.

This article provides an overview of a DSMS prototype called T2. T2 inherits some of the concepts of an earlier prototype, Tribeca [3], also developed at Telcordia. However, T2 has a completely new design and implementation in Java, with an SQL-like query language. Our goal is to build a framework that provides a programming infrastructure as well as useful operators to support stream processing in different applications. Among other things, the framework provides:
• A stream query processor to query continuous streams with standard relational operators and time window constraints.
• A mechanism to run user-defined functions on streams.
• Thread management, including scheduling of query plans.
• An adaptor layer that converts different stream types into streams of Java objects that the query processor can process.

Our first target application is healthcare bio-sensor networks. In this setting, we applied T2 to monitoring and analyzing electrocardiogram (ECG) data streams that arrive via wireless networks from mobile subjects wearing ECG sensors. Monitoring remote patients via wireless sensors provides convenience and safety assurance to the patients, and it also saves health care costs in many respects. In our trial setting, each subject wears three ECG sensing elements (leads) and a small transmitter. The transmitter amplifies, digitizes, and time-stamps the three analog signals before transmitting each sample (a time-stamped triplet) to a receiving base station. In addition, a subject may wear an accelerometer that continuously senses the individual’s acceleration and sends it to a receiving base station. Correlated with the ECG sensor stream, the acceleration information helps to remove motion artifacts from the ECG data. In a real setting, this reduces false alarms of irregular heart beats caused by body motion.

A recent work [1] also addresses data management issues in clinical ECG data. However, the focus of [1] was on off-line analysis of disk-recorded ECG data and its integration with other medical information stored in relational databases.

2. ECG sensor networks

To explain the nature of the queries to be performed on ECG stream data, we first give a brief introduction to the principles of electrocardiography. As we discuss in more detail later, mobility is an important characteristic of the data streams considered here. Mobility distinguishes these streams, and the queries performed on them, from more traditional ECG analysis, which does not take the effect of body motion into account.

Because of the chemical nature of its fluids, the human body is essentially a volume conductor bounded by the body surface. The nearly synchronous activity of large numbers of cardiac muscle cells generates electrical fields large enough to be mapped. This mapping is done by sensing electrical currents or potential differences between probe points on the body surface and a ground reference. Information about the muscular contraction of the heart can be deduced from a picture of these three-dimensional electrical fields and their evolution over time.

The signal detected by a probe depends on the position of the probe within the electromagnetic field (EMF). To obtain a complete picture of the EMF, measurements are taken from multiple points on the body surface. The standard three-lead ECG takes measurements from the right arm just above the wrist (RA), the left arm just above the wrist (LA), and the left leg just above the ankle (LL). From the differences between the fields detected at each pair of these points, one can determine changes in the electrical activity of the heart occurring in the frontal plane, which in the

Proceedings of the 20th International Conference on Data Engineering (ICDE’04)


1063-6382/04 $ 20.00 © 2004 IEEE
case of human physiology is the most important aspect. Derived parameters can be computed and monitored from these three EMF feeds, and they provide the basis for timely medical decisions.
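The pairwise differences described above correspond to the three standard limb leads (Einthoven's leads I, II, and III). These definitions come from standard electrocardiography rather than from this paper. Here V_RA, V_LA, and V_LL denote the potentials at the RA, LA, and LL electrode sites:

```latex
\begin{aligned}
\mathrm{I}   &= V_{LA} - V_{RA},\\
\mathrm{II}  &= V_{LL} - V_{RA},\\
\mathrm{III} &= V_{LL} - V_{LA},
\end{aligned}
\qquad\Longrightarrow\qquad
\mathrm{I} + \mathrm{III} = (V_{LA} - V_{RA}) + (V_{LL} - V_{LA}) = V_{LL} - V_{RA} = \mathrm{II}.
```

Only two of the three leads are therefore independent.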
The ECG signals, however, can be corrupted by noise caused by subject mobility. This makes the signals inaccurate and may trigger annoying false alarms (e.g., of irregular heart rate). To cope with this problem, we add a wearable accelerometer to the subject, which senses the subject’s acceleration and sends it back. With this information, one can filter out the periods of the ECG signal that correlate with extensive acceleration activity.
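The filtering idea above can be sketched as a band join between the two streams. An ECG sample is discarded if any accelerometer reading within ±windowSec of it exceeds a motion threshold. This is a minimal sketch under stated assumptions, not T2's implementation (T2 expresses this kind of correlation as a windowed join query):
- the class, record, and method names, the window width, and the threshold are all hypothetical;
- timestamps are assumed to be seconds on a shared clock;
- records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: drop ECG samples recorded while the subject was moving.
 * The window width and acceleration threshold are hypothetical parameters.
 */
public class MotionArtifactFilter {
    /** ECG sample: timestamp (seconds) and the three lead readings. */
    public record EcgSample(double ts, double ra, double la, double ll) {}

    /** Accelerometer sample: timestamp (seconds) and acceleration (g). */
    public record AccSample(double ts, double acc) {}

    /**
     * Keeps an ECG sample only if no accelerometer sample with
     * |ecg.ts - acc.ts| <= windowSec exceeds accThresholdG in magnitude.
     * This is a simple nested-loop band join, O(|ecg| * |acc|).
     */
    public static List<EcgSample> dropMotionPeriods(List<EcgSample> ecg, List<AccSample> acc,
                                                    double windowSec, double accThresholdG) {
        List<EcgSample> kept = new ArrayList<>();
        for (EcgSample e : ecg) {
            boolean moving = false;
            for (AccSample a : acc) {
                if (Math.abs(e.ts() - a.ts()) <= windowSec && Math.abs(a.acc()) > accThresholdG) {
                    moving = true;
                    break;
                }
            }
            if (!moving) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<EcgSample> ecg = List.of(
                new EcgSample(0.0, 0.1, 0.2, 0.3),
                new EcgSample(1.0, 0.1, 0.2, 0.3),
                new EcgSample(2.0, 0.1, 0.2, 0.3));
        List<AccSample> acc = List.of(new AccSample(1.1, 1.8), new AccSample(0.0, 0.05));
        // Only the sample at ts = 1.0 lies within 0.25 s of a high-motion reading.
        System.out.println(dropMotionPeriods(ecg, acc, 0.25, 0.5).size()); // prints 2
    }
}
```

The nested loop keeps the semantics obvious. A streaming version would instead keep only the accelerometer readings within the last windowSec, which bounds memory by the window width.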
We have set up a wireless network environment with embedded ECG and accelerometer sensors, shown in Figure 1. The ECG sensors, BioRadio 110, are manufactured by Cleveland Medical Devices Inc. The subject who wears the sensors attaches the three leads to their body. The leads are wired into a transmitter (3.8 oz) that the subject also carries. The transmitter digitizes and time-stamps the readings at a pre-specified sampling rate. It sends the data continuously over an ISM-band wireless link (the same spectrum used by cordless phones) to a base station. The base station is a PC running Windows 2000 with a receiver card (manufactured by the same vendor) attached to its serial port. The data acquisition software was written in LabVIEW from National Instruments. The effective transmission range varies from 25 to 350 feet, depending on line-of-sight conditions. Currently, we set the ECG sampling rate at 500 Hz with 16-bit accuracy per reading.

We obtained the accelerometers from Crossbow Technologies. The sensor weighs 1.62 oz and can measure acceleration between –4 g and +4 g, which is sufficient for our purpose. The accelerometer is wired to a Bluetooth sensor adaptor that the subject also wears. The adaptor, obtained from Roving Networks Inc., transmits the acceleration data via Bluetooth radio to a Pocket PC with built-in Bluetooth. The Pocket PC then relays the data stream via a wireless link to a Wi-Fi access point.

The ECG and acceleration data streams have separate wireless transmission paths. Once received at their respective base stations, the streams are routed to a Sun workstation running the T2 stream query processor. Users can monitor and analyze the streams in real time by issuing queries to T2.

[Figure 1. T2 healthcare sensor network. Components shown: ECG sensor and accelerometer worn by the patient; ISM link to the ECG base station (PC); Bluetooth link to the accelerometer relay node (Pocket PC); Wi-Fi links; wireline IP network; T2 server; client GUI (PC); doctor.]

3. Overview of T2

T2 is a generic DSMS prototype built in Java. We chose Java over C/C++ for organizational reasons and for technical ones: Java’s built-in support for networking, code mobility, security, type safety, multithreading, and platform independence, as well as its rich library of predefined utility functions. Although performance remains an issue, we believe the productivity gains Java affords far outweigh its problems. We also believe that, given the steady progress of Java Virtual Machine implementations, many of them will soon reliably match the performance of C++.

We briefly describe the elements of T2 that are relevant to our discussion of ECG and accelerometer stream monitoring. Upon startup, T2 spawns two major threads: a stream server and a query server. The stream server listens for connections from sensors over a TCP socket and writes the data into the buffer designated for the corresponding stream. Users issue queries through a client program with a command-line or graphical user interface.

Schema and Base Streams: In T2, a stream schema is similar to a relational schema, except that it is defined as a Java class. Currently we support “float” and “string” attributes; timestamps are defined as “float”. A base stream, analogous to a table, must be created with a unique name and an associated schema. When a sensor stream connects to the server, it must send along a base stream name. In T2, multiple sensor streams can write concurrently to the same base stream, just as multiple writers can write to a table. This is helpful in our application: instead of creating a separate ECG stream for each patient, we can merge the ECG sensor streams into a single base stream.

Buffering: T2 allocates a buffer for each base stream. The buffer is implemented as a circular buffer with multiple (upstream) writers and (downstream) readers. Concurrent write/read operations on the buffer are protected using Java’s synchronization mechanism. The buffer maintains two variables, front and rear, that keep track of the currently occupied portion of the buffer. A write operation simply appends a tuple to the front of the buffer. Each new reader must open a separate cursor, which is initially set to rear. For efficiency, the buffer uses a delayed-update policy: it updates rear only when a write operation finds the buffer full. At that point, rear can be updated in one scan of the opened cursors. The same buffering mechanism is also used to realize intermediate streams.
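The buffering scheme described above can be sketched as follows. This is a minimal sketch, not T2's code, under these assumptions:
- the class and method names are hypothetical;
- positions are modeled as ever-increasing `long` counters (slot = position mod capacity), so front and rear never need wrap-around arithmetic;
- the text does not say what a writer does when the slowest reader has not advanced, so here the writer simply waits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a multi-writer, multi-reader circular buffer with per-reader cursors
 * and a delayed update of rear. Positions increase monotonically; slot = pos % capacity.
 */
public class StreamBuffer<T> {
    private final Object[] slots;
    private long front = 0; // position of the next write
    private long rear = 0;  // oldest position that may still be unread (updated lazily)
    private final List<Cursor> cursors = new ArrayList<>();

    public StreamBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new Object[capacity];
    }

    /** A reader's private position in the buffer. */
    public final class Cursor {
        private long pos;

        private Cursor(long start) {
            pos = start;
        }

        /** Blocks until a tuple is available, then returns it and advances. */
        public T next() throws InterruptedException {
            synchronized (StreamBuffer.this) {
                while (pos == front) {
                    StreamBuffer.this.wait();
                }
                @SuppressWarnings("unchecked")
                T tuple = (T) slots[(int) (pos % slots.length)];
                pos++;
                StreamBuffer.this.notifyAll(); // a blocked writer may now make progress
                return tuple;
            }
        }
    }

    /** New readers start at rear, as in the text. */
    public synchronized Cursor openCursor() {
        Cursor c = new Cursor(rear);
        cursors.add(c);
        return c;
    }

    public synchronized void closeCursor(Cursor c) {
        cursors.remove(c);
        notifyAll();
    }

    /** Appends a tuple at front; rear is recomputed only when the buffer looks full. */
    public synchronized void write(T tuple) throws InterruptedException {
        while (front - rear == slots.length) {
            // Delayed update: one scan over the open cursors finds the slowest reader.
            long slowest = front;
            for (Cursor c : cursors) {
                slowest = Math.min(slowest, c.pos);
            }
            if (slowest > rear) {
                rear = slowest;
            } else {
                wait(); // assumption: block until the slowest reader advances
            }
        }
        slots[(int) (front % slots.length)] = tuple;
        front++;
        notifyAll(); // wake readers waiting for data
    }
}
```

Usage: a writer thread (for example, the stream server) calls `write(...)` for each incoming tuple, and each query operator opens its own cursor and calls `next()`. Recomputing rear only when the buffer is full keeps the common write path to a single slot store. The cost is one scan over the open cursors when space runs out.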
Intermediate Streams: Intermediate streams are the intermediate outputs of relational operators or user-defined functions. In T2, tuples in intermediate streams are maintained as Java object references that point to the base tuples; that is, they are non-materialized. No copying is performed unless necessary. As a consequence of a multi-way join operation, an intermediate tuple may comprise several object references. Materialization of intermediate tuples is delayed until the final result is produced or until an aggregation function is encountered in the evaluation path.
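The non-materialized representation can be illustrated with Java records standing in for T2's schema classes. Assumptions:
- the records mirror the stream schemas A and B from Figure 2;
- `float` is used for the numeric attributes and the timestamp, as the text says T2 does, and `pid` is treated as a string;
- all names are hypothetical;
- records require Java 16 or later.

```java
/** Sketch of non-materialized intermediate tuples (references to base tuples). */
public class IntermediateTuples {
    /** Base tuple of stream A: (ts, pid, RA, LA, LL). */
    public record EcgTuple(float ts, String pid, float ra, float la, float ll) {}

    /** Base tuple of stream B: (ts, pid, acc). */
    public record AccTuple(float ts, String pid, float acc) {}

    /** Intermediate join result: holds two references and copies no attribute values. */
    public record EcgAccRef(EcgTuple ecg, AccTuple acc) {
        /** Materializes a flat, self-contained row; called only when a result is emitted. */
        public EcgAccRow materialize() {
            return new EcgAccRow(ecg.ts(), ecg.pid(), ecg.ra(), ecg.la(), ecg.ll(), acc.acc());
        }
    }

    /** Materialized output row with its own copy of every attribute. */
    public record EcgAccRow(float ts, String pid, float ra, float la, float ll, float acc) {}

    public static void main(String[] args) {
        EcgTuple e = new EcgTuple(10.0f, "p1", 0.1f, 0.2f, 0.3f);
        AccTuple a = new AccTuple(10.0f, "p1", 0.05f);
        EcgAccRef joined = new EcgAccRef(e, a);
        System.out.println(joined.ecg() == e);          // true: shared reference, no copy
        System.out.println(joined.materialize().pid()); // p1
    }
}
```

Only `materialize()` copies attribute values. Deferring it to the final result (or to an aggregate) avoids copying tuples that a later predicate would discard anyway.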
Relational operators and SQL: T2 supports relational operators including selection, join, and aggregation. The latter two operations can be scoped with a given window definition. Both moving and accumulating windows are supported, with three timeline options: tuple count, timestamp attribute, and system time. The following is the syntax of a window-based aggregate query:

select avg|max|min|sum(expression), …
from stream_name
[where predicate]
window on count | system | ts_attribute
[size window_size]
[interval window_interval]
[group by expression, …]

Clauses in brackets are optional. To execute the query, the predicate in the where clause first selects the qualifying tuples. These tuples are then partitioned into separate sub-streams according to their values of the group-by list. Next, the aggregates are computed on each sub-stream using the window specification. The window clause specifies three things: a timeline option, the size of the moving window (0 indicates an accumulating window instead), and the interval between successive windows (or the reporting interval in the case of an accumulating window).

The following shows the syntax of a simple two-way join query:

select expression, …
from S, T
where predicate
window on [S.ts, T.ts]
size window_size

The window is defined on the timestamp attributes S.ts and T.ts of streams S and T, respectively. Semantically, each tuple s from S is paired with every tuple t from T such that |s.ts − t.ts| ≤ window_size. The composite tuples are then further evaluated by the predicate specified in the where clause.

User-defined Functions: T2 supports user-defined functions on streams through a stream I/O framework. In its most general form, this framework lets users code their own functions with T2 stream objects and their read/write methods. User-defined functions are registered with T2 as static methods, which are invoked during query execution. They are essential in applications that require more sophisticated analysis than SQL can handle.

4. Health monitoring scenario

We now describe an example ECG monitoring scenario that covers the complete process: ECG sensing, monitoring, triggers, and notification of medical staff. Figure 2 shows the example scenario. It works with two sensor streams, the patients’ ECG and their accelerometer readings, depicted as streams A and B, respectively. All stream entries are time-stamped. The application monitors the patients’ heart rate and accelerometer readings. Whenever it detects an elevated heart rate for a patient, it queries the doctor’s contact information and notifies the doctor. Simultaneously, it could forward the patient’s ECG and accelerometer readings to the doctor’s handheld device (e.g., a cell phone or pager with an LCD).

[Figure 2. ECG stream monitoring. Modules: 1. Calibrate heart rate (A → C); 2. Compute avg. heart rate (C → D); 3. Compute avg. acceleration (B → E); 4. Detect irregular heart rate (D, E → F); 5. Look up doctor’s contact info. (Database); 6. Automatic paging service.]

Stream schema               Attribute meanings
A: (ts, pid, RA, LA, LL)    ts: timestamp
B: (ts, pid, acc)           pid: patient ID
C: (ts, pid, hrt)           RA, LA, LL: ECG readings
D: (ts, pid, avg_hrt)       acc: acceleration
E: (ts, pid, avg_acc)       hrt: heart rate
F: (pid)                    avg_hrt: average heart rate
                            avg_acc: average acceleration

We now discuss the operations involved in the above example. In module 1, a heart rate stream is computed from the ECG stream. This requires calibrating the period from the ECG signals, which is best implemented as a user-defined function. A coarse estimate, however, can be computed using SQL select queries with a moving time window. This produces an intermediate stream C. Next, in module 2, the heart rate is averaged over a fixed-size moving time window. This is expressed in SQL as:

create stream D (ts, pid, avg_hrt) as
select min(ts), pid, avg(hrt) from C
window on ts size 30 second
group by pid
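The evaluation steps described for window-based aggregates (partition by the group-by value, then aggregate over a moving window) can be sketched for the module 2 query. This is an illustration, not T2's operator. Assumptions:
- the query has no interval clause, so the sketch reports an updated average on every arriving tuple;
- the window is taken as (ts − size, ts];
- tuples of each pid arrive in timestamp order, with timestamps in seconds;
- all names are hypothetical, and records require Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch of "select min(ts), pid, avg(hrt) from C window on ts size ... group by pid". */
public class MovingAverage {
    /** Tuple of intermediate stream C: (ts, pid, hrt). */
    public record HrTuple(double ts, String pid, double hrt) {}

    /** Tuple of result stream D: (ts, pid, avg_hrt), where ts = min(ts) of the window. */
    public record AvgTuple(double ts, String pid, double avgHrt) {}

    private final double sizeSec;
    // One sub-stream per group-by value: its window contents and running sum.
    private final Map<String, Deque<HrTuple>> windows = new HashMap<>();
    private final Map<String, Double> sums = new HashMap<>();

    public MovingAverage(double sizeSec) {
        this.sizeSec = sizeSec;
    }

    /** Consumes one tuple of C and reports the updated aggregate for its pid. */
    public AvgTuple onTuple(HrTuple t) {
        Deque<HrTuple> window = windows.computeIfAbsent(t.pid(), k -> new ArrayDeque<>());
        double sum = sums.getOrDefault(t.pid(), 0.0);
        // Evict tuples outside (t.ts - size, t.ts]; relies on per-pid timestamp order.
        while (!window.isEmpty() && window.peekFirst().ts() <= t.ts() - sizeSec) {
            sum -= window.pollFirst().hrt();
        }
        window.addLast(t);
        sum += t.hrt();
        sums.put(t.pid(), sum);
        // The oldest surviving tuple carries min(ts); avg(hrt) is sum / count.
        return new AvgTuple(window.peekFirst().ts(), t.pid(), sum / window.size());
    }

    public static void main(String[] args) {
        MovingAverage module2 = new MovingAverage(30.0); // "size 30 second"
        module2.onTuple(new HrTuple(0, "p1", 60));
        System.out.println(module2.onTuple(new HrTuple(10, "p1", 80)));
    }
}
```

A running sum per pid makes each update cost proportional only to the number of evicted tuples, instead of re-averaging the whole window.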
The create clause in the above query creates an intermediate stream out of stream C. Similarly, in module 3, we compute the average acceleration over a moving time window as:

create stream E (ts, pid, avg_acc) as
select min(ts), pid, avg(abs(acc)) from B
window on ts size 60 second
group by pid

Note that the heart rate and the acceleration are averaged over different window sizes.

Now, if a patient’s average heart rate exceeds a predefined threshold, we join with the average acceleration stream to check whether this heart rate elevation is caused by body motion. This is done in module 4 and expressed as:

create stream F (pid) as
select D.pid from D, E
where D.avg_hrt > hr_threshold and
D.pid = E.pid and E.avg_acc < acc_threshold
window on (D.ts, E.ts) size 120 second

In the query above, acc_threshold is a predefined constant small enough to safely ensure that the patient is not moving around. As a result, any patient with an elevated average heart rate in the absence of accompanying motion activity is considered irregular, and that patient’s ID shows up in the result stream F. Stream F is then joined with one or more static relational tables to retrieve the contact information (e.g., pager or cell phone number) of the doctor responsible for the patient. Automatically paging and forwarding ECG data to a doctor upon detecting a serious situation, however, is best performed programmatically. This is done through an API within the overall framework but outside the query processor.

5. GUI snapshots

Figure 3 shows GUI snapshots of two example queries issued against an EKG sensor stream sampled at 500 Hz (one sample per 2 ms). The stream has four attributes: time, ra, la, and ll. The left window shows the result of the query:

select avg(ra) from EKG window on time size 100 interval 150

This query computes the average of the “ra” signal stream over a moving time window of size 100 × 2 ms = 200 ms, with successive windows placed 150 × 2 ms = 300 ms apart. The right window shows the result of the same query with an additional clause, “where ra > 200”, which selects only those signals greater than 200 μV.

[Figure 3. GUI snapshots]

6. Ongoing work

Our initial testing shows that T2 can effectively process multiple select and aggregate queries with moving time windows against a small number of ECG data streams. Future work will gauge its scalability to larger numbers of streams and more complicated join queries. We also plan to add location-tracking capability to the handheld devices carried by the mobile subjects. The location data will be streamed back to the server, where it can be correlated with the ECG and acceleration streams for better patient monitoring and emergency response. Finally, although Java allows users to assign thread priorities, it makes no promises about scheduling and fairness with respect to those priorities. We intend to develop a thread management framework on top of standard Java primitives to overcome this problem. We are also tracking the emerging Real-Time Specification for Java. It provides real-time threads with strictly defined, fixed-priority, preemptive scheduling, as well as no-heap threads that are guaranteed never to be delayed by garbage collection.

References

[1] A. F. Cárdenas, R. K. Pon, R. B. Cameron. “Management of Streaming Body Sensor Data for Medical Information Systems,” in Proc. Int. Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences, June 2003.
[2] L. Golab, M. T. Özsu. “Issues in Data Stream Management,” SIGMOD Record, vol. 32, no. 2, June 2003.
[3] M. Sullivan, A. Heybey. “Tribeca: A System for Managing Large Databases of Network Traffic,” in Proc. USENIX Annual Technical Conf., 1998.
