
Sensor Network Databases: An Introduction

Advanced Research Topics in Databases, Summer 2004-2005, Lina Al-Jadir

Context

MICS (Mobile Information and Communication Systems) project (www.mics.org):

promoted by the National Competence Center in Research (NCCR), under the authority of the Swiss National Science Foundation (FNRS)
goal: study fundamental & applied questions raised by future mobile communication and information systems
call for proposals for the 2nd phase (Nov. 2005 - Oct. 2009)
research cluster "in-network information management": support end-to-end information management for sensor and mobile ad-hoc networks
semantic integration of heterogeneous sensor network resources and backend databases, exploiting the temporal and spatial dimensions, is required to make sensor data available through the Internet and Grid infrastructure


Outline
1. Introduction
Definition, Applications, Differences, Storage

2. Queries
2.1. Querying in Cougar
2.2. Querying in TinyDB
2.3. In-network Aggregation

3. Other Issues


1. Introduction

From a data storage point of view, a sensor network database:


a distributed database that collects physical measurements about the environment, indexes them, and serves queries from users and other applications external to or from within the network

Research in sensor network databases:


is relatively new and can benefit from current efforts in data streams and P2P networks


Sensor network:

tens to hundreds of autonomous nodes that operate for weeks or months without human interaction (e.g. configuration of network routes, recharging of batteries, tuning of parameters)

Sensor node:

battery-powered, wireless computer
physically small (a few cubic centimeters)
extremely low power (a few tens of milliwatts, versus tens of watts for a laptop)
Power = watts (W) = amps (A) * volts (V); Energy = joules (J) = W * time
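As a rough illustration of these units (the figures below are assumptions for illustration only, not numbers from the slides), a node drawing 20 mW on average from batteries holding roughly 20 kJ lasts about 10^6 seconds, i.e. around 12 days:

# Back-of-the-envelope node lifetime estimate (illustrative numbers only).
avg_power_w = 0.020          # assumed average draw: 20 mW
battery_energy_j = 20_000.0  # assumed capacity of two AA cells: roughly 20 kJ

lifetime_s = battery_energy_j / avg_power_w   # Energy = Power * time
lifetime_days = lifetime_s / (60 * 60 * 24)
print(f"Estimated lifetime: {lifetime_days:.1f} days")  # ~11.6 days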


Each sensor in a sensor network:

takes time-stamped measurements of physical phenomena (e.g. temperature, light, sound, air pressure): the sensor data
contains its characteristics (e.g. id, location, type of sensor): the stored data
a sensor network database is the combination of the sensor data and the stored data from every sensor

Modern sensors:

do not only respond to physical signals to produce data
embed computing and communication capabilities: they are able to store, locally process, and transfer the data they produce

Many applications monitor the physical world by querying and analyzing sensor data, e.g.:

supervising items in a factory warehouse (temperature)
organizing vehicle traffic in a large city (vehicle passing)
monitoring earthquakes at shake-test sites
monitoring habitat: petrels (birds) on Great Duck Island


Differences between a sensor network DB and other DBs, at the physical level:

the network replaces the storage and the buffer manager


data transfers are from data in node memory as opposed to data blocks on disks

node memory is limited by cost and energy considerations, unlike disk storage, which has become incredibly inexpensive
the system is highly volatile (nodes may be depleted, links may go down)
the system should provide the illusion of a stable environment

access to data may be hampered by long delays; the rates at which input data arrives at a DB operator can be highly variable
rates and availability of data have to be continuously monitored (it is not enough to make a query execution plan once)


relational tables are not static (new data is continuously being sensed)
they are regarded as append-only streams, where certain reordering operations are no longer available

the high energy cost of communication encourages in-network processing during query execution
query processing has to be closely coupled and co-optimized with the networking layer

limited storage on nodes and high communication costs


older data has to be discarded; the system should maintain higher-level statistical summaries of the deleted information, to answer queries about the past

sensor tasking interacts with the sensor database system
classical metrics of DBMS performance have to be adjusted


Differences between a sensor network DB and other DBs, at the logical level:

sensor network data consists of measurements from the physical world, which include errors (e.g. noise)
range queries (instead of exact queries) and probabilistic or approximate queries

durations and sampling rates for data to be acquired


additional operators to be added to the query language

continuous, long-running queries (e.g. monitoring avg. temperature in room)


data streams

report exceptional cases or other events of interest


operators to describe event detection and action triggers


2 ways to implement a sensor network database:

warehouse (centralized) approach: 2 steps

data is extracted from the sensor network in a predefined way and stored in a database located on a unique front-end server (connected to the network via an access point)
query processing takes place on the centralized database
well suited for answering predefined queries over historical data
disadvantages:

nodes near the access point become traffic hot spots and central points of failure, and may be depleted of energy prematurely
does not take advantage of in-network aggregation of data to reduce the communication load, when only aggregate data needs to be reported
sampling rates have to be set to the highest rate that might be needed for any potential query, possibly burdening the network with unnecessary traffic


in-network (distributed) approach: stores the data within the network itself and allows queries to be injected anywhere in the network
efficient:

only relevant data are extracted from the sensor network
data can be aggregated before it is sent out to answer an external query


To characterize the performance of a sensor network DBMS:

network usage:

total usage: total number of packets sent in the network
hot spot usage: maximum number of packets processed by any particular node

preprocessing time: time to construct an index
storage space: storage for data and index
query time: time to process a query, assemble an answer, and return it
throughput: average number of queries processed per unit of time
update and maintenance cost: cost of sensor data insertions, deletions, or repairs when nodes fail


When designing a sensor DB, the following properties are desirable:

persistence: stored data must remain available to queries, despite sensor node failures and changes in the network topology
consistency: a query must be routed correctly to a node where the data are stored
controlled access to data: different update operations must not undo one another's work, and queries must always see a valid state of the DB
scalability in network size: as the number of nodes increases, the total storage capacity should increase, and the communication cost should not grow unduly
load balancing: storage should not unduly burden any node, nor should a node become a concentration point of communication
topological generality: the DB architecture should work well on a broad range of network topologies


2. Queries

Queries to a sensor network DB are expressed at a logical, declarative level, using SQL.
Example: flood warning system. A user from an emergency management agency sends a query to the flood sensor DB:
for the next 3 hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches
select max(rainfall_level), county
from Sensors
where state = 'Southern California'
group by county
having max(rainfall_level) > 3.0in
duration [now, now + 180 min]
sampling period 10 min


Characteristics of queries:

a query is expressed over one table comprising all sensors in the network; each sensor corresponds to a row in the table
it is assumed that the DB schema is known at a fixed base station; for a P2P system where a query may originate from any node, the DB schema will have to be broadcast to every node
monitoring queries are long-running
duration clause: period during which data is to be collected; sampling period clause: frequency at which the query results are returned
the desired result is a set of notifications of system activity (periodic or triggered by special situations)

queries need to aggregate sensor data over time windows


every ten minutes, return the average temperature measured over the last ten minutes

queries need to correlate data produced simultaneously by different sensors


sound an alarm whenever 2 sensors within 10 meters of each other simultaneously detect an abnormal temperature

most queries contain some conditions on the sensors involved (usually geographical conditions)


3 types of queries:

long-running, continuous queries: report results over an extended time window


ex: for the next 3 hours, retrieve every 10 minutes the rainfall level in Southern California

snapshot queries: data in the network at a given point in time


ex: retrieve the current rainfall level for all sensors in Southern California

historical queries: aggregate information over historical data


ex: retrieve the average rainfall level at all sensors for the last 3 months of the previous year

A user interacting with a sensor DB will issue a sequence of queries to obtain the desired information, using outputs from past queries as inputs to further commands.

2.1. Querying in Cougar

Cougar sensor database system (Cornell University):

maintains an SQL-like query interface for users at a front-end server connected to a sensor network
represents sensor data as time series (each measurement is associated with a timestamp)
creates for each type of sensor (e.g. temperature sensors, seismic sensors) an abstract data type (ADT)
an ADT provides access to encapsulated data through a set of functions

assumes that the nodes are time-synchronized reasonably well, so that there is no misalignment when multiple time series are aggregated


a measurement may not be instantaneously available (network delays)
Cougar introduces virtual relations (defined for ADT methods):

relations that are not actually materialized; views that are persistent during their associated time interval
whenever a signal processing function returns a value, a record is inserted into the virtual relation (append-only)
records are never updated or deleted


Example: simplified schema of the sensor DB contains one relation Sensors(loc POINT, floor INT, s SENSORNODE)

loc: location of the sensor
floor: floor where the sensor is located in the data warehouse
s: sensor node
SENSORNODE is an ADT with the methods getTemp() and detectAlarmTemp(threshold), where threshold is the temperature above which abnormal temperatures are returned. Both methods return the temperature as a float.
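A minimal sketch, in Python and purely for illustration (Cougar's ADTs are not Python classes), of what the SENSORNODE ADT encapsulates:

import random  # stands in for a real temperature transducer

class SensorNode:
    """Illustrative stand-in for the SENSORNODE ADT."""

    def get_temp(self) -> float:
        # On a real node this would sample the temperature transducer.
        return 20.0 + random.random() * 90.0

    def detect_alarm_temp(self, threshold: float):
        # Return the temperature only when it exceeds the threshold,
        # mirroring detectAlarmTemp(threshold) from the slides.
        t = self.get_temp()
        return t if t > threshold else None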


Query1: return repeatedly the abnormal temperatures measured by all sensors


select Sensors.s.detectAlarmTemp(100)
from Sensors
where $every();

Query2: every minute, return the temperature measured by all sensors on the third floor
select Sensors.s.getTemp()
from Sensors
where Sensors.floor = 3 and $every(60);


Query3: generate a notification whenever two sensors within 5 yards of each other measure simultaneously an abnormal temperature
select S1.s.detectAlarmTemp(100), S2.s.detectAlarmTemp(100)
from Sensors S1, Sensors S2
where distance(S1.loc, S2.loc) < 5 and S1.s > S2.s and $every();


it is expensive to transmit data from all sensors to the front-end server, where the query processing could be performed
Cougar therefore considers distributed (in-network) query processing
ex: push the selection (max(rainfall_level) > 3.0in) out to each sensor, so that only those that satisfy the condition return a virtual record (level measurement + sensor ID + timestamp)


to model uncertainty (due to device noise, environmental perturbations), Cougar uses Gaussian ADT (GADT)

models uncertainty as a continuous probability distribution function over measurement values
GADT has a set of defined functions: Prob, Diff, Conf
ex: retrieve from the sensors all tuples whose temperature is within 0.5 degrees of 68 degrees, with at least 60% probability
select * from Sensors
where Sensors.s.getTemp().Prob([67.5,68.5]) >= 0.6

probabilistic queries and range queries
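A minimal sketch of what Prob evaluates for such a query, assuming the measurement is modeled as a Gaussian with a given mean and standard deviation (the numbers and function names are illustrative, not Cougar's API):

from math import erf, sqrt

def gaussian_prob(mean: float, std: float, lo: float, hi: float) -> float:
    """P(lo <= X <= hi) for X ~ N(mean, std^2), via the Gaussian CDF."""
    cdf = lambda x: 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))
    return cdf(hi) - cdf(lo)

# A tuple qualifies if the probability of being in [67.5, 68.5] is >= 0.6.
print(gaussian_prob(mean=68.0, std=0.4, lo=67.5, hi=68.5) >= 0.6)  # True (p ~ 0.79)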


2.2. Querying in TinyDB

TinyDB sensor system (UC Berkeley):

distributed query processor that runs on the Berkeley Mica mote platform, on top of the TinyOS operating system
successful deployments in the Intel Berkeley Lab and in redwood trees at the UC Botanical Garden

largest deployment: ~80 weather-station nodes
collect dense sensor readings to monitor climatic variations across altitudes, angles, time, forest locations, etc.
study how dense sensor data affect the predictions of conventional tree-growth models


acquisitional query processing (ACQP):

goal: reduce power consumption (placing new sensors and replacing or recharging their batteries is time-consuming and expensive)
idea: smart sensors have control over where, when, and how often data is physically acquired (i.e. sampled) and delivered to query processing operators
TinyDB has the features of a traditional query processor (select, join, project, aggregate), plus special ACQP features



Basic architecture:

a query is submitted at a PC (the base station), parsed, and optimized
the query is sent into the sensor network, disseminated, and processed
results flow back up the routing tree that was formed as the query propagated



Power consumption:

Snoozing mode: processor and radio are idle, waiting for a timer to expire or an external event to wake the device
Processing mode: entered when the device wakes; query results are generated locally
Processing and Receiving mode: results are collected from neighbors over the radio
Transmitting mode: results for the query are delivered by the local mote

Communication:

Typical communication distances for low-power wireless radios: a few feet to around 100 feet
short ranges imply multi-hop communication where intermediate nodes relay information for their peers

Requirement: sensor networks must be low-maintenance and easy to deploy

communication topologies must be automatically discovered (i.e. ad hoc) by the devices, rather than fixed at the time of network deployment
devices keep a short list of neighbors they have heard transmit recently, as well as routing information about the connectivity of those neighbors to the rest of the network


Routing tree:

allows a base station at the root of the network to disseminate a query and collect query results
formed as follows:

the root sends a request; all nodes that hear this request process it and forward it on to their children, and so on, until the entire network has heard the request
each node picks a parent node (the one with the most reliable connection to the root, i.e. the highest link quality); this parent is responsible for forwarding the node's (and its children's) query results to the base station
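A minimal sketch of the parent-selection step, assuming each node keeps an estimated link quality per neighbor (the data layout is illustrative, not TinyOS/TinyDB code):

def pick_parent(neighbors):
    # The slide's rule: choose the neighbor with the most reliable
    # connection to the root, i.e. the highest link quality.
    return max(neighbors, key=lambda n: n["link_quality"])

neighbors = [
    {"node_id": 3, "link_quality": 0.72},
    {"node_id": 7, "link_quality": 0.91},
    {"node_id": 9, "link_quality": 0.64},
]
print(pick_parent(neighbors)["node_id"])  # 7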



Routing tree (multihop networking):
[Figure: the query request R:{} is broadcast from the root and forwarded hop by hop through the network; each node (A-F) selects a parent for returning results]

Basic language features:

declarative SQL-like query interface (selection, join, projection, aggregation), plus explicit support for sampling and windowing
the entire sensor network is viewed as:

a single, infinitely-long logical table, with columns for all the attributes defined in the network

sensor readings (one column per sensor type)
meta-data: node id, location, etc.
internal state: routing tree parent, timestamp, etc.

tuples are appended periodically, at sample intervals



Query1: return sensor id, light and temperature readings, once per second for 10 seconds
select nodeid, light, temp
from Sensors
sample interval 1s for 10s

results of a query:

stream to the root, where they may be logged or output to the user
output: a sequence of tuples; each tuple includes a timestamp


Sensors table: (conceptually) unbounded continuous data stream of values

some blocking operations (e.g. sort) are not allowed over such streams unless a bounded subset of the stream, or window, is specified
windows are defined as fixed-size materialization points over streams
create storage point recentLight size 8
as (select nodeid, light from Sensors sample interval 10s)


Joins are allowed between 2 storage points on the same node, or between a storage point and the Sensors relation.
Query2: return the number of recent light readings (from 0 to 8 readings in the past) that were brighter than the current reading, every 10 seconds (landmark query)
select count(*)
from Sensors s, recentLight r
where r.nodeId = s.nodeId and r.light > s.light
sample interval 10s


TinyDB supports grouped aggregations and temporal operations.
Query3: return the average volume over the last 30 seconds, once every 5 seconds, sampling once per second (sliding-window query)
select winavg(volume, 30s, 5s)
from Sensors
sample interval 1s
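A minimal sketch of the sliding-window average that winavg(volume, 30s, 5s) computes, assuming one sample per second (plain Python, not TinyDB's implementation):

from collections import deque

def sliding_window_avg(samples, window=30, step=5):
    """Yield the average of the last `window` samples every `step` samples."""
    buf = deque(maxlen=window)      # holds the most recent 30 one-second samples
    for i, v in enumerate(samples, start=1):
        buf.append(v)
        if i % step == 0:           # emit a result once every 5 samples
            yield sum(buf) / len(buf)

volumes = [40, 42, 41, 45, 50] * 12           # 60 seconds of made-up volume readings
print(list(sliding_window_avg(volumes))[:3])  # averages after 5 s, 10 s, 15 s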

when a query is issued, it is assigned an id


the id can be used to stop a query via the stop query id command; alternatively, queries can be limited to run for a period via a FOR clause, or can include a stopping condition as an event

Event-based queries:

events are a mechanism for initiating data collection
events are generated explicitly, either by another query or by the operating system
Query4: report the average light and temperature at sensors near a bird nest when a bird has been detected
on event bird-detect(loc):
select avg(light), avg(temp), event.loc
from Sensors s
where dist(s.loc, event.loc) < 10m
sample interval 2s for 30s


events are central in ACQP

they allow the system to be dormant until some external condition occurs, instead of continually polling or blocking on an iterator waiting for data to arrive
significant reduction in power consumption

events can also serve as stopping conditions for queries


Lifetime-based queries:

in lieu of an explicit SAMPLE INTERVAL clause, users may request a specific query lifetime via a LIFETIME <x> clause, where x is a duration in days, weeks, or months
Query5: the network should run for at least 30 days, sampling light and acceleration sensors at a rate that is as quick as possible
select nodeId, light, accel
from Sensors
lifetime 30 days

TinyDB performs lifetime estimation:

computes the sampling and transmission rates given the number of joules of energy remaining
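A minimal sketch of this kind of lifetime estimation, assuming illustrative per-epoch and idle energy costs (these numbers and the cost model are assumptions, not TinyDB's):

def max_sample_rate(energy_j, lifetime_days,
                    cost_per_epoch_j=0.001,     # assumed energy per sample + transmission
                    idle_power_w=30e-6):        # assumed sleep-mode draw
    """Return the highest sampling rate (samples/s) that still meets the lifetime."""
    lifetime_s = lifetime_days * 24 * 3600
    idle_energy = idle_power_w * lifetime_s        # energy spent just sleeping
    budget = energy_j - idle_energy                # energy left for sampling/sending
    if budget <= 0:
        return 0.0
    epochs = budget / cost_per_epoch_j             # how many epochs we can afford
    return epochs / lifetime_s                     # spread them over the lifetime

print(max_sample_rate(energy_j=20_000, lifetime_days=30))  # ~7.7 samples/s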


Power-aware optimization:

queries are parsed and optimized at the base station (ordering of sampling, selection, and joins) before being disseminated into the network
each node in TinyDB has metadata describing its local attributes; this metadata is periodically copied to the root for use by the optimizer:

power: cost to sample this attribute (in J)
sample time: time to sample this attribute (in s)
constant?: is this attribute constant-valued (e.g. id)?
rate of change: how fast the attribute changes (units/s)
range: dynamic range of attribute values



Sensor          Power [mW]   Sample time [ms]   Sample energy [uJ]
Light, Temp     0.9          0.1                90
Magnetometer    15           0.1                1500
Accelerometer   1.8          0.1                180

sampling is expensive in terms of power

a sample from a sensor s must be taken to evaluate any predicate over the attribute Sensors.s
if a predicate discards a tuple of the Sensors table, then subsequent predicates need not examine the tuple, and the expense of sampling any attributes referenced in those predicates can be avoided


example:
select accel, mag
from Sensors
where accel > c1 and mag > c2
sample interval 1s

3 possible query plans:
P1: magnetometer and accelerometer are sampled before either selection is applied
P2: magnetometer sampled, selection over it, then accelerometer sampled, selection over it
P3: accelerometer sampled, selection over it, then magnetometer sampled, selection over it

P1 (the traditional DBMS plan) is always more expensive than P2 and P3
P3 is better than P2, since Power_magnetometer >> Power_accelerometer (unless the magnetometer predicate is much more selective than the accelerometer predicate)
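A minimal sketch of the cost comparison, using the per-sample energies from the table above and assumed selectivities (the selectivity values are illustrative):

def plan_cost(order, energy, selectivity):
    """Expected per-tuple sampling energy (uJ) when attributes are acquired in `order`."""
    cost, pass_prob = 0.0, 1.0
    for attr in order:
        cost += pass_prob * energy[attr]     # sample only if earlier predicates passed
        pass_prob *= selectivity[attr]       # fraction of tuples surviving this predicate
    return cost

energy = {"accel": 180, "mag": 1500}         # uJ per sample (from the table)
selectivity = {"accel": 0.5, "mag": 0.5}     # assumed fraction passing each predicate

print(plan_cost(["accel", "mag"], energy, selectivity))  # P3: 180 + 0.5*1500 = 930 uJ
print(plan_cost(["mag", "accel"], energy, selectivity))  # P2: 1500 + 0.5*180 = 1590 uJ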

Power-sensitive dissemination:

after the query has been optimized, it is disseminated into the network:

the query is broadcast from the root
as each sensor hears the query, it must decide whether the query applies locally and/or needs to be broadcast to its children in the routing tree
if a query does not apply at a node, or at any of its children, the entire subtree is excluded from the query: this saves the time of disseminating, executing, and forwarding results, and extends the nodes' lifetime
common situation: constant-valued attributes (nodeId, or location in a fixed-location network) used in a query predicate

deciding where a query should run: important ACQP decision


Semantic Routing Tree (SRT):

a routing tree that allows each node to efficiently determine whether any of the nodes below it needs to participate in a query over some constant attribute A
conceptually, it is an index over A used to locate nodes that have data relevant to the query
each node stores a single unidimensional interval per child: the range of A values beneath that child
when a query q with a predicate over A arrives at node n:

if the query applies locally, n begins executing the query itself
if any child's range of A overlaps the query's range over A, n prepares to receive results and forwards the query to that child; otherwise the query is not forwarded
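A minimal sketch of the forwarding decision at a node, assuming it stores one [lo, hi] interval of A per child (the data layout is illustrative):

def overlaps(a, b):
    """True if closed intervals a=[lo,hi] and b=[lo,hi] intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def srt_forward(child_ranges, query_range, local_value):
    """Return (run_locally, children_to_forward_to) for a predicate over A."""
    run_locally = query_range[0] <= local_value <= query_range[1]
    targets = [child for child, rng in child_ranges.items() if overlaps(rng, query_range)]
    return run_locally, targets

# Node with A (e.g. latitude) = 12, and two children covering [0,10] and [11,25].
print(srt_forward({"c1": (0, 10), "c2": (11, 25)}, query_range=(15, 30), local_value=12))
# (False, ['c2'])  - the query is forwarded only to the child whose range overlaps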


example: an SRT over latitude (the x coordinate of a node's location); only 3 nodes participate in the query (figure omitted)

even though SRTs are limited to constant attributes, SRT maintenance must occur:
new nodes can appear, existing nodes can fail, link qualities can change

using an SRT is efficient, but it has maintenance and construction costs
construction of the SRT: several policies for parent selection

each node picks a random parent from the nodes with which it can communicate reliably
each node picks the parent whose attribute value is closest to its own


Processing queries:

once a query has been optimized and disseminated, the query processor executes it
query execution = sequence of operations at each node
node sleeps -- wakes -- samples the sensor -- applies operator to data generated locally and received from children -- delivers the result to its parent

once results have been sampled and operators applied, they are enqueued onto a radio queue for delivery (both tuples from the local node and tuples forwarded from other nodes)
when network contention and data rates are low, the queue can be drained faster than results arrive
but there are situations when the queue will overflow, which requires prioritizing data delivery
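A minimal sketch of one epoch at a node (a toy simulation, purely illustrative and not TinyDB code; the Node class and a MAX aggregate are assumptions):

import random, time

class Node:
    """Toy node used only to illustrate the per-epoch duty cycle."""
    def __init__(self, children):
        self.children = children          # child nodes in the routing tree
        self.outbox = []                  # stands in for the radio queue

    def sample(self):
        return random.uniform(15, 30)     # pretend temperature reading

    def run_epoch(self):
        time.sleep(0.01)                  # "snooze" until the scheduled interval
        local = self.sample()                                        # sample the sensor
        from_children = [c.outbox.pop() for c in self.children if c.outbox]
        partial = max([local] + from_children)                       # e.g. a MAX aggregate
        self.outbox.append(partial)                                  # deliver to parent

leaf = Node([])
parent = Node([leaf])
leaf.run_epoch()
parent.run_epoch()
print(parent.outbox)   # one partial MAX result, ready for the next level up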

3 simple prioritization schemes:


naive: FIFO delivery; tuples are dropped if they do not fit in the queue
winavg: the 2 results at the head of the queue are averaged to make room
delta: the largest changes are probably the most interesting

a tuple is assigned an initial score relative to its difference from the most recent value transmitted by this node
at each point in time, the tuple with the highest score is delivered
the tuple with the lowest score is dropped when the queue overflows

adapting sampling and transmission rates, in 2 contexts:


network contention: reduce the frequency of network-related losses
power consumption: meet lifetime requirements
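A minimal sketch of the delta scheme, assuming the score is simply the absolute difference from the last transmitted value (the scoring rule and capacity are assumptions):

def delta_queue_step(queue, new_tuple, last_sent, capacity=4):
    """Insert a reading into a bounded queue and pick the next one to transmit."""
    score = lambda v: abs(v - last_sent)          # bigger change => more interesting
    queue.append(new_tuple)
    if len(queue) > capacity:                     # overflow: drop the least interesting
        queue.remove(min(queue, key=score))
    send = max(queue, key=score)                  # deliver the most interesting tuple
    queue.remove(send)
    return send

q = [20.1, 20.2, 25.0]
print(delta_queue_step(q, new_tuple=20.3, last_sent=20.0))  # 25.0 is sent first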


To summarize, ACQP:

when: event clause
where: semantic routing trees
how often: lifetime clause, adapting sampling and transmission rates
query optimization: ordering of sampling operators
quality of data: prioritizing data


2.3. In-network aggregation

Query propagation and aggregation:


ex: an average computed over a network of 6 nodes, in a 3-level routing tree

server-based approach: each sensor sends its data directly to the server; a total of 16 message transmissions

in-network approach: each sensor computes a partial state record, consisting of {sum, count}, based on its own data and that of its children (if any); a total of only 6 message transmissions

[Figure: the 6-node routing tree rooted at the server S; with in-network aggregation each node sends a single combined partial state record, e.g. f(c,d,f(a,b),e) at the node just below S]

a query may be propagated:

through the routing structure (e.g. the routing tree), using a broadcast mechanism
using multicast, to reach only the nodes that may contribute to the query (e.g. if a having-predicate specifies a geographic region)

data is collected and aggregated within the network, using the same routing structure


TinyDB supports in-network aggregation:

supports 5 SQL aggregate operators (count, min, max, sum, average) and 2 extensions (median, histogram)
aggregation is implemented via a merging function f (commutative and associative), an initializer i, and an evaluator e:

z = f(x,y), where x, y, z are multivalued partial state records
ex: for average, a partial state record is (S,C), the sum and count of the sensor values; f((S1,C1), (S2,C2)) = (S1+S2, C1+C2)

initializer i: constructs a state record from a single sensor value
ex: for average, i(x) = (x, 1)

evaluator e: computes the value of the aggregate from a partial state record
ex: for average, e(S,C) = S/C
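A minimal sketch of the average aggregate expressed with these three functions (plain Python, not TinyDB code):

def init(x):                    # i: one sensor value -> partial state record (sum, count)
    return (x, 1)

def merge(a, b):                # f: combine two partial state records
    return (a[0] + b[0], a[1] + b[1])

def evaluate(state):            # e: partial state record -> final aggregate value
    s, c = state
    return s / c

# A parent combines its own reading with the partial states of its children.
child1 = merge(init(20.0), init(22.0))     # subtree with two readings
child2 = init(25.0)
parent = merge(init(21.0), merge(child1, child2))
print(evaluate(parent))                    # 22.0 = (20 + 22 + 25 + 21) / 4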


uses an epoch-based mechanism:

each epoch (sampling period) is divided into time intervals; the number of intervals reflects the depth of the routing tree
aggregation results are reported at the end of each sampling period
when a node broadcasts a query, it specifies the time interval within which it expects to hear the results from its children
during its scheduled interval, each node:

listens for the packets from its children and receives them
computes a new partial state record by combining its own data and the partial state records from its children
sends the result up the tree to its parent

[Figure: epoch timeline from the root (level 1) down to level 4; within each epoch a node first receives from its children, then computes, then transmits to its parent]

Important issue: ensure robustness to node or link failures


losing a parent node may orphan an entire subtree
each node has to periodically rediscover its parent to make sure it is still connected
TinyDB also considers providing redundancy by duplicating parent nodes for each child, and by caching data over a past window of time at each node


3. Other issues

Data-Centric Storage:

the tree-based query propagation mechanism (used by TinyDB) is appropriate when queries are issued by a server
data-centric storage (DCS): a method to support queries from any node in the network, by providing a rendez-vous mechanism for data and queries

Data indices and range queries:

build indices to speed up the execution of queries involving data ranges

Indexing motion data:

build indices for continuously changing sensor data (continuous updates to a static index incur heavy modification & communication costs)

Data aging:

discard old data and maintain some temporal summaries

References

Book:
F. Zhao and L. Guibas, Wireless Sensor Networks: An Information Processing Approach, Elsevier, 2004.

Papers:
Bonnet P., Gehrke J., Seshadri P., "Towards Sensor Database Systems", Proc. Int. Conf. on Mobile Data Management (MDM), Hong Kong (China), 2001.
Madden S., Franklin M.J., Hellerstein J.M., Hong W., "The Design of an Acquisitional Query Processor For Sensor Networks", Proc. Int. Conf. on Management of Data (SIGMOD), San Diego (USA), 2003.
Hong W., Madden S., "Implementation and Research Issues in Query Processing for Wireless Sensor Networks", tutorial slides, Int. Conf. on Data Engineering (ICDE), Boston (USA), 2004.
