
Open Sensor Networks for Air Quality Monitoring

OpenSense

Alcherio Martinoli and Karl Aberer (PI), EPFL


Joint work with:

Boi Faltings, Martin Vetterli, EPFL
Lothar Thiele, ETH Zürich


and their team members

In collaboration with multiple industrial and external research partners

opensense.epfl.ch

OVERVIEW

MOTIVATION

RESEARCH PROBLEMS AND RESULTS

SENSING SYSTEM
FROM DATA TO INFORMATION
THE USERS

CONCLUSION

AIR POLLUTION

Air pollution in urban areas is a global concern


affects quality of life and health
urban population is increasing

Air pollution is highly location-dependent


traffic chokepoints
urban canyons
industrial installations

Air pollution is time-dependent


rush hours
weather
industrial activities

AIR POLLUTION MONITORING

Accurate location-dependent and real-time information on air pollution is needed

Officials

environmental engineers: location of pollution sources
municipalities: creating incentives to reduce environmental footprint
public health studies

Citizens

advice for outside activities
assessment of long-term exposure
pollution maps

Air pollution levels in the city center of Zürich (micro-scale model)

HEALTH STUDY SCENARIO (OPENSENSE+)

Assessment of impact of air pollution on health

E.g. blood pressure, renal activity, respiration


Correlation with activity and health parameters
On-body sensors
Activity recognition (using mobile phones)

Effects are immediate


High temporal and spatial resolution of air quality data required
Trajectories of study subjects are needed

User concerns
Study participants are sensitive about data privacy
Participants would like personalized information about individual exposure and risks

MONITORING TODAY

Stationary and expensive stations

Sparse sensor network (NABEL)

Expensive mobile high fidelity equipment

Coarse models (mesoscale = 1 km2)

Data difficult to integrate into applications (e.g. for correlating with other features like people's activities)


OPPORTUNITIES

Wireless communication: deploy larger numbers of stations

Mobility: deploy mobile stations and increase coverage

Communities: citizens as data producers and information consumers

RESEARCH CHALLENGES

SENSING SYSTEM (NANO): from many wireless, mobile, heterogeneous, unreliable raw measurements

INFORMATION SYSTEM (TERA): to reliable, understandable and Web-accessible real-time information

1. Sensing system
2. Data acquisition and modeling
3. Community involvement

Air pollution as exemplary use case for other environmental phenomena: radiation, noise, energy

Microscale: 5 m^2


SENSING SYSTEM

LAUSANNE DEPLOYMENT

Arfire et al., poster and demo

Sensor modalities and communication

CO, CO2, NO2, O3, humidity, temperature
GPRS channel

3 stationary stations - oldest since Sep 2011


2 close to NABEL station, 1 close to e-vehicle garage
needed for sensor calibration and long-term experiments

2 mobile stations on buses - oldest since Jun 2011


accurate localization
vehicle context information

1 mobile station on electric vehicle - since Apr 2013


accurate localization
flexible mobility, fast prototyping

ZURICH DEPLOYMENT (COLLABORATION WITH X-SENSE)

1 stationary station - since Apr 2011

close to NABEL station
long-term sensor testing and sensor calibration
testing new sensors (combined CO/NO2 sensor)

10 mobile stations on trams - oldest since Sep 2011

O3, CO, Ultra Fine Particles, humidity, temperature
> 1 year of measurements and 30 million data points
Communication: GPRS/WLAN

1 mobile station on LuftiBus - since Mar 2013

O3, Ultra Fine Particles, humidity, temperature
Covers all Switzerland

AIR POLLUTION SENSORS

Gases: we measure CO, CO2, NO2, O3

Lack of long-term spatially resolved data due to high sensor cost
Lack of good low cost sensors on the market
Need for frequent sensor calibration

Particles: we measure nanoscale particles with a diameter < 100 nanometers (UFPs)

Much smaller than PM10 or PM2.5 and believed to have more severe health implications
Lack of epidemiological studies on health effects of long-term exposure

UFPs: MiniDisc (Miniature diffusion size classifier)

High cost of UFP monitoring equipment
Lack of spatially resolved exposure data
Lack of reliable dispersion models

SENSOR STATIONS ON PUBLIC TRANSPORTATION

Lausanne deployment

Zurich deployment

PERSONAL MOBILE SENSING

Zurich prototype
Lausanne prototype

Eberle et al., poster and demo

AirQualityEgg

Smartphone connected to ozone sensor and various application software for Android
[Hasenfratz et al., Mobile Sensing 2012] [Predic et al., PerCom 2013]

Low-cost devices for home deployment (calibration tests close to a NABEL station)

SENSOR MODELING

Understand behavior of electro-chemical sensors

sensor dynamics
linearity
sensitivity to humidity
variability with different flow conditions

[Figure: selected City Technology A3CO sensor, measured and modeled response; axis: Output [V]]

SAMPLING SYSTEM
Slow response of chemical sensors

active vs. passive sampling
open vs. closed sampling system
new flow pre-processing layer for Lausanne deployment enabling full flow control

[Figure: prototype CO2 sniffer on Khepera III mobile robot, with intake, pump, valve, chamber and exhaust; axis: Amplitude [ppm]]

LOCALIZATION ACCURACY

Accurate chemical sampling requires accurate positioning
Low-cost, embedded sub-meter accuracy in urban settings requires sensor fusion & light map matching algorithms
Large set of rich data (stop coordinates, heading, odometer, acceleration, vehicle context data, etc.)

[Figure: vehicle context data along a bus route, e.g. doors open, current stop: Sallaz, next stop: Valmont]

CITY COVERAGE

Public transport vehicles are not bound to a specific line number but rather to their host depot
We can choose how many stations we deploy in each depot but not on which lines

Route selection algorithm

Depot Oerlikon

3 trams

Depot Irchel
Depot Kalkbreite

2 trams
5 trams

[Saukh et al., AIHC Journal, 2013], [Saukh et al., PerSeNS 2012]

CALIBRATION PROCEDURE

Gas sensor drift (aging) -> periodic recalibration needed
Gas sensors are installed on mobile vehicles
Few expensive reference stations within city limits

Two recipes:

Calibration upon rendezvous of mobile vehicles and references
Passing of calibration data from vehicle to vehicle: multi-hop calibration

[Hasenfratz et al., EWSN 2012]
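
The two recipes can be illustrated with a minimal sketch. It assumes a purely linear sensor response (reference = gain * raw + offset) fitted by least squares on readings paired at a rendezvous; the linear model, the function names and the synthetic readings are illustrative assumptions, not the procedure of the cited paper.

```python
from statistics import mean


def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference ~ gain * raw + offset (assumed model)."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings")
    raw_mean, ref_mean = mean(raw), mean(reference)
    sxx = sum((x - raw_mean) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, reference))
    gain = sxy / sxx
    return gain, ref_mean - gain * raw_mean


def apply_calibration(params, raw):
    """Convert raw sensor output into calibrated concentrations."""
    gain, offset = params
    return [gain * x + offset for x in raw]


# Hop 1: mobile sensor A passes a reference station (synthetic data).
station = [20.0, 35.0, 50.0, 65.0]
raw_a_at_station = [(c - 4.0) / 2.0 for c in station]
cal_a = fit_linear_calibration(raw_a_at_station, station)

# Hop 2: sensor B never meets the station but meets A on the road,
# so A's calibrated readings serve as the (less accurate) reference.
truth = [10.0, 30.0, 70.0]
raw_a_on_road = [(c - 4.0) / 2.0 for c in truth]
raw_b_on_road = [(c + 3.0) / 0.5 for c in truth]
cal_b = fit_linear_calibration(raw_b_on_road, apply_calibration(cal_a, raw_a_on_road))
```

With noisy readings the error of each hop would accumulate along the chain, which is why the number of hops away from a reference station matters in practice.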

DATA QUANTITY AND VISUALIZATION

[Keller et al., SenseApp 2012]

Pollutant   # of Measurements   Sampling rate   Time Period
UFP         24,050,000          5 s             18 months
Ozone       3,430,000           20 s            18 months
CO          2,820,000           20 s            18 months

[Figures: CO concentration map; UFP concentration map]

DATA STORAGE: GSN @ data.opensense.ethz.ch

[Aberer et al., MDM 2007]

FROM DATA TO INFORMATION

CLASSICAL AIR QUALITY MODELS

CHOICE OF MODELS

Types of models

Physics-based
Data-driven statistical and machine learning
Context-aware machine learning

PRIMARY MODEL USE: POLLUTION MAPS

[Figure: single measurements and the resulting micro-scale pollution map, each covering 3 km x 3 km]

Processing steps:
Raw data -> Data filtering & calibration -> Data validation -> LUR model -> Pollution map -> Map validation

LUR: Land-Use Regression models


UFP DISTRIBUTION IN ZURICH

Hasenfratz et al., poster and demo

[Figure: four seasonal UFP maps, Winter (Jan - Mar), Spring (Apr - Jun), Summer (Jul - Sep) and Autumn (Oct - Dec); color scale: particle concentration [particles/cm3]]

Values printed on the Winter and Spring panels: 18300 3600 PM/cm3 and 10200 4100 PM/cm3
Values printed on the Summer and Autumn panels: 9400 2500 PM/cm3 and 13100 4000 PM/cm3

100 x 100 m2 resolution
Random 10-fold validation RMSE: 2600 PM/cm3

UFP DISTRIBUTION IN ZURICH

Li et al., poster

Spatial + Land Use Regression with Gaussian Processes
Random 10-fold validation: RMSE = 2324

[Figures: UFP estimation (mean); UFP estimation confidence (95% conf. int.)]

GENERALIZED USE OF MODELS

Reduce number of measurements

Do not store outliers

Answer queries where no data is available

Store sensor data in approximate form

[Sathe et al., Model-based Sensor Data Acquisition and Management, Springer, to appear]

DATA ACQUISITION: OPTIMAL SENSING FOR MOVING SENSORS

Goal: find an optimal sensing strategy that balances maximizing the sensing coverage of moving sensors against minimizing the sensing cost (sampling and communicating back)
Questions: Can segmentation help? What is the optimal sampling strategy?
Results: 5 algorithms for segmentation, 3 for sampling

[Yan et al., MDM 2012]

DATA CLEANING: INFERRING DYNAMIC DENSITY METRICS

Goal: Inferring future dynamics of a time series

Outlier detection
Supporting probabilistic querying

Problem:

Given time series up to t-1
Estimate probability distribution of values r_t at time t

Results: 4 estimation algorithms

[Sathe et al., ICDE 2011]
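
As a rough illustration of the problem statement, the sketch below estimates the distribution of r_t from a sliding window of past values and uses it for outlier detection and a probabilistic range query. The Gaussian assumption, the window size and all names are illustrative choices and do not correspond to any of the four estimation algorithms of the cited paper.

```python
from statistics import NormalDist, mean, stdev


def predict_density(history, window=8):
    """Estimate the distribution of the next value r_t from values up to t-1.

    Assumption: r_t is Gaussian with the mean and standard deviation
    of the most recent `window` readings.
    """
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two past readings")
    return NormalDist(mean(recent), max(stdev(recent), 1e-9))


def is_outlier(density, value, coverage=0.99):
    """Flag a reading that falls outside the central `coverage` interval."""
    tail = (1.0 - coverage) / 2.0
    low, high = density.inv_cdf(tail), density.inv_cdf(1.0 - tail)
    return not (low <= value <= high)


def probability_in_range(density, low, high):
    """Probabilistic query: P(low <= r_t <= high)."""
    return density.cdf(high) - density.cdf(low)


# Synthetic CO2-like readings (ppm) up to time t-1.
readings = [400.0, 402.0, 401.0, 399.0, 403.0, 400.0, 398.0, 401.0]
density = predict_density(readings)
plausible = is_outlier(density, 402.0)   # inside the predicted interval
spike = is_outlier(density, 450.0)       # far outside the predicted interval
```

Because the true distribution is never observed, the quality of such an estimate can only be judged indirectly, for example by how often new readings fall inside the predicted intervals.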

DATA CLEANING: ANOMALY DETECTION

original data stream
approximation using user-selected models
detecting anomalies
user confirmation: is the anomaly an actual error?
[Paparrizos et al., ICDE 2011]

QUERY PROCESSING

Continuous moving queries: give an (in-car) pollution update every 30 mins
Aggregate queries: COx emitted yesterday in Lausanne center

Approach
Data aggregator produces a model cover from a set of models on an area
Continuous sensor updates
Continuous and ad-hoc queries

Challenge
Different sensor accuracies
Unreliable, erroneous data
Uncontrolled mobility

Results: 3 algorithms
[Cartier et al., SECON 2012]

DATA COMPRESSION

Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams

[Figure: readings of sensor streams s1 and s2 at times t1, t2, t3 (values between 5.2 and 6.2) are approximated over time by models m1 and m2 before being sent over the internet]

2-level data compression

Original data stream
approximation using models
bitmap compression of model parameters

Over 90 % compression for temperature with 0.5 error bound

[Arion et al., GLOBECOM 2011]
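
A minimal sketch of the first compression level only: the stream is approximated by piecewise-constant models so that every reconstructed value stays within the error bound. The piecewise-constant model class, the function names and the sample temperatures are assumptions for illustration; the second level (bitmap compression of the model parameters) is not shown.

```python
def compress(values, error_bound):
    """Greedy piecewise-constant approximation.

    A segment grows while max - min <= 2 * error_bound, so its midpoint is
    within error_bound of every reading it replaces.
    Returns a list of (length, value) pairs.
    """
    if not values:
        return []
    segments = []
    start = 0
    low = high = values[0]
    for i, v in enumerate(values[1:], start=1):
        new_low, new_high = min(low, v), max(high, v)
        if new_high - new_low > 2 * error_bound:
            # Close the current segment and start a new one at reading i.
            segments.append((i - start, (low + high) / 2))
            start, low, high = i, v, v
        else:
            low, high = new_low, new_high
    segments.append((len(values) - start, (low + high) / 2))
    return segments


def decompress(segments):
    """Rebuild an approximate stream from (length, value) pairs."""
    out = []
    for length, value in segments:
        out.extend([value] * length)
    return out


# Synthetic temperature stream and the 0.5 error bound named on the slide.
temperatures = [20.1, 20.3, 20.2, 20.6, 20.9, 22.4, 22.6, 22.5, 22.3]
segments = compress(temperatures, error_bound=0.5)
restored = decompress(segments)
```

Here nine readings are replaced by two model parameters; the achievable ratio depends on how smooth the signal is relative to the error bound.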

THE USERS

CONTEXT EXTRACTION

Objective: Automatically annotating trajectories of different types of moving objects (cars, people)

Stops: Hidden Markov Model (HMM) -> stop behaviors
Moves: map matching -> transportation means
Trajectory: land use coverage

[Z. Yan et al., EDBT 2011]

PARTICIPATION - TRUST - PRIVACY

Users as consumers

Different concerns, perceptions, user groups, data quality requirements
Can we satisfy them simultaneously?

Users as producers

Incentives for participation
Trusted data
Protecting privacy
Can we reconcile these?

Objective: multi-query optimization for maximizing social benefit
Economic approach: incentives, utility-based framework

[Diagram: the sensing system offers the user information, reputation and reward; the user contributes data, privacy and energy]

[Aberer et al., IWGS 2010], [Li et al., Internet of Things, 2012]

THE SETTINGS

Riahi et al., poster

[Riahi et al., EDBT 2013]

ENHANCING SENSING EFFICIENCY

Zhang et al., poster

Exploit the spatial correlations
Reduce data readings
Maintain coverage (small reconstruction error)
Achieve fairness among all mobile users
Adaptively choosing subsets of active mobile users

[Figure: a set of six numbered mobile users is reduced to a smaller active subset]

CONCLUSIONS

CONCLUSION

End-to-end system view essential


Investigate all system layers: from sensors to user interfaces
Utility-based framework as integrative approach
System modeling as a key requirement

Mobility is crucial and challenging at the same time

Coverage, maintenance, flexibility, data dissemination

Results applicable beyond air pollution

Complex, distributed, participatory measurement

For more information: posters, demos, and opensense.epfl.ch



TEAM
Karl Aberer, EPFL-LSIR, PI

Thanasis Papaioannou, postdoc
Zhixian Yan, postdoc
Hoyoung Jeung, postdoc
Rammohan Narendula, PhD
Mehdi Riahi, PhD
Alex Arion, PhD
Saket Sathe, PhD
Tian Guo, PhD
Julien Eberle, PhD
Sofiane Sarni, engineer
Jason Jingshi Li, postdoc

Martin Vetterli, EPFL-LCAV, co-PI


Andrea Ridolfi, postdoc
Runwei Zhang, PhD

Lothar Thiele, ETHZ-TIK, co-PI


Olga Saukh, postdoc
Jan Beutel, postdoc
David Hasenfratz, PhD
Christoph Walser, engineer

Boi Faltings, EPFL-LIA, co-PI

Alcherio Martinoli, EPFL-DISAL, co-PI


Alexander Bahr, postdoc
Ali Marjovi, postdoc
Adrian Arfire, PhD
William C. Evans, PhD
Emanuel Droz, engineer

BACKUP SLIDES

DEATHS FROM URBAN AIR POLLUTION

2% of all deaths (1.2 million people)

Global Health Risks, WHO 2009

AIR POLLUTION AND CARDIOVASCULAR MORTALITY

Health studies show that air pollution increases the risk of cardiovascular mortality (heart attacks) by at least 5% to 20%

WHAT IS THE PROBLEM?

Two mobile nodes: who should measure?

1. Node decides individually depending on its state, e.g. calibration
2. Nodes communicate with WSN and coordinate
3. Base station schedules nodes using mobility model: a third node arrives, don't measure!
4. Air quality model: don't need measurement!
5. Privacy model: node 1 should measure!
6. Application model (e.g. health service): no measurement needed!

VALUE OF DENSE MEASUREMENTS

Traditional approach

Few stations
Low resolution interpolated estimates of pollutant concentrations across massive regions

Recent results

Massive deployment of stations (150) at street-level (2008/2009 New York City Community Air Quality Survey)
Pollutants of interest heavily concentrated along roads with high traffic densities

GRANULARITY OF MODELS

Macroscale: 100 km^2
Mesoscale: 1 km^2
Microscale: 5 m^2

Statistical

CALIBRATION ACCURACY

[Figure: raw data over time, axis ticks 50 to 150]

Raw sensor data -> Calibration -> Calibrated sensor data

Signal range algorithm: accuracy bounds, invalid sensor readings

Sensor model
Phenomenon model
Reference measurements
[Hasenfratz et al., DCOSS 2013]

SEGMENTATION HELPS

[Figure: two panels, "No segments" and "5 Segments", each plotting normalized CO2 values (0.2 to 0.8) against the hour of day (0 to 24) for one day of measurements on a bus line. Left panel: raw CO2 readings, linear regression, SVM regression. Right panel: raw CO2 readings, linear (5 segments), SVM (5 segments).]

SEGMENTATION STRATEGIES

Optimal segmentation

Dynamic programming, O(k n^2) (n readings, k segments)
Too expensive

Top-down binary segmentation

Basic strategy: O(k log n)
Binary+: optimized approach in finding segment boundaries

Error-based heuristic segmentation

Heuristic: segmentation using absolute errors
Heuristic+: segmentation using relative errors

Near-optimal segmentation

Binary+ + Heuristic+: O(k n log n)
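
The general idea of top-down segmentation can be sketched as follows. This is a generic variant that always splits the segment with the largest squared error at its best cut point and models each segment by its mean; it is an illustrative assumption, not the Binary, Binary+ or Heuristic strategies listed above, and it does not reproduce their stated complexities.

```python
def sse(values):
    """Sum of squared errors of a constant (mean) model, assumed per segment."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(values, lo, hi):
    """Index in (lo, hi) that minimizes the total error of the two halves."""
    return min(range(lo + 1, hi),
               key=lambda cut: sse(values[lo:cut]) + sse(values[cut:hi]))


def top_down_segmentation(values, k):
    """Split the series into at most k segments, returned as (start, end) pairs."""
    segments = [(0, len(values))]
    while len(segments) < k:
        splittable = [s for s in segments if s[1] - s[0] >= 2]
        if not splittable:
            break
        # Refine the segment that the current models describe worst.
        lo, hi = max(splittable, key=lambda s: sse(values[s[0]:s[1]]))
        cut = best_cut(values, lo, hi)
        segments.remove((lo, hi))
        segments.extend([(lo, cut), (cut, hi)])
    return sorted(segments)


# Synthetic normalized CO2 readings with three regimes over a day.
series = [0.3] * 6 + [0.7] * 6 + [0.4] * 6
segments = top_down_segmentation(series, k=3)
```

A per-segment linear or SVM regression, as in the plots of the previous slide, could replace the mean model by swapping out `sse`.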

SEGMENTATION RESULTS

One day data as training and test sets

Heuristic is better than Binary, especially for testing
Large number of segments (k > 5) does not help much

SAMPLING

Optimal sampling

NP-hard

Distribution-based sampling

Uniform
Random

Entropy (error) based sampling

Points with highest entropy

Mutual Information based sampling

Remove information redundancy
Recalculate entropy after the points already selected


SAMPLING RESULTS

Uniform is better than random: duty cycle sensing!
Entropy is as good as mutual-info when the sampling rate is large
Entropy gets worse when the sampling rate is small (bias for large errors)

DYNAMIC DENSITY METRICS MODEL

Measured value r_t is a sample of a probability distribution
Different statistics for estimation

Problem: the true distribution is not observable
How to determine the quality of the estimation?

EXPERIMENTAL EVALUATION

CONDENSE: EFFICIENTLY CONSTRUCTING MODELS

Three methods

Standard geo-statistics: kriging (one model)
Uniform gridding: linear regression for each grid cell
Adaptive k-means: cluster points that are jointly well approximated by a linear regression model

S. Cartier, S. Sathe, D. Chakraborty and K. Aberer, ConDense: Managing Data in Community-driven Mobile Geosensor Networks, SECON 2012.

ADAPTIVE K-MEANS

Algorithm sketch

Create k clusters in 2-d space
Build a model for each cluster
In each cluster identify the point with the largest approximation error
If that point is above error threshold it becomes a new cluster center
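
The algorithm sketch can be turned into a small runnable example. Several simplifications are assumptions made for brevity: each cluster model is the mean value of its members instead of a linear regression, points are assigned to the nearest center without re-estimating centers, and the data are synthetic.

```python
import math


def assign(points, centers):
    """Assign each (x, y, value) point to its nearest center in 2-d space."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(p[:2], centers[i]))
        clusters[nearest].append(p)
    return clusters


def adaptive_clustering(points, initial_centers, threshold, max_rounds=20):
    """Add cluster centers until every cluster model meets the error threshold."""
    centers = list(initial_centers)
    for _ in range(max_rounds):
        clusters = assign(points, centers)
        new_centers = []
        for members in clusters:
            if not members:
                continue
            # Assumed model: the mean measured value of the cluster.
            model = sum(v for _, _, v in members) / len(members)
            worst = max(members, key=lambda p: abs(p[2] - model))
            if abs(worst[2] - model) > threshold and worst[:2] not in centers:
                new_centers.append(worst[:2])
        if not new_centers:
            return centers, clusters
        centers.extend(new_centers)
    return centers, assign(points, centers)


# Two spatial groups with clearly different pollution levels (synthetic).
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 1.0, 10.0), (1.0, 1.0, 10.0),
          (10.0, 10.0, 50.0), (11.0, 10.0, 50.0)]
centers, clusters = adaptive_clustering(points, [(0.0, 0.0)], threshold=5.0)
```

Starting from a single center, the worst-approximated point at (10, 10) is promoted to a new center, after which both clusters are described exactly by their models.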

EVALUATION: PROCESSING COST

EVALUATION: ERROR

EXPERIMENTAL RESULTS

[Figure panels: error, completeness]

Real data for electrosmog sensing from Nokia campaign
Avg Static: static parameters that meet the threshold on the average
Max Static: static parameters that always meet the threshold

USER PRIVACY VS. DATA RELIABILITY

Approach

Personalized privacy
Users estimate potential privacy loss

Participatory sensing

Users reveal location
Semi-honest aggregation server infers user activity
Obfuscation affects data quality

B. Agir, T. Papaioannou, R. Narendula, K. Aberer, J.P. Hubaux, An adaptive scheme for personalized privacy in participatory sensing, WiSec 2012.

TASK ASSIGNMENT

Users submit queries to the aggregator

They specify a valuation function v_q() and a limited budget B_q for each query q
Trustworthiness, accuracy

Sensors S have sensing cost c_s that they advertise

Battery consumption, privacy leakage

Aggregator tries to optimally answer the queries

maximize utility (social welfare) u(S'), S' subset of S

Utility definition: difference between the value of the query results and the cost for obtaining the results.

QUERIES IN PARTICIPATORY SENSING

COMPUTING AN ALLOCATION

Optimal

Formulate the problem as binary integer linear program

Heuristic (LocalSearch)

Iteratively choose the sensor that maximizes the utility gain and remove obsolete sensors (exploits sub-modularity)
1/3-approximation algorithm for sub-modular functions

Greedy (Baseline)

Iteratively choose, for each query, the sensor with highest utility
Valuation functions v_q need not be sub-modular

Experiments: greedy scheduling works better than heuristic scheduling when the v_q are not sub-modular
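
To make the utility definition concrete (value of the query results minus the cost of obtaining them), the sketch below selects sensors by repeatedly adding the one with the largest positive marginal utility gain. The coverage-style valuation (a query is worth its full value once any selected sensor covers it), the omission of budgets, and all names and numbers are assumptions; this is neither the LocalSearch heuristic with its approximation guarantee nor the baseline evaluated in the experiments.

```python
def utility(selected, sensors, query_values):
    """u(S') = value of the covered queries minus the cost of the chosen sensors."""
    covered = set()
    for name in selected:
        covered |= sensors[name]["covers"]
    value = sum(query_values[q] for q in covered)
    cost = sum(sensors[name]["cost"] for name in selected)
    return value - cost


def marginal_gain_allocation(sensors, query_values):
    """Add sensors one by one while some sensor still increases the utility."""
    selected, current = [], 0.0
    while True:
        best_name, best_gain = None, 0.0
        for name in sorted(sensors):
            if name in selected:
                continue
            gain = utility(selected + [name], sensors, query_values) - current
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            return selected, current
        selected.append(best_name)
        current += best_gain


# Hypothetical queries with their values and sensors with advertised costs.
query_values = {"q1": 5.0, "q2": 4.0, "q3": 3.0}
sensors = {
    "s1": {"cost": 2.0, "covers": {"q1", "q2"}},
    "s2": {"cost": 1.0, "covers": {"q2"}},
    "s3": {"cost": 2.0, "covers": {"q3"}},
    "s4": {"cost": 6.0, "covers": {"q1", "q3"}},
}
chosen, welfare = marginal_gain_allocation(sensors, query_values)
```

In this toy instance s1 and s3 are selected: s2 and s4 only cover queries that are already answered, so adding them would lower the social welfare.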

EVALUATION

INCENTIVE SCHEMES FOR SENSORS

Peer Truth Serum

Reporting the true measurement is the best strategy for sensor operators
Incentivize measurements at locations of less certainty
Resistant to collusion (up to 70% collusion in simulation)

Robust Bayesian Truth Serum for Sensors

Extends existing incentive scheme to multivalue sensors.

Influence Limiter on Sensor Reports

An effective reputation system for sensors.

[Figures, from IEEE Transactions on Computers, Special Section on Computational Sustainability:
Fig. 5. Our simulation of air quality sensors from a suburb in Strasbourg.
Fig. 6. Average payment per sensor given uncertainty.
Fig. 7. Average payment per measurement against different noise levels.
Fig. 8. Payment made to an average sensor for different levels of collusion reporting the public prior.]

Excerpt shown with the figures: Figure 6 shows the average payment a given sensor received from the Peer Truth Serum given different degrees of uncertainty of the pollutant level for the given sensor location. The uncertainty is presented in the form of the root-mean-squared deviation between the ground truth and the most likely value from the public prior at the sensor location. This graph shows that in general, the Peer Truth Serum incentivizes reporting at locations of greater uncertainty, where the public prior differs more from the actual ground truth observed by the sensor. In contrast, the proper scoring rules are indifferent to the degree of imprecision at the location of measurement.

OPENSENSE ARCHITECTURE

[Architecture diagram, with data flow and control flow arrows.
Components: applications, service market, environment models (interpolation/segmentation), data aggregation server, data market, calibration model, cleaning model, mobility model, scheduling component, sensor model (e.g. sensor wear), mobile sensors, context (map data, land-use data).
Labels on the application side: checks data offers, submits requests, response, queries, charges data cost, submits offers.
Labels on the data side: raw data, calibrated data, cleaned calibrated data, sampling for locations considering error and value, required samples, priority.
Labels on the sensor side: measurements, location, status, sensor locations, sensor status, predictions, schedule (measurements, priority), local coordination.]