SENSOR NETWORK
A Dissertation Presented
by
PURUSHOTTAM KULKARNI
DOCTOR OF PHILOSOPHY
February 2007
Computer Science
© Copyright by Purushottam Kulkarni 2007
My six-year stay at Amherst for my doctoral degree has been a memorable experience.
I am indebted to my advisers Prof. Prashant Shenoy and Prof. Deepak Ganesan. Prashant
provided valuable guidance and mentoring throughout my stay at Amherst. I am also grate-
ful to Deepak for advising me for my dissertation. I have learnt several aspects of research
I would like to thank my thesis committee members, Prof. Jim Kurose, Prof. C. Mani
Krishna and Prof. Mark Corner, for agreeing to be part of my dissertation committee and
for their feedback.
Minnesota, Duluth which not only taught me important subjects but also the importance of
couraged me to pursue the Ph.D. program and I will always be grateful for his advice.
Trafford, thanks for helping with all the computer hardware and equipment issues.
While at Amherst, I was fortunate to meet and make a lot of friends, who made my stay
memorable. My sincere thanks to all my friends, Sudarshan Vasudevan, Hema Raghavan,
Murthy, Swati Birla, Preyasee Kamath, Sreedhar Bunga, Rati Sharma, Sourya Ray, Koushik
Dutta, Ambarish Karmalkar, Smita Ramnarian, Satyanarayan Ray Pitambar Mohapatra,
Neil Naik, Stephanie Jo Kent, Hema Dave, Dheeresh Mamidi, Pranesh Venugopal and
many others. Special thanks to Tejal Kanitkar, Ashwin Gambhir, Ashish Deshpande and
Anoop George Ninan who helped and motivated me in several ways to complete the Ph.D.
program. I will forever cherish the wonderful memories with you all.
Lastly, I am grateful to my parents, Nanna, Aai and Pappa, my brother, Dhananjay, and
sister, Renuka, for their support, encouragement and patience. Thank you for everything.
ABSTRACT
FEBRUARY 2007
PURUSHOTTAM KULKARNI
less communication technologies, have enabled and led to a large research focus in sensor
sensor nodes. Single-tier networks consisting of homogeneous nodes achieve only a sub-
set of application requirements and often sacrifice others. In this thesis, I propose the
notion of multi-tier heterogeneous sensor networks, sensors organized hierarchically into
multiple tiers. With intelligent use of resources across tiers, multi-tier heterogeneous sen-
sor networks have the potential to simultaneously achieve the conflicting goals of network
lifetime, sensing reliability and functionality.
with image sensors. I address the issues of automatic configuration and initialization and
design of camera sensor networks.
Like any sensor network, initialization of cameras is an important prerequisite for camera
sensor network applications. Since camera sensor networks have varying degrees of
infrastructure support and resource constraints, a single initialization procedure is not
appropriate. I have proposed the notions of accurate and approximate initialization to initialize
cameras with varying capabilities and resource constraints. I have developed and empiri-
cally evaluated Snapshot, an accurate calibration protocol tailored for sensor network de-
ployments. I have also developed approximate initialization techniques that estimate the
degree and region of overlap at each camera. Further, I demonstrate
usage of these estimates to instantiate camera sensor network applications. Compared to
manual calibration, which is inefficient and error prone and can take a long time (on the
order of hours) to calibrate several cameras, the automated calibration protocol is accurate
and greatly reduces calibration time: tens of seconds to calibrate a single camera, easily
scaling to several cameras in the order of minutes. The approximate tech-
With regard to the design of camera sensor networks, I present the design and
implementation of SensEye, a multi-tier heterogeneous camera sensor network, and address the
tradeoffs of latency and energy usage based on the type of sensor used for application tasks. Using SensEye
I demonstrate how multi-tier networks can achieve the simultaneous system goals of energy
efficiency and reliability.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
CHAPTER
1. INTRODUCTION
1.1 Motivation
1.2 Initialization of Camera Sensor Networks
1.3 Design of Camera Sensor Networks
2. RELATED WORK
2.2.3 Camera Calibration
2.3 Applications
3.1 Introduction
3.6 Conclusions
4. APPROXIMATE INITIALIZATION OF CAMERA SENSOR NETWORKS
4.1 Introduction
4.4 Applications
4.4.1 Duty-Cycling
4.4.2 Triggered Wakeup
4.7 Conclusions
5.2 Design Principles
5.3 SensEye Design
5.6 Conclusions
BIBLIOGRAPHY
LIST OF TABLES
5.1 SensEye Tier 1 (with CMUcam) latency breakup and energy usage. Total latency is
136 ms and total energy usage is 167.24 mJ.
5.2 SensEye Tier 1 (with Cyclops) latency breakup and energy usage.
5.3 SensEye Tier 2 latency and energy usage breakup. The total latency is 4 seconds and
total energy usage is 4.71 J. †Measured on an optimized Stargate node with no
peripherals attached.
5.4 Number of wakeups and energy usage of a single-tier system. Total energy usage of
both Stargates when awake is 2924.9 J. Total missed detections are 5.
5.5 Number of wakeups and energy usage of each SensEye component. Total energy
usage when components are awake is 466.8 J with the CMUcam and 299.6 J with
the Cyclops. Total missed detections are 8.
LIST OF FIGURES
1.1 A typical sensor network consisting of sensors for sampling and sink nodes for data
collection.
3.2 Projection of reference points on the image plane through the lens.
3.11 Comparison of empirical error with lower bounds, with and without considering
error due to Cricket.
4.1 Different degrees of overlap (k-overlap) for a camera.
5.4 Prototype of a Tier 1 Mote and CMUcam, and a Tier 2 Stargate, webcam and Mote.
CHAPTER 1
INTRODUCTION
1.1 Motivation
A sensor network—a wireless network of spatially distributed sensors—senses and
monitors physical phenomena. Many classes of applications exist and a few examples are:
(i) Surveillance and Tracking: The focus of surveillance and tracking applications is to
monitor an area of interest and report events of interest. Surveillance and tracking
applications [17, 43, 22] can use a deployment of sensors to detect and recognize objects of
interest and coordinate to track their movement. (ii) Disaster Response: In emergency
scenarios, where existing infrastructure has been damaged, a quick deployment of sensors [32]
provides valuable feedback for relief operations. (iii) Environmental Monitoring: Sensor
networks monitor environmental phenomena [56, 35] of temperature, soil moisture,
humidity, presence of chemicals, solar radiation etc., which in turn can be used for recording
observations and forecasting. Sensor network applications like landslide detection and
prediction [1, 53] can be used to determine the scope of a landslide and also to predict
occurrences. Sensors have also been used on volcanoes [64] to aid in-depth study of volcanic
activity. Further, sensor nodes can also be placed in natural habitats of animals [33, 24, 67]
to study their movements and environmental conditions. (iv) Seismic Structure Monitoring:
A network of seismic sensors monitors stress levels and seismic activity of buildings [66]
and bridges. Such systems are used to detect and localize damage to a structure and also
quantify its severity.
Figure 1.1. A typical sensor network consisting of sensors for sampling and sink nodes for
data collection.
Figure 1.1 shows a typical sensor network, consisting of sensors spread within an area
of interest and sink nodes that are interested in gathering events and sampled data from
the sensors. Further, sensors most often communicate with each other and the sink nodes
using a wireless network. As is the case in several of the example applications presented
above, sensors are deployed in areas with no infrastructure, like remote forests, volcanoes
or disaster areas where existing infrastructure has been destroyed, or on moving objects.
Such deployments lack a constant power supply and most often the nodes are battery
powered, making energy efficiency a primary concern. Sensor nodes, in addition to limited
energy, have limited computation and communication capabilities. Sensor network
deployments using these often resource-constrained devices have introduced a variety of
interesting research challenges. A few of the important research problems in the design of
sensor networks fall in the following categories:
operation and also to interface with different modality sensors based on application needs.
• Programming tools: Programming paradigms and operating system support are
required to add and modify functionality at sensors to drive applications.
• Networking: Sensors often need to coordinate with each other to transmit useful
data to the sink or collaborate to execute application tasks. Several networking issues
arise to meet these requirements: different types of radios and their characteristics,
entities that are interested in it. Issues in data management include aggregation to reduce
Limited resources and lack of continuous power supply are two main constraints in the
design of sensor networks. This thesis focuses on sensor networks with camera sensors. A
few examples of camera sensor network applications are: ad-hoc surveillance,
environmental monitoring, and live virtual tours. Regardless of the end-application, camera sensors
perform several common tasks such as object detection, object recognition and tracking.
The object detection task detects the appearance of objects, recognition identifies objects
of interest, and tracking follows object movements. A characteristic of camera sensors that
differentiates them from other modalities like temperature, acoustic, vibration etc. is that
they are directional. Each image sensor points in a certain direction and for a given location
can have different orientations resulting in different viewing regions.
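The directionality of cameras can be made concrete with a simple two-dimensional model: a camera at a given position with orientation θ and angular field of view φ views exactly those points within a maximum range of it and within φ/2 of its optical axis. The following Python sketch is an illustration with assumed parameter values, not code from this thesis:

```python
import math

def in_view(cam_pos, cam_theta, fov, max_range, point):
    """Return True if `point` lies inside the camera's 2D viewing cone.

    cam_pos:   (x, y) camera location
    cam_theta: orientation of the optical axis, in radians
    fov:       total angular field of view, in radians
    max_range: maximum sensing distance
    """
    dx, dy = point[0] - cam_pos[0], point[1] - cam_pos[1]
    if math.hypot(dx, dy) > max_range:
        return False
    # Angle between the optical axis and the direction to the point,
    # wrapped into [-pi, pi].
    angle = math.atan2(dy, dx) - cam_theta
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    return abs(angle) <= fov / 2

# Two cameras at the same location but with different orientations
# view different regions, as the text notes.
print(in_view((0, 0), 0.0, math.radians(60), 10, (5, 1)))        # True
print(in_view((0, 0), math.pi, math.radians(60), 10, (5, 1)))    # False
```

Two cameras at the same location but with different orientations accept different sets of points, which is why placement alone does not determine coverage for camera sensors.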
As mentioned above, research issues in camera sensor networks are several—development
of energy-efficient camera sensors, deployment and configuration of nodes, task allocation
and coordination for collaborative task execution, resource allocation at each node for
various tasks, communication of images and events to interested entities, and efficient storage
and archival of image data. In this thesis, I address the issues of automatic initialization of
camera sensors with varying resource capabilities and the design of multi-tier heterogeneous
camera sensor networks to achieve the simultaneous goals of energy efficiency and
reliability.
Initialization is an important prerequisite for camera sensor network applications.
Monitoring and surveillance applications using an ad-hoc camera sensor network require
initialization of cameras with location and orientation information. The camera calibration
parameters of orientation and location are essential for localization and tracking of detected
objects. While internal calibration parameters, like focal length, scaling factor and
distortion, can be estimated a priori, the external parameters can be estimated only after the
camera sensors are placed in an environment. Further, in cases where camera sensors have
no infrastructure support and are resource-constrained, estimating exact camera parameters
may not be feasible. In spite of such limitations, cameras need to be initialized with
information to enable applications.
Manual calibration of cameras is one possibility, but it is highly error prone, inefficient
and can take a long time (on the order of hours to calibrate several cameras). Several
vision-based calibration techniques have been studied, but they are not well-suited for camera
sensor networks. Vision-based techniques often rely on high-fidelity images and abundant
processing power, calibrate a single camera or a few cameras, and depend on knowing exact
locations of reference points (landmarks)—assumptions that are most often not applicable
to sensor networks. In this thesis, I propose Snapshot, an automatic calibration protocol
for camera sensors. Snapshot leverages the capabilities of position sensors for efficient and accurate
estimation of external parameters of cameras. Snapshot reduces the time required for
accurate calibration of camera networks from the order of hours to the order of minutes. The
Cricket [38] mote sensors, which localize themselves using ultrasound beacons, are used
as a calibration device along with captured images from the camera for calibration. I aim
to analytically characterize the different errors introduced by the calibration procedure and
empirically compare them with the errors observed using Snapshot.
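To make calibration from known reference points concrete, the following Python sketch implements the classical direct linear transform (DLT), which estimates a camera's full 3×4 projection matrix from six or more 3D-2D correspondences. This illustrates the general principle only; it is not Snapshot's algorithm, which assumes known intrinsics and therefore needs only four reference points.

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """Estimate a 3x4 projection matrix P with x ~ P X (homogeneous),
    from n >= 6 world points X (n x 3) and image points x (n x 2),
    via SVD of the stacked linear constraints."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        A.append([*Xh, 0, 0, 0, 0, *[-u * c for c in Xh]])
        A.append([0, 0, 0, 0, *Xh, *[-v * c for c in Xh]])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    # The solution is the right singular vector of the smallest
    # singular value, reshaped into a 3x4 matrix (defined up to scale).
    return Vt[-1].reshape(3, 4)

def project(P, X):
    """Project n x 3 world points through P to pixel coordinates."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    xh = (P @ Xh.T).T
    return xh[:, :2] / xh[:, 2:3]

# Synthetic check: build a camera, project random points, recover P.
rng = np.random.default_rng(1)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # intrinsics
Rt = np.hstack([np.eye(3), [[0.1], [-0.2], [5.0]]])          # extrinsics [R|t]
P_true = K @ Rt
X = rng.uniform(-1, 1, size=(8, 3))
x = project(P_true, X)
P_est = dlt_projection_matrix(X, x)
err = np.abs(project(P_est, X) - x).max()   # reprojection error, ~0 here
```

With noise-free synthetic correspondences the recovered matrix reproduces the projections essentially exactly; with real, noisy reference points, more points and a least-squares refinement are typically needed.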
However, accurate calibration techniques are not feasible for deployments of ad-hoc,
low-power camera sensors for the following reasons: (i) Resource constraints: Accurate
calibration of cameras is fairly compute intensive. Low-power cameras do not have the
computational capabilities to run such algorithms. (ii) Absence of
landmarks: In many scenarios, ad-hoc camera sensor networks are deployed in remote
locations for monitoring mountainous and forest habitats or for monitoring natural disasters
such as floods or forest fires. No landmarks may be available in remote uninhabited
locations. An alternative is to equip
each camera sensor with a positioning device such as GPS [5] and a directional digital
compass [11], which enable direct determination of the node location and orientation. However,
today's GPS technology has far too much error to be practical for calibration purposes (GPS
can localize an object to within 5-15 m of its actual position). Ultrasound-based positioning
and ranging technology [42] is an alternative that provides greater accuracy. But the use
of additional positioning hardware both consumes more energy on battery-powered
nodes and, in some cases, can be prohibitive due to cost. As a result, accurate
calibration is not always feasible for initialization of resource-constrained camera sensor
networks with limited or no infrastructure support. In this thesis, we answer the
fundamental question of whether it is possible to initialize resource-constrained camera sensors
Table 1.1. Types of camera sensor nodes, their computation capability and a suitable
calibration technique for each.

Sensor              Computation Capability   Type of Calibration
Cyclops             Limited                  Approximate
CMUcam + Mote       Limited                  Approximate
Webcam + Stargate   Abundant                 Accurate
PTZ + Stargate      Abundant                 Accurate
sensor nodes and also supports application requirements. The approximate initialization
depends on very limited infrastructure support and is well-suited for low-power camera sensors.
Table 1.1 shows different types of sensor nodes, their computation capabilities and a
suitable calibration technique for each. In this thesis, I study both accurate and approximate
initialization techniques. A common design choice is a single-tier network of homogeneous
sensors [15, 16]. Given a set of application requirements and tasks, an appropriate sensor
and embedded platform is chosen for the entire network. The choice of the hardware is
guided by the most demanding application task to be performed at each node. All sen-
sor nodes execute the same set of tasks and coordinate in a distributed manner to achieve
application requirements.
A homogeneous network has several design choices, with network lifetime often being a
primary constraint. For increased lifetime of the network, low power sensor nodes,
e.g., cell-phone class cameras, can be deployed.

Figure 1.2. A multi-tier heterogeneous camera sensor network.

Low power consumption nodes address the
lifetime constraint but often have lower reliability and functionality. The low power cell-
phone class cameras yield low resolution coarse grained images—sacrificing reliability for
network lifetime. Another design choice is to optimize the network for high reliability and
functionality by using high resolution webcams. The sensors produce high resolution im-
ages resulting in better reliability, but sacrifice lifetime as each node consumes considerably
more power than cell-phone class cameras. As a result, there exists a tradeoff between the
design choices of lifetime (energy efficiency) on one hand and reliability and functionality
on the other. A similar
tradeoff exists between energy efficiency and latency of detection. Sensor nodes deployed
to optimize energy efficiency result in higher-latency detections, as nodes sleep for longer
durations to save energy. Nodes deployed to minimize latency of detection result in lower
energy efficiency, as they are asleep for shorter durations. Energy efficiency and cost are
also similar conflicting design choices. Thus, a single choice along the axes of power, re-
liability and cost results in a sensor network that sacrifices one or more of the other key
requirements. As a result, homogeneous networks often achieve only a subset of the design
goals and sacrifice others.
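The lifetime-latency tension described above can be quantified with back-of-the-envelope arithmetic. In the sketch below, all battery and power figures are illustrative assumptions, not measurements from this thesis; the model simply averages power over a duty cycle and credits an event with a mean waiting time of half the sleeping portion of a period:

```python
# Simple duty-cycling model: a node is awake for a fraction d of each
# period T; an event arriving at a uniformly random time waits, on
# average, about (1 - d) * T / 2 before the node wakes to detect it.
BATTERY_J = 10_000.0   # assumed battery capacity (joules)
P_ACTIVE = 0.4         # assumed power when awake (watts)
P_SLEEP = 0.001        # assumed power when asleep (watts)
T = 10.0               # duty-cycle period (seconds)

def lifetime_days(d):
    """Expected lifetime at duty cycle d (0 < d <= 1)."""
    avg_power = d * P_ACTIVE + (1 - d) * P_SLEEP
    return BATTERY_J / avg_power / 86_400

def mean_detection_latency(d):
    """Mean wait before a sleeping node notices an event."""
    return (1 - d) * T / 2

for d in (0.01, 0.1, 1.0):
    print(f"duty cycle {d:4.0%}: lifetime {lifetime_days(d):6.1f} days, "
          f"mean latency {mean_detection_latency(d):4.2f} s")
```

Lowering the duty cycle stretches lifetime by orders of magnitude but pushes mean detection latency toward half the period, which is exactly the conflict a multi-tier design aims to reconcile.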
In this thesis, I propose a novel multi-tier design of sensor networks consisting of het-
erogeneous sensors. A multi-tier sensor network is a hierarchical network of heterogeneous
sensors as shown in Figure 1.2. The network consists of sensors with different capabilities
and power requirements at each tier. Referring to Figure 1.2, Tier 1 consists of low power
cell-phone class cameras, whereas Tier 2 consists of high power webcams. Tier 1 sensor
nodes can be used when energy efficiency is a primary constraint and nodes from Tier 2
when reliability is a primary constraint. Intelligent usage of nodes at each tier has the poten-
tial to reconcile the conflicting goals of energy efficiency and reliability and overcome the
drawback of homogeneous single-tier networks. Further, intelligent node placement can
enable usage of high power nodes at higher tiers only when required, with a wakeup
mechanism, resulting in energy benefits. Coverage of a region with nodes from multiple tiers
provides redundancy benefits. A multi-tier network can also balance cost, coverage,
functionality, and reliability. For instance, the lower tier of such a system can employ
cheap, untethered elements that can provide dense coverage with low reliability. However,
reliability concerns can be mitigated by seeding such a network with a few expensive, more
reliable sensors at a higher tier to compensate for the variability in the lower tier. Similarly,
a mix of low-fidelity, low-cost sensors and high-fidelity, high-cost sensors can be used to
achieve a balance between cost and functionality. Application performance can also be
improved by exploiting alternate sensing capabilities that may reduce energy requirements
without sacrificing system reliability. As a result, multi-tier sensor networks can exploit
the spectrum of available heterogeneous sensors to reconcile conflicting system goals and
overcome drawbacks of homogeneous networks.
This thesis addresses research challenges in multi-tier sensor network design and im-
plementation with focus on camera sensor networks. The rest of this chapter describes
contributions of the thesis and research issues addressed.
1.4 Thesis Contributions
My thesis makes the following contributions related to initialization of camera sen-
sor networks and energy-reliability tradeoff in multi-tier heterogeneous camera sensor net-
works:
port application requirements. Due to the varying capabilities at each node and the nature
of infrastructure support, a uniform solution to initialize nodes at all tiers is not
feasible. In this thesis, I propose techniques for accurate and approximate initialization
The contributions of this thesis related to initialization of camera sensors are as fol-
lows:
networks.
– I have shown that using the Cricket position sensors to automate the calibration
– The proposed accurate and approximate initialization methods demonstrate fea-
sibility of initializing low-power low-fidelity camera sensors quickly and effi-
ciently.
In this thesis, I argue that multi-tier heterogeneous sensor networks can achieve si-
multaneous system goals which are seldom possible in single-tier homogeneous net-
works. I study the energy-reliability tradeoff and demonstrate that a multi-tier net-
work can simultaneously achieve these conflicting goals.
– I have designed and implemented a multi-tier camera sensor network and demon-
– Using the tasks of object detection, recognition and tracking, I have quantified
network and found that a multi-tier network can obtain comparable reliability
network developed to study the tradeoffs of energy usage and object detection accuracy.
CHAPTER 2
RELATED WORK
This thesis draws upon numerous research efforts in camera sensors. There has been
work in the broad topics of system-level issues, initialization and configuration of sensors,
and design of various applications using camera sensor networks. This chapter gives an
overview of these related research efforts and places the contributions of this thesis in
context.
Embedded sensing platforms vary from embedded PCs and PDA-class Intel Stargates [55]
to Crossbow Telos nodes [40] and Motes [37] (see Table 2.1). Commonly used
communication technologies vary from infra-red (IR) and Bluetooth to RF-based standards like
802.11 and 802.15.4. Several modalities of sensors interface with the above embedded
platforms to sense and monitor different kinds of phenomena. A few different sensing
modalities are: acoustic, vision, temperature,
humidity, vibration, light etc. Considering camera sensors, the available sensors range
from high-end pan-tilt-zoom (PTZ) cameras, Webcams [31] to low-fidelity cellphone class
cameras like Cyclops [45] and CMUcams [9] (see Table 2.2). The available choices span
the spectrum of form factor, cost, reliability, functionality and power consumption. These
developments in turn have resulted in a major research focus in the field of sensor networks
and its applications.
Table 2.1. Embedded sensing platforms and their resources.

Platform    Processor                     Resources
Mica Mote   Atmega128 (6 MHz)             84 mW, 4 KB RAM, 512 KB Flash
Telos       TI MSP430 (8 MHz)             40 mW, 10 KB RAM, 48 KB Flash
Yale XYZ    OKI ArmThumb (2-57 MHz)       7-160 mW, 32 KB RAM, 2 MB external
Stargate    XScale PXA255 (100-400 MHz)   170-400 mW, 32 MB RAM, Flash and CF card slots
Several studies have focused on single-tier camera sensor networks. Panoptes [62]
is an example of a video sensor node built using an Intel StrongARM PDA platform with a
Logitech Webcam as the vision sensor. The node uses the 802.11 wireless interface and can
is similar to Panoptes, with additional support for network wake ups and optimized wakeup-
from-suspend energy saving capability. Panoptes also incorporates compression, filtering
and buffering and adaptation mechanisms for the video stream and can be used by higher
tier nodes of SensEye. Other types of multimedia sensors, like audio sensors [61], have
also been used for calibration and localization applications.
2.1.3 Power management
Power management schemes, like wake-on-wireless [14] and Triage [6], are techniques
to efficiently use the limited battery power and thus extend the lifetime of sensor platforms. The
wake-on-wireless solution uses an incoming call to wake up the PDA and reduces power
consumption by shutting down the PDA when not in use. Triage is a software architecture
for tiered micro-servers, which contains more than one subsystem with different capabili-
ties and power requirements. The architecture uses an approach called Hierarchical Power
Management [54], which through intelligent software control reduces the amount of time a
higher power tier must remain on by executing tasks whenever possible at lower tiers. The
SensEye higher tier nodes are optimized using both the above solutions.
There exist several sensor network applications comprising heterogeneous and hybrid
sensor nodes. "Do-Not-Disturb" [20] is a heterogeneous sensor network that uses
acoustic and motion sensors for low-level sampling and resource-efficient Stayton nodes
(equipped with Intel XScale processors). The low-power sensors transmit noise-level
readings to resource-rich nodes for correlation and fusion tasks, and to identify and send alert
messages to appropriate nodes. The Cane-toad monitoring application [24] also uses a
prototype consisting of heterogeneous sensors—low power Mica2 nodes and resource rich
Stargate nodes. The low-power nodes are used for high frequency acoustic sampling and
the Stargates for compute intensive machine learning tasks and calculation of Fast Fourier
transforms. While both these applications use the higher-tier resource-rich nodes for com-
putation and communication services, SensEye also uses the higher tier nodes to sense
and increase reliability. Tenet [19] describes a generic architecture and the associated
research challenges of tiered sensor networks. The tiered architecture consists of resource-
rich master nodes which are responsible for data fusion and application logic tasks. Master
nodes can further task lower-level mote nodes to perform basic sampling tasks. The
architecture and research challenges discussed overlap with the motivation for SensEye and are
similar to our work [28].
An important criterion of sensor networks is placement and coverage. Single-tier
placement of cameras is studied in [60]. The paper solves the problem of efficient placement
of cameras, given an area to be covered, to meet task-specific constraints. This method
provides solutions for the single-tier placement problem and is useful to place each tier of
a multi-tier network. Key configuration parameters of sensor networks are: location,
orientation, set of neighbors and route setup. Localization of sensor
nodes in ad-hoc networks for localizing events and geographic routing is discussed in [52].
The technique depends on the presence of a few beacon nodes and localizes nodes using
distributed iterative algorithms. In [46], the authors develop techniques to estimate virtual
coordinates for nodes with very limited or no information regarding location information.
The virtual coordinates assigned to nodes are used for geographical routing purposes. I
have borrowed ideas from [36] and [51], which derive lower bounds for location and
orientation estimates of sensor nodes. Both studies identify the sources of errors and use
Cramér-Rao bound analysis based on Euclidean distance and angle-of-arrival measurements
to derive error bounds. In this thesis, I apply a similar analysis to consider localization
based on the relation between a set of reference points and their projection locations.
2.2.3 Camera Calibration
Camera calibration using a set of known reference points is well studied in the computer
vision community. Methods developed in [58, 59, 68] are examples of techniques that esti-
mate both the intrinsic and extrinsic parameters of a camera using a set of known reference
points. The goal of these efforts is to estimate a complete set of about twelve parameters
of the camera. As a result, the methods require a larger number of reference points, are
compute-intensive, and require multiple stages to determine all parameters. Snapshot is
designed to estimate only the extrinsic parameters and requires only four known reference
locations to estimate a camera’s parameters. A recent effort [63] has proposed techniques
to estimate only the extrinsic parameters and also requires four reference points. The tech-
nique requires three out of the four reference locations to be collinear. Snapshot is similar
to some of these calibration techniques proposed by the vision community, but differs in the
use of the Cricket position sensors to automate the protocol. Further, our empirical
evaluation shows that the use of Cricket introduces very small error.
2.3 Applications
2.3.1 Video Surveillance
A distributed video surveillance sensor network is described in [17]. The video sensor
network is used to solve the problem of attention to events in the presence of limited
computation, limited bandwidth and numerous event occurrences. The system implements
processing at cameras to filter out uninteresting and redundant events and tracks abnormal
movements. An example of a single-tier video surveillance and monitoring system is
VSAM [43]. The main objective of the system is to use multiple, cooperative video sensors for
continuous tracking and coverage. The system develops sophisticated techniques for target
detection, classification and tracking and also a central control unit to assign sensors to
tracking tasks. A
cal description and representation of events and learning-based classification. The system
uses a hierarchical master-slave configuration, where each slave camera station tracks lo-
cal movements and relays information to the master for fusion and global representation.
While our general aim is to build similar systems, we focus on systems, networking and
Localization is well studied in the sensor networks community [21, 52, 65]. All these
techniques assume a sensor node capable of position estimation. For example, a temperature
sensor can use its RF wireless communication link to send and receive beacons for location
estimation. Snapshot does not require any position estimation capability on the nodes and
directly uses the imaging capability of the cameras for localization and calibration.
Several positioning and self-localization systems have been proposed in the literature.
Active Badge [2] is a locationing system based on IR signals, where IR signals emitted by
badges are used for location estimation. A similar successor system, based on ultrasound
signals, is the Active Bat [3] system. Several other systems use RF signal strength
measurements, like RADAR [4], for triangulation-based localization. While most of these techniques
are used indoors, GPS [5] is used for outdoor localization. While any of these methods can
be used by the Snapshot calibration device instead of the Cricket, each has its own advantages
and disadvantages. Based on the environment and desired error characteristics, a suitable
positioning system can be chosen.
Bayesian techniques like the Kalman filter [26] and its variants [18, 50] have been used
extensively for track and trajectory prediction in several applications. [39] and [29] have used
Kalman filters to model user mobility and predict trajectory in cellular and ATM networks
for advance resource reservation, advance route establishment and efficient seamless
handoff across base stations. [34] used switching Kalman filters to track meteorological
features over time to determine future radar sensing decisions. While these applications use
Kalman filters for track prediction and optimizing performance of single-tier networks, as
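The track-prediction loop underlying these systems can be sketched with a minimal constant-velocity Kalman filter. The matrices and noise levels below are generic textbook choices, not parameters from any of the cited systems:

```python
import numpy as np

# Minimal 1D constant-velocity Kalman filter for track prediction.
# State: [position, velocity]; measurement: noisy position.
dt = 1.0
F = np.array([[1, dt], [0, 1]])    # state transition (constant velocity)
H = np.array([[1.0, 0.0]])         # we observe position only
Q = 0.01 * np.eye(2)               # process noise covariance (assumed)
R = np.array([[1.0]])              # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    """One predict-update cycle for measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a target moving at constant velocity 1.5 under noisy observations.
rng = np.random.default_rng(0)
x_est, P_est = np.zeros(2), 10 * np.eye(2)
true_pos, true_vel = 0.0, 1.5
for _ in range(50):
    true_pos += true_vel * dt
    z = np.array([true_pos + rng.normal(0, 1.0)])
    x_est, P_est = kalman_step(x_est, P_est, z)
# After enough steps the velocity estimate approaches the true 1.5,
# and predicting ahead is simply x_pred = F @ x_est.
```

The same predict-update structure generalizes to 2D tracks and to the handoff and resource-reservation uses cited above; only the state, transition, and noise models change.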
CHAPTER 3
3.1 Introduction
Typical applications of camera sensor networks include active monitoring of remote
environments and surveillance tasks such as object detection, recognition, and tracking.
Video surveillance and monitoring involves interaction and coordination between multiple
cameras, for instance, to hand-off tracking responsibilities for a moving object from one
camera to another. If a camera has an estimate of the tracked object’s location and knows
the set of cameras that view objects in that region, it can handoff tracking responsibilities
to the appropriate next camera. Object localization is one technique that is used to estimate
an object's location and combine it with information regarding the location and orientation of other
cameras for effective handoff. The procedure used to calculate the camera's parameters1
(location, orientation, focal length, skew factor and distortion) is known as camera calibration.
Once a camera is calibrated, its viewing range can be used to estimate the
overlap and spatial relationships with other calibrated cameras in the network. Redundant
cameras can then be identified in the network. Cameras can be intelligently duty-cycled, with a minimal subset of cameras
guaranteeing complete coverage, while others are in power-save mode.
Automated camera calibration is well studied in the computer vision community [58,
59, 63, 68]. Many of these techniques are based on the classical Tsai method—they require
1 In our work we focus only on external camera parameters and assume internal parameters to be known or estimated a priori.
a user to specify reference points on a grid whose true locations are known in the physical
world and use the projection of these points on the camera image plane to determine
camera parameters. However, such vision-based calibration techniques may not be directly
applicable to camera sensor networks for the following reasons. First, the vision-based
systems tend to use high-resolution cameras as well as high-end workstations for image
and video processing; consequently, calibration techniques can leverage the availability of
high-resolution images and abundance of processing power. Neither assumption is true
in sensor networks. Such networks may employ low-power, low-fidelity cameras such as
the CMUcam [48] or Cyclops [45] that have coarse-grain imaging capabilities; at best, a
mix of low-end and a few high-end cameras can be assumed for such environments. Further,
the cameras may be connected to nodes such as the Crossbow Motes [37] or Intel
Stargates [55] that have one or two orders of magnitude less computational resources than
PC-class workstations. Calibration techniques for camera sensor networks need to work within these resource constraints.
Second, vision-based calibration techniques have been designed to work with a single
camera or a small group of cameras. In contrast, a camera sensor network may comprise
tens or hundreds of cameras and calibration techniques will need to scale to these larger
environments. Further, camera sensor networks are designed for ad-hoc deployment, for
instance, in environments with disasters such as fires or floods. Since quick deployment
is crucial in such environments, it is essential to keep the time required for calibrating the
system to a minimum. Thus, calibration techniques need to be scalable and designed for
quick deployment.
Third, vision-based camera calibration techniques are designed to determine both intrinsic
parameters (e.g., focal length, lens distortion, principal point) and extrinsic parameters
(e.g., location and orientation) of a camera. Due to the large number of unknowns,
the calibration process typically involves many tens of measurements of reference points
and is computationally intensive. In contrast, calibrating a camera sensor network involves
only determining external parameters such as camera location and orientation, and may be
amenable to simpler, more efficient techniques that are better suited to resource-constrained
sensor platforms.
Automated localization is a well-studied problem in the sensor community, and a slew
of techniques have been proposed. Localization techniques employ beacons (e.g., IR [2],
ultrasound [3], RF [4]) and use sophisticated triangulation techniques to determine the
location of a node. Most of these techniques have been designed for general-purpose sensor
networks, rather than camera sensor networks in particular. Nevertheless, they can be
employed during calibration, since determining the node location is one of the calibration
tasks. Localization alone, however, is not sufficient for calibration. Cameras are directional
sensors, and camera calibration also involves determining other parameters such as the
orientation of the camera (where a camera is pointing) as well as its range (what it can
see). In addition, calibration is also used to determine the visual range of each camera and
its overlap with neighboring cameras.
The design of an automated calibration technique that is cost-effective and yet scalable
is thus a key challenge. In this chapter, we propose Snapshot, a novel wireless protocol for
calibrating camera sensor networks. Snapshot advances prior work in vision-based calibration
and sensor localization in important ways. Unlike vision-based techniques that require
tens of reference points for calibration and impose restrictions on the placement of these
points in space, Snapshot requires only four reference points to calibrate each camera sensor
and allows these points to be randomly chosen without restrictions. Both properties are
crucial for sensor networks, since fewer reference points and fewer restrictions enable faster
calibration and reduce the computational overhead for subsequent processing. Further, unlike
sensor localization techniques that depend on wireless beacons, Snapshot does not require
any specialized positioning equipment on the sensor nodes. Instead, it leverages the inher-
ent picture-taking abilities of the cameras and the on-board processing on the sensor nodes
to calibrate each node. Our results show Snapshot yields accuracies that are comparable to
those obtained by using positioning devices such as the ultrasound-based Cricket on each node.
Our techniques can be instantiated into a simple, quick and easy-to-use wireless cali-
bration protocol—a wireless calibration device is used to define reference points for each
camera sensor, which then uses principles from geometry, optics and elementary machine
vision to calibrate itself. When more than four reference points are available, a sensor can
use median filtering and maximum likelihood estimation techniques to improve the accuracy
of its estimates.
We have evaluated Snapshot using our prototype implementation. Our experiments yield the following key results:
2. Accuracy: We show that Snapshot can localize a camera to within a few centimeters
of its actual location and determine its orientation with a median error of 1.3–2.5
degrees. More importantly, our experiments indicate that this level of accuracy is
sufficient for tasks such as object tracking. We show that a system calibrated with
Snapshot can localize an external object to within 11 centimeters of its actual loca-
tion, which is adequate for most tracking scenarios.
4. Scalability: We show that Snapshot can calibrate a camera sensor in about 20 seconds
on current hardware. Since a human needs to only specify a few reference
points using the wireless calibration device—a process that takes a few seconds per
sensor—Snapshot can scale to networks containing tens of camera sensors.
a calibration device. Snapshot estimates the camera location coordinates and orientation
using the acquired images, principles of optics and the solution of a non-linear optimization
problem. These capabilities are available on a high-end sensor node, and hence this
represents an example of accurate calibration.
As part of the study of Snapshot, we have empirically studied the accuracy of location
and orientation estimates for two different types of cameras, the CMUcam and the Sony
webcam, to characterize the error in the calibration procedure. I plan to compare the empirical results with the
analytical lower bounds.
The basic Snapshot protocol involves taking pictures of a small randomly-placed calibration
device. To calibrate each camera sensor, at least four pictures of the device are
necessary, and no three positions of the device may lie along a straight line. Each position
of the calibration device serves as a reference point; the coordinates of each reference point
are assumed to be known and can be automatically determined by equipping the calibration
device with a locationing sensor (e.g., GPS or an ultrasound-based Cricket receiver). Next, we
describe how Snapshot uses the pictures and coordinates of the calibration device to estimate
camera parameters. We also discuss how the estimates can be refined when additional
reference points are available.

Figure 3.1. Left Handed Coordinate System.
We begin with the intuition behind our approach. Without loss of generality, we assume all
coordinate systems are left handed (see Figure 3.1), and the z-axis of the camera coordinate
system is co-linear with the camera's optical axis. Consider a camera sensor C whose
location is unknown. Suppose that four reference points are given along with their
coordinates for determining the camera location. No assumption is made about the placement
of these points in the three dimensional space, except that these points be in visual
range of the camera and that no three of them lie along a straight
line. Consider the first two reference points R1 and R2 as shown in Figure 3.2. Suppose
that point objects placed at R1 and R2 project an image of P1 and P2 , respectively, in the
camera's image plane as shown in Figure 3.2. Further, let θ1 be the angle incident by
the reference points on the camera. Since θ1 is also the angle incident by P1 and P2 on
the camera lens, we assume that it can be computed using elementary optics (as discussed
later). Given θ1 , R1 and R2 , the problem of finding the camera location reduces to finding
a point in space where R1 and R2 impose an angle of θ1 . With only two reference points,
there are infinitely many points where R1 and R2 impose an angle of θ1 . To see why,
consider Figure 3.3(a) that depicts the problem in two dimensions. Given R1 and R2 , the
set of possible camera locations lies on the arc R1 CR2 of a circle such that R1 R2 is a
Figure 3.2. Projection of reference points on the image plane through the lens.

[Figure 3.3: (a) in two dimensions, the camera C lies on the arc R1 C R2 of a circle in which the chord R1 R2 inscribes the angle θ1; (b) rotating the arc about the axis R1 R2 yields a surface of possible camera locations.]
chord of the circle and θ1 is the angle incident by this chord on the circle. From elementary
geometry, it is known that a chord of a circle inscribes a constant angle on any point on
the corresponding arc. Since we have chosen the circle such that chord R1 R2 inscribes an
angle of θ1 on it, the camera can lie on any point on the arc R1 CR2 . This intuition can be
generalized to three dimensions by rotating the arc R1 CR2 in space with the chord R1 R2 as
the axis (see Figure 3.3(b)). Doing so yields a three dimensional surface of possible camera
locations. The nature of the surface depends on the value of θ1 : the surface is shaped like
a football when θ1 > 90◦ , is a sphere when θ1 = 90◦ , and a double crown when θ1 < 90◦ .
The camera can lie on any point of this surface.
Next, consider the third reference point R3 . Considering points R1 and R3 , we obtain
another surface that consists of all possible locations such that R1 R3 impose a known angle
θ2 on all points of this surface. Since the camera must lie on both surfaces, it follows that the
[Figure 3.4: (a) vectors v1 and v2 from the camera C at (x, y, z) to reference points R1 (x1, y1, z1) and R2 (x2, y2, z2); (b) vectors u1 and u2 from the lens center C (0, 0, 0) to the projections P1 (−px1, −py1, −f) and P2 (−px2, −py2, −f) on the image plane, with focal length f.]
set of possible locations is given by the intersection of these two surfaces. The intersection
of two surfaces is a closed curve and the set of possible camera locations is reduced to any point on this curve.
Finally, if we consider the pair of reference points R2 and R3 , we obtain a third surface
of all possible camera locations. The intersection of the first surface and the third yields
a second curve of possible camera locations. The camera lies on the intersection of these
two curves, and the curves can intersect in multiple points. The number of possible camera
locations can be reduced further to at most four by introducing the fourth reference point R4.
In reality, only one of these locations can generate the same projections as R1, R2, R3, and
R4 on the image plane. Using elementary optics, it is easy to eliminate the false solutions.
Let the four reference points have coordinates (x1, y1, z1) . . . (x4, y4, z4). The line joining
the camera C with each of these reference points defines a vector. For instance, as shown
in Figure 3.4(a), the line joining C and R1 defines a vector $\overrightarrow{CR_1}$, denoted by $\vec{v}_1$. The
components of $\vec{v}_1$ are given by
$$\vec{v}_1 = \overrightarrow{CR_1} = \{x_1 - x,\ y_1 - y,\ z_1 - z\}$$
Similarly, the vector joining points C and $R_i$, denoted by $\vec{v}_i$, is given as
$$\vec{v}_i = \overrightarrow{CR_i} = \{x_i - x,\ y_i - y,\ z_i - z\}, \qquad 1 \le i \le 4$$
As shown in Figure 3.4(a), let θ1 denote the angle between vectors $\vec{v}_1$ and $\vec{v}_2$. The dot
product of the two vectors gives
$$\vec{v}_1 \cdot \vec{v}_2 = |\vec{v}_1|\,|\vec{v}_2|\cos(\theta_1) \qquad (3.2)$$
where
$$|\vec{v}_1| = \sqrt{(x_1 - x)^2 + (y_1 - y)^2 + (z_1 - z)^2}$$
The magnitude of $\vec{v}_2$ is defined similarly. Substituting these values into Equation 3.2, we get
$$\cos(\theta_1) = \frac{(x_1 - x)(x_2 - x) + (y_1 - y)(y_2 - y) + (z_1 - z)(z_2 - z)}{|\vec{v}_1| \cdot |\vec{v}_2|} \qquad (3.3)$$
Let θ2 through θ6 denote the angles between vectors $\vec{v}_1$ and $\vec{v}_3$, $\vec{v}_1$ and $\vec{v}_4$, $\vec{v}_2$ and $\vec{v}_3$, $\vec{v}_2$
and $\vec{v}_4$, and $\vec{v}_3$ and $\vec{v}_4$, respectively. Similar expressions can be derived for θ2, θ3, . . . , θ6.
The angles θ1 through θ6 can be computed using elementary optics and vision, as dis-
cussed next. Given these angles and the coordinates of the four reference points, the above
expressions yield six quadratic equations with three unknowns: x,y, and z. A non-linear
solver can be used to numerically solve for these unknowns.
We now present a technique to compute the angle between any two vectors $\vec{v}_i$ and $\vec{v}_j$.
Consider any two reference points R1 and R2 as shown in Figure 3.4(a). Figure 3.4(b)
shows the projection of these points through the camera lens onto the image plane. The
image plane in a digital camera consists of a CMOS sensor that takes a picture of the
camera view. Let P1 and P2 denote the projections of the reference points on the image
plane as shown in the Figure 3.4(b), and let f denote the focal length of the lens. For
simplicity, we define all points with respect to the camera’s coordinate system: the center
of the lens is assumed to be the origin in this coordinate system. Since the image plane
is at a distance f from the lens, all points on the image plane are at a distance f from
the origin. By taking a picture of the reference points, the coordinates of P1 and P2 can
be determined. These are simply the pixel coordinates where the reference points project
their image on the CMOS sensor; these pixels can be located in the image using a simple
vision-based recognition algorithm2. The coordinates of P1 and P2 are
(−px1, −f, −pz1) and (−px2, −f, −pz2), respectively. We define vectors $\vec{u}_1$ and $\vec{u}_2$ as lines
joining the camera (i.e., the origin C) to the points P1 and P2. Then, the angle θ1 between
the two vectors $\vec{u}_1$ and $\vec{u}_2$ can be determined by taking their dot product:
$$\cos(\theta_1) = \frac{\vec{u}_1 \cdot \vec{u}_2}{|\vec{u}_1|\,|\vec{u}_2|}$$
The inverse cosine transform yields θ1, which is also the angle incident by the original
reference points on the camera. The remaining angles θ2 through θ6 are computed in the
same fashion, and Snapshot solves the resulting six quadratic equations using a non-linear
optimization algorithm [10] to estimate the camera location.
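To make these two steps concrete, the following sketch recovers a camera location from the six pairwise angles. The coordinates are hypothetical, and a plain Gauss-Newton iteration with a numeric Jacobian stands in for the non-linear solver of [10]:

```python
import numpy as np
from itertools import combinations

# Hypothetical setup: a camera at (1, 2, 3) and four reference points in
# general position (no three collinear).
cam_true = np.array([1.0, 2.0, 3.0])
refs = np.array([[4.0, 0.0, 0.0],
                 [0.0, 5.0, 1.0],
                 [3.0, 4.0, 7.0],
                 [6.0, 2.0, 2.0]])
pairs = list(combinations(range(4), 2))   # six pairs -> angles theta_1..theta_6

def pairwise_angles(cam):
    """Angles between vectors from the camera to each pair of reference points."""
    v = refs - cam
    cos = [v[i] @ v[j] / (np.linalg.norm(v[i]) * np.linalg.norm(v[j]))
           for i, j in pairs]
    return np.arccos(np.clip(cos, -1.0, 1.0))

# In Snapshot these angles come from pixel projections via Equation 3.3;
# here we synthesize them from the ground truth.
theta = pairwise_angles(cam_true)

# Solve the six equations for (x, y, z) by Gauss-Newton.
x = np.array([0.5, 1.5, 2.5])             # initial guess near the solution
for _ in range(50):
    r = pairwise_angles(x) - theta        # residuals of the six angle equations
    J = np.empty((6, 3))
    for k in range(3):                    # central-difference Jacobian
        d = np.zeros(3); d[k] = 1e-6
        J[:, k] = (pairwise_angles(x + d) - pairwise_angles(x - d)) / 2e-6
    x += np.linalg.lstsq(J, -r, rcond=None)[0]
```

With a good initial guess the iteration converges to the true location; in general the system admits up to four solutions, which is why the elimination step of Section 3.2.1 is needed.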
We now describe the technique employed by Snapshot to determine the camera's orientation
along the three axes. We assume that the camera location has already been estimated
using the technique in the previous section. Given the camera location (x, y, z),
2 In Snapshot the calibration device contains a colored LED and the vision-based recognizer must locate this LED in the corresponding image.
our technique uses three reference points to determine the pan, tilt, and roll of the camera.
Intuitively, given the camera location, we need to align the camera in space so that the three
reference points project an image at the same location as the pictures taken by the camera.
Put another way, consider a ray of light emanating from each reference point. The camera
needs to be aligned so that each ray of light pierces the image plane at the same pixel where
the image of that reference point is located. One reference point is sufficient to determine
the pan and tilt of the camera using this technique, and three reference points are sufficient
to uniquely determine all three parameters: pan, tilt and roll. Our technique uses the actual
coordinates of three reference points and the pixel coordinates of their corresponding
images to determine the unknown rotation matrix R that represents the pan, tilt and roll of the
camera.
Assume that the camera is positioned at coordinates (x, y, z) and that the camera has a
pan of α degrees, a tilt of β degrees, and a roll of γ degrees. The pan, tilt and roll rotations can
be represented as matrices, and can be used to calculate locations of points in the camera’s
coordinate space. The composite matrix for the pan, tilt and roll rotations of the camera is
$$R = \begin{pmatrix} \cos\gamma & 0 & \sin\gamma \\ 0 & 1 & 0 \\ -\sin\gamma & 0 & \cos\gamma \end{pmatrix} \times \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & \sin\beta \\ 0 & -\sin\beta & \cos\beta \end{pmatrix} \times \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \qquad (3.4)$$
If an object is located at (xi, yi, zi) in the world coordinates, the object's location in the
camera coordinates $(x_i', y_i', z_i')$ can be computed via Equation 3.5:
$$\begin{pmatrix} x_i' \\ y_i' \\ z_i' \end{pmatrix} = R \times \begin{pmatrix} x_i - x \\ y_i - y \\ z_i - z \end{pmatrix} \qquad (3.5)$$
where the composite rotation matrix R is given by Equation 3.4.
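As a small numpy illustration of Equations 3.4 and 3.5 (the angles and points below are made up):

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Composite pan/tilt/roll rotation of Equation 3.4 (angles in radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    roll = np.array([[cg, 0.0, sg], [0.0, 1.0, 0.0], [-sg, 0.0, cg]])
    tilt = np.array([[1.0, 0.0, 0.0], [0.0, cb, sb], [0.0, -sb, cb]])
    pan = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    return roll @ tilt @ pan

def world_to_camera(obj, cam, R):
    """Equation 3.5: translate to the camera origin, then rotate."""
    return R @ (np.asarray(obj) - np.asarray(cam))

R = rotation_matrix(np.radians(30), np.radians(10), np.radians(5))
p = world_to_camera([2.0, 3.0, 1.0], [1.0, 1.0, 1.0], R)
```

Since R is a product of rotations it is orthonormal, so the transform preserves distances from the camera center.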
[Figure 3.5: an object's projection Pi = (pxi, f, pzi) on the (front) image plane, with the camera center C at (0, 0, 0) and distances Di and Dp.]
Intuitively, we can construct and solve a set of linear equations (see Equation 3.6) where
(x1, y1, z1), (x2, y2, z2), and (x3, y3, z3) are the world coordinates of the three reference points,
and $(x_1', y_1', z_1')$, $(x_2', y_2', z_2')$, and $(x_3', y_3', z_3')$ are the corresponding camera coordinates,
to recover the entries of R. It is easy to see that as these three reference points are not co-linear, the matrix
$$\begin{pmatrix} x_1 - x & y_1 - y & z_1 - z \\ x_2 - x & y_2 - y & z_2 - z \\ x_3 - x & y_3 - y & z_3 - z \end{pmatrix}$$
is non-singular, and hence, the three sets of linear equations in Equation 3.6 have a unique solution.
As shown in Figure 3.5, an object's location in the camera coordinates and the projection
of the object on the image plane have the following relation:
$$\begin{pmatrix} x_i' \\ y_i' \\ z_i' \end{pmatrix} = \frac{D_i}{D_p} \times \begin{pmatrix} px_i \\ f \\ pz_i \end{pmatrix} \qquad (3.7)$$
where
$$D_i = \sqrt{(x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2} \quad \text{and} \quad D_p = \sqrt{px_i^2 + f^2 + pz_i^2}$$
Di and Dp represent the magnitudes of the object-to-camera-center vector and the projection
vector, respectively.
Recall from Section 3.2.1.1 that our six quadratic equations yield up to four possible
solutions for the camera location. Only one of these solutions is the true camera location. To
eliminate false solutions, we compute the pan, tilt and roll for each computed location using
three reference points. The fourth reference point is then used to eliminate false solutions as
follows: for each computed location and orientation, we project the fourth reference point
onto the camera’s image plane. The projected coordinates are then matched to the actual
pixel coordinates of the reference point in the image. The projected coordinates will match
the pixel coordinates only for the true camera location. Thus, the three false solutions can
be eliminated by picking the solution with the smallest re-projection error. The chosen
solution is always guaranteed to be the correct camera location.
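The elimination step can be sketched as follows. The candidate locations below are hypothetical, and the projection follows Equation 3.7 with the y-axis as the optical axis:

```python
import numpy as np

f = 0.05  # assumed focal length for this sketch

def project(p_cam):
    """Project a point in camera coordinates onto the image plane (Eq 3.7)."""
    x, y, z = p_cam
    return np.array([f * x / y, f * z / y])   # (px, pz); y is the optical axis

# One true location and two hypothetical false solutions of the quadratic system,
# all with the same (identity) orientation for simplicity.
R_id = np.eye(3)
candidates = [np.array([0.0, 0.0, 0.0]),      # true location
              np.array([0.5, 0.1, 0.2]),
              np.array([-0.4, 0.3, 0.0])]
ref4 = np.array([0.3, 2.0, -0.4])             # fourth reference point (world)
observed = project(R_id @ (ref4 - candidates[0]))   # its observed pixel coords

# Re-project the fourth reference point under each candidate pose and keep
# the candidate with the smallest re-projection error.
errors = [np.linalg.norm(project(R_id @ (ref4 - c)) - observed)
          for c in candidates]
best = int(np.argmin(errors))
```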
[Figure 3.6: the viewable region of a camera modeled as a polyhedron with apex at the camera/lens center C and corners P, Q, R, S; A marks a corner of the image sensor, with the focal length and the maximum viewable distance also shown.]
Once the location and orientation of each camera have been determined, the next task is
to determine the visual range of each camera and the overlap of viewable regions between
cameras. Knowledge of overlap can be used to determine redundancy in sensor coverage
in the environment. Overlapping cameras can also be used to localize and track moving
objects. The viewable region of a camera can be modeled as a polyhedron, as shown in
Figure 3.6. The apex of the polyhedron is the location of the camera C (also the lens center)
and the height of the pyramid is the maximum viewable distance of the camera. An object
within this polyhedron is in the visual range of the camera.
Although a camera can view infinitely distant objects, such objects will appear as point
objects in any picture taken by the camera and are not useful for tasks such as object
detection and recognition. Thus, it is necessary to artificially restrict the viewable range
of the camera; the maximum viewable distance is determined in an application-specific
manner and depends on the sizes of the objects being monitored (the larger the object, the
greater is the maximum viewable distance of each camera). Assuming that this distance is
determined offline, Snapshot can then precisely determine the polyhedron that encompasses
the viewable range of the camera (assuming no obstacles such as walls are present to cut off this region).
Assume that the camera location (x, y, z) is given. We also assume that the size of the
camera CMOS sensor is known (specifications for digital cameras typically specify the size
of the internal CMOS sensor). Since the CMOS sensor is placed at a focal length distance
from the lens, the coordinates of the four corners of the sensor can be determined relative
to the camera location (x, y, z). As shown in Figure 3.6, the polyhedron is fully defined
−→ −→ −→ −→
by specifying vectors CP , CQ, CR and CS which constitute its four edges. Further,
−→ −→
CP = fd · AC, where AC is the line segment joining the edge of the CMOS sensor to
the center of the lens, and d is the maximum viewable distance of the camera. Since the
−→ −→
coordinates of points A and C are known, the vector AC is known, and CP can then be
determined. The four edges of the polyhedron can be determined in this fashion.
To determine the overlap between two cameras, we need to determine whether their polyhedrons
intersect (the intersection indicates the region in space viewable from both cameras).
To determine if two polyhedrons intersect, we consider each surface of the first
polyhedron and determine if one of the edges of the other polyhedron intersects this surface.
For instance, does the line segment CP intersect any of the four surfaces of the other
polyhedron? If any edge intersects a surface of the other polyhedron, then the two cameras
have overlapping viewable regions. The intersection of a line segment with a plane can be
easily represented in vector algebra using vector cross and dot products [25] and we omit
the details here.
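A sketch of the edge-surface test using only cross and dot products, for a triangular face of the polyhedron (the scene geometry below is made up):

```python
import numpy as np

def segment_hits_triangle(p0, p1, a, b, c, eps=1e-9):
    """Does segment p0->p1 cross triangle (a, b, c)?  A Moller-Trumbore-style
    test built from vector cross and dot products."""
    d = p1 - p0
    e1, e2 = b - a, c - a
    h = np.cross(d, e2)
    det = e1 @ h
    if abs(det) < eps:          # segment parallel to the triangle's plane
        return False
    s = p0 - a
    u = (s @ h) / det           # first barycentric coordinate
    if u < 0 or u > 1:
        return False
    q = np.cross(s, e1)
    v = (d @ q) / det           # second barycentric coordinate
    if v < 0 or u + v > 1:
        return False
    t = (e2 @ q) / det
    return 0 <= t <= 1          # intersection must lie within the segment

# A triangle in the plane z = 1; one segment pierces it, the other misses.
tri = (np.array([0., 0., 1.]), np.array([1., 0., 1.]), np.array([0., 1., 1.]))
hit = segment_hits_triangle(np.array([0.2, 0.2, 0.]), np.array([0.2, 0.2, 2.]), *tri)
miss = segment_hits_triangle(np.array([2., 2., 0.]), np.array([2., 2., 2.]), *tri)
```

Running this test for every edge of one polyhedron against every face of the other decides whether the two viewable regions overlap.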
While Snapshot requires only four reference points to calibrate a camera sensor, the
estimates of the camera location and orientation can be improved if additional reference
points are available. Suppose that n reference points, n ≥ 4, are available for a particular
sensor node. Then $\binom{n}{4}$ unique subsets of four reference points can be constructed from these
n points. For each subset of four points, we can compute the location and orientation of
the camera using the techniques outlined in the previous sections. This yields $\binom{n}{4}$ different
estimates of the camera location and orientation. These estimates can be refined to obtain
the final solution using one of three methods:
Least Squares Method: This technique picks the one solution out of the $\binom{n}{4}$ solutions that
most accurately reflects the camera location and orientation. To do so, the technique uses
each computed camera location and orientation to re-project all reference points on the
camera image plane and chooses the solution that yields the minimum error between the
projected coordinates and the actual coordinates in the image. The solution that yields the
minimum error is the one that minimizes the following expression:
$$\sum_{i=1}^{n} \left\| \frac{f}{y_i'} \times \begin{pmatrix} x_i' \\ y_i' \\ z_i' \end{pmatrix} - P_i \right\|^2 \qquad (3.9)$$
where $(x_i', y_i', z_i')^T$ is the location of reference point i in camera coordinates according to
Equation 3.5, and $P_i = (px_i,\ f,\ pz_i)^T$ is the real projection of reference point i.
Median Filter Method: This method simply takes the median of each estimated parameter,
namely x, y, z, pan α, tilt β, and roll γ. These median values are then chosen as
the final estimates of each parameter. Note that while the least squares method picks one
of the $\binom{n}{4}$ initial solutions as the final solution, the median filter method can yield a final
solution that is different from all $\binom{n}{4}$ initial solutions (since the median of each parameter
is computed independently, the final solution need not correspond to any of the initial
solutions). The median filter method is simple and cost-effective, and it performs well when
n is large.
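As a sketch of the median filter method, with simulated per-subset estimates standing in for actual solver output (each row holds x, y, z, pan, tilt, roll from one subset, and one row simulates a grossly wrong solve):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
true_params = np.array([1.0, 2.0, 3.0, 30.0, 10.0, 5.0])
subsets = list(combinations(range(8), 4))          # n = 8 -> C(8, 4) = 70 subsets
estimates = true_params + 0.05 * rng.standard_normal((len(subsets), 6))
estimates[3] += 5.0                                # one grossly wrong solution

# Median filter method: per-parameter median across all subset estimates.
median_est = np.median(estimates, axis=0)
```

The per-parameter median is robust to the outlying solve, which is the reason this method performs well when n (and hence the number of subsets) is large.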
Maximum Likelihood Estimation: The MLE method [13] uses the initial estimates
as its initial guess and searches through the state space to choose a solution that minimizes
an error term. We choose the same error function as the least squares method: the search
should yield a solution that yields the least error when projecting the reference points on
the camera image plane.
Minimizing Equation 3.9 by searching through the parameter state space is a non-linear
minimization problem that can be solved numerically using the Levenberg-Marquardt
algorithm. The algorithm requires an initial guess of R and (x, y, z); our estimates from
Snapshot can be used as this initial guess. Note that MLE is computationally more expensive
than the median filter method or the least squares method. While its advantage
diminishes when n is large, it can yield better accuracy when n is small.
Choosing between these methods involves a speed versus accuracy tradeoff. In general,
the first two methods are more suitable if calibration speed is more important. The MLE
method should be chosen when calibration accuracy is more important or if n is small.
This section describes how the estimation techniques presented in the previous section
can be instantiated into a simple wireless protocol for automatically calibrating each camera
sensor. The protocol assumes that each sensor node has a wireless interface that enables
wireless communication to and from the camera. The calibration process involves the use of
a wireless calibration device, which is a piece of hardware that performs the following tasks.
First, the device is used to define the reference points during calibration—the location of
the device defines a reference point, whose coordinates are automatically determined by
equipping the device with a positioning sensor (e.g., ultrasound-based Cricket). Second, the
device also serves as a point object for pictures taken by the camera sensors. To ensure
that the device can be automatically detected in an image by vision processing algorithms,
we equip the device with a bright LED (which then serves as the point object in
an image). Third, the device serves as a "wireless remote" for taking pictures during the
calibration phase. The device is equipped with a switch that triggers a broadcast packet
on the wireless channel. The packet contains the coordinates of the device at that instant
and includes an image capture command that triggers a snapshot at all camera sensors in its
wireless range.
Given such a device, the protocol works as follows. A human assists the calibration
process by walking around with the calibration device. The protocol involves holding the
device at random locations and initiating the trigger. The trigger broadcasts a packet
to all cameras in range with a command to take a picture (if the sensor node is asleep,
the trigger first wakes up the node using a wakeup-on-wireless algorithm). The broadcast
packet also includes the coordinates of the current position of the device. Each camera then
processes the picture to determine if the LED of the calibration device is visible to it. If so,
the pixel coordinates of the device and the transmitted coordinates of the reference point
are recorded. Otherwise the camera simply waits for the next trigger. When at least four
reference points become available, the sensor node processes this data to determine
the location, orientation and range of the camera. These parameters are then broadcast so
that neighboring cameras can subsequently use them for determining the amount of overlap
between cameras. Once a camera calibrates itself, a visual cue is provided by turning on an
LED on the sensor node so that the human assistant can move on to other sensors.
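The camera-side control flow can be sketched as follows; the class, field names and stubbed vision/solver callbacks are hypothetical stand-ins for the node's actual picture, vision and calibration routines:

```python
class CameraNode:
    """Minimal sketch of a camera node reacting to calibration triggers."""

    def __init__(self, locate_led, calibrate):
        self.references = []        # (device coordinates, pixel coordinates)
        self.params = None
        self.calibrated = False
        self._locate_led = locate_led
        self._calibrate = calibrate

    def handle_trigger(self, packet):
        """Process one broadcast trigger from the calibration device."""
        pixel = self._locate_led(packet["image"])   # vision step: find the LED
        if pixel is None:
            return                  # device not visible; wait for next trigger
        self.references.append((packet["device_xyz"], pixel))
        if len(self.references) >= 4 and not self.calibrated:
            self.params = self._calibrate(self.references)
            self.calibrated = True  # real hardware would also light the cue LED

# Stub vision and solver steps to exercise the control flow.
node = CameraNode(locate_led=lambda img: img,       # pretend the image IS the pixel
                  calibrate=lambda refs: {"n_refs": len(refs)})
for k in range(5):
    node.handle_trigger({"device_xyz": (k, k, 0.0), "image": (10 * k, 20 * k)})
```

After the fourth visible reference point the node calibrates once; later triggers only accumulate extra reference points for the refinement methods of the previous section.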
• Projection point location: The calibration protocol uses the projection point on
the camera image plane to calculate the camera parameters. Errors introduced in
the location of the projection point, due to lens distortion, skew and object detection
inaccuracies, affect the estimated calibration parameters.
• Reference point location: The locations of the reference points are used to calibrate
the camera, and these in turn are calculated based on range estimates to fixed beacons.
The error in the ultrasound-based range estimation introduces an error in the location
of the reference point, which in turn influences the error of the calibrated camera
parameters.
In this section, we present two techniques to analyze the effect of the above errors on
the calibration parameters.
We first derive a lower bound on the camera location estimation error using the Cramér
Rao Bound (CRB) [27, 36, 51]. The CRB gives a lower bound on the expected covariance
for an unbiased estimator. We use the CRB to derive a lower bound on the Euclidean
distance error between the exact camera location and the estimated camera location. Let
C = (xc, yc, zc) be the location of a camera and $C' = (x_c', y_c', z_c')$ its estimate. The error
covariance is
$$V = E\{(C' - C)(C' - C)^T\} \qquad (3.10)$$
The lower bound on the error covariance is the Cramér Rao Bound and is calculated as
$$V \ge J(C)^{-1} \qquad (3.11)$$
where the matrix J(C) is the Fisher information matrix, which measures the amount of
information the observed projections carry about the unknown camera location:
$$J(C) = E\left\{\left[\frac{\partial}{\partial C} \ln f_X(x; C)\right]\left[\frac{\partial}{\partial C} \ln f_X(x; C)\right]^T\right\} \qquad (3.12)$$
where x = (u, v) is the measured location of the reference point projection on the image
plane.
Consider a reference point at location (xi, yi, zi) and the camera at C = (xc, yc, zc). The
coordinates of the projection point on the image plane, (u, v), are given by:
$$u = \frac{x_i' - x_c'}{z_i' - z_c'} \times f, \qquad v = \frac{y_i' - y_c'}{z_i' - z_c'} \times f$$
$$[x_i'\ y_i'\ z_i']^T = R \times [x_i\ y_i\ z_i]^T$$
$$[x_c'\ y_c'\ z_c']^T = R \times [x_c\ y_c\ z_c]^T$$
where R is the composite rotation matrix. For this analysis, we assume R to be the identity
matrix, as we are interested only in the error relative to each camera and not in the absolute
error, which depends on the camera's orientation in the reference coordinate system.
For the purpose of this analysis, we assume the error in measuring the coordinates
X = (u, v) on the image plane is Gaussian, and the probability density function is given by
$$f_X(x; C) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(u - u(C, R_p))^2 + (v - v(C, R_p))^2}{2\sigma^2}\right) \qquad (3.14)$$
where $u(C, R_p)$ and $v(C, R_p)$ are the true coordinates on the image plane, σ² the variance
and $R_p$ the reference point. As stated above, the projection coordinates of a reference point
depend on the reference point's location and the camera location, denoted by $u(C, R_p)$
and $v(C, R_p)$. Let D denote the array of parameters on which the projections depend. Since
the u coordinate depends on the x and z intercepts, $D_u = [x_c\ z_c\ x_r\ z_r]$, and the v coordinate
on the y and z intercepts, $D_v = [y_c\ z_c\ y_r\ z_r]$, where $(x_r, y_r, z_r)$ represents the reference
point location.
Based on equations 3.12 and 3.14, the information matrix for the u coordinate on the
image plane can be represented as
$$J_u(C, R_p) = G_u(C, R_p)^T\, \Sigma^{-1}\, G_u(C, R_p) \qquad (3.16)$$
where $G_u(C, R_p)_{ij} = \partial u(C, R_i)/\partial D_u(j)$, i is the index of the i-th reference point, j
the j-th dependent variable, and Σ the covariance matrix corresponding to the projection
error. The entries of $G_u(C, R_p)$ for each reference point form the row
$$G_u(C, R_p)_i = \left[\frac{\partial u(C, R_i)}{\partial x_c}\ \ \frac{\partial u(C, R_i)}{\partial z_c}\ \ \frac{\partial u(C, R_i)}{\partial x_r}\ \ \frac{\partial u(C, R_i)}{\partial z_r}\right] \qquad (3.17)$$
The Fisher information matrix $J_u(C, R_p)$ can thus be computed from equation 3.12 using the
matrix $G_u(C, R_p)$. We can similarly estimate $J_v(C, R_p)$, the Fisher information matrix for
the v coordinate.
Next, we consider the error introduced due to uncertainty in the location of the reference
point. In this case too, the error in the location of the reference point is assumed to be
Gaussian, $R_p \sim N(\mu_0, \Sigma_0)$, where $R_p$ is a vector of reference point locations, $\mu_0$ the exact
locations of the reference points and $\Sigma_0$ a diagonal matrix of associated uncertainties. For
the Gaussian probability density, the a priori Fisher information matrix is
$$J_0 = \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_0^{-1} \end{pmatrix} \qquad (3.19)$$
Under the assumption that the information provided by the measurements is independent
of the a priori information, the total Fisher information matrix is the sum of the two. From this,
$$J_{ut} = J_u + J_0 \qquad (3.20)$$
$$J_{vt} = J_v + J_0 \qquad (3.21)$$
The total Fisher information matrices can be used to determine the CRB for the coordinates
of the camera location as
$$C_u \ge J_{ut}^{-1} \qquad (3.22)$$
$$C_v \ge J_{vt}^{-1} \qquad (3.23)$$
The lower bound on the expected localization error is calculated using the expected
variances of the individual coordinates as
$$Err = \sqrt{var(x_c) + var(y_c) + var(z_c)} \qquad (3.24)$$
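A numeric sketch of this bound, assuming R = I with z as the optical axis as above, equal variance σ² on both image coordinates, a numeric (rather than symbolic) G matrix, and made-up geometry:

```python
import numpy as np

f, sigma = 0.05, 1e-4                      # assumed focal length and pixel noise
cam = np.array([0.0, 0.0, 0.0])
refs = np.array([[0.5, 0.2, 2.0], [-0.3, 0.4, 3.0],
                 [0.1, -0.5, 2.5], [0.6, 0.6, 4.0]])

def measurements(c):
    """Stacked (u, v) projections of all reference points (R = I)."""
    d = refs - c
    return np.concatenate([f * d[:, 0] / d[:, 2], f * d[:, 1] / d[:, 2]])

# G: partial derivatives of the measurements w.r.t. (xc, yc, zc), computed
# by central differences instead of the symbolic entries of Eq 3.17.
eps = 1e-6
G = np.column_stack([(measurements(cam + e) - measurements(cam - e)) / (2 * eps)
                     for e in eps * np.eye(3)])

J = G.T @ G / sigma**2        # Fisher information for i.i.d. Gaussian noise
crb = np.linalg.inv(J)        # lower bound on the error covariance (Eq 3.11)
err_bound = float(np.sqrt(np.trace(crb)))   # Eq 3.24 applied to the bound
```

The trace of the inverted information matrix collects the per-coordinate variance bounds, giving the lower bound on the expected Euclidean localization error.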
While the above technique calculates the lower bound of the error in location estimation
of a camera, a similar bound can be derived for the error in orientation estimation.
Techniques to calculate this bound need further investigation and are not explored in this
thesis. Further, the Cramér Rao Bound-based analysis assumes the location errors of the
reference objects and the projection errors are independent. Techniques can be developed to
analytically study the error characteristics of estimating calibration parameters when these
errors are correlated.
3.3.2 Empirical Error Analysis
As described in Section 3.2.4, locations of reference points are estimated using a positioning
system (the ultrasound-based Cricket positioning system) and are then used for
calibration. The estimated locations of reference points have uncertainties due to errors in
ultrasound-based range estimates. The average location error using Cricket (measured in
terms of Euclidean distance) is empirically estimated to be 3-5 cm. The error in reference
point locations impacts the calculated calibration parameters and we study the sensitivity
of calibrated parameters to these errors. Consider four reference points with true loca-
tions (x1 , y1 , z1 ), (x2 , y2 , z2 ), (x3 , y3 , z3 ) and (x4 , y4 , z4 ) which estimate the location of the
camera as (xc, yc, zc) and orientation angles as α, β and γ. Further, we assume that the
error in each dimension of the reference point location is specified by a normal distribution
N(0, σ²), with zero mean and variance σ². Given n reference points, an error component
is added to each dimension of every reference point:
x0i = xi + ex ; (3.28)
yi0 = yi + ey ; (3.29)
zi0 = zi + ez ; (3.30)
where ex, ey and ez are randomly sampled from the normal distribution N(0, σ²). The (n choose 4) updated
reference point subsets are then used to compute the camera location (x'c, y'c, z'c) and orientation parameters α', β', γ'. The relative error in calibration as a result of the error in reference
point locations is measured as,
locerr = sqrt( (x'c − xc)^2 + (y'c − yc)^2 + (z'c − zc)^2 )    (3.31)
where locerr is the relative location error, measured as the Euclidean distance between the
estimated camera locations, and panerr, tilterr and rollerr are the relative orientation errors.
We test sensitivity for two error models: random error, where errors in each dimension of every reference point are
randomly sampled, and correlated error, where errors for each dimension are sampled randomly
but are the same for all reference points. We present the experimental results of the sensitivity
analysis in Section 3.5.
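The two error models can be sketched as follows (a minimal illustration with hypothetical values; the nonlinear calibration solver that consumes the perturbed points is not shown):

```python
import numpy as np

def perturb(points, sigma, correlated, rng):
    """Perturb reference point locations per Eqs. 3.28-3.30.

    correlated=True : one error vector (ex, ey, ez) shared by all points.
    correlated=False: an independent error vector per point (random model).
    """
    points = np.asarray(points, dtype=float)
    if correlated:
        e = rng.normal(0.0, sigma, size=3)   # same offset applied to every point
        return points + e
    return points + rng.normal(0.0, sigma, size=points.shape)

# Example: four reference points (meters), 5 cm standard deviation per dimension.
rng = np.random.default_rng(1)
pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
corr = perturb(pts, 0.05, True, rng)    # all points shift by the same vector
rand = perturb(pts, 0.05, False, rng)   # each point shifts independently
```

The perturbed points would then be fed to the calibration procedure and locerr, panerr, tilterr and rollerr computed against the unperturbed solution.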
that will subsequently use this calibrated sensor network. To determine how calibration
errors impact application accuracy, we consider a simple object localization and tracking
example. This scenario assumes that the calibrated sensor network is used to detect external
objects and track them as they move through the environment. Tracking is performed by
continuously computing the coordinates of the moving object. We assume the object
is simultaneously visible from at least two cameras; if the locations and orientations
of these cameras are known, then the location of the object can be calculated by taking
pictures of the object and using its pixel coordinates to compute its actual location.
To see how this is done, consider Figure 3.7, which depicts an object O that is simultaneously
visible in cameras C1 and C2. Since both cameras are looking at the same object, the lines
connecting the centers of the cameras to the object intersect at the object O. Since
the location of each camera is known, a triangle C1OC2 can be constructed as shown in
the figure. Let D1 and D2 denote the distance between the object and the two cameras,
respectively, and let D12 denote the distance between the two cameras. Note that D12 can
be computed as the Euclidean distance between the coordinates C1 and C2 , while D1 and
D2 are unknown quantities. Let θ1 , θ2 and φ denote the internal angles of the triangle as
shown in the figure. Then the Sine theorem for a triangle from elementary trigonometry
states that
D1/sin(θ1) = D2/sin(θ2) = D12/sin(φ)    (3.35)
The angles θ1 and θ2 can be computed by taking pictures of the object and using its
pixel coordinates as follows. Suppose that the object projects an image at pixel coordinates
(−px1, −pz1) at camera C1, and let f1 denote the focal length of camera C1. Then the projection
vector v~1 = (px1, f1, pz1) is the vector joining the pixel coordinates to the center of the
lens, and this vector lies along the direction of the object from the camera center. If ~v is
the vector along the direction of the line connecting the two cameras, then the angle θ1 can be
calculated using the vector dot product: cos(θ1) = (v~1 · ~v) / (|v~1||~v|).
The angle θ2 can be computed similarly, and the angle φ is then determined as (180° − θ1 − θ2).
Given θ1 , θ2 and φ and the distance between two cameras D12 , the values of D1 and D2
can be computed using the Sine theorem as stated above.
Given the distance of the object from the cameras (as given by D1 and D2 ) and the
direction along which the object lies (as defined by the projection vectors v~1 and v~2 ), the
object location can be easily computed. Note that the orientation matrices of the cameras
must also be accounted for when determining the world coordinates of the object using each
camera. In practice, due to calibration errors, the object locations estimated by the two
cameras are not identical. We take the midpoint of the two estimates as the location
of the object.
Thus, two overlapping cameras can coordinate with one another to triangulate the lo-
cation of an external object. We will use this object localization application in our exper-
imental evaluation to quantify the impact of calibration errors on the application tracking
error.
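The triangulation just described can be sketched as follows (a minimal implementation assuming the projection vectors have already been rotated into world coordinates and normalized; the function and variable names are ours):

```python
import numpy as np

def triangulate(C1, C2, v1, v2):
    """Locate an object seen by two calibrated cameras (Sine theorem, Eq. 3.35).

    C1, C2 : camera centers in world coordinates.
    v1, v2 : unit vectors from each camera center toward the object (the
             projection vectors, already in world coordinates).
    """
    b = C2 - C1
    D12 = np.linalg.norm(b)                           # known camera separation
    b_hat = b / D12
    theta1 = np.arccos(np.clip(v1 @ b_hat, -1, 1))    # internal angle at C1
    theta2 = np.arccos(np.clip(v2 @ -b_hat, -1, 1))   # internal angle at C2
    phi = np.pi - theta1 - theta2                     # angle at the object
    D1 = D12 * np.sin(theta2) / np.sin(phi)           # side opposite theta2
    D2 = D12 * np.sin(theta1) / np.sin(phi)           # side opposite theta1
    O1 = C1 + D1 * v1                                 # estimate from camera 1
    O2 = C2 + D2 * v2                                 # estimate from camera 2
    return (O1 + O2) / 2                              # midpoint of the estimates
```

Note that the law of sines pairs each side with the opposite angle, so D1 (the side from C1 to the object) goes with the angle at C2.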
Cricket, and then evaluate the impact of Snapshot on our object tracking application.
The setup to evaluate the accuracy of Snapshot, and its sensitivity to system parameters,
consisted of placing the two types of cameras, the CMUcam and the Sony MotionEye webcam,
and the position sensor objects at several locations. Each camera took several pictures to estimate the parameters.
The difference between the estimated parameter value and the actual value is reported as
the measurement error. The Cricket sensors on the objects received beacons from a set of
pre–calibrated Cricket sensor nodes placed on the ceiling of a room. The digital compass
was attached to the two cameras in order to measure the exact orientation angles.
3.5.2 Camera Location Estimation Accuracy
locations and orientations. We measure the locations of these reference points by hand
(referred to as without Cricket), which can be considered the objects' real locations, and by
Cricket [42] (referred to as with Cricket), where we observed a 2-5cm error.
For each picture, we take all combinations of any four reference points in view (with
no three points on the same line) and estimate the camera's location accordingly. We consider
the distance between the estimated camera location and the real camera location as the
location estimation error.
As shown in Figure 3.8(a), our results show: (i) the median errors using webcam with-
out Cricket and with Cricket are 4.93cm and 9.05cm, respectively; (ii) the lower quartile
and higher quartile errors without Cricket are 3.14cm and 7.13cm; (iii) the lower quartile
and higher quartile errors with Cricket are 6.33cm and 12.79cm; (iv) the median filter (referred to as M.F.) improves the median error to 3.16cm and 7.68cm without Cricket and with
Cricket, respectively.
Figure 3.8(b) shows: (i) median errors using CMUcam without Cricket and with Cricket
are 6.98cm and 12.01cm, respectively; (ii) the lower quartile and higher quartile errors
without Cricket are 5.03cm and 10.38cm; (iii) the lower quartile and higher quartile errors
with Cricket are 8.76cm and 15.97cm; (iv) the median filter improves the median error to
As our protocol proceeds, the number of available reference points increases. As a re-
sult, the number of combinations of any four reference points also increases, and we have
more location estimations available for the median filter. Consequently, we can eliminate
tails and outliers better. In this section, we study the effect of the iterations of our protocol's
runs on camera location estimation error by plotting the median error versus the number of
reference points.

Figure 3.8. Empirical CDF of camera location estimation error for (a) the webcam and (b) the CMUcam, with and without Cricket (M.F. denotes the median filter).
Figure 3.9 shows: (i) the median errors using webcam drop from 4.93cm to 2.13cm and
from 9.05cm to 6.25cm as the number of reference points varies from 4 to 16 for without
and with Cricket, respectively; (ii) the median errors using CMUcam drop from 6.98cm to
2.07cm and from 12.01cm to 9.59cm as the number of reference points varies from 4 to 16
for without and with Cricket, respectively. The difference in the location estimation errors
(with and without Cricket) are due to the position error estimates in Cricket and also due to
errors in values of camera intrinsic parameters.
We used the two cameras, the CMUcam and the Sony MotionEye webcam, to capture
images of reference points at different locations and different orientations of the camera.
We used the estimated location of the camera, based on both the exact locations of reference points and
the Cricket-reported locations of reference points, to estimate the orientation parameters of the
camera. The orientation of the camera was computed using the estimated camera location.
Figure 3.9. Effect of number of reference points on location estimation error.
We compared the estimated orientation angles with the measured angles to calculate error.
Figure 3.10(a) shows the CDF of the error estimates of the pan, tilt and roll orientations
using the CMUcam camera. Figure 3.10(b) shows the CDF of the error of the
three orientations using Cricket for location estimation. The cumulative error plots follow
the same trends for each of the orientation angles. The median roll orientation error, with and
without Cricket for camera location estimation, is 1.2 degrees. In both cases,
the 95th percentile error is less than 5 degrees for the pan and tilt orientations and less than
3 degrees for the roll orientation. The slight discrepancies in the error measurements of the
two cases are due to the use of the digital compass to measure the orientation of the camera.
Thus, we conclude that Cricket's positioning errors do not add significant error to the estimation
of camera orientation parameters. In our experiments, we find that a median location
estimation error of 11cm does not affect the orientation estimation significantly.
Next, we compare the empirical error using Snapshot to calibrate cameras with the
expected error lower bounds obtained using Cramér Rao Bound analysis. As discussed
in Section 3.3, the error in estimation of camera parameters is affected by the projection
error and by the error in the location of the reference point. Figure 3.11 reports results with a
Figure 3.10. Empirical CDF of error in estimating orientations with the CMUcam.
variance of 3 pixels in the projection error and 8cm error in each dimension of the reference
point location. Figure 3.11(a) reports the error considering only the projection error; as the
number of reference points increases, both the empirical error and the lower bound on the error decrease. Comparing the lower bound
with the empirical error, the difference is 4-5 cm with fewer reference points and decreases
to 2-3 cm with more than 10 reference points. Figure 3.11(b) reports the comparison
between the empirical error and the lower bound when both the projection error and the error
in reference point location are considered. As can be seen, the lower bound on error is
greater than when only the projection error is considered, for both the CMUCam and
the Webcam. Further, the error in reference point location dominates the calibration error
and remains almost constant even as the number of reference points increases. The trend
in the empirical error is similar to the lower bound and differs from it by 3 cm with the Webcam and 5-6 cm with the CMUcam.

We varied the standard deviation of the error distribution in each dimension from 1cm to 8cm and numerically computed its impact on the
Figure 3.11. Comparison of empirical error with lower bounds with and without consider-
ing error due to Cricket.
calibration parameters. As shown in Figure 3.12(a), the estimated locations are less sensitive
to the correlated error but highly sensitive to the random error. Further, the results
in Figure 3.12(b) show that: (i) orientation estimation is insensitive to the correlated error,
with the mean error always very close to zero; and (ii) orientation estimation is very
sensitive to the random error, with the mean error increasing by a factor of four as the standard
deviation increases from 1cm to 8cm. The calibrated parameters are less sensitive to correlated
errors because all reference points have the same error magnitudes, so the camera location
shifts in the direction of the error without affecting the estimated orientation. With random
errors in each dimension of the reference points, all reference points shift in different directions
by different offsets, and as a result, calibration errors are larger. However, the error
in a real Cricket system is neither fully correlated nor fully random; it lies somewhere between these
two cases and has intermediate sensitivity. The previous experimental results verify this
hypothesis.
Figure 3.12. Sensitivity of (a) location estimation (webcam, CMUcam) and (b) orientation estimation (pan, tilt, roll) to random and correlated errors in reference point locations.

Figure 3.13. Empirical CDF of object localization error using the webcam and the CMUcam, with and without Cricket.
We use the calibrated parameters to triangulate an object via the technique described in Section 3.4.
As in Section 3.5.2, we use the empirical CDF of the object's location estimation error to
measure performance. Our results (see Figure 3.13) show that: (i) the median localization
error using webcams is 4.94cm and 5.45cm without and with Cricket, respectively;
(ii) the median localization error using CMUcams is 11.10cm and 11.73cm without and
with Cricket, respectively; (iii) localization without Cricket outperforms localization using
Cricket for all cameras; and (iv) localization using webcams outperforms that with the
CMUcams.
Task                        Duration (ms)
Snap Image                  178 ± 2
Recognize Object Location   52 ± 0.1
Location Estimation         18365 ± 18
Using our prototype implementation, we measure the runtime of the Snapshot protocol.
Figure 3.14 reports the runtime of the different tasks of the Snapshot calibration protocol
executing on the Intel Stargate platform with the camera attached to a USB connector (the
transfer of an image over the serial cable with the CMUcam requires additional time). As
seen from the table, the location estimation task, which uses a non-linear solver, has the
highest execution time. The time to calibrate an individual camera is 4 × (178 ms + 52
ms) to snap four images and recognize the location of the object in each, plus 18365 ms
for the location and orientation estimation, a total of 19.285 seconds. Thus,
with a time of approximately 20 seconds to calibrate a single camera, Snapshot can easily
scale to networks containing tens of cameras.
3.6 Conclusions
In this chapter, we presented Snapshot, an automated calibration protocol that is ex-
plicitly designed and optimized for sensor networks. Our techniques draw upon principles
from vision, optics and geometry and are designed to work with low-fidelity, low-power
camera sensors that are typical in sensor networks. Our experiments showed that Snapshot
yields an error of 1-2.5 degrees when determining the camera orientation and 5-10cm when
determining the camera location. We argued that this is a tolerable error in practice since a
Snapshot-calibrated sensor network can track moving objects to within 11cm of their actual
locations. Finally, our measurements showed that Snapshot can calibrate a camera sensor
within 20 seconds, enabling it to calibrate a sensor network containing tens of cameras
within minutes.
CHAPTER 4
4.1 Introduction
Wireless sensor networks have received considerable research attention over the past
decade, and rapid advances in technology have led to a spectrum of choices of image sensors.
Low-power camera sensors such as the Cyclops and the CMUcam [44, 49] have become popular for applications such as environmental monitoring and
surveillance.
A typical camera sensor network performs tasks such as object detection, recognition, and tracking. While object detection involves
determining when a new object appears in range of the camera sensors, recognition in-
volves determining the type of the object, and tracking involves using multiple camera
sensors to continuously monitor the object as it moves through the environment. To effec-
tively perform these tasks, the camera sensor network needs to be calibrated at setup time.
Calibration involves determining the location and orientation of each camera sensor. The
location of a camera is its position (3D coordinates) in a reference coordinate system, while
orientation is the direction in which the camera points. By determining these parameters
for all sensors, it is possible to determine the viewable range of each camera and what por-
tion of the environment is covered by one or more cameras. The relationship with other
nearby cameras, in particular, the overlap in the viewable ranges of neighboring cameras
can be determined. This information can be used by applications to determine which cam-
era should be used to sense an object at a certain location, how to triangulate the position
52
of an object using overlapping cameras, and how to handoff tracking responsibilities from
one camera to another as the object moves.
Several vision-based calibration techniques have been proposed [23, 59, 68]. These techniques assume that the coordinates of a few landmarks
are known a priori and use the projection of these landmarks on the camera’s image plane,
in conjunction with principles of optics, to determine a camera’s coordinates and orienta-
tion.1 In certain cases locations of landmarks are themselves determined using range esti-
mates from known locations; for instance, a positioning technology such as Cricket can be
used to determine the coordinates of landmarks from known beacon locations. However,
these techniques are not feasible for deployments of ad-hoc low power camera sensors
for the following reasons: (i) Resource constraints: Vision-based techniques for accurate
calibration of cameras are compute intensive. Low-power cameras do not have the
computational resources to run such techniques; further,
low-power cameras are often of low fidelity and not well suited for high-precision calibration.
(ii) Availability of landmarks: In many scenarios, ad-hoc camera sensor networks
are deployed in remote locations for monitoring mountainous and forest habitats or for
monitoring natural disasters such as floods or forest fires. No landmarks may be available
in such environments. GPS receivers on the nodes could
enable direct determination of the node location and orientation. However, today's GPS
technology has far too much error to be practical for calibration purposes (GPS can localize
an object only to within 5-15m of its actual position). Ultrasound-based positioning and
1. Vision-based calibration techniques can also determine a camera's internal parameters, such as the camera focal length and lens distortion, in addition to external parameters such as location and orientation.
ranging technology [42] is an alternative that provides greater accuracy. But the use of additional
hardware with low-power cameras both consumes more energy and, in some cases,
can be prohibitive due to its cost. As a result, accurate calibration is not always feasible
for initialization of resource-constrained camera sensor networks with limited or no infras-
tructure support.
Due to these constraints, in this thesis we ask a fundamental question: is it possible
to initialize camera sensors without the use of known landmarks or without using any po-
sitioning technology? In scenarios where accurate camera calibration may not always be
feasible, determining relative relationships between nearby sensor nodes may be the only
available option. Specifically, we ask:

• How can we determine relative locations and orientations of camera sensors without
the use of known landmarks or positioning infrastructure?
To address the above challenges, in this thesis, we propose novel approximate initializa-
tion techniques for camera sensors. Our techniques rely only on the inherent picture-taking
ability of cameras and judicious use of on-board computational resources to initialize each
camera relative to other cameras in the system. No infrastructure support for beaconing,
range estimation or triangulation is assumed. Our initialization techniques are computationally
lightweight and easily instantiable in environments with little or no infrastructure
support. We develop techniques to estimate these parameters by taking pictures of a randomly placed reference
object. To quantify the accuracy of our methods, we implement two techniques—duty-cycling
and triggered wakeup—that exploit this initialization information to effectively perform
these tasks.
• Our approximate initialization techniques can estimate both the k-overlap and the region of
overlap of each camera.

• The approximation techniques can handle and correct for skews in the distribution of
reference points.

• The application-level accuracy using our techniques is 95-100% for determining the
cameras to wake up when tracking objects.
We assume that camera sensors are deployed in an ad-hoc fashion with no a priori planning. Each sensor node is assumed to consist of a low-power
imaging sensor such as the Cyclops [44] or the CMUCam [49] connected to an embedded
sensor platform such as the Crossbow Mote [37] or the Telos node [41]. No positioning
hardware is assumed to be present on the nodes or in the environment. Given such an ad-
hoc deployment of camera sensors, our goal is to determine the following parameters for
each node:
• Degree of overlap, which is the fraction of the viewable range that overlaps with
other nearby cameras; specifically we are interested in the k-overlap, which is the
fraction of the viewable region that overlaps with exactly k other cameras.
• Region of overlap, which is the spatial volume within the viewable region that over-
laps with another camera. While the degree of overlap indicates the extent of the
viewable region that overlaps with another camera, it does not indicate which portion
of the viewable range is covered by another camera. The region of overlap captures
this spatial overlap and is defined as the 3D intersection of the viewable regions of
the overlapping cameras.
Our goal is to estimate these parameters using the inherent picture-taking capability
of cameras. We assume the presence of a reference object that can be placed at random
locations in the environment; while the coordinates of the reference object are unknown,
the sensors can take pictures to determine if the object can be viewed simultaneously by
two or more cameras from a particular location. Our goal is to design techniques that use
this information to determine the degree and region of overlap for the various nodes. The
physical dimensions of the reference object, as well as the focal length f of each camera, are assumed to be known a priori.
As indicated earlier, degree of overlap is defined by the k-overlap, which is the fraction
of the viewing area simultaneously covered by exactly k cameras. Thus, 1-overlap is the
fraction of a camera’s viewable region that does not overlap with any other sensor; 2-
overlap is the fraction of region viewable to itself and one other camera, and so on. This is
illustrated in Figure 4.1 where k1 denotes the region covered by a single camera, k2 and k3
denote the regions covered by two and three cameras, respectively. It follows that the union
of the k-overlap regions of a camera is exactly the total viewable range of that camera (i.e.,
the sum of the k-overlap fractions is 1). Our goal is to determine the k-overlap for each
to determine the k-overlap for each camera sensor. This is done by placing an easily identi-
fiable reference object at randomly chosen locations and by having the camera sensors take
pictures of the object. Let each object location be denoted as a reference point (with un-
known coordinates). Each camera then processes its pictures to determine which reference
points are visible to it. By determining the subset of the reference points that are visible to
multiple cameras, we can estimate the k-overlap fractions for various sensors. Suppose that
ri reference points from the total set are visible to camera i. From these ri reference points,
let rik denote the reference points that are simultaneously visible to exactly k cameras. Assuming
a uniform distribution of reference points in the environment, the k-overlap for
camera i is given by
Oik = rik / ri    (4.1)
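This estimator is straightforward to compute from the visibility sets; a minimal sketch using the hypothetical data of Figure 4.2(a):

```python
from collections import Counter

def k_overlap(visible_to, cam):
    """Estimate O_ik (Eq. 4.1) for one camera from reference-point visibility.

    visible_to : list of sets; visible_to[p] is the set of cameras that can
                 see reference point p.
    cam        : the camera whose k-overlap fractions we want.
    Returns {k: fraction of cam's points seen by exactly k cameras}.
    """
    mine = [cams for cams in visible_to if cam in cams]
    counts = Counter(len(cams) for cams in mine)      # r_ik for each k
    r_i = len(mine)                                   # points visible to cam
    return {k: n / r_i for k, n in counts.items()}

# Figure 4.2(a): 16 points visible to camera 1 -- 8 seen only by it,
# 4 by cameras {1, 3}, and 4 by cameras {1, 2, 3}.
pts = [{1}] * 8 + [{1, 3}] * 4 + [{1, 2, 3}] * 4
assert k_overlap(pts, 1) == {1: 0.5, 2: 0.25, 3: 0.25}
```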
Depending on the density of reference points, the error in the estimate of Oik can be controlled.
The procedure is illustrated in Figure 4.2(a), where there are 16 reference points visible to
camera 1, of which 8 are visible only to itself, 4 are visible to cameras 1 and 3, and another
4 to cameras 1, 2, and 3. This yields a 1-overlap of 0.5 and a 2-overlap and 3-overlap of 0.25 each for
camera 1. In practice, due to the environment and the need to calibrate the system online in the field, the placement of reference objects
at randomly chosen locations will not be uniform. The resulting error due to a non-uniform
distribution is illustrated in Figure 4.2(b), where our technique misestimates the 1-, 2- and
3-overlap fractions.
The basic idea behind our enhancement is to assign a weight to each reference point,
where the weight denotes the volume that it represents. Specifically, points in densely populated
regions are given smaller weights and those in sparsely populated regions are given
higher weights. Since a higher weight can compensate for the scarcity of reference points
in sparsely populated regions, we can correct for skewed distributions of reference points.
Our enhancement is based on the computational geometry technique called Voronoi tessel-
lation [7]. In two dimensions, a Voronoi tessellation of a set of points is the partitioning of
the plane into convex polygons such that all polygons contain a single generating point and
all points within a polygon are closest to the corresponding generating point. Figure 4.2(c)
shows a skewed distribution of reference points in the 2D viewing area of a camera and the
corresponding Voronoi tessellation. Each reference point in the camera is contained within
a cell, with all points in a cell closest to the corresponding reference point. Given a skewed
distribution of reference points, it follows that densely situated points will be contained
within smaller polygons, and sparsely situated points in larger polygons. Since the size
of each polygon is related to the density of the points in the neighborhood, it can be used
as an approximation of the area represented by each point. Voronoi tessellations can be
extended to points in three dimensions, with each point contained within a 3D cell instead of
a polygon.
Using Voronoi tessellation, each reference point is assigned a weight that is approxi-
mately equal to volume of the cell that it lies in. The k-overlap is then computed as
Oik = wik / wi    (4.2)
where wik is the cumulative weight of all reference points that are simultaneously visible
to exactly k cameras and wi is the total weight of all the cells in the viewable region of
camera i. Observe that when the reference points are uniformly distributed, each point gets
approximately the same weight and the weighted estimate reduces to Equation 4.1. The tessellation requires the coordinates of the reference points in
order to partition the viewable region into cells or polygons. Since reference point coordinates
are unknown, our techniques must estimate them during the initialization phase
(without using any infrastructure support). We describe how to do this in Section 4.3.2.
Instead of computing an exact tessellation, we use a discrete approximation: the viewable region is divided into small cubes, and for each cube the
closest viewable reference point from the center of the cube is calculated. The volume of
the cube is added to the weight of that reference point. When all cubes are associated and
their volumes added to the respective reference points, the weight of each reference point
is in proportion to the density of points in its vicinity: points in less dense regions
have higher weights than points in denser regions, thereby yielding an approximation
of the tessellation process.

Figure 4.3. Region of overlap estimation using reference points and Voronoi tessellation.
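The cube-based approximation can be sketched as follows (illustrative names; a uniform grid of cell centers stands in for the cube discretization, shown in 2D for brevity):

```python
import numpy as np

def approx_weights(ref_pts, grid, cell_vol):
    """Approximate Voronoi weights by assigning each grid cell's volume to
    its nearest reference point (the cube discretization described above).

    ref_pts : (n, d) array of reference point coordinates (camera-relative).
    grid    : (m, d) array of cell-center coordinates covering the view volume.
    cell_vol: volume of one cell.
    Returns a length-n array of weights (approximate Voronoi cell volumes).
    """
    weights = np.zeros(len(ref_pts))
    for c in grid:
        nearest = np.argmin(np.linalg.norm(ref_pts - c, axis=1))
        weights[nearest] += cell_vol   # cell volume accrues to nearest point
    return weights
```

The resulting weights plug directly into Equation 4.2: wi is the sum of all weights and wik the sum over points visible to exactly k cameras. An isolated reference point accumulates many cells and hence a large weight, which is exactly the skew correction described above.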
Since k-overlap only indicates the extent of overlap but does not specify where the over-
lap exists, our techniques also determine region of overlap for each camera. Like before,
we assume a reference object placed at randomly chosen locations. Using these points, first
a Voronoi tessellation of the viewing area is obtained for each camera. The region of overlap
for any two cameras Ci and Cj is simply the union of cells containing all reference
points simultaneously visible to the two cameras. Figure 4.3 shows the Voronoi tessellation
of the 2D viewing region of camera 1, the reference points viewable by cameras 1 and 2,
and the approximate region of overlap (shaded region) for (C1, C2). Thus, our approximate
tessellation (described in Section 4.3.1.3) can be used to determine the region of overlap as well.
A key insight is that if each camera can determine the coordinates of visible reference
points relative to itself, then tessellation is feasible—absolute coordinates are not required.
Figure 4.4. Relative coordinates of a reference point: (a) the distance is obtained from the known object size s, its projected size on the image plane, and the focal length f; (b) the direction is given by the vector from the projected image through the center of the lens C at (0, 0, 0).
Assuming the origin lies at the center of the lens, the relative coordinates of a point are
defined as (dr , vr ), where dr is its distance from the origin, and v~r is a vector from the
origin in the direction of the reference point that defines its orientation in 3D space.
We have assumed that the size of the reference object is known a priori, say s. The focal length
f is also known. The camera first estimates the size of the image projected by the
object; this is done by computing the bounding box around the image, determining its
size in pixels, and using the size of the CMOS sensor to determine the physical size of those
pixels. If s' denotes the size of the image projected by the reference object on the camera, then
tan θ = s / dr = s' / f    (4.3)
Since s, s' and f are known, dr can be computed. A similar idea holds in 3D space. Next, suppose
that the reference object projects an image at pixel coordinates (x, y) on the image plane of
the camera. Then the vector v~r has the same orientation as the vector that joins the centroid
of the image to center of the lens (i.e., the origin). As shown in Figure 4.4(b), the vector
P~O = (x, y, f ) has the same orientation as v~r , where O is the origin and P is the centroid
of the image with coordinates (−x, −y, −f ). Since (x, y) can be determined by processing
the image and f is known, the relative orientation of the reference point can be determined.
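Combining the two steps, equation 4.3 for the distance and the projection vector for the direction, gives the relative coordinates; a minimal sketch with illustrative units (all lengths in meters):

```python
import numpy as np

def relative_coords(s, s_img, f, x, y):
    """Relative location of a reference object from a single image.

    s     : known physical size of the reference object.
    s_img : size of its projection on the image sensor (same units as f).
    f     : focal length.
    x, y  : image-plane coordinates of the image centroid.
    Returns (d_r, v_r): distance from the lens center (Eq. 4.3) and a unit
    direction vector toward the object (Figure 4.4(b)).
    """
    d_r = s * f / s_img            # from s / d_r = s' / f
    v = np.array([x, y, f])        # direction of the vector PO
    return d_r, v / np.linalg.norm(v)
```

For example, a 0.2 m object projecting a 1 mm image through a 5 mm lens is at distance 0.2 × 0.005 / 0.001 = 1 m from the lens center.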
4.4 Applications
In this section, we describe how cameras that are initialized approximately can satisfy
application requirements.
4.4.1 Duty-Cycling
increase lifetime while providing the desired event-detection reliability and also to bound
the maximum time to detect an event. The duty-cycling parameter d is commonly defined
as the fraction of time a sensor is ON. An important criteria in deciding the duty-cycle
parameter is the degree of overlap. Sensors with high coverage redundancy can be operated
at low duty cycles to provide desired event detection probability, whereas those with lower
redundancy will require higher duty cycles. One technique to determine the duty-cycle parameter is:
di = Σ_{k=1}^{n} Oik × (1/k)        (4.4)
where, di is the duty-cycle parameter of camera i, Oik the fraction of k-overlap with the
neighboring cameras and n the total number of cameras. The intuition is to duty-cycle each camera in inverse proportion to its coverage redundancy: heavily overlapped regions need each camera awake only a fraction of the time.
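Equation (4.4) is straightforward to compute; the following sketch (illustrative function name) evaluates it on the 1- through 4-overlap fractions reported later for the four-camera simulation setup:

```python
def duty_cycle(overlap):
    """Duty-cycle parameter d_i of Eq. (4.4); overlap[k-1] holds O_ik, the
    fraction of camera i's viewing region that is k-overlapped."""
    return sum(o_ik / k for k, o_ik in enumerate(overlap, start=1))

# e.g. duty_cycle([0.54, 0.23, 0.07, 0.16]) is roughly 0.718
```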
4.4.2 Triggered Wakeup
Object tracking involves continuous monitoring of an object—as the object moves from
the range of one camera to another, tracking responsibilities are transferred via a handoff.
Since cameras may be duty-cycled, such a hand-off involves a triggered wakeup to ensure
Figure 4.5. The projection line of an object's image and the distance threshold used to identify candidate cameras for a triggered wakeup.
that the destination camera is awake. A naive solution is to send triggered wakeups to all
overlapping cameras and have one of them take over the tracking. While doing so ensures that tracking continues, it results in many wasteful wakeups. A more intelligent technique is to determine the trajectory of the object and, using
the region of overlap, determine which camera is best positioned to take over tracking duties.
However, since the object location is unknown to the sensor network, its trajectory can
not be accurately determined. The only known information about the object is the image
it projects onto the camera’s image plane—the object is known to lie along a line that
connects the image to the center of the lens. As shown in Figure 4.5, we refer to this line as
the projection line, the line on which the object must lie. We can exploit this information
to design an intelligent triggered wakeup technique. Any camera whose region of overlap
intersects with the projection line can potentially view the object and is a candidate for a
handoff. To determine all such cameras, we first determine the set of reference points within
a specific distance threshold of the line (see Figure 4.5). To determine these reference
points, equidistant points along the length of the projection line are chosen and reference
points within the distance threshold are identified. Next, the set of neighboring cameras
that can view these reference points is determined (using information gathered during our
initialization process). One or more of these cameras can then be woken up. Depending
on the extent of overlap with the projection line, candidate cameras are prioritized and
woken up in priority order—the camera with highest overlap has the highest probability
of detecting the object on wakeup and is woken up first. Two important parameters of the
scheme are the distance threshold and the maximum number of cameras to be woken up. A
large distance threshold will capture many reference points and yield many candidates for
wakeup, while a small threshold will ignore overlapping cameras. The maximum number
of cameras to be woken up bounds the redundancy in viewing the same object by multiple
cameras—a small limit may miss the object whereas a large limit may result in wasteful
wakeups. We discuss the effect of these parameters as part of the experimental evaluation.
Our approximate initialization techniques rely on a set of reference points (or objects). Reference points are objects like a ball with a unique color
or a light source, that can be easily identified by processing images at each camera. Each
camera after taking a picture, processes the image to determine if it can view a reference
point. If a reference point is visible to a camera, it calculates the location of the reference
point on its image plane and if possible estimates the location of the reference point. The
location can be estimated using an approximation of the distance of the reference point
from the camera. The distance can be determined if dimensions of the reference object
are known a priori along with the size of its image on the camera's image plane. The
image location and distance of object information is exchanged with all other cameras in
the network. The data recorded at each camera can be stored as a table of tuples of the form ⟨Rk, (ui, vi), di⟩,
where, Rk is the k th reference point visible to camera i, (ui , vi ) is the projection location
of the reference point in the image plane and di is the distance of the reference point from
the camera. The tuple also stores information from each camera that can view the refer-
ence point simultaneously. Based on this information collected at each camera, techniques
described above are used to initialize cameras.
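One way to represent the per-camera table of tuples is the following sketch; all field and variable names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ViewEntry:
    """One table tuple <Rk, (ui, vi), di> recorded at camera i."""
    ref_point: int                   # identifier k of the visible reference point
    image_loc: Tuple[float, float]   # (ui, vi), projection on the image plane
    distance: float                  # di, estimated distance from the camera
    also_viewed_by: List[int] = field(default_factory=list)  # cameras seeing it simultaneously

# each camera keeps its own list of entries, exchanged with the network via broadcast
view_table: Dict[int, List[ViewEntry]] = {
    1: [ViewEntry(3, (0.42, -0.17), 54.0, also_viewed_by=[2, 4])],
}
```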
Figure 4.6. (a) Network setup of the prototype implementation and (b) software components: the Cyclops performs image grabbing, object detection and bounding-box computation, while the HostMote maintains the ViewTable and runs the initialization procedure.
The network setup for our prototype implementation is shown in Figure 4.6(a). The cameras are equidistantly placed on the longest side, each at a height of 3 ft, facing each other and viewing inside the cubical volume. The depth-of-view for each camera is 8 ft and the horizontal and vertical viewing regions are 7 ft and 6 ft respectively. The setup is used to estimate and validate the degree and region of overlap of each camera.
Hardware Components We used the Cyclops [44] camera sensor in our prototype implementation. Each Cyclops sensor consists of an ADCM 1700 CMOS camera module, and supports image resolutions of
32x32, 64x64 and 128x128. The Cyclops node also has an on-board ATMEL ATmega128L
micro-controller, 512 KB external SRAM and 512 KB Flash memory. The on-board pro-
cessing capabilities of the Cyclops are used for object detection and to detect the size of
object’s image. Each Cyclops sensor is connected to a Crossbow Mote (referred to as the
HostMote) and they communicate with each other via the I2C interface. The HostMote is
also used to receive and send wireless messages and store initialization information on be-
half of the Cyclops. A mote is also used as a remote control to send synchronized sampling
triggers to detect reference points during the initialization process. We experimented with
different objects as reference points in our experiments—small balls with unique colors, a
bulb and a glowing ball.
Software Components Both the Cyclops sensors and the Intel Crossbow Motes run
TinyOS [57]. Each Cyclops communicates with its attached mote using the I2C interface
and the motes communicate with each other via their wireless interface (see Figure 4.6(b)).
Cyclops Onboard Tasks: Each Cyclops is responsible for taking images and processing
them locally to detect the reference objects. On receiving a trigger from the HostMote each
Cyclops takes a picture and processes it to detect and recognize reference objects. The
results are communicated back to the HostMote.
HostMote Tasks: The HostMote drives each Cyclops to detect reference objects and
stores all the initialization information for each camera. Once a reference object is de-
tected, the HostMote estimates the distance of the object from the camera and transmits
a broadcast message indicating visibility of the reference object, coordinates of the object
on its image plane and the distance of the object from the camera. Further, the HostMote receives similar broadcasts from other nodes and maintains the ViewTable, a table of tuples as described above.
Trigger Mote Tasks: The trigger mote is used as a remote control for synchronized
detection of the reference object. Once a reference object is placed in a location, the trigger
mote sends a wireless broadcast trigger to all HostMotes, which in turn trigger the attached
Cyclops sensors.
In this section, we evaluate our techniques for estimating the degree of overlap and region of overlap of camera sensors. In addition, we evaluate the effect of skew in the location of reference points on the accuracy of estimation. Further,
we also evaluate the performance of a triggered wakeup application which demonstrates
effective use of the region of overlap information.
The simulation setup used for evaluation consisted of a cubical region with dimen-
sions 150x150x150. Two cases, one with 4 cameras and the other with 12 cameras are
used. In the first case, 4 cameras are placed at locations (75,0,75), (75,150,75), (0,75,75),
(150,75,75), oriented perpendicular to the side plane looking inwards. The k-overlap at
each camera is as follows: 1-overlap: 0.54, 2-overlap: 0.23, 3-overlap: 0.07 and 4-overlap:
0.16. In the second case, an additional 8 cameras are placed at the 8 corners of the cube and
each of them is oriented inwards with the central axis pointing towards the center of the
cube.
Reference points were distributed uniformly at random in the cubical viewing region. To simulate a skewed distribution, a fraction of refer-
ence points were distributed in a smaller region at the center of the viewing region and the
rest were distributed in the entire viewing area. For example, a region of size 25x25x25 at
the center of the viewing region, in different cases, had at least 25%, 33%, 50%, 66% and
75% of total points within its boundary. We also used restricted regions of size 50x50x50.
In this section, we present an evaluation of the techniques used to estimate k-overlap, the
degree of overlap metric, and its use to estimate the duty-cycling parameter.
Figure 4.7 plots the error in k-overlap estimation using the four camera setup with uni-
form distribution of reference points. The absolute difference in the approximate estimation
Figure 4.7. Evaluation of k-overlap estimation scheme with uniform distribution of refer-
ence points.
and the exact k-overlap fraction averaged over the 4 cameras is reported as error. The error
in k-overlap estimation using both the non-weighted and weighted techniques is similar.
Figure 4.7 also plots the effect of the number of viewable reference points: the error in k-overlap estimation decreases with an increase in the number of reference points for both the non-weighted and weighted schemes. Error in 1-overlap estimation with the weighted scheme
decreases from 0.075 to 0.04 with 50 and 150 reference points respectively.
Figure 4.8 plots the k-overlap estimates with non-uniform distribution of reference
points. The results are averaged for the different fractions of skew within a restricted re-
gion of 25x25x25. As seen from the figure, the weighted scheme accounts for skew better
than the non-weighted scheme—with most benefits for 1-overlap and 4-overlap estima-
tion. The non-weighted scheme performs poorly as it only counts the number of simulta-
Figure 4.8. Evaluation of weighted k-overlap estimation with skewed distribution of refer-
ence points.
neously viewable points, while the weighted scheme accounts for the spatial distribution
of the points. Further, with increase in number of reference points, the error with the
weighted scheme decreases, whereas that with the non-weighted scheme remains the same.
Figure 4.9(a) plots the k-overlap with 150 reference points, and it shows that the weighted
scheme performs better than the non-weighted scheme. Figure 4.9(b) plots the error with increasing skew. As skew increases, so does the error in both the non-weighted and weighted schemes, with the
error with the weighted scheme being smaller than the non-weighted scheme. The increase
in error is also more gradual with the weighted scheme as compared to the non-weighted
scheme. The error with the non-weighted scheme increases from 0.26 to 0.49 with increase
in skew fraction from 25% to 75% and the corresponding values for the weighted scheme
are 0.045 and 0.09 respectively.
Figure 4.9. (a) Error in k-overlap estimation with 150 reference points, (b) error versus the fraction representing skew, and (c) percentage error in duty-cycle parameter estimation versus the number of reference points.
Duty-Cycling The percentage error in duty-cycle parameter estimation (see Section 4.4.1)
using the k-overlap estimates is shown in Figure 4.9(c). As seen from the figure, error using
the non-weighted scheme is close to 24% and remains unchanged with increase in refer-
ence points. Whereas, error with the weighted scheme is 5% even with only 50 points and
From the results presented above, we conclude that the weighted k-overlap estimation
scheme is well suited to estimate degree of overlap of cameras. The scheme performs
identical to the non-weighted scheme with a uniform distribution of reference points and significantly better with a skewed distribution.
In this section, we present an evaluation of region of overlap estimation and the triggered
wakeup heuristic that uses this estimate. Figure 4.10(a) plots results evaluating the effect of
number of reference points on region of overlap estimation. The percentage error reported
is the absolute error in estimated volume corresponding to a region of overlap and the ex-
act volume. As seen in Figure 4.10(a), with uniform distribution of reference points, the
percentage error of all four cameras follows a similar trend. With 50 reference points the
percentage error for the four cameras is between 21-23% and with 100 reference points is
Figure 4.10. (a) Percentage error in region of overlap estimation versus the number of reference points, (b) fraction of positive wakeups versus the wakeup threshold (maximum number of triggered cameras), and (c) fraction of positive wakeups versus the wakeup distance threshold.
12-14%. With higher number of reference points the error decreases and so does the stan-
dard deviation. With 200 reference points the error is 7-8% and with 250 points is 6-7%.
The above results show that the region of overlap between a pair of cameras can be estimated with low error using a modest number of reference points. Next, we evaluate the triggered wakeup heuristic that uses the region of overlap estimates with the 12-camera setup. Figure 4.10(b) plots the effect of the
maximum number of cameras triggered on the fraction of positive wakeups, i.e., fraction
of cases when at least one of the triggered cameras could view the object. As seen from the
figure, with increase in maximum number of cameras triggered per wakeup, the fraction
of positive wakeups increases. Further, the fraction also increases with increase in total
reference points in the environment. The fraction of positive wakeups with a maximum
of 2 cameras to be triggered is 0.7 and 0.88 for 100 and 300 reference points respectively
with a distance threshold (see Section 4.4.2) of 20 inches. With a maximum of 5 cameras
to be triggered the corresponding fractions are 0.77 and 0.93 respectively. The fraction of
positive wakeups is over 0.8 with a maximum of 2 wakeups per trigger. The result shows
that the wakeup heuristic based on the region of overlap estimate can achieve a high fraction of positive wakeups.
Camera   (a) k-overlap error   (b) Region-of-overlap error
1        1.5%                  2.4%
2        7.1%                  2%
3        4.9%                  6.4%
4        5.8%                  10.8%
5        8.7%                  3%
6        3.1%                  4.7%
7        7.9%                  4.3%
8        6.7%                  0.65%

Figure 4.11. (a) k-overlap error, (b) region-of-overlap error, and (c) distance estimation error (percentage error versus true distance in inches).
Another parameter that influences the performance of the heuristic is the distance thresh-
old—the distance along the projection of the object’s image used to approximate overlap-
ping cameras. As shown in Figure 4.10(c), with increase in distance threshold from 10 to
20 with 200 reference points, the fraction of positive wakeups increases and remains rela-
tively constant for a maximum 2, 3, 4 and 5 triggered cameras. With just one camera to be
woken up for each trigger, the fraction of positive wakeups decreases with further increase
(beyond 20) in distance threshold. This indicates that the distance threshold is an important
factor affecting the performance of the heuristic and for our setup a threshold of 20 yields
best performance.
In this section, we evaluate the estimation of k-overlap and region of overlap using our
prototype implementation. As described in Section 4.5, we use 8 cameras in our setup and
a light-bulb (1.5 in in diameter) as a reference object placed uniformly in the region viewed
by the cameras. Table 4.11(a) shows the average k-overlap percentage error at each camera.
We also evaluate the accuracy of region of overlap estimate between pairs of cameras
in the 8-camera setup. Figure 4.11(b) tabulates the average percentage error estimating the
region of overlap between pairs of cameras. The average error in estimating the region of
72
overlap between pairs of cameras varies from 1-11% for our setup. An important factor
that affects the region of overlap estimate is the distance estimate of the object from the
camera. Figure 4.11(c) plots the percentage error in estimating the distance of the object
from the camera based on its image size. As can be seen from the figure, the error varies from 2-12%. For our setup, the region of overlap estimates show that the error is below 11% in spite of the error in distance estimation of the object.
Our results show that the approximate initialization techniques are feasible in real-
world deployments and for our setup had errors close to 10%.
4.7 Conclusions
In this chapter, we argued that traditional vision-based techniques for accurately calibrating cameras are not directly suitable for ad-hoc deployments of sensor networks. Instead, we presented techniques to approximately determine the positions and orientations of camera sensors without any use of landmarks or positioning technol-
ogy. By randomly sampling the environment with a reference object, we showed how to
determine the degree and range of overlap for each camera and how this information can
be exploited for duty cycling and triggered wakeups. We implemented our techniques on a
Mote testbed. Our experimental results showed that our approximate techniques can esti-
mate the degree and region of overlaps to within 10% of their actual values and this error
is tolerable at the application-level for effective duty-cycling and wakeups.
CHAPTER 5
In this chapter I will present the design and implementation of SensEye, a multi-tier heterogeneous camera sensor network, to demonstrate the benefits of multi-tier sensor networks. A simple surveillance application implemented using SensEye is used to study the
A camera sensor network will need to perform several processing tasks in order to
obtain useful information from the video and images acquired by various camera sensors.
Two sample applications are surveillance and monitoring in a disaster response to provide visual feedback, and monitoring of rare species in remote forests. Both applications have a common set of tasks.
Object detection: The application needs to detect the presence of each new object whenever it enters the monitored environment. To illustrate, the rare species monitoring
application needs to detect the presence of each animal that enters the monitored environ-
ment, while the surveillance application needs to detect vehicles or people that enter the
monitored area. A good detection algorithm will minimize the latency to detect each new object.
Object recognition: Once a new object is detected, it needs to be classified to deter-
mine its type (e.g., a car versus a truck, a tiger versus a deer). This process, referred to
as object recognition, enables the application to determine if the object is of interest and
whether further processing is warranted. For instance, a surveillance system may be inter-
ested in counting the number of trucks on a highway but not cars. In this work, I assume
that an image database of all interesting objects is available a priori, and the recognition
step involves determining if the newly detected object matches one of the objects in this
database.
Object tracking: Assuming the new object is of interest to the application, it can be
tracked as it moves through the environment. Tracking involves multiple tasks: (i) comput-
ing the current location of the object and its trajectory, (ii) handoff of tracking responsibility
as an object moves out of visual range of one camera sensor and into the range of another,
and (iii) streaming video or a sequence of still images of the object to a logging store or a
monitoring station.
The goal is to devise a hardware and software architecture to perform these tasks so as to maximize system lifetime while meeting application requirements such as latency and reliability. As explained earlier, rather than choosing a single platform and a single type
of camera sensor, the thesis focuses on multi-tier networks where the detection, recognition
and tracking may be performed on different nodes and cameras to achieve the above goal.
SensEye is a camera sensor network comprising multiple tiers (see Figure 5.1). A
canonical sensor node within each tier is assumed to be equipped with a camera sensor,
a micro-controller, and a radio as well as on-board RAM and flash memory. Nodes are
assumed to be tetherless and battery-powered, and consequently, the overall constraint for
each tier is energy. Within each tier, nodes are assumed to be homogeneous, while dif-
ferent tiers are assumed to be heterogeneous with respect to their capabilities. In general,
Figure 5.1. The three-tier SensEye architecture: Tier 1 Motes connected to CMUcams over serial cables, Tier 2 nodes with webcams, and Tier 3 PTZ cameras attached to mini-ITX embedded PCs over Ethernet; nodes within and across tiers communicate via radio.
it is assumed that the processing, networking, and imaging capabilities improve as we proceed from a
lower tier to a higher tier, at the expense of increased power consumption. Consequently,
to maximize application lifetime, the overall application should use tier-specific resources
judiciously and should execute its tasks on the most energy-efficient tier that has sufficient
resource to meet the needs of that task. Thus, different tasks will execute on different tiers
and various tiers of the camera sensor network will need to interact and coordinate to achieve
application goals. Given these intra- and inter-tier interactions, application design becomes
more complex—the application designer needs to carefully map various tasks to different tiers.
One of the goals of SensEye is to illustrate these tradeoffs while demonstrating the over-
all benefits of the multi-tier approach. To do so, SensEye assumes a three-tier architecture
(see Figure 5.1). The lowest tier in SensEye comprises Mote nodes [37] equipped with
900MHz radios and low-fidelity Cyclops or CMUcam camera sensors. The second Sens-
Eye tier comprises Stargate [55] nodes equipped with web-cams. Each Stargate is equipped
with an embedded 400MHz XScale processor that runs Linux and a web-cam that can cap-
ture higher fidelity images than Tier 1 cameras. Each Tier 2 node also consists of two
radios—a 802.11 radio that is used by Stargate nodes to communicate with each other, and
a 900MHz radio that is used to communicate with Motes in Tier 1. The third tier of Sens-
Eye contains a sparse deployment of high-resolution pan-tilt-zoom cameras connected to
embedded PCs. The camera sensors at this tier are retargetable and can be utilized to fill
small gaps in coverage provided by Tier 2 and to provide additional redundancy for tasks
such as localization.
Nodes in each tier and across tiers are assumed to communicate using their wireless
radios in ad-hoc mode; no base-stations are assumed in this environment. The radio inter-
face at each tier is assumed to be individually duty-cycled to meet application requirements
of latency and lifetime constraint on each node. Consequently, the application tasks need
to be designed carefully since the radios on the nodes (and the nodes themselves) are not
“always-on”.
Given the above system model, the key design principles for the design and implementation of SensEye are described next.
• Principle 1: Map each task to the least powerful tier with sufficient resources: In
order to judiciously use energy resources, each sensing and processing task should
be mapped to the least powerful tier that is still capable of executing it reliably within
the latency requirements of the application—running the task on a more capable tier wastes energy resources. For instance, a higher fidelity camera can be woken up to acquire a high-resolution image only after a new
object is detected by a lower tier. By putting more energy-constrained higher-tier
Figure 5.2. Software architecture of SensEye.
nodes in sleep mode and using triggers to wake them up on-demand, our system can achieve significant energy savings.
• Principle 2: Exploit redundancy in the coverage of cameras whenever possible: For example, two cameras with overlap-
ping coverage can be used to localize an object and compute its (x, y, z) coordinates
in the environment; this information can then be used to intelligently wakeup other
nodes or to determine the trajectory of the object. Thus, redundancy in sensor coverage can be exploited to improve system performance.
Task allocation in SensEye is a point solution in the space of all possible allocation
permutations across tiers. The static task allocation is based on the power requirements
of each task and the power requirements and capabilities of nodes at each tier. Figure 5.2
shows different components of SensEye and mapping of each task to the corresponding tier.
Following is a description of each task and its instantiation in SensEye.
5.3.1 Object Detection
The first task of a camera sensor network is to detect the presence of objects as they enter a region of interest. Low latency detection is possible with always-on nodes or with dense node deployment and efficient duty-cycling. Always-on nodes preclude energy-efficient operation, whereas periodic sampling can be used to bound the latency of detection and improve energy efficiency.
In general, object detection is the simplest task and hence is assigned to Tier 1, the tier
with the least power requirements and least image fidelity. Tier 1 nodes wakeup period-
ically, acquire an image of the environment and process the image to detect presence of
objects. The sampling rate can be varied to change the bound on latency of detection and also
the energy usage of the node. Nodes are initialized randomly for non-synchronized duty
cycles. Nodes perform object detection using a simple frame differencing mechanism. An
image of the background is stored at each node and is used for frame differencing for each
captured image. The frame difference is passed through a simple threshold-based noise
filter to get a cleaned foreground image. The number of foreground pixels, together with a threshold, is used to decide whether an object is present.
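The frame-differencing detection step can be sketched as follows; the function name and threshold values are illustrative:

```python
import numpy as np

def detect_object(frame, background, noise_thresh=25, min_pixels=50):
    """Frame-differencing detection as described above: difference the captured
    frame against the stored background, apply a threshold-based noise filter,
    and declare an object present if enough foreground pixels remain."""
    # signed arithmetic avoids uint8 wraparound in the difference
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    foreground = diff > noise_thresh        # cleaned foreground mask
    return int(foreground.sum()) >= min_pixels
```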
In SensEye the higher tier nodes are by default asleep to avoid usage of nodes with
higher power requirements and conserve energy. Once Tier 1 nodes detect object pres-
ence, one or more Tier 2 or Tier 3 nodes need to be woken up for further processing, i.e.,
recognition and tracking.
Two important aspects of inter-tier wakeup are: (i) Intelligent wakeup of appropriate
higher tier nodes and (ii) Inter-tier wakeup latency. Intelligent wakeup of higher nodes
can be achieved assuming location and coverage information of each sensor is known and
further the object can be localized. Object localization (described in further detail in Sec-
tion 5.3.5) is possible if more than one camera sensor views the object simultaneously. The
79
object’s location along with the coverage and location information of each camera can be
used for intelligent wakeup of higher tier nodes and reducing high power wasteful wakeups.
If an object is observed by a single sensor, all higher tier nodes with overlapping coverage need to be woken up to ensure high reliability.
The separation of detection and recognition tasks across tiers introduces latency. The la-
tency includes the delay in receiving the wakeup signal and the delay in transition from the
sleep to wakeup state. SensEye uses several optimizations to reduce the inter-tier wakeup
latency. The wakeup begins with the transmission of a short wakeup packet from Tier 1.
Low-power always-on components at higher tiers process these packets and transition the
higher power subsystems from sleep to wakeup for further processing. The techniques
used are similar to those used in Triage [6] and wake-on-wireless [14]. Further, nodes at
higher tiers load the bare minimum device drivers needed for operation—thereby keeping the wakeup latency small.
Once an object’s presence is detected, the next step is to recognize and classify it.
Higher tier nodes capable of acquiring high fidelity images are used for this purpose. The
recognition task is used to identify objects of interest to the application. Recognition involves obtaining an image of the environment, isolating the object from the fore-
ground, identifying object features, and using similarity analysis in conjunction with an image database. High fidelity images result in higher recognition accuracy but also require more processing resources.
In this work, SensEye uses a simple pixel-based comparison as a proof of concept object
recognition technique. A connected components [47] algorithm isolates objects from the foreground, and a color matching heuristic matches the object to the image database.
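The exact color matching heuristic is not detailed here; one plausible sketch compares the object's RGB histogram against labeled database histograms using histogram intersection (all names, bin counts and the database format are illustrative):

```python
import numpy as np

def color_match(obj_pixels, db_histograms, bins=8):
    """Illustrative color-matching heuristic: build a normalized RGB histogram
    of the isolated object's pixels and return the database label with the
    largest histogram intersection."""
    hist, _ = np.histogramdd(obj_pixels.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist /= hist.sum()
    # histogram intersection: larger overlap means a closer color match
    return max(db_histograms, key=lambda label: np.minimum(hist, db_histograms[label]).sum())
```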
Figure 5.3. 3D object localization using views from two cameras.
Object tracking combines the above tasks: object detection as the object moves through the field of view of cameras, object recognition to ensure that the object of interest to the application is tracked across cameras, and inter-tier wakeup as well as recognition. As the object moves through the covered region, different
Tier 1 nodes detect the target. If multiple nodes detect the target, localization can be used
to accurately pinpoint the location of the target. Continuous localization can be used to track
the path of the moving object. Our current prototype can handle slow moving objects, and
trajectory prediction schemes for fast moving objects (using techniques such as [69]) is the
subject of ongoing research. Future SensEye mechanisms can enable acquired images or video of the object to be streamed to a monitoring station.
An object’s location can be determined if more than one camera sensor can view and
detect the object simultaneously. Localization can provide several optimizations to improve
performance of SensEye. Localization at Tier 1 can be used to intelligently wakeup the
appropriate higher tier nodes and reduce wasteful wakeups. Tier 1 nodes could further
steer Tier 3 nodes in the direction of the object based on its location. Higher tier nodes
can track an object's movement based on the object's location and can also use it for track prediction.
The localization scheme implemented in SensEye works for a 3D setting and assumes
that cameras are calibrated at system setup—their locations and orientations are known relative to a common global reference frame. Consider the local reference frame of a camera, in which the image plane is the X-Y plane and the central axis is the Z axis. The center of the camera lens is at point P2 : (0, 0, f ), where f is the
focal length of the lens, and the centroid of the image of the object on the image plane is
P1 : (x, y, 0). The vector, v, along which the object's centroid lies is, therefore, computed as v = P2 − P1 = (−x, −y, f ). The centroid is obtained by processing the image, isolating the object and calculating a bounding box around the object.
To translate the object’s vector, v, from the camera’s reference frame to the global refer-
ence frame, the rotation and translation matrices obtained during calculation of the camera
orientations are used. Each camera’s orientation consists of a translation and two rotations.
The translation from the global reference origin to the camera location is denoted by a translation vector (a, b, c), and the orientation by two rotations. Initially, the camera is assumed to be positioned with its central axis along the
Z axis and its image plane parallel to the global X-Y plane. First, the camera is rotated
by an angle of θ in the counter clockwise direction about the Z axis, resulting in X’ and
Y' as the new X and Y axes. Next, the camera is rotated by an angle φ in the clockwise
direction about the X’ axis, resulting in Y” and Z’ as the new Y and Z axes. The two rota-
tions are represented by a rotation matrix R and can be used to reverse transform the vector
calculated in Step 1 to the global reference frame. If v1 and v2 are the two vectors along
the direction of object location from cameras 1 and 2 respectively, the two corresponding
vectors in global reference frame are:
v1′ = R1 · v1        (5.1)
v2′ = R2 · v2        (5.2)
where, R1 and R2 are the composite rotation and translation matrices. The matrix R takes
the following form:
        | Cosθ   −SinθCosφ   −SinθSinφ   a |
    R = | Sinθ    CosθCosφ    CosθSinφ   b |        (5.3)
        | 0      −Sinφ        Cosφ       c |
        | 0       0           0          1 |
where, θ and φ are the rotation angles described in Step 2, and a, b and c are the translation components of the camera location.
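The composite matrix of Equation (5.3) can be constructed directly; the function name is illustrative and angles are in radians:

```python
import math

def composite_matrix(theta, phi, a, b, c):
    """Composite rotation-translation matrix R of Eq. (5.3)."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [[ct, -st * cp, -st * sp, a],
            [st,  ct * cp,  ct * sp, b],
            [0.0,     -sp,       cp, c],
            [0.0,     0.0,      0.0, 1.0]]
```

With θ = φ = 0 the rotation part reduces to the identity, leaving only the translation column (a, b, c), which matches the intuition that an unrotated camera differs from the global frame only by its position.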
Given the two vectors, v1′ and v2′, their intersection is the location of the object as shown in
Figure 5.3(c). Since the lines are in three dimensions they are not guaranteed to intersect
especially due to errors in centroid computation and camera calibration. A standard technique for approximating the intersection is the Closest Point of Approach [12].
The closest point of approach gives the shortest distance between the two lines in three
dimensions. We use this method to get points CP1 and CP2, the closest points on
vectors v1′ and v2′ respectively. The location of the object is given by the mid-point of CP1
and CP2 .
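The pipeline above—building the composite transform of Equation 5.3, mapping each camera's viewing ray into the global frame, and intersecting the rays via the closest point of approach—can be sketched as follows. This is a minimal illustration rather than the SensEye implementation; the focal length, angles and centroids in the usage example are made-up values.

```python
import numpy as np

def rotation_translation(theta, phi, t):
    """Composite transform of Eq. 5.3: rotate theta counter-clockwise
    about Z, then phi clockwise about the rotated X axis, and translate
    by t = (a, b, c), as a 4x4 homogeneous matrix."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    return np.array([[ct, -st * cp, -st * sp, t[0]],
                     [st,  ct * cp,  ct * sp, t[1]],
                     [0.0,      -sp,      cp, t[2]],
                     [0.0,      0.0,     0.0, 1.0]])

def global_ray(centroid_xy, f, R):
    """Ray from the image centroid P1 = (x, y, 0) through the lens
    center P2 = (0, 0, f), expressed in the global reference frame."""
    x, y = centroid_xy
    origin = (R @ np.array([0.0, 0.0, f, 1.0]))[:3]   # lens center, global frame
    direction = R[:3, :3] @ np.array([-x, -y, f])     # P2 - P1, rotated
    return origin, direction / np.linalg.norm(direction)

def closest_point_midpoint(p1, d1, p2, d2):
    """Closest point of approach of two 3D lines; the midpoint of the
    shortest connecting segment is the object location estimate."""
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b            # zero only if the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return ((p1 + s * d1) + (p2 + t * d2)) / 2.0
```

With two unrotated cameras 4 units apart and centroids chosen for an object at (1, 2, 5), the midpoint recovers the object location exactly; with noisy centroids the same code returns the approximation described above.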
Note that camera calibration and localization in 2D are simpler cases of the more gen-
eral 3D technique presented above.
5.4.1 Hardware Architecture
Our SensEye implementation uses four types of cameras—the Agilent Cyclops [45],
the CMUcam Vision sensor [9, 48], a Logitech Quickcam Pro Webcam and a Sony PTZ
camera—and three platforms—Crossbow Motes [37], Intel Stargates [55] and a mini-ITX
embedded PC. SensEye is a three-tier network, with the first two tiers shown in Figure 5.4.
Tier 1: Tier 1 of SensEye comprises a low-power camera sensor such as Cyclops [45]
connected to a low-power Mote [37] sensor platform. The Cyclops camera is currently
available only as a prototype. Therefore, the Cyclops platform is used for our individual
component benchmarks, and we substitute it with a similarly constrained but higher power
CMUcam in the full system experiments. The Cyclops board consists of a CMOS camera, an
ATMega128 micro-controller and a Xilinx FPGA. The board attaches using a standard 32-
pin connector to a Mote, and communicates with it using a UART. The software distribution for
Cyclops [45] provides support for frame capture, frame differencing and object detection.
The CMUcam consists of a CMOS camera and an SX52 micro-controller. The CMUcam connects to a Mote using
a serial interface, as shown in Figure 5.4(a). The CMUcam has a command set for its
micro-controller that can be used to wake up the CMUcam, set camera parameters, and capture
frames and frame differences. Tier 2: Tier 2 of SensEye comprises a higher-capability camera attached to an embedded sensor platform, with a
wakeup circuit to wake the node from the sleep or suspend state upon receiving a trigger
from a Tier 1 node. In our implementation, as shown in Figure 5.4(b), an Intel Stargate
sensor platform is used along with an attached Mote that acts as the wakeup trigger. Since
the Stargate does not have hardware support for being woken up by the Mote, a relay circuit
described in Turducken [54] is used for this purpose. The Logitech Webcam connects to
the Stargate through the USB port.
Figure 5.4. Prototype of (a) a Tier 1 Mote and CMUcam, and (b) a Tier 2 Stargate, web-cam
and Mote.
The software framework of SensEye is shown in Figure 5.5. The description of our
software framework assumes that Tier 1 comprises Motes connected to CMUcam cameras.
Substituting a CMUcam with a Cyclops involves minimal change in the architecture. The
first two tiers of SensEye comprise four software components: (i) CMUcam Frame Differ-
entiator, (ii) Mote–level Detector, (iii) Wakeup Mote, and (iv) Object Recognition at the
Stargate.
Tier 1 Frame Differentiator: The Tier 1 cameras receive periodic instructions from
the Mote to capture an image for differencing. On each such instruction, the CMUcam
captures the image in view, quantizes it into a smaller resolution frame, performs frame
differencing with the reference background frame and sends back the result to the Mote.
Frame differencing highlights the image areas where objects are present (through
non–zero difference values). The CMUcam has two modes of frame differencing: (i) a
low resolution mode, where it converts the current image (of 88 × 143 or 176 × 255 pixels) to
an 8 × 8 grid for differencing, and (ii) a high resolution mode, where a 16 × 16 grid is used for
Figure 5.5. SensEye software architecture. At Tier 1, the Mote-level Detector polls the
CMUcam Frame Differentiator and sends triggers; at Tier 2, the Wakeup Mote wakes the
Stargate, which runs the FrameGrabber and the Detection and Recognition components.
differencing. The frame differencing is at a very coarse level and hence has relatively high
error when estimating the location of the object or its bounding box.
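The coarse grid-differencing step can be sketched as below. This is an illustrative reimplementation in Python, not CMUcam firmware; the `pixel_thresh` and `cell_thresh` values are hypothetical.

```python
import numpy as np

def grid_quantize(frame, grid=8):
    """Average-pool a grayscale frame into a coarse grid x grid matrix,
    mimicking the CMUcam's low-resolution differencing mode."""
    h, w = frame.shape
    f = frame[: h - h % grid, : w - w % grid]      # crop to a multiple of grid
    bh, bw = f.shape[0] // grid, f.shape[1] // grid
    return f.reshape(grid, bh, grid, bw).mean(axis=(1, 3))

def frame_difference(frame, background, grid=8, pixel_thresh=20.0):
    """Coarse frame difference against the reference background; cells
    whose change exceeds pixel_thresh (an illustrative value) are marked."""
    diff = np.abs(grid_quantize(frame, grid) - grid_quantize(background, grid))
    return (diff > pixel_thresh).astype(int)

def detect_event(diff_grid, cell_thresh=2):
    """Mote-side decision: report an event if enough grid cells changed."""
    return int(diff_grid.sum()) >= cell_thresh
```

Because each cell averages an entire block of pixels, a detection only locates the object to within a grid cell, which is the coarse-localization error noted above.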
Mote–Level Detector: The function of the Tier 1 Mote is to control the CMUcam
and send object detection triggers to the higher level nodes. On startup, the Mote sends
initialization commands to the CMUcam, to set its background and frame differencing
parameters. Periodically, based on its sampling rate, the Mote sends commands to the
CMUcam to capture an image and perform frame differencing. The CMUcam responds
with the frame difference result. The Mote uses a user–specified threshold and the returned
frame difference result to decide whether an event (object appearance or object motion)
has occurred. If an event is detected, the Mote broadcasts a trigger for the higher tier. On
no event detection, the Mote sleeps until the next sampling time. Additionally, the Mote
includes the frame difference result in the trigger so that higher tiers can localize the object.
Wakeup Mote: The Mote connected to the Stargate receives triggers from the lower
tier Motes and is the interface between the two tiers. On receiving a trigger, the Mote can
decide whether to wake up the Stargate for further processing. Typically, the localized co-
ordinates are used for this purpose. Rather than actually computing the object coordinates
at a Tier 1 Mote, which requires significant coordination between the Tier 1 nodes, our
implementation lets the Wakeup Mote use the frame difference results received in triggers to
derive the coordinates. The Stargate is then woken up if the object location is within its
field of view, otherwise the trigger is ignored.
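The Wakeup Mote's decision can be sketched with a simplified cone-shaped field-of-view model; the positions, orientation vector and half-angle below are illustrative assumptions, not SensEye's actual viewing geometry.

```python
import math

def within_fov(cam_pos, view_dir, half_angle_deg, obj_pos):
    """True if the object lies inside the camera's viewing cone.
    view_dir must be a unit vector; the cone is a simplified FOV model."""
    rel = [o - c for o, c in zip(obj_pos, cam_pos)]
    dist = math.sqrt(sum(v * v for v in rel))
    cos_angle = sum(v * d for v, d in zip(rel, view_dir)) / dist
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= half_angle_deg

def handle_trigger(obj_pos, cam_pos, view_dir, half_angle_deg=30.0):
    """Wakeup Mote policy: wake the Stargate only for objects in view."""
    return "wakeup" if within_fov(cam_pos, view_dir, half_angle_deg, obj_pos) else "ignore"
```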
High Resolution Object Detection and Recognition: Once the Stargate is woken up,
it captures the current image in view of the webcam. Frame differencing and connected
component labeling [47] of the captured image along with the reference background image
is performed. This yields the pixels and boundaries where the potential objects appear
in the image. Smoothing techniques based on color threshold filtering and averaging of
neighboring regions are used to remove noise pixels. Each potential object then has to be
recognized. In our current implementation, we use an averaging scheme based on the pixel
colors on the object. The scheme produces an average value of the red, green and blue
components of the object. The values can be matched against a library of objects and the
closest match is declared as the object's classification. SensEye can be extended by adding
more sophisticated recognition algorithms; as an example, a face recognition system is
evaluated in the Experimental section to get an idea of its latency and energy overheads.
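The average-color matching scheme can be sketched as follows; the object library and pixel values are made-up examples, and a real deployment would populate the library from reference images.

```python
def average_color(pixels):
    """Mean (R, G, B) over the pixels belonging to a detected object."""
    n = float(len(pixels))
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def classify(pixels, library):
    """Return the library entry whose reference color is closest (in
    RGB Euclidean distance) to the object's average color."""
    avg = average_color(pixels)
    return min(library, key=lambda name: sum((a - b) ** 2
                                             for a, b in zip(avg, library[name])))
```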
PTZ Controller: The Tier 3 retargetable cameras are used to fill gaps in coverage and
to provide additional coverage redundancy. The pan and tilt values for the PTZ cameras are
based on localization techniques as described before. The cameras export an HTTP API for
program–controlled camera movement. We use one such HTTP–based camera driver [8]
to retarget the cameras. In the following sections, individual components are evaluated and
used to compare single–tier and multi–tier SensEye systems.
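The retargeting computation—turning a localized 3D position into pan and tilt angles for a PTZ camera—can be sketched as below, assuming pan is measured in the global X–Y plane and tilt above it; the actual HTTP driver commands are omitted.

```python
import math

def pan_tilt(cam_pos, obj_pos):
    """Pan/tilt angles (degrees) to point a PTZ camera at a localized
    object: pan is the azimuth in the global X-Y plane, tilt the
    elevation above it."""
    dx = obj_pos[0] - cam_pos[0]
    dy = obj_pos[1] - cam_pos[1]
    dz = obj_pos[2] - cam_pos[2]
    pan = math.degrees(math.atan2(dy, dx))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt
```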
Mode                       Latency (ms)   Avg. Current (mA)   Power (mW)   Energy (mJ)
Mote Processing            136            19.7                98.5         13.4
CMUcam Object Detection    132            194.25              1165.5       153.8

Table 5.1. SensEye Tier 1 (with CMUcam) latency breakup and energy usage. Total latency
is 136 ms and total energy usage is 167.24 mJ.
(Figure: current draw (mA) vs. time (seconds) for the Tier 1 Cyclops node, annotated with
modes A and B of Table 5.2.)
Mode                  Latency (ms)   Current (mA)   Power (mW)   Energy (mJ)
A: Object Detection   892            11             33           29.5
B: Idle               –              0.34           1            –

Table 5.2. SensEye Tier 1 (with Cyclops) latency breakup and energy usage.
The latency and energy usage benchmarks for Tier 1 and Tier 2 are reported in this
section.
Since minimizing energy usage is an important goal of SensEye, the power consumption
and latency of each hardware and software component in its different modes of operation are
systematically studied. Tables 5.1 and 5.2 report latency, average power consumption and
the energy usage for object detection at Tier 1 and Table 5.3 provides a similar breakdown
for object recognition at Tier 2.
Tier 1: As seen from Table 5.1, 97% of the total latency of object detection at Tier
1, i.e., 132 ms out of 136 ms, is due to CMUcam processing (frame capture and frame
differencing). Also, due to its higher power requirements, the CMUcam uses 92% of the energy,
i.e., 153.8 mJ out of 167.2 mJ. In contrast, the Cyclops (refer Table 5.2) is much more
energy efficient as compared to the CMUcam and consumes 33 mW for 892 ms, which
is better than the CMUcam by a factor of 5.67 in terms of energy usage. However, the
latency of detection at the Cyclops is around 900 ms, which is more than 6 times as much
as the CMUcam. This latency number is an artifact of the current Cyclops hardware and
can be reduced to around 200 ms with optimizations expected in future revisions of the
node. A breakup of the energy consumption of the Cyclops camera for detection is given
in Table 5.2.
Tier 2: The processing tasks at Tier 2 of SensEye can be divided as: wakeup from
suspend of the Stargate, stabilization after wakeup for program to start executing, camera
initialization, frame grabber, vision algorithm for detection and recognition and finally the
shutdown procedure for suspend, as shown in Table 5.3. The total latency at Tier 2 to com-
plete all operations is 4 seconds. The largest delays are during camera initialization (1.28 s)
and shutdown for suspend (1 s), with corresponding energy usages of 1725.4 mJ and 768.5
mJ. The task with the least latency is the algorithm used for object detection and recognition,
which has a latency of 105 ms and the least energy usage, 144.2 mJ.
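The energy figures in Table 5.3 follow directly from E = P × t; the per-mode energies and the 4 s / 4.71 J totals can be checked in a few lines:

```python
# (mode, latency in ms, power in mW), transcribed from Table 5.3
TIER2_MODES = [("wakeup", 366, 1008.0),
               ("wakeup stabilization", 924, 1256.5),
               ("camera initialization", 1280, 1348.0),
               ("frame grabber", 325, 1653.0),
               ("object recognition", 105, 1373.5),
               ("shutdown", 1000, 768.5)]

def energy_mJ(latency_ms, power_mW):
    """E = P * t, with t converted from milliseconds to seconds."""
    return power_mW * latency_ms / 1000.0

total_latency_s = sum(t for _, t, _ in TIER2_MODES) / 1000.0
total_energy_J = sum(energy_mJ(t, p) for _, t, p in TIER2_MODES) / 1000.0
```

Summing the six modes reproduces the 4-second total latency and approximately 4.71 J of energy per wakeup-to-shutdown cycle.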
The comparison of energy consumption and latency reveals some of the benefits of us-
ing a two-tier rather than a single-tier camera sensor network. Every wakeup to shutdown
cycle at Tier 2 consumes around 28 times as much energy as a similar task at a Tier 1 node
comprising CMUcams. When Tier 1 comprises Cyclops cameras instead of CMUcams,
the ratio of energy usage is 142. There are two reasons for this large difference in en-
ergy consumption between tiers. First, the latency associated with Linux operating system
wakeup from suspend state is significantly greater than the wakeup latency on a highly lim-
ited Mote platform that runs TinyOS. Second, the Stargate platform consumes significantly
greater power than a Mote during the wakeup period. The net effect of greater latency
(Figure: current draw (mA) vs. time (seconds) for a Tier 2 Stargate cycle, annotated with
modes A–G of Table 5.3.)
Mode                       Latency (ms)   Current (mA)   Power (mW)   Energy (mJ)
A: Wakeup                  366            201.6          1008         368.9
B: Wakeup Stabilization    924            251.2          1256.5       1161
C: Camera Initialization   1280           269.6          1348         1725.4
D: Frame Grabber           325            330.6          1653         537.2
E: Object Recognition      105            274.7          1373.5       144.2
F: Shutdown                1000           153.7          768.5        768.5
G: Suspend                 –              3              15†          –

Table 5.3. SensEye Tier 2 latency and energy usage breakup. The total latency is 4
seconds and total energy usage is 4.71 J. † This is measured on an optimized Stargate node with no peripherals
attached.
(Figure 5.6: placement of the four Tier 1 Motes (M1–M4) and the two Tier 2 Stargates
(S1, S2) over the viewable region.)
and greater power consumption results in significantly greater total energy consumption
for Tier 2.
Next, I present an evaluation of the full SensEye system and compare it to a single-tier
implementation. The comparison is along two axes—energy consumption and sensing re-
liability. Sensing reliability is defined as the fraction of objects that are accurately detected
and recognized.
The experimental setup consisted of circular objects projected onto a wall with an area
of 3m × 1.65m. Objects appeared at random locations sequentially and stayed for a speci-
fied duration. Only one object was present in the viewable area at any time. Object appear-
ances were interspersed with periods of no object being present in the viewable area. A set
of four Motes, each connected to a CMUcam, constituted Tier 1 and two Stargates, each
connected to a webcam, constituted Tier 2 of SensEye. Tier 1 Motes used a sampling period
of 5 seconds and their start times were randomized. The object appearance time was set to
7 seconds and the interval between appearances was set to 30 seconds. The single–tier sys-
tem consisted of the two Stargate nodes which were woken up every 5 seconds for object
detection. This differs from SensEye where a Stargate is woken up only on a trigger from
Tier 1. The nodes at both the tiers were placed in such a manner that each tier covered the
entire viewable region, as shown in Figure 5.6. The experiment used 50 object appearances in total.
Tables 5.4 and 5.5 report the number of wakeups and details of detection at each com-
ponent of the single–tier system and SensEye respectively. As can be seen from the tables,
the Stargates of the single–tier system wakeup more often than the Stargates at Tier 2 of
SensEye. A total of 621 wakeups occur in the single–tier system, whereas 58 wakeups
occur at Tier 2 of SensEye. The higher number of wakeups with the single–tier system is due to the
periodic sampling of the region to detect objects. Out of the total 621 wakeups, an object is
detected only 74 times in the single-tier system, whereas in SensEye Tier 1 performs initial
detection and the Tier 2 Stargates are woken up fewer times—resulting in lower energy
usage. The Tier 1 sensor nodes are cumulatively woken up 1216 times. The energy usage
of SensEye components when awake is 466.8 J, as compared to 2924.9 J for the single–tier
system, a factor of 6.26 reduction. If the CMUcams in SensEye were replaced by Cyclops
cameras, a factor of 9.75 reduction in energy usage is obtained.
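The reduction factors follow from the awake-energy totals of Tables 5.4 and 5.5 (the small differences from the quoted 6.26 and 9.75 are rounding):

```python
# Totals of energy consumed while components are awake
single_tier_J = 1464.8 + 1460.1   # two Stargates, Table 5.4
senseye_cmucam_J = 466.8          # SensEye with CMUcam Tier 1, Table 5.5
senseye_cyclops_J = 299.6         # SensEye with Cyclops Tier 1, Table 5.5

cmucam_factor = single_tier_J / senseye_cmucam_J
cyclops_factor = single_tier_J / senseye_cyclops_J
```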
Component    Total Wakeups   Object Found   No Object Found   Energy Usage (J)
Stargate 1   311             32             279               1464.8
Stargate 2   310             42             268               1460.1

Table 5.4. Number of wakeups and energy usage of a Single–tier system. Total energy
usage of both Stargates when awake is 2924.9 J. Total missed detections are 5.
Table 5.5. Number of wakeups and energy usage of each SensEye component. Total energy
usage when components are awake with CMUcam is 466.8 J and with Cyclops is 299.6 J.
Total missed detections are 8.
As reported in [45], the Cyclops with Mote consumes 1 mW in its sleep state, whereas
our Tier 2 Stargate node with webcam has a consumption of 464 mW in sleep mode and is highly unoptimized. Thus, in the suspend
state, the Tier 2 node consumes more than an order of magnitude more power than the Tier
1 nodes with Cyclops cameras. For our experimental setting of 30 seconds of idle time
between objects, this corresponds to an energy reduction by a factor of 33 for SensEye.
Next I compare the reliability of detection and recognition of the two systems in the
above described experimental setup. The single–tier system detected 45 out of the 50 object
appearances and SensEye detected 42—a 6% decrease in sensing reliability. The result
shows the efficacy of using SensEye instead of a single-tier network, as SensEye provides
similar detection performance (6% more missed detections) at an order of magnitude lower
energy requirements.

(Figure 5.7: percentage of undetected objects vs. (a) object in-view duration (seconds)
and (b) object speed (m/s).)
The sensing reliability of SensEye depends on the time for which an object is in
view, the sampling period at Tier 1, and the speed of the object if it is moving. Since increasing the
sampling period is equivalent to decreasing the time for which an object is in view, the effect of different
in-view times on sensing reliability is studied. Figure 5.7(a) plots the
fraction of undetected objects for object in-view times of 5, 7 and 9 seconds. As seen
from the figure, when an object is in view for 5 seconds, 52% of objects are not detected. With
a time of 9 seconds for each object to be in view, the percentage drops to zero. A time of
7 seconds yields an intermediate value of 16% undetected objects.
Next, the effect of object speed is studied, with objects moving across the viewable region
from one side to the other. The sampling period used at the Tier 1 nodes was 5 seconds. Figure 5.7(b)
plots the percentage of undetected objects at different speeds of the moving object. As can
be seen, at the slowest considered speed of 0.2 m/s, a sampling period of 5 seconds is able
to detect all objects at least once. A speed of 0.6 m/s results in 62% undetected objects.
(Figure 5.8: (a) fraction of wakeups with single versus overlapping coverage detection for
each component (M1–M4, S1, S2); (b) localization error (%) of the CMUcam (8×8 and
16×16 grids) and the webcam (80×60), with per-device averages.)
The trend shown is intuitive: for a given sampling rate, higher speeds lead to more undetected
objects. Based on the desired probability of detection, the plots can be used to choose an
appropriate sampling period.
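A first-cut model of this tradeoff treats an object crossing a region of width W at speed v as being in view for W/v seconds, and detected with probability min(1, (W/v)/T) under a randomly phased sampling period T. This is only a rough sketch (it ignores detection and processing latency, which is why the measured miss rates are higher); the 3 m width is taken from the experimental setup.

```python
def detection_probability(width_m, speed_mps, period_s):
    """Chance of sampling an object at least once while it crosses a
    region of the given width, under a randomly phased sampling period.
    A deliberately simplified model that ignores detection latency."""
    in_view_s = width_m / speed_mps
    return min(1.0, in_view_s / period_s)
```

For instance, at 0.2 m/s an object is in view for 15 s, well above the 5-second sampling period, consistent with all objects being detected at that speed.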
Since multiple tiers cover a given region of interest, an object can be localized at either
of the tiers for tracking. The deployment densities at the tiers differ, and so does their
spatial coverage redundancy. Further, the fidelity of images acquired by sensors at different
tiers also varies. In this section, I will present experiments that quantify spatial redundancy
and sensing reliability, by studying the localization opportunities and localization accuracy
of objects.
If an object can be simultaneously viewed by more than a single camera, it can be lo-
calized. Figure 5.8(a) plots, for each tier, the cases when only a single camera and multiple
cameras covered and detected an object. As can be seen, due to greater spatial redundancy
at Tier 1 than Tier 2, more objects can be detected by more than a single camera simulta-
neously. As a result, 54% of the objects can be localized by sensor nodes at Tier 1, while
only 28% of objects can be localized by nodes at Tier 2—results in line with the spatial
coverage redundancy of nodes at each tier.
Another important metric for tracking is the localization accuracy provided by the nodes
at each tier. Figure 5.8(b) is a scatter plot of localization accuracy for objects using the
CMUcam and the Webcam. The CMUcam uses 8 × 8 and 16 × 16 matrix representations
of the captured image (converted from 88×143 and 176×255 pixels respectively) for frame
differencing. This is representative of a typical centroid computation that is expected on
Cyclops nodes since these devices are resource-constrained both in memory and compu-
tation capability. The webcam uses a 80 × 60 representation calculated from a 320 × 240
pixels image. As seen from the figure, the webcam has the least localization error and
the CMUcam using an 8 × 8 representation the largest error. The average error for each
device is also shown in the figure.
Based on the above experiments, Tier 1 nodes can localize as much as twice the number
of objects as compared to Tier 2, but with 15-20% more error in accuracy. The trends
depicted in the figure indicate that if coarse location information is desired or suffices, Tier
1 nodes can be used for localization.
To test the coverage and retargetable feature of the Tier 3 PTZ cameras, the number of
times a Tier 3 node successfully views an object is measured. The experimental setup had
40% overlapping coverage among Tier 1 nodes, and the PTZ camera could view at most a
quarter of the total coverage area at any time. When an object was detected by more than
one Tier 1 node, the previously described 3D localization techniques were used to calculate
the pan and tilt values and retarget the Tier 3 camera. Out of the 50 object appearances,
the PTZ camera could view 46—a 92% success rate. The experiment verifies that 3D
localization techniques along with retargetable cameras have a high success rate and are
effective in filling gaps in coverage.

(Figure 5.9: (a) power consumption (mW) of the Mote, CMUcam and total at Tier 1 vs.
sampling period (seconds); (b) maximum detection distance (feet) at Tier 1 vs. confidence
threshold.)

SensEye has several tunable parameters which affect energy usage and sensing reliability.
In this section, I explore the sensitivity to two important system parameters: the
sampling period and the confidence threshold.
The power consumption at Tier 1 is a function of the sampling period used to probe
the CMUcam and check for object detections. Figure 5.9(a) plots the power consumption
at a Mote with increasing values of sampling period. The sampling period is varied from
100 ms to 10 seconds and the power consumption at these two ends is 137 mW and 105.7
mW respectively. While the power consumption reduces with increasing sampling period
as expected, it quickly plateaus, since the large sleep power consumption of the CMUcam
dominates at larger sampling periods.
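The plateau can be captured by a simple duty-cycle model: one active detection cycle per sampling period, sleep in between, so the average power tends to the sleep power as the period grows. The active and sleep power values below are illustrative, not the measured ones.

```python
def avg_power_mW(period_s, active_s, p_active_mW, p_sleep_mW):
    """Duty-cycle model of Tier 1 power draw: one detection cycle of
    active_s seconds per sampling period, sleep in between. As the
    period grows, the average tends to p_sleep_mW, producing the
    plateau seen in the measurements."""
    if period_s <= active_s:
        return p_active_mW          # node never gets to sleep
    duty = active_s / period_s
    return duty * p_active_mW + (1.0 - duty) * p_sleep_mW
```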
From a sensing reliability perspective, each Mote uses a confidence threshold value to
compare with the confidence with which a CMUcam reports a detection. The threshold
determines when triggers are sent to Tier 2. A higher threshold means closer objects will
be detected more easily than farther objects, and a lower threshold can more easily detect
objects at larger distances. The trend is verified by the plot shown in Figure 5.9(b). We
varied the confidence threshold from 30 to 100 and measured the maximum distance at
which objects are flagged as detected and a trigger sent to Tier 2. As can be seen in the
figure, a threshold of 30 can detect objects up to a distance of 6.5 feet, and with thresholds
greater than 80 the maximum distance drops to less than 1 foot. Choosing a good threshold
is important since it controls the false positives and false negatives, and hence the energy
usage and sensing reliability of the system.
5.6 Conclusions
In this chapter, I argued about the benefits of using a multi-tier camera sensor net-
work over single-tier networks and presented the design and implementation of SensEye, a
multi-tier heterogeneous camera sensor network. Using an implementation of a surveillance
application on SensEye and extensive experiments, we demonstrated that a multi-tier network
can achieve an order of magnitude reduction in energy usage
without sacrificing reliability. I have also evaluated the effect of several system parameters
on SensEye and tested its ability to track objects and use the retargetable PTZ cameras.
Further, the implementation was also used to study benchmarks of energy usage and
latency of individual components.
CHAPTER 6
This thesis focused on wireless networks with image sensors. I addressed the issues of automatic configuration
and initialization and the design of camera sensor networks. I proposed notions of accurate
and approximate initialization to initialize cameras with varying capabilities and resource
constraints. As compared to manual calibration, which can take a long time (order of
hours) to calibrate several cameras and is inefficient and error prone, the automated calibration
protocol is accurate and greatly reduces the time for accurate calibration—tens of seconds
to calibrate a single camera, scaling easily to several cameras in order of minutes. I also
addressed the design of multi-tier camera sensor
networks. I designed and built SensEye, a multi-tier heterogeneous camera sensor network.
Using SensEye I demonstrated how multi-tier networks can achieve the simultaneous system
goals of energy efficiency and reliability. In this chapter, I summarize the contributions of
this thesis and future work to extend its scope.
6.1 Accurate Calibration of Camera Sensor Networks

Traditional calibration techniques are ill-suited for camera sensor networks, as they assume the presence of landmarks and abundant resources. In this thesis, I presented
Snapshot, an automated calibration protocol that is explicitly designed and optimized for
sensor networks. Our techniques are based on principles from optics and geometry and
are designed to work with low-fidelity, low-power camera sensors that are typical in sen-
sor networks. An experimental evaluation of our prototype implementation showed that
Snapshot is practical on real sensor platforms. Our measurements showed that Snapshot
yields an error of 1-2.5 degrees when determining the camera
orientation and 5-10cm when determining the camera location. We argued that this is a
tolerable error in practice since a Snapshot -calibrated sensor network can track moving
objects to within 11cm of their actual locations. Our measurements showed that Snapshot
can calibrate a camera sensor within 20 seconds, enabling it to calibrate a sensor network
containing tens of cameras within minutes. I have developed techniques to analyze the
effect of errors inherent in the calibration protocol on estimated parameters. Both the em-
pirical error analysis and the derivation of a lower bound on the expected error verify that
Snapshot is accurate enough for camera sensor networks. The contributions of this work are:

• Demonstrated that the use of Cricket position sensors can automate the calibration procedure.
• Studied the sensitivity analysis of the automated calibration protocol and its error
characteristics. The empirical error of Snapshot was found tolerable when compared
to the analytical lower bounds.
• Showed that the application-level error using Snapshot is small; a Snapshot-calibrated
network can track moving objects to within 11 cm of their actual locations.
6.2 Approximate Initialization of Camera Sensor Networks
Accurate calibration techniques are not feasible for deployments of ad-hoc low-power
camera sensors due to limited resources and lack of landmark nodes for beaconing. I
proposed approximate initialization techniques to estimate the degree and region of overlap
between cameras, which can be exploited for duty cycling and triggered wakeups. I have
implemented our techniques on a Mote testbed and conducted a detailed experimental
evaluation. The results show that our approximate techniques can estimate the degree and
region of overlaps to within 10% of their actual values, and this error is tolerable at the
application level for effective duty-cycling and triggered wakeups, without any infrastructure
support.
• The effective error at the application level was found to be acceptable for duty-cycling
and wakeup-enabled applications.
6.3 Multi-Tier Camera Sensor Networks

Multi-tier networks trade off sensing reliability and energy usage based on the type of
sensor used for application tasks. I presented the design and implementation of SensEye,
a multi-tier camera sensor network. Using an implementation of a surveillance application
on SensEye and extensive experiments, I demonstrated
that a multi-tier network can achieve an order of magnitude reduction in energy usage when
compared to a single-tier network, without sacrificing reliability. I also evaluated the effect
of several system parameters on SensEye and tested its ability to track objects and use retar-
getable PTZ cameras. Further, the implementation was also used to study benchmarks of
energy usage and latency of individual components.

• Designed and implemented a multi-tier camera sensor network and demonstrated its
benefits over single-tier networks.

• Using the tasks of object detection, recognition and tracking, quantified the energy
and reliability tradeoffs, and found that a multi-tier network can obtain comparable
reliability with substantial energy savings.
6.4 Future Work

There are several open challenges in the design and operation of multi-tier sensor networks. One of the initial de-
sign decisions is that of placement, coverage and task allocation. Given a fixed budget, in
terms of cost or number of nodes, an initial decision has to be made regarding the number
of tiers and number of nodes at each tier. These decisions are closely related to the place-
ment policies and coverage requirements of the applications. Solutions to place sensors
into multiple-tiers and satisfy coverage guarantees are required. Further, task allocation
policies are needed to map application tasks to each tier. I am interested in developing
solutions that will answer these initial configuration questions in a holistic manner. I am
also interested in exploring research problems related to the dynamic behavior of multi-
tier sensor networks. Sensors can fail over time, get overloaded or their remaining-energy
may have to be conserved to increase lifetime. In such cases, dynamic policies have to be
used to migrate tasks across tiers to maintain similar system guarantees or decrease reli-
ability to increase lifetime. These policies need to account for the varied capabilities and
requirements at multiple tiers and moreover need to be distributed. I aim to develop such
adaptive policies to handle the dynamic behavior of multi-tier sensor systems.
As part of a broader goal, I am interested in real world deployments of sensor network
applications and use of sensor networks to disseminate geographical data via CDNs. Re-
lated to deployment, I am interested in studying the use of energy-harvesting sensors.
Due to the additional capability of recharging batteries periodically, such nodes change the
tradeoff of energy usage and other system metrics. I am interested in studying the effect
of this parameter through deployment and build optimized solutions for the same. To aid
quick deployment and ease of prototyping to evaluate proposed ideas, I would like to setup
a generic sensor networks testbed. I envision the testbed to consist of various hardware
platforms, including a variety of sensor and embedded platforms. Related to data dissem-
ination, CDNs disseminating geographical data and a sensor network used to retrieve data
from an area, often deal with spatially correlated data. The spatial correlation can be
exploited by CDN proxies and sensor network edge proxies for approximate responses with
bounded error.
BIBLIOGRAPHY
[1] Anandarajah, A., Moore, K., Terzis, A., and Wang, I-J. Sensor Networks for Land-
slide Detection. In Proceedings of the Third International Conference on Embedded
Networked Sensor Systems (2005), pp. 268–269.
[2] Harter, A., and Hopper, A. A Distributed Location System for the Active Office.
IEEE Network 8, 1 (January 1994).
[3] Ward, A., Jones, A., and Hopper, A. A New Location Technique for the Active
Office. IEEE Personal Communications 4, 5 (October 1997), 42–47.
[5] Bajaj, R., Ranaweera, S. L., and Agrawal, D. P. GPS: Location-Tracking Technology.
Computer 35, 4 (March 2002), 92–94.
[6] Banerjee, N., Sorber, J., Corner, M. D., Rollins, S., and Ganesan, D. Triage: A
Power-Aware Software Architecture for Tiered Microservers. Tech. rep., University
of Massachusetts, Amherst, April 2005.
[7] Berg, M., Kreveld, M., Overmars, M., and Schwarzkopf, O. Computational Geome-
try, Second ed. Springer, 2000.
[10] Coleman, T. F., and Li, Y. On the convergence of reflective newton methods for large-
scale nonlinear minimization subject to bounds. Mathematical Programming 67, 2
(1994), 189–224.
[12] Forsyth, D. A., and Ponce, J. Computer Vision: A Modern Approach. Prentice Hall,
2002.
[13] Devore, J. L. Probability and Statistics for Engineering and the Sciences, fifth ed.
Brooks/Cole, 1999.
[14] Shih, E., Bahl, P., and Sinclair, M. Wake on Wireless: An Event Driven Energy
Saving Strategy for Battery Operated Devices. In Proc. of ACM MOBICOM (2002),
pp. 160–171.
[15] Estrin, D., Culler, D., Pister, K., and Sukhatme, G. Connecting the Physical World
with Pervasive Networks. IEEE Pervasive Computing 1, 1 (2002), 59–69.
[16] Estrin, D., Govindan, R., Heidemann, J. S., and Kumar, S. Next Century Challenges:
Scalable Coordination in Sensor Networks. In Proceedings of ACM MOBICOM
(1999), pp. 263–270.
[17] Zhao, F., Chu, M., and Reich, J. E. Distributed Video Sensor Network. In Proc. of
Intelligent Distributed Surveillance Systems (2004).
[18] Fox, D., Hightower, J., Liao, L., Schulz, D., and Borriello, G. Bayesian Filtering for
Location Estimation. IEEE Pervasive Computing (2003).
[19] Gnawali, O., Greenstein, B., Jang, K., Joki, A., Paek, J., Vieira, M., Estrin, D., Govin-
dan, R., and Kohler, E. The TENET Architecture for Tiered Sensor Networks. In Pro-
ceedings of the ACM Conference on Embedded Networked Sensor Systems (SenSys)
(November 2006).
[20] Gnawali, O., and Yarvis, M. ”Do Not Disturb”, An Application Leveraging Hetero-
geneous Sensor Networks. In ACM SENSYS (2003).
[21] He, T., Huang, C., Blum, B., Stankovic, J., and Abdelzaher, T. Range-Free Localiza-
tion Schemes in Large Scale Sensor Networks. In Mobile Computing and Networking
MOBICOM (2003).
[22] He, T., Krishnamurth, S., Stankovic, J., Abdelzaher, T., Luo, L., Stoleru, R., Yan, T.,
Gu, L., Hui, J., and Krogh, B. Energy-efficient Surveillance System Using Wireless
Sensor Networks. In Proceedings of the Second International Conference on Mobile
Systems, Applications and Services (2004), pp. 270–283.
[23] Horn, B. K. P. Robot Vision, First ed. The MIT Press, 1986.
[24] Hu, W., Tran, V. N., Bulusu, N., Chou, C., Jha, S., and Taylor, A. The Design and
Evaluation of a Hybrid Sensor Network for Cane-toad Monitoring. In Proceedings of
Information Processing in Sensor Networks (IPSN 2005/SPOTS 2005) (April 2005).
[26] Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. Trans-
actions of the ASME–Journal of Basic Engineering 82, Series D (1960), 35–45.
[28] Kulkarni, Purushottam, Ganesan, Deepak, and Shenoy, Prashant. The Case for Multi-
tier Camera Sensor Networks. In Proceedings of ACM NOSSDAV (2005), pp. 141–
146.
[29] Liu, T., Bahl, P., and Chlamtac, I. Mobility Modeling, Location Tracking, and Tra-
jectory Prediction in Wireless ATM Networks. IEEE Journal On Selected Areas In
Communications 16, 6 (August 1998), 922–936.
[30] Jiao, L., Wu, Y., Wu, G., Chang, E. Y., and Wang, Y. The Anatomy of a Multi-
camera Security Surveillance System. ACM Multimedia System Journal Special Issue
(October 2004), 144–163.
[31] Logitech QuickCam Pro Webcam. http://www.logitech.com.
[32] Lorincz, K., Malan, D., Fulford-Jones, T., Nawoj, A., Clavel, A., Shnayder, V., Main-
land, G., Welsh, M., and Moulton, S. Sensor Networks for Emergency Response:
Challenges and Opportunities. IEEE Pervasive Computing 3, 4 (2004), 16–23.
[33] Mainwaring, A., Polastre, J., Szewczyk, R., and Culler, D. Wireless Sensor Networks
for Habitat Monitoring. In Proceedings of the First ACM International Workshop on
Wireless Sensor Networks and Applications (2002), pp. 88–97.
[34] Manfredi, V., Mahadevan, S., and Kurose, J. Switching Kalman Filters for Prediction
and Tracking in an Adaptive Meteorological Sensing Network. In Proceedings of
IEEE SECON (September 2005).
[35] McLaughlin, D., Chandrasekar, V., Droegemeier, K., Frasier, S., Kurose, J., Junyent,
F., Philips, B., Cruz-Pol, S., and Colom, J. Distributed Collaborative Adaptive Sens-
ing (DCAS) for Improved Detection, Understanding, and Prediction of Atmospheric
Hazards. In Proceedings of the Ninth AMS Symposium on Integrated Observing and
Assimilation Systems for the Atmosphere, Oceans, and Land Surface (January 2005).
[36] Moses, R.L., and Patterson, R.M. Self-Calibration of Sensor Networks. Unattended
Ground Sensor Technologies and Applications IV (April 2002), 108–119.
[37] Crossbow Wireless Sensor Platform.
http://www.xbow.com/Products/Wireless Sensor Networks.htm.
[38] Priyantha, N. B., Chakraborty, A., and Balakrishnan, H. The Cricket Location-Support System. In Proceedings of ACM MobiCom (2000), pp. 32–43.
[39] Pathirana, P. N., Savkin, A. V., and Jha, S. Mobility Modelling and Trajectory Pre-
diction for Cellular Networks with Mobile Base Stations. In Proceedings of the
Fourth International Symposium on Mobile Ad Hoc Networking & Computing (2003),
pp. 213–221.
[40] Polastre, J., Szewczyk, R., and Culler, D. Telos: Enabling Ultra-Low Power Wireless Research. In Proceedings of the Fourth International Conference on Information Processing in Sensor Networks: Special Track on Platform Tools and Design Methods for Network Embedded Sensors (IPSN/SPOTS) (April 2005).
[42] Priyantha, N. B., Chakraborty, A., and Balakrishnan, H. The Cricket Location-Support System. In Proceedings of the Sixth Annual ACM International Conference on Mobile Computing and Networking (MobiCom '00), Boston, MA (August 2000), pp. 32–43.
[43] Collins, R., Lipton, A., and Kanade, T. A System for Video Surveillance and Monitoring. In Proceedings of the American Nuclear Society (ANS) Eighth International Topical Meeting on Robotics and Remote Systems (1999).
[44] Rahimi, M., Baer, R., Iroezi, O. I., Garcia, J. C., Warrior, J., Estrin, D., and Srivastava, M. Cyclops: In Situ Image Sensing and Interpretation in Wireless Sensor Networks. In Proceedings of the Third International Conference on Embedded Networked Sensor Systems (November 2005), pp. 192–204.
[45] Rahimi, M., Baer, R., Warrior, J., Estrin, D., and Srivastava, M. Cyclops: In Situ Image Sensing and Interpretation in Wireless Sensor Networks. In Proceedings of ACM SenSys (2005).
[46] Rao, A., Ratnasamy, S., Papadimitriou, C., Shenker, S., and Stoica, I. Geographic Routing Without Location Information. In Proceedings of ACM MobiCom (September 2003), pp. 96–108.
[47] Rosenfeld, A., and Pfaltz, J. L. Sequential Operations in Digital Picture Processing. Journal of the ACM 13, 4 (1966), 471–494.
[48] Rowe, A., Rosenberg, C., and Nourbakhsh, I. A Low Cost Embedded Color Vision
System. In International Conference on Intelligent Robots and Systems (2002).
[50] Russell, S., and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall,
2003.
[51] Savvides, A., Garber, W., Moses, R., and Srivastava, M. An Analysis of Error Induc-
ing Parameters in Multihop Sensor Node Localization. IEEE Transactions on Mobile
Computing 4, 6 (2005), 567–577.
[52] Savvides, A., Han, C.-C., and Srivastava, M. B. Dynamic Fine-Grained Localization in Ad-Hoc Networks of Sensors. In Proceedings of ACM MobiCom (2001).
[53] Sheth, A., Tejaswi, K., Mehta, P., Parekh, C., Bansal, R., Merchant, S., Singh, T., Desai, U. B., Thekkath, C. A., and Toyama, K. SenSlide: A Sensor Network Based Landslide Prediction System. In Proceedings of the Third International Conference on Embedded Networked Sensor Systems (2005), pp. 280–281.
[54] Sorber, J., Banerjee, N., Corner, M. D., and Rollins, S. Turducken: Hierarchical
Power Management for Mobile Devices. In Proc. of MOBISYS (2005), pp. 261–274.
[56] Steere, D., Baptista, A., McNamee, D., Pu, C., and Walpole, J. Research Chal-
lenges in Environmental Observation and Forecasting Systems. In Proceedings of the
Sixth Annual International Conference on Mobile Computing and Networking (2000),
pp. 292–299.
[58] Tsai, R. Y. An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. In Proceedings of the 1986 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '86), Miami Beach, FL (June 1986), pp. 364–374.
[60] Erdem, U. M., and Sclaroff, S. Optimal Placement of Cameras in Floorplans to Satisfy Task Requirements and Cost Constraints. In Proceedings of the OMNIVIS Workshop (2004).
[61] Raykar, V. C., Kozintsev, I., and Lienhart, R. Position Calibration of Audio Sensors and Actuators in a Distributed Computing Platform. In Proceedings of ACM Multimedia (2003), pp. 572–581.
[62] Feng, W., Code, B., Kaiser, E., Shea, M., Feng, W., and Bavoil, L. Panoptes: A Scalable Architecture for Video Sensor Networking Applications. In Proceedings of ACM Multimedia (2003), pp. 151–167.
[63] Wang, F. Y. A Simple and Analytical Procedure for Calibrating Extrinsic Camera
Parameters. IEEE Transactions on Robotics and Automation 20, 1 (February 2004),
121–124.
[64] Werner-Allen, G., Lorincz, K., Welsh, M., Marcillo, O., Johnson, J., Ruiz, M., and
Lees, J. Deploying a Wireless Sensor Network on an Active Volcano. IEEE Internet
Computing 10, 2 (2006), 18–25.
[65] Whitehouse, K., and Culler, D. Calibration as Parameter Estimation in Sensor Net-
works. In First ACM International Workshop on Sensor Networks and Applications
(WSNA 2002) (2002).
[66] Xu, N., Rangwala, S., Chintalapudi, K., Ganesan, D., Broad, A., Govindan, R., and
Estrin, D. A Wireless Sensor Network for Structural Monitoring. In Proceedings of
the Second International Conference on Embedded Network Sensor Systems (2004),
pp. 13–24.
[67] Zhang, P., Sadler, C., Lyon, S., and Martonosi, M. Hardware Design Experiences
in ZebraNet. In Proceedings of the Second International Conference on Embedded
Networked Sensor Systems (2004), pp. 227–238.
[68] Zhang, Z. Y. A Flexible New Technique for Camera Calibration. IEEE Transactions
on Pattern Analysis and Machine Intelligence 22, 11 (November 2000), 1330–1334.
[69] Zhao, F., Liu, J., Liu, J., Guibas, L., and Reich, J. Collaborative Signal and Informa-
tion Processing: An Information Directed Approach. Proceedings of the IEEE 91, 8
(2003), 1199–1209.