
On the self-healing properties of an extremely low power indoor metering network

Nicola Altan
University of Duisburg-Essen
Ellernstrasse 29
45326 Essen, Germany
phone: +49 201 183 7667

Erwin P. Rathgeb
University of Duisburg-Essen
Ellernstrasse 29
45326 Essen, Germany
phone: +49 201 183 7670

Abstract—Wireless sensor networks (WSN) consisting of a large number of tiny inexpensive sensor nodes are a viable solution for many problems in the field of building automation. In order to meet the energy constraints, the nodes have to operate according to an extremely low duty cycle schedule. Based on an earlier proposal for efficiently bootstrapping and configuring such a network, we show how this algorithm can be extended in order to also resolve error conditions, e.g. synchronization loss, during normal network operation and to automatically maintain an optimized network topology.
Simulation studies using different error scenarios and network models show that the extended algorithm automatically adapts the network topology to the given conditions and in fact maintains an optimized network topology at reasonable energy cost.

I. INTRODUCTION

Recent advances in electronics and radio communications have enabled the development of low-cost, low-power sensor nodes which are small in size and communicate through short range radio devices. Various authors envisaged the application of sensor networks to monitor events in inhospitable environments. However, it is still unclear which application will drive the transition of this technology from the research labs to a large scale application [1].

Building automation and metering are, apart from the military sector, the most promising fields for the usage of wireless sensor network (WSN) technologies, and many manufacturers have already started the production of new wireless sensor systems for these applications.

A network tailored to metering differs from the original idea of WSN mainly in the deployment strategy and in the mobility aspect. Instead of being distributed at random, the nodes will be individually installed at positions predetermined by the specific tasks. This results in a specific structure in the node distribution. At least one data sink will collect all the measurements the nodes generate at regular intervals. Neither the nodes nor the sink will change their position. A modification of the signal propagation conditions or node failures will, therefore, be the only reasons for a modification of the network topology.

In such a building automation scenario, the owner of the building and the operator of the WSN may be different. In this case the owner of the building pays for the service and the operator of the WSN provides installation and maintenance of the WSN. The nodes have to be as inexpensive as possible and the network has to work unattended for a long time such that the network maintenance can be included into the scheduling for routine building maintenance.

Bootstrapping this kind of WSN in reasonable time without wasting too much energy is non-trivial because an extremely low duty cycle in the order of 10^-4 has to be maintained and increased duty cycles during network setup already consume a significant portion of the total energy available. Therefore, we proposed a simple algorithm for bootstrapping the network in [2]. It produces a connected network and a tree-like hierarchical routing structure with redundant links after the activation of the sink and is fast and energy efficient.

During normal network operation, the network has to automatically adapt to modifications of the radio environment, the presence of interference signals and all other problems which could arise during the expected life time.

In this paper, we extend the result of our previous work for this purpose. The resulting algorithm is continuously active and provides self-healing by re-integration of temporarily isolated nodes or network segments. It can also cope with the addition of new nodes and permanent node failures by providing a continuous adaptation of the network to the modification of the surrounding environment.

This paper will demonstrate the benefits of using this approach by analyzing the properties of the resulting network structure with respect to self-configuration, self-healing and resiliency.

II. REQUIREMENTS AND ASSUMPTIONS

With respect to scalability, network sizes from just a few nodes in residential buildings up to 1000 nodes in large complexes are realistic for building automation scenarios. We assume the number of nodes to be proportional to the number of rooms for the respective building.

We consider autonomous, i.e. battery powered, node operation mandatory. In fact, the WSN will typically be deployed in an already existing building such that mains operation is much too costly because a large number of power outlets would have to be retrofitted for that purpose. Moreover, autonomous battery powered operation is also mandatory if the WSN is operated by an external provider.
Simple business plan considerations of the building automation scenario mandate the use of off-the-shelf sensor hardware and an unattended node life time of up to 10 years. Hence, to meet the total energy budget, the nodes have to operate with an extremely low duty cycle (no higher than 10^-4).

The medium access control protocol (MAC) combines contention based and contention free access periods into a single frame, which begins with the generation of a beacon signal. The reception of these beacons is the mandatory precondition for the communication with the respective sender.

The bootstrap process defines which nodes receive the beacon signal a specific node generates, and hence the direction of the communication between neighbor nodes [2].

The physical characteristics of the used batteries impose upper and lower limits to the duration of activity and inactivity (d_min) periods of the transceiver, respectively.

The interval between two consecutive beacons emitted by the same node is fixed, and its nominal value is known (T_beacon = 6 min). Each node is equipped with a free running clock and a time synchronization algorithm is used to reduce the synchronization error. Each node activates the receiver prior to the expected beacon emission time to provide a guard period and to accommodate the clock errors. However, even with these precautions a node may lose the synchronization with the neighbors and a re-integration mechanism is required. In addition, it is also necessary to implement a mechanism which searches for new neighbors to cope with changes in the surrounding environment.

Instead of using specific algorithms which perform network adaptation and optimization periodically or upon request, we propose to use the algorithm already proposed for bootstrapping with some modifications and have it continuously active for this purpose. This approach reduces the code complexity and avoids the difficult task of identifying a trigger event for the optimization process.

III. BASIC SELF-HEALING ALGORITHM

The basic algorithm already described and evaluated in [2] allows the computation of the Breadth-First Spanning Tree (BST) used for routing purposes even in presence of unidirectional transmission links.

At power on time a newly activated node observes the communication channel for a full beacon interval and builds up a list including at most q already active neighbor nodes. Due to memory constraints, q will in general be smaller than the total number of nodes within communication range. The selection criterion for the q potential neighbors is the strength of the observed beacon signals.

After the initial phase, each node selects at most n neighbors out of q as communication partners. The decision is taken locally and based on the (logical) distance indication (h) carried by each beacon signal. This represents the number of hops from the sending node to the sink along the shortest path. Initially, each node has a value for this distance (h_0), which is higher than the maximum number of nodes which will be active in the network. This makes it easy to check after the setup phase whether all nodes have been integrated into the network. The sink uses distance 0 as initial value and does not change it.

Each node schedules its activity periods in order to receive the beacons generated by the selected neighbors. Each node can then answer a beacon with an update packet in order to propose itself as new neighbor to the beacon's sender. Update packets are generated only if the new connection would allow the beacon's sender to reduce its distance from the sink. It has been shown in [2] that it is advantageous to generate update packets in this case only with a certain probability 0 < p < 1, as this leads to more robust network structures. Upon reception of a beacon (or update) packet, each node re-evaluates the set of nodes it synchronizes with and computes its new distance value.

As a result, after the initialization phase each node is able to send its data to at most n neighbors. After the sink activation, the network structure converges quickly to a tree structure with redundant paths rooted at the sink.

An extension of the original algorithm has been proposed to allow the integration of new nodes and an adaptation to the changes of the environment. It is based on the ideas of keeping the neighbor selection algorithm continuously active and of actively seeking new nodes. For this purpose, some additional mechanisms have to be included as described below.

One new requirement is that a node has to be removed from the set (N) of the neighbor nodes if its beacon has not been observed for a certain period t_inactive. If a node gets isolated (N = ∅), it has to reset its hop counter to the initial value h_0.

If a group of nodes gets isolated from the sink, a count to infinity race for the logical distance of the isolated nodes starts. To avoid this, the sink generates a sequence number for each beacon it sends. Each node (except for the sink) stores the highest sequence number previously received from the uplink nodes and sends it with each of its own beacons. Each node resets its logical distance to the initial value if no new sequence number has been observed during the interval t_isolated.

Fig. 1. Schematic description of the scan process with n = 3 and m = 1

Energy permitting, each node has to search for previously unknown (and maybe better) neighbors to provide a continuous and fast adaptation of the routing structure to environment modifications. To find new neighbors, each node currently observing m < n neighbors scans the beacon interval by using n − m additional short observation windows, evenly spaced over the beacon interval as shown in Fig. 1. The length w_Obs of the observation windows corresponds to the longest feasible activity period. The observation windows are shifted by w_Obs every beacon interval. As long as the distance between the boundaries of an observation window and the boundaries of one of the m scheduled activity periods is smaller than d_min, the system does not make use of that observation window.

The maximum number of neighbors n a node synchronizes with has to be set taking into account the conflicting goals of robustness and energy efficiency. According to the results in [2], this parameter has been set to n = 6 for the simulation study as this value provides a reasonable compromise.

IV. ADAPTATION CAPABILITY OF THE NETWORK STRUCTURE AFTER CONVERGENCE

When a sufficient amount of information about the surrounding nodes has been collected, the algorithm used to build the network [2] converges quickly to a stable tree structure rooted at the sink. Each node is able to observe the beacon signals generated by at most n neighbor nodes. With respect to the distance from the sink, only a subset of the selected neighbor nodes is useful for routing data, as they have a smaller logical distance to the sink than the node itself. All other neighbors have at least the same distance as the node. Nodes belonging to the latter group play an important role for the fast adaptation of the network structure to changes of the environment.

[Diagram: node A at level i−1, nodes B and C at level i; panels (a) and (b)]
Fig. 2. Example: Node C loses the direct connection to A (a). Node B notices the event and generates an update which permits the reintegration of C (b).

As an example we consider the situation shown in Fig. 2. The nodes of the directed neighborhood graph represent the network nodes while the edges represent the capability of receiving a beacon packet and hence the possibility of sending data to the beacon sender. At a given time the node C is no longer able to receive the beacons sent by A (a). C removes A from the set of its neighbors and updates its distance estimation h. B receives a beacon from C, which carries the new distance estimation, and might answer with an update frame because it is now closer to the sink than C. C then selects B as new neighbor and finds its new position in the routing structure (b).

For more complex failure scenarios, the basic behavior of the algorithm is almost the same in all cases: As soon as a node (or a subtree) becomes unable to route data towards the sink, the distance estimation of the isolated element increases and eventually becomes larger than the distance estimation of a neighbor node. After the reception of an update frame, the isolated element or subtree finds its new place in the routing structure.

The continuous search for new neighbor nodes allows, on a longer time scale, the reintegration of nodes which cannot take advantage of the previously described failover mechanism and at the same time provides a continuous optimization of the routing structure.

V. SIMULATION STUDY

The behavior of the self-healing mechanism has been investigated by using a simulation model based on the OMNeT++ [3] simulation library. For signal propagation, a model able to take into account some peculiarities of the indoor propagation has been defined. It supports 3-dimensional sensor placement even in complex buildings, including the possibility to model the presence of unexpected propagation paths (e.g. air conditioning conduits). The original node model described in detail in [2] has been extended by including the synchronization protocol described in [4], along with the clock error models described there and the capability of scanning the beacon interval. The parameter values have been set to the values found to be reasonable in the cited work.

Three different node placements have been considered¹. In the simplest case, all nodes placed in a single room were able to directly communicate with each other. This placement has been used in particular to analyze the dependency of the setup procedure on the number of nodes.

¹ A direct comparison of the results collected using the three models is not meaningful because of the different propagation parameters.

As a second option, we considered the placement found in [2] to represent a worst case: the nodes have been distributed over k aligned rooms and the sink has been located in the room k+1. The propagation parameters have been set such that the nodes in room i were able to communicate almost certainly with the nodes in the rooms i ± 1 but there was no direct communication path to the elements in the rooms i ± m with m ≥ 3.

A third, more realistic placement has been obtained by distributing the nodes in a cubic structure consisting of R_z floors, each one with R_x · R_y rooms. The number of nodes in each room is selected randomly in the interval [0 ... Max_nodes]. The attenuation between rooms belonging to the same floor has been set like in the previous case, while the direct communication between different floors is possible only if the floors are adjacent. The sinks have been put alternatively in the middle or in a corner of the structure.

A. Network-Setup

The results in [2] assume that each node observes the radio channel for a complete beacon interval after its activation in order to collect the information concerning all the active neighbors. After this initial beacon period, it switches to the low duty cycle scheduling. This solution allows a fast network configuration at the expense of the energy required for the extended channel observation.
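The cost of observing the channel for a full beacon interval can be put in perspective with a rough calculation. The sketch below is illustrative only: it assumes that the energy spent by the transceiver is proportional to its on-time and that the duty cycle bound applies per beacon interval, neither of which is stated in [2]. The function names are hypothetical. The numeric inputs (T_beacon = 6 min, duty cycle 10^-4, life time 10 years) are the values given in Section II.

```python
# Rough on-time budget arithmetic. Assumption (not from the paper): the
# transceiver energy is proportional to its on-time, so on-time can be
# used as a stand-in for energy.

T_BEACON_S = 6 * 60      # nominal beacon interval T_beacon: 6 min
DUTY_CYCLE = 1e-4        # upper bound on the duty cycle
LIFETIME_YEARS = 10      # target unattended node life time


def on_time_per_interval_s(t_beacon_s=T_BEACON_S, duty=DUTY_CYCLE):
    """Transceiver on-time allowed per beacon interval, in seconds."""
    return t_beacon_s * duty


def full_scan_cost_intervals(t_beacon_s=T_BEACON_S, duty=DUTY_CYCLE):
    """Number of regular beacon intervals whose on-time budget equals
    one full beacon interval of continuous listening."""
    return t_beacon_s / on_time_per_interval_s(t_beacon_s, duty)


def full_scan_cost_days(t_beacon_s=T_BEACON_S, duty=DUTY_CYCLE):
    """The same cost expressed as days of normal low duty cycle operation."""
    return full_scan_cost_intervals(t_beacon_s, duty) * t_beacon_s / 86400.0


def full_scan_share_of_lifetime(years=LIFETIME_YEARS):
    """Fraction of the life-time on-time budget used by one full scan."""
    return full_scan_cost_days() / (years * 365.25)


if __name__ == "__main__":
    print(f"on-time per interval: {on_time_per_interval_s() * 1e3:.1f} ms")
    print(f"one full scan = {full_scan_cost_intervals():.0f} intervals "
          f"= {full_scan_cost_days():.1f} days of normal operation")
    print(f"share of the life-time budget: {full_scan_share_of_lifetime():.2%}")
```

Under these assumptions a node may keep its transceiver on for about 36 ms per beacon interval, and a single full-interval observation costs as much on-time as roughly 10,000 regular intervals, i.e. about six weeks of normal operation.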
It cannot be excluded that during the long network life a jamming signal disturbs the communication so massively that afterwards all nodes are isolated. In this case it is interesting to know whether the nodes are able to rebuild the network by themselves. Therefore, we evaluated first how much time is required to build the network by relying on the slow beacon period scanning mechanism only, without increasing the duty cycle. The set-up time is a function of the total number of nodes in the communication range in this case. Simple computations suggest that it is inversely proportional to the number of neighboring nodes and observation windows.

A simulation study has been done by putting a different number of nodes (20, 50 and 100 nodes) into a single room. The nodes have been activated in a random order during the first beacon interval. The curves (one for each network size) in Fig. 3 and in the subsequent figures have been obtained by calculating the mean value observed over 50 simulation runs for each measurement interval.

Fig. 3. Network setup for different number of directly communicating nodes: (a) Fraction of isolated nodes vs. time for different number of nodes; (b) Average number of updates per node

The time required to integrate 90% of the nodes into the routing structure is almost inversely proportional to the number of active nodes (Fig. 3(a)) as expected.

It is interesting to observe that the mean number of update packets generated per node during one beacon period shows a maximum corresponding to the maximum of the integration rate (Fig. 3(b)). This effect can be observed also during a failover phase and gives a hint on the basic behavior of the proposed algorithm. At the beginning the nodes join together to build some isolated clusters. As the nodes continue to discover new neighbors, the clusters grow and merge together until eventually all of the nodes belong to a big cluster. When the sink, which is activated last, becomes part of that big cluster, the nodes increase the update generation rate to reconfigure the routing structure towards the sink and we observe the above mentioned effect. The nodes belonging to smaller isolated clusters wait longer before discovering some already integrated neighbors.

It can be estimated that by the time 90% of the nodes have been integrated into the routing structure, each node has discovered on average fewer than three neighbors by shifting the observation window; the rest of the neighbors have been discovered based on the update messages.

The total time required for the complete network setup spans a few days. Therefore, this setup strategy does not fulfill the requirements in [2], which state that a verification of the network functionality has to be possible immediately after node deployment while the service technician is still on the premises. However, the self configuration properties observed in this experiment guarantee that the network is able to re-establish the full functionality even after a massive failure (e.g. due to the presence of massive jamming signals).

B. Self-Healing

It cannot be ruled out that during the network life time some nodes get isolated because of environment changes or because of the temporary presence of interference signals. As long as the service interruption is short, the nodes will be able to find the previous neighbors again once the cause of the disturbance is removed. However, after a long service interruption the nodes will lose the synchronization with their neighbors completely due to clock drift.

To check the self-healing capabilities of the network, we observed the network behavior in presence of severe outages. The quality metric introduced in [2] makes it possible to identify the most important nodes at each level of the routing structure, i.e. the nodes which have to transport the highest portion of the information flow. During a first simulation series we considered 100 nodes evenly distributed over five aligned rooms. The propagation parameters have been chosen according to the worst case previously described. After the initialization phase the ten most important nodes directly connected to the sink (based on the metric mentioned above) were disabled, resulting in many other nodes becoming isolated. After 120 beacon periods the disabled nodes were activated again and the nodes started to reintegrate themselves into the routing structure. The results plotted in Fig. 4 show the superposition of the observations done during 50 runs.

Fig. 4. Network behavior if the 10 most important relay nodes fail (100 nodes distributed over 5 aligned rooms): (a) Fraction of isolated nodes vs. time; (b) Average number of updates per node

Soon after the beginning of the disturbance (t = 0), a significant fraction of the isolated nodes was reintegrated into the routing structure (see Table I) by the mechanism described in Section IV. It is worth noting that in the graph of Fig. 4(a) the number of isolated nodes includes also the failed relay nodes, which cannot participate in the communication because of the presence of disturbing signals. The initial restructuring phase is characterized by an increased update generation rate (Fig. 4(b)). After that initial restructuring phase, other nodes were able to reintegrate themselves into the routing structure by finding new neighbors. After the disturbance had been removed, the reactivated nodes tried to find their position in the routing structure and the network recovered fully. Table II contains the measurements done during this second phase.

The graph in Fig. 4(b) shows the mean number of updates per node and beacon interval. This value increases as long as the failure recovery proceeds and then (as noted in [2]) saturates to a constant value, which is upper bounded by 0.03 for the chosen parameters.

The same test has been repeated using the 3D node placement described previously. The playground consists of 60 rooms grouped in 5 levels with 3x4 rooms each. The number of nodes in each room was randomly chosen in the interval (0,8) at the beginning of each simulation run. The sink has been placed either in the center room or in a corner room. The propagation parameters have been chosen such that the nodes in a room were able to communicate almost certainly with the nodes in the adjacent rooms but the nodes could not communicate with nodes farther than two rooms away if the rooms belonged to the same level.

Each test run followed the scheme described previously. The observations show almost the same behavior as for the previous scenario.

With respect to the data in Tables I and II, the failover is defined as the time which is required to reintegrate all nodes able to exchange data into the routing structure. That means all nodes not affected by the disturbance during the first part of the test and all nodes afterwards. The intervals are measured starting with the activation (respectively deactivation) of the disturbance. Looking at the success rate column, we note that in some cases the network structure did not integrate all active nodes during the disturbance period. This can be explained by observing that sometimes the damage is such that for some nodes there is actually no path remaining to the rest of the network.

TABLE I
                                    Time for the failover
                                    (beacon intervals)      Success
                                    mean      stddev        rate
Aligned rooms                       25.7      30.3          45/50
3D building (sink in the middle)    17.2      25.1          50/50
3D building (sink in a corner)      18.9      27.7          47/50

TABLE II
                                    Time for the failover
                                    (beacon intervals)      Success
                                    mean      stddev        rate
Aligned rooms                       118.4     59.3          50/50
3D building (sink in the middle)    92.9      42.9          50/50
3D building (sink in a corner)      134.4     69.3          50/50

By comparing the value for the mean distance from the sink metric² at the beginning and at the end of this test, even a slight reduction of this metric can be observed. This means that the nodes are forced to search for new neighbors in a failure case and on average they select some new neighbors which are nearer to the sink.

² This metric gives an indication of the energy consumption for the data transmission in absence of data aggregation [2].

C. Resiliency

A further failure scenario takes into account the progressive erosion of the number of active nodes due to energy depletion or malfunctioning.

The same networks used for the previous test have been stressed by progressively turning off all nodes during a five day interval. The node deactivation sequence has been randomized by selecting the deactivation time of each node in the given interval according to a uniform distribution. The curves plotted in Fig. 5 show the mean collected during the 50 runs performed using the previously described 3D node placements with the sink placed in a corner. At a macroscopic level the network behavior resembles the theoretical results in the work of Kunniyur and Venkatesh [5], but the different propagation model does not allow a direct comparison between the observations and the theoretical results in [5].

Fig. 5. Network behavior in presence of a progressive erosion of the number of active nodes (3D building structure with the sink placed in a corner): (a) Isolated nodes; (b) Updates per active node

In particular, each time a node fails the remaining nodes cooperate to reconstruct a connected network. However, at a given point this is no longer possible and the network of the surviving nodes remains definitively partitioned as indicated in Fig. 5(a). This point has been defined as the breakdown point, and the fraction of surviving nodes at this partitioning time has been measured. It is interesting to note that the breakdown point does not vary too much for the considered node placements (Table III).

TABLE III
CONNECTIVITY BREAKDOWN IN PRESENCE OF A PROGRESSIVE NETWORK DAMAGE
                                    Breakdown point
                                    (active nodes / nodes)
                                    mean      stddev
Aligned rooms                       0.18      0.08
3D building (sink in the middle)    0.25      0.12
3D building (sink in a corner)      0.23      0.08

The graph in Fig. 5(b) shows an increase of the update generation near the breakdown point, and it is reasonable to assume that during this phase the node reintegration happens mainly due to the presence of redundant links. The active search for still active neighbors plays only a minor role because of the frequent node failures and the relatively low node density.

Since for the considered application the maximum acceptable network damage is in a range of about 10% of the total number of nodes, the results can be considered satisfactory.

VI. CONCLUSIONS

In a previous work [2] we proposed an algorithm for the efficient initialization of a low power / low duty cycle wireless sensor network. That algorithm is particularly interesting because of its simplicity and compactness, both favorable characteristics for the implementation on simple nodes.

Since the network has to work unattended for many years, its nodes have to be able to cope with modifications of the propagation characteristics, the presence of interfering signals and possibly failures of nodes. We propose to use an extended version of the bootstrap algorithm in order to provide continuous optimization and failure recovery capabilities. This approach keeps the complexity of the software low and avoids the necessity of explicitly identifying the transition point between power on, normal operation and failure recovery.

A simple mechanism for removing the stale neighborhood relationships and a search procedure based on the observation of the channel for a short interval each beacon period have been the only additions to the original algorithm required to fulfill the new requirements.

Simulation studies done with different node placements highlighted the self-configuration, self-healing and resiliency properties of a network running the modified algorithm. The node capabilities with respect to processing, memory and energy constraints as well as the node placements and the propagation parameters were chosen based on a real field test scenario. Therefore, we are confident that the results are representative for real installation conditions and that the proposed approach will also perform well in a real network.

REFERENCES
[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, 2002.
[2] N. Altan and E. Rathgeb, “Bootstrapping a very low power, beacon enabled, wireless sensor network,” in 12th IEEE Symposium on Computers and Communications (ISCC), 2007.
[3] A. Varga, “The OMNeT++ discrete event simulation system,” in European Simulation Multiconference (ESM’2001), 2001.
[4] N. Altan and E. Rathgeb, “Opportunistic clock synchronization in a beacon enabled wireless sensor network,” TDR / University Duisburg-Essen, Tech. Rep., November 2007.
[5] S. S. Kunniyur and S. S. Venkatesh, “Network devolution and the growth of sensory lacunae in sensor networks,” in ISIT 2004 (International Symposium on Information Theory, 2004), 2004.