Abstract
The availability and quality of information extracted from Wireless Sensor Networks (WSNs) have revolutionised a wide range of application areas. The success of any WSN application is, nonetheless, determined by the ability to retrieve information with the required level of accuracy, within specified time constraints, and with minimum resource utilisation. This paper presents a new approach to localised information extraction that utilises the Watershed segmentation algorithm to dynamically group nodes into segments, which can be used as programming abstractions upon which different query operations can be performed. Watershed results in a set of well-delimited areas, such that the number of operations (communication and computation) necessary to answer a query is minimised. This paper presents a fully asynchronous Watershed implementation, in which nodes compute their local data in parallel and independently of one another. The preliminary experimental results demonstrate that the proposed approach significantly reduces query processing cost and time without any loss of efficiency.
1 Introduction
Wireless Sensor Networks (WSNs) are currently employed in a variety of applications ranging from home to industry, and from health to military. These applications have a number of elements in common: (1) there is a request for information; (2) the answer to this request is usually present in a set of unstructured data streams; (3) WSNs generate large amounts of data that are 'imperfect' in nature and contain considerable redundancy. Resource constraints on the nodes, coupled with the characteristics of the returned data, mean that applications have to be developed with the primary design goal of minimising resource utilisation. Distributed information extraction has been advocated to solve this kind of problem. Information extraction is the sub-discipline of artificial intelligence that selectively structures, filters, and merges data generated by one or more sensor nodes. It adds meaning to unstructured raw data; the data therefore become structured or quasi-structured, making them more suitable for information processing tasks. This definition is concise and captures exactly the sense in which the term information extraction is used throughout this paper.
This paper presents an in-network information extraction system that finds and links relevant information while ignoring irrelevant and extraneous information. The final output of the extraction process varies with user queries; however, it commonly involves extracting fragments of information from various nodes within the same segment and linking these fragments into a coherent answer. We propose the use of the Watershed segmentation algorithm [20], which results in a set of well-delimited segments of homogeneous sensed regions based on node locations and their corresponding sensor readings. The Watershed algorithm is suitable for dynamically changing environments because it uses no thresholding; instead, the best option is chosen at each decision stage. We also propose a parallel asynchronous implementation of the Watershed algorithm that complies with sensor node constraints, i.e. real-time computing and low power consumption. This in-network information extraction does not return the entire collected data; instead, it extracts sensed data units from one or more network segments, typically simple or multi-modal information of a spatio-temporal nature. The goal of node segmentation is to provide a high-level programming model for sensor networks that abstracts away the details of individual sensor nodes. In this paper we examine query-based systems that utilise in-network processing for query response. However, the produced abstractions can also be utilised by other information extraction systems, e.g. event-based and time-driven ones. Query-based information extraction is a request-response interaction between the sensor nodes and an end user or application component. The end user issues a query in an appropriate language; the query is then disseminated through the network to retrieve the desired data from the sensors according to the description in the query.
Most query-based systems provide a high-level interface to the sensor network while hiding the network topology as well as the radio communication. The end user does not need to know how the data are collected or processed. User-controlled, query-based information extraction is usually applied in situations where it is known in advance what type of semantic information is to be extracted from the network. For example, it might be necessary to identify what types of events are happening in a certain part of the monitored environment and at what time these events took place. Depending on the information needs, different queries can be constructed to differentiate various types of events at different levels of semantic granularity. In some applications, for instance, it will be adequate to specify that part of a query is a temporal expression, while in others it might be necessary to differentiate between temporal classes, for example between expressions indicating past, present and future. In other applications, not only the semantic nature of the target information is predefined, but also the unit and scope of the event to be extracted. The unit of extraction refers to the granularity of the individual information portions that are lifted out of a sensor node. The scope of extraction refers to the granularity of the extraction space for every individual information request. In order to decide which portion of information should contribute to the answer of a query, an information extraction application uses a set of extraction conditions. These conditions state what formal properties a particular portion of information must possess to belong to a particular semantic class.
The Watershed transformation is a popular image segmentation algorithm for grey-scale images [20,12,7,3,2,5,17,14,19,21]. Its basic concept comes from the field of topography, referring to the partitioning of a landscape into a number of basins or water catchment areas. The authors in [12] use the following analogy to explain how the Watershed algorithm works. The USA can be divided into two main segments, one associated with the Atlantic Ocean and the other associated with the Pacific Ocean. All the rain falling on the east segment will flow into the Atlantic Ocean, while the rain falling on the west segment will flow into the Pacific. The water will reach the ocean provided that it is not trapped in a local minimum along the way. Both segments are usually called catchment basins, and each one has an associated minimum (the ocean). The boundary line that separates the two basins is called the watershed line, corresponding to the continental divide in the example. Analogously, the image is viewed as a topographic surface where each pixel is a point situated at an altitude given by its grey level, between 0 and 255. After the original Watershed algorithm was published, several modifications and variations of it were published to suit various applications. Watershed algorithms can be classified into two conceptually distinct techniques: immersion and raining.
1. Immersion simulates progressively immersing the entire topographic surface in a water container.
2. Raining simulates rainfall over a topographic surface. Raining can be considered a local method because each droplet follows its own path without regard to neighbouring droplets.
On flat areas of the surface, the motion of a water droplet is directed towards the nearest brim of a downward slope, and it stops when it reaches a regional minimum. An important aspect of designing a parallel asynchronous algorithm is the exploitation of data locality to minimise communication overhead. To this end, we propose here a reformulation of raining Watershed segmentation, owing to its suitability for parallel implementation. The presented implementation is capable of computing the Watershed transform according to local conditions. The approach proposed in this paper generates the same result using only one point of synchronisation, thus decreasing the running time without degrading the effectiveness of the segmentation. Both assertions are demonstrated throughout this paper and supported by experimental results.
The paper is organised as follows: Section 2 describes the Watershed algorithm. Section 3 illustrates the suitability of this algorithm for WSNs. Section 5 presents the parallel asynchronous implementation of the Watershed algorithm. The performance of the proposed implementation is evaluated in Section 6. Section 7 concludes the work.
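(Lower slope) The lower slope $LS(p)$ of a point $p$ is the maximal slope linking $p$ to any of its neighbours:
$$LS(p) = \max_{p' \in N(p)} \left( \frac{f(p) - f(p')}{dist(p, p')} \right)$$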
(Steepest descending path) For every point $p$, $SDP(p)$ is the set of points $p' \in N(p)$ defined as follows:
$$SDP(p) = \left\{ p' \in N(p) \;\middle|\; \frac{f(p) - f(p')}{dist(p, p')} = LS(p),\ f(p') < f(p) \right\}$$
i.e. the SDP is a series of connected points where each point presents a grey level strictly lower than the previous one. There may exist multiple descending paths from a given point; the choice between them depends on the implementation. A descending path is said to be a steepest path if each point in the path is connected to its neighbour with the lowest grey level. In the rain simulation analogy, the SDP is the path a drop of water would follow when travelling down to a regional minimum.
(Cost function based on lower slope) The cost, $cost(p_{i-1}, p_i)$, of walking on the topographic surface from point $p_{i-1}$ to $p_i \in N(p_{i-1})$ is:
$$cost(p_{i-1}, p_i) = \begin{cases} LS(p_{i-1}) \cdot dist(p_{i-1}, p_i) & f(p_{i-1}) > f(p_i) \\ LS(p_i) \cdot dist(p_{i-1}, p_i) & f(p_{i-1}) < f(p_i) \\ \tfrac{1}{2}\left(LS(p_{i-1}) + LS(p_i)\right) \cdot dist(p_{i-1}, p_i) & f(p_{i-1}) = f(p_i) \end{cases}$$
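The lower slope and the three-case cost can be evaluated directly from node readings and positions. The Python sketch below is illustrative only: the graph representation (`f` for readings, `nbrs` for adjacency lists, `pos` for coordinates) and the function names are hypothetical, `dist` is assumed to be the Euclidean distance, and LS(p) is assumed to be 0 when p has no lower neighbour (i.e. p is a local minimum).

```python
import math


def lower_slope(p, f, nbrs, pos):
    """LS(p): steepest downhill slope from p to a neighbour; 0.0 if none is lower (assumed)."""
    return max([0.0] + [(f[p] - f[q]) / math.dist(pos[p], pos[q]) for q in nbrs[p]])


def cost(a, b, f, nbrs, pos):
    """cost(a, b) for stepping from node a to its neighbour b (three-case definition)."""
    d = math.dist(pos[a], pos[b])
    if f[a] > f[b]:                      # downhill step: use the lower slope at a
        return lower_slope(a, f, nbrs, pos) * d
    if f[a] < f[b]:                      # uphill step: use the lower slope at b
        return lower_slope(b, f, nbrs, pos) * d
    # flat step: average of the two lower slopes
    return 0.5 * (lower_slope(a, f, nbrs, pos) + lower_slope(b, f, nbrs, pos)) * d


# Example: four nodes on a line, one unit apart.
pos = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
f = {"a": 0, "b": 2, "c": 3, "d": 3}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(cost("a", "b", f, nbrs, pos))  # 2.0 (uphill: LS(b) * 1)
print(cost("c", "d", f, nbrs, pos))  # 0.5 (flat: (LS(c) + LS(d)) / 2)
```

Note that the cost is symmetric: stepping from `b` back to `a` uses the first case with LS(b), giving the same 2.0 as the uphill step.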
(Topographic distance) The topographic distance between two points $p$ and $q$ on a surface is the minimal topographic distance among all paths $\pi$ between $p$ and $q$ on the surface:
$$TD_f(p, q) = \inf_{\pi} TD_f^{\pi}(p, q)$$
where
$$TD_f^{\pi}(p, q) = \sum_{i=2}^{n} cost(p_{i-1}, p_i)$$
is the topographic distance of a path $\pi = (p = p_1, p_2, \ldots, p_n = q)$ such that $\forall i$, $p_i \in N(p_{i-1})$.
(Catchment basin based on topographic distance) The catchment basin $CB_{TD}(m_i)$ of a local minimum $m_i$ is the set of points $p$ whose topographic distance to $m_i$ is smaller than their topographic distance to any other regional minimum $m_j$.
In other words, the CB is formed by a regional minimum and all the points whose steepest descending path ends in that minimum [2]. In the rain simulation analogy, a CB is an area of the topographic surface such that a raindrop falling on any point in that area would flow into its minimum, following the steepest descending path from that point.
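The catchment-basin definition above suggests a simple sequential raining procedure: from every node, follow the steepest lower neighbour until a node with no lower neighbour is reached, and give every node on the way that minimum's label. This is a centralised illustration under stated assumptions, not the paper's parallel algorithm: the names are hypothetical, Euclidean distances are used, ties between equally steep neighbours keep the first one found, and any node without a strictly lower neighbour is treated as a minimum, so plateaus are not handled.

```python
import math


def steepest_neighbour(p, f, nbrs, pos):
    """Lower neighbour with the largest slope, or None if p has no lower neighbour."""
    best, best_slope = None, 0.0
    for q in nbrs[p]:
        slope = (f[p] - f[q]) / math.dist(pos[p], pos[q])
        if slope > best_slope:  # strict comparison: ties keep the first neighbour found
            best, best_slope = q, slope
    return best


def raining_watershed(f, nbrs, pos):
    """Return {node: label}, the label being the minimum its descending path ends in."""
    labels = {}
    for start in f:
        path, p = [], start
        while p not in labels:
            path.append(p)
            q = steepest_neighbour(p, f, nbrs, pos)
            if q is None:       # reached a minimum: it labels itself
                labels[p] = p
                break
            p = q
        for node in path:       # every node on the path joins that basin
            labels[node] = labels[p]
    return labels


# Example: a 1-D transect with two valleys (nodes 0 and 4).
f = {0: 1, 1: 3, 2: 5, 3: 2, 4: 0}
nbrs = {i: [j for j in (i - 1, i + 1) if j in f] for i in f}
pos = {i: (i, 0) for i in f}
print(dict(sorted(raining_watershed(f, nbrs, pos).items())))
# {0: 0, 1: 0, 2: 4, 3: 4, 4: 4}
```

Stopping as soon as a path meets an already labelled node means each node is visited a bounded number of times, which loosely mirrors the "label upward along the path" idea in the text.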
where A is the area of the largest polygon enclosed by the data points. The suitability of this method for WSNs is presented in [6].
2- Inclusion of the number of hops in the calculation of the steepest path
In Section 2, the SDP (Definition 3) was defined as a series of connected points where each point presents a grey level strictly lower than the previous one. There may exist multiple descending paths from a given point; the choice between them depends on the Watershed algorithm implementation. For WSN applications, the most energy-efficient descending path should be chosen. According to [13], communication is the most power-hungry operation. Communication cost can be computed from transmission distance, hop count, delay, link quality, and other factors. Hop count is a widely used factor for measuring the energy requirement of a routing task and for grouping nodes into energy-efficient clusters. The power consumed in data transmission is directly proportional to the square of the transmission distance between the sending and receiving nodes. Power consumption depends not only on the transmission distance but also on the scale of the network [18]. Therefore, the distances must be computed as a function of the number of hops as well as the inter-sensor Euclidean distances. The new distance function is the sum of the individual inter-node Euclidean distances multiplied by the hop count.
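$$Dist(p, p') = hc \cdot \sum dist(p_i, p_j)$$
where the sum runs over consecutive nodes $p_i, p_j$ on the route from $p$ to $p'$ and $hc$ is the hop count.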
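As a worked illustration of the hop-weighted distance, the hypothetical helper below sums the Euclidean lengths of the individual hops on a route and multiplies the total by the hop count. Representing a route as an ordered list of node identifiers with known coordinates is an assumption made for the example.

```python
import math


def hop_weighted_distance(route, pos):
    """Dist(p, p') for a route [p, ..., p']: (sum of per-hop Euclidean distances) * hop count."""
    hops = list(zip(route, route[1:]))          # consecutive (sender, receiver) pairs
    total = sum(math.dist(pos[a], pos[b]) for a, b in hops)
    return total * len(hops)


# Example: a two-hop route A -> B -> C.
pos = {"A": (0, 0), "B": (3, 4), "C": (3, 10)}
print(hop_weighted_distance(["A", "B", "C"], pos))  # 22.0 = (5 + 6) * 2
```

A route consisting of a single node has no hops and therefore distance 0; each additional hop scales the summed length up, so multi-hop descents are penalised more than their geometric length alone would suggest.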
a logical notion of proximity determined by applicative information is, therefore, capable of returning the specified information with high confidence. Segments generated by the Watershed algorithm are automatically labelled with a marker, which is an integer identifier. All nodes within a segment satisfy the logical constraints encoded in the Watershed algorithm. This logical and intuitive Watershed template serves as the membership function that dynamically determines and updates which nodes belong to the segment. Programmers manipulate segments instead of nodes within communication range. Programmers can still reason in terms of nodes and broadcast messages, but they can now specify declaratively which portions of the network to consider, and therefore control the span of communication to save energy. Macroprogramming [22] has been put forward in the literature as an efficient approach to information extraction that offers a more general-purpose approach to distributed computation. Many macroprogramming approaches aim at programming the network as a whole rather than the individual nodes that compose it. Global behaviour can be specified, programmed and then translated to node-level code transparently, hiding low-level details such as network topology, radio communication or power capacity. A significant class of macroprogramming systems comprises application-defined, in-network abstractions used in data processing. The Regiment [22] and Hood systems are examples of neighbourhood-based abstractions that treat many nodes collectively and provide a set of operations on the group, enabling the programmer to extract information about its state. EnviroTrack [1] is a programming abstraction specifically for target-tracking applications, where a group is defined as the set of sensors that detected the same event. In SPIDEY [10], a node is represented as a logical node that has multiple exported attributes (static and dynamic).
However, utilising the network topology as an abstraction can impose some rigidity on the programming model. It can also be inefficient for systems with mobile nodes, due to the cost and complexity of maintaining the mapping between the physical topology and the logical topology. The parallel Watershed-based abstractions differ from these approaches in one important aspect: the segmentation runtime loosely synchronises state across nodes, attaining greater robustness and higher efficiency.
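To make the segment abstraction concrete, the sketch below scopes a "hottest spot" query to Watershed segments: readings are grouped by segment marker and each segment reports only its maximum, while a second helper lists the members of one segment so that a query can be confined to it. The function names and the dictionary representation of readings and markers are hypothetical; in the network, the grouping would be performed inside each segment rather than in a single process.

```python
def segment_members(markers, segment):
    """Nodes whose Watershed marker equals `segment` (the scope of a segment query)."""
    return sorted(node for node, seg in markers.items() if seg == segment)


def hottest_per_segment(readings, markers):
    """Return {segment: (node, reading)} holding the maximum reading of each segment."""
    best = {}
    for node, value in readings.items():
        seg = markers[node]
        if seg not in best or value > best[seg][1]:
            best[seg] = (node, value)
    return best


# Example: four nodes in two segments (markers 1 and 2).
readings = {"n1": 21.0, "n2": 35.5, "n3": 19.0, "n4": 40.2}
markers = {"n1": 1, "n2": 1, "n3": 2, "n4": 2}
print(segment_members(markers, 2))              # ['n3', 'n4']
print(hottest_per_segment(readings, markers))   # {1: ('n2', 35.5), 2: ('n4', 40.2)}
```

Only one (node, reading) pair per segment needs to leave the segment, which is the kind of saving the segment abstraction is meant to provide.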
Synchronisation is limited to N(u) and non-blocking communications are used, so each node operates independently and the algorithm needs no global scheduling. The role of the Watershed process is to label each non-minimum node by walking downward along a steepest slope path towards the minimum. Initially, all nodes in the network are considered non-minima and will be flooded from different sub-domains. At this stage of the segmentation process, nodes are assigned temporary labels because the segmentation results depend on neighbouring nodes' readings. When a node detects a steepest neighbour, it changes its status to non-minimum and takes its label from the steepest neighbour, or from the predecessor that has the shortest distance among its neighbouring nodes. When the minimum is reached, its label is assigned to all the nodes upward along the path. If no non-minimum node is detected, the steepest distances must be calculated based on the entire set of lower borders. Process termination is detected locally on each node, which reduces the amount of communication and the idle time of nodes. The algorithm builds several paths with different origins and destinations from data communications between nodes; this introduces no additional cost, owing to the broadcast nature of wireless communications. Typically, a change in one node's readings may affect all nodes on the steepest slope line between that node and the minimum. Therefore, nodes must keep monitoring and updating their membership. A flag, called reset, is maintained by each node to record whether changes have occurred since the last communication. One advantage of this implementation is that frequent relabelling and synchronisation between nodes are not needed.
One disadvantage of this parallel asynchronous implementation is that the middle nodes of plateaus cannot locally decide whether they are on a minimum or a non-minimum plateau (NP), and would require global synchronisation points to identify and label the minimum plateaus. To avoid global synchronisation, all middle nodes are labelled as minimum or plateau (MP), allowing them to make local decisions. Afterwards, the propagation of data over a non-minimum plateau allows the middle nodes of that plateau, when flooded by a neighbour, to switch to non-minimum. Algorithm 1 presents the pseudocode of the segmentation process in each node.
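The state-based behaviour described above can be prototyped as a sequential simulation of asynchronous message passing. The sketch below is our interpretation, not the authors' code: a global FIFO queue stands in for the radio, each message carries the sender's state and label, nodes with a strictly lower neighbour start as NP and follow their lowest neighbour (unit inter-node distances are assumed), all other nodes start as MP, and an MP node that hears from an equal-valued NP neighbour switches to NP and follows it. This plateau-draining rule, all names, and the stop condition (no messages left in the queue) are assumptions.

```python
from collections import deque


def async_segmentation(f, nbrs):
    """Simulate per-node labelling driven only by messages from neighbours.

    f: node -> reading; nbrs: node -> list of neighbours (node ids must be orderable).
    Returns node -> label, a label being the id of a node on a minimum plateau.
    """
    state, label, succ = {}, {}, {}
    queue = deque()  # pending messages: (sender, receiver, sender_state, sender_label)

    def broadcast(u):
        for v in nbrs[u]:
            queue.append((u, v, state[u], label[u]))

    # Initial state: each node compares its reading with its neighbours once.
    for u in f:
        lower = [v for v in nbrs[u] if f[v] < f[u]]
        if lower:
            state[u], succ[u] = "NP", min(lower, key=lambda v: f[v])
        else:
            state[u] = "MP"  # minimum or plateau: cannot decide locally yet
        label[u] = u
    for u in f:
        broadcast(u)

    while queue:
        v, u, s_v, l_v = queue.popleft()
        before = (state[u], label[u])
        if state[u] == "NP":
            if succ[u] == v:
                label[u] = l_v                    # copy the label of the steepest neighbour
        elif f[v] == f[u]:                        # MP node hearing from its own plateau
            if s_v == "NP":                       # plateau drains through v: not a minimum
                state[u], succ[u], label[u] = "NP", v, l_v
            else:
                label[u] = min(label[u], l_v)     # agree on one label per minimum plateau
        if (state[u], label[u]) != before:
            broadcast(u)                          # only changes are re-sent
    return label


# Example: a plateau (nodes 0-2, reading 3) draining into minimum 3;
# node 4 drains into minimum 5.
f = {0: 3, 1: 3, 2: 3, 3: 1, 4: 2, 5: 0}
nbrs = {i: [j for j in (i - 1, i + 1) if j in f] for i in f}
print(async_segmentation(f, nbrs))  # {0: 3, 1: 3, 2: 3, 3: 3, 4: 5, 5: 5}
```

Nodes 0 and 1 start as MP because they only see equal readings; they switch to NP once the NP state of node 2 reaches them, without any global synchronisation point.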
6 Evaluation
Suppose that a WSN has been deployed to monitor the temperature of environmentally sensitive areas. An event of interest is predefined as a sufficient number of temperature readings rising above a certain threshold in a specific geographic area. In our simulation, an event is triggered at random times in random locations, followed by a query to locate the hottest spots. Query resolution is implemented using three methods: Watershed-based, where nodes are logically grouped into segments that are used to assist query processing; in-network processing via aggregation of messages up a spanning tree of the network; and centralised,
Algorithm 1 The segmentation process of nodes in each state.
Input: current state of node u, S(u); states f(v) of the neighbours
Output: segment membership
case (S(u) = initial)
    The node u broadcasts its data d(u) and label l(u) to its neighbours N(u), waits for data from each neighbour v ∈ N(u), and computes the following:
        $N^{=}(u) \subseteq N(u)$: all neighbouring readings equal to d(u)
        $N^{>}(u) \subseteq N(u)$: all neighbouring readings greater than d(u)
        $L_n(u) = \{v_i\}$: a singleton set such that $LS_{max}(v_i)$ and $v_i \in N(u)$
    if $L_n(u) \neq \emptyset$
        S(u) ← NP
    else
        S(u) ← MP
case (S(u) = MP)
    The node u waits to receive new data d(v) from any neighbour
    if d(v) < d(u)
        $L_n(u)$ ← {v}; S(u) ← NP
    else
        l(u) ← min(l(u), l(v)); S(u) ← MP
case (S(u) = NP)
    The node u waits for data from $L_n(u)$; S(u) ← NP
In all states, the node u sends its data f(u) to all nodes in N(u) whenever …
Figure 1. (a) Original thermal map (b) Watershed segmentation results (c) Extended segmentation
Figure 2. (a) Number of messages exchanged to answer a query (b) Number of nodes involved in resolving a query
where all data are sent directly to the sink for analysis. All simulations were carried out using Dingo [11], a scalable Python-based package for rapid prototyping of WSN algorithms. In all experiments, we use a thermal map, Figure 1(a), adopted from goinfrared.com. Figure 1(b) shows the result of segmentation, in which the nodes in each segment collaborate to solve a query. The segmentation can easily be extended to the whole monitored terrain using a generalised Voronoi diagram, i.e. each location where there is no sensor node is assigned to its nearest segment, Figure 1(c). The cost of query resolution was measured in terms of the number of messages exchanged to answer the query. Multiple runs with different topologies and different numbers of nodes were carried out for the three query resolution methods. These results are presented in Figure 2(a). The energy cost, response time, and accuracy are affected by which and how many nodes are involved in answering a query. Excluding nodes irrelevant to a certain query not only improves answer accuracy but also saves energy and reduces the time required for data analysis. Figure 2(b) shows the number of nodes involved in resolving the same query at different network densities. The results obtained in the above experiments indicate that segments can be used to support in-network query resolution. The results show that the communication overhead associated with segment-based query processing is almost 2-fold lower than that of in-network processing and 10-fold lower than that of centralised query resolution. These results are explained by the analysis presented in Figure 2.
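The generalised Voronoi extension can be sketched as a nearest-neighbour assignment over a grid of locations: each grid cell takes the segment marker of the closest sensor node, which is equivalent to assigning it to its nearest segment. The grid discretisation, the names, and the tie-breaking rule (the first node in iteration order wins) are assumptions for illustration.

```python
import math


def extend_segmentation(width, height, sensors, markers):
    """Assign every grid cell (x, y) to the segment of its nearest sensor node.

    sensors: node -> (x, y) position; markers: node -> segment marker.
    """
    extended = {}
    for x in range(width):
        for y in range(height):
            nearest = min(sensors, key=lambda n: math.dist(sensors[n], (x, y)))
            extended[(x, y)] = markers[nearest]
    return extended


# Example: two sensors in different segments on a 4 x 1 strip.
sensors = {"a": (0, 0), "b": (3, 0)}
markers = {"a": 1, "b": 2}
print(extend_segmentation(4, 1, sensors, markers))
# {(0, 0): 1, (1, 0): 1, (2, 0): 2, (3, 0): 2}
```

This brute-force version costs one distance evaluation per cell and sensor, which is acceptable for producing a map like Figure 1(c) offline at the sink.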
7 Conclusion
The preliminary work in this paper indicates that segment-based in-network query processing produces considerable energy savings over aggregation-based and centralised approaches. Watershed logically organises nodes into energy-efficient segments that reduce unnecessary data transmissions and improve response accuracy. Compared to standard Watershed segmentation algorithms, the major improvement of our algorithm is that labelling and climbing along the steepest paths are executed concurrently and locally, based on the node state, throughout the entire segmentation process. There are a number of limitations in the work so far that need to be addressed in the future, for example the cost of segmentation.
References
1. T. Abdelzaher, B. Blum, Q. Cao, Y. Chen, D. Evans, J. George, S. George, L. Gu, T. He, S. Krishnamurthy, L. Luo, S. Son, J. Stankovic, R. Stoleru, and A. Wood. EnviroTrack: Towards an environmental computing paradigm for distributed sensor networks. In Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04), pages 582-589, 2004.
2. A. Bieniek and A. Moga. An efficient watershed algorithm based on connected components. Pattern Recognition, 33(6):907-916, 2000.
3. André Bleau and L. Joshua Leon. Watershed-based segmentation and region merging. Comput. Vis. Image Underst., 77:317-370, January 2000.
4. Maurice Chu and Juan Julia Liu. State-centric programming for sensor and actuator network systems. IEEE Pervasive Computing, 2003.
5. V. Grau, A. U. J. Mewes, M. Alcaniz, R. Kikinis, and S. K. Warfield. Improved watershed transform for medical image segmentation using prior information. 23(4):447-458, 2004.
6. M. Hammoudeh, R. Newman, C. Dennett, and S. Mount. Interpolation techniques for building a continuous map from discrete wireless sensor network data. Wireless Communications and Mobile Computing, January 2011.
7. C.J. Kuo, S.F. Odeh, and M.C. Huang. Image segmentation with improved watershed algorithm and its FPGA implementation. IEEE ISCAS 2001, 2:753-756, 2001.
8. Chun-Han Lin, Chung-Ta King, and Hung-Chang Hsiao. Region abstraction for event tracking in wireless sensor networks. In 8th International Symposium on Parallel Architectures, Algorithms and Networks, pages 274-281, 2005.
9. Fernand Meyer. Topographic distance and watershed lines. Signal Process., 38:113-125, July 1994.
10. Luca Mottola and Gian Pietro Picco. Using logical neighborhoods to enable scoping in wireless sensor networks. In Proceedings of the 3rd International Middleware Doctoral Symposium, page 6, 2006.
11. Sarah Mount. Dingo wireless sensor networks simulator. http://code.google.com/p/dingo-wsn/, 2011. [Online; accessed 26-March-2011].
12. Víctor Osma-Ruiz, Juan I. Godino-Llorente, Nicolás Sáenz-Lechón, and Pedro Gómez-Vilda. An improved watershed algorithm based on efficient computation of shortest paths. Pattern Recogn., 40:1078-1090, March 2007.
13. G. J. Pottie and W. J. Kaiser. Wireless integrated network sensors. Commun. ACM, 43(5):51-58, 2000.
14. C. Rambabu, T.S. Rathore, and I. Chakrabarti. A new watershed algorithm based on hillclimbing technique for image segmentation. 4:1404-1408, 2003.
15. Roerdink and Meijster. The watershed transform: Definitions, algorithms and parallelization strategies. FUNDINF: Fundamenta Informaticae, 41, 2000.
16. Donald Shepard. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM National Conference, pages 517-524, 1968.
17. Han Sun, Jingyu Yang, and Mingwu Ren. A fast watershed algorithm based on chain code and its application in image segmentation. Pattern Recogn. Lett., 26:1266-1274, July 2005.
18. Peng Sun, Winston K.G. Seah, and Pius W.Q. Lee. Efficient data delivery with packet cloning for underwater sensor networks. In Symposium on Underwater Technology and Workshop on Scientific Use of Submarine Cables and Related Technologies, pages 34-41, April 2007.
19. Michał Świercz and Marcin Iwanowski. Fast, parallel watershed algorithm based on path tracing. In Proceedings of the 2010 International Conference on Computer Vision and Graphics: Part II, ICCVG'10, pages 317-324, 2010.
20. L. Vincent and P. Soille. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:583-598, 1991.
21. Björn Wagner, Andreas Dinges, Paul Müller, and Gundolf Haase. Parallel volume image segmentation with watershed transformation. In Proceedings of the 16th Scandinavian Conference on Image Analysis, SCIA '09, pages 420-429, 2009.
22. Matt Welsh and Geoff Mainland. Programming sensor networks using abstract regions. In Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation - Volume 1, page 3, 2004.