Liliana M. Arboleda C. and Nidal Nasser

Department of Computing & Information Science
University of Guelph
Guelph, Ontario, N1G 2W1 Canada
{larboled, nnasser}

Abstract

One of the mechanisms used to extend the lifetime of Wireless Sensor Networks (WSN) and to provide more efficient functioning procedures is clustering. By assuming roles within a cluster hierarchy, the nodes in a WSN can control the activities they perform and therefore reduce their energy consumption. However, deciding when to act as a data provider (saving energy) and when to act as a gateway (cluster head) between the nodes and the base station is not a simple task. To make this decision it is necessary to take into account aspects like signal power level, transmission schedules and network functioning (proactive or reactive). In this paper we study some basic concepts related to the clustering process in WSN and present a comparative survey of different clustering protocols.

1. Introduction

A wireless sensor network is a set of sensors deployed in a sensor field to monitor specific characteristics of the environment, measure those characteristics and collect the data related to those phenomena. The sensors are small devices with limited resources: limited battery power, low memory, little computing capability, very low data rates, low bandwidth processing, variable link quality, etc. Despite their constraints, when the sensors are deployed in large numbers, they can provide us with a realistic picture of the field being sensed. WSN can provide an area coverage that was not possible with other wired and wireless networks. They can be deployed in different environments and can be permanently attended or can be left unattended once they have been deployed in the field. The use of the WSN potential will provide efficient and cost-effective solutions for many problems. However, it is necessary to implement mechanisms or procedures to deal with the sensor constraints.

Using clustering techniques in WSN can help solve some of those constraints by allowing the organization of the sensors in a hierarchical manner, grouping them into clusters and assigning specific tasks to the sensors in the clusters before moving the information to higher levels.

Clustering techniques have been proposed in WSNs to achieve high energy efficiency and ensure long network lifetime, and for bandwidth reuse, data gathering [19], target tracking [5], one-to-many, many-to-one, one-to-any or one-to-all communications, routing [2][8][11][13], etc. Clustering is particularly useful for applications that require scalability to hundreds or thousands of nodes. Scalability in this context implies the need for load balancing, efficient resource utilization, and data aggregation [3]. Also, many routing protocols can use clustering to create a hierarchical structure and minimize the path cost when communicating with the base station.

In many WSN applications where data collection and processing can be done “in-situ”, this hierarchical approach is a promising method for efficiently organizing the network. Also, many signal processing algorithms used to extract final information from the data gathered by the sensors are well suited for local processing of data within the clusters.

The rest of the paper is organized as follows. The cluster elements in WSN are described in Section 2. Section 3 presents the establishment process for forming a cluster in WSN. A comprehensive survey of different clustering algorithms and protocols in WSN is discussed in Section 4. Finally, a summary of the paper is provided in Section 5.

2. Cluster Structure

In this section, we describe the elements, types and advantages of clusters in WSN.

2.1. Elements in a Cluster

In general, when working with clusters it is possible to identify three main elements in the WSN: sensor nodes (SNs), base station (BS) and cluster heads (CH). The SNs are the set of sensors present in the network, arranged to sense the environment and collect the data. The main task of an SN in a sensor field is to detect events, perform quick local data processing, and then transmit the data. But the

1-4244-0038-4 2006 1787
IEEE CCECE/CCGEI, Ottawa, May 2006
greatest constraint it has is power consumption, which is usually caused when the sensor is observing its surroundings and communicating (sending and receiving) data. The BS is the data processing point for the data received from the sensor nodes, and where the data is accessed by the end-user. It is generally considered fixed and located far from the sensor nodes. The CH acts as a gateway between the SNs and the BS. The function of the cluster head is to perform common functions for all the nodes in the cluster, like aggregating the data before sending it to the BS. In some way, the CH is the sink for the cluster nodes, and the BS is the sink for the cluster heads. This structure formed between the sensor nodes, the sink and the base station can be replicated as many times as it is needed, creating the different layers of the hierarchical

2.2. Cluster Types

There exist many different ways to classify the clusters. Two of the most common classifications are homogeneous or heterogeneous clusters and static or dynamic clusters. The former classification is based on the characteristics and functionality of the sensors in the cluster, whereas the latter is based on the method used to form the cluster.

In heterogeneous sensor networks, there are generally two types of sensors: (a) sensors with higher processing capabilities and complex hardware, generally used to create some sort of backbone inside the WSN. They are designated as the cluster head nodes, and therefore have to serve as data collectors and processing centers for data gathered by other sensor nodes, and (b) participating sensors, with lower capabilities than the previous ones, used to actually sense the desired attributes in the field. In homogeneous networks, all nodes have the same characteristics, hardware and processing capabilities. This is the typical case when the sensors are deployed in battle fields. In this case, every sensor can become a CH. The cluster head role is periodically rotated among the nodes to balance the load, ensure that sensors consume energy more uniformly and try to avoid the black hole problem described before.

Static clusters are usually created when the network is formed of heterogeneous nodes and the network designers want to create the clusters around the more powerful nodes. In this case, the clusters are formed at the time of network deployment. The attributes of each cluster, such as the size of a cluster, the CH, the number of participating sensors and the area it covers, are static. Static clusters are easy to deploy, but their use is only appropriate for limited scenarios where the sensor field is predetermined, the targets to monitor are not in motion and it is easy to perform maintenance tasks (e.g. sensor replacements) in the network. Dynamic cluster architectures make better use of the sensors. Sensors do not statically belong to a cluster and may support different clusters at different times. This communication schema is generally used in WSN with homogeneous sensors, but can also be used in heterogeneous WSN. The formation of a cluster can be triggered by a special message sent to the cluster at regular intervals, or by the occurrence of certain events (e.g. the detection of a big change in the monitored attributes). No explicit leader (CH) election is required, and this decreases the number of messages used during the network deployment. However, a CH election method, a cluster formation method and cluster maintenance methods must exist on the network. Dynamic clustering in WSN is also more feasible when monitoring moving targets, due to the possibility of cluster reconfiguration.

2.3. Advantages of Clustering

The cluster-based communication scheme helps solve the previous problems. Once the WSN has been divided into clusters, the communication between nodes can be intra-cluster or inter-cluster. Intra-cluster communication comprises the message exchanges between the participating nodes and the CH. Inter-cluster communication includes the transmission of messages between the CHs or between the CH and the BS. The fact that only the CH transmits information out of the cluster helps avoid collisions between the sensors inside the cluster, because they do not have to share the communication channel with the nodes in other clusters. This also helps save energy and avoid the black hole problem. Latency is also reduced. Although the data has to hop from one cluster head to another, it covers larger distances than when the sensors use a multi-hop communication model like the one used in Directed Diffusion [10]. The cluster-based communication model also facilitates the use of data aggregation models. In this case, only the CH performs data aggregation operations, helping the participating nodes inside the cluster to save energy.

3. The Clustering Process

During the establishment of the cluster, it is necessary to take into account aspects like cluster size and form, how to select the cluster head, how to control inter-cluster and intra-cluster collisions, and energy saving issues. The design of the clustering process is one of the most important issues for the correct functioning of the WSN, due to the proven efficiency of using a hierarchical scheme for communications between the network elements.

In all the cluster-based protocols we can identify three main phases during the clustering establishment process: (a) cluster head election, (b) cluster formation (set-up phase), (c) data transmission phase (steady-state phase). Different approaches exist to implement each one of these stages. For example, it is possible to use a fixed distribution of the SN and the CH, or to use a dynamic algorithm for the location of the sensors and the CH election.
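
The three phases named in Section 3 can be illustrated with a minimal Python sketch of a single clustering round. It is not the procedure of any protocol surveyed here: the random rotation rule, the use of Euclidean distance as a stand-in for received signal strength, the mean as the aggregation function, and all names (Node, elect_heads, form_clusters, steady_state, run_round) are assumptions made only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    reading: float          # latest sensed value (illustrative)
    was_head: bool = False  # already served as CH in the current rotation cycle


def elect_heads(nodes, num_heads, rng):
    """Phase (a): pick CHs at random among nodes that have not served yet."""
    candidates = [n for n in nodes if not n.was_head]
    if len(candidates) < num_heads:
        # Too few unused nodes remain: restart the rotation cycle.
        for n in nodes:
            n.was_head = False
        candidates = list(nodes)
    heads = rng.sample(candidates, num_heads)
    for h in heads:
        h.was_head = True
    return heads


def form_clusters(nodes, heads):
    """Phase (b), set-up: every non-CH node joins the closest CH."""
    clusters = {h.node_id: [h] for h in heads}
    for n in nodes:
        if n.node_id in clusters:
            continue  # CHs already belong to their own cluster
        nearest = min(heads, key=lambda h: math.hypot(n.x - h.x, n.y - h.y))
        clusters[nearest.node_id].append(n)
    return clusters


def steady_state(clusters):
    """Phase (c), steady state: each CH aggregates and reports one value to the BS."""
    return {cid: sum(m.reading for m in members) / len(members)
            for cid, members in clusters.items()}


def run_round(nodes, num_heads, rng):
    heads = elect_heads(nodes, num_heads, rng)
    clusters = form_clusters(nodes, heads)
    return clusters, steady_state(clusters)


if __name__ == "__main__":
    rng = random.Random(1)
    field = [Node(i, rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(20, 30))
             for i in range(20)]
    clusters, report = run_round(field, num_heads=4, rng=rng)
    print(len(report), "aggregated packets reach the BS instead of", len(field))
```

Calling run_round repeatedly on the same node list rotates the CH role, which mirrors the load-balancing idea described for homogeneous networks in Section 2.2. A static, heterogeneous deployment would instead fix the list of heads once and skip elect_heads.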

4. Cluster-based Protocols

In this section we present some well known clustering protocols and highlight their characteristics, related to the phases identified in the previous section. Some of these protocols are routing protocols and some of them are general clustering protocols. Table 1 shows the main characteristics of the protocols and indicates where they can currently be used, but also shows some weaknesses in the reviewed clustering protocols, which lead to proposals for future work on those open issues.

4.1. Former Protocols

The initial clustering schemas proposed for WSN were based on some sort of manual formation of the clusters, a consequence of the type of networks where the sensors were being used. In cases like the Dynamic Clustering for Acoustic Target Tracking [5], due to the manual location of the sensors, it is possible to create a heterogeneous setting of the network. The more capable nodes can be located in strategic places to allow them to act as cluster heads, and at the same time, lower capacity sensors can be placed around that CH to sense the data and send it to the BS using the backbone created. Mobility in these protocols is only considered regarding the approach of the targets to the stationary sensor nodes.

4.2. LEACH-Based Protocols

LEACH (Low Energy Adaptive Clustering Hierarchy) [8] was probably the first dynamic cluster head protocol proposed specifically for WSN using homogeneous stationary sensor nodes randomly deployed.

In LEACH all nodes have a chance to become CH to balance the energy spent per round by each sensor node. The CHs for the clusters are selected randomly and in a rotary scheme based on their energy load. After its election, each CH broadcasts an advertisement message to the other nodes, which decide which cluster they want to belong to, based on the advertisement’s signal strength. The clusters are formed dynamically in each round and the times to perform the rounds are selected randomly.

The data collection in the cluster is centralized and it is performed periodically using a TDMA schedule created by every CH. The sensor nodes send data to the CH according to the schedule. After completing the schedule, the CH fuses all the data and transmits it to the BS.

Despite the good performance of LEACH, it has some drawbacks: it is possible that the elected CHs will be concentrated in one part of the network and some nodes will not have any CH in their vicinity; LEACH clustering terminates in a constant number of iterations, but it does not guarantee good cluster head distribution and assumes uniform energy consumption for CHs; and the overhead of forming the clusters is high.

However, the use of a random cycle scheme to select the CH made LEACH the basis for other protocols with very slight modifications, like LEACH-C and LEACH-F. In the first case, LEACH-C transfers the responsibility of the cluster creation to the BS, which obliges the nodes to initially perform a direct communication with the BS for it to have a global view of the WSN. On the other hand, LEACH-F uses the global clustering scheme of LEACH-C and also fixes the clusters once they are formed, which reduces the overhead of cluster formation in the network, but prevents the use of the protocol in WSN with any kind of mobility.

4.3. Protocols for Proactive Networks

One of the assumptions of LEACH-based protocols is that the sensors always have data to send. That is the reason for considering them all during the cluster formation. The same assumption is made for other protocols that combine the sensor nodes’ “willingness” to send information with the revision of specific attributes of special interest in the sensor field. Examples of this kind of protocol are: HEED [22], EECS [19], Sensor Aggregates [6], ACE [4], EEDC [11] and TASC [17].

HEED uses the node proximity to its neighbors or the node degree as a base attribute for cluster creation. The node’s residual energy is used as the attribute to select the CH, same as in EECS, where the nodes use COMPETE_HEAD_MSG messages to compare their own power level with the other values received, and if its power level is the greatest, the SN becomes CH. This mechanism to select the CH and form the clusters produces a uniform distribution of cluster heads across the network through localized communications with little overhead. However, synchronization is required and the energy expended during data transmission for far away cluster heads is significant, especially in large scale networks.

In [6] the authors propose three different algorithms to create and manage the clusters: DAM (Distributed Aggregate Management), EBAM (Energy-Based Activity Monitoring) and EMLAM (Expectation-Maximization Like Activity Monitoring). They work using the concept of sensor aggregates. Their final goal is to abstract the collaboration patterns of the sensors into a set of generic schemas to support a wide class of applications for sensor networks. A sensor aggregate comprises those nodes in a network that satisfy a grouping predicate for a collaborative processing task. Cluster formation depends on the peaks formed by the sensors’ sensed signal, which is broadcast by each sensor at the start of each protocol period. This value is used as a “qualification parameter” to create the sensor clusters, assuring that there is only one peak per cluster. If after this exchange of information the sensor finds that its signal strength is higher than its neighbours’ signal strength, it elects itself as a CH or cluster leader. Each sensor joins the cluster defined by the highest

Table 1: Clustering algorithms and protocols in wireless sensor networks

peak that can reach that sensor through a strictly descending path in the landscape.

In ACE the nodes initiate actions at random intervals to avoid collisions. The goal is to select the smallest set of cluster heads such that all nodes in the network belong to a cluster. When a node is unclustered at the beginning of its iteration, it assesses its surroundings and counts the number of loyal followers (nodes that can belong only to the cluster that would be formed by the current node sensing its signal) it would receive if it declared itself a CH of a new cluster. If the number of loyal followers for the node is greater than or equal to its spawning threshold function, the node will spawn a new cluster. This algorithm only covers the aspects related to cluster formation, and does not include aspects related to data transmission after that.

EEDC proposes a dynamic clustering and scheduling approach, based on the overlaps of sensing ranges of sensors and on the analysis of the surveillance data reported by the sensors. To minimize the number of clusters and therefore maximize the energy saving, EEDC models the cluster creation process as a clique-covering problem and uses the minimum number of cliques to cover all vertices in the graph. The sink dynamically adjusts the clusters based on spatial correlation and the received data from the sensors. However, in EEDC the number of clusters increases over time, since there is only a splitting operation in the protocol, but not a clearly defined cluster reconfiguration operation.

The Topology Adaptive Spatial Clustering (TASC) algorithm [17] is a distributed algorithm that partitions the network into a set of locally isotropic, non-overlapping clusters without prior knowledge of the number of clusters, cluster size and node coordinates. For the cluster formation, two different parameters must be previously specified: the required minimum cluster size and a density reachability parameter. The latter parameter allows each node to further limit the number of nodes that it can potentially nominate as CH by considering only density reachable nodes as nomination candidates. This effect pulls cluster leaders towards the densest groups in the cluster, but nomination among density reachable candidates is still based on weights.

As can be seen, all of the previous protocols use a specific parameter for cluster formation, but always based on the assumption that the SNs always have data to send to the BS.

4.4. Protocols for Reactive Networks

Contrary to the protocols presented in the previous section, this new group of protocols usually takes advantage of the queries performed by the user about the sensed data or of a specific triggering event that occurred in the WSN. In this case the main idea is to save energy by aggregating information at the same time that a data transmission path is created to supply the request for information. Protocols that follow this pattern include: TEEN [13], APTEEN [14], CAG [20] and Updated CAG [21].

TEEN is a clustering protocol designed to be used in reactive networks, where the nodes react immediately to sudden and drastic changes in the value of a sensed attribute. This approach is useful for time-critical applications, but not well suited for applications where the users need to get data on a regular basis. In this protocol, all nodes take turns becoming the CH for a time interval T, called the Cluster Period. TEEN’s main focus is on information aggregation, thus the cluster formation is very similar to LEACH. Sensor nodes sense the medium continuously, but the data transmission is done less frequently, which favours energy saving. However, it has an important drawback: if the thresholds are not reached, the nodes will never communicate.

APTEEN is a variation of TEEN, designed as a hybrid protocol that changes the periodicity or threshold values used to provide a periodic state view of the network. It uses a

combination of proactive and reactive network features. The cluster head selection in APTEEN is based on the mechanism used in LEACH-C. The clusters exist for an interval called the cluster period, and then the BS re-groups clusters, at a time called the cluster change time. APTEEN uses a modified TDMA, where each node in the cluster is assigned a transmission slot, to avoid collisions. For query responses, APTEEN uses node pairs. These are adjacent nodes that sense similar data, of which only one responds to a query; the other can go to “sleep” mode and does not need to receive the query. These two nodes can take the role of handling queries alternately, which helps them save resources. The main drawbacks of the APTEEN approach are the overhead and complexity associated with forming clusters at multiple levels, the method of implementing threshold functions, and how to deal with attribute-based naming of queries.

While monitoring environmental features in a sensor field, nearby sensor nodes typically register similar values. This relation between position and measures is called spatial correlation. According to its authors, CAG (Clustered AGgregation technique) [20] is the first in-network aggregation algorithm exploiting spatial correlation, which trades a negligible quality of result (precision) for a significant energy saving. CAG forms clusters of nodes sensing similar values. The CAG algorithm operates in two phases: query and response. During the query phase, CAG forms clusters. During the response phase, CAG transmits the value of the aggregated data within the cluster to the BS. CAG achieves efficient in-network storage and processing by allowing a unified mechanism between query routing (networking) and query processing (application). Instead of gathering and compressing all the data, CAG generates a synopsis by filtering out insignificant elements in data streams to minimize response time, storage, computation, and communication costs. CAG is a lossy clustering algorithm because it uses only sensor values from the cluster heads to compute the aggregates.

The Updated CAG algorithm [21] is an improvement to the CAG algorithm, where the clusters are still formed from nodes sensing similar values within a given threshold, but in this case, the clusters remain unchanged as long as the sensor values stay within a given threshold over time (temporal correlation). This fixed range clustering ensures that the performance of CAG becomes independent of the magnitude of sensor readings and network topology. When used in the interactive mode, the protocol alternates query and response phases. This is appropriate for scenarios where network topology and data change dynamically, or the users change the approximation granularity or query attributes over time. In this scenario, a new forwarding tree is built each time a query is sent out. Frequently rebuilding the tree can be wasteful if the sensed data is almost the same over time. The streaming mode adjusts the clusters locally as the data and topology change over time. In this case, the CH keeps transmitting streams of responses for a query that is issued just once, thus the tree does not have to be reconstructed and the network changes its schema from reactive to proactive. This mode of operation is appropriate when the environment does not change frequently and the query remains valid for a certain period of time. A node count is performed within a cluster to assign weights to the CH values. The count is updated only when cluster adjustments happen. In streaming mode, because the CHs are fixed throughout the duration of the query, they can become an energy bottleneck. In this case, the authors propose the use of a CH rotation technique to maximize the network lifetime, like the mechanism used in LEACH.

As can be seen, the idea in this group of protocols is to take advantage of the time and effort necessary to perform other operations in the WSN to perform the clustering process.

5. Summary

Several clustering approaches have been proposed for mobile sensor networks, due to the advantages of having a hierarchical structure for communication between the nodes and for saving energy in the WSN. Reviewing the protocols presented in this paper, it is possible to observe that all of them are concerned with how to prolong the WSN lifetime and how to make more efficient use of the critical resources located at the sensor nodes, without decreasing the communication functionalities, but creating more intelligent clusters, minimizing the maximum number of nodes in a cluster, and minimizing clusters with only a single node (the CH). In this paper we provide a comparison between different clustering algorithms and protocols in WSN. This comparison provides a classification that can be used when deciding which clustering technique to use when implementing other mechanisms, like routing, involved in

References

[1] Akyildiz I.F., Su W., Sankarasubramanian, Cayirci E., “Wireless Sensor Networks: A Survey”, Georgia Tech Technical Report, December 2001.
[2] Al-Karaki J.N., Kamal A.E., “Routing Techniques in Wireless Sensor Networks: A Survey”, Department of Electrical and Computer Engineering, Iowa State University, December 2004.
[3] Aurenhammer F., “Voronoi Diagrams - A Survey Of A Fundamental Geometric Data Structure”, ACM Computing Surveys, vol. 23, pp. 345-405, September
[4] Chan H., Perrig A., “ACE: An Emergent Algorithm for Highly Uniform Cluster Formation”, Proceedings of the First European Workshop on Sensor Networks (EWSN), January 2004.

[5] Chen WP., Hou J.C., Sha L., “Dynamic Clustering for Acoustic Target Tracking in Wireless Sensor Networks”, 11th IEEE International Conference on Network Protocols (ICNP'03), pp. 284-294, 2003.
[6] Fang Q., Zhao F., Guibas L., “Lightweight Sensing and Communication Protocols for Target Enumeration and Aggregation”, Proceedings of the 4th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC), pp. 165-176, 2003.
[7] Heinzelman W.B., “Application Specific Protocol Architectures for Wireless Networks”, PhD dissertation, Massachusetts Institute of Technology, June 2000.
[8] Heinzelman W., Chandrakasan A., Balakrishnan H., “Energy-Efficient Communication Protocol for Wireless Microsensor Networks”, Proceedings of the 33rd Hawaii International Conference on System Sciences, January 2000.
[9] Ibriq J., Mahgoub I., “Cluster-Based Routing in Wireless Sensor Networks: Issues and Challenges”, Proceedings of the 2004 Symposium on Performance Evaluation of Computer Telecommunication Systems, 2004.
[10] Intanagonwiwat C., Govindan R., Estrin D., “Directed diffusion: A scalable and robust communication paradigm for sensor networks”, Proceedings of the Sixth Annual International Conference on Mobile Computing and Networking (MobiCOM '00), Boston, Massachusetts, August 2000.
[11] Liu C., Wu K., Pei J., “A Dynamic Clustering and Scheduling Approach to Energy Saving in Data Collection from Wireless Sensor Networks”, Second Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON 05), California, September 2005.
[12] Luo J., Hubaux J-P., “Joint Mobility and Routing for Lifetime Elongation in Wireless Sensor Networks”, IEEE INFOCOM 2005, Miami, March 2005.
[13] Manjeshwar A., Agrawal D.P., “TEEN: A Routing Protocol for Enhanced Efficiency in Wireless Sensor Networks”, Proceedings of the 15th International Parallel and Distributed Processing Symposium, San Francisco, 2001.
[14] Manjeshwar A., Agrawal D.P., “APTEEN: A Hybrid Protocol for Efficient Routing and Comprehensive Information Retrieval in Wireless Sensor Networks”, Proceedings of the International Parallel and Distributed Processing Symposium, 2002.
[15] Raghavendra C.S., Sivalingham K.M., Znatti T.F., “Wireless Sensor Networks”, Kluwer Academic Publishers, 2004.
[16] Soro S., Heinzelman W.B., “Prolonging the Lifetime of Wireless Sensor Networks via Unequal Clustering”, Proceedings of the 5th International Workshop on Algorithms for Wireless, Mobile, Ad Hoc and Sensor Networks (IEEE WMAN '05), April 2005.
[17] Virrankoski R., Savvides A., “TASC: Topology Adaptive Clustering for Sensor Networks”, Proceedings of the Second IEEE International Conference on Mobile Ad-Hoc and Sensor Systems (MASS 2005), Washington DC, November 2005.
[18] Wang K., Abu-Ayyash S., Little T.D.C., Basu P., “Attribute-Based Clustering for Information Dissemination in Wireless Sensor Networks”, Proc. 2nd Annual IEEE Communications Society Conf. on Sensor and Ad Hoc Communications and Networks (SECON 2005), Santa Clara, CA, September 2005.
[19] Ye M., Li C., Chen G., Wu J., “EECS: An Energy Efficient Clustering Scheme in Wireless Sensor Networks”, National Laboratory of Novel Software Technology, Nanjing University, China, Department of Computer Science and Engineering, Florida Atlantic University, USA, 2005.
[20] Yoon S., Shahabi C., “Exploiting Spatial Correlation Towards an Energy Efficient Clustered AGgregation Technique (CAG)”, IEEE International Conference on Communications, pp. 82-98, 2005.
[21] Yoon S., Shahabi C., “An Experimental Study of the Effectiveness of Clustered AGgregation (CAG) Leveraging Spatial and Temporal Correlations in Wireless Sensor Networks”, submitted to ACM Transactions on Sensor Networks. USC (University of Southern California) Computer Science Department Technical Report 05-869, August 2005.
[22] Younis O., Fahmy S., “HEED: A hybrid, Energy-Efficient, Distributed Clustering Approach for Ad Hoc Sensor Networks”, IEEE Transactions on Mobile Computing, vol. 3, pp. 366-379, June 2004.