
Region-Based Approaches and Clustering

The main idea in region-based segmentation techniques is to identify various regions in an image that have similar features. Clustering techniques encountered in the pattern-recognition literature have similar objectives and can be applied for image segmentation. Examples of clustering are given in Section 9.14.
One class of region-based techniques involves region growing [72]. The image is divided into atomic regions of constant gray levels. Similar adjacent regions are merged sequentially until the adjacent regions become sufficiently different (Fig. 9.55). The trick lies in selecting the criterion for merging. Some merging heuristics are as follows (a code sketch of the first heuristic appears after this list):

1. Merge two regions ℛ_i and ℛ_j if w/P_m > θ_1, where P_m = min(P_i, P_j), P_i and P_j are the perimeters of ℛ_i and ℛ_j, and w is the number of weak boundary locations (pixels on either side of which the magnitude difference is less than some threshold a). The parameter θ_1 controls the size of the region to be merged. For example, θ_1 = 1 implies two regions will be merged only if one of the regions almost surrounds the other. Typically, θ_1 = 0.5.
2. Merge ℛ_i and ℛ_j if w/l > θ_2, where l is the length of the common boundary between the two regions. Typically θ_2 = 0.75. So the two regions are merged if the common boundary is sufficiently weak. Often this step is applied after the first heuristic has been used to reduce the number of regions.
3. Merge ℛ_i and ℛ_j only if there are no strong edge points between them. Note that the run-length connectivity method for binary images can be interpreted as an example of this heuristic.
4. Merge ℛ_i and ℛ_j if their similarity distance (see Section 9.14) is less than a threshold.
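The following is a minimal sketch of heuristic 1, assuming a label image whose pixels already carry atomic-region labels and a gray-level image from which boundary weakness is measured; the function name, the default values of a and θ_1, and the approximation of region perimeters by counts of inter-region boundary edges are illustrative assumptions, not part of the text.

import numpy as np

def merge_pass(labels, image, a=10.0, theta1=0.5):
    """One pass of merging heuristic 1: merge adjacent regions R_i, R_j when
    w / min(P_i, P_j) > theta1, where P_i, P_j are region perimeters and w is
    the number of weak boundary locations (gray-level difference below a)."""
    image = image.astype(float)
    perimeter = {}   # P_k: number of inter-region boundary edges of region k
    weak = {}        # w for each adjacent pair (i, j), keyed with i < j
    # Examine horizontal and vertical neighbor pairs once each.
    pairs = [(labels[:, :-1], labels[:, 1:], image[:, :-1], image[:, 1:]),
             (labels[:-1, :], labels[1:, :], image[:-1, :], image[1:, :])]
    for la, lb, ia, ib in pairs:
        boundary = la != lb
        diffs = np.abs(ia - ib)[boundary]
        for i, j, d in zip(la[boundary], lb[boundary], diffs):
            key = (min(i, j), max(i, j))
            perimeter[i] = perimeter.get(i, 0) + 1
            perimeter[j] = perimeter.get(j, 0) + 1
            if d < a:
                weak[key] = weak.get(key, 0) + 1
    # Union-find bookkeeping so chains of merges stay consistent.
    parent = {int(k): int(k) for k in np.unique(labels)}
    def find(k):
        k = int(k)
        while parent[k] != k:
            k = parent[k]
        return k
    for (i, j), w in weak.items():
        if w / min(perimeter[i], perimeter[j]) > theta1:
            parent[find(j)] = find(i)        # merge R_j into R_i
    return np.vectorize(lambda k: find(k))(labels)

# Tiny example: two similar regions (1, 2) separated by a weak boundary merge,
# while the darker region 3 stays separate.
labels = np.repeat(np.array([[1, 1, 2, 2]]), 6, axis=0)
labels[4:] = 3
image = np.repeat(np.array([[10., 10., 12., 12.]]), 6, axis=0)
image[4:] = 80.
print(merge_pass(labels, image))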

Instead of merging regions, we can approach the segmentation problem by splitting a given region. For example, the image could be split by the quad-tree approach, and then similar regions could be merged (Fig. 9.56).
Region-based approaches are generally less sensitive to noise than the boundary-based methods. However, their implementation complexity can often be quite large.
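As an illustration of the split step, the sketch below recursively divides an image block into four quadrants whenever its gray-level range exceeds a homogeneity threshold; the threshold value and function names are assumptions. Adjacent homogeneous leaves would then be merged, as in Fig. 9.56.

import numpy as np

def quadtree_split(image, r0, c0, height, width, threshold=10, leaves=None):
    """Recursively split the block image[r0:r0+height, c0:c0+width] into four
    quadrants until each block is homogeneous (max - min <= threshold).
    Returns the homogeneous leaf blocks as (r0, c0, height, width) tuples."""
    if leaves is None:
        leaves = []
    block = image[r0:r0 + height, c0:c0 + width]
    if height <= 1 or width <= 1 or block.max() - block.min() <= threshold:
        leaves.append((r0, c0, height, width))   # homogeneous: keep as a leaf
        return leaves
    h2, w2 = height // 2, width // 2
    # Split into four quadrants (the lower/right quadrants absorb odd remainders).
    for dr, dc, h, w in [(0, 0, h2, w2), (0, w2, h2, width - w2),
                         (h2, 0, height - h2, w2), (h2, w2, height - h2, width - w2)]:
        quadtree_split(image, r0 + dr, c0 + dc, h, w, threshold, leaves)
    return leaves

# Example: split a synthetic two-region image.
img = np.zeros((64, 64), dtype=float)
img[:, 32:] = 100.0
blocks = quadtree_split(img, 0, 0, 64, 64)
print(len(blocks), "homogeneous blocks")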
Figure 9.55 Region growing by merging (adjacent regions separated by a weak boundary are merged).


Figure 9.56 Region growing by split and merge techniques. (a) Input; (b) quad-tree split; (c) segmented regions.

Template Matching

One direct method of segmenting an image is to match it against templates from a given list. The detected objects can then be segmented out, and the remaining image can be analyzed by other techniques (Fig. 9.57). This method can be used to segment busy images, such as journal pages containing text and graphics. The text can be segmented by template-matching techniques, and the graphics can be analyzed by boundary-following algorithms.
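A minimal sketch of template matching by normalized cross-correlation, using only NumPy; the template, test image, and detection threshold are illustrative assumptions rather than data from the text.

import numpy as np

def match_template(image, template, threshold=0.9):
    """Slide the template over the image and return the (row, col) offsets
    where the normalized correlation coefficient exceeds the threshold."""
    image = image.astype(float)
    t = template.astype(float)
    t = t - t.mean()
    th, tw = t.shape
    H, W = image.shape
    matches = []
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            window = image[r:r + th, c:c + tw]
            w = window - window.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            if denom > 0 and (w * t).sum() / denom > threshold:
                matches.append((r, c))
    return matches

# Example: find a small cross pattern embedded in a synthetic image.
img = np.zeros((20, 20))
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
img[5:8, 9:12] = cross
print(match_template(img, cross, threshold=0.95))   # -> [(5, 9)]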

Texture Segmentation

Figure 9.57 Background segmentation (or filtering) via template matching. (a) Template; (b) input image; (c) filtered image.

Texture segmentation becomes important when objects in a scene have a textured background. Since texture often contains a high density of edges, boundary-based techniques may become ineffective unless the texture is filtered out. Clustering and region-based approaches applied to texture features can be used to segment textured regions. In general, texture classification and segmentation is quite a difficult problem. Use of a priori knowledge about the existence and kinds of textures that may be present in a scene can be of great utility in practical problems.
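As a small illustration of the idea (an assumed example, not the book's method), a crude texture feature such as the local variance can be computed at each pixel and then partitioned to separate a smooth region from a textured one; richer texture features and a proper clustering step would be used in practice.

import numpy as np

def local_variance(image, half=2):
    """Local variance in a (2*half+1)^2 window, a crude texture-energy feature."""
    H, W = image.shape
    out = np.zeros((H, W))
    padded = np.pad(image.astype(float), half, mode='reflect')
    for r in range(H):
        for c in range(W):
            out[r, c] = padded[r:r + 2 * half + 1, c:c + 2 * half + 1].var()
    return out

# Textured (noisy) right half versus smooth left half.
rng = np.random.default_rng(0)
img = np.full((40, 40), 100.0)
img[:, 20:] += rng.normal(0, 25, size=(40, 20))
energy = local_variance(img)
mask = energy > energy.mean()        # 1 where the texture energy is high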

9.14 CLASSIFICATION TECHNIQUES

A major task after feature extraction is to classify the object into one of several
categories. Figure 9.2 lists various classification techniques applicable in image
analysis. Although an in-depth discussion of classification techniques can be found
in the pattern-recognition literature-see, for example, [1]-we will briefly review
these here to establish their relevance in image analysis.
It should be mentioned that classification and segmentation processes have closely related objectives. Classification can lead to segmentation, and vice versa. Classification of pixels in an image is another form of component labeling that can result in segmentation of various objects in the image. For example, in remote sensing, classification of multispectral data at each pixel location results in segmentation of various regions of wheat, barley, rice, and the like. Similarly, image segmentation by template matching, as in character recognition, leads to classification or identification of each object.
There are two basic approaches to classification, supervised and nonsupervised, depending on whether or not a set of prototypes is available.

Supervised Learning

Supervised learning, also called supervised classification, can be distribution free or statistical. Distribution-free methods do not require knowledge of any a priori probability distribution functions and are based on reasoning and heuristics. Statistical techniques are based on probability distribution models, which may be parametric (such as Gaussian distributions) or nonparametric.

Distribution-free classification. Suppose there are K different objects or pattern classes S_1, S_2, . . . , S_K. Each class is characterized by M_k prototypes, which have N x 1 feature vectors y_m^(k), m = 1, . . . , M_k. Let x denote an N x 1 feature vector obtained from the observed image. A fundamental function in pattern recognition is called the discriminant function. It is defined such that the kth discriminant function g_k(x) takes the maximum value if x belongs to class k; that is, the decision rule is

g_k(x) > g_j(x),   ∀ j ≠ k  ⇒  x ∈ S_k        (9.138)
For a K class problem, we need K - 1 discriminant functions. These functions divide the N-dimensional feature space into K different regions with a maximum of K(K - 1)/2 hypersurfaces. The partitions become hyperplanes if the discriminant function is linear, that is, if it has the form

g_k(x) = a_k^T x + b_k        (9.139)

Such a function arises, for example, when x is classified to the class whose centroid is nearest in Euclidean distance to it (Problem 9.17). The associated classifier is called the minimum mean (Euclidean) distance classifier.
An alternative decision rule is to classify x to S_i if, among a total of k nearest prototype neighbors of x, the maximum number of neighbors belong to class S_i. This is the k-nearest neighbor classifier, which for k = 1 becomes a minimum-distance classifier.
When the discriminant function can classify the prototypes correctly for some linear discriminants, the classes are said to be linearly separable. In that case, the weights a_k and b_k can be determined via a successive linear training algorithm. Other discriminants can be piecewise linear, quadratic, or polynomial functions. The k-nearest neighbor classification can be shown to be equivalent to using piecewise linear discriminants.
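The two rules just described can be sketched as follows; the prototype data and feature dimensionality are illustrative assumptions.

import numpy as np

def minimum_distance_classify(x, prototypes, labels):
    """Assign x to the class whose prototype centroid is nearest in Euclidean
    distance; this is equivalent to a linear discriminant g_k(x) = a_k^T x + b_k."""
    classes = np.unique(labels)
    centroids = np.array([prototypes[labels == k].mean(axis=0) for k in classes])
    return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]

def knn_classify(x, prototypes, labels, k=3):
    """Assign x to the class holding the majority among its k nearest prototypes."""
    order = np.argsort(np.linalg.norm(prototypes - x, axis=1))[:k]
    values, counts = np.unique(labels[order], return_counts=True)
    return values[np.argmax(counts)]

# Tiny example with two classes in a 2-D feature space.
protos = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0],     # class 0
                   [5.0, 5.0], [6.0, 5.0], [5.5, 6.0]])    # class 1
labs = np.array([0, 0, 0, 1, 1, 1])
print(minimum_distance_classify(np.array([4.5, 4.0]), protos, labs))  # -> 1
print(knn_classify(np.array([0.7, 0.4]), protos, labs, k=1))          # -> 0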
Decision tree classification [60, 61]. Another distribution-free classifier, called a decision tree classifier, splits the N-dimensional feature space into unique regions by a sequential method. The algorithm is such that every class need not be tested to arrive at a decision. This becomes advantageous when the number of classes is very large. Moreover, unlike many other training algorithms, this algorithm is guaranteed to converge whether or not the feature space is linearly separable.
Let µ_k(i) and σ_k(i) denote the mean and standard deviation, respectively, measured from repeated independent observations of the kth prototype vector elements y_m^(k)(i), m = 1, . . . , M_k. Define the normalized average prototype features z_k(i) ≜ µ_k(i)/σ_k(i) and an N x K matrix

        | z_1(1)  z_2(1)  . . .  z_K(1) |
        | z_1(2)  z_2(2)  . . .  z_K(2) |
    Z = |   .       .               .   |        (9.140)
        | z_1(N)  z_2(N)  . . .  z_K(N) |

The row number of Z is the feature number and the column number is the object or class number. Further, let Z' denote the matrix obtained by arranging the elements of each row of Z in increasing order, with the smallest element on the left and the largest on the right. Now, the algorithm is as follows.
Decision Tree Algorithm
Step 1 Convert Z to Z'. Find the maximum distance between adjacent row elements in each row of Z'. Find r, the row number with the largest maximum distance. The row r represents a feature. Set a threshold at the midpoint of the maximum-distance boundaries and split row r into two parts.
Step 2 Convert Z' to Z̄ such that row r is the same in both matrices. The elements of the other rows of Z' are rearranged so that each column of Z̄ represents a prototype vector. This means, simply, that the elements of each row of Z̄ are in the same order as the elements of row r. Split Z̄ into two matrices Z̄_1 and Z̄_2 by splitting each row in a manner similar to row r.


Step 3 Repeat Steps 1 and 2 for the split matrices that have more than one
column. Terminate the process when all the split matrices have only one column.
The preceding process produces a series of thresholds that induce questions of the form "Is feature j > threshold?" The questions and the two possible decisions for each question generate the nodes and branches of a decision tree. The terminal branches of the tree give the classification decision.
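The following sketch implements the training procedure above under the simplifying assumption that the matrix Z of normalized prototype features is given directly and that every split is nontrivial; it is exercised on the data of the example that follows, and the variable names are illustrative.

import numpy as np

def build_tree(Z, classes):
    """Recursively split the N x K matrix Z of normalized prototype features
    (rows = features, columns = classes) into a threshold tree."""
    if Z.shape[1] == 1:
        return {'class': classes[0]}                 # single column: a leaf
    # Step 1: sort each row and find the largest gap between adjacent elements.
    Zs = np.sort(Z, axis=1)
    gaps = np.diff(Zs, axis=1)
    r = int(np.unravel_index(gaps.argmax(), gaps.shape)[0])   # feature row
    j = int(gaps[r].argmax())
    threshold = 0.5 * (Zs[r, j] + Zs[r, j + 1])      # midpoint of the gap
    # Step 2: split the columns (prototype vectors) by the threshold on row r.
    left = Z[r] <= threshold
    return {'feature': r, 'threshold': threshold,
            'left': build_tree(Z[:, left], classes[left]),
            'right': build_tree(Z[:, ~left], classes[~left])}

def classify(tree, x):
    """Follow the thresholds down to a terminal node for feature vector x."""
    while 'class' not in tree:
        tree = tree['left'] if x[tree['feature']] <= tree['threshold'] else tree['right']
    return tree['class']

# Data of Example 9.11: z(1) = area, z(2) = perimeter for classes 1..5.
Z = np.array([[6, 12, 20, 24, 27],
              [56, 28, 42, 35, 48]], dtype=float)
tree = build_tree(Z, np.arange(1, 6))
print(classify(tree, np.array([25.0, 47.0])))        # -> 5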
Example 9.11
The accompanying table contains the normalized average areas and perimeter lengths of five different object classes for which a vision system is to be trained.

Class k:                       1    2    3    4    5
z(1) = normalized area:        6   12   20   24   27
z(2) = normalized perimeter:  56   28   42   35   48

This gives

    Z = Z̄_1 = | 6   12   20   24   27 |        Z' = | 6   12   20   24   27 |
              | 56  28   42   35   48 |             | 28  35   42   48   56 |

The largest adjacent difference in Z' occurs in the first row (8, between 12 and 20), so z(1) is the feature to be thresholded, with threshold η_1 = 16. This splits Z̄_1 into

    Z̄_2 = | 6   12 |  (classes 1, 2)        Z̄_3 = | 20   24   27 |  (classes 3, 4, 5)
          | 56  28 |                               | 42   35   48 |

Proceeding similarly with these matrices, we get

    Z̄_2' = | 6   12 |   =>   | 12   6 |  (classes 2, 1),           η_2 = 42
           | 28  56 |        | 28  56 |

    Z̄_3' = | 20   24   27 |   =>   | 24   20   27 |  (classes 4, 3, 5),   η_3 = 38.5
           | 35   42   48 |        | 35   42   48 |

    Z̄_4 = | 20   27 |  (classes 3, 5),        η_4 = 23.5
          | 42   48 |

The thresholds partition the feature space and induce the decision tree, as shown in Fig. 9.58.

Statistical classification. In statistical classification techniques it is assumed that the different object classes and the feature vector have an underlying joint probability density. Let P(S_k) be the a priori probability of occurrence of class S_k and p(x) be the probability density function of the random feature vector observed as x.

Figure 9.58 Decision tree classifier (thresholds on the measured features z(1) and z(2) partition the feature space).

Bayes' minimum-risk classifier. The Bayes' minimum-risk classifier minimizes the average loss or risk in assigning x to a wrong class. Define

Risk, ℛ ≜ Σ_{k=1}^{K} ∫_{R_k} c(x|S_k) p(x) dx        (9.141)

c(x|S_k) ≜ Σ_{i=1}^{K} C_{i,k} P(S_i|x)

where C_{i,k} is the cost of assigning x to S_k when in fact x ∈ S_i, and R_k represents the region of the feature space where p(x|S_k) > p(x|S_i) for every i ≠ k. The quantity c(x|S_k) represents the total cost of assigning x to S_k. It is well known that the decision rule that minimizes ℛ is given by

Σ_{i=1}^{K} C_{i,k} P(S_i) p(x|S_i) < Σ_{i=1}^{K} C_{i,j} P(S_i) p(x|S_i),   ∀ j ≠ k  ⇒  x ∈ S_k        (9.142)

If C_{i,k} = 1 for i ≠ k and C_{i,k} = 0 for i = k, then the decision rule simplifies to

P(S_k) p(x|S_k) > P(S_j) p(x|S_j),   ∀ j ≠ k  ⇒  x ∈ S_k        (9.143)

In this case the probability of error in classification is also minimized, and the minimum-error classifier discriminant becomes

g_k(x) = P(S_k) p(x|S_k)        (9.144)

In practice the p(x|S_k) are estimated from the prototype data by either parametric or nonparametric techniques, which can yield simplified expressions for the discriminant function.
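A sketch of a minimum-error classifier with Gaussian class-conditional densities estimated from prototypes; the Gaussian (parametric) choice, the class data, and all names are illustrative assumptions.

import numpy as np

class GaussianBayesClassifier:
    """Minimum-error classifier: assign x to the class maximizing
    g_k(x) = log P(S_k) + log p(x | S_k), with Gaussian p(x | S_k)."""

    def fit(self, prototypes, labels):
        self.classes = np.unique(labels)
        self.priors, self.means, self.covs = [], [], []
        for k in self.classes:
            xk = prototypes[labels == k]
            self.priors.append(len(xk) / len(prototypes))     # estimate of P(S_k)
            self.means.append(xk.mean(axis=0))
            # Small ridge keeps the covariance invertible for few prototypes.
            self.covs.append(np.cov(xk, rowvar=False) + 1e-6 * np.eye(xk.shape[1]))
        return self

    def discriminants(self, x):
        g = []
        for prior, mu, cov in zip(self.priors, self.means, self.covs):
            d = x - mu
            logp = -0.5 * (d @ np.linalg.solve(cov, d)
                           + np.log(np.linalg.det(cov))
                           + len(x) * np.log(2 * np.pi))
            g.append(np.log(prior) + logp)                    # g_k(x) in log form
        return np.array(g)

    def classify(self, x):
        return self.classes[np.argmax(self.discriminants(x))]

# Example with two 2-D classes.
rng = np.random.default_rng(1)
x0 = rng.normal([0, 0], 1.0, size=(50, 2))
x1 = rng.normal([4, 4], 1.0, size=(50, 2))
clf = GaussianBayesClassifier().fit(np.vstack([x0, x1]), np.repeat([0, 1], 50))
print(clf.classify(np.array([3.5, 3.0])))                     # -> 1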
There also exist sequential classification techniques, such as the sequential probability ratio test (SPRT) and the generalized SPRT, in which decisions can be made initially using fewer than N features and refined as more features are acquired sequentially [62]. The advantage lies in situations where N is large, so that it is desirable to terminate the process early if the few features measured so far already yield adequate results.

Nonsupervised Learning or Clustering

In nonsupervised learning, we attempt to identify clusters or natural groupings in the feature space. A cluster is a set of points in the feature space whose local density is large (a relative maximum) compared to the density of feature points in the surrounding region. Clustering techniques are useful for image segmentation and for classification of raw data to establish classes and prototypes. Clustering is also a useful vector quantization technique for compression of images.
Example 9.12
The visual and IR images u1(m, n) and u2(m, n), respectively (Fig. 9.59a), are transformed pixel by pixel to give the features v1(m, n) = (u1(m, n) + u2(m, n))/√2 and v2(m, n) = (u1(m, n) - u2(m, n))/√2. This is simply the 2 x 2 Hadamard transform of the 2 x 1 vector [u1 u2]^T. Figure 9.59b shows the feature images. The images v1(m, n) and v2(m, n) are found to contain mainly the cloud and land features, respectively. Thresholding these images yields the left-side images in Figs. 9.59c and d. Notice that the clouds contain some land features, and vice versa. A scatter diagram, which plots each vector [v1 v2]^T as a point in the v1 versus v2 plane, is seen to have two main clusters (Fig. 9.60). Using the cluster boundaries for segmentation, we can remove the land features from the clouds, and vice versa, as shown in Figs. 9.59c and d (right-side images).
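A minimal sketch of the pixelwise feature transform used in this example, followed by a crude threshold; the random placeholder images and the percentile thresholds are assumptions, since the actual visual and IR images are not reproduced here.

import numpy as np

def hadamard_features(u1, u2):
    """Pixelwise 2-point Hadamard transform of two registered images:
    v1 = (u1 + u2)/sqrt(2), v2 = (u1 - u2)/sqrt(2)."""
    u1 = u1.astype(float)
    u2 = u2.astype(float)
    return (u1 + u2) / np.sqrt(2.0), (u1 - u2) / np.sqrt(2.0)

# Placeholder images standing in for the visual and IR channels.
rng = np.random.default_rng(2)
u1 = rng.integers(0, 256, size=(64, 64))
u2 = rng.integers(0, 256, size=(64, 64))
v1, v2 = hadamard_features(u1, u2)
cloud_mask = v1 > np.percentile(v1, 80)    # crude threshold on the "sum" feature
land_mask = v2 > np.percentile(v2, 80)     # crude threshold on the "difference" feature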

Figure 9.59 Segmentation by clustering. (a) Input images u1(m, n) and u2(m, n); (b) feature images v1(m, n) and v2(m, n); (c) segmentation of clouds by thresholding v1 (left) and by clustering (right); (d) segmentation of land by thresholding v2 (left) and by clustering (right).


Figure 9.60 Scatter diagram in feature space: the points [v1 v2]^T form two main clusters (cluster 1 and cluster 2) in the v1 versus v2 plane.

Similarity measure approach. The success of clustering techniques rests on the partitioning of the feature space into cluster subsets. A general clustering algorithm is based on split and merge ideas (Fig. 9.61). Using a similarity measure, the input vectors are partitioned into subsets. Each partition is tested to check whether or not the subsets are sufficiently distinct. Subsets that are not sufficiently distinct are merged. The procedure is repeated on each of the subsets until no further subdivisions result or some other convergence criterion is satisfied. Thus, a similarity measure, a distinctiveness test, and a stopping rule are required to define a clustering algorithm. For any two feature vectors x_i and x_j, some of the commonly used similarity measures are as follows (a code sketch appears after this list):
Dot product:                  (x_i, x_j) ≜ x_j^T x_i

Similarity rule:              s(x_i, x_j)

Weighted Euclidean distance:  d(x_i, x_j) ≜ Σ_k w_k [x_i(k) - x_j(k)]²

Normalized correlation:       ρ(x_i, x_j) = (x_i, x_j) / √[(x_i, x_i)(x_j, x_j)]
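Three of the measures above can be written compactly as follows (the similarity rule is omitted since its definition is not reproduced here; the weights and sample vectors are illustrative).

import numpy as np

def dot_product(xi, xj):
    """Dot product (xi, xj)."""
    return float(xi @ xj)

def weighted_euclidean(xi, xj, w):
    """Weighted Euclidean distance: sum_k w_k [xi(k) - xj(k)]^2."""
    return float(np.sum(w * (xi - xj) ** 2))

def normalized_correlation(xi, xj):
    """rho(xi, xj) = (xi, xj) / sqrt((xi, xi)(xj, xj))."""
    return float(xi @ xj / np.sqrt((xi @ xi) * (xj @ xj)))

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([2.0, 2.0, 2.0])
print(dot_product(xi, xj),
      weighted_euclidean(xi, xj, np.array([1.0, 1.0, 1.0])),
      normalized_correlation(xi, xj))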

Several different algorithms exist for clustering based on the similarity approach. Examples are given next.
Figure 9.61 A clustering approach (input data → partition → test and merge → split).

Chain method [63]. The first data sample is designated as the representative of the first cluster, and the similarity or distance of the next sample is measured from this cluster representative. If this distance is less than a threshold, say η, the sample is placed in the first cluster; otherwise it becomes the representative of the second cluster. The process is continued for each new data sample until all the data have been exhausted. Note that this is a one-pass method.
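A sketch of the chain method, generalized in the natural way so that each new sample is compared against the representatives of all clusters formed so far; the threshold value and data are assumptions.

import numpy as np

def chain_clustering(samples, eta):
    """One-pass chain clustering: a sample joins the nearest existing cluster if
    its representative is within distance eta; otherwise it starts a new cluster."""
    representatives = [samples[0]]          # first sample represents cluster 0
    labels = [0]
    for x in samples[1:]:
        dists = [np.linalg.norm(x - r) for r in representatives]
        nearest = int(np.argmin(dists))
        if dists[nearest] < eta:
            labels.append(nearest)
        else:
            representatives.append(x)       # x becomes a new cluster representative
            labels.append(len(representatives) - 1)
    return np.array(labels), representatives

data = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.9], [0.1, 0.4]])
labels, reps = chain_clustering(data, eta=2.0)
print(labels)        # -> [0 0 1 1 0]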

An iterative method (Isodata) [64]. Assume the number of clusters, K, is known. The partitioning of the data is done such that the average spread or variance of the partition is minimized. Let µ_k(n) denote the kth cluster center at the nth iteration and R_k denote the region of the kth cluster at a given iteration. Initially, we assign arbitrary values to µ_k(0). At the nth iteration take one of the data points x_i and assign it to the cluster whose center is closest to it, that is,

x_i ∈ R_k  ⇔  d(x_i, µ_k(n)) = min_{j = 1, . . . , K} d(x_i, µ_j(n))        (9.145)

where d(x, y) is the distance measure used. Recompute the cluster centers by finding, within each cluster, the point that minimizes the sum of distances to its elements; that is, µ_k(n + 1) is chosen such that

Σ_{x_i ∈ R_k} d(x_i, µ_k(n + 1)) = min_y Σ_{x_i ∈ R_k} d(x_i, y),   k = 1, . . . , K        (9.146)

The procedure is repeated for each x_i, one at a time, until the clusters and their centers remain unchanged. If d(x, y) is the Euclidean distance, then a cluster center is simply the mean location of its elements. If K is not known, we start with a large value of K and then merge to K - 1, K - 2, . . . clusters by using a suitable cluster-distance measure.

Figure 9.62 Image understanding systems (feature extraction, classification into classes S_k, k = 1, . . . , K, symbolic representation, and interpretation using visual models and table look-up to produce a description).
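A sketch of the iterative procedure with Euclidean distance, in which case each recomputed cluster center is simply the mean of its members; a batch update is used for brevity rather than the strictly one-point-at-a-time update of Eqs. (9.145)-(9.146), and K and the sample data are assumptions.

import numpy as np

def isodata(samples, K, iters=100, seed=0):
    """Iteratively assign samples to the nearest center and recompute each
    center as the mean of its members, until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), size=K, replace=False)].copy()
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center (Eq. 9.145 with Euclidean distance).
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                            # assignments unchanged: converged
        labels = new_labels
        # Update step: each center is the mean of its cluster (Eq. 9.146).
        for k in range(K):
            if np.any(labels == k):
                centers[k] = samples[labels == k].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal([0, 0], 0.5, (40, 2)),
                 rng.normal([4, 4], 0.5, (40, 2))])
labels, centers = isodata(pts, K=2)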
Other Methods
Clusters can also be viewed as being located at the modes of the joint Nth-order histogram of the feature vector. Other clustering methods are based on statistical nonsupervised learning techniques, ranking and intrinsic dimensionality determination, graph theory, and so on [65, 66]. Discussion of those techniques is beyond the goals of this text.
Finally, it should be noted that the success of clustering techniques is closely tied to feature selection. Clusters not detected in a given feature space may be easier to detect in rotated, scaled, or otherwise transformed coordinates. For images, the feature vector elements could represent gray level, gradient magnitude, gradient phase, color, and/or other attributes. It may also be useful to decorrelate the elements of the feature vector.
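As an illustration of the last remark (an assumed example, not a method prescribed by the text), feature vectors can be decorrelated by projecting them onto the eigenvectors of their sample covariance matrix, that is, a principal-component (Karhunen-Loeve) rotation.

import numpy as np

def decorrelate(features):
    """Rotate feature vectors (N_samples x d) onto the eigenvectors of their
    sample covariance, so the transformed components are uncorrelated."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh: cov is symmetric
    return centered @ eigvecs                        # decorrelated coordinates

rng = np.random.default_rng(4)
raw = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])   # correlated
decor = decorrelate(raw)
print(np.round(np.cov(decor, rowvar=False), 3))      # off-diagonal terms ~ 0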
