
COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING 30, 125-147 (1985)

Threshold Selection Based on a Simple Image Statistic


J. KITTLER AND J. ILLINGWORTH
SERC Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, OX11 0QX, United Kingdom

AND
J. FÖGLEIN
Institute for Coordination of Computer Techniques, Budapest, Hungary
Received June 19, 1984; accepted December 4, 1984

The problem of automatic threshold selection is considered. After a brief review of available
techniques, a novel method is proposed. It is based on image statistics which can be computed
without histogramming the grey level values of the image. A detailed analysis of the properties
of the algorithm is then carried out. The effectiveness of the method is shown on a number of
practical examples. © 1985 Academic Press, Inc.

1. INTRODUCTION
A primary problem of image processing is to devise algorithms which will
successfully divide complex images into areas which meaningfully correspond to
objects in the real world. This image segmentation problem can be extremely difficult
for general images which contain a large range of luminance or grey level values.
However, for many important applications in medicine or industrial inspection, the
main features of an image can be represented by as few as two grey levels. A typical
example is the inspection of an object placed on a dark background with which it
contrasts strongly. In such a situation the histogram of luminance values will possess
a strong bimodality with one peak corresponding to pixels from the object regions
and the other corresponding to pixels of the image background. This observation
permits classification or segmentation of the image by considering the relation of the
luminance values I(x, y) with a luminance value T which is between the luminance
values of the object and background. The simple decision criterion for the class of
each pixel is:

if I(x, y) ≥ T then pixel is object or class 1


else pixel is background or class 2.

T is called a threshold and this paper is concerned with automatically choosing this
number for images which satisfy a two-class assumption.
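As a concrete illustration of this decision rule, a minimal sketch (ours, not from the paper; the example array and the value T = 30 are made up) is:

    import numpy as np

    def apply_threshold(image, T):
        """Class 1 (object) where I(x, y) >= T, class 2 (background) otherwise."""
        return np.asarray(image) >= T

    # A 4 x 4 image with a bright object against a dark background.
    img = np.array([[10, 12, 50, 52],
                    [11, 13, 51, 53],
                    [10, 12, 49, 51],
                    [11, 11, 50, 52]])
    binary = apply_threshold(img, T=30)   # T lies between the two grey levels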
The importance of the thresholding segmentation method is based on its simplic-
ity and its wide applicability. It is useful because it is a data reduction step and
because it produces a binary representation of an image. Binary images are readily
manipulated to produce higher level descriptions of the scenes and objects, i.e.,
borders, relational graphs, etc.
The problems which occur in blindly applying thresholding are due to the nature
of real images and the fact that the assumptions which underlie the method, i.e., the
image is representable using only two grey levels, are not satisfied. A common

problem is the effect of gradual shading across the surface of a large image. The
image is still segmentable provided nonrandom contrast which delineates objects is
locally preserved. However the gradual shading may mean that spatially separated
points of the same object may have luminances sufficiently different that they are
assigned to different classes. The shading leads to a broadening or filling in of the
intermodal valleys of the grey level histogram. It can even result in unimodal
distributions. Another problem may be that the object area is small compared with
the background area and therefore the size of the mode that it contributes to the
histogram is no more significant than that of random picture noise. Both of the
above problems can be addressed by summing the histogram over an area which is
more appropriate, i.e., in these cases a smaller area. However, the optimal size and
shape of such an area is difficult to determine and histogramming over too small an
area may produce results which lose any statistical significance or which violate our
basic assumption of histogram analysis, i.e., that two distinct significant modes exist.
Equally invalid is any attempt to apply the threshold method to an image with
more than two histogram modes. A common case occurs for industrial inspection
images in which there is a background, an object, and a strong shadow from the
object. Two intermodal valleys would then exist and it would be necessary to search
for two threshold values. In this paper we will not deal with this question of multiple
threshold selection.
The aim of this paper is to present a new method of automatically selecting a
threshold value. This is accomplished using simple statistics of the image and
without reference to histogram analysis. In order to place the new method, which we
call RATS (robust automatic threshold selector), in context we have included a brief
survey of many of the other methods which have been proposed for automatic
threshold selection, stressing some of the advantages or drawbacks of each method.
Such a review is timely as thresholding is an important developing technique and
few recent references exist to provide any comparative study of useful techniques.
The paper is organized so that the review occupies the following section. Section 3
considers a specific model of images from which several interesting properties and
statistics can be derived. Their usefulness has been indicated in previous publications
[1, 2]. Section 4 constructs and justifies the use of a combination of these measures to
select a meaningful threshold. The effect of noise on the robustness of the RATS
algorithm is then analysed in Section 5. Section 6 gives examples of the practical
application of the method to a couple of images. Section 7 discusses and illustrates
how the method can be applied more locally to overcome the effects of nonuniform
illumination. The final section includes discussion and conclusions.
2. REVIEW OF THRESHOLDING TECHNIQUES
A wide selection of thresholding techniques use only the information contained
within the luminance histogram of the image. The most general method involves
locating all the modes of the histogram. Several peak-finding algorithms exist [3, 41.
We have used a scheme which produces a linear piecewise approximation of the
histogram [5,6]. This is then coded to indicate sections of positive, negative, or zero
slope and a simple syntactic analysis can be performed to locate all peaks and
valleys. This method includes several tunable parameters such as the precision of the
linear approximation and the value of gradient at which a line is regarded as having
significant nonzero slope. In practice these were easily selected. The method was
found to work well as it imposes few assumptions on the structure of the histogram
and can be used to select several thresholds if the structure of the histogram
indicates that more than two prominent peaks are present.
A popular thresholding method assumes that the grey level histogram contains
two and only two prominent modes and they are both Normally (Gaussianly)
distributed. The method fits the observed histogram to a sum of Gaussians with the
distribution means and widths as parameters [7]. The problem of such an analysis is
the computational complexity and its sensitivity to the correctness of the underlying
assumptions. A goodness of fit criterion can be used as a test of the suitability of the
method for any particular image. If a bad fit is discovered then the analysis can be
repeated postulating a sum of three Gaussian distributions.
Ridler [8] has proposed an iterative method of thresholding. In his method he
utilizes a switching function image which is the binary version of the picture
obtained by using the threshold value of the last iteration. The initial switch function
is arbitrarily chosen as a binary image with the corner points assigned as background
and the rest of the picture as object. At each iteration the mean luminance values of
the pixels in the object and background classes of the associated switching function
image are calculated. The average of these grey level means is used as a new
threshold value to produce a new switching function. This process is iterated until a
stable solution is found. This is a very inelegant, multiple pass formulation of the
method. Essentially only grey level histogram information is used and this can be
accumulated by a single pass through the image data. The method consists of
arbitrarily dividing the histogram into two parts and calculating the mean grey level
of each part. The next approximation to the best threshold is the average of these
two mean values. This new approximation is used to divide the histogram and the
process is iterated until a stable solution is obtained. This formulation of the method
is simple and the process is much faster [30].
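A minimal sketch of this single-pass histogram formulation (our own illustration; function and variable names are not from the paper) is:

    import numpy as np

    def iterative_threshold(image, max_iter=100):
        """Ridler-style iterative selection: split the histogram at T, compute the
        mean grey level of each part, and take their average as the next T."""
        hist = np.bincount(np.asarray(image, dtype=int).ravel(), minlength=256)
        levels = np.arange(hist.size)
        T = (levels * hist).sum() / hist.sum()        # start from the overall mean
        for _ in range(max_iter):
            low, high = levels < T, levels >= T
            m1 = (levels[low] * hist[low]).sum() / max(hist[low].sum(), 1)
            m2 = (levels[high] * hist[high]).sum() / max(hist[high].sum(), 1)
            new_T = 0.5 * (m1 + m2)
            if abs(new_T - T) < 0.5:                  # stable solution reached
                return new_T
            T = new_T
        return T

Only the histogram is needed, so the image data is read once regardless of how many iterations are required.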
The effectiveness of grey level histogram analysis can be increased by considering
the histogram of suitable subpopulations of pixels of the image. In an ideal case the
pixels which populate the intermodal valley are those which lie on the edges between
the object and background regions. Thus if these pixels can be identified and
removed from the histogram the intermodal valley will deepen and be more easily
identified. Conversely the mean of the subpopulation of edge pixels should be a good
point at which to choose a threshold. Many people have suggested weighting the
grey level histogram with local derivative information, gradient and/or Laplacian.
However problems occur because random noise pixels also have large derivative
values. Rosenfeld and Weszka [9] have made a detailed study of such methods and
conclude that the study of grey level versus gradient plots can be useful aids in
threshold selection but they are not a general solution to the threshold selection
problem.
Wu, Hong, and Rosenfeld [10] have experimented with isolating edge region pixels
by looking at small size blocks of a quadtree representation of an image. This
involves iterative subdivision of the image and parts of the image into quadrants. A
quadrant is not subdivided once the variance of the grey level values in the quadrant
is less than a specified tolerance. Edge points will give large contributions to the grey
level variance and therefore squares containing them will be subdivided until they
reach a small size. The subset of these small quadtree blocks provides an enriched
sample of edge points upon which threshold analysis can be more easily performed.
Many threshold selection methods consist of thresholding an image at many
possible threshold values and then selecting that thresholded image which has some
desirable quality. Using this strategy Milgram [11] has suggested that the best
threshold image should be taken as that which has the most coincidences between
object outlines defined by the border pixels of the thresholded image and the pixels
of a thinned edge map. He calls this cooperative strategy the method of “convergent
evidence.” More complex criteria can be formulated to quantify the agreement
between the edge and threshold based segmentation. This method inherently de-
mands many data passes and/or much image storage.
Rosenfeld and De La Torre [12] have recently suggested using a concavity analysis
to help to select a good threshold. They construct a convex hull around the upper
half of the grey level histogram of the image and look for prominent valleys by
investigating points with large differences between the values of the convex hull and
the histogram. They claim that the number of candidate thresholds selected by this
method is low and therefore it is feasible to threshold the histogram at several
thresholds and use some criterion such as that of Milgram to determine the best
result.
The method of Pal, King, and Hashim uses concepts borrowed from fuzzy set
theory [13]. For every possible threshold level they compute two new maps of the
image. The first is the threshold map which would be obtained by thresholding at a
trial level T0. The second map is derived by mapping the full range of grey level
values into a value between 0 and 1 using the standard fuzzy set S membership
function with its point of inflection at the grey level T0. By taking a pixel by pixel
sum of the differences between these two maps a measure called the index of
fuzziness is derived for each trial threshold T0. The width of the S membership
function is an important parameter of the method as it determines the size of a grey
level slice which contributes to the difference sum. If the trial threshold is close to
the average grey level of an edge with the contrast of the edge closely matched to the
width of the S function then the index of fuzziness will be small. Therefore the
histogram of the index of fuzziness values as a function of trial threshold contains
valleys at grey levels which are most suitable for thresholding. The disadvantage of
this method is the large computational cost to derive a histogram which must then be
analyzed in the same way as a grey level histogram.
Several threshold selection methods are based on computing a suitable measure
for every possible grey level or candidate threshold value. The best threshold is
selected as that for which the measure has its maximum or minimum value. Otsu has
suggested calculating the between class separation (BCS) of the grey level distribu-
tion [14]. For the case of analyzing a bimodal distribution this quantity is defined as
BCS(T0) = N1 · N2 · (M1 − M2)², where N1 is the number of pixels having grey
level less than the trial threshold T0 and N2 is the number with grey level at or
above this threshold. M1 and M2 are the means of the two parts into which the
distribution is split by the threshold T0. The best threshold is that for which BCS is
a maximum. Otsu was motivated by considering the theory of discriminant analysis
but he also showed that this method is equivalent to minimizing the mean square
error between the grey level picture and its binary representation obtained by
thresholding at grey level T. Further analysis [15] led Otsu to derive a normalized
measure of the goodness of fit of his method. Tests show that the algorithm is
computationally simple, stable, and effective. It is also readily extendable to
discriminate between modes of a trimodal distribution. However as with fitting
Gaussians to grey level distributions it is necessary to decide on the modality of the
distribution.
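The BCS criterion can be evaluated for every candidate threshold directly from the histogram; a short sketch (our own, illustrative only) is:

    import numpy as np

    def otsu_bcs_threshold(image):
        """Return the T0 maximizing BCS(T0) = N1 * N2 * (M1 - M2)^2."""
        hist = np.bincount(np.asarray(image, dtype=int).ravel(), minlength=256).astype(float)
        levels = np.arange(hist.size)
        best_T, best_bcs = 0, -1.0
        for T0 in range(1, hist.size):
            N1, N2 = hist[:T0].sum(), hist[T0:].sum()
            if N1 == 0 or N2 == 0:
                continue
            M1 = (levels[:T0] * hist[:T0]).sum() / N1
            M2 = (levels[T0:] * hist[T0:]).sum() / N2
            bcs = N1 * N2 * (M1 - M2) ** 2
            if bcs > best_bcs:
                best_T, best_bcs = T0, bcs
        return best_T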
A function based on the Shannon entropy, an information theory quantity, was
suggested by Pun [16]. Pun attempted to formulate a method to maximize the
entropy of the thresholded image while considering its relationship to the entropy
(information or statistical properties) of the grey level distribution of the original
image. Unfortunately the entropy of a distribution is maximal for a flat distribution
and the method strongly favors a trivial solution in which there are equal numbers of
pixels in each class of the thresholded image histogram. A later paper [17] suggests
using the histograms of cumulative entropy and cumulative probability of grey level
occurrence to characterize the asymmetry of the grey level distribution and then
select a threshold. In our tests we found this method rather poor.
Johannsen et al. [18] use an entropy measure to split the grey level distribution
into two parts. The properties of the measure are such that it selects the correct
threshold, that is the grey level value which corresponds to the minimum in the grey
level histogram. If several candidates for threshold exist, the measure favors the grey
level value closest to the equal population point. There are conceptual similarities
between this and Otsu’s method but the former is more locally sensitive.
Deravi and Pal have published a method of threshold selection based on a grey
level transition matrix [19]. This is an n × n matrix, where n is the number of possible
grey levels. The matrix is constructed by horizontally and then vertically scanning
the image and incrementing the (i, j) entry of the matrix by one if a transition from
an i grey level to a j grey level is encountered. A trial threshold at T0 will partition
this matrix into four parts. The sum of two of these parts suitably normalised yields
a measure of the number of times a pixel is followed by a pixel of the opposite class.
For good thresholding this measure should be a minimum. Deravi and Pal claim that
this method is useful in segmenting even unimodal distributions. An obvious
disadvantage is the large storage space which is required for its implementation.
Barrett [20] has suggested an iterative algorithm which can be used to select one or
more stable thresholds. A binary image is produced by thresholding at a trial gray
level. For each border pixel in the binary image an average gray level value is
calculated for part of the edge transition region in a 1 x m window, centered at the
border pixel position. The longer axis of the window is orthogonal to the border
contour, i.e., the long axis is parallel to the maximum gray level gradient direction of
the edge. For real blurred edges with S shaped gray level profiles the gray level at
the position on the edge profile where maximum gradient occurs is a good threshold
value. The average gray level computed in the window is either equal to this value or
closer to it than the original gray level at which the image was thresholded.
Therefore iteration of the process using the mean of the average gray level of all
border pixels to produce the next iteration binary image will eventually produce a
stable solution which corresponds to a good threshold selected where many pixels
have large gradient values, i.e., in central positions along gray level transition
profiles. If more than one class of object is in the field of view they can be recovered
by reinitiating at a new threshold value and reiterating to a new distinct stable
solution.
Kohler [33] has suggested a threshold selection method which uses the intensity
contrast between adjacent pixels. His method calculates for each trial threshold a
mean value of the contrast of all step edges detected by the threshold. A step edge is
identified whenever the trial threshold value lies between the intensity values of the
adjacent pixels under consideration. The contrast contribution is the minimum of
the absolute differences between the trial threshold and the respective pixel intensi-
ties. This method of calculating contrast ensures that a given step edge gives
maximal contribution when the trial threshold is midway between the intensities of
object and background of the step edge. The threshold selected is that trial value
which gives the highest average contrast value. The method favors the selection of a
threshold which has large numbers of high contrast edges and low numbers of low
contrast edges. As with the method of Barrett, it is also suitable for selecting
multiple thresholds.
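A sketch of Kohler's criterion restricted, for brevity, to horizontally adjacent pixel pairs (our own illustration; the full method considers all adjacent pairs) is:

    import numpy as np

    def kohler_threshold(image, levels=256):
        """Select the trial threshold with the highest average contrast of the
        step edges it detects; a pair (a, b) is a detected edge when the trial
        value lies between a and b, and its contrast is min(|T - a|, |T - b|)."""
        img = np.asarray(image, dtype=float)
        a, b = img[:, :-1].ravel(), img[:, 1:].ravel()   # horizontal neighbour pairs
        lo, hi = np.minimum(a, b), np.maximum(a, b)
        best_T, best_avg = 0, -1.0
        for T in range(levels):
            detected = (lo < T) & (T <= hi)
            if detected.any():
                contrast = np.minimum(T - lo[detected], hi[detected] - T)
                if contrast.mean() > best_avg:
                    best_T, best_avg = T, contrast.mean()
        return best_T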
Several authors have investigated the use of relaxation algorithms for classification
and threshold selection. This relies on the fact that the pixels within a uniform area
should have similar grey level values and therefore comparison of the assignment of
a pixel with its immediate neighborhood allows a quantitative assessment of the
correctness of that assignment. Rosenfeld and Smith [21] claim good results with this
method but Ranade and Prewitt [22] find that the method is sensitive to the initial
assignment and to subsequent pixel updating rules. Ranade found relaxation a useful
method to improve thresholding results after initializing the method with a standard
histogram threshold selection algorithm. Bhanu and Faugeras [23] have performed
studies of a gradient relaxation scheme and found that this yields good results and
the convergence properties of the method are controllable using only a few parame-
ters. They claim that their method is useful for unimodal distributions [24].
Other papers which suggest thresholding methods and/or include comparative
studies of thresholding algorithms can be found in [25-291.
3. PRELIMINARIES
Most of the approaches reviewed in Section 2 involve the analysis of the histogram
of grey level values which is associated with the difficulties and problems listed
earlier. The exceptions include the convergent evidence method [11], where the
threshold is considered as a parameter of some criterion function quantifying the fit
between edge and threshold based segmentation. The determination of the optimal
threshold involves several iterations through the image data. Likewise, the edge
profile searching method [20] and the relaxation methods are iterative. Although the
unreliable histogram analysis is obviated in these methods, they are neither simple to
implement, nor free from artifacts, two essential prerequisites for completely auto-
matic operation.
Ideally we should like to be able to determine the correct threshold on the basis of
simple statistics defined directly in terms of pixel grey level values and possibly their
functions, without the need to rely on histogram analyses or some criterion optimiza-
tion involving multiple data passes. We shall introduce such a novel method in
Section 4 but before doing so it will be useful to provide some background and
preliminary material.
The search for a simple statistic which could provide a basis for a thresholding
method was stimulated by our recent work on edge detection which led to the
development of the absorption edge detector [l, 21. This detector has the desirable
property of yielding the edge magnitude proportional to the contrast between the
background and the object independent of the actual edge position and orientation.

FIG. 1. Image of a scene segment.

It is based on the observation that the sum of the edge magnitudes output by
conventional edge operators in the vicinity of an edge and along a line intersecting
the edge is constant. This property has been shown to hold for a family of operators
with a 3 x 3 kernel. For the purpose of our discussion here it will be more
appropriate to consider a 1 x 3 operator and show that the absorption principle
remains valid.
Let us consider a scene segment containing a boundary between the dark
background and light object illustrated in Fig. 1. Suppose the contrast between the
object and the background is E, i.e.,

E = B - D,

where B and D are the luminance of the object and the background, respectively.
For the moment let us assume that we can obtain a noisefree image of the scene and
also that the true edge angle lies in the interval [−45°, 45°].
We shall now apply the edge gradient operator illustrated in Fig. 2 along one scan
line and sum its output over a set of consecutive pixels. Note that well inside the
background, or the object, the outputs of the operator will be zero and hence their
sum will also be zero. We shall therefore turn our attention to the boundary region.
In the previously reported studies of edge detectors a particular model for the
imaging device has been adopted [1, 2]. According to this model, the grey level value
at a pixel is given by the integral of the scene luminance function over the pixel area.
Here we shall adopt a more general model capable of characterizing factors affecting
the imaging process such as frequency characteristic limitations, cell overlap, cell
crosstalk, etc. Thus it will be assumed that an ideal step in the scene luminance
function corresponding to the object boundary will give rise to a grey level value
transition function in the boundary region. The only restriction in the model is that
at any pixel in the vicinity of a true edge at an angle in [−45°, 45°] the magnitude of the
derivative of the transition function with respect to x will exceed the one in the y
direction.

FIG. 2. x-derivative mask.


132 .KITTLER, ILLINGWORTH, AND FijGLEIN

FIG. 3. Grey level values along a scan line (background pixels at D, transition values a_1, ..., a_k, object pixels at B).

Let us consider grey level values in the boundary region along one scan line and
let us denote them as shown in Fig. 3. Applying the operator of Fig. 2 we get the
gradient magnitudes shown in Fig. 4. Summing over the k + 1 pixels of the
luminance function transition region along the scan line we find
\sum_{j=0}^{k+1} e_j = \sum_{j=1}^{k} a_j + 2(B - D) - \sum_{j=1}^{k} a_j = 2E.    (1)

Thus the summation of the derivatives in the x direction equals twice the contrast.
As the derivatives of the luminance function at pixels outside the boundary region
are zero, the sum of the operator outputs along the complete scan line is still equal to
2E.
In order to extend this result to true edge angles from the interval [135°, 225°], we
simply need to replace e_k by its absolute value. It is easy to verify that this does not
affect the result in (1).
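A quick numerical check of this result (our own illustration with made-up numbers): take any monotone grey level transition from D to B along a scan line, apply the mask of Fig. 2 and sum the absolute outputs.

    import numpy as np

    D, B = 10.0, 50.0                                         # background and object levels, E = B - D = 40
    scan = np.array([D, D, D, 14, 22, 33, 44, 49, B, B, B])   # one blurred vertical edge

    e = scan[2:] - scan[:-2]                                  # mask of Fig. 2: e(i) = s(i+1) - s(i-1)
    print(np.abs(e).sum())                                    # 80.0, i.e. 2E, independent of the profile

Changing the intermediate values 14, 22, 33, 44, 49 to any other monotone profile leaves the sum unchanged.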
To summarize this result more formally let us introduce the concept of a vertical
edge.
DEFINITION 1. Let the luminance function over the imaged scene be a step
function and denote the contour defining the boundary between high and low
luminance values by Γ(x, y). Let (x_0, y_0) be a boundary point. We say that the
luminance function has a vertical edge at point (x_0, y_0) if the angle between the x
axis and the normal to the boundary at this point lies in either of the following
intervals: [−45°, 45°], [135°, 225°].
Let us denote by N the number of pixels in one scan line. We can now state the
following theorem.
THEOREM 1. Let e_j, j = 1, ..., N be the outputs of the differentiation operator in Fig. 2
along a horizontal scan line intersecting one vertical edge. Then

\sum_{j=1}^{N} |e_j| = 2E.

We now extend this result to the whole image. Suppose that in total n horizontal
scan lines intersect one vertical edge. Denoting by eij the output of the differentia-
tion operator centered at pixel (i, j), each intersected line contributes 2E to the sum
of all eij. Assuming that the total number of horizontal scan lines is equivalent to

FIG. 4. Output of the x-derivative mask for the data of Fig. 3 (zero in the background and object regions, values e_0, ..., e_{k+1} across the transition).



the number of pixels in each line we can write:

\sum_{i=1}^{N} \sum_{j=1}^{N} |e_{ij}| = 2En.    (2)

Obviously in usual situations edges arise at the boundaries between objects and
the background. If a complete object is in the field of view, then one scan line will
intersect at least two vertical edges. A complicated shape of an object or several
objects may give rise to several vertical edges being intersected by one scan line.
Provided these edges are separated enough to allow the grey level function to reach
either the object or background intensity as appropriate, then the ith scan line will
contribute to the summation of the derivative magnitudes by the amount of
2 · E · n_i, where n_i is the number of vertical edges intersected by the ith line.
The summation over the complete image will yield

\sum_{i=1}^{N} \sum_{j=1}^{N} |e_{ij}| = 2E \sum_{i=1}^{N} n_i.    (3)

Thus the output is directly proportional to the number of vertical edge pixels in
the image.
By analogy we can define the horizontal edge as follows.
DEFINITION 2. We say that the luminance function has a horizontal edge at
point (x_0, y_0) if the angle between the x axis and the normal to the boundary
Γ(x, y) at this point lies in either of the following intervals: [45°, 135°] and
[225°, 315°].
If instead of the operator in Fig. 2 we now use the column operator of Fig. 5,
because of the rotational symmetry we can derive identical results to (3) for the
horizontal edges.
In order to combine these results for an image containing both vertical and
horizontal edges we recall that the operators in Figs. 2 and 5 approximate the x and
y derivatives of the image intensity function. It is easy to show that for a vertical
edge the output of the x mask (Fig. 2) will exceed the output of the y mask in Fig. 5
and vice versa. Thus by selecting the greater of the two outputs we obtain the
appropriate maximum derivative map of the image.
We now consider the effect of summing up the pixel values of the derived map.
Irrespective of the edge directions it makes no difference to the output of the
summation operator whether the summation is carried out row-wise or column-wise.
Provided we are not too close to the pixels where vertical and horizontal edges
intersect, i.e., near corners, the above results can be applied directly. In the region

FIG. 5. y-derivative mask.



where the vertical and horizontal edges intersect, the situation is not as clear and
further detailed analysis is warranted. However, the number of such points is likely
to be small in comparison with noncorner points and their effect on the summation
operator output will therefore be negligible. We thus have the following theorem.
THEOREM 2. Let eij be the maximum in the absolute sense of the outputs of the x
and y derivative masks centered at the (i, j)th pixel. Ignoring any edge corner effects,
the sum of the absolute values of eij over an image is equal to two times the contrast E
times the number n of edge pixels in the image, i.e.,

\sum_{i=1}^{N} \sum_{j=1}^{N} |e_{ij}| = 2En.    (4)

This result is quite interesting. It states that irrespective of the edge profile which
is a function of the sensing device limitations, the output of the derivative map
summation operator depends only on the contrast and the number of edge pixels.
Note that n is the sum of the perimeters (expressed in pixel size units) of the objects
in the image. It must be emphasized that the result is valid only under the
assumptions of uniform lighting and zero noise. However these aspects will be dealt
with later.
We shall not elaborate on the possible uses of this result which could, for instance,
include object perimeter mensuration. Instead we shall consider whether similar
simple and meaningful relationships exist between other statistics that can be easily
derived from any image. We shall see that one such relationship provides a basis for
an effective method of threshold determination which does not require the compu-
tation and, in particular, analysis of the grey level histogram.
4. A NEW THRESHOLDING METHOD
Having obtained such a surprising result when summing up max derivative values,
let us consider some other obvious candidate variables for such simple statistics.
Suppose we sum up all the grey level values g_ij to see whether the result has
interesting properties. In an ideal two-level image object and background pixels have
grey level values B and D, respectively. If the number of object pixels is q, then the
sum will be

\sum_{i=1}^{N} \sum_{j=1}^{N} g_{ij} = qB + (N^2 - q)D = qE + N^2 D.

In a more realistic image where the transition from the background to object
intensities is gradual this result will still hold provided the transition function is
reasonably symmetric.
While there may be some uses for these grey level statistics if some of the
parameters are known, i.e., either object size, contrast, or background level, espe-
cially in conjunction with the relationship in (4), the result does not seem to have a
clear designation of applications. We shall, therefore, consider the next obvious
candidate, namely the sum of grey values each multiplied by the maximum deriva-
tive.
THRESHOLD SELECTION BASED ON A SIMPLE IMAGE STATISTIC 135

FIG. 6. The product of the grey level and the x-derivative magnitude for the scan line of Fig. 3.

As before, we shall first consider one scan line intersected by a single edge at an
angle from the interval [−45°, 45°] as in Fig. 1. Taking the product of the grey level
values in Fig. 3 with the corresponding magnitudes of the x derivatives in Fig. 4 we
get an output scan line illustrated in Fig. 6. Summing up over j = 0, ..., k + 1 with
a_0 = D and a_{k+1} = B we find

\sum_{j=0}^{k+1} h_j = \sum_{j=1}^{k} a_j(a_{j+1} - a_{j-1}) + a_0(a_1 - a_0) + a_{k+1}(a_{k+1} - a_k).

After eliminating the terms that cancel out we get

\sum_{j=0}^{k+1} h_j = a_{k+1}^2 - a_0^2,

which in terms of intensity levels B and D becomes

\sum_{j=0}^{k+1} h_j = (B + D)E.

It is easy to see that this result holds for any vertical edge provided hj is replaced
by its absolute value. Also if a scan line is intersected by one vertical edge only, then
the result holds for the summation over all pixels in the scan line, as the pixels well
within the object and the background do not contribute to the value of the statistics
(zero derivative). The following theorem summarizes this basic result.
THEOREM 3. Let h_j, j = 1, ..., N be the product of the grey level value g_j and the
output e_j of the operator in Fig. 2 along a horizontal scan line intersecting one vertical
edge. Then

\sum_{j=1}^{N} |h_j| = (B + D)E.

By an argument identical to that presented in Section 3 we can extend this basic


result to the statistics defined over the whole image with multiple edges at arbitrary
angles (and under the same assumptions).
THEOREM 4. Let eij be the maximum of the outputs of the x and y gradient masks
centered at the (i, j)th pixel. Ignoring any edge corner effects, the sum of the absolute
values of the products hij of eij and the grey level values gij over the entire image is
equal to the contrast E times the sum of the background and object intensities times the
number n of edge pixels in the image, i.e.,

\sum_{i=1}^{N} \sum_{j=1}^{N} |h_{ij}| = (B + D)E · n.    (6)

This result again is quite interesting. The "grey-grad" statistic is a meaningful
quantity which is directly proportional to the total perimeter of objects in the image.
While its usefulness as a stand-alone statistic may be limited, the comparison of
Theorems 2 and 4 immediately suggests a compound statistic given by the ratio of the
sum of grey-grad values to the sum of grad values, i.e.,

T = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} |h_{ij}|}{\sum_{i=1}^{N} \sum_{j=1}^{N} |e_{ij}|} = \frac{B + D}{2}.    (7)

Now note that the right-hand side of (7) is the midpoint between the object and
background intensities. Under the assumption that B and D do not vary over the
image, this quantity is the appropriate threshold value for segmenting objects from
the background. Thus we have derived a completely new basis for thresholding
which does not require the computation of the grey level histogram and, more
importantly, its subsequent analysis.
A few comments are in order. First of all note that this novel method of
determining the threshold can be applied to images of any size. Second, the size of
the objects in the image in relation to the background size is immaterial. However, if
the image contains no objects then the statistics T is undefined. It is important,
therefore, to check the denominator in (7) to avoid numerical and semantic prob-
lems.
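A minimal global implementation of (7) is sketched below (our own illustration; the paper's implementation was in PASCAL). It forms the maximum of the |x| and |y| mask outputs at each pixel and guards against the empty-image case discussed above.

    import numpy as np

    def rats_threshold(image, min_grad_sum=1e-6):
        """Global RATS threshold: T = sum|h| / sum|e|, with e the larger in
        magnitude of the x and y central-difference outputs and h = g * e."""
        g = np.asarray(image, dtype=float)
        ex = np.zeros_like(g)
        ey = np.zeros_like(g)
        ex[:, 1:-1] = g[:, 2:] - g[:, :-2]     # mask of Fig. 2 (x derivative)
        ey[1:-1, :] = g[2:, :] - g[:-2, :]     # mask of Fig. 5 (y derivative)
        e = np.maximum(np.abs(ex), np.abs(ey))
        denom = e.sum()
        if denom < min_grad_sum:               # statistic undefined: no edges present
            return None
        return (g * e).sum() / denom

On a noisefree two-level image with constant B and D the returned value is close to (B + D)/2.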
The assumption that the image be uniform, that is, having constant object and
background intensities over the whole image is highly unrealistic. Indeed if that were
the case the image thresholding problem would be trivial. Nevertheless it can be
argued that the assumption often holds for a small enough subimage. If we divide
the image accordingly then an appropriate threshold value can be determined for
each resulting image window separately. This is the basic philosophy behind the
variable thresholding approach to segmenting images of scenes subject to nonuni-
form lighting. However, unlike histogram-based thresholding methods, where the
image partitioning accentuates the problems of histogram analysis, the new method
of threshold selection is remarkably robust. Of course, the subdivision of the image
increases the probability of each window being homogeneous, that is, containing
either background or object only. A suitable strategy, therefore, must be adopted to
cope with such situations.
Another factor causing the image to be nonuniform is noise. In the above
derivation of the properties of statistics T, noise has not been taken into account. It
will obviously result in nonzero value of the sum of derivatives even if a window
contains only background or object pixels. The effect of noise on the threshold value
will be investigated in the next section.

5. THE NOISE EFFECT


In practice the luminance function of any scene imaged by a sensor will be
corrupted by noise. Assuming that the noise is stationary, additive, and the noise
variables are independent of each other and the true image values, the signal sij at
the output of the sensing device will be
s_{ij} = g_{ij} + \eta_{ij},

where g_ij is the grey level in the noise free case and η_ij is the noise signal, both at
pixel (i, j).
For the sake of simplicity, instead of analyzing pixel values in the 2-dimensional
image field, we shall consider a 1-dimensional analog, that is, a time series (or
one scan line of the image). The generalization to the 2-dimensional case is fairly
straightforward.
According to the simplifying assumption our model in the interval of N sample
points is now

s(i) = g(i) + \eta(i),   i = 1, 2, ..., N.    (8)

In this case, the statistics T defining the threshold is given by

T = \frac{\sum_{i=1}^{N} |s(i)\,e(i)|}{\sum_{i=1}^{N} |e(i)|},    (9)

where e(i) is the digital approximation of the derivative of s obtained using the
mask of Fig. 2, i.e.,

e(i) = s(i + 1) - s(i - 1). (10)


Now suppose that additive noise η(i) is normally distributed with zero mean and
variance σ², i.e., the density function p(η) satisfies

p(\eta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{\eta^2}{2\sigma^2}\right).    (11)
Let us denote the difference noise signal by ξ(i), i.e.,

\xi(i) = \eta(i + 1) - \eta(i - 1).    (12)

It is well known that the random variable obtained by a linear transformation of a


normally distributed vector variable is also normally distributed. More specifically, if
y is a joint normal variable distributed according to N(ν, Σ) and ξ = Ay is a linear
mapping of y then ξ ~ N(Aν, AΣA^T). Letting

y = [\eta(i + 1), \eta(i - 1)]^T



and

A = [1, -1],

we find that ξ(i) is distributed according to N(0, 2σ²). Thus the density of ξ(i) is

p(\xi) = (4\pi\sigma^2)^{-1/2} \exp\left(-\frac{\xi^2}{4\sigma^2}\right).    (13)

From this expression we can readily find the mean of |ξ| to be

E\{|\xi|\} = \frac{2\sigma}{\sqrt{\pi}}.    (14)

Let us now return to the numerator of T in (9). We shall examine the expression

\epsilon = \frac{1}{N} \sum_{i=1}^{N} \eta(i)\,|e(i)|    (15)

and show that for sufficiently large N it will approach zero. We first consider the
expected value of (15). Changing the order of the mathematical expectation and
summation operators we get

E\{\epsilon\} = \frac{1}{N} \sum_{i=1}^{N} E\{\eta(i)\,|e(i)|\}.    (16)

Since ξ(i) is independent of η(i), so is e(i), and we can write

E\{\epsilon\} = \frac{1}{N} \sum_{i=1}^{N} E\{\eta(i)\}\,E\{|e(i)|\} = 0.    (17)

Now, the variance of (15) is given by

\mathrm{var}\{\epsilon\} = \frac{1}{N^2}\left[\sum_{i=1}^{N} E\{\eta^2(i)\,e^2(i)\} + 2\sum_{i=1}^{N-1} E\{\eta(i)\eta(i+1)\,|e(i)e(i+1)|\} + \sum_{|i-j|\ge 2} E\{\eta(i)\eta(j)\,|e(i)e(j)|\}\right].    (18)

Note that for |i − j| ≥ 2, η(i) and η(j) will be independent of e(i) and e(j) and
therefore the last term will vanish. Because of the independence of η(i) and e(i) we

have for the first term

\frac{1}{N^2}\sum_{i=1}^{N} E\{\eta^2(i)\,e^2(i)\} = \frac{\sigma^2}{N^2}\sum_{i=1}^{N} E\{[g(i+1) - g(i-1) + \eta(i+1) - \eta(i-1)]^2\}.    (19)

Since the absolute value of the difference of two noisefree pixel intensity values is
less than or equal to the contrast E, the first term in (18) can be bounded by σ²(E² + 2σ²)/N.
For the second term in (18) we can write

\frac{2}{N^2}\sum_{i=1}^{N-1} E\{\eta(i)\eta(i+1)\,|e(i)e(i+1)|\} = \frac{2}{N^2}\sum_{i=1}^{N-1} E\{\eta(i)\,|e(i+1)|\} \cdot E\{\eta(i+1)\,|e(i)|\}.    (20)

Each term of the product in (20) can be bounded as

E\{\eta(i)\,|e(i+1)|\} \le E\{|\eta(i)\,e(i+1)|\}
  = E\{|\eta(i)[g(i+2) - g(i) + \eta(i+2) - \eta(i)]|\}
  \le E\{|\eta(i)[g(i+2) - g(i)]|\} + E\{|\eta(i)[\eta(i+2) - \eta(i)]|\}
  \le \sqrt{2/\pi}\,\sigma E + E\{|\eta(i)\eta(i+2)|\} + E\{\eta^2(i)\}
  = \sqrt{2/\pi}\,\sigma E + \frac{2}{\pi}\sigma^2 + \sigma^2.    (21)

Thus finally we can write

\mathrm{var}\{\epsilon\} \le \frac{1}{N}\left[\sigma^2(E^2 + 2\sigma^2) + 2\left(\sqrt{2/\pi}\,\sigma E + \frac{2}{\pi}\sigma^2 + \sigma^2\right)^2\right].    (22)

Since both the contrast and the noise variance are bounded, by letting N → ∞ the
variance of ε will approach zero, as we set out to show.
The practical meaning of (17) and (22) is that ε is close to zero provided N is
sufficiently large.
In contrast, the first term in the numerator of (9) satisfies

\frac{1}{N}\sum_{i=1}^{N} g(i)\,|e(i)| \ge D \cdot \frac{1}{N}\sum_{i=1}^{N} |e(i)|,    (23)

where D is the background signal level. Consequently the effect of the second term
in the numerator can be neglected and we can write

T \approx \frac{\sum_{i=1}^{N} g(i)\,|e(i)|}{\sum_{i=1}^{N} |e(i)|}.    (24)

FIG. 7. Image of lens cap.

Starting from (24), it has been shown in [31] that in the presence of noise the
computed threshold T will approach

T \to \frac{B + D}{2} + (B - D)(0.5 - q),

where q denotes the fraction of the object pixels in the image. For q = 0.5 the
threshold determined will still be correct. For other values of q the threshold will be
shifted either towards the object or background grey level values. However this bias
can be removed very effectively as discussed in [32].
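The behaviour described above can be seen in a small simulation (our own illustration with made-up parameter values) of the statistic (9) on a synthetic noisy two-level scan line:

    import numpy as np

    rng = np.random.default_rng(0)
    B, D, sigma, N = 50.0, 10.0, 4.0, 100_000

    def noisy_T(q):
        """Statistic (9) for a scan line with object fraction q plus Gaussian noise."""
        g = np.where(np.arange(N) < q * N, B, D)      # one edge along the line
        s = g + rng.normal(0.0, sigma, N)
        e = np.abs(s[2:] - s[:-2])                    # |s(i+1) - s(i-1)|
        return (s[1:-1] * e).sum() / e.sum()

    print(noisy_T(0.5))    # close to (B + D)/2 = 30: no bias when q = 0.5
    print(noisy_T(0.2))    # shifted away from 30 when q differs from 0.5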

6. EXPERIMENTAL RESULTS
We have implemented the RATS algorithm on a PDP 11/44 using the PASCAL
programming language. Input images were acquired using a standard VIDICON
camera whose output was digitized to give grey levels in the range 0 to 63. The
gradient values of the image were calculated in software but considerable speed
advantages would result from a hardware implementation of this simple operator.
Figure 7 is an image of a lens cap. High contrast between the dark cap and light
background permits simple, correct threshold selection. A single global threshold
was calculated by including contributions of all the pixels. The grey level histogram
of the image is shown in Fig. 8a with the selected threshold indicated. The resultant
binary image is Fig. 8b.
Figure 9a is an image of a typical metal product which may need inspection.
Figures 9b and c show the grey level histogram and the binary image obtained using
the indicated threshold. The histogram is relatively complex. The upper mode of the
histogram has been spread and split into 3 submodes by nonuniform lighting and
shadowing. The resultant globally determined threshold produces a poor segmenta-
tion. The nonuniformity of illumination is illustrated in Fig. 9d which is the profile
of grey level intensity along a diagonal line in the background portion of the image.
This image requires threshold selection to be made locally.

FIG. 8. (a) Gray level histogram. The selected threshold is indicated. (b) Binary image based on global
statistic. Good segmentation results.

FIG. 9. (a) Image of metal object. (b) Gray level histogram. The selected threshold is indicated. (c)
Binary image from global statistic. Poor segmentation. (d) Gray level scan along a diagonal line of the
image background.

7. VARIABLE THRESHOLDING
For scenes which are nonuniformly illuminated, the determination of a single
global threshold for the image is often unsatisfactory. A more appropriate method is
to partition the image into smaller square windows for which relevant thresholds can
be independently determined. This strategy has been successfully used by several
authors for several thresholding methods [7, 27]. However many threshold selection
methods which analyze the gray level histogram of the image are ill-suited to this
approach because in small populations statistical fluctuations dominate and make a
significant and correct segmentation of the histogram difficult. However the RATS
method, which has been shown to be insensitive to population size for noisefree
images, should benefit from this local application. This is related to the proposition
that in noisy images RATS will produce an optimal threshold if the number of
object and number of background pixels are equal. As the window size decreases it
becomes more probable that a window which contains both object and background
pixels will achieve this desired balance. However, many small windows will contain
either only object or only background pixels. The determination of a threshold for
these homogenous windows will be inappropriate but suitable threshold values to
classify them as all object or all background pixels should be derivable from their
spatial and gray level relationships to the well-thresholded windows.
A simple test for windows which are homogenous, i.e., contain only object or only
background pixels, is to consider the Σ-grad statistic for all the windows into which
the image is divided. Windows which contain edge pixels will generally have a larger
Σ-grad value than those containing no edge pixels. The effect of noise, if its statistics
are constant over the image, will add, on average, an equal Σ-grad to both types of
window. The large Σ-grad value windows, as they contain edge pixels, are threshold-
able. They can be separated from the small Σ-grad windows by treating the
2-dimensional array of Σ-grad values as an image to which the RATS thresholding
method can be applied. Figure 10 shows the effectiveness of this method for the
image in Fig. 9a. The image was partitioned into square windows each with a side
length of 32 pixels. The 8 × 8 array of Σ-grad values of all windows was thresholded
using the RATS method and Fig. 10 is the resultant binary image together with an
overlay of the border points of the object in the image. (The border was obtained

FIG. 10. The effect of the grad cut. The bright squares have values above the cut. This indicates they
have gradient contributions from large edge values and therefore good thresholds may be assigned for
these windows.

FIG. 11. A pyramid data structure. Values at high levels are constructed by the successive union of
nonoverlapping 2 x 2 window values.

from edge pixels of a later successful image binarization.) It is seen that the bright
windows, i.e., those above the calculated Σ-grad threshold, coincide well with the
border of the object. The Σ-grad threshold selects windows which contain edge
points and it is for these windows that meaningful thresholds are calculable. Similar
good results were found for a variety of images.
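A sketch of this window classification step (our own illustration; the window size is an assumption) computes the Σ-grad value of every window and then applies the RATS ratio of Section 4 to the resulting array:

    import numpy as np

    def sigma_grad_map(image, win=32):
        """Sum of the max |x|,|y| derivative magnitudes over each win x win window."""
        g = np.asarray(image, dtype=float)
        ex = np.zeros_like(g)
        ey = np.zeros_like(g)
        ex[:, 1:-1] = g[:, 2:] - g[:, :-2]
        ey[1:-1, :] = g[2:, :] - g[:-2, :]
        e = np.maximum(np.abs(ex), np.abs(ey))
        rows, cols = g.shape[0] // win, g.shape[1] // win
        return np.array([[e[r*win:(r+1)*win, c*win:(c+1)*win].sum()
                          for c in range(cols)] for r in range(rows)])

    def thresholdable_windows(image, win=32):
        """Treat the sigma-grad array itself as an image and cut it with the RATS
        statistic (7); True marks windows likely to contain edge pixels."""
        sg = sigma_grad_map(image, win)
        ex = np.zeros_like(sg)
        ey = np.zeros_like(sg)
        ex[:, 1:-1] = sg[:, 2:] - sg[:, :-2]
        ey[1:-1, :] = sg[2:, :] - sg[:-2, :]
        e = np.maximum(np.abs(ex), np.abs(ey))
        cut = (sg * e).sum() / e.sum() if e.sum() > 0 else sg.mean()
        return sg >= cut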
The assignment of good thresholds to window areas of the image which contain
only object or only background pixels can be attempted in many ways. The two
possibilities which have been considered in our work involve simple neighborhood
window averaging or the use of a pyramid data structure. The first of these methods
is well described in [7]. An unassigned window is given a threshold value which is the
weighted arithmetic mean of the threshold values of the 8 neighboring windows
which have a threshold assigned. Each window threshold is weighted by a factor
inversely proportional to the distance between its center and that of the central
unassigned window (e.g., for diagonal neighbors by 1/√2). This assignment process
is iterated until thresholds have been assigned to all windows.
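A sketch of this iterative neighbour averaging (our own illustration; the inverse-distance weights follow the description above):

    import numpy as np

    def propagate_thresholds(T, assigned, max_iter=100):
        """Fill unassigned window thresholds from assigned 8-neighbours, each
        neighbour weighted inversely by its distance (1 for edge neighbours,
        1/sqrt(2) for diagonal neighbours)."""
        T = np.array(T, dtype=float)
        assigned = np.array(assigned, dtype=bool)
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        weights = [1.0 / np.hypot(dr, dc) for dr, dc in offsets]
        for _ in range(max_iter):
            if assigned.all():
                break
            new_T, new_assigned = T.copy(), assigned.copy()
            for r, c in zip(*np.where(~assigned)):
                num = den = 0.0
                for (dr, dc), w in zip(offsets, weights):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < T.shape[0] and 0 <= cc < T.shape[1] and assigned[rr, cc]:
                        num += w * T[rr, cc]
                        den += w
                if den > 0:
                    new_T[r, c], new_assigned[r, c] = num / den, True
            T, assigned = new_T, new_assigned
        return T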
The use of a pyramid data structure was motivated by the desire to select the best
thresholds from those determined at several different spatial resolutions [34]. The
independent nature of the sums of individual pixel statistics means that the RATS
algorithm is well suited to this approach. The pyramid data structure is illustrated in
Fig. 11. At the lowest level of the pyramid the statistics of the image are calculated
for small windows. At the next higher level the statistics for nonoverlapping blocks
formed by the union of 2 x 2 windows are calculated by summing the statistics
obtained for those 4 windows at the lower level. At the very highest level the
statistics are just the sums calculated over all pixels and the threshold calculated is
the global threshold. An appropriate or best threshold can be calculated for the low
level windows by considering information at several spatial scales. At each level we

FIG. 12. Four gray level image which indicates the level at which a threshold was assigned to a base
level 32 x 32 pixel window. The brightest squares were assigned thresholds based on 32 x 32 pixel
window statistic, next brightest 64 X 64 pixel window, next brightest 128 X 128 windows, and the darkest
areas were given a threshold determined by sums over the full 256 X 256 image.

can decide whether a window is thresholdable by defining a Σ-grad threshold as


previously described. If a window is homogenous at the lowest level the pyramid can
be ascended until the window is contained within an area which has a Σ-grad value
above the Σ-grad threshold for that level. The window is therefore likely to be within
an area which contains edge pixels and therefore a good threshold can be assigned
from the statistics defined at the high level. Figure 12 shows how this works for the
image of Fig. 9a. At the lowest pyramid level the image has been divided into square
windows of side length 32 pixels. The 4 grey level shades of Fig. 12 indicate the level
of the pyramid at which the window was assigned a threshold. The windows with the
brightest grey level were assigned a threshold derived from the statistics at the
32 x 32 pixel square base level. The approximate border of the object is indicated in
the image by overlay. It is seen that windows which contain edge points are assigned
thresholds at this lowest level, in agreement with Fig. 10. Thresholds assigned at the next level up the
pyramid, 64 × 64 pixel square windows, are indicated by the second brightest grey
levels. The darkest areas, which are the uniform featureless areas in the corners of the
image are assigned a threshold only at the highest pyramid level, i.e., a globally
determined threshold.
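Because the underlying quantities are simple sums, the pyramid levels can be built by repeated 2 × 2 aggregation of the base-level window statistics; a sketch (our own illustration, assuming the base arrays have even dimensions) is:

    import numpy as np

    def build_pyramid(grad_sum, greygrad_sum):
        """Each level halves the resolution by summing non-overlapping 2 x 2 blocks
        of the window statistics; the threshold of any cell is greygrad / grad."""
        g = np.asarray(grad_sum, dtype=float)
        h = np.asarray(greygrad_sum, dtype=float)
        levels = [(g, h)]
        while g.shape[0] > 1 and g.shape[1] > 1:
            g = g[0::2, 0::2] + g[0::2, 1::2] + g[1::2, 0::2] + g[1::2, 1::2]
            h = h[0::2, 0::2] + h[0::2, 1::2] + h[1::2, 0::2] + h[1::2, 1::2]
            levels.append((g, h))
        return levels   # the last level holds the sums over the whole image

The global threshold of Section 4 is recovered from the top level, and the threshold of any intermediate cell is simply the ratio of its two accumulated sums.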
Window thresholds can be used to calculate appropriate thresholds for individual
pixels. Two distinct methods have been proposed in the literature for this. In [27] the
2D array of window thresholds are used for all pixels in a window and then a 2D
low pass filter is used to smooth discontinuities at the boundary of windows. They
considered several different filters and achieved satisfactory results for their applica-
tion which is the automatic determination of left ventricle of the heart in X-ray
angiograms. The second method is to assign individual pixel thresholds using 4-point
Lagrangean interpolation among window thresholds. The idea is illustrated in Fig.
13. The window thresholds are assumed to be appropriate to the central pixel of each
window area. We have implemented this method and Fig. 14 shows the results of its
application to the image of Fig. 9. Figure 14a is the result of window threshold
assignment using nearest neighbors averaging and Fig. 14b is the result using the
pyramid structure reassignments. These results are a definite improvement over the
result of Fig. 9c which is the application of a globally determined threshold.

FIG. 13. Assignment of individual pixel thresholds based on 4 point interpolation between the window
thresholds of the nearest four windows.
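A sketch of this per-pixel interpolation (our own illustration; with thresholds taken as valid at window centres it reduces to separable linear interpolation, clamped at the image border):

    import numpy as np

    def pixel_thresholds(window_T, win):
        """Interpolate window thresholds (valid at window centres) to give a
        threshold for every pixel of a (rows*win) x (cols*win) image."""
        window_T = np.asarray(window_T, dtype=float)
        rows, cols = window_T.shape
        centres_r = (np.arange(rows) + 0.5) * win
        centres_c = (np.arange(cols) + 0.5) * win
        ys = np.arange(rows * win) + 0.5
        xs = np.arange(cols * win) + 0.5
        # Interpolate along each window row first, then down the columns.
        per_row = np.array([np.interp(xs, centres_c, window_T[r]) for r in range(rows)])
        out = np.empty((rows * win, cols * win))
        for j in range(cols * win):
            out[:, j] = np.interp(ys, centres_r, per_row[:, j])
        return out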

FIG. 14. (a) Result of variable thresholding. Window threshold reassignment was by averaging of
nearest neighbor windows. (b) Result of variable thresholding. Window threshold reassignment was by
use of a pyramid data structure.

8. CONCLUSIONS
The problem of automatic threshold selection has been considered. After a brief
review of available techniques, a novel method has been proposed. It is based on
image statistics which can be computed without histogramming the grey level values
of the image. A detailed analysis of the properties of the algorithm has been carried
out. The effectiveness of the method has been shown on a number of practical
examples.

REFERENCES
1. J. Kittler and K. Paler, An absorption edge detector, in Proceedings Computer Vision and Pattern
Recognition Conf., Washington 1983, pp. 345-350.
2. J. Kittler, J. Illingworth, and K. Paler, The magnitude accuracy of the template edge detector, Pattern
Recognition 16, 1983, 607-613.
3. J. Eklundh and A. Rosenfeld, Peak detection using difference operators, IEEE Trans. Pattern Anal.
Mach. Intell. PAMI-1, No. 3, 1979, 317-325.
4. S. L. Horowitz, Peak Recognition in Waveforms in Syntactic Pattern Recognition, Applications (K. S.
Fu, Ed.), Springer-Verlag, Berlin/New York, 1977.
5. I. Tomek, Two algorithms for piecewise-linear continuous approximations of functions of one
variable, IEEE Trans. Comput. C-22,1974,445-448.
6. C. Williams, An efficient algorithm for the piecewise linear approximation of planar curves, Comput.
Graphics Image Process. 8,1978, 286-293.
7. Y. Nakagawa and A. Rosenfeld, Some experiments on variable thresholding, Pattern Recognition 11,
1979, 191-204.
8. T. Ridler and S. Calvard, Picture thresholding using an iterative selection method, IEEE Trans.
Systems Man Cybern., SMC-8, No. 8,1978, 630-632.
9. J. Weszka and A. Rosenfeld, Histogram modification for threshold selection, IEEE Trans. Systems Man
Cybern. SMC-9, No. 1, 1979, 38-52.
10. A. Wu, T. H. Hong, and A. Rosenfeld, Threshold selection using quadtrees, IEEE Trans. Pattern
Anal. Mach. Intell. PAMI-4, No. 1, 1982, 90-94.
11. D. Milgram, Region extraction using convergent evidence, Comput. Graphics Image Process. 11, 1979,
1-12.
12. A. Rosenfeld and P. De La Torre, Histogram concavity analysis as an aid in threshold selection,
IEEE Trans. Systems Man Cybem. SMC-13, No. 3,1983, 231-235.
13. S. K. Pal, R. A. King, and A. A. Hashim, Automatic grey level thresholding through index of
fuzziness and entropy, to appear.
14. N. Otsu, A threshold selection method from grey level histograms, IEEE Trans. Systems Man
Cybern., SMC-9, No. 1,1979,62-66.
15. N. Otsu, Discriminant and least squares threshold selection, in 4th Int. Joint Conf. on Pattern
Recognition, Kyoto, Japan, 1978, pp. 592-596.
16. T. Pun, A new method for grey level picture thresholding using the entropy of the histogram, Signal
Process. 2, 1980, 223-237.
17. T. Pun, Entropic thresholding, a new approach, Comput. Graphics Image Process. 16,1981, 210-239.
18. G. Johannsen and J. Bille, A threshold selection method using information measures, in 6th Int. Conf.
on Pattern Recognition, Munich, Germany, 1982.
19. F. Deravi and S. K. Pal, Grey level thresholding using second-order statistics, Pattern Recognition
Lett. 1, Nos. 5, 6, 1983, 417-422.
20. W. Barrett, An iterative algorithm for multiple threshold selection, in Proc. IEEE Comput. Soc. Conf.
on Pattern Recognition and Image Process., Dallas, Texas, 1981, 273-278.
21. A. Rosenfeld and R. Smith, Thresholding using relaxation, IEEE Trans. Pattern Anal. Mach. Intell.
PAMI3, No. 5, 1981, 598-606.
22. S. Ranade and J. Prewitt, A comparison of some segmentation algorithms for cytology, in 5th Int.
Conf. on Pattern Recognition, Miami, Fla., 1980, pp. 561-564.
23. B. Parvin and B. Bhanu, Segmentation of images using a relaxation technique, IEEE Comput. Soc.
Conf. on Computer Vision and Pattern Recognition, Washington, D.C., 1983, pp. 151-153.
24. B. Bhanu and O. Faugeras, Segmentation of images having unimodal distributions, IEEE Trans.
Pattern Anal. Mach. Intell. PAMI-4, No. 4, 1982, pp. 408-419.
25. B. Nordin, E. Bengtsson, B. Dahlgvist, 0. Eriksson, T. Jarkraus, and B. Stenkvist, Object orientated
cell image segmentation, in First IEEE Symp. on Medical Imaging and Image Interpretation,
Berlin 1982.
26. H. Bunke, H. Feistl, H. Niemann, G. Sagerer, F. Wolf, and G. X. Zhou, Smoothing, thresholding and
contour extraction in images from gated blood pool studies, in First IEEE Symp. on Medical
Imaging and Image Interpretation, Berlin 1982.
27. G. Fernando and D. M. Munro, Variable thresholding applied to angiography, in First IEEE Symp.
on Medical Imaging and Image Interpretation, Berlin, 1982.
THRESHOLD SELECTION BASED ON A SIMPLE IMAGE STATISTIC 147

28. J. Tokumtsu, S. Kawata, Y. Ichioka, and T. Suzuki, Adaptive binarisation using a hybrid image
processing system, Appl. Optics 17, No. 16,1978,2655-2657.
29. J. White and G. Rohrer, Image thresholding for OCR and other applications requiring character
image extraction, IBM J. Res. Dev. 27, No. 4, 1983, 400-410.
30. H. J. Trussell, Comments on “Picture thresholding using an iterative selection method,” IEEE Trans.
Systems Man Cybern. SMC-9, 1979, 311.
31. J. Kittler, J. Illingworth, J. Foglein, and K. Paler, An automatic thresholding method for waveform
segmentation, in Proc. Digital Signal Processing-84, Florence 1984, pp. 727-732.
32. J. Kittler, J. Illingworth, J. Foglein, and K. Paler, An automatic thresholding algorithm and its
performance, in Proc. 7th Int. Conf. on Pattern Recognition, Montreal, 1984, p. 245.
33. R. Kohler, A segmentation system based on thresholding, Comput. Graphics Image Process. 15, 1981,
319-338.
34. A. R. Hanson and E. M. Riseman, Processing cones: A computational structure for image analysis, in
Structured Computer Vision (S. Tanimoto and A. Klinger, Eds.), Academic Press, New York,
1980.
