You are on page 1of 38

P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK

International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

International Journal of Computer Vision 30(2), 79–116 (1998)


°
c 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Feature Detection with Automatic Scale Selection

TONY LINDEBERG
Computational Vision and Active Perception Laboratory (CVAP), Department of Numerical Analysis
and Computing Science, KTH (Royal Institute of Technology), S-100 44 Stockholm, Sweden
tony@nada.kth.se

Received February 1, 1994; Revised June 1, 1996; Accepted July 30, 1998

Abstract. The fact that objects in the world appear in different ways depending on the scale of observation has
important implications if one aims at describing them. It shows that the notion of scale is of utmost importance
when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and
Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a
so-called scale-space representation. Traditional scale-space theory building on this work, however, does not
address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic
methodology for dealing with this problem. A framework is presented for generating hypotheses about interesting
scale levels in image data, based on a general principle stating that local extrema over scales of different combinations
of γ -normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown
how this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local
scales of processing to the local image structure.
Support for the proposed approach is given in terms of a general theoretical investigation of the behaviour of the
scale selection method under rescalings of the input pattern and by integration with different types of early visual
modules, including experiments on real-world and synthetic data. Support is also given by a detailed analysis of
how different types of feature detectors perform when integrated with a scale selection mechanism and then applied
to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the
problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation.
In many computer vision applications, the poor performance of the low-level vision modules constitutes a major
bottleneck. It is argued that the inclusion of mechanisms for automatic scale selection is essential if we are to
construct vision systems to automatically analyse complex unknown environments.

Keywords: scale, scale selection, normalized derivative, feature detection, blob detection, corner detection, fre-
quency estimation, Gaussian derivative, scale-space, multi-scale representation, computer vision

1. Introduction through thermodynamics and solid mechanics dealing


with every-day phenomena, to astronomy and relativity
One of the very fundamental problems that arises when theory at scales much larger than those we are usually
analysing real-world measurement data originates from dealing with. Notably, the type of physical description
the fact that objects in the world may appear in dif- may be strongly dependent on the scale at which the
ferent ways depending upon the scale of observation. world is modelled, and this is in clear contrast to cer-
This fact is well-known in physics, where phenomena tain idealized mathematical entities, such as “point” or
are modelled at several levels of scale, ranging from “line,” which are independent of the scale of observa-
particle physics and quantum mechanics at fine scales, tion.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

80 Lindeberg

In certain controlled situations, appropriate scales should be regarded as important, we obtain a sub-
for analysis may be known a priori. For example, a stantial expansion of the amount of data to be inter-
desirable property of a physicist is his intuitive ability preted by later stage processes. In most previous works,
to select appropriate scales to model a given situation. this problem has been handled by formulating algo-
Under other circumstances, however, it may not be ob- rithms which rely on the information present in the
vious at all how to determine in advance what are the data at a small set of manually chosen scales (or even
proper scales. One such example is a vision system a single scale). Alternatively, coarse-to-fine algorithms
with the task of analysing unknown scenes. Besides have been expressed, which start at a given coarse scale
the inherent multi-scale properties of real-world ob- and propagate down to a given finer scale. Determin-
jects (which, in general, are unknown), such a system ing such scales in advance, however, leads to the intro-
has to face the problems that the perspective mapping duction of free parameters. If one aims at autonomous
gives rise to size variations, that noise is introduced algorithms which are to operate in a complex environ-
in the image formation process, and that the available ment without need for external parameter tuning, we
data are two-dimensional data sets reflecting only in- therefore argue that it is essential to complement tra-
direct properties of a three-dimensional world. To be ditional multi-scale processing by explicit mechanisms
able to cope with these problems, an image representa- for automatic scale selection. Notably, image descrip-
tion that explicitly incorporates the notion of scale is a tors can be highly unstable if computed at inappro-
crucially important tool whenever interpreting sensory priately chosen scales, whereas a proper tuning of the
data, such as images, by automatic methods. scale parameter can improve the quality of an image
In computer vision and image processing, these in- descriptor substantially. As will be demonstrated later,
sights have lead to the construction of multi-scale repre- local scale information can also constitute an important
sentations of image data, obtained by embedding any cue in its own right.
given signal into a one-parameter family of derived Early work addressing this problem was presented
signals (Burt, 1981; Crowley, 1981; Witkin, 1983; in (Lindeberg, 1991, 1993a) for blob-like image struc-
Koenderink, 1984; Yuille and Poggio, 1986; Florack tures. The basic idea was to study the behaviour of im-
et al., 1992; Lindeberg, 1994a; Haar Romeny, 1994). age structures over scales, and to measure the saliency
This family should be parameterized by a scale para- of image structures from the stability properties and the
meter and be generated in such a way that fine-scale lifetime of these structures in scale-space. Scale levels
structures are successively suppressed when the scale were selected from the scales at which a measure of
parameter is increased. A main intention behind this blob strength assumed local maxima over scales and
construction is to obtain a separation of the image significant image structures were determined from the
structures in the original image, such that fine scale stability of the blob structures in scale-space. Experi-
image structures only exist at the finest scales in the mentally, it was shown that this approach could be used
multi-scale representation. Thereby, the task of oper- for extracting regions of interest with associated scale
ating on the image data will be simplified, provided levels, which in turn could serve as to guide various
that the operations are performed at sufficiently coarse early visual processes.
scales where unnecessary and irrelevant fine-scale The subject of this article is to address the prob-
structures have been suppressed. Empirically, this idea lem of automatic scale selection in a more general set-
has proved to be extremely useful, and multi-scale rep- ting, for wider classes of image descriptors. We shall
resentations such as pyramids, scale-space representa- be concerned with the extraction of image features and
tion and non-linear diffusion methods are commonly the computation of filter-like image descriptors, and
used as preprocessing steps to a large number of early present a scale selection principle for image descriptors
visual operations, including feature detection, stereo which can be expressed in terms of Gaussian deriva-
matching, optic flow, and the computation of shape tive filters. The general idea that will be proposed is
cues. to study the evolution properties over scales of nor-
A multi-scale representation by itself, however, con- malized differential descriptors. Specifically, it will be
tains no explicit information about what image struc- suggested that local extrema over scales of normalized
tures should be regarded as significant or what scales differential entities, which arise in this way, are likely
are appropriate for treating those. Hence, unless early to correspond to interesting image structures. By theo-
judgements can be made about what image structures retical considerations and experiments it will be shown
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 81

that this approach gives rise to intuitively reasonably 2. Scale-Space Representation: Review
results in different situations and that it provides a uni-
fied framework for scale selection for detecting image Given any continuous signal f : R D → R, its linear
features such as blobs, corners, edges and ridges. scale-space representation L : R D × R+ → R is de-
fined as the solution to the diffusion equation

1.1. Outline of the Presentation 1 2 1X D


∂t L = ∇ L= ∂x x L (1)
2 2 i=1 i i
The presentation is organized as follows: Section 2 re-
views main concepts from scale-space theory which we
build upon. Section 3 introduces the notion of normal- with initial condition L(·; 0) = f (·). Equivalently, this
ized derivatives and illustrates how local maxima over family can be defined by convolution with Gaussian
scales of normalized Gaussian derivatives reflect the kernels of various width t
frequency content in sine wave patterns. This material
serves as a preparation for Section 4, which presents L(·; t) = g(·; t) ∗ f (·), (2)
the proposed scale selection methodology and shows
how it applies generally to a large class of differen- where g : R D × R+ → R is given by
tial descriptors. Section 4 also proposes a general
1
e−(x1 +...+x D )/(2t) ,
extension of the common idea of defining features as 2 2
g(x; t) = (3)
zero-crossings of spatial differential descriptors. If a (2π t) N /2
scale selection mechanism is integrated into such a fea-
ture detector, this corresponds to adding another zero- and x = (x1 , . . . , x D )T . There are several results
crossing requirement over the scale dimension in the (Koenderink, 1984; Babaud et al., 1986; Yuille and
differential feature definition. Poggio, 1986; Lindeberg, 1990, 1994a, 1994c; Koen-
Then, Sections 5 and 6 show in detail how these ideas derink and van Doorn, 1990, 1992; Florack et al., 1992,
can be used for formulating blob detectors and cor- 1993; Florack et al., 1994; Pauwels et al., 1995) stat-
ner detectors with automatic scale selection. Section 8 ing that within the class of linear transformations the
shows an example of how this approach applies to Gaussian kernel is the unique kernel for generating a
the computation of dense feature maps, and Section 7 scale-space. The conditions that specify the uniqueness
presents a complementary scale selection mechanism are essentially linearity and shift invariance combined
for the purpose of computing accurate localization with different ways of formalizing the notion that new
estimates of image features. Section 9 describes differ- structures should not be created in the transformation
ent ways of interpreting the normalized derivative con- from a finer to a coarser scale.
cept. Finally, Section 10 summarizes main results and Interestingly, the results from these theoretical con-
ideas of the approach. A companion paper (Lindeberg, siderations are in qualitative agreement with the results
1996b) shows in detail how this approach applies to of biological evolution. Neurophysiological studies by
edge detection and ridge detection. Young (1985, 1987) have shown that there are recep-
Earlier presentations of different parts of this ma- tive fields in the mammalian retina and visual cortex,
terial have appeared elsewhere (Lindeberg, 1993b, whose measured response profiles can be well mod-
1994d, 1994a, 1996c) as well as applications of the gen- elled by Gaussian derivatives up to order four. In these
eral ideas to various problem domains (Lindeberg and respects, the scale-space representation with its asso-
Gårding, 1993, 1997; Gårding and Lindeberg, 1996; ciated Gaussian derivative operators can be seen as
Lindeberg and Li, 1995, 1997; Bretzner and Lindeberg, a canonical idealized model of a visual front-end. It
1998, 1997; Almansa and Lindeberg, 1996; Wiltschi is for this multi-scale representation concept we will
et al., 1997; Lindeberg, 1997). The subject of this pa- develop the scale selection methodology.
per is to present a coherent description of the pro- With appropriate adaptations, however, the approach
posed scale selection methodology in journal form, also applies to other filter families. Since the Gaussian
including the developments and refinements that have family is complete, it follows that we can, for example,
been performed since the earliest presented manu- express any Gabor function as a linear combination of
scripts. Gaussian derivatives.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

82 Lindeberg

3. Normalized Derivatives and Intuitive Idea useful when formulating scale selection mechanisms
for Scale Selection for edge detection and ridge detection.
For the sinusoidal signal, the amplitude of an mth
A well-known property of the scale-space representa- order normalized derivative as function of scale is given
tion is that the amplitude of spatial derivatives by

L x α (·; t) = ∂x α L(·; t) = ∂x1α1 · · · ∂x Dα D L(·; t) L ξ m ,max (t) = t mγ /2 ω0m e−ω0 t/2 ,


2
(8)

in general decrease with scale, i.e., if a signal is subject i.e., it first increases and then decreases. Moreover, it
to scale-space smoothing, then the numerical values of assumes a unique maximum at tmax,L ξ m = γωm2 . If we
spatial derivatives computed from the smoothed data 0
define a scale parameter σ of dimension length by
can be expected to decrease. This is a direct conse- √
σ = t and introduce the wavelength λ0 of the sig-
quence of the non-enhancement property of local ex- nal by λ0 = 2π/ω0 , we can see that the scale at which
trema, which means that the value at a local maximum the amplitude of the γ -normalized derivative assumes
cannot increase, and the value at a local minimum can- its maximum over scales is proportional to the wave-
not decrease. Hence, the amplitude of the variations in length, λ0 , of the signal:
a signal will always decrease with scale.

As a simple example of this, consider a sinusoidal γm
input signal1 of some given frequency ω0 ; for simplicity σmax,L ξ m = λ0 . (9)

in one dimension,
The maximum value over scales is
f (x) = sin ω0 x. (4) ¡ ¢ (γ m)γ m/2 (1−γ )m
L ξ m ,max tmax,L ξ m = ω0 . (10)
It is straightforward to show that the solution of the eγ m/2
diffusion equation is given by In the case when γ = 1, this maximum value is inde-
pendent of the frequency of the signal (see Fig. 1),
L(x; t) = e−ω0 t/2 sin ω0 x.
2
(5) and the situation is highly symmetric, i.e., given any
scale t0 , the
√ maximally amplified frequency is given
Thus, the amplitude of the scale-space representa- by ωmax = m/t0 , and for any ω0 the scale with maxi-
tion, L max , as well as the amplitude of the mth order mum amplification is tmax = m/ω02 . In other words, for
smoothed derivative, L x m ,max , decrease exponentially normalized derivatives with γ = 1 it holds that sinu-
with scale soidal signals are treated in a similar (scale invariant)
way independent of their frequency (see Fig. 1).
L max (t) = e−ω0 t/2 , L x m ,max (t) = ω0m e−ω0 t/2 .
2 2
The situation is a bit different when γ 6= 1. We shall
return to this subject in Section 4.1.
Let us next introduce a γ -normalized derivative oper-
ator defined by

∂ξ,γ -norm = t γ /2 ∂x , (6)

which corresponds to the change of variables


x
ξ= . (7)
t γ /2
In the special case when γ = 1, these ξ -coordinates and
their associated normalized derivative operator are di-
mensionless. This property of perfect scale invariance
has been used by Florack et al. (1992) as a main re-
quirement in an axiomatic scale-space formulation (see Figure 1. The amplitude of first order normalized derivatives as
also Pauwels et al., 1995; Lindeberg, 1994c, 1996e). function of scale for sinusoidal input signals of different frequency
As we shall see later, however, values of γ < 1 are also (ω1 = 0.5, ω2 = 1.0 and ω3 = 2.0).
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 83

4. Proposed Scale Selection Methodology level, at which some (possibly non-linear) combi-
nation of normalized derivatives assumes a local
The example shows that the scale at which a normal- maximum over scales, can be treated as reflecting
ized derivative assumes its maximum over scales is for a characteristic length of a corresponding struc-
a sinusoidal signal proportional to the wavelength of ture in the data.
the signal. In this respect, maxima over scales of nor-
malized derivatives reflect the scales over which spatial This principle is closely related to although not equiv-
variations take place in the signal. alent to the method for scale selection previously pro-
This operation corresponds to an interesting compu- posed in (Lindeberg, 1991, 1993a), where interesting
tational structure, since it constitutes a way of estimat- scale levels were determined from maxima over scales
ing length based on measurements performed at only a of a normalized blob measure. It can be theoretically
single spatial point in the scale-space representation, justified under a number of different assumptions and
and without explicitly laying out a ruler. Moreover, for a number of specific brightness models (see next).
compared to a local windowed Fourier transform there Its general usefulness, however, must be verified em-
is no need for making any explicit settings of window pirically, and with respect to the type of problem it is
size for computing the Fourier transform. Instead, the to be applied to.
propagation of length information over space is per-
formed via the diffusion equation, and the decisions 4.1. General Scaling Property of Local Maxima
about the contents in the data are made by studying the over Scales
output of derivative operators as the diffusion process
evolves. A basic justification for the abovementioned arguments
Alternatively, we can view such a measurement pro- can be obtained from the fact that for a large class
cedure as a pattern matcher, which matches Gaussian of (possibly non-linear) combinations of normalized
derivative kernels of different size to the given image derivatives it holds that maxima over scales have a nice
pattern, based on a specific normalization of the prim- behaviour under rescalings of the intensity pattern. If
itive templates. By using the proposed γ -normalized the input image is rescaled by a constant scaling factor
derivative concept, we obtain one-to-one correspon- s, then the scale at which the maximum is assumed
dence between the matching response of the Gaussian will be multiplied by the same factor (if measured in

derivative kernels and the wavelength of the signal. Se- units of σ = t). This is a fundamental requirement
lecting the scale at which the maximum over scale is on a scale selection mechanism, since it guarantees that
assumed corresponds to selecting the pattern (or the image operations commute with size variations.
scale) for which the operator response is strongest.
This property is, however, not restricted to sine wave Transformation Properties under Rescalings. To give
patterns or to image measurements in terms of linear a formal characterization of this scaling property, con-
derivative operators of a certain order. Contrary, it ap- sider two signals f and f 0 related by
plies to a large class of image descriptors which can
be formulated as multi-scale differential invariants ex- f (x) = f 0 (sx), (11)
pressed in terms of Gaussian derivatives (this notion
will be made more precise next). A main message of
and define the scale-space representations of f and f 0
this article is that this property can be used as a ma-
in the two domains by
jor mechanism in algorithms for automatic scale se-
lection, which automatically adapt the local scales of
processing to image data. Let us hence generalize the L(·; t) = g(·; t) ∗ f, (12)
0 0 0 0
abovementioned observation to more complex signals L (·; t ) = g(·; t ) ∗ f , (13)
and state the following principle for scale selection, to
be applied in situations when no other information is where the spatial variables and the scale parameters are
available. In its most general form, it can be expressed transformed according to
as follows:

Principle for automatic scale selection: In the x 0 = sx, (14)


0
absence of other evidence, assume that a scale t = s t.
2
(15)
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

84 Lindeberg

Then, L and L 0 are related by does not depend on the index i of that term. For a
differential expression of this form, the corresponding
L(x; t) = L 0 (x 0 ; t 0 ), (16) normalized differential expression in each domain is
given by
and the mth order spatial derivatives satisfy
Dγ -norm L = t Mγ /2 DL , (23)
0 0 0
∂x m L(x; t) = s ∂x 0 m L (x ; t ).
m
(17) Dγ0 -norm L 0 =t 0 Mγ /2 0
DL. 0
(24)
For γ -normalized derivatives in the two domains From (20) it follows that these normalized differential
expressions are related by
∂ξ = t γ /2 ∂x , (18)
∂ξ 0 = t 0 γ /2
∂x 0 , (19) Dγ -norm L = s M(1−γ ) Dγ0 -norm L 0 . (25)

we have For γ -normalization with γ 6= 1, the magnitude of the


derivative is not scale invariant—it scales according to
∂ξ m L(x; t) = s m(1−γ ) ∂ξ 0 m L 0 (x 0 ; t 0 ). (20) a power law. Local maxima over scales are, however,
still preserved,
Perfect Scale Invariance When γ = 1. From the re-
lation (20) it can be seen that, when γ = 1 the nor- ∂t (Dγ -norm L) = 0 ⇔ (26)
malized derivative concept leads to perfect scale in- ∂t 0 (Dγ0 -norm L 0 ) =0 (27)
variance. The normalized derivatives are equal in the
two domains, provided that the scale parameters and and the type of critical points are preserved under
the spatial positions are matched according to (14) and this transformation. Hence, even when γ 6= 1, we can
(15). More specifically, local maxima over scales are achieve sufficient scale invariance to support the pro-
always assumed at corresponding positions, and this posed scale selection methodology.
scaling property holds for any differential expression
defined from the local N -jet. Scale Compensated Magnitude Measures When γ 6= 1.
When basing a scale selection methodology on γ 6= 1,
Sufficient Scale Invariance When γ 6= 1. The case there is a minor complication which needs attention.
when γ 6= 1 leads to a different type of structure, since When performing feature detection in practice, it is nat-
we cannot preserve a scaling property for arbitrary ural to associate a measure of feature strength with each
combinations of normalized derivatives. Let us hence detected feature. Specifically, the magnitude of the re-
restrict the analysis to polynomial differential invari- sponse at the local maximum over scales constitutes a
ants which are homogeneous in the sense that the sum natural entity to include in such a measure. From the
of the orders of differentiation is the same for each transformation property (20), it is, however, apparent
term in the polynomial. To express this notion com- that this magnitude measure will be strongly depen-
pactly, introduce multi-index notation for derivatives dent on the scale at which the maximum over scales
by L x α = L x1α1 x2α2 ··· x Dα D where x = (x1 , x2 , . . . , x D ), is assumed. Hence, the magnitude measure will de-
α = (α1 , α2 , . . . , α D ) and |α| = α1 + α2 + · · · + α D . pend on the feature size. In view of the scale invariant
Then, consider a homogeneous polynomial differen- magnitude measure obtained using γ = 1, it is, how-
tial invariant DL of the form ever, straightforward to correct for this phenomenon
by multiplying the response by a correction factor and
X
I Y
J
DL = ci L x αi j , (21) to define a scale compensated magnitude measure by
i=1 j=1
Mγ -norm L = t M(1−γ )/2 Dγ -norm L . (28)
where the sum of the orders of differentiation in a cer-
tain term Then, the magnitude measures in the two domains will
satisfy
X
J
|αi j | = M (22)
Mγ -norm L(x; t) = M0γ -norm L 0 (x 0 ; t 0 ). (29)
j=1
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 85

Necessity of the γ -Normalization. More generally, be detected at quite coarse scales, and the localization
one may ask what choices of normalization factors are properties may not be the best. Therefore, we propose
possible, provided that we would like to state this scal- to complement this framework by a second process-
ing property as a fundamental constraint on a scale se- ing stage, in which more refined processing is invoked
lection mechanism based on local maxima over scales for computing more accurate localization estimates. In
of normalized differential entities. Then, in fact, it can this respect, the suggested framework naturally gives
be shown that the γ -normalized derivative concept ac- rise to two-stage algorithms, with feature detection at
cording to (6) arises by necessity. coarse scales followed by feature localization at finer
In other words, the γ -normalized derivative concept scales. Whereas coarse-to-fine approaches are com-
comprises the most general class of normalization fac- mon practice in several computer vision problems, this
tors for which detection of local maxima over scales framework leads to explicit mechanisms for automatic
commutes with rescalings of the input pattern. A more selection of all the scale parameters involved.
precise formulation of this statement as well as the de- In the following, a series of theoretical and exper-
tails of the necessity proof are given in Appendix A.1. imental results will be presented showing how the
abovementioned general approach applies to differ-
Summary: General Properties of Scale Selection ent types of feature detectors expressed as polynomial
Framework. To conclude, this analysis shows that if combinations of Gaussian derivatives.
a γ -normalized homogeneous differential expression
assumes a maximum over scales at (x0 ; t0 ) in the scale-
space representation of f , then there will be a cor- 4.3. Scale-Space Signatures from Real-World Data
responding maximum over scales in the scale-space
representation of f 0 at (sx0 ; s 2 t0 ). Moreover, although Figure 2 shows the variations over scales of two simple
the magnitude of a normalized derivative at a local max- differential expressions formulated in terms of normal-
imum over scales is not scale invariant unless γ = 1, it ized derivatives. It shows the result of computing the
is possible to compensate for this phenomenon and to trace and the determinant of the normalized Hessian
define scale invariant magnitude descriptors also when matrix by (with γ = 1)
γ 6= 1.
trace Hnorm L = t γ ∇ 2 L = t γ (L x x + L yy ), (30)
¡ ¢
4.2. The Scale Selection Mechanism in Practice det Hnorm L = t 2γ L x x L yy − L 2x y , (31)

So far, we have proposed a general methodology for for two details in an image of a field of sunflowers.2
scale selection by detecting local maxima in feature These graphs are called the scale-space signatures of
responses over scales. A fundamental problem that re- trace Hnorm L and det Hnorm L, respectively.
mains to be solved in this context concerns what dif- As can be seen, the maximum over scales in the top
ferential expressions to use. Is any differential invari- row of Fig. 2 is assumed at a finer scale than in the
ant feasible? Here, we shall not attempt to answer this bottom row. A more detailed examination of the ratio
question. Let us instead contend that the differential ex- between the scale values3 where the graphs attain their
pression should at least be determined so as to capture maxima over scales shows that when the scale param-
the types of image structures under consideration. eter is measured in dimension length, this scale ratio is
The general approach to scale selection that is pro- roughly equal to the ratio of the diameters of the sun-
posed is to use these locally maximal responses over flowers in the centers of the two images, respectively.
scales in the stage of detecting image features, i.e., This example illustrates that results in agreement with
when establishing the existence of different types of the proposed scale selection principle can be obtained
image structures. Basically, the scale at which a maxi- for real-world data (i.e., signals having a much richer
mum over scales is attained will be assumed to give in- frequency content than a single sine wave).
formation about how large a feature is, in analogy with The reason why these particular differential expres-
the common approach of taking the spatial position at sions have been selected here is because they consti-
which the maximum operator response is assumed as tute useful differential entities for blob detection; see
an estimate of the spatial location of a feature. In cer- e.g. (Marr, 1982; Voorhees and Poggio, 1987; Blostein
tain situations, this implies that image features may and Ahuja, 1989). Before we turn to the problem of
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

86 Lindeberg

Figure 2. Scale-space signatures of the trace and the determinant of the normalized Hessian matrix computed for two details of a sunflower
image; (left) grey-level image, (middle) signature of (trace Hnorm L)2 , (right) signature of (det Hnorm L)2 . (The signatures have been computed
at the central point in each image. The horizontal axis shows effective scale, essentially the logarithm of the scale parameter, whereas the scaling
of the vertical axis is linear in the normalized operator response.)

expressing an integrated blob detector with automatic mally, an extremum path of a differential entity Dnorm L
scale selection, however, let us describe a further ex- is defined (from the implicit function theorem) as a set
tension of the general scale selection idea. of points (r (t); t) ∈ R N × R+ in scale-space such that
for any t ∈ R+ the point r (t) is a local extremum of the
mapping x 7→ (Dnorm L)(x; t),
4.4. Simultaneous Detection of Interesting
Points and Scales © ª
(x; t) ∈ R N × R+
© ª
In Fig. 2, the signatures of the normalized differen- = (r (t); t) ∈ R N × R+ : (∇(Dnorm L))(r (t); t)=0 .
tial entities were computed at the central point in each
image. These points were deliberately chosen to co- At points at which extrema in the scale-space signature
incide with the centers of the sunflowers, where the are assumed, the derivative along the scale direction is
blob response can be expected to be maximal under zero as well. Hence, it is natural to define a normalized
spatial perturbations. In a real-world vision situation, scale-space extremum of a differential entity Dnorm L as
however, we cannot assume such points to be known a point (x0 ; t0 ) ∈ R N × R+ in scale-space which is si-
a priori. Moreover, we can expect that the spatial max- multaneously a local extremum with respect to both the
imum of the operator response is assumed at different spatial coordinate and the scale parameter.4 In terms
positions at different scales. This is one example of of derivatives, such points satisfy
the well-known fact that scale-space smoothing leads
to shape distortions. (∇(Dnorm L))(x0 ; t0 ) = 0,
(32)
Therefore, a more general approach to scale selec- (∂t (Dnorm L))(x0 ; t0 ) = 0.
tion from local extrema in the scale-space signature is
by accumulating the signature of any normalized differ- These normalized scale-space extrema constitute natu-
ential entity Dnorm L along the path r : R+ → R N that ral generalizations of extrema in the spatial domain,
a local spatial extremum in Dnorm L describes across and can serve as natural interest points for feature
scales. The mathematical framework for describing detectors formulated in terms of spatial maxima of
such paths is described in (Lindeberg, 1994a). For- differential operators, such as blob detectors, junction
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 87

detectors, symmetry detectors, etc. Specific examples graphically illustrated by a circle centered at the point
of this idea will be worked out in more detail in the at which the spatial maximum is assumed and with the
following sections.5 size determined such that the radius (measured in pixel
Trivially, the invariance properties of local maxima units) is proportional to the scale at which the maxi-
over scales under rescalings of the input signal transfer mum over scales is assumed (measured in dimension
to scale-space maxima. Hence, if a normalized scale- length). To reduce the number of blobs, a threshold on
space maximum is assumed at (x0 ; t0 ) in the scale- the maximum normalized response has been selected
space representation of a signal f , then in a rescaled such that the 250 blobs having the maximum normal-
signal f 0 defined by f 0 (sx) = f (x), a correspond- ized responses according to (28) remain.
ing scale-space maximum is assumed at (sx0 ; s 2 t0 ) in The bottom row shows the result of superimposing
the scale-space representation of f 0 . these circles onto a bright copy of the original image,
as well as corresponding results for the normalized
scale-space extrema of the square of the determinant
5. Blob Detection with Automatic Scale Selection of the Hessian matrix. Corresponding experiments for
a synthetic pattern (analysed in Section 5.1) are given
Figure 3 shows the result of detecting scale-space ex- in Fig. 4. Observe how these conceptually very sim-
trema of the normalized Laplacian in an image of a ple differential geometric descriptors give a very rea-
sunflower field. Every scale-space maximum has been sonable description of the blob-like structures in the

Figure 3. Normalized scale-space maxima computed from an image of a sunflower field: (top left): Original image. (top right): Circles
representing the 250 normalized scale-space maxima of (trace Hnorm L)2 having the strongest normalized response. (bottom left): Circles
representing scale-space maxima of (trace Hnorm L)2 superimposed onto a bright copy of the original image. (bottom right): Corresponding
results for scale-space maxima of (det Hnorm L)2 .
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

88 Lindeberg

Figure 4. The 250 most significant normalized scale-space extrema detected from the perspective projection of a sine wave (with 10% added
Gaussian noise).

image (in particular concerning the blob size) consid- sunflower image. Here, each scale-space maximum
ering how little information is used in the processing. has been visualized by a sphere centered at the posi-
Figure 5 shows a three-dimensional illustration of tion (x0 ; t0 ) in scale-space at which the maximum is
the multi-scale blob descriptors computed from the assumed, with the radius proportional to the selected

Figure 5. Three-dimensional view of the 150 strongest scale-space maxima of the square of the normalized Laplacian of the Gaussian computed
from the sunflower image. (A dark copy of the original grey-level image is shown in the ground plane, and the vertical dimension represents
scale.)
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 89

scale, and the brightness increasing with the signifi- is simple. It is assumed at
cance of the blob. Observe how the size variations in

the image data are reflected in the spatial variations of tdet HL = t1 t2 , (37)
the image descriptors.
and for both trace Hnorm L and det Hnorm L the scale at
5.1. Analysis for Idealized Model Patterns which the scale-space maximum is assumed reflects a
characteristic size of the blob.
Whereas the theoretical analysis in Section 4.1 applies
Example 2. Another interesting special case to con-
generally to large classes of differential invariants and
sider is a periodic signal defined as the sum of two
input signals, one may ask how the scale selection
perpendicular sine waves,
method for blob detection performs in specific situa-
tions. In this section, we shall study two model patterns
for which a closed-form solution of the diffusion equa- f (x, y) = sin ω1 x + sin ω2 y. (38)
tion can be calculated and a complete analytical study
hence is feasible. Its scale-space representation is

L(x, y; t) = e−ω1 t/2 sin ω1 x + e−ω2 t/2 sin ω2 y, (39)


2 2
Example 1. Consider a non-symmetric Gaussian
function
and ∇norm
2
L and det Hnorm L assume their spatial max-
f (x1 , x2 ) = g(x1 ; t1 )g(x2 ; t2 ) (33) ima at (π/2, π/2). Setting the derivative ∂t |(∇norm 2
2 −ω12 t/2 2 −ω22 t/2
L)| = ∂t (t (ω1 e + ω2 e )) to zero at this point
where gives

g(xi ; t) = √
1
e−xi /2t1
2 ¡ ¢ ¡ ¢
ω12 2 − ω12 t e−ω1 t/2 + ω22 2 − ω22 t e−ω2 t/2 = 0.
2 2
(34)
2πt1

is a model of a√two-dimensional √ blob with charac- There is a unique solution when the ratio ω2 /ω1 is close
teristic lengths t1 and t2 in the coordinate direc- to one, and three solutions when the ratio is sufficiently
tions. From the semi-group property of the Gaus- large. Hence, there is a unique maximum over scales
sian g(·; t A ) ∗ g(·; t B ) = g(·; t A + t B ) it follows that the when ω2 /ω1 is close to one, and two maxima when the
scale-space representation of f is ratio is sufficiently large. (The bifurcation occurs when
ω2 /ω1 ≈ 2.4.) In the special case when ω1 = ω2 = ω0 ,
L(x1 , x2 ; t) = g(x1 ; t1 + t)g(x2 ; t2 + t). (35) the maximum over scales is assumed at
2
After a few algebraic manipulations it can be shown ttrace HL = . (40)
that for any t1 , t2 > 0 there is a unique maximum over ω02
scales in
Similarly, setting ∂t |(det Hnorm L)(π/2, π/2; t)| =
¯¡ 2 ¢ ¯ ∂t (t 2 ω12 e−ω1 t/2 ω22 e−ω2 t/2 ) = 0 gives that the maximum
2 2

¯ ∇ ¯ t (t1 + t2 + 2t)
norm L (0, 0; t) = . over scales in det Hnorm L is at
2π(t1 + t)3/2 (t2 + t)3/2

In the case when t1 = t2 = t0 , this maximum over scales tdet HL =


4
. (41)
is given by ω12 + ω22
¡ 2 ¢
∂t ∇norm L (0, 0; t) = 0 ⇔ t = t0 . (36) Hence, for both the Gaussian blob model and the
periodic sine waves, these specific results agree with
The closed-form solution for the maximum over scales the suggested general scale selection principle.√When
in the scale parameter is measured in units of σ = t, the
scale levels, at which the maxima over scales are as-
t2 sumed, reflect the characteristic length of the structures
|det Hnorm (0, 0; t)| =
4π 2 (t1 + t)2 (t2 + t)2 in the signal.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

90 Lindeberg

Figure 6. The 50 strongest spatial responses to the Laplacian operator computed at the scale levels: (a) t = 4.0, (b) t = 16.0, and (c) t = 64.0.
Observe how this blob detector leads to a bias towards image structures of a certain size.

5.2. Comparisons with Fixed-Scale Blob Detection In (Lindeberg and Gårding, 1993; Gårding and
Lindeberg, 1996) an application to estimation of sur-
In view of these results, it is interesting to compare this face shape is presented, where: (i) scale-space max-
blob detector with automatic scale selection to a stan- ima guide the computation of regional image texture
dard multi-scale blob detector operating at a fixed scale. descriptors and (ii) scale information serves as a cue to
Figure 6 shows the result of computing spatial maxima three-dimensional surface shape.
at different scales in the response of the Laplacian op- In (Wiltschi et al., 1997) it is demonstrated how scale
erator from the sine wave pattern in Fig. 4. At each information from a blob detection step can be incorpo-
scale, the 50 strongest responses have been extracted. rated into a pattern classifier.
As can be seen, small blobs are given the high-
est relative ranking at fine scales, whereas large blobs
are given the highest relative ranking at coarse scales. 6. Junction Detection with Automatic
Hence, a blob detector operating at a single predeter- Scale Selection
mined scale induces a bias towards image structures
of a corresponding size. On the other hand, if we use A similar approach can be used for detecting corners
the proposed methodology for blob detection based on in grey-level images. In this section, it will be shown
the detection of scale-space maxima, we obtain a con- how a multi-scale junction detector can be formulated
ceptually clean way of handling image structures of all in terms of the scale-space maxima of a normalized
sizes (within a given scale range) in a similar manner. differential invariant.

5.3. Applications of the Scale Selection Method 6.1. Detection Scales from Scale-Space Maxima

Following the previously presented arguments, we ar- A commonly used entity for junction detection is the
gue that a scale selection mechanism is an essential curvature of level curves in intensity data multiplied
complement to any blob detector aimed at handling by the gradient magnitude (Kitchen and Rosenfeld,
large size variations in the image structures. In addi- 1982; Dreschler and Nagel, 1982; Koenderink and
tion, scale information associated with such adaptively Richards, 1988; Noble, 1988; Deriche and Giraudon,
computed image descriptors may serve as an important 1990; Blom, 1992; Florack et al., 1992; Lindeberg,
cue in its own right. 1994a). A special choice is to multiply the level curve
In (Bretzner and Lindeberg, 1996, 1997) a feature curvature by the gradient magnitude raised to the power
tracker is presented, where (i) the scale information of three. This is the smallest value of the exponent that
is a key component for matching image features over leads to a polynomial expression
time, and (ii) the scale selection mechanism is essential
to capture objects under large size variations. κ̃ = L 2x2 L x1 x1 − 2L x1 L x2 L x1 x2 + L 2x1 L x2 x2 .
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 91

Moreover, spatial maxima of this operator are pre- Figure 7 shows the result of applying this operator to a
served under affine transformations. The correspond- grey-level image at a number of different scales. The
ing normalized differential expression is obtained by results are displayed in two ways; (i) as grey-level im-
replacing each derivative operator ∂xi by t γ /2 ∂xi , which ages showing the scale-space representation L as well
gives as the junction response κ̃ 2 at each scale, and (ii) in
terms of the 50 strongest spatial maxima of κ̃ 2 , respec-
κ̃norm = t 2γ κ̃. (42) tively, extracted at the same scale levels. As can be

Figure 7. Junction responses at different scales computed from a noisy image containing a number of ideal sharp corners as well as rounded
and diffuse corners. As can be seen, different types of junction structures give rise to different types of responses at different scales. Notably,
certain diffuse junction structures fail to give rise to dominant responses at the finest levels of scale. (Image size: 320 ∗ 240 pixels.)
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

92 Lindeberg

seen, qualitatively different responses are obtained at space maxima of κ̃norm


2
from the same grey-level image.
different scales. At fine scales, the strongest responses Each scale-space maximum has been graphically illus-
are obtained for the sharp corners and for a number of trated by a circle centered at the point at which the
spurious fine-scale perturbations along edges. Then, maximum is assumed, and with the size determined
with increasing values of the scale parameter, the se- such that the radius is proportional to the scale at which
lectivity to junction-like structures increases. In par- the maximum over scales was assumed (measured in
ticular, the diffuse (non-sharp) and strongly rounded dimension length). To reduce the number of junction
corners only give rise to strong responses at coarse candidates, the scale-space maxima have been sorted
scales. In summary, this example illustrates the fol- with respect to a saliency measure. This measure has
lowing basic aspects of multi-scale corner detection: been determined as the magnitude of the normalized
response according to (28) multiplied by the scale pa-
• If we are only interested in ideally sharp corners, i.e.,
rameter measured in dimension area, so as to approx-
corners which can be well approximated by straight
imate the area of the spatial support region of each
lines and the intensity contrast across the edge is high
scale-space maximum. Finally, the 50 most significant
and corresponds to a step edge, then it is sufficient
blobs according to this ranking have been displayed.
to use a fine scale in the detection stage. The main
Of course, thresholding on the magnitude of the op-
motivation for using coarser scales on such data is to
erator response constitutes a coarse selective mecha-
reduce the number of false positives.
nism for feature detection. Nevertheless, note that this
• If we are interested in capturing rounded corners and
operation gives rise to a set of junction candidates with
corners for which the intensity variations across the
reasonable interpretations in the scene. Moreover, ob-
edges around the junction are slow (diffuse corners),
serve that the circles representing the scale-space ex-
then it is essentially necessary to use a coarse scale
trema constitute natural regions of interest around the
if we want to have strong spatial maxima in the re-
candidate junctions. In particular, the selected scales
sponse of κ̃ 2 at such image structures.
reflect the diffuseness and the spatial extent of the cor-
Specifically, this example shows that if no a priori in- ners, such that coarser scales are, in general, selected
formation is available about what can be expected to for the diffuse corners than for sharp ones.
be in the scene, then a mechanism for automatic scale
selection is essential to capture corner structures at dif-
ferent scales. 6.2. Analysis for Diffuse Junction Models
Figure 8 shows the result of including such a scale se-
lection mechanism in the junction detector. It shows the To obtain an intuitive understanding of the qualitative
result of detecting the 50 strongest normalized scale- behaviour of the scale selection method in this case, let

Figure 8. Junction detection with automatic scale selection: The result of computing the 50 most significant normalized scale-space extrema
of κ̃norm
2 from a grey-level image containing sharp straight edges as well as diffuse and rounded edges (with γ = 1). Compare with Fig. 7 and
observe how corner structures at different scales are captured by this operation.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 93

us analyse a simple junction model for which a closed- Gaussian blob is


form analysis can be carried out without too much
effort. L(x1 , x2 ; t) = g(x1 ; t1 + t)g(x2 ; t2 + t). (48)

Diffuse Step Junction. Consider Differentiation and insertion into (41) shows that the
absolute value of the rescaled level curve curvature as-
f (x1 , x2 ) = 8(x1 ; t0 )8(x2 ; t0 ) (43) sumes its spatial maximum

as a simple model of a diffuse L-junction, where t 2γ


|κ̃norm |max = (49)
8(·; t0 ) describes a diffuse step edge 12eπ 3 (t1 + t)5/2 (t2 + t)5/2
Z xi 3x 2 3x 2
8(xi ; t0 ) = g(x 0 ; t0 ) d x 0 (44) on the ellipse 2(t1 +t)
1
+ 2(t2 +t)
2
= 1. In the special case
x 0 =−∞ when γ = 1, the maximum over scales is assumed at

with diffuseness t0 . From the semi-group property of Ãs !


(t1 + t2 ) 96t1 t2
the Gaussian kernel it follows that the scale-space tκ̃ = 1+ −1 , (50)
representation L of f is 12 (t1 + t2 )2

L(x1 , x2 ; t) = 8(x1 ; t0 + t)8(x2 ; t0 + t). (45) whereas when t1 = t2 = t0


After differentiation, and using the fact that L x1 x1 = tκ̃ = t0 . (51)
5 − 2γ
L x2 x2 = 0 at √
the origin, as well as 8(0; t) = 1/2 and
g(0; t) = 1/ 2πt, we obtain Interpretation of the Qualitative Behaviour. To
conclude, the junction response κ̃norm
2
can for γ = 1 be
t 2γ
|κ̃norm (0, 0; t)| = . (46) expected to increase with scales when a single corner
8π 2 (t0 + t)2 model of infinite extent constitutes a reasonable ap-
proximation. On the other hand, κ̃norm
2
can be expected
When γ = 1, this entity increases monotonically with
to decrease with scales when so much smoothing is
scale, whereas for γ ∈ ]0, 1[, κ̃norm (0, 0; t) assumes a
applied that the overall shape of the object is substan-
unique maximum over scales at
tially distorted (and neighbouring junctions interfere
γ with each other or disappear altogether).
tκ̃ = t0 . (47) Hence, selecting scale levels (and spatial points)
1−γ
where κ̃norm
2
assumes maxima over scales can be ex-
Non-Uniform Gaussian Blob. A limitation of the pected to give rise to scale levels in the intermediate
abovementioned analysis is that the signature is com- scale range, where a finite extent junction model con-
puted at a fixed point, whereas the maximum in κ̃ 2 stitutes a reasonable approximation. In particular, this
can be expected to drift due to scale-space smoothing. approach will lead to larger scale values for corners
Unfortunately, the equation that determines the posi- having large spatial extent, and prevent too fine scales
tion of the spatial maximum in κ̃ 2 over scales is non- from being selected at rounded junctions.
trivial to handle (it contains a non-linear combination
of the Gaussian function, the primitive function of the 6.3. Experiments: Scale-Space Signatures
Gaussian, and polynomials). Carrying out a closed- in Junction Detection
form analysis along non-vertical extremum paths is,
however, straightforward for the previously treated Figure 9 illustrates these effects for synthetic L-junc-
non-uniform Gaussian blob model. This function can tions with varying degrees of diffuseness. It shows
be regarded as an approximate model of the behaviour simulation experiments with scale-space signatures of
at so coarse scales in scale-space that the shape distor- κ̃norm accumulated in two different ways: (i) a verti-
tions are substantial and the overall shape of a finite- cal signature obtained by computing κ̃norm at the fixed
size object is severely affected. From (35) we have central point at different scales, and (ii) a path signa-
that the scale-space representation of the non-uniform ture obtained by tracking the spatial extremum in κ̃norm
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

94 Lindeberg

Figure 9. Scale-space signatures of κ̃norm for synthetic L-junctions with different degrees of diffuseness (top t = 4.0, bottom t = 64.0). (left)
original grey-level image, (middle) path signature of κ̃norm accumulated by tracking a spatial maximum in κ̃norm across scales, (right) vertical
signature of κ̃norm accumulated at the central point.

Figure 10. Scale-space signatures of κ̃norm for diffuse L-junctions (t0 = 1.0) of different spatial extent (1/4 and 1/16 of the image size). (left)
original grey-level image, (middle) path signature of κ̃norm accumulated by tracking a spatial maximum in κ̃norm across scales, (right) vertical
signature of κ̃norm accumulated at the central point.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 95

across scales. As can be seen, the qualitative behaviour at coarse scales, whereas the superimposed corner
is in agreement with the approximate analysis in previ- structures of smaller size give rise to scale-space max-
ous section—with increasing degree of diffuseness the ima at finer scales.
values of κ̃norm become smaller at fine scales. More results on corner detection, including a com-
Figure 10 shows the result of replacing the infinite plementary mechanism for accurate corner localiza-
extent L-junction model by junction models of finite tion, are presented in Section 7.
size. Observe that the peak in the signature is assumed
at finer scales when the spatial extent of the junction is 7. Feature Localization with Automatic
decreased. In other words, the scale at which the maxi- Scale Selection
mum over scales is assumed indicates the spatial extent
(the size) of the region for which a junction model is The scale selection methodology presented so far ap-
consistent with the local grey-level pattern (in agree- plies to the detection of image features, and the role
ment with the suggested scale selection principle). of the scale selection mechanism is to estimate the
Figure 11 gives a three-dimensional illustration approximate size of the image structures the feature
of this junction detector with automatic scale selec- detector responds to. Whereas this approach provides
tion. It shows scale-space maxima of κ̃norm2
computed a conceptually simple way to express various feature
from a synthetic image containing corner structures at detectors, such as a junction detector, which automat-
different scales. The original grey-level image is shown ically adapts its scale levels to the local image struc-
in the ground plane, and each scale-space maximum ture, it is not guaranteed that the spatial positions of
has been graphically visualized by a sphere centered at the scale-space maxima constitute accurate estimates
the position (x0 ; t0 ) in scale-space at which the scale- of the corner locations. The local maxima over scales
space maximum was assumed. (The height over the may be assumed at rather coarse scales, where the drift
image plane reflects the selected scale.) Observe how due to scale-space smoothing is substantial and adja-
the large scale corner as a whole gives rise to a response cent features may interfere with each other. For this

Figure 11. Three-dimensional view of scale-space maxima of κ̃norm 2 computed for a large scale corner with superimposed corner structures at
finer scales. Observe that a coarse scale response is obtained for the large scale corner structure as a whole, whereas the superimposed corner
structures of smaller size give rise to scale-space maxima at finer scales.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

96 Lindeberg

reason, it is natural to complement the initial feature


detection step by an explicit feature localization stage.
The subject of this section shows how mechanism
for automatic scale selection can be formulated in this
context, by minimizing normalized measures of incon-
sistency over scales.

7.1. Corner Localization by Local Consistency

Given an approximate estimate x0 of the location and Figure 12. Minimizing (53) corresponds to finding the point x that
the size s of a corner (computed according to Section 6), minimizes the distance to all edge tangents in a neighbourhood of
an improved estimate of the corner position can be the given candidate junction point x0 .
computed as follows: Following (Förstner and Gülch,
1987), consider at every point x 0 ∈ R2 in a neighbour-
hood of a junction candidate x0 , the line l x 0 perpendic-
and the minimization problem be expressed as a stan-
ular to the gradient vector (∇ L)(x 0 ) = (L x1 , L x2 )T (x 0 )
dard least squares problem
at that point:

min x T Ax − 2x T b + c ⇔ Ax = b, (54)
0 0
Dx 0 (x) = ((∇ L)(x )) (x − x ) = 0. T
(52) x∈R2

where x = (x1 , x2 )T , and A, b, and c are determined


Then, minimize the perpendicular distance to all lines by the local statistics of the gradient directions in a
l x 0 in a neighbourhood of x0 , i.e., determine the point neighbourhood of x0 ,
x ∈ R2 that minimizes Z
Z A= (∇ L)(∇ L)T wx0 (x 0 ) d x 0 , (55)
x 0 ∈R2
min (Dx 0 (x))2 wx0 (x 0 ) d x 0 (53) Z
x∈R2 x 0 ∈R2
b= (∇ L)(∇ L)T x 0 wx0 (x 0 ) d x 0 , (56)
x 0 ∈R2
for some window function wx0 : R2 → R centered at Z
x 0 (∇ L)(∇ L)T x 0 wx0 (x 0 ) d x 0 . (57)
T
the candidate junction x0 . Minimizing this expression c=
x 0 ∈R2
corresponds to finding the point x that minimizes the
weighted integral of the squares of the distances from Provided that the 2 × 2 matrix A is non-singular, the
x to all l x 0 in the neighbourhood, see Fig. 7.12. (Dx 0 (x) minimum value is given by
is the distance from x to l x 0 multiplied by the gradi-
ent magnitude, and the window function gives stronger dmin = min x T Ax − 2x T b + c
weights to points in a neighbourhood of x0 .) The inten- x∈R2
tion behind this formulation is that for an image pattern = c − b T A−1 b, (58)
containing a junction, the point x that minimizes (53)
should constitute a better estimate of the projection of and the point x that minimizes (53) is x = A−1 b.
the physical junction than x0 (see Fig. 12). Hence, an improved localization estimate can be com-
puted directly from image measurements.
Explicit Solution in Terms of Local Image Statistics.
An attractive property of the formulation in (53) is that 7.2. Automatic Selection of Localization Scales
it allows for a compact closed-form solution. After ex-
pansion, it can be written The formulation in previous section, however, leaves
Z two major problems open: How to choose the window
min (x − x 0 )T (∇ L)(∇ L)T (x − x 0 )wx0 (x 0 ) d x 0 , function wx0 and the scale(s) for computing the gradient
x∈R2 x 0 ∈R2 vectors?
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 97

• The problem of choosing the weighting function is duce to relate minimizations at different scales. The
a special case of a common scale problem in least particular choice of norm(t) = trace A implies that the
squares estimation: Over what spatial region should normalized residual
the fitting be performed? Clearly, it should be large
enough such that statistics of gradient directions is R 0
x 0 ∈R2 R|((∇ L)(x ))T (x − x 0 )|2 wx0 (x 0 ) d x 0
accumulated over a sufficiently large neighbourhood d˜ = min 0 2 0 0
around the candidate junction. Nevertheless, the re- x∈R2 x 0 ∈R2 |(∇ L)(x )| wx0 (x ) d x
gion must not be so large that interfering structures (60)
corresponding to other junctions are included.
• The second scale problem, on the other hand, is of
has dimension [length]2 and can be interpreted as a
a slightly different nature than the previous ones—
weighted estimate of the localization error. Specifi-
it concerns what scales should be used for localiz-
cally, scale selection by minimizing the normalized
ing image structures. Previously, in this paper, only
residual r̃ (60) over scales, corresponds to selecting
the problem of detecting image structures has been
the scale that minimizes the estimated inaccuracy in
treated.
the localization estimate.
Here, the following solutions are proposed: This principle for selecting localization scales im-
plies that we take as localization scale the scale that
Selection of Window Function and Spatial Points from gives the maximum consistency between the distribu-
the Detection Step. When computing A, b, and c tion of gradient directions in a neighbourhood of x0
above, let the window function wx0 be a Gaussian func- and a local (qualitative) junction model. More specific
tion centered at the point x0 at which κ̃norm
2
assumed its motivations behind this choice can also be expressed
scale-space maximum, and let the scale value of this as follows:
window function be proportional to the detection scale At very fine scales, where a large amount of noise
t0 at which the maximum over scales in κ̃norm 2
was as- and interfering fine-scale structures can be expected to
sumed. be present, the first-order derivative operators will re-
The idea behind this approach is that the detection spond mainly to such structures. Hence, the gradient
scale should reflect a representative region around the directions can be expected to be roughly randomly dis-
candidate junction, such that larger regions are se- tributed, and the residual dmin will, in general, be large.
lected for corners with large spatial extent than for At coarser scales in scale-space, such fine-scale struc-
corners with small extent. Experimentally, this has tures will be suppressed and the locally computed gra-
been demonstrated to be the case in a large number dient directions will be better aligned to the underlying
of situations (compare with the qualitative results in corner structure. Thus, when smoothing is necessary,
Sections 6.2 and 6.3).6 the residual will decrease. On the other hand, if too
much smoothing is applied, then the shape distorting
Selection of Localization Scale: Minimize the Normal- effects of scale-space smoothing will be dominant and
ized Residual. Clearly, the gradient estimates used for the residual can be expected to increase again. Hence,
computing A, b, and c must be computed at a certain selecting the minimum gives a natural trade-off be-
scale. To determine this localization scale, it is nat- tween these two effects.
ural to select the scale that minimizes the normalized
residual dmin in (60) over scales. Behaviour at Ideal Sharp Polygon-Type Junctions.
This scale selection criterion corresponds to extend- Note, in particular, that for an ideal (sharp) step junc-
ing the minimization problem (54) from a single scale tion, the localization scale given by this method will
to optimization over multiple scales always be zero in the noise free case. This can be easily
understood by observing that for an ideal polygon-type
x T Ax − 2x T b + c junction (consisting of regions of uniform grey-level
min min
t∈R+ x∈R2 norm(t) delimited by straight edges), all edge tangents meet at
c − b T A−1 b the junction point, which means that the residual d˜min is
= min min (59) exactly zero. Thus, any amount of smoothing increases
t∈R+ x∈R2 trace A
the residual, and the minimum value will be assumed
where the normalization factor norm(t) has been intro- at zero scale.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

98 Lindeberg

Figure 13. Scale-space signatures of the normalized residual at a synthetic sharp T -junction (t0 = 0.0) for different amounts of added white
Gaussian noise (ν = 1.0 and 30.0): (left) grey-level image, (middle left) signature of normalized residual d˜min accumulated at the central point,
(middle right) signature of the error in the localization estimate relative to the true corner position in the unsmoothed image, (right) localization
estimate computed at the scale at which d˜min assumes its minimum over scales (illustrated by a circle overlayed onto a bright copy of the image
smoothed to that scale).

7.3. Experiments: Choice of Localization Scale discussion above. There is a clear minimum over scales
in each scale-space signature, and the minimum over
Figures 13 and 14 show the result of applying this scale scales is assumed at coarser scales when the noise level
selection mechanism to a sharp and a diffuse corner is increased. The position of the minimum in d˜min over
with different amounts of added white Gaussian noise. scales agrees well with the position of the minimum
As can be seen, the results agree with the qualitative over scales in the absolute error measure (defined as

Figure 14. Corresponding results for a synthetic diffuse T -junction (t0 = 64.0).
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 99

Table 1. The result of applying the junction localization method to a synthetic T -junction
with different amounts of added white Gaussian noise. For each noise level, this table
gives the scale at which the normalized residual assumes its minimum over scales, as
well as the scale at which the estimate with the minimum absolute error over scales is
obtained. Moreover, numerical values of the two error measures are given at these scales.
As can be seen, the selected scales increase with the noise level, and the scale at which
the normalized residual assumes its minimum over scales serves as a reasonable estimate
of a scale at which a near optimal localization estimate over scales is obtained.

Junction localization with automatic scale selection: T-junction

Noise Selected Normalized Absolute Reference Normalized Absolute


level scale residual error scale residual error

0.01 1.2 1.0 0.9 1.0 1.0 0.9


0.03 2.8 2.3 1.4 2.1 2.5 1.4
0.1 8.1 6.7 5.1 6.3 6.8 4.9
0.3 18.8 13.9 16.6 13.4 15.0 14.7
1.0 50.3 28.5 116.0 28.2 39.2 96.3

the distance between the estimated and the true posi- which the normalized residual assumes its minimum
tion of the corner).7 Moreover, slightly coarser scales over scales is only slightly higher than the minimum
are selected for the diffuse junction than for the sharp absolute error over all scales. In this respect, mini-
one. mization of d˜norm over scales gives a near-optimal lo-
Table 1 gives a numerical illustration of basic pro- calization estimate.
perties of this scale selection method for junction lo- Moreover, whereas the error estimates assume rather
calization. It shows the result of applying one iteration high values for a single application of the junction
of the junction localization method to a T-junction with localization scheme when the noise level is high,
90 degree opening angles, and the results are shown in the localization erro can be decreased substantially
terms of the following six measures as function of the by applying the junction localization scheme itera-
noise level: tively.
Figure 15 shows the result of applying such an ite-
• the selected scale tdmin obtained by minimizing the
riative junction localization stage to the junction can-
normalized residual over scales,
didates in Fig. 8. For each scale-space maximum, an
• the normalized residual at the selected scale,
individual scale selection process has been invoked
• the absolute error in the localization estimate at the
consisting of the following processing steps:
selected scale,
• the scale tabs at which the localization estimate with
• The signature of the normalized residual has been ac-
the minimum absolute error is obtained,
cumulated using a window function with scale value
• the normalized residual at tabs ,
equal to the detection scale of the scale-space maxi-
• the minimum actual error of the localization esti-
mum.
mates computed at all scales.
• The minimum over scales in the signature of d˜min
All descriptors have been computed at a position with has been detected, and a new localization estimate
10 pixels horizontal and vertical offset from true corner has been computed using x = A−1 b.
position. • This procedure has been repeated iteratively until
The results show that the normalized residual serves either the difference between two successive local-
as an estimate of the inaccuracy in the corner localiza- ization estimates is less than one pixel or the num-
tion estimate, and specifically that the scale at which the ber of iterations has reached an upper bound (here 3
minimum over scales in d˜min is assumed is a reasonable iterations).
estimate of the scale at which we have the localization • Junction candidates for which the localization es-
estimate with the minimum absolute error. Whereas timate fall outside the support region of the orig-
the correspondence between the two error measures is inal scale-space maximum have been classified as
not perfect, the absolute error computed at the scale at “diverged” and been suppressed.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

100 Lindeberg

Figure 15. Improved localization estimates for the junction candidates in Fig. 8. Each junction has been graphically illustrated by a circle
centered at the new location estimate. In the left image, the size reflects the detection scale, whereas in the right image, the size reflects the
localization scale.

• Each remaining junction candidate has been illus- scale. Then, at the scale at which the minimum in
trated by a circle with radius proportional to the de- d˜min is assumed, compute an improved localization
tection scale or the localization scale. estimate using

Observe how the localization is improved by this post- x = A−1 b.


processing step, and moreover, that the selected local-
ization scales serve as estimates of the spatial localiza- 3. Iterations. Optionally, repeat step 2 until the incre-
tion error. ment is sufficiently small.

7.5. Further Experiments


7.4. Composed Scheme for Junction Detection
and Junction Localization
Figures 16 and 17 show the result of applying the com-
posed two-state junction detection scheme to four in-
To summarize, the composed two-stage scheme for
door images containing different types of junctions. As
junction detection and junction localization consists of
can be seen, very reasonable sets of junction candidates
the following processing steps:8
are obtained, and the support regions of the scale-space
blobs again serve as natural regions of interest around
1. Detection. Detect scale-space maxima in the nor-
the features.
malized rescaled level curve curvature
Concerning the number of junction candidates to be
processed and passed on to later processing stages, we
κ̃norm
2
= (t 2γ κ̃)2 .
have not made any attempts here to decide automat-
ically how many of the extracted junction candidates
This generates a set of junction candidates.
correspond to physical junctions in the world. We argue
2. Localization. For each junction candidate, accu-
that such decisions require integration with higher level
mulate the scale-space signature of the normalized
reasoning and verification processes, and may be ex-
residual
tremely hard to make at the earliest processing stages,
c − b T A−1 b unless additional information is available about the ex-
d˜min = ternal conditions.9 For this reason, this module only
trace A
aims at computing an early ranking of image features
with A, b, and c computed according to (55)–(57), in order of significance, which can be used by a vision
and using a window function reflecting the support system for processing features in decreasing order of
region of the scale-space maximum at the detection significance.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 101

Figure 16. Results of composed two-stage junction detection followed by junction localization for two different grey-level images. (top row)
original grey-level image, (middle and bottom rows) the N strongest junction candidates for different values of N .
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

102 Lindeberg

Figure 17. Results of composed two-stage junction detection followed by junction localization for two different grey-level images. (top row)
original grey-level image, (middle and bottom rows) the N strongest junction candidates for different values of N .
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 103

junctions have been processed. As can be seen, higher


order junctions are handled in a similar way as lower
order junctions.
Table 2 shows numerical values exemplifying how
large the localization errors can be in the different pro-
cessing stages. Typically, sub-pixel accuracy is ob-
tained, down to the order of 0.1–0.2 pixels for low
noise levels.

7.6. Applications of Corner Detection


with Automatic Scale Selection

An attractive property of this junction detector is that it


associates a region of interest with each junction can-
didate. We argue that this attribute information consti-
tutes an important cue when using these junctions in
later stage processes.
In (Lindeberg and Li, 1995, 1997) it is shown how
the support region associated with each junction allows
for conceptually simple matching between junctions
Figure 18. The result of applying the composed junction detec-
tion scheme to synthetic junction models with added Gaussian noise and edges based on spatial overlap only and without
(ν = 10.0). any need for providing externally determined thresh-
olds on e.g., distance. Then, the matching relations
between edges and junction cues that arise in this way
In line with this idea, the results are shown in terms of
are used in a pre-processing stage for classifying edges
the N strongest junction candidates for different (man-
into straight and curved. More generally, such relations
ually chosen) values of N . In Fig. 16, which contains
between edges and junctions are also useful for other
only a few objects, the 50 and the 100 strongest junc-
problems relating to object recognition (Lindeberg and
tions responses are shown. In Fig. 17, which shows
Olofsson, 1995).
corresponding examples for more cluttered scenes, the
In (Bretzner and Lindeberg, 1996) it is demonstrated
number of junctions displayed has been increased to
how these support regions can be used for simplify-
100 and 200. Notably, this number of junction can-
ing matching of junctions over time in tracking algo-
didates constitutes the only essential tuning parameter
rithms. Specifically, it is shown that the scale selec-
of the composed algorithm. In particular, no external
tion mechanism in the junction detector is essential
setting of the scale parameters is needed.
to capture junctions that undergo large size changes.
Figure 18 shows two examples of applying the com-
Moreover, the scale information associated with each
posed method to synthetic polygon-type junctions with
junction can be used as an important matching cue
added Gaussian noise. Here, the 10 most significant
in its own right and be included in the matching cri-
terion.
Table 2. Localization errors in the different process- In (Lindeberg, 1995a, 1996d) a scale selection prin-
ing stages of the composed junction detection scheme ciple for stereo matching and flow estimation is pre-
applied to the synthetic junction models in Fig. 18: (i) sented, which also involves the extension of a fixed
in the detection stage, (ii) after one localization step,
scale least squares estimation problem to optimization
(iii) after convergence of the iterative procedure.
over multiple scales.
Localization errors (noise level ν = 10.0)

Detection Localization Iteration 7.7. Extensions of the Junction Detector

L-junction 3.81 0.78 0.43 The main purpose of this section has been to make ex-
Y -junction 2.12 0.35 0.35 plicit how a scale selection mechanism can be incorpo-
4-junction 3.53 0.28 0.07
rated in junction detection.When building a stand-alone
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

104 Lindeberg

junction detector, there are a few additional mecha-


nisms which are natural to include:
Concerning the ranking on significance, we can
conceive linking the maxima of the junction resp-
onses across scales in a similar way as done in the
scale-space primal sketch (Lindeberg, 1993a), regis-
ter scale-space events such as bifurcations, and include
the scale-space lifetime of each junction response into
the significance measure.
Concerning the region of interest associated with
each junction candidate, we have throughout this work
represented the support region of a scale-space maxi-
mum by a circle with area reflecting the detection scale.
A possible limitation of this approach is that nearby
junctions may lead to interference effects in operations
such as the localization stage. If we want to reduce this
problem, a natural extension is to define the support re-
gion of a spatial maximum of a differential invariant in
a similar way as the definition of grey-level blobs from
spatial maxima in (Lindeberg, 1993a).

7.8. Edge Localization

Concerning more general applications of the proposed


methodology, it should be noted that the scale selection
Figure 19. Illustration of the result of applying minimization over
method for junction localization applies to edge detec- scales of the normalized residual d˜min to different types of edge
tion. If we compute the scale-space signature of the structures: (left) grey-level image, (middle) scale-space signature of
normalized residual d˜min according to (60) at an edge d˜min accumulated at the central point, (right) (unthresholded) edges
point, then we can interpret the scale at which the min- computed at the scale where d˜min assumes its minimum over scales.
imum over scales in d˜min occurs as the scale for which
a local edge (or corner) model is maximally consistent third row, where the grey-level image has been sub-
with data. This means that the fine scale fluctuations ject to a large amount of pre-smoothing, no additional
of the edge normals can be expected to be small. smoothing is necessary and the minimum is assumed
Figure 19 shows the result of applying this ap- at zero scale. Finally, the fourth row shows an inter-
proach to different types of edges. The columns show esting example where the minimization of d˜min over
from left to right; (i) the local grey-level pattern, scales leads to the selection of a scale level suitable for
(ii) the signature of d˜min computed at the central point, capturing a very faint shadow edge.
and (iii) (unthresholded) edges detected at the scale Concerning these experiments it should be pointed
tmin = argmin d˜min at which the minimum over scales out that they are mainly intended to demonstrate the
in d˜min was assumed. potential in applying the proposed method for select-
In the first row, we can see that when performing ing localization scales to the problem of edge detection,
edge detection at argmin d˜min we obtain coherent edge and that further processing steps are needed to give a
descriptors corresponding to the dominant edge struc- complete algorithm. A complementary scale selection
ture in this region. In the second row, a large amount of method applicable in the edge detection stage is pre-
white Gaussian noise has been added to the grey-level sented in (Lindeberg, 1996b, 1996c).
image, and the minimum over scales is assumed at a
much coarser scale. (Note that a coherent edge descrip- 8. Dense Frequency Estimation
tor is nevertheless obtained at the new argmin d˜min ,
whereas edge detection at the same scale as in the first So far, we have seen how the proposed scale selection
row gives rise to severely fragmented edges.) In the methodology applies to the detection of sparse feature
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 105

points. In certain situations, however, one is also inter- also with Section 3.) In this respect, QL serves as an
ested in computing dense image descriptors. approximate quadrature pair leading to small relative
An obvious problem that arises if we would base spatial variations near the scales given by the scale se-
a scale selection mechanism for computing dense im- lection procedure. For this reason, we propose to use
age descriptors on a partial derivative of the intensity QL as an entity to maximize over scales when com-
function, such as the Laplacian operator is that there puting dense image descriptors. For two-dimensional
would be large spatial variations in the operator re- data, we can instead consider
sponse and the selected scales. A common methodol- ¡ ¢
ogy in signal processing for reducing such a so-called QL = L 2ξ + L 2η + C L 2ξ ξ + 2L 2ξ η + L 2ηη
phase dependency is by using quadrature filter pairs ¡ ¢ ¡ ¢
= t L 2x + L 2y + Ct 2 L 2x x + 2L 2x y + L 2yy (63)
defined (from a Hilbert transform) in such a way that the
Euclidean sum of the filter responses is constant for any
defined to be rotationally symmetric and equal to the
sine wave. Since the Hilbert transform of a Gaussian
one-dimensional quadrature measure in any direction.
derivative kernel, however, is not within the Gaussian
derivative family, one may be interested in operators
Analysis for Sine Wave Patterns. The free param-
of small support which can be expressed within the
eter C determines the relative weight between the in-
scale-space framework.
formation in the first- and second-order derivatives. To
Given the normalized derivative concept, there is, a
obtain an intuitive understanding of how this parameter
straightforward way of combining Gaussian derivatives
affects the scale selection procedure, it is instructive to
into an entity that gives an approximately constant op-
analyse what scales are obtained by maximizing QL
erator response at the scale given by the scale selection
over scales for different values of C. Straightforward
mechanism. At any scale t in the scale-space repre-
differentiation of (62) gives that selected scale as func-
sentation L of a one-dimensional signal f , define the
tion of spatial position is given by
following quasi quadrature entity in terms of normal-
ized derivatives based on γ = 110 by µ ¶
1 2Cs 2
tQL (x) = 2 1 + √ (64)
QL = L 2ξ + C L 2ξ ξ = t L 2x + Ct 2 L 2x x (61) ω0 c2 + c4 + 4C 2 s 4

where C is a free parameter (to be determined later). where


By differentiating (5), it is follows that for a signal of
c = cos(ω0 x), s = sin(ω0 x), (65)
the form f (x) = sin ω0 x the quasi quadrature measure
assumes the form and that the extreme values at ω0 x = 0 and π
are inde-
2 ¡ ¡ ¢ ¢ 2
(QL)(x; t) = tω02 e−ω0 t 1 + Ctω02 − 1 sin2 ω0 x . pendent of C

(62). 1
tQ,0 = tQL |ω0 x=0 = , (66)
ω02
As can be seen, the spatial variations in QL will be
large when tω02 is either much smaller or much larger 2
tQ, π2 = tQL |ω0 x= π2 = . (67)
than one, whereas the relative oscillations decrease to ω02
zero when t approaches 1/(Cω02 ). (As will be shown
below, this scale value is of the same order of magnitude Graphs showing this variation for a few values of C are
as the scales that maximize QL over scales; compare displayed in Fig. 20. Given the form of these curves, a

Figure 20. Spatial variation of the selected scale levels when maximizing the quasi quadrature entity (61) over scales for different values of
the free parameter C using a one-dimensional sine wave of unit input frequency as input pattern. Observe that C = 2/3 gives rise to the most
symmetric variations in the selected scale values.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

106 Lindeberg

Figure 21. Spatial variation of the maximum value over scales of the quasi quadrature entity (61) computed for different values of the free
parameter C using a one-dimensional sine wave of unit input frequency as input pattern. As can be seen, the smallest spatial variations in the
amplitude of the maximum response are obtained for C = e/4.

natural symmetry requirement can be stated as In other words, the determination of C based on small
spatial variations in the magnitude measure computed
1¡ ¢
tQL |ω0 x= π4 = tQL |ω0 x=0 + tQL |ω0 x= π2 at the selected scales gives rise to an approximately
2 similar value of C as the abovementioned symmetry
2 requirement.
⇒ C = ≈ 0.6667. (68)
3 This choice of C also corresponds to normalizing the
Gaussian derivative operators of first and second order
In this respect, C = 2/3 gives the most symmetric vari-
to having the same L 1 -norm (compare with the explicit
ation of selected scale levels with respect to the in-
expressions in (74) and (75)).
formation contents in the first-order and second-order
derivatives.11
Another interesting factor to analyse is the variation Experimental Results. Figure 22 gives a three-
in magnitude at the selected scales. Insertion of the dimensional illustration of the result of applying this
scale values according to (64) into the quasi quadrature operation to the perspective image of a sine wave pat-
measure (61) gives spatial variations of as displayed in tern with large size variations. The results are shown
Fig. 21. To determine C, a simple minimum-ripple as a surface plot of the magnitude of QL computed at
condition is to require that different positions and scales along a vertical cross-
section of the image. Moreover, the position of the
QL| ω0 x=0 = QL| ω0 x= π2 first local maximum over scales has been indicated
t=tQ,0 t=tQ, π
2
at each spatial point. Observe how the size variations
e in the vertical direction are captured and that the spa-
⇒ C = ≈ 0.6796. (69)
4 tial variations in QL at the selected scales are minor

Figure 22. Dense scale selection by maximizing the quasi quadrature measure (63) over scales: (left) Original grey-level image. (right) The
variations over scales of the quasi quadrature measure Q L computed along a vertical cross-section through the center of the image. The result is
visualized as a surface plot showing the variations over scale of the quasi quadrature measure as well as the position of the first local maximum
over scales.
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 107

compared to response of an operator such as the squared 9. Analysis and Interpretation


Laplacian. of Normalized Derivatives

Extensions and Applications. In (Almansa and 9.1. Interpretation of γ -Normalized Derivatives


Lindeberg, 1996, 1998) this approach is applied for in Terms of L p -Norms
estimating the distance between ridges in fingerprint
images. For that specific problem, the following ex- Concerning the interpretation of the γ -normalized
tensions were found to be highly useful: derivatives, it can be shown (see Appendix A.2) that
in the D-dimensional case, the variation over scales
• The extension of the renormalized one-dimensional of the L p -norm of an mth order normalized Gaussian
quadrature entity in (70) to two dimensions is done derivative kernel is given by
using the ridge strength measure (Lindeberg, 1996c)
¡ ¢
Aγ2 -norm L = t 2γ2 (L x x − L yy )2 + 4L x y . √ m(γ −1)+D(1/ p−1)
kgξ m (·; t)k p = t kgξ m (·; 1)k p . (72)
This second order differential entity is also invari-
ant under rotations, but has more selective response
properties to elongated ridge-like structures than In other words, the L p -norm of the mth order Gaussian
L 2x x + 2L 2x y + L 2yy , which rather constitutes a mea- derivative kernel is constant over scales if and only if
sure of the total amount of second order information.
• The quasi quadrature entity (61) is multiplied by a 1
uniform self-similar scaling factor p= . (73)
1+ m
D
(1 − γ)
Q0 L = t −0 QL
¡ ¢
= t −0 L 2ξ + L 2η . Hence, the γ -normalized derivative concept be inter-
¡ ¢2 ¡ ¢2 preted as an L p -normalization of the Gaussian deriva-
= t (1−0)/2 L x + t (1−0/2) L x x . (70) tive kernels over scales for a specific value of p, which
depends upon γ , the dimension D as well as the order
This operation does not affect the homogeneity of
m of differentiation.
the general class of differential expressions (21), and
Notably, the perfectly scale invariant case γ = 1
with the special choice 0 = 1/2, the quasi quadrature
gives p = 1 for all orders m and corresponds to L 1 -
entity can be written
normalization of the Gaussian derivative kernels. For
¡ ¢2 ¡ ¢2
Q0 L = t γ1 /2 L x + t γ2 L x x orders up to four, the kernel norms are then in the one-
dimensional case given by
= Gγ1 -norm L + Aγ2 -norm L . (71)

In view of results presented in a companion pa- Z ∞


r
2
per (Lindeberg, 1996b, 1996c), this scale renormal- N1 = |gξ (u; t)| du =≈ 0.797885, (74)
−∞ π
ized quasi quadrature measure can be interpreted Z ∞ r
as the linear combination of an edge strength mea- 8
N2 = |gξ 2 (u; t)| du = ≈ 0.967883, (75)
sure Gγ1 -norm L with scale normalization parameter −∞ πe
γ1 = 1/2 and a ridge strength measure Aγ2 -norm L Z ∞
with scale normalization parameter γ2 = 3/4. In the N3 = |gξ 3 (u; t)| du
−∞
abovementioned sources, these specific entities and r µ ¶
normalization parameters are shown to be useful for 2 4
= 1 + 3/2 ≈ 1.51003, (76)
edge detection and ridge detection with automatic π e
scale selection. Z ∞
A main motivation for this renormalization is that N4 = |gξ 4 (u; t)| du
−∞
it gives more pronounced peaks over scales, and pre- √ µq q ¶
4 3 √ √ √
vents the monotone increase with scale that occurs = 3/2+√3/2 √ 3 − 6e 6 + 3 + 6
for edges if the scale normalization parameter γ is e π
equal to one. ≈ 2.8006. (77)
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

108 Lindeberg

9.2. Interpretation of γ -Normalized Derivatives In other words, the normalized derivative model is neu-
in Terms of Self-similar Fourier Spectrum tral with respect to power spectra of the form

Another useful interpretation of normalized derivatives S f (ω) = |ω|−D−2m(1−γ ) . (87)


can be obtained in the context of signals having a self-
similar Fourier spectrum. In the special case when D = 2 and γ = 1, this corre-
Consider a D-dimensional signal f : R D → R hav- sponds to power spectra of the form |ω|−2 .
ing a self-similar power spectrum of the form It is well-known that natural images often show a
qualitative behaviour similar to this (Field, 1987). This
S f (ω) = S f (ω1 , . . . , ω D ) = ( fˆ fˆ∗ )(ω) is in fact a direct consequence of scale invariance; the
¡ ¢−β
= |ω|−2β = ω12 + · · · + ω2D , (78) power spectrum variation

and the following class of energy measures con- S(ω) ∼ |ω|−D , (88)
cerning the amount of information in the mth order
γ -normalized Gaussian derivatives where D is the dimension of the signal, can be easily
derived12 directly from the assumption that the power
Z X spectrum should contain the same amount of energy
Em = t mγ |L x α |2 d x. (79)
x∈R D |α|=m
for all frequencies.

where α represents multi-index notation. These differ- 9.3. Relations to Previous Work
ential energy measures are related to mth order spectral
moments by L 1 -normalized Gaussian derivative kernels of first or-
Z der have been used, for example, in edge detection
t mγ
Em = |ω|2m | L̂(ω; t)|2 dω. (80) and edge classification by Korn (1988), Mallat and
(2π) D ω∈R D Zhong (1992), and Zhang and Bergholm (1993), and in
Specifically, for derivatives up to order three in the pyramids by Crowley and Parker (1984). More gener-
two-dimensional case, this class of energy measures ally, evolution properties across scales of wavelet trans-
includes the following descriptors forms have been used by Mallat and Hwang (1992)
for characterizing local Lipschitz exponents of singu-
Z
larities. Mallat and Hwang (1992) also proposed the
E0 = L(x; t)2 d x, (81)
x∈R2
notion of “general maxima” of wavelet transforms for
Z estimating the frequency of local oscillations. This idea
¡ ¢
E1 = t γ L 2ξ + L 2η d x, (82) is closely related to the notion of scale-space maximum
x∈R2 considered here and to the scale selection mechanism in
Z
¡ ¢ (Lindeberg, 1991, 1993a) based on local maxima over
E2 = t 2γ L 2ξ ξ + 2L 2ξ η + L 2ηη d x, (83)
x∈R2 scales of blob responses computed along extremum
Z paths in scale-space. There is also a connection to the
¡ ¢
E3 = t 3γ L 2ξ ξ ξ + 3L 2ξ ξ η + 3L 2ξ ηη + L 2ηηη d x. “top point” representation proposed by Johansen et al.,
x∈R2 (1986) in the sense that the points in scale-space at
(84)
which bifurcations occur serve as to delimit extremum
paths with different topology. A main difference be-
It is rather straightforward to show (see Appendix A.3)
tween the scale selection mechanism suggested here
that the variation over scales of these γ -normalized
and the work in (Lindeberg, 1991; Mallat and Hwang,
energy measures are given by
1992), however, here it is shown that how these no-
E m (·; t) ∼ t β−D/2−m(1−γ ) . (85) tions can be applied to large classes of non-linear dif-
ferential invariants computed in a scale-space repre-
This expression is scale independent if and only if sentation. Moreover, feature detection algorithms have
been formulated with integrated scale selection mech-
D anisms and it has been shown how different derivative
β= + m(1 − γ ). (86)
2 normalization approaches lead to different classes of
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 109

differential expressions for which the scale selec- detection algorithms, where features are first detected
tion mechanism commutes with rescalings of the in- at locally adapted coarse scales, and then localized
put pattern. Specifically, it has been shown how L 1 - to finer scales in a second stage processing stage.
normalization is special in terms of scale invariance Whereas the general advantages of such a two-stage
properties.13 approach to feature detection are well-known in the
literature, a major contribution here is that explicit
mechanisms are provided for automatic selection of
10. Summary and Discussion the detection scales as well as the localization scales.
Moreover, these processing stages are integrated into
We have argued that the subject of scale selection is algorithms which are essentially free from other tuning
essential to many problems in computer vision and au- parameters that the number of features of interest.
tomated image analysis. Specifically, we have outlined Of course, the task of selecting “the best scale” for
how the evolution properties over scales of normal- handling real-world image data (about which usually
ized Gaussian derivatives provide important cues in no or very little a priori information is available) is
this context—for generating hypotheses about interest- intractable if treated as a pure mathematical problem.
ing scales (and associated spatial points or regions) for Therefore, the proposed heuristic principle should not
further analysis. A general scale selection principle has be interpreted as any “optimal solution,” but rather as a
been presented stating that in the absence of other evi- systematic method for generating initial hypotheses in
dence, coarse estimates of the size of image structures situations where no or very little information is avail-
can be computed from the scales at which normalized able about what can be expected to be in the scene.
differential geometric descriptors assume maxima over
scales. In particular, it has been suggested that this ap-
10.1. Technical Contributions
proach can be used for adaptively choosing the scales
for feature detection. At a technically more detailed level some of the main
Support for the overall approach has been provided contributions are that:
in terms of a theoretical analysis of the general scaling
property of local maxima over scales in the scale-space • It is emphasized how the evolution properties over
signature, and by a detailed analysis of the behaviour scales of normalized scale-space derivatives dif-
of the scale selection method when integrated with fea- fer from those of traditional spatial derivatives.
ture detection algorithms and applied to characteristic Whereas the magnitude of a traditional scale-space
model patterns; see Table 3 for an overview. The main derivative always decreases with scale, peaks over
support of the methodology is, however, experimental; scales can be expected in the scale-signatures of nor-
it has been demonstrated that intuitively reasonable and malized derivatives computed from data containing
quantitatively accurate results can be obtained by ap- dominant information at certain scales.
plying the proposed scheme to the problems of blob de- • A general heuristic principle for scale selection has
tection, junction detection and frequency estimation.14 been proposed stating that extrema over scales in
For a problem such as junction detection, the the signature of normalized differential entities are
methodology naturally gives rise to two-stage feature useful in the stage of detecting image features. In

Table 3. Measures of feature strength and normalization parameters used for different types
of feature detectors with automatic scale selection (including results from a companion paper
(Lindeberg, 1996b, 1996c)). For each feature detector, a preferred γ -value is specified as well
as the p-value for which the L p -norm of the Gaussian derivatives is constant over scales (73)
and the β-value for which the energy of a self-similar Fourier spectrum is constant over scales
(86).
Feature type Differential entity for scale selection γ -value L p -norm Fourier β

Edge t γ /2 L v 1/2 4/5 3/2


Ridge t 2γ (L pp − L qq )
2 3/4 4/5 3/2
Corner 2γ 2
t L v L uu 1 1 1
Blob tγ ∇2 L 1 1 1
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

110 Lindeberg

particular, it has been theoretically analyzed and Appendix


experimentally demonstrated how extrema over
scales of the following differential entities can be A.1. Necessity of the Form of the γ -Parameterized
used in feature detection; Derivative Operator
—maxima and minima in the rescaled level curve
curvature are highly useful for junction detection, In this appendix it is shown that the form of the γ -
—maxima and minima of the trace and the determi- normalized derivative operator
nant of the Hessian matrix can serve as qualitative
descriptors for blob detection, and
—local maxima over scales of the pseudo quadrature ∂ξ = t γ /2 ∂x (1)
measure can be used for local frequency estima-
tion. can be derived by necessity, given natural assumptions
The problem of junction detection is treated more
on a scale selection procedure based on local max-
extensively, and the resulting method is the first
ima over scales. Let us follow the general methodology
junction detection algorithm with automatic scale
proposed in Section 4 and state the following require-
selection.
ments on a normalized derivative concept:
• It is shown that the γ -normalized derivative concept
arises by necessity given natural scale invariance re-
quirements on a scale selection mechanism. • The main idea of introducing a normalized deriva-
• It is explained how the localization of junction can- tive operator should be to compensate for the general
didates can be improved by invoking a modified decrease in amplitude caused by scale-space smooth-
Förstner operator adapted to the local image struc- ing, and to ensure that image structures are processed
ture. Specifically, it is shown how localization scales by the vision system in such a way that the results
can be selected automatically by minimizing a cer- are not critically dependent upon how large the im-
tain normalized residual across scales. age structures actually are.
The same mechanism can be used for selecting lo- In the absence of further information, the main
calization scales in edge detection, which constitute source of information for normalizing the derivative
a trade-off between the diffuseness of the edge and operator should be the value of the scale parameter.
the noise level. Moreover, this normalization should allow a differ-
• It is shown how the normalized derivative con- ential descriptor to be expressed in terms of normal-
cept can be discretized in a consistent manner (see ized derivatives in a similar way as traditional image
Appendix A.4). descriptors are expressed in terms of unnormalized
derivatives. Thereby, it is natural to perform the nor-
When combined with feature detection modules, the malization as a multiplicative factor.
general approach of scale selection can serve as a Motivated by these qualitative requirements, the
primitive mechanism for guiding the focus of attention mth order normalized derivative operator ∂ξ m at a
(Lindeberg, 1993a; Tsotsos et al., 1995) and provide certain scale t should therefore be defined from the
qualitative context information for other early visual ordinary spatial derivative operator ∂x m by
modules.

∂ξ m = ϕ(t)∂x m (2)
Acknowledgments

This work was partially performed under the ESPRIT- where ϕ : R+ → R is a smooth function.
BRA project INSIGHT and the ESPRIT-NSF collab- • To allow for scale selection based on local maxima
oration DIFFUSION. The support from the Swedish over scales, the normalized derivative concept must
Research Council for Engineering Sciences, TFR, is preserve local maxima over scales. Specifically, if a
gratefully acknowledged. normalized differential entity assumes a local maxi-
The three-dimensional illustrations in Figs. 5 and 11 mum over scales at a certain point (x0 , t0 ) in scale-
have been generated with the kind assistance of Pascal space, then after a rescaling of the input signal by
Grostabussiat, and the illustration in Fig. 22 is from a a factor of s, the maximum over scales should be
collaboration work with Andrés Almansa. assumed at (sx0 , s 2 t0 ).
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 111

Given these requirements, it follows that the normal- for some other (arbitrary) function h. If we differentiate
ization must be of the form with respect to u and v

ϕ(t) = At B (3) ψ 0 (u) − ψ 0 (u + v) = 0,


(13)
ψ 0 (u + v) + h 0 (v) = 0,
for some constants A and B.
Proof: Consider two signals f and f 0 related by a and add up the equations
scaling factor s such that f (x) = f 0 (sx), and define
the normalized mth order derivatives by ψ 0 (u) + h 0 (v) = 0, (14)

L ξ m = t γ /2 L x m , (4)
we find that ψ 0 (and h 0 ) must be constant. In other
0 γ /2
L ξ 0m = t L x0m . (5) words, ψ(u) = C1 u + C2 and

From (17) we have that these derivatives at correspond- ϕ(t) = eC1 log t + C2 = At B . (15)
ing points (x 0 ; t 0 ) = (sx, s 2 t 0 ) are related by

ϕ(t)
L ξ m (x; t) = s m L ξ m (sx; s 2 t). (6) A.2. L p -Normalization Interpretation
ϕ(s 2 t) of γ -Normalized Derivatives
If a local maximum over scales is to be preserved, we
must require that In this appendix it is shown how the γ -normalized
derivative concept can be interpreted in terms of L p -
∂t (L ξ m (x; t)) = 0 ⇔ ∂t 0 (L ξ 0 m (x 0 ; t 0 )) = 0. (7) normalization of the Gaussian derivative kernels over
scales. The L p -norm of the mth order γ -normalized
Differentiating (6) with respect to t gives Gaussian derivative kernel is
Z ∞
¯ mγ /2 ¯p
∂t (L ξ m (x; t)) kgξ m (·; t)k pp = ¯t gx m (x; t)¯ d x. (16)
µ µ ¶ ¶
ϕ(t) ¡ ¢ x=−∞
= s m ∂t + ∂ t L ξ 0 m (sx; s t)
2
(8)
ϕ(s 2 t) From the well-known relation between the derivatives
of the unnormalized Gaussian kernel and the Her-
and application of (7) results in the necessary require- mite polynomials Hn ∂x m (e−x ) = (−1)m Hm (x) e−x
2 2

ment: (Abramowitz and Stegun, 1964) it follows that the mth


µ ¶ order Gaussian derivative kernel can be written
ϕ(t)
∂t = 0. (9) µ ¶
ϕ(s 2 t) 1 x
∂x m g(x; t) = (−1)m
Hm √ g(x; t). (17)
Rewrite ϕ as (2t)m/2 2t
Insert this expression
√ into (16) and make the sub-
ϕ(t) = eψ(log t) (10)
stitution x = tu. Then, by exploiting the fact that
g(x; t) = √1t g( √x t ; 1), we obtain
and introduce u = log t. Moreover, make use of the
fact that (9) implies that the ratio ϕ(t)/ϕ(s 2 t) cannot
depend on t and must be a function of s only. With kgξ m (·; t)k pp
v = 2 log s, (9) can hence be rewritten as Z ∞ ¯ m(γ −1)/2 µ ¶ ¯p
¯t u 1 ¯ √
= ¯ ¯
¯ 2m/2 Hm √2 √t g(u; 1)¯ t du
eψ(log t) u=−∞
= h̃(s) (11)
eψ(log(s 2 t)) = t p(mγ −m−1)/2+1/2
Z ∞ ¯ µ ¶ ¯p
¯ 1 u ¯ √
for some function h̃, and be reduced to the relation × ¯ H √ g(u; 1) ¯ t du
¯ 2m/2 m 2 ¯
u=−∞
p(mγ −m−1)/2+1/2
ψ(u) − ψ(u + v) = h(v) (12) =t kgξ m (·; 1)k pp . (18)
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

112 Lindeberg

In other words, the L p -norm of the Gaussian deriva- where ĥ i (ω) is the Fourier transform of h i (x) and by
tive kernel at scale level t is related to the L p -norm at letting h 1 = h 2 = L x α = L x1α1 ··· x Dα D , we obtain
unit scale by Z
√ m(γ −1)−1+1/ p L 2x α1 ··· x α D (·; t) d x
kgξ m (·; t)k p = t kgξ m (·; 1)k p . (19) x∈R2 1 D
Z
1 −2α
= ω2α1 · · · ω2α
D ĝ (ω; t)|ω|
D 2

Concerning L p -norms of Gaussian derivatives in (2π ) D ω∈R D 1
higher dimensions, we can make use of the separabil- (25)
ity of the normalized Gaussian derivative to separate
the D-dimensional integral in the L p -norm definition where ĝ denotes the Fourier transform of the Gaussian
into a product of one-dimensional integrals of the form kernel. By adding (25) over all possible multi-indeces
(18). Hence, if we let |m| = m 1 + · · · + m D denote the α with |α| = α1 +· · ·+α D = m, and by using the defini-
total order of differentiation, it follows that the varia- tion (79) we obtain the rotationally invariant descriptor
tion over scales of the L p -norm of the D-dimensional
Z
normalized Gaussian derivative kernel will be of the t mγ
Em = |ω|2m ĝ 2 (ω; t)|ω|−2β dω (26)
form (2π ) D ω∈R D
√ |m|(γ −1)+D(1/ p−1)
kgξ m (·; t)k p = t kgξ m (·; 1)k p . (20) Let us next introduce the D-dimensional correspon-
dence to spherical coordinates, (ρ; ϕ1 , . . . , ϕ D−1 ), with
In other words, the L p -norm of this kernel is constant the volumetric element dω = C D ρ D−1 dρ
over scales if and only if Z ∞
t mγ
ρ 2m−2β+D−1 e−ρ t dρ (27)
2
Em =
m(γ − 1) + D(1/ p − 1) = 0. (21) (2π ) D ρ=0
R∞ 0((m+1)/2)
x m e−ax d x =
2
Specifically, p is independent of m if and only if γ = 1. Then, using 0 2a (m+1)/2
, we get

C D t mγ 0(m − β + D/2)
A.3. Normalized Derivative Responses Em = , (28)
(2π ) D 2t m−β+D/2
to Self-similar Power Spectra
and the variation over scales is of the form
In this appendix, a closed-form expression is derived
for the variation over scales of the following class of E m (t) ∼ t β−D/2−m(1−γ ) . (29)
energy measures
Z X A.4. Discrete Implementation of the Scale
Em = t mγ |L x α |2 d x. (22) Selection Mechanisms
x∈R D |α|=m

when computed at different scales t in the scale-space Discretizing the normalized derivative operators leads
representation L of a two-dimensional signal f with a to two discretization problems; (i) how to discretize
self-similar power spectrum of the form the scale-space derivatives such that the scale-space
properties are preserved, and (ii) how to discretize the
S f (ω1 , . . . , ω D ) = ( fˆ fˆ∗ )(ω) normalization factor.
¡ ¢−β
= |ω|−2β = ω12 + ω22 . (23) A.4.1. Computing Discrete Derivative Approxima-
tions. The first problem can be solved by using the
Using Plancherel’s relation scale-space concept for discrete signals (Lindeberg,
Z 1994a), given by L(·, ·; t) = T (·, ·; t) ∗ f (·, ·), where
ĥ 1 (ω)ĥ ∗2 (ω) dω T (m, n; t) = T1 (m; t)T1 (n; t) and T1 (m; t) = e−t Im (t)
ω∈R D
Z is the one-dimensional discrete analogue of the Gaus-
= (2π) D
h 1 (x)h ∗2 (x) d x, (24) sian kernel defined from the modified Bessel func-
x∈R2 tions In .
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 113

The scale-space properties of L transfer to any dis- Normalization factors αm (t) for discrete approxima-
crete derivative approximations L x i x j defined as the tions (δx m T )(x; t) to higher order Gaussian derivative
1 2
result of applying difference operators δx i x j to L. In operators (∂x m g)(x; t) can be determined in an analo-
1 2
the implementations presented here, the first-order and gous way, such that the discrete l1 -norm is equal to the
second-order derivatives are approximated by the op- continuous L 1 -norm
erators (δx L)(x; t) = 12 (L(x + 1; t) − L(x − 1; t)) and X

(δx x L)(x; t) = (L(x + 1; t) − 2L(x; t) + L(x − 1; t)), αm (t) |(δx m T )(x; t)|
respectively. x=−∞
Z ∞
A.4.2. Normalization in the Discrete Case. In view = t m/2 |(∂x m g)(x; t)| d x. (32)
x=−∞
of the results presented in Section 9.1, it is natural
to normalize the discrete derivative approximation
kernels δx i x j T such that their discrete l1 -norms will
1 2
be constant over scales. Of course, it is not necessary A.4.3. Detection of Scale-Space Maxima in Dis-
to construct the normalized derivative approximation crete Data. Given the abovementioned discretiza-
kernels explicitly. Concerning e.g., first order deriva- tion methods for computing normalized differential
tives, discrete approximations to L x1 and L x2 can first descriptors based on the local N -jet representation, it
be computed according to Section A.4.1. Then, the is straightforward to express algorithms for detecting
result can be multiplied by the discrete normalization scale-space maxima. In summary, the implementations
factor underlying this work have been performed as follows:
√ 1. Given a discrete image f (here: of size between
2 128 × 128 or 256 × 256 pixels), select a scale
α1 (t) = √ , (30)
π(T1 (0; t) + T1 (1; t)) range for the analysis (here: tmin = 2 and tmax = 256).
Within this range, distribute a set of scale levels
which has been determined such that the discrete l1 - tk (here: 20 or 40 levels) such that the ratio be-
norm of the discrete derivative approximation kernel is tween successive scale levels tk+1 /tk is approxi-
equal to the continuous L 1 -norm of the Gaussian kernel mately constant.15
of the same order 2. For each scale tk , compute the scale-space represen-
tation of f according to Section A.4.1. Then, for
X
0
α1 (t)(δx T )(x; t) each point at each scale, compute discrete deriva-
x=−∞ tive approximations according to Section A.4.1 and
Z normalize them according to Section A.4.2. Finally,
0 √ 1
= t(∂x g)(x; t) d x = √ . (31) compute the normalized differential expression by
x=−∞ 2π pointwise combination of these entities.
3. In the three-dimensional volume so generated, de-
Using an asymptotic expression for the modified Bessel
tect local maxima (as points whose values are
functions of integer order (Abramowitz and Stegun,
greater than or equal to the values of their 26 dis-
1964) In (t) = √e2πt (1 − 4n8t−1 + O( t12 )), it can be ver-
t 2

crete neighbours).16 Optionally, select the N points


(t) approaches the continuous normaliza-
ified that α1√
having the strongest normalized responses.
tion factor t when the scale parameter t increases.
Observe, however, that α1 (t) assumes a non-zero value
A.4.4. Implementing Junction Localization. For
when t = 0,√in contrast to the continuous normaliza-
each scale-space maximum (x0 ; t0 ) detected according
tion factor t, which forces any normalized derivative
to the methodology in Section A.4.3, the subsequent
to be zero in unsmoothed data. This property of the
junction localization step is performed as follows:
L p -normalization approach is important if we want to
capture peaks at fine scales in the scale-space signa- • Compute A, b and c according to the definitions in
tures. In (Lindeberg, 1995b) it is shown that discrete (55), (56) and (57) using a Gaussian window function
L p -normalization also has better performance than wx0 (x 0 ) = g(x 0 − x0 ; t0 ) with integration scale value
variance based normalization when expressing scale t0 equal to the detection scale t0 of the scale-space
selection mechanisms in subsampled multi-scale re- maximum and with the center at the candidate junc-
presentations. tion x0 ). At a number of scales (here: 5–10 levels),
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

114 Lindeberg

uniformly distributed between a lower scale (here: 11. If we redefine the quasi quadrature measure as Q L = (1 − α)
0.01) and the detection scale t0 , vary the local scale L 2ξ + αL 2ξ ξ , then C = 2/3 corresponds to the relative weights
1 − α = 3/5 and α = 2/5.
at which derivatives are computed and select the lo-
12. To derive the self-similar power spectrum, consider an D-
cal scale that minimizes the normalized residual (60) dimensional signal with power spectrum S(ω), and parameterize
over scales. the D-dimensional frequency space using the D-dimensional
• At this scale, the new localization estimate is correspondence to spherical coordinates, (r ; ϕ1 , . . . , ϕ D−1 ),
x̂ = A−1 b. where r = |ω| ∈ [0, ∞] and ϕ1 , . . . , ϕ D−1 are suitably selected
angles in some domain Ä. To analyse the energy contribution
• Iterate the abovementioned steps until either the in-
from each range of frequencies, consider a volume element d V
crement is sufficiently small (here: within the same defined by r0 ≤ |ω| ≤ r0 (1 + dρ) for some r0 . Since the area
pixel) or an upper bound (here: 3 iterations) has been of an D-dimensional hypersphere of radius r0 is proportional
reached. to r0D−1 , the volume of a scale-invariant element can be written
Suppress all points for which the scheme diverges d V = C D r0D−1 r0 dρ for some constant C D . If we want the signal
(here: when the total update is larger than the detec- to contain the same amount of energyR for all frequencies (where
tion scale measured in dimension [length]).
the total energy is measured
R by R E = ω∈R D S(ω) dω), it follows
by necessity that d E = d V ( ϕ∈Ä S(r ; ϕ) dϕ) R d V must be inde-
pendent of ω, which in turn implies that ( ϕ∈Ä S(r ; ϕ) dϕ) d V
must be proportional to dρ, and the power spectrum must be of
the form S(ω) ∼ |ω|−D .
Notes
13. Applications of scale selection based on L p -normalization with
p < 1 are developed in more detail in (Lindeberg, 1996b).
1. An analysis of scale-space like responses to sine waves corre- 14. See (Lindeberg, 1996b) concerning scale selection mechanisms
sponding to the case when γ = 1 in this section has also been for detecting edges and ridges.
performed in wavelet analysis by (Mallat and Hwang, 1992); see 15. Specifically, the scale levels have been determined such that the
Section 9.3. difference τk+1 − τk in effective scale between adjacent scales
2. To avoid the sensitivity to sign of these entities, and hence the tk+1 and tk is constant (see Appendix A.4.4).
polarity of the signal, trace Hnorm L and det Hnorm L have been 16. In other words, a point (x0 , y0 , tk0 ) is regarded as a discrete scale-
squared before presentation. space maximum of a normalized differential entity Dnorm L if and
3. In the graphs in Fig. 2 the scale parameter (on the horizontal only if (Dnorm L)(x0 , y0 , tk0 ) ≥ (Dnorm L)(x0 + i, y0 + j, tk0 +k )
axis) is measured in terms of effective scale, τ . For continuous holds for all 26 neighbouring points (i, j, k) ∈ {−1, 0, 1}.
signals, this parameter is essentially the logarithm of the scale
parameter τ = C1 log t + C2 for some C1 , C2 > 0. To avoid
the singularity at zero scale, however, all experiments are based
on an effective scale concept especially developed for discrete
signals and defined such that τ ∼ log t at coarse scales and τ ∼ t References
at fine scales see (Lindeberg, 1994a).
4. When detecting scale-space maxima in practice, there is, of Abramowitz, M. and Stegun, I.A. (Eds.). 1964. Handbook of Mathe-
course, no need to explicitly track the extrema along the ex- matical Functions. Applied Mathematics Series. National Bureau
tremum path in scale-space. It is sufficient to detect three- of Standards, 55th edition.
dimensional maxima over space and scale (as described in more Almansa, A. and Lindeberg, T. 1996. Enhancement of finger-
detail in Section A.4.3). print images by shape-adapted scale-space operators. In Gaussian
5. Further extensions of this idea are also explored in (Lindeberg, Scale-Space Theory: Proc. Ph.D. School on Scale-Space Theory,
1996b), where differential definitions of edges and ridges are Copenhagen, Denmark. J. Sporring, M. Nielsen, L. Florack, and
expressed in such a way that scale selection constitutes an inte- P. Johansen (Eds.), Kluwer Academic Publishers.
grated part of the feature definition. Almansa, A. and Lindeberg, T. 1998. Fingerprint enhancement by
6. A more general approach for defining the support region of the shape adaptation of scale-space operators with automatic scale
junction feature is described in Section 7.7. selection. Technical Report ISRN KTH NA/P-98/03-SE., Dept. of
7. The difference in positions in the last row in Fig. 14 is mainly Numerical Analysis and Computing Science, KTH, Stockholm,
due to the fact that for the purpose of numerical evaluation, the Sweden.
reference position of the corner is defined as the position in the Babaud, J., Witkin A.P., Baudin, M., and Duda, R.O. 1986. Unique-
ideal corner image before the smoothing operation. ness of the Gaussian kernel for scale-space filtering. IEEE Trans.
8. Besides the general descriptions given in previous sections, fur- Pattern Analysis and Machine Intell., 8(1):26–33.
ther details concerning algorithms and discrete implementation Blom, J. 1992. Topological and geometrical aspects of image struc-
can be found in Appendix A.1.4 and in (Lindeberg, 1994b). ture. Ph.D. Thesis. Dept. Med. Phys. Physics, Univ. Utrecht, NL-
9. An integrated vision system for analysing junctions by ac- 3508 Utrecht, Netherlands.
tively zooming in to interesting structures is presented in Blostein, D. and Ahuja, N. 1989. A multiscale region detector. Com-
(Brunnström et al., 1992; Lindeberg, 1993a). puter Vision, Graphics, and Image Processing, 45:22–41.
10. Since Q L is an inhomogeneous differential expression, γ = 1 is a Bretzner, L. and Lindeberg, T. 1996. Feature tracking with automatic
necessary requirement for the scale selection procedure to com- selection of spatial scales. Technical Report ISRN KTH/NA/P-
mute with size variations in the input pattern (see Section 4.1). 96/21-SE, Dept. of Numerical Analysis and Computing Science,
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

Feature Detection with Automatic Scale Selection 115

KTH, Stockholm, Sweden. Revised version in Computer Vision Koenderink, J.J. 1984. The structure of images. Biological Cyber-
and Image Understanding. netics, 50:363–370.
Bretzner, L. and Lindeberg, T. 1997. On the handling of spatial and Koenderink, J.J. and Richards, W. 1988. Two-dimensional curva-
temporal scales in feature tracking. In Proc. 1st Int. Conf. on Scale- ture operators. J. of the Optical Society of America, 5:7:1136–
Space Theory in Computer Vision, Utrecht, The Netherlands. ter 1141.
Haar Romeny et al. (Ed.), LNCS, Springer Verlag: New York, Koenderink, J.J. and van Doorn, A.J. 1990. Receptive field families.
Vol. 1252, pp. 128–139. Biological Cybernetics, 63:291–298.
Bretzner, L. and Lindeberg, T. 1998. Feature tracking with automatic Koenderink, J.J. and van Doorn, A.J. 1992. Generic neighborhood
selection of spatial scales. Computer Vision and Image Under- operators. IEEE Trans. Pattern Analysis and Machine Intell.,
standing, 71(3):385–392. 14(6):597–605.
Brunnström, K., Lindeberg, T., and Eklundh, J.-O. 1992. Active Korn, A.F. 1988. Toward a symbolic representation of intensity
detection and classification of junctions by foveation with a head- changes in images. IEEE Trans. Pattern Analysis and Machine
eye system guided by the scale-space primal sketch. In Proc. 2nd Intell., 10(5):610–625.
European Conf. on Computer Vision, Santa Margherita Ligure, Lindeberg, T. 1990. “Scale-space for discrete signals. IEEE Trans.
Italy. G. Sandini (Ed.), Vol. 588 of Lecture Notes in Computer Pattern Analysis and Machine Intell., 12(3):234–254.
Science, Springer-Verlag, pp. 701–709. Lindeberg, T. 1991. Discrete scale-space theory and the scale-space
Burt, P.J. 1981. Fast filter transforms for image processing. Computer primal sketch. Ph.D. Thesis. ISRN KTH/NA/P-91/08-SE. Dept.
Vision, Graphics, and Image Processing, 16:20–51. of Numerical Analysis and Computing Science, KTH, Stockholm,
Crowley, J.L. 1981. A representation for visual information. Ph.D. Sweden.
Thesis, Carnegie-Mellon University, Robotics Institute, Pitts- Lindeberg, T. 1993. Detecting salient blob-like image structures and
burgh, Pennsylvania. their scales with a scale-space primal sketch: A method for focus-
Crowley, J.L. and Parker, A.C. 1984. A representation for shape of-attention. Int. J. of Computer Vision, 11(3):283–318.
based on peaks and ridges in the difference of low-pass trans- Lindeberg, T. 1993. On scale selection for differential operators.
form. IEEE Trans. Pattern Analysis and Machine Intell., 6(2): In Proc. 8th Scandinavian Conf. on Image Analysis, Tromsø,
156–170. Norway, K. Heia K.A. Høgdra, B. Braathen (Ed.), pp. 857–866.
Deriche, R. and Giraudon, G. 1990. Accurate corner detection: An Lindeberg, T. 1994a. Scale-Space Theory in Computer Vision.
analytical study. In Proc. 3rd Int. Conf. on Computer Vision, Kluwer Academic Publishers: Netherlands.
Osaka, Japan, pp. 66–70. Lindeberg, T. 1994b. Scale selection for differential operators.
Dreschler, L. and Nagel, H.-H. 1982. Volumetric model and 3D- Technical Report ISRN KTH/NA/P-94/03-SE, Dept. of Nu-
trajectory of a moving car derived from monocular TV-frame se- merical Analysis and Computing Science, KTH, Stockholm,
quences of a street scene. Computer Vision, Graphics, and Image Sweden.
Processing, 20(3):199–228. Lindeberg, T. 1994c. On the axiomatic foundations of linear scale-
Field, D.J. 1987. Relations between the statistics of natural images space: Combining semi-group structure with causality vs. scale
and the response properties of cortical cells. J. of the Optical So- invariance. Technical Report ISRN KTH/NA/P-94/20-SE, Dept.
ciety of America, 4:2379–2394. of Numerical Analysis and Computing Science, KTH, Stockholm,
Florack, L.M.J. 1993. The syntactical structure of scalar images. Sweden. Revised version in J. Sporring and M. Nielsen and L.
Ph.D. Thesis. Dept. Med. Phys. Physics, Univ. Utrecht, NL-3508 Florack and P. Johansen (Eds.) Gaussian Scale-Space Theory:
Utrecht, Netherlands. Proc. Ph.D. School on Scale-Space Theory, Copenhagen, Den-
Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., and mark, Kluwer Academic Publishers.
Viergever, M.A. 1992. Scale and the differential structure of im- Lindeberg, T. 1994d. Junction detection with automatic selection of
ages. Image and Vision Computing, 10(6):376–388. detection scales and localization scales. In Proc. 1st International
Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., and Conference on Image Processing, Austin, Texas, Vol. 1, pp. 924–
Viergever, M.A. 1994. Linear scale-space. J. of Mathematical 928.
Imaging and Vision, 4(4):325–351. Lindeberg, T. 1995a. Direct estimation of affine deformations of
Förstner, W.A. and Gülch, E. 1987. A fast operator for detec- brightness patterns using visual front-end operators with auto-
tion and precise location of distinct points, corners and centers matic scale selection. In Proc. 5th International Conference on
of circular features. In Proc. Intercommission Workshop of the Computer Vision, Cambridge, MA, pp. 134–141.
Int. Soc. for Photogrammetry and Remote Sensing, Interlaken, Lindeberg, T. 1995b. On scale selection in subsampled (hybrid)
Switzerland. multi-scale representations. Draft manuscript.
Gårding, J. and Lindeberg, T. 1996. Direct computation of shape Lindeberg, T. 1996a. Feature detection with automatic scale se-
cues using scale-adapted spatial derivative operators. Int. J. of lection. Technical Report ISRN KTH/NA/P-96/18-SE, Dept. of
Computer Vision, 17(2):163–191. Numerical Analysis and Computing Science, KTH, Stockholm,
ter Haar Romeny, B. (Ed.). 1994. Geometry-Driven Diffusion in Sweden.
Computer Vision. Kluwer Academic Publishers, Netherlands. Lindeberg, T. 1996b. Edge detection and ridge detection with auto-
Johansen, P., Skelboe, S., Grue, K., and Andersen, J.D. 1986. matic scale selection. Technical Report ISRN KTH/NA/P-96/06-
Representing signals by their top points in scale-space. In Proc. SE, Dept. of Numerical Analysis and Computing Science, KTH,
8th Int. Conf. on Pattern Recognition, Paris, France, pp. 215– Stockholm, Sweden. Revised version in Intl. J. of Computer Vision,
217. 30(2), 1998.
Kitchen, L. and Rosenfeld, A. 1982. Gray-level corner detection. Lindeberg, T. 1996c. Edge detection and ridge detection with
Pattern Recognition Letters, 1(2):95–102. automatic scale selection. In Proc. IEEE Comp. Soc. Conf. on
P1: VIY/MDR-BNY-BIS P2: VIY/RCK P3: SUK/RCK QC: SUK
International Journal of Computer Vision KL660-01-Linderberg-I October 20, 1998 16:21

116 Lindeberg

Computer Vision and Pattern Recognition, 1996, San Francisco, Mallat, S.G. and Zhong, S. 1992. Characterization of signals from
California, pp. 465–470. multi-scale edges. IEEE Trans. Pattern Analysis and Machine
Lindeberg, T. 1996d. A scale selection principle for estimating im- Intell., 14(7):710–723.
age deformations. Technical Report ISRN KTH/NA/P-96/16-SE, Marr, D. 1982. Vision. W.H. Freeman: New York.
Dept. of Numerical Analysis and Computing Science, KTH. Im- Noble, J.A. 1988. Finding corners. Image and Vision Computing,
age and Vision Computing, in press. 6(2):121–128.
Lindeberg, T. 1996e. On the axiomatic foundations of linear scale- Pauwels, E.J., Fiddelaers, P., Moons, T., and van Gool, L. J. 1995.
space, In Gaussian Scale-Space Theory: Proc. Ph.D. School An extended class of scale-invariant and recursive scale-space
on Scale-Space Theory, Copenhagen, Denmark, J. Sporring, M. filters. IEEE Trans. Pattern Analysis and Machine Intell., 17(7):
Nielsen, L. Florack, and P. Johansen (Eds.), Kluwer Academic 691–701.
Publishers. Tsotsos, J.K., Culhane, S.M., Wai, W.Y.K., Lai, Y., Davis, N., and
Lindeberg, T. 1997. On automatic selection of temporal scales in Nufflo, F. 1995. Modeling visual attention via selective tuning,
time-casual scale-space. In Proc. AFPAC’97: Algebraic Frames Artificial Intelligence, 78:507–545.
for the Perception-Action Cycle, G. Sommer and J. J. Koenderink Voorhees, H. and Poggio, T. 1987. Detecting textons and texture
(Eds.), Vol. 1315 of Lecture Notes in Computer Science, Springer boundaries in natural images. In Proc. 1st Int. Conf. on Computer
Verlag, Berlin, pp. 94–113. Vision, London, England.
Lindeberg, T. and Gårding, J. 1993. Shape from texture from a multi- Wiltschi, K., Pinz, A., and Lindeberg, T. 1997. Classification of car-
scale perspective. In Proc. 4th Int. Conf. on Computer Vision, bide distributions using scale selection and directional distribu-
Berlin, Germany, H.-H. Nagel et. al., (Eds.), pp. 683–691. tions, In Proc. 4th International Conference on Image Processing,
Lindeberg, T. and Gårding, J. 1997. Shape-adapted smoothing in Santa Barbara, CA.
estimation of 3D depth cues from affine distortions of local 2D Witkin, A.P. 1983. Scale-space filtering. In Proc. 8th Int. Joint Conf.
structure. Image and Vision Computing, 15:415–434. Art. Intell., Karlsruhe, West Germany, pp. 1019–1022.
Lindeberg, T. and Li, M. 1995. Segmentation and classification Young, R.A. 1985. The Gaussian derivative theory of spatial vi-
of edges using minimum description length approximation and sion: Analysis of cortical cell receptive field line-weighting pro-
complementary junction cues. In Proc. 9th Scandinavian Confer- files. Technical Report GMR-4920, Computer Science Depart-
ence on Image Analysis, Uppsala, Sweden, G. Borgefors (Ed.), ment, General Motors Research Lab., Warren, Michigan.
pp. 767–776. Young, R.A. 1987. The Gaussian derivative model for spatial vision:
Lindeberg, T. and Li, M. 1997. Segmentation and classification of I. Retinal mechanisms. Spatial Vision, 2:273–293.
edges using minimum description length approximation and com- Yuille, A.L. and Poggio, T.A. 1986. Scaling theorems for zero-
plementary junction cues. Computer Vision and Image Under- crossings. IEEE Trans. Pattern Analysis and Machine Intell., 8:
standing, 67(1):88–98. 15–25.
Lindeberg, T. and Olofsson, G. 1995. The aspect feature graph in Zhang, W. and Bergholm, F. 1993. An extension of Marr’s signature
recognition by parts. Draft manuscript. based edge classification and other methods for determination of
Mallat, S.G. and Hwang, W.L. 1992. Singularity detection and diffuseness and height of edges, as well as line width. In Proc.
processing with wavelets. IEEE Trans. Information Theory, 4th Int. Conf. on Computer Vision, Berlin, Germany, H.-H. Nagel
38(2):617–643. et al. (Eds.), IEEE Computer Society Press, pp. 183–191.

You might also like