A32 Keypoint Detectors

SIFT, GLOH, SURF descriptors

Dipartimento di Sistemi e Informatica
Invariant local descriptors are useful for:
•  Object Recognition and Tracking.
•  Robot Localization and Mapping.
•  Image Registration and Stitching.
•  Image Retrieval.
•  Augmented Reality (http://blogs.oregonstate.edu/hess/sift-library-places-2nd-in-acm-mm-10-ossc/)
(Figure: augmented reality example showing a Template and a Video Stream.)
Scale invariant detectors
•  In most object recognition applications the scale of the object in the image is unknown. Instead of extracting features at many different scales and then matching all of them, it is more efficient to design a function on the region that takes the same value for corresponding regions, even if they are at different scales.
•  The problem can also be stated as follows: given two images of the same scene with a large
scale difference between them, find the same interest points independently in each image.

•  For scale-invariant feature extraction it is necessary to detect structures that can be reliably extracted under scale changes.
•  This is done by evaluating a signature function (a kernel) in the neighbourhood of the point and plotting the result as a function of the neighbourhood scale. Since the function measures properties of the local neighbourhood at a certain scale, it should take a similar qualitative shape if two keypoints are centered on corresponding image structures.
•  The shape of the function is squashed or expanded according to the scaling factor between the images. Corresponding neighbourhood sizes can therefore be detected by searching for extrema of the signature function in both images.
We can consider the function f at a point as a function of the region size (circle radius). A common approach is to take a local maximum of this function. In practice, maxima of suitable functions are searched for both in scale and in space over the images.
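(Figure: f plotted against region size for Image 1 and for Image 2, where Image 2 is Image 1 at scale = 1/2; the maxima are reached at region sizes s1 and s2.)
The region size (scale) at which the maximum is achieved should follow the image scale (here s2 = s1/2), so that the selected regions cover the same image content in both images.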
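The behaviour in the figure can be reproduced with a minimal 1-D sketch in Python (standard library only). It assumes that a bright Gaussian-shaped blob stands in for the region and uses the scale-normalized second derivative σ²·(G″σ ∗ f) at the blob centre as the signature function; the helper names (`blob`, `normalized_response`, `characteristic_scale`) and the sampling choices are illustrative, not taken from any library.

```python
import math


def gaussian_second_derivative(x, sigma):
    """Second derivative of a unit-area 1-D Gaussian with standard deviation sigma."""
    g = math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
    return g * (x * x / sigma**4 - 1 / sigma**2)


def normalized_response(signal, center, sigma):
    """|sigma^2 * (G''_sigma * signal)(center)|: the scale-normalized 2nd derivative."""
    radius = int(math.ceil(4 * sigma))  # truncate the kernel at 4 sigma
    total = 0.0
    for dx in range(-radius, radius + 1):
        i = center - dx
        if 0 <= i < len(signal):
            total += gaussian_second_derivative(dx, sigma) * signal[i]
    return abs(sigma * sigma * total)


def characteristic_scale(signal, center, sigmas):
    """Return the sigma at which the normalized response is largest."""
    responses = [normalized_response(signal, center, s) for s in sigmas]
    return sigmas[responses.index(max(responses))]


def blob(width, length=401):
    """Bright Gaussian blob of the given width centred in a 1-D signal."""
    c = length // 2
    return [math.exp(-((i - c) ** 2) / (2 * width * width)) for i in range(length)]


sigmas = [0.5 + 0.1 * i for i in range(300)]
s1 = characteristic_scale(blob(8.0), 200, sigmas)  # "Image 1"
s2 = characteristic_scale(blob(4.0), 200, sigmas)  # "Image 2", scale = 1/2
print(f"s1 = {s1:.1f}, s2 = {s2:.1f}, s2/s1 = {s2 / s1:.2f}")
```

For a 1-D Gaussian blob of width w this signature function peaks analytically at σ = √2·w, so the two searches should land near 11.3 and 5.7 (up to the 0.1 grid step): halving the blob size halves the selected scale.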

•  A “good” function for scale detection has one stable, sharp peak.
(Figure: three plots of f against region size; two are marked "bad" and the one with a single sharp peak is marked "Good!".)
•  For typical images, a good function is one that responds to contrast (a sharp local intensity change). Since a sharp intensity change is a maximum of the first derivative, it is easier to look for zero-crossings of the second derivative than for maxima.

•  There are a few approaches which are truly invariant to significant scale changes. Typically, such techniques assume that the scale change is the same in every direction, although they exhibit some robustness to weak affine deformations.
•  The appropriate kernel for this is the scale-normalized Gaussian kernel G(x, σ) and its derivatives.
•  The classical approach is to generate a Gaussian scale-space representation of an image, i.e. a set of images obtained by convolving the image with isotropic (circular) Gaussian kernels of various sizes:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
A larger scale results in a smoother image.

•  Existing methods search for local extrema in the 3D Gaussian scale-space representation of an image (x, y and scale). Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures.

•  The motivation for generating a scale-space representation of a given image originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects may appear in different ways depending on the scale of observation.

•  The Gaussian scale space guarantees that no new structures are created when going from a fine scale to any coarser scale. Its properties include linearity, shift invariance, non-enhancement of local extrema, scale invariance and rotational invariance.
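A small 1-D experiment illustrates the non-creation of new structures: a noisy signal is smoothed with Gaussians of growing σ and its local extrema are counted. This is a sketch under stated assumptions: the kernel is a sampled, truncated Gaussian (the strict guarantee holds for the continuous Gaussian and its discrete analogue, not necessarily for every sampled kernel), borders are handled by replication, and the helper names are hypothetical.

```python
import math
import random


def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian truncated at 4 sigma."""
    radius = int(math.ceil(4 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth(signal, sigma):
    """Convolve with a Gaussian of scale sigma, replicating the border samples."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [
        sum(k[j + r] * signal[min(max(i - j, 0), n - 1)] for j in range(-r, r + 1))
        for i in range(n)
    ]


def count_extrema(signal):
    """Number of strict interior local maxima plus minima."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0
    )


rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
sigmas = [1.0, 2.0, 4.0, 8.0]
counts = [count_extrema(signal)] + [count_extrema(smooth(signal, s)) for s in sigmas]
print("extrema at sigma = 0, 1, 2, 4, 8:", counts)
```

The count drops sharply from the raw signal to the coarsest scale; the sketch only prints the counts rather than asserting strict monotonicity at every step.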

Functions for determining scale

f = Kernel ∗ Image
Kernels:
$$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$$
(Laplacian of Gaussians)
$$DoG = G(x, y, k\sigma) - G(x, y, \sigma)$$
(Difference of Gaussians, an approximation of the Laplacian)
where the Gaussian is
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
Both kernels are invariant to scale and rotation.
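As an illustration of f = Kernel ∗ Image with the scale-normalized Laplacian, the sketch below (Python, standard library only) evaluates the response at the centre of a dark disk of radius r on a bright background for a range of σ, and keeps the σ with the largest response. The disk test image, the σ grid and the function names are assumptions made for the example; analytically the maximum lies at σ = r/√2, i.e. r = √2σ.

```python
import math


def normalized_log(x, y, sigma):
    """sigma^2 times the Laplacian of a unit-area 2-D Gaussian, at (x, y)."""
    r2 = x * x + y * y
    g = math.exp(-r2 / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
    return sigma * sigma * g * (r2 - 2 * sigma * sigma) / sigma**4


def dark_disk_response(radius, sigma):
    """Normalized LoG response at the centre of a dark disk on a bright background.

    The constant background contributes (almost) nothing because the LoG kernel
    integrates to zero, so only the disk pixels are summed, with a minus sign.
    """
    r = int(radius)
    return -sum(
        normalized_log(x, y, sigma)
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        if x * x + y * y <= radius * radius
    )


def best_sigma(radius, sigmas):
    """Scale at which the disk response is maximal (the characteristic scale)."""
    responses = [dark_disk_response(radius, s) for s in sigmas]
    return sigmas[responses.index(max(responses))]


sigmas = [1.0 + 0.05 * i for i in range(200)]
for radius in (5.0, 10.0):
    print(f"r = {radius}: best sigma = {best_sigma(radius, sigmas):.2f}, "
          f"r/sqrt(2) = {radius / math.sqrt(2):.2f}")
```

Doubling the disk radius from 5 to 10 pixels should roughly double the selected σ (about 3.5 and 7.1), which is what scale invariance of the detector means in practice.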
The method:
-  build scale-space pyramids;
-  examine all scales to identify scale-invariant features:
-  compute the Difference of Gaussians (DoG) pyramid or the Laplacian of Gaussians (LoG);
-  detect maxima and minima in scale space.
•  Harris-Laplacian¹
Find local maximum of:
–  Harris corner detector in space (image coordinates)
–  Laplacian in scale
•  SIFT (Lowe)²
Find local maximum of:
–  Difference of Gaussians in space and scale
(Figure: scale-space diagrams with axes x, y and scale for the two detectors.)
¹ K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001.
² D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. Accepted to IJCV 2004.
Harris-Laplacian scale-invariant detector

•  The Harris-Laplacian method first applies the Harris function at multiple scales, then selects the points for which the Laplacian attains a maximum over scales.

•  Harris corner points are interest points with good rotational and illumination invariance, but they are not scale invariant. To obtain scale invariance, the second-moment matrix is computed on a Gaussian scale-space representation, and the Laplacian of Gaussian kernel is used to select the scale.
•  Since the computation of derivatives usually involves a stage of scale-space smoothing, an operational definition of the Harris operator requires two scale parameters:
–  (i) a local derivation scale for smoothing before the computation of derivatives;
–  (ii) an integration scale for accumulating the operations on derivatives.

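•  The scale-adapted second-moment matrix is
$$M(\mathbf{x}, \sigma_I, \sigma_D) = \sigma_D^2 \, g(\sigma_I) * \begin{bmatrix} L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\ L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D) \end{bmatrix}$$
where g(σI) is the Gaussian kernel of scale σI (integration scale), L(x, y) is the Gaussian-smoothed image, and Lx and Ly are its derivatives in the x and y directions, computed with a Gaussian kernel of scale σD (differentiation scale). The multiplication by σD² is needed because derivatives must be normalized across scales according to Dₘ(x, σ) = σᵐ Lₘ(x, σ).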
•  The algorithm searches across multiple scales σn = σ0, kσ0, k²σ0, k³σ0, …, kⁿσ0 (k = 1.4), setting σI = σn and σD = sσI (s = 0.7).
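The scale sampling just described can be written out as follows. This is a minimal sketch: σ0 = 1 and the number of levels are assumed defaults, while k = 1.4 and s = 0.7 are the values given above; the function name is hypothetical.

```python
def harris_laplace_scales(sigma0=1.0, k=1.4, s=0.7, levels=8):
    """(sigma_I, sigma_D) pairs for sigma_n = k**n * sigma0, n = 0 .. levels-1."""
    scales = []
    for n in range(levels):
        sigma_i = sigma0 * k**n     # integration scale sigma_I = sigma_n
        sigma_d = s * sigma_i       # derivation scale sigma_D = s * sigma_I
        scales.append((sigma_i, sigma_d))
    return scales


for sigma_i, sigma_d in harris_laplace_scales(levels=4):
    print(f"sigma_I = {sigma_i:.3f}   sigma_D = {sigma_d:.3f}")
```

Keeping σD a fixed fraction of σI means the derivative smoothing grows together with the integration window, so both are rescaled consistently from level to level.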

•  At each scale, corners are detected as in the Harris method, as local maxima of the Harris measure computed from the matrix M within an 8-point neighbourhood. An iterative algorithm localizes the corner points spatially and chooses the characteristic scale:
–  the Laplacian of Gaussians is used to judge whether each candidate point found on level n forms a maximum in the scale direction (compared with levels n-1 and n+1). The scale at which such a maximum is found is called the characteristic scale. It is used in subsequent iterations, and points are spatially localized at the characteristic scale.
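•  Mikolajczyk and Schmid (2001) demonstrated that the LoG measure
$$|LoG(\mathbf{x}, \sigma_D)| = \sigma_D^2 \, |L_{xx}(\mathbf{x}, \sigma_D) + L_{yy}(\mathbf{x}, \sigma_D)|$$
attains the highest percentage of correctly detected corner points in comparison to other scale-selection measures.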

•  At each iteration the corner point x_{k+1} that maximizes the LoG within the scale neighbourhood is selected. The process terminates when x_{k+1} = x_k.
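The two decisions in the Harris-Laplacian iteration, the Harris cornerness computed from M and the check that the LoG peaks over adjacent scale levels, can be sketched as two small functions. The Harris constant α = 0.04 is a commonly used value and is an assumption here, as are the function names and the optional threshold; the entries of M are assumed to be already computed at the current (σI, σD).

```python
def harris_response(m11, m12, m22, alpha=0.04):
    """Harris cornerness det(M) - alpha * trace(M)^2 for M = [[m11, m12], [m12, m22]]."""
    det = m11 * m22 - m12 * m12
    trace = m11 + m22
    return det - alpha * trace * trace


def is_characteristic_scale(log_responses, n, threshold=0.0):
    """True if |LoG| at level n exceeds levels n-1 and n+1 (and the threshold).

    log_responses[i] is the normalized |LoG| of one candidate point at level i.
    """
    if n <= 0 or n >= len(log_responses) - 1:
        return False            # no neighbour on one side: cannot be a maximum
    f = log_responses[n]
    return f > threshold and f > log_responses[n - 1] and f > log_responses[n + 1]


print(harris_response(1.0, 0.0, 1.0))   # two strong, equal gradient directions
print(harris_response(1.0, 0.0, 0.01))  # one dominant direction (edge)
print(is_characteristic_scale([0.2, 0.5, 0.3], 1))
```

A positive Harris response (two comparable eigenvalues of M) indicates a corner, a negative one an edge; only corners whose LoG peaks at level n keep that level as their characteristic scale.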
(Figure: multi-scale Harris points; selection of points at the characteristic scale with the Laplacian; invariant points + associated regions [Mikolajczyk & Schmid’01].)

SIFT Scale Invariant Feature Transform
•  The SIFT method was introduced by D. Lowe (first published in 1999, in its definitive form in 2004) to represent visual entities according to their local properties. The method employs local features taken at salient points (referred to as keypoints or SIFT points). Keypoints, through their SIFT descriptors, are used to characterize shapes with invariant properties.
•  Image points selected as keypoints, and their SIFT descriptors, are robust to:
-  luminance changes (due to difference-based metrics)
-  scale changes (due to the scale space)
-  rotation (due to local orientations measured with respect to the keypoint canonical orientation)
Lowe’s original algorithm:

Given a grey-scale image:
-  Build a Gaussian-blurred image pyramid
-  Subtract adjacent levels to obtain a Difference of Gaussians (DoG) pyramid (thus approximating the Laplacian of Gaussians)
-  Take local extrema of the DoG filters at different scales as keypoints
-  Compute the keypoint dominant orientation
For each keypoint:
-  Evaluate local gradients in a neighbourhood of the keypoint, with orientations relative to the keypoint orientation, and normalize them
-  Build a descriptor as a feature vector with the salient keypoint information
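•  The motivation for using the DoG is that the scale-normalized Laplacian of Gaussian σ²∇²G(x, y, σ) gives strong responses to dark blobs of radius about √2σ and is well suited to capturing scale invariance, but computing the Laplacian is costly. An approximation can therefore be used, based on the heat diffusion equation (which matches Koenderink’s formulation for the luminance scale space, ’92, up to a ½ multiplicative constant):
$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$$
Replacing the derivative with a finite difference between the nearby scales kσ and σ gives
$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$
and hence
$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G$$
so the DoG is proportional to the scale-normalized Laplacian σ²∇²G.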
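The approximation G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G can be checked numerically by comparing the two kernels point by point. The sketch below (Python, standard library only; σ = 1.6 and the sample points are arbitrary choices, and the function names are hypothetical) prints the ratio between the DoG and (k − 1)σ² times the Laplacian of Gaussian, away from the LoG zero crossing.

```python
import math


def gaussian(x, y, sigma):
    """Unit-area isotropic 2-D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def laplacian_of_gaussian(x, y, sigma):
    """Laplacian of G: G(x, y, sigma) * (x^2 + y^2 - 2 sigma^2) / sigma^4."""
    r2 = x * x + y * y
    return gaussian(x, y, sigma) * (r2 - 2 * sigma * sigma) / sigma**4


def dog_to_log_ratio(x, y, sigma, k):
    """[G(k sigma) - G(sigma)] / [(k - 1) sigma^2 LoG] at one point (avoid LoG zero crossings)."""
    dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
    approx = (k - 1) * sigma * sigma * laplacian_of_gaussian(x, y, sigma)
    return dog / approx


for k in (1.01, 1.1, 2 ** 0.5):
    ratios = [dog_to_log_ratio(x, 0.0, 1.6, k) for x in (0.0, 1.0)]
    print(f"k = {k:.3f}: ratio at x = 0, 1 -> {ratios[0]:.3f}, {ratios[1]:.3f}")
```

At the kernel centre the ratio works out to (k + 1)/(2k²): it tends to 1 as k → 1 and does not depend on σ. For k = √2 it is about 0.60, so the DoG is a scaled-down version of (k − 1)σ²∇²G there, by a factor that is the same at every scale.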

•  SIFT descriptors are obtained in the following three steps:
1.  Keypoint detection using local extrema of DoG filters
2.  Computation of keypoint orientation
3.  SIFT descriptor derivation
Build Gaussian pyramids
Keypoints are detected as local scale-space extrema (maxima and minima) of the Differences of Gaussians. They correspond to local min/max points in image I(x, y) that remain stable across different scales σ.

(Figure: pyramid construction process, showing the Blur and Resample steps.)

Blur: σ is doubled from the bottom to the top of each pyramid (octave)
Resample: pyramid images are sub-sampled by a factor of 2 from one octave to the next
Subtract: adjacent levels of pyramid images are subtracted
Building pyramids in detail
•  A first pyramid is obtained by the convolution operation at different σ, such that σn = kⁿσ0:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
•  The images L(x, y, σ) are grouped into a first octave.
•  The DoG at a scale σ is obtained as the difference of two nearby scales separated by a constant factor k:
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$$
•  After the first octave is completed, the image with σ = 2σ0 is subsampled by a factor of 2 and the next pyramid is obtained in the same way.
•  The procedure is iterated for the next levels.
σ0 = k⁰σ, σ1 = k¹σ, σ2 = k²σ, σ3 = k³σ, σ4 = k⁴σ

•  Octave: the original image is convolved with a set of Gaussians so as to obtain a set of images that differ by a factor k in scale space; each such set is usually called an octave.
(Figure: one octave, with images at scales k⁰, k, k², k³, k⁴.)
•  Each octave is divided into s intervals, such that k = 2^(1/s).
•  For each octave, s + 3 images must be computed. For example, if s = 2 then k = 2^(1/2) and there are 5 images at different scales. In this case an octave corresponds to doubling the value of σ:

σ0 = (2^(1/2))⁰ σ = σ
σ1 = (2^(1/2))¹ σ = kσ
σ2 = (2^(1/2))² σ = 2σ
σ3 = (2^(1/2))³ σ = 2kσ
σ4 = (2^(1/2))⁴ σ = 4σ
σ2 is doubled with respect to σ0 (and σ4 with respect to σ2).
The choice of s is based on empirical verification of keypoint stability; Lowe reports the highest repeatability with s = 3 scale samples per octave.
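The octave bookkeeping above (k = 2^(1/s), s + 3 blurred images, s + 2 DoG images, and the level with σ = 2σ0 used to seed the next octave) can be sketched as below. The function names are hypothetical; σ0 = 1.6 in the second call follows Lowe’s paper and is only an example value.

```python
def octave_sigmas(sigma0, s):
    """Blur levels of one octave: s + 3 Gaussian images with sigma_i = k**i * sigma0."""
    k = 2 ** (1.0 / s)
    return [sigma0 * k**i for i in range(s + 3)]


def describe_octave(sigma0, s):
    """Bookkeeping for one octave with s intervals."""
    sigmas = octave_sigmas(sigma0, s)
    return {
        "k": 2 ** (1.0 / s),
        "gaussian_images": len(sigmas),        # s + 3
        "dog_images": len(sigmas) - 1,         # s + 2 differences of adjacent levels
        "extrema_levels": len(sigmas) - 3,     # s DoG levels with a neighbour above and below
        "next_octave_seed": sigmas[s],         # sigma = 2 * sigma0, subsampled by 2
    }


print([round(v, 3) for v in octave_sigmas(1.0, 2)])  # [1.0, 1.414, 2.0, 2.828, 4.0]
print(describe_octave(1.6, 3))
```

Extrema are searched only in the s middle DoG levels, because each needs a level above and below; this is why s + 3 blurred images are needed to cover a full octave.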
•  Gaussian kernel size: the number of samples N needed along each axis of the kernel increases as σ increases, and so does the number of operations of a direct 2D convolution, which needs N² products and (N² − 1) sums per pixel. A good compromise is to sample the kernel on the interval [−3σ, 3σ].

•  Computational savings can be obtained by considering that the Gaussian kernel is separable: the 2D convolution can be computed as two one-dimensional convolutions, which need 2N products and (2N − 2) sums per pixel. This makes the computational complexity O(N) per pixel instead of O(N²).
•  Moreover, the convolution of two Gaussians with variances σ1² and σ2² is a Gaussian with variance σ3² = σ1² + σ2². This property can be exploited to build the scale space incrementally, reusing convolutions already computed.
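The three ideas in this part, a kernel sampled on [−3σ, 3σ], separable filtering, and adding variances when blurring incrementally, can be sketched in Python as follows. Zero padding at the borders and the helper names are assumptions made for the example.

```python
import math


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Sampled, normalized 1-D Gaussian on [-truncate*sigma, truncate*sigma]."""
    radius = int(math.ceil(truncate * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_1d(signal, kernel):
    """'Same'-size 1-D convolution with zero padding."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(-r, r + 1):
            if 0 <= i - j < n:
                acc += kernel[j + r] * signal[i - j]
        out.append(acc)
    return out


def blur_2d(image, sigma):
    """Separable Gaussian blur: rows first, then columns (2N products per pixel)."""
    kernel = gaussian_kernel_1d(sigma)
    rows = [convolve_1d(row, kernel) for row in image]
    cols = [convolve_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def incremental_sigma(sigma_from, sigma_to):
    """Extra blur that takes an image from sigma_from to sigma_to (variances add)."""
    return math.sqrt(sigma_to**2 - sigma_from**2)


print(incremental_sigma(1.6, 2.0))
```

For instance, to go from an image already blurred with σ = 1.6 to σ = 2.0 one applies an extra blur with σ = √(2.0² − 1.6²) = 1.2 instead of blurring the original image again.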
Detect maxima and minima of DoGs in scale space

•  Local extrema of D(x, y, σ) are the candidate interest points. To detect them at each scale level of the DoG pyramid, every pixel p is compared with its 8 neighbours:
–  if p is a local extremum (local minimum or maximum), it is selected as a candidate keypoint;
–  each candidate keypoint is then compared with its 9 neighbours in the scale above and the 9 in the scale below.
Only pixels that are local extrema across the 3 adjacent levels (26 neighbours in total) are promoted to keypoints.
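The 26-neighbour test can be sketched as below, assuming one octave of DoG images stored as nested lists dog[level][row][col]. The sketch compares against all 26 neighbours at once, which selects the same pixels as the two-stage check described above; the function names are hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater or strictly smaller than all 26 neighbours.

    dog is a list of same-size DoG images (lists of rows) from one octave;
    (s, y, x) must not lie on the first/last level or on the image border.
    """
    value = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(value > n for n in neighbours) or all(value < n for n in neighbours)


def find_candidates(dog):
    """Scan the interior of every middle DoG level for 26-neighbour extrema."""
    candidates = []
    for s in range(1, len(dog) - 1):
        height, width = len(dog[s]), len(dog[s][0])
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if is_scale_space_extremum(dog, s, y, x):
                    candidates.append((s, y, x))
    return candidates
```

Checking the 8 in-plane neighbours first, as in the text, is only an optimization: most pixels fail early, so few of them reach the 18 comparisons in the adjacent scales.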
Keypoint stability
•  The many points extracted from the maxima and minima of the DoGs have at best pixel accuracy, and may correspond to low-contrast, and therefore unreliable, points.
•  To improve keypoint stability, a function is fitted to the local sample points to determine the interpolated position of the extremum. Since points are defined in 3D (x, y, σ), this is a 3D fitting problem. The interpolation uses the quadratic Taylor expansion of the Difference-of-Gaussian scale-space function, with the candidate keypoint as the origin:
$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$$
where D and its derivatives are evaluated at the candidate keypoint k = (x, y, σ) and x = (x, y, σ)ᵀ is the offset from this point.

•  The location of the extremum x̂ is determined by taking the derivative of this function with respect to x and setting it to zero:
$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$
–  If the offset x̂ is larger than 0.5 in any dimension, the extremum lies closer to another candidate keypoint. In this case the candidate keypoint is changed and the interpolation is performed about that point instead.
–  Otherwise the offset is added to the candidate keypoint to get the interpolated estimate of the location of the extremum.
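The refinement step can be sketched by estimating the gradient and the 3 × 3 Hessian of D with finite differences over the 3 × 3 × 3 neighbourhood of the candidate, then solving the linear system for x̂ = −H⁻¹∇D. The finite-difference stencils, the small Gaussian-elimination solver and the helper names are assumptions of this sketch (Python, standard library only); the offset is returned in sample units in the order (x, y, σ).

```python
def solve3(a, b):
    """Solve the 3x3 system a @ v = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular Hessian: cannot interpolate")
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    v = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        v[r] = (m[r][3] - sum(m[r][c] * v[c] for c in range(r + 1, 3))) / m[r][r]
    return v


def refine_keypoint(cube):
    """Quadratic (Taylor) refinement from a 3x3x3 DoG neighbourhood cube[s][y][x].

    Returns (offset, value): offset = [dx, dy, dsigma] from the centre sample, in
    sample units, and value = D(x_hat), the DoG value at the interpolated extremum.
    """

    def d(ds, dy, dx):
        return cube[1 + ds][1 + dy][1 + dx]

    c = d(0, 0, 0)
    # Gradient by central differences, ordered (x, y, sigma).
    grad = [
        (d(0, 0, 1) - d(0, 0, -1)) / 2,
        (d(0, 1, 0) - d(0, -1, 0)) / 2,
        (d(1, 0, 0) - d(-1, 0, 0)) / 2,
    ]
    # Hessian entries by finite differences.
    dxx = d(0, 0, 1) - 2 * c + d(0, 0, -1)
    dyy = d(0, 1, 0) - 2 * c + d(0, -1, 0)
    dss = d(1, 0, 0) - 2 * c + d(-1, 0, 0)
    dxy = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4
    dxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4
    dys = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    # x_hat = -H^-1 grad, obtained by solving H x_hat = -grad.
    offset = solve3(hessian, [-g for g in grad])
    # D(x_hat) = D + 1/2 grad^T x_hat
    value = c + 0.5 * sum(g * o for g, o in zip(grad, offset))
    return offset, value


def needs_relocation(offset):
    """True if the extremum lies closer to a neighbouring sample (offset > 0.5)."""
    return any(abs(o) > 0.5 for o in offset)
```

If `needs_relocation(offset)` is true, the candidate would be moved to the neighbouring sample and the fit repeated; otherwise the offset is added to the candidate position.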

•  Low-contrast keypoints are generally less reliable than high-contrast ones, and keypoints that respond to edges are unstable. They can be filtered out, respectively, by:
•  thresholding on simple contrast
•  thresholding based on principal curvature

•  The local contrast can be obtained directly from D evaluated at the keypoint location updated in the previous step:
$$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}$$
Unstable extrema with low contrast are discarded according to Lowe’s rule |D(x̂)| < 0.03 (with image values in [0, 1]).
•  The DoG function has strong responses along edges. To eliminate keypoints that have poorly determined locations but high edge responses, note that a poorly defined peak in the DoG function has a principal curvature across the edge that is much larger than the principal curvature along it.
•  Finding these principal curvatures amounts to solving for the eigenvalues of the second-order Hessian matrix of D(x, y, σ):
$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$
The eigenvalues of H are proportional to the principal curvatures of D. The derivatives are estimated by taking differences of adjacent DoG sample points.
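•  The ratio of the two eigenvalues is sufficient for this purpose. Let α be the eigenvalue with the larger magnitude and β the smaller one, with α = rβ. Then
$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \mathrm{Det}(H) = D_{xx} D_{yy} - D_{xy}^2 = \alpha\beta$$
$$R = \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r + 1)^2}{r}$$
R depends only on the ratio of the two eigenvalues; it is minimum when the two eigenvalues are equal and increases as r increases.

To keep the ratio between the two principal curvatures below a threshold r, it suffices to check that
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}$$
If R is higher than this bound, the keypoint is poorly localized and hence rejected.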
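The two rejection tests can be combined into a single check. In this sketch the contrast threshold 0.03 is the value quoted above (image values assumed in [0, 1]) and r = 10 is the value used in Lowe’s paper; the function name and argument layout are hypothetical, and the Hessian entries are assumed to come from finite differences of adjacent DoG samples.

```python
def keep_keypoint(d_value, dxx, dyy, dxy, contrast_threshold=0.03, r=10.0):
    """Reject low-contrast and edge-like keypoints.

    d_value: D at the interpolated extremum (image intensities scaled to [0, 1]).
    dxx, dyy, dxy: entries of the 2x2 spatial Hessian of D at the keypoint.
    """
    if abs(d_value) < contrast_threshold:
        return False                      # low contrast
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False                      # curvatures of opposite sign: not a peak
    return trace * trace / det < (r + 1) ** 2 / r


print(keep_keypoint(0.10, -1.0, -1.0, 0.0))   # blob-like peak: kept
print(keep_keypoint(0.10, -10.0, -0.1, 0.0))  # edge-like response: rejected
print(keep_keypoint(0.01, -1.0, -1.0, 0.0))   # low contrast: rejected
```

Testing Tr(H)²/Det(H) avoids computing the eigenvalues explicitly; the extra Det(H) ≤ 0 check discards saddle-like points whose curvatures have opposite signs.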
(Figure: maxima in D; the keypoints remaining after removing low-contrast and edge responses.)
•  Experimental evaluation of detectors with respect to scale change uses the repeatability rate:
$$\text{repeatability rate} = \frac{\#\ \text{correspondences}}{\#\ \text{possible correspondences (points present)}}$$
•  A common drawback of both the LoG and the DoG representations is that local maxima can also be detected in the neighbourhood of contours or straight edges, where the signal changes in only one direction.
•  These maxima are less stable because their localization is more sensitive to noise or to small changes in neighbouring texture.
Orientation assignment
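•  For a keypoint, let L be the Gaussian-smoothed image with the closest scale. For a region around the keypoint, compute the gradient magnitude and orientation using finite differences:
$$\nabla L = \begin{bmatrix} L(x+1, y) - L(x-1, y) \\ L(x, y+1) - L(x, y-1) \end{bmatrix}$$
$$m(x, y) = \lVert \nabla L \rVert, \qquad \theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$$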
For such a region:
-  create a histogram with 36 bins (10° each) for orientation;
-  weight each sample by its gradient magnitude and by a Gaussian window with σ equal to 1.5 times the keypoint scale.
•  The highest peak, with its position refined by a least-squares parabolic fit, gives the keypoint canonical orientation.
•  Any other local peak within 80% of the highest peak is used to create a keypoint with that orientation, so a keypoint can have multiple orientations. About 15% of keypoints have multiple orientations.
Once the local orientation and scale of a keypoint have been estimated, a scaled and oriented patch around the detected point can be extracted and used to form a feature descriptor.
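The orientation histogram can be sketched as follows (Python, standard library only). The patch is assumed to be a square window cut from the Gaussian image L closest to the keypoint scale; the 36 bins, the 1.5σ Gaussian weighting and the 80% rule follow the description above, while the parabolic refinement of the peak is omitted and the function names are hypothetical.

```python
import math


def orientation_histogram(patch, sigma, num_bins=36):
    """Gradient-orientation histogram of a square patch centred on the keypoint.

    patch: odd-sized square window (list of rows) cut from L, the Gaussian image
    closest to the keypoint scale; sigma: keypoint scale in pixels.
    """
    size = len(patch)
    centre = size // 2
    window_sigma = 1.5 * sigma          # Gaussian weighting window
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # L(x+1, y) - L(x-1, y)
            gy = patch[y + 1][x] - patch[y - 1][x]   # L(x, y+1) - L(x, y-1)
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            weight = math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2 * window_sigma**2))
            hist[int(angle // bin_width) % num_bins] += weight * magnitude
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Bin centres (degrees) of local peaks that reach `ratio` of the highest peak."""
    n = len(hist)
    top = max(hist)
    bin_width = 360.0 / n
    return [
        (i + 0.5) * bin_width
        for i in range(n)
        if hist[i] >= ratio * top
        and hist[i] > hist[i - 1]
        and hist[i] >= hist[(i + 1) % n]
    ]


ramp = [[float(x) for x in range(9)] for _ in range(9)]  # brighter towards +x
print(dominant_orientations(orientation_histogram(ramp, sigma=2.0)))  # [5.0]
```

Angles are measured with y pointing down the image rows, as in the finite differences above; each returned orientation would give rise to its own keypoint.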
