
Interest Point Detectors

(see CS485/685 notes for more details)

CS491Y/691Y Topics in Computer Vision


Dr. George Bebis
What is an Interest Point?
• A point in an image which has a well-defined position
and can be robustly detected.
• Typically associated with a significant change of one or
more image properties simultaneously (e.g., intensity,
color, texture).
What is an Interest Point? (cont’d)
• Corners are a special case of interest points.
• However, interest points could be more generic than
corners.
Why are interest points useful?
• Can be used to find corresponding points between
images, which is very useful for numerous
applications!
(figures: panorama stitching; stereo matching with left and right camera views)
How to find corresponding points?

(figure: two image patches whose feature descriptors are compared for a match)
• Need to define local patches surrounding the interest points
and extract feature descriptors from every patch.
• Match feature descriptors to find corresponding points.
Properties of good features
• Local: robust to occlusion and clutter.
• Accurate: precise localization.

• Covariant (repeatable).
• Robust: noise, blur, compression, etc.

• Efficient: close to real-time performance.


Interest point detectors should be covariant

• Features should be detected in corresponding locations despite
geometric or photometric changes.
Interest point descriptors should be invariant

(figure: two image patches whose feature descriptors should match)

• Descriptors should be similar despite geometric
or photometric transformations.
Interest point candidates

• Use features with gradients in at least two significantly
different orientations (e.g., corners, junctions, etc.)
Harris Corner Detector
• Assuming a W x W window, it computes the matrix:

$$A_W = \sum_{x \in W,\ y \in W} w(x,y) \begin{bmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{bmatrix}$$

• $A_W(x,y)$ is a 2 x 2 matrix called the auto-correlation matrix.

• $f_x$, $f_y$ are the horizontal and vertical derivatives.

• $w(x,y)$ is a smoothing function (e.g., a Gaussian).


C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proceedings of the
4th Alvey Vision Conference, pp. 147-151, 1988.
Why is the auto-correlation
matrix useful?

Describes the gradient distribution


(i.e., local structure) inside the
window!

$$A_W = \sum_{x \in W,\ y \in W} w(x,y) \begin{bmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{bmatrix}$$
Properties of the auto-correlation matrix
1 0 
1
Aw is symmetric and AW  R   R
can be decomposed :  0 2 

• We can visualize AW as an ellipse with axis lengths and directions


determined by its eigenvalues and eigenvectors.

(min)1/2
(max)1/2
Harris Corner Detector (cont’d)

• The eigenvectors of $A_W$ encode the direction of intensity change.

• The eigenvalues of $A_W$ encode the strength of intensity change.

(figure: ellipse with eigenvector $v_1$ along the direction of fastest
change, $v_2$ along the direction of slowest change, and axis lengths
$(\lambda_{\max})^{-1/2}$ and $(\lambda_{\min})^{-1/2}$)
Harris Corner Detector (cont’d)
Classification of pixels using the eigenvalues of $A_W$:

• "Corner": $\lambda_1$ and $\lambda_2$ are large and $\lambda_1 \sim \lambda_2$;
intensity changes in all directions.

• "Edge": $\lambda_1 \gg \lambda_2$ (or $\lambda_2 \gg \lambda_1$).

• "Flat" region: $\lambda_1$ and $\lambda_2$ are small;
intensity is almost constant in all directions.
Harris Corner Detector (cont’d)

• To avoid computing the eigenvalues explicitly, the
Harris detector uses the following function ($\alpha$ is a constant):

$$R(A_W) = \det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$$

which is equal to:

$$R(A_W) = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$$


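The equivalence of the two forms of $R$ can be checked numerically; a minimal sketch (the matrix entries and $\alpha = 0.05$ are illustrative values, not from the slides):

```python
import numpy as np

# Illustrative symmetric, positive semi-definite auto-correlation matrix
# for a single pixel (values made up for the check).
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
alpha = 0.05

# Harris response via determinant and trace (no eigen-decomposition needed).
R_fast = np.linalg.det(A) - alpha * np.trace(A) ** 2

# The same response via the eigenvalues: R = l1*l2 - alpha*(l1 + l2)^2.
l1, l2 = np.linalg.eigvalsh(A)
R_eig = l1 * l2 - alpha * (l1 + l2) ** 2
```

Both forms agree, which is exactly why the detector never needs an explicit eigen-decomposition.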
Harris Corner Detector (cont’d)

Classification of image points using $R(A_W)$:

$$R(A_W) = \det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$$

($\alpha$ is usually between 0.04 and 0.06)

• "Corner": $R > 0$
• "Edge": $R < 0$
• "Flat" region: $|R|$ small
Harris Corner Detector (cont’d)

• Other cornerness functions:

$$\lambda_2 - \alpha \lambda_1$$

$$\frac{\det A}{\mathrm{tr}\,A} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2}$$
Harris Corner Detector - Steps

1. Compute the horizontal and vertical (Gaussian) derivatives:

$$f_x = \frac{\partial}{\partial x} G(x, y, \sigma_D) * f(x_i, y_i) \qquad f_y = \frac{\partial}{\partial y} G(x, y, \sigma_D) * f(x_i, y_i)$$

($\sigma_D$ is called the "differentiation" scale)

2. Compute the three images involved in $A_W$:

$$A_W = \sum_{x \in W,\ y \in W} w(x,y) \begin{bmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{bmatrix}$$
Harris Detector - Steps

3. Convolve each of the three images with a larger Gaussian $w(x,y)$
($\sigma_I$ is called the "integration" scale).

4. Determine cornerness:

$$R(A_W) = \det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$$

5. Find local maxima
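The five steps above can be sketched with SciPy's Gaussian filters. This is a minimal sketch, not the slides' exact implementation; the scales, $\alpha = 0.05$, and the relative threshold are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma_d=1.0, sigma_i=2.0, alpha=0.05, rel_thresh=0.1):
    """Steps 1-5 of the Harris detector on a grayscale image."""
    img = img.astype(float)
    # Step 1: Gaussian derivatives at the differentiation scale sigma_D.
    fx = gaussian_filter(img, sigma_d, order=(0, 1))  # derivative along x (columns)
    fy = gaussian_filter(img, sigma_d, order=(1, 0))  # derivative along y (rows)
    # Steps 2-3: the three images of A_W, each smoothed with a larger
    # Gaussian (integration scale sigma_I).
    Sxx = gaussian_filter(fx * fx, sigma_i)
    Syy = gaussian_filter(fy * fy, sigma_i)
    Sxy = gaussian_filter(fx * fy, sigma_i)
    # Step 4: cornerness R = det(A_W) - alpha * trace(A_W)^2, per pixel.
    R = Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2
    # Step 5: local maxima of R above a relative threshold (3x3 NMS).
    is_max = (R == maximum_filter(R, size=3)) & (R > rel_thresh * R.max())
    return R, np.argwhere(is_max)  # corner coordinates as (row, col)

# A white square on a black background: maxima should land near its corners.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
R, corners = harris_corners(img)
```

On this synthetic square the surviving maxima cluster near the four corners; in practice the scales and threshold are tuned per image.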


Harris Corner Detector - Example
Harris Detector - Example
Compute corner response R
Find points with large corner response:
R>threshold
Take only the points of local maxima of R
Map corners on the original image
(for visualization)
Harris Corner Detector (cont’d)
• Rotation invariant

• Sensitive to:
• Scale change
• Significant viewpoint change
• Significant contrast change
Multi-scale Harris Detector
• Detect interest points at varying scales.
$$R(A_W) = \det(A_W(x, y, \sigma_I, \sigma_D)) - \alpha\,\mathrm{trace}^2(A_W(x, y, \sigma_I, \sigma_D))$$

(figure: Harris detection repeated over a scale pyramid with scales
$\sigma_n = k^n \sigma$, $\sigma_D = \sigma_n$, $\sigma_I = \gamma \sigma_D$)
Multi-scale Harris Detector (cont’d)
• The same interest point will be detected at multiple
consecutive scales.
• Interest point location will shift as scale increases (i.e.,
due to smoothing).

i.e., the size of each circle


corresponds to the scale at
which the interest point
was detected.
How do we match them?
• Corresponding features might appear at different scales.
• How do we determine these scales?
• We need a scale selection mechanism!
Exhaustive Search
• Simple approach for scale selection but not efficient!
Characteristic Scale
• Find the characteristic scale of each feature (i.e., the scale
revealing the spatial extent of an interest point).

(figure: response profiles over scale for two corresponding features;
each peak marks the characteristic scale)


Characteristic Scale (cont’d)
• Only a subset of interest points are selected using the
characteristic scale of each feature.

• i.e., the size of the


circles is related to
the scale at which
the interest points
were selected.

Matching can be simplified!


Automatic Scale Selection – Main Idea
• Design a function F(x,σn) which provides some local measure.
• Select points at which F(x,σn) is maximal over σn.

(figure: plot of $F(x, \sigma_n)$ versus $\sigma_n$; the maximum of
$F(x, \sigma_n)$ corresponds to the characteristic scale)
T. Lindeberg, "Feature detection with automatic scale selection" International
Journal of Computer Vision, vol. 30, no. 2, pp 77-116, 1998.
Lindeberg et al, 1996

Slide from Tinne Tuytelaars


Automatic Scale Selection (cont’d)

• Using characteristic scale,


the spatial extent of interest
points becomes covariant
to scale transformations.

• The ratio σ1/σ2 reveals the


scale factor between the
images.
(figure: corresponding regions detected at scales $\sigma_1$ and $\sigma_2$,
with $\sigma_1/\sigma_2$ = 2.5)
How to choose F(x,σn) ?
• Typically, F(x,σn) is defined using derivatives, e.g.:
Square gradient: $\sigma^2 \left( L_x^2(x, \sigma) + L_y^2(x, \sigma) \right)$

LoG: $\left| \sigma^2 \left( L_{xx}(x, \sigma) + L_{yy}(x, \sigma) \right) \right|$

DoG: $\left| I(x) * G(\sigma_{n+1}) - I(x) * G(\sigma_n) \right|$

Harris function: $\det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$
• LoG (Laplacian of Gaussian) yielded best results in a
recent evaluation study.
• DoG (Difference of Gaussians) was second best.
C. Schmid, R. Mohr, and C. Bauckhage, "Evaluation of Interest Point Detectors",
International Journal of Computer Vision, 37(2), pp. 151-172, 2000.
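A characteristic-scale sketch using the scale-normalized LoG from the list above. The scale grid and the synthetic disc are illustrative; for an ideal disc of radius r the extremum is expected near σ = r/√2:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(img, y, x, sigmas):
    # F(x, sigma_n) = |sigma_n^2 (Lxx + Lyy)|, evaluated at one pixel.
    responses = [abs(s ** 2 * gaussian_laplace(img, s)[y, x]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]

# Synthetic image: a bright disc of radius 8 centered at (32, 32).
yy, xx = np.mgrid[0:64, 0:64]
img = (((xx - 32) ** 2 + (yy - 32) ** 2) <= 8 ** 2).astype(float)

sigmas = np.arange(1.0, 12.0, 0.5)
s_star = characteristic_scale(img, 32, 32, sigmas)
```

For r = 8 the maximum lands near 8/√2 ≈ 5.7, so `s_star` tracks the spatial extent of the blob, which is exactly what scale selection is after.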
LoG and DoG

• LoG can be approximated by DoG:

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2\,\nabla^2 G$$
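The approximation can be checked numerically on a smoothed random image; k = 1.6 and σ = 2.0 are illustrative choices (`gaussian_laplace` applies the Laplacian of a Gaussian):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

rng = np.random.default_rng(0)
img = gaussian_filter(rng.standard_normal((128, 128)), 2.0)  # smooth test image

sigma, k = 2.0, 1.6
# Difference of Gaussians at scales k*sigma and sigma...
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
# ...versus the scaled Laplacian of Gaussian, (k - 1) * sigma^2 * LoG.
log = (k - 1) * sigma ** 2 * gaussian_laplace(img, sigma)

# The two responses should be nearly proportional across the image.
corr = np.corrcoef(dog.ravel(), log.ravel())[0, 1]
```

The correlation is close to 1, which is why DoG is a popular cheap substitute for LoG.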
Harris-Laplace Detector
• Multi-scale Harris with scale selection.
• Uses LoG maxima to find characteristic scale.

(figure: at each scale $\sigma_n$, Harris maxima localize the point in
$x$, $y$; LoG maxima across the scales $\sigma_n$ select the characteristic scale)
Harris-Laplace Detector (cont’d)

(1) Find interest points at multiple scales using Harris detector.


- Scales are chosen as follows: σn =knσ
- At each scale, choose local maxima assuming 3 x 3 window
(i.e., non-maximal suppression)

$$F(x, \sigma_n) > F(x_W, \sigma_n) \quad \forall\, x_W \in W$$
$$F(x, \sigma_n) > t_h$$

where $F(x, \sigma_n) = \det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$ and $\sigma_D = \sigma_n$, $\sigma_I = \gamma \sigma_D$.
Harris-Laplace Detector (cont’d)
(2) Select points at which the normalized LoG is maximal
across scales and the maximum is above a threshold.

$$F(x, \sigma_n) > F(x, \sigma_{n-1}) \ \text{and}\ F(x, \sigma_n) > F(x, \sigma_{n+1})$$
$$F(x, \sigma_n) > t$$

where:

$$F(x, \sigma_n) = \left| \sigma_n^2 \left( L_{xx}(x, \sigma_n) + L_{yy}(x, \sigma_n) \right) \right|$$

K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points",
IEEE Int. Conference on Computer Vision, pp. 525-531, 2001.
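Putting steps (1) and (2) together, a compact Harris-Laplace sketch. The scale grid, γ, the thresholds, and the lenient treatment of the boundary scales are illustrative simplifications, not the paper's exact algorithm:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace, maximum_filter

def harris_response(img, sd, si, alpha=0.05):
    fx = gaussian_filter(img, sd, order=(0, 1))
    fy = gaussian_filter(img, sd, order=(1, 0))
    Sxx = gaussian_filter(fx * fx, si)
    Syy = gaussian_filter(fy * fy, si)
    Sxy = gaussian_filter(fx * fy, si)
    return Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2

def harris_laplace(img, sigma0=1.0, k=1.4, n_scales=8, gamma=1.4):
    img = img.astype(float)
    sigmas = sigma0 * k ** np.arange(n_scales)
    # Scale-normalized LoG, |sigma_n^2 (Lxx + Lyy)|, at every scale.
    logs = np.stack([np.abs(s ** 2 * gaussian_laplace(img, s)) for s in sigmas])
    points = []
    for n, s in enumerate(sigmas):
        R = harris_response(img, sd=s, si=gamma * s)
        # Step (1): spatial maxima of the Harris response (3x3 window).
        cand = (R == maximum_filter(R, size=3)) & (R > 0.1 * R.max())
        for y, x in np.argwhere(cand):
            # Step (2): keep only points whose normalized LoG is maximal
            # over scales (boundary scales kept leniently in this sketch).
            v = logs[n, y, x]
            if (n == 0 or v > logs[n - 1, y, x]) and \
               (n == n_scales - 1 or v > logs[n + 1, y, x]):
                points.append((int(y), int(x), float(s)))
    return points

img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
pts = harris_laplace(img)
```

The LoG test prunes the stacks of repeated detections that plain multi-scale Harris produces, leaving each corner at (roughly) one characteristic scale.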
Example
• Interest points detected at each scale using Harris-Laplace
– Few correspondences between levels corresponding to same σ.
– More correspondences between levels having a ratio of σ = 2.

(figure: detections at scales σ = 1.2, 2.4, 4.8, 9.6 in two images
that differ by a scale factor of 1.92)
Example (cont’d)
(same viewpoint – change in focal length and orientation)

-More than 2000 points would have been detected without scale selection.
-Using scale selection, 190 and 213 points were detected in the left and
right images, respectively.
Example (cont’d)

58 points are initially matched (some were not correct)


Example (cont’d)

• Reject outliers (i.e., inconsistent matches) using RANSAC


• Left with 32 matches, all of which are correct.
• The estimated scale factor is 4.9 and the rotation angle is 19 degrees.
Harris-Laplace Detector (cont’d)
Repeatability
• Invariant to:
– Scale
– Rotation
– Translation

• Robust to:
– Illumination changes
– Limited viewpoint changes
Harris-Laplace using DoG
Look for local maxima
in DoG pyramid
DoG pyramid

David G. Lowe, "Distinctive image features from scale-invariant keypoints", Int.
Journal of Computer Vision, 60(2), pp. 91-110, 2004.
Handling Affine Changes
• Similarity transformations cannot account for perspective
distortions; affine could be used for planar surfaces.

• Similarity transform

• Affine transform
Harris Affine Detector
• Use an iterative approach:
– Extract approximate locations and scales using the Harris-
Laplace detector.
– For each point, modify the scale and shape of its neighborhood
in an iterative fashion.
– Converges to stable points that are covariant to affine
transformations.
Steps of Iterative Affine Adaptation
1. Detect initial locations and neighborhood using Harris-
Laplace.

2. Estimate affine shape of neighborhood using 2nd order


moment matrix μ(x, σI, σD).
Steps of Iterative Affine Adaptation (cont’d)
3. Normalize (i.e., de-skew) the affine region by mapping it to
a circular one (i.e., “remove” perspective distortions).

4. Re-detect the new location and scale in the normalized


image.

5. Go to step 2 if the eigenvalues of μ(x, σI, σD) for the new


point are not equal (i.e., not yet adapted to the characteristic
shape).
Iterative affine adaptation - Examples
Initial points
Example 1 Example 2
Iterative affine adaptation – Examples (cont’d)

Iteration #1

Example 1 Example 2
Iterative affine adaptation – Examples (cont’d)

Iteration #2
Example 1 Example 2
Iterative affine adaptation – Examples (cont’d)

Iteration #3, #4, …


Example 1 Example 2

K. Mikolajczyk and C. Schmid, “Scale and Affine invariant interest point detectors”,
International Journal of Computer Vision, 60(1), pp. 63-86, 2004.
http://www.robots.ox.ac.uk/~vgg/research/affine/
Iterative affine adaptation – Examples (cont’d)
De-skewing
• Consider a point $x_L$ with 2nd-order moment matrix $M_L$.

• The de-skewing transformation is defined as follows:
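One common convention (an assumption here, since the slide's formula is in the figure) normalizes with the matrix square root of $M_L$: points of the ellipse $x^T M_L x = 1$ map to a unit circle under $x' = M_L^{1/2} x$. A sketch with an illustrative moment matrix:

```python
import numpy as np

# De-skewing sketch: map the anisotropic region described by x^T M x = 1
# to a circle via x' = M^(1/2) x. (Sign/normalization conventions differ
# between papers; this shows the geometric idea, not the slides' formula.)
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # illustrative 2nd-order moment matrix (symmetric PSD)

# Matrix square root via the eigen-decomposition of the symmetric matrix M.
vals, vecs = np.linalg.eigh(M)
M_sqrt = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# Points on the ellipse x^T M x = 1, obtained as the pre-image of the
# unit circle, then de-skewed back to a circle.
t = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)])   # unit circle
ellipse = np.linalg.inv(M_sqrt) @ circle    # the anisotropic region
normalized = M_sqrt @ ellipse               # de-skewed region

radii = np.linalg.norm(normalized, axis=0)  # all equal 1: it is a circle
```

After this normalization the region's moment matrix is isotropic, so only a rotation ambiguity remains between corresponding patches.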


De-skewing corresponding regions
• Consider two points xL and xR which are related
through an affine transformation:
(figure: corresponding regions around $x_L$ and $x_R$)
De-skewing corresponding regions (cont’d)

Normalized regions
are related by pure
rotation R.
Resolving orientation ambiguity
• Create histogram of local gradient directions in the patch.
• Smooth histogram and assign canonical orientation at
peak of smoothed histogram.

(figure: 36-bin histogram of gradient directions over $[0, 2\pi]$;
the peak of the smoothed histogram marks the dominant gradient direction)
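A sketch of the histogram step for a single patch. The 36-bin count matches the slide; the smoothing width and the synthetic ramp patch are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def dominant_orientation(patch, n_bins=36):
    """Canonical orientation: peak of the smoothed gradient-direction histogram."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)            # angles in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    # Magnitude-weighted histogram of gradient directions.
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    hist = gaussian_filter1d(hist, sigma=1, mode='wrap')   # circular smoothing
    peak = int(np.argmax(hist))
    return (peak + 0.5) * 2 * np.pi / n_bins               # bin center, radians

# A patch whose intensity increases left to right: gradient points along +x,
# so the dominant orientation should be near 0 radians.
patch = np.tile(np.arange(16, dtype=float), (16, 1))
theta = dominant_orientation(patch)
```

Rotating every patch by its dominant orientation before describing it is what removes the remaining rotation ambiguity.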
Resolving orientation ambiguity (cont’d)
• Resolve the orientation ambiguity by aligning the dominant
orientations of the normalized patches; this determines the
rotation R between them.
Other Affine Invariant Blob/Region Detectors

• There are many other techniques for detecting affine


invariant blobs or regions, for example:

– Intensity Extrema-Based Region (IER)


– Maximally Stable Extremal Regions (MSERs)

• No need to detect interest points.


Intensity Extrema-Based Region Detector
(1) Take a local intensity extremum as
initial point.

(2) Examine the intensity along “rays”


from the local extremum.

(3) The point on each ray for which $f_I(t)$
reaches an extremum is selected
(i.e., invariant):

$$f_I(t) = \frac{|I(t) - I_0|}{\max\left( \frac{1}{t} \int_0^t |I(\tau) - I_0|\, d\tau,\ d \right)}$$

($d > 0$ is a small constant that prevents division by zero)

(4) Linking these points together yields
an affinely invariant region, to
which an ellipse is fitted.
Tuytelaars, T. and Van Gool, L. “Matching Widely Separated Views based on Affine Invariant
Regions”, International Journal on Computer Vision, 59(1):61–85, 2004.
Intensity Extrema-Based Region Detector (cont’d)

• The regions found may not exactly correspond, so we


approximate them with ellipses using 2nd order moments.
Intensity Extrema-Based Region Detector (cont’d)

• Double the size of the ellipse to make regions more distinctive.

• Final ellipse is not necessarily centered at original anchor


point.
Intensity Extrema-Based Region Detector (cont’d)
Intensity Extrema-Based Region Detector (cont’d)
• Region extraction is fairly robust to inaccurate localization
of intensity extremum.
Maximally Stable Extremal Regions (MSERs)
• Consider all possible thresholdings of a gray-scale image:

If I(x,y) > It then I(x,y)=255; else I(x,y)=0

Matas, J., Chum, O., Urban, M. and Pajdla, T., “Robust wide-baseline stereo from maximally
stable extremal regions”, British Machine Vision Conf, pp. 384–393, 2002.
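A brute-force illustration of the thresholding idea. Real MSER implementations use an efficient union-find sweep over intensity levels; the synthetic image, seed pixel, and threshold grid here are made up:

```python
import numpy as np
from scipy.ndimage import label

# Dark noise with one bright blob: the blob's connected component should
# keep the same area over a wide range of thresholds (i.e., be "stable").
rng = np.random.default_rng(1)
img = rng.integers(0, 60, size=(64, 64)).astype(float)      # dark noise
img[20:44, 20:44] = 200 + rng.integers(0, 10, (24, 24))     # bright blob

seed = (32, 32)                          # a pixel inside the blob
thresholds = np.arange(70, 190, 5)
areas = []
for t in thresholds:
    lab, _ = label(img > t)              # bright extremal regions at level t
    areas.append(np.sum(lab == lab[seed]))  # area of the seed's region

areas = np.array(areas, dtype=float)
# Stability: relative area change between consecutive thresholds.
change = np.abs(np.diff(areas)) / areas[:-1]
```

Here the blob's area stays at 24 x 24 = 576 across the whole sweep, so its relative change is zero: a maximally stable region.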
Maximally Stable Extremal Regions (MSERs)

• Extremal Regions
– All regions formed using different thresholds.
– Can be extracted using connected components.

• Maximally Stable Extremal Regions


– Regions that remain stable over a large range of thresholds.

• Approximate MSER by an ellipse


Example
• Extremal regions can be extracted in O(n log(log n)) time,
where n is the number of pixels.

• Sensitive to image blur.
