
Computer Vision

Lecture 6
FEATURE DETECTION and MATCHING
Today’s Topics

• Feature Detection
• Feature Description
• Feature Matching

Questions of Interest
• How to find camera parameters?
• Where is the camera, and where is it pointed?
• What is the movement of the camera?

• Where are the objects located in 3D?

• What are the dimensions of objects in 3D?
• What is the 3D structure of a scene?

• How to process stereo video?

• How to detect and match image features?
• How to stitch images?

Introduction to Feature Detection

Image Mosaics
Object Recognition
Image Matching
Image Matching using Feature Points

NASA Mars Rover images


Feature Points are Used for …

• Image alignment (e.g., mosaics)
• 3D reconstruction
• Motion tracking
• Object recognition
• Indexing and database retrieval
• Robot navigation
• …
Advantages of Local Features
Locality
  - features are local, so robust to occlusion and clutter
Distinctiveness
  - can differentiate a large database of objects
Quantity
  - hundreds or thousands in a single image
Efficiency
  - real-time performance achievable
Generality
  - exploit different types of features in different situations
Point vs. Line Features
Processing Stages

• Feature detection/extraction (look for stable/accurate features)

• Feature description (aim for invariance to transformations)

• Feature matching (for photo collections) or tracking (for video)
Feature Detection

Invariant Local Features

Image content is transformed into local feature coordinates that are
invariant to translation, rotation, scale, and other imaging parameters.
Feature Descriptors
Choosing interest points
Where would you tell your friend to meet you?

In computer vision, an interest point (also known as a keypoint or
feature point) is a specific location in an image that has unique and
distinguishable characteristics. Interest points are used to identify
and match corresponding features between different images or frames.
These points are selected based on properties such as intensity
variations, corners, edges, or other distinctive patterns.

Interest points play a crucial role in various computer vision tasks,
including image recognition, object detection, and image stitching.
They serve as landmarks that can be reliably detected and tracked
across different images, enabling the computer to understand and
compare the visual content.
Choosing interest points
Where would you tell your friend to meet you?

Common properties of interest points include:

Invariance to transformations: interest points are selected to be
invariant to certain transformations, such as rotation, scale, and
illumination changes. This ensures that the points can be reliably
matched even if the overall appearance of the scene varies.

Repeatability: interest points should be reliably detectable under
different viewing conditions or when the scene undergoes changes.

Distinctiveness: interest points should have unique characteristics
that make them easily distinguishable from the surrounding features.
This distinctiveness is essential for accurate matching.
Stable (Good) Features:
High Accuracy in Localization
Stable (Good) Features

[Figure: the aperture problem vs. a good feature.]

"flat" region: no change in all directions
"edge": no change along the edge direction (aperture problem)
"corner": significant change in all directions (uniqueness)
Autocorrelation:
Indicator of “Good”ness

R(u,v) = \sum_{(x,y) \in W} I(x+u, y+v) \, I(x,y)
Autocorrelation Calculation

R(u,v) = \sum_{(x,y) \in W} I(x+u, y+v) \, I(x,y)

Autocorrelation can be approximated by the sum-squared-difference (SSD):

SSD(u,v) = \sum_{(x,y) \in W} [\, I(x+u, y+v) - I(x,y) \,]^2
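A minimal numpy sketch of this SSD error for a single window and shift (the function name and the window convention are my choices, not from the slides; border handling is ignored for clarity):

```python
import numpy as np

def ssd_error(I, u, v, window):
    """E(u, v) = sum over window W of [I(x+u, y+v) - I(x, y)]^2.

    I: 2D grayscale image (float array); window: (row_slice, col_slice)
    selecting the patch W; u, v: integer shift in columns and rows."""
    r, c = window
    patch = I[r, c]
    shifted = I[r.start + v:r.stop + v, c.start + u:c.stop + u]
    return float(np.sum((shifted - patch) ** 2))
```

On a flat region this error is zero for any shift; on a textured region it grows with the shift, which is exactly the property the detector exploits.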
Corner Detection: Mathematics
Change in appearance of window w(x,y)
for the shift [u,v]:
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

[Figure: the image I(x,y), the weighting window w(x,y), and the error
surface E(u,v) evaluated at the shift (3,2).]
Corner Detection: Mathematics
Change in appearance of window w(x,y)
for the shift [u,v]:
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

[Figure: the same surfaces, with E evaluated at the zero shift (0,0).]
Corner Detection: Mathematics
Change in appearance of window w(x,y)
for the shift [u,v]:

E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

Here w(x,y) is the window function (either 1 inside the window and 0
outside, or a Gaussian), I(x+u, y+v) is the shifted intensity, and
I(x,y) is the original intensity.

Source: R. Szeliski
Corner Detection: Mathematics
Change in appearance of window w(x,y)
for the shift [u,v]:
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

We want to find out how this function behaves for small shifts.
Corner Detection: Mathematics

Change in appearance of window w(x,y) for the shift [u,v]:

E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

We want to find out how this function behaves for small shifts.

The local quadratic approximation of E(u,v) in the neighborhood of
(0,0) is given by the second-order Taylor expansion:

E(u,v) \approx E(0,0)
  + [u \ v] \begin{bmatrix} E_u(0,0) \\ E_v(0,0) \end{bmatrix}
  + \frac{1}{2} [u \ v] \begin{bmatrix} E_{uu}(0,0) & E_{uv}(0,0) \\ E_{uv}(0,0) & E_{vv}(0,0) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}
Taylor Expansion
Corner Detection: Mathematics
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

Second-order Taylor expansion of E(u,v) about (0,0):

E(u,v) \approx E(0,0)
  + [u \ v] \begin{bmatrix} E_u(0,0) \\ E_v(0,0) \end{bmatrix}
  + \frac{1}{2} [u \ v] \begin{bmatrix} E_{uu}(0,0) & E_{uv}(0,0) \\ E_{uv}(0,0) & E_{vv}(0,0) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}

The required derivatives are:

E_u(u,v) = \sum_{x,y} 2 w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,] \, I_x(x+u, y+v)

E_{uu}(u,v) = \sum_{x,y} 2 w(x,y) \, I_x(x+u, y+v) \, I_x(x+u, y+v)
            + \sum_{x,y} 2 w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,] \, I_{xx}(x+u, y+v)

E_{uv}(u,v) = \sum_{x,y} 2 w(x,y) \, I_y(x+u, y+v) \, I_x(x+u, y+v)
            + \sum_{x,y} 2 w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,] \, I_{xy}(x+u, y+v)
Corner Detection: Mathematics
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

Second-order Taylor expansion of E(u,v) about (0,0):

E(u,v) \approx E(0,0)
  + [u \ v] \begin{bmatrix} E_u(0,0) \\ E_v(0,0) \end{bmatrix}
  + \frac{1}{2} [u \ v] \begin{bmatrix} E_{uu}(0,0) & E_{uv}(0,0) \\ E_{uv}(0,0) & E_{vv}(0,0) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}

Evaluating the terms at (0,0):

E(0,0) = 0
E_u(0,0) = 0
E_v(0,0) = 0

E_{uu}(0,0) = \sum_{x,y} 2 w(x,y) \, I_x(x,y) \, I_x(x,y)
E_{vv}(0,0) = \sum_{x,y} 2 w(x,y) \, I_y(x,y) \, I_y(x,y)
E_{uv}(0,0) = \sum_{x,y} 2 w(x,y) \, I_x(x,y) \, I_y(x,y)
Corner Detection: Mathematics
E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u, y+v) - I(x,y) \,]^2

Substituting the values at (0,0) into the second-order Taylor
expansion, the approximation becomes:

E(u,v) \approx [u \ v] \begin{bmatrix} \sum_{x,y} w(x,y) \, I_x^2(x,y) & \sum_{x,y} w(x,y) \, I_x(x,y) I_y(x,y) \\ \sum_{x,y} w(x,y) \, I_x(x,y) I_y(x,y) & \sum_{x,y} w(x,y) \, I_y^2(x,y) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}

using

E(0,0) = 0, \quad E_u(0,0) = 0, \quad E_v(0,0) = 0

E_{uu}(0,0) = \sum_{x,y} 2 w(x,y) \, I_x(x,y) \, I_x(x,y)
E_{vv}(0,0) = \sum_{x,y} 2 w(x,y) \, I_y(x,y) \, I_y(x,y)
E_{uv}(0,0) = \sum_{x,y} 2 w(x,y) \, I_x(x,y) \, I_y(x,y)
Corner Detection: Mathematics
The quadratic approximation simplifies to

E(u,v) \approx [u \ v] \, M \begin{bmatrix} u \\ v \end{bmatrix}

where M is a second moment matrix computed from image derivatives:

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
Corners as distinctive interest points

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x I_x & I_x I_y \\ I_x I_y & I_y I_y \end{bmatrix}

M is a 2x2 matrix of image derivatives (averaged in the neighborhood
of a point).

Notation: I_x \Leftrightarrow \frac{\partial I}{\partial x}, \quad
I_y \Leftrightarrow \frac{\partial I}{\partial y}, \quad
I_x I_y \Leftrightarrow \frac{\partial I}{\partial x} \frac{\partial I}{\partial y}
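As a minimal sketch, M can be accumulated over a patch with a uniform window w(x,y) = 1 and simple finite differences for the derivatives (the function name is mine; a real detector would use a Gaussian window per pixel):

```python
import numpy as np

def second_moment_matrix(patch):
    """M summed over a whole patch with a uniform window w(x, y) = 1."""
    Iy, Ix = np.gradient(patch.astype(float))  # d/drow, d/dcol
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
```

On a pure vertical step edge all the gradient energy lands in the I_x I_x entry, so one eigenvalue of M is zero: an edge, not a corner.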
Interpreting the second moment matrix

The surface E(u,v) is locally approximated by a quadratic form.
Let's try to understand its shape.

E(u,v) \approx [u \ v] \, M \begin{bmatrix} u \\ v \end{bmatrix}

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
Interpreting the second moment matrix

First, consider the axis-aligned case
(gradients are either horizontal or vertical):

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}

If either eigenvalue λ is close to 0, then this is not a corner, so
look for locations where both are large.
Interpreting the second moment matrix

Consider a horizontal "slice" of E(u,v):

[u \ v] \, M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}

This is the equation of an ellipse.
Interpreting the second moment matrix
Consider a horizontal "slice" of E(u,v):

[u \ v] \, M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}

This is the equation of an ellipse.

Diagonalization of M:

M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R

The axis lengths of the ellipse are determined by the eigenvalues,
and its orientation is determined by R: the axis of length
(\lambda_{max})^{-1/2} points in the direction of the fastest change,
and the axis of length (\lambda_{min})^{-1/2} in the direction of the
slowest change.
Interpreting the eigenvalues
Classification of image points using the eigenvalues of M:

"Corner": λ1 and λ2 are both large, λ1 ~ λ2;
E increases in all directions.

"Edge": λ1 >> λ2 (or λ2 >> λ1).

"Flat" region: λ1 and λ2 are both small;
E is almost constant in all directions.
Corner response function
R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

α: constant (0.04 to 0.06)

"Corner": R > 0
"Edge": R < 0
"Flat" region: |R| small
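The three cases can be checked numerically. A one-line sketch of the response function (the function name is mine):

```python
def harris_response(l1, l2, alpha=0.04):
    """R = det(M) - alpha*trace(M)^2 = l1*l2 - alpha*(l1 + l2)^2."""
    return l1 * l2 - alpha * (l1 + l2) ** 2
```

Two large eigenvalues give R > 0 (corner), one dominant eigenvalue gives R < 0 (edge), and two small ones give |R| near zero (flat region).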
Harris corner detector

1) Compute the M matrix for each image window to get its cornerness
   score.
2) Find points whose surrounding window gives a large corner response
   (f > threshold).
3) Take the points of local maxima, i.e., perform non-maximum
   suppression.

C. Harris and M. Stephens. "A Combined Corner and Edge Detector."
Proceedings of the 4th Alvey Vision Conference: pages 147-151, 1988.
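The three steps above can be sketched end to end in numpy. This is a simplified illustration, not Harris and Stephens' exact formulation: a box filter stands in for the Gaussian window, the threshold is relative to the maximum response, and the function names are mine:

```python
import numpy as np

def box_blur(A, k=5):
    """Box filter standing in for the Gaussian window (simplification)."""
    pad = k // 2
    Ap = np.pad(A, pad, mode='edge')
    out = np.zeros_like(A, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += Ap[dy:dy + A.shape[0], dx:dx + A.shape[1]]
    return out / (k * k)

def harris_corners(I, alpha=0.04, rel_thresh=0.1):
    # 1) second moment matrix entries per pixel (blurred products)
    Iy, Ix = np.gradient(I.astype(float))
    Sxx, Syy, Sxy = box_blur(Ix * Ix), box_blur(Iy * Iy), box_blur(Ix * Iy)
    # 2) corner response R = det(M) - alpha*trace(M)^2
    R = Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2
    # 3) threshold + 3x3 non-maximum suppression
    Rp = np.pad(R, 1, mode='constant', constant_values=-np.inf)
    is_max = np.ones(R.shape, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            if (dy, dx) != (1, 1):
                is_max &= R >= Rp[dy:dy + R.shape[0], dx:dx + R.shape[1]]
    return np.argwhere((R > rel_thresh * R.max()) & is_max)
```

On a synthetic image with a single step corner, the surviving points cluster at that corner while the edges (R < 0) and flat regions (R ~ 0) are rejected.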
Harris Detector [Harris88]
Second moment matrix:

\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}

1. Compute image derivatives Ix, Iy (optionally, blur first).
2. Square the derivatives: Ix², Iy², IxIy.
3. Apply a Gaussian filter g(σI): g(Ix²), g(Iy²), g(IxIy).
4. Compute the cornerness function (large when both eigenvalues are
   strong), using det M = λ1 λ2 and trace M = λ1 + λ2:

   har = \det[\mu(\sigma_I, \sigma_D)] - \alpha \, [\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2
       = g(I_x^2) \, g(I_y^2) - [g(I_x I_y)]^2 - \alpha \, [g(I_x^2) + g(I_y^2)]^2

5. Perform non-maxima suppression on har.
Use Sobel Operator for
Gradient Computation

(derivative of Gaussian along one axis, Gaussian smoothing along the other)

Horizontal derivative:      Vertical derivative:
  -1  0  1                    1  2  1
  -2  0  2                    0  0  0
  -1  0  1                   -1 -2 -1
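Applied to an image, each kernel is slid over the pixels and a weighted sum taken. A minimal sketch (implemented as correlation; true convolution would flip the kernel, which for these antisymmetric kernels only changes the sign):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # horizontal derivative
SOBEL_Y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)  # vertical derivative

def correlate3x3(I, K):
    """Valid-mode 2D correlation with a 3x3 kernel; the output is
    2 pixels smaller per axis (no border handling, for clarity)."""
    H, W = I.shape
    out = np.zeros((H - 2, W - 2))
    for dy in range(3):
        for dx in range(3):
            out += K[dy, dx] * I[dy:dy + H - 2, dx:dx + W - 2]
    return out
```

On a ramp that rises by 1 per column, the horizontal Sobel responds with a constant 8 (the sum of its positive weights times the step, doubled) while the vertical one responds with 0.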


Eigenvalues and Eigenvectors
of the Auto-correlation Matrix

For a unit vector u = [u, v]^T, E(u,v) is bounded below and above by
the eigenvalues of M:

\lambda_- \le E(u,v) = u^T M u \le \lambda_+

where λ+ and λ- are the two eigenvalues of M.

The eigenvector e+ corresponding to λ+ gives the direction of largest
increase in E, while the eigenvector e- corresponding to λ- gives the
direction of smallest increase in E.
The Harris Operator

f = \frac{\lambda_- \lambda_+}{\lambda_- + \lambda_+} = \frac{\det(A)}{\mathrm{tr}(A)} = \frac{a_{11} a_{22} - a_{12} a_{21}}{a_{11} + a_{22}}, \qquad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}

• Called the "Harris Corner Detector" or "Harris Operator"
• Very similar to λ- but less expensive (no square root)
• Lots of other detectors; this is one of the most popular
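The operator needs only the determinant and trace, never the eigenvalues themselves; a two-line sketch (function name mine):

```python
import numpy as np

def harris_operator(A):
    """f = (l- * l+) / (l- + l+) = det(A) / tr(A); no eigendecomposition
    or square roots needed."""
    return float(np.linalg.det(A) / np.trace(A))
```

For A = diag(2, 8) the score is 16/10 = 1.6, which matches computing the eigenvalues explicitly and forming λ-λ+/(λ- + λ+).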
Harris Detector Example
f value (red = high, blue = low)
Threshold (f > value)
Find Local Maxima of f
Harris Features (in red)
Many Existing Detectors Available
Hessian & Harris [Beaudet ‘78], [Harris ‘88]
Laplacian, DoG [Lindeberg ‘98], [Lowe ‘99]
Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
EBR and IBR [Tuytelaars & Van Gool ‘04]
MSER [Matas ‘02]
Salient Regions [Kadir & Brady ‘01]
Others…

K. Grauman, B. Leibe
Feature Description
Feature Descriptor
• How can we extract feature descriptors that are invariant to
variations (transformations) and yet still discriminative enough to
establish correct correspondences?
Transformation Invariance
Find features that are invariant to transformations:
  - geometric invariance: translation, rotation, scale
  - photometric invariance: brightness, exposure
Transformation Invariance

• We would like to find the same features regardless of the
transformation between images. This is called transformational
invariance.
  - Most feature methods are designed to be invariant to
    • translation, 2D rotation, and scale
  - They can usually also handle
    • limited affine transformations (some are fully affine invariant)
    • limited 3D rotations (SIFT works up to about 60 degrees)
    • limited illumination/contrast changes
Types of invariance

• Illumination
• Scale
• Rotation
• Affine
• Full Perspective
How to Achieve Invariance
1. Make sure the detector is invariant
  - Harris is invariant to translation and rotation
  - Scale is trickier
    • A common approach is to detect features at many scales using a
      Gaussian pyramid (e.g., MOPS)
    • More sophisticated methods find "the best scale" to represent
      each feature (e.g., SIFT)
2. Design an invariant feature descriptor
  - A descriptor captures the information in a region around the
    detected feature point
  - The simplest descriptor: a square window of pixels
    • What's this invariant to?
  - Let's look at some better approaches…
Rotational Invariance for Feature
Descriptors
• Find the dominant orientation of the image patch. This is given by
  e+, the eigenvector of M corresponding to λ+.
• Rotate the patch to the horizontal direction.
• Intensity-normalize the window by subtracting the mean and dividing
  by the standard deviation in the window.

[Figure: a 40-pixel window sampled down to an 8-pixel descriptor patch.]
Adapted from slide by Matthew Brown
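The first and third steps can be sketched directly from the second moment matrix (function names are mine; the actual rotation of the patch by the found angle, e.g. with an image-warping routine, is omitted):

```python
import numpy as np

def dominant_orientation(patch):
    """Angle of e+, the eigenvector of M for the larger eigenvalue l+."""
    Iy, Ix = np.gradient(patch.astype(float))
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    vals, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    e_plus = vecs[:, 1]                 # column for the larger eigenvalue
    return float(np.arctan2(e_plus[1], e_plus[0]))

def intensity_normalize(patch):
    """Subtract the mean, divide by the standard deviation."""
    p = patch.astype(float)
    return (p - p.mean()) / (p.std() + 1e-8)
```

For a vertical step edge all gradient energy is horizontal, so e+ lies along the x axis and the returned angle is 0 (or π, since eigenvectors have an arbitrary sign).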


Scale Invariance
• Detecting features at the finest stable scale possible may not
  always be appropriate
  - e.g., when matching images with little high-frequency detail
    (e.g., clouds), fine-scale features may not exist.
• One solution to the problem is to extract features at a variety of
  scales
  - e.g., by performing the same operations at multiple resolutions in
    a pyramid and then matching features at the same level.
• This kind of approach is suitable when the images being matched do
  not undergo large scale changes
  - e.g., when matching successive aerial images taken from an
    airplane, or stitching panoramas taken with a fixed-focal-length
    camera.
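A pyramid of this kind can be sketched in a few lines. Here a 2x2 average stands in for the Gaussian blur, an assumption for brevity (and the function name is mine):

```python
import numpy as np

def build_pyramid(I, levels=3):
    """Each level: 2x2 average (a stand-in for Gaussian smoothing)
    followed by 2x downsampling. Assumes even dimensions per level."""
    pyr = [I.astype(float)]
    for _ in range(levels - 1):
        J = pyr[-1]
        J = (J[0::2, 0::2] + J[1::2, 0::2]
             + J[0::2, 1::2] + J[1::2, 1::2]) / 4.0
        pyr.append(J)
    return pyr
```

Features detected at level k of one image can then be matched against level k of the other, which tolerates moderate but not large scale change.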
Scale Invariance for Feature Descriptors
Scale Invariant Feature Transform (SIFT)
• Take a 16x16 square window around the detected feature
• Compute the edge orientation (angle of the gradient minus 90°) for
  each pixel
• Throw out weak edges (threshold on gradient magnitude)
• Create a histogram of the surviving edge orientations

[Figure: angle histogram over 0 to 2π.]

Adapted from slide by David Lowe


SIFT Descriptor
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor

Adapted from slide by David Lowe
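The grid-of-histograms idea can be sketched as follows. This is a simplified illustration, not Lowe's full descriptor (no Gaussian weighting, no trilinear interpolation, no final normalization); the function name and the magnitude threshold are my choices:

```python
import numpy as np

def sift_like_descriptor(patch, cells=4, bins=8, mag_thresh=0.05):
    """cells x cells grid of magnitude-weighted orientation histograms;
    weak gradients are dropped. 4 * 4 * 8 = 128 dims for 16x16 input."""
    Iy, Ix = np.gradient(patch.astype(float))
    mag = np.hypot(Ix, Iy)
    ang = np.mod(np.arctan2(Iy, Ix), 2.0 * np.pi)
    n = patch.shape[0] // cells
    desc = []
    for cy in range(cells):
        for cx in range(cells):
            sl = (slice(cy * n, (cy + 1) * n), slice(cx * n, (cx + 1) * n))
            m, a = mag[sl], ang[sl]
            keep = m > mag_thresh                 # throw out weak edges
            h, _ = np.histogram(a[keep], bins=bins,
                                range=(0.0, 2.0 * np.pi), weights=m[keep])
            desc.append(h)
    return np.concatenate(desc)
```

Concatenating the 16 per-cell histograms of 8 bins each yields the 128-dimensional vector the slide describes.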


Properties of SIFT
Extraordinarily robust matching technique:
  - Can handle changes in viewpoint
    • up to about 60 degrees of out-of-plane rotation
  - Can handle significant changes in illumination
    • sometimes even day vs. night
  - Fast and efficient; can run in real time
  - Lots of code available
    • http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
Feature Matching

Feature Matching
• Given a feature in image I1, how can we find the best match in
  image I2?
  1. Define a distance function that compares two descriptors
  2. Test all the features in I2 and find the one with minimum distance
Feature Distance
• SSD is commonly used to measure the difference between two features
  f1 and f2
• Use the ratio of distances, SSD(f1, f2) / SSD(f1, f2'), as a measure
  of the reliability of a feature match:
  - f2 is the best SSD match to f1 in I2
  - f2' is the second-best SSD match to f1 in I2
  - the ratio is large (close to 1) for ambiguous matches and small
    for distinctive, reliable ones

[Figure: feature f1 in image I1 with its best match f2 and second-best
match f2' in image I2.]
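The ratio test can be sketched as a brute-force matcher (the function name and the 0.8 default ratio are my choices, not from the slides):

```python
import numpy as np

def match_with_ratio_test(desc1, desc2, ratio=0.8):
    """Keep (i, j) only when SSD(f1, best f2) / SSD(f1, 2nd-best f2')
    is below `ratio`, rejecting ambiguous matches."""
    desc1, desc2 = np.asarray(desc1, float), np.asarray(desc2, float)
    matches = []
    for i, f1 in enumerate(desc1):
        d = np.sum((desc2 - f1) ** 2, axis=1)   # SSD to every f2
        j, j2 = np.argsort(d)[:2]
        if d[j] < ratio * d[j2]:
            matches.append((i, int(j)))
    return matches
```

When the two nearest candidates are almost equidistant, the ratio is near 1 and the match is discarded; a clear winner passes the test.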
Evaluating Feature Matching Performance
Evaluating the Results

• How can we measure the performance of a feature matcher?

[Figure: candidate matches with feature distances 50, 75, and 200.]
True/false positives
The distance threshold affects performance:
  - True positives = # of detected matches that are correct
    • Suppose we want to maximize these; how should we choose the
      threshold?
  - False positives = # of detected matches that are incorrect
    • Suppose we want to minimize these; how should we choose the
      threshold?

[Figure: a true match at feature distance 50 and a false match at
feature distance 200, with a candidate threshold at 75.]
Precision-Recall and F-measure
In statistical analysis of binary classification, the F1 score (also
F-score or F-measure) is a measure of a test's accuracy. It considers
both the precision p and the recall r of the test to compute the score:

F_1 = \frac{2 \, p \, r}{p + r}

p is the number of correct positive results divided by the number of
all positive results, and r is the number of correct positive results
divided by the number of positive results that should have been
returned.

The F1 score can be interpreted as a weighted average of the precision
and recall, where an F1 score reaches its best value at 1 and worst
at 0.
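In terms of true positives (tp), false positives (fp), and false negatives (fn), the three quantities are (function name mine):

```python
def precision_recall_f1(tp, fp, fn):
    """p = tp/(tp+fp), r = tp/(tp+fn), F1 = 2*p*r/(p+r)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2.0 * p * r / (p + r)
```

With 8 correct matches, 2 spurious ones, and 2 missed ones, precision, recall, and F1 all come out to 0.8.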
Precision-Recall Curves
Evaluating the Performance of a Feature Matcher

true positive rate (TPR) = true positives / positives
                         = recall
                         = sensitivity

false positive rate (FPR) = false positives / negatives
                          = 1 - specificity
                          = 1 - true negative rate

[Figure: a plot of true positive rate vs. false positive rate, with an
example operating point at (0.1, 0.7).]
Receiver Operating Characteristic
(ROC) Curve
• Generated by counting # correct/incorrect matches for different
  thresholds
• Want to maximize area under the curve (AUC)
• Useful for comparing different feature matching methods
• For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
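A curve of this kind can be generated by sweeping the accept-threshold over the sorted match distances; each prefix of accepted matches gives one (FPR, TPR) point, and the area follows from the trapezoid rule. A minimal sketch (function names mine):

```python
import numpy as np

def roc_points(distances, is_correct):
    """One (FPR, TPR) point per threshold, sweeping over distances."""
    is_correct = np.asarray(is_correct, dtype=bool)
    order = np.argsort(distances)        # accept lowest distances first
    hits = is_correct[order]
    tpr = np.cumsum(hits) / max(int(is_correct.sum()), 1)
    fpr = np.cumsum(~hits) / max(int((~is_correct).sum()), 1)
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the curve by the trapezoid rule."""
    return float(np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(fpr)))
```

If every correct match has a smaller distance than every incorrect one, the curve hugs the top-left corner and the AUC is 1; the reverse ordering gives an AUC of 0.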

AUC stands for "Area under the ROC Curve." That is, AUC measures the
entire two-dimensional area underneath the entire ROC curve (think
integral calculus) from (0,0) to (1,1). AUC provides an aggregate
measure of performance across all possible classification thresholds.

[Figure: a ROC curve of true positive rate vs. false positive rate,
passing through the example point (0.1, 0.7).]
