6 SIFT Features

Previous Lecture
Features
Pattern Recognition By: Dr. Mudassar Raza

SIFT Algorithm
• Input:
• Image nxm.
• Output:
• Set of descriptors of image’s features.
SIFT - Scale Invariant Feature Transform

SIFT Algorithm

SIFT Algorithm
• Scale-space extrema detection.

• Keypoint localization.
• Orientation assignment.
• Keypoint descriptor.

Scale-space Representation
Scales
• What should be sigma value for Canny and LoG edge detection?
• If use multiple sigma values (scales), how do you combine multiple
edge maps?
• Marr-Hildreth:
– Spatial Coincidence assumption:
– Zerocrossings that coincide over several scales are physically significant.

Scale Space
• Apply whole spectrum of scales (sigma in Gaussian)
• Plot zero-crossings vs scales in a scalespace

Scale Space
• Apply whole spectrum of scales (sigma in Gaussian)
• Plot zero-crossings vs scales in a scalespace

Why Extrema?
• We want to find points that give us information about the objects
in the image.
• The information about the objects is in the object’s edges.
• We will represent the image in a way that gives us these edges as

this representations extrema points.








The only way to do
that is with the
Gaussian Blur

•Scale spaces in SIFT
SIFT takes scale spaces to the next level. You
take the original image, and generate
progressively blurred out images. Then, you
resize the original image to half size. And you
generate blurred out images again. And you keep
repeating.
Images of the same size (vertical) form an octave.

Above are four octaves. Each octave has 5
images. The individual images are formed
because of the increasing "scale" (the amount of
blur). Images of the same size (vertical) form an
octave. Above are four octaves. Each octave has 5
images. The individual images are formed
because of the increasing "scale" (the amount of
blur).
Octaves and Scales
• The number of octaves and scale depends on the size of
the original image.
• While programming SIFT, you'll have to decide for
yourself how many octaves and scales you want.
• However, the creator of SIFT suggests that 4 octaves and 5
blur levels are ideal for the algorithm.

The first octave
• If the original image is doubled in size and antialiased a bit
(by blurring it) then the algorithm produces more four
times more keypoints.
• The more the keypoints, the better!

Blurring
• Mathematically, "blurring" is referred to as the convolution of the
gaussian operator and the image.
• Gaussian blur has a particular expression or "operator" that is
applied to each pixel.
• What results is the blurred image.
L  x, y ,    G  x, y ,    I  x , y 
1  x 2  y 2  2 2
G  x, y ,    e
2 2

Blurring
L  x, y ,    G  x, y ,    I  x , y 
1  x 2  y 2  2 2
G  x, y ,    e
2 2
The symbols:
•L is a blurred image
•G is the Gaussian Blur operator
•I is an image
•x,y are the location coordinates
•σ is the "scale" parameter. Think of it as the amount of blur. Greater the value, greater the blur.
•The * is the convolution operation in x and y. It "applies" gaussian blur G onto the image I.

Amount of blurring
• The amount of blurring in each image is important.
• It goes like this. Assume the amount of blur in a particular image is
σ.
• Then, the amount of blur in the next image will be k*σ.
• Here k is whatever constant you choose.
See how each σ differs by a factor sqrt(2) from the previous one
Laplacian of Gaussian (LoG)
• We created the scale space of the image.
• The idea is to blur an image progressively, shrink it, blur the small image
progressively and so on.
• Now we use those blurred images to generate another set of images, the
Difference of Gaussians (DoG).
• These DoG images are a great for finding out interesting key points in the
image.

• The Laplacian of Gaussian (LoG) operation goes like this. You take an
image, and blur it a little.
• And then, you calculate second order derivatives on it (or, the "laplacian").
• This locates edges and corners on the image. These edges and corners are
good for finding keypoints.
• But the second order derivative is extremely sensitive to noise. The blur
smoothes it out the noise and stabilizes the second order derivative.
• The problem is, calculating all those second order derivatives is
computationally intensive. So we cheat a bit.

• To generate Laplacian of
Guassian images quickly, we
use the scale space.
• We calculate the difference
between two consecutive
scales.
• Or, the Difference of
Gaussians.
These Difference of Gaussian images are approximately equivalent to the Laplacian

of Gaussian.
As Laplace operator may detect edges as well as noise (isolated, out-of-
range), it may be desirable to smooth the image first by a convolution
with a Gaussian kernel of width
to suppress the noise before using Laplace for edge detection:



• Difference-of-Gaussian (DoG):
D  x , y ,     G  x , y , k   G  x , y ,     I  x , y 
 L  x , y , k   L  x , y ,  
Low computation time -

Only subtraction of smoothed images!

What is a useful signature function?

Scale-space blob detector: Example

Scale-space blob detector: Example

Building a Scale Space
All scales must be examined to identify

scale invariant features
An efficient function is to compute the

Laplacian Pyramid (Difference of
Gaussian)
(Burt & Adelson, 1983)

SIFT Algorithm


Finding keypoints
• Up till now, we have generated a scale space and used the scale space to
calculate the Difference of Gaussians.
• Those are then used to calculate Laplacian of Gaussian approximations that
is scale invariant. Now….
• Finding key points is a two part process
• Locate maxima/minima in DoG images

• Find subpixel maxima/minima

DoG Pyramid
Scale
invariance
Different
frequencies
features

Extracting Keypoints, Locate maxima/minima in DoG
• X is selected if it is larger orimages
smaller than all
26 neighbors.
iterate through each pixel and check all

it's neighbours. The check is done within
the current image, and also the one
above and below it.
Low cost - only several usually checked.

• X is selected if it is larger orimages
smaller than all
26 neighbors.
• X marks the current pixel. The green circles mark
the neighbors. This way, a total of 26 checks are
made. X is marked as a "key point" if it is the
greatest or least of all 26 neighbors.
• Once this is done, the marked points are the

approximate maxima and minima. They are
"approximate" because the maxima/minima
almost never lies exactly on a pixel.
Low cost - only several usually checked.

images

Extracting Keypoints
• Extrema detection product:
233x189 image => 832 DoG Keypoints.

• Each Keypoint is represented as  x, y,   .
Not all of them are good…

Extracting Keypoints
• Extrema detection product:
233x189 image => 832 DoG Keypoints.

• Each Keypoint is represented as  x, y,   .
•Candidates are chosen from extrema detection

Problematic Keypoints
• Inaccurate localization (due to scaling and sampling).
• Low contrast - sensitive to noise.

• Strong edge responses.

Inaccurate Keypoint Localization, Finding subpixel
maxima/minima
• Using the available pixel data, subpixel values are generated.
• This is done by the Taylor expansion of the image around the
approximate key point.
 DT  1 T  2 D T 
D x   D   x  x 2 x
x 2 x
• We can easily find the extreme points of this equation (differentiate and
equate to zero). On solving, we'll get subpixel key point locations. These
subpixel values increase chances of matching and stability of the
algorithm.

Inaccurate Keypoint Localization
The Solution:
Taylor expansion:
 DT  1 T  2 D T 
D x   D   x  x 2 x
x 2 x
Minimize to find accurate extrema:
1
 D D
2
xˆ   2 
x x
If offset from sampling point is larger than 0.5 -
Keypoint should be in a different sampling point.
Brown & Lowe 2002

Removing Low Contrast Keypoints
• - xˆ 
Function value at the extremun D
• If D x
ˆ   0.03 (pixel values in range [0,1]) -
keypoint is discarded.
=> down to 729 Keypoints after min. contrast threshold.

Eliminating Edge Responses
•The idea is to calculate two gradients at the keypoint.
•Both perpendicular to each other.
•Based on the image around the keypoint, three possibilities exist.
•The image around the keypoint can be:
•A flat region: If this is the case, both gradients will be small.
•An edge: Here, one gradient will be big (perpendicular to the edge) and the
other will be small (along the edge)
•A "corner": Here, both gradients will be big.
•Corners are great keypoints. So we want just corners. If both gradients are
big enough, we let it pass as a key point. Otherwise, it is rejected.

• The Problem:
• “Edge“ keypoints are poorly determined.
Point detection
Point detection Point can move along edge
=>
unstable.

• The Solution:
• Check Keypoints “cornerness”.
Point constrained
• High “cornerness”  No dominant principal curvature

component.

Finding “Cornerness”
• Principal curvature are proportional to eigenvalues of Hessian
matrix: max , min
 Dxx Dxy 
H 
 Dxy Dyy 
• Harris (1988) showed:
max Tr ( H ) 2 ( r  1)2
r 
min Det ( H ) r
• Threshold: if r < 10 - ratio is too great, keypoint discarded.

Stable Keypoints
=> down to 536 Keypoints after “cornerness” threshold.

SIFT Algorithm

Representation invariant to Rotation.

Keypoint orientations
• we have legitimate key points.

• They've been tested to be stable.
• We already know the scale at which the keypoint was detected (it's
the same as the scale of the blurred image).
• So we have scale invariance.
• The next thing is to assign an orientation to each keypoint.

Gradients
• For each sample point we compute gradient’s magnitude and
orientation:
m x, y   Lx  1, y   Lx  1, y   Lx, y  1  L( x, y  1) 
2 2
 Lx, y  1  L( x, y  1)  
 x, y   tan 1  
 Lx  1, y   Lx  1, y  
The magnitude and orientation is calculated for all pixels around the keypoint.
Gradients
• A histogram is created for this.
• In this histogram, the 360 degrees of orientation are broken into
36 bins (each 10 degrees).

Gradients

SIFT Algorithm
Distinctive (yet invariant) Keypoint

Representation.

Generating features
• We want to generate a very unique fingerprint for the keypoint.

• It should be easy to calculate.
• We also want it to be relatively lenient when it is being
compared against other keypoints.
• Things are never EXACTLY same when comparing two
different images.

Generating features
• To do this, a 16x16 window around the keypoint. This 16x16
window is broken into sixteen 4x4 windows.

Generating features
• Within each 4x4 window, gradient magnitudes and
orientations are calculated.
• These orientations are put into an 8 bin histogram.
orientation in the range 0-44 degrees add to the first bin. 45-89 add to the next bin. And so
on. And (as always) the amount added to the bin depends on the magnitude of the gradient.
SIFT Algorithm

References
• https://www.aishack.in/tutorials/sift-scale-invariant-feature-transform-log-approximation/
• Duda, R. O., Hart, P. E., & Stork, D. G. (2012). Pattern classification. John Wiley & Sons.
• https://courses.cs.washington.edu/courses/cse576/06sp/notes/Interest2.pdf
• http://www.ballardblair.net/projects/Difference_of_Gaussian_presentation.pdf
• Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.

Next Lecture
Bayes Decision Theory

6 SIFT Features

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

6 SIFT Features

Uploaded by

Copyright:

Available Formats

Previous Lecture

Pattern Recognition By: Dr. Mudassar Raza

SIFT - Scale Invariant Feature Transform

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

• Scale-space extrema detection.

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

• We will represent the image in a way that gives us these edges as

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Images of the same size (vertical) form an octave.

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

These Difference of Gaussian images are approximately equivalent to the Laplacian

to suppress the noise before using Laplace for edge detection:

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Low computation time -

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

All scales must be examined to identify

An efficient function is to compute the

(Burt & Adelson, 1983)

• Scale-space extrema detection.

Pattern Recognition By: Dr. Mudassar Raza

• Finding key points is a two part process

• Locate maxima/minima in DoG images

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

iterate through each pixel and check all

Low cost - only several usually checked.

• Once this is done, the marked points are the

Low cost - only several usually checked.

Pattern Recognition By: Dr. Mudassar Raza

233x189 image => 832 DoG Keypoints.

Not all of them are good…

Pattern Recognition By: Dr. Mudassar Raza

233x189 image => 832 DoG Keypoints.

•Candidates are chosen from extrema detection

Pattern Recognition By: Dr. Mudassar Raza

• Low contrast - sensitive to noise.

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza

=> down to 729 Keypoints after min. contrast threshold.

Pattern Recognition By: Dr. Mudassar Raza

Pattern Recognition By: Dr. Mudassar Raza