You are on page 1of 62

Previous Lecture

Features

Pattern Recognition By: Dr. Mudassar Raza


SIFT Algorithm
• Input:
• Image nxm.
• Output:
• Set of descriptors of image’s features.

SIFT - Scale Invariant Feature Transform

Pattern Recognition By: Dr. Mudassar Raza


SIFT Algorithm

Pattern Recognition By: Dr. Mudassar Raza


SIFT Algorithm

• Scale-space extrema detection.


• Keypoint localization.
• Orientation assignment.
• Keypoint descriptor.

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Scales
• What should be sigma value for Canny and LoG edge detection?
• If use multiple sigma values (scales), how do you combine multiple
edge maps?
• Marr-Hildreth:
– Spatial Coincidence assumption:
– Zerocrossings that coincide over several scales are physically significant.

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Scale Space
• Apply whole spectrum of scales (sigma in Gaussian)
• Plot zero-crossings vs scales in a scalespace

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Scale Space
• Apply whole spectrum of scales (sigma in Gaussian)
• Plot zero-crossings vs scales in a scalespace

Pattern Recognition By: Dr. Mudassar Raza


Why Extrema?
• We want to find points that give us information about the objects
in the image.
• The information about the objects is in the object’s edges.

• We will represent the image in a way that gives us these edges as


this representations extrema points.

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
The only way to do
that is with the
Gaussian Blur

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
•Scale spaces in SIFT
SIFT takes scale spaces to the next level. You
take the original image, and generate
progressively blurred out images. Then, you
resize the original image to half size. And you
generate blurred out images again. And you keep
repeating.

Images of the same size (vertical) form an octave.


Above are four octaves. Each octave has 5
images. The individual images are formed
because of the increasing "scale" (the amount of
blur). Images of the same size (vertical) form an
octave. Above are four octaves. Each octave has 5
images. The individual images are formed
because of the increasing "scale" (the amount of
blur).
Pattern Recognition By: Dr. Mudassar Raza
Scale-space Representation
Octaves and Scales
• The number of octaves and scale depends on the size of
the original image.
• While programming SIFT, you'll have to decide for
yourself how many octaves and scales you want.
• However, the creator of SIFT suggests that 4 octaves and 5
blur levels are ideal for the algorithm.

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
The first octave
• If the original image is doubled in size and antialiased a bit
(by blurring it) then the algorithm produces more four
times more keypoints.
• The more the keypoints, the better!

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Blurring
• Mathematically, "blurring" is referred to as the convolution of the
gaussian operator and the image.
• Gaussian blur has a particular expression or "operator" that is
applied to each pixel.
• What results is the blurred image.

L  x, y ,    G  x, y ,    I  x , y 
1  x 2  y 2  2 2
G  x, y ,    e
2 2

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Blurring
L  x, y ,    G  x, y ,    I  x , y 
1  x 2  y 2  2 2
G  x, y ,    e
2 2

The symbols:
•L is a blurred image
•G is the Gaussian Blur operator
•I is an image
•x,y are the location coordinates
•σ is the "scale" parameter. Think of it as the amount of blur. Greater the value, greater the blur.
•The * is the convolution operation in x and y. It "applies" gaussian blur G onto the image I.

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
Amount of blurring
• The amount of blurring in each image is important.
• It goes like this. Assume the amount of blur in a particular image is
σ.
• Then, the amount of blur in the next image will be k*σ.
• Here k is whatever constant you choose.

See how each σ differs by a factor sqrt(2) from the previous one
Pattern Recognition By: Dr. Mudassar Raza
Laplacian of Gaussian (LoG)
• We created the scale space of the image.
• The idea is to blur an image progressively, shrink it, blur the small image
progressively and so on.
• Now we use those blurred images to generate another set of images, the
Difference of Gaussians (DoG).
• These DoG images are a great for finding out interesting key points in the
image.

Pattern Recognition By: Dr. Mudassar Raza


Laplacian of Gaussian (LoG)

• The Laplacian of Gaussian (LoG) operation goes like this. You take an
image, and blur it a little.
• And then, you calculate second order derivatives on it (or, the "laplacian").
• This locates edges and corners on the image. These edges and corners are
good for finding keypoints.
• But the second order derivative is extremely sensitive to noise. The blur
smoothes it out the noise and stabilizes the second order derivative.
• The problem is, calculating all those second order derivatives is
computationally intensive. So we cheat a bit.

Pattern Recognition By: Dr. Mudassar Raza


Laplacian of Gaussian (LoG)

• To generate Laplacian of
Guassian images quickly, we
use the scale space.
• We calculate the difference
between two consecutive
scales.
• Or, the Difference of
Gaussians.

These Difference of Gaussian images are approximately equivalent to the Laplacian


of Gaussian.
Pattern Recognition By: Dr. Mudassar Raza
Laplacian of Gaussian (LoG)
As Laplace operator may detect edges as well as noise (isolated, out-of-
range), it may be desirable to smooth the image first by a convolution
with a Gaussian kernel of width

to suppress the noise before using Laplace for edge detection:

Pattern Recognition By: Dr. Mudassar Raza


Laplacian of Gaussian (LoG)

Pattern Recognition By: Dr. Mudassar Raza


Laplacian of Gaussian (LoG)

Pattern Recognition By: Dr. Mudassar Raza


Scale-space Representation
• Difference-of-Gaussian (DoG):

D  x , y ,     G  x , y , k   G  x , y ,     I  x , y 
 L  x , y , k   L  x , y ,  

Low computation time -


Only subtraction of smoothed images!

Pattern Recognition By: Dr. Mudassar Raza


What is a useful signature function?

Pattern Recognition By: Dr. Mudassar Raza


Scale-space blob detector: Example

Pattern Recognition By: Dr. Mudassar Raza


Scale-space blob detector: Example

Pattern Recognition By: Dr. Mudassar Raza


Building a Scale Space

All scales must be examined to identify


scale invariant features

An efficient function is to compute the


Laplacian Pyramid (Difference of
Gaussian)

(Burt & Adelson, 1983)


Pattern Recognition By: Dr. Mudassar Raza
SIFT Algorithm

• Scale-space extrema detection.


• Keypoint localization.
• Orientation assignment.
• Keypoint descriptor.

Pattern Recognition By: Dr. Mudassar Raza


Finding keypoints

• Up till now, we have generated a scale space and used the scale space to
calculate the Difference of Gaussians.
• Those are then used to calculate Laplacian of Gaussian approximations that
is scale invariant. Now….

• Finding key points is a two part process

• Locate maxima/minima in DoG images


• Find subpixel maxima/minima

Pattern Recognition By: Dr. Mudassar Raza


DoG Pyramid

Scale
invariance

Different
frequencies
features

Pattern Recognition By: Dr. Mudassar Raza


Extracting Keypoints, Locate maxima/minima in DoG
• X is selected if it is larger orimages
smaller than all
26 neighbors.

iterate through each pixel and check all


it's neighbours. The check is done within
the current image, and also the one
above and below it.

Low cost - only several usually checked.


Pattern Recognition By: Dr. Mudassar Raza
Extracting Keypoints, Locate maxima/minima in DoG
• X is selected if it is larger orimages
smaller than all
26 neighbors.
• X marks the current pixel. The green circles mark
the neighbors. This way, a total of 26 checks are
made. X is marked as a "key point" if it is the
greatest or least of all 26 neighbors.

• Once this is done, the marked points are the


approximate maxima and minima. They are
"approximate" because the maxima/minima
almost never lies exactly on a pixel.

Low cost - only several usually checked.


Pattern Recognition By: Dr. Mudassar Raza
Extracting Keypoints, Locate maxima/minima in DoG
images

Pattern Recognition By: Dr. Mudassar Raza


Extracting Keypoints
• Extrema detection product:

233x189 image => 832 DoG Keypoints.


• Each Keypoint is represented as  x, y,   .

Not all of them are good…

Pattern Recognition By: Dr. Mudassar Raza


Extracting Keypoints
• Extrema detection product:

233x189 image => 832 DoG Keypoints.


• Each Keypoint is represented as  x, y,   .

•Candidates are chosen from extrema detection

Pattern Recognition By: Dr. Mudassar Raza


Problematic Keypoints
• Inaccurate localization (due to scaling and sampling).

• Low contrast - sensitive to noise.


• Strong edge responses.

Pattern Recognition By: Dr. Mudassar Raza


Inaccurate Keypoint Localization, Finding subpixel
maxima/minima
• Using the available pixel data, subpixel values are generated.
• This is done by the Taylor expansion of the image around the
approximate key point.

 DT  1 T  2 D T 
D x   D   x  x 2 x
x 2 x

• We can easily find the extreme points of this equation (differentiate and
equate to zero). On solving, we'll get subpixel key point locations. These
subpixel values increase chances of matching and stability of the
algorithm.

Pattern Recognition By: Dr. Mudassar Raza


Inaccurate Keypoint Localization
The Solution:
Taylor expansion:
 DT  1 T  2 D T 
D x   D   x  x 2 x
x 2 x
Minimize to find accurate extrema:
1
 D D
2
xˆ   2 
x x
If offset from sampling point is larger than 0.5 -
Keypoint should be in a different sampling point.
Brown & Lowe 2002

Pattern Recognition By: Dr. Mudassar Raza


Removing Low Contrast Keypoints
• - xˆ 
Function value at the extremun D
• If D x
ˆ   0.03 (pixel values in range [0,1]) -
keypoint is discarded.

=> down to 729 Keypoints after min. contrast threshold.

Pattern Recognition By: Dr. Mudassar Raza


Eliminating Edge Responses
•The idea is to calculate two gradients at the keypoint.
•Both perpendicular to each other.
•Based on the image around the keypoint, three possibilities exist.
•The image around the keypoint can be:
•A flat region: If this is the case, both gradients will be small.
•An edge: Here, one gradient will be big (perpendicular to the edge) and the
other will be small (along the edge)
•A "corner": Here, both gradients will be big.
•Corners are great keypoints. So we want just corners. If both gradients are
big enough, we let it pass as a key point. Otherwise, it is rejected.

Pattern Recognition By: Dr. Mudassar Raza


Eliminating Edge Responses
• The Problem:
• “Edge“ keypoints are poorly determined.

Point detection
Point detection Point can move along edge

=>
unstable.

Pattern Recognition By: Dr. Mudassar Raza


Eliminating Edge Responses
• The Solution:
• Check Keypoints “cornerness”.

Point constrained

• High “cornerness”  No dominant principal curvature


component.

Pattern Recognition By: Dr. Mudassar Raza


Finding “Cornerness”
• Principal curvature are proportional to eigenvalues of Hessian
matrix: max , min

 Dxx Dxy 
H 
 Dxy Dyy 
• Harris (1988) showed:
max Tr ( H ) 2 ( r  1)2
r 
min Det ( H ) r

• Threshold: if r < 10 - ratio is too great, keypoint discarded.

Pattern Recognition By: Dr. Mudassar Raza


Stable Keypoints

=> down to 536 Keypoints after “cornerness” threshold.

Pattern Recognition By: Dr. Mudassar Raza


SIFT Algorithm

• Scale-space extrema detection.


• Keypoint localization.
• Orientation assignment.
• Keypoint descriptor.

Representation invariant to Rotation.

Pattern Recognition By: Dr. Mudassar Raza


Keypoint orientations

• we have legitimate key points.


• They've been tested to be stable.
• We already know the scale at which the keypoint was detected (it's
the same as the scale of the blurred image).
• So we have scale invariance.
• The next thing is to assign an orientation to each keypoint.

Pattern Recognition By: Dr. Mudassar Raza


Gradients
• For each sample point we compute gradient’s magnitude and
orientation:
m x, y   Lx  1, y   Lx  1, y   Lx, y  1  L( x, y  1) 
2 2

 Lx, y  1  L( x, y  1)  
 x, y   tan 1  
 Lx  1, y   Lx  1, y  

The magnitude and orientation is calculated for all pixels around the keypoint.
Pattern Recognition By: Dr. Mudassar Raza
Gradients
• A histogram is created for this.
• In this histogram, the 360 degrees of orientation are broken into
36 bins (each 10 degrees).

Pattern Recognition By: Dr. Mudassar Raza


Gradients

Pattern Recognition By: Dr. Mudassar Raza


SIFT Algorithm
• Scale-space extrema detection.
• Keypoint localization.
• Orientation assignment.
• Keypoint descriptor.

Distinctive (yet invariant) Keypoint


Representation.

Pattern Recognition By: Dr. Mudassar Raza


Generating features

• We want to generate a very unique fingerprint for the keypoint.


• It should be easy to calculate.
• We also want it to be relatively lenient when it is being
compared against other keypoints.
• Things are never EXACTLY same when comparing two
different images.

Pattern Recognition By: Dr. Mudassar Raza


Generating features
• To do this, a 16x16 window around the keypoint. This 16x16
window is broken into sixteen 4x4 windows.

Pattern Recognition By: Dr. Mudassar Raza


Generating features
• Within each 4x4 window, gradient magnitudes and
orientations are calculated.
• These orientations are put into an 8 bin histogram.

orientation in the range 0-44 degrees add to the first bin. 45-89 add to the next bin. And so
on. And (as always) the amount added to the bin depends on the magnitude of the gradient.
Pattern Recognition By: Dr. Mudassar Raza
SIFT Algorithm

Pattern Recognition By: Dr. Mudassar Raza


References
• https://www.aishack.in/tutorials/sift-scale-invariant-feature-transform-log-approximation/
• Duda, R. O., Hart, P. E., & Stork, D. G. (2012). Pattern classification. John Wiley & Sons.
• https://courses.cs.washington.edu/courses/cse576/06sp/notes/Interest2.pdf
• http://www.ballardblair.net/projects/Difference_of_Gaussian_presentation.pdf
• Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.

Pattern Recognition By: Dr. Mudassar Raza


Next Lecture
Bayes Decision Theory

Pattern Recognition By: Dr. Mudassar Raza

You might also like