Professional Documents
Culture Documents
that are reasonably invariant to changes in illumination, image noise, rotation, scaling, and small changes in
viewpoint. SIFT is detector and descriptor algorithms in one.
Types of invariance
Advantages:
1. Locality: features are local, so robust to occlusion and clutter (against global feature such as histogram)
2. Distinctiveness: individual features can be matched to a large database of objects
3. Quantity: many features can be generated for even small objects
4. Efficiency: close to real-time performance
This is the initial preparation. You create internal representations of the original image to ensure
scale invariance. This is done by generating a "scale space".
Scales:
What should be sigma value for Canny and LoG edge detection?
If use multiple sigma values (scales), how do you combine multiple edge maps?
Scale Space:
Laplacian-of-Gaussian (LoG)
Intuition:
• Find scale that gives local maxima of some function f in both position and scale.
• Function responses for increasing scale (scale signature)
Continue until the maxima appear in both images
Looking for maximum responses in each scale, this is different not like edge for each we look for
zero-crossing.
a) Pyramids
The first step toward the detection of interest points is the convolution of the image with Gaussian
filters at different scales, and the generation of difference-of-Gaussian images from the difference of
adjacent blurred images.
This is just the difference of the Gaussian-blurred images at scales σ and k σ. That is mean if we
take a difference of Gaussian in different scales (σ and k σ) is the same as LoG operation.
Find keypoint from maximum and minimum of D over neighboring pixels and scales above and
below.
Compare a pixel (X) with 26 pixels in current and adjacent scales (Green Circles)
Select a pixel (X) if larger/smaller than all 26 pixels
Large number of extrema ,computationally expensive
Detect the most stable subset with a coarse sampling of scales
Local extrema detection, the pixel marked x is compared against its 26 neighbors in a 3 x 3 x 3
neighborhood that spans adjacent DoG images.
Interest points (called keypoints in the SIFT framework) are identified as local maxima or
minima of the DoG images across scales. Each pixel in the DoG images is compared to its 8
neighbors at the same scale, plus the 9 corresponding neighbors at neighboring scales. If the
pixel is a local maximum or minimum, it is selected as a candidate keypoint.
By perform derivatization on the D(x) function and equal it to zero then minima or maxima
are located at:
Eliminate key points if (is edge because one direction is high and the other is low)
Orientation Assignment
( ) √[ ( ) ( )] [ ( ) ( )]
( ) [ ( ) ( )]⁄[ ( ) ( )]
Create a weighted direction histogram in a neighborhood of a key point (36 bins). i.e. 10 for each
bin (360/10=36)
Weights are
Gradient magnitudes (the long one is more important)
Spatial Gaussian filter with =1.5 x <scale of key point>
Once a keypoint orientation has been selected, the feature descriptor is computed as a set of
orientation histograms on 4 x 4 pixel neighborhoods. The orientation histograms are relative to
the keypoint orientation, the orientation data comes from the Gaussian image closest in scale to
the keypoint's scale. Just like before, the contribution of each pixel is weighted by the gradient
magnitude, and by a Gaussian with 1.5 times the scale of the keypoint.
Example: