
Scale Invariant Feature Transform

Harri Auvinen, Tapio Leppälampi, Joni Taipale and Maria Teplykh
Lappeenranta University of Technology
Machine Vision and Digital Image Analysis

November 24th, 2009


Scale-Invariant Feature Transform (SIFT)
- A method developed by David G. Lowe
- Feature extraction method

Invariance in feature extraction
- A method should locate features
- The extraction method should be robust: it should handle different types of changes between images
   - Illumination
   - Affine transform
   - Scale
   - Rotation

The steps of the SIFT algorithm:

1. Scale-space extrema detection
   - Search over scales and image locations
   - Locate local extrema

2. Keypoint localization
   - Select keypoints from the local extrema
   - Keypoints are selected based on measures of their stability

3. Orientation assignment
   - Orientations are assigned to each keypoint based on local image gradient directions

4. Keypoint descriptor
   - Local image gradients are measured at the selected scale in the region around each keypoint
   - These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination

Scale-space extrema detection

The first stage in keypoint detection is to find the local extrema in scale-space. It consists of:
- A cascade filtering approach
- Creation of octaves
- Creation of scale-space images for each octave

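The original image $I(x, y)$ is blurred with a Gaussian filter:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \tag{1}$$

where $*$ denotes convolution and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \tag{2}$$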
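The Gaussian blurring step can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: it relies on the fact that the 2D Gaussian is separable, truncates the kernel at 3σ, and clamps indices at the image border. The function names, the truncation radius and the border handling are all assumptions made for this example.

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Sample exp(-x^2 / (2 sigma^2)) at integer offsets, normalised to sum 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed truncation at 3 sigma
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """L = G * I, computed as a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel_1d(sigma)
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))
```

Here `image` is a list of rows of floats. In practice an array library would be used instead of nested lists; the nested-list form only keeps the sketch dependency-free.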


The procedure is repeated $s$ times, each time multiplying the scale $\sigma$ by a constant factor $k$. Then the difference of Gaussians (DoG),

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \tag{3}$$

is calculated for adjacent blurred images. According to Lowe, stable keypoints are achieved with $s = 3$ and $k = 2^{1/s}$.
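As a small sketch of the scale ladder and of equation (3), the code below lists the scales of one octave and subtracts adjacent blurred images. It follows the simplified description on this slide (scales σ₀·kⁱ for i = 0..s); Lowe's paper generates a few extra images per octave, which is not modelled here. The function names and the starting value σ₀ = 1.6 used in the usage line are assumptions for illustration.

```python
def octave_sigmas(sigma0, s=3):
    """Scales sigma0 * k**i for i = 0..s, with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 1)]

def difference_of_gaussians(blurred):
    """D_i = L_{i+1} - L_i for each pair of adjacent blurred images.

    `blurred` is a list of equally sized images (lists of rows),
    ordered by increasing sigma.
    """
    return [
        [[hi - lo for hi, lo in zip(row_hi, row_lo)]
         for row_hi, row_lo in zip(img_hi, img_lo)]
        for img_lo, img_hi in zip(blurred, blurred[1:])
    ]

# Usage: with s = 3 the last scale is twice the first one.
sigmas = octave_sigmas(1.6, s=3)
```

Because k = 2^(1/s), multiplying s times doubles σ, which is what makes the octave structure line up with the subsampling step.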


The procedure to calculate the differences of Gaussians is then repeated for each octave. The next octave is created as follows:
- Select the Gaussian-blurred image whose σ is twice the initial σ of the current octave.
- Subsample this image and use the output as the starting point for the next octave.
- Subsampling is done by selecting every second pixel from the rows and columns of the image.
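The subsampling step amounts to keeping every second row and every second column. A minimal sketch, assuming the image is stored as a list of rows (the function name is hypothetical):

```python
def subsample(image):
    """Keep every second pixel along rows and columns (halves each dimension)."""
    return [row[::2] for row in image[::2]]

# Usage: a 4x4 image becomes 2x2.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
half = subsample(image)
```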

Keypoint localization

Detection of local extrema/keypoints
- Find the extrema points in the DoG pyramid
- Improve the localization of the keypoint to subpixel accuracy by using a second-order Taylor series expansion

Elimination of keypoints
- Eliminate points from the candidate list of keypoints that have low contrast or are poorly localised on an edge
   - Contrast thresholding
   - Cornerness thresholding

Detection of local extrema/keypoints

To detect the local maxima and minima of $D(x, y, \sigma)$, each point is compared with all of its 26 neighbours: 8 in the same DoG image and 9 in each of the two adjacent scales. If its value is the minimum or maximum among them, the point is an extremum.
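The 26-neighbour comparison can be sketched as below. The layout `dog[scale][row][col]`, the function names and the use of strict inequalities are assumptions of this example; the caller is expected to pass only interior points so that all neighbours exist.

```python
def neighbours_26(dog, s, y, x):
    """Values of the 26 neighbours of dog[s][y][x] in the 3x3x3 cube around it."""
    return [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]

def is_extremum(dog, s, y, x):
    """True if the sample is strictly larger or strictly smaller than all neighbours."""
    value = dog[s][y][x]
    others = neighbours_26(dog, s, y, x)
    return value > max(others) or value < min(others)
```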

Detection of local extrema/keypoints: the Brown and Lowe method
- Improves matching and stability.
- The approach uses the Taylor expansion (up to the quadratic terms) of the scale-space function $D(x, y, \sigma)$, shifted so that the origin is at the sample point:

$$D(X) = D + \frac{\partial D}{\partial X}^{T} X + \frac{1}{2} X^{T} \frac{\partial^2 D}{\partial X^2} X, \tag{4}$$

where $D$ and its derivatives are evaluated at the sample point and $X = (x, y, \sigma)^T$ is the offset from this point. The location of the extremum $\hat{X}$ is determined by taking the derivative of this function with respect to $X$ and setting it to zero, giving

$$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}. \tag{5}$$

If the offset $\hat{X}$ is larger than 0.5 in any dimension, the extremum lies closer to a different sample point. In this case, the sample point is changed and the interpolation is performed about that point instead.
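The offset computation can be illustrated with finite differences on a 3x3x3 DoG neighbourhood. This is a sketch under stated assumptions: the derivatives are approximated by central differences with unit spacing in x, y and scale, the cube is indexed as `cube[scale][row][col]` with the sample at index 1 in each dimension, and the helper names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular Hessian")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def gradient_and_hessian(cube):
    """Central-difference gradient and Hessian of D at the centre of the cube."""
    def d(dx, dy, ds):
        return cube[1 + ds][1 + dy][1 + dx]

    c = d(0, 0, 0)
    grad = [(d(1, 0, 0) - d(-1, 0, 0)) / 2.0,
            (d(0, 1, 0) - d(0, -1, 0)) / 2.0,
            (d(0, 0, 1) - d(0, 0, -1)) / 2.0]
    dxx = d(1, 0, 0) - 2 * c + d(-1, 0, 0)
    dyy = d(0, 1, 0) - 2 * c + d(0, -1, 0)
    dss = d(0, 0, 1) - 2 * c + d(0, 0, -1)
    dxy = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4.0
    dxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4.0
    dys = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4.0
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    return grad, hessian

def subpixel_offset(cube):
    """Offset (x, y, scale) of the extremum: minus the inverse Hessian times the gradient."""
    grad, hessian = gradient_and_hessian(cube)
    return solve_3x3(hessian, [-g for g in grad])
```

If any component of the returned offset exceeds 0.5 in magnitude, a caller would move to the neighbouring sample and repeat the fit there.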

Elimination of keypoints
(a) The 233x189 pixel original image.
(b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function.
(c) After applying a threshold on minimum contrast, 729 keypoints remain.
(d) The final 536 keypoints that remain following an additional threshold on the ratio of principal curvatures.

Contrast thresholding

The function value at the extremum, $D(\hat{X})$, is useful for rejecting unstable extrema with low contrast. It is obtained by substituting equation (5) into (4), giving

$$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D}{\partial X}^{T} \hat{X}. \tag{6}$$

If the magnitude of the function value at $\hat{X}$ is below a threshold, the point is excluded. For (c), all extrema with $|D(\hat{X})| < 0.03$ were discarded.
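Equation (6) and the contrast test translate into a few lines of code. The sketch below assumes the gradient and the offset are given as three-element sequences in the same (x, y, scale) order; the function names are hypothetical. In Lowe's paper the 0.03 threshold assumes image values scaled to the range [0, 1].

```python
def value_at_extremum(d0, grad, offset):
    """D(X_hat) = D + 0.5 * grad^T X_hat, with D and grad taken at the sample point."""
    return d0 + 0.5 * sum(g * o for g, o in zip(grad, offset))

def has_enough_contrast(d0, grad, offset, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| reaches the contrast threshold."""
    return abs(value_at_extremum(d0, grad, offset)) >= threshold
```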

Cornerness thresholding

A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, $H$, computed at the location and scale of the keypoint:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}. \tag{7}$$

The derivatives are estimated by taking differences of neighboring sample points. The eigenvalues of H are proportional to the principal curvatures of D.

Let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller one. Then

$$\operatorname{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta, \tag{8}$$

$$\operatorname{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta. \tag{9}$$

Let $r$ be the ratio between the largest magnitude eigenvalue and the smaller one, so that $\alpha = r\beta$. Then

$$\frac{\operatorname{Tr}(H)^2}{\operatorname{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r}. \tag{10}$$

The quantity $(r + 1)^2 / r$ is at a minimum when the two eigenvalues are equal, and it increases with $r$. Therefore, to check that the ratio of principal curvatures is below some threshold, $r$, we only need to check

$$\frac{\operatorname{Tr}(H)^2}{\operatorname{Det}(H)} < \frac{(r + 1)^2}{r}. \tag{11}$$

The transition from (c) to (d) was obtained with $r = 10$.
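The test in equation (11) needs only the three second derivatives, with no explicit eigenvalue computation. A minimal sketch follows (the function name is hypothetical). Rejecting points with a non-positive determinant is taken from Lowe's paper, where a negative determinant means the curvatures have different signs; it is not stated on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if Tr(H)^2 / Det(H) < (r + 1)^2 / r, i.e. the point is not edge-like."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): reject.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

For equal curvatures the ratio is 4, well below the threshold (r + 1)²/r = 12.1 for r = 10, while a ridge with curvatures 10 and 0.1 gives roughly 102 and is rejected.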

Orientation assignment

Left: The point in the middle is the keypoint candidate. The gradient orientations of the points in the square area around this point are precomputed using pixel differences. Right: Each histogram bin covers 10 degrees, so 36 bins cover the full 360 degrees. The value of each bin is the sum of the gradient magnitudes of all precomputed points whose orientation falls within that bin.
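The 36-bin histogram can be sketched as follows. The input format (a list of magnitude and angle pairs, angles in degrees) and the function names are assumptions of this example. Refinements described in Lowe's paper, such as Gaussian weighting of the samples and interpolation of the peak position, are left out.

```python
def orientation_histogram(samples, num_bins=36):
    """Sum gradient magnitudes into orientation bins of 360 / num_bins degrees.

    `samples` is an iterable of (magnitude, angle_in_degrees) pairs.
    """
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for magnitude, angle in samples:
        index = int((angle % 360.0) // bin_width) % num_bins
        hist[index] += magnitude
    return hist

def dominant_orientation(hist):
    """Centre angle (degrees) of the bin with the largest accumulated magnitude."""
    bin_width = 360.0 / len(hist)
    peak = max(range(len(hist)), key=hist.__getitem__)
    return (peak + 0.5) * bin_width
```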

Keypoint descriptor

- Keypoint samples are accumulated into orientation histograms summarizing the contents over 4x4 subregions.
- The best results are obtained with a 4x4 array of histograms with 8 orientation bins in each.
- As a result, a 4x4x8 = 128 element feature vector is generated for each keypoint.
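To make the 4x4x8 layout concrete, the sketch below accumulates a square grid of gradient samples into 4x4 cells with 8 orientation bins each and flattens the result. The 16x16 sample window used in the usage lines follows Lowe's paper and is not stated on the slide. Gaussian weighting and interpolation between bins are deliberately omitted, so this shows the data layout only, not a complete descriptor; the function name is hypothetical.

```python
def build_descriptor(magnitudes, orientations, cells=4, bins=8):
    """Accumulate gradient samples into cells x cells histograms of `bins` orientations.

    `magnitudes` and `orientations` are square grids (lists of rows) of equal size;
    orientations are in degrees, already rotated relative to the keypoint orientation.
    """
    size = len(magnitudes)
    if size % cells != 0:
        raise ValueError("grid size must be a multiple of the number of cells")
    cell_size = size // cells
    bin_width = 360.0 / bins
    hist = [[[0.0] * bins for _ in range(cells)] for _ in range(cells)]
    for row in range(size):
        for col in range(size):
            b = int((orientations[row][col] % 360.0) // bin_width) % bins
            hist[row // cell_size][col // cell_size][b] += magnitudes[row][col]
    # Flatten cell by cell: 4 * 4 * 8 = 128 values for the default settings.
    return [v for cell_row in hist for cell in cell_row for v in cell]

# Usage: a 16x16 window of unit gradients all pointing at 0 degrees.
mags = [[1.0] * 16 for _ in range(16)]
oris = [[0.0] * 16 for _ in range(16)]
descriptor = build_descriptor(mags, oris)
```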


Orientation invariance
- To achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation.
- For efficiency, the gradients are precomputed for all levels of the pyramid.
- A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point.
- The purpose of the Gaussian window is
   - to avoid sudden changes in the descriptor with small changes in the position of the window, and
   - to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors.

Boundary effects

- The descriptor should not change abruptly as a sample shifts smoothly from one histogram to another or from one orientation to another.
- To avoid such boundary effects, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins.
- In other words, each entry into a bin is multiplied by a weight of 1 − d for each dimension, where d is the distance of the sample from the central value of the bin, measured in units of the histogram bin spacing.
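The 1 − d weighting can be sketched per dimension and then combined as a product over the two spatial dimensions and the orientation dimension. The coordinate convention (positions measured in bin units with bin centres at integer values) and the function names are assumptions of this example. Wrapping the orientation index and discarding spatial bins that fall outside the 4x4 grid are left to the caller.

```python
import math
from itertools import product

def linear_weights(position):
    """Split a continuous bin coordinate between the two nearest bin centres.

    Each of the two bins receives weight 1 - d, where d is its distance
    from the sample in units of the bin spacing.
    """
    lower = math.floor(position)
    d = position - lower
    return [(lower, 1.0 - d), (lower + 1, d)]

def trilinear_weights(x, y, o):
    """Weights for the 8 surrounding (x, y, orientation) bins."""
    return {
        (ix, iy, io): wx * wy * wo
        for (ix, wx), (iy, wy), (io, wo) in product(
            linear_weights(x), linear_weights(y), linear_weights(o))
    }
```

The eight weights always sum to one, so the total gradient magnitude added to the descriptor is unchanged; it is only spread over neighbouring bins.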

Effect of illumination

The feature vector is modified to reduce the effects of illumination change:
1. The vector is normalized to unit length.
2. The values in the unit feature vector are thresholded so that none exceeds a fixed maximum.
3. The vector is renormalized to unit length.
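The three steps can be sketched as below. The clipping value of 0.2 is the one reported in Lowe's 2004 paper and is not given on the slide, so it is exposed as a parameter; the function names are hypothetical. Descriptor entries are sums of gradient magnitudes and therefore non-negative, which is why only an upper clip is applied.

```python
import math

def normalize(vec):
    """Scale the vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def illumination_normalize(vec, clip=0.2):
    """Normalize, clip large entries, then renormalize to unit length."""
    unit = normalize(vec)
    clipped = [min(v, clip) for v in unit]
    return normalize(clipped)
```

The first normalization cancels a uniform change in contrast, since multiplying every gradient by a constant leaves the unit vector unchanged. The clipping reduces the influence of a few very large gradient magnitudes.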

Demo and applications

- Search for the sample in the image
- Classification of remote sensed imagery [Yang&Newsam, 2008]
- Model images of planar objects
- Recognition of 3D objects
- Recognising panoramas
- People redetection [Hu et al., 2008]

Object recognition

Recognising panoramas

Comparison and Modifications
To date, SIFT has been reported to be the most reliable descriptor among those compared [Ancuti&Bekaert, 2007; Mikolajczyk&Schmid, 2005].

Modifications:
- CSIFT: A SIFT Descriptor with Color Invariant Characteristics [Abdel-Hakim&Farag]
- SIFT-CCH: Increasing the SIFT distinctness by Color Co-occurrence Histograms [Ancuti&Bekaert, 2007]
- PCA-SIFT: A More Distinctive Representation for Local Image Descriptors [Ke et al., 2004]
   - "... instead of using SIFT's smoothed weighted histograms, we apply Principal Components Analysis (PCA) to the normalized gradient patch."
   - "... more distinctive and more compact leading to significant improvements in matching accuracy (and speed) for both controlled and real-world conditions."

David G. Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, 1999.
David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004.
M. Brown and D.G. Lowe, Recognising Panoramas, International Conference on Computer Vision, 2002.
Andrea Vedaldi, SIFT for Matlab, vedaldi/code/sift.html
Cosmin Ancuti and Philippe Bekaert, SIFT-CCH: Increasing the SIFT distinctness by Color Co-occurrence Histograms, Proceedings of 5th IEEE International Symposium on Image and Signal Processing and Analysis, 2007.
K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE PAMI, 2005.
Alaa E. Abdel-Hakim and Aly A. Farag, CSIFT: A SIFT Descriptor with Color Invariant Characteristics.
Y. Ke, R. Sukthankar and L. Hutson, PCA-SIFT: a more distinctive representation for local image descriptors, in Proc. of CVPR, 2004.
Y. Yang and S. Newsam, Comparing SIFT Descriptors and Gabor Texture Features for Classification of Remote Sensed Imagery, IEEE International Conference on Image Processing, 2008.
Lei Hu, Shuqiang Jiang, Qingming Huang, Yizhou Wang and Wen Gao, People Re-detection Using AdaBoost with SIFT and Color Correlogram, The International Conference on Image Processing (ICIP 2008).