
Journal of Optics Applications January 2013, Volume 2, Issue 1, PP.


Improved SIFT Algorithm Image Matching
Can Ding 1# , Chang-wen Qu 1, Jie Gao 2
1. Department of Electronic and Information Engineering, Naval Aeronautical & Astronautical University, Yantai 26400, China 2. 92840 Unit, Qingdao 266405, China #Email:

The high-dimensional, complex feature descriptor of SIFT not only occupies a large amount of memory but also slows feature matching. We adopt a statistic of the gradients in the neighborhood of each feature point: the local statistical area is built from 8 concentric square rings centered on the feature point, the pixel gradients in each ring are accumulated into eight directions, and the accumulated values are sorted in descending order and normalized. The new feature descriptor reduces the feature dimension from 128 to 64. Experiments show that the proposed method improves matching speed while maintaining matching precision.
Key words: SIFT Algorithm; Image Matching; Feature Descriptor; Difference of Gaussian

Image matching, an important part of computer vision and digital image processing, is widely used in photogrammetry and remote sensing, resource analysis, 3D reconstruction, object recognition and many other fields, and has long been a focus of research. Matching is made difficult by external factors such as weather, sunlight and occlusion, by variations in imaging time, angle and distance, and by image translation, rotation and scaling. For a long time, many domestic and foreign scholars have been committed to solving these problems. In recent years, image matching for object recognition based on local invariant descriptors has made significant progress in computer vision. In 2004, building on a summary of existing invariant feature detection methods, Lowe formally proposed the SIFT operator, a local feature operator that remains invariant under image scaling, rotation and affine transformation[1]. The algorithm copes well with partial occlusion of the scene, rotation and zoom, and image distortion caused by changes in viewpoint, but it still has a problem: the high dimension of the feature descriptor leads to complex calculation and long matching time. Ke et al. proposed the PCA-SIFT[2] method to reduce the dimension of the feature descriptor, but in the absence of any a priori knowledge this method increases the amount of calculation. Grabner[3] used integral images to accelerate SIFT, but at the cost of some of the superiority of the SIFT method. Delponte[4] proposed using the SVD method for feature matching, but the matching process is very complex and cannot be used for wide-baseline matching. These methods improve the feature description or matching stage without changing the algorithm itself.
Mikolajczyk[5] confirmed that, in most cases, SIFT descriptors are the least affected by the various transformations between images and give the most stable matching performance. They also proposed the gradient location-orientation histogram (GLOH) local descriptor. Both PCA-SIFT and GLOH rely on PCA, which requires a set of representative images to train the projection matrix; this not only requires additional offline computation time, but the trained matrix only works for that type of image and so lacks broad applicability. Ref. [6] added a multi-scale Harris corner detection operator to the extraction of invariant image features, which increases the repeatability of matched point pairs. Ref. [7] proposed a feature point detection method that simplifies SIFT, preserving performance while reducing computation. Deng[8] proposed using the ratio of the first and second nearest neighbor distances, together with a mutual correspondence constraint, to set up the initial correspondences, and then using Random Sample Consensus (RANSAC) to remove mismatched feature points. Wu[9] proposed the WT-SIFT method: after the wavelet transform, the low-resolution image is much less affected by local image details, which improves the multi-scale feature extraction capability of SIFT. However, the method's long running time is not conducive to real-time matching. To address the high dimension and complexity of SIFT in image matching, we improve the SIFT algorithm by reducing the dimension of the feature descriptor, which simplifies computation and further shortens the matching time.

The SIFT algorithm is a local feature extraction algorithm: it finds key points in scale space and extracts features that are invariant to position, scale and rotation. A SIFT feature point is an extreme point of the Difference of Gaussian scale space, found by comparing each pixel with its 26 neighbors in the current scale and in the adjacent upper and lower scales to obtain the maxima and minima. The algorithm then uses the Taylor expansion and the Hessian matrix to filter out unstable extreme points and to compute the position to sub-pixel accuracy. Finally, in the Gaussian image, it analyzes the gradient magnitudes and directions of the pixels in the neighborhood of each key point to obtain a main direction, which makes the operator independent of scale and direction. SIFT feature extraction consists mainly of four steps: (1) build the image scale space and detect its extreme points; (2) accurately locate the key points and exclude unstable points; (3) determine the direction of the key points; (4) generate the SIFT feature vector.

2.1 Image Scale Space, Detect Extreme Point of Scale Space
One characteristic of a key point is its invariance to scale change, that is, it can be detected at different scales. Koenderink and Lindeberg proved that the Gaussian convolution kernel is the only linear kernel for scale transformation. The scale space of an image can therefore be defined as a function $L(x, y, \sigma)$, obtained by convolving a variable-scale Gaussian function $G(x, y, \sigma)$ with the input image $I(x, y)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{1}$$

Here, $G(x, y, \sigma)$ is the variable-scale Gaussian function and $*$ denotes convolution. In order to detect key points in scale space effectively, we adopt the Difference of Gaussian (DoG) scale space, obtained by convolving the image with the difference of Gaussian kernels at two nearby scales:

$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{2}$$


Here, the two adjacent scales are separated by a constant factor k. The DoG operator approximates the scale-normalized LoG (Laplacian of Gaussian) operator. To find key points that are invariant to scale, we compute the difference of images at adjacent scales, which yields a series of images in which extreme points are detected. The Gaussian difference images can be computed efficiently with a pyramid, as shown in FIG. 1. After generating the Gaussian difference images, we search for extreme points in this space. Each sample point is compared with all of its adjacent points to see whether it is larger or smaller than its neighbors in both the image domain and the scale domain. As shown in FIG. 2, the middle detection point is compared with its eight adjacent points in the same scale and the 9×2 points in the adjacent upper and lower scales, which ensures that extreme points are detected in both scale space and two-dimensional image space. If the point is the maximum or minimum among its 26 neighbors, it is considered a key point of the image at that scale.
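The scale-space construction and the 26-neighbor test can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the truncated separable kernel (radius 3σ), the clamped borders, the default k = √2, the strict comparison and all function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """L(x, y, sigma) = G * I, computed as two 1D passes with clamped borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass (the 2D Gaussian is separable).
    rows = [[sum(kernel[j + radius] * image[y][clamp(x + j, w)]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + radius] * rows[clamp(y + j, h)][x]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]

def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    coarse = blur(image, k * sigma)
    fine = blur(image, sigma)
    return [[c - f for c, f in zip(crow, frow)]
            for crow, frow in zip(coarse, fine)]

def is_scale_space_extremum(below, current, above, y, x):
    """True if current[y][x] is strictly larger or strictly smaller than all
    26 neighbors: 8 in its own DoG layer and 9 in each adjacent layer.
    (y, x) must be an interior pixel."""
    center = current[y][x]
    neighbors = [
        layer[y + dy][x + dx]
        for idx, layer in enumerate((below, current, above))
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if not (idx == 1 and dy == 0 and dx == 0)  # skip the sample itself
    ]
    return center > max(neighbors) or center < min(neighbors)
```

In use, `below`, `current` and `above` would be three DoG images of the same size computed at consecutive scales, and the test would be applied to every interior pixel of `current`.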



2.2 Accurately Determine the Key Points, Excluding the Unstable Point
The location and scale of each key point are determined accurately by fitting a 3D quadratic function. To enhance matching stability and improve noise immunity, low-contrast key points and unstable edge response points are removed (the DoG operator has a strong edge response).
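For reference, the fitting and rejection criteria can be written out as in Lowe [1]. This section does not state them, so the thresholds quoted here (0.03 for contrast, with pixel values scaled to [0, 1], and r = 10 for the edge test) are Lowe's suggested values and not necessarily those used in the experiments of this paper. Here $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the sample point, and $D$ and its derivatives are evaluated at the sample point.

```latex
% Quadratic (Taylor) fit of the DoG function and the refined extremum location
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x}
  + \frac{1}{2}\,\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}

% Low-contrast rejection: discard the key point if |D(\hat{x})| < 0.03
D(\hat{\mathbf{x}}) = D + \frac{1}{2}\,\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}

% Edge-response rejection: keep the key point only if the ratio test holds
\frac{\operatorname{Tr}(\mathbf{H})^{2}}{\operatorname{Det}(\mathbf{H})} < \frac{(r+1)^{2}}{r},
\qquad
\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}
```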

2.3 Confirm the Direction of Key Points
The main direction is the direction corresponding to the maximum of the histogram of gradient directions over the points in the key point's neighborhood. The subsequent descriptor construction takes the main direction as its reference, so the constructed descriptor is rotation invariant. For each image sample $L(x, y)$, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are pre-computed using pixel differences:

$$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^{2} + \left(L(x, y+1) - L(x, y-1)\right)^{2}} \tag{3}$$

$$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{4}$$
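Formulas (3) and (4) translate almost directly into code. The sketch below assumes the smoothed image is stored as a list of rows indexed `L[y][x]` and that (x, y) is an interior pixel; it uses `atan2` instead of a plain arctangent of the quotient so that the quadrant is resolved and a zero denominator is harmless. The function name is hypothetical.

```python
import math

def gradient(L, x, y):
    """Gradient magnitude m(x, y) and orientation theta(x, y) in degrees,
    in the range [0, 360), from central pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]   # L(x+1, y) - L(x-1, y)
    dy = L[y + 1][x] - L[y - 1][x]   # L(x, y+1) - L(x, y-1)
    magnitude = math.hypot(dx, dy)
    orientation = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, orientation
```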

In practice, the sampling area is a neighborhood centered on the key point, and the gradient directions of the neighborhood pixels are accumulated in a histogram. The histogram covers 0-360 degrees with one bin per 10 degrees, 36 bins in total. The peak of the histogram represents the main gradient direction of the neighborhood, which is taken as the direction of the key point. When the histogram has another peak reaching 80% of the energy of the main peak, that direction is regarded as an auxiliary direction of the key point. A key point may thus be assigned several directions (one main direction and one or more auxiliary directions), which enhances the robustness of matching. Figure 3 shows the direction histogram.
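The direction assignment can be sketched as follows. The sketch assumes that each sample votes with its gradient magnitude and that directions are reported as bin centers; Lowe [1] additionally applies a Gaussian weight and interpolates the peak position, both omitted here. The function names are hypothetical.

```python
def orientation_histogram(magnitudes, orientations, bins=36):
    """Accumulate gradient magnitudes into equal-width bins over 0-360 degrees."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for m, theta in zip(magnitudes, orientations):
        hist[int((theta % 360.0) // width) % bins] += m
    return hist

def keypoint_directions(hist, ratio=0.8):
    """Return (main, auxiliary): the bin-center angle of the highest bin and
    the angles of other local peaks reaching `ratio` of the main peak."""
    bins = len(hist)
    width = 360.0 / bins
    peak = max(hist)
    main_bin = hist.index(peak)
    auxiliary = []
    for i, value in enumerate(hist):
        if i == main_bin or value < ratio * peak:
            continue
        # Local peak test on the circular histogram.
        if value > hist[i - 1] and value > hist[(i + 1) % bins]:
            auxiliary.append((i + 0.5) * width)
    return (main_bin + 0.5) * width, auxiliary
```

With samples of magnitude 1.0, 2.0 and 1.8 at 5, 95 and 185 degrees, the main direction falls in the 90-100 degree bin and the 180-190 degree bin qualifies as an auxiliary direction, since 1.8 exceeds 80% of 2.0.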


2.4 SIFT Feature Vector Generation
When constructing the descriptor, the coordinate axes are first rotated to the main direction of the key point to ensure rotational invariance. An 8×8 window centered on the key point is then taken. As shown in FIG. 4, the black point at the center of the left image is the key point and every cell represents a pixel in its neighborhood; the gradient magnitude and direction of each pixel are obtained from formulas (3) and (4). The direction of each arrow represents the gradient direction of the pixel, and the arrow length represents the gradient magnitude. The gradients are then weighted with a Gaussian window, whose range is indicated by the blue circle. Next, an eight-direction gradient histogram is computed on each 4×4 block by accumulating the values of each gradient direction, which forms one seed point. Combining the directional information of the neighborhood in this way enhances noise resistance and provides better tolerance of localization error in feature matching. In actual calculation, to enhance the robustness of matching, Lowe proposed describing each key point with 4×4 = 16 seed points, which gives 4×4×8 = 128 values per key point and finally forms a 128-dimensional SIFT feature vector. At this point the SIFT feature vector is free of the effects of scale change and rotation. The feature vector is then normalized to remove the influence of illumination change.
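The 4×4×8 layout can be made concrete with a simplified sketch. It assumes a 16×16 patch of gradient magnitudes and orientations that has already been rotated to the main direction, with 4×4 pixels per seed point; the Gaussian weighting described above and the interpolation between bins used by Lowe [1] are omitted, so this is an illustration of the data layout rather than a full SIFT descriptor. The function name is hypothetical.

```python
import math

def sift_like_descriptor(magnitudes, orientations):
    """Build a 4x4x8 = 128-element vector from a 16x16 patch and normalize it
    to unit length. Orientations are in degrees."""
    descriptor = []
    for by in range(4):                  # 4x4 grid of seed points
        for bx in range(4):
            hist = [0.0] * 8             # 8 directions of 45 degrees each
            for y in range(by * 4, by * 4 + 4):
                for x in range(bx * 4, bx * 4 + 4):
                    b = int((orientations[y][x] % 360.0) // 45.0) % 8
                    hist[b] += magnitudes[y][x]
            descriptor.extend(hist)
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```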


The high-dimensional feature descriptor of SIFT not only occupies a large amount of memory but also slows feature matching. In the feature detection stage, we follow the reference method to detect the location, characteristic scale and main orientation of each key point. In the descriptor computation stage, we use statistics of the gradients in the neighborhood of the key point. The local statistical area consists of 8 concentric square rings, as shown in FIG. 5 (which shows only four of the rings). When the image rotates, the pixels within a square ring move, but they stay in the same ring and keep their positions relative to one another, and the relative information of the other pixels is likewise unchanged. Hence the accumulated gradient values of the 8 directions in a square ring, sorted in descending order, remain stable under image rotation. The 8 sorted accumulated gradient values of the first square ring form the first 8 elements of the feature vector, and the remaining rings are handled in the same manner. In this way we obtain an 8×8 = 64-element feature vector as the feature descriptor of the key point. In actual calculation, the gradient information of pixels close to the key point can be used repeatedly: the closer a pixel is to the key point, the more it contributes to the descriptor, which is similar in effect to Gaussian weighting. Finally, the descriptor is modified to reduce the effects of illumination change. First, it is normalized to unit length. A change in image contrast, in which each pixel value is multiplied by a constant, multiplies the gradients by the same constant, so the contrast change is cancelled by the normalization. A brightness change, in which a constant is added to each pixel, does not affect the gradient values, as they are computed from pixel differences. Therefore, the descriptor is invariant to affine changes in illumination. The new descriptor reduces the dimension of the feature descriptor from 128 to 64, and the gradients of the key point's neighborhood remain well expressed in it, so the method maintains matching accuracy while improving matching speed.
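A minimal sketch of the 64-element ring descriptor follows. The text does not specify the ring width or exactly how inner pixels are reused, so the sketch makes its own assumptions: rings are one pixel wide, ring r holds the pixels at Chebyshev distance r from the key point in a 17×17 patch, and an optional `cumulative` flag lets ring r also include all pixels closer than r. The function and parameter names are hypothetical.

```python
import math

def ring_descriptor(magnitudes, orientations, rings=8, directions=8,
                    cumulative=False):
    """Descriptor with rings * directions elements from a square patch of side
    2 * rings + 1 centered on the key point. Orientations are in degrees."""
    c = rings                                    # center index of the patch
    width = 360.0 / directions
    descriptor = []
    for r in range(1, rings + 1):
        hist = [0.0] * directions
        for y in range(2 * rings + 1):
            for x in range(2 * rings + 1):
                d = max(abs(y - c), abs(x - c))  # Chebyshev distance to center
                if d == r or (cumulative and 0 < d < r):
                    b = int((orientations[y][x] % 360.0) // width) % directions
                    hist[b] += magnitudes[y][x]
        # Sorting discards which direction each value came from, so a cyclic
        # shift of the bins leaves the ring's 8 values unchanged.
        descriptor.extend(sorted(hist, reverse=True))
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

Under these assumptions, multiplying all magnitudes by a constant leaves the result unchanged (the contrast argument above), and rotating the patch by 90 degrees while adding 90 degrees to every orientation only permutes the bins within each ring, which the sort removes. For arbitrary rotation angles the discretized rings and bins make the invariance approximate rather than exact.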


To test the effectiveness of the improved algorithm, we carried out the following experiment. We matched optical images taken from different viewpoints and at different times, which exhibit a certain radiometric difference and geometric distortion. (We tested many images; only two of them are shown here, as Figure 6 and Figure 7.) Feature point matching requires computing the Euclidean distance between the feature descriptors of every pair of points in the two images. If the smallest Euclidean distance is less than 0.8 times the second-smallest Euclidean distance, the two feature points are taken as a match. After repeated experiments, we found that with ratio = 0.6 some correct matches may be omitted but mismatches are removed, and the matching rate is the best. In theory, more match points help achieve higher accuracy; in practice, however, too many points do not improve matching accuracy, and 20-50 match points are generally sufficient. In this article, a target in the test image is considered reliably identified if it has more than seven matching points, since seven matching points guarantee high accuracy in calculating the affine transformation parameters. The experiment was implemented in Matlab 7.04 on a computer with an Intel Core 2 Duo clocked at 2.2 GHz and 504 MB of memory.
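The distance ratio test described above can be sketched as a brute-force search. The experiments were run in Matlab; this Python version is only an illustration, with descriptors represented as plain lists of floats and a hypothetical function name.

```python
import math

def match_features(desc_a, desc_b, ratio=0.6):
    """Return (i, j) pairs where descriptor i of image A matches descriptor j
    of image B under the nearest / second-nearest distance ratio test."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue                       # the ratio test needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches
```

Each query costs one distance computation per candidate, and each distance costs one pass over the descriptor elements, which is why halving the descriptor length from 128 to 64 directly reduces the matching work.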


Ratio = 0.6

| Algorithm | Feature points (two images) | Match points | Mismatch points | Matching rate | Matching time (s) |
|---|---|---|---|---|---|
| The original algorithm | 506 / 1152 | 21 | 1 | 95.2% | 6.27 |
| The improved algorithm | 506 / 1152 | 21 | 1 | 95.2% | 2.21 |

Ratio = 0.6

| Algorithm | Feature points (two images) | Match points | Mismatch points | Matching rate | Matching time (s) |
|---|---|---|---|---|---|
| The original algorithm | 1535 / 1448 | 49 | 3 | 93.9% | 5.96 |
| The improved algorithm | 1535 / 1448 | 49 | 3 | 93.9% | 2.04 |
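The reported figures can be cross-checked with a few lines of arithmetic. The definition of the matching rate as (match points − mismatch points) / match points is an inference from the tabulated numbers, not something the text states, and the function names are hypothetical.

```python
def matching_rate(match_points, mismatch_points):
    """Percentage of reported matches that are correct (assumed definition)."""
    return 100.0 * (match_points - mismatch_points) / match_points

def speedup(original_time, improved_time):
    """How many times faster the improved algorithm matched."""
    return original_time / improved_time

print(round(matching_rate(21, 1), 1))   # 95.2
print(round(matching_rate(49, 3), 1))   # 93.9
print(round(speedup(6.27, 2.21), 2))    # 2.84
print(round(speedup(5.96, 2.04), 2))    # 2.92
```

On both image pairs the matching time drops by a factor of a little under three while the match and mismatch counts are unchanged.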

The experimental results show that the improved algorithm maintains matching accuracy while raising matching speed and shortening matching time. The reason is that the feature descriptor of the key points consumes much of the running time of the original algorithm.

To address the high dimension and high complexity of the SIFT feature descriptor, we have improved it by adopting neighborhood gradient statistics over a local statistical area consisting of eight concentric square rings, which reduces the descriptor dimension from 128 to 64. Experimental results show that the method shortens the matching time while maintaining matching accuracy.

[1] Lowe D G. “Distinctive image features from scale-invariant keypoints”. International Journal of Computer Vision 60(2) (2004): 91-110. Accessed November 1, 2004. doi: 10.1023/B:VISI.0000029664.99615.94
[2] Ke Y, Sukthankar R. “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors”. Paper presented at the Conference on Computer Vision and Pattern Recognition, Washington DC, USA, July 2004: 511-517
[3] Grabner M, Grabner H, Bischof H. “Fast Approximated SIFT”. Paper presented at the 7th Asian Conference on Computer Vision, Hyderabad, India, January 2006, 1: 918-927
[4] Delponte E, Isgrò F, Odone F, et al. “SVD-matching using SIFT features”. Graphical Models 65(2006): 415-431. Accessed September, 2006. doi: 10.1016/j.gmod.2006.07.002
[5] Mikolajczyk K, Schmid C. “A Performance Evaluation of Local Descriptors”. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(2005): 1615-1630. Accessed Oct, 2005. doi: 10.1109/TPAMI.2005.188
[6] Liu Xiao jun, Yang Jie, Sun Jianwei, et al. “Image registration approach based on SIFT”. Infrared and Laser Engineering, 2008, 37(1): 156-160
[7] Gao Jian, Huang Xinhan, Peng Gang, et al. “Simplified SIFT feature point detecting method”. Application Research of Computers, 2008, 25(7): 2213-2215
[8] Deng Chuanbin, Guo Lei, Li Wei. “Remote Sensing Image Registration Algorithm Based on SIFT”. Chinese Journal of Sensors and Actuators, 2009, 22(12): 1742-1746
[9] Wu Jianming, Tian Zheng, Liu Xiangzeng, et al. “Proposing an Effective Method for Image Multi-scale Registration by Combining SIFT with Wavelet Transform(WT)”. Journal of Northwestern Polytechnical University, 2011, 29(1): 17-21


Can Ding (1983- ), doctoral candidate, the major field is image feature extraction and target detection. Email:

Changwen Qu (1963- ), PhD, the major field is electronic warfare, signal processing, target recognition.