
Face Recognition Using Binary SIFT and its Robustness against Face Variations

Rahul Prakash (1), Padmavati (2)
PEC University of Technology, Chandigarh
(1) rp.ercs@gmail.com, (2) Padma_khandnor@yahoo.in


Abstract: The SIFT algorithm is one of the most notable algorithms used for feature extraction. To detect an object, the extracted features must be matched against the features extracted from the target image, and several measures (Euclidean distance, city-block distance, correlation, etc.) can be used to compute the distance between extracted features. However, the space and time complexity of SIFT is too high to meet real-time requirements because of its large feature vector space (256^128). To overcome this drawback, Kadir A. Peker introduced Binary-SIFT, which has a comparatively very small feature vector space and meets real-time requirements by using the XOR function for matching, which takes much less time than the distance measures mentioned above. In this paper we evaluate the performance of SIFT and Binary-SIFT on test images from the Indian Face Database and The ORL Database of Faces. Both algorithms perform the same when the variation in the image (change in scale, illumination or rotation) is very low, but as the variation increases, Binary-SIFT begins to fall behind SIFT.

Keywords: Binary-SIFT, DoG (Difference of Gaussian), Keypoints, SIFT, XOR.

I. INTRODUCTION

Feature extraction has become a basic need in image processing, because many applications, such as image stitching, panoramic photography, object detection and recognition, stereo photography and robot navigation, depend on it. Many feature extraction techniques exist, but SIFT is one of the most robust approaches for object recognition [2], [3]. The technique is widely used for feature extraction, but it has high space and time complexity.
To overcome these difficulties there are broadly two approaches: first, use high-performance compact hardware so that the SIFT algorithm can run fast in a real-time environment [3]; second, make the algorithm light enough that low-performance compact hardware can run it efficiently and still meet real-time requirements. Binary-SIFT belongs to the second approach [4], [5]. SIFT uses a much larger space for each feature vector (256^128), whereas Binary-SIFT contracts this space to a comparatively very small one of 2^128 possible feature vectors [4]. Even this smaller space is sufficient to preserve the high separation quality of SIFT: according to [4], very little information is lost when the SIFT feature vectors are quantized, so the robustness of SIFT is not affected. In our experiments on various face images, however, the robustness of Binary-SIFT was affected once the change in the face (illumination change, rotation, scaling, etc.) exceeded a certain extent, whereas the original algorithm retained its robustness against these changes better than Binary-SIFT. Section II briefly reviews related work in face recognition, Section III describes the SIFT algorithm, Section IV describes the quantization of SIFT features, and Section V presents our test results, which show the difference between the two algorithms and support the argument that SIFT is more robust than Binary-SIFT on face recognition databases. Section VI concludes the paper.

II. RELATED WORK

A face recognition system is a computer application that automatically identifies or verifies a person from a digital image taken with an image capturing device. The common method is to compare the features of the target image with a database of facial features.
This technique is commonly used in security systems. Other biometric techniques (fingerprint matching, voice recognition, iris, retina, hand geometry and ear geometry) are also available for security purposes, so why is face recognition special among them? The other biometric techniques cannot be used in general public areas such as stadiums, cinema halls or roads, whereas face recognition has no such limitation on the number of persons or the place. Face recognition is a subfield of pattern recognition and is mainly divided into 2-D and 3-D face recognition; here we are interested only in 2-D face recognition. There are a number of 2-D face recognition techniques, such as PCA, LDA, SIFT and Binary-SIFT. SIFT gives better results than PCA and LDA [10], [11], which is why we chose SIFT over the other face recognition techniques. According to Kadir A. Peker, Binary-SIFT is an improved version of SIFT; here we argue that this statement holds only in terms of execution time, and that under high variation in the images SIFT gives better results.

III. FEATURE EXTRACTION USING SIFT

The SIFT feature extraction technique is one of the most popular techniques, used in object recognition [6], panoramic photography, image registration and medical applications. SIFT feature extraction is based on the difference-of-Gaussian over all scales of the image, and the
keypoint is detected as a local maximum or minimum of that image pyramid [1]. David Lowe's multi-scale approach makes the algorithm robust against scaling of the image. Each keypoint has an associated 128-D vector describing the gradient orientations around it, and the dominant gradient orientation is selected to make the algorithm rotation invariant. Each value in a SIFT vector lies between 0 and 255, so about 1.8 × 10^308 (256^128) distinct vectors can exist, which gives the different keypoints their high separation quality. In this section we briefly describe the basic steps of the SIFT algorithm according to [1], which we follow in this paper.

A. Gaussian Pyramid

To identify the keypoints, the first step is to create a Gaussian pyramid by convolving the source image with the Gaussian function. Let I(x,y) be the source image and G(x,y,σ) the Gaussian function; the convolved image L(x,y,σ) is then

$$L(x,y,\sigma) = G(x,y,\sigma) * I(x,y) \qquad (1)$$

where

$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})} \qquad (2)$$

Here x and y are the horizontal and vertical spatial coordinates and σ is the variable scale factor. Let the difference-of-Gaussian (DoG) of the image be denoted by D(x,y,σ). By definition, the DoG is the difference of two adjacent Gaussian-filtered images:

$$D(x,y,\sigma) = L(x,y,k\sigma) - L(x,y,\sigma) \qquad (3)$$

where k is a constant multiplicative factor; David Lowe suggests k = 2^(1/3) [1]. Fig. 1 illustrates the construction of the DoG. For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale-space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2 and the process is repeated.

Fig. 1 Gaussian pyramid and the difference-of-Gaussian

B. Keypoint Detection and Stabilization

Fig. 2 Comparing the target pixel with its 26 neighbors in 3x3 regions at the current and adjacent scales to find the maxima and minima of the difference-of-Gaussian images.

The keypoints are detected as the local maxima and minima of the DoG images. Each pixel of a DoG image is compared with its 26 neighboring pixels: 8 neighbors in the same DoG image, 9 in the DoG image above it and 9 in the DoG image below it. The pixel is a candidate keypoint if it is greater than all 26 neighbors or smaller than all of them. To become a true keypoint, each candidate must pass a stability check: first, keypoints with low contrast are rejected because they are considered flat points, and second, points located on edges are eliminated because they are considered non-distinctive.

C. Orientation Assignment

Next we calculate the principal orientation of each keypoint. To do this, a gradient orientation histogram is computed from the neighborhood of the keypoint in the corresponding Gaussian-filtered image. The gradient of each pixel at (x,y) has two parts, Gx in the horizontal direction and Gy in the vertical direction:

$$G_x = L(x+1,y) - L(x-1,y), \qquad G_y = L(x,y+1) - L(x,y-1)$$

For each pixel in the keypoint region we compute the gradient magnitude and orientation:

$$\mathrm{mag}(x,y) = \sqrt{G_x^{2} + G_y^{2}}, \qquad \theta(x,y) = \tan^{-1}\!\left(\frac{G_y}{G_x}\right)$$
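As a rough illustration of Sections A to C, the sketch below blurs an image with a sampled version of Eq. (2), forms the DoG images of Eq. (3) for one octave, applies the 26-neighbor extremum test, and computes the central-difference gradient. It is a minimal pure-Python sketch under stated assumptions, not the implementation used in this paper: the kernel is truncated at 3σ and renormalized, pixels outside the image take the nearest edge value, the base scale σ = 1.6 is Lowe's suggested value rather than one given here, down-sampling between octaves and the contrast and edge checks are omitted, and all function names are hypothetical. The orientation uses the two-argument arctangent of Gy and Gx so that it covers the full 360 degrees.

```python
import math

def gaussian_kernel(sigma):
    """Sample Eq. (2) on a (2r+1) x (2r+1) grid, truncated at r = ceil(3*sigma) (assumption)."""
    r = max(1, math.ceil(3 * sigma))
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    total = sum(map(sum, k))  # truncation drops a little mass, so renormalize to sum 1
    return [[v / total for v in row] for row in k]

def blur(image, sigma):
    """Eq. (1): L = G * I. The kernel is symmetric, so correlation equals convolution.
    Pixels outside the image are replaced by the nearest edge pixel (assumption)."""
    k = gaussian_kernel(sigma)
    r, h, w = len(k) // 2, len(image), len(image[0])
    return [[sum(k[di + r][dj + r] * image[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                 for di in range(-r, r + 1) for dj in range(-r, r + 1))
             for j in range(w)] for i in range(h)]

def dog_stack(image, sigma=1.6, k=2 ** (1 / 3), levels=4):
    """Eq. (3) for one octave: D_s = L(k^(s+1) sigma) - L(k^s sigma); no down-sampling here."""
    L = [blur(image, sigma * k ** s) for s in range(levels)]
    return [[[b - a for a, b in zip(row_lo, row_hi)] for row_lo, row_hi in zip(lo, hi)]
            for lo, hi in zip(L, L[1:])]

def is_extremum(dog, s, i, j):
    """26-neighbor test of Section B; (s, i, j) must not lie on the border of the stack."""
    c = dog[s][i][j]
    nbrs = [dog[s + ds][i + di][j + dj]
            for ds in (-1, 0, 1) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (ds, di, dj) != (0, 0, 0)]
    return all(c > n for n in nbrs) or all(c < n for n in nbrs)

def candidates(dog):
    """All interior (scale, row, col) positions that pass the 26-neighbor test."""
    return [(s, i, j) for s in range(1, len(dog) - 1)
            for i in range(1, len(dog[0]) - 1) for j in range(1, len(dog[0][0]) - 1)
            if is_extremum(dog, s, i, j)]

def gradient(L, x, y):
    """Section C central differences on L[y][x]; returns (magnitude, orientation in [0, 360))."""
    gx = L[y][x + 1] - L[y][x - 1]
    gy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(gx, gy), math.degrees(math.atan2(gy, gx)) % 360.0
```

A constant image gives an all-zero DoG stack and therefore no candidates, which is a quick sanity check that the kernel is normalized.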

Fig. 3 Generation of Feature Descriptor (panels (a), (b) and (c))

Now we create the gradient orientation histogram. Following [1], the 360-degree range is divided into bins of 10 degrees each (36 bins), and the gradient magnitude of each sample is accumulated in the bin of its orientation. The peak of the histogram gives the orientation of the keypoint.

D. Keypoint Descriptor

In this step the detected keypoints and their neighboring pixels are transformed into feature descriptors. Taking a keypoint as the center, its region is divided into 4x4 = 16 square sub-regions on the Gaussian-filtered image hosting the keypoint, as shown in Fig. 3(a) and Fig. 3(b). A gradient orientation histogram is computed for each sub-region, and each of these histograms has 8 orientation bins, as shown in Fig. 3(c); in other words, each bin covers 45 degrees. One subtle detail is that the contributions to these histograms are weighted by a Gaussian function, as specified in Lowe's algorithm [1]. To achieve rotation invariance, the pixels within each sub-region are rotated relative to the keypoint orientation computed previously. Overall, the 16 histograms are represented by 16 x 8 = 128 values.

IV. BINARY QUANTIZED FEATURES

SIFT feature vectors have a very large space available, which gives the algorithm its distinctive power, but this large space also causes high database usage and long processing times. One way to reduce both is to quantize the SIFT features into binary quantized features. A binary quantized feature uses a space of 2^128 feature vectors instead of about 10^308. Is this much smaller space sufficient to meet the performance requirements of the original, much larger space? According to [4], yes: quantized feature vectors can preserve the performance of the original SIFT feature vectors. During quantization of vectors, we actually divide the space
into different partitions. The SIFT vectors should be uniformly distributed over these partitions, both to make maximum use of the space and to minimize the probability that two distinct feature vectors fall into the same partition. The criterion for optimum quantization here is preserving the distance relations between vectors, rather than minimizing the error between the original and quantized values. Matching is also time-consuming in SIFT, whereas Binary-SIFT finishes its matching process faster, because the distance computation between vectors becomes a bitwise XOR of the two vectors followed by a count of the non-zero bits. The goal is to map different SIFT vectors to different binary vectors: similar SIFT vectors should map to similar binary vectors, and dissimilar SIFT vectors to dissimilar binary patterns, so that retrieval returns similar results. It is proposed in [4] to use the median value of each feature component as its quantization threshold. In 1-D, quantizing a single component at its median gives equal probabilities of 1 and 0, which maximizes the entropy and spreads the SIFT vectors uniformly over the new binary space. In the next section, our experimental results compare the matching performance of SIFT and Binary-SIFT under different lighting and orientation conditions. We extract the feature vectors from the images with the SIFT algorithm and store them in the database; these feature vectors are then quantized to binary and stored separately in the database. The performance graphs of SIFT and Binary-SIFT under illumination change, angular rotation and scaling show the difference between the two matching algorithms.

V. EXPERIMENTAL RESULTS

We collected more than 1500 images from various face recognition databases, together with some real-time face images with highly overlapping content. First, we extracted feature vectors from these images with the SIFT feature extraction algorithm, and then mapped these feature vectors to distinct binary vectors using median-based binary quantization. We collected over 10,000 feature vectors with the original SIFT algorithm and quantized them to binary vectors. After quantization, only 4500 feature vectors remain common between the SIFT feature vectors and the binary quantized vectors. Here more than 95% of the vector space collapsed, but 95% of the matching points were also lost from the database. Many matching points that could help in matching under variations of the face (illumination change, scaling and rotation) are eliminated during binary quantization.
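The median-based binary quantization of Section IV and the XOR matching it enables can be sketched as follows. This is a minimal illustration assuming descriptors are plain lists of numbers, thresholds are learned as per-component medians over a set of descriptors, a value equal to its median maps to 0 (a tie rule chosen here, not taken from [4]), and the function names are hypothetical. The toy 4-D "descriptors" stand in for real 128-D SIFT vectors.

```python
from statistics import median

def fit_medians(descriptors):
    """Per-component median thresholds learned from a set of descriptors (Section IV, [4])."""
    return [median(col) for col in zip(*descriptors)]

def binarize(descriptor, thresholds):
    """Pack one bit per component into an int: 1 if the value exceeds its median (ties -> 0)."""
    code = 0
    for bit, (value, t) in enumerate(zip(descriptor, thresholds)):
        if value > t:
            code |= 1 << bit
    return code

def hamming(a, b):
    """Distance between two binary codes: XOR, then count the non-zero bits."""
    return bin(a ^ b).count("1")

def best_match(query, codes):
    """(distance, index) of the stored code nearest to the query in Hamming distance."""
    return min((hamming(query, c), i) for i, c in enumerate(codes))

# Toy 4-D descriptors for illustration only; real SIFT descriptors are 128-D with values 0-255.
descs = [[0, 10, 5, 7], [2, 0, 9, 1], [4, 20, 1, 3]]
t = fit_medians(descs)                     # [2, 10, 5, 3]
codes = [binarize(d, t) for d in descs]    # [8, 4, 3]
print(best_match(binarize([0, 11, 4, 8], t), codes))  # (1, 0)
```

For a 128-D descriptor the binary code fits in 128 bits (16 bytes) instead of 128 bytes for the 8-bit SIFT vector, and comparing two codes is one XOR plus a bit count; on Python 3.10 or later, (a ^ b).bit_count() gives the same count as the string-based version above.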

Fig. 4(a) Graph between SIFT and Binary-SIFT showing the recognition rate under illumination change.

Fig. 4(b) Graph between SIFT and Binary-SIFT showing the recognition rate under scaling of the image.

Fig. 4 shows the matching performance of SIFT and Binary-SIFT. Fig. 4(a) shows the performance of both algorithms under illumination change. Here the illumination change ranges from -255 to +255: -255 indicates a perfectly black effect over the image and +255 a perfectly white effect.

Table I. Percentage keypoint match (SIFT vs Binary-SIFT) under illumination change. Offset is the illumination change on the 0-255 intensity scale (negative values darken the image, positive values brighten it).

Offset      SIFT (%)    Binary-SIFT (%)
-255        0           0
-235        0           0
-215        0           0
-195        2.38        0
-175        15.17       0
-155        26.22       14.52
-135        43.2        29.39
-115        54.55       39.39
-95         70.67       49.88
-75         77.7        59.89
-55         84.52       80
-35         96.13       92.67
-15         98.1        94.29
5           100         100
25          95.89       97.22
45          84.52       87.45
65          77.7        76.62
85          70.67       70.67
105         54.55       47.79
125         49.88       38.8
145         34.47       24.39
165         15.17       9.43
185         12.2        4.2
205         0           0
225         0           0
245         0           0

Table II. Percentage keypoint match (SIFT vs Binary-SIFT) under various scalings of the image. Scale is the size of the modified image as a percentage of the original.

Scale (%)   SIFT (%)    Binary-SIFT (%)
10          62.2        15.34
20          65.17       25.5
30          71.63       38.38
40          68.82       56.67
50          77.67       66.53
60          81.91       78.72
70          79.17       79.17
80          86.63       87.43
90          86.63       98
100         100         100
110         89.91       89.91
120         89.91       83
130         83          69.67
140         69.67       62.32
150         69.67       69.67
160         71.63       65.57
170         81.91       79.63
180         84.47       73.32
190         73.32       72
200         81.91       76.62

For small variations in the image, SIFT and Binary-SIFT perform approximately equally, but as the variation increases, the number of matching points shows a wide difference between the matching performance of SIFT and Binary-SIFT.
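One way to quantify this claim is to average the SIFT minus Binary-SIFT gap over mild and strong illumination changes in Table I. The short script below does that with the Table I values transcribed from this section; the band boundaries (|offset| up to 45 for mild, 95 to 175 for strong, which skips the extremes where both methods match nothing) are chosen arbitrarily here for illustration and are not part of the paper's evaluation.

```python
# Table I values as transcribed from this section: offsets -255..245 in steps of 20.
offsets = list(range(-255, 246, 20))
sift = [0, 0, 0, 2.38, 15.17, 26.22, 43.2, 54.55, 70.67, 77.7, 84.52, 96.13, 98.1,
        100, 95.89, 84.52, 77.7, 70.67, 54.55, 49.88, 34.47, 15.17, 12.2, 0, 0, 0]
binary = [0, 0, 0, 0, 0, 14.52, 29.39, 39.39, 49.88, 59.89, 80, 92.67, 94.29,
          100, 97.22, 87.45, 76.62, 70.67, 47.79, 38.8, 24.39, 9.43, 4.2, 0, 0, 0]

def mean_gap(lo, hi):
    """Mean of (SIFT - Binary-SIFT) percentage match over rows with lo <= |offset| <= hi."""
    gaps = [s - b for x, s, b in zip(offsets, sift, binary) if lo <= abs(x) <= hi]
    return sum(gaps) / len(gaps)

small = mean_gap(0, 45)    # mild changes: offsets -35 ... 45
large = mean_gap(95, 175)  # strong changes, excluding the all-zero extremes
print(f"mild: {small:.2f} points, strong: {large:.2f} points")
# mild: 0.60 points, strong: 12.25 points
```

On these numbers the average gap grows from under one percentage point to about twelve, which is consistent with the statement above.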

While matching the keypoints of the original face image with those of the modified one, the number of keypoints matched shows the difference between the algorithms. We use the percentage keypoint match as the recognition rate. It is computed with the maximum number of keypoints found in the original image as the base: the percentage keypoint match equals the number of keypoints matched between the original and modified images, multiplied by 100 and divided by the maximum number of keypoints found in the original image. This base is used because the maximum number of keypoints found in the original image equals the maximum number of keypoints that can be matched between two identical (original) images.

Fig. 4(b) shows the recognition rate after varying the scale of the image. In the graph, 10% scaling means that the image is 90% smaller than the original, 100% means that the image has the original size (no scaling), and 200% means that the image is scaled to twice the original size. The experimental results show that Binary-SIFT is not robust when the image is scaled far below its original size. Fig. 4(c) shows the recognition rate after angular rotation of the image. In the graph, -90 means the image is rotated 90 degrees anticlockwise and 90 means it is rotated 90 degrees clockwise. This result shows that Binary-SIFT is fairly robust against angular rotation of the image, but SIFT still gives a better recognition rate than Binary-SIFT.

Fig. 4(c) Graph between SIFT and Binary-SIFT showing the recognition rate under positive and negative rotation of the face.

Table III. Percentage keypoint match (SIFT vs Binary-SIFT) under angular rotation of the image. Angle is in degrees; negative values are anticlockwise rotations and positive values clockwise rotations.

Angle       SIFT (%)    Binary-SIFT (%)
-90         87.32       80.47
-80         40.67       33.33
-70         41.32       35.45
-60         46.22       47.34
-50         41.32       43.33
-40         42.83       43.33
-30         45.55       44.34
-20         48.37       42.83
-10         45.22       41.32
0           100         100
10          44.34       48.67
20          51.67       44.34
30          49.49       42.83
40          42.83       45.55
50          44.34       44.34
60          42.83       41.32
70          46.22       44.34
80          42.82       38.89
90          92.67       83.32

Fig. 5(a) Illumination change over the face images

Fig. 5(b) Scaling of the face images

Fig. 5(c) Angular clockwise rotations of face images

Fig. 5(d) Angular anticlockwise rotations of face images

Having presented our experimental results, we now show some of the images from our database that we used to compute them, together with modified images obtained by applying variations to the originals. Fig. 5(a)-(d) above shows examples of the images used in our experiments: Fig. 5(a) shows illumination change over the face images, Fig. 5(b) shows scaling of the face images, and Fig. 5(c) and 5(d) show angular rotation of the face images.
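The percentage keypoint match used as the recognition rate in Tables I to III can be written as a small function. This is a minimal sketch of the definition given in this section; the function names are hypothetical, and the keypoint counts in the example are made up for illustration (they are not data from our experiments). How individual keypoint matches are counted is outside the scope of the sketch.

```python
def percentage_keypoint_match(matched, keypoints_in_original):
    """Recognition rate: 100 * matched keypoints / keypoints found in the original image."""
    if keypoints_in_original <= 0:
        raise ValueError("the original image must contain at least one keypoint")
    if not 0 <= matched <= keypoints_in_original:
        raise ValueError("matched count must lie between 0 and the original keypoint count")
    return 100.0 * matched / keypoints_in_original

def sweep(keypoints_in_original, matched_by_variation):
    """Turn {variation: matched count} into {variation: percentage}, like one row of Tables I-III."""
    return {v: round(percentage_keypoint_match(m, keypoints_in_original), 2)
            for v, m in matched_by_variation.items()}

# Hypothetical counts for illustration only (not data from this paper).
print(sweep(200, {0: 200, 10: 97, 20: 88}))  # {0: 100.0, 10: 48.5, 20: 44.0}
```

Because matching an unmodified image with itself matches every keypoint, the zero-variation entry is always 100, which is why every table above has 100 at its no-change row.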

VI. CONCLUSION

Binary-SIFT, which derives binary feature vectors by thresholding SIFT feature vectors at their medians, is quite efficient in terms of time and space utilization. In this paper we have shown that Binary-SIFT is more efficient than the original SIFT, but not better in terms of results. Robustness against high variation in the image is one issue with Binary-SIFT, but despite this problem the algorithm has advantages. One is that it uses the XOR operation to match images; this binary operation needs very little cache memory, RAM and processing power, so thousands of distance computations can be done easily and efficiently, and even a very large database becomes quite small with this method. The algorithm is therefore suitable under conditions such as limited memory, limited computation power, a need to speed up matching, and a database with little variation. Binary-SIFT can be used on compact hardware, but in that case we must accept a compromise in the results it gives. SIFT is time- and space-consuming but does not compromise the quality of its results. Our experiments show that both algorithms give nearly the same results when the image varies little from the original, but in real-world applications images quite often vary under different conditions. Binary-SIFT is therefore not reliable for real-world applications in which the images taken by the camera vary frequently. Our experiments show that when the face image varies under different conditions (rotation, scaling and illumination), the percentage of keypoints matched is higher with SIFT than with Binary-SIFT.
With no variation (rotation, scaling or illumination) applied to the target image, the percentage of keypoints matched is 100%. If we increase the illumination on the face image by 105 (on a scale of 0 to 255, where 0 means no illumination and 255 a perfectly white effect), the percentage keypoint match is 47.79% with Binary-SIFT and 54.55% with SIFT (Table I). If we scale the image to 120%, the percentage keypoint match is 83% with Binary-SIFT and 89.91% with SIFT (Table II). If we rotate the image by 20 degrees anticlockwise, the percentage keypoint match is 42.83% with Binary-SIFT and 48.37% with SIFT (Table III). Our experimental results therefore show that SIFT performs better than Binary-SIFT when the face image varies, and we conclude that SIFT is more robust than Binary-SIFT.

REFERENCES
[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2): 91-110, 2004. Available: www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
[2] Herbert Bay, Tinne Tuytelaars and Luc Van Gool, "SURF: Speeded up robust features," Computer Vision and Image Understanding 110 (2008). Available: www.vision.ee.ethz.ch/~surf/eccv06.pdf
[3] Feng-Cheng Huang, Shi-Yu Huang, Ji-Wei Ker and Yung-Chang Chen, "High performance SIFT hardware accelerator for real time image feature extraction," accepted in IEEE Journal.
[4] Kadir A. Peker, "Binary SIFT: Fast image retrieval using binary quantized features," IEEE, 2011.
[5] Michael Calonder, Vincent Lepetit, Mustafa Özuysal, Tomasz Trzcinski, Christoph Strecha and Pascal Fua, "BRIEF: Computing a local binary descriptor very fast," IEEE, 2011. Available: cvlab.epfl.ch/~lepetit/papers/calonder_pami11.pdf
[6] E. Jauregi, E. Lazkano and B. Sierra, "Object recognition using region detection and feature extraction." Available: www.sc.ehu.es/ccwrobot/publications/papers/jauregi09object.pdf
[7] F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, Florida, December 1994.
[8] Vidit Jain and Amitabha Mukherjee, "The Indian Face Database," 2002. Available: http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/
[9] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing.
[10] Mohamed Aly, "Face Recognition Using SIFT Features," CNS186 Term Project, Winter 2006.
[11] Bo Dai, Dengsheng Zhang, Hui Liu, Shixin Sun and Ke Li, "Evaluation of Face Recognition Techniques," Proc. of SPIE vol. 7489, 74890M-1.
Rahul Prakash received his B.Tech. degree from Uttar Pradesh Technical University, Lucknow, India (2010) and is pursuing an M.E. at Punjab Engineering College University of Technology, Chandigarh, India. His M.E. thesis is on face recognition using SIFT, under the guidance of Asst. Prof. Padmavati (PEC University of Technology, Chandigarh).

Padmavati received her M.E. degree from Punjab Engineering College University of Technology, Chandigarh, India and is pursuing a Ph.D. at PEC University of Technology, Chandigarh, India. She has 7 years of teaching experience and has published 10 research papers in international journals and international conferences.