
TOPIC: SIFT (SCALE INVARIANT FEATURE TRANSFORM)

METHOD FOR KEYPOINT DETECTION –

The scale-invariant feature transform is a computer vision algorithm used to
detect distinctive local features in images and to match those features between images.

Background on SIFT:

In SIFT, the following major stages of computation are used to generate the set
of image features:

1. Scale-space extrema detection:

The first stage of the computation searches over all scales and image locations.
It uses a difference-of-Gaussian function to identify candidate interest points
that are invariant to scale and orientation (Arya and Mount, 1993).

2. Keypoint localization:

A detailed model is fitted at each candidate location to determine position and
scale. The keypoints are then selected based on measures of their stability.

3. Orientation assignment:

Once localization is complete, each keypoint is assigned one or more
orientations based on the local image gradient directions. All subsequent
operations are performed on image data that has been transformed relative to
the assigned orientation, scale, and location.

4. Keypoint descriptor: The local image gradients in the region around each
keypoint are measured at the selected scale. These are converted into a
representation that tolerates significant local shape deformation and changes
in illumination.

This method is known as the Scale Invariant Feature Transform (SIFT) because it
transforms image data into scale-invariant coordinates relative to local
features. A key property of this approach is that it generates a large number
of features that densely cover the image at all scales and locations
(Basri and Jacobs, 1997).

A typical image with a resolution of 500x500 pixels will yield around 2000
stable features with SIFT (although this number depends on both image content
and parameter settings).

The number of features is especially significant in object recognition, where
the capacity to recognize small objects in cluttered backgrounds requires that
at least three features from each object be correctly matched for reliable
identification. For image matching and recognition, SIFT features are first
extracted from a set of reference images and stored in a database.
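
As a minimal sketch of this extraction step, the snippet below uses OpenCV's
built-in SIFT implementation (assuming opencv-python 4.4 or later, where SIFT
lives in the main cv2 module); the file names are placeholders.

```python
# Extract SIFT features from reference images and keep them in an
# in-memory "database" (a list of descriptor arrays), one entry per image.
import cv2

reference_files = ["ref1.png", "ref2.png"]  # placeholder paths
sift = cv2.SIFT_create()

database = []
for path in reference_files:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    database.append((keypoints, descriptors))  # descriptors: N x 128 float32
```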

A new image is matched by individually comparing each of its features against
this database and finding candidate matching features based on the Euclidean
distance between their feature vectors. Fast nearest-neighbor algorithms that
can perform this computation quickly on large databases will be discussed in
this work (Baumberg, 2000).
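
The following sketch shows the core of that comparison: a brute-force
nearest-neighbor search over descriptor vectors in plain NumPy. It is
illustrative only; practical systems use the fast approximate methods
discussed later.

```python
import numpy as np

def nearest_neighbors(query_desc, db_desc):
    """For each 128-D query descriptor, return the index of its nearest
    database descriptor and the Euclidean distance to it."""
    # Pairwise distance matrix of shape (num_query, num_db).
    diffs = query_desc[:, None, :] - db_desc[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    idx = dists.argmin(axis=1)
    return idx, dists[np.arange(len(idx)), idx]
```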

Uses of SIFT:

SIFT aids in the identification of an image's local features, often known as
'keypoints'. These scale- and rotation-invariant keypoints benefit image
matching, object detection, scene recognition, and many other computer vision
applications.

SIFT features are also useful for estimating the fundamental matrix in stereo
vision, as well as for tracking and motion segmentation, as sketched below.
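
As an illustration of the stereo use case, this sketch matches SIFT features
between a stereo pair and feeds the matched points to OpenCV's RANSAC-based
fundamental matrix estimator; the image paths and the 0.75 ratio threshold are
placeholder assumptions.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder path
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
kp1, des1 = sift.detectAndCompute(left, None)
kp2, des2 = sift.detectAndCompute(right, None)

# Keep matches whose best neighbor clearly beats the second best.
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
```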

SIFT can reliably recognize objects even among clutter and under partial
occlusion, because its features are invariant to uniform scaling, orientation,
and illumination changes, and partially invariant to affine distortion. A
further benefit is that it produces descriptions that characterize a given
keypoint and are unaffected by external changes (Brown and Lowe, 2002).
SIFT features are simple to match against a (large) database of local
features, but the high dimensionality can be a problem, so approximate
algorithms such as k-d trees with best-bin-first search are commonly used.
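
In practice this is often done through OpenCV's FLANN wrapper, whose k-d tree
index with a bounded number of leaf checks works in the spirit of
best-bin-first search. A minimal sketch, assuming des_new and des_db are
float32 SIFT descriptor arrays produced as shown earlier:

```python
import cv2

FLANN_INDEX_KDTREE = 1  # FLANN's k-d tree index type
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # more checks: more accurate, slower
flann = cv2.FlannBasedMatcher(index_params, search_params)

# des_new, des_db: float32 descriptor arrays from detectAndCompute.
matches = flann.knnMatch(des_new, des_db, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```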

Mathematics behind SIFT –


We begin with an input image, then use bilinear interpolation to double its
width and height. The image is then blurred with a Gaussian convolution.

Next comes a series of additional convolutions with increasing standard
deviation. Finally, the antepenultimate image of each row is downsampled, and
a new row of convolutions begins. This process is repeated until the images
are too small to continue. (Each row is commonly referred to as an octave,
because the sampling rate is reduced by a factor of two per stage.)
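
A minimal sketch of this octave construction, using OpenCV for the blurring
and resampling; the number of octaves, the images per octave, and the base
sigma of 1.6 are illustrative parameter choices, not fixed by the method.

```python
import cv2

def build_scale_space(img, n_octaves=4, imgs_per_octave=5, sigma0=1.6):
    """Return a list of octaves; each octave is a list of progressively
    blurred images at the same resolution."""
    k = 2 ** (1.0 / (imgs_per_octave - 3))  # scale step between blurs
    base = cv2.resize(img, None, fx=2, fy=2,
                      interpolation=cv2.INTER_LINEAR)  # initial doubling
    octaves = []
    for _ in range(n_octaves):
        row = [cv2.GaussianBlur(base, (0, 0), sigma0 * k ** i)
               for i in range(imgs_per_octave)]
        octaves.append(row)
        h, w = row[-3].shape[:2]
        if min(h, w) < 8:  # stop once the images are too small to continue
            break
        # Downsample the antepenultimate image of the row by a factor of
        # two to seed the next octave.
        base = cv2.resize(row[-3], (w // 2, h // 2),
                          interpolation=cv2.INTER_NEAREST)
    return octaves
```
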
What we have now constructed is a scale space. The goal is to simulate varied
observation scales while suppressing fine-scale features (Hough, 1962).

This representation is then rescaled so that grey values span the range 0.00
to 1.00; the adjustment is especially noticeable in photographs with low
contrast. (With a full-contrast input, black is already 0.00 and white 1.00.)
Let's pretend for a moment that each octave of our scale space were a continuous
space with three dimensions: the pixel's x and y coordinates, as well as the
convolution's standard deviation.(Koenderink, J.J. 1984.)
We’d now compute the Laplacian of the scale-space function as a next process
which assigns grey values to each member of this space in an ideal world. As a
result, the Laplacian’s extrema could be candidates for the critical points that our
algorithm searches. Because we’ll be working in a discrete approximation of this
continuous space, we’ll instead employ a technique known as difference of
Gaussians.
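
A sketch of the difference-of-Gaussians step and the discrete extremum test
over the 3x3x3 neighbourhood in (x, y, scale); the real algorithm additionally
filters candidates by contrast and edge response, which is omitted here.

```python
import numpy as np

def difference_of_gaussians(octave):
    """octave: list of progressively blurred images of equal size.
    Returns len(octave) - 1 difference images."""
    return [octave[i + 1].astype(np.float32) - octave[i].astype(np.float32)
            for i in range(len(octave) - 1)]

def is_extremum(dog, s, y, x):
    """True if dog[s][y, x] is the maximum or minimum of its 26
    neighbours in space and scale (requires 1 <= s <= len(dog) - 2)."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[s - 1:s + 2]])
    centre = dog[s][y, x]
    return centre == cube.max() or centre == cube.min()
```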

Conclusion –
This report on the uses of SIFT and the mathematics behind SIFT
features shows that the SIFT approach is insensitive to image rotation
and scale, as well as to noise, changes in 3D viewpoint, and
illumination. Given a suspect photo, it can accurately match keypoints
even when a specific region has been changed in intensity, and it can
determine the geometric transformations used to achieve such tampering,
such as scale and rotation.

REFERENCES -
Arya, S., and Mount, D.M. 1993. Approximate nearest neighbor queries in fixed
dimensions. In Fourth Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA’93), pp. 271-280.

Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., and Wu, A.Y. 1998. An optimal
algorithm for approximate nearest neighbor searching. Journal of the ACM, 45:891-923.

Ballard, D.H. 1981. Generalizing the Hough transform to detect arbitrary patterns. Pattern
Recognition, 13(2):111-122.

Basri, R., and Jacobs, D.W. 1997. Recognition using region correspondences.
International Journal of Computer Vision, 25(2):145-166.

Baumberg, A. 2000. Reliable feature matching across widely separated views. In
Conference on Computer Vision and Pattern Recognition, Hilton Head, South Carolina,
pp. 774-781.

Beis, J. and Lowe, D.G. 1997. Shape indexing using approximate nearest-neighbour
search in high-dimensional spaces. In Conference on Computer Vision and Pattern
Recognition, Puerto Rico, pp. 1000-1006.

Brown, M. and Lowe, D.G. 2002. Invariant features from interest point groups. In
British Machine Vision Conference, Cardiff, Wales, pp. 656-665.

Carneiro, G., and Jepson, A.D. 2002. Phase-based local features. In European
Conference on Computer Vision (ECCV), Copenhagen, Denmark, pp. 282-296.

Crowley, J.L. and Parker, A.C. 1984. A representation for shape based on peaks and
ridges in the difference of low-pass transform. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 6(2):156-170.

Edelman, S., Intrator, N. and Poggio, T. 1997. Complex cells and object recognition.
Unpublished manuscript: http://kybele.psych.cornell.edu/~edelman/archive.html

Hough, P.V.C. 1962. Method and means for recognizing complex patterns. U.S. Patent
3069654.

Koenderink, J.J. 1984. The structure of images. Biological Cybernetics, 50:363-396.
