The Harris matrix at level l and position (x, y) is the smoothed outer product of the gradients

Hl(x, y) = ∇σd Pl(x, y) ∇σd Pl(x, y)ᵀ ∗ gσi(x, y)

We set the integration scale σi = 1.5 and the derivative scale σd = 1.0. To find interest points, we first compute the “corner strength” function

fHM(x, y) = det Hl(x, y) / tr Hl(x, y) = λ1λ2 / (λ1 + λ2)

which is the harmonic mean of the eigenvalues (λ1, λ2) of H. Interest points are located where the corner strength fHM(x, y) is a local maximum in a 3 × 3 neighbourhood and is above a threshold t = 10.0. Once local maxima have been detected, their positions are refined to sub-pixel accuracy by fitting a 2D quadratic to the corner strength function in the local 3 × 3 neighbourhood and finding its maximum.

For each interest point, we also compute an orientation θ, where the orientation vector [cos θ, sin θ] = u/|u| comes from the smoothed local gradient

ul(x, y) = ∇σo Pl(x, y)

The integration scale for orientation is σo = 4.5. A large derivative scale is desirable so that the gradient field ul(x, y) varies smoothly across the image, making orientation estimation robust to errors in interest point location.

3 Adaptive Non-Maximal Suppression

Since the computational cost of matching is superlinear in the number of interest points, it is desirable to restrict the maximum number of interest points extracted from each image. At the same time, it is important that interest points are spatially well distributed over the image, since for image stitching applications the area of overlap between a pair of images may be small. To satisfy these requirements, we have developed a novel adaptive non-maximal suppression (ANMS) strategy to select a fixed number of interest points from each image.

Interest points are suppressed based on the corner strength fHM, and only those that are a maximum in a neighbourhood of radius r pixels are retained. Conceptually, we initialise the suppression radius r = 0 and then increase it until the desired number of interest points nip is obtained. In practice, we can perform this operation without search, as the set of interest points generated in this way forms an ordered list. The first entry in the list is the global maximum, which is not suppressed at any radius. As the suppression radius decreases from infinity, interest points are added to the list. However, once an interest point appears, it will always remain in the list: if an interest point is a maximum within radius r, then it is also a maximum within any radius r′ < r. In practice we robustify the non-maximal suppression by requiring that a neighbour has a sufficiently larger strength. Thus the minimum suppression radius ri is given by

ri = min_j |xi − xj|,  s.t.  f(xi) < crobust f(xj),  xj ∈ I

where xi is a 2D interest point image location and I is the set of all interest point locations. We use a value crobust = 0.9, which ensures that a neighbour must have significantly higher strength for suppression to take place.
[Plot: repeatability (%) of interest points, comparing position only, position + orientation, and matched features.]
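As a concrete illustration, the corner strength fHM and the sub-pixel refinement described above can be sketched as follows (a minimal sketch, not the paper's implementation; we assume SciPy's `gaussian_filter` for the Gaussian derivatives and smoothing, and the function names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def corner_strength(P, sigma_d=1.0, sigma_i=1.5, eps=1e-10):
    """Harmonic-mean corner strength fHM = det(H) / tr(H) for one
    pyramid level P, from the smoothed outer product of gradients."""
    Px = gaussian_filter(P, sigma_d, order=(0, 1))  # d/dx at scale sigma_d
    Py = gaussian_filter(P, sigma_d, order=(1, 0))  # d/dy at scale sigma_d
    # Outer product of gradients, smoothed at the integration scale.
    Hxx = gaussian_filter(Px * Px, sigma_i)
    Hxy = gaussian_filter(Px * Py, sigma_i)
    Hyy = gaussian_filter(Py * Py, sigma_i)
    return (Hxx * Hyy - Hxy**2) / (Hxx + Hyy + eps)

def refine_subpixel(f, x, y):
    """Refine an integer local maximum (x, y) of f to sub-pixel accuracy
    by fitting a 2D quadratic to the 3 x 3 neighbourhood."""
    dx = (f[y, x + 1] - f[y, x - 1]) / 2.0
    dy = (f[y + 1, x] - f[y - 1, x]) / 2.0
    dxx = f[y, x + 1] - 2.0 * f[y, x] + f[y, x - 1]
    dyy = f[y + 1, x] - 2.0 * f[y, x] + f[y - 1, x]
    dxy = (f[y + 1, x + 1] - f[y + 1, x - 1]
           - f[y - 1, x + 1] + f[y - 1, x - 1]) / 4.0
    # Stationary point of the fitted quadratic: offset = -H^-1 * grad.
    offset = -np.linalg.solve(np.array([[dxx, dxy], [dxy, dyy]]),
                              np.array([dx, dy]))
    return x + offset[0], y + offset[1]
```

For an exactly quadratic corner strength surface, `refine_subpixel` recovers the true maximum; in practice it shifts the detected maximum by a fraction of a pixel.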
Figure 2. Adaptive non-maximal suppression (ANMS). The two upper images show interest points with the highest
corner strength, while the lower two images show interest points selected with adaptive non-maximal suppression
(along with the corresponding suppression radius r). Note how the latter features have a much more uniform spatial
distribution across the image.
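The ANMS selection described in section 3 can be sketched as follows (a minimal O(n²) sketch; the paper computes the same radii without search via an ordered list, and the function name is ours):

```python
import numpy as np

def anms(points, strengths, n_ip=500, c_robust=0.9):
    """Adaptive non-maximal suppression (sketch).

    points: (N, 2) array of interest point locations.
    strengths: (N,) corner strengths fHM.
    Returns indices of the n_ip points with the largest minimum
    suppression radius r_i (the global maximum has r = infinity).
    """
    points = np.asarray(points, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    n = len(points)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Neighbours with sufficiently larger strength to suppress point i.
        stronger = strengths[i] < c_robust * strengths
        if np.any(stronger):
            d = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = d.min()
    # Keep the n_ip points with the largest suppression radii.
    return np.argsort(-radii)[:min(n_ip, n)]
```

Selecting by suppression radius rather than raw strength is what produces the uniform spatial distribution shown in Figure 2.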
[Plots for Figure 6: (a) probability density vs. 1-NN squared error and (b) probability density vs. 1-NN/2-NN squared error, each for correct and incorrect matches; (c) ROC curves (true positive vs. false positive) for 1NN, 1NN/2NN, and 1NN/(average 2NN).]
Figure 6. Distributions of matching error for correct and incorrect matches. Note that the distance of the closest match
(the 1-NN) is a poor metric for distinguishing whether a match is correct or not (figure (a)), but the ratio of the closest
to the second closest (1-NN/2-NN) is a good metric (figure (b)). We have found that using an average of 2-NN distances
from multiple images (1NN/(average 2-NN)) is an even better metric (figure (c)). These results were computed from
18567 features in 20 images of the Abbey dataset, and have been verified for several other datasets.
[Plot for Figure 5: ROC curves (true positive vs. false positive) for descriptor sample spacings s = 1, 3, 5, 8.]

Figure 5. Effect of changing the descriptor sample spacing on performance. These ROC curves show the results of thresholding feature matches based on normalised match distance as in section 5.1. Performance improves as the sample spacing increases (larger patches), but gains are minimal above a sample spacing of 5 pixels.

5 Feature Matching

Given Multi-scale Oriented Patches extracted from all n images, the goal of the matching stage is to find geometrically consistent feature matches between all images. This proceeds as follows. First, we find a set of candidate feature matches using an approximate nearest neighbour algorithm (section 6). Then we refine matches using an outlier rejection procedure based on the noise statistics of correct/incorrect matches. Finally, we use RANSAC to apply geometric constraints and reject remaining outliers.

5.1 Feature-Space Outlier Rejection

Our basic noise model assumes that a patch in one image, when correctly oriented and located, corresponds to a patch in the other image modulo additive Gaussian noise:

I′(x′) = αI(x) + β + n(x)

x′ = Ax + t

A = s [ cos θ   sin θ ]
      [ −sin θ  cos θ ]

n(x) ∼ N(0, σn² I)

where I(x) and I′(x′) are the corresponding patches and n(x) is independent Gaussian noise at each pixel. However, we have found this model to be inadequate for classification, since the noise distributions for correctly and incorrectly matching patches overlap significantly (see figure 6(a)). Hence, it is not possible to set a global threshold on the matching error e = Σx n(x)² in order to distinguish between correct and incorrect matches.

This behaviour has also been observed by Lowe [6], who suggested thresholding instead on the ratio e1-NN / e2-NN. Here e1-NN denotes the error for the best match (first nearest neighbour) and e2-NN denotes the error for the second best match (second nearest neighbour). As in Lowe’s work, we have also found that the distributions of e1-NN / e2-NN for correct and incorrect matches are better separated than the distributions of e1-NN alone (figure 6(b)).

The intuition for why this works is as follows. For a given feature, correct matches always have substantially lower error than incorrect matches. However, the overall scale of errors varies greatly, depending upon the appearance of that feature (its location in feature space). For this reason it is better to use a discriminative classifier that compares correct and incorrect matches for a particular feature than to use a uniform Gaussian noise model in feature space.

Lowe’s technique works by assuming that the 1-NN in some image is a potential correct match, whilst the 2-NN in that same image is an incorrect match. In fact, we have observed that the distance in feature space of the 2-NN and subsequent matches is almost constant¹. We call this the outlier distance, as it gives an estimate of the matching distance (error) for an incorrect match (figure 7).

We have found that in the n-image matching context we can improve outlier rejection by using information from all of the images (rather than just the two being matched). Using the same argument as Lowe, the 2-NN from each image will almost certainly be an incorrect match. Hence we average the 2-NN distances from all n images to give an improved estimate for the outlier distance. This separates the distributions for correct and incorrect matches still further, resulting in improved outlier rejection (figure 6(c)). Note that the extra overhead associated with computing 2-nearest neighbours in every image is small, since if we want to consider every possible image match, we must compute all of the 1-nearest neighbours anyway.

In general the feature-space outlier rejection test is very powerful. For example, we can eliminate 80% of the false matches at a loss of less than 10% of the correct matches. This allows for a significant reduction in the number of RANSAC iterations required in subsequent steps.

¹ This is known as the shell property: the distances of a set of uniformly distributed points from a query point in high dimensions are almost equal.

6 Fast Approximate Nearest Neighbours using Wavelet Indexing

To efficiently find candidate feature matches, we use a fast nearest-neighbour algorithm based on wavelet indexing.
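The feature-space outlier rejection of section 5.1 can be sketched as follows (a minimal sketch; the acceptance factor `ratio` and the function name are illustrative choices of ours, not values from the paper):

```python
import numpy as np

def feature_space_outlier_rejection(query, feats_per_image, ratio=0.65):
    """Accept or reject the best match for one query descriptor.

    query: (d,) descriptor from one image.
    feats_per_image: list of (m_i, d) descriptor arrays, one per other image.
    The 2-NN distance in each image estimates the outlier distance; the
    best 1-NN match is accepted only if it is well below the average
    of those estimates over all images.
    """
    best = None            # (distance, image index, feature index)
    second_dists = []      # per-image 2-NN (outlier distance) estimates
    for img_idx, feats in enumerate(feats_per_image):
        d = np.linalg.norm(feats - query, axis=1)
        order = np.argsort(d)
        if best is None or d[order[0]] < best[0]:
            best = (d[order[0]], img_idx, int(order[0]))
        second_dists.append(d[order[1]] if len(d) > 1 else d[order[0]])
    outlier_dist = float(np.mean(second_dists))
    accepted = best[0] < ratio * outlier_dist
    return accepted, best, outlier_dist
```

Because the 2-NN distances are nearly constant across images (the shell property), averaging them over all n images gives a stabler outlier-distance estimate than the single-image 1-NN/2-NN ratio.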
[Plots: distance from query vs. feature number (1–10), showing the outlier distance and rejection threshold; and ROC curves (true positive vs. false positive) comparing direct, translation, similarity, and affine motion models.]
Figure 9. The stitched images used for the matching results found in this paper. We have successfully
tested our multi-image matching scheme on a database containing hundreds of panoramic images. See
http://www.research.microsoft.com/∼szeliski/StitchingEvaluation for more examples.