Scale invariant detectors
• In most object recognition applications, when the scale of the object in the image is unknown, instead of extracting features at many different scales and then matching all of them, it is more efficient to design a function on the region which is the same for corresponding regions, even if they are at different scales.
• The problem can also be stated as follows: given two images of the same scene with a large scale difference between them, find the same interest points independently in each image.
• For scale invariant feature extraction it is necessary to detect structures that can be reliably extracted under scale changes.
• This is done by evaluating a signature function (a kernel) in the point neighbourhood and plotting the result as a function of the neighbourhood scale. Since it measures properties of the local neighbourhood at a certain scale, it should take a similar qualitative shape if two keypoints are centered on corresponding image structures.
• The function shape should be squashed or expanded as a result of the scaling factor. Corresponding neighbourhood sizes should be detected by searching for extrema of the signature function in both images. For a given point, the signature function can be considered as a function of region size (circle radius). A common approach is to take a local maximum of this function. The solution is to search for maxima of suitable functions in scale and in space over the images.
[Figure: signature function f plotted against region size for Image 1 and for Image 2 (scale = 1/2)]
The region size (scale) for which the maximum is achieved should be invariant to image scale.
[Figure: three example curves of f against region size, two labelled bad and one labelled good]
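As a hedged illustration of the signature-function idea, the sketch below uses the magnitude of the scale-normalized Laplacian at the centre of an ideal Gaussian blob of standard deviation t, which has the closed form σ² / (π (σ² + t²)²). The blob model, the function names and the grid of candidate region sizes are assumptions made for this example; they do not come from the slides.

```python
import math

def normalized_log_response(sigma, blob_sigma):
    """Magnitude of the scale-normalized Laplacian at the centre of a
    unit-mass Gaussian blob of standard deviation blob_sigma."""
    total_var = sigma ** 2 + blob_sigma ** 2
    return sigma ** 2 / (math.pi * total_var ** 2)

def characteristic_scale(blob_sigma, scales):
    """Return the candidate scale with the largest response."""
    return max(scales, key=lambda s: normalized_log_response(s, blob_sigma))

# Candidate region sizes 0.5, 0.6, ..., 20.0 (hypothetical grid).
scales = [0.1 * i for i in range(5, 201)]

scale_img1 = characteristic_scale(4.0, scales)  # blob as seen in image 1
scale_img2 = characteristic_scale(2.0, scales)  # same blob at scale 1/2
print(scale_img1, scale_img2, scale_img1 / scale_img2)
```

Under this model the maximum is found independently in each image, and the ratio of the two selected region sizes follows the scale factor between the images.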
• For usual images a good function would be one that responds to contrast (sharp local intensity change). It is easier to look for zero-crossings of the 2nd derivative than for maxima.
• There are a few approaches which are truly invariant to significant scale changes. Typically, such techniques assume that the scale change is the same in every direction, although they exhibit some robustness to weak affine deformations.
• The appropriate kernel for this is the scale-normalized Gaussian kernel G(x, σ) and its derivatives.
• The classical approach is to generate a Gaussian scale-space representation of an image, i.e. a set of images obtained by convolving the image with isotropic (circular) Gaussian kernels of various sizes. A larger scale results in a smoother image.
• Existing methods search for local extrema in the 3D Gaussian scale-space representation of an image (x, y and scale). Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures.
• The motivation for generating a scale-space representation of a given image originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects may appear in different ways depending on the scale of observation.
• The Gaussian scale-space guarantees that no new structures are created when going from a fine scale to any coarser scale. Its properties include linearity, shift invariance, non-enhancement of local extrema, scale invariance and rotational invariance.
Functions for determining scale
$$f = \text{Kernel} * \text{Image}$$
Kernels:
Laplacian of Gaussians:
$$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$$
Difference of Gaussians (an approximation of the Laplacian):
$$DoG = G(x, y, k\sigma) - G(x, y, \sigma)$$
where the Gaussian is
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
Both kernels are invariant to scale and rotation.
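The kernels above can be evaluated pointwise with a few lines of code. The following is a minimal sketch, assuming the standard 2D Gaussian normalization 1/(2πσ²); the function names and the sample values of σ and k are hypothetical. It prints the DoG next to (k - 1) times the scale-normalized Laplacian, which is the sense in which the DoG approximates the Laplacian when k is close to 1.

```python
import math

def gaussian(x, y, sigma):
    """Isotropic 2D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def normalized_log(x, y, sigma):
    """Scale-normalized Laplacian of Gaussian: sigma^2 (Gxx + Gyy)."""
    r2 = x * x + y * y
    laplacian = (r2 / sigma ** 4 - 2 / sigma ** 2) * gaussian(x, y, sigma)
    return sigma ** 2 * laplacian

def dog(x, y, sigma, k):
    """Difference of Gaussians: G(x, y, k sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

sigma, k = 2.0, 1.1  # hypothetical values; k close to 1
for x, y in [(0.0, 0.0), (1.0, 0.5), (3.0, 0.0)]:
    print(dog(x, y, sigma, k), (k - 1) * normalized_log(x, y, sigma))
```

Both functions depend on x and y only through x² + y², which is what makes them rotation invariant.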
The method:
- build scale-space pyramids;
- all scales are examined to identify scale-invariant features:
- compute the Difference of Gaussian (DoG) pyramid or the Laplacian of Gaussians (LoG)
- detect maxima and minima in scale space
• Harris-Laplacian [1]. Find local maximum of:
– Harris corner detector in space (image coordinates)
– Laplacian in scale
• SIFT (Lowe) [2]. Find local maximum of:
– Difference of Gaussians in space and scale
[Figure: scale-space diagrams with axes x, y and scale; Harris in space and Laplacian in scale for Harris-Laplacian, DoG in both space and scale for SIFT]
[1] K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001.
[2] D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. Accepted to IJCV 2004.
Harris-Laplacian scale-invariant detector
• The Harris-Laplacian method first applies the Harris function at multiple scales, then selects the points for which the Laplacian attains a maximum over scales.
• Harris corner points are interest points that have good rotational and illumination invariance, but they are not scale invariant. To obtain scale invariance, the second-moment matrix is modified by taking a Gaussian scale-space representation with a Laplacian of Gaussian kernel.
• Since the computation of derivatives usually involves a stage of scale-space smoothing, an operational definition of the Harris operator requires two scale parameters:
– (i) a local derivation scale for smoothing before the computation of derivatives
– (ii) an integration scale for accumulating the operations on derivatives
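• The scale-adapted second-moment matrix is
$$M(\mathbf{x}, \sigma_I, \sigma_D) = \sigma_D^2 \, g(\sigma_I) * \begin{bmatrix} L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\ L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D) \end{bmatrix}$$
where $g(\sigma_I)$ is the Gaussian kernel of scale $\sigma_I$ (integration scale), L(x,y) is the Gaussian-smoothed image, and $L_x$ and $L_y$ are its derivatives in the x and y directions, calculated using a Gaussian kernel of scale $\sigma_D$ (differentiation scale). The multiplication by $\sigma_D^2$ is needed because derivatives must be normalized across scales according to $D_m(\mathbf{x}, \sigma) = \sigma^m L_m(\mathbf{x}, \sigma)$.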
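• The algorithm searches across multiple scales $\sigma_n = k^n \sigma_0$, i.e. $\sigma_0, k\sigma_0, k^2\sigma_0, k^3\sigma_0, \dots, k^n\sigma_0$ (with k = 1.4), setting $\sigma_I = \sigma_n$ and $\sigma_D = s\,\sigma_I$ (with s = 0.7).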
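The scale schedule just described can be written out directly. This is a small sketch under the stated values k = 1.4 and s = 0.7; the function name, the starting scale of 1.5 and the number of levels are assumptions chosen only for the example.

```python
def harris_laplace_scales(sigma0, n_levels, k=1.4, s=0.7):
    """Return (sigma_I, sigma_D) pairs for levels n = 0 .. n_levels - 1,
    with sigma_I = k**n * sigma0 and sigma_D = s * sigma_I."""
    levels = []
    for n in range(n_levels):
        sigma_i = (k ** n) * sigma0   # integration scale of level n
        sigma_d = s * sigma_i         # differentiation scale of level n
        levels.append((sigma_i, sigma_d))
    return levels

# Hypothetical starting scale and level count.
for n, (sigma_i, sigma_d) in enumerate(harris_laplace_scales(1.5, 5)):
    print(n, round(sigma_i, 3), round(sigma_d, 3))
```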
• At each scale, corners are found as in the Harris method, applied to the matrix M in an 8-point neighbourhood. An iterative algorithm localizes corner points spatially and chooses the characteristic scale:
– The Laplacian of Gaussians is used to judge whether each of the candidate points found on the different levels forms a maximum in the scale direction (checked against levels n-1 and n+1). The scale at which such a maximum is found is referred to as the characteristic scale. It is used in the following iterations. Points are spatially localized at the characteristic scale.
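The check against levels n-1 and n+1 can be sketched as follows. The helper names and the response values are made up for illustration, and comparing absolute values of the Laplacian response (so that dark and bright structures are treated alike) is an assumption of this sketch rather than something stated above.

```python
def scale_maxima(responses):
    """Indices n where |response| at level n exceeds both level n-1 and n+1."""
    mags = [abs(r) for r in responses]
    return [n for n in range(1, len(mags) - 1)
            if mags[n] > mags[n - 1] and mags[n] > mags[n + 1]]

def characteristic_scale(scales, responses):
    """Scale of the strongest interior maximum over scale, or None if
    the response has no maximum in the scale direction."""
    peaks = scale_maxima(responses)
    if not peaks:
        return None
    best = max(peaks, key=lambda n: abs(responses[n]))
    return scales[best]

scales = [1.0 * 1.4 ** n for n in range(6)]
responses = [0.10, 0.35, 0.62, 0.48, 0.20, 0.05]  # made-up LoG responses
print(characteristic_scale(scales, responses))
```

A candidate point whose response keeps growing or shrinking over all examined levels has no characteristic scale in this sketch and returns None.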
• The SIFT method was introduced by D. Lowe in 2004 to represent visual entities according to their local properties. The method employs local features extracted at salient points (referred to as keypoints or SIFT points). Keypoints (through their SIFT descriptors) are used to characterize shapes with invariant properties.
• Image points selected as keypoints, and their SIFT descriptors, are robust under:
- Luminance change (due to difference-based metrics)
- Scale change (due to scale-space)
- Rotation (due to local orientations with respect to the keypoint canonical orientation)
• SIFT descriptors are obtained in the following three steps:
1. Keypoint detection using local extrema of DoG filters
2. Computation of keypoint orientation
3. SIFT descriptor derivation
Build Gaussian pyramids
Keypoints are detected as local scale-space extrema of the Differences of Gaussians. They correspond to local min/max points in image I(x,y) that remain stable at different scales σ.
[Figure: Gaussian pyramid built by repeated blur and resample steps]
$$\sigma_0 = k^0 \sigma, \quad \sigma_1 = k^1 \sigma, \quad \sigma_2 = k^2 \sigma, \quad \sigma_3 = k^3 \sigma, \quad \sigma_4 = k^4 \sigma$$
• Octave: the original image is convolved with a set of Gaussians, so as to obtain a set of images that differ by a factor k in scale space. Each of these sets is usually called an octave.
[Figure: one octave, with levels $k^0$, $k$, $k^2$, $k^3$, $k^4$]
• Each octave is divided into a number s of intervals, such that $k = 2^{1/s}$.
• For each octave, s + 3 images must be calculated. For example, if s = 2 then $k = 2^{1/2}$ and we will have 5 images at different scales. In this case an octave corresponds to doubling the value of σ:
$$\sigma_0 = (2^{1/2})^0 \sigma = \sigma$$
$$\sigma_1 = (2^{1/2})^1 \sigma = k\sigma$$
$$\sigma_2 = (2^{1/2})^2 \sigma = 2\sigma$$
$$\sigma_3 = (2^{1/2})^3 \sigma = 2k\sigma$$
$$\sigma_4 = (2^{1/2})^4 \sigma = 4\sigma$$
$\sigma_2$ is doubled with respect to $\sigma_0$.
The choice of s = 2 is based on empirical verification of the keypoint stability.
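The blur levels of one octave can be listed with a short helper. This is a sketch only: the level count s + 3 and the factor k = 2^(1/s) come from the text, while the function name and the base blur of 1.6 are assumed values for the example.

```python
def octave_sigmas(sigma, s):
    """Blur levels sigma_i = k**i * sigma for i = 0 .. s + 2,
    with k = 2**(1/s), giving s + 3 images per octave."""
    k = 2.0 ** (1.0 / s)
    return [(k ** i) * sigma for i in range(s + 3)]

# Hypothetical base blur; s = 2 as in the example above.
for i, value in enumerate(octave_sigmas(1.6, 2)):
    print(i, round(value, 4))
```

With s = 2 the level with index 2 has twice the base blur, which is where the octave is completed.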
• Gaussian kernel size: the number of samples N increases as σ increases. The number of operations needed per pixel is ($N^2 - 1$) sums and $N^2$ products, so the cost grows as σ grows. A good compromise is to use a sample interval of [-3σ, 3σ].
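To make the cost concrete, the sketch below derives a kernel width from the [-3σ, 3σ] interval and counts the operations of a direct (non-separable) N x N convolution at one pixel. Rounding 3σ up and using an odd width so that the kernel has a centre sample are assumptions of this example, as are the function names.

```python
import math

def kernel_width(sigma):
    """Odd number of samples N covering the interval [-3 sigma, 3 sigma]."""
    return 2 * math.ceil(3 * sigma) + 1

def ops_per_pixel(sigma):
    """(sums, products) for a direct N x N convolution at one pixel."""
    n = kernel_width(sigma)
    return n * n - 1, n * n

for sigma in (1.0, 2.0, 4.0):
    print(sigma, kernel_width(sigma), ops_per_pixel(sigma))
```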
• Moreover, the convolution of two Gaussians of variances $\sigma_1^2$ and $\sigma_2^2$ is a Gaussian with variance $\sigma_3^2 = \sigma_1^2 + \sigma_2^2$. This property can be exploited to build the scale space, so as to reuse convolutions already calculated.
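The variance-addition property can be checked numerically in one dimension. The sketch below computes the extra blur needed to move an already smoothed image from one scale to the next, then convolves two sampled Gaussian kernels and measures the variance of the result. All names are hypothetical, and the kernels are truncated at 5σ, so the agreement is approximate.

```python
import math

def incremental_sigma(sigma_prev, sigma_target):
    """Blur to apply to an image already at sigma_prev to reach sigma_target."""
    return math.sqrt(sigma_target ** 2 - sigma_prev ** 2)

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1D Gaussian truncated at 5 sigma."""
    radius = math.ceil(5 * sigma)
    weights = [math.exp(-(i * i) / (2 * sigma ** 2))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve(a, b):
    """Full discrete convolution of two 1D kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def variance(kernel):
    """Variance of a symmetric, normalized kernel about its centre sample."""
    centre = (len(kernel) - 1) / 2
    return sum(w * (i - centre) ** 2 for i, w in enumerate(kernel))

step = incremental_sigma(2.0, 2.0 * math.sqrt(2.0))
combined = convolve(gaussian_kernel_1d(2.0), gaussian_kernel_1d(step))
print(step, variance(combined))
```

Blurring the previous level with the incremental kernel is cheaper than blurring the original image with the full kernel, because the incremental kernel is narrower.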
Detect maxima and minima of DoGs in scale space
• Local extrema of D(x,y,σ) are the local interest points. To detect the interest points, at each scale level of the DoG pyramid every pixel p is compared to its 8 neighbours:
– if p is a local extremum (local minimum or maximum) it is selected as a candidate keypoint
– each candidate keypoint is compared to the 9 neighbours in the scale above and the 9 neighbours in the scale below
Only pixels that are local extrema across 3 adjacent levels are promoted to keypoints.
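The neighbour comparison above can be sketched on a small DoG stack stored as nested lists. The indexing convention dog[level][row][col], the function names and the use of strict inequalities are assumptions of this example; the two comparison stages are merged into a single test over all 26 neighbours, which gives the same result.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater (or strictly smaller) than
    all 26 neighbours: 8 in its own level and 9 in each adjacent level."""
    value = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return value > max(neighbours) or value < min(neighbours)

def find_keypoints(dog):
    """Scan the interior samples of a DoG stack for scale-space extrema."""
    keypoints = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_scale_space_extremum(dog, s, y, x):
                    keypoints.append((s, y, x))
    return keypoints

# Toy stack: three 3x3 levels, with a single peak in the middle level.
toy = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
toy[1][1][1] = 1.0
print(find_keypoints(toy))
```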
Keypoint stability
• The many points extracted from the maxima and minima of the DoGs have only pixel accuracy at best, and may correspond to low-contrast and therefore unreliable points.
• To improve keypoint stability, a function is fitted to the local sample points in order to determine the interpolated position. Since points are defined in 3D (x, y, σ), it is a 3D curve fitting problem. The interpolation is done using the quadratic Taylor expansion of the Difference-of-Gaussian scale-space function, with the candidate keypoint as the origin:
$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x}$$
where D and its derivatives are evaluated at the candidate keypoint k = (x, y, σ) and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from this point.
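• The location of the extremum $\hat{\mathbf{x}}$ is determined by taking the derivative of this function with respect to $\mathbf{x}$ and setting it to zero:
$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$
– if the offset $\hat{\mathbf{x}}$ is larger than 0.5 in any dimension, the extremum lies closer to another candidate keypoint, and the interpolation is performed about that point instead
– otherwise the offset is added to its candidate keypoint to get the interpolated estimate for the location of the extremum.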
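A possible implementation of this refinement step is sketched below. It assumes the offset is obtained by solving H x = -g, where g and H are the gradient and Hessian of D estimated with central finite differences on a stack indexed as dog[level][row][col]; this follows the formulation in Lowe's paper [2]. The helper names, the use of Cramer's rule and the test function are choices made for this example only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        replaced = [[b[row] if c == col else a[row][c] for c in range(3)]
                    for row in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def refine_offset(dog, s, y, x):
    """Offset (dx, dy, ds) of the fitted extremum from the sample (x, y, s)."""
    def d(ds, dy, dx):
        return dog[s + ds][y + dy][x + dx]
    c = d(0, 0, 0)
    grad = [(d(0, 0, 1) - d(0, 0, -1)) / 2,
            (d(0, 1, 0) - d(0, -1, 0)) / 2,
            (d(1, 0, 0) - d(-1, 0, 0)) / 2]
    dxx = d(0, 0, 1) - 2 * c + d(0, 0, -1)
    dyy = d(0, 1, 0) - 2 * c + d(0, -1, 0)
    dss = d(1, 0, 0) - 2 * c + d(-1, 0, 0)
    dxy = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4
    dxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4
    dys = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    # Setting the derivative of the quadratic to zero gives H x = -grad.
    return solve3(hessian, [-g for g in grad])

def quadratic(x, y, s):
    """Made-up test function with its maximum at (1.2, 0.9, 1.1)."""
    return -((x - 1.2) ** 2 + 2 * (y - 0.9) ** 2 + 0.5 * (s - 1.1) ** 2)

stack = [[[quadratic(x, y, s) for x in range(3)] for y in range(3)]
         for s in range(3)]
print(refine_offset(stack, 1, 1, 1))
```

On a function that is exactly quadratic the finite differences are exact, so adding the returned offset to the sample (1, 1, 1) recovers the true maximum; on real DoG data the result is only an estimate.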
• Low-contrast keypoints are generally less reliable than high-contrast ones, and keypoints that respond to edges are unstable. Filtering can be performed respectively by:
• thresholding on simple contrast
• thresholding based on principal curvature
• The local contrast can be obtained directly from D(x,y,σ) evaluated at the location of the keypoint as updated in the previous step. Unstable extrema with low contrast can be discarded according to Lowe's rule: reject the keypoint if |D(x)| < 0.03, where x is the updated keypoint location.
• The DoG function has strong responses along edges. To eliminate keypoints that have poorly determined locations but high edge responses, note that for poorly defined peaks in the DoG function the principal curvature across the edge is much larger than the principal curvature along it.
• The ratio of the two eigenvalues is sufficient for this purpose. If r is the ratio between the highest and the lowest eigenvalue, then
$$R = \frac{(r + 1)^2}{r}$$
depends only on the ratio of the two eigenvalues; it is minimum when the two eigenvalues have the same value and increases as r increases. To keep the ratio between the two principal curvatures below a threshold $r_{th}$ on r, it must be that
$$R < \frac{(r_{th} + 1)^2}{r_{th}}$$
If R is higher than this bound, the keypoint is poorly localized and hence rejected.
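The edge test can be sketched without computing the eigenvalues explicitly. The example below assumes, following Lowe's paper [2], that the eigenvalues are those of the 2x2 spatial Hessian H of one DoG level, so that R equals Tr(H)² / Det(H); the threshold of 10 on r, the finite-difference Hessian and the function name are likewise assumptions of this sketch.

```python
def passes_edge_test(dog_level, y, x, r_threshold=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r, where H is
    the 2x2 Hessian of dog_level (indexed [row][col]) at (y, x)."""
    c = dog_level[y][x]
    dxx = dog_level[y][x + 1] - 2 * c + dog_level[y][x - 1]
    dyy = dog_level[y + 1][x] - 2 * c + dog_level[y - 1][x]
    dxy = (dog_level[y + 1][x + 1] - dog_level[y + 1][x - 1]
           - dog_level[y - 1][x + 1] + dog_level[y - 1][x - 1]) / 4
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign or a flat direction: reject.
        return False
    return trace * trace / det < (r_threshold + 1) ** 2 / r_threshold

blob = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ridge = [[0.0, 0.99, 0.0], [0.0, 1.0, 0.0], [0.0, 0.99, 0.0]]
print(passes_edge_test(blob, 1, 1), passes_edge_test(ridge, 1, 1))
```

The blob has equal curvature in both directions and is kept, while the ridge has a much larger curvature across it than along it and is rejected.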
[Figure: keypoints at maxima in D, and keypoints remaining after removing low-contrast points and edges]
• Experimental evaluation of detectors w.r.t. scale change
Repeatability rate:
$$\text{repeatability} = \frac{\#\,\text{correspondences}}{\#\,\text{possible correspondences (points present)}}$$
• The common drawback of both the LoG and the DoG representations is that local maxima can also be detected in the neighborhood of contours or straight edges, where the signal change is only in one direction.
• These maxima are less stable because their localization is more sensitive to noise or small changes in neighboring texture.
Orientation assignment
• For a keypoint, if L is the image with the closest scale, compute the gradient magnitude and orientation in a region around the keypoint using finite differences:
$$\text{GradientVector} = \begin{bmatrix} L(x+1, y) - L(x-1, y) \\ L(x, y+1) - L(x, y-1) \end{bmatrix}$$
For such a region:
- create a histogram with 36 bins for orientation
- weight each point with a Gaussian window of 1.5σ
• The peak orientation is the keypoint canonical orientation.
• Any peak within 80% of the highest peak is used to create a keypoint with that orientation, so local peaks within 80% of the maximum create multiple orientations. About 15% of keypoints have multiple orientations.
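The orientation assignment can be sketched as follows. The image is a nested list indexed L[row][col], and each sample votes with its gradient magnitude multiplied by the Gaussian window; weighting by magnitude follows Lowe's paper [2] and is an assumption relative to the text above, as are the function names, the region radius and the choice of reporting bin centres.

```python
import math

def orientation_histogram(L, ky, kx, sigma, radius):
    """36-bin gradient-orientation histogram around (ky, kx); each sample is
    weighted by its gradient magnitude and by a Gaussian window of standard
    deviation 1.5 * sigma."""
    hist = [0.0] * 36
    for y in range(ky - radius, ky + radius + 1):
        for x in range(kx - radius, kx + radius + 1):
            gx = L[y][x + 1] - L[y][x - 1]
            gy = L[y + 1][x] - L[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            dist2 = (x - kx) ** 2 + (y - ky) ** 2
            weight = math.exp(-dist2 / (2 * (1.5 * sigma) ** 2))
            hist[int(angle // 10) % 36] += weight * magnitude
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Bin centres (degrees) of local peaks within `ratio` of the highest peak."""
    top = max(hist)
    peaks = []
    for b, value in enumerate(hist):
        left, right = hist[b - 1], hist[(b + 1) % 36]
        if value >= ratio * top and value > left and value > right:
            peaks.append(b * 10 + 5)
    return peaks

# Toy image: intensity increases from left to right (horizontal ramp).
ramp = [[float(x) for x in range(7)] for _ in range(7)]
hist = orientation_histogram(ramp, 3, 3, sigma=1.0, radius=2)
print(dominant_orientations(hist))
```

Each returned orientation would give rise to its own keypoint at the same location and scale.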
Once the local orientation and scale of a keypoint have been estimated, a scaled and oriented patch around the detected point can be extracted and used to form a feature descriptor.