
Computer Vision
Chapter 4: Feature detection and Image matching

Plan
• Edges
‒ Detection
‒ Linking
• Feature extraction
‒ Global features
‒ Local features
‒ Matching
• Applications

Local features vs Global features

• Two types of features are extracted from the image:
‒ local and global features (descriptors)
• Global features
‒ Describe the image as a whole to generalize the entire object
‒ Include contour representations, shape descriptors, and texture features
‒ Examples: invariant moments (Hu, Zernike), Histogram of Oriented Gradients (HOG), PHOG, Co-HOG, ...
• Local features
‒ Describe the image patches (around key points in the image) of an object
‒ Represent the texture/color in an image patch (color / shape / texture)
‒ Examples: SIFT, SURF, LBP, BRISK, MSER, FREAK, ...

Feature extraction
• Global features
• Local features
Global features?

• How to distinguish these objects?

Types of features

• Contour representation, shape features
• Color descriptors
• Texture features

Color features

• Histogram (e.g. a 256-bin or a 16-bin intensity histogram)

Distance / Similarity

• L1 or L2 (Euclidean) distances are often used:

d_{L1}(H, G) = \sum_{i=1}^{N} |h_i - g_i|

• Histogram intersection:

\cap(H, G) = \frac{\sum_i \min(h_i, g_i)}{\sum_i g_i}
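To make the two measures concrete, here is a minimal NumPy sketch (the 16-bin histograms and the random test images are illustrative):

```python
import numpy as np

def l1_distance(h, g):
    """L1 distance between two histograms: sum of absolute bin differences."""
    return np.sum(np.abs(h - g))

def histogram_intersection(h, g):
    """Normalized intersection: sum of bin-wise minima over the sum of g."""
    return np.sum(np.minimum(h, g)) / np.sum(g)

# Example with 16-bin intensity histograms of two random "images"
rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(64, 64))
img2 = rng.integers(0, 256, size=(64, 64))
h, _ = np.histogram(img1, bins=16, range=(0, 256))
g, _ = np.histogram(img2, bins=16, range=(0, 256))
print(l1_distance(h, g), histogram_intersection(h, g))
```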
Advantages of histogram

• Invariant to basic geometric transformations:
‒ Rotation
‒ Translation
‒ Zoom (scaling)

Some inconveniences

• The similarity between adjacent colors (bins) is not taken into account
• The spatial distribution of pixel values is not considered: 2 different images may have the same histogram

Some inconveniences

• Background effect: d(I1, I2) ? d(I1, I3)
• Color representation dependency (color space), device dependency, ...

Texture features

• A texture can be defined as
‒ a region with variation of intensity
‒ a spatial organization of pixels
Texture features

• There are several methods for analyzing textures:
‒ First-order statistics
• Statistics on the histogram
‒ Co-occurrence matrices
• Searching for repeated patterns
‒ Frequency analysis
• Gabor filters
‒ ...
• The most difficult part is to find a good representation (good parameters) for each texture

First order statistics

• Histogram-based: mean, variance, skewness, kurtosis, energy, entropy, ...
‒ h(i): the number of pixels at gray level i
• Skewness vs. kurtosis (red: normal distribution with mean = 5, variance = 4)
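As a sketch, all of these statistics can be computed from the normalized histogram p(i) = h(i)/N; the function below is a minimal NumPy version (the 256-level assumption and the variable names are illustrative):

```python
import numpy as np

def first_order_stats(image, levels=256):
    """First-order texture statistics from the gray-level histogram."""
    h, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = h / h.sum()                       # normalized histogram p(i)
    i = np.arange(levels)
    mean = np.sum(i * p)
    var = np.sum((i - mean) ** 2 * p)
    std = np.sqrt(var)
    skew = np.sum(((i - mean) / std) ** 3 * p)
    kurt = np.sum(((i - mean) / std) ** 4 * p) - 3   # excess kurtosis
    energy = np.sum(p ** 2)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return dict(mean=mean, variance=var, skewness=skew,
                kurtosis=kurt, energy=energy, entropy=entropy)
```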
GLCM (Grey Level Co-occurrence Matrices)

• The idea here is to identify gray levels that repeat themselves given a distance and a direction
• Co-occurrence matrices (Haralick):

CM_{d,\theta}(c_i, c_j) = \frac{card(\{(p_1, p_2) \mid I(p_1) = c_i, I(p_2) = c_j, N_{d,\theta}(p_1, p_2) = true\})}{card(\{(p_1, p_2) \mid N_{d,\theta}(p_1, p_2) = true\})}

where N_{d,\theta}(p_1, p_2) = true means that p_2 is a neighbor of p_1 at distance d in direction θ.
• Matrix of size Ng x Ng
‒ Ng is the number of gray levels in the image (a 256x256 matrix); we often reduce that number to 8x8, 16x16 or 32x32
• Many matrices, one for each distance and direction:
‒ Distance: 1, 2, 3 (, 4, ...)
‒ Direction: 0°, 45°, 90°, 135° (, ...)
• Processing time can be very long

GLCM

• Example of how to compute these matrices, for distance = 1 and direction = 0°, on the 4x4 image

1 4 4 3
4 2 3 2
1 2 1 4
1 2 2 3

• We loop over the image and, for each pair of pixels at the given distance and orientation, we increment the co-occurrence matrix: the first pair of neighbor pixels is (1,4), so entry (1,4) of the matrix is incremented; the next pair is (4,4), and so on.
GLCM

• After scanning the first two image rows, the matrix for distance = 1 and direction = 0° is:

     1 2 3 4
  1  0 0 0 1
  2  0 0 1 0
  3  0 1 0 0
  4  0 1 1 1

...and so on.

GLCM

• Final result of the example: the matrices for distance = 1 in directions 0° and 45° are

  direction=0°:      direction=45°:
     1 2 3 4            1 2 3 4
  1  0 2 0 2         1  0 2 1 0
  2  1 1 2 0         2  1 1 0 0
  3  0 1 0 0         3  0 0 0 1
  4  0 1 1 1         4  0 2 1 0

...and so on for each matrix (several matrices at the end).

• The most important/popular parameters computed from the GLCM:

Energy = \sum_i \sum_j CM_d(i, j)^2 (minimal when all elements are equal)

Entropy = -\sum_i \sum_j CM_d(i, j) \log(CM_d(i, j)) (a measure of chaos; maximal when all elements are equal)

Contrast = \sum_i \sum_j (i - j)^2 CM_d(i, j) (small values when the big elements are near the main diagonal)

IDM = \sum_i \sum_j \frac{CM_d(i, j)}{1 + (i - j)^2} (the inverse differential moment has small values when the big elements are far from the main diagonal)
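A minimal NumPy sketch of the computation, written to reproduce the worked example above (gray levels coded 1..Ng); a production implementation would more likely use skimage.feature.graycomatrix:

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Co-occurrence counts for one (distance, direction), given as a
    (row, col) offset; (0, 1) is distance 1 in direction 0 degrees.
    Gray levels are assumed to be integers in 1..levels."""
    dr, dc = offset
    m = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r, c] - 1, image[r2, c2] - 1] += 1
    return m

img = np.array([[1, 4, 4, 3],
                [4, 2, 3, 2],
                [1, 2, 1, 4],
                [1, 2, 2, 3]])
m = glcm(img, levels=4)            # matches the 0-degree matrix above

p = m / m.sum()                    # normalize before computing the parameters
i, j = np.indices(p.shape)
energy = np.sum(p ** 2)
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
contrast = np.sum((i - j) ** 2 * p)
idm = np.sum(p / (1 + (i - j) ** 2))
```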
GLCM

• Haralick features:
‒ For each GLCM, we can compute up to 14 (13) parameters characterizing the texture, of which the most important are: mean, variance, energy, inertia, entropy, inverse differential moment
‒ Ref: http://haralick.org/journals/TexturalFeatures.pdf

Invariances

• Rotation?
‒ Average over all directions
• Scaling?
‒ Multi-resolution

Texture features comparison

Source: William Robson Schwartz et al. Evaluation of Feature Descriptors for Texture Classification – 2012 JEI

Shape features

• Contour-based features
‒ Chain coding, polygon approximation, geometric parameters, angular profile, surface, perimeter, ...
• Region-based features
‒ Invariant moments, ...
Shape features

• Boundary-based methods
‒ Geometrical: perimeter, eccentricity, curvature, axes directionality, others
‒ Signatures: centroid distance, complex coordinates, curvature signature, turning angle, chain codes
‒ Signature descriptors: Fourier Descriptors, UNL-Fourier, NFD, Wavelet Descriptors, B-Splines
• Region-based methods
‒ Global geometrical: area, compactness, Euler number, grid method
‒ Global: moment invariants, Zernike moments, pseudo-Zernike moments
‒ Decomposition: triangulation, Medial Axis Transform (Skeleton Transform)

Examples: Freeman chain coding

Examples: angular profile, ...

Examples: Image moments

• Moments:

M_{p,q} = \sum_x \sum_y x^p y^q I(x, y)

‒ M_{0,0} = area of the region D
‒ (M_{1,0}/M_{0,0}, M_{0,1}/M_{0,0}) = centroid of D
• Central moments (computed about the centroid): invariant to translation
Invariant moments (Hu's moments)

• Invariant to translation, scale, rotation, and reflection
‒ (the 7th moment changes sign under image reflection)

Examples: Hu's moments

• 6 images and their Hu moments
https://www.learnopencv.com/wp-content/uploads/2018/12/HuMoments-Shape-Matching.png
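A short OpenCV sketch of computing Hu moments for shape matching (the file name is a placeholder; cv2.moments and cv2.HuMoments are the standard OpenCV calls):

```python
import cv2
import numpy as np

img = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
_, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

m = cv2.moments(binary)              # raw, central and normalized moments
hu = cv2.HuMoments(m).flatten()      # the 7 Hu invariant moments

# Log-scale the values, since they span several orders of magnitude
hu_log = -np.sign(hu) * np.log10(np.abs(hu))
print(hu_log)
```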

Shape Context

https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html

Examples: PHOG

• PHOG: Pyramid Histogram of Oriented Gradients
Source: http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html
Why local features?

• Image matching: a challenging problem
Source: CS131 - Juan Carlos Niebles and Ranjay Krishna

Feature extraction
• Global features
• Local features
‒ Interest point detector
‒ Local descriptor

Image matching
(images by Diva Sian, swashford and scgbt)
Slide credit: Steve Seitz

Harder still?
NASA Mars Rover images
Slide credit: Steve Seitz
Answer below (look for tiny colored squares)
NASA Mars Rover images with SIFT feature matches (figure by Noah Snavely)
Slide credit: Steve Seitz

Motivation for using local features

• Global representations have major limitations
• Instead, describe and match only local regions
• Increased robustness to:
‒ Occlusions
‒ Articulation
‒ Intra-category variations
Source: CS131 - Juan Carlos Niebles and Ranjay Krishna

Local features and object recognition

• Recognition of specific objects/scenes
Sivic and Zisserman, 2003; D. Lowe 2002

Local features and alignment

• We need to match (align) images
• Global methods are sensitive to occlusion, lighting, and parallax effects, so look for local features that match well
• How would you do it by eye?
[Darya Frolova and Denis Simakov]
Local features and alignment

• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align the images
[Darya Frolova and Denis Simakov]

Local features for image matching

1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors: accept a match if the distance between descriptors is below a threshold, d(fA, fB) < T

Source: Jim Little, Lowe: features, UBC.
Slide credit: Bastian Leibe
Local features

• Objectives:
‒ Look for similar objects/regions
‒ Partial query: look for pictures that contain sunflowers
• Solution: local features, adding spatial constraints if needed

Local feature extraction

• Local features: how to determine image patches / local regions?
‒ Dividing into patches with a regular grid (without knowledge about the image content)
‒ Keypoint detection or image segmentation (based on the content of the image)
• Describing local regions

Common Requirements

• Problem 1:
‒ Detect the same point independently in both images
‒ Otherwise there is no chance to match: we need a repeatable detector!
• Problem 2:
‒ For each point, correctly recognize the corresponding one
‒ We need a reliable and distinctive descriptor!
Slide credit: Darya Frolova, Denis Simakov

Invariance: Geometric Transformations
Slide credit: Steve Seitz
Invariance: Geometric Transformations

• Levels of geometric invariance
Slide credit: Steve Seitz

Invariance: Photometric Transformations

• Often modeled as a linear transformation:
‒ Scaling + offset
Slide credit: Tinne Tuytelaars

Requirements

• Region extraction needs to be repeatable and accurate
‒ Invariant to translation, rotation, scale changes
‒ Robust or covariant to out-of-plane (affine) transformations
‒ Robust to lighting variations, noise, blur, quantization
• Locality: features are local, therefore robust to occlusion and clutter
• Quantity: we need a sufficient number of regions to cover the object
• Distinctiveness: the regions should contain “interesting” structure
• Efficiency: close to real-time performance

Main questions

• Where will the interest points come from?
‒ What are the salient features that we’ll detect in multiple views?
• How to describe a local region?
• How to establish correspondences, i.e., compute matches?
Feature extraction
• Global features
• Local features
‒ Interest point detector
‒ Local descriptor
• Matching

Interest points: why and where?

• Yarbus eye tracking: the same image viewed with different questions
Source: Derek Hoiem, Computer Vision, University of Illinois.

• Where will the interest points come from?
Keypoint Localization

• Goals:
‒ Repeatable detection
‒ Precise localization
‒ Interesting content
Slide credit: Bastian Leibe

Finding Corners

• Look for two-dimensional signal changes
• Key property:
‒ In the region around a corner, the image gradient has two or more dominant directions
• Corners are repeatable and distinctive
C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference, 1988.
Slide credit: Svetlana Lazebnik

Corners as distinctive interest points

• Design criteria
‒ We should easily recognize the point by looking through a small window (locality)
‒ Shifting the window in any direction should give a large change in intensity (good localization)
Slide credit: Alyosha Efros

Corners versus edges

• “Flat” region: no change in any direction (the intensity change is small in all directions)
• “Edge”: no change along the edge direction (the change is large across the edge, small along it)
• “Corner”: significant change in all directions (the change is large in all directions)
Harris detector formulation / Corner Detection by Auto-correlation

Change of intensity (change in appearance of window w(x,y)) for the shift [u,v]:

E(u, v) = \sum_{x,y} w(x, y) \, [I(x+u, y+v) - I(x, y)]^2

where w(x,y) is the window function (1 in the window and 0 outside, or a Gaussian), I(x,y) is the intensity and I(x+u, y+v) is the shifted intensity. By construction, E(0,0) = 0.
Source: R. Szeliski

Corner Detection by Auto-correlation

• We want to discover how E behaves for small shifts [u,v]
• But this is very slow to compute naively:
O(window_width^2 * shift_range^2 * image_width^2)
O(11^2 * 11^2 * 600^2) = 5.2 billion of these
Corner Detection by Auto-correlation

• We want to discover how E behaves for small shifts, and we know the response in E that we are looking for: a strong peak.
• The local quadratic approximation of E(u,v) in the neighborhood of (0,0) is given by the second-order Taylor expansion:

E(u, v) \approx E(0,0) + [u \; v] \begin{bmatrix} E_u(0,0) \\ E_v(0,0) \end{bmatrix} + \frac{1}{2} [u \; v] \begin{bmatrix} E_{uu}(0,0) & E_{uv}(0,0) \\ E_{uv}(0,0) & E_{vv}(0,0) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}

Notation: E_u, E_{uu}, ... are the partial derivatives of E.

Corner Detection: Mathematics

Since we are looking for a strong peak, we ignore the function value (set to 0) and the first derivative (set to 0), and just look at the shape of the second derivative. Differentiating E(u,v) gives:

E_u(u, v) = \sum_{x,y} 2 \, w(x, y) \, [I(x+u, y+v) - I(x, y)] \, I_x(x+u, y+v)

E_{uu}(u, v) = \sum_{x,y} 2 \, w(x, y) \, I_x(x+u, y+v) \, I_x(x+u, y+v) + \sum_{x,y} 2 \, w(x, y) \, [I(x+u, y+v) - I(x, y)] \, I_{xx}(x+u, y+v)

E_{uv}(u, v) = \sum_{x,y} 2 \, w(x, y) \, I_y(x+u, y+v) \, I_x(x+u, y+v) + \sum_{x,y} 2 \, w(x, y) \, [I(x+u, y+v) - I(x, y)] \, I_{xy}(x+u, y+v)
Corner Detection: Mathematics

Evaluating the Taylor terms at (u,v) = (0,0), the intensity difference I(x+u, y+v) - I(x,y) vanishes, so:

E(0,0) = 0, \quad E_u(0,0) = 0, \quad E_v(0,0) = 0

E_{uu}(0,0) = \sum_{x,y} 2 \, w(x, y) \, I_x(x, y) \, I_x(x, y)

E_{vv}(0,0) = \sum_{x,y} 2 \, w(x, y) \, I_y(x, y) \, I_y(x, y)

E_{uv}(0,0) = \sum_{x,y} 2 \, w(x, y) \, I_x(x, y) \, I_y(x, y)

Plugging these into the expansion:

E(u, v) \approx [u \; v] \begin{bmatrix} \sum w \, I_x^2 & \sum w \, I_x I_y \\ \sum w \, I_x I_y & \sum w \, I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}

Harris detector formulation

• This measure of change can be approximated by:

E(u, v) \approx [u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix}

where M is a 2x2 matrix computed from the image derivatives (the gradient with respect to x times the gradient with respect to y, summed over the image region we are checking for a corner):

M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

Slide credit: Rick Szeliski

What does this matrix reveal?

• First, let's consider an axis-aligned corner:

M = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}

Slide credit: David Jacobs
What does this matrix reveal?

• For an axis-aligned corner, this means:
‒ The dominant gradient directions align with the x or y axis
‒ If either λ is close to 0, then this is not a corner, so look for locations where both are large
• What if we have a corner that is not aligned with the image axes?
Slide credit: David Jacobs

General case

• Since M is symmetric, we have the eigenvalue decomposition:

M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R

• We can visualize M as an ellipse, with axis lengths determined by the eigenvalues ((λ_max)^{-1/2} along the direction of the fastest change, (λ_min)^{-1/2} along the direction of the slowest change) and orientation determined by the rotation matrix R
Adapted from Darya Frolova, Denis Simakov

Interpreting the eigenvalues

• Classification of image points using the eigenvalues of M:
‒ “Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
‒ “Edge”: λ1 >> λ2 (or λ2 >> λ1)
‒ “Flat” region: λ1 and λ2 are small; E is almost constant in all directions
Slide credit: Kristen Grauman

Corner response function

\theta = \det(M) - \alpha \, trace(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

• “Corner”: θ > 0; “Edge”: θ < 0; “Flat” region: |θ| small
• This is a fast approximation: it avoids computing the eigenvalues
• α: constant (0.04 to 0.06)
Slide credit: Kristen Grauman
Window Function w(x,y)

• Option 1: uniform window
‒ Sum over a square window (1 in the window, 0 outside):

M = \sum_{x,y} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

‒ Problem: not rotation invariant
• Option 2: smooth with a Gaussian
‒ The Gaussian already performs a weighted sum:

M = g(\sigma) * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

‒ Result is rotation invariant
Slide credit: Bastian Leibe

Summary: Harris Detector [Harris88]

• Compute the second moment matrix (autocorrelation matrix):
1. Image derivatives: I_x, I_y
2. Squares of derivatives: I_x^2, I_y^2, I_x I_y
3. Gaussian filter g(σ_I):

M(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}

• Compute the corner response:
4. Cornerness function (two strong eigenvalues):

\theta = \det[M(\sigma_I, \sigma_D)] - \alpha [trace(M(\sigma_I, \sigma_D))]^2 = g(I_x^2) \, g(I_y^2) - [g(I_x I_y)]^2 - \alpha [g(I_x^2) + g(I_y^2)]^2

5. Perform non-maximum suppression
C. Harris and M. Stephens. “A Combined Corner and Edge Detector.” Proceedings of the 4th Alvey Vision Conference: pages 147–151, 1988.
Slide credit: Krystian Mikolajczyk
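The five steps map directly onto a few lines of NumPy/SciPy; the following is a minimal sketch (σ_D, σ_I, α and the window size are illustrative choices, and OpenCV's cv2.cornerHarris provides a ready-made version):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(image, sigma_d=1.0, sigma_i=2.0, alpha=0.05):
    """Harris corner response, following steps 1-4 above."""
    img = image.astype(float)
    # 1. Image derivatives via Gaussian derivative filters at scale σ_D
    ix = gaussian_filter(img, sigma_d, order=(0, 1))
    iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # 2. Squares of derivatives, 3. integrated with a σ_I Gaussian window
    ixx = gaussian_filter(ix * ix, sigma_i)
    iyy = gaussian_filter(iy * iy, sigma_i)
    ixy = gaussian_filter(ix * iy, sigma_i)
    # 4. Cornerness: det(M) - alpha * trace(M)^2
    return ixx * iyy - ixy ** 2 - alpha * (ixx + iyy) ** 2

def harris_corners(theta, threshold, size=5):
    """5. Non-maximum suppression: keep local maxima above the threshold."""
    local_max = theta == maximum_filter(theta, size=size)
    return np.argwhere(local_max & (theta > threshold))  # (row, col) pairs
```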

Harris Detector: Workflow

• Compute the corner responses θ
• Take the points where θ > threshold
• Take only the local maxima of θ, where θ > threshold
Slide adapted from Darya Frolova, Denis Simakov

Harris Detector: Workflow

• Resulting Harris points
Slide adapted from Darya Frolova, Denis Simakov

Harris Detector – Responses [Harris88]

• Effect: a very precise corner detector
Slide credit: Krystian Mikolajczyk
Harris Detector: Properties

• Translation invariance? Yes.
• Rotation invariance? Yes: the ellipse rotates but its shape (i.e. its eigenvalues) remains the same, so the corner response θ is invariant to image rotation.
• Scale invariance? No: when the image is enlarged, all points along a corner can be classified as edges. The response is not invariant to image scale!
Slide credit: Kristen Grauman

Scale invariance: how to?

• Exhaustive search
• Invariance
• Robustness
Exhaustive search

• Multi-scale approach

Invariance

• Extract the patch from each image individually
Slide adapted from T. Tuytelaars ECCV 2006 tutorial

Automatic scale selection

• Solution:
‒ Design a function on the region which is “scale invariant” (the same for corresponding regions, even if they are at different scales)
• Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
‒ For a point in one image, we can consider it as a function of the region size (patch width)
• Common approach: take a local maximum of this function
‒ Observation: the region size for which the maximum is achieved should be invariant to image scale
‒ Important: this scale-invariant region size is found in each image independently!
Automatic Scale Selection: Example

• Function responses for increasing scale (the scale signature), computed independently in two images related by a scale factor
• Same operator responses if the patch contains the same image up to a scale factor
K. Grauman, B. Leibe

Scale Invariant Detection

• A “good” function for scale detection has one stable sharp peak (flat or multi-peaked responses are bad)
• For usual images: a good function would be one which responds to contrast (sharp local intensity change)

What is a useful signature function?

• Functions for determining scale: f = Kernel * Image
• Kernels:

L = \sigma^2 (G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma)) (Laplacian)

DoG = G(x, y, k\sigma) - G(x, y, \sigma) (Difference of Gaussians)

where the Gaussian is

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}

• Note: both kernels are invariant to scale and rotation
What is a useful signature function?

• Laplacian-of-Gaussian = “blob” detector
T. Lindeberg (1998). "Feature detection with automatic scale selection." IJCV 30(2): pp. 77–116.

Characteristic scale

• We define the characteristic scale as the scale that produces the peak of the Laplacian response
Source: K. Grauman, B. Leibe; Lana Lazebnik

Laplacian-of-Gaussian (LoG)

• Interest points: local maxima in the scale space of

L_{xx}(\sigma) + L_{yy}(\sigma)

→ a list of (x, y, σ)
Source: K. Grauman, B. Leibe

Example: Scale-space blob detector
Source: Lana Lazebnik

Alternative approach

• Approximate LoG with Difference-of-Gaussian (DoG):
‒ 1. Blur the image with a σ Gaussian kernel
‒ 2. Blur the image with a kσ Gaussian kernel
‒ 3. Subtract one blurred image from the other: DoG = (G(kσ) - G(σ)) * I
• A small k gives a closer approximation to LoG, but usually we want to build a scale space quickly out of this: k = 1.6 gives an appropriate scale space (k = sqrt(2) is also used)
Ruye Wang; Source: K. Grauman, B. Leibe
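These three steps in a minimal SciPy sketch (a grayscale array is assumed):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma=1.0, k=1.6):
    """DoG approximation of the Laplacian-of-Gaussian."""
    img = image.astype(float)
    blur_small = gaussian_filter(img, sigma)      # 1. blur with sigma
    blur_large = gaussian_filter(img, k * sigma)  # 2. blur with k*sigma
    return blur_large - blur_small                # 3. (G(k*sigma) - G(sigma)) * I
```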
Find local maxima in position-scale space of DoG

• Build a stack of DoG images at scales σ, kσ, k²σ, ... and find the maxima jointly over position and scale → a list of (x, y, s)

Harris-Laplacian

• Harris-Laplacian [1]: find the local maximum of:
‒ the Harris corner detector in space (image coordinates)
‒ the Laplacian in scale
[1] K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
[2] D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. IJCV 2004

Scale Invariant Detectors

• Harris-Laplacian [1]: find the local maximum of:
‒ the Harris corner detector in space (image coordinates)
‒ the Laplacian in scale
• SIFT (D. Lowe) [2]: find the local maximum of:
‒ the Difference of Gaussians in space and scale
[1] K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
[2] D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. IJCV 2004

DoG (SIFT) keypoint Detector

• DoG at multiple octaves
• Extrema detection in scale space
• Keypoint location
‒ Interpolation
‒ Removing unstable points
• Orientation assignment
DoG (SIFT) Detector

• DoG at multiple octaves:

D(\sigma) = (G(k\sigma) - G(\sigma)) * I
D(k\sigma) = (G(k^2\sigma) - G(k\sigma)) * I
D(k^2\sigma) = (G(k^3\sigma) - G(k^2\sigma)) * I
\ldots

• Scale-space extrema: choose all extrema within a 3x3x3 neighborhood
‒ X is selected if it is larger or smaller than all 26 neighbors (its 8 neighbors in the current image and 9 neighbors each in the scales above and below)
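A minimal sketch of the 3x3x3 extrema test, assuming a dog_stack array of shape (n_scales, height, width) built, for instance, from the DoG function above (ties on plateaus are accepted here, whereas a strict implementation would reject them):

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def scale_space_extrema(dog_stack):
    """3x3x3 extrema in a DoG stack of shape (n_scales, height, width)."""
    fmax = maximum_filter(dog_stack, size=(3, 3, 3))
    fmin = minimum_filter(dog_stack, size=(3, 3, 3))
    is_extremum = (dog_stack == fmax) | (dog_stack == fmin)
    is_extremum[0] = is_extremum[-1] = False  # need a scale above and below
    return np.argwhere(is_extremum)           # rows of (scale index, y, x)
```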

DoG (SIFT) Detector

• Orientation assignment:
‒ Create a histogram of local gradient directions at the selected scale
‒ Assign the canonical orientation at the peak of the smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
• If there are 2 major orientations, use both.

Example of keypoint detection

(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing the ratio of principal curvatures (removing edge responses)
DoG (SIFT) Detector

• A SIFT keypoint: {x, y, scale, dominant orientation}
Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004

Scale Invariant Detectors

• Experimental evaluation of detectors w.r.t. scale change
• Repeatability rate: # correspondences / # possible correspondences
K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
Slide credit: CS131 - Juan Carlos Niebles and Ranjay Krishna

Many existing detectors available

• Hessian & Harris [Beaudet ‘78], [Harris ‘88]
• Laplacian, DoG [Lindeberg ‘98], [Lowe ‘99]
• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
• Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
• EBR and IBR [Tuytelaars & Van Gool ‘04]
• MSER [Matas ‘02]
• Salient Regions [Kadir & Brady ‘01]
• Others...
• These detectors have become a basic building block for many recent applications in Computer Vision.
Slide credit: Bastian Leibe

Feature extraction
• Global features
• Local features
‒ Interest point detector
‒ Local descriptor
• Matching
Local Descriptor

• A compact, good representation of local information
• Invariant to:
‒ Geometric transformations: rotation, translation, scaling, ...
‒ Camera view changes
‒ Illumination
• Examples:
‒ SIFT, SURF (Speeded Up Robust Features), PCA-SIFT, ...
‒ LBP, BRISK, MSER and FREAK, ...

Invariant local features

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
Following slides credit: CVPR 2003 Tutorial on Recognition and Matching Based on Local Invariant Features, David Lowe

Advantages of invariant local features

• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to a wide range of differing feature types, each adding robustness

Becoming rotation invariant

• We are given a keypoint and its scale from DoG
• We will select a characteristic orientation for the keypoint (based on the most prominent gradient there)
• We will describe all features relative to this orientation
SIFT descriptor formation

• Use the blurred image associated with the keypoint’s scale
• Take image gradients over the keypoint neighborhood
• To become rotation invariant, rotate the gradient directions AND locations by (-keypoint orientation)
‒ Now we’ve cancelled out rotation and have gradients expressed at locations relative to the keypoint orientation θ
‒ We could also have just rotated the whole image by -θ, but that would be slower

• Using precise gradient locations is fragile. We’d like to allow some “slop” in the image and still produce a very similar descriptor
• Using a Gaussian filter avoids sudden changes in the descriptor with small changes in the position of the window, and gives less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors
• Create an array of orientation histograms (a 4x4 array is shown)

Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf
Image: Ashish A Gupta, PhD thesis 2013

SIFT descriptor formation

• Put the rotated gradients into their local orientation histograms
‒ A local orientation histogram has n bins (e.g. 8, as shown)
‒ To avoid boundary effects, in which the descriptor abruptly changes as a sample shifts smoothly from one histogram to another or from one orientation to another, interpolation is used to distribute the value of each gradient sample into adjacent histogram bins
‒ Also, gradient contributions are scaled down for gradients far from the bin center, with a weight of 1-d, where d is the distance of the sample from the central value of the bin
• The SIFT authors found that the best results were obtained with 8 orientation bins per histogram (an experimentally tuned value) and a 4x4 histogram array → a SIFT descriptor is a vector of 128 values

SIFT descriptor formation

• Adding robustness to illumination changes:
‒ The descriptor is made of gradients (differences between pixels), so it is already invariant to changes in brightness (e.g. adding 10 to all image pixels yields the exact same descriptor)
‒ A higher-contrast photo will increase the magnitude of gradients; to correct for contrast changes, normalize the vector (scale it to length 1.0)
‒ Very large image gradients are usually from unreliable 3D illumination effects (glare, etc.); to reduce their effect, clamp all values in the vector to be ≤ 0.2, then normalize the vector again
→ The result is a vector which is fairly invariant to illumination changes

Image: Ashish A Gupta, PhD thesis 2013
SIFT: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004
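In practice the whole detector-plus-descriptor pipeline is available off the shelf; a minimal OpenCV sketch (the file names are placeholders, and cv2.SIFT_create requires opencv-python 4.4 or later):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries (x, y, scale, dominant orientation);
# each descriptor is the 128-dimensional vector described above
print(len(keypoints), descriptors.shape)   # e.g. N keypoints, (N, 128)
out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("sift_keypoints.jpg", out)
```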
SIFT

• An extraordinarily robust matching technique
‒ Can handle changes in viewpoint: up to about 60 degrees of out-of-plane rotation
‒ Can handle significant changes in illumination (sometimes even day vs. night)
‒ Fast and efficient: can run in real time
Steve Seitz

Sensitivity to number of histogram orientations

Feature stability to noise

• Match features after a random change in image scale & orientation, with differing levels of image noise
• Find the nearest neighbor in a database of 30,000 features

Feature stability to affine change

• Match features after a random change in image scale & orientation, with 2% image noise and affine distortion
• Find the nearest neighbor in a database of 30,000 features
David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 60, 2 (2004), pp. 91-110
SIFT Keypoint Descriptor: summary

• Blur the image using the scale of the keypoint (scale invariance)
• Compute gradients with respect to the keypoint orientation (rotation invariance)
• Compute the orientation histogram in 8 directions over 4x4 sample regions
Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf

Distinctiveness of features

• Vary the size of the database of features, with 30 degree affine change, 2% image noise
• Measure % correct for a single nearest-neighbor match

Other detectors and descriptors

• Popular features: SURF, HOG, SIFT
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf
• Summary of some local features:
http://www.cse.iitm.ac.in/~vplab/courses/CV_DIP/PDF/Feature_Detectors_and_Descriptors.pdf

Feature extraction
• Global features
• Local features
‒ Interest point detector
‒ Local descriptor
• Matching
Feature matching

Given a feature in I1, how to find the best match in I2?
1. Define a distance function that compares two descriptors
• Use the L1, L2, cosine, Mahalanobis, ... distance
2. Test all the features in I2, and find the one with the minimum distance

In OpenCV:
- Brute-force matching
- FLANN matching: Fast Library for Approximate Nearest Neighbors [Muja and Lowe, 2009]

• How to define the difference between two features f1, f2?
‒ Simple approach: use only the distance value d(f1, f2)
• → can give a good score to very ambiguous matches
‒ Better approaches: add additional constraints
• Ratio of distances
• Spatial constraints between neighborhood pixels
• Fitting the transformation, then refining the matches (RANSAC)

Marius Muja and David G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP (1), pages 331–340, 2009

Feature matching

• Simple approach: use the distance value d(f1, f2)
‒ → can give a good score to very ambiguous matches
• Better approach: ratio of distances = d(f1, f2) / d(f1, f2')
‒ f2 is the best match to f1 in I2
‒ f2' is the 2nd-best match to f1 in I2
‒ An ambiguous/bad match will have a ratio close to 1
‒ Look for unique matches, which have a low ratio
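A minimal OpenCV sketch of brute-force matching with the ratio test (the file names are placeholders; 0.75 is a commonly used ratio threshold):

```python
import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matcher; knnMatch with k=2 returns the best and 2nd-best match
bf = cv2.BFMatcher(cv2.NORM_L2)
pairs = bf.knnMatch(des1, des2, k=2)

# Ratio test: keep a match only if it is clearly better than the runner-up
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(len(good), "unambiguous matches")
```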
Ratio of distances reliable for matching

David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 60, 2 (2004), pp. 91-110

Feature matching

• Better approach: spatial constraints between neighborhood pixels
Source: from slides of Valérie Gouet-Brunet
Feature matching

• Better approach: fitting the transformation (RANSAC algorithm)
‒ Fitting a 2D transformation matrix
• Six variables
• Each pair of points gives two equations → at least three pairs of points are needed
• Least squares
‒ RANSAC: refinement of the matches

Evaluating the results

• How can we measure the performance of a feature matcher?
‒ Compute the error for the best feature distance (example distances: 50, 75, 200)
True/false positives

• The distance threshold affects performance (in the example: the distance-50 pair is a true match, the distance-200 pair is a false match)
‒ True positives = # of detected matches that are correct
• Suppose we want to maximize these: how to choose the threshold?
‒ False positives = # of detected matches that are incorrect
• Suppose we want to minimize these: how to choose the threshold?

Image matching

• How to define the distance between 2 images I1, I2?
‒ Using global features: easy, d(I1, I2) = d(feature of I1, feature of I2)
‒ Using local features:
• Voting strategy
• Solving an optimization problem (time consuming)
• Building a "global" feature from the local features: BoW (bag-of-words, bag-of-features), VLAD, ...

Voting strategy

• The similarity between 2 images is based on the number of matches (e.g. between a region selected in a query image and the images in a database)
Source: Modified from slides of Valérie Gouet-Brunet

Optimization problem

• Transportation problem:
‒ I1: {(r_i, w_i), i = 1..N} (the provider)
‒ I2: {(r'_j, w_j), j = 1..M} (the consumer)
• d(I1, I2) = ?

d(I_1, I_2) = \min \sum_i \sum_j f_{ij} \, d(r_i, r'_j)

subject to

f_{ij} \ge 0, \quad \sum_i f_{ij} \le w_j, \quad \sum_j f_{ij} \le w_i, \quad \sum_i \sum_j f_{ij} = \min(\sum_i w_i, \sum_j w_j)

• The distance is the optimal-flow cost normalized by the total flow f*:

d_{EMD}(I_1, I_2) = \frac{\sum_i \sum_j f^*_{ij} \, d(r_i, r'_j)}{\sum_i \sum_j f^*_{ij}}

http://vellum.cz/~mikc/oss-projects/CarRecognition/doc/dp/node29.html
Bag-of-words / Visual Vocabulary

• A local feature ~ a word
• An image ~ a document
• Apply a technique from textual document representation: the vector model

1. Extract local features from a set of images
2. Build the visual vocabulary (dictionary) using a clustering method
3. An image is represented by a bag of words
→ it can be represented by a tf.idf vector

Bag of words: outline

1. Extract features
2. Learn the “visual vocabulary”
3. Quantize features using the visual vocabulary
4. Represent images by their frequencies of “visual words”
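A minimal sketch of steps 1-4 with SIFT descriptors and k-means clustering (the vocabulary size k and the use of scikit-learn are illustrative choices):

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def build_vocabulary(images, k=100):
    """Steps 1-2: extract SIFT descriptors and cluster them into k visual words."""
    sift = cv2.SIFT_create()
    descs = [sift.detectAndCompute(img, None)[1] for img in images]
    stacked = np.vstack([d for d in descs if d is not None])
    return KMeans(n_clusters=k, n_init=10).fit(stacked)

def bow_vector(image, vocabulary):
    """Steps 3-4: quantize each descriptor to its nearest word, count frequencies."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(image, None)
    words = vocabulary.predict(desc)                 # nearest visual word per descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters)
    return hist / hist.sum()                         # normalized word frequencies
```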
Applications
Object detection/recognition/search

• Sivic and Zisserman, 2003
• Lowe 2002
• Rothganger et al. 2003

Object detection/recognition

David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004

Application: Image Panoramas

• Procedure:
‒ Detect feature points in both images
‒ Find corresponding pairs
‒ Use these pairs to align the images
Slide credit: Darya Frolova, Denis Simakov
Automatic mosaicing

http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html

Wide baseline stereo

[Image from T. Tuytelaars ECCV 2006 tutorial]

CBIR (Content-based image retrieval) / CBIR: partial retrieval

Source: http://www-rocq.inria.fr/imedia
CBIR: BoW with SIFT + histogram

• Offline: from the images stored in the DB (20 images/group), extract the SIFT feature set, build the visual dictionary, then compute the weight vector for each image → a database describing the SIFT image features as weight vectors (~60 to ~80 images/group)
• Test: 5 images/group are used as queries
Source: Graduation thesis – Phạm Xuân Trường K52 - BK

CBIR: BoW with SIFT + histogram

Source: Graduation thesis – Phạm Xuân Trường K52 - BK

References

• Lectures 5, 6: CS131 - Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab
• Vision par Ordinateur, Alain Boucher, IFI


Thank you for your attention!