Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Image classification
• Given bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Discriminative methods
• Learn a decision rule (a classifier) assigning bag-of-features representations of images to different classes
[Figure: a decision boundary in feature space separating zebra from non-zebra representations]
Generative methods
• Model the probability of a bag of features given a class
Classification
• Assign an input vector to one of two or more classes
• Any decision rule divides the input space into decision regions separated by decision boundaries
Nearest Neighbor Classifier
• Assign each test example the label of its nearest training example, or take a majority vote over the k nearest (k = 5 in the original figure)
Source: D. Lowe
Functions for comparing histograms
• L1 distance: $D(h_1, h_2) = \sum_{i=1}^{N} |h_1(i) - h_2(i)|$
• χ² distance: $D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}$
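As a concrete illustration, here is a minimal NumPy sketch (function names are my own) that plugs either distance into a brute-force k-nearest-neighbor classifier over bag-of-features histograms:

```python
import numpy as np

def l1_distance(h1, h2):
    # D(h1, h2) = sum_i |h1(i) - h2(i)|
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    # D(h1, h2) = sum_i (h1(i) - h2(i))^2 / (h1(i) + h2(i))
    # eps guards against bins that are empty in both histograms
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()

def knn_classify(query, train_hists, train_labels, k=5, dist=chi2_distance):
    # Majority vote among the k training histograms closest to the query.
    dists = np.array([dist(query, h) for h in train_hists])
    nearest = np.asarray(train_labels)[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```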
Linear classifiers
• Find a linear function (hyperplane) that separates the two classes:
  $x_i$ positive: $x_i \cdot w + b \geq 0$
  $x_i$ negative: $x_i \cdot w + b < 0$
• Which hyperplane is best?
Support vector machines
• Find the hyperplane that maximizes the margin between the positive and negative examples
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
• Constraints: $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \geq 1$; $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \leq -1$
• For support vectors, $x_i \cdot w + b = \pm 1$, so the margin is $2/\|w\|$
Finding the maximum margin hyperplane
1. Maximize margin 2/||w||
2. Correctly classify all training data:
  $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \geq 1$
  $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \leq -1$
• Equivalently, a quadratic optimization problem: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i + b) \geq 1$
• Solution: $w = \sum_i \alpha_i y_i x_i$
  The learned weight vector is a linear combination of the support vectors, i.e. the training examples with $\alpha_i > 0$
• Solution for the bias: $b = y_i - w \cdot x_i$ for any support vector
• Nonlinear SVMs: map the data to a higher-dimensional feature space, $\Phi: x \rightarrow \varphi(x)$, and replace dot products with a kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$
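As a sketch of how the solution looks in practice (assuming scikit-learn; the toy data is made up), one can train a linear SVM and recover $w$ and $b$ from the dual solution, since SVC stores exactly the products $\alpha_i y_i$ for the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates a hard margin

# w = sum_i alpha_i y_i x_i   (dual_coef_ holds alpha_i * y_i)
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
# b = y_i - w . x_i for any support vector
i = clf.support_[0]
b = y[i] - w @ X[i]
print(w, b, clf.intercept_[0])  # b should match intercept_ closely
```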
Kernels for bags of features
• Histogram intersection kernel: $I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$
J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
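A short sketch (assuming scikit-learn, which accepts a callable kernel; the helper name is my own) of the histogram intersection kernel as a Gram-matrix function for an SVM:

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection(H1, H2):
    # Gram matrix of I(h1, h2) = sum_i min(h1(i), h2(i)).
    # H1: (n1, N) histograms, H2: (n2, N) histograms -> (n1, n2) matrix.
    return np.minimum(H1[:, None, :], H2[None, :, :]).sum(axis=2)

# SVC calls the kernel with two arrays of samples and expects the
# matrix of pairwise kernel values:
clf = SVC(kernel=histogram_intersection)
# clf.fit(train_hists, train_labels); clf.predict(test_hists)
```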
Pyramid match kernel
• Fast approximation of the Earth Mover's Distance
• Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic)
• Approximates the optimal partial matching between sets of features
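A simplified sketch of the idea for 1-D feature values in $[0, 1)$ (the actual kernel bins multi-dimensional features; the $1/2^{\text{level}}$ weights discount matches first found at coarser resolutions):

```python
import numpy as np

def pyramid_match(x, y, num_levels=4):
    # x, y: 1-D arrays of feature values scaled to [0, 1).
    # Level 0 is the finest grid (2**num_levels bins); each level halves it.
    prev_inter = 0.0
    k = 0.0
    for level in range(num_levels + 1):
        bins = 2 ** (num_levels - level)
        hx, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
        hy, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
        inter = np.minimum(hx, hy).sum()        # matches at this resolution
        k += (inter - prev_inter) / 2 ** level  # new matches, discounted
        prev_inter = inter
    return k
```

Each level costs time linear in the number of features, which is where the linear-versus-cubic advantage over exact matching comes from.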
SVM cons:
• No “direct” multi-class SVM; multiple two-class SVMs must be combined (e.g., one versus all)
• Computation and memory:
  – At training time, the matrix of kernel values for every pair of examples must be computed
  – Learning can take a very long time for large-scale problems
Generative methods
• Model the probability distribution that produced a given bag of features
• We will cover two models, both inspired by text document analysis:
• Naïve Bayes
• Probabilistic Latent Semantic Analysis
The Naïve Bayes model
• Assume that each feature is conditionally independent given the class:
  $p(w_1, \ldots, w_N \mid c) = \prod_{i=1}^{N} p(w_i \mid c)$
• Graphical model: a class node $c$ with a plate over the $N$ word nodes $w$
Csurka et al. 2004
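Since the independence assumption makes the bag of features a per-class multinomial draw, classification reduces to multinomial Naïve Bayes over codeword counts. A minimal sketch (assuming scikit-learn; the toy counts are made up):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows are per-image codeword histograms (hypothetical counts, 3 codewords).
train_hists = np.array([[8, 1, 1], [7, 2, 1], [1, 8, 1], [0, 9, 1]])
train_labels = np.array(["zebra", "zebra", "grass", "grass"])

# p(w_1, ..., w_N | c) = prod_i p(w_i | c): prediction maximizes
# log p(c) + sum_i count(w_i) * log p(w_i | c).
clf = MultinomialNB().fit(train_hists, train_labels)
print(clf.predict([[6, 2, 2]]))  # -> ['zebra']
```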
Probabilistic Latent Semantic Analysis
[Figure: an image decomposed into “visual topics”: zebra, grass, tree]
T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
• Each new image is modeled as a mixture of topics: new image $= \alpha_1 \cdot$ zebra $+ \alpha_2 \cdot$ grass $+ \alpha_3 \cdot$ tree
• Graphical model: $d \rightarrow z \rightarrow w$ (document, latent topic, word)
• $p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$
“face”
p(zk|dj)
words
topics
p(wi|dj) = p(wi|zk)
M … number of codewords
N … number of images
Recognition
• Finding the most likely topic (class) for an image:
  $z^{*} = \arg\max_z p(z \mid d)$
• Finding the most likely topic for a visual word in a given image:
  $z^{*} = \arg\max_z p(z \mid w, d) = \arg\max_z \dfrac{p(w \mid z)\, p(z \mid d)}{\sum_{z'} p(w \mid z')\, p(z' \mid d)}$
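For completeness, a compact EM sketch for fitting the pLSA factors (a simplified illustration, not Hofmann's implementation; variable names are my own). The E-step computes $p(z \mid w, d)$ exactly as in the recognition formula above, and the M-step re-estimates $p(w \mid z)$ and $p(z \mid d)$ from the expected counts:

```python
import numpy as np

def plsa(counts, K, iters=100, seed=0):
    # counts: (M, N) codeword-by-image count matrix.
    M, N = counts.shape
    rng = np.random.default_rng(seed)
    p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(axis=0)  # p(w | z)
    p_z_d = rng.random((K, N)); p_z_d /= p_z_d.sum(axis=0)  # p(z | d)
    for _ in range(iters):
        # E-step: p(z | w, d) proportional to p(w | z) p(z | d), shape (M, N, K)
        joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: weight posteriors by the observed counts n(w, d)
        expected = counts[:, :, None] * post
        p_w_z = expected.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=0) + 1e-12
        p_z_d = expected.sum(axis=0).T
        p_z_d /= p_z_d.sum(axis=0) + 1e-12
    return p_w_z, p_z_d

# Recognition: the most likely topic for image j is p_z_d[:, j].argmax().
```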
Topic discovery in images