f([image of an apple]) = “apple”
f([image of a tomato]) = “tomato”
f([image of a cow]) = “cow”
Slide credit: L. Lazebnik
The machine learning framework
y = f(x), where y is the output (prediction), f is the prediction function, and x is the image feature.
Testing
[Figure: test image → features → learned model → prediction.]
Slide credit: D. Hoiem and L. Lazebnik
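To make the training/testing pipeline concrete, here is a minimal sketch (not from the slides), assuming scikit-learn and NumPy; the toy images, the labels, and the extract_features helper are hypothetical stand-ins, using raw pixels as the feature (the first option in the list below).

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Stand-in data: twenty 8x8 "images", ten per class, shifted apart in intensity.
train_images = rng.normal(size=(20, 8, 8)) + np.repeat([0.0, 2.0], 10)[:, None, None]
train_labels = np.repeat([0, 1], 10)
test_image = rng.normal(size=(8, 8)) + 2.0

def extract_features(image):
    # Hypothetical feature extractor; here, just the raw pixels flattened.
    return np.asarray(image, dtype=float).ravel()

# Training: labeled images -> features -> learned model f
X_train = np.stack([extract_features(im) for im in train_images])
model = LinearSVC().fit(X_train, train_labels)

# Testing: new image -> features -> prediction y = f(x)
print(model.predict(extract_features(test_image)[None, :]))
```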
Features
• Raw pixels
• Histograms
• GIST descriptors
• …
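As one concrete option from the list above, a histogram feature might look like the following sketch (my illustration; the helper name and the 16-bin choice are assumptions, not from the slides):

```python
import numpy as np

def histogram_feature(image, bins=16):
    # Intensity histogram over pixel values assumed to lie in [0, 1],
    # normalized so the bins sum to 1.
    counts, _ = np.histogram(np.asarray(image, dtype=float).ravel(),
                             bins=bins, range=(0.0, 1.0))
    return counts / max(counts.sum(), 1)

image = np.random.default_rng(0).random((32, 32))  # stand-in grayscale image
print(histogram_feature(image))
```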
[Figure: training examples from class 1 and class 2 in a feature space, with a test example to be classified.]
f(x) = sgn(w · x + b)
[Figure: a test image with the prediction “Contains a motorbike”.]
Sources of test error:
• Unavoidable error
• Error due to incorrect assumptions (underfitting)
• Error due to variance of training samples (overfitting)
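A small illustration of these regimes (mine, not from the slides): fitting polynomials of increasing degree to noisy 1-D data, where too low a degree underfits (error from incorrect assumptions) and too high a degree overfits (error from the variance of the training samples).

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(3 * x)
x_train = np.sort(rng.uniform(-1, 1, 15))
y_train = true_f(x_train) + 0.1 * rng.normal(size=x_train.size)
x_test = np.linspace(-1, 1, 200)

for degree in (1, 3, 12):  # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2)
    print(f"degree {degree:2d}: train {train_err:.4f}  test {test_err:.4f}")
```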
[Figure: error vs. number of training examples, showing the training error curve, the test error curve, and the generalization error (the gap between them).]
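Curves like these can be reproduced on synthetic data with scikit-learn's learning_curve (a sketch, with an arbitrary toy dataset and classifier); typically the test error falls and the training error rises toward it as the number of training examples grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
sizes, train_scores, test_scores = learning_curve(
    LinearSVC(), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# Convert accuracy scores to errors; the gap is the generalization error.
train_err = 1.0 - train_scores.mean(axis=1)
test_err = 1.0 - test_scores.mean(axis=1)
for n, tr, te in zip(sizes, train_err, test_err):
    print(f"n={n:3d}  train error {tr:.3f}  test error {te:.3f}")
```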
Slide credit: D. Hoiem
How to reduce variance?
1-nearest neighbor
[Figure: two classes of training points (x and o) and a query point (+) in a 2-D feature space with axes x1 and x2; the query takes the label of its single nearest neighbor.]
3-nearest neighbor
[Figure: the same 2-D data; the query point (+) is labeled by a majority vote over its 3 nearest neighbors.]
5-nearest neighbor
[Figure: the same 2-D data; the query point (+) is labeled by a majority vote over its 5 nearest neighbors.]
Using K-NN
[Figure: two classes of training points (x and o) in a 2-D feature space with axes x1 and x2.]
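A minimal k-NN sketch in NumPy (my own illustration, assuming Euclidean distance and a majority vote); varying k between 1, 3, and 5 is exactly the knob shown in the figures above, and increasing k smooths the decision boundary, which is one answer to the variance question posed earlier.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Assign the query the majority label among its k nearest
    # training points, using Euclidean distance in feature space.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy two-class data in a 2-D feature space (x1, x2), like the figures above.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class "o"
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])  # class "x"
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.8, 1.0])
for k in (1, 3, 5):
    print(k, knn_predict(X, y, query, k=k))
```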
Classifiers: Linear SVM
[Figure: two linearly separable classes (x and o) in a 2-D feature space with axes x1 and x2.]
• Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
[Figure: the two classes (x and o) again, with a linear decision boundary between them.]
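A minimal sketch of fitting such a linear function with scikit-learn's LinearSVC (my illustration; the toy points mirror the two-class figures and are not from the slides), then evaluating f(x) = sgn(w · x + b) by hand:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 2-D data: class "o" -> -1, class "x" -> +1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [1.0, 1.0], [1.2, 0.9], [0.9, 1.2]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = LinearSVC(C=1.0).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

x_test = np.array([0.9, 1.1])
print(np.sign(w @ x_test + b))       # f(x) = sgn(w . x + b)
print(svm.predict(x_test[None, :]))  # same decision via the library
```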
Nonlinear SVMs
• Datasets that are linearly separable work out great:
[Figure: 1-D points separable by a single threshold on x.]
• For data that are not linearly separable, map the inputs to a higher-dimensional feature space: Φ: x → φ(x)
The kernel trick: the decision function involves the lifted data only through dot products, so it can be written with a kernel function K(x, y) = φ(x) · φ(y):
$\sum_i \alpha_i y_i \, \varphi(x_i) \cdot \varphi(x) + b = \sum_i \alpha_i y_i \, K(x_i, x) + b$
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining
and Knowledge Discovery, 1998
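In practice the trick means the SVM only ever needs the kernel values K(x_i, x_j); for instance, scikit-learn's SVC accepts a callable that returns this kernel matrix. A sketch (mine) with an assumed quadratic kernel K(x, y) = (x · y)^2 on a toy dataset that is not linearly separable in the input space:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

def quadratic_kernel(A, B):
    # K(x, y) = (x . y)^2, computed for every pair of rows in A and B.
    return (A @ B.T) ** 2

clf = SVC(kernel=quadratic_kernel).fit(X, y)
print(clf.score(X, y))  # close to 1.0 on this toy set
```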
Nonlinear kernel: Example
• Consider the mapping $\varphi(x) = (x, x^2)$
[Figure: 1-D points lifted onto the parabola $(x, x^2)$ in the plane.]
$\varphi(x) \cdot \varphi(y) = (x, x^2) \cdot (y, y^2) = xy + x^2 y^2$
$K(x, y) = xy + x^2 y^2$
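A two-line check (mine) that this kernel really equals the dot product of the lifted features:

```python
import numpy as np

phi = lambda x: np.array([x, x**2])   # the mapping phi(x) = (x, x^2)
K = lambda x, y: x * y + x**2 * y**2  # the kernel, no lifting needed

x, y = 0.7, -1.3
print(np.dot(phi(x), phi(y)), K(x, y))  # prints the same value twice
```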
Kernels for bags of features
• Histogram intersection kernel:
$I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$
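A direct sketch of this kernel (mine), with two made-up normalized histograms:

```python
import numpy as np

def histogram_intersection(h1, h2):
    # I(h1, h2) = sum_i min(h1(i), h2(i))
    return np.minimum(h1, h2).sum()

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.4, 0.4, 0.2])
print(histogram_intersection(h1, h2))  # 0.2 + 0.4 + 0.2 = 0.8
```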
• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
• Computation, memory
– During training, must compute the matrix of kernel values for every pair of examples (sketched after this list)
– Learning can take a very long time for large-scale problems
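The "matrix of kernel values for every pair of examples" is the n × n Gram matrix, which is what makes training costly for large n; a sketch (mine) building it with the histogram intersection kernel above:

```python
import numpy as np

def gram_matrix(H, kernel):
    # n x n matrix of kernel values between every pair of examples;
    # both memory and computation grow quadratically with n.
    n = len(H)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = kernel(H[i], H[j])
    return G

H = np.random.default_rng(0).random((5, 8))  # 5 stand-in histograms of 8 bins
G = gram_matrix(H, lambda a, b: np.minimum(a, b).sum())
print(G.shape)  # (5, 5)
```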
What to remember about classifiers
• General
– Tom Mitchell, Machine Learning, McGraw Hill, 1997
– Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
• Adaboost
– Friedman, Hastie, and Tibshirani, “Additive logistic regression: a statistical view of boosting”, Annals of Statistics, 2000
• SVMs
– http://www.support-vector.net/icml-tutorial.pdf