
CS484: Introduction to Computer Vision

Machine Learning for Computer Vision


(Szeliski: 3, 4, 10)

Min H. Kim
KAIST School of Computing



MACHINE LEARNING FOR
COMPUTER VISION



Outline
• Machine Learning for Computer Vision
– Machine learning overview
– Supervised machine learning framework
– Generalization error



Machine Learning
• Learn from and make predictions on data.
• Arguably the greatest export from computing to other scientific fields.
• Statisticians might disagree with Computer Science on the true origins…



ML for Computer Vision
• Face Recognition
• Object Classification
• Scene Segmentation



UNSUPERVISED
MACHINE LEARNING
Acknowledgment: Many slides from James Hays, Derek Hoiem and Grauman & Leibe 2008 AAAI Tutorial
Dimensionality Reduction
• So we have decided we need fewer dimensions
• But which dimensions? There are infinitely many choices
• The dimension of the greatest variability!
• Why?

Slide credit: Shin Yoo
Dimensionality Reduction

• PCA, ICA, LLE, Isomap, etc.
• Principal component analysis (PCA)
– Creates a basis where the axes represent the dimensions of variance, ordered from high to low.
– Finds correlations in the data dimensions to produce the best possible lower-dimensional representation based on linear projections.



Dimension of Greatest Variability
• Example: reduce 2D data to 1D
[Figure: a 2D point cloud with two candidate projection directions, d1 and d2]
• Picking the direction d with the maximum variability reduces cases where points are close to each other in d but far away in the original dimensions
Slide credit: Shin Yoo
Principal Component Analysis (PCA)
• Defines a set of principal components, i.e., directions of different degrees of variance
– 1st: direction of the greatest variance in the data
– 2nd: direction of the second greatest variance in the data, perpendicular to the 1st one
– …
• Find n ≪ d components (dimensions), and project every data point onto these dimensions
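A minimal NumPy sketch of this idea (an illustration, not the lecture's code; the data `X` is a random stand-in):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance.
    X: (num_samples, num_dims) data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort: variance high to low
    W = eigvecs[:, order[:n_components]]     # top-n principal directions
    return mean, W

X = np.random.randn(200, 10)                 # stand-in data
mean, W = pca(X, n_components=2)
Z = (X - mean) @ W                           # project every point to 2D
```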

Eigenfaces
The AT&T face database (formerly the ORL database): 10 pictures each of 40 subjects



Eigenfaces

[Figure: the mean face and the basis of variance (eigenvectors)]

M. Turk and A. Pentland (1991). "Face recognition using eigenfaces". Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591.

Slide credit: R.P.W. Duin
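A hedged sketch of the eigenfaces computation (the random `faces` array is a placeholder; real eigenfaces would use the actual AT&T face images):

```python
import numpy as np

faces = np.random.rand(400, 64 * 64)   # placeholder: 400 flattened face images

mean_face = faces.mean(axis=0)
A = faces - mean_face                  # centered faces, one per row
# SVD of the centered stack: rows of Vt are the eigenfaces
# (the basis of variance), ordered from high variance to low.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
eigenfaces = Vt[:50]                   # keep the top 50 directions

# A face is then represented by 50 projection coefficients.
coeffs = (faces[0] - mean_face) @ eigenfaces.T
reconstruction = mean_face + coeffs @ eigenfaces
```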
Clustering: image segmentation
Goal: break up the image into meaningful or perceptually similar regions



Segmentation for feature support or efficiency

[Figure: 50x50 pixel patches vs. irregular superpixels; Felzenszwalb and Huttenlocher 2004, Shi and Malik 2001, Hoiem et al. 2005, Mori 2005]

Superpixels!

Slide credit: Derek Hoiem
Segmentation as a result



GrabCut, Rother et al. 2004
Types of segmentations

• Oversegmentation
• Undersegmentation
• Hierarchical segmentations
Clustering

Group together similar ‘points’ and represent them with a single token.

Key challenges:
1) What makes two points/images/patches similar?
2) How do we compute an overall grouping from pairwise similarities?



Slide credit: Derek Hoiem
Why do we cluster?
• Summarizing data
– Look at large amounts of data
– Patch-based compression or denoising
– Represent a large continuous vector with the cluster
number
• Counting
– Histograms of texture, color, SIFT vectors
• Segmentation
– Separate the image into different regions
• Prediction
– Images in the same cluster may have the same labels

Slide credit: Derek Hoiem
How do we cluster?
• K-means (a sketch follows below)
– Iteratively re-assign points to the nearest cluster center
• Mean-shift clustering
– Estimate the modes of the probability density function
• Agglomerative clustering, spectral clustering, etc.
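A plain k-means sketch (illustrative; initialization and convergence handling are simplified):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Iteratively re-assign points to the nearest center,
    then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest-center assignment
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # assignments have stabilized
            break
        centers = new_centers
    return labels, centers
```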



SUPERVISED
MACHINE LEARNING
Acknowledgment: Many slides from James Hays, Derek Hoiem and Grauman & Leibe 2008 AAAI Tutorial
Data, data, data!

• Halevy, Norvig & Pereira, “The Unreasonable Effectiveness of Data” (IEEE Intelligent Systems, 2009)
– “... invariably, simple models and a lot of data trump more elaborate models based on less data”
ImageNet

• Images for each category of WordNet
• 1,000 classes
• 1.2 million training images
• 100k test images
• Evaluated by top-5 error (computed as sketched below)
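The top-5 error metric can be computed roughly as below (a sketch; `scores` and `labels` are hypothetical arrays, not part of the slides):

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (N, num_classes) classifier scores; labels: (N,) true class ids.
    A prediction counts as correct if the true class is among
    the five highest-scoring classes for that image."""
    top5 = np.argsort(scores, axis=1)[:, -5:]
    correct = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - correct.mean()
```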
ImageNet Competition

• Krizhevsky et al., 2012
• Google, Microsoft, 2015
– Beat the best human score in the ImageNet challenge.

Slide credit: NVIDIA
Classification vs. Regression
• Function approximation
– Predictive modeling is the problem of developing a model from historical data to make predictions on new data where we do not have the answer.
• Classification predictive modeling
– The task of approximating a mapping function f from input variables X to discrete output variables y.
• Regression predictive modeling
– The task of approximating a mapping function f from input variables X to a continuous output variable y (the two are contrasted in the sketch below).
Slide credit: Jason Brownlee
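A toy contrast of the two settings (a sketch assuming scikit-learn is available; the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.random.randn(100, 3)                   # synthetic input variables

# Classification: discrete output variable.
y_class = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
class_pred = clf.predict(X[:2])               # discrete labels, e.g. [1, 0]

# Regression: continuous output variable.
y_reg = 2.0 * X[:, 0] + 0.1 * np.random.randn(100)
reg = LinearRegression().fit(X, y_reg)
reg_pred = reg.predict(X[:2])                 # real-valued predictions
```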
SUPERVISED MACHINE LEARNING
FRAMEWORK



Dataset split

• Training images: train the classifier
• Validation images: measure error; tune model hyperparameters
• Testing images: labels kept secret; measure error

Repeated random train/validation splits = cross-validation
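A minimal sketch of such a split (index bookkeeping only; the 60/20/20 proportions are an assumption, not from the slides):

```python
import numpy as np

N = 10_000                       # hypothetical number of labeled images
rng = np.random.default_rng(0)
idx = rng.permutation(N)

train = idx[:6_000]              # train the classifier
val   = idx[6_000:8_000]         # measure error, tune hyperparameters
test  = idx[8_000:]              # touch once, report final error
```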


Supervised Learning Framework
[Diagram]
Training: training images → image features (+ training labels) → training → learned classifier
Testing: test image → image features → apply classifier → prediction
Slide credit: D. Hoiem and L. Lazebnik
Features

• Raw pixels
• Histograms
• Templates
• SIFT and related descriptors
– GIST
– ORB
– HOG, …



Slide credit: L. Lazebnik
General Principles of Representation
• Coverage
– Ensure that all relevant information is captured
• Concision
– Minimize the number of features without sacrificing coverage
• Directness
– Ideal features are independently useful for prediction



Supervised Learning Framework
[Diagram repeated: the supervised training/testing pipeline shown earlier]
Slide credit: D. Hoiem and L. Lazebnik
Recognition task and supervision
• Label: images in the training set must be annotated with the “correct answer” that the model is expected to produce
[Example image labeled “contains a motorbike”]



Slide credit: L. Lazebnik
Spectrum of supervision (label)
From less to more supervision:
unsupervised → “weakly” supervised (e.g., ImageNet) → fully supervised (e.g., MS COCO)

‘Semi-supervised’: a small partial labeling. The boundaries are fuzzy; the definition depends on the task.


Slide credit: L. Lazebnik
Good training example?



Good labels?



http://mscoco.org/explore/?id=134918
Google guesses from the 1st caption
Google search: an elephant standing on top of a basket being held by a woman



Supervised Learning Framework
[Diagram repeated: the supervised training/testing pipeline shown earlier]
Slide credit: D. Hoiem and L. Lazebnik
The machine learning framework
• Apply a prediction function to a feature representation of the image to get the desired output:

f([apple image]) = “apple”
f([tomato image]) = “tomato”
f([cow image]) = “cow”
Slide credit: L. Lazebnik
The machine learning framework

f(x) = y
where f is the prediction function (or classifier), x is the image feature, and y is the output (label).

Training: given a training set of labeled examples {(x_1, y_1), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set.

Testing: apply f to an unseen test example x_u and output the predicted value y_u = f(x_u) to classify x_u.
Slide credit: L. Lazebnik
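One minimal choice of f is a nearest-neighbor classifier; a sketch with stand-in features (not the lecture's prescribed classifier):

```python
import numpy as np

def train_1nn(X_train, y_train):
    """'Training' for 1-NN just memorizes the labeled examples;
    f(x) returns the label of the closest training feature."""
    def f(x):
        i = np.linalg.norm(X_train - x, axis=1).argmin()
        return y_train[i]
    return f

X_train = np.random.randn(50, 128)      # stand-in image features x_1..x_N
y_train = np.random.randint(0, 3, 50)   # stand-in labels y_1..y_N
f = train_1nn(X_train, y_train)

x_u = np.random.randn(128)              # unseen test example
y_u = f(x_u)                            # predicted label y_u = f(x_u)
```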
Supervised Learning Framework
[Diagram repeated: the supervised training/testing pipeline shown earlier]
Slide credit: D. Hoiem and L. Lazebnik
Supervised Learning Framework
Features and distance measures define visual similarity.

Training labels dictate whether examples are the same or different.

Classifiers learn weights (parameters) over features and distance measures so that visual similarity predicts label similarity.



Least squares line fitting
• Suppose we want to find the relationship between weight and height measurements
• How can we define a linear relationship between these two measurements?
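A least-squares line fit for this weight-vs-height question (the numbers are toy data, assumed for illustration):

```python
import numpy as np

height = np.array([150., 160., 165., 170., 180., 185.])   # cm (toy data)
weight = np.array([50., 56., 61., 66., 75., 80.])          # kg (toy data)

# Model weight ≈ m * height + b; solve min ||A p - weight||^2 over p = [m, b].
A = np.stack([height, np.ones_like(height)], axis=1)
(m, b), *_ = np.linalg.lstsq(A, weight, rcond=None)

predicted = m * 172.0 + b    # predicted weight for a 172 cm person
```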



Linear regression
• Least squares
• Total least squares
• Maximum likelihood
• Random sample consensus (RANSAC); a sketch follows below
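A bare-bones RANSAC line fit (the `thresh` and `n_iters` parameters are illustrative choices, not values from the slides):

```python
import numpy as np

def ransac_line(x, y, n_iters=500, thresh=1.0, seed=0):
    """Repeatedly fit y = m*x + b to a random pair of points and
    keep the model supported by the most inliers."""
    rng = np.random.default_rng(seed)
    best_count, best_model = 0, (0.0, 0.0)
    for _ in range(n_iters):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # degenerate (vertical) pair, skip
        m = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - m * x[i]
        inliers = np.abs(y - (m * x + b)) < thresh
        if inliers.sum() > best_count:
            best_count, best_model = int(inliers.sum()), (m, b)
    return best_model
```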



Generalization

[Figure: training set (labels known) vs. test set (labels unknown)]

How well does a learned model generalize from the data it was trained on to a new test set?
Slide credit: L. Lazebnik
