
Discussion 2 – Image Classification

Overview
Image Classification: a core task in Computer Vision.

Given an input image and a fixed set of category labels (dog, cat, truck, plane, ...), assign one label to the image, e.g. "cat".

Slide credit: Stanford CS231n


The Problem: Semantic Gap

What the computer sees

An image is just a tensor of integers in [0, 255], e.g. 800 x 600 x 3 (RGB).

Challenges: Viewpoint variation

Challenges: Background Clutter

Challenges: Illumination

Challenges: Occlusion

Challenges: Deformation

Challenges: Intraclass Variation

An image classifier

• Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.
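The CS231n slides make this point with a stub; a minimal sketch (the function name follows the slide, and the body is deliberately a placeholder because nobody knows how to write it by hand):

```python
def classify_image(image):
    """Map raw pixel values to a class label.

    A cat can appear at any position, scale, pose, or lighting,
    so no fixed rule over raw pixel values works; there is nothing
    sensible to hard-code here.
    """
    raise NotImplementedError("no known hard-coded algorithm")
```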

Attempts : Edge Detectors

(Figure panels: find edges; find corners.)

Machine Learning: Data-Driven Approach
1. Collect a dataset of images and labels (an example training dataset is shown on the slide).
2. Use machine learning algorithms to train a classifier.
3. Evaluate the classifier on new images.
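The slides summarize this pipeline as a two-function API; a minimal sketch (function names follow CS231n, bodies are placeholders):

```python
def train(images, labels):
    """Machine learning happens here: build a model from labeled data."""
    model = ...  # e.g. memorize the data, or fit parameters
    return model


def predict(model, test_images):
    """Apply the trained model to predict labels for unseen images."""
    test_labels = ...  # score each test image with the model
    return test_labels
```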

Discussion 2 – Image Classification
Nearest Neighbor Classifier
Simple Classifier: Nearest Neighbor

• Train: memorize all the training data and labels.
• Predict: output the label of the most similar training image.
Simple Classifier: Nearest Neighbor
Training images: Bird, Cat, Deer, Car, Plane.

Given a test image:

• Compute a distance metric between the test image and every training image.
• Predict the label of the training image with the smallest distance.



Example Dataset: CIFAR 10

Distance Metric
L1 (Manhattan) distance:

$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

Pixelwise absolute value differences between a test image and a training image:

Test image:
56  32  10  18
90  23 128 133
24  26 178 200
 2   0 255 220

Training image:
10  20  24  17
 8  10  89 100
12  16 178 170
 4  32 233 112

Absolute differences:
46  12  14   1
82  13  39  33
12  10   0  30
 2  32  22 108

Adding all entries gives d1 = 456.
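A quick NumPy check of this example (array values copied from the slide):

```python
import numpy as np

test_img = np.array([[56, 32, 10, 18],
                     [90, 23, 128, 133],
                     [24, 26, 178, 200],
                     [2, 0, 255, 220]])

train_img = np.array([[10, 20, 24, 17],
                      [8, 10, 89, 100],
                      [12, 16, 178, 170],
                      [4, 32, 233, 112]])

# L1 distance: sum of pixelwise absolute differences
d1 = np.sum(np.abs(test_img - train_img))
print(d1)  # 456
```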



Nearest Neighbor classifier

• Train: memorize the training data.
• Predict: for each test image, find the closest training image and output its label.
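A runnable NumPy sketch of this classifier, following the structure of the CS231n course-notes version (L1 distance; X is N x D with one flattened image per row):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        """X: N x D array, each row a flattened image. y: N labels.
        Training just memorizes the data."""
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """For each test row, return the label of the closest
        training row under the L1 distance."""
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test image i to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            y_pred[i] = self.ytr[np.argmin(distances)]
        return y_pred
```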



Nearest Neighbor classifier

Q: With N training examples, how fast are training and prediction?

A: Training is O(1), prediction is O(N).

This is backwards: we want classifiers that are fast at prediction; slow training is acceptable.



K Nearest Neighbor

(Figure: decision boundaries for K = 1, K = 3, and K = 5.)

• Instead of copying the label of the single nearest neighbor, take a majority vote over the K closest points (see the sketch below).
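A minimal sketch of the K-nearest-neighbor prediction rule (a hypothetical helper, assuming the L1 distance; Xtr is N x D, ytr holds N labels):

```python
import numpy as np

def predict_knn(Xtr, ytr, x_test, k=3):
    """Majority vote over the k training images closest to x_test (L1)."""
    distances = np.sum(np.abs(Xtr - x_test), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of k closest rows
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # most common label (ties: first)
```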
K Nearest Neighbor: Distance Metric
L1 (Manhattan) distance:

$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

L2 (Euclidean) distance:

$d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$
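Both metrics in NumPy (a sketch; I1 and I2 are same-shape arrays):

```python
import numpy as np

def l1_distance(I1, I2):
    # sum of absolute pixelwise differences
    return np.sum(np.abs(I1 - I2))

def l2_distance(I1, I2):
    # square root of the sum of squared pixelwise differences
    return np.sqrt(np.sum((I1 - I2) ** 2))
```

One practical difference: L1 changes if you rotate the coordinate frame, while L2 does not, which is part of why the choice of metric is a genuine hyperparameter.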

Nearest Neighbor: Hyperparameters
• What is the best value of k to use?
• What is the best distance metric to use?

These are hyperparameters: choices about the algorithm itself that we set rather than learn from data.

They are very problem-dependent: you must try them out and see what works best.

Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the training data.
BAD: K = 1 always works perfectly on the training data.

Idea #2: Split the data into train and test; choose hyperparameters that work best on the test data.
BAD: no idea how the algorithm will perform on new data.

Idea #3: Split the data into train, validation, and test; choose hyperparameters on the validation set and evaluate once on the test set.
Better!
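A minimal sketch of Idea #3 in NumPy (the function name and split fractions are illustrative assumptions):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve off validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```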


Setting Hyperparameters
Idea #4: Cross-validation: split the training data into folds, try each fold in turn as the validation set, and average the results.

(Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Test, rotating which fold plays the role of validation set.)

Useful for small datasets, but used less frequently in deep learning, where training is expensive.
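A sketch of Idea #4 for choosing K, reusing the hypothetical predict_knn helper sketched earlier:

```python
import numpy as np

def cross_validate(X, y, k_choices, num_folds=5):
    """Average kNN validation accuracy over the folds for each candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            # fold i is validation; the remaining folds are training data
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([predict_knn(X_tr, y_tr, x, k) for x in X_val])
            accs.append(np.mean(preds == y_val))
        accuracies[k] = np.mean(accs)
    return accuracies
```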
k-Nearest Neighbors
Almost never used in practice for image recognition:

1. Very slow at test time.
2. Distance metrics on raw pixels are not informative.

(Figure: an original image next to boxed, shifted, and tinted versions; all three modified images have the same L2 distance to the original.)


k-Nearest Neighbors
Almost never used in practice for image recognition:

3. Curse of dimensionality: covering the space with a uniform grid of training points requires a number of points that grows exponentially with the dimension (dimensions = 1, points = 4; dimensions = 2, points = 4^2; dimensions = 3, points = 4^3).

k-Nearest Neighbors: Summary
• Image classification (data-driven approach):
• Start with a training set of images and labels.
• Predict labels on the test set.
• K-Nearest Neighbors classifier
• Memorizes all training images
• Predicts labels based on the K nearest training examples. Very slow.
• Distance metric and K are hyperparameters
• Choose hyperparameters using the validation set
• Only run on the test set once at the very end.
• Pixel distance is not very informative.
• Curse of Dimensionality
Discussion 2 – Image Classification
Linear Classifier
CIFAR10

• 50,000 training images
• 10,000 test images
• Each image is 32x32x3

Parametric Approach : Linear Classifier
$f(x, W) = Wx$

• Input image x: an array of 32x32x3 numbers (3072 total), flattened into a 3072 x 1 column vector.
• W: a 10 x 3072 matrix of parameters (weights).
• f(x, W): a 10 x 1 vector of class scores, one number per class.
Parametric Approach : Linear Classifier
$f(x, W) = Wx + b$

• x: the flattened image, 3072 x 1.
• W: 10 x 3072 weights.
• b: a 10 x 1 bias vector, independent of the input.
• f(x, W): 10 x 1 class scores.
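A sketch of the score computation (shapes follow the slide; the random parameter values are placeholders, not trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(3072, 1)).astype(np.float64)  # flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.001                  # one row of weights per class
b = rng.standard_normal((10, 1))                             # one bias per class

scores = W @ x + b  # 10 x 1 vector of class scores
print(scores.ravel())
```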
Example: 4 pixels, 3 classes
Flatten the input tensor into a vector:

Input image (2x2):
56 231
24   2

x = [56, 231, 24, 2]^T

W (3 x 4):
0.2  -0.5  0.1   2.0
1.5   1.3  2.1   0.0
0.0   0.25 0.2  -0.3

b = [1.1, 3.2, -1.2]^T

Scores Wx + b:
Cat score:  -96.8
Dog score:  437.9
Ship score:  60.75
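A NumPy check of this example (values copied from the slide; note that each score includes its class's bias):

```python
import numpy as np

x = np.array([56, 231, 24, 2], dtype=np.float64)  # flattened 2x2 image
W = np.array([[0.2, -0.5, 0.1,  2.0],   # cat weights
              [1.5,  1.3, 2.1,  0.0],   # dog weights
              [0.0, 0.25, 0.2, -0.3]])  # ship weights
b = np.array([1.1, 3.2, -1.2])

scores = W @ x + b
print(scores)  # -> [-96.8, 437.9, 60.75] (cat, dog, ship)
```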

Example: 4 pixels, 3 classes
Bias trick: append a constant 1 to the input vector and absorb b into W as an extra column, so the score computation becomes a single matrix-vector product.

x' = [56, 231, 24, 2, 1]^T

W' (3 x 5), with b as the last column:
0.2  -0.5  0.1   2.0  1.1
1.5   1.3  2.1   0.0  3.2
0.0   0.25 0.2  -0.3 -1.2

W'x' gives the same scores: -96.8 (cat), 437.9 (dog), 60.75 (ship).
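The same trick in NumPy, continuing the example above:

```python
import numpy as np

x = np.array([56, 231, 24, 2], dtype=np.float64)
W = np.array([[0.2, -0.5, 0.1,  2.0],
              [1.5,  1.3, 2.1,  0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])

x_aug = np.append(x, 1.0)           # append a constant 1 to the input
W_aug = np.hstack([W, b[:, None]])  # append b as the last column of W

# the single product reproduces Wx + b exactly
assert np.allclose(W_aug @ x_aug, W @ x + b)
```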
Interpreting a Linear Classifier

Reshape each row of W back to the shape of the image: each class gets its own template, and its score is the inner product of that template with the image, plus the bias.

Input image (2x2):
56 231
24   2

Cat:  template [[0.2, -0.5], [0.1, 2.0]],  b = 1.1,  score = -96.8
Dog:  template [[1.5, 1.3], [2.1, 0.0]],   b = 3.2,  score = 437.9
Ship: template [[0.0, 0.25], [0.2, -0.3]], b = -1.2, score = 60.75



Linear Classifier: Geometric Perspective
$f(x, W) = Wx + b$

The image (an array of 32x32x3 numbers, 3072 total) is a point in a 3072-dimensional space. Each class score is a linear function over that space, so the boundary where a class's score is zero is a hyperplane (figure: car, plane, and deer regions in pixel space).
Hard cases for Linear Classifier
Example 1 (not linearly separable):
Class 1: 0 <= L2 distance from the origin <= 2
Class 2: 4 <= L2 distance from the origin <= 6
Concentric regions like these cannot be split by a single hyperplane.

Example 2:
Class 1: red region
Class 2: blue region

Linear Classifier: Three Viewpoints
• Algebraic viewpoint: $f(x, W) = Wx + b$
• Visual viewpoint: one template per class
• Geometric viewpoint: hyperplanes cutting up pixel space

Coming up
Loss function: to quantify how happy we are with the current scores.

Optimization: to find the parameters W that minimize the loss.

