
Discussion 2 – Image Classification

Overview
Image Classification: a core task in Computer Vision.

Given an input image and a fixed set of category labels (dog, cat, truck, plane, ...), assign one label to the image, e.g. "cat".

Slide credit: Stanford CS231n


The Problem: Semantic Gap

What the computer sees

An image is just a tensor of integers in [0, 255], e.g. 800 x 600 x 3 (RGB).

Challenges: Viewpoint variation

Challenges: Background Clutter

Challenges: Illumination

Challenges: Occlusion

Challenges: Deformation

Challenges: Intraclass Variation

An image classifier

• Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.
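The CS231n slides make this point with a stub; a minimal sketch (the function name follows the slide, and the body is deliberately a placeholder because nobody knows how to write it by hand):

```python
def classify_image(image):
    """Map raw pixel values to a class label.

    A cat can appear at any position, scale, pose, or lighting,
    so no fixed rule over raw pixel values works; there is nothing
    sensible to hard-code here.
    """
    raise NotImplementedError("no known hard-coded algorithm")
```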

Attempts : Edge Detectors

(Figure panels: find edges; find corners.)

Machine Learning: Data-Driven Approach
1. Collect a dataset of images and labels (an example training dataset is shown on the slide).
2. Use machine learning algorithms to train a classifier.
3. Evaluate the classifier on new images.
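The slides summarize this pipeline as a two-function API; a minimal sketch (function names follow CS231n, bodies are placeholders):

```python
def train(images, labels):
    """Machine learning happens here: build a model from labeled data."""
    model = ...  # e.g. memorize the data, or fit parameters
    return model


def predict(model, test_images):
    """Apply the trained model to predict labels for unseen images."""
    test_labels = ...  # score each test image with the model
    return test_labels
```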

Discussion 2 – Image Classification
Nearest Neighbor Classifier
Simple Classifier: Nearest Neighbor

• Train: memorize all the training data and labels.
• Predict: output the label of the most similar training image.
Simple Classifier: Nearest Neighbor
Training images: Bird, Cat, Deer, Car, Plane.

Given a test image:

• Compute a distance metric between the test image and every training image.
• Predict the label of the training image with the smallest distance.



Example Dataset: CIFAR 10

Distance Metric
L1 (Manhattan) distance:

$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

Pixelwise absolute value differences between a test image and a training image:

Test image:
56  32  10  18
90  23 128 133
24  26 178 200
 2   0 255 220

Training image:
10  20  24  17
 8  10  89 100
12  16 178 170
 4  32 233 112

Absolute differences:
46  12  14   1
82  13  39  33
12  10   0  30
 2  32  22 108

Adding all entries gives d1 = 456.
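A quick NumPy check of this example (array values copied from the slide):

```python
import numpy as np

test_img = np.array([[56, 32, 10, 18],
                     [90, 23, 128, 133],
                     [24, 26, 178, 200],
                     [2, 0, 255, 220]])

train_img = np.array([[10, 20, 24, 17],
                      [8, 10, 89, 100],
                      [12, 16, 178, 170],
                      [4, 32, 233, 112]])

# L1 distance: sum of pixelwise absolute differences
d1 = np.sum(np.abs(test_img - train_img))
print(d1)  # 456
```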



Nearest Neighbor classifier

• Train: memorize the training data.
• Predict: for each test image, find the closest training image and output its label.
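A runnable NumPy sketch of this classifier, following the structure of the CS231n course-notes version (L1 distance; X is N x D with one flattened image per row):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        """X: N x D array, each row a flattened image. y: N labels.
        Training just memorizes the data."""
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """For each test row, return the label of the closest
        training row under the L1 distance."""
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test image i to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            y_pred[i] = self.ytr[np.argmin(distances)]
        return y_pred
```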



Nearest Neighbor classifier

Q: With N training examples, how fast are training and prediction?

A: Training is O(1), prediction is O(N).

This is backwards: we want classifiers that are fast at prediction; slow training is acceptable.



K Nearest Neighbor

(Figure: decision boundaries for K = 1, K = 3, and K = 5.)

• Instead of copying the label of the single nearest neighbor, take a majority vote over the K closest points (see the sketch below).
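A minimal sketch of the K-nearest-neighbor prediction rule (a hypothetical helper, assuming the L1 distance; Xtr is N x D, ytr holds N labels):

```python
import numpy as np

def predict_knn(Xtr, ytr, x_test, k=3):
    """Majority vote over the k training images closest to x_test (L1)."""
    distances = np.sum(np.abs(Xtr - x_test), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of k closest rows
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # most common label (ties: first)
```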
K Nearest Neighbor: Distance Metric
L1 (Manhattan) distance:

$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

L2 (Euclidean) distance:

$d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$
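Both metrics in NumPy (a sketch; I1 and I2 are same-shape arrays):

```python
import numpy as np

def l1_distance(I1, I2):
    # sum of absolute pixelwise differences
    return np.sum(np.abs(I1 - I2))

def l2_distance(I1, I2):
    # square root of the sum of squared pixelwise differences
    return np.sqrt(np.sum((I1 - I2) ** 2))
```

One practical difference: L1 changes if you rotate the coordinate frame, while L2 does not, which is part of why the choice of metric is a genuine hyperparameter.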

Nearest Neighbor: Hyperparameters
• What is the best value of k to use?
• What is the best distance metric to use?

These are hyperparameters: choices about the algorithm itself that we set rather than learn from data.

They are very problem-dependent: you must try them out and see what works best.

Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the training data.
BAD: K = 1 always works perfectly on the training data.

Idea #2: Split the data into train and test; choose hyperparameters that work best on the test data.
BAD: no idea how the algorithm will perform on new data.

Idea #3: Split the data into train, validation, and test; choose hyperparameters on the validation set and evaluate once on the test set.
Better!
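A minimal sketch of Idea #3 in NumPy (the function name and split fractions are illustrative assumptions):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve off validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```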


Setting Hyperparameters
Idea #4: Cross-validation: split the training data into folds, try each fold in turn as the validation set, and average the results.

(Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Test, rotating which fold plays the role of validation set.)

Useful for small datasets, but used less frequently in deep learning, where training is expensive.
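A sketch of Idea #4 for choosing K, reusing the hypothetical predict_knn helper sketched earlier:

```python
import numpy as np

def cross_validate(X, y, k_choices, num_folds=5):
    """Average kNN validation accuracy over the folds for each candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            # fold i is validation; the remaining folds are training data
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([predict_knn(X_tr, y_tr, x, k) for x in X_val])
            accs.append(np.mean(preds == y_val))
        accuracies[k] = np.mean(accs)
    return accuracies
```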
k-Nearest Neighbors
Almost never used in practice for image recognition:

1. Very slow at test time.
2. Distance metrics on raw pixels are not informative.

(Figure: an original image next to boxed, shifted, and tinted versions; all three modified images have the same L2 distance to the original.)


k-Nearest Neighbors
Almost never used in practice for image recognition:

3. Curse of dimensionality: covering the space with a uniform grid of training points requires a number of points that grows exponentially with the dimension (dimensions = 1, points = 4; dimensions = 2, points = 4^2; dimensions = 3, points = 4^3).

k-Nearest Neighbors: Summary
• Image classification (data-driven approach):
• Start with a training set of images and labels.
• Predict labels on the test set.
• K-Nearest Neighbors classifier
• Memorizes all training images
• Predicts labels based on the K nearest training examples. Very slow.
• Distance metric and K are hyperparameters
• Choose hyperparameters using the validation set
• Only run on the test set once at the very end.
• Pixel distance is not very informative.
• Curse of Dimensionality
Discussion 2 – Image Classification
Linear Classifier
CIFAR10

• 50,000 training images
• 10,000 test images
• Each image is 32x32x3

Parametric Approach : Linear Classifier
$f(x, W) = Wx$

• Input image x: an array of 32x32x3 numbers (3072 total), flattened into a 3072 x 1 column vector.
• W: a 10 x 3072 matrix of parameters (weights).
• f(x, W): a 10 x 1 vector of class scores, one number per class.
Parametric Approach : Linear Classifier
$f(x, W) = Wx + b$

• x: the flattened image, 3072 x 1.
• W: 10 x 3072 weights.
• b: a 10 x 1 bias vector, independent of the input.
• f(x, W): 10 x 1 class scores.
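A sketch of the score computation (shapes follow the slide; the random parameter values are placeholders, not trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(3072, 1)).astype(np.float64)  # flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.001                  # one row of weights per class
b = rng.standard_normal((10, 1))                             # one bias per class

scores = W @ x + b  # 10 x 1 vector of class scores
print(scores.ravel())
```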
Example: 4 pixels, 3 classes
Flatten the input tensor into a vector:

Input image (2x2):
56 231
24   2

x = [56, 231, 24, 2]^T

W (3 x 4):
0.2  -0.5  0.1   2.0
1.5   1.3  2.1   0.0
0.0   0.25 0.2  -0.3

b = [1.1, 3.2, -1.2]^T

Scores Wx + b:
Cat score:  -96.8
Dog score:  437.9
Ship score:  60.75
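A NumPy check of this example (values copied from the slide; note that each score includes its class's bias):

```python
import numpy as np

x = np.array([56, 231, 24, 2], dtype=np.float64)  # flattened 2x2 image
W = np.array([[0.2, -0.5, 0.1,  2.0],   # cat weights
              [1.5,  1.3, 2.1,  0.0],   # dog weights
              [0.0, 0.25, 0.2, -0.3]])  # ship weights
b = np.array([1.1, 3.2, -1.2])

scores = W @ x + b
print(scores)  # -> [-96.8, 437.9, 60.75] (cat, dog, ship)
```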

Example: 4 pixels, 3 classes
Bias trick: append a constant 1 to the input vector and absorb b into W as an extra column, so the score computation becomes a single matrix-vector product.

x' = [56, 231, 24, 2, 1]^T

W' (3 x 5), with b as the last column:
0.2  -0.5  0.1   2.0  1.1
1.5   1.3  2.1   0.0  3.2
0.0   0.25 0.2  -0.3 -1.2

W'x' gives the same scores: -96.8 (cat), 437.9 (dog), 60.75 (ship).
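The same trick in NumPy, continuing the example above:

```python
import numpy as np

x = np.array([56, 231, 24, 2], dtype=np.float64)
W = np.array([[0.2, -0.5, 0.1,  2.0],
              [1.5,  1.3, 2.1,  0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])

x_aug = np.append(x, 1.0)           # append a constant 1 to the input
W_aug = np.hstack([W, b[:, None]])  # append b as the last column of W

# the single product reproduces Wx + b exactly
assert np.allclose(W_aug @ x_aug, W @ x + b)
```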
Interpreting a Linear Classifier

Reshape each row of W back to the shape of the image: each class gets its own template, and its score is the inner product of that template with the image, plus the bias.

Input image (2x2):
56 231
24   2

Cat:  template [[0.2, -0.5], [0.1, 2.0]],  b = 1.1,  score = -96.8
Dog:  template [[1.5, 1.3], [2.1, 0.0]],   b = 3.2,  score = 437.9
Ship: template [[0.0, 0.25], [0.2, -0.3]], b = -1.2, score = 60.75



Linear Classifier: Geometric Perspective
$f(x, W) = Wx + b$

The image (an array of 32x32x3 numbers, 3072 total) is a point in a 3072-dimensional space. Each class score is a linear function over that space, so the boundary where a class's score is zero is a hyperplane (figure: car, plane, and deer regions in pixel space).
Hard cases for Linear Classifier
Example 1 (not linearly separable):
Class 1: 0 <= L2 distance from the origin <= 2
Class 2: 4 <= L2 distance from the origin <= 6
Concentric regions like these cannot be split by a single hyperplane.

Example 2:
Class 1: red region
Class 2: blue region

Linear Classifier: Three Viewpoints
• Algebraic viewpoint: $f(x, W) = Wx + b$
• Visual viewpoint: one template per class
• Geometric viewpoint: hyperplanes cutting up pixel space

Coming up
Loss function: to quantify how happy we are with the current scores.

Optimization: to find the parameters W that minimize the loss.

