
Chapter 1

Introduction to Deep Learning


Exciting Recent Developments


What do these all have in common?


Deep Learning! 😄
https://www.youtube.com/watch?v=0iuCruB1wcs

Google DeepMind

Motivation: ZIP codes

In the 1990s, there was a great increase in documents on paper (mail, checks, books, etc.).

This motivated building a ZIP code recognizer for real U.S. mail for the postal service!


[Image: a handwritten digit, labeled “three”]

How do we know this is a three?

How does a computer know this is a three?


[Figure: the digit as a grid of pixel intensity values (0–255); this is what the computer sees]


Representing digits in the computer

How do we represent digits so that our machine can operate on them?

Represent image colors and intensities in a two-dimensional matrix of numbers (i.e. an image).

0 is white, 255 is black, and numbers in between are shades of gray.

Sometimes the inverse of this scheme is used (0 = black, 255 = white)
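As a minimal illustration of this representation (a sketch assuming NumPy; the numbers are arbitrary):

```python
import numpy as np

# A tiny 4x4 grayscale "image": 0 = white, 255 = black, as in the scheme above
image = np.array([
    [  0,   0,   0,   0],
    [  5, 120, 240,  40],
    [242, 128,   0,   0],
    [255, 240,  10,   0],
], dtype=np.uint8)

print(image.shape)   # (4, 4) -- rows x columns of pixels
print(image[3, 0])   # 255 -> a fully black pixel

# The inverted scheme (0 = black, 255 = white) is just a subtraction
inverted = 255 - image
```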


These numbers are known as pixel values (a grid of discrete samples that make up an image)

[Figure: a handwritten “0” as a grid of pixel values with row and column indices 0–25; the pixel at position [15, 15] is light]

The pixel in position [15, 15] is light.

• 0s often have a light patch in the middle
• Contrast with the digits 1 or 2, which often have darker values at these positions

Can we define a set of heuristics (i.e. rules based on our intuition) to classify digits?

A heuristic for classifying “7”:


[Figure: a handwritten “7” on a 28 × 28 grid with three probe pixels 𝑃1, 𝑃2, 𝑃3 marked]

Digit is a 7 if 𝑃1 > 128 and 𝑃2 > 128 and 𝑃3 > 128
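A sketch of this heuristic in Python (the probe coordinates are illustrative guesses, not the exact pixels marked in the figure):

```python
import numpy as np

def looks_like_seven(image: np.ndarray) -> bool:
    """Return True if the three probe pixels are all dark (> 128).

    The positions below are assumed for illustration; the figure marks
    its own P1, P2, P3.
    """
    p1 = image[5, 8]     # assumed: left end of the top stroke
    p2 = image[5, 18]    # assumed: right end of the top stroke
    p3 = image[20, 12]   # assumed: a point on the diagonal stroke
    return p1 > 128 and p2 > 128 and p3 > 128
```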


But what if...

Slanted digit? Pixel 𝑃3 is no longer dark!

[Figure: a slanted “7” on the same grid with the probe pixels marked]


An Improved Heuristic!

Digit is a 7 if 𝑃1 > 128 and 𝑃2 > 128 and (𝑃3 > 128 or 𝑃4 > 128)

[Figure: the slanted “7” with a fourth probe pixel 𝑃4 added beside 𝑃3]


Not so fast...

Digit shifted up? The pixel values are completely different.

[Figure: the same “7” shifted up on the grid, with probe pixels 𝑃1–𝑃4 marked]


Not as simple as we think!

Distortions, overlapping strokes, underlining, etc.

Heuristics can always be foiled.


Machine Learning Pipeline for Digit Recognition

Dataset → Preprocessing → Train Model → Evaluate Model


MNIST

Modified National Institute of Standards and Technology database

Handwritten digits

0–9 (10 classes)

70,000 images
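A minimal sketch of loading MNIST, assuming TensorFlow/Keras is installed (torchvision or other loaders work just as well):

```python
from tensorflow.keras.datasets import mnist

# Downloads MNIST on first use and returns NumPy arrays
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28) -- 60,000 training images
print(x_test.shape)   # (10000, 28, 28) -- 10,000 test images
print(y_train[0])     # label of the first training image, a digit 0-9
```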


Train, validation, and test sets


• Train set — used to adjust the parameters of the model
• Validation set — used to test how well we’re doing as we develop
  • Prevents overfitting, something you will learn later!
  • Also known as the development set
• Test set — used to evaluate the model once the model is done

[Diagram: the dataset split into a train set, a dev set, and a test set]
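One common way to carve out these splits (a sketch assuming scikit-learn; the 80/10/10 ratio is just an example, not prescribed by the slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data; in practice x, y would be the MNIST arrays
x = np.random.rand(1000, 28 * 28)
y = np.random.randint(0, 10, size=1000)

# Split off 10% for test, then 1/9 of the remaining 90% (10% overall) for dev
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.10, random_state=0)
x_train, x_dev, y_train, y_dev = train_test_split(x_rest, y_rest, test_size=1/9, random_state=0)

print(len(x_train), len(x_dev), len(x_test))  # 800 100 100
```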



• Each 𝑥 in our dataset is called an input
  • 𝑥 is represented by a 28 × 28 matrix of pixel values, flattened into a one-dimensional vector (more on this later)
• Each 𝑦 in our dataset is called a label
  • 𝑦 is the corresponding answer/classification, one of ten possibilities
• We refer to each (𝑥, 𝑦) as an example
• This is a supervised learning task

[Diagram: training data and training labels feed the model 𝑓̃, whose output feeds a loss function, which in turn drives an optimizer]

A (Temporary) Simplification: Binary Classification

• Classifying MNIST digits requires predicting 1 of 10 possible values
• We’ll first look at a simpler task — a binary classification problem
• Determine whether a handwritten digit is a 2 or not a 2
• The first neural network for binary classification: the Perceptron

[Image: example digits labeled “Is a 2” and “Not a 2”]


The Perceptron
• Input: a vector of numbers 𝐱 = (𝑥1, 𝑥2, … , 𝑥𝑛)
• 𝑤 and 𝑏 are parameters of the perceptron
  • Parameters: values we adjust during learning
• Let Φ = 𝑤 ∪ 𝑏 (the set of all parameters)

[Diagram: an artificial neuron (perceptron): inputs 𝑥1 … 𝑥7 with weights 𝑤1 … 𝑤7 and a bias 𝑏 feeding a summation unit Σ]

Predicting with a Perceptron

✓ Multiply each input 𝑥𝑖 by its corresponding weight 𝑤𝑖, and sum them up.
✓ Add the bias 𝑏.
✓ If the resulting value is greater than 0, return 1; otherwise return 0.

$$f_\Phi(x) = \begin{cases} 1 & \text{if } b + \sum_{i=1}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}$$

✓ As a binary classifier, 1 indicates that 𝑥 is a member of the class and 0, not a member.
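A direct NumPy sketch of this prediction rule (function and variable names are ours, not from the slides):

```python
import numpy as np

def perceptron_predict(x: np.ndarray, w: np.ndarray, b: float) -> int:
    """Return 1 if b + sum_i w_i * x_i > 0, else 0."""
    return 1 if b + np.dot(w, x) > 0 else 0

# Tiny usage example with made-up numbers
x = np.array([0.8, 0.0])
w = np.array([0.5, -0.2])
print(perceptron_predict(x, w, b=0.0))  # 1, since 0.5 * 0.8 = 0.4 > 0
```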
Parameters

• Weights — the importance of each input in determining the output
  • A weight near 0 implies this input has little influence on the output
  • A negative weight means increasing the input will decrease the output
• Bias — the a priori likelihood of the positive class
  • Ensures that even if all inputs are 0, there will be some result value

Bias: Geometric Explanation

The bias is essentially the 𝑏 term in 𝑦 = 𝑚𝑥 + 𝑏.

[Plot: the lines 𝑓(𝑥) = 2𝑥, 𝑓(𝑥) = (3/2)𝑥, and 𝑓(𝑥) = 𝑥 + 1 against sample data; only the line with a bias can fit the data]
Bias as a special type of weight

Another way to think of the bias is to represent it as an extra weight for an input/feature that is always 1:

$$(x_1, x_2, \ldots, x_n) \cdot (w_1, w_2, \ldots, w_n) + b = (x_1, x_2, \ldots, x_n, 1) \cdot (w_1, w_2, \ldots, w_n, b)$$
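A quick numerical check of this identity (a sketch with arbitrary values):

```python
import numpy as np

x = np.array([0.8, 0.0, 0.5])
w = np.array([0.1, -0.3, 0.7])
b = 1.0

lhs = np.dot(x, w) + b                            # bias as a separate term
rhs = np.dot(np.append(x, 1.0), np.append(w, b))  # bias folded in as an extra weight
print(lhs, rhs)  # both print 1.43
```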

Bias as a special type of weight

[Diagram: two equivalent perceptrons; on the left, inputs 𝑥1 … 𝑥7 with weights 𝑤1 … 𝑤7 and a separate bias 𝑏 feeding Σ; on the right, the same perceptron with an extra input fixed at 1 whose weight is 𝑏]


A Binary Perceptron for MNIST


• Inputs 𝑥1, 𝑥2, … , 𝑥𝑛 are all positive
• 𝑛 = 784 (28 × 28 pixel values)
• Output is either 0 or 1
  • 0 → input is not the digit type we’re looking for
  • 1 → input is the digit type we’re looking for

[Diagram: a perceptron with 784 inputs 𝑥1 … 𝑥784, weights 𝑤1 … 𝑤784, and bias 𝑏 feeding Σ]

Training a Perceptron
• How do we set the parameters Φ = 𝑤 ∪ 𝑏 to produce the highest accuracy on our training set?
• Iterate over the training set several times, feeding each training example into the model, producing an output, and adjusting the parameters according to whether that output was right or wrong
• Stop once we either (a) get every training example right or (b) reach 𝑁 iterations, a number set by the programmer
• 𝑁 is known as the number of epochs, where each epoch is one pass through all data points in the training set
• As a general rule of thumb, 𝑁 grows with the number of parameters



The Perceptron Learning Algorithm


1. Set the 𝑤’s to 0.
2. For 𝑁 iterations, or until the weights do not change:
   a. For each training example 𝐱ᵏ with label 𝑎ᵏ:
      i. If 𝑎ᵏ − 𝑓(𝐱ᵏ) = 0, continue.
      ii. Else, for all weights 𝑤ᵢ: Δ𝑤ᵢ = (𝑎ᵏ − 𝑓(𝐱ᵏ)) ∙ 𝑥ᵢᵏ

where
• 𝑏 = bias
• 𝑤 = weights
• 𝑁 = maximum number of training iterations
• 𝐱ᵏ = the kth training example
• 𝑎ᵏ = the label for the kth example
• 𝑤ᵢ = the weight for the ith input, where 𝑖 ≤ 𝑛
• 𝑛 = the number of pixels per image
• 𝑥ᵢᵏ = the ith input of the kth example, where 𝑖 ≤ 𝑛
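A runnable sketch of this algorithm in NumPy, treating the bias as an extra weight on a constant input of 1 as in the earlier slide (all names are ours):

```python
import numpy as np

def train_perceptron(X, a, n_epochs):
    """Perceptron learning. X is an (m, n) array of examples; a is an (m,)
    array of 0/1 labels; n_epochs is N, the maximum number of iterations."""
    X = np.hstack([X, np.ones((len(X), 1))])  # bias as a weight on a constant 1
    w = np.zeros(X.shape[1])                  # step 1: set the w's (and b) to 0
    for _ in range(n_epochs):                 # step 2: N iterations...
        changed = False
        for xk, ak in zip(X, a):              # step 2a: each example x^k, a^k
            fk = 1 if np.dot(w, xk) > 0 else 0
            if ak - fk != 0:                  # step 2a.ii: update on a mistake
                w += (ak - fk) * xk           # delta w_i = (a^k - f(x^k)) * x_i^k
                changed = True
        if not changed:                       # ...or until weights stop changing
            break
    return w[:-1], w[-1]                      # (weights, bias)

# Usage on toy data: learn "the first feature is large"
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]])
a = np.array([1, 1, 0, 0])
w, b = train_perceptron(X, a, n_epochs=20)
print(w, b)
```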



• If the output of our model matches the label, we continue:
  • If the correct label is 1 and our output is 1: 1 − 1 = 0
  • If the correct label is 0 and our output is 0: 0 − 0 = 0



• If our label 𝑎ᵏ is a 1 and our model’s output is a 0, we update the ith weight by:
  • (1 − 0) ∙ 𝑥ᵢᵏ = 𝑥ᵢᵏ
  • The output was 0 and should have been 1, so make the output more positive
• If our label 𝑎ᵏ is a 0 and our model’s output is a 1, we update the ith weight by:
  • (0 − 1) ∙ 𝑥ᵢᵏ = −𝑥ᵢᵏ
  • The output was 1 and should have been 0, so make the output more negative


Example: Predict whether a digit is a “2”

Just look at the effect of these two pixels:

[Image: a handwritten “2” with two probe pixels marked: 𝑥1 = 0.8 and 𝑥2 = 0]



• Start off training with all parameters at 0, so 𝑤1 = 0, 𝑤2 = 0, and 𝑏 = 0
• 𝑓(𝑥) = (𝑤1 ∙ 𝑥1 + 𝑤2 ∙ 𝑥2 + 𝑏) > 0
• 𝑓(𝑥) = (0 ∙ 0.8 + 0 ∙ 0 + 0 ∙ 1) > 0
• Return 0 because the value is not greater than 0
• Predict that it is not a 2!
• Correct answer: it is a 2...
• Parameter update:
  • Δ𝑤1 = (1 − 0) ∙ 0.8 = 0.8
  • Δ𝑤2 = (1 − 0) ∙ 0 = 0
  • Δ𝑏 = (1 − 0) ∙ 1 = 1
• Now:
  • 𝑤1 = 0.8
  • 𝑤2 = 0
  • 𝑏 = 1

Next example:

[Image: a different handwritten digit with the same two probe pixels: 𝑥1 = 0.9 and 𝑥2 = 0.9]



• At the end of the last iteration:
  • 𝑤1 = 0.8, 𝑤2 = 0, and 𝑏 = 1
• 𝑓(𝑥) = (𝑤1 ∙ 𝑥1 + 𝑤2 ∙ 𝑥2 + 𝑏) > 0
• 𝑓(𝑥) = (0.8 ∙ 0.9 + 0 ∙ 0.9 + 1 ∙ 1) > 0
• Return 1 because the value is greater than 0
• Predict that it is a 2!
• Correct answer: it is not a 2...
• Parameter update:
  • Δ𝑤1 = (0 − 1) ∙ 0.9 = −0.9
  • Δ𝑤2 = (0 − 1) ∙ 0.9 = −0.9
  • Δ𝑏 = (0 − 1) ∙ 1 = −1
• Now:
  • 𝑤1 = 0.8 − 0.9 = −0.1
  • 𝑤2 = 0 − 0.9 = −0.9
  • 𝑏 = 1 − 1 = 0
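The two updates above can be traced mechanically (a sketch; the data are just the two examples from these slides):

```python
import numpy as np

w, b = np.zeros(2), 0.0
examples = [(np.array([0.8, 0.0]), 1),  # is a 2:     x1 = 0.8, x2 = 0
            (np.array([0.9, 0.9]), 0)]  # is not a 2: x1 = 0.9, x2 = 0.9

for x, label in examples:
    f = 1 if np.dot(w, x) + b > 0 else 0
    w += (label - f) * x    # delta w_i = (a - f(x)) * x_i
    b += (label - f) * 1    # the bias input is the constant 1
    print(w, b)
# After example 1: w = [0.8, 0.0], b = 1.0
# After example 2: w = [-0.1, -0.9], b = 0.0
```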

Using multiple perceptrons

• We can extend perceptrons to multi-class problems by creating 𝑛 perceptrons, where 𝑛 = the number of classes
• For MNIST, we would have 10 perceptrons
• Each individual perceptron returns a value, so our model will return the class whose perceptron value is the highest
• Here, “perceptron value” refers to the value of the weighted sum before being thresholded
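A sketch of ten perceptrons sharing inputs, stacked into one weight matrix (shapes and names are ours):

```python
import numpy as np

n_classes, n_pixels = 10, 784
W = np.zeros((n_classes, n_pixels))  # one row of weights per perceptron
b = np.zeros(n_classes)              # one bias per perceptron

def predict_digit(x: np.ndarray) -> int:
    """Return the class whose pre-threshold weighted sum is highest."""
    values = W @ x + b               # all 10 perceptron values at once
    return int(np.argmax(values))

x = np.random.rand(n_pixels)         # a stand-in flattened image
print(predict_digit(x))
```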


[Diagram: two separate perceptrons over the same inputs: one predicting whether the handwritten digit is a 0 and one predicting whether it is a 9]



[Diagram: three separate perceptrons (left) redrawn as three perceptrons sharing the same inputs (right), with weights 𝑤ᵢ,ⱼ from input 𝑖 to output 𝑗]


Activation Functions

1. Binary Step Function

$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$


Activation Functions

2. Linear Function

$$f(x) = ax$$


Activation Functions

3. Sigmoid

$$f(x) = \frac{1}{1 + e^{-x}}$$


Activation Functions

4. Tanh

$$\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$$


Activation Functions

5. ReLU (Rectified Linear Unit)

$$f(x) = \max(0, x)$$


Activation Functions

6. Leaky ReLU
$$f(x) = \begin{cases} x, & x \ge 0 \\ 0.01x, & x < 0 \end{cases}$$


Activation Functions

7. Parameterised ReLU

$$f(x) = \begin{cases} x, & x \ge 0 \\ ax, & x < 0 \end{cases}$$


Activation Functions

8. Exponential Linear Unit

$$f(x) = \begin{cases} x, & x \ge 0 \\ a(e^x - 1), & x < 0 \end{cases}$$


Activation Functions

9. Swish Function
$$f(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}}$$


Activation Functions

10. Softmax Function
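The slide gives no formula; for reference, softmax turns a vector of scores into probabilities:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

A compact NumPy sketch of all ten activations (𝑎 in the parameterised variants is a free parameter; the defaults here are illustrative):

```python
import numpy as np

def binary_step(x):   return np.where(x >= 0, 1.0, 0.0)
def linear(x, a=1.0): return a * x
def sigmoid(x):       return 1.0 / (1.0 + np.exp(-x))
def tanh(x):          return 2.0 * sigmoid(2.0 * x) - 1.0  # same as np.tanh(x)
def relu(x):          return np.maximum(0.0, x)
def leaky_relu(x):    return np.where(x >= 0, x, 0.01 * x)
def prelu(x, a=0.1):  return np.where(x >= 0, x, a * x)    # a is learned in practice
def elu(x, a=1.0):    return np.where(x >= 0, x, a * (np.exp(x) - 1.0))
def swish(x):         return x * sigmoid(x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities summing to 1
```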


Choosing the right Activation Function


• Sigmoid functions and their combinations generally work better in the case of
classifiers.
• Sigmoids and tanh functions are sometimes avoided due to the vanishing gradient
problem.
• ReLU function is a general activation function and is used in most cases these days.
• If we encounter a case of dead neurons in our networks the leaky ReLU function is
the best choice.
• Always keep in mind that ReLU function should only be used in the hidden layers.
• As a rule of thumb, you can begin with using ReLU function and then move over to
other activation functions in case ReLU doesn’t provide with optimum results.