Intro To Deep Learning Day 2 and Day 3 - Revision 1.3 PDF

Intro to Deep Learning
Supervised Learning Workflow

1. Analyze the data to determine the problem type (Classification, Regression, etc.).
2. Select only the relevant data (Raw Data -> Cleaned Data).
3. Create the labels and check the quality and correctness of the dataset (data and labels). The labels are the answer key used to train and evaluate the model; they come from the raw data or from a domain expert.
4. Split the dataset into a Training Set, a Validation Set, and a Test Set.
5. Define the structure of the neural network and the error function.
6. Train on the Training Set, tune the hyperparameters, and evaluate against the Validation Set.
7. Run the final evaluation on the Test Set.
8. Deploy the model.
Classification Problem vs Regression Problem
Classification: assigning data to categories, e.g. identifying a flower's species from the width and length of its petals.
Regression: predicting a numeric value, e.g. predicting a house's price from the size of the house.
Classification Problem
Separate two iris species: Iris Setosa and Iris Virginica.
Source: https://www.kaggle.com/uciml/iris
Example: Petal Width and Length Dataset

Class: Iris Setosa            Class: Iris Virginica
Width (x1)   Length (x2)      Width (x1)   Length (x2)
0.3          0.7              1.0          0.9
0.9          0.2              0.8          0.6
0.2          0.5              1.2          0.5
0.4          0.2              0.6          0.8
0.6          0.3              1.3          0.7

[Figure: scatter plot of petal length (cm) against petal width (cm) for Iris Setosa and Iris Virginica]
https://www.desmos.com/calculator/qbpm3fc4zc
Prediction Example
The black point in the figure represents a new flower that we want to classify. Its width is 0.9 cm and its length is 0.8 cm, which places it in the region that we classify as the purple flower.
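The prediction step can be sketched with a straight-line decision boundary. This is only an illustration: the boundary x1 + x2 = 1.25 is a hypothetical line chosen by inspecting the ten sample rows, not the line drawn in the slides, and it assumes the purple region corresponds to Iris Virginica.

```python
# Sample rows from the petal width/length table: (width x1, length x2).
SETOSA = [(0.3, 0.7), (0.9, 0.2), (0.2, 0.5), (0.4, 0.2), (0.6, 0.3)]
VIRGINICA = [(1.0, 0.9), (0.8, 0.6), (1.2, 0.5), (0.6, 0.8), (1.3, 0.7)]


def classify(width, length, threshold=1.25):
    """Classify one flower with the hypothetical boundary x1 + x2 = threshold."""
    return "Iris Virginica" if width + length > threshold else "Iris Setosa"


# The new flower from the prediction example: width 0.9 cm, length 0.8 cm.
print(classify(0.9, 0.8))
```

Any line that separates the two groups would do here; training a model amounts to finding such a line automatically.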
More About Linear vs Nonlinear
https://playground.tensorflow.org
Source: https://study.com/academy/lesson/how-to-recognize-linear-functions-vs-non-linear-functions.html
Regression Problem
House price data

Seller   Listing date   Location   House size (ft2)   Bedrooms   Price ($1000)
Bob      14/02/2019     Portland   2104               3          399
Alice    14/03/2019     Portland   1600               3          329
John     01/04/2019     Portland   2400               3          369
...      ...            ...        ...                ...        ...

Sample of the raw house-sale data

Select the relevant data

Seller   Listing date   Location   House size (ft2)   Bedrooms   Price ($1000)
Bob      14/02/2019     Portland   2104               3          399
Alice    14/03/2019     Portland   1600               3          329
John     01/04/2019     Portland   2400               3          369
...      ...            ...        ...                ...        ...
Regression Problem
House price versus house size in Portland, Oregon
https://www.desmos.com/calculator/m34iec3zmi
Source: https://github.com/girishkuniyal/Predict-housing-prices-in-Portland/blob/master/ex1data2.txt
We want to find the equation of one straight line that predicts a house's price from its size.
If we have a 4,000 square foot house for sale in Portland, we can estimate its sale price at about $600k.
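A minimal sketch of fitting such a line with ordinary least squares follows. It uses only the three rows visible in the slides, so its estimate for a 4,000 square foot house will differ from the $600k read off the full Portland dataset; the function name fit_line is an illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


sizes = [2104, 1600, 2400]   # house size in ft2 (three sample rows only)
prices = [399, 329, 369]     # price in $1000

slope, intercept = fit_line(sizes, prices)
estimate = slope * 4000 + intercept
print(f"price = {slope:.4f} * size + {intercept:.2f}")
print(f"estimate for 4000 ft2: {estimate:.1f} ($1000)")
```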
Recap: Classification vs Regression
Define the labels and verify the correctness of the data

House size (ft2)   Bedrooms   Price in $1000 (Label)
2104               3          399
1600               3          329
2400               3          369
...                ...        ...
Split the dataset into a Training Set, a Validation Set, and a Test Set

Dataset
House size   Bedrooms   Price (Label)
2104         3          399
1600         3          329
2400         3          369
...          ...        ...

Training Set (60%)
House size   Bedrooms   Price (Label)
2104         3          399
1600         3          329
2400         3          369
...          ...        ...

Validation Set (20%)
House size   Bedrooms   Price (Label)
2105         3          398
1605         3          328
2405         3          368
...          ...        ...

Test Set (20%)
House size   Bedrooms   Price (Label)
2107         3          395
1607         3          325
2407         3          365
...          ...        ...
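The 60/20/20 split can be sketched as below. The helper name split_dataset, the fixed random seed, and the made-up rows are assumptions for illustration; the workshops may split the data differently.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle a copy, reproducibly
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]       # whatever remains (about 20%)
    return train, val, test


# Made-up rows in the same shape as the slides: (house size, bedrooms, price).
rows = [(2000 + 10 * i, 3, 300 + i) for i in range(10)]
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before cutting keeps each subset representative when the original file is sorted, for example by listing date.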


Neural Network
What is a Neural Network?
Source: http://www.cs.stir.ac.uk/courses/ITNP4B/lectures/kms/4-MLP.pdf
Solve the XOR Problem Using a Simple Neural Network
[Diagram: inputs x1 and x2 feeding hidden units h1 and h2]
6-Parameter Neural Network
https://www.desmos.com/calculator/xvtdj4fog8
Handwritten Digits Classification (MNIST)
Source: https://towardsdatascience.com/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d
LeNet-5 to Solve the Handwritten Digits Problem
60k-Parameter Neural Network
Source: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788831109/4/ch04lvl1sec33/implementing-a-lenet-5-step-by-step
Image Classification Problem
Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
VGG-16 to Solve Image Classification
138M-Parameter Neural Network
Source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
A Simple Neural Network
Recap: House Price Data
Source: https://github.com/girishkuniyal/Predict-housing-prices-in-Portland/blob/master/resources/output_4_0.png
Neural Network: A Simple Model
[Diagram: inputs x1 and x2 connect through weights w1 and w2 to the output p (house price)]
Input:
x1 is the house size in square feet
x2 is the number of bedrooms
Output:
p is the house price. The output is the model's prediction; here we want the model to predict the house price from the house size and the number of bedrooms.
Weight:
w1 is a number that indicates how important x1 is for predicting the house price
w2 is a number that indicates how important x2 is for predicting the house price
A large w means that the corresponding feature (x1 or x2) contributes more to the model's prediction of the house price.
[Diagram: inputs x1 and x2 connect through weights w1 and w2 to the output p = x1w1 + x2w2]
Weight:
The weights w1 and w2 are numbers that express the relationship between the features of the data (house size and number of bedrooms) and the output. These weights go by several names, such as parameters or θ (theta).
*The weights are therefore the set of numbers we need to find so that the model gives the correct house price.
Example of Neural Network with Hidden Layer
[Diagram: inputs x1 and x2 connect through weights w1 and w2 to the hidden unit h1, which connects through weight w3 to the output p (house price)]
h1 = x1w1 + x2w2 + b1
p = h1w3 + b2

PyTorch
A Simple and Powerful Deep Learning Framework
The Origin of PyTorch
Source: https://github.com/pytorch/pytorch/commit/731041cb6a12b6f46626e16a04ab11523fed1c49
PyTorch Authors
Edward Z. Yang: Research Engineer, Facebook AI Research
Adam Paszke: Deep Learning Engineer, NVIDIA
Gregory Chanan: Technical Lead / Software Engineering Manager, Facebook
Soumith Chintala: Research Engineer, Facebook AI Research
Yangqing Jia: Research Scientist Director, Facebook
Sam Gross: Research Engineer, Facebook AI Research
Source: https://github.com/pytorch/pytorch
Source: http://blog.ezyang.com/2019/05/pytorch-internals/
2D Array
Tensor Supported Data Type
Source: https://pytorch.org/docs/stable/tensors.html
Workshop 7: Tensor in PyTorch
Neural Networks
Source: https://www.analyticsindiamag.com/how-to-create-your-first-artificial-neural-network-in-python/
Tensor as Neural Network Data Structure
Source: https://hmkcode.github.io/ai/backpropagation-step-by-step/
Neural Network Lifecycle
Source: https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e
Workshop 8: A Simple Neural Network in PyTorch
Recap: What Do We Want to Solve?
Recap: House Price Dataset

Dataset
House size   Bedrooms   Price (Label)
2104         3          399
1600         3          329
2400         3          369
...          ...        ...

Training Set (60%)
2104         3          399
1600         3          329
2400         3          369
...          ...        ...

Validation Set (20%)
2105         3          398
1605         3          328
2405         3          368
...          ...        ...

Test Set (20%)
2107         3          395
1607         3          325
2407         3          365
...          ...        ...

Neural Network: Input from Training Set
Training Set (house size, bedrooms, price):
2104   3   399
1600   3   329
2400   3   369
...    ...  ...
The first row of the Training Set supplies the inputs: x1 = 2104, x2 = 3.
h1 = x1w1 + x2w2 + b1
p = h1w3 + b2

Neural Network: Random Weights
w1 = 0.16, w2 = 0.1, w3 = 0.01, b1 = 0, b2 = 0

Neural Network: Feedforward
h1 = 2104 x 0.16 + 3 x 0.1 + 0 = 336.94
p = 336.94 x 0.01 + 0 = 3.3694
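The feedforward step above can be reproduced in a few lines of plain Python. The function name feedforward is an illustrative choice; the numbers are the ones from the slides.

```python
def feedforward(x1, x2, w1, w2, w3, b1=0.0, b2=0.0):
    """Compute the hidden value h1 and the prediction p for one sample."""
    h1 = x1 * w1 + x2 * w2 + b1
    p = h1 * w3 + b2
    return h1, p


# First training row (house size 2104, 3 bedrooms) with the random weights.
h1, p = feedforward(2104, 3, w1=0.16, w2=0.1, w3=0.01)
print(round(h1, 2), round(p, 4))  # 336.94 3.3694
```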
Workshop 9: Implementing PyTorch Neural Network on MNIST Dataset
Optimizing Neural Network
Optimization Examples
[Figures: Classification Optimization; Linear Regression Optimization]
Source: https://medium.com/coinmonks/backpropagation-concept-explained-in-5-levels-of-difficulty-8b220a939db5
Neural Network: Calculate Error/Loss
Calculating the error measures how accurately our model predicts.
Current state for the first training row (house size 2104, 3 bedrooms, label 399): w1 = 0.16, w2 = 0.1, w3 = 0.01, b1 = 0, b2 = 0, h1 = 336.94, p = 3.3694.

Neural Network: Calculate Error
We calculate the error from the distance between p (the prediction) and the label.
One approach is to subtract them directly: 3.3694 - 399 = -395.63
But we want the distance between the prediction and the label, so we take the absolute value or the square so that all error values fall in the same (non-negative) range.
Error Functions / Loss Functions
The difference between the model's prediction and the label can come out negative. We want the distance between the prediction and the label, so we take the absolute value or the square; that way the errors of the individual training samples are on the same scale when they are added together.
(If our data is clustered like the house price example, the errors follow a normal distribution with mean 0, which is consistent with squared error.)
For datasets similar to the house price example, squared error is the recommended way to measure the distance between the model's prediction and the label.
https://pytorch.org/docs/stable/nn.html#loss-functions
Source: https://people.orie.cornell.edu/mru8/doc/udell18_big_data.pdf
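The three ways of measuring the error can be compared on the running example (prediction 3.3694, label 399). This is a plain-Python sketch; the function names are illustrative, and the squared error here is the plain square without any scaling factor the slides may apply.

```python
def signed_error(p, y):
    """Raw difference; negative when the prediction is below the label."""
    return p - y


def absolute_error(p, y):
    """Distance between prediction and label."""
    return abs(p - y)


def squared_error(p, y):
    """Squared distance; penalizes large misses more heavily."""
    return (p - y) ** 2


p, label = 3.3694, 399.0
print(round(signed_error(p, label), 2))    # -395.63
print(round(absolute_error(p, label), 2))  # 395.63
print(round(squared_error(p, label), 2))
```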
Optimization Goal
The red line is the model. The goal is that when an input x is fed into the neural network model, the output value y lies as close to the red line as possible (small error).
Source: https://medium.com/coinmonks/backpropagation-concept-explained-in-5-levels-of-difficulty-8b220a939db5
Neural Network: Reduce Error
We need to adjust our model, the red line, so that it moves closer to the green line, which has a smaller error (predictions close to the labels).
What we can adjust in the model is the weights.
How do we adjust the weights to reduce the error?
Minimize Error
The green point at the top is the weight value that gives the lowest error.
The black point at the top is the current value of weight w3 (0.01), which has an error as high as 78,261.79.
The green line on the left is the model that predicts house prices most accurately (lowest error).
The red line on the left is our current model.
The blue line through the black point shows the slope at the black point. The steeper the slope, the greater the blue line's angle of inclination.
Our goal is the green point, which has the lowest error, and we reach it by adjusting the weight (the black point). As the weight moves to values that give a lower error, the slope of the blue line decreases as well.
When the weight has been adjusted until the error is zero, the slope of the blue line is also zero (no slope).
Calculate the Slope of the Error
How do we find the slope of the blue line?
We have defined the error function as squared error.
The slope of the error with respect to weight w3 is the derivative of the error function with respect to w3.
A negative slope means that we must increase the weight to move toward the point of lowest error (the green point).
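As a sketch of this slope calculation, the code below assumes the error is E = 0.5 * (p - y)^2 with p = h1 * w3 + b2. The factor 0.5 is an assumption, since the slide equations did not survive extraction. The analytic slope is checked against a finite-difference estimate.

```python
def error_for_w3(w3, h1=336.94, b2=0.0, y=399.0):
    """Assumed error function: half the squared distance to the label."""
    p = h1 * w3 + b2
    return 0.5 * (p - y) ** 2


def slope_wrt_w3(w3, h1=336.94, b2=0.0, y=399.0):
    """Analytic slope: dE/dw3 = (p - y) * h1."""
    p = h1 * w3 + b2
    return (p - y) * h1


w3 = 0.01
analytic = slope_wrt_w3(w3)

# Finite-difference check: nudge w3 slightly in both directions.
eps = 1e-6
numeric = (error_for_w3(w3 + eps) - error_for_w3(w3 - eps)) / (2 * eps)

print(analytic, numeric)
```

Both values are negative, which matches the statement that the weight must be increased.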
Backpropagation
Backpropagation is a method for adjusting the weights in a neural network so that the model predicts more accurately (smaller error). It is more efficient than looping over the weights and adjusting them one at a time, because it reuses the error slopes already computed for the previously processed layer.
Weights are adjusted from the last layer back toward the first.
Backpropagation Algorithm
https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/
How Do We Update the Weights?
Learning Rate
Source: https://www.jeremyjordan.me/nn-learning-rate/
Start updating the weights from the last layer
The error function contains an inner function, the prediction, which is produced by h1w3. We therefore apply the chain rule, splitting the derivative into 2 terms.
We obtain a new value of w3 that moves closer to the minimum of the error (the green point).
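A hedged sketch of the w3 update follows. It assumes the error E = 0.5 * (p - y)^2 and a learning rate of 1e-6; neither is stated in the surviving slide text, but together they reproduce the new weight 0.1433 and prediction 48.28 that the slides show afterwards.

```python
LEARNING_RATE = 1e-6  # assumed value, not stated in the surviving slide text


def update_w3(w3, h1, y, b2=0.0, lr=LEARNING_RATE):
    """One gradient descent step on w3 using the chain rule."""
    p = h1 * w3 + b2
    dE_dp = p - y      # first term: derivative of 0.5 * (p - y)**2 w.r.t. p
    dp_dw3 = h1        # second term: derivative of p = h1 * w3 + b2 w.r.t. w3
    return w3 - lr * dE_dp * dp_dw3


new_w3 = update_w3(0.01, h1=336.94, y=399.0)
new_p = 336.94 * new_w3
print(round(new_w3, 4), round(new_p, 2))  # 0.1433 48.28
```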
Update w1 and w2
The error function contains an inner function, the prediction, which is produced by h1w3, and h1 is in turn produced by x1w1 + x2w2. We therefore apply the chain rule twice to find the new value of w1.
In the green box we can see that the error slope from the previous layer is reused, which makes the calculation from the last layer back to the first layer efficient.
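The double chain rule for w1 and w2 can be sketched as follows, again assuming E = 0.5 * (p - y)^2 (an assumption, as the slide equations were not recovered). The intermediate value dE_dh1 is the reused slope: it is computed once and shared by both weights.

```python
def slopes_for_w1_w2(x1, x2, w1, w2, w3, y, b1=0.0, b2=0.0):
    """Return (dE/dw1, dE/dw2) for the one-hidden-unit network."""
    h1 = x1 * w1 + x2 * w2 + b1
    p = h1 * w3 + b2
    dE_dp = p - y              # slope of the assumed error w.r.t. the prediction
    dE_dh1 = dE_dp * w3        # chain rule through p = h1 * w3 + b2 (reused below)
    return dE_dh1 * x1, dE_dh1 * x2   # chain rule through h1 = x1*w1 + x2*w2 + b1


# Slopes at the original weights (w3 = 0.01), as in standard backpropagation.
g1, g2 = slopes_for_w1_w2(2104, 3, w1=0.16, w2=0.1, w3=0.01, y=399.0)
print(round(g1, 2), round(g2, 4))
```

Both slopes are negative, so a gradient descent step would increase w1 and w2, just as it increases w3.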
Comparing the Old Weight with the New Weight
With the new weight w3 = 0.1433 (previously 0.01), and w1 = 0.16, w2 = 0.1, h1 = 336.94, b1 = 0, b2 = 0 as shown, the prediction becomes p = 48.28 (previously 3.3694), which is closer to the label of $399.
Recap: Neural Network Lifecycle
Workshop 10: Backpropagation
PyTorch Autograd (Automatic Differentiation)
More detail: http://blog.ezyang.com/2019/05/pytorch-internals/

https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
Source: https://pytorch.org/docs/stable/tensors.html?highlight=backward#torch.Tensor.backward
Workshop 11: Train Neural Network in PyTorch
Workshop 12: Fashion-MNIST Dataset
Hyperparameters: Tuning the Neural Network
Batch Size, Iteration, and Epoch
Given a training set with 48,000 samples:
● Batch Gradient Descent: batch size = size of the training set.
● Stochastic Gradient Descent: batch size = 1.
● Mini-Batch Gradient Descent: 1 < batch size < size of the training set.
If we divide the training set of 48,000 samples into batches of 100, it takes 480 iterations to complete 1 epoch.
Source: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
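The relationship between batch size and iterations per epoch is a simple division, sketched below. The function name is illustrative, and rounding up for a final partial batch is an assumption about how leftover samples are handled.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


print(iterations_per_epoch(48_000, 100))     # mini-batch: 480
print(iterations_per_epoch(48_000, 48_000))  # batch gradient descent: 1
print(iterations_per_epoch(48_000, 1))       # stochastic gradient descent: 48000
```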
Learning Rate
Source: https://www.jeremyjordan.me/nn-learning-rate/
Momentum
You can try an interactive visualization of learning rate and momentum here: https://distill.pub/2017/momentum.
The default settings of the SGD optimizer in Keras are learning rate 0.01, momentum 0, and decay 0.
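A minimal sketch of gradient descent with momentum follows, using the common "velocity" formulation. The toy objective f(w) = w^2 and the momentum value 0.9 are assumptions chosen for illustration, not settings from the slides.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.01, momentum=0.9):
    """One update: the velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w.
w, velocity = 1.0, 0.0
for _ in range(200):
    w, velocity = sgd_momentum_step(w, velocity, grad=2 * w)

print(w)  # close to the minimum at w = 0
```

With momentum set to 0 the step reduces to plain gradient descent, w - lr * grad.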
Weight Decay (Regularization)
n is the number of layers
w[j] is the weights of the j-th layer
m is the number of input samples
λ is the weight decay value
Weight decay helps address overfitting by increasing the error when very large weight values appear in the model. By default Keras sets this value to 0, while AlexNet sets it to 0.0005.
Source: https://www.slideshare.net/xavigiro/training-deep-networks-d1l5-2017-upc-deep-learning-for-computer-vision
https://www.youtube.com/watch?v=iuJgyiS7BKM
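The penalty can be sketched as below. The exact formula on the slide was not recovered, so this assumes the common L2 form: error + λ / (2m) times the sum of squared weights over all n layers. The function names and sample numbers are illustrative.

```python
def l2_penalty(layer_weights, weight_decay, m):
    """Assumed penalty: weight_decay / (2 * m) * sum of squared weights."""
    total = sum(w * w for layer in layer_weights for w in layer)
    return weight_decay / (2 * m) * total


def regularized_error(data_error, layer_weights, weight_decay, m):
    """Error on the data plus the weight decay penalty."""
    return data_error + l2_penalty(layer_weights, weight_decay, m)


weights = [[1.0, 2.0], [3.0]]   # two layers with made-up weights
print(regularized_error(10.0, weights, weight_decay=0.5, m=1))   # 13.5
print(regularized_error(10.0, weights, weight_decay=0.0, m=1))   # 10.0
```

Large weights raise the penalty quadratically, so the optimizer is pushed toward smaller weights.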
Before and After Applying Weight Decay
Before: Overfitting. After: Better Model.
Source: https://www.d2l.ai/chapter_multilayer-perceptrons/weight-decay.html
Feature Scaling
Source: https://slideplayer.com/slide/16156919/
Source: https://www.youtube.com/watch?v=wEoyxE0GP2M
Convolutional Neural Network
Source: https://neurology.mhmedical.com/book.aspx?bookID=1049
Image Classification Development
Alex Krizhevsky created AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012).
Source: https://shift.newco.co/there-is-one-thing-that-computers-will-never-beat-us-at-f66af30565f0
https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/
Top-1 Accuracy vs Top-5 Accuracy
For top-5 accuracy, a prediction counts as correct if the true label (for example, bird) is among the five classes with the highest predicted probabilities. Top-1 accuracy counts a prediction as correct only if the class with the highest predicted probability matches the label.
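The two metrics can be sketched in plain Python. The probabilities and labels below are made-up values for six hypothetical classes, and the function names are illustrative.

```python
def top_k_correct(probs, label, k):
    """True if the label is among the k classes with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return label in ranked[:k]


def top_k_accuracy(batch_probs, labels, k):
    """Fraction of samples whose label is in the top k predictions."""
    hits = sum(top_k_correct(p, y, k) for p, y in zip(batch_probs, labels))
    return hits / len(labels)


batch_probs = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # label 3: missed by top-1, hit by top-5
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # label 0: hit by both
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # label 5: missed by both
]
labels = [3, 0, 5]

print(top_k_accuracy(batch_probs, labels, k=1))
print(top_k_accuracy(batch_probs, labels, k=5))
```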
Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
https://kobiso.github.io/Computer-Vision-Leaderboard/imagenet
Image as a Matrix
What is Convolution?
Photo editor apps
Source: https://www.macthai.com/2016/01/08/15-necessary-apps-editor-photo-for-ios/
Kernel/Filter
Input Image * Kernel/Filter (convolution) = Blurred Image
The convolution operation is a sliding dot product: at each position, the kernel is multiplied element-wise with the image patch beneath it and the products are summed.
See more filters in GIMP: https://docs.gimp.org/2.6/en/plug-in-convmatrix.html
Source: https://en.wikipedia.org/wiki/Kernel_(image_processing)
Convolution Operation
Source: http://cs231n.github.io/convolutional-networks/
Bias: Y = θx + Thai
If a model is asked to predict "Which nationality has the most beautiful women?" and you answer "Thai ladies", we can say that is because you are biased.
In a CNN, the bias is a learnable parameter that helps the model minimize the objective function faster.
Calculate Feature Maps Dimension (1)
Output Dimension: o = floor((i + 2p - f) / s) + 1
i is the size of the input
p is the amount of padding
f is the size of the filter
s is the stride (how many cells the filter moves per step)
Example: Input 5x5, Filter 3x3, Padding 1, Stride 1 gives Output 5x5.
Calculate Feature Maps Dimension (2)
Given:
an input image of size 6x6
a kernel of size 3x3
stride 2
padding 1
What is the dimension of the output?
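A small helper for feature map sizes is sketched below, assuming the standard formula floor((i + 2p - f) / s) + 1 for square inputs and filters. The function name is illustrative; the printed example is the 5x5 case from slide (1), so the exercise above is left for the reader.

```python
def conv_output_size(i, f, p=0, s=1):
    """Output width (or height) of a convolution over an i x i input."""
    return (i + 2 * p - f) // s + 1


# Slide (1): input 5x5, filter 3x3, padding 1, stride 1.
print(conv_output_size(5, 3, p=1, s=1))  # 5
```

Calling the same function with the values from the exercise gives its answer.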
Convolution as Edge Detection
Vertical Edge Detection
Input Image * Kernel = ?
Vertical Edge Detection
Input Image * Kernel = Output/Feature maps
Horizontal Edge Detection
Input Image * Kernel = Output/Feature maps
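Edge detection by convolution can be sketched in plain Python. The 6x6 image and the two 3x3 kernels below are assumed examples (the values in the slide images were not recovered), and the function performs the sliding dot product without flipping the kernel, as CNN layers do.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product of the kernel with the patch under it.
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
            row.append(total)
        output.append(row)
    return output


# Bright left half, dark right half: one vertical edge in the middle.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
vertical_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
horizontal_kernel = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

for row in conv2d(image, vertical_kernel):
    print(row)  # [0, 30, 30, 0]
```

The vertical kernel responds strongly where the brightness changes from left to right, while the horizontal kernel gives all zeros on this image because it contains no horizontal edge.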
Workshop 13: Convolution
Workshop 14: Tensor with TensorFlow
Convolutional Neural Network
Source: http://cs231n.github.io/convolutional-networks/
ReLU (Rectified Linear Unit)
Source: https://www.youtube.com/watch?v=m0pIlLfpXWE
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
Pooling Layer
Source: https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
A Simple Convolutional Neural Network
A simple model using Keras and TensorFlow.
A simple model summary. None in the output shape stands for the batch size, which is specified when you train the model.
Classification Layer
Flatten Layer and Dense Layer
[Diagram: feature maps of size 5x3 (values 1 to 15) are flattened by Flatten() into a 15x1 vector, which is fully connected through a 15x10 kernel in Dense(10) to 10 outputs, from Class #1 Probability to Class #10 Probability]
The values produced by the Dense layer are raw scores. We need to convert them into the range 0 to 1.
Softmax (1)
Take the exponential of each raw probability (logit score): e^logit ensures a non-negative value.
Source: https://towardsdatascience.com/cutting-edge-face-recognition-is-complicated-these-spreadsheets-make-it-easier-e7864dbf0e1a
Softmax (2)

Class   Raw Probability (logit)   Unnormalized Probability (e^logit)   Normalized Probability S(y)
Cat     14.2                      1,468,864                            ?
Dog     10.2                      26,903                               0.02
Tiger   9.4                       12,088                               0.01
Sum                               1,507,856                            1.0
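The two softmax steps, exponentiate and then normalize, can be sketched with the standard library. The function name is illustrative; the logits are the ones from the table.

```python
import math


def softmax(logits):
    """Exponentiate each logit, then divide by the sum so the results add to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [14.2, 10.2, 9.4]   # Cat, Dog, Tiger
probs = softmax(logits)
print([round(p, 2) for p in probs])  # [0.97, 0.02, 0.01]
```

Production implementations usually subtract the largest logit before exponentiating to avoid overflow; that detail is omitted here for clarity.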
Cross-Entropy Loss
Normalized Probability S(y)   Cross-entropy (D)   Label (L)
0.97                          0.03                1.00
0.02                          0.00                0.00
0.01                          0.00                0.00
Loss                          0.03
Cross-entropy measures the distance (D) between S(y) and the label (L) for the correct class.
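The loss in the table can be reproduced with the usual cross-entropy definition, D = -sum(L_i * ln S_i), sketched below. With a one-hot label only the correct class contributes. The function name is illustrative.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return sum(-l * math.log(s) for s, l in zip(probs, label) if l > 0)


probs = [0.97, 0.02, 0.01]   # normalized probabilities S(y)
label = [1.0, 0.0, 0.0]      # one-hot label L (the correct class is the first)
loss = cross_entropy(probs, label)
print(round(loss, 2))  # 0.03
```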
Softmax Cross-Entropy Loss
Source: https://towardsdatascience.com/cutting-edge-face-recognition-is-complicated-these-spreadsheets-make-it-easier-e7864dbf0e1a
Workshop 15: A Simple CNN
MobileNets
MobileNets Architecture
Source: https://arxiv.org/abs/1704.04861
Global Average Pooling (GAP)
A global average pooling layer is similar to a max pooling layer, but it reduces the dimensions further: each feature map is averaged down to a single value.
Source: https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/
Depthwise Separable Convolution
Regular Convolutional Layer
For example, each channel of a 3x3x3 kernel slides over the corresponding channel of the input image, and all the products are summed into a single feature map.
Source: https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
Depthwise Convolutional Layer
Depthwise convolution performs convolution on each channel separately. For an image with 3 channels, a depthwise convolution creates an output feature map that also has 3 channels. Each channel gets its own set of weights.
Pointwise Convolutional Layer
The depthwise convolution is followed by a pointwise convolution. Pointwise convolution is a regular convolution with a 1x1 kernel, so each pointwise filter combines the 3 input channels into one output channel.
Depthwise Separable Convolution is Fast!
Input Image: 224x224x3

Regular Convolution (64 filters)
(224-3+1)^2 x (3x3x3) x 64 = 85,162,752 multiplication operations
Here (224-3+1)^2 = (input size - filter size + 1)^2 is the output size, 3x3x3 is the filter size, and 64 is the number of filters.

Depthwise Separable Convolution (64 filters)
Depthwise Convolution: (224-3+1)^2 x (3x3x1) x 3 = 1,330,668
(the 3 kernels correspond to the number of channels in the input image)
Pointwise Convolution: (224-3+1)^2 x (1x1x3) x 64 = 9,462,528
Total: 1,330,668 + 9,462,528 = 10,793,196 multiplication operations
7.9 Times Faster!
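The multiplication counts above can be checked with a short script. The function names are illustrative; the formulas follow the slide (no padding, stride 1).

```python
def regular_conv_mults(input_size, kernel, in_channels, n_filters):
    """Multiplications for a regular convolution."""
    out = input_size - kernel + 1
    return out * out * (kernel * kernel * in_channels) * n_filters


def depthwise_separable_mults(input_size, kernel, in_channels, n_filters):
    """Multiplications for a depthwise convolution followed by a pointwise one."""
    out = input_size - kernel + 1
    depthwise = out * out * (kernel * kernel) * in_channels
    pointwise = out * out * in_channels * n_filters
    return depthwise + pointwise


regular = regular_conv_mults(224, 3, 3, 64)
separable = depthwise_separable_mults(224, 3, 3, 64)
print(regular, separable, round(regular / separable, 1))  # 85162752 10793196 7.9
```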
ReLU6
In the MobileNet paper, the authors found that ReLU6 is more robust than regular ReLU under low-precision computation, such as on mobile devices.
Source: https://machinethink.net/blog/mobilenet-v2/
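For reference, ReLU6 is ReLU with its output capped at 6. A minimal sketch for single values:

```python
def relu(x):
    """Regular ReLU: negative values become 0."""
    return max(0.0, x)


def relu6(x):
    """ReLU6: like ReLU, but the output is also capped at 6."""
    return min(max(0.0, x), 6.0)


print([relu(v) for v in (-2.0, 3.0, 8.0)])   # [0.0, 3.0, 8.0]
print([relu6(v) for v in (-2.0, 3.0, 8.0)])  # [0.0, 3.0, 6.0]
```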
Transfer Learning
The ImageNet dataset contains 1,000 categories and 1.2 million images.
Source: https://gluon-cv.mxnet.io/build/examples_datasets/imagenet.html
Case: Tiny Dataset
For example, fewer than 500 labels for each class.
Use the weights transferred from ImageNet and keep all of them frozen.
Train only the classification layers on your dataset.
Case: Small Dataset
For example, 500-1,000 labels for each class.
Use the weights transferred from ImageNet and keep the earlier layers frozen.
Train the last few convolutional layers and the classification layers on your dataset.
Case: Large Dataset
For example, more than 1,000 labels for each class.
Train the entire network on your dataset. It is still good to initialize with the ImageNet weights.
Workshop 16: MobileNet
Workshop 17: TensorFlow.js
