AI.5 Machine Learning (21 26)
Machine Learning
Artificial Intelligence
CONTENTS
• Introduction to ML
• Supervised learning
• Regression: Linear Regression
• Classification: Naïve Bayes, Decision trees
• Unsupervised learning
• Clustering: K-Means
• Model Evaluation
• Neural Network
Introduction to ML
• Build ML models
• What is ML?
…Introduction to ML
• How does ML relate to AI?
• Three components of ML
• ML vs AI
https://vas3k.com/blog/machine_learning/

…Introduction to ML
• Overview: the map of the ML world
https://vas3k.com/blog/machine_learning/
• Overview – (1) Classical ML
…Introduction to ML
Popular algorithms:
• Q-Learning, SARSA, DQN, A3C
• Genetic algorithm

…Introduction to ML
Popular algorithms:
• Random Forest
• Gradient Boosting
(ensemble machine learning methods)

…Introduction to ML
Popular architectures:
• Perceptron
• Convolutional Networks (CNN)
• Recurrent Networks (RNN)
• Autoencoders
…Introduction to ML
• Supervised vs Unsupervised (classical ML)
• Supervised learning: learning a model from labeled data.
• Unsupervised learning: learning a model from unlabeled data.

…Introduction to ML
• Classification: predicting an object's category.

…Introduction to ML
• Supervised learning
• Regression is the process of predicting continuous values.

…Introduction to ML
• Supervised learning
• Classification is the process of predicting discrete class labels/categories.

…Introduction to ML
• Unsupervised learning
• Clustering is grouping of data points or objects.
Supervised learning
• Given a training set of example input–output pairs
  $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$.
• Regression:
  o $y$ is a real value, $y \in \mathbb{R}$
  o $f : \mathbb{R}^d \to \mathbb{R}$
  $f$ is called a regressor.
• Classification:
  o $y$ is discrete. To simplify, $y \in \{-1, +1\}$
  o $f : \mathbb{R}^d \to \{-1, +1\}$
  $f$ is called a binary classifier.
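The setting above can be illustrated with a minimal sketch; the weight values below are arbitrary illustrative numbers, not learned ones:

```python
# Minimal sketch of the supervised-learning setting: a regressor maps
# R^d -> R, and a binary classifier maps R^d -> {-1, +1}.
# The weights are made-up illustrative values, not learned ones.
w = [0.5, -1.0]

def regressor(x):
    # f: R^2 -> R, here a simple linear function of the inputs
    return sum(wi * xi for wi, xi in zip(w, x))

def binary_classifier(x):
    # f: R^2 -> {-1, +1}, obtained by thresholding the regressor at 0
    return +1 if regressor(x) >= 0 else -1
```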
Supervised learning
• Regression
• Classification
Linear Regression: History

Linear Regression
o $\omega$ are weights
o $w$ is the vector $(\omega_0, \omega_1)$
o Learning the linear model → learning the $\omega$

Linear Regression
• A line in $\mathbb{R}^2$, a hyperplane in $\mathbb{R}^3$.
Introduction to Regression
• Regression model
• Applications of regression
• Regression algorithms
Simple Linear Regression
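The slides' worked equations are not reproduced here, but the standard closed-form least-squares solution for fitting $\omega_0$ and $\omega_1$ in simple linear regression can be sketched as follows (function and variable names are my own):

```python
def fit_simple_linear(xs, ys):
    # Closed-form least squares for y ≈ w0 + w1 * x:
    #   w1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
    #   w0 = ȳ - w1 * x̄
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    w0 = y_bar - w1 * x_bar
    return w0, w1
```

On data generated from y = 1 + 2x, the fit recovers those coefficients exactly.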
Multiple Linear Regression
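As a sketch of multiple linear regression with two features (the tiny dataset below is synthetic, generated without noise from y = 1 + 2·x1 + x2), the least-squares weights can be recovered with NumPy:

```python
import numpy as np

# Design matrix with a leading column of 1s so w[0] plays the role of w0.
X = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 1., 1.]])
# Targets generated (noise-free) from y = 1 + 2*x1 + 1*x2.
y = np.array([1., 3., 2., 4.])

# Least-squares solution of X w ≈ y.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
```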
Bayes Rule
$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$
• $p(A \mid B)$ is called the posterior (posterior distribution on A given B).
• $p(A)$ is called the prior.
• $p(B)$ is called the evidence.
• $p(B \mid A)$ is called the likelihood.
Bayes Rule
• This table divides the sample space into 4 mutually exclusive events.
• The probabilities in the margins are called marginals and are calculated by summing across the rows and the columns.
• Another form:
$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B \mid A)\, p(A) + p(B \mid \neg A)\, p(\neg A)}$$
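Both forms above can be checked numerically. The probability values below are made-up illustrative numbers, not taken from the slides:

```python
# Hypothetical values: p(A) = 0.01, p(B|A) = 0.9, p(B|not A) = 0.05.
p_A = 0.01
p_B_given_A = 0.9
p_B_given_not_A = 0.05

# Evidence via the second form: p(B) = p(B|A)p(A) + p(B|¬A)p(¬A)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes rule: posterior = likelihood * prior / evidence
p_A_given_B = p_B_given_A * p_A / p_B
```

Even with a fairly reliable "test" (p(B|A) = 0.9), the small prior keeps the posterior modest, which is the usual lesson of such examples.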
Example of Using Bayes Rule

Why probabilities?

Discriminative Algorithms
Generative Algorithms
• Idea: Build a model for what positive examples look like. Build a different model for what negative examples look like.
• To predict a new example, match it with each of the models and see which match is best.
• Model $p(x \mid y)$ and $p(y)$!
• Use Bayes rule to obtain $p(y \mid x) = \dfrac{p(x \mid y)\, p(y)}{p(x)}$
• To make a prediction, pick the class whose model matches best:
$$\hat{y} = \arg\max_y \, p(x \mid y)\, p(y)$$
Naive Bayes Classifier
• Probabilistic model.
• Highly practical method.
• Applied to domains such as natural language text documents.
• "Naive" because of the strong independence assumption it makes (not realistic).
• Simple model.
• A strong method that can be comparable to decision trees and neural networks in some cases.
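A minimal sketch of the classifier on categorical features. The tiny dataset and the function names are hypothetical (not the slides' example), and no smoothing is applied, so unseen feature values get probability 0:

```python
from collections import Counter

def nb_fit(X, y):
    # Count class frequencies for p(y) and per-class feature-value
    # frequencies for p(x_j | y).
    class_counts = Counter(y)
    cond_counts = Counter()
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            cond_counts[(j, v, yi)] += 1
    return class_counts, cond_counts

def nb_predict(x, class_counts, cond_counts, n):
    # Pick the class maximizing p(y) * prod_j p(x_j | y)
    # (the naive independence assumption).
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n
        for j, v in enumerate(x):
            score *= cond_counts[(j, v, c)] / count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
y = ["no", "yes", "yes", "no"]
classes, cond = nb_fit(X, y)
```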
Setting

Algorithm

Example

Solution
• The conditional probabilities lead to the prediction $y_{new} = \text{yes}$.
Tree Classifiers

Example

Splitting criteria
Decision trees
• At the first split, starting from the root, we choose the attribute that has the maximum gain.
• Then we restart the same process at each of the child nodes (if the node is not pure).
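The gain criterion used to choose the split can be sketched as entropy reduction (function names are my own; the labels below are an illustrative two-class toy set):

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum over classes c of p_c * log2(p_c)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    # Gain = H(parent) - weighted average of H(child) over the split's groups.
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
```

A split that separates the classes perfectly has gain equal to the parent's entropy; a split that leaves each child as mixed as the parent has gain 0.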
…Clustering
• Introduction to Clustering
• Clustering applications
k-Means clustering
• Introduction to Clustering
• k-Means algorithms
• 1-dimensional similarity/distance
• Multi-dimensional similarity/distance
K-means Clustering
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
Distance measures: Euclidean, Cosine.
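The two steps above, plus the usual centroid-update step, can be sketched with Euclidean distance. For reproducibility the initial centroids here are fixed rather than random, and the data is a made-up toy set:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(points, centroids, iterations=10):
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each object goes to the nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, cluster in enumerate(clusters):
            if cluster:
                centroids[j] = [sum(dim) / len(cluster) for dim in zip(*cluster)]
    return centroids, clusters

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, clusters = k_means(points, [[1.0, 1.0], [9.0, 9.0]])
```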
Training and Testing
• The squared error:
$$E = \sum_{i} \left( y_i - f(x_i) \right)^2$$
Model Evaluation in Regression Models
Evaluation Metrics in Regression Models
• Regression accuracy
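The standard regression-accuracy metrics (I am assuming MAE, MSE and R², as these are the usual ones; function names are my own) can be sketched as:

```python
def mae(y_true, y_pred):
    # Mean absolute error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean squared error
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot: 1 for a perfect fit,
    # 0 for a model no better than predicting the mean.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```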
Evaluation Metrics in Classification
• Classification accuracy
• Jaccard index
Confusion matrix

Evaluation Metrics in Classification
• F1-score
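From the confusion-matrix counts, precision, recall, F1 and the Jaccard index mentioned earlier follow directly; a sketch (function name is my own):

```python
def classification_metrics(tp, fp, fn):
    # All derived from confusion-matrix counts:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Jaccard: intersection over union of predicted and true positives.
    jaccard = tp / (tp + fp + fn)
    return precision, recall, f1, jaccard
```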
Evaluation Metrics in Classification
• Log loss
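Log loss penalizes confident wrong probability predictions; the standard binary form can be sketched as (function name is my own):

```python
import math

def log_loss(y_true, p_pred):
    # y_true in {0, 1}; p_pred is the predicted probability of class 1.
    # Loss = -(1/N) * sum( y*log(p) + (1 - y)*log(1 - p) )
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)
```

A confident correct prediction (p = 0.9 for a true 1) costs much less than an unsure one (p = 0.5).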
CONTENTS
• Neural Networks
• Perceptron
• From perceptron to MLP
• Backpropagation algorithm
• Multiclass case
• MNIST database
Neural Networks

Perceptron
$$f(x_i) = \mathrm{sign}\left( \sum_{j=0}^{d} \omega_j x_{ij} \right)$$
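The perceptron rule above, taking x[0] = 1 as the bias input, can be sketched as follows (the AND weights used in the check are a standard textbook choice, not necessarily this slide's values):

```python
def perceptron(x, w):
    # f(x) = sign( sum_{j=0}^{d} w_j * x_j ), with x[0] = 1 as the bias input.
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s >= 0 else -1

# Weights implementing logical AND on inputs in {0, 1} (bias weight first).
w_and = [-1.5, 1.0, 1.0]
```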
Perceptron expressiveness
From perceptron to MLP
$g(z) \to 1$ when $z \to +\infty$; $g(z) \to 0$ when $z \to -\infty$.

Perceptron with Sigmoid
The XOR example

The OR example
• First, what is the perceptron of the OR?

The AND and NAND example
• Similarly, we obtain the perceptrons for the AND and NAND.
• Note how the weights in the NAND are the inverse weights of the AND.

The XOR example
• Let's try to create a NN for the XOR function using elementary perceptrons.
• Let's put them together...
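Putting them together: XOR(a, b) = AND(OR(a, b), NAND(a, b)). A sketch with step-function perceptrons (the specific weight values are a standard choice, not necessarily the ones drawn in the slides; note the NAND weights are the inverse of the AND's):

```python
def step(z):
    return 1 if z >= 0 else 0

def OR(a, b):
    return step(-0.5 + a + b)

def AND(a, b):
    return step(-1.5 + a + b)

def NAND(a, b):  # inverse weights and bias of AND
    return step(1.5 - a - b)

def XOR(a, b):
    # Two-layer network: the hidden layer computes OR and NAND,
    # and the output neuron ANDs them together.
    return AND(OR(a, b), NAND(a, b))
```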
Backpropagation algorithm

Feedforward-Backpropagation
• $e = (x, y)$, where $e$ is an example, $x$ its feature vector, and $y$ its label.
• Feeding $x$ forward through the network (weights $\omega_{ij}$ from neuron $i$ to neuron $j$) produces the output $f(x) = o_e$.
Backpropagation rules
• We consider $k$ outputs.
• For an example $e$ defined by $(x, y)$, the error on training example $e$, summed over all output neurons in the network, is:
$$E_e(\omega) = \frac{1}{2} \sum_k \left( y_k - o_k \right)^2$$
• Remember, gradient descent iterates through all the training examples one at a time, descending the gradient of the error w.r.t. this example:
$$\Delta\omega_{ij} = -\alpha \, \frac{\partial E_e(\omega)}{\partial \omega_{ij}}$$
Backpropagation rules
Notations:
• $x_{ij}$: the $i^{th}$ input to neuron $j$.
• $\omega_{ij}$: the weight associated with the $i^{th}$ input to neuron $j$.
• $z_j = \sum_i \omega_{ij} x_{ij}$: the weighted sum of inputs for neuron $j$.
• $o_j$: the output computed by neuron $j$.
• $g$: the sigmoid function.
• outputs: the set of neurons in the output layer.
• $Succ(j)$: the set of neurons whose immediate inputs include the output of neuron $j$.
Backpropagation notations
• Neuron $j$ receives inputs $x_{ij}$ with weights $\omega_{ij}$, computes $z_j = \sum_i \omega_{ij} x_{ij}$, and outputs $o_j = g(z_j)$.

Backpropagation rules
• Case 1: Neuron $j$ is an output neuron
$$\frac{\partial E}{\partial z_j} = -\left( y_j - o_j \right) o_j \left( 1 - o_j \right)$$
$$\Delta\omega_{ij} = \alpha \left( y_j - o_j \right) o_j \left( 1 - o_j \right) x_{ij}$$
• We will write
$$\delta_j = -\frac{\partial E}{\partial z_j}, \qquad \Delta\omega_{ij} = \alpha \, \delta_j \, x_{ij}$$

Backpropagation rules
• Case 2: Neuron $j$ is a hidden neuron
Backpropagation algorithm
• Input: training examples $(x, y)$, learning rate $\alpha$ (e.g., $\alpha = 0.1$), $n_i$, $n_h$ and $n_o$.
• Output: a neural network with one input layer, one hidden layer and one output layer, with $n_i$, $n_h$ and $n_o$ neurons respectively, and all its weights.
1. Create a feedforward network with $n_i$, $n_h$, $n_o$ neurons.
2. Initialize all weights to small random numbers (e.g., in [-0.2, 0.2]).
3. Repeat until convergence:
   (a) For each training example $(x, y)$:
       i. Feed forward: propagate example $x$ through the network and compute the output $o_j$ of every neuron.
       ii. Propagate backward: propagate the errors backward.
          Case 1: For each output neuron $k$, calculate its error
          $$\delta_k = o_k (1 - o_k)(y_k - o_k)$$
          Case 2: For each hidden neuron $h$, calculate its error
          $$\delta_h = o_h (1 - o_h) \sum_{k \in Succ(h)} \omega_{hk} \delta_k$$
       iii. Update each weight: $\omega_{ij} \leftarrow \omega_{ij} + \alpha \, \delta_j \, x_{ij}$
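The algorithm above, run on one example with a tiny 2-2-1 network, can be sketched as follows. The initial weights are fixed rather than random so the run is reproducible, and index 0 of each weight vector is the bias:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fixed small initial weights for reproducibility (the algorithm says random).
# W_h[j] = [bias, weight_from_x0, weight_from_x1] for hidden neuron j.
W_h = [[0.1, 0.2, -0.1],
       [-0.2, 0.1, 0.1]]
# W_o = [bias, weight_from_h0, weight_from_h1] for the single output neuron.
W_o = [0.1, -0.1, 0.2]
alpha = 0.5
x, y = [1.0, 0.0], 1.0          # one training example

def forward():
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W_h]
    o = sigmoid(W_o[0] + W_o[1] * h[0] + W_o[2] * h[1])
    return h, o

def train_step():
    h, o = forward()
    # Case 1: output neuron, delta = o(1 - o)(y - o)
    delta_o = o * (1 - o) * (y - o)
    # Case 2: hidden neurons, delta_h = o_h(1 - o_h) * w_ho * delta_o
    # (the output neuron is each hidden neuron's only successor here)
    delta_h = [h[j] * (1 - h[j]) * W_o[j + 1] * delta_o for j in range(2)]
    # Updates: w_ij <- w_ij + alpha * delta_j * x_ij (bias input is 1).
    W_o[0] += alpha * delta_o
    for j in range(2):
        W_o[j + 1] += alpha * delta_o * h[j]
        W_h[j][0] += alpha * delta_h[j]
        W_h[j][1] += alpha * delta_h[j] * x[0]
        W_h[j][2] += alpha * delta_h[j] * x[1]

errors = []
for _ in range(50):
    _, o = forward()
    errors.append(0.5 * (y - o) ** 2)
    train_step()
```

After repeated updates the squared error on this example decreases, as gradient descent predicts.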
Observations
• $g(x) = \mathrm{sigmoid}(x) = \dfrac{e^{kx}}{1 + e^{kx}}$, for $k = 1$, $k = 2$, etc.
• $g(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (it is a rescaling of the logistic sigmoid!)
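The rescaling claim can be checked numerically: tanh stretches the logistic sigmoid from (0, 1) to (-1, 1) via tanh(x) = 2·sigmoid(2x) - 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh is the logistic sigmoid rescaled from (0, 1) to (-1, 1):
#   tanh(x) = 2 * sigmoid(2x) - 1
for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    assert abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) < 1e-12
```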
Multiclass case
MNIST database
Experiments with the MNIST database
• The MNIST database of handwritten digits.
• Training set of 60,000 examples, test set of 10,000 examples.
• Vectors in $\mathbb{R}^{784}$ (28×28 images).
• Labels are the digits they represent.
• Various methods have been tested with this training set and test set.
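A sketch of the vector representation mentioned above: each 28×28 image is flattened into a vector in R^784 before being fed to a classifier (the all-zero "image" here is a dummy stand-in; real MNIST pixels are 0-255 grayscale values):

```python
def flatten(image):
    # image: 28 rows of 28 grayscale pixel values -> one vector of length 784
    return [pixel for row in image for pixel in row]

# Dummy all-black 28x28 "image" standing in for a real MNIST digit.
image = [[0] * 28 for _ in range(28)]
vector = flatten(image)
```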
Nhân bản – Phụng sự – Khai phóng (Humanity – Service – Liberation)