Supervised ML
Dr. Praisan Padungweang
School of Information Technology
KMUTT
Example data set

Columns are features (also called variables, attributes, or dimensions) plus the label (also called class, target, or output). Rows are samples (also called observations or records).

Fisher's Iris Dataset:

Sepal length  Sepal width  Petal length  Petal width  Species
5.0           2.0          3.5           1.0          I. versicolor
4.5           2.3          1.3           0.3          I. setosa
5.0           2.3          3.3           1.0          I. versicolor
6.2           2.8          4.8           1.8          I. virginica
6.8           2.8          4.8           1.4          I. versicolor
5.6           2.8          4.9           2.0          I. virginica
5.8           2.8          5.1           2.4          I. virginica
7.7           2.8          6.7           2.0          I. virginica
4.4           2.9          1.4           0.2          I. setosa
7.3           2.9          6.3           1.8          I. virginica
4.3           3.0          1.1           0.1          I. setosa
4.4           3.0          1.3           0.2          I. setosa
6.1           3.0          4.6           1.4          I. versicolor
6.0           3.0          4.8           1.8          I. virginica
5.9           3.2          4.8           1.8          I. versicolor
6.5           3.2          5.1           2.0          I. virginica
6.4           3.2          5.3           2.3          I. virginica
6.9           3.2          5.7           2.3          I. virginica
:             :            :             :            :
https://en.wikipedia.org/wiki/Iris_flower_data_set
Predictive Analytics

Classification models: logistic regression, decision tree, support vector machine, neural network
Machine learning

Supervised Learning
• Labeled data
• Direct feedback
• Predict outcome/feature

Unsupervised Learning
• No labels
• No feedback
• Find hidden structure in data
Supervised Learning
• Labeled data
• Direct feedback
• Predict outcome/feature

Classification

Labeled data (features plus label) are fed to the ML model and learning algorithm; the learned model then predicts the label for new samples, with feedback from the known labels during learning. (Illustration: images labeled cat, dog, or other train a classifier that predicts the label of a new image.)

id    Plasma glucose concentration (x1)  Body mass index (x2)  Diabetes
A001  148                                33.6                  1
A002  85                                 26.6                  0
:     :                                  :                     :

Learned model:
z = 7.5156 + (-0.0352)x1 + (-0.0763)x2
f(z) = 1 / (1 + e^(-z))

New person: plasma glucose concentration = 100, body mass index = 30. Diabetes?
z = 7.5156 + (-0.0352)(100) + (-0.0763)(30) = 1.7066
f(z) = 1 / (1 + e^(-1.7066)) = 0.8464 => yes

Regression

date      SP        NIKKEI    EU
5-Jan-09  -0.00468  0         0.012698
6-Jan-09  0.007787  0.004162  0.011341
7-Jan-09  -0.03047  0.017293  -0.01707
:         :         :         :

Learned model:
z = 0 + 0.6099x1 + 0.1723x2   (x1 = SP, x2 = NIKKEI)

Some day: SP = 0.033007, NIKKEI = 0.005594. EU?
z = 0 + 0.6099 × 0.033007 + 0.1723 × 0.005594 = 0.021094714 => EU = 0.021094714
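The arithmetic above is easy to check in code. Below is a minimal Python sketch of both predictions, reusing the slide's fitted coefficients (helper names are mine; a real workflow would learn these coefficients from the labeled data):

import math

def logistic(z):
    # f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Classification: the diabetes model from the slide
glucose, bmi = 100, 30
z = 7.5156 + (-0.0352) * glucose + (-0.0763) * bmi
print(round(z, 4), round(logistic(z), 4))  # 1.7066 0.8464 -> "yes" (>= 0.5)

# Regression: predicting the EU return from SP and NIKKEI
sp, nikkei = 0.033007, 0.005594
eu = 0.0 + 0.6099 * sp + 0.1723 * nikkei
print(round(eu, 6))  # ~0.021095 (slide: 0.021094714, from unrounded coefficients)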
Unsupervised Machine Learning

Unsupervised Learning
• No labels
• No feedback
• Find hidden structure in data

Examples: image clustering, customer segmentation.

Training data (features only):

ID    DayMins  EveMins  NightMins  IntlMins
C001  265.1    197.4    244.7      10
C002  161.6    195.5    254.4      13.7
C003  243.4    121.2    162.6      12.2
C004  299.4    61.9     196.9      6.6
C005  166.7    148.3    186.9      10.1
:     :        :        :          :

The ML model and learning algorithm find cluster centers (customer segments):

Cluster    Members  DayMins  EveMins  NightMins  IntlMins
Cluster 1  34%      146.30   162.63   197.04     10.99
Cluster 2  30%      153.21   248.94   204.82     9.66
Cluster 3  35%      234.84   196.69   201.17     10.00

(Illustration: 3-D scatter plot of the clusters over EveMins, NightMins, and IntlMins.)

A new person (DayMins 124.3, EveMins 277.1, NightMins 250.7, IntlMins 15.5) is assigned to the nearest cluster center:

Cluster    Distance
Cluster 1  6.38
Cluster 2  47.85
Cluster 3  51.30

=> This person is in cluster 1.
Applications: customer segmentation, anomaly detection, image segmentation, gene micro-array clustering.
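Below is a minimal Python sketch of the nearest-center assignment used in the segmentation example above. The helper names are mine, and since the slide's distances (6.38, 47.85, 51.30) imply the features were standardized before clustering, raw-feature distances computed here will differ:

import math

centers = {
    "Cluster 1": [146.30, 162.63, 197.04, 10.99],
    "Cluster 2": [153.21, 248.94, 204.82, 9.66],
    "Cluster 3": [234.84, 196.69, 201.17, 10.00],
}
new_person = [124.3, 277.1, 250.7, 15.5]  # DayMins, EveMins, NightMins, IntlMins

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Assign the new person to the cluster with the smallest distance
distances = {name: euclidean(new_person, c) for name, c in centers.items()}
print(min(distances, key=distances.get))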
Reinforcement Learning
• Decision process
• Reward system
• Learning a series of actions
Supervised Machine Learning Model: Regression

Features: SP (x1), NIKKEI (x2). Target: EU.

Model learned from the historical data (date, SP, NIKKEI, EU):
z = 0 + 0.6099x1 + 0.1723x2

Some day: SP = 0.033007, NIKKEI = 0.005594. EU?
z = 0 + 0.6099 × 0.033007 + 0.1723 × 0.005594 = 0.021094714 => EU = 0.021094714
Supervised Machine Learning Model: Classification

Features: plasma glucose concentration (x1), body mass index (x2). Target: Diabetes.

Training data (historical data):

id    Plasma glucose concentration  Body mass index  Diabetes
A001  148                           33.6             1
A002  85                            26.6             0
:     :                             :                :

Model learned by machine learning:
z = 7.5156 + (-0.0352)x1 + (-0.0763)x2
f(z) = 1 / (1 + e^(-z))

New person: plasma glucose concentration = 100, body mass index = 30. Diabetes?
z = 7.5156 + (-0.0352)(100) + (-0.0763)(30) = 1.7066
f(z) = 1 / (1 + e^(-1.7066)) = 0.8464 => yes

Full dataset: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
Model Training and Model Selection
• Held-out test data
• The data is divided into a training set and a test set

(Diagram: the original dataset is split into a training set and a test set; the training set, used for model learning, is further divided into a training set and a validation set before the final model is produced; the test set evaluates the model.)
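A minimal sketch of this split with scikit-learn's train_test_split (an assumption; the toy arrays stand in for the original dataset):

from sklearn.model_selection import train_test_split

# Toy feature/label arrays standing in for the original dataset
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

# Original dataset -> training set + test set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Training set -> (smaller) training set + validation set
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 6 2 2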
Supervised model training and evaluation: Training

Class A: 0-No-DR, 1-Mild
Class B: 2-Moderate, 3-Severe, 4-PDR

(Illustration: labeled images for the two classes used to train the model.)
Supervised model training and evaluation: Validation

Class A: 0-No-DR, 1-Mild
Class B: 2-Moderate, 3-Severe, 4-PDR

(Illustration: labeled images for the two classes used to validate the model.)
Supervised model training and evaluation: Using/Evaluation (test set)

Class A: 0-1; Class B: 2-4

(Illustration: eight unlabeled test images, numbered 1-8, classified by the model.)
Decision trees

Decision trees are recursive partitioning algorithms that produce a tree-like structure representing the patterns in an underlying data set.
Decision tree decision boundaries

Decision trees model decision boundaries orthogonal to the axes. (Illustration: customer data with Age, Income, and Response.)
Splitting decision

Temp  Class
36.0  t
36.3  t
36.8  t
37.3  t
36.9  t
37.0  t
37.6  1
38.2  1
40.2  1
38.6  1
39.4  1
37.8  1

Parent node: equal numbers of t and 1 => Entropy = 1

Candidate split at Temp ≤ 37.5 vs > 37.5:
one child is all t, the other all 1 => Entropy = 0 and Entropy = 0

A second candidate split (shown on the slide with threshold 40.5) leaves one child mixed:
Entropy ≈ 0.9 and Entropy = 0

Each split is scored by comparing the parent entropy against the weighted average of the child entropies.
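A minimal sketch of the entropy and information-gain computation behind this choice; the 37.8 row from the logistic-regression slide later in the deck is included so the parent node has the balanced class counts (6 t, 6 1) that give entropy 1:

import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

data = [(36.0, "t"), (36.3, "t"), (36.8, "t"), (37.3, "t"), (36.9, "t"), (37.0, "t"),
        (37.6, "1"), (38.2, "1"), (40.2, "1"), (38.6, "1"), (39.4, "1"), (37.8, "1")]

def information_gain(data, threshold):
    parent = [c for _, c in data]
    left = [c for t, c in data if t <= threshold]
    right = [c for t, c in data if t > threshold]
    child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(data)
    return entropy(parent) - child  # parent entropy minus weighted child average

for threshold in (36.5, 37.5, 38.5):
    print(threshold, round(information_gain(data, threshold), 3))
# The pure split at 37.5 gives the maximum gain of 1 bit.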
Stopping criteria

Splitting stops according to an appropriate metric, for example:
• Tree depth
• Maximum depth of a tree
• Deeper trees are more expressive (potentially allowing higher accuracy), but they are also more costly to train and more likely to overfit.
Assignment decision

The assignment typically looks at the majority class (or the class probability) within the leaf node to make the decision.
Decision tree applications

• Decision trees can be used for various purposes in analytics.
• Input selection: attributes that occur at the top of the tree are more predictive of the target.
• Initial segmentation: build a tree two or three levels deep as the segmentation scheme, then use second-stage machine learning models for further refinement.
Logistic regression/classification

Temp  Class    Temp  Predict f(z)
36.0  t        36.0  0.0
36.3  t        36.3  0.1
36.8  t        36.8  0.2
37.3  t        37.3  0.4
36.9  t        36.9  0.2
37.0  t        37.0  0.3
37.6  1        37.6  0.5
38.2  1        38.2  0.8
40.2  1        40.2  1.0
38.6  1        38.6  0.9
39.4  1        39.4  1.0
37.8  1        37.8  0.6

z = -75 + 2·Temp
f(z) = 1 / (1 + e^(-z))

(Plot: the sigmoid f(z) over Temp from 35.5 to 40.5.)
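A minimal sketch reproducing the Predict column from the fitted model above:

import math

def predict(temp):
    z = -75 + 2 * temp             # z = -75 + 2*Temp
    return 1 / (1 + math.exp(-z))  # f(z), the logistic function

for temp in [36.0, 36.3, 36.8, 37.3, 36.9, 37.0,
             37.6, 38.2, 40.2, 38.6, 39.4, 37.8]:
    print(temp, round(predict(temp), 1))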
Logistic regression

Historical data (features and binary target):

id    Plasma glucose concentration  Body mass index  Diabetes
A001  148                           33.6             1
A002  85                            26.6             0
A003  183                           23.3             1
A004  89                            28.1             0
A005  137                           43.1             1
A006  116                           25.6             0
A007  78                            31               1
A008  115                           35.3             0
A009  197                           30.5             1
:     :                             :                :

Model learned by machine learning:
z = 7.5156 + (-0.0352)x1 + (-0.0763)x2
f(z) = 1 / (1 + e^(-z))

New person: plasma glucose concentration = 100, body mass index = 30. Diabetes?
z = 7.5156 + (-0.0352)(100) + (-0.0763)(30) = 1.7066
f(z) = 1 / (1 + e^(-1.7066)) = 0.8464 => yes

Full dataset: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
Model decision boundaries

Logistic classification gives a single smooth boundary:
f(z) = 1 / (1 + e^(-(w0 + w1·Age + w2·Income)))

Decision trees give axis-orthogonal boundaries.
Support Vector Machines

• A nonlinear SVM classifier first maps the input data to a higher-dimensional feature space using some mapping (kernel methods).

https://www.youtube.com/watch?v=3liCbRZPrZA
Support Vector Machines

(Illustrations: data that is not linearly separable in (x1, x2) becomes separable after mapping to a higher-dimensional space (x1, x2, x3); radial basis function (RBF) kernel.)
Support Vector Machines

• Kernel methods map the data into higher-dimensional spaces in the hope that in this higher-dimensional space the data could become more easily separable or better structured.
• Different types of kernel functions can be used. The most popular are:
  • Linear kernel: K(x, x') = x^T x'
  • Polynomial kernel: K(x, x') = (1 + x^T x')^d
  • RBF kernel: K(x, x') = exp(-||x - x'||^2 / (2σ^2))

Empirical evidence has shown that the RBF kernel usually performs best, but note that it includes an extra parameter σ to be tuned.
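A minimal sketch comparing these kernels with scikit-learn's SVC (an assumption; note that sklearn parameterizes the RBF kernel with gamma, which plays the role of 1/(2σ²)):

from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy, for illustration only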
Neural Networks

Perceptron

(Illustration: a single perceptron with a constant bias input of 1; the bias weight 7.5156 is the intercept of the logistic model above.)
Neural networks

• Multi-Layer Perceptron (MLP)

The hidden layer of MLP

• The hidden layers map the input into a new representation. (Illustration: an MLP with hidden units h1-h4 plus a constant bias input.)
Neural networks

Each node has a transformation function f(·), also called an activation function. The most popular activation functions are:
• Linear: ranging between -∞ and +∞; f(z) = z
• Sigmoid (logistic): ranging between 0 and 1; f(z) = 1 / (1 + e^(-z))
• Hyperbolic tangent: ranging between -1 and +1; f(z) = (e^z - e^(-z)) / (e^z + e^(-z))
• Rectified linear unit (ReLU): ranging between 0 and +∞; f(z) = 0 for z < 0, f(z) = z for z ≥ 0
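A minimal NumPy sketch of the four activation functions:

import numpy as np

def linear(z):  return z                                                       # (-inf, +inf)
def sigmoid(z): return 1 / (1 + np.exp(-z))                                    # (0, 1)
def tanh(z):    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))     # (-1, +1)
def relu(z):    return np.where(z < 0, 0, z)                                   # [0, +inf)

z = np.linspace(-3, 3, 7)
for f in (linear, sigmoid, tanh, relu):
    print(f.__name__, f(z).round(2))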
NN-model example

(Illustration: network diagram.)
Neural networks: example 1

(Illustration: the diabetes training data from the logistic regression example — plasma glucose concentration and body mass index for persons A002-A009 — used as network inputs, with a constant bias input of 1.)
Selecting activation function
Model training and evaluation strategy

Model comparison with held-out test data:
• The data is divided into a training set and a test set
• The training set is used for model creation (training and validation)
• Candidate models are compared and the best-performing one is selected
• The test set then measures the selected model's performance

(Diagram: original data → training set → candidate models → selected model → test performance on the test set.)
Model training and evaluation strategy

Hold-out vs. cross-validation for model comparison:
• K-fold cross-validation

(Diagram: with 5 folds, the training data is split into five parts; in each of five rounds a different fold serves as the validation set and the remaining four folds form the training set. The held-out test set is used only for the final evaluation of the chosen model.)
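A minimal sketch of 5-fold cross-validation for comparing models, using scikit-learn (an assumption; the dataset and candidate models are illustrative). The test set is held out first, and cross-validation runs only on the remainder:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
    scores = cross_val_score(model, X_rest, y_rest, cv=5)  # 5 folds
    print(type(model).__name__, scores.mean())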
Overfitting and underfitting

Confusion Matrix

Customer  Age  Income  Gender  ...  Response  Score
John      30   1,200   M       ...  Bad       0.51 => predicted Good

(Predictions are tabulated as actual Good/Bad versus predicted Good/Bad.)
Model evaluation

Multi-class confusion matrix: rows are actual classes A1…Ak, columns are predicted classes P1…Pk; the diagonal cells AiPi count the correct predictions.

            Predicted
             a  b  c
Actual   a   ✓  ✗  ✗
         b   ✗  ✓  ✗
         c   ✗  ✗  ✓
Evaluation Metrics

From the confusion matrix (predicted T/F vs. actual, with counts TP, FN, FP, TN):

Accuracy = (TP + TN) / (TP + FN + FP + TN)
Precision (positive predictive value) = TP / (TP + FP)
Recall (sensitivity) = TP / (TP + FN)
F1-score = (2 × Precision × Recall) / (Precision + Recall)
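A minimal sketch computing these metrics from illustrative confusion-matrix counts (the TP, FN, FP, TN values are made up):

tp, fn, fp, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)              # positive predictive value
recall    = tp / (tp + fn)              # sensitivity
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)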
Expected Value for Model Evaluation: Targeted Marketing

The expected value of a campaign weights each confusion-matrix outcome by its cost or benefit:

Expected value = Σ p(outcome) × value(outcome)

(Worked example, partially lost in extraction: benefits such as 150 per reached responder (R) and costs such as -1 per contacted non-responder (N), combined with the outcome probabilities, give expected values of roughly 7.4 and 7.3 for the model-targeted campaigns versus 0 for the baseline.)
REGRESSION
Linear regression

Linear regression is a baseline modeling technique for modeling a continuous target variable.

BMI    Estimated percentage of body fat
23.63  12.3
23.33  6.1
24.67  25.3
24.88  10.4
25.52  28.7
26.46  20.9
:      :

(Scatter plot of estimated body fat percentage versus BMI: a regression problem.)
How Machine Learning Works

Most machine learning techniques learn by minimizing a loss.

Example: a regression problem, predicting the estimated percentage of body fat.

For one sample: y = real fat = 47.5, y' = predicted fat = 41.73, so the error is 47.5 - 41.73 = 5.77.

(Plot: the fitted line Fat = y' = 1.8454·BMI - 27.642; at BMI ≈ 37.59 it predicts 41.73.)

Error: e = y - y'
Cost/loss: SSE = e₁² + e₂² + e₃² + … + eₙ²
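A minimal sketch of the SSE cost using the slide's fitted line and the six BMI/fat pairs listed earlier (so the numeric result is only illustrative):

bmi = [23.63, 23.33, 24.67, 24.88, 25.52, 26.46]
fat = [12.3, 6.1, 25.3, 10.4, 28.7, 20.9]

def predict(b):
    return 1.8454 * b - 27.642  # the slide's fitted line: Fat = y'

errors = [y - predict(x) for x, y in zip(bmi, fat)]  # e = y - y'
sse = sum(e ** 2 for e in errors)                    # SSE = sum of e_i^2
print(round(sse, 2))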
The Regression

Price = 134.53·Size + 71270   (R² = 0.731)
Fat = y' = 1.8454·BMI - 27.642   (R² = 0.5547)
ISE = -0.033·SP - 0.0616·FTSE + 0.3021·NIKKEI + 1.1068·EU + 0.001   (R² = 0.519849)

(Plots: Price versus Size (0-5000); Fat versus BMI (15-45); actual versus predicted ISE.)
Beyond linear regression

Polynomial regression:
h_w(x) = w₀ + w₁x₁ + w₂x₁²

Fat = -46.149 + 3.2675·BMI - 0.0268·BMI²

(Plot: the quadratic fit of estimated body fat percentage versus BMI, 15-50.)
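A minimal sketch of fitting the quadratic with NumPy's polyfit (an assumption; the six BMI/fat pairs are the subset shown earlier, so the coefficients will not match the slide's full-data fit exactly):

import numpy as np

bmi = np.array([23.63, 23.33, 24.67, 24.88, 25.52, 26.46])
fat = np.array([12.3, 6.1, 25.3, 10.4, 28.7, 20.9])

# np.polyfit returns coefficients from the highest degree down
w2, w1, w0 = np.polyfit(bmi, fat, deg=2)  # Fat ~ w0 + w1*BMI + w2*BMI^2
print(w0, w1, w2)  # the slide's -46.149, 3.2675, -0.0268 come from the full dataset
print(np.polyval([w2, w1, w0], 30))  # predicted fat at BMI = 30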
Supervised Machine Learning Model: Regression (recap)

Features: SP (x1), NIKKEI (x2). Target: EU.

Some day: SP = 0.033007, NIKKEI = 0.005594. EU?
z = 0 + 0.6099 × 0.033007 + 0.1723 × 0.005594 = 0.021094714 => EU = 0.021094714
Beyond linear regression

Polynomial regression:
Price = (4/100000)·Day³ - (2.3/100)·Day² + 4.8·Day + 900

(Plot: model fitted to the FTSE SET All-Share Index, 10 Jun 2010 to 2 Sep 2011.)

On 7 Sep 2011: Day = 305 => Price = 1359.33
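A quick check of the extrapolation in Python:

def price(day):
    # Price = (4/100000)*Day^3 - (2.3/100)*Day^2 + 4.8*Day + 900
    return 4 / 100000 * day ** 3 - 2.3 / 100 * day ** 2 + 4.8 * day + 900

print(round(price(305), 2))  # 1359.33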
Effect of outliers

(Two scatter plots of fat versus BMI: with an extreme outlier included, the fitted line becomes y = 0.3252x + 10.716, showing how a single outlier can drag the regression fit.)