Machine Learning Basics
2017-07-14
Jinwook Kim
Book Information
Title: Deep Learning
Authors:
Ian Goodfellow
Yoshua Bengio
Aaron Courville
ISBN: 9780262337434
Chapter Organization (Part 1)
Chapter Organization (Part 2)
Chapter Organization (Part 3)
Machine Learning and AI
Chapter 5. Machine Learning Basics
This chapter provides a brief course in the most important general principles of machine learning
Checkers Game: The First ML Application
Learning Algorithms
"A field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
Task, $T$
Classification (with missing inputs)
Regression
Transcription
Machine translation
Structured output
Anomaly detection
Synthesis and sampling
Imputation of missing values
Denoising
Density estimation (or probability mass function estimation)
Classification
Regression
Transcription and Machine Translation
Performance Measure, $P$
Accuracy (or error rate) for categorical data
Measures the proportion of examples for which the model produces the correct output
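A minimal sketch of this measure in NumPy (the labels below are hypothetical):

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

accuracy = np.mean(y_true == y_pred)   # proportion of correct outputs
error_rate = 1.0 - accuracy            # proportion of incorrect outputs
print(f"accuracy={accuracy:.3f}, error_rate={error_rate:.3f}")
```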
Experience, $E$
Unsupervised learning algorithms
Experience a dataset containing many features
Learn useful properties of the structure of the dataset
Example: Linear Regression
$\hat{y} = \boldsymbol{w}^\top \boldsymbol{x}$
[Figure: data points and the fitted line in the $(x, y)$ plane]
Formal Definition of Linear Regression
Task, $T$: to predict $y$ from $\boldsymbol{x}$ by outputting $\hat{y} = \boldsymbol{w}^\top \boldsymbol{x}$
$\boldsymbol{x} \in \mathbb{R}^n$: input data
$y \in \mathbb{R}$: output value ($\hat{y}$: the value predicted by the model)
$\boldsymbol{w} \in \mathbb{R}^n$: parameters (or weights)
Performance measure: $\mathrm{MSE}_{\text{test}} = \frac{1}{m} \sum_i \big(\hat{y}^{(\text{test})} - y^{(\text{test})}\big)_i^2$
$m$: number of examples in the dataset
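A direct NumPy transcription of this performance measure, with made-up test data and weights:

```python
import numpy as np

# Hypothetical test set: m = 3 examples with n = 2 features each
X_test = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y_test = np.array([2.1, 1.9, 3.6])
w = np.array([0.8, 0.4])               # assumed already-learned weights

y_hat = X_test @ w                     # y_hat = w^T x for each example
mse_test = np.mean((y_hat - y_test) ** 2)
print(f"MSE_test = {mse_test:.4f}")
```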
Find $\boldsymbol{w}$ by Minimizing $\mathrm{MSE}_{\text{train}}$
$\nabla_{\boldsymbol{w}} \mathrm{MSE}_{\text{train}} = \mathbf{0}$
$\Rightarrow \nabla_{\boldsymbol{w}} \frac{1}{m} \sum_i \big(\hat{y}^{(\text{train})} - y^{(\text{train})}\big)_i^2 = \mathbf{0}$
$\Rightarrow \nabla_{\boldsymbol{w}} \frac{1}{m} \sum_i \big(\boldsymbol{X}^{(\text{train})} \boldsymbol{w} - \boldsymbol{y}^{(\text{train})}\big)_i^2 = \mathbf{0}$
…
$\Rightarrow \boldsymbol{w} = \big(\boldsymbol{X}^{(\text{train})\top} \boldsymbol{X}^{(\text{train})}\big)^{-1} \boldsymbol{X}^{(\text{train})\top} \boldsymbol{y}^{(\text{train})}$
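A minimal NumPy sketch of this closed-form solution on synthetic data (in practice `np.linalg.lstsq` is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X_train = rng.normal(size=(m, n))
w_true = np.array([1.5, -2.0, 0.5])
y_train = X_train @ w_true + 0.1 * rng.normal(size=m)  # noisy linear targets

# Normal equations: solve (X^T X) w = X^T y instead of inverting X^T X
w = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)
print(w)   # close to w_true
```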
Linear Regression Problem
Generalization
In ML, generalization is the ability to perform well on previously unobserved inputs
1. Making the training error small
2. Making the gap between training and test error small
Controlling a Model with its Capacity
Relationship between Capacity and Error
Training Set Size and Generalization
[Figure: fits of a degree-9 polynomial as the training set size increases]
Source: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, Chapter 1.
Regularization
Regularization is any modification to a learning algorithm intended to reduce its generalization error but not its training error
$J(\boldsymbol{w}) = \mathrm{MSE}_{\text{train}} + \lambda \boldsymbol{w}^\top \boldsymbol{w}$
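Minimizing $J(\boldsymbol{w})$ also has a closed form; a minimal sketch on synthetic data (the $m\lambda$ factor below comes from the $\frac{1}{m}$ inside $\mathrm{MSE}_{\text{train}}$):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/m) * ||X w - y||^2 + lam * w^T w (weight decay).
    Setting the gradient to zero gives w = (X^T X + m lam I)^{-1} X^T y."""
    m, n = X.shape
    return np.linalg.solve(X.T @ X + m * lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
print(ridge_fit(X, y, lam=0.0))   # plain least squares
print(ridge_fit(X, y, lam=0.5))   # weights shrunk toward zero
```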
Degree-9 Regression Model with Weight Decay
A large $\boldsymbol{w}$ overfits the training data
Hyperparameters vs. Parameters
Hyperparameters are higher-level properties of a model
They determine the model's capacity (or complexity)
They are not learned during training
e.g., the degree of a regression polynomial, the weight decay coefficient $\lambda$
Setting Hyperparameters with a Validation Set
$k$-fold Cross Validation
[Figure: the training set split into $k$ folds, each serving once as the validation fold]
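A plain-NumPy sketch of this procedure; the linear `fit` and MSE `loss` below are illustrative stand-ins for whatever model and metric are being validated:

```python
import numpy as np

def k_fold_score(X, y, k, fit, loss):
    """Average validation loss over k folds: each fold serves once as the
    validation set while the remaining k-1 folds are used for fitting."""
    idx = np.random.default_rng(0).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit(X[train], y[train])
        scores.append(loss(X[val] @ w, y[val]))
    return np.mean(scores)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=100)
fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
mse = lambda y_hat, y_true: np.mean((y_hat - y_true) ** 2)
print(k_fold_score(X, y, k=5, fit=fit, loss=mse))
```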
Estimation in Statistics
Point estimation
Attempts to provide the single best prediction of the true but unknown property of a model using sample data $\boldsymbol{x}_1, \dots, \boldsymbol{x}_m$
Function estimation
A type of estimation that predicts the relationship between input and target variables
Point estimator
$\hat{\boldsymbol{\theta}}_m = g(\boldsymbol{x}_1, \dots, \boldsymbol{x}_m)$
$\hat{\boldsymbol{\theta}}$: point estimator of a property of a model (e.g., its expectation)
$m$: number of data points
$\boldsymbol{x}_1, \dots, \boldsymbol{x}_m$: independent and identically distributed (i.i.d.) data points
$g(\cdot)$: any estimation function of the given data points
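As a concrete instance (my example, not from the slides), the sample mean is a point estimator $g$ of a distribution's expectation:

```python
import numpy as np

# i.i.d. data points drawn from a distribution with true mean 3.0
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)

theta_hat = np.mean(x)   # g(x_1, ..., x_m) = sample mean
print(theta_hat)         # close to the true expectation 3.0
```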
Properties of an Estimator
Bias measures the expected deviation from the true value of $\boldsymbol{\theta}$:
$\mathrm{bias}\big(\hat{\boldsymbol{\theta}}_m\big) = \mathbb{E}\big[\hat{\boldsymbol{\theta}}_m\big] - \boldsymbol{\theta}$
Variance, $\mathrm{Var}\big(\hat{\boldsymbol{\theta}}\big)$, measures how much the estimate varies as a function of the data sample
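Both properties can be estimated empirically by resampling. A sketch comparing the biased (divide-by-$m$) and unbiased (divide-by-$m{-}1$) sample-variance estimators of a standard normal, whose true variance is 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 10, 100_000
samples = rng.normal(size=(trials, m))        # many datasets of size m

var_biased = samples.var(axis=1, ddof=0)      # divides by m
var_unbiased = samples.var(axis=1, ddof=1)    # divides by m - 1

# bias = E[theta_hat] - theta, with true variance theta = 1
print(var_biased.mean() - 1.0)     # ~ -1/m = -0.1 (biased)
print(var_unbiased.mean() - 1.0)   # ~ 0 (unbiased)
```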
Graphical Illustration of Bias and Variance
[Figure: high-bias model vs. high-variance model]
Bias-Variance Trade-off with Capacity
Bias & Variance in $\mathrm{MSE}$
Let $h(\cdot\,; \boldsymbol{w})$ be a regression model defined by $\boldsymbol{w}$
$\theta = t(\boldsymbol{x})$: the true (but unseen) target function
$\hat{\theta} = h(\boldsymbol{x}; \boldsymbol{w})$: the estimate of $t(\boldsymbol{x})$
Then the $\mathrm{MSE}$ of $\hat{\theta}$ is
$\mathbb{E}\big[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta\big)^2\big]$
$\mathbb{E}\big[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta\big)^2\big]$
$= \mathbb{E}\big[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}]\big)^2\big] + 2\,\mathbb{E}\big[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}]\big)\big(\mathbb{E}[\hat{\theta}] - \theta\big)\big] + \mathbb{E}\big[\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2\big]$
The cross term vanishes, since $\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = \mathbb{E}[\hat{\theta}] - \mathbb{E}[\hat{\theta}] = 0$:
$= \underbrace{\mathbb{E}\big[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}]\big)^2\big]}_{\mathrm{Var}(\hat{\theta})} + \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\mathrm{bias}(\hat{\theta})^2}$
Ways to Trade Off Bias & Variance
Cross-validation
Highly successful on many real-world tasks
MSE of estimates
MSE incorporates both bias and variance
$\mathrm{MSE} = \mathbb{E}\big[\big(\hat{\boldsymbol{\theta}}_m - \boldsymbol{\theta}\big)^2\big] = \mathrm{bias}\big(\hat{\boldsymbol{\theta}}_m\big)^2 + \mathrm{Var}\big(\hat{\boldsymbol{\theta}}_m\big)$
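A quick numerical check of this identity (my synthetic setup, using the biased variance estimator from the earlier sketch as $\hat{\theta}_m$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials, theta = 10, 100_000, 1.0           # true variance is 1.0
est = rng.normal(size=(trials, m)).var(axis=1, ddof=0)

mse = np.mean((est - theta) ** 2)
bias_sq = (est.mean() - theta) ** 2
var = est.var()
print(mse, bias_sq + var)   # the two quantities agree
```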
Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model
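Formally, $\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{m} \log p_{\text{model}}\big(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}\big)$. A minimal sketch for a Gaussian with known variance, where the maximum likelihood estimate of the mean coincides with the sample mean (the grid search is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)   # i.i.d. draws from N(2, 1)

def log_likelihood(mu):
    # Gaussian log-likelihood with known sigma = 1
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)

mus = np.linspace(0.0, 4.0, 401)
mu_ml = mus[np.argmax([log_likelihood(mu) for mu in mus])]
print(mu_ml, x.mean())   # the grid maximizer matches the sample mean
```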
Linear Regression as Maximum Likelihood
Instead of a single prediction $\hat{y}$, we think of the model as producing a conditional distribution $p(y \mid \boldsymbol{x})$
Log-likelihood:
$\sum_{i=1}^{m} \log p\big(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta}\big) = -m \log \sigma - \frac{m}{2} \log(2\pi) - \sum_{i=1}^{m} \frac{\big(\hat{y}^{(i)} - y^{(i)}\big)^2}{2\sigma^2}$
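Since only the last term depends on $\boldsymbol{w}$ (through $\hat{y}^{(i)} = \boldsymbol{w}^\top \boldsymbol{x}^{(i)}$), maximizing the log-likelihood yields the same $\boldsymbol{w}$ as minimizing $\mathrm{MSE}_{\text{train}}$. A quick numerical confirmation on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.3 * rng.normal(size=100)

# MSE minimizer (least squares)
w_mse = np.linalg.lstsq(X, y, rcond=None)[0]

# Maximizer of sum_i log N(y_i; w^T x_i, sigma^2): the sigma terms are
# constant in w, so the optimum solves the same normal equations.
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_mse, w_mle))   # True
```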
Contents of Part 2
Bayesian statistics
Thank you