Lecture 02
[Figure: taxonomy — Machine Learning splits into Shallow Learning and Deep Learning; learning paradigms: Supervised, Unsupervised, Self-supervised, Reinforcement; supervised tasks: Regression, Classification, Structured Prediction; highlighted method: Linear Regression]
Mapping: $x \xrightarrow{f_w} y$
Learning: the estimation of the parameters $w$ from the training data $\{(x_i, y_i)\}_{i=1}^{n}$
Inference: make a prediction at an unknown point $x$, i.e., $y = f_w(x)$
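A minimal sketch of the learning/inference split for a 1D linear model; the synthetic data and all variable names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data {(x_i, y_i)}_{i=1}^n from a noisy linear model
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, n)      # true w = (1.0, 2.0) plus noise

# Learning: estimate w from the training data via least squares
X = np.column_stack([np.ones(n), x])           # design matrix with a bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Inference: predict y = f_w(x) at an unknown point x
x_new = 4.2
y_new = w[0] + w[1] * x_new
print(f"estimated w = {w}, prediction f_w({x_new}) = {y_new:.2f}")
```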
[Figure: regression example — an input image is mapped by $f_w$ to a real value, e.g. 31.45]
Mapping: $f_w: \mathbb{R}^d \to \mathbb{R}$
[Figure: fits of increasing model capacity — panels: too low capacity, almost there, about right capacity, too high]
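A sketch of the capacity effect in the figure, assuming polynomial least-squares fits of increasing degree to noisy 1D data (the sine target and degree choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy sine data

for degree in (1, 3, 9, 15):                 # too low -> about right -> too high
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: training MSE = {train_err:.4f}")
# Training error keeps dropping as capacity grows, but the high-degree fit
# oscillates wildly between the points: low training error != good fit.
```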
◼ Ridge Regression
^ Polynomial degree M = 15
^ Left-most panel: plain (unregularized) linear regression
^ Other panels, left to right: weak to strong regularization
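A sketch of the panels above, assuming the standard ridge closed form $w = (X^\top X + \lambda I)^{-1} X^\top y$ on degree-15 polynomial features; the data and the $\lambda$ grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

M = 15
X = np.vander(x, M + 1, increasing=True)     # columns: 1, x, x^2, ..., x^M

for lam in (0.0, 1e-6, 1e-3, 1.0):           # 0 = plain linear regression; then weak -> strong
    # pinv for numerical robustness: at lam = 0 the Gram matrix is ill-conditioned
    w = np.linalg.pinv(X.T @ X + lam * np.eye(M + 1)) @ X.T @ y
    print(f"lambda = {lam:g}: ||w||_2 = {np.linalg.norm(w):10.2f}")
# Stronger regularization shrinks the weights, trading fit for stability.
```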
◼ Point Estimator
^ A point estimator is a function that maps a dataset to a model parameter: the function is the estimator, its output is the estimate
◼ Bias-Variance Dilemma:
► Statistical learning theory tells us that we can't have both low bias and low variance ⇒ there is a trade-off
◼ Number of datasets = 100
◼ True model = green line
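A sketch of the experiment behind the figure, assuming we resample 100 training sets from the same true model (the "green line") and compare a low- against a high-capacity estimator; the sine target, noise level, and test point are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 25)
f_true = np.sin(2 * np.pi * x)                  # the true model ("green line")
x0, f0 = 0.25, np.sin(2 * np.pi * 0.25)         # test point for bias/variance

for degree in (1, 9):
    preds = []
    for _ in range(100):                        # 100 independent datasets
        y = f_true + rng.normal(0, 0.3, x.size)
        coeffs = np.polyfit(x, y, degree)       # the estimator: dataset -> parameters
        preds.append(np.polyval(coeffs, x0))    # its prediction at x0
    preds = np.array(preds)
    bias2 = (preds.mean() - f0) ** 2
    var = preds.var()
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
# Low capacity -> high bias, low variance; high capacity -> low bias, high variance.
```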
◼ This Lecture
^ Ridge regression
^ Estimators, Bias, Variance
◼ Example
^ Assuming a Gaussian prior on $w$, we obtain the squared $\ell_2$ penalty of ridge regression:
$\hat{w} = \arg\min_w \sum_{i=1}^{n} \left( y_i - f_w(x_i) \right)^2 + \lambda \lVert w \rVert_2^2$
◼ Variations:
^ If we choose the prior as a Laplace distribution, we will get the $\ell_1$ norm and the expression becomes
$\hat{w} = \arg\min_w \sum_{i=1}^{n} \left( y_i - f_w(x_i) \right)^2 + \lambda \lVert w \rVert_1$
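A small sketch of how the prior choice only swaps the penalty term, assuming the two MAP objectives above (function names are illustrative):

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    # Gaussian prior on w -> squared L2 penalty (ridge)
    return np.sum((y - X @ w) ** 2) + lam * np.sum(w ** 2)

def lasso_objective(w, X, y, lam):
    # Laplace prior on w -> L1 penalty (lasso)
    return np.sum((y - X @ w) ** 2) + lam * np.sum(np.abs(w))

# Both share the same data-fit term; only the prior-induced penalty differs.
```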
[Figure: classification example — an input image is mapped by $f_w$ to a class label, e.g. "cat"]
Mapping: $f_w: \mathbb{R}^d \to \{0, 1\}$
◼ The question is how to choose the loss for classification
◼ We are working with a discrete distribution, i.e. a Bernoulli distribution: $p(y \mid x) = \hat{y}^{\,y} (1 - \hat{y})^{1-y}$, where $\hat{y} = f_w(x)$
◼ Putting it together, maximizing the likelihood yields the binary cross-entropy loss:
$L(w) = -\sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$
◼ In machine learning we use the more general term 'loss function' rather than 'error function'
◼ We minimize the dissimilarity between the empirical data distribution
(defined by the training set) and the model distribution
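A minimal sketch of the binary cross-entropy loss just derived, assuming labels $y \in \{0, 1\}$ and predicted probabilities $\hat{y} = f_w(x) \in (0, 1)$; the sample values are illustrative:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)       # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.7, 0.6])         # illustrative model outputs
print(binary_cross_entropy(y, y_hat))          # small when predictions match labels
```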
◼ A simple 1D example
Source: https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a
◼ Gradient Descent
^ Pick the step size $\eta$ and tolerance $\epsilon$
^ Initialize $w_0$
^ Repeat $w_{t+1} = w_t - \eta \nabla L(w_t)$ until $\lVert \nabla L(w_t) \rVert < \epsilon$
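A sketch of the loop just described, assuming a generic differentiable loss; `grad_L`, `eta`, and `tol` follow the slide's $\nabla L$, $\eta$, $\epsilon$ notation, and the quadratic test loss is illustrative:

```python
import numpy as np

def gradient_descent(grad_L, w0, eta=0.1, tol=1e-6, max_iter=10_000):
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_L(w)
        if np.linalg.norm(g) < tol:            # stop when the gradient is small
            break
        w = w - eta * g                        # w_{t+1} = w_t - eta * grad L(w_t)
    return w

# Example: minimize L(w) = ||w - c||^2, whose gradient is 2(w - c)
c = np.array([3.0, -1.0])
print(gradient_descent(lambda w: 2 * (w - c), w0=[0.0, 0.0]))   # -> approx [3, -1]
```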
◼ Variants
^ Line Search
^ Conjugate Gradients (source: https://en.wikipedia.org/wiki/Conjugate_gradient_method)
^ L-BFGS
◼ Examples in 2D