0 1 Intro Notations
(supervised discriminative learning)
Notation
Supervised Learning
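Labelled training data: $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$:
• input $\mathbf{x} \in \mathcal{X} = \mathbb{R}^{d}$
• classification: $y \in \mathcal{Y} = \{0, 1, \dots, k-1\}$
• regression: $y \in \mathcal{Y} = \mathbb{R}$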
We say a neural network is parametrized by $\boldsymbol{\theta} \in \Theta = \mathbb{R}^{p}$ (its weights)
and is represented by $f_{\boldsymbol{\theta}} : \mathcal{X} \to \mathcal{Y}$
A neural network function
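$y = f_{\boldsymbol{\theta}}(\mathbf{x})$
[Figure: a network $f_{\boldsymbol{\theta}}$ mapping input $\mathbf{x}$ to output $y$]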
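To make the mapping $f_{\boldsymbol{\theta}} : \mathcal{X} \to \mathcal{Y}$ concrete, here is a minimal sketch of a one-hidden-layer regression network in plain Python. The architecture (tanh hidden units, scalar output), the dictionary layout of `theta`, and the name `mlp_forward` are all assumptions made for illustration; the slides do not fix a particular network.

```python
import math


def mlp_forward(theta, x):
    """Evaluate f_theta(x) for a one-hidden-layer network with tanh units.

    theta is a dict of plain lists (hypothetical layout, not from the slides):
      W1: one weight row per hidden unit, b1: hidden biases,
      w2: output weights,               b2: scalar output bias.
    """
    # Hidden activations: tanh of an affine function of the input.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(theta["W1"], theta["b1"])
    ]
    # Linear read-out gives a real-valued prediction (regression).
    return sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]


# All weights together form the parameter vector theta.
theta = {
    "W1": [[0.5, -0.3], [0.1, 0.8]],
    "b1": [0.0, 0.1],
    "w2": [1.0, -2.0],
    "b2": 0.25,
}
y = mlp_forward(theta, [1.0, 2.0])  # a single real number for input x
```

Changing any entry of `theta` changes the function being computed, which is the sense in which the network is "parametrized by its weights".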
Training a deep network discriminatively
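$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \mathcal{L}(\mathcal{D})$ (loss function on the whole training set)
$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N} l(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i)$ (loss function per training example)
$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N} l(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i) + \Omega(\boldsymbol{\theta})$ (a regularization term)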
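The three forms above can be sketched as one function: a sum of per-example losses plus a regularization term, minimized over the parameters. The example below is only an illustration under stated assumptions: a single scalar weight, a hypothetical model `predict(theta, x) = theta * x`, and a brute-force search over a small grid of candidates standing in for the argmin (real networks are trained with gradient-based methods instead).

```python
def objective(theta, data, predict, loss, regularizer):
    """Sum of per-example losses l(f_theta(x), y) plus Omega(theta)."""
    data_term = sum(loss(predict(theta, x), y) for x, y in data)
    return data_term + regularizer(theta)


def predict(theta, x):
    # Hypothetical one-parameter model: f_theta(x) = theta * x.
    return theta * x


def squared_error(y_hat, y):
    return (y_hat - y) ** 2


def no_penalty(theta):
    return 0.0


def l2_penalty(theta):
    return theta ** 2


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # generated by y = 2x
candidates = [i / 10 for i in range(41)]       # theta in {0.0, 0.1, ..., 4.0}

# argmin over the candidate grid, without and with the regularization term.
theta_plain = min(
    candidates,
    key=lambda t: objective(t, data, predict, squared_error, no_penalty),
)
theta_reg = min(
    candidates,
    key=lambda t: objective(t, data, predict, squared_error, l2_penalty),
)
```

On this toy data the unregularized minimizer is the exact slope, while adding the penalty pulls the selected weight slightly towards zero.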
Example: Deep Regression Networks
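$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N} l(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i) + \Omega(\boldsymbol{\theta})$
With mean squared error (MSE) as the loss and L2 regularization (similar to weight decay) as $\Omega$:
$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N} (f_{\boldsymbol{\theta}}(\mathbf{x}_i) - y_i)^2 + \lVert \boldsymbol{\theta} \rVert_2^2$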
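As a rough illustration of how such an objective is minimized in practice, the sketch below runs plain gradient descent on squared error plus an L2 penalty. Several things here are assumptions not taken from the slides: the model is a one-dimensional linear map `w * x + b` (a network with no hidden layer), the penalty carries a hypothetical weight `lam` (setting `lam=1` gives the unweighted penalty), and the learning rate and step count are arbitrary choices that happen to converge on this toy data.

```python
def fit_linear_regression(data, lam=0.01, lr=0.02, steps=2000):
    """Gradient descent on sum_i (w*x_i + b - y_i)^2 + lam * (w^2 + b^2)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the squared-error term plus gradient of the L2 term.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) + 2 * lam * w
        grad_b = sum(2 * (w * x + b - y) for x, y in data) + 2 * lam * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
w, b = fit_linear_regression(data)
```

The returned pair `(w, b)` is a single set of weights, with no measure of uncertainty attached to it.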
What do the following two statements mean? (3 minutes)
• Standard deep (regression) networks give a point estimate prediction,
i.e., $\hat{y} = \arg\max_{y} P(y \mid \mathbf{x})$
• Standard deep network weights are a point estimate of the parameter distribution,
i.e., $\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} P(\boldsymbol{\theta} \mid \mathcal{D})$
• What is the alternative?
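For the second statement, one standard way to relate $\arg\max_{\boldsymbol{\theta}} P(\boldsymbol{\theta} \mid \mathcal{D})$ to a regularized training objective is sketched below. It rests on assumptions the slides do not state: independent training examples, a Gaussian likelihood $P(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y; f_{\boldsymbol{\theta}}(\mathbf{x}), \sigma^2)$, and a zero-mean Gaussian prior $P(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{0}, \tau^2 I)$. The symbols $\sigma$ and $\tau$ are introduced here for illustration only.

```latex
\begin{aligned}
\boldsymbol{\theta}^*
  &= \arg\max_{\boldsymbol{\theta}} P(\boldsymbol{\theta} \mid \mathcal{D})
   = \arg\max_{\boldsymbol{\theta}}
     \frac{P(\mathcal{D} \mid \boldsymbol{\theta})\, P(\boldsymbol{\theta})}{P(\mathcal{D})} \\
  &= \arg\min_{\boldsymbol{\theta}}
     \; -\sum_{i=1}^{N} \log P(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})
     \; - \log P(\boldsymbol{\theta}) \\
  &= \arg\min_{\boldsymbol{\theta}}
     \; \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl(f_{\boldsymbol{\theta}}(\mathbf{x}_i) - y_i\bigr)^2
     \; + \frac{1}{2\tau^2} \lVert \boldsymbol{\theta} \rVert_2^2
\end{aligned}
```

Under these assumptions, squared error with an L2 penalty selects the single most probable weight vector under the posterior, which is why the trained weights are called a point estimate.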