Lecture 03
s.bhulai@vu.nl
12 September 2023
Towards a Bayesian framework
▪ The expected loss is
\[
\mathbb{E}[L] = \iint L(t, y(\mathbf{x}))\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t
\]
▪ Natural choice: the squared loss,
\[
\mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t
\]
Sandjai Bhulai / Advanced Machine Learning / 12 September 2023
The frequentist pitfall
▪ Given a point \( \mathbf{x} \), the expected loss at that point is given by
\[
\mathbb{E}[L(t, y(\mathbf{x}))] = \int \{y(\mathbf{x}) - t\}^2\, p(t \mid \mathbf{x})\,\mathrm{d}t
\]
▪ Minimizing over \( y(\mathbf{x}) \) by setting the derivative to zero,
\[
2 \int \{y(\mathbf{x}) - t\}\, p(t \mid \mathbf{x})\,\mathrm{d}t = 0,
\]
gives
\[
y(\mathbf{x}) = \int t\, p(t \mid \mathbf{x})\,\mathrm{d}t = \mathbb{E}_t[t \mid \mathbf{x}]
\]
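The claim that the conditional mean minimizes the expected squared loss can be checked numerically. The sketch below is not from the slides: it assumes, purely for illustration, a conditional distribution \( t = \sin(x) + \varepsilon \) with Gaussian noise at a fixed \( x \), and compares the Monte Carlo loss at \( \mathbb{E}[t \mid x] \) against shifted predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fix a point x and sample t from an assumed conditional p(t | x):
# here t = sin(x) + Gaussian noise with standard deviation 0.3.
x = 1.0
t = np.sin(x) + 0.3 * rng.standard_normal(100_000)

cond_mean = t.mean()  # Monte Carlo estimate of E[t | x]

# Expected squared loss for a few candidate predictions y(x):
losses = [np.mean((y - t) ** 2) for y in (cond_mean, cond_mean + 0.2, cond_mean - 0.5)]
print(losses)
```

The loss at the conditional mean equals the conditional variance (about 0.09 here); any shifted prediction pays an extra squared-distance penalty.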
The frequentist pitfall
▪ The expected loss function is
\[
\mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t
\]
▪ Rewrite the integrand as
\[
\{y(\mathbf{x}) - t\}^2 = \{y(\mathbf{x}) - \mathbb{E}[t \mid \mathbf{x}] + \mathbb{E}[t \mid \mathbf{x}] - t\}^2
\]
\[
= \{y(\mathbf{x}) - \mathbb{E}[t \mid \mathbf{x}]\}^2 + 2\{y(\mathbf{x}) - \mathbb{E}[t \mid \mathbf{x}]\}\{\mathbb{E}[t \mid \mathbf{x}] - t\} + \{\mathbb{E}[t \mid \mathbf{x}] - t\}^2
\]
▪ The cross term vanishes when integrated over \( t \), leaving
\[
\mathbb{E}[L] = \int \{y(\mathbf{x}) - \mathbb{E}[t \mid \mathbf{x}]\}^2\, p(\mathbf{x})\,\mathrm{d}\mathbf{x} + \int \mathrm{var}[t \mid \mathbf{x}]\, p(\mathbf{x})\,\mathrm{d}\mathbf{x}
\]
The frequentist pitfall
▪ Recall the expected squared loss,
\[
\mathbb{E}[L] = \int \{y(\mathbf{x}) - h(\mathbf{x})\}^2\, p(\mathbf{x})\,\mathrm{d}\mathbf{x} + \iint \{h(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t
\]
where
\[
h(\mathbf{x}) = \mathbb{E}[t \mid \mathbf{x}] = \int t\, p(t \mid \mathbf{x})\,\mathrm{d}t
\]
▪ For a model \( y(\mathbf{x}; \mathcal{D}) \) trained on a data set \( \mathcal{D} \), taking the expectation over data sets gives
\[
\mathbb{E}_{\mathcal{D}}\bigl[\{y(\mathbf{x}; \mathcal{D}) - h(\mathbf{x})\}^2\bigr]
= \underbrace{\{\mathbb{E}_{\mathcal{D}}[y(\mathbf{x}; \mathcal{D})] - h(\mathbf{x})\}^2}_{(\text{bias})^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\bigl[\{y(\mathbf{x}; \mathcal{D}) - \mathbb{E}_{\mathcal{D}}[y(\mathbf{x}; \mathcal{D})]\}^2\bigr]}_{\text{variance}}
\]
The frequentist pitfall
▪ In conclusion:
\[
\text{expected loss} = (\text{bias})^2 + \text{variance} + \text{noise}
\]
where
\[
(\text{bias})^2 = \int \{\mathbb{E}_{\mathcal{D}}[y(\mathbf{x}; \mathcal{D})] - h(\mathbf{x})\}^2\, p(\mathbf{x})\,\mathrm{d}\mathbf{x}
\]
\[
\text{variance} = \int \mathbb{E}_{\mathcal{D}}\bigl[\{y(\mathbf{x}; \mathcal{D}) - \mathbb{E}_{\mathcal{D}}[y(\mathbf{x}; \mathcal{D})]\}^2\bigr]\, p(\mathbf{x})\,\mathrm{d}\mathbf{x}
\]
\[
\text{noise} = \iint \{h(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t
\]
Bias-variance decomposition
▪ Example: 100 data sets drawn from the sinusoidal, each with 25 data points, varying the degree of regularization
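This experiment can be reproduced in outline. The sketch below fixes details the slide leaves open (a polynomial basis, noise standard deviation 0.3, and ridge regularization with strength lam are all assumptions): it fits 100 data sets of 25 points each and averages the squared bias and the variance over a grid of inputs.

```python
import numpy as np

L, N, M = 100, 25, 8               # data sets, points per set, polynomial basis size
x_grid = np.linspace(0, 1, 100)
h = np.sin(2 * np.pi * x_grid)     # h(x) = E[t | x] for the sinusoidal

def design(x):
    return np.vander(x, M, increasing=True)   # phi_j(x) = x^j (assumed basis)

def bias_variance(lam, seed=0):
    rng = np.random.default_rng(seed)
    preds = np.empty((L, x_grid.size))
    for l in range(L):
        x = rng.uniform(0, 1, N)
        t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(N)
        # Ridge fit on this data set:
        w = np.linalg.solve(design(x).T @ design(x) + lam * np.eye(M),
                            design(x).T @ t)
        preds[l] = design(x_grid) @ w
    y_bar = preds.mean(axis=0)                # E_D[y(x; D)]
    bias2 = np.mean((y_bar - h) ** 2)         # (bias)^2, averaged over x
    variance = np.mean(preds.var(axis=0))     # variance, averaged over x
    return bias2, variance

for lam in (1e-3, 1e-1, 10.0):
    print(lam, bias_variance(lam))
```

Increasing the regularization trades variance for bias, which is exactly the trade-off the 100-data-set experiment illustrates.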
▪ For a Gaussian prior \( p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0) \), the posterior over the weights is
\[
p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)
\]
where
\[
\mathbf{m}_N = \mathbf{S}_N \bigl( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \Phi^\top \mathbf{t} \bigr)
\]
\[
\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \Phi^\top \Phi
\]
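A minimal sketch of these update equations, assuming a zero-mean isotropic prior, a polynomial design matrix, and illustrative values for the precisions \( \alpha \) and \( \beta \) (none of these choices are fixed by the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, M = 2.0, 25.0, 4          # assumed prior precision, noise precision, basis size

# Synthetic data from an assumed linear ground truth:
x = rng.uniform(-1, 1, 20)
t = 0.5 * x - 0.3 + rng.standard_normal(x.size) / np.sqrt(beta)
Phi = np.vander(x, M, increasing=True)  # phi_j(x) = x^j

# Zero-mean isotropic prior: m0 = 0, S0 = alpha^{-1} I
m0 = np.zeros(M)
S0_inv = alpha * np.eye(M)

S_N_inv = S0_inv + beta * Phi.T @ Phi             # S_N^{-1} = S_0^{-1} + beta Phi^T Phi
S_N = np.linalg.inv(S_N_inv)
m_N = S_N @ (S0_inv @ m0 + beta * Phi.T @ t)      # m_N = S_N (S_0^{-1} m0 + beta Phi^T t)
print(m_N)
```

In practice one solves against `S_N_inv` rather than inverting it; the explicit inverse is kept here to mirror the slide's notation.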
Bayesian linear regression
▪ A common choice for the prior is
\[
p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})
\]
for which
\[
\mathbf{m}_N = \beta \mathbf{S}_N \Phi^\top \mathbf{t}
\]
\[
\mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \Phi^\top \Phi
\]
▪ The predictive distribution is
\[
p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\,\mathrm{d}\mathbf{w}
= \mathcal{N}\bigl(t \mid \mathbf{m}_N^\top \varphi(\mathbf{x}),\, \sigma_N^2(\mathbf{x})\bigr)
\]
where
\[
\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \varphi(\mathbf{x})^\top \mathbf{S}_N\, \varphi(\mathbf{x})
\]
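A sketch of the predictive mean and variance, again under assumed choices (polynomial basis, illustrative \( \alpha \), \( \beta \), and synthetic sinusoidal data):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, M = 2.0, 25.0, 4          # assumed hyperparameters and basis size

x = rng.uniform(-1, 1, 30)
t = np.sin(np.pi * x) + rng.standard_normal(x.size) / np.sqrt(beta)
Phi = np.vander(x, M, increasing=True)

# Posterior for the zero-mean isotropic prior:
S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predictive(x_star):
    """Mean and variance of p(t | x*, t, alpha, beta)."""
    phi = x_star ** np.arange(M)                 # phi(x*) for the polynomial basis
    mean = m_N @ phi                             # m_N^T phi(x*)
    var = 1.0 / beta + phi @ S_N @ phi           # sigma_N^2(x*)
    return mean, var

print(predictive(0.0))
```

Note that the predictive variance never falls below the noise floor \( 1/\beta \): the second term, the posterior uncertainty in \( \mathbf{w} \), is always nonnegative.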
▪ The model is
\[
y(x, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = \mathbf{w}^\top \varphi(x)
\]
▪ Basis functions: \( \varphi_j(x) = x^j \)
▪ Take \( M = 2 \)
▪ Calculate \( \mathbf{w}_{\mathrm{ML}} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{t} \)
▪ Performance is measured by
\[
E(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
\]
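A minimal sketch of the normal-equations fit and the error measure for \( M = 2 \); the synthetic data (a linear ground truth with small Gaussian noise) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 50, 2                           # M = 2: phi_0(x) = 1, phi_1(x) = x

x = rng.uniform(0, 1, N)
t = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(N)   # assumed ground truth: t = 1 + 2x + noise
Phi = np.vander(x, M, increasing=True)             # design matrix with columns [1, x]

# w_ML = (Phi^T Phi)^{-1} Phi^T t, solved via the normal equations:
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# E(w) = 1/(2N) * sum_n {y(x_n, w) - t_n}^2
E = np.sum((Phi @ w_ml - t) ** 2) / (2 * N)
print(w_ml, E)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard numerically stable way to evaluate the normal equations.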
▪ Alternatively, minimize \( E(\mathbf{w}) \) for the model \( y(x, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = \mathbf{w}^\top \varphi(x) \) by gradient descent (here \( \alpha \) denotes the learning rate):
▪ Repeat {
\[
w_j := w_j - \alpha \frac{1}{N} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)\, \varphi_j(x_n)
\]
}
▪ Update all \( w_j \) simultaneously: compute each right-hand side into a temporary first, then assign (e.g. \( w_1 := \mathrm{temp}_1 \))
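The update loop above can be sketched in vectorized form, which makes the simultaneous update automatic; the learning rate, iteration count, and synthetic data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, alpha = 50, 2, 0.5               # alpha here is the learning rate, not the prior precision

x = rng.uniform(0, 1, N)
t = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(N)   # assumed ground truth
Phi = np.vander(x, M, increasing=True)

w = np.zeros(M)
for _ in range(5000):
    # Gradient of E(w): dE/dw_j = (1/N) sum_n (y(x_n, w) - t_n) phi_j(x_n)
    grad = Phi.T @ (Phi @ w - t) / N
    w = w - alpha * grad                # all w_j updated simultaneously
print(w)
```

Because the whole weight vector is recomputed from the old `w` in one expression, no explicit temporaries are needed; the vectorized step is the simultaneous update by construction.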