Machine Learning — WS2019 — Module IN2064

Machine Learning Exercise Sheet 04

Linear Regression


1 Least squares regression


Problem 1: Let’s assume we have a dataset where each datapoint (x_i, y_i) is weighted by a scalar factor
which we will call t_i. We will assume that t_i > 0 for all i. This makes the sum-of-squares error function
look like the following:

    E_weighted(w) = (1/2) Σ_{i=1}^{N} t_i [wᵀ φ(x_i) − y_i]²

Find the equation for the value of w that minimizes this error function (a numerical sanity check is
sketched after the list below). Furthermore, explain how this weighting factor t_i can be interpreted
in terms of

1) the variance of the noise on the data, and
2) data points for which there are exact copies in the dataset.
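
Not part of the exercise, but a candidate solution can be sanity-checked numerically. The sketch below is a minimal illustration, assuming the well-known weighted least-squares form w = (Φᵀ T Φ)⁻¹ Φᵀ T y with T = diag(t_1, …, t_N); the data, basis functions, and weights are all made up.

```python
import numpy as np

# Made-up data: N points, M polynomial basis functions (illustration only).
rng = np.random.default_rng(0)
N, M = 50, 4
x = rng.uniform(-1, 1, N)
Phi = np.vander(x, M)          # rows of the design matrix are phi(x_i)
y = rng.normal(size=N)
t = rng.uniform(0.5, 2.0, N)   # per-sample weights t_i > 0

# Candidate closed form: w = (Phi^T T Phi)^(-1) Phi^T T y, with T = diag(t).
w = np.linalg.solve(Phi.T @ (t[:, None] * Phi), Phi.T @ (t * y))

# The gradient of E_weighted must vanish at the minimizer.
grad = Phi.T @ (t * (Phi @ w - y))
assert np.allclose(grad, 0.0, atol=1e-8)
```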
2 Ridge regression

Problem 2: Show that the following holds: The ridge regression estimates can be obtained by ordinary
least squares regression on an augmented dataset: Augment the design matrix Φ ∈ R^{N×M} with M
additional rows √λ · I_{M×M}, and augment y with M zeros.
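
A quick numerical illustration of this claim (no substitute for the proof): the sketch below, with made-up data, compares ordinary least squares on the augmented dataset against the standard ridge closed form (ΦᵀΦ + λI)⁻¹Φᵀy.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam = 30, 5, 0.7
Phi = rng.normal(size=(N, M))
y = rng.normal(size=N)

# Augmented dataset: M extra rows sqrt(lam) * I, M extra zero targets.
Phi_aug = np.vstack([Phi, np.sqrt(lam) * np.eye(M)])
y_aug = np.concatenate([y, np.zeros(M)])

# OLS on the augmented data vs. the ridge closed form.
w_ols_aug = np.linalg.lstsq(Phi_aug, y_aug, rcond=None)[0]
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)
assert np.allclose(w_ols_aug, w_ridge)
```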

Problem 3: Derive the closed-form solution for the ridge regression error function

    E_ridge(w) = (1/2) Σ_{i=1}^{N} (wᵀ φ(x_i) − y_i)² + (λ/2) wᵀ w

Additionally, discuss the scenario when the number of training samples N is smaller than the number of
basis functions M . What computational issues arise in this case? How does regularization address them?
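
For intuition on the N < M case, the following sketch (illustrative data only) shows that ΦᵀΦ is rank-deficient when N < M, while adding λI makes the system invertible; the specific numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, lam = 10, 20, 0.1        # fewer samples than basis functions
Phi = rng.normal(size=(N, M))
y = rng.normal(size=N)

# Phi^T Phi has rank at most N < M, so it is singular and the plain
# normal equations have no unique solution.
print(np.linalg.matrix_rank(Phi.T @ Phi))   # prints 10, not 20

# Adding lam * I makes the matrix positive definite and the solve well-posed.
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)
```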


3 Multi-output linear regression



Problem 4: In class, we only considered functions of the form f : Rⁿ → R. What about the general case
of f : Rⁿ → Rᵐ? For linear regression with multiple outputs, write down the log-likelihood formulation
and derive the MLE of the parameters.
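
As a hint at the structure of the answer: under the usual assumption of independent Gaussian noise on each output, the least-squares problem decouples across the m output dimensions. The sketch below, on made-up data, checks that one joint solve matches m independent single-output solves.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, m = 40, 3, 2
X = rng.normal(size=(N, n))
Y = rng.normal(size=(N, m))    # m regression targets per sample

# Joint MLE: W = (X^T X)^(-1) X^T Y, one column of W per output dimension.
W = np.linalg.lstsq(X, Y, rcond=None)[0]

# Same result as m independent single-output regressions.
for j in range(m):
    w_j = np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
    assert np.allclose(W[:, j], w_j)
```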

4 Comparison of Linear Regression Models


Problem 5: We want to perform regression on a dataset consisting of N samples x_i ∈ R^D with corre-
sponding targets y_i ∈ R (represented compactly as X ∈ R^{N×D} and y ∈ R^N).
Assume that we have fitted an L2-regularized linear regression model and obtained the optimal weight
vector w* ∈ R^D as


    w* = arg min_w  (1/2) Σ_{i=1}^{N} (wᵀ x_i − y_i)² + (λ/2) wᵀ w

Note that there is no bias term.
Now, assume that we obtained a new data matrix X_new by scaling all samples by the same positive factor
a ∈ (0, 1). That is, X_new = aX (and respectively x_i^new = a·x_i).
a) Find the weight vector w_new that will produce the same predictions on X_new as w* produces on X.
b) Find the regularization factor λ_new ∈ R, such that the solution w*_new of the new L2-regularized
   linear regression problem

       w*_new = arg min_w  (1/2) Σ_{i=1}^{N} (wᵀ x_i^new − y_i)² + (λ_new/2) wᵀ w

   will produce the same predictions on X_new as w* produces on X.

Provide a mathematical justification for your answer.
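
Candidate answers can be sanity-checked numerically before writing the justification. As an illustration only, the sketch below tests the hypotheses w_new = w*/a and λ_new = a²·λ on made-up data; deriving and justifying these relationships is the actual task.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, lam, a = 50, 3, 0.5, 0.3
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

def ridge(X, y, lam):
    # Closed-form L2-regularized solution (no bias term, as in the problem).
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_star = ridge(X, y, lam)
X_new = a * X

# Hypothesis for a): scaled weights reproduce the original predictions.
w_new = w_star / a
assert np.allclose(X_new @ w_new, X @ w_star)

# Hypothesis for b): refitting with lam_new gives the same predictions.
lam_new = a**2 * lam
w_star_new = ridge(X_new, y, lam_new)
assert np.allclose(X_new @ w_star_new, X @ w_star)
```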



5 Programming Task: Least squares regression




Problem 6: Load the notebook 04_homework_linear_regression.ipynb from Piazza. Fill in the missing
code and run the notebook. Convert the evaluated notebook to PDF and add it to the printout of your
homework.
Note: We suggest that you use Anaconda for installing Python and Jupyter, as well as for managing
packages. We recommend that you use Python 3.
For more information on Jupyter notebooks and how to convert them to other formats, consult the Jupyter
documentation and nbconvert documentation.

Upload a single PDF file with your homework solution to Moodle by 10.11.2019, 23:59 CET. We
recommend typesetting your solution (using LaTeX or Word), but handwritten solutions are also accepted.
If your handwritten solution is illegible, it won’t be graded and you waive your right to dispute that.

In-class Exercises

Problem 7: Assume that we are given a dataset, where each sample x_i and regression target y_i is
generated according to the following process:

    x_i ∼ Uniform(−10, 10)

    y_i = a·x_i³ + b·x_i² + c·x_i + d + ε_i,   where ε_i ∼ N(0, 1) and a, b, c, d ∈ R.

The 3 regression algorithms below are applied to the given data. Your task is to say what the bias and
variance of these models are (low or high). Provide a 1-2 sentence explanation for each of your answers
(a small simulation sketch follows the list below).
a) Linear regression

b) Polynomial regression with degree 3


c) Polynomial regression with degree 10
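
For intuition (not required for the answer), one can simulate the data-generating process and refit each model class on many resampled datasets, assuming numpy's polyfit as the regression routine: the mean offset of the predictions at a fixed test point reflects bias, and their spread reflects variance. All constants below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, c, d = 0.1, -0.5, 2.0, 1.0   # arbitrary choice of true coefficients

def sample_dataset(n=30):
    x = rng.uniform(-10, 10, n)
    y = a * x**3 + b * x**2 + c * x + d + rng.normal(size=n)
    return x, y

x_test = 5.0
true_val = a * x_test**3 + b * x_test**2 + c * x_test + d
for degree in (1, 3, 10):
    preds = []
    for _ in range(200):
        x, y = sample_dataset()
        coef = np.polyfit(x / 10, y, degree)   # inputs rescaled for stability
        preds.append(np.polyval(coef, x_test / 10))
    # Mean offset from the truth ~ bias; standard deviation ~ variance.
    print(degree, np.mean(preds) - true_val, np.std(preds))
```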

Problem 8: Given is a training set consisting of samples X = {x_1, x_2, . . . , x_N} with respective regression
targets y = {y_1, y_2, . . . , y_N}, where x_i ∈ R^D and y_i ∈ R.
Alice fits a linear regression model f(x_i) = wᵀ x_i to the dataset using the closed-form solution for linear
regression (normal equations).
Bob has heard that by transforming the inputs x_i with a vector-valued function φ, he can fit an alternative
function, g(x_i) = vᵀ φ(x_i), using the same procedure (solving the normal equations). He decides to use
a linear transformation φ(x_i) = Aᵀ x_i, where A ∈ R^{D×D} has full rank.
a) Show that Bob’s procedure will fit the same function as Alice’s original procedure, that is, f(x) =
   g(x) for all x ∈ R^D (given that w and v minimize the training set error). A numerical check is
   sketched after part b) below.
b) Can Bob’s procedure lead to a lower training set error than Alice’s if the matrix A is not invertible?
Explain your answer.
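
Part a) can be checked empirically before proving it. The sketch below uses made-up data and a random A (full rank with probability 1): Alice's design matrix is X, and Bob's has rows φ(x_i)ᵀ = x_iᵀ A, i.e. the matrix XA.

```python
import numpy as np

rng = np.random.default_rng(6)
N, D = 40, 4
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
A = rng.normal(size=(D, D))    # full rank with probability 1

# Alice fits on X; Bob fits on the transformed design matrix XA.
w = np.linalg.lstsq(X, y, rcond=None)[0]
v = np.linalg.lstsq(X @ A, y, rcond=None)[0]

# Same fitted function: f(x) = w^T x equals g(x) = v^T (A^T x) everywhere.
X_test = rng.normal(size=(10, D))
assert np.allclose(X_test @ w, X_test @ A @ v)
```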

Problem 9: See Jupyter notebook inclass_04_notebook.ipynb.
