Instructions
1. It is a CLOSED BOOK examination.
2. This paper has five questions. Each question carries 10 marks. Therefore, the maximum marks
is 50.
3. Write your answers on paper, then scan and submit them at the end of the exam.
4. Write your name, roll number and the subject number (CS 419M) at the top of each page of your
answer script.
5. There are multiple parts (sub-questions) in each question. Some sub-questions are objective
and some are subjective.
6. There will be partial credit for subjective questions if you have made substantial progress
towards the answer. However, there will be NO credit for rough work.
7. Please keep your answer sheets separate from your rough work. Do not attach
the rough work to the answer sheet. You should ONLY upload the answer sheets.
Page 1 of 6
1. Consider the one-dimensional linear regression problem where we neglect the bias term. Suppose
that the dataset $\{(x_i, y_i)\}_{i=1}^{m}$ is generated by sampling $m$ samples from the following distribution $\mathcal{D}$:
$$(x, y) = \begin{cases} (1, 0) & \text{with probability } \mu \\ (\mu, -1) & \text{with probability } 1 - \mu \end{cases} \tag{1}$$
1.a Find the probability that the dataset contains at least one sample $(\mu, -1)$.
1.a /2
1.b Find E[x], E[y], Var(x), Var(y) where E[·] denotes expectation and Var(·) denotes the
variance.
1.b /4
1.c Consider the loss function
$$\ell(w) = \sum_{i=1}^{m} (y_i - w x_i)^2 \tag{2}$$
Find the expected value of this loss function with respect to the distribution $\mathcal{D}$ and denote
it $L_{\mathcal{D}}(w)$. Then find $w^*$ that minimizes
$$L(w) = L_{\mathcal{D}}(w) + \lambda |w|^2 \tag{3}$$
1.c /4
1.a The dataset fails to contain $(\mu, -1)$ only if all $m$ samples are $(1, 0)$, which happens with probability $\mu^m$. Hence the answer is $1 - \mu^m$.
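As a quick sanity check (not part of the exam), the closed form can be compared against a Monte Carlo estimate; the values $\mu = 0.4$, $m = 5$ below are illustrative choices, not given in the problem:

```python
# Check P(at least one (mu, -1) sample) = 1 - mu^m against a Monte Carlo
# estimate, for illustrative values mu = 0.4, m = 5.
import random

mu, m = 0.4, 5
closed_form = 1 - mu**m  # complement of "all m samples are (1, 0)"

random.seed(0)
trials = 200_000
hits = sum(
    any(random.random() >= mu for _ in range(m))  # draws (mu, -1) w.p. 1 - mu
    for _ in range(trials)
)
estimate = hits / trials
print(closed_form, round(estimate, 3))
```

The estimate converges to $1 - \mu^m$ as the number of trials grows.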
1.b
$$E[x] = \mu \cdot 1 + (1 - \mu)\,\mu = 2\mu - \mu^2 \tag{4}$$
$$E[y] = \mu \cdot 0 + (1 - \mu)(-1) = \mu - 1 \tag{5}$$
$$\mathrm{Var}(x) = E[x^2] - E[x]^2 \tag{6}$$
$$= \mu \cdot 1 + (1 - \mu)\,\mu^2 - (2\mu - \mu^2)^2 \tag{7}$$
$$= \mu - 3\mu^2 + 3\mu^3 - \mu^4 \tag{8}$$
$$\mathrm{Var}(y) = E[y^2] - E[y]^2 \tag{9}$$
$$= \mu \cdot 0 + (1 - \mu) \cdot 1 - (\mu - 1)^2 \tag{10}$$
$$= \mu - \mu^2 \tag{11}$$
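The moment identities above can be verified exactly for any fixed $\mu$; the value $\mu = 0.3$ below is an illustrative choice:

```python
# Verify the derived moments of D by direct computation from the
# two-point distribution, for an illustrative mu = 0.3.
mu = 0.3
Ex  = mu * 1 + (1 - mu) * mu
Ey  = mu * 0 + (1 - mu) * (-1)
Ex2 = mu * 1**2 + (1 - mu) * mu**2
Ey2 = mu * 0**2 + (1 - mu) * (-1)**2
var_x = Ex2 - Ex**2
var_y = Ey2 - Ey**2

# Closed forms from equations (4), (5), (8), (11):
assert abs(Ex - (2*mu - mu**2)) < 1e-12
assert abs(Ey - (mu - 1)) < 1e-12
assert abs(var_x - (mu - 3*mu**2 + 3*mu**3 - mu**4)) < 1e-12
assert abs(var_y - (mu - mu**2)) < 1e-12
print("all moment identities hold for mu =", mu)
```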
1.c
$$L_{\mathcal{D}}(w) = E\left[\sum_{i=1}^{m} (y_i - w x_i)^2\right] \tag{12}$$
$$= \sum_{i=1}^{m} E[y_i^2] + w^2 E[x_i^2] - 2w\, E[y_i x_i] \tag{13}$$
$$= m\left[(1 - \mu) + w^2(\mu + \mu^2 - \mu^3) - 2w(\mu^2 - \mu)\right] \tag{14}$$
Hence
$$L(w) = L_{\mathcal{D}}(w) + \lambda |w|^2 \tag{15}$$
$$= m\left[(1 - \mu) + w^2(\mu + \mu^2 - \mu^3) - 2w(\mu^2 - \mu)\right] + \lambda w^2 \tag{16}$$
Setting the derivative to zero,
$$\implies \frac{dL}{dw} = m\left[2w(\mu + \mu^2 - \mu^3) - 2(\mu^2 - \mu)\right] + 2\lambda w = 0 \tag{17}$$
Hence
$$w^* = \frac{m(\mu^2 - \mu)}{m(\mu + \mu^2 - \mu^3) + \lambda} \tag{18}$$
This is valid only if it is a minimum, i.e., the second derivative must be positive:
$$\frac{d^2 L}{dw^2} = 2\left[m(\mu + \mu^2 - \mu^3) + \lambda\right] > 0 \tag{19}$$
which holds since $\mu + \mu^2 - \mu^3 = \mu(1 + \mu - \mu^2) \geq 0$ for $\mu \in [0, 1]$ and $\lambda > 0$.
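The closed form (18) can be spot-checked against a brute-force grid search over $w$; the values $\mu = 0.3$, $m = 10$, $\lambda = 0.5$ below are illustrative choices:

```python
# Check the closed-form minimizer (18) of the regularized expected loss
# against a grid search, for illustrative mu = 0.3, m = 10, lam = 0.5.
mu, m, lam = 0.3, 10, 0.5

def L(w):
    # Regularized expected loss, equation (16)
    return m * ((1 - mu) + w**2 * (mu + mu**2 - mu**3)
                - 2 * w * (mu**2 - mu)) + lam * w**2

w_star = m * (mu**2 - mu) / (m * (mu + mu**2 - mu**3) + lam)

# Fine grid centered on w_star; its minimizer should match w_star closely.
ws = [w_star + (k - 500) * 1e-4 for k in range(1001)]
w_best = min(ws, key=L)
print(round(w_star, 4), round(w_best, 4))
```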
2. Consider a 1-dimensional linear regression problem. The dataset corresponding to this problem
has $n$ examples $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i, y_i \in \mathbb{R}\ \forall i$. We don't have access to the inputs
or outputs directly and we don't know the exact value of $n$. However, we have a few statistics
computed from the data.
Let $w^* = [w_0^*, w_1^*]^\top$ be the unique solution that minimizes $J(w)$ given by:
$$J(w) = \frac{1}{n}\sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 \tag{20}$$
2.a Equating the derivatives of $J(w)$ with respect to $w_0$ and $w_1$ to zero, we get
$$\frac{\partial J}{\partial w_0} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - w_0^* - w_1^* x_i) = 0 \tag{21}$$
$$\frac{\partial J}{\partial w_1} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - w_0^* - w_1^* x_i)\, x_i = 0 \tag{22}$$
Based on these two equations, it is easy to see that only iii and iv are true.
$$\bar{x}\, w_0^* + (C_{xx} + \bar{x}^2)\, w_1^* = C_{xy} + \bar{x}\bar{y} \tag{24}$$
On solving, we get
$$w_0^* = \bar{y} - \frac{C_{xy}}{C_{xx}}\,\bar{x} \tag{25}$$
$$w_1^* = \frac{C_{xy}}{C_{xx}} \tag{26}$$
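Equations (25)–(26) say the OLS line is fully determined by the summary statistics $\bar{x}, \bar{y}, C_{xx}, C_{xy}$. A sketch on synthetic data (the true line $y = 2x + 1$ and sample size are illustrative choices) checks this against a direct least-squares fit:

```python
# Recover the OLS line from summary statistics, equations (25)-(26),
# and compare with numpy's direct least-squares fit on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

xbar, ybar = x.mean(), y.mean()
Cxx = ((x - xbar) ** 2).mean()
Cxy = ((x - xbar) * (y - ybar)).mean()

w1 = Cxy / Cxx           # slope, equation (26)
w0 = ybar - w1 * xbar    # intercept, equation (25)

w1_np, w0_np = np.polyfit(x, y, 1)  # direct fit: [slope, intercept]
print(round(w0, 3), round(w1, 3))
```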
2.c The new statistics of the data in terms of the old statistics are as follows:
$$\bar{\tilde{x}} = \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i = \bar{x}, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \tilde{C}_{xx} = \frac{1}{n}\sum_{i=1}^{n} (\tilde{x}_i - \bar{x})^2 = C_{xx} + \sigma^2$$
$$C_{yy} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \tilde{C}_{xy} = \frac{1}{n}\sum_{i=1}^{n} (\tilde{x}_i - \bar{x})(y_i - \bar{y}) = C_{xy} + \frac{1}{n}\sum_{i=1}^{n} \epsilon_i y_i = C_{xy}$$
Hence, the modified equations will be
$$w_0^* = \bar{y} - \frac{C_{xy}}{C_{xx} + \sigma^2}\,\bar{x} \tag{27}$$
$$w_1^* = \frac{C_{xy}}{C_{xx} + \sigma^2} \tag{28}$$
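Equation (28) is the classic attenuation effect: input noise of variance $\sigma^2$ shrinks the fitted slope. A simulation sketch (the true slope, $\sigma = 0.5$, and sample size are illustrative choices) shows the shrunken slope matching $C_{xy}/(C_{xx} + \sigma^2)$:

```python
# Illustrate 2.c: adding zero-mean noise of variance sigma^2 to the inputs
# shrinks the fitted slope from Cxy/Cxx to approximately Cxy/(Cxx + sigma^2).
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100_000, 0.5
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)
x_noisy = x + rng.normal(scale=sigma, size=n)   # noisy inputs x~_i

def slope(u, v):
    ub, vb = u.mean(), v.mean()
    return ((u - ub) * (v - vb)).mean() / ((u - ub) ** 2).mean()

Cxx = ((x - x.mean()) ** 2).mean()
Cxy = ((x - x.mean()) * (y - y.mean())).mean()

predicted = Cxy / (Cxx + sigma**2)   # equation (28)
observed = slope(x_noisy, y)         # slope actually fitted on noisy inputs
print(round(predicted, 3), round(observed, 3))
```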
3. Consider the problem of learning a binary classifier on a dataset $S$ using the 0-1 loss, i.e.,
$$\ell_{0\text{-}1}(w, (x, y)) = \mathbb{I}[y \neq \mathrm{sign}(w^\top x)] = \mathbb{I}[y\, w^\top x \leq 0]$$
where $w, x \in \mathbb{R}^d$, $y \in \{-1, +1\}$ and $\mathbb{I}[\cdot]$ denotes the indicator function.
3.a Design a convex surrogate loss function for this 0-1 loss. Note that a convex surrogate
loss should (1) be convex and (2) always upper bound the original loss function.
You will get full marks only if you show that the convex surrogate you propose satisfies
these two properties.
3.a /2
3.b Define
$$F_S(w) = \lambda |S| \|w\|^2 + \sum_{i \in S} [1 - y_i w^\top x_i]_+ \tag{29}$$
Show that, if $w^*$ minimizes $F_S(w)$, then
$$\|w^*\| = O\!\left(\frac{1}{\sqrt{\lambda}}\right) \tag{30}$$
where we denote $a = O(b)$ iff
$$a \leq k\, b \tag{31}$$
for some constant $k$.
3.b /3
3.c Now define
$$F_S(w) = \lambda |S| \|w\|^2 + \sum_{i \in S} (y_i - w^\top x_i)^2 \tag{32}$$
Show that, if $w^*$ minimizes $F_S(w)$ and $\|x\|_2 \leq x_{\max}$, then
$$\|w^*\| = O\!\left(\frac{1}{\lambda}\right) \tag{33}$$
3.c /5
3.a The loss equals 1 exactly when $y$ and $w^\top x$ have opposite signs, and 0 otherwise. So we want a surrogate that is at least 1 whenever
$$y\, w^\top x \leq 0 \tag{34}$$
$$\iff 1 - y\, w^\top x \geq 1 \tag{35}$$
Based on inequality (35), we can propose the following hinge loss:
$$\ell(w, (x, y)) = \max\{0,\ 1 - y\, w^\top x\} \tag{36}$$
It remains to show that the hinge loss is a valid convex surrogate for the 0-1 loss. We
show this as follows:
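Alongside the formal argument, the two defining properties can be checked numerically as a sanity check (not a proof); both losses depend on $(w, x, y)$ only through the margin $t = y\, w^\top x$:

```python
# Numeric sanity check: the hinge loss max(0, 1 - t) upper-bounds the
# 0-1 loss 1[t <= 0] and is convex in the margin t = y * w^T x.
import itertools

def zero_one(t):
    return 1.0 if t <= 0 else 0.0

def hinge(t):
    return max(0.0, 1.0 - t)

ts = [k / 100.0 for k in range(-300, 301)]  # margins in [-3, 3]
upper_bound_ok = all(hinge(t) >= zero_one(t) for t in ts)

# Convexity via the midpoint inequality over pairs of grid points:
midpoint_ok = all(
    hinge((a + b) / 2) <= (hinge(a) + hinge(b)) / 2 + 1e-12
    for a, b in itertools.combinations(ts[::20], 2)
)
print(upper_bound_ok, midpoint_ok)
```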
By combining inequalities (48) and (49), we have
$$\lambda |S| \|w^*\|^2 - 2\, |S|\, \|w^*\|\, x_{\max} \leq 0 \tag{50}$$
$$\implies \|w^*\| \leq \frac{2\, x_{\max}}{\lambda} \tag{51}$$
$$\implies \|w^*\| = O\!\left(\frac{1}{\lambda}\right) \tag{52}$$
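The bound (51) can be spot-checked numerically: for the objective (32), the minimizer has the closed form $w^* = (\lambda|S| I + X^\top X)^{-1} X^\top y$. The dimensions, $\lambda$, and random data below are illustrative choices:

```python
# Spot-check bound (51): for the ridge objective (32) with labels in
# {-1, +1}, the minimizer's norm stays below 2 * x_max / lambda.
import numpy as np

rng = np.random.default_rng(2)
d, S, lam = 5, 50, 0.3
X = rng.uniform(-1, 1, size=(S, d))        # rows are the inputs x_i
y = rng.choice([-1.0, 1.0], size=S)        # labels y_i in {-1, +1}
x_max = np.linalg.norm(X, axis=1).max()    # max input norm

# Setting the gradient of lam*|S|*||w||^2 + sum (y_i - w.x_i)^2 to zero:
w_star = np.linalg.solve(lam * S * np.eye(d) + X.T @ X, X.T @ y)

bound = 2 * x_max / lam
print(round(float(np.linalg.norm(w_star)), 4), "<=", round(float(bound), 4))
```

In fact the same algebra gives the slightly tighter $\|w^*\| \leq x_{\max}/\lambda$; the exam's constant 2 comes from the inequality chain (48)–(49).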
Total: 30