
Machine Learning (Fall 2020)

Assignment 1

Answer no 1: Linear regression

We have learned about linear regression in class. Here we explore what we mean by "linear".
Specifically, we will see how linear regression can be used to fit non-linear functions of the data using
feature maps, and explore some of its limitations.

Learning degree-3 polynomials of the input


$$J(\theta) = \frac{1}{2} \sum_{i=1}^{N} \big(\theta^\top \phi(x^{(i)}) - y^{(i)}\big)^2$$

where φ(x) = [1, x, x^2, x^3]^T is the degree-3 feature map. Differentiating this objective with respect to θ gives the gradient

$$\nabla_\theta J(\theta) = \sum_{i=1}^{N} \big(\theta^\top \phi(x^{(i)}) - y^{(i)}\big)\, \phi(x^{(i)})$$

With learning rate γ, the gradient descent update rule is therefore

$$\theta := \theta - \gamma \sum_{i=1}^{N} \big(\theta^\top \phi(x^{(i)}) - y^{(i)}\big)\, \phi(x^{(i)})$$
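A minimal sketch of this update in code (assuming NumPy; `Phi` is a matrix whose rows are φ(x^(i)), and all names here are illustrative rather than taken from the assignment's code):

```python
import numpy as np

def batch_gradient_descent(Phi, y, gamma=1e-3, n_iters=1000):
    """Minimize J(theta) = 0.5 * sum_i (theta^T phi(x_i) - y_i)^2 by batch GD."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        residuals = Phi @ theta - y                   # theta^T phi(x_i) - y_i, for all i
        theta = theta - gamma * (Phi.T @ residuals)   # the summed update rule above
    return theta
```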
Answer no 2: Coding question: degree-3 polynomial regression

Code

Output graph
Answer no 3: Coding question: degree-3 polynomial GD and SGD
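Relative to the batch update sketched in question 1, SGD applies the same rule one example at a time. A hedged sketch (again with illustrative names, not the assignment's code):

```python
import numpy as np

def sgd(Phi, y, gamma=1e-3, n_epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):      # shuffle the examples each epoch
            residual = Phi[i] @ theta - y[i]   # error on one example
            theta = theta - gamma * residual * Phi[i]
    return theta
```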

Output
Answer no 4: Coding question: degree-k polynomial regression

Output
Generally, higher-degree polynomials tend to fit the data better, though very high-degree polynomials
can be numerically unstable.
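A sketch of the degree-k feature map this question generalizes to (assuming NumPy; `np.vander` is one convenient way to build the columns):

```python
import numpy as np

def phi_degree_k(x, k):
    # Columns [1, x, x^2, ..., x^k]. For large k the columns span many
    # orders of magnitude, which is where the numerical instability
    # noted above comes from.
    return np.vander(x, N=k + 1, increasing=True)
```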

Answer no 5: Coding question: other feature maps

Output
In the presence of the sin(x) feature, the models seem to fit the data better and more robustly. However, the
numerical instability that comes with high-degree polynomials remains.
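A sketch of the corresponding feature map with sin(x) appended (illustrative):

```python
import numpy as np

def phi_with_sin(x, k):
    # [1, x, ..., x^k, sin(x)]
    poly = np.vander(x, N=k + 1, increasing=True)
    return np.column_stack([poly, np.sin(x)])
```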

Answer no 6: Overfitting with expressive models and small data


We see that when the dataset is small, higher-degree polynomials tend to pass through all the
points but qualitatively seem like a poor fit. Numerical instability with high-degree polynomials
remains a problem even with small data, with or without sin(x).
2. Incomplete, Positive-Only Labels

2.1 Coding problem: ideal (fully observed) case


Output
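The code is not reproduced in this export; in the ideal case one fits a classifier directly on the true labels t. A minimal sketch on synthetic data, assuming scikit-learn (the data-generating details are illustrative, not the assignment's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))               # illustrative features
t = (X[:, 0] + X[:, 1] > 0).astype(int)      # fully observed true labels

clf_ideal = LogisticRegression().fit(X, t)   # train directly on t
```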

2.2 Coding problem: The naive method on partial labels


Output
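The naive method fits the same classifier, but on the partial labels y, under which only a fraction α of the true positives are labeled. Continuing the illustrative sketch above:

```python
alpha = 0.3                                  # assumed labeling probability
y = t * (rng.random(t.size) < alpha)         # positive-only partial labels
clf_naive = LogisticRegression().fit(X, y)   # naively treat y as the target
```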

3. Warm-up with Bayes rule

4. CP8318 Only Question


We have

$$p(y^{(i)} = 1 \mid x^{(i)}) = p(y^{(i)} = 1 \wedge t^{(i)} = 1 \mid x^{(i)}) \qquad \text{(from the previous sub-question)}$$

$$= p(t^{(i)} = 1 \mid x^{(i)})\; p(y^{(i)} = 1 \mid t^{(i)} = 1,\, x^{(i)})$$

$$= p(t^{(i)} = 1 \mid x^{(i)})\; \alpha$$

Dividing both sides by α, we have the desired result.

5. CP8318 Only Question: Estimating α


Solution 1 below is easier to understand, but it strongly relies on the assumption made in
this sub-question. Solution 2 is less intuitive but can work in more general
settings than the setting in this sub-question.

Solution 1. We fix i and let A = {x : p(t^(i) = 1 | x^(i) = x) = 1} be the set of all positive examples and B = {x : p(t^(i) = 0 | x^(i) = x) = 1} be the set of all negative examples. By the assumption p(t^(i) = 1 | x^(i)) ∈ {0, 1}, we know that A and B cover the whole input space. We first claim that p(x^(i) ∉ A | y^(i) = 1) = 0. This is because when x^(i) ∉ A, we have x^(i) ∈ B, which implies that p(t^(i) = 0 | x^(i)) = 1, which implies that p(t^(i) = 1 | x^(i)) = 0. Therefore the event x^(i) ∉ A implies the event t^(i) = 0, which in turn implies y^(i) = 0. Thus p(x^(i) ∈ A | y^(i) = 1) = 1.

By the previous sub-question, for x ∈ A we have h(x) = α · p(t^(i) = 1 | x^(i) = x) = α; that is, p(h(x^(i)) = α | x^(i) = x) = 1. This implies that p(h(x^(i)) = α | x^(i) = x, y^(i) = 1) = 1. (Note that for any event E with p(E) = 1 and any event F with positive probability, p(E | F) = 1, because p(E^c) = 0 implies p(E^c | F) = p(E^c ∩ F) / p(F) = 0.)

Putting these together,

$$p(h(x^{(i)}) = \alpha \mid y^{(i)} = 1) = p(h(x^{(i)}) = \alpha \mid y^{(i)} = 1,\, x^{(i)} \in A) \cdot p(x^{(i)} \in A \mid y^{(i)} = 1) \quad (4)$$

$$= 1 \cdot p(x^{(i)} \in A \mid y^{(i)} = 1) \qquad \text{(since } p(h(x^{(i)}) = \alpha \mid x^{(i)} = x,\, y^{(i)} = 1) = 1 \text{ for } x \in A\text{)}$$

$$= 1 \cdot 1 = 1 \qquad \text{(because } p(x^{(i)} \in A \mid y^{(i)} = 1) = 1\text{)} \quad (5)$$

Therefore h(x^(i)) = α with probability 1 conditioned on y^(i) = 1, and so E[h(x^(i)) | y^(i) = 1] = α.

Solution 2. Let g(x^(i)) = p(t^(i) = 1 | x^(i)). By the previous sub-question we have h(x^(i)) = α g(x^(i)). Then

$$E[h(x^{(i)}) \mid y^{(i)} = 1] = \frac{E\big[h(x^{(i)})\, 1\{y^{(i)} = 1\}\big]}{p(y^{(i)} = 1)}$$

$$= \frac{E\big[E[1\{y^{(i)} = 1\} \mid x^{(i)}]\, h(x^{(i)})\big]}{E\big[E[1\{y^{(i)} = 1\} \mid x^{(i)}]\big]} \qquad \text{(by the law of total expectation)}$$

$$= \frac{E\big[\alpha\, g(x^{(i)}) \cdot \alpha\, g(x^{(i)})\big]}{E\big[\alpha\, g(x^{(i)})\big]} \qquad \text{(since } E[1\{y^{(i)} = 1\} \mid x^{(i)}] = p(y^{(i)} = 1 \mid x^{(i)}) = \alpha\, g(x^{(i)}) = h(x^{(i)})\text{)}$$

$$= \alpha\, \frac{E\big[g(x^{(i)})^2\big]}{E\big[g(x^{(i)})\big]} = \alpha \qquad \text{(since } g(x^{(i)}) \in \{0, 1\} \text{ implies } g(x^{(i)})^2 = g(x^{(i)})\text{)}$$
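A quick numeric sanity check of this result (a hypothetical simulation, not part of the assignment): when p(t = 1 | x) ∈ {0, 1} and each positive is labeled with probability α, averaging h(x) = α · p(t = 1 | x) over the labeled examples recovers α.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
t = (x > 0).astype(float)               # deterministic labels: p(t=1|x) in {0,1}
alpha = 0.3
y = t * (rng.random(x.size) < alpha)    # each positive labeled with prob. alpha
h = alpha * t                           # the ideal h(x) = alpha * p(t=1|x)
print(h[y == 1].mean())                 # prints 0.3 = alpha
```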

6. Coding problem.
Using the validation set, estimate the constant α by averaging your classifier’s
predictions over all labeled examples in the validation set:
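A sketch of that estimate, continuing the illustrative scikit-learn example from 2.1–2.2 (all names are hypothetical):

```python
# Held-out validation split, generated the same way as the training data.
X_valid = rng.normal(size=(2000, 2))
t_valid = (X_valid[:, 0] + X_valid[:, 1] > 0).astype(int)
y_valid = t_valid * (rng.random(t_valid.size) < alpha)

# Average predicted probabilities over the labeled (y = 1) examples.
probs = clf_naive.predict_proba(X_valid)[:, 1]
alpha_hat = probs[y_valid == 1].mean()
```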

Output
