
EE 660 Homework 3 Posted: Thursday, 9/15/2016

Due: Tuesday, 10/4/2016, 11:59 PM

1. Curve fitting as an example of regression, using PMTK


1) Download and install PMTK from the following address:
https://github.com/probml/pmtk3
Follow its README to download and set it up.
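
For reference, a minimal setup sequence in MATLAB (initPmtk3 is the initialization script named in the PMTK3 README; adjust the path to your local clone):

    cd pmtk3       % folder containing your clone of the repository
    initPmtk3      % adds the PMTK3 folders to the MATLAB path (run once per session)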
2) Run the demo script /demos/linregPolyVsDegree.m
(a) Attach Figure 1 in your answer.
(b) Describe how the curve shape and MSE change as the degree of the polynomial
increases, and explain why.
3) Run the demo script /demos/linregPolyVsRegDemo.m
(a) Attach Figure 1 in your answer.
(b) What is the degree of the polynomial? Does it change during the demo?
(c) Which variable controls the strength of the regularization? Describe how the
curve shape and MSE change as the regularizer increases, and explain why.
Hint: regularization is introduced in Murphy 7.5.1.

2. 2D MLE and MAP estimators


Given a linear Gaussian model as follows:

y = wᵀx + ε

where:

ε ~ N(0, σ²)

with σ² = 0.5.

Augmented notation has been used (a constant 1 is appended to each input). You are
given a 2D_MLE_MAP_Data.mat file that contains 550 data points (x and y pairs) drawn
from this model. The x matrix is a 550 by 3 matrix, with the last column all ones
(augmented). The y matrix is a 550 by 1 vector. Our goal here is to estimate w.

1) Derive the MLE of w analytically. Using the first 30 points as the training
sample, compute the MLE of w.
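
For reference, the closed-form MLE here is the least-squares solution
w_MLE = (XᵀX)⁻¹Xᵀy. A minimal MATLAB sketch of the numeric part, assuming the
variables in 2D_MLE_MAP_Data.mat are named x and y as described above:

    load('2D_MLE_MAP_Data.mat');          % provides x (550x3, augmented) and y (550x1)
    Xtr = x(1:30, :);  ytr = y(1:30);     % first 30 points as the training sample
    w_mle = (Xtr' * Xtr) \ (Xtr' * ytr);  % normal equations: w = (X'X)^{-1} X'y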

2) Assume that we know the distribution of w as a prior:

w ~ N(m_w, σ_w² I)

in which σ_w² = 0.001 and m_w = 0.5·1 (the vector with all entries equal to 0.5).

Derive the MAP estimate of w analytically. Using the same 30 points as the training
sample, compute the MAP estimate of w.
Hint: note that m_w is non-zero, so this is not a simple ridge regularization; you
need to derive the MAP estimate yourself. You can follow the steps below (a sketch
of the resulting closed form appears after the list):

a. Write down the posterior p(w | D, σ²) for our linear model, where D denotes
the training data (without plugging in the Gaussian distributions).

b. Express the posterior in terms of the prior and the likelihood, using Bayes' rule.

c. Take the log posterior ln p(w | D, σ²) as your objective function and plug in
the Gaussian distributions.

d. Take the derivative of the log posterior function and carry out the
calculation. (Hint: using vector and matrix differentiation can simplify
your derivation.)
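
With the prior w ~ N(m_w, σ_w² I), setting the derivative in step d to zero gives
the linear system

(XᵀX/σ² + I/σ_w²) w_MAP = Xᵀy/σ² + m_w/σ_w²

A minimal MATLAB sketch of the numeric part, reusing Xtr and ytr from part 1):

    sigma2 = 0.5;                  % noise variance sigma^2 (given)
    sw2    = 0.001;                % prior variance sigma_w^2
    mw     = 0.5 * ones(3, 1);     % prior mean m_w = 0.5*1
    A = Xtr' * Xtr / sigma2 + eye(3) / sw2;   % X'X/sigma^2 + I/sigma_w^2
    c = Xtr' * ytr / sigma2 + mw / sw2;       % X'y/sigma^2 + m_w/sigma_w^2
    w_map = A \ c;                 % solves A * w_MAP = c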

3) Since x is a 2-dimensional input (before augmentation) and y is 1-dimensional,
we can visualize our data with a 3D scatter plot.

Plot the training set and the planes defined by w_MLE and w_MAP (use different colors
for the two planes).

Hint: a MATLAB function plot_plane(w, color) is provided for plotting a plane with a
given normal vector and a given color (e.g. 'r', 'b', etc.).
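
A minimal plotting sketch, reusing the variables from parts 1) and 2); the exact
appearance depends on how the provided plot_plane is implemented:

    figure; hold on;
    scatter3(Xtr(:,1), Xtr(:,2), ytr, 'filled');  % training points in 3D
    plot_plane(w_mle, 'r');                       % plane from the MLE estimate (red)
    plot_plane(w_map, 'b');                       % plane from the MAP estimate (blue)
    xlabel('x_1'); ylabel('x_2'); zlabel('y');
    view(3); grid on;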

4) Compare the MSE of the two methods on the testing dataset (the data points not
used in training). Which is better?
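
A minimal sketch of the comparison, assuming the test set is the 520 points not used
for training:

    Xte = x(31:end, :);  yte = y(31:end);
    mse_mle = mean((yte - Xte * w_mle).^2);   % test MSE of the MLE fit
    mse_map = mean((yte - Xte * w_map).^2);   % test MSE of the MAP fit
    fprintf('MSE (MLE) = %.4f, MSE (MAP) = %.4f\n', mse_mle, mse_map);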

5) Assume σ_w² = 0.2 instead, and repeat parts 2) through 4). What change do you see
in the plotted planes and MSEs?

6) Change the prior back to σ_w² = 0.001. This time, use 200 points as the training
set and repeat parts 2) through 4). Compared to the original results, how does the
MSE of the two methods change?

7) Please include a copy of your code with your solution (for this and all homework
assignments).

3. Steepest descent on a quadratic function


(a) You are given that f(x) = (aᵀx − b)², in which a and x are D-dimensional
vectors, and a and b are given constants. Prove that f is convex.

(b) Is f(x) = Σ_{n=1}^{N} (a_nᵀx − b_n)² convex, in which a_n and x are D-
dimensional, and a_n and b_n, n = 1, …, N, are given constants? Justify your answer.

(c) Now implement a steepest descent algorithm on f(x). You are given skeleton
code, in which you are given 2 pairs of a_n and b_n (the a_n are 2-dimensional, and
the number of points N = 2). You are also given a starting point (the variable
current in the code), and a constant step size.

Use the formula for ∇f(x) (derive it if you didn't already do it above) to calculate
the gradient, and update current until the termination condition

‖∇f(x)‖ < 0.2

is met.

Run the script for several different step sizes. Draw the path and report the number
of steps (iterations) for each case. Explain the results you observe. (Hint: 0.01
might be a good choice for convergence in a reasonable number of steps. Increase your
step size from there until your algorithm fails to converge.)
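
For orientation, the update loop might look like the sketch below. The names a, b,
current, eta, and traj are illustrative (keep whatever names the provided skeleton
code defines); it assumes the a_n are stored as the columns of a 2x2 matrix a and
the b_n in a 2x1 vector b:

    eta  = 0.01;                        % constant step size (tune per the hint)
    traj = current;                     % record the iterates to draw the path
    grad = 2 * a * (a' * current - b);  % gradient of f(x) = sum_n (a_n'x - b_n)^2
    nsteps = 0;
    while norm(grad) >= 0.2             % stop once ||grad f(x)|| < 0.2
        current = current - eta * grad; % steepest descent update
        grad = 2 * a * (a' * current - b);
        traj = [traj, current];         % append the new iterate
        nsteps = nsteps + 1;
    end
    plot(traj(1,:), traj(2,:), '-o');   % descent path in the 2D plane
    fprintf('Converged in %d steps.\n', nsteps);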
