In which augmented notation is used. You are given a 2D_MLE_MAP_Data.mat
file that contains 550 data points (x and y pairs) drawn from this model. The x matrix is a
550 by 3 matrix, with the last column all ones (augmented). The y matrix is a 550 by 1
vector. Our goal here is to estimate the weight vector w.
1) Derive the MLE of w analytically. Using the first 30 points as the training sample,
compute the MLE of w.
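The closed-form MLE is the least-squares solution w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch (the commented loading code assumes the provided .mat file layout; the course skeleton is in MATLAB, this is only an illustration):

```python
import numpy as np

def mle_weights(X, y):
    """Closed-form MLE for linear regression, w = (X^T X)^{-1} X^T y.

    Solving the least-squares problem with lstsq is numerically safer
    than forming the matrix inverse explicitly.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Hypothetical usage with the provided data file:
# from scipy.io import loadmat
# data = loadmat("2D_MLE_MAP_Data.mat")
# X_train, y_train = data["x"][:30], data["y"][:30].ravel()
# w_mle = mle_weights(X_train, y_train)
```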
2) Assume a Gaussian prior on the weights, w ~ N(m, σ²I).
Derive the MAP estimate of w analytically. Using the same 30 points as the training sample,
compute the MAP of w.
Hint: note that m is non-zero, meaning our problem is not simple regularization;
you need to derive the MAP estimate yourself. You can follow the steps below:
a. Write down the posterior p(w | X, y) for our linear model (without
plugging in the Gaussian distributions).
b. Represent the posterior in terms of the prior and the likelihood, using Bayes' rule.
d. Take the derivative of the log posterior function and carry out the
calculation. (Hint: using vector and matrix differentiation can simplify
your derivation.)
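Carrying out step d for a Gaussian likelihood with noise variance σ²ₙ and prior w ~ N(m, σ²I) yields the linear system (XᵀX + λI) w = Xᵀy + λm with λ = σ²ₙ/σ². A minimal NumPy sketch of that solution (the parameter names and the split into two variances are assumptions about the model setup, not the course skeleton):

```python
import numpy as np

def map_weights(X, y, m, sigma2_prior, sigma2_noise):
    """MAP estimate with a non-zero-mean Gaussian prior w ~ N(m, sigma2_prior * I).

    Setting the gradient of the log posterior to zero gives
        (X^T X + lam * I) w = X^T y + lam * m,   lam = sigma2_noise / sigma2_prior,
    which reduces to ridge regression only when m = 0.
    """
    lam = sigma2_noise / sigma2_prior
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * m)
```

With a very broad prior (large sigma2_prior) the estimate approaches the MLE; with a very tight prior it collapses toward the prior mean m.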
3) Note that x is a 2-dimensional input and y is 1-dimensional, so we can visualize our data with a 3D
scatter plot.
Hint: a MATLAB function plot_plane(w, color) is provided for plotting a plane with a
given normal vector and a given color (e.g. 'r', 'b', etc.).
4) Compare the MSE of the two methods on the testing dataset (the data points not used in
training). Which is better?
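The test-set comparison itself is a one-liner. A small sketch (the variable names w_mle, w_map, and the 30/520 train/test split are assumptions following the earlier parts):

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of the linear predictions X @ w against targets y."""
    r = X @ w - y
    return float(np.mean(r ** 2))

# Hypothetical usage with the held-out points (rows 30 onward):
# X_test, y_test = data["x"][30:], data["y"][30:].ravel()
# print("MLE test MSE:", mse(w_mle, X_test, y_test))
# print("MAP test MSE:", mse(w_map, X_test, y_test))
```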
5) Assume σ² = 0.2 instead, and repeat parts 2) through 4). What change do you see in the
plotted planes and the MSEs?
6) Change the prior back to σ² = 0.001. This time, use 200 points as the training set and repeat
parts 2) through 4). Compared to the original results, how do the MSEs of the two methods
change?
7) Please include a copy of your code with your solution (for this and all homework
assignments).
(c) Now implement a steepest descent algorithm on the objective function f(w). You are given skeleton
code, in which you are given 2 sets of x and y (the x are 2-dimensional, and the number
of points n = 2). You are also given a starting point (the variable current in the code)
and a constant step size.
Use the formula for the gradient of f(w) (derive it if you didn't already do so above) to calculate
the gradient, and update current until the termination condition
‖∇f(w)‖ < 0.2
is met.
Run the script for several different step sizes. Draw the path and report the number of
steps (iterations) for each case. Explain the results you observe. (Hint: 0.01 might
be a good choice for convergence in a reasonable number of steps; increase your step
size from there until your algorithm fails to converge.)
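The loop above can be sketched as follows. The quadratic objective f(w) = ½‖Xw − y‖², the toy data, and the starting point are made-up placeholders (the real f, data, and current come from the skeleton code); only the update rule and the ‖∇f(w)‖ < 0.2 stopping test are taken from the assignment:

```python
import numpy as np

def steepest_descent(grad, current, step_size, tol=0.2, max_iter=100000):
    """Fixed-step steepest descent: w <- w - step_size * grad(w),
    stopping once the gradient norm falls below tol.

    Returns the final point, the number of iterations taken, and the
    visited path (useful for plotting the trajectory)."""
    steps = 0
    path = [current.copy()]
    while np.linalg.norm(grad(current)) >= tol and steps < max_iter:
        current = current - step_size * grad(current)
        path.append(current.copy())
        steps += 1
    return current, steps, path

# Toy least-squares example: f(w) = 0.5 * ||X w - y||^2, grad f = X^T (X w - y).
X = np.array([[1.0, 2.0],
              [3.0, 1.0]])
y = np.array([1.0, 2.0])
grad = lambda w: X.T @ (X @ w - y)
w, n_steps, path = steepest_descent(grad, np.array([0.0, 0.0]), step_size=0.01)
```

Too small a step size converges slowly (many iterations); once the step size exceeds 2 divided by the largest eigenvalue of XᵀX, the iterates diverge, which is the failure mode the hint asks you to find.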