
Machine Learning: E0270, 2015

Assignment 4: Due March 24, before class

1. Consider a Hidden Markov Model with K states and M possible emissions for each state, with parameters $(\pi, A, B)$ defined in the usual way: $\pi_k = p(z_1 = k)\ \forall k$, $A_{kl} = p(z_t = l \mid z_{t-1} = k)\ \forall k, l$, and $B_{jk} = p(x_t = j \mid z_t = k)\ \forall j, k$, where $\{x_t\}$ is the set of observed data and $\{z_t\}$ is the set of hidden states. The forward-backward algorithm for HMM inference involves a recursive forward step (alpha step) that updates $\alpha(z_t) = p(z_t, x_1, \ldots, x_t)$ and a backward step (beta step) that computes $\beta(z_t) = p(x_{t+1}, \ldots, x_T \mid z_t)$. Assuming the parameters of the HMM to be known, we would like to pursue an alternative step for the backward recursion. Describe the alpha-gamma inference procedure, where we define $\gamma(z_t) = p(z_t \mid x_1, \ldots, x_T)$ while alpha is still defined as $\alpha(z_t) = p(z_t, x_1, \ldots, x_t)$ from the standard forward step. (You will need to derive the recursive step for $\gamma(z_t)$ as a function of the alpha and gamma variables, and also discuss initialization. The standard forward recursion is recalled below for reference.)
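As a reminder (this is given, not part of what you must derive), the standard forward recursion in the notation above reads (cf. Bishop [2], Ch. 13):

```latex
% Standard alpha recursion for the forward step.
\begin{align}
\alpha(z_1) &= p(z_1)\, p(x_1 \mid z_1) = \pi_{z_1} B_{x_1 z_1}, \\
\alpha(z_t) &= p(x_t \mid z_t) \sum_{z_{t-1}} p(z_t \mid z_{t-1})\, \alpha(z_{t-1})
             = B_{x_t z_t} \sum_{k=1}^{K} A_{k z_t}\, \alpha(z_{t-1} = k).
\end{align}
```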
2. HMM - Programming Assignment:
(a) Given the parameters of a Hidden Markov Model with discrete emissions, write a function hmmloglikelihood (a template is provided) to compute the joint log-likelihood $P(X, Z)$ of the observed data X and the hidden state sequence Z. The observed data is a set of S sequences of length N each: $X_i^s \in \{1, \ldots, M\}\ \forall i \in \{1, \ldots, N\},\ s \in \{1, \ldots, S\}$, and there are K hidden states, so that the hidden variables satisfy $Z_i \in \{1, \ldots, K\}\ \forall i$. Also indicate in your answer sheet how you would compute the joint likelihood of multiple observed sequences, instead of the single sequence you worked out in class. (A sketch of the single-sequence computation is given below.)
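A minimal sketch of the single-sequence computation, assuming pi0 is the K-by-1 initial distribution, A the K-by-K transition matrix, and B the K-by-M emission matrix; the argument names and conventions in the provided template may differ:

```matlab
function ll = hmmloglikelihood(x, z, pi0, A, B)
% Joint log-likelihood log P(X = x, Z = z) for ONE discrete sequence.
%   x   : 1-by-N observed symbols in 1..M
%   z   : 1-by-N hidden states in 1..K
%   pi0 : K-by-1 initial state distribution
%   A   : K-by-K transitions, A(k,l) = p(z_t = l | z_{t-1} = k)
%   B   : K-by-M emissions,   B(k,j) = p(x_t = j | z_t = k)
ll = log(pi0(z(1))) + log(B(z(1), x(1)));
for t = 2:numel(x)
    ll = ll + log(A(z(t-1), z(t))) + log(B(z(t), x(t)));
end
end
```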
(b) You are given a dataset X in multi_seq_disc.txt with S sequences (one per line), each sequence generated from an HMM. Use these sequences to learn the parameters of the HMM using the function hmmtrain in MATLAB. Some training options and information on initialization are provided in the file readme.txt for Problem 1. The number of hidden states K of the HMM is not known; hence, use three different settings K = {5, 10, 15} and learn the HMM transition probabilities A and emission probabilities B in each case. In each case, after learning the parameters, use the Viterbi algorithm hmmviterbi to find the most likely state sequence Z and compute the joint likelihood P(X, Z) using the function you have written. Report the joint likelihood for each value of K. For which value of K is this lowest? (A sketch of the training and decoding calls follows.)
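A sketch of the calls for one setting of K; the random initial guesses below are placeholders, and the actual initialization and options to use are the ones in readme.txt:

```matlab
% Hypothetical driver for one value of K.
seqs = load('multi_seq_disc.txt');   % S-by-N matrix, symbols in 1..M
K = 5;  M = max(seqs(:));
TRguess   = rand(K, K);  TRguess   = bsxfun(@rdivide, TRguess,   sum(TRguess, 2));
EMITguess = rand(K, M);  EMITguess = bsxfun(@rdivide, EMITguess, sum(EMITguess, 2));
[A, B] = hmmtrain(seqs, TRguess, EMITguess);   % Baum-Welch (EM) training
ll = 0;
for s = 1:size(seqs, 1)
    z = hmmviterbi(seqs(s, :), A, B);          % most likely state path
    % Uniform initial distribution assumed here; adjust per readme.txt.
    ll = ll + hmmloglikelihood(seqs(s, :), z, ones(K, 1) / K, A, B);
end
fprintf('K = %d: joint log-likelihood = %g\n', K, ll);
```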

(c) You are given a dataset X in obs_cont.txt with a sequence of N datapoints (one per line), each with M dimensions. You want to use an HMM with discrete emissions to analyze these datapoints. You will proceed, in the manner suggested in class, by first clustering the M-dimensional points into C clusters, obtaining a sequence of cluster IDs. Such a procedure is often used in speech analysis, where this preprocessing step creates a codebook that yields discrete observations. Train the HMM you saw in class on the cluster-ID sequence thus obtained and learn its parameters. This time, you know neither K, the number of hidden states, nor C, the size of your codebook. Learn HMM parameters by trying the values K = {5, 10, 15} and C = {10, 20, 35}. For each case, compute the joint log-likelihood of the cluster-ID sequence used for training and the best possible state sequence found through Viterbi. Report these values. Which values of K and C give the best likelihood? (A sketch of the codebook step follows.)
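A minimal sketch of the codebook step for one (K, C) setting, assuming MATLAB's kmeans is an acceptable clustering method (any method from class would do):

```matlab
% Hypothetical codebook construction followed by HMM training.
X = load('obs_cont.txt');          % N-by-M matrix of continuous points
K = 5;  C = 10;
ids = kmeans(X, C)';               % 1-by-N cluster IDs in 1..C (the codebook)
TRguess   = rand(K, K);  TRguess   = bsxfun(@rdivide, TRguess,   sum(TRguess, 2));
EMITguess = rand(K, C);  EMITguess = bsxfun(@rdivide, EMITguess, sum(EMITguess, 2));
[A, B] = hmmtrain(ids, TRguess, EMITguess);   % train on the discrete codes
z  = hmmviterbi(ids, A, B);                   % best state sequence
ll = hmmloglikelihood(ids, z, ones(K, 1) / K, A, B);
```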
3. Constructing a Bayesian Network: In this question, you are asked to construct a Bayesian Network for some of the probabilistic models you have seen before. To represent Bayesian Networks with a large number of nodes more compactly, the plate notation is often used; it is described in detail in Section 8.1.1 of your textbook, Bishop [2]. Feel free to use plate notation to represent your model more compactly.
(a) Construct the Bayesian Network for the Gaussian Mixture Model. You are given N observations $X_1, \ldots, X_N$. Suppose we have K clusters with parameters $\mu_k$ and $\Sigma_k$ respectively. Consider a set of N latent variables $Z_1, \ldots, Z_N$ such that $p(X_i \mid Z_i) \sim \mathcal{N}(\mu_{Z_i}, \Sigma_{Z_i})$, and the $Z_i$, $\forall i$, themselves come from a multinomial distribution $\pi$. (The joint factorization is written out below for reference.)
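For reference, the joint distribution this network must encode factorizes, in the notation above, as:

```latex
% Gaussian Mixture Model joint over all N observations and latents.
\begin{equation}
p(X, Z \mid \pi, \mu, \Sigma)
  = \prod_{i=1}^{N} p(Z_i \mid \pi)\, p(X_i \mid Z_i)
  = \prod_{i=1}^{N} \pi_{Z_i}\, \mathcal{N}\!\left(X_i \mid \mu_{Z_i}, \Sigma_{Z_i}\right).
\end{equation}
```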
(b) In Assignment 3, you were asked to read about Probabilistic Principal Component Analysis [1]. Draw the Bayesian Network for the probabilistic PCA model from Section 3 of reference [1].
4. For this problem, you are expected to use the Bayes Net Toolbox (BNT) for directed graphical models to model the following toy example. A professor at IISc wants to do an analysis by observing various characteristics and performance of his Master's students through the following discrete random variables: D, taking values from {easy, hard}, indicating the (D)ifficulty of a course; H, from {yes, no}, indicating whether or not a student is (H)ardworking; G, from {good, bad}, indicating the (G)rade that a student gets; P, from {yes, no}, indicating whether the student got a (P)ublication during their Master's degree; R, from {good, average}, indicating whether the student got a good GATE (R)ank or an average one; and S, taking values from {satisfied, not satisfied}, indicating whether or not the student is (S)atisfied at the end of the course. (Observe that this assumes a student registers for a single course.) Assume that the department maintains a database with columns corresponding to these random variables, and one row for each student.
(a) Download the Bayes Net Toolbox designed by Kevin Murphy from https://code.google.com/p/bnt/. Look at the project wiki for installation instructions. With the random variables described above, construct a Bayes net with the probabilities given in the file bn_def_students.m. You can look at the webpage http://bnt.googlecode.com/svn/trunk/docs/usage.html and the file bnt/BNT/examples/static/sprinkler1.m for information on how to use the code and interpret the tables given in bn_def_students.m. Draw this Bayes network and write down a factorization for this Bayesian Network. (A construction sketch in the style of sprinkler1.m follows.)
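A construction sketch in the style of sprinkler1.m. The edges and CPT values below are placeholders only; the actual structure and tables are the ones defined in bn_def_students.m:

```matlab
% Hypothetical BNT construction -- replace the DAG and the CPTs with the
% definitions from bn_def_students.m before using this.
N = 6;
D = 1; H = 2; G = 3; R = 4; P = 5; S = 6;  % node ids in topological order
dag = zeros(N, N);
dag(D, G) = 1; dag(H, G) = 1;              % placeholder edges only
dag(H, R) = 1; dag(G, P) = 1; dag(G, S) = 1;
node_sizes = 2 * ones(1, N);               % all six variables are binary
bnet = mk_bnet(dag, node_sizes);
bnet.CPD{D} = tabular_CPD(bnet, D, 'CPT', [0.5 0.5]);  % placeholder table
bnet.CPD{H} = tabular_CPD(bnet, H, 'CPT', [0.5 0.5]);  % placeholder table
bnet.CPD{G} = tabular_CPD(bnet, G);        % random CPT when none is given
bnet.CPD{R} = tabular_CPD(bnet, R);
bnet.CPD{P} = tabular_CPD(bnet, P);
bnet.CPD{S} = tabular_CPD(bnet, S);
```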
(b) Find the probabilities (i) P(S = satisfied), (ii) P(S = satisfied | Grade = good), (iii) P(S = satisfied | Grade = good, Hardworking = yes, Difficulty = easy), and (iv) P(S = satisfied | Publication = yes) by running the program with the jtree_inf_engine inference engine, as shown in the file sprinkler1.m. Report all these values. (A sketch of one such query follows.)
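A sketch of query (ii), assuming the node ids from the construction sketch above; whether state 1 of a node means good/yes/satisfied is an assumption to be checked against bn_def_students.m:

```matlab
% Hypothetical jtree query for P(S = satisfied | G = good).
engine   = jtree_inf_engine(bnet);
evidence = cell(1, N);
evidence{G} = 1;                      % assumed: state 1 of G is 'good'
engine   = enter_evidence(engine, evidence);
marg     = marginal_nodes(engine, S);
fprintf('P(S = satisfied | G = good) = %g\n', marg.T(1));  % assumed: state 1 of S
```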
(c) Trace by hand (compute directly from the tables, without using the Bayes Net Toolbox) the value of P(S = satisfied | Grade = good, Hardworking = yes, Difficulty = easy) and compare it with that obtained from the program.
(d) Given that this database has N records, i.e., we have observed all random variable values for N students, write down the likelihood of this data in the database. Note that all the values in all the tables become parameters of the joint distribution.
(e) Assuming that all values are observed for all students in the department database (including the satisfaction values!), what would be the expressions for the maximum likelihood estimators of the parameters in (i) the table for D (Difficulty) and (ii) the table for G (Grade)? Write the expression for the MLE of these parameters in terms of statistics from the observed data. (The general form of such expressions is recalled below.)
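As a reminder of the general pattern for fully observed Bayesian Networks with tabular CPDs (not the specific answer for D and G), the likelihood and its count-based maximizer take the form:

```latex
% Likelihood of N fully observed records and the MLE of each table entry;
% pa(X_v) denotes the parents of variable X_v in the network.
\begin{align}
L(\theta) &= \prod_{n=1}^{N} \prod_{v}
             p\!\left(x_v^{(n)} \mid \mathrm{pa}(X_v)^{(n)}; \theta_v\right), \\
\hat{\theta}_v(x \mid u) &=
  \frac{\#\{n : x_v^{(n)} = x,\ \mathrm{pa}(X_v)^{(n)} = u\}}
       {\#\{n : \mathrm{pa}(X_v)^{(n)} = u\}}.
\end{align}
```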
Note: Bayesian Networks thus constructed are often used in real-life problems, such as prognosis in clinical decision making.
5. D-Separation in Directed Graphical Models:
(a) List all the D-separations in the Bayesian Network of Problem 4.
(b) Consider a Bayesian Network with random variables A, B, C and D. You are given that $A \perp B, D \mid C$. Show that this implies $A \perp D \mid C$ using the definition and properties of D-separation.
(c) Give a linear-time algorithm to check whether $A \perp B \mid C$ holds, where A, B and C are non-intersecting subsets of nodes in the Bayesian Network.

References
[1] Michael E. Tipping and Christopher M. Bishop, Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society:
http://research.microsoft.com/pubs/67218/bishop-ppca-jrss.pdf
[2] Christopher M. Bishop, Pattern Recognition and Machine Learning.
