
Hierarchical Dirichlet Process and
Infinite Hidden Markov Model

Paper by Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei, NIPS 2004

Duke University Machine Learning Group
Presented by Kai Ni
February 17, 2006
Outline
• Motivation

• Dirichlet Processes (DP)

• Hierarchical Dirichlet Processes (HDP)

• Infinite Hidden Markov Model (iHMM)

• Results & Conclusions


Motivation

• Problem – “multi-task learning” in which the “tasks” are clustering problems.

• Goal – Share clusters among multiple, related clustering problems. The number of clusters is open-ended and inferred automatically by the model.

• Applications
– Genome pattern analysis
– Information retrieval over document corpora
Hierarchical Model

• A single clustering problem can be analyzed as a Dirichlet process (DP):
  G ~ DP(α₀, G₀),  G = Σ_{k=1}^∞ β_k δ_{φ_k}
– Draws G from a DP are discrete, so values drawn from G are generally not distinct.

• For J groups, we consider a group-specific DP Gⱼ for j = 1,…,J:
  Gⱼ ~ DP(α₀ⱼ, G₀ⱼ),  Gⱼ = Σ_{k=1}^∞ β_{jk} δ_{φ_{jk}}

• To share information, we link the group-specific DPs: Gⱼ ~ DP(α₀, G₀(τ)).
– If G₀(τ) is continuous, the draws Gⱼ have no atoms in common with probability one.
– HDP solution: G₀ is itself a draw from a DP(γ, H).
Dirichlet Process &
Hierarchical Dirichlet Process
• Three different perspectives
– Stick-breaking
– Chinese restaurant
– Infinite mixture models

• Setup
– DP:  G ~ DP(α₀, G₀)
– HDP: G₀ | γ, H ~ DP(γ, H);  Gⱼ | α₀, G₀ ~ DP(α₀, G₀)

• Properties of DP



Stick-breaking View
• A mathematically explicit form of the DP. Draws from a DP are discrete.

• In DP:
  G = Σ_{k=1}^∞ β_k δ_{φ_k}  with  β ~ Stick(α₀),  φ_k ~ G₀

• In HDP:
  Gⱼ = Σ_{k=1}^∞ π_{jk} δ_{θ_k},   G₀ = Σ_{k=1}^∞ β_k δ_{θ_k}
  π_j ~ DP(α₀, β),   β ~ Stick(γ),   θ_k ~ H
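The stick-breaking construction above can be sketched numerically. This is a minimal illustration, not the authors' code: the truncation level K, the Gaussian base measure for the atoms, and the finite Dirichlet surrogate for π_j ~ DP(α₀, β) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, K):
    # Truncated stick-breaking: v_k ~ Beta(1, alpha),
    # beta_k = v_k * prod_{l<k} (1 - v_l)
    v = rng.beta(1.0, alpha, size=K)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

K = 50                                   # truncation level (approximation choice)
beta = stick_breaking(alpha=1.0, K=K)    # DP weights; sum -> 1 as K grows
phi = rng.normal(0.0, 1.0, size=K)       # atom locations phi_k ~ G0 (here N(0, 1))

# HDP: every group reuses the SAME atoms; only the weights differ per group.
alpha0 = 5.0
pi_j = rng.dirichlet(alpha0 * beta / beta.sum() + 1e-9)  # surrogate for DP(alpha0, beta)
```

Because all groups draw their weights around the same global β, they place mass on the same atoms, which is exactly the sharing mechanism the HDP needs.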
DP – Chinese Restaurant Process

• Exhibits the clustering property.

• Let φ₁, …, φ_{i-1} be i.i.d. random variables distributed according to G; let θ₁, …, θ_K be the distinct values taken on by φ₁, …, φ_{i-1}, and n_k the number of φ_{i'} = θ_k for 0 < i' < i. Integrating out G gives the predictive distribution
  φ_i | φ₁, …, φ_{i-1} ~ Σ_{k=1}^K (n_k / (i-1+α₀)) δ_{θ_k} + (α₀ / (i-1+α₀)) G₀
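The restaurant metaphor can be simulated directly: each customer joins a table in proportion to its occupancy, or opens a new table with probability proportional to α₀. A minimal sketch (the function name and parameter values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def crp_assignments(n, alpha):
    """Seat n customers sequentially by the Chinese restaurant process."""
    tables = []                  # tables[k] = number of customers at table k
    labels = []                  # table index chosen by each customer
    for i in range(n):
        total = i + alpha        # i customers already seated, plus alpha
        probs = [c / total for c in tables] + [alpha / total]
        k = int(rng.choice(len(tables) + 1, p=probs))
        if k == len(tables):
            tables.append(1)     # new table: a fresh draw theta_k ~ G0
        else:
            tables[k] += 1       # join an existing table (cluster)
        labels.append(k)
    return labels, tables

labels, tables = crp_assignments(n=100, alpha=2.0)
```

Rich tables get richer, which is the clustering property the slide refers to: the number of occupied tables grows only logarithmically in n.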
HDP – Chinese Restaurant Franchise
• First level: within each group, a DP mixture
  Gⱼ ~ DP(α₀, G₀),  φ_{ji} | Gⱼ ~ Gⱼ,  x_{ji} | φ_{ji} ~ F(φ_{ji})
– Let φ_{j1}, …, φ_{j(i-1)} be i.i.d. random variables distributed according to Gⱼ; let ψ_{j1}, …, ψ_{jTⱼ} be the values taken on by φ_{j1}, …, φ_{j(i-1)}, and n_{jt} the number of φ_{ji'} = ψ_{jt} for 0 < i' < i.

• Second level: across groups, sharing clusters
– The base measure of each group is a draw from a DP:
  ψ_{jt} | G₀ ~ G₀,  G₀ ~ DP(γ, H)
– Let θ₁, …, θ_K be the values taken on by the ψ_{jt}, and m_k the number of ψ_{jt} = θ_k over all j, t.
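The two-level scheme can be sketched as nested restaurant processes: customers choose tables within their group's restaurant, and each new table orders a dish from a franchise-wide menu that is itself a CRP. A hedged sketch (function name and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def crf(group_sizes, alpha0, gamma):
    """Chinese restaurant franchise: tables per group, dishes shared across groups."""
    dish_counts = []                       # m_k: tables (all groups) serving dish k
    labels = []                            # dish label of every customer, per group
    for n_j in group_sizes:
        table_counts = []                  # n_jt for this group's restaurant
        dish_of = []                       # dish served at each table of this group
        grp = []
        for i in range(n_j):
            total = i + alpha0
            probs = [c / total for c in table_counts] + [alpha0 / total]
            t = int(rng.choice(len(table_counts) + 1, p=probs))
            if t == len(table_counts):     # new table: pick its dish from the
                table_counts.append(1)     # top-level CRP over dishes
                m_total = sum(dish_counts) + gamma
                dprobs = [m / m_total for m in dish_counts] + [gamma / m_total]
                k = int(rng.choice(len(dish_counts) + 1, p=dprobs))
                if k == len(dish_counts):
                    dish_counts.append(1)  # brand-new dish theta_k ~ H
                else:
                    dish_counts[k] += 1
                dish_of.append(k)
            else:
                table_counts[t] += 1
            grp.append(dish_of[t])
        labels.append(grp)
    return labels, dish_counts

labels, dish_counts = crf(group_sizes=[50, 50, 50], alpha0=2.0, gamma=1.0)
```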
HDP – CRF graph
• The values of φ are shared between groups, as well as within groups. This is a key property of the HDP.

[Figure: CRF graphical model, obtained by integrating out G₀]
DP Mixture Model
• One of the most important applications of the DP: a nonparametric prior distribution on the components of a mixture model.

• G can be viewed as an infinite mixture model:
  G ~ DP(α₀, G₀),  G = Σ_{k=1}^∞ β_k δ_{φ_k}
  φᵢ | G ~ G
  xᵢ | φᵢ ~ F(φᵢ)
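The generative process above can be sampled forward. A minimal sketch under assumed choices: truncation to K components, base measure G₀ = N(0, 25), and a Gaussian likelihood F(φ) = N(φ, 1).

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated stick-breaking weights for G (K is an assumed truncation level)
K, alpha0 = 30, 1.0
v = rng.beta(1.0, alpha0, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()                       # renormalize the truncated weights

phi = rng.normal(0.0, 5.0, size=K)       # component parameters phi_k ~ G0

# phi_i | G ~ G : pick an atom of G;  x_i | phi_i ~ F(phi_i), here N(phi_i, 1)
n = 200
z = rng.choice(K, size=n, p=beta)        # index of the atom each phi_i equals
x = rng.normal(phi[z], 1.0)              # observed data
```

Only a few atoms carry most of the weight, so the n observations cluster around a small, data-determined number of component means.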
HDP mixture model

• The HDP can be used as the prior distribution over the factors for nested group data.

• We consider a two-level DP: G₀ links the child DPs Gⱼ and forces them to share components. The Gⱼ are conditionally independent given G₀.
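The forced sharing can be made concrete with a truncated sketch: all groups reuse one global set of atoms θ and differ only in their weights. The truncation level, Gaussian choices for H and F, and group count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def stick(a, K):
    # Truncated Stick(a) weights, renormalized
    v = rng.beta(1.0, a, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()

K, gamma, alpha0, J = 20, 1.0, 3.0, 4
beta = stick(gamma, K)                    # global weights (from G0)
theta = rng.normal(0.0, 5.0, size=K)      # shared atoms theta_k ~ H

# Group weights pi_j ~ DP(alpha0, beta): same atoms, reweighted per group
pi = np.array([rng.dirichlet(alpha0 * beta + 1e-6) for _ in range(J)])

# Every group draws from the SAME theta -- components are shared across groups,
# while the groups remain conditionally independent given (beta, theta).
data = [rng.normal(theta[rng.choice(K, size=50, p=pi[j])], 1.0) for j in range(J)]
```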
Infinite Hidden Markov Model

• The number of hidden states is allowed to be countably infinite.

• The transition probabilities in the i-th row of the transition matrix A can be interpreted as mixing proportions:
  πᵢ = (a_{i1}, a_{i2}, …, a_{ik}, …)

• Thus each row of A in an HMM is a DP. These DPs must be linked, because they should share the same set of “next states”. The HDP provides the natural framework for the infinite HMM.
iHMM via HDP
• Assign observations to groups, where the groups are indexed by the value of the previous state variable in the sequence. The current state and emission distribution then define a group-specific mixture model.

• Multiple iHMMs can be linked by adding an additional level of Bayesian hierarchy: a master DP couples the iHMMs, each of which is itself a set of DPs.
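The "each transition row is a DP tied through a shared β" idea can be sketched with a truncated construction. This is an illustrative sketch, not the paper's sampler: the truncation level K and the Gaussian emission model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

K, gamma, alpha0 = 15, 2.0, 4.0           # truncation level and concentrations (assumed)

# Shared "next state" weights beta ~ Stick(gamma), truncated and renormalized
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()

# Each row of the transition matrix is a DP draw centered on the SAME beta,
# so every state agrees on the menu of possible next states.
A = np.array([rng.dirichlet(alpha0 * beta + 1e-6) for _ in range(K)])
mu = rng.normal(0.0, 5.0, size=K)         # per-state emission means (assumed Gaussian B)

# Forward-simulate a state/observation sequence
T, s = 100, 0
states, obs = [], []
for _ in range(T):
    s = int(rng.choice(K, p=A[s]))        # s_i | s_{i-1} ~ A(s_{i-1}, :)
    states.append(s)
    obs.append(rng.normal(mu[s], 1.0))    # y_i | s_i ~ B(s_i, :)
```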
HDP & iHMM

                HDP (CRF aspect)             iHMM
Group           Restaurant j (fixed)         Indexed by s_{i-1} (random)
Data            Customer x_{ji}              y_i
Hidden factor   Table ψ_{jt} = θ_k, k=1,…    s_i = k, k=1,…
                Dish θ_k ~ H                 B(s_i, :)
DP weights      Popularity π_{jk}, k=1,…     A(s_{i-1}, :)
Likelihood      F(x_{ji} | φ_{ji})           B(s_i, y_i)
Non-trivialities in iHMM

• The HDP assumes a fixed partition of the data into groups, while the HMM is for time-series data and the definition of groups is itself random.

• In the CRF aspect of the HDP, the number of restaurants is infinite. Also, in the sampling scheme, changing s_t may affect the assignment of all subsequent data.

• The CRF is natural for describing the iHMM, but it is awkward for sampling; we need sampling algorithms derived from the other representations of the iHMM.
HDP Results
iHMM Results
Conclusion

• The HDP is a hierarchical, nonparametric model for clustering problems involving multiple groups of data.

• The mixture components are shared across groups, and the appropriate number of components is determined automatically by the HDP.

• The HDP can be extended to the infinite HMM, providing an effective inference algorithm.
References

• Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, “Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes”, NIPS 2004.

• M. J. Beal, Z. Ghahramani and C. E. Rasmussen, “The Infinite Hidden Markov Model”, NIPS 2002.

• Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, “Hierarchical Dirichlet Processes”, revised version to appear in JASA, 2006.
