Lecture 20
Latent Dirichlet Allocation
Philipp Hennig
29 June 2021
Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
Content (Ex = exercise sheet):
1 Introduction (Ex 1)
2 Reasoning under Uncertainty
3 Continuous Variables (Ex 2)
4 Monte Carlo
5 Markov Chain Monte Carlo (Ex 3)
6 Gaussian Distributions
7 Parametric Regression (Ex 4)
8 Learning Representations
9 Gaussian Processes (Ex 5)
10 Understanding Kernels
11 Gauss-Markov Models (Ex 6)
12 An Example for GP Regression
13 GP Classification (Ex 7)
14 Generalized Linear Models
15 Exponential Families (Ex 8)
16 Graphical Models
17 Factor Graphs (Ex 9)
18 The Sum-Product Algorithm
19 Example: Modelling Topics (Ex 10)
20 Latent Dirichlet Allocation
21 Mixture Models (Ex 11)
22 Bayesian Mixture Models
23 Expectation Maximization (EM) (Ex 12)
24 Variational Inference
25 Example: Collapsed Inference (Ex 13)
26 Example: EM for Kernel Topic Models
27 Making Decisions
28 Revision
Probabilistic ML — P. Hennig, SS 2021 — Lecture 20: Latent Dirichlet Allocation — © Philipp Hennig, 2021 CC BY-NC-SA 3.0 1
Designing a probabilistic machine learning method:
1. get the data
1.1 try to collect as much meta-data as possible
2. build the model
2.1 identify quantities and datastructures; assign names
2.2 design a generative process (graphical model)
2.3 assign (conditional) distributions to factors/arrows (use exponential families!)
3. design the algorithm
3.1 consider conditional independence
3.2 try standard methods for early experiments
3.3 run unit-tests and sanity-checks
3.4 identify bottlenecks, find customized approximations and refinements
A Mixture of Probabilities?
Desiderata
[Figure: the D×V word-count matrix X factorizes as X ∼ Π × Θ, with Π a D×K document-topic matrix and Θ a K×V topic-word matrix]
▶ topics should be probabilities: p(xd | k) = ∏_{v=1}^{V} ∏_{i=1}^{xdv} θkv = ∏_{v=1}^{V} θkv^{xdv}
▶ but documents should have several topics! Let πdk be the probability to draw a word from topic k
doc sparsity each document d should only contain a small number of topics
word sparsity each topic k should only contain a small number of the words v in the vocabulary
Each document discusses a few topics
Dirichlet sparsity in the document-topic distribution
[Figure: mixture model vs. topic model — in a mixture model each document d belongs to one topic k; in a topic model each document d draws its words from several topics k]
Each topic contains a few words
Dirichlet sparsity in the topic-word distribution
[Figure: Gaussian base measure vs. Dirichlet base measure — under the Dirichlet, each topic k puts most of its mass on a few words v]
A Discrete Topic Model
almost there
πd cdi wdi θk
i = [1, . . . , Id ]
d = [1, . . . , D] k = [1, . . . , K]
⇒ C ∈ {0; 1}^{D×Id×K} with p(C | Π) = ∏_{d=1}^{D} ∏_{i=1}^{Id} ∏_{k=1}^{K} πdk^{cdik}
p(x | π) = ∏_{i=1}^{n} πxi ,  xi ∈ {1, …, K}
         = ∏_{k=1}^{K} πk^{nk} ,  nk := |{xi | xi = k}|
p(π | α) = D(π; α) = Γ(∑_k αk) / ∏_k Γ(αk) · ∏_{k=1}^{K} πk^{αk−1} = 1/B(α) · ∏_{k=1}^{K} πk^{αk−1}
p(π | x) = D(α + n)
[Figure: Dirichlet densities over the probability simplex (π1, π2, π3) for three different concentration parameters α]
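The conjugacy p(π | x) = D(α + n) above is easy to check numerically. A minimal sketch with numpy (variable names and sizes are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3
alpha = np.array([0.5, 0.5, 0.5])   # small alpha: sparse draws, mass near the simplex corners

# generative model: pi ~ D(alpha), then x_i ~ Categorical(pi)
pi = rng.dirichlet(alpha)
x = rng.choice(K, size=1000, p=pi)
n = np.bincount(x, minlength=K)     # sufficient statistics n_k = |{x_i | x_i = k}|

# conjugacy: the posterior is again Dirichlet, p(pi | x) = D(alpha + n)
alpha_post = alpha + n
post_mean = alpha_post / alpha_post.sum()

print(post_mean, pi)                # the posterior mean concentrates on the true pi
```

For 1000 observations the posterior mean is essentially the vector of empirical word frequencies, lightly smoothed by α.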
Latent Dirichlet Allocation
our generative model [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) JMLR 3, 993–1022]
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
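The generative model in the graph can be run directly by ancestral sampling: draw the topic-word distributions θk and document-topic distributions πd from their Dirichlet priors, then draw an assignment cdi and a word wdi for every word slot. A short sketch (dimensions and hyperparameter values chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

D, K, V, Id = 5, 3, 20, 50      # documents, topics, vocabulary size, words per document
alpha = np.full(K, 0.1)         # sparse document-topic prior alpha_d
beta = np.full(V, 0.1)          # sparse topic-word prior beta_k

Theta = rng.dirichlet(beta, size=K)   # theta_k ~ D(beta_k): one word distribution per topic
Pi = rng.dirichlet(alpha, size=D)     # pi_d ~ D(alpha_d): one topic distribution per document

docs = []
for d in range(D):
    c = rng.choice(K, size=Id, p=Pi[d])                    # c_di ~ Categorical(pi_d)
    w = np.array([rng.choice(V, p=Theta[k]) for k in c])   # w_di ~ Categorical(theta_{c_di})
    docs.append(w)
```

With α, β well below 1, each sampled document uses only a few topics and each topic only a few words — exactly the two sparsity desiderata.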
Building an algorithm
A more careful look at the model
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
p(C, Π, Θ | W) = p(C, Π, Θ, W) / ∫∫∫ p(C, Π, Θ, W) dC dΠ dΘ
               = p(W | C, Π, Θ) · p(C, Π, Θ) / ∫∫∫ p(C, Π, Θ, W) dC dΠ dΘ
Building an algorithm
A more careful look at the model
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
p(Π | α) · p(C | Π) = [∏_{d=1}^{D} Γ(∑_k αdk)/∏_k Γ(αdk) · ∏_{k=1}^{K} πdk^{αdk−1}] · [∏_{d=1}^{D} ∏_{i=1}^{Id} ∏_{k=1}^{K} πdk^{cdik}]
Useful notation: ndkv := #{i : wdi = v, cdik = 1}. Write ndk: := [ndk1, …, ndkV] and ndk· := ∑_v ndkv, etc.
                    = ∏_{d=1}^{D} Γ(∑_k αdk)/∏_k Γ(αdk) · ∏_{k=1}^{K} πdk^{αdk−1+ndk·}
Building an algorithm
A more careful look at the model
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
p(C, Π, Θ, W) = [∏_{d=1}^{D} Γ(∑_k αdk)/∏_k Γ(αdk) · ∏_{k=1}^{K} πdk^{αdk−1+ndk·}] · [∏_{k=1}^{K} Γ(∑_v βkv)/∏_v Γ(βkv) · ∏_{v=1}^{V} θkv^{βkv−1+n·kv}]
▶ note that this conditional independence can easily be read off from the above graph!
Building an algorithm
A more careful look at the model
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
p(C, Π, Θ, W) = ∏_{d=1}^{D} D(πd; αd) · ∏_{d=1}^{D} ∏_{i=1}^{Id} ∏_{k=1}^{K} πdk^{cdik} · ∏_{d=1}^{D} ∏_{i=1}^{Id} ∏_{k=1}^{K} θk,wdi^{cdik} · ∏_{k=1}^{K} D(θk; βk)
▶ note that this conditional independence can easily be read off from the above graph!
Let’s look at the structure again
Latent Dirichlet Allocation
αd βk
πd cdi wdi θk
i = [1, . . . , Id ] k = [1, . . . , K]
d = [1, . . . , D]
p(C, Π, Θ, W) = [∏_{d=1}^{D} Γ(∑_k αdk)/∏_k Γ(αdk) · ∏_{k=1}^{K} πdk^{αdk−1+ndk·}] · [∏_{k=1}^{K} Γ(∑_v βkv)/∏_v Γ(βkv) · ∏_{v=1}^{V} θkv^{βkv−1+n·kv}]
▶ note that this conditional independence cannot easily be read off from the above graph!
The Toolbox
Framework:
∫ p(x1, x2) dx2 = p(x1)        p(x1, x2) = p(x1 | x2) p(x2)        p(x | y) = p(y | x) p(x) / p(y)
Modelling:
▶ graphical models
▶ Gaussian distributions
▶ (deep) learnt representations
▶ Kernels
▶ Markov Chains
▶ Exponential Families / Conjugate Priors
▶ Factor Graphs & Message Passing
Computation:
▶ Monte Carlo
▶ Linear algebra / Gaussian inference
▶ maximum likelihood / MAP
▶ Laplace approximations
Gibbs sampling:
▶ xt ← xt−1; xti ∼ p(xti | xt1, xt2, …, xt(i−1), xt(i+1), …)
▶ a special case of Metropolis-Hastings:
▶ q(x′ | xt) = δ(x′\i − xt,\i) p(x′i | xt,\i)
▶ p(x′) = p(x′i | x′\i) p(x′\i) = p(x′i | xt,\i) p(xt,\i)
▶ acceptance rate: a = p(x′)/p(xt) · q(xt | x′)/q(x′ | xt) = p(x′i | xt,\i)/p(xti | xt,\i) · p(xti | xt,\i)/p(x′i | xt,\i) = 1
Markov Chain Monte Carlo methods provide a relatively simple way to construct approximate posterior distributions, and they are asymptotically exact. Compared to other approximate inference methods, like variational inference, they tend to be easier to implement but harder to monitor, and may also be more computationally expensive.
∫ f(x) p(x) dx ≈ (1/S) ∑_{s=1}^{S} f(xs)   if xs ∼ p
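The scheme is easiest to see on a toy target. A sketch (my example, not from the lecture) of a Gibbs sampler for a correlated bivariate Gaussian, where both full conditionals are Gaussians in closed form and, as derived above, every move is accepted:

```python
import numpy as np

rng = np.random.default_rng(2)

rho = 0.8                     # target: N(0, [[1, rho], [rho, 1]])
S = 20000
x = np.zeros(2)
samples = np.empty((S, 2))

for s in range(S):
    # resample each coordinate from its full conditional (acceptance rate a = 1)
    x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho**2))   # x1 | x2 ~ N(rho x2, 1 - rho^2)
    x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho**2))   # x2 | x1 ~ N(rho x1, 1 - rho^2)
    samples[s] = x

emp_rho = np.corrcoef(samples[5000:].T)[0, 1]            # Monte Carlo estimate after burn-in
```

The empirical correlation of the post-burn-in samples recovers ρ; note that the stronger the correlation, the more slowly the chain moves, which is exactly the slow-mixing effect discussed for LDA on the next slide.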
A Gibbs Sampler
Simple Markov Chain Monte Carlo inference
▶ This is comparably easy to implement because there are libraries for sampling from Dirichlet distributions, and discrete sampling is trivial. All we have to keep around are the counts n (which are sparse!) and Θ, Π (which are comparably small). Thanks to factorization, much can also be done in parallel!
▶ Unfortunately, this sampling scheme is relatively slow to move away from its initialization, because the assignments C depend strongly on Θ, Π and vice versa.
▶ Properly vectorizing the code is important for speed.
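One sweep of such an uncollapsed Gibbs sampler can be sketched as follows: conjugate Dirichlet updates for Π and Θ given the counts, then categorical resampling of each assignment cdi. This is a toy sketch under assumed conventions (loops instead of the vectorization the slide recommends), not the lecture's reference code:

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_sweep(docs, c, pi, theta, alpha, beta):
    """One sweep: resample pi_d | C, then theta_k | C, W, then each c_di | pi, theta."""
    D, K = pi.shape
    V = theta.shape[1]
    # count statistics n_dk and n_kv from the current assignments
    ndk = np.zeros((D, K))
    nkv = np.zeros((K, V))
    for d, w in enumerate(docs):
        for i, v in enumerate(w):
            ndk[d, c[d][i]] += 1
            nkv[c[d][i], v] += 1
    # conjugate updates: Dirichlet posteriors given the counts
    for d in range(D):
        pi[d] = rng.dirichlet(alpha + ndk[d])
    for k in range(K):
        theta[k] = rng.dirichlet(beta + nkv[k])
    # resample assignments: p(c_di = k | ...) is proportional to pi_dk * theta_{k, w_di}
    for d, w in enumerate(docs):
        for i, v in enumerate(w):
            p = pi[d] * theta[:, v]
            c[d][i] = rng.choice(K, p=p / p.sum())
    return c, pi, theta

# tiny synthetic run with random documents
D, K, V = 4, 2, 10
docs = [rng.integers(V, size=30) for _ in range(D)]
c = [rng.integers(K, size=len(w)) for w in docs]
pi = rng.dirichlet(np.ones(K), size=D)
theta = rng.dirichlet(np.ones(V), size=K)
for _ in range(20):
    c, pi, theta = gibbs_sweep(docs, c, pi, theta, np.ones(K), np.ones(V))
```

Only the counts and the (comparably small) Π, Θ are kept between sweeps; the per-document loops are independent given Θ, so they could run in parallel.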
A visual example
building a toy model adapted from T. L. Griffiths & M. Steyvers, Finding scientific topics, PNAS 101, 5228–5235
It works!
your homework this week
Designing a Probabilistic Machine Learning Model
1. get the data
1.1 try to collect as much meta-data as possible
1.2 take a close look at the data
2. build the model
2.1 identify quantities and datastructures; assign names
2.2 design a generative process (graphical model)
2.3 assign (conditional) distributions to factors/arrows (use exponential families!)
3. design the algorithm
3.1 consider conditional independence
3.2 try standard methods for early experiments
3.3 run unit-tests and sanity-checks
3.4 identify bottlenecks, find customized approximations and refinements
4. Test the Setup
5. Revisit the Model and try to improve it, using creativity