Stewart Slides
Brandon M. Stewart¹
Harvard University
¹ I am grateful to Justin Grimmer and David Blei for permission to use some graphics
and material from their previous presentations.
Brandon M. Stewart (Harvard University) Topic Models June 14, 2010 1 / 39
Introduction
Basic Definitions
Why Cluster?
[Figure: scatter plot titled "Public Land"; vertical axis runs from 0.00 to 0.15.]
[Figure: plot of count against date, annotated "Cloture 1", "Cloture 2" and "DREAM".]
[Figure: dot plot titled "McCain 2005" over the word stems iraq, honor, court, land, broadband, consum, energi, hurrican, immigr, transpar.]
k-means
Choosing a Model
1 We want one algorithm that performs optimally on every set of
documents and on every subject of interest.
2 Unsurprisingly, this is impossible.
3 Two important theorems:
    1 Ugly Duckling Theorem
    2 No Free Lunch Theorem
4 Thus, to choose a method, we need to think about substance
(Grimmer and King, 2009).
A Simple Example
K-Means
1 Mixture modeling “supposes that the data is an i.i.d. sample from some
population described by a probability density function. This density
function is characterized by a parameterized model taken to be a
mixture of component density functions; each component density
describes one of the clusters. This model is then fit to the data by
maximum likelihood or corresponding Bayesian approaches” (Hastie,
Tibshirani and Friedman, 2009).
2 Now we can tell a story about our data, e.g. Poisson vs. multinomial
component densities.
3 Nothing stops us from incorporating more substance into more
complex models.
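
To make the mixture-model idea concrete, here is a minimal sketch of fitting a mixture of multinomials by EM, which is one way to do the maximum likelihood fit mentioned in the quote. The function name, the smoothing constant and the toy word counts are assumptions made for illustration; they do not come from the slides.

```python
import math
import random

def fit_multinomial_mixture(docs, n_clusters, n_iter=50, seed=0, smooth=0.01):
    """EM for a mixture of multinomials over word-count vectors.

    docs: list of equal-length lists of non-negative word counts.
    Returns (mixing weights, per-cluster word probabilities, responsibilities).
    """
    rng = random.Random(seed)
    n_words = len(docs[0])
    # Start from random soft assignments; each row sums to 1.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: mixing weights (kept strictly positive) and smoothed word probabilities.
        weights = [(sum(r[k] for r in resp) + 1e-9) / (len(docs) + n_clusters * 1e-9)
                   for k in range(n_clusters)]
        word_probs = []
        for k in range(n_clusters):
            counts = [smooth + sum(r[k] * d[w] for r, d in zip(resp, docs))
                      for w in range(n_words)]
            total = sum(counts)
            word_probs.append([c / total for c in counts])
        # E-step: posterior cluster probabilities per document, computed in log space.
        for i, d in enumerate(docs):
            log_post = [math.log(weights[k])
                        + sum(c * math.log(word_probs[k][w]) for w, c in enumerate(d))
                        for k in range(n_clusters)]
            top = max(log_post)
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
    return weights, word_probs, resp

# Toy corpus (made up): three documents use words 0-1, three use words 2-3.
docs = [[9, 1, 0, 0], [8, 2, 0, 0], [9, 1, 0, 0],
        [0, 0, 8, 2], [0, 0, 9, 1], [0, 0, 8, 2]]
weights, word_probs, resp = fit_multinomial_mixture(docs, n_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print(labels)
```

Each component's word-probability vector plays the role of one cluster's density. Swapping the multinomial for a Poisson likelihood would change only the log-probability term in the E-step and the matching M-step update.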
1 Assumes:
    1 Each document is assigned to one topic
    2 Each author allocates some hidden proportion of time to each topic
2 Grimmer’s project seeks to quantitatively represent the content of
senators’ press releases.
3 It is called the Expressed Agenda Model because it captures the way
senators communicate their agenda to constituents.
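
The two assumptions can be sketched as a toy simulation: an author draws hidden topic proportions, and each document is then assigned exactly one topic from those proportions. The Dirichlet draw, the function name and the parameter values are assumptions made for illustration, and the word-level likelihood of the actual model is left out entirely.

```python
import random

def simulate_author(n_docs, n_topics, alpha=1.0, seed=0):
    """Draw hidden topic proportions for one author, then one topic per document."""
    rng = random.Random(seed)
    # Assumed prior: symmetric Dirichlet, sampled as normalized Gamma variates.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(gammas)
    proportions = [g / total for g in gammas]
    # Each document (e.g. one press release) receives exactly one topic.
    topics = rng.choices(range(n_topics), weights=proportions, k=n_docs)
    return proportions, topics

proportions, topics = simulate_author(n_docs=10, n_topics=3)
print(proportions)
print(topics)
```

The share of documents falling on each topic approximates the author's hidden proportions. This is the sense in which the topic proportions summarize an agenda.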
Labelling Clusters
Validation
Gap Statistic
Figure: (Left panel): observed (green) and expected (blue) values of $\log W_K$.
Both curves have been translated to equal zero at one cluster. (Right panel): Gap
curve, equal to the difference between the observed and expected values of
$\log W_K$. The Gap estimate $K^*$ is the smallest $K$ producing a gap within one
standard deviation of the gap at $K + 1$; here $K^* = 2$.
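
The selection rule in the caption can be written as a short sketch. It takes the gap as the expected minus the observed value of log W_K and returns the smallest K whose gap is at least the gap at K + 1 minus that point's standard deviation. The function names and the numeric values below are hypothetical; they are not read off the figure.

```python
def gap_curve(observed_log_w, expected_log_w):
    """Gap(K) = expected log W_K minus observed log W_K, for K = 1, 2, ..."""
    return [e - o for o, e in zip(observed_log_w, expected_log_w)]

def gap_estimate(gaps, sds):
    """Smallest K with Gap(K) >= Gap(K+1) - s(K+1); index i corresponds to K = i + 1."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - sds[i + 1]:
            return i + 1
    return None  # no K satisfied the rule over the range examined

# Hypothetical curves, both translated to equal zero at K = 1.
observed = [0.0, -1.5, -1.9, -2.2]
expected = [0.0, -0.7, -1.3, -1.65]
sds = [0.05, 0.1, 0.1, 0.1]
print(gap_estimate(gap_curve(observed, expected), sds))
```

In practice the expected values and standard deviations would come from clustering reference data sets drawn without cluster structure. That step is omitted here.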
Cluster Quality
[Figure: plot with points labelled "Rockefeller Press Releases", "Lautenberg Press Releases" and "Dirichlet Process".]
Figure: The height of each horizontal line in the dendrogram indicates how similar
the two clusters merged at that line are; lower merges join more similar clusters.
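
As an illustration of how merge heights arise, here is a minimal sketch of agglomerative clustering that records the distance at which each pair of clusters is joined. Single linkage, one-dimensional points and the function name are assumptions chosen for brevity; the slides do not say which linkage or distance was used.

```python
def single_linkage_merges(points):
    """Agglomerative clustering on 1-D points; returns (cluster_a, cluster_b, height) per merge."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

for left, right, height in single_linkage_merges([0.0, 1.0, 5.0, 5.5]):
    print(left, right, height)
```

Each recorded height is where the horizontal line joining the two clusters would be drawn, so the closest clusters are joined lowest in the tree.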