SECTION 1
Topic modeling pre-LDA
Landauer97
Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The latent
semantic analysis theory of acquisition, induction, and representation of knowledge.
Psychological Review, 104, 1997
Deerwester90
Hofmann99
Ding08
Chris Ding, Tao Li, and Wei Peng. On the equivalence between non-negative matrix
factorization and probabilistic latent semantic indexing. Computational Statistics and
Data Analysis, 52:3913–3927, 2008
Proved the equivalence between pLSI and NMF by showing that they both optimize the same
objective function. Since they are different algorithms, this allows designing a hybrid method
that alternates between NMF and pLSI, each run jumping out of the local optimum reached by
the other method.
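As a sketch of the NMF half of such a hybrid, here are the standard Lee-Seung multiplicative updates for the generalized KL objective (the objective for which the equivalence with pLSI holds). All names and the initialization scheme are illustrative, not taken from the paper:

```python
import numpy as np

def nmf_kl_updates(V, k, iters=200, seed=0):
    """Lee-Seung multiplicative updates minimizing the generalized
    KL divergence D(V || WH) -- the objective that Ding et al. show
    pLSI also optimizes (up to normalization)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1          # nonnegative init (illustrative)
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        WH = W @ H
        W *= (V / WH) @ H.T / H.sum(axis=1)          # update W, H fixed
        WH = W @ H
        H *= W.T @ (V / WH) / W.sum(axis=0)[:, None]  # update H, W fixed
    return W, H
```

Each update is guaranteed not to increase the objective, which is what makes alternating with pLSI (restarting from the other method's factors) a sensible escape from local optima.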
Topic Modeling Bibliography Quentin Pleple, qpleple@ucsd.edu
SECTION 2
LDA
Chronologically, Blei, Ng and Jordan first published [Blei02] presenting LDA in NIPS treating
topics ϕk as free parameters. Shortly after, Griffiths and Steyvers published [Griffiths02a]
and [Griffiths02b] extending this model by adding a symmetric Dirichlet prior on ϕk . Finally,
Blei, Ng and Jordan published an extended version [Blei03] of their first paper in Journal of
Machine Learning Research (by far the most cited LDA paper) with a section on having this
Dirichlet smoothing on multinomial parameters ϕk .
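The smoothed model of [Blei03] can be summarized by its generative process; a minimal sketch (illustrative code, not from any of the papers):

```python
import numpy as np

def lda_generate(n_docs, doc_len, K, V, alpha, beta, seed=0):
    """Generative process of smoothed LDA: topics phi_k get a symmetric
    Dirichlet(beta) prior, per-document topic proportions theta_d a
    symmetric Dirichlet(alpha) prior."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet([beta] * V, size=K)        # K topics over V words
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet([alpha] * K)         # topic mixture of this doc
        zs = rng.choice(K, size=doc_len, p=theta)  # topic assignment per token
        ws = [rng.choice(V, p=phi[z]) for z in zs] # word drawn from its topic
        docs.append(ws)
    return phi, docs
```

Treating phi with a prior (rather than as free parameters, as in [Blei02]) is exactly the "Dirichlet smoothing" step described above.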
Blei02
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. In
NIPS, 2002
First paper on LDA, quite short, rarely cited. See [Blei03].
Griffiths02a
Griffiths02b
Thomas L. Griffiths and Mark Steyvers. Prediction and semantic association. In NIPS,
pages 11–18, 2002
Almost the same paper as [Griffiths02a].
Blei03
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. J.
Mach. Learn. Res., 3:993–1022, March 2003
Most cited paper for LDA, extended version of [Blei02].
Griffiths04
Heinrich04
Gregor Heinrich. Parameter estimation for text analysis. Technical report, 2004
Heavily detailed tutorial about LDA, and inference using Gibbs sampling.
Steyvers06
Mark Steyvers and Tom Griffiths. Probabilistic topic models. In T. Landauer, D. Mc-
namara, S. Dennis, and W. Kintsch, editors, Latent Semantic Analysis: A Road to
Meaning. Laurence Erlbaum, 2006
LDA had been around for 3 years; they give an in-depth review and analysis of probabilistic
topic models, full of deep insights. They propose measures capturing similarity between topics (KL,
symmetric KL, JS, cosine, L1, L2), between sets of words and documents, and between words.
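The distributional similarity measures they propose can be sketched directly on two topic-word distributions (a minimal illustration; function names are mine):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes strictly positive q."""
    return float(np.sum(p * np.log(p / q)))

def kl_sym(p, q):
    """Symmetrized KL: D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log 2."""
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine(p, q):
    """Cosine similarity between the two distributions seen as vectors."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
```

KL is asymmetric, which is why the symmetrized and JS variants are useful when comparing topics to each other rather than fitting one to another.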
Blei12
David M. Blei. Probabilistic topic models. Commun. ACM, 55(4):77–84, April 2012
A short, high-level review on topic models. Not technical.
Hoffman10
Matthew Hoffman, David M. Blei, and Francis Bach. Online learning for latent dirichlet
allocation. In NIPS, 2010
Presents an online version of the variational EM algorithm introduced in [Blei03].
SECTION 3
Evaluation of topic models
Wei06
Xing Wei and Bruce Croft. Lda-based document models for ad-hoc retrieval. In SIGIR,
2006
As an extrinsic evaluation of topic models, uses the discovered topics for ad-hoc information retrieval.
Chang09
Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei.
Reading tea leaves: How humans interpret topic models. In NIPS, 2009
Showed that, surprisingly, predictive likelihood (or equivalently, perplexity) and human judgment
are often not correlated, and are even sometimes slightly anti-correlated.
They ran a large-scale experiment on the Amazon Mechanical Turk platform. For each topic, they
took the five top words of that topic and added a random sixth word. Then, they presented these
lists of six words to people, asking them which was the intruder word.
If all the people asked could tell which was the intruder, then we can safely conclude that the
topic is good at describing an idea. If, on the other hand, many people identified other words
as the intruder, it means that they could not see the logic in the association of words, and
we can conclude the topic was not good enough.
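Building one such intrusion question can be sketched as follows. The intruder is chosen to be unlikely in the evaluated topic but highly ranked in another topic, in the spirit of [Chang09]; the exact selection thresholds and all names here are illustrative:

```python
import random

def make_intrusion_item(topic_p, other_topic_p, vocab, n_top=5, seed=0):
    """Build one word-intrusion question: a topic's top words plus an
    intruder that is unlikely in this topic but ranks high in another."""
    rng = random.Random(seed)
    order = sorted(range(len(vocab)), key=lambda w: -topic_p[w])
    top = [vocab[w] for w in order[:n_top]]
    unlikely = set(order[len(order) // 2:])   # bottom half of this topic
    other_order = sorted(range(len(vocab)), key=lambda w: -other_topic_p[w])
    intruder = vocab[next(w for w in other_order if w in unlikely)]
    items = top + [intruder]
    rng.shuffle(items)                        # hide the intruder's position
    return items, intruder
```

The fraction of annotators who recover `intruder` from `items` is then the per-topic score.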
Wallach09a
Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation
methods for topic models. In Proceedings of the 26th Annual International Conference
on Machine Learning, ICML ’09, pages 1105–1112, New York, NY, USA, 2009. ACM
Gives many methods to compute approximations of the likelihood p(w_d | Φ, α) of one unseen
document, which is intractable but needed to evaluate topic models.
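The simplest estimator they consider can be sketched as plain Monte Carlo from the prior: draw topic proportions θ from Dirichlet(α) and average the conditional document likelihood. This is a high-variance baseline, not the paper's recommended method, and the names below are illustrative:

```python
import numpy as np

def heldout_loglik_prior_sampling(doc, phi, alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of log p(w_d | phi, alpha) for a held-out
    document `doc` (list of word ids), with phi of shape (K, V)."""
    rng = np.random.default_rng(seed)
    K = phi.shape[0]
    thetas = rng.dirichlet([alpha] * K, size=n_samples)  # (S, K)
    word_p = thetas @ phi[:, doc]                        # (S, len(doc))
    log_liks = np.log(word_p).sum(axis=1)                # log p(w_d | theta)
    m = log_liks.max()                                   # log-mean-exp trick
    return float(m + np.log(np.mean(np.exp(log_liks - m))))
```

For realistic documents the variance of this estimator explodes, which is exactly why the paper studies better alternatives (e.g. left-to-right and Chib-style estimators).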
Buntine09
AlSumait09
Loulwah AlSumait, Daniel Barbará, James Gentle, and Carlotta Domeniconi. Topic
significance ranking of lda generative models. In ECML, 2009
Defines measures based on three prototypes of junk and insignificant topics to rank discovered
topics by a significance score. The three junk prototypes are the uniform word distribution,
the empirical corpus word distribution, and the uniform document distribution.
The topic significance score is then based on combinations of dissimilarities (KL divergence,
cosine, and correlation) from those three junk prototypes.
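The word-distribution part of this idea can be sketched with KL alone (a simplification: the paper combines several dissimilarities, and the third prototype lives in document space; names here are illustrative):

```python
import numpy as np

def junk_distances(phi_k, corpus_dist):
    """KL divergence of a topic phi_k from two junk prototypes: the
    uniform word distribution and the empirical corpus word distribution.
    Larger values = further from junk = (roughly) more significant."""
    V = len(phi_k)
    uniform = np.full(V, 1.0 / V)
    kl_uniform = float(np.sum(phi_k * np.log(phi_k / uniform)))
    kl_corpus = float(np.sum(phi_k * np.log(phi_k / corpus_dist)))
    return kl_uniform, kl_corpus
```

A topic that is close to either prototype carries little information beyond global word frequencies and can be ranked low.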
Newman10c
David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. Automatic evalu-
ation of topic coherence. In NAACL, 2010
Tries different coherence measures on different datasets to compare them.
Mimno11b
David Mimno and David Blei. Bayesian checking for topic models. In EMNLP, 2011
Presents a Bayesian method measuring how well a topic model fits a corpus.
SECTION 4
Topic coherence
Newman10a
David Newman, Youn Noh, Edmund Talley, Sarvnaz Karimi, and Timothy Baldwin.
Evaluating topic models for digital libraries. In Proceedings of the 10th annual joint
conference on Digital libraries, pages 215–224, New York, NY, USA, 2010. ACM
Introduced the UCI coherence measure ∑_{i<j} log [ p(w_i, w_j) / (p(w_i) p(w_j)) ] over the
top words w_1, ..., w_10 (based on PMI). The measure is extrinsic as it uses empirical
probabilities from an external corpus such as Wikipedia.
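A minimal sketch of the measure, assuming word and word-pair probabilities have already been estimated from the reference corpus (argument names are illustrative, and the `eps` smoothing for unseen pairs is my assumption, not part of the original formula):

```python
import math
from itertools import combinations

def uci_coherence(top_words, p_word, p_pair, eps=1e-12):
    """UCI coherence of a topic's top words: sum of pairwise PMI scores,
    with probabilities from an external reference corpus."""
    score = 0.0
    for wi, wj in combinations(top_words, 2):
        pij = p_pair.get((wi, wj)) or p_pair.get((wj, wi), 0.0)
        score += math.log((pij + eps) / (p_word[wi] * p_word[wj]))
    return score
```

Independent word pairs contribute roughly zero; pairs that co-occur more than chance push the score up.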
Mimno11a
David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, and Andrew McCal-
lum. Optimizing semantic coherence in topic models. In EMNLP, 2011
Introduced the UMass coherence measure ∑_{i<j} log [ (1 + D(w_i, w_j)) / D(w_i) ] over the
top words w_1, ..., w_10, where D(w) is the number of documents containing w and D(w_i, w_j)
the number containing both (intrinsic measure).
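A minimal sketch, assuming the document counts have already been gathered from the training corpus (argument names are illustrative):

```python
import math
from itertools import combinations

def umass_coherence(top_words, n_docs_with, n_docs_with_both):
    """UMass coherence of a topic's top words: intrinsic, built from
    document counts D(w) and co-document counts D(wi, wj) of the
    training corpus itself. The +1 avoids taking log of zero."""
    score = 0.0
    for wi, wj in combinations(top_words, 2):
        d_ij = n_docs_with_both.get((wi, wj)) or n_docs_with_both.get((wj, wi), 0)
        score += math.log((d_ij + 1) / n_docs_with[wi])
    return score
```

Because only corpus counts are needed, this measure requires no external resource, unlike UCI.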
Stevens12
Keith Stevens, Philip Kegelmeyer, David Andrzejewski, and David Buttler. Exploring
topic coherence over many models and many topics. In Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural Language Processing and Computational
Natural Language Learning, EMNLP-CoNLL ’12, pages 952–961, Stroudsburg, PA,
USA, 2012. Association for Computational Linguistics
Explores computing the two coherence metrics, UCI from [Newman10a] and UMass from [Mimno11a],
on multiple datasets and for different numbers of topics, then aggregates the results (computing
average and entropy). They assume these two are good metrics and use them to compare different
topic models: LDA, LSA+SVD, and LSA+NMF.
SECTION 5
Interactive LDA
Andr09
David Andrzejewski, Xiaojin Zhu, and Mark Craven. Incorporating domain knowledge
into topic modeling via dirichlet forest priors. In ICML, pages 25–32, 2009
Makes the discovery of topics semi-supervised: a user repeatedly gives orders on the top words
of discovered topics: "those words should be in the same topic", "those words should not be in
the same topic", and "those words should be by themselves". Orders are encoded into pairwise
constraints on words: two words have to, or cannot, be in the same topic. The model is then
trained again with a complex new prior encoding the constraints, based on Dirichlet forests.
Hu11
Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive topic modeling.
In Association for Computational Linguistics, 2011
Extends the approach from [Andr09], proposing interactive topic modeling (ITM), where we don't
have to restart the Gibbs sampler after each human action. Instead, the prior is updated in
place to incorporate the new constraints, and the changed model is seen as a starting position
for a new Markov chain. Updating the model is done by state ablation: some topic-word
assignments are invalidated by setting z = −1, and the counts are decremented accordingly.
They explore several invalidation strategies: invalidate all assignments, only those of
documents containing any of the constrained terms, only those of the constrained terms
themselves, or none. After each human action, the Gibbs sampler runs for 30 more iterations
before asking for human feedback again. Experiments were done using Amazon Mechanical Turk.
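State ablation itself is simple to sketch. The data layout and the `invalidate` predicate below are illustrative, not from the paper:

```python
import numpy as np

def ablate(z, docs, n_wt, n_dt, invalidate):
    """Invalidate selected token-topic assignments (z = -1) and decrement
    the count matrices, so the next Gibbs pass resamples those tokens
    under the updated prior. `invalidate(d, i, w)` decides which
    assignments of document d, position i, word w to clear."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            if t != -1 and invalidate(d, i, w):
                n_wt[w, t] -= 1   # word-topic count
                n_dt[d, t] -= 1   # document-topic count
                z[d][i] = -1
    return z, n_wt, n_dt
```

The four strategies in the paper correspond to different `invalidate` predicates, from `lambda d, i, w: True` (ablate everything) down to `lambda d, i, w: False` (keep the state).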
SECTION 6
Misc topic modeling
Pauca04
V. Paul Pauca, Farial Shahnaz, Michael W. Berry, and Robert J. Plemmons. Text
mining using non-negative matrix factorizations. In SDM, 2004
Reference for successful use of NMF for topic modeling.
Lee99
Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative
matrix factorization. Nature, 401(6755), 1999
Reference for successful use of NMF for topic modeling.
Doyle09
Gabriel Doyle and Charles Elkan. Accounting for burstiness in topic models. In ICML,
2009
Elkan’s paper about burstiness.
Wallach09b
Hanna Wallach, David Mimno, and Andrew McCallum. Rethinking lda: Why priors
matter. In NIPS, 2009
Study the effect of different priors on LDA output.
Andr11
David Andrzejewski, Xiaojin Zhu, Mark Craven, and Ben Recht. A framework for
incorporating general domain knowledge into latent dirichlet allocation using first-
order logic. In IJCAI, 2011
Use discovered topics in a search engine, use query expansion (like we do in Squid).
Chang10
Chuang12
Also presents a new visualization of topic distributions based on a matrix of circles, and a
word ordering such that topics span contiguous words.
SECTION 7
Misc
Campbell66
Geman84
Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the
bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721–741,
November 1984
Introduced Gibbs sampling.
Bottou98
Léon Bottou. Online algorithms and stochastic approximations. In David Saad, editor,
Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK,
1998; revised October 2012
Convergence of online algorithms; gives the condition ∑_{t=0}^∞ ρ_t² < ∞ needed to prove the
convergence of Online Variational LDA [Hoffman10].
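The step-size schedule used in [Hoffman10], ρ_t = (τ0 + t)^(−κ) with κ ∈ (0.5, 1], satisfies this condition; a minimal sketch (default parameter values are illustrative):

```python
def rho(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t)^(-kappa). For kappa in (0.5, 1],
    the sum of rho_t^2 is finite (Bottou's condition) while the sum
    of rho_t diverges, so the iterates can still reach any optimum."""
    return (tau0 + t) ** (-kappa)
```

Larger κ decays the step size faster (more stable, slower to adapt); τ0 downweights the earliest noisy updates.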
Lee00
Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factoriza-
tion. In In NIPS, pages 556–562. MIT Press, 2000
Reference for NMF.
Bishop06
Tzikas08
Dimitris Tzikas, Aristidis Likas, and Nikolaos Galatsanos. The variational approxima-
tion for Bayesian inference. IEEE Signal Processing Magazine, 25(6):131–146, Novem-
ber 2008
A step-by-step tutorial on the EM algorithm, following closely the Bishop book [Bishop06]. They
describe MAP estimation as "poor man's Bayesian inference", as it is a way of including prior
knowledge without having to pay the expensive price of computing the normalizer.
Crump13
The incentive does not affect task performance, but does affect the rate at which workers
sign up for the task.