
Topic Models: Introduction

Pawan Goyal

CSE, IIT Kharagpur

Week 9, Lecture 1



Why Topic Modeling?

Information Overload
As more information becomes available, it becomes more difficult to find and
discover what we need.

Main Tools: Search and Links
We type keywords into a search engine and find a set of related
documents
We look at these documents and possibly navigate to other
documents


Why Topic Modeling?

Search based on themes

Imagine searching and exploring documents based on themes that run
through them.
We might "zoom in" or "zoom out" to find specific or broader themes.
We might look at how themes change through time, and how they are
connected to each other.
Find the theme first and then examine the documents pertaining to that
theme.


Why Topic Modeling?

Topic Modeling
Provides methods for automatically organizing, understanding, searching and
summarizing large electronic archives without any prior annotation or labeling:
Discover the hidden themes that pervade the collection
Annotate the documents according to those themes
Use annotations to organize, summarize, and search the texts


Applications: Discover Topics from a corpus

[Figure: topics discovered automatically from a corpus]


Intuition: Documents exhibit multiple topics

[Figure: an example article with words highlighted by topic]

This article is about using data analysis to determine the number of genes an
organism needs to survive.
Highlighted words: 'blue': data analysis, 'pink': evolutionary biology, 'yellow':
genetics.
The article blends genetics, data analysis and evolutionary biology in different
proportions.
Knowing that this article blends those topics would help situate it in a collection
of scientific articles.
Topic Model: Basic Idea

A generative statistical model that captures this intuition.

Generative Model
Documents are mixtures of topics, where a topic is a probability distribution over
words:
the genetics topic has words about genetics with high probability, and the
evolutionary biology topic has words about evolutionary biology with high
probability.
Technically, the generative model assumes that the topics are generated first,
before the documents.


Generative Model for LDA

[Figure: the LDA generative process, illustrated step by step]


What does the statistical model reflect?

All the documents in the collection share the same set of topics, but each
document exhibits those topics in different proportions.
Each word in each document is drawn from one of the topics, where the
selected topic is chosen from the per-document distribution over topics.

In the example article, the distribution over topics would place probability on
genetics, data analysis and evolutionary biology, and each word is drawn from
one of those three topics.


Real Inference with LDA for the example article

[Figure: inferred topic proportions and per-word topic assignments for the example article]


Central Problem of LDA

The documents themselves are observed, while the topic structure - the
topics, per-document topic distributions, and the per-document per-word
topic assignments - is hidden.
The central computational problem is to use the observed documents to
infer the hidden topic structure, i.e., reversing the generative process.


Latent Dirichlet Allocation: Formulation

Pawan Goyal

CSE, IIT Kharagpur

Week 9, Lecture 2
Goal: The posterior distribution

Infer the hidden variables
Compute their distribution conditioned on the documents
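In symbols, with β denoting the topics, θ the per-document topic proportions, z the
per-word topic assignments, and w the observed words, the goal is the posterior
(a standard formulation, stated here for reference):

p(\beta, \theta, z \mid w) = \frac{p(\beta, \theta, z, w)}{p(w)}

The denominator, the marginal probability of the observed documents, cannot be
computed exactly for any realistic corpus, which is why the approximate inference
algorithms of Lecture 3 are needed.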
Topics from TASA corpus

37,000 text passages from educational materials (300 topics)

[Figure: example topics learned from the TASA corpus]
Topics from TASA corpus

Documents with different content can be generated by choosing different
distributions over topics:

Equal probability to the first two topics: about a person who has taken too
many drugs and how that affected color perception.
Equal probability to the last two topics: about a person who experienced
a loss of memory, which required a visit to the doctor.
Generative model and statistical inference

[Figure: the generative process maps topics to documents; statistical inference runs in the reverse direction]
Important points

Bag-of-words assumption: the generative process does not make any
assumptions about the order of words in the documents.
Capturing polysemy: the way the model is defined, there is no notion
of mutual exclusivity that restricts words to be part of one topic only. E.g.,
both the 'money' and 'river' topics can give high probability to the word 'bank'.
Graphical Model (Notation)

Nodes are random variables
Edges denote possible dependence
Observed variables are shaded
Plates denote replicated structure
Graphical Model (Notation)

Structure of the graph defines the pattern of conditional dependence
between the ensemble of random variables.
E.g., this graph corresponds to

p(y, x_1, \ldots, x_N) = p(y) \prod_{n=1}^{N} p(x_n \mid y)
LDA: Graphical Model

[Figure: the LDA graphical model in plate notation]

Each piece of the structure is a random variable.
Latent Dirichlet Allocation: Generative Model

[Figure: the LDA generative model]
What is Latent Dirichlet Allocation (LDA)?

'Latent' has the same sense in LDA as in latent semantic indexing, i.e.,
capturing topics as latent variables.
The distribution that is used to draw the per-document topic distributions
is called a Dirichlet distribution. The result is then used to allocate the words
of the documents to different topics.

Dirichlet Distribution
The Dirichlet distribution is an exponential family distribution over the simplex,
i.e., positive vectors that sum to one.
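For reference, the Dirichlet density itself (a standard result, added here for
completeness) over θ = (θ_1, ..., θ_T) with parameters α_1, ..., α_T > 0 is:

p(\theta \mid \alpha_1, \ldots, \alpha_T)
  = \frac{\Gamma\!\left(\sum_{j=1}^{T} \alpha_j\right)}{\prod_{j=1}^{T} \Gamma(\alpha_j)}
    \prod_{j=1}^{T} \theta_j^{\alpha_j - 1},
  \qquad \theta_j \ge 0, \quad \sum_{j=1}^{T} \theta_j = 1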
Dirichlet Distribution

[Figure: the Dirichlet density on the simplex for different values of α]

The α_i's are the hyper-parameters of the model:
α_j can be interpreted as a prior observation count for the number of times topic
j is sampled in a document.
These priors can be interpreted as forces on the topic distributions, with higher
α moving the topics away from the corners of the simplex.
When α < 1, there is a bias to pick topic distributions favoring just a few topics.
It is convenient to use a symmetric Dirichlet distribution with a single
hyper-parameter α_1 = α_2 = . . . = α.
Effect of α

[Figures: samples from a symmetric Dirichlet over three topics for α = 100, 10, 1, 0.1, 0.01 and 0.001]
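These plots can be reproduced with a few lines of NumPy and Matplotlib (a sketch;
the α values follow the slide titles, while the 3-topic simplex and the sample size
are choices made here):

import numpy as np
import matplotlib.pyplot as plt

# Draw samples from a symmetric 3-topic Dirichlet for several values of alpha.
# Each sample is a point on the 2-simplex: three positive coordinates summing to 1.
alphas = [100, 10, 1, 0.1, 0.01, 0.001]
fig, axes = plt.subplots(1, len(alphas), figsize=(18, 3))
for ax, alpha in zip(axes, alphas):
    samples = np.random.dirichlet([alpha] * 3, size=500)
    # Project the simplex onto 2D for plotting: vertex coordinates
    # (0,0), (1,0), (1/2, sqrt(3)/2).
    x = samples[:, 1] + samples[:, 2] / 2
    y = samples[:, 2] * np.sqrt(3) / 2
    ax.scatter(x, y, s=2)
    ax.set_title(f"alpha = {alpha}")
    ax.set_xticks([]); ax.set_yticks([])
plt.show()

Large α concentrates the samples near the center of the simplex; small α pushes
them towards its corners, i.e., towards distributions dominated by a few topics.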
Online Implementations

[Slide lists publicly available LDA implementations]
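As one example of a publicly available implementation, a minimal sketch using the
gensim library; the toy documents below are illustrative, not from the lecture:

from gensim import corpora, models

# Toy corpus: each document is a list of tokens (stop words already removed).
texts = [["river", "stream", "bank", "water"],
         ["money", "loan", "bank", "interest"],
         ["river", "bank", "loan", "money"]]

dictionary = corpora.Dictionary(texts)                  # maps words to integer ids
corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words vectors

# Fit a 2-topic LDA model; alpha and eta are the Dirichlet hyper-parameters.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      alpha="auto", passes=20, random_state=0)

for topic_id, words in lda.print_topics():
    print(topic_id, words)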
Latent Dirichlet Allocation: Statistical Inference

[Figure: statistical inference for LDA]
Gibbs Sampling for LDA, Applications

Pawan Goyal

CSE, IIT Kharagpur

Week 9, Lecture 3
Approximating the posterior

Algorithms to approximate it fall into two categories:

Sampling-based Algorithms
Collect samples from the posterior to approximate it with an empirical
distribution.

Variational Methods
A deterministic alternative to sampling-based algorithms:
the inference problem is transformed into an optimization problem.
Gibbs Sampling

A form of Markov chain Monte Carlo (MCMC), which simulates a
high-dimensional distribution by sampling on lower-dimensional subsets of
variables, where each subset is conditioned on the values of all the others.
Sampling is done sequentially and proceeds until the sampled values
approximate the target distribution.
It directly estimates the posterior distribution over z, and uses this to
provide estimates for β and θ.
Gibbs Sampling

Suppose we have a word token i for which we want to find the topic
assignment probability: p(z_i = j).
Represent the collection of documents by a set of word indices w_i and
document indices d_i for each token i.
Gibbs sampling considers each word token in turn and estimates the
probability of assigning the current word token to each topic, conditioned
on the topic assignments to all other word tokens.
From this conditional distribution, a topic is sampled and stored as the
new topic assignment for this word token.
This conditional is written as P(z_i = j | z_−i, w_i, d_i, ·).
Gibbs Sampling

Let us define two count matrices, C^{WT} and C^{DT}, of dimensions W × T and
D × T respectively:
C^{WT}_{wj} contains the number of times word w is assigned to topic j, not
including the current instance.
C^{DT}_{dj} contains the number of times topic j is assigned to some word
token in document d, not including the current instance.

P(z_i = j \mid z_{-i}, w_i, d_i, \cdot) \propto
  \frac{C^{WT}_{w_i j} + \eta}{\sum_{w=1}^{W} C^{WT}_{wj} + W\eta}
  \times
  \frac{C^{DT}_{d_i j} + \alpha}{\sum_{t=1}^{T} C^{DT}_{d_i t} + T\alpha}

The left factor is the probability of word w_i under topic j (how likely a word is
for a topic), whereas the right factor is the probability of topic j under the
current topic distribution for document d_i (how dominant a topic is in a document).
Algorithm

Start: each word token is assigned to a random topic in [1 . . . T].
For each word token, a new topic is sampled as per P(z_i = j | z_−i, w_i, d_i, ·),
adjusting the matrices C^{WT} and C^{DT}.
A single pass through all word tokens in the documents is one Gibbs
sample.
After the burn-in period, samples are saved at regularly spaced
intervals, to prevent correlations between samples.
Estimating θ and β

\beta_i^{(j)} = \frac{C^{WT}_{ij} + \eta}{\sum_{k=1}^{W} C^{WT}_{kj} + W\eta}

\theta_j^{(d)} = \frac{C^{DT}_{dj} + \alpha}{\sum_{k=1}^{T} C^{DT}_{dk} + T\alpha}

These values correspond to the predictive distributions of
sampling a new token of word i from topic j, and
sampling a new token in document d from topic j.
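To make the procedure concrete, here is a minimal collapsed Gibbs sampler
implementing the conditional from the previous slide and the estimators above.
This is a sketch, not the lecture's code: it assumes documents are given as lists
of integer word ids in [0, W), and uses only NumPy.

import numpy as np

def gibbs_lda(docs, W, T, alpha=0.1, eta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on docs, a list of lists of word ids.
    Returns beta (W x T) and theta (D x T) estimated from the final sample."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    CWT = np.zeros((W, T))                      # word-topic counts
    CDT = np.zeros((D, T))                      # document-topic counts
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        zd = rng.integers(T, size=len(doc))
        z.append(zd)
        for w, j in zip(doc, zd):
            CWT[w, j] += 1
            CDT[d, j] += 1
    for _ in range(n_iter):                     # one full pass = one Gibbs sample
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                j = z[d][n]
                CWT[w, j] -= 1                  # exclude the current instance
                CDT[d, j] -= 1
                # Word-under-topic factor times topic-in-document factor; the
                # second factor's denominator (N_d - 1 + T*alpha) does not
                # depend on j, so it is absorbed by the normalisation below.
                p = (CWT[w, :] + eta) / (CWT.sum(axis=0) + W * eta) * (CDT[d, :] + alpha)
                p /= p.sum()
                j = rng.choice(T, p=p)          # sample a new topic assignment
                z[d][n] = j
                CWT[w, j] += 1
                CDT[d, j] += 1
    beta = (CWT + eta) / (CWT.sum(axis=0) + W * eta)
    theta = (CDT + alpha) / (CDT.sum(axis=1, keepdims=True) + T * alpha)
    return beta, theta

Run on synthetic data like the MONEY/LOAN/BANK example on the next slides, a
sampler of this form should recover β estimates close to the generating values.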
An Example

The algorithm can be illustrated by generating artificial data from a known topic
model and applying the algorithm to check whether it is able to infer the
original generative structure.

Example
Let topic 1 give equal probability to MONEY, LOAN, BANK and topic 2
give equal probability to the words RIVER, STREAM, and BANK:

β_MONEY(1) = β_LOAN(1) = β_BANK(1) = 1/3
β_RIVER(2) = β_STREAM(2) = β_BANK(2) = 1/3

We generate 16 documents by arbitrarily mixing the two topics.
Initial Structure

[Figure: the 16 generated documents with their initial topic assignments]

Colors reflect the initial random assignment: black = topic 1, white = topic 2.
After 64 iterations of Gibbs Sampling

[Figure: topic assignments after 64 iterations]

β_MONEY(1) = 0.32, β_LOAN(1) = 0.29, β_BANK(1) = 0.39
β_RIVER(2) = 0.25, β_STREAM(2) = 0.4, β_BANK(2) = 0.35
Computing Similarities

Document Similarity
Similarity between documents d_1 and d_2 can be measured by the similarity
between their topic distributions θ(d_1) and θ(d_2).

KL divergence: D(p, q) = \sum_{j=1}^{T} p_j \log_2 \frac{p_j}{q_j}

Symmetrized KL divergence: \frac{1}{2}[D(p, q) + D(q, p)] seems to work well.

Similarity with respect to a query
Maximize the conditional probability of the query given the document:

p(q \mid d_i) = \prod_{w_k \in q} p(w_k \mid d_i)
             = \prod_{w_k \in q} \sum_{j=1}^{T} P(w_k \mid z = j)\, P(z = j \mid d_i)
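A sketch of the symmetrized KL similarity between two per-document topic
distributions (θ vectors such as those returned by the sampler above; the
smoothing constant eps is an implementation detail added here to avoid log(0)):

import numpy as np

def symmetrized_kl(p, q, eps=1e-12):
    """1/2 [D(p||q) + D(q||p)] with base-2 logs; smaller means more similar."""
    p = np.asarray(p) + eps   # guard against zero-probability topics
    q = np.asarray(q) + eps
    d_pq = np.sum(p * np.log2(p / q))
    d_qp = np.sum(q * np.log2(q / p))
    return 0.5 * (d_pq + d_qp)

# Example: two documents dominated by different topics are far apart.
print(symmetrized_kl([0.9, 0.05, 0.05], [0.1, 0.1, 0.8]))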
Computing Similarities

Similarity between two words
Having observed a single word in a new context, what are the other words that
might appear in the same context, based on the topic interpretation of the
observed word?

p(w_2 \mid w_1) = \sum_{j=1}^{T} p(w_2 \mid z = j)\, p(z = j \mid w_1)
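The word-to-word computation is a one-liner given β. In this sketch, p(z = j | w_1)
is obtained from β by Bayes' rule under a uniform topic prior, an assumption made
here for illustration:

import numpy as np

def word_similarity(beta, w1):
    """p(w2 | w1) for all words w2, marginalising over topics.
    beta is a W x T matrix with beta[w, j] = p(w | z = j)."""
    p_z = beta[w1, :] / beta[w1, :].sum()   # p(z = j | w1): Bayes' rule, uniform p(z)
    return beta @ p_z                       # sum_j p(w2 | z = j) p(z = j | w1)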
Example

Observed and predicted responses for the word 'PLAY'

[Figure: observed word associations for 'PLAY' compared with the model's predictions]
LDA Variants and Applications - I

Pawan Goyal

CSE, IIT Kharagpur

Week 9, Lecture 4
Example Problem
Suppose you are using Gibbs sampling to estimate the distributions θ and β
for topic models. The underlying corpus has 5 documents and 5 words, {River,
Stream, Bank, Money, Loan}, and the number of topics is 2. At a certain point,
the structure of the documents looks like the following table. For instance, the
first row indicates that document 1 contains 4 instances of the word 'Bank', 6
instances of the word 'Money' and 6 instances of the word 'Loan'. Black and white
circles denote whether a word instance is currently assigned to topic t1 or t2
respectively.

Use this structure to estimate β_MONEY(2) and β_BANK(1) at this point. You can
take the values of η and α to be 0.1 each.

[Table: per-document word counts, with black (t1) and white (t2) circles marking the current topic assignment of each word instance]
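The requested estimates plug the table's counts into the β estimator from
Lecture 3, with W = 5 and η = 0.1 (the counts C below are read off the table):

\beta_{\mathrm{MONEY}}(2) = \frac{C^{WT}_{\mathrm{MONEY},2} + 0.1}{\sum_{w=1}^{5} C^{WT}_{w,2} + 5 \times 0.1},
\qquad
\beta_{\mathrm{BANK}}(1) = \frac{C^{WT}_{\mathrm{BANK},1} + 0.1}{\sum_{w=1}^{5} C^{WT}_{w,1} + 5 \times 0.1}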
Modeling Science

Data
The OCR'ed collection of Science from 1990-2000:
17K documents
11M words
20K unique terms (stop words and rare words removed)

Model
A 100-topic model fit using variational inference
Example Inference

[Figure: inferred topic proportions for an example Science article]
Example Topics

[Figure: top words from several of the learned topics]
Modeling Richer Assumptions in Topic Models

Correlated topic models
Dynamic topic models
Measuring scholarly impact
Correlated Topic Models

The Dirichlet is an exponential family distribution on the simplex, i.e.,
positive vectors that sum to one.
However, the near independence of its components makes it a poor choice
for modeling topic proportions:
an article about fossil fuels is more likely to also be about geology than
about genetics.

Using the logistic normal distribution
A multivariate normal distribution of a k-dimensional vector x = [X_1, X_2, ..., X_k]
can be written as

x ∼ N_k(µ, Σ)

with k-dimensional mean vector µ and k × k covariance matrix Σ.
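A sketch of drawing topic proportions from a logistic normal; the softmax step maps
the Gaussian draw onto the simplex, and the positive off-diagonal entry in Σ (all
values here are illustrative) is what lets two topics co-occur more often than
independence would allow:

import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(3)
# Positive covariance between topics 1 and 2 encodes that they tend to co-occur,
# which the Dirichlet cannot express.
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
eta = rng.multivariate_normal(mu, Sigma)
theta = np.exp(eta) / np.exp(eta).sum()   # softmax: a point on the simplex
print(theta)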
Correlated Topic Model (CTM)

[Figure: the CTM graphical model]
CTM supports more topics and provides a better fit than LDA

[Figure: held-out log probability on Science as a function of the number of topics]
Correlated Topics

[Figure: correlations between the learned topics]
Dynamic Topic Models

LDA assumption
LDA assumes that the order of the documents does not matter:
not appropriate for corpora that span hundreds of years;
we might want to track how language changes over time.
Dynamic Topic Models

[Figure: the dynamic topic model chains topics across time slices]

β_{k,t} | β_{k,t−1} ∼ N(β_{k,t−1}, σ²I)
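A sketch of this drift for a single topic: following the equation above, the natural
parameters take a Gaussian step per time slice, and a softmax (consistent with the
model's logistic-normal construction, an assumption of this sketch) turns each β_t
into a word distribution:

import numpy as np

rng = np.random.default_rng(0)
V, T_slices, sigma = 1000, 10, 0.05   # vocabulary size, time slices, drift scale
beta = rng.normal(size=V)             # natural parameters of the topic at t = 0
for t in range(1, T_slices):
    beta = beta + sigma * rng.normal(size=V)       # beta_t ~ N(beta_{t-1}, sigma^2 I)
    word_dist = np.exp(beta) / np.exp(beta).sum()  # word distribution at time t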
Dynamic Topic Models

[Figures: example topics evolving over time]
Measuring Scholarly Impact

How to model influence?

The idea, from dynamic topic models: influential articles reflect future
changes in language use.
The influence of an article is a latent variable.
Influential articles affect the drift of the topics that they discuss.
The posterior gives a retrospective estimate of influential articles.
Measuring Scholarly Impact

[Figures: the document influence model and example influence estimates]
Supervised settings of LDA

Use data points paired with response variables:
user reviews paired with a number of stars,
web pages paired with a number of likes,
documents paired with links to other documents,
images paired with a category.

Supervised topic models
are topic models of documents and responses, fit to find topics predictive of
the response.
Supervised LDA

[Figure: the supervised LDA graphical model]
Supervised LDA: why is a different model required?

Think of an alternative approach using the original settings of LDA:
formulate a model in which the response variable y is regressed on the topic
proportions θ.

Why then a different model?
The response variable can be treated as an important observation to infer
the topic probabilities in a supervised manner.
Prediction

[Figure: predicting the response variable for a new document]
Example: Movie Reviews

[Figure: example topics fit to movie reviews]
Held out correlation

[Figure: held-out correlation between predicted and actual responses]
Supervised Topic Models

[Figure: the supervised topic model]
LDA Variants and Applications - II

Pawan Goyal

CSE, IIT Kharagpur

Week 9, Lecture 5
Relational Topic Models

[Figure: a network of linked documents]

Connected Observations
Citation networks of documents
Hyperlinked networks of web pages
Friend-connected social network profiles
Relational Topic Models

LDA needs to be adapted to a model of both content and connection.
RTMs find hidden structure in both types of data.
Relational Topic Models

[Figure: documents and the links between them]

RTM works in a supervised framework, allowing predictions about new and unlinked
data.
Link Prediction Task using RTM

Given a new document, which documents is it likely to link to?
RTM allows for such predictions:
links given the words of a new document,
words given the links of a new document.
Relational Topic Models

[Figure: the RTM graphical model]

The formulation ensures that the same latent topic assignments used to generate
the content of the documents also generate their link structure.
RTM: Generative Model

[Figure: the RTM generative process]
Link Probability Function (ψ)

The link probability is dependent on the topic assignments that generated the
documents' words, z_d and z_{d'}:

\psi_e(y = 1) = \exp\!\left(\eta^{T} (\bar{z}_d \circ \bar{z}_{d'}) + \nu\right),
\qquad \bar{z}_d = \frac{1}{N_d} \sum_n z_{d,n}

The ∘ notation denotes the Hadamard (element-wise) product. The exponential
link function is used here; the authors also tried a sigmoid (ψ_σ).
The link function models each per-pair binary variable as a logistic regression
parameterized by η (1 × K) and intercept ν (in the sigmoid case).
Covariates are constructed by the Hadamard product of z̄_d and z̄_{d'},
capturing similarity between the hidden topic representations of the two
documents.
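A sketch of the exponential link function; η, ν and the averaged topic assignments
below are illustrative values, not fitted parameters:

import numpy as np

def link_probability(zbar_d, zbar_dp, eta, nu):
    """psi_e(y = 1) = exp(eta^T (zbar_d ∘ zbar_d') + nu)."""
    return np.exp(eta @ (zbar_d * zbar_dp) + nu)

# Two documents with similar topic proportions get a higher link probability.
eta, nu = np.array([3.0, 3.0, 3.0]), -2.0
print(link_probability(np.array([0.8, 0.1, 0.1]), np.array([0.7, 0.2, 0.1]), eta, nu))
print(link_probability(np.array([0.8, 0.1, 0.1]), np.array([0.1, 0.1, 0.8]), eta, nu))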
Inference: How many links to model?

One could fix y_{d1,d2} = 1 whenever a link is observed between d1 and d2, and
set y_{d1,d2} = 0 otherwise.
The problem with that approach is that the absence of a link cannot be
construed as evidence for y_{d1,d2} = 0:
in large social networks like Facebook, the absence of a link between two
people does not necessarily mean that they are not friends.
So, in these cases, the links are treated as unobserved variables.
This also provides a significant computational advantage.
Predicting links from documents

[Figures: example link predictions for new documents]
Nonparametric Models

In LDA, we need to specify the number of latent clusters (topics) in
advance.
In many data analysis settings, however, we do not know this number and
would like to learn it from the data.
Bayesian nonparametric (BNP) models assume that there is an infinite
number of latent clusters, but only a finite number of them is used to
generate the observed data.
The posterior provides a distribution over the number of clusters, the
assignment of data to clusters, and the parameters associated with each
cluster.
Chinese Restaurant Process: the prior over groupings

Imagine a restaurant with an infinite number of tables, and a sequence of
customers entering the restaurant and sitting down.
The first customer enters and sits at the first table.
The second customer enters and sits at the first table with probability 1/(1 + α)
and at the second table with probability α/(1 + α).
When the nth customer arrives, she sits at any of the occupied tables with
probability proportional to the number of previous customers sitting there,
and at the next unoccupied table with probability proportional to α.
A large value of α will produce more occupied tables (and fewer
customers per table).
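A simulation of the seating process makes the role of α visible (a minimal sketch;
tables are labelled from 0):

import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Return the table assignment of each customer under a CRP(alpha) prior."""
    rng = np.random.default_rng(seed)
    tables = []          # tables[k] = number of customers at table k
    seating = []
    for n in range(n_customers):
        # Occupied table k with probability tables[k]/(n + alpha);
        # a new table with probability alpha/(n + alpha).
        probs = np.array(tables + [alpha]) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(0)   # open a new table
        tables[k] += 1
        seating.append(k)
    return seating

# Larger alpha yields more occupied tables.
print(len(set(chinese_restaurant_process(100, 0.5))))
print(len(set(chinese_restaurant_process(100, 5.0))))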
How to model a corpus

The CRP helps in obtaining a random partition, as the sequence of customers
sitting at tables in a restaurant.
Tables can be thought of as 'topics' and customers as 'words' in the
restaurant (document).
The CRP, however, does not model the entire corpus.
For that, we extend the CRP to a set of restaurants, one for each document.
This model is known as the Chinese Restaurant Franchise (CRF).
Chinese Restaurant Franchise (CRF)

Customers in the jth restaurant sit at tables in the same manner as in the
CRP, and this is done independently in the restaurants.
But then, how do we achieve coupling among the restaurants?
The coupling is achieved by a franchise-wide menu:
the first customer to sit at a table in a restaurant chooses a dish from the
menu, and all subsequent customers who sit at that table inherit that dish;
dishes are chosen with probability proportional to the number of tables
(franchise-wide) which have previously served that dish.
Chinese Restaurant Franchise (CRF)

[Figure: the CRF representation]
Each restaurant is represented by a rectangle.
Customers (θ_ji) are seated at tables (circles).
At each table, a dish is served from a global menu (φ_k).
ψ_jt is a table-specific indicator that serves to index items on the global menu.
