Topic Models: Introduction
Pawan Goyal
CSE, IIT Kharagpur
Week 9, Lecture 1
Information Overload
As more information becomes available, it becomes more difficult to find and discover what we need.
We need new tools to help us organize, search, and browse through them.
We might "zoom-in" or "zoom-out" to find specific or broader themes.
Find the theme first and then examine the documents pertaining to that theme.
Topic Modeling
Provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives without any prior annotation or labeling:
Discover the hidden themes that pervade the collection
Annotate the documents according to those themes
Use annotations to organize, summarize, and search the texts
This article is about using data analysis to determine the number of genes an organism needs to survive.
Pawan Goyal (IIT Kharagpur) Topic Models: Introduction Week 9, Lecture 1 6 / 11
Intuition: Documents exhibit multiple topics
The article blends genetics, data analysis, and evolutionary biology in different proportions.
Knowing that this article blends those topics would help situate it in a collection of scientific articles.
Topic Model: Basic Idea
Generative Model
Documents are mixtures of topics, where a topic is a probability distribution over words.
The genetics topic has words about genetics with high probability, and the evolutionary biology topic has words about evolutionary biology with high probability.
Technically, the generative model assumes that the topics are generated first, before the documents.
All the documents in the collection share the same set of topics, but each document exhibits those topics in different proportions.
Each word in each document is drawn from one of the topics, where the selected topic is chosen from the per-document distribution over topics.
In the example article, the distribution over topics would place probability on genetics, data analytics, and evolutionary biology, and each word is drawn from one of those three topics.
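The generative story above can be sketched in a few lines of Python. This is an illustrative simulation only; the vocabulary size, topic count, document count, and hyper-parameter values are made up for the example and are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: V words, T topics, D documents of doc_len tokens each
V, T, D, doc_len = 6, 2, 3, 20
alpha, eta = 0.5, 0.5                      # symmetric Dirichlet hyper-parameters

# 1. Topics are generated first: each topic is a distribution over words
beta = rng.dirichlet([eta] * V, size=T)    # shape (T, V), rows sum to 1

docs = []
for _ in range(D):
    # 2. Each document draws its own distribution over topics
    theta = rng.dirichlet([alpha] * T)
    words = []
    for _ in range(doc_len):
        z = rng.choice(T, p=theta)         # 3. choose a topic for this word
        w = rng.choice(V, p=beta[z])       # 4. draw the word from that topic
        words.append(w)
    docs.append(words)

print(len(docs), len(docs[0]))             # D documents of doc_len word ids
```

Reversing this process, i.e. recovering beta and the per-document theta from the words alone, is exactly the inference problem discussed next.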
Latent Dirichlet Allocation: Formulation
Pawan Goyal
CSE, IIT Kharagpur
Week 9, Lecture 2
Pawan Goyal (IIT Kharagpur) Latent Dirichlet Allocation: Formulation Week 9, Lecture 2 1 / 22
Central Problem of LDA
The documents themselves are observed, while the topic structure (the topics, the per-document topic distributions, and the per-document per-word topic assignments) is hidden.
The central computational problem is to use the observed documents to infer the hidden topic structure, i.e., reversing the generative process.
Goal: The posterior distribution
Infer the hidden variables
Compute their distribution conditioned on the documents
Topics from TASA corpus
Example documents and their distributions over topics.
Equal probability to the first two topics: about a person who has taken too many drugs and how that affected color perception.
Equal probability to the last two topics: about a person who experienced a loss of memory, which required a visit to the doctor.
Generative model and statistical inference
Important points
Bag-of-words assumption: The generative process does not make any assumptions about the order of words in the documents.
Capturing polysemy: The way the model is defined, there is no notion of mutual exclusivity that restricts words to be part of one topic only. For example, both the 'money' and 'river' topics can give high probability to the word 'bank'.
Graphical Model (Notation)
Nodes are random variables
Edges denote possible dependence
Observed variables are shaded
Plates denote replicated structure
Structure of the graph defines the pattern of conditional dependence between the ensemble of random variables; e.g., this graph encodes a particular factorization of the joint distribution.
Latent Dirichlet Allocation: Generative Model
What is Latent Dirichlet Allocation (LDA)?
'Latent' has the same sense in LDA as in latent semantic indexing, i.e., capturing topics as latent variables.
The distribution that is used to draw the per-document topic distributions is called a Dirichlet distribution. This result is used to allocate the words of the documents to different topics.
Dirichlet Distribution
The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one.
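For reference, the standard density of the Dirichlet distribution over a point θ = (θ_1, …, θ_T) of the simplex (the formula itself was on a slide image not preserved in this extraction) can be written as:

```latex
p(\theta \mid \alpha_1, \dots, \alpha_T)
  = \frac{\Gamma\!\left(\sum_{j=1}^{T} \alpha_j\right)}
         {\prod_{j=1}^{T} \Gamma(\alpha_j)}
    \prod_{j=1}^{T} \theta_j^{\,\alpha_j - 1}
```

The hyper-parameters α_j below are exactly the exponents in this density.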
The α_i's are the hyper-parameters of the model:
α_j can be interpreted as a prior observation count for the number of times topic j is sampled in a document.
These priors can be interpreted as forces on the topic distributions, with higher α moving the topics away from the corners of the simplex.
When α < 1, there is a bias to pick topic distributions favoring just a few topics.
It is convenient to use a symmetric Dirichlet distribution with a single hyper-parameter α_1 = α_2 = ... = α.
Effect of α
(Figures: samples from a symmetric Dirichlet shown for α = 1, 10, 100, 0.1, 0.01, and 0.001.)
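The effect shown in those figures can be reproduced numerically. The sketch below uses illustrative values of my own choosing (10 topics, 1000 draws per setting), not the lecture's figures: small α concentrates mass on a few topics, large α spreads it nearly evenly.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 10, 1000                    # illustrative: 10 topics, 1000 draws per alpha

means = {}
for alpha in (100, 1, 0.01):
    samples = rng.dirichlet([alpha] * T, size=n)   # each row sums to 1
    # Average size of the largest component: near 1/T for large alpha,
    # near 1 for tiny alpha (mass piled onto a single topic)
    means[alpha] = samples.max(axis=1).mean()
    print(alpha, round(means[alpha], 3))
```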
Online Implementations
Latent Dirichlet Allocation: Statistical Inference
Gibbs Sampling for LDA, Applications
Pawan Goyal
CSE, IIT Kharagpur
Week 9, Lecture 3
Pawan Goyal (IIT Kharagpur) Gibbs Sampling for LDA, Applications Week 9, Lecture 3 1 / 13
Approximating the posterior
Sampling-based Algorithms
Collect samples from the posterior to approximate it with an empirical distribution.
Variational Methods
A deterministic alternative to sampling-based algorithms.
The inference problem is transformed into an optimization problem.
Gibbs Sampling
Simulates a high-dimensional distribution by sampling on lower-dimensional subsets of variables, where each subset is conditioned on the values of all the others.
Sampling is done sequentially and proceeds until the sampled values approximate the target distribution.
It directly estimates the posterior distribution over z, and uses this to provide estimates for β and θ.
Suppose we have a word token i for which we want to find the topic assignment probability p(z_i = j).
Represent the collection of documents by a set of word indices w_i and document indices d_i for each token i.
Gibbs sampling considers each word token in turn and estimates the probability of assigning the current word token to each topic, conditioned on the topic assignments to all other word tokens.
From this conditional distribution, a topic is sampled and stored as the new topic assignment for this word token.
This conditional is written as P(z_i = j | z_{-i}, w_i, d_i, ·).
C_wj^WT contains the number of times word w is assigned to topic j, not including the current instance.
C_dj^DT contains the number of times topic j is assigned to some word token in document d, not including the current instance.
P(z_i = j | z_{-i}, w_i, d_i, ·) ∝ [(C_{w_i j}^WT + η) / (Σ_{w=1}^{W} C_{wj}^WT + Wη)] × [(C_{d_i j}^DT + α) / (Σ_{t=1}^{T} C_{d_i t}^DT + Tα)]
The left factor is the probability of word w_i under topic j (how likely a word is for a topic), whereas the right factor is the probability of topic j under the current topic distribution for document d_i (how dominant a topic is in a document).
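One sweep of this update can be sketched as follows. This is a minimal illustration, assuming dense NumPy count matrices C_WT (W×T) and C_DT (D×T) plus arrays w, d, z of token word ids, document ids, and current topic assignments; the variable and function names are mine, not the lecture's.

```python
import numpy as np

def gibbs_sweep(w, d, z, C_WT, C_DT, eta, alpha, rng):
    """One pass of collapsed Gibbs sampling over all word tokens."""
    W, T = C_WT.shape
    for i in range(len(w)):
        wi, di, old = w[i], d[i], z[i]
        # Remove this token's assignment ("not including the current instance")
        C_WT[wi, old] -= 1
        C_DT[di, old] -= 1
        # P(z_i = j | z_-i, w_i, d_i) up to a constant, for all topics j at once;
        # the denominator of the document factor is constant in j, so it is dropped
        p = ((C_WT[wi] + eta) / (C_WT.sum(axis=0) + W * eta)
             * (C_DT[di] + alpha))
        p /= p.sum()
        new = rng.choice(T, p=p)       # sample a topic and store it
        z[i] = new
        C_WT[wi, new] += 1
        C_DT[di, new] += 1
    return z
```

A real implementation would keep running column totals instead of recomputing C_WT.sum(axis=0) per token, but the update itself is the one in the formula above.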
Algorithm
For each word token, a new topic is sampled as per P(z_i = j | z_{-i}, w_i, d_i, ·), adjusting the matrices C^WT and C^DT.
A single pass through all word tokens in the documents is one Gibbs sample.
After the burn-in period, samples are saved at regularly spaced intervals, to prevent correlations between samples.
Estimating θ and β
β_i(j) = (C_ij^WT + η) / (Σ_{k=1}^{W} C_kj^WT + Wη)
θ_j(d) = (C_dj^DT + α) / (Σ_{k=1}^{T} C_dk^DT + Tα)
These values correspond to the predictive distributions of:
sampling a new token of word i from topic j, and
sampling a new token in document d from topic j.
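Given the count matrices, these point estimates are direct array operations. A sketch, assuming the same C_WT (W×T) and C_DT (D×T) count matrices as in the Gibbs sampling discussion (the function name is mine):

```python
import numpy as np

def estimate_beta_theta(C_WT, C_DT, eta, alpha):
    """Point estimates of beta (per-topic word dists) and theta (per-doc topic dists)."""
    W, T = C_WT.shape
    # beta[i, j]: probability of word i under topic j (each column sums to 1)
    beta = (C_WT + eta) / (C_WT.sum(axis=0, keepdims=True) + W * eta)
    # theta[d, j]: probability of topic j in document d (each row sums to 1)
    theta = (C_DT + alpha) / (C_DT.sum(axis=1, keepdims=True) + T * alpha)
    return beta, theta
```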
An Example
The algorithm can be illustrated by generating artificial data from a known topic model and applying the algorithm to check whether it is able to infer the original generative structure.
Example
Let topic 1 give equal probability to MONEY, LOAN, and BANK, and topic 2 give equal probability to RIVER, STREAM, and BANK:
βMONEY(1) = βLOAN(1) = βBANK(1) = 1/3
βRIVER(2) = βSTREAM(2) = βBANK(2) = 1/3
Initial Structure
Colors reflect the initial random assignment: black = topic 1, white = topic 2.
After 64 iterations of Gibbs Sampling
βMONEY(1) = 0.32, βLOAN(1) = 0.29, βBANK(1) = 0.39
βRIVER(2) = 0.25, βSTREAM(2) = 0.4, βBANK(2) = 0.35
Computing Similarities
Document Similarity
Similarity between documents d1 and d2 can be measured by the similarity between their topic distributions θ(d1) and θ(d2).
KL divergence: D(p, q) = Σ_{j=1}^{T} p_j log2(p_j / q_j)
The probability of a query q given document d_i can be scored as: P(q | d_i) = ∏_{w_k ∈ q} Σ_{j=1}^{T} P(w_k | z = j) P(z = j | d_i)
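The KL-based document similarity is a one-liner once the θ's are estimated. A small sketch with made-up topic distributions; note that KL is asymmetric (D(p, q) ≠ D(q, p)), so it is often symmetrised in practice:

```python
import numpy as np

def kl_divergence(p, q):
    """D(p, q) = sum_j p_j * log2(p_j / q_j); assumes q_j > 0 wherever p_j > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_j = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Hypothetical per-document topic distributions
theta_d1 = [0.7, 0.2, 0.1]
theta_d2 = [0.6, 0.3, 0.1]
sym = 0.5 * (kl_divergence(theta_d1, theta_d2) + kl_divergence(theta_d2, theta_d1))
print(round(sym, 4))
```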
Computing Similarities
Having observed a single word in a new context, what are the other words that might appear in the same context, based on the topic interpretation for the observed word?
p(w2 | w1) = Σ_{j=1}^{T} p(w2 | z = j) p(z = j | w1)
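This word-association score follows directly from β once we have a topic posterior for the observed word. A sketch on the money/river toy topics from the earlier example; here p(z = j | w1) is taken proportional to β_{w1}(j) under a uniform prior over topics, which is an assumption made for the illustration:

```python
import numpy as np

# Columns are topics: topic 1 = money/loan/bank, topic 2 = river/stream/bank
words = ["money", "loan", "bank", "river", "stream"]
beta = np.array([[1/3, 0.0],
                 [1/3, 0.0],
                 [1/3, 1/3],
                 [0.0, 1/3],
                 [0.0, 1/3]])

def p_w2_given_w1(w2, w1):
    i1, i2 = words.index(w1), words.index(w2)
    post = beta[i1] / beta[i1].sum()        # p(z = j | w1), uniform topic prior
    return float(np.sum(beta[i2] * post))   # sum_j p(w2 | z = j) p(z = j | w1)

print(p_w2_given_w1("loan", "money"))   # 'money' implies topic 1, so this is 1/3
```

'bank' is ambiguous between the two topics, so p(river | bank) comes out as the average 0.5 · 0 + 0.5 · 1/3 = 1/6.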
Example
LDA Variants and Applications - I
Pawan Goyal
CSE, IIT Kharagpur
Week 9, Lecture 4
Pawan Goyal (IIT Kharagpur) LDA Variants and Applications - I Week 9, Lecture 4 1 / 24
Example Problem
Suppose you are using Gibbs sampling to estimate the distributions θ and β for topic models. The underlying corpus has 5 documents and 5 words, {River, Stream, Bank, Money, Loan}, and the number of topics is 2. At a certain point, the structure of the documents looks like the following table. For instance, the first row indicates that document 1 contains 4 instances of the word 'Bank', 6 instances of the word 'Money', and 6 instances of the word 'Loan'. Black and white circles denote whether the word is currently assigned to topic t1 or t2, respectively.
Use this structure to estimate βMONEY(2) and βBANK(1) at this point. You can take the values of η and α to be 0.1 each.
Doc. Id  River  Stream  Bank  Money  Loan
1        • • •• •••• ••••
2        •••◦ •• ••
3        ◦ ◦◦◦ • • • • • • • • ••
4        ◦◦◦◦ ◦◦◦ •◦◦◦ •• •••
5        ◦◦ ◦◦◦◦ •◦ • • ••
Modeling Science
Data
The OCR'ed collection of Science from 1990-2000:
17K documents
11M words
20K unique terms (stop words and rare words removed)
Model
A 100-topic model using variational inference.
Example Inference
Example Topics
Modeling Richer Assumptions in Topic Models
Correlated topic models
Dynamic topic models
Measuring scholarly impact
Correlated Topic Models
An article about fossil fuels is more likely to also be about geology than about genetics.
Correlated Topic Model (CTM)
CTM supports more topics and provides a better fit than LDA
Correlated Topics
Dynamic Topic Models
LDA assumption
LDA assumes that the order of documents does not matter.
Not appropriate for corpora that span hundreds of years.
We might want to track how language changes over time.
Measuring Scholarly Impact
Idea from Dynamic Topic Models: influential articles reflect future changes in language use.
The influence of an article is a latent variable.
Influential articles affect the drift of the topics that they discuss.
The posterior gives a retrospective estimate of influential articles.
Supervised settings of LDA
Web pages paired with a number of likes
Documents paired with links to other documents
Images paired with a category
Supervised LDA
Supervised LDA: why is a different model required?
Formulate a model in which the response variable y is regressed on the topic proportions θ.
Prediction
Example: Movie Reviews
Held-out correlation
Supervised Topic Models
LDA Variants and Applications - II
Pawan Goyal
CSE, IIT Kharagpur
Week 9, Lecture 5
Pawan Goyal (IIT Kharagpur) LDA Variants and Applications - II Week 9, Lecture 5 1 / 16
Relational Topic Models
Connected Observations
Citation networks of documents
Hyperlinked networks of web pages
Friend-connected social network profiles
LDA needs to be adapted to a model of both content and connection.
RTMs find hidden structure in both types of data.
Link Prediction Task using RTM
RTM allows for such predictions:
links given the new words of a document
words given the links of a new document
The formulation ensures that the same latent topic assignments used to generate the content of the documents also generate their link structure.
RTM: Generative Model
Link Probability Function (ψ)
Dependent on the topic assignments that generated the documents' words, z_d and z_d′:
z̄_d = (1/N_d) Σ_n z_{d,n}
The ◦ notation denotes the Hadamard (element-wise) product. An exponential function is used; they also tried a sigmoid (ψ_σ).
The link function models each per-pair binary variable as a logistic regression parameterized by η (1 × K) and intercept ν (in the sigmoid case).
Covariates are constructed by the Hadamard product of z̄_d and z̄_d′, capturing the similarity between the hidden topic representations of the two documents.
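The sigmoid variant ψ_σ of the link function can be sketched as below. The function name, η, ν, and the topic assignments are all made-up values for illustration; z̄_d is the mean of the one-hot per-word topic assignments, as defined above.

```python
import numpy as np

def link_prob_sigmoid(z_bar_d1, z_bar_d2, eta, nu):
    """psi_sigma: logistic regression on the Hadamard product of z-bar vectors."""
    x = eta @ (z_bar_d1 * z_bar_d2) + nu   # element-wise product as covariates
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical example with K = 3 topics
z_d1 = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])  # one-hot topic assignments
z_bar_d1 = z_d1.mean(axis=0)                         # [2/3, 1/3, 0]
z_bar_d2 = np.array([0.5, 0.5, 0.0])
eta = np.array([2.0, 2.0, 2.0]); nu = -1.0
print(round(link_prob_sigmoid(z_bar_d1, z_bar_d2, eta, nu), 3))  # → 0.5
```

Documents whose topic proportions overlap heavily get a large Hadamard product, and hence a high link probability.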
Inference: How many links to model?
One can fix y_{d1,d2} = 1 whenever a link is observed between d1 and d2, and set y_{d1,d2} = 0 otherwise.
The problem with that approach is that the absence of a link cannot be construed as evidence for y_{d1,d2} = 0. In large social networks like Facebook, the absence of a link between two people doesn't necessarily mean that they are not friends.
So, in these cases, the links are treated as unobserved variables, which also provides a significant computational advantage.
Predicting links from documents
Nonparametric Models
In many data analysis settings, however, we do not know the number of latent clusters and would like to learn it from the data.
Bayesian Non-Parametric (BNP) models assume that there is an infinite number of latent clusters, but only a finite number of them is used to generate the observed data.
Chinese Restaurant Process: the prior over groupings
The first customer enters and sits at the first table.
The second customer enters and sits at the first table with probability 1/(1+α) and at the second table with probability α/(1+α).
When the nth customer arrives, she sits at any of the occupied tables with probability proportional to the number of previous customers sitting there, and at the next unoccupied table with probability proportional to α.
A large value of α will produce more occupied tables (and fewer customers per table).
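The seating process above can be simulated directly. This is a sketch with made-up parameters; the particular table counts depend on the random seed, so no specific seating outcome is claimed:

```python
import random

def chinese_restaurant_process(n_customers, alpha, rng):
    """Return a list of table sizes after seating n_customers by the CRP."""
    tables = []                          # tables[k] = customers at table k
    for _ in range(n_customers):
        total = sum(tables) + alpha
        r = rng.uniform(0, total)
        # Occupied tables chosen with probability proportional to their size...
        for k, size in enumerate(tables):
            if r < size:
                tables[k] += 1
                break
            r -= size
        else:
            # ...and a new table with probability proportional to alpha
            # (the first customer always starts table 1, since tables is empty)
            tables.append(1)
    return tables

print(chinese_restaurant_process(100, 1.0, random.Random(7)))
```

Re-running with larger α tends to produce more, smaller tables, matching the last bullet above.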
How to model a corpus
Tables can be thought of as 'topics' and customers as 'words' in the restaurant (document).
CRP, however, does not model the entire corpus.
For that, we extend CRP to a set of restaurants, one for each document.
This model is known as the Chinese Restaurant Franchise (CRF).
Chinese Restaurant Franchise (CRF)
But then, how do we achieve coupling among the restaurants?
The coupling is achieved by a franchise-wide menu.
The first customer to sit at a table in a restaurant chooses a dish from the menu, and all subsequent customers who sit at that table inherit that dish.
Dishes are chosen with probability proportional to the number of tables (franchise-wide) which have previously served that dish.
Each restaurant is represented by a rectangle.
Customers (θ_ji's) are seated at tables (circles).
At each table, a dish is served from the global menu (φ_k).
ψ_jt is a table-specific indicator that serves to index items on the global menu.