
Topic Modelling Using Non-Negative Matrix Factorization

Anjusha C
MA18M008

Department of Mathematics
IIT Madras

15/05/2020
What is topic modelling?

- Topic modelling is a method for finding the topics in a collection of documents that best represent the information in the collection.
- Topics are ’repeating patterns of co-occurring terms in a corpus’.
How to mathematically quantify them?
- Consider two documents:
  - Doc A: on ’Biology’
  - Doc B: on ’Linear Algebra’
  Consider the words ’decompose’, ’matrix’, ’mitochondria’ and ’eigenvector’. The word ’decompose’ can occur in both documents, but its relative frequency of occurrence will be higher in Doc A than in Doc B. Similarly, ’matrix’ might occur in both documents, but its relative frequency will be higher in Doc B than in Doc A.
- Consider the set of words occurring in the documents, (w_1, w_2, ..., w_n). Each document can be represented as a vector whose i-th entry is the frequency of the word w_i:
  - Doc A = (ν_{A1}, ν_{A2}, ..., ν_{An})
  - Doc B = (ν_{B1}, ν_{B2}, ..., ν_{Bn})
- These two vectors are the two topics, one corresponding to Biology and the other to Linear Algebra.
Multiple topics in the same document

Each document is a linear combination of the topic vectors (basis vectors).
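
As a worked illustration (the mixing weights 0.7 and 0.3 below are made up), a document touching on both subjects decomposes as

\[ d = 0.7\,(\nu_{A1}, \dots, \nu_{An}) + 0.3\,(\nu_{B1}, \dots, \nu_{Bn}). \]

Stacking all documents as columns of a matrix V, the topic vectors as columns of W, and the mixing weights as columns of H gives the matrix form V ≈ WH used below.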
Term-document matrix and term-weighting

[Figure: a term-document matrix. Source: Introduction to Information Retrieval]

- Stop word removal / Tf-Idf scoring
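
A minimal sketch of this weighting step in scikit-learn (the two toy documents are made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

# Two toy documents echoing the Doc A / Doc B example above.
docs = ["mitochondria decompose organic matter in the cell",
        "decompose the matrix into eigenvector and eigenvalue parts"]

# stop_words='english' removes stop words; the remaining terms get Tf-Idf scores.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)         # shape: (n_documents, n_terms)
print(vectorizer.get_feature_names_out())  # the vocabulary (w_1, ..., w_n)
print(X.toarray())                         # Tf-Idf weights per document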
Linear Dimensionality Reduction

AIM: extract these topic vectors from the term-document matrix.

[Figure. Source: http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf]
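
The classical instance of this reduction is a truncated SVD (latent semantic analysis). A minimal sketch, assuming X is the Tf-Idf matrix from the previous step:

from sklearn.decomposition import TruncatedSVD

# Project the vectorized documents onto k latent topics.
# Note: the resulting factors can contain negative entries,
# which is exactly what NMF (next slide) rules out.
k = 2
svd = TruncatedSVD(n_components=k, random_state=0)
doc_topic = svd.fit_transform(X)   # document-topic weights
topic_term = svd.components_       # topic vectors over the vocabulary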
NMF

- Previous approach: there can be negative entries in W and H.
- NMF assumption: non-negative entries, i.e. a constrained optimization problem.
- AIM: find W and H such that V ≈ WH, W ≥ 0, H ≥ 0, given the factorization rank k (i.e., the number of topics).
Cost functions

How to check the goodness of the approximation V ≈ WH?

- Frobenius norm:
  \[ \|V - WH\|^2 = \sum_{ij} \bigl(V_{ij} - (WH)_{ij}\bigr)^2 \]
- K-L divergence:
  \[ D(V \,\|\, WH) = \sum_{ij} \Bigl( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Bigr) \]
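
Both costs are easy to evaluate directly; a short numpy sketch (the function names are illustrative):

import numpy as np

def frobenius_cost(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    return np.sum((V - W @ H) ** 2)

def kl_cost(V, W, H, eps=1e-10):
    """Generalized K-L divergence D(V || WH); eps avoids log(0) and 0/0."""
    WH = W @ H
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)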
Method

- Gradient descent
- Naive GD: no guarantee of convergence.
- The problem is convex in W alone or in H alone, but not in both simultaneously.
- So we optimize W keeping H fixed, then optimize H keeping W fixed, and keep alternating until convergence (the Alternating Least Squares approach); a sketch follows this list.
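
A minimal sketch of the alternating scheme, assuming a dense matrix V and using scipy's nonnegative least squares solver column by column (simple but slow, purely for illustration):

import numpy as np
from scipy.optimize import nnls

def als_nmf(V, k, n_iter=50, seed=0):
    """Alternating nonnegative least squares for V ≈ WH, with V dense and >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    for _ in range(n_iter):
        # Fix W, solve min ||V - WH|| over H >= 0, one column of V at a time.
        H = np.column_stack([nnls(W, V[:, j])[0] for j in range(n)])
        # Fix H, solve min ||V^T - H^T W^T|| over W >= 0, one row of V at a time.
        W = np.vstack([nnls(H.T, V[i, :])[0] for i in range(m)])
    return W, H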
Lee-Seung multiplicative updates
- Put forward by Daniel D. Lee and H. Sebastian Seung
  (Link: Lee-Seung paper)
- These gradient-descent-style updates are guaranteed never to increase the cost, so the objective converges.
- Frobenius norm updates:
  \[ H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}, \qquad
     W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}} \]
- K-L divergence updates:
  \[ H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu}/(WH)_{i\mu}}{\sum_k W_{ka}}, \qquad
     W_{ia} \leftarrow W_{ia} \frac{\sum_\mu H_{a\mu} V_{i\mu}/(WH)_{i\mu}}{\sum_\nu H_{a\nu}} \]
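
The Frobenius-norm updates translate directly into numpy; a sketch (the eps guard against division by zero is an addition, not part of the original updates):

import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for the cost ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, H fixed
    return W, H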
Implementation
Dataset: the ’20 newsgroups’ dataset is used. Newsgroups are discussion groups on Usenet, which was popular in the 1980s and 1990s before the web really took off. This dataset includes 18,000 newsgroup posts covering 20 topics.

Only 4 topics are chosen: ’alt.atheism’, ’talk.religion.misc’, ’comp.graphics’ and ’sci.space’. The dataset is vectorised (i.e., the document-term matrix is constructed); a sketch follows.
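
A sketch of the loading and vectorization step in scikit-learn (stripping headers, footers and quotes via the remove option is an assumed preprocessing choice):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
newsgroups = fetch_20newsgroups(subset='train', categories=categories,
                                remove=('headers', 'footers', 'quotes'))

# Build the document-term matrix V with Tf-Idf weighting.
vectorizer = TfidfVectorizer(stop_words='english')
V = vectorizer.fit_transform(newsgroups.data)
vocab = vectorizer.get_feature_names_out()
print(V.shape)   # (n_documents, n_terms)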
NMF in Scikit-learn

The ’show topics’ function picks up the top 8 words of each topic (the words with the highest weights in the topic vector). The highest-weighted words are the ones that are specific to the topic. The rows of the output are the topics.
The time taken is approximately 5.92 s.
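
A sketch of this run, with a possible form of the ’show topics’ helper (the helper shown here is an assumption modelled on the description above, not the original code):

import numpy as np
from sklearn.decomposition import NMF

k = 4  # number of topics
model = NMF(n_components=k, random_state=0)
W = model.fit_transform(V)   # document-topic weights
H = model.components_        # topic-term matrix: rows are the topics

def show_topics(H, vocab, n_words=8):
    """Return the n_words highest-weighted words of each topic vector."""
    return [' '.join(vocab[np.argsort(row)[::-1][:n_words]]) for row in H]

for topic in show_topics(H, vocab):
    print(topic)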
Lee-Seung update: Frobenius norm
The time taken is 21.27 s.
[Error plot for the Frobenius norm]
Lee-Seung update: K-L divergence
The time taken is 54.04 s.
[Error plot for the K-L divergence]
References

- Daniel D. Lee and H. Sebastian Seung, Algorithms for Non-negative Matrix Factorization, Advances in Neural Information Processing Systems, 2001
- Nicolas Gillis, The Why and How of Non-negative Matrix Factorization, arXiv:1401.5226v2, 2014
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2009
- NMF tutorial by Rachel Thomas
- http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf
