Information Retrieval
Fall 2019
Srihari-CSE535-Fall2019
Latent Semantic Analysis
• Latent semantic space: an illustrative example

Observations:
• The reconstruction does not exactly match the original term × document matrix (it gets closer and closer as k grows)
• Notice that some of the 0's in X move toward 1, and vice versa

Source: lsa.colorado.edu
Linear Algebra Background
Eigenvalues & Eigenvectors
• Eigenvectors (for a square m×m matrix S): $Sv = \lambda v$
• Example: if $\forall w \in \mathbb{R}^m,\; w^T S w \ge 0$, then $Sv = \lambda v \Rightarrow \lambda \ge 0$
Example

Let $S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (real, symmetric).

Then $S - \lambda I = \begin{pmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{pmatrix} \Rightarrow (2-\lambda)^2 - 1 = 0$.

And $S = U \Lambda U^{-1}$, its diagonal (eigen) decomposition.
Diagonal decomposition - example

Recall $S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$; $\lambda_1 = 1$, $\lambda_2 = 3$.

The eigenvectors $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ form $U = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$.

Inverting, we have $U^{-1} = \begin{pmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}$ (recall $UU^{-1} = I$).

Then $S = U \Lambda U^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}$.
Example continued

Let's divide U (and multiply $U^{-1}$) by $\sqrt{2}$. Then

$S = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} = Q \Lambda Q^T \quad (Q^{-1} = Q^T).$
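As a quick numeric check, here is a minimal numpy sketch (my addition, not part of the original slides) that reproduces this symmetric eigendecomposition:

import numpy as np

S = np.array([[2., 1.],
              [1., 2.]])

# eigh is the symmetric-matrix routine: eigenvalues in ascending order,
# eigenvectors returned as orthonormal columns of Q
lam, Q = np.linalg.eigh(S)
print(lam)                                     # [1. 3.]

# For symmetric S the eigenvector matrix is orthogonal: Q^{-1} = Q^T ...
print(np.allclose(Q.T @ Q, np.eye(2)))         # True
# ... and S = Q Lambda Q^T
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))  # True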
Exercise: examine the symmetric eigen-decomposition, if any, for each of the following matrices:

$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 \\ -2 & 3 \end{pmatrix}, \quad \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}$
Time out!

• "I came to this class to learn about text retrieval and mining, not to have my linear algebra past dredged up again …"
• But if you want to dredge, Strang's Applied Mathematics is a good place to start.
• What do these matrices have to do with text?
• Recall m×n term-document matrices …
• But everything so far needs square matrices, so …
Singular Value Decomposition

For an m×n matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) as follows:

$A = U \Sigma V^T$

where U is m×m, Σ is m×n (diagonal, holding the singular values), and V is n×n. (Figure: the shape of Σ differs for M > N and M < N.)
SVD example

Let $A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$. Thus m = 3, n = 2. Its SVD is

$U = \begin{pmatrix} 0 & 2/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & 1/\sqrt{6} & -1/\sqrt{3} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{3} \\ 0 & 0 \end{pmatrix}, \quad V^T = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$

Typically, the singular values are arranged in decreasing order.
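A minimal numpy sketch (mine, not from the slides) confirming this factorization; note numpy orders the singular values decreasingly, so it returns σ = (√3, 1) with correspondingly permuted singular vectors:

import numpy as np

A = np.array([[1., -1.],
              [0.,  1.],
              [1.,  0.]])

# Full SVD: U is 3x3, sigma holds the singular values, Vt is 2x2
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)
print(sigma)                            # [1.732... 1.]  i.e. (sqrt(3), 1)

# Rebuild the 3x2 Sigma and verify A = U Sigma V^T
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(sigma)
print(np.allclose(U @ Sigma @ Vt, A))   # True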
Low-rank Approximation

• SVD can be used to compute optimal low-rank approximations.
• Approximation problem: find the matrix $A_k$ of rank k minimising the error

$A_k = \arg\min_{X:\, \operatorname{rank}(X)=k} \| A - X \|_F$   (Frobenius norm)

• The solution, in column notation, is a sum of k rank-1 matrices:

$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$
Approximation error

• How good (bad) is this approximation?
• It's the best possible, measured by the Frobenius norm of the error:

$\min_{X:\, \operatorname{rank}(X)=k} \| A - X \|_F = \| A - A_k \|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$

(in the spectral 2-norm the error is exactly $\sigma_{k+1}$).
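A short numpy sketch (my own, using a random matrix as a stand-in input) that builds $A_k$ from the truncated SVD and checks both error formulas:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))   # stand-in for a term-document matrix
k = 2

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# A_k = sum_{i=1..k} sigma_i u_i v_i^T : the best rank-k approximation
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# Frobenius error equals sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
print(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(sigma[k:] ** 2)))
# Spectral-norm error equals sigma_{k+1}
print(np.linalg.norm(A - A_k, 2), sigma[k])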
Performing the maps

• Each row and column of A gets mapped into the k-dimensional LSI space, by the SVD.
• Claim: this is not only the mapping with the best (Frobenius error) approximation to A, but it in fact improves retrieval.
• A query q is also mapped into this space, by

$q_k = q^T U_k \Sigma_k^{-1}$
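A minimal sketch of the whole pipeline in numpy (the toy matrix, the query, and the cosine ranking are all my own illustration, not from the slides):

import numpy as np

# Toy 5-term x 4-document matrix (illustrative values only)
A = np.array([[1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.]])
k = 2

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k = U[:, :k], np.diag(sigma[:k])

# Documents in the k-dimensional LSI space: rows of (S_k^{-1} U_k^T A)^T
docs_k = (np.linalg.inv(S_k) @ U_k.T @ A).T

# Fold a query into the same space: q_k = q^T U_k S_k^{-1}
q = np.array([1., 0., 0., 0., 1.])   # query containing terms 1 and 5
q_k = q @ U_k @ np.linalg.inv(S_k)

# Rank documents by cosine similarity in the LSI space
scores = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12)
print(scores.argsort()[::-1])        # documents, best match first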
Intuition from block matrices

[Figure: an m (terms) × n (documents) matrix in block form: Block 1, Block 2, …, Block k of non-zero entries along the diagonal, 0's everywhere else.]

Vocabulary partitioned into k topics (clusters); each doc discusses only one topic.

Intuition from block matrices

Likely there's a good rank-k approximation to this matrix.

[Figure: the same block structure, now with a few nonzero entries outside the blocks. Block 1 covers terms such as wiper, tire, V6; the rows for car (1 0) and automobile (0 1) place these near-synonyms in different documents.]
Simplistic picture

[Figure: three clusters in the latent space, labelled Topic 1, Topic 2, and Topic 3.]
Some wild extrapolation

• The "dimensionality" of a corpus is the number of distinct topics represented in it.
• More mathematical wild extrapolation:
  • if A has a rank-k approximation of low Frobenius error, then there are no more than k distinct topics in the corpus.
LSI has many other applications

• In many settings in pattern recognition and retrieval, we have a feature-object matrix.
  • For text, the terms are features and the docs are objects.
  • Could be opinions and users …
• This matrix may be redundant in dimensionality.
• Can work with a low-rank approximation.
• If entries are missing (e.g., users' opinions), can recover them if the dimensionality is low.
• Powerful general analytical technique
  • Close, principled analog to clustering methods.
Applications of LSI/LSA

• Enterprise search
• Essay scoring
• Data/Text Mining
• Image Processing
  • eigenfaces
Probabilistic Latent Semantic Analysis

The PLSA log-likelihood of the observed term-document counts is

$L = \sum_{d=1}^{N} \sum_{t=1}^{T} X(t,d) \log \sum_{k=1}^{K} P(t \mid k)\, P(k \mid d),$

which is to be maximised w.r.t. the parameters P(t | k) and P(k | d), subject to the constraints $\sum_{t=1}^{T} P(t \mid k) = 1$ and $\sum_{k=1}^{K} P(k \mid d) = 1$.
The PLSA algorithm

• Inputs: term-by-document matrix X(t,d), t = 1:T, d = 1:N, and the number K of topics sought
• Initialise arrays P1 and P2 randomly with numbers between [0,1] and normalise them so that $\sum_{t} P1(t,k) = 1$ and $\sum_{k} P2(k,d) = 1$
• Iterate until convergence (for d = 1 to N, t = 1 to T, k = 1 to K):

$P1(t,k) \leftarrow P1(t,k) \sum_{d=1}^{N} \frac{X(t,d)}{\sum_{k'=1}^{K} P1(t,k')\, P2(k',d)}\, P2(k,d); \qquad P1(t,k) \leftarrow \frac{P1(t,k)}{\sum_{t'=1}^{T} P1(t',k)}$

$P2(k,d) \leftarrow P2(k,d) \sum_{t=1}^{T} \frac{X(t,d)}{\sum_{k'=1}^{K} P1(t,k')\, P2(k',d)}\, P1(t,k); \qquad P2(k,d) \leftarrow \frac{P2(k,d)}{\sum_{k'=1}^{K} P2(k',d)}$
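A vectorized numpy sketch of these updates (my own transcription of the slide's loops; the fixed iteration count and smoothing constant are assumptions, not part of the slide):

import numpy as np

def plsa(X, K, iters=200, seed=0, eps=1e-12):
    """PLSA via the multiplicative updates above.
    X : T x N term-document count matrix.
    Returns P1 (T x K) ~ P(t|k) and P2 (K x N) ~ P(k|d)."""
    T, N = X.shape
    rng = np.random.default_rng(seed)
    P1 = rng.random((T, K))
    P2 = rng.random((K, N))
    P1 /= P1.sum(axis=0, keepdims=True)   # enforce sum_t P(t|k) = 1
    P2 /= P2.sum(axis=0, keepdims=True)   # enforce sum_k P(k|d) = 1
    for _ in range(iters):                # fixed count stands in for
        R = X / (P1 @ P2 + eps)           #   a real convergence test
        P1 *= R @ P2.T                    # P1(t,k) <- P1(t,k) sum_d R(t,d) P2(k,d)
        P1 /= P1.sum(axis=0, keepdims=True)
        R = X / (P1 @ P2 + eps)
        P2 *= P1.T @ R                    # P2(k,d) <- P2(k,d) sum_t R(t,d) P1(t,k)
        P2 /= P2.sum(axis=0, keepdims=True)
    return P1, P2

# Example: 3 topics from a random 20-term x 10-document count matrix
X = np.random.default_rng(1).integers(0, 5, (20, 10)).astype(float)
P1, P2 = plsa(X, K=3)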
The BOW toolkit, for creating term-by-document matrices and other text processing and analysis utilities:
http://www.cs.cmu.edu/~mccallum/bow