
INTRODUCTION TO MATRIX FACTORIZATION METHODS
COLLABORATIVE FILTERING - USER RATINGS PREDICTION

Alex Lin
Senior Architect
Intelligent Mining
Outline
- Factor analysis
- Matrix decomposition
- Matrix Factorization Model
- Minimizing Cost Function
- Common Implementation
Factor Analysis
- A procedure that can help identify the factors that might explain the interrelationships among the variables
- Model-based approach
Refresher: Matrix Decomposition

[Figure: a 5 x 6 rating matrix r factored into a 5 x 3 matrix q times a 3 x 6 matrix p. Row 3 of q is the factor vector (a, b, c); column 2 of p is the factor vector (x, y, z).]

X_{32} = (a, b, c) \cdot (x, y, z) = a*x + b*y + c*z

Rating prediction: \hat{r}_{ui} = q_i^T p_u
  q_i : movie (item) preference factor vector
  p_u : user preference factor vector
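A minimal numpy sketch of that prediction, assuming hypothetical 3-factor vectors (the values are illustrative only):

import numpy as np

# Hypothetical 3-factor vectors for one item and one user.
q_i = np.array([0.8, 1.2, -0.3])   # item factor vector (a, b, c)
p_u = np.array([1.1, 0.4, 0.9])    # user factor vector (x, y, z)

# Predicted rating: r_hat = q_i^T p_u = a*x + b*y + c*z
r_hat = q_i @ p_u                  # 0.88 + 0.48 - 0.27 = 1.09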


Making Prediction as Filling Missing Values

[Figure: a sparse items x users rating matrix (users 1..n across the top, items down the side). Only a few entries are known, e.g. 5, 4, 4, 3 in row 1 and 3, ?, 5, ?, 2 in row 3; the entries marked "?" are missing and must be predicted.]

Rating prediction: \hat{r}_{ui} = q_i^T p_u
  q_i : movie (item) preference factor vector
  p_u : user preference factor vector


Learn Factor Vectors

[Figure: the same sparse items x users rating matrix; each known rating yields one training equation.]

With 4 factors, each known rating becomes an equation in the unknown user factors U and item factors I, e.g.:

4 = U3-1 * I1-1 + U3-2 * I1-2 + U3-3 * I1-3 + U3-4 * I1-4
3 = U7-1 * I2-1 + U7-2 * I2-2 + U7-3 * I2-3 + U7-4 * I2-4
…
3 = U86-1 * I12-1 + U86-2 * I12-2 + U86-3 * I12-3 + U86-4 * I12-4

Note: train only on the known entries.

Compare a system of linear equations:

2X + 3Y = 5        2X + 3Y = 5
4X - 2Y = 2        4X - 2Y = 2
                   3X - 2Y = 2

The left system has an exact solution; the right one is overdetermined, so we solve it in the least-squares sense - just as we do for the factor vectors.
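As a concrete illustration (not from the slides), the overdetermined system on the right can be solved with numpy; np.linalg.lstsq minimizes the same kind of squared error that the factorization minimizes on the known ratings:

import numpy as np

# The overdetermined system on the right: 3 equations, 2 unknowns.
A = np.array([[2.0,  3.0],
              [4.0, -2.0],
              [3.0, -2.0]])
b = np.array([5.0, 2.0, 2.0])

# No exact solution exists, so lstsq minimizes ||Ax - b||^2.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)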
Why not use standard SVD?
- Standard SVD assumes all missing entries are zero. This leads to poor prediction accuracy, especially when the dataset is extremely sparse (98% - 99.9% missing).
- See the Appendix for SVD.
- Some published literature refers to Matrix Factorization as "SVD", but note that it is NOT the same as the classical low-rank SVD produced by svdlibc.
How to Learn Factor Vectors
- How do we learn the preference factor vectors (a, b, c) and (x, y, z)?
- Minimize the errors on the known ratings - a least-squares problem:

\min_{q^*, p^*} \sum_{(u,i) \in K} (r_{ui} - x_{ui})^2

To learn the factor vectors p_u and q_i by minimizing the cost function.

  r_ui : actual rating of user u on item i
  x_ui : predicted rating of user u on item i
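A minimal Python sketch of this cost, assuming hypothetical factor matrices P (one row per user) and Q (one row per item):

import numpy as np

def cost(known_ratings, P, Q):
    """Sum of squared errors over the known entries (u, i, r_ui) only."""
    total = 0.0
    for u, i, r_ui in known_ratings:
        x_ui = Q[i] @ P[u]            # predicted rating x_ui = q_i^T p_u
        total += (r_ui - x_ui) ** 2   # squared error on a known rating
    return total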
Data Normalization
- Remove the global mean

[Figure: the same items x users matrix after subtracting the global mean; the known entries become residuals such as 1.5, -.9, -.2, .49 in row 1.]
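A one-line sketch of this step, assuming the known ratings are stored as hypothetical (user, item, rating) triples:

import numpy as np

# Hypothetical known ratings stored as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (1, 0, 3.0), (2, 2, 3.0), (3, 1, 4.0)]

mu = np.mean([r for _, _, r in ratings])             # global mean
normalized = [(u, i, r - mu) for u, i, r in ratings] # residual ratings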
Factorization Model
- Only preference factors:

\min_{q^*, p^*} \sum_{(u,i) \in K} (r_{ui} - \mu - q_i^T p_u)^2

To learn the factor vectors p_u and q_i. For example, a rating of 4 is modeled as the global mean µ plus the preference-factor term q_i^T p_u.

  r_ui : actual rating of user u on item i
  µ : training rating average (global mean)
  q_i : latent factor vector of item i
  p_u : latent factor vector of user u
Adding Item Bias and User Bias
- Add item bias and user bias as parameters:

\min_{q^*, p^*, b^*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_i - b_u - q_i^T p_u)^2

To learn the item bias and user bias along with the factor vectors. A rating of 4 is now modeled as the global mean plus the item bias, the user bias, and the preference-factor term.

  r_ui : actual rating of user u on item i
  µ : training rating average
  b_u : user bias of user u
  b_i : item bias of item i
  q_i : latent factor vector of item i
  p_u : latent factor vector of user u
Regularization
- To prevent model overfitting, penalize the magnitudes of the parameters:

\min_{q^*, p^*, b^*} \sum_{(u,i) \in K} \left[ (r_{ui} - \mu - b_i - b_u - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2 + b_i^2 + b_u^2) \right]

  r_ui : actual rating of user u on item i
  µ : training rating average
  b_u : user bias of user u
  b_i : item bias of item i
  q_i : latent factor vector of item i
  p_u : latent factor vector of user u
  λ : regularization parameter


Optimize Factor Vectors
- Find the optimal factor vectors by minimizing the cost function
- Algorithms:
  - Stochastic gradient descent
  - Others: alternating least squares, etc.
- Most frequently used: stochastic gradient descent (see the sketch below)
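A minimal sketch of stochastic gradient descent for the regularized model above - not the author's exact implementation; the function name, initialization scale, and hyperparameters (n_factors, lr, lam, n_epochs) are illustrative assumptions:

import numpy as np

def sgd_mf(known_ratings, n_users, n_items, n_factors=4,
           lr=0.005, lam=0.02, n_epochs=20, seed=0):
    """Fit r ~ mu + b_u + b_i + q_i^T p_u by SGD on the known ratings."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, n_factors))   # user factor vectors p_u
    Q = rng.normal(0, 0.1, (n_items, n_factors))   # item factor vectors q_i
    b_u = np.zeros(n_users)                        # user biases
    b_i = np.zeros(n_items)                        # item biases
    mu = np.mean([r for _, _, r in known_ratings]) # global mean

    for _ in range(n_epochs):                      # stands in for "until convergence"
        rng.shuffle(known_ratings)                 # visit ratings in random order
        for u, i, r in known_ratings:
            err = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])
            # Gradient steps on the regularized squared error.
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return mu, b_u, b_i, P, Q

# Hypothetical usage: ratings as (user_index, item_index, rating) triples.
ratings = [(0, 0, 5.0), (1, 0, 3.0), (0, 2, 4.0), (2, 1, 2.0)]
mu, b_u, b_i, P, Q = sgd_mf(ratings, n_users=3, n_items=3)
pred = mu + b_u[1] + b_i[2] + Q[2] @ P[1]          # predict r_hat for user 1, item 2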
Matrix Factorization Tuning
- Number of factors in the preference vectors
- Learning rate of gradient descent
  - The best results usually come from using a different learning rate for each parameter, especially for the user/item bias terms.
- Parameters in the factorization model
  - Time-dependent parameters
  - Seasonality-dependent parameters
- Many other considerations!
High-Level Implementation Steps
- Construct the user-item matrix (use a sparse data structure!)
- Define the factorization model and its cost function
  - Take out the global mean
  - Decide which parameters go in the model (bias, preference factors, anything else? SVD++)
- Minimize the cost function - model fitting
  - Stochastic gradient descent
  - Alternating least squares
- Assemble the predictions
- Evaluate the predictions (RMSE, MAE, etc. - a sketch of both follows below)
- Continue to tune the model
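A small sketch of the two evaluation metrics mentioned above (standard definitions, not tied to any particular library):

import numpy as np

def rmse(actual, predicted):
    """Root mean squared error over the held-out ratings."""
    a, p = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((a - p) ** 2))

def mae(actual, predicted):
    """Mean absolute error over the held-out ratings."""
    a, p = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(a - p))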
Thank you
- Any questions or comments?
Appendix
- Stochastic Gradient Descent
- Batch Gradient Descent
- Singular Value Decomposition (SVD)
Stochastic Gradient Descent

Repeat until convergence {
  for i = 1 to m, in random order {
    \theta_j := \theta_j + \alpha \, (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}   (for every j)
  }
}

(y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)} is the partial derivative term.

Your code here:
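A minimal Python sketch of the update above for a linear hypothesis h_theta(x) = theta^T x; the data arrays X, y and the fixed epoch count standing in for the convergence test are illustrative assumptions:

import numpy as np

def sgd(X, y, alpha=0.01, n_epochs=100, seed=0):
    """SGD for a linear hypothesis h_theta(x) = theta^T x."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):            # stands in for "repeat until convergence"
        for i in rng.permutation(m):     # i = 1..m in random order
            err = y[i] - X[i] @ theta    # y^(i) - h_theta(x^(i))
            theta += alpha * err * X[i]  # updates every theta_j at once
    return theta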
Batch Gradient Descent

Repeat until convergence {
  \theta_j := \theta_j + \alpha \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}   (for every j)
}

\sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)} is the partial derivative term.

Your code here:
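The batch counterpart, vectorized with numpy under the same illustrative assumptions:

import numpy as np

def batch_gd(X, y, alpha=0.01, n_epochs=100):
    """Batch gradient descent: one update per full pass over all m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        err = y - X @ theta            # all residuals y^(i) - h_theta(x^(i))
        theta += alpha * (X.T @ err)   # sums err_i * x_j^(i) for every j
    return theta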
Singular Value Decomposition (SVD)

A = U \times S \times V^T

[Figure: an m x n matrix A (items x users) factored into U (m x r), S (r x r), and V^T (r x n); keeping only the top k < r singular values gives the rank-k approximation.]

A_k = U_k \times S_k \times V_k^T
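A minimal numpy sketch of the truncated, rank-k SVD; the matrix A here is a random placeholder:

import numpy as np

A = np.random.rand(5, 6)                        # placeholder m x n matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3                                           # keep the top k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # rank-k approximation of A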
