Recommendation Systems
Computer Science Department
Significant industrial & research interest in recommendation systems:
2 out of 3 watched hours come from recommendations
1. Introduction
User feedback
Ratings matrix
Recommendation Systems
[Figure: the ratings matrix, with users as rows and items as columns]
Main Approaches
Collaborative filtering (CF)
Inspect rating patterns to find similar users/items
Content-based (CB)
Analyze attributes of items for building user profiles
Popular Recommenders: K-Nearest Neighbors Approach
User-based collaborative filtering (UBCF)
Item-based collaborative filtering (IBCF)
[Figure: IBCF compares the ratings on item i and item j over the set of users who rated both items]
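The neighborhood idea can be sketched in a few lines. This is an illustrative toy, not the deck's implementation: user-based CF predicts a rating as the similarity-weighted average of the ratings that the most similar users gave to the item. The ratings matrix, the `cosine_sim` and `predict_ubcf` helpers, and the value of k are all assumptions for the example.

```python
# Minimal user-based CF sketch: predict R[user, item] from the k users
# most similar to `user` (by cosine similarity over co-rated items).
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (0 = unrated)."""
    mask = (a > 0) & (b > 0)            # keep co-rated items only
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_ubcf(R, user, item, k=2):
    """Similarity-weighted mean of the k nearest raters of `item`."""
    sims = [(cosine_sim(R[user], R[v]), v)
            for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    sims.sort(reverse=True)
    top = sims[:k]
    num = sum(s * R[v, item] for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)
print(round(predict_ubcf(R, user=1, item=1), 2))
```

Item-based CF follows the same pattern with the roles of users and items swapped, which is why the figure above compares items over their common raters.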
SVD: Matrix Factorization Approach
Decomposition of ratings matrix R into the product of P & Q
Each user & item is described with F latent features:
• P : user factors (F factors per user)
• Q : item factors (F factors per item)
Prediction for user u about item i: r̂_ui = p_u · q_i = Σ_f p_uf q_if
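The prediction is just a dot product of the two factor vectors. A minimal sketch with made-up factor values:

```python
# SVD-style prediction: the rating of user u for item i is the dot
# product of their factor vectors (F latent features each).
import numpy as np

F = 3                                   # number of latent features
P = np.array([[0.9, 0.2, 0.4]])         # user factors, shape (|U|, F)
Q = np.array([[0.8, 0.1, 0.6]])         # item factors, shape (|I|, F)

r_hat = float(P[0] @ Q[0])              # predicted rating r_hat = p_u . q_i
print(round(r_hat, 2))
```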
Matrix Factorization
R ≈ P × Qᵀ
• R: a |U| × |D| matrix (users × movies) that contains all the ratings that the users have assigned to the items
• P: a |U| × K matrix (user factors)
• Q: a |D| × K matrix (item factors)
Matrix Factorization: the ratings matrix is approximated
R ≈ P × Qᵀ
Matrix Factorization:
Each user & item is characterized with a vector of f factors.
• User-factors vector (a row of P): represents the strength of the associations between a user and the features.
• Item-factors vector (a row of Q): represents the strength of the associations between an item and the features.
Factors f: a mathematical artifact for computing the user-item relationship.
Matrix Factorization:
Preference prediction is the dot product of the user & item factor vectors: r̂_ui = p_u · q_i
Matrix - Factorization:
How we obtain the P and Q matrices
Matrix Factorization:
Solution to our minimization problem: minimize over P and Q the sum Σ (r_ui − p_u · q_i)² over the observed ratings.
We consider the squared error because the estimated rating can be either higher or lower than the real one.
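The objective can be evaluated directly. A small sketch, with illustrative (untrained) factor values and 0 denoting an unrated entry:

```python
# Evaluate the squared-error objective over the observed ratings only,
# for given factor matrices P (users x K) and Q (items x K).
import numpy as np

R = np.array([[5, 3, 0],
              [4, 0, 1]], dtype=float)
P = np.array([[1.8, 0.6],
              [1.5, 0.2]])              # user factors (made-up values)
Q = np.array([[2.4, 0.5],
              [1.6, 0.3],
              [0.4, 0.9]])              # item factors (made-up values)

mask = R > 0                            # exclude unrated entries
cost = float(np.sum((R - P @ Q.T)[mask] ** 2))
print(round(cost, 4))
```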
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent. When the function is convex, all local minima are also global minima, so in this case gradient descent can converge to the global solution.
[Figure: error curve with a local minimum]
Differentiating the squared error e_ij² = (r_ij − r̂_ij)² = (r_ij − Σ_k p_ik q_jk)² with respect to the factors (user i, item j, factor k):
∂e_ij² / ∂p_ik = −2 (r_ij − r̂_ij) q_jk = −2 e_ij q_jk
∂e_ij² / ∂q_jk = −2 (r_ij − r̂_ij) p_ik = −2 e_ij p_ik
Having obtained the gradient, we can now formulate the update rules for both p_ik and q_jk:
p_ik ← p_ik + α · 2 e_ij q_jk
q_jk ← q_jk + α · 2 e_ij p_ik
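The derived gradient can be sanity-checked numerically. This sketch (with made-up values for one observed rating) compares the analytic gradient −2·e·q with a central finite difference of the squared error:

```python
# Numerical check of the gradient above: for a single observed rating,
# d(e^2)/dp_k should equal -2 * e * q_k.
import numpy as np

r = 4.0                                 # observed rating (illustrative)
p = np.array([0.5, 1.0])                # user factor vector
q = np.array([1.2, 0.8])                # item factor vector

def sq_err(pv):
    return (r - pv @ q) ** 2

e = r - p @ q                           # prediction error
analytic = -2 * e * q                   # gradient from the derivation

eps = 1e-6
numeric = np.array([
    (sq_err(p + eps * np.eye(2)[k]) - sq_err(p - eps * np.eye(2)[k])) / (2 * eps)
    for k in range(2)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```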
Matrix Factorization: Overfitting
A model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
This means that the noise or random fluctuations in the training data are picked up and learned by the model.
Matrix Factorization: Overfitting - Solution
Regularized cost function: minimize over P and Q the sum Σ (r_ij − r̂_ij)² + λ (‖p_i‖² + ‖q_j‖²), where r̂_ij is the predicted rating.
Update rules (Stochastic Gradient Descent), with learning rate α, prediction error e_ij, and the previous values on the right-hand side:
p_ik ← p_ik + α (2 e_ij q_jk − λ p_ik)
q_jk ← q_jk + α (2 e_ij p_ik − λ q_jk)
We choose a small learning rate to avoid making too large a step towards the minimum, as we may run the risk of missing the minimum and ending up oscillating around it.
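The update rules above translate directly into code. A compact sketch of regularized SGD matrix factorization; the hyperparameter values (F, α, λ, epoch count) and the toy ratings matrix are illustrative assumptions, not the slides' settings:

```python
# Matrix factorization trained with regularized SGD, following the
# update rules above (learning rate alpha, regularization lam).
import numpy as np

def factorize(R, F=2, alpha=0.005, lam=0.02, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, F))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, F))   # item factors
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            e = R[u, i] - P[u] @ Q[i]                     # prediction error
            P[u] += alpha * (2 * e * Q[i] - lam * P[u])   # update rule for p
            Q[i] += alpha * (2 * e * P[u] - lam * Q[i])   # update rule for q
    return P, Q

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4],
              [0, 1, 5, 4]], dtype=float)
P, Q = factorize(R)
R_hat = P @ Q.T          # approximated ratings matrix, fills the zeros too
```

After training, the unrated (zero) entries of `R_hat` serve as the predictions; the λ term keeps the factor magnitudes small so those predictions generalize rather than memorize the observed ratings.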
Notes
The convexity guarantees that the local minimum is
also a global one.
Motivation
Roadmap
1. Introduction
2. Related work
3. The εnαbler system
4. Comparative performance analysis
5. Pilot light εnαbler
6. Conclusions & future work
Blending: Popular Hybrid Approach
• Recommenders: generate predictions
• Blender: learns efficient combinations of predictions
[Diagram: recommenders 1..n produce predictions p1..pn, which a machine-learning blender combines into the final prediction]
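Blending can be sketched with a simple linear combiner. The toy predictions, the ground-truth ratings, and the choice of a least-squares blend are assumptions for illustration (the deck also evaluates LR, RF, and neural blenders):

```python
# Linear blending sketch: learn weights w over the base recommenders'
# predictions p1..pn so that preds @ w approximates the true ratings.
import numpy as np

# rows = rated items, columns = predictions of 3 base recommenders
preds = np.array([[4.2, 3.8, 4.0],
                  [2.1, 2.5, 2.3],
                  [4.8, 4.6, 5.0],
                  [1.2, 1.5, 1.0]])
truth = np.array([4.0, 2.2, 5.0, 1.0])

# least-squares blend: w minimizes ||preds @ w - truth||^2
w, *_ = np.linalg.lstsq(preds, truth, rcond=None)
final = preds @ w          # blended (final) predictions
```

By construction the blend's squared error is no worse than that of any single recommender on the training data, since each single recommender is itself one linear combination (weight 1 on one column).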
The εnαbler: Hybrid Modular Recommendation System
[Diagram: nested cross-validation. Outer CV over folds H1, H2 .. Hm (k = 1..m); for each outer fold, inner CV over folds F1, F2 .. Fn (i, j = 1..n, training on the folds excluding Fj). Recommenders (KNN, SVD, RF, AutoRec, ANN, RFCB, …) generate predictions as features; ML blending algorithms (e.g. LR) are trained on the inner folds, the best blender is selected by mean prediction error, trained on H − Hk, and its error is measured on the held-out fold Hk]
Innovation
Contribution
The εnαbler is extensively evaluated
• Compared with several recommenders
• Evaluated with different blenders
Comparative Performance Analysis
[Figure: RMSE of the compared algorithms, approximately 0.8 to 0.854]
AutoRec: Neural-net-based
Learns a representation of the input data & reconstructs it.
Prediction is given by h(r; θ) = f(W · g(V r + μ) + b), where r is the input ratings vector, g is the activation of the hidden (encoding) layer, and f is the activation of the output (reconstruction) layer.
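A minimal AutoRec-style forward pass as a sketch: the autoencoder encodes the input ratings vector and reconstructs it, and the reconstructed entries serve as predictions. The weights here are random placeholders (an untrained model), and the sigmoid/identity activation choice is an assumption:

```python
# AutoRec-style forward pass h(r) = f(W . g(V r + mu) + b),
# with sigmoid g and identity f; weights are untrained placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_items, hidden = 6, 3
V = rng.normal(size=(hidden, n_items))   # encoder weights
mu = np.zeros(hidden)                    # encoder bias
W = rng.normal(size=(n_items, hidden))   # decoder weights
b = np.zeros(n_items)                    # decoder bias

def autorec(r):
    g = 1 / (1 + np.exp(-(V @ r + mu)))  # hidden representation
    return W @ g + b                     # reconstruction = predictions

r = np.array([5, 3, 0, 1, 0, 4], dtype=float)   # a user's ratings vector
r_hat = autorec(r)
```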
Linear regression (LR) captures simple linear relationships.
Binned linear regression [1]:
• Divides the data into equally sized bins based on a criterion
• Determines separate blending for each bin
[Figure: RMSE of binned LR blenders, binned by item support and by user support, vs AutoRec, approximately 0.8275 to 0.829]
[1] Jahrer et al., "Combining predictions for accurate recommender systems", ACM SIGKDD, 2010.
Random forest (RF) constructs several decision trees
Final prediction: mean prediction of the individual trees
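The aggregation step is simply an average. A one-liner sketch with made-up per-tree outputs:

```python
# RF blender's final step: the final prediction is the mean of the
# individual trees' predictions (tree outputs here are illustrative).
tree_predictions = [3.9, 4.2, 4.0, 4.3]       # one prediction per tree
final = sum(tree_predictions) / len(tree_predictions)
print(round(final, 1))
```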
[Figure: RMSE of the RF blender vs AutoRec as the number of trees grows from 10 to 500, approximately 0.825 to 0.855]
[Figure: RMSE of AutoRec blenders with 1, 2, and 3 hidden layers across different hidden-layer architectures, approximately 0.82 to 0.828]
Pilot light εnαbler
Field study
Metrics
• Mean absolute error (MAE) between predictions & ratings
• Watching duration for measuring user engagement
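The accuracy metric is straightforward to compute. A sketch of MAE (and, for comparison with the offline evaluations, RMSE); the sample predictions and ratings are made up:

```python
# Mean absolute error between predictions and ratings, plus RMSE.
import numpy as np

def mae(pred, true):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

def rmse(pred, true):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))

print(mae([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))   # → 0.5
```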
Client of the pilot light εnαbler
SVD provides more accurate predictions than IBCF
SVD receives higher ratings than IBCF
[Figure: CDF of rating levels (1 to 5) for SVD vs IBCF]
[Figure: CDF of watching duration in minutes (30 to 180) for Recommended vs OnDemand content]
Post-phase Offline Evaluation
[Figure: RMSE of the compared algorithms, ranging from 0.73 to 1.07]
Ongoing & Future Work
Personalized health care & medicine
• Collecting patient information through sensors & mobile devices
• Providing personalized treatment plans
• Improving quality of care & decreasing medical costs
Screening of ASD & dyslexia using eye-tracking
Assessment of Parkinson's disease using gesture sensors
Other domains:
• Guiding tools for tourists
• Quality assessment of wine/oil/water
Performance of each recommender employed in the εnαbler
A range of hybrid techniques have been proposed.
Switching: choose among different recommenders based on a criterion.
[Diagram: if the user has more than 5 ratings (true branch), use collaborative filtering; otherwise (false branch), use content-based]
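The switching criterion amounts to a single branch. A sketch, where the threshold of 5 ratings mirrors the diagram's criterion and the two predictor callables are placeholders:

```python
# Switching hybrid: route each request to collaborative filtering or
# content-based depending on how many ratings the user has.
def switching_recommend(user_ratings, cf_predict, cb_predict, item):
    if len(user_ratings) > 5:        # enough ratings: trust CF
        return cf_predict(item)
    return cb_predict(item)          # sparse user: fall back to content-based
```

For a new user with few ratings this degrades gracefully to the content-based recommender, which is the usual motivation for switching hybrids (the cold-start problem).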
SVD: matrix factorization approach
Ratings matrix R ≈ P × Qᵀ (factors)
Decomposition of ratings matrix R into the product of P and Q.
Each user & item is described with F latent features:
• P : user factors
• Q : item factors
• Update rules: p_ik ← p_ik + α (2 e_ij q_jk − λ p_ik), q_jk ← q_jk + α (2 e_ij p_ik − λ q_jk)
Prediction is given by r̂_ui = p_u · q_i
εnαbler: Hybrid Modular Recommendation System