Vozalis 2005
Proceedings of the 2005 5th International Conference on Intelligent Systems Design and Applications (ISDA’05)
SVD(A) = U × S × V^T

U and V are two orthogonal matrices with dimensions m × m and n × n respectively. S, called the singular matrix, is an m × n diagonal matrix whose diagonal entries are non-negative real numbers. The first r diagonal entries of S (s1, s2, ..., sr) have the property that si > 0 and s1 ≥ s2 ≥ ... ≥ sr. Accordingly, the first r columns of U are eigenvectors of AA^T and represent the left singular vectors of A, spanning the column space. The first r columns of V are eigenvectors of A^T A and represent the right singular vectors of A, spanning the row space. If we focus only on these r nonzero singular values, the effective dimensions of the SVD matrices U, S and V become m × r, r × r and r × n respectively.

An important attribute of SVD, particularly useful in the case of Recommender Systems, is that it can provide the best low-rank approximation of the original matrix A. By retaining the first k << r singular values of S and discarding the rest, which, since the entries in S are sorted, amounts to keeping the k largest singular values, we reduce the dimensionality of the data representation and hope to capture the important "latent" relations existing, but not evident, in the original representation of matrix A. The resulting diagonal matrix is termed Sk. Matrices U and V are reduced accordingly: Uk is produced by removing r − k columns from matrix U, and Vk is produced by removing r − k rows from matrix V. Matrix Ak, which is defined as

Ak = Uk × Sk × Vk^T,

stands for the closest linear approximation of the original matrix A with reduced rank k. Once this transformation is completed, users and items can be represented as points in the k-dimensional space.
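To make the truncation step concrete, here is a minimal NumPy sketch of the rank-k approximation described above; the toy matrix and the choice k = 2 are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy user-item matrix A (m = 4 users, n = 5 items); values are illustrative only.
A = np.array([
    [5.0, 3.0, 0.0, 1.0, 4.0],
    [4.0, 0.0, 0.0, 1.0, 3.0],
    [1.0, 1.0, 0.0, 5.0, 4.0],
    [1.0, 0.0, 5.0, 4.0, 0.0],
])

# Full SVD: A = U * S * V^T, with singular values returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of retained singular values (chosen arbitrarily here)

# Keep the k largest singular values and the matching columns/rows of U and V^T.
U_k = U[:, :k]            # m x k
S_k = np.diag(s[:k])      # k x k
Vt_k = Vt[:k, :]          # k x n

# Rank-k approximation A_k = U_k * S_k * V_k^T, the closest rank-k matrix to A
# in the least-squares (Frobenius norm) sense.
A_k = U_k @ S_k @ Vt_k
print(np.round(A_k, 2))
```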
2.1 Using SVD in Recommender Systems
SVD, as part of Latent Semantic Indexing (LSI), was widely used in the area of Information Retrieval [4] in order to solve the problems of synonymy, which is a result of the many possible ways to express a known concept, and polysemy, which comes from the fact that most words usually have multiple meanings. Furthermore, techniques like SVD-updating or folding-in were proposed to alleviate the problem of updating, which refers to the process of new terms and/or documents being added to existing matrices [1].

Those ideas were adopted by researchers in the area of Recommender Systems. Initially, Billsus and Pazzani [2] took advantage of the properties of SVD in their attempts to formulate collaborative filtering as a classification problem. In their work they utilize SVD as a dimensionality reduction technique before feeding their data matrix, now represented in the reduced feature space, into an Artificial Neural Network (ANN). This ANN, or, as the authors claim, any alternative Machine Learning algorithm, can then be trained in order to generate predictions.

GroupLens have also applied SVD in at least 3 distinct cases regarding Recommender Systems: (i) in an approach that reduces the dimensionality of the user-item space and forms predictions in that reduced space, without building explicit neighborhoods at any point of the procedure, (ii) in an approach that generates a user neighborhood in the SVD-reduced space and then applies normal user-based collaborative filtering on it, and (iii) in an approach that aims at increasing scalability by applying folding-in for the incremental computation of the user-item model [9, 8].

Finally, Goldberg et al. use Principal Component Analysis [5], a technique very similar to SVD, in order to optimally project highly correlated data along a smaller number of orthogonal dimensions. Their Eigentaste algorithm then clusters the projected data, and the procedure concludes with an online computation of recommendations.
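Folding-in is referred to above but not defined in this excerpt. As background, the sketch below shows the standard LSI-style folding-in projection (u_k = u · V_k · S_k^-1); it is offered as a generic illustration, not as the exact incremental procedure used in the cited GroupLens work.

```python
import numpy as np

def fold_in_user(new_user_row, V_k, s_k):
    """Project a new user's rating row into an existing k-dimensional SVD model.

    new_user_row: length-n array of (normalized) ratings for the new user.
    V_k:          n x k matrix of retained right singular vectors.
    s_k:          length-k array of retained singular values.
    Returns the user's coordinates in the k-dimensional feature space,
    u_k = u . V_k . S_k^-1, without recomputing the full SVD.
    """
    return new_user_row @ V_k @ np.diag(1.0 / s_k)
```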
2.2 The Base Algorithm: Item-based Filtering

In this section we will briefly discuss the filtering algorithm which will be utilized by the proposed technique. Similarly to User-based Collaborative Filtering, Item-based Filtering is based on the creation of neighborhoods. Yet, unlike the User-based approach, those neighborhoods consist of similar items rather than similar users [11].

The execution steps of the algorithm are: (a) Data Representation of the ratings provided by m users on n items, based on the construction of an m × n user-item matrix, R. (b) Neighborhood Formation, which concludes with the construction of the active item's neighborhood: similarities for all possible pairs of items in the data set are computed with the preferred similarity metric, and the items most similar to the active item (the item for which predictions should be generated) are selected for its neighborhood. (c) Prediction Generation, where final predictions are calculated as a weighted sum of the ratings given by a user on all items included in the active item's neighborhood.
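As a rough sketch of steps (a) through (c), the fragment below builds an item-item similarity table from the user-item matrix R and forms a weighted-sum prediction from the active item's neighborhood. The plain cosine similarity, the toy matrix and the neighborhood size l are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def item_similarities(R):
    """Cosine similarity between every pair of item columns of the m x n matrix R
    (zeros are treated as 'no rating'). Returns an n x n similarity matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                 # avoid division by zero for unrated items
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, sim, user, item, l=3):
    """Weighted-sum prediction of R[user, item] from the l most similar items
    that the user has actually rated."""
    rated = np.where(R[user] > 0)[0]
    rated = rated[rated != item]
    # Neighborhood formation: keep the l rated items most similar to the active item.
    neighbors = rated[np.argsort(sim[item, rated])[::-1][:l]]
    weights = sim[item, neighbors]
    if np.abs(weights).sum() == 0:
        return 0.0
    # Prediction generation: weighted sum of the user's ratings on the neighborhood.
    return float(weights @ R[user, neighbors] / np.abs(weights).sum())

# Example with a toy 4 x 5 user-item matrix (0 = unrated).
R = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 3],
    [1, 1, 0, 5, 4],
    [1, 0, 5, 4, 0],
], dtype=float)
sim = item_similarities(R)
print(round(predict(R, sim, user=0, item=2, l=2), 2))
```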
2.3 Experimental Methodology

For the execution of our subsequent experiments we utilized the data publicly available from the GroupLens movie recommender system. The MovieLens data set consists of 100,000 ratings which were assigned by 943 users on 1682 movies. Users had to have stated their opinions on at least 20 movies in order to be included. Ratings follow the 1 (bad) to 5 (excellent) numerical scale. Starting from the initial data set, five distinct splits of training and test data were generated.
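A minimal sketch of loading one of those pre-generated splits into the m × n user-item matrix; the file names (u1.base, u1.test) and the tab-separated format (user id, item id, rating, timestamp) are those of the public MovieLens 100K distribution and are assumed here rather than stated in the text.

```python
import numpy as np

def load_ratings(path, n_users=943, n_items=1682):
    """Read a MovieLens 100K ratings file into a user-item matrix (0 = unrated)."""
    R = np.zeros((n_users, n_items))
    with open(path) as f:
        for line in f:
            user, item, rating, _timestamp = line.split("\t")
            R[int(user) - 1, int(item) - 1] = float(rating)  # ids are 1-based
    return R

# One of the five pre-generated train/test splits shipped with ML-100K.
R_train = load_ratings("u1.base")
R_test = load_ratings("u1.test")
```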
Several techniques have been used to evaluate Recommender Systems. Those techniques have been divided by Herlocker et al. [6] into three categories: Predictive Accuracy Metrics, Classification Accuracy Metrics, and Rank Accuracy Metrics. The choice among those metrics should be based on the selected user tasks and the nature of the data sets. We wanted our proposed algorithms to derive a predicted score for already rated items rather than generate a top-N recommendation list. Based on that specific task, we proceeded with the selection of the initial evaluation metric for our experiments. That metric was Mean Absolute Error (MAE). It is a statistical accuracy metric which measures the deviation of predictions, generated by the Recommender System, from the true rating values, as they were specified by the user. MAE is measured only for those items for which a user has expressed an opinion.
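For reference, MAE over a set of rated test items can be computed as follows; the two rating lists are placeholders, not results from the paper.

```python
import numpy as np

def mean_absolute_error(predicted, actual):
    """MAE: average absolute deviation between predictions and true ratings,
    computed only over items the user has actually rated."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

# Placeholder values, not results from the paper.
print(mean_absolute_error([3.5, 4.0, 2.0], [3, 5, 2]))   # 0.5
```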
3 The Algorithm: Item-based Filtering Enhanced by SVD

We will now describe the steps of how SVD can be combined with Item-based Filtering in order to make the base algorithm more scalable.

   (c) Subtract the corresponding row average, r̄i, from all the slots of the new filled-in matrix, Rfilled-in, and obtain the normalized matrix Rnorm.

3. Compute the SVD of Rnorm and obtain matrices U, S and V, of size m × m, m × n, and n × n, respectively. Their relationship is expressed by Rnorm = U · S · V^T.

4. Perform the dimensionality reduction step by keeping only k diagonal entries from matrix S to obtain a k × k matrix, Sk. Similarly, matrices Uk and Vk of size m × k and k × n are generated. The "reduced" user-item matrix, Rred, is obtained by Rred = Uk · Sk · Vk^T, while rrij denotes the rating by user ui on item ij as included in this reduced matrix.

5. Compute √Sk and then calculate two matrix products: Uk · √Sk^T, which represents the m users, and √Sk · Vk^T, which represents the n items in the k-dimensional feature space. We are particularly interested in the latter matrix, of size k × n, whose entries represent the "meta" ratings provided by the k pseudo-users on the n items. A "meta" rating assigned by pseudo-user ui on item ij is denoted by mrij.

6. Proceed with Neighborhood Formation, which can be broken into two substeps:

   (a) Calculate the similarity between items ij and if by computing their Adjusted Cosine Similarity as follows:

       sim_{jf} = adjcorr_{jf} = \frac{\sum_{i=1}^{k} mr_{ij} \, mr_{if}}{\sqrt{\sum_{i=1}^{k} mr_{ij}^{2}} \; \sqrt{\sum_{i=1}^{k} mr_{if}^{2}}}
   (b) Based on the similarities computed for each pair of items, including the active item and a random item, isolate the set of items which appear to be the most similar to the active item.

7. Conclude with Prediction Generation, achieved by the following weighted sum over the l items of the active item's neighborhood:

       pr_{aj} = \frac{\sum_{k=1}^{l} sim_{jk} \, (rr_{ak} + \bar{r}_{a})}{\sum_{k=1}^{l} |sim_{jk}|}
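Putting steps 2(c) through 7 together, a compact sketch of the whole procedure might look like the following. The pre-filled matrix R_filled, the row averages, and the parameters k and l are assumed inputs, since the fill-in substeps of step 2 are not included in this excerpt.

```python
import numpy as np

def svd_item_based(R_filled, user_means, k, l, a, j):
    """Predict the rating of active user a on active item j.

    R_filled:   m x n user-item matrix with empty slots already filled in (step 2).
    user_means: length-m vector of the row averages r̄_i used for normalization.
    k:          number of retained singular values; l: neighborhood size.
    """
    # Step 2(c): subtract each user's average to obtain the normalized matrix.
    R_norm = R_filled - user_means[:, None]

    # Step 3: SVD of the normalized matrix.
    U, s, Vt = np.linalg.svd(R_norm, full_matrices=False)

    # Step 4: keep only the k largest singular values and matching vectors.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    R_red = U_k @ np.diag(s_k) @ Vt_k          # "reduced" ratings rr_ij

    # Step 5: meta ratings, sqrt(S_k) . V_k^T, a k x n matrix (k pseudo-users).
    MR = np.diag(np.sqrt(s_k)) @ Vt_k

    # Step 6(a): adjusted cosine similarity between the active item j and each item.
    norms = np.linalg.norm(MR, axis=0)
    sims = (MR.T @ MR[:, j]) / (norms * norms[j] + 1e-12)

    # Step 6(b): neighborhood of the l items most similar to j (excluding j itself).
    sims[j] = -np.inf
    neighbors = np.argsort(sims)[::-1][:l]

    # Step 7: weighted sum over the neighborhood, denormalized with r̄_a.
    w = sims[neighbors]
    return float(w @ (R_red[a, neighbors] + user_means[a]) / np.abs(w).sum())
```

Because the item-item similarities of step 6 are computed over only k pseudo-users instead of m real users, this is where the intended scalability gain over the plain item-based algorithm comes from.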
5 Experiments

The aim of the experiments which will be described in the following sections is first to locate the optimal parameter settings for the Item-based enhanced by SVD filtering algorithm. Once those settings are found and applied, we will compare the results with the predictions generated by a plain Item-based Recommender.
[Figure: MAE results; axis values range from 0.793 to 0.805]

... which are the most similar to the active item, is able to generate the most accurate predictions.
[Figure: MAE comparison of the SVD-enhanced item-based algorithm (svd-ib) against plain item-based filtering (ib); MAE axis from 0.82 to 0.86]

References

[1] M. W. Berry, S. T. Dumais, and G. W. O'Brien. Using linear algebra for intelligent information retrieval. SIAM Review, 37:573–595, 1995.

[2] D. Billsus and M. J. Pazzani. Learning collaborative information filters. In Proceedings of the 15th International Conference on Machine Learning (ICML), 1998.