
2010 International Conference on Advances in Social Networks Analysis and Mining

A Movie Rating Prediction Algorithm with Collaborative Filtering
O. Bora Fikir İlker O. Yaz Tansel Özyer
TOBB University TOBB University TOBB University
Dept. of Computer Engineering Dept. of Computer Engineering Dept. of Computer Engineering
Ankara, Turkey Ankara, Turkey Ankara, Turkey
st05210114@etu.edu.tr st05110067@etu.edu.tr ozyer@etu.edu.tr

978-0-7695-4138-9/10 $26.00 © 2010 IEEE
DOI 10.1109/ASONAM.2010.64

Abstract: Recommendation systems have been studied intensively over the last decades, and several solutions have been proposed for recommendation problems in different domains. Recommendation may be content based, collaborative, or both. Beyond the known challenges of collaborative filtering techniques, accuracy and computational cost on large-scale data remain salient issues. In this paper we propose an approach that uses matrix factorization to predict the rating of item i by user j from a sub-matrix formed by the k items most similar to item i that user j has rated and all users who rated all of them. Previously predicted values are used for subsequent predictions. To investigate the accuracy of neighborhood methods we applied our method to the Netflix Prize dataset [1]. We considered both item and user relationships in the Netflix dataset for predicting movie ratings, and we conducted several experiments.

Keywords: Collaborative filtering; QR factorization; k-nearest neighborhood.

I. INTRODUCTION

Recommendation systems originate from different areas such as approximation theory, cognitive systems, information retrieval, prediction methods, and management science. Resnick and Varian describe recommendation systems as using the opinions of a community of users to help individuals identify items of interest among a set of choices [5]. In the mid-1990s recommendation systems became an independent research area [2, 4]. A taxonomy of their technical design, item, and evaluation characteristics has been given in detail [6]. In parallel with internet technologies, recommendation systems have been used increasingly, especially in e-commerce for movies, music, books, videos, pictures, etc.

Well-known e-commerce examples include Amazon, MovieFinder, eBay, Reel.com, and CDNOW. A detailed taxonomy of these techniques can be found in [6]. Past and future recommendation systems have been summarized in [2, 5].

As the information accessible via the Internet grows exponentially, users have more difficulty reaching the information they need. There have been various attempts to address the inherent problems of information filtering with data mining techniques [2, 7, 8].

Recommendation systems can be grouped into three approaches: content based filtering, collaborative filtering, and hybrid methods. Content based filtering recommends items to a user according to the past preferences (s)he has expressed; the content of the past preferences is what matters for recommendation. Collaborative filtering relies on the correlation of past preferences/ratings with those of other users. Based on this correlation, people with similar likes are taken into account for recommendation. Hybrid methods combine both [2, 5].

Content based filtering is one of the oldest methods for filtering data. Systems using this method analyze the content of a set of items together with the ratings provided by individual users to infer which non-rated items might be of interest to a specific user. Basically, content based filtering uses a utility function U(c,s) that gives the utility of item s to user c. The content of s is composed of s1, s2, ..., sk, and the overall utility is based on U(c,si), i = 1..k [2, 5]. Content based filtering has severe problems. The user is limited to content that has already been rated; unrated content is ignored. To obtain better recommendations, the user must rate as much content as possible, which most users dislike doing. In addition, other criteria such as presentation and loading time can play a role in content rating, and these are disregarded (the limited content, overspecialization, and new user problems) [2, 4].

As in daily life, we rely on the opinions of friends and peers who share our likes/dislikes: if they recommend an item, we are likely to enjoy it, and vice versa. Collaborative filtering methods are explained in [4, 5]. Tapestry [9] is one of the first recommendation systems to use the collaborative filtering technique. Essentially, to obtain the utility U(c,s), the utility U(ci,s) is computed for all users ci. Collaborative filtering methods can be divided into memory based and model based techniques, both of which are surveyed in [2]. Model based methods are costly because a model must be built, whereas a memory based method is easy to construct.

In contrast to content based filtering, collaborative filtering methods do not suffer from some of its problems. Regardless of content, items unlike those seen previously can be handled easily with collaborative filtering methods. Still, collaborative filtering methods have problems of their own: the new user, new item, and data sparsity problems. The new user problem also exists in content based filtering: a new user's preferences must be learnt first. A new item must also be
rated by a sufficient number of users. In general, each user rates only a small number of items, so the rating matrix is sparse. Similarities among users must be found in a sparse matrix [2].

Content based and collaborative filtering methods are used together to overcome the limitations of each. There are different approaches to hybrid methods: 1) constructing collaborative and content based recommendation systems separately and combining them; 2) incorporating content based filtering into collaborative filtering; 3) incorporating collaborative filtering into content based filtering; and 4) constructing a unified recommender system that incorporates both collaborative and content based filtering. A detailed definition of hybrid methods and examples of each approach, supported by a digest of the literature on the classification of recommender systems, are given in [2].

Although the more complex structure increases the cost of recommending an item, hybrid methods have been proposed to overcome the shortcomings of content based and collaborative filtering techniques.

Our method applies a novel collaborative filtering procedure to all missing values. It iteratively predicts ratings in random order. As missing values are predicted, they are used for predicting later missing values. We propose an algorithm for predicting all missing values and use QR factorization for predicting each entry.

In this paper, we propose a method for collaborative filtering. Our contributions are:
- Our method restricts the knowledge used to the k most similar items rated by the user and considers the ratings of those users who rated all of them.
- It follows an iterative rating procedure. All previously predicted ratings are used for rating the other missing values.
- Completing all missing values in the user-item matrix is considered.

In addition, we believe our method limits the round-off errors encountered in rating prediction, which improves the accuracy of subsequent predictions throughout the process.

The outline of the paper is as follows: Section 1 contains the introduction; Section 2 summarizes the proposed work; Section 3 presents the experiments and discussion; Section 4 gives the conclusions.

II. PROPOSED WORK

A. Preliminary Work

Rating data can be represented as an item-user matrix $A$ of size $m \times n$. We assume that the columns of the matrix represent the users and the rows represent the items that are rated. Column $j$ contains the ratings given by user $u_j$ for all items. From a mathematical point of view, $A$ represents a linear transformation from user space to item space. Of course, the matrix $A$ can also be viewed as a linear transformation from item space to user space. Each user $u_j$ ($j = 1..n$) rates items $t_i$ ($i = 1..m$), and the rating in each cell is denoted $r_{ij}$. The subscripts of $r_{ij}$ indicate the entry at the $i$th row and $j$th column of the matrix. It can easily be seen that $t_i \in \mathbb{R}^n$ and $u_j \in \mathbb{R}^m$.

If all the entries of $A$ were filled, item-user associations could be evaluated directly and easily. However, in most cases the matrix $A$ is sparse. Entries can be missing for several reasons: items may not have been rated, or values can be literally missing. After all, if we had the chance to present all the items to a particular user, the user would evaluate each of these items according to his or her own preferences. Hence, our goal in prediction is to recover all the missing parts in order to discover the user-item associations.

As mentioned above, rating prediction can be performed by considering the similarity either between items or between users (item-oriented vs. user-oriented). When predicting $r_{ij}$ in the item-oriented approach, the neighborhood function $N_k(t_i, u_j)$ designates the $k$ items most similar to $t_i$ that are rated by user $u_j$; in the user-oriented approach, the neighborhood function $N_k(u_j, t_i)$ designates the $k$ users most similar to user $u_j$ among all users who have rated $t_i$.

The most popular similarity metrics are the Pearson correlation coefficient and cosine similarity. Let $u_j$ and $u_k$ be two different users, and let $S_{jk}$ represent the items rated by both users. The Pearson similarity is:

\[
sim(u_j, u_k) = \frac{\sum_{t_p \in S_{jk}} (r_{pj} - \bar{r}_j)(r_{pk} - \bar{r}_k)}{\sqrt{\sum_{t_p \in S_{jk}} (r_{pj} - \bar{r}_j)^2}\,\sqrt{\sum_{t_p \in S_{jk}} (r_{pk} - \bar{r}_k)^2}}
\]

Likewise, the similarity between two items is calculated as:

\[
sim(t_i, t_k) = \frac{\sum_{u_p \in S_{ik}} (r_{ip} - \bar{r}_i)(r_{kp} - \bar{r}_k)}{\sqrt{\sum_{u_p \in S_{ik}} (r_{ip} - \bar{r}_i)^2}\,\sqrt{\sum_{u_p \in S_{ik}} (r_{kp} - \bar{r}_k)^2}}
\]

where $S_{ik}$ represents the users who rated both $t_i$ and $t_k$, and $\bar{r}$ denotes the corresponding mean rating. The cosine similarities between users and between items are:

\[
sim(u_j, u_k) = \cos(u_j, u_k) = \frac{u_j \cdot u_k}{\|u_j\|\,\|u_k\|}
\qquad \text{and} \qquad
sim(t_i, t_k) = \cos(t_i, t_k) = \frac{t_i \cdot t_k}{\|t_i\|\,\|t_k\|}
\]

Both the item-oriented and user-oriented approaches are given above. The k most similar neighbors will be determined by using the similarity metric. In the next section, we describe how we employ these ideas.
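The item-oriented similarity and neighborhood definitions above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the rating matrix is held in a hypothetical dict of dicts (`ratings[item][user]`), the means are taken over the co-rating users in $S_{ik}$, and pairs with fewer than two common raters or zero variance are given similarity 0. The function names are invented for this example.

```python
import math

def pearson_item_similarity(ratings, i, k):
    """Pearson similarity between items i and k over the users who rated both.

    `ratings` is a hypothetical sparse layout {item: {user: rating}}.
    Assumption: means are computed over the common raters S_ik only.
    """
    common = set(ratings[i]) & set(ratings[k])  # S_ik
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as "no similarity"
    mean_i = sum(ratings[i][u] for u in common) / len(common)
    mean_k = sum(ratings[k][u] for u in common) / len(common)
    num = sum((ratings[i][u] - mean_i) * (ratings[k][u] - mean_k) for u in common)
    den_i = math.sqrt(sum((ratings[i][u] - mean_i) ** 2 for u in common))
    den_k = math.sqrt(sum((ratings[k][u] - mean_k) ** 2 for u in common))
    if den_i == 0 or den_k == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return num / (den_i * den_k)

def neighborhood(ratings, i, j, k):
    """N_k(t_i, u_j): the k items most similar to item i among those rated by user j."""
    candidates = [t for t in ratings if t != i and j in ratings[t]]
    candidates.sort(key=lambda t: pearson_item_similarity(ratings, i, t), reverse=True)
    return candidates[:k]
```

For a missing entry $r_{ij}$, `neighborhood(ratings, i, j, k)` returns the item set from which the prediction is then built; the user-oriented variant would swap the roles of items and users.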
B. Rating Prediction Algorithm

Our prediction algorithm has three phases. In the first phase, pairwise similarities between items are estimated. The k most similar items are then used for entry prediction: for each missing entry, the k most similar items are retrieved and only the users who rated all of them are considered. This yields a smaller subset of the $n$ users.

Algorithm CompleteRatings(A, k)
// Completes the missing cells A[ti,uj] using the k most similar items.
// A is of size m x n.
Itemset(ti,uj) ← null;  S ← null
// Phase 1: estimate the similarity matrix (since it is symmetric,
// the lower triangle would be sufficient)
for i = 1..m
    for j = 1..m
        S[i,j] ← sim(ti,tj)
// Phase 2: find N_k(ti,uj)
for i = 1..m
    for j = 1..n
        if A[i,j] is missing then
            // The k items most similar to ti among those rated by uj
            Itemset(ti,uj) ← {tr | rated(tr,uj) ∧ S[i,r] ≥ S[i,r'] ∀ tr' ∉ Itemset(ti,uj) ∧ size(Itemset(ti,uj)) ≤ k, r,r' ≤ m}
            // Retrieve the users who rated all items in Itemset(ti,uj)
            Subusers(ti,uj) ← {ur | ∀ ts ∈ Itemset(ti,uj): rated(ts,ur)}
        endif
// Phase 3: predict the missing entries in random order
while (Itemset(ti,uj) is not empty)
    // One (i,j) is picked at random from Itemset for prediction
    element ← RandomlyPick(Itemset(ti,uj))
    item_no ← element.getItem()
    user_no ← element.getUser()
    // Form the matrix with the given information to find A[ti,uj]
    M ← Construct(Itemset(item_no,user_no), Subusers(item_no,user_no))
    // Solve the equation
    A[ti,uj] ← Solve(M)
    Update(S); Update(Itemset); Update(Subusers)
endwhile

The algorithm above completes the ratings in the matrix. For each missing entry we find $N_k(t_i, u_j)$ and the users who rated all of its items. This is the sub-region used for predicting the movie rating of $(t_i, u_j)$. We then pick these sub-regions in random order and predict the values.

Matrix Factorization Process (Solve(M)):

For this section we assume that temporal effects in the dataset are ignored and that we are trying to predict the undetermined rating $r_{ij}$. Either the item-oriented or the user-oriented approach can be used here. For simplicity, we use the item-based approach, as also described in [3]. For a given set of neighbors $N_k(t_i, u_j)$, they define the interpolation weights $\{ w_{pi} \mid t_p \in N_k(t_i, u_j) \}$ and the prediction rule

\[
r_{ij} = \sum_{t_p \in N_k(t_i, u_j)} w_{pi}\, r_{pj}
\]

Suppose that we have 9 users and 7 items with ratings given, and that we want to predict the rating $r_{35}$ (the rating of the 3rd item given by the 5th user). The 5th user has rated the items $\{t_2, t_4, t_5, t_6, t_7\}$, and among these items the most similar ones to $t_3$ are $\{t_2, t_4, t_6, t_7\}$. So the neighbor set becomes $N_4(t_3, u_5) = \{t_2, t_4, t_6, t_7\}$.

The prediction rule implies that we can evaluate $r_{35}$ as:

\[
r_{35} = \sum_{t_p \in N_4(t_3, u_5)} w_{p3}\, r_{p5} = w_{23} r_{25} + w_{43} r_{45} + w_{63} r_{65} + w_{73} r_{75}
\]

To find these weights, they formulate a least squares problem for the hypothetical dense case in which all users except $u_j$ have rated both $t_i$ and all of its neighbors in $N_k(t_i, u_j)$:

\[
\min_{w} \sum_{u_k \neq u_j} \Big( r_{ik} - \sum_{t_p \in N_k(t_i, u_j)} w_{pi}\, r_{pk} \Big)^2
\]

The optimal solution of the problem above gives the weights $\{ w_{pi} \mid t_p \in N_k(t_i, u_j) \}$. Now let us return to our example.

Suppose that the users who rated $t_3$ and all of its neighbors in $N_4(t_3, u_5) = \{t_2, t_4, t_6, t_7\}$ are $\{u_1, u_3, u_4, u_7, u_9\}$.
Here, we solve a system of linear equations as given below. This matrix has been constructed with
Construct(itemset(item_no, user_no), Subusers(item_no, user_no)) in the algorithm CompleteRatings, which returns the matrix B.

The entries of this matrix are the ratings at the intersection of the items $\{t_2, t_4, t_6, t_7\}$ and the users $\{u_1, u_3, u_4, u_7, u_9\}$. Writing the system as $Bw = c$, it becomes:

\[
\begin{bmatrix}
r_{21} & r_{41} & r_{61} & r_{71} \\
r_{23} & r_{43} & r_{63} & r_{73} \\
r_{24} & r_{44} & r_{64} & r_{74} \\
r_{27} & r_{47} & r_{67} & r_{77} \\
r_{29} & r_{49} & r_{69} & r_{79}
\end{bmatrix}
\begin{bmatrix}
w_{23} \\ w_{43} \\ w_{63} \\ w_{73}
\end{bmatrix}
=
\begin{bmatrix}
r_{31} \\ r_{33} \\ r_{34} \\ r_{37} \\ r_{39}
\end{bmatrix},
\qquad B\,w = c
\]

The least squares solution of this system satisfies the normal equations $B^T B w = B^T c$, a square system of the form $Aw = b$ with $A = B^T B$ and $b = B^T c$ (here $A$ is not the rating matrix). The solution is:

\[
Bw = c \;\Rightarrow\; \tilde{w} = B^{+} c
\]

where $B^{+}$ is the pseudo-inverse of $B$. Suppose that the singular value decomposition of $B$ is $B = Q_1 \Sigma Q_2^T$. Then the pseudo-inverse of $B$ is:

\[
B^{+} = Q_2 \Sigma^{+} Q_1^T
\]

The $r$ singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$ on the diagonal of $\Sigma$ ($m$ by $n$) are the square roots of the non-zero eigenvalues of both $BB^T$ and $B^T B$, and the reciprocals $1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_r$ are on the diagonal of $\Sigma^{+}$ ($n$ by $m$). The columns of $Q_1$ are eigenvectors of $BB^T$; the columns of $Q_2$ are eigenvectors of $B^T B$.

However, for least squares solutions the singular value decomposition can sometimes become unstable because of machine round-off. To guarantee numerical stability one can use the QR decomposition instead. A QR decomposition of a real-valued square matrix $B$ is a decomposition of $B$ as:

\[
B = QR
\]

where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. The QR decomposition can also be applied to an $m \times n$ rectangular matrix with $m > n$ as:

\[
B = Q \begin{bmatrix} R \\ 0 \end{bmatrix}
\]

where $Q$ is an $m \times m$ orthogonal matrix and $R$ is an $n \times n$ upper triangular matrix. The bottom $(m-n)$ rows of the block matrix on the right-hand side consist entirely of zeroes. If the matrix $B$ is full-rank, which means that the $n$ column vectors of $B$ are linearly independent, the QR decomposition guarantees numerical stability against machine round-off errors. If the column vectors of $B$ are not linearly independent, we call the matrix $B$ rank deficient. Rank deficiency is particularly important for the algorithm we applied. Using the QR decomposition, the solution of the corresponding equation becomes:

\[
Bw = c \;\Rightarrow\; \tilde{w} = R^{-1} Q^T c
\]

In our algorithm we use both the QR and the singular value decomposition, depending on the dimension of the matrix obtained. In particular, if the matrix is rank deficient we use the QR decomposition for the solution; otherwise we use the singular value decomposition.

To sum up, we have covered the solution of the problem for a given user-item association. However, the problem can easily be extended to filling the whole user-item association matrix. We decided to extend the problem to all user-item associations instead of determining only one undetermined rating.

Now consider that the whole user-item association matrix contains undetermined ratings. The problem is to fill the gaps, that is, to evaluate the undetermined ratings. To solve it, one must visit all the gaps, determine the neighborhood of each gap, and then construct the corresponding least squares problem. The dimension of the least squares matrix is then known for each gap. Once one gap (unrated entry) in the user-item association matrix has been filled, the whole process must be repeated for the next one, and so on. Filling a gap in the user-item association matrix does not affect every least squares problem we have constructed. However, those that are affected by it must be regenerated. After the regeneration, the process continues until all the gaps in the user-item association matrix are filled.

III. EXPERIMENTS AND DISCUSSION

We implemented the algorithm in Java, and all the data reside in a MySQL server 5.0.75 database. The experiments were conducted on an Ubuntu machine with an Intel Core 2 Duo T7500 2.2 GHz CPU and 3 GB of memory. The experiments use the Netflix dataset [1], which contains 17770 movies and 480189 users. Ratings range from 1 to 5.

We ran several experiments with different values of k for the k most similar items; for the experiments reported here, k was set to 15. As an initial test of our algorithm, we selected 66 candidate ratings from the probe dataset. For each candidate rating we evaluated the prediction as described in [3]. We then removed the candidate ratings from our user-item association matrix and ran our algorithm. Figure 1 illustrates the results: it displays the difference between the actual rating values (green) and the predicted ones (red).
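As a companion to the per-entry solve of Section II.B, the following Python sketch solves the least squares problem $Bw = c$ with a thin QR factorization and then applies the prediction rule. It is a minimal sketch under stated assumptions rather than the authors' Java implementation: the QR factors are built with modified Gram-Schmidt (the paper does not say which QR algorithm was used), $B$ is assumed to have full column rank, the function names `qr_least_squares` and `predict` are hypothetical, and the ratings in the usage lines are made up for illustration.

```python
def qr_least_squares(B, c):
    """Solve min ||Bw - c||_2 via a thin QR factorization (modified Gram-Schmidt).

    Assumption: B (a list of m rows with n columns, m >= n) has full column rank.
    """
    m, n = len(B), len(B[0])
    # Start from the columns of B; they are orthonormalised one by one into Q.
    Q = [[float(B[r][col]) for r in range(m)] for col in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for a in range(n):
        R[a][a] = sum(x * x for x in Q[a]) ** 0.5
        if R[a][a] < 1e-12:
            raise ValueError("B is (numerically) rank deficient")
        Q[a] = [x / R[a][a] for x in Q[a]]
        for b in range(a + 1, n):
            # Remove the component of column b along the new orthonormal column a.
            R[a][b] = sum(x * y for x, y in zip(Q[a], Q[b]))
            Q[b] = [y - R[a][b] * x for x, y in zip(Q[a], Q[b])]
    # w = R^{-1} Q^T c, computed by back substitution on the upper-triangular R.
    qtc = [sum(x * y for x, y in zip(Q[a], c)) for a in range(n)]
    w = [0.0] * n
    for a in range(n - 1, -1, -1):
        w[a] = (qtc[a] - sum(R[a][b] * w[b] for b in range(a + 1, n))) / R[a][a]
    return w

def predict(weights, neighbour_ratings):
    """Prediction rule r_ij = sum_p w_pi * r_pj over the neighbours t_p."""
    return sum(w * r for w, r in zip(weights, neighbour_ratings))

# Made-up ratings: rows are users u1, u3, u4, u7, u9; columns are items t2, t4, t6, t7.
B = [[5, 3, 4, 1],
     [4, 2, 5, 2],
     [1, 5, 2, 4],
     [3, 3, 3, 5],
     [2, 4, 1, 3]]
c = [4, 4, 2, 3, 2]                 # made-up ratings of t3 by the same users
w = qr_least_squares(B, c)          # interpolation weights w23, w43, w63, w73
r35 = predict(w, [4, 3, 5, 2])      # made-up ratings of t2, t4, t6, t7 by u5
```

The sketch raises an error on a rank-deficient matrix instead of handling it, so it covers only the full-rank case; the paper's choice between QR and the singular value decomposition for other cases is not reproduced here.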
Fig. 1. Comparison of Probe Dataset Ratings with [3] and actual ratings

Fig. 2. Cumulative RMSE as Unknown Movies Predicted for Dataset 1 [1...6000]

Fig. 3. Cumulative RMSE as Unknown Movies Predicted for Dataset 2 [1...6000]

We prepared two different datasets (Dataset 1 and Dataset 2). For each dataset, 6000 movie predictions were performed. Figures 2 and 3 show the cumulative root mean square error as the i-th movie is predicted (i = 1...6000) for the two datasets. We repeated the experiments ten times and averaged the results. The results show that the root mean square error fluctuates at the beginning and tends to stabilize after the first 2000 movies. The RMSE becomes smooth and moves from 2 to 1.6 for Dataset 1 (Figure 2) and from 3 to 1.5 for Dataset 2 (Figure 3).

IV. CONCLUSIONS

Even if the neighbors used to predict an undetermined rating are chosen sensibly, major problems remain. The accuracy of the prediction does not depend solely on the neighborhood selection or on the method used. The consistency of the data cannot be guaranteed, for unavoidable reasons. One of them, and perhaps the crucial one, is the temporal effect: for example, items presented at different times might be evaluated differently by users with similar tastes. Another considerable reason is the effect of previous predictions. We have proposed an algorithm that detects the sub-regions to be used in prediction and computes solutions for them, so that finally all missing ratings are assessed.

REFERENCES
[1] http://www.netflixprize.com/
[2] G. Adomavicius and A. Tuzhilin, "Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, 2005, pp. 734-749.
[3] R. Bell and Y. Koren, "Improved Neighborhood-based Collaborative Filtering," KDDCup'07, August 12, 2007, San Jose, California, USA.
[4] M. Balabanović and Y. Shoham, "Fab: Content-Based, Collaborative Recommendation," Communications of the ACM, vol. 40, no. 3, Mar. 1997, pp. 66-72.
[5] P. Resnick and H. Varian, "Recommender Systems," Communications of the ACM, vol. 40, no. 3, Mar. 1997.
[6] J. B. Schafer, J. Konstan, and J. Riedl, "Recommender Systems in E-Commerce," EC '99: Proceedings of the 1st ACM Conference on Electronic Commerce, ACM, NY, USA, 1999, pp. 158-166.
[7] C. Faloutsos and D. W. Oard, "A Survey of Information Retrieval and Filtering Methods," UM Computer Science Department, CS-TR-3514, August 1995. URI: http://hdl.handle.net/1903/436
[8] P. Raghavan, "Information Retrieval Algorithms: A Survey," SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 11-18.
[9] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, vol. 35, no. 12, 1992, pp. 61-70.