
2013 Fourth International Conference on Intelligent Systems Design and Engineering Applications

Improved Collaborative Filtering Algorithm In The Research And Application Of Personalized Movie Recommendations

Xiao Peng, Shao Liangshan, Li Xiuran


Liaoning Technical University, Huludao, Liaoning, 125105, China
Heilongjiang University of Science and Technology, Haerbin, Heilongjiang, 150000, China
xiaopeng198628@163.com

Abstract—With the development of the Internet and e-commerce, recommendation systems have come into wide use. This paper studies e-commerce recommendation systems further and focuses on the application of the collaborative filtering algorithm to personalized movie recommendation. Considering the characteristics of movie recommendation systems and the sparse user rating matrix that limits the traditional collaborative filtering algorithm, the paper proposes a hybrid of the user-based and item-based collaborative filtering algorithms and applies it to the MovieLens dataset, where it achieves good results.

Keywords- Recommendation system; Collaborative filtering algorithm; Recommendation algorithm; Movie

INTRODUCTION

With the rapid development of Internet technology, we have access to vast amounts of information. For example, Netflix offers tens of thousands of movies, Amazon has millions of books, and Taobao lists more than tens of millions of goods of all kinds, so it is difficult for users to find the goods that interest them among so much information. Quick and accurate product recommendation is therefore needed, and improving customer satisfaction at the same time has become key to winning customers. A personalized recommendation system mainly predicts a user's preferences among the available options and recommends the most appropriate options to the user. Collaborative filtering is the most widely used recommendation algorithm in movie recommendation systems. Unlike the traditional content-based recommendation system, collaborative filtering makes recommendations based on groups of users with the same interests or on similar items. A movie item has properties such as release time and category, and user attributes include occupation, age and so on. Building on the traditional collaborative filtering algorithm and the characteristics of movie recommendation systems, this paper presents a collaborative filtering algorithm that is both user-based and item-based.

TRADITIONAL COLLABORATIVE FILTERING ALGORITHM AND DEFECT

User-based collaborative filtering algorithm

User-based collaborative filtering builds the target user's recommendation list from the opinions of other users. That is, for a given user, we find the most similar users and base the final recommendation on the choices of those similar users. It can be divided into the following three stages:
(1) Interest modeling: convert the user's interest information into information that computers can identify.
(2) Nearest-neighbor search: find the group of users whose interests are most similar to the current user's.
(3) Item recommendation: use the rating information of the nearest neighbors found in the previous stage to predict the user's ratings for the items the user has not rated, sort the unrated items by predicted rating in descending order, and take the top N items as the current user's recommendation list.
A recommendation system based on this kind of collaborative filtering can obtain relatively accurate recommendations and can also uncover the potential needs of the target user, but it still suffers from data sparsity, cold start and other issues.

Item-based collaborative filtering algorithm

Item-based collaborative filtering builds the target user's recommendation list from the scores of other items. That is, for a given item that has not been scored, we look for the most similar items, predict the rating of the item from the scores of those similar items, and make the final recommendation. It can be divided into two stages:
(1) Find similar items: compute the similarity between items directly and search for the set of neighbors most similar to the target item.
(2) Recommend items: use the neighbors found in the previous stage to predict the ratings of the items that have not been scored.
Because the similarity between items can be calculated offline, the item-based algorithm saves computing time, and it can still give the target user good recommendations even when the matrix is relatively sparse. However, the algorithm cannot recommend across types: users only receive items similar to those they already know, which makes it hard to discover their potential interests.

IMPROVED COLLABORATIVE FILTERING ALGORITHM

The improved algorithm starts from a summary of the traditional user-based and item-based collaborative filtering algorithms.



978-1-4799-2791/13 $26.00 © 2013 IEEE 349
DOI 10.1109/ISDEA.2013.483
10.1109/ISDEA.2013.21
10.1109/ISDEA.2013.85
On the basis of the two, and combining their respective characteristics, an improved combination forms a new collaborative filtering algorithm. It first uses the item-based collaborative filtering algorithm to calculate the similarity between items and predicts a user's missing scores from the user's scores for similar items, so that users have more commonly scored items. This effectively remedies the weakness of the traditional user similarity measure when the rating data are extremely sparse. It then applies the user-based collaborative filtering algorithm; the nearest neighbors calculated for the target user are more accurate, which can significantly improve the accuracy of the recommendation algorithm.

Algorithm model is as follows:

(1) Definition: M users and N films form an M × N user-item rating matrix $R$. Let $R_{i,j}$ be the score of user i for film j: if user i has scored film j, the score is recorded as $R_{i,j}$; if the score does not exist, it is recorded as 0. $P_{i,j}$ is the predicted score of user i for film j.

(2) Predict the users' missing scores by finding neighboring items.
① Compute the union of the films scored by users i and j, denoted $RI_{i,j}$:
$$RI_{i,j} = RI_i \cup RI_j$$
where $RI_i$ and $RI_j$ are the sets of films scored by users i and j respectively.
② Within the item set $RI_{i,j}$, the films that user i has not scored are:
$$NRI_i = RI_{i,j} - RI_i$$
③ Predict the user's score for each film in the set $NRI_i$ in turn. Suppose we need the score of user i for film k. Every film belongs to one or more types; if film k is both a costume drama and a comedy, then the costume drama and comedy type columns of k are labeled 1. Let $X_k$ be the set of types of film k and $X_n$ the set of types of film n. Using the Jaccard distance, the similarity of k and n is calculated as follows:
$$Sim(k,n) = 1 - \frac{|X_k \cup X_n| - |X_k \cap X_n|}{|X_k \cup X_n|}$$
Then all items whose similarity to film k is greater than the item nearest-neighbor similarity threshold SI form the neighbor set $NI_k$ of k.
④ Calculate the predicted score $P_{i,k}$ of user i for film k:
$$P_{i,k} = \bar{R}_k + \frac{\sum_{n \in NI_k} Sim(k,n) \, (R_{i,n} - \bar{R}_n)}{\sum_{n \in NI_k} |Sim(k,n)|}$$
where $\bar{R}_k$ and $\bar{R}_n$ are the mean scores given by all users to films k and n.
Repeating the above steps gives the predicted scores of user i for all the films in $NRI_i$. In the same way, the predicted scores of user j for all the films in $NRI_j$ can be calculated. Users i and j then have scores for every film in $RI_{i,j}$ that satisfies the similarity threshold SI; that is, for any $k \in RI_{i,j}$, the score of user i for k is:
$$R'_{i,k} = \begin{cases} R_{i,k}, & \text{the score of user } i \text{ for movie } k \\ P_{i,k}, & \text{the predicted score of user } i \text{ for movie } k \end{cases}$$

(3) Complete the prediction of the remaining unrated items by finding similar neighboring users.
① Compute the similarity of user i and user j using the cosine similarity with score normalization:
$$Sim(i,j) = \frac{\sum_{c \in UI_{i,j}} (R_{i,c} - \bar{R}_i)(R_{j,c} - \bar{R}_j)}{\sqrt{\sum_{c \in UI_{i,j}} (R_{i,c} - \bar{R}_i)^2} \, \sqrt{\sum_{c \in UI_{i,j}} (R_{j,c} - \bar{R}_j)^2}}$$
where $UI_{i,j}$ is the intersection of the films scored by users i and j, namely:
$$UI_{i,j} = RI_i \cap RI_j$$
and $\bar{R}_i$ and $\bar{R}_j$ are the averages of all actual scores given by user i and user j. For example, if user i has scored only films 1, 3 and 5, with scores 5, 4 and 3 respectively, then $\bar{R}_i = (5 + 4 + 3) / 3 = 4$.
② Compute the similarity between user i and each of the remaining (M - 1) users in turn and build a user set in descending order of similarity. That is, over the whole user space build the user set $NU_i = \{U_1, U_2, \ldots, U_{M-1}\}$ with $i \notin NU_i$, where $U_1$ has the highest similarity to i, $U_2$ the next, and so on down to the lowest. With the similar-neighbor user threshold set to KU, the first KU users of $NU_i$ form the similar neighbor set $RU_i$ of user i.
③ Use the similar neighbors of user i to complete the prediction for the remaining films that user i has not scored. For instance, to predict the score of user i for a remaining unscored film k:
$$P_{i,k} = \bar{R}_i + \frac{\sum_{j \in RU_i} Sim(i,j) \, (R_{j,k} - \bar{R}_j)}{\sum_{j \in RU_i} |Sim(i,j)|}$$
Repeating the above steps gives the predicted scores of all users for all the films they have not scored.

(4) Complete the recommendation. The recommended sequence can be selected by either of the following methods:
① Return all items whose predicted score is greater than the score threshold r to the user as the recommendation result.
② Sort all items by predicted score from high to low and select the top N items as the recommended sequence.

A TEST INSTANCE OF THE IMPROVED COLLABORATIVE FILTERING ALGORITHM

The experimental data set

The film ratings in our experiment come from the ml-100k version of the MovieLens dataset, which contains 100000 ratings of 1682 films by 943 users; each user has scored at least 20 films. User ratings range from 1 to 5 points, and a higher score indicates a stronger preference of the user for the film. The data set mainly consists of three data tables: a user table (user), a movie table (item) and a rating table (rates). The table structure is as follows:

Table 4.1 The user table
UserID | Gender | Age | Occupation | Code

Table 4.2 The item table
MovieID | Title | Release time | Genres

Table 4.3 The rates table
UserID | MovieID | Rating

This experiment uses MovieLens's own training set and test set: the 100000 MovieLens ratings are randomly divided into a training set of 80000 ratings and a test set of 20000 ratings. The algorithm runs on the training set, and testing targets items that users have already rated: the algorithm uses the training set to predict film scores, which are then compared with the corresponding actual ratings in the test set to assess the recommendations.

The experimental results metrics

This experiment uses the mean absolute error (MAE) to measure the algorithms. MAE is the deviation between the predicted rating of a user for an item and the user's actual rating for that item. The smaller the MAE, the better the recommendation results. Suppose that user $U_i$ in the test set has rated $n_i$ films ($n_i \le N$), with actual scores $R_{i,j}$ ($j = 1, 2, \ldots, n_i$), and that the scores predicted from the training set are $r_{i,j}$. Then for user $U_i$:
$$MAE_i = \frac{\sum_{j=1}^{n_i} |R_{i,j} - r_{i,j}|}{n_i}$$
From the value $MAE_i$ of each user ($i = 1, 2, \ldots, M$), the total MAE can be calculated:
$$MAE = \frac{\sum_{i=1}^{M} MAE_i}{M}$$

Experimental scheme

This experiment tests three algorithms: the user-based collaborative filtering algorithm, the item-based collaborative filtering algorithm, and the improved combined collaborative filtering algorithm. The item nearest-neighbor similarity threshold SI is set to 4/5. With more accumulated rating data, namely under stable conditions, the similar-neighbor user threshold KU is set to 20, 25, 30, 35, 40, 45 and 50 in turn to compare the recommendation quality of the algorithms.

The experimental results and analysis

Table 4.4 Recommendation quality of the algorithms under the stable condition
Number of nearest neighbors (user threshold KU) | MAE, improved | MAE, item-based | MAE, user-based
20 | 0.8584 | 0.8635 | 0.8654
25 | 0.8536 | 0.8568 | 0.858
30 | 0.8506 | 0.8547 | 0.8569
35 | 0.8498 | 0.8528 | 0.8546
40 | 0.8486 | 0.851 | 0.8526
45 | 0.846 | 0.8506 | 0.851
50 | 0.8448 | 0.8496 | 0.8495

Diagram 4.1 MAE of three kinds of algorithms (horizontal axis ticks 20 to 50; vertical axis 0.83 to 0.87; series: MAE improved, MAE Item-based, MAE User-based)

The experimental results support the following conclusions. As the number of nearest neighbors increases, the MAE of all three collaborative filtering algorithms decreases, and when the number of nearest neighbors is set too small, none of the algorithms recommends well. Although the MAE of all three algorithms decreases as the number of neighbor users increases, the improved collaborative filtering algorithm provides better recommendation results than both the user-based and the item-based collaborative filtering algorithms at every value of KU in Table 4.4.

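The hybrid procedure of the improved algorithm (item-based filling with genre similarity, followed by user-based prediction) can be sketched in Python under simplifying assumptions. The toy ratings, the genre sets, the threshold values and all function names below are hypothetical; the sketch uses only neighbors who actually scored the target film in the final step, and it is an illustration rather than the authors' implementation.

```python
import math

# Hypothetical toy data: ratings[user][film] = score, genres[film] = set of types.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "c": 2},
    "u3": {"a": 1, "b": 5, "c": 4},
}
genres = {
    "a": {"comedy", "costume drama"},
    "b": {"comedy", "costume drama"},
    "c": {"comedy"},
}
SI = 0.4  # item similarity threshold (assumed toy value; the experiment uses 4/5)
KU = 2    # number of nearest-neighbor users (assumed toy value)


def user_mean(u):
    """Average of the actual scores given by user u."""
    return sum(ratings[u].values()) / len(ratings[u])


def film_mean(k):
    """Average score of film k over the users who scored it."""
    scores = [row[k] for row in ratings.values() if k in row]
    return sum(scores) / len(scores)


def jaccard_sim(k, n):
    """1 minus the Jaccard distance of the type sets, i.e. |Xk & Xn| / |Xk | Xn|."""
    union = genres[k] | genres[n]
    return len(genres[k] & genres[n]) / len(union) if union else 0.0


def item_predict(i, k):
    """Predict user i's score for film k from similar films that i has scored."""
    sims = {n: jaccard_sim(k, n) for n in ratings[i] if n != k}
    neigh = {n: s for n, s in sims.items() if s > SI}
    den = sum(abs(s) for s in neigh.values())
    if den == 0:
        return None  # no scored film passes the threshold
    num = sum(s * (ratings[i][n] - film_mean(n)) for n, s in neigh.items())
    return film_mean(k) + num / den


def completed_row(u, films):
    """Actual scores of u plus item-based predictions for the other films."""
    row = dict(ratings[u])
    for k in films - set(row):
        p = item_predict(u, k)
        if p is not None:
            row[k] = p
    return row


def user_sim(i, j):
    """Mean-centered cosine similarity over the completed rows of users i and j."""
    union = set(ratings[i]) | set(ratings[j])
    ri, rj = completed_row(i, union), completed_row(j, union)
    common = sorted(set(ri) & set(rj))
    mi, mj = user_mean(i), user_mean(j)
    num = sum((ri[c] - mi) * (rj[c] - mj) for c in common)
    den = math.sqrt(sum((ri[c] - mi) ** 2 for c in common)) * math.sqrt(
        sum((rj[c] - mj) ** 2 for c in common))
    return num / den if den else 0.0


def user_predict(i, k):
    """Predict user i's score for film k from the KU most similar users."""
    sims = {j: user_sim(i, j) for j in ratings if j != i}
    nearest = sorted(sims, key=sims.get, reverse=True)[:KU]
    raters = [j for j in nearest if k in ratings[j]]
    den = sum(abs(sims[j]) for j in raters)
    if den == 0:
        return user_mean(i)  # fall back to the user's own average
    num = sum(sims[j] * (ratings[j][k] - user_mean(j)) for j in raters)
    return user_mean(i) + num / den


print(round(user_predict("u1", "c"), 3))
```

In this toy data users u1 and u2 share only film a before filling; after the item-based step both rows cover films a, b and c, so the user similarity is computed over three films instead of one. Sorting the predictions of `user_predict` over a user's unscored films and keeping the top N would give the recommended sequence.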
CONCLUSIONS
The user-based collaborative filtering algorithm makes recommendations quickly and accurately, but it suffers from problems such as data sparseness and scalability. The item-based collaborative filtering algorithm addresses the data sparseness problem of the user-based algorithm, but because it recommends on the basis of similar items, it cannot recommend across types; that is, it lacks novel discoveries. The improved combination of the item-based and user-based collaborative filtering algorithms can address the problems of both at the same time, but its scalability problems still exist, so the collaborative filtering algorithm remains to be further improved.
