
2012 International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, Mumbai, India

Comparing performance of Collaborative Filtering Algorithms


Vandana A. Patil
Assistant Professor, Department of Information Technology St. Francis Institute of Engineering Borivali(W), Mumbai, India E-mail: vandana.patil@rediffmail.com
Abstract: Recommender systems are now widely used to make personalized recommendations for products or services during a live interaction. Collaborative filtering is the most successful and commonly used personalized recommendation technology. The open nature of collaborative recommender systems gives malicious users an opportunity to access the systems with multiple fictitious identities and insert a number of fake user profiles in an attempt to bias the recommender systems in their favor. In the proposed work, we combine a user trust mechanism with the collaborative filtering algorithm in order to improve the robustness of the recommendation algorithm and ensure the quality of recommendations. We propose a computational model of trust and then a collaborative filtering algorithm based on it. This User Trust Based Collaborative Filtering algorithm is further modified to consider the impact of time on the user ratings. The performance of all three algorithms is compared in terms of the Mean Absolute Error between the actual ratings and the ratings predicted by the respective recommender system.

Keywords: Personalized Recommendation; Recommender System; Collaborative Filtering (CF); User Trust Model

Lata Ragha
Associate Professor, Department of Computer Engineering Terna Engineering College Navi Mumbai, India E-mail: lata.ragha@gmail.com

I. INTRODUCTION

The rapid development of e-commerce has brought great convenience. But with the continuing expansion of e-commerce, there are more and more goods on the internet, which means customers need to spend a lot of time finding what they like or want. Many customers may lose their patience and interest in online shopping because they cannot find things in a short time; much of their time is spent scanning irrelevant information and products. Personalized recommendation systems in e-commerce emerged in this context to solve this problem. Based on customer information, such a system analyses customers' hobbies and interests, proactively offers personalized products to them, and helps them make purchase decisions. Collaborative filtering is the most successful and commonly used personalized recommendation technology. Its basic idea is to recommend to the target user goods that other, similar customers are also interested in, according to the principle of user interest similarity.

The traditional user-based collaborative recommendation algorithm uses the similarity of users' tastes to generate recommendations. This profile-level similarity method is subject to manipulation by malicious users. If malicious users change their attack strategies, in particular by collaborating with others, this method does not effectively reduce the negative effect. Because of the lack of trust between users, they cannot clearly judge whether someone can be trusted or not. Thus traditional collaborative recommender systems cannot prevent this kind of malicious attack, and how to ensure the quality of recommendations for personalized collaborative recommender systems in the face of profile injection attacks has become an important issue [1]. Recent research on collaborative recommender systems has focused on techniques that can be used to protect the predictive integrity of collaborative recommenders from malicious profile injection attacks. The reliability of users (trust) should therefore also be taken into account within the recommendation process.

It has also been observed that a user's present interest does not remain the same over time [2]. Since interest changes with time, the time factor should also be considered when calculating user trust from user interactions with reference to item ratings, in order to improve the performance of the recommender system. This paper compares the traditional collaborative filtering algorithm with the user trust based collaborative filtering algorithm and with its modified version, the time based user trust based collaborative filtering algorithm.

978-1-4577-2078-9/12/$26.00 ©2011 IEEE

This paper is organised as follows: Section 1 is the introduction to collaborative filtering, Section 2 gives a brief overview of the related work, Section 3 discusses the basic steps in traditional collaborative filtering algorithms, the proposed system with all three of its models is described in Section 4, the evaluation metrics and the methodology adopted are given in Sections 5 and 6 respectively, and experimental results are presented in Section 7, followed by the conclusion.

II. RELATED WORK

Personalized recommendation systems in e-commerce use customer information to analyse customers' hobbies and interests, proactively offer personalized products to them, and help them make purchase decisions. There are two kinds of recommender systems: content-based recommender systems and collaborative filtering recommender systems. Content-based recommender systems, such as Libra, CiteSeer and WebMate, are suitable for recommending items whose content a machine can analyze automatically. Research shows that content-based recommender systems are not good enough in most cases. Collaborative filtering recommender systems emerged as a new method to overcome the shortcomings of content-based recommender systems. With the development of collaborative filtering recommendation algorithms, many improved algorithms have been proposed. There are three types of traditional collaborative filtering:

User-based collaborative filtering, which finds neighbors with similar interests or hobbies and then recommends certain kinds of items to the target customer based on those neighbors.

Model-based collaborative filtering, which uses customers' historical data to build a model and then predicts the resources the target user is interested in according to this model.

Item-based collaborative filtering, which focuses on the similarity between items and recommends to the target user items he has not visited, based on the items he has visited.

Montaner, M., Lopez, B., de la Rosa, J.L. [4] introduced a trust model into the recommendation algorithm, so that users could get recommendations from a trust-building group. The idea was that the trust factor was based on the customers' satisfaction with the recommended items, and the trust value could be dynamically adjusted. The drawback of this method was the lack of trust information among users at the beginning of recommendation; moreover, building the trust group was inefficient. So it was not an effective way to defend against malicious noise. Paolo, M., Meersman, R., Tari, Z. (eds.) [5] proposed a method in which users who accepted the recommendations would evaluate the recommended items. The active user would get a rating that stands for the trust value of the target user for the active user. The trust information was propagated among users who had a trust relation with the accepter. John, O., Barry, S. [6] proposed a model whose basic idea was to build a relation between users and recommended items. On this basis, within the recommendation process a higher weight is given to active users who had made more accurate recommendations on items than to those with poor records. They supposed that users with a high authentic value have less intention to deceive others. The item-trust recommendation algorithms were more effective in defending against random attacks [7], but if the malicious users changed their attack strategies, in particular by collaborating with others, this method would not effectively reduce the negative effect. Because of the lack of trust between users, they could not clearly judge who accepted an item and who could be trusted. To overcome this drawback, in this paper we exploit trust information explicitly expressed by the users to improve the robustness of recommender systems.

III. BASIC STEPS IN CF ALGORITHM

In daily life, people tend to consult their friends or people they trust about unfamiliar problems, and make their own choices based on these judgments and opinions. A typical collaborative filtering algorithm is based on the similarity of users' interests. Its basic principle is to find the neighbors of a user from historical ratings data and then make recommendations to the target user according to the ratings of the nearest neighbors. This process comprises the following steps [4]:

A. Data representation
The collaborative filtering algorithm of a traditional system uses the user-item rating matrix R(m,n) to find the target user's nearest neighbor set. R(m,n) is an m×n matrix in which the m rows represent users, the n columns represent items, and the cell in row i and column j holds the score given by user i to item j.

B. Find the nearest neighbor
Here, we use the Pearson correlation to calculate the similarity of users. Let D = {U, I, R} be a data source of a recommender system, where U = {user1, user2, ..., user m} is the set of users of the system, I = {item1, item2, ..., item n} is the set of items of the system, and R is the user ratings matrix, where ri,j in R represents the rating of user i on item j. The similarity between user u and user n is given by the following Pearson correlation coefficient equation:

$$ Sim(u,n) = \frac{\sum_{c \in I_{u,n}} (R_{u,c} - \bar{R}_u)(R_{n,c} - \bar{R}_n)}{\sqrt{\sum_{c \in I_{u,n}} (R_{u,c} - \bar{R}_u)^2}\,\sqrt{\sum_{c \in I_{u,n}} (R_{n,c} - \bar{R}_n)^2}} \qquad (1) $$

where $R_{u,c}$ and $R_{n,c}$ are the ratings of user u and user n on item c, and $\bar{R}_u$ and $\bar{R}_n$ are the average ratings over all rated items for u and n, respectively. The set $I_{u,n}$ stands for the items that user u and user n have co-rated. It is important to note that the coefficient can be computed only if there are items rated by both users [1].

C. Produce Predictions

$$ P_{i,c} = \bar{R}_i + \frac{\sum_{j=1}^{N} sim(i,j)\,(R_{j,c} - \bar{R}_j)}{\sum_{j=1}^{N} sim(i,j)} \qquad (2) $$

where $\bar{R}_i$ is the average score of user i, $sim(i,j)$ is the similarity coefficient between user i and user j in the nearest-neighbor set, $R_{j,c}$ is the score of user j on item c, $\bar{R}_j$ is the average score of user j, and N is the number of nearest neighbors.

IV. PROPOSED SYSTEM

The proposed system contains three different models described below:

A. User-Based CF Model (UBCF)
This is the basic CF based recommendation model. Its results are needed as a baseline for comparison with the results of the enhanced CF techniques. In a user-based CF recommendation system, the user ratings data are usually described as a user-item matrix $R_{m \times n}$, in which m is the number of users, n is the number of items, and $R_{i,j}$ is the score of item j rated by user i, indicating the user's degree of preference for the item. The most important step in user-based CF is the search for the target user's neighbors. Usually, similarity is adopted as a measure of how close users' interests and hobbies are, computed from their common ratings data.

B. User-Trust Based CF Model (UTCF)
This is the enhanced version of the user-based CF model, proposed to address the limitations of the previous model. In this model, trust between users is used to predict the user ratings, and recommendations are provided based on those predictions. To model the degree of trust, we assume that the target user can assign a certain value to the active user by using the items co-rated by the two users. We use two types of trust: direct trust and recommendation trust. The former is built by users through exchange experiences such as friendship or shared good views. The latter is the credit awarded to a user by other users who are publicly regarded as reliable [5].

Direct Trust: Let $DT_n^u$ represent the direct trust of the target user u for the active user n. The direct trust value is given by Equation (3):

$$ DT_n^u = \frac{\sum_{i=1}^{k} t_n^i}{k} \qquad (3) $$

where $t_n^i$ represents the trust value of target user u for active user n on item i, and k is the number of items that the active user and the target user have co-rated.

Recommendation Trust: The recommendation trust is computed with the help of the target user's trust group, whose members have interacted with the active user. Let m be the trust group of the target user, containing the users who have a reliable interaction with the active user n. Let $IT_n^u$ represent the recommendation trust computed by the target user u for active user n:

$$ IT_n^u = \frac{\sum_{i=1}^{k} T_{m_i}^u \cdot Cr(n)_{m_i}}{\sum_{i=1}^{k} T_{m_i}^u} \qquad (4) $$

where $T_{m_i}^u$ is the trust value of the target user u for the reliable user $m_i$ in the set m, and $Cr(n)_{m_i}$ is the credibility of active user n as judged by the reliable user $m_i$ in the trust group.
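As a rough illustration of Equations (3) and (4), the following Python sketch computes a direct trust value as the mean of per-item trust values and a recommendation trust value as a trust-weighted average of credibilities. The function names, the sample values, and the choice to return 0 when there are no co-rated items or when the trust weights sum to zero are assumptions made for this example; they are not part of the proposed model.

```python
from typing import Sequence


def direct_trust(item_trust: Sequence[float]) -> float:
    """Eq. (3): mean of the per-item trust values t_n^i over the k co-rated items."""
    k = len(item_trust)
    if k == 0:
        return 0.0  # assumption: no co-rated items means no direct trust
    return sum(item_trust) / k


def recommendation_trust(trust_in_group: Sequence[float],
                         credibility: Sequence[float]) -> float:
    """Eq. (4): credibility of the active user, weighted by T_{m_i}^u."""
    if len(trust_in_group) != len(credibility):
        raise ValueError("one credibility value is needed per trusted user")
    total = sum(trust_in_group)
    if total == 0:
        return 0.0  # assumption: an empty or fully untrusted group gives zero
    weighted = sum(t * c for t, c in zip(trust_in_group, credibility))
    return weighted / total


# Hypothetical values, chosen only to exercise the two formulas.
dt = direct_trust([1.0, 0.5, 1.0, 0.5])
it = recommendation_trust([0.5, 1.0, 0.5], [1.0, 0.5, 0.0])
print(dt, it)
```

In this sketch the trust group is passed as two parallel sequences, one entry per reliable user; a real implementation would probably derive both from stored user interactions.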

User Trust Value: Let $Trust_n^u$ stand for the combined trust value of the target user u for the active user n. It is computed by combining the direct trust $DT_n^u$, which is the trust value of the target user u for the active user n, and the recommendation trust $IT_n^u$, which expresses the opinion of the set of users trusted by the target user u. The combined trust value is then given by Equation (5):

$$ Trust_n^u = \alpha \cdot DT_n^u + \beta \cdot IT_n^u \qquad (5) $$

where α and β are weighting factors that adjust the two parts, constrained by α + β = 1. After calculating the user similarity and the trust value, the compound weight is generated, which is used in generating predicted ratings.

C. Time Based User Trust CF Model (TBUTCF)
This method improves the existing user-trust based algorithm by incorporating the weight of user rating time, which reflects the change of user interest with time and enhances the evaluation accuracy [2]. As a user's interest may change dynamically over time, the user may give different ratings to the same item at different times. However, the traditional method treats all user ratings equally in the search for the nearest neighbors, which degrades the accuracy of neighbor identification and results in poor recommendation quality. So, we introduce a time weight to factor in the change of user interest with time and improve the accuracy of the recommendations. The value of the time weight is given by the following equation.

$$ H(t) = \frac{1}{1 + e^{(t_c - t_r(i,j))/T}} \qquad (6) $$

where $t_c$ represents the current time at which the recommendation is given, $t_r(i,j)$ represents the time at which user i rated item j, and T is the time span constant [13]. We use the time weight to modify the previous method of calculating the similarity between users. The improved similarity measure can be described as follows:

$$ R'_{i,j} = R_{i,j} \cdot H(t) \qquad (7) $$

$$ Sim^*(u,v) = \frac{\sum_{i \in I_{uv}} (R'_{u,i} - \bar{R}_u)(R'_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R'_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (R'_{v,i} - \bar{R}_v)^2}} \qquad (8) $$

where $R'_{i,j}$ is the revised rating score for item j by user i and $R_{i,j}$ is the original rating score for item j by user i. To calculate the similarity in the User-Trust based CF algorithm, we used the improved rating scores derived using the time weight H(t). This gives an improved similarity between users and hence improves the performance of the recommendation system. The experimental results of the above models are calculated in terms of the Mean Absolute Error (MAE) and compared at the end to evaluate their performance.

V. EVALUATION METRICS

Accuracy is a major indicator for evaluating recommender system performance. As one of the most commonly used measures, the mean absolute error (MAE) is adopted here as the metric to compare the quality of our proposed approach with other collaborative filtering methods. Supposing the top-N predicted rating set for the active user is $p_1, p_2, \ldots, p_N$ and the corresponding actual rating set is $q_1, q_2, \ldots, q_N$, the MAE is defined as follows:

$$ MAE = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N} \qquad (9) $$

where N is the number of items recommended to the active user. A lower MAE indicates more accurate prediction of user interest by the recommendation system [1].

VI. METHODOLOGY ADOPTED

To evaluate the performance of the algorithms on realistic information, ratings data for different items by actual users were needed. The data was therefore collected from the MovieLens web site. The MovieLens dataset was collected by the GroupLens Research Project at the University of Minnesota. The dataset used in this implementation consists of 100,000 ratings (1-5) from 943 users on 1682 movies [17]. As the basic MovieLens dataset has records for 943 users, it was not possible to run the case studies on the entire dataset, because the time required to complete all the case studies proposed in our implementation would be very high. Hence, to reduce the time required to run the various cases, networks of different sizes, varying from 200 to 400 nodes, were chosen for analysis. The records collected in each sub-dataset were divided into two groups, the Training Dataset and the Test Dataset. The training dataset was used as the information known to the algorithm, and the test dataset was used for evaluating the algorithm. The share of training data was varied from 70% to 90%. Finally, the predicted ratings obtained by running the algorithms were compared with the actual ratings, and the Mean Absolute Error (% MAE) was calculated to evaluate the performance of each algorithm.

VII. EXPERIMENTAL RESULTS

With the user based collaborative filtering algorithm, the implemented model was tested for various network sizes. 80% of the total dataset was provided as the training dataset for networks of 250, 350 and 450 nodes. For each network size, the performance of the model was evaluated for different numbers of iterations.

From the results obtained, it was seen that the MAE trend stabilizes after 800 or more iterations. Based on these observations, and to leave a sufficient margin in the number of iterations for a fair analysis of the algorithm, the number of iterations was fixed at 1200. Similarly, analysis showed that the UTCF algorithm performs best at a value of 0.7 for the alpha parameter, and that the time constant T can be fixed at 45 days, which gives the optimum result for the TBUTCF algorithm.
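As an illustration of the MAE of Equation (9), which is the measure used for all comparisons in this paper, a minimal Python sketch is given here. The function name and the sample ratings are hypothetical, and rejecting empty or unequal-length inputs is a choice made for the example.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Eq. (9): average of |p_i - q_i| over the N compared ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating pair is required")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)


# Hypothetical predictions and held-out ratings on the 1-5 scale.
p = [4.0, 3.5, 2.0, 5.0]
q = [5.0, 3.0, 2.0, 4.0]
print(mean_absolute_error(p, q))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```

A lower value returned by this function corresponds to predictions that lie closer to the held-out test ratings.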

Various case studies were conducted for varied data sparsity and network sizes on all three algorithms, first separately and then simultaneously.

Combined Analysis: After freezing all the required parameters, the performance of all three algorithms was evaluated simultaneously. The parallel evaluation of the algorithms was done for network sizes of 250, 300 and 350 nodes. For each network size, the sparsity of the data was varied from 70% to 90% in steps of 10% and the performance results were captured. The trends are shown in Fig. 1 to Fig. 6 below for networks containing 250, 300 and 350 nodes, with training dataset percentages of 70% and 90% respectively. A similar trend is observed in the graphs of all the remaining case studies. The impact of changes in network size and data sparsity on the performance of the individual algorithms, with the same training and test datasets, was analyzed to draw the final conclusion. From the results of the parallel evaluation of all three algorithms, it was noted that the TBUTCF algorithm performs far better than the other two algorithms. The performance of TBUTCF was consistently stable across different network sizes as well as for variable sparsity in the datasets.
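To make the compared building blocks more concrete, the Python sketch below implements the Pearson similarity of Equation (1), the prediction of Equation (2), and the time weighting of Equations (6) and (7). It is only a sketch under stated assumptions: all names and the toy ratings are hypothetical, Equation (6) is read as a logistic decay in which older ratings receive smaller weights, only positively correlated neighbours are used in the prediction, and undefined similarities are treated as 0. None of these choices is confirmed by the paper.

```python
import math
from typing import Dict

UserRatings = Dict[str, float]  # item id -> rating


def time_weight(t_now: float, t_rated: float, span: float) -> float:
    """Eq. (6), read as a logistic decay: 0.5 for a fresh rating, less for older ones."""
    return 1.0 / (1.0 + math.exp((t_now - t_rated) / span))


def revise(ratings: UserRatings, rated_at: Dict[str, float],
           t_now: float, span: float) -> UserRatings:
    """Eq. (7): scale every rating by the time weight of its timestamp."""
    return {item: value * time_weight(t_now, rated_at[item], span)
            for item, value in ratings.items()}


def mean(ratings: UserRatings) -> float:
    return sum(ratings.values()) / len(ratings)


def pearson(ru: UserRatings, rv: UserRatings) -> float:
    """Eq. (1): sums run over co-rated items, means over all rated items."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: undefined similarity is treated as 0
    mu, mv = mean(ru), mean(rv)
    num = sum((ru[c] - mu) * (rv[c] - mv) for c in common)
    den = (math.sqrt(sum((ru[c] - mu) ** 2 for c in common))
           * math.sqrt(sum((rv[c] - mv) ** 2 for c in common)))
    return num / den if den else 0.0


def predict(data: Dict[str, UserRatings], user: str, item: str,
            n_neighbors: int = 2) -> float:
    """Eq. (2), restricted (by assumption) to positively correlated neighbours."""
    base = mean(data[user])
    scored = [(pearson(data[user], other), other)
              for name, other in data.items()
              if name != user and item in other]
    top = sorted((s for s in scored if s[0] > 0),
                 key=lambda s: s[0], reverse=True)[:n_neighbors]
    den = sum(sim for sim, _ in top)
    if den == 0:
        return base  # no usable neighbour: fall back to the user's mean
    return base + sum(sim * (other[item] - mean(other))
                      for sim, other in top) / den


# Hypothetical toy ratings on the 1-5 scale.
data = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob": {"a": 4.0, "b": 2.0, "c": 3.0, "d": 4.0},
    "carol": {"a": 1.0, "b": 5.0, "c": 2.0, "d": 1.0},
}
print(predict(data, "alice", "d"))
print(time_weight(100.0, 100.0, 45.0))  # 0.5 for a rating made "now"
```

Passing each user's ratings through `revise` before calling `pearson` gives a time-weighted similarity in the spirit of Equation (8); note that in this sketch the user means are then taken over the revised ratings, which is one possible reading of that equation.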
Figure 1. Combined performance evaluation for network with 250 nodes at 70% dataset

Figure 2. Combined performance evaluation for network with 250 nodes at 90% dataset

Figure 3. Combined performance evaluation for network with 300 nodes at 70% dataset

Figure 4. Combined performance evaluation for network with 300 nodes at 90% dataset

Figure 5. Combined performance evaluation for network with 350 nodes at 70% dataset

Figure 6. Combined performance evaluation for network with 350 nodes at 90% dataset

CONCLUSION

We proposed an improvement to recommendation systems based on traditional user based collaborative filtering. A user trust computation model and the corresponding user trust prediction algorithm were put forth and combined with the traditional user based collaborative filtering algorithm. From the results captured, it is observed that data sparsity has a huge impact on the performance of UBCF, while network size plays almost no role in determining the performance of UBCF. In contrast, neither data sparsity nor variation in network size has a significant impact on the performance of the UTCF algorithm. This indicates that, owing to the incorporation of user trust, the algorithm is able to reduce the negative impact of sparse as well as malicious data on the recommender system.

In the UTCF implementation, it was noticed that the ratings given for various items by various users at different times carry the same weight. In practice, however, a user's preference for an item does not remain constant forever; it is bound to change over time. Hence an enhancement to the UTCF algorithm was also proposed to consider the impact of the change in user interest with time. For this, a time weight was devised, which signifies the importance to be given to a particular item rating. This time weight parameter was combined with the existing UTCF algorithm to formulate a new algorithm, called the Time Weight Based User Trust Collaborative Filtering algorithm. The performance of all the algorithms has been evaluated and compared in terms of the MAE (Mean Absolute Error) value calculated for different numbers of nearest-neighbor groups. Analysis of the results reveals that introducing user trust into the UBCF algorithm improves the quality of recommendations and provides stability and consistency to the recommender system. In addition, considering the impact of the change of user interest with time further enhances the performance of the UTCF algorithm.

REFERENCES
[1] Fuzhi Zhang, Long Bai, and Feng Gao, A User Trust-Based Collaborative Filtering Recommendation Algorithm, Springer-Verlag Berlin Heidelberg 2009, ICICS 2009, LNCS 5927, pp. 411-424
[2] Zhimin Chen, Yi Jiang, Yao Zhao, A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation, International Journal of Digital Content Technology and its Applications, Volume 4, Number 9, December 2010, pp. 106-113
[3] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen, Collaborative Filtering Recommender Systems, School of Electrical Engineering and Computer Science, Oregon State University, 2008
[4] Montaner, M., Lopez, B., de la Rosa, J.L., Developing trust in recommender agents, 1st International Conference on Autonomous Agents, ACM Press, Bologna (2002), pp. 304-305
[5] Paolo, M., Meersman, R., Tari, Z. (eds.), Trust-aware collaborative filtering for recommender systems, LNCS, vol. 3290, Springer, Heidelberg (2004), pp. 492-508
[6] John, O., Barry, S., Trust in Recommender Systems, 10th International Conference on Intelligent User Interfaces, ACM Press, New York (2005), pp. 167-174
[7] Xinyi Bu, Xiujuan He, An Optimized Trust Factor Based Collaborative Filtering Recommendation Algorithm In E-Commerce, International Conference of Information Science and Management Engineering, IEEE 2010, pp. 468-471
[8] Yanhong Guo, Xuefen Cheng, Dahai Dong, Chunyu Luo, Rishuang Wang, An Improved Collaborative Filtering Algorithm Based on Trust in E-Commerce Recommendation Systems, research supported by a grant from the Chinese National Science Foundation Key Project, IEEE 2010, pp. 1-4
[9] Xiao Cheng Chen, Run Jia Liu, Hui You Chang, Research of Collaborative Filtering Recommendation Algorithm Based on Trust Propagation Model, International Conference on Computer Application and System Modeling (ICCASM 2010), IEEE 2010, pp. V4-177-V4-183
[10] Jia Yubo, Cai Hao, Huang Chengwei, A Collaborative Filtering Recommendation Algorithm Based on User Trust Model, First International Conference on Networking and Distributed Computing, IEEE 2010, pp. 213-217
[11] Yang Huai-Zhen, Li Lei, An Enhanced Collaborative Filtering Algorithm Based on Time Weight, International Symposium on Information Engineering and Electronic Commerce, IEEE 2009, pp. 262-265
[12] Liang He, Faqing Wu, A Time-context-based Collaborative Filtering Algorithm, International Conference on Granular Computing, IEEE 2010, pp. 209-213
[13] Qian Wang, Min Sun, Cong Xu, An Improved User-model-based Collaborative Filtering Algorithm, Journal of Information & Computational Science 8:10, 2011, pp. 1837-1846
[14] François Fouss, Marco Saerens, Evaluating performance of recommender systems: An experimental comparison, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008, pp. 736-738
[15] Stephen Naicken, Anirban Basu, Barnaby Livingston and Sethalat Rodhetbhai, A Survey of Peer-to-Peer Network Simulators, stephennaicken.com/wp-content/.../paper-pgnet2006_p2psimsurvey.pdf
[16] Bruno DEFUDE, P2P simulation with PeerSim, ASR option, January/February 2007
[17] Datasets, GroupLens Research Files.htm
