SIGIR 2007 Proceedings Session 2: Routing and Filtering
rithm which exploits the information both from users and items. Moreover, this algorithm predicts the missing data of a user-item matrix if and only if we believe the prediction will bring positive influence to the recommendations for active users, instead of predicting every missing entry of the user-item matrix. The simulation shows that our novel approach achieves better performance than other state-of-the-art collaborative filtering approaches.

The remainder of this paper is organized as follows. In Section 2, we provide an overview of several major approaches for collaborative filtering. Section 3 shows the method of similarity computation. The framework of our missing data prediction and collaborative filtering is introduced in Section 4. The results of an empirical analysis are presented in Section 5, followed by a conclusion in Section 6.

2. RELATED WORK

In this section, we review several major approaches for collaborative filtering. Two types of collaborative filtering approaches are widely studied: memory-based and model-based.

2.1 Memory-based Approaches

The memory-based approaches are the most popular prediction methods and are widely adopted in commercial collaborative filtering systems [12, 16]. The most analyzed examples of memory-based collaborative filtering include user-based approaches [2, 7, 10, 22] and item-based approaches [5, 12, 17]. User-based approaches predict the ratings of active users based on the ratings of similar users, and item-based approaches predict the ratings of active users based on the information of similar items. User-based and item-based approaches often use the PCC algorithm [16] or the VSS algorithm [2] as the similarity computation method. PCC-based collaborative filtering generally achieves higher performance than the other popular algorithm, VSS, since it considers the differences of user rating styles.

2.2 Model-based Approaches

In the model-based approaches, training datasets are used to train a predefined model. Examples of model-based approaches include clustering models [11, 20, 22], aspect models [8, 9, 19] and the latent factor model [3]. [11] presented an algorithm for collaborative filtering based on hierarchical clustering, which tried to balance robustness and accuracy of predictions, especially when little data were available. The authors of [8] proposed an algorithm based on a generalization of probabilistic latent semantic analysis to continuous-valued response variables. The model-based approaches are often time-consuming to build and update, and cannot cover as diverse a user range as the memory-based approaches do [22].

2.3 Other Related Approaches

In order to take advantage of both memory-based and model-based approaches, hybrid collaborative filtering methods have been studied recently [14, 22]. [1, 4] unified collaborative filtering and content-based filtering, which achieved significant improvements over the standard approaches. At the same time, in order to solve the data sparsity problem, researchers proposed dimensionality reduction approaches in [15]. The dimensionality-reduction approach addressed the sparsity problem by deleting unrelated or insignificant users or items, which discards some information of the user-item matrix.

3. SIMILARITY COMPUTATION

This section briefly introduces the similarity computation methods in traditional user-based and item-based collaborative filtering [2, 5, 7, 17] as well as the method proposed in this paper. Given a recommendation system consisting of M users and N items, the relationship between users and items is denoted by an M × N matrix, called the user-item matrix. Every entry r_{m,n} in this matrix represents the score value r that user m gives to item n, where r ∈ {1, 2, ..., r_max}. If user m does not rate item n, then r_{m,n} = 0.

3.1 Pearson Correlation Coefficient

User-based collaborative filtering engaging PCC has been used in a number of recommendation systems [18], since it can be easily implemented and can achieve high accuracy compared with other similarity computation methods. In user-based collaborative filtering, PCC is employed to define the similarity between two users a and u based on the items they rated in common:

    Sim(a, u) = \frac{\sum_{i \in I(a) \cap I(u)} (r_{a,i} - \bar{r}_a) \cdot (r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I(a) \cap I(u)} (r_{a,i} - \bar{r}_a)^2} \cdot \sqrt{\sum_{i \in I(a) \cap I(u)} (r_{u,i} - \bar{r}_u)^2}},    (1)

where Sim(a, u) denotes the similarity between user a and user u, and i belongs to the subset of items which user a and user u both rated. r_{a,i} is the rating user a gave to item i, and \bar{r}_a represents the average rating of user a. From this definition, the user similarity Sim(a, u) lies in the interval [0, 1], and a larger value means users a and u are more similar.

Item-based methods such as [5, 17] are similar to user-based approaches; the difference is that item-based methods employ the similarity between items instead of users. The basic idea in similarity computation between two items i and j is to first isolate the users who have rated both of these items and then apply a similarity computation technique to determine the similarity Sim(i, j) [17]. The PCC-based similarity computation between two items i and j can be described as:

    Sim(i, j) = \frac{\sum_{u \in U(i) \cap U(j)} (r_{u,i} - \bar{r}_i) \cdot (r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U(i) \cap U(j)} (r_{u,i} - \bar{r}_i)^2} \cdot \sqrt{\sum_{u \in U(i) \cap U(j)} (r_{u,j} - \bar{r}_j)^2}},    (2)

where Sim(i, j) is the similarity between item i and item j, and u belongs to the subset of users who rated both item i and item j. r_{u,i} is the rating user u gave to item i, and \bar{r}_i represents the average rating of item i. Like the user similarity, the item similarity Sim(i, j) also lies in the interval [0, 1].

3.2 Significance Weighting

PCC-based collaborative filtering generally achieves higher performance than other popular algorithms like VSS [2], since it considers the differences of user rating styles. However, PCC will overestimate the similarities of users who happen to have rated a few items identically, but may not have similar overall preferences [13].
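Before moving to the significance-weighted version, the plain PCC similarity of Eq. (1) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the dict-of-dicts `ratings` layout and the choice to average each user over all of that user's rated items are assumptions. Eq. (2) is the same computation applied with the roles of users and items swapped.

```python
import math

def pcc_user_similarity(ratings, a, u):
    """Pearson correlation between users a and u over co-rated items (Eq. 1).

    `ratings` is assumed to be a dict of dicts: ratings[user][item] = score,
    with absent entries meaning "not rated".
    """
    common = set(ratings[a]) & set(ratings[u])
    if not common:
        return 0.0
    # Averages are taken over each user's own rated items, so the
    # correlation accounts for individual rating styles.
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = sum((ratings[a][i] - mean_a) * (ratings[u][i] - mean_u) for i in common)
    den_a = math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common))
    den_u = math.sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    if den_a == 0 or den_u == 0:
        return 0.0
    return num / (den_a * den_u)
```

Note that with no co-rated items the sketch returns 0 (no evidence of similarity), mirroring the paper's convention that unrated entries carry no weight.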
Herlocker et al. [6, 7] proposed to add a correlation significance weighting factor that devalues similarity weights based on a small number of co-rated items. Herlocker's latest research work [13] proposed to use the following modified similarity computation equation:

    Sim'(a, u) = \frac{Max(|I_a \cap I_u|, \gamma)}{\gamma} \cdot Sim(a, u).    (3)

This equation overcomes the problem of only few items being rated in common, but when |I_a \cap I_u| is much higher than γ, the similarity Sim'(a, u) will be larger than 1, and can even surpass 2 or 3 in worse cases. We use the following equation to solve this problem:

    Sim'(a, u) = \frac{Min(|I_a \cap I_u|, \gamma)}{\gamma} \cdot Sim(a, u),    (4)

where |I_a \cap I_u| is the number of items which user a and user u rated in common. This change bounds the similarity Sim'(a, u) to the interval [0, 1]. The similarity between items can then be defined as:

    Sim'(i, j) = \frac{Min(|U_i \cap U_j|, \delta)}{\delta} \cdot Sim(i, j),    (5)

where |U_i \cap U_j| is the number of users who rated both item i and item j.

4. COLLABORATIVE FILTERING FRAMEWORK

In practice, the user-item matrix of a commercial recommendation system is very sparse, and the density of available ratings is often less than 1% [17]. A sparse matrix directly leads to prediction inaccuracy in traditional user-based or item-based collaborative filtering. Some work applies data smoothing methods to fill the missing values of the user-item matrix. In [22], Xue et al. proposed a cluster-based smoothing method which first clusters the users using K-means, and then predicts all the missing data based on the ratings of the Top-N most similar users in the similar clusters. The simulation shows this method can generate better results than other collaborative filtering algorithms. But the cluster-based method limits the diversity of users in each cluster, and the clustering result of K-means relies on the pre-selected K users. Furthermore, if a user does not have enough similar users, the Top-N algorithm generates many dissimilar users, which will definitely decrease the prediction accuracy for the active users.

According to the analysis above, we propose a novel effective missing data prediction algorithm which predicts the missing data only when it fits the criteria we set. Otherwise, we do not predict the missing data and keep its value at zero. As illustrated in Fig. 1(a), before we predict the missing data, the user-item matrix is a very sparse matrix, and every user rates only a few items with r_{u,i}; the other, unrated entries are covered with shade. Using this sparse matrix to predict ratings for active users always results in giving bad recommendations to the active users. In our approach, we evaluate every shaded block (missing data) using the available information in Fig. 1(a). For every shaded block, if our algorithm achieves confidence in the prediction, we give this shaded block a predicted rating value \hat{r}_{u,i}. Otherwise, we set the value of this missing data to zero, as seen in Fig. 1(b).

[Figure 1: (a) The user-item matrix (m × n) before missing data prediction; only sparse observed entries r_{u,i} are present. (b) The user-item matrix (m × n) after missing data prediction; predicted entries are denoted \hat{r}_{u,i}, and observed entries are unchanged.]

Accordingly, the collaborative filtering is simplified into two simple questions. The first is "Under what circumstance does our algorithm have confidence to predict the shaded block?" and the second is "How to predict?". The following subsections answer these two questions.

4.1 Similar Neighbors Selection

Similar neighbor selection is a very important step in predicting missing data. If the selected neighbors are dissimilar to the current user, then the prediction of this user's missing data is inaccurate, which will finally affect the prediction results for the active users. In order to overcome the flaws of Top-N neighbor selection algorithms, we introduce a threshold η. If the similarity between a neighbor and the current user is larger than η, then this neighbor is selected as a similar user.

For every missing data r_{u,i}, a set of similar users S(u) towards user u can be generated according to:

    S(u) = \{u_a \mid Sim'(u_a, u) > \eta, u_a \neq u\},    (6)

where Sim'(u_a, u) is computed using Eq. (4). At the same time, for every missing data r_{u,i}, a set of similar items S(i) towards item i can be generated according to:

    S(i) = \{i_k \mid Sim'(i_k, i) > \theta, i_k \neq i\},    (7)

where θ is the item similarity threshold, and Sim'(i_k, i) is computed by Eq. (5). The selection of η and θ is an important step, since a very large value will cause a shortage of similar users or items, while a relatively small value will bring in too many similar users or items.

According to Eqs. (6) and (7), we define that our algorithm lacks enough confidence to predict the missing data r_{u,i} if and only if S(u) = ∅ ∧ S(i) = ∅, which means that user u does not have similar users and item i does not have similar items either. In that case, our algorithm sets the value of this missing data to zero. Otherwise, it predicts the missing data r_{u,i} following the algorithm described in Subsection 4.2.

4.2 Missing Data Prediction

User-based collaborative filtering predicts the missing data using the ratings of similar users, and item-based collaborative filtering predicts the missing data using the ratings of similar items. Actually, although users have their own rating styles, if an item is very popular and has obtained a very high average rating from other users, then the active user will have a high probability of giving this item a good rating too. Hence, predicting missing data using only user-based approaches or only item-based approaches will potentially ignore valuable information that could make the prediction more accurate. We propose to systematically combine user-based and item-based approaches, and take advantage of both user correlations and item correlations in the user-item matrix.

Given the missing data r_{u,i}, according to Eq. (6) and Eq. (7), if S(u) ≠ ∅ ∧ S(i) ≠ ∅, the prediction of missing
data P(r_{u,i}) is defined as:

    P(r_{u,i}) = \lambda \times \left( \bar{u} + \frac{\sum_{u_a \in S(u)} Sim'(u_a, u) \cdot (r_{u_a,i} - \bar{u}_a)}{\sum_{u_a \in S(u)} Sim'(u_a, u)} \right) + (1 - \lambda) \times \left( \bar{i} + \frac{\sum_{i_k \in S(i)} Sim'(i_k, i) \cdot (r_{u,i_k} - \bar{i}_k)}{\sum_{i_k \in S(i)} Sim'(i_k, i)} \right),    (8)

where λ is a parameter in the range [0, 1]. The parameter λ allows us to determine how much the prediction relies on user-based prediction versus item-based prediction. λ = 1 states that P(r_{u,i}) depends completely upon ratings from user-based prediction, and λ = 0 states that P(r_{u,i}) depends completely upon ratings from item-based prediction.

In practice, some users do not have similar users, i.e., the similarities between these users and all other users are less than the threshold η. Top-N algorithms ignore this problem and still choose the top n most similar users to predict the missing data, which will definitely decrease the prediction quality. In order to predict the missing data as accurately as possible, in case some users do not have similar users, we use the information of similar items instead of similar users to predict the missing data, and vice versa, as seen in Eq. (9) and Eq. (10). This consideration inspires us to fully utilize the information of the user-item matrix as follows:

If S(u) ≠ ∅ ∧ S(i) = ∅, the prediction of missing data P(r_{u,i}) is defined as:

    P(r_{u,i}) = \bar{u} + \frac{\sum_{u_a \in S(u)} Sim'(u_a, u) \cdot (r_{u_a,i} - \bar{u}_a)}{\sum_{u_a \in S(u)} Sim'(u_a, u)}.    (9)

If S(u) = ∅ ∧ S(i) ≠ ∅, the prediction of missing data P(r_{u,i}) is defined as:

    P(r_{u,i}) = \bar{i} + \frac{\sum_{i_k \in S(i)} Sim'(i_k, i) \cdot (r_{u,i_k} - \bar{i}_k)}{\sum_{i_k \in S(i)} Sim'(i_k, i)}.    (10)

The last possibility is that, given the missing data r_{u,i}, user u does not have similar users and, at the same time, item i does not have similar items either. In this situation, we choose not to predict the missing data; otherwise, it would bring negative influence to the prediction of the missing data r_{u,i}. That is:

If S(u) = ∅ ∧ S(i) = ∅, the prediction of missing data P(r_{u,i}) is defined as:

    P(r_{u,i}) = 0.    (11)

This consideration is different from all other existing prediction or smoothing methods, which always try to predict all the missing data in the user-item matrix and therefore predict some missing data with bad quality.

4.3 Prediction for Active Users

After the missing data is predicted in the user-item matrix, the next step is to predict the ratings for the active users. The prediction process is almost the same as predicting the missing data; the only difference is the case for a given active user a: if S(a) = ∅ ∧ S(i) = ∅, then we predict the rating using the following equation:

    P(r_{a,i}) = \lambda \times \bar{r}_a + (1 - \lambda) \times \bar{r}_i.    (12)

In the other situations, if (1) S(a) ≠ ∅ ∧ S(i) ≠ ∅, (2) S(a) ≠ ∅ ∧ S(i) = ∅, or (3) S(a) = ∅ ∧ S(i) ≠ ∅, we use Eq. (8), Eq. (9) and Eq. (10) to predict r_{a,i}, respectively.

4.4 Parameter Discussion

The thresholds γ and δ introduced in Section 3 are employed to avoid overestimating the user similarity and item similarity when there are only few ratings in common. If we set γ and δ too high, most of the similarities between users or items need to be multiplied by the significance weight, which is not the result we expect. However, if we set γ and δ too low, the overestimation problem still exists. Tuning these parameters is therefore important to achieving good prediction results.

The thresholds η and θ introduced in Section 4.1 also play an important role in our collaborative filtering algorithm. If η and θ are set too high, less missing data will be predicted; if they are set too low, a lot of missing data will be predicted. In the case when η = 1 and θ = 1, our approach does not predict any missing data, and the algorithm becomes general collaborative filtering without data smoothing. In the case when η = 0 and θ = 0, our approach will predict all the missing data, and this algorithm
Table 4: MAE comparison with state-of-the-art algorithms (a smaller MAE value means a better performance).

Num. of Training Users                100                  200                  300
Ratings Given for Active Users    5     10     20      5     10     20      5     10     20
EMDP                           0.807  0.769  0.765  0.793  0.760  0.751  0.788  0.754  0.746
SF                             0.847  0.774  0.792  0.827  0.773  0.783  0.804  0.761  0.769
SCBPCC                         0.848  0.819  0.789  0.831  0.813  0.784  0.822  0.810  0.778
AM                             0.963  0.922  0.887  0.849  0.837  0.815  0.820  0.822  0.796
PD                             0.849  0.817  0.808  0.836  0.815  0.792  0.827  0.815  0.789
PCC                            0.874  0.836  0.818  0.859  0.829  0.813  0.849  0.841  0.820
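The metric reported in Table 4 is mean absolute error. A minimal sketch of how MAE is computed over a set of test ratings (illustrative only, not tied to the paper's experimental code):

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute deviation between predicted and true ratings.
    A smaller value means better prediction accuracy, as in Table 4."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```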
same configurations. In other words, we intend to determine the effectiveness of our missing data prediction algorithm, and whether our approach is better than the approach which will predict every missing data or not.

[Figure 2: MAE comparison of EMDP and PEMD under the Given5, Given10 and Given20 configurations.]

In Fig. 2, the star, up triangle, and diamond in solid lines represent the EMDP algorithm in Given20, Given10 and
[Figure: MAE and prediction density as functions of the thresholds Eta and Theta, panels (d), (f) and (g).]
Observed from Fig. 4, we draw the conclusion that the value of λ impacts the recommendation results significantly, which demonstrates that combining the user-based method with the item-based method greatly improves the recommendation accuracy. Another interesting observation is that, as the number of ratings given increases (from 5 to 10, and from 10 to 20), the value of arg min_λ(MAE) of each curve in Fig. 4 shifts smoothly from 0.3 to 0.8. This implies that the information from users is more important than that from items when more ratings for active users are given. On the other hand, the information from items becomes more important when fewer ratings for active users are available; however, fewer ratings for active users lead to less accurate recommendation results.

the relationship between user information and item information, since our simulations show that the algorithm combining these two kinds of information generates better performance. Lastly, another research topic worth studying is the scalability analysis of our algorithm.

7. ACKNOWLEDGMENTS

We thank Mr. Shikui Tu, Ms. Tu Zhou and Mr. Haixuan Yang for many valuable discussions on this topic. This work is fully supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK4205/04E and Project No. CUHK4235/04E).