ImWalkMF: Joint matrix factorization and implicit walk integrative learning for recommendation

Item type: Conference Paper
Authors: Zhang, Chuxu; Yu, Lu; Zhang, Xiangliang; Chawla, Nitesh
Citation: Zhang C, Yu L, Zhang X, Chawla N (2017) ImWalkMF: Joint matrix factorization and implicit walk integrative learning for recommendation. 2017 IEEE International Conference on Big Data (Big Data).
Eprint version: Post-print
DOI: 10.1109/bigdata.2017.8258001
Publisher: IEEE
Journal: 2017 IEEE International Conference on Big Data (Big Data)
Rights: (c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Downloaded: 21-Mar-2018 00:48:21

ImWalkMF: Joint Matrix Factorization and Implicit Walk Integrative Learning for Recommendation

Chuxu Zhang†, Lu Yu‡, Xiangliang Zhang‡ and Nitesh Chawla†

† University of Notre Dame, Notre Dame, IN, USA
Email: {czhang11, nchawla}

‡ King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Email: {lu.yu, xiangliang.zhang}

Abstract—Data sparsity and cold-start problems are prevalent in recommender systems. To address such problems, both the observable explicit social information (e.g., user-user trust connections) and the inferable implicit correlations (e.g., implicit neighbors computed by similarity measurement) have been introduced to complement user-item ratings data and improve the performance of traditional model-based recommendation algorithms such as matrix factorization. Although effective, these approaches have limitations: (1) the utilization of explicit user-user social relationships suffers from unavailability in real systems such as Netflix, or from sparsely observable content (e.g., the 0.03% trust density in Epinions), so little or no explicit social information can be employed to improve the baseline model in real applications; (2) current similarity measurement approaches focus on inferring implicit correlations between a user (item) and its direct neighbors or top-k similar neighbors based on the user-item ratings bipartite network, so they fail to comprehensively unfold the indirect potential relationships among users and items. To solve these issues for both explicit and implicit social recommendation algorithms, we design ImWalkMF, a joint model of matrix factorization and implicit walk integrative learning, which uses only explicit ratings information yet models both direct rating feedbacks and multiple direct/indirect implicit correlations among users and items from a random walk perspective. We further propose a sampling-based combined strategy for training the two independent components of the proposed model. Experimental results on two real-world sparse datasets demonstrate that ImWalkMF outperforms traditional regularized/probabilistic matrix factorization models as well as other competitive baselines that utilize explicit/implicit social information.

I. INTRODUCTION

Recommender systems have become an indispensable technique for filtering and recommending information or items according to users' preferences or needs, such as product recommendation at Amazon, movie recommendation at Netflix, music recommendation at Pandora, or even disease prediction systems [1]. Various approaches [2]–[8] based on matrix factorization (MF) have been proposed to solve the ratings prediction problem and make recommendations using only user-item ratings information. To improve recommendation performance, recent works [9]–[16] employ observable explicit social relationships (e.g., trust links among online users) to enhance the MF framework and build social recommender systems. Besides, implicit correlation information (e.g., top-k similar neighbors) induced by similarity-measurement-based approaches [17], [18] is utilized to improve MF and build so-called implicit social recommender systems.

Social recommender systems make use of the trustable social relationships among users to address the sparsity issue of ratings data, and thus improve user preference prediction by considering not only a user's rating behavior but also the tastes of the user's trusted social neighbors. For example, in [12], a user social regularization term is integrated into the objective function of MF to help shape the users' latent space. However, the utilization of explicit user-user social connections suffers from two main weaknesses: (a) there is no available indication of reliable social relationships in most real-life systems such as Netflix or Ebay, or (even when there is) the explicit relationship indication is usually very sparse (e.g., the trust density in Epinions is 0.03%), so most social recommendation algorithms cannot be applied to real systems; (b) an active user can be socially connected with others who have different tastes/preferences [18], so social relationships fail to encode the comprehensive correlation between the diverse tastes of two users toward different kinds of items. As for implicit social recommender systems, they infer implicit correlation information from the explicit rating feedbacks and incorporate it into MF. For instance, in [18], the implicit network embedding method CUNE is proposed to compute similarities among users, generate the top-k similar neighbors of each user, and incorporate them into MF. Although they enhance MF with inferred correlations, current implicit social recommendation approaches have two main limitations: (a) rating-based similarity measurements (e.g., the Pearson correlation coefficient) easily find direct neighbors but provide no correlation information for non-neighboring nodes on the user-item ratings bipartite network; (b) methods (e.g., CUNE) that generate top-k implicit neighbors ignore correlations between a user and their non-top-k neighbors, which may still contain useful information, so they fail to explore implicit information comprehensively.

To resolve the above issues regarding both explicit and implicit social recommender systems, we propose to extract multiple implicit and reliable correlations among users and items using only ratings information. Specifically, we organize users' positive feedbacks (relatively large ratings) on items as a user-item implicit bipartite network (U-I-Net) and utilize random walk sampling on the U-I-Net to generate a
set of node sequences. Each random walk sequence implies multiple direct/indirect correlations among the users and items within the walk. Next, we design ImWalkMF, a joint model of MF and implicit walk integrative learning (IWIL), based on the collected random walk set. The MF component of ImWalkMF formalizes users' direct rating feedbacks on items using the standard square loss. The IWIL component formalizes multiple direct/indirect correlations among users and items at both the user and item levels by introducing a user-user (item-item) pull loss function and a user-item (item-user) push loss function. Thus ImWalkMF comprehensively models both direct rating feedbacks and useful implicit information. To solve the challenge of training ImWalkMF, whose two independent components use different data samples, we propose a sampling-based combined strategy to train the joint model and optimize the latent factors of users and items. Extensive experiments verify the effectiveness of ImWalkMF in recommendation. In summary, our main contributions are as follows:

• We innovatively introduce random walk sampling to collect a set of node sequences, based on the user-item implicit bipartite network, that implies multiple implicit and reliable correlations. Unlike previous work that focuses on computing similarities and inferring limited implicit relationships among users (items), it captures comprehensive information to complement user-item ratings data.
• Based on the set of random walk sequences, we propose a joint recommendation model, ImWalkMF, for modeling both rating feedbacks and multiple implicit correlations among users and items, and further design a sampling-based combined strategy for training ImWalkMF.
• We conduct extensive experiments to evaluate the performance of ImWalkMF. The results show that ImWalkMF largely improves on the traditional regularized/probabilistic matrix factorization models and outperforms the competitive baselines that utilize explicit/implicit social information.

The rest of the paper is organized as follows. Section II reviews the background and related work of this paper. Section III describes the proposed method and algorithm. The results from extensive experiments are presented in Section IV, followed by the conclusion in Section V.

II. BACKGROUND

In this section, we briefly review the background and related work, including recommender systems, metric learning, and random walk sampling on networks.

A. Matrix Factorization for Collaborative Filtering

Traditional collaborative filtering (CF) techniques [19], [20] focus on computing rating-based similarities (e.g., cosine similarity) among users (items) and finding the most similar neighbors of the query user (item). Over the past decade, matrix factorization (MF) [2], [4], [6], [8] has perhaps become the most popular CF approach due to its superior performance. The ratings of m users on n items are denoted by an m × n rating matrix r. MF approximates r by the product of d-rank user and item latent factor matrices, i.e., r ≈ z^T q, where z ∈ R^{d×m} and q ∈ R^{d×n} with d ≪ min(m, n). Thus the rating of user u_i on item v_j can be predicted by the inner product of the user latent factor z_i (the i-th column of z) and the item latent factor q_j (the j-th column of q), i.e., r̂_{ij} = z_i^T q_j. This formulation leads to the optimization problem that minimizes the square loss over the set of known ratings:

min_{z*, q*} Σ_{r_{ij} ∈ R̃} (r_{ij} − z_i^T q_j)^2 + (λ/2)(‖z_i‖^2 + ‖q_j‖^2),

where R̃ is the set of known ratings, i.e., training data, and λ is the regularization parameter for controlling model complexity to avoid over-fitting. Gradient-based approaches can be applied to model training since the objective function is differentiable. Besides ratings prediction, MF has been proved to be effective in top-N recommendations [21]–[24].

B. Social Recommendation Systems

Recently, due to the prevalence of Web 2.0 based social networks, research on social recommender systems has attracted a lot of attention. Many novel social recommendation algorithms [9]–[13], [15] based on MF have been proposed to improve recommendation performance. For example, one popular model, SoReg [12], incorporates a social regularization term into MF and is formally defined as:

min_{z*, q*} Σ_{r_{ij} ∈ R̃} (r_{ij} − z_i^T q_j)^2 + (λ/2)(‖z_i‖^2 + ‖q_j‖^2) + α Σ_{u_k ∈ F(i)} sim(i, k) ‖z_i − z_k‖^2,

where F(i) denotes the set of social neighbors of user u_i, parameter α controls the extent of the social constraint, and sim(i, k) represents the precomputed similarity score between u_i and their friend u_k. The extra regularization term tends to reduce the distance between the latent factors of each user and their social neighbors, which helps shape the latent space of users and improve the recommendation performance. More importantly, the framework can be easily modified or extended with other indirect implicit information. For example, the explicit social constraint can be replaced by implicit correlations generated by a predefined similarity measurement [17] or a network embedding technique [18] when explicit social connections are not available. In addition, item-based constraints can be introduced to further enhance the model if category or tag information [25] is available.

C. Metric Learning for k-Nearest Neighbors Classification

Metric learning algorithms [26]–[29] generate a distance metric that captures the important relationships among data and have been widely applied to applications such as nearest neighbors classification or document retrieval. One of the best-known metric learning models for k-nearest neighbors classification is the large margin nearest neighbor (LMNN) [27], which learns a metric that minimizes the number of differently labeled impostors of a given input data point. Specifically, LMNN defines two loss functions, i.e.,
pull loss and push loss, and combines them to form the complete loss function. The pull loss of a given input i is formulated as:

φ_pull(i) = Σ_{j⇝i} d(i, j)^2,

where d(·) is the distance metric between two input data points and j⇝i denotes that j is i's target neighbor. The push loss of a given input i is defined as:

φ_push(i) = Σ_{j⇝i} Σ_k (1 − y_{ik}) [1 + d(i, j)^2 − d(i, k)^2]_+,

where y_{ik} = 1 if the two data points i and k belong to the same class (otherwise y_{ik} = 0), and [x]_+ = max(x, 0) is the standard hinge loss. Besides nearest neighbors classification, the idea of LMNN has been adopted in question answering systems [30], [31], recommender systems [32], author identification [33], etc.

D. Random Walk Based Information Integration

Random walk based approaches were originally used in network sampling [34]–[36] and influential node identification [37]–[39]. In recent years, the random walk method has been applied to network embedding [40], [41], question answering systems [30], [31], etc., due to its effectiveness in integrating multiple correlations in networks. Generally, a random walk rooted at a specific node is a stochastic process with a sequence of random variables, such that each variable is a node chosen randomly from the neighbors of its predecessor. In analogy with a sentence in a corpus, the random walk node sequences can be further fed to the Word2Vec model [42] for embedding learning. Besides, the homogeneous random walk sequence can be easily extended to heterogeneous contexts and applied to other applications, such as question retrieval (with user, question, and answer nodes) in question answering systems, or author identification (with author, paper, venue, and keyword nodes) in academic search services.

III. PROPOSED METHOD

In this section, we present the ImWalkMF method, which contains three consecutive steps: 1) constructing the user-item implicit bipartite network; 2) generating user-item random walk sequences; and 3) designing the joint model of matrix factorization and implicit walk integrative learning. Figure 1 illustrates the whole process, which we detail below.

A. User-Item Implicit Bipartite Network Construction

In the first step of ImWalkMF, we extract user-item implicit feedbacks from the available ratings information. User-item online ratings are usually represented by an m × n rating matrix r, where r_{ij} denotes the rating user u_i gave to item v_j. The value of r_{ij} denotes the extent of u_i's preference for v_j, and a relatively large score means a positive indication. Similar to the concept of implicit feedbacks [43] used in previous work, we build a user-item implicit bipartite network (U-I-Net) in which a connection exists between u_i and v_j if and only if r_{ij} ≥ σ (σ = 3 in this work, where ratings are from 1 to 5), to denote the corresponding implicit information. Step 1 of Figure 1 shows an example: we extract all positive feedbacks (in red color, r_{ij} ≥ 3 on a scale of 5) from an 8×8 rating matrix and represent them by an 8-8 U-I-Net. The U-I-Net is unweighted, since we regard each positive feedback as an implicit indication of the same strength, and it captures the preference relations of each user with respect to different items.

B. User-Item Random Walk Sequences Generation

In the second step of ImWalkMF, we perform random walk sampling over the U-I-Net and collect a set of node sequences. Formally, given a root node c_0, we simulate a random walk (with length L) w_{c_0} = {c_0, c_1, c_2, ..., c_{L−1}} by alternating between users and items. The i-th node in the walk, denoted c_{i−1}, is selected from the neighbors of its predecessor c_{i−2} using:

p(c_{i−1} = x | c_{i−2} = y) = 1 / |Nb(y)|,

where Nb(y) is c_{i−2}'s neighbor set in the U-I-Net. Figure 1 (step 2) gives an example: a walk w_1 ≡ {1 → b → 3 → e → 6 → g → 4 → c → 2 → a} is generated by a random walk on the U-I-Net. The surrounding context of each node in w_1 implies multiple correlations among the users and items within w_1. Take the sub-sequences ŵ_b ≡ {b → 3 → e → 6 → g → 4 → c} and ŵ_3 ≡ {3 → e → 6 → g → 4 → c → 2} (window size equal to 3), centered at user 6 and item g, as examples. The multiple correlations include:

• user-item direct correlation: user 6 shows preference for items e and g, since 6 directly connects to e and g in ŵ_b.
• user-user indirect correlation: user 6 has similar taste to user 3 and user 4 in ŵ_b, since both 6 and 3 rate item e positively and both 6 and 4 rate item g positively.
• user-item indirect correlation: user 6 has indirect preference for items b and c in ŵ_b, since users 3 and 4 have direct correlations with b and c respectively, and 3 and 4 have indirect correlations with 6.
• item-user direct correlation: item g receives positive feedback from both user 6 and user 4, since g directly connects to 6 and 4 in ŵ_3.
• item-item indirect correlation: item g has similar attributes to items e and c in ŵ_3, since both g and e are rated positively by user 6 and both g and c are rated positively by user 4.
• item-user indirect correlation: item g receives indirect preference from user 3 and user 2 in ŵ_3, since items e and c have direct correlations with 3 and 2 respectively, and e and c have indirect correlations with g.

According to the above discussion, each random walk contains both direct correlations between neighboring nodes and transitive correlations among non-neighboring nodes. In addition, the correlations exist not only between two nodes of the same type but also between two nodes of different types, which makes the extracted information comprehensive. Moreover, the co-occurrence frequencies of any two nodes in the same walks reflect their correlations if
[Figure 1 appears here. Step 1: an 8×8 user-item rating matrix is converted into the U-I-Net via implicit feedback (r_{ij} ≥ 3). Step 2: a random walk over the U-I-Net. Step 3: rating-based matrix factorization combined with user-user/item-item level pull losses and user-item/item-user level push losses.]

Fig. 1. Illustration of the ImWalkMF model. The first step constructs the user-item implicit bipartite network (U-I-Net) by extracting users' positive feedbacks on items. The second step generates random walk sequences (e.g., w_1, labeled by solid lines with arrows) by random walk sampling on the U-I-Net. The third step designs the joint model as well as the learning algorithm.
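To make steps 1 and 2 concrete, the following is a minimal sketch of U-I-Net construction and uniform random walk sampling, assuming a toy ratings dictionary; the helper names (`build_ui_net`, `random_walk`) are illustrative, not from the paper's implementation.

```python
import random
from collections import defaultdict

def build_ui_net(ratings, sigma=3):
    """Step 1: connect user u and item v iff rating r_uv >= sigma."""
    net = defaultdict(list)
    for (u, v), r in ratings.items():
        if r >= sigma:  # keep only positive (implicit) feedback
            net[("u", u)].append(("v", v))
            net[("v", v)].append(("u", u))
    return net

def random_walk(net, root, length):
    """Step 2: uniform random walk of `length` nodes, alternating users/items."""
    walk = [root]
    curr = root
    for _ in range(length - 1):
        nbrs = net[curr]
        if not nbrs:  # isolated node: stop the walk early
            break
        curr = random.choice(nbrs)  # transition probability 1/|Nb(y)|, as in the text
        walk.append(curr)
    return walk

# Toy ratings: (user, item) -> rating; only r >= 3 enters the U-I-Net.
ratings = {(1, "b"): 5, (1, "a"): 2, (3, "b"): 4, (3, "e"): 3,
           (6, "e"): 4, (6, "g"): 5}
net = build_ui_net(ratings)
walk = random_walk(net, ("u", 1), length=6)  # one user-item-user-... sequence
```

In the paper, T walks of length L = 40 are rooted at every user; this sketch draws a single short walk for illustration.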

we generate plenty of walks rooted at each node. In the next subsection, we design a joint model to formalize these implicit correlations as well as the explicit ratings information.

C. Joint Model of Matrix Factorization and Implicit Walk Integrative Learning

The previous two steps generate a set of random walk sequences that capture multiple reliable correlations among users and items. As the last and core step of ImWalkMF, we design a joint model to formalize these correlations in this section. The high-level idea of the ImWalkMF model is as follows: we use matrix factorization (MF) to model the observable user-item ratings, and we introduce an implicit walk integrative learning (IWIL) component to model multiple direct/indirect correlations among users and items, following two assumptions: (1) the distance between the latent factors of indirectly correlated neighbors (i.e., user-user and item-item pairs) should be as small as possible, since they share similar preferences or attributes; and (2) the preference score of a direct/indirect correlated user-item (item-user) pair should be larger than the corresponding value of a negative user-item (item-user) pair, since users are more likely to choose correlated items than negative items. Next, we formulate IWIL in detail.

1) User-Level Integrative Learning: Denote the set of users and the set of items as U and V, respectively. As discussed in Section II, user u_i and item v_j are represented by a user latent factor vector z_i ∈ R^{d×1} and an item latent factor vector q_j ∈ R^{d×1}, respectively. Inspired by SoReg [12], we measure the distance between user u_i and user u_{i'} by:

d_u(i, i') = ‖z_i − z_{i'}‖^2    (1)

The above equation obeys assumption (1), namely that a user should be closer to neighbors who have similar preferences than to the others. Thus we define a user-level pull loss in each random walk w as:

L^u_pull(w) = Σ_{u_{h(t)} ∈ w, u_{h(t)} ∈ U} Σ_{−τ ≤ ∆t ≤ τ, ∆t ≠ 0, u_{h(t+∆t)} ∈ U} ‖z_{h(t)} − z_{h(t+∆t)}‖^2    (2)

where τ is the window size of u_{h(t)}'s surrounding context w[t − τ : t + τ], and h(·) maps a position in the surrounding context to the node index. Constraining L^u_pull thus keeps user-user indirect correlations. In addition, inspired by the metric learning work [27], we introduce a hinge loss to capture the difference between u_i's preference scores on an indirectly correlated item v_k and a negative item v_{k'} (whose rating is smaller than the threshold σ used in implicit feedback extraction):

H_uv(i, k, k') = {ξ + r_{ik'} − z_i^T q_k}_+    (3)

where {x}_+ = max(x, 0) and ξ is a positive margin value. The hinge loss obeys assumption (2): a loss penalty is incurred if the preference score of the positive pair (u_i, v_k) (quantified by the inner product of the two related latent factor vectors) is not at least ξ larger than the corresponding value (the explicit rating) of the negative pair (u_i, v_{k'}). Denote the set of negative items which receive negative indications from u_i as N(u_i); the user-level push loss in each random walk w is then formally defined as:

L^u_push(w) = Σ_{u_{h(t)} ∈ w, u_{h(t)} ∈ U} Σ_{−τ ≤ ∆t ≤ τ, ∆t ≠ 0, v_{h(t+∆t)} ∈ V} Σ_{v_{t'} ∈ N(u_{h(t)})} {ξ + r_{h(t)t'} − z_{h(t)}^T q_{h(t+∆t)}}_+    (4)
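As a concrete illustration, the user-level pull loss (Eqs. (1)-(2)) and push loss (Eqs. (3)-(4)) for one center user can be sketched as below; the dense toy factors and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_users, n_items = 4, 3, 3
z = rng.random((d, n_users))  # user latent factors (one column per user)
q = rng.random((d, n_items))  # item latent factors (one column per item)

def user_pull_loss(i, context_users):
    """Eq. (2) contribution of center user i: sum of squared distances
    d_u(i, i') = ||z_i - z_i'||^2 (Eq. (1)) to users in the window context."""
    return sum(float(np.sum((z[:, i] - z[:, j]) ** 2)) for j in context_users)

def user_push_loss(i, context_items, negatives, xi=1.0):
    """Eq. (4) contribution of center user i: hinge penalty (Eq. (3)) whenever
    the predicted score z_i^T q_k does not exceed the explicit rating of a
    negative item k' by at least the margin xi."""
    loss = 0.0
    for k in context_items:
        score = float(z[:, i] @ q[:, k])
        for _k_neg, r_neg in negatives:  # negatives: [(item index, rating), ...]
            loss += max(0.0, xi + r_neg - score)
    return loss

pull = user_pull_loss(0, context_users=[1, 2])
push = user_push_loss(0, context_items=[1], negatives=[(2, 2.0)])
```

Summing such terms over every center node of every walk, together with their item-level counterparts, yields the IWIL part of the joint objective.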
Constraining L^u_push thus captures user-item direct/indirect correlations.

2) Item-Level Integrative Learning: Similar to user-level integrative learning, the item-level pull loss is formally defined as:

L^v_pull(w) = Σ_{v_{h(t)} ∈ w, v_{h(t)} ∈ V} Σ_{−τ ≤ ∆t ≤ τ, ∆t ≠ 0, v_{h(t+∆t)} ∈ V} ‖q_{h(t)} − q_{h(t+∆t)}‖^2    (5)

Similarly, constraining L^v_pull keeps item-item indirect correlations. Besides, denote the set of users who give negative feedbacks to v_j as N(v_j); the item-level push loss is formulated as:

L^v_push(w) = Σ_{v_{h(t)} ∈ w, v_{h(t)} ∈ V} Σ_{−τ ≤ ∆t ≤ τ, ∆t ≠ 0, u_{h(t+∆t)} ∈ U} Σ_{u_{t'} ∈ N(v_{h(t)})} {ξ + r_{t'h(t)} − z_{h(t+∆t)}^T q_{h(t)}}_+    (6)

In the same way, the loss penalty of L^v_push captures item-user direct/indirect correlations.

3) The Joint Objective: Denote the set of all random walk sequences as C. The joint objective function is defined as a combination of the MF and IWIL losses over C:

L = L_MF + Σ_{w ∈ C} [L^u_pull(w) + L^u_push(w) + L^v_pull(w) + L^v_push(w)] + L_reg    (7)

where L_reg is the regularization term for avoiding over-fitting. Let C_uu, C_uvv, C_vv and C_vuu be the sets of user-user pairs of L^u_pull, user-item-item triples of L^u_push, item-item pairs of L^v_pull, and item-user-user triples of L^v_push in C, respectively. Thus we can rewrite the joint objective function as an expectation:

L = E_{r_{ij} ∈ R̃} [(r_{ij} − z_i^T q_j)^2 + λ(‖z‖^2 + ‖q‖^2)]    (L_MF and L_reg)
  + E_{(a,a') ∈ C_uu} [‖z_a − z_{a'}‖^2]    (pull)
  + E_{(l,k,k') ∈ C_uvv} [{ξ + r_{lk'} − z_l^T q_k}_+]    (push)
  + E_{(b,b') ∈ C_vv} [‖q_b − q_{b'}‖^2]    (pull)
  + E_{(g,h,h') ∈ C_vuu} [{ξ + r_{h'g} − z_h^T q_g}_+]    (push)    (8)

where R̃ is the set of observable ratings and λ is the regularization parameter. To minimize the joint objective function, we utilize stochastic gradient descent to update the model parameters. The main challenge is that the two components, MF and IWIL, learn from two different data sources. To resolve this issue, we design a combined strategy based on sampling to learn from the random walk sequences. Specifically, for each node (user/item) in a walk, we first draw samples from the ratings data and update the parameters of the MF component. Then we collect the correlated neighbor sets of this node, draw pair/triple samples for each loss function of the IWIL component, and update the corresponding parameters. In order to reduce training epochs and sampling overhead, each component is trained on a mini-batch of data samples instead of a single sample. The learning algorithm is summarized in Algorithm 1.

D. Model Inference

Stochastic gradient descent is applied to update the model parameters of both the MF and IWIL components in the joint objective function, i.e., Eq. (8). The update rules for the different training samples are:

• For a rating sample r_{ij} ∈ R̃, the gradients of z_i and q_j are calculated as follows:

∆z_i = −2(r_{ij} − z_i^T q_j) q_j + 2λ z_i
∆q_j = −2(r_{ij} − z_i^T q_j) z_i + 2λ q_j    (9)

• For a user-user pair sample (a, a') ∈ C_uu, the gradients of z_a and z_{a'} are calculated as follows:

∆z_a = 2(z_a − z_{a'}) + 2λ z_a
∆z_{a'} = −2(z_a − z_{a'}) + 2λ z_{a'}    (10)

• For a user-item-item triple sample (l, k, k') ∈ C_uvv, the gradients of z_l and q_k are formulated as follows:

∆z_l = −δ(ξ + r_{lk'} − z_l^T q_k) q_k + 2λ z_l
∆q_k = −δ(ξ + r_{lk'} − z_l^T q_k) z_l + 2λ q_k    (11)

where δ(x) is an indicator function that equals 1 if and only if x is larger than 0, and equals 0 otherwise.

• For an item-item pair sample (b, b') ∈ C_vv, the gradients of q_b and q_{b'} are calculated as follows:

∆q_b = 2(q_b − q_{b'}) + 2λ q_b
∆q_{b'} = −2(q_b − q_{b'}) + 2λ q_{b'}    (12)

• For an item-user-user triple sample (g, h, h') ∈ C_vuu, the gradients of z_h and q_g are formulated as follows:

∆z_h = −δ(ξ + r_{h'g} − z_h^T q_g) q_g + 2λ z_h
∆q_g = −δ(ξ + r_{h'g} − z_h^T q_g) z_h + 2λ q_g    (13)

With the above gradient formulas, we repeat the process of Algorithm 1 to train the model until the parameter updates meet the convergence criterion.

IV. EXPERIMENTS

In this section, we conduct extensive experiments to evaluate the performance of our model as well as the baselines.
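To make the sampled update rules of Eqs. (9)-(11) in Section III-D concrete before turning to the experiments, here is a minimal sketch as plain gradient steps; the learning rate `eta` is a hypothetical addition (the paper specifies gradients, not step sizes), and Eqs. (12)-(13) follow the same pattern on the item side.

```python
import numpy as np

rng = np.random.default_rng(1)
d, lam, xi, eta = 4, 0.001, 1.0, 0.01  # dimension, regularizer, margin, step size
z = rng.random((d, 5))  # user latent factors
q = rng.random((d, 5))  # item latent factors

def step_rating(i, j, r_ij):
    """Eq. (9): gradient step for an observed rating sample r_ij."""
    err = r_ij - float(z[:, i] @ q[:, j])
    dz = -2 * err * q[:, j] + 2 * lam * z[:, i]
    dq = -2 * err * z[:, i] + 2 * lam * q[:, j]
    z[:, i] -= eta * dz
    q[:, j] -= eta * dq

def step_user_pair(a, a2):
    """Eq. (10): pull the factors of correlated users a and a2 together."""
    diff = z[:, a] - z[:, a2]
    dza = 2 * diff + 2 * lam * z[:, a]
    dza2 = -2 * diff + 2 * lam * z[:, a2]
    z[:, a] -= eta * dza
    z[:, a2] -= eta * dza2

def step_user_item_triple(l, k, r_neg):
    """Eq. (11): push step gated by the indicator delta(x) = 1 iff x > 0."""
    delta = 1.0 if (xi + r_neg - float(z[:, l] @ q[:, k])) > 0 else 0.0
    dz = -delta * q[:, k] + 2 * lam * z[:, l]
    dq = -delta * z[:, l] + 2 * lam * q[:, k]
    z[:, l] -= eta * dz
    q[:, k] -= eta * dq
```

A rating step shrinks the prediction error, and a pair step shrinks the distance between the two user factors, which matches the pull/push intuition of the IWIL losses.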
Algorithm 1: ImWalkMF
1  Data: R̃, U-I-Net
2  Result: [z, q]
3  Initialize z ∈ R^{d×|U|} and q ∈ R^{d×|V|} with random values within [0, 1]
4  for each u* ∈ U do
5      for t = 0 to T do
6          w ← RandomWalkSampling(U-I-Net, u*, L)
7          [z, q] ← Inference([z, q], R̃, w, τ)
8  Return [z, q]

9  RandomWalkSampling(U-I-Net, u*, L)
10     Initialize walk w with [u*]
11     currNode = u*
12     for l = 1 to L do
13         nextNode = randomSelect(currNode.neighbor)
14         w.add(nextNode)
15         currNode = nextNode
16     return w

18 Inference([z, q], R̃, w, τ)
19     // user-level
20     for each u_i ∈ w do
21         initialize C^{w,u_i}_uu and C^{w,u_i}_uvv with empty sets
22         // MF component
23         sample a mini-batch of rating pairs (i, f) from R̃[i, :]
24         update z_i, q_f via Eq. (9)
25         // IWIL component
26         for each v_k ∈ w[i − τ : i + τ] do
27             for each v_{k'} ∈ N(u_i) do
28                 add triple (i, k, k') to C^{w,u_i}_uvv
29         for each u_{i'} ∈ w[i − τ : i + τ] do
30             add pair (i, i') to C^{w,u_i}_uu
31         sample a mini-batch of triples (i, k, k') from C^{w,u_i}_uvv
32         update z_i, q_k via Eq. (11)
33         sample a mini-batch of pairs (i, i') from C^{w,u_i}_uu
34         update z_i, z_{i'} via Eq. (10)
35     // item-level
36     for each v_j ∈ w do
37         initialize C^{w,v_j}_vv and C^{w,v_j}_vuu with empty sets
38         sample a mini-batch of rating pairs (g, j) from R̃[:, j]
39         update z_g, q_j via Eq. (9)
40         for each u_h ∈ w[j − τ : j + τ] do
41             for each u_{h'} ∈ N(v_j) do
42                 add triple (j, h, h') to C^{w,v_j}_vuu
43         for each v_{j'} ∈ w[j − τ : j + τ] do
44             add pair (j, j') to C^{w,v_j}_vv
45         sample a mini-batch of triples (j, h, h') from C^{w,v_j}_vuu
46         update z_h, q_j via Eq. (13)
47         sample a mini-batch of pairs (j, j') from C^{w,v_j}_vv
48         update q_j, q_{j'} via Eq. (12)
49     Return [z, q]

TABLE I
Statistics       CiaoDVD   Epinions
users #          17,615    40,163
items #          16,121    139,738
ratings #        72,665    664,824
rating density   0.026%    0.012%
rating range     [1, 5]    [1, 5]
trusts #         111,781   487,183
trust density    0.036%    0.030%

A. Experimental Design

1) Datasets: We use two real-world datasets, CiaoDVD and Epinions. Both contain sparse rating feedbacks and sparse social trust information. The trust connections are unweighted and each trust value equals 1. Note that the proposed ImWalkMF uses only the rating feedbacks, not the social trust information, which will be considered in other baseline methods for comparison. The main statistics of the datasets are summarized in Table I.

2) Evaluation Metric: We use the two most popular metrics, i.e., the mean absolute error (MAE) and the root mean square error (RMSE), to evaluate the recommendation accuracy of each method. MAE is defined as:

MAE = ( Σ_{r_{ij} ∈ R̂} |r_{ij} − r̂_{ij}| ) / |R̂|    (14)

where r_{ij} is the rating user u_i gave to item v_j in the test data set R̂, and r̂_{ij} represents the corresponding prediction score by a specific method. RMSE is defined as:

RMSE = sqrt( ( Σ_{r_{ij} ∈ R̂} (r_{ij} − r̂_{ij})^2 ) / |R̂| )    (15)

A smaller MAE or RMSE value means a better performance.

3) Baselines: To validate the effectiveness of our method, we select seven competitive comparison methods. The proposed ImWalkMF is under the framework of matrix factorization; thus most of the baselines are based on matrix factorization for a fair comparison. The baselines include:

• RMF [6]: the matrix factorization approach with L2 regularization on latent factors.
• PMF [8]: the probabilistic matrix factorization algorithm, which models the latent factors of users and items via Gaussian distributions.
• SocReg [12]: this model incorporates the explicit social connections among users into matrix factorization as a regularization constraint.
• SocRec [9]: the social recommendation approach which models user preference by considering not only the user's rating behavior but also the user's social connections, based on probabilistic matrix factorization.
• ISMF [17]: this is the implicit social recommendation algorithm which extends matrix factorization with implicit correlations computed by a predefined rating-based similarity function, e.g., the Pearson correlation coefficient.
• GPLSA [44]: this method utilizes Gaussian probabilistic latent semantic analysis to model users' preferences on different items.
• CUNEMF [18]: this model extends matrix factorization with implicit constraints generated by a random walk based network embedding technique.

4) Experimental Settings: We randomly select 60% or 80% of the dataset as a training set to train each model, and predict the remaining 40% or 20% of the dataset. To achieve a fair comparison, we use grid search to find the best value of the latent factor vector dimension from {5, 10, 15, 20} and of the regularization parameter from {0.1, 0.01, 0.001, 0.0001} for each evaluated method. The latent factor vector dimension d of RMF, PMF, SocReg, SocRec, ISMF and CUNEMF is set to 20, 20, 10, 20, 20 and 20, respectively. Besides, the regularization parameter λ of these six models equals 0.1, 0.001, 0.1, 0.001, 0.01 and 0.1, respectively. For GPLSA, we similarly use grid search to find the best number of topics from {5, 10, 15, 20} and set the value to 10. For ImWalkMF, σ = 3 (on the 5-point rating scale) is used to construct the user-item implicit bipartite network; we then select L = 40 and T = 10 to generate the set of random walk sequences. In model training, we set d = 15, λ = 0.001 and ξ = 1.0 for both datasets, and τ = 6 and 4 for CiaoDVD and Epinions, respectively. Due to the sparsity of the datasets, we use the global average rating (the mean of all ratings in the training data) to predict test instances whose user or item never appears in the training process. Furthermore, we set up two categories of test data: category A contains all test samples, and category B contains all test instances whose user and item appear in model training.

B. Result Comparison

The performances of our method and the baselines under all settings are reported in Table II, where the best performances are shown in bold. In addition, the last row for each dataset reports the average prediction accuracy of each method. The main takeaways from the result table are summarized as follows:

• All models perform better on the CiaoDVD dataset than on the Epinions dataset, mainly because Epinions has a sparser rating density than CiaoDVD, i.e., 0.012% vs. 0.026%. In addition, SocReg, SocRec, ISMF and CUNEMF perform better than RMF, which indicates the positive influence of incorporating explicit social relationships or implicit correlations into MF.
• The proposed ImWalkMF has the best performance. It is much better than RMF and PMF, with 5.6% and 10.8% average improvements, respectively. In addition, ImWalkMF reduces the prediction errors of the explicit social recommendation algorithms SocReg and SocRec by 4.6% and 4.7%, respectively. Moreover, ImWalkMF outperforms the implicit social recommendation models ISMF and CUNEMF by 4.6% and 1.9% on average, respectively, showing that the implicit random walk sequences capture informative and useful information, and that the joint model effectively incorporates the comprehensive correlations among users and items to further improve recommendation accuracy. Although the improvements of ImWalkMF over some baselines are small, even small improvements in MAE or RMSE may lead to significant differences in recommendation quality in real applications [45].
• Comparing with the best baseline, we find that the improvement of ImWalkMF on category B test data is larger than the corresponding value on category A test data. That is because test instances (whose user or item never appears in the training process) predicted by the global average rating reduce the overall prediction accuracy in category A, which further validates the effectiveness of ImWalkMF's model training. In the further analysis of the next subsection, we choose category B as the primary test data.

C. Analysis and Discussion

1) Performances of Variant Proposed Models: As described in Section III, ImWalkMF is a joint model of matrix factorization (MF) and implicit walk integrative learning (IWIL) from both user and item levels. Does each component play a role in the joint model, and to what extent does each component influence the recommendation performance? To answer these questions, we conduct experiments to evaluate the performances of three variants of the ImWalkMF model:

• U_MF: the model of MF from only the user level, i.e., implementing line 20 to line 24 in the inference function of Algorithm 1.
• U_MF^IWIL: the joint model of MF and IWIL from only the user level, i.e., implementing line 20 to line 34 in the inference function of Algorithm 1.
• UV_MF^IWIL: the proposed joint model of MF and IWIL from both user and item levels.

The performances of the three models on category B test data under various settings are reported in Table III. As can be seen from the table:

• U_MF performs close to RMF since it relies only on ratings-based matrix factorization.
• U_MF^IWIL largely improves U_MF's performance. This verifies that the random walk sequences capture multiple useful correlations among users and items, and that the IWIL component is necessary for model learning.
• The joint model UV_MF^IWIL outperforms U_MF^IWIL, especially in prediction accuracy in terms of RMSE. This shows that the item-level correlations, especially the indirect item-item correlations, further enhance the model's ability in shaping the latent spaces of users and items.

Therefore, each component of the joint model helps improve the recommendation performance.

2) Parameters Sensitivity: The hyper-parameters play important roles in ImWalkMF, as they determine how the model

TABLE II

Dataset   Train%  Cat.  Metric  RMF    PMF    SocReg  SocRec  ISMF   GPLSA  CUNEMF  ImWalkMF
CiaoDVD   60%     A     MAE     0.806  0.868  0.804   0.789   0.801  0.763  0.769   0.754
                        RMSE    1.039  1.131  1.035   1.027   1.031  1.026  1.001   0.989
          60%     B     MAE     0.796  0.869  0.788   0.784   0.786  0.733  0.741   0.719
                        RMSE    1.029  1.141  1.016   1.022   1.025  1.005  0.970   0.953
          80%     A     MAE     0.791  0.827  0.783   0.769   0.776  0.744  0.754   0.737
                        RMSE    1.019  1.084  1.013   1.001   1.011  1.003  0.983   0.971
          80%     B     MAE     0.771  0.832  0.764   0.765   0.762  0.715  0.727   0.706
                        RMSE    0.997  1.097  0.994   0.999   0.995  0.980  0.953   0.935
          Average over all      0.906  0.981  0.900   0.894   0.898  0.871  0.862   0.845
Epinions  60%     A     MAE     0.864  0.887  0.847   0.841   0.836  0.821  0.830   0.814
                        RMSE    1.109  1.154  1.095   1.109   1.097  1.099  1.082   1.070
          60%     B     MAE     0.863  0.901  0.846   0.840   0.835  0.818  0.828   0.809
                        RMSE    1.106  1.164  1.090   1.108   1.093  1.096  1.074   1.061
          80%     A     MAE     0.849  0.869  0.838   0.834   0.836  0.812  0.825   0.806
                        RMSE    1.095  1.128  1.083   1.098   1.095  1.089  1.076   1.062
          80%     B     MAE     0.850  0.873  0.837   0.835   0.838  0.809  0.823   0.801
                        RMSE    1.091  1.131  1.076   1.099   1.094  1.084  1.068   1.051
          Average over all      0.978  1.013  0.964   0.971   0.966  0.954  0.951   0.934
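For reference, the MAE and RMSE metrics reported in Table II can be computed as follows; this is a minimal sketch with illustrative toy ratings, not the paper's evaluation code:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over rating predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error over rating predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# toy check on a handful of ratings (illustrative values only)
truth = [4.0, 3.0, 5.0, 2.0]
pred = [3.5, 3.0, 4.0, 2.5]
print(mae(truth, pred))    # 0.5
print(rmse(truth, pred))   # ≈ 0.6124
```

Lower values of either metric indicate more accurate rating prediction; RMSE penalizes large errors more heavily than MAE.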

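The global-average fallback used for test instances whose user or item never appears in training (the situation that distinguishes category A from category B) can be sketched as follows; the tiny rating list and the `model_score` argument are hypothetical stand-ins:

```python
# hypothetical training ratings: (user, item, rating)
train_ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0)]
global_mean = sum(r for _, _, r in train_ratings) / len(train_ratings)  # 4.0

seen_users = {u for u, _, _ in train_ratings}
seen_items = {i for _, i, _ in train_ratings}

def predict(u, i, model_score):
    """Use the trained model's score when both u and i were seen in
    training; otherwise fall back to the global average rating."""
    if u in seen_users and i in seen_items:
        return model_score
    return global_mean

print(predict("u1", "i1", 3.8))   # seen pair: model score, 3.8
print(predict("u9", "i1", 3.8))   # unseen user: global mean, 4.0
```

Category B evaluation simply skips the fallback branch by filtering the test set to seen users and items beforehand.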
TABLE III
PERFORMANCE COMPARISON OF THREE KINDS OF IMWALKMF MODEL.

Dataset   Train%  Metric  U_MF   U_MF^IWIL  UV_MF^IWIL
CiaoDVD   60%     MAE     0.794  0.723      0.719
                  RMSE    1.035  0.962      0.953
          80%     MAE     0.774  0.708      0.706
                  RMSE    1.006  0.942      0.935
Epinions  60%     MAE     0.869  0.812      0.809
                  RMSE    1.115  1.067      1.061
          80%     MAE     0.854  0.804      0.801
                  RMSE    1.108  1.056      1.051

will be trained. In order to evaluate how changes to the hyper-parameters of ImWalkMF affect its recommendation performance, we conduct experiments to evaluate the impacts of the window size τ, the latent factor vector dimension d, the regularization parameter λ and the hinge loss margin ξ. The recommendation accuracy (in terms of MAE and RMSE on the category B test set) of ImWalkMF under various settings of τ, d, λ and ξ is shown in Figure 2. According to the figure:

• With the increment of window size τ, MAE (RMSE) decreases at first, since a larger window means more useful implicit correlations among users and items. But when τ goes beyond a certain value, MAE (RMSE) increases with the further increment of τ due to the possible involvement of non-correlated information. The best values of τ for CiaoDVD and Epinions are 6 and 4, respectively.
• Similar to τ, appropriate values should be set for the latent factor vector dimension d and the regularization parameter λ such that the best embeddings of users (items) are learned for making accurate recommendations. The optimal d and λ equal 15 and 0.001, respectively.
• The margin ξ controls a user's preference gap between positively correlated items and negative items. The larger ξ is, the less penalty the hinge loss incurs for negative samples. Thus the performance, especially accuracy in terms of RMSE, decreases with the increment of ξ, and the value of ξ for optimal performance equals 1.0.

Therefore, proper settings of the hyper-parameters lead to the best performance of ImWalkMF.

3) Performances on Cold Start Users: The cold start problem is prevalent in recommender systems, especially for new users or new items with limited available ratings. Thus it is more difficult to learn the preferences of cold start users than those of general users. To validate the effectiveness of ImWalkMF in improving recommendation for cold start users, we evaluate the improvements of ImWalkMF over RMF (in terms of the reduction in MAE and RMSE) on different user groups. First, we classify all users into 6 groups (i.e., 1∼5, 6∼10, 11∼15, 16∼20, 21∼25 and >25) based on the number of observable ratings they have in the training data, and then measure the improvement in each group. The results evaluated on category B test data are reported in Figure 3. It is easy to find that ImWalkMF consistently outperforms RMF in all user groups. More importantly, our method achieves relatively larger improvements in the cold start user groups (the first few groups in the plots) than in the other groups. This indicates that ImWalkMF addresses the cold start problem well. Besides, the reduction in MAE and RMSE with 60% training data is larger than that with 80% training data, which further verifies the strength of ImWalkMF in modeling cold start users' preferences, since less training data means sparser available data and more cold start users.
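The six-way user grouping used in the cold-start analysis can be sketched as below; the per-user rating counts are synthetic stand-ins for the training data:

```python
def user_group(n_ratings):
    """Map a user's number of training ratings to the six groups used in
    the cold-start analysis: 1~5, 6~10, 11~15, 16~20, 21~25 and >25."""
    bounds = [5, 10, 15, 20, 25]
    for g, upper in enumerate(bounds):
        if n_ratings <= upper:
            return g
    return len(bounds)  # the ">25" group

# bucket a few hypothetical users by their training-rating counts
counts = {"u1": 3, "u2": 14, "u3": 27, "u4": 21}
groups = {u: user_group(c) for u, c in counts.items()}
# u1 -> group 0 (1~5), u2 -> group 2 (11~15), u3 -> group 5 (>25), u4 -> group 4 (21~25)
```

Per-group improvement is then just the difference of the two models' MAE (or RMSE) restricted to the test instances whose user falls in that group.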
Fig. 2. The impacts of different hyper-parameters on the performance of ImWalkMF: window size τ, latent factor vector dimension d, regularization parameter λ and hinge loss margin ξ.
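A minimal sketch of the margin behavior discussed for ξ: assuming, as the parameter analysis suggests, that a negative sample is penalized only where its predicted score exceeds the margin, a larger ξ lets more negative samples escape the penalty. The `neg_penalty` form is a simplified stand-in, not the paper's exact hinge loss:

```python
def neg_penalty(neg_score, xi):
    """Hinge-style penalty on a negative sample's predicted score:
    zero whenever the score stays below the margin xi (a simplified
    stand-in for the paper's loss)."""
    return max(0.0, neg_score - xi)

# with a larger margin, more negative samples go unpenalized
scores = [0.4, 0.9, 1.6, 2.2]
print([round(neg_penalty(s, 1.0), 2) for s in scores])   # [0.0, 0.0, 0.6, 1.2]
print([round(neg_penalty(s, 2.0), 2) for s in scores])   # [0.0, 0.0, 0.0, 0.2]
```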

Fig. 3. The improvements (in terms of reduction in MAE and RMSE) of ImWalkMF over RMF in different user groups. ImWalkMF reaches larger improvements in the first few (cold start) groups of users than in the other groups.

CONCLUSION

In this paper, in order to address the previous work's limitations in utilizing explicit/implicit social information to improve the matrix factorization recommendation technique, we first utilize random walk sampling on the user-item implicit bipartite network to collect a set of node sequences which contain multiple direct/indirect correlations among users and items. Then we design a joint model of matrix factorization and implicit walk integrative learning based on the random walk sequences. Finally, we propose a sampling-based combined strategy to train the joint model and learn the latent factors of users and items. The experimental results on the two real-world datasets show that our proposed method outperforms the traditional regularized/probabilistic matrix factorization models as well as the other baselines which employ explicit/implicit social information. Overall, our work provides new insights for capturing and modeling comprehensive implicit information in recommender systems. Lastly, some potential future work includes: (1) extending the implicit walk integrative learning with more content, such as items' tag information; (2) designing a heterogeneous random walk based integrative learning framework for other applications, such as citation paper recommendation in academic search.

ACKNOWLEDGMENT

This work is supported by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 and the National Science Foundation (NSF) grant IIS-1447795. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This publication is jointly supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. 2639.

REFERENCES

[1] D. A. Davis, N. V. Chawla, N. A. Christakis, and A.-L. Barabási, "Time to care: a collaborative engine for practical disease prediction," DMKD, vol. 20, no. 3, pp. 388–415, 2010.
[2] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in NIPS, 2001, pp. 556–562.
[3] J. D. Rennie and N. Srebro, "Fast maximum margin matrix factorization for collaborative prediction," in ICML, 2005, pp. 713–719.
[4] Y. Koren, "Factorization meets the neighborhood: a multifaceted collaborative filtering model," in KDD, 2008, pp. 426–434.
[5] K. Yu, S. Zhu, J. Lafferty, and Y. Gong, "Fast nonparametric matrix factorization for large-scale collaborative filtering," in SIGIR, 2009, pp. 211–218.
[6] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, 2009.
[7] Y. Koren, "Collaborative filtering with temporal dynamics," Communications of the ACM, vol. 53, no. 4, pp. 89–97, 2010.
[8] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in NIPS, 2007, pp. 1257–1264.
[9] H. Ma, H. Yang, M. R. Lyu, and I. King, "Sorec: social recommendation using probabilistic matrix factorization," in CIKM, 2008, pp. 931–940.
[10] H. Ma, I. King, and M. R. Lyu, "Learning to recommend with social trust ensemble," in SIGIR, 2009, pp. 203–210.
[11] M. Jamali and M. Ester, "A matrix factorization technique with trust propagation for recommendation in social networks," in RecSys, 2010, pp. 135–142.
[12] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in WSDM, 2011, pp. 287–296.
[13] B. Yang, Y. Lei, D. Liu, and J. Liu, "Social collaborative filtering by trust," in IJCAI, 2013, pp. 2747–2753.
[14] A. J. Chaney, D. M. Blei, and T. Eliassi-Rad, "A probabilistic model for using social networks in personalized item recommendation," in RecSys, 2015, pp. 43–50.
[15] G. Guo, J. Zhang, and N. Yorke-Smith, "Trustsvd: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings," in AAAI, 2015, pp. 123–129.
[16] X. Wang, S. C. Hoi, M. Ester, J. Bu, and C. Chen, "Learning personalized preference of strong and weak ties for social recommendation," in WWW, 2017, pp. 1601–1610.
[17] H. Ma, "An experimental study on implicit social recommendation," in SIGIR, 2013, pp. 73–82.
[18] C. Zhang, L. Yu, Y. Wang, C. Shah, and X. Zhang, "Collaborative user network embedding for social recommender systems," in SDM, 2017.
[19] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: Item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.
[20] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "Grouplens: an open architecture for collaborative filtering of netnews," in CSCW, 1994, pp. 175–186.
[21] Y. Hu, Y. Koren, and C. Volinsky, "Collaborative filtering for implicit feedback datasets," in ICDM, 2008, pp. 263–272.
[22] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang, "One-class collaborative filtering," in ICDM, 2008, pp. 502–511.
[23] S. Kabbur, X. Ning, and G. Karypis, "Fism: factored item similarity models for top-n recommender systems," in KDD, 2013, pp. 659–667.
[24] X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua, "Fast matrix factorization for online recommendation with implicit feedback," in SIGIR, 2016.
[25] C.-X. Zhang, Z.-K. Zhang, L. Yu, C. Liu, H. Liu, and X.-Y. Yan, "Information filtering via collaborative user clustering modeling," Physica A, vol. 396, pp. 195–203, 2014.
[26] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell, "Distance metric learning with application to clustering with side-information," in NIPS, 2002, pp. 505–512.
[27] K. Q. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin nearest neighbor classification," in NIPS, 2006.
[28] M. Koestinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, "Large scale metric learning from equivalence constraints," in CVPR, 2012, pp. 2288–2295.
[29] W. Liu and I. W. Tsang, "Large margin metric learning for multi-label prediction," in AAAI, 2015, pp. 2800–2806.
[30] Z. Zhao, Q. Yang, D. Cai, X. He, and Y. Zhuang, "Expert finding for community-based question answering via ranking metric network learning," in IJCAI, 2016, pp. 3000–3006.
[31] Z. Zhao, H. Lu, V. W. Zheng, D. Cai, X. He, and Y. Zhuang, "Community-based question answering via asymmetric multi-faceted ranking network learning," in AAAI, 2017.
[32] C.-K. Hsieh, L. Yang, Y. Cui, T.-Y. Lin, S. Belongie, and D. Estrin, "Collaborative metric learning," in WWW, 2017, pp. 193–201.
[33] T. Chen and Y. Sun, "Task-guided and path-augmented heterogeneous network embedding for author identification," in WSDM, 2017.
[34] C. Gkantsidis, M. Mihail, and A. Saberi, "Random walks in peer-to-peer networks," in INFOCOM, 2004.
[35] J. Leskovec and C. Faloutsos, "Sampling from large graphs," in KDD, 2006, pp. 631–636.
[36] M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou, "Walking in facebook: A case study of unbiased sampling of osns," in INFOCOM, 2010.
[37] L. Page, S. Brin, R. Motwani, and T. Winograd, "The pagerank citation ranking: Bringing order to the web," Stanford InfoLab, Tech. Rep., 1999.
[38] M. E. Newman, "A measure of betweenness centrality based on random walks," Social Networks, vol. 27, no. 1, pp. 39–54, 2005.
[39] X. Song, Y. Chi, K. Hino, and B. Tseng, "Identifying opinion leaders in the blogosphere," in CIKM, 2007, pp. 971–974.
[40] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in KDD, 2014, pp. 701–710.
[41] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in KDD, 2016, pp. 855–864.
[42] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in NIPS, 2013, pp. 3111–3119.
[43] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "Bpr: Bayesian personalized ranking from implicit feedback," in UAI, 2009, pp. 452–461.
[44] T. Hofmann, "Collaborative filtering via gaussian probabilistic latent semantic analysis," in SIGIR, 2003, pp. 259–266.
[45] Y. Koren, "Factor in the neighbors: Scalable and accurate collaborative filtering," TKDD, vol. 4, no. 1, 2010.