
Received March 4, 2020, accepted March 9, 2020, date of publication March 16, 2020, date of current version March 26, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2981253

Graph Filtering for Recommendation on Heterogeneous Information Networks

CHUANYAN ZHANG1 AND XIAOGUANG HONG2
1School of Computer Science and Technology, Shandong University, Jinan 250101, China
2Software College, Shandong University, Jinan 250101, China

Corresponding author: Chuanyan Zhang (chuanyan_zhang@sina.cn)

The associate editor coordinating the review of this manuscript and approving it for publication was Mehul S. Raval.

ABSTRACT Various kinds of auxiliary data in web services have been proved to be valuable for handling the data sparsity and cold-start problems of recommendation. However, it is challenging to develop effective approaches to model and utilize this varied and complex information. Owing to its flexibility in modelling data heterogeneity, the heterogeneous information network (HIN) has been adopted to model auxiliary data for sparse recommendation, known as HIN-based recommendation. However, most of these HIN-based methods rely on meta path-based similarity or graph embedding, which cannot fully mine the global structure and semantic features of users and items. Besides, these methods, which utilize extended matrix factorization models or deep learning models, suffer from expensive model building and cannot treat personal latent factors carefully because of their global objective functions. In this paper, we model both rating data and auxiliary data through a unified graph and propose a graph filtering (GF) recommendation method on HINs. Distinct from traditional HIN-based methods, GF uses a rate pair structure to represent a user's feedback information and predicts the rating under the intuition that ''a predicted rating depends on its similar rating pairs.'' Concretely, we design a semantic and sign value-aware similarity measure based on SimRank, named Constrained SimRank, to weight rating pair similarities on the unified graph, and compute the predicted rating score for an active user via a weighted average of all similar ratings. The various semantics behind the edges of the unified graph contribute differently to the prediction. Thus, an adaptive framework is proposed to learn the weights of the different semantic edges and produce an optimized predicted rating. Finally, experimental studies on various real-world datasets demonstrate that GF is effective in handling the sparsity issue of recommendation and outperforms the state-of-the-art techniques.

INDEX TERMS Recommender systems, graph filtering, constrained SimRank, heterogeneous information
networks.

I. INTRODUCTION
Recommender systems, which help users discover items of interest from near-infinite collections, have been playing an increasingly important role in various online services, such as YouTube, Amazon and Alibaba [1]–[3]. Among the techniques utilized for recommendation, collaborative filtering (CF), whose fundamental intuition is that if two users rate some items similarly or have similar behaviors (e.g., watching, buying) they will rate or act on other items similarly, has shown its effectiveness and efficiency. Nowadays, CF is widely regarded as one of the most successful methods for building recommender systems.

Early generation CF methods, also known as memory-based CF, use the user rating data to calculate the similarity between users or items and make predictions according to the calculated similar ratings [3], [4]. Being easy to implement and highly effective, memory-based CF methods are notably deployed in many commercial systems. When the user feedback information is sparse, however, the performance of memory-based CF methods deteriorates drastically, since the similarity is based on co-rated items. To overcome the sparsity problem, model-based CF methods have been investigated, which use the pure rating data (a.k.a. user feedback information) to learn a model for predictions [3], [5]. Most of these methods rely on matrix factorization (MF), which factorizes the user-item rating matrix into two low-rank user-specific and item-specific matrices, and then utilizes these two factorized matrices to make further predictions [3].
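As background, the MF prediction just described can be sketched in a few lines of Python; the factor matrices below are random stand-ins for learned values, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, rank = 5, 7, 3

U = rng.normal(size=(num_users, rank))   # user-specific low-rank factors
V = rng.normal(size=(num_items, rank))   # item-specific low-rank factors

def mf_predict(u, i):
    """Predicted rating = inner product of the user and item latent factors."""
    return float(U[u] @ V[i])

print(mf_predict(0, 3))
```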


It is effective but not sufficient, since dimensionality reduction usually means losing useful information. Especially for large-scale sparse data, to achieve high-density factorized matrices the ranks should be much lower, and hence more information is lost. Besides, simple MF on the pure rating matrix cannot handle the cold-start problem.

With the rapid development of web services, various kinds of auxiliary information (a.k.a. side information) have become valuable for recommender systems [6]. Although auxiliary information has been demonstrated useful for dealing with the data sparsity and cold-start problems, it is difficult to model and explore. Generally, current methods that integrate side information can be divided into 3 categories: (1) factorization machine (FM) models [7]; (2) CF-based models [8]; (3) deep learning models [9]. FM models usually utilize one-hot or multi-hot coding techniques to model auxiliary information. Although some modified methods adopt an embedding layer in a neural network-based FM framework, the models are still too complex to learn due to the high-dimension feature vectors. The latter two kinds of methods usually adopt HINs to characterize more useful auxiliary information in recommender systems, due to their flexibility in modelling data heterogeneity. To extract useful knowledge from HINs, most existing methods rely on meta paths and graph embedding. However, neither can fully mine the latent structure and semantic features of users and items. First, the meta path, which is usually utilized to measure user or item similarities on HINs, is a kind of local metric; e.g., PathSim [10] only explores symmetric meta path instances in HINs. Meanwhile, the graph embedding technique essentially uses a low-dimension latent vector to model the object structure in HINs, and information loss is inevitable. Further, all three kinds of learning approaches suffer from an expensive model-building problem and are hard to deploy in complex scenarios, especially with many kinds of side information. Another issue is that they all create a global objective function to optimize the parameters of latent factors. Thus, they cannot learn personalized latent factors well. For example, when CF models learn the latent matrices U and V, for a specific user i, Ui depends on all his historical information; for a specific item j, Vj relies on all its rated scores. That is to say, special behaviors are easy to overlook; e.g., if we find that user i usually likes action movies, the learned model will regard any action movie as a potential favorite and completely ignore that he may dislike some action movies for other reasons. In FM or deep learning models, this problem still exists, i.e., the parameter of a feature or feature interaction is subject to the global optimization and can hardly handle special cases. In conclusion, in order to improve recommendation performance via various kinds of side information, there are 3 challenging issues, namely effective data modelling, information extraction and information exploitation.

For the first issue, we still use a HIN to model both the user behavior information (e.g., buy, watch and rate) and the auxiliary information, named the Unified Graph. Then, we propose Graph Filtering (GF) for prediction in this paper, under the intuition that ''the predicted score for a given rating behavior depends on its similar rating behaviors.'' To extract similarities between user behaviors, we first use rate pairs to represent user behaviors, and then we propose Constrained SimRank (CSR) to measure the rate pair similarities on the unified graph. CSR, an extended form of SimRank [23] on HINs, is aware of the semantics and signed values (e.g., rating score, like or dislike) of different edges. Distinct from meta path-based measures, CSR is a global similarity measure on HINs and can easily integrate temporal information. To predict the score of a rate pair, we explore the weighted average value of all its similar rate pairs. Effectively, our model is inspired by the memory-based CF method, but we expand the scope of similar objects via the link-based CSR. Finally, we also create an objective function for prediction optimization with the weights of semantic edges as the parameters. Distinct from other models, the parameters are very few and they reflect the semantic importance for our prediction task rather than concrete rating behaviors. Thus, this does not change our recommendation intuition, i.e., a special prediction depends on its own special similar rate pairs.

The primary contributions of this paper are as follows:
1) We study the problems in exploring various kinds of auxiliary information for better recommendation, and propose a novel case-to-case method, called Graph Filtering, which predicts the rating score by similar rate pairs on a HIN; that is, ''the predicted score for a given rating behavior depends on its similar rating behaviors.''
2) In GF, a unified graph is proposed to model both user behavior information and side information. To measure the similarity of rate pairs on the unified graph, we design a semantic and sign value-aware similarity measure, called Constrained SimRank.
3) We propose an adaptive optimization algorithm to weight the contributions of different semantics on the unified graph for our prediction task.
4) We conduct extensive experiments to evaluate GF, as well as other state-of-the-art baselines. The results demonstrate the superiority of our proposed method on various sparse datasets.

II. RELATED WORK
In the literature of recommender systems, early works mainly focus on CF methods which utilize user historical feedback information for prediction. Since traditional memory-based CF methods usually suffer from data sparsity and cold-start problems, some works attempt to leverage additional data, such as social data [11]–[13], location information [14] and context information [15], [28]. All such additional data is called auxiliary information. The classical methods that exploit various kinds of auxiliary information to alleviate the data sparsity and cold-start problems can be grouped into three categories: (1) factorization machine models; (2) matrix factorization models; (3) deep learning models.

FMs are a general predictor working with any real-valued feature vectors.


To explore the side data for recommendation, FMs first convert it into high-dimensional generic feature vectors by using one-hot/multi-hot encoding. Then, FMs estimate the target by modelling all interactions between each pair of features via factorized interaction parameters. Based on FMs, Xiang et al. propose neural FM (NFM) [16], which seamlessly combines the linearity of FM in modeling second-order feature interactions and the non-linearity of neural networks in modeling higher-order feature interactions. Traditional MF models generally factorize the rating matrix into two latent matrices, i.e., U and V, by optimizing an objective function over the observed ratings. To improve the effectiveness by exploiting the auxiliary information, many works rewrite the objective function by adding a distance function over the relationships among users and items [17]. With the surge of deep learning, deep neural networks are also employed to deeply capture the latent features of users and items for recommendation, e.g., NCF [18], CFM [9], DeepFM [8] and NeuACF [9].

Since HINs can naturally model complex objects and their rich relationships, some works have begun to adopt HINs in recommendation. Specifically, HINs are just utilized to model data; extracting information from HINs and exploring it for recommendation are different problems. The two usual technologies for information extraction are path-based similarity measures [10], [19] and network embedding [20], and the models that explore the extracted information from HINs are either CF or deep learning models. In this paper, we still employ HINs to model auxiliary information. Distinct from the above methods, we propose a novel similarity measure to extract information and a case-to-case prediction method. Compared with traditional path-based similarity measures, CSR is global, and GF can handle users' special behaviors for more personalized recommendation.

III. PRELIMINARY
In this section, we first introduce the basic preliminaries for graph filtering recommendation, including notations and data structures, and then formulate the problem. First, we give the definition of HIN, and then propose a unified graph based on HIN to model both user feedback information and the complex auxiliary information.

Definition 1 (Information Network): An information network is defined as a directed graph G = (V, E) where each object v ∈ V belongs to one particular object type φ(v) ∈ A, and each link e ∈ E belongs to one particular relation type ψ(e) ∈ R. If the number of object types |A| > 1 or the number of relation types |R| > 1, it is called a heterogeneous information network; otherwise, it is a homogeneous information network.

In GF, the information that is modeled and explored for prediction can be divided into 2 categories: user feedback information, including ratings or other behaviors (e.g., buy, watch, like), and auxiliary information, such as object attributes and social relationships. First, the user feedback information is modeled as a HIN GR = (VR, ER) which contains 2 object types, i.e., A = {user, item}. If there is a rating from user u to item i, a weighted edge e(u, i) is created carrying the rating score. Then, we integrate the auxiliary information into GR. In general, auxiliary information contains 2 types: structure and attribute information. Structural information, representing object relations, can be modeled as edges in GR. Meanwhile, attribute information is a kind of appended information on the corresponding objects that describes their features. To model attribute data, we create attribute nodes and edges. For an attribute ai, if its value is discrete, like the movie genres in Fig. 1, we model each attribute value ai,j (i.e., the jth value of ai) as a node and create an attribute edge ea(v, ai,j) if the node v has attribute ai with value ai,j; if the value is continuous, e.g., a user's age, we segment the value domain into discrete fragments. For each fragment, we create a node and edges in GR in the same way as for a discrete attribute. Finally, we get a directed unified graph with the reverse edges added. Fig. 1 shows an example of a unified graph for movie recommendation where the users' social relationship is a kind of structural information, and the movie genre is a kind of attribute information. Particularly, an object may have more than one value for a specific attribute, e.g., the genre of movie m1 can be both affection and comedy. We denote the unified graph as GU, and the formal definition is as follows.

Definition 2 (Unified Graph): Given a HIN GR = (VR, ER) that represents the rating matrix R, and various kinds of structure information and attribute information, the Unified Graph, denoted as GU = (VU, EU), is a kind of complex HIN which integrates the structure information via structure edges ES in GR and the attribute information through creating attribute nodes VA and attribute edges EA, i.e., EU = ER ∪ ES ∪ EA, VU = VR ∪ VA.

Further, we define the concept of a rate pair on GU.

Definition 3 (Rate Pair): A rate pair is a two-tuple of a user and an item, denoted as r(u, i), u ∈ user, i ∈ item in GU. If there is an edge between u and i, i.e., e(u, i) ∈ EU, r(u, i) is called a Feedback Rate Pair and its value is denoted as rui; otherwise, it is a Predicting Rate Pair, and the predicted score of r(u, i) is denoted as r̂ui.
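To make the unified graph construction concrete, the following is a minimal Python sketch assuming toy in-memory inputs (a rating dictionary, a user social-relation list, and discrete item attributes); the node and edge labels are illustrative choices, not notation from the paper.

```python
from collections import defaultdict

def build_unified_graph(ratings, social_edges, item_attrs):
    """Build a toy unified graph G_U as typed adjacency lists.

    ratings      : dict {(user, item): score}          -> rating edges (sign values)
    social_edges : list of (user, user) pairs          -> structure edges
    item_attrs   : dict {item: {attr_name: [values]}}  -> attribute nodes/edges
    """
    adj = defaultdict(list)   # node -> list of (neighbor, relation, sign_value)
    node_type = {}

    def add_edge(a, type_a, b, type_b, relation, sign=None):
        node_type[a], node_type[b] = type_a, type_b
        adj[a].append((b, relation, sign))
        adj[b].append((a, relation, sign))   # reverse edge with the same semantics

    # 1) user feedback information (rating edges carrying sign values)
    for (u, i), score in ratings.items():
        add_edge(("user", u), "user", ("item", i), "item", "rate", sign=score)

    # 2) structure information (e.g., user social relations)
    for u, v in social_edges:
        add_edge(("user", u), "user", ("user", v), "user", "social")

    # 3) attribute information: one node per discrete attribute value
    for i, attrs in item_attrs.items():
        for name, values in attrs.items():
            for val in values:               # multi-valued attributes allowed
                add_edge(("item", i), "item", (name, val), name, name)

    return adj, node_type

# usage with the kind of data shown in Fig. 1
adj, node_type = build_unified_graph(
    ratings={("u1", "m1"): 5, ("u1", "m2"): 4, ("u2", "m5"): 3},
    social_edges=[("u1", "u4"), ("u3", "u4")],
    item_attrs={"m1": {"genre": ["affection", "comedy"]}},
)
```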


FIGURE 1. Example of creating unified graph for movie recommendation based on rating matrix, user relationships and the movie genres.

IV. GRAPH FILTERING
A. INTUITION OF GRAPH FILTERING MODEL
Memory-based CF is a truly effective approach for most recommender systems apart from the data sparsity and cold-start problems. The fundamental assumption of CF is that if users u1 and u2 rate items similarly, or have similar behaviors, they will rate or act on other items similarly. Since the similarity measures of CF are usually based on co-relations (e.g., cosine, Pearson coefficient), the similarity values are unreliable when data are sparse [21]. Taking the rating matrix in Fig. 1 as an example, users u1 and u3 are entirely different if we compute the similarity based on co-relations. However, if we model the matrix as a HIN and use a path-based similarity measure, both u1 and u3 link to u4. Hence, they have a certain similarity. Then, if we rate the movie m5 for u1, not only the feedback information of u2 and u4, but also the rating value of u3 on m5 can be explored. Further, either user-based CF or item-based CF (i.e., the two categories of memory-based CF) adopts only one similar set for rating prediction, i.e., the user similar set or the item similar set respectively, as shown in Fig. 2. However, a rating score is produced by two objects: a user and an item. Thus, when we compute the predicted ratings, we have more valuable rating values that can be explored. These aspects are the main motivation of GF.

The basic intuition behind GF is that, given a predicting rate pair, the predicted value depends on the values of its similar rate pairs. To overcome the limitations of memory-based CF, we first model the known information as a unified graph GU. Given a predicting rate pair r(u, i), we try to find its similar feedback rate pairs via path-based similarity. A straightforward way to measure the similarity between rate pairs is to compute the similarities between the corresponding users and items. Compared with the memory-based CF methods, the final set of similar rate pairs can be divided into 3 subsets: the similar user set, the similar item set and the indirect similar set. Fig. 2 shows the two memory-based CF models, i.e., user-based CF and item-based CF. To compute r̂ui of the predicting rate pair r(u, i), user-based CF (i.e., Fig. 2(a)) first finds the similar users of u and uses their ratings on item i as the relevant scores for r̂ui, while item-based CF (i.e., Fig. 2(b)) first finds the similar items of i and uses the ratings from u on the similar items as the reference for r̂ui.

FIGURE 2. Memory-based CF models and graph filtering model.

Distinct from memory-based CF models, the GF model (i.e., Fig. 2(c)) computes r̂ui through global information over the unified graph GU. First, we utilize path-based similarity on GU to mine more similar objects. Not only the rate pairs that directly connect with the user u or the item i via paths of GU, but also the rate pairs reached through similar users and similar items are explored to compute r̂ui. We denote the similar user set of the user u as SUu, and the similar item set of the item i as SIi. The union of SUu and SIi is called the Direct Similar Set, denoted as DSS, while the rate pair set formed by the Cartesian product of users in SUu and items in SIi is called the Indirect Similar Set, denoted as inDSS. Every element in these 3 sets should be a feedback rate pair. For example, in Fig. 1, if we predict the value of the rate pair (u1, m5), the similar user set is {(u2, m5), (u3, m5), (u4, m5)}, the similar item set is {(u1, m1), (u1, m2), (u1, m3)} and the indirect similar set is {(u2, m1), (u2, m2), (u3, m3), (u4, m1), (u4, m2), (u4, m3)}. We use the following formulas to illustrate the relations between memory-based CF and GF.

The memory-based CF model is formulated as

$$\hat{r}_{ui} \propto \begin{cases} \sum_{v \in SU_u} r_{vi} & \text{user-based CF} \\ \sum_{j \in SI_i} r_{uj} & \text{item-based CF} \end{cases} \tag{1}$$

The GF model is formulated as

$$\hat{r}_{ui} \propto \sum_{v \in SU_u} r_{vi} + \sum_{j \in SI_i} r_{uj} + \sum_{r(v,j) \in inDSS} r_{vj} \tag{2}$$

where rvj is the value of a rate pair in inDSS.
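As an illustration of Eq. (2) (and of the bias-corrected, similarity-weighted form developed in the next subsection), the sketch below aggregates the three similar sets in Python; the similarity function and the neighbor sets are placeholders supplied by the caller, not part of the paper's algorithm.

```python
def gf_predict(u, i, ratings, user_mean, similar_users, similar_items, sim):
    """Weighted-average GF prediction over SU_u, SI_i and inDSS (a sketch).

    ratings       : dict {(user, item): observed score}
    user_mean     : dict {user: average rating}, used for the bias correction
    similar_users : SU_u;  similar_items : SI_i
    sim           : callable sim((u, i), (v, j)) -> similarity in [0, 1]
    """
    # candidate feedback rate pairs: similar-user set, similar-item set, indirect set
    candidates = (
        [(v, i) for v in similar_users] +
        [(u, j) for j in similar_items] +
        [(v, j) for v in similar_users for j in similar_items]
    )
    num, den = 0.0, 0.0
    for (v, j) in candidates:
        if (v, j) not in ratings:        # only feedback rate pairs contribute
            continue
        w = sim((u, i), (v, j))
        num += w * (ratings[(v, j)] - user_mean[v])
        den += w
    if den == 0:
        return user_mean[u]
    return user_mean[u] + num / den
```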


B. BASIC GRAPH FILTERING EQUATION
Given two rate pairs r(u, i) and r(v, j), let s[(u, i), (v, j)] be the similarity between them. If u = v or i = j, the similarity can be simplified as s(i, j) or s(u, v). For example, if one user u has rated two different items i and j, the distance between the two rate pairs only relies on the items. Following our earlier intuition that the value of a predicting rate pair depends on its similar rate pairs, the predicted rate value is computed as a weighted sum of the scores of its similar rate pairs. Based on Eq. (2), we write our basic equation for r̂ui:

$$\hat{r}_{ui} = \sum_{v=1}^{|SU_u|} \tilde{s}(u, v)\, r_{vi} + \sum_{j=1}^{|SI_i|} \tilde{s}(i, j)\, r_{uj} + \sum_{v=1}^{|SU_u|} \sum_{j=1}^{|SI_i|} \tilde{s}[(u, i), (v, j)]\, r_{vj} \tag{3}$$

where s̃ represents the normalized value of the corresponding rate pair similarity s, which can be computed by

$$\tilde{s}[(u, i), (v, j)] = \frac{s[(u, i), (v, j)]}{Norm(u, i)} \tag{4}$$

where Norm(u, i) is the sum of the similarity distances from r(u, i) to all its similar feedback rate pairs. If r(v, j) is a predicting rate pair, then rvj is unknown and useless for the prediction. Therefore, when we compute r̂ui, the similarity value between r(u, i) and any other predicting rate pair is set to 0.

Let Uu = SUu ∪ {u} and Ii = SIi ∪ {i}. Then, Eq. (3) can be uniformly written as

$$\hat{r}_{ui} = \sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} \tilde{s}[(u, i), (v, j)]\, r_{vj} \tag{5}$$

Further, considering the personalized rating bias of each user, we extend Eq. (5) into

$$\hat{r}_{ui} = \bar{r}_u + \sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} \tilde{s}[(u, i), (v, j)]\, (r_{vj} - \bar{r}_v) \tag{6}$$

where r̄u and r̄v are the average rating scores of users u and v over all their rated items.

C. CONSTRAINED SIMRANK
Given the basic GF equation, the next problem is to design a rate pair similarity that meets the following conditions:
• Given a unified graph GU, the similarity measure in GF should be path-based, so that it can mine more latent similar rate pairs in GU.
• Since GU is a complex HIN with various semantic edges, the path-based measure should be semantic-aware.
• In some scenarios, the user's feedback information is a 0/1 matrix where 1 represents ''like'', ''watch'', ''buy'' and so on, and 0 otherwise; in other scenarios, the user's feedback information consists of continuous scores in a specific domain. We call these ratings sign values, and our similarity should be sign-aware, which means similar ratings lead to similar rate pairs.

Given GU, it is difficult to measure the rate pair similarity directly since, to the best of our knowledge, all path-based measures are designed for two nodes. If the two rate pairs share the same item or the same user, the rate pair similarity can be written in detail as a distance between two users or two items. From this perspective, we can measure the rate pair similarity by the two similarities between the corresponding users and items, i.e.,

$$s[(u, i), (v, j)] = s(u, v)\, s(i, j) \tag{7}$$

Let s(∗, ∗) satisfy self-maximum. If i = j, i.e., two different users rate the same item, we only need to weight the similarity between the users; if u = v, i.e., one user rates two different items, only the distance between the items needs to be measured; otherwise, we use the product of the 2 similarity values. Therefore, the factorization in Eq. (7) meets the form of our basic GF equation (3).

In conclusion, to compute s[(u, i), (v, j)] in GU, we only need to design a path-based object similarity measure s(∗, ∗) which is self-maximum and aware of the sign values and edge semantics. In this paper, we propose Constrained SimRank (CSR), an extended form of SimRank on HINs. SimRank [22] is a state-of-the-art link-based similarity measure on homogeneous information networks. It was introduced to formalize the intuition that ''two objects are similar if they are referenced by similar objects.'' Theoretically, SimRank is based on an intuitive ''random surfer-pairs model'', and its score for a node pair u and v, s(u, v), specifies how soon two random surfers are expected to meet at the same node if they start at nodes u and v and randomly walk on the graph. If u = v, s(u, v) = 1; otherwise

$$s(u, v) = \frac{c}{|\Gamma(u)||\Gamma(v)|} \sum_{i=1}^{|\Gamma(u)|} \sum_{j=1}^{|\Gamma(v)|} s(\Gamma_i(u), \Gamma_j(v)) \tag{8}$$

where 0 < c < 1 is the decay factor, Γ(∗) is the (in or out) neighbor set of ∗, and Γi(∗) is its ith element. SimRank is not designed for HINs. Thus, it is unaware of semantics and sign values. If we use SimRank on GU, a more complex HIN, these two questions should be considered. Before we answer them, the network schema of GU is given, and based on this schema we propose our CSR on GU.
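For reference, a minimal (and deliberately naive) fixed-point implementation of the plain SimRank recursion in Eq. (8) is sketched below in Python; the graph is a simple adjacency dictionary, and the decay factor and iteration count are illustrative choices rather than values prescribed by the paper.

```python
def simrank(neighbors, c=0.8, iterations=10):
    """Naive SimRank over a homogeneous graph (a sketch of Eq. (8)).

    neighbors : dict {node: list of neighbor nodes}
    Returns a dict {(a, b): similarity score}.
    """
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                na, nb = neighbors[a], neighbors[b]
                if not na or not nb:
                    new_sim[(a, b)] = 0.0
                    continue
                total = sum(sim[(x, y)] for x in na for y in nb)
                new_sim[(a, b)] = c * total / (len(na) * len(nb))
        sim = new_sim
    return sim
```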


FIGURE 3. The network schema of GU for recommendation.

A network schema is a meta template for a HIN, with the objects and links mapped to their types and directed relation edges respectively [10]. Fig. 3 is the network schema of GU; let m and n be the numbers of attributes of users and items. There are Z (Z = m + n + 3) kinds of semantic relations in GU. If we run SimRank on GU to compute the similarity directly, there may exist Z kinds of semantic edges in a path. Assume that u and v are two users in GU; based on SimRank, they randomly walk forward and meet at the same node j in the kth step. However, the two paths from u to j and from v to j may be completely different in semantics. For example, there are two paths, path(u-ua1,x-j) and path(v-ua2,y-j), on GU. Based on SimRank, s(u, v) would be high since u and v meet in the second step of their random surf. Concretely, if u shares the same job with j and v shares the same age with j, can we infer that u and v have a certain similarity?

Obviously, similarity only exists between objects of the same class and should be computed by comparing their corresponding related objects of the same type. Based on this intuition, we propose Constrained SimRank, in which the pairwise random surfers go to the next step only if the next two nodes belong to the same class. Given GU and u, v ∈ Ai: if u = v, s(u, v) = 1; otherwise s(u, v) is

$$s(u, v) = \frac{c}{|\Gamma(u)||\Gamma(v)|} \sum_{t=1}^{|T_{u,v}|} \sum_{i=1}^{|\Gamma_t(u)|} \sum_{j=1}^{|\Gamma_t(v)|} s(\Gamma_{t,i}(u), \Gamma_{t,j}(v)) \tag{9}$$

where Tu,v is the set of node classes shared by the neighbor nodes of u and v, Γt(u) is the set of neighbors of u of type At, and Γt,i(u) is its ith element, i.e., φ(Γt,i(u)) = At ∈ Tu,v. For example, if u and v are two users, we have φ(Γt,i(u)) ≡ φ(Γt,j(v)). Based on Fig. 3, (φ(Γt,i(u)), φ(Γt,j(v))) on the right-hand side of Eq. (9) may be (user, user), (ua1, ua1), ..., (uam, uam) or (item, item).

Theorem 1: The Constrained SimRank score on HINs, defined in Eq. (9), is semantic-aware.

Proof: SimRank is based on the ''random surfer-pairs model'' on a graph, and the equivalent form of Eq. (8) is

$$s(u, v) = \sum_{t:(u,v) \rightsquigarrow (x,x)} P[t]\, c^{l(t)} \tag{10}$$

Specifically, Eq. (10) is defined on G2. Given a graph G, G2 is a graph where each node represents an ordered node pair of G, and (a, b) points to (c, d) in G2 if a points to c and b points to d in G. Each tour in G2 is composed of 2 same-length paths in G: ta : u ⇝ x and tb : v ⇝ x. P[t] is the traveling probability and l(t) is the length of t. Here we represent t by its two stepwise paths in G, formally written as t = ta ∥ tb. We will prove that CSR is also based on the ''random surfer-pairs model'', and then we can infer that ta and tb are semantically consistent at every step for any t.

For our CSR, we extend the concept of expected meeting distance (EMD) into the constrained expected meeting distance (CEMD). The similarity between u and v of the same type in G based on CEMD is

$$m(u, v) = \sum_{\substack{T: A_i \rightsquigarrow A_j \\ A \subseteq T_G}} \; \sum_{\substack{t:(u,v) \rightsquigarrow (x,x) \\ t \in T}} P[t]\, c^{l(t)} \tag{11}$$

where TG is the network schema of G.

First, if u = v, m(u, v) = s(u, v) = 1 since l(t) = 0. If there is no path t from (u, v) to any singleton node, i.e., there are no same-length paths ta : u ⇝ x and tb : v ⇝ x of the same meta path, then m(u, v) = 0, and it is easy to see that s(u, v) = 0 from Eq. (9) since no similarity would flow to (u, v). Otherwise, consider the tours from (u, v) to a singleton node x in which the first step is to their out-neighbors Ok((u, v)) of type Ak, i.e., ta : u → Ak,i(u) ⇝ x and tb : v → Ak,j(v) ⇝ x. Then, the tour from (Ak,i(u), Ak,j(v)) to the node x is denoted as t′. For each t we can derive a corresponding t′ by splitting off the edge e((u, v), (Ak,i(u), Ak,j(v))) at the beginning in G2. And corresponding to the path instances t and t′, we derive the meta path T′ from T, where T′ : Ak ⇝ A(x), by splitting off the meta edge (A(u), Ak) in TG. We use Au,v to denote the bijection that takes each T to T′ via Ak. Then, we get

$$
\begin{aligned}
m(u, v) &= \sum_{\substack{T: A_i \rightsquigarrow A_j \\ A \subseteq T_G}} \; \sum_{\substack{t = t_a \| t_b \\ t_a, t_b \in T}} P[t]\, c^{l(t)} \\
&= \sum_{A_k,\, A_{k,i}(u),\, A_{k,j}(v)} \; \sum_{\substack{T': A_k \rightsquigarrow \phi(x) \\ A \subseteq T_G}} \; \sum_{\substack{t' = t'_a \| t'_b \\ t'_a, t'_b \in T'}} \frac{1}{|O(u)||O(v)|}\, P[t']\, c^{l(t')+1} \\
&= \frac{c}{|O(u)||O(v)|} \sum_{A_k,\, A_{k,i}(u),\, A_{k,j}(v)} \; \sum_{\substack{T': A_k \rightsquigarrow \phi(x) \\ A \subseteq T_G}} \; \sum_{\substack{t' = t'_a \| t'_b \\ t'_a, t'_b \in T'}} P[t']\, c^{l(t')} \\
&= \frac{c}{|O(u)||O(v)|} \sum_{k=1}^{|A_{u,v}|} \sum_{i=1}^{|A_{k,i}(u)|} \sum_{j=1}^{|A_{k,j}(v)|} m(A_{k,i}(u), A_{k,j}(v))
\end{aligned}
$$

Therefore, Eq. (11) is identical to Eq. (9) of CSR. Further, given a meeting path t of Eq. (11), t = ta ∥ tb, ta : a1 ⇝ am, tb : b1 ⇝ bm, since φ(ai) = φ(bi) we have

$$R(\phi(a_{i-1}), \phi(a_i)) = R(\phi(b_{i-1}), \phi(b_i)), \quad 1 < i \le m.$$

Here R denotes the edge semantics, and hence ta and tb are semantically consistent at every step. Finally, we can infer that the CSR score of Eq. (9) is semantic-aware. □
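The following Python sketch illustrates the type-constrained recursion of Eq. (9): at each step the two surfers only compare neighbors of the same node class, unlike plain SimRank. It is a simplified illustration (uniform weights, no sign values yet), and the data layout is an assumption made here, not taken from the paper.

```python
from collections import defaultdict

def constrained_simrank(neighbors, node_type, c=0.8, iterations=10):
    """Constrained SimRank (a sketch of Eq. (9)): only same-type neighbor pairs
    propagate similarity, and scores are defined only for same-type node pairs.

    neighbors : dict {node: list of neighbor nodes}
    node_type : dict {node: type label}, e.g. "user", "item", "genre"
    """
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0
           for a in nodes for b in nodes if node_type[a] == node_type[b]}
    for _ in range(iterations):
        new_sim = {}
        for (a, b) in sim:
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            # group each node's neighbors by their type (the classes in T_{a,b})
            by_type_a, by_type_b = defaultdict(list), defaultdict(list)
            for x in neighbors[a]:
                by_type_a[node_type[x]].append(x)
            for y in neighbors[b]:
                by_type_b[node_type[y]].append(y)
            total = 0.0
            for t in by_type_a.keys() & by_type_b.keys():   # shared classes only
                total += sum(sim.get((x, y), 0.0)
                             for x in by_type_a[t] for y in by_type_b[t])
            denom = len(neighbors[a]) * len(neighbors[b])
            new_sim[(a, b)] = c * total / denom if denom else 0.0
        sim = new_sim
    return sim
```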


Given the basic form of CSR, the next question is how to integrate the sign values of GU into Eq. (9). In GU, sign values only exist on the relationships (user, item) and (item, user). The sign values can be classified into 2 types: (1) discrete values (e.g., like, watch, buy); (2) continuous values, such as rating scores. We design a function f to weight the similarity between the sign values. Suppose that xt,i is the sign value of the edge e(u, Γt,i(u)), and yt,j is the sign value of e(v, Γt,j(v)). Based on the definition of CSR, the values of e(u, Γt,i(u)) and e(v, Γt,j(v)) must belong to the same type. If the edges have no sign value, f(xt,i, yt,j) = 1; else, if they are discrete values, the f function is

$$f(x_{t,i}, y_{t,j}) = \begin{cases} 1 & \text{if } x_{t,i} = y_{t,j} \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

This ensures that only edges with the same sign value lead to similarity. For example, if users u and v both like or both dislike item i, we say that u and v are similar. However, if u likes i while v dislikes i, then u and v are not similar from the perspective of item i, even if they have a high similarity score in the structure of GU. Then, if the values of the signed edges are continuous, we compute f via an exponential function,

$$f(x_{t,i}, y_{t,j}) = \exp\!\big(-\lambda\, |x_{t,i} - y_{t,j}|\big) \tag{13}$$

where λ controls the effect of the sign values.

In GU, the various semantic edges make different contributions to the similarity computation and further to the rating prediction. Let R(Ai, Aj) be a semantic edge in TG. Obviously, we have R(Ai, Aj) = R(Aj, Ai) since they share the same semantics. We use W to denote the semantic edge weights in Fig. 3, and the final CSR score is

$$s(u, v) = \frac{c}{|\Gamma(u)||\Gamma(v)|} \sum_{t=1}^{|T_{u,v}|} \sum_{i=1}^{|\Gamma_t(u)|} \sum_{j=1}^{|\Gamma_t(v)|} w_t\, f(x_{t,i}, y_{t,j})\, s(\Gamma_{t,i}(u), \Gamma_{t,j}(v)) \tag{14}$$

where wt is one dimension of W, W = (w1, w2, ..., wZ), and wt > 0, Σ_{t=1}^{Z} wt = Z.

D. RATE PAIR SIMILARITY
Given GU, we can compute the rate pair similarity based on Eq. (7) and Eq. (14). In this section, we introduce the computation of the rate pair similarity. Before that, some good properties of our rate pair similarity for the GF model are shown in Theorem 2.

Theorem 2 (Properties of Rate Pair Similarity):
1) Symmetry: s[(u, i), (v, j)] = s[(v, j), (u, i)];
2) Self-maximum with W = 1.0: s[(u, i), (v, j)] ∈ [0, 1] and s[(u, i), (u, i)] = 1;
3) Awareness of sign value: let au,i, bv,j be the sign values of r(u, i), r(v, j); then s[(u, i), (v, j)] ∝ f(au,i, bv,j).

Proof: These properties can be easily proved based on our definitions.

Since s(u, v)s(i, j) = s(i, j)s(u, v), we can infer that the rate pair similarity is symmetric according to Eq. (7).

The domain of SimRank is [0, 1]. Let u, v be two nodes in G. If there is no path between them, the SimRank value is 0; else, if u = v, it is 1; otherwise the SimRank score belongs to (0, 1). Based on Eq. (14), the CSR score is no bigger than the SimRank score in the same case, and it can also be 0 or 1 under the same conditions as SimRank. Based on Eq. (7), the rate pair similarity is therefore self-maximum with domain [0, 1].

Based on Eq. (14), user similarity partially relies on item similarity, and item similarity partially depends on user similarity. Thus, s(u, v) ∝ f(xu,i, yv,j) and s(i, j) ∝ f(xi,u, yj,v). Since au,i = xu,i = xi,u and bv,j = xv,j = xj,v, based on Eq. (7) we have s[(u, i), (v, j)] ∝ f(au,i, bv,j). □

Besides these properties, we can also infer that the rate pair similarity is semantic-aware, based on Theorem 1.

Computing the rate pair similarity amounts to solving Eq. (14) of CSR. Inspired by SimRank, the solution to the CSR equation can be reached by iterating to a fixed point. Given GU, for each Ai ∈ A we use Ni to denote the number of nodes belonging to Ai, and create Ni² entries S(∗, ∗). For each iteration we keep Σ_{i=1}^{|A|} Ni² entries S^k(∗, ∗), where S^k(a, b) gives the score between a and b at the kth iteration. S^{k+1}(∗, ∗) can be computed based on S^k(∗, ∗), starting from S^0(∗, ∗). Each S^0(a, b) is a lower bound on the Constrained SimRank score s(a, b):

$$S^0(a, b) = \begin{cases} 0 & \text{if } a \ne b \\ 1 & \text{if } a = b \end{cases} \tag{15}$$

To compute S^{k+1}(∗, ∗), based on Eq. (14), we get:

$$S^{k+1}(a, b) = \begin{cases} s^k(a, b) & \text{if } a \ne b \\ 1 & \text{if } a = b \end{cases} \tag{16}$$

where s^k(a, b) denotes the right-hand side of Eq. (14) evaluated with the scores from iteration k. On each (k + 1)th iteration, we update the similarity score of (a, b) using the scores of its neighbors from the previous iteration k. Since the value S^k(a, b) is nondecreasing as k increases and has an upper limit of 1, it converges, i.e., for any pair (a, b), a, b ∈ VU in GU = (VU, EU), we can infer that lim_{k→∞} S^k(a, b) = s(a, b).

Let b = max{Ni²}. For computing the rate pair similarity, the space required is O(Zb²) to store the results S^k(∗, ∗). The time required is the same as for SimRank. Further, suppose d̄ is the average value of |Γ(∗)||Γ(∗)| over GU; then the required time is O(kZb²d̄). In our experiments, we set the precision of the CSR calculation to 10⁻⁵, and we have observed that the final scores stabilize within 9 iterations, i.e., k ≤ 9.

Furthermore, the rate pair profile also retains temporal information. Behaviors from long ago may have little influence on recent behaviors. We incorporate the temporal information of rate pairs in GF by adding a time-based weight, and the predictions, weighted by the time-based parameter, are computed as follows [23]:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} s'[(u, i), (v, j)]\, (r_{vj} - \bar{r}_v)}{\sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} s'[(u, i), (v, j)]} \tag{17}$$

where s′[(u, i), (v, j)] is the time-aware rate pair similarity of r(u, i) and r(v, j), computed by

$$s'[(u, i), (v, j)] = s[(u, i), (v, j)]\, e^{-\alpha(d_{u,i} - d_{v,j})} \tag{18}$$

Note that the prediction has a time-based relevance factor exp(−α(du,i − dv,j)) with a decaying rate controlled by the parameter α. Here, dv,j denotes the timestep when user v rated the item j [24].
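A small Python sketch of the time-decayed prediction in Eqs. (17)–(18) follows; the precomputed structural similarities, the timestamps, and the decay rate α are inputs assumed here for illustration.

```python
import math

def time_aware_predict(u, i, candidates, sim, ratings, user_mean, timestamp, alpha=0.1):
    """Bias-corrected weighted average over similar feedback rate pairs,
    with each similarity decayed by the rating-time gap (Eqs. (17)-(18) sketch).

    candidates: iterable of candidate feedback rate pairs (v, j)
    sim       : dict {((u, i), (v, j)): structural rate pair similarity}
    ratings   : dict {(v, j): observed score};  user_mean: dict {user: mean rating}
    timestamp : dict {(user, item): timestep of the rating (or prediction time)}
    """
    num, den = 0.0, 0.0
    for (v, j) in candidates:
        s = sim.get(((u, i), (v, j)), 0.0)
        decay = math.exp(-alpha * (timestamp[(u, i)] - timestamp[(v, j)]))
        s_time = s * decay                                   # Eq. (18)
        num += s_time * (ratings[(v, j)] - user_mean[v])
        den += s_time
    return user_mean[u] if den == 0 else user_mean[u] + num / den   # Eq. (17)
```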


E. GF BASED RECOMMENDATION
Given GU, we can easily predict rating scores as in memory-based CF based on Eq. (7) and Eq. (17), except for the parameters W. To compute W, we propose a self-adjusting method that optimizes W and predicts rating scores through an iterative process:
1) Let W^t be the weight parameters of the semantic relations in the tth iteration. Initialize W^0 with w_i^0 = 1, 1 ≤ i ≤ Z.
2) Given GU and W^t, compute r̂ based on Eq. (17).
3) Recompute the rating scores r̃ of the feedback rate pairs based on r and r̂. Weight the differences between r and its recomputed score r̃ through an objective function O(W, r, r̃).
4) Obtain W^{t+1} by optimizing O(W, r, r̃).
5) Repeat until the parameters W and r̂ converge.
In this section, we first build the objective function. Then, we propose a vote mechanism to compute the parameters, and the GF-based recommendation algorithm is given at last.

1) OBJECTIVE FUNCTION
One precondition for our GF model to predict effectively is that the outputs should be close to the feedback rating scores if we use GF to recompute the rating matrix. Suppose the rating matrix is denoted as R. If r(u, i) is a feedback rate pair, R(u, i) = rui; otherwise R(u, i) = null. Given GU and W, GF can complete R, denoted as R̂. Based on R̂, we can get r̃ui by

$$\tilde{r}_{ui} = \bar{r}_u + \frac{\sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} s'[(u, i), (v, j)]\, (\hat{R}(v, j) - \bar{r}_v)}{\sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} s'[(u, i), (v, j)]} \tag{19}$$

Based on the recomputed values of the feedback rate pairs, we can define the objective function of GF.

Definition 4 (Graph Filtering Objective Function): Given GU, the weights W of the semantic relations and the rating matrix R, the goal of GF-based recommendation is to compute the value r̂ for each predicting rate pair, which complements R as R̂, so that the following objective function is minimized:

$$O(\{\hat{r}_{ui}\}, W) = \sum_{r(v,j) \in FRP} (r_{vj} - \tilde{r}_{vj})^2, \quad \text{s.t. } 0 < w_i < Z \text{ and } \sum_{i=1}^{Z} w_i = Z \tag{20}$$

where FRP is the set of all feedback rate pairs in GU.

2) SELF-ADJUSTIVE SOLUTION
Eq. (20) is too complex to solve directly because of the polynomial denominators in W coming from Eq. (17). Hence, we propose an approximate solution that fixes a constant norm. For each predicting rate pair r(u, i), we choose the top kui most similar rate pairs. Let r(u, i)^(k) be the kth most similar rate pair of r(u, i); the top kui most similar rate pairs should satisfy:

$$\sum_{k=1}^{k_{ui}} s'[(u, i), (u, i)^{(k)}] \le norm, \qquad \sum_{k=1}^{k_{ui}+1} s'[(u, i), (u, i)^{(k)}] > norm$$

The constant norm plays an important role in our GF model. It determines how many feedback rate pairs are regarded as similar to the target predicting rate pair and are further utilized for prediction. Given GU, based on CSR, highly connected rate pairs may have many similar rate pairs with high scores, while most rate pairs have few similar rate pairs with high similarity scores. Considering the sparsity problem, GF desires more similar rate pairs for prediction. Assume that

$$\chi = \min \Big\{ \sum_{v=1}^{|U_u|} \sum_{j=1}^{|I_i|} s'[(u, i), (v, j)] \;\Big|\; r(u, i) \in PRP \Big\}$$

where PRP is the set of all predicting rate pairs in GU. Further, we set norm = 0.85χ.

Theorem 3: Given GU and norm, there exists a unique solution W∗ = (w∗1, ..., w∗Z) that minimizes the objective function in Eq. (20), except when all users' rating scores in the rating matrix are the same.

Proof: Substituting Eq. (17) and Eq. (19), with a given norm, into the objective function, we can infer that Eq. (20) is a polynomial function of the multi-variable W with non-negative real coefficients. Assume that the polynomial expression of wi in Eq. (20) is denoted as fi(wi). Then the Lagrange form of Eq. (20) is

$$O'(\{\hat{r}_{u,i}\}, W, \lambda) = \sum_{i=1}^{Z} f_i(w_i) + \lambda \Big( \sum_i w_i - Z \Big) \tag{21}$$

The partial derivative functions of Eq. (21) are

$$\frac{\partial O'}{\partial w_i} = \frac{\partial f_i(w_i)}{\partial w_i} + \lambda + b = 0, \quad 1 \le i \le Z \tag{22}$$

Based on Gauss' Fundamental Theorem of Algebra [25], Eq. (22) has at least one complex root. Assume λ = λ∗. When λ ≥ −b, there is no real-number solution to Eq. (22). Considering the condition λ < −b, there must be at least one positive number δ that makes Eq. (22) greater than 0 when w∗i = δ. Since Eq. (22) is continuous on the real interval (0, δ), there must be at least one solution in (0, δ). Therefore, there is also a unique positive root which minimizes the objective function in Eq. (20). □

Given a norm in Eq. (17), minimizing the objective function is a linear programming problem. Thus, an adaptive adjustment method is proposed to iteratively improve W. First, we initialize W^0 with 1.0, and iteratively adjust w_i^t with an increment Δw_i^t. Then, the weight wi in the (t + 1)th iteration is computed by

$$w_i^{t+1} = \frac{1}{2}\big(w_i^t + \Delta w_i^t\big) \tag{23}$$

To accurately determine the increment Δw for each edge type, a majority vote mechanism is designed. In the GF framework, we use the rating scores r to predict r̂ and recompute r as r̃ based on r̂. Due to the symmetry of the similarity, if a feedback rate pair r(v, j) has a high similarity to the predicting rate pair r(u, i), r̂ui will play a highly important role in the re-computation of r̃vj. That is to say, r̂ and r will be closer if their corresponding rate pairs are more similar. Further, to minimize the distance between rvj and r̃vj, where r(v, j) has a strong influence on r(u, i) and r̃vj is computed through r̂ui, we should improve the similarity s[(u, i), (v, j)].


Suppose r(v, j) belongs to the top kui most similar rate pairs of r(u, i). If we improve the similarity s[(u, i), (v, j)], r̃vj will be close to rvj. Therefore, to approximate the objective function, we should maximize the closeness of the similarity set for each predicting rate pair. The rate pair similarity can be improved from 2 aspects: the user and the item. If a kind of semantic edge can improve the similarity of the users or items in a similarity set, the weight of this semantic edge should be higher. Further, a vote mechanism is designed as follows: for each predicting rate pair r(u, i), let U_u^t and I_i^t be the similar sets of user u and item i in the tth iteration, with self-inclusion. If a large portion of the vertices within U_u^t share the same neighbor node of a certain type Ak, it means that the semantic edge R(user, Ak) has a good property for the similarity between nodes in U_u^t. Suppose that vp and vq are two users in U_u^t or two items in I_i^t; we design a vote measure to determine whether two vertices share the same neighbors:

$$vote_k(v_p, v_q) = \begin{cases} 1 & v_p \ne v_q,\ \exists\, e(v_p, a), e(v_q, a),\ a \in A_k \\ 0 & \text{otherwise} \end{cases} \tag{24}$$

Since R(Ai, Aj) and R(Aj, Ai) share the same weight, we only focus on the weights w of semantic relations of the form R(user, Ak) or R(item, Ak) in TG (i.e., Fig. 3). Then Δw_i^t is estimated by counting the number of vertices which share the same neighbor of Ak within U_u^t or I_i^t over all predicting rate pairs in the tth iteration.

If the nodes of Ak only connect to user nodes, and wk is the weight of R(user, Ak), the count function for Δw_k^t is

$$h_u(A_k, t) = \sum_{r(u,i) \in PRP} \sum_{p=1}^{|U_u|} \sum_{q=1}^{|U_u|} vote_k(v_p, v_q) \tag{25}$$

If the nodes of Ak only connect to item nodes, and wk is the weight of R(item, Ak), the count function for Δw_k^t is

$$h_i(A_k, t) = \sum_{r(u,i) \in PRP} \sum_{p=1}^{|I_i|} \sum_{q=1}^{|I_i|} vote_k(v_p, v_q) \tag{26}$$

If wk is the weight of R(user, item), the count function is

$$h_{ui} = \frac{1}{2}\big(h_u(item, t) + h_i(user, t)\big) \tag{27}$$

Finally, based on Fig. 3, Δw_k^t is computed by

$$\Delta w_k^t = \frac{h(A_k, t)}{\sum_{k=1}^{m} h_u(A_k, t) + \sum_{k=1}^{n} h_i(A_k, t) + h_{ui}} \tag{28}$$

3) GRAPH FILTERING ALGORITHM
For clarity, we show the steps of GF-based recommendation in Algorithm 1.

Algorithm 1 GF-Based Recommendation
INPUT: rating matrix R_{M×N}, attributes of users {UA_m}, attributes of items {IA_n}, user relationships UR_{M×M}, item relations IR_{N×N}, and norm.
OUTPUT: {r̂ui}
1. Build the unified graph GU.
2. Initialize W^0, wi = 1, i = 1, ..., Z.
REPEAT
3. Run Constrained SimRank on GU.
4. For each predicting rate pair r(u, i): compute the similarities between r(u, i) and the other rate pairs based on Eq. (18); find the top kui most similar feedback rate pairs.
5. Compute r̂ui based on Eq. (17) with norm.
6. Complete R as R̂.
7. Recompute r̃vj based on Eq. (19).
8. Create the objective function in Eq. (20).
9. Update W based on Eqs. (24)–(28) and Eq. (23).
UNTIL THE OBJECTIVE FUNCTION CONVERGES
10. Output {r̂ui}.

For graph filtering, the required space is that needed to store S^k(∗, ∗) and R̂, which is the same as for CSR. Each iteration of GF contains 3 parts: (1) CSR computation; (2) predicting the score for each rate pair; (3) weight adjustment. The required time of prediction is O(b²). Let d̃ be the average of the inner product over Uu and Ii; then the time complexity of weight adjustment is O(b²d̃). Suppose H = max{O(kZb²d̄), O(b²d̃)}. Then, the time complexity of GF is O(KH), where K is the number of outer iterations. In detail, in the CSR computation, d̄ depends on the average node degree of GU. Meanwhile, d̃ relies on norm to select the similar rate pairs. Generally, d̃ > d̄, since the number of similar rate pairs searched from the whole GU via paths is usually larger than the average node degree. In our experiments, we have observed that kZb²d̄ < b²d̃. That is to say, the time complexity of GF is O(Kb²d̃). Further, our proposed method promises a high speed of convergence, and W becomes stable within 10 iterations.

V. EXPERIMENTS
A. DATASETS
We now provide an overview of the datasets.

1) AMAZON
We use two sets of real traces from the Amazon datasets: movies and books. The period of the extracted ratings is from 2012 to 2014. The movie dataset consists of 2.12M ratings from 455,146 users for 156,923 movies with 41,253 directors, 588,076 actors and 112 genres. The book dataset consists of 3.52M ratings from 798,431 users for 482,879 books, and the attributes include 231,452 authors, 51,812 publishers, and 156 time-nodes. The ratings vary from 1 to 5 with an increment of 1.

2) DOUBAN DATASET
We create one dataset from Douban in the movie domain. The Douban movie dataset includes 14,351 users and 13,822 movies with 1,193,845 ratings from 1 to 5. The users come from 2,813 groups, and the attributes of movies consist of 2,541 directors, 6,753 actors and 38 genres.
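The following Python sketch conveys the flavor of the vote-based weight adjustment (Eqs. (23)–(28)): semantic edge types whose attribute nodes are shared by many members of the similar sets receive a larger increment. The data structures and the simplified normalization are illustrative assumptions, not the paper's exact bookkeeping.

```python
def update_weights(weights, similar_sets, attr_neighbors):
    """One vote-based weight adjustment step (a simplified sketch of Eqs. (23)-(28)).

    weights       : dict {edge_type: w_k} with the current semantic edge weights
    similar_sets  : list of member-node lists (the sets U_u^t / I_i^t collected
                    for all predicting rate pairs in this iteration)
    attr_neighbors: dict {node: {edge_type: set of attribute nodes}}
    """
    votes = {k: 0 for k in weights}
    for members in similar_sets:
        for p in members:
            for q in members:
                if p == q:
                    continue
                for k in votes:
                    # vote if p and q share a neighbor reached via edge type k (Eq. (24))
                    if attr_neighbors[p].get(k, set()) & attr_neighbors[q].get(k, set()):
                        votes[k] += 1
    total = sum(votes.values()) or 1
    # normalized increment (Eq. (28), simplified) averaged with the old weight (Eq. (23))
    return {k: 0.5 * (w + votes[k] / total * len(weights))
            for k, w in weights.items()}
```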


TABLE 1. Results of effectiveness experiments on three datasets.

We use the widely used Root Mean Square Error (RMSE) and Normalized Discounted Cumulative Gain (NDCG) to measure the performance of the recommendation methods.

B. BASELINES
We consider two kinds of recommendation methods: ones that only utilize implicit feedback and others that utilize auxiliary information. The baselines are as follows:
1) ItemKNN [21] is a classical memory-based approach that computes the cosine item similarity to provide recommendations.
2) MF [3] is a standard factorization method that exploits the rating matrix for prediction.
3) Collaborative Memory Network (CMN) [26] is a deep architecture that unifies memory-based CF and the latent factor model to explore user behaviors.
4) SVD++ is a variant of SVDFeature which utilizes the metapath2vec++ [27] embedding method to extract user and item embeddings as features for SVDFeature.
5) NFM combines the linearity of FM and the non-linearity of neural networks for modeling higher-order feature interactions [16].
6) HERec [6] exploits auxiliary information based on graph embedding and then utilizes a non-linear fusion function to integrate the embedded information into an MF model.
7) NeuACF extracts different aspect-level similarities of users and items through different meta paths and feeds a neural network to learn latent factors [9].

Among these methods, we set the number of latent factors to 10 for MF. HIN-based methods need to specify meta paths. Following [10], we only select short meta paths of at most 4 steps. For the deep learning architectures, such as CMN, NFM and NeuACF, we design 4 layers uniformly. Other baseline parameters adopt their original optimal settings.

C. OVERALL RECOMMENDATION EFFECTIVENESS
For each dataset, we split the entire rating information into a training set and a test set, with the training ratio in {80%, 60%, 40%}. Obviously, a smaller ratio means much sparser data for recommendation, so the capability of handling the sparsity problem is tested.

Table 1 lists the results of GF along with the baselines for RMSE on all the held-out test data and for NDCG with a cut-off at 10. The major findings are summarized as follows:
1) All these models are sensitive to data size and sparsity. As the training ratio increases, the metric results improve significantly. Among all the baselines, the methods that utilize heterogeneous information perform better than the pure CF methods, including ItemKNN, MF and CMN. This indicates that exploiting auxiliary data can partly overcome the data sparsity problem.
2) The proposed GF method is better than the baselines in most instances. Only on the Douban dataset with an 80% training ratio does HERec perform better than GF. Compared with traditional HIN-based methods, which directly exploit and model various kinds of auxiliary data via meta paths or network embedding, GF not only exploits the auxiliary data but also learns the weights of the different semantic relations among them for prediction. More precisely, the baselines first exploit the auxiliary data and then learn models subject to the exploited information, while GF exploits the auxiliary data and learns the model in an iterative framework. Compared with the meta paths in NeuACF and the embedding in HERec, CSR in GF can exploit the auxiliary data more effectively.
3) Compared with the Douban dataset, the Amazon datasets are large-scale with millions of relations. Most baselines work better on the Douban dataset than on the Amazon datasets. An intuitive explanation is that more information is lost when learning the latent features for each object in a large-scale sparse dataset, while, due to its case-to-case design, GF works well on all these datasets. Furthermore, as the training ratio declines, the datasets become much sparser, yet the RMSE score of GF increases by about 30% at most and the NDCG score falls by about 10%, which demonstrates the effectiveness of GF in handling sparse data.
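For completeness, a minimal sketch of the two evaluation metrics used above (RMSE over the test ratings and NDCG with a cut-off at 10) is given below; it follows the standard definitions rather than any implementation detail from the paper.

```python
import math

def rmse(predictions, truths):
    """Root Mean Square Error over parallel lists of predicted and true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths))

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for one user: ranked_relevances are the true relevance scores
    of the items in the order the recommender ranked them."""
    def dcg(rels):
        return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0
```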


D. DETAILED ANALYSIS OF GF MODEL
1) SELF COMPARISON
In this section, we study the effectiveness of the sign value function and the edge weights in GF, and two comparisons are designed. For the sign value function, the comparison runs CSR on GU with f(xt,i, yt,j) = 1 in Eq. (14), denoted as GFS. For the edge weight factor, the comparison predicts directly through Eq. (17) with wi = 1, 1 ≤ i ≤ Z, denoted as GFW. The test ratio is 20%, and the results on the 3 datasets are shown in Fig. 4.

FIGURE 4. The performances of self-comparisons.

We can see that GF consistently performs better than GFS and GFW, which proves that the sign value information is useful for prediction and that weighting the semantic relations of GU is meaningful. Further, compared with GFW, GFS performs better. An intuitive explanation is that semantic information is more important than sign values, or that our proposed iterative GF algorithm is effective in achieving a global solution through the objective function.

2) TEMPORAL FACTOR
In this part, we study the temporal effect of users. Logical time corresponding to the actual timestamp of a rating event is used to model the temporal information. We leverage GF and tune the temporal parameter accordingly. The results are shown in Fig. 5.

FIGURE 5. The performance corresponding to the temporal factor at 80% training ratio.

Based on Eq. (18), if α = 0, no temporal information is considered in GF. First, Fig. 5 demonstrates that the temporal factor does influence rating prediction. The effect of this factor varies as α ranges from 0 to 0.3. Due to the sparsity of the datasets, further amplification impacts the predictions negatively, as it reduces the contribution of old ratings. We provide the optimally tuned parameter for our experiments, shown in Table 1.

3) SCALABILITY
In this section, we study the scalability of the proposed GF model. The parameter W attached to the semantic edges is the central factor that affects the model's scalability. In our experiments on movie recommendation, the top 3 most important semantic relations are R(user, movie), R(movie, actor), and R(movie, genre). We observe these parameters at different training ratios. The results are shown in Fig. 6.

FIGURE 6. The weights of the most important semantic relations at different datasets and training ratios.

The values of the parameters W always stay within a small domain. This demonstrates that GF can learn an approximately optimal prediction model, even when the dataset is very sparse. Furthermore, we can infer that for a new user or item in GU (i.e., the cold-start problem), GF can predict the rating score by Eq. (17) without re-learning W.

VI. CONCLUSION
In this paper, we study the problems in exploiting auxiliary information for sparse recommendation. The process of current methods can be divided into three stages: modelling, extraction and exploration. In the extraction stage, most existing approaches cannot fully mine the global structure and semantic features of users and items. Further, these methods, whether extended matrix factorization models or deep learning models, suffer from an expensive model-building problem and cannot treat personal latent factors carefully due to their global objective functions.

In this paper, we propose a graph filtering method based on HINs for recommendation. GF predicts the rating under the intuition that ''a predicted rating depends on its similar rating pairs.'' Concretely, we design a case-to-case prediction framework. For a given predicting rate pair, we search for its similar rate pairs based on Constrained SimRank, a semantic and sign value-aware similarity measure on HINs, and compute the predicted rating score via a weighted average of all its similar ratings. Further, an adaptive framework is proposed to learn the weights of the different semantic edges and produce an optimized predicted rating. Finally, experimental studies on various real-world datasets demonstrate that GF is effective in handling the sparsity issue of recommendation.

REFERENCES
[1] M. B. Dias, D. Locher, M. Li, W. El-Deredy, and P. J. G. Lisboa, "The value of personalised recommender systems to e-business: A case study," in Proc. ACM Conf. Recommender Syst. (RecSys), 2008, pp. 291–294.


[2] X. Su and T. M. Khoshgoftaar, "A survey of collaborative filtering techniques," Adv. Artif. Intell., vol. 2009, pp. 1–19, Oct. 2009.
[3] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[4] Y. Hu, Y. Koren, and C. Volinsky, "Collaborative filtering for implicit feedback datasets," in Proc. 8th IEEE Int. Conf. Data Mining, Dec. 2008, pp. 263–272.
[5] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," 2012, arXiv:1205.2618. [Online]. Available: http://arxiv.org/abs/1205.2618
[6] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu, "Heterogeneous information network embedding for recommendation," IEEE Trans. Knowl. Data Eng., vol. 31, no. 2, pp. 357–370, Feb. 2019.
[7] S. Rendle, "Factorization machines," in Proc. IEEE Int. Conf. Data Mining, Dec. 2010, pp. 995–1000.
[8] H. Guo, R. Tang, Y. Ye, Z. Li, and X. He, "DeepFM: A factorization-machine based neural network for CTR prediction," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 1725–1731.
[9] X. Han, C. Shi, S. Wang, P. S. Yu, and L. Song, "Aspect-level deep collaborative filtering via heterogeneous information networks," in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 3393–3399.
[10] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: Meta path-based top-K similarity search in heterogeneous information networks," Proc. VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
[11] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proc. 4th ACM Int. Conf. Web Search Data Mining (WSDM), 2011, pp. 287–296.
[12] C. Luo, W. Pang, Z. Wang, and C. Lin, "Hete-CF: Social-based collaborative filtering recommendation using heterogeneous relations," in Proc. IEEE Int. Conf. Data Mining, Dec. 2014, pp. 917–922.
[13] S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen, "Social collaborative filtering for cold-start recommendations," in Proc. 8th ACM Conf. Recommender Syst. (RecSys), 2014, pp. 345–348.
[14] G. Zhao, X. Qian, and C. Kang, "Service rating prediction by exploring social mobile users' geographical locations," IEEE Trans. Big Data, vol. 3, no. 1, pp. 67–78, Mar. 2017.
[15] G. Adomavicius, B. Mobasher, F. Ricci, and A. Tuzhilin, "Context-aware recommender systems," AI Mag., vol. 32, no. 3, pp. 67–81, 2011.
[16] X. He and T.-S. Chua, "Neural factorization machines for sparse predictive analytics," in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), 2017, pp. 355–364.
[17] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. S. Chua, "Neural collaborative filtering," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 173–182.
[18] X. Xin, B. Chen, X. He, D. Wang, Y. Ding, and J. Jose, "CFM: Convolutional factorization machines for context-aware recommendation," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 3926–3932.
[19] C. Shi, X. Kong, Y. Huang, P. S. Yu, and B. Wu, "HeteSim: A general framework for relevance measure in heterogeneous networks," IEEE Trans. Knowl. Data Eng., vol. 26, no. 10, pp. 2479–2492, Oct. 2014.
[20] Y. Fang, W. Lin, V. W. Zheng, M. Wu, J. Shi, K. Chang, and X. Li, "Metagraph-based learning on heterogeneous graphs," IEEE Trans. Knowl. Data Eng., to be published.
[21] R. M. Bell and Y. Koren, "Scalable collaborative filtering with jointly derived neighborhood interpolation weights," in Proc. 7th IEEE Int. Conf. Data Mining (ICDM), Oct. 2007, pp. 43–52.
[22] G. Jeh and J. Widom, "SimRank: A measure of structural-context similarity," in Proc. 8th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2002, pp. 538–543.
[23] Y. Ding and X. Li, "Time weight collaborative filtering," in Proc. 14th ACM Int. Conf. Inf. Knowl. Manage. (CIKM), 2005, pp. 485–492.
[24] R. Guerraoui, A.-M. Kermarrec, T. Lin, and R. Patra, "Heterogeneous recommendations: What you might like to read after watching interstellar," Proc. VLDB Endowment, vol. 10, no. 10, pp. 1070–1081, Jun. 2017.
[25] Y. Zhou, H. Cheng, and J. X. Yu, "Graph clustering based on structural/attribute similarities," Proc. VLDB Endowment, vol. 2, no. 1, pp. 718–729, Aug. 2009.
[26] T. Ebesu, B. Shen, and Y. Fang, "Collaborative memory network for recommendation systems," in Proc. 41st Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), 2018, pp. 515–524.
[27] Y. Dong, N. V. Chawla, and A. Swami, "Metapath2vec: Scalable representation learning for heterogeneous networks," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2017, pp. 135–144.
[28] Z. Chuanyan, H. Xiaoguang, and P. Zhaohui, "Social image recommendation based on path relevance," in Proc. Asia–Pacific Web Web-Age Inf. Manage. Joint Int. Conf. Web Big Data, 2018, pp. 165–180.

CHUANYAN ZHANG was born in Binzhou, Shandong, China, in 1985. He received the B.S. and M.S. degrees in software engineering from Shandong University, Jinan, China, in 2012, where he is currently pursuing the Ph.D. degree in computer science and technology. His research interests include machine learning, data integration, cloud computation, and social network analysis. He has taken part in several research programs funded by NSFC and Key R&D Projects from Shandong Province.

XIAOGUANG HONG was born in Jinan, China, in 1964. He was a Professor and Ph.D. supervisor. He is currently working at the Software College, Shandong University. His research interests include data integration, cloud computation, database optimization, social network analysis, and recommender systems. He has undertaken many projects funded by NSFC and Key R&D Projects from Shandong Province, and has published many high-quality conference and journal research papers. He is also a Senior Member of the CCF.
