J170 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)

This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS 1
Exploiting Long- and Short-Term Preferences for

Deep Context-Aware Recommendations
Tipajin Thaipisutikul , Timothy K. Shih, Senior Member, IEEE, Avirmed Enkhbat, and Wisnu Aditya

Abstract— With the tremendous growth in online information, L u and S u Long- and short-term representations.
capturing dynamic users’ preferences based on their historical ou Final user interest representation.
interactions and providing a few desirable items to users have
become an urgent service for all businesses. Recurrent neural ou Ground-truth user interest representation.
networks (RNNs) and item-based collaborative filtering (CF)
models are commonly used in industries due to their sim- I. I NTRODUCTION
plicity and efficiency. However, they fail to different contexts
that could differently influence current users’ decision-making.
Also, they are not sufficient to capture multiple users’ interests
based on features of the interacting items. Besides, they have
D IGITAL transformation is accelerating and changing the
way in which society conducts business. These days,
Internet businesses, such as search engines, social networks,
a limited modeling capability for the evolution of diversity
and dynamic user preferences. In this article, we exploit long-
e-commerce, and entertainment services, to name a few,
and short-term preferences for deep context-aware recommen- operate based on massive amounts of user-generated data.
dations (LSCAR) to enhance the next item recommendation’s These enormous numbers of users’ longitudinal generated data
performance by introducing three novel components as follows: provide a great opportunity for researchers to leverage them
1) the user–contextual interaction module is proposed to capture to understand users’ behaviors. Recommender systems have
and differentiate the interaction between contexts and users;
2) the encoded multi-interest module is introduced to capture
been employed to not only mitigate the notorious informa-
various types of user interests; and 3) the integrator fusion gate tion overload problem but also to significantly help the web
module is used to effectively fuse the related long-term interests providers increase their revenue by automatically providing the
to the current short-term part, and the module returns the final right information in a timely manner to the right user. Hence,
user interest representation. Extensive experiments and results over the years, constant efforts have been made to enhance the
for two public datasets demonstrate that the proposed LSCAR
outperforms the state-of-the-art models in almost all metrics and quality of recommendations regarding accuracy, transparency,
could provide interpretable recommendation results. and interpretability.
However, although existing studies have shown the improve-
Index Terms— Attention network, context-aware recommenda-
tion, information fusion, interpretable recommendation, sequen- ment at some levels, current systems still face three main
tial modeling, user multi-interest representation. limitations that continue to hinder the recommendation
performances.
N OMENCLATURE A. Inability to Distinguish the Influence of Various Contexts

u ∈ U, |U| Set of users and the total number of users. on the Current User Decision-Making
i ∈ I , |I | Set of items and the total number of items. Generally, user’s interests are dynamic and subtle to the
{i 1u , . . . , i tu } User u s session with items of length t. contexts, such as time and location. For example, as shown
eu and E cu Static latent vector and dynamic latent vector in Fig. 1, a user u is intrinsically interested in action/thriller
of user u. movies (e.g., 16 blocks, The Crow). However, during the
ec m Latent vector of context value cm . weekend, he chooses to watch adventure/comedy movies
S u and L u Short-term interests and long-term interests. (e.g., Dogma) since he is at home with his children. This
fi ∈ F Feature scales describing the item. example illustrates that it is crucial to include contexts
e f i and eid Latent vector of feature fi and item id. (e.g., day of the week) in the recommendation learning process
L uf Long-term representation w.r.t. each feature since they influence a user’s behaviors. Nevertheless, most
scale f i . previous works [1], [2], [16] treat contexts as extra dimensions
and incorporate them by linearly extending the latent factor
Manuscript received March 3, 2021; revised September 8, 2021; accepted
September 21, 2021. This work was supported in part by the Faculty of
model. Hence, they cannot differentiate the impact of different
Information and Communication Technology, Mahidol University, Thailand. contexts on different users since they only consider the overall
(Corresponding author: Tipajin Thaipisutikul.) context effect.
Tipajin Thaipisutikul is with the Faculty of Information and Communication
Technology, Mahidol University, Salaya, Phutthamonthon 73170, Thailand
(e-mail: tipajin.tha@mahidol.ac.th). B. Inability to Capture a User’s Multiple Interests From the
Timothy K. Shih, Avirmed Enkhbat, and Wisnu Aditya are with the Interacting Items
Department of Computer Science and Information Engineering, National
Central University, Taoyuan City 32001, Taiwan. As presented in Fig. 1, a user u is interested in multiple
Digital Object Identifier 10.1109/TCSS.2021.3116059 aspects of items, such as genres, actors, directors, and a series
2329-924X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Central University. Downloaded on January 22,2022 at 12:42:10 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS
Fig. 1. Motivation of our proposed model. There are multiple user interests, such as actor, genre, and director, which characterize the diversity and volatility
of both short- and long-term preferences. The dashed arrow reveals the change in user preferences. The current user behavior is influenced by current contexts,
such as time and location. The effect of contexts on user decision-making is represented with solid arrows. The thicker the line, the greater the impact a
particular context has on the user u.
of movies. Nevertheless, only a few sets of features may pri- designed to capture the dynamic nature of user preferences at
marily attract a user’s interest. Therefore, only modeling users’ two temporal levels, namely, short- and long-term preferences.
interactions based on an item’s ID is insufficient to capture The short-term preference reflects the most recent preference
the semantic features of relationships. Recently, the recurrent and is likely to drive future interactions, while the long-term
neural network (RNN) approaches [17], [19] have been widely preference is accumulated over an extended period and reflects
used to model sequential tasks. However, these methods ignore the user’s general preferences. Since there are many aspects
the fact that users can have multiple interests. Using only (i.e., item ID, item category, and item leaf category) to define
a single dot-product attention representation fails to reflect user interests, we model each aspect separately in the long-
diverse interests and so results in suboptimal recommendation term preference. On the contrary, we employ a multihead
performance. self-attention network to capture the users’ multiple interests
by representing preferences from different latent semantic
C. Inability to Effectively Model Users’ Evolving Preferences subspaces for the short-term preference. Note that a user
Directly modeling all of the user’s behaviors while ignoring attention network is also introduced in both the short- and
the dynamic nature of user preferences, as do the item- long-term preference modeling processes to cope with the
based collaborative filtering (CF) methods [8], [21], [28], personalization concern. Third, the integrator fusion gate (FG)
could limit the recommendation performance. These methods module is designed to effectively merge the static (long-
model static user–item interactions and cannot capture the term) and dynamic (short-term) preference vectors. Similar
dynamic transformation in users’ behavior sequences. As a to the gate mechanism in LSTM, an FG learns to control
result, they usually lead to homogeneous recommendations. the flow of information so that the most relevant infor-
Although more recent work [30] has tried to solve this problem mation in the long-term vectors is fused efficiently with
by considering both short- and long-term interests, it sim- a short-term vector. Finally, an FG outputs the final user
ply utilizes either concatenation or summation methods to interest representation, which will be used in the prediction
combine diverse long-term behaviors with short-term interest. process.
It is still opaque to effectively fuse those relevant long-term We summarize our main contributions as follows.
preferences regarding the current short-term preferences into 1) We propose a novel model to exploit the LSCAR in a
the recommendation process. unified framework.
In this article, we propose exploiting the long- and short- 2) We enhance the user representation by proposing the
term preferences for deep context-aware recommendations, user–contextual interaction module for explicitly mod-
namely LSCAR, to tackle the above limitations, while eling the high-order interactions between contexts and
providing an interpretable recommender solution. Specifi- users. This module also adaptively captures the different
cally, we propose to encode dynamic users’ preferences effects of different contexts on users via an attention
with three corresponding main components as follows. First, network.
the user–contextual interaction module is designed to mine 3) We propose a multihead self-attention network and a
interactions between contexts and users adaptively. It first takes user attention module to model a user’s more fine-
the static user representation and contexts as inputs to learn grained multiple interests in a personalized way.
the shared latent space of their interactions. Then, it returns 4) We design an FG to effectively combine short- and long-
a dense vector of each user–context-specific interaction. All term preferences rather than a simple combination.
user–context interaction vectors are later aggregated into an 5) On two public datasets, the experimental results not
attention network to adaptively differentiate the impact of only demonstrate that LSCAR outperforms the state-
different interactions and return a dynamic user representation of-the-art baselines in all metrics but also provides an
as an output. Second, the encoded multi-interest module is interpretable recommendation.
THAIPISUTIKUL et al.: EXPLOITING LONG- AND SHORT-TERM PREFERENCES FOR DEEP CONTEXT-AWARE RECOMMENDATIONS 3
The remainder of this article is organized as follows. for the product recommendation. Lin et al. [20] proposed
Section II presents related works, including traditional, a neural process to tackle a user cold-start recommendation
context-aware, and deep-learning methods for recommender by utilizing a task-adaptive mechanism with a metalearning
systems. Section III introduces the preliminaries to our model. However, these methods usually treat contexts as
study. Section IV describes the details of the proposed certain dimensions similar to those of users and items. The
LSCAR model. Section V presents the experimental setups, relevance between a user and a context is not intuitive and is
while Section VI discusses the experimental results. Finally, difficult to interpret. Also, very few studies have incorporated
Section VII concludes this article. sequential dependencies and contextual information simultane-
ously. Besides, they fail to distinguish the impact of different
II. R ELATED W ORK contexts in context-aware recommendations, which is one of
A. Traditional Recommendation Methods the major concerns of our work.
CF [8], [21], [28] provides the recommendation results C. Sequential-Aware Recommendation Methods
by mining the similarity of users or items from users’ his-
The sequential recommendation aims to predict users’ future
torical behaviors. For example, Linden et al. [21] proposed
interests from their past longitudinal behaviors based on
a personalized item-based CF method for each customer.
the order of time. For instance, FPMC [22] integrated a
Yang et al. [28] improved conventional CF recommendations
Markov chain (MC) with matrix factorization (MF) to build
by utilizing an interest injection concept and serendipi-
a personalized transition matrix. Fossil [13] combined users’
tous items. Du et al. [8] enhanced the neural CF-based
static and dynamic preferences by MC- and similarity-based
recommender systems by introducing noise layers and
methods. Lately, the RNNs and attention mechanisms have
teacher–student models. More recently, several deep learning
shown promising recommendation performance. For example,
techniques have been applied to various tasks. For example,
Hidasi et al. [11] proposed to capture users’ next consumption
He et al. [12] generalized multilayer perceptions (MLPs) to
with gated recurrent units (GRUs). Gao et al. [9] enhanced
estimate user–item interactions. He and Chua [10] proposed to
RNN to have more capability to capture users’ dynamic inter-
model feature interactions in a high-order fashion to improve
est over time. Chen et al. [3] proposed to fuse multiple interest
FM. Xiao et al. [26] introduced an attention network for fac-
representations with late fusion. Ying et al. [30] introduced
torization machines to distinguish the impact of the interaction.
a hierarchical structure of attention networks to model two
Yi et al. [29] integrated side information of users and items for
temporal levels of users’ preferences. Chen et al. [5] proposed
better latent vector representation. Kumar et al. [15] proposed
two attention deep-learning models for characterizing user
a hybrid of CF and content-based filtering with sentiment
interest at item component levels. Li et al. [17] leveraged the
analysis for recommendation systems. Although these methods
benefit brought by the attention network to mine the users’
have proven to be effective and widely employed, they do not
primary purpose, especially for longer sessions. Liu et al. [19]
take the dynamic and evolving nature of users’ preferences
utilized the last click in an attention network to point out
into consideration since they mainly focus on the stability of
the importance of the items in a recommendation session.
users’ preferences. On the contrary, we consider this aspect in
Xie et al. [27] proposed adversarial learning and contrastive
our proposed model.
learning to variational autoencoder (VAE) for the personalized
sequential recommendation. Cheng et al. [4] introduced a long
B. Contextual-Aware Recommendation Methods short-term session search network (LostNet) for a personalized
Contextual-aware RS utilizes contextual information to session search model to rerank documents for the current query
improve recommendation systems. Baltrunas et al. [2] and predict the following query. Nevertheless, these methods
extended MF with contexts (CARS) by introducing the inter- only focus on a single aspect of user interests, and both
action of contextual factors with item ratings. Li et al. [16] short- and long-term preferences are fused into a concatenation
introduced tensor factorization to integrate user–item–context approach, which results in suboptimal prediction performance.
interaction into the recommendation model. Moreover, In contrast, we propose a multihead self-attention approach to
Zheng et al. [32] introduced a contextual CSLIM model for capture multiple user interests. Furthermore, a gating mech-
top-N recommendations. Karatzoglou et al. [14] considered anism is deployed to effectively and efficiently incorporate
different contexts as additional data dimensions to enhance long- and short-term preferences for practical application.
context-aware recommendation. Liu et al. [18] introduced the
context-aware concept into the sequential modeling task. They III. P RELIMINARIES
designed a neural network to employ an adaptive context- In this section, we first provide the mathematical notations
specific input and adaptive context-specific transition input used throughout this article in Nomenclature. Then, we for-
matrices. Zhang et al. [31] utilized text metadata in movie mulate the definitions and the problem that we aim to solve
synopsis, actor list, and user comments for movie recommen- as follows.
dation. Wang et al. [25] included contextual information like
user trust, user preference, and user geographic and temporal A. Definition 1: User Representation
information for performance improvement. Dhelim et al. [7] We denote contextual information as cm where m is the
incorporated the social features, including personality traits total number of contexts. Let u ∈ U denote a set of users and
and topic of interests to alleviate the cold start problem |U| denote the total number of users. We aim to model the
interaction between a user and a context by its latent vectors.

We use eu ∈ Rd to denote the static latent vector of user u,
where d is the dimensionality of the user latent vector. We use
ecm ∈ Rdc to denote the dc -dimensional latent vector of context
value cm . Both user embedding eu and context embedding ecm
are learned during training with the whole proposed model
architecture. We use E cu ∈ Rd to denote the dynamic latent
vector of user u. This E cu defines the current state of a user u
in specific contexts.
Fig. 2. High-level overview of the recommendation process in LSCAR.
B. Definition 2: Item and Its Feature Representation
A. General Framework
We denote i ∈ I as a set of items, and |I | denotes the total
number of items. For each user u ∈ U, we sort his interactions As presented in Fig. 2, our model first takes the short-term
by time in chronological order. We will get |U| item sequences interest S u , the long-term interest L u , and the dynamic user
{i 1 , i 2 , . . . , i t }, where t is the maximum number of items in representation influenced by context E cu as inputs. S u and L u
each user’s sequence. All |U| item sequences are used as the are then, respectively, encoded by the user attention network

input to the W2V model [23] to encode each item id to a into the short-term session representation S u and the long-
u
unique latent vector eid ∈ Rd . In this way, the items appearing term representation L f where f is the total feature scales
in similar contexts (the items that are always consumed describing items. In the long-term modeling part, the multiple

together) will be similarly embedded in proximity in the vector feature scales L uf are fused into one representation denoted as

space. For each item i ∈ I , it normally has a set of features L u . Then, S u and L u representations are combined through an
f i ∈ F describing the item itself from different aspects. integrator FG neural network and return the final user interest
We encode the embedding of the feature f i ∈ F that has representation ou . Finally, ou will be further computed and
a 1-1 relationship with an item ID in a similar way when normalized by a SoftMax layer to produce the recommenda-
encoding item id. On the contrary, for the feature f i ∈ F tion’s probability scores for each item.
that has an M − 1 relationship with item ID, the sequence of
[ f i1 , fi2 , . . . , fi M ] belonging to a particular item is constructed B. User–Contextual Interaction Module
as an instance input to the W2V network. Finally, we will
This section describes how we adaptively distinguish the
obtain the embedding e fi ∈ Rd for each f i ∈ F .
impact of interactions between contexts and users. The overall
C. Definition 3: Short-Term Representation architecture of this module is illustrated in Fig. 3. It consists
of two submodules: 1) user–contextual shared latent space and
For each user u ∈ U , we sort his interacting items in
2) context-attentional effect. Through these two components,
ascending order of time. Then, we formulate the interacting
we learn the shared context-specific latent vector for user u,
items into sessions based on the time window. The latest
denoted as ecui , and the dynamic user representation based on
session of a user u is regarded as the short-term preference,
the current contexts as E cu . Next, we present the details and
namely, S u = [i 1u , i 2u , . . . , i t−1
u
]. Finally, the short-term interest
parameter learning process of each submodule.
S is further encoded to the short-term representation S u .
u
1) User–Contextual Shared Latent Space Submodule: This
D. Definition 4: Long-Term Representation submodule aims to learn the shared latent space between
contexts and users. As illustrated in Fig. 3(a), since users and
The long-term interest L u is computed from the interacting contexts are different objects and have different characteristics,
items that happened before S u . We denote each long-term we first map a static user latent vector eu and a context latent
interest w.r.t. each feature scale as L uf , where f ∈ F is the vector eci to the shared hidden space using a bilinear layer,
total number of feature scales describing different granularity as shown in the following:
levels of a user’s interests. Finally, L uf is further encoded to
the long-term representation L u .

z τ = f Wuτ eu + Wcτ eci + bzτ (1)
E. Problem Formulation where Wuτ , Wcτ , bzτ , and f (.) are the transformation matrices
for the user latent vector, the context latent vector, the bias
Given a static user representation eu , a context combination term, and the Rectifier (Relu) activation function. The bilinear
{c1 , . . . , cm }, short-term behaviors S u = {i 1u , i 2u , . . . , i t−1
u
}, and layer’s output is then passed through a stack of fully connected
long-term behaviors L u of a user u ∈ U, we would like to layers, as shown in the following:
recommend the top-N items that a user u is likely to interact
within the next timestep t. z 1τ = f W1τ z τ + b1τ
...
IV. P ROPOSED M ODEL : LSCAR
ecui = f L W Lτ z τL−1 + b τL (2)
This section introduces the proposed model to exploit
LSCAR. We first present the general framework. Then, where Wlτ , blτ ,
and fl (·) denote the weight matrix, the bias
we detail each component and introduce the parameter learn- vector, and the ReLu activation function, respectively. Finally,
ing process of LSCAR. the output vector in the last Lth hidden layer is denoted as ecui .
Fig. 3. Architecture of the user–contextual interaction module. It consists of two submodules that are (a) user–contextual shared latent space submodule and
(b) context-attentional effect submodule.
2) Context-Attention Effect Submodule: This submodule or adaptive binning on the continuous features to include these
aims to differentiate the importance of interaction effects ecui features with their frequency for further processing.
and return the aggregation of interaction effects as a dynamic The overall extraction process utilizes four layers: 1) an
user representation E cu . In particular, given a latent vector embedding layer; 2) a sequential layer; 3) a multihead encod-
ecui and a static user latent vector eu , we could compute the ing layer; and 4) a user attention layer. The details of each
attention score α(u, ci ), as shown in the following: layer are described as follows.
1) Embedding Layer: For a user u, his interacted item
α(u, ci )∗ = W1α f W2α ecui + W3α eu + b α (3) i ∈ I and each feature scale fi describing items can be
where W1α , W2α , W3α , b α , and f (·) are model parameters and embedded from the pretrained embedding vectors into eid ∈
an activation function ReLu. Then, the attention score is Rd and eufi ∈ Rd d-dimensional vectors. As shown in Fig. 4,
normalized by a SoftMax function, as shown in the following: in the long-term preference modeling part, we model each
feature scale L uf separately since it can describe the user’s
exp α(u, ci )∗ preference from a different point of view. We denote freqi
α(u, ci ) = . (4)
exp α(u, ci )∗ as the consumption frequency of a particular feature scale to
indicate a user’s preference level. We only learn the embedding
After we get the attention scores for each user–contextual
for nonzero frequencies (freqi ∗ euf ) since freqi = 0 leads to
interaction, the dynamic user representation E uc of user u is
euf = 0. The main reason to integrate freqi into the model
computed, as shown in the following:
learning process is to identify similar past interests. We aim

M to compute a final user preference based on the preference
E cu = α(u, ci )euci . (5) similarity and frequencies, which is highly relevant to the
i current user interactions.
In this way, the recommended items capture the user’s 2) Sequential Layer: In this layer, given the embedded
global preferences and are tailored to the specific context and sequence of short-term preference S u = {eid u
1
, eid
u
2
, . . . , eid
u
t
},
situation that the user currently finds himself in. the global temporal dependency is captured by the long short-
term memory (LSTM) network as follows:

C. Encoded Multi-Interest Module ftu = σ W 1f eid
u
+ W 2f h ut−1 + b f
t

This section details how we extract the short- and long- i tu = σ Wi1 eid
u
+ Wi2 h ut−1 + bi
term user preferences. As shown in Fig. 4, we model multiple t

otu = σ Wo1 eid
u
+ Wo2 h ut−1 + bo
feature scales separately in the long-term preference modeling t

part. The reasons are twofold. On the one hand, users usually C˜tu = tanh W 1 eu + W 2 h u + bc
c idt c t−1
not only focus on a specific item itself but are also concerned ctu = f tu ct−1
u
+ i tu C˜tu
about its features, such as genre, director, and actor. In other
h ut = otu tanh ctu (6)
words, users may like only a few features of the particular
interacting item; hence, encoding only the item ID feature level where f tu , i tu , and otu denote the forget, input, and output
is far from satisfactory. On the other hand, users may interact gates, respectively. The LSTM outputs the sequential pref-
with many items belonging to the same feature. Hence, cap- erence representation by modeling the short-term behavior
turing the user interests from the frequency of the interacting sequence into the hidden state vector h ut at time t while
features could also be a good indicator of the user level of carrying the past information to the present via the cell
interest. Although our proposed model currently can support state ctu .
only nominal contextual features, we provide room for future 3) Multihead Encoding Layer: Since users may have vari-
improvement to integrate continuous numeric attributes in both ous aspects of interest in items, a single attention network is
long- and short-term parts as follows. For the short-term part, not sufficient to capture the multiple aspect representations.
we can normalize the continuous attributes into the proper Therefore, we exploit a multihead self-attention network to
range and concatenate these features with the item ID feature. capture users’ preferences from different aspects. A self-
For the long-term part, we can perform the fix-width binning attention network is a special case of attention mechanism,
Fig. 4. Detailed model architecture of the encoded multi-interest, integrator FG, and prediction modules. The short-term representation (S u ) is encoded

into (S u ) by LSTM, multihead self-attention, and a user attention network, which will be guided by a user representation with context (E cu ). The long-term

representation (L u ) is encoded into (L uf i ) by a user attention network, which will be guided by E cu . The subscription f i ∈ F represents multiple user interests

on feature scales describing the items that user u is interacting with. L uf i is further encoded into the final representation of long-term interests L u . Then, S u

and L are fused via an integrator FG to produce the final user interest representation (o ).
u u
which takes the sequence itself as query, key, and value vector 4) User Attention Layer: Since different users have different
of d-dimension. From the previous LSTM outputs denoted as preferences even for similar item sets, we introduce a user
H u = {h u1 , h u2 , . . . , h ut }, we can aggregate the output vector attention layer on top of the lower layers in both the short-
after a self-attention network denoted as hut . A multihead and long-term preference modeling parts. This module aims
self-attention network allows our model to jointly focus on to mine more fine-grained personalized information where a
information from different representation subspaces in various dynamic user representation E cu is used as the query vector
positions. As a result, we could model user preference hut ∈ u = {hu1 , hu2 , . . . , hut }
attending to inputi , which can either be H
Rd from multiple interest aspects. The output matrix H u = in the short-term modeling part or (freqi ∗ euf ) in the long-term
{h1 , h2 , . . . , ht } is calculated as
u u u
modeling part. Next, we compute the short-term representation

S u as follows:
u = Multihead H u = W h concat headu1 , . . . , headuh (7)
H
θi∗ = f W1θ inputi + W2θ E cu + b θ (11)
where W h ∈ Rd×hdk denotes the weight matrix of the output
linear transformation, the subscript h denotes the number of where W1θ , W2θ , bθ , and f (·) are model parameters and an
heads, and dk = 1/ hd. In particular, each headui ∈ Rdk ×t activation function ReLu. Then, the attention scores θi∗ are
denotes a single latent interest as normalized by a Softmax layer as follows:

headui = Attention WiQ H u , WiK H u , WiV H u (8) exp θi∗
θi = . (12)
exp θi∗
where WiQ , WiK , WiV ∈ Rdk ×d denote the linear transformation
weight matrices of query, key, and value, respectively. Let Now, we get the weight of each inputi , which reflects the

Q ui = WiQ H u , K iu = WiK H u , and Viu = WiV H u . The importance of each input component. We then compute S u by
attention score matrix is calculated as follows: a weighted sum of input latent vectors and their corresponding
attention scores as follows:

f Q ui , K iu = Q uT
i Ki
u

Su = θi inputi . (13)
Ai = softmax f Q ui , K iu .
u
(9) i=1,...,t

Finally, we perform weighted sum pooling as follows: We compute the long-term representation L uf in a similar
way by using (11)–(13) but with different model parameters.
headui = Viu AuT
i . (10)
D. Integrator Fusion Gate Module Algorithm 1 Learning Algorithm for LSCAR

1) Aggregation Layer: The various aspects of the long- Input: The users’ short-term interests S u , the users’ long-

term preference L uf are encoded via an MLP network into term interests L u , the context combination c, a set of features

a compacted long-term representation L u . We derive the fine- fi ∈ F describing the item, the latent dimension (d), model
u batch size (bz), learning rate (lr), the number of heads, and
grained long-term preferences L via

epochs.
L u = f W f1 L uf1 + · · · + W f F L uf2 + b f (14) Output: model parameters
1: Initialize all parameters in

where W f , b f , and L uf are trainable weight matrices, a bias 2: repeat
term, and a long-term representation for each feature f , where 3: Shuffle the set of observations {S u , L u , c, f i }
f ∈ F is the total number of feature scales describing the item. 4: Construct training instances t ∈ T for each bz in
2) Fusion Layer: We elaborately design a gated neural each epoch

network to effectively fuse the long-term representation L u 5: for each bz do

and the short-term representation S u and return the final user 6: Compute the dynamic user representation E cu from
interest representation ouf ∈ Rd , as shown in Fig. 4. A gate Eq. (1)-(5)
vector λut is used to decide the contribution of the short- and 7: For the users’ short-term interests S u , obtain
long-term representations at time t. It is computed as follows: the sequential and multi-interests hidden states based on

Eq. (6)-(10)
λut = sigmoid W1λ L u + W2λ S u + b λ (15) 8: Construct the short-term representation S u from

Eq. (11)-(13)
where W1λ , W2λ , and b λ are model parameters. Finally, the final 9:

Construct the long-term representation L uf from all
user interest representation ou is computed by feature scales according to Eq. (11)-(13)
10: Construct the long-term representation L u from

ou = 1 − λut L u + λut S u (16)

Eq. (14)
where is elementwise multiplication. 11: Compute a final user representation ou from
Eq. (15)-(16)
12: Compute the probability scores for each item from
E. Prediction Module Eq. (17)
1) Prediction Layer: We compute the probability distri- 13: Update according to Eq. (18)
bution p over all items |I | at time t based on the final 14: end for
comprehensive user representation ou via a SoftMax layer as 15: until convergence
follows: 16: return

p = softmax W p ou (17)
where W p ∈ R|I |×d is a trainable weight matrix. Finally, all A. Datasets

items are ranked based on the probability score in descending We conduct our experiments on two publicly available
order, and the items corresponding to the top-N scores are datasets: ML-HetRec [14] and Gowalla [6]. The ML-Hetrec
used for recommendation at the next time step t. dataset is a widely used movie dataset spanning from
2) Model Training: We utilize the cross-entropy loss over September 7 to January 9. There are four temporal contexts
all training samples. Since the actual probability is 1 for the and four item features. The Gowalla dataset is a check-
ground-truth item and 0 otherwise, the loss function can be in dataset collected from the location-based social network
simplified as spanning from February 9 to August 11. There are five
spatiotemporal contexts describing the current state of the user.

T
Each item in Gowalla can be described by three features. The
L= log pg , t (18)
characteristics of these two datasets are summarized in Table I.
t
For both of them, we eliminate unpopular items with the
where t and T are, respectively, the index and the total number support of less than 10. We treat all users’ sequences with
of training samples, and g is the index of the ground-truth item 30-min inactivity and one-day inactivity as a single session on
for the tth training sample. As a unified model, the Adam ML-Hetrec and Gowalla. We remove sessions with fewer than
optimizer algorithm can be applied to optimize the model three items to filter too short and poorly informative sessions.
parameters. Finally, the model-learning process of LSCAR is Inactive users with fewer than five sessions are also filtered
summarized in Algorithm 1. out. We keep the first 80% of each user’s sessions for training
and the rest are for testing.
V. E XPERIMENTAL S ETUP
B. Baselines
This section describes the datasets, state-of-the-art base-
lines, evaluation metrics, and the implementation details used To demonstrate the LSCAR model’s effectiveness,
in our experiments. we include three groups of recommendation baselines,
TABLE I indexes. We use M@N to measure how well the model ranks
S TATISTICS OF THE D ATASETS U SED IN THE E XPERIMENTS the prediction item as follows:
1 1
M@N = (20)
|T | t∈T Rank i g , t
where Rank(i g , t ) is the rank of the ground-truth item i g in

the t th test session.
D. Implementation Detail
Our model is implemented in Tensorflow, and all exper-
iments were conducted on a Ge-Force GTX 1080-GPU
machine. During the training, we randomly initialized the
model parameters with a Gaussian distribution with a mean
of zero and the standard deviation of 0.01. We employed
two hidden layers for the user–contextual interaction module.
The optimized model parameters (latent dimension (d), model
batch size (bz), learning rate (lr), the number of heads, and
epochs) are obtained by using a grid-search with early stopping
and Adam optimization on the validation set for fine-tuning on
both datasets. The search space is limited with the following
including traditional, contextual, and sequential-aware
ranges: d = {8,16,32,64,128}, bz = {16,32,64,128}, lr =
recommendations for the performance evaluation as follows.
{0.0001, 0.001, 0.01, 0.1}, the number of heads = {1,2,3,4},
1) Item-Based CF [21]: This approach utilizes an and epochs = {5,10,15,20,25}. Finally, on the ML-Hetrec
item–item similarity matrix for the recommendation. dataset, we set d = 64, bz = 32, lr = 0.001, the number of
2) FPMC [22]: This approach integrates the MC with MF heads = 4, and epochs = 15. On the Gowalla dataset, we set
to model user sequential behaviors for the next item d = 64, bz = 64, lr = 0.001, the number of heads = 2,
recommendation. and epochs = 20. For all baseline methods, we follow the
3) GRU4Rec [11]: This approach applies the paralleliza- experimental setting and hyperparameters as stated in their
tion technique with GRU units to learn session rep- original papers.1,2,3
resentation. The ranking-based function is used as a
loss function, which significantly outperforms traditional VI. E XPERIMENTAL R ESULTS
methods.
In this section, we design our experiments to answer the
4) GRU4Rec++ [24]: This approach aims to improve
following research questions.
GRU4Rec performance by data augmentation.
1) RQ1: How does LSCAR perform as compared with the
5) CARNN [18]: This approach models contextual infor-
state-of-the-art baselines for a top-N recommendation
mation, such as time of day, day of the week, and time
task?
interval with the sequential information simultaneously.
2) RQ2: What is the impact of various components in the
6) NARM [17]: This approach employs global and local
LSCAR model architecture?
attention structures to capture users’ general interests
3) RQ3: How do the different hyperparameters of LSCAR
and sequential behavior.
impact its performance (i.e., epoch, batchsize, and
7) STAMP [19]: This approach extracts both long- and
heads)?
short-term user preferences from the current session
4) RQ4: Does LSCAR provide an efficient way to explain
using an attention mechanism.
how the model works and how to interpret the recom-
8) SHAN [30]: This approach leverages a two-layer atten-
mendation result?
tion network to incorporate both users’ general repre-
sentation and recent consumed items. A. Overall Performance (RQ1)
From Table II, we can draw the following observations.
C. Evaluation Metric
First, our proposed LSCAR consistently and significantly
We evaluate all models’ performance with two widely used outperforms all baselines in every metric on both datasets.
metrics, namely, recall (R) and mean reciprocal rank (M). Generally, LSCAR obtains improvements over the best base-
R@N measures the accuracy of recommendation as line by 4.99% in R@20 and 13.45% in M@20 on the
#hit ML-Hetrec dataset. On the Gowalla dataset, LSCAR improves
R@N = (19) over the best baseline by 0.63% in R@20 and 4.72% in MRR.
T
where T is the size of the test data and #hit denotes the Second, the sequential-based methods dramatically outperform
number of cases where the desired items appear in the top- 1
https://github.com/rn5l/session-rec
N ranking lists. Since the recommender systems can provide 2 https://github.com/uctoronto/SHAN
only a few items, the relevant item should be ranked in former 3 https://bitbucket.org/cdal/stackedcontextawarernn/src/master/
TABLE II TABLE III

P ERFORMANCE C OMPARISON IN T ERMS OF R@N AND M@N ON A LL P ERFORMANCE C OMPARISON OF E ACH C OMPONENT IN LSCAR
D ATASETS . B EST P ERFORMANCE I S IN B OLDFACE , W HILE THE
S ECOND B EST I S U NDERLINED
TABLE IV
A BLATION S TUDY OF THE S HORT-T ERM R EPRESENTATION S u
traditional item-based CF. For example, FPMC is better than

ItemCF by 24% and 16% on R@20 and 4.88% and 7% on
M@20 on ML-Hetrec and Gowalla, respectively. This implies
a problem in recommendation performance with the method
solely based on an item’s co-occurrence popularity. Third,
the performance superiority of GRU4Rec and GRU4Rec+
over conventional models, such as FPMC on both datasets,
demonstrates the capability of RNN-based models in dealing
with sequence information. Fourth, CARNN performs better
than GRU4Rec and GRU4Rec++ since CARNN exploits con-
textual information, such as temporal and transition contexts.
Also, NARM and STAMP remarkably beat all traditional
methods since they employ the attention mechanism to mine
the primary intention of users’ behaviors and differentiate each
item’s impact on the next recommendation prediction. Finally,
among the baseline methods, SHAN, on average, performs the
best on both datasets since it incorporates long- and short-term
behaviors in a personalized way. However, without considering
contextual information and users’ multiple interests, it can
hardly maximize the recommendation accuracy, and this is the
crucial factor that means that LSCAR beats the state-of-the-art Fig. 5. Model parameter sensitivity analysis. (a) Impact of embedding size
on R@20. (b) Impact of batch size on R@20. (c) Impact of learning rate on
baselines. R@20. (d) Impact of epoch on R@20.
B. Analysis of Different Model Components (RQ2)

Therefore, modeling contextual information is helpful for
To verify the contribution of various components in LSCAR generating accurate recommendation results. Second, in wo-L u
to the performance gain, we further implement four variants and wo-S u , the model performance significantly suffers from
for ablation tests as follows. a noticeable decrease in both datasets. This indicates that both
1) wo-E cu : We replace a dynamic user representation with components are critical for the sequential recommendation.
a static user representation. Also, in wo-FG, the proposed model suffers for both evalua-
2) wo-S u : The LSCAR model without the short-term pref- tion metrics. This reveals that a simple concatenation could
erence modeling part. indeed harm the fusion performance. Finally, the complete
3) wo-L u : The LSCAR model without the long-term pref- model LSCAR, which is the combination of E cu , S u , L u , and
erence modeling part. FG, achieves both datasets’ best performance. This shows that
4) wo-FG: We remove the FG and concatenate the long- contextual information, both multiple interests in long-term
and short-term representations. and short-term preferences, and an FG make a positive con-
5) LSCAR: The full version of our proposed model. The tribution to the substantial performance in sequential context-
observations from the ablation studies on both datasets, aware recommendations.
as shown in Table III, are summarized as follows. To evaluate the contribution of each component in S u ,
First, in wo-E cu , the recommendation performance drops we further conducted experiments based on three degraded
on both datasets, particularly on Gowalla. This shows that variants of S u in our proposed model, where S u wo LSTM,
spatiotemporal contextual information plays an important S u wo multihead, and S u wo θshort represent the model of
role in performance improvement on the Gowalla dataset. LSCAR but eliminate LSTM, multihead self-attention, and
Fig. 6. Attention visualization of all users on both datasets. (a) ML-Hetrec dataset. (b) Gowalla dataset. (c) Visualization of fusion gate score.
TABLE V has already converged with these epoch number settings.

R@20 P ERFORMANCE OF LSCAR W. R . T. THE N UMBER OF H EADS Table V shows that the short-term representation will get
more accurate with the number of heads increasing. However,
when the number of multiheads in a self-attention network is
greater than 4 and 2 on the ML-Hetrec and Gowalla datasets,
respectively, the model performance worsens. The possible
reason is that a greater number of heads results in an inferior
and smaller representation of each item in headi since the
user attention network θshort from the short-term preference embedding size of each headi is calculated by the embedding
modeling part, respectively. From Table IV, we can observe size of eid divided by the number of heads used in a self-
that the proposed model S u without LSTM to model sequential attention network.
dependencies suffers from a significant decrease in recom-
D. Analysis of Attention Weights (RQ4)
mendation performance, especially on the Gowalla dataset.
This clearly demonstrates the benefit of sequential modeling This section elaborates on how to interpret our proposed
brought by LSTM. Also, without the multihead self-attention model’s attention weights as essential factors that trigger the
for capturing multiple interests from various aspects, S u wo recommendation results. Although the attention mechanism
multihead experiences a performance decline, especially on could not generate an utterly human-readable interpretation,
the ML-Hetrec dataset. We can conclude that multihead self- it could highlight the vital salience features that drive the user’s
attention in S u plays a pivotal role in the proposed model. decision. To confirm the necessity of adaptively modeling
Moreover, since different users have different focuses on context weights in (4), Fig. 6(a) and (b) demonstrates the
various aspects, it makes sense that the user attention network average α(u, c) for all users on both datasets. We can notice
θshort is indispensable for our model. that the highest attention weights in descending order fall into
DOW, TOD, M, and Y, respectively, on the MovieLens dataset,
while, on the Gowalla dataset, TOD is the most dominant
C. Analysis of Model Hyperparameters (RQ3) context affecting users’ choices more than DOW, city, M,
In this section, we study how diverse parameter ranges affect and Y. This observation verifies our assumption that different
the performance of our proposed model. Fig. 5(a) presents the contextual factors have a different impact on users. Thus, their
R@20 for different latent dimensions d while setting other respective influences should be properly incorporated into the
optimal hyperparameters as unaltered. We can observe that model learning. Fig. 6(c) shows tendencies of FG value λ
our proposed model reaches its best performance when d is where the gate values in ML-Hetrec are generally lower than
chosen appropriately at 64 on both datasets. A larger latent those in Gowalla. This indicates that the short-term vector
dimension does not guarantee a better model performance due plays a major role in the ML-Hetrec, while the long-term
to the overfitting issue. Fig. 5(b) and (c) shows the learning rate vector is the dominant factor in the Gowalla dataset. In other
and batch size sensitivity in our model. In this study, we set the words, people tend to decide on the movie to watch according
learning rate as 0.001 on both datasets, and the batch size is set to some recent movies, while people tend to choose where to
as 32 and 64 on ML-Hetrec and Gowalla, respectively. Except go by their accumulated long-term behaviors.
for this setting, we can notice the suboptimal performance 1) Individual Case Study: We further conducted qualitative
due to the local optimal problem happening when the value is analyses to show how LSCAR generates a persuasive recom-
either set to be too small or too large. mendation. Fig. 7(a) shows the learned attention weights of a
Fig. 5(d) shows an ascending movement of recommendation context combination: year, month, TOD, and DOW. It reveals
performance when raising the epoch number to 15 and 20 on that the effects of TOD and DOW (9 P. M. on Thursday)
the ML-Hetrec and Gowalla datasets. However, we could are more critical than other contexts for a user 35223’s
not notice the performance gain when the number of epochs decision-making. Fig. 7(b) shows the user’s preferences in
exceeds these numbers. This suggests that the proposed model the genre aspect. It implies that user 35223 is a big fan
Fig. 7. Visualization of attention weights learned by LSCAR of a user ID 35223 on the ML-Hetrec dataset where the darker color in (a)–(d) indicates the
importance of a particular element. (a) Attention visualization of α(u, ci ) for each context. (b) Attention visualization of θgenre for genre preferences in the
long-term behaviors. The five highest weights are {5: Comedy}, {8: Drama}, {2: Adventure}, {1: Action}, and {15: Romance}. (c) Attention visualization
of θshort for items in the short-term behaviors. (d) Attention visualization of Aui for each headi (the last hidden output of LSTM is used as query vector to
get the weights from head1 to head4 ) over items in the short-term behaviors.
TABLE VI
T OP -T HREE R ECOMMENDATION R ESULTS FOR A U SER 35223 ON THE ML-H ETREC D ATASET
of comedy, drama, adventure, action, and romance movies. module is introduced to capture the dynamic user interests with
Fig. 7(c) demonstrates that each item has different influences respect to current contexts. Second, the encoded multi-interest
on a user’s decision, and our attention mechanism could pay module is proposed to capture a user’s multiple interests
more attention to essential items regardless of their position among short- and long-term behaviors. Finally, a novel integra-
in one session. Note that the title of each item ID in S u is tor FG is designed to effectively fuse both short- and long-term
provided as follows: {60069: “WallE,” 60487: “It’s the Great representations for a better prediction. Extensive experiments
Pumpkin, Charlie Brown,” 62434: “Zack and Miri Make a are conducted, and the results show that LSCAR significantly
Porno,” 59784: “Kung Fu Panda,” 59315: “Iron Man,” 49272: outperforms state-of-the-art baselines in all metrics on public
“Casino Royale,” 57640: “Hellboy II: The Golden Army,” benchmark datasets. Besides, through a case study, LSCAR
5445: “Minority Report,” 5989: “Catch Me If You Can,” provides highly interpretable prediction results through learned
59615: “Indiana Jones and the Kingdom of the Crystal Skull”}. attention weights. Generally speaking, we believe that this
Fig. 7(d) shows a multihead attention network’s capability to article makes a significant contribution to the sequence-
capture a user’s multiple interests within his interactions. The aware recommender community. In the future, we can extend
last hidden state h ut of LSTM is used as the query vector the proposed model by integrating social media information
(Q) to get the weights attending to the key and value vector (i.e., likes and comments) for further improvement.
(K , V ) = [h u1 , . . . , h ut ]. head1 mainly concentrates on the
recent items that are action movies, while head4 also captures R EFERENCES
action movies but with the director named Steven Spielberg.
In contrast to head2 , head3 mainly focuses on the first several [1] G. Adomavicius and A. Tuzhilin, “Toward the next generation of
recommender systems: A survey of the state-of-the-art and possible
comedy movies, but head3 captures comedy movies with the extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749,
actor named Seth Rogan. From this finding, we can provide Jun. 2005.
the recommendation results with an explanation for this user, [2] L. Baltrunas, B. Ludwig, and F. Ricci, “Matrix factorization techniques
for context aware recommendation,” in Proc. 5th ACM Conf.
as shown in Table VI. Recommender Syst., 2011, pp. 301–304.
[3] X. Chen, D. Liu, Z. Xiong, and Z.-J. Zha, “Learning and fusing multiple
VII. C ONCLUSION user interest representations for micro-video and movie recommenda-
This article proposes an exploiting LSCAR for the next item tions,” IEEE Trans. Multimedia, vol. 23, pp. 484–496, 2021.
[4] Q. Cheng et al., “Long short-term session search: Joint personalized
prediction. In particular, the contributions of our study lie in reranking and next query prediction,” in Proc. Web Conf., Apr. 2021,
three main components. First, the user–contextual interaction pp. 239–248.
[5] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, “Atten- [28] Y. Yang, Y. Xu, E. Wang, J. Han, and Z. Yu, “Improving
tive collaborative filtering: Multimedia recommendation with item and existing collaborative filtering recommendations via serendipity-based
component-level attention,” in Proc. 40th Int. ACM SIGIR Conf. Res. algorithm,” IEEE Trans. Multimedia, vol. 20, no. 7, pp. 1888–1900,
Develop. Inf. Retr., Aug. 2017, pp. 335–344. Jul. 2018.
[6] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: [29] B. Yi et al., “Deep matrix factorization with implicit feedback embed-
User movement in location-based social networks,” in Proc. 17th ding for recommendation system,” IEEE Trans. Ind. Informat., vol. 15,
ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, no. 8, pp. 4591–4601, Aug. 2019.
pp. 1082–1090. [30] H. Ying et al., “Sequential recommender system based on hierarchical
[7] S. Dhelim, H. Ning, N. Aung, R. Huang, and J. Ma, “Personality-aware attention network,” in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018,
product recommendation system based on user interests mining and pp. 1–7.
metapath discovery,” IEEE Trans. Comput. Social Syst., vol. 8, no. 1, [31] H. Zhang, Y. Ji, J. Li, and Y. Ye, “A triple wing harmonium model
pp. 86–98, Feb. 2021. for movie recommendation,” IEEE Trans. Ind. Informat., vol. 12, no. 1,
[8] Y. Du, M. Fang, J. Yi, C. Xu, J. Cheng, and D. Tao, “Enhancing pp. 231–239, Feb. 2015.
the robustness of neural collaborative filtering systems under mali- [32] Y. Zheng, B. Mobasher, and R. Burke, “CSLIM: Contextual SLIM
cious attacks,” IEEE Trans. Multimedia, vol. 21, no. 3, pp. 555–565, recommendation algorithms,” in Proc. 8th ACM Conf. Recommender
Mar. 2019. Syst., 2014, pp. 301–304.
[9] J. Gao, T. Zhang, and C. Xu, “A unified personalized video recommen-
dation via dynamic recurrent neural networks,” in Proc. 25th ACM Int.
Conf. Multimedia, Oct. 2017, pp. 127–135. Tipajin Thaipisutikul received the master’s degree
[10] X. He and T.-S. Chua, “Neural factorization machines for sparse (Hons.) in the research path from The University of
predictive analytics,” in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Sydney (USYD), Sydney, NSW, Australia, in 2012,
Inf. Retr., Aug. 2017, pp. 355–364. and the Ph.D. degree from the Department of Com-
[11] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session- puter Science and Information Engineering, National
based recommendations with recurrent neural networks,” 2015, Central University (NCU), Chung-Li, Taiwan, in
arXiv:1511.06939. [Online]. Available: http://arxiv.org/abs/1511.06939 2021.
[12] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua, “Neural collab- She is currently a member of the MiNELab,
orative filtering,” in Proc. 26th Int. Conf. World Wide Web Companion, Department of Computer Science and Information
2017, pp. 173–182. Engineering, NCU. She is also an Instructor with the
[13] R. He and J. McAuley, “Fusing similarity models with Markov chains Faculty of Information and Communication Technol-
for sparse sequential recommendation,” in Proc. IEEE 16th Int. Conf. ogy (ICT), Mahidol University, Salaya, Thailand. Her research mainly focuses
Data Mining (ICDM), Dec. 2016, pp. 191–200. on machine learning, applied intelligence, data mining, and social network
[14] A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver, “Multiverse analysis.
recommendation: N-dimensional tensor factorization for context-aware
collaborative filtering,” in Proc. 4th ACM Conf. Recommender Syst.,
2010, pp. 79–86. Timothy K. Shih (Senior Member, IEEE) is cur-
[15] S. Kumar, K. De, and P. P. Roy, “Movie recommendation system using rently a Distinguished Professor and the Vice Dean
sentiment analysis from microblogging data,” IEEE Trans. Comput. of the College of Electrical Engineering and Com-
Social Syst., vol. 7, no. 4, pp. 915–923, Aug. 2020. puter Science (EECS), National Central University,
[16] J. Li, P. Feng, and J. Lv, “ICAMF: Improved context-aware matrix Chung-Li, Taiwan.
factorization for collaborative filtering,” in Proc. IEEE 25th Int. Conf. Prof. Shih is also a fellow of the Institution of
Tools with Artif. Intell., Nov. 2013, pp. 63–70. Engineering and Technology (IET). He is also a
[17] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma, “Neural attentive Senior Member of ACM. He received many research
session-based recommendation,” in Proc. ACM Conf. Inf. Knowl. Man- awards, including the IIAS Research Award from
age., Nov. 2017, pp. 1419–1428. Germany, the HSSS Award from Greece, the Bran-
[18] Q. Liu, S. Wu, D. Wang, Z. Li, and L. Wang, “Context-aware sequential don Hall Award from the USA, and the 2015 Google
recommendation,” in Proc. IEEE 16th Int. Conf. Data Mining (ICDM), MOOC Focused Research Award. He was named the 2014 Outstanding
Dec. 2016, pp. 1053–1058. Alumnus by Santa Clara University. He was the Founder and the Co-Editor-in-
[19] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang, “STAMP: Short-term Chief of the International Journal of Distance Education Technologies, USA.
attention/memory priority model for session-based recommendation,” in He is also an Associate Editor of IEEE Computing, IEEE T RANSACTIONS
ON L EARNING T ECHNOLOGIES , ACM Transactions on Internet Technology,
Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining,
2018, pp. 1831–1839. and IEEE T RANSACTIONS ON M ULTIMEDIA.
[20] X. Lin, J. Wu, C. Zhou, S. Pan, Y. Cao, and B. Wang, “Task-adaptive
neural process for user cold-start recommendation,” in Proc. Web Conf., Avirmed Enkhbat received the master’s degree
Apr. 2021, pp. 1306–1316. from the Department of Information and Computer
[21] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Sciences, National University of Mongolia (NUM),
Item-to-item collaborative filtering,” IEEE Internet Comput., vol. 7, Ulan Bator, Mongolia, in 2016. He is currently
no. 1, pp. 76–80, Jan./Feb. 2003. pursuing the Ph.D. degree with the Department
[22] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, “Factorizing of Computer Science and Information Engineering,
personalized Markov chains for next-basket recommendation,” in Proc. National Central University, Chung-Li, Taiwan.
19th Int. Conf. World Wide Web, Apr. 2010, pp. 811–820. He is currently a member of MineLab, National
[23] R. Schwarzenberg, L. Raithel, and D. Harbecke, “Neural vector concep- Central University. He is also an Instructor with the
tualization for word vector space interpretation,” in Proc. 3rd Workshop Department of Information and Computer Sciences,
Eval. Vector Space Represent., 2019, pp. 1–7. NUM. His main research interests include computer
[24] Y. K. Tan, X. Xu, and Y. Liu, “Improved recurrent neural networks for vision and human–computer interaction.
session-based recommendations,” in Proc. 1st Workshop Deep Learn.
Recommender Syst., 2016, pp. 17–22.
[25] W. Wang, J. Chen, J. Wang, J. Chen, J. Liu, and Z. Gong, Wisnu Aditya received the bachelor’s degree
“Trust-enhanced collaborative filtering for personalized point of inter- in computer science from Brawijaya University,
ests recommendation,” IEEE Trans. Ind. Informat., vol. 16, no. 9, Malang, Indonesia, in 2013, and the master’s degree
pp. 6124–6132, Sep. 2020. from National Central University, Chung-Li, Taiwan
[26] J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua, “Attentional in 2017, where he is currently pursuing the Ph.D.
factorization machines: Learning the weight of feature interactions degree with the Department of Computer Science
via attention networks,” in Proc. 26th Int. Joint Conf. Artif. Intell., and Information Engineering.
Aug. 2017, pp. 1–7. He is currently a member of MineLab, National
[27] Z. Xie, C. Liu, Y. Zhang, H. Lu, D. Wang, and Y. Ding, “Adversarial Central University. His main research interests are
and contrastive variational autoencoder for sequential recommendation,” gesture recognition and deep learning.
in Proc. Web Conf., Apr. 2021, pp. 449–459.

J170 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

J170 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)

Uploaded by

Copyright:

Available Formats

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS 1

Exploiting Long- and Short-Term Preferences for

N OMENCLATURE A. Inability to Distinguish the Influence of Various Contexts

2 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

4 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

interaction between a user and a context by its latent vectors.

6 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

D. Integrator Fusion Gate Module Algorithm 1 Learning Algorithm for LSCAR

ou = 1 − λut L u + λut S u (16)

where W p ∈ R|I |×d is a trainable weight matrix. Finally, all A. Datasets

8 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

where Rank(i g , t ) is the rank of the ground-truth item i g in

TABLE II TABLE III

traditional item-based CF. For example, FPMC is better than

B. Analysis of Different Model Components (RQ2)

10 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

TABLE V has already converged with these epoch number settings.

12 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

You might also like

J170 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

J170 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)

Uploaded by

Copyright:

Available Formats

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS 1

Exploiting Long- and Short-Term Preferences for

N OMENCLATURE A. Inability to Distinguish the Influence of Various Contexts

2 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

4 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

interaction between a user and a context by its latent vectors.

6 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

D. Integrator Fusion Gate Module Algorithm 1 Learning Algorithm for LSCAR

ou = 1 − λut  L u + λut  S u (16)

where W p ∈ R|I |×d is a trainable weight matrix. Finally, all A. Datasets

8 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

where Rank(i g , t ) is the rank of the ground-truth item i g in

TABLE II TABLE III

traditional item-based CF. For example, FPMC is better than

B. Analysis of Different Model Components (RQ2)

10 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

TABLE V has already converged with these epoch number settings.

12 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS

You might also like

ou = 1 − λut L u + λut S u (16)