Abstract— With the tremendous growth in online information, capturing dynamic users' preferences based on their historical interactions and providing a few desirable items to users have become an urgent service for all businesses. Recurrent neural networks (RNNs) and item-based collaborative filtering (CF) models are commonly used in industry due to their simplicity and efficiency. However, they fail to consider the different contexts that could differently influence current users' decision-making. Also, they are not sufficient to capture multiple users' interests based on features of the interacting items. Besides, they have a limited modeling capability for the evolution of diverse and dynamic user preferences. In this article, we exploit long- and short-term preferences for deep context-aware recommendations (LSCAR) to enhance the next item recommendation's performance by introducing three novel components as follows: 1) the user–contextual interaction module is proposed to capture and differentiate the interaction between contexts and users; 2) the encoded multi-interest module is introduced to capture various types of user interests; and 3) the integrator fusion gate module is used to effectively fuse the related long-term interests to the current short-term part, and the module returns the final user interest representation. Extensive experiments and results for two public datasets demonstrate that the proposed LSCAR outperforms the state-of-the-art models in almost all metrics and could provide interpretable recommendation results.

Index Terms— Attention network, context-aware recommendation, information fusion, interpretable recommendation, sequential modeling, user multi-interest representation.

Nomenclature (fragment)
L^u and S^u    Long- and short-term representations.
o^u            Final user interest representation.
o^u            Ground-truth user interest representation.

I. INTRODUCTION

DIGITAL transformation is accelerating and changing the way in which society conducts business. These days, Internet businesses, such as search engines, social networks, e-commerce, and entertainment services, to name a few, operate based on massive amounts of user-generated data. These enormous volumes of users' longitudinal generated data provide a great opportunity for researchers to leverage them to understand users' behaviors. Recommender systems have been employed not only to mitigate the notorious information overload problem but also to significantly help web providers increase their revenue by automatically providing the right information in a timely manner to the right user. Hence, over the years, constant efforts have been made to enhance the quality of recommendations regarding accuracy, transparency, and interpretability.

However, although existing studies have shown improvement at some levels, current systems still face three main limitations that continue to hinder recommendation performance.
Authorized licensed use limited to: National Central University. Downloaded on January 22,2022 at 12:42:10 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 1. Motivation of our proposed model. There are multiple user interests, such as actor, genre, and director, which characterize the diversity and volatility
of both short- and long-term preferences. The dashed arrow reveals the change in user preferences. The current user behavior is influenced by current contexts,
such as time and location. The effect of contexts on user decision-making is represented with solid arrows. The thicker the line, the greater the impact a
particular context has on the user u.
of movies. Nevertheless, only a few sets of features may primarily attract a user's interest. Therefore, only modeling users' interactions based on an item's ID is insufficient to capture the semantic features of relationships. Recently, recurrent neural network (RNN) approaches [17], [19] have been widely used to model sequential tasks. However, these methods ignore the fact that users can have multiple interests. Using only a single dot-product attention representation fails to reflect diverse interests and so results in suboptimal recommendation performance.

C. Inability to Effectively Model Users' Evolving Preferences

Directly modeling all of the user's behaviors while ignoring the dynamic nature of user preferences, as do the item-based collaborative filtering (CF) methods [8], [21], [28], could limit the recommendation performance. These methods model static user–item interactions and cannot capture the dynamic transformation in users' behavior sequences. As a result, they usually lead to homogeneous recommendations. Although more recent work [30] has tried to solve this problem by considering both short- and long-term interests, it simply utilizes either concatenation or summation methods to combine diverse long-term behaviors with short-term interest. It remains unclear how to effectively fuse the relevant long-term preferences with the current short-term preferences in the recommendation process.

In this article, we propose exploiting the long- and short-term preferences for deep context-aware recommendations, namely LSCAR, to tackle the above limitations while providing an interpretable recommender solution. Specifically, we propose to encode dynamic users' preferences with three corresponding main components as follows. First, the user–contextual interaction module is designed to mine interactions between contexts and users adaptively. It first takes the static user representation and contexts as inputs to learn the shared latent space of their interactions. Then, it returns a dense vector of each user–context-specific interaction. All user–context interaction vectors are later aggregated by an attention network to adaptively differentiate the impact of different interactions and return a dynamic user representation as an output. Second, the encoded multi-interest module is designed to capture the dynamic nature of user preferences at two temporal levels, namely, short- and long-term preferences. The short-term preference reflects the most recent preference and is likely to drive future interactions, while the long-term preference is accumulated over an extended period and reflects the user's general preferences. Since there are many aspects (i.e., item ID, item category, and item leaf category) to define user interests, we model each aspect separately in the long-term preference. On the contrary, we employ a multihead self-attention network to capture the users' multiple interests by representing preferences from different latent semantic subspaces for the short-term preference. Note that a user attention network is also introduced in both the short- and long-term preference modeling processes to cope with the personalization concern. Third, the integrator fusion gate (FG) module is designed to effectively merge the static (long-term) and dynamic (short-term) preference vectors. Similar to the gate mechanism in LSTM, an FG learns to control the flow of information so that the most relevant information in the long-term vectors is fused efficiently with a short-term vector. Finally, an FG outputs the final user interest representation, which will be used in the prediction process.

We summarize our main contributions as follows.
1) We propose a novel model to exploit the LSCAR in a unified framework.
2) We enhance the user representation by proposing the user–contextual interaction module for explicitly modeling the high-order interactions between contexts and users. This module also adaptively captures the different effects of different contexts on users via an attention network.
3) We propose a multihead self-attention network and a user attention module to model a user's more fine-grained multiple interests in a personalized way.
4) We design an FG to effectively combine short- and long-term preferences rather than a simple combination.
5) On two public datasets, the experimental results not only demonstrate that LSCAR outperforms the state-of-the-art baselines in all metrics but also provide an interpretable recommendation.
THAIPISUTIKUL et al.: EXPLOITING LONG- AND SHORT-TERM PREFERENCES FOR DEEP CONTEXT-AWARE RECOMMENDATIONS 3
The remainder of this article is organized as follows. Section II presents related works, including traditional, context-aware, and deep-learning methods for recommender systems. Section III introduces the preliminaries to our study. Section IV describes the details of the proposed LSCAR model. Section V presents the experimental setups, while Section VI discusses the experimental results. Finally, Section VII concludes this article.

II. RELATED WORK

A. Traditional Recommendation Methods

CF [8], [21], [28] provides recommendation results by mining the similarity of users or items from users' historical behaviors. For example, Linden et al. [21] proposed a personalized item-based CF method for each customer. Yang et al. [28] improved conventional CF recommendations by utilizing an interest injection concept and serendipitous items. Du et al. [8] enhanced neural CF-based recommender systems by introducing noise layers and teacher–student models. More recently, several deep learning techniques have been applied to various tasks. For example, He et al. [12] generalized multilayer perceptrons (MLPs) to estimate user–item interactions. He and Chua [10] proposed to model feature interactions in a high-order fashion to improve FM. Xiao et al. [26] introduced an attention network for factorization machines to distinguish the impact of the interaction. Yi et al. [29] integrated side information of users and items for better latent vector representation. Kumar et al. [15] proposed a hybrid of CF and content-based filtering with sentiment analysis for recommendation systems. Although these methods have proven to be effective and are widely employed, they do not take the dynamic and evolving nature of users' preferences into consideration since they mainly focus on the stability of users' preferences. On the contrary, we consider this aspect in our proposed model.

B. Contextual-Aware Recommendation Methods

Contextual-aware RS utilizes contextual information to improve recommendation systems. Baltrunas et al. [2] extended MF with contexts (CARS) by introducing the interaction of contextual factors with item ratings. Li et al. [16] introduced tensor factorization to integrate user–item–context interaction into the recommendation model. Moreover, Zheng et al. [32] introduced a contextual CSLIM model for top-N recommendations. Karatzoglou et al. [14] considered different contexts as additional data dimensions to enhance context-aware recommendation. Liu et al. [18] introduced the context-aware concept into the sequential modeling task. They designed a neural network to employ adaptive context-specific input and adaptive context-specific transition input matrices. Zhang et al. [31] utilized text metadata in movie synopses, actor lists, and user comments for movie recommendation. Wang et al. [25] included contextual information such as user trust, user preference, and user geographic and temporal information for performance improvement. Dhelim et al. [7] incorporated social features, including personality traits and topics of interest, to alleviate the cold start problem for product recommendation. Lin et al. [20] proposed a neural process to tackle user cold-start recommendation by utilizing a task-adaptive mechanism with metalearning. However, these methods usually treat contexts as certain dimensions similar to those of users and items. The relevance between a user and a context is not intuitive and is difficult to interpret. Also, very few studies have incorporated sequential dependencies and contextual information simultaneously. Besides, they fail to distinguish the impact of different contexts in context-aware recommendations, which is one of the major concerns of our work.

C. Sequential-Aware Recommendation Methods

The sequential recommendation aims to predict users' future interests from their past longitudinal behaviors based on the order of time. For instance, FPMC [22] integrated a Markov chain (MC) with matrix factorization (MF) to build a personalized transition matrix. Fossil [13] combined users' static and dynamic preferences by MC- and similarity-based methods. Lately, RNNs and attention mechanisms have shown promising recommendation performance. For example, Hidasi et al. [11] proposed to capture users' next consumption with gated recurrent units (GRUs). Gao et al. [9] enhanced the RNN to have more capability to capture users' dynamic interest over time. Chen et al. [3] proposed to fuse multiple interest representations with late fusion. Ying et al. [30] introduced a hierarchical structure of attention networks to model two temporal levels of users' preferences. Chen et al. [5] proposed two attention deep-learning models for characterizing user interest at item component levels. Li et al. [17] leveraged the benefit brought by the attention network to mine the users' primary purpose, especially for longer sessions. Liu et al. [19] utilized the last click in an attention network to point out the importance of the items in a recommendation session. Xie et al. [27] applied adversarial learning and contrastive learning to a variational autoencoder (VAE) for personalized sequential recommendation. Cheng et al. [4] introduced a long short-term session search network (LostNet) for a personalized session search model to rerank documents for the current query and predict the following query. Nevertheless, these methods only focus on a single aspect of user interests, and both short- and long-term preferences are fused in a concatenation approach, which results in suboptimal prediction performance. In contrast, we propose a multihead self-attention approach to capture multiple user interests. Furthermore, a gating mechanism is deployed to effectively and efficiently incorporate long- and short-term preferences for practical application.

III. PRELIMINARIES

In this section, we first provide the mathematical notations used throughout this article in the Nomenclature. Then, we formulate the definitions and the problem that we aim to solve as follows.

A. Definition 1: User Representation

We denote contextual information as c_m, where m is the total number of contexts. Let u ∈ U denote a set of users and |U| denote the total number of users. We aim to model the
E. Problem Formulation

Given a static user representation e_u, a context combination {c_1, ..., c_m}, short-term behaviors S^u = {i_1^u, i_2^u, ..., i_{t-1}^u}, and long-term behaviors L^u of a user u ∈ U, we would like to recommend the top-N items that a user u is likely to interact with in the next timestep t.

IV. PROPOSED MODEL: LSCAR

This section introduces the proposed model to exploit LSCAR. We first present the general framework. Then, we detail each component and introduce the parameter learning process of LSCAR.

where W_u^τ, W_c^τ, b_z^τ, and f(·) are the transformation matrix for the user latent vector, the transformation matrix for the context latent vector, the bias term, and the rectified linear unit (ReLU) activation function, respectively. The bilinear layer's output is then passed through a stack of fully connected layers, as shown in the following:

    z_1^τ = f(W_1^τ z^τ + b_1^τ)
    ...
    e_{c_i}^u = f_L(W_L^τ z_{L-1}^τ + b_L^τ)    (2)

where W_l^τ, b_l^τ, and f_l(·) denote the weight matrix, the bias vector, and the ReLU activation function, respectively. Finally, the output vector of the last (Lth) hidden layer is denoted as e_{c_i}^u.
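To make the module concrete, the following is a minimal NumPy sketch of the user–contextual interaction computation: a shared-latent-space fusion of a user vector and a context vector, followed by the fully connected ReLU stack of (2). The first (bilinear) equation itself is not visible in this excerpt, so the fused layer below (a ReLU over W_u e_u + W_c e_c + b_z, suggested by the parameters listed in the text) is an assumption, and all shapes and names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def user_context_interaction(e_u, e_c, params):
    """Fuse a static user vector e_u with one context vector e_c into a shared
    latent space, then pass it through a stack of fully connected ReLU layers,
    as in (2). Returns the interaction vector e_{c_i}^u for that pair."""
    # Assumed fusion layer: z^tau = f(W_u e_u + W_c e_c + b_z)
    z = relu(params["W_u"] @ e_u + params["W_c"] @ e_c + params["b_z"])
    # z_l = f(W_l z_{l-1} + b_l) for each hidden layer l = 1..L
    for W_l, b_l in params["layers"]:
        z = relu(W_l @ z + b_l)
    return z

# Toy shapes: d = 4-dimensional embeddings, two hidden layers
# (the paper reports two hidden layers for this module).
rng = np.random.default_rng(0)
d = 4
params = {
    "W_u": rng.normal(0, 0.01, (d, d)),
    "W_c": rng.normal(0, 0.01, (d, d)),
    "b_z": np.zeros(d),
    "layers": [(rng.normal(0, 0.01, (d, d)), np.zeros(d)) for _ in range(2)],
}
e_u = rng.normal(size=d)   # static user embedding
e_c = rng.normal(size=d)   # one context embedding (e.g., time of day)
e_u_ci = user_context_interaction(e_u, e_c, params)
```

In the full model, this function would be applied once per context c_1, ..., c_m, and the resulting vectors e_{c_i}^u are then aggregated by the context-attention submodule described next.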
Fig. 3. Architecture of the user–contextual interaction module. It consists of two submodules that are (a) user–contextual shared latent space submodule and
(b) context-attentional effect submodule.
2) Context-Attention Effect Submodule: This submodule aims to differentiate the importance of the interaction effects e_{c_i}^u and return the aggregation of interaction effects as a dynamic user representation E_c^u. In particular, given a latent vector e_{c_i}^u and a static user latent vector e_u, we compute the attention score α(u, c_i), as shown in the following:

    α(u, c_i)* = W_1^α f(W_2^α e_{c_i}^u + W_3^α e_u + b^α)    (3)

where W_1^α, W_2^α, W_3^α, and b^α are model parameters and f(·) is the ReLU activation function. Then, the attention score is normalized by a softmax function, as shown in the following:

    α(u, c_i) = exp(α(u, c_i)*) / Σ_{j=1}^{M} exp(α(u, c_j)*)    (4)

After we get the attention scores for each user–contextual interaction, the dynamic user representation E_c^u of user u is computed, as shown in the following:

    E_c^u = Σ_{i=1}^{M} α(u, c_i) e_{c_i}^u    (5)

In this way, the recommended items capture the user's global preferences and are tailored to the specific context and situation that the user currently finds himself in.

C. Encoded Multi-Interest Module

This section details how we extract the short- and long-term user preferences. As shown in Fig. 4, we model multiple feature scales separately in the long-term preference modeling part. The reasons are twofold. On the one hand, users usually not only focus on a specific item itself but are also concerned about its features, such as genre, director, and actor. In other words, users may like only a few features of a particular interacting item; hence, encoding only the item ID feature level is far from satisfactory. On the other hand, users may interact with many items belonging to the same feature. Hence, capturing user interests from the frequency of the interacting features can also be a good indicator of the user's level of interest. Although our proposed model currently supports only nominal contextual features, we provide room for future improvement to integrate continuous numeric attributes in both the long- and short-term parts as follows. For the short-term part, we can normalize the continuous attributes into the proper range and concatenate these features with the item ID feature. For the long-term part, we can perform fixed-width binning or adaptive binning on the continuous features to include these features with their frequency for further processing.

The overall extraction process utilizes four layers: 1) an embedding layer; 2) a sequential layer; 3) a multihead encoding layer; and 4) a user attention layer. The details of each layer are described as follows.

1) Embedding Layer: For a user u, his interacted item i ∈ I and each feature scale f_i describing items can be embedded from the pretrained embedding vectors into d-dimensional vectors e_id ∈ R^d and e_{f_i}^u ∈ R^d, respectively. As shown in Fig. 4, in the long-term preference modeling part, we model each feature scale L_f^u separately since it can describe the user's preference from a different point of view. We denote freq_i as the consumption frequency of a particular feature scale to indicate a user's preference level. We only learn the embedding for nonzero frequencies (freq_i * e_f^u) since freq_i = 0 leads to e_f^u = 0. The main reason to integrate freq_i into the model learning process is to identify similar past interests. We aim to compute a final user preference based on the preference similarity and frequencies, which is highly relevant to the current user interactions.

2) Sequential Layer: In this layer, given the embedded sequence of short-term preference S^u = {e_{id_1}^u, e_{id_2}^u, ..., e_{id_t}^u}, the global temporal dependency is captured by the long short-term memory (LSTM) network as follows:

    f_t^u = σ(W_f^1 e_{id_t}^u + W_f^2 h_{t-1}^u + b_f)
    i_t^u = σ(W_i^1 e_{id_t}^u + W_i^2 h_{t-1}^u + b_i)
    o_t^u = σ(W_o^1 e_{id_t}^u + W_o^2 h_{t-1}^u + b_o)
    C̃_t^u = tanh(W_c^1 e_{id_t}^u + W_c^2 h_{t-1}^u + b_c)
    c_t^u = f_t^u ⊙ c_{t-1}^u + i_t^u ⊙ C̃_t^u
    h_t^u = o_t^u ⊙ tanh(c_t^u)    (6)

where f_t^u, i_t^u, and o_t^u denote the forget, input, and output gates, respectively. The LSTM outputs the sequential preference representation by modeling the short-term behavior sequence into the hidden state vector h_t^u at time t while carrying the past information to the present via the cell state c_t^u.

3) Multihead Encoding Layer: Since users may have various aspects of interest in items, a single attention network is not sufficient to capture the multiple aspect representations. Therefore, we exploit a multihead self-attention network to capture users' preferences from different aspects. A self-attention network is a special case of the attention mechanism,
Fig. 4. Detailed model architecture of the encoded multi-interest, integrator FG, and prediction modules. The short-term representation (S^u) is encoded into (Ŝ^u) by the LSTM, multihead self-attention, and a user attention network, which is guided by a user representation with context (E_c^u). The long-term representation (L^u) is encoded into (L̂_{f_i}^u) by a user attention network, which is also guided by E_c^u. The subscript f_i ∈ F represents multiple user interests on feature scales describing the items that user u is interacting with. L̂_{f_i}^u is further encoded into the final representation of long-term interests L^u. Then, Ŝ^u and L^u are fused via an integrator FG to produce the final user interest representation (o^u).
which takes the sequence itself as the query, key, and value vectors of dimension d. From the previous LSTM outputs, denoted as H^u = {h_1^u, h_2^u, ..., h_t^u}, we can aggregate the output vector after a self-attention network, denoted as ĥ_t^u. A multihead self-attention network allows our model to jointly focus on information from different representation subspaces in various positions. As a result, we can model the user preference ĥ_t^u ∈ R^d from multiple interest aspects. The output matrix Ĥ^u = {ĥ_1^u, ĥ_2^u, ..., ĥ_t^u} is calculated as

    Ĥ^u = Multihead(H^u) = W^h concat(head_1^u, ..., head_h^u)    (7)

where W^h ∈ R^{d×hd_k} denotes the weight matrix of the output linear transformation, the subscript h denotes the number of heads, and d_k = d/h. In particular, each head_i^u ∈ R^{d_k×t} denotes a single latent interest as

    head_i^u = Attention(W_i^Q H^u, W_i^K H^u, W_i^V H^u)    (8)

where W_i^Q, W_i^K, W_i^V ∈ R^{d_k×d} denote the linear transformation weight matrices of the query, key, and value, respectively. Let Q_i^u = W_i^Q H^u, K_i^u = W_i^K H^u, and V_i^u = W_i^V H^u. The attention score matrix is calculated as follows:

    f(Q_i^u, K_i^u) = Q_i^{uT} K_i^u
    A_i^u = softmax(f(Q_i^u, K_i^u))    (9)

Finally, we perform weighted sum pooling as follows:

    head_i^u = V_i^u A_i^{uT}    (10)

4) User Attention Layer: Since different users have different preferences even for similar item sets, we introduce a user attention layer on top of the lower layers in both the short- and long-term preference modeling parts. This module aims to mine more fine-grained personalized information, where a dynamic user representation E_c^u is used as the query vector attending to input_i, which can be either Ĥ^u = {ĥ_1^u, ĥ_2^u, ..., ĥ_t^u} in the short-term modeling part or (freq_i * e_f^u) in the long-term modeling part. Next, we compute the short-term representation Ŝ^u as follows:

    θ_i* = f(W_1^θ input_i + W_2^θ E_c^u + b^θ)    (11)

where W_1^θ, W_2^θ, and b^θ are model parameters and f(·) is the ReLU activation function. Then, the attention scores θ_i* are normalized by a softmax layer as follows:

    θ_i = exp(θ_i*) / Σ_{j=1}^{t} exp(θ_j*)    (12)

Now, we get the weight of each input_i, which reflects the importance of each input component. We then compute Ŝ^u by a weighted sum of the input latent vectors and their corresponding attention scores as follows:

    Ŝ^u = Σ_{i=1}^{t} θ_i · input_i    (13)

We compute the long-term representation L̂_f^u in a similar way by using (11)–(13) but with different model parameters.
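The multihead encoding and user attention layers of (7)–(13) can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the authors' code: the random matrices stand in for learned parameters, the LSTM outputs are faked with random vectors, and reducing the ReLU vector of (11) to a scalar score by summation is an assumption, since the paper's exact scalar projection is not legible in this excerpt.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(H, heads, rng):
    """Eqs. (7)-(10): H is (d, t), one column per timestep (LSTM outputs).
    Each head projects H to d_k = d / heads dimensions and attends over
    timesteps; head outputs are concatenated and mixed by W^h."""
    d, t = H.shape
    d_k = d // heads
    outs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.normal(0, 0.1, (d_k, d)) for _ in range(3))
        Q, K, V = Wq @ H, Wk @ H, Wv @ H       # each (d_k, t)
        A = softmax(Q.T @ K, axis=-1)           # (t, t) score matrix, eq. (9)
        outs.append(V @ A.T)                    # weighted sum pooling, eq. (10)
    Wh = rng.normal(0, 0.1, (d, heads * d_k))
    return Wh @ np.concatenate(outs, axis=0)    # eq. (7), output (d, t)

def user_attention(inputs, E_cu, W1, W2, b):
    """Eqs. (11)-(13): score each input column against the dynamic user
    representation E_cu, softmax-normalize, and return the weighted sum.
    The sum() reduction to a scalar score is an assumption (see lead-in)."""
    scores = np.array([
        np.maximum(0.0, W1 @ inputs[:, i] + W2 @ E_cu + b).sum()
        for i in range(inputs.shape[1])
    ])
    theta = softmax(scores)                     # eq. (12)
    return inputs @ theta                       # eq. (13), shape (d,)

rng = np.random.default_rng(1)
d, t, heads = 8, 5, 2
H = rng.normal(size=(d, t))                     # stand-in for LSTM outputs h_1..h_t
H_hat = multihead_self_attention(H, heads, rng)
E_cu = rng.normal(size=d)                       # dynamic user representation
W1, W2, b = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d)), np.zeros(d)
S_u = user_attention(H_hat, E_cu, W1, W2, b)    # short-term representation
```

For the long-term part, the same `user_attention` function would be fed the frequency-scaled feature embeddings (freq_i * e_f^u) instead of the self-attention outputs, with separate parameters per feature scale.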
where W_1^λ, W_2^λ, and b^λ are model parameters. Finally, the final user interest representation o^u is computed by

Algorithm fragment:
    ... Eq. (11)–(13)
    9: Construct the long-term representation L̂_f^u from all feature scales according to Eq. (11)–(13)
    10: Construct the long-term representation L^u from
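The FG equation itself is cut off in this excerpt, so the following NumPy sketch is a plausible reconstruction only: it assumes an LSTM-style sigmoid gate parameterized by the W_1^λ, W_2^λ, and b^λ named in the text, producing o^u as a per-dimension convex combination of the long- and short-term vectors. The final item-scoring step is likewise hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(S_u, L_u, W1, W2, b):
    """Assumed FG: lam in (0,1)^d decides, per dimension, how much long-term
    signal to let through; o_u is a convex combination of L_u and S_u."""
    lam = sigmoid(W1 @ S_u + W2 @ L_u + b)
    return lam * L_u + (1.0 - lam) * S_u

rng = np.random.default_rng(2)
d = 6
S_u, L_u = rng.normal(size=d), rng.normal(size=d)   # short-/long-term vectors
W1, W2 = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
o_u = fusion_gate(S_u, L_u, W1, W2, np.zeros(d))    # final user interest

# Hypothetical prediction step: score candidate items by inner product
# with o_u and take the top-N for recommendation.
item_emb = rng.normal(size=(10, d))
top3 = np.argsort(item_emb @ o_u)[::-1][:3]
```

Because the gate is a convex combination, each dimension of o^u stays between the corresponding dimensions of Ŝ^u and L^u, which matches the text's description of the FG selectively letting relevant long-term information through.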
including traditional, contextual, and sequential-aware recommendations for the performance evaluation as follows.

1) Item-Based CF [21]: This approach utilizes an item–item similarity matrix for the recommendation.
2) FPMC [22]: This approach integrates the MC with MF to model user sequential behaviors for the next item recommendation.
3) GRU4Rec [11]: This approach applies the parallelization technique with GRU units to learn a session representation. A ranking-based function is used as the loss function, which significantly outperforms traditional methods.
4) GRU4Rec++ [24]: This approach aims to improve GRU4Rec performance by data augmentation.
5) CARNN [18]: This approach models contextual information, such as time of day, day of the week, and time interval, with the sequential information simultaneously.
6) NARM [17]: This approach employs global and local attention structures to capture users' general interests and sequential behavior.
7) STAMP [19]: This approach extracts both long- and short-term user preferences from the current session using an attention mechanism.
8) SHAN [30]: This approach leverages a two-layer attention network to incorporate both users' general representation and recently consumed items.

C. Evaluation Metric

We evaluate all models' performance with two widely used metrics, namely, recall (R) and mean reciprocal rank (M). R@N measures the accuracy of recommendation as

    R@N = #hit / |T|    (19)

where |T| is the size of the test data and #hit denotes the number of cases where the desired items appear in the top-N ranking lists. Since the recommender systems can provide only a few items, the relevant item should be ranked at the former indexes. We use M@N to measure how well the model ranks the predicted item as follows:

    M@N = (1/|T|) Σ_{t∈T} 1/Rank(i_g, t)    (20)

TABLE I
STATISTICS OF THE DATASETS USED IN THE EXPERIMENTS

D. Implementation Detail

Our model is implemented in TensorFlow, and all experiments were conducted on a machine with a GeForce GTX 1080 GPU. During training, we randomly initialized the model parameters with a Gaussian distribution with a mean of zero and a standard deviation of 0.01. We employed two hidden layers for the user–contextual interaction module. The optimized model parameters (latent dimension (d), model batch size (bz), learning rate (lr), the number of heads, and epochs) are obtained by using a grid search with early stopping and Adam optimization on the validation set for fine-tuning on both datasets. The search space is limited to the following ranges: d = {8, 16, 32, 64, 128}, bz = {16, 32, 64, 128}, lr = {0.0001, 0.001, 0.01, 0.1}, the number of heads = {1, 2, 3, 4}, and epochs = {5, 10, 15, 20, 25}. Finally, on the ML-Hetrec dataset, we set d = 64, bz = 32, lr = 0.001, the number of heads = 4, and epochs = 15. On the Gowalla dataset, we set d = 64, bz = 64, lr = 0.001, the number of heads = 2, and epochs = 20. For all baseline methods, we follow the experimental settings and hyperparameters as stated in their original papers.1,2,3

VI. EXPERIMENTAL RESULTS

In this section, we design our experiments to answer the following research questions.
1) RQ1: How does LSCAR perform as compared with the state-of-the-art baselines for a top-N recommendation task?
2) RQ2: What is the impact of various components in the LSCAR model architecture?
3) RQ3: How do the different hyperparameters of LSCAR (i.e., epoch, batch size, and heads) impact its performance?
4) RQ4: Does LSCAR provide an efficient way to explain how the model works and how to interpret the recommendation result?

A. Overall Performance (RQ1)

From Table II, we can draw the following observations. First, our proposed LSCAR consistently and significantly outperforms all baselines in every metric on both datasets. Generally, LSCAR obtains improvements over the best baseline by 4.99% in R@20 and 13.45% in M@20 on the ML-Hetrec dataset. On the Gowalla dataset, LSCAR improves over the best baseline by 0.63% in R@20 and 4.72% in MRR. Second, the sequential-based methods dramatically outperform

1 https://github.com/rn5l/session-rec
2 https://github.com/uctoronto/SHAN
3 https://bitbucket.org/cdal/stackedcontextawarernn/src/master/
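The two metrics in (19) and (20) can be computed with a few lines of plain Python. This sketch is illustrative: the list-based representation of ranked results and the convention of scoring a target outside the top-N as zero in M@N are assumptions, since the paper does not spell out its truncation handling.

```python
def recall_at_n(ranked_lists, targets, n):
    """R@N, eq. (19): fraction of test cases whose target item
    appears in the top-N ranking list."""
    hits = sum(1 for ranks, tgt in zip(ranked_lists, targets) if tgt in ranks[:n])
    return hits / len(targets)

def mrr_at_n(ranked_lists, targets, n):
    """M@N, eq. (20): mean reciprocal rank of the target item,
    counted as 0 when it falls outside the top-N (assumed convention)."""
    total = 0.0
    for ranks, tgt in zip(ranked_lists, targets):
        top = ranks[:n]
        if tgt in top:
            total += 1.0 / (top.index(tgt) + 1)   # ranks are 1-based
    return total / len(targets)

# Two test cases: target ranked 1st in the first list, 3rd in the second.
ranked = [[10, 7, 3], [5, 9, 7]]
targets = [10, 7]
print(recall_at_n(ranked, targets, 2))  # 0.5 (second target is outside the top-2)
print(mrr_at_n(ranked, targets, 3))     # (1/1 + 1/3) / 2 ≈ 0.667
```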
TABLE IV
ABLATION STUDY OF THE SHORT-TERM REPRESENTATION Ŝ^u
Fig. 6. Attention visualization of all users on both datasets. (a) ML-Hetrec dataset. (b) Gowalla dataset. (c) Visualization of fusion gate score.
Fig. 7. Visualization of attention weights learned by LSCAR for user ID 35223 on the ML-Hetrec dataset, where a darker color in (a)–(d) indicates the importance of a particular element. (a) Attention visualization of α(u, c_i) for each context. (b) Attention visualization of θ_genre for genre preferences in the long-term behaviors. The five highest weights are {5: Comedy}, {8: Drama}, {2: Adventure}, {1: Action}, and {15: Romance}. (c) Attention visualization of θ_short for items in the short-term behaviors. (d) Attention visualization of A_i^u for each head_i (the last hidden output of the LSTM is used as the query vector to get the weights from head_1 to head_4) over items in the short-term behaviors.
TABLE VI
TOP-THREE RECOMMENDATION RESULTS FOR USER 35223 ON THE ML-HETREC DATASET
of comedy, drama, adventure, action, and romance movies. Fig. 7(c) demonstrates that each item has a different influence on a user's decision, and our attention mechanism pays more attention to essential items regardless of their position in the session. Note that the titles of the item IDs in S^u are as follows: {60069: "WALL·E," 60487: "It's the Great Pumpkin, Charlie Brown," 62434: "Zack and Miri Make a Porno," 59784: "Kung Fu Panda," 59315: "Iron Man," 49272: "Casino Royale," 57640: "Hellboy II: The Golden Army," 5445: "Minority Report," 5989: "Catch Me If You Can," 59615: "Indiana Jones and the Kingdom of the Crystal Skull"}. Fig. 7(d) shows the multihead attention network's capability to capture a user's multiple interests within their interactions. The last hidden state h_t^u of the LSTM is used as the query vector (Q) to obtain the weights attending to the key and value vectors (K, V) = [h_1^u, ..., h_t^u]. head_1 mainly concentrates on the recent items, which are action movies, while head_4 also captures action movies but with the director Steven Spielberg. In contrast, head_2 and head_3 mainly focus on the first several comedy movies, but head_3 captures comedy movies with the actor Seth Rogen. From this finding, we can provide the recommendation results with an explanation for this user, as shown in Table VI.

VII. CONCLUSION

This article proposes LSCAR, which exploits long- and short-term preferences for deep context-aware recommendations, for next item prediction. In particular, the contributions of our study lie in three main components. First, the user–contextual interaction module is introduced to capture the dynamic user interests with respect to current contexts. Second, the encoded multi-interest module is proposed to capture a user's multiple interests among short- and long-term behaviors. Finally, a novel integrator fusion gate (FG) is designed to effectively fuse both short- and long-term representations for a better prediction. Extensive experiments are conducted, and the results show that LSCAR significantly outperforms state-of-the-art baselines in all metrics on public benchmark datasets. Besides, through a case study, LSCAR provides highly interpretable prediction results through learned attention weights. Generally speaking, we believe that this article makes a significant contribution to the sequence-aware recommender community. In the future, we can extend the proposed model by integrating social media information (i.e., likes and comments) for further improvement.
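The multihead attention step described above, where the last LSTM hidden state attends over all hidden states [h_1^u, ..., h_t^u], can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random projection matrices Wq, Wk, and Wv stand in for the learned per-head parameters, and the head count and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(H, num_heads=4, seed=0):
    """Attend over LSTM hidden states H = [h_1, ..., h_t] (shape (t, d)),
    using the last state h_t as the query, as in Fig. 7(d).
    Returns per-head context vectors (num_heads, d_k) and
    per-head attention weights over the t items (num_heads, t)."""
    t, d = H.shape
    d_k = d // num_heads
    rng = np.random.default_rng(seed)
    # Hypothetical random projections standing in for learned parameters.
    Wq = rng.standard_normal((num_heads, d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((num_heads, d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((num_heads, d, d_k)) / np.sqrt(d)
    q = H[-1]  # query: last hidden state h_t
    contexts, weights = [], []
    for i in range(num_heads):
        Q = q @ Wq[i]                      # (d_k,)
        K = H @ Wk[i]                      # (t, d_k)
        V = H @ Wv[i]                      # (t, d_k)
        A = softmax(K @ Q / np.sqrt(d_k))  # (t,) weights over items
        contexts.append(A @ V)
        weights.append(A)
    return np.stack(contexts), np.stack(weights)
```

Because each head projects the same hidden states differently, the per-head weight vectors can concentrate on different subsets of items, which is what allows head_1/head_4 and head_2/head_3 to specialize on action versus comedy movies in the case study.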
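The integrator fusion gate that combines the long-term representation L^u and short-term representation S^u into the final user interest o^u can be sketched as a learned sigmoid gate over their concatenation. This is a minimal sketch under assumed shapes; the weight matrix Wg and bias bg are hypothetical stand-ins for the model's learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(Lu, Su, Wg, bg):
    """Fuse long-term (Lu) and short-term (Su) user representations,
    both of shape (d,). Wg has shape (d, 2d), bg has shape (d,).
    The element-wise gate g in (0, 1) decides how much each dimension
    of the final interest o_u draws from the long- vs. short-term side."""
    g = sigmoid(Wg @ np.concatenate([Lu, Su]) + bg)
    o_u = g * Lu + (1.0 - g) * Su  # convex combination per dimension
    return o_u, g
```

Because the fusion is a per-dimension convex combination, each entry of o^u stays between the corresponding entries of L^u and S^u, and inspecting g directly yields the fusion-gate scores visualized in Fig. 6(c).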
Tipajin Thaipisutikul received the master's degree (Hons.) in the research path from The University of Sydney (USYD), Sydney, NSW, Australia, in 2012, and the Ph.D. degree from the Department of Computer Science and Information Engineering, National Central University (NCU), Chung-Li, Taiwan, in 2021.

She is currently a member of the MiNELab, Department of Computer Science and Information Engineering, NCU. She is also an Instructor with the Faculty of Information and Communication Technology (ICT), Mahidol University, Salaya, Thailand. Her research mainly focuses on machine learning, applied intelligence, data mining, and social network analysis.

Timothy K. Shih (Senior Member, IEEE) is currently a Distinguished Professor and the Vice Dean of the College of Electrical Engineering and Computer Science (EECS), National Central University, Chung-Li, Taiwan.

Prof. Shih is also a fellow of the Institution of Engineering and Technology (IET). He is also a Senior Member of ACM. He received many research awards, including the IIAS Research Award from Germany, the HSSS Award from Greece, the Brandon Hall Award from the USA, and the 2015 Google MOOC Focused Research Award. He was named the 2014 Outstanding Alumnus by Santa Clara University. He was the Founder and the Co-Editor-in-Chief of the International Journal of Distance Education Technologies, USA. He is also an Associate Editor of IEEE Computing, IEEE Transactions on Learning Technologies, ACM Transactions on Internet Technology, and IEEE Transactions on Multimedia.

Avirmed Enkhbat received the master's degree from the Department of Information and Computer Sciences, National University of Mongolia (NUM), Ulan Bator, Mongolia, in 2016. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan.

He is currently a member of MineLab, National Central University. He is also an Instructor with the Department of Information and Computer Sciences, NUM. His main research interests include computer vision and human–computer interaction.

Wisnu Aditya received the bachelor's degree in computer science from Brawijaya University, Malang, Indonesia, in 2013, and the master's degree from National Central University, Chung-Li, Taiwan, in 2017, where he is currently pursuing the Ph.D. degree with the Department of Computer Science and Information Engineering.

He is currently a member of MineLab, National Central University. His main research interests are gesture recognition and deep learning.