Will this session end with a purchase? Inferring current purchase intent of anonymous visitors

Osnat Mokryn a,⁎, Veronika Bogina b, Tsvi Kuflik b

a Department of Information and Knowledge Management, University of Haifa, Israel
b Department of Information Systems, University of Haifa, Israel
A R T I C L E  I N F O

Keywords:
Purchase intent
Anonymous visitors
Session dynamics
Products trendiness
Temporal session information

A B S T R A C T

Understanding the online behavior and intent of online visitors is the subject of a long line of research, as are mechanisms to understand the purchase intent of visitors, to increase the number of visits that end with a purchase. Anonymous visitors, having no shopping history or known interests, garner little attention compared to profiled returning customers whose history is known. The lack of a known shopping history or interests makes it hard to learn from their behavior, or infer their shopping intent. Here, we suggest the use of products' popularity trends and visits' temporal information to infer the purchase intention of anonymous visitors. We model these dynamics and utilize our model to infer the purchase intent of visitors of two large real e-commerce retailer sites. Our model identifies online signals for purchase intent that can be used for online purchase prediction.
⁎ Corresponding author.
E-mail address: omokryn@univ.haifa.ac.il (O. Mokryn).
https://doi.org/10.1016/j.elerap.2019.100836
Received 4 October 2018; Received in revised form 3 February 2019; Accepted 9 February 2019
Available online 26 February 2019
1567-4223/ © 2019 Elsevier B.V. All rights reserved.
O. Mokryn, et al. Electronic Commerce Research and Applications 34 (2019) 100836
likely to purchase; however, their model depends on the personalization of the visitors, and therefore cannot be applied to the case of anonymous shoppers.

More similar to our approach is the use of machine learning on clickstream data to predict individual intent. Common in this case is the use of the dynamics of visits, e.g., the frequency of visits, time from last visit, and in-session dynamics (like dwell time, the time spent viewing a page). Findings show that the rate of visits and dwell time increase when the visitor is close to the purchase. Each visitor is then characterized by these dynamics, as well as their current session's dynamics and additional information that is available (such as their demographics, detailed clickstream and product information, purchase history, social interactions and influence, and more). Conversion prediction is then made for either the current or the next visit (Van den Poel and Buckinx, 2005; Bucklin and Sismeiro, 2009; Lukose et al., 2008; Su and Chen, 2015; Lo et al., 2016; Kooti et al., 2016; Raphaeli et al., 2017). We consider the case of anonymous visitors that do not have prior history in the site, nor do we know their social network.

When no information on the visitor exists, as is the case of first-time or anonymous visitors, understanding their current, real-time intent is challenging. Polites et al. (2018) find a misalignment between online shoppers' initial intention and the outcome of their online visit, i.e., some of those stating they are in the browsing phase and do not intend to buy end the visit with a purchase, while others, starting with the intention to buy, do not. Purchasing without prior intent, or impulse purchase, accounts for many of the online transactions (Chan et al., 2017). While some of the visitors have a predisposition to purchase, others might be inclined or driven to impulse purchases. Research on website stimuli that may trigger an impulse purchase considers the site's visibility and cues embedded in it, such as promotions, persuasive aids, etc. (Jeffrey and Hodge, 2007; Wells et al., 2011; Chan et al., 2017). In our work we do not consider the initial intent of the visitor, nor can we assume their predisposition to purchase or impulsive shopping.

2.2. Products' temporal dynamics and trends

Product popularity information is known to affect its success, and people tend to consume more products perceived as popular (Hanson and Putler, 1996; Salganik et al., 2006; Cai et al., 2009; Tucker and Zhang, 2011). The temporal nature of popularity, though, is unstable. In many scenarios in our lives (TV program consumption, product purchase, tweet topics, and so on) our interests change with time, in a process known as concept drift (Widmer and Kubat, 1996; Tsymbal, 2004; Krawczyk et al., 2017). Concept drift can be sudden, or gradual, changing slowly with time. Systems for handling concept drift differ according to the type of change they handle (Tsymbal, 2004): (1) Instance selection, where the goal is to select instances that are relevant to the current time window; (2) Instance weighting, where instances are weighted based on their estimated relevance; and (3) Ensemble learning, handling the family of predictors that are weighted per their relevance to the present time. Among the three, the first is the most relevant to our research.

Concept drift is naturally linked to temporal trends. Temporal trends have been shown to govern general interests (Mokryn et al., 2016), and are studied not only in recommender systems (Choi and Varian, 2012; Dias and Fonseca, 2013; Koren, 2010; Lathia et al., 2010; Srinivasan and Mekala, 2014) but also in general Web search. Google Trends (https://trends.google.com/trends/) uses the time series index of the volume of submitted queries. For example, the volume of queries on a particular brand of a watch during the second week of May might be helpful in predicting June sales for that brand. Choi and Varian (2012) use Google Trends to demonstrate that Google queries help predict economic activity. Indeed, a recent work shows that product popularity is trending in nature, and changes with time: Tang (Srinivasan and Mekala, 2014) find that the trendiness of products changes with time, week by week, or even day by day, due to either user interest change, demand shifts ignited by some external events, or just because a product is out of inventory.

Item popularity and user ratings over time are studied by Koren (2010) in a different context, showing that temporal dynamics can affect user preferences. We further examine whether these temporal dynamics, as well as the trendiness of the viewed products, are predictors of the user's current purchase intent.

2.3. Session-based recommenders

The study of session-based recommenders is a growing trend, especially in the music domain, as described next. Park et al. (2011) coined the term Session-based Collaborative Filtering (SSCF) and present a modified user-based CF that relies on session information that captures sequence patterns and repetitiveness in the users' listening process. Their goal is to predict which song will be played next given past sessions. When a song is played, an event describing the user and song (item), with the corresponding time stamp, is created. A session is defined as a sequence of per-user events within a specific continuous time frame. Then, session similarity is calculated using the cosine distance between each pair of sessions. The items' log data is used as implicit feedback. In their experimental results, using log data from Bugs Music (one of the biggest music services in Korea), they show that SSCF outperforms the basic CF. Zheleva et al. (2010) develop a session-based hierarchical graphical model using Latent Dirichlet Allocation and show that their model can facilitate playlist completion based on previous listening sessions or songs that the user has just listened to. Using the Zune Social music community (https://en.wikipedia.org/wiki/Zune) as a test bed, they model a song listening process by two graphical models with latent variables. The first one, the taste model, is characterized by a set of tastes or media preferences of a specific community. The second one, the session model, is where each song the user has listened to is defined as a finite combination of listening moods. They show that from the perplexity perspective there is a clear advantage in using a session-based model for characterizing user preferences in the social media content. Dias and Fonseca (2013) improve music recommendations by adding temporal context and session diversity factors into the analysis of music sessions. Their purpose is to recommend to the user the next song to listen to. They represent each session using five features: time of day (users tend to listen to different songs at different periods of the day), weekday (users' song preferences differ between weekdays and weekends), day of month (users tend to listen to happier music in the beginning of the month and sadder music towards the end), month (users' preferences are not the same during different seasons), and song diversity. They show that the inclusion of temporal information, either explicitly or implicitly, increases the accuracy of the recommendations significantly when compared with traditional session-based CF. A fundamental difference from our work is that playlists are longer sequences compared with shopping sessions. Additionally, they did not use dwell time, and did not consider the song's popularity. Jannach et al. (2017) and Jannach et al. (2015) incorporate long-term preferences into next music track generation. They distinguish two types of preferences: short-term history (current session) and long-term history (previous sessions), considering repeated tracks, co-occurrences of the tracks, favorite singers, and social friends' track preferences. They combine all into a multi-faceted scoring scheme to provide the best recommendation for the next track in the playlist.

Classification: In recommender systems, one common practice is to use ensembles of classifiers (Ricci et al., 2015). Any hybrid technique
that combines the results of several classifiers could be seen as an ensemble method. Netflix winners (Bell et al., 2007) used a combination of many different methods in their study. The two most common ensemble techniques are Bagging and Boosting (Ricci et al., 2015). Bagging (Bootstrap Aggregation) was initially proposed by Breiman (1996), and it combines outputs from several machine learning techniques to improve the performance and stability of prediction or classification. This technique is a special form of the averaging model (Hoeting et al., 1999). In our experiment, we use three classifiers: Bagging, NBTree, and Logistic Regression. WEKA (Hall et al., 2009) enables the use of different base-learner classifiers for Bagging.

3. Datasets description

The data in this study comprises two datasets of clickstream events from two e-commerce websites, each representing a different domain of goods, as detailed below. Both datasets are anonymized and contain clickstream log information for extended periods.

To predict the real-time intent of anonymous visitors we treat each session as a separate visit of an anonymous visitor and do not consider user information. Yet, in our datasets, there are possibly repeated visits by users. We explain here why treating each session as an anonymous session strengthens our results. Previous works found that in cases where shoppers are searching for information, a repeated process in which the time in-between visits decreases captures their behavior well (Moe and Fader, 2004; Kalczynski et al., 2006; Van den Poel and Buckinx, 2005; Bucklin and Sismeiro, 2009; Lukose et al., 2008; Su and Chen, 2015; Lo et al., 2016; Kooti et al., 2016; Raphaeli et al., 2017). However, recent work showed that shoppers might change their mind during their online visit (Polites et al., 2018). By treating each session as an anonymous independent session, we make no prior assumptions on the purchase intent of the user, do not consider previous behavior and patterns, and learn solely from the session dynamics whether it will end with a purchase.

The datasets contain real visits of users, some anonymous, some repeating customers. We handle each session separately, and do not consider personal information. This enables us to model every session as belonging to an unknown visitor, and hence we do not collect user information or previous visit dynamics.

3.1. YooChoose RecSys dataset

Our primary dataset is the YooChoose RecSys challenge dataset, representing six months of user activities in a large European e-commerce business that sells a variety of consumer goods including garden tools, toys, clothes, electronics, and more (Ben-Shimon et al., 2015). The YooChoose dataset contains two log files: a click events log and a purchase events log. The click events log consists of a list of click events on items. Each such event is associated with the session id, a timestamp (the time when the click occurred), the item id, and the category of the item. The purchase events log consists of purchase events from sessions that appear in the click events log and end with a purchase. Each entry contains a session id, a timestamp (the time when the purchase occurred), and details on the purchased item: the item id, the price, and the quantity. The sessions vary in terms of length, the number of clicks, and the number of items clicked on. Sessions last from a few minutes to a few hours, and the number of clicks varies from one click to a few hundred in a session, depending on the user's activity.

Table 1 presents the main characteristics of the YooChoose clickstream events, i.e., the number of sessions that end with a purchase (buying sessions), the number of sessions in the dataset that do not end with a purchase (non-buying sessions), the overall number of clicks, and the overall number of items. Only 5.5% of the sessions end with a purchase. The average number of clicks per session is roughly 2.8, with the majority of sessions ending within less than three clicks.

Table 1
YooChoose dataset general data statistics.

Name        Clicks       Buying sessions   Non-buying sessions   Items
YooChoose   33,003,944   509,696           8,740,032             52,739

However, the distribution is right-skewed, with sessions lasting over more than 40 clicks. Fig. 1 depicts the distribution of sessions' lengths, measured by the number of clicks, for sessions that ended with a purchase (termed buying sessions), and for those that did not (termed non-buying sessions). It is quite easy to see that the non-buying sessions are much shorter than the buying sessions. About 80% of the sessions contain between 1 and 4 clicks, while 40% of the sessions are 2-click sessions. This is quite understandable, as users are not happy with what they looked at and abandon the search. On the other hand, buying sessions have a much longer tail. Unlike the non-buying sessions, the percentage of 1-click buying sessions is small (4%, compared with 14% for 1-click non-buying sessions), while 2-click buying sessions form the largest portion (about 22% of the buying sessions). Then the percentage decreases gradually, but at a much more moderate rate compared with the non-buying sessions. This behavior seems quite understandable, as users tend to better examine items they are about to purchase: they may want to learn more about them and possibly compare several options. We can see that in both cases most of the sessions are 2 clicks long: they account for almost 40% of non-buying sessions, and 20% of buying sessions. Next come 3-click sessions, which account for 17.6% and 14.5%, respectively. The third place diverges between non-buying sessions (1-click sessions, 14.2%) and buying sessions (4-click sessions, 11.3%).

3.2. Zalando dataset

The second dataset we use is an anonymized click log from Zalando (www.zalando.com), a large European online fashion retailer, used previously for session-based recommendations (Tavakol and Brefeld, 2014). Every click is associated with a timestamp, the attributes of the viewed item, user ID, and the clicked items. The dataset is richer in details, and more attributes are associated with each product than in the YooChoose dataset. However, to validate our results across these domains, we limit ourselves to the use of the features used in the YooChoose dataset. Table 2 describes the total number of clicks, items, and sessions in the dataset. Here we see longer sessions on average, namely 8.11 clicks per session, and a larger percentage (5.9%) of sessions that end in a purchase.

4. Modeling dynamics in E-commerce sessions

Modeling the purchase intent of an anonymous visitor can be thought of as modeling the purchase intent during their session, i.e., during an anonymous session. To understand the characteristics of anonymous sessions that end with a purchase, we quantify the dynamics of e-commerce sessions. Each session is considered as a distinguishable visit of an anonymous visitor. We characterize the dynamics of each session by the trendiness of the viewed products, the clickstream, and the session's temporal characteristics, as detailed below. We define the recent trendiness of each product and consider a session to be as trendy as the trendiest product in it.

4.1. Modeling products' recent trendiness in purchasing sessions

We consider here a local view of products' popularity, in terms of
Table 2
Zalando dataset general data statistics.
(Columns: Name, Clicks, Buying sessions, Non-buying sessions, Items.)

Table 3
PS Table example from the YooChoose dataset.
(Columns: Product, Day 91, Day 92, Day 93, Trend.)

the number of sessions a product i was viewed in during the preceding n days, P_i^n. Then,

P_i^n(t) = \sum_{j=t-n-1}^{t-1} \big( PS(j, i) + NPS(j, i) \big) \quad (1)

calculated as the sum of the number of buying sessions in which i was clicked on with the number of non-buying sessions in which it was clicked on during these days. The average trendiness TD of a product i at day t is then defined as follows:

TD_i^n(t) = \frac{\sum_{j=t-n-1}^{t-1} PS(j, i)}{P_i^n(t)} \quad (2)

= \frac{\#\,\text{of buying sessions product } i \text{ was viewed in during the preceding } n \text{ days}}{\#\,\text{of overall sessions product } i \text{ was viewed in during the preceding } n \text{ days}} \quad (3)

We can now proceed to model the trendiness of current sessions (i.e., in our example, sessions occurring at day t), using the trendiness of the products viewed in them. If we define a session of length k, denoted by S_k, as a sequence of k views of products on a site (with possible repetitions), then a session S_k at day t will be as trendy as the trendiest product viewed in it:

TD_{S_k}(t) = \max_{i \in S_k} TD_i^n(t) \quad (4)

4.3. Modeling temporal and clickstream characteristics of a session

Additional temporal characteristics were used for modeling a session in both datasets. However, the temporal characteristics differ between the YooChoose and Zalando datasets, denoted with Y and Z respectively:

Month (Y): Some months are more prone to purchases than others. For example, the YooChoose dataset spans seven months, from April to September. During this time, August was the month with the highest purchase conversion rate.

Day of the week (Y): People behave differently on different days of the week. For example, in the YooChoose dataset we found that people tend to purchase more on Sundays and Mondays than on other days.

Dwell time (Y): Dwell time, the time a customer spends on viewing a particular page or a product, has been recently linked to the interest the customer has in the product (Yi et al., 2014; Bogina and Kuflik, 2017). Here, we use the session's latency.

Day number from the beginning of the dataset (Z): As the Zalando dataset does not include the dates, we use the number of days offset from the beginning of the dataset.

Additionally, we use the number of clicks in a session. This feature has been previously found important in several studies, as described in Section 2.1. Number of clicks in a session (Y, Z) defines the length of a session in number of clicks. Clearly, as the same product can be clicked on several times within a session, this value does not necessarily correlate with the number of viewed products. It is used as an additional feature for both datasets.

5. Evaluating purchase intention in an anonymized session

Our goal is to determine the purchase intent of anonymous visitors from their session's dynamic characteristics, as modeled above. To that end we consider each session as a distinguishable visit of an anonymous visitor. We train an ensemble of classifiers, as well as the XGBoost classifier, and further examine the effect of the different dynamics modeled. We compare our results to a deep learning technique that mines recurrent patterns and utilizes recurrent neural networks (RNN).

For the classification task, each session is modeled with the following set of features: max product trendiness (TD_{S_k}(t), calculated over different time windows of time t); number of clicks; and the temporal parameters of the session. YooChoose and Zalando differ in the available temporal parameters, as described in Section 4.2. Therefore, the temporal parameters used for modeling the YooChoose dataset sessions are: Day of the week; Month the session took place in; and the session's Dwell time. The temporal parameter used for the Zalando dataset sessions is: Day number from the beginning of the dataset.

Fig. 2 depicts our design flow for trendiness modeling. We learn the global trendiness information over 80% of the data. We then take the remaining 20% of the data, termed the test set. In the figure, this set is in the Session Generation. We then perform SMOTE over the test set, divide the test set into ten folds, learn from 90% of the test set, and evaluate our results over each of the remaining 10%.

Recall that each of the two datasets we have, YooChoose and Zalando, has been split into two subsets. The first, consisting of 80% of the data, is used for the modeling, and the second, consisting of 20% of the sessions, is used for experimentation (test part). Each of these sets keeps the original characteristics of the imbalanced data, as less than 6% of the sessions end with a purchase. Classifiers trained on imbalanced data perform well when classifying items belonging to the majority class, but poorly otherwise (Chawla et al., 2002). To overcome this imbalance, we use SMOTE (Chawla et al., 2002), a combination of over-sampling and under-sampling techniques. SMOTE combines informed over-sampling of the minority class with random under-sampling of the majority class. We conduct 10-fold cross-validation experiments on the test part of the dataset, which was not used for modeling.

We train an ensemble of classifiers (Ricci et al., 2015), namely Bagging, NBTree and Logistic Regression (Hall et al., 2009), and a state-of-the-art boosting machine learning method, XGBoost (Chen and Guestrin, 2016). The WEKA data mining software (Hall et al., 2009) enables the use of different base-learner classifiers. For Bagging we use the following: Reduced Error Pruning Tree ("REPTree") (Srinivasan and Mekala, 2014), which is a quick decision tree learner built upon the information gain; NBTree (Kohavi, 1996), a hybrid of decision trees with Naive Bayes classifiers that learn from instances that reach the decision trees' leaves; and Logistic Regression (Hosmer et al., 2013), where the binary dependent variable is categorical, as is the case with our prediction of purchase. XGBoost (https://github.com/dmlc/xgboost) is used with a max depth of seven and grid search.

Our experimental results show that using temporal and dynamic characteristics of the products and the sessions we are able to achieve a good classification for whether a session ends up with a purchase or not.
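As a concrete illustration, the trendiness definitions of Eqs. (1)-(4) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code; the `ps` and `nps` dictionaries (per-day buying and non-buying session counts per item, keyed by `(day, item)`) are assumed inputs:

```python
def product_views(ps, nps, i, t, n):
    """Eq. (1): sessions (buying + non-buying) that viewed product i
    during days t-n-1 .. t-1."""
    days = range(t - n - 1, t)
    return sum(ps.get((j, i), 0) + nps.get((j, i), 0) for j in days)

def product_trendiness(ps, nps, i, t, n):
    """Eqs. (2)-(3): share of buying sessions among all sessions that
    viewed product i in the window (0.0 for an unseen product)."""
    total = product_views(ps, nps, i, t, n)
    if total == 0:
        return 0.0
    buying = sum(ps.get((j, i), 0) for j in range(t - n - 1, t))
    return buying / total

def session_trendiness(ps, nps, session_items, t, n):
    """Eq. (4): a session is as trendy as its trendiest product."""
    return max(product_trendiness(ps, nps, i, t, n) for i in session_items)
```

For example, a product viewed in three buying sessions and one non-buying session over the window would get a trendiness of 0.75, and any session containing it would be at least that trendy.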
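The per-session feature set used for classification (max product trendiness, number of clicks, and the temporal parameters of Section 4.3) might be assembled as in the following sketch. It is an illustrative reconstruction with hypothetical field names, not the paper's implementation; the trendiness scores are assumed to be precomputed:

```python
from datetime import datetime

def session_features(clicks, trendiness_by_item):
    """Build a feature dict for one session.
    clicks: chronological list of (timestamp, item_id) tuples.
    trendiness_by_item: item_id -> recent-trendiness score (Section 4.1)."""
    first, last = clicks[0][0], clicks[-1][0]
    return {
        "max_trendiness": max(trendiness_by_item.get(i, 0.0)
                              for _, i in clicks),
        "num_clicks": len(clicks),
        "day_of_week": first.strftime("%A"),  # nominal, as in the paper
        "month": first.strftime("%B"),        # nominal, as in the paper
        "dwell_time": (last - first).total_seconds(),  # session latency
    }

session = [(datetime(2014, 4, 7, 10, 0, 0), "a"),
           (datetime(2014, 4, 7, 10, 5, 0), "b")]
feats = session_features(session, {"a": 0.2, "b": 0.6})
```

Note that the temporal features are kept nominal (weekday and month names) rather than numeric, matching the choice reported in the discussion section.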
Fig. 3. YooChoose: Aggregated number of sessions with trending products over different time windows, compared to the number of sessions with non-trending products.
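The evaluation pipeline of Section 5 (SMOTE-style oversampling of the minority buying class, followed by 10-fold cross-validation) can be sketched as below. Real SMOTE (Chawla et al., 2002) interpolates between a sample and its k nearest minority neighbours; for brevity this sketch interpolates between random minority pairs, which is an assumption of the sketch, not the paper's procedure:

```python
import random

def smote_like_oversample(minority, n_new, rng):
    """Simplified SMOTE-style synthesis: create n_new synthetic points by
    interpolating between random pairs of minority samples."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(minority), rng.choice(minority)
        gap = rng.random()  # position along the segment between a and b
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic

def k_folds(samples, k, rng):
    """Shuffle and deal samples into k folds for cross-validation."""
    samples = samples[:]
    rng.shuffle(samples)
    return [samples[i::k] for i in range(k)]

rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]  # toy buying-session vectors
synthetic = smote_like_oversample(minority, 5, rng)
folds = k_folds(list(range(20)), 10, rng)
```

In the paper's setup, oversampling is applied to the held-out 20% test part, which is then split into ten folds: each fold in turn serves as the evaluation set while the classifiers learn from the other nine.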
Table 5
Zalando: Quality of prediction (F1) of purchase intent over different time windows.

Days     Features             Logistic   Bagging   NBTree   XGBoost
2 days   With Trendiness      0.702      0.859     0.905    0.7554
         Without Trendiness   0.701      0.745     0.731    0.7281
3 days   With Trendiness      0.702      0.892     0.892    0.7543
         Without Trendiness   0.701      0.761     0.735    0.7222
4 days   With Trendiness      0.705      0.859     0.806    0.7861
         Without Trendiness   0.700      0.807     0.762    0.7246
5 days   With Trendiness      0.725      0.876     0.848    0.8681
         Without Trendiness   0.718      0.870     0.825    0.7409
6 days   With Trendiness      0.761      0.893     0.883    0.9438
         Without Trendiness   0.755      0.896     0.875    0.786

Table 6
Prediction results (F1) of the general comparative model.

Time-Window   Classifier   YooChoose   Zalando
2 days        Logistic     0.644       0.686
              Bagging      0.784       0.76
              NBTree       0.717       0.741
              XGBoost      0.6486      0.7335
3 days        Logistic     0.629       0.686
              Bagging      0.786       0.777
              NBTree       0.717       0.763
              XGBoost      0.65127     0.72201
4 days        Logistic     0.616       0.686
              Bagging      0.77        0.766
              NBTree       0.7         0.714
              XGBoost      0.65309     0.7187
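For intuition about the ensemble results above, Bagging (bootstrap aggregation) can be sketched in pure Python with a toy threshold-stump base learner standing in for REPTree. This is an illustrative sketch under that substitution, not the WEKA implementation used in the experiments:

```python
import random

def bootstrap(data, rng):
    """Draw a bootstrap replicate: sample len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(data):
    """Toy base learner: pick the threshold on x that best separates labels."""
    best_acc, best_thr = -1, 0.0
    for thr in sorted({x for x, _ in data}):
        acc = sum(int(x >= thr) == y for x, y in data)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return lambda x, t=best_thr: int(x >= t)

def bagging_predict(models, x):
    """Aggregate the base learners by majority vote."""
    votes = sum(m(x) for m in models)
    return int(2 * votes >= len(models))

rng = random.Random(42)
data = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)]  # toy rule: buy iff x >= 2
models = [train_stump(bootstrap(data, rng)) for _ in range(25)]
```

Each stump sees a slightly different bootstrap replicate of the training data, and the vote over many such unstable learners is what stabilizes the prediction, the property Breiman (1996) identified.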
whether the session ends with a purchase.

Our datasets come from the product and retail e-commerce industries. The temporal features denoting the time of the visit differ between the datasets. The use of sessions' temporal features improves the classification performance, and like trendiness, their removal is associated with a negative impact on the result. The temporal information available for the YooChoose dataset is rich. The Zalando dataset does not contain temporal information, and we only had the offset in days from the starting date of the given dataset. Nevertheless, adding this temporal information was sufficient to improve the inference over all time windows that we experimented with. When temporal information is not used, as is the case with our general model (described in Section 5.3), the inference quality decreases for both datasets. The difference in the sessions' temporal characteristics is a limitation of the study, as we do not compare the same session temporal information across datasets, and therefore cannot compare the datasets in these cases.

The datasets used are of real visits done by registered or returning customers, as well as anonymous visitors. Anonymous visitors might be first-time shoppers, or returning visitors and occasional shoppers who do not wish to register at the site. We treat each session as done by an anonymous visitor, and model the session's dynamics, as discussed before. While we think that this approach is more challenging to our model, this is also a limitation of this study.

Another limitation is the lack of use of the extensive product information available at sites. Previous works have found that shopping intention and acceptance is influenced by product characteristics (Pavlou and Fygenson, 2006).

Our study takes a machine learning approach to the classification of anonymous visitors' purchase intention. Both our datasets were imbalanced, with 5–5.9% of the sessions ending with a purchase. There are several known techniques to overcome imbalanced datasets. We have observed that using SMOTE on our datasets provides better results than using undersampling techniques. We further found that applying SMOTE on all features is more successful than applying SMOTE only on selected ones: classifiers provided better results when SMOTE was applied on all features and some of the features were then removed, remaining only with the selected ones. Additionally, in this study we define the temporal features as nominal, rather than applying the numeric values we used in our preliminary study (Bogina et al., 2016). Both these changes improve the classifiers' results compared to the initial ones presented in the preliminary paper. For the classification task we employ an ensemble of classifiers (Ricci et al., 2015), and XGBoost, a novel boosting method (Chen and Guestrin, 2016). We achieve classification with F1 measures of 0.9 and 0.94 for the datasets. These results outperform a random baseline, which was used in the recent empiric works of Lo et al. (2016) and Kooti et al. (2016). We further achieve better quality compared to RNN, a within-session deep learning method, which classified with F1 measures of 0.8 and 0.84, respectively. Generally, Bagging gives the best results for both datasets regardless of the time window used. XGBoost seems to have a different tendency than the ensemble methods, producing better results over the longer time windows. This might be attributed to the smaller amount of data that is available for longer time windows, as demonstrated in Fig. 3. Similar to our findings, it has been reported that while XGBoost is a common choice in Kaggle challenges and KDD Cup competitions, depending on the dataset, ensemble methods may give better results (Bekkerman, 2015).

At the moment, we successfully classified sessions that ended with a purchase or not. Our model identifies online signals for purchase intent that can be used for online purchase prediction of anonymous visitors while their session is ongoing. Almost half the sessions are long and involve three or more clicks. This gives rise to a predictive paradigm, in which our model is used for predicting the purchase intent of an unknown visitor after three or more clicks. Predicting early that a visitor does not

There are interesting managerial implications to our findings. We have considered the session's temporal parameters, and products' recent trendiness. Temporal information was previously considered for returning customers. Our results indicate that sites can use sessions' temporal information for intent prediction not only for recurrent visitors, but also for first-time and anonymous visitors. The temporal information we have for the YooChoose and the Zalando datasets differs, yet in both cases it improves the prediction quality. Sites should, therefore, consider using the full temporal information that exists per session. Our findings on products' trendiness contribute to understanding the intent of these visitors. We show that removing the product trendiness feature negatively affects the classification accuracy for both datasets, and more so for long time windows. Utilizing our preliminary results (Bogina et al., 2016), the use of products' recent popularity in the site was applied in an e-commerce recommender system for purchase intent prediction (Zhang et al., 2016). Trendiness, as defined and calculated in this study, and session length (in clicks) were found by the feature selection process to be significant for understanding the purchase intent for both datasets, and yielded good prediction compared to a baseline and RNN, an in-session deep learning predictor. We examined trendiness over different time windows. We find that using very recent information for learning, in the time span of two to three days, is sufficient for achieving good results. The findings from our learning process of product trendiness over windows of consecutive days indicate a high recency in the products' attention span. Products are often viewed by visitors in up to three consecutive days, but less so in longer windows of time.

Interestingly, the number of sessions a product is viewed in during consecutive days, e.g., in a window of two to six consecutive days, decreases with each day. When we consider a time window of seven days in which we require that a product is viewed (at least once) in each consecutive day, we find that we are left with a negligible number of sessions, indicating a clear within-site concept drift for the vast majority of products. Sites can thus track this temporal trending interest in products and identify a per-product trend and concept drift; suggest products accordingly; look for patterns; and try to identify products with correlating trends, or complementary trends.

7. Conclusions and future work

We present a method for determining the shopping intent of anonymous visitors to a site. Our method uses only the visitor's session information, namely the session temporal information, session length, and the recent trendiness of products clicked on in that session. The trendiness offers a local temporal view of the products' recent popularity. To detect a recent trend, we draw from machine learning techniques for identifying a concept drift in popularity, and find strong locality in time, on a scale of days. We show over two separate datasets from the retail industry that our method achieves good classification for understanding anonymous and occasional visitors' purchase intent. The best intent inference is achieved when using temporal aspects together with the session's trendiness and the number of clicks.

The results of this work can be utilized for creating novel real-time recommender systems that integrate trendiness and session temporal information into the reasoning process of an online purchase intent classification mechanism in sites. This mechanism may guide an online recommender for improving the shopping experience to the benefit of both the buyer and the seller. For example, a site may use the inferred user's purchase intent to identify users in anonymized sessions with low purchase intent. These users can then be directed to recommender systems in the hope of converting them into shoppers. Setting the threshold for the number of cases participating in a trendiness analysis will be an interesting option. Our work is a first step towards predicting the shopping intent of anonymous visitors in sites. We find here purchase intent only at the end of the session. An inter-
intend to purchase improves the site’s ability to introduce re- esting future direction is to find how early in the session the classifier
commenders and personal aids to the visitor during their visit. may have a good enough recommendation. method considers sessions
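The notion of recent trendiness, a product whose popularity over the last few days drifts away from its longer-term baseline, can be illustrated with a minimal sketch. The window sizes and ratio threshold below are assumed values for illustration, not the parameters used in this work.

```python
def is_trending(daily_clicks, recent_days=3, baseline_days=14, ratio=2.0):
    """Flag a product whose mean daily clicks over the last `recent_days`
    exceed `ratio` times its mean over the preceding `baseline_days`."""
    if len(daily_clicks) < recent_days + baseline_days:
        return False
    recent = daily_clicks[-recent_days:]
    baseline = daily_clicks[-(recent_days + baseline_days):-recent_days]
    base_mean = sum(baseline) / len(baseline) or 1e-9  # guard against zero baseline
    return (sum(recent) / len(recent)) / base_mean >= ratio

# A product with a flat history followed by a burst of interest:
history = [5] * 14 + [20, 25, 30]
print(is_trending(history))  # → True
```

A per-product flag of this kind can then be aggregated over the products clicked in a session to produce the session-level trendiness feature.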
Our method considers sessions at varying lengths. In a future study we intend to learn how early in the visit our method is applicable, and to utilize these signals of intent that we identified for prediction. The above findings may also be applicable to impulse purchases and returning visitors. We intend to further explore this direction in future works, along with features describing the order of each product in the session, i.e., a loop feature, when the same product was clicked a few times in sequence; a cycle feature, when a product was clicked and then, after clicks on a few other products, was clicked again; or different ratios, such as loops/(length of the session).

Conflict of interest

None.

Acknowledgement

We would like to thank the Zalando team for providing their data for our research.

References

Amaro, S., Duarte, P., 2015. An integrative model of consumers’ intentions to purchase travel online. Tourism Manage. 46, 64–79.
Baumann, A., Haupt, J., Gebert, F., Lessmann, S., 2018. Changing perspectives: using graph metrics to predict purchase probabilities. Expert Syst. Appl. 94, 137–148.
Bekkerman, R., 2015. The present and the future of the KDD Cup competition: an outsider’s perspective. https://www.linkedin.com/pulse/present-future-kdd-cup-competition-outsiders-ron-bekkerman/.
Bell, R., Koren, Y., Volinsky, C., 2007. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 95–104.
Ben-Shimon, D., Tsikinovsky, A., Friedmann, M., Shapira, B., Rokach, L., Hoerle, J., 2015. RecSys Challenge 2015 and the YooChoose dataset. In: Proceedings of the 9th ACM Conference on Recommender Systems. ACM, pp. 357–358.
Bhatnagar, A., Sen, A., Sinha, A.P., 2016. Providing a window of opportunity for converting estore visitors. Inform. Syst. Res. 28 (1), 22–32.
Bogina, V., Kuflik, T., 2017. Incorporating dwell time in session-based recommendations with recurrent neural networks. In: First Workshop on Temporal Reasoning in Recommender Systems, Como, Italy.
Bogina, V., Kuflik, T., Mokryn, O., 2016. Learning item temporal dynamics for predicting buying sessions. In: Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, pp. 251–255.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.
Bucklin, R.E., Lattin, J.M., Ansari, A., Gupta, S., Bell, D., Coupey, E., Little, J.D., Mela, C., Montgomery, A., Steckel, J., 2002. Choice and the internet: from clickstream to research stream. Market. Lett. 13 (3), 245–258.
Bucklin, R.E., Sismeiro, C., 2009. Click here for internet insight: advances in clickstream data analysis in marketing. J. Interactive Marketing 23 (1), 35–48.
Cai, H., Chen, Y., Fang, H., 2009. Observational learning: evidence from a randomized natural field experiment. Am. Econ. Rev. 99 (3), 864–882.
Center for Retail Research, 2017. Online retailing: Britain, Europe, US and Canada 2017. http://www.retailresearch.org/onlineretailing.php.
Chan, T.K., Cheung, C.M., Lee, Z.W., 2017. The state of online impulse-buying research: a literature analysis. Inform. Manage. 54 (2), 204–217.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intelligence Res. 16, 321–357.
Chen, C., Hou, C., Xiao, J., Wen, Y., Yuan, X., 2017. Enhancing purchase behavior prediction with temporally popular items. IEICE Trans. Inform. Syst. 100 (9), 2237–2240.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 785–794.
Choi, H., Varian, H., 2012. Predicting the present with Google Trends. Econ. Record 88 (s1), 2–9.
Cobb, C.J., Hoyer, W.D., 1986. Planned versus impulse purchase behavior. J. Retailing.
Deng, L., Poole, M.S., 2010. Affect in web interfaces: a study of the impacts of web page visual complexity and order. MIS Q. 711–730.
Dias, R., Fonseca, M.J., 2013. Improving music recommendation in session-based collaborative filtering by using temporal context. In: Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, pp. 783–788.
Ding, A.W., Li, S., Chatterjee, P., 2015. Learning user real-time intent for optimal dynamic web page transformation. Inform. Syst. Res. 26 (2), 339–359.
Forrester Research, 2017. Forrester data: online retail forecast, 2017 to 2022. https://www.forrester.com/report/Forrester+Data+Online+Retail+Forecast+2017+To+2022+US/-/E-RES139271.
Gefen, D., Karahanna, E., Straub, D.W., 2003. Trust and TAM in online shopping: an integrated model. MIS Q. 27 (1), 51–90.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H., 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11 (1), 10–18.
Hanson, W.A., Putler, D.S., 1996. Hits and misses: herd behavior and online product popularity. Marketing Lett. 7 (4), 297–305.
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D., 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939.
Hidasi, B., Quadrana, M., Karatzoglou, A., Tikk, D., 2016. Parallel recurrent neural network architectures for feature-rich session-based recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems. ACM, pp. 241–248.
Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T., 1999. Bayesian model averaging: a tutorial. Stat. Sci. 382–401.
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X., 2013. Applied Logistic Regression, vol. 398. John Wiley & Sons.
Jannach, D., Kamehkhosh, I., Lerche, L., 2017. Leveraging multi-dimensional user models for personalized next-track music recommendation. In: Proceedings of the Symposium on Applied Computing. ACM, pp. 1635–1642.
Jannach, D., Lerche, L., Kamehkhosh, I., 2015. Beyond hitting the hits: generating coherent music playlist continuations with the right tracks. In: Proceedings of the 9th ACM Conference on Recommender Systems. ACM, pp. 187–194.
Jeffrey, S.A., Hodge, R., 2007. Factors influencing impulse buying during an online purchase. Electron. Commerce Res. 7 (3), 367–379.
Kalczynski, P.J., Senecal, S., Nantel, J., 2006. Predicting on-line task completion with clickstream complexity measures: a graph-based approach. Int. J. Electron. Commerce 10 (3), 121–141.
Kim, E., Kim, W., Lee, Y., 2003. Combination of multiple classifiers for the customer’s purchase behavior prediction. Decis. Support Syst. 34 (2), 167–175.
Kim, Y.S., Yum, B.-J., 2011. Recommender system based on click stream data using association rule mining. Expert Syst. Appl. 38 (10), 13320–13327.
Kohavi, R., 1996. Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. KDD 96, 202–207.
Kooti, F., Lerman, K., Aiello, L.M., Grbovic, M., Djuric, N., Radosavljevic, V., 2016. Portrait of an online shopper: understanding and predicting consumer behavior. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, pp. 205–214.
Koren, Y., 2010. Collaborative filtering with temporal dynamics. Commun. ACM 53 (4), 89–97.
Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M., 2017. Ensemble learning for data stream analysis: a survey. Inform. Fusion 37, 132–156.
Lathia, N., Hailes, S., Capra, L., Amatriain, X., 2010. Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp. 210–217.
Lo, C., Frankowski, D., Leskovec, J., 2016. Understanding behaviors that lead to purchasing: a case study of Pinterest. In: KDD, pp. 531–540.
Lu, J., Wu, D., Mao, M., Wang, W., Zhang, G., 2015. Recommender system application developments: a survey. Decis. Support Syst. 74, 12–32.
Lukose, R., Li, J., Zhou, J., Penmetsa, S.R., 2008. Learning user purchase intent from user-centric data. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 673–680.
McDowell, W.C., Wilson, R.C., Kile Jr., C.O., 2016. An examination of retail website design and conversion rate. J. Business Res. 69 (11), 4837–4842.
Moe, W.W., 2003. Buying, searching, or browsing: differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psychol. 13 (1–2), 29–39.
Moe, W.W., Fader, P.S., 2004. Dynamic conversion behavior at e-commerce sites. Manage. Sci. 50 (3), 326–335.
Mokryn, O., Wagner, A., Blattner, M., Ruppin, E., Shavitt, Y., 2016. The role of temporal trends in growing networks. PLoS ONE 11 (8), e0156505.
Montgomery, A.L., Li, S., Srinivasan, K., Liechty, J.C., 2004. Modeling online browsing and path analysis using clickstream data. Marketing Sci. 23 (4), 579–595.
Olbrich, R., Holsing, C., 2011. Modeling consumer purchasing behavior in social shopping communities with clickstream data. Int. J. Electron. Commerce 16 (2), 15–40.
Panagiotelis, A., Smith, M.S., Danaher, P.J., 2014. From Amazon to Apple: modeling online retail sales, purchase incidence, and visit behavior. J. Business Econ. Stat. 32 (1), 14–29.
Park, C.H., Park, Y.-H., 2016. Investigating purchase conversion by uncovering online visit patterns. Marketing Sci. 35 (6), 894–914.
Park, S.E., Lee, S., Lee, S.-G., 2011. Session-based collaborative filtering for predicting the next song. In: Proceedings of the 2011 First ACIS/JNU International Conference on Computers, Networks, Systems and Industrial Engineering (CNSI). IEEE, pp. 353–358.
Pavlou, P.A., Fygenson, M., 2006. Understanding and predicting electronic commerce adoption: an extension of the theory of planned behavior. MIS Q. 115–143.
Pew Research Center, 2016. Online shopping and e-commerce. http://www.pewinternet.org/2016/12/19/online-shopping-and-e-commerce/.
Polites, G.L., Karahanna, E., Seligman, L., 2018. Intention–behaviour misalignment at B2C websites: when the horse brings itself to water, will it drink? Eur. J. Inform. Syst. 27 (1), 22–45.
Quadrana, M., Karatzoglou, A., Hidasi, B., Cremonesi, P., 2017. Personalizing session-based recommendations with hierarchical recurrent neural networks. In: Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, pp. 130–137.
Raphaeli, O., Goldstein, A., Fink, L., 2017. Analyzing online consumer behavior in mobile and PC devices: a novel web usage mining approach. Electron. Commer. Res. Appl. 26, 1–12.
Ricci, F., Rokach, L., Shapira, B., Kantor, P.B., 2015. Recommender Systems Handbook. Springer.
Salganik, M.J., Dodds, P.S., Watts, D.J., 2006. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311 (5762), 854–856.
Scarpi, D., Pizzi, G., Visentin, M., 2014. Shopping for fun or shopping to buy: is it different online and offline? J. Retailing Consumer Services 21 (3), 258–267.
Schäfer, K., Kummer, T.-F., 2013. Determining the performance of website-based relationship marketing. Expert Syst. Appl. 40 (18), 7571–7578.
Senecal, S., Kalczynski, P.J., Nantel, J., 2005. Consumers’ decision-making process and their online shopping behavior: a clickstream analysis. J. Business Res. 58 (11), 1599–1608.
Sismeiro, C., Bucklin, R.E., 2004. Modeling purchase behavior at an e-commerce web site: a task-completion approach. J. Marketing Res. 41 (3), 306–323.
Srinivasan, D.B., Mekala, P., 2014. Mining social networking data for classification using reptree. Int. J. Adv. Res. Comput. Sci. Manage. Stud. 2 (10).
Su, Q., Chen, L., 2015. A method for discovering clusters of e-commerce interest patterns using click-stream data. Electron. Commer. Res. Appl. 14 (1), 1–13.
Suh, E., Lim, S., Hwang, H., Kim, S., 2004. A prediction model for the purchase probability of anonymous customers to support real time web marketing: a case study. Expert Syst. Appl. 27 (2), 245–255.
Tavakol, M., Brefeld, U., 2014. Factored MDPs for detecting topics of user sessions. In: Proceedings of the 8th ACM Conference on Recommender Systems. ACM, pp. 33–40.
Tsymbal, A., 2004. The problem of concept drift: definitions and related work. Comput. Sci. Department, Trinity College Dublin 106 (2).
Tucker, C., Zhang, J., 2011. How does popularity information affect choices? A field experiment. Manage. Sci. 57 (5), 828–842.
Van den Poel, D., Buckinx, W., 2005. Predicting online-purchasing behaviour. Eur. J. Oper. Res. 166 (2), 557–575.
Venkatesh, V., Agarwal, R., 2006. Turning visitors into customers: a usability-centric perspective on purchase behavior in electronic channels. Manage. Sci. 52 (3), 367–382.
Wells, J.D., Parboteeah, V., Valacich, J.S., 2011. Online impulse buying: understanding the interplay between consumer impulsiveness and website quality. J. Assoc. Inform. Syst. 12 (1), 32.
Widmer, G., Kubat, M., 1996. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23 (1), 69–101.
Wolfinbarger, M., Gilly, M.C., 2001. Shopping online for freedom, control, and fun. California Manage. Rev. 43 (2), 34–55.
Yi, X., Hong, L., Zhong, E., Liu, N.N., Rajan, S., 2014. Beyond clicks: dwell time for personalization. In: Proceedings of the 8th ACM Conference on Recommender Systems. ACM, pp. 113–120.
Zhang, H., Ni, W., Li, X., Yang, Y., 2016. Modeling the heterogeneous duration of user interest in time-dependent recommendation: a hidden semi-Markov approach. IEEE Trans. Syst., Man, Cybern.: Syst.
Zheleva, E., Guiver, J., Mendes Rodrigues, E., Milić-Frayling, N., 2010. Statistical models of music-listening sessions in social media. In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp. 1019–1028.
Zhou, L., Dai, L., Zhang, D., 2007. Online shopping acceptance model: a critical survey of consumer factors in online shopping. J. Electron. Commerce Res. 8 (1), 41.