Conference Paper · October 2013



Large-scale Recommendations in a Dynamic Marketplace
Jayasimha Katukuri (eBay Inc., jkatukuri@ebay.com)
Rajyashree Mukherjee (eBay Inc., rmukherjee@ebay.com)
Tolga Konik (eBay Inc., tkonik@ebay.com)

ABSTRACT
We present a recommendation system architecture for dynamic marketplaces. Our system addresses several challenges that recommendation engines face in an open market. It can handle open-ended and rapidly changing user-generated listings and the absence of a catalog behind them. It can also control the trade-off between relevance and predicted quality. Quality is affected by many factors, such as price, item condition and seller trustworthiness, and it is a particularly important challenge in an open-market setting. The core of our solution is a set of learned cluster definitions that map dynamic user listings to static identifiers. These cluster identifiers allow us to pre-compute long-term models and keep the runtime system scalable. Our solution is currently deployed on ebay.com and has shown statistically significant business gains in site-wide metrics.

Keywords
Clustering, recommender systems, Hadoop, map-reduce, similarity-based recommendations.

Copyright © LSRS 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. Introduction
Recommender systems are gaining wide popularity in e-commerce as they become major drivers of incremental business value and user satisfaction. Building a recommendation engine for a large open marketplace like eBay poses several challenges. The majority of listings are unstructured and have no product catalog behind them. The listings are also short-lived, since items are often bought within 1-2 weeks of becoming available. Hence, pre-computing recommendations with traditional techniques such as item-to-item collaborative filtering [1] is not feasible. On the other hand, a solution based entirely on online computation is not scalable. Another challenge in an open-market setting is that the recommendation system must also account for factors such as seller trustworthiness and item quality.

We propose a recommender system architecture that is scalable, recommends short-lived items and provides control over multiple competing objectives such as relevance and quality. Our solution learns cluster functions that map dynamically changing listings to static cluster identifiers. This allows us to separate processing into computationally intensive offline modeling and very efficient runtime performance. Offline processing builds static models using clusters as the main language. Although computationally expensive, it is a highly parallelizable procedure, since it is based on local clustering of items as partitioned by user queries. The runtime system efficiently combines the cluster models with dynamic features on the site. As a result, our system can cover hundreds of millions of active items in the inventory while serving millions of active users. The contributions of this paper are a highly scalable algorithm for clustering the inventory using user queries as seeds, a recommender system architecture for transient items in a marketplace, and a method that can negotiate the trade-off between relevance and item quality.

Our system is currently deployed at ebay.com on a large scale and has shown a statistically significant positive impact on site-wide performance. We present A/B test results showing an increase in user engagement. Our proposed method is general and is suitable for other e-commerce marketplaces where relevance and quality play an important role.

2. Related Work
Recommender systems can be broadly categorized into 'content-based' and 'collaborative-filtering-based' systems. Content-based algorithms use item features to compute similarity with respect to other items, and recommendations are based on this similarity. Collaborative filtering methods compute an item-item matrix from user behavioral data such as co-purchases [1] or co-views [2]. Most existing recommender systems address recommendations for long-lived items or products. Amazon's system [1] recommends products that are stable and do not expire in a short period of time. The YouTube video recommender system [2] deals with a fast-expanding collection, but most of its items stay on the site for long periods of time. Netflix's movie database is a slowly growing collection, which allows its recommendation system to be based on pre-computing item-item relationships using collaborative filtering methods [4]. Google News personalization [5] is one of the few systems that address recommendations for short-lived items. In the dynamic e-commerce setting of eBay, our recommender systems need to handle short-lived items as well as seller and item quality issues. Our proposed system provides control over the trade-off between relevance and quality.

3. Two Motivating Scenarios
In this paper, we discuss how our system addresses two typical recommendation scenarios in e-commerce: pre-purchase and post-purchase recommendations.

In the pre-purchase scenario, our system recommends items that are good alternatives to the item the user is viewing. eBay is well known as a marketplace for auction items. When an item is listed as an auction, several interested buyers may bid on it, but only one buyer can win it. In this scenario, buyers who bid on the item but could not buy it will often want to buy a comparable item.

Figure 1 shows an example of this scenario. The algorithm recognizes that the input item belongs to the learned concept "handmade amish quilt" and makes all of its recommendations within that class of objects, even though the details of the recommendations can vary along other dimensions. One naïve solution to this problem is to use the seed item title as a query and
recommend the best matching items. We compared our method against this baseline and obtained superior results (Section 5).

Figure 1: Pre-purchase Similar Item Recommendations

Figure 2: Post-purchase Related Item Recommendations

In the post-purchase scenario, we recommend items that are complementary or related to an item the user has recently bought. The input item in Figure 2 is a 'samsung galaxy s3 phone', and the recommended items are accessories for the phone. The most common method for post-purchase recommendation is collaborative filtering. However, computing related items with collaborative filtering is not useful for short-lived items. Next, we describe our architecture, including how we address the issue of short-lived items.

4. Architecture
Our recommendation architecture consists of a number of components that can be partitioned into three major groups (Fig 3). The data store contains data about the active, time-varying state of the website as well as models learned from that information; the performance system generates recommendations on the site given a session state and the information in the data store; and the offline model generation component creates models by conducting computationally intensive offline analyses. Next, we describe these components in more detail.

4.1 Data store
The data store is the glue between the computationally intensive offline model generation and the real-time performance components. Since both systems use the information in the data store, it often provides two versions of similar services. For example, the offline modeling component has access to the complete history of changes in the item inventory but does not provide efficient keyword search on item properties, while the performance system can search an indexed version of the current inventory but does not have access to changes over time. Next, we describe the data sources that our system uses as input and output.

4.1.1 Input Information Sources
The data store contains two kinds of information sources: continuously accumulated raw data about individual actions, along with the resulting state of the website; and models, which contain generalized knowledge that can apply to new situations. As in many e-commerce sites, our data store can be divided into inventory data, which contains a set of items and their static attributes; clickstream data, which stores actions and the dynamic state of the site; and transaction data, which stores a history of purchases. Note that, while transactions could be recreated from clickstream data, they are typically stored separately for efficient access. The data store also contains a conceptual knowledge base, including the category tree, which is a hierarchical ontology that organizes the items in the inventory; language-specific knowledge sources such as stop words and spell-correction rules; and a term dictionary that lists important terms and phrases in a given category. Some of these knowledge sources are learned from raw data, but we do not describe this process here due to space limitations.

4.1.2 Output Cluster Model
Our architecture generates two main output models. The first model contains cluster definitions, each of which groups a set of conceptually similar items. One of the most important architectural commitments of our approach is to represent clusters as an explicit bag of phrases. One major advantage of this representation is that clusters can be used directly as search queries. For example, given a search engine indexing an inventory of items, we can retrieve an ordered set of items that best match a cluster expression. This representation also allows us to calculate term similarity and item coverage overlap between clusters.

The second knowledge structure our system generates is the related cluster model, which is used for post-purchase recommendations of complementary items. This model represents a sparse graph between clusters. A strong link from one cluster to another indicates that the likelihood of purchasing an item from the second cluster increases after a purchase of an item from the first cluster. Next, we describe how the cluster models are used in the runtime system and how they are learned during offline processing.
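The related cluster model can be illustrated with a small in-memory sketch of the construction outlined in Section 4.2.2, assuming purchases arrive as (user, timestamp, item) tuples and that cluster assignment has already mapped each item to a cluster. A directed edge from cluster i to cluster j collects the users who bought from j after buying from i; sparse edges are dropped, and each node's outgoing edges are ranked by the number of co-purchasing users plus a content-similarity term between the two clusters' phrases. The function name, the Jaccard similarity, the additive score and the `min_users`, `sim_weight` and `top_k` parameters are hypothetical choices, and the grouping and scoring loops only mimic the map and reduce stages that the deployed system runs on Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Content similarity between two clusters' bags of phrases."""
    return len(a & b) / len(a | b) if a | b else 0.0


def build_related_clusters(purchases, item_cluster, cluster_phrases,
                           min_users=2, sim_weight=2.0, top_k=2):
    """Hypothetical sketch: rank outgoing cluster-to-cluster edges from co-purchases.

    purchases: iterable of (user_id, timestamp, item_id) tuples.
    item_cluster: item_id -> cluster_id (output of cluster assignment).
    cluster_phrases: cluster_id -> set of phrases.
    """
    # "Map"-like stage: group each user's purchases as (timestamp, cluster_id).
    by_user = defaultdict(list)
    for user, ts, item in purchases:
        if item in item_cluster:
            by_user[user].append((ts, item_cluster[item]))

    # Directed edge i -> j: the set of users who bought from j after buying from i.
    edge_users = defaultdict(set)
    for user, events in by_user.items():
        events.sort()
        for (t1, ci), (t2, cj) in combinations(events, 2):
            if t1 < t2 and ci != cj:
                edge_users[(ci, cj)].add(user)

    # "Reduce"-like stage: drop sparse edges, score the rest, keep top_k per source.
    outgoing = defaultdict(list)
    for (ci, cj), users in edge_users.items():
        if len(users) >= min_users:
            sim = jaccard(cluster_phrases[ci], cluster_phrases[cj])
            outgoing[ci].append((len(users) + sim_weight * sim, cj))
    return {ci: [cj for _, cj in sorted(edges, key=lambda e: (-e[0], e[1]))[:top_k]]
            for ci, edges in outgoing.items()}


# Toy data (hypothetical); the quilt purchases play the role of noisy edges.
CLUSTER_PHRASES = {
    "phone": {"galaxy", "s3", "phone"},
    "case": {"galaxy", "s3", "case"},
    "charger": {"galaxy", "s3", "charger"},
    "quilt": {"amish", "quilt"},
}
ITEM_CLUSTER = {"p1": "phone", "p2": "phone", "c1": "case", "c2": "case",
                "h1": "charger", "q1": "quilt"}
PURCHASES = [
    ("u1", 1, "p1"), ("u1", 2, "c1"), ("u1", 3, "h1"),
    ("u2", 1, "p2"), ("u2", 5, "c2"),
    ("u3", 1, "p1"), ("u3", 2, "h1"), ("u3", 4, "q1"),
    ("u4", 1, "p2"), ("u4", 3, "q1"),
]

print(build_related_clusters(PURCHASES, ITEM_CLUSTER, CLUSTER_PHRASES))
# {'phone': ['case', 'charger']}
```

In the toy data, the phone cluster has three outgoing edges with two users each; the content-similarity term is what ranks the unrelated quilt cluster below the two accessory clusters, so it falls outside `top_k=2`.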
Fig 3: Cluster Based Architecture (diagram panels: Offline Model Generation, The Data Store, Real-time Performance System)
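Fig 3's offline path from user queries to clusters (query-recall item generation, then cluster generation) can be sketched as a toy, single-machine version of the procedure described in Section 4.2.1: each query's recall set is clustered independently with K-Means, overlapping clusters are merged, and each cluster is described by the tokens shared by all of its members as a simplified bag of phrases. Only title tokens are used as features here, whereas the paper also uses attributes and category; the farthest-point initialization, the `target_size` rule for choosing k, the 0.8 merge threshold and all function names are assumptions made for illustration.

```python
from collections import Counter


def sq_dist(v: dict, c: dict) -> float:
    """Squared Euclidean distance between sparse token-count vectors."""
    return sum((v.get(t, 0) - c.get(t, 0)) ** 2 for t in set(v) | set(c))


def mean(vectors: list) -> dict:
    """Centroid of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: n / len(vectors) for t, n in total.items()}


def kmeans(vectors: list, k: int, iters: int = 20) -> list[int]:
    """Plain K-Means with deterministic farthest-point initialization (an assumption)."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


def cluster_query(query: str, titles: dict[str, str], target_size: int) -> list[frozenset]:
    """Local clustering of one query's recall set (independent per query)."""
    q_tokens = set(query.split())
    recall = [iid for iid, title in titles.items() if q_tokens <= set(title.split())]
    if not recall:
        return []
    k = max(1, len(recall) // target_size)  # hypothetical rule for choosing k
    labels = kmeans([Counter(titles[iid].split()) for iid in recall], k)
    groups: dict[int, set] = {}
    for iid, lab in zip(recall, labels):
        groups.setdefault(lab, set()).add(iid)
    return [frozenset(g) for g in groups.values()]


def merge_clusters(clusters: list[frozenset], threshold: float = 0.8) -> list[frozenset]:
    """Merge clusters whose item overlap (Jaccard) reaches the threshold; removes duplicates."""
    merged: list[frozenset] = []
    for members in clusters:
        for i, kept in enumerate(merged):
            if len(members & kept) / len(members | kept) >= threshold:
                merged[i] = kept | members
                break
        else:
            merged.append(members)
    return merged


def generate_clusters(queries, titles, target_size=2, threshold=0.8):
    """Return (sorted member ids, shared-token definition) for each merged cluster."""
    local = [c for q in queries for c in cluster_query(q, titles, target_size)]
    return [(sorted(m), set.intersection(*(set(titles[i].split()) for i in m)))
            for m in merge_clusters(local, threshold)]


# Hypothetical lowercase titles and queries.
TITLES = {
    "q1": "handmade amish quilt queen",
    "q2": "handmade amish quilt king",
    "q3": "amish quilt pattern book",
    "q4": "amish quilt pattern booklet",
    "r1": "quilt rack oak",
}
for members, phrases in generate_clusters(["amish quilt", "handmade quilt"], TITLES):
    print(members, sorted(phrases))
# ['q1', 'q2'] ['amish', 'handmade', 'quilt']
# ['q3', 'q4'] ['amish', 'pattern', 'quilt']
```

Because each query's recall set is clustered independently, the per-query loop in `generate_clusters` is the natural unit to distribute as parallel tasks; only the merge step needs to see clusters produced by different queries.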

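Fig 3's runtime path for a seed item (cluster assignment, query formation, item search and ranking) can likewise be sketched. This toy version replaces the Lucene-backed cluster assignment and item search services with in-memory token matching: a cluster's token set acts as a hard conjunctive constraint, SIR takes the union of the best matching clusters' tokens as its query, RIR builds one query per related cluster and keeps one item per query, and candidates are ordered by a quality score. The `CLUSTERS` and `RELATED` tables, the token-coverage score used for assignment, and the quality weights (0.5, 0.3, 0.2) are hypothetical placeholders, not the production models.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> set[str]:
    """Lowercase and tokenize (a stand-in for the system's normalization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class Item:
    item_id: str
    title: str
    price: float
    conv_prob: float     # hypothetical predicted conversion probability
    seller_score: float  # hypothetical seller-quality score in [0, 1]


# Hypothetical offline models: cluster tokens and the related-cluster graph.
CLUSTERS = {
    "galaxy_s3_phone": {"samsung", "galaxy", "s3"},
    "galaxy_s3_case": {"galaxy", "s3", "case"},
    "galaxy_s3_charger": {"galaxy", "s3", "charger"},
}
RELATED = {"galaxy_s3_phone": ["galaxy_s3_case", "galaxy_s3_charger"]}


def assign_clusters(item: Item, k: int = 1) -> list[str]:
    """Cluster assignment: rank clusters by the fraction of their tokens found in the title."""
    tokens = normalize(item.title)
    scored = sorted(((len(tokens & toks) / len(toks), cid) for cid, toks in CLUSTERS.items()),
                    key=lambda pair: (-pair[0], pair[1]))
    return [cid for score, cid in scored[:k] if score > 0]


def search(constraint: set[str], inventory: list[Item], seed: Item) -> list[Item]:
    """Item search with the cluster expression as a hard constraint."""
    return [it for it in inventory
            if it.item_id != seed.item_id and constraint <= normalize(it.title)]


def quality(item: Item, seed: Item) -> float:
    """Illustrative quality score: conversion, seller quality and price proximity."""
    proximity = 1 - abs(item.price - seed.price) / max(item.price, seed.price)
    return 0.5 * item.conv_prob + 0.3 * item.seller_score + 0.2 * proximity


def similar_items(seed: Item, inventory: list[Item], n: int = 3) -> list[Item]:
    """SIR: union of best matching cluster tokens, then search, then quality ranking."""
    query = set().union(*(CLUSTERS[c] for c in assign_clusters(seed)))
    if not query:
        return []
    hits = search(query, inventory, seed)
    return sorted(hits, key=lambda it: quality(it, seed), reverse=True)[:n]


def related_items(seed: Item, inventory: list[Item]) -> list[Item]:
    """RIR: one query per related cluster, one best-quality item per query."""
    picks = []
    for cid in assign_clusters(seed):
        for rid in RELATED.get(cid, []):
            hits = search(CLUSTERS[rid], inventory, seed)
            if hits:
                picks.append(max(hits, key=lambda it: quality(it, seed)))
    return picks


INVENTORY = [
    Item("p1", "Samsung Galaxy S3 16GB White", 300.0, 0.10, 0.90),
    Item("p2", "Samsung Galaxy S3 32GB Blue", 320.0, 0.08, 0.95),
    Item("p3", "Samsung Galaxy S3 Unlocked", 150.0, 0.12, 0.40),
    Item("c1", "Leather case for Galaxy S3", 15.0, 0.20, 0.80),
    Item("c2", "Galaxy S3 silicone case", 5.0, 0.30, 0.50),
    Item("h1", "Galaxy S3 car charger", 10.0, 0.25, 0.90),
]
seed = INVENTORY[0]
print([it.item_id for it in similar_items(seed, INVENTORY)])  # ['p2', 'p3']
print([it.item_id for it in related_items(seed, INVENTORY)])  # ['c1', 'h1']
```

The split of responsibilities is the point of the sketch: the cluster tokens restrict candidates to relevant items, and the quality score only reorders items within that set, mirroring the relevance/quality balance described in the Real-time Performance System section.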
Real-time Performance System
The performance system consists of two components: the similar item recommender (SIR) and the related item recommender (RIR). Both components take a seed item as input and return a set of items that are similar or related to that seed item.

Since the performance system interacts with the user in real time, it is crucial that any complex decision process be compiled into the offline model. However, pre-caching all processing is also infeasible: under our assumption of a changing inventory, the seed item may be one that the performance system encounters for the first time. As a result, we require that any retrieval from a data source occur through an efficient indexing service and that the computation done after retrieval be kept limited.

Both SIR and RIR start the recommendation process by calling a cluster assignment service that returns the best matching clusters for a given item. To achieve this, the service compiles normalized versions of the cluster expressions into a Lucene¹ index and runs the input item's title and features (e.g., category, attribute-value pairs) through a similar normalization to return the best matching clusters. Next, SIR creates a search query by taking the union of the best matching clusters' tokens, while RIR uses the related cluster service to retrieve n related clusters and constructs n separate search queries for them. Both algorithms then use these queries to call the item search service, which indexes active items. SIR shows a few of the best items returned from the search service as its recommendations. RIR, on the other hand, returns one item per query it has constructed, to ensure that each recommendation is related to the seed item in a different way.

In the item search call, the expressions constructed from clusters are used as hard constraints, and within the scope of those constraints the retrieved items are sorted by a ranking function that emphasizes the recommended items' quality, based on metrics such as predicted conversion probability, seller quality, price proximity to the seed item and format affinity to the seed item. This allows our system to balance relevance and quality: the cluster constraints keep the recommendations relevant, while the ranking process increases their quality.

¹ http://lucene.apache.org/

4.2 Offline Model Generation

4.2.1 Clusters Generation
An online marketplace like eBay has a huge inventory of transient items, numbering in the hundreds of millions. Moreover, the inventory covers a very broad spectrum, ranging from regular electronic items to unique collectibles. Given the scale and variety of the inventory, global clustering is not feasible. We considered two options for partitioning the input data: the manually generated ontology of items created for business reasons, namely the category hierarchy, or the historical user queries from behavioral data on the site. We chose the latter because user queries are more contextual and provide a better interpretation of similarity from the users' perspective.

Our clustering pipeline runs in a Hadoop map-reduce distributed environment. It starts with the recall sets of user queries and performs local clustering with the K-Means algorithm to further split them into meaningful clusters of similar items. We then merge clusters with high overlap and remove duplicates. The features used in the clustering algorithm include normalized tokens from the items' titles, the attributes used to describe them, and the part of the category hierarchy they belong to. With this clustering mechanism, we are able to build a stable representation for the transient items with a 100X reduction in size.

4.2.2 Cluster-Related Clusters Generation
Our proposed method computes cluster-related cluster pairs from transactional data. From this data set, we first extract an item-to-item co-purchase matrix. Next, we generate a directed cluster-to-cluster graph by applying cluster assignment to the item-to-item co-purchase matrix. An edge from cluster 'i' to cluster 'j' indicates that a group of users purchased items in cluster 'j' after buying items in cluster 'i'. The problem of finding related clusters is modeled as that of ranking the outgoing edges of a given node. The challenges we face in this problem include the scale of the data and noisy data. We address the scale using
a Hadoop Map-Reduce implementation. The problem of noisy data is addressed using cluster-cluster content similarity. For each node in the graph, we rank the outgoing edges. The ranking function uses collaborative filtering features, such as the number of users who co-purchased items from the respective clusters, and the similarity between the two clusters.

5. Experimental Results
In this section, we present experimental results for the two recommendation engines described in this paper.

We compared the similar items recommendation system described in the previous section with the legacy system that was in production at eBay at the time. The legacy system took the naïve approach of using the seed item title as a search query to recommend the best matching items in the retrieval set. We hypothesized that relevance would increase if we used clusters learned from user queries, since they better capture user intent. The new system is also more efficient, since the bulk of the computation is done offline, but we do not report on that in this paper.

We conducted an A/B test on the Closed View Item Page (CVIP) on eBay to compare our algorithm with the legacy algorithm. Users typically come to the CVIP upon losing an auction. The goal of this page is to keep users engaged with the site and to offer options to buy a different but similar item in place of the item they could not win. The test results showed that, at 90% confidence, our algorithm achieved a statistically significant improvement in user engagement and site-wide business metrics. We are not allowed to report specific statistics on site-wide impact and incremental purchases driven by the algorithm, but we report the relative improvement in user engagement (Table 1).

We also conducted an A/B test comparing our related item recommendation system against the legacy system. The legacy related items recommender, developed at eBay by Chen & Canny [6], also first mapped items to a stable representation and then used transactional data to compute relationships between the groups. However, there are important differences between our proposed method and the original method. Unlike the earlier method, ours uses clusters generated from user queries, which we hypothesized would better capture user intent. Additionally, our system is more scalable: it was built using transactional data from a period of 1 year, as opposed to the 3 months used by Chen & Canny [6]. Our test results show a statistically significant improvement in site-wide business metrics at the 90% confidence level. We also report the improvement in user engagement (Table 1).

Table 1. Relative improvement in user engagement over the legacy similar and related items recommendation algorithms.

System                       Relative improvement in click-through rate (CTR)
Similar Items Recommender    38.18%
Related Items Recommender    10.5%

Both A/B tests validate our proposition that user engagement and conversion can be improved by striking a balance between item similarity and quality, and by using user queries to better model similarity from the users' perspective. We saw significantly higher user engagement in terms of click-through rate, and both new algorithms made a statistically significant positive impact on eBay's overall revenue.

6. Conclusion
In this paper, we presented a recommendation system architecture for dynamic marketplaces where item listings are user-generated and short-lived. The proposed system provides control over multiple competing objectives in recommendations, such as relevance, item quality and seller trustworthiness.

The core of our solution is the set of cluster definitions that our system learns, which map item features into a bag of phrases. The clusters are learned over item features in the inventory, but clustering is local to the coverage set of each user query. This keeps the problem scalable and biases the clusters toward the concepts users create when constructing their search terms. Our offline model generation system leverages information from multiple large data sources (inventory, user clickstream data and transactional data), and large-scale deployment was feasible thanks to a highly parallel algorithm and the use of a large-scale Hadoop cluster. The online performance system can handle a large number of requests by indexing offline-learned models in memory and keeping computation on dynamic features limited.

Our system is deployed at ebay.com on a large scale, and we showed statistically significant improvements in user engagement metrics as well as site-wide business metrics. Our plans for future improvement include using more structured representations in clusters, in the form of attribute-value pairs, and better natural language parsing to retrieve important groups of words.

Acknowledgements
We thank Santanu Kolay, Riyaaz Shaik, Kranthi Chalasani and Venkat Sundaranatha for their contributions to this research through ideas and implementation.

7. REFERENCES
1. Linden, G., B. Smith, and J. York, Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 2003. 7(1): p. 76-80.
2. Davidson, J., et al., The YouTube video recommendation system, in Proceedings of the fourth ACM conference on Recommender systems. 2010, ACM: Barcelona, Spain. p. 293-296.
3. Adomavicius, G. and A. Tuzhilin, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. on Knowl. and Data Eng., 2005. 17(6): p. 734-749.
4. Koren, Y., Factorization meets the neighborhood: a multifaceted collaborative filtering model, in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008, ACM: Las Vegas, Nevada, USA. p. 426-434.
5. Das, A.S., et al., Google news personalization: scalable online collaborative filtering, in Proceedings of the 16th international conference on World Wide Web. 2007, ACM: Banff, Alberta, Canada. p. 271-280.
6. Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 2011, ACM: Beijing, China. p. 1013-1022.
