
2020 IEEE International Conference on Big Data (Big Data)

Item-Based Collaborative Filtering and Association Rules for a Baseline Recommender in E-Commerce
Jessica Lourenco and Aparna S. Varde
Department of Computer Science
Montclair State University
Montclair, NJ, USA
2020 IEEE International Conference on Big Data (Big Data) | 978-1-7281-6251-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/BigData50022.2020.9377807

(lourencoj1 | vardea)@montclair.edu

Abstract—In the ever-growing data-driven world today, data increases in many forms, e.g. e-commerce sites uploading new products, streaming services adding TV shows and movies, and music platforms uploading new songs. It would be highly infeasible for end users to quickly browse all this data. Hence recommender systems can benefit end users (individuals as well as companies) in efficiently finding suitable products. Rather than making end users search through a vast array of items, recommender systems can suggest suitable items to users based on popularity of the items and the respective users’ buying behavior. Accordingly, in this paper we explore two techniques widespread in recommender systems, i.e. item-based collaborative filtering and association rule mining, over Amazon review data on cellphones and accessories, and build a baseline recommender system scalable to larger data. Association rule mining is explored using the Apriori algorithm to find patterns in the data from transaction history. Item-based collaborative filtering is deployed using a correlation matrix to find similar products. Both these techniques yield useful results as evident from our baseline experiments. This work constitutes an exploratory study with longtime products in e-commerce and sets the stage for mining online data on relatively new products pertinent to the Covid-19 pandemic. These include face masks, hand sanitizers, disinfectant sprays, antibacterial wipes etc. Since multiple vendors are designing such crucial products today, it is important to provide recommendations to potential buyers. An ultimate goal in our work is to build a recommender app for e-commerce based on interesting results from our findings. This work constitutes intelligent data mining scalable over big data in e-commerce. It makes broader impacts on smart cities, since this fits the smart living and smart economy characteristics.

Index Terms—Apriori algorithm, baseline recommender, collaborative filtering, Covid-19, data mining, decision support system, e-commerce, exploratory research, scalability, smart cities

I. INTRODUCTION

All of us today have encountered recommendation systems in some facet of our daily lives. Netflix suggests movies we can watch based on those we have already watched. Pandora deploys user feedback of liking or disliking a song to play music that we like. Facebook suggests friends to add considering our current friends and their friends. YouTube recommends videos similar to the ones we are currently viewing. An online store offers products or services based on our purchase history. These are just a few of the examples that might sound familiar. As data gets bigger, it is even more significant to receive automated recommendations, since the volume, velocity and variety of such big data is directly proportional to the amount of time users would spend browsing through it manually.

Recommender systems thrive largely on machine learning algorithms that are employed to provide relevant suggestions to users based on their own actions and those of other users interested in similar products. Recommenders are very popular on e-commerce websites where receiving an automated recommendation makes browsing the content much easier for an end user. Rather than always having to search through an enormous range of items, it is useful if the system is able to suggest other items similar to the one that a given user just bought or browsed, based on those that are frequently bought with the given item. This is extremely helpful as new items are added constantly. Not having recommendations would make it harder for users to find all they need. While this is helpful to individual users, it is noticed that companies make a substantial percentage of their revenue through user recommendations. For example, Amazon.com states that 40% of their sales increased using their recommendation engine [1].

Amazon currently uses item-to-item collaborative filtering, which is one of the techniques we explore in this paper. They use this technique to match each of a user’s purchased and rated items to similar items which then appear to the user as recommendation lists. Some types of recommendation lists on Amazon.com are: a “frequently bought together” suggestion based on pertinent items that other users generally buy along with the items in the given user’s shopping cart and browsed items; a “recommended for you” list based on a previous purchase by the same user; and a “related to items you’ve viewed” list based on items similar to those the given user has already seen. Their real time recommendations given by item-to-item collaborative filtering are surely powerful e-commerce methods. Our work in this paper is orthogonal to such work and is conducted with the long-term goal of mobile application (app) development in order to enhance the shopping experiences of targeted users.

Although this continues to be a constantly developing area, there are some popular data mining techniques prevalent in building good recommender systems. As is widely known, data mining is the process of discovering interesting patterns and trends from large amounts of data where the data sources can include databases, data warehouses, the web, various other information repositories, and data streamed into a system dynamically [2]. Among the various data mining techniques, a widespread one for years is association rule mining with numerous applications in many domains. Collaborative filtering is yet another technique that is useful in capturing user and item preferences and hence is helpful in the design of recommender systems. In this paper, we conduct an exploratory study on these two techniques in order to build a simple baseline recommender system in e-commerce that can be scaled to other larger data. Our approach for developing this e-commerce recommender system is illustrated in Fig. 1.

We explore association rule mining with the Apriori algorithm and item-based collaborative filtering with a
Fig. 1. Illustration of approach for developing baseline e-commerce recommender (green text
indicates frequent patterns using Apriori for association rule mining, and same-colored arrows
denote related products using item-based collaborative filtering)
correlation matrix on e-commerce data from the Amazon review website focusing on products in the category of cellphones and accessories. In association rule mining, the focus of our work is on finding patterns with products from previous transactions to offer recommendations to users based on what they are currently buying or have previously bought. In item-based collaborative filtering, we focus on using the product ratings to recommend interesting products to buyers, taking into account a similar / related product they recently browsed or bought. We present our approach and experiments in this paper leading to the development of the baseline e-commerce recommender system.

Our work in this paper is a precursor to further e-commerce research on new products relevant to the Covid-19 global pandemic. These include items such as full personal protective equipment (PPE), gloves, hand sanitizers, N-95 and other face masks, antibacterial wipes and disinfectant sprays. Since this pandemic has continued longer than expected, and the demand for these products is increasing, several vendors are designing new items in these product categories in order to keep up with the supplies as per the demand. Hence, recommendations are all the more crucial based on the use of these products and the needs of the respective buyers. Accordingly, our ultimate aim in this research activity is to design a recommender app in the domain of e-commerce for the relevant product categories based on the outcomes of our work. This would aid in decision support for online shopping. This work on the whole fits the characteristics of smart living and smart economy within the realm of smart cities, thereby making broader impacts.

The rest of this paper is structured as follows: Section II overviews related work on the data mining approaches and recommender systems. Section III gives the details of the data description along with statistics about the customers and the product reviews. Section IV explains the details of our approach and experimentation, i.e. the techniques deployed with respect to the data in our work, namely, association rule mining and item-based collaborative filtering, along with the experiments conducted. Section V presents applications of recommender systems using association rules and item-based collaborative filtering with respect to companies as well as individual customers, especially considering Covid-19 perspectives. Section VI states the conclusions and future work.

II. RELATED WORK

Within the area of data mining and machine learning there are numerous works related to recommender systems. An interesting article [3] addresses the technology used to generate recommendations and focuses on the application of data mining techniques. The techniques discussed therein include cluster analysis, classification, association rules, and a similarity graph-based technique called Horting. Clustering techniques are used such that consumers who appear to have similar preferences are first grouped together, and certain predictions can then be made for users based on those groups. The author mentions how clustering methods usually produce relatively less personal recommendations than other methods, and in some cases the clusters have worse accuracy than collaborative filtering based algorithms. In using classifiers, a category is assigned to an output that can be used to classify new items. Furthermore, the author states that association rules are among the best-known examples of data mining in recommender systems [3], which motivates us to explore them in our work. Association rules identify items frequently found in “association” with items in which a user has expressed interest and hence using this technique can recommend items based on current interest rather than using previous customer history. Finally, they discuss Horting, a graph-based technique using edges between nodes, such that the nodes depict users, in order to determine the similarity between the respective users’ actions and behaviors.

Another well-known technique used within recommender systems is collaborative filtering. Many studies have compared

user-based and item-based collaborative filtering. The authors in [4] experiment with several different item-based algorithms and the user-based nearest neighbor algorithms. They discuss some challenges that are encountered when implementing user-based collaborative filtering algorithms. One challenge is sparsity, where the result of the accuracy of recommendations may be poor if the recommender system is based on nearest neighbor algorithms and unable to make recommendations for a particular user. Another challenge is scalability since the nearest neighbor algorithms require computation that grows with both the number of users as well as the number of items [4]. Furthermore, they conduct experiments on item-based recommendation algorithms and dive into the similarity computation and prediction generation. Their experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while also providing better quality than the best available user-based algorithms [4]. Based on such research, we choose to explore item-based collaborative filtering within our work.

Some recommender systems rely on sentiment analysis, also known as opinion mining, which can classify emotions toward an item and determine whether there is a positive or negative opinion expressed. Different methods have been harnessed for sentiment analysis in the literature. In [5], the authors provide an overview of sentiment analysis in social media that is also applicable to recommenders. They analyze the progress of techniques, review the advances in applications and discuss the limitations as well. Overall, a sentiment analysis system can benefit from implementation using NLP (Natural Language Processing) techniques along with machine learning in order to predict a user rating from their written review. We could consider sentiment analysis as potential future work for enhancing our simple baseline recommender system.

Combining techniques can also be useful. In [6], a study is aimed at designing a recommender system based on the explicit and implicit preferences of the customers in order to increase prediction accuracy using both sentiment analysis and collaborative filtering. The system uses NLP along with a supervised classification approach over hotel guest reviews in order to create recommendations. In [7], recommendations are provided based on sentiment analysis over partially labeled training data, constituting a combination of supervised and unsupervised learning tending towards a hybrid approach.

Furthermore, recommender systems are most useful when users need to sort through large amounts of data. The Covid-19 pandemic has been accompanied by rapidly growing data. As provided in [8] and [9], there is an enormous amount of data associated with the pandemic. Along with that comes a new demand for online shopping due to social distancing. Due to this e-commerce has been at the forefront of retail affecting consumer behavior. According to [10], many countries can see the major shift to e-commerce, e.g. China where 50% of the people are using e-commerce more frequently today than earlier. Knowing this, recommender systems will certainly play an important role in e-commerce users’ lives. On the whole, they will make it easier for users to shop online.

There are already apps and websites developed to aid with the Covid-19 pandemic. We find some online GIS (geographic information system) sources with mapping dashboards and applications [11] for tracking coronavirus around the world. There is work [12] that provides free continuing professional development courses related to the Coronavirus disease.

The brief survey of the literature herewith inspires us to explore data mining techniques for recommender systems within the context of online shopping, falling in the overall realm of e-commerce. Though we conduct some experiments with other techniques as well, the work we present in this paper focuses on two techniques, i.e. item-based collaborative filtering and association rule mining. This is because we get more interesting results with these as compared to the other techniques, in addition to the fact that they are highlighted as being among the most useful by the works in the literature. We conduct exploratory research in this paper using existing data from online shopping of cellphones and accessories in order to study these techniques in detail and thus set the stage for their use within other crucial products, i.e. those useful during the Covid-19 pandemic, so that we can eventually build a recommender app based on our findings. Some of this is orthogonal to our earlier work on app development [13, 14].

The work in this paper constitutes an initial study in which patterns and product correlations are discovered from online e-commerce data to provide relevant suggestions to users via a baseline recommender system, potentially useful for other products as well as for a recommender app development.

III. DATA DESCRIPTION

The data used in this exploratory study is from Amazon review data sources (2018) as this is a widely used source found in the literature, e.g. [15]. The data includes reviews within the date range of May 1996 through October 2018. The raw review data is of the size 34GB with 233.1 million reviews. An example of the data included in a product review in the category of cellphones and accessories can be seen in Fig. 2 next.

Fig. 2. Example of an Amazon product review

This raw data is subjected to preprocessing. Consider the following code snippet.

merged_data = pd.merge(reviewdata, metadata, on='asin', how='left')

Here, the review data json file is joined with the product metadata json file for the cellphones and accessories category. These are then merged through the product ID, in order to have all the attributes needed, in one data-frame named merged_data. Next, any null values are removed. Due to a large amount of data comprising 10,063,255 reviews, we downscale the data to

a sample of 99,077 reviews for this initial study. This is conducted using supervised instance-based filtering.

Before implementing the techniques on the dataset, it is important to explore the data in order to comprehend the attributes and fathom the content of the data. Specifically, it is useful to get information regarding the number of customers, reviews per customer, number of products, reviews per product and information about the ratings. In order to attain this, we first check for the unique number of customers and products. Fig. 3 depicts the retrieval of those numbers through a function which counts the unique values from a list. We also calculate the average number of reviews per customer and per product.

In the data sample shown herewith, the number of unique customers is 69,346, the number of unique products is 2,106, the average number of reviews per customer is 1.4 and the average number of reviews per product is 47. While this is a small sample, it is well-representative of the data and presents an example of how a subset can be useful in dealing with much larger datasets. This is helpful later in our further work on extending this study to bigger datasets and hence is important from a scalability angle.

Fig. 3. Sample calculations for number of unique customers and products as well as average number of reviews per customer and per product

Thereafter, the summary statistics are retrieved pertaining to the ratings column of the data-frame named data. As observed here in Fig. 4, the total count of ratings is 99,077 with a mean of 4.3, standard deviation of 1.16, minimum rating of 1, and maximum rating of 5. Also, more than half of the products have a rating of 5 which is good. As depicted in Fig. 5, a histogram is created to view distribution of the ratings for at-a-glance analysis.

Fig. 4. Summary of statistics on the ratings column in the data data-frame

Fig. 5. Distribution of ratings histogram

IV. APPROACH AND EXPERIMENTATION

The approach deployed in this exploratory study entails harnessing the techniques of association rule mining as well as item-based collaborative filtering over the data in this work. This approach has been illustrated in Fig. 1. Its implementation and experimentation are conducted using Python programming in Jupyter Notebook [16], an open-source interactive web application that allows combining code, output and explanatory text in one document. Details are described next.

A. Association Rule Mining

1) Overview of Technique: As widely known, association rule mining involves discovering associations of the type X ⇒ Y (X implies Y) where X is the antecedent and Y is the consequent [2]. It is a powerful method for market basket analysis, which aims at finding regularities in shopping behavior of customers in supermarkets, mail-order companies, online shopping and so on [17]. This analysis can especially help companies increase their sales by discovering buying patterns in their products and hence recommending other products. The classical Apriori algorithm for discovering association rules is deployed in our work. There are three important aspects here: (1) confidence, i.e. the probability that, given X occurs, Y also occurs, (2) support, i.e. the probability of both X and Y occurring together in the dataset and (3) lift, i.e. the probability of the items X and Y occurring together divided by the product of their individual probabilities. The calculations of each are as given in Equations (1) to (3) where P denotes probability, #X and #Y are the number of transactions containing X and Y respectively and #T is the total number of transactions in the dataset.

Confidence(X ⇒ Y) = P(Y | X) = #(X ∩ Y) / #X   (1)

Support(X ⇒ Y) = P(X ∧ Y) = #(X ∩ Y) / #T   (2)

Lift(X ⇒ Y) = P(X ∧ Y) / (P(X) × P(Y))   (3)

The Apriori algorithm allows discovery of sets of products regardless of the order in which they are bought. The rules discovered can then be used to make retail recommendations to users based on the items bought. Implementing this algorithm in our work involves using the apriori function from the apyori library and setting parameters to get the association rules for the products. The adaptation of association rule mining in our work appears in Algorithm 1 herewith and is explained next.
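The three measures in Equations (1) to (3) can be checked by hand on a handful of transactions. The following is a minimal sketch in plain Python, assuming a toy list of transactions with made-up product names (none of these come from the Amazon data), and with the helper names count and rule_metrics introduced here purely for illustration.

```python
# Toy transactions (hypothetical product names, not from the Amazon data).
transactions = [
    {"case", "charger"},
    {"case", "charger", "cable"},
    {"case", "screen_protector"},
    {"charger", "cable"},
    {"case", "charger"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(x, y, transactions):
    """Return (support, confidence, lift) for the rule X => Y."""
    n_total = len(transactions)        # #T
    n_x = count(x, transactions)       # #X
    n_y = count(y, transactions)       # #Y
    n_xy = count(x | y, transactions)  # transactions containing both X and Y
    support = n_xy / n_total
    confidence = n_xy / n_x
    lift = support / ((n_x / n_total) * (n_y / n_total))
    return support, confidence, lift


support, confidence, lift = rule_metrics({"case"}, {"charger"}, transactions)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

In this toy data the rule case ⇒ charger has support 3/5, confidence 3/4 and a lift slightly below 1, i.e. the two items co-occur a little less often than they would if bought independently. A lift well above 1, as in the thresholds used in the experiments, indicates a much stronger association.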

Algorithm 1: Association Rule Mining for E-com Recommender
Input: E-commerce website of category cellphone & accessories from Amazon review data; min-sup, min-conf, min-lift, min-len thresholds for support s, confidence c, lift t, number of products p respectively
1. Preprocessing
   a. Merge review data R and meta data M into data-frame F
   b. Delete tuples with null values
   c. Select a sample of data D from F
2. Select asin, also_buy attributes from D /** product id, items that the user also buys */
3. For (i = 1 to n) in D ⊆ F /** n = number of rows */
   a. Merge asin, also_buy cols into new col bought_items
   b. Delete asin and also_buy cols
   c. For (j = 1 to a) /** a = number of asin values */
      i. Split asin in bought_items into b separate cols /** b = number of bought_items */
      ii. Hence obtain altered data-frame D`
4. Convert D` to list L
5. Deploy Apriori algorithm on L using min-sup, min-conf, min-lift, min-len thresholds
6. Output products such that s ≥ min-sup, c ≥ min-conf, t ≥ min-lift, and p ≥ min-len /** include values of support s, conf c, lift t, and number of products p in output */

2) Deployment on Dataset: In the given dataset, the following two attributes are chosen from the many available ones in the cellphone and accessories category (see Fig. 6 for a sample).
• asin: the ID of a given product bought by users
• also_buy: IDs of products that users also bought with it

Fig. 6. Data attributes used in the deployment of association rules

Given these two attributes, data transformation is performed using merging and splitting of attributes. Consider the following code snippet.

data['bought_items'] = data[['asin', 'also_buy']].agg(', '.join, axis=1)
data = data.drop(['asin', 'also_buy'], axis=1)

Here, the asin and also_buy attributes are merged in order to create a new column named bought_items. Next, the column bought_items is split into separate columns, one for each asin that occurs within bought_items as shown in Fig. 7.

Fig. 7. Splitting bought_items into separate columns for each product ID

The last step in the deployment of Apriori is to change the format of the dataset. Heretofore, the data is in a pandas data-frame but the apyori library requires the dataset to be in the form of a list. Thus, our code converts the pandas data-frame into a list, iterating through each row in the data-frame.

3) Experimentation: In order to implement association rule mining, pandas, numpy, and the apyori library as well as the dataset are all imported. The Apriori algorithm is deployed on the dataset by using the apriori function from the apyori library. The parameters are set as needed. The first parameter is the list named records from which rules are to be extracted. The second parameter, min-sup, i.e. the minimum threshold for support, is an experimental value that is altered in multiple executions. The third parameter, min-conf, i.e. the minimum threshold for confidence, is also altered as an experimental value in several executions. However, confidence is set at a much higher value than support in our experiments shown here, e.g. min-sup=0.02 and min-conf=0.90, both varied around these approximate values. The fourth parameter, min-lift, which shows the strength of a rule, is varied in some experiments and is set to 3 in the examples shown here. The last parameter, min-length, which specifies the number of products in the rule, is set to 2 in all our experiments, since in this set of experiments, we deal with 2 products. Based on these experiments, we observe the output with association rules.

4) Output: By experimenting with the Apriori algorithm on the product purchase history for the category of cellphones and accessories from Amazon review data, we discover some association rules. These rules can be used to provide suitable recommendations through the users’ purchase history by recommending what they should buy next or by recommending additional products when they are considering a given one. The recommendations through association rules are all based on patterns that previous users have created through buying products. These recommendations can be extremely helpful to a given user and are likely to bring more profit to companies implementing them through recommendation systems.

Fig. 8 shows a few sample outputs from association rules. It includes two products and the calculated support, confidence and lift values. The products are shown in the first two columns with their asin (the product ID). The original dataset does not provide product titles for asin under the attribute of also_buy which has the product IDs of items the user also bought. Using row index 0 as an example, the output of the association rules can be interpreted as follows:
• If a user buys product B00KR9FT4Q then there is a 0.93 probability that the user also buys B00BW0X892 (based on confidence).
• The probability of products B00KR9FT4Q and B00BW0X892 occurring together in the dataset is 0.021 (based on support).
• Products B00KR9FT4Q and B00BW0X892 are bought together 37.7 times as often as would be expected if they were purchased independently (based on lift).
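The transformation and thresholding steps of this subsection can be sketched end to end on toy data. The snippet below is an illustrative stand-in only: it does not call the apyori library, it enumerates two-product rules by brute force rather than with Apriori's candidate pruning, and the product IDs, the threshold values and the function name two_item_rules are all assumptions made for the example. It also assumes that a rule passes when it meets or exceeds each threshold.

```python
from itertools import permutations

# Hypothetical rows mimicking the asin / also_buy attributes (IDs are made up).
rows = [
    {"asin": "P1", "also_buy": ["P2", "P3"]},
    {"asin": "P1", "also_buy": ["P2"]},
    {"asin": "P4", "also_buy": ["P3"]},
    {"asin": "P2", "also_buy": ["P1"]},
    {"asin": "P5", "also_buy": ["P3", "P4"]},
]

# Merge asin and also_buy into one transaction per row, as a list of lists.
records = [[row["asin"], *row["also_buy"]] for row in rows]


def two_item_rules(records, min_sup, min_conf, min_lift):
    """Enumerate rules X => Y over single products that meet all thresholds."""
    baskets = [set(r) for r in records]
    n = len(baskets)
    items = sorted(set().union(*baskets))
    freq = {i: sum(i in b for b in baskets) for i in items}
    rules = []
    for x, y in permutations(items, 2):
        both = sum(x in b and y in b for b in baskets)
        support = both / n
        if support < min_sup:
            continue  # infrequent pair, skip before computing the rest
        confidence = both / freq[x]
        lift = confidence / (freq[y] / n)
        if confidence >= min_conf and lift >= min_lift:
            rules.append((x, y, support, confidence, lift))
    return rules


# Toy thresholds chosen for this tiny dataset, not the values used in the paper.
for x, y, s, c, l in two_item_rules(records, min_sup=0.4, min_conf=0.9, min_lift=1.5):
    print(f"{x} => {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Each printed row has the same shape as a row of the rule output described above: two product IDs followed by support, confidence and lift. On real data the brute-force loop would be far too slow, which is why a library implementation of Apriori is used in practice.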

Fig. 8. Sample outputs from association rule mining

The product IDs are then mapped in our program to the corresponding product names using the literature on these products. A snapshot of this mapping is presented in Fig. 9 with the respective products. These values can therefore be used to stipulate recommendations via the users’ purchase history by suggesting what they should buy next or by suggesting other relevant products when they are considering a certain product. Such recommendations are incorporated into the output panel of our simple baseline e-commerce recommender system to offer meaningful suggestions to users about products.

Fig. 9. Mapping of product IDs to product names

B. Item-based Collaborative Filtering

1) Overview of Technique: There are two filtering techniques used in recommendation systems, namely, content-based filtering and collaborative filtering. Content-based filtering analyzes and constructs portraits of users and items through extra information, such as user-item profiles, and content analysis. Collaborative filtering instead derives suitable recommendations based on interactive information between the users and items such as browsing, rating, and clicks [18]. There are two types of approaches in collaborative filtering: user-based and item-based. User-based collaborative filtering finds users with similar consumption patterns as a given user and offers the content that these similar users found interesting. Item-based collaborative filtering uses the similarity between the items to determine whether a given user would like it or not [19]. Fig. 10 visually showcases the difference between the two collaborative filtering techniques as found in the source [https://medium.com/@cfpinela/recommender-systems-user-based-and-item-based-collaborative-filtering-5d5f375a127f].

Fig. 10. User-based versus item-based collaborative filtering

In this paper, we focus on item-based collaborative filtering. The reasons for this are twofold. First, we notice from the literature that it is the preferred method in most works. Second, we already consider association rule mining to find user-based preferences, so we explore another method here. Both these techniques are useful in providing recommendations from two different perspectives. We apply item-based collaborative filtering on our dataset of cellphones and accessories from Amazon. This involves first creating a user-item rating matrix, then applying truncated Singular Value Decomposition (SVD), a dimensionality reduction matrix factorization technique, and lastly calculating similarity measures through a correlation matrix. Entering the product ID then returns recommendations of top similar products. This is conveyed as the output to users. Item-based collaborative filtering is adapted in our work as shown in Algorithm 2 herewith, and is elaborated next.

Algorithm 2: Item-Based Collaborative Filtering for E-com Recommender
Input: E-commerce website of category cellphone and accessories from Amazon review data, k = number of products in top-k ranking
1. Preprocessing
   a. Merge review data R and meta data M
   b. Delete tuples with null values
   c. Select a sample of data D
2. Select asin, title, reviewerID and rating attributes /** product id, name, reviewer info, ratings by reviewers */
3. Create user-item matrix U
   a. Set index as reviewerID, columns as asin and values as product rating
   b. Replace null values with 0s
4. Create new matrix N = U^T /** Transpose the matrix */
5. Perform dimensionality reduction on N
6. Generate correlation matrix C based on N using Pearson’s correlation coefficient ρ
7. For (i = 1 to k) in C /** products from 1 to k, where k is a parameter for top-k */
   a. Find asin with correlation value v_i /** values v_1 to v_k as top-k correlation values */
   b. Map asin to title for k products /** top-k product titles found from C */
   c. Output O as title for recommended product /** top-k product names */
8. Convey output O to the user
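The core of Algorithm 2 can be illustrated on a few made-up ratings. The sketch below is a simplification under stated assumptions: it uses plain Python instead of pandas, numpy and scikit-learn, it skips the dimensionality reduction of step 5 and the title mapping of step 7b, and the review triples and the helper names pearson and top_k_similar are hypothetical. It keeps the convention from step 3b of treating missing ratings as 0.

```python
from math import sqrt

# Hypothetical (reviewerID, asin, rating) triples; IDs and ratings are made up.
reviews = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 1),
    ("u3", "A", 1), ("u3", "C", 5),
    ("u4", "B", 1), ("u4", "C", 4),
]

users = sorted({u for u, _, _ in reviews})
items = sorted({a for _, a, _ in reviews})

# Item-by-user matrix (the transposed user-item matrix), missing ratings as 0.
ratings = {(a, u): r for u, a, r in reviews}
item_rows = {a: [ratings.get((a, u), 0) for u in users] for a in items}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_similar(asin, k):
    """Rank the other products by their correlation with the given one."""
    scores = [(other, pearson(item_rows[asin], item_rows[other]))
              for other in items if other != asin]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(top_k_similar("A", k=2))
```

Here product B is rated similarly to A by the same reviewers and therefore ranks first, while C, which is liked by the reviewers who disliked A, receives a negative correlation. With the full-size matrix, the reduction in step 5 would shorten each item vector before the correlations are computed.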

2) Deployment on Dataset: Among the many attributes
available in the original data of the cellphones and accessories
category, the ones considered in the item-based collaborative
filtering herewith are as follows (See Fig. 11 for a sample).
• reviewerID - the identification of the reviewer
• asin - the identification of the product
• title - the name of the product
• rating - the rating of the product

Fig. 11. Data attributes used in deployment of item-based collaborative filtering

Fig. 12. Sample output of a pivot table with a user-item matrix

In addition to the obvious attributes of asin and title depicting
the product ID and product name respectively, it is important to
consider the reviewer ID and the rating given by the reviewer
to the product. The reviewer ID serves to identify each
reviewer, thereby helping to distinguish between reviews (e.g.
we do not want all the reviews to come from the same few
reviewers). The rating serves to assess the product which is
clearly significant in providing recommendations.

3) Experimentation: The implementation of item-based
collaborative filtering in this work entails importing pandas,
numpy, and TruncatedSVD from scikit-learn, in addition to
importing the review dataset. In order to maintain a simple
implementation, the product IDs, i.e. asin values, are considered
initially. These are subsequently mapped to the title values in
the recommender output.

Fig. 13. Sample output of transposing the user-item matrix

Next, a user-item matrix is created by making a pivot table.
In this table, the index is set as the reviewerID, columns as asin
and values as the product rating. In this implementation, any
null values are replaced with 0s. Fig. 12 shows a sample output
of the pivot table creating a user-item matrix. However, note
that for the item-based collaborative filtering technique, the
main interest is in the items. Hence, this matrix needs to be
transposed. Fig. 13 shows the transposed matrix that now has
the reviewerID as the columns and asin as the index.

Fig. 14. Sample correlation matrix output
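
The construction of the user-item matrix and its transpose can be
sketched as follows. This is a minimal illustration using pandas,
assuming a few made-up review records; only the column names
reviewerID, asin and rating come from the dataset description, and the
variable names are hypothetical.

```python
import pandas as pd

# Toy review records; the IDs and ratings are made up for illustration.
reviews = pd.DataFrame({
    "reviewerID": ["u1", "u1", "u2", "u2", "u3"],
    "asin": ["A", "B", "A", "C", "B"],
    "rating": [5.0, 3.0, 4.0, 2.0, 1.0],
})

# User-item matrix: one row per reviewer, one column per product.
# pivot_table averages duplicate (reviewer, product) pairs by default;
# products a reviewer has not rated become nulls, replaced here with 0.
user_item = reviews.pivot_table(
    index="reviewerID", columns="asin", values="rating"
).fillna(0)

# Item-based filtering focuses on the items, so the matrix is transposed:
# asin becomes the index and reviewerID the columns.
item_user = user_item.T
print(item_user)
```

Replacing nulls with 0 treats an unrated product the same as a rating
of zero, which is a simplification that keeps the sketch in line with
the description above.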
Thereafter, dimensionality reduction is performed by using
truncated singular value decomposition (SVD), a matrix
factorization technique. By using scikit-learn, the truncated
SVD is implemented such that it produces a factorization where
the number of columns is the same as the specified truncation.
Lastly, a correlation matrix is generated as portrayed in Fig. 14
in order to find similar products. Here, Pearson’s correlation
coefficient ρ is used via numpy’s corrcoef function to generate
this matrix. This correlation matrix shows how each product
correlates to another based on the reviewer ratings.

4) Output: The end goal here is to use the item-based
collaborative filtering to recommend similar products to those
that a user considers. By incorporating the product ratings for
the category of cellphones and accessories from Amazon
review data, we have seen that a correlation matrix has been
generated as found in Fig. 14 to show how each product
correlates to another. This matrix is useful in proffering
recommendations. By incorporating this, the output panel of
our simple baseline e-commerce recommender system is

implemented. This locates the product a user is considering and
finds similar products using the output of the item-based
collaborative filtering. The product ID, i.e. asin, is mapped to
the product name, i.e. title, in order to get a meaningful
recommendation of the products that the system is suggesting
to the user. This sets the stage for the final output that includes
the names of recommended products. Fig. 15 shows an excerpt
from the final output to the user for the product Insten
SportBand with Case as an example. In this figure, that product
itself appears at index 0 with a correlation value of a full 1.0,
which is expected since every product correlates perfectly with
itself.
As observed in Fig. 15, there are 9 recommendations that
show the highest correlations to the given product. Based on
these recommendations, there are similar / related products on
the list. For instance, the considered product is a band and the
first recommendation at index 1 shows another band. Another
detail is that the considered product is a general phone case for
cellphones, hence the recommendations show different phone
case options. As evident from this example, the item-based
collaborative filtering approach suggests correlated products to
the user. Likewise, it has been noticed in our exploratory study
that several useful recommendations are provided by our simple
baseline e-commerce recommender.

Fig. 15. Recommender output example for given product at index 0

V. RECOMMENDER SYSTEM APPLICATIONS IN COVID-19

Based on our initial exploratory research in this paper as well
as a general survey of classical and state-of-the-art literature
relevant to recommender systems, e.g. [3, 4, 6, 7, 20, 21, 22],
we briefly outline a few applications herewith. We focus on the
Covid-19 angle here, incorporating the need of the hour as seen
in recent works, e.g. [8-12]. Similar claims can apply to other
general perspectives as well. These applications pertain to using
association rules and item-based collaborative filtering
within recommender systems, leveraging benefits from both a
company perspective and a customer perspective.

A. Usefulness of Association Rule Mining

1) Company: If a company would implement a Covid-19
based recommender system using association rules techniques,
there would be many advantages. A company could use the
patterns in user purchase history to create new promotions, e.g.
if “Tzumi antibacterial wipes” are bought very frequently with
“Germ-X hand sanitizers”, there could be a promo of buy two
get one free in this context, provided there is enough supply of
these items. These marketing strategies can be used by an
online shopping site such as Amazon to jointly promote the sale
of these items. In the case of a physical store such as CVS
Pharmacy, the association rules would help identify where the
products should be placed on the shelves for an easier shopping
experience and for product marketing to customers. In an online
setting, a company such as Amazon could recommend items
based on the item the customer currently has in their cart or has
previously purchased, offering suggestions of Covid-19 related
items frequently bought together. All these benefits could bring
companies such as Amazon and CVS more revenue, in addition
to helping them in better understanding customer behavior and
conveying suggestions to product vendors on manufacturing
specific items in high demand. Most importantly, this would
have the significant broader impact of producing more items
really needed by users based on their buying behavior.

2) Customer: If a customer uses a recommender system
that deploys association rules, focusing on Covid-19 related
transactions, the customer would have a seamless shopping
experience. In times of the Covid-19 lockdown as well as in its
recovery phase and the aftermath, customers are likely to spend
less time in places such as grocery stores and pharmacies,
preferring to be at home or outdoors in the open air. Hence, in
the case of any physical store, e.g. a pharmacy, if frequently
bought items, especially those much needed during this
pandemic, are placed next to each other, customers would have
the pleasure of shopping with greater efficiency. Moreover, if a
certain item is placed on a shelf instead of another similar /
related item that is out-of-stock, then that would make things
easier. For example, Lysol disinfectant sprays have been
out-of-stock for quite some time and customers have been using
Windex cleaners instead on kitchen tabletops, light switches
etc. Such a fact could be discovered by association rules on
earlier data when both items were available, e.g. customers who
bought Lysol sprays also bought Windex cleaners. Hence,
placing Windex cleaners on the same shelf as Lysol sprays used
to be (when in-stock) would provide customers a better, quicker
shopping experience and would also save the store staff the
time of attending to those customers’ requests about finding
similar items. Likewise, when shopping online it is good to
have items that are frequently bought together (or used in place
of each other) to be recommended. This would make the
customers’ life easier; if they were also going to get one of the
recommended items in addition to or instead of a given item,

they would not have to spend much time searching. Overall,
such arrangements would give the customers a more pleasant
time while shopping and therefore they would be likely to shop
at the same place again, online or in-person.

B. Usefulness of Item-based Collaborative Filtering

1) Company: If a company would implement a Covid-19
related recommender system using item-based collaborative
filtering techniques, it would be beneficial in various ways. A
huge plus is that companies with online shopping facilities
would benefit from ratings provided on relatively new items
made by possibly new vendors, especially since some such
items would be imperative during this pandemic, e.g. face
masks. While N95 masks have been recommended by health
organizations, it is found that they are now more prevalent
among medical professionals mainly due to the scarcity of these
masks. Instead, washable cloth masks and disposable surgical
masks are more in use by the general public. Also, since N95
masks are not comfortable to wear for too long, it is observed
that almost equally compatible KN95 masks are more common
and are recommended for longtime daily use due to their safety,
comfort and availability. In fact, KN95 masks come with
descriptions such as “perfect for office”. They are indeed
particularly good for offices and also for all-day use in homes,
especially where there are senior citizens etc. Likewise, the
correlations between such pertinent items as captured through
item-based collaborative filtering are crucial here, especially
with respect to the brands of specific products and their
relationships that would provide guidance in online marketing.
Customers would be delighted that they do not have to spend
too much time on browsing and searching, and would thus be
likely to buy more than one item if similar ones are displayed
via the results of item-based collaborative filtering. For
example, if customers are shopping for “N95 masks” and the
recommendations display “KN95 masks”, the customers would
see highly relevant and useful products and buy them. Also,
they would be inclined to revisit the concerned site due to liking
item-based recommendations therein, thus possibly buying
more items and bringing the company more revenue.

2) Customer: If a customer would come across a
recommender system pertinent to Covid-19 that uses item-based
collaborative filtering, the customer would appreciate the
personalization. Nowadays people prefer to have personalized
recommendations rather than general ones. For example, if the
customer is a healthcare professional (or is shopping on behalf
of one through a hospital), it would be beneficial to get ratings
on items such as full PPE (personal protective equipment) and
different brands of gloves from other healthcare professionals
who have bought similar items online. Having similar items
recommended based on user ratings would imply that the
recommendations would be geared towards that customer’s
likings. By having such recommendations, a customer would
easily be able to compare and browse more such products
without having to search again. Overall, the customer would
enjoy the shopping experience and would also save valuable
time, much needed during this pandemic and its aftermath.

VI. CONCLUSIONS AND FUTURE WORK

This paper constitutes exploratory research that applies and
demonstrates intelligent data mining via the techniques of
association rules and item-based collaborative filtering
employed within an approach for a baseline recommender
system in e-commerce. This focuses on the online shopping of
cellphones and accessories and uses the Amazon review data
(2018) as its dataset for exploratory study. The association rule
mining technique through the Apriori algorithm allows for the
discovery of patterns of products frequently bought together by
scrutinizing the purchase history of users. The item-based
collaborative filtering technique suggests highly correlated
products, based on a given product of interest, calculated herein
from users’ ratings of similar / related products. The baseline
recommender system built in this work offers interesting
suggestions to users considering the outputs of association rules
and item-based collaborative filtering. This system is quite
seamlessly scalable based on the manner in which techniques
have been adapted and implemented therein, hence the
respective techniques built into this approach can be feasibly
deployed on other larger datasets with suitable modifications.

Some future work includes experimenting with sentiment
analysis for opinion mining of users’ reactions to products,
which entails the polarity classification of emotions in text.
Since the dataset used in this work includes written reviews, the
classifications of the attitudes towards the products can
potentially be useful to improve recommendations. The work in
this paper sets the stage for conducting experiments with data
relevant to the recent Covid-19 pandemic, e.g. products such as
face masks, hand sanitizers, antibacterial wipes and others. The
baseline recommender system implemented herein using
intelligent data mining via the techniques of association rules
and item-based collaborative filtering can be potentially helpful
in analyzing Covid-19 related data as well. Additionally, the
accuracy of the recommender systems in this paper will be
evaluated when we extend this work to Covid-19 related
datasets. We may also explore advanced machine learning
techniques as needed to improve the accuracy.

Given this research, we aim to build an app that would
potentially serve as a recommender system for e-commerce
data based on such products. We would incorporate aspects
from HCI (Human Computer Interaction) and IoT (Internet of
Things) in implementing the proposed app. Studies on accuracy
and effectiveness would be conducted therein since this work
would entail full-fledged app development that would go
beyond our simple baseline recommender in this paper. Some
of this further work could be in line with our earlier work on
apps [13, 14, 23] where detailed user studies have been
conducted and on other topics [24, 25] where case studies and
surveys have been presented.

On the whole, this paper presents exploratory research on
intelligent data mining techniques to enhance decision support
in e-commerce, provides a simple baseline recommender
system scalable to larger datasets, paves the way for further
work especially relevant to Covid-19, and fosters suitable app
development on recommender systems. The work in this paper
broadly fits the characteristics of smart economy and smart
living in the overall paradigm of smart cities. The research
conducted here is supplementary to our other pertinent work on
smart city aspects such as smart governance with ordinance
mining [26], smart environment with green computing [27],
smart people with development of software tools supporting
21st century education and learning [28], and smart mobility
with peripheral contributions to autonomous vehicles [29] and
significant ones to object detection [30]. We anticipate that
further research pertinent to Covid-19 related products and
recommender app development would be significantly
beneficial to decision support in e-commerce and would also
make broader impacts on smart cities.

ACKNOWLEDGMENTS

This work is partly supported by a Graduate Assistantship
and a Student Scholarship for Jessica Lourenco, as well as a
Faculty Scholarship Program and a Doctoral Faculty Program
for Aparna Varde, all from internal sources at Montclair State
University. There is also partial support through an internal
grant (SGPD) and an external grant (MRI) in an area relevant
to this project. We thank all our funding sources.

REFERENCES

[1] S. Tareq, M. Noor and C. Bepery, “Framework of dynamic
recommendation system for e-shopping”, International Journal of
Information Technology, Vol. 12, 2020, pp. 135-140.
[2] J. Han, M. Kamber, and J. Pei, “Data Mining: Concepts and Techniques”,
2012, pp. 8.
[3] J. B. Schafer, “The application of data-mining to recommender systems”,
Encyclopedia of Data Warehousing and Mining, 2005, pp. 44-48.
[4] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative
filtering recommendation algorithms”, The WWW Conference, 2001, pp.
285-295.
[5] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin, “A survey of sentiment
analysis in social media”, Knowledge and Information Systems (KAIS)
Journal, 2019, Vol. 60, pp. 617-663.
[6] F. Abbasi, A. Khadivar, and M. Yazdinejad, “A grouping hotel
recommender system based on deep learning and sentiment analysis”,
Journal of Information Technology Management, 2019, 11(2), pp. 59–78.
[7] K. Gandhe, A. Varde, and X. Du, “Sentiment Analysis of Twitter Data
with Hybrid Learning for Recommender Applications”, IEEE UEMCON
Conference, 2018, pp. 57-63.
[8] P. Lin, P. Moonwy, C. Schoenick, S. Kohlmeier, D. Rishi, T. Bozsolik,
B. Hamner, “COVID-19 Open Research Dataset Challenge (CORD-19)”,
Allen Institute for Artificial Intelligence,
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge, 2020.
[9] English Corpora, “The Coronavirus Corpus”,
https://www.english-corpora.org/corona, 2020.
[10] B. Anam, S. Akhter, A. Hassan Qurashi, M. Shaheen, and U. Malaysia,
“Coronavirus affects e-commerce globally”, Journal of Xi’an Shiyou
University, 2020, ISSN No: 1673-064X.
[11] K. Boulos, and E.M. Geraghty, “Geographical tracking and mapping of
coronavirus disease COVID-19”, International Journal of Health
Geographics, Springer Nature Journal, 2020, Vol. 19, No. 8, DOI:
10.1186/s12942-020-00202-8.
[12] BMJ Learning, “Coronavirus disease 2019 (COVID 2019): Supporting
online courses”, https://new-learning.bmj.com/covid-19, 2020.
[13] D. Pathak, A. Varde, C. Alo, and F. Oteng, “Ubiquitous Access for
Local Water Management through HCI Based App Development”, IEEE
UEMCON Conference, 2020, pp. 227-233.
[14] C. Varghese, A. Varde, and X. Du, “An Ordinance-Tweet Mining App
to Disseminate Urban Policy Knowledge for Smart Governance”, I3E
2020 Conference, Vol. 2, pp. 389-401.
[15] J. Ni, J. Li, J. McAuley, “Justifying recommendations using
distantly-labeled reviews and fine-grained aspects”, EMNLP-IJCNLP
Conference, 2019, pp. 188-197.
[16] T. Kluyver, B. Ragan-Kelley, F. Perez, B. Granger, M. Bussonnier, J.
Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila,
S. Abdalla, C. Willing, Jupyter Development team, “Jupyter notebooks –
a publishing format for reproducible computational workflows”,
Positioning and Power in Academic Publishing: Players, Agents and
Agendas, 2016, pp. 87-90.
[17] R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association
Rules”, VLDB Conference, 1994, pp. 487-499.
[18] F. Xue, X. He, X. Wang, J. Xu, K. Lu, and R. Hong, “Deep Item-based
Collaborative Filtering for top-N Recommendation”, ACM Transactions
on Information Systems, 2019, Article no. 33, doi.org/10.1145/3314578.
[19] L. Candillier, F. Meyer, and M. Boullé, “Comparing State-Of-The-Art
Collaborative Filtering Systems”, MLDM 2007, pp. 548-562.
[20] K. Hammond and A. Varde, “Cloud Based Predictive Analytics: Text
Classification, Recommender Systems and Decision Support”, IEEE
ICDM Conference (workshops), 2013, pp. 607-612.
[21] J. B. Schafer, J. Konstan, J. Riedl, “Recommender Systems in
E-Commerce”, ACM E-COMMERCE Conference, 1999, pp. 158-166.
[22] J. Lu, D. Wu, M. Mao, W. Wang, and G. Zhang, “Recommender System
Application Developments: A Survey”, Decision Support Systems
Journal, 2015, Vol. 74, pp. 12-32.
[23] D. Karthikeyan, S. Shah, and A. Varde, “Interactive Visualization and
App Development for Precipitation Data in Sub-Saharan Africa”, IEEE
IEMTRONICS conference, 2020, pp. 302-308.
[24] S. Chandra, A. Varde, and J. Wang, “A Hive and SQL Case Study in
Cloud Data Analytics”, IEEE UEMCON conference, 2019, pp. 112-118.
[25] P. Basavaraju, and A. Varde, “Supervised Learning Techniques in Mobile
Device Apps for Androids”, ACM SIGKDD Explorations Journal, 2016,
Vol. 18, No. 2, pp. 18-29.
[26] M. Puri, A. Varde, X. Du, and G. de Melo, “Smart Governance Through
Opinion Mining of Public Reactions on Ordinances”, IEEE ICTAI
Conference, 2018, pp. 838-845.
[27] M. Pawlish, and A. Varde, “The DevOps Paradigm with Cloud Data
Analytics for Green Business Applications”, ACM SIGKDD
Explorations Journal, 2018, Vol. 20, No. 1, pp. 51-59.
[28] A. Varghese, A. Varde, J. Peng, and E. Fitzpatrick, “A Framework for
Collocation Error Correction in Web Pages and Text Documents”, ACM
SIGKDD Explorations Journal, 2015, Vol. 17, No. 1, pp. 14-23.
[29] P. Persaud, A. Varde, and S. Robila, “Enhancing Autonomous Vehicles
with Commonsense: Smart Mobility in Smart Cities”, IEEE ICTAI
Conference, 2017, pp. 1008-1012.
[30] A. Garg, N. Tandon, and A. Varde, “I Am Guessing You Can’t Recognize
This: Generating Adversarial Images for Object Detection Using Spatial
Commonsense”, AAAI Conference, 2020, pp. 13789-13790.
