Learning Techniques
Radha Guha
CSE Department
SRM University AP, Amaravati
Abstract: Over the last decade, deep learning has delivered remarkable performance improvements in the
fields of computer vision and natural language processing (NLP). Recent advances in NLP have enabled
computers to interpret ambiguous human language reasonably well. This paper explores and validates the benefit
of deep learning techniques in book recommender system design. Because every book is large in content,
content-based filtering for recommendation can benefit from NLP’s word embedding technique, which captures
word context, semantics, and word dependency better and also reduces dimensionality. Subsequent language
models built on the attention-based transformer architecture decipher word and sentence meaning better by
considering a larger context. Content-based filtering computes nearest-neighbor recommendations, and it
benefits because the cosine similarity of one book to another can now be computed more efficiently. A second
method used for recommender design is collaborative filtering, which analyzes users’ past item preferences and
computes user-to-user and item-to-item similarity. Deep learning techniques capture non-trivial, non-linear
user-item interactions better than traditional matrix factorization algorithms. Deep learning trains its models
on huge amounts of data using a parallel processing architecture. Multi-core CPUs, GPUs and TPUs support this
architecture so that big data can be handled and the complex hierarchy of user-item interactions captured. The
contribution of this paper is to explain recommendation system design aspects and deep learning technology,
and to compare deep learning with traditional machine learning techniques by designing a book recommendation
system.
Usually, companies keep records of their items, customers, and sales. But simply recommending the best-selling
items from the sales database ignores a customer’s personal taste and will not work most of the time. In
contrast, a RecSys tries to personalize product offerings by online data mining: analyzing customers’ product
review comments, explicit ratings of an item, or an individual customer’s implicit liking for an item.
Companies capture a user’s taste explicitly from customer feedback on an item, or implicitly from the user’s
browsing history, click history, how many times they view an item, etc. Explicit ratings can be on a scale of
1 to 5, or just 1 (like) or 0 (dislike), as collected on YouTube with the thumbs-up and thumbs-down buttons.
Explicit ratings are rare and hard to collect because many users do not bother to vote. Recommender system
design therefore depends more on a user’s implicit liking for an item, which businesses gather automatically.
Implicit ratings are abundant, can be captured in real time, and are more appropriate for recommender system
design. The only drawback of implicit ratings is that no negative feedback can be recorded. Many websites use
cookies: small blocks of data placed on the user’s computer to remember the user and to capture details such
as location, age, time of day, how many times the user has visited the page, and which products he has browsed.
For example, in online shopping, if a user puts items in a shopping cart but leaves the session without buying
them, and still finds the items in the cart on the next visit, that is a great help to the shopper. When a
user is known beforehand, more personalized content can be delivered conveniently to each user. Users’
purchase, browsing and click histories are assigned numerical values and treated as implicit ratings. Cookies
can sometimes breach a user’s privacy and security, as they may gather users’ information without their
consent.
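As a minimal sketch of how purchase, browsing and click history could be turned into numerical implicit ratings, the Python snippet below sums a weight per event type for each user-item pair. The event names and the weights are hypothetical, chosen only for illustration; they are not taken from this paper.

```python
from collections import defaultdict

# Hypothetical weights per event type (an assumption for illustration only).
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "cart": 3.0, "purchase": 5.0}


def implicit_ratings(events):
    """Sum event weights per (user, item) pair.

    events: iterable of (user_id, item_id, event_type) tuples.
    Unknown event types contribute nothing to the score.
    """
    scores = defaultdict(float)
    for user, item, kind in events:
        scores[(user, item)] += EVENT_WEIGHTS.get(kind, 0.0)
    return dict(scores)


log = [
    ("u1", "b1", "view"),
    ("u1", "b1", "click"),
    ("u1", "b2", "purchase"),
    ("u2", "b1", "view"),
]
ratings = implicit_ratings(log)
```

Pairs that never appear in the log simply have no entry, which mirrors the point above that implicit data carries no negative feedback.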
From the historical data, the RecSys engine estimates or predicts how much a user will like an unseen item.
Amazon recommends books by content similarity and based on the user’s past book preferences. Usually, three
tables of information are created for any RecSys: (i) an item profile table with book ISBN, book title, book
author, book description, etc. (Table 1 for the book recommendation system); (ii) a user profile table
containing user id, age, and location information (Table 2); and (iii) an item rating table with user id, item
id, and the rating given by each user (Table 3). From this information the RecSys can compute many derived
quantities, such as the number of ratings per item, the average rating per item, the user age distribution, the
location distribution, and the rating distribution. Finally, the RecSys engine estimates the likeness of each
user(i) and item(j) pair, i.e., $\hat{L}(U_i, I_j)$, for each user and each unseen item in the system. This
measure is just an estimate and should be as close as possible to the actual likeness measure $L(U_i, I_j)$.
The likeness function can be Boolean, i.e., the user either likes an item or does not. Or it can be measured on
a scale of 1 to 5, where 5 is the maximum likeness and 1 the minimum. After evaluating the likeness function,
the RecSys recommends the top-N items to a user.
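The final step, picking the top-N unseen items from the estimated likeness values, can be sketched in a few lines of Python. The function name, the item ids and the scores below are made up for illustration; the estimates would come from whichever model the recommender uses.

```python
def top_n(estimates, seen, n=3):
    """Return the n unseen items with the highest estimated likeness.

    estimates: dict mapping item_id -> estimated likeness for one user.
    seen: set of item_ids the user has already rated or read.
    """
    unseen = [(item, score) for item, score in estimates.items() if item not in seen]
    # Highest score first; ties are broken by item id so the result is deterministic.
    unseen.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in unseen[:n]]


estimates = {"b1": 4.6, "b2": 3.1, "b3": 4.9, "b4": 2.0, "b5": 4.2}
recommended = top_n(estimates, seen={"b3"}, n=2)
```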
Figure 1: Illustration of Content Based Filtering (CBF) vs. Collaborative Filtering (CF)
RecSys computes likeness measures for items by two basic approaches: Content-Based Filtering (CBF) or
Collaborative Filtering (CF) (Figure 1) [4], [5], [6], [12], [13]. In both approaches, similarity between
items or between users is the key. In CBF, if user-A has read a book, then similar books will be recommended
to him. In general, item features such as the color and size of a garment, a movie’s genre, actors and theme,
or a book’s content are used to find item-to-item similarity. This is called content-based recommendation.
Collaborative filtering is of two types: item collaborative filtering and user collaborative filtering. Apart
from content, two items can be judged similar if several users rate them equally. User-to-user similarity can
be found from the users’ history of purchasing common items; this approach is used for user collaborative
filtering. If user-A and user-B have purchased several common items, then they are similar, and any item
purchased by user-A but not by user-B can be recommended to user-B, and vice versa.
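The user collaborative filtering idea can be illustrated with a small Python sketch. It measures user-to-user similarity by the Jaccard overlap of purchase sets, which is one simple choice assumed here for illustration (the text above does not prescribe a particular similarity measure), and recommends what the most similar user bought that the target user has not.

```python
def jaccard(a, b):
    """Overlap of two item sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbor(purchases, user):
    """Recommend items bought by the most similar other user but not by `user`."""
    others = [u for u in purchases if u != user]
    if not others:
        return []
    nearest = max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
    return sorted(purchases[nearest] - purchases[user])


# Toy purchase histories (made up for illustration).
purchases = {
    "A": {"b1", "b2", "b3"},
    "B": {"b1", "b2", "b4"},
    "C": {"b5"},
}
suggestions = recommend_from_neighbor(purchases, "A")
```

Here user B shares two items with user A, so B is the nearest neighbor and B's remaining item is suggested to A.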
Both CBF and CF suffer from some problems. CBF may recommend homogeneous, boring items instead of surprising
users with new items. CF suffers from the cold start problem: when a user is new to the system, little is known
about his behavior or taste, and when an item is new in the database, it has not yet been purchased by many
users. A hybrid method tries to combine CBF and CF in the best possible way for more complex but better
recommendations. When users’ context information such as age, location, or income is used to recommend items
to them, it is called context-based recommendation. Context information can be combined with CBF and CF
recommenders as a remedy for the cold start problem and for better recommender performance.
In this paper, book recommendation system design challenges and opportunities are explored. Reading a book
usually takes much longer than watching a movie. If a good book is recommended to a user, he will cherish
reading it and remember the story much longer. Comparing the content of the MovieLens dataset [14] and the
Book-Crossing dataset [15], we find that the MovieLens dataset has less text data to process. MovieLens data
keeps the date of release, genre, actors’ names, a short summary of the movie theme, average movie rating,
movie reviews along with their sentiment polarity, etc. Book-Crossing data keeps the book title, author’s name,
year of publication, ISBN, publisher, genre, book reviews, etc. But unlike the movie dataset, it keeps no
summary of the book’s storyline. Thus, for content-based similarity determination the whole book content needs
to be processed. First, one must scrape book data from online sites like Project Gutenberg, where many e-books
are available for free reading. Because of the huge amount of text data, efficient processing and
dimensionality reduction of unstructured text are major requirements for a book recommendation system, and
both can be achieved more efficiently by deep learning techniques.
As pointed out before, the recent breakthrough of NLP’s word embedding technique captures word context,
semantics, word-to-word similarity, and dependency, while reducing the dimensionality of the word
representation itself. Two algorithms for word embeddings are Word2Vec [16] and GloVe [17], introduced in
2013 at Google and in 2014 at Stanford University, respectively. With the smallest token of text, i.e., a word
or a term, represented more efficiently, all other NLP tasks such as document-to-document similarity
computation, document summarization, automatic text generation, and language translation have performed
better in recent times. Building on these pre-computed base word embeddings, the attention-based Transformer
architecture, introduced in 2017 [18] also by Google, captures word context and dependency over a much
longer window of 3072 terms. Self-attention, multi-head attention and positional embeddings now generate
contextualized embeddings from the transformer architecture [19], [20], [21]. As the saying goes, a word is
known by the company it keeps.
Deep learning neural networks use multiple hidden layers to extract hierarchical information from the input
data. This parallel processing architecture can process a huge number of word vectors efficiently for any
non-linear mapping of input to output. DL also eliminates the manual feature extraction subtask of traditional
machine learning. DL can combine heterogeneous content such as text, images, and audio in the same model more
efficiently. DL algorithms also work better when a lot of training and testing data is available. Neural
computing, and deep learning in particular, needs a huge number of matrix multiplications and additions.
Python libraries such as PyTorch map these computations onto powerful GPUs and onto AI accelerators known as
tensor processing units (TPUs) for efficient parallel processing.
A book RecSys is thus well suited to deep learning. The limited metadata available in the book detail table,
such as book title, author, year of publication, and publisher, is not enough to capture the storyline of a
book. Book topic modeling can be done more accurately with deep learning techniques, as explained in Section
II. Big companies have had remarkable success in improving their recommender system design using deep learning
[21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32]. The difference between big companies
and academic research on deep learning is that big companies can afford the computing power of high-end GPUs
and TPUs. Academic research using massively parallel processing on GPUs and TPUs is not always possible. So,
in academia, deep learning techniques have not been much explored for the recent business need for good
recommendation system design. Another hindrance to deep learning research is that deep learning acts as a
black box, and explaining its output is a challenging task. The motivation for this paper is the pervasiveness
of deep learning techniques in many computation domains such as face recognition, speech recognition, and
machine translation, as well as recommender system design. The purpose of this paper is to take a new
perspective on book recommender design with deep learning techniques, which is possible even on a single
laptop computer with a limited amount of data. Our book recommender system can be used for exploring novels,
textbooks, research articles and news media.
The remainder of this paper is organized as follows. Section II introduces the Book-Crossing dataset and
points out that data preprocessing and data wrangling are important prerequisites for data analytics. Section
III first illustrates the content-based filtering and collaborative filtering methods used in recommender
system design. Then the problems of implementing them through traditional machine learning algorithms such as
nearest neighbor calculation and matrix factorization are showcased. Then deep learning is surveyed as a way
of reducing the shortcomings of traditional methods. Section IV presents several experimental results with
deep learning and introduces the performance metrics that can be used for recommender system evaluation.
Section V concludes with the need for more open research in academia.
Dimensionality reduction of the original user-item interaction matrix is called model-based collaborative
filtering. This matrix factorization technique does a decent job in collaborative filtering recommendation.
But it is a linear model and cannot capture more complex, non-linear user-item interaction relationships.
The reconstructed matrix approximates the original rating matrix, and its prediction accuracy can be evaluated
by the root mean square error (RMSE) or mean absolute error (MAE) metrics, among many others [33].
Considering the true rating $r_{ij}$ and the model’s predicted rating $y_{ij}$, RMSE is given by the formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (r_{ij} - y_{ij})^2}$$
And MAE is given by the formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \lvert r_{ij} - y_{ij} \rvert$$
For a good prediction RMSE and MAE should be as
small as possible. The Book-Crossing rating matrix is factorized with SVD, and the RMSE and MAE errors are
reported in Figure 2. For model evaluation, the data is split into training and testing sets in an 80:20
ratio. To ensure reliability of model evaluation, five-fold cross validation is done, where the dataset is
split into five fractions and each time one fraction is used as the test set and the other four are used for
training the model. The maximum RMSE is 0.9391, with an MAE of 0.7375, for Fold 5. Model fit time and test time
were also the lowest for Fold 5. RMSE is never smaller than MAE, as it penalizes larger errors more. MAE is the
preferred metric for the average prediction error, whereas RMSE is the better metric for exposing outliers or
bad predictions. Python’s Surprise is an open-source library for building collaborative filtering recommender
systems. Python’s Pandas is used for loading data.
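To make the two error metrics concrete, here is a small standard-library Python sketch that computes RMSE and MAE for one list of true ratings and one list of predictions. The rating values are invented for illustration and are unrelated to the Book-Crossing results reported above.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((r - y) ** 2 for r, y in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(r - y) for r, y in zip(actual, predicted)) / n


# Toy ratings on a 1-to-5 scale (made up for illustration).
true_ratings = [5, 3, 4, 1]
predicted_ratings = [4, 3, 2, 1]

print("RMSE:", rmse(true_ratings, predicted_ratings))
print("MAE:", mae(true_ratings, predicted_ratings))
```

The single error of 2 dominates the RMSE, while the MAE weighs it linearly, which is why the RMSE comes out larger on this example.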
A deep learning architecture named the autoencoder can generalize matrix factorization and does a better job
of predicting user-item interaction relationships; it is explored and validated in Section III.
Figure 4 shows Word2Vec’s shallow neural network architecture for word embedding, with only one input layer,
one hidden layer and one output layer. This word embedding algorithm, introduced by Mikolov et al. at Google,
has two models, CBOW (continuous bag of words) and Skip-Gram, to capture the word ordering and semantics of a
text. In the CBOW model, the context words (a window of 5 to 10 terms or grams surrounding a target word)
predict the target word, whereas in the Skip-Gram model the target word predicts all the context words
(N-grams). In the input layer the words are represented by sparse one-hot encoding, but in the hidden layer
each word is projected to a dense low-dimensional vector of 100 to 300 dimensions. CBOW and Skip-Gram are
trained to optimize their predictions by back-propagating the error at the output layer and adjusting the
weight matrix with the stochastic gradient descent (SGD) algorithm. This breakthrough in word representation
is improving all other NLP tasks such as finding document-to-document similarity, document classification,
clustering, sentiment analysis, text summarization, language translation, question answering, contextual
advertising, and automatic text generation.
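The difference between the two models lies in how training examples are read from the same sliding window. The Python sketch below builds (target, context words) examples with a symmetric window and shows the sparse one-hot input encoding. It only prepares the data; it does not train a network. The window size of 2 and the sample sentence are chosen only to keep the illustration short.

```python
def training_pairs(tokens, window=2):
    """Build (target, context_words) examples with a symmetric window.

    CBOW would predict `target` from `context_words`; Skip-Gram would
    predict each word in `context_words` from `target`.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((target, left + right))
    return pairs


def one_hot(word, vocab):
    """Sparse input encoding: 1 at the word's index in `vocab`, 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]


tokens = "call me ishmael some years ago".split()
pairs = training_pairs(tokens, window=2)
vocab = sorted(set(tokens))

print(pairs[2])
print(one_hot("call", vocab))
```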
Figure 5: Word Embedding Vector Space: Similar Words Cluster Together
In neural word embedding techniques, similar words are represented by vectors that are close in vector space,
as can be seen in Figure 5. The word vectors of a few of the most frequently used words from the novel “Moby
Dick, or the Whale”, viz. ‘whale’, ‘ship’, ‘sea’, ‘man’ and ‘eye’, are used to find their similar words, i.e.,
words that are used in the same context. Each word is represented by a 100-dimensional vector, which needs to
be projected to 2D or 3D for visualization. A two-dimensional visualization of the word vectors by the PCA or
t-SNE algorithm shows similar words clustering together, as they have close vector representations. In Figure
5, similar context words for ‘whale’ are whales, fish, humpback, dolphin, shark, etc., and similar words for
‘ship’ are ships, boats, vessels, cargo, etc.
With this capability of giving close vector representations to words used in the same context,
document-to-document similarity measurement becomes more accurate, and so does content-based filtering. In
fact, adding the word vectors of a sentence and averaging them gives a sentence vector, and adding the sentence
vectors of a document and averaging them gives a document vector. The cosine similarity between document
vectors is then computed, and from this score the k-nearest neighbors of a particular book are found and
recommended.
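The averaging and nearest-neighbor steps can be sketched in plain Python as follows. The three-dimensional word vectors and the book titles are toy values invented for illustration; real embeddings would have 100 to 300 dimensions and come from a trained model.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_books(query, doc_vectors, k=2):
    """Titles of the k books whose document vectors are closest to `query`."""
    scores = [(title, cosine(doc_vectors[query], vec))
              for title, vec in doc_vectors.items() if title != query]
    scores.sort(key=lambda pair: -pair[1])
    return [title for title, _ in scores[:k]]


# Toy word vectors (made up for illustration).
word_vectors = {
    "whale": [0.9, 0.1, 0.0], "ship": [0.8, 0.2, 0.1], "sea": [0.7, 0.3, 0.0],
    "garden": [0.0, 0.9, 0.2], "flower": [0.1, 0.8, 0.3],
}
# Each book is a list of sentences; each sentence is a list of words.
books = {
    "book_sea_1": [["whale", "ship"], ["sea"]],
    "book_sea_2": [["ship", "sea"]],
    "book_garden": [["garden"], ["flower"]],
}
# Word vectors -> sentence vectors -> document vector, averaging at each level.
doc_vectors = {
    title: mean_vector([mean_vector([word_vectors[w] for w in sentence])
                        for sentence in sentences])
    for title, sentences in books.items()
}
neighbors = nearest_books("book_sea_1", doc_vectors, k=1)
```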
Training a word embedding model requires a huge dataset and is computationally expensive. Many pre-trained
word embeddings are freely available, trained on huge corpora such as Wikipedia or Google News. We could also
train our own Word2Vec or GloVe models on Goodreads data or on a corpus created from Project Gutenberg’s
eBooks. Word2Vec and GloVe embeddings can be used from Python through packages such as NLTK and Gensim.
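Decoding function: $\psi : Z \rightarrow X,\quad z \mapsto \psi(z) = \sigma(\hat{W} z + b) := \hat{x}$ ….. Equation 2

$$L(x, \hat{x}) = \frac{1}{\sum_{i=1}^{n} m_i} \sum_{i=1}^{n} m_i \lVert x_i - \hat{x}_i \rVert^2 = \frac{1}{\sum_{i=1}^{n} m_i} \sum_{i=1}^{n} m_i \lVert x_i - \sigma(\hat{W} z_i + b) \rVert^2$$ ….. Equation 3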
The autoencoder neural network is trained over several epochs to minimize the difference between the input and
the reconstructed input at the output layer, as given by the loss function $L(x, \hat{x})$ in Equation 3. Here
$x_i$ is the actual rating for the i-th item and $\hat{x}_i$ is the predicted rating. The input is a sparse
vector of user ratings or item ratings for already seen items, but at the output layer a dense vector is
produced in which the blank values of the input vector are replaced by real numbers: the estimated ratings for
unseen items. This is possible because the loss function is a masked mean square error (MMSE), where $m_i$ is
the mask value, which is 0 for items with no rating and 1 for items with a rating. The loss function is
minimized using the Adam optimizer, a widely used adaptive gradient-based optimizer.
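The role of the mask is easiest to see in code. The following Python sketch computes the masked mean square error for one rating vector, treating each $x_i$ as a scalar rating. It shows only the loss computation, not the autoencoder or its training loop, and the rating values and the reconstruction are made up for illustration.

```python
def masked_mse(x, x_hat, mask):
    """Masked mean square error: only positions with mask 1 contribute.

    x: input ratings, x_hat: reconstructed ratings, mask: 1 where a rating
    exists and 0 where it is missing. Returns 0.0 if nothing is rated.
    """
    observed = sum(mask)
    if observed == 0:
        return 0.0
    return sum(m * (xi - xh) ** 2 for xi, xh, m in zip(x, x_hat, mask)) / observed


ratings = [5.0, 0.0, 3.0, 0.0]         # 0.0 is a placeholder for "no rating"
mask = [1, 0, 1, 0]                    # 1 = rated, 0 = not rated
reconstruction = [4.0, 2.5, 3.0, 1.5]  # dense output of the decoder (toy values)

loss = masked_mse(ratings, reconstruction, mask)
```

The reconstructed values 2.5 and 1.5 at the unrated positions do not affect the loss, so the network is free to fill them with its estimates for unseen items.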
One of the earliest models of the modern deep learning approach to recommender design is presented here.
Many other variants of the autoencoder have evolved over time. TensorFlow, Keras and PyTorch are some of the
deep learning libraries in Python that can be used for recommender system design.
A recommender system can be evaluated with many metrics, viz. recall, precision, RMSE, mean reciprocal rank
(MRR), mean average precision at k (MAP@k), etc. The experiments performed in this paper show the superiority
of deep learning approaches over traditional matrix factorization techniques. In this paper, the flexibility
of deep learning in recommender system design is validated.
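For readers unfamiliar with the ranking metrics named above, the following Python sketch computes precision at k, recall at k and mean reciprocal rank for ranked recommendation lists. The recommendation lists and the sets of relevant items are invented for illustration and are unrelated to the experiments of this paper.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(recommendation_lists, relevant_sets):
    """Average reciprocal rank over several users."""
    ranks = [reciprocal_rank(rec, rel)
             for rec, rel in zip(recommendation_lists, relevant_sets)]
    return sum(ranks) / len(ranks)


# Toy data for two users (made up for illustration).
recs_u1, rel_u1 = ["b1", "b2", "b3", "b4"], {"b2", "b4", "b9"}
recs_u2, rel_u2 = ["b7", "b8"], {"b7"}

p2 = precision_at_k(recs_u1, rel_u1, k=2)
r4 = recall_at_k(recs_u1, rel_u1, k=4)
mrr = mean_reciprocal_rank([recs_u1, recs_u2], [rel_u1, rel_u2])
```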
In this paper a hybrid recommender is designed with a weighted score of CBF and CF, both using deep learning
neural networks. Figure 10 shows the Top-6 books recommended for a user who has read Herman Melville’s “Moby
Dick, or the Whale.” My future research will extend the recommender design framework to integrate sentiment
analysis of book reviews collected from social networking sites.
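A weighted hybrid score can be sketched as a linear blend of the two recommenders' scores. The Python snippet below is only an illustration under stated assumptions: both scores are taken to lie on the same 0-to-1 scale, the weight of 0.5 is arbitrary, and the function name and scores are hypothetical rather than the weighting used in this paper.

```python
def hybrid_scores(cbf, cf, weight_cbf=0.5):
    """Blend CBF and CF scores; an item missing from one source scores 0 there.

    cbf, cf: dicts mapping item_id -> score, assumed to share the same scale.
    weight_cbf: weight of the CBF score; the CF score gets 1 - weight_cbf.
    """
    items = set(cbf) | set(cf)
    return {
        item: weight_cbf * cbf.get(item, 0.0) + (1 - weight_cbf) * cf.get(item, 0.0)
        for item in items
    }


# Toy scores (made up for illustration).
cbf_scores = {"b1": 0.9, "b2": 0.4}
cf_scores = {"b2": 0.8, "b3": 0.6}

blended = hybrid_scores(cbf_scores, cf_scores, weight_cbf=0.5)
ranked = sorted(blended, key=lambda item: -blended[item])
```

An item that both recommenders score moderately well, such as b2 here, can outrank an item that only one of them favors.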
V. Conclusions
In today’s big data era, the use of recommender systems in e-commerce businesses will be ubiquitous. At the
same time, deep learning techniques are becoming the de facto standard for any kind of learning from data.
This paper contributes to the understanding of book recommender system design challenges and opportunities,
and of how advances in deep learning and NLP can be exploited to improve book recommender performance. DL
can reduce the cold start and sparse matrix problems of traditional machine learning algorithms for
recommender design. DL can capture non-trivial, non-linear user-item preference relations even from noisy
training data. In deep learning there is no need for a domain expert or manual feature engineering. All the
benefits of DL highlighted here are validated in our implementation of a book recommender design with a deep
autoencoder neural network. DL is very data hungry and compute intensive, needing the parallel processing
power of GPUs and TPUs. Thus, this field is not extensively researched by individual academic researchers. But
we can see that the computer vision and natural language processing fields have improved their accuracy
significantly by using deep learning. So, there is great potential for deep learning to improve the accuracy of
recommender system engines also. An understanding of deep learning will encourage readers to apply this
technique in designing book, research paper, news article and course recommenders. This research paper
motivates academicians by providing a new perspective on the benefits of deep learning architecture for
recommender system design.
References:
1. Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. Introduction to
information retrieval. Cambridge university press Cambridge, 2008.
2. D. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet Allocation. In Journal of Machine
Learning Research, 3, 993-1022.
3. Radha Guha, 2020. Exploring Information Retrieval by Latent Semantic and Latent Dirichlet
Allocation Techniques. International Research Journal of Computer Science, Vol. 7, Issue 5.
4. Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. IEEE transactions on knowledge and
data engineering 17, 6 (2005), 734–749
5. Pasquale Lops, Marco Degemmis, and Giovanni Semeraro. Content-based recommender
systems: State of the art and trends. In Recommender Systems Handbook, 2011.
6. Yehuda Koren and Robert Bell. Advances in collaborative filtering. In Recommender systems
handbook, pages 77–118. Springer, 2015.
7. Ben Shneiderman 1997. Direct Manipulation for Comprehensible, Predictable, and
Controllable User Interfaces. Proceedings of IUI97, 1997 International Conference on
Intelligent User Interfaces, Orlando, FL, January 6-9, 1997, 33-39.
8. Christopher Avery, Paul Resnick, and Richard Zeckhauser 1999. The Market for Evaluations.
American Economic Review 89(3): pp 564-584.
9. J. Ben Schafer, Joseph Konstan, John Riedl. Recommender Systems in E-commerce. EC '99: Proceedings
of the 1st ACM Conference on Electronic Commerce, November 1999, Pages 158–66.
10. Dietmar Jannach et al. Measuring the Business Value of Recommender Systems.
arXiv:1908.08328v3 [cs.IR], Dec 2019.
11. Carlos A Gomez-Uribe and Neil Hunt. The Netflix recommender system: Algorithms, business
value, and innovation. ACM Transactions on Management Information Systems (TMIS),
6(4):13, 2016.
12. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems.
Computer 8, 30–37 (2009)
13. Haruna K et.al. A Collaborative Approach for Research Paper Recommender System. PLoS
ONE 12(10): e0184516, 2017. https://doi.org/10.1371/journal.pone.0184516.
14. Maxwell Harper and Joseph A. Konstan. The MovieLens Datasets: History and Context. ACM
Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19, December 2015,
DOI=http://dx.doi.org/10.1145/2827872.
15. N. Kurmashov, K. Latuta and A. Nussipbekov, Online Book Recommendation System. Twelve
International Conference on Electronics Computer and Computation (ICECCO), 2015, pp. 1-4, doi:
10.1109/ICECCO.2015.7416895.
16. T. Mikolov et al., 2013. Distributed Representations of Words and Phrases and Their
Compositionality [C]. Advances in Neural Information Processing Systems. 3111-3119, 2013.
17. J. Pennington et al. GloVe: Global Vectors for Word Representation. Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543,
2014.
18. Ashish Vaswani et al. Attention is All You Need. arXiv:1706.03762. 2017.
19. M. Heidari and J. H. Jones, "Using BERT to Extract Topic-Independent Sentiment Features for
Social Media Bot Detection," 2020 11th IEEE Annual Ubiquitous Computing, Electronics &
Mobile Communication Conference (UEMCON), 2020, pp. 0542-0547, doi:
10.1109/UEMCON51285.2020.9298158.
20. Tom B. Brown. Language Models are Few-Shot Learners. arXiv:2005.14165v4 [cs.CL] Jul
2020.
21. Radha Guha, 2020. Impact of Artificial Intelligence and Natural Language Processing on
Programming and Software Engineering. International Research Journal of Computer
Science, Vol. 7, Issue 9.
22. Angelov, D. (2020). Top2Vec: Distributed Representations of Topics. arXiv preprint
arXiv:2008.09470.
23. Xin Dong, Lei Yu, Zhonghuo Wu, Yuxia Sun, Lingfeng Yuan, and Fangxi Zhang. 2017. A Hybrid
Collaborative Filtering Model with Deep Structure for Recommender Systems. In AAAI. 1309–1315.
24. Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2018. Deep Learning based Recommender System: A
Survey and New Perspectives. ACM Comput. Surv. 1, 1, Article 1 (July 2018), 35 pages. DOI:
0000001.0000001
25. Basiliyos Tilahun Betru, Charles Awono Onana, and Bernabe Batchakui. 2017. Deep Learning
Methods on Recommender System: A Survey of State-of-the-art. International Journal of Computer
Applications 162, 10 (Mar 2017).
26. Jianpeng Cheng et al. Long Short-Term Memory-Networks for Machine Reading. arXiv
preprint arXiv:1601.06733, 2016.
27. Chris Dyer et al. Recurrent Neural Network Grammars. In Proc. of NAACL, 2016.
20. Matthew E. Peters et al. Deep Contextualized Word Representations. arXiv:1802.05365.
2018.
28. Xiangnan He et al. Neural Collaborative Filtering. arXiv:1708.05031v2 [cs.IR] 26 Aug 2017.
29. Heng-Tze Cheng et. al. Wide & Deep Learning for Recommender Systems.
arXiv:1606.07792v1 [cs.LG], Jun 2016. [16] Shuai Zhang et. al. Deep Learning Based
Recommender System: A Survey and New Perspectives. arXiv:1707.07435v7 [cs.IR], Jul 2019.
[17] Diana Frerira et al. Recommendation System Using Autoencoders. MDPI.
30. Kuchaiev, O.; Ginsburg, B. Training deep autoencoders for collaborative filtering. arXiv 2017,
arXiv:1708.01715.
31. Haghighi, P.S.; Seton, O.; Nasraoui, O. An Explainable Autoencoder For Collaborative Filtering
Recommendation. arXiv 2019, arXiv:2001.04344.
32. Radha Guha. Improving the Performance of an Artificial Intelligence Recommendation
Engine with Deep Learning Neural Nets. 2021 6th International Conference for Convergence
in Technology (I2CT) Pune, India. Apr 02-04, 202
33. Willmott, C.J., Matsuura, K.: Advantages of the mean absolute error (MAE) over the root
mean square error (RMSE) in assessing average model performance. Climate Res. 30(1), 79–
82 (2005)