

Model-Based Book Recommender Systems using Naïve
Bayes enhanced with Optimal Feature Selection
Thi Thanh Sang Nguyen
School of Computer Science & Engineering
International University, VNU-HCMC
Ho Chi Minh City, Vietnam
nttsang@hcmiu.edu.vn

ICSCA '19, February 19–21, 2019, Penang, Malaysia. © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6573-4/19/02. https://doi.org/10.1145/3316615.3316727

ABSTRACT
Book recommender systems play an important role in book search engines, digital libraries and book shopping sites. In the field of recommender systems, processing the data, selecting suitable data features, and choosing classification methods are always challenging, and these choices determine the performance of a recommender system. This paper presents solutions for data processing, feature selection and classifier selection in order to build an efficient book recommender system. The Book-Crossing dataset, which has been studied in many book recommender systems, is taken as a case study. The attributes of books are analyzed and processed to increase the classification accuracy. Several well-known classification algorithms, such as Naïve Bayes and decision trees, are used to predict user interest in books and are evaluated in a series of experiments. It has been found that Naïve Bayes is the best choice for book recommendation, with acceptable run-time and accuracy.

CCS Concepts
• Computing methodologies → Classification and regression trees • Computing methodologies → Feature selection

Keywords
Book recommender systems; classification algorithms; Naïve Bayes; decision trees; Word2Vec.

1. INTRODUCTION
Recommender systems are of great interest in e-commerce and Business Intelligence because of their benefits of attracting more customers and increasing revenue. In particular, book recommender systems can help users select the books that suit them best. Users or customers save time shopping or searching for books, which encourages them to buy more books and obtain more of what they want. There are three basic models of recommender systems [1]: collaborative filtering, content-based recommendation, and knowledge-based recommendation. Collaborative filtering models often analyze user interests or ratings to find groups of users with the same interests, and then recommend items liked by the group to a user in the same group. Collaborative filtering-based recommendation usually achieves high performance because user behaviors are considered. In content-based recommendation systems, item features or attributes are used to make recommendations; that is, items similar to the items a user has liked or bought will be suggested to that user. This kind of model does not often provide high recommendation performance on its own, but it can enhance performance when integrated with the collaborative filtering model [2]. Knowledge-based recommendation systems are useful for items that are not often purchased [1], or when item ratings may not be available. This model can solve the cold-start problem, when new items arrive or have not yet been learnt. The knowledge-based recommendation model can suggest (new) items to users based on what they want or specify.

As we know, deep learning models are emerging and are able to produce item recommendations with high accuracy [3, 4]. However, they take much time and memory for training. Therefore, this study focuses on selecting classification algorithms which are simple but efficient for recommending books, considering training time, accuracy and error rates. This is crucial in the case of memory limitations, which often occur at small and medium business organizations that want effective and reliable recommender systems. Furthermore, this study is extended to integrate a deep learning model into text-value representation in order to improve book recommendation accuracy.

In the context of this study, the well-known Book-Crossing dataset is used in the proposed book recommender systems. The dataset has been used widely in studies of book recommender systems and can be collected from http://www2.informatik.uni-freiburg.de/~cziegler/BX/. Processing this large dataset is very challenging, but the approach can then be applied to real datasets in book recommendation applications. This is the reason the dataset is selected for the book recommender systems. The book recommender systems are built in three phases: (1) pre-processing the data, (2) building a classifier model, and (3) building a book recommendation engine. This study focuses on Phases (1) and (2). Attribute values of books are selected and formatted. Specifically, text values are vectorized in Phase (1) to facilitate the classification process. Several classification algorithms, such as Naïve Bayes and decision trees, are applied and compared in Phase (2). In Phase (3), a book rating prediction engine is built on top of the trained classifier model. This book rating prediction engine can later be used to make book recommendations, but that part has not been developed in this study. To validate and evaluate the proposed system, mainly focusing on the data preparation and the classification method selection, several experiments have been conducted.
The remainder of the paper is organized as follows: Section 2 presents related work and Section 3 presents classifier evaluation methods. The details of building the proposed book recommender systems are presented in Section 4. Section 5 shows experimental results and evaluations. Finally, Section 6 concludes the study.

2. RELATED WORK
2.1 Recommender Systems
In collaborative filtering models, there are two types of methods: memory-based methods and model-based methods [1]. Memory-based methods (also called neighborhood-based collaborative filtering algorithms) are commonly used because they are easy to implement and to make recommendations with. The neighborhoods are found based on users or items, so there are user-based collaborative filtering methods and item-based collaborative filtering methods. The user-based ones discover users with the same taste as the target user to make predictions. The item-based ones first determine the items most similar to those of the target user, then consider other users' behaviors on these items to generate recommendations for the target user. Combining user-based and item-based collaborative filtering can achieve higher performance than using a single collaborative filtering method because it takes the advantages of both methods [5]. On the other hand, model-based methods use machine learning and data mining methods to make predictions. Models can be decision trees, rule-based models, or Bayesian methods. To build model-based recommender systems, input data needs to be processed and converted into formats, e.g., matrices, suitable for training the learning models; predictions or recommendations can then be made based on the learnt models. The advantages of model-based methods compared with memory-based ones are space efficiency (the memory used by the learnt model is small), faster training and prediction, and better resistance to overfitting. Building model-based prediction models is a good way to find the most accurate methods for making recommendations. Therefore, the proposed book recommender systems are built in this way.

The studied classification algorithms, as well as the text processing methods, are briefly described in the following subsections.

2.2 Classification Algorithms
2.2.1 Decision trees
A decision tree algorithm was first proposed by J. Ross Quinlan, known as ID3 (Iterative Dichotomiser), in the early 1980s [5]. The decision tree is constructed from a set of records presenting the attribute values of instances or items. Each node in the tree is an attribute, and attribute values are assigned to the branches of the node. The leaf nodes in the tree hold the class (label) attribute. The decision tree then allows decisions to be made by traversing the attribute nodes in the tree. ID3 was extended into C4.5 with new features such as accepting continuous and discrete attributes, handling missing values, and solving the over-fitting problem by pruning. C4.5 is a well-known classification algorithm and will be used in this study.

2.2.2 Bayesian classification methods
Bayesian classification methods are developed based on Bayes' theorem. In the Naïve Bayesian classifier, it is assumed that attributes are conditionally independent. The probability of each attribute value belonging to a class is computed, and the prior probability of each class is also estimated. Based on the estimated probabilities, the class label of a new tuple or instance can be predicted. Compared with decision trees, Bayesian classifiers have higher accuracy and speed, especially when applied to large datasets [6].

However, the Bayesian classifier only accepts discrete attributes, e.g., nominal or ordinal, but not text data. Hence, NaiveBayesMultinomialText has been developed to operate on text data (or String attributes) in Weka, the open-source data mining software (https://www.cs.waikato.ac.nz/ml/weka/). Because of this advantage, NaiveBayesMultinomialText is taken into account in the classifier models of this study.

2.2.3 Bayes Networks
In some cases where the dependency relations between attributes are important, Naïve Bayesian classification might not be a good choice. Bayes (or Bayesian) networks, which are probabilistic graphical models, represent the dependencies among subsets of attributes [6]. In the network, each node is a variable or event, and links represent dependencies among the variables. Events that occur earlier are the parents of the events that follow. A conditional probability table is built for each variable or event; it shows the conditional probability for each possible combination of its parents. Based on the Bayes network, predictions might be more accurate.

2.2.4 SGDText
SGDText is an extension of SGD (stochastic gradient descent) [7], an optimization method for learning support vector machines (SVM). It can handle large datasets, and in particular String attributes are accepted in this classifier model. It has been implemented in the Weka software package. However, the problem with this algorithm is that it takes much time to build the classifier model.

As can be seen, NaiveBayesMultinomialText and SGDText are extensions for handling String attributes, while decision trees, the Naïve Bayesian classifier, and Bayes networks cannot be run on text data. These learning models have different advantages, so it is necessary to compare them to select the best classifier model for the proposed book recommender system.
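For orientation, the sketch below shows how the five classifier families discussed above could be instantiated through the Weka Java API. The class names are Weka's own; the surrounding helper class and method are illustrative only and are not part of the system described in this paper.

import java.util.LinkedHashMap;
import java.util.Map;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.bayes.NaiveBayesMultinomialText;
import weka.classifiers.functions.SGDText;
import weka.classifiers.trees.J48;

public class CandidateClassifiers {
    // Candidate models compared in this study, keyed by name.
    // J48 is Weka's implementation of C4.5.
    public static Map<String, Classifier> candidates() {
        Map<String, Classifier> models = new LinkedHashMap<>();
        models.put("NaiveBayes", new NaiveBayes());               // discrete (nominal) attributes
        models.put("C4.5 (J48)", new J48());                      // decision tree
        models.put("BayesNet", new BayesNet());                   // models attribute dependencies
        models.put("NaiveBayesMultinomialText", new NaiveBayesMultinomialText()); // String attributes
        models.put("SGDText", new SGDText());                     // String attributes, slow to build
        return models;
    }
}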
2.3 Word Embedding
As can be seen, some classification algorithms have a problem processing text values. Hence, word embedding methods are usually used to represent text. Vector space models, Latent Semantic Analysis and the Word2Vec model are interesting approaches to word embedding. According to Mikolov et al. [8], the words in documents can be vectorized, so that we can compute the distance between words as the distance between vectors. The Word2Vec model is constructed according to skip-gram models [9], learning word vector representations in order to predict the context of a word in the same sentence. Every word w is associated with two vectors u_w and v_w, which are the vector representations of w as a word and as a context, respectively. The probability of predicting word w_i given w_j is determined using the softmax function:

$p(w_i \mid w_j) = \frac{\exp(u_{w_i}^{\top} v_{w_j})}{\sum_{l=1}^{V} \exp(u_l^{\top} v_{w_j})}$   (1)

where V is the vocabulary size.

Formula (1) is used to compute the similarity of two words. In the Word2Vec model, the softmax function is improved by using hierarchical softmax to reduce the computational complexity. This model is considered to be efficient in learning word embeddings from raw text, so it is used to vectorize book titles in this study.
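As a small illustration of formula (1), the following sketch computes the skip-gram softmax probability from already-trained word and context vectors. The vector arrays are stand-ins supplied by the caller; this is not part of the paper's pipeline, only a literal translation of the formula.

public class SkipGramSoftmax {
    // p(w_i | w_j) as in formula (1): exp(u_i . v_j) divided by the sum over the
    // whole vocabulary of exp(u_l . v_j).
    // u[l] is the "word" vector of word l, v[j] is the "context" vector of word j.
    public static double probability(double[][] u, double[][] v, int i, int j) {
        double denominator = 0.0;
        for (int l = 0; l < u.length; l++) {
            denominator += Math.exp(dot(u[l], v[j]));
        }
        return Math.exp(dot(u[i], v[j])) / denominator;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }
}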
3. CLASSIFICATION MODEL EVALUATION
To evaluate classification models, the dataset is divided into two parts: one for training and one for testing. Normally, the training part is larger than the testing part. Depending on the data size and the evaluation strategy, the ratio of the two parts may differ. Cross-validation is a widely used evaluation method. In cross-validation, the dataset is split into k folds; each fold is selected in turn for testing while the remaining folds are used for training [6, 7], giving k-fold cross-validation. The procedure repeats k times, and at the end the averages of the performance measures are computed. In many practical experiments, k is set to 10 to get the best error estimation.

To evaluate classifier performance, metrics such as accuracy, precision, recall, F-score and error rate are used [6]. This study focuses on computing the Accuracy (2), Precision (3), and Recall (4) of predictions or recommendations:

$\mathrm{Accuracy} = \frac{TP + TN}{P + N}$   (2)

$\mathrm{Precision} = \frac{TP}{TP + FP}$   (3)

$\mathrm{Recall} = \frac{TP}{P}$   (4)

where TP, TN, FP, P and N refer to the number of true positive, true negative, false positive, positive, and negative samples, respectively [6].

Besides, this study also considers the Root mean-squared error (RMSE) (5) in some experimental evaluations:

$\mathrm{RMSE} = \sqrt{\frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{n}}$   (5)

where p_1, p_2, …, p_n are the predicted values for the test instances, and a_1, a_2, …, a_n are the actual values.

In addition, the time taken to build the classification models is taken into account to evaluate their efficiency in some experiments.
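This evaluation protocol maps directly onto Weka's Evaluation class. The sketch below runs 10-fold cross-validation of a Naïve Bayes model on an ARFF file; the file name is hypothetical, and passing class index 1 to precision/recall assumes the binary "like" class described later in Section 4.2.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidateNaiveBayes {
    public static void main(String[] args) throws Exception {
        // "book-ratings.arff" is a hypothetical pre-processed file (see Section 4.1.1).
        Instances data = DataSource.read("book-ratings.arff");
        data.setClassIndex(data.numAttributes() - 1);   // the rating class is the last attribute

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));  // 10-fold CV

        System.out.println("Accuracy  : " + eval.pctCorrect() + " %");
        System.out.println("Precision : " + eval.precision(1));   // class 1 = "like"
        System.out.println("Recall    : " + eval.recall(1));
        System.out.println("RMSE      : " + eval.rootMeanSquaredError());
    }
}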
4. BUILDING MODEL-BASED BOOK RECOMMENDER SYSTEMS
4.1 Framework
The framework of the model-based book recommender systems is proposed as in Figure 1. In this framework, there are three processing units: (1) a data pre-processing unit, (2) classifiers (or classification models), and (3) a book prediction engine.

4.1.1 The data pre-processing unit
In the pre-processing unit, input data is cleaned and converted into a format which can be read by the classifiers. Because the system utilizes the classifiers from the Weka library, the input data needs to be in the ARFF format [7], as in the following example of the weather data.

@relation weather
@attribute outlook { sunny, overcast, rainy }
@attribute temperature numeric
@attribute humidity numeric
@attribute windy { true, false }
@attribute play? { yes, no }
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
rainy, 70, 96, false, yes
rainy, 68, 80, false, yes

The scheme name is defined after @relation. Attributes are defined by @attribute. Several attribute types can be defined, such as numeric, nominal and string. For the nominal type, all values need to be listed. Records are listed below @data.

Figure 1. Framework of the proposed book recommender systems

Since the data source of the system is the Book-Crossing dataset, it is necessary to convert the dataset into ARFF files. The Book-Crossing dataset includes three tables in CSV files: BX-Users, BX-Books, and BX-Book-Ratings. The BX-Users table contains user profiles: User-ID, Location, and Age. The BX-Books table stores book information: ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher, Image-URL-S, Image-URL-M, and Image-URL-L. The BX-Book-Ratings table includes User-ID, ISBN, and Book-Rating. Book ratings are expressed on a scale from 1-10 (higher values denoting higher appreciation). In practice, this information can be obtained from book shopping sites. Because the contents of titles, authors or locations contain many special characters and delimiters, e.g., ',', ';', '"', which are confused with the separator sign between columns or attributes (features), we need to remove such characters. The pre-processing unit does this job and converts the tables to the ARFF format. Besides, attribute values need to be formatted as the required types, e.g., numeric or string.

Moreover, because each book title carries meaning in itself, which might impact book classification, the book titles are vectorized to facilitate the classification. In other words, a title is converted to a number for easy comparison. The Word2Vec model is used to model the titles: each title is a collection of word vectors. There are two ways to transform a title into a number: for each title, we can compute the length of the sum of the word vectors, or the dot product of the word vectors.

In order to flexibly select features (or attributes) from the dataset for training, that is, to join records between the tables, the object classes User, Book, and Book-rating are developed and linked. The User-ID and ISBN attributes in the Book-rating class are linked to the User-ID attribute in the User class and the ISBN attribute in the Book class, respectively. The reason is that we need to combine user profile, book information and ratings for classification. The features of users and books affect ratings. The system can then make predictions based on the user profile, the item (book), and the user interest (rating).
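To make the two title encodings concrete, here is a small sketch of both options, assuming the word vectors of a title have already been obtained from a trained Word2Vec model. The paper does not spell out how the dot product generalizes beyond two words, so accumulating the dot products of consecutive word vectors is only one plausible reading; the method names titleSumVecLen and titleDotPro simply echo the attribute names used in Section 5.

public class TitleVectorizer {
    // titleSumVecLen: Euclidean length of the element-wise sum of the title's word vectors.
    // Assumes a non-empty title.
    public static double titleSumVecLen(double[][] wordVectors) {
        int dim = wordVectors[0].length;
        double[] sum = new double[dim];
        for (double[] w : wordVectors)
            for (int k = 0; k < dim; k++) sum[k] += w[k];
        double len = 0.0;
        for (double s : sum) len += s * s;
        return Math.sqrt(len);
    }

    // titleDotPro: one possible reading of "the dot product of the word vectors" --
    // accumulate the dot products of consecutive word vectors (0 for a one-word title).
    public static double titleDotPro(double[][] wordVectors) {
        double total = 0.0;
        for (int i = 0; i + 1 < wordVectors.length; i++) {
            double dot = 0.0;
            for (int k = 0; k < wordVectors[i].length; k++)
                dot += wordVectors[i][k] * wordVectors[i + 1][k];
            total += dot;
        }
        return total;
    }
}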
4.1.2 Classifiers
In this study, two groups of classifiers are employed: those not accepting text data and those accepting text data. Naïve Bayes, C4.5, and Bayes Networks belong to the first group. NaiveBayesMultinomialText and SGDText belong to the second group. These classifiers can be selected in the system for training on the input data.

After building the classifier model, we can make a rating prediction for a query, i.e., a new record of a user and a book. Alternatively, we can evaluate the performance of the classifier using testing data.

4.1.3 Book prediction engine
The book prediction engine can predict the rating a user would give a book, using one of the classifiers.

Furthermore, the system can be extended to recommend books to the target user. The system can find books, combine them with the user profile, and predict the book ratings. If the predicted rating of a book is higher than 5, that book will be recommended.
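The recommendation rule can be sketched on top of a trained Weka classifier as below. The classifyInstance call is Weka's API; the surrounding method, and the assumption that the query instance has already been assembled with the same attributes as the training data, are illustrative only.

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

public class BookPredictionEngine {
    // Returns true when the book behind 'query' should be recommended to the user behind 'query'.
    // 'header' is the dataset the classifier was trained on (used to decode the class label);
    // 'query' is a user+book instance built with the same attributes, class value left missing.
    public static boolean recommend(Classifier model, Instances header, Instance query) throws Exception {
        double predictedIndex = model.classifyInstance(query);
        String label = header.classAttribute().value((int) predictedIndex);
        // With the binary class of Section 4.2, class "1" means "like" (original rating above 5).
        return label.equals("1");
    }
}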
4.2 Feature Selection
As discussed, the features influencing ratings need to be selected for training. From the Book-Crossing dataset, the eight selected features (attributes) are: ISBN, title, author, year, publisher, location, age, and rating (the class). The first five features are book information, the next two are user information, and the last one is the book rating. User profile and book information are combined for making rating predictions.

With a rating class of 10 values, it may be difficult to evaluate the precision of predicting user interests (like or unlike), and the accuracy will not be high. Therefore, the multivalue rating class is converted to a binary class in some experiments for comparison. In particular, class 0 represents ratings from 1-5, namely "unlike", and class 1 represents ratings from 6-10, namely "like".

Among the selected features, we care about the location and age of users, because book recommendation can depend on demographic data, i.e., a group of users having a similar profile. Moreover, a book title is a string or text, but we can treat its type as nominal or string in some classification experiments. When titles are vectorized, their type can be numeric. The experiments below give more details of the classification processes.
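The binary conversion is a simple thresholding of the original 1-10 rating; a minimal helper, written here only to make the rule explicit, could look like this.

public class RatingBinarizer {
    // Maps an original Book-Crossing rating (1-10) to the binary class of Section 4.2:
    // 0 = "unlike" (rating 1-5), 1 = "like" (rating 6-10).
    public static int toBinaryClass(int rating) {
        return rating >= 6 ? 1 : 0;
    }
}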
4.3 Training and Testing
The Book-Crossing dataset contains 278,858 users, 271,379 books, and 1,149,780 ratings, including some missing values. After pre-processing, the features mentioned above are selected for training and testing; that is, the data records present the values of the selected features (attributes). One record is called an instance. The total number of extracted instances is 1,028,978.

In this study, six experimental cases are carried out; the book title attribute is not involved in the first four cases. They show how the performance evaluation results differ when attribute types are selected differently.

Case 1: Six nominal attributes and a multivalue class.

Case 2: Six nominal attributes and a binary class.

Case 3: Six nominal, string and numeric attributes, and a binary class.

Case 4: Seven nominal attributes and a binary class.

Cases 5-6: The book title is considered in classification.

For each case, a number of the extracted instances are used for training and testing. 10-fold cross-validation is used in all experiments. The average results of accuracy, precision, or RMSE are calculated over the 10 iterations of the cross-validation and are reported as the final results.

5. EXPERIMENTAL RESULTS AND EVALUATIONS
In this study, the experiments were run on a PC with an Intel Core i7-4770 processor, 3.40 GHz, and 8 GB of RAM.

Case 1: Six nominal attributes of ISBN, author, year, publisher, location and age, and a rating class are used. Here, the rating class has 10 values from 1 to 10.

Table 1 describes the performance evaluation of Case 1. In this case, we just observe the classification accuracy of typical algorithms, e.g., NaiveBayesMultinomialText, Naïve Bayes, and C4.5, on the pre-processed dataset with the 10 original classes. As seen, when the number of validated instances is larger, the accuracy of classification is lower. Modeling a decision tree takes too much memory, so we cannot run C4.5 when 90% of 1,000,000 instances are used for training. Moreover, the accuracy of C4.5 is lower than that of Naïve Bayes, as shown in the case where 28,978 instances are used. The accuracy of Naïve Bayes is higher than that of NaiveBayesMultinomialText when the number of training instances is small, but the opposite holds when the number of training instances is greater. The RMSE of Naïve Bayes is about 0.24, a little higher than that of NaiveBayesMultinomialText.

Table 1. Case 1 – Evaluation

                        NaiveBayesMultinomialText   Naïve Bayes   C4.5
# of instances: 1,000,000
Accuracy                62.748%                     59.997%       -
RMSE                    0.230                       0.236         -
# of instances: 28,978
Accuracy                63.6552%                    64.352%       63.6552%
RMSE                    0.228                       0.241         0.104

Case 2: Six nominal attributes of ISBN, author, year, publisher, location and age, and a rating class are used. Here, the rating class has two values, 0 and 1.

Table 2 describes the performance evaluation of Case 2. In this case, the NaiveBayesMultinomialText, SGDText, Naïve Bayes, Bayes Networks, and C4.5 classifier models are used. The accuracy and RMSE of classification, and the precision of class 1 prediction, are measured to compare the performance of the models. Similarly, C4.5 takes too much time and memory for training on 900,000 instances, so it fails to produce validation results. The Naïve Bayes model achieves the highest accuracy and precision, and the lowest RMSE. The time taken to build the Naïve Bayes model is quite short (0.09s when 28,978 instances are used), only longer than NaiveBayesMultinomialText. In addition, we see that Bayes Networks can provide high prediction accuracy but take more time to build the model than Naïve Bayes.
Table 2. Case 2 – Evaluation

                        NaiveBayesMultinomialText   SGDText   Naïve Bayes   Bayes Networks   C4.5
# of instances: 1,000,000
Accuracy                68.750%    68.750%   69%       68.210%   -
Precision               0          0         0.510     0.490     -
RMSE                    0.460      0.560     0.460     0.460     -
# of instances: 28,978
Accuracy                69.680%    69.677%   71.990%   71.710%   69.680%
Precision               0          0         0.590     0.560     0
RMSE                    0.460      0.551     0.450     0.450     0.460
Model building time     0.060s     2.040s    0.090s    0.490s    34.150s

Compared with Case 1, the classifier models perform better in Case 2. That means classifying the dataset with the binary class is more effective.
Case 3: Six attributes of nominal ISBN, nominal author, numeric year, nominal publisher, string location, numeric age, and a rating class are used. Here, the rating class has two values, 0 and 1. In this case, we try to test attribute values with their natural types. Year and age values are numeric, so they should be computed as numbers rather than as nominal values. Comparing the strings of ISBN, author, and publisher is not necessary, so we keep their types nominal.

Table 3 describes the performance evaluation of Case 3. In this case, the NaiveBayesMultinomialText and SGDText classifier models are used to handle the String attributes. The number of used instances is 28,978. We only test a small dataset because it takes too much time to build the SGDText model. Although the classification accuracy of SGDText is higher than that of NaiveBayesMultinomialText, its memory usage and model building time are inefficient. Therefore, this case is just for reference. Moreover, the accuracy of NaiveBayesMultinomialText is not higher than that in Case 2. It can be said that string comparison is not effective in the classification process.

Table 3. Case 3 – Evaluation

                        NaiveBayesMultinomialText   SGDText
# of instances: 28,978
Accuracy                69.360%                     72.430%
Model building time     0.190s                      389.250s
Case 4: Seven nominal attributes of ISBN, author, year, publisher, User-ID, location, age, and a rating class are used. Here, the rating class has two values, 0 and 1.

Table 4 describes the performance evaluation of Case 4. In this case, the NaiveBayesMultinomialText, Bayes Networks, Naïve Bayes, and C4.5 classifier models are used. The Naïve Bayes classifier model still achieves the highest performance in terms of accuracy and precision (of class 1 prediction), and the lowest RMSE. Notably, the accuracy of the classifier models is higher than in Cases 1-3. That means the prediction accuracy is higher when the User-ID attribute is added, because each specific user is considered. In other words, considering personalization can make better predictions.

Table 4. Case 4 – Evaluation

                        NaiveBayesMultinomialText   Bayes Networks   Naïve Bayes   C4.5
# of instances: 1,000,000
Accuracy                68.754%    71.695%   72.223%   -
Precision               0          0.546     0.559     -
RMSE                    0.463      0.448     0.441     -
# of instances: 28,978
Accuracy                69.677%    72.835%   72.990%   69.677%
Precision               0          0.590     0.622     0
RMSE                    0.460      0.452     0.452     0.460
Model building time     0.040s     0.640s    0.090s    32.870s

Compared with some previous studies [10, 11] using the same dataset, the proposed book recommendation system using Naïve Bayes is promising, with lower RMSEs: about 0.24 for predicting book ratings in the range [1-10], and 0.5 for predicting book ratings in the range [0-1]. We do not compare accuracy or precision because the way they make predictions is different from ours.

Case 5: The book title is considered, to evaluate its impact on rating prediction.

Based on the above experiments, Naïve Bayes is the most efficient classifier, so it is used in this experiment while the attributes are varied. Based on Cases 2 and 3, setting the attribute types to nominal gains higher performance, so the attribute types in this case are converted to nominal. In this case, 1,028,978 records are put into the Naïve Bayes classifier. There are four sub-cases of selecting attributes. The class value, i.e., the rating, is binary.

Sub-case 1: Eight attributes of nominal ISBN, titleDotPro, author, year, publisher, User-ID, location and age are selected. A titleDotPro value is the dot product of a book title.

Sub-case 2: similar to Sub-case 1, but ISBN is not included.

Sub-case 3: same as Sub-case 2, but titleDotPro is changed to titleSumVecLen, which is the sum vector length of the book title.

Sub-case 4: same as Sub-case 2, but titles are kept as strings and the title type is considered nominal.

Table 5. Case 5 – Performance over four sub-cases

            Sub-case 1   Sub-case 2   Sub-case 3   Sub-case 4
Accuracy    72.117%      72.946%      72.891%      72.191%
Precision   0.556        0.573        0.572        0.558
RMSE        0.442        0.430        0.430        0.441

Table 5 shows the performance of Naïve Bayes over the four sub-cases. The performance is higher when ISBN is removed from the dataset. Representing titles as the dot products of word vectors is more effective than as the sum vector lengths, as can be seen in Sub-cases 2 and 3. Compared with Sub-case 4, this shows that title vectorization brings higher performance than title comparison.

Case 6: The book title is considered, to evaluate its impact on rating prediction, and numeric attributes are carefully considered.

The following sub-cases are taken into account when running the Naïve Bayes model on the dataset of 1,028,978 instances.

Sub-case 1: Eight attributes of nominal ISBN, numeric titleDotPro, nominal author, numeric year, nominal publisher, nominal User-ID, nominal location and numeric age are selected.

Sub-case 2: similar to Sub-case 1, but ISBN is not included.

Sub-case 3: same as Sub-case 1, but titleDotPro is changed to titleSumVecLen, which is the sum vector length of the book title.

Sub-case 4: same as Sub-case 3, but ISBN is not included.

Table 6. Case 6 – Performance over four sub-cases

            Sub-case 1   Sub-case 2   Sub-case 3   Sub-case 4
Accuracy    71.810%      72.597%      72.029%      72.828%
Precision   0.550        0.565        0.556        0.574
RMSE        0.446        0.434        0.444        0.433

The experimental results in Table 6 show that the numeric attribute type does not help the Naïve Bayes models.
6. CONCLUSIONS
In conclusion, this study has presented solutions for selecting suitable features for rating prediction, selecting attribute types, and selecting appropriate classifier models in book recommender systems. The selected features, which are author, year, publisher, User-ID, location and age, determine the book rating predictions. As shown in the experiments, Sub-case 2 in Case 5 is the best solution, in which Naïve Bayes is applied to the seven nominal attributes (excluding ISBN) and book titles are converted to dot products. Naïve Bayes is the best solution in most of the experimental cases. It can achieve the highest performance and is efficient, saving memory and time in building the classifier model.

Attribute types should be nominal when applying model-based recommendation methods; numeric and string types are not efficient for the classifier models. Applying the word embedding method to represent book titles can improve title representation and make better predictions. In the future, more advanced classifiers, e.g., neural networks, will be evaluated, but more time and memory might be needed for training the models. Because of the limitation on memory usage, those models are not employed in this study.

7. ACKNOWLEDGMENT
This research is funded by International University VNU-HCM under grant number 06-IT-2017.

8. REFERENCES
[1] C. C. Aggarwal. 2016. Recommender Systems. Springer International Publishing.
[2] S. Yang, M. Korayem, K. AlJadda, T. Grainger, and S. Natarajan. 2017. Combining content-based and collaborative filtering for job recommendation system: A cost-sensitive Statistical Relational Learning approach. Knowledge-Based Systems.
[3] P. Covington, J. Adams, and E. Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, Massachusetts, USA, 191-198.
[4] H. Zhu, P. Zhang, G. Li, J. He, H. Li, and K. Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, 1079-1088. DOI= https://doi.org/10.1145/3219819.3219826.
[5] M. Chandak, S. Girase, and D. Mukhopadhyay. 2015. Introducing Hybrid Technique for Optimization of Book Recommender System. Procedia Computer Science, vol. 45, 23-31.
[6] J. Han and M. Kamber. 2011. Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufmann.
[7] I. H. Witten and E. Frank. 2011. Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Morgan Kaufmann.
[8] T. Mikolov, et al. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems – Volume 2. Lake Tahoe, Nevada, 3111-3119.
[9] D. Guthrie, et al. 2006. A Closer Look at Skip-Gram Modelling. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC-2006).
[10] M. Chandak, S. Girase, and D. Mukhopadhyay. 2015. Introducing Hybrid Technique for Optimization of Book Recommender System. Procedia Computer Science, vol. 45, 23-31.
[11] A. Tashkandi, L. Wiese, and M. Baum. 2017. Comparative Evaluation of Recommender Systems for Book Recommendations. In Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn.
