Project Final1 Sirr
HYBRID BERT REC: BERT BASED HYBRID RECOMMENDER
Bachelor of Technology
in
CSE (Data Science)
by
January, 2024
DECLARATION
I certify that
a. The work contained in this report is original and has been done by me under the guidance
of my supervisor(s).
b. The work has not been submitted to any other Institute for any degree or diploma.
c. I have followed the guidelines provided by the Institute in preparing the report.
d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
e. Whenever I have used materials (data, theoretical analysis, figures, and text) from other
sources, I have given due credit to them by citing them in the text of the report and giving
their details in the references. Further, I have taken permission from the copyright owners
of the sources, whenever necessary.
CERTIFICATE
This is to certify that the project phase-I report entitled Hybrid-BERT Rec: BERT Based
Hybrid Recommender, submitted by Mr. Arandkar Vishal, Mr. Gali Tushar Reddy and
Mr. Ch. Bala Varun Chary to the Institute of Aeronautical Engineering, Hyderabad, in
partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in
CSE (Data Science), is a bona fide record of work carried out by them under my/our
guidance and supervision. The contents of this report, in whole or in part, have not been
submitted to any other institute for the award of any degree.
Date:
ABSTRACT
In the world of e-commerce, user behaviour evolves rapidly, shaped by societal trends and
changing preferences. This project introduces an approach to recommendation systems for
e-commerce websites that focuses on the dynamic nature of user interactions. It combines
content-based filtering (CBF) and collaborative filtering (CF) with a bidirectional
self-attention network inspired by BERT, designed to capture and adapt to sequential user behaviour.
Sequential recommender systems seek to capture information about user affinities and behaviours
from the ordered series of a user's interactions. In this project, we detail BERT4Rec, a sequential
recommendation approach based on the bidirectional encoder of the self-attention-based Transformer.
BERT4Rec applies the Bidirectional Encoder Representations from Transformers (BERT) technique
to model user behaviour sequences using only the target user's historical data, i.e., a
content-based filtering (CBF) approach. Despite BERT4Rec's effectiveness, we argue that
considering only this historical data is insufficient to provide the most accurate recommendation.
We therefore describe HybridBERT Rec, which applies BERT to both CBF and CF. For CBF, we
want to extract the characteristics of the target user's interactions with purchased items. For CF, we
want to find neighbouring users who are similar to the target user: we extract the target item's
characteristics by feeding all other users who rated the target item as a second input to BERT, which
generates a target item profile. After obtaining both profiles, we use them to predict a rating score.
We experimented with three datasets and found that our model was more accurate than the original
BERT4Rec.
TABLE OF CONTENTS
COVER PAGE
TITLE PAGE
DECLARATION
CERTIFICATE
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Existing System
1.3 Proposed System
4.2.4 Stacking and Transformer Layer
4.2.5 Embeddings and softmax
4.3 Dataset
4.4 Implementation
4.4.1 CBF recommendation using BERT
4.4.2 CF recommendations using SVD
4.4.3 Hybrid Recommendation
References
Appendices
LIST OF FIGURES
LIST OF ABBREVIATIONS
Chapter 1
Introduction
1.1 Introduction
Existing CBF systems make suggestions based on item features and the user's profile information.
They assume that if a user has expressed interest in something in the past, they will do so again
in the future. Comparable items are grouped based on their features. User profiles are created by
analysing previous interactions or by directly asking people about their interests. Systems that
additionally use personal and social data about the user are not regarded as purely content-based.
Collaborative filtering is classified into two types:
a) Memory-based approaches: These are also known as neighbourhood-based collaborative
filtering. Ratings of user-item pairs are predicted from their neighbourhoods [33].
Memory-based collaborative filtering is further classified into two types: user-based and
item-based collaborative filtering. User-based means that recommendations come from
like-minded users. Item-based collaborative filtering suggests items based on their relevance,
as determined from customer ratings.
i) User-Based Collaborative Filtering (UB-CF): recommends items to a user by identifying
similar users and predicting the user's preferences based on their ratings. It employs a
user-item matrix, predicting unrated items for the active user by comparing their preferences to
those of similar users. The K-nearest neighbours (KNN) algorithm is commonly used to find
similar users, which makes the method easy to implement and often more accurate than some
alternatives such as content-based methods. However, it faces challenges with sparsity in user
ratings, scalability with a growing user base, and new users and items (cold-start problems).
ii) Item-Based Collaborative Filtering (IB-CF): focuses on finding items similar to those the
user prefers. It calculates similarity among items using techniques such as cosine-based
similarity or correlation-based similarity. Unlike UB-CF, IB-CF pre-calculates item
similarities.
b) Model-based approaches: These do not need to keep the full rating matrix in memory.
Instead, machine learning models are trained to predict how a customer would rate each
product, including products the customer has not yet rated. These algorithms are further
divided into subsets such as matrix factorization-based algorithms, deep learning methods,
and clustering algorithms.
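The user-based scheme described above can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in this project: the `ratings` dictionary is hypothetical toy data, the helper names `pearson` and `predict` are chosen for this example, and the prediction uses correlation-based similarity with a mean-centred weighted average over the k most similar users.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob": {"m1": 4.0, "m2": 2.0, "m3": 5.0, "m4": 4.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
    "dave": {"m1": 5.0, "m3": 4.0, "m4": 5.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v):
    """Correlation-based similarity over the items both users have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu_u = mean(ratings[u][i] for i in common)
    mu_v = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    candidates = [
        (pearson(user, other), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # Keep only positively correlated neighbours, most similar first.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    user_mean = mean(ratings[user].values())
    if not neighbours:
        return user_mean
    weighted = sum(
        sim * (ratings[other][item] - mean(ratings[other].values()))
        for sim, other in neighbours
    )
    return user_mean + weighted / sum(sim for sim, _ in neighbours)


print(round(predict("alice", "m4"), 2))
```

In this toy data, carol's tastes run opposite to alice's, so she is excluded from the neighbourhood and the prediction is driven by bob and dave.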
The proposed BERT-based solution tries to mitigate existing problems such as:
1. Cold Start Problem Mitigation: Instead of relying on unique item identifiers to aggregate
historical information, our approach uses only the item's title as content, coupled with
token embeddings. This helps address the cold start problem, a significant shortcoming of
traditional recommendation algorithms, by allowing the model to provide meaningful
recommendations even when historical data is limited.
2. User Latent Interest Learning: By training our model with user behaviour data, we aim
to enable the system to learn not only item similarities but also the latent interests of users.
This contrasts with traditional recommendation algorithms and some pair-wise deep learning
algorithms that primarily focus on providing similar items based on past purchases. Our
approach seeks to offer more personalized recommendations by capturing the nuanced
preferences of users.
• In CF-HybridBERT, our solution extracts the target item representation, which captures
the similarity levels between all neighbours and the target user. During training, a
random user masking technique is applied to the user sequence associated with each target
item, so that the model learns to reconstruct the masked user's original embedding as
accurately as possible. Once training is completed, the network can construct the next user
representation based on the characteristics of the users interacting with the target item.
During testing, the target item representation is constructed by masking the target user and
adding it to the end of the user sequence. This representation encapsulates the comparison
values between each neighbouring user and the target user, signifying the similarity levels.
Chapter 2
Literature Survey
F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor [1]:
This is the first comprehensive handbook dedicated entirely to recommender systems (RSs),
applications that offer relevant items to users, ranging from simple book recommendations to
more complex settings such as conversational recommender systems.
Recommender systems are software tools and techniques that provide suggestions for
items likely to be of use to a user. The suggestions relate to various decision-making processes,
such as what items to buy, what music to listen to, or what online news to read. Although not
exclusively about recommendation systems, this book covers large-scale data mining,
including collaborative filtering and recommendation algorithms.
each movie. Word2Vec is a popular word embedding technique that can also be applied in
recommendation systems. The FCMR system achieved good performance on a number of
metrics, but the authors also suggested that a hybrid recommender system combining
content-based and collaborative filtering approaches would be best, and noted that Word2Vec
cannot capture long-term dependencies.
Sun, Fei, Rui Wang, Rui Zhang, Xin Chen, Xudong Hu, and Zhiyuan Liu [5]:
proposed BERT4Rec, a recommendation system that uses BERT (Bidirectional Encoder
Representations from Transformers) to generate embeddings for items. These embeddings are
then used to calculate the similarity between items, which can be used to make
recommendations. BERT4Rec has been shown to be effective for a variety of recommendation
tasks, including sequential recommendation, cross-domain recommendation, and cold-start
recommendation. BERT is a pre-trained Transformer-based model that has been highly
successful in natural language processing tasks. When applied to recommender systems,
it can learn complex patterns and dependencies in user-item interactions, potentially
leading to more accurate and personalized recommendations. Its application to recommender
systems has gained attention because of its ability to capture long-range dependencies and
context in user-item interactions, which lets the model represent the semantics of the items
and the user preferences more effectively. Related models such as SASRec and KeBERT4Rec
have also been proposed, each with its own advantages; for example, Self-Attentive
Sequential Recommendation (SASRec) can capture the relationships between items that are
far apart in the user's history.
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang [6]:
Modelling users' dynamic preferences from their historical behaviours is challenging and
crucial for recommendation systems. Previous methods employ sequential neural networks
to encode users' historical interactions from left to right into hidden representations for
making recommendations. Despite their effectiveness, the authors argue that such left-to-right
unidirectional models are sub-optimal due to limitations including: a) unidirectional
architectures restrict the power of hidden representations in users' behaviour sequences; b)
they often assume a rigidly ordered sequence, which is not always practical. To address these
limitations, the authors propose a sequential recommendation model called BERT4Rec, which
employs deep bidirectional self-attention to model user behaviour sequences. To avoid
information leakage and train the bidirectional model efficiently, they adopt the Cloze
objective for sequential recommendation, predicting randomly masked items in the sequence
by jointly conditioning on their left and right context.
Anushree H and Shashidhara H S, Efficient Recommendation System Using BERT
Technology, International Journal of Advanced Research in Engineering and
Technology [7]:
Modelling the diverse desires of users through their economic models is difficult and
essential for recommender systems. Past techniques use sequential neural networks to encode
the historical experiences of customers from left to right into recommendation models.
Recommendation systems in e-commerce are becoming an integral means of helping
consumers navigate the available content. They are an important aspect of e-commerce
platforms, assisting consumers in choosing products of their choice at scale.
Metadata termed the contextual phrase, which incorporates the reference label and the
description of the quote, has been used by several authors to locate the relevant referenced
research. The lack of a well-established benchmark dataset and of a recommendation tool
that achieves high efficiency has perhaps made the study challenging. Personalized
recommendation is the foundation of shared marketplace platforms in the rental property
field, where such a framework provides a valuable tool to support people. The current
technique aims to determine the context of the movie plot summary from the given movies
using BERT-as-a-service and to predict similar movie recommendations.
Yuyangzi Fu, Tian Wang [8]:
In e-commerce, recommender systems have become an indispensable part of helping users
explore the available inventory. In this work, the authors present a novel approach for
item-based collaborative filtering that leverages BERT to understand items and score
relevancy between different items. The proposed method could address problems that plague
traditional recommender systems, such as cold start and "more of the same" recommended
content. Experiments on a large-scale real-world dataset in a full cold-start scenario show
that the proposed approach significantly outperforms the popular Bi-LSTM model.
Chapter 3
Project Architecture
3.1 High-Level System Architecture
The main component of our architecture is the recommendation engine, which consists of three
main components:
1. Collaborative Filtering
2. Content-Based Filtering
3. Hybrid Combination
2. Content-Based Filtering: The content-based recommender relies on the similarity of the
items being recommended. The basic idea is that if you like an item, you will also like a
"similar" item. It generally works well when it is easy to determine the context or properties
of each item.
A content-based recommender works with data that the user provides, either explicitly (for
example, movie ratings in the MovieLens dataset) or implicitly. Based on that data, a user
profile is generated, which is then used to make suggestions to the user. As the user provides
more inputs or takes actions on the recommendations, the engine becomes more accurate.
Figure 4. The architecture of the HybridBERT4Rec model, which comprises a CBF part,
a CF part, and a prediction layer
3.2 UML Diagrams
3.2.1 Data Flow Diagram:
The fundamental data workflow comprises data collection, analysis of the raw data,
cleaning and preprocessing, model application, and analysis of the results.
Text processing is done as follows. When preprocessing text data, the focus is on refining
the data by eliminating unwanted elements, removing stop words, and applying lemmatization.
1. Removing Noise from Data:
Before extracting meaningful information, it is crucial to eliminate any irrelevant or
extraneous elements that might hinder the analysis process. This noise removal step involves
identifying and discarding data components that do not contribute to the overall
understanding or insights derived from the data.
2. Stop Words Removal:
Stop words, commonly occurring words in a language (e.g., 'the,' 'is,' 'and'), are often devoid
of significant semantic meaning and can add noise to textual data. Stop words removal
systematically excludes these words from the dataset to streamline the analysis
process, allowing the focus to shift toward more contextually relevant terms. This enhances
the efficiency of subsequent natural language processing tasks.
3. Lemmatization:
Lemmatization is a linguistic process that reduces words to their base or root form,
known as lemmas. This normalization technique ensures that different grammatical
variations of a word, such as verb conjugations or plural forms, are transformed into a
common base. By doing so, lemmatization facilitates more accurate and consistent analysis
of the text data, enabling improved understanding and interpretation of the content.
4. BERT Model:
Using the transformers library, we load the pre-trained BERT model known as
bert-base-uncased, which is discussed in detail in Chapter 4. Sentences are tokenized and
then encoded into embeddings by passing them through the BERT model. This lets us use
pre-trained language representations for diverse natural language processing tasks.
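The first three preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `STOP_WORDS` and `LEMMAS` are tiny hypothetical lookup tables standing in for the full stop-word list and lemmatizer that an NLP library would provide, and the noise-removal rules (stripping HTML-like tags, digits and punctuation) are one possible choice.

```python
import re

# Toy stand-ins (assumptions): a real pipeline would use a full stop-word
# list and a proper lemmatizer from an NLP library.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "in", "to", "are"}
LEMMAS = {"movies": "movie", "running": "run", "ran": "run", "stories": "story"}


def remove_noise(text):
    """Step 1: drop HTML-like tags, digits and punctuation, then lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return text.lower()


def preprocess(text):
    tokens = remove_noise(text).split()
    # Step 2: stop words removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Step 3: lemmatization via dictionary lookup (unknown words pass through).
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("<b>The</b> hero is running in 2 movies!"))
```

The cleaned token list would then be joined back into a string and passed to the BERT tokenizer in step 4.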
Figure 5. Data Flow Diagram
3.2.2 Sequence Diagram
Sequence diagrams can be useful reference diagrams for businesses and other organizations.
We draw a sequence diagram to:
• Represent the details of a UML use case.
• Model the logic of a sophisticated procedure, function, or operation.
• See how tasks are moved between objects or components of a process.
3. The dataset consists of multiple sub-datasets, such as ratings and movies, and it is stored
after cleaning and preprocessing.
4. The final step presents movie recommendations to the user, following
a ranked list recommendation approach.
Chapter 4
Methodology And Implementation
4.1 Methodology:
Before getting started, let us introduce the terminology:
We start with the sequential recommendation model called BERT4Rec, which adapts
Bidirectional Encoder Representations from Transformers to a new task, sequential
recommendation.
4.2 Understanding BERT:
The Transformer encoder is built from multiple self-attention mechanisms followed by a
feed-forward neural network, and the decoder is built similarly. BERT is an encoder-only
model, so we focus only on how the encoder works.
The self-attention outputs are calculated through a function called "attention". An attention
function can be described as mapping a query and a set of key-value pairs to an output, where
the query, keys, values, and output are all vectors. The output is computed as a weighted sum
of the values.
The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$. We
compute the dot products of the query with all keys, divide each by $\sqrt{d_k}$, and apply a
softmax function to obtain the weights on the values. In practice, we compute the attention
function on a set of queries simultaneously, packed together into a matrix Q. The keys and
values are also packed together into matrices K and V. We compute the matrix of outputs as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
● Q is the query matrix
● K is the key matrix
● V is the value matrix
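The attention formula can be checked on small matrices with a few lines of plain Python. This is a single-head sketch for illustration only: it omits the learned projection matrices and multi-head structure of a real Transformer layer, and the example matrices are hypothetical.

```python
from math import exp, sqrt


def softmax(row):
    shift = max(row)  # subtracting the maximum keeps exp() numerically stable
    exps = [exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Hypothetical inputs: two positions, d_k = d_v = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = attention(Q, K, V)
print(weights)
print(output)
```

Each row of `weights` sums to 1, and each output row is a weighted average of the rows of V, with more weight on the value whose key best matches the query.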
In addition to attention sub-layers, each of the layers in our encoder and decoder contains a
fully connected feed-forward network, which is applied to each position separately and
identically. This consists of two linear transformations with a ReLU activation in between.
non-linearity to the network, allowing it to learn complex relationships and patterns in the
data.
BERT-BASE has 12 layers in the encoder stack, while BERT-LARGE has 24. These are more
than in the Transformer architecture described in the original paper (6 encoder layers).
The BERT architectures (BASE and LARGE) also have larger hidden sizes (768 and
1024 hidden units, respectively) and more attention heads (12 and 16, respectively) than the
Transformer architecture suggested in the original paper, which has 512 hidden units and 8
attention heads.
BERT-BASE contains 110M parameters, while BERT-LARGE has 340M parameters. We use
BERT-BASE for our recommendation system. The network becomes more difficult to train
as it goes deeper. Therefore, we employ a residual connection around each of the two
sub-layers, as in Figure a, followed by layer normalization [5]. Moreover, we apply
dropout [6] to the output of each sub-layer before it is normalized. That is, the output of
each sub-layer is LN(x + Dropout(sublayer(x))), where sublayer(·) is the function
implemented by the sub-layer itself and LN is the layer normalization function.
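The expression LN(x + Dropout(sublayer(x))) can be made concrete with a small sketch. The assumptions here are ours: layer normalization is shown without its learnable gain and bias, dropout uses the usual inverted scaling, and `double` is a hypothetical stand-in for an attention or feed-forward sub-layer.

```python
import random
from math import sqrt


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learnable gain/bias)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / sqrt(var + eps) for v in x]


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def residual_block(x, sublayer, p=0.1, rng=None, training=True):
    """Compute LN(x + Dropout(sublayer(x)))."""
    rng = rng or random.Random(0)
    y = dropout(sublayer(x), p, rng, training)
    return layer_norm([a + b for a, b in zip(x, y)])


def double(x):
    # Hypothetical sub-layer used only for this example.
    return [2.0 * v for v in x]


out = residual_block([1.0, 2.0, 3.0, 4.0], double, training=False)
print(out)
```

At inference time dropout is disabled, so the block reduces to normalizing x + sublayer(x).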
As elaborated above, without any recurrence or convolution module, the Transformer layer
(Trm) is not aware of the order of the input sequence. To make use of the sequential
information of the input, we inject positional embeddings into the input item embeddings at
the bottom of the Transformer layer stack. For a given item $v_i$, its input representation
$h_i^0$ is constructed by summing the corresponding item and positional embeddings:
$$h_i^0 = v_i + p_i$$
where $v_i \in E$ is the d-dimensional embedding for item $v_i$, and $p_i \in P$ is the
d-dimensional positional embedding for position index i.
After L layers that hierarchically exchange information across all positions in the previous
layer, we get the final output $H^L$ for all items of the input sequence. Assuming that we mask
the item $v_t$ at time step t, we predict the masked item $v_t$ by applying a softmax layer to
the final output.
Softmax Layer: The softmax layer is typically used as the output layer of a neural network
to convert the raw output scores, or logits, into a probability distribution over multiple
classes. The softmax function takes a vector of real numbers as input and produces a
probability distribution as output:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
● K is the number of classes
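As a quick illustration of the softmax formula, the sketch below converts a vector of logits into probabilities and picks the most likely class. The logits are hypothetical scores for four candidate items at a masked position; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
from math import exp


def softmax(logits):
    shift = max(logits)  # avoids overflow in exp() for large logits
    exps = [exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for K = 4 candidate items
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

The probabilities sum to 1, and the item with the largest logit receives the largest probability.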
4.3 Dataset:
We acquired a movie dataset from Kaggle comprising several sub-datasets, including a
primary movie dataset and separate rating datasets.
The movie dataset, intended for content-based filtering, is composed of 45,000 rows and 24
columns. Among these columns, 12 are deemed essential for analysis: 'id', 'title', 'genres',
'original language', 'overview', 'tagline', 'production countries', 'release date', 'status',
'vote average', 'vote count', and 'runtime'. To gain insights into the data, we apply
exploratory data analysis (EDA).
Figure 11. A detailed bar graph and pie chart of the top 5 genres and their percentages in the data frame
The ratings dataset, designated for collaborative filtering, comprises 10,000 rows and
includes 4 columns. Among these columns, three are considered crucial for analysis:
'movieId,' 'userId,' and 'rating.' Each entry in the dataset corresponds to a user identified by a
unique user ID providing a rating to a specific movie identified by a unique movie ID. The
ratings provided by users for these movies fall within the range of 1 to 5.
Figure 12. A heatmap of correlation for the ratings dataframe
There are a total of 671 unique users and 9066 unique movies, and the correlation plot
among them is
4.4 Implementation
4.4.1 CBF-Recommendation using BERT:
For content-based recommendation, we consolidate various text columns such as title,
overview, genre, tagline, and production companies. Subsequently, we apply natural
language processing (NLP) preprocessing steps, as outlined earlier.
Now consider a scenario in which a movie with an average rating of 9, derived from
merely two votes, cannot be deemed superior to a film with a lower average rating of 8
that has garnered 1000 votes. In light of this, we employ IMDB's weighted rating
methodology to assess the overall quality of a movie. The weighted rating of a movie is
defined as:
where,
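The weighted rating referred to above is commonly written as $WR = \frac{v}{v+m} R + \frac{m}{v+m} C$, where v is the number of votes for the movie, m is the minimum number of votes required for a movie to be considered, R is the movie's average rating, and C is the mean rating across all movies. The sketch below assumes this form and applies it to the two-movie scenario described above; the values chosen for m and C are hypothetical.

```python
def weighted_rating(R, v, m, C):
    """IMDB-style weighted rating: shrinks R toward C when v is small."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical settings: vote threshold m and global mean rating C
m, C = 50, 6.0

few_votes = weighted_rating(R=9.0, v=2, m=m, C=C)      # rating 9 from 2 votes
many_votes = weighted_rating(R=8.0, v=1000, m=m, C=C)  # rating 8 from 1000 votes

print(round(few_votes, 2), round(many_votes, 2))  # 6.12 7.9
```

With only two votes, the first movie's score is pulled most of the way toward the global mean, so the well-attested rating of 8 ranks higher.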
We now use the pre-trained "bert-base-uncased" model. Here "base" refers to the smaller of
the two BERT model sizes, and "uncased" means that all text used during pre-training is
converted to lowercase, including both the input text and the vocabulary. For example,
"Hello" and "hello" are treated as the same word by the uncased model.
Some key parameters and details of the BERT model:
Architecture: BERT uses a transformer architecture, specifically the transformer
encoder. The transformer model enables the bidirectional processing of input sequences,
allowing it to capture contextual information effectively.
Layers: BERT consists of multiple layers of transformer blocks. The number of layers in the
model is a hyperparameter that can be adjusted. The original BERT-base model has 12
layers.
Hidden Units: Each layer in the transformer contains a certain number of hidden units,
which is another hyperparameter. The BERT-base model has 768 hidden units in each layer.
Attention Heads: The attention mechanism in the transformer is divided into attention
heads. The number of attention heads is also a hyperparameter. BERT-base has 12 attention
heads.
Vocabulary Size: BERT uses WordPiece tokenization, and the "uncased" variant has a
vocabulary size of 30,522 tokens. This vocabulary includes both whole words and subwords.
Embedding Dimension: BERT represents words as vectors in an embedding space. The
embedding dimension for BERT-base is 768.
First, we set our device to CUDA, and then we load our pre-trained BERT model.
CUDA (Compute Unified Device Architecture) is a parallel computing platform and
application programming interface (API) model created by NVIDIA. It allows developers to
use NVIDIA graphics processing units (GPUs) for general-purpose processing, not just
graphics-related tasks. CUDA enables programmers to harness the computational power of
NVIDIA GPUs to accelerate various types of applications, including scientific simulations,
data processing, machine learning, and more.
The combined columns from above are then passed through the BERT model to encode them.
Figure 13. Training BERT on movie overview
After cleaning, we are left with approximately 44,000 rows. With the default batch size
of 32, the dataset is divided into 1377 batches (roughly 44,000 / 32, with the last batch
possibly smaller than the rest).
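The relationship between row count, batch size and number of batches can be sketched as below. The row count of 44,050 is a hypothetical value chosen to be consistent with "approximately 44,000 rows" and 1377 batches; the exact number of rows after cleaning is not stated here.

```python
def batch_bounds(n_rows, batch_size=32):
    """Yield (start, end) index pairs covering n_rows in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


n_rows = 44_050  # hypothetical row count after cleaning
bounds = list(batch_bounds(n_rows))

print(len(bounds))   # number of batches
print(bounds[-1])    # the last batch is smaller than 32 rows
```

Each `(start, end)` pair would select one slice of the combined text column to tokenize and pass through the model.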
4.4.2 CF-Recommendation System Using SVD:
As previously discussed, collaborative filtering is of two types, item-based and user-based:
● User-based, which measures the similarity between target users and other users.
● Item-based, which measures the similarity between the items that target users rate or
interact with and other items.
When you have millions of users and/or items, computing pairwise correlations is expensive
and slow. We already saw that we could avoid processing all the data every time by
establishing neighbourhoods on the basis of similarity. Is it possible to reduce the size of the
ratings matrix some other way?
Yes: we can use SVD (singular value decomposition) to reduce the dimensionality of the
matrix, as explained below.
Singular Value Decomposition (SVD) is a method from linear algebra that is widely used
as a dimensionality reduction technique in machine learning. SVD is a matrix
factorisation technique that reduces the number of features of a dataset by reducing the
space dimension from N to K (where K < N). In the context of recommender systems,
SVD is used as a collaborative filtering technique. It uses a matrix structure where each row
represents a user and each column represents an item. The elements of this matrix are the
ratings that users give to items.
$$A = U \Sigma V^{T}$$
where A is the input data matrix (users' ratings), U is the matrix of left singular vectors (user
"features" matrix), Σ is the diagonal matrix of singular values (essentially the weights or
strengths of each concept), and $V^{T}$ is the matrix of right singular vectors (movie "features"
matrix). U and V are column orthogonal and represent different things: U represents how much
users "like" each feature, and $V^{T}$ represents how relevant each feature is to each movie.
With SVD, we turn the recommendation problem into an optimization problem that deals
with how well we predict the rating of items for a given user. One common metric for
this optimization is Root Mean Square Error (RMSE). A lower RMSE indicates better
performance. RMSE is minimized on the known entries in the utility matrix. SVD has the
useful property of minimal reconstruction Sum of Square Error (SSE); therefore, it is also
commonly used in dimensionality reduction. The objective is:
$$\min_{U,V} \sum_{(i,j) \in A} \left(A_{ij} - [U \Sigma V^{T}]_{ij}\right)^{2}$$
RMSE and SSE are monotonically related: the lower the SSE, the lower the RMSE. Since
SVD minimizes SSE, it also minimizes RMSE, which makes it a good tool for this
optimization problem. To predict the rating of an unseen item for a user, we multiply
U, Σ, and $V^{T}$.
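Minimizing squared error on the known entries only can be illustrated with a small latent-factor model trained by stochastic gradient descent. This sketch is an assumption-laden illustration, not the project's code and not an exact algebraic SVD: it follows the SGD-based style of factorization often used in recommender libraries, absorbs Σ into the two factor matrices `P` (users) and `Q` (items), and uses hypothetical toy ratings and hyperparameters.

```python
import random
from math import sqrt

# Hypothetical known entries of the utility matrix: (user, item, rating)
known = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0),
]
n_users, n_items, n_factors = 4, 3, 2


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(P, Q):
    """RMSE over the known entries only."""
    return sqrt(sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in known) / len(known))


def init_factors(seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    return P, Q


def train(P, Q, epochs=1000, lr=0.02, reg=0.02):
    """Minimize squared error on known entries with SGD and L2 regularization."""
    for _ in range(epochs):
        for u, i, r in known:
            err = r - dot(P[u], Q[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


P, Q = init_factors()
rmse_before = rmse(P, Q)
P, Q = train(P, Q)
rmse_after = rmse(P, Q)
unseen = dot(P[0], Q[2])  # predicted rating of user 0 for item 2, which they have not rated

print(rmse_before, rmse_after, unseen)
```

The prediction for an unseen user-item pair is the dot product of the corresponding user and item factor vectors, which plays the role of the product of U, Σ and V^T in the text.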
● A · B represents the dot product of the two vectors, and |A| |B| the product of their magnitudes
Our dataset comprises 45,000 rows, and plotting all the cosine similarity values would result
in a cluttered visualization. Therefore, let's focus on examining the first 10 plots to gain
insights.
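Cosine similarity between embedding vectors, and the ranking it induces, can be sketched as follows. The three-dimensional vectors and movie names are hypothetical stand-ins for the 768-dimensional BERT embeddings of real titles.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical low-dimensional embeddings
embeddings = {
    "Movie A": [1.0, 0.0, 1.0],
    "Movie B": [1.0, 0.0, 0.9],
    "Movie C": [0.0, 1.0, 0.0],
    "Movie D": [0.5, 0.5, 0.5],
}


def top_k(title, k=2):
    """Return the k titles most similar to `title`, most similar first."""
    query = embeddings[title]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


print(top_k("Movie A"))
```

In the full system the same ranking would be computed over all movie embeddings, and only the top 10 titles would be returned.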
For a recommendation system that provides a list of the top 10 recommendations based on user
input, there are a few evaluation metrics to assess its performance. Here are some evaluation
approaches:
Precision@k:
Precision@k measures the proportion of relevant items among the top-k recommended
items. If a user interacts with or likes certain items, the precision of the system is
calculated by checking how many of the top 10 recommended items are relevant to that
user. Here we take k as 10.
For example, a Precision@10 of 10.00% means that 1 out of the 10 recommended movies was
relevant, according to the ground truth ratings.
The Precision@10 for our Hybrid BERT recommendation system is:
Precision@10: 90.00%
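Precision@k as described above can be computed with a short function. The item identifiers and relevance sets below are hypothetical and only illustrate the calculation; they are not the data behind the reported figure.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and ground-truth relevant items
recommended = [f"m{i}" for i in range(1, 11)]      # m1 .. m10
relevant_many = {f"m{i}" for i in range(1, 10)}    # m1 .. m9 are relevant
relevant_one = {"m3"}

print(precision_at_k(recommended, relevant_many))  # 0.9
print(precision_at_k(recommended, relevant_one))   # 0.1
```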
RMSE (Root Mean Squared Error) is a widely used metric in recommendation systems to
evaluate the accuracy of predicted ratings compared to the actual ratings given by users. It is
particularly relevant when dealing with collaborative filtering algorithms that predict user
preferences for items based on the preferences of other users.
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$$
where
$y_i$ is the actual value of the dependent variable for the i-th observation,
$\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
n is the number of observations in the dataframe.
To evaluate the collaborative filtering model with RMSE, we use the Surprise library, which
provides various ready-to-use prediction algorithms, including SVD, and we evaluate its
RMSE (Root Mean Squared Error) on the ratings dataset. Surprise is a Python scikit for
building and analysing recommender systems.
The name Surprise stands for "Simple Python Recommendation System Engine". The library
provides a convenient and easy-to-use interface for implementing and testing various
recommendation algorithms, and it is particularly popular for collaborative filtering-based
recommendation systems.
The main limitation of the Surprise library is that it focuses on collaborative filtering, so we
cannot directly use it to evaluate the hybrid recommendation engine.
Mean absolute error (MAE) is the most straightforward evaluation metric. It is the average
absolute difference between what a user actually rates a movie and what our system
predicts:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where
$y_i$ is the actual value of the dependent variable for the i-th observation,
$\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
n is the number of observations in the dataframe.
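Both error metrics follow directly from their definitions. The sketch below computes them on a small set of hypothetical actual and predicted ratings.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


# Hypothetical ratings
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

RMSE is never smaller than MAE on the same data, because squaring gives larger errors more weight.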
To evaluate our CF-SVD model, we compute RMSE and MAE with 5-fold cross-validation. Folds
are the divisions or partitions of the dataset that are created for the purpose of training and
testing a model. Cross-validation assesses the performance of a model by splitting the dataset
into multiple subsets or folds, training the model on some of these folds, and evaluating it on
the remaining fold(s).
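The fold mechanics described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `cross_validate_mean_baseline` are hypothetical, the ratings are toy values, and a global-mean predictor stands in for the SVD model so that the example stays self-contained.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate_mean_baseline(ratings, n_folds=5, seed=0):
    """Return one (rmse, mae) pair per fold.

    Assumption: the "model" is just the mean of the training ratings,
    used here as a stand-in for SVD.
    """
    results = []
    for held_out in k_fold_indices(len(ratings), n_folds, seed):
        held_out_set = set(held_out)
        # Train on every rating outside the held-out fold.
        train = [r for i, r in enumerate(ratings) if i not in held_out_set]
        test = [ratings[i] for i in held_out]
        prediction = sum(train) / len(train)
        # Evaluate on the held-out fold only.
        errors = [y - prediction for y in test]
        fold_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        fold_mae = sum(abs(e) for e in errors) / len(errors)
        results.append((fold_rmse, fold_mae))
    return results


# Hypothetical ratings, not taken from our dataset.
toy_ratings = [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 1.0, 4.0, 3.5, 2.5]
fold_scores = cross_validate_mean_baseline(toy_ratings)
for fold_number, (fold_rmse, fold_mae) in enumerate(fold_scores, start=1):
    print(f"Fold {fold_number}: RMSE={fold_rmse:.3f} MAE={fold_mae:.3f}")
```

Each rating is used for testing exactly once and for training in the other four rounds. In Surprise, the corresponding step is, as far as we understand its API, a call to `cross_validate` from `surprise.model_selection` with an `SVD()` algorithm, `measures=["RMSE", "MAE"]` and `cv=5`.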
5.3 Results:
If the movie title is found in the dataframe, the system adopts a hybrid approach, providing
movie recommendations and criteria.
Movie_recommendations:
Criteria:
The criteria comprise the similarity score, which describes how similar the movies are to
each other, and the weighted rating, which was discussed in the implementation section.
Chapter 6
Conclusion and Future Scope
6.1 Conclusion
In this project, we created a recommendation system using two approaches. First, we looked
at the content of movies using BERT to understand their details. This helped us capture
subtle similarities in movie content based on their descriptions. Second, we considered what
similar users liked using collaborative filtering. We used SVD to handle missing data in our
ratings.
Combining these methods resulted in a smart recommendation system. It thinks about both
the content of movies and what people generally enjoy. It turned out to be quite effective,
with a 90% accuracy in suggesting movies that users might like. The collaborative filtering
part, which checks what similar users liked, achieved an RMSE of 0.89 (an error measure,
where lower is better), showing it's doing a good job too.
In simpler terms, our recommendation system acts like a helpful friend who understands
both the details of movies and what people typically enjoy. This makes it better at suggesting
movies you might really enjoy watching. It's like having a buddy who knows your taste in
movies and gives you great suggestions!
Future Scope:
For future enhancements, there's an opportunity to refine our recommendation system by
fine-tuning the BERT model on a more extensive dataset. Currently, we've utilized a pre-
trained BERT base model, but by subjecting it to thorough training on a custom dataset, we
can expect superior performance and heightened accuracy, particularly tailored to our
specific application.
APPENDIX
REFERENCES
4. H. Wang, "ZeroMat: Solving Cold-start Problem of Recommender System with No
Input Data," 2021 IEEE 4th International Conference on Information Systems and
Computer Aided Education (ICISCAE), Dalian, China, 2021, pp. 102-105.
5. R. Sharma, S. Rani and S. Tanwar, "Machine Learning Algorithms for building
Recommender Systems," 2019 International Conference on Intelligent Computing and
Control Systems (ICCS), Madurai, India, 2019, pp. 785-790.
6. M. E. B. H. Kbaier, H. Masri and S. Krichen, "A Personalized Hybrid Tourism
Recommender System," 2017 IEEE/ACS 14th International Conference on Computer
Systems and Applications (AICCSA), Hammamet, Tunisia, 2017, pp. 244-250.
7. Y. Wang, S. C.-F. Chan, and G. Ngai, "Applicability of demographic recommender
system to tourist attractions: a case study on trip advisor," in Proceedings of the
2018 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and
Intelligent Agent Technology, Volume 03. IEEE Computer Society, 2018.
8. H.-W. Chen, Y.-L. Wu, M.-K. Hor and C.-Y. Tang, "Fully content-based movie
recommender system with feature extraction using neural network," 2017 International
Conference on Machine Learning and Cybernetics (ICMLC), Ningbo, China, 2017, pp.
504-509.
9. R. Esmeli, M. Bader-El-Den and H. Abdullahi, "Using Word2Vec Recommendation for
Improved Purchase Prediction," 2020 International Joint Conference on Neural
Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8.
10. Sun, Fei, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang.
"BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from
Transformers." arXiv preprint arXiv:1904.06690 (2019).
11. Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2021. A Survey on
Conversational Recommender Systems. ACM Comput. Surv. 54, 5, Article 105 (June
2022), 36 pages.
12. R. Ahuja, A. Solanki and A. Nayyar, "Movie Recommender System Using K-Means
Clustering AND K-Nearest Neighbor," 2019 9th International Conference on Cloud
Computing, Data Science & Engineering (Confluence), Noida, India, 2019, pp. 263-268.
M. U. Gul, K. John Pratheep, M. Junaid and A. Paul, "Spiking Neural Network
(SNN) for Crop Yield Prediction," 2021 9th International Conference on Orange
Technology (ICOT), Tainan, Taiwan, 2021, pp. 1-4.
13. Tan, Y.; Zhang, M.; Liu, Y.; and Ma, S. 2016. Rating-Boosted Latent Topics:
Understanding Users and Items with Ratings and Reviews. In IJCAI, 2640–2646.
IJCAI/AAAI Press.
14. McAuley, J. J.; and Leskovec, J. 2013. Hidden factors and hidden topics: understanding
rating dimensions with review text. In RecSys, 165–172. ACM.
15. Zheng, L.; Noroozi, V.; and Yu, P. S. 2017. Joint Deep Modeling of Users and Items
Using Reviews for Recommendation. In WSDM, 425–434. ACM.
16. Howard, J.; and Ruder, S. 2018. Universal Language Model Fine-tuning for Text
Classification. In ACL (1), 328–339. Association for Computational Linguistics.
17. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language
models are unsupervised multitask learners. OpenAI Blog 1(8): 9.
18. Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and
Zettlemoyer, L. 2018. Deep Contextualized Word Representations. In NAACL-HLT,
2227–2237. Association for Computational Linguistics.
19. Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube
recommendations. In Proceedings of the 10th ACM conference on recommender
systems, pages 191–198. ACM.
20. Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based
recommender systems: State of the art and trends. In Recommender systems handbook,
pages 73–105. Springer.