HYBRID BERT REC: BERT BASED HYBRID RECOMMENDER
Bachelor of Technology
in
CSE (Data Science)
by
January, 2024
DECLARATION
I certify that
a. The work contained in this report is original and has been done by me under the guidance
of my supervisor(s).
b. The work has not been submitted to any other Institute for any degree or diploma.
c. I have followed the guidelines provided by the Institute in preparing the report.
d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
e. Whenever I have used materials (data, theoretical analysis, figures, and text) from other
sources, I have given due credit to them by citing them in the text of the report and giving
their details in the references. Further, I have taken permission from the copyright owners
of the sources, whenever necessary.
APPROVAL SHEET
This project phase-I report entitled Hybrid-BERT Rec: BERT Based Hybrid Recommender,
submitted by Mr. Arandkar Vishal, Mr. G. Tushar Reddy, and Mr. Ch. Bala Varun Chary,
is approved for the award of the Degree of Bachelor of Technology in Data Science.
Examiner Supervisor
Principal
Dr. L V Narasimha Prasad
Date:
Place:
CERTIFICATE
This is to certify that the project phase-I report entitled Hybrid-BERT Rec: BERT Based Hybrid
Recommender, submitted by Mr. Arandkar Vishal, Mr. Gali Tushar Reddy and Mr. Ch.
Bala Varun Chary to the Institute of Aeronautical Engineering, Hyderabad, in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in CSE (Data
Science), is a bona fide record of work carried out by them under my/our guidance and
supervision. In whole or in part, the contents of this report have not been submitted to any other
institute for the award of any Degree.
Date:
ACKNOWLEDGEMENT
With Gratitude,
ABSTRACT
In the world of e-commerce, the dynamics of user behaviour evolve rapidly, shaped by societal trends
and changing preferences. This project introduces an innovative approach to recommendation systems
for e-commerce websites, focusing on the dynamic nature of user interactions. Leveraging content-based
filtering (CBF) and collaborative filtering, the project pioneers the integration of a bidirectional
self-attention network inspired by BERT, designed to capture and adapt to sequential user behaviours.
Sequential recommender systems seek to capture information about user affinities and behaviours
from their sequential series of interactions. In this project, we build on BERT4Rec, a sequential
recommendation approach based on the bidirectional encoder of the self-attention-based Transformer
mechanism. BERT4Rec applies the Bidirectional Encoder Representations from Transformers
(BERT) technique to model user behaviour sequences by considering the target user's historical data,
i.e., a content-based filtering (CBF) approach. Despite BERT4Rec's effectiveness, we argue that
considering only this historical data is insufficient to provide the most accurate recommendations.
We therefore propose HybridBERT4Rec, which applies BERT to both CBF and collaborative filtering (CF).
For CBF, we extract the characteristics of the target user's interactions with purchased items. For CF,
we find neighbouring users who are similar to the target user: we extract the target item's
characteristics using all other users who rated the target item as a second input to BERT, which
generates a target item profile. After obtaining both profiles, we use them to predict a rating score.
We experimented with three datasets, finding that our model was more accurate than the original BERT4Rec.
TABLE OF CONTENTS
COVER PAGE i
TITLE PAGE ii
DECLARATION iii
APPROVAL SHEET iv
CERTIFICATE v
ACKNOWLEDGMENT vi
ABSTRACT vii
CONTENTS viii
LIST OF FIGURES x
LIST OF ABBREVIATIONS xi
CHAPTER 1 INTRODUCTION 1
1.1 Introduction 1
1.2 Existing System 2
1.3 Proposed System 3
4.2.2 Attention mechanism 13
4.2.3 Feed forward neural network 13
4.2.4 Stacking and Transformer Layer 14
4.2.5 Embeddings and softmax 14
4.3 Dataset 15
4.4 Implementation 17
4.4.1 CBF recommendation using BERT 17
4.4.2 CF recommendations using SVD 19
4.4.3 Hybrid Recommendation 20
References 27
Appendices 27
LIST OF FIGURES
LIST OF ABBREVIATIONS
Chapter 1
Introduction
1.1 Introduction
Existing CBF systems make suggestions based on item attributes and user profile information.
They assume that if a user has expressed interest in something, they will continue to do so in
the long term. Comparable items are frequently grouped based on their features. User profiles
are created by analysing previous interactions or by directly asking users about their interests.
Systems that additionally use individual and interpersonal user data are not regarded as purely
content-based.
Moreover, collaborative filtering is classified into two types:
a) Memory-based approaches: This is also known as neighbourhood-based collaborative filtering.
Ratings of user-item pairs are predicted based on their proximity [33]. Memory-based
collaborative filtering is further classified into two types: user-based collaborative filtering and
item-based collaborative filtering. User-based filtering assumes that strong and relevant
recommendations come from like-minded people, while item-based filtering suggests items based
on their perceived relevance, as determined by customer ratings.
i) User-Based Collaborative Filtering (UB-CF): recommends items to a user by identifying
similar users and predicting the user's preferences based on their ratings. It employs a user-item
matrix, predicting unrated items for the active user by comparing their preferences to
those of similar users. The K-nearest neighbours (KNN) algorithm is commonly used to find
similar users, making the approach easy to implement and often more accurate than some
techniques such as content-based methods. However, it faces challenges with sparsity in user
ratings, scalability as the user base grows, and new users and items (cold-start problems).
A minimal sketch of user-based filtering appears after this classification.
ii) Item-Based Collaborative Filtering (IB-CF): focuses on finding items similar to those the
user has already preferred. It involves calculating similarity among items using techniques like
cosine-based similarity or correlation-based similarity. Unlike UB-CF, IB-CF pre-calculates
item similarities.
b) Model-based approaches: Model-based collaborative filtering does not need to keep the full
rating matrix in memory. Instead, machine-learning models are used to predict and estimate how
a customer would rate each product. These algorithms predict unrated products from existing
customer ratings and are further divided into different subsets, i.e., matrix-factorization-based
algorithms, deep learning methods, and clustering algorithms.
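To make the memory-based idea concrete, the sketch below predicts a rating as the similarity-weighted average of the ratings that the k most similar users gave to the item. The toy matrix and the function name are illustrative, not the project's code:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item rating matrix (rows: users, columns: items; 0 = unrated).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def predict_ubcf(ratings, user, item, k=2):
    """User-based CF: similarity-weighted average of the ratings
    that the k nearest neighbours gave to the target item."""
    sims = cosine_similarity(ratings)[user]       # similarity to every user
    sims[user] = -1.0                             # exclude the target user
    rated = np.where(ratings[:, item] > 0)[0]     # users who rated the item
    neighbours = rated[np.argsort(sims[rated])[::-1][:k]]
    w = sims[neighbours]
    return float(np.dot(w, ratings[neighbours, item]) / (w.sum() + 1e-9))

print(predict_ubcf(ratings, user=0, item=2))  # predicted rating of item 2 for user 0
```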
The proposed BERT-based solution tries to mitigate existing problems such as:
1. Cold Start Problem Mitigation: Instead of relying on unique item identifiers to aggregate
historical information, our approach utilizes only the item's title as content, coupled with token
embeddings. This helps address the cold start problem, a significant shortcoming of traditional
recommendation algorithms, by allowing the model to provide meaningful recommendations
even when historical data is limited.
2. User Latent Interest Learning: By training our model with user behaviour data, we aim to
enable the system to learn not only item similarities but also the latent interests of users. This
contrasts with traditional recommendation algorithms and some pair-wise deep learning
algorithms that primarily focus on providing similar items based on past purchases. Our
approach seeks to offer more personalized recommendations by capturing the nuanced
preferences of users.
• In the CF part of HybridBERT4Rec, our solution involves extracting the target item representation,
capturing the similarity levels between all neighbours and the target user. During training, a
random user masking technique is employed on the user sequence associated with each target
item. This process aims to enable the model to reconstruct the masked user's original
embedding as accurately as possible. Once training is completed, the network is capable of
constructing the next user representation based on the characteristics of users interacting with
the target item. During testing, the target item representation is constructed by masking the
target user and adding it to the end of the user sequence. This representation encapsulates the
comparison values between each neighbouring user and the target user, signifying the
similarity levels.
Chapter 2
Literature Survey
F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor [1]:
This is the first comprehensive handbook dedicated entirely to recommendation systems (RS),
applications that offer relevant items to users, from simple book recommendations to more
complex recommendations such as those from conversational recommender systems.
Recommender Systems (RSs) are software tools and techniques providing suggestions for
items likely to be of use to a user. The suggestions relate to various decision-making processes,
such as what items to buy, what music to listen to, or what online news to read. Although not
exclusively about recommendation systems, the book covers large-scale data mining,
including collaborative filtering and recommendation algorithms.
movie. Word2Vec is a popular word embedding technique that can also be applied in
recommendation systems. The FCMR system was able to achieve good performance on a
number of metrics, but the authors also suggested that it would be best to use a hybrid
recommender system that combines both content-based and collaborative filtering approaches,
and noted that Word2Vec cannot capture long-term dependencies.
Sun, Fei, Rui Wang, Rui Zhang, Xin Chen, Xudong Hu, and Zhiyuan Liu [5]:
proposed BERT4Rec. BERT4Rec is a recommendation system that uses BERT
(Bidirectional Encoder Representations from Transformers) to generate embeddings for items.
These embeddings are then used to calculate the similarity between items, which can be used
to make recommendations. BERT4Rec has been shown to be effective for a variety of
recommendation tasks, including sequential recommendation, cross-domain recommendation,
and cold-start recommendation. BERT4Rec refers to the application of BERT in the context of
recommender systems. BERT is a pre-trained transformer-based model that has been highly
successful in natural language processing tasks. When applied to recommender systems, BERT
can learn complex patterns and dependencies in user-item interactions, potentially leading to
more accurate and personalized recommendations. The application of BERT in recommender
systems has gained attention because of its ability to capture long-range dependencies and
context in user-item interactions, allowing the model to understand item semantics and user
preferences more effectively. Related models such as SASRec and KeBERT4Rec have also been
proposed, each with its own advantages; for example, Self-Attentive Sequential
Recommendation (SASRec) can capture the relationships between items that are far apart in
the user's history.
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang [6]:
Modelling users' dynamic preferences from their historical behaviours is challenging and
crucial for recommendation systems. Previous methods employ sequential neural networks to
encode users' historical interactions from left to right into hidden representations for making
recommendations. Despite their effectiveness, the authors argue that such left-to-right
unidirectional models are sub-optimal due to limitations including: a) unidirectional
architectures restrict the power of hidden representations of users' behaviour sequences;
b) they often assume a rigidly ordered sequence, which is not always practical. To address
these limitations, they proposed a sequential recommendation model called BERT4Rec, which
employs deep bidirectional self-attention to model user behaviour sequences. To avoid
information leakage and efficiently train the bidirectional model, they adopt the Cloze
objective for sequential recommendation, predicting randomly masked items in the sequence
by jointly conditioning on their left and right context.
Anushree H and Shashidhara H S, Efficient Recommendation System Using BERT
Technology, International Journal of Advanced Research in Engineering and
Technology [7]:
Modelling the diverse desires of users from their historical behaviour is difficult and essential
for recommendation systems. Past techniques utilize sequential neural networks to convert the
historical experiences of customers, from left to right, into encoded suggestion models.
Recommendation systems in e-commerce are becoming an integral means of helping
consumers navigate the accessible content. Recommender systems are an important aspect of
e-commerce platforms that assist consumers in choosing products of choice at a large scale.
Several authors have used metadata termed the contextual phrase, which incorporates the
reference label and the description of the quote, to locate relevant referenced research. The
lack of a well-established benchmarking dataset, and of a recommendation tool that achieves
high efficiency, has made this study challenging. Personalized recommendation systems are
also the foundation of shared marketplace platforms in the rental property field, where such a
framework provides a valuable tool to support users. The technique presented aims to
determine the context of a movie's plot summary using BERT-as-a-service and to predict
similar movie recommendations.
Yuyangzi Fu, Tian Wang [8]:
In e-commerce, recommender systems have become an indispensable part of helping users
explore the available inventory. In this work, the authors present a novel approach for
item-based collaborative filtering, leveraging BERT to understand items and score the
relevancy between different items. The proposed method addresses problems that plague
traditional recommender systems, such as cold start and "more of the same" recommended
content. They conducted experiments on a large-scale real-world dataset with a full cold-start
scenario, and the proposed approach significantly outperforms the popular Bi-LSTM model.
Chapter 3
Project Architecture
3.1 High-Level System Architecture
The main component of our architecture is the recommendation engine, which consists of three
main components:
1. Collaborative Filtering
2. Content-Based Filtering
3. Hybrid Combination
2. Content-Based Filtering: The content-based recommender relies on the similarity of the
items being recommended. The basic idea is that if you like an item, then you will also like a
"similar" item. It generally works well when it is easy to determine the context/properties of
each item.
A content-based recommender works with data that the user provides, for example explicit
movie ratings from the MovieLens dataset. Based on that data, a user profile is generated, which
is then used to make suggestions to the user. As the user provides more inputs or acts on the
recommendations, the engine becomes more and more accurate.
Figure 4. The architecture of the HybridBERT4Rec model, which comprises a CBF part,
a CF part, and a prediction layer
3.2 UML Diagrams
3.2.1 Data Flow Diagram:
The fundamental data workflow encompasses the collection of data, analysis of the raw data,
subsequent cleaning and preprocessing, model application, and analysis of the results.
Let's look at how the text processing is done:
When preprocessing text data, the focus is on refining the data by eliminating unwanted
elements, removing stop words, and applying lemmatization.
1. Removing Noise from Data:
Before extracting meaningful information, it is crucial to eliminate any irrelevant or
extraneous elements that might hinder the analysis process. This noise removal step involves
identifying and discarding data components that do not contribute to the overall understanding
or insights derived from the data.
2. Stop Words Removal:
Stop words, commonly occurring words in a language (e.g., 'the,' 'is,' 'and'), are often devoid
of significant semantic meaning and can add noise to textual data. Stop words removal
involves systematically excluding these words from the dataset to streamline the analysis
process, allowing the focus to shift toward more contextually relevant terms. This enhances
the efficiency of subsequent natural language processing tasks.
3. Lemmatization:
Lemmatization is a linguistic process aimed at reducing words to their base or root form,
known as lemmas. This normalization technique ensures that different grammatical variations
of a word, such as verb conjugations or plural forms, are transformed into a common base. By
doing so, lemmatization facilitates more accurate and consistent analysis of the text data,
enabling improved understanding and interpretation of the content.
4. BERT Model:
Utilizing transformers, we load the pre-trained BERT model known as bert-base-uncased, a
comprehensive discussion of which will be provided in Chapter 4. The process involves
tokenizing sentences and subsequently encoding them into embeddings by passing them
through the BERT model. This mechanism allows us to harness the power of pre-trained
language representations for diverse natural language processing tasks.
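As a concrete illustration of preprocessing steps 1-3 above, here is a minimal sketch using NLTK; the exact libraries and cleaning rules in the project may differ:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("stopwords", "wordnet", "omw-1.4"):
    nltk.download(pkg, quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    """Noise removal, stop-word removal, and lemmatization."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())              # 1. strip noise
    tokens = [t for t in text.split() if t not in stop_words]  # 2. drop stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)   # 3. lemmatize

print(preprocess("The wolves were howling in the forests!"))
# -> "wolf howling forest"
```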
Figure 5. Data Flow Diagram
3.2.2 Sequence Diagram
• See how tasks are moved between objects or components of a process
3. The dataset consists of multiple sub-datasets, such as ratings and movies, which are stored
after we perform cleaning and preprocessing on them.
4. The final step involves presenting movie recommendations to the user, following
a ranked-list recommendation approach.
Chapter 4
Methodology And Implementation
4.1 Methodology:
Before getting started, let's introduce the terminology.
We begin with the sequential recommendation model called BERT4Rec, which adapts
Bidirectional Encoder Representations from Transformers to a new task: sequential
recommendation.
4.2 Understanding BERT:
4.2.1 Transformer Layer: Let's understand with an example. In machine translation, text is
translated from one language to another through a Transformer: the source language is first
encoded into embeddings by the "encoder", and the "decoder" then decodes it into the target
language.
The encoder is built from multiple self-attention mechanisms plus a feed-forward neural
network, and the same holds for the decoder. BERT is an encoder-only model, so we will focus
on how the encoder works.
4.2.2 Attention Mechanism: The encoder's outputs are calculated through a function called
"attention". An attention function can be described as mapping a query and a set of key-value
pairs to an output, where the query, keys, values, and output are all vectors. The output is
computed as a weighted sum of the values, where the weight assigned to each value is computed
from the compatibility of the query with the corresponding key.
The input consists of queries and keys of dimension dk, and values of dimension dv. We
compute the dot products of the query with all keys, divide each by √ dk, and apply a softmax
function to obtain the weights on the values. In practice, we compute the attention function on
a set of queries simultaneously, packed together into a matrix Q. The keys and values are also
packed together into matrices K and V. We compute the matrix of outputs as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
● Q is the query matrix
● K is the key matrix
● V is the value matrix
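Since the attention formula above is just matrix algebra, a tiny NumPy sketch makes it concrete. This is a generic illustration of scaled dot-product attention, not the project's training code, and the shapes are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarity scores
    return softmax(scores) @ V        # weighted sum of the values

Q = np.random.randn(4, 8)    # 4 queries of dimension d_k = 8
K = np.random.randn(6, 8)    # 6 keys of dimension d_k = 8
V = np.random.randn(6, 16)   # 6 values of dimension d_v = 16
print(attention(Q, K, V).shape)  # (4, 16)
```

4.2.3 Feed-Forward Neural Network: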
In addition to attention sub-layers, each of the layers in our encoder and decoder contains a
fully connected feed-forward network, which is applied to each position separately and
identically. This consists of two linear transformations with a ReLU activation in between.
Mathematically, ReLU is defined as f(x)=max(0,x), meaning it replaces any negative values
with zero while leaving positive values unchanged. The purpose of ReLU is to introduce non-
linearity to the network, allowing it to learn complex relationships and patterns in the data.
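The two-linear-layers-with-ReLU structure can be sketched in a few lines of PyTorch; the inner size d_ff = 3072 is BERT-Base's conventional value and is an assumption here, not a setting specified in this report:

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Two linear transformations with a ReLU in between,
    applied identically at every sequence position."""
    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # first linear transformation
            nn.ReLU(),                  # f(x) = max(0, x)
            nn.Linear(d_ff, d_model),   # second linear transformation
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return self.net(x)

ffn = PositionwiseFeedForward()
print(ffn(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```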
Figure 5. Feed Forward Neural Network (Encoder Architecture)
4.2.4 Stacking Transformer Layers:
BERT-Base has 12 layers in the encoder stack, while BERT-Large has 24 layers. Both are deeper
than the Transformer architecture described in the original paper (6 encoder layers).
The BERT architectures (Base and Large) also have larger hidden sizes (768 and 1024 units
respectively) and more attention heads (12 and 16 respectively) than the original Transformer,
which uses 512 hidden units and 8 attention heads.
BERT-Base contains 110M parameters, while BERT-Large has 340M; we use BERT-Base for
our recommendation system. However, the network becomes more difficult to train as it goes
deeper. Therefore, we employ a residual connection around each of the two sub-layers as in
Figure a, followed by layer normalization [5]. Moreover, we also apply dropout [6] to the output
of each sub-layer, before it is normalized. That is, the output of each sub-layer is
LN(x + Dropout(sublayer(x))), where sublayer(·) is the function implemented by the sub-layer
itself and LN is the layer normalization function.
As elaborated above, without any recurrence or convolution module, the Transformer layer
(Trm) is not aware of the order of the input sequence. In order to make use of the sequential
information of the input, we inject positional embeddings into the input item embeddings at
the bottom of the Transformer layer stacks. For a given item $v_i$, its input representation
$h_i^0$ is constructed by summing the corresponding item and positional embeddings:

$$h_i^0 = v_i + p_i$$

where $v_i \in E$ is the $d$-dimensional embedding for item $v_i$, and $p_i \in P$ is the
$d$-dimensional positional embedding for position index $i$.
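A toy PyTorch sketch of this input construction, where learned item and positional embedding tables are summed element-wise; the table sizes are illustrative, not the project's settings:

```python
import torch
import torch.nn as nn

num_items, max_len, d = 10000, 50, 768          # illustrative sizes

item_emb = nn.Embedding(num_items, d)           # v_i: learned item embeddings
pos_emb = nn.Embedding(max_len, d)              # p_i: learned positional embeddings

items = torch.randint(0, num_items, (1, max_len))   # one input item sequence
positions = torch.arange(max_len).unsqueeze(0)      # indices 0 .. max_len-1

h0 = item_emb(items) + pos_emb(positions)       # h_i^0 = v_i + p_i
print(h0.shape)                                 # torch.Size([1, 50, 768])
```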
After $L$ layers that hierarchically exchange information across all positions in the previous
layer, we get the final output $H^L$ for all items of the input sequence. Assuming that we mask
the item $v_t$ at time step $t$, we then predict the masked item $v_t$ by applying a softmax
layer at the end.
Softmax Layer: The softmax is typically used as the output layer of a neural network to convert
the raw output scores, or logits, into a probability distribution over multiple classes. The
softmax function takes a vector of real numbers as input and produces a probability distribution
as output:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
4.3 Dataset
The movie dataset, intended for content-based filtering, is composed of 45,000 rows and 24
columns. Among these columns, 12 are deemed essential for analysis: 'id', 'title', 'genres',
'original language', 'overview', 'tagline', 'production countries', 'release date', 'status',
'vote average', 'vote count', and 'runtime'. To gain insights into the data, we apply exploratory
data analysis (EDA) techniques.
The ratings dataset, designated for collaborative filtering, comprises 10,000 rows and includes
4 columns. Among these columns, three are considered crucial for analysis: 'movieId,' 'userId,'
and 'rating.' Each entry in the dataset corresponds to a user identified by a unique user ID
providing a rating to a specific movie identified by a unique movie ID. The ratings provided
by users for these movies fall within the range of 1 to 5.
There are a total of 671 unique users and 9,066 unique movies; the correlation plot among
them is shown below.
4.4 Implementation
4.4.1 CBF-Recommendation using BERT:
For content-based recommendation, we consolidate various text columns such as title,
overview, genre, tagline, and production companies. Subsequently, we apply natural language
processing (NLP) preprocessing steps, as outlined earlier.
Now, let's contemplate a scenario wherein a movie boasting an average rating of 9, derived
from merely two votes, cannot be deemed superior to a film with a lower average rating of 8
that has garnered 1000 votes. In light of this, we opt to employ IMDB's weighted-rating
methodology to assess the overall quality of a movie. The weighted rating of a movie is defined
as:
$$\mathrm{Weighted\ Rating} = \left(\frac{v}{v+m}\right)R + \left(\frac{m}{v+m}\right)C$$

where $v$ is the number of votes for the movie, $m$ is the minimum number of votes required
to be listed, $R$ is the average rating of the movie, and $C$ is the mean vote across the whole
dataset.
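A small pandas sketch of this weighted rating; the cutoff m and the dataset mean C are assumed placeholder values here, whereas in practice they would be computed from the full movie dataset (e.g., m as a high vote-count quantile):

```python
import pandas as pd

m = 500    # assumed minimum vote count required to be listed
C = 7.0    # assumed mean vote across the whole dataset

movies = pd.DataFrame({
    "title":        ["High avg, few votes", "Lower avg, many votes"],
    "vote_average": [9.0, 8.0],    # R
    "vote_count":   [2, 1000],     # v
})

def weighted_rating(row):
    v, R = row["vote_count"], row["vote_average"]
    return (v / (v + m)) * R + (m / (v + m)) * C

movies["score"] = movies.apply(weighted_rating, axis=1)
print(movies.sort_values("score", ascending=False))
# The 1000-vote movie (~7.67) now outranks the 2-vote movie (~7.01).
```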
Now we use the pre-trained "bert-base-uncased" model: "base" refers to the smaller model, and
"uncased" indicates that all text used during pre-training was converted to lowercase, including
both the input text and the vocabulary. For example, "Hello" and "hello" are treated as the same
word in the "uncased" model.
Some key parameters and details of the BERT model:
Architecture: BERT utilizes a transformer architecture, specifically the transformer encoder.
The transformer model enables the bidirectional processing of input sequences, allowing it to
capture contextual information effectively.
Layers: BERT consists of multiple layers of transformer blocks. The number of layers in the
model is a hyperparameter that can be adjusted; the original BERT-base model has 12 layers.
Hidden Units: Each layer in the transformer contains a certain number of hidden units, which
is another hyperparameter. The BERT-base model has 768 hidden units in each layer.
Attention Heads: The attention mechanism in the transformer is divided into attention heads.
The number of attention heads is also a hyperparameter. BERT-base has 12 attention heads.
Vocabulary Size: BERT uses WordPiece tokenization, and the "uncased" variant has a
vocabulary size of 30,522 tokens. This vocabulary includes both whole words and subwords.
Embedding Dimension: BERT represents words as vectors in an embedding space. The
embedding dimension for BERT-base is 768.
First, we set the device to CUDA; then we load our pre-trained BERT model:
CUDA (Compute Unified Device Architecture) is a parallel computing platform and
application programming interface (API) model created by NVIDIA. It allows developers to
use NVIDIA graphics processing units (GPUs) for general-purpose processing, not just
graphics-related tasks. CUDA enables programmers to harness the computational power of
NVIDIA GPUs to accelerate various types of applications, including scientific simulations,
data processing, machine learning, and more.
Now the combined text columns from above are passed through the BERT model to encode
them. After cleaning, we are left with approximately 44,000 rows; with the default batch size
of 32, the dataset is divided into roughly 1,377 batches. A minimal sketch of this encoding
step follows.
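A minimal sketch of this step with the Hugging Face transformers library; mean pooling over the last hidden states is an assumption here (one common way to obtain a single 768-dimensional embedding per text), not necessarily the exact pooling used in the project:

```python
import torch
from transformers import BertTokenizer, BertModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").to(device)
model.eval()

texts = ["toy story animation family", "heat crime thriller"]  # combined text columns

@torch.no_grad()
def encode(batch):
    """Tokenize a batch of texts and mean-pool BERT's last hidden
    states into one 768-dimensional embedding per text."""
    enc = tokenizer(batch, padding=True, truncation=True,
                    max_length=128, return_tensors="pt").to(device)
    hidden = model(**enc).last_hidden_state        # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)     # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

print(encode(texts).shape)  # torch.Size([2, 768])
```

In the actual pipeline, the roughly 44,000 combined texts would be looped over in batches of 32 and the resulting embeddings stacked.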
4.4.2 CF Recommendation Using SVD:
As previously discussed, collaborative filtering is of two types, item-based and user-based:
● User-based, which measures the similarity between target users and other users.
● Item-based, which measures the similarity between the items that target users rate or
interact with and other items.
When you have millions of users and/or items, computing pairwise correlations is expensive
and slow. We already saw that we could avoid processing all the data every time by
establishing neighbourhoods on the basis of similarity. Is it possible to reduce the size of the
ratings matrix some other way?
The answer is yes: we can use SVD (singular value decomposition) to reduce the dimensionality
of the matrix. Let's understand how.
Singular Value Decomposition (SVD) is a method from linear algebra that is widely used as a
dimensionality-reduction technique in machine learning. SVD is a matrix factorisation
technique that reduces the number of features of a dataset by reducing the space dimension
from N dimensions to K dimensions (where K < N). In the context of a recommender system,
SVD is used as a collaborative filtering technique. It uses a matrix structure where each row
represents a user and each column represents an item; the elements of this matrix are the
ratings given to items by users.
$$A = U \Sigma V^{T}$$

where:
● A is the input data matrix (users' ratings),
● U is the matrix of left singular vectors (the user "features" matrix),
● Σ is the diagonal matrix of singular values (essentially the weights/strengths of each concept),
● V^T is the matrix of right singular vectors (the movie "features" matrix).
U and V^T are column-orthogonal and represent different things: U represents how much users
"like" each feature, and V^T represents how relevant each feature is to each movie.
With SVD, we turn the recommendation problem into an optimization problem: how good are
we at predicting the rating for items given a user? One common metric for this is the Root
Mean Square Error (RMSE); a lower RMSE indicates better performance, and vice versa.
RMSE is minimized on the known entries in the utility matrix. SVD also has the useful
property of giving the minimal reconstruction Sum of Squared Errors (SSE), which is why it
is commonly used for dimensionality reduction; the RMSE formula is given in Chapter 5.
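A toy NumPy sketch of this rank-k truncation; the matrix values are illustrative:

```python
import numpy as np

# Toy user-item rating matrix A (rows: users, columns: movies).
A = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

k = 2                                              # keep top-k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation

print(np.round(A_k, 2))   # reconstructed matrix; zero cells get predicted scores
```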
Chapter 5
Results
5.1 Feature Importance and Analysis:
The comprehension and selection of appropriate features significantly impact the performance
of a model. In the context of content-based filtering, we consider all text data as crucial
features, as it encapsulates the essence of item content and behavior. The sentence embeddings
obtained from the BERT model have a dimensionality of 768.
Now, let's delve into the pivotal role played by cosine similarity in this context. Cosine
similarity is instrumental in quantifying the angle between each pair of vectors representing
the sentence embeddings. This measurement helps gauge the similarity or dissimilarity
between different pieces of text data. By calculating the cosine similarity, we effectively
capture the semantic relationships and similarities among the items, forming a fundamental
aspect of the content-based recommendation system.
$$\cos(A, B) = \frac{A \cdot B}{|A|\,|B|}$$

● where A and B are the embedding vectors
● A · B is their dot product, and |A|, |B| are their magnitudes
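A minimal sketch of this ranking step; random vectors stand in for the real 768-dimensional BERT embeddings:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

embeddings = np.random.randn(5, 768)       # stand-ins for BERT sentence embeddings

sim = cosine_similarity(embeddings)        # pairwise cosine similarity, shape (5, 5)

query = 0                                  # index of the movie the user likes
ranked = np.argsort(sim[query])[::-1][1:]  # most similar first, excluding itself
print(ranked)                              # indices of recommended movies
```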
Our dataset comprises 45,000 rows, and plotting all the cosine similarity values would result
in a cluttered visualization. Therefore, let's focus on examining the first 10 plots to gain
insights.
For a recommendation system that provides a list of top-10 recommendations based on user
input, there are a few evaluation metrics to assess its performance. Here are some evaluation
approaches:
Precision@k:
Precision@k measures the proportion of relevant items among the top-k recommended items.
In our case, if a user interacts with or likes certain items, we can calculate the precision of
the system by checking how many of the top 10 recommended items are relevant to the user;
here we take k = 10.
A Precision@10 of 10.00% means that 1 out of the 10 recommended movies was relevant,
according to the ground-truth ratings.
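A minimal sketch of Precision@k; the item IDs are illustrative:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear
    in the user's ground-truth set of liked items."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

recommended = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10"]
relevant = {"m3", "m42"}                       # items the user actually liked
print(precision_at_k(recommended, relevant))   # 0.1 -> Precision@10 of 10%
```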
RMSE (Root Mean Squared Error) is a widely used metric in recommendation systems to
evaluate the accuracy of predicted ratings compared to the actual ratings given by users. It is
particularly relevant when dealing with collaborative filtering algorithms that predict user
preferences for items based on the preferences of other users.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where
• $y_i$ is the actual value of the dependent variable for the i-th observation,
• $\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
• $n$ is the number of observations in the dataframe.
To evaluate the collaborative filtering, we use RMSE via the Surprise library, which provides
various ready-to-use prediction algorithms, including SVD, and lets us evaluate the RMSE
(Root Mean Squared Error) on the ratings dataset. It is a Python scikit for building and
analysing recommender systems.
The Surprise library, which stands for "Simple Python Recommendation System Engine," is a
Python library designed for building and evaluating recommendation systems. It provides a
convenient and easy-to-use interface for implementing and testing various recommendation
algorithms. Surprise is particularly popular for collaborative filtering-based recommendation
systems.
One limitation of the Surprise library is that it focuses on collaborative filtering, so we cannot
use it directly to evaluate the hybrid recommendation engine.
MAE, or Mean Absolute Error, is the most straightforward evaluation metric. The equation
below evaluates it: it is literally the average difference between what a user might rate a movie
and what our system predicts.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where
• $y_i$ is the actual value of the dependent variable for the i-th observation,
• $\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
• $n$ is the number of observations in the dataframe.
To evaluate our CF-SVD model, we use RMSE and MAE over 5 folds. Folds refer to the
divisions or partitions of the dataset created for the purpose of training and testing a model.
Cross-validation is a technique used to assess the performance of a model by splitting the
dataset into multiple subsets or folds, training the model on some of these folds, and
evaluating it on the remaining fold(s). A minimal sketch using Surprise follows.
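A minimal sketch of this 5-fold evaluation with Surprise; the tiny dataframe is a stand-in for the real ratings data described in Section 4.3:

```python
import pandas as pd
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Stand-in ratings; the real dataframe has userId, movieId, rating (1-5).
ratings_df = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 40, 10, 30, 40, 20],
    "rating":  [4.0, 3.0, 5.0, 5.0, 2.0, 4.0, 3.0, 4.0, 5.0, 2.0],
})

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df[["userId", "movieId", "rating"]], reader)

# 5-fold cross-validation of SVD, reporting RMSE and MAE for each fold.
cross_validate(SVD(), data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```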
5.3 Results:
If the movie title is found in the dataframe, the system adopts the hybrid approach, providing
movie recommendations based on the combined criteria.
Movie_recommendations:
Chapter 6
Conclusion and Future Scope
6.1 Conclusion
In this project, we created a recommendation system using two approaches. First, we looked
at the content of movies using BERT to understand their details. This helped us capture subtle
similarities in movie content based on their descriptions. Second, we considered what similar
users liked using collaborative filtering. We used SVD to handle missing data in our ratings.
Combining these methods resulted in a smart recommendation system that considers both the
content of movies and what people generally enjoy. It turned out to be quite effective, with
90% accuracy in suggesting movies that users might like. The collaborative filtering part,
which checks what similar users liked, achieved an RMSE of 0.89, showing that it is doing a
good job too.
In simpler terms, our recommendation system acts like a helpful friend who understands both
the details of movies and what people typically enjoy. This makes it better at suggesting
movies you might really enjoy watching. It's like having a buddy who knows your taste in
movies and gives you great suggestions!
Future Scope:
For future enhancements, there's an opportunity to refine our recommendation system by fine-
tuning the BERT model on a more extensive dataset. Currently, we've utilized a pre-trained
BERT base model, but by subjecting it to thorough training on a custom dataset, we can expect
superior performance and heightened accuracy, particularly tailored to our specific
application.
APPENDIX
REFERENCES
11. Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2021. A Survey on
Conversational Recommender Systems. ACM Comput. Surv. 54, 5, Article 105 (June
2022), 36 pages.
12. R. Ahuja, A. Solanki and A. Nayyar, "Movie Recommender System Using K-Means
Clustering and K-Nearest Neighbor," 2019 9th International Conference on Cloud
Computing, Data Science & Engineering (Confluence), Noida, India, 2019, pp. 263-268.
M. U. Gul, K. John Pratheep, M. Junaid and A. Paul, "Spiking Neural Network (SNN)
for Crop Yield Prediction," 2021 9th International Conference on Orange Technology
(ICOT), Tainan, Taiwan, 2021, pp. 1-4.
13. Tan, Y.; Zhang, M.; Liu, Y.; and Ma, S. 2017. Rating-Boosted Latent Topics:
Understanding Users and Items with Ratings and Reviews. In IJCAI, 2640–2646.
IJCAI/AAAI Press.
14. McAuley, J. J.; and Leskovec, J. 2013. Hidden factors and hidden topics: understanding
rating dimensions with review text. In RecSys, 165–172. ACM.
15. Zheng, L.; Noroozi, V.; and Yu, P. S. 2017. Joint Deep Modeling of Users and Items
Using Reviews for Recommendation. In WSDM, 425–434. ACM.
16. Howard, J.; and Ruder, S. 2018. Universal Language Model Fine-tuning for Text
Classification. In ACL (1), 328–339. Association for Computational Linguistics.
17. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language
models are unsupervised multitask learners. OpenAI Blog 1(8): 9.
18. Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer,
L. 2018. Deep Contextualized Word Representations. In NAACL-HLT, 2227–2237.
Association for Computational Linguistics.
19. Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube
recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems,
pages 191–198. ACM.
20. Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based
recommender systems: State of the art and trends. In Recommender Systems Handbook,
pages 73–105. Springer.