HYBRID BERT REC: BERT BASED HYBRID RECOMMENDER
Bachelor of Technology
in
CSE (Data Science)
by
January, 2024
DECLARATION
I certify that
a. The work contained in this report is original and has been done by me under the guidance
of my supervisor(s).
b. The work has not been submitted to any other Institute for any degree or diploma.
c. I have followed the guidelines provided by the Institute in preparing the report.
d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
e. Whenever I have used materials (data, theoretical analysis, figures, and text) from other
sources, I have given due credit to them by citing them in the text of the report and giving
their details in the references. Further, I have taken permission from the copyright owners
of the sources, whenever necessary.
APPROVAL SHEET
This project phase-I report entitled Hybrid-BERT Rec: BERT Based Hybrid Recommender,
submitted by Mr. Arandkar Vishal, Mr. G. Tushar Reddy, and Mr. Ch. Bala Varun Chary,
is approved for the award of the Degree of Bachelor of Technology in Data Science.
Examiner Supervisor
Principal
Dr. L V Narasimha Prasad
Date:
Place:
CERTIFICATE
This is to certify that the project phase-I report entitled Hybrid-BERT Rec: BERT Based Hybrid
Recommender, submitted by Mr. Arandkar Vishal, Mr. Gali Tushar Reddy and Mr. Ch.
Bala Varun Chary to the Institute of Aeronautical Engineering, Hyderabad, in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in CSE (Data
Science), is a bona fide record of work carried out by them under my/our guidance and
supervision. In whole or in part, the contents of this report have not been submitted to any other
institute for the award of any Degree.
Date:
ACKNOWLEDGEMENT
With Gratitude,
ABSTRACT
In the world of e-commerce, the dynamics of user behaviour evolve rapidly, shaped by societal trends
and changing preferences. This project introduces an innovative approach to recommendation systems
for e-commerce websites, focusing on the dynamic nature of user interactions. Leveraging content-based
filtering (CBF) and collaborative filtering, the project pioneers the integration of a bidirectional
self-attention network inspired by BERT, designed to capture and adapt to sequential user behaviours.
Sequential recommender systems seek to capture information about user affinities and behaviours
from their sequential series of interactions. In this project, we build on BERT4Rec, a sequential
recommendation approach based on the bidirectional encoder of the self-attention-based Transformer
mechanism. BERT4Rec applies the Bidirectional Encoder Representations from Transformers
(BERT) technique to model user behaviour sequences by considering the target user's historical data,
i.e., a content-based filtering (CBF) approach. Despite BERT4Rec's effectiveness, we argue that
considering only this historical data is insufficient to provide the most accurate recommendations.
We therefore propose HybridBERT4Rec, which applies BERT to both CBF and collaborative filtering (CF).
For CBF, we extract the characteristics of the target user's interactions with purchased items. For CF,
we find neighbouring users who are similar to the target user: we extract the target item's
characteristics using all other users who rated the target item as a second input to BERT, which
generates a target item profile. After obtaining both profiles, we use them to predict a rating score.
We experimented with three datasets, finding that our model was more accurate than the original BERT4Rec.
TABLE OF CONTENTS
COVER PAGE i
TITLE PAGE ii
DECLARATION iii
APPROVAL SHEET iv
CERTIFICATE v
ACKNOWLEDGMENT vi
ABSTRACT vii
CONTENTS viii
LIST OF FIGURES x
LIST OF ABBREVIATIONS xi
CHAPTER 1 INTRODUCTION 1
1.1 Introduction 1
1.2 Existing System 2
1.3 Proposed System 3
4.2.2 Attention mechanism 13
4.2.3 Feed forward neural network 13
4.2.4 Stacking and Transformer Layer 14
4.2.5 Embeddings and softmax 14
4.3 Dataset 15
4.4 Implementation 17
4.4.1 CBF recommendation using BERT 17
4.4.2 CF recommendations using SVD 19
4.4.3 Hybrid Recommendation 20
References 27
Appendices 27
LIST OF FIGURES
LIST OF ABBREVIATIONS
Chapter 1
Introduction
1.1 Introduction
Existing CBF systems make suggestions based on item attributes and user profile information.
They assume that if a user has expressed interest in something, they will continue to do so in
the long term. Comparable items are frequently grouped based on their features. User profiles
are created by analysing previous interactions or by directly asking users about their interests.
Systems that additionally use individual and interpersonal user data are not regarded as purely
content-based.
Moreover, collaborative filtering is classified into two types:
a) Memory-based approaches: This is also known as neighbourhood-based collaborative filtering.
Ratings of user-item pairs are predicted based on their proximity [33]. Memory-based
collaborative filtering is further classified into two types: user-based collaborative filtering and
item-based collaborative filtering. User-based filtering assumes that strong and relevant
recommendations come from like-minded people, while item-based filtering suggests items based
on their perceived relevance, as determined by customer ratings.
i) User-Based Collaborative Filtering (UB-CF): recommends items to a user by identifying
similar users and predicting the user's preferences based on their ratings. It employs a user-item
matrix, predicting unrated items for the active user by comparing their preferences to
those of similar users. The K-nearest neighbours (KNN) algorithm is commonly used to find
similar users, making the approach easy to implement and often more accurate than some
techniques such as content-based methods. However, it faces challenges with sparsity in user
ratings, scalability as the user base grows, and new users and items (cold-start problems).
A minimal sketch of user-based filtering appears after this classification.
ii) Item-Based Collaborative Filtering (IB-CF): focuses on finding items similar to those the
user has already preferred. It involves calculating similarity among items using techniques like
cosine-based similarity or correlation-based similarity. Unlike UB-CF, IB-CF pre-calculates
item similarities.
b) Model-based approaches: Model-based collaborative filtering does not need to keep the full
rating matrix in memory. Instead, machine-learning models are used to predict and estimate how
a customer would rate each product. These algorithms predict unrated products from existing
customer ratings and are further divided into different subsets, i.e., matrix-factorization-based
algorithms, deep learning methods, and clustering algorithms.
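To make the memory-based idea concrete, the sketch below predicts a rating as the similarity-weighted average of the ratings that the k most similar users gave to the item. The toy matrix and the function name are illustrative, not the project's code:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item rating matrix (rows: users, columns: items; 0 = unrated).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def predict_ubcf(ratings, user, item, k=2):
    """User-based CF: similarity-weighted average of the ratings
    that the k nearest neighbours gave to the target item."""
    sims = cosine_similarity(ratings)[user]       # similarity to every user
    sims[user] = -1.0                             # exclude the target user
    rated = np.where(ratings[:, item] > 0)[0]     # users who rated the item
    neighbours = rated[np.argsort(sims[rated])[::-1][:k]]
    w = sims[neighbours]
    return float(np.dot(w, ratings[neighbours, item]) / (w.sum() + 1e-9))

print(predict_ubcf(ratings, user=0, item=2))  # predicted rating of item 2 for user 0
```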
The proposed BERT-based solution tries to mitigate existing problems such as:
1. Cold Start Problem Mitigation: Instead of relying on unique item identifiers to aggregate
historical information, our approach utilizes only the item's title as content, coupled with token
embeddings. This helps address the cold start problem, a significant shortcoming of traditional
recommendation algorithms, by allowing the model to provide meaningful recommendations
even when historical data is limited.
2. User Latent Interest Learning: By training our model with user behaviour data, we aim to
enable the system to learn not only item similarities but also the latent interests of users. This
contrasts with traditional recommendation algorithms and some pair-wise deep learning
algorithms that primarily focus on providing similar items based on past purchases. Our
approach seeks to offer more personalized recommendations by capturing the nuanced
preferences of users.
• In the CF part of HybridBERT4Rec, our solution involves extracting the target item representation,
capturing the similarity levels between all neighbours and the target user. During training, a
random user masking technique is employed on the user sequence associated with each target
item. This process aims to enable the model to reconstruct the masked user's original
embedding as accurately as possible. Once training is completed, the network is capable of
constructing the next user representation based on the characteristics of users interacting with
the target item. During testing, the target item representation is constructed by masking the
target user and adding it to the end of the user sequence. This representation encapsulates the
comparison values between each neighbouring user and the target user, signifying the
similarity levels.
Chapter 2
Literature Survey
F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor [1]:
This is the first comprehensive handbook dedicated entirely to recommendation systems (RS),
applications that offer relevant items to users, from simple book recommendations to more
complex recommendations such as those from conversational recommender systems.
Recommender Systems (RSs) are software tools and techniques providing suggestions for
items likely to be of use to a user. The suggestions relate to various decision-making processes,
such as what items to buy, what music to listen to, or what online news to read. Although not
exclusively about recommendation systems, the book covers large-scale data mining,
including collaborative filtering and recommendation algorithms.
movie. Word2Vec is a popular word embedding technique that can also be applied in
recommendation systems. The FCMR system was able to achieve good performance on a
number of metrics, but the authors also suggested that it would be best to use a hybrid
recommender system that combines both content-based and collaborative filtering approaches,
and noted that Word2Vec cannot capture long-term dependencies.
Sun, Fei, Rui Wang, Rui Zhang, Xin Chen, Xudong Hu, and Zhiyuan Liu [5]:
proposed BERT4Rec. BERT4Rec is a recommendation system that uses BERT
(Bidirectional Encoder Representations from Transformers) to generate embeddings for items.
These embeddings are then used to calculate the similarity between items, which can be used
to make recommendations. BERT4Rec has been shown to be effective for a variety of
recommendation tasks, including sequential recommendation, cross-domain recommendation,
and cold-start recommendation. BERT4Rec refers to the application of BERT in the context of
recommender systems. BERT is a pre-trained transformer-based model that has been highly
successful in natural language processing tasks. When applied to recommender systems, BERT
can learn complex patterns and dependencies in user-item interactions, potentially leading to
more accurate and personalized recommendations. The application of BERT in recommender
systems has gained attention because of its ability to capture long-range dependencies and
context in user-item interactions, allowing the model to understand item semantics and user
preferences more effectively. Related models such as SASRec and KeBERT4Rec have also been
proposed, each with its own advantages; for example, Self-Attentive Sequential
Recommendation (SASRec) can capture the relationships between items that are far apart in
the user's history.
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang [6]:
Modelling users' dynamic preferences from their historical behaviours is challenging and
crucial for recommendation systems. Previous methods employ sequential neural networks to
encode users' historical interactions from left to right into hidden representations for making
recommendations. Despite their effectiveness, the authors argue that such left-to-right
unidirectional models are sub-optimal due to limitations including: a) unidirectional
architectures restrict the power of hidden representations of users' behaviour sequences;
b) they often assume a rigidly ordered sequence, which is not always practical. To address
these limitations, they proposed a sequential recommendation model called BERT4Rec, which
employs deep bidirectional self-attention to model user behaviour sequences. To avoid
information leakage and efficiently train the bidirectional model, they adopt the Cloze
objective for sequential recommendation, predicting randomly masked items in the sequence
by jointly conditioning on their left and right context.
Anushree H and Shashidhara H S, Efficient Recommendation System Using BERT
Technology, International Journal of Advanced Research in Engineering and
Technology [7]:
Modelling the diverse desires of users from their historical behaviour is difficult and essential
for recommendation systems. Past techniques utilize sequential neural networks to convert the
historical experiences of customers, from left to right, into encoded suggestion models.
Recommendation systems in e-commerce are becoming an integral means of helping
consumers navigate the accessible content. Recommender systems are an important aspect of
e-commerce platforms that assist consumers in choosing products of choice at a large scale.
Several authors have used metadata termed the contextual phrase, which incorporates the
reference label and the description of the quote, to locate relevant referenced research. The
lack of a well-established benchmarking dataset, and of a recommendation tool that achieves
high efficiency, has made this study challenging. Personalized recommendation systems are
also the foundation of shared marketplace platforms in the rental property field, where such a
framework provides a valuable tool to support users. The technique presented aims to
determine the context of a movie's plot summary using BERT-as-a-service and to predict
similar movie recommendations.
Yuyangzi Fu, Tian Wang [8]:
In e-commerce, recommender systems have become an indispensable part of helping users
explore the available inventory. In this work, the authors present a novel approach for
item-based collaborative filtering, leveraging BERT to understand items and score the
relevancy between different items. The proposed method addresses problems that plague
traditional recommender systems, such as cold start and "more of the same" recommended
content. They conducted experiments on a large-scale real-world dataset with a full cold-start
scenario, and the proposed approach significantly outperforms the popular Bi-LSTM model.
Chapter 3
Project Architecture
3.1 High-Level System Architecture
The main component of our architecture is the recommendation engine, which consists of three
main components:
1. Collaborative Filtering
2. Content-Based Filtering
3. Hybrid Combination
2. Content-Based Filtering: The content-based recommender relies on the similarity of the
items being recommended. The basic idea is that if you like an item, then you will also like a
"similar" item. It generally works well when it is easy to determine the context/properties of
each item.
A content-based recommender works with data that the user provides, for example explicit
movie ratings from the MovieLens dataset. Based on that data, a user profile is generated, which
is then used to make suggestions to the user. As the user provides more inputs or acts on the
recommendations, the engine becomes more and more accurate.
Figure 4. The architecture of the HybridBERT4Rec model, which comprises a CBF part,
a CF part, and a prediction layer
3.2 UML Diagrams
3.2.1 Data Flow Diagram:
The fundamental data workflow encompasses the collection of data, analysis of the raw data,
subsequent cleaning and preprocessing, model application, and analysis of the results.
Let's look at how the text processing is done:
When preprocessing text data, the focus is on refining the data by eliminating unwanted
elements, removing stop words, and applying lemmatization.
1. Removing Noise from Data:
Before extracting meaningful information, it is crucial to eliminate any irrelevant or
extraneous elements that might hinder the analysis process. This noise removal step involves
identifying and discarding data components that do not contribute to the overall understanding
or insights derived from the data.
2. Stop Words Removal:
Stop words, commonly occurring words in a language (e.g., 'the,' 'is,' 'and'), are often devoid
of significant semantic meaning and can add noise to textual data. Stop words removal
involves systematically excluding these words from the dataset to streamline the analysis
process, allowing the focus to shift toward more contextually relevant terms. This enhances
the efficiency of subsequent natural language processing tasks.
3. Lemmatization:
Lemmatization is a linguistic process aimed at reducing words to their base or root form,
known as lemmas. This normalization technique ensures that different grammatical variations
of a word, such as verb conjugations or plural forms, are transformed into a common base. By
doing so, lemmatization facilitates more accurate and consistent analysis of the text data,
enabling improved understanding and interpretation of the content.
4. BERT Model:
Utilizing transformers, we load the pre-trained BERT model known as bert-base-uncased, a
comprehensive discussion of which will be provided in Chapter 4. The process involves
tokenizing sentences and subsequently encoding them into embeddings by passing them
through the BERT model. This mechanism allows us to harness the power of pre-trained
language representations for diverse natural language processing tasks.
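As a concrete illustration of preprocessing steps 1-3 above, here is a minimal sketch using NLTK; the exact libraries and cleaning rules in the project may differ:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("stopwords", "wordnet", "omw-1.4"):
    nltk.download(pkg, quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    """Noise removal, stop-word removal, and lemmatization."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())              # 1. strip noise
    tokens = [t for t in text.split() if t not in stop_words]  # 2. drop stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)   # 3. lemmatize

print(preprocess("The wolves were howling in the forests!"))
# -> "wolf howling forest"
```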
Figure 5. Data Flow Diagram
3.2.2 Sequence Diagram
• See how tasks are moved between objects or components of a process
3. The dataset consists of multiple sub-datasets, such as ratings and movies, which are stored
after we perform cleaning and preprocessing on them.
4. The final step involves presenting movie recommendations to the user, following
a ranked-list recommendation approach.
Chapter 4
Methodology And Implementation
4.1 Methodology:
Before getting started, let's introduce the terminology.
We begin with the sequential recommendation model called BERT4Rec, which adapts
Bidirectional Encoder Representations from Transformers to a new task: sequential
recommendation.
4.2 Understanding BERT:
4.2.1 Transformer Layer: Let's understand with an example. In machine translation, text is
translated from one language to another through a Transformer: the source language is first
encoded into embeddings by the "encoder", and the "decoder" then decodes it into the target
language.
The encoder is built from multiple self-attention mechanisms plus a feed-forward neural
network, and the same holds for the decoder. BERT is an encoder-only model, so we will focus
on how the encoder works.
4.2.2 Attention Mechanism: The encoder's outputs are calculated through a function called
"attention". An attention function can be described as mapping a query and a set of key-value
pairs to an output, where the query, keys, values, and output are all vectors. The output is
computed as a weighted sum of the values, where the weight assigned to each value is computed
from the compatibility of the query with the corresponding key.
The input consists of queries and keys of dimension dk, and values of dimension dv. We
compute the dot products of the query with all keys, divide each by √ dk, and apply a softmax
function to obtain the weights on the values. In practice, we compute the attention function on
a set of queries simultaneously, packed together into a matrix Q. The keys and values are also
packed together into matrices K and V. We compute the matrix of outputs as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
● Q is the query matrix
● K is the key matrix
● V is the value matrix
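Since the attention formula above is just matrix algebra, a tiny NumPy sketch makes it concrete. This is a generic illustration of scaled dot-product attention, not the project's training code, and the shapes are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarity scores
    return softmax(scores) @ V        # weighted sum of the values

Q = np.random.randn(4, 8)    # 4 queries of dimension d_k = 8
K = np.random.randn(6, 8)    # 6 keys of dimension d_k = 8
V = np.random.randn(6, 16)   # 6 values of dimension d_v = 16
print(attention(Q, K, V).shape)  # (4, 16)
```

4.2.3 Feed-Forward Neural Network: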
In addition to attention sub-layers, each of the layers in our encoder and decoder contains a
fully connected feed-forward network, which is applied to each position separately and
identically. This consists of two linear transformations with a ReLU activation in between.
Mathematically, ReLU is defined as f(x)=max(0,x), meaning it replaces any negative values
with zero while leaving positive values unchanged. The purpose of ReLU is to introduce non-
linearity to the network, allowing it to learn complex relationships and patterns in the data.
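The two-linear-layers-with-ReLU structure can be sketched in a few lines of PyTorch; the inner size d_ff = 3072 is BERT-Base's conventional value and is an assumption here, not a setting specified in this report:

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Two linear transformations with a ReLU in between,
    applied identically at every sequence position."""
    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # first linear transformation
            nn.ReLU(),                  # f(x) = max(0, x)
            nn.Linear(d_ff, d_model),   # second linear transformation
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return self.net(x)

ffn = PositionwiseFeedForward()
print(ffn(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```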
Figure 5. Feed Forward Neural Network (Encoder Architecture)
4.2.4 Stacking Transformer Layers:
BERT-Base has 12 layers in the encoder stack, while BERT-Large has 24 layers. Both are deeper
than the Transformer architecture described in the original paper (6 encoder layers).
The BERT architectures (Base and Large) also have larger hidden sizes (768 and 1024 units
respectively) and more attention heads (12 and 16 respectively) than the original Transformer,
which uses 512 hidden units and 8 attention heads.
BERT-Base contains 110M parameters, while BERT-Large has 340M; we use BERT-Base for
our recommendation system. However, the network becomes more difficult to train as it goes
deeper. Therefore, we employ a residual connection around each of the two sub-layers as in
Figure a, followed by layer normalization [5]. Moreover, we also apply dropout [6] to the output
of each sub-layer, before it is normalized. That is, the output of each sub-layer is
LN(x + Dropout(sublayer(x))), where sublayer(·) is the function implemented by the sub-layer
itself and LN is the layer normalization function.
As elaborated above, without any recurrence or convolution module, the Transformer layer
(Trm) is not aware of the order of the input sequence. In order to make use of the sequential
information of the input, we inject positional embeddings into the input item embeddings at
the bottom of the Transformer layer stacks. For a given item $v_i$, its input representation
$h_i^0$ is constructed by summing the corresponding item and positional embeddings:

$$h_i^0 = v_i + p_i$$

where $v_i \in E$ is the $d$-dimensional embedding for item $v_i$, and $p_i \in P$ is the
$d$-dimensional positional embedding for position index $i$.
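A toy PyTorch sketch of this input construction, where learned item and positional embedding tables are summed element-wise; the table sizes are illustrative, not the project's settings:

```python
import torch
import torch.nn as nn

num_items, max_len, d = 10000, 50, 768          # illustrative sizes

item_emb = nn.Embedding(num_items, d)           # v_i: learned item embeddings
pos_emb = nn.Embedding(max_len, d)              # p_i: learned positional embeddings

items = torch.randint(0, num_items, (1, max_len))   # one input item sequence
positions = torch.arange(max_len).unsqueeze(0)      # indices 0 .. max_len-1

h0 = item_emb(items) + pos_emb(positions)       # h_i^0 = v_i + p_i
print(h0.shape)                                 # torch.Size([1, 50, 768])
```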
After $L$ layers that hierarchically exchange information across all positions in the previous
layer, we get the final output $H^L$ for all items of the input sequence. Assuming that we mask
the item $v_t$ at time step $t$, we then predict the masked item $v_t$ by applying a softmax
layer at the end.
Softmax Layer: The softmax is typically used as the output layer of a neural network to convert
the raw output scores, or logits, into a probability distribution over multiple classes. The
softmax function takes a vector of real numbers as input and produces a probability distribution
as output:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
4.3 Dataset
The movie dataset, intended for content-based filtering, is composed of 45,000 rows and 24
columns. Among these columns, 12 are deemed essential for analysis: 'id', 'title', 'genres',
'original language', 'overview', 'tagline', 'production countries', 'release date', 'status',
'vote average', 'vote count', and 'runtime'. To gain insights into the data, we apply exploratory
data analysis (EDA) techniques.
The ratings dataset, designated for collaborative filtering, comprises 10,000 rows and includes
4 columns. Among these columns, three are considered crucial for analysis: 'movieId,' 'userId,'
and 'rating.' Each entry in the dataset corresponds to a user identified by a unique user ID
providing a rating to a specific movie identified by a unique movie ID. The ratings provided
by users for these movies fall within the range of 1 to 5.
There are a total of 671 unique users and 9,066 unique movies; the correlation plot among
them is shown below.
4.4 Implementation
4.4.1 CBF-Recommendation using BERT:
For content-based recommendation, we consolidate various text columns such as title,
overview, genre, tagline, and production companies. Subsequently, we apply natural language
processing (NLP) preprocessing steps, as outlined earlier.
Now, let's contemplate a scenario wherein a movie boasting an average rating of 9, derived
from merely two votes, cannot be deemed superior to a film with a lower average rating of 8
that has garnered 1000 votes. In light of this, we opt to employ IMDB's weighted-rating
methodology to assess the overall quality of a movie. The weighted rating of a movie is defined
as:
$$\mathrm{Weighted\ Rating} = \left(\frac{v}{v+m}\right)R + \left(\frac{m}{v+m}\right)C$$

where $v$ is the number of votes for the movie, $m$ is the minimum number of votes required
to be listed, $R$ is the average rating of the movie, and $C$ is the mean vote across the whole
dataset.
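A small pandas sketch of this weighted rating; the cutoff m and the dataset mean C are assumed placeholder values here, whereas in practice they would be computed from the full movie dataset (e.g., m as a high vote-count quantile):

```python
import pandas as pd

m = 500    # assumed minimum vote count required to be listed
C = 7.0    # assumed mean vote across the whole dataset

movies = pd.DataFrame({
    "title":        ["High avg, few votes", "Lower avg, many votes"],
    "vote_average": [9.0, 8.0],    # R
    "vote_count":   [2, 1000],     # v
})

def weighted_rating(row):
    v, R = row["vote_count"], row["vote_average"]
    return (v / (v + m)) * R + (m / (v + m)) * C

movies["score"] = movies.apply(weighted_rating, axis=1)
print(movies.sort_values("score", ascending=False))
# The 1000-vote movie (~7.67) now outranks the 2-vote movie (~7.01).
```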
Now we use the pre-trained "bert-base-uncased" model: "base" refers to the smaller model, and
"uncased" indicates that all text used during pre-training was converted to lowercase, including
both the input text and the vocabulary. For example, "Hello" and "hello" are treated as the same
word in the "uncased" model.
Some key parameters and details of the BERT model:
Architecture: BERT utilizes a transformer architecture, specifically the transformer encoder.
The transformer model enables the bidirectional processing of input sequences, allowing it to
capture contextual information effectively.
Layers: BERT consists of multiple layers of transformer blocks. The number of layers in the
model is a hyperparameter that can be adjusted; the original BERT-base model has 12 layers.
Hidden Units: Each layer in the transformer contains a certain number of hidden units, which
is another hyperparameter. The BERT-base model has 768 hidden units in each layer.
Attention Heads: The attention mechanism in the transformer is divided into attention heads.
The number of attention heads is also a hyperparameter. BERT-base has 12 attention heads.
Vocabulary Size: BERT uses WordPiece tokenization, and the "uncased" variant has a
vocabulary size of 30,522 tokens. This vocabulary includes both whole words and subwords.
Embedding Dimension: BERT represents words as vectors in an embedding space. The
embedding dimension for BERT-base is 768.
First, we set the device to CUDA; then we load our pre-trained BERT model:
CUDA (Compute Unified Device Architecture) is a parallel computing platform and
application programming interface (API) model created by NVIDIA. It allows developers to
use NVIDIA graphics processing units (GPUs) for general-purpose processing, not just
graphics-related tasks. CUDA enables programmers to harness the computational power of
NVIDIA GPUs to accelerate various types of applications, including scientific simulations,
data processing, machine learning, and more.
Now the combined text columns from above are passed through the BERT model to encode
them. After cleaning, we are left with approximately 44,000 rows; with the default batch size
of 32, the dataset is divided into roughly 1,377 batches. A minimal sketch of this encoding
step follows.
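A minimal sketch of this step with the Hugging Face transformers library; mean pooling over the last hidden states is an assumption here (one common way to obtain a single 768-dimensional embedding per text), not necessarily the exact pooling used in the project:

```python
import torch
from transformers import BertTokenizer, BertModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").to(device)
model.eval()

texts = ["toy story animation family", "heat crime thriller"]  # combined text columns

@torch.no_grad()
def encode(batch):
    """Tokenize a batch of texts and mean-pool BERT's last hidden
    states into one 768-dimensional embedding per text."""
    enc = tokenizer(batch, padding=True, truncation=True,
                    max_length=128, return_tensors="pt").to(device)
    hidden = model(**enc).last_hidden_state        # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)     # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

print(encode(texts).shape)  # torch.Size([2, 768])
```

In the actual pipeline, the roughly 44,000 combined texts would be looped over in batches of 32 and the resulting embeddings stacked.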
4.4.2 CF Recommendation Using SVD:
As previously discussed, collaborative filtering is of two types, item-based and user-based:
● User-based, which measures the similarity between target users and other users.
● Item-based, which measures the similarity between the items that target users rate or
interact with and other items.
When you have millions of users and/or items, computing pairwise correlations is expensive
and slow. We already saw that we could avoid processing all the data every time by
establishing neighbourhoods on the basis of similarity. Is it possible to reduce the size of the
ratings matrix some other way?
The answer is yes: we can use SVD (singular value decomposition) to reduce the dimensionality
of the matrix. Let's understand how.
Singular Value Decomposition (SVD) is a method from linear algebra that is widely used as a
dimensionality-reduction technique in machine learning. SVD is a matrix factorisation
technique that reduces the number of features of a dataset by reducing the space dimension
from N dimensions to K dimensions (where K < N). In the context of a recommender system,
SVD is used as a collaborative filtering technique. It uses a matrix structure where each row
represents a user and each column represents an item; the elements of this matrix are the
ratings given to items by users.
$$A = U \Sigma V^{T}$$

where:
● A is the input data matrix (users' ratings),
● U is the matrix of left singular vectors (the user "features" matrix),
● Σ is the diagonal matrix of singular values (essentially the weights/strengths of each concept),
● V^T is the matrix of right singular vectors (the movie "features" matrix).
U and V^T are column-orthogonal and represent different things: U represents how much users
"like" each feature, and V^T represents how relevant each feature is to each movie.
With SVD, we turn the recommendation problem into an optimization problem: how good are
we at predicting the rating for items given a user? One common metric for this is the Root
Mean Square Error (RMSE); a lower RMSE indicates better performance, and vice versa.
RMSE is minimized on the known entries in the utility matrix. SVD also has the useful
property of giving the minimal reconstruction Sum of Squared Errors (SSE), which is why it
is commonly used for dimensionality reduction; the RMSE formula is given in Chapter 5.
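A toy NumPy sketch of this rank-k truncation; the matrix values are illustrative:

```python
import numpy as np

# Toy user-item rating matrix A (rows: users, columns: movies).
A = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

k = 2                                              # keep top-k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation

print(np.round(A_k, 2))   # reconstructed matrix; zero cells get predicted scores
```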
Chapter 5
Results
5.1 Feature Importance and Analysis:
The comprehension and selection of appropriate features significantly impact the performance
of a model. In the context of content-based filtering, we consider all text data as crucial
features, as it encapsulates the essence of item content and behavior. The sentence embeddings
obtained from the BERT model have a dimensionality of 768.
Now, let's delve into the pivotal role played by cosine similarity in this context. Cosine
similarity is instrumental in quantifying the angle between each pair of vectors representing
the sentence embeddings. This measurement helps gauge the similarity or dissimilarity
between different pieces of text data. By calculating the cosine similarity, we effectively
capture the semantic relationships and similarities among the items, forming a fundamental
aspect of the content-based recommendation system.
$$\cos(A, B) = \frac{A \cdot B}{|A|\,|B|}$$

● where A and B are the embedding vectors
● A · B is their dot product, and |A|, |B| are their magnitudes
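A minimal sketch of this ranking step; random vectors stand in for the real 768-dimensional BERT embeddings:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

embeddings = np.random.randn(5, 768)       # stand-ins for BERT sentence embeddings

sim = cosine_similarity(embeddings)        # pairwise cosine similarity, shape (5, 5)

query = 0                                  # index of the movie the user likes
ranked = np.argsort(sim[query])[::-1][1:]  # most similar first, excluding itself
print(ranked)                              # indices of recommended movies
```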
Our dataset comprises 45,000 rows, and plotting all the cosine similarity values would result
in a cluttered visualization. Therefore, let's focus on examining the first 10 plots to gain
insights.
For a recommendation system that provides a list of top-10 recommendations based on user
input, there are a few evaluation metrics to assess its performance. Here are some evaluation
approaches:
Precision@k:
Precision@k measures the proportion of relevant items among the top-k recommended items.
In our case, if a user interacts with or likes certain items, we can calculate the precision of
the system by checking how many of the top 10 recommended items are relevant to the user;
here we take k = 10.
A Precision@10 of 10.00% means that 1 out of the 10 recommended movies was relevant,
according to the ground-truth ratings.
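A minimal sketch of Precision@k; the item IDs are illustrative:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear
    in the user's ground-truth set of liked items."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

recommended = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10"]
relevant = {"m3", "m42"}                       # items the user actually liked
print(precision_at_k(recommended, relevant))   # 0.1 -> Precision@10 of 10%
```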
RMSE (Root Mean Squared Error) is a widely used metric in recommendation systems to
evaluate the accuracy of predicted ratings compared to the actual ratings given by users. It is
particularly relevant when dealing with collaborative filtering algorithms that predict user
preferences for items based on the preferences of other users.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where
• $y_i$ is the actual value of the dependent variable for the i-th observation,
• $\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
• $n$ is the number of observations in the dataframe.
To evaluate the collaborative filtering, we use RMSE via the Surprise library, which provides
various ready-to-use prediction algorithms, including SVD, and lets us evaluate the RMSE
(Root Mean Squared Error) on the ratings dataset. It is a Python scikit for building and
analysing recommender systems.
The Surprise library, which stands for "Simple Python Recommendation System Engine," is a
Python library designed for building and evaluating recommendation systems. It provides a
convenient and easy-to-use interface for implementing and testing various recommendation
algorithms. Surprise is particularly popular for collaborative filtering-based recommendation
systems.
One limitation of the Surprise library is that it focuses on collaborative filtering, so we cannot
use it directly to evaluate the hybrid recommendation engine.
MAE, or Mean Absolute Error, is the most straightforward evaluation metric. The equation
below evaluates it: it is literally the average difference between what a user might rate a movie
and what our system predicts.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where
• $y_i$ is the actual value of the dependent variable for the i-th observation,
• $\hat{y}_i$ is the predicted value of the dependent variable for the i-th observation,
• $n$ is the number of observations in the dataframe.
To evaluate our CF-SVD model, we use RMSE and MAE over 5 folds. Folds refer to the
divisions or partitions of the dataset created for the purpose of training and testing a model.
Cross-validation is a technique used to assess the performance of a model by splitting the
dataset into multiple subsets or folds, training the model on some of these folds, and
evaluating it on the remaining fold(s). A minimal sketch using Surprise follows.
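A minimal sketch of this 5-fold evaluation with Surprise; the tiny dataframe is a stand-in for the real ratings data described in Section 4.3:

```python
import pandas as pd
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Stand-in ratings; the real dataframe has userId, movieId, rating (1-5).
ratings_df = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 40, 10, 30, 40, 20],
    "rating":  [4.0, 3.0, 5.0, 5.0, 2.0, 4.0, 3.0, 4.0, 5.0, 2.0],
})

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df[["userId", "movieId", "rating"]], reader)

# 5-fold cross-validation of SVD, reporting RMSE and MAE for each fold.
cross_validate(SVD(), data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```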
5.3 Results:
If the movie title is found in the dataframe, the system adopts the hybrid approach, providing
movie recommendations based on the combined criteria.
Movie_recommendations:
Chapter 6
Conclusion and Future Scope
6.1 Conclusion
In this project, we created a recommendation system using two approaches. First, we looked
at the content of movies using BERT to understand their details. This helped us capture subtle
similarities in movie content based on their descriptions. Second, we considered what similar
users liked using collaborative filtering. We used SVD to handle missing data in our ratings.
Combining these methods resulted in a smart recommendation system that considers both the
content of movies and what people generally enjoy. It turned out to be quite effective, with
90% accuracy in suggesting movies that users might like. The collaborative filtering part,
which checks what similar users liked, achieved an RMSE of 0.89, showing that it is doing a
good job too.
In simpler terms, our recommendation system acts like a helpful friend who understands both
the details of movies and what people typically enjoy. This makes it better at suggesting
movies you might really enjoy watching. It's like having a buddy who knows your taste in
movies and gives you great suggestions!
Future Scope:
For future enhancements, there's an opportunity to refine our recommendation system by fine-
tuning the BERT model on a more extensive dataset. Currently, we've utilized a pre-trained
BERT base model, but by subjecting it to thorough training on a custom dataset, we can expect
superior performance and heightened accuracy, particularly tailored to our specific
application.
APPENDIX
REFERENCES
11. Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2021. A Survey on
Conversational Recommender Systems. ACM Comput. Surv. 54, 5, Article 105 (June
2022), 36 pages.
12. R. Ahuja, A. Solanki and A. Nayyar, "Movie Recommender System Using K-Means
Clustering and K-Nearest Neighbor," 2019 9th International Conference on Cloud
Computing, Data Science & Engineering (Confluence), Noida, India, 2019, pp. 263-268.
M. U. Gul, K. John Pratheep, M. Junaid and A. Paul, "Spiking Neural Network (SNN)
for Crop Yield Prediction," 2021 9th International Conference on Orange Technology
(ICOT), Tainan, Taiwan, 2021, pp. 1-4.
13. Tan, Y.; Zhang, M.; Liu, Y.; and Ma, S. 2017. Rating-Boosted Latent Topics:
Understanding Users and Items with Ratings and Reviews. In IJCAI, 2640–2646.
IJCAI/AAAI Press.
14. McAuley, J. J.; and Leskovec, J. 2013. Hidden factors and hidden topics: understanding
rating dimensions with review text. In RecSys, 165–172. ACM.
15. Zheng, L.; Noroozi, V.; and Yu, P. S. 2017. Joint Deep Modeling of Users and Items
Using Reviews for Recommendation. In WSDM, 425–434. ACM.
16. Howard, J.; and Ruder, S. 2018. Universal Language Model Fine-tuning for Text
Classification. In ACL (1), 328–339. Association for Computational Linguistics.
17. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language
models are unsupervised multitask learners. OpenAI Blog 1(8): 9.
18. Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer,
L. 2018. Deep Contextualized Word Representations. In NAACL-HLT, 2227–2237.
Association for Computational Linguistics.
19. Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube
recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems,
pages 191–198. ACM.
20. Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based
recommender systems: State of the art and trends. In Recommender Systems Handbook,
pages 73–105. Springer.