
Bachelor of Science in Computer Science

May 2023

Sentiment Analysis of IMDB Movie Reviews
A comparative study of Lexicon-based approach and BERT Neural Network model

Prashuna Sai Surya Vishwitha Domadula


Sai Sumanwita Sayyaparaju

Faculty of Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Engineering at Blekinge Institute of Technology in
partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science.
The thesis is equivalent to 10 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used
any sources other than those listed in the bibliography and identified as references. They further
declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author(s):
Prashuna Sai Surya Vishwitha Domadula
E-mail: prdo22@student.bth.se

Sai Sumanwita Sayyaparaju


E-mail: sasy22@student.bth.se

University advisor:
Dr Prashant Goswami, Associate Professor
Department of Computer Science and Engineering

Faculty of Engineering
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57
Nomenclature

AI Artificial Intelligence

AUC Area Under the Curve

BERT Bidirectional Encoder Representations from Transformers

BoW model Bag of Words model

CNN Convolutional Neural Network

DTM Document Term Matrix

F1 score Machine learning evaluation metric used to assess a model’s performance.

GRU Gated Recurrent Unit

IMDb Internet Movie Database

Kaggle Online website that hosts numerous data sets.

LSTM Long Short Term Memory

ML Machine Learning

NLP Natural Language Processing

NLTK Natural Language Tool Kit

RNN Recurrent Neural Network

TF-IDF Term Frequency-Inverse Document Frequency

VADER Valence Aware Dictionary for sEntiment Reasoning


Abstract

Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an important part of watching a movie, as it can help viewers gain a general understanding of the film and also provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates machine learning methods for classifying sentiment from film reviews.

Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Later, different performance evaluation metrics are used to identify the most effective learning model.

Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from the online website Kaggle (https://www.kaggle.com/datasets) and contains movie review information. Algorithms based on the lexicon-based approach and the BERT neural network are trained using the chosen IMDb movie reviews data set. To discover which model performs best at predicting sentiment, the constructed models are assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score.

Results: From the conducted experiments, the BERT neural network model is the most efficient algorithm for classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% on the data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. The BERT model also has the best precision and recall, 0.88 and 0.92 respectively, followed by the BoW and TF-IDF models: the BoW model has a precision and recall of 0.79, and the TF-IDF model has a precision of 0.79 and a recall of 0.78. Likewise, the BERT model has the highest F1 score of 0.88, followed by the BoW model with an F1 score of 0.79 and the TF-IDF model with 0.78.

Conclusions: Of the two models evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the more efficient, showing good scores on the measured performance criteria.

Keywords: Bag of Words(BoW), Deep Learning, IMDb Movie Reviews, Machine
Learning, Natural Language Processing(NLP), Sentiment Analysis, Term Frequency-
Inverse Document Frequency(TF-IDF).

Acknowledgments

We greatly appreciate all of the advice, thoughts, and recommendations that we received from our supervisor, Dr Prashant Goswami. We are grateful for our supervisor’s time and efforts, as well as his ongoing support, throughout the thesis.

We would like to thank everyone who supported us, directly or indirectly. Last but not least, we want to thank our parents for their trust in us, without which we would not have been able to complete this thesis.

Prashuna Sai Surya Vishwitha Domadula


Sai Sumanwita Sayyaparaju

Contents

Nomenclature i

Abstract ii

Acknowledgments iv

1 Introduction 1
1.1 Ethical, societal and sustainability aspects . . . . . . . . . . . . . . . 3
1.2 Aim and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Defining the scope of the thesis . . . . . . . . . . . . . . . . . . . . . 4
1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Background 6
2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Natural Language Processing. . . . . . . . . . . . . . . . . . . 7
2.1.2 Machine learning . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Recall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.4 F1 score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.5 Epoch and accuracy curve . . . . . . . . . . . . . . . . . . . . 12
2.2.6 Epoch and loss curve . . . . . . . . . . . . . . . . . . . . . . . 12

3 Related Work 13

4 Method 15
4.1 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2.1 Software tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2.2 Data collection and visualization . . . . . . . . . . . . . . . . 19
4.2.3 Removing HTML tags and noises in the text . . . . . . . . . . 20
4.2.4 Removing special characters . . . . . . . . . . . . . . . . . . . 20
4.2.5 Word stemming . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.6 Removing stop words . . . . . . . . . . . . . . . . . . . . . . . 20

4.2.7 Tokenization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.8 Word Embedding . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.9 Data splitting: training, validation, and testing . . . . . . . . 21
4.3 Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3.1 Lexicon-based approach: . . . . . . . . . . . . . . . . . . . . . 22
4.3.2 BERT neural network . . . . . . . . . . . . . . . . . . . . . . 22
4.4 Sentiment Classification . . . . . . . . . . . . . . . . . . . . . . . . . 23

5 Results and Analysis 25


5.1 Results of Literature review . . . . . . . . . . . . . . . . . . . . . . . 25
5.2 Results of the experiment . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.2 Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.3 Recall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.4 F1 score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6 Discussion 38

7 Conclusions and Future Work 43


7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

References 45

List of Figures

2.1 Flowchart depicting various concepts from which we derived the meth-
ods employed in the thesis. . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Working of lexicon-based approach . . . . . . . . . . . . . . . . . . . 8
2.3 Structure of a neural network . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Working of a BERT neural network in classifying the IMDB movie
reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4.1 Flowchart showing various steps involved in performing the experi-


ment for addressing research question 2. . . . . . . . . . . . . . . . . 16
4.2 The imported data set is depicted in this image, with columns having
the id, review, and sentiment of the movies. . . . . . . . . . . . . . . 19

5.1 A bar plot comparing the accuracy scores of three models employed
in this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Graph plotted using the BERT model to demonstrate the model’s
accuracy using validation data. . . . . . . . . . . . . . . . . . . . . . 34
5.3 A bar plot comparing the precision scores of three models employed
in this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.4 A bar plot comparing the recall scores of three models employed in
this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.5 A bar plot comparing the F1 scores of three models employed in this
study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.1 Correlation matrix of the BoW model showing the classified no.of true
positives and true negatives on IMDb movie reviews data set. . . . . 39
6.2 Correlation matrix of the TF-IDF model showing the classified no.of
true positives and true negatives on IMDb movie reviews data set. . . 39
6.3 Correlation matrix of the BERT model showing the classified no.of
true positives and true negatives on the IMDb movie review data set. 40
6.4 A curve plot showing the comparison of performance metrics evaluated
for the three classifier models used in the thesis. . . . . . . . . . . . . 41

List of Tables

4.1 Hardware tool 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


4.2 Hardware tool 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.3 Table consisting of the number of reviews in training, validation and
testing data sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4 Sentiment classification table showing examples of actual sentiment
and predicted sentiment of movie reviews using the lexicon-based and
BERT neural network classifier models. . . . . . . . . . . . . . . . . . 24

5.1 Table showing the results of the literature survey conducted in our
thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Continuation of table 5.1 showing results of the literature survey. . . 27
5.3 Continuation of table 5.1 showing results of the literature survey. . . 28
5.4 Continuation of table 5.1 showing results of the literature survey. . . 29
5.5 Summary of table 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.6 Continuation of the Summary of table 5.1 . . . . . . . . . . . . . . . 31
5.7 Continuation of the Summary of table 5.1 . . . . . . . . . . . . . . . 32
5.8 Table of the number of reviews classified into positive and negative by
each classifier model used in the study. . . . . . . . . . . . . . . . . . 32
5.9 Table of accuracy scores for all classifier models in this study. . . . . 33
5.10 Table of precision scores for all classifier models in this study. . . . . 34
5.11 Table of recall values for all classifier models in this study. . . . . . . 35
5.12 Table of F1 scores for all classifier models in this study. . . . . . . . . 36

Chapter 1
Introduction

In the modern age, social media trends have made it easy for people to be influenced, and movies have a significant impact on people’s lives. Their influence on the
public’s view of several subjects, such as social security, politics and religion, has
a significant cultural impact. They are also important sources of entertainment,
providing viewers with an emotionally stimulating experience. In addition, they
can impact today’s culture concerning music, fashion and other elements. Generally
speaking, films can teach viewers about some of the topics that are important to
them, like history, science and literature [1]. In addition, they can support people’s
efforts to find an interest or career. By convincing individuals that certain acts are appropriate or desirable, movies exert a behavioural influence and can ultimately change people’s behaviour. In short, movies can have a big impact on how people feel, think, and act; it is therefore important to understand the messages they convey. They can also reinforce societal norms and values, helping
to shape individuals’ attitudes and beliefs.
Whenever a movie reaches its audience, its performance and the number of viewers can be used to determine how good it is. Viewers have many opinions, and different people describe how good a movie is in different ways. One such form is a text review of a movie, which is the opinion of an individual who has seen it [2]. IMDb is one of the websites that
contains a huge selection of movie reviews, but there are many more online platforms
where you may observe individuals publishing these reviews. Information about
films is found in the Internet Movie Database (IMDb) at https://www.imdb.com/.
Users may search through a large collection of works and filter their search results
by choosing parameters such as genre, release year, popularity, and others. IMDb
incorporates user-generated information, such as ratings and reviews made by viewers
of a film, television programme, or video game. Users may score on a scale of 1 to
10 and submit reviews to convey their ideas and opinions.
People are interested in film reviews and ratings. These evaluations are essential for understanding how well films perform. A film’s quantitative success or failure may be measured by awarding it a particular number of stars, but a collection of reviews can provide a deeper qualitative understanding of the film’s numerous features. There are various methods for submitting a review: the first is a numerical rating, and the second is a written evaluation. Rating-based evaluations are
brief, to the point, and numbered from 1 to 10. Textual movie reviews highlight a
film’s virtues and shortcomings. A deeper study into the review shows if it matches
the reviewer’s expectations [3].


Let us take as an example the recent movie "RRR" and consider two reviews from the IMDb website.

Review-1: Rating: 10/10


I have seen a lot of movies in my time, made in a lot of different styles from different
genres, from all around the world. I’ve seen everything from the most mainstream
movie imaginable to the most experimental. I can’t even remember the last time I
came away from a movie thinking that I’d never seen anything like it. But that’s how
I felt after "RRR" [4].

Review-2: Rating: 5/10


And the award for the most overrated film of 2022 goes to...2 December 2022, RRR
is one of the most critically acclaimed films of this year. It’s been widely celebrated
by both audiences and critics alike and is one of the most internationally successful
tollywood films of all time. My question is why [4]?

As you can see from the movie reviews above, a particular film has garnered a
range of reactions, including positive, negative or neutral. However, how can one
assess the overall quality of a film? The most straightforward solution is to integrate
the findings of various film reviews and award a grade of good or poor. This can be
done through "sentiment analysis". The natural language processing technique
of "sentiment analysis" [5], also referred to as "opinion mining", can be used to
identify or classify the emotional tone of a text. A movie’s performance or its effect
on a specific audience can be assessed by classifying movie reviews. We can analyze
the ratio of positive to negative reviews to determine how well the film business is
doing.
In the modern digital landscape, individuals now have an unparalleled opportu-
nity to voice their thoughts and share personal experiences through online platforms.
One notable aspect of this phenomenon is the ability to write and publish reviews
on movies. Online movie reviews have become highly significant and influential for
a variety of reasons, as discussed earlier in this chapter. The need to understand, mine and analyse this data has increased significantly as a result of the massive growth in the amount of data exchanged and produced every second. Machine learning methods and deep learning neural networks became key to the big data era because conventional models were insufficient to obtain results on data at this scale.
There are numerous methods for performing sentiment analysis, the most prevalent of which is the classic method that depends on Natural Language Processing (NLP) [6] techniques; however, it requires some manual feature extraction. Because of the growing interest in social media analysis, Artificial Intelligence (AI) [7] technologies connected to text analysis have received more attention. Deep learning models can be trained on large amounts of labelled data using multi-layer neural network architectures, and they sometimes perform better than humans. These methods eliminate the need for manual feature extraction by extracting features directly from the data.
This thesis aims to perform a comparative sentiment analysis on textual IMDb
movie reviews. These text-based reviews need to be converted into a data form that a machine (i.e. an ML classifier) can understand. By using a traditional NLP
technique called the lexicon-based approach and a deep learning model called the
BERT neural network model, we could achieve the previously discussed classification.
Later, different performance evaluation metrics are used to identify the most effective
learning model over the IMDb data set.

1.1 Ethical, societal and sustainability aspects


The societal component of this thesis intends to assist social media movie lovers and reviewers in identifying the comparative outcomes of the film business over time, as well as to compare the polarity of movie reviews. Furthermore, there are no substantial ethical issues because the data we utilize is legitimate, open to anyone, and free of any public violations. The study’s data set also excludes any personally identifiable information, such as a person’s name, age, or identity. The information
is compiled from a survey of different people’s opinions in the form of written reviews
imported from the kaggle website. The information from the data collected cannot
be linked in any way to a specific person. Additionally, performing sentiment analysis
on movie reviews using deep learning models can be used to analyze the movie industry across geographies, ethnicities, and cultures, in order to gain intrinsic insights about popularity, competitiveness and success rates. These constitute the societal aspects of the work. There are no immediately identifiable sustainability aspects.

1.2 Aim and Objectives


1.2.1 Aim
This thesis aims to perform comparative sentiment analysis on textual IMDB movie
reviews using lexicon-based and BERT neural network models. The motivation be-
hind choosing these algorithms is discussed in the related works chapter under the
gap identification section. Later, different performance evaluation metrics are used
to identify the most effective learning model.

1.2.2 Objectives
The primary objectives of the thesis can be stated as follows:

• To choose and extract an appropriate data set and to conduct an exploratory


analysis of the data by performing data visualization.

• To perform data pre-processing, by applying tokenization methods and word


embeddings on the data.

• To split the data into training, validation, and testing sets and to apply and
build the selected algorithms by using training data to train the deep learning
algorithms.

• To perform a comparative evaluation of the models on the test set by considering performance metrics including accuracy, precision, recall, F1 score and the area under the accuracy and loss curves, as discussed in section 2.2, and then, based on the evaluation’s findings, to identify the most efficient algorithm.

1.3 Research questions


1. How do the lexicon-based approach and BERT neural network model
contribute to sentiment analysis in providing efficient text classification?

Motivation: The motivation of this research question is to comprehend how each approach, the lexicon-based approach and the BERT neural network model, works before applying sentiment analysis to the data set of movie reviews by training the models.

2. Which of the two, the lexicon-based approach and BERT neural net-
work is more accurate in conducting sentiment analysis?

Motivation: The primary motivation of research question 2 is to identify the model that performs most accurately on the input data.

1.4 Defining the scope of the thesis


This thesis focuses on performing sentiment analysis on IMDb movie reviews using
a comparative study of two methodologies, namely the lexicon-based approach and
the BERT neural network model. The models are then evaluated based on their
performance using performance evaluation metrics on the IMDb data set, which
contains movies and their reviews.

1.5 Outline
• Chapter 1, introduction, provides a basic overview of the problem topic and how sentiment analysis might help address it. In addition, the approaches that we decided to implement, as well as an overall overview of the scope of the thesis, are presented here.
• Chapter 2, comprises background, which includes an introduction to the tools
and subjects utilised in the thesis to familiarise the readers with them.
• The background is then followed by chapter 3 on related works, which describes the related works we researched as part of the literature study; these also assisted us in identifying the research gap addressed by the thesis.
• Chapter 4 is the method chapter that comprises the implementation portion
of the thesis work utilising the experimental methodology. This assists the
reader in understanding the procedure of the work done to get the intended
outcomes.
• This is followed by chapter 5, results and analysis, which examines the collected results and provides a full analysis of the outcomes.

• Chapter 6 is the discussion chapter that provides thorough information on


how the research questions are answered utilising the obtained results.

• Finally, in chapter 7, conclusions and future work, we give some findings based
on our experimental analysis, as well as any additional extensions of the study.
Chapter 2
Background

This chapter will concentrate on explaining the different techniques and themes used
in this thesis. The extensive definitions and explanations of every topic assist readers
in better understanding the thesis work. Figure 2.1 below shows the mapping
of different techniques that were employed in the study.

Figure 2.1: Flowchart depicting various concepts from which we derived the methods
employed in the thesis.

2.1 Sentiment Analysis


Sentiment analysis is the process of revealing the viewpoint of a particular text.
Using sentiment analysis, one can detect whether a text has emotions that are neg-
ative, positive, or both. It is a type of text analytics that incorporates machine
learning and Natural Language Processing (NLP). The terms "opinion mining" and
"emotion artificial intelligence" are other names for sentiment analysis [8]. Our thesis
is related to text-based sentiment analysis which is extensively used and researched
in the field of computational linguistics. The methods for text analysis include topic
modelling, sentiment analysis, document classification and summarization, as well as subjective opinion mining.


The process of extracting from a text the relevant data that conveys its underlying sentiment or emotion can be done through the lexicon-based approach, the machine learning approach, or a hybrid approach which integrates both methods; sentiment analysis can be carried out at the word or phrase level, sentence level, and document level.
Types of sentiment analysis:
Sentiment analysis can be carried out in different ways for different uses. The fol-
lowing are the categories of sentiment analysis.
1. Fine grained sentiment: When assessing sentiments, commonly used categories such as positive, neutral, and negative are used. This includes giving a rating on a scale of 1 to 5 or 1 to 10 [9].
2. Emotion detection sentiment analysis: This is a more advanced approach for recognizing feelings in text. This type of analysis aids in detecting and comprehending people’s emotions, such as anger, sadness, happiness, frustration, fear and panic [10].
3. Aspect based analysis: This kind of sentiment analysis primarily focuses
on a certain service aspect. Consider the following scenario: a corporation
or organisation with products or consumers. Aspect-based sentiment analysis
may assist organisations in automatically sorting and analysing client data,
and automating activities such as customer care duties helps us to acquire
substantial insights [11].
4. Intent based sentiment analysis: Intent classification is the automatic clas-
sification of textual material based on the customer’s intent. An intent classifier
can naturally examine texts and reports and categorize them [12].

2.1.1 Natural Language Processing.


Natural language processing is an artificial intelligence [13] sub-field that deals with
human-computer interaction in natural language. Its primary objective is to allow
computers to comprehend and produce human language. It is a field that integrates
computer science, linguistics, and machine learning to better comprehend human-
computer communication. These NLP approaches are widely utilised in a variety of
applications, including machine translation, text categorization, speech recognition,
chatbots, and sentiment analysis. NLP approaches include the following:

1. Tokenization: It is the process of breaking a whole text into tokens which are
nothing but small individual phrases or words.
2. Parts of speech tagging: It is the process in which each word of a sentence
is labelled to its corresponding parts of speech.
3. Named entity recognition: It is the process of recognizing named entities in a text and classifying them into categories such as people, places, etc.
4. Machine translation: It is the process of translating a text from one language
to another.

5. Text classification: It is classifying or categorizing the text into predefined


categories or topics.

6. Sentiment analysis: It is the process of analysing the tone of the sentiment


of a text into either positive, negative, or neutral.

Lexicon-Based Approach :
A lexicon-based approach, also called the rule-based approach, primarily uses a dictionary of words with a predefined set of sentiment values [14]. A lexicon is a set of features, each with an assigned sentiment value. It is essentially a predetermined list of terms, known as the dictionary, in which each word is related to several synonyms. "WordNet" and "SenticNet" are commonly and widely used lexicons. The dictionary provides a sentiment score for each word, and these scores are used to compute an average score over the words in a given document. From this average, we can determine whether the document carries positive or negative sentiment. The accuracy of this method relies simply on how comprehensive the lexicons are. A rule-based approach called the Valence Aware Dictionary for sEntiment Reasoning (VADER) [15] has been developed to help determine the polarity of a document. This approach considers the valence of the document to analyze the polarity of its sentiment, where valence is simply the magnitude of positivity or negativity of the words.

Figure 2.2: Working of lexicon-based approach
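To make the working of this approach concrete, the following is a minimal sketch of lexicon-based scoring using NLTK's VADER implementation; the example review text and the 0.05 decision threshold are illustrative choices, not taken from the thesis code.

```python
# Minimal sketch of lexicon-based sentiment scoring with NLTK's VADER.
# The example review and the 0.05 threshold are illustrative choices.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # dictionary of words with valence scores

analyzer = SentimentIntensityAnalyzer()
review = "The film was wonderful, but the ending felt a little flat."
scores = analyzer.polarity_scores(review)
# scores holds 'neg', 'neu' and 'pos' proportions plus a 'compound' valence in [-1, 1]
label = "positive" if scores["compound"] >= 0.05 else "negative"
print(scores, "->", label)
```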



2.1.2 Machine learning


Machine learning is the approach of enabling a computer to imitate the way humans learn by using data and algorithms while progressively increasing accuracy. Machine learning [16] applies different algorithms to huge amounts of data; by training a model on the data, the model can perform a specific task. Generally,
machine learning algorithms can be classified into the following types:

• Supervised machine learning: Supervised machine learning algorithms


train the model on a labelled data set to produce the outputs. More precisely,
the machine is trained with both input and its corresponding output and then it
is asked to predict the outputs based on the test data set. Supervised machine
learning techniques are further categorized as:

– Classification
– Regression

• Unsupervised machine learning: Unsupervised machine learning algorithms


train the models using an unlabelled data set and the machine will predict the
results without any supervision. Then the model is trained only using the input
data. Unsupervised machine learning techniques are categorized as:

– Clustering
– Association

• Semi-supervised machine learning: As the name suggests, this technique


lies between both supervised and unsupervised machine learning techniques.
Here, the algorithms will use a combination of both labelled and unlabelled
data sets in the training phase.

• Reinforcement learning: This machine learning technique is quite different


from the above-discussed techniques. It uses an interactive environment so that the agent can learn from the feedback of its experiences using a trial-and-error method, in such a way that the model improves its accuracy over time.

Neural networks and deep learning :


A neural network is an array of algorithms intended to replicate the functioning of the human brain: it executes in a way that resembles human learning, experimenting with new things, drawing lessons from the past to improve, and making judgments as a result. The neural network’s capacity for learning can be crucial for analyzing the sentiment of text reviews. To produce complex outcomes, deep learning incorporates several hidden layers into the neural network. In comparison to classical machine learning methods, neural networks automatically learn and extract information. This improves the accuracy and performance of deep
learning models. Deep learning models process data and develop patterns like the
human brain. Decision-making processes are based on these patterns. Without any
explicit programming, deep learning approaches can automatically learn and improve
over time. A deep learning model consists of an input layer, a hidden layer, and an
output layer. Layers are made up of nodes, which mimic neurons in the human
brain. The output layer gets the output, the hidden layer does the calculations using
mathematical functions, and the nodes in the input layer receive the input [17]. The
structure of a neural network is shown in Figure 2.3 below.

Figure 2.3: Structure of a neural network

BERT Model :
Bidirectional Encoder Representations from Transformers (BERT) [18] is a natural language processing machine learning model developed by Google Research in 2018. The BERT model is based on a deep learning architecture called the transformer, in which every output and input are interlinked with each other.
This is due to the weights between the input and output being produced automati-
cally based on their relationship. This is referred to as attention in NLP. The BERT
model is trained on large texts, giving the architecture the ability to understand the
language and to learn variability in data patterns of the NLP tasks. As the name
suggests, the BERT model learns information both from the left and right sides of a
token in the training phase.
Generally, a transformer is comprised of an encoder to read the input text and a
decoder to predict the results. But as the main objective of the BERT model is to
build a language representation model, it only has the encoder. An encoder of the
BERT model takes input in the form of a series of tokens which will be transformed
into vectors to be processed by the neural network. There are many other choices of
transformers, such as those available from Hugging Face: DistilBERT, XLNet, GPT-2, etc. [19]. But the BERT model stands out among these and shows better performance in many NLP tasks.
The BERT model generally works in two steps: 1. pre-training and 2. fine-tuning. Pre-training involves training the model on unlabelled data sets; for fine-tuning, the pre-trained parameters are used to initialise BERT and are then fine-tuned using labelled data from the downstream tasks.

Figure 2.4: Working of a BERT neural network in classifying the IMDB movie reviews

2.2 Performance Metrics


To analyse the performance of a machine learning model, we will discuss various
metrics that might be utilised for model analysis. For the text-based binary class
classification utilised in this thesis, we used different performance metrics such as:

• Accuracy

• Precision

• Recall

• F1 score

Before we define each performance metric let us look at the following terms we use
while defining the performance of a model.

• True Positive (TP) - The result when a model predicts a positive class correctly.

• True Negative (TN) - The result when a model predicts a negative class cor-
rectly.

• False Positive (FP) - The result when a model falsely predicts the positive class for a sample that actually belongs to the negative class.

• False Negative (FN) - The result when the model falsely predicts the negative class for a sample that actually belongs to the positive class.

2.2.1 Accuracy
The accuracy of a classification model can be defined as the ratio of the total number of correct predictions made to the total number of predictions. The equation for accuracy can be given as [20],

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (2.1)

2.2.2 Precision
Precision is defined as the ratio of the number of correctly classified positive labels to the total number of labels classified as positive. The equation for precision can be given as [20],

Precision = TP / (TP + FP)    (2.2)

2.2.3 Recall
Recall is the ratio of true positives to the sum of true positives and false negatives; in other words, it measures how many of the actual positives were labelled correctly. The equation for recall is [20],

Recall = TP / (TP + FN)    (2.3)

2.2.4 F1 score
Another performance metric used is the F1 score, which summarizes precision and recall in a single value. The F1 score can be defined as the harmonic mean of precision and recall, lying between 0 and 1. The equation for the F1 score is [20],

F1 score = (2 * precision * recall) / (precision + recall)    (2.4)
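As an illustration of how these metrics are computed in practice, the sketch below uses scikit-learn's metric functions on a small set of hypothetical true and predicted labels (1 = positive, 0 = negative); the label vectors are invented for demonstration only.

```python
# Sketch of computing the four metrics with scikit-learn; the label vectors are hypothetical.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual sentiments (1 = positive, 0 = negative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # sentiments predicted by a classifier

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```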

2.2.5 Epoch and accuracy curve


This curve is plotted on a graph across the epoch and accuracy parameters, which
are used to understand the model’s performance. The number of complete passes the training data set makes through the algorithm is referred to as an epoch [20].

2.2.6 Epoch and loss curve


This curve is plotted on a graph across the epoch and loss parameters, which are
used to understand the training process of the model [20].
Chapter 3
Related Work

In 2019, Kusrini and Mochamad Mashur [21] compared the accuracy of model performance after performing sentiment analysis on Twitter data using a lexicon-based approach and polarity multiplication. The study’s findings call for a model
that can manage large amounts of data and perform more accurately on that data.
And this study has demonstrated that the lexicon-based technique is a tried-and-true
strategy that is typically utilized to classify the text easily and with great flexibility.
In 2021, Dingyi Yu [22] used TF-IDF, the Bag of Words (BoW) model and a Convolutional Neural Network (CNN) training method to create a sentiment
analysis system for movie reviews. The data set used for the training and testing
experiment contains 25,000 reviews of movies. The model includes methods like
L2 regularization and dropout to lower the danger of over-fitting. The final model
was found to have an average accuracy of 80.62 percent and a standard deviation
of 1.33. Clearly, the model still has scope for development regarding data selection,
text vectorization, and model optimization.
In 2013, Kamil Topal and Gultekin Ozsoyoglu [23] performed a study
on emotion analysis of IMDB movie reviews by detecting the emotion of a movie
review in order to observe the performance of movies. This study used a k-means
clustering algorithm to cluster movies according to the reviewer’s emotions per di-
mension. This study suggested a model that can handle large amounts of data in
order to map emotions.
Also in 2018, Rachana Bandana [24] studied sentiment analysis of movie reviews using heterogeneous features. This study uses a hybrid approach, a model combining machine learning and a lexicon-based approach, to determine the polarity of a movie review. The conclusion drawn from this
study looks for a model that may use algorithms other than machine learning for han-
dling large data. Also, deep learning features such as Word2Vec, Doc2Paragraph and
word embedding were applied to deep learning algorithms such as Recursive Neural
Network (RNN), Recurrent Neural Networks (RNNs) and Convolutional deep Neural
Networks (CNNs) to get a remarkable result.
In 2022, Sarika and Pavan Kumar [25] conducted a study in which Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) recurrent neural network techniques were compared to perform sentiment analysis on IMDb movie reviews. That analysis led to the conclusion that LSTM was more accurate
at predicting boundary values. GRU, however, predicted each class similarly. Over-
all, GRU performed somewhat better than LSTM at predicting multi-class text data
of movie reviews.


Gap identification of the research work:


Several studies have been conducted on sentiment analysis using various approaches
and models, such as lexicon-based methods, CNN, LSTM, and GRU. However, there
is a research gap concerning the direct comparison between lexicon-based approaches
and the BERT neural network model for sentiment analysis of movie reviews. Al-
though related works by Kusrini and Mochamad Mashur [21], Dingyi Yu [22], Kamil
Topal and Gultekin Ozsoyoglu [23], Rachana Bandana [24], and Sarika and Pavan
Kumar [25] have explored different techniques and models for sentiment analysis,
none of them specifically focus on comparing lexicon-based approaches with BERT.
The decision to conduct a comparative study between a lexicon-based approach
and the BERT neural network model is driven by several factors. Firstly, lexicon-
based approaches have demonstrated their effectiveness in sentiment analysis and are
widely utilized. These approaches utilize pre-defined sentiment dictionaries to assign
sentiment scores, making them efficient and easy to interpret. On the other hand, the
BERT neural network model, known for its advanced capabilities in natural language
processing tasks, including sentiment analysis, has emerged as a state-of-the-art ap-
proach. BERT models leverage pre-training on extensive unlabeled text, enabling
them to capture contextual representations and comprehend intricate language nu-
ances. This makes BERT highly suitable for tasks requiring a deep understanding
of context and semantics.
And particularly for lexicon approaches, the choice of the Bag-of-Words (BoW)
model and the Term Frequency-Inverse Document Frequency (TF-IDF) model in
this thesis is driven by several factors. Firstly, these models offer simplicity and
interpretability, making them easy to understand and analyze. They also provide
flexibility and adaptability, allowing for customization by incorporating additional
features or domain-specific lexicons. Additionally, the BoW and TF-IDF models
have a solid foundation in sentiment analysis and are widely used as baselines for
comparison. They are computationally efficient and suitable for large datasets like
IMDb movie reviews. Although these models have limitations in capturing nuanced
contexts, the explicit sentiment expressions in movie reviews make them suitable for
this task. The thesis aims to compare these lexicon-based approaches with the more
advanced BERT neural network model to evaluate the trade-offs between simplicity
and sophistication in sentiment analysis.
In conclusion, by comparing these two approaches, we can gain valuable insights
into their individual strengths and weaknesses. The lexicon-based approach offers
simplicity, transparency, and efficiency, while BERT holds more accurate and nu-
anced sentiment analysis through contextual understanding. This comparative study
allows for a comprehensive evaluation of their performance, facilitating informed
decision-making on the most suitable approach for sentiment analysis tasks, partic-
ularly in the context of textual IMDb movie reviews.
Chapter 4
Method

The focus of the method chapter is to present the methodology that we employed in
the thesis.
For RQ1, the research starts with a thorough analysis of the literature to find the
most popular algorithms for performing sentiment analysis. The most common em-
pirical methods include surveys, case studies, and experiments. Then the selected
algorithms are used and trained over the chosen data set, to evaluate each model’s
contribution in classifying the sentiment of the IMDb movie reviews.
For RQ2, we decided to use experimentation [26] as our research method in our
thesis, which is discussed in section 4.2 below. The experimental approach requires
the researcher to carry out an experiment in a methodical way to get the desired
results. This method’s main goal is to use present evaluation methodologies to apply
and assess the chosen algorithms. The experiment used the same hardware and
software tools covered in this chapter. And then the chosen IMDb movie reviews data
set is used to train the chosen algorithms for categorizing the reviews as positive or
negative using sentiment analysis. Finally, we show the comparison of the algorithms
based on performance measures to determine which one is the most accurate and
efficient.

4.1 Literature review


A literature review is a detailed examination of previous research on a certain
topic. To locate, evaluate, and synthesise relevant questions, systematic and co-
ordinated searches are necessary. A systematic literature review’s purpose is to offer
an overview of the body of information on a certain issue. This is also used to identify
research gaps and promote more research [27].
To address research question 1, we conducted an extensive literature review to
assess the performance of the selected algorithms in sentiment analysis. The review
encompassed a wide range of sources, including SpringerLink, IEEE Xplore, Google Scholar, the ACM Digital Library, IGI Global, and others. We identified and analysed a
large number of relevant papers using a properly constructed search string. This
literature research served to lay a strong basis and provide justification for the se-
lection of these two algorithms for our study. We were able to make conclusions and
substantiate our decision to look into the effectiveness of these particular algorithms
in sentiment analysis by looking at previous studies.
Process of Literature review:
The steps taken as part of this process are briefly explained below:


1. Identifying the keywords that are related to our thesis, the main key concepts
we chose are, "sentiment analysis", "IMDb movie reviews", "opinion mining",
"text-based classification", "deep learning", "neural networks", "lexicon-based
approaches" and more.

2. Shortlisting all the studied and reviewed resources that are helpful in working
with the thesis.

3. By selecting the references to be added based on a set of criteria such as:

• Considering inclusion requirements including works addressing at least


one of our research aims, publications published within the previous ten
years, and research papers with English as the primary language of the
study, as well as top-tier conference publications.
• Exclusion requirements involved excluding research papers written in
languages other than English and eliminating paid papers, student arti-
cles, and earlier versions with multiple versions. Additionally, we disre-
garded articles that were not directly relevant to the focus and objectives
of our research.

4.2 Experiment
The flowchart below shows the step-by-step process followed in the experimentation [26] involved in answering research question 2.

Figure 4.1: Flowchart showing various steps involved in performing the experiment
for addressing research question 2.

The overall outline of the experiment is as follows: open-source Python packages like matplotlib, pandas, scikit-learn, and others are used to train the selected algorithms for research question 2. Firstly, the IMDb movie reviews data set is used to conduct sentiment analysis with the lexicon-based methodology, and then with the BERT neural network model. Using methods like word embeddings and tokenization, we first preprocess the data set, then train the models on the training data, perform sentiment analysis on the test data, and categorize the outcomes as either positive or negative. Finally, we compare the two approaches and determine which one is more effective by using performance evaluation metrics like precision, accuracy, F1 score, the epoch-accuracy and epoch-loss curves, and the area under the curves.

Hardware tools

OS macOS version 12.6


Processor Apple M1 chip
Memory 8GB

Table 4.1: Hardware tool 1

OS Windows 11
Processor Intel Core i7
RAM memory 16GB

Table 4.2: Hardware tool 2

4.2.1 Software tools


We chose the Python programming language for the coding part of the thesis because it
is easy to learn and implement, and it contains a variety of libraries and modules that
will make our job simpler. This language is comprised of several frameworks that
aid in fields such as artificial intelligence, machine learning, and statistical analysis.
Furthermore, as our work is data-driven, using Python as a foundation language made visualization and analysis quicker and more thorough.
The Python libraries used as a part of the study are:

1. NumPy: NumPy (Numerical Python) is a core Python module that is widely


used in machine learning. It supports large, multi-dimensional arrays and
matrices, as well as a set of mathematical functions for effectively operating on
big arrays. In this thesis, we used NumPy for data processing and numerical
calculations [28].

2. Pandas: Python’s pandas library is a popular tool for data analysis and manip-
ulation. It offers several methods to efficiently handle and modify data coupled
with high-performance data structures like data frames and series. Pandas is
widely used in machine learning because of its capabilities in data handling,
preprocessing, data exploration, and analysis. In our thesis, we turned all of the
obtained data into a data frame for analysis and prediction using pandas [28].

3. Matplotlib: Python’s matplotlib is a well-known charting toolkit that offers


a versatile and complete collection of tools for making different kinds of plots
and visualizations. We used matplotlib for plotting bar plots that are used in
data visualization and data analysis [28].

4. Seaborn: Seaborn is a matplotlib-based Python data visualization package. It


is intended primarily for the creation of aesthetically appealing and informative
statistics visuals. The seaborn library is also used for data exploratory analysis
[28].

5. NLTK: The Natural Language Toolkit (NLTK) is a set of resources and tools
for working with human language data that are included in a Python package.
It is frequently employed in tasks involving computational linguistics and Nat-
ural Language Processing (NLP). This library is used for word embeddings,
text categorization, word stemming, text processing, and tokenization [29].

6. Sklearn: Scikit-learn (or sklearn) is a popular Python machine learning li-


brary. It contains a number of tools and algorithms for carrying out various
machine-learning tasks such as classification, regression, clustering, dimension-
ality reduction, and model selection. In this thesis, the sklearn library is used
to divide data into train, validation, and test subsets, and build features for
text inputs, tokens, and count vectors such as frequency count for TF-IDF [28].

7. TensorFlow: TensorFlow is a Python library that is used to create deep-


learning models that can handle huge data. In this thesis, TensorFlow is used
for the BERT neural network model in order to make the process simple and
fast [28].

8. PyTorch: Torch, often known as PyTorch, is a prominent open-source ma-


chine learning framework used for deep learning applications. It is built on the
torch library and offers a high-level interface for developing and training neural
networks. Because of its versatility and performance, it is a popular choice for
constructing and training cutting-edge deep learning models. In the thesis we
have used this library for the BERT neural network model in order to make
the process simple and fast [28].

4.2.2 Data collection and visualization


The data utilized in this study for sentiment analysis is extracted from a data set
imported from the online website https://www.kaggle.com/. The IMDb movie
reviews data set contains 50K movie reviews [30]. This data set contains 25,000 re-
views, each categorized as negative or positive, and has three columns: id, sentiment, and review. The picture below depicts the first 20 rows of the data set.

Figure 4.2: The imported data set is depicted in this image, with columns having
the id, review, and sentiment of the movies.

The IMDb movie reviews data set, which was imported, has 50,000 movie re-
views. The data set shown here in Figure 4.2 has 25,000 positive reviews and 25,000
negative reviews. Having an equal number of positive and negative reviews in the
data set offers several advantages for sentiment analysis performed by using machine
learning techniques. It ensures balanced training, preventing biases towards either
sentiment. The equal distribution facilitates accurate performance evaluation, allow-
ing for reliable comparisons of different models. It improves model generalization by
capturing underlying patterns for both sentiments. Additionally, it mitigates bias
and promotes fair sentiment analysis results. Overall, the balance in the data set
enhances the effectiveness and reliability of sentiment analysis models.
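A minimal sketch of importing and inspecting the data set with pandas is shown below; the file name is a placeholder, while the column names follow those described above.

```python
# Sketch of importing and inspecting the IMDb reviews data set with pandas.
# "imdb_reviews.csv" is a placeholder file name for the file downloaded from Kaggle.
import pandas as pd

df = pd.read_csv("imdb_reviews.csv")

print(df.head(20))                     # first rows with the id, review and sentiment columns
print(df["sentiment"].value_counts())  # check the balance of positive and negative reviews
```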

Data preprocessing steps:

4.2.3 Removing HTML tags and noises in the text


In order to obtain meaningful data, it is often important to clean the data set by removing HTML tags and other unwanted elements, such as URLs and hashtags, which are of no use.
Let us consider the following review of the data set as an example.
Review: A wonderful little production. <br /><br />The filming technique is
very unassuming- very old-time-B... [30]
Processed review: A wonderful little production.The filming technique is very
unassuming- very old-time-B...
In the example above, the review initially contains unwanted HTML tags, which are removed when the data is preprocessed. However, the review still contains some unwanted symbols, which are removed in the next step.

4.2.4 Removing special characters


This step involves removing special characters like punctuation marks, symbols, and
numbers that have no use while performing sentiment analysis. This also includes
converting all the characters into lowercase.
Let us consider the same review taken above as an example.
Review: A wonderful little production. <br /><br />The filming technique is
very unassuming- very old-time-B... [30]
Processed review: a wonderful little production the filming technique is very unas-
suming very old time
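A possible implementation of these two cleaning steps (sections 4.2.3 and 4.2.4) using Python's re module is sketched below; the exact patterns are illustrative and may differ from those used in the thesis code.

```python
# Sketch of the cleaning steps from sections 4.2.3 and 4.2.4; the patterns are illustrative.
import re

def clean_review(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)           # strip HTML tags such as <br />
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)       # drop punctuation, symbols and numbers
    text = text.lower()                            # convert all characters to lowercase
    return re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace

print(clean_review("A wonderful little production. <br /><br />The filming technique is very unassuming!"))
```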

4.2.5 Word stemming


Stemming is the process of stripping a word of its suffixes and prefixes and returning
it to its root form. In order to treat terms with the same root as the same word, the proposed system conducts stemming, which reduces the number of distinct word forms. Here,
we’ve performed stemming using the NLTK (Natural Language Tool Kit) Python
module.
Let us consider another review from the data set to show the stemming of words.
Review: What happened? What we have here is basically a solid and plausible
premise with a decent twist [30].
Processed review: what happen what we have he is basic a solid plaus premise with
a decent twist

4.2.6 Removing stop words


This step involves removing stop words from the data. Stop words are words that add no meaning to the text, i.e. their presence serves no purpose. Words like "him", "they", "it", "both", "how" and "does" have nothing to do with sentiment identification, so such stop words are removed, which helps in decreasing the processing time of the model.


Review: I really like Salman Kahn so I was really disappointed when I have seen
this movie. It didn’t have much of what I expected [30].
Processed review: really salman khan really disappointed seen movie expected
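The stemming and stop-word removal steps can be sketched with NLTK as follows; the thesis does not state which stemmer was used, so the PorterStemmer is assumed here purely for illustration.

```python
# Sketch of stop-word removal and stemming with NLTK; the PorterStemmer is an assumption.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("stopwords")
nltk.download("punkt")

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def stem_and_filter(text: str) -> str:
    tokens = word_tokenize(text.lower())
    kept = [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop_words]
    return " ".join(kept)

print(stem_and_filter("I really liked the movie but it did not match my expectations"))
```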

4.2.7 Tokenization
Tokenization is a data preprocessing technique that splits a piece of text into smaller parts, such as words, phrases, or other meaningful elements called tokens, which makes counting the number of words in the text easier. The proposed system
performed tokenization at the word level so as to consider the sentiment polarity of
each word.
An example of tokenization of a review is shown below.
Review: I thought this was a wonderful way to spend time on a too-hot summer
weekend, sitting in the movie theatre enjoying myself. I really loved the film [30].
Tokenised review: ’i’, ’thought’, ’this’, ’was’, ’a’, ’wonderful’, ’way’, ’to’, ’spend’,
’time’, ’on’, ’a’, ’too’, ’hot’, ’summer’, ’weekend’, ’sitting’ ’in’, ’the’, ’movie’, ’the-
atre’, ’enjoying’, ’myself ’, ’i’, ’really’,’ loved’, ’the’, ’film’
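A minimal sketch of word-level tokenization with NLTK's word_tokenize is shown below; the review is a shortened version of the example quoted above.

```python
# Sketch of word-level tokenization with NLTK.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")

review = "I thought this was a wonderful way to spend time. I really loved the film."
tokens = word_tokenize(review.lower())
print(tokens)       # ['i', 'thought', 'this', 'was', 'a', 'wonderful', ...]
print(len(tokens))  # number of tokens in the review
```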

4.2.8 Word Embedding


Machine learning algorithms usually do not have the capacity to interpret the data
consisting of plain text or strings in its original form. They need numerical inputs
in order to perform the tasks. The process of mapping words from the lexicon to corresponding vectors of numbers, so they can be used for sentiment predictions, is called word embedding. Two well-known pre-trained word embedding models are word2vec [2] and GloVe. We did, however, use an embedding
layer given by Keras, in which we used a vocabulary of 8000 unique words, with
each word embedded in a 100-dimension vector space. We trained our embedding
layer using training samples from the IMDb movie review data set rather than a
pre-trained embedding word model [31].
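A sketch of this tokenization and embedding-layer setup, assuming the Keras API bundled with TensorFlow, is given below; the vocabulary size (8000) and embedding dimension (100) come from the text, while the example texts and maximum sequence length are hypothetical.

```python
# Sketch of the Keras tokenizer and embedding layer described above.
# VOCAB_SIZE (8000) and EMBED_DIM (100) come from the text; MAX_LEN and the
# example texts are hypothetical placeholders.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 8000, 100, 200

train_texts = ["a wonderful little production", "what a disappointing movie"]

tokenizer = Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(train_texts)                    # build the word index from training samples
sequences = tokenizer.texts_to_sequences(train_texts)  # map each word to an integer id
padded = pad_sequences(sequences, maxlen=MAX_LEN)      # pad/truncate reviews to a fixed length

# Embedding layer trained from scratch (no pre-trained word vectors), mapping
# each of the 8000 vocabulary words to a 100-dimensional vector.
embedding_layer = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM)
```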

4.2.9 Data splitting: training, validation, and testing


In general, while working with machine learning models, we divide the data set into
three categories: training data set, validation data set, and testing data set. The
training data set is used to train the model over the data set, while the validation
set is a subset of the training data set that is not directly utilized for training but
provides insights into the model’s performance. The testing data set, on the other
hand, is used to assess the model’s accuracy.
This division of the data set can be done in many ways. Here in our study, first,
we checked for duplicate values on the data set and found that the imported data set
consists of 0.836% (i.e. 418 values) of duplicate values. So the total number of unique
values is 49582. So now, the data set is randomly split into three subsets: the training,
validation, and testing set with percentages of 70:10:20 respectively. The splitting
of the data set is done using the scikit-learn function called the train_test_split().
The below table gives the number of reviews in each data subset.

Data subset Number of reviews


Training data set 34707
Validation data set 4953
Test data set 9922

Table 4.3: Table consisting of the number of reviews in training, validation and
testing data sets.
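
The sketch below shows how such a 70:10:20 split can be produced with train_test_split; the CSV file name and the use of pandas' drop_duplicates are assumptions based on the Kaggle data set described earlier.

# Sketch of the 70:10:20 split with scikit-learn; the file name is an
# assumption based on the Kaggle IMDb data set.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("IMDB Dataset.csv").drop_duplicates()   # removes duplicate reviews

# Split off 20% for testing, then take 1/8 of the remainder (10% of the whole
# data set) for validation, leaving 70% for training.
train_val, test = train_test_split(df, test_size=0.20, random_state=42)
train, val = train_test_split(train_val, test_size=0.125, random_state=42)

print(len(train), len(val), len(test))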

4.3 Classifiers
4.3.1 Lexicon-based approach:
As previously stated, sentiment analysis on movie reviews can be accomplished in a
variety of ways; however, the approach we have selected is the standard method
employing a lexicon-based system. The two models addressed here are the BoW
("Bag of Words") model and the TF-IDF ("Term Frequency - Inverse Document
Frequency") model.

BoW model :
The bag of words model involves turning the data set into a matrix in which each
document is represented as a vector; such a matrix is called a Document Term Matrix
(DTM). Here, for the IMDb data set, the rows of the matrix correspond to the reviews
and the columns correspond to the words of the reviews. More generally, the columns
can be n-grams, i.e. phrases of "n" consecutive words. In the proposed system, the
values of the DTM are filled with counts, where a count is the number of occurrences
of the word in the corresponding review.
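
A minimal sketch of building such a document term matrix with scikit-learn's CountVectorizer is shown below; unigrams are assumed, since the n-gram range actually used is not stated in the thesis.

# Document term matrix (DTM) sketch with CountVectorizer; rows correspond
# to reviews and columns to words, each cell holding a word count.
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "a wonderful little production",
    "what a waste of time truly awful",
]

vectorizer = CountVectorizer(ngram_range=(1, 1))
dtm = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())
print(dtm.toarray())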

TF-IDF model :
This model is used to convert text documents into matrices of TF-IDF features. The
term frequency-inverse document frequency measures how important a word is to a
document within the collection of texts.
These models are then trained on the imported data set to perform a binary
classification, predicting the sentiment of each review and categorizing it as either
positive or negative.
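
The sketch below illustrates this setup with scikit-learn's TfidfVectorizer. The thesis does not name the classifier fitted on top of the BoW/TF-IDF features, so LogisticRegression is used here purely as an illustrative placeholder.

# TF-IDF features plus a placeholder binary classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_reviews = ["a wonderful little production", "truly awful a waste of time"]
train_labels = [1, 0]                                  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_reviews, train_labels)

print(model.predict(["what a wonderful film"]))        # expected: [1]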

4.3.2 BERT neural network


We use a neural network model called Bidirectional Encoder Representations from
Transformers (BERT) to perform sentiment analysis on the IMDb movie data set. In
this study, the aim is to categorize the reviews as either positive or negative. BERT
is a pre-trained deep learning model that can be fine-tuned for a specific task.

Fine tuning :
The BERT neural network must be fine-tuned on the data set of labeled movie
reviews. Fine-tuning is the process of adjusting the weights of the model using
back-propagation and gradient descent in order to minimize the loss function. The
loss function measures the discrepancy between the predicted sentiment and the
actual sentiment of the labeled data. The fine-tuning process is performed using a
PyTorch optimizer; a minimal code sketch of this setup is given after the list below.
Mathematically, the fine-tuning is represented as follows:
f(x_i; θ) = softmax(W_2 · ReLU(W_1 · h_i + b_1) + b_2) [32]
The terms of the equation are explained as follows:

1. The function f(x_i; θ) defines a neural network model that takes as input a
movie review in the form of a sequence of tokens and uses a pre-trained BERT
model. In this study, we have used the bert-base-uncased model to obtain
contextualized embeddings of the input sequence, i.e. a movie review.

2. The word embeddings are then passed through two linear transformations, the
first followed by a ReLU activation, and a softmax function is applied to the
output to obtain the probability distribution over the classes (positive or
negative sentiment).

3. The model parameters (θ) are trained by minimizing a loss function between the
anticipated probability distribution and the true labels of the movie reviews.

4. The h_i is the output of the BERT model’s final encoding layer for the pre-
processed text.

5. x_i is the input review; W_1 and b_1 are the weights and biases of the first
fully connected layer; W_2 and b_2 are the weights and biases of the second
fully connected layer; ReLU is the rectified linear activation function; and
softmax is the softmax activation function.

6. The bert-base-uncased model was fine-tuned using hyperparameters such as a
learning rate of 2e-5, a PyTorch optimizer, a linear learning-rate scheduler, a
maximum sequence length of 128, a batch size of 32, and five training epochs.

7. Finally, the BERT model uses two output classes, one for positive and one for
negative sentiment.
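
The snippet below is a minimal sketch of this fine-tuning setup with the Hugging Face transformers library and the hyperparameters listed above. The data loading is a placeholder, and AdamW is assumed as the PyTorch optimizer, since the thesis only states that a PyTorch optimizer was used.

# Fine-tuning sketch for bert-base-uncased: learning rate 2e-5, linear
# scheduler, maximum sequence length 128, batch size 32, five epochs.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import (BertTokenizer, BertForSequenceClassification,
                          get_linear_schedule_with_warmup)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

train_texts = ["a wonderful little production", "truly awful, a waste of time"]   # placeholder reviews
train_labels = [1, 0]                                                              # 1 = positive, 0 = negative

enc = tokenizer(train_texts, truncation=True, padding="max_length",
                max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
num_epochs = 5
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0,
                                            num_training_steps=num_epochs * len(loader))

model.train()
for epoch in range(num_epochs):
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        out.loss.backward()              # cross-entropy loss between predictions and labels
        optimizer.step()
        scheduler.step()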

4.4 Sentiment Classification


Following the cleaning and preprocessing of the data set, the data is utilized to feed
the classifier models on which the sentiment analysis is performed. The sentiment
classification of IMDb movie reviews entails categorizing each review as either posi-
tive or negative.
The lexicon-based approach uses a word dictionary called a lexicon, which comprises
words assigned a sentiment polarity; this polarity is usually represented by a
sentiment score, a numeric measure of the polarity of the sentiment. Here, the
classifier models have been trained to predict sentiment labels by assigning the
value "1" to a positive sentiment and "0" to a negative sentiment.
The BERT neural network comprises two output nodes that indicate the probability
of positive and negative sentiment. The network is fed input phrases, i.e. movie
reviews from the IMDb data set, along with their matching sentiment labels (1 for
positive or 0 for negative). The model then learns to adjust the weights of its
connections to improve prediction accuracy. Finally, when used for prediction, the
network takes a sentence as input and outputs a probability distribution across the
two possible sentiment labels (0 and 1). The label with the highest probability is
then chosen as the predicted sentiment.
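
A minimal sketch of this prediction step is shown below; the pre-trained bert-base-uncased weights are loaded for illustration, whereas in practice the fine-tuned weights from Section 4.3.2 would be used.

# Prediction sketch: softmax over the two output logits, argmax as the label.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

inputs = tokenizer("I really loved the film", truncation=True,
                   max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)          # probabilities for labels 0 (negative) and 1 (positive)
predicted_label = int(torch.argmax(probs))
print(probs, predicted_label)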
Table 4.4 below shows examples of IMDb reviews classified as either positive or
negative.

Review                                                                          Actual sentiment (positive = 1, negative = 0)   Predicted sentiment
The movie was very interesting and i loved it                                   1                                               1
The movie was awful never have seen such a thing.                               0                                               0
It was entertaining and joyful, i really had a great experience watching it.    1                                               1
I think i have wasted my time not worth and was very bad.                       0                                               0

Table 4.4: Sentiment classification table showing examples of actual sentiment and
predicted sentiment of movie reviews using the lexicon-based and BERT neural net-
work classifier models.
Chapter 5
Results and Analysis

This chapter presents a full explanation of the findings achieved in selecting algo-
rithms for efficient sentiment analysis on IMDb movie reviews, as well as the results
obtained after training the selected algorithms on the imported data set. To perform
sentiment analysis on the IMDb movie review data, the Bag of Words (BoW) model,
the TF-IDF model, and the BERT neural network model are chosen and used. To
evaluate the algorithms, we used performance metrics such as accuracy, precision,
recall and F1 score, as well as the epoch-loss and accuracy curves. Every reported
result is computed on the test data set. The outcomes of the thesis study are
discussed in depth in the following sections of the chapter.

5.1 Results of Literature review


1. Sentiment analysis of movie reviews using heterogeneous features [24]: In this study, the system model is built using heterogeneous characteristics such as machine learning-based and lexicon-based features, together with supervised learning algorithms such as Naive Bayes (NB) and Linear Support Vector Machine (LSVM). Based on implementation and observation, it is concluded that the suggested heterogeneous features and hybrid approach can produce a more accurate sentiment analysis system than existing baseline systems.

2. Sentiment analysis of IMDB reviews using Deep Learning Classifier [33]: In this paper, a novel approach for user sentiment analysis based on a deep Convolutional Neural Network (CNN) and natural language processing is suggested, using a data set of 50k reviews from IMDb.

3. A comparison of Lexicon-based and Transformer-based Sentiment Analysis on code mixed of low resource languages [34]: The performance of Sentence-BERT sentiment analysis models on code-mixed low-resource texts is compared to lexicon-based sentiment analysis models in this paper. The testing revealed that the combined Sentence-BERT model and Google machine translation achieved average accuracy.

4. Performance analysis of different Neural Networks for Sentiment Analysis on IMDb movie reviews [35]: This study demonstrates the use of sentiment analysis on movie reviews to choose the most appropriate architecture. The results demonstrated that CNN performed better than other cutting-edge methods for classifying sentiment in IMDb movie reviews.

5. Sentiment Analysis using Lexicon-based approach [36]: Big data generates massive volumes of unstructured data, which must be processed in various ways to obtain insight. To handle this data, this research proposes a Human Sentiment Analysis Model (HSAM). To perform sentiment analysis on a given collection of data, the suggested model employs the lexicon method's dictionary-based approach and derives the sentiment polarity associated with each word from the SentiWordNet lexical dictionary. This model is diametrically opposed to any other study report examined thus far.

Table 5.1: Table showing the results of the literature survey conducted in our thesis.

6. Comparing LSTM and GRU for multiclass Sentiment Analysis of Movie Reviews [25]: This study compared the accuracies of LSTM and GRU models by gathering data from two sources, filtering it with Porter stemming and stop-word removal, and coupling the models with convolutional neural networks. The results indicated that LSTM performed better at border values, whereas GRU predicted multi-class text data of movie reviews somewhat better than LSTM.

7. Hybrid convolutional bidirectional recurrent neural network-based sentiment analysis on movie reviews [37]: This article presents a hybrid Convolutional Bidirectional Recurrent Neural Network (CBRNN) model by merging a two-layer CNN with a Bidirectional Gated Recurrent Unit (BGRU). The suggested CBRNN model outperforms the state of the art by 2% to 4%, with F1 scores of 87.62% and 77.4% on the IMDb and Polarity data sets, respectively. However, training takes somewhat longer than previous methods.

8. Analyzing Sentiment using IMDb Dataset [38]: The proposed work used sentiment analysis on the IMDb movie reviews data set to illustrate how significant insights may be obtained from textual data. Four classic ML algorithms, Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT), were evaluated using six metrics: confusion matrix, accuracy, precision, recall, F1 measure, and AUC.

9. Combining a rule-based classifier with weakly supervised learning for Twitter sentiment analysis [39]: This paper provides a method for sentiment analysis on Twitter that combines a rule-based classifier with a weakly supervised naive Bayes classifier. The experimental findings reveal that the technique outperforms the baseline in terms of recall, precision, F1 score, and accuracy.

10. Sentiment Information based Model for Chinese text Sentiment Analysis [40]: This paper proposes a Sentiment Information-based Network Model (SINM) for explicitly learning sentiment knowledge in Chinese text. It employs a hybrid task-learning algorithm with a transformer encoder and LSTM as model components to learn meaningful emotional expressions and forecast sentiment tendencies. Experiments using ChnSentiCorp and ChnFoodReviews revealed that SINM outperforms and generalises better than most current approaches.

Table 5.2: Continuation of table 5.1 showing results of the literature survey.

11. Comparative analysis of customer sentiments on competing brands using the Hybrid model approach [41]: This paper uses sentiment analysis to determine individual user impressions of two top smartphone brands in India. The tweets about the two cellphones are utilised as individual consumer input, and the sentiment of each tweet is categorised using a hybrid model, a combination of lexicon-based sentiment analysis and naive Bayes algorithms. The overall feelings for each of the two cellphones are contrasted, offering a bird's-eye view of the users' impressions. Finally, the user attitudes toward specific features of the two cellphones are compared, creating a valuable feedback mechanism for firms.

12. BERT-IAN Model for Aspect-based Sentiment Analysis [18]: The purpose of this study is to investigate aspect-based sentiment analysis in order to forecast the sentiment polarity of a certain aspect in a phrase. To increase the accuracy of the current model, a BERT-IAN sentiment analysis model is presented. The authors employ a BERT pre-training model to encode aspects and context, and a transformer encoder with interactive attention to learn the attention between aspect and context interactively. The experimental results on the restaurant and laptop data sets demonstrate the BERT-IAN model's efficacy and superiority.

13. A comparative analysis of sentiment classification based on Deep and Traditional ensemble Machine Learning Models [42]: The performance of deep and standard ensemble models for binary sentiment classification is investigated in this research. Three traditional ensemble models (Voting Ensemble, Bagging Ensemble, and Boosting Ensemble) and three deep learning ensemble models (7-L CNN + Gated Recurrent Unit (GRU), 7-L CNN + GRU + GloVe embedding, and 7-L CNN + Long Short-Term Memory (LSTM) + attention layer) are used to perform sentiment classification on two data sets: product reviews and restaurant reviews. In most situations, the deep learning ensemble models outperformed the standard ensemble models, with 7-L CNN + GRU + GloVe and 7-L CNN + LSTM + attention layer attaining the greatest accuracy.

Table 5.3: Continuation of table 5.1 showing results of the literature survey.

14. Foreign Rate Exchange Prediction using Neural Network and Sentiment Analysis [43]: This paper investigates the prediction of foreign currency markets using neural networks and sentiment analysis. It demonstrates a method for doing sentiment analysis utilising a combination of naive Bayes and lexicon-based algorithms to analyse and forecast the overall sentiment of different traders. Sentences were extracted from tweets and categorised as positive or negative. The accuracy of the sentiment analysis was determined to be 90.625%.

15. Public Sentiment Assessment of Coronavirus specific Tweets using a Transformer based BERT classifier [44]: This study uses tweets providing information regarding the coronavirus pandemic as input for sentiment analysis. BERT is used to determine sentiment categories, whereas TF-IDF is used to summarise subjects. To detect negative sentiment features, trend analysis and qualitative approaches are employed. According to the findings of this study, fine-tuned BERT is accurate in sentiment classification and conveys COVID-19-related post characteristics of the TF-IDF themes properly. A BERT and TF-IDF hybrid classifier is used to assess coronavirus Twitter sentiments in this study.

Table 5.4: Continuation of table 5.1 showing results of the literature survey.

According to the findings gathered from the literature review, various research
papers employed different methods to conduct sentiment analysis. These approaches
are summarized in table 5.5 below.

• Sentiment analysis of movie reviews using heterogeneous features. Machine learning algorithms: Naive Bayes, Linear Support Vector Machine. Approach: comparison of traditional models with machine learning models.
• Sentiment analysis of IMDB reviews using deep learning classifier. Deep learning neural networks: Convolutional Neural Network (CNN). Approach: deep learning model.
• A comparison of lexicon-based approach and Transformers. Deep learning neural networks: Transformers. Approach: comparison of a traditional model with a deep learning model.
• Performance analysis of different neural networks for sentiment analysis on IMDB movie reviews. Deep learning neural networks: Convolutional Neural Network (CNN), Long Short Term Memory network (LSTM), Recurrent Neural Network (RNN). Approach: comparison of neural network models.
• Sentiment analysis using Lexicon based approach. Approach: performing sentiment analysis using a traditional (lexicon-based) model.
• Comparing LSTM and GRU for multiclass sentiment analysis of movie reviews. Deep learning neural networks: Long Short Term Memory network (LSTM), Gated Recurrent Unit (GRU). Approach: comparison of deep learning models.
• Hybrid convolutional bidirectional recurrent neural network based sentiment analysis on movie reviews. Deep learning neural networks: Convolutional Bidirectional Recurrent Neural Network (CBRNN), Bidirectional Gated Recurrent Unit (BGRU). Approach: a hybrid approach.

Table 5.5: Summary of table 5.1



• Analyzing sentiment using IMDB dataset. Machine learning algorithms: Naive Bayes, Logistic Regression, Random Forest, Decision Tree. Approach: comparison of machine learning models.
• Combining a rule based classifier with weakly supervised learning for Twitter sentiment analysis. Lexicon approach: rule-based classifier. Machine learning algorithms: Naive Bayes. Approach: hybrid approach.
• Sentiment Information based Model for Chinese Text Sentiment Analysis. Deep learning neural networks: Long Short Term Memory network (LSTM). Approach: neural network model.
• Comparative analysis of customer sentiments on competing brands using the Hybrid model approach. Machine learning algorithms: Naive Bayes. Approach: hybrid approach.
• BERT-IAN Model for Aspect-based Sentiment Analysis. Deep learning neural networks: Bidirectional Encoder Representations from Transformers (BERT). Approach: neural network model.
• A comparative analysis of sentiment classification based on Deep and Traditional ensemble Machine Learning Models. Machine learning algorithms: ensemble models. Approach: comparison of different machine learning models.

Table 5.6: Continuation of the Summary of table 5.1



• Foreign Rate Exchange Prediction using Neural Network and Sentiment Analysis. Machine learning algorithms: Naive Bayes. Approach: hybrid approach.
• Public Sentiment Assessment of Coronavirus specific Tweets using a Transformer based BERT classifier. Lexicon approach: TF-IDF model. Deep learning neural networks: Bidirectional Encoder Representations from Transformers (BERT). Approach: hybrid approach.

Table 5.7: Continuation of the Summary of table 5.1

5.2 Results of the experiment


Based on the evaluations from the literature survey performed, we opted for the
lexicon-based models using the BoW and TF-IDF representations, as well as the
BERT model, for classifying the sentiments of the reviews in the IMDb movie review
data set. After dividing the data set into training, validation, and testing sets, each
model was evaluated, and the table below shows the number of positive reviews and
negative reviews classified by each model (10,000 reviews in total).

Name of the model No. of positives No. of negatives


BoW model 4962 5038
TF-IDF model 4707 5293
BERT neural network 4977 5023

Table 5.8: Table of the number of reviews classified into positive and negative by
each classifier model used in the study.

The experiment was carried out in order to answer research question 2. By examining
the performance metrics of each algorithm, we can determine which of the chosen
approaches is more accurate at conducting sentiment analysis. All of the findings
displayed are the outcomes of experimentation on the test data set.
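
The metrics reported in the following subsections can be computed with scikit-learn; the sketch below uses small placeholder label lists for illustration.

# Computing accuracy, precision, recall, and F1 score with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]     # actual sentiments of test reviews (placeholder)
y_pred = [1, 0, 1, 0, 0]     # sentiments predicted by a classifier (placeholder)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))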

5.2.1 Accuracy
By determining the accuracy of each model, this performance metric is used to
identify the most efficient algorithm among the selected models. The accuracy
of the various models is shown in Table 5.9 below.

Name of the model Accuracy values


BoW model 79.15%
TF-IDF model 78.98%
BERT neural network 90.67%

Table 5.9: Table of accuracy scores for all classifier models in this study.

Figure 5.1: A bar plot comparing the accuracy scores of three models employed in
this study.

Observing the above comparison of accuracy scores, the BERT model has the highest
accuracy score, 90.67%, among the selected models, followed by the BoW model
with 79.15%.

Figure 5.2: Graph plotted using the BERT model to demonstrate the model’s accu-
racy using validation data.

5.2.2 Precision
This performance metric measures the proportion of predicted positive instances
that are actually positive. The selected models are evaluated using this metric, and
the precision scores of each model are shown in Table 5.10 below.

Name of the model Precision scores


BoW model 0.79
TF-IDF model 0.79
BERT neural network 0.88

Table 5.10: Table of precision scores for all classifier models in this study.

Figure 5.3: A bar plot comparing the precision scores of three models employed in
this study.

Observing the above comparison of precision scores of the three models, the BERT
model again has the highest precision score among the evaluated models, followed
by the BoW and TF-IDF models, which have the same value.

5.2.3 Recall
Recall measures the proportion of actual positive instances that a model correctly
identifies; we use this performance metric to help find the best-performing algorithm.
Table 5.11 below shows the recall values of each model evaluated on the test data.

Name of the model Recall values


BoW model 0.79
TF-IDF model 0.78
BERT neural network 0.92

Table 5.11: Table of recall values for all classifier models in this study.

Figure 5.4: A bar plot comparing the recall scores of three models employed in this
study.

Here also, the BERT model has the best recall score of 0.92, followed by the BoW
model with a 0.79 recall score.

5.2.4 F1 score
The F1 score, the harmonic mean of precision and recall, is another performance
metric that we have chosen to evaluate the selected models. Table 5.12 shows the
F1 score for each algorithm.

Name of the model F1 scores


BoW model 0.79
TF-IDF model 0.78
BERT neural network 0.88

Table 5.12: Table of F1 scores for all classifier models in this study.

Figure 5.5: A bar plot comparing the F1 scores of three models employed in this
study.
Chapter 6
Discussion

The discussion chapter contains an overview of the thesis findings as well as their
contribution to answering the research questions. We also discuss aspects that con-
tradict the findings.

Research question 1: How do the lexicon-based approach and the BERT neural
network model contribute to sentiment analysis in providing efficient text
classification?
The motivation behind this research question is to comprehend how each approach,
the lexicon-based and the BERT neural network model, works before applying
sentiment analysis to the data set of movie reviews by training the models.
We used a literature review method in this work to learn about several approaches
to performing sentiment analysis on movie reviews. Based on the study's findings,
we chose to categorise IMDb movie reviews into two sentiment labels, positive or
negative, using both a lexicon-based approach and a BERT neural network model.
By reviewing the results obtained from the trained models on the imported data set,
we were able to successfully address this research question. The chosen models were
able to classify the reviews largely in agreement with the labels in the data. This is
demonstrated using a confusion matrix, which is constructed to analyze the
classification performance of each model: it shows the number of correct and
incorrect predictions made by the classifiers.
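
A minimal sketch of how such a confusion matrix can be computed and plotted with scikit-learn is shown below; the label lists are placeholders for the test-set labels and a model's predictions.

# Confusion matrix sketch; rows correspond to actual labels and columns
# to predicted labels.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = [1, 0, 1, 1, 0, 0]     # placeholder actual labels
y_pred = [1, 0, 1, 0, 0, 1]     # placeholder predicted labels

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=[0, 1]).plot()
plt.show()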

BoW model: The matrix displayed in Figure 6.1 represents the results of binary
classification using the Bag-of-Words (BoW) model in our study. It indicates the
model’s ability to accurately predict whether a labeled review from the IMDb movie
reviews data set is positive or negative, belonging to either class 0 or class 1. The
matrix allows us to assess the model’s performance in correctly identifying the overall
number of positive and negative reviews.


Figure 6.1: Confusion matrix of the BoW model showing the number of true
positives and true negatives classified on the IMDb movie reviews data set.

TF-IDF model: The TF-IDF model was employed in our research to evaluate
whether a labeled review of the IMDb movie reviews data set would be positive
or negative. The results of the model's binary classification, class 0 or class 1, are
shown in Figure 6.2 below. The model's accuracy in predicting the total number of
positives and negatives is evident.

Figure 6.2: Confusion matrix of the TF-IDF model showing the number of true
positives and true negatives classified on the IMDb movie reviews data set.

BERT model: In our study, the BERT model was used to determine whether
a labelled review from the IMDb movie reviews data set would be positive or
negative. The matrix in Figure 6.3 below displays the outcomes of the model's
binary classification, class 0 or class 1. It is clear that the model was almost
entirely successful in predicting both the overall number of positives and negatives.

Figure 6.3: Confusion matrix of the BERT model showing the number of true
positives and true negatives classified on the IMDb movie review data set.

Research question 2: Which of the two, the lexicon-based approach or the BERT
neural network, is more accurate in conducting sentiment analysis?
Here our main goal is to find out which algorithm performs sentiment analysis better
in terms of efficiency and accuracy.
Accuracy: The accuracy score in machine learning is a measurement that compares
the number of correct predictions made by a model with the total number of
predictions made. It is determined by dividing the number of correct predictions
by the total number of predictions.
Efficiency: Algorithmic efficiency refers to reducing the amount of computation
required to train a model to a certain ability.
Based on the findings from the experiment in Chapter 5, the BERT neural network
model achieved the highest accuracy of 90.67%, in comparison to the lexicon-based
approaches using the Bag-of-Words (BoW) and TF-IDF models, which achieved
accuracies of 79.15% and 78.98% respectively. However, since the data set contained
a larger number of records, relying solely on accuracy may not always be optimal.
Therefore, we also considered other performance metrics such as precision, recall,
and F1 scores.
The BERT neural network model demonstrated the highest precision score of
0.88, indicating its ability to accurately classify positive and negative sentiments. It
also exhibited a high recall score of 0.92, indicating that it successfully identified
a significant proportion of true positive instances. Additionally, the BERT model
achieved an F1 score of 0.88, which combines both precision and recall.
On the other hand, the lexicon-based approaches using the BoW and TF-IDF
models showed similar performance scores for precision, recall, and F1 scores, slightly
lower than the BERT model. This suggests that the lexicon-based approaches were
comparatively less effective in accurately predicting sentiment compared to the BERT
model.

Figure 6.4: A curve plot showing the comparison of performance metrics evaluated
for the three classifier models used in the thesis.

So, in response to RQ 2, our study clearly demonstrates that the BERT neural
network model outperformed traditional lexicon-based approaches in terms of accu-
racy and various performance metrics. The BERT model consistently showed higher
accuracy, precision, recall, and F1 scores, indicating its effectiveness in conducting
sentiment analysis. These findings confirm our hypothesis that deep learning models
like BERT can deliver superior results compared to lexicon-based methods.
The implications of our research are significant for sentiment analysis applications.
By utilizing the BERT model, analysts and businesses can achieve more precise and
reliable sentiment analysis outcomes. This is particularly advantageous in domains
where capturing nuanced sentiment patterns is crucial, such as customer feedback


analysis, brand monitoring, and social media sentiment analysis. However, it is im-
portant to acknowledge that the increased accuracy and performance of the BERT
model come at the expense of greater computational complexity and resource re-
quirements. Training and deploying a BERT model can be time-consuming and
demanding in terms of computational resources. Hence, when choosing an approach,
it is crucial to consider both accuracy and efficiency requirements, as well as the
available computational resources.
In conclusion, our study provides valuable insights into the effectiveness of differ-
ent sentiment analysis approaches. The BERT neural network model emerged as the
superior choice, offering enhanced accuracy, precision, recall, and F1 scores compared
to lexicon-based models. These findings contribute to the advancement of sentiment
analysis methodologies and can serve as a guide for researchers and practitioners in
selecting the most suitable approach for their specific application needs.

Validity threats:

"Internal validity" relates to the accuracy of the research. The data quality
is the major internal challenge of this study. During the data-gathering phase of
the investigation, possible threats to internal validity may emerge. While downloading
the data set from the web page, care must be taken to ensure that all of the text is
downloaded; if this is not done, incorrect sentiment polarities will arise, resulting in
an erroneous understanding of the circumstances. Another risk is
that the algorithms which are employed may be incorrect. To address this threat,
available data were examined and a thorough literature study was conducted, in or-
der to select algorithms that perform well with the data that is currently accessible.
The experiment demonstrated that the algorithms performed as expected.
The term "external validity" relates to how well the thesis’s findings may be
used in practice. The data set in this study comprises reviews written by people on
an online platform, so the data is grounded in the real world, and performing
sentiment analysis as described in this study can be beneficial to people.
Another issue is that the model is out of date and does not completely reflect the
actual situation. However, the procedures utilised in the study were acceptable and
successful, therefore this thesis might be applied in other real-world circumstances.
Finally, the validity of the thesis conclusions is determined by how correctly we
were able to select the model that performs sentiment analysis on IMDb movies with
more accuracy. This is related to the performance measurements we used in order
to validate the chosen models. In our study, we made sure to use the appropriate
metrics and applied them to the findings obtained.
Chapter 7
Conclusions and Future Work

7.1 Conclusions
In the modern era, movies and television have become the primary sources of en-
tertainment, deeply intertwined with people’s lives. Consequently, individuals are
increasingly interested in gaining insights about a film before watching it. To con-
tribute to this field, we conducted a sentiment analysis of IMDb movie reviews,
aiming to classify them as positive or negative. This dissertation focuses on com-
paring two approaches: the lexicon-based method and the BERT neural network
model.
Our experiment yielded successful results, revealing that the BERT neural net-
work model outperformed the lexicon-based method in terms of accuracy and effi-
ciency. To validate this conclusion, we evaluated the performance using metrics such
as accuracy, precision, recall, and f1 score.
The BERT neural network model, a cutting-edge natural language processing
(NLP) model, exhibited superior performance in sentiment analysis compared to
the lexicon-based method. BERT’s advantage lies in its ability to grasp contextual
information and understand the nuances of language. By being trained on extensive
text data, BERT learns to predict words in a sentence based on their surrounding
context, resulting in a highly accurate and robust model. On the other hand, the
lexicon-based method relies on predefined sentiment dictionaries or lexicons. While
it can yield reasonable results, it often struggles with contextual understanding and
fails to capture subtle language nuances. Lexicon-based approaches typically assign
sentiment scores to individual words or phrases and aggregate them to determine
the sentiment of the entire text. This simplistic approach may overlook the complex
interactions between words and their contextual meanings, leading to less accurate
sentiment analysis.
To evaluate the performance of the two approaches, we utilized various metrics,
including accuracy, precision, recall, and f1 score. Accuracy measures the overall
correctness of sentiment classification, while precision and recall assess the model’s
ability to correctly identify positive and negative sentiments. The f1 score combines
precision and recall, offering a comprehensive evaluation of the model’s performance.
Based on these evaluation metrics, we consistently found that the BERT neural
network model outperformed the lexicon-based method in sentiment analysis accu-
racy. Its proficiency in capturing contextual information and understanding language
intricacies enables it to achieve higher precision, recall, and f1 score values. Con-
sequently, the BERT model proves to be a more reliable and effective approach for

sentiment analysis of IMDb movie reviews.


Overall, our research contributes to the understanding of sentiment analysis
methodologies and highlights the advancements made through deep learning tech-
niques like the BERT neural network model. This dissertation provides valuable
insights for further research in natural language processing and sentiment analysis
in the context of movies and television.

7.2 Future works


• Merging words with similar meanings before training the classifiers is one of
the significant enhancements that can be implemented as we progress in this
project. This strategy, referred to as word merging or word grouping, entails
the combination or clustering of words that share similar semantic meanings
or fall within the same conceptual category.

• We can further extend this to a multi-class classification problem in which we
classify the reviewer's attitudes in more than binary terms, such as "Happy",
"Bored", "Afraid", and so on.

• For more precise classification, we can use live data that is immediately acquired
from online domains using web scraping techniques.

• For better outcomes, the proposed methodology can be implemented and assessed
using a hybrid model that combines traditional lexicon-based methods with deep
learning models like neural networks.
References

[1] J. Robbins. Intensity impact of its escalation on people, society and the environ-
ment. In ISTAS 98. Wiring the World: The Impact of Information Technology
on Society. Proceedings of the 1998 International Symposium on Technology and
Society (Cat. No.98CH36152), pages 98–104, June 1998.
[2] An Han, Liu Hao, and Ren Jifan. An empirical study on inline impact factors of
reviews usefulness based on movie reviews. In 2016 13th International Confer-
ence on Service Systems and Service Management (ICSSSM), pages 1–5, June
2016. ISSN: 2161-1904.
[3] Mahesh Joshi, Dipanjan Das, Kevin Gimpel, and Noah A. Smith. Movie Re-
views and Revenues: An Experiment in Text Regression. In Human Language
Technologies: The 2010 Annual Conference of the North American Chapter of
the Association for Computational Linguistics, pages 293–296, Los Angeles, Cal-
ifornia, June 2010. Association for Computational Linguistics.
[4] S. S. Rajamouli. RRR (Rise Roar Revolt), March 2022. Translated title: RRR
IMDb ID: tt8178634 event-location: India.
[5] Subhra Balabantaray. Impact of Indian cinema on culture and creation of world
view among youth: A sociological analysis of Bollywood movies. Journal of
Public Affairs, September 2020.
[6] M. Kavitha, Bharat Bhushan Naib, Basetty Mallikarjuna, R. Kavitha, and
R. Srinivasan. Sentiment Analysis using NLP and Machine Learning Techniques
on Social Media Data. In 2022 2nd International Conference on Advance Com-
puting and Innovative Technologies in Engineering (ICACITE), pages 112–115,
April 2022.
[7] Deni Kurnianto Nugroho. US presidential election 2020 prediction based on
Twitter data using lexicon-based sentiment analysis. In 2021 11th International
Conference on Cloud Computing, Data Science & Engineering (Confluence),
pages 136–141, January 2021.
[8] Samar Assem and Sameh Alansary. Sentiment Analysis From Subjectivity to
(Im)Politeness Detection: Hate Speech From a Socio-Pragmatic Perspective.
In 2022 20th International Conference on Language Engineering (ESOLEC),
volume 20, pages 19–23, October 2022.
[9] Xiao-Hong Cai, Pei-Yu Liu, Zhi-Hao Wang, and Zhen-Fang Zhu. Fine-Grained
Sentiment Analysis Based on Sentiment Disambiguation. In 2016 8th Inter-
national Conference on Information Technology in Medicine and Education
(ITME), pages 557–561, December 2016. ISSN: 2474-3828.


[10] Yogesh S. Deshmukh, Nikhil S. Patankar, Rameshwar Chintamani, and Nitin


Shelke. Analysis of Emotion Detection of Images using Sentiment Analysis and
Machine Learning Algorithm. In 2023 5th International Conference on Smart
Systems and Inventive Technology (ICSSIT), pages 1071–1076, January 2023.
ISSN: 2832-3017.
[11] Jingang Ma, Xiaohong Cai, Dejian Wei, Hui Cao, Jing Liu, and Xuqiang Zhuang.
Aspect-Based Attention LSTM for Aspect-Level Sentiment Analysis. In 2021 3rd
World Symposium on Artificial Intelligence (WSAI), pages 46–50, June 2021.
[12] Rifqi Majid and Heru Agus Santoso. Conversations Sentiment and Intent Cat-
egorization Using Context RNN for Emotion Recognition. In 2021 7th In-
ternational Conference on Advanced Computing and Communication Systems
(ICACCS), volume 1, pages 46–50, March 2021. ISSN: 2575-7288.
[13] Wen Zhou, Zongtian Liu, Yan Zhao, libin Xu, Guang Chen, Qiang Wu, Mei-li
Huang, and Yu Qiang. A Semi-automatic Ontology Learning Based on WordNet
and Event-based Natural Language Processing. In 2006 International Confer-
ence on Information and Automation, pages 240–244, December 2006. ISSN:
2151-1810.
[14] Seydeh Akram Saadat Neshan and Reza Akbari. A Combination of Machine
Learning and Lexicon Based Techniques for Sentiment Analysis. In 2020 6th
International Conference on Web Research (ICWR), pages 8–14, April 2020.
[15] G Veena, Aadithya Vinayak, and Anu J Nair. Sentiment Analysis using Im-
proved Vader and Dependency Parsing. In 2021 2nd Global Conference for
Advancement in Technology (GCAT), pages 1–6, October 2021.
[16] Elif Varol Altay and Bilal Alatas. Detection of Cyberbullying in Social Net-
works Using Machine Learning Methods. In 2018 International Congress on
Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), pages
87–91, December 2018.
[17] Rakhee Sharma, Ngoc Le Tan, and Fatiha Sadat. Multimodal Sentiment Anal-
ysis Using Deep Learning. In 2018 17th IEEE International Conference on Ma-
chine Learning and Applications (ICMLA), pages 1475–1478, December 2018.
[18] Huibing Zhang, Fang Pan, Junchao Dong, and Ya Zhou. BERT-IAN Model for
Aspect-based Sentiment Analysis. In 2020 International Conference on Com-
munications, Information System and Computer Engineering (CISCE), pages
250–254, July 2020.
[19] Shashank Kalluri. Deep Learning Based Sentiment Analysis.
[20] Nathalie Japkowicz and Mohak Shah. Performance Evaluation in Machine
Learning. In Issam El Naqa, Ruijiang Li, and Martin J. Murphy, editors, Ma-
chine Learning in Radiation Oncology: Theory and Applications, pages 41–56.
Springer International Publishing, Cham, 2015.
[21] Kusrini and Mochamad Mashuri. Sentiment Analysis In Twitter Using Lexicon
Based and Polarity Multiplication. In 2019 International Conference of Artificial
Intelligence and Information Technology (ICAIIT), pages 365–368, March 2019.

[22] Dingyi Yu. Intelligent Analysis System of Movie Reviews Using Deep Learning
and Convolutional Neural Networks. In 2021 IEEE Conference on Telecom-
munications, Optics and Computer Science (TOCS), pages 617–621, December
2021.
[23] Kamil Topal and Gultekin Ozsoyoglu. Movie review analysis: Emotion analysis
of IMDb movie reviews. In 2016 IEEE/ACM International Conference on Ad-
vances in Social Networks Analysis and Mining (ASONAM), pages 1170–1176,
August 2016.
[24] Rachana Bandana. Sentiment Analysis of Movie Reviews Using Heterogeneous
Features. In 2018 2nd International Conference on Electronics, Materials En-
gineering & Nano-Technology (IEMENTech), pages 1–4, May 2018.
[25] Pawan Kumar Sarika. Comparing LSTM and GRU for Multiclass Sentiment
Analysis of Movie Reviews. 2020.
[26] V.R. Basili. The role of experimentation in software engineering: past, current,
and future. In Proceedings of IEEE 18th International Conference on Software
Engineering, pages 442–449, March 1996. ISSN: 0270-5257.
[27] Jeffrey W. Knopf. Doing a Literature Review. PS: Political Science & Politics,
39(1):127–132, January 2006. Publisher: Cambridge University Press.
[28] I. Stančin and A. Jović. An overview and comparison of free Python libraries for
data mining and big data analysis. In 2019 42nd International Convention on
Information and Communication Technology, Electronics and Microelectronics
(MIPRO), pages 977–982, May 2019. ISSN: 2623-8764.
[29] Xavier Schmitt, Sylvain Kubler, Jérémy Robert, Mike Papadakis, and Yves
LeTraon. A Replicable Comparison Study of NER Software: StanfordNLP,
NLTK, OpenNLP, SpaCy, Gate. In 2019 Sixth International Conference on
Social Networks Analysis, Management and Security (SNAMS), pages 338–343,
October 2019.
[30] IMDB Dataset of 50K Movie Reviews.
[31] Siwei Lai, Kang Liu, Shizhu He, and Jun Zhao. How to Generate a Good Word
Embedding. IEEE Intelligent Systems, 31(6):5–14, November 2016. Conference
Name: IEEE Intelligent Systems.
[32] Denis Rothman and Antonio Gulli. Transformers for Natural Language Pro-
cessing: Build, train, and fine-tune deep neural network architectures for NLP
with Python, PyTorch, TensorFlow, BERT, and GPT-3. Packt Publishing Ltd,
March 2022. Google-Books-ID: u9FjEAAAQBAJ.
[33] Sara Sabba, Nahla Chekired, Hana Katab, Nassira Chekkai, and Mohammed
Chalbi. Sentiment Analysis for IMDb Reviews Using Deep Learning Classifier.
In 2022 7th International Conference on Image and Signal Processing and their
Applications (ISPA), pages 1–6, May 2022.
[34] Cuk Tho, Yaya Heryadi, Iman Herwidiana Kartowisastro, and Widodo Budi-
harto. A Comparison of Lexicon-based and Transformer-based Sentiment Analy-
sis on Code-mixed of Low-Resource Languages. In 2021 1st International Con-

ference on Computer Science and Artificial Intelligence (ICCSAI), volume 1,


pages 81–85, October 2021.
[35] Md. Rakibul Haque, Salma Akter Lima, and Sadia Zaman Mishu. Perfor-
mance Analysis of Different Neural Networks for Sentiment Analysis on IMDb
Movie Reviews. In 2019 3rd International Conference on Electrical, Computer
& Telecommunication Engineering (ICECTE), pages 161–164, December 2019.
[36] Vijendra Singh, Gurdeep Singh, Priyanka Rastogi, and Devanshi Deswal. Sen-
timent Analysis Using Lexicon Based Approach. In 2018 Fifth International
Conference on Parallel, Distributed and Grid Computing (PDGC), pages 13–18,
December 2018. ISSN: 2573-3079.
[37] Sivakumar Soubraylu and Ratnavel Rajalakshmi. Hybrid convolutional
bidirectional recurrent neural network based sentiment analysis on movie
reviews. Computational Intelligence, 37(2):735–757, 2021. _eprint:
https://onlinelibrary.wiley.com/doi/pdf/10.1111/coin.12400.
[38] Sandesh Tripathi, Ritu Mehrotra, Vidushi Bansal, and Shweta Upadhyay. Ana-
lyzing Sentiment using IMDb Dataset. In 2020 12th International Conference on
Computational Intelligence and Communication Networks (CICN), pages 30–33,
September 2020. ISSN: 2472-7555.
[39] Umme Aymun Siddiqua, Tanveer Ahsan, and Abu Nowshed Chy. Combining a
rule-based classifier with weakly supervised learning for twitter sentiment anal-
ysis. In 2016 International Conference on Innovations in Science, Engineering
and Technology (ICISET), pages 1–4, October 2016.
[40] Gen Li, QiuSheng Zheng, Long Zhang, SuZhou Guo, and LiYue Niu. Sentiment
Infomation based Model For Chinese text Sentiment Analysis. In 2020 IEEE 3rd
International Conference on Automation, Electronics and Electrical Engineering
(AUTEEE), pages 366–371, November 2020.
[41] N Srivats Athindran, S. Manikandaraj, and R. Kamaleshwar. Comparative
Analysis of Customer Sentiments on Competing Brands using Hybrid Model
Approach. In 2018 3rd International Conference on Inventive Computation
Technologies (ICICT), pages 348–353, November 2018.
[42] Mahammed Kamruzzaman, Mohammed Hossain, Md. Rashidul Islam Imran,
and Sagor Chandro Bakchy. A Comparative Analysis of Sentiment Classification
Based on Deep and Traditional Ensemble Machine Learning Models. In 2021
International Conference on Science & Contemporary Technologies (ICSCT),
pages 1–5, August 2021.
[43] Swagat Ranjit, Shruti Shrestha, Sital Subedi, and Subarna Shakya. Foreign Rate
Exchange Prediction Using Neural Network and Sentiment Analysis. In 2018
International Conference on Advances in Computing, Communication Control
and Networking (ICACCCN), pages 1173–1177, October 2018.
[44] Kanak Mahor and Amit Kumar Manjhvar. Public Sentiment Assessment of
Coronavirus-Specific Tweets using a Transformer-based BERT Classifier. In
2022 International Conference on Edge Computing and Applications (ICECAA),
pages 1559–1564, October 2022.