Project Report
On
Using Python Language for Sentiment Analysis of Restaurant
Reviews

Dept. of
Electrical and Electronic Engineering
Faculty of Engineering and Technology
Begum Rokeya University, Rangpur

Submitted by
Md. Iftik Arman Emon
ID No: 1716017
Reg.No:000010701
Session: 2017-2018

Supervised by
Iffat Ara Badhan
Lecturer, Dept. of EEE
Begum Rokeya University, Rangpur

ACKNOWLEDGEMENT

A project is included in the final-year syllabus of the Department of Electrical and
Electronic Engineering at Begum Rokeya University, Rangpur. My supervisor, Iffat Ara
Badhan, gave the necessary instructions to complete this project successfully, and I am
grateful to her for this. I also thank all of my friends who helped me carry out the project in
various ways.

Author
……………………………

CERTIFICATE

This is to certify that Md. Iftik Arman Emon, ID number 1716017, Reg. number 000010701,
Session 2017-2018, has successfully finished the project titled "Using Python Language
for Sentiment Analysis of Restaurant Reviews". The project was carried out under my
supervision and guidance in order to complete the criteria for the Bachelor of Engineering in
Electrical and Electronic Engineering degree. To the best of my knowledge and belief, the
project report contains the candidate's original work, for which he conducted adequate
investigation, and it can be noted as a unique idea.

…………………………
Iffat Ara Badhan
Lecturer, Department of Electrical and
Electronic Engineering
Begum Rokeya University, Rangpur

DECLARATION

The project report "Using Python Language for Sentiment Analysis of Restaurant Reviews" is
based on my personal work, completed during the course of my studies under the supervision
of Iffat Ara Badhan. I, the undersigned, solemnly declare that the claims made and judgments
reached are the results of my own study. I further certify that:
1. The work contained in the report is original and has been done by me under the
general supervision of my supervisor.
2. I have followed the guidelines provided by the university in writing the report.
3. Whenever I have used materials from other sources, I have given due credit to them
in the text of the report and provided their details in the references.

………………………………..
Md. Iftik Arman Emon
ID no:1716017
Reg no: 000010701
Session: 2017-2018
Department of Electrical and Electronic Engineering
Begum Rokeya University, Rangpur

LIST OF CONTENTS

ABSTRACT

CHAPTER 1. INTRODUCTION
1.1 Introduction
1.2 Related Work
1.3 Objective of Project

CHAPTER 2. ARCHITECTURE AND MODELING
2.1 Architectural Diagram
2.2 Working Principle

CHAPTER 3. DATA COLLECTION AND PREPPING OF INFORMATION
3.1 Flask==1.1.1
3.2 gunicorn==19.9.0
3.3 itsdangerous==1.1.0
3.4 Jinja2==2.10.1
3.5 MarkupSafe==1.1.1
3.6 Werkzeug==0.15.5
3.7 numpy>=1.9.2
3.8 scipy>=0.15.1
3.9 scikit-learn>=0.18
3.10 matplotlib>=1.4.3
3.11 pandas>=0.19

CHAPTER 4. ALGORITHM AND MODEL OF DATA ANALYSIS
4.1 SVM
4.2 Naïve Bayes
4.3 Logistic Regression
4.4 LSTM
4.5 BERT

CHAPTER 5. METHODOLOGY
5.1 Methodology
5.2 Classification
 Multinomial Naïve Bayes
 Random Forest
 Decision Tree
 Support Vector Machine
5.3 Achievement Rating Assessment
5.4 Predicting a Class

CHAPTER 6. IMPLEMENTATION AND PERFORMANCE ANALYSIS
6.1 Implementation and Performance Analysis
6.2 Conclusion

CHAPTER 7. REFERENCES

ABSTRACT

In the last ten years, the Internet's development has generated vast amounts of data
across all industries. These innovations have given people new avenues for expressing their
ideas on anything through tweets, blog entries, online forums, status updates, etc. Sentiment
analysis is the technique of computationally identifying and classifying opinions stated in a
text, particularly to ascertain whether the writer has a positive, negative, or neutral attitude
towards a given topic. Any firm should be very interested in client feedback. Therefore, in
this paper, we use a Python-based classification system to analyze customer reviews of
restaurants. This study's major topics are the use of several classification algorithms and an
evaluation of their effectiveness. According to the simulation findings, the highest accuracy
was achieved by SGD at 69.23%.

Keywords: LR (Logistic Regression model), DT (Decision Tree model), RF (Random Forest
model), MNB (Multinomial Naïve Bayes model), KNN (K-Nearest Neighbors model), Linear
SVM (Linear Support Vector Machine model), SGD (Stochastic Gradient Descent model)

1 Introduction
The exponential rise in Internet usage has sparked an enormous amount of online activity,
including blog posts, video calls, conferences, monitoring, and other e-commerce and online
transactions. This makes it necessary to quickly collect, convert, load, and analyze vast
amounts of diverse, unstructured data [1]. Numerous discussion boards, blogs, social
networks, e-commerce websites, news articles, and other online resources provide a place for
opinion expression that can be used to gauge the opinions of the general population.
Sentiment analysis aids in identifying, extracting, and categorizing ideas, sentiments, and
attitudes conveyed in textual input on many issues [2]. Additionally, it aids in reaching
objectives such as tracking public opinion on political movements, gauging customer
happiness, forecasting movie sales, and ascertaining critics' viewpoints. To extract key
information about a certain product (such as a digital camera, a computer, books, or films),
sentiment analysis can be used to categorize online evaluations of merchandise from retailers
like eBay and Flipkart. The method of sentiment analysis is frequently used to track how the
public's views on a political candidate are evolving by looking at online discussion boards
[3]. Since it may be utilized for research into trends or consumer preferences, monitoring the
mood of bloggers is likewise becoming a highly sought-after research area. Sentiment
analysis is also turning out to be essential in the area of opinion spam. Opinion spam
describes deceptive practices that aim to mislead readers, such as writing fraudulent reviews
(also known as shilling). It might take the form of giving some target entities unwarranted
good assessments in an effort to promote them, or of giving a falsely adverse review of
another organization in an effort to harm its reputation. Studying the reviews and looking at
the sentiment scores is the major objective of sentiment analysis. People primarily rely on
user-generated content while making decisions. Before making a purchase, the user can
determine via sentiment analysis whether the product's information is satisfactory or not.
Companies and advertising agencies use this analysis data to find out more about their
products or services so they can more effectively satisfy client wants. Sentiment analysis is
typically performed at several levels, ranging from coarse to fine. A document's overall
sentiment is determined by coarse-level analysis, while attribute-level sentiment is the focus
of fine-level analysis [4]. Sentence-level sentiment evaluation sits in between these two.

Either a knowledge-based technique or a machine-learning-based strategy can be used for
sentiment analysis. Under the knowledge-based approach, we can use language processing
techniques or lexicon methods to analyze the sentiment [5]. The machine-learning-based
method is further categorized into unsupervised and supervised learning [6]. Language
processing techniques are employed to extract qualities such as word distribution,
parts-of-speech tags, and phrases and keywords expressing opinions [7]. In contrast,
supervised machine-learning methods use a dataset that is first labeled by a human to learn
whether a review is favorable, unfavorable, or neither [6]. In the lexicon-based technique, the
polarity is determined by matching opinion terms from a sentiment lexicon with the data;
ratings are then given according to the dictionary terms' favorable or unfavorable
connotations [2]. The planned research examines patron perceptions of a restaurant's service.

1.2 Related Work


This section provides a brief overview of sentiment analysis-related research. The topic of
sentiment analysis has seen a great deal of research over the last ten years from a variety of
researchers. Initial implementations of sentiment analysis focused on categorization that
assigns feedback or opinions to a bipolar class such as good or bad. The category of reviews
was predicted by averaging the semantic orientation of sentences containing adverbs and
adjectives, and the phrase's thumbs-up or thumbs-down rating was then determined. A
Chinese document's sentiment was examined by Liu et al. [8] using a baseline and a support
vector machine; this determines the overall orientation of the paper from a group of particular
phrases drawn from a glossary of emotive words and modifies it in accordance with
background knowledge. Ramachandran and Gehringer [9] suggested preprocessing the data to
enhance the quality of the sentence's raw structure; for sentiment analysis, they used cosine
similarity and latent semantic analysis techniques. More than 300 papers were surveyed by
Pang and Lee [10], who covered the applications and typical difficulties encountered during
the many stages of sentiment analysis, including polarity identification, sentiment
categorization, and summarization. The four main issues, namely word sentiment
categorization, opinion extraction, document emotion categorization, and subjective
categorization, were covered by Tang et al. [11]. Restaurant patron reviews are one of the
intriguing research fields for academics. For performance evaluation, Schrauwen employed
the Naive Bayes algorithm, Maximum Entropy, and a Decision Tree classifier, comparing
these classification methods by evaluating their accuracy, precision, F1 score, and recall [12].
Customers now use online reviews to learn more about the establishments they intend to
visit; therefore, those reviews are crucial for consumers who wish to learn about the
intangible qualities of things beforehand [13]. Ma et al. conducted research on sentiment
analysis utilizing the probabilistic latent semantic analysis approach; they gathered
information based on the brief review, not the entire comment, and the study's findings
revealed a 73% accuracy rate for the data [14]. Martin et al. (2019) employed natural
language processing with classification algorithms such as Support Vector Machine, Random
Forest, the Decision Tree algorithm, and the Naive Bayes approach, together with a confusion
matrix; among these classification models, the performance of Naive Bayes was superior to
the other algorithms [15].

1.3 Objective of Project



 To create a prediction model that can determine whether a review of the
restaurant will be favorable or unfavorable.
 To incorporate it into forecasting algorithms such as Logistic Regression,
Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

2 Architecture and Modeling


2.1 Architectural Diagram

The proposed work recommends the following procedures for analyzing restaurant patrons'
opinions based on a dataset of customer reviews. Fig. 1 displays the architecture diagram for
the suggested algorithm. The various steps are described in the following.

Database of Restaurant Reviews → Data Pre-processing → Preparation of Bag of Words →
Segregation of Training and Testing Data → Classification using Training Data Set →
Performance Analysis of Classification using Test Data → Predict Class of a Review using
Best Classifier (applied to each new set of reviews)

Fig. 1 Architectural diagram

2.2 Working Principle

The natural language processing (NLP) approach of sentiment analysis, commonly referred to
as opinion mining, is used to ascertain the sentiment or emotional tone of a document.
Sentiment analysis can be used to examine customer input in the context of restaurant
reviews in order to ascertain if the sentiment indicated in the review is positive, negative, or
neutral.
Here is a summary of how Python and machine learning are used to perform sentiment
analysis on restaurant reviews:

 Data Collection: Obtaining a dataset of restaurant reviews is the first stage in
the sentiment analysis process. Reviews should be included in this dataset,
along with any associated sentiment labels (such as positive, negative, or
neutral). You can develop your own dataset by scraping restaurant review
websites, or you can use one of the many publicly accessible datasets for
sentiment analysis.
 Data Preprocessing: Data preparation is the next step after obtaining the
dataset. As part of this process, the text data must be cleaned, any extraneous
information removed (such as URLs and special characters), and the text
converted to a standard format (such as lowercase). Tokenization, the process
of breaking the text down into separate words or tokens, is another component
of preprocessing.
 Feature Extraction: Machine learning models demand numerical features as
input. As a result, the text data must be transformed into a numerical form.
The Bag-of-Words (BoW) model, in which each review is represented as a
vector of word frequencies, is a popular method for feature extraction. The
Term Frequency-Inverse Document Frequency (TF-IDF) representation is
another widely used method that gives higher weight to words that are
significant in a specific review but uncommon across all reviews (a short
sketch follows this list).
 Model Selection: After the features have been extracted, a machine learning
model needs to be chosen. Common models for sentiment analysis include
Support Vector Machines (SVM), Naive Bayes, Logistic Regression, and
neural-network-based models such as LSTM (Long Short-Term Memory) and
BERT (Bidirectional Encoder Representations from Transformers).
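For instance, a minimal sketch of the feature-extraction step with scikit-learn; the example
reviews here are illustrative, not drawn from the project dataset:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Illustrative reviews; in the project these would come from the dataset
reviews = ["The food was great", "Service was slow and the food was cold"]

# Bag-of-Words: each review becomes a vector of raw word counts
bow = CountVectorizer()
X_bow = bow.fit_transform(reviews)
print(bow.get_feature_names_out())  # vocabulary learned from the corpus
                                    # (use get_feature_names() on older scikit-learn)
print(X_bow.toarray())              # word-count vectors, one row per review

# TF-IDF: words frequent in one review but rare across reviews get higher weight
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(reviews)
print(X_tfidf.toarray())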

3 Data Collection
The dataset for this study was created from comments made about various restaurants.
The data was prepared from Foodpanda and other restaurant sites in the Rangpur
division. The dataset contains 600 reviews in seven columns: the first column contains the
serial number, the second the restaurant name, the third the reviewer, the fourth the
location name, the fifth the cuisine, the sixth the rating, and the seventh the sentiment.
The reviews are classified into two categories, positive and negative. The rating range is
0-5; in this dataset, positive reviews carry ratings of 3-5 out of 5 and negative reviews
carry ratings of 1-2. The dataset contains 348 positive and 252 negative reviews in total.
After cleaning, 214 short reviews were removed, leaving 386 reviews, of which 190 are
positive and 196 are negative.
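A minimal sketch of how such a labeled dataset could be loaded and the rating-to-sentiment
rule applied; the file name, column names, and length threshold below are assumptions for
illustration only:

import pandas as pd

# Hypothetical file and column names; adjust to the actual spreadsheet
df = pd.read_excel('restaurant_reviews.xlsx')

# Ratings 3-5 -> positive, ratings 1-2 -> negative (the rule described above)
df['sentiment'] = df['Rating'].apply(lambda r: 'positive' if r >= 3 else 'negative')

# Drop very short reviews, mirroring the cleaning step described above
# (the three-word threshold is an assumption)
df = df[df['Review'].str.split().str.len() >= 3]

print(df['sentiment'].value_counts())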

4 Prepping of Information
The dataset we used is in Excel format, and we used the restaurant reviews in it to train the
model. Since all the algorithms we are working with are supervised learning algorithms, we
classified the dataset beforehand to train them. We import the dataset using pandas in Python.
Preprocessing is the most important phase in establishing a text's sentiment. In our
approach, the preprocessing is broken down into three basic phases. The first stage is to
remove the punctuation from the sentences. Special characters such as exclamation marks,
quotes, etc. are eliminated by applying a suitable regular-expression pattern; the resulting
data consists only of alphabetical characters.
The second step is to get rid of the stop-words. Stop-words are words in the English language
that do not express emotion or sentiment but are used as links or articles; examples include
"and", "with", "of", and "the". Stop-words are found and removed from the dataset using
NLP techniques such as lexical examination, grammatical evaluation, semantic evaluation,
transparency integrating, and pragmatic evaluation. The semantic analysis step usually
deletes negations like "not". However, when it comes to opinion mining, the word "not"
matters. For instance, consider the review "Crust is not good". By removing the stop-word
"not", this sentence becomes "Crust good", and a negative opinion becomes a positive one.
To prevent this from happening, we adjusted the semantic evaluation stage in the NLP
pipeline to ensure that such stop-words are not eliminated (see the sketch after the StopWords
example below).
The third step is to calculate the sentiment of all the data imported into the Excel sheet. To do
this we use several Python libraries, listed below:

 Flask==1.1.1
 gunicorn==19.9.0
 itsdangerous==1.1.0
 Jinja2==2.10.1
 MarkupSafe==1.1.1
 Werkzeug==0.15.5
 numpy>=1.9.2
 scipy>=0.15.1
 scikit-learn>=0.18
 matplotlib>=1.4.3
 pandas>=0.19

A short description of each of these Python libraries is given below.

 Flask 1.1.1: A well-liked Python web framework called Flask makes it simple and
requires little boilerplate code to create online applications. It is a simple and
adaptable framework that adheres to the WSGI (Web Server Gateway Interface)
standard and is frequently utilized to develop RESTful APIs and web services.
Pip, the Python package manager, can be used to install Flask. Run the following
command after opening your command-line interface:
pip install Flask==1.1.1

 Gunicorn 19.9.0: A well-liked WSGI (Web Server Gateway Interface) HTTP server
called Gunicorn (Green Unicorn) is frequently used to deliver Python web
applications. Flask, Django, Pyramid, and other web frameworks are just a few of the
ones that it is made to operate well with. Gunicorn is a good option for hosting
production-ready web apps because of its simplicity, performance, and scalability.
The Python package manager, pip, is used to install Gunicorn. Run the following
command after opening your command-line interface:
pip install gunicorn==19.9.0

 Itsdangerous 1.1.0: Several security-related functions are offered by the Python


library itsdangerous, which is primarily concerned with creating and confirming
cryptographically signed tokens. It is frequently used in web applications and
frameworks like Flask to guarantee the authenticity and integrity of data transmitted
between various application components.
Using the Python package manager, pip, you can install itsdangerous. Run the
following command after opening your command-line interface:
pip install itsdangerous==1.1.0

 Jinja2==2.10.1: Python has a sophisticated and popular templating engine called


Jinja2. It is utilised in other web frameworks including Django and Bottle and is the
standard template engine for the Flask web framework. The logic and display of your
application can be separated with Jinja2, making it simpler to maintain and edit web
pages and other text-based documents.
Utilising pip, the Python package manager, you can install Jinja2. Run the following
command after opening your command-line interface:
pip install Jinja2==2.10.1

 MarkupSafe==1.1.1: A Python package called MarkupSafe offers tools for escaping


and formatting strings so that they can be used safely in markup languages like
HTML and XML. By appropriately escaping user-supplied data before rendering it in
the output, it is frequently used in conjunction with templating engines like Jinja2 to
prevent Cross-Site Scripting (XSS) attacks.
Using the Python package manager pip, you can install MarkupSafe. Run the
following command after opening your command-line interface:
pip install MarkupSafe==1.1.1

 Werkzeug==0.15.5: A complete Python WSGI (Web Server Gateway Interface)


utility library is called Werkzeug. It offers a collection of tools and utilities used
frequently in online development, making it simpler to manage URLs, handle HTTP
requests, and interact with other web-related ideas. The underlying library utilised by
the Flask web framework is called Werkzeug, and other WSGI apps can use it on their
own.
The Python package manager, pip, can be used to install Werkzeug. Run the following
command once your command-line interface is open:
pip install Werkzeug==0.15.5

 numpy>=1.9.2: In scientific computing, data analysis, and machine learning, NumPy
is a powerful Python toolkit for numerical computation. It supports multi-dimensional
arrays and matrices and offers a huge library of mathematical operations to work with
these arrays effectively; here version 1.9.2 or later is required.
Use pip, the Python package manager, to install NumPy with a version of 1.9.2 or
higher. Run the following command after opening your command-line interface (the
quotes keep the shell from interpreting the >= operator):
pip install "numpy>=1.9.2"

 scipy>=0.15.1: SciPy is a free, open-source Python module for scientific and
technical computing. It builds on NumPy and offers a large number of functions for
diverse scientific applications, including signal processing, linear algebra, statistics,
optimisation, integration, and interpolation. For complex scientific computations,
SciPy is frequently used in disciplines including physics, engineering, biology, and
data science. Use pip, the Python package manager, to install SciPy with a version
equal to or higher than 0.15.1. Run the following command after opening your
command-line interface:
pip install "scipy>=0.15.1"

 scikit-learn>=0.18: A well-known open-source machine learning library for Python


is called scikit-learn, which is frequently abbreviated as sklearn. It is based on
existing libraries like NumPy, SciPy, and matplotlib and offers straightforward and
effective tools for data mining and data analysis. Machine learning professionals
frequently utilize scikit-learn, which provides a number of methods for classification,
regression, clustering, dimensionality reduction, and other tasks. Use pip, the Python
package manager, to install scikit-learn with a version equal to or greater than 0.18.
Run the following command after opening your command-line interface:
pip install "scikit-learn>=0.18"

 matplotlib>=1.4.3: A popular Python package for producing static, interactive, and


animated data visualisations is called Matplotlib. It is a crucial tool for data
visualisation and analysis because it offers a flexible and user-friendly interface for
creating high-quality plots, charts, and figures.
Use pip, the Python package manager, to install Matplotlib with a version equal to or
greater than 1.4.3. Run the following command after opening your command-line
interface:
pip install "matplotlib>=1.4.3"

 pandas>=0.19: A potent Python package for data analysis and manipulation is called
pandas. For handling and analyzing structured data, it offers data structures like Series
(1-dimensional labelled arrays) and Data Frame (2-dimensional labelled data tables).
Pandas is frequently used for data preprocessing, exploration, and cleaning activities
in data science, machine learning, finance, and numerous other fields.
Use pip, the Python package manager, to install pandas with a version equal to or
higher than 0.19. Run the following command after opening your command-line
interface:
pip install "pandas>=0.19"

The preprocessing stages are a crucial step in obtaining clear and unambiguous information
so that classification results can be predicted more precisely later on.

 Case Folding: A typical preprocessing step in natural language processing
applications like sentiment analysis is case folding, sometimes referred to as text
normalisation or lowercase conversion. Since capitalization rarely affects the
semantic meaning of words, it entails changing all text to lowercase to achieve
uniformity and consistency. Using the lower() method of the string object, case
folding is simple to do in Python.
I. Import the required libraries:

import re

II. Create a function to handle case folding and any optional further text
preprocessing. After removing any non-alphanumeric characters with a
regular expression, the text is converted to lowercase.

def preprocess_text(text):
    # Remove non-alphanumeric characters and replace with spaces
    processed_text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
    # Convert to lowercase
    processed_text = processed_text.lower()
    return processed_text

Example: Input function

# Example text for sentiment analysis
text = "This is an EXAMPLE sentence with mixed CASES."

# Preprocess the text using case folding
preprocessed_text = preprocess_text(text)

# Output the preprocessed text
print(preprocessed_text)

Output
this is an example sentence with mixed cases

 Symbol Removal: This is the stage in which punctuation marks such as the period (.),
comma (,), question mark (?), and exclamation point (!), as well as special characters
(&, %, $, #, @, and other symbols) and digits (0-9), are removed.

Example:
Input Function

import re

def remove_symbols(text):
    # Keep only letters and whitespace; drop digits and other symbols
    cleaned = re.sub(r'[^a-zA-Z\s]', ' ', text)
    # Collapse the repeated spaces introduced by the substitution
    return ' '.join(cleaned.split())

# Example text with symbols
text_with_symbols = "Hello, this is an example text with some !@#$%^&*()_+ symbols."

# Remove symbols from the text
cleaned_text = remove_symbols(text_with_symbols)

# Output the cleaned text
print(cleaned_text)

Output
Hello this is an example text with some symbols

 StopWords: Common words that regularly appear in a language are known as


stopwords, and they are typically thought to have minimal semantic significance. The
terms "the," "is," "and," "in," "of," and "a," for instance, are stopwords in English.
These stopwords are frequently eliminated in activities involving natural language
processing, such as text categorization or sentiment analysis, to increase the
effectiveness and precision of the analysis.

Python has a number of modules that offer lists of words to avoid in many languages.
The Natural Language Toolkit (NLTK) is one of the most widely used libraries for
NLP activities. If you haven't previously, install the NLTK library before using
stopwords in Python.
Install the NLTK library using:
pip install nltk

Download the stopwords data for the English language after installing NLTK:

import nltk
nltk.download('stopwords')

Example:

Input

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download stopwords and tokenizer data for English
nltk.download('stopwords')
nltk.download('punkt')

def remove_stopwords(sentence):
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(sentence)
    filtered_sentence = [word for word in word_tokens if word.lower() not in stop_words]
    return ' '.join(filtered_sentence)

# Example sentence with stopwords
sentence = "This is an example sentence with some common stopwords."

# Remove stopwords from the sentence
cleaned_sentence = remove_stopwords(sentence)

# Output the cleaned sentence
print(cleaned_sentence)

Output
example sentence common stopwords .
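As discussed in the preprocessing section, blindly removing stop-words would flip the
meaning of reviews like "Crust is not good". Below is a minimal sketch of one way to keep
negations, by removing them from the stop-word set before filtering; this is an illustrative
adjustment, not necessarily the exact rule used in the project:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Start from NLTK's English stop-word list, then put negation words back
stop_words = set(stopwords.words('english')) - {'not', 'no', 'nor'}

def remove_stopwords_keep_negation(sentence):
    tokens = word_tokenize(sentence)
    return ' '.join(w for w in tokens if w.lower() not in stop_words)

print(remove_stopwords_keep_negation("Crust is not good"))
# -> "Crust not good" (the negative meaning is preserved)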

 Data Cleaning: Data cleaning is a vital stage in the data-preprocessing pipeline. It is
the process of identifying and correcting defects, inconsistencies, and errors in a
dataset to enhance its quality and prepare it for further analysis.
I. Treatment of Missing Values:
Datasets frequently have missing values, which can cause issues during
analysis. Using pandas, a well-liked Python data manipulation toolkit, you
can deal with missing values.
Example:
import pandas as pd

# Read the dataset


df = pd.read_csv('your_dataset.csv')

# Check for missing values


print(df.isnull().sum())

# Drop rows with missing values


df = df.dropna()
# Fill missing values with a specific default instead of dropping
df['column_name'] = df['column_name'].fillna(0)  # choose a default that suits the column
II. Removing Duplicates: Results of analyses may be distorted by duplicate
rows. The drop_duplicates() method in pandas can be used to eliminate
duplicates.

# Remove duplicates
df = df.drop_duplicates()
III. Data Type Conversion: For analysis, make sure columns have the
appropriate data types. You can change the data type using pandas.
# Convert a column to a numeric type
df['column_name'] = pd.to_numeric(df['column_name'])

# Convert a column to a date type


df['date_column'] = pd.to_datetime(df['date_column'])
IV. Outlier Detection and Handling: Statistics can be affected by outliers.
Using several methodologies, such as the Z-score or IQR (Interquartile
Range), you can locate outliers and treat them.

# Detect outliers using Z-score
import numpy as np
from scipy import stats

threshold = 3  # a common choice: flag values more than 3 standard deviations away
z_scores = np.abs(stats.zscore(df['column_name']))
outliers = df['column_name'][z_scores > threshold]

# Remove outliers
df = df[~df['column_name'].isin(outliers)]

V. Text Cleaning: You can lowercase text data, remove punctuation, and strip
stop-words using various methods.

import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    cleaned_words = [word for word in words if word not in stop_words]
    return ' '.join(cleaned_words)

df['text_column'] = df['text_column'].apply(clean_text)

VI. Feature Scaling: You may want to use feature scaling to bring numerical
features that are on various scales into a range that is similar.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['numerical_column1', 'numerical_column2']] = scaler.fit_transform(
    df[['numerical_column1', 'numerical_column2']])

5 ALGORITHM and MODEL OF DATA ANALYSIS

A short description of each of these algorithms is given below.

SVM Algorithm
One effective machine learning approach for sentiment analysis is Support Vector
Machines (SVM). Finding the sentiment or attitude communicated in a text (such as whether
it is favourable, negative, or neutral) is the aim of sentiment analysis. Based on the features
extracted from the text, SVM can be used to categorize text data into various sentiment
groups.
The following steps perform sentiment analysis in Python using SVM:
1. Data preprocessing: Cleanse and preprocess the text data to prepare the dataset. This
includes operations like erasing punctuation, changing the text's case to lowercase,
and eliminating stop words.
2. Feature Extraction : Convert the preprocessed text data into numerical features that
SVM may use with feature extraction. Term Frequency-Inverse Document Frequency
(TF-IDF) representation is one such technique.
3. Training the SVM Model: Split your dataset into a training set and a testing set
before starting to train the SVM model. The SVM model should next be trained using
the training set using the retrieved features.
4. Evaluating the Model: Using the testing set, assess the trained SVM model's
performance.

Here is a sample Python program that uses the scikit-learn module and the TF-IDF
representation:

# Importing necessary libraries


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset (replace this with your own dataset)
data = pd.DataFrame({
    'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
    'sentiment': ['positive', 'negative', 'neutral']
})
# Data preprocessing (optional; you can add more steps based on your needs)
data['text'] = data['text'].str.lower()

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['sentiment'], test_size=0.2, random_state=42)

# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

# Creating and training the SVM model
svm_model = SVC(kernel='linear')  # you can also try other kernels like 'rbf'
svm_model.fit(X_train_tfidf, y_train)

# Making predictions on the test set


predictions = svm_model.predict(X_test_tfidf)

# Evaluating the model


accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Classification report (includes precision, recall, F1-score, etc.)


print(classification_report(y_test, predictions))

Naïve Bayes

Another well-liked machine learning algorithm frequently used for sentiment analysis
tasks is Naive Bayes. This probabilistic technique, based on Bayes' theorem, is effective with
text data and high-dimensional feature spaces.
Similar to the SVM method, you can use Naive Bayes to perform sentiment analysis
in Python.
1. Data Pre-Processing: Similar to the last example, prepare the dataset by
cleaning and preparing the text data.
2. Feature Extraction: Create numerical characteristics from the text data that
has been preprocessed. Similar to the SVM method, we can utilise the Bag-of-
Words or TF-IDF representation for Naive Bayes.

3. Training the Naïve Bayes Model: Train the Naive Bayes model using the
features that were retrieved after dividing the dataset into a training set and a
testing set.
4. Evaluating the Model: Utilizing the testing set, assess the trained Naive
Bayes model's performance.

Here is an example Python program that uses the TF-IDF format and the
scikit-learn library:

# Importing necessary libraries


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset (replace this with your own dataset)


data = pd.DataFrame({
    'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
    'sentiment': ['positive', 'negative', 'neutral']
})

# Data preprocessing (optional; you can add more steps based on your needs)
data['text'] = data['text'].str.lower()

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['sentiment'], test_size=0.2, random_state=42)

# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

# Creating and training the Naive Bayes model


naive_bayes_model = MultinomialNB()
naive_bayes_model.fit(X_train_tfidf, y_train)

# Making predictions on the test set


predictions = naive_bayes_model.predict(X_test_tfidf)

# Evaluating the model


accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Classification report (includes precision, recall, F1-score, etc.)


print(classification_report(y_test, predictions))

Logistic Regression

Logistic Regression is another well-liked approach for sentiment analysis tasks. It is an
effective classification method for handling text data, and scikit-learn's implementation
extends naturally beyond two classes.
The following example shows how to use Python's scikit-learn module to perform sentiment
analysis with Logistic Regression. As in the prior examples, we extract features using the
TF-IDF model.
Example:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset (replace this with your own dataset)


data = pd.DataFrame({
    'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
    'sentiment': ['positive', 'negative', 'neutral']
})

# Data preprocessing (optional, you can add more steps based on your needs)
data['text'] = data['text'].str.lower()

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['sentiment'], test_size=0.2, random_state=42)

# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

# Creating and training the Logistic Regression model


logistic_regression_model = LogisticRegression()
logistic_regression_model.fit(X_train_tfidf, y_train)

# Making predictions on the test set


predictions = logistic_regression_model.predict(X_test_tfidf)

# Evaluating the model


accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Classification report (includes precision, recall, F1-score, etc.)


print(classification_report(y_test, predictions))

Although we've just used a tiny sample dataset in this example, you should utilise a larger
dataset to improve the performance of your model. To increase the model's accuracy, you
might also wish to experiment with other preprocessing methods and hyperparameter tuning.

LSTM

Long Short-Term Memory (LSTM) networks are effective for sentiment analysis,
particularly when working with textual information that is sequential. LSTMs are a variety
of recurrent neural network (RNN) that is particularly good at capturing long-term
dependencies in sequences.
The following example shows how to use the Keras library, a high-level neural-networks
API built on top of TensorFlow, to do sentiment analysis with an LSTM in Python.

Example:
# Importing necessary libraries
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset (replace this with your own dataset)
data = pd.DataFrame({
    'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
    'sentiment': ['positive', 'negative', 'neutral']
})

# Data preprocessing (optional; you can add more steps based on your needs)
data['text'] = data['text'].str.lower()

# Encode the string labels as integers (0, 1, 2) for training
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(data['sentiment'])

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data['text'])
vocab_size = len(tokenizer.word_index) + 1
X = tokenizer.texts_to_sequences(data['text'])
X = pad_sequences(X)

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the LSTM model
embedding_dim = 100
max_length = X.shape[1]

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length))
model.add(LSTM(128))
# Three output units with softmax, one per sentiment class
model.add(Dense(3, activation='softmax'))

# Compiling the model (integer labels, so sparse categorical cross-entropy)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Training the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Evaluating the model (predict_classes is deprecated; take the argmax instead)
y_pred = np.argmax(model.predict(X_test), axis=1)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Classification report (includes precision, recall, F1-score, etc.)
print(classification_report(y_test, y_pred))

Although we've just used a tiny sample dataset in this example, you should utilise a larger
dataset to improve the performance of your model. The text data is transformed into
numerical sequences using the tokenizer, and all of the sequences are made to be the same
length using pad_sequences before being fed into the LSTM.

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a powerful pre-trained
language model created by Google. It has seen widespread use in many natural language
processing applications, sentiment analysis among them. You can use the Hugging Face
transformers library, which offers simple access to pre-trained transformer models like
BERT, to use BERT for sentiment analysis in Python.
Here is a step-by-step tutorial on how to use BERT for sentiment analysis in Python:

1. Install required libraries:

pip install transformers


pip install torch

2. Import the necessary libraries:

import torch
from torch.nn.functional import softmax
from transformers import BertTokenizer, BertForSequenceClassification

3. Load the pre-trained BERT model and tokenizer:

# Load the BERT pre-trained model and tokenizer.
# Note: the classification head of bert-base-uncased is randomly initialized,
# so for meaningful sentiment predictions the model must first be fine-tuned
# (or a sentiment-specific checkpoint used instead).
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

4. Prepare the data and tokenize the input sentences:

# Sample dataset (replace this with your own dataset)


data = [
{"text": "I love this product!", "sentiment": "positive"},
{"text": "This is terrible.", "sentiment": "negative"},
{"text": "It is okay.", "sentiment": "neutral"}
]

# Tokenize input sentences


sentences = [item["text"] for item in data]
labels = [item["sentiment"] for item in data]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

5. Make predictions using BERT:

# Make predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
probabilities = softmax(logits, dim=1)
predicted_labels = torch.argmax(probabilities, dim=1)

6. Evaluate the model's predictions:

# Convert predicted labels to sentiment labels (assumed id-to-label mapping)
id2label = {0: "negative", 1: "neutral", 2: "positive"}
predicted_sentiments = [id2label[label.item()] for label in predicted_labels]

# Evaluate the model's predictions
for i, item in enumerate(data):
    print(f"Text: {item['text']}, Predicted Sentiment: {predicted_sentiments[i]}, "
          f"True Sentiment: {item['sentiment']}")

We utilized a tiny sample dataset for this example; for better model performance, you should
use your own, larger dataset. Running BERT models on a CPU can be slow because they
require a lot of processing; if a GPU is available, consider using it for improved performance.

 Model Training : The preprocessed and feature-extracted data are then used
to train the chosen model. A training set and a testing set are created from the
dataset in order to assess the effectiveness of the model.
 Model Evaluation: The model is assessed using the testing set once it has
been trained to determine its accuracy and other performance measures
including precision, recall, and F1-score. How well the model can predict
sentiment on unobserved data is determined by the evaluation.
 Sentiment Prediction : Once trained and assessed, the model can be used to
anticipate the tone of fresh restaurant evaluations. The model outputs a
sentiment label (such as positive, negative, or neutral) from the input text data
that has been preprocessed and feature extracted.
 Deployment: Finally, the trained sentiment analysis model can be used to
analyse sentiment instantly. It may be incorporated into restaurant
management systems to track patron comments and offer invaluable insights
for enhancing the patron experience (a minimal serving sketch follows).
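Since Flask and gunicorn appear in this project's library list, a natural deployment route is a
small web service. Below is a hedged, hypothetical sketch of serving a pickled scikit-learn
vectorizer and classifier behind a Flask endpoint; the file names and route are illustrative
assumptions, not the project's actual code:

# app.py - hypothetical serving sketch
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumes a vectorizer and classifier were trained and pickled beforehand
with open('tfidf_vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)
with open('sentiment_model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    review = request.json['review']
    features = vectorizer.transform([review.lower()])
    sentiment = str(model.predict(features)[0])
    return jsonify({'review': review, 'sentiment': sentiment})

if __name__ == '__main__':
    app.run()  # in production, serve with: gunicorn app:app

A POST request with a JSON body such as {"review": "The food was great"} sent to /predict
would then return the predicted sentiment label.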

6 METHODOLOGY
Data collection and preprocessing, model training, and model evaluation are all steps in the
approach for using the Python language for sentiment analysis of restaurant reviews. The
workflow steps are listed below, followed by a compact sketch of the full pipeline:

Data Collection

Data Processing

Data Labeling

Data Splitting

Feature Extraction

Model Evaluation

Hyperparameter Tuning

Sentiment Prediction

Deployment
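A compact sketch of this workflow using a scikit-learn Pipeline with grid-searched
hyperparameters; the CSV name, column names, and parameter grid are illustrative
assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Hypothetical labeled dataset with 'Review' and 'sentiment' columns
df = pd.read_csv('reviews.csv')
X_train, X_test, y_train, y_test = train_test_split(
    df['Review'], df['sentiment'], test_size=0.2, random_state=42)

# Feature extraction and classification chained in one pipeline
pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning over the n-gram range (unigram/bigram/trigram,
# matching the feature sets evaluated in Chapter 6) and regularization strength
grid = GridSearchCV(pipe, {
    'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
    'clf__C': [0.1, 1.0, 10.0],
}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(classification_report(y_test, grid.predict(X_test)))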

7 Classification
After dividing the dataset into its component parts, we teach the algorithm how to
classify the data by feeding it training data. Numerous classification techniques have been
used, including Naive Bayes, decision trees, random forests, and a support vector machine
(SVM) classifier. The Naive Bayes method is founded on a conditional probability model.
The Naive Bayes classifier assumes feature independence; if the data item to be classified is
expressed as a vector x = (x1, ..., xn) of n distinct features, it gives the probability as

p(Ck | x1, ..., xn) ∝ p(Ck) ∏i p(xi | Ck)

Here, Ck represents the k-th class name.
The decision tree classifier uses a number of conditions and questions to create a tree
structure in which the leaf nodes correspond to the required classifications. Entropy is
calculated in order to choose the tree's root nodes:

H = −∑ p(x) log p(x)

In order to categorize the provided data, the SVM classifier creates a hyperplane between the
set of points x as

w · x − b = 0

where w is the normal vector to the hyperplane.
Each classification algorithm has advantages and disadvantages, and the nature of the
dataset affects how well it performs. For each method, the ratio of training data to test data is
varied and tested to raise efficiency.
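As a worked example of the entropy formula above: a node containing 190 positive and 196
negative reviews (the class balance after cleaning) is almost maximally impure. A quick
check, assuming base-2 logarithms:

import math

def entropy(counts):
    total = sum(counts)
    # H = -sum p(x) * log2 p(x) over the classes present at this node
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([190, 196]))  # ~0.9998 bits, close to the maximum of 1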

8 Multinomial Naïve Bayes



This is the text-mining industry's most popular classification technique. It is used
frequently in Natural Language Processing (NLP) because of its excellent performance.
The algorithm is based on Bayes' theorem, applied here to a text evaluation with class M
and instance N: when N is the supplied instance to be identified and M is the class of
potential outcomes, Bayes' theorem calculates the probability P(M|N).
The formula is given below:

P(M|N) = P(M) * P(N|M) / P(N)

Where,
P(N) = prior probability of N
P(M) = prior probability of class M
P(N|M) = probability of occurrence of predictor N given class M
Random Forest: Random Forest is a classifier that uses a collection of decision trees trained on
various subsets of a dataset to increase the predictive accuracy on that dataset. Rather than
selecting the most advantageous split from the full list of features, the algorithm selects from a
random subset of the variables at each split.
Decision Tree: Decision trees, a kind of classification method, are part of the supervised learning
technique. A decision tree uses both internal and leaf nodes to make decisions; its objective is to
characterise an item by constructing a set of true/false statements. Entropy over several attributes
is represented mathematically as:

E(M, X) = Σ_{c ∈ X} P(c) E(c)


Support Vector Machine: Support vector machines are a supervised machine learning
approach that can be applied to classification or regression problems. Support vector
regression (SVR), an extension of support vector classification (SVC), is one example of a
specific sort of SVM that can be used for different machine learning applications.
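A hedged sketch of how these four classifiers could be compared on the same TF-IDF
features, in the spirit of the performance tables reported in Chapter 6; the variables
X_train_tfidf, X_test_tfidf, y_train, and y_test are assumed to exist from the earlier
vectorization and splitting steps:

from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

models = {
    'MNB': MultinomialNB(),
    'RF': RandomForestClassifier(random_state=42),
    'DT': DecisionTreeClassifier(random_state=42),
    'Linear SVM': LinearSVC(),
}

for name, clf in models.items():
    clf.fit(X_train_tfidf, y_train)
    pred = clf.predict(X_test_tfidf)
    acc = accuracy_score(y_test, pred)
    p, r, f1, _ = precision_recall_fscore_support(y_test, pred, average='macro')
    print(f"{name}: accuracy={acc:.4f} precision={p:.4f} recall={r:.4f} f1={f1:.4f}")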

9 Achievement Rating Assessment

The efficacy of any classification algorithm is determined by passing the test dataset
to a model that has been trained. Through this process, we can see how well the algorithm
has adjusted to the dataset and how well it can predict the classes of incoming data. The
confusion matrix that we create contains the counts of False Positive (FP), True Positive
(TP), False Negative (FN), and True Negative (TN) outcomes produced on the provided
dataset. The equations for the false acceptance rate (FAR) and false rejection rate (FRR) are:

FAR = FP / (FP + TN)   and   FRR = FN / (FN + TP)

The following formula is used to figure out accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

A sentiment analysis system's accuracy is determined by how well its output agrees with
human judgements. For future prediction, the appropriate algorithm is selected based on
its accuracy value.

We can calculate the precision, recall, and F1 score of the performance-metric
evaluation for such algorithms using the following equations.
a) Precision: Precision refers to the positive predictive value. To determine
precision, apply the following equation:

Precision = TP / (FP + TP)

High precision indicates that the algorithm is performing properly.

b) Recall: Recall is the proportion of favorable reviews, among all positive
reviews, that are accurately categorized. Recall can be calculated using the equation
below:

Recall = TP / (TP + FN)

c) F1-score: The F1 score of each technique must be determined in order to select a
particular learning algorithm from a large selection of algorithms. The F1 score can be
calculated using the following equation:

F1-score = (2 * Precision * Recall) / (Precision + Recall)
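A minimal sketch of computing these quantities with scikit-learn; y_test and predictions are
assumed from the earlier training step, and the labels are assumed to be 'positive' and
'negative':

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(
    y_test, predictions, labels=['negative', 'positive']).ravel()

far = fp / (fp + tn)  # false acceptance rate
frr = fn / (fn + tp)  # false rejection rate
print("FAR:", far, "FRR:", frr)

print("Accuracy:", accuracy_score(y_test, predictions))
print("Precision:", precision_score(y_test, predictions, pos_label='positive'))
print("Recall:", recall_score(y_test, predictions, pos_label='positive'))
print("F1:", f1_score(y_test, predictions, pos_label='positive'))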

10 Predicting a Class
The chosen algorithm can be utilized to predict the class of a fresh data item when it is
received. The machine can assign the most appropriate class because it has already learned
the characteristics of the dataset. Because we used restaurant reviews, when a new client
submits a review, it is added to our dataset and passed through the algorithm, which decides
whether the evaluation of the restaurant is favorable or unfavorable.
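Concretely, with the fitted vectorizer and classifier from the earlier steps, classifying a fresh
review could look like the sketch below; the review text is illustrative, and predict_proba
assumes a probabilistic classifier such as Logistic Regression or Multinomial Naive Bayes:

# A new, unseen review from a client
new_review = "The biryani was delicious but the service was slow."

# Reuse the vectorizer fitted on the training data; never refit it here
features = tfidf_vectorizer.transform([new_review.lower()])

print("Predicted class:", logistic_regression_model.predict(features)[0])
# Probability of each class, for classifiers that support it
print("Class probabilities:", logistic_regression_model.predict_proba(features))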

11 Implementation and Performance Analysis


The accuracy, precision, recall, and F1 score of the LR (Logistic Regression), DT (Decision
Tree), RF (Random Forest), MNB (Multinomial Naïve Bayes), KNN (K-Nearest Neighbors),
Linear SVM (Linear Support Vector Machine), and SGD (Stochastic Gradient Descent)
models, as calculated for this dataset, are given in the tables below.

Dataset Summary:

Class name: negative
Number of documents: 196
Number of words: 2124
Number of characters: 18534
Average word length: 8.726
Number of unique words: 742

Most frequent words:
was (117), I (56), not (58), food (52), very (38), good (34), bad (31), but (29), they (29),
chicken (27)

Class name: positive
Number of documents: 190
Number of words: 1598
Number of characters: 14353
Average word length: 8.982
Number of unique words: 550

Most frequent words:
was (123), good (85), food (49), very (33), but (32), I (27), so (26), to (22), quality (21),
not (21)

Dataset Summary Visualization

   Class Name   Category          Value
0  negative     Total Documents   196
1  positive     Total Documents   190
2  negative     Total Words       2124
3  positive     Total Words       1598
4  negative     Unique Words      742
5  positive     Unique Words      550

The graphs of that table are given below:

Performance table for unigram features:

Model        Accuracy   Precision   Recall   F1 Score
LR           76.92      76.84       76.98    76.86
DT           76.92      76.84       76.98    76.86
RF           76.92      76.84       76.98    76.86
MNB          76.92      76.84       76.98    76.86
KNN          76.92      76.84       76.98    76.86
Linear SVM   76.92      76.84       76.98    76.86
RBF SVM      76.92      76.84       76.98    76.86
SGD          76.92      76.84       76.98    76.86



The ROC curve analysis for unigram features is given below:

The precision-recall curves for those values are given below:

Performance table for bigram features:

Model        Accuracy   Precision   Recall   F1 Score
LR           73.08      74.24       73.81    73.04
DT           73.08      74.24       73.81    73.04
RF           73.08      74.24       73.81    73.04
MNB          73.08      74.24       73.81    73.04
KNN          73.08      74.24       73.81    73.04
Linear SVM   73.08      74.24       73.81    73.04
RBF SVM      73.08      74.24       73.81    73.04
SGD          73.08      74.24       73.81    73.04

The ROC curve analysis for bigram features is given below:

The precision-recall curves for those values are given below:
Performance table for trigram features:

Model        Accuracy   Precision   Recall   F1 Score
LR           78.21      79.49       78.97    78.17
DT           78.21      79.49       78.97    78.17
RF           78.21      79.49       78.97    78.17
MNB          78.21      79.49       78.97    78.17
KNN          78.21      79.49       78.97    78.17
Linear SVM   78.21      79.49       78.97    78.17
RBF SVM      78.21      79.49       78.97    78.17
SGD          78.21      79.49       78.97    78.17

The ROC curve analysis for trigram features is given below:

The precision-recall curves for those values are given below:

12 Conclusion

In this study, we test the effectiveness of various algorithms on a dataset of restaurant
reviews and examine the algorithm that performs best. Three feature sets were used:
unigram, bigram, and trigram. Their scores are given below.

For unigram:
Highest accuracy, achieved by LR: 76.92
Highest F1 score, achieved by LR: 76.86
Highest precision score, achieved by LR: 76.84
Highest recall score, achieved by LR: 76.98

For bigram:
Highest accuracy, achieved by LR: 73.08
Highest F1 score, achieved by LR: 73.04
Highest precision score, achieved by LR: 74.24
Highest recall score, achieved by LR: 73.81

For trigram:
Highest accuracy, achieved by LR: 78.21
Highest F1 score, achieved by LR: 78.17
Highest precision score, achieved by LR: 79.49
Highest recall score, achieved by LR: 78.97

From these data, we can say that the highest accuracy among the three feature sets is obtained
with trigrams (78.21%). If the Stochastic Gradient Descent model is selected instead, the
accuracy is 69.23%. Using this model, we can determine whether the sentiment of any review
is positive or negative; the model also gives the probability of positive or negative sentiment.

13 References

[1] K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks,
approaches and applications,” Knowledge-Based Syst., vol. 89, pp. 14–46, Nov. 2015,
doi: 10.1016/j.knosys.2015.06.015.
[2] V. A. and S. S. Sonawane, “Sentiment Analysis of Twitter Data: A Survey of
Techniques,” Int. J. Comput. Appl., vol. 139, no. 11, pp. 5–15, Apr. 2016, doi:
10.5120/ijca2016908625.
[3] S. Schrauwen, “Machine Learning Approaches To Sentiment Analysis Using the
Dutch Netlog Corpus,” 2010.
[4] M. S. Neethu and R. Rajasree, “Sentiment analysis in twitter using machine learning
techniques,” in 2013 Fourth International Conference on Computing,
Communications and Networking Technologies (ICCCNT), IEEE, Jul. 2013, pp. 1–5.
doi: 10.1109/ICCCNT.2013.6726818.
[5] A. P. Jain and P. Dandannavar, “Application of machine learning techniques to
sentiment analysis,” in 2016 2nd International Conference on Applied and Theoretical
Computing and Communication Technology (iCATccT), IEEE, 2016, pp. 628–632.
doi: 10.1109/ICATCCT.2016.7912076.
[6] G. Gautam and D. Yadav, “Sentiment analysis of twitter data using machine learning
approaches and semantic analysis,” in 2014 Seventh International Conference on
Contemporary Computing (IC3), IEEE, Aug. 2014, pp. 437–442. doi:
10.1109/IC3.2014.6897213.
[7] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and
applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, Dec. 2014,
doi: 10.1016/j.asej.2014.04.011.
[8] R. Liu, R. Xiong, and L. Song, “A sentiment classification method for Chinese
document,” in 2010 5th International Conference on Computer Science & Education,
IEEE, Aug. 2010, pp. 918–922. doi: 10.1109/ICCSE.2010.5593462.
[9] L. Ramachandran and E. F. Gehringer, “Automated Assessment of Review Quality
Using Latent Semantic Analysis,” in 2011 IEEE 11th International Conference on
Advanced Learning Technologies, IEEE, Jul. 2011, pp. 136–138. doi:
10.1109/ICALT.2011.46.
[10] B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis," Found. Trends Inf.
Retr., vol. 2, no. 1–2, pp. 1–135, 2008, doi: 10.1561/1500000011.


[11] H. Tang, S. Tan, and X. Cheng, “A survey on sentiment detection of reviews,” Expert
Syst. Appl., vol. 36, no. 7, pp. 10760–10773, Sep. 2009, doi:
10.1016/j.eswa.2009.02.063.
[12] O. Sharif, M. M. Hoque, and E. Hossain, “Sentiment Analysis of Bengali Texts on
Online Restaurant Reviews Using Multinomial Naïve Bayes,” in 2019 1st
International Conference on Advances in Science, Engineering and Robotics
Technology (ICASERT), IEEE, May 2019, pp. 1–6. doi:
10.1109/ICASERT.2019.8934655.
[13] S. C, D. P. Ravikumar, and M. A. M.J, “Sentiment Analysis of Customer Feedback on
Restaurant Reviews,” SSRN Electron. J., 2019, doi: 10.2139/ssrn.3506637.
[14] M. Adnan, R. Sarno, and K. R. Sungkono, “Sentiment Analysis of Restaurant Review
with Classification Approach in the Decision Tree-J48 Algorithm,” in 2019
International Seminar on Application for Technology of Information and
Communication (iSemantic), IEEE, Sep. 2019, pp. 121–126. doi:
10.1109/ISEMANTIC.2019.8884282.
[15] G. Ganu, Y. Kakodkar, and A. Marian, “Improving the quality of predictions using
textual information in online user reviews,” Inf. Syst., vol. 38, no. 1, pp. 1–15, Mar.
2013, doi: 10.1016/j.is.2012.03.001.
