
PREDICTION OF RESTAURANT REVIEW BY USING NLP

METHOD TO BUILD MACHINE LEARNING MODEL

A Project Report submitted

in partial fulfillment for the award of the Degree of

Bachelor of Technology
in
Computer Science and Engineering
by

Y. BALA KISHORE REDDY (U18CS117)


B.N.V.S. PRANEETH (U18CS093)
SARTAJ HASAN MURTAJA (U18CS105)

Under the guidance of


Mrs. K. VINOTHINI

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SCHOOL OF COMPUTING

BHARATH INSTITUTE OF HIGHER EDUCATION AND RESEARCH


(Deemed to be University Estd u/s 3 of UGC Act, 1956)

CHENNAI 600 073, TAMILNADU, INDIA


April, 2022

CERTIFICATE

This is to certify that the project report entitled PREDICTION OF RESTAURANT
REVIEW BY USING NLP METHOD TO BUILD MACHINE LEARNING
MODEL, submitted by Y. BALA KISHORE REDDY (U18CS117),
B.N.V.S. PRANEETH (U18CS093) and SARTAJ HASAN (U18CS105) to the Department
of Computer Science and Engineering, Bharath Institute of Higher Education and Research,
in partial fulfillment for the award of the degree of B. Tech in Computer Science
and Engineering, is a bona fide record of project work carried out by them under my
supervision. The contents of this report, in full or in parts, have not been submitted to
any other Institution or University for the award of any other degree.

<Signature of Supervisor>
MRS. K VINOTHINI

Department of Computer Science & Engineering,


School of Computing,
Bharath Institute of Higher Education and Research
April, 2022

<Signature of Head of the Department>

Dr. B. Persis Urbana Ivy


Professor & Head
Department of Computer Science & Engineering,
School of Computing,
Bharath Institute of Higher Education and Research,

April, 2022

DECLARATION

We declare that this project report titled PREDICTION OF
RESTAURANT REVIEW BY USING NLP METHOD TO BUILD
MACHINE LEARNING MODEL, submitted in partial fulfillment of the
degree of B. Tech in Computer Science and Engineering, is a record of
original work carried out by us under the supervision of Mrs. K.
VINOTHINI, and has not formed the basis for the award of any other
degree or diploma, in this or any other Institution or University. In keeping
with the ethical practice in reporting scientific information, due
acknowledgements have been made wherever the findings of others have
been cited.

<Signature>
Y. BALA KISHORE REDDY
(U18CS117)

<Signature>
B.N.V.S. PRANEETH
(U18CS093)

<Signature>
SARTAJ HASSAN MURTAJA
(U18CS105)

Chennai
<Date>
ACKNOWLEDGEMENT

First, we wish to thank the almighty who gave us good health and success throughout our
project work.
We express our deepest gratitude to our beloved President Dr. J. Sundeep Aanand, and
Managing Director Dr. E. Swetha Sundeep Aanand for providing us with the necessary facilities for
the completion of our project.
We take great pleasure in expressing sincere thanks to Vice Chancellor (I/C) Dr. K.
Vijaya Baskar Raju, Pro Vice Chancellor (Academic) Dr. M. Sundararajan, Registrar Dr. S.
Bhuminathan and Additional Registrar Dr. R. Hari Prakash for backing us in this project.
We thank our Dean Engineering Dr. J. Hameed Hussain for providing sufficient facilities for
the completion of this project.
We express our immense gratitude to our Academic Coordinator Mr. G. Krishna
Chaitanya for his eternal support in completing this project.
We thank our Dean, School of Computing Dr. S. Neduncheliyan for his encouragement
and the valuable guidance.
We record indebtedness to our Head, Department of Computer Science and
Engineering Dr. B. Persis Urbana Ivy for immense care and encouragement towards us
throughout the course of this project.
We also take this opportunity to express a deep sense of gratitude to our Supervisor Dr.
Nalini Joseph for her cordial support, valuable information and guidance, which helped us in
completing this project through its various stages.
We thank our department faculty, supporting staff and friends for their help and
guidance to complete this project.

Y. BALA KISHORE REDDY (U18CS117)
B.N.V.S. PRANEETH (U18CS093)
SARTAJ HASSAN MURTAJA (U18CS105)

ABSTRACT

One of the most effective tools any restaurant has is the ability to track food and beverage sales daily.
Currently, recommender systems play an important role in both academia and industry, and they are very
helpful for managing information overload. In this work, we apply machine learning techniques to user
reviews and analyze the valuable information they contain. Reviews are useful to both customers and
owners for making decisions. We build a machine learning model with natural language processing techniques that
can capture users' opinions from their reviews. The Python language was used for experimentation.
Keywords: recommender systems, machine learning, Python

CHAPTER 1

INTRODUCTION

Restaurant customers give ratings and write reviews based on their level of satisfaction. These
ratings and reviews help other customers decide whether to visit a restaurant. They also help
restaurant owners make changes that improve their business. Restaurant reviews contain textual
information, but most machine learning algorithms work with numerical data only. Machine learning
(ML) can be considered one of the applications of artificial intelligence (AI). ML provides a way for
systems to learn without being explicitly programmed, and this learning can be used for solving
problems. Machine learning takes data as input and learns important relations from the data in order
to make decisions as per user requirements. The learning process starts with observations, such as
samples or direct experience, and then finds patterns in that data in order to predict or classify new
things in the future. For text processing, machine learning provides natural language processing (NLP)
capabilities. We can readily analyze textual datasets through NLP methodologies. NLP gives data
analysts an opportunity to apply machine learning and deep learning algorithms to textual
datasets. We use machine learning algorithms to classify reviews and recommend the
best restaurant. In general, the methods implemented in a recommender system are of three types,
namely content-based methods, collaborative methods and hybrid methods. Content-based methods
depend on similarities between the reviews of the users. They recommend items to a customer based
on the items that the same customer rated most highly in the past. Generally, we need to construct
a customer profile and an item profile using the content of a shared attribute space. For
example, a movie can be represented by its stars and its genres. A customer profile can be built in
the same way from the stars and genres that the user likes. To score how well a movie matches a
customer, we may use the cosine similarity between the two profiles. Collaborative techniques
recommend items based on user behaviour. These methods need nothing except users'
historical preferences on a set of items. Because they rely on historical data, the core assumption
is that users who have agreed in the past tend to agree in the future as well. User preference
is usually expressed in two categories: explicit ratings, which a user gives directly, and implicit
ratings, which are inferred from behaviour such as page views or clicks. A hybrid method combines
the features of both content-based and collaborative methods.
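
The cosine-similarity scoring mentioned above can be sketched as follows. This is only an illustration: the attribute space, the profile vectors and the function name are assumptions made for the example and are not taken from the project.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical shared attribute space: [action, comedy, drama, star_A, star_B]
customer_profile = [1, 0, 1, 1, 0]
movie_1 = [1, 0, 1, 1, 0]   # same attributes as the customer likes
movie_2 = [0, 1, 0, 0, 1]   # no attribute in common with the customer

score_1 = cosine_similarity(customer_profile, movie_1)
score_2 = cosine_similarity(customer_profile, movie_2)
print(score_1, score_2)
```

A score close to 1 means the item profile points in the same direction as the customer profile, while a score of 0 means they share no attributes.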

CHAPTER 2

LITERATURE SURVEY

Machine learning is not a new technique for text processing. Various researchers have applied machine
learning techniques to the classification of restaurant reviews. M. Govindarajan et al. [1] proposed a hybrid
classification model for sentiment analysis of restaurant reviews. They proposed an ensemble
classifier comprising support vector machine and Naive Bayes models. With their model, they
achieved an accuracy of 90%. Sasikala P. et al. [2] proposed a model for classifying restaurant reviews
using the sentiments in the words. Their model is based on a score combined with existing text
analysis packages. Many people use Yelp to find a good restaurant, and Yelp reviews are very
helpful for this purpose. Boya Yu et al. [3] proposed support vector machines for
analyzing restaurant features using sentiment analysis on Yelp reviews. Kirange et al. [4] also
proposed a support vector classifier for emotion classification of restaurant reviews. They
compared their model with Naive Bayes, K-NN and neural network models and showed that SVM
achieved good results. Tri Doan et al. [5] proposed a variant of online random forest classifiers for
performing sentiment analysis on user reviews. They showed that their model achieved an accuracy
similar to offline methods. Ekaterina Pronoza et al. [6] proposed a restaurant information extraction
method for a restaurant recommendation system. Veda Waikul et al. [7] proposed an SVM classifier
for classifying restaurant reviews. With their model, they achieved an accuracy of 77%.

2.1 A Survey of Sentiment Analysis Challenges

The consequences of the challenges in the area of sentiment analysis [1] have been discussed.
In the first comparison, sentiment review structure is compared with sentiment analysis
challenges. The result of this comparison shows that domain dependence [2] is an important
part of sentiment challenges. The second comparison deals with the accuracy of sentiment
analysis models based on the challenges. Structured [3], semi-structured [4], and unstructured
[5] are the three types of review structure that were used for the first comparison. Sentiment
analysis challenges are of two types, theoretical and technical. The challenges include
domain dependence, negation [6], bipolar words [7], entity feature/keyword extraction [8],
spam or fake reviews [9], and NLP overheads (short abbreviations, ambiguity, emotions,
sarcasm). Parts-of-speech (POS) tagging [10] gives highly accurate results for the theoretical
type of challenges. The phrases and expressions captured by n-grams [11] give that technique an
edge over all others used for the technical set of challenges. The results explained the
effectiveness of addressing sentiment analysis challenges for improving the accuracy of the model [12].

2.2 Aspect based Sentiment Oriented Summarization of Hotel Reviews

Due to the unstable size of review dimensions and customer-produced content, different text
analytic approaches such as opinion mining [13], sentiment analysis, topic modeling [14] and
aspect classification play a significant role in analyzing the content. Topic modelling can find
diverse topics in a corpus of text because of its statistical nature. For every aspect type, there
is a certain opinion linked to it, and the sentiment analysis method can effectively bring out
these emotions. Whether it is a business intelligence problem or a case of unstructured document
categorization, sentiment analysis is useful in most cases. It has emerged as the most important
aspect of the information retrieval process. Text summarization strategies [15] can boost
sentiment analysis research. The opinion mining of the hotel reviews is done using the
SentiWord [16] library. The reviews were summarized on different aspects and sentiment
analysis was performed.
CHAPTER 3

EXISTING SYSTEM AND PROPOSED SYSTEM

3.1 EXISTING SYSTEM:

Generally, we need a procedure for representing text information for an ML algorithm, because
ML methods need input data in numeric form whereas restaurant reviews contain textual
information. The bag-of-words (BoW) model is useful for this task and is simple to implement. It
is one of the methods for extracting features from text for machine learning models: it
preprocesses the input text by converting it into a bag of words, which can be represented as a
table containing each word alongside its count. In this method, each word is also called a “gram”.
We can also create a vocabulary of two-word pairs; this is called a bigram model. The general
model is called an n-gram model. The procedure for converting text into numbers is called
vectorization in natural language processing. Vectorization can be done in three ways, namely
with a count vectorizer, a TF-IDF vectorizer or a hashing vectorizer.
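
As a rough illustration of the three vectorization options, the sketch below applies scikit-learn's CountVectorizer (with unigrams and bigrams), TfidfVectorizer and HashingVectorizer to two made-up reviews. The reviews and the choice of 16 hash columns are assumptions for the example only.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)

# Two made-up reviews used only for illustration.
reviews = ["good food good service", "not good food"]

# Count vectorizer over unigrams and bigrams (n-grams with n = 1 and 2).
count_vec = CountVectorizer(ngram_range=(1, 2))
counts = count_vec.fit_transform(reviews)
vocabulary = sorted(count_vec.vocabulary_)

# TF-IDF vectorizer: unigram vocabulary, with counts re-weighted by rarity.
tfidf = TfidfVectorizer().fit_transform(reviews)

# Hashing vectorizer: no stored vocabulary, fixed number of columns.
hashed = HashingVectorizer(n_features=16).transform(reviews)

print(vocabulary)
print(counts.shape, tfidf.shape, hashed.shape)
```

The count and TF-IDF vectorizers learn a vocabulary from the text, so the number of columns depends on the data; the hashing vectorizer trades that vocabulary for a fixed width.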

3.2 PROPOSED SYSTEM:

• In this project, we used the Python console and the command prompt (CMD) to install all the
required libraries.
• The dataset of restaurant reviews was downloaded from Kaggle.
• All the coding and compilation for the proposed project was done in the Sublime Text editor.

(Python console Fig- 3.2.1) (Datasets from Kaggle Fig- 3.2.2)

The libraries we have used in this project are SKlearn, Numpy, Matplotlib.

• Numpy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
• SKlearn is a simple and efficient tool for predictive data analysis. It is accessible to
everybody, reusable in various contexts, and built on NumPy, SciPy, and matplotlib.
• Imutils is a series of convenience functions that make basic image processing operations
such as translation, rotation, resizing, skeletonization and display easier.
• Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib makes easy things easy and hard things possible.

(Fig- 3.2.6) (Fig- 3.2.7) (Fig- 3.2.8)


CHAPTER 4

HARDWARE AND SOFTWARE REQUIREMENTS


4.1 HARDWARE REQUIREMENTS:

CPU (central processing unit)
RAM (8 GB preferable)
Keyboard and Mouse
Desktop or Monitor
GPU (graphical processing unit)
Web Cam (With Basic Pixels)
Processor (Core i5 8th Generation Preferable)
Hard Drive

(Table- 4.1.1)

4.2 SOFTWARE REQUIREMENTS:

Software                 Version
Operating System         Windows 10 or 11
Python                   3.8 or Above
Sublime Text Editor      3 or Above
Python Console or CMD

(Table- 4.2.1)

4.3 LIBRARIES

Package Name             Version
Numpy                    1.19
Time                     1.18.1
MobileNet_V2             Version 2
Scikit                   1.1
Matplotlib               .2.7
OS
BLOCK DIAGRAM:
ALGORITHM:

After applying one of the above vectorization models, the entire text data
is converted into a sparse matrix of numeric data, which is ready for
machine learning algorithms. Before applying a machine learning algorithm,
we divided the given dataset into training and testing data using the 5-fold
cross-validation technique. In this technique, the data is divided into 5
parts, also called folds. Each fold is used as the testing set in exactly one
of the 5 iterations, while the remaining 4 folds form the training set. The
dataset contains 1000 reviews, so in each iteration the training set contains
800 reviews and the testing set contains 200 reviews, all in the form of
numeric vectors. We applied several machine learning algorithms for the
classification of reviews. To measure the accuracy of our model, we used the
classification accuracy measure, which is calculated as follows:

accuracy = (True Positives + True Negatives) / Total number of samples

Here, True Positives (TP) means that the actual label is true and the model predicts
true. True Negatives (TN) means that the actual label is false and the model predicts
false. We used accuracy for comparing the various ML models. Although we applied
several ML models, we achieved good results with only three of them.
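
The fold arithmetic and the accuracy measure can be checked with a minimal pure-Python sketch. The helper names are hypothetical, and the split assumes that the number of samples is divisible by the number of folds; in practice a library routine such as scikit-learn's KFold would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k consecutive, equal-sized folds.

    Assumes n_samples is divisible by k.
    """
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

def accuracy(y_true, y_pred):
    """(TP + TN) / total, i.e. the fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

splits = list(k_fold_indices(n_samples=1000, k=5))
fold_sizes = [(len(train), len(test)) for train, test in splits]
print(fold_sizes)

# Made-up labels and predictions: four of the five agree.
example_accuracy = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
print(example_accuracy)
```

Every sample lands in the test set of exactly one fold, which is why the five accuracy values can be averaged into a single estimate.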

NLP – Natural language processing

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial
intelligence concerned with the interactions between computers and human language, in
particular how to program computers to process and analyze large amounts of natural
language data. NLP has various real-life applications. A lot of textual data is generated
these days, and if used properly for analytics, this data can help reach business goals.
We have taken some data based on restaurant reviews.
Logistic Regression:
The name of this algorithm contains “regression”, but it is used for classification
only. It is used when the predicted variable is categorical (a label). The model is
probabilistic: it first computes a linear combination of the input features, as in linear
regression, and then passes the result through the sigmoid function to obtain a probability
between 0 and 1. A threshold is then needed for classification. If the probability is less than
0.5, we place the sample into one class; otherwise, we place it into the other class.
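
The thresholding step can be illustrated with a small pure-Python sketch. The weights, bias and feature meanings below are invented for the example; in a trained model they would be learned from the data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias  # linear part
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return probability, label

# Hypothetical weights for two word-count features, e.g. "good" and "bad".
weights, bias = [1.5, -2.0], 0.0

prob_pos, label_pos = predict([2, 0], weights, bias)  # "good" appears twice
prob_neg, label_neg = predict([0, 1], weights, bias)  # "bad" appears once
print(label_pos, label_neg)
```

Here a probability of exactly 0.5 is assigned to class 1; that tie-breaking rule is a convention chosen for the sketch.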

Support Vector Machines:


SVM is also a supervised learning model that can be used for classification and regression.
The main objective in SVM is to find a hyperplane that separates the data points into
distinct classes. Many such hyperplanes may exist, but only the one with the maximum
margin, that is, the largest distance to the nearest data points of each class, is chosen.
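
A minimal sketch of a maximum-margin classifier on toy data is given below, using scikit-learn's SVC with a linear kernel. The four training points and two query points are made up for illustration and have nothing to do with the review dataset.

```python
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (made up for illustration).
X = [[0, 0], [0, 1], [2, 0], [2, 1]]
y = [0, 0, 1, 1]

# A linear kernel makes the decision boundary a single hyperplane (here, a line).
clf = SVC(kernel="linear")
clf.fit(X, y)

predictions = clf.predict([[-1, 0.5], [3, 0.5]])
print(predictions)
print(clf.support_vectors_)  # the training points that define the margin
```

The support vectors are the training points closest to the separating hyperplane; moving any other point slightly would not change the boundary.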

Steps involved:
Step 1: Import the dataset with the delimiter set to ‘\t’, because the columns are separated by a tab.
Reviews and their category (0 or 1) are separated by a tab rather than by any other symbol
because most other symbols occur within the reviews (such as $ for the price, …, !, etc.) and the
algorithm might treat them as delimiters, which would lead to strange behaviour (such as errors
or unexpected output).

All the necessary imports and libraries for the task.


Reading the data.

We then read the data, and based on the label, we divide the text into positive
and negative reviews. We now have the whole positive and negative reviews as a
text blob.
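
The reading step can be sketched with the standard-library csv module. The two rows below are invented stand-ins for the real file. The option csv.QUOTE_NONE has the numeric value 3, which is the same value passed as quoting=3 to pandas in the implementation code of this report.

```python
import csv
import io

# Stand-in for the tab-separated reviews file; the two rows are made up.
raw = 'Review\tLiked\nLoved it, great "value" for $5!\t1\nNot good.\t0\n'

# QUOTE_NONE tells the reader to treat quotation marks as ordinary characters,
# so only the tab character separates the two columns.
reader = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
header, *rows = list(reader)

# Join the reviews of each class into one text blob.
positive = " ".join(text for text, liked in rows if liked == "1")
negative = " ".join(text for text, liked in rows if liked == "0")
print(header)
print(positive)
print(negative)
```

Commas, quotation marks and the dollar sign all survive inside the review text because none of them is treated as a delimiter.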

Step 2: Text Cleaning or Preprocessing

• Remove punctuation and numbers: Punctuation and numbers do not help much in
processing the given text. If included, they just increase the size of the bag of
words that we create as the last step and decrease the efficiency of the
algorithm.
• Stemming: Reduce each word to its root.
• Convert each word to lower case: For example, it is useless to keep the same
word in different cases (e.g. ‘good’ and ‘GOOD’).

Step 3: Tokenization involves splitting the body of the text into sentences and
words.

Next, working on the positive reviews, we remove punctuation, because this
is best practice when working with text data. Using the tokenizer, we split
the text into words (tokenize it).

Similarly, converting text data to lower case is good practice in NLP for
uniformity. Then we remove the stop words, which are the most common
words in a language. In computing, stop words are filtered out before or
after processing of natural language data.

Examples of stop words:

a, of, on, I, the, with, so, and, etc.


Both are important parts of Natural Language processing and each with
their own benefits and needs.

Frequency Distribution of positive review words.

The frequency distribution of the positive words gives us this output. As
we can see, the words are mostly positive in tone.
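
Cleaning, tokenization, stop-word removal and the frequency distribution can be sketched together in plain Python. The stop-word list and the two reviews are small, made-up examples; stemming is left out here because it needs an external stemmer such as the NLTK one used in the implementation code.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list for illustration only; a real run would
# use a fuller list such as the one shipped with NLTK.
STOP_WORDS = {"a", "of", "on", "i", "the", "with", "so", "and", "was", "is"}

def preprocess(text):
    """Clean one review and return its list of remaining tokens."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)  # drop punctuation, numbers
    tokens = letters_only.lower().split()            # lower-case and tokenize
    return [word for word in tokens if word not in STOP_WORDS]

# Made-up positive reviews.
positive_reviews = ["The food was GOOD!!!", "Good service and a good price: $10."]

tokens = [word for review in positive_reviews for word in preprocess(review)]
frequency = Counter(tokens)
print(tokens)
print(frequency.most_common(3))
```

Because the text is lower-cased before counting, ‘GOOD’ and ‘Good’ are counted as the same word.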

Step 4: Making the bag of words via sparse matrix

• Take all the distinct words of the reviews in the dataset, without repetition.
• Use one column for each word, so there are going to be many columns.
• Rows are reviews.
• If a word occurs in a review, its count appears in that review's row of the
bag of words, under the column for that word.

Examples: Let’s take a dataset of only two reviews.

Input : "dam good steak", "good food good service"
Output :

For this purpose we need the CountVectorizer class from
sklearn.feature_extraction.text.
We can also set a maximum number of features via the attribute “max_features”, which
keeps only the most frequent words. Fit the vectorizer on the corpus and apply the same
transformation to the corpus with “.fit_transform(corpus)”, and then convert the result
into an array. Whether a review is positive or negative is given in the second column of
the dataset, dataset[:, 1]: all rows and the column with index 1 (indexing from zero).
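
For the two example reviews, the bag of words can be produced as sketched below. The value 1500 for max_features is an arbitrary illustrative choice and not a setting taken from the project.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["dam good steak", "good food good service"]

# max_features caps the vocabulary at the most frequent words; 1500 is an
# arbitrary illustrative value and has no effect on this tiny corpus.
vectorizer = CountVectorizer(max_features=1500)
X = vectorizer.fit_transform(corpus).toarray()

# Recover the words in the order of their matrix columns.
words = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(words)  # column order of the matrix
print(X)      # one row per review, one count per word
```

Each row of X is one review and each column is one vocabulary word, so the word “good” contributes a 2 in the second row because it occurs twice in that review.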

Step 5: Splitting the corpus into training and test sets. For this, we need the
train_test_split function from sklearn.model_selection (sklearn.cross_validation in older
scikit-learn releases). The split can be 70/30, 80/20, 85/15 or 75/25; here we chose 75/25
via “test_size”.
X is the bag of words, and y is 0 or 1 (negative or positive).
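
A 75/25 split can be sketched as follows. The data are placeholders (1000 one-feature rows with alternating labels), and random_state=0 is chosen only to make the example repeatable.

```python
from sklearn.model_selection import train_test_split

# Stand-in data: 1000 one-feature rows and alternating 0/1 labels.
X = [[i] for i in range(1000)]
y = [i % 2 for i in range(1000)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))
```

With 1000 reviews, a test size of 0.25 leaves 750 reviews for training and 250 for testing.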

Step 6: Fitting a Predictive Model (here random forest)

• Since random forest is an ensemble model (made of many trees), import the
RandomForestClassifier class from sklearn.ensemble.
• Use 501 trees (“n_estimators”) and ‘entropy’ as the criterion.
• Fit the model via the .fit() method with the arguments X_train and y_train.

Step 7: Predicting the final results using the .predict() method with the argument X_test.

Note: Accuracy with the random forest was 72%. (It may differ when the experiment is
performed with a different test size; here it is 0.25.)
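
Steps 6 and 7 can be sketched on toy data as below. The tiny two-column bag of words is invented for the example, so the result says nothing about the 72% accuracy reported for the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy bag-of-words rows with columns ["bad", "good"] (made up for illustration).
X_train = [[0, 2], [0, 1], [1, 0], [2, 0], [0, 3], [3, 0]]
y_train = [1, 1, 0, 0, 1, 0]
X_test = [[0, 4], [4, 0]]

# 501 trees and the entropy criterion, as described in Step 6.
model = RandomForestClassifier(n_estimators=501, criterion="entropy", random_state=0)
model.fit(X_train, y_train)

# Step 7: predict labels for unseen rows.
y_pred = model.predict(X_test)
print(y_pred)
```

An odd number of trees avoids ties when the trees vote on a two-class label.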

Step 8: To know the accuracy, a confusion matrix is needed.

Confusion Matrix is a 2x2 matrix.

TRUE POSITIVE : the number of actual positives that are correctly identified as positive.
TRUE NEGATIVE : the number of actual negatives that are correctly identified as negative.
FALSE POSITIVE : the number of actual negatives that are incorrectly identified as positive.
FALSE NEGATIVE : the number of actual positives that are incorrectly identified as negative.

Note: True or False refers to the assigned classification being correct or incorrect, while
Positive or Negative refers to assignment to the positive or the negative category.
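
The four counts and the resulting accuracy can be computed with a short pure-Python sketch. The labels and predictions below are made up, and the helper name is hypothetical; a count is "true" when the prediction matches the actual label and "positive" when the predicted label is 1.

```python
def confusion_matrix(y_true, y_pred):
    """Return the four confusion-matrix counts as (tn, fp, fn, tp) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tn, fp, fn, tp

# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(tn, fp, fn, tp, accuracy)
```

The four counts always add up to the number of samples, and accuracy is the share of samples on the diagonal of the matrix (TP and TN).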

IMPLEMENTATION
CODE:
from google.colab import files

uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
import pandas as pd # importing pandas module
dataset = pd.read_csv('/content/Restaurant_Reviews.csv', delimiter='\t',quoting=3 )
dataset # viewing dataframe
dataset.info() # printing information about dataframe
x = dataset['Review'].values # assigning Review values
x = list(x) # converting array datatype into list datatype
shape = dataset.shape # getting shape of dataframe
shape
import re # importing regular expression module
import nltk # importing natural language toolkit library
nltk.download('stopwords') # downloading stop words
from nltk.corpus import stopwords # importing stopwords module
from nltk.stem.porter import PorterStemmer # importing PorterStemmer class

new_stopwords = stopwords.words('english') # assigning english stopwords
new_stopwords.remove('not') # keeping "not", because it carries sentiment
stopword_set = set(new_stopwords) # building the set once for fast lookup

# Stemming
stemmer_algorithm = PorterStemmer() # creating the PorterStemmer
stemming_sentence = [] # cleaned reviews, in the same order as x

for i in range(len(x)):
    only_text = re.sub('[^a-zA-Z]', ' ', x[i]) # keeps letters only
    lower_text = only_text.lower() # convert all text into lowercase
    tokenize = lower_text.split() # tokenization
    # stemming every token that is not a stop word
    apply_stem = [stemmer_algorithm.stem(word) for word in tokenize
                  if word not in stopword_set]
    join_stem = ' '.join(apply_stem) # join the tokens back into a sentence
    stemming_sentence.append(join_stem) # adding sentence

# replacing the raw reviews with the cleaned ones (same row order as x)
dataset['Review'] = stemming_sentence
import matplotlib.pyplot as plt # importing matplotlib.pyplot module

counts = dataset["Liked"].value_counts().sort_index() # reviews per label, ordered 0 then 1
counts.plot(kind="bar") # creating bar plot
plt.text(0, counts.loc[0], counts.loc[0]) # annotating the bar for label 0
plt.text(1, counts.loc[1], counts.loc[1]) # annotating the bar for label 1
x = dataset['Review'].values # assigning Review values
y = dataset['Liked'].values # assigning Liked values
# importing train_test_split function
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,random_state = 0)
from sklearn.feature_extraction.text import CountVectorizer # importing CountVectorizer class
vector = CountVectorizer() # creating the CountVectorizer
x_train_vector = vector.fit_transform(x_train) # learning the vocabulary and transforming train text
x_test_vector = vector.transform(x_test) # transforming test text with the same vocabulary
from sklearn.svm import SVC # importing Support Vector Classifier(SVC) algorithm
model = SVC() # calling Support Vector Classifier(SVC) algorithm
model.fit(x_train_vector, y_train) # fitting train datasets
y_pred = model.predict(x_test_vector) # predicting x_test datasets
y_pred # printing predicted values
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import classification_report
score_1 = accuracy_score(y_test,y_pred)
score_2 = precision_score(y_test,y_pred)
score_3= recall_score(y_test,y_pred)
print("Accuracy_score : ",score_1)
print("Precision_score : ",score_2)
print("Recall_score : ",score_3)
print(" Classification Report","\n")
print(classification_report(y_test,y_pred))
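As a rough check on what the three scores mean, they can be computed by hand from the true and predicted labels. This is a sketch for binary labels with 1 as the positive class; the helper `binary_scores` and the demo label lists are made up for illustration and are not results from the report.

```python
def binary_scores(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, positive class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels, used only to exercise the function
demo_true = [1, 0, 1, 1, 0, 0, 1, 0]
demo_pred = [1, 0, 0, 1, 1, 0, 1, 0]
demo_accuracy, demo_precision, demo_recall = binary_scores(demo_true, demo_pred)
print(demo_accuracy, demo_precision, demo_recall)
```

Precision answers "of the reviews predicted positive, how many were positive", while recall answers "of the positive reviews, how many were found".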
pip install streamlit # installing streamlit package
pip install pyngrok==4.1.1 # installing pyngrok package
from sklearn.pipeline import make_pipeline # importing make_pipeline module
model = make_pipeline(CountVectorizer(),SVC()) # creating model object
model.fit(x_train,y_train) # fitting train datasets
y_pred = model.predict(x_test) # predicting x_test dataset
y_pred # printing predicted values
import joblib as job # importing joblib module
job.dump(model,'sentiment analysis file') # dumping pipeline
%%writefile sentiment_analysis_webapp.py

import re  # regular expressions

import joblib as job  # model persistence
import nltk  # natural language toolkit
import streamlit as st  # web app framework
from nltk.corpus import stopwords  # English stop-word list
from nltk.stem.porter import PorterStemmer  # Porter stemming algorithm

nltk.download('stopwords')  # download the stop-word corpus if it is missing

st.title("Sentiment analyser")  # page title

model = job.load('sentiment analysis file')  # load the saved pipeline

user_input = st.text_area('Enter your review : ', height=150)  # review typed by the user

# Apply the same preprocessing that was used on the training reviews
new_stopwords = stopwords.words('english')  # English stop words
new_stopwords.remove('not')  # keep "not" because it carries sentiment
stop_set = set(new_stopwords)

stemmer_algorithm = PorterStemmer()  # Porter stemmer instance

only_text = re.sub('[^a-zA-Z]', ' ', user_input)  # keep letters only
lower_text = only_text.lower()  # convert all text to lowercase
tokenize = lower_text.split()  # tokenization on whitespace

# stem every token that is not a stop word
apply_stem = [stemmer_algorithm.stem(word) for word in tokenize if word not in stop_set]
join_stem = ' '.join(apply_stem)  # join the stems back into a sentence
stemming_input = [join_stem]  # the pipeline expects a list of documents

if st.button('Predict'):  # run the model only when the button is clicked
    predict_output = model.predict(stemming_input)  # predicted label
    if predict_output[0] == 0:  # 0 means a negative review
        st.title("Negative Review")
    else:  # otherwise the review is positive
        st.title("Positive Review")
from pyngrok import ngrok #importing ngrok module
! nohup streamlit run sentiment_analysis_webapp.py & # running streamlit
public_url = ngrok.connect(port = '8501') # Creating public URL
public_url # printing public url
1 INTRODUCTION
The novel coronavirus COVID-19 has brought about a new normal. India is struggling to recover from the outbreak, and the government imposed a prolonged lockdown. The lockdown put pressure on the global economy, so the government relaxed the restrictions. The WHO has declared that maintaining distance and wearing a mask are necessary. The greatest support the government needs after the relaxation is social distancing and mask wearing by the public. However, many people go out without a face mask, which may increase the spread of COVID-19. Economic Times India has stated that "Survey shows that 90 percent Indians are aware, but only 44 percent wearing a mask". The survey indicates that people are aware but do not wear masks because of discomfort and carelessness, which may result in the easy spread of COVID-19 in public places. The World Health Organization has clearly stated that, until vaccines are found, wearing masks and social distancing are key tools to reduce the spread of the virus, so it is important to make people wear masks in public places. In densely populated regions it is difficult to find and warn people who are not wearing a face mask. Hence we use image processing techniques to identify persons wearing and not wearing face masks. Images are collected in real time from the camera and processed on a Raspberry Pi embedded development kit. The real-time images from the camera are compared with the trained dataset and detection of wearing or

