
SENTIMENT ANALYSIS OF RESTAURANT REVIEWS

PRESENTED BY: PUSHPA (10610624)

GUIDED BY: DR MEHRAN RAFIEE


Background & Motivation:
• Customer Insights
• Quality Improvement
• Competitor Analysis
• Menu Optimization
• Staff Performance
• Reputation Management
• Marketing Insights
• Predictive Analysis
• Customer Loyalty
• Identifying Trends
Research Questions:

• Are traditional machine learning algorithms suitable for sentiment analysis?
• Are hybrid algorithms better than traditional machine learning algorithms?
• Which is the better parameter for sentiment analysis: rating or review?
Methodology:

• CRISP-DM methodology
Business understanding

• It is important to understand the problem the business is trying to solve:
• Are we only going to infer the overall sentiment?
• Are we going to work on specific aspects that need improvement?
• Are we going to use the results as part of the business and marketing strategy?
Data understanding

• [Chart: Rating vs review count]
Data understanding (continued)

• [Chart: Review count per state]
Distribution of sentiment based on Ratings
Distribution of sentiment based on Reviews
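The slides do not state how a star rating is turned into a sentiment label. A minimal sketch, assuming a 1 to 5 star scale and a hypothetical cut-off (`POSITIVE_THRESHOLD = 4`, so 4 and 5 stars count as positive and 1 to 3 stars as negative), of how rating-based labels and their distribution could be derived:

```python
from collections import Counter

# Hypothetical cut-off: the slides do not state the rule used.
# Assumes a 1-5 star scale where 4 and 5 stars are positive, 1-3 negative.
POSITIVE_THRESHOLD = 4


def rating_to_sentiment(rating):
    """Map a star rating to a binary sentiment label."""
    return "positive" if rating >= POSITIVE_THRESHOLD else "negative"


def sentiment_distribution(ratings):
    """Count how many reviews fall into each sentiment class."""
    return Counter(rating_to_sentiment(r) for r in ratings)


# Made-up ratings, for illustration only.
print(sentiment_distribution([5, 4, 1, 3, 5, 2, 4]))
```

Where the cut-off sits (for example, whether 3-star reviews are labelled negative or dropped) directly changes the class balance the models are trained on.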
Data preparation

• Data preprocessing & data cleaning
• Tokenization
• Special character removal
• Punctuation & stopword removal
• Lemmatization
• Topic modelling
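The preprocessing steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the project's pipeline: `STOPWORDS` is a tiny hand-picked list, `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer such as NLTK's `WordNetLemmatizer`, and topic modelling is left out.

```python
import string

# Tiny illustrative stopword list; a real pipeline would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "it", "this"}

# Hypothetical lemma lookup standing in for a real lemmatizer; covers only the demo words.
LEMMAS = {"dishes": "dish", "waiters": "waiter", "tasted": "taste"}

PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(review):
    # 1. Tokenization: lowercase and split on whitespace.
    tokens = review.lower().split()
    # 2. Special character removal: drop non-ASCII symbols such as emoji.
    #    (This toy approach also strips accented letters.)
    tokens = [t.encode("ascii", "ignore").decode("ascii") for t in tokens]
    # 3. Punctuation and stopword removal (also drops tokens emptied above).
    tokens = [t.translate(PUNCT_TABLE) for t in tokens]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    # 4. Lemmatization via the lookup table; unknown words pass through unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("The dishes were amazing!!! \U0001F60D The waiters were friendly."))
```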
Modelling & Evaluation

• Naïve Bayes
• Support Vector Classifier
• Random Forest
• Logistic Regression
• Hybrid model
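Each model below is evaluated with per-class precision, recall and F1 score, plus accuracy and macro- and weighted-averages. A minimal sketch of how those figures are computed; `report_metrics` is a hypothetical stdlib helper that follows the same definitions as scikit-learn's `classification_report`:

```python
from collections import Counter


def report_metrics(y_true, y_pred, labels):
    """Per-class precision, recall and F1, plus accuracy and macro/weighted averages."""
    support = Counter(y_true)
    total = len(y_true)
    report = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        predicted = sum(1 for p in y_pred if p == lab)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support[lab] if support[lab] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[lab] = {"precision": precision, "recall": recall, "f1": f1}
    report["accuracy"] = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    # Macro-average weights every class equally; weighted-average weights by support.
    weights = {
        "macro-avg": {lab: 1 / len(labels) for lab in labels},
        "weighted-avg": {lab: support[lab] / total for lab in labels},
    }
    for name, w in weights.items():
        report[name] = {
            m: sum(w[lab] * report[lab][m] for lab in labels)
            for m in ("precision", "recall", "f1")
        }
    return report


y_true = ["neg", "neg", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "pos", "pos", "pos"]
report = report_metrics(y_true, y_pred, labels=["neg", "pos"])
print(report["accuracy"], report["macro-avg"]["recall"], report["weighted-avg"]["recall"])
```

Because each class's recall is weighted by its share of the samples, weighted-average recall always equals accuracy. The macro-average weights both classes equally, so a low negative-class recall pulls it down even when accuracy looks high.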
Naïve Bayes

                With Rating                    With Review
                Precision  Recall  F1 score    Precision  Recall  F1 score
Negative Class  0.81       0.43    0.56        0.76       0.27    0.40
Positive Class  0.82       0.96    0.88        0.85       0.98    0.91
Accuracy                           0.81                           0.84
Macro-avg       0.81       0.70    0.72        0.80       0.62    0.65
Weighted-avg    0.81       0.81    0.79        0.83       0.84    0.81
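A sketch of the Naïve Bayes idea: a multinomial Naïve Bayes over bag-of-words tokens with Laplace smoothing. The variant, `alpha=1.0`, the class name `BagOfWordsNaiveBayes` and the toy training data are assumptions for illustration; the slides do not say which implementation was used (scikit-learn's `MultinomialNB` is a common choice).

```python
import math
from collections import Counter, defaultdict


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over token lists, with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(class)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        # log P(token | class), smoothed so unseen (class, token) pairs keep non-zero probability
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(token_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / denom) for tok in self.vocab
            }
        return self

    def predict(self, docs):
        preds = []
        for doc in docs:
            # Sum log-probabilities instead of multiplying probabilities to avoid underflow;
            # tokens never seen in training are ignored.
            scores = {
                c: self.log_prior[c]
                + sum(self.log_likelihood[c][tok] for tok in doc if tok in self.vocab)
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


train_docs = [["great", "food"], ["friendly", "staff", "great"],
              ["cold", "food", "rude", "staff"], ["rude", "waiter"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = BagOfWordsNaiveBayes().fit(train_docs, train_labels)
print(model.predict([["great", "staff"], ["rude", "waiter"]]))
```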
Support Vector Classifier

                With Rating                    With Review
                Precision  Recall  F1 score    Precision  Recall  F1 score
Negative Class  0.79       0.68    0.73        0.80       0.61    0.69
Positive Class  0.88       0.93    0.91        0.91       0.96    0.94
Accuracy                           0.86                           0.89
Macro-avg       0.84       0.80    0.82        0.85       0.78    0.81
Weighted-avg    0.86       0.86    0.86        0.89       0.89    0.89
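A linear support vector classifier minimises the regularised hinge loss. The sketch below trains one on bag-of-words counts with the Pegasos stochastic sub-gradient method; this training algorithm, the function names, and the parameters `lam`, `epochs` and `seed` are assumptions chosen for illustration, not the project's configuration (a library SVC may use a kernel and a different solver). Labels are +1 for positive and -1 for negative, and the model has no bias term for brevity.

```python
import random
from collections import Counter


def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    docs are token lists, labels are +1 (positive) or -1 (negative); no bias term.
    """
    rng = random.Random(seed)
    data = list(zip(docs, labels))
    w = {}  # token -> weight
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for doc, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x = Counter(doc)
            margin = y * sum(w.get(tok, 0.0) * cnt for tok, cnt in x.items())
            # Regularisation step: shrink every weight.
            for tok in w:
                w[tok] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for tok, cnt in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * cnt
    return w


def predict_svm(w, doc):
    """Sign of the linear score; a score of exactly 0 is treated as positive."""
    score = sum(w.get(tok, 0.0) * cnt for tok, cnt in Counter(doc).items())
    return 1 if score >= 0 else -1


train_docs = [["great", "food"], ["friendly", "staff"], ["rude", "waiter"], ["cold", "soup"]]
weights = train_linear_svm(train_docs, [1, 1, -1, -1])
print(predict_svm(weights, ["great", "food"]), predict_svm(weights, ["rude", "waiter"]))
```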
Random Forest
                With Rating                    With Review
                Precision  Recall  F1 score    Precision  Recall  F1 score
Negative Class  0.81       0.59    0.68        0.87       0.33    0.48
Positive Class  0.86       0.95    0.90        0.86       0.99    0.92
Accuracy                           0.85                           0.86
Macro-avg       0.84       0.77    0.79        0.86       0.66    0.70
Weighted-avg    0.85       0.85    0.84        0.86       0.86    0.83
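A random forest trains many decision trees on bootstrap samples, each seeing a random subset of features, and predicts by majority vote. To keep the sketch short, each tree here is a single-split decision stump on token presence, chosen by Gini impurity (with one split per tree, sampling features per tree is the same as sampling per split). The stump depth, `n_trees=25`, the square-root feature rule and all helper names are assumptions for illustration; real forests grow much deeper trees.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(docs, labels, features):
    """One-split tree: split on the token (from `features`) with lowest weighted Gini."""
    best = None
    for tok in features:
        with_tok = [y for d, y in zip(docs, labels) if tok in d]
        without = [y for d, y in zip(docs, labels) if tok not in d]
        if not with_tok or not without:
            continue  # token does not split this sample
        score = len(with_tok) * gini(with_tok) + len(without) * gini(without)
        if best is None or score < best[0]:
            best = (score, tok, majority(with_tok), majority(without))
    if best is None:  # no usable split: predict the majority class everywhere
        label = majority(labels)
        return (None, label, label)
    return best[1:]


def fit_forest(docs, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    docs = [set(d) for d in docs]
    vocab = sorted({tok for d in docs for tok in d})
    k = max(1, int(len(vocab) ** 0.5))  # square-root feature sampling, a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(docs)) for _ in docs]  # bootstrap sample (with replacement)
        features = rng.sample(vocab, k)
        forest.append(fit_stump([docs[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, doc):
    doc = set(doc)
    # Each stump votes; a stump with no split (token None) returns its constant label.
    votes = [yes if tok in doc else no for tok, yes, no in forest]
    return majority(votes)


train_docs = [["great", "food"], ["great", "staff"], ["rude", "waiter"],
              ["rude", "staff"], ["great", "service"], ["cold", "rude"]]
train_labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
forest = fit_forest(train_docs, train_labels)
print(predict_forest(forest, ["great", "food"]))
```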
Logistic Regression

                With Rating                    With Review
                Precision  Recall  F1 score    Precision  Recall  F1 score
Negative Class  0.80       0.67    0.73        0.83       0.56    0.67
Positive Class  0.88       0.93    0.91        0.90       0.97    0.94
Accuracy                           0.86                           0.89
Macro-avg       0.84       0.80    0.82        0.87       0.77    0.80
Weighted-avg    0.86       0.86    0.86        0.89       0.89    0.88
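Logistic regression models the probability of the positive class as the sigmoid of a weighted sum of token counts. A minimal stdlib sketch, trained by stochastic gradient descent on the log loss with a small L2 penalty; the learning rate, epoch count, penalty strength, class name and toy data are hypothetical choices, not the project's settings.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BagOfWordsLogisticRegression:
    """Binary logistic regression on token counts, trained with SGD on the log loss."""

    def __init__(self, lr=0.5, epochs=200, l2=0.01):
        self.lr, self.epochs, self.l2 = lr, epochs, l2

    def _score(self, counts):
        return self.bias + sum(self.weights.get(t, 0.0) * c for t, c in counts.items())

    def fit(self, docs, labels):
        """labels: 1 for positive, 0 for negative."""
        self.weights, self.bias = {}, 0.0
        for _ in range(self.epochs):
            for doc, y in zip(docs, labels):
                counts = Counter(doc)
                err = sigmoid(self._score(counts)) - y  # d(log loss) / d(score)
                self.bias -= self.lr * err
                for t, c in counts.items():
                    w = self.weights.get(t, 0.0)
                    # L2 penalty applied lazily, only to tokens in this example
                    self.weights[t] = w - self.lr * (err * c + self.l2 * w)
        return self

    def predict_proba(self, doc):
        return sigmoid(self._score(Counter(doc)))

    def predict(self, docs):
        return [1 if self.predict_proba(d) >= 0.5 else 0 for d in docs]


train_docs = [["great", "food"], ["friendly", "staff"], ["rude", "waiter"], ["cold", "soup"]]
model = BagOfWordsLogisticRegression().fit(train_docs, [1, 1, 0, 0])
print(model.predict(train_docs), round(model.predict_proba(["great"]), 3))
```

Unlike the linear SVC, logistic regression yields calibrated-looking probabilities directly, which makes it easy to move the 0.5 decision threshold if the negative class needs higher recall.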
Hybrid Model (SVC + NB)

                Precision  Recall  F1 score
Negative Class  0.76       0.27    0.40
Positive Class  0.85       0.98    0.91
Accuracy                           0.84
Macro-avg       0.80       0.62    0.65
Weighted-avg    0.83       0.84    0.81
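The slides do not describe how SVC and Naïve Bayes were combined. One common option is soft voting: average the two models' positive-class probabilities and threshold the result. The sketch below assumes that scheme, with a hypothetical `soft_vote` helper, equal weights by default and made-up probabilities; in scikit-learn the same idea is available as `VotingClassifier(voting="soft")`, which needs `SVC(probability=True)` so that the SVC exposes probability estimates.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Weighted average of each model's positive-class probabilities, then threshold.

    prob_lists: one list of P(positive) per model, all covering the same reviews.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    combined = [
        sum(w * p for w, p in zip(weights, probs)) / total
        for probs in zip(*prob_lists)
    ]
    labels = ["positive" if p >= threshold else "negative" for p in combined]
    return combined, labels


# Hypothetical probabilities from an SVC and a Naive Bayes model for three reviews.
svc_probs = [0.90, 0.40, 0.20]
nb_probs = [0.70, 0.70, 0.10]
print(soft_vote([svc_probs, nb_probs])[1])
print(soft_vote([svc_probs, nb_probs], weights=[3.0, 1.0])[1])
```

The weights control how much each model influences the final label: with `weights=[3.0, 1.0]`, the second review flips to negative because the SVC's 0.40 now dominates the Naïve Bayes model's 0.70.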
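Best Model

• Based on the results, SVC and Logistic Regression are the best-suited algorithms for this dataset: they reach the highest accuracy (0.86 with ratings, 0.89 with reviews) and the highest negative-class recall, which matters because the dataset is biased towards the positive class.
• SVC is computationally more expensive, so Logistic Regression, which matches its accuracy, is the best overall model among those tested.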
Limitations

• Limited number of records.
• Negation words, contrasts and sarcasm are not handled separately.
• The dataset is raw and has not been corrected for grammatical errors.
• The performance of the algorithms was tested without accounting for cultural and contextual factors.
• Language evolves over time, with new slang and expressions emerging, so the models used might not always keep up with these changes.
Thank you
