
ANALYZING YELP REVIEWS

Insights to Restaurant Business

PREPARED BY:
SAMYUKTHA VENNAPUSA – Z1884937
PAVAN AYYALASOMAYAJULA – Z1876068
TABLE OF CONTENTS:

I EXECUTIVE SUMMARY
II INTRODUCTION
III TOOLS AND PROGRAMMING LANGUAGES
IV VISUALIZATION
V SENTIMENT ANALYSIS
VI TOPIC MODELLING
VII NETWORK ANALYSIS
VIII RECOMMENDATIONS
IX APPENDIX
I. EXECUTIVE SUMMARY:

As more consumers choose to dine out or take prepared food home, the number of food-service
operations has skyrocketed, yet there is still room in the market for a new food-service business.
Starting a restaurant demands a great deal of work and analysis. The main idea of this project is to
provide insights to owners who want to set up a restaurant business in the state of Arizona. For this
we used the Yelp dataset, a large resource of information about business reviews and user data.

The dataset consists of different business categories, their reviews, the ratings provided by
customers, and information about each restaurant's location. It also indicates whether each business
is open or closed.

First, we filtered the data to restaurants only and then to the two cities that have the most
restaurant businesses. We then performed sentiment analysis for the closed and open businesses
separately to see how the reviews affected them. We performed topic modelling to learn which topics
customers discuss in their positive and negative reviews. Finally, we analyzed the customers whose
reviews are found useful and performed network analysis to identify the most influential people.

II. INTRODUCTION:

Yelp is a local business directory service and review site with social networking features. It allows
users to rate and review businesses. It gives business owners an opportunity to improve their
services and helps users choose the best business among those available.

Here we have chosen the Yelp dataset, which consists of reviews of different business categories in
the state of Arizona (AZ), to provide insights for restaurant businesses.

i) Dataset Description:

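Below is an overview of the dataset that we are using for our analysis. Four sample records are
shown, one field per line; long text and friends values are cut off.

Record 1
id: 1
business_categories: Breakfast & Brunch; Restaurants
business_city: Phoenix
business_name: Morning Glory Cafe
business_open: TRUE
business_review_count: 116
business_stars: 4
business_state: AZ
review_id: fWKvX83p0-ka4JS3dc6E5A
reviewer_average_stars: 3.72
reviewer_funny: 331
reviewer_name: Jason
reviewer_review_count: 376
reviewer_useful: 1034
text: My wife took me here on my birthday for breakfast and it was excellen
user_id: rLtl8ZkDX5vH5nAx9C3q5Q
friends: ilE_yuVTaJzgrPn7tcKNww, J9okscw-w41t6_ELjMOv5w,SotHrRU

Record 2
id: 2
business_categories: Italian; Pizza; Restaurants
business_city: Phoenix
business_name: Spinato's Pizzeria
business_open: FALSE
business_review_count: 102
business_stars: 4
business_state: AZ
review_id: IjZ33sJrzXqU-0X6U8NwyA
reviewer_average_stars: 5
reviewer_funny: 2
reviewer_name: Paul
reviewer_review_count: 2
reviewer_useful: 0
text: I have no idea why some people give bad reviews about this place. It g
user_id: 0a2KyEL0d3Yb1V6aivbIuQ
friends: QFDcBjeIx-QgaIA

Record 3
id: 3
business_categories: Mexican; Restaurants
business_city: Scottsdale
business_name: La Condesa Gourmet Taco Shop
business_open: TRUE
business_review_count: 307
business_stars: 4
business_state: AZ
review_id: riFQ3vxNpP4rWLk_CSri2A
reviewer_average_stars: 3.79
reviewer_funny: 1187
reviewer_name: Monique
reviewer_review_count: 295
reviewer_useful: 1376
text: Drop what you're doing and drive here. After I ate here I had to go bac
user_id: wFweIWhv2fREZV_dYkz_1g
friends: (not shown)

Record 4
id: 4
business_categories: Food; Tea Rooms; Japanese; Restaurants
business_city: Phoenix
business_name: Nobuo At Teeter House
business_open: TRUE
business_review_count: 189
business_stars: 4.5
business_state: AZ
review_id: jJAIXA46pU1swYyRCdfXtQ
reviewer_average_stars: 4.17
reviewer_funny: 0
reviewer_name: Mark
reviewer_review_count: 6
reviewer_useful: 4
text: Nobuo shows his unique talents with everything on the menu. Careful
user_id: sUNkXg8-KFtCMQDV6zRzQg
friends: 1-oZsNLPY4vmCLj24TOc-g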

Some of the important variables that we consider in the analysis are listed below:

business_categories: Consists of different categories like shopping, restaurants, automotive etc.

business_open: It tells us whether a business is open or closed.

business_review_count: Number of reviews given to the business.

business_stars: Average rating of the business.

text: The review written by the customer for the business.

reviewer_useful: Number of useful votes received by the reviewer's reviews.

reviewer_funny: Number of funny votes received by the reviewer's reviews.

reviewer_review_count: Number of reviews written by the reviewer.

III. TOOLS AND PROGRAMMING LANGUAGES:

Sentiment Analysis : Python, Microsoft Azure, Power BI

Topic Modelling : RStudio, Spark

Visualization : Power BI

Word Cloud : Power BI

Network Analysis : Gephi

Filtering the dataset:

The dataset consists of different business categories such as Restaurants, Shopping, Health and so
on. Since we are focusing on restaurants, we kept only the records in the Restaurants category. Using
the word cloud of business cities, we could also see that most restaurant businesses are in Phoenix
and Scottsdale, so we limited the dataset to these two cities for further analysis.

The dataset now consists of restaurants in these two cities, the reviews provided by their customers,
and the related reviewer information.
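
The filtering step can be sketched roughly as follows. This is a minimal illustration in plain Python and not the exact procedure we ran; the function names and the two "Example" rows are hypothetical, and we assume business_categories is a semicolon-separated string as in the sample records.

```python
# Hypothetical sample rows that reuse the dataset's column names.
# The two "Example ..." rows are invented for illustration only.
reviews = [
    {"business_name": "Morning Glory Cafe",
     "business_categories": "Breakfast & Brunch; Restaurants",
     "business_city": "Phoenix"},
    {"business_name": "La Condesa Gourmet Taco Shop",
     "business_categories": "Mexican; Restaurants",
     "business_city": "Scottsdale"},
    {"business_name": "Example Auto Shop",
     "business_categories": "Automotive",
     "business_city": "Phoenix"},
    {"business_name": "Example Diner",
     "business_categories": "Diners; Restaurants",
     "business_city": "Tempe"},
]

TARGET_CITIES = {"Phoenix", "Scottsdale"}


def is_restaurant(row):
    """Return True if 'Restaurants' is one of the row's categories."""
    categories = [c.strip() for c in row["business_categories"].split(";")]
    return "Restaurants" in categories


def filter_restaurants(rows, cities=TARGET_CITIES):
    """Keep only restaurant rows located in the target cities."""
    return [r for r in rows if is_restaurant(r) and r["business_city"] in cities]


filtered = filter_restaurants(reviews)
print([r["business_name"] for r in filtered])
```

Splitting the category string before matching avoids accidentally matching a category that merely contains the word "Restaurants" inside a longer name.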

IV. VISUALIZATION:

From the visualization we could see that most business ratings are 4, and only a few people give a
business a 5 rating. The ratings given to closed businesses are distributed from 1 to 5, with 4
being the most common. We could also see that the percentage of closed businesses is higher for
1-star and 2-star ratings.
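
The closed-business percentage per star rating can be computed along these lines. The chart in this report was produced in Power BI; the sketch below only illustrates the calculation, and the input pairs are hypothetical values, not figures from the dataset.

```python
from collections import defaultdict

# Hypothetical (business_stars, business_open) pairs, one per business.
businesses = [
    (1.0, False), (1.0, False), (1.0, True),
    (2.0, False), (2.0, True),
    (4.0, True), (4.0, True), (4.0, False), (4.0, True),
    (5.0, True),
]


def closed_percentage_by_stars(rows):
    """Return {stars: percentage of businesses with that rating that are closed}."""
    totals = defaultdict(int)
    closed = defaultdict(int)
    for stars, is_open in rows:
        totals[stars] += 1
        if not is_open:
            closed[stars] += 1
    return {s: 100.0 * closed[s] / totals[s] for s in sorted(totals)}


closed_pct = closed_percentage_by_stars(businesses)
for stars, pct in closed_pct.items():
    print(f"{stars} stars: {pct:.1f}% closed")
```

Dividing by the number of businesses at each rating, rather than by the total number of closed businesses, is what makes the low-rated groups comparable with the much larger 4-star group.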

Let us also look at the reviewer information. Here we find that the reviewers named Gabi, Thomas and
Michael are the top 3 people whose reviews are found helpful by other customers. It is therefore
important to consider their reviews and make changes to the food or service accordingly, which helps
in getting more positive reviews. Giving special discounts to these people also helps in promoting
the business. A reviewer who receives many funny votes does not necessarily receive many useful
votes, so the usefulness of a review does not depend on whether the reviewer is funny.

Comparing open and closed businesses across categories, we could observe that a relatively high
percentage of sandwich restaurants are closed. This suggests that most people in Phoenix and
Scottsdale are not fans of this kind of restaurant, so it is not a good idea to start one. We could
also see that Mexican restaurants have more reviews, which suggests that more people prefer Mexican
cuisine. In addition, most of the 5 ratings are given to pizza restaurants when compared with other
categories. We can infer that the pizza restaurants in those cities are performing well.

V. SENTIMENT ANALYSIS:

Sentiment analysis is the interpretation and classification of emotions (positive, negative and
neutral) within text data using text analysis techniques.

We performed sentiment analysis on the review text to find the reviews with positive and negative
sentiment scores about the restaurants. We then used these negative and positive reviews in topic
modelling to learn which topics are discussed in them.
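
As an illustration of how scored reviews can be separated into positive and negative groups, here is a minimal sketch. It assumes each review already has a polarity score between -1 and 1 (as TextBlob produces); the zero threshold, the function name and the sample reviews are assumptions made for this example and are not taken from our analysis.

```python
def label_sentiment(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumption: scores within [-threshold, threshold]
    are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical (review text, polarity) pairs standing in for scored reviews.
scored_reviews = [
    ("The tacos were fresh and the service was great.", 0.65),
    ("Cold food and a dirty table.", -0.55),
    ("We went there on a Tuesday.", 0.0),
]

# Split the reviews into the two groups that feed topic modelling.
positive_reviews = [t for t, p in scored_reviews if label_sentiment(p) == "positive"]
negative_reviews = [t for t, p in scored_reviews if label_sentiment(p) == "negative"]

print(len(positive_reviews), "positive,", len(negative_reviews), "negative")
```

Raising the threshold slightly above zero would move weakly scored reviews into the neutral group, which may give cleaner positive and negative sets for topic modelling.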

VI. TOPIC MODELLING:

A topic model is a type of statistical model for discovering the abstract topics that occur in a
collection of documents. Topic modelling is a frequently used text-mining tool for discovering
hidden semantic structures in a body of text.

Here we performed topic modelling using RStudio and the sparklyr package. The algorithm used is
Latent Dirichlet Allocation (LDA).

Analyzing Negative reviews:

The “business_open” variable tells us whether a restaurant is open or closed. First, we want to
understand why businesses closed by analyzing what customers say in their reviews. We therefore
considered only the data for businesses that are closed.

Since these businesses are closed, we focus on the reviews with a negative sentiment score to see
which factors contributed to their closure. We conducted topic modelling to find the different
topics that customers discuss in these negative reviews.

From the topics shown in figure (v), and from looking at the most negative reviews, we could observe
that most people talk about topics such as food, service and place. Most people complain that the
food is not tasty or not cooked well, that the place is not clean, and that the service offered by
the restaurant is bad. It appears that people do not want to go back to the restaurant once they
have visited. People also discuss foods such as tacos, shrimp, pizza, cheese and Mexican food in
their negative reviews, which suggests that not having a good chef in these types of restaurants
can be one of the reasons for their closure.

Analyzing the negative reviews for the businesses that are open, shown in figure (vi), we could
observe similar topics such as service and the place. Paying close attention to these areas while
establishing a new business can therefore help its growth.

Analyzing positive reviews:

Next, we look at the reviews with a positive sentiment score to analyze the factors that help
increase the sales and rating of a business. We performed sentiment analysis using Python for the
businesses that are open and used these scores to perform topic modelling, in order to find the
topics that customers discuss when giving positive reviews.

From the topics in figure (vii) we could see that people write about topics such as food, cheese,
pizza, service, chicken, salad and the place. Looking at these topics, we could say that customers
tend to give positive reviews when the food is fresh and they really love it. We can also observe
that customers tend to write a review when they really like the place, and that people who give
positive reviews mostly have chicken, salad, pizza or cheese in their orders. Another reason
customers write a review is that they are impressed with the service provided in the restaurant.
These are the main reasons a restaurant gets more positive reviews. This information helps new and
existing businesses take care of these areas and improve their sales.

VII. NETWORK ANALYSIS:

We conducted network analysis with customers as nodes and friendships as edges. We found that the
closeness centrality is almost the same for most people. The betweenness centrality is highest for
Michael, which indicates that he has the most influence in the network. He is also one of the top 3
people whose reviews are found most useful. We can use this information to promote the business:
providing discounts or offers to these people can help bring in more customers and extend the reach
of the business.
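
To make the betweenness measure concrete, the sketch below computes it for a small friendship graph using Brandes' algorithm in plain Python. Our own results came from Gephi; this code is only an illustration, and the reviewer labels R1 to R5 and their friendship edges are hypothetical, not taken from the dataset.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph maps each node to the set of its neighbours.
    Returns unnormalised betweenness scores.
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Every undirected pair was counted from both endpoints, so halve.
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical friendship edges between placeholder reviewers.
edges = [("R1", "R2"), ("R1", "R3"), ("R2", "R3"), ("R3", "R4"), ("R4", "R5")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

centrality = betweenness_centrality(graph)
most_influential = max(centrality, key=centrality.get)
print(most_influential, centrality[most_influential])
```

In this toy network R3 sits on the shortest paths between the R1/R2 group and R4/R5, which is the kind of bridging position a high betweenness score points to.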

VIII. RECOMMENDATIONS:

• This analysis can be used by existing businesses to learn which factors lead customers to give
negative or positive reviews; by working on those factors, businesses can increase their growth.
• This analysis can be used by other business categories to identify the negative and positive
factors of their business and make changes accordingly.

IX. APPENDIX:

i) Word Cloud of Business Categories and Business cities

ii) Visualization:

Percentage of closed businesses for 1 and 2-star ratings

iii) Sentiment Analysis Code - Python

import pandas as pd
from textblob import TextBlob

# Load the review text; the CSV is expected to have a 'text' column.
text = pd.read_csv('Yelp_Text.csv')

# Score each review with TextBlob: polarity is in [-1, 1], subjectivity in [0, 1].
rows = []
for sentence in text.itertuples():
    blob = TextBlob(sentence.text)
    rows.append({'doc': sentence.text,
                 'polarity': blob.sentiment.polarity,
                 'score': blob.sentiment.subjectivity})  # 'score' holds the subjectivity

# Build the result frame once; DataFrame.append was removed in pandas 2.0.
savedf = pd.DataFrame(rows, columns=['doc', 'polarity', 'score'])

print(savedf.head())

iv) Topic Modelling:

R programming:

library(sparklyr)
library(dplyr)
library(ggplot2)

# Read the review data
topic_Yelp_review0 <- read.csv("D:/Sam Docs/SAM/OMIS 670/Project/Yelp_Review.csv")

# Using Spark: open the connection
sc <- spark_connect(master = "local", version = "2.3")
# Copy the data to Spark
topic_review0 <- copy_to(sc, topic_Yelp_review0)

# Topics: extend Spark's default stop words with common filler words
stop_words <- ml_default_stop_words(sc) %>%
  c(
    "like", "really", "also", "even", "simply", "good", "ever", "came"
  )

# Fit an LDA model with 4 topics on the review text
lda_model <- ml_lda(topic_review0, ~ text, k = 4, max_iter = 5, min_token_length = 4,
                    stop_words = stop_words, min_df = 5)

# Use the tidy() function to extract the associated betas, which are the
# per-topic-per-word probabilities, from the model
betas <- tidy(lda_model)

# Visualizing the top 10 terms of each topic
betas %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta) %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

v) These are the topics for businesses that are closed and whose reviews are negative

vi) Below are the topics for the negative reviews for open businesses

vii) Below are the topics for positive reviews for all.

viii) Network Analysis:

Below are the top people with highest betweenness centrality

Nodes: Reviewer, Edges: Friends
