
FEDERAL STATE EDUCATIONAL INSTITUTION OF HIGHER EDUCATION

NATIONAL RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS

Saint Petersburg School of Economics and Management

Department of Management

ANALYSIS OF HOTEL REVIEWS IN THE USA

Team:
Volotskaya Ekaterina
Kobozev Vladimir
Polyanskiy Pavel
Pronina Anastasiia
Tikhomirova Daria
BMN-194
Table of Contents
1. Introduction
2. Literature review
2.1 Relationship between consumer review and hotel performance
2.2 Theoretical framework
3. Methodology
3.1 Data Introduction
3.2 Reviews Analysis
3.2.1 Preprocessing and Data Validation
3.2.2 Filtration
3.2.3 Key Word Extraction
3.2.4 Separation of positive and negative collocations
3.2.5 Extraction of entities from positive and negative key words
3.2.6 Counting the number of occurrences
3.2.7 Dashboard explanation
3.2.8 Relationship between hotel performance and hotel reviews
3.2.9 Conclusion on Methodology
4. Results
4.1 Findings
4.2 Dashboard presentation
4.3 Answer to the research question and hypothesis rejection
5. Discussion
5.1 Limitations
5.2 Further work
6. Conclusion
6.1 Theoretical contribution
6.2 Managerial implication
7. References
8. Appendix

1. Introduction

It is no secret that people all over the world enjoy travelling, and the USA has
always been one of the most popular destinations (ref). It is a recognized world leader in
tourism, serving a growing number of people who decide to go on vacation there every
year. Several reasons explain this: relatively low prices compared with European
countries, many of the world’s most famous attractions, both natural and man-made,
and a frequently unstable political situation in many other regions.

Tourists’ love of discovering new places drives the hotel industry forward, but in the
past few years the industry has gone through a tough period because of COVID-19, its
consequences, and a generally difficult environment. The US was no exception: the number
of tourists seeking rest, vivid emotions, or adventure decreased. COVID-19 affected every
sphere of society, from small to large. Businesses suffered huge losses, went bankrupt,
closed and reopened, trying to survive in such circumstances. Since the beginning of the
pandemic, a new concept has appeared among tourists: “travel-shaming”. The idea is that
people are afraid of being criticized for travelling while the pandemic is still raging
(Huang et al., 2022). Moreover, the lockdowns introduced to slow the spread of the disease
played one of the most important roles in reducing the volume of tourists around the world.
In addition, at the peak of COVID-19 morbidity and mortality, people were afraid to leave
the house and to come into contact with others, so travel was out of the question. As a
result, hotel owners had to deal with the damage to their business caused by the shortage
of visitors. However, once the most stressful period in terms of morbidity had passed,
people slowly began to travel again without a twinge of conscience, since quarantine had
knocked them out of the routine of peaceful life. During the last couple of years, the
accommodation industry faced a sharp decrease in bookings because of the drastically
reduced number of both foreign and domestic tourists in 2020. It also affected the
contribution of the travel and tourism sector to countries’ GDP worldwide. With the
changes brought by new restrictions, security measures, and the economic and political
consequences of the pandemic, it became clear that neither the world nor the hotel
industry would be the same as before.

However, according to recent reports and statistics, the tourist flow to the USA has
been growing since the beginning of 2022. According to statistics for July 2022, the number
of tourists coming to America approximately doubled over the past year. More than 5 million
tourists arrived in August 2022, and the revenues that month amounted to more than
13,700 thousand dollars. It is therefore reasonable to expect that the number of tourists
will keep increasing. Hence, businesses, and the hotel industry in particular, have to be
prepared for the rising number of travelers (2 refs).

Taking all of the above into account, it is crucial for hotel owners to understand how
they can refine their business in order to handle the influx of tourists after the various
prohibitions are lifted. Hotels currently apply various approaches to improve reception,
accommodation, and other services in seasons of high traveler congestion. For instance,
metrics such as the overall review performance score or the recommendation rate can be used
to evaluate the performance of a hotel in general terms. In addition, tools such as
sentiment analysis or word vectorization allow a deeper and more precise analysis. One
existing and very useful method of business improvement is the analysis of tourists’
feedback posted on dedicated web pages, by extracting sentiments from online reviews of
hotels. In this way, hotel owners receive information directly from customers about very
specific problems or their absence. This approach gives businesspeople in the accommodation
industry an accurate idea of which elements of the hotel's structure need improvement.
Summarizing the above, the following research question was developed:

RQ: How do both positive and negative reviews impact hotel performance?

The rest of the paper is organized as follows: section 2 reviews the relevant
literature, covering existing theories and previous approaches to the analysis; section 3
describes our dataset and the proposed method; sections 4 and 5 present the results of the
experiments and their discussion; and section 6 summarizes our contributions, states the
constraints, and outlines directions for future research.

2. Literature review

The literature review covers two thematic subsections: the relationship between
consumer reviews and hotel performance, and a description of the theoretical framework
underlying the core idea of the project.

2.1 Relationship between consumer review and hotel performance


With the development of the internet, booking hotels through various digital
platforms and websites has become a new channel for the hotel business to communicate with
clients and an opportunity to increase the flow of consumers (Lai et al., 2021).
At the same time, Reddy et al. (2022) state that hotel owners should now be more aware
than ever of what their consumers are saying about them online. However, a certain
proportion of hotel general managers prefer more traditional forms of feedback, such
as personal meetings and letters from customers. As research has shown, the rate of
productive improvements is very low in hotels with such a feedback system, since in
practice this format is not relevant for the current generation (Torres, 2013). Many
authors note that it is especially online customer reviews on hotel websites or booking
platforms that have become a valuable resource for both hotel business owners and
consumers, since they reveal a vast amount of information about the hotel, allowing future
customers to decide on their holiday destination and to form their own pre-booking
evaluation of the hotel (Polpinij et al., 2022).
Business travelers, in turn, want to see both complaints and compliments in
electronic comments. According to the authors, this does not mean that they intend to
purchase a hotel room based on both kinds of opinion. Business travelers would probably be
inclined to purchase a hotel room based on positive electronic comments about the services
that interest them, so negative electronic comments will not change their intention to
make a purchase (Memarzadeh et al., 2015). Nevertheless, consumer evaluations can have a
twofold impact and send an ambiguous signal to potential consumers, since understanding
why previous customers like or dislike certain products or services may not be enough for
other customers to make decisions or for owners to improve merchandising (Bien et al,
2022). To resolve this uncertainty, Ye et al. (2022) refined an existing method for
analyzing the content of customer reviews, kNN, in order to reduce the risk of limitations
in predicting individual behavior or perception, and applied the natural language
processing method LDA to online customer reviews, thereby gaining deeper knowledge of the
causal relationships between arguments in reviews.
In general, according to Polpinij et al. (2022), positive comments help the company's
reputation, attract new consumers, and increase sales and profits, but negative reviews
also contain important information, since they point directly to shortcomings in the
service and create conditions for improving the business. However, this can only help if
the guest leaves a detailed review stating the hotel's mistakes. According to Wu et al.
(2022), some experts question the legitimacy of consumer reviews about the quality of
services provided, since the probability of unfair reviews has always existed and there is
no guarantee that a review is not fabricated by a competitor striving to damage the
hotel's reputation. Nevertheless, the conclusions of Bian et al. (2022) show that consumer
reviews can indeed be considered reliable and useful sources for improving hotel service.
The authors note that the reviews revealed shortcomings in the following areas: check-in
and check-out and the quality of customer service; measures to prevent epidemics in the
hotel, such as introducing more security procedures and disinfection in the rooms; and
providing business customers with additional services and adjusting business strategies to
the region. Likewise, a study by Torres et al. (2022), which analyzed feedback from
consumers, experts, and operators to support hotel quality management, showed that
reviews left by consumers online can help hotels increase customer satisfaction ratings
and that, in turn, reviews collected from customers through surveys can potentially help
hoteliers improve their image on various tourist sites. It has also been suggested that
consumer reviews may affect the hotel's profits and traffic to its own website. In
addition, the authors found through correlation analysis that improvement in one category
of consumer review feedback often leads to improvement in other categories, which confirms
that practitioners in the hotel and hospitality business could potentially benefit from a
more comprehensive approach (Torres et al., 2022; Bjorkelund et al., 2012).

2.2 Theoretical framework


Online customer reviews and opinions on social networks have in most cases replaced
traditional sources of information, such as television advertisements or brochures (Akhtar,
2019). A common trend is that this kind of information appears much more plausible, owing
to the growing opinion that advertising companies and news portals increasingly publish
false information (Bjorkelund et al., 2012). This is exactly why even businesses are
beginning to listen to consumers’ opinions, expressed in comments and reviews, when
developing and improving the quality of their services. Kim (2017) used regulatory focus
theory to help hotel marketers understand customer behavior and to predict effectiveness
in such areas of marketing as the adoption of new products and promotion. According to
regulatory focus theory, which assumes that a person pursues goals in order to preserve
personal values and beliefs while avoiding pain (Kim, 2017; Shi & Huang, 2022), consumers
are committed to maintaining their state of comfort and are more motivated to report the
factors that disturb it. In turn, reviews by experienced consumers are regarded as
guidelines that help other customers understand what is necessary for complete
satisfaction. Consequently, a large volume of reviews pointing to shortcomings draws the
attention of the management of an individual business and, as a result, forces the company
to correct them. In addition, expectation theory illustrates that expectations of a
product can affect its subsequent perception, and the disconfirmation of expectations is
an important factor in the content of a review (Michalco et al., 2015). Reviews by
experienced consumers influence potential customers who study the list of available hotels
and may choose a particular hotel on their basis. Thus, a reduced flow of consumers at a
hotel with a greater number of negative reviews will prompt useful changes in the hotel's
service system. These arguments systematize the assumptions about the relationship between
consumer reviews and hotel performance and form the first hypothesis:

H1: The number of positive reviews increases hotel revenue, while the number of negative
reviews decreases it.

This hypothesis is tested in the Methodology part of our report.

3. Methodology

The methodology section is devoted to the specific methods used to analyze hotel reviews
in the USA. It is subdivided into 2 parts:
• Data introduction via Descriptive Analytics – here the reader becomes familiar with the
data, its structure, and its features.
• Reviews analysis via Predictive Analytics – here the reviews are investigated and
transformed into useful insights that hotel owners can apply.

3.1 Data Introduction


For this report we obtained the dataset called Hotel Reviews 1, which is dedicated to
customers’ impressions of 1,000 hotels in the USA. It contains 35,912 observations for the
period from 2002 to 2016 across 19 variables, of which 5 were identified as suitable for
an analysis aimed at testing whether an online review system can be used to extract
valuable information for enhancing a hotel’s performance. These variables are: the state
where the business is located, the hotel name, the review date, the reviewer’s rating of
the hotel, and the text of the review.
Before conducting an in-depth analysis with frameworks for detecting hidden patterns,
it is crucial to extract initial information from the dataset through descriptive and
diagnostic analytics. Typical methods of these allowed us to build 2 visuals that are
required for a basic understanding of the observed trends.

Figure 1 Distribution of the Reviews from 2002 to 2016


The fast spread and constant improvement of the Internet, together with the
advancement of data handling technologies and the quick adoption of mobile devices, have
turned online feedback into an artificially created word-of-mouth system and significantly
contributed to the growth in the number of personal views posted on various online
platforms (Alaimo et al., 2020). This trend is supported by the visual representation of
changes in the number of user opinions that were collected over 15 years and mined for the
preparation of the dataset from Kaggle.com. Despite the general positive tendency that has
occurred owing to technological progress, three milestones stand out.

1
Hotel Reviews. (2019b, June 24). Kaggle. Retrieved October 17, 2022, from
https://www.kaggle.com/datasets/datafiniti/hotel-reviews

The event that first pushed the statistic up was the creation of the Response feature
on discussion forums in 2009, which enabled communication between businesses and consumers
and promoted inter-platform connectivity (Sprague, 2022). Three years later, growth was
stimulated by the new National Travel and Tourism Strategy in the US, which targeted
attracting more than 100 million foreign visitors by 2021, with the hotel industry showing
a remarkable occupancy increase of 62% in the second quarter of 2012 (Raush, 2012) and new
public views emerging in the virtual space. As a result of both the technological and the
regulatory actions mentioned, many review networks evolved into separate social media.
This is the case of TripAdvisor, which is recognized as the travel sphere’s pioneer in
many respects and a longtime leader, and which named user participation as “a means for
generating content” (Alaimo et al., 2020). TripAdvisor’s position in the digital economy
makes it possible to treat its innovative activities as reference points for the entire
sector. Figure 1 is tangible evidence of this statement: in 2014 TripAdvisor moved from
being just a review base to becoming a provider of end-to-end services for users going
through the whole “consumption process, from search destination to actual hotel
reservation, without leaving the platform” (Alaimo et al., 2020). The step had a positive
effect on the frequency of reviews: retaining people and encouraging their participation
in quality assessment turned out to be a great tool for growth, leading to a larger amount
of user-generated content.

Figure 2 The most popular US states among the reviewers

However, for a better understanding of the information we are working with, the
distribution of the reviews should also be illustrated geographically, since it is
necessary to see the whole picture for the USA. While looking for an explanation of the
distribution shown in Figure 2, we found several reasons worth taking into account to
justify that the reviews were not written randomly.
1. Number of hotels
Several US states lead in the number of hospitality businesses operating in a single
region. The main examples are California, Texas, and Florida, with 15,007, 14,121, and
9,034 businesses respectively (IBISWorld, Inc., n.d.). Furthermore, more than 10 states
from the chart are listed among the 50 most famous traveler destinations (The Most Visited
States In The United States, 2021).
2. Level of tourism, hospitality, and infrastructure development
Hahn (2021) concluded that the US has more developed systems supporting tourism and
service quality than other countries worldwide, which also affects the number of potential
visitors and, consequently, the number of reviews they leave.
3. Location and reputation of the hotels
Another success component is the reputational factor, which helps to bring people to a
particular place because they want to see a specific hotel that has frequently been the
subject of reviews in the past. The rich history of the USA has contributed to the
appearance of famous landmarks in many states (Doolin, 2019).
3.2 Reviews Analysis
Once we have introduced the data and obtained several insights from the graphs, it is
time to move on to the reviews themselves. The first step is to filter the reviews and
check the validity of the data. This is an essential part, as we aim to get consistent and
reliable results.
3.2.1 Preprocessing and Data Validation
Simply downloading the data is not enough to start any kind of analysis. Mistakes of
different types are commonly made while parsing the data. Hence, it is necessary to check
whether the observations are correct. The following validation procedures were performed:
• Drop missing values in the reviews – almost 98% of observations have the text of the
review; the missing values that were detected were handled.
• Check the coordinates and their corresponding location – since our dataset covers only
US hotels, it was quite unexpected to find coordinates that correspond to a point in
Venice, Italy. Therefore, the correctness of places was checked with the help of the
geopy 2 library in Python.
• Drop outliers – some reviews are graded on a 10-point scale, which is abnormal because
the description of the data claims a 5-point system. The number of outliers is small,
around 100 of the initial 35,000 reviews, so we drop them.
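
The three checks above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the report: the column names (`text`, `rating`, `latitude`, `longitude`) are assumed, and a rough bounding box stands in for the geopy lookup so that the sketch runs offline.

```python
# Hypothetical column names; the real dataset's headers may differ.
# Rough bounding box for the USA (incl. Alaska and Hawaii); an assumption
# that replaces the geopy reverse-geocoding step used in the report.
US_LAT = (18.0, 72.0)
US_LON = (-180.0, -66.0)


def validate(rows, max_rating=5):
    """Return only the rows that pass the three validation checks."""
    kept = []
    for row in rows:
        text = row.get("text")
        if text is None or not text.strip():
            continue  # missing review text
        rating = row.get("rating")
        if rating is not None and rating > max_rating:
            continue  # graded on a 10-point scale: treated as an outlier
        lat, lon = row.get("latitude"), row.get("longitude")
        if lat is not None and lon is not None:
            in_us = US_LAT[0] <= lat <= US_LAT[1] and US_LON[0] <= lon <= US_LON[1]
            if not in_us:
                continue  # e.g. a point in Venice, Italy
        kept.append(row)
    return kept


sample = [
    {"text": "Great stay, friendly staff", "rating": 5, "latitude": 36.17, "longitude": -115.14},
    {"text": None, "rating": 4, "latitude": 36.17, "longitude": -115.14},
    {"text": "Nice canal view", "rating": 4, "latitude": 45.44, "longitude": 12.33},
    {"text": "Good value", "rating": 9, "latitude": 34.05, "longitude": -118.24},
]
clean = validate(sample)
```

Only the first sample row survives: the second has no text, the third lies outside the bounding box, and the fourth has a rating above 5.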

3.2.2 Filtration
The following filters have to be applied before any analysis is done:
• Keep only hotels with a total number of reviews of at least 100.
• Keep only reviews that are at least 10 words long.
The first filter is required because only a few reviews will not give hotel owners
any reliable information that can be used for service improvement. For such hotels, the
recommendation is to gather more feedback. The threshold of 100 reviews per hotel is our
estimate of the optimal value: in our opinion, for this particular dataset, one can rely
on results obtained from an analysis of >=100 reviews.
The second filter concerns the review text directly. Since we would like not only to
get the sentiment of a review (i.e. whether it is positive or negative) but also to extract
insights helpful for improving the service, the review must be long and informative enough.
Consider the reviews “Everything was fine” or “Thank you, I was happy to stay here”. These
are indeed positive feedback, but they contain no useful information, and nothing can be
extracted from them. Hence, we use only reviews with at least 10 words.
Initially, our data have approximately 35 000 observations. After filtering, about
13 000 reviews remained. One might question such a high dropout rate. In principle, it is
possible to account for all reviews, but, in our opinion, the results from such short
reviews and/or such a low number of reviews will not be useful for hotels.
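
As an illustration, the two filters could be written as follows. This is a sketch under assumptions: the field names `hotel` and `text` are hypothetical, words are counted by splitting on whitespace, and the per-hotel count is taken before the length filter (the report does not state the order).

```python
from collections import Counter

MIN_REVIEWS_PER_HOTEL = 100  # threshold stated in the text
MIN_WORDS_PER_REVIEW = 10    # threshold stated in the text


def filter_reviews(rows, min_reviews=MIN_REVIEWS_PER_HOTEL,
                   min_words=MIN_WORDS_PER_REVIEW):
    """Keep long-enough reviews that belong to well-covered hotels."""
    # Count reviews per hotel on the unfiltered data (assumed order).
    per_hotel = Counter(row["hotel"] for row in rows)
    return [
        row for row in rows
        if per_hotel[row["hotel"]] >= min_reviews
        and len(row["text"].split()) >= min_words
    ]


rows = [
    {"hotel": "A", "text": "Everything was fine"},
    {"hotel": "A", "text": "The staff were attentive and the room was clean and quiet all night"},
    {"hotel": "B", "text": "Lovely pool but the parking signage was confusing and the lobby was loud"},
]
# A small hotel threshold is used here only so the toy data is not empty.
kept = filter_reviews(rows, min_reviews=2)
```

With the toy threshold, hotel B is dropped for having a single review and the short review of hotel A is dropped for having fewer than 10 words.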

2
https://geopy.readthedocs.io/en/stable/

3.2.3 Key Word Extraction
Referring to our Research Question, we would like to detect what useful information
can be extracted from hotel reviews to serve as a base for future improvements. Hence,
it is necessary to find these insights in the feedback. This is done via the MonkeyLearn 3
platform in Python, which turned out to be an extremely helpful tool applicable to many
different Natural Language Processing (NLP) problems. The chosen model is able to
extract key words and key word collocations from the text. Here is an example from the
data. The review “'We had a wonderful, relaxing time. The
staff were completely attentive and accommodating. We had a corner king room with a
kitchen, two patios and a deluxe bathroom. You had to drive into town but they said that
Uber is available if you need it. This was our first visit to Palm Springs and hopefully not our
last! It really... More'” yields the following key word combinations: “attentive
staff, available uber”. As one can see, the tool is able to pick out the things that play
the main role in the sentence.
The key words extracted for each review are added as an additional column to our
dataset, so we now have key words for every observation (review).
Notably, the number of reviews and, hence, of requests to the model is high, so the
process is time- and power-consuming. That is why the mechanism explained below is
applied:
• The data is subdivided into 13 equal samples of around 1000 reviews. The free
version of the MonkeyLearn tool has a limit of 1000 requests per month, which
is why it is impossible to get all key words at once.
• Concurrency, or multi-threading, is used: an approach in which several
procedures are carried out simultaneously. In our case, all samples are
processed at the same time.
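
The batching and multi-threading mechanism can be sketched with Python's standard library. The function `extract_keywords_stub` is a hypothetical placeholder for the remote key word extraction call, which is not reproduced here; only the splitting and the thread pool illustrate the approach described above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def extract_keywords_stub(text):
    """Hypothetical stand-in for the remote extractor: keeps long words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if len(w) > 6]


def process_batch(batch):
    return [extract_keywords_stub(text) for text in batch]


def extract_all(texts, n_batches=13):
    """Process all batches in parallel threads, preserving review order."""
    batches = split_into_batches(texts, n_batches)
    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        results = list(pool.map(process_batch, batches))
    return [keywords for batch in results for keywords in batch]


reviews = ["The staff were attentive.", "Fine.", "Comfortable bed, noisy street"]
keywords_per_review = extract_all(reviews, n_batches=2)
```

Threads suit this step because each request spends most of its time waiting for a network response; `pool.map` returns results in batch order, so the key words stay aligned with their reviews.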
3.2.4 Separation of positive and negative collocations
The next important step is to determine whether the key words extracted above are
positive or negative. In other words, we have to detect that “attentive staff” is a good
feature of the hotel, while “impossible to sleep, very noisy outside” has a negative impact
on clients’ perception. This can be done with sentiment analysis, which identifies the
emotionality of a given text, i.e. the extent to which it is negative or positive.
Our team uses a transformer model from HuggingFace4, a very knowledgeable community
that specializes in NLP. In particular, it offers a variety of pre-built models that were
fitted on millions of observations, which would be impossible to train at home due to the
high computing power required.
The sentiment model takes the key words and returns one of two labels: either
“positive” or “negative”. For our example above, “attentive staff” receives the “positive”
label, while “impossible to sleep, very noisy outside” receives the “negative” one.
After the sentiment algorithm is applied, it is necessary to separate positive key
words from negative ones into 2 different columns. Once this is done, we have the
following structure for text analysis:

Hotel Name             Key words                                            Positive labels   Negative labels
Little Paradise Hotel  Loud noise, comfortable bed, not sufficient signage  Comfortable bed   Loud noise, not sufficient signage
Table 3 Example of positive and negative labels

3
https://monkeylearn.com/
4
https://huggingface.co/
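
The separation into two columns can be sketched as below. The report does not name the specific pre-trained model, so the classifier is passed in as a function: `toy_classifier` is a hypothetical cue-word stand-in that lets the example run offline, and `hf_classifier` shows how a Hugging Face `pipeline("sentiment-analysis")` could be plugged in instead (it requires the `transformers` package and a model download, and is not called here).

```python
def split_by_sentiment(keywords, classify):
    """Split key word phrases into (positive, negative) lists."""
    positive, negative = [], []
    for phrase in keywords:
        label = classify(phrase).lower()
        (positive if label == "positive" else negative).append(phrase)
    return positive, negative


def toy_classifier(phrase):
    """Hypothetical stand-in for a pre-trained model: a few negative cue words."""
    cues = {"not", "noise", "noisy", "dirty", "impossible"}
    return "negative" if cues & set(phrase.lower().split()) else "positive"


def hf_classifier():
    """Optional: wrap a Hugging Face sentiment pipeline (assumed default model)."""
    from transformers import pipeline  # imported lazily; not needed for the toy run
    model = pipeline("sentiment-analysis")
    return lambda phrase: model(phrase)[0]["label"]


keywords = ["loud noise", "comfortable bed", "not sufficient signage"]
positive, negative = split_by_sentiment(keywords, toy_classifier)
```

Passing the classifier as an argument keeps the column-splitting logic independent of whichever model is chosen.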

3.2.5 Extraction of entities from positive and negative key words


The penultimate step before any conclusions are drawn is to extract entities from the
positive and negative labels. Referring to our Research Question once again, we need to
find what exactly can be improved. In the example above, one does not need “comfortable”
in the word combination “comfortable bed”: having this collocation in the “positive”
column is already a sign that this particular part of the service satisfied the customer.
Hence, it is necessary to extract all nouns from both positive and negative key words.
This procedure is done via the TextBlob5 library.
Now, omitting all columns that are not informative for this step, one can look at an
example of the completed pipeline.

Hotel Name             Positive entities   Negative entities
Little Paradise Hotel  bed                 Noise, signage
Table 4 Example of positive and negative entities
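
Noun extraction amounts to keeping the words whose part-of-speech tag marks a noun. The sketch below assumes Penn Treebank tags (noun tags start with `NN`), which is the format TextBlob's `tags` property returns. The example feeds hand-written tags so that it runs without TextBlob's corpora; `tag_with_textblob` is an optional helper that is not called here.

```python
def nouns_from_tags(tagged_words):
    """Keep words whose Penn Treebank tag starts with NN (nouns)."""
    return [word.lower() for word, tag in tagged_words if tag.startswith("NN")]


def tag_with_textblob(phrase):
    """Optional: POS-tag a phrase with TextBlob (needs the package and its corpora)."""
    from textblob import TextBlob  # imported lazily; not needed for the toy run
    return TextBlob(phrase).tags


# Hand-written tags for the running example (JJ = adjective, RB = adverb).
positive_tagged = [("comfortable", "JJ"), ("bed", "NN")]
negative_tagged = [("Loud", "JJ"), ("noise", "NN"),
                   ("not", "RB"), ("sufficient", "JJ"), ("signage", "NN")]

positive_entities = nouns_from_tags(positive_tagged)
negative_entities = nouns_from_tags(negative_tagged)
```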

3.2.6 Counting the number of occurrences


Finally, the last procedure is to count the frequency of each entity. The reason is
simple: which entities are more important for the hotel, noise or signage? How should the
hotel define its improvement strategy? That is why it is vital to rank the entities by
importance, which is done by computing the number of occurrences of each item. In the end,
the following output is created for each hotel:

Hotel Name             Positive side          Negative side
                       Entity     Frequency   Entity     Frequency
Little Paradise Hotel  bed        12          noise      15
                       mattress   8           pool       6
                       sleep      7           signage    4
Table 5 Example of entities and their frequencies
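
Counting and ranking the entities is a direct use of `collections.Counter`. The sketch assumes one list of entities per review for a single hotel; the input data is made up for illustration.

```python
from collections import Counter


def rank_entities(entity_lists):
    """Count entity occurrences across reviews, most frequent first."""
    counts = Counter(entity for entities in entity_lists for entity in entities)
    return counts.most_common()


# Hypothetical negative entities from three reviews of one hotel.
negative_entities_per_review = [["noise", "signage"], ["noise"], ["pool", "noise"]]
ranking = rank_entities(negative_entities_per_review)
```

The same function would be applied separately to the positive and the negative entities of each hotel to obtain the two sides of the table above.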

3.2.7 Dashboard explanation


Creating the analytical algorithm is indeed the last step of any analysis. However,
aggregating the analysis into an interpretable interface can be a bonus for any project.
Hence, as the final result of our methodology, a dashboard representing all key
information for the hotels is created with the help of Tableau6. Screenshots are provided
in the next part of our report, which is dedicated to the results.

3.2.8 Relationship between hotel performance and hotel reviews

In order to investigate the potential influence of both positive and negative reviews, we
run the following OLS regression:

Figure 3 Regression output

5
https://textblob.readthedocs.io/en/dev/
6
https://www.tableau.com/

Revenue is used as the dependent variable, while the independent variables are:


• Total number of positive key words per hotel
• Total number of negative key words per hotel
• Average rating of the hotel
• Rating difference between the hotel and its state average
• ALOS – average length of stay, i.e. the average number of nights spent by guests
• ADR – average daily rate, i.e. room revenue divided by the number of rooms sold. It
shows the average revenue earned per occupied room.

The output shows that the number of positive key words is significant and positively
related to hotel revenue, while the number of negative key words is also significant but
inversely related to revenue. Hence, the more positive key words a hotel receives, the
higher its revenue, and the more negative key words it receives, the lower its revenue.
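
Written out, a regression with these variables could take the following form. This is a sketch of the specification only: the report presents the estimates as a figure, so the linear-in-levels form, the intercept, and the symbol names are assumptions. The index i denotes a hotel.

```latex
\begin{equation}
\mathrm{Revenue}_i = \beta_0
  + \beta_1\,\mathrm{Pos}_i
  + \beta_2\,\mathrm{Neg}_i
  + \beta_3\,\mathrm{Rating}_i
  + \beta_4\,\mathrm{RatingDiff}_i
  + \beta_5\,\mathrm{ALOS}_i
  + \beta_6\,\mathrm{ADR}_i
  + \varepsilon_i
\end{equation}
```

In this notation, Pos and Neg are the total numbers of positive and negative key words per hotel, and H1 corresponds to a positive coefficient on Pos and a negative coefficient on Neg.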

3.2.9 Conclusion on Methodology


Here is a short summary of the steps performed during the reviews analysis:
• The data is filtered to at least 100 reviews per hotel and at least 10 words per review.
• Key words are extracted from reviews.
• Key words are separated into positive and negative ones via sentiment analysis.
• Entities are extracted from both positive and negative key words.
• The frequency of each entity is calculated.
• The dashboard aggregates the necessary information retrieved during the analysis.
• The regression analysis shows that positive and negative key words influence hotel
revenue.

4. Results

This section presents the results and insights obtained by applying the analysis
described in the previous section of our report.

4.1 Findings
While examining the positive and negative entities (once again, by an entity we mean the
object a reviewer writes about, for example, a bed or food), we found the following
regularities:
• The most frequent words are the least useful
Some words predictably appear among the most frequent words for every hotel: “room”,
“hotel”, “staff”, “service”. The reason is simple: people give feedback on their stay at a
hotel, and most of that time is spent in rooms or in restaurants where waiters serve
customers. Hence, the most popular objects of reviews are the words mentioned above. This is
a problem because such words are not informative for hotel owners (at least at the current
depth of analysis). The further work section describes how this can be handled.
• The same objects appear in both positive and negative reviews
Another interesting result is the co-occurrence of the same key words on both the positive
and the negative side of the feedback analysis. For instance, for Plaza Hotel and Casino, a
well-known Las Vegas hotel with a high average rating, “room” is the most frequent word in
both negative and positive feedback. The situation is similar for another frequent word,
“hotel”: it ranks 2nd among positive reviews and 3rd among negative ones. A possible
explanation is that clients have different preferences and therefore different opinions on the
same issue: the same room can be judged differently by, say, a president and a taxi driver.
Hence, a recommendation may be to segment customers (for example, by income, if possible)
and then interpret the results in light of the feedback each segment gives. Unfortunately, our
team was not provided with any information about who the clients are.
• The most informative part is in the middle of the frequency distribution
The two previous conclusions lead to the next one: the most useful insights into what exactly
can be improved or what customers really appreciate are located in the middle of the key
word distribution. The most popular words are so common that their informational value
tends to 0 (mainly because they are generic, like “hotel”). Similarly, the least frequent words
are so rare that there is little sense in considering them. For example, only one piece of
negative feedback with the word “casino” was detected for the Las Vegas hotel mentioned
above. Does it mean that the hotel should carry out an in-depth analysis or consider
immediate improvements? Surely not. Therefore, to find the objects that are really worth
discussing as potential room for improvement, it is necessary to look somewhere in the
middle (our recommendation is to skip the top-3 words). Continuing the Plaza example, this
method yields the following words: “smell”, “water”, “noise”. As one can guess, these words
come from negative collocations. These key words can be considered helpful. Firstly, it is
very unlikely that the words “smell” and “noise” appear in positive reviews. Secondly, these
objects are specific enough for hotel managers to treat them as areas for growth and
improvement. In this particular case, the water quality, unpleasant smells in the building or
rooms, and noise from the casino can be investigated.
• Positively estimated objects are more frequent
The next finding is that, on average, the number of positive key words (taking their
frequency into account) is larger than that of negative ones. In other words, people write
positive feedback a little more often. This is consistent with the fact that the overall average
rating across all reviews is 3.78, which is close to 4, or a “good” grade. Hence, the overall
sentiment of the key words and the ratings assigned point in the same direction.
One more justification is described in the next insight.
• The average length of positive reviews is higher (x2.78)
Further evidence for the larger number of positive key words is the average length of
positive reviews and of the positive key words extracted from them. On average, clients do
not wish to spend time writing negative feedback after an unpleasant visit: in numbers, the
mean positive review is 2.78 times longer than the mean negative one. Meanwhile, tourists
who fully enjoyed their vacation or business trip at a given hotel would probably want to
compliment and recommend the hotel by highlighting all the positive sides of their visit.

• Regression conclusion

It is found that a larger number of positive key words is associated with higher hotel revenue,
while a larger number of negative ones is associated with lower revenue.
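The recommendation from the findings above, to look in the middle of the frequency distribution by skipping the top-3 words and ignoring very rare ones, can be sketched as follows. The function name, the lower cut-off `min_count=2` and the counts are assumptions made for illustration; the report specifies only the top-3 rule.

```python
from collections import Counter


def mid_frequency_entities(counts, skip_top=3, min_count=2):
    """Drop the skip_top most frequent entities and those rarer than min_count."""
    ranked = counts.most_common()   # sorted by descending frequency
    middle = ranked[skip_top:]      # skip generic words such as "room"
    return [(word, n) for word, n in middle if n >= min_count]


# Hypothetical negative-entity counts for one hotel (illustrative numbers only).
negative_counts = Counter({"room": 120, "hotel": 95, "staff": 60,
                           "smell": 14, "water": 11, "noise": 9, "casino": 1})
print(mid_frequency_entities(negative_counts))
```

With these illustrative counts the generic top-3 words and the single "casino" mention are dropped, leaving "smell", "water" and "noise".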

4.2 Dashboard presentation


As the final product of our research, our team creates a dashboard, which can be seen below:

Figure 4 Dashboard

The dashboard has the following structure:


• Drop-down menu where one can select the hotel to analyze
• Key numbers:
   o Average rating of the hotel
   o Difference between the hotel’s average rating and the average hotel rating for its state
   o State of the hotel
   o Number of reviews
• Graphs with the distributions of positive and negative key words

This dashboard can be used by hotel managers in order to analyze key words from
reviews and to find potential objects to be improved and issues to be handled.
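As an illustration of how the key numbers shown in the dashboard could be derived from raw review data, consider the sketch below. The row layout, the function name `key_numbers` and the choice to average per-hotel ratings when computing the state average are assumptions; the dashboard itself is built in Tableau, not in Python.

```python
from collections import defaultdict
from statistics import mean


def key_numbers(rows, hotel):
    """Return the dashboard key numbers for one hotel.

    rows: iterable of (hotel, state, rating) tuples, one per review.
    """
    ratings = defaultdict(list)   # hotel -> list of review ratings
    state_of = {}                 # hotel -> state
    for name, state, rating in rows:
        ratings[name].append(rating)
        state_of[name] = state

    hotel_avg = mean(ratings[hotel])
    state = state_of[hotel]
    # Assumption: the state average is the mean of per-hotel average ratings.
    state_avg = mean(mean(r) for h, r in ratings.items() if state_of[h] == state)
    return {
        "state": state,
        "n_reviews": len(ratings[hotel]),
        "avg_rating": round(hotel_avg, 2),
        "diff_vs_state": round(hotel_avg - state_avg, 2),
    }


# Hypothetical review rows (illustrative numbers only).
rows = [("Hotel A", "NV", 5), ("Hotel A", "NV", 4), ("Hotel A", "NV", 3),
        ("Hotel B", "NV", 3), ("Hotel B", "NV", 3),
        ("Hotel C", "NY", 2)]
print(key_numbers(rows, "Hotel A"))
```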
4.3 Answer to the research question and hypothesis (non)-rejection
Our study aims to identify how reviews can affect hotel performance. Given all the
information above, the answer to the research question is: positive reviews improve hotel
performance (by increasing revenue), whereas negative reviews worsen it (by decreasing
revenue). Regarding H1, we are unable to reject the hypothesis stated at the beginning of our
study.
We now proceed to the discussion of our topic, including its managerial and theoretical
implications.

5. Discussion

Comparing the results of our study with previous ones, we find that our findings are similar
to those of other researchers. Namely:

• Online customer reviews are a valuable resource for hotel owners (Memarzadeh et al.,
2015).
• Negative reviews contain direct indicators of shortcomings in the service and create
conditions for improving the business (Polpinij et al., 2022).
• A systematic approach is effective (Torres et al., 2013; Bjorkelund et al., 2012).

5.1 Limitations
Our approach takes many details and aspects into account. However, several limitations
must be mentioned:
• Sentiment analysis is applied only to key words
The sentiment model is used to estimate the sentiment of the key word collocations, not of
entire reviews. Estimating sentiment on the whole text as well might give researchers
additional findings.
• The most frequent words are not handled
As mentioned in the Results section, the most popular words are uninformative. However,
these words are located at the very top of our visuals.
• Only US hotels
We decided to focus the research on the US hotel segment only. However, customers’
perception, behavior and socio-demographic factors can vary across markets.
• Only English reviews
Similarly, our model does not consider any reviews other than those written in English.

5.2 Further work


The drawbacks of our study are mentioned above. Below are recommendations on how they
can be handled.
• A TF-IDF approach can be used to handle the most frequent words
• Machine learning models can be applied
• A more advanced key word extraction model can be used
• The data can be extended to other countries and languages
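To illustrate the first suggestion, the sketch below applies a plain TF-IDF weighting to per-hotel entity counts, treating each hotel as one "document". This is one possible formulation, assumed here for illustration (unsmoothed IDF, hypothetical counts and function name); it is not part of the analysis carried out in this study.

```python
import math
from collections import Counter


def tf_idf(counts_by_hotel):
    """Weight each entity by its share within a hotel times its rarity across hotels."""
    n_hotels = len(counts_by_hotel)
    # Document frequency: in how many hotels does each entity occur at all?
    doc_freq = Counter()
    for counts in counts_by_hotel.values():
        doc_freq.update(counts.keys())

    scores = {}
    for hotel, counts in counts_by_hotel.items():
        total = sum(counts.values())
        scores[hotel] = {
            word: (n / total) * math.log(n_hotels / doc_freq[word])
            for word, n in counts.items()
        }
    return scores


# Hypothetical negative-entity counts (illustrative numbers only).
counts_by_hotel = {
    "Hotel A": Counter({"room": 120, "smell": 14, "noise": 9}),
    "Hotel B": Counter({"room": 80, "parking": 12}),
    "Hotel C": Counter({"room": 60, "noise": 5, "wifi": 7}),
}
scores = tf_idf(counts_by_hotel)
top = sorted(scores["Hotel A"], key=scores["Hotel A"].get, reverse=True)
print(top)
```

Because "room" occurs for every hotel, its inverse document frequency is log(1) = 0 and it drops to the bottom of the ranking, while hotel-specific entities such as "smell" rise to the top.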

6. Conclusion

Our research examines data containing reviews of hotels in the USA. We search for helpful
information that can then be used to improve the service level of the hotels. To do that,
Natural Language Processing analysis is applied using the Python programming language
and the Tableau visualization tool. The approach is to extract key words from reviews, divide
them into positive and negative ones via sentiment analysis, and then compute their
frequency in order to rank the importance of each word for each hotel. To conclude the
methodology, an interactive dashboard is created so that users can investigate the results for
different hotels.
Analyzing the feedback yields several findings, each connected with a different trend: the
unimportance of the most frequent words, the co-occurrence of the same word on both the
positive and the negative side, and the highest usefulness of words with a medium frequency.
All in all, given the information above, we conclude that it is possible and helpful to analyze
feedback programmatically and systematically.

Finally, we conduct a regression analysis to identify how positive and negative words affect
hotel revenue. The result shows a direct relationship between positive reviews and revenue,
while negative reviews are associated with lower revenue.

6.1 Theoretical contribution


In this paper we have shown one of the available ways to gain deeper awareness of the
insights contained in user feedback, which may be valuable for those concerned with
improving a hotel business. The research not only summarizes previous findings, but also
demonstrates an indicative subset of the methods and techniques available for sentiment
analysis.

6.2 Managerial implication


Our analytical tool can be adopted by hotel managers and owners to analyze customers’
reviews of their hotel. It might be extremely helpful because:
• A further development strategy can be formulated
• Potential drawbacks and shortcomings of the service can be detected
• Positive feedback is also helpful, as it signals that the hotel is doing its job well
• Comparison with competitors becomes possible

7. References

1) Accordino, J., & Fasulo, F. (2017). The Economic Impact of Heritage Tourism in
Virginia. Retrieved October 17, 2022, from https://cura.vcu.edu/media/cura/pdfs/cura-
documents/HeritageTourism_FINALE_02-16-17.pdf
2) Akhtar, N., Sun, J., Akhtar, M.N., Chen, J. (2019). How attitude ambivalence from
conflicting online hotel reviews affects consumers’ behavioural responses: The
moderating role of dialecticism. Journal of Hospitality and Tourism Management, 41, 28-
40.
3) Alaimo, C., Kallinikos, J., & Valderrama-Venegas, E. (2020, January). Platform
Evolution: A Study of TripAdvisor. In Proceedings of the 53rd Hawaii International
Conference on System Sciences.
4) Bian, Y., Ye, R., Zhang, J., Yan, X. (2022). Customer preference identification from hotel
online reviews: A neural network based fine-grained sentiment analysis. Computers and
Industrial Engineering, 172, 108648.
5) Bjorkelund, E., Burnett, T.H., Norvag, K. (2012). A study of opinion mining and
visualization of hotel reviews. ACM International Conference Proceeding Series, 229-
238.
6) Cheaphotels.org. (2022, May 2). The most expensive spring destinations in the United
States. Retrieved October 17, 2022,
from https://www.cheaphotels.org/press/spring22.html
7) Das, B. R., & Rainey, D. V. (2010). Agritourism in the Arkansas delta byways: Assessing
the economic impacts. International Journal of Tourism Research, 12(3), 265-280.
8) DiNapoli, T. P., & Jain, R. (2021, April). The Tourism Industry in New York City. Office
of the New York State Comptroller. Retrieved October 17, 2022,
from https://www.osc.state.ny.us/reports/osdc/tourism-industry-new-york-city
9) Doolin, H. (2019, October 4). The Most Famous Hotel in Every State. House Beautiful.
Retrieved October 19, 2022,
from https://www.housebeautiful.com/lifestyle/g18655109/most-famous-hotel-every-
state/
10) Geetha, M., Singha, P. and Sinha, S. (2017), “Relationship between customer sentiment
and online customer ratings for hotels-an empirical analysis”, Tourism Management, 61,
43-54.
11) Hahn, J. (2021). Country overview: Top five US states for hotel development in Q2 2021.
Retrieved from: https://tophotel.news/country-overview-top-five-us-states-for-hotel-
development-in-q2-2021-infographic/
12) Hennig-Thurau, T., Gwinner, K.P., Walsh, G. and Gremler, D.D. (2004). Electronic
word-of-mouth via consumer-opinion platforms: what motivates consumers to articulate
themselves on the internet? Journal of Interactive Marketing, 18 (1), 38-52.
13) Herzenstein, M., Posavac, S.S. and Brakus, J.J. (2007). Adoption of new and really new
products: the effects of self-regulation systems and risk salience. Journal of Marketing
Research, 44 (2), 251-260.
14) IBISWorld, Inc. (n.d.). IBISWorld - Industry Market Research, Reports, and Statistics.
Copyright © 1999-2019 IBISWorld, Inc. Retrieved September 28, 2022, from
https://www.ibisworld.com/
15) Kim, W.G., Park, S.A. (2017). Social media review rating versus traditional customer
satisfaction: Which one has more incremental predictive power in explaining hotel
performance? International Journal of Contemporary Hospitality Management, 29(2),
784-802.

16) Lai, X., Wang, F., Wang, X. (2021). Asymmetric relationship between customer
sentiment and online hotel ratings: the moderating effects of review characteristics.
International Journal of Contemporary Hospitality Management, 33(6), 2137-2156.
17) Michalco, J., Simonsen, J. G., & Hornbæk, K. (2015). An exploration of the relation
between expectations and user experience. International Journal of Human-Computer
Interaction, 31(9), 603-617.
19) NOAA National Centers for Environmental Information (NCEI). (2022). Billion-Dollar
Weather and Climate Disasters. Retrieved October 17, 2022,
from https://www.ncei.noaa.gov/access/billions/
20) Northern Arizona University. (2011). The Arizona Wine Tourism Industry. Retrieved
October 17, 2022, from https://cottonwoodaz.gov/DocumentCenter/View/702
21) Polpinij, J., Saisangchan, U., Vorakitphan, V., Luaphol, B. (2022). Identifying Significant
Customer Opinion Information of Each Aspect from Hotel Reviews. 2022 19th
International Joint Conference on Computer Science and Software Engineering, JCSSE
2022.
22) Rauch, R. (2012). Lodging Forecast 2013 - A Primer. Retrieved from:
https://www.hospitalitynet.org/opinion/4057789.html
Zacks. (2012). Hotels & Lodging Stock Outlook - April 2013 - Industry Outlook.
Retrieved September 28, 2022 from: https://www.nasdaq.com/articles/hotels-lodging-
stock-outlook-april-2013-industry-outlook-2013-04-09?amp
23) Reddy, Y.C.A.P., Sagar, S.P.P., Kalyan, R.P., Charan, N.S. (2022). Classification of
Hotel Reviews using Machine Learning Techniques. 8th International Conference on
Smart Structures and Systems, ICSSS 2022.
24) Shi, X.C., Huang, X. (2022). Beyond the workday: The effect of daily customer
interpersonal injustice on hotel employee experiences after work and the next day.
Tourism Management, 93, 104571.
25) Sprague, D. J. (2022, September 9). The History of Online Reviews and How They Have
Evolved. Retrieved October 18, 2022, from https://results.shopperapproved.com/blog/the-
history-and-evolution-of-online-reviews
26) The Most Visited States In The United States. (2021, October 29). Vivid Maps. Retrieved
October 19, 2022, from https://vividmaps.com/most-visited-us-states/
27) Torres, E.N., Adler, H., Lehto, X., Behnke, C., Miao, L. (2013). One experience and
multiple reviews: the case of upscale US hotels. Tourism Review, 68(3), 3-20.
28) Tourism in the Natural State Ushers in Economic Growth. (2017, September 19). Default.
Retrieved October 17, 2022, from https://www.arkansasedc.com/news-events/arkansas-
inc-blog/post/activeblogs/2017/09/19/tourism-in-the-natural-state-ushers-in-economic-
growth
29) Wu, D.C., Zhong, S., Qiu, R.T.R., Wu, J. (2022). Are customer reviews just reviews?
Hotel forecasting using sentiment analysis. Tourism Economics, 28(3), 795-816.
30) Xu, X. (2018). Does traveler satisfaction differ in various travel group compositions?
Evidence from online reviews. International Journal of Contemporary Hospitality
Management, 30(3), 1663-1685.
31) Ye, F., Xia, Q., Zhang, M., Zhan, Y., Li, Y. (2022). Harvesting Online Reviews to
Identify the Competitor Set in a Service Business: Evidence From the Hotel Industry.
Journal of Service Research, 25(2), 301-327.

32) Zhang, H., van Berkel, D., Howe, P. D., Miller, Z. D., & Smith, J. W. (2021). Using
social media to measure and map visitation to public lands in Utah. Applied
Geography, 128, 102389.
33) Zhu, L., Lin, Y. and Cheng, M. (2020). Sentiment and guest satisfaction with peer-to-peer
accommodation when are online ratings more trustworthy? International Journal of
Hospitality Management, 86, 102369.

8. Appendix
