
The current issue and full text archive of this journal is available on Emerald Insight at:

www.emeraldinsight.com/1755-4217.htm

Data science for hospitality and tourism
Paulo Rita and Nicole Rita
NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Lisbon, Portugal, and
Cristina Oliveira
Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisbon, Portugal

Abstract
Purpose – This paper aims to embrace the challenge of performing a state-of-the-art scientific literature
analysis in data science for hospitality and tourism. This is important because relatively little contemporary
analysis has been published.
Design/methodology/approach – Data on over 800 publications were collected from the Scopus database
and analyzed by the differing types of publications, evolution of publications across time, top publishers and
outlets, publications per area and per topic, top keywords used, most cited papers and most productive authors.
Findings – Conclusions are drawn and some suggestions are offered regarding topics that are likely to
provide opportunities for future research.
Originality/value – This paper identifies the need for analysis on state-of-the-art academic research
published to-date on the application of methods and techniques relating to data science in hospitality and
tourism.

Keywords Data science, Literature analysis, Hospitality and tourism


Paper type Literature review

1. Introduction
The hospitality and tourism industry is faced with new problems that require mastery of
modern technology to turn data into knowledge to support not only reactive but above all
proactive managerial decision making. Data science provides both the required quantitative
and analytic approaches deemed necessary to tackle continuous challenges posed by the
macro environment and the sector itself.
Data science and business analytics are two emerging global trends that, if combined,
constitute an enabler for identifying opportunities and challenges for tourism. They may
help to show how analytical skills can sustain a new model of data-driven tourism by
improving the tourist experience while at the same time enhancing the capabilities of the
organizations that manage tourism and hospitality, based on significantly more sophisticated
toolboxes for supporting decisions.
Moreover, one of the characteristics intrinsic to any tourist attraction is the existence of
material and immaterial heritage, which translates into a wealth of places, dates and events,
whose potential to create unique experiences for visitors is undeniable. The technologies and
analytical capabilities of today can play a key role in this process. Data science allows the
development of models of analysis that simultaneously promote quality visitation and allow
better monitoring and management of these spaces, adopting metrics that, instead of just
counting the number of visits, objectively evaluate the value created by each tourist who
visits the destination.

Worldwide Hospitality and Tourism Themes, Vol. 10 No. 6, 2018, pp. 717-725.
© Emerald Publishing Limited, 1755-4217. DOI 10.1108/WHATT-07-2018-0050

This relevant and recent area for business and management also requires a review of the
scientific literature that has been published so far, with the purpose of analyzing the
state-of-the-art in academic research and identifying the core pillars of data science for
hospitality and tourism which deserve our utmost attention.
This paper is organized as follows: to begin, a conceptual background on data science
and related terms is provided, followed by an explanation of the methodology used to
perform the required literature analysis. Then, the main results are presented and
conclusions drawn with the perspective of assisting researchers to boost their motivation to
conduct future studies in this exciting and impactful area.

2. Conceptual background
Data science is a set of fundamental principles that support and guide the extraction of
information and knowledge from data, being intertwined with other important concepts
such as big data and data-driven decision making (Provost and Fawcett, 2013). “The
goal of data science is to improve decision making through the analysis of data.”
(Kelleher and Tierney, 2018). The digital revolution has generated incredible amounts
of data, i.e. big data, that due to its massive size and complexity cannot be adequately
processed by traditional software applications (see, for example, recent studies by
Amado et al., 2018, and Canito et al., 2018). Hence, the term “Big Data” has been used to
describe the massive volumes of data analyzed by large organizations (McAfee et al.,
2012).
Related to data science is business intelligence, an umbrella term that combines
architectures, tools, databases, analytical tools, applications, and methodologies. Its major
objective is to enable interactive access (sometimes in real time) to data, to enable
manipulation of data, and to give business managers and analysts the ability to conduct
appropriate analyses. The process is based on the transformation of data to information,
then to decisions, and finally to actions (Sharda et al., 2018).
Business analytics has gained rapid popularity as a managerial paradigm as it is
focused on providing decision makers with the necessary information and knowledge. It
considers three main categories: descriptive, predictive and prescriptive. Whereas
descriptive analytics answers the question of “what happened/is happening?”, predictive
analytics is focused on “what will happen and why?”, and prescriptive analytics
recommends the best course of action for a given situation (Delen and Demirkan, 2013).
Data visualization is associated with visual analytics, “combining automated analysis
techniques with interactive visualizations for an effective understanding, reasoning and
decision making on the basis of very large and complex data sets” (Keim et al., 2008), and
it is also usually related to business analytics.
Data, text and web mining are commonly used to perform predictive analysis (the
importance of prediction/forecasting is illustrated in Moro and Rita, 2016). Data mining is
defined by Witten et al. (2016) as the automatic or semiautomatic process of discovering
patterns in data (see, for instance, research by Moro et al., 2016, and Moro et al., 2017a). Text
mining is the semi-automated process of extracting patterns (useful information and
knowledge) from large amounts of unstructured data sources (Cortez et al., 2018; Moro et al.,
2017b; Santos et al., 2018). Web mining is the process of discovering intrinsic relationships from
web data, which are expressed in the form of textual, linkage or usage information (Sharda
et al., 2018).
Finally, machine learning is concerned with the question of how to construct computer
programs that automatically improve with experience while deep learning deals specifically
with the use of neural networks to improve things like speech recognition, computer vision,
and natural language processing (Michalski et al., 2013).

3. Method
The Scopus database from Elsevier was selected to perform the search as it is considered the
largest abstract and citation database of peer-reviewed literature, accounting for over 1.4
billion cited references. Thirty-four search queries were conducted on the title, abstract and
keywords of the papers indexed in Scopus using this structure:
TITLE-ABS-KEY (“topic” AND “area”). Thus, each query combined a topic and an area,
according to the following list (17 topics × 2 areas = 34 queries):
Topics (17): data science; big data; data visualization; business intelligence; computational
intelligence; business analytics; descriptive analytics; predictive analytics; prescriptive
analytics; data analytics; text analytics; web analytics; data mining; text mining; web mining;
machine learning; deep learning.
Areas (2): tourism; hospitality.
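
As an illustration only, the 34 query strings described above could be generated
programmatically. The sketch below is not the procedure reported by the authors: the names
TOPICS, AREAS and build_queries are hypothetical, and straight double quotes are assumed in
place of the typographic quotes shown in the query structure.

```python
from itertools import product

# Topics and areas as listed in the Method section.
TOPICS = [
    "data science", "big data", "data visualization", "business intelligence",
    "computational intelligence", "business analytics", "descriptive analytics",
    "predictive analytics", "prescriptive analytics", "data analytics",
    "text analytics", "web analytics", "data mining", "text mining",
    "web mining", "machine learning", "deep learning",
]
AREAS = ["tourism", "hospitality"]


def build_queries(topics, areas):
    """Return one TITLE-ABS-KEY query string per (topic, area) pair."""
    return [
        f'TITLE-ABS-KEY ("{topic}" AND "{area}")'
        for topic, area in product(topics, areas)
    ]


queries = build_queries(TOPICS, AREAS)
print(len(queries))  # 17 topics x 2 areas
print(queries[0])
```

Pairing every topic with every area through a Cartesian product makes the query count follow
directly from the two lists, so adding a topic or an area would extend the search without
further changes.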
A total of 802 papers were found to address at least one of the above-mentioned topic and
area combinations. The collected dataset was analyzed in terms of
publication type. The evolution of publications across time was also investigated.
Furthermore, the most significant publishers and the top outlets publishing papers in those
topics and areas were scrutinized. In addition, the number of publications per area and per
topic was identified. Moreover, the top keywords used and the most cited papers were
studied. Finally, the most productive authors were acknowledged.

4. Results
Regarding the publication type (Table I), most papers (394) appeared in conference
proceedings, followed by journal articles (356). Overall, throughout the past 12 years, the
number of publications in the topics and areas has generally increased, with stronger growth
in their early stages of development and application (Table II), more specifically in 2008
(+145 per cent), and then in three of the past 4 years, namely, 2014 (+84 per cent), 2016
(+74 per cent), and 2017 (+57 per cent) year over year.
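
The year-over-year variations reported in Table II can be recomputed from the yearly paper
counts. The following is a minimal sketch, assuming that the variation is the percentage
change relative to the previous year, rounded to the nearest integer. The year 2018 is left
out because only its first semester was covered, and the names PAPERS_PER_YEAR and
yoy_variation are illustrative.

```python
# Yearly paper counts taken from Table II (full years only).
PAPERS_PER_YEAR = {
    2006: 9, 2007: 11, 2008: 27, 2009: 21, 2010: 30, 2011: 24,
    2012: 32, 2013: 43, 2014: 79, 2015: 74, 2016: 129, 2017: 203,
}


def yoy_variation(counts):
    """Percentage change of each year against the previous one, rounded."""
    years = sorted(counts)
    return {
        year: round(100 * (counts[year] - counts[prev]) / counts[prev])
        for prev, year in zip(years, years[1:])
    }


variation = yoy_variation(PAPERS_PER_YEAR)
for year in sorted(variation, reverse=True):
    print(f"{year}: {variation[year]:+d} per cent")
```

Under that assumption, the years in which the count fell (2009, 2011 and 2015) yield negative
values, while the four years highlighted in the text show the strongest growth.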
The three top publishers (Table III) were the Institute of Electrical and Electronics
Engineers, known as IEEE (112 papers), closely followed by Springer (100) and then Elsevier
(83). The Association for Computing Machinery, known as ACM (24 papers),
Routledge (19) and Emerald (17), although far behind, are also worth mentioning.

Table I. Publication type

Publication type          No. of papers
Conference Proceedings    394
Journals                  356
Book Chapters             15
Others                    37
Total                     802

Table II. Papers published per year

Year                   No. of papers   Variation (%)
2018 (1st Semester)    99
2017                   203             +57
2016                   129             +74
2015                   74              -6
2014                   79              +84
2013                   43              +34
2012                   32              +33
2011                   24              -20
2010                   30              +43
2009                   21              -22
2008                   27              +145
2007                   11              +22
2006                   9
Before 2006            21
Total                  802

Table III. Top publishers

Top publishers    No. of papers
IEEE              112
Springer          100
Elsevier          83
ACM               24
Routledge         19
Emerald           17
IGI Global        15
Sage              11

In the tourism and hospitality management category of the 2017 Scimago journal ranking, the
latest available, the journal that published the most articles in the topics and areas under
evaluation (Table IV) was Tourism Management (#1 in Scimago; 25 papers), followed at a
distance by the International Journal of Hospitality Management (#5; 9 papers), Journal of
Travel and Tourism Marketing (#18; 9 papers), Journal of Travel Research (#2; 7 papers) and
International Journal of Contemporary Hospitality Management (#9; 5 papers). Besides these
five journals, both Expert Systems with Applications (11 papers) and the Journal of
Destination Marketing and Management (7 papers) are worth mentioning, although they are not
classified in the same category as the former journals, but instead in management
information systems and marketing, respectively.
Research in data science has been mainly applied to tourism (almost 94 per cent of all
identified studies) as opposed to just over 6 per cent in the hospitality area (Table V).
Notably, nearly two-thirds of the manuscripts focus on either data mining (38 per cent) or
big data (27.3 per cent). Machine learning, data analytics and data visualization each account
for between 5 per cent and 10 per cent of the total papers. These are followed by
business intelligence and text mining, whereas web mining, deep learning and computational
intelligence are addressed by fewer papers but by at least 1 per cent each. Data science as
an overarching term is still rarely used, as are business, text, web and predictive analytics,
while descriptive and prescriptive analytics remain unexplored, with no papers identified.
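
The shares discussed above follow directly from the topic counts in Table V. As a minimal
sketch, assuming each share is the topic count divided by the 802 retrieved papers and
rounded to one decimal place, they can be recomputed as follows; the names TOPIC_COUNTS and
share are illustrative.

```python
# Paper counts per topic taken from Table V.
TOPIC_COUNTS = {
    "Data Mining": 305, "Big Data": 219, "Machine Learning": 78,
    "Data Analytics": 45, "Data Visualization": 41, "Business Intelligence": 31,
    "Text Mining": 29, "Web Mining": 13, "Deep Learning": 12,
    "Computational Intelligence": 8, "Data Science": 6, "Business Analytics": 6,
    "Text Analytics": 3, "Predictive Analytics": 3, "Web Analytics": 3,
    "Descriptive Analytics": 0, "Prescriptive Analytics": 0,
}
TOTAL = sum(TOPIC_COUNTS.values())


def share(count, total=TOTAL):
    """Share of all retrieved papers, in per cent, to one decimal place."""
    return round(100 * count / total, 1)


shares = {topic: share(n) for topic, n in TOPIC_COUNTS.items()}
combined_top_two = share(TOPIC_COUNTS["Data Mining"] + TOPIC_COUNTS["Big Data"])
unaddressed = [topic for topic, n in TOPIC_COUNTS.items() if n == 0]

print(TOTAL, shares["Data Mining"], shares["Big Data"], combined_top_two)
print(unaddressed)
```

The combined share of data mining and big data is what supports the "nearly two-thirds"
statement, and the list of topics with a zero count identifies the analytics categories for
which no paper was retrieved.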

Table IV. Top outlets

Top outlets                                                           No. of papers
Lect. Notes Comput. Sci.                                              38
Tourism Management                                                    25
ACM Int. Conf. Proc. Ser.                                             22
Adv. Intell. Sys. Comput.                                             16
Int. Arch. Photogramm., Remote Sens. Spat. Inf. Sci. - ISPRS Arch.    13
Boletin Tecnico                                                       12
Commun. Comput. Info. Sci.                                            11
Expert Systems with Applications                                      11
Rev. Fac. Ing.                                                        11
Appl. Mech. Mater.                                                    10
Procedia Comput. Sci.                                                 10
CEUR Workshop Proc.                                                   9
Inf. Technol. Tour.                                                   9
International Journal of Hospitality Management                       9
Journal of Travel and Tourism Marketing                               9
Lect. Notes Electr. Eng.                                              9
Proc. - IEEE Int. Conf. Big Data, Big Data                            8
Journal of Destination Marketing and Management                       7
Journal of Travel Research                                            7
Lect. Notes Bus. Inf. Process.                                        7
IFIP Advances in Information and Communication Technology             6
Cluster Comput.                                                       5
International Journal of Contemporary Hosp. Management                5

Table V. Published papers per area and topic

Area                          No. of papers   (%)
Tourism                       751             93.6
Hospitality                   51              6.4
Total                         802             100.0

Topic                         No. of papers   (%)
Data Mining                   305             38.0
Big Data                      219             27.3
Machine Learning              78              9.7
Data Analytics                45              5.6
Data Visualization            41              5.1
Business Intelligence         31              3.9
Text Mining                   29              3.6
Web Mining                    13              1.6
Deep Learning                 12              1.5
Computational Intelligence    8               1.0
Data Science                  6               0.7
Business Analytics            6               0.7
Text Analytics                3               0.4
Predictive Analytics          3               0.4
Web Analytics                 3               0.4
Descriptive Analytics         0               0.0
Prescriptive Analytics        0               0.0
Total                         802             100.0

The leading keywords (Table VI) that have been used by authors to characterize their
papers were big data (110), data mining (86), and tourism (83). These three keywords are
significant and relevant in framing the focus of published research, as they account for
almost 14, almost 11 and over 10 per cent of all publications, respectively. A second cluster
of frequently used keywords comprised machine learning, text mining, sentiment analysis (see,
for example, Nave et al., 2018) and social media, with the last three often being used in
combination within studies focused on analyzing the sentiments of tourists (see, for instance,
Calheiros et al., 2017) via the application of text mining to their online reviews posted on
social media platforms such as TripAdvisor, Booking and Zomato. Smart tourism, Geographical
Information Systems (GIS) and business intelligence complete the top ten.
With regard to the most cited papers, thirteen articles reached over fifty citations in Scopus,
eleven in the tourism area and two in hospitality (Table VII). The prevalent topics were data
mining and data analytics (5 papers combined), big data and machine learning (3 papers each),
with the remaining being text mining and web analytics (1 article each).
Three manuscripts generated more than two hundred citations each. For example, Choi
et al. (2007) (284 Scopus citations) studied the multiplicity of image representations of a
destination on the Internet through the analysis of the contents of a variety of web
information sources, using text mining, expert judgment and correspondence analysis. Ye
et al. (2009) (215 citations) incorporated three sentiment classification techniques
(specifically the supervised machine learning algorithms of Naïve Bayes, Support Vector
Machines and the character based N-gram model) into the domain of mining reviews from
travel blogs in seven popular destinations in both the US and Europe.
Three further papers achieved just over one hundred Scopus citations. For instance,
Wood et al. (2013) (109 citations) predicted visitation rates at national parks using the
locations of photographs on Flickr and concluded that the crowd-sourced information
could serve as a reliable proxy for empirical visitation rates. Gretzel et al. (2015) (102
citations) focused on the characterization of, and the need for research on, smart tourism, a
concept describing the increasing importance of emerging forms of Information and
Communication Technologies in influencing tourism destinations, their industries and tourists.
Moreover, three other manuscripts follow with around 90 citations each. One of them is
the study by Xiang et al. (2015) (94 citations) that applies a text analytical approach to a
large quantity of consumer reviews (big data) extracted from Expedia.com to deconstruct
hotel guest experience and examine its association with satisfaction ratings. Another paper
is by Kwok and Yu (2013) (87 citations) who examined what types of messages gained the
most clicks of “Like” and comments on Facebook by analyzing the number of likes and
comments regarding almost one thousand Facebook messages from ten restaurant chains
and two independent operators.
The most productive author (Table VIII) in the topics and areas under study is Dr Rob
Law (16 publications), Professor of Technology Management at the Hong Kong Polytechnic
University. The second is Prof Junping Du (10 publications) from the Beijing University of
Posts and Telecommunications, and the third is Anongnart Srivihok (6 publications) from
Kasetsart University in Thailand. Furthermore, there are seven authors with five
publications and twenty-three with four publications.

Table VI. Top keywords used in papers

Top keywords             No. of papers   (%)
Big Data                 110             13.7
Data Mining              86              10.7
Tourism                  83              10.3
Machine Learning         41              5.1
Text Mining              34              4.2
Sentiment Analysis       29              3.6
Social Media             24              3.0
Smart Tourism            19              2.4
GIS                      16              2.0
Business Intelligence    15              1.9

Table VII. Most cited papers

Scopus
citations   Citation                      Topic              Area          Title
284         Choi et al. (2007)            Text Mining        Tourism       Destination image representation on the web: Content analysis of Macau travel related websites
215         Ye et al. (2009)              Machine Learning   Tourism       Sentiment classification of online reviews to travel destinations by supervised machine learning approaches
212         Navigli and Velardi (2005)    Machine Learning   Tourism       Structural semantic interconnections: A knowledge-based approach to word sense disambiguation
109         Wood et al. (2013)            Big Data           Tourism       Using social media to quantify nature-based tourism and recreation
107         Mocanu et al. (2013)          Data Mining        Tourism       The Twitter of Babel: Mapping World Languages through Microblogging Platforms
102         Gretzel et al. (2015)         Big Data           Tourism       Smart tourism: foundations and developments
94          Xiang et al. (2015)           Big Data           Hospitality   What can big data and text analytics tell us about hotel guest experience and satisfaction?
89          Sun et al. (2016)             Data Analytics     Tourism       Internet of Things and Big Data Analytics for Smart and Connected Communities
87          Kwok and Yu (2013)            Machine Learning   Hospitality   Spreading Social Media Messages on Facebook: An Analysis of Restaurant Business-to-Consumer Communications
61          Plaza (2011)                  Web Analytics      Tourism       Google Analytics for measuring website performance
60          Ge et al. (2011)              Data Mining        Tourism       Cost-aware travel tour recommendation
55          Lucas et al. (2013)           Data Mining        Tourism       A hybrid recommendation approach for a tourism system
52          Liao et al. (2010)            Data Mining        Tourism       Mining customer knowledge for tourism new product development and customer relationship management

Table VIII. Most productive authors

N papers   No. of authors   Authors
16         1                Law R.
10         1                Du J.
6          1                Srivihok A.
5          7                Chen Y.; Frikha M.; Gargouri F.; Huang Z.; Lexhagen M.; Vu H.Q.; Zhang J.
4          23               Burguillo J.C.; Cassavia N.; Claveria O.; Du Q.; Höpken W.; Ichifuji Y.; Koo C.;
                            Leal F.; Li G.; Li Y.; Li X.; Ma Y.; Malheiro B.; Masciari E.; Mhiri M.; Monte E.;
                            Rita P.; Saccà D.; Torra S.; Wang S.; Wang Y.; Xiang Z.; Yotsawat W.

5. Conclusion
This paper identified the need for literature analysis on state-of-the-art academic
research published to-date on the application of data science methods and techniques in
hospitality and tourism. Just over 800 papers were retrieved from Scopus indexed
publications and analyzed using a multitude of perspectives. Journal articles account
for around 44 per cent of all those manuscripts and the pace of publication is in the
growth stage of its life cycle.
The fact that IEEE is the leading publisher and that many of the top outlets are from the
technology side emphasizes the importance of interdisciplinary research and the extent to
which those who lead tech-driven innovation have an impact in areas seen as domain
applications, such as hospitality and tourism. Tourism has by far the highest percentage of
publications compared with hospitality, which indicates that there are opportunities for
hospitality industry applications to be further explored.
Due to the exponential growth of data driven by the internet, social media and mobile use, it
is of paramount importance to recognize big data and to foster mining techniques to analyze
it. Hence, it is understandable why 2 of the 17 topics, namely, “big data” and “data mining,”
have clearly been dominating research. Nevertheless, there is room for further research and
for extending data mining to text and web mining studies. A more vertical focus on
descriptive, predictive and prescriptive analytics is still somewhat under the radar and so
offers a further opportunity for research. Within the artificial intelligence arena,
computational intelligence, machine learning and deep learning are streams of research that
are likely to flourish in the near future.

References
Amado, A., Cortez, P., Rita, P. and Moro, S. (2018), “Research trends on big data in marketing: a text
mining and topic modeling based literature analysis”, European Research on Management and
Business Economics, Vol. 24 No. 1, pp. 1-7.
Calheiros, A.C., Moro, S. and Rita, P. (2017), “Sentiment classification of consumer-generated online reviews
using topic modeling”, Journal of Hospitality Marketing and Management, Vol. 26 No. 7, pp. 675-693.
Canito, J., Ramos, P., Moro, S. and Rita, P. (2018), “Unfolding the relations between companies and
technologies under the big data umbrella”, Computers in Industry, Vol. 99, pp. 1-8.
Choi, S., Lehto, X.Y. and Morrison, A.M. (2007), “Destination image representation on the web: content
analysis of Macau travel related websites”, Tourism Management, Vol. 28 No. 1, pp. 118-129.
Cortez, P., Moro, S., Rita, P., King, D. and Hall, J. (2018), “Insights from a text mining survey on expert
systems research from 2000 to 2016”, Expert Systems, Vol. 35 No. 3, p. e12280.
Delen, D. and Demirkan, H. (2013), “Data, information and analytics as services”, Decision Support
Systems, Vol. 55 No. 1, pp. 359-363.
Ge, Y., Liu, Q., Xiong, H., Tuzhilin, A. and Chen, J. (2011), “Cost-aware travel tour recommendation”, In
Proceedings of the 17th ACM SIGKDD international conference on Knowledge Discovery and
Data Mining, ACM, pp. 983-991.
Gretzel, U., Sigala, M., Xiang, Z. and Koo, C. (2015), “Smart tourism: foundations and developments”,
Electronic Markets, Vol. 25 No. 3, pp. 179-188.
Keim, D., Andrienko, G., Fekete, J.D., Görg, C., Kohlhammer, J. and Melançon, G. (2008), “Visual
analytics: Definition, process, and challenges”, In Information Visualization, Springer, Berlin,
Heidelberg, pp. 154-175.
Kelleher, J. and Tierney, B. (2018), Data Science, MIT Press, Cambridge, MA.
Kwok, L. and Yu, B. (2013), “Spreading social media messages on Facebook: an analysis of restaurant
business-to-consumer communications”, Cornell Hospitality Quarterly, Vol. 54 No. 1, pp. 84-94.
Liao, S.H., Chen, Y.J. and Deng, M.Y. (2010), “Mining customer knowledge for tourism new product
development and customer relationship management”, Expert Systems with Applications, Vol. 37
No. 6, pp. 4212-4223.
Lucas, J.P., Luz, N., Moreno, M.N., Anacleto, R., Figueiredo, A.A. and Martins, C. (2013), “A hybrid
recommendation approach for a tourism system”, Expert Systems with Applications, Vol. 40
No. 9, pp. 3532-3550.
McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil, D.J. and Barton, D. (2012), “Big data: the
management revolution”, Harvard Business Review, Vol. 90 No. 10, pp. 60-68.
Michalski, R.S., Carbonell, J.G. and Mitchell, T.M. (Eds.) (2013), Machine Learning: An Artificial
Intelligence Approach, Springer Science & Business Media.
Mocanu, D., Baronchelli, A., Perra, N., Gonçalves, B., Zhang, Q. and Vespignani, A. (2013), “The Twitter
of Babel: mapping world languages through microblogging platforms”, PLoS One, Vol. 8 No. 4,
p. e61981.
Moro, S. and Rita, P. (2016), “Forecasting tomorrow’s tourist”, Worldwide Hospitality and Tourism
Themes, Vol. 8 No. 6, pp. 643-653.
Moro, S., Rita, P. and Coelho, J. (2017a), “Stripping customers’ feedback on hotels through data mining:
the case of Las Vegas Strip”, Tourism Management Perspectives, Vol. 23, pp. 41-52.
Moro, S., Rita, P. and Cortez, P. (2017b), “A text mining approach to analyzing annals literature”,
Annals of Tourism Research, Vol. 66, pp. 208-210.
Moro, S., Rita, P. and Vala, B. (2016), “Predicting social media performance metrics and evaluation of
the impact on brand building: a data mining approach”, Journal of Business Research, Vol. 69
No. 9, pp. 3341-3351.
Nave, M., Rita, P. and Guerreiro, J. (2018), “A decision support system framework to track consumer
sentiments in social media”, Journal of Hospitality Marketing and Management, pp. 1-18.
Navigli, R. and Velardi, P. (2005), “Structural semantic interconnections: a knowledge-based approach
to word sense disambiguation”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 27 No. 7, pp. 1075-1086.
Plaza, B. (2011), “Google analytics for measuring website performance”, Tourism Management, Vol. 32
No. 3, pp. 477-481.
Provost, F. and Fawcett, T. (2013), “Data science and its relationship to big data and data-driven
decision making”, Big Data, Vol. 1 No. 1, pp. 51-59.
Santos, C.L., Rita, P. and Guerreiro, J. (2018), “Improving international attractiveness of higher
education institutions based on text mining and sentiment analysis”, International Journal of
Educational Management, Vol. 32 No. 3, pp. 431-447.
Sharda, R., Delen, D. and Turban, E. (2018), Business Intelligence, Analytics, and Data Science: A
Managerial Perspective, 4th edition, Pearson, London.
Sun, Y., Song, H., Jara, A.J. and Bie, R. (2016), “Internet of things and big data analytics for smart and
connected communities”, IEEE Access, Vol. 4, pp. 766-773.
Witten, I., Frank, E., Hall, M. and Pal, C. (2016), Data Mining, 4th edition, Morgan Kaufmann,
Burlington, Massachusetts.
Wood, S.A., Guerry, A.D., Silver, J.M. and Lacayo, M. (2013), “Using social media to quantify
nature-based tourism and recreation”, Scientific Reports, Vol. 3, p. 2976.
Xiang, Z., Schwartz, Z., Gerdes, J.H. Jr and Uysal, M. (2015), “What can big data and text analytics tell
us about hotel guest experience and satisfaction?”, International Journal of Hospitality
Management, Vol. 44, pp. 120-130.
Ye, Q., Zhang, Z. and Law, R. (2009), “Sentiment classification of online reviews to travel destinations
by supervised machine learning approaches”, Expert Systems with Applications, Vol. 36 No. 3,
pp. 6527-6535.

Corresponding author
Paulo Rita can be contacted at: p.rita@ipdt.pt

