
Annals of Tourism Research 103 (2023) 103667


Research article

Tourism forecasting with granular sentiment analysis


Hengyun Li, Huicai Gao ⁎, Haiyan Song
School of Hotel and Tourism Management, The Hong Kong Polytechnic University, Hong Kong

Article info

Article history: Received 4 May 2023; Received in revised form 21 September 2023; Accepted 22 September 2023; Available online 23 October 2023
Handling Editor: Gang Li
Keywords: Deep learning; Multisource Internet big data; Tourism demand forecasting; Fine-grained sentiment analysis; Hybrid feature engineering

Abstract

Generic sentiment calculations cannot fully reflect tourists' preferences, whereas fine-grained sentiment analysis identifies tourists' precise attitudes. This study forecasted visitor arrivals at two tourist attractions in China using Internet data from multiple sources. Empirical results indicate that 1) fine-grained sentiment analysis of online review data can substantially improve tourism demand models' forecasting performance; 2) combining multidimensional sentiment analysis–based online review data with search engine data outperforms search engine data in tourism demand prediction; and 3) fine-grained sentiment analysis–based online review data and search engine data maintain stable predictive power during times of uncertainty.

© 2023 Elsevier Ltd. All rights reserved.

Introduction

Timely demand forecasting is vital to tourism businesses' success. However, predicting tourism demand during crises (e.g., the
COVID-19 pandemic) is challenging. Tourism demand has experienced drastic pandemic-related fluctuations; forecasting methods
based on historical data cannot produce reliable output (Kourentzes et al., 2021). Additionally, crisis-related ambiguities further
complicate prediction (Pinson & Makridakis, 2022). Online reviews are a form of electronic word of mouth, providing valuable
insight in uncertain times. The dynamic nature of these reviews enables monitoring of people's evolving attitudes, opinions,
and preferences, all of which are essential for accurate demand forecasting. Integrating social media data, particularly online re-
views, can uncover information that facilitates tourism demand prediction.
Sentiment analysis has been widely used to explore people's attitudes towards specific destinations or attractions based on on-
line reviews. Different from volume-based information (e.g., review volume), sentiment variables extracted from customer re-
views can capture writers' opinions. Yet most generic sentiment analysis approaches simply categorize tourists' sentiments as
positive, negative, or neutral; these methods are not detailed enough to capture nuanced preferences (Lau, Zhang, & Xu, 2018).
It is also important to understand the aspects to which sentiments are related. Distinct features of an experience, such as a service
or location, can elicit unique sentiments: consumers may view some attributes positively and others negatively. We can obtain a
richer sense of tourists' behavior by linking travelers' aspects of interest with corresponding sentiment polarity. Applying senti-
ment analysis at the aspect level may thus improve customer satisfaction, retention, and demand prediction in tourism.

⁎ Corresponding author.
E-mail addresses: neilhengyun.li@polyu.edu.hk (H. Li), huicai.gao@connect.polyu.hk (H. Gao), haiyan.song@polyu.edu.hk (H. Song).

https://doi.org/10.1016/j.annals.2023.103667
0160-7383/© 2023 Elsevier Ltd. All rights reserved.

On the basis of fine-grained aspect-level sentiment analysis and hybrid feature engineering, we used multisource data
(i.e., official announcement data, search engine data, and online review data) to forecast monthly and weekly tourist arrivals at
two renowned scenic areas in China: Kulangsu Island and Jiuzhaigou Valley. First, we assessed the performance of generic senti-
ment analysis and complemented it with fine-grained sentiment analysis in tourism demand forecasting. The latter analytical
method was proven to be an ideal supplement to generic sentiment analysis; we unearthed insights into both approaches' syn-
ergistic effects in tourism demand forecasting. Then, we compared the predictive capabilities of models using multiple types of
Internet big data to those without such data during COVID-19. Results indicated whether online reviews and search engine
data could consistently provide accurate tourism demand forecasts during crises. Taking it a step further, we extended our investigation to examine the predictive power of multisource Internet big data (i.e., online reviews and search engine data) versus
single-source Internet big data (i.e., search engine data) in forecasting tourism demand amid the pandemic. This investigation
sheds light on the potential enhancement effect of multisource Internet big data for tourism demand prediction during this pe-
riod.
This study makes several contributions. First, our research presents an initial attempt to enhance tourism demand forecasting
accuracy through the complementary effects of generic and fine-grained review sentiments. Most demand forecasting systems
rely on overall sentiment polarity or fine-grained sentiments at the document level, which may not fully capture the intricate nu-
ances of tourists' preferences. We incorporated aspect-based sentiments at the review level to identify factors underlying con-
sumer behavior. Second, we evaluated the resilience of search engine and online review data for tourism demand forecasting
in turbulent times. Our conclusions verified the predictive power of digital footprints and granular customer review sentiments
amid crises.
Furthermore, this study is one of the earliest to predict tourism demand in times of uncertainty using multisource Internet
data that reflect tourists' interests and granular sentiments. Many other studies have measured tourists' attention via search en-
gine data, website traffic data, or generic review sentiments in a demand forecasting system. We incorporated tourists' attention
information, official announcement details, and review-level generic and aspect-based sentiments in one model to predict de-
mand during the pandemic. Findings deepen the understanding of how granular and supplementary multisource Internet data in-
fluence tourism demand forecasting. These results can guide industry practitioners in implementing flexible, yet targeted
strategies based on changing public attitudes towards tourist attractions. Local governments will also be better prepared to respond to unexpected events (e.g., pandemics).

Literature review

Tourism demand forecasting using web traffic or search engine data

Internet big data are helpful when forecasting tourism demand for destinations or businesses (Huang, Zhang, & Ding, 2017).
Attention-based Internet big data (e.g., web traffic and search engine data) have been used to predict such demand. For example,
web traffic data have been obtained from businesses' virtual storefronts for this purpose (Pan & Yang, 2017). Gunter and Önder
(2016) used web traffic data from Google to forecast arrivals in Vienna, Austria; they found that web traffic indicators increased
forecasting accuracy. These data also improved hotel demand forecasting performance. Pan and Yang (2017) incorporated this
type of data into typical time series models to predict room occupancy in Charleston, South Carolina; they ultimately confirmed
that web traffic information enhanced short-term hotel demand forecasting.
Researchers have also used search engine data to forecast demand for destinations, tourist attractions, and hotels. For instance,
Li, Li, Pan, and Law (2021) predicted visitor arrivals to Beijing, China, using feature selection methods based on Google and Baidu
Trends data. The authors found that machine learning methods could identify the most helpful search engine data for tourism de-
mand forecasting. Huang et al. (2017) applied Baidu Trends data to predict tourist arrivals at Beijing's Forbidden City; including
search engine data significantly improved forecasting accuracy relative to conventional models. Pan, Wu, and Song (2012) and
Rivera (2016) used Google Trends data to predict hotel demand in different regions. These studies have reinforced the utility
of search engine data in accurate demand forecasting. Although search engine and web traffic data are helpful for predicting tour-
ism demand, they fail to encompass the entire spectrum of information travelers seek. More nuanced and comprehensive data
sources are needed to gain richer insight into tourists' preferences and behavior.

Tourism demand forecasting using social media data

As a supplement to search engine and web traffic data, social media data provide ample information about tourists' behavior
(Li, Meng, & Zhang, 2022). Cui, Gallino, Moreno, and Zhang (2018) pointed out that social media information influences cus-
tomers' purchase behavior through two mechanisms: 1) the attention effect, referring to people's product awareness; and
2) the endorsement effect, with perceived product quality being based on people's sentiments. Attention and endorsement effects
can affect product sales based on data from numerous platforms, such as forums and online review communities (Li & Wu, 2018).
Social media appears increasingly influential in shaping people's behavior, and its applications in forecasting tourism demand are expanding accordingly.
Based on the literature, Table 1 indicates that data from social networking platforms and online travel agencies are popular in
tourism demand forecasting. Ampountolas and Legg (2021) combined text analysis with Twitter data to enhance hotel demand
prediction. Segmented boosting methods that included social networking information were more accurate than naïve or autoregressive integrated moving average models in predicting hotel occupancy.

Table 1
Review of research using social media data to forecast tourism demand (2017–2022).

Authors (Year) Data Source(s) Variable(s) Sentiment Analysis

Hu, Li, Song, Li, and Law (2022) TripAdvisor Review rating, review volume, review text ✓
Andariesta and Wasesa (2022) Google Trends & TripAdvisor Post volume, comment volume
Wu, Zhong, Qiu, and Wu (2022) Ctrip.com & Qunar.com Review rating, review volume, review text ✓
Li, Li, Liu, Zhu, and Wei (2022) Hexun.com News text, news headlines ✓
Chang, Chen, Lai, Lin, and Pai (2021) Booking.com Review text, review rating ✓
Park, Park, and Hu (2021) China Daily & CNN Distribution of topics
Ampountolas and Legg (2021) Twitter Tweet volume, Twitter text ✓
Tian, Yang, Mao, and Tang (2021) WeChat, TikTok, & Weibo Posts and reads, reposts, comments, likes
Önder, Gunter, and Gindl (2020) Facebook Likes
Li, Hu, and Li (2020) Baidu Trends, Ctrip.com, & Qunar.com Review rating, review volume
Önder, Gunter, and Scharl (2019) International news websites News text ✓
Bigné, Oltra, and Andreu (2019) Twitter Retweets, replies, followers, likes
Colladon, Guardabascio, and Innarella (2019) Google Trends & TripAdvisor Post volume, user volume
Gunter et al. (2019) Google Trends & Facebook Likes
Starosta, Budz, and Krutwig (2019) German-language news streams News text ✓
Miah, Vu, Gammack, and McGrath (2017) Flickr Geotags, tags, titles, descriptions

‘Likes’ on Facebook have also been used to pre-
dict tourism demand. Gunter, Önder, and Gindl (2019) incorporated Facebook likes and Google Trends data into a mixed fre-
quency data sampling model and an autoregressive distributed lag model to predict visitor arrivals. Both models outperformed
benchmarks for the Austrian cities of Salzburg and Vienna; the opposite was true for Innsbruck and Graz.
Scholars have employed data from online travel agencies (e.g., TripAdvisor) to forecast tourism demand as well. Variables
from online reviews can increase models' accuracy when predicting destination demand (Hu et al., 2022), tourist attraction de-
mand (Li, Hu, & Li, 2020), and hotel demand (Chang et al., 2021; Wu et al., 2022). Colladon et al. (2019) compared the utility of
TripAdvisor review data and Google Trends data when predicting tourist arrivals to seven European cities. The factor augmented
autoregressive and bridge models, which contained social media variables, outperformed models using Google Trends data.
Li, Hu, and Li (2020) further combined online review data and Baidu Trends data as multisource variables to predict visitors
to Mount Siguniang, China; multisource Internet big data outperformed single-source Internet big data in improving tourism
demand forecasting accuracy. A few studies also integrated data from media platforms to forecast tourism demand. Park et al.
(2021) used news data to forecast tourist arrivals in Hong Kong from Mainland China and the United States; the forecasting
model with news topics performed well during normal and crisis periods. Starosta et al. (2019) and Önder et al. (2019) mined
semantic variables from news to forecast tourist arrivals to European destinations. They noted that sentiment variables from
the news facilitated tourism demand prediction.

Tourism demand forecasting during COVID-19

The COVID-19 pandemic has severely disrupted the tourism industry, sparking academic interest in demand forecasting under
these conditions. Most research has featured the judgmental and econometric combination method (Zhang, Song, Wen, & Liu,
2021), scenario-based judgmental approach (Kourentzes et al., 2021; Liu, Vici, Ramos, Giannoni, & Blake, 2021; Qiu et al.,
2021), artificial intelligence approach (Fotiadis, Polyzos, & Huan, 2021), and expert judgment–based probabilistic approach
(Athanasopoulos, Hyndman, Kourentzes, & O'Hara-Wild, 2023). Most variables used to predict tourism demand concern economic factors, such as gross domestic product and purchasing power parity (Kourentzes et al., 2021; Qiu et al., 2021).
Pandemic-related factors, such as the number of COVID-19 cases, have also been considered in tourism demand forecasting
(Fotiadis et al., 2021; Liu et al., 2021; Yang, Fan, Jiang, & Liu, 2022). Fotiadis et al. (2021) predicted international tourist arrivals
amid COVID-19 using data from the severe acute respiratory syndrome pandemic. The authors emphasized the importance of
comparing training sets when applying machine learning methods because each set might generate varying results. Zhang
et al. (2021) employed a mixed approach with expert opinions to forecast tourist arrivals ex-ante in Hong Kong during COVID-
19. Expert opinions were deemed crucial for accurate demand prediction. Variables taken from Internet big data, including search
engine and geotagged Twitter data, have also been adopted to forecast tourism demand amid the pandemic. Yang et al. (2022)
used Google Trends data to predict tourism demand across 74 countries in 2020; these data were useful in less than half of
the sample and were associated with pandemic severity. Overall, this stream of literature has informed industry recovery, adap-
tation, and resilience in times of uncertainty.

Sentiment analysis of social media data in tourism demand forecasting

Sentiments are cognitive reflections of one's social inclinations. In the context of sentiment analysis, customers' sentiments al-
lude to the emotions expressed in user reviews (Geetha, Singha, & Sinha, 2017). These sentiments can have endorsement effects
on products or services; that is, consumers' feelings can promote or hinder others' likelihood of engaging with an offering.


Consumer reviews and social media posts often convey emotions like happiness, frustration, and disappointment (O'Leary, 2011).
Negative review sentiments can discourage purchase behavior whereas positive sentiments can drive it (Dellarocas, Zhang, &
Awad, 2007). Consumer sentiment is known to inform subsequent purchase decisions (Easaw, Garratt, & Heravi, 2005;
Ludvigson, 2004). This predictive ability underscores the key role of such sentiment in forecasting tourism demand.
Many academics use “sentiment” and “emotion” interchangeably because both terms involve subjectivity. However, these ex-
pressions have slightly distinct meanings. Sentiment analysis, also known as opinion mining, classifies text according to the sentiment information it conveys (Alslaity & Orji, 2022; Kaur & Saini, 2014). It reveals people's opinions about products, services, or brands (Liu, 2015). The sentiment here reflects “an attitude, thought, or judgment prompted
by a feeling” (Munezero, Montero, Sutinen, & Pajunen, 2014). Different from sentiment analysis, emotion analysis or detection is
another common task in natural language processing. The emotion here addresses conscious reactions that manifest as strong
feelings (Munezero et al., 2014). Sentiment analysis usually involves two or three types of feelings (e.g., negative, positive, and
neutral) when classifying text, whereas emotion analysis uses a wider range (e.g., joy, fear, anger, surprise, or disgust) (Kaur &
Saini, 2014). Besides, emotions tend to be more sophisticated than sentiments and are therefore more difficult to discern
(Chen et al., 2021).
Several studies have adopted sentiment analysis to analyze social media data when forecasting tourism demand. Most have
extracted the overall sentiment polarity of a sentence or document, a method known as coarse-grained sentiment analysis
(Xiao, Li, Thürer, Liu, & Qu, 2022). Hu et al. (2022) used this approach with online reviews from TripAdvisor to predict visitor ar-
rivals in Hong Kong. Their seasonal autoregressive integrated moving average (SARIMA)–mixed frequency data sampling model
with generic review sentiment outperformed the SARIMA and the seasonal naïve models in most source markets and measure-
ments; however, its forecasting ability was somewhat weaker than those using review ratings. Granular sentiment analysis has
gradually emerged thanks to advances in opinion mining and natural language processing. Instead of merely determining whether
a piece of text is positive, negative, or neutral, fine-grained sentiment analysis can also be used to identify sentiments towards
specific aspects mentioned in text. Ampountolas and Legg (2021) employed data from Twitter and SocialMention to predict hotel de-
mand; segmented boosting methods including fine-grained sentiment variables increased forecasting accuracy. Wu et al. (2022) pre-
dicted hotel demand using data from Ctrip and Qunar. Findings showed that incorporating fine-grained sentiment factors into the
forecasting model improved its accuracy in most contexts. In summary, combining generic and fine-grained sentiment analysis could help the tourism industry improve customer satisfaction, retention, and demand prediction accuracy.

Rationale for the current study

Although search engine and social media data can improve tourism demand forecasting accuracy, several knowledge gaps
persist. First, even as sentiment variables from reviews remain popular predictors of tourism demand, consumers' prefer-
ences are rarely explained in depth (Lau et al., 2018). General ratings or generic review sentiments provide broad evaluations
of customers' attitudes but do not capture intricate preferences for destination aspects. Fine-grained sentiment analysis of-
fers a more detailed and aspect-based understanding of customers' sentiments. This approach complements generic senti-
ment analysis and review ratings by extracting multidimensional demand information, thereby enhancing the accuracy of
tourism forecasting.
Second, it remains to be seen how well search engine and online review data improve tourism demand prediction during
times of crises. COVID-19 has profoundly influenced tourists' decision making. For instance, travelers have engaged in extensive
planning and information searches due to constraints around outdoor recreation during the pandemic (Humagain & Singleton,
2021). Tourists are also now likely to share comments about destinations' health measures to promote safety during trips.
Changes in users' search habits and reviews may reflect shifts in their tourism preferences. It is thus essential to investigate
how search engine data and online review data contribute to tourism demand forecasting during the pandemic.
Third, macro factors or Internet users' attention information have often been used to predict tourism demand during crises.
Few researchers have considered the power of multisource Internet big data, including attention-/volume-based and sentiment-/valence-based variables. If a destination encounters challenges due to high visitor interest, the location will not benefit
from a demand forecasting model that relies solely on attention-based or Internet traffic data (Hu et al., 2022). Complementary
data sources more realistically reflect consumers' behavior and satisfaction (Xiang, Schwartz, Gerdes, & Uysal, 2015). Multisource
Internet big data are useful for enhancing forecasting accuracy in normal periods (e.g., Gunter et al., 2019; Pan & Yang, 2017). Yet
relatively little is known about how multisource Internet big data influence tourism demand prediction during crises.

Methodology

Data collection

Fig. 1 illustrates our framework for forecasting tourist arrivals to Kulangsu Island, a tourist attraction in Fujian province, China.
Kulangsu, also known as Gulangyu Island, was chosen as the focal destination for several reasons. First, it is an AAAAA-level scenic
area in China. China's scenic areas are classified based on five levels: AAAAA, AAAA, AAA, AA, and A. AAAAA denotes the most
important and best-maintained attractions as per the country's Ministry of Culture and Tourism. Second, as a renowned cultural
World Heritage Site, Kulangsu has been featured in numerous studies on tourism demand forecasting (e.g., Hu, Xiao, & Li, 2021; Li,
Ge, Liu, & Zheng, 2020).


Fig. 1. Research framework.


Note: SARIMAX = seasonal autoregressive integrated moving average with exogenous factors; XGBoost = extreme gradient boosting; ETS = exponential
smoothing; SNAÏVE = seasonal naïve; SARIMA = seasonal autoregressive integrated moving average; TBATS = trigonometric Box-Cox ARMA trend seasonal.

Our framework consisted of four steps: data collection, data pre-processing, model specification, and model evaluation and
prediction. We referred to tourist arrival data, official announcement data, search engine data, and online review data. Monthly
tourist arrivals were obtained from the official website of the Kulangsu administrative committee between January 2015 and
July 2021; weekly tourist arrivals were acquired from the Kulangsu development center between July 2016 and February 2021.
Daily official announcement data, search engine data, and online review data were compiled between January 2015 and July 2021.
Daily search engine data were obtained from Baidu, which holds over 70 % of China's search engine market share (Cui,
2019). We followed Law, Li, Fong, and Han's (2019) data collection approach and began by defining seed keywords for “Kulangsu”
in Chinese. The keyword list was then expanded by identifying all possible search queries related to Kulangsu via Baidu Trends.
This way, we ensured that our data encompassed tourists' interests. A Python script was next used to acquire daily search engine
data for the specified keywords and to assemble the data into monthly and weekly frequencies. Online review data were taken
from Dianping.com, China's largest online review platform with over 506 famous scenic venues (Huang, Chen, Zhou, Zhao, &
Wang, 2016). Our sample contained 20,838 reviews, including the review date, content, ratings, feedback, and likes. Additionally,
we collected 571 announcements from the official website of the Kulangsu development center.

Data pre-processing

Table 2 displays the overall variables of interest. To start, we cleaned all data by 1) eliminating duplicate and irrelevant data
(e.g., meaningless symbols) from review text; 2) addressing outliers and missing values (e.g., replacing missing ‘like’ values in
specific reviews with zero); and 3) standardizing the data (e.g., converting a review date from “2019-12-22 19:11” to a uniform
format, “2019/12/22”).
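A minimal pandas sketch of these three cleaning steps follows; the data frame and column names (review_text, likes, review_date) are illustrative assumptions, not the study's actual schema.

```python
import pandas as pd

reviews = pd.read_csv("kulangsu_reviews.csv")  # hypothetical export of the crawled Dianping reviews

# 1) Eliminate duplicate records and irrelevant content (e.g., meaningless symbols) from review text.
reviews = reviews.drop_duplicates(subset=["review_text", "review_date"])
reviews["review_text"] = reviews["review_text"].str.replace(r"[^\w\u4e00-\u9fff，。！？]", "", regex=True)

# 2) Address outliers and missing values (e.g., replace missing 'like' counts with zero).
reviews["likes"] = reviews["likes"].fillna(0).astype(int)

# 3) Standardize formats (e.g., convert "2019-12-22 19:11" to "2019/12/22").
reviews["review_date"] = pd.to_datetime(reviews["review_date"]).dt.strftime("%Y/%m/%d")
```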
Next, we pre-processed a series of variables and compiled them into a monthly frequency (January 2015–July 2021) and a
weekly frequency (July 2016–February 2021) (see Fig. A in the Appendix). Because we only obtained weekly tourist arrival
data from July 2016 to February 2021, we defined the same time range for other variables' weekly calculations to ensure consis-
tency. During pre-processing, official announcement data were first extracted as external policy indicators that influenced tourist volumes, particularly during the pandemic (e.g., visitor limits in validation assessment).

Table 2
List of variables.

Tourist arrival data: Tourist arrivals
Search engine data: Search engine series
Online review data: Review ratings; Review volume; Review likes; Review feedback; Review length; Generic review sentiment polarity; Fine-grained review sentiment variable series
Official announcement data: Entry (Yes/No); Visitor volume limitation
Holiday data: Holiday (Yes/No)

We manually checked whether and when
an announcement contained a no-entry notice; the corresponding ‘Entry’ variable for the time mentioned in the announcements
was labeled 1 if yes and 0 otherwise. We also extracted holiday information, which reflected tourists' free time and the dates
when they could travel (Bi, Li, Xu, & Li, 2022).
We extracted factors guiding tourists' destination choices (e.g., review volume, ratings) from online review data. Review vol-
ume refers to the amount of feedback a tourist posts on a platform, indicating familiarity with a destination's tourism offerings
(Hu et al., 2022). Review ratings, which portray travelers' experiences with tourism services, reveal preferences or attitudes
(Duan, Gu, & Whinston, 2008). Review likes and feedback are essential elements of customer engagement. They are not random
signals meant to attract clicks; instead, they serve as measures of product quality (Bakhshi, Kanuparthy, & Shamma, 2015). Re-
view length (number of bytes), which is closely related to review helpfulness (Yang et al., 2017), can additionally influence tour-
ists' behavior and visitor arrivals.

Content-based sentiment analysis

We extracted two types of sentiment variables, namely each review's generic sentiment polarity and the sentiment polarity for
different review aspects (i.e., fine-grained sentiment variables). Fig. 2 depicts our calculation process.
Generic sentiment analysis: We used the sentiment knowledge enhanced pre-training algorithm (Tian et al., 2020) to calculate
the generic Chinese review sentiment polarity per user. This method, which the Baidu research team developed, is a pre-training
deep learning algorithm based on emotional knowledge enhancement. The ernie_1.0_skep_large_chit model was used to classify
review sentiment. This model was trained on the Chinese ChnSentiCorp dataset, which contains a large number of online Chinese
reviews. The model's input is the text content of each review; its output is 1 or 0, indicating a review's positive or negative sentiment, respectively. After identifying the sentiment polarity of text, we computed monthly or weekly generic sentiment polar-
ity based on the average sentiment value; a higher value conveys more positive sentiment.
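The aggregation step can be sketched as below. The classify_review helper is a hypothetical stand-in for the pre-trained SKEP classifier (a trivial keyword rule is used purely so the example runs); only the monthly/weekly averaging mirrors the procedure described above.

```python
import pandas as pd

def classify_review(text: str) -> int:
    """Stand-in for the SKEP-based classifier: returns 1 (positive) or 0 (negative)."""
    return 0 if any(word in text for word in ("差", "失望", "糟糕")) else 1

reviews = pd.DataFrame({
    "review_date": pd.to_datetime(["2019/12/22", "2019/12/30", "2020/01/05"]),
    "review_text": ["风景很美", "人太多，体验差", "服务很好"],
})
reviews["polarity"] = reviews["review_text"].apply(classify_review)

# Monthly (or weekly) generic sentiment polarity = average of the per-review labels;
# a higher value conveys more positive sentiment.
monthly_polarity = reviews.set_index("review_date")["polarity"].resample("MS").mean()
weekly_polarity = reviews.set_index("review_date")["polarity"].resample("W-MON").mean()
print(monthly_polarity)
```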
Fine-grained sentiment analysis: Each review's fine-grained sentiment variables were calculated via three steps. First, we ex-
tracted a series of aspect terms with corresponding semantic polarity (positive, neutral, and negative) from the review content.

Fig. 2. Schematic drawing of sentiment analysis.


AipNlp was applied to identify aspect-based semantic variables in reviews. This toolkit is the most popular natural language pro-
cessing method for Chinese opinion mining, developed by Baidu Smart Cloud with 3 million corpus resources. The toolkit can
mine opinions on aspects of Chinese reviews using advanced natural language processing based on a bidirectional long-short-
term memory model in industries including tourism. The input was the review content, and the output was a series of aspect
terms with corresponding sentiment polarities (i.e., 0, 1, and 2 denote negative, neutral, and positive sentiment). Second, after
eliminating meaningless aspect terms (e.g., “a,” “555”), we manually classified aspect terms into categories to reduce the data di-
mensions. For example, the aspect term in the sentence “the beach is so gorgeous” is “beach”; the corresponding aspect category
is “scenery.” In the end, seven aspect categories were mined from online reviews: transportation, business, impression, scenery, ser-
vice, environment, and other (see the category word cloud in Fig. B in the Appendix).
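A sketch of the extraction call (the first step) is shown below, assuming Baidu AI Cloud's baidu-aip Python SDK, whose AipNlp.commentTag endpoint returns aspect terms with a sentiment field coded 0/1/2 as described above. The credential placeholders and the industry type code are assumptions; the SDK documentation should be consulted for the exact options.

```python
from aip import AipNlp  # pip install baidu-aip

# Credentials are placeholders; real values come from a Baidu AI Cloud application.
client = AipNlp("APP_ID", "API_KEY", "SECRET_KEY")

def extract_aspects(review_text: str):
    """Return (aspect term, polarity) pairs, where 0/1/2 denote negative/neutral/positive sentiment."""
    result = client.commentTag(review_text, {"type": 5})  # industry type code for travel is assumed here
    return [(item["prop"], item["sentiment"]) for item in result.get("items", [])]

print(extract_aspects("沙滩太美了，但是门票有点贵"))
```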
Transportation consisted of traffic-related terms such as “station,” “traffic,” and “car.” Business included terms about commercial
activities and transactions, such as “shopping,” “consumption,” “prices,” and “tickets.” Impression contained terms reflecting tour-
ists' overall impressions when visiting the destination, such as “good trip” and “expectations.” Scenery referenced terms related to
tourist attractions' surroundings (e.g., “streets”). Service denoted terms regarding service quality in contexts such as hotels, restau-
rants, and tours (e.g., servers' attitudes). Environment contained terms related to tourist attractions' climate (e.g., weather, wind).
Other covered any tourism-related aspect terms that did not fit into the above categories due to volume limitations (e.g., tourists,
time, or local people).
Third, we calculated the sentiment polarity for aspect categories at monthly and weekly frequencies. The sentiment score of
each aspect per month or week was computed as Score_aspect = (P.count) / (P.count + Neu.count + Neg.count). P.count,
Neu.count, and Neg.count respectively denote the total number of positive, neutral, and negative sentiments per aspect in each
month or week. A higher Score_aspect indicates a stronger positive sentiment. Fig. C in the Appendix presents scatter plots for the variables' statistical analysis.
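A compact sketch of this Score_aspect calculation is given below, assuming a long-format table with one row per extracted aspect (date, aspect category, polarity coded 0/1/2).

```python
import pandas as pd

# One row per extracted aspect term; polarity 0/1/2 = negative/neutral/positive.
aspects = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-03", "2020-01-10", "2020-01-15", "2020-02-02"]),
    "category": ["scenery", "scenery", "business", "scenery"],
    "polarity": [2, 0, 1, 2],
})

def score_aspect(polarities: pd.Series) -> float:
    # Score_aspect = P.count / (P.count + Neu.count + Neg.count)
    return (polarities == 2).sum() / len(polarities)

monthly_scores = (
    aspects.set_index("date")
    .groupby([pd.Grouper(freq="MS"), "category"])["polarity"]
    .apply(score_aspect)
    .unstack("category")  # one column per aspect category, one row per month
)
print(monthly_scores)
```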

Model specification

Models with different input variables were defined and compared to address our research aims. Table 3 lists the progressive
specification of three model series: the Model 1 series without Internet big data, the Model 2 series with search engine data, and
the Model 3 series with multisource Internet big data. The Model 1 series consisted of the following models: Model 1.1 (M1.1),
Model 1.2 (M1.2), Model 1.3 (M1.3), and Model 1.4 (M1.4) used tourist arrival data as input through an exponential smoothing
(ETS), seasonal naïve (SNAÏVE), seasonal autoregressive integrated moving average (SARIMA), and trigonometric Box-Cox ARMA
trend seasonal (TBATS) model, respectively; Model 1.5 (M1.5) further integrated official announcement data and holiday data as
input in the SARIMA with exogenous factors (SARIMAX) model and extreme gradient boosting (XGBoost) model. Based on the
variables used in Model 1.5 (M1.5), the Model 2 series included search engine data as input. Given the variables in Model 2,
the Model 3 series further incorporated online review data into the SARIMAX and XGBoost models. Model 3.1 (M3.1) contained
generic review sentiment data, and Model 3.2 (M3.2) further added fine-grained review sentiment data. We utilized an expanding window modeling approach to assess results' stability and consistency: the window's starting point remains fixed while its upper bound rolls forward as the window size expands. The benefit of this approach over a fixed-length rolling window approach is that the most historical data available at any point in time are used (Ubilava & Helmers, 2013). Our test period covered the last
6 months and 12 weeks; 1-, 2-, and 3-steps-ahead predictions were made.
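The evaluation loop can be sketched as follows for the 1-step-ahead case, using XGBoost on lagged arrivals. The synthetic series, lag construction, and hyperparameters are illustrative assumptions rather than the study's exact configuration.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
y = pd.Series(10_000 + 2_000 * np.sin(np.arange(120) * 2 * np.pi / 12) + rng.normal(0, 500, 120))

def make_lagged(series: pd.Series, n_lags: int = 12) -> pd.DataFrame:
    df = pd.DataFrame({f"lag_{k}": series.shift(k) for k in range(1, n_lags + 1)})
    df["y"] = series
    return df.dropna()

data = make_lagged(y)
test_size = 6  # last 6 months (12 weeks for the weekly series)

errors = []
for t in range(len(data) - test_size, len(data)):
    # Expanding window: the starting point is fixed, the upper bound rolls forward with each origin.
    train = data.iloc[:t]
    model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(train.drop(columns="y"), train["y"])
    pred = model.predict(data.drop(columns="y").iloc[[t]])[0]
    errors.append(abs(pred - data["y"].iloc[t]) / abs(data["y"].iloc[t]))

print(f"1-step-ahead MAPE over the test window: {np.mean(errors):.3f}")
# For 2- and 3-steps-ahead forecasts, the lag features would start at lag_h instead of lag_1.
```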
Four typical time series models served as our benchmarks: ETS, SNAÏVE, SARIMA, and TBATS. We used SARIMAX and XGBoost
as multivariate models to forecast tourist arrivals. XGBoost is a well-known machine learning algorithm developed by Chen and
Guestrin (2016). It is designed to provide accurate predictions by constructing and combining the output of a series of decision
trees. This model has attracted widespread interest due to its superior efficiency, scalability, and robustness. It is also an optimized
gradient tree boosting system that integrates algorithmic innovations (e.g., parallel learning), which can learn and address overfitting issues (Shi, Wong, Li, Palanisamy, & Chai, 2019). The above models are detailed in Chen and Guestrin (2016), Hu et al. (2022), Shumway (2017), Gunter and Önder (2016), Hassani, Silva, Antonakakis, Filis, and Gupta (2017), and Ferreira, Vega-Oliveros, Zhao, Cardoso, and Macau (2020), respectively.

Table 3
Models and variables.

Model 1 series
  M1.1: Tourist arrival data (ETS)
  M1.2: Tourist arrival data (SNAÏVE)
  M1.3: Tourist arrival data (SARIMA)
  M1.4: Tourist arrival data (TBATS)
  M1.5: Tourist arrival, holiday, and announcement data (SARIMAX & XGBoost)
Model 2 series
  M2.1: Tourist arrival, holiday, announcement, and search engine data (SARIMAX)
  M2.2: Tourist arrival, holiday, announcement, and search engine data (XGBoost)
Model 3 series
  M3.1: Tourist arrival, holiday, announcement, search engine, and generic review sentiment data (SARIMAX & XGBoost)
  M3.2: Tourist arrival, holiday, announcement, search engine, generic review sentiment, and fine-grained review sentiment data (SARIMAX & XGBoost)

Note: M2.1 used the SARIMAX model; M2.2 used the XGBoost model. SARIMAX = seasonal autoregressive integrated moving average with exogenous factors;
XGBoost = extreme gradient boosting; ETS = exponential smoothing; SNAÏVE = seasonal naïve; SARIMA = seasonal autoregressive integrated moving average;
TBATS = trigonometric Box-Cox ARMA trend seasonal.

We applied a hybrid feature engineering approach to enhance the models' predictions through two major steps: data dimen-
sion reduction and feature selection. Data dimension reduction was performed prior to model specification. To address the prob-
lem of multicollinearity and reduce data dimensions, we performed correlation analysis by removing closely correlated features.
We specifically referred to the Pearson correlation coefficient, a standard correlation metric developed by Galton (Gillham, 2001).
This coefficient measures the strength of the linear relationship between two variables on a scale of +1 (perfect positive corre-
lation) to −1 (perfect negative correlation), with 0 denoting no correlation (Adler & Parmryd, 2010). For feature selection,
which we conducted during model specification, an exhaustive search technique was used to identify the optimal feature set
and corresponding model. Other methods (e.g., wrapper methods) require a specified learning algorithm to assess feature subsets.
The exhaustive search approach is not bound to a certain algorithm; it can be applied to any learning algorithm that accepts fea-
ture subsets as input. It can also evaluate all possible feature combinations to discern the most effective set for maximum predic-
tive accuracy (Kohavi & John, 1997). The technique thus does not overlook any feature set that may exhibit optimal prediction
performance.
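A condensed sketch of the two steps (correlation-based dimension reduction followed by an exhaustive subset search) is given below. The 0.9 correlation threshold, the linear-regression scorer, and the synthetic data are illustrative assumptions.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(80, 6)), columns=[f"f{i}" for i in range(6)])
X["f5"] = X["f0"] * 0.98 + rng.normal(0, 0.05, 80)  # deliberately collinear feature
y = 3 * X["f0"] - 2 * X["f2"] + rng.normal(0, 0.5, 80)

# Step 1: dimension reduction - drop one feature from each highly correlated (Pearson) pair.
corr = X.corr(method="pearson").abs()
to_drop = {col for i, col in enumerate(corr.columns)
           for prev in corr.columns[:i] if corr.loc[prev, col] > 0.9}
X_reduced = X.drop(columns=sorted(to_drop))

# Step 2: exhaustive search - evaluate every remaining feature subset on a holdout MAPE.
train, test = slice(0, 60), slice(60, 80)
best_subset, best_mape = None, np.inf
for r in range(1, X_reduced.shape[1] + 1):
    for subset in combinations(X_reduced.columns, r):
        model = LinearRegression().fit(X_reduced[list(subset)].iloc[train], y.iloc[train])
        pred = model.predict(X_reduced[list(subset)].iloc[test])
        mape = mean_absolute_percentage_error(y.iloc[test], pred)
        if mape < best_mape:
            best_subset, best_mape = subset, mape

print("Selected features:", best_subset, "holdout MAPE:", round(best_mape, 3))
```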
Lastly, following Li, Hu, and Li (2020), we used the mean absolute percentage error (MAPE), mean absolute error (MAE), mean
square error (MSE), and root mean square error (RMSE) as evaluation criteria to compare the above models' forecasting perfor-
mance. Models' improvement under different conditions was also considered. Taking the enhancement of Model 2.2 (M2.2) com-
pared with Model 2.1 (M2.1) by XGBoost as an example, the improvement in MAPE is IMP = [MAPE(M2.1) − MAPE(M2.2)] / MAPE(M2.1) × 100%.
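These criteria and the improvement ratio follow directly from the forecast errors, as in the short sketch below (the sample figures are made up for illustration).

```python
import numpy as np

def evaluate(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    err = actual - forecast
    return {
        "MAPE": np.mean(np.abs(err / actual)),
        "MAE": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }

actual = [120_000, 95_000, 130_000]
m21 = evaluate(actual, [100_000, 90_000, 150_000])  # e.g., Model 2.1 forecasts
m22 = evaluate(actual, [115_000, 93_000, 138_000])  # e.g., Model 2.2 forecasts

# IMP = [MAPE(M2.1) - MAPE(M2.2)] / MAPE(M2.1) * 100%
imp = (m21["MAPE"] - m22["MAPE"]) / m21["MAPE"] * 100
print(f"MAPE improvement of M2.2 over M2.1: {imp:.1f}%")
```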

Empirical results

Main forecasting results

To carry out model evaluation and prediction during the pandemic, we compared the three groups of models' performance
externally and internally as shown in Fig. 3. From an external perspective, we compared the predictive accuracy of the Model
1 series (without Internet big data) and the Model 2 series (with single-source Internet big data) to determine whether search
engine data remained effective for tourism demand forecasting during the pandemic. We further compared the predictive accu-
racy of the Model 2 series and the Model 3 series (with multisource Internet big data) to identify whether social media data
maintained stable predictive power during this period. We additionally explored whether multisource Internet big data were
more accurate than single-source Internet big data in predicting tourism demand. From an internal perspective, within the
Model 3 series, we compared M3.1 and M3.2 overall to ascertain whether models containing both generic and fine-grained review
sentiment data yielded better tourism demand predictions than models with generic review sentiment data.
Table 4 provides a monthly and weekly performance comparison of the Model 1 series and the Model 2 series for 1-, 2-, and 3-
steps-ahead forecasts. The majority of the Model 2 series, when taking search engine data as input, outperformed the Model 1
series in all evaluation metrics and forecasts. Table 5 compares the performance of the Model 2 series and the Model 3 series.
The predictive power of M3.1 models including generic review sentiment data and search engine data was unstable; these models
were superior to the Model 2 series in monthly prediction, but not weekly prediction, on all evaluation metrics. The M3.2 models
showed consistently higher accuracy than the Model 2 series in 1- to 3-steps-ahead forecasts on all evaluation metrics. Regarding
the Model 3 series' internal comparison, the models with both generic and fine-grained review sentiment data (M3.2) exhibited
better prediction performance than models using generic review sentiment data (M3.1) across most evaluation metrics. We used
the Diebold–Mariano test to compare the above models' predictive accuracy (see Tables A and B in the Appendix). The results
corroborated our stated conclusions.
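For completeness, the sketch below implements a basic Diebold–Mariano statistic for squared-error loss with the usual normal approximation. The study's exact settings (loss function, horizon handling, small-sample corrections) are not reported here, so this is a simplified illustration rather than the authors' implementation.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, h=1):
    """Simplified DM test on squared-error loss; a negative statistic favours the first model."""
    d = np.asarray(e1, float) ** 2 - np.asarray(e2, float) ** 2
    n = len(d)
    d_bar = d.mean()
    # Long-run variance of d_bar using autocovariances up to lag h-1 (h = forecast horizon).
    gamma = [np.sum((d[k:] - d_bar) * (d[: n - k] - d_bar)) / n for k in range(h)]
    var_d_bar = (gamma[0] + 2 * sum(gamma[1:])) / n
    dm_stat = d_bar / np.sqrt(var_d_bar)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
    return dm_stat, p_value

rng = np.random.default_rng(2)
errors_m32 = rng.normal(0, 1.0, 12)  # e.g., M3.2 forecast errors over a 12-week test period
errors_m22 = rng.normal(0, 1.5, 12)  # e.g., M2.2 forecast errors
print(diebold_mariano(errors_m32, errors_m22, h=1))
```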

Fig. 3. Model comparison framework.


Table 4
Performance comparison at monthly and weekly frequencies.

Monthly Weekly

Step Model MAPE MAE MSE RMSE MAPE MAE MSE RMSE

Model 1 series
M1.1 0.468 173,107 42,493,000,000 206,138 0.863 42,674 2,845,922,241 53,347
M1.2 0.733 377,576 165,290,000,000 406,559 1.764 95,497 11,829,183,083 108,762
M1.3 0.365 163,975 35,619,000,000 188,731 0.632 32,265 1,405,451,095 37,489
M1.4 0.266 114,011 31,254,712,891 176,790 0.277 14,518 324,213,982 18,006
Ahead = 1
M1.5* 0.443 168,219 34,152,000,000 184,802 0.372 28,421 1,383,348,680 37,193
M1.5** 0.433 216,379 64,712,000,000 254,386 0.311 24,826 1,564,989,042 39,560
Model 2 series
M2.1 0.345 137,605 28,675,305,456 169,338 0.372↓ 28,340 1,376,816,277 37,105
M2.2 0.380↓ 193,057↓ 58,944,989,343↓ 242,786↓ 0.260 20,340 1,034,632,009 32,166
Model 1 series
M1.1 0.371 211,381 63,752,000,000 252,492 1.562 66,329 6,038,299,886 77,706
M1.2 0.680 414,065 190,733,000,000 436,730 1.811 92,992 11,527,991,826 107,368
M1.3 0.503 268,758 98,118,000,000 313,238 0.840 41,452 3,104,165,971 55,715
M1.4 0.437 203,103 70,242,718,226 265,033 0.673 36,698 2,142,473,406 46,287
Ahead = 2
M1.5* 0.304 157,383 30,865,000,000 175,684 0.386 28,627 1,455,664,084 38,153
M1.5** 0.433 231,931 69,440,000,000 263,514 0.391 28,864 1,859,136,458 43,118
Model 2 series
M2.1 0.270 134,959 30,086,283,320 173,454 0.385 28,554 1,449,817,301 38,076
M2.2 0.365 216,216↓ 74,358,581,963↓ 272,688↓ 0.293 23,447 1,309,160,598 36,182
Model 1 series
M1.1 0.377 237,787 76,089,000,000 275,843 1.952 75,716 8,824,214,445 93,937
M1.2 0.620 404,681 187,431,000,000 432,933 1.863 90,576 11,308,355,328 106,341
M1.3 0.510 321,472 127,730,000,000 357,394 0.962 52,065 4,128,811,698 64,256
M1.4 0.626 320,738 132,511,378,240 364,021 1.038 48,575 3,656,298,584 60,467
Ahead = 3
M1.5* 0.338 176,081 36,781,000,000 191,784 0.376 27,613 1,467,887,960 38,313
M1.5** 0.389 215,413 69,014,000,000 262,706 0.396 28,619 1,836,647,813 42,856
Model 2 series
M2.1 0.330 167,321 35,800,939,784 189,211 0.375 27,544 1,462,413,051 38,242
M2.2 0.359 208,534 69,441,879,107↓ 263,518↓ 0.282 22,037 1,156,253,029 34,004

Note: * and M2.1 indicate the SARIMAX model; ** and M2.2 indicate the XGBoost model; ↓ denotes that the model's prediction accuracy is lower than at least two
other models in the Model 1 series; MAPE = mean absolute percentage error; MAE = mean absolute error; MSE = mean square error; RMSE = root mean square
error; MAPE is presented to three decimal places, other metrics are displayed as integers.

Validation assessment

To verify our findings for Kulangsu, we conducted a study on Jiuzhaigou Valley, another AAAAA-level tourist attraction in
China. Jiuzhaigou Valley is a renowned natural heritage and tourist area of great human and ecological value (Fang et al.,
2022). Multiple studies have documented this area's allure, highlighting its appeal to both domestic and international tourists
(e.g., Bi, Liu, & Li, 2020; Zhang, Li, Shi, & Law, 2020). We collected 6292 reviews and 249 official announcements between June
2012 and July 2021. Excessively high tourist volumes led to frequent visitor restrictions as announced on Jiuzhaigou Valley's of-
ficial website; as such, “visitor volume limitation” was included in the official announcement variable series. Fig. D in the Appendix presents scatter plots for the variables' statistical analysis.
Table 6 compares the monthly and weekly performance of the Model 1 series and the Model 2 series for 1-, 2-, and 3-steps-
ahead forecasting. Most of the Model 2 series, which used search engine data as input in the SARIMAX and XGBoost models,
outperformed the Model 1 series on all evaluation metrics for 1-, 2-, or 3 steps-ahead forecasting. Table 7 shows a performance
comparison between the Model 2 series and Model 3 series. The predictive capability of M3.1 models did not consistently outper-
form the Model 2 series, especially in terms of monthly prediction. The M3.2 models demonstrated consistent improvements in
accuracy over the Model 2 series across all evaluation metrics for 1- to 3-steps-ahead forecasting. As for an internal comparison
of the Model 3 series, the M3.2 models demonstrated superior performance over models employing generic review sentiment
data (M3.1 models) for most evaluation metrics. The Diebold–Mariano test (see Tables A and B in the Appendix) reinforced
our conclusions.

Conclusions

Concluding remarks

We predicted monthly and weekly visitor arrivals for two major tourist attractions (i.e., Kulangsu Island and Jiuzhaigou Valley)
in China. Models incorporating both generic and fine-grained review sentiment data outperformed models without review senti-
ment data and models with generic review sentiment data. Several conclusions can be drawn from our findings.


Table 5
Performance comparison at monthly and weekly frequencies.

Monthly Weekly

Step Model MAPE IMP MAE IMP MSE IMP RMSE IMP MAPE IMP MAE IMP MSE IMP RMSE IMP

SARIMAX
M2.1 0.345 137,605 28,675,305,456 169,338 0.372 28,340 1,376,816,277 37,105
M3.1 0.319 7.54 % 125,077 9.10 % 25,495,120,613 11.09 % 159,672 5.71 % 0.370 0.54 % 28,373 −0.11 %↓ 1,383,500,215 −0.49 %↓ 37,195 −0.24 %↓
M3.2 0.256 19.75 % 105,489 15.66 % 17,028,953,238 33.21 % 130,495 18.27 % 0.320 13.51 % 25,555 9.93 % 1,250,412,066 9.62 % 35,361 4.93 %
XGBoost
M2.2 0.380 193,057 58,944,989,343 242,786 0.260 20,340 1,034,632,009 32,166
M3.1 0.307 19.21 % 148,760 22.95 % 30,469,170,673 48.31 % 174,554 28.10 % 0.341 −31.15 %↓ 24,531 −20.60 %↓ 1,057,410,187 −2.20 %↓ 32,518 −1.09 %↓
Ahead = 1 M3.2 0.207 32.57 % 105,006 29.41 % 16,135,648,834 47.04 % 127,026 27.23 % 0.216 36.66 % 16,247 33.77 % 630,130,338 40.41 % 25,102 22.80 %
SARIMAX
M2.1 0.270 134,959 30,086,283,320 173,454 0.385 28,554 1,449,817,301 38,076

M3.1 0.239 11.48 % 117,837 12.69 % 22,893,261,006 23.91 % 151,305 12.77 % 0.382 0.78 % 28,491 0.22 % 1,451,044,563 −0.08 %↓ 38,093 −0.04 %↓
M3.2 0.199 16.74 % 108,281 8.11 % 16,299,352,583 28.80 % 127,669 15.62 % 0.335 12.30 % 26,309 7.66 % 1,348,583,879 7.06 % 36,723 3.60 %
XGBoost
M2.2 0.365 216,216 74,358,581,963 272,688 0.293 23,447 1,309,160,598 36,182
M3.1 0.381 −4.38 %↓ 202,261 6.45 % 50,444,955,447 32.16 % 224,599 17.63 % 0.376 −28.33 %↓ 26,710 −13.91 %↓ 1,379,802,584 −5.40 %↓ 37,146 −2.66 %↓
Ahead = 2 M3.2 0.202 46.98 % 106,634 47.28 % 15,467,349,989 69.34 % 124,368 44.63 % 0.238 36.70 % 18,827 29.51 % 813,976,865 41.01 % 28,530 23.19 %
SARIMAX
M2.1 0.330 167,321 35,800,939,784 189,211 0.375 27,544 1,462,413,051 38,242
M3.1 0.282 14.55 % 141,269 15.57 % 29,160,340,416 18.55 % 170,764 9.75 % 0.371 1.07 % 27,412 0.48 % 1,458,023,504 0.30 % 38,184 0.15 %
M3.2 0.226 19.86 % 122,202 13.50 % 20,636,881,139 29.23 % 143,655 15.87 % 0.325 12.40 % 25,677 6.33 % 1,455,248,241 0.19 % 38,148 0.10 %
XGBoost
M2.2 0.359 208,534 69,441,879,107 263,518 0.282 22,037 1,156,253,029 34,004
M3.1 0.234 34.82 % 105,805 49.26 % 22,532,924,095 67.55 % 150,110 43.04 % 0.308 −9.22 %↓ 21,872 0.75 % 1,083,341,765 6.31 % 32,914 3.20 %
Ahead = 3 M3.2 0.134 42.74 % 77,788 26.48 % 12,298,718,793 45.42 % 110,900 26.12 % 0.166 46.10 % 13,985 36.06 % 677,073,357 37.50 % 26,021 20.94 %

Note: SARIMAX = seasonal autoregressive integrated moving average with exogenous factors; XGBoost = extreme gradient boosting; MAPE = mean absolute percentage error; MAE = mean absolute error; MSE = mean
square error; RMSE = root mean square error; MAPE is presented to three decimal places, other metrics are displayed as integers.

Table 6
Performance comparison at monthly and weekly frequencies.

Monthly Weekly

Step Model MAPE MAE MSE RMSE MAPE MAE MSE RMSE

Model 1 series
M1.1 0.911 124,872 16,591,477,587 128,808 0.275 23,319 963,373,021 31,038
M1.2 0.819 204,281 54,573,366,469 233,609 0.231 18,664 585,505,202 24,197
M1.3 0.537 115,924 18,788,607,814 137,072 0.227 19,420 586,770,559 24,223
M1.4 0.745 73,849 9,933,733,144 99,668 0.128 8959 121,512,701 11,023
Ahead = 1 M1.5* 0.277 73,088 11,597,208,692 107,690 0.218 18,392 523,121,706 22,872
M1.5** 0.722 112,122 21,604,489,151 146,985 0.186 15,952 581,117,756 24,106
Model 2 series
M2.1 0.297 75,726 12,600,000,000 112,342 0.224↓ 18,635↓ 542,000,000 23,279
M2.2 0.347 92,085 14,800,000,000↓ 121,836 0.178 13,550 409,000,000 20,219
Model 1 series
M1.1 0.638 151,621 27,189,333,718 164,892 0.327 27,543 1,304,896,840 36,123
M1.2 0.783 237,092 65,164,493,994 255,273 0.320 26,281 951,955,165 30,854
M1.3 0.491 146,865 26,762,050,762 163,591 0.299 25,967 1,102,164,264 33,199
M1.4 0.460 128,411 24,413,885,264 156,249 0.224 15,664 319,848,185 17,884
Ahead = 2 M1.5* 0.278 85,935 13,895,173,495 117,878 0.227 19,140 560,845,262 23,682
M1.5** 0.236 93,933 22,346,464,201 149,487 0.234 18,845 668,898,297 25,863
Model 2 series
M2.1 0.223 77,424 15,000,000,000 122,651 0.234↓ 19,426↓ 581,000,000 24,110
M2.2 0.229 86,468 16,500,000,000 128,542 0.183 15,114 501,000,000 22,387
Model 1 series
M1.1 0.466 163,557 41,354,716,366 203,359 0.374 31,627 1,656,226,874 40,697
M1.2 0.729 260,999 76,452,530,937 276,501 0.455 36,154 1,828,786,216 42,764
M1.3 0.529 187,768 41,284,094,071 203,185 0.399 35,189 2,121,793,636 46,063
M1.4 0.558 159,025 31,110,659,226 176,382 0.273 17,625 371,395,739 19,272
Ahead = 3 M1.5* 0.225 90,121 16,172,724,456 127,172 0.237 20,118 608,185,506 24,661
M1.5** 0.306 122,061 28,218,291,659 167,983 0.266 21,532 768,306,287 27,718
Model 2 series
M2.1 0.201 84,809 17,400,000,000 131,986 0.244 20,344 629,000,000 25,074
M2.2 0.305 116,941 20,200,000,000 142,101 0.183 14,101 341,000,000 18,456

Note: * and M2.1 indicate the SARIMAX model; ** and M2.2 indicate the XGBoost model; ↓ denotes that the model's prediction accuracy is lower than at least two
other models in the Model 1 series; MAPE = mean absolute percentage error; MAE = mean absolute error; MSE = mean square error; RMSE = root mean square
error; MAPE is presented to three decimal places, other metrics are displayed as integers.

First, fine-grained sentiment analysis improves tourism demand forecasting accuracy. Even though generic sentiment analysis
has been used to forecast tourism demand (e.g., Önder et al., 2019), our empirical results indicate that its predictive power varies with the destination and the prediction frequency. Generic sentiment analysis is not granular enough to reflect consumers'
preferences. As a valuable supplement, fine-grained sentiment analysis produces a more nuanced sense of individuals' attitudes by
mining various sentiment aspects. Our findings highlight the complementary value of generic and fine-grained sentiment analysis
in relation to online reviews for tourism demand forecasting.
Second, both fine-grained sentiment analysis–based online review data and search engine data exhibited consistently accurate
predictive power during an uncertain time (i.e., COVID-19). Real-time online reviews provide an ongoing record of consumers'
sentiments. This information can be processed via fine-grained sentiment analysis to capture shifts in consumers' actions amid
crises. For instance, tourists might express relatively stronger sentiments towards attractions' service or food during COVID-19
compared with other periods; such reactions may be tied to pandemic-induced health concerns. The stability of search engine data is partly attributable to the importance of online search behavior in tourists' travel planning and decision-making.
Finally, the combination of search engine data, generic review sentiment data, and fine-grained review sentiment data is superior
to search engine data when forecasting tourism demand. These results underline the need to consider multisource Internet big data
when making predictions during the pandemic. Poor prediction performance is often due to limited information, which hampers a
model's ability to estimate outcomes in diverse scenarios (Phillips, Dowling, Shaffer, Hodas, & Volkova, 2017). Blending generic
and aspect-based review sentiments with search engine data reveals a clearer picture of consumers' behavior and preferences. Our
findings echo Andariesta and Wasesa's (2022) discovery that prediction models combining online travel forums and search engine
predictors outperformed those using a single Internet data source. These outcomes also support Li, Hu, and Li's (2020) conclusion
that incorporating data from multiple platforms into a single forecasting model could enhance forecasting performance.

Implications

First, we made an initial attempt to improve demand forecasting accuracy based on fine-grained sentiment analysis at the re-
view level. This study also represents an early effort to compare the power of models without sentiments, with generic senti-
ments, and with both generic and aspect-based sentiments in forecasting tourist arrivals. Consumers might prioritize aspects of
their experiences differently, with certain aspects carrying more weight than others (Li, Yu, Li, & Gao, 2023).


Table 7
Performance comparison at monthly and weekly frequencies.

Monthly Weekly

Step Model MAPE IMP MAE IMP MSE IMP RMSE IMP MAPE IMP MAE IMP MSE IMP RMSE IMP

Ahead = 1 SARIMAX
M2.1 0.297 75,726 12,600,000,000 112,342 0.224 18,635 542,000,000 23,279
M3.1 0.310 −4.38 %↓ 77,349 −2.14 %↓ 13,469,419,246 −6.90 %↓ 116,058 −3.31 %↓ 0.221 1.34 % 18,381 1.36 % 533,000,000 1.66 % 23,088 0.82 %
M3.2 0.191 38.39 % 61,054 21.07 % 10,800,000,000 19.82 % 103,717 10.63 % 0.197 10.86 % 16,922 7.94 % 506,000,000 5.07 % 22,488 2.60 %
XGBoost
M2.2 0.347 92,085 14,800,000,000 121,836 0.178 13,550 409,000,000 20,219
M3.1 0.390 −12.39 %↓ 105,207 −14.25 %↓ 17,124,233,672 −15.70 %↓ 130,860 −7.41 %↓ 0.180 −1.12 %↓ 14,319 −5.68 %↓ 497,774,191 −21.71 %↓ 22,311 −10.35 %↓
M3.2 0.261 33.08 % 89,627 14.81 % 20,200,000,000 −17.96 % 142,145 −8.62 %↓ 0.111 38.33 % 11,213 21.69 % 30,000,000 93.97 % 17,317 22.39 %
Ahead = 2 SARIMAX
M2.1 0.223 77,424 15,000,000,000 122,651 0.234 19,426 581,000,000 24,110

M3.1 0.242 −8.52 %↓ 81,391 −5.12 %↓ 16,394,146,513 −9.29 %↓ 128,040 −4.39 %↓ 0.231 1.28 % 19,203 1.15 % 573,000,000 1.38 % 23,930 0.75 %
M3.2 0.161 33.47 % 67,956 16.51 % 15,800,000,000 3.62 % 125,577 1.92 % 0.189 18.18 % 16,437 14.40 % 424,000,000 26.00 % 20,591 13.95 %
XGBoost
M2.2 0.229 86,468 16,500,000,000 128,542 0.183 15,114 501,000,000 22,387
M3.1 0.320 −39.74 %↓ 103,680 −19.90 %↓ 21,266,900,293 −28.89 %↓ 145,832 −13.45 %↓ 0.184 −0.55 %↓ 15,118 −0.03 %↓ 426,000,000 14.97 % 20,631 7.85 %
M3.2 0.224 30.00 % 70,490 32.01 % 9,780,000,000 54.01 % 98,919 32.17 % 0.110 40.22 % 8850 41.46 % 110,000,000 74.18 % 10,502 49.10 %
Ahead = 3 SARIMAX
M2.1 0.201 84,809 17,400,000,000 131,986 0.244 20,344 629,000,000 25,074
M3.1 0.213 −5.97 %↓ 88,168 −3.96 %↓ 19,067,358,989 −9.58 %↓ 138,085 −4.62 %↓ 0.240 1.64 % 20,037 1.51 % 618,000,000 1.75 % 24,857 0.86 %
M3.2 0.146 31.46 % 69,992 20.62 % 16,200,000,000 15.04 % 127,328 7.79 % 0.199 17.08 % 17,376 13.28 % 460,990,610 25.41 % 21,471 13.62 %
XGBoost
M2.2 0.305 116,941 20,200,000,000 142,101 0.183 14,101 341,000,000 18,456
M3.1 0.305 0.00 % 114,958 1.70 % 23,350,908,849 −15.60 %↓ 152,810 −7.54 %↓ 0.174 4.92 % 14,489 −2.75 %↓ 405,338,542 −18.87 %↓ 20,133 −9.09 %↓
M3.2 0.150 50.82 % 62,018 46.05 % 9,070,000,000 61.16 % 95,244 37.67 % 0.096 44.83 % 8149 43.76 % 99,954,198 75.34 % 9998 50.34 %

Note: SARIMAX = seasonal autoregressive integrated moving average with exogenous factors; XGBoost = extreme gradient boosting; MAPE = mean absolute percentage error; MAE = mean absolute error; MSE = mean
square error; RMSE = root mean square error; MAPE is presented to three decimal places, other metrics are displayed as integers.

Online review data with sentiment analysis possess predictive power in relation to tourism demand; however, no other work appears to have
investigated the incremental predictive power of review- and aspect-level sentiment analysis in tourism demand forecasting.
By delving into aspect-based customer review sentiments, our findings validate their effectiveness in enhancing tourism demand
prediction. This approach offers a more intricate view of tourists' preferences.
Second, we examined the resilience of search engine data and granular sentiment analysis–based online review data for tour-
ism forecasting during turbulent periods. Search engine data are available in real time and provide insight into tourists' informa-
tion needs, search queries, and online behavior during the pandemic. Granular sentiment analysis–based online review data
convey travelers' feedback, enabling a deeper understanding of their attitudes and concerns during the pandemic. Our results
showcase the ability of such data to capture timely information. Dynamic fluctuations in tourism demand during uncertain
times thus become clearer. We strongly recommend that search engine data or granular sentiment analysis–based online review
data be incorporated into forecasting models to make demand predictions during crises.
Third, we enriched tourism demand prediction amid a pandemic by harnessing multisource and complementary Internet information. To the best of our knowledge, this study is one of the earliest to incorporate tourists' at-
tention information, official announcement details, and review-level generic and aspect-based customer sentiments into one
model to predict demand during the pandemic. Other studies addressed macro factors or Internet users' attention information
when forecasting tourism demand amid COVID-19. Limited research has investigated the predictive power of multisource Internet data for forecasting demand in this era. We integrated Internet data from multiple sources, capitalizing on their unique
strengths. Our approach represents a comprehensive and accurate method of tourism demand forecasting.
Finally, our findings offer practical implications for stakeholders involved in tourism industry planning and operations. Practitioners can draw on this work to implement flexible yet targeted strategies that cater to individuals' preferences for tourist attractions. Additionally, by integrating multisource Internet big data, our predictions can help local governments make informed judgments in unforeseen circumstances such as COVID-19.

Limitations and future studies

This study is not without limitations. Fake reviews are increasingly pervasive; abundant evidence shows that companies and clients post fake reviews, eroding reviews' credibility (Choi, Mattila, Van Hoof, & Quadri-Felitti, 2017). Future researchers may wish to filter out fake reviews to obtain more accurate predictions. Furthermore, we used only data from search engines, official announcements, and online reviews to predict visitor arrivals at two tourist attractions. Subsequent studies can apply similar approaches to other tourist attractions or destinations. Follow-up work can also incorporate additional factors (e.g., economic data) to enhance forecasting models' performance.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.annals.2023.103667.

Data availability

The authors thank the Meituan Research Institute for providing the data used in this study. The data are intended exclusively for research purposes and will not be disclosed or disseminated.

Declaration of competing interest

The authors have no competing financial interests or personal relationships that could inappropriately influence this work.

Acknowledgments

This study was supported by the Early Career Scheme of the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 25500520).

Author contribution statements

Hengyun Li and Huicai Gao designed the research framework, derived the models, and contributed to the writing of the manuscript.
Haiyan Song provided suggestions for improving the research design and made major revisions to the manuscript.

References

Adler, J., & Parmryd, I. (2010). Quantifying colocalization by correlation: The Pearson correlation coefficient is superior to the Mander's overlap coefficient. Cytometry Part A, 77A(8), 733–742.
Alslaity, A., & Orji, R. (2022). Machine learning techniques for emotion detection and sentiment analysis: Current state, challenges, and future directions. Behaviour & Information Technology, 0(0), 1–26.


Ampountolas, A., & Legg, M. P. (2021). A segmented machine learning modeling approach of social media for predicting occupancy. International Journal of
Contemporary Hospitality Management, 33(6), 2001–2021.
Andariesta, D. T., & Wasesa, M. (2022). Machine learning models for predicting international tourist arrivals in Indonesia during the COVID-19 pandemic: A multisource Internet data approach. Journal of Tourism Futures. Advance online publication.
Athanasopoulos, G., Hyndman, R. J., Kourentzes, N., & O’Hara-Wild, M. (2023). Probabilistic Forecasts Using Expert Judgment: The Road to Recovery From COVID-19.
Journal of Travel Research, 62(1), 233–258.
Bakhshi, S., Kanuparthy, P., & Shamma, D. (2015). Understanding Online Reviews: Funny, Cool or Useful? CSCW 2015 - Proceedings of the 2015 ACM International Confer-
ence on Computer-Supported Cooperative Work and Social Computing, 1270–1276.
Bi, J. W., Li, C., Xu, H., & Li, H. (2022). Forecasting daily tourism demand for tourist attractions with Big Data: An ensemble deep learning method. Journal of Travel
Research, 61(8), 1719–1737.
Bi, J.-W., Liu, Y., & Li, H. (2020). Daily tourism volume forecasting for tourist attractions. Annals of Tourism Research, 83, 102923.
Bigné, E., Oltra, E., & Andreu, L. (2019). Harnessing stakeholder input on Twitter: A case study of short breaks in Spanish tourist cities. Tourism Management, 71,
490–503.
Chang, Y.-M., Chen, C.-H., Lai, J.-P., Lin, Y.-L., & Pai, P.-F. (2021). Forecasting Hotel Room Occupancy Using Long Short-Term Memory Networks with Sentiment Analysis and Scores of Customer Online Reviews. Applied Sciences, 11(21), 10291.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (pp. 785–794).
Chen, Z., Cao, Y., Yao, H., Lu, X., Peng, X., Mei, H., & Liu, X. (2021). Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data.
ACM Transactions on Software Engineering and Methodology, 30(2), 1–48.
Choi, S., Mattila, A. S., Van Hoof, H. B., & Quadri-Felitti, D. (2017). The Role of Power and Incentives in Inducing Fake Reviews in the Tourism Industry. Journal of Travel
Research, 56(8), 975–987.
Colladon, A. F., Guardabascio, B., & Innarella, R. (2019). Using social network and semantic analysis to analyze online travel forums and forecast tourism demand. Decision Support Systems, 123, 113075.
Cui, R., Gallino, S., Moreno, A., & Zhang, D. J. (2018). The operational value of social media information. Production and Operations Management, 27(10), 1749–1769.
Cui, Y. (2019, January 28). 2019 ranking of search engines in China: Baidu, Shenma, Sougou, 360's market share. https://www.marketmechina.com/search-engine-
market-share-in-china-jan-2019/.
Dellarocas, C., Zhang, X. (M.), & Awad, N. F. (2007). Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing, 21(4), 23–45.
Duan, W., Gu, B., & Whinston, A. (2008). Do Online Reviews Matter? – An Empirical Investigation of Panel Data. Decision Support Systems, 45, 1007–1016.
Easaw, J. Z., Garratt, D., & Heravi, S. M. (2005). Does consumer sentiment accurately forecast UK household consumption? Are there any comparisons to be made with
the US? Journal of Macroeconomics, 27(3), 517–532.
Fang, H., Shao, Y., Xie, C., Tian, B., Zhu, Y., Guo, Y., Yang, Q., & Yang, Y. (2022). Using Persistent Scatterer Interferometry for Post-Earthquake Landslide Susceptibility
Mapping in Jiuzhaigou. Applied Sciences, 12(18), 9228.
Ferreira, L. N., Vega-Oliveros, D. A., Zhao, L., Cardoso, M. F., & Macau, E. E. N. (2020). Global Fire Season Severity Analysis and Forecasting. Computers & Geosciences, 134, 104339.
Fotiadis, A., Polyzos, S., & Huan, T.-C. T. C. (2021). The good, the bad and the ugly on COVID-19 tourism recovery. Annals of Tourism Research, 87, 103117.
Geetha, M., Singha, P., & Sinha, S. (2017). Relationship between customer sentiment and online customer ratings for hotels – An empirical analysis. Tourism
Management, 61, 43–54.
Gillham, N. W. (2001). A life of Sir Francis Galton: From African exploration to the birth of eugenics. Oxford University Press.
Gunter, U., & Önder, I. (2016). Forecasting city arrivals with Google Analytics. Annals of Tourism Research, 61, 199–212.
Gunter, U., Önder, I., & Gindl, S. (2019). Exploring the predictive ability of LIKES of posts on the Facebook pages of four major city DMOs in Austria. Tourism Economics:
the Business and Finance of Tourism and Recreation, 25(3), 375–401.
Hassani, H., Silva, E. S., Antonakakis, N., Filis, G., & Gupta, R. (2017). Forecasting accuracy evaluation of tourist arrivals. Annals of Tourism Research, 63, 112–127.
Hu, M., Li, H., Song, H., Li, X., & Law, R. (2022). Tourism demand forecasting using tourist-generated online review data. Tourism Management, 90, 104490.
Hu, M., Xiao, M., & Li, H. (2021). Which search queries are more powerful in tourism demand forecasting: searches via mobile device or PC? International Journal of
Contemporary Hospitality Management, 33(6), 2022–2043.
Huang, X., Zhang, L., & Ding, Y. (2017). The Baidu Index: Uses in predicting tourism flows – A case study of the Forbidden City. Tourism Management, 58, 301–306.
Huang, Y., Chen, Y., Zhou, Q., Zhao, J., & Wang, X. (2016). Where are we visiting? Measurement and analysis of venues in Dianping. In 2016 IEEE International Conference
on Communications (ICC) (pp. 1–6). IEEE.
Humagain, P., & Singleton, P. A. (2021). Exploring tourists' motivations, constraints, and negotiations regarding outdoor recreation trips during COVID-19 through a focus group study. Journal of Outdoor Recreation and Tourism, 36, 100447.
Kaur, J., & Saini, J. R. (2014). Emotion detection and sentiment analysis in text corpus: A differential study with informal and formal writing styles. International Journal of Computer Applications (ISSN 0975–8887).
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1), 273–324.
Kourentzes, N., Saayman, A., Jean-Pierre, P., Provenzano, D., Sahli, M., Seetaram, N., & Volo, S. (2021). Visitor arrivals forecasts amid COVID-19: A perspective from the Africa team. Annals of Tourism Research, 88, 103197.
Lau, R. Y. K., Zhang, W., & Xu, W. (2018). Parallel Aspect-Oriented Sentiment Analysis for Sales Forecasting with Big Data. Production and Operations Management, 27
(10), 1775–1794.
Law, R., Li, G., Fong, D. K. C., & Han, X. (2019). Tourism demand forecasting: A deep learning approach. Annals of Tourism Research, 75, 410–423.
Li, C., Ge, P., Liu, Z., & Zheng, W. (2020). Forecasting tourist arrivals using denoising and potential factors. Annals of Tourism Research, 83, 102943.
Li, H., Hu, M., & Li, G. (2020). Forecasting tourism demand with multisource big data. Annals of Tourism Research, 83, 102912.
Li, H., Meng, F., & Zhang, X. (2022). Are You Happy for Me? How Sharing Positive Tourism Experiences through Social Media Affects Posttrip Evaluations. Journal of
Travel Research, 61(3), 477–492.
Li, H., Yu, B. X. B., Li, G., & Gao, H. (2023). Restaurant survival prediction using customer-generated content: An aspect-based sentiment analysis of online reviews. Tourism Management, 96, 104707.
Li, J., Li, G., Liu, M., Zhu, X., & Wei, L. (2022). A novel text-based framework for forecasting agricultural futures using massive online news headlines. International
Journal of Forecasting, 38(1), 35–50.
Li, X., Li, H., Pan, B., & Law, R. (2021). Machine learning in internet search query selection for tourism forecasting. Journal of Travel Research, 60(6), 1213–1231.
Li, X., & Wu, L. (2018). Herding and social media word-of-mouth: Evidence from Groupon. MIS Quarterly, 42(4), 1331–1351.
Liu, A., Vici, L., Ramos, V., Giannoni, S., & Blake, A. (2021). Visitor arrivals forecasts amid COVID-19: A perspective from the Europe team. Annals of Tourism Research, 88, 103182.
Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge, UK: Cambridge University Press.
Ludvigson, S. C. (2004). Consumer Confidence and Consumer Spending. The Journal of Economic Perspectives, 18(2), 29–50.
Miah, S. J., Vu, H. Q., Gammack, J., & McGrath, M. (2017). A Big Data Analytics Method for Tourist Behavior Analysis. Information & Management, 54(6), 771–785.
Munezero, M., Montero, C. S., Sutinen, E., & Pajunen, J. (2014). Are They Different? Affect, Feeling, Emotion, Sentiment, and Opinion Detection in Text. IEEE Transactions
on Affective Computing, 5(2), 101–111.
O’Leary, D. (2011). The Use of Social Media in the Supply Chain: Survey and Extensions. Intelligent Systems in Accounting, Finance and Management, 18(2–3), 121–144.
Önder, I., Gunter, U., & Gindl, S. (2020). Utilizing Facebook Statistics in Tourism Demand Modeling and Destination Marketing. Journal of Travel Research, 59(2),
195–208.


Önder, I., Gunter, U., & Scharl, A. (2019). Forecasting tourist arrivals with the help of web sentiment: A mixed-frequency modeling approach for big data. Tourism
Analysis, 24(4), 437–452.
Pan, B., Wu, D. C., & Song, H. (2012). Forecasting hotel room demand using search engine data. Journal of Hospitality and Tourism Technology, 3(3), 196–210.
Pan, B., & Yang, Y. (2017). Forecasting Destination Weekly Hotel Occupancy with Big Data. Journal of Travel Research, 56(7), 957–970.
Park, E., Park, J., & Hu, M. (2021). Tourism demand forecasting with online news data mining. Annals of Tourism Research, 90, 103273.
Phillips, L., Dowling, C. P., Shaffer, K., Hodas, N. O., & Volkova, S. (2017). Using Social Media to Predict the Future: A Systematic Literature Review. arXiv, abs/1706.06134.
Pinson, P., & Makridakis, S. (2022). Pandemics and forecasting: The way forward through the Taleb-Ioannidis debate. International Journal of Forecasting, 38(2),
410–412.
Qiu, R. T. R., Wu, D. C., Dropsy, V., Petit, S., Pratt, S., & Ohe, Y. (2021). Visitor arrivals forecasts amid COVID-19: A perspective from the Asia and Pacific team. Annals of Tourism Research, 88, 103155.
Rivera, R. (2016). A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tourism Management, 57, 12–20.
Shi, X., Wong, Y. D., Li, M. Z. -F., Palanisamy, C., & Chai, C. (2019). A feature learning approach based on XGBoost for driving assessment and risk prediction. Accident
Analysis and Prevention, 129, 170–179.
Shumway, R. H. (2017). Time Series Analysis and Its Applications: With R Examples. Springer Nature.
Starosta, K., Budz, S., & Krutwig, M. (2019). The impact of German-speaking online media on tourist arrivals in popular tourist destinations for Europeans. Applied
Economics, 51(14), 1558–1573.
Tian, F., Yang, Y., Mao, Z., & Tang, W. (2021). Forecasting daily attraction demand using big data from search engines and social media. International Journal of
Contemporary Hospitality Management, 33(6), 1950–1976.
Tian, H., Gao, C., Xiao, X., Liu, H., He, B., Wu, H., Wang, H., & Wu, F. (2020). SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. arXiv, abs/2005.05635.
Ubilava, D., & Helmers, C. G. (2013). Forecasting ENSO with a smooth transition autoregressive model. Environmental Modelling & Software, 40, 181–190.
Wu, D. C., Zhong, S., Qiu, R. T. R., & Wu, J. (2022). Are customer reviews just reviews? Hotel forecasting using sentiment analysis. Tourism Economics: the Business and
Finance of Tourism and Recreation, 28(3), 795–816.
Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal
of Hospitality Management, 44, 120–130.
Xiao, Y., Li, C., Thürer, M., Liu, Y., & Qu, T. (2022). User preference mining based on fine-grained sentiment analysis. Journal of Retailing and Consumer Services, 68, 103013.
Yang, S. B., Shin, S. H., Joun, Y., & Koo, C. (2017). Exploring the comparative importance of online hotel reviews’ heuristic attributes in review helpfulness: a conjoint
analysis approach. Journal of Travel & Tourism Marketing, 34(7), 963–985.
Yang, Y., Fan, Y., Jiang, L., & Liu, X. (2022). Search query and tourism forecasting during the pandemic: When and where can digital footprints be helpful as predictors? Annals of Tourism Research, 93, 103365.
Zhang, B., Li, N., Shi, F., & Law, R. (2020). A deep learning approach for daily tourist flow forecasting with consumer search data. Asia Pacific Journal of Tourism Research,
25(3), 323–339.
Zhang, H., Song, H., Wen, L., & Liu, C. (2021). Forecasting tourism recovery amid COVID-19. Annals of Tourism Research, 87, 103149.

Hengyun Li is an Associate Professor in the School of Hotel and Tourism Management at The Hong Kong Polytechnic University. His research interests include big data
and consumer psychology.
Huicai Gao is a Ph.D. Student at The Hong Kong Polytechnic University.
Haiyan Song is a Chair Professor in the School of Hotel and Tourism Management at The Hong Kong Polytechnic University. His research interests focus on tourism
demand modeling and forecasting.

