
Emerging Markets Finance and Trade

Journal homepage: https://www.tandfonline.com/loi/mree20

Measurement of Individual Investor Sentiment and Its Application: Evidence from Chinese Stock Message Board

Chuangxia Huang, Shigang Wen, Xin Yang, Jinde Cao & Xiaoguang Yang

To cite this article: Chuangxia Huang, Shigang Wen, Xin Yang, Jinde Cao & Xiaoguang Yang (2020): Measurement of Individual Investor Sentiment and Its Application: Evidence from Chinese Stock Message Board, Emerging Markets Finance and Trade, DOI: 10.1080/1540496X.2020.1835637

To link to this article: https://doi.org/10.1080/1540496X.2020.1835637

Published online: 04 Nov 2020.

EMERGING MARKETS FINANCE AND TRADE
https://doi.org/10.1080/1540496X.2020.1835637

Measurement of Individual Investor Sentiment and Its Application: Evidence from Chinese Stock Message Board
Chuangxia Huang^a, Shigang Wen^a, Xin Yang^a, Jinde Cao^b, and Xiaoguang Yang^c
^a Hunan Provincial Key Laboratory of Mathematical Modeling and Analysis in Engineering, Changsha University of Science and Technology, Changsha, China; ^b School of Mathematics, Southeast University, Nanjing, China; ^c Country Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing, China

ABSTRACT
This paper investigates individual investor sentiment on the Chinese stock message board Guba Eastmoney and its relation to market returns and volatility. Focusing on measuring sentiment, we propose a novel algorithm, Semantic Orientation from Laplace Smoothed Normalized Pointwise Mutual Information (SO-LNPMI). We show that: (i) compared with traditional methods, SO-LNPMI has higher accuracy and better adaptivity of its probability estimates; (ii) negative sentiment is negatively correlated with market returns, whereas positive sentiment does not have any statistically significant impact on market returns; and (iii) positive (negative) sentiment is negatively (positively) correlated with market volatility. Our results survive a range of robustness tests.

Keywords
individual investor sentiment; SO-LNPMI algorithm; market returns; market volatility

1. Introduction
The efficient market hypothesis (EMH) posits that the influence of investor sentiment is negligible, as investors act rationally on noisy information and arbitrage away sentiment-driven mispricing. Continuing evidence of financial anomalies (e.g., herd behavior, the negativity effect, and the asymmetric volatility phenomenon; Huang et al. 2020; Lee, Jiang, and Indro 2002; Wen et al. 2019a), however, challenges the EMH. Only since the rise of behavioral finance has investor sentiment been widely recognized as an essential factor in determining stock market prices.
Given the extensive empirical evidence that investor sentiment affects stock market returns and volatility, behavioral finance argues that investor sentiment inevitably causes prices to deviate from their fundamental values (Friedman 1953). For example, Farina, Parisi, and Pomante (2017), Affuso and Lahtinen (2019), and Dimpfl and Kleiman (2019) demonstrate that an increase in negative sentiment immediately generates downward price pressure, and an asymmetric volatility phenomenon related to investor sentiment is documented by Siganos, Vagenas-Nanos, and Verwijmeren (2014), Smales (2016), and Wen et al. (2019b).
In studies of investor sentiment in finance, the measure of sentiment is fundamental. Generally speaking, the main measures of investor sentiment fall into three categories (Renault 2017): survey-, market-, and media-based measures. Survey-based measures are indices derived from polls of market participants, such as the American Association of Individual Investors (AAII) survey (He, He, and Wen 2019), Investors Intelligence (II) (Oliveira, Cortez, and Areal 2016), and the Consumer Confidence Index (CCI) (Farina, Parisi, and Pomante 2017), which directly reflect investors' expectations about financial market prospects. Market-based measures employ financial indicators as proxies for investor sentiment, because a number of financial indicators serve as market weather vanes (He, He, and Wen 2019). These proxies range from mutual fund flows to the ratio of odd-lot sales to purchases, the closed-end fund discount, IPO volume, etc. (Li et al. 2020; Wen et al. 2020; Yang, Zhu, and Cheng 2020).

CONTACT Xin Yang yangxintaoyuan@163.com Changsha University of Science and Technology, Changsha 410114, China
Supplemental data for this article can be accessed on the publisher's website.
© 2020 Taylor & Francis Group, LLC
With the growth of the Internet, media have become an important platform through which investors can efficiently obtain valuable information. Sentiment indicators derived from media constitute the media-based measures (Li et al. 2020; Narayan 2020; Narayan and Bannigidadmath 2017). There are several influential studies on measuring media-based sentiment. For example, Narayan, Ranjeeni, and Bannigidadmath (2017a, 2017b) and Narayan (2019) use financial news to reflect public expectations for stock returns. Beyond news, indicators derived from social media have attracted increasing attention, since investors not only read market information but also share their investment opinions on social media (Oliveira, Cortez, and Areal 2016).
There are two common kinds of content analysis methods for extracting investor sentiment indicators from social media: supervised and unsupervised (Oliveira, Cortez, and Areal 2016). Supervised methods, such as the Naive Bayes classifier, require messages with labeled sentiment tags (Yang et al. 2019). Because sentiment tags are difficult to collect and manual labeling is arduous in practice, some researchers instead apply sentiment lexicons for unsupervised classification to measure investor sentiment. Siganos, Vagenas-Nanos, and Verwijmeren (2014) quantify daily sentiment on Facebook using Facebook's Gross National Happiness Index, which is generated by applying the Linguistic Inquiry and Word Count dictionary. Oliveira, Cortez, and Areal (2016) capture sentiment from tweets via a specialized financial lexicon constructed with a weighted Semantic Orientation Pointwise Mutual Information (SO-PMI) algorithm. Farina, Parisi, and Pomante (2017) measure investor sentiment from Palgrave Econolog via the Harvard IV-4 dictionary and the Lasswell value dictionary.
To a large extent, the sentiment lexicons applied in these studies are generic English-language lexicons, which may be ineffective for messages posted on stock message boards because a word may not carry the same implication in different domains (Turney and Littman 2003). To overcome the limitations of generic sentiment lexicons, several expansion methods have been put forward to construct specialized financial lexicons, such as the Contextual Entropy Model and the SO-PMI algorithm (Oliveira, Cortez, and Areal 2016; Yu et al. 2013).
SO-PMI is a popular expansion algorithm that has been successfully applied in many sentiment analysis tasks (Oliveira, Cortez, and Areal 2016; Turney and Littman 2003). However, SO-PMI has three important issues. First, SO-PMI relies on two sets of seed words whose distributions are assumed to be approximately symmetric; if the distributions are asymmetric, the sentiment inferences for candidates can be biased (Kigon and Hyeoncheol 2016). Second, the probability that a given word occurs is estimated by its relative frequency in the sample, so zero probabilities occur frequently when only a limited sample is available. Third, a candidate is considered neutral only if its association with positive words exactly equals its association with negative words (Yu et al. 2013). Since this equal-association criterion is rarely satisfied, candidates that are effectively neutral are nearly always assigned a polarity. Therefore, the SO-PMI algorithm needs further modification.
Furthermore, because pre-processing non-verbal elements prior to content analysis is costly, extant sentiment lexicons tend to focus on natural language and ignore emoticons (Renault 2017). As a result, most textual analyses in finance treat emoticons in messages as noise and remove them from the texts (Farina, Parisi, and Pomante 2017; Siganos, Vagenas-Nanos, and Verwijmeren 2014). For short informal texts, however, emoticons allow authors to express their emotions intuitively (Novak et al. 2015), and thus emoticons can serve as additional features for sentiment analysis.
Motivated by the above discussion, we introduce normalization, Laplace smoothing, and a threshold score into SO-PMI and propose a novel algorithm, Semantic Orientation from Laplace Smoothed Normalized Pointwise Mutual Information (SO-LNPMI). With the help of SO-LNPMI, we automatically develop a specialized financial lexicon of words and emoticons. Furthermore, we measure individual investor sentiment on the Chinese stock message board Guba Eastmoney and examine its relation to market returns and volatility. The contributions of this paper are as follows:

(1) Compared with the traditional SO-PMI, SO-LNPMI has two clear advantages: higher accuracy in measuring sentiment and better adaptivity of its probability estimates.
(2) Existing sentiment lexicons tend to focus on natural language analysis and ignore emoticons. Taking emoticons into account, we develop a specialized financial lexicon of emoticons and words. To the best of our knowledge, this paper is the first to explore the ranks and roles of emoticons in an investor sentiment lexicon.
(3) Unlike the literature, which mainly measures investor sentiment in news, tweets, and Facebook, we focus on individual investor sentiment on the Chinese stock message board Guba Eastmoney.
(4) We examine the relationship between individual investor sentiment and market returns and volatility.

The remainder of the paper is arranged as follows. Section 2 discusses the research hypotheses.
Section 3 describes the applied methodology related to the empirical investigation. Section 4 provides
details about data sources. Section 5 presents the empirical results and relates them to economic
theory. Section 6 reports a range of robustness tests. Finally, conclusions are drawn in Section 7.

2. Hypothesis Development


Behavioral finance argues that investor sentiment inevitably causes prices to deviate from their fundamental values; thus, investor sentiment has an essential effect on stock market prices. Our study aims to further examine (a) whether individual investor sentiment on the Chinese stock message board Guba Eastmoney affects market returns and (b) its effect on market volatility. We argue that individual investor sentiment on Guba Eastmoney affects market returns and volatility for the following reasons. First, Guba Eastmoney contains considerable insights from individual investors. According to Das and Sisk (2005), investors' insights tend to affect each other, and their behaviors are likely to converge. Thus, positive and negative sentiment on the stock message board is contagious, making investors more bullish or bearish. Second, Friedman (1953) suggests that irrational investors move prices away from fundamentals by buying when prices are high and selling when prices are low. If individual investors are affected by investment opinions posted on the message board, they will tend to trade irrationally and therefore affect stock market prices.
Based on the above discussion, we raise the following hypothesis:

Hypothesis 1. Individual investor positive sentiment does not have any impact on market returns,
whereas individual investor negative sentiment is negatively correlated with market returns.

Because of limited attention, investors tend to prefer easily processed information (Dong and Ni 2014). On the stock message board, users take an active part in discussing and spreading the opinions they agree with, whereas they pay less attention to ideas they disagree with. This means that individual investors subject to the limited attention constraint may underreact to some market information. In addition, individual investors are prone to overreact to negative information. The negativity effect suggests that negative information influences investors more strongly than positive information (Akhtar et al. 2011): individual investors sell stocks and move quickly to a safe haven when they encounter bad news. Thus, we expect positive sentiment to have no impact on market returns, whereas negative sentiment should be negatively correlated with market returns.

Our first hypothesis concerns the relation between individual investor sentiment on Guba Eastmoney and market returns. If individual investor sentiment has explanatory value for market returns, we also want to know whether it helps explain higher moments of returns, such as market volatility. Therefore, we have our second hypothesis:

Hypothesis 2. Individual investor positive (negative) sentiment is negatively (positively) correlated with market volatility.

By decomposing sell trades into contrarian and herding trades, Avramov, Chordia, and Goyal (2006) suggest that herding behavior increases volatility, whereas contrarian behavior depresses volatility. Herding trades are defined as sell trades made when the market price declines, while contrarian trades are sell trades made when the market price rises. When the market price declines, negative sentiment on the stock message board is prone to translate into herding trades, and when the market price rises, positive sentiment is prone to translate into contrarian trades. Thus, individual investor positive sentiment should be negatively correlated with market volatility, whereas individual investor negative sentiment should be positively correlated with it. To test H2, we examine the effect of individual investor sentiment on market volatility.

3. Methodology
This section first proposes a novel algorithm, SO-LNPMI, for creating financial lexicons. Second, we introduce two sentiment indicators. Third, we adopt accuracy to evaluate classification performance. Last, two classes of regression models are employed to examine the hypotheses.

3.1. SO-LNPMI Algorithm


SO-PMI is a popular and fast expansion algorithm that infers semantic orientation from statistical association (Turney and Littman 2003). SO-PMI is defined as follows:

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}, \tag{1}$$

$$\text{SO-PMI}(word) = \sum_{pword \in Pwords} \mathrm{PMI}(word, pword) - \sum_{nword \in Nwords} \mathrm{PMI}(word, nword), \tag{2}$$

$$SO_i = \begin{cases} \text{positive}, & \text{if } \text{SO-PMI}(word_i) > 0, \\ \text{neutral}, & \text{if } \text{SO-PMI}(word_i) = 0, \\ \text{negative}, & \text{if } \text{SO-PMI}(word_i) < 0, \end{cases} \tag{3}$$

where $p(word_i) = C_i / \|M\|$, $C_i$ is the number of messages in which $word_i$ occurs, and $\|M\|$ is the total number of messages; $p(word_1 \,\&\, word_2) = C_{1,2} / \|M\|$ denotes the probability that $word_1$ and $word_2$ co-occur in a message, with $C_{1,2}$ the number of messages containing both. $Pwords$ and $Nwords$ denote the positive and negative seed word sets, respectively.
Clearly, $\mathrm{PMI}(word_1, word_2) \in \left(-\infty, \min\{-\log_2 p(word_1), -\log_2 p(word_2)\}\right]$, so the semantic inference can vary widely. To reduce the inaccuracy resulting from an unbalanced distribution of the seed word sets, the values should be normalized as follows (Kigon and Hyeoncheol 2016):

$$\mathrm{NPMI}(word_1, word_2) = \frac{\mathrm{PMI}(word_1, word_2)}{-\log_2 p(word_1 \,\&\, word_2)}, \tag{4}$$

where $\mathrm{NPMI}(word_1, word_2) \in [-1, 1]$.
When calculating PMI, $p(word_1 \,\&\, word_2) = 0$ indicates that the two words do not co-occur in any message, in which case the logarithm in Equation (1) is undefined. To avoid zero probability, $p(word_1 \,\&\, word_2)$ is Laplace smoothed as follows:

$$Lp(word_1 \,\&\, word_2) = \begin{cases} \dfrac{C_{1,2} + 1}{\|M\| + N}, & \text{if } C_{1,2} = 0, \\[4pt] \dfrac{C_{1,2}}{\|M\|}, & \text{otherwise}, \end{cases} \tag{5}$$

where $N = 2$ denotes the number of class values. This form of Laplace smoothing makes the probability estimates adaptive to small corpora of messages.
Given the semantic orientation of candidates from Equation (2), candidates are sensitive to polarity because the equal-association criterion in Equation (3) is rarely satisfied exactly. To make potentially neutral candidates less sensitive to polarity, a positive threshold score δ is introduced into SO-PMI (Li et al. 2014); the optimal threshold value is determined empirically (Yu et al. 2013).
Based on these considerations, we propose a novel algorithm, SO-LNPMI, defined as follows:
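$$\mathrm{LNPMI}(word_1, word_2) = \frac{\log_2 \dfrac{Lp(word_1 \,\&\, word_2)}{Lp(word_1)\, Lp(word_2)}}{-\log_2 Lp(word_1 \,\&\, word_2)}, \tag{6}$$

$$\text{SO-LNPMI}(word_i) = \sum_{pword \in Pwords} \mathrm{LNPMI}(word_i, pword) - \sum_{nword \in Nwords} \mathrm{LNPMI}(word_i, nword), \tag{7}$$

$$SO_i = \begin{cases} \text{positive}, & \text{if } \text{SO-LNPMI}(word_i) > \delta, \\ \text{neutral}, & \text{if } -\delta \le \text{SO-LNPMI}(word_i) \le \delta, \\ \text{negative}, & \text{if } \text{SO-LNPMI}(word_i) < -\delta. \end{cases} \tag{8}$$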
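A matching sketch of Equations (5) to (8) follows, again assuming messages are sets of pre-segmented tokens and using hypothetical function names. Equation (5) defines only the smoothed joint probability; applying the same piecewise rule to the marginals $Lp(word_i)$, as done here, is our assumption (for any word that actually occurs, it reduces to $C_i / \|M\|$). The threshold δ = 0.1 in the usage lines is purely illustrative, since the paper determines δ empirically.

```python
import math
from typing import Iterable, Sequence, Set


def lp(count: int, total: int, n_classes: int = 2) -> float:
    """Equation (5): add-one smoothing applied only when the count is zero."""
    if count == 0:
        return (count + 1) / (total + n_classes)
    return count / total


def lnpmi(a: str, b: str, msgs: Sequence[Set[str]]) -> float:
    """Equation (6): smoothed PMI normalized by -log2 Lp(a & b)."""
    total = len(msgs)
    p_ab = lp(sum(a in m and b in m for m in msgs), total)
    # Assumption: the marginals use the same piecewise rule as the joint.
    p_a = lp(sum(a in m for m in msgs), total)
    p_b = lp(sum(b in m for m in msgs), total)
    if p_ab == 1.0:
        return 1.0  # pair occurs in every message; -log2(1) would be zero
    return math.log2(p_ab / (p_a * p_b)) / -math.log2(p_ab)


def so_lnpmi(word: str, pwords: Iterable[str], nwords: Iterable[str],
             msgs: Sequence[Set[str]]) -> float:
    """Equation (7): summed LNPMI with positive minus negative seeds."""
    return (sum(lnpmi(word, p, msgs) for p in pwords)
            - sum(lnpmi(word, n, msgs) for n in nwords))


def orientation(score: float, delta: float) -> str:
    """Equation (8): scores inside [-delta, delta] are labeled neutral."""
    if score > delta:
        return "positive"
    if score < -delta:
        return "negative"
    return "neutral"


# Same toy corpus as for SO-PMI: the score is now finite.
msgs = [{"rise", "good"}, {"rise", "good"}, {"fall", "bad"}, {"rise"}]
score = so_lnpmi("rise", ["good"], ["bad"], msgs)
print(round(score, 3), orientation(score, delta=0.1))
```

On this toy corpus SO-PMI diverges because "rise" never co-occurs with "bad", whereas the smoothed, normalized score stays bounded and is then mapped to a label by the δ band.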

3.2. Sentiment Index


Similar to the well-known AAII survey sentiment index (He, He, and Wen 2019), we use the proportions of positive and negative messages as our indicators, i.e., Individual Investor Positive Sentiment (Pos) and Negative Sentiment (Neg):

$$Pos_t = \frac{\|P_t\|}{\|M_t\|}, \qquad Neg_t = \frac{\|N_t\|}{\|M_t\|}, \tag{9}$$

where $\|P_t\|$, $\|N_t\|$, and $\|M_t\|$ denote the numbers of positive, negative, and total messages on day $t$, respectively.
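Equation (9) amounts to a per-day count. The sketch below assumes each classified message arrives as a hypothetical `(day, label)` pair with labels "positive", "negative", or "neutral"; following the definition of $\|M_t\|$ as all messages, neutral messages enter only the denominator.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def daily_sentiment(labeled: Iterable[Tuple[str, str]]
                    ) -> Dict[str, Tuple[float, float]]:
    """Equation (9): map each day to (Pos_t, Neg_t)."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for day, label in labeled:
        per_day[day][label] += 1
    index = {}
    for day, counts in per_day.items():
        total = sum(counts.values())  # ||M_t||: every message that day
        index[day] = (counts["positive"] / total, counts["negative"] / total)
    return index


labeled = [("2018-07-02", "positive"), ("2018-07-02", "negative"),
           ("2018-07-02", "neutral"), ("2018-07-02", "positive")]
print(daily_sentiment(labeled))  # {'2018-07-02': (0.5, 0.25)}
```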

3.3. Evaluation Metric


The classification performance of a sentiment lexicon is evaluated by accuracy (Yu et al. 2013). Given a set $D$ containing $n$ messages, the accuracy is calculated as follows:

$$acc(f, D) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\big(f(x_i) = y_i\big), \tag{10}$$

where $\mathbb{I}(\cdot)$ denotes the indicator function that equals 1 if its argument is true and 0 otherwise; $f(x_i)$ is the sentiment label of message $x_i$ assigned by sentiment lexicon $f$; and $y_i$ is the sentiment label of $x_i$ classified manually by economic experts.
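Equation (10) is the share of matching labels. A minimal sketch (function name hypothetical) makes the link to Table 2 explicit: on a 1,000-message sample, an accuracy of 78.60% corresponds to 786 messages whose lexicon label agrees with the experts' label.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], manual: Sequence[str]) -> float:
    """Equation (10): fraction of messages with f(x_i) == y_i."""
    if len(predicted) != len(manual) or not manual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == y for p, y in zip(predicted, manual)) / len(manual)


print(accuracy(["positive", "negative", "neutral", "positive"],
               ["positive", "negative", "positive", "positive"]))  # 0.75
```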

3.4. Multi-factor Regression Models


In this section, two classes of multi-factor regression models are employed to examine the hypotheses. We first use the sentiment indicator as a regressor to explain the daily market return while controlling for the Fama-French factors (Affuso and Lahtinen 2019). The model can be represented as follows:

$$R_t = \alpha + \beta_{sent}\, Sent_t + \beta_{Rp} (R_{m,t} - R_{f,t}) + \beta_{smb}\, SMB_t + \beta_{hml}\, HML_t + \varepsilon_t, \tag{11}$$

where $R_t = \ln(P_t) - \ln(P_{t-1})$ denotes the log-return of the stock index; $P_t$ is the closing price of the stock index on day $t$; $Sent_t$ is the individual investor sentiment ($Pos$ or $Neg$); $R_m$ is the return of the value-weighted A-share market; $R_f$ is the three-month central bank base rate; and $R_m - R_f$ (RMRF), $SMB$, and $HML$ are the market risk premium, size premium, and value premium, respectively.
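Equation (11) is an ordinary least squares regression. The plain-Python sketch below (no statistics package) estimates it and reports classical t-statistics of the kind shown in Table 3. It is illustrative rather than the authors' code: the helper names are hypothetical, the standard errors are the classical non-robust ones (the paper does not state its choice), and real inputs would be the aligned daily CSMAR series described in Section 4.2.

```python
import math
from typing import List, Sequence, Tuple


def log_returns(prices: Sequence[float]) -> List[float]:
    """R_t = ln(P_t) - ln(P_{t-1}); one value shorter than the price series."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def invert(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small k x k matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y: Sequence[float], regressors: Sequence[Sequence[float]]
        ) -> Tuple[List[float], List[float]]:
    """Regress y on an intercept plus each regressor column.

    Returns (coefficients, classical t-statistics), intercept first.
    """
    n = len(y)
    X = [[1.0] + [col[t] for col in regressors] for t in range(n)]
    k = len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][i] * beta[i] for i in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    tstats = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, tstats


# Usage, with all series aligned on the same trading days:
# beta, t = ols(returns, [sent, rmrf, smb, hml])  # Sent_t is Pos_t or Neg_t
```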
We then assess the impact of individual investor sentiment on market volatility (Avramov, Chordia, and Goyal 2006). The model is estimated as follows:

$$\sigma_t = \alpha + \beta_{sent}\, Sent_t + \beta_M M_t + \beta_{NT} NT_t + \sum_{i=1}^{n} \gamma_i \sigma_{t-i} + \varepsilon_t, \tag{12}$$

where $\sigma_t = \sqrt{\dfrac{\big(\ln(High_t) - \ln(Low_t)\big)^2}{4 \ln 2}}$ is the range-based market volatility; $High_t$ and $Low_t$ are the high and low prices of the stock index on day $t$, respectively; $M_t$ is the Monday dummy variable; and $NT_t$ is the number of transactions (trading size).
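The inputs to Equation (12) can be assembled as below: the range-based volatility from daily highs and lows, a Monday dummy from the calendar date, and lagged volatility terms. This is a sketch under assumptions: the helper names are hypothetical, `n_lags = 2` mirrors the two lags reported in Table 4 rather than an AIC search, and the usage dates are consecutive calendar days chosen for illustration, not actual trading days. The resulting rows can be passed to any OLS routine.

```python
import math
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def range_volatility(high: float, low: float) -> float:
    """sigma_t = sqrt((ln High_t - ln Low_t)^2 / (4 ln 2))."""
    return math.sqrt((math.log(high) - math.log(low)) ** 2 / (4 * math.log(2)))


def volatility_design(dates: Sequence[date], highs: Sequence[float],
                      lows: Sequence[float], sent: Sequence[float],
                      nt: Sequence[float], n_lags: int = 2
                      ) -> Tuple[List[float], List[List[float]]]:
    """Return y = [sigma_t] and rows [Sent_t, M_t, NT_t, sigma_{t-1}, ...].

    The first n_lags days are dropped because their lags are unavailable.
    """
    sigma = [range_volatility(h, l) for h, l in zip(highs, lows)]
    y, rows = [], []
    for t in range(n_lags, len(sigma)):
        monday = 1.0 if dates[t].weekday() == 0 else 0.0  # Monday dummy M_t
        lags = [sigma[t - i] for i in range(1, n_lags + 1)]
        rows.append([sent[t], monday, nt[t]] + lags)
        y.append(sigma[t])
    return y, rows


# Illustrative usage: four consecutive calendar days starting 2 July 2018.
days = [date(2018, 7, 2) + timedelta(days=i) for i in range(4)]
y, rows = volatility_design(days, [101.0, 103.0, 102.0, 104.0],
                           [99.0, 100.0, 100.5, 101.0],
                           [0.3, 0.4, 0.2, 0.5], [10, 20, 30, 40])
```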

4. Data Sources
4.1. Stock Message Board Data
Guba Eastmoney^1 is one of the most popular Chinese stock message boards, where users can share investment opinions (Huang, Qiu, and Wu 2016). According to Eastmoney Corporation's prospectus (Yang, Zhu, and Cheng 2020), Guba Eastmoney has the largest number of individual users, with daily visits reaching 14,370,000 from July 2009 to September 2009. Since Guba Eastmoney contains considerable insights from individual investors, it is an appropriate source from which to extract individual investor sentiment. We use messages posted on the discussion board for the Shanghai Composite Index (SHCI) as observations; a total of 716,377 messages posted from July 2017 to September 2018 are extracted. In this paper, emoticons in the messages are marked in the form "[emoticon]", e.g., [大笑].

4.2. Market Data


Our market data comprise daily SHCI data from July 2017 to September 2018, a total of 308 daily observations. The variables used in the empirical analysis are the closing, high, and low prices of the SHCI; the market risk premium, size premium, and value premium of the Chinese A-share market; and the number of transactions on the SHCI. The daily data for these variables are obtained from the China Stock Market & Accounting Research (CSMAR) database.

5. Empirical Results
5.1. The Construction of Sentiment Lexicon
The textual dataset is divided into two sets: a training set containing 478,909 messages from July 2017 to June 2018 and a test set containing 237,468 messages from July 2018 to September 2018. We first select high-frequency words from the messages in the training set as candidates. Economic experts manually classify these words into five categories: positive seed words, negative seed words, neutral words, uncertain words, and negations. We then determine the sentiment orientation of the uncertain words by applying the SO-LNPMI algorithm.
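The date-based split and candidate selection can be sketched as follows. Several details are assumptions rather than the paper's stated procedure: messages are taken as `(date, tokens)` pairs already segmented into words, "high-frequency" is read as document frequency (the number of training messages containing a token), and `top_k` is a hypothetical parameter. Only the July 2018 boundary comes from the text above.

```python
from collections import Counter
from datetime import date
from typing import List, Sequence, Tuple

Message = Tuple[date, List[str]]  # (posting date, pre-segmented tokens)


def split_by_date(messages: Sequence[Message],
                  cutoff: date = date(2018, 7, 1)
                  ) -> Tuple[List[Message], List[Message]]:
    """Training set: before the cutoff; test set: on or after it."""
    train = [m for m in messages if m[0] < cutoff]
    test = [m for m in messages if m[0] >= cutoff]
    return train, test


def candidate_words(train: Sequence[Message], top_k: int) -> List[str]:
    """Rank tokens by the number of training messages that contain them."""
    doc_freq: Counter = Counter()
    for _, tokens in train:
        doc_freq.update(set(tokens))  # count each token once per message
    return [tok for tok, _ in doc_freq.most_common(top_k)]


# Toy messages: 涨 = "rise", 跌 = "fall", [大笑] = Laugh emoticon.
msgs = [(date(2018, 6, 29), ["涨", "[大笑]", "涨"]),
        (date(2018, 6, 30), ["涨", "跌"]),
        (date(2018, 7, 2), ["跌"])]
train, test_set = split_by_date(msgs)
print(candidate_words(train, top_k=1))  # ['涨']
```

The candidates returned here would then go to the experts for the five-way manual classification; only the "uncertain" group is scored with SO-LNPMI.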

Table 1. Top 10 positive and negative emoticons.


Positive Position Frequency Number Negative Position Frequency Number
[大笑]/[Laugh] 1 0.01734 18896 [哭]/[Cry] 22 0.00490 6121
[赞]/[Likes] 13 0.00680 3793 [不屑]/[Scorn] 26 0.00410 3484
[胜利]/[Victory] 22 0.00403 3831 [不赞]/[Don’t like] 30 0.00361 4583
[鼓掌]/[Clap] 29 0.00353 3670 [大便]/[Shit] 52 0.00216 3042
[微笑]/[Smile] 33 0.00351 2551 [亏大了]/[Big loss] 61 0.00193 1989
[拜神]/[Pray] 42 0.00276 4517 [怒]/[Anger] 152 0.00065 663
[献花]/[Flowers] 48 0.00228 3074 [心碎]/[Broken heart] 155 0.00056 594
[牛]/[Bull] 71 0.00172 1836 [空仓]/[Short position] 172 0.00048 681
[傲]/[Proud] 75 0.00161 1624 [失望]/[Disappointed] 176 0.00041 329
[加油]/[Come on] 88 0.00135 1386 [卖出]/[Sell] 178 0.00037 621
Total 0.04493 45178 Total 0.01917 22107
Emoticons are sorted in ascending order of their positions in the sentiment lexicon.

Having inferred the semantic orientation of the candidates, we merge the seed words and expanded words into a sentiment lexicon. Table 1 shows the positions of the top 10 positive and negative emoticons in the lexicon. Interestingly, both the frequency and the number of occurrences of positive emoticons are about twice those of negative emoticons. This suggests that, compared with investors expressing negative sentiment, individual investors with positive sentiment are more enthusiastic about using emoticons.
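A quick arithmetic check of the Total row in Table 1, using only the Frequency and Number columns copied from the table, shows how large the gap is: positive emoticons occur about 2.04 times as often by count and about 2.34 times as often by frequency.

```python
# Frequency and Number columns copied from Table 1 (top 10 emoticons each).
pos_freq = [0.01734, 0.00680, 0.00403, 0.00353, 0.00351,
            0.00276, 0.00228, 0.00172, 0.00161, 0.00135]
pos_num = [18896, 3793, 3831, 3670, 2551, 4517, 3074, 1836, 1624, 1386]
neg_freq = [0.00490, 0.00410, 0.00361, 0.00216, 0.00193,
            0.00065, 0.00056, 0.00048, 0.00041, 0.00037]
neg_num = [6121, 3484, 4583, 3042, 1989, 663, 594, 681, 329, 621]

print(sum(pos_num), sum(neg_num))                # 45178 22107
print(round(sum(pos_num) / sum(neg_num), 2))     # 2.04
print(round(sum(pos_freq) / sum(neg_freq), 2))   # 2.34
```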

5.2. Classification Accuracy and Message Sentiment


To assess the accuracy of the various sentiment lexicons, we first randomly select 1,000 messages from the training set. Then, we adopt majority voting^2 to classify the sentiment of each message using each of the following seven sentiment lexicons:

(1) L1-Sentiment lexicon includes seed words alone.


(2) L2-Sentiment lexicon includes emotion words expanded by SO-PMI.
(3) L3-Sentiment lexicon includes emoticons plus emotion words expanded by SO-PMI.
(4) L4-Sentiment lexicon includes emotion words expanded by SO-LNPMI.
(5) L5-Sentiment lexicon includes emoticons plus emotion words expanded by SO-LNPMI.
(6) L6-Sentiment lexicon includes emoticons plus emotion words expanded by Information Gain.
(7) L7-Sentiment lexicon includes emoticons plus emotion words expanded by Term Frequency–
Inverse Document Frequency.
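The majority-voting procedure is detailed only in the supplementary document, so the sketch below is a simplified, hypothetical reading of it: each lexicon entry found in a pre-tokenized message casts one vote for its polarity, and ties (including messages with no matches) are labeled neutral. Handling of the negation category is deliberately omitted. The three emoticon entries come from Table 1; the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_message(tokens: Iterable[str], lexicon: Dict[str, str]) -> str:
    """Label a message by the majority polarity of its lexicon hits."""
    votes = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    if votes["positive"] > votes["negative"]:
        return "positive"
    if votes["negative"] > votes["positive"]:
        return "negative"
    return "neutral"


# Emoticon entries taken from Table 1.
lexicon = {"[大笑]": "positive", "[赞]": "positive", "[哭]": "negative"}
print(classify_message(["[大笑]", "[赞]", "[哭]"], lexicon))  # positive
```

Swapping in a different lexicon (L1 to L7) while keeping the voting rule fixed is what allows the accuracy comparison in Table 2 to isolate the effect of the lexicon.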

Table 2 presents the accuracy of the different sentiment lexicons. Both L2 and L4 outperform L1, implying that expansion algorithms improve classification performance. Comparing the expansion algorithms, L4 versus L2 (and L5 versus L3) shows that SO-LNPMI outperforms SO-PMI. In addition, accuracy is further improved by taking emoticons into account, indicating that emoticons carry sentiment information. We also randomly select 1,000 messages from the test set for evaluation. The results on the test set are similar to those on the training set, implying that the classification performance of the lexicons is robust.
Furthermore, we adopt two common supervised algorithms: Information Gain (IG) and Term Frequency–Inverse Document Frequency (TF-IDF) (Oliveira, Cortez, and Areal 2016).

Table 2. Comparative results of different lexicons for sentiment classification.

                           L1       L2       L3        L4        L5        L6        L7
Algorithm                  -        SO-PMI   SO-PMI    SO-LNPMI  SO-LNPMI  IG        TF-IDF
Emoticons                  -        -        included  -         included  included  included
Accuracy of training set   73.40%   75.30%   77.80%    76.10%    78.60%    76.60%    77.70%
Accuracy of test set       70.50%   71.80%   75.10%    72.40%    75.90%    73.00%    73.60%

Figure 1. Time series of returns, volatility, and individual investor sentiment.

Note that L5 outperforms L6 and L7, suggesting that SO-LNPMI remains competitive with these two supervised algorithms. Overall, the sentiment lexicon L5, containing emoticons and emotion words expanded by SO-LNPMI, permits a more reliable sentiment classification of stock messages.
Figure 1 shows the time series patterns of sentiment index measured by L5, market returns and
volatility.

5.3. Effect of Individual Investor Sentiment on Market Returns


To examine the impact of individual investor sentiment on stock market returns (hypothesis H1), we use the sentiment indicator as a regressor to explain the market return after controlling for the Fama-French factors. Table 3 reports the regression results.
The coefficient of Pos is not significant at any conventional significance level, indicating that positive sentiment has no explanatory power for market returns. However, Neg shows a significant negative relation with market returns, implying that an increase in negative sentiment is accompanied by decreasing market returns. These results may be driven by limited attention and the negativity effect (Dong and Ni 2014). On the one hand, individual investors who are subject to the limited attention constraint may underreact to earnings surprises on the stock message board. On the other hand, the negativity effect suggests that negative information influences individual investors more strongly than positive information, so investors sell stocks and move quickly to a safe haven when they find bad news on the stock message board. The adjusted R2 values are close to 97%, largely because the return of the SHCI is highly correlated^3 with the market risk premium. This is unsurprising given that the Shanghai Stock Exchange is the largest securities exchange in mainland China by market capitalization, so the SHCI reflects the performance of the aggregate Chinese market well (Han and Li 2017).

Table 3. Effect of individual investor sentiment on market returns.

              MODEL 1              MODEL 2              MODEL 3
Intercept     0.000*** (−3.035)    0.000* (−1.853)      0.002* (1.919)
POS                                0.006 (1.514)
NEG                                                     −0.008** (−2.268)
RMRF          0.922*** (92.950)    0.918*** (82.961)    0.912*** (84.730)
SMB           −0.021 (−1.241)      −0.021 (−1.227)      −0.024*** (−1.401)
HML           0.119*** (5.378)     0.126*** (5.588)     0.127*** (5.722)
Adj-R2        0.9664               0.9666               0.9667
F-statistic   2947.766***          2220.802***          2242.258***

We present the regression results from $R_t = \alpha + \beta_{sent} Sent_t + \beta_{Rp}(R_{m,t} - R_{f,t}) + \beta_{smb} SMB_t + \beta_{hml} HML_t + \varepsilon_t$. Numbers in parentheses are the t-statistics for each coefficient. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. The same applies to the table below.

5.4. Effect of Individual Investor Sentiment on Market Volatility


In this section, we take a further step by examining the effect of individual investor sentiment on market volatility (hypothesis H2). As shown in Table 4, individual investor sentiment is important for market volatility: the coefficient on Pos (Neg) is statistically significant at the 1% level, and the adjusted R2 rises from 30.26% to 32.86% (33.32%) when positive (negative) sentiment is included as an explanatory variable. Pos exhibits a negative relation with market volatility, whereas Neg exhibits a positive relation. In other words, individual investor sentiment has an asymmetric impact on market volatility. This result may be driven by herding and contrarian behavior (Lee, Jiang, and Indro 2002). When the market price declines, individual investor negative sentiment on the stock message board is prone to translate into herding trades, and when the market price rises, individual investor positive sentiment is prone to translate into contrarian trades. Herding behavior increases volatility, whereas contrarian behavior depresses it. In addition, the coefficients on trading size are significantly positive, in line with the notion that trading causes volatility.

Table 4. Effect of individual investor sentiment on stock volatility.

              MODEL 1              MODEL 2              MODEL 3
Intercept     0.001 (0.866)        0.005*** (3.437)     −0.004*** (−2.737)
POS                                −0.021*** (−3.560)
NEG                                                     0.021*** (3.856)
M             0.000 (0.737)        0.000 (0.758)        0.000 (0.790)
NT            9.58E-12** (2.303)   1.22E-11*** (2.936)  7.79E-12** (1.903)
σ(t−1)        0.339*** (6.211)     0.346*** (6.454)     0.333*** (6.237)
σ(t−2)        0.319*** (5.869)     0.301*** (5.613)     0.290*** (5.400)
Adj-R2        0.3026               0.3286               0.3332
F-statistic   34.078***            30.854***            31.491***

We present the regression results from $\sigma_t = \alpha + \beta_{sent} Sent_t + \beta_M M_t + \beta_{NT} NT_t + \sum_{i=1}^{n} \gamma_i \sigma_{t-i} + \varepsilon_t$. The lag length of volatility is selected using the Akaike information criterion.

6. Robustness Test
In this section, additional robustness tests are implemented to support our main findings. First, we allow for the role of emoticons in explaining market returns and volatility. Second, we explore the relationship between lagged individual investor sentiment and stock market returns and volatility. Third, we test whether including positive and negative sentiment simultaneously affects our findings. Lastly, we examine whether the results are sensitive to the choice of volatility estimator. The empirical analyses of the robustness tests are reported in the supplementary document.

7. Conclusion
With the rapid development of social media, investors have shown growing interest in stock message boards, devoting a large amount of time to sharing and reading messages about investment opinions. This makes message boards a powerful source from which to measure individual investor sentiment. Sentiment lexicons are commonly used for sentiment analysis, not only because they permit effective unsupervised classification, but also because they are public and replicable. However, most existing sentiment lexicons are generic and may be ineffective for messages published on stock message boards.
In this paper, we propose a novel algorithm SO-LNPMI for the construction of specialized financial
lexicon. Furthermore, we measure the individual investor sentiment in Guba Eastmoney and examine
its relation to market returns and volatility. The empirical analyses show that: (1) negative sentiment is
negatively correlated with market returns, whereas positive sentiment does not have any statistically
significant impact on market returns; (2) positive sentiment is negatively correlated with market
volatility, whereas negative sentiment is positively correlated with market volatility; (3) comparing to
the negative sentiment, individual investors with positive sentiment are more enthusiastic about the
use of emoticons.
Although we focus on the explanatory value of individual investor sentiment for aggregate market returns and volatility, we believe that this measure can also help explain higher moments of returns, e.g., skewness. We leave this for future study.

Notes
1. http://guba.eastmoney.com.
2. More details about the methodologies and empirical results of the content analysis can be found in the supplementary document.
3. The correlation matrix of the variables is reported in the supplementary document.

Acknowledgments
We would like to thank the anonymous referees and the editor for very helpful suggestions and comments which led to
improvements of our original paper.

Funding
This research was partially supported by the National Natural Science Foundation of P. R. China under Grant Nos.
71471020 and 71850008, Hunan Provincial Natural Science Foundation under Grant No. 2019JJ50650 and Scientific
Research Fund of Hunan Provincial Education Department under Grant No. 18C0221.

Data Availability Statement


The data and code that support the findings of this study are available from the corresponding author upon reasonable request (see https://github.com/VincentWen0320/content-analysis).

References
Affuso, E., and K. D. Lahtinen. 2019. Social media sentiment and market behavior. Empirical Economics 57 (1):105–27. doi:10.1007/s00181-018-1430-y.
Akhtar, S., R. Faff, B. Oliver, and A. Subrahmanyam. 2011. The power of bad: The negativity bias in Australian consumer sentiment announcements on stock returns. Journal of Banking & Finance 35 (5):1239–49. doi:10.1016/j.jbankfin.2010.10.014.
Avramov, D., T. Chordia, and A. Goyal. 2006. The impact of trades on daily volatility. Review of Financial Studies 19 (4):1241–77. doi:10.1093/rfs/hhj027.
Das, S. R., and J. Sisk. 2005. Financial communities. The Journal of Portfolio Management 31 (4):112–23. doi:10.3905/jpm.2005.592103.
Dimpfl, T., and V. Kleiman. 2019. Investor pessimism and the German stock market: Exploring Google search queries. German Economic Review 20 (1):1–28. doi:10.1111/geer.12137.
Dong, Y., and C. Ni. 2014. Does limited attention constrain investors’ acquisition of firm-specific information? Journal of Business Finance & Accounting 41 (9–10):1361–92. doi:10.1111/jbfa.12098.
Farina, V., A. Parisi, and U. Pomante. 2017. Economics blogs sentiment and asset prices. International Journal of Finance & Economics 22 (4):1–11. doi:10.1002/ijfe.1591.
Friedman, M. 1953. The case for flexible exchange rates. In Essays in positive economics, ed. M. Friedman, 157–203. Chicago, IL: University of Chicago Press.
Han, X., and Y. Li. 2017. Can investor sentiment be a momentum time-series predictor? Evidence from China. Journal of Empirical Finance 42:212–39. doi:10.1016/j.jempfin.2017.04.001.
He, Z., L. He, and F. Wen. 2019. Risk compensation and market returns: The role of investor sentiment in the stock market. Emerging Markets Finance and Trade 55 (3):704–18. doi:10.1080/1540496X.2018.1460724.
Huang, C., S. Wen, M. Li, F. Wen, and X. Yang. 2020. An empirical evaluation of the influential nodes for stock market network: Chinese A shares case. Finance Research Letters. doi:10.1016/j.frl.2020.101517.
Huang, Y., H. Qiu, and Z. Wu. 2016. Local bias in investor attention: Evidence from China’s internet stock message boards. Journal of Empirical Finance 38:338–54. doi:10.1016/j.jempfin.2016.07.007.
Kigon, L., and K. Hyeoncheol. 2016. Sentiment analysis using word polarity of social media. Wireless Personal Communications 89 (3):941–58. doi:10.1007/s11277-016-3346-1.
Lee, W. Y., C. X. Jiang, and D. C. Indro. 2002. Stock market volatility, excess returns, and the role of investor sentiment. Journal of Banking & Finance 26 (12):2277–99. doi:10.1016/S0378-4266(01)00202-3.
Li, C., H. Xu, and W. Zhou. 2020. News coverage and portfolio returns: Evidence from China. Pacific-Basin Finance Journal 60:101293. doi:10.1016/j.pacfin.2020.101293.
Li, X., H. Xie, L. Chen, J. Wang, and X. Deng. 2014. News impact on stock price return via sentiment analysis. Knowledge-Based Systems 69 (1):14–23. doi:10.1016/j.knosys.2014.04.022.
Li, Z., M. Tian, G. Ouyang, and F. Wen. 2020. Relationship between investor sentiment and earnings news in high- and low-sentiment periods. International Journal of Finance & Economics. doi:10.1002/ijfe.1931.
Narayan, P. K. 2019. Can stale oil price news predict stock returns? Energy Economics 83:430–44. doi:10.1016/j.eneco.2019.07.022.
Narayan, P. K. 2020. Oil price news and COVID-19—Is there any connection? Energy Research Letters 1 (1). doi:10.46557/001c.13176.
Narayan, P. K., and D. Bannigidadmath. 2017. Does financial news predict stock returns? New evidence from Islamic and non-Islamic stocks. Pacific-Basin Finance Journal 42:24–45. doi:10.1016/j.pacfin.2015.12.009.
Narayan, P. K., D. H. B. B. Phan, and D. Bannigidadmath. 2017b. Is there a financial news risk premium in Islamic stocks? Pacific-Basin Finance Journal 42:158–70. doi:10.1016/j.pacfin.2017.02.008.
Narayan, P. K., K. Ranjeeni, and D. Bannigidadmath. 2017a. New evidence of psychological barrier from the oil market. Journal of Behavioral Finance 18 (4):457–69. doi:10.1080/15427560.2017.1365235.
Novak, P. K., J. Smailović, B. Sluban, and I. Mozetič. 2015. Sentiment of emojis. PLoS ONE 10 (12):e0144296. doi:10.1371/journal.pone.0144296.
Oliveira, N., P. Cortez, and N. Areal. 2016. Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decision Support Systems 85:62–73. doi:10.1016/j.dss.2016.02.013.
Renault, T. 2017. Intraday online investor sentiment and return patterns in the U.S. stock market. Journal of Banking & Finance 84:25–40. doi:10.1016/j.jbankfin.2017.07.002.
Siganos, A., E. Vagenas-Nanos, and P. Verwijmeren. 2014. Facebook’s daily sentiment and international stock markets. Journal of Economic Behavior & Organization 107:730–43. doi:10.1016/j.jebo.2014.06.004.
Smales, L. 2016. Time-varying relationship of news sentiment, implied volatility and stock returns. Applied Economics 48 (51):4942–60. doi:10.1080/00036846.2016.1167830.
Turney, P., and M. Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems 21 (4):315–46. doi:10.1.1.119.3234.
Wen, F., L. Xu, B. Chen, X. Xia, and J. Li. 2019a. Heterogeneous institutional investors, short selling and stock price crash risk: Evidence from China. Emerging Markets Finance and Trade 2019:1–14. doi:10.1080/1540496X.2018.1522588.
Wen, F., L. Xu, G. Ouyang, and G. Kou. 2019b. Retail investor attention and stock price crash risk: Evidence from China. International Review of Financial Analysis 65:101376. doi:10.1016/j.irfa.2019.101376.
Wen, S., Y. Tan, M. Li, Y. Deng, and C. Huang. 2020. Analysis of global remittance based on complex networks. Frontiers in Physics 8. doi:10.3389/fphy.2020.00085.
Yang, X., S. Wen, Z. Liu, C. Li, and C. Huang. 2019. Dynamic properties of foreign exchange complex network. Mathematics 7 (9):832. doi:10.3390/math7090832.
Yang, X., Y. Zhu, and T. Cheng. 2020. How the individual investors took on big data: The effect of panic from the internet stock message boards on stock price crash. Pacific-Basin Finance Journal 59:101245. doi:10.1016/j.pacfin.2019.101245.
Yu, L., J. Wu, P. Chang, and H. Chu. 2013. Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news. Knowledge-Based Systems 41:89–97. doi:10.1016/j.knosys.2013.01.001.
