
How did COVID-19 shape the impact of tweets' sentiment on the stock prices of sports companies?


Karim Derouiche^a, Marius Cristian Frunza^b,c

a Paris-Dauphine University, Paris, France
b Ural Federal University (UrFU), 620002, 19 Mira street, Ekaterinburg, Russia
c Schwarzthal Tech, 231 Business Design Center, London, UK

Abstract

This article aims to study the link between Twitter announcements and the stock prices of sports companies during the COVID crisis. In many instances, news, announcements and social media content affect the evolution of stock prices. This paper assesses the relationship between the sentiment of social media and the evolution of stock prices. The study focuses on companies from the sports sector due to their popularity and the consistent number of followers they have on social networks, which provide a sound basis for analysis. Two aspects are explored: the Granger causality analysis of tweets' sentiment on stock prices and the event study related to the COVID crisis. The approach is implemented for a sample of 18 listed companies in the sports sector.
Keywords: COVID-19, Pandemic, Panic, Sport sector, Juventus, Lazio, Cristiano
Ronaldo, NLP, Tweets, Sentiment analysis, Granger causality, Event study

1. Introduction

Information extraction based on Natural Language Processing (NLP) allows one to assess quickly and in a structured manner data from news, tweets, etc. Sentiment analysis provides an additional feature by categorizing the data into various classes (positive or negative sentiment, bullish or bearish market views, etc.). In many instances, news, announcements and social media content affect the evolution of stock prices. This paper aims to assess the relationship between the sentiment of social media and the evolution of stock prices. The study focuses on companies from the sports sector due to their popularity and the consistent number of followers on social networks, which provide a sound basis for analysis. The research encompasses the causality of tweets' sentiment on stock



prices and the event study related to a significant event. The results cover a sample of 18 listed companies in the sports sector during the COVID-19 pandemic.
The way financial markets incorporate new information has evolved considerably over recent years. Market reactions to news have adapted to the increasing presence of social media. New algorithmic tools have allowed traders to enter a new era of finance, in which human beings and human feeling take less and less place in investment decisions. For example, high-frequency trading represents about 60% of transactions on Euronext and almost 90% on the American markets.
An individual can now broadcast information in real time to all members of his social network through Twitter. The audience generated by Twitter can be considerable and can increase exponentially if the content goes "viral". With more than 400 million active users, Twitter attracts a particularly broad audience. There is a so-called "Twitter effect", which can have a significant impact on financial markets. This phenomenon is often referred to as the "Twitter dictatorship", underscoring that a minister may need to resign because of a wrong tweet that went viral, or a football team may need to change its coach after a bad result.

Financial markets are no exception to this rule. In recent years, more and more investment decisions have been made based on people's opinions shared through social media. The "mass sentiment" becomes a critical factor in understanding price dynamics. On April 23, 2013, a false tweet announcing an attack on the White House induced a massive dip in the leading stock market indices, another illustration of the growing and sometimes problematic influence of social networks on the markets.

Twitter represents a tool to determine the feelings of a population sample through the aggregation of individual opinions. As a result, information extracted from tweets can be used to predict future trends through the behaviour of the population. Microblogging platforms allow the rapid and wide dissemination of content and have grown exponentially. They are now essential tools for the transmission of information and the manipulation of popularity. Undeniably, these networks have changed the daily lives of individuals and of companies seeking to improve their image on social networks. Data from these networks is a new way of studying individual and collective behaviour, in particular the sentiment towards a brand, a product, a



personality or a stock.
Twitter is a social network that allows its users to publish "tweets" (statuses of up to 280 characters), follow other people and communicate through replies or "retweets". Twitter also allows the use of hashtags, which make it possible to follow the tweets relating to specific subjects; this is how "trending topics" are created. The Twitter API (application programming interface) provides search functionality to retrieve the available tweets.

Measuring the "mass sentiment" and assessing its correlation with stock prices remains difficult. Manipulating stock prices through mass sentiment is, moreover, a concerning issue.

News announcements concerning listed companies can have a huge impact on financial markets through stock prices or investor behavior, leading to rapid changes or abnormal effects in financial portfolios. Wysocki (1998) showed that, depending on the quality of the messages, there was a strong positive correlation between the volume of messages posted on discussion boards during the hours when the stock market was closed and the next trading day's volume and stock returns. In 2013, Wang et al. (2013) extended the analysis to stock return volatility. The study found that the sentiment behind the words of financial reports is particularly significant: there is a correlation between certain words and the company's risk. This emphasizes the importance of the words used to predict economic indicators, which makes full sense in a Natural Language Processing analysis.

Investors are constantly looking for new information to make forecasts. The efficient market hypothesis states that stock prices are a function of new information and follow a random walk (Fama, 1970). Recent studies have shown that behavioral economics theories can be used to predict investment decisions, as they take into account the predominant role of emotion in decision-making. More broadly, Prechter Jr et al. (2012) find that the relationship between public mood and trust in government matters to stock market investors, suggesting that public mood over politics is a factor in investment decisions.
However, qualifying this relationship between news and market movements, Klibanoff et al. (1998) assume that investor sentiment is not directly observable, so only retrospective studies can be conducted. Furthermore, Cutler et al. (1988) did not find any significant relationship between market movements (the biggest price changes in the S&P 500) and the publication of political news.

Nowadays, we live in an interconnected world and are constantly looking for the most recent news. Some investors analyze the emotion and the feeling attached to information. The Twitter social network is characterized by the immediacy of its information: it allows a considerable exchange of data on different subjects between different individuals, with about 500 million tweets exchanged every day. Twitter also allows users to provide feedback. In this sense, Java et al. (2007) have shown that, on Twitter, groups form clusters around a common interest. Subsequently, Cha et al. (2010) show that the virality of a post on Twitter is largely due to the emotion it generates.

Twitter is a gateway to sentiment analysis. Meador and Gluck (2009) have explored the opportunities that sentiment analysis offers on platforms like Twitter. They found that sentiment analysis is a tool that can determine how the audience received a film. They also studied sentiment together with the trading volumes of certain companies and found that, for Microsoft and Yahoo, there was a certain correlation between the sentiment towards these companies and their stock trading volumes.

In addition, Go et al. (2009) show that Twitter plays an important role in consumer satisfaction: some companies use the social network to characterize the sentiment of consumers towards their products.

The work of Bollen et al. (2011) is one of the most popular in the field. In their publication, the researchers investigate whether public mood, as measured from a large-scale collection of tweets posted on twitter.com, is correlated with, or even predictive of, DJIA values. They use two main tools to measure variations in the public mood from tweets submitted to the Twitter service from February 28, 2008 to December 19, 2008. The first tool, OpinionFinder, analyzes the text content of tweets submitted on a given day to produce a positive-versus-negative daily statistic of public mood. The second tool, GPOMS, similarly analyzes the text content of tweets to produce a six-dimensional daily statistic of public mood, giving a more detailed view of changes along a spread of various mood dimensions. The resulting public mood statistics are correlated with the Dow Jones Industrial Average (DJIA) to assess their ability to predict changes in the DJIA over time. Their results indicate that the prediction accuracy of standard stock market prediction models is significantly improved when certain mood dimensions are included, but not others. Specifically, variations along the public mood dimensions of Calm and Happiness, as measured by GPOMS, seem to have a predictive effect, but not general happiness as measured by the OpinionFinder tool. The article highlights a variety of biases, such as the non-random choice of tweets.

Mittal and Goel (2012) continued the work of Bollen et al. (2011) by adding neural networks and cross-validation. The study divides the sample into N folds, trains on N−1 of them and tests on the remaining one; this operation is repeated N times. The method achieved accuracy of up to 75% for Dow Jones stocks. They also created a questionnaire to label the analyzed words according to the feelings they convey.

Zhang et al. (2011) analyzed the positive and negative mood of tweets on Twitter and compared it with stock market indices such as the Dow Jones, S&P 500 and NASDAQ. In order to improve the existing methodology, they decided to use mood words, for instance "fear", "worry", "hope", etc., as emotional tags of a tweet. Initially they expected that the correlation between optimistic mood and exchange indicators would be positive, and that the pessimistic mood would correlate negatively. They found a direct correlation for all of them with the VIX, and a correlation with the Dow, NASDAQ and S&P 500. This suggests that people start using more emotional words like hope, fear and worry in times of economic uncertainty, independently of whether they have a positive or negative context. They also showed that the number of retweets could be a better baseline than the number of followers, but that simply taking the whole number of tweets gives the simplest results. The main limitation of the article concerns the sample of tweets used, which the authors considered not large enough.

More recently, Kordonis et al. (2016) use different Natural Language Processing techniques. They develop several machine learning models, such as Naïve Bayes classification and Support Vector Machines, to assign a sentiment (either positive or negative) to tweets. They also show that a change in the public sentiment can affect the stock market. However, Ranco et al. (2015) showed that there is a low Pearson correlation and weak Granger causality between the Twitter volume and the sentiment of 30 companies from the DJIA. They find that the relation holds between peaks of Twitter volume and abnormal returns, which they analyze through event studies.

The sports industry is an economic sector where measuring the relationship between tweets' sentiment and stock prices provides unique insights into how the public mood impacts market valuation. Indeed, the sports sector is fertile ground for assessing and measuring mass sentiment: laypersons can give their opinions on a sporting event through social networks, thereby making it possible to aggregate their views and draw conclusions.
The global lock-down related to the COVID-19 pandemic put a halt to all sport activities across all countries, with the exception of Belarus. Sports fans were left in a hiatus filled by the negative perspectives of the pandemic, and the global sentiment of the sports sector was dominated by bad news. Therefore, studying the tweets' dynamics for sports companies during the COVID pandemic can show how the sentiment of sports fans is reflected in social media. Moreover, the study aims to assess whether this persistent negative sentiment of fans amplified the fall in the market prices of sports clubs and companies.
This study enriches the academic literature covering the impact of social networks on the financial markets. It unveils a new area of research related to the impact of popular opinion on the stock prices of sports companies. The remainder of this article is organized as follows:

• Section 2 describes the methodology for sentiment analysis and the assessment of the relationships with stock prices.

• Section 3 presents the principles of the event study.

• Section 4 describes the compilation process of the dataset used in this study and the construction of the main indexes related to tweets' sentiment and share price returns.

• Section 5 presents the results of the sentiment analysis and of the event study on a relevant sample of listed companies from the sports industry during the COVID outbreak.

• Section 6 concludes.

2. Sentiment analysis

Sentiment and opinion analysis is an NLP topic aiming to extract emotions from text. Regarding the technical research on Natural Language Processing, the academic literature has grown very quickly over the past decade. Pak and Paroubek (2010) present a method to collect a corpus with positive and negative sentiments, and a corpus of objective texts. Their method extracts the polarity of a tweet, whether negative, positive or even neutral, thanks to emoticons. Then, they perform a statistical linguistic analysis of the collected corpus. However, the assumption that a tweet containing an emoticon necessarily carries the same kind of sentiment as the emoticon is questionable. Microblogging has changed a lot since then, and sentiment analysis techniques have evolved considerably. Agarwal et al. (2011) proposed a method based on three different models. The article is now somewhat dated: NLP resources, such as machine learning libraries, have evolved considerably, and the set of available tweets was also limited.

Saif et al. (2012) propose a new method for corpus classification. After several experiments, they conclude that the best classifier is Naive Bayes. A semantic feature is associated with each entity, which refines the sentiment analysis: sentiments are detected far better than with the unigram and Part-of-Speech methods used by authors like Pak and Paroubek (2010).

A few of the main methods of sentiment analysis are summarized below.

2.1. Unsupervised learning


Sentiment and opinion analysis through unsupervised learning is a straightforward approach which does not require prior labeling of the analyzed data. A consistent unsupervised learning algorithm is proposed by Turney (2002) for classifying reviews as recommended or not recommended. This opinion classification method relies on the estimation of the average semantic orientation of the analyzed phrases based on the attributes contained in the phrase. Therefore, a text pre-processing step should be applied to the assessed phrase in order to identify the 'epithets' containing adjectives or adverbs.
A phrase is assumed to have a positive semantic orientation if it has good associations (e.g., "exceptional") and a negative semantic orientation if it has bad associations (e.g., "shady"). Turney's (2002) approach computes the semantic orientation of a phrase as the mutual information between the given phrase and a set of positive words (e.g., "good", "excellent") minus the mutual information between the given phrase and a set of negative words (e.g., "bad", "poor"). The unsupervised sentiment classification procedure can be summarized in the following steps:

• The first step extracts candidate phrases based on the semantics of two consecutive words (i.e., a noun followed by an adjective or vice versa, e.g., 'high profits'). This step is based on Part-of-Speech tagging: two consecutive words whose tags match the suitable patterns are extracted from a given phrase. Table 1 shows a few rules for selecting relevant consecutive words when analyzing the sentiment of a sentence.

First Word             Second Word              Third Word (not extracted)
JJ                     NN or NNS                anything
RB, RBR, or RBS        JJ                       not NN nor NNS
JJ                     JJ                       not NN nor NNS
NN or NNS              JJ                       not NN nor NNS
RB, RBR, or RBS        VB, VBD, VBN, or VBG     anything

Table 1: Part-of-Speech tag patterns used to extract two-word phrases

• In the second step, the Pointwise Mutual Information (PMI) (Church and Hanks, 1990) between two words, w_1 and w_2, is computed as follows:

    PMI(w_1, w_2) = \log_2 \left( \frac{P(w_1 \cap w_2)}{P(w_1) \cdot P(w_2)} \right)    (1)

where P(w_1 \cap w_2) is the probability that w_1 and w_2 co-occur in the same phrase, and P(w_1), P(w_2) are the probabilities of occurrence of the respective words. The ratio between P(w_1 \cap w_2) and P(w_1)P(w_2) measures the degree of statistical dependence between the two words. The log of this ratio is the amount of information captured through the presence of one of the words when we observe the other.

• The third step is to compute the Semantic Orientation (SO) of a phrase X, calculated with the following equation:

    SO(X) = PMI(X, 'buy') − PMI(X, 'sell')    (2)

The words 'buy' and 'sell' are chosen to exemplify the sentiment related to analysts' views concerning a stock: 'buy' conveys a bullish sentiment denoting a positive opinion, and 'sell' a bearish sentiment with a negative forecast.

• The last step is to calculate an index of semantic orientation of the phrases based on a dictionary of positive and negative words w_p and w_n, with sizes N_p and N_n respectively:

    ISO(X) = \sum_{i=1}^{N_p} PMI(X, w_{p_i}) − \sum_{j=1}^{N_n} PMI(X, w_{n_j})    (3)

Once the index is computed, it is possible to classify a phrase as positive if the index is positive and as negative otherwise. A short illustrative sketch of this procedure is given below.
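The following minimal Python sketch illustrates the PMI-based semantic orientation idea on a toy corpus of phrases; the corpus, the seed words 'buy' and 'sell', and the smoothing constant are illustrative assumptions rather than the exact setup used in this paper.

# Illustrative sketch of PMI-based semantic orientation (Turney, 2002),
# computed on a tiny toy corpus. Probabilities are estimated from
# phrase-level co-occurrence counts; a small constant avoids log(0).
import math
from collections import Counter
from itertools import combinations

corpus = [
    "strong buy signal on high profits",
    "analysts recommend buy after excellent results",
    "sell pressure after poor guidance",
    "investors sell on shady accounting",
]

phrases = [set(p.split()) for p in corpus]
word_counts = Counter(w for p in phrases for w in p)
pair_counts = Counter(frozenset(c) for p in phrases for c in combinations(sorted(p), 2))
n = len(phrases)
eps = 1e-6  # smoothing constant (assumption)

def pmi(w1, w2):
    # PMI(w1, w2) = log2( P(w1 and w2) / (P(w1) * P(w2)) ), equation (1)
    p1 = word_counts[w1] / n
    p2 = word_counts[w2] / n
    p12 = pair_counts[frozenset((w1, w2))] / n
    return math.log2((p12 + eps) / (p1 * p2 + eps))

def semantic_orientation(word):
    # SO(X) = PMI(X, 'buy') - PMI(X, 'sell'), equation (2)
    return pmi(word, "buy") - pmi(word, "sell")

for w in ["excellent", "poor", "profits", "shady"]:
    print(w, round(semantic_orientation(w), 3))

On this toy corpus, words co-occurring with 'buy' obtain a positive orientation while words co-occurring with 'sell' obtain a negative one, mirroring the classification rule above.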

2.2. Supervised learning


Supervised learning techniques for sentiment analysis require a labeled training dataset of documents and encompass simple methods like Naïve Bayes as well as more complex approaches, including random forests and support vector machines (SVM).

2.2.1. Naive Bayes-based classifiers


One of the simple methods with a reasonable performance for classifying opinions is the Bayes rule approach. Bayesian classifiers, or Naive Bayesian classifiers, are algorithms trained on a set of labeled documents or phrases.

Figure 1: Naive Bayes classifier: the attributes w_1, ..., w_{n_d} are independent

Let's consider a document d that needs to be classified in terms of opinion based on a finite set of features (words) in n_d dimensions (w_1, w_2, ..., w_{n_d}). The probability of belonging to a class c given those features is P(c | w_1, w_2, ..., w_{n_d}). The appropriate class is given by solving the following problem:

    c_{MAP} = \arg\max_{c \in C} P(c | d) = \arg\max_{c \in C} P(c | w_1, w_2, ..., w_{n_d}) = \arg\max_{c \in C} P(c) \prod_{i=1}^{n_d} P(w_i | c)

Figure 1 shows the relationship between the class and the various words in the document. The words are attributes of the class and they are assumed to be independent. In log space the above equation becomes:

    c_{MAP} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{i=1}^{n_d} \log P(w_i | c) \right]    (4)

A classification function F_c takes the following form:

    F_c = \log P(c = Positive) + \sum_{i=1}^{n_d} \log P(w_i | c = Positive)    (5)
        − \log P(c = Negative) − \sum_{i=1}^{n_d} \log P(w_i | c = Negative)    (6)

Under this formalism a document d is classified in the class c = Positive for F_c > 0 and c = Negative for F_c < 0.
If the classifier encounters a word that has not been seen in the training set, the probability of both classes (positive and negative) would become zero and the classification function would generate an error. This issue can be addressed by Laplacian smoothing:

    P(w_i | c) = \frac{\#(w_i) + k}{(k + 1) \sum_{c} \#(w_i)}    (7)

where k is a constant, usually set to 1, and \sum_{c} \#(w_i) is the sum of all word counts in class c.
Sentiment analysis with Naive Bayes does show acceptable performance in the literature. Indeed, Lewis (1998) and Domingos and Pazzani (1997) show that Naive Bayes classifiers are optimal for certain problem classes, even with highly dependent features.
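As an illustration only, the sketch below implements the log-space classification of equations (4)-(6) on a tiny hand-labeled corpus. It uses the standard add-k variant of Laplace smoothing, which differs slightly in its normalization from equation (7); the corpus and all names are assumptions.

# Minimal sketch of a Naive Bayes sentiment classifier in log space.
import math
from collections import Counter

train = [
    ("great win and excellent performance", "Positive"),
    ("happy with the amazing comeback", "Positive"),
    ("terrible loss and poor defending", "Negative"),
    ("disappointing result and bad management", "Negative"),
]

k = 1.0  # smoothing constant
class_docs = Counter(label for _, label in train)
word_counts = {c: Counter() for c in class_docs}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for c in word_counts for w in word_counts[c]}

def log_posterior(text, c):
    # log P(c) + sum_i log P(w_i | c), with add-k smoothed word probabilities
    logp = math.log(class_docs[c] / len(train))
    total = sum(word_counts[c].values())
    for w in text.split():
        logp += math.log((word_counts[c][w] + k) / (total + k * len(vocab)))
    return logp

def classify(text):
    # F_c > 0 -> Positive, F_c < 0 -> Negative, as in equations (5)-(6)
    f_c = log_posterior(text, "Positive") - log_posterior(text, "Negative")
    return "Positive" if f_c > 0 else "Negative"

print(classify("excellent comeback"))          # expected: Positive
print(classify("poor and disappointing loss")) # expected: Negative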



Figure 2: Augmented Naive Bayes classifier: the attributes w_1, ..., w_{n_d} are interconnected

Naive Bayes is the simplest form of Bayesian network, in which all attributes are independent given the value of the class variable. One approach to enrich a Naive Bayes classifier is to extend its structure to represent explicitly the dependencies among attributes. An augmented naive Bayes (ANB) is an extended naive Bayes in which the class node directly points to all attribute nodes and there exist links among the attribute nodes (Zhang, 2004).

2.2.2. Support Vector Machine


Support Vector Machines were applied to text categorization by Joachims (1998) and benchmarked against other machine learning methods by Pang et al. (2002). We consider a basic case with a categorization set C having only the classes C = {−1, +1}, corresponding to negative and positive sentiment documents. Let's assume the following linear sentiment prediction form for any data point x_i:

    f(x_i) = d \cdot x_i + b, \quad f(x_i) \in C    (8)

where a value of −1 for f(x_i) indicates one class and a value of +1 the other class.
Consider two training samples, Tr_+ and Tr_−, corresponding to documents previously labeled as positive and negative (Frunza (2015)). The Support Vector Machine finds a hyperplane that separates the two sets with maximum margin (i.e., the largest possible distance from both sets). This search corresponds to a constrained optimization problem: letting c_j \in \{+1, −1\} be the correct class of document x_j, the parameters d and b are found by minimizing the following expression:

    F(d, b) = \frac{1}{2} \|d\|^2 + \alpha \sum_{j \in (+,−)} \max(0, 1 − c_j (d \cdot x_j + b))    (9)

where c_+ = +1 and c_− = −1.


Pang et al. (2002) show that SVM algorithms outperform Naive Bayes classifiers.
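A minimal sketch of an SVM text classifier in this spirit is given below, using scikit-learn's TF-IDF features and a linear SVM; the tiny labeled corpus is purely illustrative and would be replaced by labeled tweets in practice.

# Minimal sketch: TF-IDF bag-of-words features followed by a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = [
    "great win and excellent performance",
    "happy with the amazing comeback",
    "terrible loss and poor defending",
    "disappointing result and bad management",
]
labels = [+1, +1, -1, -1]  # +1 positive, -1 negative

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["what an excellent comeback"]))          # expected: [1]
print(model.predict(["such a poor and disappointing loss"]))  # expected: [-1]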

2.3. VADER model


A model that marked a first turning point is that of Hutto and Gilbert (2014): VADER (Valence Aware Dictionary and sEntiment Reasoner). The authors wanted to leverage parsimonious rule-based models to build a sentiment analysis program that works with the new types of messages found on social networks, a model that does not require training data and is generalizable and fast enough to be used on streaming data.

The construction of the model was based on three main stages. The first one is the development of a sentiment analysis lexicon which is sensitive both to the polarity and to the intensity of the sentiment. Drawing on well-established sentiment word banks (LIWC, ANEW and GI), the authors created a list in which they incorporate many of the common lexical characteristics of sentiment expression in microblogs: emoticons and slang, for example, or even acronyms and initialisms (like LOL and WTF, two intense sentiment acronyms). This process gave them a lexicon of more than 9,000 lexical features.

Then, using a "wisdom of the crowd" approach (Surowiecki (2004)), they obtained a point estimate of the sentiment intensity of each context-free feature of the lexicon, based on 90,000 collected ratings. These ratings range from −4 (very negative) to +4 (very positive). Of the 9,000 features, 7,500 were retained: those with a non-zero average score and a standard deviation of less than 2.5. For example, the word "brave" has a positive intensity of 2.4, "excellent" has 2.7 and "catastrophe" has −3.4.

The second stage is the identification and evaluation of grammar and syntax rules that affect the intensity of sentiment. The authors analyzed a targeted sample of 400 positive tweets and 400 negative tweets, selected from a larger initial set of 10,000 random tweets from Twitter's public timeline based on their sentiment scores from the Pattern sentiment analysis engine. Two human experts individually examined the 800 tweets and independently assessed their sentiment intensity on a scale of −4 to +4.

Using a coding technique similar to Strauss and Corbin (1998), the authors applied qualitative analysis techniques to identify the properties and characteristics that affect the intensity of sentiment in a text. This in-depth qualitative analysis made it possible to isolate five heuristics: punctuation, capitalization, degree adverbs, the contrastive conjunction 'but', and the three words preceding a sentiment-laden feature.

The last stage is the comparison of performance with different existing models. The compound score is calculated by adding the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to lie between −1 (the most extreme negative) and +1 (the most extreme positive). The VADER lexicon performs very well in the social media domain.
VADER is a tool in competition with TextBlob: both use a lexicon-based method. TextBlob is, however, an open-source tool that allows greater flexibility. The results of the two tools are quite similar.
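For illustration, the short sketch below scores the same example tweet with both tools, assuming the vaderSentiment and textblob packages are installed; the example text is an assumption.

# Comparing VADER and TextBlob scores on the same tweet.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

tweet = "The comeback was AMAZING!!! but the defending was poor :("

# VADER: rule-based compound score in [-1, 1], sensitive to punctuation,
# capitalization, emoticons and the contrastive conjunction "but"
vader_scores = SentimentIntensityAnalyzer().polarity_scores(tweet)
print("VADER:", vader_scores)

# TextBlob: lexicon-based polarity in [-1, 1] and subjectivity in [0, 1]
blob = TextBlob(tweet)
print("TextBlob polarity:", blob.sentiment.polarity)
print("TextBlob subjectivity:", blob.sentiment.subjectivity)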



3. Event study methodology

An event study is a statistical method to assess the impact of an event on the share price of a listed company. Event study methodologies include a large and varied panel of techniques, from simple stock-return-based analyses to more sophisticated multi-factor models including traded volumes and sentiment indexes from unstructured data. Broadly speaking, an event study method includes the following steps (Skrepnek and Lawson (2001)):

1. Identify the event date(s) of interest


2. Define the event window
3. Establish the model for the Security Price Returns
4. Estimate model parameters
5. Compute the abnormal returns
6. Conduct relevant statistical tests

The identification of the event date(s) of interest and of the event window is summarized in Figure 3 (MacKinlay (1997)). The main moments are the beginning and the end of the event period, T_1 and T_2 respectively. The event period is a window of a few days around the time recent news or information arrives. T_0 indicates the start of the pre-event period, which serves for the estimation of the model corresponding to the normal returns. The post-event period after the moment T_2 can be used to assess the impact of the event in the longer term.

Figure 3: Event study timeline. The main moments are the beginning and the end of the event period, T_1 and T_2 respectively. The event period is a window of a few days around the time new information arrives. T_0 indicates the start of the pre-event period serving for the estimation of the model, corresponding to the normal returns. The post-event period after the moment T_2 can be used to assess the impact of the event in the longer term. (MacKinlay (1997))

3.1. Event study models


Several models for assessing the abnormal return have appeared over the past decades, with the final aim of evaluating the dynamics of the excess returns:

    AR_{i,t} = R_{i,t} − \bar{R}_{i,t}, \quad t \in [T_1, T_2]    (10)

where AR_{i,t} is the abnormal return of firm i at event date t, R_{i,t} is the observed return of firm i at event date t, and \bar{R}_{i,t} is the normal return of firm i conditioned on the information prior to the start of the event (T_1).

For determining \bar{R}_{i,t}, several approaches are proposed in the academic literature and are presented below:

• The mean-adjusted return model discussed by Brown and Warner (1980) assumes that the returns of a firm oscillate around a mean value \mu_i:

    R_{i,t} = \mu_i + \epsilon_{i,t}, \quad E[\epsilon_{i,t}] = 0, \quad \epsilon_{i,t} \sim N(0, \sigma_i)    (11)

where \epsilon_{i,t} are normally distributed innovations. Despite being plainly specified, the model was reported as robust under several conditions and it could outperform more advanced methods.

• The market-adjusted return model considers that the return of the stock is equal to the return of the market index:

    R_{i,t} = R_{m,t} + \epsilon_{i,t}, \quad E[\epsilon_{i,t}] = 0, \quad \epsilon_{i,t} \sim N(0, \sigma_i)    (12)

where R_{m,t} is the return on the market index during period t.

• The market model is the most commonly used in the event study literature (a short estimation sketch is given after this list):

    R_{i,t} = \alpha_i + \beta_i \cdot R_{m,t} + \epsilon_{i,t}    (13)

where \alpha_i and \beta_i are estimated through a linear regression. For time series with auto-correlation and heteroskedasticity, appropriate methods should be employed in the estimation process.

• The CAPM model leverages the typical capital asset pricing model through a time-series regression based on realized returns:

    (R_{i,t} − r_{f,t}) = \alpha_i + \beta_i (R_{m,t} − r_{f,t}) + \epsilon_{i,t}    (14)

where r_{f,t} is the risk-free rate at moment t.

• The multi-risk-factor model proposes a multivariate regression for modelling the returns. The model introduced by Fama and French (1993) improves on the univariate CAPM model:

    (R_{i,t} − r_{f,t}) = \alpha_i + \beta_{i,m} \cdot (R_{m,t} − r_{f,t}) + \beta_{i,SMB} \cdot SMB_t + \beta_{i,HML} \cdot HML_t + \epsilon_{i,t}    (15)

where \beta_{i,m}, \beta_{i,SMB} and \beta_{i,HML} are the model parameters, SMB_t is the excess return of small over big stocks measured by market capitalization, and HML_t is the excess return of stocks with a high book-to-market ratio over stocks with a low book-to-market ratio at moment t.

• The volume-driven models: Wong (2002) also proposed studying the trading volume surrounding news announcements, as in many cases high trading volumes are associated with the release and reception of information:

    \ln(1 + V_{i,t}) = \ln(1 + V_{m,t}) + \ln(1 + V_{i,t−1}) + \ln(1 + V_{i,t−2}) + \gamma_i Day_{i,t}    (16)

where V_{i,t} is the traded volume of firm i at moment t, V_{m,t} is the market turnover volume at time t, and Day_{i,t} are weekday dummy variables which equal one for firm i if trading took place on that day and zero otherwise.

• The news-sentiment-driven models enhance the market model (Delort et al. (2009), Siering (2013)) with factors based on the sentiment analysis of web-based news concerning the firm:

    R_{i,t} = \alpha_i + \beta_i \cdot R_{m,t} + \gamma_i Sentiment_{i,t} + \epsilon_{i,t}    (17)

where Sentiment_{i,t} is a sentiment index for firm i at moment t.
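As an illustration of the market model of equation (13) and of the abnormal returns it implies, the sketch below estimates alpha_i and beta_i by OLS over an assumed estimation window and cumulates the abnormal returns over an assumed event window; the data are simulated and the window bounds are illustrative.

# Market model estimated on the pre-event window, abnormal returns on the event window.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# df is assumed to hold daily log-returns for the firm and the market index
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "firm":   rng.normal(0, 0.02, 120),
    "market": rng.normal(0, 0.015, 120),
})
estimation = df.iloc[:100]   # pre-event window [T0, T1] (assumption)
event = df.iloc[100:110]     # event window [T1, T2] (assumption)

# OLS estimation of alpha_i and beta_i on the estimation window
X = sm.add_constant(estimation["market"])
fit = sm.OLS(estimation["firm"], X).fit()
alpha, beta = fit.params["const"], fit.params["market"]

# Abnormal returns (equation (18)) and their cumulative sum (CAR) on the event window
ar = event["firm"] - (alpha + beta * event["market"])
car = ar.cumsum()
print(car.iloc[-1])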

3.2. Statistical tests


The model built allows us to measure the abnormal returns during the event period (test period) as the difference between the observed returns and those implied by the model. With a time series of abnormal returns AR_{i,t}, a few metrics can be built that serve as inputs for the relevant statistics. The aim of such a test is to assess whether the panel of abnormal returns is significantly different from zero; in other words, whether an agent knowing about the event would be able to make a profit by placing the appropriate trades.

3.2.1. Abnormality metrics


For the market model, the abnormal returns at moment t are expressed by the equation:

    AR_{i,t} = R_{i,t} − (\hat{\alpha}_i + \hat{\beta}_i \cdot R_{m,t}), \quad t \in [T_0, T_2]    (18)

where \hat{\alpha}_i, \hat{\beta}_i are the estimates of the parameters of the market model.

The average abnormal return during day t for the full sample of stock returns in a portfolio with N firms is

    AAR_t = \frac{1}{N} \sum_{i=1}^{N} AR_{i,t}    (19)

The cumulative abnormal returns (CAR) are computed as the sum of abnormal returns across time at firm level or cross-sectionally:

    CAR_{i,(T_1,T_2)} = \sum_{t=T_1}^{T_2} AR_{i,t}    (20)

    CAAR_{(T_1,T_2)} = \frac{1}{N} \sum_{i=1}^{N} CAR_{i,(T_1,T_2)}    (21)

To make the abnormal returns comparable across a time window or a portfolio, the Standardized Abnormal Return is computed at a moment t as:

    SAR_{i,t} = \frac{AR_{i,t}}{S_{AR_{i,t}}}    (22)

where the adjusted standard error is computed as:

    S_{AR_{i,t}} = \hat{\sigma}_{AR_i} \sqrt{1 + \frac{1}{T_1 − T_0} + \frac{(R_{m,t} − \bar{R}_m)^2}{\sum_{s=T_0}^{T_1} (R_{m,s} − \bar{R}_m)^2}}

    with \quad \hat{\sigma}^2_{AR_i} = \frac{1}{T_1 − T_0 − 2} \sum_{s=T_0}^{T_1} (AR_{i,s})^2

and \bar{R}_m = \frac{1}{T_1 − T_0} \sum_{s=T_0}^{T_1} R_{m,s} is the mean rate of return of the market index over the estimation period. It can be noticed that \hat{\sigma}^2_{AR_i} is the variance estimated from the abnormal returns of the estimation window, while S_{AR_{i,t}} is the standard deviation adjusted over the event window.
The standardized abnormal return for firm i cumulated over the time horizon of the event window [T_1, T_2] is:

    CSAR_{i,(T_1,T_2)} = \sum_{t=T_1}^{T_2} \frac{AR_{i,t}}{S_{AR_{i,t}}}    (23)

and the standard deviation of CSAR is:

    S_{CSAR_i} = \sqrt{(T_2 − T_1 + 1) \frac{T_1 − T_0 − 2}{T_1 − T_0 − 4}}    (24)


3.2.2. Hypothesis tests
One of the usual tests is Patell's standardized residual test (Patell (1976)). It can be applied for a one-day period or for longer periods. For a single date, the null hypothesis is defined as:

• H_0: SAR_{i,t} = 0, stating that the standardized abnormal return is equal to zero.

• H_a: SAR_{i,t} ≠ 0

The test follows the t-statistic

    SAR_{i,t} = \frac{AR_{i,t}}{S_{AR_{i,t}}} \rightarrow t(T_1 − T_0 − 2) \quad (N(0,1) \text{ for a big estimation window})    (25)

For the full test window and cross-sectionally for the portfolio with N firms, the null hypothesis:

• H_0: E(SCAR) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{CSAR_{i,(T_1,T_2)}}{S_{CSAR_i}} = 0, stating that the cumulated standardized abnormal return is equal to zero.

• H_a: E(SCAR) ≠ 0

follows the t-statistic:

    T = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{CSAR_{i,(T_1,T_2)}}{S_{CSAR_i}} \rightarrow N(0, 1)    (26)

A cross-sectional test aims to assess the impact of an event on a portfolio of stocks over the test period. The null hypothesis of the cross-sectional test is defined as:

• H_0: CAAR_{(T_1,T_2)} = 0, stating that the cumulative average abnormal return is equal to zero.

• H_a: CAAR_{(T_1,T_2)} ≠ 0, implying that there is a significant total average return following the event.

The test statistic has the following expression and tends towards the normal distribution:

    T = \frac{CAAR_{(T_1,T_2)}}{\hat{\sigma}_{CAAR_{(T_1,T_2)}}} \rightarrow N(0, 1)    (27)

where the standard deviation estimator of the test is

    \hat{\sigma}_{CAAR_{(T_1,T_2)}} = \frac{1}{\sqrt{N}} \sqrt{\sum_{i=1}^{N} (CAR_{i,(T_1,T_2)} − CAAR_{(T_1,T_2)})^2}    (28)

A short numerical sketch of this cross-sectional test is given below.
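A minimal sketch of the cross-sectional test of equations (27)-(28): given one CAR per firm over the event window, it tests whether the average cumulative abnormal return differs from zero; the CAR values used here are illustrative and the normal limiting distribution is used for the p-value.

# Cross-sectional CAAR test (equations (27)-(28)).
import numpy as np
from scipy import stats

car = np.array([0.022, 0.279, -0.368, -0.235, 0.035, -0.078, -0.201])  # CAR_i,(T1,T2), illustrative
n = len(car)

caar = car.mean()
sigma_caar = np.sqrt(np.sum((car - caar) ** 2)) / np.sqrt(n)  # equation (28)
t_stat = caar / sigma_caar                                    # equation (27)
p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))               # two-sided p-value

print(t_stat, p_value)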



4. Dataset and methodology

4.1. Dataset presentation


We focus our analysis on the sentiment of tweets concerning listed sports companies. We explore the relationship between the content of tweets, their implied sentiment and indicators of market performance. For each company included in this study, two types of data are collected:

1. A set of tweets containing the name of the company, published over the period of interest. We collected a total of nearly 400,000 tweets on these sports entities, which we analyze together with the stock market prices associated with them.
2. Time series of daily close prices for the share of the company. We collected a total of 18 time series from Yahoo Finance.

4.1.1. Tweets’ dataset


The Tweepy module allows us to easily query the Twitter API to retrieve tweets, while the TextBlob module allows us to analyze their text. We build a tokenization algorithm that extracts the essential words of each tweet; the noise from short words without meaning is then removed. This approach is applied consistently to the corpus of collected tweets, thereby creating a dataset including, for each tweet, the text and the corresponding vector of tokens. Table 2 shows an excerpt of the tweets dataset regarding WWE, with, for each record, the text of the tweet, the date of publication, the length and the vector of tokens; an illustrative sketch of this step is given after the table.



Tweet Day Length Token
Happy Birthday to @WWE Hall of Famer, 2018-07-30 61 [’happy’, ’birthday’, ’to’, ’wwe’, ’hall’, ’of’,
Arnold @Schwarzenegger! ’famer’, ’arnold’, ’schwarzenegger’]
#mondaynightraw Con mi Chino @wwe en la 2018-07-30 83 [’mondaynightraw’, ’con’, ’mi’, ’chino’, ’wwe’,
AAA Miami en @JohnCena Mode Never Give ’en’, ’la’, ’aaa’, ’miami’, ’en’, ’johncena’,
Up ’mode’, ’never’, ’give’, ’up’]
DAMN! He got the beat down of his life! This 2018-07-30 98 [’damn’, ’he’, ’get’, ’the’, ’beat’, ’down’, ’of’,
shit puts the WWE to shame and I absolutely ’his’, ’life’, ’this’, ’shit’, ’put’, ’the’, ’wwe’, ’to’,
loved it! ’shame’, ’and’, ’i’, ’absolutely’, ’love’, ’it’]
13 year old me is bursting. At WWE Raw in 2018-07-30 49 [’13’, ’year’, ’old’, ’me’, ’be’, ’burst’, ’at’,
Miami! ’wwe’, ’raw’, ’in’, ’miami’]
How about... a throwback?! @AlexaB- 2018-07-30 55 [’how’, ’about’, ’...’, ’a’, ’throwback’, ’alexab-
liss WWE @CarmellaWWE liss wwe’, ’carmellawwe’]
Watch @DrHugeShow’s broadcast: 2018-07-30 112 [’watch’, ’drhugeshow’, ”’s”, ’broadcast’,
#DrHUGEshow #RAW Preshow #WWE ’drhugeshow’, ’raw’, ’preshow’, ’wwe’, ’roman-
#RomanReigns #BrockLesnar #Announce- reigns’, ’brocklesnar’, ’announcements’, ’allin’,
ments #ALLIN fo ’fo’]
Updated: Manchester United vs. 2018-07-30 237 [’updated’, ’manchester’, ’united’, ’vs.’, ’real’,
Real Madrid: Everything you need ’madrid’, ’everything’, ’you’, ’need’, ’to’,
to know about Tuesday’s match ’know’, ’about’, ’tuesday’, ’s’, ’match’, ’http’,
https://www.miamiherald.com/sports/mls/ ’//www.miamiherald.com/sports/mls/article215766435.html’,
article215766435.html @MiamiHerald @her- ’miamiherald’, ’heraldsports’, ’intcham-
aldsports @IntChampionsCup @RealMadrid pionscup’, ’realmadrid’, ’manutd’, ’wwe’,
@ManUtd @WWE @realmadriden ’realmadriden’]
@WWE #RAW is about to get #TooSweet 2018-07-30 35 [’wwe’, ’raw’, ’be’, ’about’, ’to’, ’get’,
’toosweet’]
Jericho in WWE 2018-07-30 14 [’jericho’, ’in’, ’wwe’]
Tune in to Monday Night Raw Live 2018-07-30 190 [’tune’, ’in’, ’to’, ’monday’, ’night’, ’raw’, ’live’,
Tonight on @USA Network at 8p eastern as ’tonight’, ’on’, ’usa network’, ’at’, ’8p’, ’east-
@TheCoachrules joins @MichaelCole &amp; ern’, ’as’, ’thecoachrules’, ’join’, ’michaelcole’,
@WWEGraves to call all the @WWE action, ’amp’, ’wwegraves’, ’to’, ’call’, ’all’, ’the’,
live from Miami, FL #RAW #CoachsCrew ’wwe’, ’action’, ’live’, ’from’, ’miami’, ’fl’,
’raw’, ’coachscrew’]

Table 2: Example of Tweets from our dataset: Tweet text, Date, Length and Tokens
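A minimal sketch of the tokenization step applied to one record of Table 2, assuming the tweets have already been retrieved (e.g., via Tweepy); the stop-word list is an illustrative assumption and not the exact filter used in our pipeline.

# Tokenizing a tweet with TextBlob, mirroring the Token column of Table 2.
from textblob import TextBlob

STOPWORDS = {"a", "an", "the", "is", "in", "on", "at", "of"}  # illustrative list

def tokenize(tweet_text):
    # Lower-case, lemmatize and drop noise words
    words = TextBlob(tweet_text.lower()).words.lemmatize()
    return [w for w in words if w not in STOPWORDS and len(w) > 1]

record = {
    "Tweet": "Happy Birthday to @WWE Hall of Famer, Arnold @Schwarzenegger!",
    "Day": "2018-07-30",
}
record["Length"] = len(record["Tweet"])
record["Token"] = tokenize(record["Tweet"])
print(record)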

To analyze the impact of COVID-19 on the sports industry, we divide our study into two parts: a first part focused on sentiment analysis and a second on the event study. For the sentiment analysis part, Table 3 summarizes the dataset of tweets collected over the COVID period from January 1 to May 30, 2020.



Company Number of Tweets
Activision 9,352
Alliance MMA 55
AS Roma 5,867
Dick's Sporting Goods 3,930
EA 12,019
Foot Locker 6,133
JD Sports 5,524
Juventus 11,183
Ladbrokes 4,889
Lazio 8,594
Madison Square 5,759
Manchester United 12,019
MGT Capital 247
Nike 12,019
Quick 12,294
Speedway Motorsports 258
Wanda Sports 610
WWE 12,019

Table 3: Dataset of tweets collected over the COVID period from January 1 to May 30 2020.

Then, for the event study part, we selected three different analysis dates.

• March 10, 2020 is the date of the announcement of the start of the lock-down in Italy. We use this date for the analysis of Juventus, AS Roma and Lazio Roma.

• March 24, 2020 is the corresponding date for the UK. We use this date for Manchester United.

• We analyze the rest of the companies as of March 21, 2020, which is the date of the lock-down in the US.

The TextBlob package for Python is a convenient way to perform many Natural Language Processing (NLP) tasks. It relies on a sentiment lexicon that gives a polarity and a subjectivity score. We created a utility function to classify the sentiment of a given tweet using TextBlob's sentiment method. If the computed polarity is above 0, we consider that the tweet is positive; if it is equal to 0, we consider it neutral; otherwise, we consider the tweet to be negative. Based on this classification, we compute the following metrics for each day:

• P_Tweet(t) = the number of positive tweets at day t;

• N_Tweet(t) = the number of negative tweets at day t.

Figure 4 shows a histogram of the polarity of all WWE (World Wrestling Entertainment) tweets. We can see that there is a majority of positive tweets.

Figure 4: Histogram of the polarity of all WWE (World Wrestling Entertainment) tweets. There is a majority of positive tweets; neutral tweets are those which are neither positive nor negative.

To study the relationship between stock prices and tweets' sentiment over a period of time, it is crucial to build a time series of daily sentiment indicators. Therefore we build and compute two sentiment indicators as follows:

• ScoreAbs_Twitter(t), the Absolute Sentiment Score at day t:

    ScoreAbs_{Twitter}(t) = P_{Tweet}(t) − N_{Tweet}(t)    (29)

• ScoreRel_Twitter(t), the Relative Sentiment Score at day t:

    ScoreRel_{Twitter}(t) = \frac{P_{Tweet}(t) − N_{Tweet}(t)}{P_{Tweet}(t) + N_{Tweet}(t)}    (30)


Also, although available, we do not use the number of neutral messages, as we believe that the clear-cut polarities (positive and negative) are more meaningful for analyzing market price swings.
Figures 5 and 6 depict the evolution of the Absolute Sentiment score and the Relative Sentiment score computed for WWE tweets between June 2018 and April 2019. The Relative Sentiment score is bounded between −1 and 1; this normalization allows comparison with time series of stock price returns. A short sketch of how these daily indicators can be computed is given after the figures.

Figure 5: Evolution of the Absolute Sentiment score computed for WWE tweets between June 2018
and April 2019.



Figure 6: Evolution of the Relative Sentiment score computed for WWE tweets between June 2018
and April 2019.
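The sketch below illustrates, on a handful of made-up tweets, how the TextBlob polarity and the daily indicators of equations (29) and (30) can be computed; the column names and example texts are assumptions.

# Daily Absolute and Relative Sentiment Scores from per-tweet polarities.
import pandas as pd
from textblob import TextBlob

tweets = pd.DataFrame({
    "day": ["2020-03-09", "2020-03-09", "2020-03-10", "2020-03-10", "2020-03-10"],
    "text": ["Great match!", "What a win", "Season suspended...", "Terrible news", "Stay safe"],
})
tweets["polarity"] = tweets["text"].apply(lambda t: TextBlob(t).sentiment.polarity)

# Count positive and negative tweets per day (neutral tweets are ignored)
daily = tweets.groupby("day").agg(
    P=("polarity", lambda s: int((s > 0).sum())),
    N=("polarity", lambda s: int((s < 0).sum())),
)
daily["ScoreAbs"] = daily["P"] - daily["N"]                       # equation (29)
daily["ScoreRel"] = (daily["P"] - daily["N"]) / (daily["P"] + daily["N"])  # equation (30), NaN if a day has only neutral tweets
print(daily)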

4.1.2. Stock prices dataset


For each of the 18 stock price time series, we compute the daily log return, the daily excess log return over the return of the market index, and a volatility index.

• The daily return has the following classic form:

    R_{Daily}(t) = \log(Close_t) − \log(Close_{t−1})    (31)

• The excess log return is computed with respect to the S&P 500 Close:

    R_{Excess}(t) = R_{Daily}(t) − R_{Market}(t)    (32)

• The volatility index is built from the daily High and Low price values:

    Vol(t) = \frac{High(t) − Low(t)}{High(t) + Low(t)}    (33)


Figures 7 and 8 present the evolution of the daily excess log-returns and of the volatility index for WWE share prices, respectively.

Figure 7: Evolution of daily excess of log-returns for WWE share prices



Figure 8: Evolution of the volatility index for WWE share prices

We obtained stock prices through the Yahoo! Finance API. We have the Open, Close, High and Low values for each trading day over the studied period. We then preprocessed the data to make it suitable for our analysis. We had to deal with the missing data for week-ends and holidays, which we fill with a basic approximation:

    Price_{Missing}(t) = \frac{Price_{t−1} + Price_{t+1}}{2}    (34)

A short sketch of these computations is given below.
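A minimal sketch of the indicators of equations (31)-(34) computed from daily OHLC data; the DataFrame layout, the example values and the alignment with the market index are illustrative assumptions.

# Daily log return, excess return and volatility index from OHLC data.
import numpy as np
import pandas as pd

# prices: daily Close/High/Low for the firm; market_close: daily Close of the S&P 500
prices = pd.DataFrame({
    "Close": [45.1, 44.8, np.nan, 46.0, 45.5],
    "High":  [45.9, 45.2, np.nan, 46.4, 46.1],
    "Low":   [44.5, 44.1, np.nan, 45.3, 45.0],
}, index=pd.date_range("2020-03-09", periods=5))
market_close = pd.Series([2972.0, 2882.0, np.nan, 2741.0, 2711.0], index=prices.index)

# Fill single-day gaps with the average of the neighbouring days (equation (34))
prices = (prices.ffill() + prices.bfill()) / 2
market_close = (market_close.ffill() + market_close.bfill()) / 2

r_daily = np.log(prices["Close"]).diff()                                   # equation (31)
r_market = np.log(market_close).diff()
r_excess = r_daily - r_market                                              # equation (32)
vol = (prices["High"] - prices["Low"]) / (prices["High"] + prices["Low"])  # equation (33)
print(pd.DataFrame({"R_excess": r_excess, "Vol": vol}))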
4.2. Methodology
To analyze the relationship between tweets' sentiment and stock prices, two main approaches are employed:

• the Granger causality analysis, aiming to assess whether the tweets' sentiment determines the moves in the share price of a given company;

• the event study, aiming to assess whether an event reflected in the tweets impacts the stock price.

4.2.1. Granger Causality


To perform statistical causality tests between the sentiment indicators of the tweets related to a company and the excess log-returns of that company's share price, we implement the straightforward models depicted in the equations below:

    R_{Excess}(t) = \alpha_0^a + \beta_1^a \cdot ScoreAbs_{Twitter}(t − 1) + \beta_2^a \cdot ScoreAbs_{Twitter}(t) + \epsilon(t)    (35)
    R_{Excess}(t) = \alpha_0^r + \beta_1^r \cdot ScoreRel_{Twitter}(t − 1) + \beta_2^r \cdot ScoreRel_{Twitter}(t) + \epsilon(t)    (36)

The regressions study the relationship between the excess log-returns and the two sentiment indicators (the Absolute and Relative indicators defined above), taken on the same day and with a one-day lag.
When the estimated \beta are significant at a 95% confidence level, we conclude that there is a dependency between the sentiment scores (with a lag) and the stock returns. The regression of the excess log-returns on the Absolute scores does not provide sound results; thus only the results from the Relative sentiment scores are discussed in the following sections.
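The sketch below illustrates the regression of equation (36) with statsmodels on simulated, aligned daily series; the variable names and data are assumptions, while the p-values of beta_1 and beta_2 reported in Tables 4 and 5 come from the actual study data.

# Regressing excess log-returns on lagged and contemporaneous Relative Sentiment Scores.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=100)
score_rel = pd.Series(rng.uniform(-1, 1, 100), index=idx)   # ScoreRel_Twitter(t), simulated
r_excess = pd.Series(rng.normal(0, 0.02, 100), index=idx)   # R_Excess(t), simulated

data = pd.DataFrame({
    "R_excess": r_excess,
    "ScoreRel_lag1": score_rel.shift(1),   # ScoreRel_Twitter(t-1)
    "ScoreRel": score_rel,                 # ScoreRel_Twitter(t)
}).dropna()

X = sm.add_constant(data[["ScoreRel_lag1", "ScoreRel"]])
fit = sm.OLS(data["R_excess"], X).fit()
print(fit.pvalues[["ScoreRel_lag1", "ScoreRel"]])  # p-values of beta_1 and beta_2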

4.2.2. Event study


The aim of an event study is to assess the extent to which security price returns around the time of the COVID outbreak became abnormal. One could ask: if the announcement of a massive market swing affects social media, can the other way around be true? Therefore, the following question comes naturally: can tweets' sentiment amplify a systemic global crisis like that of the COVID-19 pandemic?
A global crisis like that generated by the COVID pandemic affects all sectors of the economy. The sports industry was disrupted in particular by the shutdown of all competitions and the closing of all sports facilities. In less than 24 hours the entire sports industry saw its revenue flows drying up. Thus, the viability of several professional sports and activities was seriously endangered by the crisis.



5. Results

5.0.1. Granger causality


During the COVID period, we study the relationship between the relative sentiment score and the excess log return or the volatility. Table 4 shows these results for the excess log return and Table 5 shows the results for the volatility analysis.

Company β1 (p-value) β2 (p-value)


Activision 0.437016 0.458439
AS Roma 0.805811 0.665792
DSG 0.977472 0.001384
EA 0.218176 0.385306
Foot Locker 0.254013 0.950073
JD Sports 0.528520 0.712782
Juventus 0.265654 0.694285
Ladbrokes 0.557846 0.597324
Lazio 0.469911 0.709752
Madison Square 0.167958 0.242497
Manchester United 0.276008 0.574798
MGT 0.063016 0.124762
Nike 0.786688 0.383539
Wanda Sports 0.642940 0.968665
WWE 0.876652 0.628984

Table 4: Granger causality results for sentiment relative index and excess log-returns during the
COVID crisis



Company β1 (p-value) β2 (p-value)
Activision 0.437337 0.280620
AS Roma 0.971699 0.106705
DSG 0.034492 0.129969
EA 0.454173 0.073929
Foot Locker 0.875189 0.504510
JD Sports 0.244482 0.042872
Juventus 0.990181 0.465317
Ladbrokes 0.007885 0.191019
Lazio 0.356848 0.137067
Madison Square 0.095135 0.159090
Manchester United 0.282127 0.225059
MGT 0.248879 0.604459
Nike 0.110348 0.166274
Wanda Sports 0.874738 0.760987
WWE 0.189074 0.674081

Table 5: Granger causality results for sentiment relative index and volatility during the COVID crisis

We note that the results are not significant. A first explanation is that there is not necessarily a negative feeling towards football clubs because of COVID-19, so their sentiment score does not decrease as much as the price of their shares. Some large companies show fairly significant results (Dick's Sporting Goods and MGTI). There is surely a dependency between the sentiment and the stock market performance indexes that we have chosen, but it is difficult to speak of causality.

5.0.2. Event Study


This analysis aims to study the impact of the lock-down announcements related to the COVID pandemic on the stock prices of companies from the sports industry. Table 6 below presents the results of the COVID-19 event study.



Company CAR (next day) p-value (next day)
Activision 0.022 0.27
AS Roma 0.279 ** 0.01
Juventus -0.368 *** 0.00
Dicks Sporting Goods -0.235 *** 0.00
EA 0.035 ** 0.03
Foot Locker -0.078 ** 0.02
JD Sports -0.201 *** 0.00
Lazio Roma -1.233 *** 0.00
Madison Square -0.058 *** 0.00
Manchester United 0.124 *** 0.00
MGTI 0.154 0.17
Nike 0.11 *** 0.00
Wanda Sports 0.339 *** 0.00
WWE 0.001 0.50

Table 6: Results of the event study for the COVID-19 impact

The results are significant for most of the companies. However, there is no single decisive date, which seems logical given the progressive unfolding of the COVID crisis. We can also see that some companies have a positive cumulative abnormal return; thus, several companies might have been profiting from the crisis.
Figures 9, 10 and 11 depict the event study results, showing the evolution of the CAR around the time of the event for Lazio, Juventus and Foot Locker.



Figure 9: Event study results showing the evolution of the CAR around the time of the Covid lock-
down for Lazio

Figure 10: Event study results showing the evolution of the CAR around the time of the Covid
lock-down for Juventus



Figure 11: Event study results showing the evolution of the CAR around the time of the Covid
lock-down for Foot Locker

6. Conclusions

This paper studies the interaction between people's sentiment and the stock prices of a list of leading sports brands. We use daily tweets to construct a sentiment score towards 18 sports brands. We analyze the dynamics of the interaction between tweets' sentiment, stock returns and volatility to assess whether social media can impact financial markets. We also use event study methodologies to support our hypothesis. We assess the relationship between tweets and stock prices for the full sample of 18 companies over the period of the COVID crisis.

Our findings indicate that the interplay between the sentiment score and the excess log return: (i) is more significant for football clubs; (ii) is more significant with a one-day lag, which suggests a causal relationship; (iii) is less relevant for big companies. This does not allow us to conclude with certainty on the potential influence of tweets on stock prices. Moreover, we show that volatility is not a metric showing dependency on the sentiment score. In a future study, we intend to test volatility against the number of tweets per day.
Finally, we show that the COVID crisis may have had a fundamental impact in terms of event studies. However, we cannot conclude that this crisis has been amplified by the sentiment towards the brands.

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.J., 2011. Sentiment
analysis of twitter data, in: Proceedings of the Workshop on Language in Social
Media (LSM 2011), pp. 30–38.

Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. Journal
of Computational Science 2, 1–8.

Brown, S.J., Warner, J.B., 1980. Measuring security price performance. Journal of
financial economics 8, 205–258.

Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.P., 2010. Measuring user influence
in twitter: The million follower fallacy, in: fourth international AAAI conference on
weblogs and social media.

Church, K.W., Hanks, P., 1990. Word association norms, mutual information, and
lexicography. Computational linguistics 16, 22–29.

Cutler, D.M., Poterba, J.M., Summers, L.H., 1988. What moves stock prices? Tech-
nical Report. National Bureau of Economic Research.

Delort, J.Y., Arunasalam, B., Milosavljevic, M., Leung, H., 2009. The impact of
manipulation in internet stock message boards. International Journal of Banking
and Finance, Forthcoming .

Domingos, P., Pazzani, M., 1997. On the optimality of the simple bayesian classifier
under zero-one loss. Machine learning 29, 103–130.

Fama, E., French, K., 1993. Common risk factors in the returns on stocks and bonds.
Journal of financial economics 33, 3–56.

Fama, E.F., 1970. Efficient capital markets: A review of theory and empirical work*.
The journal of Finance 25, 383–417.



Frunza, M.C., 2015. Solving modern crime in financial markets: Analytics and case
studies. Academic Press.

Go, A., Bhayani, R., Huang, L., 2009. Twitter sentiment classification using distant
supervision. CS224N project report, Stanford 1, 2009.

Hutto, C.J., Gilbert, E., 2014. Vader: A parsimonious rule-based model for sentiment
analysis of social media text, in: Eighth international AAAI conference on weblogs
and social media.

Java, A., Song, X., Finin, T., Tseng, B., 2007. Why we twitter: understanding mi-
croblogging usage and communities, in: Proceedings of the 9th WebKDD and 1st
SNA-KDD 2007 workshop on Web mining and social network analysis, pp. 56–65.

Joachims, T., 1998. Text categorization with support vector machines: Learning with
many relevant features. Springer.

Klibanoff, P., Lamont, O., Wizman, T.A., 1998. Investor reaction to salient news in
closed-end country funds. The Journal of Finance 53, 673–699.

Kordonis, J., Symeonidis, S., Arampatzis, A., 2016. Stock price forecasting via sen-
timent analysis on twitter, in: Proceedings of the 20th Pan-Hellenic Conference on
Informatics, pp. 1–6.

Lewis, D.D., 1998. Naive (bayes) at forty: The independence assumption in information
retrieval, in: Machine learning: ECML-98. Springer, pp. 4–15.

MacKinlay, A.C., 1997. Event studies in economics and finance. Journal of economic
literature , 13–39.

Meador, C., Gluck, J., 2009. Analyzing the relationship between tweets, box-office performance and stocks. Methods.

Mittal, A., Goel, A., 2012. Stock prediction using twitter sentiment analysis. Stanford University, CS229 project report. http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf

Pak, A., Paroubek, P., 2010. Twitter as a corpus for sentiment analysis and opinion
mining., in: LREc, pp. 1320–1326.



Pang, B., Lee, L., Vaithyanathan, S., 2002. Thumbs up?: sentiment classification using
machine learning techniques, in: Proceedings of the ACL-02 conference on Empirical
methods in natural language processing-Volume 10, Association for Computational
Linguistics. pp. 79–86.

Patell, J.M., 1976. Corporate forecasts of earnings per share and stock price behavior:
Empirical test. Journal of accounting research , 246–276.

Prechter Jr, R.R., Goel, D., Parker, W.D., Lampert, M., 2012. Social mood, stock
market performance, and us presidential elections: A socionomic perspective on
voting results. SAGE Open 2, 2158244012459194.

Ranco, G., Aleksovski, D., Caldarelli, G., Grčar, M., Mozetič, I., 2015. The effects of
twitter sentiment on stock price returns. PloS one 10.

Saif, H., He, Y., Alani, H., 2012. Semantic sentiment analysis of twitter, in: Interna-
tional semantic web conference, Springer. pp. 508–524.

Siering, M., 2013. All pump, no dump? the impact of internet deception on stock
markets., in: ECIS, p. 115.

Skrepnek, G.H., Lawson, K.A., 2001. Measuring changes in capital market security
prices: The event study methodology. Journal of Research in Pharmaceutical Eco-
nomics 11, 1–18.

Strauss, A., Corbin, J., 1998. Basics of qualitative research techniques. Sage publica-
tions Thousand Oaks, CA.

Surowiecki, J., 2004. The wisdom of crowds, 2004. New York: Anchor .

Turney, P.D., 2002. Thumbs up or thumbs down?: semantic orientation applied to


unsupervised classification of reviews, in: Proceedings of the 40th annual meeting on
association for computational linguistics, Association for Computational Linguistics.
pp. 417–424.

Wang, C.J., Tsai, M.F., Liu, T., Chang, C.T., 2013. Financial sentiment analysis
for risk prediction, in: Proceedings of the Sixth International Joint Conference on
Natural Language Processing, pp. 802–808.



Wong, E., 2002. Investigation of market efficiency: An event study of insider trading in
the stock exchange of hong kong. Unpublished Thesis, Stanford University, Stanford
.

Wysocki, P.D., 1998. Cheap talk on the web: The determinants of postings on stock
message boards. University of Michigan Business School Working Paper .

Zhang, H., 2004. The optimality of naive bayes. AA 1, 3.

Zhang, X., Fuehres, H., Gloor, P.A., 2011. Predicting stock market indicators through
twitter. Procedia-Social and Behavioral Sciences 26, 55–62.
