Trade the Tweet: Social Media Text Mining and Sparse Matrix Factorization
for Stock Market Prediction

Andrew Sun, Michael Lachanski, Frank J. Fabozzi

PII: S1057-5219(16)30160-0
DOI: 10.1016/j.irfa.2016.10.009
Reference: FINANA 1050

To appear in: International Review of Financial Analysis

Received date: 6 August 2016
Accepted date: 17 October 2016

Please cite this article as: Sun, A., Lachanski, M. & Fabozzi, F.J., Trade the Tweet:
Social Media Text Mining and Sparse Matrix Factorization for Stock Market Prediction,
International Review of Financial Analysis (2016), doi: 10.1016/j.irfa.2016.10.009

This is a PDF file of an unedited manuscript that has been accepted for publication.
As a service to our customers we are providing this early version of the manuscript.
The manuscript will undergo copyediting, typesetting, and review of the resulting proof
before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that
apply to the journal pertain.
ACCEPTED MANUSCRIPT

Trade the Tweet: Social Media Text Mining and Sparse

Matrix Factorization for Stock Market Prediction

Andrew Sun
AJSun Consultants
email: ajsun12@gmail.com

Michael Lachanski
SINSI
email: mlachans@princeton.edu

Frank J. Fabozzi
EDHEC Business School
email: fabozzi321@aol.com

The authors thank StockTwits and Pierce Crosby for providing data. Felix Wong and Mung Chiang provided us the MATLAB code on which our own R adaptation is based. Responsibility for errors resides with the authors.

Abstract

We investigate the potential use of textual information from user-generated microblogs to predict the stock market. Utilizing the latent space model proposed by Wong et al. (2014), we correlate the movements of both stock prices and social media content. This study differs from models in prior studies in two significant ways: (1) it leverages market information contained in high-volume social media data rather than news articles and (2) it does not evaluate sentiment. We test this model on data spanning from 2011 to 2015 on a majority of stocks listed in the S&P 500 index and find that our model outperforms a baseline regression. We conclude by providing a trading strategy that produces an attractive annual return and Sharpe ratio.

1 Introduction

The amount of public information available has dramatically increased since the efficient market hypothesis was first proposed by Fama (1970). In addition to an increase in traditional sources of information – news articles, analyst reports, and earnings statements, for example – there has been a staggering increase in the amount of user-generated content on social media. For example, Twitter reports on its homepage that it has 320 million monthly active users producing about 500 million tweets per day. Kalampokis et al. (2013) review studies exploring this expansion of social media and highlight the predictive power of social media for various applications. Social media data have become a popular source for stock market prediction, and many have explored their relationship with financial markets.

We present a text-analysis-based model for predicting stock prices and then explore the implications of such a model based on a trading strategy that leverages our proposed method. We evaluate our model with in-sample and ex-sample evaluations.

2 Literature Review

2.1 Text Mining from Traditional News Sources

The use of text mining (i.e., the statistical analysis of natural language data) on information sources is now a focal point for researchers. Loughran & McDonald (2011) analyze 10-K filings to develop alternate word lists that better reflect the tone in financial text. They try to identify specific words that contain information relevant to financial markets and link these word lists to returns, trading volume, and other market metrics. Nassirtoussi et al. (2014) summarize studies that focus on leveraging text for predicting asset price movements and review the performance of various text mining methods on various text sources and asset classes.

One of the earliest papers linking quantitative measures of language to stock price prediction is Tetlock (2007). He examines the interactions between media content and stock market activity and finds that investor pessimism can forecast patterns of market activity. Furthermore, Tetlock et al. (2008) utilize the "bag-of-words" scheme to collect all the words

from the Wall Street Journal (WSJ) and the Dow Jones News Service (DJNS) and then classify them as positive or negative using the Harvard-IV-4 psychosocial dictionary. They find that negative words used in the financial press typically forecast low firm earnings and that market prices incorporate textual data from newswire sources with only a slight delay.

Schumaker et al. (2012) also try to evaluate correlations between text and stock price movements using the quantitative textual financial prediction system that the authors developed, the Arizona Financial Text (AZFinText) system. The authors follow a two-step process: sentiment analysis and price prediction. Starting with text data from Yahoo! Finance, they determine whether the text is objective or subjective. Focusing only on the subjective text and making a significant effort to classify sentiment accurately, Schumaker et al. (2012) test their classification against the MPQA Opinion Corpus, a database that contains news articles from a wide variety of news sources that are manually annotated for opinions and other private states such as sentiments, beliefs, and emotions. They report a classification accuracy of 74%. Employing their AZFinText system, they find that price direction was easier to predict for subjective news articles, with 59% accuracy. Moreover, they find that the price decreases for articles classified as positive sentiment 53.5% of the time and increases for articles classified as negative sentiment 52.4% of the time. These results suggest an interesting contrarian strategy for equity traders: sell on good news and buy on bad news.

Mamaysky & Glasserman (2015) show that text data can also be an indicator of market volatility. They aggregate over 360,000 articles on 50 large financial companies between 1996 and 2014 and examine sequences of n words, known as n-grams, classifying each as having positive or negative sentiment. They find that an increase in unusual language of negative sentiment is subsequently followed by increased market volatility (measured by the VIX index) that lasts for several months at a time.

Typically, methods linking statistical language processing to stock market prediction focus on classifying sentiment in text sources as their first step. In contrast, Wong et al. (2014) utilize a methodology that ignores the evaluation of sentiment. Instead, they look at WSJ articles from 2008 to 2011 to first create a dictionary of the top 1,354 words and collect stock prices that correspond to each article from the WSJ for each trading day. Utilizing a

latent factor representation that links term frequencies to the log returns of each stock, they develop a model that predicts the day's closing price when given the articles for the day. The approach developed by Wong et al. (2014) differs from most studies for two main reasons: (1) as noted earlier, the methodology does not try to evaluate sentiment, avoiding any error in classifying positive or negative opinions, and (2) the methodology can predict prices for stocks not mentioned in any WSJ articles. Because of the simplicity and robustness of the methodology, we follow a similar one as described in Section 4.

2.2 Text Mining from Social Media
Researchers have long explored the predictive power of social media data; Kalampokis et al. (2013) summarize studies suggesting how social media can be used for various types of predictions. For example, Google search queries have been used to track influenza-like illnesses, Amazon reviews to predict product sales, and Twitter posts to predict rainfall. One of the earliest empirical studies investigating the effect of social media on the stock market, Antweiler & Frank (2004), focused on applying text from Yahoo! Finance message boards to predict stock market volatility. These studies suggest that predictive indicators can be derived from social media content.

Recently, researchers have explored Twitter as their source for social media content. Although each post or tweet is limited to 140 characters, in aggregate it is believed that the information may provide an accurate representation of public sentiment. Examining tweet analysis and IPO performance, Liew & Wang (2016) find that there is a positive and significant correlation between IPOs' average tweet sentiment and IPOs' first-day returns, not only on the first trading day but also two or three days prior. Further, examining the relationship between tweets and earnings announcements, Liew et al. (2016) report that not only is the consensus earnings estimate from crowdsourced information more accurate (by more than 60%) but also tweet sentiment before the earnings announcement can predict post-announcement risk-adjusted excess returns. Also, Azar & Lo (2016) show that tweets during Federal Open Market Committee (FOMC) meeting dates contain information that can be used to predict stock market returns and to build benchmark-outperforming portfolios.

Another important paper to highlight is that of Liew & Budavri (2016). They use

StockTwits data to show that social media has significant power in explaining the time-series variation in returns. They subsequently propose a sixth "Social Media Factor" for the Fama-French five-factor model that is both distinct from the previous five factors and significant in predicting returns. In our study, we contribute to the literature on text mining for finance by providing the first application of the algorithm from Wong et al. (2014) to social media data from StockTwits at the daily and intraday frequency.

3 Data

In this section, we describe the data and explain how we pre-processed them.

3.1 Text Data from StockTwits
The text data that we use are from StockTwits.com. Founded in 2008, StockTwits® is a financial communications platform targeting participants in the investment community focusing on individual stocks and the stock market. $TICKER tags facilitate the organization and aggregation of information "streams" about equities and markets from across the web. As of 2016, there were over 300,000 users on StockTwits producing streams that are viewed by approximately 40 million people worldwide. Their content can be integrated with many other financial sites, including Yahoo! Finance, CNNMoney, Reuters, TheStreet.com, Bing.com and The Globe and Mail. StockTwits invests considerable effort in filtering out finance-unrelated messages and spam. In our opinion, StockTwits provides both high-quality and large-scale text data for our text mining purposes.

We obtained approximately 45 million messages from StockTwits streams from January 1, 2011, to August 31, 2015. Each data point provides about 40 different features including content, follower count, following count, posted time and tag information. We, however, are only interested in the tweet's text and post time. To take advantage of StockTwits, the streams need to be pre-processed via text mining. We utilize the R library tm to carry out the pre-processing steps. First, we consolidate the streams on a given day into a single large body of text. For intraday experiments, we further separate the data into AM, midday and PM periods.

Figure 1: Plot of per-day number of mentions and price of "oil" and "aapl" against time from 1/1/2011 to 8/31/2015. Panel a: per-day mentions of the word "oil"; Panel b: per-day price of Brent Crude Oil; Panel c: per-day mentions of the word "aapl"; Panel d: per-day price of AAPL.

Once the text data have been consolidated, we clean the text of nonword

terms such as website URLs and emoticons. Next, we remove stop-words (words such as "the" and "it" that may add noise) and punctuation, and keep all text in lowercase for easy comparison. An exploratory analysis reveals trends in the number of times keywords are mentioned. In Figure 1 we examine our StockTwits dataset at a high level by plotting the word counts of the words "oil" and "aapl" and their corresponding prices against time. From the plots, it is not clear whether a relationship between word count and price exists. For "oil", there appears to be a negative correlation between word count and price in 2015. On the other hand, there may or may not be a correlation between the word count and price of "aapl." Loughran & McDonald (2011) suggest that raw word counts may not be the best measure for a word's information content and that special weighting should make text-based analysis more informative.
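A rough Python equivalent of these pre-processing steps may be useful for illustration (the paper uses the R library tm; the stop-word list and regular expressions below are illustrative assumptions, not the authors' actual configuration):

```python
import re

# Toy stop-word list for illustration; the list used with R's tm library is much larger.
STOP_WORDS = {"the", "it", "a", "an", "and", "is", "to", "of"}

def preprocess(messages):
    """Consolidate a day's StockTwits messages into cleaned lowercase tokens."""
    text = " ".join(messages).lower()          # consolidate streams and lowercase
    text = re.sub(r"https?://\S+", " ", text)  # strip website URLs
    text = re.sub(r"[^a-z$ ]", " ", text)      # strip punctuation/emoticons, keep $TICKER tags
    return [w for w in text.split() if w not in STOP_WORDS]

day_stream = ["$AAPL looks strong, buy the dip! http://example.com",
              "Analysts are bullish on oil :)"]
print(preprocess(day_stream))
# -> ['$aapl', 'looks', 'strong', 'buy', 'dip', 'analysts', 'are', 'bullish', 'on', 'oil']
```

In practice the cleaned tokens for each day (or each intraday period) would then be counted against the dictionary to fill one column of the term-document matrix described in Section 4.1.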

Additionally, we note that the use of StockTwits has dramatically increased over the period of our investigation. Panel a of Figure 2 shows that the word counts per day increase at nearly an exponential rate, which can be attributed to the growth of StockTwits' user base. We can especially observe the effect of StockTwits' growth in panel b of Figure 2, where both the average word count per year and the variance in the total word count throughout each year increased. The skew in the word count is most likely a by-product of the high volume of social media text information. Despite StockTwits' "growing pains," we believe that the high-volume nature of the dataset will provide valuable indicators of the market, and we explore methods of normalization that correct the skew, which we outline further in Section 4.

3.2 Stock Price Data
In this paper, we seek to predict the prices of the component stocks of the S&P 500 Index. We examine only stocks traded during our examination period. Furthermore, we remove stocks with low volume such as Berkshire Hathaway B shares (BRK-B). We obtain a final stock list of 420 component stocks.

We obtain the historical close prices of the 420 component stocks for all trading days between January 1, 2011, and August 31, 2015, using the R library quantmod and supplement any missing data through The Center for Research in Security Prices (CRSP). We calculate the log-returns of these prices for all 1,173 trading days and retain them in a matrix where the rows are the component stocks and the columns are the trading days, for a 420 × 1,173 matrix.

For intraday tests, we obtain the price at open, noon and close on all trading days between January 1, 2011, and August 31, 2015, from NYSE Trade and Quote (TAQ). Similarly, we calculate the log-returns of these prices for all 3,519 trading periods. We keep these data in a matrix of similar form, but with dimensions 420 × 3,519.

We note two main differences between the data examined in this paper and previous studies. First, many previous studies use news articles as the source of their text information. More specifically, they look at news sources commonly read in the industry such as the WSJ or DJNS. Correspondingly, news articles can be considered high-quality but low-quantity sources of information (when compared to social media). In contrast, StockTwits streams are low-quality but high-quantity sources of text information. We rely on StockTwits' filtering methods and eliminate nonwords (e.g., emoticons and website URLs) and stop-words to improve the signal we obtain from our dataset. We limit our non-price data to that which can be obtained from social media, in particular, StockTwits.

Figure 2: Plot of word counts on StockTwits from 1/1/2011 to 8/31/2015. Panel a: word counts per day; Panel b: word counts per year.

Second, most studies, when examining social media, utilize Twitter as their primary text source. Although Twitter may appear to be a good data source for our methods, since we do not try to classify user sentiment, large amounts of Twitter content simply add noise to stock market prediction algorithms. For example, the phrases "I love NYC!" and "Analysts are bullish" both contain positive sentiment, but only one is likely to be relevant to the stock market. Therefore, we utilize StockTwits, a dataset very similar to Twitter, to harness the power of social media and target financial content.

4 Methodology

In this section, we describe the methodology of this paper.[1]

4.1 Text Mining

The first important step for our model is the creation of a dictionary of terms through text mining. Our dictionary was created by examining the top words for each year and combining them with the tickers of the 420 stocks. Some tickers were removed from our dictionary due to too few mentions. Sample words from our dictionary include typical indicators such as "buy," "short," and "hold" as well as "aapl" and "spy." After creating a dictionary, we create a term-document matrix, a matrix where the rows correspond to the terms in the dictionary and the columns correspond to the documents. For our data, each "document" is a successive trading day and each entry is the word count for that term. More formally, we let Y denote our term-document matrix and let $y_{i,t}$ indicate the word count for term i on day t.

Various term weighting schemes have been suggested: simple term frequency weighting (tf), term frequency-inverse document frequency (tf-idf) and modified versions such as the one proposed by Loughran & McDonald (2011). We find that a term weighting scheme introduced in Salton & Buckley (1988) works best with our methodology. Since text information on each day may vary in total word count, we normalize the text on each day through the cosine

[1] All code is available for perusal on www.github.com/ajsun/trade-the-tweet. We try to follow the standards of Gentzkow & Shapiro (2014) wherever possible.


normalization (vector length normalization) method, in which each day's text is treated as a vector and divided by its Euclidean norm. If each day's vector of term frequencies is represented as $y_t$, then $y_t / \|y_t\|_2$ is the cosine normalization. From the output of this normalization shown in Figure 3, it can be seen that the normalization has significantly decreased the trend. Note that the larger variance seen in the earlier years of the plot is reasonable, as it is expected that the amount of content produced will stabilize as StockTwits increases in popularity.

Figure 3: Plot of Normalized Term Frequencies

Finally, we standardize our frequencies by term. If we represent the ith term's frequency on day t as $y_{i,t}$, then we standardize $y_{i,t}$ using the standard z-score

$$\frac{y_{i,t} - \bar{\mu}_i}{\bar{\sigma}_i}$$

where $\bar{\mu}_i$ and $\bar{\sigma}_i$ are, respectively, the mean and standard deviation of the term frequencies for term i calculated over all prior days. We remove standardized values that are negative and prune values above three standard deviations. Negative values are removed because, intuitively, terms mentioned fewer times than average should have no effect on the stock price. Values above three standard deviations are pruned to reduce the effect of outliers. This gives us a standardized, normalized and skew-adjusted term-document matrix.
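A minimal Python sketch of the two adjustment steps may help make them concrete (the paper's pipeline is in R; here "pruning" above three standard deviations is implemented as capping, which is one reading of the text, and the function names are our own):

```python
import math

def cosine_normalize(y):
    """Divide a day's term-frequency vector by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y] if norm > 0 else y

def standardize(y_it, prior):
    """z-score one term's frequency against its mean/std over all prior days,
    zeroing negative values and capping at three standard deviations."""
    mu = sum(prior) / len(prior)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in prior) / len(prior))
    if sigma == 0:
        return 0.0
    z = (y_it - mu) / sigma
    return min(max(z, 0.0), 3.0)  # drop negatives, prune outliers

day = [3.0, 4.0]                      # raw counts for two terms on one day
print(cosine_normalize(day))          # -> [0.6, 0.8]
print(standardize(9.0, [1.0, 2.0, 3.0]))  # well above the prior mean -> capped at 3.0
```

Applying `cosine_normalize` column-by-column and `standardize` entry-by-entry yields the skew-adjusted term-document matrix described above.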


4.2 A Sparse Matrix Factorization Model

Our sparse matrix factorization (SMF) model closely follows the methodology of Wong et al. (2014). The SMF model has two favorable qualities for price prediction: (1) the matrices U and W are low rank when the selected $d \ll s$, and (2) the sparse matrix selects only the most relevant parameters for predicting the stock market, minimizing overfitting. Below we outline the model in further detail.

4.2.1 Basic Matrix Factorization Framework

The matrix factorization model that we use maps both text and stocks to a joint latent factor space of dimensionality d. Each stock i is associated with a latent factor vector $u_i \in \mathbb{R}^d$ and the text data on each trading period t is associated with a vector $v_t \in \mathbb{R}^d$. The resulting dot product $u_i^T v_t$ captures the interaction between stock i and trading day t, which can be used to approximate the log return $\hat{r}_{it}$. In other words:

$$\hat{r}_{it} = u_i^T v_t$$

4.2.2 Introduction of Text


We introduce a text vector $y_t$, which contains the word frequencies on trading day t from the adjusted term-document matrix created in Section 4.1. Then, the day's latent text vector $v_t$ is inferred from the term frequencies $y_t$. Given m terms, we create a new matrix $W \in \mathbb{R}^{d \times m}$ that linearly maps $y_t$ to $v_t$. The log return for a given stock i can then be expressed as:

$$\hat{r}_{it} = u_i^T W y_t$$

The goal of the problem then is to learn the feature vectors $u_i$ and the mapping matrix W using the historical data from s days. In matrix form we denote the returns as $R = [r_{it}] \in \mathbb{R}^{n \times s}$, $U \in \mathbb{R}^{n \times d}$, $W \in \mathbb{R}^{d \times m}$, $Y = [y_1 \ldots y_s] \in \mathbb{R}^{m \times s}$. R is the return matrix with stocks as the rows and days as the columns, U is the latent factor matrix relating n stocks to d factors, W is the latent factor matrix relating d factors to m terms, and Y is the term-frequency matrix with terms as the rows and days as the columns. We then can formulate the following objective function:

$$\underset{U \ge 0,\, W}{\text{minimize}} \;\; \frac{1}{2}\,\|R - UWY\|_F^2$$

and solve for U and W.
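As a toy numeric illustration of the prediction equation $\hat{r}_{it} = u_i^T W y_t$ (a Python sketch with made-up values for $u_i$, W and $y_t$; the actual model learns these from data):

```python
def predict_return(u_i, W, y_t):
    """Predict the log return r_hat_it = u_i^T W y_t for one stock.
    u_i: length-d vector; W: d x m matrix (list of rows); y_t: length-m vector."""
    d, m = len(W), len(W[0])
    # v_t = W y_t is the day's latent text vector
    v_t = [sum(W[k][j] * y_t[j] for j in range(m)) for k in range(d)]
    return sum(u_i[k] * v_t[k] for k in range(d))

# Toy example: d = 2 latent factors, m = 3 terms
u_i = [1.0, 0.5]
W = [[0.2, 0.0, 0.0],   # sparse rows: most words carry no weight for a factor
     [0.0, 0.0, 0.4]]
y_t = [1.0, 2.0, 3.0]   # adjusted term frequencies for the day
print(predict_return(u_i, W, y_t))  # approximately 0.8
```

Note how the middle term contributes nothing: with a sparse W, only the words with nonzero weights move the predicted return, which is exactly the motivation for the sparseness constraints in Section 4.2.3.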

4.2.3 Sparseness Constraints

Overfitting is the main problem when solving for U and W. For example, 700 terms (a small dictionary) and only 10 latent factors would yield 7,000 parameters to be estimated. Thus, it is important to include regularization terms in our formulation of the problem to minimize the risk of overfitting. By introducing a sparseness constraint on the matrix W, we ensure that the number of nonzero parameters is small. Intuitively, limiting the number of nonzero parameters makes sense, as not all words may be relevant when predicting the log return of a certain stock. Thus, sparseness constraints address the problem of noisy terms highlighted in Section 3.1.

As such, we introduce sparse group lasso regularization. The sparse group lasso penalty is

$$\lambda \sum_{j=1}^{m} \|W_j\|_2 + \mu \|W\|_1$$

As can be seen, there are two terms in the sparse group lasso. In order to select only a few words for each latent factor, we minimize the first term, where $W_j$ denotes the jth column (word) of the matrix W. This first regularization term ensures that only a small number of the columns of W will be nonzero. We minimize the second term to ensure that each word corresponds with only a few latent factors; it reduces the number of nonzero entries in each column. The optimization problem then becomes:

$$\begin{aligned} \underset{U,\, W}{\text{minimize}} \quad & \frac{1}{2}\,\|R - UWY\|_F^2 + \lambda \sum_{j=1}^{m} \|W_j\|_2 + \mu \|W\|_1 \\ \text{s.t.} \quad & U \ge 0 \end{aligned}$$

Wong et al. (2014) show that this optimization problem can be solved for a local minimum with the alternating direction method of multipliers (ADMM) by rewriting the problem with auxiliary variables A and B:

$$\begin{aligned} \underset{A,\, B,\, U,\, W}{\text{minimize}} \quad & \frac{1}{2}\,\|R - ABY\|_F^2 + \lambda \sum_{j=1}^{m} \|W_j\|_2 + \mu \|W\|_1 + I_+(U) \\ \text{s.t.} \quad & A = U, \; B = W \end{aligned}$$

where $I_+(U) = 0$ if $U \ge 0$ and $I_+(U) = \infty$ otherwise. Then we form the augmented Lagrangian with the Lagrange multipliers C and D:

$$\begin{aligned} L_\rho(A, B, U, W, C, D) = {} & \frac{1}{2}\,\|R - ABY\|_F^2 + \lambda \sum_{j=1}^{m} \|W_j\|_2 + \mu \|W\|_1 + I_+(U) \\ & + \operatorname{tr}(C^T(A - U)) + \operatorname{tr}(D^T(B - W)) \\ & + \frac{\rho}{2}\,\|A - U\|_F^2 + \frac{\rho}{2}\,\|B - W\|_F^2 \end{aligned}$$

We solve this Lagrangian using the ADMM method described in the Appendix. Once we have found matrices U and W, we can generate a prediction for tomorrow's log return given today's text data.

4.3 Training and Testing

In this paper, we predict stock price directions at a daily and intraday frequency. At

the daily level, given StockTwits streams as inputs, we predict the log return at the close of

each trading day. For our intraday tests, we predict the log return at midday and close for

each successive trading day.

4.3.1 Daily Prediction

The dataset is split into training, validation and test sets. The training set comprises 502 trading days spanning January 1, 2011 to December 31, 2012. The validation set contains 252 trading days from January 1, 2013 to December 31, 2013, and the test set contains 252 trading days from January 1, 2014 to December 31, 2014 plus 167 trading days from January 1, 2015 to August 31, 2015, for a total of 419 trading days.

The main variables for our model are:

• n stocks, m terms and s days
• $r_{it}$: the log return of stock i on day t
• $y_{jt}$: the adjusted frequency of word j on day t
• $p_{it}$: the close price of stock i on day t
• $u_i$: latent factor vector for stock i

Our methodology is as follows: on day t, calculate U and W from the historical data $[r_{it'}]$ and $[y_{jt'}]$ where $t' < t$ ($[\cdot]$ denotes the matrix of those entries). Then use the adjusted term frequencies $[y_{jt}]$ on day t to predict $[\hat{r}_{it}]$ for all i. Once the returns $[\hat{r}_{it}]$ have been predicted, we compare the signs of the predictions with the actual returns to evaluate the prediction accuracy for the price direction. Finally, we can recover the price from the return since $p_{it} = p_{i,t-1} e^{\hat{r}_{it}}$.
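The sign-comparison and price-recovery steps above can be sketched in a few lines of Python (the numbers are invented and the helper names are ours, not from the paper's code):

```python
import math

def directional_accuracy(predicted, actual):
    """Fraction of (stock, day) pairs where the predicted log return has the
    same sign as the realized one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    return hits / len(actual)

def recover_price(prev_price, log_return):
    """p_it = p_{i,t-1} * exp(r_hat_it)."""
    return prev_price * math.exp(log_return)

preds   = [0.01, -0.02, 0.005, -0.01]
actuals = [0.03, -0.01, -0.002, -0.02]
print(directional_accuracy(preds, actuals))   # 3 of 4 signs match -> 0.75
print(recover_price(100.0, 0.01))             # close price implied by a 1% log return
```

In the actual walk-forward procedure, U and W would be re-estimated from all data prior to day t before each prediction; the snippet only shows the evaluation step.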


4.3.2 Intraday Prediction

Like the daily prediction, the dataset is split into training, validation and test sets. The training set comprises 1,506 trading periods (open to midday, midday to close, close to open) between January 1, 2011 and December 31, 2012. The validation set contains 756 trading periods between January 1, 2013 and December 31, 2013, and the test set contains 1,257 trading periods between January 1, 2014 and August 31, 2015.

Our methodology for intraday predictions is identical to the methodology for daily predictions; however, for the intraday case, t indexes trading periods rather than trading days.

4.3.3 Hyper-parameters

As part of the model, there are several hyper-parameters that require tuning.

Hyper-parameter   Description
s                 Number of historical days used to predict price on day t
d                 Number of latent factors
λ                 Penalty parameter for the first sparse group lasso term
µ                 Penalty parameter for the second sparse group lasso term
ρ                 Lagrangian penalty parameter

First, we set d = 10 to ensure that W has low rank, with only 10 latent factors. Then we perform a grid search: for each candidate set of hyper-parameters, we compare prediction accuracies on the validation set. Furthermore, λ and µ were selected only in ranges such that the matrix W remained sparse.
5 Results

5.1 In-Sample Results

Given a matrix of returns R for n stocks over the last s days, we learn U and W and predict the return for stock i on day t by using the fact that $\hat{r}_{it} = u_i^T W y_t$, where $y_t$ is the vector of term frequencies on day t. In Table 1, we see that the model is able to predict price direction on the training data set with an accuracy of 70.12%. We also evaluate our in-sample testing using precision and recall, which are defined as

$$Precision = \frac{tp}{tp + fp}, \qquad Recall = \frac{tp}{tp + fn}$$

where tp, fp, tn and fn are the numbers of true positive, false positive, true negative and false negative predictions, respectively. In the context of our model, precision can be seen as a measure of the exactness of our predictions, and recall as a measure of the completeness of our predictions. We obtain a precision of 68.93% and a recall of 75.05%.
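These definitions can be checked with a small Python sketch (toy predictions; "positive" is taken to mean a predicted up move, i.e., a positive log return):

```python
def precision_recall(predicted, actual):
    """Precision = tp/(tp+fp), Recall = tp/(tp+fn), where a 'positive'
    is an 'up' prediction (predicted log return > 0)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p > 0 and a > 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p > 0 and a <= 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p <= 0 and a > 0)
    return tp / (tp + fp), tp / (tp + fn)

preds   = [0.02, 0.01, -0.01, 0.03]
actuals = [0.01, -0.02, 0.02, 0.04]
prec, rec = precision_recall(preds, actuals)
print(prec, rec)  # tp=2, fp=1, fn=1 -> precision 2/3, recall 2/3
```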

In panel a of Figure 4 we see that all stocks have prediction accuracies greater than 60%. Similarly, the dotted line in panel b of Figure 4 represents 50% accuracy, and we can see that our model achieves directional accuracy greater than 50% on most days in our in-sample examination.

            In-sample    Ex-sample Daily         Ex-sample Intraday
Year        2011-2012    2013    2014    2015    2013    2014    2015
Accuracy    70.12        51.18   51.42   51.58   50.23   48.61   49.03
Precision   68.93        54.14   53.72   51.25   53.12   50.80   48.28
Recall      75.05        53.99   46.34   53.41   49.66   49.29   44.60

Table 1: Test Accuracy Comparison (%)

Figure 4: In-Sample Accuracy. Panel a: in-sample accuracy by stock; Panel b: in-sample accuracy by day.

5.2 Ex-Sample Results

Table 1 outlines the prediction accuracy of our model on the test set (2014-2015). We

also report the results of our validation set (2013) for comparison. We compare our SMF

model to other baseline models outlined below.

                   Daily                   Intraday
Model              2013    2014    2015    2013    2014    2015
SMF                51.12   51.42   51.58   50.23   48.61   49.03
Previous Return    49.55   49.65   49.15   48.67   48.07   48.53
Previous Price     48.66   48.16   49.31   46.52   47.38   47.11
AR on Return       50.61   50.80   50.31   49.66   51.04   48.87
Random             49.39   49.74   49.65   48.89   49.07   49.17

* figures in bold indicate largest values for their respective column

Table 2: Model Accuracy (%)

• Previous return/price: returns and prices are predicted to be the same as the previous day's
• Autoregressive (AR) models: autoregressive models used to predict both price and return for each stock
• Random: return predictions based on a market timer making random guesses

Table 2 summarizes the results of our model compared to the baseline models. We see that the SMF model predicts direction more accurately than the baseline models in all years at the daily frequency. Furthermore, the efficient market hypothesis suggests that our model should not do any better than the random test. Thus, even a small percentage-point increase may suggest some significance in our predictive power. We explore our algorithm further in a proposed trading strategy in Section 5.4.

5.3 Intraday Results

In addition to our daily testing, we also extend our tests to encompass predictions at an intraday level. Table 1 reports the results of our intraday testing for our validation (2013) and test (2014-2015) sets. As in our daily tests, we also compare our intraday SMF model to the baseline models. Table 2 summarizes the results of this evaluation. We note that the intraday predictions from the SMF model do not beat the baselines every year, and in fact our model is outperformed by a random guess in 2014 and 2015.


5.4 Trading Strategy

One method of measuring the performance of a portfolio is the Sharpe ratio (SR). For a
given portfolio h, the ex-ante SR of that portfolio is defined as:

\[ SR(h) = \frac{E[r_h] - r_f}{\sigma_h} \]

where E[r_h] is the expected return of the portfolio, r_f is the risk-free return, and σ_h is the
standard deviation of the returns of the portfolio. For our calculations (and for simplicity) we
assume the risk-free return is approximately zero, a reasonable assumption for short-term
rates during the study period. In his working paper, Lachanski (2015) shows that we can
calculate the SR of a market timing strategy using a closed-form expression:

\[ SR = \frac{g}{\sqrt{\kappa - g^2}} \]

where \( \kappa = \frac{E[r_t^2]}{4E[|r_t|]^2} \), \( g = p - \frac{1}{2} \), and p is the probability of the model making a correct
prediction. The value of κ can be estimated using the portfolio's daily excess returns:

\[ \hat{\kappa} = \frac{T \sum_{t=1}^{T} r_t^2}{4 \left( \sum_{t=1}^{T} |r_t| \right)^2} \]
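The closed-form market-timing SR and the plug-in estimate of κ can be sketched in a few lines. The function names and the sample daily returns below are our own illustration, not values from the paper:

```python
import math

def estimate_kappa(returns):
    """Plug-in estimate: kappa_hat = T * sum(r^2) / (4 * (sum |r|)^2)."""
    T = len(returns)
    return T * sum(r * r for r in returns) / (4 * sum(abs(r) for r in returns) ** 2)

def market_timing_sr(p, kappa):
    """Lower-bound SR = g / sqrt(kappa - g^2), where g = p - 1/2 and p is
    the probability of a correct directional call."""
    g = p - 0.5
    return g / math.sqrt(kappa - g * g)

daily = [0.01, -0.02, 0.015, -0.005]       # hypothetical daily excess returns
k = estimate_kappa(daily)                   # = 4 * 0.00075 / (4 * 0.05**2) = 0.3
print(round(k, 6))                          # 0.3
print(round(market_timing_sr(0.6, k), 4))   # 0.1 / sqrt(0.29) ~ 0.1857
```

Note how quickly the bound rises with p: holding κ fixed, the numerator grows linearly in p while the denominator shrinks, which is why modest accuracy gains translate into large SR gains.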
Using this definition of the market timing SR, we calculate the SR of our strategy and compare
it to three S&P 500 Exchange Traded Funds (ETFs): the SPDR S&P 500 ETF, the iShares Core S&P
500 ETF, and the Vanguard 500 Index Fund. We see from Table 3 that our SR is significantly
lower than that of the market ETFs. However, it is important to note that the SR calculated from
the closed-form expression is a lower bound, and trading strategies may take advantage of
correlations between stocks to generate a higher SR. Thus, we propose the following trading
strategy:

1. Given the text information for a day, predict the up-down direction of stocks

2. Invest all capital in the stocks with an “up” prediction equally

3. Sell all assets at the close price of the day

4. Repeat for all trading days
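A minimal backtest of the four steps above might look like the following sketch. The tickers, predictions, and return numbers are invented for illustration and are not the paper's data:

```python
def backtest(predictions, returns):
    """Each day: go long, in equal weight, every stock predicted "up";
    sell everything at the close.  Returns the list of daily portfolio
    returns and the cumulative return (growth of $1)."""
    daily = []
    for preds, rets in zip(predictions, returns):
        longs = [s for s, direction in preds.items() if direction == "up"]
        if longs:
            daily.append(sum(rets[s] for s in longs) / len(longs))
        else:
            daily.append(0.0)  # stay in cash if nothing is predicted "up"
    cumulative = 1.0
    for r in daily:
        cumulative *= 1.0 + r
    return daily, cumulative

predictions = [{"AAPL": "up", "XOM": "down"}, {"AAPL": "up", "XOM": "up"}]
returns = [{"AAPL": 0.02, "XOM": -0.01}, {"AAPL": 0.01, "XOM": 0.03}]
daily, cum = backtest(predictions, returns)
print([round(d, 6) for d in daily])  # [0.02, 0.02]
print(round(cum, 4))                 # 1.0404
```

A magnitude-weighted variant, as in the "SMF Weighted" row of Table 4, would replace the equal weights with weights proportional to the predicted return.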

We compare our strategy against other trading strategies including an equally-weighted

portfolio (EW) and the global minimum variance portfolio (GMV). The EW portfolio, as the


Portfolio                    Sharpe Ratio
SMF Market Timing            0.38
SPDR S&P 500 ETF             0.97
iShares Core S&P 500 ETF     0.98
Vanguard 500 Index Fund      0.98

Table 3: Sharpe Ratio of the Market Timing Model and Comparable ETFs

Portfolio           Cumulative Return  Annualized Return  SR    Worst Day Return  Best Day Return
SMF                 1.55               1.18               1.35  -0.038            0.035
SMF Weighted        1.38               1.13               1.01  -0.038            0.036
SPDR S&P 500 ETF    1.38               1.13               0.98  -0.043            0.038
GMV²                1.22               1.08               1.61  -0.012            0.010
EW                  1.19               1.07               0.71  -0.042            0.021
* figures in bold indicate largest values for their respective column

Table 4: Portfolio Performance Comparison

name suggests, invests in all assets equally, and the GMV portfolio invests in the portfolio
that minimizes the variance over all possible portfolios. Note that our strategy invests equally
across all selected assets. We also show a strategy whose weights are set according to the
magnitude of the return predicted by our model. We simulate trading for all trading days
between January 1, 2011 and August 31, 2015.

Table 4 lists the cumulative and annualized returns, SR, and worst-day and best-day
returns. We see that our basic strategy obtains the highest return compared to the
SPDR S&P 500 ETF, while the weighted strategy matches the return of the ETF. We also
see that our strategy obtains an SR of 1.35, which is much higher than the lower bound
and higher than the SR of all comparable ETFs. We note that the GMV portfolio
obtains the highest SR of all portfolios, despite having a lower return. This phenomenon is
cited as one of the reasons why the SR may not always be the best measure of fund
performance. Nevertheless, our strategy tops the index ETF in all categories.

Figure 5 shows the plot of our trading strategy over the trading period (January 2013 -
August 2015) against the other portfolios. The black lines divide between 2013, 2014 and 2015
² The GMV portfolio weights are calculated using the closed-form expression \( w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}} \),
where Σ is the covariance matrix.
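The GMV weights in the footnote can be computed directly. A small numpy sketch follows; the covariance numbers are invented for illustration:

```python
import numpy as np

def gmv_weights(cov):
    """Global minimum variance weights: w = inv(Sigma) @ 1 / (1' inv(Sigma) 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)  # Sigma^{-1} 1, without forming the inverse
    return w / w.sum()

# Two uncorrelated assets: the lower-variance asset gets the larger weight.
cov = np.array([[0.04, 0.0],
                [0.0, 0.01]])
print(gmv_weights(cov))  # approximately [0.2, 0.8]
```

Using `solve` rather than an explicit matrix inverse is the usual numerically stable way to evaluate this expression.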

Figure 5: Graph of Portfolio Performance


Figure 6: Model Accuracy Plotted Against Magnitude of Return

respectively. We see that our strategy performs worst in 2013 but improves in 2014
and in 2015. This improvement is most likely due to the improvement in our prediction
accuracy from 2013 to 2015. We hypothesize that our trading strategy is able to outperform
benchmark portfolios despite small increases in overall accuracy because our strategy is
better at predicting returns as they increase in magnitude. Figure 6 shows the relationship
between the magnitude of return and prediction accuracy and, more specifically, the positive
correlation between the two. This positive correlation suggests that our trading strategy is
better able to take advantage of large positive returns and avoid large negative returns,
helping us perform better than the index. These results suggest that we are able to leverage
market indicators inherent in the StockTwits streams to predict the stock market better
than other models.
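The accuracy-by-magnitude relationship behind Figure 6 can be sketched by bucketing days by the absolute realized return and computing the hit rate within each bucket. The bucket edges and sample numbers below are our own choices, not the paper's:

```python
def accuracy_by_magnitude(predicted, realized, edges):
    """Hit rate of sign predictions within buckets of |realized return|.
    `edges` are the right endpoints of the magnitude buckets; days whose
    |return| exceeds the last edge are ignored, and empty buckets give None."""
    hits = [0] * len(edges)
    counts = [0] * len(edges)
    for p, r in zip(predicted, realized):
        for i, edge in enumerate(edges):
            if abs(r) <= edge:
                counts[i] += 1
                if (p >= 0) == (r >= 0):
                    hits[i] += 1
                break
    return [h / c if c else None for h, c in zip(hits, counts)]

preds = [0.01, -0.01, 0.02, -0.03, 0.04, 0.05]
reals = [-0.002, 0.003, 0.018, -0.025, 0.041, 0.052]
# Buckets: |r| <= 1%, |r| <= 3%, |r| <= 10%
print(accuracy_by_magnitude(preds, reals, [0.01, 0.03, 0.10]))
# [0.0, 1.0, 1.0] -- accuracy rises with the magnitude of the move
```

A positive slope across buckets, as in this toy example, is exactly the pattern the paragraph above describes for Figure 6.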

6 Conclusions

From the results described in Section 5, we find that we are able to use SMF methods
to extract market indicators from StockTwits streams to predict stock price direction.
These findings support the claim of Wong et al. (2014) that SMF methods can be used in
conjunction with text mining to predict the stock market.

This paper has two main conclusions. First, market-timing predictions using the SMF
model and StockTwits streams perform better than most basic baseline models. The
out-of-sample results show that in every test year, our algorithm is able to beat benchmark
methods that do not utilize information other than price history in their predictions. Moreover,
we found a prediction accuracy of 51.37% at the daily prediction frequency. At first glance,
most investors would not consider a 51% prediction accuracy to be significant, and few would
be willing to pay a premium for a coin that flips heads 51% of the time rather than 50%.
In the context of our model, however, a prediction accuracy of just 60% on our data
would yield a lower bound for the Sharpe ratio of 2.87, larger than all but two funds listed
on Morningstar. In fact, our prediction accuracies may suggest that StockTwits contains
information useful to asset managers and investors. We show how to use our predictions in
a trading strategy with an SR of 1.35 and an annualized return of 1.18.


Second, we conclude that increasing the frequency of predictions does not seem to improve
prediction accuracy. At first, intuition led us to believe that prediction accuracy
should increase with frequency due to the rumor-like nature of StockTwits.
Unlike high-quality news sources such as the Wall Street Journal, StockTwits information
may take only a few hours to be incorporated into the market rather than a whole day.
However, our results show the opposite: accuracy does not seem to increase with frequency
and is sometimes beaten by a random guess. By examining the interaction between users on
StockTwits, we hypothesize that this may be due to content-sharing on the site. Rather
than produce original content and opinions, many streams share and link to information
from other sources. This secondary nature of StockTwits causes a delay, so that the
information comes out after it has been incorporated into market prices. This diminishes the
predictive effect of the text information at the daily level and eliminates it at the intraday
level. On the other hand, news articles such as those from the Wall Street Journal are
primary sources and reflect new and original content. The difference between these two news
sources may explain why Wong et al. (2014) were able to obtain higher prediction accuracies
at the daily level using Wall Street Journal articles.

The efficient market hypothesis implies that it is impossible to predict the market and
consistently outperform a benchmark on a risk-adjusted return basis after taking into account
transaction costs. In this paper, we make two simplifying assumptions. First, we do not take
into account transaction costs. Thus, to accurately compare the performance of our method
with the market index or other mutual fund managers, we must factor transaction costs
into our model. However, we note that mutual funds and other asset managers do charge
fees that may be equivalent to adding transaction costs. Second, we assume the risk-free
rate is zero. This is likely to be inconsequential, as the 3-month US Treasury rate in 2015
was 0.02%, a rate very close to 0. The market conditions during our period of investigation
(zero interest rates and a rising equity market) are conducive to positive performance, and it
would be interesting to see whether the trading strategy would continue to perform well under
varying conditions.

References

Antweiler, W. & Frank, M. Z. (2004). Is all that talk just noise? The information content
of internet stock message boards. Journal of Finance, 59(3), 1259-1294.

Azar, P. & Lo, A. W. (2016). The wisdom of Twitter crowds: Predicting stock market
reactions to FOMC meetings via Twitter feeds. Journal of Portfolio Management.

Boyd, S. (2010). Distributed optimization and statistical learning via the alternating direction
method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1-122.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work.
Journal of Finance, 25(2), 383-417.

Gentzkow, M. & Shapiro, J. M. (2014). Code and Data for the Social Sciences: A Practitioner's
Guide. University of Chicago mimeo. Last updated January 2014.

Kalampokis, E., Tambouris, E., & Tarabanis, K. (2013). Understanding the predictive power
of social media. Internet Research, 23(5), 544-559.

Lachanski, M. (2015). Not another market timing scheme! Working Paper.

Liew, J. K.-S. & Budavri, T. (2016). The 'sixth' factor – social media factor derived directly
from tweet sentiments. SSRN Electronic Journal.

Liew, J. K.-S., Guo, S., & Zhang, T. (2016). Tweet sentiments and crowd-sourced earnings
estimates as valuable sources of information around earnings releases. Journal of
Alternative Investments.

Liew, J. K.-S. & Wang, G. Z. (2016). Twitter sentiment and IPO performance: A cross-sectional
examination. Journal of Portfolio Management, 42(4).

Loughran, T. & McDonald, B. (2011). When is a liability not a liability? Textual analysis,
dictionaries, and 10-Ks. Journal of Finance, 66(1), 35-65.

Mamaysky, H. & Glasserman, P. (2015). Does unusual news forecast market stress? Working
Papers in Financial Research.

Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining
for market prediction: A systematic review. Expert Systems with Applications, 41(16),
7653-7670.

Salton, G. & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval.
Information Processing & Management, 24(5), 513-523.

Schumaker, R. P., Zhang, Y., Huang, C.-N., & Chen, H. (2012). Evaluating sentiment in
financial news articles. Decision Support Systems, 53(3), 458-464.

Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the
stock market. Journal of Finance, 62(3), 1139-1168.

Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying
language to measure firms' fundamentals. Journal of Finance, 63(3), 1437-1467.

Wong, F. M. F., Liu, Z., & Chiang, M. (2014). Stock market prediction from WSJ: Text
mining via sparse matrix factorization. IEEE International Conference on Data Mining.

Zhang, Y. (2010). An alternating direction algorithm for nonnegative matrix factorization.
Rice Technical Report.

Appendix

Zhang (2010) and Boyd (2010) propose an algorithm that extends the classical alternating
direction method for convex optimization, the alternating direction method of multipliers
(ADMM).³ The ADMM algorithm can be applied to the matrix factorization formulation
by introducing auxiliary variables U and V. The objective function is as follows:

\[
\begin{aligned}
\underset{X,\,Y,\,U,\,V}{\text{minimize}} \quad & \tfrac{1}{2}\|XY - A\|_F^2 \\
\text{s.t.} \quad & X - U = 0 \\
& Y - V = 0 \\
& U \geq 0,\; V \geq 0
\end{aligned}
\]

where \( U \in \mathbb{R}^{m \times d} \) and \( V \in \mathbb{R}^{d \times n} \). The augmented Lagrangian function is then


\[
L(X, Y, U, V, C, D) = \tfrac{1}{2}\|XY - A\|_F^2 + C \odot (X - U) + D \odot (Y - V)
+ \tfrac{\rho}{2}\|X - U\|_F^2 + \tfrac{\rho}{2}\|Y - V\|_F^2
\]

where \( C \in \mathbb{R}^{m \times d} \) and \( D \in \mathbb{R}^{d \times n} \) are Lagrange multipliers and ρ is a penalty parameter for
the constraints X − U = 0 and Y − V = 0. Here ⊙ stands for element-wise multiplication,
with the entries of the resulting matrix summed so that each multiplier term is a scalar.


The steps of ADMM are derived by minimizing the augmented Lagrangian function with
respect to X, Y, U and V one at a time while fixing the others at their most recent values.
The steps can be written as

\[
\begin{aligned}
X_+ &= (AY^T + \rho U - C)(YY^T + \rho I)^{-1} \\
Y_+ &= (X_+^T X_+ + \rho I)^{-1}(X_+^T A + \rho V - D) \\
U_+ &= \left(X_+ + \tfrac{C}{\rho}\right)_+ \\
V_+ &= \left(Y_+ + \tfrac{D}{\rho}\right)_+ \\
C_+ &= C + \rho(X_+ - U_+) \\
D_+ &= D + \rho(Y_+ - V_+)
\end{aligned}
\]
³ According to Boyd (2010), in practice there are several benefits to using ADMM. Assuming that the
functions are convex and the Lagrangian has a saddle point, the following hold: (1) the residuals converge
to 0, (2) the objective converges to an optimal value, and (3) the dual variable λ converges to an optimal
value. This suggests that ADMM is a viable optimization algorithm. In fact, Zhang (2010) finds that ADMM
outperforms alternative methods when tested on random matrices.

where the subscript "+" denotes the updated value at each iteration and (·)₊ denotes the
element-wise maximum with zero. Wong et al. (2014) provide a generalization of this
method, which is used to solve the objective function in this paper.
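A compact numpy sketch of these ADMM updates for the nonnegative factorization follows. The penalty ρ, the iteration count, and the test matrix are our own choices for illustration, not values from the paper:

```python
import numpy as np

def admm_nmf(A, d, rho=1.0, iters=2000, seed=0):
    """ADMM for  min 0.5*||XY - A||_F^2  s.t.  X = U >= 0,  Y = V >= 0,
    following the update steps given above."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.random((m, d)); Y = rng.random((d, n))
    U, V = X.copy(), Y.copy()
    C = np.zeros((m, d)); D = np.zeros((d, n))
    I = np.eye(d)
    for _ in range(iters):
        X = (A @ Y.T + rho * U - C) @ np.linalg.inv(Y @ Y.T + rho * I)
        Y = np.linalg.inv(X.T @ X + rho * I) @ (X.T @ A + rho * V - D)
        U = np.maximum(0.0, X + C / rho)   # (.)_+ : element-wise max with 0
        V = np.maximum(0.0, Y + D / rho)
        C = C + rho * (X - U)              # dual update for X - U = 0
        D = D + rho * (Y - V)              # dual update for Y - V = 0
    return U, V

# Try to recover an exactly rank-2 nonnegative matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 3.0, 3.0]])   # row 3 = row 1 + row 2, so rank 2
U, V = admm_nmf(A, d=2)
print(U.min() >= 0 and V.min() >= 0)  # True: both factors are nonnegative
print(round(float(np.linalg.norm(U @ V - A) / np.linalg.norm(A)), 4))
```

The relative reconstruction error printed at the end should be small on this easy low-rank example, consistent with the convergence behavior reported by Zhang (2010), though ADMM on this nonconvex problem is only guaranteed to reach a local solution.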
