
Chapter · July 2019


DOI: 10.1007/978-3-030-23525-3_8


All content following this page was uploaded by Lamarcus Coleman on 16 September 2019.



Human Computer Interaction with Multivariate
Sentiment Distributions of Stocks Intraday

Lamarcus Coleman 1,2 and Mariofanna Milanova, Ph.D. 1

1 University of Arkansas at Little Rock, Department of Computer Science, Little Rock AR
72204, USA
2 Gradient Laboratories Inc., Little Rock AR 72204, USA
{lrcoleman1, mgmilanova}@ualr.edu

Abstract. In this work we show that the sentiment of the broader stock market,
namely the S&P 500, is related to the intraday activity of individual stocks. We
introduce a concept we term embedded context, an approach to improving unigram
language models for restricted use cases. We use a Gaussian Mixture Model to
create different sentiment regimes (i.e. distributions) of the broader market over
our training period and analyze the return and volatility characteristics of each
stock in each regime. As our baseline model, we create an intraday momentum
trading strategy using a moving average and the Relative Strength Index (RSI)
over our testing period, with no consideration of our prior sentiment regime
analysis. We then create an updated version of our intraday trading strategy which
considers the sentiment regime of the broader market. Our results show an
improvement in each stock’s intraday strategy performance as a result of
considering the broader market’s sentiment regime.

Keywords: Sentiment Analysis, Gaussian Mixture Model, Stock Market Prediction.

1 Introduction

Prior works have shown that there exists some relationship between the public’s
mood and movement within the stock market. We pose a key question, “Why is there
a relationship between the mood of the public and movement in stock prices?” We
believe that this relationship reflects the mood of traders, or market participants,
and not that of the general public.

Though prior works have collected substantial data on general Twitter users and
did not specify whether or not those users were actual market participants, our
thought experiment suggests this is not necessary to build a model capable of
associating sentiment with changes in asset prices. Consistent with our reasoning, we
suggest that for the purpose of stock price prediction using the sentiment of the
market, only the mood of traders, or actual market participants, is needed to
engender a predictive system.

We have also found that while much work has been conducted on the sentiment of
individual assets and its effect on price changes, to the best of our knowledge, no
prior work has studied whether or not the sentiment of the broader market can be used
to predict changes in the prices of individual stocks on either a daily or an intraday
basis. In light of this, we form our first research question for testing: can it be
concluded that there exists a relationship between the sentiment of the broader market
and that of individual stocks using only information from likely market participants?

We introduce a novel idea we term embedded context. The premise of this idea is
that the key weakness of a simple unigram model can be circumvented by increasing the
specificity of the model. Within a standard unigram model, the tokens can take on
different connotations depending on prior and subsequent words. Our idea of
embedded context suggests that restricting model creation to a specific use case
could limit these possible contexts and thus embed a discrete meaning of each word
into a unigram model. In this work, we achieve this by 1) only searching for
tweets from likely market participants and 2) creating a custom vocabulary specific to
market participants and the interpretation of their sentiment. This forms our second
research question, “Can embedded context be used to circumvent the issue of context
in unigram models for restricted use cases?”
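
As a rough illustration of embedded context, the sketch below scores tweets with a unigram model whose vocabulary is restricted to trader usage. The word lists and the function names (`score_tweet`, `daily_sentiment`) are hypothetical; the actual custom vocabulary used in this work is not listed here. Note that a word such as "short" is ambiguous in general English but carries a single bearish meaning once the use case is restricted to market participants.

```python
import re

# Hypothetical trader vocabulary (assumption: not the word lists used in the paper).
POSITIVE = {"long", "bullish", "breakout", "calls", "rally"}
NEGATIVE = {"short", "bearish", "breakdown", "puts", "selloff"}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", text.lower())

def score_tweet(text):
    """Return (positives - negatives) / matched tokens, or 0.0 if nothing matches."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0

def daily_sentiment(tweets):
    """Average the per-tweet scores for one trading day (0.0 for an empty day)."""
    scores = [score_tweet(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0
```

Because each token is scored independently of its neighbours, this remains a unigram model; the restriction of the vocabulary and of the tweet source is what supplies the context.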

2 Literature Review

Bollen et al. [1] posed the question, “Is the public mood correlated or even
predictive of economic indicators?” This work surveyed whether or not collective mood
states from Twitter were correlated with the Dow Jones over time. Bollen et al. [1]
used Granger Causality Analysis to relate past Dow values and prior mood states to
daily closing Dow prices. A Self-Organizing Fuzzy Neural Network was used for
prediction of market prices to capture the non-linear relationship.

Nisar and Yeung [7] studied the relationship between political sentiment and
movements in the FTSE 100 on a daily basis. They too derived sentiment using data
from Twitter.

Kordonis et al. [4], basing their work on the prior work of Pak and Paroubek
(2016), studied the effectiveness of Naïve Bayes Bernoulli classification and SVMs
for sentiment analysis. They conducted a correlation analysis between tweets and
market movement and used this correlation for the forecasting of stock prices.

Mittal and Goel [6] found tools such as Opinion Finder and SentiWordnet, which
have been used in other studies, infeasible and thus developed their own sentiment
analysis system. They too used a Self-Organizing Fuzzy Neural Network for daily
Dow price prediction. A trading strategy was constructed based on the prediction of
the model.

Patel et al. [8] used a variety of Neural Network models on different sectors of the
Bombay Stock Exchange. They learned that the prediction of prices is better framed
as a classification task.

Jermann [3] studied the influence of executive tweets on market movement. Analy-
sis was conducted on word and sentence level features. A Naïve Bayes bag of words
model served as the baseline and a Neural Network was used for comparison. The
work illustrated a high degree of specificity in tweet collection by focusing only on
tweets from individuals that contained the name of their company within the descrip-
tion of their profile.

Davda and Mittal [2] used NLP to create a trading strategy around news headlines.
They used Yahoo Finance as a source for news and scraped price data from Google
Finance. Lee et al. [5] studied the significance of text analysis for stock price
prediction. While other researchers focused on breaking news events disseminated via
Twitter and other sources, Lee et al. focused on news events reported in companies’
8-K filings, the required filing for significant events within a company.

Si et al. [9] designed a Semantic Stock Network (SSN). This network consisted
of nodes (i.e. stocks) connected by edges, where an edge represented the frequent
co-occurrence of two stocks within tweets. They used a labeled topic
model to model the tweets and network structure at each node.

Trastour et al. [10] used Latent Dirichlet Allocation to extract topics from news ar-
ticles. The inputs to their model were daily and monthly proportions of articles per
each topic. They used these inputs to predict the daily and monthly crude oil prices.

3 Methodology

We began by collecting daily price data for the SPY ETF and our four stocks,
Adobe (ADBE), AT&T (T), General Electric (GE), and Wells Fargo (WFC), over the period
of January 1, 2018 to June 1, 2018. We also collected intraday data on the five-minute
timeframe for each of our four stocks.

After splitting our data into training and testing sets, we created a list of every trad-
ing day in the 2018 calendar year and used the Twitter Standard API to retrieve tweets
for the SPY, using the $SPY symbol, for each day in our training period.

A market regime consists of a distinct period or subset of activity within a larger
interval of market activity. We chose a Gaussian Mixture Model to model this
phenomenon. A Gaussian Mixture Model, depicted by the equation below, is a model
used to capture the effects of multimodal distributions. In short, as the equation
depicts, the GMM is a collection or linear combination of multiple distributions.

$$P(x) = \pi_1 N(\mu_1, \sigma_1) + \pi_2 N(\mu_2, \sigma_2) + \cdots + \pi_k N(\mu_k, \sigma_k)$$

• $P(x)$ - probability that stock return $x$ came from a specific regime/distribution
• $\pi_k$ - weight of the $k$-th distribution $N_k$
• $\mu_k$ - mean of the $k$-th distribution
• $\sigma_k$ - covariance matrix of the $k$-th distribution
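
To make the mixture concrete, the following sketch evaluates a one-dimensional version of the equation above and the posterior probability that an observation belongs to each regime. It is a simplification under stated assumptions: the parameter values in `WEIGHTS`, `MEANS`, and `SIGMAS` are invented for illustration, and in one dimension each $\sigma_k$ is a standard deviation rather than a covariance matrix.

```python
from statistics import NormalDist

# Hypothetical two-regime parameters (assumption: not fitted values from the paper).
WEIGHTS = [0.6, 0.4]      # pi_k, must sum to 1
MEANS = [0.001, -0.002]   # mu_k, e.g. a calm regime and a stressed regime
SIGMAS = [0.005, 0.02]    # sigma_k as standard deviations in the 1-D case

def weighted_parts(x, weights, means, sigmas):
    """Return pi_k * N(x; mu_k, sigma_k) for every component k."""
    return [w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sigmas)]

def mixture_pdf(x, weights, means, sigmas):
    """Mixture density: the sum of the weighted component densities."""
    return sum(weighted_parts(x, weights, means, sigmas))

def responsibilities(x, weights, means, sigmas):
    """Posterior probability of each regime given x (Bayes' rule)."""
    parts = weighted_parts(x, weights, means, sigmas)
    total = sum(parts)
    return [p / total for p in parts]

def assign_regime(x, weights, means, sigmas):
    """Index of the most probable regime for observation x."""
    resp = responsibilities(x, weights, means, sigmas)
    return max(range(len(resp)), key=resp.__getitem__)
```

With these illustrative parameters, a small return near zero is assigned to the narrow component, while a large negative return is assigned to the wide one.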

This model allowed us to capture the effects of market returns being generated by
different regimes or distributions.

We created our regimes over our training period by 1) preprocessing and scoring
our SPY Tweets by building a custom vocabulary, 2) computing the mean and vari-
ance of the SPY, and 3) passing each of these into our GMM. The use of the custom
vocabulary was a means to distinguish market participants and is illustrative of what
we term embedded context, or the use of unigrams within discrete contexts.
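
The fitting step can be sketched with the expectation-maximization (EM) algorithm, the standard way of estimating GMM parameters. The version below is a one-dimensional simplification written for illustration only: the regimes in this work are built from several inputs (tweet scores and the SPY mean and variance), the fitting software used is not specified, and the function name `fit_gmm_1d` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist

def fit_gmm_1d(data, means, sigmas, weights, n_iter=50, min_sigma=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial guesses."""
    means, sigmas, weights = list(means), list(sigmas), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            parts = [weights[j] * NormalDist(means[j], sigmas[j]).pdf(x)
                     for j in range(k)]
            total = sum(parts) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in parts])
        # M-step: re-estimate weight, mean, and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)  # keep sigma strictly positive
    return weights, means, sigmas
```

A multivariate fit follows the same two steps with mean vectors and covariance matrices in place of the scalars; in practice a library implementation such as scikit-learn's `GaussianMixture` would normally be preferred over hand-written EM.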

Once our regimes were created, we computed the mean return and variance for
each of our stocks over our training period and grouped these by the regime of the
broader market. The figure below is an example of this analysis.

Figure 1: Regime Analysis; Displays the volatility (left) and Returns (right) of
each sample drawn from its respective distribution (i.e. regime) for Adobe
(ADBE) stock
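
The per-regime grouping described above can be sketched as follows. This is a minimal illustration, assuming each trading day already carries a regime label from the fitted GMM; the function name `regime_stats` and the sample numbers in the usage note are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

def regime_stats(returns, regimes):
    """Group a stock's returns by market regime and summarize each group."""
    grouped = defaultdict(list)
    for ret, regime in zip(returns, regimes):
        grouped[regime].append(ret)
    # Population variance is used here; sample variance is an equally valid choice.
    return {
        regime: {"mean": mean(vals), "variance": pvariance(vals)}
        for regime, vals in grouped.items()
    }
```

For example, `regime_stats([0.01, 0.03, -0.02, -0.04], [0, 0, 1, 1])` yields one mean and one variance for regime 0 and another pair for regime 1, which is the information a figure of this kind summarizes for a single stock.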

We also surveyed whether or not there existed some correlation between our stocks
and regimes. We normalized our stock returns and regimes to perform this study but
found no significant correlation. Below is a correlation matrix from this analysis.

Figure 2: Displays a correlation matrix for each stock and the regimes.
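
One entry of such a correlation matrix can be computed as in the sketch below, which normalizes two series to z-scores and takes their Pearson correlation. The normalization actually applied in this work is not specified, so z-scoring is an assumption, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Normalize a series to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(v - m) / s for v in values]

def pearson(xs, ys):
    """Pearson correlation: the average product of paired z-scores."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    zx, zy = zscore(xs), zscore(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)
```

Applying `pearson` to every pair of series (each stock's returns and the regime labels) fills in the full matrix; values near zero correspond to the absence of significant correlation reported above.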

To test our hypotheses, we first developed an intraday momentum trading strategy
over our testing period which did not consider our market regime analysis. This
served as our baseline model. Next, we developed a market regime version of our
intraday trading strategy which considers the market’s sentiment regime. Our
results are displayed below.

Figure 3: Baseline Model (bar chart; series: Initial Value, Baseline Model; x-axis: T, GE,
WFC, ADBE; y-axis: 0 to 120000)

Figure 4: Regime Analysis Model (bar chart; series: Initial Value, Regime Analysis Model;
x-axis: T, GE, WFC, ADBE; y-axis: 0 to 120000)

We found that the regime analysis model improved the performance of the intraday
strategy across all stocks. ADBE’s performance improved by 12%, GE’s by 47%, WFC’s
by 7%, and T’s by 12%.
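
The difference between the two strategies compared above can be sketched as follows. The exact entry rules, window lengths, and thresholds are not given in this paper, so everything below is an assumption made for illustration: a long signal when the last price is above its moving average and RSI exceeds 50, a simple-average variant of RSI rather than Wilder's smoothed version, and a regime filter that only permits trades in regimes judged favorable from the training-period analysis.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices, or None if too short."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window

def rsi(prices, window=14):
    """Simple-average RSI over the last `window` price changes (0 to 100)."""
    if len(prices) < window + 1:
        return None
    recent = prices[-(window + 1):]
    changes = [b - a for a, b in zip(recent[:-1], recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # flat prices: neutral value chosen by convention here
    # Equivalent to 100 - 100 / (1 + gains / losses).
    return 100.0 * gains / (gains + losses)

def momentum_signal(prices, window=14, rsi_floor=50.0):
    """Baseline: 1 (long) if price is above its SMA and RSI is above the floor, else 0."""
    avg, strength = sma(prices, window), rsi(prices, window)
    if avg is None or strength is None:
        return 0
    return 1 if prices[-1] > avg and strength > rsi_floor else 0

def regime_signal(prices, regime, favorable_regimes, window=14, rsi_floor=50.0):
    """Regime-aware: take the baseline signal only in a favorable market regime."""
    if regime not in favorable_regimes:
        return 0
    return momentum_signal(prices, window, rsi_floor)
```

Under this sketch the regime-aware strategy can only skip trades the baseline would have taken; other designs, such as adjusting position size or thresholds by regime, are equally plausible readings of the approach.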

4 Conclusion

Based on the findings of our experiment, we conclude that a relationship exists
between the sentiment regime of the broader market and individual stocks intraday.
Given that our sentiment regimes were created using tweets 1) specific to the broader
market via the $SPY Twitter query and 2) scored with a custom vocabulary of positive
and negative words associated with market participants, we conclude that our idea
of embedded context for unigram language models at the least warrants further
research across other industries and is a viable approach for the creation of intraday
trading strategies. We also conclude that an emphasis on market participants rather
than general Twitter users is likely a more efficient means of using sentiment for
stock price prediction.

For future work, we recommend the collection of more data and the use of the
StockTwits API rather than the Twitter API. StockTwits is a platform similar
to Twitter, with the exception that its users are those interested in or actively
participating in trading the stock market. This venue is likely to yield fruitful
results for future stock market sentiment analysis research.

References
1. Bollen, J., Mao, H., & Zeng, X.-J. (2011). Twitter mood predicts the stock market. Journal
of Computational Science, 2(1), 1–8. https://doi.org/10.1016/j.jocs.2010.12.007
2. Davda, A., & Mittal, P. (2008). NLP and Sentiment Driven Automated Trading, 41.
3. Jermann, M. (n.d.). Predicting Stock Movement through Executive Tweets, 9.
4. Kordonis, J., Symeonidis, S., & Arampatzis, A. (2016). Stock Price Forecasting via
Sentiment Analysis on Twitter. In Proceedings of the 20th Pan-Hellenic Conference on
Informatics - PCI ’16 (pp. 1–6). Patras, Greece: ACM Press.
https://doi.org/10.1145/3003733.3003787
5. Lee, H., Surdeanu, M., MacCartney, B., & Jurafsky, D. (n.d.). On the Importance of Text
Analysis for Stock Price Prediction, 6.
6. Mittal, A., & Goel, A. (2011). Stock Prediction Using Twitter Sentiment Analysis.
7. Nisar, T. M., & Yeung, M. (2018). Twitter as a tool for forecasting stock market
movements: A short-window event study. The Journal of Finance and Data Science, 4(2),
101–119. https://doi.org/10.1016/j.jfds.2017.11.002
8. Patel, H. R., Parikh, S. M., & Patel, A. M. (2017). Prediction model based on NLP and NN
for financial data outcome revelation, 5, 5.
9. Si, J., Mukherjee, A., Liu, B., Pan, S. J., Li, Q., & Li, H. (2014). Exploiting Social
Relations and Sentiment for Stock Prediction. In Proceedings of the 2014 Conference on
Empirical Methods in Natural Language Processing (EMNLP) (pp. 1139–1145). Doha, Qatar:
Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1120
10. Trastour, S., Genin, M., & Morlot, A. (n.d.). Prediction of the crude oil price thanks to
natural language processing applied to newspapers, 5.
