Stock Prediction Using Web Sentiments, Financial News and Quotes

Journal of Computing, eISSN 2151-9617, Volume 4, Issue 6, June 2012, http://www.journalofcomputing.org
Published by: Journal of Computing on Jul 19, 2012
Copyright:Attribution Non-commercial


Stock Prediction Using Web Sentiments, Financial News and Quotes
Sulana Maria Rebelo and Kavita Asnani
In this paper, we present a model that predicts the stock market closing value of the Dow Jones Industrial Average (DJI) index for a given trading day. This is done using unstructured data such as financial message board messages and news articles. We also use financial stock quotes data.
We derive the sentiment of each message from the message board using SentiWordNet, and from this we derive the sentiment for every company of DJI for each trading day. News articles are replaced by key phrases using the Key Phrase Extraction Algorithm (KEA). The processed message board messages, news articles and stock quotes data will be used to train a Neural Network using the Back Propagation Algorithm. The trained network will predict the closing value of DJI for a particular trading day.
Index Terms: Back Propagation Algorithm, Dow Jones Industrial Average (DJI), Key Phrase Extraction Algorithm (KEA), Neural Network, SentiWordNet.
1 Introduction
Data mining can be used extensively in the financial markets and can help in stock-price forecasting. It can help investors discover hidden patterns in historic data that have probable predictive capability for their investment decisions.

The web has rapidly emerged as a great source of financial information, ranging from financial news articles to personal opinions. Research has shown that sentiments and stock value are closely related and that web sentiments can be used to predict stock behavior [6]. The same is true for financial news. Online forum discussions between investors are not equivalent to market noise, and instead contain financially relevant informational content [9].

Text mining of such financial information can aid stock market predictions. Text mining refers to the process of deriving meaningful information from natural language text. Compared with quotes data, text is unstructured, amorphous, and difficult to deal with algorithmically. There is an important need to extract useful knowledge from vast amounts of textual data.

We propose a prediction model that performs stock closing value prediction for DJI using quotes data, key phrases in news articles and sentiments from message boards. For developing this model we have used daily quotes data and financial news articles corresponding to DJI, and we also make use of the message board posts for each of the 30 companies of DJI. The data was collected over the period August 2011 to March 2012.

Using the message board sentiments, the key phrases from news articles and the quotes data, a Neural Network will be trained using the Back Propagation Algorithm. The trained Neural Network is used to predict the closing value of DJI.
2 Related Work
A method is proposed in [3] to predict the stock closing value using quotes data and news articles. Key phrases are extracted from the news articles. The relationship between the news articles and the trends in the stock prices is used to train an Artificial Neural Network using the Back Propagation Algorithm.

In our research, we aim to determine the cumulative effect of the quotes data, the key phrases from news articles and the sentiments from message boards on the closing value of DJI.

Sentiment Analysis, or Opinion Mining, aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document. It has to be determined whether the opinion expressed is positive, negative or neutral. Due to the richness of human language, its great expressiveness and its ambiguities, the problem of sentiment classification is nontrivial.

In [4], a document in a language other than English is first translated into English using standard translation software. Then the translated document is classified according to its sentiment into one of the classes “positive” and “negative”.
For sentiment classification, a document is searched for sentiment-bearing words such as adjectives. By means of SentiWordNet (a lexical resource for sentiment analysis in English) [1], scores for positivity and negativity are determined for these words. An interpretation of the scores then leads to the document polarity.

[5] proposes measures that determine the semantic orientation of adjectives for three factors of subjective meaning. These three factors of emotive meaning are the evaluative factor (e.g., good-bad), the potency factor (e.g., strong-weak) and the activity factor (e.g., active-passive). Among these three factors, the evaluative factor has the strongest relative weight. Here use is made of the WordNet synonymy graph.

The approach in [6] involves scanning financial message boards and extracting the sentiments expressed by individual authors. Each message is converted to a vector of words and author names. The value of each entry in the vector is then calculated using the TF-IDF formula. The prediction at time instance i depends upon the values (messages and stock value) at the previous time instance. The Weka toolkit is used for classifier training.

This approach calculates a TrustValue, which assigns trust to each message based on its author. A classifier is trained that can predict whether the stock price will go up or down using the features extracted or calculated (including sentiment and TrustValue) over the past day.

Sulana Maria Rebelo, Department of Information Technology (M.E.), Padre Conceicao College of Engineering, Goa University, Verna, India.
Kavita Asnani, Department of Information Technology (M.E.), Padre Conceicao College of Engineering, Goa University, Verna, India.
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 6, JUNE 2012, ISSN (Online) 2151-9617, https://sites.google.com/site/journalofcomputing, WWW.JOURNALOFCOMPUTING.ORG

An approach to classifying reviews as recommended or not recommended is given in [7]. The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. The semantic orientation of a phrase is calculated as the pointwise mutual information (PMI) between the given phrase and the word “excellent” minus the PMI between the given phrase and the word “poor”. PMI is calculated by issuing queries to a search engine and noting the number of hits (matching documents).
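
The PMI-based orientation described for [7] can be illustrated with a small Python sketch. When PMI is estimated from hit counts, the count for the phrase itself cancels in the difference, so the orientation reduces to a single log ratio. The function names, the smoothing constant and the "greater than zero" decision threshold are assumptions made for illustration; the hit counts would come from search-engine queries that are not reproduced here.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    hits_near_excellent / hits_near_poor: documents in which the phrase
    co-occurs with "excellent" / "poor".
    hits_excellent / hits_poor: documents containing each word alone.
    The smoothing term (an assumed value) avoids division by zero.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_orientations):
    """Label a review from the average orientation of its phrases."""
    average = sum(phrase_orientations) / len(phrase_orientations)
    # Assumed threshold: a positive average means "recommended".
    return "recommended" if average > 0 else "not recommended"
```

A phrase that co-occurs twice as often with “excellent” as with “poor”, when both reference words are equally frequent, gets an orientation of 1 bit.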
3 S
3.1 Block Diagram
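Fig. 1. Block Diagram. [Figure: quotes, message board messages and news articles for DJI are processed by the Quotes Normalization, Sentiment Classification and Key Phrase Extraction blocks, respectively; their outputs feed the Prediction Module, which produces the predicted closing value.]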
Fig. 1 gives the overall design of the system. Quotes data, message board messages and news articles for DJI are downloaded from the Internet for each trading day.

1. Quotes Normalization: The quotes data for DJI is downloaded from http://finance.yahoo.com/q/hp?s=%5EDJI+Historical+Prices and normalized using Min-Max Normalization.
2. Sentiment Classification: For each downloaded message board post, we derive the sentiment the author is trying to convey. The message board posts are downloaded from http://messages.yahoo.com/.
3. Key Phrase Extraction: News articles are downloaded from http://finance.yahoo.com/news/category-stocks/. For each downloaded news article, we extract the key phrases using the Key Phrase Extraction Algorithm (KEA). From these key phrases we derive the global list of key phrases, which is the top 10 most significant key phrases across the entire corpus [2], [3].
4. Prediction Module: A Neural Network is trained using the Back Propagation Algorithm. This Neural Network is used to predict the closing value of DJI.
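
As an illustration of the quotes normalization step, the following Python sketch applies Min-Max Normalization to a list of closing values. The target range of [0, 1], the function names and the inverse mapping are assumptions for illustration; the paper does not state the range it uses.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min
    and the largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min)
            for v in values]


def denormalize(value, lo, hi, new_min=0.0, new_max=1.0):
    """Map a normalized value (e.g. a network output) back to the
    original scale, given the original minimum and maximum."""
    return lo + (value - new_min) / (new_max - new_min) * (hi - lo)
```

The inverse mapping matters in a setup like this one because a network trained on normalized quotes also produces a normalized prediction, which has to be converted back to an index value.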
3.2 Sentiment Classification
Sentiment Classification aims to automatically predict the sentiment polarity (e.g. positive, negative or neutral) of a text such as a blog, message board post or review. The approach followed in this research for Sentiment Classification is based on SentiWordNet [4]. Once the message board messages are downloaded, we need to derive the sentiment for each message. This means we need to classify each message as expressing positive, negative or neutral sentiment.
Fig. 2. Sentiment Classification process. [Figure: message board messages for DJI are obtained from the Internet; Part-of-Speech Tagging produces tagged message board messages, from which the adjective list is extracted; Semantic Orientation Identification, using the SentiWordNet lexicon, produces message board messages with orientation.]
1. Part-of-Speech Tagging: A part-of-speech tagger is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other tokens). We use the Stanford tagger [8] to tag each message board post. The output of the Stanford tagger is a tagged version of each message, in which each token is assigned an appropriate part of speech. After tagging, each adjective in the message is followed by ‘/JJ’.
2. Adjective List: Adjectives convey a high degree of opinion; hence they play an important role in Sentiment Classification. From each tagged message board post we extract all the adjectives, which we identify by the ‘/JJ’ tag that follows them.
3. SentiWordNet Lexicon: SentiWordNet is a lexical resource for opinion mining. It assigns to each synset of WordNet three sentiment scores: positivity, negativity and objectivity. For each adjective we look up this lexicon and use the corresponding positivity, negativity and objectivity scores to derive the sentiment of the adjective.
4. Semantic Orientation Identification: DJI has 30 component companies. On any trading day there are many messages for any one company. For each of these companies we need to find the semantic orientation for each trading day by considering all the messages for that company on that trading day.

After the POS tagging described above, we extract the adjectives from each message and then follow the steps below to derive the sentiment for a particular company on a particular trading day.

1. We find the Semantic Orientation of each adjective in a message.
2. Using the results from step 1, we find the Semantic Orientation of each message.
3. Using the results from step 2, we find the sentiment for each trading day for each company.

These steps are explained below.

For each message we find its semantic orientation using the SentiWordNet file. To do so, we first need to find the semantic orientation of each adjective in the message. For each adjective (Adj) in the message (M) we find its Semantic Orientation (SO) as follows:

i. Look up the SentiWordNet file and find all records where this adjective appears. Let n be the total number of records found.
ii. Calculate the positivity score (PosScore(Adj)) and the negativity score (NegScore(Adj)) of the adjective as follows:
PosScore(Adj) = (1)
NegScore(Adj) = (2)
iii. If PosScore(Adj) = NegScore(Adj), SO(Adj) = neutral;
else if PosScore(Adj) > NegScore(Adj), SO(Adj) = positive;
else SO(Adj) = negative.

Once we have the semantic orientation of each adjective in a message, we need to find the semantic orientation of the message.

For each message (M):
PosCnt = number of adjectives having positive semantic orientation.
NegCnt = number of adjectives having negative semantic orientation.
NeuCnt = number of adjectives having neutral semantic orientation.

Semantic Orientation (M) is:
neutral if PosCnt = NegCnt
neutral if NeuCnt > PosCnt and NeuCnt > NegCnt
positive if PosCnt > NegCnt and PosCnt >= NeuCnt
negative if NegCnt > PosCnt and NegCnt >= NeuCnt

Once we have the semantic orientation of each message, we find the semantic orientation for each trading day, for each company, by the same approach that we used for a message: we first count the number of messages having positive, negative and neutral sentiment, and then apply the rules given above. We then create a sentiment vector giving the sentiment for each company.
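
The three-level procedure above (adjective, message, trading day) can be sketched in Python as follows. Several parts are assumptions rather than the authors' implementation: the text does not spell out how PosScore(Adj) and NegScore(Adj) are computed from the n records, so the sketch takes the mean over the records (a plain sum would give the same orientation, since both scores share the same n); the tiny `LEXICON` holds made-up scores standing in for the SentiWordNet file; and adjectives missing from the lexicon are treated as neutral.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon: adjective -> list of
# (positivity, negativity) records. Scores are made up for illustration.
LEXICON = {
    "good": [(0.75, 0.0), (0.5, 0.125)],
    "bad": [(0.0, 0.625), (0.125, 0.75)],
    "quarterly": [(0.0, 0.0)],
}

# Tokens tagged exactly /JJ, as in the text (not /JJR or /JJS).
ADJ_PATTERN = re.compile(r"(\S+)/JJ\b")


def extract_adjectives(tagged_message):
    """Return the lower-cased adjectives of a POS-tagged message."""
    return [word.lower() for word in ADJ_PATTERN.findall(tagged_message)]


def adjective_orientation(adjective, lexicon=LEXICON):
    """Step 1: orientation of a single adjective."""
    records = lexicon.get(adjective, [])
    if not records:
        return "neutral"  # assumption: unknown adjectives are neutral
    n = len(records)
    pos_score = sum(pos for pos, _ in records) / n  # assumed: mean
    neg_score = sum(neg for _, neg in records) / n  # assumed: mean
    if pos_score == neg_score:
        return "neutral"
    return "positive" if pos_score > neg_score else "negative"


def aggregate(orientations):
    """Apply the PosCnt / NegCnt / NeuCnt rules to a set of labels."""
    counts = Counter(orientations)
    pos, neg, neu = counts["positive"], counts["negative"], counts["neutral"]
    if pos == neg or (neu > pos and neu > neg):
        return "neutral"
    return "positive" if pos > neg else "negative"


def message_orientation(tagged_message):
    """Step 2: orientation of one message from its adjectives."""
    adjectives = extract_adjectives(tagged_message)
    return aggregate(adjective_orientation(a) for a in adjectives)


def daily_sentiment(tagged_messages):
    """Step 3: sentiment of one company on one trading day."""
    return aggregate(message_orientation(m) for m in tagged_messages)
```

Calling `daily_sentiment` once per company on the messages of a given trading day would yield the entries of the sentiment vector described above.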
3.3 Neural Network Training
We use a Feed Forward neural network structure trained with the Back Propagation Algorithm. The network configuration details are as follows:

1. Input layer: There is only one input layer. This layer accepts the input data that is fed to the neural network. Each record in the input data corresponds to one trading day. In our configuration the input layer has