Technocolabs Machine Learning Internship: Project Report

TECHNOCOLABS MACHINE LEARNING
INTERNSHIP
PROJECT REPORT
TITLE: Predicting Stock Price Changes Using Past Prices and
News Articles
INDEX
SL No.  TITLE
1       AIM
2       ABSTRACT
3       INTRODUCTION
4       PROJECT OVERVIEW
5       ANALYSIS – UNDERSTANDING THE DATA
6       VISUALIZATION OF DATA
7       MODEL BUILDING I (LSTM MODEL ON STOCK PRICE DATA)
8       MODEL BUILDING II (RANDOM FOREST MODEL ON STOCK NEWS DATA)
9       ADVANTAGES
10      LIMITATIONS
11      CONCLUSION
12      REFERENCES
AIM: The goal of the project is to predict future price changes
for Apple stock. Information that might be leveraged to make
these predictions more accurate includes prices from previous
days and financial news articles related to Apple.
ABSTRACT: Stock market price data is generated in huge volume
and changes every second. The stock market is a complex and
challenging system in which people can either gain money or
lose their entire life savings. In this work, an attempt is made
to predict the movement of Apple stock. Two models are built:
one for daily prediction of the Apple stock movement, and the
other for prediction of the percentage change from the previous
20 days of stock prices. Supervised machine learning algorithms
are used to build both models. As part of the daily prediction
model, historical prices are combined with market sentiment.
An accuracy of up to 56.6% is observed using the Random Forest
model for daily prediction of the Apple stock movement. The
LSTM model tries to predict the percentage change of the Apple
stock from the previous 20 days of stock prices.
INTRODUCTION: Stock price prediction is important because it is
widely used by business people as well as ordinary investors.
Building an accurate model is difficult because variation in
price depends on multiple factors such as news, social media
data, the company's production, government bonds, historical
prices and the country's economy. A prediction model which
considers only one factor might not be accurate. Hence,
incorporating multiple factors such as news, social media data
and historical prices might increase the accuracy of the model.
The goal of this project is to build a model which predicts the
stock trend movement (whether the trend will be up or down)
using historical Apple stock news. Two models are built as part
of the project. Both models use supervised machine learning
algorithms. The first model is a daily prediction model which
considers the historical news about Apple stock. This model
predicts the stock movement for the next day. The sentiment of
the stock is computed from news about the stock, which is
extracted with the FMP.CLOUD.IO API, and the historical price
of the stock is extracted with the Yahoo Finance stock price
API. The outcome of the stock movement analysis is considered
along with the open price. The second model predicts the
percentage change of the Apple stock on the current date from
the previous 20 days of stock prices.
PROJECT OVERVIEW:
1. START
2. COLLECTING THE DATASET
3. DATA CLEANING AND HANDLING MISSING DATA
4. DATA PREPARATION
5. EXPLORATORY DATA ANALYSIS
6. MODEL BUILDING I (LSTM MODEL ON STOCK PRICE DATA)
7. TEXT PROCESSING AND EDA
8. LABELING THE STOCK NEWS DATA ACCORDING TO STOCK MOVEMENT
FROM PREVIOUS DAY
9. MODEL BUILDING II (RANDOM FOREST MODEL ON STOCK NEWS DATA)
10. INTEGRATING MODEL I AND MODEL II IN STREAMLIT APP
11. DEPLOYMENT IN HEROKU
ANALYSIS – UNDERSTANDING THE DATA:
First, the Apple news dataset is collected with the
FMP.CLOUD.IO news API and prepared. Then the historical price
data of the Apple stock is collected with the Yahoo Finance
API. The Apple stock data is labelled based on the dates and
news. The Close price of the Apple stock is the price
considered in this project.
Here is the plot of the Apple stock price distribution:
MODEL BUILDING I (LSTM MODEL ON STOCK
PRICE DATA):
Long Short-Term Memory (LSTM) models are powerful time-series
models. They can predict an arbitrary number of steps into the
future. An LSTM module (or cell) has 5 essential components
which allow it to model both long-term and short-term data.
• Cell state (c_t) - This represents the internal memory of the
cell, which stores both short-term and long-term memories.
• Hidden state (h_t) - This is the output state information
calculated w.r.t. the current input, the previous hidden state
and the current cell input, which is eventually used to predict
the future stock market prices. Additionally, the hidden state
can retrieve only the short-term memory, only the long-term
memory, or both types of memory stored in the cell state to
make the next prediction.
• Input gate (i_t) - Decides how much information from the
current input flows into the cell state.
• Forget gate (f_t) - Decides how much of the previous cell
state is carried over into the current cell state, based on the
current input and the previous hidden state.
• Output gate (o_t) - Decides how much information from the
current cell state flows into the hidden state, so that if
needed the LSTM can pick only the long-term memories or only
the short-term memories.
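The interaction of these five components can be sketched with the standard LSTM update equations. The following is only a minimal illustration for a single scalar input and a single hidden unit, not the model used in this project; the function names and the `params` layout are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (illustrative sketch).

    params maps each of "input", "forget", "output" and "candidate"
    to a tuple (w, u, b): input weight, recurrent weight and bias.
    This layout is hypothetical and chosen only for readability.
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x_t + u * h_prev + b)

    i_t = gate("input", sigmoid)         # how much new information enters
    f_t = gate("forget", sigmoid)        # how much old cell state is kept
    o_t = gate("output", sigmoid)        # how much cell state is exposed
    c_tilde = gate("candidate", math.tanh)  # candidate cell input

    c_t = f_t * c_prev + i_t * c_tilde   # new cell state (memory)
    h_t = o_t * math.tanh(c_t)           # new hidden state (output)
    return h_t, c_t
```

In a real network each gate works on vectors with weight matrices, and the weights are learned during training rather than set by hand.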
The Apple stock price data is prepared and the features are
engineered. The previous 20 days of Apple stock prices are
taken to create the feature variables which influence our
target variable, the percentage change from the stock price 20
days before. The LSTM model is applied to this dataset to
predict the percentage change from the stock price 20 days
before, and the model is saved.
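The feature preparation described above might look roughly like the sketch below. It assumes the target is the percentage change of the current close relative to the close 20 trading days earlier, which is one possible reading of the description; the function name `make_windows` and the toy price series are hypothetical.

```python
def make_windows(closes, window=20):
    """Build (features, targets) from a list of daily close prices.

    features[k] holds the `window` closes preceding a given day.
    targets[k] is the percentage change of that day's close
    relative to the close `window` days earlier (an assumption
    about how the report defines its target).
    """
    features, targets = [], []
    for end in range(window, len(closes)):
        past = closes[end - window:end]      # previous 20 closes
        change = (closes[end] - past[0]) / past[0] * 100.0
        features.append(past)
        targets.append(change)
    return features, targets


# Toy series for illustration only, not real Apple prices.
toy_closes = [100.0 + day for day in range(25)]
X, y = make_windows(toy_closes)
print(len(X), len(X[0]))
```

Each row of `X` can then be reshaped into the (time steps, features) layout that an LSTM layer expects.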
OUTCOME OF LSTM MODEL:
INTERPRETATION OF OUTCOME:
The validation graph above shows that the predicted percentage
change over the previous 20 days of Apple stock approximately
follows the actual percentage change over the previous 20 days
of Apple stock prices. This means that the outcome of the model
prediction is decent.
MODEL BUILDING II (TFIDF VECTORIZER AND RANDOM FOREST MODEL
ON STOCK NEWS DATA):
Random forest is a supervised machine learning algorithm for
classification which uses an ensemble method. Simply put, a
random forest is made up of numerous decision trees and helps
to tackle the problem of overfitting in individual decision
trees. These decision trees are constructed randomly by
selecting random features from the given dataset.
A random forest arrives at a decision or prediction based on
the maximum number of votes received from the decision trees:
the outcome predicted most often by the decision trees is taken
as the final outcome of the random forest. An ensemble simply
means a group or collection, which in this case is a collection
of decision trees that together are called a random forest. The
accuracy of an ensemble model is usually better than the
accuracy of its individual models because it combines their
results into a final outcome.
The news data is labelled according to the stock market
movement from the previous day. After labelling, the TFIDF
Vectorizer converts the collection of raw documents to a matrix
of TF-IDF features; this is equivalent to CountVectorizer
followed by TfidfTransformer. Lemmatization is applied to each
row to bring all words to a common form, which is helpful when
assigning polarity to each word. Lemmatization is done with the
help of the Natural Language Toolkit (NLTK) package, which is
available in Python. After all the text processing, the word
distribution is produced as part of the EDA.
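To make the TF-IDF step concrete, here is a small standard-library sketch of how raw documents become a TF-IDF matrix. It uses smoothed inverse document frequency and L2-normalized rows, which to our understanding mirrors the default weighting of scikit-learn's TfidfVectorizer, but it is a simplified illustration with naive whitespace tokenization, not the library itself. The function name and sample headlines are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, rows) where rows[d][j] is the TF-IDF
    weight of vocabulary[j] in document d."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed IDF: rarer tokens get larger weights.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
           for tok in vocab}

    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])   # L2-normalize each row
    return vocab, rows


# Made-up headlines for illustration only.
vocab, rows = tfidf_matrix(["apple stock rises", "apple stock falls"])
print(vocab)
```

Words that appear in every document, such as "apple" here, receive a lower weight than words that distinguish one document from another.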
The distributions of the most frequent words are shown here
(Using one word at a time):
(Using 2 words at a time):
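Word distributions of this kind (one word or two words at a time) can be computed by counting n-grams. The sketch below is an assumption about how such counts could be produced, using whitespace tokenization and a hypothetical helper `top_ngrams`; the sample headlines are made up.

```python
from collections import Counter


def top_ngrams(documents, n=1, k=10):
    """Return the k most frequent n-grams across all documents
    as a list of (ngram, count) pairs."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        counts.update(
            " ".join(tokens[i:i + n])
            for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(k)


# Made-up headlines for illustration only.
headlines = ["apple stock rises", "apple stock falls"]
print(top_ngrams(headlines, n=1, k=3))   # single words
print(top_ngrams(headlines, n=2, k=3))   # pairs of words
```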

With the news data labelled according to the stock market
movement from the previous day, the Random Forest classifier
described above is then trained on this data.
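The labelling of news by the stock movement from the previous day could be sketched as follows. This assumes each news item carries an ISO-formatted date and that a day is labelled UP when its close is higher than the previous trading day's close; the function names, date strings and prices are all hypothetical.

```python
def label_movements(closes_by_date):
    """Map each date (except the first) to "UP" or "DOWN",
    comparing its close with the previous available close."""
    dates = sorted(closes_by_date)   # ISO dates sort chronologically
    labels = {}
    for prev, cur in zip(dates, dates[1:]):
        went_up = closes_by_date[cur] > closes_by_date[prev]
        labels[cur] = "UP" if went_up else "DOWN"
    return labels


def label_news(news_items, labels):
    """Attach a movement label to each (date, headline) pair,
    skipping dates that have no label."""
    return [(headline, labels[date])
            for date, headline in news_items
            if date in labels]


# Made-up prices and headlines for illustration only.
closes = {"2021-01-04": 100.0, "2021-01-05": 102.0, "2021-01-06": 101.0}
news = [("2021-01-05", "headline a"), ("2021-01-06", "headline b")]
print(label_news(news, label_movements(closes)))
```

The resulting (headline, label) pairs are the kind of input a text classifier such as a random forest over TF-IDF features would be trained on.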
OUTCOME OF RANDOM FOREST CLASSIFIER:

The accuracy score is 56.6 percent; it might be improved by
training on more stock price data.
INTERPRETATION OF OUTCOME:
The Random Forest model predicts whether the Apple stock
movement is UP or DOWN on the next day. A few intuitions can be
drawn from this kind of prediction.
• Looking at the predictions for the past few days, if the
predicted Apple stock movement for the next day is UP, then
hold the stock or buy more stock, according to risk
management.
• If the predicted Apple stock movement for the next day is
DOWN, then sell the stock and wait for the dip, so that it
becomes possible to buy the stock again and sell it when the
stock price is in an uptrend, making more profit from it.
Again, all decisions need to be taken very carefully,
according to risk management and with more financial
knowledge.
INTEGRATION OF MODEL I AND MODEL II IN STREAMLIT APP AND
DEPLOYMENT:
After the model building part is complete, both models are
integrated in a Streamlit app and deployed on the Heroku
platform.
UI OF STREAMLIT APP:
ADVANTAGES:
• Forecasting and predicting market trends is one of the most
important applications of stock market analysis.
• It also uncovers future market behaviour, which helps
investors understand when and which stocks can be purchased
to grow their investment.
LIMITATIONS:
• Forecasting stock prices remains a major challenge for
companies and equity traders who rely on it to predict future
revenues.
• Building an accurate model is difficult because variation in
price depends on multiple factors such as news, social media
data, the company's production, government bonds, historical
prices and the country's economy.
CONCLUSION:
Knowing stock movements even a fraction of a second ahead can
lead to high profits for investors, which makes stock market
studies a major motivation.
The great advances and success of natural language processing
and sentiment analysis of online news, based on machine
learning and deep learning, have recently gained huge
popularity in the financial domain, especially in market
prediction models.
There are many predictive models which indicate whether the
market trend is up or down, but they often fail to give
accurate results. This project was carried out to build an
efficient predictive model of the stock market in which the
movement for the next day is predicted.
REFERENCES:
1. https://www.intechopen.com/books/e-business-higher-education-and-intelligence-applications/recent-advances-in-stock-market-prediction-using-text-mining-a-survey
2. https://www.cse.ust.hk/~rossiter/fyp/103_RO4_Final_201819.pdf
