INTERNSHIP
PROJECT REPORT
SL No.  TITLE
1       AIM
2       ABSTRACT
3       INTRODUCTION
4       PROJECT OVERVIEW
5       ANALYSIS – UNDERSTANDING THE DATA
6       VISUALIZATION OF DATA
7       MODEL BUILDING I (LSTM MODEL ON STOCK PRICE DATA)
8       MODEL BUILDING II (RANDOM FOREST MODEL ON STOCK NEWS DATA)
9       ADVANTAGES
10      LIMITATIONS
11      CONCLUSION
12      REFERENCES
AIM: The goal of the project is to predict future price changes of Apple stock. Information that might be leveraged to make these predictions more accurate includes prices from previous days and financial news articles related to Apple.
Figure: Project workflow. Stages shown: start; collecting the dataset; data preparation; exploratory data analysis; Model Building I (LSTM model on stock price data); labeling the stock news data according to stock movement from the previous day; text processing and EDA; Model Building II (random forest model on stock news data); integrating Model I and Model II in a Streamlit app.
ANALYSIS – UNDERSTANDING THE DATA:
First, the Apple news dataset is collected with the FMP.CLOUD.IO news API and prepared. Then the historical Apple stock price data is collected with the Yahoo Finance API. The Apple stock data is labelled based on the dates and the news. The Close price of Apple stock is the price considered throughout this project.
Here is the plot of the Apple stock price distribution:
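The labelling step described above can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes the downloaded data has already been reduced to a dictionary of daily Close prices keyed by date and a list of (date, headline) pairs (hypothetical structures, since the real FMP Cloud and Yahoo Finance responses are not shown in this report), and it labels each day 1 if the Close rose versus the previous trading day and 0 otherwise. The function names, prices and headlines are made up for illustration.

```python
from datetime import date


def label_movements(closes):
    """Label each trading date (except the first) with 1 if the Close price
    rose versus the previous trading date, else 0 (fall or unchanged)."""
    days = sorted(closes)
    return {
        day: 1 if closes[day] > closes[prev] else 0
        for prev, day in zip(days, days[1:])
    }


def attach_labels(news, labels):
    """Pair each (date, headline) with that date's movement label; headlines
    on dates without a label (weekends, holidays, the first day) are skipped."""
    return [(day, text, labels[day]) for day, text in news if day in labels]


# Made-up Close prices and headlines, for illustration only.
closes = {date(2021, 1, 4): 100.0, date(2021, 1, 5): 102.0, date(2021, 1, 6): 101.0}
news = [
    (date(2021, 1, 5), "example headline A"),
    (date(2021, 1, 6), "example headline B"),
    (date(2021, 1, 9), "example headline C"),
]

labels = label_movements(closes)
labelled_news = attach_labels(news, labels)
print(labelled_news)
```

Matching on the calendar date is the simplest choice; news published after market close or on non-trading days would need an explicit rule for which trading day it belongs to.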
MODEL BUILDING I (LSTM MODEL ON STOCK PRICE DATA):
Long Short-Term Memory (LSTM) models are powerful time-series models that can be used to predict multiple steps into the future. An LSTM module (or cell) has five essential components, which allow it to model both long-term and short-term dependencies in the data.
Cell state (c_t) - This represents the internal memory of the cell, which stores both short-term and long-term memories.
Hidden state (h_t) - This is the output state information, calculated with respect to the current input, the previous hidden state and the current cell state, and it is what is eventually used to predict future stock market prices. Additionally, the hidden state can retrieve only the short-term memory, only the long-term memory, or both types of memory stored in the cell state to make the next prediction.
Input gate (i_t) - Decides how much information from the current input flows into the cell state.
Forget gate (f_t) - Decides, based on the current input and the previous hidden state, how much information from the previous cell state is kept in the current cell state.
Output gate (o_t) - Decides how much information from the current cell state flows into the hidden state, so that, if needed, the LSTM can pick only the long-term memories or only the short-term memories.
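How the five components listed above interact in one time step can be sketched with the standard LSTM update equations. The sketch below is a toy single-unit cell in plain Python, written only to show how the gates combine; the parameter names (w_*, u_*, b_*) and values are hypothetical, and it also uses the tanh candidate value that the standard formulation computes alongside the three gates. A real model built with a deep learning library uses vectors and learned weight matrices instead of scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell.

    p maps hypothetical parameter names to scalars: for the input (i), forget (f)
    and output (o) gates and the candidate value (c), an input weight w_*,
    a recurrent weight u_* and a bias b_*.
    """
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])      # output gate
    c_hat = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])  # candidate value
    c_t = f_t * c_prev + i_t * c_hat  # cell state: keep part of the old memory, add new
    h_t = o_t * math.tanh(c_t)        # hidden state: gated view of the cell state
    return h_t, c_t


# Hypothetical parameters: all weights 0.5, all biases 0.
params = {f"{kind}_{gate}": (0.0 if kind == "b" else 0.5)
          for kind in ("w", "u", "b") for gate in ("i", "f", "o", "c")}

h, c = 0.0, 0.0
for x in [0.1, -0.2, 0.3]:  # a short made-up input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Setting the forget-gate bias very high and the input-gate bias very low makes the cell carry its previous memory forward almost unchanged, which is the mechanism that lets an LSTM keep long-term information.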
The Apple stock price data is prepared and the features are engineered. The prices of the previous 20 days are used to create the feature variables that influence the target variable, namely the percentage change relative to the stock price 20 days earlier. The LSTM model is applied to this dataset to predict this percentage change, and the trained model is saved.
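One reading of this feature construction, sketched in plain Python: for each day t, the features are the 20 Close prices before t, and the target is the percentage change of day t's price relative to the price 20 days earlier. The exact definition used in the project is not spelled out, so both this interpretation and the helper name make_windows are assumptions. For a Keras-style LSTM layer, the feature lists would then be arranged into an array of shape (samples, 20, 1), i.e. (samples, time steps, features per step).

```python
def make_windows(closes, window=20):
    """Build (features, target) pairs from a chronological list of Close prices.

    features: the `window` prices immediately before day t
    target:   percentage change of the price on day t relative to the price
              `window` days earlier (the first price in the feature window)
    """
    samples = []
    for t in range(window, len(closes)):
        features = closes[t - window:t]
        base = closes[t - window]
        target = (closes[t] - base) / base * 100.0
        samples.append((features, target))
    return samples


# Usage on a tiny made-up series, with a window of 2 for readability.
for features, target in make_windows([100.0, 110.0, 121.0, 100.0], window=2):
    print(features, round(target, 2))
```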
OUTCOME OF LSTM MODEL:
INTERPRETATION OF OUTCOME:
The validation graph above shows that the predicted percentage change of Apple stock over the previous 20 days approximately follows the actual percentage change over the same period. This suggests that the model's predictions are decent.
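The interpretation above rests on a visual comparison. A few standard error measures could put a number on how closely the predictions follow the actual values; the sketch below is an illustrative addition rather than part of the original evaluation, and the sample values are made up.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error: penalises large misses more than small ones."""
    assert len(actual) == len(predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target (percentage points)."""
    assert len(actual) == len(predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def direction_agreement(actual, predicted):
    """Share of points where prediction and actual agree on direction
    (both positive, or both zero/negative)."""
    assert len(actual) == len(predicted)
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return hits / len(actual)


# Made-up percentage changes, for illustration only.
actual = [2.0, -1.0, 3.0, -4.0]
predicted = [1.0, -2.0, 3.0, 4.0]
print(rmse(actual, predicted), mae(actual, predicted), direction_agreement(actual, predicted))
```

MAE is the easiest to read here because it is expressed in percentage points, while direction agreement relates the numeric prediction back to the up/down question that the rest of the report is concerned with.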
MODEL BUILDING II (TF-IDF VECTORIZER AND RANDOM FOREST MODEL ON STOCK NEWS DATA):
Random forest is a supervised machine learning algorithm, used here for classification, that relies on an ensemble method. Simply put, a random forest is made up of numerous decision trees, and it helps to tackle the overfitting that single decision trees are prone to. The trees are built with randomness: each tree is trained on a random sample of the dataset, and only a random subset of the features is considered when choosing its splits.
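The randomness described above, together with the majority vote that random forests use to combine their trees, can be sketched with three small helpers. This is illustrative only and not the project's code: it assumes the common recipe in which each tree is trained on a bootstrap sample of the rows (drawn with replacement) and each split considers a random subset of about sqrt(n) features. The helper names are hypothetical; in a full forest, random_feature_subset would be called at every split of every tree.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: the training set for one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices to consider at one
    split, a common default for classification forests."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
print(bootstrap_sample(["row1", "row2", "row3", "row4"], rng))
print(random_feature_subset(16, rng))       # 4 of the 16 feature indices
print(majority_vote(["up", "down", "up"]))  # "up"
```

Because each tree sees different rows and different features, the trees make partly independent errors, and the vote averages many of those errors away.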
Random forest arrives at a decision or prediction by majority vote: the outcome predicted by the largest number of decision trees is taken as the final outcome of the random forest. Random forests are based on ensemble learning techniques. Ensemble simply means a group or collection, which in this case is a collection of decision trees, together called a random forest. Ensemble models are generally more accurate than the individual models they are built from, because they combine the results of many individual models into a final outcome.
The news data is labelled according to the stock market movement from the previous day. After labelling, lemmatization is applied to each row to reduce all words to a common base form, which is helpful when assigning polarity to each word. Lemmatization is done with the Natural Language Toolkit (NLTK) package available in Python. The TF-IDF Vectorizer then converts the collection of documents into a matrix of TF-IDF features; it is equivalent to a CountVectorizer followed by a TfidfTransformer.
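To make the TF-IDF step concrete, the sketch below computes TF-IDF weights in plain Python. It is intended to mirror the default settings of scikit-learn's TfidfVectorizer (lowercasing, tokens of two or more word characters, raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalised rows); in practice the library class would be used directly, and the helper names here are hypothetical. Lemmatization is left out so the sketch needs only the standard library; with NLTK it is typically done with WordNetLemmatizer().lemmatize(word) from nltk.stem, after downloading the WordNet data with nltk.download('wordnet'). The headlines are made up.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    # Lowercase and keep tokens of 2+ word characters, like scikit-learn's
    # default token_pattern r"(?u)\b\w\w+\b".
    return re.findall(r"(?u)\b\w\w+\b", doc.lower())


def tfidf(docs):
    """Return (vocabulary, rows) where rows[d][j] is the TF-IDF weight of
    vocabulary[j] in docs[d]: raw counts, smoothed idf, L2-normalised rows."""
    tokenized = [tokenize(d) for d in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocabulary, rows


docs = ["Apple shares rise", "Apple shares fall", "iPhone sales rise"]
vocabulary, rows = tfidf(docs)
print(vocabulary)
for row in rows:
    print([round(v, 3) for v in row])
```

A word that appears in fewer headlines (such as "fall" here) receives a higher weight than one shared by many headlines (such as "apple"), which is what lets the random forest focus on distinctive words.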
After all the text processing, the word distribution is explored through EDA.
The distribution of the most frequent words (using one word at a time, i.e. unigrams) is shown here:
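Counting the most frequent unigrams behind such a distribution can be sketched with collections.Counter. The small stop-word list and the helper name top_words are assumptions for illustration (NLTK provides a fuller English stop-word list), and the headlines are made up.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "is"}


def top_words(docs, k=10):
    """Return the k most frequent unigrams across docs, ignoring stop words."""
    counts = Counter(
        tok
        for doc in docs
        for tok in re.findall(r"\b[a-z]+\b", doc.lower())
        if tok not in STOP_WORDS
    )
    return counts.most_common(k)


docs = [
    "Apple stock rises on iPhone demand",
    "Apple is the top stock",
    "iPhone sales slow in China",
]
print(top_words(docs, 3))
```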
LIMITATIONS:
Forecasting stock prices is challenging, even though it is a crucial task for companies and equity traders who want to predict future revenues.
Building an accurate model is difficult, because price variation depends on multiple factors such as news, social media data, the company's production, government bonds, historical prices and the country's economy.
CONCLUSION:
Knowing stock movements even a fraction of a second in advance can let investors make high profits, which is a major motivation for studying the stock market.
The great advances and success of natural language processing and of sentiment analysis of online news, based on machine learning and deep learning, have recently made these techniques very popular in the financial domain, especially in market prediction models.
There are many predictive models that indicate whether the market trend is up or down, but they often fail to give accurate results. This project was carried out to build an efficient predictive model of the stock market in which the movement for the next day is predicted.