You are on page 1of 38

lOMoARcPSD|20617519

STOCK PRICE PREDICTOR


Using Long-Short Term Memory Networks

Udacity Machine Learning Nanodegree

MINOR PROJECT REPORT


lOMoARcPSD|20617519

Submitted by

INDU SHARMA (1619066)

in partial fulfillment for the award of the degree

of

MASTER’S OF TECHNOLOGY

in

BRANCH COMPUTER SCIENCE AND ENGINEERING

JIND INSTITUTE OF ENGINEERING AND TECHNOLOGY - JIND

KURUKSHETRA UNIVERSITY

(2022-2023)

CERTIFICATE
lOMoARcPSD|20617519

This is to certify that Minor project entitled “DRIVER DROWSINESS SYSTEM” is a bonafide
work carried out by “INDU SHARMA (1619066)” under my guidance and supervision and
submitted in partial fulfillment of the award of M. Tech degree in Computer science and
Engineering. The work embodied in the Minor Project has not been submitted for the award of other
degree or diploma to the best of my knowledge

Ms. Neeraj
(Asst.prof.)
(PROJECT
SUPERVISION)

Ms. Sapna Aggrawal


(Head of the Department)

STUDENT’S DECLARATION
lOMoARcPSD|20617519

I hereby certify that the work which is being presented in the minor project report
entitled "STOCK PRICE PREDICTOR" in fulfillment of the requirement for the
award of the Degree of Master’s of Technology in Department of Computer Science
& Engineering of Jind Institute of Engineering and Technology, Jind, Kurukshetra
University, Kurukshetra, Haryana is an authenticrecord of my own work carried out
during 2th semester.

INDU SHARMA
(1619066)

ACKNOWLEDGEMENT
We are highly grateful to the Dr. S.K Singh, Principal, Jind Institute of Engineering
and Technology, Jind, for providing this opportunity.
lOMoARcPSD|20617519

The constant guidance and encouragement received from Ms Sapna Aggrawal, HOD
(CSE/IT, deptt.), JIET, Jind has been of great help in carrying out the project work
and is acknowledged with reverential thanks.
We would like to express a deep sense of gratitude and thanks profusely to Asst.Prof.
Neeraj project guide, without the wise counsel and able guidance, it would have
been impossible to complete the report in this manner.
We express gratitude to other faculty members of CSE department of JIET for their
intellectual support throughout the course of this work.
Finally, the authors are indebted to all whosoever have contributed in this report
work

Table of Content
1
DEFINITION 2
Project Overview 2
Problem Statement 3
Metrics 3
lOMoARcPSD|20617519

ANALYSIS 4
Data Exploration 4
Exploratory Visualization 5
Algorithms and Techniques 6
Benchmark Model 7
METHODOLOGY 8
Data Preprocessing 8
Implementation 10
Refinement 11
RESULT 14
Model Evaluation and Validation 14
Justification 17
CONCLUSION 18
Free-Form Visualization 18
Reflection 19
Improvement 20

DEFINITION

Project Overview
Investment firms, hedge funds and even individuals have been using financial models to better
understand market behavior and make profitable investments and trades. A wealth of information
is available in the form of historical stock prices and company performance data, suitable for
machine learning algorithms to process.
lOMoARcPSD|20617519

Can we actually predict stock prices with machine learning? Investors make educated guesses by
analyzing data. They'll read the news, study the company history, industry trends and other lots of
data points that go into making a prediction. The prevailing theories is that stock prices are totally
random and unpredictable but that raises the question why top firms like Morgan Stanley and
Citigroup hire quantitative analysts to build predictive models. We have this idea of a trading
floor being filled with adrenaline infuse men with loose ties running around yelling something
into a phone but these days they're more likely to see rows of machine learning experts quietly
sitting in front of computer screens. In fact about 70% of all orders on Wall Street are now placed
by software, we're now living in the age of the algorithm.

This project seeks to utilize Deep Learning models, Long-Short Term Memory (LSTM) Neural
Network algorithm, to predict stock prices. For data with timeframes recurrent neural networks
(RNNs) come in handy but recent researches have shown that LSTM, networks are the most
popular and useful variants of RNNs.

I will use Keras to build a LSTM to predict stock prices using historical closing price and trading
volume and visualize both the predicted price values over time and the optimal parameters for the
model.

Problem Statement
The challenge of this project is to accurately predict the future closing value of a given stock
across a given period of time in the future. For this project I will use a Long Short Term
Memory networks1 – usually just called “LSTMs” to predict the closing price of the S&P 5002
using a dataset of past prices

GOALS
1. Explore stock prices.

2. Implement basic model using linear regression.

3. Implement LSTM using keras library.

4. Compare the results and submit the report.


lOMoARcPSD|20617519

Metrics
For this project measure of performance will be using the Mean Squared Error (MSE) and Root
Mean
Squared Error (RMSE) calculated as the difference between predicted and actual values of the
target

ANALYSIS

Data Exploration
3
The data used in this project is of the Alphabet Inc from January 1, 2005 to June 20, 2017,
this is a series of data points indexed in time order or a time series. My goal was to predict the
closing price for any given date after training. For ease of reproducibility and reusability, all data
was pulled from the Google Finance Python API 4.

The prediction has to be made for Closing (Adjusted closing) price of the data. Since Google
Finance already adjusts the closing prices for us 5, we just need to make prediction for
“CLOSE” price.

The dataset is of following form :


lOMoARcPSD|20617519

Date Open High Low Close Volume

30-Jun-17 943.99 945.00 929.61 929.68 2287662

29-Jun-17 951.35 951.66 929.60 937.82 3206674

28-Jun-17 950.66 963.24 936.16 961.01 2745568

Table: The whole data can be found out in ‘Google.csv’ in the project root folder6
Note: I did not observe any abnormality in datasets, i.e, no feature is empty and does not contains
any incorrect value as negative values.

3 Alphabet In c
5
4 Google Finance python ap i adjusts
the closing prices for us 6
Google.csv

The mean, standard deviation, maximum and minimum of the data was found to be following:

Feature Open High Low Close Volume

Mean 382.5141 385.8720 378.7371 382.3502 4205707.8896


lOMoARcPSD|20617519

Std 213.4865 214.6022 212.08010 213.4359 3877483.0077

Max 1005.49 1008.61 1008.61 1004.28 41182889

Min 87.74 89.29 86.37 87.58 521141

We can infer from this dataset that date, high and low values are not important features of the
data. As it does not matter at what was the highest prices of the stock for a particular day or what
was the lowest trading prices. What matters is the opening price of the stock and closing prices of
the stock. If at the end of the day we have higher closing prices than the opening prices that we
have some profit otherwise we saw losses. Also volume of share is important as a rising market
should see rising volume, i.e, increasing price and decreasing volume show lack of interest, and
this is a warning of a potential reversal. A price drop (or rise) on large volume is a stronger signal
that something in the stock has fundamentally changed.

Therefore i have removed Date, High and low features from data set at preprocessing step. The
mean, standard deviation, maximum and minimum of the preprocessed data was found to be
following:

Mean Std Max Min


lOMoARcPSD|20617519

Open 0.3212 0.23261 1.0 0.0

0.232
Close 0.3215 8 1.0 0.0

Volume 0.09061 0.0953 1.0 0.0

Exploratory Visualization
1
To visualize the data i have used matplotlib library. I have plotted Closing stock price of

the data with the no of items( no of days) available.

Following is the snapshot of the plotted data :

1 Matplotlib
lOMoARcPSD|20617519

X-axis: Represents Tradings Days


Y-axis: Represents Closing Price In USD

Through this data we can see a continuous growth in Alphabet Inc. The major fall in the prices between
6001000 might be because of the Global Financial Crisis of 2008-2009.

Algorithms and Techniques

The goal of this project was to study time-series data and explore as many options
as possible to accurately predict the Stock Price. Through my research i came to
know about Recurrent Neural Nets (RNN)2 which are used specifically for
2 Recurrent Neural Networ k
lOMoARcPSD|20617519

sequence and pattern learning. As they are networks with loops in them, allowing
information to persist and thus ability to memorise the data accurately. But
Recurrent Neural Nets have vanishing Gradient descent problem which does not
allow it to learn from past data as was expected. The remedy of this problem was
solved in Long-Short Term Memory Networks3, usually referred as LSTMs.
These are a special kind of RNN, capable of learning long-term dependencies.

In addition to adjusting the architecture of the Neural Network, the following full
set of parameters can be tuned to optimize the prediction model:

• Input Parameters

• Preprocessing and Normalization (see Data Preprocessing Section)

• Neural Network Architecture

• Number of Layers (how many layers of nodes in the model; used 3)

• Number of Nodes (how many nodes per layer; tested 1,3,8, 16, 32, 64,
100,128)

• Training Parameters

• Training / Test Split (how much of dataset to train versus test model on; kept
constant at
82.95% and 17.05% for benchmarks and lstm model)

• Validation Sets (kept constant at 0.05% of training sets)

• Batch Size (how many time steps to include during a single training step;
kept at
1 for basic lstm model and at 512 for improved lstm model)

• Optimizer Function (which function to optimize by minimizing error; used


“Adam” throughout)

3 Long-Short Term Memory


lOMoARcPSD|20617519

• Epochs (how many times to run through the training process; kept at 1 for
base

model and at 20 for improved LSTM)

Benchmark Model
For this project i have used a Linear Regression model as its primary benchmark. As one of my
goals is to understand the relative performance and implementation differences of machine
learning versus deep learning models. This Linear Regressor was based on the examples
presented in Udacity’s Machine Learning for Trading course and was used for error rate
comparison MSE and RMSE utilizing the same dataset as the deep learning models.

Following is the predicted results that i got from my benchmark model :


lOMoARcPSD|20617519

X-axis: Represents Tradings Days


Y-axis: Represents Closing Price In USD
Green line: Adjusted Close price
Blue Line: Predicted Close price

Train Score: 0.1852 MSE (0.4303 RMSE)


lOMoARcPSD|20617519

Test Score: 0.08133781 MSE (0.28519784 RMSE)

METHODOLOGY

Data Preprocessing
Acquiring and preprocessing the data for this project occurs in following sequence, much of
which has been modularized into the preprocess.py file for importing and use across all
notebooks:

• Request the data from the Google Finance Python API and save it in google.csv file in the

following format.

Date Open High Low Close Volume

30-Jun-17 943.99 945.00 929.61 929.68 2287662

29-Jun-17 951.35 951.66 929.60 937.82 3206674

28-Jun-17 950.66 963.24 936.16 961.01 2745568

• Remove unimportant features(date, high and low) from the acquired data and reversed the order
of data, i.e., from january 03, 2005 to june 30, 2005
lOMoARcPSD|20617519

Item Open Close Volume

0 98.80 101.46 15860692

1 100.77 97.35 13762396

2 96.82 96.85 8239545

3 97.72 94.37 10389803

• Normalised the data using MinMaxScaler helper function from Scikit-Learn.

Item Open Close Volume

0 0.012051 0.015141 0.377248


lOMoARcPSD|20617519

1 0.014198 0.010658 0.325644

2 0.009894 0.010112 0.189820

3 0.010874 0.007407 0.242701

• Stored the normalised data in google_preprocessed.csv file for future reusability.

• Splitted the dataset into the training (68.53%) and test (31.47%) datasets for linear regression
model. The split was of following shape :
x_train (2155, 1)
y_train (2155, 1)
x_test (990, 1)
y_test
(990, 1)

• Splitted the dataset into the training (82.95%) and test (17.05%) datasets for LSTM model. The
Split was of following shape: x_train (2589, 50, 3) y_train (2589,) x_test (446, 50, 3) y_test
(446,)

Implementation

Once the data has been downloaded and preprocessed, the implementation process occurs
consistently through all three models as follow:
lOMoARcPSD|20617519

I have thoroughly specified all the steps to build, train and test model and its predictions in the
notebook itself.
lOMoARcPSD|20617519

Some code implementation insight:

Benchmark model :

Step 1 : Split into train and test model :

Here I am calling a function defined in ‘stock_data.py’ which splits the data for linear

regression model. The function is as follows :


lOMoARcPSD|20617519
lOMoARcPSD|20617519

6
Step 2: In this step model is built using scikit-learn linear_model library.

Here I am calling a function defined in ‘LinearRegressionModel.py ’ which builds the

model for the project. The screenshot of the function is as follows:


lOMoARcPSD|20617519

Step 3: Now it’s time to predict the prices for given test datasets.

The screenshot of the function is as follows, it is defined in

‘LinearRegressionModel.py ’:
lOMoARcPSD|20617519

Step 4: Finally calculate the test score and plot the results of benchmark model.

Improved LSTM model :

Step 1 : Split into train and test model :

Note : The same set of training and testing data is used for improved LSTM as is
used with basic LSTM.

Step 2 : Build an improved LSTM model :

1
lOMoARcPSD|20617519

Epochs from 1 to 20 for Here I am calling a function defined in ‘lstm.py ’ which


builds the improved lstm model for the project.

The screenshot of the function is as follows:

7
NOTE: The function uses keras Long short term memory library to implement LSTM
model.

I have increased the batch_size to 512 from my improved LSTM model

Also in the function i have add increased the no of nodes in hidden layer to 128
from 100 and have added a drop out of 0.2 to all the layers.
lOMoARcPSD|20617519

Step 3: We now need to train our model.


lOMoARcPSD|20617519

I have used here a built in library function to train the model.


lOMoARcPSD|20617519

Step 4: Now it’s time to predict the prices for given test datasets.

I have used a built-in function to predict the outcomes of the model.

Step 5: Finally calculate the test score and plot the results of improved
LSTM model.

Refinement
For this project i have worked on fine tuning parameters of LSTM to get better predictions. I did
the improvement by testing and analysing each parameter and then selecting the final value for
each of

them.

To improve LSTM i have done following:

● Increased the number of hidden node from 100 to 128.


● Added Dropout of 0.2 at each layer of LSTM
lOMoARcPSD|20617519

● Increased batch size from 1 to 512


● Increased epochs from 1 to 20
● Added verbose = 2
● Made prediction with the batch size

Thus improved my mean squared error, for testing sets, from 0.01153170 MSE to

0.00093063 MSE.

The predicted plot difference can be seen as follows:


lOMoARcPSD|20617519

Fig : Plot For Adjusted Close and Predicted Close Prices for basic LSTM model
lOMoARcPSD|20617519

Fig : Plot For Adjusted Close and Predicted Close Prices for improved LSTM model
lOMoARcPSD|20617519

RESULT

Model Evaluation and Validation


With each model i have refined and fined tune my predictions and have reduced mean squared
error significantly.

● For my first model using linear regression model:


● Train Score: 0.1852 MSE (0.4303 RMSE)
● Test Score: 0.08133781 MSE (0.28519784 RMSE)

Fig: Plot of Linear Regression Model

● For my second model using basic Long-Short Term memory model:


lOMoARcPSD|20617519

● Train Score: 0.00089497 MSE (0.02991610 RMSE)


● Test Score: 0.01153170 MSE (0.10738577 RMSE)

Fig: Plot of basic Long-Short Term Memory model

● For my third and final model, using improved Long-Short Term memory model:
lOMoARcPSD|20617519

● Train Score: 0.00032478 MSE (0.01802172 RMSE)


● Test Score: 0.00093063 MSE (0.03050625 RMSE)

Fig: Plot of Improved Long-Short Term Memory Model

Robustness Check :
lOMoARcPSD|20617519

For checking the robustness of my final model i used an unseen data, i.e, data of Alphabet Inc.
from July 1, 2017 to July 20, 2017. On predicting the values of unseen data i got a decent result
for the data. The results are as follows:

Test Score: 0.3897 MSE (0.6242 RMSE)

Justification
Comparing the benchmark model - Linear Regression to the final improved LSTM model, the
Mean
Squared Error improvement ranges from 0.08133781 MSE (0.28519784 RMSE) [Linear

Regression Model] to 0.00093063 MSE (0.03050625 RMSE) [Improved LSTM] . This

significant decrease in error rate clearly shows that my final model have surpassed the basic and

benchmark model.

Also the Average Delta Price between actual and predicted Adjusted Closing Price values was:

Delta Price: 0.000931 - RMSE * Adjusted Close Range

Which is less than one cent :)


lOMoARcPSD|20617519

CONCLUSION

Free-Form Visualization
I have already discussed all the important features of the datasets and their visualisation in one of
the above sections. But to conclude my report i would choose my final model visualization,
which is improved version of LSTM by fine tuning parameters. As i was very impressed on
seeing how close i have gotten to the actual data, with a mean square error of just 0.0009. It was
an ‘Aha!’ moment for me as i had to poke around a lot (really ALOT !! :P ). But it was fun
working on this project.

Fig: Plot of Improved Long-Short Term Memory Model

Reflection
To recap, the process undertaken in this project:
lOMoARcPSD|20617519

● Set Up Infrastructure
0 iPython Notebook
○ Incorporate required Libraries (Keras, Tensor flow, Pandas, Matplotlib, Sklearn,
Numpy)
○ Git project organization
● Prepare Dataset
○ Incorporate data of Alphabet Inc company
○ Process the requested data into Pandas Dataframe
○ Develop function for normalizing data

○ Dataset used with a 80/20 split on training and test data across all models
● Develop Benchmark Model
○ Set up basic Linear Regression model with Scikit-Learn
○ Calibrate parameters
● Develop Basic LSTM Model
0 Set up basic LSTM model with Keras utilizing parameters from Benchmark
Model
● Improve LSTM Model
0 Develop, document, and compare results using additional labels for the LSMT
model 5. Document and Visualize Results
● Plot Actual, Benchmark Predicted Values, and LSTM Predicted Values per time series ●
Analyze and describe results for report.

I started this project with the hope to learn a completely new algorithm, i.e, Long-Short Term
Memory and also to explore a real time series data sets. The final model really exceeded my
expectation and have worked remarkably well. I am greatly satisfied with these results.

The major problem i faced during the implementation of project was exploring the data. It was
toughest task. To convert data from raw format to preprocess data and then to split them into
training and test data. All of these steps require a great deal of patience and very precise
approach. Also i had to work around a lot to successfully use the data for 2 models, i.e, Linear
Regression and Long-Short Term Memory, as both of them have different inputs sizes. I read
many research papers to get this final model right and i think it was all worth it :)

Improvement
Before starting my journey as Machine Learning Nanodegree Graduate i had no prior experience
in python. In the beginning of this course to do everything with python, i had to google it. But
now i have not only made 7 projects in python, i have explored many libraries along the ways
lOMoARcPSD|20617519

and can use them very comfortably. This is all because of highly interactive videos and forum
provided by Udacity. I am really happy and satisfied taking up this course.

And as there is scope of improvement in each individual so is the case with this project. This
project though predicts closing prices with very minimum Mean Squared Error, still there are
many things that are lagging in this project. Two of most important things are :

● There is no user interaction or interface provided in this project. A UI can be provided


where user can check the value for future dates.
● The stocks used for this project are only of Alphabet Inc, we can surely add more S&P
500 in the list so as to make this project more comprehensive.

I would definitely like to add these improvement to this project in future.

You might also like