I hereby declare that the project report entitled “Assessing the stock market for
new investors” submitted by me to Vellore Institute of Technology University,
Vellore in partial fulfilment of the requirement for the award of the course
Artificial Intelligence (ITE2010) is a record of bonafide project work carried out
by me under the guidance of Prof. Chiranji Lal Chowdhary. I further declare
that the work reported in this project has not been submitted and will not be
submitted, either in part or in full, for the award of any other course.
CERTIFICATE
This is to certify that the project report entitled “Assessing the stock market for
new investors” submitted by Binayak Bishnu (20BIT0155) to Vellore Institute
of Technology University, Vellore in partial fulfilment of the requirement for the
award of the course Artificial Intelligence (ITE2010) is a record of bonafide
work carried out by them under my guidance.
Abstract
New investors are often overwhelmed when it comes to investing in the stock market and prefer an
automated system to guide them and help them make informed decisions. Neural networks play an
important role in developing machine learning algorithms that deal with sequential data like stock
market data. The goal was to determine an effective and reliable prediction system to predict the stock
market prices for multiple steps. This study aims to compare various predicting algorithms to
determine the best fit for the given context. The best fit is decided by how close the prediction is to
the real value and for how many steps it can maintain acceptable predictions. The study found that the
Long Short-Term Memory (LSTM) model matches these requirements. It is a modified form of
Recurrent Neural Networks (RNN) and overcomes various challenges like the vanishing gradient
problem. The literature review contributed toward determining various paths to develop such systems.
The implementation done as part of this study showed the effectiveness of the LSTM model.
Currently, it works on a sample dataset of one stock but can be scaled up to make a more
comprehensive and deployable model. The LSTM is a complex architecture. The aim is to develop it
further to make it easier to implement while retaining efficiency and reliability. The simplicity of
implementation will also support further expansion.
I. INTRODUCTION
Stock market prediction systems have been in high demand in recent times, as artificial intelligence
and machine learning models have become popular even in non-technical fields and workplaces. An efficient
predictor can not only help businesses and individuals make informed decisions but also act as a guide for
those just starting out. RNN models are quite effective at predicting sequential time series data due to their
recurrent nature, but they suffer from the vanishing gradient problem. LSTM addresses this by introducing a
memory cell with three gates (input, forget and output) that control the flow of information. The gates decide
which data to keep and which to discard for the next steps of prediction.
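The gate mechanism described above can be sketched for a single time step. This is a minimal illustration and not the
implementation used in this project: it assumes a scalar (one-unit) LSTM cell, and the function name lstm_step and the
layout of the weight dictionary are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    w maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state, the new cell state and the gate values.
    """
    def pre_activation(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # share of the new candidate to store
    o = sigmoid(pre_activation("output"))       # share of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to store

    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c, {"forget": f, "input": i, "output": o}


# Hypothetical weights, identical for every gate, chosen only for illustration.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for price in [0.1, 0.2, 0.3]:  # sample values, assumed to be normalized already
    h, c, gates = lstm_step(price, h, c, weights)
```

A trained model learns the weights from data and uses vectors and matrices instead of scalars, but the roles of the
three gates are the same.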
II. PRELIMINARIES
Preliminary concepts include the various neural networks, namely Convolutional Neural Network, Recurrent
Neural Network and Deep Neural Network. They also include knowledge of Python and Jupyter notebooks for the
implementation section.
Note: The following table contains the condensed findings from multiple papers. A list of references is
attached in the appendix.
Assessing the Stock Market for new investors
| Method | Description | Advantages | Limitations |
|---|---|---|---|
| Deep predictor using candlestick charts | Predicts the price movement of a given (k + 1)th day by taking the price trend of the past k days as input. | Input charts are easy to read. | More towards qualitative analysis, which is limited to giving the direction only. RNN is better when working with sequential data. |
| The dynamic predictor selection algorithm | Best out of multiple models selected based on input. | Works better than a single model. Reduces overfitting. | Very large size. Ignores feature dependency. |
| RNN prediction models | Retains memory of what it has already processed. | Designed specifically for sequential datasets. | Training the model is slow and complex. |
| CNN prediction models | Designed to map image data to an output variable. | Not very dependent on pre-processing of model or data. | Not ideal for sequential data. |
Architecture
The dataset (HP stock prices) has been collected from Kaggle. The data is first explored visually
to get an idea of how it looks and to check for any outliers, since they disrupt the
training process for the model. This is done primarily using matplotlib.
The processed dataset (final dataset) is then split into train and test sets. The training set is used to build
the model, while the test set is used to test the model created in every iteration of the building process. If there
are discrepancies, the test results guide the LSTM process to rebuild the model for better accuracy.
While the testing is going on, the model created in each iteration is evaluated, and evaluation data is
recorded to show how the model is built and rebuilt.
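The split described above can be sketched as follows. The report does not state how the split was made, so this
sketch assumes a chronological split (earlier prices for training, later prices for testing), which is a common choice
for time series. The function name chronological_split and the 80/20 ratio are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered list into train and test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    # Everything before the cut is used for training, the rest for testing.
    return series[:cut], series[cut:]


# Sample closing prices, invented purely for illustration.
prices = [20.1, 20.4, 20.3, 20.9, 21.2, 21.0, 21.5, 21.7, 21.6, 22.0]
train, test = chronological_split(prices, train_fraction=0.8)
```

Keeping the order intact means the model is always tested on prices that come after the ones it was trained on.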
Module Description
Data collected from Kaggle is visualized using matplotlib to check for outliers which
negatively affect the model building process.
The data is then prepared by first dividing it into train and test subsets. The data is then
normalized. Normalization brings all numerical columns onto the same scale. This is
necessary when combining columns for modelling. There are two methods:
1. Min-Max Scaling – Subtract the column's minimum value from each value and divide by
the range (maximum minus minimum). Each new column has a minimum value of 0 and a maximum value of 1.
2. Standardization Scaling – Subtract the column's mean from each observation and then divide
by the standard deviation, giving mean = 0 and SD = 1.
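Both scaling methods can be written in a few lines. The sketch below is only an illustration using the Python standard
library; the function names min_max_scale and standardize are hypothetical, and the population standard deviation is
assumed.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest
    if value_range == 0:
        raise ValueError("all values are equal, the range is zero")
    return [(v - lowest) / value_range for v in values]


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all values are equal, the standard deviation is zero")
    return [(v - mu) / sigma for v in values]


column = [10.0, 15.0, 20.0]           # sample column, invented for illustration
scaled = min_max_scale(column)        # [0.0, 0.5, 1.0]
standardized = standardize(column)
```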
After the LSTM is run, the prediction and the evaluation data are ready.
The output of the LSTM prediction of each iteration is plotted on a graph and compared to real values
to show the precision at each step.
The best iteration is then selected based on it having the least Mean Squared Error (MSE). This is then
plotted against the real values to show its precision.
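The selection step described above can be sketched as follows, assuming the predictions of every iteration are kept in
a list. The function names mean_squared_error and best_iteration are hypothetical, and the sample numbers are invented
for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between real and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_iteration(actual, predictions_per_iteration):
    """Return the index of the iteration with the least MSE, and all MSE values."""
    errors = [mean_squared_error(actual, p) for p in predictions_per_iteration]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors


real_values = [1.0, 2.0, 3.0]
iterations = [
    [1.5, 2.5, 3.5],  # every prediction is off by 0.5
    [1.0, 2.0, 3.1],  # only the last prediction is slightly off
    [0.0, 2.0, 3.0],  # the first prediction is off by 1.0
]
best, errors = best_iteration(real_values, iterations)
```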
V. EXPERIMENTAL RESULTS
The dataset collected from Kaggle is loaded into the Python project, into the df variable.
2. Data visualized
The LSTM runs and prints the Mean Squared Error (MSE) and the loss at each step.
5. Predictions from all cycles plotted together and compared to real data
As mentioned in the module description, the results of all iterations are shown against the real values to
show the precision achieved at each step.
The best iteration is chosen (smallest MSE) and plotted against the real values to show the final (and
the best) prediction of the LSTM model.
1. RNN vs LSTM
An RNN's nodes are connected in a directed or undirected graph along a temporal sequence. RNNs are quite
efficient in predicting time series data. However, they have a major drawback.
RNNs suffer from the vanishing gradient problem, wherein the gradients either become negligible or explode
to infinity during backpropagation. This is partly because the computations use numbers with finite precision.
On the other hand, the LSTM lets the gradients flow through its cell state largely unchanged. It
has multiple layers and more than one gate to handle the flow of data within the model.
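The effect described above can be illustrated numerically. The sketch below assumes a deliberately simplified linear
scalar recurrence, in which the gradient is multiplied by the same hypothetical factor w at every time step. Real RNNs
multiply by weight matrices and activation derivatives, but the trend is similar.

```python
def gradient_through_time(w, steps):
    """Gradient of the last state with respect to the first one when
    each state is simply w times the previous state."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # the chain rule contributes one factor per time step
    return grad


vanishing = gradient_through_time(0.5, 50)  # shrinks towards zero
exploding = gradient_through_time(1.5, 50)  # grows without bound
preserved = gradient_through_time(1.0, 50)  # factor of exactly 1, stays at 1.0
```

The last case is loosely analogous to an LSTM cell state whose forget gate stays open, so the gradient is passed back
without being scaled at every step.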
2. Averaging vs LSTM
Two averaging methods were considered. However, both can be used only for one-step prediction, that is,
only the next value after the last value can be effectively predicted by these methods. The Standard
Averaging method predicts the next value as the average of the last N data points, while the exponential moving
average gives more weight to newer data when calculating the future value.
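Both averaging methods can be sketched in a few lines. The function names are hypothetical, and the exponential
moving average is written in one common recursive form with an assumed decay parameter; the report does not give the
exact formula that was used.

```python
def standard_average(prices, n):
    """Predict the next price as the plain average of the last n prices."""
    if not 0 < n <= len(prices):
        raise ValueError("n must be between 1 and the number of prices")
    window = prices[-n:]
    return sum(window) / n


def exponential_moving_average(prices, decay=0.5):
    """Predict the next price with a running average that favours newer prices.

    decay is the weight given to the previous average (an assumed parameter);
    the newest price gets weight 1 - decay.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    ema = prices[0]
    for price in prices[1:]:
        ema = decay * ema + (1.0 - decay) * price
    return ema


history = [1.0, 2.0, 3.0, 4.0]  # sample prices, invented for illustration
next_by_average = standard_average(history, n=2)               # 3.5
next_by_ema = exponential_moving_average(history, decay=0.5)   # 3.125
```

Both functions return a single value for the next step only, which matches the one-step limitation noted above.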
This presentation has introduced and given context to the topic of this research work. The literature survey
done has revealed various paths that can be taken to develop an effective and reliable model for stock prediction.
This work also involves a small implementation of the same using a sample stock price dataset which can then
be scaled up into a more comprehensive and deployable model.
The proposed architecture uses LSTM, which is a complex concept used for sequential data
prediction. The objective is to make the model easy to implement to support further expansion and work on it
while also retaining its efficiency and reliability. The concept of a bidirectional LSTM can be introduced, where an
additional LSTM layer reads the sequence in reverse and the outputs of both layers are combined to form a final output.
REFERENCES
1. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252404
DPP: Deep predictor for price movement from candlestick charts
Chih-Chieh Hung, Ying-Ju Chen
2. https://www.sciencedirect.com/science/article/abs/pii/S0957417421011982
Dynamic Predictor selection algorithm for predicting stock market movement
Shuting Dong, Jiaxin Wang, Hongze Luo, Haodang Wang, Fang-Xiang Wu
3. https://ijcsit.com/docs/Volume%202/vol2issue3/ijcsit2011020322.pdf
Filter versus Wrapper Feature Subset Selection in Large Dimensionality Microarray: A Review
Binita Kumari, Tripti Swarnkar
4. https://www.sciencedirect.com/science/article/pii/S1877050920307237
Optimizing LSTM for time series prediction in the Indian stock market
Anita Yadav, C K Jha, Aditi Sharan
5. https://content.iospress.com/articles/algorithmic-finance/af176
Classification-based financial markets prediction using deep neural networks
Dixon, Matthew, Klabjan, Diego, Bang, Jin Hoon
6. https://www.sciencedirect.com/science/article/pii/S1877050918307828
NSE Stock Market Prediction Using Deep-Learning Models
Hiransha M, Gopalakrishnan E.A., Vijay Krishna Menon, Soman K.P.
7. https://neptune.ai/blog/predicting-stock-prices-using-machine-learning
Predicting Stock Prices Using Machine Learning
Katherine (Yi) Li
8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5167535/
Structured prediction models for RNN based sequence labelling in clinical text
Abhyuday N Jagannatha, Hong Yu
9. https://ieeexplore.ieee.org/abstract/document/7952591
Sequence segmentation using joint RNN and structured prediction models
Yossi Adi; Joseph Keshet; Emily Cibelli; Matthew Goldrick
10. https://www.sciencedirect.com/science/article/pii/S1877050916311619
Prediction Models for the Indian Stock Market
Aparna Nayak, M. M. Manohara Pai, Radhika M. Pai
11. https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
A Gentle Introduction to Long Short-Term Memory Networks by the Experts
Jason Brownlee
12. https://www.mdpi.com/1999-4893/14/8/251
Comparative Analysis of Recurrent Neural Networks in Stock Price Prediction for Different Frequency Domains
Polash Dey, Emam Hossain, Md. Ishtiaque, Mohammed Armanuzzaman Chowdhury, Md. Shariful Alam, Mohammad Shahadat Hossain, Karl Andersson
13. https://www.researchgate.net/publication/220355039_The_Vanishing_Gradient_Problem_During_Learning_Recurrent_Neural_Nets_and_Problem_Solutions
The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions
Sepp Hochreiter
14. https://www.mdpi.com/2079-9292/10/21/2717
Stock Market Prediction Using Machine Learning Techniques: A Decade Survey on Methodologies, Recent Developments, and Future Directions
Nusrat Rouf, Majid Bashir Malik, Tasleem Arif, Sparsh Sharma, Saurabh Singh, Satyabrata Aich, Hee-Cheol Kim
15. https://www.ijeat.org/wp-content/uploads/papers/v8i4/D6321048419.pdf
Stock Market Prediction Using Machine Learning Algorithms
K. Hiba Sadia, Aditya Sharma, Adarrsh Paul, Sarmistha Padhi, Saurav Sanyal
16. https://iq.opengenus.org/disadvantages-of-rnn/
ITE2010-AI REVIEW 2 PRESENTATION
Other challenges
Conclusion
Future directions
Binayak Bishnu
References
Motivation
New investors are often overwhelmed when it comes to investing in the stock market and
prefer an automated system to guide them and help them make informed decisions.
Synopsis
Introduction
Preliminaries
Useful pre-existing concepts
| Method | Description | Advantages | Limitations |
|---|---|---|---|
| Dynamic predictor selection algorithm | Best out of multiple models selected based on input. | Works better than single model. Reduces overfitting. | Very large size. Ignores feature dependency. |
| DNNs for classification prediction | Creates non-linear relationships between the independent and dependent variables. | Reduces overfitting. Good balance of speed and capacity. | Might lead to over analysis with respect to the scope of this research. |
| RNN prediction models | Retains a memory of what it has already processed. | Designed specifically for sequential datasets. | Slow and complex training procedures. |
| CNN prediction models | Designed to map image data to an output variable. | Not very dependent on pre-processing of model or data. | A lot of data required for proper training. |
Final review
Work
Proposed Algorithm
LONG SHORT-TERM MEMORY MODEL (LSTM)
FELIX A. GERS
Using LSTM
Rebuilding based
on the result
Evaluation data
1. Data collection and preparation
Standardization scaling
Calculations and
prediction
Averaging
Expanding to more than one step leads to incorrect predictions
Prediction restricted to one step (next day/session)
LSTM
Contains gates that regulate the flow of information through the unit better
Solves the vanishing gradient problem in RNN
1. Standard averaging
2. Exponential moving average
Standard Averaging
P(t+1) = (ΣPi)/N
P(t+1) = predicted price for the next day or session
Pi = observed prices
N = total number of days observed
Future Directions
The proposed architecture uses LSTM, which is a complex
concept used for sequential data prediction.
The objective is to make the model easy to implement to support further
expansion and work on this while also retaining its efficiency and reliability.
References
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252404
https://www.sciencedirect.com/science/article/abs/pii/S0957417421011982
https://ijcsit.com/docs/Volume%202/vol2issue3/ijcsit2011020322.pdf
https://www.sciencedirect.com/science/article/pii/S1877050920307237
https://content.iospress.com/articles/algorithmic-finance/af176
https://www.sciencedirect.com/science/article/pii/S1877050918307828
https://neptune.ai/blog/predicting-stock-prices-using-machine-learning
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5167535/
https://ieeexplore.ieee.org/abstract/document/7952591
https://tinyurl.com/Model4IndiaStock
https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
Comparative Analysis of Recurrent Neural Networks in Stock Price Prediction for Different Frequency Domains
The vanishing gradient problem during learning recurrent neural nets and problem solutions
Stock Market Prediction Using Machine Learning Techniques: A Decade Survey on Methodologies, Recent Developments, and Future Directions
https://www.ijeat.org/wp-content/uploads/papers/v8i4/D6321048419.pdf
Do you have any questions?