
A

Project Report
On

Share Market Prediction using Machine Learning


Submitted in the partial fulfillment of the requirement of

Bachelor of Technology
In

Computer Science and Engineering

Under the Guidance of
Mr. Tejbir Rana
(Asst. Professor)

Submitted by
Mukul Gupta
6319018
B.Tech 6th Sem

Department of Computer Science and Engineering


Guru Nanak Institute of Technology, Mullana
Session 2019-23

CANDIDATE’S DECLARATION
I hereby certify that the work which is being presented in this Project entitled “Share
Market Prediction using Machine Learning”, in partial fulfillment of the requirements for the
award of the degree of B. Tech., Computer Science and Engineering, submitted in the Department of
Computer Science & Engineering at Guru Nanak Institute of Technology, Mullana, affiliated
to Kurukshetra University, Kurukshetra, is an authentic record of my work carried out under
the supervision of Mr. Tejbir Rana, Assistant Professor, Department of CSE, GNIT,
Mullana, Ambala.
The matter presented here has not been submitted by me to any other University /
Institute for the award of any other degree.

Mukul Gupta
University Roll No. - 6319018

This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.

Tejbir Rana Dr. Sidharth Arora


(Asst. Prof.) HOD
Department of CSE

ACKNOWLEDGEMENT

I wish to express my sincere regards and gratitude to Mr. Tejbir Rana (Assistant Professor),
Department of Computer Science and Engineering, for his guidance, moral support,
continuous encouragement and appreciation, which were vital factors in the successful
completion of my minor project work.

I sincerely extend my thanks to Dr. Sidharth Arora, Head of the Department of Computer
Science & Engineering, for his administrative support and guidance.

My sincere thanks to all the faculty members and technical staff of the Computer Science &
Engineering Department for providing pleasant working conditions throughout the
duration of the minor project.

These acknowledgements would remain incomplete without thanking my classmates, who
helped in providing a calm and congenial atmosphere without which my project would not
have seen the light of day.

Mukul Gupta

6319018

………….. …………..
Internal’s Signature External’s signature

LIST OF FIGURES

FIG. NO.     DESCRIPTION                                                       PAGE NO.

2.1.1 (a)    Linear Regression Equation                                        12
2.1 (a)      Financial facts associated with the system (Tangible Costs)       17
2.1 (b)      Financial facts associated with the system (Intangible Costs)     17
4.1 (a)      System Design                                                     28
4.1 (b)      System Design Use case Index                                      29
5 (a)        Final Result Diagram                                              35

CONTENTS
Candidate Declaration 2
Acknowledgement 3
Certificate 4
List of figures 5

CHAPTER 1 : INTRODUCTION 8-9

1.1 Overview 8

1.2 IDEA 8

1.3 Scope of the Project 9

CHAPTER 2 : FEASIBILITY STUDY 10-16

2.1 Feasibility 10

2.1.1 Comparison with existing System 11

2.1.2 Technical Feasibility 12

2.1.3 Operational Feasibility 13

2.1.4 Economical Feasibility 15

2.2 Costs and Benefits Analysis 16

CHAPTER 3 : PROBLEM FORMULATION AND REQUIREMENT PHASE 18-24

3.1 Problem Formulation 18

3.2 Objective 19

3.2.1 Academic Understanding of Stock Market 12

3.3 Functional Requirement 25

3.4 Non-Functional Requirement 27

CHAPTER 4 : SYSTEM DESIGN AND IMPLEMENTATION 28-34

4.1 System Design 28

4.2 System Implementation 30

4.2.1 Moving Average 31

4.2.2 Linear Regression 32

4.2.3 K-Nearest Neighbour 33

4.2.4 Auto ARIMA 33

4.2.5 Prophet 34

4.2.6 Long Short Term Memory 34

CHAPTER 5 : Result and Discussion 35

5.1 Result 36

5.2 Discussion 36

CONCLUSIONS AND FUTURE SCOPE 36-37

6.1 Conclusions 36
6.2 Scope For Future Work 37

REFERENCES 29

ANNEXURE 38-40

Chapter 1 – INTRODUCTION

Overview:
Predicting how the share market will perform is one of the most difficult things to
do. There are many factors involved in the prediction: physical and psychological
factors, rational and irrational behavior, etc. All these aspects combine to
make share prices volatile and very difficult to predict with a high degree of
accuracy.

Accurate prediction of stock market returns is a very challenging task due to the
volatile and non-linear nature of the financial stock markets. With the introduction
of artificial intelligence and increased computational capabilities, programmed
methods of prediction have proved to be more efficient in predicting stock prices.
In this work, Artificial Neural Network and Random Forest techniques have been
utilized for predicting the next day closing price.

We will use machine learning. Using features like the latest announcements about
an organization, its quarterly revenue results, etc., machine learning techniques
have the potential to unearth patterns and insights we did not see before, and these
can be used to make more accurate predictions. We will work with historical
data about the stock prices of a publicly listed company. We will implement a mix
of machine learning algorithms to predict the future stock price of this company,
starting with simple algorithms like averaging and linear regression, and then
moving on to advanced techniques like Auto ARIMA and LSTM.

IDEA:
First it’s important to establish what we’re aiming to solve. Broadly, stock market
analysis is divided into two parts – Fundamental Analysis and Technical Analysis.

• Fundamental Analysis involves analyzing the company’s future profitability
on the basis of its current business environment and financial performance.
• Technical Analysis, on the other hand, includes reading the charts and using
statistical figures to identify the trends in the stock market.

Our focus will be on the technical analysis part. We will use a dataset that
describes the recent price history of one company in the stock market. The financial
data (Open, High, Low and Close prices of the stock) are used for creating new variables,
which are used as inputs to the model. The models are evaluated using standard strategic
indicators. Low values of these indicators show that the models are efficient in
predicting the stock closing price.
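
As an illustration of the paragraph above, here is a minimal sketch of deriving new input variables from Open, High, Low and Close prices and of scoring predictions with error indicators. The report does not name the derived variables or the indicators at this point, so the two features (`high_low`, `close_open`) and the two metrics (RMSE and MAPE) are assumptions chosen as common examples, and the field names are hypothetical.

```python
import math

def make_features(rows):
    """Derive simple variables from OHLC rows (assumed dicts with
    'open', 'high', 'low' and 'close' keys)."""
    return [
        {
            "high_low": r["high"] - r["low"],      # intraday range
            "close_open": r["close"] - r["open"],  # intraday net move
        }
        for r in rows
    ]

def rmse(actual, predicted):
    """Root mean squared error; lower means closer predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Both metrics shrink toward zero as predicted closing prices approach the actual ones, which is the sense in which low indicator values mark an efficient model.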

We will work through the following topics:

1. Understanding the Problem Statement


2. Moving Average
3. Linear Regression
4. Forecasting using stacked LSTM – Deep Learning
5. Auto ARIMA
6. Prophet

SCOPE OF THE PROJECT:


The time series prediction problem has been researched at the research centers of various
financial institutions. A prediction model based on Linear Regression and
independent component analysis (ICA) is proposed for stock market prediction. Various time
series analysis models are based on machine learning. The SVM is designed to solve
regression problems in non-linear classification and time series analysis. The
generalization error is minimized using an approximating function, which is based on
the risk minimization principle, while the ICA technique extracts important
features from the dataset. In this project the time series prediction is based on linear
regression, the most basic machine learning algorithm that can be implemented on this data.
The linear regression model returns an equation that determines the
relationship between the independent variables and the dependent variable.

Different machine learning models and risk strategies have been applied to the stock market
prediction task, mainly trying to predict the direction of the price for different time
frames and using different features that affect market prices. Support Vector
Machines (SVM) and Artificial Neural Networks (ANN) are widely used for
prediction of stock prices and their movements. Every algorithm has its own way of learning
patterns and then predicting.

Stock market prediction is the act of trying to determine
the future value of a company stock or other financial instrument traded on an
exchange. The successful prediction of a stock's future price could yield significant
profit: the more accurately share price movement is predicted, the more profit
investors can make.

Chapter 2 – FEASIBILITY STUDY

FEASIBILITY:
The stock market cannot be predicted with complete accuracy. The future, like any complex
problem, has far too many variables to be predicted. The stock market is a place where buyers
and sellers converge. When there are more buyers than sellers, the price increases.
When there are more sellers than buyers, the price decreases. So there is some factor
which causes people to buy and sell, and it has more to do with emotion than logic.
Because emotion is unpredictable, stock market movements will be unpredictable. It is
futile to try to predict exactly where markets are going. They are designed to be
unpredictable. The proposed system will not always produce accurate results, since it
does not account for human behavior. Factors like a change in a company’s
leadership, internal matters, strikes, protests, natural disasters, and a change in
government cannot be related by the machine to changes in the stock market.
The objective of the system is to give an approximate idea of where the
stock market might be headed. It does not give a long-term forecast of a stock's
value. Too many events and parameters may affect a stock along the way, so long-term
forecasting is simply not feasible. The feasibility study comprises four major analyses to
judge whether the system will be a success. They are as follows:

• Comparison with existing system


• Operational Feasibility
• Technical Feasibility
• Economic Feasibility

2.1.1 Comparison with existing system:

Support Vector Machines (SVM) and Artificial Neural Networks (ANN) are widely
used for prediction of stock prices and their movements. Every algorithm has its own way of
learning patterns and then predicting. In the existing system, a prediction model based on SVM
and independent component analysis (ICA), together called SVM-ICA, is proposed for stock
market prediction. Various time series analysis models are based on machine learning. The
SVM is designed to solve regression problems in non-linear classification and time
series analysis. The generalization error is minimized using an approximating function,
which is based on the risk minimization principle, while the ICA technique extracts
important features from the dataset. The time series prediction is based on SVM. The result of
the SVM model was compared with the results of the ICA technique without using a
preprocessing step. On the other hand, we have used linear regression, the most basic
machine learning algorithm that can be implemented on this data.
The linear regression model returns an equation that determines the relationship
between the independent variables and the dependent variable.

The equation for linear regression can be written as:

Figure 2.1.1 (a)
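
The figure itself is not reproduced in this text. For reference, linear regression is conventionally written in the form below, where the symbols are assumed rather than taken from the figure: $Y$ is the dependent variable, $x_1, \dots, x_n$ are the independent variables, $\theta_1, \dots, \theta_n$ are their weights and $\theta_0$ is the intercept.

```latex
Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
```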

For our problem statement, we do not have a set of independent variables; we have
only the dates. We therefore use the date column to extract features such as day,
month, year and whether the day is a Monday or Friday (mon/fri), and then fit a linear
regression model. We first sort the
dataset in ascending order and then create a separate dataset so that any new feature
created does not affect the original data. Another interesting ML algorithm that one
can use here is kNN (k-nearest neighbours). Based on the independent variables, kNN
finds the similarity between new data points and old data points.
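
The steps described above can be sketched as follows. This is a minimal standard-library illustration, not the project's actual code: the function names are hypothetical, the mon/fri feature is assumed to be a 0/1 flag, and kNN is shown in its regression form, where the prediction is the average target of the k most similar training points.

```python
from datetime import date

def date_features(d):
    """Turn a calendar date into simple numeric features."""
    weekday = d.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "mon_fri": 1 if weekday in (0, 4) else 0,  # 1 on Mondays and Fridays
    }

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`
    (squared Euclidean distance over the feature vectors)."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Sorting into a new list leaves the original records untouched.
records = [(date(2023, 1, 6), 105.0), (date(2023, 1, 2), 100.0)]
ordered = sorted(records)  # ascending by date
features = [date_features(d) for d, _ in ordered]
```

In practice the features would usually be scaled before computing distances, since a raw `year` value would otherwise dominate `day` and `month`.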

2.1.2 Technical Feasibility:

The Efficient Market Hypothesis states that stock prices reflect all the
information available in the world, and that generating excess returns is not possible by
merely analysing trade data which is already available to the public. Yet, to further the
research rejecting this idea, a rigorous literature review was conducted and a set of
five technical indicators and 23 fundamental indicators was identified to establish the
possibility of generating excess returns on the stock market. Leveraging these data
points and various classification machine learning models, trading data of the 505
equities on the US S&P500 over the past 20 years was analysed to develop a classifier
effective for our cause. From any given day, we were able to predict the direction of
a 1% change in price up to 10 days in the future. The predictions had an overall
accuracy of 83.62%, with a precision of 85% for buy signals and a recall of 100% for
sell signals. Moreover, we grouped equities by their sector and repeated the
experiment to see if grouping similar assets together positively affected the results, but
concluded that it showed no significant improvement in performance, rejecting
the idea of sector-based analysis. Also, using feature ranking, we could identify an
even smaller set of 6 indicators while maintaining accuracy similar to that of the
original 28 features, and also uncovered the importance of buy, hold and sell analyst
ratings, as they came out to be the top contributors in the model. Finally, to evaluate
the effectiveness of the classifier in real-life situations, it was backtested on FAANG
(Facebook, Amazon, Apple, Netflix & Google) equities using a modest trading
strategy, where it generated high returns of above 60% over the term of the testing
dataset. In conclusion, our proposed methodology with the combination of
purposefully picked features shows an improvement over the previous studies, and
our model predicts the direction of 1% price changes on the 10th day with high
confidence and with enough buffer to even build a robotic trading system.
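
To make the terms above concrete, the following sketch shows one way to build direction labels for a price change of at least 1% over a 10-day horizon, and how precision and recall are computed for one signal class. The exact labelling scheme of the study is not given in the text, so the three-class encoding (1 = buy, -1 = sell, 0 = hold) and the function names are assumptions.

```python
def direction_labels(closes, horizon=10, threshold=0.01):
    """Label day i as 1 if the close `horizon` days later is at least
    `threshold` higher, -1 if at least `threshold` lower, otherwise 0."""
    labels = []
    for i in range(len(closes) - horizon):
        change = (closes[i + horizon] - closes[i]) / closes[i]
        if change >= threshold:
            labels.append(1)
        elif change <= -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels

def precision_recall(actual, predicted, positive):
    """Precision and recall for the class `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision for buy signals answers "of the days flagged as buy, how many really rose by the threshold", while recall for sell signals answers "of the days that really fell, how many were flagged".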

2.1.3 Operational Feasibility:

Operational feasibility is a measure of how well a proposed system solves the
problems and takes advantage of the opportunities identified during scope definition,
and how it satisfies the requirements identified in the requirements analysis phase of
system development. Operational feasibility reviews the willingness of the
organization to support the proposed system. This is probably the most difficult of the
feasibilities to gauge. In order to determine this feasibility, it is important to
understand the management commitment to the proposed project. If the request was
initiated by management, it is likely that there is management support and the system
will be accepted and used. However, it is also important that the employee base is
accepting of the change. An operationally feasible system is one that will be used
effectively after it has been developed. If users have difficulty with a new system, it
will not produce the expected benefits. Operational feasibility measures the viability of
a system in terms of the PIECES framework. The PIECES framework can help in identifying
operational problems to be solved, and their urgency:

1. Performance: Does current mode of operation provide adequate throughput and
response time? As compared to traditional methods of manually retrieving the stock
data from the web and forecasting the stock prices with a large number of
manual calculations, this system plays a very important role in designing an
application that automates the process of data retrieval and stock
movement/price prediction with the help of a user-friendly dashboard, thus
making the process easier and faster.

2. Information: Does current mode provide end users and managers with
timely, pertinent, accurate and usefully formatted information? The system
provides end users with timely, pertinent, accurate and usefully formatted
information. Since all the stock-related information is pulled from
Yahoo Finance against a unique NSE Stock Symbol, it provides
meaningful and accurate data to the investor. Traditional investors make their
investing decisions manually, which results in loss of validity of data
due to human error. The information handling and the investing decisions in the
proposed system will be driven by computerized and automatically updated
prediction and validation of stock data, so human errors will be minimal. The
data will be automatically updated from time to time and will be validated
before the data is processed into the system.
3. Economy: Does current mode of operation provide cost-effective
information services to the business? Could there be a reduction in costs
and/or an increase in benefits? This determines whether the system
offers an adequate service level and capacity to reduce the cost of the business or
increase the profit of the business. With the deployment of the proposed system,
manual work will be reduced and replaced by an IT-savvy approach.
Moreover, it has also been shown in the economic feasibility report that the
recommended solution is going to be economically beneficial in the long
run. The system is built on Excel, R and JavaScript. Excel and JavaScript do
not need any additional installation; they are in-built in every system. R needs
installation but it is free software. So, overall the application is very
economically feasible.
4. Control: Does current mode of operation offer effective controls to protect
against fraud and to guarantee accuracy and security of data and information?
As all the data is pulled from Yahoo Finance, which is a public stock data
provider, it does not contain any confidential information which can be
misused, so in that respect the system does not need any special security
measures.

5. Efficiency: Does current mode of operation make maximum use of available
resources, including people, time, and flow of forms?
By ensuring a proper workflow structure to store stock data,
we can ensure the proper utilization of all the resources. This determines whether
the system makes maximum use of available resources including time, people and
flow of forms, with minimum processing delay. In the current system a lot of time
is wasted, as the investing decisions are made by the traditional investors
manually. The proposed system will be a lot more efficient, as it will be driven by
computerized and automatically updated prediction and validation of stock
data. The data will be automatically updated from time to time and will be
validated before the data is processed into the system.

6. Services: Does current mode of operation provide reliable service? Is it
flexible and expandable? This determines whether the system offers desirable and
reliable services to those who need it, and whether the system is flexible and
expandable. The proposed system is very flexible, for better efficiency and performance
of the organization. The proposed system is also scalable, as its storage
capacity can be increased as per
requirement. This will provide a strong base for expansion. The new system
will provide a high level of flexibility.
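
The Information and Efficiency items above state that data is validated before it is processed into the system, without saying how. A minimal sketch of such a check on one daily price record might look like the following. The field names and the specific rules are assumptions for illustration only, and values are assumed to be numeric.

```python
REQUIRED_FIELDS = ("open", "high", "low", "close")

def is_valid_row(row):
    """Basic sanity checks for one daily price record (a dict)."""
    # Every price field must be present.
    if any(row.get(name) is None for name in REQUIRED_FIELDS):
        return False
    o, h, l, c = (row[name] for name in REQUIRED_FIELDS)
    # Prices must be positive.
    if min(o, h, l, c) <= 0:
        return False
    # The high bounds the day's prices from above, the low from below.
    return h >= max(o, c, l) and l <= min(o, c, h)

def validate(rows):
    """Keep only the records that pass the sanity checks."""
    return [row for row in rows if is_valid_row(row)]
```

Rejected rows could be logged instead of silently dropped, so that gaps in the downloaded data remain visible.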

2.1.4 Economical Feasibility:

Economic analysis could also be referred to as cost/benefit analysis. It is the most
frequently used method for evaluating the effectiveness of a new system. In
economic analysis the procedure is to determine the benefits and savings that are
expected from a candidate system and compare them with costs. If benefits
outweigh costs, then the decision is made to design and implement the system. An
entrepreneur must accurately weigh the costs versus benefits before taking an
action.

Possible questions raised in economic analysis are:

1. Is the system cost effective?
2. Do benefits outweigh costs?
3. The cost of doing full system study
4. The cost of business employee time
5. Estimated cost of hardware
6. Estimated cost of software/software development
7. Is the project possible, given the resource constraints?
8. What are the savings that will result from the system?
9. Cost of employees' time for study
10. Cost of packaged software/software development
11. Selection among alternative financing arrangements (rent/lease/purchase)

The concerned business must be able to see the value of the investment it is pondering
before committing to an entire system study. If short-term costs are not overshadowed
by long-term gains, or if the system produces no immediate reduction in operating costs,
then the system is not economically feasible, and the project should not proceed any
further. If the expected benefits equal or exceed costs, the system can be judged to be
economically feasible. Economic analysis is used for evaluating the effectiveness of
the proposed system. The economic feasibility study reviews the expected costs to see
if they are in line with the projected budget or if the project has an acceptable return
on investment. At this point, the projected costs are only a rough estimate; exact
costs are not required to determine economic feasibility. It is only required to
determine whether it is feasible that the project costs will fall within the target budget or
return on investment. A rough estimate of the project schedule is also required to
determine whether it would be feasible to complete the systems project within the required
timeframe. The required timeframe would need to be set by the organization.
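
The feasibility rule stated above (benefits must equal or exceed costs, with an acceptable return on investment) can be written down as a short calculation. The function names are hypothetical and the sketch ignores discounting of future amounts, which a fuller analysis would normally include.

```python
def net_benefit(benefits, costs):
    """Total expected benefits minus total expected costs."""
    return sum(benefits) - sum(costs)

def return_on_investment(benefits, costs):
    """Net benefit as a fraction of total cost."""
    total_cost = sum(costs)
    if total_cost == 0:
        raise ValueError("total cost must be non-zero to compute ROI")
    return (sum(benefits) - total_cost) / total_cost

def is_economically_feasible(benefits, costs):
    """Feasible when expected benefits equal or exceed expected costs."""
    return sum(benefits) >= sum(costs)
```

Each argument is a list of rough estimates, for example one entry per cost item such as hardware, software and employee time.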

Costs & Benefit Analysis:


It is the process of analyzing the financial facts associated with the system
development projects, performed when conducting a preliminary investigation. The
purpose of a cost/benefit analysis is to answer questions such as:

• Is the project justified (because benefits outweigh costs)?
• Can the project be done, within given cost constraints?
• What is the minimal cost to attain a certain system?
• What is the preferred alternative, among solutions?

The following figures show the approximate costs and benefits of the
system:

Figure 2.1 (a)

Intangible cost:

Figure 2.1 (b)

Chapter 3 – Problem Formulation and Requirement Phase


Linear regression is a simple technique and quite easy to interpret, but it has a few
obvious disadvantages. One problem in using regression algorithms is that the model
overfits to the date and month columns. Instead of taking into account the previous
values from the point of prediction, the model will consider the value from the same
date a month ago, or the same date/month a year ago. The stock market is a complex
and dynamic system, and is influenced by many factors that are subject to uncertainty,
so it is a difficult task to forecast stock price movements. Due to technology and the
globalization of business and financial markets, it is important to predict stock prices
more quickly and accurately. An automated, user-friendly trading application can be
developed based on financial predictive indicator algorithms and machine learning
techniques to predict the performance of stocks in NSE’s Nifty 50 Index.

Problem Formulation:

The vast majority of stockbrokers, while making predictions, used
technical, fundamental or time series analysis. Overall, these techniques could not
be trusted completely, so there emerged the need for a robust method of stock
market prediction. To obtain the most accurate result, the methodology chosen for
implementation was machine learning and AI along with a supervised classifier. Results
were tested on binary classification using an SVM classifier with different sets of
features. Most machine learning approaches to
business problems had an advantage over statistical techniques that did not include AI,
even though an optimal statistical technique existed for some specific problems. The swarm
intelligence optimization method named Cuckoo Search (CS) was the easiest to adapt to the
parameters of SVM. The proposed hybrid CS-SVM method produced
more accurate results than ANN. Likewise,
the CS-SVM model performed better in forecasting stock prices.
The stock price prediction system used parsed records to compute the predicted value,
send it to the user, and autonomously perform tasks like buying and selling shares using
automation.

Stock market prediction is basically defined as trying to determine the stock
value and to offer people a robust basis for understanding and predicting the market
and stock prices. It is generally performed using quarterly financial ratios
from a dataset. Relying on a single dataset may not be sufficient for
the prediction and can give an inaccurate result. Hence, we are
considering the study of machine learning with the integration of various
datasets to predict the market and stock trends. The problem of
estimating the stock price will remain a problem if a better stock market
prediction algorithm is not proposed. Predicting how the stock market will
perform is quite difficult. The movement in the stock market is usually
determined by the sentiments of thousands of investors. Stock market
prediction calls for an ability to predict the effect of recent events on
investors. These events can be political events like a statement by a political
leader, a piece of news on a scam, etc. They can also be international events like
sharp movements in currencies and commodities. All these events affect
corporate earnings, which in turn affect the sentiment of investors. It is
beyond the scope of almost all investors to correctly and consistently predict
these factors. All of this makes stock price prediction very
difficult. Once the right data is collected, it can then be used to train a machine
and to generate a predictive result.

Objective:

3.2.1 To add to the academic understanding of stock market prediction:

o With a greater understanding of how the market moves, investors
will be better equipped to prevent another financial crisis.

o Evaluate some existing strategies from a rigorous scientific
perspective and provide a quantitative evaluation of new strategies.

In the last few decades forecasting of stock returns has become an
important field of research. In most cases, researchers
attempted to establish a linear relationship between the input
macroeconomic variables and the stock returns. After the
discovery of nonlinearity in stock market index returns, a large
body of literature emerged on nonlinear statistical modeling of
stock returns, most of which required that the nonlinear model be
specified before the estimation is done. But since stock market
return is noisy, uncertain, chaotic and nonlinear in nature,
Predictive Modeling & Machine Learning has evolved to capture
the structural relationship between a stock’s performance and its
determinant factors more accurately than many other statistical
techniques.

In the literature, different sets of input variables are used to predict
stock returns. In fact, different input variables are used to predict
the same set of stock return data. Some researchers used input data
from a single time series, whereas others included
heterogeneous market information and macroeconomic variables.
Some researchers even preprocessed these input data sets before
feeding them to the Predictive Model for forecasting.

Min and Lee studied the prediction of bankruptcy using machine
learning. They evaluated methods based on Support Vector
Machines, multiple discriminant analysis, logistic regression
analysis, and three-layer fully connected back-propagation neural
networks. Their results indicated that support vector machines
outperformed the other approaches.

A Decision Tree is a useful and popular classification technique
that inductively learns a model from a given set of data. One
reason for its popularity stems from the availability of existing
algorithms that can be used to build decision trees, such as CART
(Breiman et al., 1984), ID3 (Quinlan, 1986), and
C4.5 (Quinlan, 1993). These algorithms all learn decision trees from
a supplied set of training data, but do so in slightly different ways.
As discussed in the introduction, a classifier is built by analyzing
training data. That is to say, a classifier is built by analyzing a
collection of instances where each instance is composed of a set of
attribute values and a known class value. These decision tree
algorithms build top-down structures that partition instances into
separate classes, and it is hoped that these structures generalize
well to instances with unknown class values. This would mean that
the decision trees have fulfilled their objectives and have indeed
discovered some underlying property of the data (Quinlan, 1986).

Tsai and Wang conducted research in which they tried to predict stock
prices by using ensemble learning, composed of decision trees and
artificial neural networks. They created a dataset from Taiwanese
stock market data, taking into account fundamental indexes,
technical indexes, and macroeconomic indexes. The combination
of Decision Tree + Artificial Neural Network trained on Taiwan
stock exchange data showed an F-score of 77%. Single
algorithms showed F-score performance up to 67%.
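
For reference, the F-score combines precision $P$ and recall $R$ into a single number. The text does not say which variant was used; assuming the common balanced form ($F_1$), it is the harmonic mean of the two:

```latex
F_1 = \frac{2 \, P \, R}{P + R}
```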

Josip Arneric, Elza Jurun and Snjezana Pivac describe that technical
analysis is done to find out the price movements, whereas
fundamental analysis is done to predict values by looking at the
fundamentals of a particular company. They focus on technical
analysis and state that a trend can be of two types, on the basis of
either time structure or general direction.

Professor Veroljub says that the way to invest is to sell when
prices are at their highest and to buy when prices are lower, whatever the
patterns are. In his articles he has discussed the market efficiency
theory, Classical theory, Confidence theory and Dow Theory. He
also differentiates between the Classical and Confidence theories.

Wing-Keung Wong, Meher Manzur and Boon-Kiat Chew (2002)
discuss in their article that the useful principle of technical analysis is to
identify trends and then go with the trend, whether it is occurring
randomly or due to fundamental factors. They also
discuss the techniques of moving averages and the relative strength
index (RSI), applying them to the Singapore stock exchange.

There are many tools and software packages available that provide forecasting of
stock market entities, share quantity and share value for a given
financial organization. Most of them claim to predict the stock market
with near 100% accuracy, but opinions from users vary.
Some of the popular tools and software, with their methodologies, are
mentioned as follows.

inteliCharts Predictive Stock Market Analytics:

It is a quantitative modeling tool used for financial time series
forecasting. The system is adaptive in its core as it learns the
patterns and geometrical relationships defined by historical time
series data points, which are unique for each individual stock,
index, or another financial instrument.

Markettrak:
Its stock market forecast system consists of two major parts: an
extensive database and a forecast model. The forecast model reads
the database and then makes a prediction of where the market is
headed. From this prediction, it determines a trading position for
the Dow Diamonds or the SP500 Spiders. The database and
forecast are updated daily at the close of trading.

Stock-Forecasting.com:

www.stock-forecasting.com (Center of Mathematics & Science,
Inc., Chicago, United States of America) provides innovative
price-prediction technology for active Day Traders and Short- and
Long-term Investors. They develop web-based software for stock
market forecasting and analysis.

o Evaluate some existing strategies from a rigorous scientific perspective and
provide a quantitative evaluation of new strategies:

There are three conventional approaches to stock price prediction: technical
analysis, traditional time series forecasting, and machine learning.
Earlier, classical regression methods such as linear regression and polynomial
regression were used to predict stock trends. Traditional statistical
models, which include exponential smoothing, moving average, and ARIMA,
make their predictions linearly. Nowadays, Support Vector Machines (SVM)
(Cortes & Vapnik, 1995) and Artificial Neural Networks (ANN) are widely
used for the prediction of stock price movements. Every algorithm has its own way
of learning patterns and then predicting. The ANN is a
popular and more recent method which also incorporates technical analysis for
making predictions in financial markets. An ANN includes a set of threshold
functions. These functions are connected to each other with adaptive weights,
trained on historical data, and then used to make future predictions
(Trippi & Turban, 1992; Walczak, 2001; Shadbolt & Taylor, 2002). Kuan and
Liu (1995) investigated the out-of-sample forecasting ability of recurrent and
feedforward neural networks based on empirical foreign exchange rate data.
In 2017, Mehdi Khashei and Zahra Haji Rahimi
evaluated the performance of series and parallel strategies to determine the more
accurate one using ARIMA and MLP (Multilayer Perceptron) (Mehdi &
Zahra, 2017).

Artificial neural networks have been used widely to solve many problems due
to their versatile nature (Samek & Varachha, 2013). Yodele et al. (2012)
presented a hybridized approach, i.e., a combination of the variables of
fundamental and technical analysis of stock market indicators, to predict future
stock prices and improve on the existing methods. Y Kara and
A Boyacioglu (2011) discussed stock price index movement using two models
based on Artificial Neural Network (ANN) and Support Vector Machine
(SVM). They compared the performances of both models and concluded
that the average performance of the ANN model was significantly better than
that of the SVM model. Qi and Zhang (2008)
investigated the best modeling of trend time series using neural networks.
They used four different approaches, i.e., raw data, raw data with a time index,
de-trending and differencing, for modeling various trend patterns and
concluded that neural networks give better results. H.K.
Cigizoglu (2003) discussed the application of ANN to forecasting, estimation
and extrapolation of the daily flow data belonging to rivers in the East
Mediterranean region of Turkey. That study found that ANN provides
a better fit to the data than conventional methods.

An ANN can be
considered a computational or mathematical model inspired by the
functional or structural characteristics of biological neural networks. These
neural networks are developed in such a way that they can extract patterns from
noisy data. An ANN is first trained on a large sample of data, known as the
training phase; the network is then presented with data that was not
included in the training phase, known as the validation or prediction
phase. The sole motive of this procedure is to predict new outcomes (Bishop,
1995). This idea of learning from training and then predicting outcomes
comes from the human brain, which can learn and respond. ANNs
have been used in many applications and have proven successful in executing
complex functions in a variety of fields (Fausett, 1994).

The data used in this case study is tick data of Reliance Private Limited for the
period 30 NOV 2017 to 11 JAN 2018 (excluding holidays). There are roughly
15,000 data points per day. The dataset contains approximately 430,000
data points. The data was obtained from the Thomson Reuters Eikon database
(a paid product of Thomson Reuters). Each tick refers to the
change in the price of the stock from trade to trade. The stock price at the start
of every 15 min was extracted from the tick data. This forms the secondary
dataset on which the same algorithms have been run. In this study, we have made
predictions on the tick data and the 15-min data using the same neural networks,
and their results are compared.
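
The extraction of a 15-minute series from tick data can be sketched as follows. This is only an illustration under stated assumptions: the timestamps and prices are made up, and the column name price is hypothetical, since the layout of the Eikon export is not given in this report.

```python
import pandas as pd

# Hypothetical tick data: trade timestamps and prices (all values are made up).
ticks = pd.DataFrame(
    {"price": [100.0, 100.5, 101.0, 100.8, 101.2, 101.5]},
    index=pd.to_datetime(
        [
            "2017-11-30 09:15:02",
            "2017-11-30 09:20:41",
            "2017-11-30 09:29:59",
            "2017-11-30 09:30:05",
            "2017-11-30 09:44:10",
            "2017-11-30 09:45:00",
        ]
    ),
)

# Keep the first traded price inside each 15-minute bucket.
bars_15min = ticks["price"].resample("15min").first().dropna()
print(bars_15min)
```

Taking the first tick of each bucket matches the description "the stock price at the start of every 15 min"; the dropna call discards buckets in which no trade occurred.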

Sentiment analysis uses text mining, natural language processing, and
computational techniques to automatically extract sentiments from a text. It
aims to classify the polarity of a given text at the sentence level or class level,
whether it reflects a positive, negative, or neutral view.
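
As a rough illustration of sentence-level polarity classification, the sketch below counts words from two tiny hand-made word lists. The word lists and the polarity function are assumptions made for this example only; they are not part of the project, and a practical system would rely on a trained model or a full sentiment lexicon.

```python
# Tiny hand-made word lists, used only for illustration.
POSITIVE = {"gain", "gains", "profit", "rise", "rises", "strong", "up"}
NEGATIVE = {"loss", "losses", "fall", "falls", "weak", "down", "drop"}

def polarity(sentence: str) -> str:
    """Label a sentence as positive, negative, or neutral by word counts."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Shares rise on strong quarterly profit"))
print(polarity("Stock falls after heavy losses"))
print(polarity("The board meets on Monday"))
```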

Functional Requirement:
Functional requirements are the functions or features that must be included in any
system to satisfy the business needs and be acceptable to the users. Based on this, the
functional requirements that the system must meet are as follows:

1. The system should be able to generate an approximate share price.

2. The system should collect accurate data from the stock market website in
a consistent manner.

3. The prediction shall abide by the following functional requirements:

4. Prior to application of stock recommendations, the database is updated with the
latest values.
5. The charts and comparison of the companies would be done only on the latest
stock market data.

6. The user can look up previously collected data.
7. The user can also be given recommendations on the basis of the trending stocks, which
would require data regarding those stocks.

• Modules:

▪ Pandas: It is used for data cleaning and analysis. It has features
which are used for exploring, cleaning, transforming and visualizing
data.
▪ NumPy: It is an open source module of Python which provides fast
mathematical computation on arrays and matrices. Since arrays and
matrices are an essential part of the machine
learning ecosystem, NumPy is used along with machine learning modules like
Scikit-learn, Pandas, Matplotlib, TensorFlow, etc.

▪ Matplotlib: It consists of several plots like the Line Plot, Bar Plot,
Scatter Plot, Histogram etc. through which we can visualise various types
of data.

▪ Matplotlib.pyplot: It is used to assign names to x-axis and y-axis or
title of the graph/plot. It is also used to represent edges and/or vertices of a
graph with integers or names.

▪ MinMaxScaler: MinMaxScaler of module sklearn.preprocessing is
used to transform features by scaling each feature to a given range. This
estimator scales and translates each feature individually such that it is in
the given range on the training set, e.g. between zero and one.
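
The behaviour of MinMaxScaler described above can be seen in a small sketch. The prices are made up and the chronological train/test split is an assumption for illustration; the point is that the scaler learns its minimum and maximum from the training set only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up closing prices, split chronologically into train and test parts.
close = np.array([100.0, 110.0, 120.0, 130.0, 140.0, 150.0]).reshape(-1, 1)
train, test = close[:4], close[4:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learns min=100 and max=130 from train only
test_scaled = scaler.transform(test)        # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Because the test prices lie above the training maximum, their scaled values exceed one. This is expected when the range is fitted on the training set, and it avoids leaking information from the test period into the model.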

Non-Functional Requirement:
Non-functional requirements describe the features, characteristics and attributes of
the system as well as any constraints that may limit the boundaries of the proposed
system. The non-functional requirements are essentially based on performance,
information, economy, control and security, efficiency and services. Based on these,
the non-functional requirements are as follows:

1. The system should provide better accuracy.

2. The system should have a simple interface for users.
3. The system should perform efficiently in a short amount of time.

o Software and Hardware requirement:

o Hardware

▪ Processor: Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz, 2601 MHz
▪ Installed Memory (RAM): 4.00 GB (2.64 GB Usable)
▪ System type: 64-bit Operating System

o Software

▪ Back-end Data Service Provider: Yahoo! Finance
▪ Operating System: Windows 10
▪ Front-end Tool: Jupyter Notebook & Chrome Browser

o Reliability: The reliability of the product will depend on the
accuracy of the purchase dataset: how much stock was purchased, the high and
low value range, and the opening and closing figures. The stock data
used in training would also determine the reliability of the software.

o Security: The user will only be able to access the website using their login
details and will not be able to access the computations happening at the back
end.

o Maintainability: Maintaining the product would require retraining
the software on recent data so that the recommendations are up to date. The
database has to be updated with recent values.

3 Chapter 4 - System Design and Implementation:

System Design:

Use case Diagram for the system:

Figure 4.1 (a)
Use case Index:

Figure 4.1 (b)

Use case description:

Use case ID: 1


Use case name: Collect data
Description: All required NSE stock data will be available on Yahoo Finance.
The Automated User Interface Application Backend will collect the data for the
system.

Use case ID: 2


Use case name: Compute result and performance
Description: The prediction result will be generated and handled by the Automated User
Interface Application Backend. The system will be built so that the prediction
result and the system performance can be analyzed.

Use case ID: 3


Use case name: System update

Description: As the market and technology change, regular updates of the system are
required. The predicted stock exchange results and their actual prices will be
auto-updated by the Automated User Interface Application Backend on a regular basis.

Use case ID: 4


Use case name: View traded exchange
Description: Company trading held at NSE India can be viewed by the user.

Use Case ID: 5


Use Case Name: Company Stock
Description: This is an extended feature of View traded exchange. It includes the stock
value of a particular company.

Use Case ID: 6


Use Case Name: View predicted outcome
Description: This use case is the most important in the whole project. The key feature of
this project is to predict the value of Nifty stocks. The predictions will be available in the user
interface, where the viewer can observe them.

System Implementation: We will work with historical data about the stock
prices of a publicly listed company. We will implement a mix of machine learning
algorithms to predict the future stock price of this company, starting with simple
algorithms like averaging and linear regression, and then move on to advanced
techniques like Auto ARIMA and LSTM.
We will cover the content in 7 parts:

1. Understanding the Problem Statement


2. Moving Average
3. Linear Regression
4. k-Nearest Neighbors
5. Auto ARIMA
6. Prophet
7. Long Short Term Memory (LSTM)

Understanding the Problem Statement:

We’ll dive into the implementation part of this report soon, but first it’s important to
establish what we’re aiming to solve. Broadly, stock market analysis is divided into
two parts – Fundamental Analysis and Technical Analysis.

• Fundamental Analysis involves analyzing the company’s future profitability
on the basis of its current business environment and financial performance.
• Technical Analysis, on the other hand, includes reading the charts and using
statistical figures to identify the trends in the stock market.

Our focus will be on the technical analysis part. For this particular project,
we have used the data for ‘Tata Global Beverages’.

We will first load the dataset and define the target variable for the problem:

There are multiple variables in the dataset – date, open, high, low, last, close,
total_trade_quantity, and turnover.

• The columns Open and Close represent the starting and final price at which
the stock is traded on a particular day.
• High, Low and Last represent the maximum, minimum, and last price of the
share for the day.

• Total Trade Quantity is the number of shares bought or sold in the day and
Turnover (Lacs) is the turnover of the particular company on a given date.
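
Loading such a dataset and defining the target variable can be sketched as follows. The three rows are made up and only mimic the column layout described above; the exact column headers of the project file are an assumption, and in practice the data would be read from the downloaded CSV file.

```python
import io
import pandas as pd

# Made-up rows in the described column layout (not the project data).
csv_text = """Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
2018-10-05,102.0,104.0,99.0,101.5,101.0,120000,1212.0
2018-10-04,100.0,103.0,98.0,102.5,102.0,150000,1530.0
2018-10-03,99.0,101.0,97.0,100.0,100.0,110000,1100.0
"""

# Parse the dates, put the rows in chronological order, and index by date.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
df = df.sort_values("Date").set_index("Date")

# The closing price is the target variable.
target = df["Close"]
print(target)
```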

Another important thing to note is that the market is closed on weekends and public
holidays. Notice in the above table that some date values are missing – 2/10/2018,
6/10/2018, 7/10/2018. Of these dates, the 2nd is a national holiday while the 6th and 7th fall
on a weekend.

The profit or loss calculation is usually determined by the closing price of a stock for
the day, hence we will consider the closing price as the target variable. Now we will
plot the target variable to understand how it’s shaping up in our data:

3.1.1 Moving Average:

‘Average’ is easily one of the most common things we use in our day-to-day lives. For
instance, calculating the average marks to determine overall performance, or finding
the average temperature of the past few days to get an idea about today’s temperature
– these all are routine tasks we do on a regular basis. So this is a good starting point to
use on our dataset for making predictions.

The predicted closing price for each day will be the average of a set of previously
observed values. Instead of using the simple average, we will be using the moving
average technique which uses the latest set of values for each prediction. In other
words, for each subsequent step, the predicted values are taken into consideration
while removing the oldest observed value from the set.
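
The procedure described above can be sketched in a few lines. The function name moving_average_forecast, the window size and the sample values are assumptions chosen for illustration, not the settings used in the project.

```python
import numpy as np

def moving_average_forecast(train, horizon, window):
    """Forecast `horizon` steps ahead. Each step averages the latest `window`
    values; every new prediction pushes the oldest value out of the set."""
    history = list(train[-window:])
    preds = []
    for _ in range(horizon):
        next_value = float(np.mean(history))
        preds.append(next_value)
        history.pop(0)              # remove the oldest value
        history.append(next_value)  # add the newest prediction
    return preds

train = [10.0, 12.0, 14.0, 16.0]
preds = moving_average_forecast(train, horizon=3, window=2)
print(preds)
```

With a window of two, the first prediction averages 14 and 16, the second averages 16 and the first prediction, and so on, so the forecast settles toward a constant level.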

3.1.2 Linear Regression:

The linear regression model returns an equation that determines the relationship
between the independent variables and the dependent variable.

The equation for linear regression can be written as:

Y = θ1x1 + θ2x2 + … + θnxn

Here, x1, x2, …, xn represent the independent variables while the coefficients θ1,
θ2, …, θn represent the weights.

For our problem statement, we do not have a set of independent variables.
We have only the dates instead. So we will use the date column to extract
features like – day, month, year, mon/fri etc. and then fit a linear regression
model.

We will first sort the dataset in ascending order and then create a separate
dataset so that any new feature created does not affect the original data.

Apart from this, we can add our own set of features that we believe would be relevant
for the predictions. For instance, my hypothesis is that the first and last days of the
week could potentially affect the closing price of the stock far more than the other

days. So I have created a feature that identifies whether a given day is Monday/Friday
or Tuesday/Wednesday/Thursday.
We will now split the data into train and validation sets to check the performance of
the model.
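
A minimal sketch of these steps is given below. It assumes synthetic business-day prices, a hypothetical feature name mon_fri, and an 80/20 chronological split; none of these are taken from the project notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic business-day closing prices (made-up values, not the project data).
dates = pd.bdate_range("2018-01-01", periods=60)
close = 100 + 0.5 * np.arange(60)
data = pd.DataFrame({"Date": dates, "Close": close}).sort_values("Date")

# Work on a copy so that new features do not affect the original data.
feats = data.copy()
feats["year"] = feats["Date"].dt.year
feats["month"] = feats["Date"].dt.month
feats["day"] = feats["Date"].dt.day
# 1 for Monday (0) or Friday (4), 0 for Tuesday to Thursday.
feats["mon_fri"] = feats["Date"].dt.dayofweek.isin([0, 4]).astype(int)

X = feats[["year", "month", "day", "mon_fri"]]
y = feats["Close"]

# Chronological split: the first 80% for training, the rest for validation.
split = int(len(feats) * 0.8)
X_train, X_valid = X.iloc[:split], X.iloc[split:]
y_train, y_valid = y.iloc[:split], y.iloc[split:]

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_valid)
rmse = float(np.sqrt(np.mean((y_valid.to_numpy() - preds) ** 2)))
print(f"validation RMSE: {rmse:.2f}")
```

The split is chronological rather than random because the validation period must lie after the training period for a forecasting task.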
Inference in Linear Regression: Linear regression is a simple technique and
quite easy to interpret, but there are a few obvious disadvantages. One problem in
using regression algorithms is that the model overfits to the date and month columns.
Instead of taking into account the previous values from the point of prediction, the
model will consider the value from the same date a month ago, or the same
date/month a year ago. As seen from the plot above, for January 2016 and January
2017, there was a drop in the stock price. The model has predicted the same for
January 2018. A linear regression technique can perform well for problems such as
Big Mart sales where the independent features are useful for determining the target
value.

3.1.3 K-Nearest Neighbours:

In K-Nearest Neighbours (kNN), the algorithm finds the similarity between new
data points and old data points based on the independent variables.
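
As a small illustration of the idea, the sketch below fits a kNN regressor on made-up data with a single feature and computes the RMSE on two validation points. The data, the choice of k = 2 and the variable names are assumptions for this example only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data: one feature (e.g. a day index) and closing prices.
X_train = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y_train = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
X_valid = np.array([[2.4], [5.6]])
y_valid = np.array([12.8, 19.4])

# Each prediction is the mean target of the k closest training points.
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X_train, y_train)
preds = knn.predict(X_valid)

# Root mean squared error on the validation points.
rmse = float(np.sqrt(np.mean((y_valid - preds) ** 2)))
print(preds, rmse)
```

For the point 2.4 the two nearest neighbours are 2 and 3, so the prediction is the mean of 12 and 14; the same logic applies to 5.6 with the neighbours 5 and 6.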

Inference in K-Nearest Neighbour:

The RMSE value is similar to that of the linear regression model and
the plot shows the same pattern. Like linear regression, kNN also
identified a drop in January 2018 since that has been the pattern for
the past years. We can safely say that regression algorithms have not
performed well on this dataset.

3.1.4 Auto ARIMA:

ARIMA is a very popular statistical method for time series forecasting. ARIMA
models take into account the past values to predict the future values. There are three
important parameters in ARIMA:

• p (past values used for forecasting the next value)
• q (past forecast errors used to predict the future values)
• d (order of differencing)

Parameter tuning for ARIMA consumes a lot of time. So we will use auto ARIMA
which automatically selects the best combination of (p,q,d) that provides the least
error.
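
Of the three parameters, d can be illustrated without a forecasting library. The sketch below uses a made-up series and NumPy only; it shows what differencing does to a trend and is not the auto ARIMA fit itself.

```python
import numpy as np

# Made-up series with a quadratic trend.
series = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])

first_diff = np.diff(series, n=1)   # d = 1: change between consecutive values
second_diff = np.diff(series, n=2)  # d = 2: change of the changes

print(first_diff)
print(second_diff)
```

The first difference still grows over time, while the second difference is constant, which is why a series like this would need d = 2 before the p and q terms are fitted.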

o Inference in Auto ARIMA Model: As we saw earlier, an auto ARIMA
model uses past data to understand the pattern in the time series. Using these
values, the model captured an increasing trend in the series. Although the
predictions using this technique are far better than those of the previously
implemented machine learning models, these predictions are still not close to the
real values.

3.1.5 Prophet:

There are a number of time series techniques that can be implemented on the stock
prediction dataset, but most of these techniques require a lot of data preprocessing
before fitting the model. Prophet, designed and pioneered by Facebook, is a time
series forecasting library that requires no data preprocessing and is extremely simple
to implement. The input for Prophet is a dataframe with two columns: date and target
(ds and y).

Prophet tries to capture the seasonality in the past data and works well when the
dataset is large.
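
Preparing the two-column input can be sketched as follows. The prices are made up and the source column names Date and Close are assumptions; only the target names ds and y come from the description above.

```python
import pandas as pd

# Made-up closing prices indexed by date.
prices = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2018-10-03", "2018-10-04", "2018-10-05"]),
)
prices.index.name = "Date"

# Prophet expects these two column names: ds (date) and y (target).
prophet_df = prices.reset_index().rename(columns={"Date": "ds", "Close": "y"})
print(prophet_df)
```

With the prophet package installed, a frame like this would typically be passed to Prophet().fit(prophet_df), followed by make_future_dataframe and predict to obtain the forecast.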

• Inference in Prophet:

Prophet (like most time series forecasting techniques) tries to capture the trend
and seasonality from past data. This model usually performs well on time
series datasets, but fails to live up to its reputation in this case.

As it turns out, stock prices do not have a particular trend or seasonality. They depend
heavily on what is currently going on in the market, and thus the prices rise and fall.
Hence forecasting techniques like ARIMA, SARIMA and Prophet would not show
good results for this particular problem.

3.1.6 Long Short Term Memory:


LSTM is able to store past information that is important, and forget the information
that is not. LSTM has three gates:

• The input gate: The input gate adds information to the cell state
• The forget gate: It removes the information that is no longer required by
the model
• The output gate: Output Gate at LSTM selects the information to be
shown as output
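
To make the role of the three gates concrete, a single LSTM time step can be sketched in NumPy. This follows the standard LSTM formulation rather than the project's own code; the function name lstm_step, the weight layout and the random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order input gate, forget gate, output gate, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to add
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])       # candidate values for the cell state
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # hidden state shown as output
    return h, c

D, H = 3, 2  # input size and hidden size
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h, c)
```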

Let us implement LSTM as a black box and check its performance on our
particular data.

• Inference in LSTM:
The LSTM model can be tuned for various parameters such as changing the
number of LSTM layers, adding dropout value or increasing the number of
epochs. But are the predictions from LSTM enough to identify whether the
stock price will increase or decrease? Time series forecasting is a very
intriguing field to work with, as I have realized during my time writing these
articles. There is a perception in the community that it’s a complex field, and
while there is a grain of truth in there, it’s not so difficult once you get the
hang of the basic techniques.

4 Chapter 5 - Result and Discussion:

Result:

Figure 5 (a)

Discussion
Our proposed solution is a unique customization as compared to the previous works
because rather than just proposing yet another state-of-the-art LSTM model, we
proposed a fine-tuned and customized deep learning prediction system along with
utilization of comprehensive feature engineering and combined it with LSTM to
perform prediction. By researching into the observations from previous works, we fill
in the gaps between investors and researchers by proposing a feature extension
algorithm before recursive feature elimination and get a noticeable improvement in
the model performance.

Moreover, by combining the latest sentiment analysis techniques with feature engineering
and deep learning models, there is also a high potential to develop a more
comprehensive prediction system which is trained on diverse types of information
such as tweets, news, and other text-based data.

5 Conclusion and Future scope:

Conclusions:
Stock market prediction is a very important aspect in the financial market. It is
important to predict the stock market successfully in order to achieve maximum
profit. This paper will focus on applying machine learning algorithms like Random
Forest, Support Vector Machine, KNN and Logistic Regression on datasets. We
evaluate the algorithms by finding performance metrics like accuracy, recall,
precision and F-score. Our objective is to identify the best possible algorithm for
predicting future stock market performance. The successful prediction of the stock
market will have a very positive impact on both stock market institutions and
investors.
Machine learning algorithms can process social media content such as tweets, posts,
and comments of people who generally have stakes in the stock market. This data is
then used to train an AI model so that it can forecast the stock prices in different
scenarios.

Scope for Future Work:


The main advantage is that, since the model uses RNN, LSTM, machine learning
and deep learning models, the prediction of stock prices will be more accurate.
The model can also predict stock prices for the next 30 days and show them in a
graph. Another key feature is that the model can output the individual
predicted close prices for each of the 30 predicted days.

When it comes to using machine learning in the stock market, there are multiple
approaches a trader can take to utilize ML models. From determining future risk
to predicting stock prices, machine learning can be used for virtually any kind of
financial modeling.

6 References:

Book Name: Machine Learning using Python (by Manaranjan Pradhan and U Dinesh
Kumar)

Websites:

By Krish Naik - https://www.youtube.com/watch?v=H6du_pfuznE

By edureka! - https://www.youtube.com/watch?v=lncoLfue_Y4

By Simplilearn - https://www.youtube.com/watch?v=OXwZtlcTiuk

7 Annexure:

We are using Jupyter Notebook, which runs source code in cells.

import pandas_datareader as pdr

key = ""  # Tiingo API key (left blank here)

df = pdr.get_data_tiingo('AAPL', api_key=key)

df.to_csv('AAPL.csv')

import pandas as pd

df=pd.read_csv('AAPL.csv')

import matplotlib.pyplot as plt

Above is the depiction of the dataset values before prediction. It shows a plot of all the values
present in the dataset.
# Use the previous 100 values of the share price as input for each training sample.
time_step = 100
X_train, y_train = create_dataset(train_data, time_step)
X_test, ytest = create_dataset(test_data, time_step)
print(X_train.shape)
print(y_train.shape)
# The following modules are needed to build the prediction model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
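
The create_dataset helper called above is not listed in this annexure. A minimal sketch is given below, assuming that it builds sliding windows of time_step values as inputs and takes the value following each window as the target; the actual helper used in the notebook may differ.

```python
import numpy as np

def create_dataset(dataset, time_step=1):
    """Hypothetical helper: split an (n, 1) array into windows of `time_step`
    values (X) and the value that follows each window (y)."""
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:i + time_step, 0])
        y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(y)

# Made-up series 0..9 as a single-column array, with a short window for display.
demo = np.arange(10, dtype=float).reshape(-1, 1)
X_demo, y_demo = create_dataset(demo, time_step=3)
print(X_demo.shape, y_demo.shape)
```

With time_step = 100, as in the annexure, each row of X_train would hold 100 consecutive scaled prices and the matching entry of y_train would be the price that comes next.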

The picture above shows the 100 batches that are used to train the model. On the basis of these
entities, our trained model will predict the future values of the dataset.

In the above graph, our model has predicted the next values with its algorithm, and in the picture
below, the predicted line is overlaid on the actual graph.
