
Prediction of Stock Prices

Using Machine Learning

Submitted by:
Aanchal Patel (Enrollment no. 19102140)
Aaditya Dev Verma (Enrollment no. 19802007)

Submitted to:
Mrs. Bhawna Gupta
(Internal Supervisor)

Submitted in partial fulfillment of the Degree of Bachelor of Technology

DEPARTMENT OF ELECTRONICS AND COMMUNICATION


JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY, NOIDA

Table of Contents

Table of Contents
Certificate
Acknowledgement
Summary
Introduction
Project Purpose
Literature Review
Methodology
Implementation
Result Evaluation
Snapshots
Conclusion
Future Enhancement
References
Acknowledgement

On completing this project, we would like to convey our special gratitude to Mrs. Bhawna
Gupta of the Electronics and Communication Department.

Your valuable guidance and suggestions helped us in various phases of the completion of this
project. We will always be thankful to you in this regard.

Aanchal Patel
19102140 (A5)

Aaditya Dev Verma


19802007 (A9)

Summary

This report shows how machine learning can be used to predict stock prices. A stock price is
hard to predict: it depends on a chain of factors whose own behaviour is uncertain, and it
traces a curve that keeps turning from low to high and vice versa.

In the project, several machine learning algorithms are implemented and integrated. Their
output is presented to the user as graphs, which make it easier to see and interpret the
current scenario and to decide whether to invest.

The following sections introduce machine learning and show how different algorithms perform
differently. In essence, the software takes the raw data from the dataset (a .csv file) and
processes it.

Among all the algorithms we used, Linear Regression turned out to be the best.

Introduction

Predicting how the stock of a particular company will perform can be a tedious task, as many
factors have to be considered, for instance physical and psychological factors, the validity
of internal facts, and many more. However, with experience and proper knowledge, our
predictions can improve to a great extent. Stock market prediction is the act of trying to
determine the future value of a company stock or other financial instrument traded on an
exchange. The successful prediction of a stock's future price could yield significant profit.
The efficient-market hypothesis suggests that stock prices reflect all currently available
information, so any price changes that are not based on newly revealed information are
inherently unpredictable. Others disagree, and those with this viewpoint possess myriad
methods and technologies which purportedly allow them to gain future price information.

Figure 1. Source: erex [7]

The stock market is known for being volatile, dynamic, and nonlinear. Accurate stock price
prediction is extremely challenging because of multiple (macro and micro) factors, such as
politics, global economic conditions, unexpected events, a company’s financial performance,
and so on.
But, all of this also means that there’s a lot of data to find patterns in. So, financial analysts,
researchers, and data scientists keep exploring analytics techniques to detect stock market
trends. This gave rise to the concept of algorithmic trading, which uses automated, pre-
programmed trading strategies to execute orders.

What if we could take the data of a particular company, recognize certain patterns and
sequences in it, and use an algorithm that works on these patterns to predict future values?
The answer lies in machine learning. With the help of machine learning, we can develop an
algorithm using various regression techniques which can help us in predicting stock prices
based on its analysis of the dataset.

Project Purpose
The stock market prediction software is a prediction system that highlights the risk involved
in investing in the stock market. It predicts stock rates and their rate of exchange, and
presents the basic understanding and the statistical analysis to its users.

Data is often regarded as a digital fuel that opens up the possibility of higher yields and
indicates what is to come. Knowledge is power, and the same holds for stocks. Stock prices
are unpredictable, ever-changing, and dynamic in nature. Their rise and fall is uneven and
cannot be classified easily, and their dependencies involve changing resources and the agents
behind them.

Investment during a trading day determines the opening of the stock market on the next day.
The market has its own dependencies and is tightly integrated with the level of finances and
revenue generation. It is vast and hectic in nature. The main theme of the project is to
predict the turning points of the price curve, applying suitable methods and algorithms to
arrive at a viable result.

Everything follows a pattern, and the same holds true for stocks: in day-to-day life, stock
prices follow patterns of movement. An increase in some resource can raise the price of some
stocks while lowering the price of others. The source and the outcome are related on the basis
of polarity, which can be positive, neutral, or negative. The correlation for a given polarity
is determined, and from it an effective and reliable relationship is established.

This project helps people make the most of stock trading and understand the trends and
vulnerabilities that have to be observed and predicted. It does so through graphs, which allow
a user or customer to analyse the data and take note of the important details before deciding
how much to invest. The forecast is made from the available data source and covers the
upcoming week. Predictability itself is the challenge, and that is the main purpose of this
report.

Literature Review

The literature survey is an integral part of maintaining consistency and a crucial step in the
development process. Software development depends on the authenticity and availability of
resources. This part helps in discovering the work that has already been done and in finding
how it is used and implemented today. The key factors in development are the economy and the
strength of the product. Once the innovation goes through the building phase, the support and
the flow of resources have to be monitored and computed. This is also known as the research
phase, in which all the research needed to carry the work forward is done.

Machine Learning

Machine learning is a subfield of artificial intelligence, which is broadly defined as the
capability of a machine to imitate intelligent human behavior. Artificial intelligence systems
are used to perform complex tasks in a way that is similar to how humans solve problems.

The goal of AI is to create computer models that exhibit “intelligent behaviors” like humans,
according to Boris Katz, a principal research scientist and head of the InfoLab Group at
CSAIL. This means machines that can recognize a visual scene, understand a text written in
natural language, or perform an action in the physical world.

Machine learning is one way to use AI. It was defined in the 1950s by AI pioneer Arthur
Samuel as “the field of study that gives computers the ability to learn without explicitly being
programmed.”

Some data is held out from the training data to be used as evaluation data, which tests how
accurate the machine learning model is when it is shown new data. The result is a model that
can be used in the future with different sets of data.

Algorithms in Machine Learning

Broadly, there are 3 types of Machine Learning Algorithms:

● Supervised Learning

This algorithm consists of a target / outcome variable (or dependent variable) which is to be
predicted from a given set of predictors (independent variables).

● Unsupervised Learning

In this algorithm, we do not have any target or outcome variable to predict / estimate. It is
used for clustering a population into different groups, which is widely used for segmenting
customers into different groups for specific intervention.

● Reinforcement Learning

Using this algorithm, the machine is trained to make specific decisions. It works this way:
the machine is exposed to an environment where it trains itself continually using trial and
error. The machine learns from past experience and tries to capture the best possible
knowledge to make accurate business decisions.

Software Description

A quick introduction to the notebook used in our project

● Jupyter Notebook
Project Jupyter is a project and community whose goal is to "develop open-source software,
open-standards, and services for interactive computing across dozens of programming
languages". It was spun off from IPython in 2014 by Fernando Pérez and Brian Granger. Project
Jupyter's name is a reference to the three core programming languages supported by Jupyter,
which are Julia, Python and R, and also a homage to Galileo's notebooks recording the
discovery of the moons of Jupiter.

Figure 2. Source: Wikipedia [8]

Project Jupyter has developed and supported the interactive computing products Jupyter
Notebook, JupyterHub, and JupyterLab. Jupyter is a NumFOCUS fiscally sponsored project.

What is Train/Test
Train/Test is a method to measure the accuracy of your model.
It is called Train/Test because you split the data set into two sets: a training set (80% of
the data) and a testing set (the remaining 20%).
You train the model using the training set, and you test the model using the testing set.
Training the model means creating the model.
Testing the model means measuring the accuracy of the model.
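
The 80/20 split described above can be sketched in a few lines of plain Python. This is only an illustration: the report does not state whether the split was random or chronological, so the sketch assumes a chronological split (oldest rows for training), and the function name and price values are made up for the example.

```python
def train_test_split_ordered(rows, train_fraction=0.8):
    """Split rows into (train, test) while keeping their original order."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten made-up daily closing prices, oldest first (assumption for the example).
prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5, 105.1, 106.0, 105.4, 107.2]
train, test = train_test_split_ordered(prices)
print(len(train), "training rows,", len(test), "testing rows")
```

Keeping the order avoids training on days that come after the test days. If a shuffled split is preferred, scikit-learn's train_test_split does that by default.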

Methodologies

Stock prices are unpredictable and free-moving in nature, and their movement is hard to pin
down. The best achievable goal is therefore to find some predictability and get as near as
possible to the actual value; an exact and accurate estimate is never fully possible. Various
constraints influence the pricing and the rate of a stock, and these constraints have to be
taken into consideration before jumping to conclusions and deriving the report.

As described in the figure above, the proposed system takes its input from the dataset, from
which features are extracted and then classified. The classification technique used is
supervised, and various machine learning algorithms are applied to the data.
Training datasets are created for training the machine, and test cases are derived and run to
produce the visualizations and plots. The results generated are then visualized in graphical
form.

Hardware Requirement

Processor: Intel i5 or above

RAM: Minimum 225MB or more.

Hard Disk: Minimum 2 GB of space

Input Device: Keyboard

Output Device: Screens of Monitor or a Laptop

Software Requirement

● Operating system: Windows & Linux

● IDE: Jupyter Notebook

● Data Set: .csv file

● Visualization: matplotlib, pandas

● Server: Web Server with HTTP process

Implementation

Multiple Variables in the dataset

● Open price and close price represent the starting and final price at which the stock is
traded on a particular day.
● High, low and last represent the maximum, minimum and last price of the share for the day.
● Total traded quantity is the number of shares bought or sold in the day.

Describe function

Figure 3. Source: Numpy Ninja [9]

describe()
count - number of values
mean - average
std - standard deviation (spread around the mean)
min - minimum value
max - maximum value
25% - 25th percentile (25% of the values are below it)
50% - 50th percentile, the median (50% of the values are below it)
75% - 75th percentile (75% of the values are below it)
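
To make the meaning of these statistics concrete, here is a minimal sketch that computes the same summary with Python's standard statistics module. It is not the pandas implementation; the function name and the five closing prices are made up, and the "inclusive" quantile method is chosen on the assumption that linear interpolation between data points is wanted.

```python
import statistics


def describe(values):
    """Return summary statistics in the spirit of pandas' describe()."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,  # the median
        "75%": q3,
        "max": max(values),
    }


closing_prices = [10.0, 12.0, 11.0, 13.0, 14.0]  # made-up values
summary = describe(closing_prices)
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```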

Dataset Analysis

● Train/Test set splitting
● Looking for correlation
● Creating a pipeline (scaling of features)
● Selecting the desired models
● Evaluating the models
● Implementing the models on the test set
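
Two of the steps listed above, looking for correlation and scaling the features, can be illustrated with a short sketch. The report does not say which correlation measure or scaler was used, so Pearson correlation and standardization (zero mean, unit variance) are assumptions here, as are the function names and the sample prices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def standardize(xs):
    """Rescale values to zero mean and unit (population) variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    return [(x - mean_x) / std_x for x in xs]


# Made-up open and close prices for five days.
open_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
close_prices = [101.0, 101.5, 103.0, 106.0, 108.0]

r = pearson(open_prices, close_prices)
scaled = standardize(open_prices)
print("correlation:", round(r, 3))
```

A correlation close to 1 means the two columns move together, so one of them is a strong predictor of the other.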

These are the machine learning algorithms used during the building of this project:

Linear Regression

Linear regression is a linear approach for modelling the relationship between a scalar
response and one or more explanatory variables (also known as dependent and independent
variables). The case of one explanatory variable is called simple linear regression; for more
than one, the process is called multiple linear regression. This term is distinct from
multivariate linear regression, where multiple correlated dependent variables are predicted,
rather than a single scalar variable.

In linear regression, the relationships are modeled using linear predictor functions whose
unknown model parameters are estimated from the data. Such models are called linear
models. Most commonly, the conditional mean of the response given the values of the
explanatory variables (or predictors) is assumed to be an affine function of those values; less
commonly, the conditional median or some other quantile is used. Like all forms of
regression analysis, linear regression focuses on the conditional probability distribution of the
response given the values of the predictors, rather than on the joint probability distribution of
all of these variables, which is the domain of multivariate analysis.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory
variable and Y is the dependent variable. The slope of the line is b, and a is the intercept
(the value of Y when X = 0).
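
As a small worked example of the equation Y = a + bX, the sketch below estimates a and b by ordinary least squares for a single explanatory variable. The function name and the data (a price that rises by exactly 2 per day from 10) are made up so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


days = [0, 1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up data: exactly 10 + 2 * day
a, b = fit_line(days, prices)
next_price = a + b * 5  # prediction for day 5
print(a, b, next_price)
```

With more than one explanatory variable the same idea applies, but the estimates are usually obtained with a library such as scikit-learn rather than by hand.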

Decision Tree Regression

A decision tree builds regression or classification models in the form of a tree structure.
It breaks down a dataset into smaller and smaller subsets while at the same time an
associated decision tree is incrementally developed. The final result is a tree with decision
nodes and leaf nodes. A decision node has two or more branches, each representing values for
the attribute tested. A leaf node represents a decision on the numerical target. The topmost
decision node in a tree, which corresponds to the best predictor, is called the root node.
Decision trees can handle both categorical and numerical data.
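
The structure described above can be sketched as a tiny regression tree on a single numerical feature. This is an illustration under stated assumptions, not the implementation used in the project: splits are binary, the split is chosen to minimize the sum of squared errors, and the names, depth limit, and data are made up.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=2, min_size=2):
    """Return a dict (decision node) or a float (leaf holding a mean target)."""
    if depth >= max_depth or len(xs) < 2 * min_size:
        return mean(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_size, len(pairs) - min_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return mean(ys)
    _, threshold, i = best
    left, right = pairs[:i], pairs[i:]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_size),
    }


def predict(tree, x):
    """Walk from the root node down to a leaf and return its value."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0]  # made-up step data
tree = build_tree(xs, ys)
print(tree["threshold"], predict(tree, 3), predict(tree, 7))
```

Here the root node splits at 4.5 because that threshold separates the two price levels perfectly.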

Random Forest Regression

Random Forest Regression is a supervised learning algorithm that uses the ensemble learning
method for regression. Ensemble learning is a technique that combines predictions from
multiple machine learning algorithms to make a more accurate prediction than a single model.

The diagram above shows the structure of a Random Forest. You can notice that the trees run
in parallel with no interaction amongst them. A Random Forest operates by constructing
several decision trees during training time and outputting the mean of the predictions of all
the trees. To get a better understanding of the Random Forest algorithm, let’s walk through
the steps:

1. Pick at random k data points from the training set.

2. Build a decision tree associated with these k data points.

3. Choose the number N of trees you want to build and repeat steps 1 and 2.

4. For a new data point, make each one of your N trees predict the value of y for the data
point in question, and assign the new data point the average across all of the predicted y
values.

Figure 4. Source: CFI [10]

A Random Forest Regression model is powerful and accurate. It usually performs well on many
problems, including those with features that have non-linear relationships. Its
disadvantages, however, include the following: it offers little interpretability, overfitting
may easily occur, and we must choose the number of trees to include in the model.
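
The steps above can be sketched in a deliberately simplified form. To keep the example short, each "tree" is only a one-split tree (a stump) on a single feature, the k points are drawn with replacement, and there is no random choice of features, so this is closer to plain bagging than to a full Random Forest. All names, the seed, and the data are assumptions made for the illustration.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree to a list of (x, y) points."""
    points = sorted(points)
    ys = [y for _, y in points]
    overall = sum(ys) / len(ys)
    best = (float("inf"), None, overall, overall)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        left, right = ys[:i], ys[i:]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        cost = (sum((y - left_mean) ** 2 for y in left)
                + sum((y - right_mean) ** 2 for y in right))
        if cost < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (cost, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:  # the sample had a single x value: no split
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees=25, k=None, seed=0):
    """Steps 1 to 3: draw k random points, build a tree, repeat n_trees times."""
    rng = random.Random(seed)
    k = k or len(points)
    return [fit_stump(rng.choices(points, k=k)) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Step 4: average the predictions of all trees."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Made-up data: the price is 10 for x = 1..5 and 20 for x = 6..10.
points = [(x, 10.0) for x in range(1, 6)] + [(x, 20.0) for x in range(6, 11)]
forest = fit_forest(points)
low, high = predict_forest(forest, 2), predict_forest(forest, 9)
print(low, high)
```

In practice a library implementation such as scikit-learn's RandomForestRegressor, which grows full trees and also randomizes the features considered at each split, would normally be used instead.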

Moving Average Algorithm

A moving average is one of the most basic technical indicators used to analyze stocks.
“Moving average” is a broad term and there are many variations used by analysts to smooth out
price data and analyze trends.

Figure 5. Source: DATAQ [11]

Moving averages require a time period for calculations. For example, an investor may choose
a 50-day moving average, where the past 50 days in the data will be used to calculate the
average. Smaller windows of time are more sensitive to changes in the price data because they
contain fewer data points, while larger time periods are less sensitive to daily changes.

Moving averages may be used by traders as their main strategy or as a part of their trading
strategy. The most popular and simple way to use moving averages is to use a crossover
strategy which, if followed, will hopefully tell a trader when to buy and sell. This strategy
will be discussed further.

There are many different types of moving averages, but the main types are:

● Simple Moving Average (SMA)

● Exponential Moving Average (EMA)

● Cumulative Moving Average (CMA)

The main idea behind taking an average is to smooth out variations and highlight hidden
patterns in the data. On a chart, it is represented by a line.
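
The three types of moving average listed above can be written out as follows. This is a sketch for illustration: the function names, the smoothing factor alpha, the window size, and the prices are made up, and the EMA is seeded with the first price, which is one common convention among several.

```python
def simple_moving_average(prices, window):
    """SMA: mean of the last `window` prices, one value per full window."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def exponential_moving_average(prices, alpha):
    """EMA: recent prices get more weight; seeded with the first price."""
    ema = [prices[0]]
    for price in prices[1:]:
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema


def cumulative_moving_average(prices):
    """CMA: mean of all prices seen so far."""
    averages, total = [], 0.0
    for count, price in enumerate(prices, start=1):
        total += price
        averages.append(total / count)
    return averages


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
sma = simple_moving_average(prices, window=3)
ema = exponential_moving_average(prices, alpha=0.5)
cma = cumulative_moving_average(prices)
print(sma, ema, cma, sep="\n")
```

A smaller window (or a larger alpha) follows the price more closely, while a larger window gives a smoother line.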

We have taken two steps in moving average

Long Short Term Memory Algorithm

Long Short Term Memory (LSTM) is a kind of recurrent neural network (RNN). In an RNN, the
output from the last step is fed as input to the current step. LSTM was designed by
Hochreiter & Schmidhuber. It tackles the problem of long-term dependencies in RNNs: an RNN
cannot predict a word stored in long-term memory, but can give more accurate predictions from
recent information. As the gap length increases, an RNN does not perform efficiently. An LSTM
can by default retain information for a long period of time. It is used for processing,
predicting, and classifying on the basis of time-series data.

Structure Of LSTM:

LSTM has a chain structure that contains four neural networks and different memory blocks
called cells.

Information is retained by the cells and the memory manipulations are done by the gates.
There are three gates:

● Forget Gate
● Input Gate
● Output Gate
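
To show how the cell and the three gates interact, here is a minimal sketch of one LSTM time step for a single unit with scalar weights. It follows the standard LSTM equations (the four small networks are the forget, input, candidate, and output layers), but it is not the model trained in this project: the function names, weights, and inputs are made up, and real implementations use weight matrices and a deep learning library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar weights."""
    def layer(name):
        w, u, b = params[name]  # input weight, recurrent weight, bias
        return w * x + u * h_prev + b

    f = sigmoid(layer("forget"))       # forget gate: how much old memory to keep
    i = sigmoid(layer("input"))        # input gate: how much new information to add
    g = math.tanh(layer("candidate"))  # candidate values for the cell state
    o = sigmoid(layer("output"))       # output gate: how much memory to expose
    c = f * c_prev + i * g             # new cell state (long-term memory)
    h = o * math.tanh(c)               # new hidden state (output of this step)
    return h, c


# Made-up weights, identical for all four layers to keep the example small.
params = {name: (0.5, 0.1, 0.0)
          for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.3]:  # a short made-up input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state c is carried from step to step, which is what lets the network retain information over long gaps.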

Our LSTM model has:

● 1 input layer
● 1 output layer
● 1 hidden layer
● 60 neurons
● No dropout
● 10 epochs
● Batch size of 5 (dataset split size)

Regression based model

Result Evaluation

We implemented 5 different algorithms and computed the root mean squared error (RMSE) of
each.

The RMSE values for the individual algorithms were:

● Linear Regression: 1.7906056032906494
● Decision Tree Regression: 3.307298814203003
● Random Forest Regression: 2.7282285579060015
● LSTM: 48.23022167731757
● Moving Average: 21.47947317457164

As is clearly visible, among the 5 algorithms above, Linear Regression is the one that
provides the best results, since it has the lowest RMSE.
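
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values, so lower is better. The sketch below shows the calculation on made-up numbers; it does not reproduce the values reported above, and the function name is chosen for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [100.0, 102.0, 104.0]     # made-up true prices
predicted = [101.0, 101.0, 107.0]  # made-up model output
error = rmse(actual, predicted)
print(round(error, 4))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.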

Snapshots

Conclusion

To conclude, a stock price is an unpredictable mechanism: it depends on a chain of factors
whose own behaviour is unpredictable. It traces a curve which keeps changing, turning the
price from low to high and vice versa.

Because a stock is tightly integrated with many other dependencies, leaving out even one of
them compromises the level of accuracy. Accuracy in the strict sense does not apply to
stocks, as an exact prediction is not possible for any trading day; the price keeps changing
and turning the tables day and night. Having more component assets and dependencies makes a
stock more flexible in nature and therefore even harder to predict. Approximate values are
taken into consideration, and the hit, profit, or gain rate is calculated from them.

In the project, several machine learning algorithms are implemented and integrated. Their
output is presented to the user as graphs, which make it easier to see and interpret the
current scenario and to decide whether to invest.

The proposed software takes the raw data from the dataset (a .csv file) and processes it. The
data is cleaned and then processed further to obtain effective outcomes. After the
computation, the output is displayed on the screen in the form of a graph.

Future Enhancement

Stock markets are one of the best avenues for a business to grow, and a source of side income
for individuals who are ready to invest. Stocks have been around for a long time, and the
market keeps growing every day. Thousands of investors invest in it and make a fortune out of
it.

There are middle-level agents and stock vendors who study the market and invest in it.
Consultation on stocks is expensive. So people think a lot before investing, and there is no
certainty that an investment will produce a fruitful result.

Stocks are unpredictable, and their tendency to grow is higher than ever. If stock market
prediction can be done accurately, it will be a gain for both individuals and organizations.
The risk factor has to be mitigated, so the efficiency of the system should be high enough
for people to be confident about their investments.

The project can be continued to improve the effectiveness of the prediction, with additional
implementations that handle real-time scenarios and the way they are executed and processed.
Various constraints have to be added, and the resulting performance can be improved in the
future for more effective results. The current form of display is a graph; further display
options and settings can be integrated, and a pie chart and a custom graph can also be
implemented.

References

[1] Singh, A., 2021. Stock Price Prediction Using Machine Learning | Deep Learning.
[online] Analytics Vidhya. Available at:
<https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-
deep-learning-techniques-python/> [Accessed 27 October 2021].

[2] Colah.github.io. 2021. Understanding LSTM Networks -- colah's blog. [online] Available
at: <https://colah.github.io/posts/2015-08-Understanding-LSTMs/> [Accessed 25 October
2021].

[3] Medium. 2021. Can a Machine Learning Model Read Stock Charts and Predict Prices?.
[online] Available at: <https://towardsdatascience.com/can-an-ml-model-read-stock-charts-
and-predict-prices-fb73c551c7a4> [Accessed 25 October 2021].

[4] Pratama, H., 2021. Simple Moving Average (SMA) Indicator Using Machine Learning.
[online] Medium. Available at: <https://medium.com/data-folks-indonesia/simple-moving-
average-sma-indicator-using-machine-learning-e8951f61dd9b> [Accessed 26 October 2021].

[5] Geeksforgeeks.org. 2021. Exploratory Data Analysis in Python - GeeksforGeeks. [online]
Available at: <https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/amp>
[Accessed 26 October 2021].

[6] Youtube.com. 2021. [online] Available at:
<https://www.youtube.com/watch?v=I6ZBJo_xt2I> [Accessed 30 October 2021].

[7] Erex.co.jp. 2021. Trading business. [online] Available at:
<https://www.erex.co.jp/common/images/business/img_trading_01.jpg> [Accessed 18
October 2021].

[8] Wikipedia. 2021. Wikipedia. [online] Available at:
<https://www.google.com/url?sa=i&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FProject_Jupyter&psig=AOvVaw2mMc4IIzc--GHk41BcotSj&ust=1639027609371000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCJim7-W70_QCFQAAAAAdAAAAABAD>
[Accessed 21 October 2021].

[9] Mahalik, N., 2021. Linear Regression. [online] Static.wixstatic.com. Available at:
<https://static.wixstatic.com/media/300f21_4adf885a744d4f538991abe20d511c48~mv2.png/
v1/fill/w_291,h_291,al_c,q_95/300f21_4adf885a744d4f538991abe20d511c48~mv2.webp>
[Accessed 27 October 2021].
[10] Corporate Finance Institute. 2021. Random Forest. [online] Available at:
<https://corporatefinanceinstitute.com/resources/knowledge/other/random-forest/> [Accessed
24 October 2021].

[11] Dataq.com. 2021. A Closer Look At The Advanced CODAS Moving Average Algorithm.
[online] Available at: <https://www.dataq.com/resources/images/article_images/moving-
average/an15fig1.jpg> [Accessed 27 October 2021].
