
PROJECT REPORT

on

“STOCK MARKET PREDICTION USING MACHINE LEARNING”

Project report submitted in partial fulfillment of the requirement for the

DATA SCIENCE USING PYTHON

Shreyas G N (1VE19EC098)
Shreyanshu Lakthariya (1VE19EC097)
Rahul R Sarthy (1VE19EC086)
Omkar Kiran Shet (1VE19EC075)

Under the Guidance of

JITHESH KURIAN
ABSTRACT

In stock market prediction, the aim is to predict the future value of the financial stocks of a
company. The recent trend in stock market prediction technologies is the use of machine
learning, which makes predictions based on the values of current stock market indices by
training on their previous values. Machine learning itself employs different models to make
prediction easier and more reliable. The paper focuses on the use of regression and LSTM-based
machine learning to predict stock values. The factors considered are open, close, low, high and
volume. In the era of big data, deep learning for predicting stock market prices and trends has
become even more popular than before. The stock market is one of the most sophisticated and
complicated ways of doing business; brokerage corporations, the banking sector and many
others depend on it to make revenue and lower their risks. We collected 2 months of data
from the NSE and propose a comprehensive customization to analyze the performance of a
particular stock using a linear regression model and a KNN classifier while analyzing the
moving average of the stock, together with comprehensive feature engineering and a
deep-learning-based model for predicting the price trend of stock markets.
Algorithmic trading chooses among the three fundamental actions of securities trading:
whether to buy, sell, or hold a stock. Various data manipulations were carried out and
numerous features were created and mapped to different labels, and classifiers were used
to predict these three actions. The proposed solution is
comprehensive as it includes pre-processing of the stock market data set, utilization of
multiple feature engineering techniques, combined with a customized deep-learning-based
system for stock market price trend prediction. We conducted comprehensive evaluations on
frequently used machine learning models and conclude that our proposed solution
outperforms them due to the comprehensive feature engineering that we built.
Index Terms- Close, high, low, LSTM model, open, regression, and volume.
Table of Contents

1. Title page
2. Abstract
3. Introduction
4. Problem statement
5. Method
6. System Analysis
7. Prediction Model
8. Results
9. Conclusion
10. References
INTRODUCTION

The stock market is a trading platform where different investors sell and purchase shares
according to stock availability. The ups and downs of the stock market affect the profit of
stakeholders. If market prices go up, stakeholders profit from the stocks they purchased;
if the market goes down, stakeholders have to face losses. Buyers buy stocks at low prices
and sell them at high prices to try to make a large profit. Similarly, sellers sell their
products at high prices for profit. The stock market (SM) works as a trusted platform
between sellers and buyers.
Advances in Artificial Intelligence (AI) are supporting every field of life with
intelligent features, and several AI algorithms play a role in future predictions.
Machine learning (ML) is a field of AI in which machines are trained with data and then
analyze the future with test data. Machines are trained on the basis of standard
procedures called algorithms. Stock market prediction (SMP) can be of great benefit to
businesspeople. SMP provides the future trend of stock prices on the basis of previous
history. If stakeholders have future predictions, their investments can lead them toward
profit. Predictions can be 50% correct and 50% wrong, as that is the risk of business.
Similarly, we rely on ML predictions about future stock prices. In this chapter we explain
these ML algorithms with the help of their working methodologies and examples. Before
working on the actual SMP problem, a complete understanding of the role of ML algorithms in
prediction is also necessary. Introducing machine learning to the area of the stock market
has helped many researchers because of its accurate and efficient measurement. In this
project we are using a data set obtained from the NSE for a period of 5 years. A correct
prediction of stocks can lead to huge profits for the seller and the broker. Frequently, it
is brought out that prediction is chaotic rather than random, which means it can be
predicted by carefully analyzing the history of the respective stock market. Machine
learning is an efficient way to represent such processes. It predicts a market value close
to the actual value, thereby increasing the accuracy. The vital part of machine learning is
the dataset used. The dataset should be as concrete as possible because a small change in
the data can produce massive changes in the outcome. In this project, supervised machine
learning is employed on a data set obtained from Yahoo Finance.
PROBLEM STATEMENT

Everyone wants to become rich with little effort and great advantage. Similarly, we want to
look into our future because we do not want to take risks, or we want to decrease the
risk factor. The stock market is a place where buying and selling can help achieve future
aims in life. The questions are: how can we benefit from the stock market? What steps can
give us stock market predictions before we enter the risk zone? How can Artificial
Intelligence with machine learning algorithms support predictions of future market
trends?

OBJECTIVE
The sole objective of this project is to check the performance of a particular stock in the
market, predict its price, and predict when to buy or sell it, using various machine
learning and deep learning models. Successful prediction of the stock market will have a very
positive impact on stock market institutions and on investors.
METHOD

DATA ANALYSIS AND STOCK PREDICTION

Data analysis (DA) in machine learning (ML) is the process of applying technical skills
(ML algorithms) to historical data to obtain statistical as well as tabular results about
predictions. It is also considered a technical process of data illustration and evaluation.
DA is the process of distinguishing signals for decision making from the statistical
fluctuation of results. DA also includes collection as well as analysis, and it can be
iterative depending on the problem statement. Several statistical techniques are
implemented in DA. Data scientists find patterns in the entire data through special
observations. Several quantitative as well as qualitative approaches, such as content
analysis, history analysis, sentiment analysis and bibliographic analysis, are involved in
DA. A DA study formulates predictions on the basis of historical data that can be present
in the form of notes, files, documents, tables, or audio or video tapes. Accurate analysis
of different research findings can lead to valid knowledge discovery. Inaccurate
statistical presentation of data undermines the research findings of any scientist and
misleads readers.

We are using a dataset obtained online (historical data for various stocks); we have used
data from ‘NIFTY 50’. The dataset is in raw format and needs to be converted into a
format that can be analyzed. Therefore, some steps are performed before building the
model. Stock market prediction seems to be a complex problem, but by applying ML
technologies one can relate previous data to the current data, train the machine on it,
and make appropriate assumptions. In our project we use various models to check the
performance of the stock and conclude which is the most efficient. Machine learning models
such as Linear Regression, Decision Tree, KNN and Naive Bayes, and deep learning models
such as long short-term memory (LSTM), are used, and their prediction accuracy is compared.
Machine Learning (ML) Algorithms Implementation for Stock Predictions

We will implement machine learning algorithms on the datasets explained above and also
analyze the trends of data manipulation. In this project we are using a dataset obtained from
the NSE for a period of 5 years. The dataset has 5 variables: Open, Close, Low, High,
Volume & turnover. There are some steps to perform before building the model:

Handling missing data.

One-hot encoding: this converts categorical data to quantitative variables, as data in the
form of strings or objects does not help in analysis. The first step is to convert the columns
to the ‘category’ data type. The second step is to apply label encoding to convert them into
numerical values, which are useful for analysis. The third step is to convert the column into
binary values (either 0 or 1).

Data normalization: if data is not normalized, the column with high values will often be
given more importance in prediction.
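
The preprocessing steps above can be sketched in plain Python. This is only a minimal illustration and not the project's actual pipeline: the sample values, the `series` column, and the helper names `fill_missing`, `one_hot` and `min_max` are assumptions made for the example.

```python
# Minimal sketch of the three preprocessing steps (hypothetical data and helper names).

def fill_missing(values):
    """Replace None entries with the previous known value (forward fill)."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

def one_hot(labels):
    """Convert categorical labels to binary (0/1) columns, one per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def min_max(values):
    """Scale values to [0, 1] so that no column dominates by magnitude alone."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

close = [150.0, None, 154.0, 158.0]   # hypothetical closing prices with one gap
series = ["EQ", "BE", "EQ", "EQ"]     # hypothetical categorical column

close_filled = fill_missing(close)    # gap filled with the previous price
series_encoded = one_hot(series)      # columns ordered as ["BE", "EQ"]
close_scaled = min_max(close_filled)  # values now lie between 0 and 1
```

With pandas and scikit-learn, comparable steps are usually written with `DataFrame.ffill`, `pandas.get_dummies` and `MinMaxScaler`.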
PREDICTION MODELS
In this study, we use two machine learning methods: Logistic Regression and LSTM.

1. REGRESSION Model

Logistic regression is used as a classifier to assign observations to a discrete set of classes.
The algorithm transforms its output with the logistic sigmoid function to return a probability
value, and predicts the target from that probability. Logistic Regression is similar
to the Linear Regression model, but it applies the sigmoid function to the output instead
of using the linear output directly, at the cost of more complexity.
The hypothesis of logistic regression is limited to values between 0 and 1.
Regression, in contrast, is used for predicting continuous values from some given
independent values.
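
As a small illustration of the logistic sigmoid described above, the sketch below maps a linear score to a probability and then to a class. The coefficients, the 0.5 threshold and the function names are assumptions chosen for the example, not values from this project.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weight, bias):
    """Probability of the positive class for a single feature x."""
    return sigmoid(weight * x + bias)

def predict_class(x, weight, bias, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0

# Hypothetical coefficients, for illustration only.
p = predict_proba(2.0, weight=1.5, bias=-1.0)      # sigmoid(2.0), roughly 0.88
label = predict_class(2.0, weight=1.5, bias=-1.0)  # probability above 0.5, so class 1
```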
The project is based on the use of the linear regression algorithm for predicting values
by minimizing the error function, as given in the figure.
This operation is called gradient descent. Regression uses a given linear function for
predicting continuous values: V = a + bK, where V is a continuous value, K represents known
independent values, and a and b are coefficients.
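
A minimal sketch of gradient descent for a linear function of the form V = a + bK, using the symbols defined above: a and b are fitted by repeatedly stepping against the gradient of the mean squared error. The data, learning rate and iteration count are assumptions for illustration and are not taken from the project.

```python
def fit_line(K, V, learning_rate=0.01, iterations=5000):
    """Fit V = a + b*K by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(K)
    for _ in range(iterations):
        # Prediction error for every sample with the current coefficients.
        errors = [(a + b * k) - v for k, v in zip(K, V)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 * sum(errors) / n
        grad_b = 2.0 * sum(e * k for e, k in zip(errors, K)) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

# Hypothetical data lying exactly on V = 3 + 2K.
K = [0.0, 1.0, 2.0, 3.0, 4.0]
V = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_line(K, V)  # converges towards a = 3, b = 2
```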
Work was carried out on data in CSV format through the pandas library, and the parameter to
be predicted, the price of the stock with respect to time, was calculated.
The data is divided into different train sets for cross-validation to avoid overfitting. The
test set is generally kept at 20% of the whole dataset. Linear regression as given by the
above equation is performed on the data and then predictions are made, which are plotted to
show the results of the stock market prices vs time.
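
The split described above can be sketched as follows. The report does not say how the cross-validation sets were formed, so the expanding-window scheme, the helper names and the sample prices here are assumptions; the sketch keeps the rows in time order rather than shuffling them.

```python
def holdout_split(rows, test_fraction=0.2):
    """Keep the last test_fraction of the rows as the test set (no shuffling)."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

def expanding_window_folds(rows, n_folds=3):
    """Yield (train, validation) pairs where each fold trains on all earlier rows."""
    fold_size = len(rows) // (n_folds + 1)
    for i in range(1, n_folds + 1):
        yield rows[:i * fold_size], rows[i * fold_size:(i + 1) * fold_size]

prices = list(range(100, 120))               # 20 hypothetical daily closing prices
train, test = holdout_split(prices)          # the final 20% is held out for testing
folds = list(expanding_window_folds(train))  # validation always follows training in time
```

scikit-learn's `TimeSeriesSplit` offers a comparable time-ordered scheme.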
Stock market prediction seems a complex problem because there are many factors that have
yet to be addressed, and it does not seem statistical at first. But with proper use of machine
learning techniques, one can relate previous data to the current data, train the
machine to learn from it, and make appropriate assumptions. Machine learning as such has
many models, but this paper focuses on the two most important of them and makes its
predictions using them.
2. LSTM (LONG SHORT TERM MEMORY)

LSTM is a particular type of RNN with an extensive range of uses such as document
classification, time series analysis, and voice and speech recognition. In contrast to
feedforward networks, the predictions made by RNNs depend on prior estimations. In
practice, plain RNNs are not widely applied because they have a few shortcomings that lead
to impractical estimations. Without going into too much detail, LSTM solves these problems
by employing dedicated gates for forgetting old information and learning new information. An
LSTM layer is made of four neural network layers that interact in a specific way. A usual LSTM
unit is composed of a cell, an input gate, an output gate and a forget gate. The main task of
the cell is remembering values over arbitrary time intervals, and the task of controlling the
flow of information into and out of the cell belongs to the gates.
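
To make the gate mechanism concrete, here is a minimal sketch of one time step of a single scalar LSTM unit. The four interacting layers correspond to the forget, input, candidate and output computations. The weights and the input sequence are hypothetical; a trained network would learn them, and real layers use vectors and matrices rather than scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM unit.

    params maps each layer name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate value to write into the cell
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (the unit's output)
    return h, c

# Hypothetical weights, identical for every layer to keep the example short.
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.3]:             # a short, made-up scaled price sequence
    h, c = lstm_step(x, h, c, params)
```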

Schematic illustration of LSTM


MODEL PARAMETERS
Since stock market data are time-series information, there are two approaches for preparing
the training dataset of the prediction models. Because of the recurrent nature of LSTM
models, the technical indicators of one or more days (up to 30 days) are considered and
rearranged as input data to be fed into the models. For models other than LSTM, ten technical
indicators are fed to the model. The output of all models is the stock trend value with
respect to the input data. For recurrent models, the output is the stock trend value of the
last day of the training sample. All models (except Naïve Bayes) have one or several
parameters, known as hyper-parameters, which should be adjusted to obtain optimal results. In
this paper, one or two parameters of every model (except Decision Tree and Logistic
Regression, whose parameters are fixed) are selected to be adjusted for an optimal result
based on numerous experimental works. In Tables 1-3, all fixed and variable parameters of
tree-based models, traditional supervised models, and neural-network-based models are
presented, respectively.

Parameter     Value
Metric        Euclidean distance
Max epochs    100
EXPERIMENTAL RESULTS

1. LOGISTIC REGRESSION
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
import datetime
import matplotlib.pyplot as plt
# %matplotlib inline

df = pd.read_csv("/content/sample_data/NSE-TATAGLOBAL11 (2) (1).csv")
df.head()
df.isna().any()  # no cleaning required as all are False
df.info()

# plot the closing price history
plt.figure(figsize=(16, 8))
plt.plot(df['Close'], label='Close Price history')

# setting index as date
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']

# sorting
data = df.sort_index(ascending=True, axis=0)

# creating a separate dataset with only the date and the closing price
new_data = data[['Date', 'Close']].reset_index(drop=True)

# derive date features (Year, Month, Dayofweek, ...) from the Date column
from fastai.tabular import *  # fastai v1 import path; fastai v2 uses fastai.tabular.all
add_datepart(new_data, 'Date')
new_data.drop('Elapsed', axis=1, inplace=True)  # Elapsed will be the time stamp

# flag Mondays (0) and Fridays (4)
new_data['mon_fri'] = new_data['Dayofweek'].isin([0, 4]).astype(int)

# split into train and validation
train = new_data[:987].copy()
valid = new_data[987:].copy()
x_train = train.drop('Close', axis=1)
y_train = train['Close']
x_valid = valid.drop('Close', axis=1)
y_valid = valid['Close']

# NOTE: the original listing never defines `model`; LinearRegression is assumed
# here because the text describes linear regression.
model = LinearRegression()
model.fit(x_train, y_train)
model.score(x_train, y_train)

# predict on the validation set and compute the root mean squared error
preds = model.predict(x_valid)
rms = np.sqrt(np.mean(np.power((np.array(y_valid) - np.array(preds)), 2)))
rms

# plot the training prices against the validation prices and predictions
valid['Predictions'] = preds
valid.index = new_data[987:].index
train.index = new_data[:987].index
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

We modeled logistic regression, obtained an accuracy of 84.5%, and plotted the graph.
2. LSTM MODEL
import pandas as pd
import numpy as np

df = pd.read_csv('/content/sample_data/NSE-TATAGLOBAL11 (2) (1).csv')
df.head()
df.tail()
df = df.reset_index()['Close']
df.shape
df

import matplotlib.pyplot as plt
plt.plot(df)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
df = scaler.fit_transform(np.array(df).reshape(-1, 1))
print(df)

# Train and test split: time-series data must be divided in date order rather than
# randomly, because each value depends on the values before it. The training size is
# 70% of the total length of the data frame; the test size is the difference between
# the length of the dataset and the training size.
training_size = int(len(df) * 0.70)
test_size = len(df) - training_size
train_data, test_data = df[0:training_size, :], df[training_size:len(df), :1]
training_size, test_size
len(train_data), len(test_data)

# Data preprocessing: the time step is the number of previous days used to predict the
# price of the stock on a given day; here it is 50. The data is split into X and Y: in
# the 0th iteration the first 50 elements form the first record of X and the 51st
# element is put in Y.
def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), 0]
        dataX.append(a)
        dataY.append(dataset[i + time_step, 0])
    return np.array(dataX), np.array(dataY)

time_step = 50
X_train, Y_train = create_dataset(train_data, time_step)
X_test, Y_test = create_dataset(test_data, time_step)
print(X_train.shape), print(Y_train.shape)

# reshape X_train and X_test to the 3-dimensional array required by LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Creating LSTM model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(50, 1)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=100, batch_size=64,
          verbose=1)

# performance metrics
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# convert to original form
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)

# Root Mean Square Error
# NOTE: Y_train and Y_test are still in the scaled [0, 1] range here while the
# predictions are in price units; inverse-transform the targets as well to get
# an RMSE in price units.
import math
from sklearn.metrics import mean_squared_error
train_rmse = math.sqrt(mean_squared_error(Y_train, train_predict))
test_rmse = math.sqrt(mean_squared_error(Y_test, test_predict))
train_rmse, test_rmse

### Plotting
# shift train predictions for plotting
look_back = 50
trainPredictPlot = np.empty_like(df)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(train_predict) + look_back, :] = train_predict

# shift test predictions for plotting
testPredictPlot = np.empty_like(df)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(train_predict) + (look_back * 2) + 1:len(df) - 1, :] = test_predict

# plot baseline and predictions separately
plt.plot(scaler.inverse_transform(df))
plt.show()
plt.plot(trainPredictPlot)
plt.show()
plt.plot(testPredictPlot)
plt.show()

# plot baseline and predictions together
plt.plot(scaler.inverse_transform(df))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

Data set

Train prediction

Test prediction

FINAL OUTPUT GRAPH OF STOCK PREDICTIONS
If the last date in the test data is 22-8-2021, we want to predict the output for
23-8-2021. We need the previous 100 data points for that, so we take that data and
reshape it.

x_input = test_data[341:].reshape(1, -1)
x_input.shape
From the above model we got a training RMSE of 182.70 and a test RMSE of 153.44.
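
A minimal sketch of how an RMSE in price units can be computed when a model was trained on min-max scaled data: both the targets and the predictions are mapped back to the original scale before the error is taken. The values and helper names are hypothetical and are not the project's data, so the result is unrelated to the figures reported above.

```python
import math

def inverse_min_max(scaled, lo, hi):
    """Undo min-max scaling: map values in [0, 1] back to [lo, hi]."""
    return [s * (hi - lo) + lo for s in scaled]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

lo, hi = 100.0, 200.0                  # hypothetical minimum and maximum closing price
y_scaled = [0.0, 0.5, 1.0]             # scaled targets
pred_scaled = [0.1, 0.5, 0.9]          # scaled model outputs

y_price = inverse_min_max(y_scaled, lo, hi)        # targets back in price units
pred_price = inverse_min_max(pred_scaled, lo, hi)  # predictions back in price units
error = rmse(y_price, pred_price)                  # both sides share the same units
```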

# TO BUY AND SELL THE STOCK

date1 = int(input("When do you want to buy the stock : "))
date2 = int(input("When do you want to sell the stock (Less than 320 days) : "))
# keep asking until the selling day is inside the predicted range
while date2 > 319:
    print("enter lesser value than 319")
    date2 = int(input("When do you want to sell the stock (Less than 320 days) : "))

if date2 < date1:
    print("cannot sell prior to buying stock")
elif date1 == date2:
    print("Hold your position for some more days")
else:
    buy = test_predict[date1]
    sell = test_predict[date2]
    plt.plot(testPredictPlot, color='g', label='Price Range')
    plt.plot((914 + date1), buy, marker='*', label='Buy')
    plt.plot((914 + date2), sell, marker='*', label='Sell')
    plt.xlabel("Days")
    plt.ylabel("Closing Price")
    plt.title('Buy and Sell Graph')
    plt.legend()
    plt.show()
    print("You Bought the stock at a price of ", buy)
    print("your selling price is ", sell)
    if buy > sell:
        print("do not sell the stock")
    elif buy == sell:
        print("hold your position to get profit")
    elif buy < sell:
        print("you can sell the stock")
        # profit line as shown in the sample output
        print("if you sell the stock, you will get", sell - buy, "profit per stock.")
When do you want to buy the stock : 15
When do you want to sell the stock (Less than 320 days) : 319
You Bought the stock at a price of [154.47191]
your selling price is [160.22372]
you can sell the stock
if you sell the stock, you will get [5.751816] profit per stock.

Plot between Actual and Predicted Trend of LSTM

GRAPH OF BUYING AND SELLING THE STOCK


RESULT

For training machine learning models, we implement the following steps: normalizing features
(just for continuous data), randomly splitting the main data set into train data and test data
(30% of the dataset was assigned to the test part), fitting the models and evaluating them on
validation data (with ‘‘early stopping’’) to prevent overfitting, and using metrics for final
evaluation with test data. Creating deep models differs from classical machine learning in that
the input values must be three-dimensional (samples, time_steps, features), so we use a function
to reshape the input values. Weight regularization and a dropout layer are also employed to
prevent overfitting. All coding in this study is implemented in Python 3 with the scikit-learn
and Keras libraries. Based on extensive experimental work with these approaches, KNN and LR
gave accurate results, but the most accurate predictions came from the LSTM deep learning
model. As a prominent result, deep learning methods (LSTM) show a strong ability to forecast
stock movement in both approaches, especially for continuous data, where the performance of
machine learning models is much weaker than with the binary method. However, their running time
is always longer than that of the others because of the large number of epochs and the use of
prices from several previous days. We have also predicted when to buy and sell the stock so
that one can invest and make a profit using the LSTM model. The buy and sell model uses the
closing price values predicted by the LSTM on the test data and tells us whether the stock
should be sold now or held for a certain period and sold later for a profit.
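
The "early stopping" step mentioned above can be sketched as follows. The report does not give its stopping rule, so the patience-based rule, the function name and the loss values here are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through the last epoch.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0   # improvement: remember it and reset the counter
        else:
            waited += 1              # no improvement in this epoch
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50]
stop_at = early_stopping_epoch(losses)  # stops before the late improvement is seen
```

In Keras, the `EarlyStopping` callback with its `monitor` and `patience` arguments provides this behaviour during `model.fit`.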
CONCLUSION
Two techniques have been utilized in this paper, LSTM and Regression, on the Yahoo
Finance data set. Both techniques have shown an improvement in the accuracy of
predictions, thereby yielding positive results. The use of recently introduced machine
learning techniques in the prediction of stocks has yielded promising results and thereby
marked their use in profitable exchange schemes. It has led to the conclusion that
it is possible to predict the stock market with more accuracy and efficiency using machine
learning techniques. In the future, the stock market prediction system can
be further improved by utilizing a much bigger dataset than the one currently used.
This would help to increase the accuracy of our prediction models.
Furthermore, other machine learning models could also be studied to check the
accuracy they achieve. The purpose of this study was to predict stock market movement
with machine learning and deep learning algorithms; the dataset was based on NIFTY 50.
Machine learning models (such as Logistic Regression) and one deep learning method (LSTM)
were employed as predictors. We considered two approaches for input values to the models,
continuous data and binary data. Finally, LSTM is the top predictor, with a considerable
difference compared to the other models. However, its running time is longer than that of
the other algorithms.
REFERENCES
[1] M. Usmani, S. H. Adil, K. Raza and S. S. A. Ali, "Stock market prediction using
machine learning techniques," 2016 3rd International Conference on Computer and
Information Sciences (ICCOINS), Kuala Lumpur, 2016, pp. 322-327.
[2] K. Raza, "Prediction of Stock Market performance by using machine learning
techniques," 2017 International Conference on Innovations in Electrical Engineering
and Computational Technologies (ICIEECT), Karachi, 2017.
[3] H. Gunduz, Z. Cataltepe and Y. Yaslan, "Stock market direction prediction using
deep neural networks," 2017 25th Signal Processing and Communications Applications
Conference (SIU), Antalya, 2017.
[4] M. Billah, S. Waheed and A. Hanifa, "Stock market prediction using an improved
training algorithm of neural network," 2016 2nd International Conference on Electrical,
Computer & Telecommunication Engineering (ICECTE), Rajshahi, 2016, pp. 1-4.
[5] H. L. Siew and M. J. Nordin, "Regression techniques for the prediction of stock
price trend," 2012 International Conference on Statistics in Science, Business and
Engineering (ICSSBE), Langkawi, 2012, pp. 1-5.
