
Stock Price Movements Classification Using Machine and Deep Learning
Techniques-The Case Study of Indian Stock Market

Nagaraj Naik(✉) and Biju R. Mohan

Department of Information Technology,
National Institute of Technology, Karnataka, Surathkal, India
it16fv04.nagaraj@nitk.edu.in, bijurmohan@gmail.com
http://www.nitk.ac.in

Abstract. Forecasting stock price movements is an important topic for
traders and stock analysts. Timely prediction of stock movements can yield
higher profits and returns. Predicting stock price movement on a daily basis
is difficult because of the frequent ups and downs in the financial market.
Therefore, a more powerful predictive model is needed to predict stock
prices. Most existing work is based on machine learning techniques and
considers very few technical indicators. In this paper, we extract 33
technical indicators from daily stock prices, namely the open, high, low and
close prices. This paper addresses two problems. The first is technical
indicator feature selection, that is, the identification of the relevant
technical indicators using the Boruta feature selection technique. The second
is an accurate prediction model for stock price movements. To predict stock
price movements, we propose machine learning techniques and a deep learning
based model. The deep learning model performs better than the machine
learning techniques. The experimental results show a significant improvement
in classification accuracy of 5% to 6%. Stocks from the National Stock
Exchange, India (NSE) are considered for the experiment.

Keywords: ANN · Boruta feature selection · Deep learning · SVM

1 Introduction

Generally, predicting the movements of financial time series is a difficult
task because stock data are unstable, noisy and nonlinear. Changes in economic
policy, macroeconomic data, political uncertainty and government policy affect
the direction of the stock market. These changes are reflected in stock prices,
which is why the stock market fluctuates and is volatile. Classification,
regression and pattern recognition problems have been solved using Artificial
Neural Networks (ANN) over the years. Stock market data are time series data
that are more volatile during day trading and contain tremendous noise. The
structure of the data is complex due to high dimensionality. Therefore, to make
accurate decisions in stock markets, professional traders have used fundamental
analysis, technical analysis and artificial intelligence methods. Artificial
intelligence techniques are widely used for predicting nonlinear, noisy and
chaotic data. In the past, most studies considered data mining methods and
Neural Networks (NN). Most existing NN work was limited in learning from large
amounts of nonlinear, complex stock data, and extracting features from large
amounts of data is a difficult task.
The contribution of this study can be summarized as follows. The first is
technical indicator feature selection, that is, the identification of the
relevant technical indicators using the Boruta feature selection technique.
The second is an accurate prediction model for stock prices.

© Springer Nature Switzerland AG 2019
J. Macintyre et al. (Eds.): EANN 2019, CCIS 1000, pp. 445–452, 2019.
https://doi.org/10.1007/978-3-030-20257-6_38

2 Related Work

Zhong et al. [18] studied data mining methods for forecasting stock prices on a
daily basis. The study considered various financial and economic features, and
the dimension of the feature set was reduced by techniques such as fuzzy robust
principal component analysis and kernel principal component analysis (KPCA).
Stock data are noisy and nonlinear, so reducing the noise can be effective when
constructing a forecasting model. To accomplish this, the integration of
principal component analysis (PCA) and support vector regression (SVR) has been
proposed. In the first step, a set of technical indicators is calculated from
the daily transaction data of the target stock, and PCA is then applied to
these values to extract the principal components. After filtering the principal
components, a model is constructed to forecast the future price of the target
stock [7]. Three feature selection techniques, namely PCA, genetic algorithms
and decision trees, have been discussed for forecasting stock prices [15].
In most of the literature, PCA is applied for data representation and
transformation. However, PCA only transforms high-dimensional data linearly
into new low-dimensional data. Therefore, the KPCA method has been proposed to
handle nonlinear data by using appropriate kernel parameters [5]. Nahil et al.
[12] introduced KPCA to reduce the dimensions of the technical indicator
features.
Moving average convergence divergence and the exponential moving average are
technical indicators that have been studied to identify the short-term
behaviour of stock prices [1]. Chourmouziadis et al. [6] addressed the problem
of bull and bear market trends using fuzzy logic. Lin et al. [10] proposed PCA
to reduce and filter the noise in the data. However, in most studies, the
improvement in prediction accuracy obtained with PCA is very small. Deep
learning is used extensively in medical image classification, big data
analysis, electronic health record analysis, Parkinson's disease diagnosis and
so on [14].

3 Data Specification
In this paper, stock data are collected from http://www.nseindia.com. For each
stock, the data contain the day's open, low, high and close prices. We consider
banking sector stocks, namely ICICI Bank, Yes Bank, Kotak Bank and SBI Bank.
The dataset covers the years 2009 to 2018. For each stock, we assign a class
tag, up or down, on a closing basis by comparing the stock's current price with
its previous price.
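
The labelling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_movements` and the sample prices are hypothetical, and the handling of a close equal to the previous close is an assumption, since the text does not specify it.

```python
def label_movements(closes):
    """Tag each day, from the second one on, 'up' or 'down' against the previous close."""
    labels = []
    for previous, current in zip(closes, closes[1:]):
        # Assumption: an unchanged close is tagged 'down'; the paper does not
        # state how ties are handled.
        labels.append("up" if current > previous else "down")
    return labels


closes = [100.0, 101.5, 101.0, 101.0, 103.2]  # made-up closing prices
print(label_movements(closes))
```

The first day has no previous close, so a series of n closing prices yields n - 1 class tags.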

4 Proposed Work
The flow of the proposed model is described in Fig. 1. The data are retrieved
from NSE. The study considers 33 different combinations of technical indicators,
computed using the formulas of [9], which are described in Table 1.

Table 1. Technical indicators and their formulas, Kara et al. [9]

| Technical indicator name | Calculation | Number of days |
|---|---|---|
| Simple moving average (SMA) | $(C_t + C_{t-1} + \dots + C_{t-n+1}) / n$ | 5, 10, 14, 30, 50, 100, 200 |
| Exponential moving average | $(C_t - SMA(n)_{t-1}) \cdot (2 / (n + 1)) + SMA(n)_{t-1}$ | 5, 10, 14, 30, 50, 100, 200 |
| Momentum indicator | $C_t - C_{n-9}$ | 5, 10, 14 |
| Stochastic oscillator | $100 \cdot (C_t - L_t(n)) / (H_t(n) - L_t(n))$ | 14 |
| Stochastic oscillator | $(100 \cdot (C_t - L_t(n)) / (H_t(n) - L_t(n))) / 3$ | 14 |
| Moving average convergence divergence | $SMA(n) - SMA(n)$ | 26, 13, 19, 45, 25, 15 |
| Relative strength index | $100 - (100 / (1 + Avg(Gain) / Avg(Loss)))$ | 14, 28 |
| Williams R | $((H_t - C_t) / (H_n - L_n)) \times 100$ | 14, 28, 50, 100 |
| Accumulation distribution index | $H_t - C_{t-1} / H_t - L_t \; ((C_t / L) / (H / C)) / (H / L)$ | 14 |
| Commodity channel index | $((H + L + C) / 3 - SMA) / (0.015 \cdot \text{Mean deviation})$ | 14, 50, 100 |
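
As an illustration of how rows of Table 1 translate into computations, the following Python sketch evaluates three of the indicators for the most recent day of a short price series. It is a minimal sketch, not the authors' R implementation. The function names and sample prices are hypothetical, and it assumes that L_t(n) and H_t(n) denote the lowest low and highest high over the last n days.

```python
def sma(closes, n):
    """Simple moving average: (C_t + C_{t-1} + ... + C_{t-n+1}) / n."""
    if len(closes) < n:
        raise ValueError("need at least n closing prices")
    return sum(closes[-n:]) / n


def rsi(closes, n):
    """Relative strength index: 100 - 100 / (1 + Avg(Gain) / Avg(Loss))."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    window = closes[-(n + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0  # no losing day in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def stochastic_oscillator(highs, lows, closes, n):
    """100 * (C_t - L_t(n)) / (H_t(n) - L_t(n)).

    Assumption: L_t(n) and H_t(n) are the lowest low and the highest high
    over the last n days.
    """
    lowest, highest = min(lows[-n:]), max(highs[-n:])
    if highest == lowest:
        return 0.0  # flat window; this return value is an arbitrary choice
    return 100.0 * (closes[-1] - lowest) / (highest - lowest)


# Made-up prices for five trading days.
highs = [11.0, 12.0, 13.0, 12.0, 14.0]
lows = [9.0, 10.0, 11.0, 10.0, 12.0]
closes = [10.0, 11.0, 12.0, 11.0, 13.0]
print(sma(closes, 5), rsi(closes, 4), stochastic_oscillator(highs, lows, closes, 5))
```

Repeating such calls for each number of days listed in the table gives one feature column per indicator and window length.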

4.1 Technical Indicator Selection

The proposed task is carried out in two steps. The first is the Boruta feature
selection method, which is used to select the important technical indicator
features. This method creates duplicate/shadow copies of the input features to
randomize the dataset. Random shuffling of the data removes their correlations
with the outcome variable. The random forest algorithm is then applied to find
the important technical indicator features based on higher mean values (Z). In
this algorithm, we set the Z score threshold to 0.80. If a technical indicator
feature has a value greater than 0.80, it is considered for classification. The
step-by-step Boruta feature selection algorithm is stated in Algorithm 1. We
carried out this task using the Boruta package in R. The second step is an
accurate prediction model. Feature selection is performed on the technical
indicators using the Boruta algorithm, and the selected technical indicator
features are given as input to the prediction model.

Fig. 1. Overall proposed work.

Algorithm 1
1: Input the 33 technical indicator features F.
2: Create duplicate/shadow copies D of the technical indicator features.
3: Randomly shuffle the original technical indicators F and the duplicate copies
D to remove their correlations with the outcome variable.
4: Apply the random forest algorithm to find the important technical indicator
features based on higher mean values.
5: Calculate the Z score as Mean/Std deviation.
6: Find the maximum Z score among the duplicate technical indicator features.
7: Remove a technical indicator feature if its Z score is less than the maximum
Z score of the duplicate features.
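
The shadow-feature comparison at the heart of Algorithm 1 can be illustrated with a simplified Python sketch. Several simplifications are assumptions of this sketch and not the authors' method: absolute Pearson correlation stands in for random forest importance, a feature is kept when it beats the best shadow copy in more than half of the rounds instead of using a Z score, and only the shadow copies are shuffled. The names `abs_corr` and `shadow_select` and the toy data are hypothetical.

```python
import math
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation, a stand-in for random forest importance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / math.sqrt(sxx * syy)


def shadow_select(features, labels, n_rounds=20, seed=0):
    """Keep the features that beat the best shadow copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        shadow_scores = []
        for column in features.values():
            shadow = list(column)  # duplicate/shadow copy of the feature
            rng.shuffle(shadow)    # shuffling breaks its link to the labels
            shadow_scores.append(abs_corr(shadow, labels))
        best_shadow = max(shadow_scores)
        for name, column in features.items():
            if abs_corr(column, labels) > best_shadow:
                hits[name] += 1
    return [name for name, count in hits.items() if count > n_rounds / 2]


# Toy data: one feature that follows the labels and one unrelated to them.
labels = [0, 1] * 100
features = {
    "informative": [float(v) for v in labels],
    "uninformative": [0.0, 0.0, 1.0, 1.0] * 50,
}
print(shadow_select(features, labels))
```

The design point carried over from Boruta is that the rejection threshold is not fixed in advance but is set by the best score that purely random copies of the features achieve.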

4.2 Prediction Model

Deep Learning. Deep learning is used extensively in medical image
classification, big data analysis and electronic health record analysis. The
literature suggests that these methods achieve the highest accuracy compared
with other machine learning algorithms. Kara et al. [9] proposed a framework for
stock prediction that used a three-layer artificial neural network. In our
proposed work, deep learning is implemented in H2O. Feature selection is
performed on the technical indicators using the Boruta algorithm, and the
selected technical indicator features are given as input to the deep learning
model. The deep learning model is used to classify stock price movement as up
or down, and it is described in Fig. 2. It has five layers of interconnected
neuron units through which the data are transformed. The input layer neurons
represent the technical indicator features, denoted by t_i, and W_i denotes the
weights of the neurons. Stochastic gradient descent with back-propagation is
used to adjust the weights. A bias input is given to each layer except the
output layer of the model. The objective function L(W, Bias|j) aims to reduce
the classification error in the data.
The weighted sum of the inputs is given in Eq. 1.

$$ \alpha = \sum_{i=1}^{n} W_i t_i + \mathrm{Bias} \tag{1} $$

The Tanh and rectified linear unit activation functions are used. The model
supports regularization to avoid overfitting, as shown in Eq. 2.

$$ L'(W, \mathrm{Bias} \mid j) = L(W, \mathrm{Bias} \mid j) + \lambda_1 R_1(W, \mathrm{Bias} \mid j) + \lambda_2 R_2(W, \mathrm{Bias} \mid j) \tag{2} $$
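
To make Eqs. 1 and 2 concrete, the following Python sketch evaluates them for a single neuron. It is a minimal illustration, not the H2O implementation. It assumes that R1 is the sum of absolute values and R2 the sum of squares of the weights and bias, which the text does not spell out. The function names and numeric values are hypothetical.

```python
import math


def pre_activation(weights, inputs, bias):
    """Eq. 1: alpha = sum_i W_i * t_i + Bias."""
    return sum(w * t for w, t in zip(weights, inputs)) + bias


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def regularized_loss(loss, weights, bias, lambda1, lambda2):
    """Eq. 2, assuming R1 = sum of absolute values and R2 = sum of squares."""
    params = list(weights) + [bias]
    r1 = sum(abs(p) for p in params)
    r2 = sum(p * p for p in params)
    return loss + lambda1 * r1 + lambda2 * r2


weights = [0.5, -0.25]  # W_i, made-up values
inputs = [2.0, 4.0]     # t_i, two technical indicator values
bias = 0.5

alpha = pre_activation(weights, inputs, bias)
print(alpha, math.tanh(alpha), relu(alpha))
print(regularized_loss(1.0, weights, bias, lambda1=0.1, lambda2=0.01))
```

Larger values of lambda1 and lambda2 penalize large weights more strongly, which is how the regularization terms discourage overfitting.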

ANN Model. Feature selection is performed on the technical indicators using the
Boruta algorithm, and the selected technical indicator features are given as
input to the ANN. In this work, the ANN is used to classify the stock price
movement. The ANN has three layers, and each layer is connected to the next.
The input neurons represent the technical indicators. The sigmoid activation
function is used in the ANN model, and the threshold value is set to 0.5.
Gradient descent with momentum is used to determine the weights and to approach
the global minimum of the error.
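
The two ingredients named above, a sigmoid output cut at 0.5 and gradient descent with momentum, can be sketched in Python as follows. This is an illustration under assumptions and not the authors' network: the mapping of outputs above the threshold to the up class, the learning rate, the momentum value and the toy error function (w - 3)^2 are all hypothetical choices.

```python
import math


def sigmoid(x):
    """Sigmoid activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(output, threshold=0.5):
    # Assumption: outputs above the threshold are read as 'up'.
    return "up" if output > threshold else "down"


def momentum_step(weight, velocity, gradient, learning_rate, momentum):
    """One gradient descent update with a momentum term."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = momentum_step(w, v, 2.0 * (w - 3.0), learning_rate=0.1, momentum=0.9)

print(round(w, 6))
print(classify(sigmoid(2.0)), classify(sigmoid(-2.0)))
```

The momentum term reuses part of the previous update, which smooths the path of the weights across successive gradient steps.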

Support Vector Machines (SVM). In this prediction model, we study SVM for a
two-class problem, namely up and down. SVM is based on VC learning theory, and
one of its major components was developed by Vapnik [4,16]. SVMs also show
strong performance in real-world applications. The SVM hyperplane is
constructed from the input vectors. To separate the input vectors, two
hyperplanes are constructed. Stock market data are not linearly separable, and
SVM can be more effective when datasets are non-linear.
Polynomial and radial basis kernel functions are shown in Eqs. 3 and 4.

$$ \text{Polynomial function:} \quad K(f_i, f_j) = (f_i \cdot f_j + 1)^d \tag{3} $$

$$ \text{Radial basis function:} \quad K(f_i, f_j) = \exp(-\gamma \lVert f_i - f_j \rVert^2) \tag{4} $$



Fig. 2. Proposed five layer deep learning model

The degree of the polynomial function is represented by d, and γ represents the
constant of the radial basis function. We varied the SVM degree parameter from
1 to 4 and gamma from 0.1 to 5 to obtain the best accuracy.
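
The two kernel functions of Eqs. 3 and 4 can be written directly in Python, as in the sketch below. The radial basis kernel is written in its usual form with a negative exponent. The feature vectors are made up, and the individual gamma values in the grid are hypothetical, since only the ranges 1 to 4 and 0.1 to 5 are stated.

```python
import math


def polynomial_kernel(fi, fj, degree):
    """Eq. 3: K(f_i, f_j) = (f_i . f_j + 1)^d."""
    dot = sum(a * b for a, b in zip(fi, fj))
    return (dot + 1.0) ** degree


def rbf_kernel(fi, fj, gamma):
    """Eq. 4: K(f_i, f_j) = exp(-gamma * ||f_i - f_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-gamma * squared_distance)


fi = [1.0, 2.0]  # made-up feature vectors
fj = [3.0, 4.0]

degrees = [1, 2, 3, 4]              # range stated in the text
gammas = [0.1, 0.5, 1.0, 2.0, 5.0]  # hypothetical grid inside the stated range

for d in degrees:
    print("degree", d, polynomial_kernel(fi, fj, d))
for g in gammas:
    print("gamma", g, rbf_kernel(fi, fj, g))
```

In a parameter search, each degree or gamma value would be used to train a separate SVM, and the setting with the best cross-validated accuracy would be kept.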

5 Experimental Results and Discussion


For each stock, we assign a class tag, up or down, on a closing basis by
comparing the stock's current price with its previous price. We use Accuracy
and F-Measure to evaluate the performance of the deep learning and machine
learning models. Accuracy and F-Measure are given in Eqs. 5 and 6. The NSE
datasets consist of 2400 rows. We use tenfold cross-validation in the
experiment. The experiment is carried out on the R Studio platform.

$$ \text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{TrueNegative} + \text{FalsePositive} + \text{FalseNegative}} \tag{5} $$

$$ \text{F-Measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{6} $$

Table 2. Result comparison

Ten technical indicators, Patel et al. [13]

| Stock | ANN Accuracy | ANN F-Measure | SVM Accuracy | SVM F-Measure | RF Accuracy | RF F-Measure |
|---|---|---|---|---|---|---|
| ICICI Bank | 73.12% | 0.7470 | 68.55% | 0.6935 | 77.12% | 0.7877 |
| SBI Bank | 74.12% | 0.7248 | 70.35% | 0.7080 | 78.85% | 0.7987 |
| Yes Bank | 72.12% | 0.7414 | 71.35% | 0.7130 | 77.15% | 0.7638 |
| Kotak Bank | 73.12% | 0.7532 | 72.35% | 0.7210 | 76.35% | 0.7637 |

Proposed model

| Stock | ANN Accuracy | ANN F-Measure | SVM Accuracy | SVM F-Measure | Deep learning Accuracy | Deep learning F-Measure |
|---|---|---|---|---|---|---|
| ICICI Bank | 79.42% | 0.796 | 76.61% | 0.769 | 83.10% | 0.815 |
| SBI Bank | 79.60% | 0.793 | 77.01% | 0.776 | 84.50% | 0.824 |
| Yes Bank | 78.32% | 0.781 | 76.63% | 0.753 | 83.67% | 0.833 |
| Kotak Bank | 78.60% | 0.733 | 77.51% | 0.786 | 83.90% | 0.844 |
The proposed prediction model performs better than existing work, as shown in
Table 2. We fine-tuned the parameter settings at different levels to maximize
the model accuracy.
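
The Accuracy and F-Measure of Eqs. 5 and 6 can be computed from confusion-matrix counts, as in the short Python sketch below. The counts are made up for illustration and are not results from the experiment. The sketch assumes the standard definitions of precision and recall with up as the positive class.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. 5: share of correctly classified days."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp, fp, fn):
    """Eq. 6: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # predicted 'up' days that really went up
    recall = tp / (tp + fn)     # real 'up' days that were predicted 'up'
    return 2 * precision * recall / (precision + recall)


# Made-up counts with 'up' as the positive class.
tp, tn, fp, fn = 90, 80, 20, 10
print(accuracy(tp, tn, fp, fn))
print(f_measure(tp, fp, fn))
```

With tenfold cross-validation, these counts would be collected on each held-out fold and the resulting scores averaged.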

6 Conclusion
Stock market prediction is a difficult task for stock fund managers and
financial analysts because stock data are unstable, noisy and nonlinear. The
paper focused on the classification of stock price movements on a daily basis.
We conclude that Boruta feature selection is a useful method for identifying
relevant technical indicators. The study also demonstrated that the deep
learning model performs better than the machine learning techniques. The
contribution of this study can be summarized as follows. The first is technical
indicator feature selection, that is, the identification of the relevant
technical indicators using the Boruta feature selection technique. The second
is an accurate prediction model for stocks. The stock data are collected from
the National Stock Exchange (NSE), India.

Acknowledgment. This work is supported by the Visvesvaraya Ph.D. Scheme for
Electronics and IT, MeitY, Government of India. The work was carried out at the
Department of IT, NITK Surathkal, Mangalore, India.

References
1. Anbalagan, T., Maheswari, S.U.: Classification and prediction of stock market index based on fuzzy metagraph. Procedia Comput. Sci. 47, 214–221 (2015)
2. Anish, C.M., Majhi, B.: Hybrid nonlinear adaptive scheme for stock market prediction using feedback flann and factor analysis. J. Korean Stat. Soc. 45(1), 64–76 (2016)
3. Barak, S., Modarres, M.: Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Syst. Appl. 42(3), 1325–1339 (2015)
4. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
5. Cao, L.J., Chua, K.S., Chong, W.K., Lee, H.P., Gu, Q.M.: A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55(1–2), 321–336 (2003)
6. Chourmouziadis, K., Chatzoglou, P.D.: An intelligent short term stock trading fuzzy system for assisting investors in portfolio management. Expert Syst. Appl. 43, 298–311 (2016)
7. Chowdhury, U.N., Rayhan, M.A., Chakravarty, S.K., Hossain, M.T.: Integration of principal component analysis and support vector regression for financial time series forecasting. Int. J. Comput. Sci. Inf. Secur. (IJCSIS), 15 (2017)
8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
9. Kara, Y., Boyacioglu, M.A., Baykan, Ö.K.: Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul stock exchange. Expert Syst. Appl. 38(5), 5311–5319 (2011)
10. Lin, X., Yang, Z., Song, Y.: Short-term stock price prediction based on echo state networks. Expert Syst. Appl. 36(3), 7313–7317 (2009)
11. Long, W., Lu, Z., Cui, L.: Deep learning-based feature engineering for stock price movement prediction. Knowl.-Based Syst. 164, 163–173 (2019)
12. Nahil, A., Lyhyaoui, A.: Short-term stock price forecasting using kernel principal component analysis and support vector machines: the case of Casablanca stock exchange. Procedia Comput. Sci. 127, 161–169 (2018)
13. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 42(1), 259–268 (2015)
14. Tahmassebi, A., Gandomi, A., McCann, I., Schulte, M., Goudriaan, A., Meyer-Baese, A.: Deep learning in medical imaging: fMRI big data analysis via convolutional neural networks. In: Proceedings of the Practice and Experience on Advanced Research Computing. ACM (2018)
15. Tsai, C.-F., Hsiao, Y.-C.: Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis. Support Syst. 50(1), 258–269 (2010)
16. Vapnik, V.N., Chervonenkis, A.J.: Theory of pattern recognition (1974)
17. Weng, B., Ahmed, M.A., Megahed, F.M.: Stock market one-day ahead movement prediction using disparate data sources. Expert Syst. Appl. 79, 153–163 (2017)
18. Zhong, X., Enke, D.: Forecasting daily stock market return using dimensionality reduction. Expert Syst. Appl. 67, 126–139 (2017)
