1 Introduction
Predicting the movement of financial time series is generally a difficult task because stock data are unstable, noisy and nonlinear. Changes in economic policy, macroeconomic data, political uncertainty and government policy affect the direction of the stock market. These effects are reflected in stock prices, which is why the stock market fluctuates and is volatile. Classification, regression and pattern recognition problems have been solved using Artificial Neural Networks (ANN) over the years. Stock market data is the
© Springer Nature Switzerland AG 2019
J. Macintyre et al. (Eds.): EANN 2019, CCIS 1000, pp. 445–452, 2019.
https://doi.org/10.1007/978-3-030-20257-6_38
446 N. Naik and B. R. Mohan
time series data that is more volatile during day trading and carries tremendous noise. The structure of the data is complex due to its high dimensionality. Therefore, to make accurate decisions in stock markets, professional traders have used fundamental analysis, technical analysis and artificial intelligence methods. Artificial intelligence techniques are widely used for predicting nonlinear, noisy and chaotic data. In the past, most studies considered data mining methods and Neural Networks (NN). Most existing NN work was limited in its ability to learn from large amounts of nonlinear, complex stock data, and extracting features from large amounts of data is a difficult task.
The contributions of this study are twofold. The first is the selection and identification of relevant technical indicators using the Boruta feature selection technique. The second is an accurate prediction model for stock prices.
2 Related Work
Zhong et al. [18] studied a data mining method for forecasting daily stock prices. The study considered various financial and economic features, and the dimensionality of the feature set was reduced using techniques such as fuzzy robust principal component analysis and kernel principal component analysis (KPCA). Stock data are noisy and nonlinear, so reducing the noise can be effective when constructing a forecasting model. To accomplish this, the integration of PCA and SVR has been proposed. In the first step, a set of technical indicators is calculated from the daily transaction data of the target stock, and PCA is then applied to these values to extract the principal components. After filtering the principal components, a model is constructed to forecast the future price of the target stock [7]. Three feature selection techniques, namely PCA, genetic algorithms and decision trees, have been discussed for forecasting stock prices [15]. In most of the literature, PCA is applied for data representation and transformation. However, PCA only transforms high-dimensional data linearly into new low-dimensional data. Therefore, the KPCA method has been proposed to handle nonlinear data by using appropriate kernel parameters [5]. Nahil et al. [12] introduced KPCA to reduce the dimensions of the technical indicator features.
Moving average convergence divergence and exponential moving average are stock technical indicators that have been studied to identify short-term movements of stock prices [1]. Chourmouziadis et al. [6] addressed the problem of bull and bear market trends using fuzzy logic. Lin et al. [10] proposed PCA to reduce and filter the noise in the data. However, in most studies, the improvement in prediction accuracy obtained with PCA is very small. Deep learning is extensively used in medical image classification, big data analysis, electronic health record analysis, Parkinson's disease diagnosis and so on [14].
Stock Price Movements Classification 447
3 Data Specification
In this paper, stock data are collected from http://www.nseindia.com. The data contain information about each stock, such as the day open, day low, day high and day close prices. We considered banking sector stocks, namely ICICI Bank, Yes Bank, Kotak Bank and SBI Bank. The dataset covers the years 2009 to 2018. For each stock, on a closing-price basis, we assigned a class tag of up or down by comparing the stock's current price with its previous price.
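The class tagging described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name label_movements and the sample prices are hypothetical, and tagging an unchanged close as "down" is an assumption, since the text does not state how ties are handled.

```python
from typing import List


def label_movements(closes: List[float]) -> List[str]:
    """Tag each day after the first as 'up' or 'down' versus the previous close."""
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        # Assumption: an unchanged close is tagged 'down'; the paper does not say.
        labels.append("up" if curr > prev else "down")
    return labels


# Hypothetical closing prices, for illustration only.
sample_closes = [100.0, 101.5, 101.0, 101.0, 103.2]
print(label_movements(sample_closes))
```

The first trading day has no previous price, so a series of n closes yields n - 1 tags.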
4 Proposed Work
The flow of the proposed model is described in Fig. 1. The data are retrieved from NSE. The study considered 33 different combinations of technical indicators, computed using the formulas in [9], which are described in Table 1.
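To illustrate how a technical indicator is derived from closing prices, the sketch below computes two textbook indicators, the simple n-day moving average and the n-day momentum. Table 1 is not reproduced here, so the choice of these two indicators, the function names and the sample prices are all assumptions made for illustration and are not the paper's list of 33.

```python
from typing import List


def simple_moving_average(closes: List[float], n: int) -> List[float]:
    """Mean of the last n closes, for each day that has n closes available."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [sum(closes[i - n + 1 : i + 1]) / n for i in range(n - 1, len(closes))]


def momentum(closes: List[float], n: int) -> List[float]:
    """Difference between a day's close and the close n days earlier."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [closes[i] - closes[i - n] for i in range(n, len(closes))]


# Hypothetical closing prices, for illustration only.
sample_closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(simple_moving_average(sample_closes, 3))
print(momentum(sample_closes, 2))
```

Each indicator produces one value per day once enough history is available, so the resulting columns have to be aligned by date before they are used as features.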
The proposed task is carried out in two steps. The first is the Boruta feature selection method, which is used to select the important technical indicator features. This method creates duplicate (shadow) copies of the input features to make
the dataset random. Random shuffling of the data removes its correlation with the outcome variable. The random forest algorithm is then applied to find the important technical indicator features based on higher mean values (Z). In this algorithm, we set the Z score threshold to 0.80. Any technical indicator feature whose Z score is greater than 0.80 is considered for classification. The step-by-step Boruta feature selection procedure is stated in Algorithm 1. We carried out this task using the Boruta package in R. The second task is an accurate prediction model. The technical indicator features selected by the Boruta algorithm are given as input to the prediction model.
Algorithm 1
1: Input the 33 technical indicator features F.
2: Create duplicate/shadow copies D of the technical indicator features.
3: Randomly shuffle the original technical indicators F and the duplicate copies D to remove their correlations with the outcome variable.
4: Apply the random forest algorithm to find the important technical indicator features based on higher mean values.
5: Calculate the Z score as mean / standard deviation.
6: Find the maximum Z score among the duplicate technical indicator features.
7: Remove a technical indicator feature if its Z score is less than the maximum Z score of the duplicate features.
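Steps 5 to 7 of Algorithm 1 can be sketched as below. This is a simplified illustration, not the Boruta package used in the study: it assumes that importance values from repeated random forest runs are already available for each original and shadow feature, and it reads step 7 as a comparison against the maximum shadow Z score. The feature names and importance values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def z_score(importances: List[float]) -> float:
    """Step 5: mean importance divided by its standard deviation."""
    spread = stdev(importances)
    if spread == 0:
        raise ValueError("importance values must vary across runs")
    return mean(importances) / spread


def select_features(
    real: Dict[str, List[float]], shadow: Dict[str, List[float]]
) -> List[str]:
    """Steps 6 and 7: keep real features that match or beat the best shadow Z score."""
    shadow_max = max(z_score(values) for values in shadow.values())
    return [name for name, values in real.items() if z_score(values) >= shadow_max]


# Hypothetical importance values from three runs, for illustration only.
real_importances = {"rsi": [0.30, 0.32, 0.34], "sma": [0.02, 0.05, 0.01]}
shadow_importances = {"shadow_rsi": [0.03, 0.01, 0.05], "shadow_sma": [0.02, 0.04, 0.03]}
print(select_features(real_importances, shadow_importances))
```

A feature that cannot outperform the best shuffled copy carries no more signal than noise, which is the rationale for using the shadow maximum as the cut-off.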
suggested that these methods gain the highest accuracy compared to other machine learning algorithms. Kara et al. [9] proposed a framework for stock prediction that used a three-layer artificial neural network. In our proposed work, deep learning in H2O is implemented. Feature selection is performed on the technical indicators using the Boruta algorithm, and the selected technical indicator features are given as input to the deep learning model. The deep learning model is used to classify stock price up and down movements and is described in Fig. 2. It has five layers of interconnected neuron units through which data is transformed. The input layer neurons represent the technical indicator features, denoted by t_i, and W_i denotes the weights of the neurons. Stochastic gradient descent with back-propagation is used to adjust the weights. A bias input is given to each layer of the model except the output layer. The objective function L(W, Bias|j) aims to reduce the classification error in the data.
The weighted sum of the inputs is given in Eq. 1.
$$\alpha = \sum_{i=1}^{n} W_i t_i + \mathrm{Bias} \qquad (1)$$
The Tanh and rectified linear unit activation functions are used. The model supports regularization to avoid overfitting, as shown in Eq. 2.
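The computation of a single neuron, that is, the weighted sum of Eq. 1 followed by a Tanh or rectifier activation, can be sketched as below. This is a minimal illustration and not the H2O implementation; the function names and the sample weights, inputs and bias are hypothetical.

```python
import math
from typing import List


def weighted_input(weights: List[float], inputs: List[float], bias: float) -> float:
    """Eq. 1: alpha = sum_i W_i * t_i + Bias."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * t for w, t in zip(weights, inputs)) + bias


def tanh_activation(alpha: float) -> float:
    """Tanh activation, mapping alpha into the range (-1, 1)."""
    return math.tanh(alpha)


def rectifier(alpha: float) -> float:
    """Rectified linear unit: max(0, alpha)."""
    return max(0.0, alpha)


# Hypothetical weights, technical indicator inputs and bias, for illustration only.
alpha = weighted_input([0.5, -0.25], [2.0, 4.0], 0.1)
print(alpha, tanh_activation(alpha), rectifier(alpha))
```

In a multi-layer model the activated output of one layer becomes the input t_i of the next, so the same two steps are repeated layer by layer.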
6 Conclusion
Stock market prediction is a difficult task for stock fund managers and financial analysts because stock data are unstable, noisy and nonlinear. This paper focused on classifying stock price movements on a daily basis. We conclude that Boruta feature selection is a useful method for identifying relevant technical indicators. The study also demonstrated that the deep learning model performs better than other machine learning techniques. The contributions of this study are twofold. The first is the selection and identification of relevant technical indicators using the Boruta feature selection technique. The second is an accurate prediction model for stocks. The stock data were collected from the National Stock Exchange (NSE), India.
References
1. Anbalagan, T., Maheswari, S.U.: Classification and prediction of stock market index based on fuzzy metagraph. Procedia Comput. Sci. 47, 214–221 (2015)
2. Anish, C.M., Majhi, B.: Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis. J. Korean Stat. Soc. 45(1), 64–76 (2016)
3. Barak, S., Modarres, M.: Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Syst. Appl. 42(3), 1325–1339 (2015)
4. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
5. Cao, L.J., Chua, K.S., Chong, W.K., Lee, H.P., Gu, Q.M.: A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55(1–2), 321–336 (2003)