Predicting Stock Returns With Ensemble Methods

Article history:
Received 8 July 2008
Received in revised form 2 September 2010
Accepted 1 October 2010
Available online 28 October 2010

Keywords: Stock returns; Neural networks; Decision trees; Logistic regression; Classifier ensembles

Abstract

The problem of predicting stock returns has been an important issue for many years. Advances in computer technology have allowed many recent studies to utilize machine learning techniques such as neural networks and decision trees to predict stock returns. In the area of machine learning, classifier ensembles (i.e. combining multiple classifiers) have proven to be superior to single classifiers. In order to build a better model for predicting stock returns effectively and efficiently, this study investigates the prediction performance of the classifier ensemble approach for analyzing stock returns. In particular, the hybrid methods of majority voting and bagging are considered. Moreover, the performance of two types of classifier ensembles is compared with that of single baseline classifiers (i.e. neural networks, decision trees, and logistic regression). The two types of ensembles are 'homogeneous' classifier ensembles (e.g. an ensemble of neural networks) and 'heterogeneous' classifier ensembles (e.g. an ensemble of neural networks, decision trees, and logistic regression). Average prediction accuracy, Type I and II errors, and return on investment of these models are also examined. Our results indicate that multiple classifiers outperform single classifiers in terms of prediction accuracy and return on investment. In addition, heterogeneous classifier ensembles offer slightly better performance than homogeneous ones. There is no significant difference between majority voting and bagging in prediction accuracy, but the former yields a better return on investment than the latter. Finally, the homogeneous multiple classifiers using neural networks combined by majority voting perform best when predicting stock returns.

© 2010 Elsevier B.V. All rights reserved.
1568-4946/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2010.10.001
C.-F. Tsai et al. / Applied Soft Computing 11 (2011) 2452–2459 2453
Table 1
Comparisons of related work.

Study | Input factors | Models | Findings
Kimoto and Asakawa [17] | Technical analysis | NN | The simulation result of stock returns is 98%
White [31] | Technical analysis | Neural networks (NN) | Due to local minimization, the prediction result is unsatisfactory
Gençay [8] | Technical analysis | NN, AR, GARCH-M | Neural networks outperform the other models
Yao et al. [32] | Technical analysis | NN | Neural networks perform better than time series ARIMA
Leung et al. [19] | Economic indicators | Discriminant analysis, Logit & Probit, Probabilistic NN, NN | When the time series factor is considered, the prediction result for stock returns is better than stock price prediction
Sorensen et al. [28] | Financial ratios | CART decision trees | The model performs well for predicting good and bad stocks
Olson and Mossman [23] | Financial ratios | NN | Neural networks perform well in terms of predicting stock price and returns; in particular, the prediction result for the former is better than the latter
Enke and Thawornwong [5] | Technical analysis & economic indicators | NN, Probabilistic NN, Portfolio NN | Portfolio neural networks outperform the other models
Kim et al. [16] | Technical analysis | Classifier ensembles | Classifier ensembles outperform single classifiers
Wang and Chan [29] | Technical analysis | Two-layer bias decision trees | The model performs better than the general decision tree model
Albanis and Batchelor [1] | Financial ratios | Combination of linear discriminant analysis, learning vector quantization, and probabilistic NN | The heterogeneous multiple classifiers outperform the single classifiers
Hassan et al. [12] | Technical analysis | Combination of HMM, ANN, and GA | The combined model performs better than single classifiers
Roh [27] | Economic indicators | NN, NN-EWMA, NN-GARCH, NN-EGARCH | NN-EGARCH performs the best, and the time series factor can improve the prediction performance
by these individual classifiers to generate the final output as the classification result.
In the literature, there are a number of combination methods, e.g. majority vote, Borda count, Bayesian, bagging, boosting, etc. According to Kim et al. [15], there is no significant difference between those combination methods. On the other hand, West et al. [30] compare several combination methods and show that the bagging method performs better than the others in financial crisis prediction. Therefore, we utilize the majority vote and bagging methods for comparison in this paper.

• Majority voting. The simplest method for combining classifiers is majority voting. The outputs of a number of individual classifiers are pooled together, and the class that receives the largest number of votes is selected as the final classification decision [18].
• Bagging. In bagging, several classifiers are trained independently on different training sets obtained via the bootstrap method [3]. Bootstrapping builds k replicate training datasets by randomly re-sampling the original training dataset with replacement, and these replicates are used to construct k independent classifiers. That is, each training example may appear repeatedly, or not at all, in any particular replicate training dataset. The k classifiers are then aggregated via an appropriate combination method, such as majority voting.

2.3. Related work

Table 1 shows a comparison of related studies in terms of the models and factors they used, as well as their results.
In regards to Table 1, neural networks are the most widely used model for stock price and returns prediction. In addition, neural networks perform better than traditional statistical methods. There are two similarly related studies referenced in this paper: Kim et al. [16] and Hassan et al. [12].
In Kim et al. [16], the combination methods used focus exclusively on combining multiple neural networks via a number of different combination methods, such as majority vote, Borda count, expert behavior-knowledge space, etc. Thus, that research does not consider combining multiple heterogeneous classifiers for comparison. However, Hassan et al. [12] and Albanis and Batchelor [1] combine heterogeneous multiple classifiers and show that their heterogeneous classifier ensembles outperform single classifiers.
Therefore, it would be very useful to compare heterogeneous classifier ensembles with homogeneous neural network classifier ensembles for predicting stock returns in order to identify which type of classifier ensemble performs better. In addition, prior studies have shown the superiority of classifier ensembles over single classifiers. Thus, some of the better classifier ensembles can be used as another baseline, in addition to single classifiers, for future studies. That is, future work proposing novel classification techniques can be compared with different baselines in order to reach a more reliable conclusion.

3. Research methodology

3.1. Dataset

In our experiments, we use the Taiwan Economic Journal (TEJ) dataset. We focus on the quarterly rate of return in the electronics industry because its transactions account for over 70% of the Taiwan stock market.
Our data range from the second quarter of 2002 to the third quarter of 2006. Note that we exclude the data from 2000 to the first quarter of 2001 due to the 9/11 and SARS events. As a result,
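The majority voting and bagging schemes described in Section 2.2 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation (the paper's experiments use Weka); both helper names are hypothetical:

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Combine class labels from several classifiers; the class with the
    most votes wins (ties go to the label seen first, one common convention)."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_replicates(dataset, k, seed=0):
    """Build k replicate training sets by sampling the original dataset
    uniformly at random *with* replacement, so any example may appear
    several times in one replicate or not at all."""
    rng = random.Random(seed)
    n = len(dataset)
    return [[rng.choice(dataset) for _ in range(n)] for _ in range(k)]

# Toy usage: three classifiers vote on one test instance.
print(majority_vote(["positive", "negative", "positive"]))  # positive
```

Each replicate would then train one classifier, and the k resulting classifiers would be aggregated with `majority_vote`.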
Table 2
The financial ratios and economic indicators.

Financial ratios
  Capital structure: Debt ratio; Long-term capital
  Amortization capability: Current ratio; Quick ratio; Interest cover
  Business operation capability: Total asset turnover ratio; Fixed asset turnover ratio; Inventory turnover ratio

Fig. 2. The single classifier construction procedure. [Diagram: five-fold cross validation with four training sets and one testing set, applied to the MLP, CART, and LR classifiers.]
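The five-fold cross validation shown in Fig. 2 holds out each fold once as the testing set while the other four folds form the training set. A minimal sketch (a hypothetical helper, not from the paper):

```python
def five_fold_splits(n_samples):
    """Yield (train_indices, test_indices) for 5-fold cross validation:
    each fold in turn is the testing set and the remaining four folds
    together are the training set, as in Fig. 2."""
    folds = [list(range(i, n_samples, 5)) for i in range(5)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx

splits = list(five_fold_splits(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 80 20
```

Averaging a model's accuracy over the five held-out folds gives the cross-validation accuracy used to rank candidate models.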
ROI = ((1 + ROI of day 1 %) × (1 + ROI of day 2 %) × . . . × (1 + ROI of the final day %) − 1) × 100    (2)

3.2. Variables

In prior related work, there is no standard for selecting related variables (c.f. Table 1). That is, different studies use different variables to predict stock returns. In this paper, we use financial ratios and economic indicators. We do not consider technical indicators because they are usually used for short-period stock price prediction; as we are focusing on the quarterly rate of return, technical indicators are not suitable.
Table 2 presents the financial ratios and economic indicators used as the input variables in this paper.3

3 In this paper, the Weka software is used to construct these baseline models.

3.3.2. Homogeneous and heterogeneous classifier ensembles

To design heterogeneous classifier ensembles, the three types of single classifiers described above are combined. That is, we use the 'best' MLP, CART, and LR models according to their respective 5-fold cross validation results.
To design homogeneous classifier ensembles, we first select the 'best' model, i.e. the one that provides the highest rate of accuracy among the three single classifiers. This can be done after the prediction results of the single classifiers come out. Then, according to the 5-fold cross-validation results of the 'best' model (e.g. 'A'), we select the three 'A' models that provide the top three highest rates of accuracy.
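The homogeneous design just described amounts to ranking candidate models by cross-validation accuracy and keeping the top three. A minimal sketch with hypothetical accuracy figures (the model names and numbers below are illustrative, not from the paper):

```python
def top_k_models(cv_results, k=3):
    """Given {model_name: cross-validation accuracy}, return the k names
    with the highest accuracy, best first."""
    return sorted(cv_results, key=cv_results.get, reverse=True)[:k]

# Hypothetical 5-fold accuracies for five candidate MLP configurations.
cv_acc = {"mlp_a": 0.66, "mlp_b": 0.61, "mlp_c": 0.64, "mlp_d": 0.59, "mlp_e": 0.63}
print(top_k_models(cv_acc))  # ['mlp_a', 'mlp_c', 'mlp_e']
```

The three selected models would then be combined by majority voting to form the homogeneous ensemble.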
Table 7
Comparisons of MLP, DT, and LR.

Model | Positive returns (%) | Negative returns (%) | Average accuracy (%)
MLP | 89.03 | 34.12 | 61.58
CART | 45.95 | 74.78 | 60.37
LR | 33.42 | 90.80 | 62.11

Fig. 3. Classifier ensembles by majority voting. [Diagram: the testing set feeds Classifiers 1–3; their outputs 1, 2, 3 are combined by majority voting into the final output.]
Table 3
Confusion matrix.

Actual | Predicted negative | Predicted positive
Negative | a (correct) | b (incorrect)
Positive | c (incorrect) | d (correct)

a is the number of correct predictions that an instance is negative; b is the number of incorrect predictions that an instance is positive; c is the number of incorrect predictions that an instance is negative; d is the number of correct predictions that an instance is positive.

… construct independent classifiers, we consider 3, 5, and 7 replicate training datasets to construct 3, 5, and 7 heterogeneous and homogeneous classifier ensembles. We design the 5 and 7 heterogeneous classifier ensembles according to the prediction results of the three types of single classifiers (represented by A–C in performance order as an example). The former is based on the combination of two of the top 'A' models, two of the top 'B' models, and one 'C' model, and the latter combines three 'A', two 'B', and two 'C' models (c.f. Table 14).
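The 3-, 5-, and 7-member compositions just described (one model of each type for three members, a 2-2-1 split for five, and a 3-2-2 split for seven, with A–C ranked by single-classifier performance) can be sketched as follows; this is an illustrative helper, not the paper's code:

```python
def heterogeneous_composition(k, ranked=("A", "B", "C")):
    """Number of models taken from each single-classifier type (ranked
    best to worst) for a k-member heterogeneous ensemble, following the
    1-1-1 / 2-2-1 / 3-2-2 scheme described in the text (c.f. Table 14)."""
    if k == 3:
        counts = (1, 1, 1)
    elif k == 5:
        counts = (2, 2, 1)
    elif k == 7:
        counts = (3, 2, 2)
    else:
        raise ValueError("the paper only considers k in {3, 5, 7}")
    return dict(zip(ranked, counts))

print(heterogeneous_composition(5))  # {'A': 2, 'B': 2, 'C': 1}
```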
3.4. Evaluation methods

In order to reliably evaluate the single classifiers and classifier ensembles, we consider not only prediction accuracy but also return on investment.
Prediction accuracy is generally based on a confusion matrix, shown in Table 3. Therefore, the rate of prediction accuracy can be defined in terms of the cells of Table 3 as

accuracy = (a + d) / (a + b + c + d)

Table 4
The MLP results (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy
0 | 89.03% | 10.97% | 61.58%
1 | 65.88% | 34.12% |

Table 5
The CART results (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy

Table 8
The top three MLP models.

Rank | Positive returns (%) | Negative returns (%) | Average accuracy (%) | No. of hidden nodes | Training epochs
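Using the cell definitions of Table 3, the evaluation measures can be computed as below. This is a sketch; the per-class mean is how the 'average accuracy' columns in the result tables appear to be derived (e.g. the two class rates of each confusion matrix average to the reported figure):

```python
def accuracies(a, b, c, d):
    """Confusion-matrix cells as in Table 3: a = true negatives,
    b = false positives, c = false negatives, d = true positives."""
    negative_acc = a / (a + b)            # accuracy on actual negatives
    positive_acc = d / (c + d)            # accuracy on actual positives
    overall = (a + d) / (a + b + c + d)   # plain classification accuracy
    average = (negative_acc + positive_acc) / 2
    return negative_acc, positive_acc, overall, average

print(accuracies(3, 1, 1, 3)[3])  # 0.75
```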
4.2. Classifier ensembles

4.2.1. Majority voting

Since prior studies such as [8,19,23] have shown that MLP outperforms many related classifiers, and since the prediction performances of MLP and LR are similar (as discussed in Section 4.1), the homogeneous classifier ensembles are constructed from three MLP classifiers. These classifiers provide the top three highest prediction performances over the 210 MLP models (c.f. Section 3.3.1). Tables 8 and 9 show these three MLP classifiers and the combination result by majority voting.

Table 9
The performance of homogeneous MLP classifier ensembles by majority voting (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy
0 | 72.85% | 27.15% | 65.8%
1 | 41.25% | 58.75% |

The result shows that the MLP classifier ensembles provide better performance than the single best MLP classifier in terms of average accuracy and predicting negative returns.
The heterogeneous classifier ensembles are composed of the best MLP, CART, and LR models. Table 10 shows the combination results.

Table 10
The performance of heterogeneous classifier ensembles by majority voting (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy
0 | 62.92% | 37.08% | 66.63%
1 | 29.67% | 70.33% |

Table 10 indicates that the heterogeneous classifier ensembles perform slightly better than the homogeneous MLP classifier ensembles in terms of average accuracy. In addition, the heterogeneous classifier ensembles demonstrate a better performance for predicting negative returns compared to the homogeneous classifiers by majority voting.

4.2.2. Bagging

Using the bagging method to bootstrap three training sets for constructing three multiple classifiers, Tables 11 and 12 show the prediction results of the homogeneous MLP and heterogeneous classifier ensembles, respectively.

Table 11
The performance of homogeneous MLP classifier ensembles by bagging (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy
0 | 68.67% | 31.33% | 64.16%
1 | 40.36% | 59.64% |

Table 12
The performance of heterogeneous classifier ensembles by bagging (p < 0.05).

Actual\Predicted | 0 | 1 | Average accuracy
0 | 62.40% | 37.60% | 65.18%
1 | 32.05% | 67.95% |

These results show that when the number of classifier ensembles is three, constructing classifier ensembles by bagging does not provide better prediction performance than the majority voting approach.
Tables 13 and 14 show the prediction performances of the 3, 5, and 7 homogeneous MLP and heterogeneous classifier ensembles, respectively.

Table 13
The performance of 3, 5, and 7 homogeneous MLP classifier ensembles.

No. of training sets | Positive returns (%) | Negative returns (%) | Average accuracy (%)
3 | 68.67 | 59.64 | 64.16
5 | 69.71 | 54.30 | 62.01
7 | 65.54 | 59.35 | 62.45

Using the bagging method, the heterogeneous classifier ensembles perform better than the homogeneous ones in terms of average accuracy and prediction of negative returns.
Fig. 4 compares the single best MLP, the best homogeneous MLP ensembles, and the best heterogeneous classifier ensembles by voting and bagging in terms of their average accuracy and positive and negative returns prediction results.
Regarding Fig. 4, the prediction accuracy of classifier ensembles is better than that of the single best MLP model. In particular, classifier ensembles largely reduce the differences in predicting positive and negative returns.
When comparing homogeneous and heterogeneous classifier ensembles, the heterogeneous ensembles outperform the homogeneous ones. On the other hand, the combination method using majority voting performs better for homogeneous classifier ensembles, while bagging provides slightly better performance for heterogeneous classifier ensembles.
Table 14
The performance of 3, 5, and 7 heterogeneous classifier ensembles.
No. of training sets | Positive returns (%) | Negative returns (%) | Average accuracy (%) | Combining strategy
3 62.40 67.95 65.18 NN × 1, LR × 1, DT × 1
5 59.27 75.07 67.17 NN × 2, LR × 2, DT × 1
7 56.92 77.15 67.04 NN × 3, LR × 2, DT × 2
Table 15
Return on investment of the prediction models.

[Fig. 4: comparison of MLP, Voting (homo), Voting (hetero), Bagging (homo), and Bagging (hetero); y-axis 0–100%]

Fig. 5. Comparisons of single classifiers and classifier ensembles in terms of ROI.
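The compounded ROI of Eq. (2) simply chains the per-period rates of return; a minimal sketch (illustrative, not the authors' code):

```python
from math import prod

def compound_roi(period_returns_pct):
    """Compounded return on investment as in Eq. (2): chain the
    per-period percentage returns and convert back to a percentage."""
    return (prod(1 + r / 100 for r in period_returns_pct) - 1) * 100

print(round(compound_roi([10, -5, 20]), 2))  # 25.4
```

Note that the compounded figure differs from the simple sum of the period returns (25 in this toy example) because gains and losses multiply rather than add.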
Table 16
Prediction accuracy and return on investment of the prediction models.
Prediction accuracy (%) Investment performance Rank of prediction accuracy Rank of return on investment
4.3. Return on investment

In addition to using the testing set to examine the prediction performance of these models in the previous sections, we also assess the return on investment (ROI) performance of these models. Table 15 shows the return on investment of these models, including the buy and hold strategy. In addition, Fig. 5 compares these techniques in terms of ROI.
All of the prediction models perform better than the buy and hold strategy. The best model is based on the homogeneous MLP ensembles by majority voting.
Table 16 compares these models by examining their return prediction performances and return on investment. The comparative result shows that the rankings of prediction accuracy and return on investment of these models are not significantly different.

5. Conclusion

Classifier ensembles have been examined in many pattern recognition problems and have shown better performance than single classifiers. This study examined the performance of classifier ensembles in predicting stock returns. In particular, homogeneous and heterogeneous classifier ensembles are compared, which has never been examined in related studies (c.f. Section 2.3).
In summary, classifier ensembles perform better than single classifiers do. This finding is in line with the findings of previous studies, such as [7,9–11]. However, much related work on classifier ensembles is limited to exclusively constructing homogeneous ensembles. Even in this paper, there is no significant difference between homogeneous and heterogeneous classifier ensembles by majority voting and bagging. However, it is notable that the homogeneous classifier ensembles by majority voting are particularly good at predicting positive returns, while their performance in predicting negative returns is better than that of the single best MLP model. In addition, the homogeneous classifier ensembles by majority voting can obtain a higher positive return on investment and a lower negative return on investment than the other models.
Please note that although this paper considers three popular classification techniques, there are other methods available in the field, such as k-nearest neighbor, fuzzy logic, and support vector machines. From the practical perspective, however, it is difficult to conduct a comprehensive study encompassing all existing techniques. In addition, it may be hard to define the most representative one in the domain of stock return prediction because of the embedded complexity. It is the authors' understanding that
there is no comparative study that includes all these aforementioned methods. Thus, a future research topic may be to investigate and compare the results using other techniques. In addition to taking other techniques into consideration for comparative purposes, it is critical to assess the prediction performance of classifier ensembles trained on continuous output variables rather than discrete ones (i.e. '0' and '1' as the positive and negative stock returns, respectively). In the prior literature (c.f. Table 1), much related work only used datasets largely composed of discrete output variables to conduct the experiments.
Other future work could consider the following issues:

• using technical indicators as input variables;
• the "on" and "off" seasons of the electronic industry, which are the first and second quarters and the third and fourth quarters, respectively;
• involving non-economic factors to improve prediction accuracy;
• other stock datasets from different countries.

References

[1] G. Albanis, R. Batchelor, Combining heterogeneous classifiers for stock selection, Intelligent Systems in Accounting, Finance and Management 15 (1–2) (2007) 1–21.
[2] L. Bernstein, J. Wild, Analysis of Financial Statements, McGraw-Hill, 2000.
[3] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[4] R.G. Donaldson, M. Kamstra, Forecast combining with neural networks, Journal of Forecasting 15 (1) (1996) 49–61.
[5] D. Enke, S. Thawornwong, The use of data mining and neural networks for forecasting stock market returns, Expert Systems with Applications 29 (4) (2005) 927–940.
[6] D. Frosyniotis, A. Stafylopatis, A. Likas, A divide-and-conquer method for multi-net classifiers, Pattern Analysis and Applications 6 (1) (2003) 32–40.
[7] N. García-Pedrajas, Constructing ensembles of classifiers by means of weighted instance selection, IEEE Transactions on Neural Networks 20 (2) (2009) 258–277.
[8] R. Gençay, Non-linear prediction of security returns with moving average rules, Journal of Forecasting 15 (1996) 165–174.
[9] W.Y. Goh, C.P. Lim, K.K. Peh, Predicting drug dissolution profiles with an ensemble of boosted neural networks: a time series approach, IEEE Transactions on Neural Networks 14 (2) (2003) 459–463.
[10] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (10) (1990) 993–1001.
[11] S. Hashem, Optimal linear combinations of neural networks, Neural Networks 10 (4) (1997) 599–614.
[12] Md.R. Hassan, B. Nath, M. Kirley, A fusion model of HMM, ANN and GA for stock market forecasting, Expert Systems with Applications 33 (2007) 171–180.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.
[14] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1) (1994) 66–75.
[15] E. Kim, W. Kim, Y. Lee, Combination of multiple classifiers for the customer's purchase behavior prediction, Decision Support Systems 34 (2) (2003) 167–175.
[16] M.J. Kim, S.H. Min, I. Han, An evolutionary approach to the combination of multiple classifiers to predict a stock price index, Expert Systems with Applications 31 (2) (2006) 241–247.
[17] T. Kimoto, K. Asakawa, Stock market prediction system with modular networks, IEEE International Joint Conference on Neural Networks 1 (1990) 1–6.
[18] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998) 226–239.
[19] M.T. Leung, H. Daouk, A.-S. Chen, Forecasting stock indices: a comparison of classification and level estimation models, International Journal of Forecasting 16 (2) (2000) 173–190.
[20] T.C. Mills, Non-linear time series models in economics, Journal of Economic Surveys 5 (1990) 215–241.
[21] T. Mitchell, Machine Learning, McGraw Hill, 1997.
[22] J.J. Murphy, Technical Analysis of the Financial Markets, New York Institute of Finance, New York, 1999.
[23] D. Olson, C. Mossman, Neural network forecasts of Canadian stock returns using accounting ratios, International Journal of Forecasting 19 (3) (2003) 453–465.
[24] M.B. Priestley, Non-linear and Non-stationary Time Series Analysis, Academic Press, 1988.
[25] A.N. Refenes, A. Zapranis, G. Francis, Stock performance modeling using neural networks: a comparative study with regression models, Neural Networks 7 (1994) 375–388.
[26] R.M. Rogers, Handbook of Key Economic Indicators, McGraw-Hill, 1998.
[27] T.H. Roh, Forecasting the volatility of stock price index, Expert Systems with Applications 33 (4) (2007) 916–922.
[28] E.H. Sorensen, K.L. Miller, C.K. Ooi, The decision tree approach to stock selection, Journal of Portfolio Management 27 (1) (2000) 42–52.
[29] J.-L. Wang, S.-H. Chan, Stock market trading rule discovery using two-layer bias decision tree, Expert Systems with Applications 30 (4) (2006) 605–611.
[30] D. West, S. Dellana, J. Qian, Neural network ensemble strategies for financial decision applications, Computers and Operations Research 32 (10) (2005) 2543–2559.
[31] H. White, Economic prediction using neural networks: the case of IBM daily stock returns, IEEE International Conference on Neural Networks 2 (1988) 451–458.
[32] J. Yao, C.L. Tan, H.-L. Pok, Neural networks for technical analysis: a study on KLCI, International Journal of Theoretical and Applied Finance 2 (2) (1999) 221–241.
[33] Y. Yoon, G. Swales Jr., T.M. Margavio, A comparison of discriminant analysis versus artificial neural networks, Journal of the Operational Research Society 44 (1993) 51–60.