
Applied Soft Computing 11 (2011) 2452–2459. doi:10.1016/j.asoc.2010.10.001


Predicting stock returns by classifier ensembles


Chih-Fong Tsai a, Yuah-Chiao Lin c, David C. Yen b,*, Yan-Min Chen c

a Department of Information Management, National Central University, Jhongli, Taiwan, ROC
b Department of Decision Sciences and Management Information Systems, Farmer School of Business, Miami University, Oxford, OH 45056, USA
c Department of Accounting and Information Technology, National Chung Cheng University, 168 University Rd., Min-Hsiung, Chia-Yi, Taiwan

* Corresponding author. E-mail addresses: cftsai@mgt.ncu.edu.tw (C.-F. Tsai), actycl@ccu.edu.tw (Y.-C. Lin), yendc@muohio.edu (D.C. Yen), wenjer@gmail.com (Y.-M. Chen).

Article history: Received 8 July 2008; received in revised form 2 September 2010; accepted 1 October 2010; available online 28 October 2010.

Keywords: Stock returns; Neural networks; Decision trees; Logistic regression; Classifier ensembles

Abstract

The problem of predicting stock returns has been an important issue for many years. Advances in computer technology have allowed many recent studies to utilize machine learning techniques, such as neural networks and decision trees, to predict stock returns. In the area of machine learning, classifier ensembles (i.e. combinations of multiple classifiers) have proven to be superior to single classifiers. In order to build a better model for predicting stock returns effectively and efficiently, this study investigates the prediction performance of the classifier ensemble approach for analyzing stock returns. In particular, the combination methods of majority voting and bagging are considered. Moreover, the performance of two types of classifier ensembles is compared with that of single baseline classifiers (i.e. neural networks, decision trees, and logistic regression). These two types are 'homogeneous' classifier ensembles (e.g. an ensemble of neural networks) and 'heterogeneous' classifier ensembles (e.g. an ensemble of neural networks, decision trees, and logistic regression). Average prediction accuracy, Type I and II errors, and return on investment of these models are also examined. Our results indicate that multiple classifiers outperform single classifiers in terms of prediction accuracy and returns on investment. In addition, heterogeneous classifier ensembles offer slightly better performance than the homogeneous ones. However, there is no significant difference between majority voting and bagging in terms of prediction accuracy, although the former yields a better return on investment than the latter. Finally, the homogeneous multiple classifiers using neural networks combined by majority voting perform best when predicting stock returns.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

One of the most important indicative factors of the stock market for investors is the accurate prediction of stock returns. Any computerized system with the capability to accurately predict stock returns is very useful and helpful to investors.

According to Refs. [20,24], previous studies predicting stock returns have attempted to capture the relationship between the available information (such as time-series data on financial and economic variables) and the stock returns using straightforward linear regression assumptions. However, there is currently no evidence to support the assumption that the relationship between the stock returns and the amount of available information is a perfectly linear one. As a result, non-linear models can be employed to explain the residual variance of the actual stock return from the prediction of the regression equation, which results in better prediction performance.

Many recent studies in this area focus on the use of machine learning techniques, such as neural networks and decision trees, to build a prediction model for stock returns, e.g. [5,15,29]. Among these studies, the neural network technique has received the most attention. Related works have shown that neural networks outperform many other statistics-based techniques, such as regression and discriminant analysis [4,25,33].

In the area of machine learning and pattern recognition, the combination of multiple classifiers (i.e. classifier ensembles) has been shown to perform better than single classifiers [6,14]. However, this technique has not been widely examined for predicting stock returns in financial markets.

Specifically, there are a number of combination strategies used to design classifier ensembles, such as majority voting and bagging. There is no definite answer, however, to the question of whether or not the combination method is the better approach.

Consequently, this paper focuses on classifier ensembles for stock returns prediction. In order to examine the applicability of classifier ensembles, 'homogeneous' and 'heterogeneous' classifier ensembles were constructed, similar to the studies done by Kim et al. [16] and Hassan et al. [12]. The study of Kim et al. is based only on the ensemble of neural networks, while Hassan's study is based on the ensemble of neural networks, decision trees, and logistic regression. For model comparison purposes, average prediction accuracy, Type I and II errors, and return on investment are considered in this study.


This paper is organized in the following manner: Section 2 provides a literature review covering the factors affecting stock returns, machine learning techniques, and related work on predicting stock returns; Section 3 presents the research methodology, which contains the experimental setup of this paper, including the dataset used, the input variables, the prediction models, and the evaluation methods; Section 4 shows the experimental results; Section 5 provides the conclusion and possible future work.

2. Literature review

2.1. Modeling stock returns

2.1.1. Financial ratios
Financial ratios (fundamental analysis) can be used to compare a firm's performance and financial situation over a period of time by carefully analyzing the financial statements and assessing the health of the business. Using ratio analysis, trends and indications of good and bad business practices can be easily identified. To this end, fundamental analysis is performed on both historical and present data in order to perform a company stock valuation and hence predict its probable price evolution. Financial ratios covering profitability, liquidity, coverage, and leverage can be calculated from the financial statements [2].

2.1.2. Economic indicators
Economic indicators are statistical data used to show general trends in the economy. Therefore, indicators can help analyze economic performance and make predictions about future performance. There are three general types of indicators: coincident indicators, leading indicators, and lagging indicators [26].

Coincident indicators (e.g. payroll) are obtained at the same time as the related economic activity occurs. A coincident index can be used to identify the dates of peaks and troughs in the business cycle.

Leading indicators (e.g. stock prices and interest rates) have predictive value because they tend to change before the general economic activity occurs. As a result, they are regarded as short-term predictors of the economy. For example, the stock market usually begins to decline before the economy as a whole declines, and begins to improve before the general economy starts to recover from a slump.

Lagging indicators (e.g. the unemployment rate) are the only indicators which become apparent after the general economic activity occurs. For example, the unemployment rate usually decreases two or three quarters after an upturn in the general economy.

2.1.3. Technical analysis
Technical analysis considers past financial market data, represented by indicators such as the Relative Strength Indicator (RSI) and field-specific charts, to be useful in forecasting price trends and making market investment decisions. In particular, technical analysis evaluates the performance of securities by analyzing statistics generated from various market activities, such as past prices and trading volumes. Furthermore, the trends and patterns of an investment instrument's price, volume, breadth, and trading activities can be used to reflect most of the relevant market information and so determine its value.

Technical analysis is thus different from fundamental analysis. That is, technical analysis does not focus on the 'intrinsic value' of a stock, but on extrapolations from historical price patterns [22].

2.2. Prediction of stock returns

Predicting stock returns is essentially a pattern classification problem. In other words, a data sample represented by a number of factors, such as financial ratios (c.f. Section 2.1), is allocated to one of a finite set of classes, such as positive and negative stock returns [21]. These factors are fed into a classifier, which then performs the prediction task (i.e. assigns positive or negative stock returns). This leads to the problem of constructing an effective classifier which can be used to better predict stock returns. In prior studies, e.g. [5,23,27], single classifiers were widely used to resolve this problem. In addition, classifier ensembles have continuously been shown to outperform single classifiers in many domain problems [14,18].

2.2.1. Single classifiers
To construct a single classifier, a certain number of training samples, each composed of a feature vector (i.e. the related factors) and its associated class label, are available for each class in order to train the classifier [13]. In this case, the class label associated with each training sample indicates either positive or negative stock returns.

In other words, the learning/training task is to compute a classifier or model that approximates the mapping between the input–output examples and correctly labels the training samples with some level of accuracy. After the classifier is generated or trained, it is able to classify an unknown instance into one of the class labels learned from the training samples as the prediction task. More specifically, the classifier calculates the similarity of the unlabeled instance to all trained classes and assigns the instance to the class with the highest similarity measure.

2.2.2. Classifier ensembles
Classifier ensembles, a classification approach based on combining multiple single classifiers, aim at obtaining highly accurate classifiers by combining less accurate ones. The ensembles are intended to improve the classification performance of a single classifier [14,18]. That is, the combination is able to complement the errors made by the individual classifiers on different parts of the input space. Therefore, the performance of classifier ensembles is likely to be better than that of the best single classifier used in isolation. Fig. 1 shows an example of classifier ensembles.

[Fig. 1. An example of classifier ensembles [15].]

To construct classifier ensembles, the chosen training set is used to train a number of classification techniques. Then, the testing set is used to test these classifiers individually, so that each classifier produces an output for each testing sample. Finally, a combining module or combination method processes the outputs produced by these individual classifiers to generate the final output as the classification result.

Table 1
Comparisons of related work.

Work | Factors | Models | Results
Kimoto and Asakawa [17] | Technical analysis | NN | The simulation result of stock returns is 98%
White [31] | Technical analysis | Neural networks (NN) | Due to local minimization, the prediction result is unsatisfactory
Gençay [8] | Technical analysis | NN, AR, GARCH-M | Neural networks outperform the other models
Yao et al. [32] | Technical analysis | NN | Neural networks perform better than time series ARIMA
Leung et al. [19] | Economic indicators | Discriminant analysis, Logit & Probit, Probabilistic NN, NN | When the time series factor is considered, the prediction result for stock returns is better than stock price prediction
Sorensen et al. [28] | Financial ratios | CART decision trees | The model performs well for predicting good and bad stocks
Olson and Mossman [23] | Financial ratios | NN | Neural networks perform well in terms of predicting stock price and returns; in particular, the prediction result for the former is better than the latter
Enke and Thawornwong [5] | Technical analysis & economic indicators | NN, Probabilistic NN, Portfolio NN | Portfolio neural networks outperform the other models
Kim et al. [16] | Technical analysis | Classifier ensembles | Classifier ensembles outperform single classifiers
Wang and Chan [29] | Technical analysis | Two-layer bias decision trees | The model performs better than the general decision trees model
Albanis and Batchelor [1] | Financial ratios | Combination of linear discriminant analysis, learning vector quantization, and Probabilistic NN | The heterogeneous multiple classifiers outperform the single classifiers
Hassan et al. [12] | Technical analysis | Combination of HMM, ANN, and GA | The combined model performs better than single classifiers
Roh [27] | Economic indicators | NN, NN-EWMA, NN-GARCH, NN-EGARCH | NN-EGARCH performs the best, and the time series factor can improve the prediction performance

In the literature, there are a number of combination methods, e.g. majority vote, Borda count, Bayesian combination, bagging, boosting, etc. According to Kim et al. [15], there is no significant difference between these combination methods. On the other hand, West et al. [30] compare several combination methods and their results show that the bagging method provides better performance than the others in financial crisis prediction. Therefore, we utilize the majority vote and bagging methods for comparison in this paper.

• Majority voting. The simplest method for combining classifiers is majority voting. The outputs of a certain number of individual classifiers are pooled together. Then, the class which receives the largest number of votes is selected as the final classification decision [18].
• Bagging. In bagging, several classifiers are trained independently on different training sets via the bootstrap method [3]. Bootstrapping builds k replicate training datasets, used to construct k independent classifiers, by randomly re-sampling the original training dataset with replacement. That is, each training example may appear repeatedly, or not at all, in any particular replicated training dataset. Then, the k classifiers are aggregated via an appropriate combination method, such as majority voting (both strategies are sketched in code below).
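As an illustration only (not code from the paper), the following sketch shows both strategies for binary 0/1 class labels. The helper names are ours, and model_factory stands for any scikit-learn-style learner exposing fit and predict.

```python
import numpy as np

def majority_vote(predictions):
    """Pool 0/1 labels from several classifiers; the label with the most votes wins.
    Ties (possible with an even number of classifiers) default to class 0."""
    votes = np.asarray(predictions)          # shape: (n_classifiers, n_samples)
    return (votes.sum(axis=0) > votes.shape[0] / 2).astype(int)

def bagging(model_factory, X, y, k, seed=0):
    """Train k classifiers on k bootstrap replicates (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)     # some rows repeat, others are left out
        models.append(model_factory().fit(X[idx], y[idx]))
    return models
```

The bagged models can then be applied to the testing set and their outputs aggregated with the same majority_vote helper, mirroring the description above.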

2.3. Related work

Table 1 shows a comparison of related studies in terms of the models and factors they used, as well as their results.

In regards to Table 1, neural networks are the most widely used model for stock price and returns prediction. In addition, neural networks perform better than traditional statistical methods. There are two closely related studies referenced in this paper: Kim et al. [16] and Hassan et al. [12].

In Kim et al. [16], the combination methods used focus exclusively on combining multiple neural networks via a number of different combination methods, such as majority vote, Borda count, expert behavior-knowledge space, etc. Thus, that research does not consider combining multiple heterogeneous classifiers for comparison. However, Hassan et al. [12] and Albanis and Batchelor [1] combine heterogeneous multiple classifiers and show only that their heterogeneous classifier ensembles outperform single classifiers.

Therefore, it would be very useful to compare heterogeneous classifier ensembles with homogeneous neural network classifier ensembles for predicting stock returns, in order to identify which type of classifier ensemble performs better. In addition, prior studies have shown the superiority of classifier ensembles over single classifiers. Thus, some of the better classifier ensembles can be used as another baseline, in addition to single classifiers, for future studies. That is, future work proposing novel classification techniques can be compared with different baselines in order to reach a more reliable conclusion.

3. Research methodology

3.1. Dataset

In our experiments, we use the Taiwan Economic Journal (TEJ) dataset. We focus on the quarterly rate of return of the electronic industry because it accounts for over 70% of the transactions in the Taiwan stock market.

Our data range from the second quarter of 2002 to the third quarter of 2006. Note that we exclude the data from 2000 to the first quarter of 2001 due to the 9/11 and SARS events.

As a result, the training set contains 2511 examples (1052 positive and 1459 negative returns) ranging from the second quarter of 2002 to the third quarter of 2005 (i.e. 14 quarters), and the testing set contains 720 examples ranging from the fourth quarter of 2005 to the third quarter of 2006 (i.e. 4 quarters).

To obtain the quarterly rate of return, or the return on investment (ROI), we use the following formulas:

ROI (%) = [(P_t × (1 + α + β) + D) / (P_{t-1} + α × C) − 1] × 100    (1)

where P_t is the t-th period closing price (index); α is the subscription rate of ex-rights in the period; β is the stock grant rate of ex-rights in the period; C is the subscription price of ex-rights in the period; and D is the cash dividends in the period.

Therefore, the return in the nth quarter (R_n), for example, can be obtained by:

R_n (%) = [(1 + ROI of the 1st day) × (1 + ROI of the 2nd day) × ... × (1 + ROI of the final day) − 1] × 100    (2)
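As a small worked illustration (not from the paper), Eqs. (1) and (2) can be coded directly. The argument names below are ours, and the daily ROIs are assumed to be passed as fractions (e.g. 0.01 for 1%) so that they can be compounded.

```python
def roi_percent(p_t, p_prev, alpha, beta, c, d):
    """Eq. (1): one period's ROI (%), adjusted for ex-rights and dividends.
    p_t, p_prev: closing prices of periods t and t-1; alpha: subscription rate;
    beta: stock grant rate; c: subscription price; d: cash dividends."""
    return ((p_t * (1 + alpha + beta) + d) / (p_prev + alpha * c) - 1) * 100

def quarterly_return_percent(daily_rois):
    """Eq. (2): compound a quarter's daily ROIs (as fractions) into R_n (%)."""
    compounded = 1.0
    for r in daily_rois:
        compounded *= 1 + r
    return (compounded - 1) * 100
```

For example, quarterly_return_percent([0.01, -0.005, 0.02]) compounds daily returns of 1%, -0.5%, and 2% into roughly 2.5%.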
3.2. Variables

In prior related work, there is no standard for selecting the related variables (c.f. Table 1); that is, different studies use different variables to predict stock returns. In this paper, we use financial ratios and economic indicators. We do not consider technical indicators because they are usually used for short-term stock price prediction; as we are focusing on the quarterly rate of return, technical indicators are not suitable.

Table 2 presents the financial ratios and economic indicators used as the input variables in this paper.

Table 2
The financial ratios and economic indicators.

Financial ratios
  Capital structure: Debt ratio; Long-term capital
  Amortization capability: Current ratio; Quick ratio; Interest cover
  Business operation capability: Total asset turnover ratio; Fixed asset turnover ratio; Inventory turnover ratio; Accounts receivable turnover ratio
  Profitability: Return on assets; Margin before interest and tax; Net assets per stock; Return on stockholder's equity
  Cash flows: Cash flow ratio
  Others: Constant net assets growth ratio; Net assets growth ratio after tax; Frequent interest growth ratio after tax; Return on total assets growth ratio; Return ratio of the last quarter

Economic indicators
  Deposit interest rate; Currency transferring rate (US dollars to Taiwan dollars); Discount rate; Money supply; Consumer price index; Wholesale price index; Unemployment rate; Bond trading amount; Total assets of listed companies; Taiwan stock index; Industrial Production Index

For the output labels, 'positive' and 'negative' stock returns are the outputs of the prediction models. That is, after training, a prediction model can assign a new instance to either 'positive' or 'negative' stock returns. In our case, '0' and '1' are used as the output labels, where '0' represents positive stock returns and '1' represents negative stock returns.

3.3. Prediction models

3.3.1. Single classifiers
Three types of commonly used single classifiers are examined in this paper: the multi-layer perceptron (MLP) neural network, classification and regression tree (CART) decision trees, and logistic regression (LR). They can be thought of as the baseline models.³

³ In this paper, the Weka software is used to construct these baseline models.

In designing the MLP neural network, there are two issues that need to be considered: the architecture of the network (i.e. the numbers of hidden neurons and layers) must be chosen, and it is possible for overtraining to occur during the training stage. To avoid overtraining, we designed seven different learning epochs (150, 300, 500, 1000, 1500, 3000, and 5000) and six different numbers of hidden nodes (5, 10, 15, 20, 25, and 30). Three layers (one input, one hidden, and one output layer) are used as the MLP architecture.

We also applied 5-fold cross-validation to reduce the sample variability which may affect the performance of these three models. Note that the reason for not using 10-fold cross-validation is simply that it would lead to a relatively small testing set, i.e. 72 testing samples. The 5-fold cross-validation methodology involves dividing a given dataset into five equal parts; any four of the five parts are selected to perform training, and the remaining part is used to test the model. As a result, training and testing are repeated five times, with each part used as the test set once. For the example of MLP, we construct 210 different MLP models (7 learning epochs × 6 different numbers of hidden nodes × 5-fold cross-validation). The best classification rate is used as the indicator of the model's performance. Fig. 2 shows the procedure of constructing single classifiers.

[Fig. 2. The single classifier construction procedure: four training sets and one testing set from five-fold cross validation feed the MLP, CART, and LR models.]
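The grid described above can be reproduced with any neural network library. The sketch below is an assumption-laden stand-in: it uses scikit-learn's MLPClassifier rather than the Weka MLP actually used in the paper, and treats max_iter as the learning-epoch budget.

```python
from itertools import product

from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

EPOCHS = [150, 300, 500, 1000, 1500, 3000, 5000]   # 7 learning-epoch settings
HIDDEN_NODES = [5, 10, 15, 20, 25, 30]             # 6 hidden-layer sizes

def evaluate_mlp_grid(X, y):
    """5-fold cross-validate every setting: 7 x 6 settings x 5 folds = 210 MLP fits."""
    results = []
    for epochs, nodes in product(EPOCHS, HIDDEN_NODES):
        clf = MLPClassifier(hidden_layer_sizes=(nodes,), max_iter=epochs)
        fold_accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        results.append({"epochs": epochs, "nodes": nodes,
                        "accuracy": fold_accuracy.mean()})
    # Sorted by accuracy: the top entry is the 'best' single MLP, and the top
    # three settings are candidates for the homogeneous ensemble (Section 3.3.2).
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)
```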

3.3.2. Homogeneous and heterogeneous classifier ensembles
To design the heterogeneous classifier ensembles, the three types of single classifiers described above are combined. That is, we use the 'best' MLP, CART, and LR models according to their respective 5-fold cross-validation results.

To design the homogeneous classifier ensembles, we first select the 'best' model type, i.e. the one providing the highest rate of accuracy among the three single classifiers. This can be done once the prediction results of the single classifiers are available. Then, according to the 5-fold cross-validation results of the 'best' model type (e.g. 'A'), we select the three 'A' models which provide the top three highest rates of accuracy. For the example of MLP, the three highest rates of accuracy, corresponding to three of the 210 models, are selected. By doing so, we can make a comparison between heterogeneous and homogeneous classifier ensembles.

There are two strategies used in this paper for combining the multiple homogeneous and heterogeneous classifiers: the majority voting and bagging methods.

For majority voting, regardless of whether three heterogeneous or three homogeneous classifiers are used, the final output is the output label which receives at least two of the three models' votes. For the example of the heterogeneous classifier ensemble, if the 'best' MLP and LR models output "0" and the 'best' CART model outputs "1", then the final output of the heterogeneous classifier ensemble is "0". Fig. 3 shows the architecture of classifier ensembles by majority voting.

[Fig. 3. Classifier ensembles by majority voting: classifiers 1-3 each label the testing set, and majority voting over their outputs produces the final output.]
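The 2-of-3 vote can be written in a few lines. This is an illustrative sketch only: best_mlp, best_cart, and best_lr are assumed to be already-fitted scikit-learn-style models, not objects defined in the paper.

```python
import numpy as np

def heterogeneous_vote(best_mlp, best_cart, best_lr, X_test):
    """Majority vote over the best MLP, CART and LR models (cf. Fig. 3)."""
    preds = np.array([m.predict(X_test) for m in (best_mlp, best_cart, best_lr)])
    # With 0/1 labels, a class needs at least two of the three votes to win.
    return (preds.sum(axis=0) >= 2).astype(int)
```

The homogeneous variant is identical except that the three voters are the top three MLP configurations (cf. Table 8).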
As for the bagging combination method, because there is no definite answer for choosing the number of replicate training datasets used to construct the independent classifiers, we consider 3, 5, and 7 replicate training datasets to construct 3-, 5-, and 7-member heterogeneous and homogeneous classifier ensembles. We design the 5- and 7-member heterogeneous classifier ensembles according to the prediction results of the three types of single classifiers (represented by A to C in order of performance). The former is based on the combination of two of the top 'A' models, two of the top 'B' models, and one 'C' model, and the latter combines three 'A', two 'B', and two 'C' models (c.f. Table 14).

3.4. Evaluation methods

In order to reliably evaluate the single classifiers and classifier ensembles, we consider not only prediction accuracy but also return on investment.

Prediction accuracy is generally based on a confusion matrix, as shown in Table 3.

Table 3
Confusion matrix.
Actual \ Predicted | Negative | Positive
Negative | a (correct) | b (incorrect)
Positive | c (incorrect) | d (correct)
Note: a is the number of correct predictions that an instance is negative; b is the number of incorrect predictions that an instance is positive; c is the number of incorrect predictions that an instance is negative; d is the number of correct predictions that an instance is positive.

Therefore, the rate of prediction accuracy can be defined as follows:

Prediction accuracy = (a + d) / (a + b + c + d)

In addition, we examine Type I and II errors of the models. Type I errors occur when the model classifies the negative group into the positive group. Type II errors are the opposite, where the model classifies the positive group into the negative group.

For the return on investment calculation, we take the predicted output into consideration. The testing dataset containing four quarters of data (the fourth quarter of 2005 to the third quarter of 2006) is used to examine the return on investment of the models. The buy and hold strategy is also considered for comparison.
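To make the evaluation concrete, the helper below computes accuracy from the Table 3 counts and also expresses the Type I and II errors as rates over the actual negative and positive groups, respectively. The rate denominators are our reading, since the paper defines only the direction of each error type.

```python
def evaluation_metrics(a, b, c, d):
    """a: negatives predicted negative, b: negatives predicted positive,
    c: positives predicted negative, d: positives predicted positive (Table 3)."""
    accuracy = (a + d) / (a + b + c + d)
    type_i_rate = b / (a + b)    # actual negatives classified as positive
    type_ii_rate = c / (c + d)   # actual positives classified as negative
    return accuracy, type_i_rate, type_ii_rate
```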
4. Experimental results

4.1. Single classifiers

Based on the experimental settings described in Section 3, Tables 4–6 show the prediction performance of the three classifiers.

Table 4
The MLP results (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 89.03% | 10.97% | 61.58%
1 | 65.88% | 34.12% |

Table 5
The CART results (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 45.95% | 54.05% | 60.36%
1 | 25.22% | 74.78% |

Table 6
The LR results (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 33.42% | 66.58% | 62.11%
1 | 9.20% | 90.80% |

Table 7 provides a comparison of these three classifiers in terms of predicting the positive and negative stock returns and their average accuracy.

Table 7
Comparisons of MLP, CART, and LR.
Model | Positive returns (%) | Negative returns (%) | Average accuracy (%)
MLP | 89.03 | 34.12 | 61.58
CART | 45.95 | 74.78 | 60.36
LR | 33.42 | 90.80 | 62.11

It can be noted that the LR model outperforms the other two models in terms of average accuracy, although there is no large difference between MLP and LR. On the other hand, it is important to note that MLP performs better when predicting positive returns, while LR is superior in predicting negative returns.

Table 8
The top three MLP models.
Rank | Positive returns (%) | Negative returns (%) | Average accuracy (%) | No. of hidden nodes | Training epochs
1st | 89.03 | 34.12 | 61.58 | 5 | 1000
2nd | 47.52 | 75.07 | 61.30 | 20 | 1000
3rd | 53.79 | 66.47 | 60.13 | 30 | 1500

4.2. Classifier ensembles

4.2.1. Majority voting
Since prior studies such as [8,19,23] have shown that MLP outperforms many related classifiers, and the prediction performances of MLP and LR are similar, as discussed in Section 4.1, the homogeneous classifier ensembles are constructed from three MLP classifiers. These classifiers provide the top three highest prediction performances over the 210 MLP models (c.f. Section 3.3.1). Tables 8 and 9 show these three MLP classifiers and the combination result by majority voting.

Table 9
The performance of homogeneous MLP classifier ensembles (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 72.85% | 27.15% | 65.8%
1 | 41.25% | 58.75% |

The result shows that the MLP classifier ensemble provides better performance than the single best MLP classifier in terms of average accuracy and predicting negative returns.

The heterogeneous classifier ensemble is composed of the best MLP, CART, and LR models. Table 10 shows the combination results.

Table 10
The performance of heterogeneous classifier ensembles (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 62.92% | 37.08% | 66.63%
1 | 29.67% | 70.33% |

Table 10 indicates that the heterogeneous classifier ensemble performs slightly better than the homogeneous MLP classifier ensemble in terms of average accuracy. In addition, the heterogeneous classifier ensemble demonstrates better performance for predicting negative returns compared to the homogeneous classifiers by majority voting.

4.2.2. Bagging
Using the bagging method to bootstrap three training sets for constructing three multiple classifiers, Tables 11 and 12 show the prediction results of the homogeneous MLP and heterogeneous classifier ensembles, respectively.

Table 11
The performance of homogeneous MLP classifier ensembles (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 68.67% | 31.33% | 64.16%
1 | 40.36% | 59.64% |

Table 12
The performance of heterogeneous classifier ensembles (p < 0.05).
Actual \ Predicted | 0 | 1 | Average accuracy
0 | 62.40% | 37.60% | 65.18%
1 | 32.05% | 67.95% |

These results show that when the number of classifiers in the ensemble is three, constructing classifier ensembles by bagging does not provide better prediction performance than using the majority voting approach.

Tables 13 and 14 show the prediction performances of the 3-, 5-, and 7-member homogeneous MLP and heterogeneous classifier ensembles, respectively.

Table 13
The performance of 3, 5, and 7 homogeneous MLP classifier ensembles.
No. of training sets | Positive returns (%) | Negative returns (%) | Average accuracy (%)
3 | 68.67 | 59.64 | 64.16
5 | 69.71 | 54.30 | 62.01
7 | 65.54 | 59.35 | 62.45

By using the bagging method, the heterogeneous classifier ensembles perform better than the homogeneous ones in terms of average accuracy and prediction of negative returns.

Fig. 4 compares the single best MLP, the best homogeneous MLP ensembles, and the best heterogeneous classifier ensembles by voting and bagging in terms of their average accuracy and their positive and negative returns prediction results.

Regarding Fig. 4, the prediction accuracy of the classifier ensembles is better than that of the single best MLP model. In particular, classifier ensembles largely reduce the difference in accuracy between predicting positive and negative returns.

When comparing homogeneous and heterogeneous classifier ensembles, the heterogeneous classifier ensembles outperform the homogeneous ones. On the other hand, the combination method using majority voting has better performance for homogeneous classifier ensembles, while bagging provides slightly better performance for heterogeneous classifier ensembles.

Table 14
The performance of 3, 5, and 7 heterogeneous classifier ensembles.
No. of training sets | Positive returns (%) | Negative returns (%) | Average accuracy (%) | Combining strategy
3 | 62.40 | 67.95 | 65.18 | NN × 1, LR × 1, DT × 1
5 | 59.27 | 75.07 | 67.17 | NN × 2, LR × 2, DT × 1
7 | 56.92 | 77.15 | 67.04 | NN × 3, LR × 2, DT × 2

Table 15
Return on investment of the prediction models.
Model | 2005-Q4 | 2006-Q1 | 2006-Q2 | 2006-Q3 | Total | Rank
Buy and Hold | 2777.762 | −240.964 | −86.1334 | 519.5197 | 2970.184 | 12
MLP | 2809.699 | 258.8195 | 326.5163 | 1062.582 | 4457.617 | 5
CART | 1614.678 | 370.6757 | 478.3229 | 1106.995 | 3570.672 | 10
LR | 2653.432 | 279 | 279 | 321.1181 | 3532.55 | 11
Voting (homo.) | 2814.95 | 291.9312 | 542.3742 | 1187.607 | 4836.862 | 1
Voting (hetero.) | 2684.736 | 367.7723 | 478.3229 | 1106.995 | 4637.826 | 2
Bagging (homo. 3×) | 2640.08 | 233.9735 | 407.757 | 1166.702 | 4448.412 | 6
Bagging (homo. 5×) | 2497.837 | 207.6499 | 186.4256 | 1241.044 | 4132.984 | 8
Bagging (homo. 7×) | 2492.117 | 241.4367 | 182.0105 | 1090.318 | 4005.882 | 9
Bagging (hetero. 3×) | 2303.277 | 388.572 | 365.1275 | 1205.127 | 4262.104 | 7
Bagging (hetero. 5×) | 2711 | 421.6797 | 243.484 | 1169.045 | 4545.209 | 3
Bagging (hetero. 7×) | 2729.102 | 401.5217 | 243.484 | 1151.783 | 4525.891 | 4

[Fig. 4. Comparisons of MLP, homogeneous MLP, and heterogeneous classifier ensembles: a bar chart of positive returns, negative returns, and average accuracy (0-100%) for MLP, Voting (homo), Voting (hetero), Bagging (homo), and Bagging (hetero).]

[Fig. 5. Comparisons of single classifiers and classifier ensembles in terms of ROI: a bar chart (scale 0-6000) covering Buy & Hold, MLP, CART, LR, Voting (homo.), Voting (hetero.), and Bagging (homo./hetero. × 3, 5, 7).]

Table 16
Prediction accuracy and return on investment of the prediction models.
Model | Prediction accuracy (%) | Investment performance | Rank of prediction accuracy | Rank of return on investment
MLP | 63.33 | 4457.617 | 7 | 5
CART | 59.44 | 3570.672 | 11 | 10
LR | 60.28 | 3532.55 | 10 | 11
Voting (homo.) | 66.25 | 4836.862 | 4 | 1
Voting (hetero.) | 66.39 | 4637.826 | 2 | 2
Bagging (homo. 3×) | 64.44 | 4448.412 | 6 | 6
Bagging (homo. 5×) | 62.50 | 4132.984 | 9 | 8
Bagging (homo. 7×) | 62.64 | 4005.882 | 8 | 9
Bagging (hetero. 3×) | 65.00 | 4262.104 | 5 | 7
Bagging (hetero. 5×) | 66.67 | 4545.209 | 1 | 3
Bagging (hetero. 7×) | 66.39 | 4525.891 | 2 | 4

4.3. Return on investment

In addition to using the testing set to examine the prediction performance of these models in the previous sections, we also assess the return on investment (ROI) performance of these models. Table 15 shows the return on investment performance of these models, including the buy and hold strategy. In addition, Fig. 5 compares these techniques in terms of ROI.

All of the prediction models perform better than the buy and hold strategy. The best model is based on the homogeneous MLP ensembles by majority voting.

Table 16 compares these models by examining their return prediction performances and return on investment. The comparative result shows that the rankings of prediction accuracy and return on investment of these models are not significantly different.

5. Conclusion

Classifier ensembles have been examined in many pattern recognition problems and have shown better performance than single classifiers. This study examined the performance of classifier ensembles in predicting stock returns. In particular, homogeneous and heterogeneous classifier ensembles are compared, which has never been examined in related studies (c.f. Section 2.3).

In summary, classifier ensembles perform better than single classifiers do. This finding is in line with the findings of previous studies, such as [7,9–11]. However, for classifier ensembles in particular, much related work is limited to constructing homogeneous classifier ensembles exclusively. Even in this paper, there is no significant difference between homogeneous and heterogeneous classifier ensembles by majority voting and bagging. However, it is notable that the homogeneous classifier ensembles by majority voting are particularly good at predicting positive returns, while their performance in predicting negative returns is better than that of the single best MLP model. In addition, the homogeneous classifier ensembles by majority voting can obtain a higher positive return on investment and a lower negative return on investment than the other models.

Please note that although this paper considers three popular classification techniques, there are other methods available in the field, such as k-nearest neighbor, fuzzy logic, and support vector machines. From the practical perspective, however, it is difficult to conduct a comprehensive study encompassing all existing techniques. In addition, it may be hard to define the most representative one in the domain of stock return prediction because of the nature of the embedded complexity.

It is the authors' understanding that there is no comparative study that includes all of these aforementioned methods. Thus, a future research topic may be to investigate and compare the results of other techniques. In addition to taking other techniques into consideration for comparative purposes, it is critical to assess the prediction performance of classifier ensembles trained on continuous output variables rather than discrete ones (i.e. '0' and '1' for positive and negative stock returns, respectively). In the prior literature (c.f. Table 1), much related work only used datasets which were largely composed of discrete output variables to conduct the experiments.

Other future work could consider the following issues:

• using technical indicators as input variables;
• the "on" and "off" seasons of the electronic industry, which are the first and second quarters and the third and fourth quarters, respectively;
• non-economic factors, which may be involved to improve prediction accuracy;
• other stock datasets from different countries.

References

[1] G. Albanis, R. Batchelor, Combining heterogeneous classifiers for stock selection, Intelligent Systems in Accounting, Finance and Management 15 (1–2) (2007) 1–21.
[2] L. Bernstein, J. Wild, Analysis of Financial Statements, McGraw-Hill, 2000.
[3] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[4] R.G. Donaldson, M. Kamstra, Forecast combining with neural networks, Journal of Forecasting 15 (1) (1996) 49–61.
[5] D. Enke, S. Thawornwong, The use of data mining and neural networks for forecasting stock market returns, Expert Systems with Applications 29 (4) (2005) 927–940.
[6] D. Frosyniotis, A. Stafylopatis, A. Likas, A divide-and-conquer method for multi-net classifiers, Journal of Pattern Analysis and Applications 6 (1) (2003) 32–40.
[7] N. García-Pedrajas, Constructing ensembles of classifiers by means of weighted instance selection, IEEE Transactions on Neural Networks 20 (2) (2009) 258–277.
[8] R. Gençay, Non-linear prediction of security returns with moving average rules, Journal of Forecasting 15 (1996) 165–174.
[9] W.Y. Goh, C.P. Lim, K.K. Peh, Predicting drug dissolution profiles with an ensemble of boosted neural networks: a time series approach, IEEE Transactions on Neural Networks 14 (2) (2003) 459–463.
[10] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (10) (1990) 993–1001.
[11] S. Hashem, Optimal linear combinations of neural networks, Neural Networks 10 (4) (1997) 599–614.
[12] Md.R. Hassan, B. Nath, M. Kirley, A fusion model of HMM, ANN and GA for stock market forecasting, Expert Systems with Applications 33 (2007) 171–180.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.
[14] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1) (1994) 66–75.
[15] E. Kim, W. Kim, Y. Lee, Combination of multiple classifiers for the customer's purchase behavior prediction, Decision Support Systems 34 (2) (2003) 167–175.
[16] M.J. Kim, S.H. Min, I. Han, An evolutionary approach to the combination of multiple classifiers to predict a stock price index, Expert Systems with Applications 31 (2) (2006) 241–247.
[17] T. Kimoto, K. Asakawa, Stock market prediction system with modular networks, IEEE International Joint Conference on Neural Networks 1 (1990) 1–6.
[18] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998) 226–239.
[19] M.T. Leung, H. Daouk, A.-S. Chen, Forecasting stock indices: a comparison of classification and level estimation models, International Journal of Forecasting 16 (2) (2000) 173–190.
[20] T.C. Mills, Non-linear time series models in economics, Journal of Economic Surveys 5 (1990) 215–241.
[21] T. Mitchell, Machine Learning, McGraw-Hill, 1997.
[22] J.J. Murphy, Technical Analysis of the Financial Markets, New York Institute of Finance, New York, 1999.
[23] D. Olson, C. Mossman, Neural network forecasts of Canadian stock returns using accounting ratios, International Journal of Forecasting 19 (3) (2003) 453–465.
[24] M.B. Priestley, Non-linear and Non-stationary Time Series Analysis, Academic Press, 1988.
[25] A.N. Refenes, A. Zapranis, G. Francis, Stock performance modeling using neural networks: a comparative study with regression models, Neural Networks 7 (1994) 375–388.
[26] R.M. Rogers, Handbook of Key Economic Indicators, McGraw-Hill, 1998.
[27] T.H. Roh, Forecasting the volatility of stock price index, Expert Systems with Applications 33 (4) (2007) 916–922.
[28] E.H. Sorensen, K.L. Miller, C.K. Ooi, The decision tree approach to stock selection, Journal of Portfolio Management 27 (1) (2000) 42–52.
[29] J.-L. Wang, S.-H. Chan, Stock market trading rule discovery using two-layer bias decision tree, Expert Systems with Applications 30 (4) (2006) 605–611.
[30] D. West, S. Dellana, J. Qian, Neural network ensemble strategies for financial decision applications, Computers and Operations Research 32 (10) (2005) 2543–2559.
[31] H. White, Economic prediction using neural networks: the case of IBM daily stock returns, IEEE International Conference on Neural Networks 2 (1988) 451–458.
[32] J. Yao, C.L. Tan, H.-L. Pok, Neural networks for technical analysis: a study on KLCI, International Journal of Theoretical and Applied Finance 2 (2) (1999) 221–241.
[33] Y. Yoon, G. Swales Jr., T.M. Margavio, A comparison of discriminant analysis versus artificial neural networks, Journal of Operational Research Society 44 (1993) 51–60.
