Professional Documents
Culture Documents
Abstract—In this paper, we propose a model for generating are usually used. The first is to predict whether the trend
profitable trading strategies for Taiwan stock market. Our of the stock market is raising or falling. [5], [6], [9],
model applies the genetic programming (GP) to obtain prof- [14], [16]. The second benchmark is the difference of
itable and stable trading strategies in the training period, and
then the strategies are applied to trade the stock in the testing predicted indices and the real ones. The most commonly
period. The variables for GP include 6 basic information used measurements are root mean squared error (RMSE)
and 25 technical indicators. We perform five experiments on and mean absolute percentage error (MAPE) [2], [7], [10],
Taiwan Stock Exchange Capitalization Weighted Stock Index [12], [18]. The third benchmark is to apply the machine
(TAIEX) from 2000/9/14 to 2010/5/21. In these experiments, learning techniques to mine the trading strategies from
we find that the trading strategies generated by GP with
two arithmetic trees have more stable returns. In addition, the historical data, and the return got by applying these
if we obtain the trading strategies in three historical periods strategies to trade one stock is usually compared with the
which are the most similar to the current training period, buy-and-hold strategy [1], [3], [4], [13].
we are able to earn higher return in the testing period. In In this paper, our goal is to earn stable return in
each experiment, 24 cases are considered. The testing period Taiwan stock market. The trading strategy is applied to
is rolling updated with the sliding window scheme. The best
cumulative return 166.57% occurs when 545-day training Taiwan Stock Exchange Capitalization Weighted Stock
period pairs with 365-day testing period, which is much Index (TAIEX) in the testing period. The total trading
higher than the buy-and-hold strategy. period is from 2000/9/14 to 2010/5/21, approximately 10
Keywords-stock; Taiwan Stock Exchange Capitalization years in total. The best cumulative return 166.57% occurs
Weighted Stock Index; genetic programming; annualized when 545-day training period pairs with 365-day testing
return; feature set. period, which is much higher than the return of the buy-
and-hold strategy 1.19%.
I. I NTRODUCTION The rest of this paper is organized as follows. We will
Predicting the fluctuation of the stock market is a introduce the genetic programming (GP) in Section II. In
very popular research. Many investors are interested in Section III, we will present our method that the trading
investment, but forecasting the movement of the stock strategies are generated by GP. In Section IV, we will
market index is very difficult, because the stock market show our experimental results. Finally, in Section V, we
is usually affected by many external factors, such as the will give the conclusion and some future works.
interaction of international financial markets, the political
factors, the human operations, etc. Generally, investors II. P RELIMINARIES
do not have sufficient knowledge and information about In this section, we will give an introduction of the
investment so that they cannot gain return easily. genetic programming, which is the background knowledge
To get the patterns of price fluctuation hidden in the of this paper. The genetic programming (GP), which is
stock market, some studies [1], [9], [10] investigated extended from the genetic algorithm (GA), was proposed
historical data to find a precise solution of the stock market by Koza at 1992 [8]. The great advantage of GP is that
volatility by using the methods of artificial intelligence it can be applied in varieties of problems with some
and statistics. Furthermore, some studies tried to generate constraints. The solutions of these problems generated by
trading strategies for deciding the timing of buying and GP are represented as formulas. The representation of
selling. The methods for forecasting the stock market the chromosome of GP is more flexible than GA. The
fluctuation include support vector regression (SVR) [10], solutions of GA are string structures with fixed length
support vector machine (SVM) [5], [6], [9], [14], genetic which need to be encoded and to be decoded for answer
algorithm (GA) [1], [3], [4], [5], [6], [7], [11], [16], [17], transformation. The tree structure of GP has dynamic
genetic programming (GP) [13], [19], artificial neural extensibility. GP can parse the tree structure to get the
network (NN) [7], [10], [11], [12], [16], [18]. The goal of corresponding solution. Therefore, GP is more suitable for
these studies is to mine the pattern of the price fluctuation searching trading strategies. One trading strategy is usually
and to improve the accuracy of predicting the trend of produced by predefined functions, constants and variables.
the stock market. Among these studies, three benchmarks The evolutions in GP are summarized as follows.
∗ This research work was partially supported by the National Science 1) Initialization: The initial population is composed
Council of Taiwan under contract NSC99-2221-E-110-048. of individuals, which are generated randomly. The
† Corresponding author: cbyang@cse.nsysu.edu.tw structure of an individual is a tree.
152
Proceedings of the 2011 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2011)
Start similar to the current training period are chosen from the
historical data by the measurement of the root mean square
error (RMSE). Here, all data series are formed by the daily
close prices.
Set the training period and testing period The equation of RMSE is given as follows:
n 2
j=1 (x1,j − x2,j )
RM SE(X1 , X2 ) = , (1)
n
where X1 , X2 , x1,j , x2,j , and n represent the data series
of the first period, the data series of the second period,
Search for three historical periods which are the most the jth element of the first period, the jth element of the
similar to the training period by RMSE measurement.
No second period, and the length of the period, respectively.
After we find the 3 historical periods with the lowest
RMSE values compared to the training period, we extend
Find the most profitable trading strategy for the three these periods. That is, suppose that the lengths of the
historical periods with GP. training and testing periods are d1 and d2, respectively.
The length of the historical period we search is d1, and
the length of the extended period is d1+ d2. Each strategy
Apply the trading strategy to the testing period. generated by GP is used to trade the stock in the 3
extended historical periods, then we can get 3 returns with
their standard deviation. The higher the average return
divided by its standard deviation is, the more stable and
profitable the trading strategy is. After getting the best
Is the sifted testing period over the trading
period? trading strategy, we apply it in the testing period.
C. Generating the Trading Strategy with the Genetic Pro-
Yes gramming
To utilize the genetic programming (GP), we have to
End define the function nodes and the terminal nodes first.
The function nodes for our method consist of operators
Figure 2. The flow chart for investing one stock. ’>’, ’<’, ’=’, ’≥’, ’≤’, ’logical and’, ’logical or’, ’+’, ’-
’, and ’×’. The terminal nodes involve 6 basic information
Testing
and 25 technical indicators of one stock, which are open
Training period
period
prices, close prices, highest prices, lowest prices, the
time line
t-n+1 t t+p
trading volumes, the amount of prices, M T M 5 , OBV ,
DI, volume5 , volume20 , RSI 5 , RSI 14 , M A10 , M A20 ,
T AP I, P SY 14 , W M S 5 , W M S 9 , BIAS 10 , BIAS 14 ,
Testing
Training period
period
t p n+1
( + )- t+p t+p*2 time line BIAS 20 , OSC 5 , RSV 9 , K 3 , D3 , EM A12 , EM A26 ,
Testing
DIF , and M ACD9 . We denote these 31 features as
Training period
period
full features. The constants for the terminal nodes are
time line
t p
( + *2)- n+1 t+p*2 t+p*3
the numbers generated randomly between −1.0 and 1.0.
Figure 3. The training period and the testing period with the sliding In addition, we modify the left subtree of the original
window scheme. Here, n, t, and p represent the length (day) of the arithmetic tree to a buy-tree, and the right subtree to a sell-
training period, the current date, and the length (day) of the testing period, tree. With this modification, the trading signal generated
respectively.
by two arithmetic trees on date t is given as follows.
B. Training Interval Selection from Historical Data IF (the value of the buy-tree on date t> 0)
AND (the value of the sell-tree on date t< 0)
Leu et al. [12] and Yu et al. [18] predict the stock close THEN signal(t) ← T RU E
price by finding the historical period which is similar to ELSE IF (the value of the buy-tree on date t< 0)
the training period. In this paper, the training period is AND (the value of the sell-tree on date t> 0)
defined to be an interval just before the current testing THEN signal(t) ← F ALSE
period. The lengths of training and testing periods need ELSE
not be the same. The historical period is selected from the signal(t) ← signal(t − 1)
interval starting from the very beginning to the day just END IF
before the training period. The goal of historical periods (2)
is to help the model construction of the training period. According to the trading signal, the action we should
In our model, three historical periods which are the most take at date t is described as follows.
153
Proceedings of the 2011 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2011)
d/y
• buying: If the signal of date t is T RU E and the ϭϮϬϬϬ
IV. E XPERIMENTAL R ESULTS selected features which are frequently used in the
profitable strategies in Experiment II. Theese features
In this section, we will first introduce the benchmarks are RSI 5 , P SY 14 , W M S 5 , W M S 9 , DIF , and
and the dataset, and then show our experimental results. M ACD9 .
• Experiment IV: We use two arithmetic trees and the
A. Benchmarks full feature set with 12 macroeconomic indicators
added, such as U.S. producer price index, U.S. annual
In this section, we will introduce the measurements.
changes in consumer price index, Taiwan unemploy-
Many measurements are able to evaluate the performance
ment rate, etc.
of a trading model, but they usually use different units
• Experiment V: Two arithmetic trees and the full fea-
or currencies. To get objective benchmark results, we
ture set are used. We search the historical data to find
use the return on investment (ROI), which is defined as
f inal capital the three periods which are the most similar to the
( initial capital − 1 × 100%.) We also compare our results current training period with the RMSE measurement.
with the buy-and-hold strategy that the investor buys one
Then, the most profitable trading strategy for these
stock at the start of the trading period, and sells the stock
three periods are trained by GP. And finally, we apply
at the end of the trading period. A good trading model is
the strategy to the testing period.
able to earn high and stable return.
In all experiments, it is assumed that TAIEX is tradable
and only one unit is traded. Meanwhile, we can sell only
B. Data Collection and Preprocessing
when we have one unit in hand (bought before) The
Our dataset contains the basic information of Tai- trading fee is assumed to be 0.6% (the current fee in
wan Stock Exchange Capitalization Weighted Stock Index Taiwan stock market).
(TAIEX), and the dataset is fetched from Taiwan Eco- We train the trading strategy by using GP with training
nomic Journal (TEJ) database, which was established in periods of various lengths, including 90, 180, 270, 365,
April of 1990. It provides the stock information, including 455, 545, 635, and 730 days. And testing periods of
the prices, technical indicators, and fundamental analysis various lengths, including 90, 180, and 365 days, with
of securities of Taiwan stock market [15]. We assume the sliding window scheme, are tested. Thus, totally 24
that TAIEX is tradable. Our goal is to gain high and cases are performed in each experiment.
stable return by trading TAIEX with its daily close price The function nodes include ’>’, ’<’, ’=’, ’≥’, ’≤’,
being our trading target. The close prices of TAIEX ’logical and’, ’logical or’, ’+’, ’-’, and ’×’. The terminal
from 1995/1/1 to 2011/3/1 are shown in Figure 4. In nodes involve 6 basic information and 25 technical indica-
our experiments, the trading period is from 2000/9/14 to tors, and the constants are random numbers between −1.0
2010/5/21. The reason we choose this period is that the and 1.0. The parameters of GP are shown in Table I.
starting day and the ending day has almost the same close In Experiment I, using only one arithmetic tree may
price. In this period, it is fair to compare our results with generate trading signals which are very sensitive to the
the buy-and-hold strategy. fluctuation of the market, and it may also cause many
154
Proceedings of the 2011 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2011)
Table I
T HE PARAMETERS OF THE GENETIC PROGRAMMING .
ϲϬϬϬ
d/y
Table II ϰϱϬϬ ďƵLJƚŝŵŝŶŐ
T HE CUMULATIVE RETURNS FROM 2000/9/14 TO 2010/05/21 WITH ƐĞůůƚŝ ŵŝŶŐ
time, and 7th selling time. In the bear market, we can sell
the stock before the stock crashes, such as 2nd, 3rd, and
6th selling time. It reveals the profitability of our model.
R EFERENCES
V. C ONCLUSION [1] P. F. Blanco, D. Sagi, and J.I.Hdalgo, “Technical market in-
In this paper, we propose a novel model for investment dicators optimization using evolutionary algorithm,” Proc.
of the 2008 GECCO conference companion on Genetic and
in Taiwan Stock Exchange Capitalization Weighted Stock
evolutionary computation, Atlanta, USA, July 2008.
Index (TAIEX). Our trading strategies are generated by
two arithmetic trees in GP. When 545-day training period [2] M. A. Boyacioglu and D. Avci, “An adaptive network-based
pairs with 365-day testing period, the cumulative return is fuzzy inference system ANFIS for the prediction of stock
166.57%, higher than the buy-and-hold strategy. market return: The case of the istanbul stock exchange,”
Expert Systems with Applications, Vol. 37, pp. 7908–7912,
In the future, we will try to find more suitable strategies 2010.
by searching more periods from the historical data, and
modify the evaluating method for finding similar patterns. [3] T. J. Chang, S. C. Yang, and K. J. Chang, “Portfolio
The longest common subsequence or dynamic time warp- optimization problems in different risk measures using ge-
ing is one of the possible ways to find the most similar netic algorithm,” Expert Systems with Applications, Vol. 36,
pp. 10529–10537, 2009.
subsequences in our training data for GP. We will also
apply the risk management and the portfolio to the stock [4] J. S. Chen, J. L. Hou, S. M. Wu, and Y. W. C. Chien,
investment. “Constructing investment strategy portfolios by combina-
155
Proceedings of the 2011 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2011)
156