
Applied Intelligence 21, 239–249, 2004


© 2004 Kluwer Academic Publishers. Manufactured in The United States.

Toward Global Optimization of Case-Based Reasoning Systems for Financial Forecasting

KYOUNG-JAE KIM
Department of Information Systems, College of Business Administration, Dongguk University,
Seoul, Korea 100-715
drkjkim@empal.com, kjkim@dongguk.edu

Abstract. This paper presents a method for simultaneously optimizing a case-based reasoning (CBR) system with a genetic algorithm (GA) for financial forecasting. Prior research proposed many hybrid models of CBR and the GA for selecting a relevant feature subset or optimizing feature weights, but most of it used the GA to improve only some of the architectural factors of the CBR model. The performance of the CBR model, however, may be enhanced when these factors are considered simultaneously. In this study, the GA simultaneously optimizes multiple factors of the CBR system. Experimental results show that the GA approach to simultaneous optimization of the CBR model outperforms other conventional approaches for financial forecasting.

Keywords: simultaneous optimization, case-based reasoning, genetic algorithms, feature discretization, financial
forecasting

1. Introduction

Predicting the stock market's movement is the long-cherished desire of investors, speculators, and industries. Although many studies have investigated the prediction of price movements in the stock market, financial time series are too complex and noisy to forecast easily. Many researchers have attempted to predict price movements in the stock market using artificial intelligence (AI) techniques during the past decade. The earliest studies in this area focused mainly on applications of artificial neural networks (ANNs) to stock market prediction [1–6]. Recent research tends to hybridize several AI techniques [7–9].

Although neural networks offer preeminent learning ability, they cannot always explain why they arrived at a particular solution. Moreover, they cannot always guarantee a completely certain solution, arrive at the same solution again with the same input data, or always guarantee the best solution [10]. Unlike neural networks, expert systems typically provide explanations for their solutions. Expert systems primarily capture the knowledge of individual experts. But organizations have collective knowledge and expertise which they have built up over the years. This organizational knowledge can be captured and stored using case-based reasoning (CBR). CBR is a reasoning technique that reuses past cases to find a solution to a new problem. CBR not only captures organizational knowledge and expertise but also provides explanations for the derived solutions. For this reason, CBR is popularly applied in many domains.

Previous research suggested that the integration of domain knowledge into the case indexing and retrieval process is important in building a useful CBR system [11]. However, this task is quite difficult because domain knowledge cannot be easily captured. In addition, the existence of continuous data and large amounts of records may make explicit concept extraction from the raw data a challenging task due to the huge data space determined by continuous features [12]. The reduction and transformation of irrelevant and redundant features may shorten the running time of reasoning and yield more generalized results [13]. Prior research tried to solve these two problems separately and did not consider them simultaneously.

If these factors are considered separately, optimization is achieved only in part and may lead to a locally optimized solution as a whole. However, if these factors are considered simultaneously, performance may be enhanced because optimizing them in a synergistic way may lead to global optimization as a whole.

This paper proposes a simultaneous optimization approach using genetic algorithms (GAs) for the case representation process and the indexing and retrieval processes in a CBR system. The approach simultaneously selects the relevant feature subset and optimizes the thresholds for feature discretization. Feature discretization, the process of converting data sets with continuous attributes into data sets with discrete attributes, filters noisy data and thus yields an enhanced prediction result. This paper applies the proposed approach to stock market analysis, and experimental results on this application are presented.

The rest of this paper is organized as follows. Section 2 reviews prior research. Section 3 proposes the GA approach to simultaneous optimization of the CBR system and presents the benefits of the proposed approach. Section 4 describes the research design and experiments. Section 5 summarizes and discusses the empirical results. The final section presents conclusions and the limitations of this study.

2. Prior Research

CBR is composed of the steps of case representation, indexing, retrieval, and adaptation. This paper simultaneously optimizes the case representation step and the indexing and retrieval step. In this section, we review basic concepts and prior studies on these steps of the CBR system.

2.1. Case Representation Step

The case representation step represents cases clearly, concisely, and truthfully to reflect specific knowledge in the case-base. The appropriate case representation relies on the characteristics of the problem domain [14]. If the problem domain changes over time, the case representation must have sufficient detail to allow judging the applicability of a case in the new situation [15].

On the other hand, the reduction and transformation of irrelevant and redundant features may shorten the running time of reasoning and yield more generalized results [13]. A conventional CBR system that simply combines different metrics for continuous and discrete attributes can show poor performance. The existence of continuous data and large amounts of records may make explicit concept extraction from the raw data a challenging task due to the huge data space determined by continuous features [12]. In this regard, Ting [16] proposed a discretization method for continuous features in lazy learning algorithms, including k-nearest neighbor. He used the entropy minimization strategy [17] to discretize the continuous features and showed that discretization could improve performance both on data sets with mixed continuous and discrete attribute types and on data sets with only continuous attributes.

Feature discretization has been studied in many papers. Discretization methods are classified as endogenous versus exogenous, local versus global, parameterized versus non-parameterized, and hard versus fuzzy [18–20].

Endogenous methods do not take the value of the dependent feature into consideration while exogenous methods do. Local methods discretize one attribute at a time while global ones discretize all features simultaneously. Parameterized methods specify the maximal number of intervals in advance while non-parameterized methods determine it automatically. Hard methods discretize the intervals at exact cutting points while fuzzy methods discretize them with overlapping bounds [20].

The endogenous methods include discretizing by the self-organizing map [21], the percentile method [19, 22], and the clustering method [19, 23]. Piramuthu et al. [24] suggested a decision-tree-based approach as an endogenous discretization method. These methods have the advantage of simplicity in the discretization process. However, they do not consider the association between each independent feature and the dependent feature. Prediction performance is enhanced by the ability to discriminate not only by a single feature but also by the association among features. Because of this limitation, endogenous methods do not provide an effective way of forming categories [19].

Exogenous methods include maximizing the statistical significance of Cramer's V between other dichotomized variables [19], and the entropy minimization heuristic in inductive learning and the k-nearest neighbor method [16, 17, 25]. Recently, Kim and Han [9] proposed a GA-based feature discretization method as an exogenous method for ANNs.
These methods discretize an independent feature to maximize its association with the values of the dependent and other independent features.

2.2. The Case Indexing and Retrieval Step

Case indexing is the task of assigning labels to cases to ensure that they can be retrieved at appropriate times [15]. There are few guidelines for selecting good indexes. In most research, indexes are selected by domain experts. The better the experts understand the domain, the better the index tends to be. Human experts, however, may not always select good indexes. If the process of index selection is automated, the consistency and maintainability of the index are enhanced.

One of the most popular indexing methods is feature weighting. Feature weighting assigns a weight to each feature according to its relative importance. Wettschereck et al. [26] presented various feature weighting methods based on distance metrics in the machine learning literature. Kelly and Davis [27] proposed a GA-based feature weighting method for k-nearest neighbor. Similar methods have been applied to the prediction of corporate bond ratings [11] and to failure-mechanism identification [28]. In addition, Kim and Shin [29] presented feature weighting methods based on the GA and ANN.

Feature subset selection is also considered a popular case indexing method. It tries to pick a subset of features that are relevant to the target concept and to remove irrelevant or redundant features [13]. Feature weighting includes feature selection, since selection is a special case of weighting with binary weights. However, feature selection is more efficient and effective than feature weighting when harmful features exist. Some researchers classify the methods of feature subset selection into two categories: the filter and the wrapper approach [30–32]. In the filter approach, a feature subset is selected by the characteristics of the data itself, independent of the learning algorithm [31]. In the wrapper approach, on the other hand, the selection of the relevant feature subset depends on the results of the learning algorithm. Although the filter approach is rather simple to implement, the wrapper approach may be more effective because it operates with the learning algorithm in a synergistic way.

Many studies resorted to selection techniques associated with linear methods, including linear regression and discriminant analysis, so that nothing prevents different sets of features from being more relevant in non-linear terms [33]. Among the many methods of feature selection, the GA is increasingly being used in data mining applications including ANNs [34, 35], inductive learning [36, 37], and linear regression [38]. The GA selects a feature subset relevant to the specific fitness function of the application. In addition, the GA does not require the restrictive monotonicity assumption and also readily lends itself to the use of multiple selection criteria [32].

For CBR, Siedlecki and Sklansky [39] proposed a feature selection algorithm based on genetic search, and Cardie [40] presented a decision tree approach to feature subset selection. Skalak [41] and Domingos [42] proposed a hill climbing algorithm and a clustering method for feature subset selection, respectively. In addition, Cardie and Howe [43] used a mixed feature weighting and feature subset selection method: they first selected relevant features using a decision tree and then assigned weights to the remaining features using the value of information gain for each feature. Jarmulak et al. [44, 45] selected relevant features using a decision tree algorithm (C4.5 [46]) and assigned feature weights using the GA. Table 1 presents prior research on feature analysis for CBR. Most of these approaches were classified as filter approaches for feature subset selection, and they had the limitations of the filter approach.
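To make the relationship between feature weighting and feature subset selection concrete, the following minimal sketch computes a weighted nearest-neighbor distance; with a 0/1 weight vector it reduces to subset selection. The feature values, weights, and mask shown here are hypothetical and are not taken from any study listed in Table 1.

import numpy as np

def weighted_distance(x_new, x_ref, weights):
    """Weighted squared-difference distance between two cases.
    With binary weights (0/1) this reduces to feature subset selection."""
    diff = np.asarray(x_new, dtype=float) - np.asarray(x_ref, dtype=float)
    return float(np.sum(np.asarray(weights, dtype=float) * diff ** 2))

# Hypothetical 4-feature cases.
new_case = [0.3, 1.2, -0.4, 0.8]
ref_case = [0.1, 0.9, 0.5, 0.7]

real_weights = [0.9, 0.4, 0.1, 0.6]   # feature weighting
binary_mask  = [1, 0, 0, 1]           # feature subset selection as a special case

print(weighted_distance(new_case, ref_case, real_weights))
print(weighted_distance(new_case, ref_case, binary_mask))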

Table 1. Prior research on feature analysis for CBR.

Feature analysis                        Description                                                       References
Feature weighting                       Using the GA                                                      [11, 27–29]
Feature subset selection                Using the GA                                                      [39]
                                        Using the decision tree                                           [40]
                                        Using the random mutation hill climbing algorithm                 [41]
                                        Using the clustering technique (local selection)                  [42]
Mixed feature selection and weighting   Feature selection using the decision tree and weighting the
                                        remaining features using information gain                         [43]
                                        Feature selection using the decision tree and weighting the
                                        remaining features using the GA                                   [44, 45]

Once cases are represented and indexed, the retrieval process is initiated. The indexing and retrieval processes consist of two phases. The first phase is to determine the relative importance of the case attributes for the current problem. In the second phase, the case must be matched in the case library using these attributes and their specified importance [47]. Nearest-neighbor matching techniques are popularly employed in this phase.

2.3. Genetic Algorithms

The GA has been investigated recently and shown to be effective in exploring a complex space in an adaptive way, guided by the biological mechanisms of selection, crossover, and mutation [48]. The algorithm uses natural selection, survival of the fittest, to solve optimization problems.

The first step of the GA is problem representation. The problem must be represented in a suitable form to be handled by the GA, so it is described in terms of a genetic code, like DNA chromosomes. The GA often works with a form of binary coding. Once the problem is coded as chromosomes, the population is initialized. Each chromosome within the population gradually evolves through biological operations. There are no general rules for determining the population size, but population sizes of 100–200 are commonly used in GA research. Once the population size is chosen, the initial population is randomly generated [49]. After the initialization step, each chromosome is evaluated by a fitness function. According to the value of the fitness function, the chromosomes associated with the fittest individuals are reproduced more often than those associated with unfit individuals [50].

The GA works with three operators that are applied iteratively. The selection operator determines which individuals survive [51]. The crossover operator allows the search to fan out in diverse directions looking for attractive solutions and permits chromosomal material from different parents to be combined in a single child. There are three popular crossover methods: single-point, two-point, and uniform. Single-point crossover makes only one cut in each chromosome and selects two adjacent genes on the chromosome of a parent. Two-point crossover involves two cuts in each chromosome. Uniform crossover allows two parent strings to produce two children and permits great flexibility in the way strings are combined. Finally, the mutation operator arbitrarily alters one or more components of a selected chromosome. Mutation randomly changes a gene on a chromosome and provides the means for introducing new information into the population. Through these operators the GA tends to converge on an optimal or near-optimal solution [52].

3. Model Specification

As mentioned in Section 2, prior studies suggested that feature weighting or feature subset selection is very important for enhancing the prediction performance of the CBR system. In addition, Ting [16] suggested that feature discretization using the entropy minimization strategy enhances performance. If these architectural factors are considered simultaneously, the CBR system can find optimal or near-optimal solutions. The following are the architectural factors considered in designing the CBR system.

The first architectural factor is the relevant feature subset. Irrelevant and redundant features may distort the relationship between the input and the output. The second factor, a rather novel one, is the set of thresholds for feature discretization. Feature discretization filters noisy data and thus enhances prediction performance and generalizability.

This paper proposes the GA as the method of feature subset selection and discretization in the CBR system. For feature discretization, the GA transforms the representation of feature values. If a fitness function is specified, the GA searches for a (near-)optimal form of representation via feature discretization. Properly discretized features can simplify the reasoning process and may improve the generalizability of results because discretization can effectively reduce noisy and redundant data. Feature discretization needs relevant and rational discretizing thresholds. The thresholds, however, may vary according to the environment being analyzed, so there are no general guidelines for discretization. This study uses the GA as a feature discretization method. The method takes the dependent feature into consideration via a fitness function in the GA. The GA iterates the evolution of the populations to maximize the fitness function. The proposed method simultaneously discretizes all features into intervals at exact thresholds. The method is therefore classified as an exogenous, global, hard, and non-parameterized discretization method.

In this paper, the GA is also employed to select the relevant feature subset for case indexing. Feature subset selection reduces the reasoning time and may produce generalized results. The consistency and maintainability of the index may also be enhanced through an automated case indexing method.
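As an illustration of the GA operators reviewed in Section 2.3, the following sketch shows uniform crossover and mutation on binary chromosomes. It is an illustrative sketch only; the 12-gene chromosomes and the 0.05 mutation rate are placeholder values, not the encoding or parameter settings of SOCBR (those are given in Section 4).

import random

def uniform_crossover(parent1, parent2):
    """Each gene of a child is drawn from either parent with equal probability."""
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if random.random() < 0.5:
            child1.append(g1); child2.append(g2)
        else:
            child1.append(g2); child2.append(g1)
    return child1, child2

def mutate(chromosome, mutation_rate):
    """Flip each binary gene with probability equal to the mutation rate."""
    return [1 - g if random.random() <= mutation_rate else g for g in chromosome]

p1 = [random.randint(0, 1) for _ in range(12)]  # placeholder 12-gene chromosomes
p2 = [random.randint(0, 1) for _ in range(12)]
c1, c2 = uniform_crossover(p1, p2)
c1 = mutate(c1, mutation_rate=0.05)
print(c1, c2)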
Toward Global Optimization of Case-Based Reasoning Systems 243

may produce generalized results. The consistency and Table 2. Summary of four models.
maintainability of the index may be enhanced through Feature Feature Feature
an automated case indexing method. weighting subset selection discretization
To test the effectiveness of the proposed model, we
COCBR – – LS∗∗
compare the results of four different models. The first
model, labeled COCBR (COnventional CBR), uses a FWCBR GA∗ – LS
conventional approach for reasoning process of CBR. FSCBR – GA LS
This model considers all initially available features as SOCBR – GA GA
a feature subset. Thus, there is no special process of ∗ Genetic algorithm.
feature subset selection. In addition, relative impor- ∗∗ Linear scaling.
tance of each feature is not considered because many
conventional CBR models do not have general feature (Feature Selection using the GA for CBR). This model
selection or weighting algorithm. For the feature trans- also uses linear scaling for feature transformation.
formation method, linear scaling is used. Linear scal- Siedlecki and Sklansky [39] proposed similar model
ing means linear scaling to unit variance in this study. It to it.
transforms a feature component x to a random variable The fourth model, the proposed model in this study,
with zero mean and unit variance [53]. It is usually em- employs the GA to select a relevant feature subset and
ployed to enhance the performance of the CBR system to optimize the thresholds for feature discretization
because it ensures the larger value input features do not simultaneously using the reference and the test case-
overwhelm smaller value input features. base. This model is named as SOCBR (Simultaneous
The second model assigns relevant feature weights Optimization using the GA for CBR) in this study.
via genetic search. This study names this model Table 2 shows summary of these four models.
FWCBR (Feature W eighting using the GA for CBR). Among three hybrid models, this study describes the
The model uses linear scaling to representing the origi- reasoning process of SOCBR. The optimization pro-
nal data. Similar models to it were previously suggested cesses of FWCBR and FSCBR are like SOCBR except
by Kelly and Davis [27], Shin and Han [11], Kim and for the process of feature transformation. The frame-
Shin [29], and Liao et al. [28]. work of SOCBR is shown in Fig. 1.
The third model uses the GA to select a relevant In Fig. 1, the data used in this study are split
feature subset. This study names this model FSCBR into three case-bases. The first one is the reference

Figure 1. Framework of SOCBR.
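The linear scaling to unit variance used as the feature transformation in COCBR, FWCBR, and FSCBR amounts to subtracting the mean of a feature and dividing by its standard deviation. A minimal sketch, with made-up feature values:

import numpy as np

def linear_scale(feature_column):
    """Scale a feature to zero mean and unit variance (linear scaling to unit variance)."""
    col = np.asarray(feature_column, dtype=float)
    return (col - col.mean()) / col.std()

print(linear_scale([100.0, 102.5, 98.7, 101.3]))  # hypothetical raw feature values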



The reference case-base is used to search for the optimal feature subset and the thresholds for feature discretization in genetic learning, and it is also used as the case-base for retrieval. The test case-base is the second one. This case-base measures how well the system interpolates using the feature subset and the discretization thresholds derived through the evolutionary search process from the reference case-base. The holdout case-base is the third one; it is used to validate the generalizability of the model on unseen data.

The process of SOCBR consists of the following three stages:

Stage 1. In the first stage, we search the solution space to find an optimal or near-optimal feature subset and the thresholds for feature discretization. The population, that is, the codes for the feature subset and the thresholds for feature discretization, is initialized to random values before the search process. The parameters to be searched must be encoded on chromosomes, and the encoded chromosomes are evolved to maximize a specific fitness function. The objectives of this paper are to select a relevant feature subset and to approximate rational thresholds for feature discretization that yield correct solutions. These objectives can be represented by the average prediction accuracy on the test data, so this paper uses it as the fitness function. The fitness function is represented mathematically as the following equation:

    Fitness = (1/n) Σ_{i=1}^{n} CR_i,   i = 1, 2, ..., n

    CR_i = 1 if PO_i = AO_i; otherwise CR_i = 0

where CR_i is the prediction result for the ith trading day (denoted by 0 or 1), PO_i is the predicted output of the model for the ith trading day, AO_i is the actual output for the ith trading day, and n is the total number of trading days. In this stage, the GA applies crossover and mutation to the initial chromosomes and iterates until the stopping conditions are satisfied.

Stage 2. The second stage is the process of case retrieval and matching for a new problem in the CBR system. In this stage, nearest-neighbor matching is used as the method of case retrieval. It is a popular retrieval method because it can easily be applied to numeric data, including financial data.

Stage 3. In the third stage, the selected feature subset and the thresholds for feature discretization are applied to the holdout data. This stage is required because the GA optimizes the parameters to maximize the average predictive accuracy on the test data, but sometimes the optimized parameters do not generalize to unknown data. Table 3 describes the algorithm of SOCBR.

Table 3. Steps of the SOCBR algorithm.

Step 0. Initialize the populations (the feature subset and the thresholds for feature discretization).
        (Set to small random values between 0.0 and 1.0.)
Step 1. While the stopping condition is false, do Steps 2–9.
Step 2.   Do Steps 3–8.
Step 3.     Compute the distance d_ab between a new case x_b in the test case-base
            and each case x_a in the reference case-base:
            d_ab = Σ_{i=1}^{n} W_i (x_ai − x_bi)²
Step 4.     Seek the best neighboring case x_a in the reference case-base that is closest to x_b
            according to the distance function.
Step 5.     Calculate the output for x_b from the output of x_a.
Step 6.     Calculate fitness.
            (Fitness function: average predictive accuracy on the test case-base.)
Step 7.   Select individuals to become parents of the next generation.
Step 8.   Create the next generation from the parent pool.
          (Perform crossover and mutation.)
Step 9. Test the stopping condition.
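Steps 3–5 of the algorithm might look like the following sketch. The weight vector here plays the role of W_i (which SOCBR sets from the selection codes), and the case values and outputs are hypothetical.

import numpy as np

def retrieve_nearest(new_case, reference_cases, reference_outputs, weights):
    """Steps 3-5: weighted distance to every reference case, pick the closest,
    and reuse its output as the prediction for the new case."""
    x_b = np.asarray(new_case, dtype=float)
    dists = [np.sum(weights * (np.asarray(x_a, dtype=float) - x_b) ** 2)
             for x_a in reference_cases]
    best = int(np.argmin(dists))
    return reference_outputs[best]

weights = np.array([1, 0, 1, 1])            # hypothetical selection codes
ref_cases = [[0.2, 0.5, 0.1, 0.9], [0.8, 0.3, 0.7, 0.4]]
ref_outputs = [1, 0]                        # hypothetical up/down directions
print(retrieve_nearest([0.25, 0.6, 0.2, 0.85], ref_cases, ref_outputs, weights))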

4. Research Design and Experiments

The research data used in this study come from the daily Korea Composite Stock Price Index (KOSPI) from January 1989 to December 1998. The total number of samples is 2,928 trading days. The initial features are 12 technical indicators. Table 4 describes the initially selected features, which were chosen based on the review of domain experts and prior research.

Table 5 presents the summary statistics for each feature. As mentioned in Section 3, the case-base used in this study is split into reference, test, and holdout case-bases. The number of cases in each case-base is shown in Table 6.

This study needs three sets of parameters to perform the experiments. The first set represents the thresholds for feature discretization. The second set represents the distances between the discretizing thresholds. These distances automatically determine the number of categories for discretization, because a threshold is discarded if it exceeds the maximum value of the corresponding feature. The third set is the selection codes for the relevant feature subset.

The strings used in this study have the following encoding. The first 48 bits represent the thresholds for feature discretization; these values vary between −3 and 2. Each feature is discretized into at most 5 categories, so 4 thresholds are needed per feature. The next 36 bits indicate the distances between the discretizing thresholds; these values are searched from 0 to 100. As mentioned earlier, the GA searches for the number of categories to be discretized using these bits.

Table 4. Selected features and their formulas.

%K (Stochastic %K) [54]
    Compares where a security's price closed relative to its price range over a given time period.
    %K = (C_t − LL_{t−n}) / (HH_{t−n} − LL_{t−n}) × 100, where LL_t and HH_t mean the lowest low
    and the highest high in the last t days, respectively.

%D (Stochastic %D) [54]
    Moving average of %K.
    %D = (Σ_{i=0}^{n−1} %K_{t−i}) / n

Slow %D (Stochastic slow %D) [55]
    Moving average of %D.
    Slow %D = (Σ_{i=0}^{n−1} %D_{t−i}) / n

Momentum [56]
    Measures the amount that a security's price has changed over a given time span.
    Momentum = C_t − C_{t−4}

ROC (Price Rate-of-Change) [57]
    Displays the difference between the current price and the price n days ago.
    ROC = C_t / C_{t−n} × 100

Williams' %R (Larry Williams' %R) [54]
    A momentum indicator that measures overbought/oversold levels.
    %R = (H_n − C_t) / (H_n − L_n) × 100

A/D Oscillator (Accumulation/Distribution Oscillator) [56]
    A momentum indicator that associates changes in price.
    A/D = (H_t − C_{t−1}) / (H_t − L_t)

Disparity5 (5-day disparity) [58]
    The distance between the current price and the 5-day moving average.
    Disparity5 = C_t / MA_5 × 100

Disparity10 (10-day disparity) [58]
    Disparity10 = C_t / MA_10 × 100

OSCP (Price Oscillator) [54]
    Displays the difference between two moving averages of a security's price.
    OSCP = (MA_5 − MA_10) / MA_5

CCI (Commodity Channel Index) [54, 56]
    Measures the variation of a security's price from its statistical mean.
    CCI = (M_t − SM_t) / (0.015 × D_t), where M_t = (H_t + L_t + C_t) / 3,
    SM_t = (Σ_{i=1}^{n} M_{t−i+1}) / n, and D_t = (Σ_{i=1}^{n} |M_{t−i+1} − SM_t|) / n.

RSI (Relative Strength Index) [54]
    A price-following oscillator that ranges from 0 to 100.
    RSI = 100 − 100 / (1 + (Σ_{i=0}^{n−1} Up_{t−i} / n) / (Σ_{i=0}^{n−1} Dw_{t−i} / n)),
    where Up_t is the upward price change and Dw_t is the downward price change at time t.

C_t: closing price at time t, L_t: low price at time t, H_t: high price at time t, MA_t: moving average over t days.
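As an illustration of how such indicators are derived from raw price series, the sketch below computes stochastic %K and momentum following the formulas in Table 4; the price arrays are hypothetical and the look-back length n is an assumed parameter.

import numpy as np

def stochastic_k(close, low, high, n=5):
    """%K on the last day: position of the close within the n-day high-low range."""
    ll = np.min(low[-n:])    # lowest low of the last n days
    hh = np.max(high[-n:])   # highest high of the last n days
    return (close[-1] - ll) / (hh - ll) * 100.0

def momentum(close, span=4):
    """Price change over the given time span (C_t - C_{t-span})."""
    return close[-1] - close[-1 - span]

# Hypothetical daily prices.
close = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7])
low   = close - 1.0
high  = close + 1.0
print(stochastic_k(close, low, high, n=5), momentum(close, span=4))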

Table 5. Summary statistics.

Feature name     Max        Min        Mean     Standard deviation
%K               100.007    0.000      45.407   33.637
%D               100.000    0.000      45.409   28.518
Slow %D          99.370     0.423      45.397   26.505
Momentum         102.900    −108.780   −0.458   21.317
ROC              119.337    81.992     99.994   3.449
Williams' %R     100.000    −0.107     54.593   33.637
A/D oscillator   3.730      −0.157     0.447    0.334
Disparity5       110.003    90.077     99.974   1.866
Disparity10      115.682    87.959     99.949   2.682
OSCP             5.975      −7.461     −0.052   1.330
CCI              226.273    −221.448   −5.945   80.731
RSI              100.000    0.000      47.598   29.531

A threshold is not used if the searched threshold exceeds the maximum value of its feature. The upper limit of the number of categories is 5 and the lower limit is 1; this number is determined automatically by the search process.

The next 12 bits represent the selection codes for the relevant feature subset. These bits are searched as 0 or 1. Each bit indicates whether the associated feature is included in or excluded from the reasoning process: a feature with "0" is excluded and a feature with "1" is included in the case retrieval step.
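A minimal sketch of decoding a 96-gene string with this layout (48 threshold genes, 36 distance genes, 12 selection codes) is shown below. The grouping into 12 features with 4 thresholds and 3 inter-threshold distances each, and the random values, are assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder chromosome: 48 + 36 + 12 = 96 genes.
thresholds = rng.uniform(-3.0, 2.0, size=48)   # discretization thresholds
distances  = rng.uniform(0.0, 100.0, size=36)  # distances between thresholds
selection  = rng.integers(0, 2, size=12)       # 0 = exclude feature, 1 = include
chromosome = np.concatenate([thresholds, distances, selection])

# Decode: assume 12 features, 4 threshold genes and 3 distance genes per feature.
thr = chromosome[:48].reshape(12, 4)
dst = chromosome[48:84].reshape(12, 3)
mask = chromosome[84:].astype(int)

print(thr[0], dst[0], mask)  # genes governing the first feature, plus the selection codes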
For the controlling parameters of the GA search, the population size is set to 100 organisms, and the crossover and mutation rates are varied to prevent the search from falling into a local minimum. The crossover rate ranges between 0.5 and 0.7 while the mutation rate ranges from 0.05 to 0.1. This study performs crossover using a uniform crossover routine. The uniform crossover method is considered better at preserving the schema and can generate any schema from the two parents, while single-point and two-point crossover methods may bias the search with the irrelevant position of the features. For the mutation method, this study generates a random number between 0 and 1 for each of the features in the organism; if a feature gets a number that is less than or equal to the mutation rate, then that feature is mutated. As the stopping condition, only 5,000 trials are permitted. The parameters to be searched use only the information from the reference and the test case-bases.

5. Experimental Results

In this section, the prediction performances of the four models are compared. Table 7 reports the average prediction accuracy of each model for the holdout data. In Table 7, SOCBR achieves higher prediction accuracy than COCBR, FWCBR, and FSCBR by 8.62, 6.56, and 6.05 percentage points respectively for the holdout data. FSCBR outperforms COCBR and FWCBR by 2.57 and 0.51 percentage points, and FWCBR outperforms COCBR by 2.06 percentage points. These results may be attributed to the benefits of the global search technique.

The McNemar tests are used to examine whether SOCBR significantly outperforms the other three models. This test is used with nominal data and is particularly useful with before-after measurements of the same subjects [59]. Table 8 shows the results of the McNemar test comparing the performances of the four models on the holdout data.

As shown in Table 8, SOCBR outperforms COCBR at the 1% statistical significance level and outperforms FSCBR and FWCBR at the 5% level, while the other models do not significantly outperform each other.

In addition, the two-sample test for proportions is performed. This test is designed to distinguish between two proportions; here, the prediction accuracy of each method in the left column is compared with that of each method in the top row [60].
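For reference, the two significance tests used in this section can be computed as in the following sketch. The discordant counts passed to the McNemar statistic are hypothetical, the statistic uses the common continuity-corrected form (which may differ from the exact variant applied in the paper), and the accuracies and holdout sample size in the proportion-test call approximate the totals reported in Tables 6 and 7.

import math

def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar chi-square from the two discordant counts:
    b = cases model A got right and model B got wrong, c = the reverse."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def two_proportion_z(acc1, acc2, n1, n2):
    """z statistic for the difference between two prediction accuracies."""
    p = (acc1 * n1 + acc2 * n2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error
    return (acc1 - acc2) / se

print(mcnemar_statistic(b=60, c=31))            # hypothetical discordant counts
print(two_proportion_z(0.608, 0.521, 581, 581))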

Table 6. Number of cases in each case-base, by year.

Case-base   1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  Total
Reference    162   163   163   165   165   165   164   164   163   163  1,637
Test          70    70    71    71    72    72    71    71    71    71    710
Holdout       57    58    58    58    59    59    58    58    58    58    581
Total        289   291   292   294   296   296   293   293   292   292  2,928

Table 7. Average prediction accuracy for the holdout data.

Year    COCBR    FWCBR    FSCBR    SOCBR
1989    56.1     52.6     52.6     57.9
1990    50.0     58.6     58.6     62.1
1991    51.7     55.2     56.9     62.1
1992    44.8     51.7     44.8     63.8
1993    49.2     52.5     54.2     61.0
1994    52.5     54.2     59.3     57.6
1995    58.6     55.2     55.2     60.3
1996    62.1     56.9     60.3     56.9
1997    51.7     53.4     53.4     62.1
1998    44.8     51.7     51.7     63.8
Total   52.14%   54.20%   54.71%   60.76%

Table 8. McNemar values for the holdout data.

         FWCBR    FSCBR    SOCBR
COCBR    0.712    0.912    10.261**
FWCBR             0.019    5.611*
FSCBR                      4.234*

* Significant at the 5% level.  ** Significant at the 1% level.

Table 9. p values for the holdout data.

         FWCBR    FSCBR    SOCBR
COCBR    0.2409   0.1900   0.0016
FWCBR             0.4307   0.0120
FSCBR                      0.0185

Table 9 shows the p values for the pairwise comparison of performance between the models. As shown in Table 9, SOCBR outperforms COCBR at the 1% statistical significance level and outperforms FSCBR and FWCBR at the 5% level.

6. Conclusions

This paper has suggested a new hybrid model of the GA and CBR to overcome the limitations of prior studies. In this paper, we use the GA for the CBR system in two ways: we first adopt feature discretization based on the GA, and second, we use the GA to select the relevant feature subset for the CBR system. From the results of the experiment, it is apparent that for stock market prediction the hybrid model of GA and CBR offers a viable alternative approach. Empirical results show that SOCBR offers better predictive performance than FSCBR, FWCBR, and COCBR.

This study has some limitations. Other factors may enhance the prediction performance of CBR if they are incorporated into the simultaneous optimization model. SOCBR produces valid results; however, the GA can potentially be used to optimize other steps of the reasoning process of the CBR system, including case deletion and case adaptation. The prediction performance may be enhanced if the GA is also employed for relevant instance selection, and this remains an interesting topic for further study. In addition, further research may extend the method of feature discretization using other global search algorithms, including tabu search. Of course, the generalizability of SOCBR should be tested further by applying it to other problem domains.

Acknowledgments

This work was supported by the Dongguk University Research Fund.

References

1. T. Kimoto, K. Asakawa, M. Yoda, and M. Takeoka, "Stock market prediction system with modular neural network," in Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, 1990, pp. 1–6.
2. K. Kamijo and T. Tanigawa, "Stock price pattern recognition: A recurrent neural network approach," in Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, 1990, pp. 215–221.
3. H. Ahmadi, "Testability of the arbitrage pricing theory by neural networks," in Proceedings of the International Conference on Neural Networks, San Diego, CA, 1990, pp. 385–393.
4. Y. Yoon and G. Swales, "Predicting stock price performance: A neural network approach," in Proceedings of the 24th Annual Hawaii International Conference on System Sciences, Hawaii, 1991, pp. 156–162.
5. R.R. Trippi and D. DeSieno, "Trading equity index futures with a neural network," The Journal of Portfolio Management, vol. 19, pp. 27–33, 1992.
6. J.H. Choi, M.K. Lee, and M.W. Rhee, "Trading S&P 500 stock index futures using a neural network," in Proceedings of the Third Annual International Conference on Artificial Intelligence Applications on Wall Street, New York, 1995, pp. 63–72.
7. Y. Hiemstra, "Modeling structured nonlinear knowledge to predict stock market returns," in Chaos & Nonlinear Dynamics in the Financial Markets: Theory, Evidence and Applications, edited by R.R. Trippi, Irwin: Chicago, IL, 1995, pp. 163–175.

8. R. Tsaih, Y. Hsu, and C.C. Lai, "Forecasting S&P 500 stock index futures with a hybrid AI system," Decision Support Systems, vol. 23, pp. 161–174, 1998.
9. K. Kim and I. Han, "Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index," Expert Systems with Applications, vol. 19, pp. 125–132, 2000.
10. R.R. Trippi and E. Turban, Neural Networks in Finance and Investing, Probus, 1992.
11. K. Shin and I. Han, "Case-based reasoning supported by genetic algorithms for corporate bond rating," Expert Systems with Applications, vol. 16, pp. 85–95, 1999.
12. H. Liu and R. Setiono, "Dimensionality reduction via discretization," Knowledge-Based Systems, vol. 9, pp. 67–72, 1996.
13. M. Dash and H. Liu, "Feature selection methods for classifications," Intelligent Data Analysis-An International Journal, vol. 1, pp. 131–156, 1997.
14. C.E. Brown and U.G. Gupta, "Applying case-based reasoning to the accounting domain," International Journal of Intelligent Systems in Accounting, Finance and Management, vol. 3, pp. 205–221, 1994.
15. J. Kolodner, Case-Based Reasoning, Morgan Kaufmann: San Mateo, CA, 1993.
16. K.A. Ting, "Discretization in lazy learning algorithms," Artificial Intelligence Review, vol. 11, pp. 157–174, 1997.
17. U.M. Fayyad and K.B. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning," in Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993, pp. 1022–1027.
18. J. Dougherty, R. Kohavi, and M. Sahami, "Supervised and unsupervised discretization of continuous features," in Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA, 1995, pp. 194–202.
19. P.D. Scott, K.M. Williams, and K.M. Ho, "Forming categories in exploratory data analysis and data mining," in Advances in Intelligent Data Analysis, edited by X. Liu, P. Cohen, and M. Berthold, Springer-Verlag: Berlin, 1997, pp. 235–246.
20. R. Susmaga, "Analyzing discretizations of continuous attributes given a monotonic discrimination function," Intelligent Data Analysis-An International Journal, vol. 1, pp. 157–179, 1997.
21. S. Lawrence, A.C. Tsoi, and C.L. Giles, "Noisy time series prediction using symbolic representation and recurrent neural network grammatical inference," Technical Report UMIACS-TR-96-27 and CS-TR-3625, Institute for Advanced Computer Studies, University of Maryland, 1996.
22. P. Buhlmann, "Extreme events from the return-volume process: A discretization approach for complexity reduction," Applied Financial Economics, vol. 8, pp. 267–278, 1998.
23. P. Kontkanen, P. Myllymaki, T. Silander, and H. Tirri, "A Bayesian approach to discretization," in Proceedings of the European Symposium on Intelligent Techniques, 1997, pp. 265–268.
24. S. Piramuthu, H. Ragavan, and M.J. Shaw, "Using feature construction to improve the performance of neural networks," Management Science, vol. 44, pp. 416–430, 1998.
25. J. Martens, G. Wets, J. Vanthienen, and C. Mues, "An initial comparison of a fuzzy neural classifier and a decision tree based classifier," Expert Systems with Applications, vol. 15, pp. 375–381, 1998.
26. D. Wettschereck, D.W. Aha, and T. Mohri, "A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms," Artificial Intelligence Review, vol. 11, pp. 273–314, 1997.
27. J.D.J. Kelly and L. Davis, "Hybridizing the genetic algorithm and the k nearest neighbors classification algorithm," in Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann: San Diego, CA, 1991, pp. 377–383.
28. T.W. Liao, Z.M. Zhang, and C.R. Mount, "A case-based reasoning system for identifying failure mechanisms," Engineering Applications of Artificial Intelligence, vol. 13, pp. 199–213, 2000.
29. S.H. Kim and S.W. Shin, "Identifying the impact of decision variables for nonlinear classification tasks," Expert Systems with Applications, vol. 18, pp. 201–214, 2000.
30. G. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem," in Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann: New Brunswick, NJ, 1994, pp. 121–129.
31. H. Wang, D. Bell, and F. Murtagh, "Relevance approach to feature subset selection," in Feature Extraction, Construction and Selection: A Data Mining Perspective, edited by H. Liu and H. Motoda, Kluwer Academic Publishers: Boston, 1998.
32. J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," in Feature Extraction, Construction and Selection: A Data Mining Perspective, edited by H. Liu and H. Motoda, Kluwer Academic Publishers: Boston, 1998.
33. A. Vellido, P.J.G. Lisboa, and J. Vaughan, "Neural networks in business: A survey of applications," Expert Systems with Applications, vol. 17, pp. 51–70, 1999.
34. C. Ornes and J. Sklansky, "A neural network that explains as well as predicts financial market behavior," in Proceedings of the IEEE/IAFE, 1997, pp. 43–49.
35. J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems and Their Applications, vol. 13, pp. 44–49, 1998.
36. J. Bala, J. Huang, H. Vafaie, K. DeJong, and H. Wechsler, "Hybrid learning using genetic algorithms and decision trees for pattern classification," in Proceedings of the International Joint Conference on Artificial Intelligence, 1995, pp. 19–25.
37. H. Vafaie and K. De Jong, "Feature space transformation using genetic algorithms," IEEE Intelligent Systems and Their Applications, vol. 13, pp. 57–65, 1998.
38. B.C. Wallet, D.J. Marchette, J.L. Solka, and E.J. Wegman, "A genetic algorithm for best subset selection in linear regression," in Proceedings of the 28th Symposium on the Interface, 1996.
39. W. Siedlecki and J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10, pp. 335–347, 1989.
40. C. Cardie, "Using decision trees to improve case-based learning," in Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann: San Francisco, CA, 1993, pp. 25–32.
41. D.B. Skalak, "Prototype and feature selection by sampling and random mutation hill climbing algorithms," in Proceedings of the Eleventh International Conference on Machine Learning, New Jersey, 1994, pp. 293–301.

42. P. Domingos, "Context-sensitive feature selection for lazy learners," Artificial Intelligence Review, vol. 11, pp. 227–253, 1997.
43. C. Cardie and N. Howe, "Improving minority class prediction using case-specific feature weights," in Proceedings of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann: San Francisco, CA, 1997, pp. 57–65.
44. J. Jarmulak, S. Craw, and R. Rowe, "Genetic algorithms to optimise CBR retrieval," in Advances in Case-Based Reasoning: Proceedings of EWCBR-2K, edited by E. Blanzieri and L. Portinale, Trento, Italy, 2000, pp. 136–147.
45. J. Jarmulak, S. Craw, and R. Rowe, "Self-optimising CBR retrieval," in ICTAI-2000 Proceedings, Vancouver, Canada, 2000, pp. 376–383.
46. J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann: San Mateo, CA, 1993.
47. P. Buta, "Mining for financial knowledge with CBR," AI Expert, vol. 9, pp. 34–41, 1994.
48. H. Adeli and S. Hung, Machine Learning: Neural Networks, Genetic Algorithms, and Fuzzy Systems, Wiley: New York, 1995.
49. R.J. Bauer, Genetic Algorithms and Investment Strategies, Wiley: New York, 1994.
50. L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold: New York, 1994.
51. J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley: Reading, MA, 1991.
52. F. Wong and C. Tan, "Hybrid neural, genetic and fuzzy systems," in Trading on the Edge, edited by G.J. Deboeck, Wiley: New York, 1994, pp. 243–261.
53. A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall: NJ, 1988.
54. S.B. Achelis, Technical Analysis from A to Z, Probus Publishing: Chicago, 1995.
55. E. Gifford, Investor's Guide to Technical Analysis: Predicting Price Action in the Markets, Pitman Publishing: London, 1995.
56. J. Chang, Y. Jung, K. Yeon, J. Jun, D. Shin, and H. Kim, Technical Indicators and Analysis Methods, Jinritamgu Publishing: Seoul, 1996.
57. J.J. Murphy, Technical Analysis of the Futures Markets: A Comprehensive Guide to Trading Methods and Applications, Prentice-Hall: New York, 1986.
58. J. Choi, Technical Indicators, Jinritamgu Publishing: Seoul, 1995.
59. D.R. Cooper and C.W. Emory, Business Research Methods, Irwin: Chicago, IL, 1995.
60. D.L. Harnett and A.K. Soni, Statistical Methods for Business and Economics, Addison-Wesley: Reading, MA, 1991.

Kyoung-jae Kim is an assistant professor of Information Systems at Dongguk University. He received his M.S. and Ph.D. degrees in Management Information Systems from the Graduate School of Management at the Korea Advanced Institute of Science and Technology and his B.A. degree from Chung-Ang University. His research interests include data mining, knowledge management, and intelligent agents. His articles have been accepted or published in Expert Systems, Expert Systems with Applications, Intelligent Data Analysis, Intelligent Systems in Accounting, Finance and Management, Neural Computing & Applications, Neurocomputing, and other journals.