
An Algorithmic Description of ACS2

Martin V. Butz (1) and Wolfgang Stolzmann (2)

(1) Department of Cognitive Psychology
University of Würzburg, Germany
butz@psychologie.uni-wuerzburg.de
(2) DaimlerChrysler AG
Research and Technology
Berlin, Germany
wolfgang.stolzmann@daimlerchrysler.com

Abstract. The various modifications and extensions of the anticipatory
classifier system (ACS) recently led to the introduction of ACS2, an
enhanced and modified version of ACS. This chapter provides an overview
of the system, including all parameters as well as framework, structure,
and environmental interaction. Moreover, a precise description of
all algorithms in ACS2 is provided.

1 Introduction

Anticipatory learning classifier systems (ALCSs) are a new type of classifier
system. The major addition in ALCSs is that they comprise the notion of
anticipations in their framework. Consequently, the systems are predominantly
able to anticipate the perceptual consequences of actions independent of a
reinforcement prediction. Thus, ALCSs are systems that are able to form a
complete anticipatory representation, that is, they build an environmental
model. The model specifies which changes take place in an environment after
the execution of a specific action with respect to the current situation. The
essential intention behind the framework is that the representation of an
environmental model allows faster and more intelligent adaptation of behavior
or problem classification. By anticipating the consequences of actions with
the evolving model, the system is able to adapt its behavior faster and beyond
the capabilities of reinforcement learning methods (Stolzmann, Butz, Hoffmann,
& Goldberg, 2000; Butz, 2001a).

The system ACS2 is derived from the original ACS framework as introduced
in Stolzmann (1997) and Stolzmann (1998). Moreover, ACS2 embodies the more
recently introduced genetic generalization mechanism (Butz, Goldberg, &
Stolzmann, 2000). This paper provides a precise algorithmic description of
ACS2. The description proceeds in a top-down manner, detailing first the
overall learning cycle. The subsequent subsections specify the single parts
of the cycle in more detail. This article should be read in conjunction with
Butz (2001a), which provides a more comprehensive introduction to ACS2 as
well as a previous version of this algorithmic description. The interested
reader is also referred to the other literature cited above for further
background.

The next section gives an overview of ACS2's framework, rule structure, and
environmental interaction. Section 3 provides the actual algorithmic
description. We hope that the description, in combination with the
explanations about framework, structure, parameters, and environmental
interaction, facilitates research with ACS2. We would like to encourage
feedback regarding potential problems or ambiguities. Moreover, the usage of
the available ACS2 code is encouraged (Butz, 2001b).

2 Environment, Knowledge Representation, and Parameters

Before rushing into the algorithmic description, we provide an overview of the
basic environmental interaction of ACS2 as well as its internal structure.
Moreover, a list of all parameters in ACS2 is provided, with suggested
parameter settings and hints on how to set the parameters with respect to a
specific problem.

2.1 Environmental Interaction

Similar to reinforcement learning agents, ACS2 interacts autonomously with an
environment. It perceives situations σ ∈ I = {ι1, ι2, ..., ιm}^L where m is
the number of possible values of each environmental attribute (or feature),
ι1, ..., ιm are the different possible values of each attribute, and L is the
string length. Note that each attribute is not necessarily coded binary but
can only take on discrete values. Moreover, the system acts upon the
environment with actions α ∈ A = {α1, α2, ..., αn} where n specifies the
number of different possible actions in the environment and α1, ..., αn are
the different possible actions. After the execution of an action, the
reinforcement program evaluates the action in the environment and provides a
scalar reward ρ(t) ∈ R as feedback.
Figure 1 illustrates the interaction. Hereby, the reinforcement program is
denoted by a separate module. In accordance with Dorigo and Colombetti (1997),
we separate the reinforcement program from the environment, since the
reinforcement could be provided not only by the environment itself, but also
by an independent teacher, or even by ACS2 itself, viewing the system in this
case as an adaptive agent with certain needs that produce an internal
reinforcement once being satisfied. ACS2 could represent certain motivations
and intentions that would influence the reinforcement provision. For example,
certain environmental properties might be highly desired by ACS2 so that the
achievement of one of the properties would trigger high reinforcement. Thus,
to what extent the actual reinforcement is influenced by the ACS2 agent is
highly problem dependent.

The figure also represents the basic knowledge representation and main
intuition behind ACS2. Reinforcement and perceptual feedback trigger learning
in ACS2. Reinforcement as well as anticipations are represented in a common
model. The knowledge represented in the model controls the action selection,
i.e., the behavioral policy. Moreover, the model is intended to be exploited
for improving the behavioral policy. How the model is represented in ACS2 is
addressed in the following section.

[Figure 1: block diagram showing the Environment, the Reinforcement Program,
and ACS2 (Model & Behavioral Policy, Policy Learner, Model Exploitation,
Influence) exchanging state, situation, action, and reinforcement.]

Fig. 1. ACS2 interacts with an environment, perceiving environmental
situations and executing actions in the environment. Reinforcement is
provided by a separate reinforcement program that evaluates the current
environmental state and might be more or less influenced by ACS2.

2.2 Knowledge Representation

As in other LCSs, the knowledge in ACS2 is represented by a population [P]
of classifiers. Each classifier represents a condition-action-effect rule
that anticipates the model state resulting from the execution of the
specified action given the specified condition. A classifier in ACS2 always
specifies a complete resulting state. It consists of the following components.

C    The condition part (C ∈ {ι1, ι2, ..., ιm, #}^L) specifies the set of
     situations (perceptions) in which the classifier can be applied.
A    The action part (A ∈ A) proposes an available action.
E    The effect part (E ∈ {ι1, ι2, ..., ιm, #}^L) anticipates the effects
     that the classifier 'believes' to be caused by the specified action.
M    The mark (M = (m1, m2, ..., mL) with mi ⊆ {ι1, ι2, ..., ιm}) records the
     properties in which the classifier did not work correctly before.
q    The quality (q) measures the accuracy of the anticipations.
r    The reward prediction (r) predicts the reward expected after the
     execution of action A given condition C.
ir   The immediate reward prediction (ir) predicts the reinforcement directly
     encountered after the execution of action A.
tga  The GA time stamp (tga ∈ N) records the last time the classifier was
     part of an action set in which a GA was applied.
talp The ALP time stamp (talp ∈ N) records the time the classifier underwent
     the last anticipatory learning process (ALP) update.
aav  The application average (aav ∈ R) estimates the ALP update frequency.
exp  The experience counter (exp ∈ N) counts the number of times the
     classifier underwent the ALP.
num  The numerosity (num ∈ N) specifies the number of actual
     (micro-)classifiers this macroclassifier represents.

The condition and effect part consist of the values perceived from the
environment and '#'-symbols. A '#'-symbol in the condition, called
'don't-care'-symbol, denotes that the classifier matches any value in this
attribute. A '#'-symbol in the effect part, called 'pass-through'-symbol,
specifies that the classifier anticipates that the value of this attribute
will not change after the execution of the specified action. All classifier
parts are modified by the anticipatory learning process (ALP), the
reinforcement learning application, and the genetic generalization
mechanism, which are described in sections 3.6, 3.7, and 3.8, respectively.
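The classifier structure above can be sketched as a simple data record. The
following Python sketch is purely illustrative (it is not the authors'
implementation); the field names follow the component list above, and the
initial values of q, r, exp, and num mirror those used by covering later in
the text.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Classifier:
    """Condition-action-effect rule of ACS2 (illustrative sketch)."""
    C: List[str]                    # condition part; '#' = don't-care
    A: int                          # action part
    E: List[str]                    # effect part; '#' = pass-through
    M: List[Set[str]] = None        # mark: per-attribute sets of values
    q: float = 0.5                  # quality of the anticipations
    r: float = 0.0                  # reward prediction
    ir: float = 0.0                 # immediate reward prediction
    tga: int = 0                    # GA time stamp
    talp: int = 0                   # ALP time stamp
    aav: float = 0.0                # application average
    exp: int = 0                    # experience counter
    num: int = 1                    # numerosity (micro-classifiers)

    def __post_init__(self):
        # An empty mark: one (initially empty) value set per attribute.
        if self.M is None:
            self.M = [set() for _ in self.C]

# Example: a rule over a perceptual string of length three.
cl = Classifier(C=list("1#0"), A=0, E=list("0#1"))
```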

2.3 Parameters

The following parameters control the various learning methods in ACS2. We
first provide a list of all parameters and then discuss their usage and
default values in further detail.

θ_i   The inadequacy threshold (θ_i ∈ [0, 1]) specifies when a classifier is
      regarded as inadequate, determined by its quality q.
θ_r   The reliability threshold (θ_r ∈ [0, 1]) specifies when a classifier is
      regarded as reliable, determined by q.
β     The learning rate (β ∈ [0, 1]) is used in ALP and RL updates affecting
      q, r, ir, and aav.
γ     The discount factor (γ ∈ [0, 1)) discounts the maximal reward expected
      in the subsequent situation.
u_max The specificity threshold (u_max ∈ N) specifies the maximum number of
      specified attributes in C that are anticipated to stay the same in E.
ε     The exploration probability (ε ∈ [0, 1]) specifies the probability of
      choosing a random action, similar to the ε-greedy policy in
      reinforcement learning.
θ_ga  The GA application threshold (θ_ga ∈ N) controls the GA frequency. A GA
      is applied in an action set if the average delay since the last GA
      application of the classifiers in the set is greater than θ_ga.
μ     The mutation rate (μ ∈ [0, 1]) specifies the probability of changing a
      specified attribute in the conditions of an offspring to a '#'-symbol
      in a GA application.
χ     The crossover probability (χ ∈ [0, 1]) specifies the probability of
      applying crossover in the conditions of the offspring when a GA is
      applied.
θ_as  The action set size threshold (θ_as ∈ N) specifies the maximal number
      of classifiers in an action set, which is controlled by the GA.
θ_exp The experience threshold (θ_exp ∈ N) specifies when a classifier is
      regarded as experienced, determined by exp.

Although seemingly influenced by a lot of parameters, studies showed that
ACS2 is relatively robust to the chosen parameter setting. Usually, all
parameters can be set to standard values. In the following, we specify
default values and give the basic intuition behind the parameters.

The inadequacy threshold θ_i is usually set to the standard 0.1. Even a
value of 0.0 showed not to negatively influence the performance of ACS2,
since in that case genetic generalization takes care of inadequate
classifiers. A classifier is regarded as inadequate once its quality q falls
below θ_i. In general, since inadequate classifiers are deleted by the ALP
(see section 3.6), θ_i should be set to a low value. The reliability
threshold θ_r, on the other hand, determines when a classifier is regarded
as reliable and consequently becomes part of the internal environmental
model. The standard value is 0.9. Generally, the higher the value is set,
the longer it takes to form a complete model, but the more reliable the
model actually is. A more crucial parameter is the learning rate β, which
influences the update procedure of several parameters. The usual value is
0.05, which can be regarded as a rather passive value. The higher β, the
faster parameters approach an approximation of their actual value, but the
noisier the approximation is. The discount factor γ determines the reward
distribution over the environmental model. A usual value is 0.95. It
essentially specifies to what extent future reinforcement influences current
behavior. The closer to one, the more influence delayed reward has on
current behavior. The specificity threshold u_max constrains the
specialization mechanism, namely the ALP, in how far it is allowed to
specialize the conditions. A safe value is always L, the length of the
perceptual string. However, if knowledge is available about the problem,
then the learning process can be strongly supported by a restricted u_max
parameter. The exploration probability ε determines action selection and
consequently behavior. The fastest model learning is usually achieved by
pure random exploration. Further biases in action selection are not
considered in this description. The interested reader should refer to Butz
(2002) in this volume.

The following parameters manipulate the genetic generalization mechanism.
The GA threshold θ_ga controls the frequency with which genetic
generalization is applied. A higher threshold assures that the ALP has
enough time to work on a generalized set if necessary. A default threshold
is 100. Lower thresholds usually keep the population size down but can cause
information loss in the beginning of a run. The mutation rate μ is set
unusually high since it is a directly generalizing mutation. The default is
set to 0.3. Lower values decrease the generalization pressure and
consequently decrease the speed of convergence in the population. Higher
values, on the other hand, can also decrease convergence because of the
higher number of over-general classifiers in the population. The crossover
probability χ is usually set to the standard value of 0.8. Crossover seems
to influence the process only slightly. No problem was found so far in which
crossover actually has a significant effect. The action set size threshold
θ_as is more crucial since it determines the set size for the genetic
generalization application. If the threshold is set too low, the GA might
cause the deletion of important classifiers and consequently disrupt the
learning process. If the size is set very high, the system might learn the
problem but it will take much longer, since the population size will rise a
lot. However, the default value of 20 worked very well in all problems
applied so far. Finally, the experience threshold θ_exp controls when a
classifier is usable as a subsumer. A low threshold might cause the
incorrect propagation of an over-general classifier. However, no negative
effect or major influence has been identified so far. Usually, θ_exp is set
to 20.

3 Algorithmic Description

The provided description approaches the problem in a top-down manner. First,
the overall execution cycle is specified. In the subsequent sections, each
sub-procedure is specified in further detail.

The following notational conventions are used in the description. Each
specified sub-procedure is written in all capital letters. The interaction
with the environment, and particularly requests from the environment or the
reinforcement program, is denoted with a colon. Moreover, to denote a certain
parameter of a classifier we use the dot notation. Finally, it is necessary
to note that we do not use braces or similar means to denote the extent of an
if-clause or a loop, but rather use indentation for direct control.

3.1 Initialization

In the beginning of an ACS2 run, first, all modules need to be initialized.
The environment env must be created and the animat represented by ACS2 needs
to be set to a certain position or state in the environment, and so forth.
Also, the reinforcement program rp must be initialized. Finally, ACS2 itself
must be initialized. Hereby, the parameter settings are determined, the
time-step counter, referred to as t, is set, and the (in the beginning
usually empty) population is created. After all initializations, which we do
not clarify in further detail because of their strong problem and
implementation dependence, the main loop is called.

START ACS2:
 1 initialize environment env
 2 initialize reinforcement program rp
 3 initialize ACS2
 4 RUN EXPERIMENT with population [P] and initial time t

3.2 The Main Execution Loop

The main loop RUN EXPERIMENT is executed as long as some termination
criteria are not met. In the main loop, the current situation is first
sensed (perceived as input). Second, the match set [M] is formed from all
classifiers that match the situation. If this is not the beginning of a
trial, ALP, reinforcement learning, and GA are applied in the previous
action set. Next, an action is chosen for execution, the action is executed,
and an action set is generated from all classifiers in [M] that specify the
chosen action. After some parameter updates, ALP, reinforcement learning,
and GA may be applied in [A] if the execution of the action led to the end
of one trial. Finally, after [A] is stored for learning in the next step,
the loop is redone. In the case of an end of trial, [A]₋₁ needs to be
emptied to prevent incorrect learning over a trial barrier (i.e., since the
successive situation is unrelated to the previous one).

The main loop calls many sub-procedures, denoted in capital letters, which
are described below in further detail. Some of the procedures are more or
less trivial while others are complex and themselves call other
sub-procedures. Each of the sub-sections tries to specify the general idea
and the overall process and then gives a more detailed description of single
parts in successive paragraphs.

RUN EXPERIMENT([P], t):
 1 while(termination criteria are not met)
 2   σ ← env: perceive situation
 3   do
 4     GENERATE MATCH SET [M] out of [P] using σ
 5     if([A]₋₁ is not empty)
 6       APPLY ALP in [A]₋₁ considering σ₋₁, σ, t, and [P]
 7       APPLY REINFORCEMENT LEARNING in [A]₋₁ using ρ and
         max_{cl ∈ [M] ∧ cl.E ≠ {#}^L} (cl.q · cl.r)
 8       APPLY GENETIC GENERALIZATION in [A]₋₁ considering t
 9     act ← CHOOSE ACTION with an ε-greedy policy in [M]
10     GENERATE ACTION SET [A] out of [M] according to act
11     env: execute action act
12     t ← t + 1
13     rp: receive reward ρ
14     σ₋₁ ← σ
15     σ ← env: perceive situation
16     if(env: is end of one trial)
17       APPLY ALP in [A] considering σ₋₁, σ, t, and [P]
18       APPLY REINFORCEMENT LEARNING in [A] using ρ
19       APPLY GENETIC GENERALIZATION in [A] considering t
20     [A]₋₁ ← [A]
21   while(not env: is end of one trial)
22   env: reset position
23   [A]₋₁ ← empty

3.3 Formation of the Match Set

The GENERATE MATCH SET procedure gets as input the current population [P]
and the current situation σ. The procedure in ACS2 is quite trivial. All
classifiers in [P] are simply compared to σ and all matching classifiers are
added to the match set. The sub-procedure DOES MATCH is explained below.

GENERATE MATCH SET([P], σ):
 1 initialize empty set [M]
 2 for each classifier cl in [P]
 3   if(DOES MATCH classifier cl in situation σ)
 4     add classifier cl to set [M]
 5 return [M]

The matching procedure is commonly used in LCSs. A 'don't-care'-symbol '#'
in C matches any symbol in the corresponding position of σ. A 'care', or
non-'#', symbol matches only the exact same symbol at that position. The
DOES MATCH procedure checks each component in the classifier's condition
cl.C. If a component is specified (i.e., it is not a don't-care symbol), it
is compared with the corresponding attribute in the current situation σ.
Only if all comparisons hold does the classifier match σ, and the procedure
returns true.

DOES MATCH(cl, σ):
 1 for each attribute x in cl.C
 2   if(x ≠ # and x ≠ the corresponding attribute in σ)
 3     return false
 4 return true
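The two matching procedures above translate almost line by line into Python.
This is an illustrative sketch, assuming classifiers are plain dictionaries
with a condition string under key 'C':

```python
def does_match(condition, situation):
    """'#' (don't-care) matches anything; a specified symbol must match exactly."""
    return all(c == '#' or c == s for c, s in zip(condition, situation))

def generate_match_set(population, situation):
    """Collect all classifiers whose condition matches the situation."""
    return [cl for cl in population if does_match(cl['C'], situation)]

# Hypothetical three-classifier population over a length-3 perceptual string.
pop = [{'C': '1#0', 'A': 0}, {'C': '11#', 'A': 1}, {'C': '###', 'A': 0}]
```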

3.4 Choosing an Action

In ACS2, usually an ε-greedy method is used for action selection. However,
unlike in non-generalizing reinforcement learning methods, it is not clear
which action is actually the best to choose, since one situation-action
tuple is mostly represented by several distinct classifiers. In this
description we chose the simple method of choosing the action of the
apparently most promising classifier. Since ACS2 also evolves classifiers
that explicitly predict no change in the environment, and there is no such
thing as a waiting necessity in the problems addressed, those classifiers
are excluded from the consideration. The decision is made in the provided
current match set [M].

CHOOSE ACTION([M]):
 1 if(RandomNumber[0, 1) < ε)
 2   return a randomly chosen action possible in env
 3 else
 4   bestCl ← first cl in [M] with cl.E ≠ {#}^L
 5   for all classifiers cl ∈ [M]
 6     if(cl.E ≠ {#}^L and cl.q · cl.r > bestCl.q · bestCl.r)
 7       bestCl ← cl
 8   return bestCl.A
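The selection rule above can be sketched as follows. This is not the
authors' code; classifiers are assumed to be dictionaries with keys 'E',
'A', 'q', and 'r', and the fallback to a random action when [M] contains
only no-change classifiers is an added assumption (the pseudocode leaves
that case open):

```python
import random

def choose_action(match_set, epsilon, possible_actions):
    """Epsilon-greedy: with probability epsilon pick a random action,
    otherwise the action of the classifier maximizing q * r among
    classifiers that predict a change (E != all '#')."""
    if random.random() < epsilon:
        return random.choice(possible_actions)
    best = None
    for cl in match_set:
        if set(cl['E']) != {'#'}:        # exclude no-change classifiers
            if best is None or cl['q'] * cl['r'] > best['q'] * best['r']:
                best = cl
    # Assumption: fall back to a random action if no candidate exists.
    return best['A'] if best is not None else random.choice(possible_actions)

# Hypothetical match set; the first classifier predicts no change.
M = [{'E': '###', 'A': 0, 'q': 0.9, 'r': 10.0},
     {'E': '1##', 'A': 1, 'q': 0.8, 'r': 5.0},
     {'E': '#1#', 'A': 2, 'q': 0.9, 'r': 9.0}]
```

With ε = 0 the choice is deterministic: the no-change classifier is skipped
and the classifier with the largest q · r (here 0.9 · 9.0) wins.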
3.5 Formation of the Action Set

After the match set is formed and an action is chosen for execution, the
GENERATE ACTION SET procedure forms the action set out of the match set. It
includes all classifiers in the current match set [M] that propose the
chosen action act for execution.

GENERATE ACTION SET([M], act):
 1 initialize empty set [A]
 2 for each classifier cl in [M]
 3   if(cl.A = act)
 4     add classifier cl to set [A]
 5 return [A]

3.6 Anticipatory Learning Process

The application of the anticipatory learning process is rather delicate. Due
to its simultaneous creation and deletion of classifiers, it needs to be
assured that newly generated classifiers are added to the current action set
but are not reconsidered in the current ALP application. Deleted classifiers
need to be deleted from the action set without influencing the update
process. The algorithmic description does not address such details. However,
it is necessary to be aware of these possible problems.

The APPLY ALP procedure successively considers the anticipation of each
classifier. If the anticipation is correct or wrong, the EXPECTED CASE or
UNEXPECTED CASE procedure is called, respectively. In the UNEXPECTED CASE
procedure the quality is decreased, so that it is necessary to check whether
the quality dropped below the inadequacy threshold θ_i. If this is the case,
the classifier is removed (regardless of its numerosity num, since all
micro-classifiers are actually inadequate). When adding a new classifier, it
is necessary to check for identical and possibly subsuming classifiers.
Thus, another sub-procedure is called in this case. Finally, if no
classifier in the action set anticipates the encountered change correctly, a
covering classifier is generated and added. The method is usually called
from the main loop. Inputs are the action set [A] in which the ALP is
applied, the situation σ₋₁ - action act tuple from which [A] was generated,
the resulting situation σ, the time t the action was applied, and the
current population [P].

Application Average The UPDATE APPLICATION AVERAGE procedure uses the
moyenne adaptative modifiée (MAM) technique to reach an accurate value of
the application average as soon as possible. Also, the ALP time stamp talp
is set in this procedure. The procedure gets the to-be-updated classifier cl
and the current time t as input.
APPLY ALP([A], σ₋₁, act, σ, t, [P]):
 1 wasExpectedCase ← 0
 2 for each classifier cl in [A]
 3   cl.exp++
 4   UPDATE APPLICATION AVERAGE of cl with respect to t
 5   if(cl DOES ANTICIPATE CORRECTLY σ in σ₋₁)
 6     newCl ← EXPECTED CASE of cl in σ, σ₋₁
 7     wasExpectedCase ← 1
 8   else
 9     newCl ← UNEXPECTED CASE of cl in σ, σ₋₁
10     if(cl.q < θ_i)
11       remove classifier cl from [P] and [A]
12   if(newCl is not empty)
13     newCl.tga ← t
14     ADD ALP CLASSIFIER newCl to [P] and [A]
15 if(wasExpectedCase = 0)
16   newCl ← COVER TRIPLE σ₋₁, act, σ with time t
17   ADD ALP CLASSIFIER newCl to [P] and [A]

UPDATE APPLICATION AVERAGE(cl, t):
 1 if(cl.exp < 1/β)
 2   cl.aav ← cl.aav + (t - cl.talp - cl.aav) / cl.exp
 3 else
 4   cl.aav ← cl.aav + β · (t - cl.talp - cl.aav)
 5 cl.talp ← t
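The MAM technique above computes an exact arithmetic average during the first
1/β applications and then switches to a recency-weighted (Widrow-Hoff)
update. A minimal sketch, assuming the classifier is a dictionary with keys
'exp', 'talp', and 'aav':

```python
def update_application_average(cl, t, beta):
    """MAM update: exact average while exp < 1/beta, then delta rule."""
    delay = t - cl['talp']                      # time since last ALP update
    if cl['exp'] < 1.0 / beta:
        cl['aav'] += (delay - cl['aav']) / cl['exp']
    else:
        cl['aav'] += beta * (delay - cl['aav'])
    cl['talp'] = t

cl = {'exp': 1, 'talp': 0, 'aav': 0.0}
update_application_average(cl, 4, beta=0.05)    # first application
```

After the first call the average equals the observed delay exactly (4 here);
subsequent early calls keep averaging arithmetically until exp reaches 1/β.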

Check Anticipation While the pass-through symbols in the effect part of a
classifier directly anticipate that these attributes stay the same after the
execution of an action, the specified attributes anticipate a change to the
specified value. Thus, if the perceived value did not change to the
anticipated value but actually stayed at the old value, the classifier
anticipates incorrectly. This is considered in the DOES ANTICIPATE CORRECTLY
procedure. Inputs are the to-be-investigated classifier cl, the situation
σ₋₁ in which cl was applied, and the resulting situation σ.

DOES ANTICIPATE CORRECTLY(cl, σ₋₁, σ):
 1 for each position i
 2   if(cl.E[i] = #)
 3     if(σ₋₁[i] ≠ σ[i])
 4       return false
 5   else
 6     if(cl.E[i] ≠ σ[i] or σ₋₁[i] = σ[i])
 7       return false
 8 return true
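The anticipation check translates directly into Python. A sketch, assuming a
dictionary-based classifier with the effect string under key 'E':

```python
def does_anticipate_correctly(cl, sigma_prev, sigma):
    """Pass-through ('#') demands no change at that position; a specified
    effect symbol demands a change to exactly that value."""
    for e, before, after in zip(cl['E'], sigma_prev, sigma):
        if e == '#':
            if before != after:          # anticipated "no change" failed
                return False
        elif e != after or before == after:
            return False                 # wrong value, or no change happened
    return True

cl = {'E': '#1#'}                        # anticipates the middle bit flips to '1'
```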
Expected Case The structure of the expected case can be separated into two
parts, in which either a new classifier is generated or not. No classifier
is generated if the mark of the investigated classifier cl is either empty
or no difference is detected between the mark and the relevant situation σ.
In this case, the quality of the successful classifier is increased and no
new classifier is returned (denoted by returning 'empty'). On the other
hand, if differences are detected between mark and situation, offspring is
generated. It is important to consider the case where the specialization
requests to specialize too many attributes. In this case, generalization of
to-be-specified attributes is necessary. If the offspring without
specialization already reached the threshold u_max, it is necessary to
generalize the offspring to allow specialization of other attributes. The
diff attribute has the structure of a condition part. Its creation is
specified below in another sub-procedure. The handling of
probability-enhanced predictions, as published in Butz, Goldberg, and
Stolzmann (2001), which we do not address in this description, would be
caught in line 3 if the mark is not empty. Moreover, the probabilities in
the parental classifier would be updated in this method.

EXPECTED CASE(cl, σ):
 1 diff ← GET DIFFERENCES of cl.M and σ
 2 if(diff = {#}^L)
 3   cl.q ← cl.q + β · (1 - cl.q)
 4   return empty
 5 else
 6   spec ← number of non-# symbols in cl.C
 7   specNew ← number of non-# symbols in diff
 8   child ← copy classifier cl
 9   if(spec = u_max)
10     remove randomly specific attribute in child.C
11     spec--
12     while(spec + specNew > u_max)
13       if(spec > 0 and random number [0, 1) < 0.5)
14         remove randomly specific attribute in child.C
15         spec--
16       else
17         remove randomly specific attribute in diff
18         specNew--
19   else
20     while(spec + specNew > u_max)
21       remove randomly specific attribute in diff
22       specNew--
23   specify child.C with diff
24   if(child.q < 0.5)
25     child.q = 0.5
26   child.exp ← 1
27   return child
Difference Determination The difference determination needs to distinguish
between two cases. (1) Clear differences are those where one or more
attributes in the mark M do not contain the corresponding attribute in the
situation σ. (2) Fuzzy differences are those where there is no clear
difference but one or more attributes in the mark M specify more than the
one value in σ. In the first case, one random clear difference is specified,
while in the latter case all differences are specified.

GET DIFFERENCES(M, σ):
 1 diff ← {#}^L
 2 if(M is not empty)
 3   type1 ← type2 ← 0
 4   for all positions i in σ
 5     if(M[i] does not contain σ[i])
 6       type1++
 7     else if(|M[i]| > 1)
 8       type2++
 9   if(type1 > 0)
10     type1 ← random number [0, 1) · type1
11     for all positions i in σ
12       if(M[i] does not contain σ[i])
13         if(type1 = 0)
14           diff[i] ← σ[i]
15         type1--
16   else if(type2 > 0)
17     for all positions i in σ
18       if(|M[i]| > 1)
19         diff[i] ← σ[i]
20 return diff
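The two cases can be sketched as follows. This is an illustrative
reformulation, not the authors' code: the mark is assumed to be a list of
per-attribute value sets, an empty set meaning "position not marked", and
the random index selection replaces the counting trick of lines 9-15 of the
pseudocode:

```python
import random

def get_differences(mark, sigma):
    """Return a condition-like diff string: one random clear difference if
    any exists, otherwise all fuzzy differences, otherwise all '#'."""
    diff = ['#'] * len(sigma)
    if any(mark):                                   # mark is not empty
        clear = [i for i, m in enumerate(mark)
                 if m and sigma[i] not in m]        # case 1 candidates
        fuzzy = [i for i, m in enumerate(mark)
                 if m and sigma[i] in m and len(m) > 1]  # case 2 candidates
        if clear:
            i = random.choice(clear)                # one random clear difference
            diff[i] = sigma[i]
        else:
            for i in fuzzy:                         # all fuzzy differences
                diff[i] = sigma[i]
    return ''.join(diff)

# Hypothetical mark over a length-3 string.
mark = [{'0'}, {'1'}, {'0', '1'}]
```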

Unexpected Case The unexpected case is rather simply structured. Important
is the criterion for generating an offspring classifier. An offspring is
generated only if the effect part of the to-be-investigated classifier cl
can be modified to anticipate the change from σ₋₁ to σ correctly by only
specializing attributes. If this is the case, an offspring classifier is
generated that is specialized in condition and effect part where necessary.
The experience of the offspring classifier is set to one.

UNEXPECTED CASE(cl, σ₋₁, σ):
 1 cl.q ← cl.q - β · cl.q
 2 SET MARK cl.M with σ₋₁
 3 for all positions i in σ
 4   if(cl.E[i] ≠ #)
 5     if(cl.E[i] ≠ σ[i] or σ₋₁[i] = σ[i])
 6       return empty
 7 child ← copy classifier cl
 8 for all positions i in σ
 9   if(cl.E[i] = # and σ₋₁[i] ≠ σ[i])
10     child.C[i] ← σ₋₁[i]
11     child.E[i] ← σ[i]
12 if(child.q < 0.5)
13   child.q = 0.5
14 child.exp ← 1
15 return child

Covering The idea behind covering is that ACS2 intends to cover all possible
situation-action-effect triples. In the ALP, if such a triple was not
represented by any classifier in the action set, covering is invoked.
Covering generates a classifier that specifies all changes from the previous
situation σ₋₁ to situation σ in condition and effect part. The action part A
of the new classifier is set to the executed action act. The time is set to
the current time t. An empty classifier is referred to as a classifier that
consists only of #-symbols in condition and effect part. Note, since the
experience counter is set to 0, the application average parameter aav will
be directly set to the delay until its first application once that first
application occurs, so that the initialization is not particularly
important. Moreover, the quality cl.q is set to 0.5 and the reward
prediction cl.r is set to zero to prevent 'reward bubbles' in the
environmental model.

COVER TRIPLE(σ₋₁, act, σ, t):
 1 child ← generate empty classifier with action act
 2 for all positions i in σ
 3   if(σ₋₁[i] ≠ σ[i])
 4     child.C[i] ← σ₋₁[i]
 5     child.E[i] ← σ[i]
 6 child.exp ← child.r ← child.aav ← 0
 7 child.talp ← child.tga ← t
 8 child.q ← 0.5
 9 child.num ← 1
10 return child
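Covering can be sketched directly from the listing above. An illustrative
dictionary-based version (not the authors' implementation):

```python
def cover_triple(sigma_prev, action, sigma, t):
    """Create a classifier that specifies exactly the attributes that
    changed from sigma_prev to sigma; everything else stays '#'."""
    length = len(sigma)
    child = {'C': ['#'] * length, 'A': action, 'E': ['#'] * length,
             'M': [set() for _ in range(length)],
             'q': 0.5, 'r': 0.0, 'ir': 0.0,
             'exp': 0, 'aav': 0.0, 'talp': t, 'tga': t, 'num': 1}
    for i in range(length):
        if sigma_prev[i] != sigma[i]:
            child['C'][i] = sigma_prev[i]
            child['E'][i] = sigma[i]
    return child

# Only the middle attribute changed, so only it gets specified.
cl = cover_triple('000', 2, '010', t=7)
```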
Insertion in the ALP If the ALP generates offspring, insertion distinguishes
between several cases. First, the method checks if there is a classifier
that subsumes the insertion candidate cl. If there is none, the method looks
for equal classifiers. If none was found, classifier cl is inserted as a new
classifier in the population [P] and the current action set [A]. However, if
a subsumer or an equal classifier was found, the quality of the old
classifier is increased and the new one is discarded. The subsumption method
is described in section 3.9 since the GA application uses the same method.
Note, in the equality check it is not necessary to check for an identical
action since all classifiers in [A] have the same action.

ADD ALP CLASSIFIER(cl, [A], [P]):
 1 oldCl ← empty
 2 for all classifiers clOld in [A]
 3   if(clOld IS SUBSUMER of cl)
 4     if(oldCl is empty or clOld.C is more general than oldCl.C)
 5       oldCl ← clOld
 6 if(oldCl is empty)
 7   for all classifiers clOld in [A]
 8     if(clOld is equal to cl in condition and effect part)
 9       oldCl ← clOld
10 if(oldCl is empty)
11   insert cl in [A] and [P]
12 else
13   oldCl.q ← oldCl.q + β · (1 - oldCl.q)
14   discard classifier cl

3.7 Reinforcement Learning

The reinforcement learning portion of the update procedure follows the idea
of Q-learning (Watkins, 1989). Classifiers' reward predictions are updated
using the immediate reward ρ and the discounted maximum payoff predicted in
the next time-step, maxP. The major difference is that ACS2 does not store
an explicit state-action table but only more or less generalized classifiers
that represent the model. Thus, for the reinforcement learning procedure to
work successfully, it is mandatory that the model is specific enough for the
reinforcement distribution. Lanzi (2000) formalizes this insight in a
general classifier system framework. The procedure updates the reward
predictions r as well as the immediate reward predictions ir of all
classifiers in the action set [A].

APPLY REINFORCEMENT LEARNING([A], ρ, maxP):
 1 for each classifier cl in [A]
 2   cl.r ← cl.r + β · (ρ + γ · maxP - cl.r)
 3   cl.ir ← cl.ir + β · (ρ - cl.ir)
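The Q-learning-style update can be sketched in a few lines. An illustrative
version over dictionary-based classifiers:

```python
def apply_reinforcement_learning(action_set, rho, max_p, beta, gamma):
    """Move each classifier's reward prediction toward the Q-learning target
    rho + gamma * maxP, and its immediate reward prediction toward rho."""
    for cl in action_set:
        cl['r'] += beta * (rho + gamma * max_p - cl['r'])
        cl['ir'] += beta * (rho - cl['ir'])

# One step with a large beta to make the effect visible.
A = [{'r': 0.0, 'ir': 0.0}]
apply_reinforcement_learning(A, rho=100.0, max_p=0.0, beta=0.5, gamma=0.95)
```

With maxP = 0 both predictions move halfway (β = 0.5) toward 100.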
3.8 Genetic Generalization

The GA in ACS2 performs a genetic generalization of condition parts. Due to
the modified, generalizing mutation and the evolutionary pressures, the
generalizing nature of the GA is realized. The method starts by determining
if a GA should actually take place, controlled by the tga time stamps and
the actual time t. If a GA takes place, preferably accurate, over-specified
classifiers are selected, mutated, and crossed. Before the insertion, excess
classifiers are deleted in [A]. Several parts of the process are specified
by sub-procedures, which are described after the description of the main GA
procedure.

APPLY GENETIC GENERALIZATION([A], t):
 1 if(t - Σ_{cl ∈ [A]} (cl.tga · cl.num) / Σ_{cl ∈ [A]} cl.num > θ_ga)
 2   for each classifier cl in [A]
 3     cl.tga ← actual time t
 4   parent1 ← SELECT OFFSPRING in [A]
 5   parent2 ← SELECT OFFSPRING in [A]
 6   child1 ← copy classifier parent1
 7   child2 ← copy classifier parent2
 8   child1.num ← child2.num ← 1
 9   child1.exp ← child2.exp ← 1
10   APPLY GENERALIZING MUTATION on child1
11   APPLY GENERALIZING MUTATION on child2
12   if(RandomNumber[0, 1) < χ)
13     APPLY CROSSOVER on child1 and child2
14     child1.r ← child2.r ← (parent1.r + parent2.r)/2
15     child1.q ← child2.q ← (parent1.q + parent2.q)/2
16   child1.q ← child1.q/2
17   child2.q ← child2.q/2
18   DELETE CLASSIFIERS in [A], [P] to allow the insertion of two children
19   for each child
20     if(child.C equals {#}^L)
21       next child
22     else
23       ADD GA CLASSIFIER child to [P] and [A]

Offspring Selection Offspring in the GA is selected by roulette-wheel selection.
The process chooses a classifier for reproduction in set [A] proportional to
its quality to the power three. First, the sum of all cubed quality values in
set [A] is computed. Next, the roulette wheel is spun. Finally, the classifier
is chosen according to the roulette-wheel result.

SELECT OFFSPRING([A]):
 1 qualitySum ← 0
 2 for each classifier cl in [A]
 3   qualitySum ← qualitySum + cl.q³
 4 choicePoint ← RandomNumber[0, 1) · qualitySum
 5 qualitySum ← 0
 6 for each classifier cl in [A]
 7   qualitySum ← qualitySum + cl.q³
 8   if(qualitySum > choicePoint)
 9     return cl
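A direct translation of SELECT OFFSPRING might look as follows. This is a sketch, not the authors' code; the injectable `rng` parameter is an assumption added to make the example deterministic.

```python
import random

# Roulette-wheel selection proportional to q^3, mirroring the
# SELECT OFFSPRING pseudocode above.

def select_offspring(action_set, rng=random.random):
    quality_sum = sum(cl["q"] ** 3 for cl in action_set)
    choice_point = rng() * quality_sum
    quality_sum = 0.0
    for cl in action_set:
        quality_sum += cl["q"] ** 3
        if quality_sum > choice_point:
            return cl
    return action_set[-1]  # guard against floating-point round-off

a = [{"q": 0.5}, {"q": 1.0}]
```

Cubing the quality strongly biases reproduction toward accurate classifiers while still giving low-quality classifiers a non-zero chance.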

Mutation As has been noted before, the mutation process in ACS2 is a generaliz-
ing mutation of the condition part cl.C. Specified attributes in the condition are
changed to #-symbols with a certain probability μ. The process works as follows:

APPLY GENERALIZING MUTATION(cl):
 1 for all positions i in cl.C
 2   if(cl.C[i] ≠ #)
 3     if(RandomNumber[0, 1) < μ)
 4       cl.C[i] ← #
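In code, the generalizing mutation only ever turns specified attributes into don't-care symbols, never the other way around. A minimal sketch (illustrative, not the authors' code; the default μ is an assumption):

```python
import random

# Generalizing mutation sketch: each specified attribute of the
# condition string is replaced by '#' with probability mu.

def apply_generalizing_mutation(cl, mu=0.3, rng=random.random):
    cond = list(cl["C"])
    for i, attr in enumerate(cond):
        if attr != "#" and rng() < mu:
            cond[i] = "#"
    cl["C"] = "".join(cond)

cl = {"C": "01#1"}
apply_generalizing_mutation(cl, mu=1.0)  # mu = 1 generalizes every attribute
```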
Crossover The crossover operator, like mutation, is only applied to the con-
dition part. Crossover is only applied if the two offspring classifiers cl1 and
cl2 anticipate the same change. This restriction further assures the combination
of classifiers that inhabit the same environmental niche. Our description shows
two-point crossover.

APPLY CROSSOVER(cl1, cl2):
 1 if(cl1.E ≠ cl2.E)
 2   return
 3 x ← RandomNumber[0, 1) · (length of cl1.C + 1)
 4 do
 5   y ← RandomNumber[0, 1) · (length of cl1.C + 1)
 6 while(x = y)
 7 if(x > y)
 8   switch x and y
 9 i ← 0
10 do
11   if(x ≤ i and i < y)
12     switch cl1.C[i] and cl2.C[i]
13   i++
14 while(i < y)
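The cut-point logic above can be compressed using slice assignment. The following is a sketch under the same assumptions as the previous examples (dictionary-based classifiers, injectable `rng`), not the authors' implementation:

```python
import random

# Two-point crossover on the condition parts: pick two distinct cut
# points x < y and swap the attributes in [x, y). Only applied when both
# classifiers anticipate the same change (identical effect parts).

def apply_crossover(cl1, cl2, rng=random.random):
    if cl1["E"] != cl2["E"]:
        return
    length = len(cl1["C"])
    x = int(rng() * (length + 1))
    y = x
    while x == y:
        y = int(rng() * (length + 1))
    if x > y:
        x, y = y, x
    c1, c2 = list(cl1["C"]), list(cl2["C"])
    c1[x:y], c2[x:y] = c2[x:y], c1[x:y]
    cl1["C"], cl2["C"] = "".join(c1), "".join(c2)

vals = iter([0.0, 0.5])           # deterministic cuts: x = 0, y = 2
cl1 = {"C": "0000", "E": "same"}
cl2 = {"C": "1111", "E": "same"}
apply_crossover(cl1, cl2, rng=lambda: next(vals))
# cl1["C"] == "1100", cl2["C"] == "0011"
```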

GA Deletion While the reproduction process uses a form of roulette-wheel se-
lection, GA deletion in ACS2 applies a modified tournament selection process.
Approximately a third of the action set size takes part in the tournament. The
classifier that has a significantly lower quality than the others is deleted. If all
classifiers have a similar quality, marked classifiers are preferred for deletion be-
fore unmarked classifiers, and the least recently applied classifier is preferred
among only marked or only unmarked classifiers. First, however, the method
checks whether and for how long classifiers need to be deleted in [A]. The
parameter inSize specifies the number of children that will still be inserted in
the GA process. Note, the tournament is held among the micro-classifiers. If a
classifier is removed from the population, that is, if its numerosity reaches zero,
the classifier needs to be removed from the action set [A] as well as from the
whole population [P].

DELETE CLASSIFIERS([A], [P], inSize):
 1 while(inSize + Σ_{cl∈[A]} cl.num > θas)
 2   clDel ← empty
 3   for each micro-classifier cl in [A]
 4     if(RandomNumber[0, 1) < 1/3)
 5       if(clDel is empty)
 6         clDel ← cl
 7       else
 8         if(cl.q − clDel.q < −0.1)
 9           clDel ← cl
10         if(|cl.q − clDel.q| ≤ 0.1)
11           if(cl.M is not empty and clDel.M is empty)
12             clDel ← cl
13           else if(cl.M is not empty or clDel.M is empty)
14             if(cl.aav > clDel.aav)
15               clDel ← cl
16   if(clDel is not empty)
17     if(clDel.num > 1)
18       clDel.num−−
19     else
20       remove classifier clDel from [P] and [A]
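The core of the tournament is the pairwise preference in lines 8-15: significantly lower quality wins outright; on similar quality, marked classifiers and then larger application-average values (aav) are preferred. A sketch of just this comparison (illustrative, not the authors' code):

```python
# Deletion preference between a tournament contestant cl and the current
# deletion candidate cl_del, following lines 8-15 of DELETE CLASSIFIERS:
#  - significantly lower quality (by more than 0.1) is always preferred;
#  - on similar quality, marked beats unmarked;
#  - among equally marked/unmarked, the larger aav (applied least
#    recently on average) is preferred.

def prefer_for_deletion(cl, cl_del):
    """Return True if cl should replace cl_del as deletion candidate."""
    if cl["q"] - cl_del["q"] < -0.1:
        return True
    if abs(cl["q"] - cl_del["q"]) <= 0.1:
        if cl["marked"] and not cl_del["marked"]:
            return True
        if cl["marked"] or not cl_del["marked"]:
            return cl["aav"] > cl_del["aav"]
    return False
```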

Insertion in the GA Although quite similar to the ALP insertion, the insertion
method in the GA differs in two important points. First, the numerosity num
rather than the quality q of an old, subsuming or identical classifier is increased.
Second, the numerosity of an identical classifier is only increased if the identical
classifier is not marked. Parameters are, as before, the classifier cl to be inserted,
the action set [A] classifier cl was generated from, and the current population [P].

ADD GA CLASSIFIER(cl, [A], [P]):
 1 oldCl ← empty
 2 for all classifiers c in [A]
 3   if(c IS SUBSUMER of cl)
 4     if(oldCl is empty or c.C is more general than oldCl.C)
 5       oldCl ← c
 6 if(oldCl is empty)
 7   for all classifiers c in [A]
 8     if(c is equal to cl in condition and effect part)
 9       oldCl ← c
10 if(oldCl is empty)
11   insert cl in [A] and [P]
12 else
13   if(oldCl is not marked)
14     oldCl.num++
15   discard classifier cl

3.9 Subsumption

ACS2 looks for subsuming classifiers in the GA application as well as in the
ALP application. For a classifier clsub to subsume another classifier cltos, the
subsumer needs to be experienced, reliable, and not marked. Moreover, the sub-
sumer's condition part needs to be syntactically more general and the effect part
needs to be identical. Note again, an identical-action check is not necessary since
both classifiers occupy the same action set. The procedure returns whether clas-
sifier cltos is subsumed by clsub but does not apply any consequent parameter
changes.

IS SUBSUMER(clsub, cltos):
 1 if(clsub.exp > θexp and clsub.q > θr and clsub.M is empty)
 2   if(the number of # in clsub.C ≥ the number of # in cltos.C)
 3     if(clsub.E is equal to cltos.E)
 4       return true
 5 return false
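The subsumption test translates almost line by line. This sketch mirrors IS SUBSUMER under the same dictionary-based classifier assumption as the earlier examples; the default threshold values θexp and θr are assumptions, not prescribed by this chapter's pseudocode:

```python
# Subsumption sketch: the subsumer must be experienced (exp > theta_exp),
# reliable (q > theta_r), unmarked, at least as general (at least as many
# '#' don't-care symbols in the condition), and anticipate the same effect.

def is_subsumer(cl_sub, cl_tos, theta_exp=20, theta_r=0.9):
    if cl_sub["exp"] > theta_exp and cl_sub["q"] > theta_r and not cl_sub["marked"]:
        if cl_sub["C"].count("#") >= cl_tos["C"].count("#"):
            if cl_sub["E"] == cl_tos["E"]:
                return True
    return False
```

Note that, as in the pseudocode, the check is not symmetric: an inexperienced or unreliable classifier can be subsumed but can never subsume.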

4 Summary

This chapter gave a precise overview of the ACS2 system. The descriptions of
interaction, knowledge representation, and parameter identification should serve
as a basic reference when implementing a new problem and applying ACS2 to it.
The algorithmic description revealed all processes inside ACS2 and should serve
as a helpful guide for programming one's own version of ACS2 or for developing
an enhanced anticipatory learning classifier system out of the ACS2 framework.
The description did not include any implementation details, so that the system
should be programmable in any programming language with the help of this
description.

Acknowledgments We would like to thank the Department of Cognitive Psy-
chology at the University of Würzburg for their support. The work was sponsored
by the German Research Foundation DFG.

References

Butz, M. V. (2001a). Anticipatory learning classifier systems. Genetic Algo-
    rithms and Evolutionary Computation. Boston, MA: Kluwer Academic
    Publishers.
Butz, M. V. (2001b). An implementation of the anticipatory classifier sys-
    tem ACS2 in C++ (IlliGAL report 2001026). University of Illinois at
    Urbana-Champaign: Illinois Genetic Algorithms Laboratory.
Butz, M. V. (2002). Biasing exploration in an anticipatory learning classifier
    system. In Lanzi, P. L., Stolzmann, W., & Wilson, S. W. (Eds.), Proceed-
    ings of the Fourth International Workshop on Learning Classifier Systems
    (IWLCS-2001). Berlin Heidelberg: Springer-Verlag.
Butz, M. V., Goldberg, D. E., & Stolzmann, W. (2000). Introducing a ge-
    netic generalization pressure to the anticipatory classifier system: Part 1 -
    theoretical approach. In Whitely, D., Goldberg, D. E., Cantu-Paz, E.,
    Spector, L., Parmee, I., & Beyer, H.-G. (Eds.), Proceedings of the Genetic
    and Evolutionary Computation Conference (GECCO-2000), pp. 34-41. San
    Francisco, CA: Morgan Kaufmann.
Butz, M. V., Goldberg, D. E., & Stolzmann, W. (2001). Probability-enhanced
    predictions in the anticipatory classifier system. In Lanzi, P. L., Stolzmann,
    W., & Wilson, S. W. (Eds.), Advances in Learning Classifier Systems,
    LNAI 1996, pp. 37-51. Berlin Heidelberg: Springer-Verlag.
Dorigo, M., & Colombetti, M. (1997). Robot Shaping: An experiment in behav-
    ior engineering. Intelligent Robotics and Autonomous Agents. Cambridge,
    MA: MIT Press.
Lanzi, P. L. (2000). Learning classifier systems from a reinforcement learn-
    ing perspective (Technical Report 00-03). Dipartimento di Elettronica e
    Informazione, Politecnico di Milano.
Stolzmann, W. (1997). Antizipative Classifier Systems [Anticipatory classifier
    systems]. Aachen, Germany: Shaker Verlag.
Stolzmann, W. (1998). Anticipatory classifier systems. In Koza, J. R.,
    Banzhaf, W., Chellapilla, K., Deb, K., Dorigo, M., Fogel, D., Garzon, M.,
    Goldberg, D., Iba, H., & Riolo, R. (Eds.), Genetic Programming 1998:
    Proceedings of the Third Annual Conference, pp. 658-664. San Francisco,
    CA: Morgan Kaufmann.
Stolzmann, W., Butz, M. V., Hoffmann, J., & Goldberg, D. E. (2000). First
    cognitive capabilities in the anticipatory classifier system. In Meyer, J.-A.,
    Berthoz, A., Floreano, D., Roitblat, H., & Wilson, S. W. (Eds.), From
    Animals to Animats 6: Proceedings of the Sixth International Conference
    on Simulation of Adaptive Behavior, pp. 287-296. Cambridge, MA: MIT
    Press.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral disser-
    tation, King's College, Cambridge, UK.
