Abstract. The various modifications and extensions of the anticipatory classifier system (ACS) recently led to the introduction of ACS2, an enhanced and modified version of ACS. This chapter provides an overview of the system including all parameters as well as framework, structure, and environmental interaction. Moreover, a precise description of all algorithms in ACS2 is provided.
1 Introduction
Anticipatory learning classifier systems (ALCSs) are a new type of classifier system. The major addition in ALCSs is that they comprise the notion of anticipations in their framework. Doing that, the systems are predominantly able to anticipate perceptual consequences of actions independent of a reinforcement prediction. Thus, ALCSs are systems that are able to form a complete anticipatory representation, that is, they build an environmental model. The model specifies which changes take place in an environment after the execution of a specific action with respect to the current situation. The essential intention behind the framework is that the representation of an environmental model allows faster and more intelligent adaptation of behavior or problem classification. By anticipating the consequences of actions with the evolving model, the system is able to adapt its behavior faster and beyond the capabilities of reinforcement learning methods (Stolzmann, Butz, Hoffmann, & Goldberg, 2000; Butz, 2001a).
The system ACS2 is derived from the original ACS framework as introduced in Stolzmann (1997) and Stolzmann (1998). Moreover, ACS2 embodies the more recently introduced genetic generalization mechanism (Butz, Goldberg, & Stolzmann, 2000). This paper provides a precise algorithmic description of ACS2. The description proceeds in a top-down manner, detailing first the overall learning cycle. The following subsections specify the single parts of the cycle in more detail. This article should be read in conjunction with Butz (2001a), in which a more comprehensive introduction to ACS2 is provided, as well as a previous version of this algorithmic description. The interested reader is also referred to the other literature cited above for further background.
The next section gives an overview of ACS2's framework, rule structure, and environmental interaction. Section 3 provides the actual algorithmic description. We hope that the description, in combination with the explanations about framework, structure, parameters, and environmental interaction, facilitates research with ACS2. We would like to encourage feedback regarding potential problems or ambiguities. Moreover, the usage of the available ACS2 code is encouraged (Butz, 2001b).
Before rushing into the algorithmic description, we provide an overview of the basic environmental interaction of ACS2 as well as its internal structure. Moreover, a list of all parameters in ACS2 is provided, with suggested parameter settings and hints on how to set the parameters with respect to a specific problem.
[Figure omitted: the environment exchanges situations, actions, and reinforcement with ACS2, in which a model learner and model exploitation influence the behavioral policy.]

Fig. 1. ACS2 interacts with an environment, perceiving environmental situations and executing actions in the environment. Reinforcement is provided by a separate reinforcement program that evaluates the current environmental state and might be more or less influenced by ACS2.
2.3 Parameters
The following parameters control the various learning methods in ACS2. We first provide a list of all parameters and then reveal their usage and default values in further detail.

- The inadequacy threshold θ_i ∈ [0, 1] specifies when a classifier is regarded as inadequate, determined by its quality q.
- The reliability threshold θ_r ∈ [0, 1] specifies when a classifier is regarded as reliable, determined by its quality q.
- The learning rate β ∈ [0, 1] is used in ALP and RL updates affecting q, r, ir, and aav.
- The discount factor γ ∈ [0, 1) discounts the maximal reward expected in the subsequent situation.
- The specificity threshold u_max ∈ N specifies the maximum number of specified attributes in the condition part that are anticipated to stay the same in the effect part.
The provided description approaches the problem in a top-down manner. First, the overall execution cycle is specified. In the subsequent sections, each sub-procedure is specified in further detail.
The following notational conventions are used in the description. Each specified sub-procedure is written in pure capital letters. Interaction with the environment, and particularly requests from the environment or the reinforcement program, are denoted with a colon. Moreover, to denote a certain parameter of a classifier we use the dot notation. Finally, it is necessary to note that we do not use braces or similar delimiters to denote the extent of an if clause or a loop, but rather use indentation as the direct control.
3.1 Initialization
At the beginning of an ACS2 run, all modules first need to be initialized. The environment env must be created, and the animat represented by ACS2 needs to be set to a certain position or state in the environment, and so forth. Also, the reinforcement program rp must be initialized. Finally, ACS2 itself must be initialized. Hereby, the parameter settings are determined, the time-step counter, referred to as t, is set, and the (in the beginning usually empty) population is created. After all initializations, which we do not clarify in further detail because of their strong problem and implementation dependence, the main loop is called.
START ACS2:
 1 initialize environment env
 2 initialize reinforcement program rp
 3 initialize ACS2
 4 RUN EXPERIMENT with population [P] and initial time t
The main loop RUN EXPERIMENT is executed as long as some termination criteria are not met. In the main loop, the current situation is first sensed (perceived as input). Second, the match set [M] is formed from all classifiers that match the situation. If this is not the beginning of a trial, ALP, reinforcement learning, and GA are applied in the previous action set. Next, an action is chosen for execution, the action is executed, and an action set is generated from all classifiers in [M] that specify the chosen action. After some parameter updates, ALP, reinforcement learning, and GA may be applied in [A] if the execution of the action led to the end of one trial. Finally, after [A] is stored for learning in the next step, the loop is redone. In the case of an end of trial, [A]_{-1} needs to be emptied to prevent incorrect learning over a trial barrier (i.e., since the successive situation is unrelated to the previous one).
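As a concrete illustration, the cycle just described can be sketched in Python. The environment stub, the placeholder learning hook, and the tuple-based classifiers below are illustrative assumptions, not part of the original description:

```python
import random

def matches(condition, sigma):
    # '#' is a don't care symbol; all other symbols must match exactly
    return all(c == '#' or c == s for c, s in zip(condition, sigma))

class ToyEnv:
    """Minimal two-state stand-in for the environment env (assumption)."""
    def __init__(self):
        self.state = "00"
    def perceive(self):
        return self.state
    def execute(self, action):
        self.state = "01" if action == 1 else "00"
    def is_end_of_trial(self):
        return False

def run_experiment(env, steps, rng=random.Random(0)):
    """Skeleton of the RUN EXPERIMENT cycle: sense, match, (learn), act."""
    population = [("##", 0), ("##", 1)]   # (condition, action) stubs
    prev_action_set = None                # [A]_{-1}
    trace = []
    for t in range(steps):
        sigma = env.perceive()
        match_set = [cl for cl in population if matches(cl[0], sigma)]
        if prev_action_set is not None:
            pass                          # ALP, RL, and GA would act on [A]_{-1} here
        act = rng.choice(match_set)[1]    # action selection placeholder
        action_set = [cl for cl in match_set if cl[1] == act]
        env.execute(act)
        # empty [A]_{-1} at a trial barrier to avoid incorrect learning
        prev_action_set = None if env.is_end_of_trial() else action_set
        trace.append((sigma, act))
    return trace
```

The skeleton only fixes the order of operations; all learning calls are left as placeholders.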
The main loop specifies many sub-procedures, denoted in capital letters, which are described below in further detail. Some of the procedures are more or less trivial while others are complex and themselves call other sub-procedures. Each of the sub-sections tries to specify the general idea and the overall process and then gives a more detailed description of single parts in successive paragraphs.
The GENERATE MATCH SET procedure gets as input the current population [P] and the current situation σ. The procedure in ACS2 is quite trivial. All classifiers in [P] are simply compared to σ, and all matching classifiers are added to the match set. The sub-procedure DOES MATCH is explained below.
GENERATE MATCH SET([P], σ):
 1 initialize empty set [M]
 2 for each classifier cl in [P]
 3   if(DOES MATCH classifier cl in situation σ)
 4     add classifier cl to set [M]
 5 return [M]
The matching procedure is commonly used in LCSs. A 'don't care' symbol # in C matches any symbol in the corresponding position of σ. A 'care', or non-#, symbol only matches the exact same symbol at that position. The DOES MATCH procedure checks each component in the classifier's condition cl.C. If a component is specified (i.e., is not a don't care symbol), it is compared with the corresponding attribute in the current situation σ. Only if all comparisons hold does the classifier match, and the procedure returns true.
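The matching rule is compact enough to state directly in code; the following Python sketch mirrors the description above (the string encoding of conditions and situations is an assumption):

```python
def does_match(condition, sigma):
    """DOES MATCH as described: every specified (non-'#') symbol of the
    condition must equal the corresponding attribute of sigma."""
    if len(condition) != len(sigma):
        return False
    return all(c == '#' or c == s for c, s in zip(condition, sigma))
```

For example, the condition "#1#0" matches the situation "0100" because the two specified positions agree, while "11#0" does not.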
In ACS2, usually an ε-greedy method is used for action selection. However, unlike in non-generalizing reinforcement learning methods, it is not clear which action is actually the best to choose, since one situation-action tuple is mostly represented by several distinct classifiers. In this description we chose the simple method that the action of the apparently most promising classifier is chosen. Since ACS2 also evolves classifiers that explicitly predict no change in the environment, and there is no such thing as a waiting necessity in the problems addressed, those classifiers are excluded from consideration. The decision is made in the provided current match set [M].
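A minimal sketch of this selection scheme follows. Judging "most promising" by the product q * r is an assumption of this sketch (the text above does not fix the measure), as is the dict layout of a classifier:

```python
import random

def choose_action(match_set, epsilon, rng=random.Random(1)):
    """Epsilon-greedy selection over a match set of classifier dicts.
    With probability epsilon a random proposed action is taken; otherwise
    the action of the apparently best classifier is chosen. Classifiers
    whose effect part is all '#' (no anticipated change) are excluded
    from the greedy choice, as described above."""
    if rng.random() < epsilon:
        return rng.choice(match_set)['A']
    changers = [cl for cl in match_set if set(cl['E']) != {'#'}]
    candidates = changers if changers else match_set
    return max(candidates, key=lambda cl: cl['q'] * cl['r'])['A']
```

Note that with epsilon = 0 a high-quality classifier predicting no change still loses to any change-predicting classifier, matching the exclusion rule above.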
After the match set is formed and an action is chosen for execution, the GENERATE ACTION SET procedure forms the action set out of the match set. It includes all classifiers in the current match set [M] that propose the chosen action act for execution.
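The procedure is a plain filter; a one-line Python equivalent (with dict-based classifiers as an illustrative assumption) is:

```python
def generate_action_set(match_set, act):
    """GENERATE ACTION SET: keep every classifier of [M] proposing act."""
    return [cl for cl in match_set if cl['A'] == act]
```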
The application of the anticipatory learning process is rather delicate. Due to its simultaneous creation and deletion of classifiers, it needs to be assured that newly generated classifiers are added to the current action set but are not reconsidered in the current ALP application. Deleted classifiers need to be removed from the action set without influencing the update process. The algorithmic description does not address such details. However, it is necessary to be aware of these possible problems.
The APPLY ALP procedure successively considers the anticipation of each classifier. If the anticipation is correct or wrong, the EXPECTED CASE or the UNEXPECTED CASE procedure is called, respectively. In the UNEXPECTED CASE procedure the quality is decreased, so it is necessary to check whether the quality decreased below the inadequacy threshold θ_i. If this is the case, the classifier is removed (regardless of its numerosity num, since all micro-classifiers are actually inadequate). When adding a new classifier, it is necessary to check for identical classifiers and possibly subsuming classifiers. Thus, another sub-procedure is called in this case. Finally, if no classifier in the action set anticipates the encountered change correctly, a covering classifier is generated and added. The method is usually called from the main loop. Inputs are the action set [A] in which the ALP is applied, the situation σ_{-1} and action act from which [A] was generated, the resulting situation σ, the time t the action was applied, and the current population [P].
Application Average. The UPDATE APPLICATION AVERAGE procedure uses the moyenne adaptive modifiée technique to reach an accurate value of the application average as soon as possible. Also, the ALP time stamp talp is set in this procedure. The procedure gets the to-be-updated classifier cl and the current time t as input.
APPLY ALP([A], σ_{-1}, act, σ, t, [P]):
 1 wasExpectedCase ← 0
 2 for each classifier cl in [A]
 3   cl.exp++
 4   UPDATE APPLICATION AVERAGE of cl with respect to t
 5   if(cl DOES ANTICIPATE CORRECTLY σ in σ_{-1})
 6     newCl ← EXPECTED CASE of cl in σ, σ_{-1}
 7     wasExpectedCase ← 1
 8   else
 9     newCl ← UNEXPECTED CASE of cl in σ, σ_{-1}
 10    if(cl.q < θ_i)
 11      remove classifier cl from [P] and [A]
 12  if(newCl is not empty)
 13    newCl.tga ← t
 14    ADD ALP CLASSIFIER newCl to [P] and [A]
 15 if(wasExpectedCase = 0)
 16   newCl ← COVER TRIPLE σ_{-1}, act, σ with time t
 17   ADD ALP CLASSIFIER newCl to [P] and [A]
UPDATE APPLICATION AVERAGE(cl, t):
 1 if(cl.exp < 1/β)
 2   cl.aav ← cl.aav + (t − cl.talp − cl.aav) / cl.exp
 3 else
 4   cl.aav ← cl.aav + β · (t − cl.talp − cl.aav)
 5 cl.talp ← t
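The MAM update can be sketched in Python as below. The header and the direct-averaging branch were truncated in the text of this procedure, so their usual MAM form (direct averaging while cl.exp < 1/β) is an assumed reconstruction; the Widrow-Hoff branch matches the surviving line:

```python
def update_application_average(cl, t, beta=0.05):
    """Moyenne adaptive modifiee update of cl.aav, then stamp cl.talp.
    Direct averaging for young classifiers (assumed reconstruction),
    Widrow-Hoff style update with rate beta afterwards."""
    delay = t - cl['talp']
    if cl['exp'] < 1.0 / beta:          # assumed MAM switch-over point
        cl['aav'] += (delay - cl['aav']) / cl['exp']
    else:
        cl['aav'] += beta * (delay - cl['aav'])
    cl['talp'] = t
    return cl
```

The point of MAM is that early estimates converge at the speed of a sample average while late estimates keep tracking a moving target at rate beta.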
Check Anticipation. While the pass-through symbols in the effect part of a classifier directly anticipate that these attributes stay the same after the execution of an action, the specified attributes anticipate a change to the specified value. Thus, if the perceived value did not change to the anticipated value but actually stayed at the old value, the classifier anticipates incorrectly. This is considered in the DOES ANTICIPATE CORRECTLY procedure. Inputs are the to-be-investigated classifier cl, the situation σ_{-1} in which cl was applied, and the resulting situation σ.
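The check described above can be written compactly; note in the sketch below (dict-based classifiers are an illustrative assumption) that a specified effect symbol fails in two ways, by predicting the wrong value or by predicting a change that did not happen:

```python
def does_anticipate_correctly(cl, sigma_prev, sigma):
    """DOES ANTICIPATE CORRECTLY: '#' in E predicts no change; a specified
    symbol predicts a change to exactly that value (staying at the old
    value counts as an incorrect anticipation)."""
    for e, before, after in zip(cl['E'], sigma_prev, sigma):
        if e == '#':
            if before != after:
                return False
        elif after != e or before == after:
            return False
    return True
```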
Unexpected Case. The unexpected case is rather simply structured. The important point is the criterion for generating an offspring classifier. An offspring is generated only if the effect part of the to-be-investigated classifier cl can be modified to anticipate the change from σ_{-1} to σ correctly by only specializing attributes. If this is the case, an offspring classifier is generated that is specialized in condition and effect part where necessary. The experience of the offspring classifier is set to one.
UNEXPECTED CASE(cl, σ_{-1}, σ):
 1 cl.q ← cl.q − β · cl.q
 2 SET MARK cl.M with σ_{-1}
 3 for all positions i in σ
 4   if(cl.E[i] ≠ #)
 5     if(cl.E[i] ≠ σ[i] or σ_{-1}[i] = σ[i])
 6       return empty
 7 child ← copy classifier cl
 8 for all positions i in σ
 9   if(cl.E[i] = # and σ_{-1}[i] ≠ σ[i])
 10    child.C[i] ← σ_{-1}[i]
 11    child.E[i] ← σ[i]
 12 if(child.q < 0.5)
 13   child.q ← 0.5
 14 child.exp ← 1
 15 return child
Covering. The idea behind covering is that ACS2 intends to cover all possible situation-action-effect triples. In the ALP, if such a triple was not represented by any classifier in the action set, covering is invoked. Covering generates a classifier that specifies all changes from the previous situation σ_{-1} to situation σ in condition and effect part. The action part A of the new classifier is set to the executed action act. The time is set to the current time t. An empty classifier refers to a classifier that consists only of #-symbols in condition and effect part. Note that, since the experience counter is set to 0, the application average parameter aav will be set directly to the delay until its first application when the classifier is first applied, so that its initialization is not particularly important. Moreover, the quality cl.q is set to 0.5 and the reward prediction cl.r is set to zero to prevent 'reward bubbles' in the environmental model.
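A covering sketch following this paragraph is given below; the dict layout and attribute names of a classifier are illustrative assumptions:

```python
def cover_triple(sigma_prev, act, sigma, t):
    """COVER TRIPLE sketch: specify exactly the attributes that changed
    from sigma_prev to sigma in condition and effect; the rest stays '#'."""
    C = ''.join(p if p != s else '#' for p, s in zip(sigma_prev, sigma))
    E = ''.join(s if p != s else '#' for p, s in zip(sigma_prev, sigma))
    return {'C': C, 'A': act, 'E': E,
            'q': 0.5, 'r': 0.0, 'ir': 0.0,   # r = 0 avoids 'reward bubbles'
            'exp': 0, 'num': 1, 'aav': 0.0,
            'talp': t, 'tga': t, 'M': set()}
```

For the transition "0100" → "0110", only the third attribute changed, so the new classifier carries C = "##0#" and E = "##1#".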
The reinforcement portion of the update procedure follows the idea of Q-learning (Watkins, 1989). The classifiers' reward predictions are updated using the immediate reward and the discounted maximum payoff predicted in the next time-step, maxP. The major difference is that ACS2 does not store an explicit model but only more or less generalized classifiers that represent the model. Thus, for the reinforcement learning procedure to work successfully, it is mandatory that the model is specific enough for the reinforcement distribution. Lanzi (2000) formalizes this insight in a general classifier system framework. The procedure updates the reward predictions r as well as the immediate reward predictions ir of all classifiers in the action set [A].
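The update itself is a pair of Widrow-Hoff steps; a sketch (rho denotes the immediate reward, and dict-based classifiers are an illustrative assumption):

```python
def apply_reinforcement_learning(action_set, rho, max_p, beta=0.05, gamma=0.95):
    """Q-learning-flavoured update of all classifiers in [A]: the reward
    prediction r moves toward rho + gamma * maxP, and the immediate
    reward prediction ir moves toward rho."""
    for cl in action_set:
        cl['r'] += beta * (rho + gamma * max_p - cl['r'])
        cl['ir'] += beta * (rho - cl['ir'])
```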
APPLY GENETIC GENERALIZATION([A], t):
 1 if(t − (Σ_{cl ∈ [A]} cl.tga · cl.num) / (Σ_{cl ∈ [A]} cl.num) > θ_ga)
Mutation. As has been noted before, the mutation process in ACS2 is a generalizing mutation of the condition part cl.C. Specified attributes in the condition are changed to #-symbols with a certain probability μ. The process works as follows:
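Since the pseudocode of the mutation process is truncated here, the following Python sketch implements the rule just stated (dict-based classifiers and the RNG seeding are illustrative assumptions):

```python
import random

def generalizing_mutation(cl, mu, rng=random.Random(3)):
    """Generalizing mutation: each specified (non-'#') attribute of the
    condition cl.C is replaced by '#' with probability mu; effect and
    action parts are never touched."""
    cl['C'] = ''.join('#' if c != '#' and rng.random() < mu else c
                      for c in cl['C'])
    return cl
```

Note that the mutation can only generalize: a '#' can never turn back into a specified symbol, so repeated application moves conditions monotonically toward generality.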
GA Deletion. While the reproduction process uses a form of roulette-wheel selection, GA deletion in ACS2 applies a modified tournament selection process. Approximately a third of the action set size takes part in the tournament. The classifier that has a significantly lower quality than the others is deleted. If all classifiers have a similar quality, marked classifiers are preferred for deletion over unmarked classifiers, and the least applied classifier is preferred among only marked or only unmarked classifiers. First, however, the method controls whether and for how long classifiers need to be deleted in [A]. The parameter inSize specifies the number of children that will still be inserted in the GA process. Note that the tournament is held among the micro-classifiers. If a classifier is removed from the population, that is, if its numerosity reaches zero, the classifier needs to be removed from the action set [A] as well as from the whole population [P].
DELETE CLASSIFIERS([A], [P], inSize):
 1 while(inSize + Σ_{cl ∈ [A]} cl.num > θ_as)
 2   clDel ← empty
 3   for each micro-classifier cl in [A]
 4     if(RandomNumber[0, 1) < 1/3)
 5       if(clDel is empty)
 6         clDel ← cl
 7       else
 8         if(cl.q − clDel.q < −0.1)
 9           clDel ← cl
 10        if(|cl.q − clDel.q| ≤ 0.1)
 11          if(cl.M is not empty and clDel.M is empty)
 12            clDel ← cl
 13          else if(cl.M is not empty or clDel.M is empty)
 14            if(cl.aav > clDel.aav)
 15              clDel ← cl
 16  if(clDel is not empty)
 17    if(clDel.num > 1)
 18      clDel.num--
 19    else
 20      remove classifier clDel from [P] and [A]
Insertion in the GA. Although quite similar to the ALP insertion, the insertion method in the GA differs in two important points. First, the numerosity num rather than the quality q of an old, subsuming or identical classifier is increased. Second, the numerosity of an identical classifier is only increased if the identical classifier is not marked. Parameters are, as before, the to-be-inserted classifier cl, the action set [A] that cl was generated from, and the current population [P].
ADD GA CLASSIFIER(cl, [A], [P]):
 1 oldCl ← empty
 2 for all classifiers clOld in [A]
 3   if(clOld IS SUBSUMER of cl)
 4     if(oldCl is empty or clOld.C is more general than oldCl.C)
 5       oldCl ← clOld
 6 if(oldCl is empty)
 7   for all classifiers clOld in [A]
 8     if(clOld is equal to cl in condition and effect part)
 9       oldCl ← clOld
 10 if(oldCl is empty)
 11   insert cl in [A] and [P]
 12 else
 13   if(oldCl is not marked)
 14     oldCl.num++
 15   discard classifier cl
3.9 Subsumption
ACS2 looks for subsuming classifiers in the GA application as well as in the ALP application. For a classifier cl_sub to subsume another classifier cl_tos, the subsumer needs to be experienced, reliable, and not marked. Moreover, the subsumer's condition part needs to be syntactically more general and the effect part needs to be identical. Note again that an identical-action check is not necessary since both classifiers occupy the same action set. The procedure returns whether classifier cl_tos is subsumed by cl_sub but does not apply any consequent parameter changes.
IS SUBSUMER(cl_sub, cl_tos):
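Since the body of IS SUBSUMER is truncated here, the following Python sketch implements the criteria stated in the paragraph above; the threshold names theta_exp and theta_r, their defaults, and the strict reading of "more general" are assumptions:

```python
def is_subsumer(cl_sub, cl_tos, theta_exp=20, theta_r=0.9):
    """cl_sub subsumes cl_tos if it is experienced, reliable, and unmarked,
    its condition is syntactically more general, and its effect part is
    identical (an action check is unnecessary inside one action set)."""
    if cl_sub['exp'] <= theta_exp or cl_sub['q'] <= theta_r or cl_sub['M']:
        return False
    if cl_sub['E'] != cl_tos['E']:
        return False
    # more general: every specified symbol of cl_sub also appears in cl_tos,
    # and cl_sub leaves strictly more positions unspecified
    if any(s != '#' and s != t for s, t in zip(cl_sub['C'], cl_tos['C'])):
        return False
    return cl_sub['C'].count('#') > cl_tos['C'].count('#')
```

As the paragraph notes, the check is a pure predicate: it reports subsumption but leaves all parameter changes to the caller.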
4 Summary
This chapter gave a precise overview of the ACS2 system. Interaction, knowledge representation, and parameter identification should serve as a basic reference when implementing a new problem and applying ACS2 to it. The algorithmic description revealed all processes inside ACS2 and should serve as a helpful guide for programming one's own version of ACS2 or for developing an enhanced anticipatory learning classifier system out of the ACS2 framework. The description did not include any implementation details, so the system should be programmable in any programming language with the help of this description.
Acknowledgments. We would like to thank the Department of Cognitive Psychology at the University of Würzburg for their support. The work was sponsored by the German Research Foundation DFG.
References