Abstract. The various modifications and extensions of the anticipatory classifier system (ACS) recently led to the introduction of ACS2, an enhanced and modified version of ACS. This chapter provides an overview of the system including all parameters as well as framework, structure, and environmental interaction. Moreover, a precise description of all algorithms in ACS2 is provided.
1 Introduction
Anticipatory learning classifier systems (ALCSs) are a new type of classifier system. The major addition in ALCSs is that they comprise the notion of anticipations in their framework. Doing that, the systems are predominantly able to anticipate perceptual consequences of actions independent of a reinforcement prediction. Thus, ALCSs are systems that are able to form a complete anticipatory representation, that is, they build an environmental model. The model specifies which changes take place in an environment after the execution of a specific action with respect to the current situation. The essential intention behind the framework is that the representation of an environmental model allows faster and more intelligent adaptation of behavior or problem classification. By anticipating the consequences of actions with the evolving model, the system is able to adapt its behavior faster and beyond the capabilities of reinforcement learning methods (Stolzmann, Butz, Hoffmann, & Goldberg, 2000; Butz, 2001a).
The system ACS2 is derived from the original ACS framework as introduced in Stolzmann (1997) and Stolzmann (1998). Moreover, ACS2 embodies the more recently introduced genetic generalization mechanism (Butz, Goldberg, & Stolzmann, 2000). This paper provides a precise algorithmic description of ACS2. The description proceeds in a top-down manner, detailing first the overall learning cycle. The following subsections specify the single parts of the cycle in more detail. This article should be read in conjunction with Butz (2001a), in which a more comprehensive introduction to ACS2 is provided, as well as a previous version of this algorithmic description. The interested reader is also referred to the other literature cited above for further background.
The next section gives an overview of ACS2's framework, rule structure, and environmental interaction. Section 3 provides the actual algorithmic description. We hope that the description, in combination with the explanations about framework, structure, parameters, and environmental interaction, facilitates research with ACS2. We would like to encourage feedback regarding potential problems or ambiguities. Moreover, the usage of the available ACS2 code is encouraged (Butz, 2001b).
Before rushing into the algorithmic description, we provide an overview of the basic environmental interaction of ACS2 as well as its internal structure. Moreover, a list of all parameters in ACS2 is provided, with suggested parameter settings and hints on how to set the parameters with respect to a specific problem.
[Figure omitted: the environment exchanges situations, actions, and reinforcement with ACS2, in which a model learner and model exploitation influence the behavioral policy.]

Fig. 1. ACS2 interacts with an environment, perceiving environmental situations and executing actions in the environment. Reinforcement is provided by a separate reinforcement program that evaluates the current environmental state and might be more or less influenced by ACS2.
2.3 Parameters
The following parameters control the various learning methods in ACS2. We first provide a list of all parameters and then reveal their usage and default values in further detail.

- The inadequacy threshold θ_i ∈ [0, 1] specifies when a classifier is regarded as inadequate, determined by its quality q.
- The reliability threshold θ_r ∈ [0, 1] specifies when a classifier is regarded as reliable, determined by its quality q.
- The learning rate β ∈ [0, 1] is used in ALP and RL updates affecting q, r, ir, and aav.
- The discount factor γ ∈ [0, 1) discounts the maximal reward expected in the subsequent situation.
- The specificity threshold u_max ∈ N specifies the maximum number of specified attributes in the condition part that are anticipated to stay the same in the effect part.
The provided description approaches the problem in a top-down manner. First, the overall execution cycle is specified. In the subsequent sections, each sub-procedure is specified in further detail.
The following notational conventions are used in the description. Each specified sub-procedure is written in pure capital letters. Interaction with the environment, and particularly requests from the environment or the reinforcement program, are denoted with a colon. Moreover, to denote a certain parameter of a classifier we use the dot notation. Finally, it is necessary to note that we do not use braces or similar delimiters to denote the extent of an if clause or a loop, but rather use indentation as the direct control.
3.1 Initialization
At the beginning of an ACS2 run, all modules first need to be initialized. The environment env must be created, and the animat represented by ACS2 needs to be set to a certain position or state in the environment, and so forth. Also, the reinforcement program rp must be initialized. Finally, ACS2 itself must be initialized. Hereby, the parameter settings are determined, the time-step counter, referred to as t, is set, and the (in the beginning usually empty) population is created. After all initializations, which we do not clarify in further detail because of their strong problem and implementation dependence, the main loop is called.
START ACS2:
 1 initialize environment env
 2 initialize reinforcement program rp
 3 initialize ACS2
 4 RUN EXPERIMENT with population [P] and initial time t
The main loop RUN EXPERIMENT is executed as long as some termination criteria are not met. In the main loop, the current situation is first sensed (perceived as input). Second, the match set [M] is formed from all classifiers that match the situation. If this is not the beginning of a trial, ALP, reinforcement learning, and GA are applied in the previous action set. Next, an action is chosen for execution, the action is executed, and an action set is generated from all classifiers in [M] that specify the chosen action. After some parameter updates, ALP, reinforcement learning, and GA may be applied in [A] if the execution of the action led to the end of one trial. Finally, after [A] is stored for learning in the next step, the loop is redone. In the case of an end of trial, [A]_{-1} needs to be emptied to prevent incorrect learning over a trial barrier (i.e., since the successive situation is unrelated to the previous one).
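As a concrete illustration, the cycle just described can be sketched in Python. The environment stub, the placeholder learning hook, and the tuple-based classifiers below are illustrative assumptions, not part of the original description:

```python
import random

def matches(condition, sigma):
    # '#' is a don't care symbol; all other symbols must match exactly
    return all(c == '#' or c == s for c, s in zip(condition, sigma))

class ToyEnv:
    """Minimal two-state stand-in for the environment env (assumption)."""
    def __init__(self):
        self.state = "00"
    def perceive(self):
        return self.state
    def execute(self, action):
        self.state = "01" if action == 1 else "00"
    def is_end_of_trial(self):
        return False

def run_experiment(env, steps, rng=random.Random(0)):
    """Skeleton of the RUN EXPERIMENT cycle: sense, match, (learn), act."""
    population = [("##", 0), ("##", 1)]   # (condition, action) stubs
    prev_action_set = None                # [A]_{-1}
    trace = []
    for t in range(steps):
        sigma = env.perceive()
        match_set = [cl for cl in population if matches(cl[0], sigma)]
        if prev_action_set is not None:
            pass                          # ALP, RL, and GA would act on [A]_{-1} here
        act = rng.choice(match_set)[1]    # action selection placeholder
        action_set = [cl for cl in match_set if cl[1] == act]
        env.execute(act)
        # empty [A]_{-1} at a trial barrier to avoid incorrect learning
        prev_action_set = None if env.is_end_of_trial() else action_set
        trace.append((sigma, act))
    return trace
```

The skeleton only fixes the order of operations; all learning calls are left as placeholders.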
The main loop specifies many sub-procedures, denoted in capital letters, which are described below in further detail. Some of the procedures are more or less trivial while others are complex and themselves call other sub-procedures. Each of the sub-sections tries to specify the general idea and the overall process and then gives a more detailed description of single parts in successive paragraphs.
The GENERATE MATCH SET procedure gets as input the current population [P] and the current situation σ. The procedure in ACS2 is quite trivial. All classifiers in [P] are simply compared to σ, and all matching classifiers are added to the match set. The sub-procedure DOES MATCH is explained below.
GENERATE MATCH SET([P], σ):
 1 initialize empty set [M]
 2 for each classifier cl in [P]
 3   if(DOES MATCH classifier cl in situation σ)
 4     add classifier cl to set [M]
 5 return [M]
The matching procedure is commonly used in LCSs. A 'don't care' symbol # in C matches any symbol in the corresponding position of σ. A 'care', or non-#, symbol only matches the exact same symbol at that position. The DOES MATCH procedure checks each component in the classifier's condition cl.C. If a component is specified (i.e., is not a don't care symbol), it is compared with the corresponding attribute in the current situation σ. Only if all comparisons hold does the classifier match, and the procedure returns true.
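The matching rule is compact enough to state directly in code; the following Python sketch mirrors the description above (the string encoding of conditions and situations is an assumption):

```python
def does_match(condition, sigma):
    """DOES MATCH as described: every specified (non-'#') symbol of the
    condition must equal the corresponding attribute of sigma."""
    if len(condition) != len(sigma):
        return False
    return all(c == '#' or c == s for c, s in zip(condition, sigma))
```

For example, the condition "#1#0" matches the situation "0100" because the two specified positions agree, while "11#0" does not.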
In ACS2, usually an ε-greedy method is used for action selection. However, unlike in non-generalizing reinforcement learning methods, it is not clear which action is actually the best to choose, since one situation-action tuple is mostly represented by several distinct classifiers. In this description we chose the simple method that the action of the apparently most promising classifier is chosen. Since ACS2 also evolves classifiers that explicitly predict no change in the environment, and there is no such thing as a waiting necessity in the problems addressed, those classifiers are excluded from consideration. The decision is made in the provided current match set [M].
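A minimal sketch of this selection scheme follows. Judging "most promising" by the product q * r is an assumption of this sketch (the text above does not fix the measure), as is the dict layout of a classifier:

```python
import random

def choose_action(match_set, epsilon, rng=random.Random(1)):
    """Epsilon-greedy selection over a match set of classifier dicts.
    With probability epsilon a random proposed action is taken; otherwise
    the action of the apparently best classifier is chosen. Classifiers
    whose effect part is all '#' (no anticipated change) are excluded
    from the greedy choice, as described above."""
    if rng.random() < epsilon:
        return rng.choice(match_set)['A']
    changers = [cl for cl in match_set if set(cl['E']) != {'#'}]
    candidates = changers if changers else match_set
    return max(candidates, key=lambda cl: cl['q'] * cl['r'])['A']
```

Note that with epsilon = 0 a high-quality classifier predicting no change still loses to any change-predicting classifier, matching the exclusion rule above.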
After the match set is formed and an action is chosen for execution, the GENERATE ACTION SET procedure forms the action set out of the match set. It includes all classifiers in the current match set [M] that propose the chosen action act for execution.
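The procedure is a plain filter; a one-line Python equivalent (with dict-based classifiers as an illustrative assumption) is:

```python
def generate_action_set(match_set, act):
    """GENERATE ACTION SET: keep every classifier of [M] proposing act."""
    return [cl for cl in match_set if cl['A'] == act]
```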
The application of the anticipatory learning process is rather delicate. Due to its simultaneous creation and deletion of classifiers, it needs to be assured that newly generated classifiers are added to the current action set but are not reconsidered in the current ALP application. Deleted classifiers need to be removed from the action set without influencing the update process. The algorithmic description does not address such details. However, it is necessary to be aware of these possible problems.
The APPLY ALP procedure successively considers the anticipation of each classifier. If the anticipation is correct or wrong, the EXPECTED CASE or the UNEXPECTED CASE procedure is called, respectively. In the UNEXPECTED CASE procedure the quality is decreased, so it is necessary to check whether the quality decreased below the inadequacy threshold θ_i. If this is the case, the classifier is removed (regardless of its numerosity num, since all micro-classifiers are actually inadequate). When adding a new classifier, it is necessary to check for identical classifiers and possibly subsuming classifiers. Thus, another sub-procedure is called in this case. Finally, if no classifier in the action set anticipates the encountered change correctly, a covering classifier is generated and added. The method is usually called from the main loop. Inputs are the action set [A] in which the ALP is applied, the situation σ_{-1} and action act from which [A] was generated, the resulting situation σ, the time t the action was applied, and the current population [P].
Application Average. The UPDATE APPLICATION AVERAGE procedure uses the moyenne adaptive modifiée technique to reach an accurate value of the application average as soon as possible. Also, the ALP time stamp talp is set in this procedure. The procedure gets the to-be-updated classifier cl and the current time t as input.
APPLY ALP([A], σ_{-1}, act, σ, t, [P]):
 1 wasExpectedCase ← 0
 2 for each classifier cl in [A]
 3   cl.exp++
 4   UPDATE APPLICATION AVERAGE of cl with respect to t
 5   if(cl DOES ANTICIPATE CORRECTLY σ in σ_{-1})
 6     newCl ← EXPECTED CASE of cl in σ, σ_{-1}
 7     wasExpectedCase ← 1
 8   else
 9     newCl ← UNEXPECTED CASE of cl in σ, σ_{-1}
 10    if(cl.q < θ_i)
 11      remove classifier cl from [P] and [A]
 12  if(newCl is not empty)
 13    newCl.tga ← t
 14    ADD ALP CLASSIFIER newCl to [P] and [A]
 15 if(wasExpectedCase = 0)
 16   newCl ← COVER TRIPLE σ_{-1}, act, σ with time t
 17   ADD ALP CLASSIFIER newCl to [P] and [A]
UPDATE APPLICATION AVERAGE(cl, t):
 1 if(cl.exp < 1/β)
 2   cl.aav ← cl.aav + (t − cl.talp − cl.aav) / cl.exp
 3 else
 4   cl.aav ← cl.aav + β · (t − cl.talp − cl.aav)
 5 cl.talp ← t
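The MAM update can be sketched in Python as below. The header and the direct-averaging branch were truncated in the text of this procedure, so their usual MAM form (direct averaging while cl.exp < 1/β) is an assumed reconstruction; the Widrow-Hoff branch matches the surviving line:

```python
def update_application_average(cl, t, beta=0.05):
    """Moyenne adaptive modifiee update of cl.aav, then stamp cl.talp.
    Direct averaging for young classifiers (assumed reconstruction),
    Widrow-Hoff style update with rate beta afterwards."""
    delay = t - cl['talp']
    if cl['exp'] < 1.0 / beta:          # assumed MAM switch-over point
        cl['aav'] += (delay - cl['aav']) / cl['exp']
    else:
        cl['aav'] += beta * (delay - cl['aav'])
    cl['talp'] = t
    return cl
```

The point of MAM is that early estimates converge at the speed of a sample average while late estimates keep tracking a moving target at rate beta.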
Check Anticipation. While the pass-through symbols in the effect part of a classifier directly anticipate that these attributes stay the same after the execution of an action, the specified attributes anticipate a change to the specified value. Thus, if the perceived value did not change to the anticipated value but actually stayed at the old value, the classifier anticipates incorrectly. This is considered in the DOES ANTICIPATE CORRECTLY procedure. Inputs are the to-be-investigated classifier cl, the situation σ_{-1} in which cl was applied, and the resulting situation σ.
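The check described above can be written compactly; note in the sketch below (dict-based classifiers are an illustrative assumption) that a specified effect symbol fails in two ways, by predicting the wrong value or by predicting a change that did not happen:

```python
def does_anticipate_correctly(cl, sigma_prev, sigma):
    """DOES ANTICIPATE CORRECTLY: '#' in E predicts no change; a specified
    symbol predicts a change to exactly that value (staying at the old
    value counts as an incorrect anticipation)."""
    for e, before, after in zip(cl['E'], sigma_prev, sigma):
        if e == '#':
            if before != after:
                return False
        elif after != e or before == after:
            return False
    return True
```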
Unexpected Case. The unexpected case is rather simply structured. The important point is the criterion for generating an offspring classifier. An offspring is generated only if the effect part of the to-be-investigated classifier cl can be modified to anticipate the change from σ_{-1} to σ correctly by only specializing attributes. If this is the case, an offspring classifier is generated that is specialized in condition and effect part where necessary. The experience of the offspring classifier is set to one.
UNEXPECTED CASE(cl, σ_{-1}, σ):
 1 cl.q ← cl.q − β · cl.q
 2 SET MARK cl.M with σ_{-1}
 3 for all positions i in σ
 4   if(cl.E[i] ≠ #)
 5     if(cl.E[i] ≠ σ[i] or σ_{-1}[i] = σ[i])
 6       return empty
 7 child ← copy classifier cl
 8 for all positions i in σ
 9   if(cl.E[i] = # and σ_{-1}[i] ≠ σ[i])
 10    child.C[i] ← σ_{-1}[i]
 11    child.E[i] ← σ[i]
 12 if(child.q < 0.5)
 13   child.q ← 0.5
 14 child.exp ← 1
 15 return child
Covering. The idea behind covering is that ACS2 intends to cover all possible situation-action-effect triples. In the ALP, if such a triple was not represented by any classifier in the action set, covering is invoked. Covering generates a classifier that specifies all changes from the previous situation σ_{-1} to situation σ in condition and effect part. The action part A of the new classifier is set to the executed action act. The time is set to the current time t. An empty classifier refers to a classifier that consists only of #-symbols in condition and effect part. Note that, since the experience counter is set to 0, the application average parameter aav will be set directly to the delay until its first application when the classifier is first applied, so that its initialization is not particularly important. Moreover, the quality cl.q is set to 0.5 and the reward prediction cl.r is set to zero to prevent 'reward bubbles' in the environmental model.
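A covering sketch following this paragraph is given below; the dict layout and attribute names of a classifier are illustrative assumptions:

```python
def cover_triple(sigma_prev, act, sigma, t):
    """COVER TRIPLE sketch: specify exactly the attributes that changed
    from sigma_prev to sigma in condition and effect; the rest stays '#'."""
    C = ''.join(p if p != s else '#' for p, s in zip(sigma_prev, sigma))
    E = ''.join(s if p != s else '#' for p, s in zip(sigma_prev, sigma))
    return {'C': C, 'A': act, 'E': E,
            'q': 0.5, 'r': 0.0, 'ir': 0.0,   # r = 0 avoids 'reward bubbles'
            'exp': 0, 'num': 1, 'aav': 0.0,
            'talp': t, 'tga': t, 'M': set()}
```

For the transition "0100" → "0110", only the third attribute changed, so the new classifier carries C = "##0#" and E = "##1#".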
The reinforcement portion of the update procedure follows the idea of Q-learning (Watkins, 1989). The classifiers' reward predictions are updated using the immediate reward and the discounted maximum payoff predicted in the next time-step, maxP. The major difference is that ACS2 does not store an explicit model but only more or less generalized classifiers that represent the model. Thus, for the reinforcement learning procedure to work successfully, it is mandatory that the model is specific enough for the reinforcement distribution. Lanzi (2000) formalizes this insight in a general classifier system framework. The procedure updates the reward predictions r as well as the immediate reward predictions ir of all classifiers in the action set [A].
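The update itself is a pair of Widrow-Hoff steps; a sketch (rho denotes the immediate reward, and dict-based classifiers are an illustrative assumption):

```python
def apply_reinforcement_learning(action_set, rho, max_p, beta=0.05, gamma=0.95):
    """Q-learning-flavoured update of all classifiers in [A]: the reward
    prediction r moves toward rho + gamma * maxP, and the immediate
    reward prediction ir moves toward rho."""
    for cl in action_set:
        cl['r'] += beta * (rho + gamma * max_p - cl['r'])
        cl['ir'] += beta * (rho - cl['ir'])
```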
APPLY GENETIC GENERALIZATION([A], t):
 1 if(t − (Σ_{cl ∈ [A]} cl.tga · cl.num) / (Σ_{cl ∈ [A]} cl.num) > θ_ga)
Mutation. As has been noted before, the mutation process in ACS2 is a generalizing mutation of the condition part cl.C. Specified attributes in the condition are changed to #-symbols with a certain probability μ. The process works as follows:
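Since the pseudocode of the mutation process is truncated here, the following Python sketch implements the rule just stated (dict-based classifiers and the RNG seeding are illustrative assumptions):

```python
import random

def generalizing_mutation(cl, mu, rng=random.Random(3)):
    """Generalizing mutation: each specified (non-'#') attribute of the
    condition cl.C is replaced by '#' with probability mu; effect and
    action parts are never touched."""
    cl['C'] = ''.join('#' if c != '#' and rng.random() < mu else c
                      for c in cl['C'])
    return cl
```

Note that the mutation can only generalize: a '#' can never turn back into a specified symbol, so repeated application moves conditions monotonically toward generality.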
GA Deletion. While the reproduction process uses a form of roulette-wheel selection, GA deletion in ACS2 applies a modified tournament selection process. Approximately a third of the action set size takes part in the tournament. The classifier that has a significantly lower quality than the others is deleted. If all classifiers have a similar quality, marked classifiers are preferred for deletion over unmarked classifiers, and the least applied classifier is preferred among only marked or only unmarked classifiers. First, however, the method controls whether and for how long classifiers need to be deleted in [A]. The parameter inSize specifies the number of children that will still be inserted in the GA process. Note that the tournament is held among the micro-classifiers. If a classifier is removed from the population, that is, if its numerosity reaches zero, the classifier needs to be removed from the action set [A] as well as from the whole population [P].
DELETE CLASSIFIERS([A], [P], inSize):
 1 while(inSize + Σ_{cl ∈ [A]} cl.num > θ_as)
 2   clDel ← empty
 3   for each micro-classifier cl in [A]
 4     if(RandomNumber[0, 1) < 1/3)
 5       if(clDel is empty)
 6         clDel ← cl
 7       else
 8         if(cl.q − clDel.q < −0.1)
 9           clDel ← cl
 10        if(|cl.q − clDel.q| ≤ 0.1)
 11          if(cl.M is not empty and clDel.M is empty)
 12            clDel ← cl
 13          else if(cl.M is not empty or clDel.M is empty)
 14            if(cl.aav > clDel.aav)
 15              clDel ← cl
 16  if(clDel is not empty)
 17    if(clDel.num > 1)
 18      clDel.num--
 19    else
 20      remove classifier clDel from [P] and [A]
Insertion in the GA. Although quite similar to the ALP insertion, the insertion method in the GA differs in two important points. First, the numerosity num rather than the quality q of an old, subsuming or identical classifier is increased. Second, the numerosity of an identical classifier is only increased if the identical classifier is not marked. Parameters are, as before, the to-be-inserted classifier cl, the action set [A] that cl was generated from, and the current population [P].
ADD GA CLASSIFIER(cl, [A], [P]):
 1 oldCl ← empty
 2 for all classifiers clOld in [A]
 3   if(clOld IS SUBSUMER of cl)
 4     if(oldCl is empty or clOld.C is more general than oldCl.C)
 5       oldCl ← clOld
 6 if(oldCl is empty)
 7   for all classifiers clOld in [A]
 8     if(clOld is equal to cl in condition and effect part)
 9       oldCl ← clOld
 10 if(oldCl is empty)
 11   insert cl in [A] and [P]
 12 else
 13   if(oldCl is not marked)
 14     oldCl.num++
 15   discard classifier cl
3.9 Subsumption
ACS2 looks for subsuming classifiers in the GA application as well as in the ALP application. For a classifier cl_sub to subsume another classifier cl_tos, the subsumer needs to be experienced, reliable, and not marked. Moreover, the subsumer's condition part needs to be syntactically more general and the effect part needs to be identical. Note again that an identical-action check is not necessary since both classifiers occupy the same action set. The procedure returns whether classifier cl_tos is subsumed by cl_sub but does not apply any consequent parameter changes.
IS SUBSUMER(cl_sub, cl_tos):
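Since the body of IS SUBSUMER is truncated here, the following Python sketch implements the criteria stated in the paragraph above; the threshold names theta_exp and theta_r, their defaults, and the strict reading of "more general" are assumptions:

```python
def is_subsumer(cl_sub, cl_tos, theta_exp=20, theta_r=0.9):
    """cl_sub subsumes cl_tos if it is experienced, reliable, and unmarked,
    its condition is syntactically more general, and its effect part is
    identical (an action check is unnecessary inside one action set)."""
    if cl_sub['exp'] <= theta_exp or cl_sub['q'] <= theta_r or cl_sub['M']:
        return False
    if cl_sub['E'] != cl_tos['E']:
        return False
    # more general: every specified symbol of cl_sub also appears in cl_tos,
    # and cl_sub leaves strictly more positions unspecified
    if any(s != '#' and s != t for s, t in zip(cl_sub['C'], cl_tos['C'])):
        return False
    return cl_sub['C'].count('#') > cl_tos['C'].count('#')
```

As the paragraph notes, the check is a pure predicate: it reports subsumption but leaves all parameter changes to the caller.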
4 Summary
This chapter gave a precise overview of the ACS2 system. Interaction, knowledge representation, and parameter identification should serve as a basic reference when implementing a new problem and applying ACS2 to it. The algorithmic description revealed all processes inside ACS2 and should serve as a helpful guide for programming one's own version of ACS2 or for developing an enhanced anticipatory learning classifier system out of the ACS2 framework. The description did not include any implementation details, so the system should be programmable in any programming language with the help of this description.
Acknowledgments. We would like to thank the Department of Cognitive Psychology at the University of Würzburg for their support. The work was sponsored by the German Research Foundation DFG.
References