Professional Documents
Culture Documents
338
by state machines, predicates defining associations between of the algorithm follows 1 :
sorts, and action schema in solver-ready form.
procedure LOCM I ( Input action sequence Seq)
The LOCM Method For each combination of action name A and
argument pos P for actions occurring in Seq
Phase 1: Extraction of state machines In our approach, Create transition A.P , comprising
an object of any given sort occupies one of a fixed set of new state identifiers A.P.start and A.P.end
parameterised states. In Phase 1, we assume an object’s Add A.P to the transition set T S
state can be defined without parameters (parameters spec- Collect the set of objects Obs in Seq
ify associations with other objects). LOCM starts by first For each object Ob occurring in Obs
collecting the set of all transitions occurring in the exam- For each pair of transitions T1 , T2
ple sequences. A transition is defined by a combination of consecutive for Ob in Seq
action name and action argument position. For example, Equate states T1 .end and T2 .start
end
an action fetch wrench(wr1,cntnr) gives rise to two transi-
end
tions: fetch wrench.1, and fetch wrench.2. Each transition Return T S, transition set
describes the state change of objects of a single sort in iso- OS, set of object states remaining distinct
lation. For every transition occurring in the example data, a
separate start and end state are generated. The trajectory of
each object is then tracked through each training sequence. At the end of phase 1, LOCM has derived a set of state ma-
For each consecutive pair of transitions T1 , T2 , experienced chines, each of which can be identified with a sort.
by an object Ob, we assume that T1 .end = T2 .start. Phase 2: Identification of state parameters Each state
Using a training set from the tyre world, suppose some machine describes the behaviour of a single object in isola-
object c1 goes through a sequence of transitions given in the tion, without consideration of its association with other ob-
example introduced above: jects e.g. it can distinguish a state of a wrench corresponding
to being in some container, but does not make provision to
open(c1); fetch jack(j,c1); fetch wrench(wr1,c1); close(c1);
describe which container it is in.
Let us assign names S1 to S8 to the states of c1 as it is af- In the object-centred representation, the dynamic asso-
fected by each transition: ciations between objects are recorded by state parameters
embedded in each state. Phase 2 of the algorithm iden-
S1 =⇒ open.1 =⇒ S2 tifies parameters of each state by analysing patterns of
S3 =⇒ close.1 =⇒ S4 object references in the original action steps correspond-
S5 =⇒ f etch jack.2 =⇒ S6
S7 =⇒ f etch wrench.2 =⇒ S8 ing to the transitions. For example, consider the state
wrench state0 for the wrench sort (Fig. 2). Consider-
Using the example action sequence, and the constraint on ing the actions for putaway wrench(wrench,container), and
consecutive pairs of transitions, we can then deduce that fetch wrench(wrench,container). For a given wrench, con-
S2 = S5 , S6 = S7 , S8 = S3 . Suppose our example set secutive transitions putaway wrench, fetch wrench, in any
contains another action sequence: example action sequence, always have the same value as
their container parameter. From this observation, we can
open(c2); fetch wrench(wr1,c2); fetch jack(j,c2); close(c2); induce that the state wrench state0 has a state variable repre-
senting container. The same observation does not hold true
We deduce that S2 = S7 , S8 = S5 , S6 = S3 , and hence
for wrench state1. We can observe instances in the training
S2 , S3 , S5 , S6 , S7 , S8 all refer to the same state. If addition-
data where the wrench is fetched from one container, and
ally we have the sequence:
put away in a different container.
close(c3); open(c3);
then S4 = S1 can be deduced, and hence we have tied
together individual states to partially construct a state
loosen.3
machine for containers (Fig. 1). A more formal description
tighten.3
undo.3
do up.3
fetchjack.2
fetchwrench.2 wrenchstate0 fetchwrench.1
wrenchstate1
container putawaywrench.1
open.1
containerstate0 containerstate1
close.1
Figure 2: Parameterised states of wrench.
1
Figure 1: An incomplete state machine for containers in Whereas our system is designed to use multiple training se-
tyre-world quences. To simplify, the presentation here uses only a single se-
quence.
339
This second phase of the algorithm performs inductive (not (wrench_state1 ?wrench1))))
learning such that the hypotheses can be refuted by the ex-
amples, but never definitely confirmed. This phase gener- The generated predicates wrench state0, wrench state1,
ally requires a larger amount of training data to converge container state1 can be understood as in container,
than Phase 1 above. Phase 2 is processed in three steps, have wrench and open respectively. The generated schema
shown below in the algorithmic description. The first two can be used directly in a planner.
steps generate and test the hypothesised correlations in ac-
tion arguments, which indicate the need for state parameters. Evaluation of LOCM
The third step generates the set of induced state parameters. LOCM has been implemented in Prolog incorporating the
procedure LOCM II ( Input action sequence Seq, algorithm detailed above. Here we attempt to analyse and
Transition set T S, Object set Obs) evaluate it by its application to the acquisition of existing
Object state set OS) domain models. We have used example plans from two
Form hypotheses from state machine
sources: (1) Existing domains built using GIPO III. In this
For each pair A1 .P1 and A2 .P2 in T S
such that A1 .P1 .end = S = A2 .P2 .start case, we have created sets of example action sequences by
For each pair A1 .P1 and A2 .P2 from T S and S in OS random walk. (2) Domains which were used in IPC planning
with A1 .P1 .sort = A2 .P2 .sort competitions. In this case, the example traces come from so-
and P1 = P1 , P2 = P2 lution plans in the publicly released competition solutions.
(i.e. a pair of the other arguments We have successfully used LOCM to create state ma-
of actions A1 and A2 sharing a common sort) chines, object associations and action schema for 3 domains.
Store in hypothesis set HS the hypothesis Evaluation of these results is ongoing, but initial results
that when any object ob undergoes sequentially show that state machines can be deduced from a reason-
the transitions A1 .P1 then A2 .P2 , ably small number of plan examples, whereas inducing the
there is a single object ob ,
state parameters requires much larger training sets. The total
which goes through both of the corresponding
transitions A1 .P1 and A2 .P2 number of steps in sets of training sequences required for de-
(This supports the proposition that state S riving state machines and parameters is summarised below
has a state parameter which can record for three example domains.
the association of ob with ob ) Domain Examples # steps # steps
end (state) (params)
end Tyre-world Random 125 2327
Test hypotheses against example sequence Blocks Random 34 250
For each object Ob occurring in Obs Driverlog Competition 205 3046
For each pair of transitions A1 .P1 and A2 .P2
consecutive for Ob in Seq Tyre-world (GIPO version). A correct state machine is de-
Remove from hypothesis set HS any hypothesis rived, corresponding closely to the domain as constructed
which is inconsistent with example action pair in GIPO. The induced domain contains extra states for the
end jack sort, but this model is valid (see fig. 3). After train-
end ing to convergence there are 3 parameter flaws, leading to
Generate and reduce set of state parameters some faults in action schema (see the end of this section
For every hypothesis remaining in HS for a discussion of flaws).
create the state parameter supported by the hypothesis
Merge state parameters on the basis that Blocks (GIPO version). A correct state machine is derived.
a transition occurring in more than one transition pair After training to convergence there are 3 parameter flaws.
is associated with the same state parameter in each occurrence The low number of steps needed to derive the state ma-
end chine is due to there being only 2 sorts in the domain,
return: state parameters and correlations with action arguments both of which are involved in every action.
Driverlog (IPC strips version). State machines and param-
Phase 3: Formation of action schema Extraction of an eters are correct for all sorts except trucks. For trucks,
action schema is performed by extracting the transitions cor- the distinction of states with/without driver is lost, and an
responding to its parameters, similar to automated action extra state parameter (driver) is retained.
construction in the OLHE process in (Simpson, Kitchin, and
McCluskey 2007). One predicate is created to represent Randomly-generated example data can be different in
each object state. The output of Phase 2 provides corre- character from purposeful, goal-directed plans. In a sense,
lations between the action parameters and state parameters random data is more informative, because the random plan is
occurring in the start/end states of transitions. For example, likely to visit more permutations of action sequences which
the generated putaway wrench action schema is: a goal-directed sequence may not. However, if the useful,
(:action putaway_wrench
goal-directed sequences lead to induction of a state machine
:parameters (?wrench1 - wrench ?container2 - container)
with more states, this could be seen as useful heuristic infor-
:precondition (and (wrench_state1 ?wrench1) mation.
(container_state1 ?container2)) Where there is only one object of a particular sort (e.g.
:effect (and (wrench_state0 ?wrench1 ?container2) gripper, wrench, container) all hypotheses about matching
340
wheel0
fetchwheel.1 information may vary anyway between training examples.
putawaywheel.1 wheel1 putonwheel.1
Conclusion
Figure 3: Other state machines induced from the tyre-world.
In this paper, we have described the LOCM system and
its use in learning domain models (comprising object sorts,
state descriptions, and action schema), from example action
that sort always hold, and the sort tends to become an in-
ternal state parameter of everything. For this reason, it is sequences containing no state information. Although it is
unrealistic to expect example sets of plans to be available for
important to use training data in which more than one object
all new domains, we expect the technique to be beneficial in
of each sort is used.
domains where automatic logging of some existing process
The induced models may contain detectable flaws: the ex-
yields plentiful training data, e.g. games, online transac-
istence of a state parameter has been induced, but there are
tions. The work is at an early stage, but we have already
one or more transitions into the state which do not set the
obtained promising results on benchmark domains, and we
state parameter. The flaws usually arise because state pa-
see many possibilities for further developing the technique.
rameters are induced only by considering pairs of consec-
utive transitions, not longer paths. The inconsistency may
indicate that an object reference is carried in from another References
state without being mentioned in an action’s argument. In Fox, M., and Long, D. 1998. The automatic inference of
this case a repair to the model can be proposed, which in- state invariants in TIM. J. Artif. Intell. Res. (JAIR) 9:367–
volves adding the “hidden” parameter to some states, but a 421.
further cycle of testing against the example data would be McCluskey, T.; Cresswell, S.; Richardson, N.; and West,
required to check that the repair is consistent. This will be M. M. 2009. Automated acquisition of action knowledge.
further developed in future work. In International Conference on Agents and Artificial Intel-
The most fundamental limitation is whether it is possible ligence (ICAART), 93–100.
to correctly represent the domain within the limitations of Shahaf, D., and Amir, E. 2006. Learning partially observ-
the representation that we use for action schema. able action schemas. In AAAI. AAAI Press.
• We assume that an action moves the objects in its argu- Simpson, R. M.; Kitchin, D. E.; and McCluskey, T. L.
ments between clearly-defined substates. Objects which 2007. Planning Domain Definition Using GIPO. Journal
are passively involved in an action may make a transition of Knowledge Engineering 1.
to the same state, but cannot be in a don’t care state. Wu, K.; Yang, Q.; and Jiang, Y. 2005. ARMS: Action-
• Static background information, such as the specific fixed relation modelling system for learning acquisition models.
relationships between objects (e.g. which places are con- In Proceedings of the First International Competition on
nected), is not analysed by the system. In general, this can Knowledge Engineering for AI Planning.
lead to missing preconditions. The LOCM algorithm as-
sumes that all information about an object is represented
in its state and state parameters. In general, this form of
341