Robot Path Planning for Maze Navigation

Dimitris C. Dracopoulos
Brunel University
Department of Computer Science
London, UK
E-mail: Dimitris.Dracopoulos@brunel.ac.uk

Abstract- This paper presents the application of multilayer perceptrons to the robot path planning problem, and in particular to the task of maze navigation. Previously published results implied that the training of feedforward multilayer networks on this problem failed because of the non-smoothness of the data. Here the same maze problem is revisited.

0-7803-4859-1/98 $10.00 © 1998 IEEE

1. Introduction

One of the major components for the creation of autonomous robots is the ability of a robot to "plan its paths" and, in general, the ability to "plan its motion". In a limited or carefully engineered environment it is possible to program the robot for all possible combinations of motions in order to accomplish specific tasks [1]. But even when this is possible, one would like to tell the robot "what" to do, rather than "how" to do it, making its operation much easier [1].

In general, however, "pre-programming" a robot for all possible cases or conditions it will meet is impossible, because the number of motion combinations can be large or infinite (as is the case in non-tracked systems). In addition, it is highly desirable to have robots which are able to adapt and operate in unknown or changing environments. Adaptation, robustness and operation in a wide range of environments provide robots with a higher degree of "intelligence". Such intelligence can be achieved only through learning.

This paper considers the application of artificial neural networks (and more specifically multilayer perceptrons) to robot path planning. The problem addressed here is that of maze navigation.

Apart from the obvious industrial applications, the solution of such a problem is inspired by daily life. Humans seem to be able to find their optimal path through rooms they have not visited before, without repeatedly bumping into or colliding with the various obstacles in the room. Somehow, they are able to "see" the obstacles and plan an appropriate optimum path so as to avoid them [2]. In contrast, many path planning techniques (including machine learning algorithms) seem to lead the robot through a room and attempt to "force" the robot to learn its future path after a number of collisions with obstacles in the environment. Besides being unnatural (when compared with humans), such an approach is infeasible for most real industrial applications, as a robot colliding with its environment can be catastrophic for the robot itself and damaging for the environment.

The next section provides some background in path planning and relevant techniques. Section 3 describes the simple maze problem considered here, and Section 4 proposes an artificial neural network architecture for the solution of the maze problem, along with the results obtained from this approach. Finally, Section 5 summarises what was achieved and gives directions for future work.

2. Path planning

The general problem of path planning for autonomous robots is defined as the search for a path which a robot (with specified geometry) has to follow in a described environment, in order to reach a particular position and orientation B, given an initial position and orientation A (Figure 1). The path is subject to certain constraints which serve to avoid obstacle collision and optimise the performance of the robot (e.g. find the path with the minimum distance, or find the path which minimises the energy spent by the robot).

Sometimes the motion of a robot is restricted to particular paths or roadways, e.g. railway tracks [3]. In this case, the problem of path planning is equivalent to that of graph search, and any graph search algorithm can be utilised (e.g. A* or Dijkstra's algorithm). In general, however, systems are non-tracked and the number of routes a robot can follow is infinite.

Real robot path planning becomes even more complicated because the shape of the robot has to be taken into account. A tool commonly used to handle this extra complication is the configuration space. The robot's configuration space C represents the robot as a point and maps the obstacles into this space in an appropriate way. This mapping transforms the problem of planning the motion of a dimensioned object into the problem of planning the motion of a point
[1]. The full mathematical analysis of a path planning problem using the configuration space becomes too complex as the number of dimensions increases; therefore other techniques have to be used in combination with the configuration space.

The potential field approach is one of the most popular methods. Following this approach, the robot resembles a particle moving under the influence of a potential field created by the target configuration (position and orientation of the robot) and the obstacles in the C space. The target is considered to be negatively charged, which makes the robot be attracted to it, while the obstacles have a positive charge, resulting in a robot motion which avoids the obstacles. The total potential field applies a force to the robot, equal to its negative gradient, which pulls the robot towards the goal. The direction of this force defines the robot's trajectory.

A number of potential field algorithms exist. Many of them suffer from the problem of local minima. Depending on the charges of the obstacles and the goal configuration, the robot may be led to a position where the net force exerted upon it is zero. In such a situation it cannot move any further and is trapped. To overcome this problem, potential field methods must either use potential functions designed to have no local minima (something which is not always easy and is in general computationally complex), or employ mechanisms which allow the robot to escape from local minima. Even so, the computational complexity of path planning increases exponentially with the dimension of the robot's configuration space [1]. An additional complication for the application of such techniques is an environment which changes dynamically (e.g. if the obstacles are not fixed but moving). In such a case, the potential function depends on time t, and its design is an extremely difficult or impossible task (consider the extreme case where the obstacles are moving in an unknown way, i.e. the environment is unknown).

Besides the environmental constraints, which add another level of difficulty to the path planning problem, there are also kinematic constraints on the motion of the robot. Two broad categories of kinematic constraints are usually considered: holonomic and non-holonomic constraints. The former reduce the dimension of the configuration space attainable by the robot, while the latter reduce the number of possible motions. For example, a car-robot is subject to non-holonomic constraints, as its motion is restricted to be always along its main axis.

Figure 1. Path planning: the robot would like to move from the position and orientation A to the position and orientation B, avoiding the shaded obstacles.

3. The maze problem

It is well known that maze navigation is an important task in robotics. The maze problem considered in this paper was proposed by Werbos in [4]. This problem was used as a testbed for the capability of different artificial neural networks to approximate non-smooth functions. As pointed out in [2], the testbed was established because the team in [5] was unable to successfully train a multilayer perceptron to solve the problem.

The task is defined as follows: given a maze of the type shown in Figure 2 (which may vary in size, number of dimensions or configuration of the obstacles), find an appropriate path which moves a robot from an initial position I to a target position G, while minimising the distance which the robot has to travel.

In [4], [6] a specialised neural network architecture, based on simultaneous recurrent networks (SRNs), was designed to approximate the dynamic programming J function of the maze navigation problem. Such an approximation is crucial for adaptive critic neurocontrol methods. These methods are approximate dynamic programming (ADP) techniques: given a utility function U which has to be maximized over all future time (which could be an infinite or a finite time horizon), find an approximation of the strategic utility function J, for which maximization in the short term will maximize U in the long term [6]. Exact dynamic programming is too "expensive" computationally for complex dynamic systems or large problems; therefore ADP methods are used. A neural network (called the critic network) learns to output (predict) an approximation J* of the function J. Thus the Bellman equation in dynamic
programming [7], equation (1), is not solved exactly; instead, a neural network is used to approximate J(R) by J*(R) [8].

Figure 2. The maze problem: find the full optimum path which will lead the robot from the position (xi, yi) to the position (xg, yg).

However, in [4], [6] the emphasis was placed on the capability of the simultaneous recurrent network architecture to approximate the J function, rather than on solving the path planning problem for the 5 by 5 maze of Figure 2. In addition, the results presented the approximation of the J function for a particular target goal, although recent experiments with the proposed SRN architecture [2] seemed to give promising results for generalisation to different mazes. Generalisation to cases which have not been "seen" by the planner (either different initial and final conditions or different mazes) is of great importance for real world path planning applications. Letting a robot move around a fixed environment, colliding with objects and becoming familiar with it in order to perform exactly the same navigation task again, is an approach of limited usefulness. Besides the cost involved in such techniques (due to the damage caused by collisions), the need for intelligent behaviour, defined by properties such as adaptivity and robustness, makes their application inappropriate in many cases. After all, humans are able to "see" their near-optimum path when they arrive in a room for the first time.

Intelligent behaviour of a robot can be achieved only if the robot is capable of learning. Artificial neural networks have learning properties which make them ideal candidates for robot motion planning. However, a neural network application was unsuccessful for a problem of this type [5]. The next section proposes a neural architecture for the solution of the maze problem described here.

4. Neural Path Planning

The maze problem illustrated in Figure 2 has an extra complication, which is not apparent when one considers the problem for the first time. For a given cell and a specific goal position, there may be more than one equally good direction [6]. Some of the cells have up to four equally good directions, as there are four different paths which are optimum for the target configuration. This can be very confusing for any path planner, including humans. In particular, such a case may cause many difficulties for the training of artificial neural networks. Since the mapping between the current state and the next action is one-to-many, multilayer perceptrons will learn an incorrect model. This is true because standard supervised learning algorithms average over multiple targets, assuming a squared error criterion function [8]. In addition, such data can be non-smooth, and training based on such data can be very difficult or impossible for most networks (this was the motivation for the work on specialised, more powerful SRN networks in [6]).

The approach tested here is based on feedforward backpropagation-type neural networks. Although such networks suffer from the problem of local minima, in practice it is found that many local minima give good results. In addition, a global minimum of the total error on the training data does not guarantee or imply that the generalisation of the network will be better. However, one could apply a straightforward validation test (by splitting the available data into three sets) and decide the point at which the training of the network should be stopped [8].

The network used here is trained to predict the direction in which the robot should move at the next time step. Hence, one output node is used. The inputs to the network are the current position of the robot (x, y) and its target position (xg, yg). A network architecture with two hidden layers and a total size of 4-10-10-1 was used (Figure 3); this size was determined after a number of experiments, although no real effort was made to optimise the network architecture.

The training set consisted of 174 distinct samples, mapping the current position and target position to the optimum action. These training data were generated
from the optimum full paths of 50 trajectories, chosen from 50 different initial-target conditions. Generating the training data from optimum full trajectories makes the training of the network an easier task, as the data produced in this way are "smoother". The target values of the output node were scaled to the range [-0.9, 0.9] so as to avoid the "saturated areas" of the sigmoidal function used.

Standard multilayer perceptron training can be significantly improved in terms of speed of convergence if an adaptive learning rate is used [9]. In all of the experiments described here, a different rule for adapting the learning rate α was used, as follows:

$$
\alpha \leftarrow
\begin{cases}
0.7\,\alpha, & \text{if } \text{error}(t) > 1.04\,\text{error}(t-1),\\
1.05\,\alpha, & \text{otherwise},
\end{cases}
\tag{2}
$$

where error(t) denotes the total training error after iteration t.

Using this learning rate update after each iteration over the training samples not only speeds up the process of learning but can also help to avoid many local minima. The initial value used here for the learning rate was 0.02. In addition, the learning rule used to update the weights utilised the standard momentum term with a coefficient of 0.7.

The possible number of combinations of inputs and outputs to the network is 462 (if one ignores the trivial case where the initial and final conditions are the same, so the robot does not have to move). The described feedforward network learned the 174 training samples perfectly, and its training was stopped after 50,000 iterations. The remaining 288 samples were used to test the generalisation capability of the network. Out of the 288 test samples, the network correctly predicted the next move of the robot in 203 cases (approximately 70.5%). This is based on the assumption that only one move out of the four is correct (hypothesis of uniqueness) and that priority of correctness is given according to the order: North, South, East, West. That is, if more than one move leads the robot along an optimum path, only the move which comes first in this order is counted as correct (this assumption was made because the training targets were generated in the same fashion). However, if one accepts as a correct prediction any of the moves which lead the robot along an optimum path (which is much more realistic and reflects the true case), then the generalisation capability of the network exceeds 80%. Such high accuracy is very desirable in real world path planning problems.

Figure 3. The architecture of the network used. The inputs to the network are the current robot position (x, y) and its goal (xg, yg). Its output determines whether the robot will move North, South, East or West at the next time step.

5. Conclusions and future work

This work presented how multilayer perceptrons can be applied to the robot path planning problem. For this purpose, the task of maze navigation was considered. Previously published work indicated that the problem could not be solved using standard feedforward backpropagation-type networks [5]. The results shown here suggest that such an approach is not only feasible for path planning problems, but also that the accuracy which can be achieved is quite high.

Future work has to demonstrate how well the neural architecture scales when applied to mazes of different sizes or to mazes where the robot has more degrees of freedom. Although it might initially be thought that more degrees of freedom make the problem more difficult, such a case will generate much smoother data, something which usually makes the training of multilayer perceptrons an easier task.

References
[1] Jean-Claude Latombe, Robot Motion Planning, Kluwer Academic Publishers, 1991.
[2] Paul J. Werbos, personal communication, 1996.
[3] Stephen Cameron, "Obstacle avoidance and path planning", Industrial Robot, vol. 21, no. 5, 1994.
[4] Paul J. Werbos and Xiaozhong Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot", in World Congress on Neural Networks, San Diego, California, September 1996, pp. 88-93, Lawrence Erlbaum Associates, Inc. and INNS Press.
[5] P. Houillon and A. Caron, "Planar robot control in cluttered space by artificial neural network", Journal of Math Modeling and Computing, pp. 498-502, 1993.
[6] Xiaozhong Pang and Paul J. Werbos, "Neural network design for J function approximation in dynamic programming", Neural Networks, to appear.
[7] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
[8] Dimitris C. Dracopoulos, Evolutionary Learning Algorithms for Neural Adaptive Control, Springer Verlag, August 1997.
[9] David A. White and Donald A. Sofge, Eds., Handbook of Intelligent Control, Van Nostrand Reinhold, 1992.