
Copyright © IFAC Control Applications in Marine Systems,
Glasgow, Scotland, UK, 2001

IFAC Publications
www.elsevier.com/locate/ifac

AUVS' DYNAMICS MODELING, POSITION CONTROL, AND PATH PLANNING
USING NEURAL NETWORKS

Hassan Sayyaadi* and Tamaki Ura**

*Assistant Professor, School of Mechanical Engineering, Sharif University of Technology, Tehran, Iran
**Professor, Institute of Industrial Science, The University of Tokyo

Abstract: Accurate identification of nonlinear time-variant MIMO systems, especially AUVs, is essential for the implementation of control algorithms and for navigation. Control of AUVs is also difficult because of the nonlinear dynamic behavior of the vehicles and the unpredictable effects of the surrounding water mass. These nonlinear effects are complicated enough to make dynamics modeling and position control hard to formulate with conventional methods. The method proposed here uses neural networks as a general framework for the dynamics modeling and position control of any six-degree-of-freedom rigid body, and is applied to an AUV named Twin Burger 2 as an example. Supervised Learning and Unsupervised Learning are used to adjust the neural networks' synaptic weights, and the results are illustrated. Path planning of AUVs using a neural network is also addressed as an example of a complicated control scheme; Reinforcement Learning is used to adjust the neural network parameters of the path planning module, demonstrated through obstacle avoidance examples. Copyright © 2001 IFAC

Keywords: Autonomous Underwater Vehicle (AUV), dynamics modeling, position control, path planning, neural network, SISO, MIMO, obstacle avoidance.

1. INTRODUCTION

This paper discusses three issues regarding AUV systems: 1) dynamics modeling, 2) position control, and 3) path planning, all investigated using neural networks.

The most popular method for deriving the dynamic equations of motion of a movable rigid body is conventional Newton-Lagrange mechanics. To implement this method and derive the equations of motion for navigation and control purposes, in addition to describing the kinematics of the rigid body precisely, it is necessary to identify all of the external force and moment terms as precisely as possible. Dynamics modeling of AUVs using Newton-Lagrange methods cannot be done easily because of the difficulty of identifying the environmental effects caused by the surrounding water mass and the nonlinearities of the actuators' performance. Likewise, position control of MIMO systems such as AUVs with the above-mentioned nonlinearities cannot easily be accomplished using conventional control theories. This paper therefore seeks proper substitutes for conventional dynamics modeling and conventional position control, overcoming these difficulties with neural networks.

Recently, various neural network structures have been applied to the system identification and control of AUV systems (Fujii and Ura, 1991). The neural network structure proposed by Fujii consists of two parts: a Forward Model and a Controller Network. The Forward Model network is responsible for identifying the dynamic behavior of the vehicle and is trained off-line using Supervised Learning. The position control network is used to keep the vehicle at a desired position and is trained on-line using Unsupervised Learning. The only disadvantage of the control system proposed by Fujii is the time consumed by on-line adaptation. The Imaginary Training algorithm proposed by Ishii et al. (1995) modifies Fujii's structure by dividing it into two parts, called the Imaginary World and the Real World. The Real World part is used for on-line execution of the control task, and the Imaginary World part is used for off-line adaptation of the controller. The problem with the network proposed by Ishii is that its Forward Model and Controller Network are applied only to SISO systems and lack the generality to be applied to MIMO ones. This paper overcomes this problem by extending his idea to the dynamics modeling and position control of six-degree-of-freedom rigid bodies. Twin Burger 2 is considered as an example and some results are illustrated (Sayyaadi et al., 2000).

Fig. 1. Imaginary Training architecture used for adaptation of the controller network. (r: reference signal; u: control signal; Se: state variables, measured; Sf: state variables, output of the NNI.)
The other feature addressed in this paper is path planning of AUVs using neural networks. Obstacle avoidance is a very important task for an AUV, which has to be handled during a mission in order to prevent damage to the vehicle from collisions with hazardous objects in the environment. Motion planning of an AUV can be decomposed into path planning and trajectory planning. Path planning for obstacle avoidance generates a collision-free path through an environment while optimizing it with respect to some criterion; trajectory planning schedules the movement of the AUV along the planned path. This paper focuses on the first issue, path planning.

The most recent work on path planning for obstacle avoidance of AUV systems is based on the Genetic Algorithm (Sugihara and Yuh, 1997). The general drawback of Genetic Algorithm methods is their slow convergence, because the algorithm has to search most of the possible solutions to the problem, generating new candidates at random through crossover and mutation mechanisms. Fujii et al. (1998) suggested a new method for obstacle avoidance problems based on Reinforcement Learning. This algorithm was implemented for a multi-robot system in simulation and in an actual environment. Although the method is very practical, it has one drawback: the controller cannot produce a continuous control action, because the control action space and the velocity state space have to be partitioned into discrete values in advance.

The module proposed in this paper for path planning with obstacle avoidance is an example of advanced control by neural network and uses a normal distribution function to generate continuous output. The learning method is based on Reinforcement Learning. The proposed method is applied to the model extracted for the AUV Twin Burger 2, for its planar motion in a horizontal plane, considering its Surge, Sway, and Yaw motions with constant Heave.

2. DYNAMICS MODELING AND POSITION CONTROL OF AUVS

The Imaginary Training algorithm proposed by Ishii et al. (1995) as an on-line adaptation mechanism for a neural network controller is depicted in Fig. 1. This algorithm is realized by introducing a structure that consists of two separate processing parts: a Real World Part and an Imaginary World Part. In this structure the neural network controller can be adjusted independently of the actual operation of the controlled object. The Real World Part is designed for the actual control operation, and the Imaginary World Part is designed for control adaptation. The Real World Part is an ordinary feedback system that consists of a controller (C_R) and the controlled object. The inputs to C_R are the differences between the reference signals r(t) and the state variables Se(t) given by the controlled object. The Imaginary World Part consists of a controller (C_I), which has the same structure as C_R, and a neural network identifier. The inputs to C_I are the differences between the reference signals r(k) and the state variables Sf(k) that come from the neural network identifier. The inputs to the neural network identifier are the control signals u(k) given by C_I, where the time flow in the Imaginary World Part is denoted by k, which is independent of that of the Real World Part. Ishii applied this idea to an AUV test bed, named Twin Burger 2, for its Heading motion mode. The difficulty with Ishii's model is that it is applicable only to the control of SISO systems and lacks the generality to be applied to MIMO systems. This research work adopts his architecture and modifies it for MIMO control purposes.
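To make the two-part structure concrete, the following is a minimal sketch of how the Real World and Imaginary World parts can interleave. It is not the authors' implementation: the plant, identifier, and controller are placeholder scalar linear maps, and all function and variable names are hypothetical.

```python
import numpy as np

# Sketch of the Imaginary Training idea (hypothetical names, toy dynamics).
# Real World: controller C_R drives the actual plant in real time t.
# Imaginary World: a copy C_I is adapted off-line against a neural-network
# identifier that imitates the plant, on its own independent clock k.

def plant_step(s, u):
    """Placeholder controlled object: s(t+1) = 0.95 s(t) + 0.1 u(t)."""
    return 0.95 * s + 0.1 * u

def identifier(s, u):
    """Placeholder for the trained forward-model network (NNI)."""
    return 0.95 * s + 0.1 * u

def controller(w, e):
    """Placeholder controller: control signal from the tracking error."""
    return w @ e

w_R = np.array([[0.5]])      # weights used on the real vehicle (C_R)
w_I = w_R.copy()             # weights adapted in the Imaginary World (C_I)
r = np.array([1.0])          # reference signal
s_e = np.zeros(1)            # measured state variables Se(t)

for t in range(50):
    # --- Real World Part: ordinary feedback loop on the actual plant ---
    u = controller(w_R, r - s_e)
    s_e = plant_step(s_e, u)

    # --- Imaginary World Part: adapt C_I against the identifier (index k) ---
    s_f = np.zeros(1)        # simulated state variables Sf(k)
    for k in range(20):
        err = r - s_f
        u_i = controller(w_I, err)
        s_next = identifier(s_f, u_i)
        # back-propagate the tracking error through the identifier
        # (d s_next / d u = 0.1 for this placeholder model)
        w_I += 0.1 * np.outer((r - s_next) * 0.1, err)
        s_f = s_next

    w_R = w_I.copy()         # transfer the adapted weights back to C_R
```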

For MIMO systems, any neural network architecture, regardless of the number of layers and the number of neurons in each layer, is so complex and large that it cannot be trained easily in a single training step. As a result, it has to be trained gradually through a sequence of training procedures. The dynamics model of the MIMO system is called here the Coupled Model Neural Network Identifier (CMNNI), and the position controller the Coupled Model Neural Network Controller (CMNNC). Each is decomposed into a number of SISO systems, called Single Degree of Freedom Neural Network Identifiers (SDFNNI) and Single Degree of Freedom Neural Network Controllers (SDFNNC). After training all of these SISO identifiers and controllers, additional interconnecting synapses compose the MIMO identifier and controller, and these new synaptic weights are adjusted through additional training processes.

The identifier and controller networks have three layers: an input layer, a hidden layer, and an output layer. To simulate the motion of the vehicle given only the input force and the initial velocity, the output of the first integration layer has to be fed back to the input layer through the RC-1 loop. The recurrent connection from the hidden layer to the input layer, called RC-2, allows the network to express the dynamic behavior with a reduced number of input state variables. The RC-3 loop enables the controller to provide the control signal (force/moment) given only its initial value at the beginning of the control task. The inputs to the controller network are the control signal(s) (force/moment) of the past time step, the position error(s), and the velocity error(s); its output is the control signal(s) of the next time step. The inputs to the identifier network are the control signal(s) and the velocity(ies) of the past time step.

Fig. 2. Neural network architecture for CMNNI and CMNNC composed of interconnected SDFNNI and SDFNNC networks in a single structure.

Fig. 3. AUV Twin Burger 2, used as a versatile test bed for control and navigation software implementation, constructed in 1994.

Every neural network needs data sets, or teachers, to be used as teaching data. Training of the neural network identifiers (dynamics modeling) is carried out by Supervised Learning: input/output behaviors of the system are sampled through experiments, and these sampled data sets are then used to adjust the parameters of the neural networks. The proposed identification technique is applied to a four-degree-of-freedom AUV, Twin Burger 2, shown in Fig. 3. This AUV has four motion modes: Surge (longitudinal), Sway (lateral), Heave (vertical), and Yaw (rotational). A Doppler sonar sensor detects the state variables of the vehicle. The hidden layer of every SDFNNI and SDFNNC contains 6 neurons; as a result, the CMNNI and CMNNC each contain 24 hidden neurons.

Training of the neural network identifiers in the single-degree-of-freedom modes and in the coupled mode is carried out in four consecutive steps, with the Error Back Propagation rule used throughout. In the first step, the inputs to the identifier networks are the control signal(s) and velocity(ies) taken from the sampled data sets, and the evaluation error is based on the discrepancies between the simulated and actual acceleration(s). In the second step, the inputs are the control signal(s) from the data pack and the simulated velocity(ies); the evaluation function is the same as in step one. In the third and fourth steps, the inputs are the same as in step two, while the evaluation functions are made from the discrepancies between the actual and simulated velocity(ies) in step three, and between the actual and simulated position(s) in step four. All of the synaptic weights of the SDFNNIs are set to small random values when training starts and are then adjusted gradually. After proper convergence of the evaluation function of each SDFNNI, the CMNNI is composed using additional interconnecting synaptic weights. These weights are also initialized to small random values and are then modified during training.
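The following is a hedged sketch of the first two steps of this four-step schedule for one SDFNNI, assuming a toy linear identifier and a synthetic data pack; the actual networks, data, and update rules of the paper are more elaborate.

```python
import numpy as np

# Hypothetical sketch of the SDFNNI training schedule described above.
# 'net_forward' stands in for one single-degree-of-freedom identifier; the
# data pack is a list of sampled (force, velocity, acceleration) records.

rng = np.random.default_rng(0)

def net_forward(w, force, velocity):
    """Toy identifier: predicts the next acceleration from force and velocity."""
    return w[0] * force + w[1] * velocity

w = rng.normal(scale=0.01, size=2)       # small random initial weights
data = [(f, v, 0.8 * f - 0.2 * v) for f, v in rng.normal(size=(200, 2))]
lr = 0.05

# Step 1: inputs are the sampled force and sampled velocity;
# the error is the discrepancy in acceleration (teacher forcing).
for f, v, a in data:
    a_hat = net_forward(w, f, v)
    w -= lr * (a_hat - a) * np.array([f, v])   # error back propagation

# Step 2: same evaluation, but the velocity input is the network's own
# simulated velocity, integrated from its predicted accelerations.
dt, v_sim = 0.1, 0.0
for f, v, a in data:
    a_hat = net_forward(w, f, v_sim)
    w -= lr * (a_hat - a) * np.array([f, v_sim])
    v_sim += a_hat * dt                        # RC-1-style feedback loop

# Steps 3 and 4 keep the same inputs as step 2 but evaluate the error on the
# simulated velocity (step 3) and simulated position (step 4) instead;
# they are omitted here for brevity.
```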
In the case of Twin Burger 2, there are in total eight series of simulation results in the Surge, Sway, Heave, and Yaw motion modes for the SDFNNIs and the CMNNI. The simulation results in the Sway mode of the CMNNI, which includes the most severe coupling effects, are illustrated in Fig. 4 as an example. For additional examples refer to the other publications by the authors (Sayyaadi et al., 2000). It can be concluded that the proposed identifiers model the dynamic behavior of any six-degree-of-freedom rigid body very well. In the velocity chart of Fig. 4, the velocity simulated by the SDFNNI is shown for comparison; from this chart it can be concluded that the SDFNNI is not a suitable identifier for coupled problems.

Fig. 4. Simulated results of CMNNI in Sway mode of Twin Burger 2 and comparison with the result of SDFNNI.

The training algorithm for the neural network controller (position control) uses Supervised Learning and Unsupervised Learning together. In the case of an SDFNNC, a rough controller, the rudimentary controller, is used to generate the primary data sets for Supervised Learning. This controller cannot keep the controlled object at the desired reference level, because it is not well tuned, but it is very useful at the beginning of the training procedure, where there is no knowledge of the control action. This training step serves only to initialize the controller network. For each motion mode, the synaptic weights of the SDFNNC are initially set to small random values and then modified to imitate the rudimentary controller.

Supervised Learning (the initialization algorithm) for an SDFNNC is carried out in two steps. In the first step, the inputs to the network are the position error, the velocity error, and the control signal of the past time step, all taken from the data pack generated by the rudimentary controller. The evaluation function is made from the discrepancies between the next-time-step control signal simulated by the network and that of the rudimentary controller in the data pack. In the second step, the inputs to the network are the position error, the velocity error, and the initial control signal from the data set; the control signal of the next time step is fed back as the learning algorithm progresses. The rudimentary controller itself is a simple positive/negative switching mechanism around the target point.

The next stage of training for an SDFNNC is fine-tuning of the controller network, because the initialized network cannot make the system converge to the desired point, owing to its oscillatory behavior around the target point. This stage is the adaptation algorithm and is carried out using Unsupervised Learning. In this stage the identifier network and controller network are treated as a single network for adaptation. The inputs to the network are the position error, the velocity error, and the initial value of the control signal. The state variables simulated by the identifier network are fed back to the input layer, along with the control signal, as the training algorithm progresses. The evaluation function is a weighted summation of the position error and the velocity error. Error Back Propagation is used to adjust the synaptic weights, and only the weights belonging to the controller network are modified. The initialization and adaptation algorithms are applied to the four independent motion modes of Twin Burger 2, and the results for the Sway mode are illustrated in Fig. 5.

In the case of a MIMO system, it is not possible to construct a rudimentary controller, owing to the coupling effects between the different modes. Thus the training algorithm for the CMNNC is based only on Unsupervised Learning (the adaptation algorithm). The additional synaptic weights, used to interconnect the independently trained SDFNNCs into a single CMNNC structure, are set to small random values at the beginning of the training procedure and are then modified during training. The CMNNC and CMNNI are treated as a single network, and the Error Back Propagation rule is applied in such a way that only the weights belonging to the controller part are modified. The adaptation algorithm is carried out in two steps. In the first step, all of the reference values are set to zero and training is carried out; the inputs to the network and the evaluation function are similar to those of the SDFNNC adaptation algorithm. In the second step, all of the reference values are set to the desired values and the same procedure as in the first step is repeated. Fig. 6 illustrates the performance of the CMNNC for Twin Burger 2.
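As an illustration, the rudimentary controller used for SDFNNC initialization (described earlier in this section) could be realized and used to generate a data pack roughly as follows; the plant model, gains, and names here are invented for the sketch.

```python
# Sketch of the 'rudimentary controller': a simple positive/negative
# switching mechanism around the target point, used only to generate
# the initialization data pack for an SDFNNC (hypothetical names).

def rudimentary_controller(position_error, u_max=1.0):
    """Bang-bang control: push toward the target, switching sign at it."""
    return u_max if position_error > 0.0 else -u_max

# Roll the switching controller on a toy 1-DOF plant and record the data pack
# (position error, velocity error, past control signal -> next control signal).
dt, pos, vel, u_prev = 0.1, 0.0, 0.0, 0.0
target = 1.0
data_pack = []
for _ in range(300):
    e_pos, e_vel = target - pos, 0.0 - vel
    u = rudimentary_controller(e_pos)
    data_pack.append((e_pos, e_vel, u_prev, u))   # supervised teaching sample
    vel += (u - 0.5 * vel) * dt                   # toy surge-like dynamics
    pos += vel * dt
    u_prev = u

# An SDFNNC would then be trained by Supervised Learning to imitate these
# (inputs -> next control signal) pairs before Unsupervised fine-tuning.
```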

3. PATH PLANNING OBSTACLE AVOIDANCE MODULE FOR AUVS

The last problem addressed in this paper is the path planning module for obstacle avoidance. This module also has a neural network architecture and is trained using Reinforcement Learning. The neural network structure here is composed of two different networks: a Predictor Unit and a Learning Unit. The Predictor Unit is responsible for predicting the reinforcement signal of the next time step, and its output is treated as the standard deviation of a normal distribution function. The training algorithm for this unit is based on Supervised Learning, and the Error Back Propagation rule is used to minimize the evaluation function, which is made from the discrepancies between the actual reinforcement signal and the predicted one. The Learning Unit is the main unit for the desired task, and its output is treated as the mean of the normal distribution function whose standard deviation is the output of the Predictor Unit. A normal random variable drawn from these two parameters is the output of the module.

Fig. 5. Results of SDFNNC during initialization and adaptation in Sway for Twin Burger 2.

Fig. 6. Results of CMNNC during adaptation in Surge, Sway, Heave, and Yaw for Twin Burger 2.

Fig. 7 shows the path planning for obstacle avoidance module for planar motion of Twin Burger 2. The Cartesian workspace of the vehicle is partitioned in one direction into predefined values, and the value corresponding to the perpendicular direction is derived as the output of the module. The rotation of the vehicle is calculated from the last two Cartesian values. The output of the path planning module is fed forward to the position control network. The outputs of the Predictor Unit, σ, and the Learning Unit, μ, are mapped by the normal function Ψ(μ, σ) to a random value a. A function f then maps this value to the Cartesian output y. If r(k+1) is the actual reinforcement and r̂(k) is the predicted one, the update of the synaptic weights of the Learning Unit is:

$$ \Delta w(k) = \alpha \frac{dr(k+1)}{dw(k)} = \alpha \frac{dr(k+1)}{d\mu(k)} \frac{d\mu(k)}{dw(k)} \tag{1} $$

where α is the learning rate and w is a synaptic element. It is not possible to calculate the derivative of the critic r with respect to the mean μ directly. If the reinforcement value lies between 0 and 1, and it is assumed that a smaller value represents better performance, this yields (Sayyaadi et al., 2000):

$$ \frac{dr(k+1)}{d\mu(k)} = \bigl(\hat{r}(k) - r(k+1)\bigr)\,\frac{a(k) - \mu(k)}{\sigma(k)} \tag{2} $$

Fig. 7. Path planning for obstacle avoidance module for Twin Burger 2 using Reinforcement Learning. (The world model feeds the vehicle states x(k), y(k), θ(k) and the range parameters to the critic function block; Heave is held constant.)
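A minimal sketch of the update of Eqs. (1) and (2), assuming a linear Learning Unit so that dμ/dw = x; the state vector, learning rate, and reinforcement values are illustrative only.

```python
import numpy as np

# Sketch of the Learning Unit update of Eqs. (1)-(2): the Learning Unit
# outputs the mean mu, an action a is drawn from N(mu, sigma) with sigma
# taken from the Predictor Unit, and the weights move mu toward or away
# from a depending on whether the actual reinforcement beat the prediction.

rng = np.random.default_rng(1)

def learning_unit(w, x):
    return float(w @ x)            # mu(k): mean of the normal distribution

alpha = 0.1                        # learning rate
w = rng.normal(scale=0.1, size=3)  # synaptic weights of the Learning Unit
x = np.array([0.5, -0.2, 1.0])     # state input at step k (illustrative)

mu = learning_unit(w, x)
sigma = 0.2                        # standard deviation from the Predictor Unit
a = rng.normal(mu, sigma)          # stochastic output of the module

r_pred = 0.3                       # predicted reinforcement r_hat(k)
r_actual = 0.1                     # actual reinforcement r(k+1); smaller = better

# Eq. (2): dr/dmu = (r_hat(k) - r(k+1)) * (a(k) - mu(k)) / sigma(k)
dr_dmu = (r_pred - r_actual) * (a - mu) / sigma
# Eq. (1): dw = alpha * dr/dmu * dmu/dw, with dmu/dw = x for a linear unit
w += alpha * dr_dmu * x
```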

This implies that when the actual reinforcement is smaller than the predicted one, the direction of search for a better action is correct, and therefore the mean value μ(k) is adjusted toward a(k). Conversely, if the actual reinforcement is larger than the predicted one, the direction of search is wrong, and therefore the mean value μ(k) is adjusted away from a(k).

Twin Burger 2 is equipped with range finder sensors that are used to detect the ranges to obstacles. The world model shown in Fig. 7 sends the states of the vehicle, as well as the range parameters, to the critic function block (the eight range sensors are aligned in the horizontal plane around the vehicle, 45° apart). The reinforcement signal is a weighted summation of two parts: avoidance and directioning. The avoidance part is:

$$ r_1 = \frac{1}{8}\sum_{i=1}^{8} r_i, \qquad r_i = \begin{cases} 1 & 0 \le \eta_i < 1 \\[4pt] \dfrac{1}{1 - 1/\eta_{\max}}\left(1 - \dfrac{\eta_i}{\eta_{\max}}\right) & 1 \le \eta_i \le \eta_{\max} \\[4pt] 0 & \eta_{\max} < \eta_i \end{cases} \tag{3} $$

where η_i = L_i / L_min and η_max = L_max / L_min; L_min = 0.3 m is the minimum permissible range, L_max = 3 m is the maximum detectable range, and L_i is the range to the obstacle measured by sensor i.
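The avoidance reinforcement of Eq. (3) can be computed from the eight range readings as in the following sketch (function and constant names are hypothetical):

```python
import numpy as np

# Sketch of the avoidance reinforcement of Eq. (3) for the eight planar
# range sensors (45 degrees apart); smaller values are better.

L_MIN, L_MAX = 0.3, 3.0        # minimum permissible / maximum detectable range [m]
ETA_MAX = L_MAX / L_MIN

def avoidance_reinforcement(ranges):
    """r1 = mean of the per-sensor penalties r_i of Eq. (3)."""
    etas = np.asarray(ranges) / L_MIN
    r = np.where(
        etas < 1.0,
        1.0,                                                 # inside L_min: worst
        np.where(
            etas <= ETA_MAX,
            (1.0 - etas / ETA_MAX) / (1.0 - 1.0 / ETA_MAX),  # linear decay to 0
            0.0,                                             # beyond detection range
        ),
    )
    return float(r.mean())      # average over the eight sensors

print(avoidance_reinforcement([0.2, 0.5, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]))
```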
The output of the path planning module is the lateral position of the vehicle. If d_max = 10 m denotes the maximum permissible value, and d denotes the lateral distance of the vehicle from its current point to the start-end line, then the reinforcement component of the directioning part is:

$$ r_2 = \frac{d}{d_{\max}} \tag{4} $$

The total reinforcement signal is:

$$ r = \frac{k_1}{k_1 + k_2}\, r_1 + \frac{k_2}{k_1 + k_2}\, r_2 \tag{5} $$

where k_1 and k_2 are weight factors. If k_1 is zero, pure directioning behavior is obtained, and if k_2 is zero, pure avoidance behavior is obtained; a proper compromise between these two parameters generates a path with both avoidance and directioning behavior. These parameters have to be decided at the start of the path planning procedure. In the simulation results shown in Fig. 8 they are set to k_1 = 1.0 and k_2 = 0.1. As with any other neural network, it is necessary to initialize the module using some data sets: a rough path is identified and Supervised Learning is used to initialize the synaptic weights. The module is trained for three different cases: directioning, avoidance, and directioning & avoidance together. In these simulation results, a 2 m long obstacle is assumed to be located 4 m from the vehicle's starting point, perpendicular to the start-end connecting line.

Fig. 8. Path planning for obstacle avoidance results for planar motion of Twin Burger 2 (initialized path, avoidance path, directioning path, and avoidance & directioning path).
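A corresponding sketch of Eqs. (4) and (5), using the weight values quoted above (names hypothetical):

```python
# Sketch of the directioning component (Eq. 4) and the total reinforcement
# (Eq. 5), with the weights used for Fig. 8.

D_MAX = 10.0                    # maximum permissible lateral distance [m]
K1, K2 = 1.0, 0.1               # avoidance and directioning weight factors

def directioning_reinforcement(d):
    """Eq. (4): r2 = d / d_max, the normalized lateral deviation."""
    return d / D_MAX

def total_reinforcement(r1, r2, k1=K1, k2=K2):
    """Eq. (5): normalized weighted sum; k2 = 0 gives pure avoidance."""
    return (k1 * r1 + k2 * r2) / (k1 + k2)

# Example: avoidance part r1 from Eq. (3), 1.2 m lateral deviation.
print(total_reinforcement(r1=0.4, r2=directioning_reinforcement(1.2)))
```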
4. CONCLUSION

This paper proposes new approaches for the dynamics modeling, position control, and path planning of AUVs using neural networks. The results show that the dynamics modeling and position control methods are suitable for any kind of nonlinear MIMO system for which conventional methods are troublesome. The path planning results for obstacle avoidance show that the adopted Reinforcement Learning method, which uses a normal distribution function, can produce continuous output, in contrast to other Reinforcement Learning algorithms. It also performs better than other path planning methods such as the Genetic Algorithm, owing to its reward/penalty exploration/exploitation capabilities. Finally, it is concluded that neural networks are powerful tools for modeling, control, navigation, and so on, if treated wisely.

REFERENCES

Fujii, T., Y. Arai, H. Asama, and I. Endo (1998). Multilayered Reinforcement Learning for Complicated Collision Avoidance Problems. Proc. IEEE Robotics & Automation, pp. 2186-2191.
Fujii, T., and T. Ura (1991). Neural-Network-Based Adaptive Control Systems for AUVs. Eng. Applic. Artif. Intell., Vol. 4, No. 4, pp. 309-318.
Ishii, K., T. Fujii, and T. Ura (1995). An On-line Adaptation Method in a Neural Network Based Control System for AUV's. IEEE Journal of Oceanic Engineering, Vol. 20, No. 3, pp. 221-228.
Sayyaadi, H., T. Ura, and T. Fujii (2000). Collision Avoidance Controller for AUV Systems Using Stochastic Real Value Reinforcement Learning Method. Proc. SICE Japan, CD-ROM ID 114D1.
Sugihara, K., and J. Yuh (1997). GA-Based Motion Planning for Underwater Robotic Vehicle. Int. Symposium on Unmanned Untethered Submersible Technology, pp. 406-416.
