
This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON CYBERNETICS 1
Adaptive Tracking Control of Surface Vessel Using Optimized Backstepping Technique
Guoxing Wen, Shuzhi Sam Ge, Fellow, IEEE, C. L. Philip Chen, Fellow, IEEE, Fangwen Tu, and Shengnan Wang
Abstract—In this paper, a tracking control approach for surface vessels is developed based on a new control technique named optimized backstepping (OB), which treats optimization as a backstepping design principle. Since surface vessel systems are modeled by second-order dynamics in strict-feedback form, backstepping is an ideal technique for accomplishing the tracking task. In the backstepping control of the surface vessel, the virtual and actual controls are designed to be the optimized solutions of the corresponding subsystems, so that the overall control is optimized. In general, optimal control is designed based on the solution of the Hamilton–Jacobi–Bellman equation. However, solving this equation is very difficult or even impossible due to its inherent nonlinearity and complexity. To overcome the difficulty, the reinforcement learning (RL) strategy of actor-critic architecture is usually considered, in which the critic and actor are utilized for evaluating the control performance and executing the control behavior, respectively. By employing the actor-critic RL algorithm for both the virtual and actual controls of the vessel, it is proven that the desired optimizing and tracking performances can be achieved. Simulation results further demonstrate the effectiveness of the proposed surface vessel control.

Index Terms—Actor-critic architecture, Lyapunov stability, optimized backstepping (OB), reinforcement learning (RL), surface vessel.

Manuscript received May 12, 2018; accepted May 25, 2018. This work was supported in part by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2018MF015, in part by the Doctoral Scientific Research Staring Fund of Binzhou University under Grant 2016Y14, in part by the National Natural Science Foundation of China under Grants 61572540, 61603094, and 61703050, in part by the Macau Science and Technology Development Fund under Grants 019/2015/A, 024/2015/AMJ, and 079/2017/A2, and in part by the University Macau MYR Grant. This paper was recommended by Associate Editor D. Zhao. (Corresponding author: Guoxing Wen.)
G. Wen is with the Department of Mathematics, Binzhou University, Binzhou 256600, China (e-mail: gxwen@live.cn).
S. S. Ge is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, and also with the Social Robotics Laboratory, Interactive Digital Media Institute, National University of Singapore, Singapore 117576 (e-mail: samge@nus.edu.sg).
C. L. P. Chen is with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau 99999, China, also with the Navigation College, Dalian Maritime University, Dalian 116026, China, and also with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China (e-mail: philip.chen@ieee.org).
F. Tu is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576 (e-mail: fangwen_tu@hotmail.com).
S. Wang is with the School of Economics and Management, Binzhou University, Binzhou 256600, China (e-mail: jgxwsn@126.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2018.2844177
2168-2267 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

WITH the rapid development of deep-sea exploitation and ocean transportation, trajectory tracking and path following of surface vessels have become an active research topic and have attracted considerable attention [1]. However, due to the complexity and unpredictability of deep ocean environments, the topic remains a major challenge. It is well known that backstepping is the most popular and fundamental technique for the tracking control of high-order systems [2]–[4]. Its basic idea is that the control is carried out by a recursive process, which treats several state variables as "virtual controls" and designs control laws for them. Recently, the backstepping technique has been applied to the tracking control of surface vessels, and many excellent research results have been published, such as [5]–[9].

In the control community, optimization has become a fundamental design principle, and many pioneering works based on adaptive fuzzy backstepping have been proposed for nonlinear systems with unmeasured states [10]–[12]. Since sea voyages for deep-sea exploitation or ocean transportation require massive energy consumption, it is necessary to consider optimization in surface vessel control. Optimal control means minimizing a structured cost index, which describes the balance between the desired performance and the available control resources. However, up to now, few optimal control methods have been reported for surface vessels. Usually, the optimal control is designed by using the solution of the Hamilton–Jacobi–Bellman (HJB) equation [13], which becomes the Riccati equation for linear systems, but the HJB equation of the vessel dynamics is difficult to solve owing to its inherent nonlinearity and intractability. Therefore, the existing optimal schemes, such as [14]–[16], cannot be directly applied to surface vessels, especially for tracking control.

Although most tracking control methods for high-order systems are designed based on the backstepping technique [17]–[19], integrating optimization into backstepping is still very challenging due to the technical and mathematical complexities. Recently, a new technique named optimized backstepping (OB) was proposed in [20], which fuses optimization control into the backstepping design. The basic idea is that all virtual controls and the actual control are designed to be the optimized solutions of the corresponding subsystems, so that the control of the overall system is optimized. Since backstepping control has been well developed for surface vessel control [2]–[4], the OB technique can be used to optimize the tracking control of surface vessels. However, because surface vessels are modeled in multidimensional form and their velocity states are described in the body-fixed frame, applying the OB technique to surface vessel control is still challenging.
Since neural networks (NNs) and fuzzy logic systems (FLSs) have been proven to have excellent approximation and learning abilities, they have become powerful and popular tools in nonlinear system modeling and control. In the past decades, a great number of NN- or FLS-based nonlinear control methods have been published [21]–[29], of which [21]–[25] are based on NNs and [26]–[29] are based on FLSs. In [30]–[33], NN- or FLS-based reinforcement learning (RL) is successfully applied to adaptive optimization control, and it has recently become a popular approach. The basic idea of RL is to obtain appropriate actions by evaluating the feedback from the environment. One effective means of implementing RL is the actor-critic architecture, in which the actor and critic are utilized for performing control actions and evaluating the actions, respectively [34].

Motivated by the above discussion, in this paper, an optimized control scheme is developed for surface vessel systems based on the OB control idea. Since the vessel model is depicted in multidimensional form, it is challenging to implement the RL algorithm and the stability analysis. In order to design the optimized control, the vessel model in the body-fixed frame is transformed to the earth-fixed frame. Therefore, the OB control is first obtained in the earth-fixed frame, and then the control for the original system is obtained by coordinate transformation.

II. PROBLEM DESCRIPTION

Consider the following surface ship modeled in 3 degrees of freedom, which are surge, sway, and yaw, as shown in Fig. 1:

$$\dot\eta(t) = J(\eta)v(t), \qquad M\dot v(t) = -C(v)v(t) - D(v)v(t) - g(\eta) + \tau \tag{1}$$

where $\eta(t) = [\eta_x(t), \eta_y(t), \eta_z(t)]^T \in \mathbb{R}^3$ are the position and heading states in the earth-fixed frame, respectively; $v(t) = [v_x(t), v_y(t), v_z(t)]^T \in \mathbb{R}^3$ are the surge, sway, and yaw velocities in the body-fixed frame, respectively;

$$J(\eta) = \begin{bmatrix} \cos(\eta_z) & -\sin(\eta_z) & 0 \\ \sin(\eta_z) & \cos(\eta_z) & 0 \\ 0 & 0 & 1 \end{bmatrix} \in SO(3),$$

which means $J^{-1} = J^T$, is the Jacobian transformation matrix for the coordinate transformation between the body-fixed and earth-fixed frames; $M = M^T$ is the inertia matrix, which is assumed to be a positive definite constant matrix; $C(v) = -C^T(v)$ is the Coriolis centripetal matrix; $D(v)$ is the damping matrix; $g(\eta) \in \mathbb{R}^3$ is the restoring force vector in the presence of gravity and buoyancy; and $\tau \in \mathbb{R}^3$ is the control input.

Fig. 1. Model ship.

Let $\nu(t) = J(\eta)v(t)$, then the vessel model (1) can be rewritten as

$$\dot\eta(t) = \nu(t), \qquad \dot\nu(t) = C_o(\eta,\nu)\nu(t) + D_o(\eta,\nu)\nu(t) + g_o(\eta) + u \tag{2}$$

where $C_o(\eta,\nu) = -JM^{-1}C(J^{-1}\nu(t))J^{-1}$, $D_o(\eta,\nu) = \dot J J^{-1} - JM^{-1}D(J^{-1}\nu(t))J^{-1}$, $g_o(\eta) = -JM^{-1}g(\eta)$, and $u = JM^{-1}\tau$.

Let $f(\chi) = C_o(\eta,\nu)\nu(t) + D_o(\eta,\nu)\nu(t) + g_o(\eta)$, where $\chi(t) = [\eta^T(t), \nu^T(t)]^T \in \mathbb{R}^6$, then the dynamic equation (2) can be rewritten as

$$\dot\eta(t) = \nu(t), \qquad \dot\nu(t) = f(\chi) + u. \tag{3}$$

Remark 1: It should be mentioned that the control law is designed for the dynamic model (2); therefore, the control is first obtained in the earth-fixed frame. The control for the dynamic model (1) is then obtained by left-multiplying by the term $MJ^{-1}(\eta)$.

III. OPTIMIZED BACKSTEPPING CONTROL DESIGN

In this section, the optimized tracking control is designed for the surface vessel by the 2-step OB [20]. In order to achieve optimization, the actor-critic RL algorithm is carried out in every backstepping step, where the actor is utilized to perform the control policy and the critic is utilized to evaluate the optimization performance.

Control Objective: Based on the OB technique, design an optimized tracking control scheme for the surface vessel (1) such that: 1) all error signals are semi-globally uniformly ultimately bounded (SGUUB) and 2) the vessel follows the desired trajectory $\eta_d = [\eta_{dx}, \eta_{dy}, \eta_{dz}]^T$ to the desired accuracy, where $\eta_d$ is sufficiently smooth and $\eta_d$ and its derivative are bounded.

Definition 1 (SGUUB [20]): Consider the nonlinear system

$$\dot x(t) = g(x,t)$$

where $x(t) \in \mathbb{R}^n$ is the state vector. Its solution is said to be SGUUB if, for $x(0) \in \Omega_x$ where $\Omega_x \subset \mathbb{R}^n$ is a compact set, there exist two constants $\sigma$ and $T(\sigma, x(0))$ such that $\|x(t)\| \le \sigma$ holds for all $t > t_0 + T(\sigma, x(0))$.

Step 1: Define the tracking error vector as $z_\eta(t) = \eta(t) - \eta_d(t)$; its time derivative along (3) is

$$\dot z_\eta(t) = \nu(t) - \dot\eta_d(t) \tag{4}$$

where $\nu(t)$ is viewed as the intermediate control of the backstepping.
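The coordinate change of Section II, from the body-fixed model (1) to the earth-fixed model (2), and the back-transformation of Remark 1 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the function names (`rotation`, `body_to_earth`, `earth_to_body`, `thrust_from_u`) are hypothetical.

```python
import math

def rotation(psi):
    """J(eta) for heading angle psi = eta_z (radians)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def body_to_earth(psi, v):
    """nu = J(eta) v."""
    return matvec(rotation(psi), v)

def earth_to_body(psi, nu):
    """v = J^{-1}(eta) nu, using J^{-1} = J^T."""
    return matvec(transpose(rotation(psi)), nu)

def thrust_from_u(M, psi, u):
    """tau = M J^{-1}(eta) u, the back-transformation of Remark 1."""
    return matvec(M, earth_to_body(psi, u))
```

Because $J \in SO(3)$, the inverse is obtained by transposition, so no matrix inversion is needed when mapping an earth-fixed control $u$ back to the thrust $\tau$.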
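Let $\alpha(z_\eta) \in \mathbb{R}^3$ denote the virtual control of the $z_\eta$-subsystem (4); the infinite horizon value function is defined as

$$V_\eta(z_\eta) = \int_t^\infty r_\eta\big(z_\eta(s), \alpha(z_\eta)\big)\,ds \tag{5}$$

where $r_\eta(z_\eta, \alpha) = z_\eta^T z_\eta + \alpha^T\alpha$ is the immediate or local cost function.

Remark 2: The optimal problem for a dynamic system is to find an admissible control policy [20] such that the control objective is realized by expending the minimal cost. For example, for the $z_\eta$-subsystem (4), the optimal virtual control is designed to guarantee that the infinite horizon value function (5) is minimized.

View $\nu(t)$ as the optimal virtual control $\alpha^*(z_\eta)$, i.e., $\nu(t) \triangleq \alpha^*$; the optimal value function is obtained as

$$V_\eta^*(z_\eta) = \min_{\alpha\in\Psi(\Omega_\eta)} \int_t^\infty r_\eta(z_\eta,\alpha)\,ds = \int_t^\infty r_\eta(z_\eta,\alpha^*)\,ds \tag{6}$$

where $\Psi(\Omega_\eta)$ denotes the set of admissible control policies over $\Omega_\eta$, and $\Omega_\eta \subset \mathbb{R}^3$ is a compact set.

The Hamiltonian function associated with the infinite horizon value function (6) is

$$H_\eta\Big(z_\eta, \alpha, \frac{\partial V_\eta}{\partial z_\eta}\Big) = r_\eta(z_\eta,\alpha) + \frac{\partial V_\eta^T}{\partial z_\eta}\dot z_\eta(t) \tag{7}$$

where $\partial V_\eta/\partial z_\eta$ denotes the gradient of $V_\eta$ with respect to $z_\eta$.

According to both (6) and (7), there is the following HJB equation:

$$H_\eta\Big(z_\eta, \alpha^*, \frac{\partial V_\eta^*}{\partial z_\eta}\Big) = z_\eta^T z_\eta + \alpha^{*T}\alpha^* + \frac{\partial V_\eta^{*T}}{\partial z_\eta}\big(\alpha^* - \dot\eta_d\big) = 0. \tag{8}$$

Assuming that the solution of (8) exists and is unique, the optimal virtual control $\alpha^*$ can be obtained by solving $\partial H_\eta(z_\eta, \alpha^*, \partial V_\eta^*/\partial z_\eta)/\partial\alpha^* = 0$, which gives

$$\alpha^*(t) = -\frac{1}{2}\frac{\partial V_\eta^*}{\partial z_\eta}. \tag{9}$$

Substituting (9) into (8) yields

$$\|z_\eta(t)\|^2 - \frac{\partial V_\eta^{*T}}{\partial z_\eta}\dot\eta_d(t) - \frac{1}{4}\frac{\partial V_\eta^{*T}}{\partial z_\eta}\frac{\partial V_\eta^*}{\partial z_\eta} = 0. \tag{10}$$

By substituting the solution of (10) into (9), the optimal virtual control can be obtained. However, solving the equation is very difficult or impossible because of its strong nonlinearities. In order to realize the control scheme, the online RL of actor-critic architecture is performed by employing adaptive NN approximation.

Rewrite the optimal value function (6) as

$$V_\eta^*(z_\eta) = \beta_\eta\|z_\eta(t)\|^2 + V_\eta^o(z_\eta) \tag{11}$$

where $\beta_\eta$ is a positive design constant and $V_\eta^o(z_\eta) = -\beta_\eta\|z_\eta(t)\|^2 + V_\eta^*(z_\eta)$.

Inserting (11) into (9), the optimal virtual control can be rewritten as

$$\alpha^* = -\frac{1}{2}\frac{\partial V_\eta^*}{\partial z_\eta} = -\beta_\eta z_\eta(t) - \frac{1}{2}\frac{\partial V_\eta^o}{\partial z_\eta}. \tag{12}$$

It is well known that NNs have excellent adaptive learning and function approximation abilities and can approximate any continuous function to the desired accuracy. Since the scalar value function $V_\eta^o$ is continuous for $z_\eta \in \Omega_\eta$, it can be approximated by NNs in the following form:

$$V_\eta^o(z_\eta) = W_\eta^{*T} S_\eta(z_\eta) + \varepsilon_\eta(z_\eta) \tag{13}$$

where $W_\eta^* \in \mathbb{R}^{n_\eta}$ is the ideal NN weight, $n_\eta$ is the neuron number; $S_\eta(z_\eta) \in \mathbb{R}^{n_\eta}$ is the basis function vector; $\varepsilon_\eta(z_\eta) \in \mathbb{R}$ is the NN approximation error, which, together with its derivative, is required to be bounded (for more details, see [20]).

Based on the ideal approximation (13), $V_\eta^*(z_\eta)$ and $\alpha^*$ can be re-expressed as

$$V_\eta^*(z_\eta(t)) = \beta_\eta\|z_\eta(t)\|^2 + W_\eta^{*T}S_\eta(z_\eta) + \varepsilon_\eta(z_\eta)$$

$$\alpha^* = -\beta_\eta z_\eta(t) - \frac{1}{2}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* - \frac{1}{2}\frac{\partial\varepsilon_\eta}{\partial z_\eta} \tag{14}$$

where $(\partial^T S_\eta/\partial z_\eta) \in \mathbb{R}^{3\times n_\eta}$ and $(\partial\varepsilon_\eta/\partial z_\eta) \in \mathbb{R}^3$ are the gradients with respect to $z_\eta$.

Using the NN approximation (14), the HJB equation (8) can be rewritten as

$$H_\eta(z_\eta,\alpha^*,W_\eta^*) = -(\beta_\eta^2-1)\|z_\eta\|^2 - 2\beta_\eta z_\eta^T\dot\eta_d - W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\big(\beta_\eta z_\eta + \dot\eta_d\big) - \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\Big\|^2 + \rho_\eta(t) = 0 \tag{15}$$

where $\rho_\eta(t) = (\partial\varepsilon_\eta/\partial z_\eta^T)\alpha^*(t) + (1/4)\|\partial\varepsilon_\eta/\partial z_\eta\|^2 - (\partial\varepsilon_\eta/\partial z_\eta^T)\dot\eta_d(t)$, which is bounded by a positive constant $\psi_\eta$, i.e., $|\rho_\eta(t)| \le \psi_\eta$.

The optimal virtual control (14) is unavailable because the ideal weight vector $W_\eta^*$ is unknown. In order to obtain an implementable control scheme, the RL algorithm is performed by constructing the following critic and actor NNs, which are utilized to evaluate the control performance and execute the virtual control, respectively:

$$\hat V_\eta^*(z_\eta) = \beta_\eta\|z_\eta(t)\|^2 + \hat W_{\eta c}^T(t)S_\eta(z_\eta) \tag{16}$$

$$\hat\alpha(z_\eta) = -\beta_\eta z_\eta(t) - \frac{1}{2}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t) \tag{17}$$

where $\hat V_\eta^*$ denotes the estimate of $V_\eta^*$; $\hat W_{\eta c} \in \mathbb{R}^{n_\eta}$ and $\hat W_{\eta a} \in \mathbb{R}^{n_\eta}$ are the critic and actor NN weights, respectively.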
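The actor approximation (17) can be illustrated numerically. The sketch below assumes Gaussian basis functions of the form used later in the simulation section, $s_i(z) = \exp[-(z-\mu_i)^T(z-\mu_i)/\phi^2]$; the helper names (`basis`, `basis_jacobian`, `virtual_control`) and any centers passed in are illustrative assumptions, not part of the paper.

```python
import math

def basis(z, centers, width):
    """S(z): Gaussian basis vector, one entry per center mu_i."""
    return [
        math.exp(-sum((zk - mk) ** 2 for zk, mk in zip(z, mu)) / width ** 2)
        for mu in centers
    ]

def basis_jacobian(z, centers, width):
    """dS/dz^T as an n x 3 list; row i is the gradient of s_i with respect to z."""
    s = basis(z, centers, width)
    return [
        [-2.0 * (zk - mk) / width ** 2 * si for zk, mk in zip(z, mu)]
        for mu, si in zip(centers, s)
    ]

def virtual_control(z, w_actor, centers, width, beta):
    """alpha_hat = -beta z - (1/2) (d^T S/dz) W_a, following (17)."""
    jac = basis_jacobian(z, centers, width)
    return [
        -beta * z[k] - 0.5 * sum(w * row[k] for w, row in zip(w_actor, jac))
        for k in range(len(z))
    ]
```

With all actor weights at zero the control reduces to the proportional term $-\beta_\eta z_\eta$, which is the part of (17) that does not depend on learning.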
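Substituting (16) and (17) into (8), the approximated HJB equation is obtained as

$$H_\eta(z_\eta,\hat\alpha,\hat W_{\eta c}) = \|z_\eta\|^2 + \Big\|\beta_\eta z_\eta + \frac{1}{2}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t)\Big\|^2 - \Big(2\beta_\eta z_\eta(t) + \frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta c}(t)\Big)^T\Big(\beta_\eta z_\eta(t) + \frac{1}{2}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t) + \dot\eta_d(t)\Big). \tag{18}$$

From (15) and (18), the Bellman residual error is derived as

$$e_\eta(t) = H_\eta(z_\eta,\hat\alpha,\hat W_{\eta c}) - H_\eta(z_\eta,\alpha^*,W_\eta^*) = H_\eta(z_\eta,\hat\alpha,\hat W_{\eta c}). \tag{19}$$

By applying the gradient descent algorithm to the positive definite function

$$E_\eta(t) = \frac{1}{2}e_\eta^2(t) \tag{20}$$

the following critic NN updating law is obtained so that the Bellman residual error $e_\eta(t)$ is minimized:

$$\dot{\hat W}_{\eta c}(t) = -\frac{\gamma_{\eta c}}{1+\|\sigma_\eta(t)\|^2}e_\eta(t)\frac{\partial e_\eta(t)}{\partial\hat W_{\eta c}(t)} = -\frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\sigma_\eta\Big(\sigma_\eta^T\hat W_{\eta c}(t) - (\beta_\eta^2-1)\|z_\eta(t)\|^2 - 2\beta_\eta z_\eta^T(t)\dot\eta_d(t) + \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t)\Big\|^2\Big) \tag{21}$$

where $\gamma_{\eta c} > 0$ is the learning rate and $\sigma_\eta(t) = -(\partial S_\eta(z_\eta)/\partial z_\eta^T)\big(\beta_\eta z_\eta(t) + (1/2)(\partial^T S_\eta/\partial z_\eta)\hat W_{\eta a}(t) + \dot\eta_d(t)\big)$.

The actor NN updating law is designed as

$$\dot{\hat W}_{\eta a}(t) = \frac{1}{2}\frac{\partial S_\eta}{\partial z_\eta^T}z_\eta(t) - \gamma_{\eta a}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t) + \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}(t)\sigma_\eta^T\hat W_{\eta c}(t) \tag{22}$$

where $\gamma_{\eta a} > 0$ is the learning rate.

Assumption 1 ([35] Persistence of Excitation (PE)): The signal $\sigma_\eta(t)\sigma_\eta^T(t)$ is required to be persistently exciting over the interval $[t, t+t_\eta]$, i.e., there exist constants $\underline{k}_\eta > 0$, $\bar k_\eta > 0$, $t_\eta > 0$ such that, for all $t$,

$$\underline{k}_\eta I_3 \le \sigma_\eta(t)\sigma_\eta^T(t) \le \bar k_\eta I_3 \tag{23}$$

where $I_3 \in \mathbb{R}^{3\times 3}$ is the identity matrix.

Remark 3: The PE assumption is also required in the next backstepping step. The signal $\sigma_\nu(t)\sigma_\nu^T(t)$, which is defined in the next backstepping step, is required to meet the PE condition over the interval $[t, t+t_\nu]$, $t_\nu > 0$, i.e., there exist constants $\underline{k}_\nu > 0$, $\bar k_\nu > 0$ such that, for all $t$, $\underline{k}_\nu I_3 \le \sigma_\nu(t)\sigma_\nu^T(t) \le \bar k_\nu I_3$.

By introducing the error variable $z_\nu(t) = \nu(t) - \hat\alpha(z_\eta)$, the error dynamics (4) can be rewritten as

$$\dot z_\eta(t) = z_\nu(t) + \hat\alpha(z_\eta) - \dot\eta_d(t). \tag{24}$$

For the $z_\eta$-subsystem, the Lyapunov function candidate is designed as

$$L_\eta(t) = \frac{1}{2}\|z_\eta(t)\|^2 + \frac{1}{2}\tilde W_{\eta a}^T(t)\tilde W_{\eta a}(t) + \frac{1}{2}\tilde W_{\eta c}^T(t)\tilde W_{\eta c}(t) \tag{25}$$

where $\tilde W_{\eta c}(t) = \hat W_{\eta c}(t) - W_\eta^*$ and $\tilde W_{\eta a}(t) = \hat W_{\eta a}(t) - W_\eta^*$. Its time derivative along (21), (22), and (24) is

$$\begin{aligned}\dot L_\eta(t) ={}& z_\eta^T\big(z_\nu + \hat\alpha(z_\eta) - \dot\eta_d\big) + \tilde W_{\eta a}^T\Big(\frac{1}{2}\frac{\partial S_\eta}{\partial z_\eta^T}z_\eta - \gamma_{\eta a}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} + \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c}\Big)\\ &- \frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\Big(\sigma_\eta^T\hat W_{\eta c} - (\beta_\eta^2-1)\|z_\eta\|^2 - 2\beta_\eta z_\eta^T\dot\eta_d + \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\Big\|^2\Big).\end{aligned} \tag{26}$$

Substituting (17) into (26) yields

$$\begin{aligned}\dot L_\eta(t) ={}& -\beta_\eta\|z_\eta\|^2 + z_\eta^T z_\nu - z_\eta^T\dot\eta_d - \frac{1}{2}z_\eta^T\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} + \frac{1}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}z_\eta - \gamma_{\eta a}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c}\\ &- \frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\Big(\sigma_\eta^T\hat W_{\eta c} - (\beta_\eta^2-1)\|z_\eta\|^2 - 2\beta_\eta z_\eta^T\dot\eta_d + \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\Big\|^2\Big).\end{aligned} \tag{27}$$

Using $\tilde W_{\eta a}(t) = \hat W_{\eta a}(t) - W_\eta^*$, the following results hold:

$$-\frac{1}{2}z_\eta^T\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} + \frac{1}{2}z_\eta^T\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} = -\frac{1}{2}z_\eta^T\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*$$

$$-\gamma_{\eta a}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} = -\frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} + \frac{\gamma_{\eta a}}{2}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*.$$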
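As a rough illustration of the updating laws (21) and (22), the following sketch evaluates both right-hand sides for given signals. It is a sketch under stated assumptions rather than the authors' code: the function name `update_rates` is hypothetical, `jac` stands for the $n_\eta \times 3$ Jacobian $\partial S_\eta/\partial z_\eta^T$, and how the rates are integrated in time (for example by a forward-Euler step) is left to the caller.

```python
def update_rates(z, eta_d_dot, jac, w_c, w_a, beta, gamma_c, gamma_a):
    """Return (dW_c/dt, dW_a/dt, e) for the first backstepping step."""
    n, m = len(jac), len(z)
    # g = (d^T S/dz) W_a, a vector in R^3
    g = [sum(jac[i][k] * w_a[i] for i in range(n)) for k in range(m)]
    # sigma = -(dS/dz^T)(beta z + g/2 + eta_d_dot), a vector in R^n
    inner = [beta * z[k] + 0.5 * g[k] + eta_d_dot[k] for k in range(m)]
    sigma = [-sum(jac[i][k] * inner[k] for k in range(m)) for i in range(n)]
    norm = 1.0 + sum(s * s for s in sigma)
    sigma_wc = sum(s * w for s, w in zip(sigma, w_c))
    # Bellman residual e, written as in the bracket of (21)
    e = (sigma_wc
         - (beta ** 2 - 1.0) * sum(x * x for x in z)
         - 2.0 * beta * sum(a * b for a, b in zip(z, eta_d_dot))
         + 0.25 * sum(x * x for x in g))
    # Critic law (21): normalized gradient descent on e^2 / 2
    dw_c = [-gamma_c / norm * s * e for s in sigma]
    # (dS/dz^T)(d^T S/dz) W_a, reused by both actor terms in (22)
    pw = [sum(jac[i][k] * g[k] for k in range(m)) for i in range(n)]
    dw_a = [
        0.5 * sum(jac[i][k] * z[k] for k in range(m))
        - gamma_a * pw[i]
        + gamma_c / (4.0 * norm) * pw[i] * sigma_wc
        for i in range(n)
    ]
    return dw_c, dw_a, e
```

The residual is computed once and shared, since $\partial e_\eta/\partial\hat W_{\eta c} = \sigma_\eta$ makes the critic law a scaled product of $\sigma_\eta$ and $e_\eta$.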
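Adding the above results to (27) gives

$$\begin{aligned}\dot L_\eta(t) ={}& -\beta_\eta\|z_\eta\|^2 + z_\eta^T z_\nu - z_\eta^T\dot\eta_d - \frac{1}{2}z_\eta^T\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* - \frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta a}}{2}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c}\\ &- \frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\Big(\sigma_\eta^T\hat W_{\eta c} - (\beta_\eta^2-1)\|z_\eta\|^2 - 2\beta_\eta z_\eta^T\dot\eta_d + \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\Big\|^2\Big).\end{aligned} \tag{28}$$

Using the Cauchy inequality $(\sum_{k=1}^n a_k b_k)^2 \le \sum_{k=1}^n a_k^2\sum_{k=1}^n b_k^2$ and Young's inequality $ab \le (a^2/2) + (b^2/2)$, the following facts hold:

$$z_\eta^T(t)z_\nu(t) \le \frac{1}{2}\|z_\eta(t)\|^2 + \frac{1}{2}\|z_\nu(t)\|^2$$

$$-z_\eta^T(t)\dot\eta_d(t) \le \frac{1}{2}\|z_\eta(t)\|^2 + \frac{1}{2}\|\dot\eta_d(t)\|^2$$

$$-\frac{1}{2}z_\eta^T(t)\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* \le \|z_\eta(t)\|^2 + W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*.$$

Applying the above inequalities to (28) gives

$$\begin{aligned}\dot L_\eta(t) \le{}& \frac{1}{2}\|z_\nu\|^2 - (\beta_\eta-2)\|z_\eta\|^2 - \frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c}\\ &- \frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\Big(\sigma_\eta^T\hat W_{\eta c} - (\beta_\eta^2-1)\|z_\eta\|^2 - 2\beta_\eta z_\eta^T\dot\eta_d + \frac{1}{4}\Big\|\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\Big\|^2\Big)\\ &+ \frac{1}{2}\|\dot\eta_d\|^2 + \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*.\end{aligned} \tag{29}$$

Based on (15) and the definition of $\sigma_\eta$, the following equation holds:

$$-(\beta_\eta^2-1)\|z_\eta(t)\|^2 - 2\beta_\eta z_\eta^T(t)\dot\eta_d(t) = -\sigma_\eta^T(t)W_\eta^* - \frac{1}{2}\hat W_{\eta a}^T(t)\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{1}{4}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* - \rho_\eta(t). \tag{30}$$

Substituting (30) into (29) yields

$$\begin{aligned}\dot L_\eta(t) \le{}& \frac{1}{2}\|z_\nu\|^2 - (\beta_\eta-2)\|z_\eta\|^2 - \frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c}\\ &- \frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\Big(\sigma_\eta^T\tilde W_{\eta c} - \frac{1}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{1}{4}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{1}{4}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} - \rho_\eta\Big)\\ &+ \frac{1}{2}\|\dot\eta_d\|^2 + \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*.\end{aligned} \tag{31}$$

Using the following facts:

$$-\frac{1}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{1}{4}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{1}{4}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} = \frac{1}{4}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} - \frac{1}{4}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} \tag{32}$$

$$\frac{\gamma_{\eta c}}{1+\|\sigma_\eta\|^2}\tilde W_{\eta c}^T\sigma_\eta\rho_\eta \le \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\rho_\eta^2 + \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta\sigma_\eta^T\tilde W_{\eta c} \tag{33}$$

the inequality (31) can be rewritten as

$$\begin{aligned}\dot L_\eta(t) \le{}& \frac{1}{2}\|z_\nu\|^2 - (\beta_\eta-2)\|z_\eta\|^2 - \frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta\sigma_\eta^T\tilde W_{\eta c} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c} - \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} + \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\\ &+ \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\rho_\eta^2 + \frac{1}{2}\|\dot\eta_d\|^2.\end{aligned} \tag{34}$$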
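Based on the following fact:

$$\frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\sigma_\eta^T\hat W_{\eta c} - \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} = \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}W_\eta^{*T}\sigma_\eta\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}$$

inequality (34) becomes

$$\begin{aligned}\dot L_\eta(t) \le{}& \frac{1}{2}\|z_\nu\|^2 - (\beta_\eta-2)\|z_\eta\|^2 - \frac{\gamma_{\eta a}}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} - \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta\sigma_\eta^T\tilde W_{\eta c} - \frac{\gamma_{\eta a}}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}W_\eta^{*T}\sigma_\eta\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} + \frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a}\\ &+ \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\rho_\eta^2 + \frac{1}{2}\|\dot\eta_d\|^2.\end{aligned} \tag{35}$$

According to Young's inequality and the Cauchy inequality, the following results hold:

$$\frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}W_\eta^{*T}\sigma_\eta\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} \le \frac{1}{32}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}W_\eta^{*T}\sigma_\eta\sigma_\eta^T W_\eta^*\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} + \frac{\gamma_{\eta c}^2}{2}\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}$$

$$\frac{\gamma_{\eta c}}{4(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} \le \frac{\gamma_{\eta c}^2}{2}\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a} + \frac{1}{32(1+\|\sigma_\eta\|^2)}\tilde W_{\eta c}^T\sigma_\eta W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\sigma_\eta^T\tilde W_{\eta c}.$$

Substituting the above inequalities into (35) gives

$$\begin{aligned}\dot L_\eta(t) \le{}& \frac{1}{2}\|z_\nu\|^2 - (\beta_\eta-2)\|z_\eta\|^2 - \Big(\frac{\gamma_{\eta a}}{2} - \frac{\gamma_{\eta c}^2}{2} - \frac{1}{32}W_\eta^{*T}\sigma_\eta\sigma_\eta^T W_\eta^*\Big)\tilde W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\tilde W_{\eta a}\\ &- \frac{1}{1+\|\sigma_\eta\|^2}\Big(\frac{\gamma_{\eta c}}{2} - \frac{1}{32}W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\Big)\tilde W_{\eta c}^T\sigma_\eta\sigma_\eta^T\tilde W_{\eta c} - \Big(\frac{\gamma_{\eta a}}{2} - \frac{\gamma_{\eta c}^2}{2}\Big)\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a}\\ &+ \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\rho_\eta^2 + \frac{1}{2}\|\dot\eta_d\|^2.\end{aligned} \tag{36}$$

Then the above inequality can be rewritten in compact form as

$$\dot L_\eta(t) \le -\xi_\eta^T(t)A_\eta(t)\xi_\eta(t) + C_\eta(t) + \frac{1}{2}\|z_\nu(t)\|^2 - \Big(\frac{\gamma_{\eta a}}{2} - \frac{\gamma_{\eta c}^2}{2}\Big)\hat W_{\eta a}^T\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}\hat W_{\eta a} \tag{37}$$

where

$$\xi_\eta(t) = [z_\eta^T(t), \tilde W_{\eta a}^T(t), \tilde W_{\eta c}^T(t)]^T$$

$$A_\eta(t) = \begin{bmatrix}\beta_\eta - 2 & 0 & 0\\ 0 & \Big(\dfrac{\gamma_{\eta a}}{2} - \dfrac{\gamma_{\eta c}^2}{2} - \dfrac{1}{32}W_\eta^{*T}\sigma_\eta\sigma_\eta^T W_\eta^*\Big)\dfrac{\partial S_\eta}{\partial z_\eta^T}\dfrac{\partial^T S_\eta}{\partial z_\eta} & 0\\ 0 & 0 & \dfrac{1}{1+\|\sigma_\eta\|^2}\Big(\dfrac{\gamma_{\eta c}}{2} - \dfrac{1}{32}W_\eta^{*T}\dfrac{\partial S_\eta}{\partial z_\eta^T}\dfrac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\Big)\sigma_\eta\sigma_\eta^T\end{bmatrix}$$

$$C_\eta(t) = \Big(1+\frac{\gamma_{\eta a}}{2}\Big)W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^* + \frac{\gamma_{\eta c}}{2(1+\|\sigma_\eta\|^2)}\rho_\eta^2 + \frac{1}{2}\|\dot\eta_d\|^2.$$

Based on Assumption 1, the matrix $A_\eta(t)$ can be made positive definite by choosing the design parameters $\beta_\eta$, $\gamma_{\eta a}$, $\gamma_{\eta c}$ to satisfy the following conditions:

$$\beta_\eta > 2, \qquad \gamma_{\eta a} > \gamma_{\eta c}^2 + \frac{\bar k_\eta}{16}W_\eta^{*T}W_\eta^*, \qquad \gamma_{\eta c} > \frac{1}{16}\sup_{t\ge 0}\Big\{W_\eta^{*T}\frac{\partial S_\eta}{\partial z_\eta^T}\frac{\partial^T S_\eta}{\partial z_\eta}W_\eta^*\Big\}. \tag{38}$$

Under these conditions the last term of (37) is nonpositive, so (37) becomes

$$\dot L_\eta(t) \le \frac{1}{2}\|z_\nu(t)\|^2 - a_\eta\|\xi_\eta(t)\|^2 + c_\eta \tag{39}$$

where $a_\eta = \inf_{t\ge 0}\{\lambda_{\min}\{A_\eta(t)\}\}$ and $c_\eta = \sup_{t\ge 0}\{C_\eta(t)\}$.

Step 2: The actual control $u$ is obtained in this step. From the dynamic equation (3), the time derivative of the error variable $z_\nu(t) = \nu(t) - \hat\alpha$ is

$$\dot z_\nu(t) = f(\chi) - \dot{\hat\alpha} + u. \tag{40}$$

Define the optimal cost function as

$$V_\nu^*(z_\nu) = \min_{u\in\Psi(\Omega_\nu)}\int_t^\infty r_\nu(z_\nu,u)\,ds = \int_t^\infty r_\nu(z_\nu,u^*)\,ds \tag{41}$$

where $r_\nu(z_\nu,u) = z_\nu^T z_\nu + u^T u$, $\Omega_\nu$ is a compact set, and $u^*$ is the optimal control. Then the HJB equation for the $z_\nu$-subsystem is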
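derived as

$$H_\nu\Big(z_\nu,u^*,\frac{\partial V_\nu^*}{\partial z_\nu}\Big) = z_\nu^T(t)z_\nu(t) + u^{*T}u^* + \frac{\partial V_\nu^*}{\partial z_\nu^T}\big(f(\chi) - \dot{\hat\alpha} + u^*\big) = 0. \tag{42}$$

By solving $(\partial H_\nu/\partial u^*) = 0$, the optimal control $u^*$ is obtained as

$$u^* = -\frac{1}{2}\frac{\partial V_\nu^*}{\partial z_\nu}. \tag{43}$$

Rewrite the optimal cost function as

$$V_\nu^*(z_\nu) = \beta_\nu\|z_\nu(t)\|^2 + V_\nu^o(z_\nu) \tag{44}$$

where $\beta_\nu$ is a positive design constant and $V_\nu^o(z_\nu) = -\beta_\nu\|z_\nu(t)\|^2 + V_\nu^*(z_\nu)$. Substituting (44) into (43), the optimal control can be rewritten as

$$u^* = -\beta_\nu z_\nu(t) - \frac{1}{2}\frac{\partial V_\nu^o(z_\nu)}{\partial z_\nu}. \tag{45}$$

Since the uncertain term $V_\nu^o(z_\nu)$ is continuous and well defined on the compact set $\Omega_\nu$, it can be approximated by NNs as

$$V_\nu^o(z_\nu) = W_\nu^{*T}S_\nu(z_\nu) + \varepsilon_\nu(z_\nu) \tag{46}$$

where $W_\nu^* \in \mathbb{R}^{n_\nu}$ is the ideal weight; $S_\nu(z_\nu) \in \mathbb{R}^{n_\nu}$ is the basis function vector; $\varepsilon_\nu(z_\nu) \in \mathbb{R}$ is the approximation error.

Substituting (46) into (45) gives

$$u^* = -\beta_\nu z_\nu(t) - \frac{1}{2}\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}W_\nu^* - \frac{1}{2}\frac{\partial\varepsilon_\nu}{\partial z_\nu} \tag{47}$$

where $(\partial\varepsilon_\nu/\partial z_\nu)$ is bounded by a constant $\delta_\nu$, i.e., $\|\partial\varepsilon_\nu/\partial z_\nu\| \le \delta_\nu$.

Inserting (46) and (47) into (42) yields

$$H_\nu(z_\nu,u^*,W_\nu^*) = -(\beta_\nu^2-1)\|z_\nu(t)\|^2 + 2\beta_\nu z_\nu^T(t)\big(f(\chi) - \dot{\hat\alpha}\big) + W_\nu^{*T}\frac{\partial S_\nu(z_\nu)}{\partial z_\nu^T}\big(f(\chi) - \dot{\hat\alpha} - \beta_\nu z_\nu\big) - \frac{1}{4}\Big\|\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}W_\nu^*\Big\|^2 + \rho_\nu(t) = 0 \tag{48}$$

where $\rho_\nu(t) = (\partial\varepsilon_\nu/\partial z_\nu^T)u^* + (\partial\varepsilon_\nu/\partial z_\nu^T)(f(\chi) - \dot{\hat\alpha}) + (1/4)\|\partial\varepsilon_\nu/\partial z_\nu\|^2$. Since all terms of $\rho_\nu(t)$ are bounded, it can be bounded by a constant, i.e., $|\rho_\nu(t)| \le \psi_\nu$.

Because the ideal constant weight $W_\nu^*$ is unknown, the optimal control (47) is unavailable. To obtain an implementable control, the online critic-actor RL is employed to implement the optimizing scheme

$$\hat V_\nu^*(z_\nu) = \beta_\nu\|z_\nu(t)\|^2 + \hat W_{\nu c}^T(t)S_\nu(z_\nu) \tag{49}$$

$$u = -\beta_\nu z_\nu(t) - \frac{1}{2}\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}\hat W_{\nu a} \tag{50}$$

where $\hat V_\nu^*(z_\nu)$ is the approximation of $V_\nu^*(z_\nu)$; $\hat W_{\nu c}(t) \in \mathbb{R}^{n_\nu}$ and $\hat W_{\nu a}(t) \in \mathbb{R}^{n_\nu}$ are the critic and actor NN weights, respectively.

The optimized control $u$ for the vessel dynamics (3) is expressed in the earth-fixed frame; the control for the dynamic system (1) is obtained by the following equation:

$$\tau = MJ^{-1}(\eta)u(t). \tag{51}$$

Substituting (49) and (50) into (42), the approximated HJB equation is obtained as

$$H_\nu(z_\nu,u,\hat W_{\nu c}) = \|z_\nu\|^2 + \Big\|\beta_\nu z_\nu + \frac{1}{2}\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}\hat W_{\nu a}(t)\Big\|^2 + \Big(2\beta_\nu z_\nu(t) + \frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}\hat W_{\nu c}(t)\Big)^T\Big(f(\chi) - \dot{\hat\alpha} - \beta_\nu z_\nu(t) - \frac{1}{2}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}(t)\Big). \tag{52}$$

Similar to Step 1, the critic NN weight updating law is constructed by minimizing the Bellman error $e_\nu(t) = H_\nu(z_\nu,u,\hat W_{\nu c})$. Define a positive definite function as $E_\nu(t) = (1/2)e_\nu^2(t)$; then the critic NN weight updating law is derived based on the gradient descent algorithm as

$$\dot{\hat W}_{\nu c}(t) = -\frac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}e_\nu(t)\frac{\partial e_\nu(t)}{\partial\hat W_{\nu c}(t)} = -\frac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}\sigma_\nu\Big(\sigma_\nu^T\hat W_{\nu c}(t) - (\beta_\nu^2-1)\|z_\nu(t)\|^2 + 2\beta_\nu z_\nu^T(t)\big(f(\chi) - \dot{\hat\alpha}\big) + \frac{1}{4}\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\Big) \tag{53}$$

where $\gamma_{\nu c} > 0$ is the learning rate and $\sigma_\nu = (\partial S_\nu/\partial z_\nu^T)\big(f(\chi) - \dot{\hat\alpha} - \beta_\nu z_\nu(t) - (1/2)(\partial^T S_\nu/\partial z_\nu)\hat W_{\nu a}(t)\big)$.

The weight updating law of the actor NN is

$$\dot{\hat W}_{\nu a}(t) = \frac{1}{2}\frac{\partial S_\nu}{\partial z_\nu^T}z_\nu(t) - \gamma_{\nu a}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}(t) + \frac{\gamma_{\nu c}}{4(1+\|\sigma_\nu\|^2)}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}(t)\sigma_\nu^T\hat W_{\nu c}(t) \tag{54}$$

where $\gamma_{\nu a} > 0$ is the learning rate.

Consider the overall Lyapunov function candidate as follows:

$$L(t) = L_\eta(t) + \frac{1}{2}z_\nu^T(t)z_\nu(t) + \frac{1}{2}\tilde W_{\nu a}^T(t)\tilde W_{\nu a}(t) + \frac{1}{2}\tilde W_{\nu c}^T(t)\tilde W_{\nu c}(t) \tag{55}$$

where $\tilde W_{\nu c}(t) = \hat W_{\nu c}(t) - W_\nu^*$ and $\tilde W_{\nu a}(t) = \hat W_{\nu a}(t) - W_\nu^*$ are the critic and actor NN weight estimation errors, respectively.

The time derivative of $L(t)$ along (40), (53), and (54) is

$$\begin{aligned}\dot L(t) ={}& \dot L_\eta(t) + z_\nu^T\big(f(\chi) - \dot{\hat\alpha} + u\big) + \tilde W_{\nu a}^T\Big(\frac{1}{2}\frac{\partial S_\nu}{\partial z_\nu^T}z_\nu - \gamma_{\nu a}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a} + \frac{\gamma_{\nu c}}{4(1+\|\sigma_\nu\|^2)}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\sigma_\nu^T\hat W_{\nu c}\Big)\\ &- \frac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}\tilde W_{\nu c}^T\sigma_\nu\Big(\sigma_\nu^T\hat W_{\nu c} - (\beta_\nu^2-1)\|z_\nu\|^2 + 2\beta_\nu z_\nu^T\big(f(\chi) - \dot{\hat\alpha}\big) + \frac{1}{4}\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\Big).\end{aligned} \tag{56}$$

Applying the control (50) to (56), similar to the first step, gives

$$\begin{aligned}\dot L(t) \le{}& \dot L_\eta(t) - (\beta_\nu-3)\|z_\nu\|^2 - \frac{\gamma_{\nu a}}{2}\tilde W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\tilde W_{\nu a} - \frac{\gamma_{\nu a}}{2}\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\\ &+ \frac{\gamma_{\nu c}}{4(1+\|\sigma_\nu\|^2)}\tilde W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\sigma_\nu^T\hat W_{\nu c}\\ &- \frac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}\tilde W_{\nu c}^T\sigma_\nu\Big(\sigma_\nu^T\hat W_{\nu c} - (\beta_\nu^2-1)\|z_\nu\|^2 + 2\beta_\nu z_\nu^T\big(f(\chi) - \dot{\hat\alpha}\big) + \frac{1}{4}\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\Big)\\ &+ \Big(1+\frac{\gamma_{\nu a}}{2}\Big)W_\nu^{*T}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}W_\nu^* + \frac{1}{2}\|f(\chi)\|^2 + \frac{1}{2}\|\dot{\hat\alpha}\|^2.\end{aligned} \tag{57}$$

Rewrite (48) as

$$-(\beta_\nu^2-1)\|z_\nu(t)\|^2 + 2\beta_\nu z_\nu^T(t)\big(f(\chi) - \dot{\hat\alpha}\big) = -\sigma_\nu^T W_\nu^* - \frac{1}{2}\hat W_{\nu a}^T(t)\frac{\partial S_\nu(z_\nu)}{\partial z_\nu^T}\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}W_\nu^* + \frac{1}{4}\Big\|\frac{\partial^T S_\nu(z_\nu)}{\partial z_\nu}W_\nu^*\Big\|^2 - \rho_\nu(t). \tag{58}$$

Similar to the first step, applying (58) to (57) yields

$$\begin{aligned}\dot L(t) \le{}& \dot L_\eta(t) - (\beta_\nu-3)\|z_\nu\|^2 - \Big(\frac{\gamma_{\nu a}}{2} - \frac{\gamma_{\nu c}^2}{2} - \frac{1}{32}W_\nu^{*T}\sigma_\nu\sigma_\nu^T W_\nu^*\Big)\tilde W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\tilde W_{\nu a}\\ &- \frac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}\Big(\frac{\gamma_{\nu c}}{2} - \frac{1}{32}W_\nu^{*T}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}W_\nu^*\Big)\tilde W_{\nu c}^T\sigma_\nu\sigma_\nu^T\tilde W_{\nu c} - \Big(\frac{\gamma_{\nu a}}{2} - \frac{\gamma_{\nu c}^2}{2}\Big)\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a}\\ &+ \Big(1+\frac{\gamma_{\nu a}}{2}\Big)W_\nu^{*T}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}W_\nu^* + \frac{1}{2}\|f(\chi)\|^2 + \frac{1}{2}\|\dot{\hat\alpha}\|^2 + \frac{\gamma_{\nu c}}{2}\rho_\nu^2.\end{aligned} \tag{59}$$

Using the previous result (39), (59) is rewritten in compact form as

$$\dot L(t) \le -a_\eta\|\xi_\eta(t)\|^2 + c_\eta - \xi_\nu^T(t)A_\nu(t)\xi_\nu(t) + C_\nu - \Big(\frac{\gamma_{\nu a}}{2} - \frac{\gamma_{\nu c}^2}{2}\Big)\hat W_{\nu a}^T\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}\hat W_{\nu a} \tag{60}$$

where

$$\xi_\nu(t) = [z_\nu^T(t), \tilde W_{\nu a}^T(t), \tilde W_{\nu c}^T(t)]^T$$

$$A_\nu(t) = \begin{bmatrix}\beta_\nu - 3\tfrac{1}{2} & 0 & 0\\ 0 & \Big(\dfrac{\gamma_{\nu a}}{2} - \dfrac{\gamma_{\nu c}^2}{2} - \dfrac{1}{32}W_\nu^{*T}\sigma_\nu\sigma_\nu^T W_\nu^*\Big)\dfrac{\partial S_\nu}{\partial z_\nu^T}\dfrac{\partial^T S_\nu}{\partial z_\nu} & 0\\ 0 & 0 & \dfrac{\gamma_{\nu c}}{1+\|\sigma_\nu\|^2}\Big(\dfrac{\gamma_{\nu c}}{2} - \dfrac{1}{32}W_\nu^{*T}\dfrac{\partial S_\nu}{\partial z_\nu^T}\dfrac{\partial^T S_\nu}{\partial z_\nu}W_\nu^*\Big)\sigma_\nu\sigma_\nu^T\end{bmatrix}$$

$$C_\nu(t) = \Big(1+\frac{\gamma_{\nu a}}{2}\Big)W_\nu^{*T}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}W_\nu^* + \frac{1}{2}\|f(\chi)\|^2 + \frac{\gamma_{\nu c}}{2}\rho_\nu^2 + \frac{1}{2}\|\dot{\hat\alpha}\|^2.$$

Based on the PE assumption, $A_\nu(t)$ can be made positive definite by designing the parameters $\beta_\nu$, $\gamma_{\nu a}$, and $\gamma_{\nu c}$ to satisfy the following conditions:

$$\beta_\nu > 4, \qquad \gamma_{\nu a} > \gamma_{\nu c}^2 + \frac{\bar k_\nu}{16}W_\nu^{*T}W_\nu^*, \qquad \gamma_{\nu c} > \frac{1}{16}\sup_{t\ge 0}\Big\{W_\nu^{*T}\frac{\partial S_\nu}{\partial z_\nu^T}\frac{\partial^T S_\nu}{\partial z_\nu}W_\nu^*\Big\} \tag{61}$$

then (60) becomes

$$\dot L(t) < -a_\eta\|\xi_\eta(t)\|^2 - a_\nu\|\xi_\nu(t)\|^2 + c_\eta + c_\nu \tag{62}$$

where $a_\nu = \inf_{t\ge 0}\{\lambda_{\min}\{A_\nu(t)\}\}$ and $c_\nu = \sup_{t\ge 0}\{C_\nu(t)\}$.

The main results are summarized in the following theorem.

Theorem 1: Consider the surface vessel (1) with bounded initial conditions and reference signals. If the OB control utilizes the weight updating laws (21), (22) for the virtual control (17), and (53), (54) for the actual control (50), the design parameters satisfy (38) and (61), and the PE conditions (Assumption 1) are satisfied, then:
1) all error signals of the optimized control are SGUUB;
2) the surface vessel can track the reference trajectory to the desired accuracy.

Proof: See the Appendix.

IV. SIMULATION EXAMPLES

The simulation is carried out on a model ship that is a 1:75 scale-down replica. The mass of the model ship is $m = 21$ kg, and its length and width are 1.2 and 0.3 m, respectively. The inertia, Coriolis centripetal, and damping matrices are

$$M = \begin{bmatrix}20 & 0 & 0\\ 0 & 19 & 0.72\\ 0 & 0.72 & 2.7\end{bmatrix}$$

$$C = \begin{bmatrix}0 & 0 & -19v_y - 0.72v_z\\ 0 & 0 & 20v_x\\ 19v_y + 0.72v_z & -20v_x & 0\end{bmatrix}$$

$$D = \begin{bmatrix}0.72 + 1.3|v_x| + 5.8v_x^2 & 0 & 0\\ 0 & 0.86 + 36v_y + 3|v_z| & -0.1 - 2v_y + 2|v_z|\\ 0 & -0.1 - 5v_y + 3|v_z| & 6 + 4v_y + 4|v_z|\end{bmatrix}.$$

For simplicity, the restoring force vector $g(\eta)$ is assumed to be 0. The initial states of position and velocity are $\eta(0) = [0.3, 0.1, 0.2]^T$ and $v(0) = [0.1, 0.2, 0.3]^T$.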
Fig. 2. Tracking performance of position and heading states.
Fig. 3. Position tracking error $z_\eta$.
Fig. 4. Velocity tracking error $z_\nu$.
Fig. 5. Cost functions of two backstepping steps.
Fig. 6. Critic and actor weights of the first step.

The desired reference signal is $\eta_d(t) = [12\sin(0.2t + (\pi/2)), 12\sin(0.2t), \arcsin(\sin(0.2t)) + (\pi/2)]^T$ with the initial value $\eta_d(0) = [0.2, 0.1, 0.1]^T$.

For the two backstepping steps, the NNs are designed to contain 12 nodes, i.e., $n_\eta = n_\nu = 12$. The basis function vectors are $S_\eta(z_\eta) = [s_1(z_\eta), \ldots, s_{12}(z_\eta)]^T$ and $S_\nu(z_\nu) = [s_1(z_\nu), \ldots, s_{12}(z_\nu)]^T$, where $s_i(z_\eta) = \exp[-(z_\eta - \mu_i)^T(z_\eta - \mu_i)/\phi_i^2]$, $s_i(z_\nu) = \exp[-(z_\nu - \mu_i)^T(z_\nu - \mu_i)/\phi_i^2]$, $i = 1, 2, \ldots, 12$, the centers $\mu_i$ are evenly spaced in the range $[-6, 6] \times [-6, 6] \times [-3, 3]$, and the widths are $\phi_i = 2$ for all $i$. The updating laws are designed based on (21), (53) for the critic NNs and (22), (54) for the actor NNs, respectively, with the learning rates $\gamma_{c1} = 0.1$, $\gamma_{c2} = 0.01$, $\gamma_{a1} = 0.3$, $\gamma_{a2} = 0.4$, and the initial conditions $W_{c1}(0) = W_{c2}(0) = [0.01, \ldots, 0.02]^T \in \mathbb{R}^{72\times 1}$, $W_{a1}(0) = W_{a2}(0) = [0.01, \ldots, 0.01]^T \in \mathbb{R}^{72\times 1}$. Then the virtual control and actual control are obtained based on (17) and (50), with the control parameters $\beta_\eta = 10$ and $\beta_\nu = 14$.

The simulation results are shown in Figs. 2–7. Fig. 2 shows the tracking performance for the position and heading states. The tracking error vectors, $z_\eta(t)$ and $z_\nu(t)$, are presented in Figs. 3 and 4, and they converge to zero. The cost terms, $r_\eta(z_\eta, \alpha_\eta)$ for the first step and $r_\nu(z_\nu, u)$ for the second step, are presented in Fig. 5. The bounded critic and actor weight vectors are shown in Fig. 6 for the first step and Fig. 7 for the second step. Figs. 2–7 further demonstrate that the proposed optimizing scheme can guarantee that the control objective is achieved.

In order to display the optimizing performance of the method, a comparison with the vessel control method of [7] is implemented. The simulation results are shown in Figs. 8 and 9. Fig. 8 shows that the similar performance
is achieved, and Fig. 9 shows the control costs of the two control methods. Clearly, the proposed control method has the lower cost for the same control performance.

Fig. 7. Critic and actor weights of the second step.
Fig. 8. Similar control performance.
Fig. 9. Cost functions of two control methods.

V. CONCLUSION
Based on the new optimizing technique OB [20], an optimized tracking control for surface vessels is developed. In the optimized control, the NN-based RL strategy of actor-critic architecture is employed, where the critic NN is used to evaluate the control performance and the actor NN is used to carry out the control behavior. The overall control for the vessel system is optimized by designing both the virtual and actual controls to be the optimized solutions of the corresponding subsystems. Based on the Lyapunov analysis, it is proven that the proposed optimal algorithm can achieve the control objective. The effectiveness is further demonstrated by simulation results.

APPENDIX
PROOF OF THEOREM 1
The following lemma is used in the proof.
Lemma 1 [36]: Let $G(t) \in R$ be a continuous positive function with bounded initial value $G(0)$. If $\dot{G}(t) \le -aG(t) + c$ holds, where $a$ and $c$ are two constants, then the following inequality holds:

$$
G(t) \le e^{-at} G(0) + \frac{c}{a}\left(1 - e^{-at}\right). \qquad (63)
$$

Proof of Theorem 1:
1) Taking $a = \min\{a_\eta, a_\nu\}$ and $c = \max\{c_\eta, c_\nu\}$, (62) can be rewritten as

$$
\dot{L}(t) < -aL(t) + c. \qquad (64)
$$

According to Lemma 1, the following inequality is obtained:

$$
L(t) < e^{-at} L(0) + \frac{c}{a}\left(1 - e^{-at}\right). \qquad (65)
$$

The above inequality implies that all error signals, $z_\eta(t)$, $z_\nu(t)$, $\tilde{W}_{\eta a}(t)$, $\tilde{W}_{\eta c}(t)$, $\tilde{W}_{\nu a}(t)$, $\tilde{W}_{\nu c}(t)$, are SGUUB.
2) Let $L_z(t) = (1/2) z_\eta^{T}(t) z_\eta(t) + (1/2) z_\nu^{T}(t) z_\nu(t)$. Its time derivative along (24) and (40) is

$$
\dot{L}_z(t) = z_\eta^{T}(t)\left(z_\nu(t) + \hat{\alpha} - \dot{\eta}_d(t)\right) + z_\nu^{T}(t)\left(f(\chi) - \dot{\hat{\alpha}} + u\right). \qquad (66)
$$

Substituting (17) and (50) into (66) yields

$$
\begin{aligned}
\dot{L}_z(t) = {} & -\beta_\eta \|z_\eta(t)\|^{2} - \frac{1}{2} z_\eta^{T}(t) \frac{\partial^{T} S_\eta}{\partial z_\eta} \hat{W}_{\eta a}(t) + z_\eta^{T}(t) z_\nu(t) \\
& - z_\eta^{T}(t) \dot{\eta}_d(t) - \beta_\nu \|z_\nu(t)\|^{2} - \frac{1}{2} z_\nu^{T}(t) \frac{\partial^{T} S_\nu}{\partial z_\nu} \hat{W}_{\nu a}(t) \\
& + z_\nu^{T}(t) f(\chi(t)) - z_\nu^{T}(t) \dot{\hat{\alpha}}. \qquad (67)
\end{aligned}
$$

Based on the following facts:

$$
\begin{aligned}
z_\eta^{T}(t) z_\nu(t) &\le \frac{1}{2}\|z_\eta(t)\|^{2} + \frac{1}{2}\|z_\nu(t)\|^{2} \\
-z_\eta^{T}(t) \dot{\eta}_d(t) &\le \frac{1}{2}\|z_\eta(t)\|^{2} + \frac{1}{2}\|\dot{\eta}_d(t)\|^{2} \\
-\frac{1}{2} z_\eta^{T}(t) \frac{\partial^{T} S_\eta}{\partial z_\eta} \hat{W}_{\eta a}(t) &\le \|z_\eta(t)\|^{2} + \left\|\frac{\partial^{T} S_\eta}{\partial z_\eta} \hat{W}_{\eta a}(t)\right\|^{2} \\
z_\nu^{T}(t) f(\chi(t)) &\le \frac{1}{2}\|z_\nu(t)\|^{2} + \frac{1}{2}\|f(\chi)\|^{2} \\
-z_\nu^{T}(t) \dot{\hat{\alpha}}(t) &\le \frac{1}{2}\|z_\nu(t)\|^{2} + \frac{1}{2}\|\dot{\hat{\alpha}}(t)\|^{2} \\
-\frac{1}{2} z_\nu^{T}(t) \frac{\partial^{T} S_\nu}{\partial z_\nu} \hat{W}_{\nu a}(t) &\le \|z_\nu(t)\|^{2} + \left\|\frac{\partial^{T} S_\nu}{\partial z_\nu} \hat{W}_{\nu a}(t)\right\|^{2}.
\end{aligned}
$$
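The comparison bound of Lemma 1 is what turns a differential inequality into an explicit decay estimate in both parts of this proof, so a small numerical illustration may help. The following is a minimal Python sketch, assuming NumPy. The function names `lemma1_bound` and `simulate_G`, the constants, and the slack term are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def lemma1_bound(t, G0, a, c):
    """Right-hand side of (63): exp(-a t) G(0) + (c/a)(1 - exp(-a t))."""
    return np.exp(-a * t) * G0 + (c / a) * (1.0 - np.exp(-a * t))


def simulate_G(G0, a, c, t_end=5.0, dt=1e-3):
    """Forward-Euler solution of G' = -a G + c - s(t), with s(t) >= 0.

    The slack s(t) = 0.5 sin(t)^2 is an arbitrary illustrative choice that
    makes the differential inequality G' <= -a G + c hold.
    """
    n = int(round(t_end / dt))
    t = np.arange(n + 1) * dt
    G = np.empty(n + 1)
    G[0] = G0
    for k in range(n):
        slack = 0.5 * np.sin(t[k]) ** 2
        G[k + 1] = G[k] + dt * (-a * G[k] + c - slack)
    return t, G


a, c, G0 = 2.0, 1.0, 5.0   # illustrative constants, not from the paper
t, G = simulate_G(G0, a, c)
bound = lemma1_bound(t, G0, a, c)
print(bool(np.all(G <= bound + 1e-9)))  # True: trajectory stays under (63)
print(bound[-1], c / a)                 # the bound settles near c/a
```

The bound decays exponentially at rate $a$ toward the residual level $c/a$, which is why a larger decay rate gives a smaller ultimate bound on the error signals.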
Equation (67) can be rewritten as

$$
\dot{L}_z(t) \le -(\beta_\eta - 2)\|z_\eta(t)\|^{2} - (\beta_\nu - 3)\|z_\nu(t)\|^{2} + P(t) \qquad (68)
$$

where $P(t) = (1/2)\|\dot{\eta}_d(t)\|^{2} + (1/2)\|\dot{\hat{\alpha}}\|^{2} + (1/2)\|f(\chi)\|^{2} + \|(\partial^{T} S_\eta/\partial z_\eta)\hat{W}_{\eta a}(t)\|^{2} + \|(\partial^{T} S_\nu/\partial z_\nu)\hat{W}_{\nu a}(t)\|^{2}$. Because $\tilde{W}_{\eta a}(t)$ and $\tilde{W}_{\nu a}(t)$ are SGUUB, as proven in part 1, it is concluded that $P(t)$ is bounded by a constant $\bar{P}$, i.e., $P(t) < \bar{P}$. Further, the following fact holds:

$$
\dot{L}_z(t) < -\beta L_z(t) + \bar{P}
$$

where $\beta = \min\{\beta_\eta - 2, \beta_\nu - 3\}$. Applying Lemma 1 yields

$$
L_z(t) < e^{-\beta t} L_z(0) + \frac{\bar{P}}{\beta}\left(1 - e^{-\beta t}\right).
$$

This implies that the tracking errors can reach the desired accuracy by making $\beta$ large enough, so that the surface vessel can track the predefined trajectory to the desired accuracy.

ACKNOWLEDGMENT
The authors would like to thank the National Research Foundation, Keppel Corporation, and the National University of Singapore for supporting this paper done in the Keppel-NUS Corporate Laboratory. The conclusions put forward reflect the views of the authors alone, and not necessarily those of the institutions within the Corporate Laboratory. The WBS number of this project is R-261-507-004-281.

REFERENCES
[1] G. Wen, S. S. Ge, F. Tu, and Y. S. Choo, “Artificial potential-based adaptive H∞ synchronized tracking control for accommodation vessel,” IEEE Trans. Ind. Electron., vol. 64, no. 7, pp. 5640–5647, Jul. 2017.
[2] T. Zhang, S. S. Ge, and C. C. Hang, “Adaptive neural network control for strict-feedback nonlinear systems using backstepping design,” Automatica, vol. 36, no. 12, pp. 1835–1846, 2000.
[3] Y. Yang, G. Feng, and J. Ren, “A combined backstepping and small-gain approach to robust adaptive fuzzy control for strict-feedback nonlinear systems,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 34, no. 3, pp. 406–420, May 2004.
[4] S. Tong, Y. Li, Y. Li, and Y. Liu, “Observer-based adaptive fuzzy backstepping control for a class of stochastic nonlinear strict-feedback systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 6, pp. 1693–1704, Dec. 2011.
[5] Z.-P. Jiang, “Global tracking control of underactuated ships by Lyapunov’s direct method,” Automatica, vol. 38, no. 2, pp. 301–309, 2002.
[6] K. Do, Z.-P. Jiang, and J. Pan, “Universal controllers for stabilization and tracking of underactuated ships,” Syst. Control Lett., vol. 47, no. 4, pp. 299–317, 2002.
[7] K. P. Tee and S. S. Ge, “Control of fully actuated ocean surface vessels using a class of feedforward approximators,” IEEE Trans. Control Syst. Technol., vol. 14, no. 4, pp. 750–756, Jul. 2006.
[8] K. D. Do and J. Pan, “Global tracking control of underactuated ships with nonzero off-diagonal terms in their system matrices,” Automatica, vol. 41, no. 1, pp. 87–95, 2005.
[9] M. Chen, S. S. Ge, and Y. S. Choo, “Neural network tracking control of ocean surface vessels with input saturation,” in Proc. IEEE Int. Conf. Autom. Logistics (ICAL), Shenyang, China: IEEE, 2009, pp. 85–89.
[10] S. Tong, K. Sun, and S. Sui, “Observer-based adaptive fuzzy decentralized optimal control design for strict-feedback nonlinear large-scale systems,” IEEE Trans. Fuzzy Syst., vol. 26, no. 2, pp. 569–584, Apr. 2018.
[11] Y. Li, K. Sun, and S. Tong, “Observer-based adaptive fuzzy fault-tolerant optimal control for SISO nonlinear systems,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2017.2785801.
[12] Y. Li, K. Sun, and S. Tong, “Adaptive fuzzy robust fault-tolerant optimal control for nonlinear large-scale systems,” IEEE Trans. Fuzzy Syst., to be published, doi: 10.1109/TFUZZ.2017.2787128.
[13] F. L. Lewis, D. L. Vrabie, and V. L. Syrmos, Optimal Control, 3rd ed. New York, NY, USA: Wiley, 2012.
[14] S. Tong, Y. Li, and S. Sui, “Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems,” IEEE Trans. Fuzzy Syst., vol. 24, no. 6, pp. 1441–1454, Dec. 2016.
[15] H. Modares, F. L. Lewis, and M. B. Naghibi-Sistani, “Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, pp. 1513–1525, Oct. 2013.
[16] D. Liu, D. Wang, and H. Li, “Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418–428, Feb. 2014.
[17] J.-H. Park, S.-H. Kim, and C.-J. Moon, “Adaptive neural control for strict-feedback nonlinear systems without backstepping,” IEEE Trans. Neural Netw., vol. 20, no. 7, pp. 1204–1209, Jul. 2009.
[18] J. Q. Gong and B. Yao, “Neural network adaptive robust control of nonlinear systems in semi-strict feedback form,” Automatica, vol. 37, no. 8, pp. 1149–1160, 2001.
[19] G. Arslan and T. Başar, “Disturbance attenuating controller design for strict-feedback systems with structurally unknown dynamics,” Automatica, vol. 37, no. 8, pp. 1175–1188, 2001.
[20] G. Wen, S. S. Ge, and F. Tu, “Optimized backstepping for tracking control of strict-feedback systems,” IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2018.2803726.
[21] D. Wang and D. Liu, “Neural robust stabilization via event-triggering mechanism and adaptive learning technique,” Neural Netw. Official J. Int. Neural Netw. Soc., vol. 102, pp. 27–35, Jun. 2018.
[22] Y. J. Liu, G. X. Wen, and S. C. Tong, “Direct adaptive NN control for a class of discrete-time nonlinear strict-feedback systems,” Neurocomputing, vol. 73, nos. 13–15, pp. 2498–2505, 2010.
[23] D. Wang, D. Liu, C. Mu, and Y. Zhang, “Neural network learning and robust stabilization of nonlinear systems with dynamic uncertainties,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 1342–1351, Apr. 2017.
[24] G. Wen, C. L. P. Chen, Y.-J. Liu, and L. Zhi, “Neural network-based adaptive leader-following consensus control for a class of nonlinear multiagent state-delay systems,” IEEE Trans. Cybern., vol. 47, no. 8, pp. 2151–2160, Aug. 2017.
[25] Y. Guo, “Globally robust stability analysis for stochastic Cohen–Grossberg neural networks with impulse control and time-varying delays,” Ukrainian Math. J., vol. 69, no. 8, pp. 1220–1233, 2018.
[26] B. Xu, Z. Shi, and C. Yang, “Composite fuzzy control of a class of uncertain nonlinear systems with disturbance observer,” Nonlin. Dyn., vol. 80, nos. 1–2, pp. 341–351, 2015.
[27] L. Zhang, Z. Ning, and Z. Wang, “Distributed filtering for fuzzy time-delay systems with packet dropouts and redundant channels,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 4, pp. 559–572, Apr. 2016.
[28] Y. Li, S. Tong, and T. Li, “Adaptive fuzzy output feedback dynamic surface control of interconnected nonlinear pure-feedback systems,” IEEE Trans. Cybern., vol. 45, no. 1, pp. 138–149, Jan. 2015.
[29] Y. Li, S. Tong, and T. Li, “Hybrid fuzzy adaptive output feedback control design for uncertain MIMO nonlinear systems with time-varying delays and input saturation,” IEEE Trans. Fuzzy Syst., vol. 24, no. 4, pp. 841–853, Aug. 2016.
[30] D. Wang, H. He, and D. Liu, “Adaptive critic nonlinear robust control: A survey,” IEEE Trans. Cybern., vol. 47, no. 10, pp. 3429–3451, Oct. 2017.
[31] D. Wang, C. Li, D. Liu, and C. Mu, “Data-based robust optimal control of continuous-time affine nonlinear systems with matched uncertainties,” Inf. Sci., vol. 366, pp. 121–133, Oct. 2016.
[32] G. Wen, C. L. P. Chen, J. Feng, and N. Zhou, “Optimized multi-agent formation control based on identifier-actor-critic reinforcement learning algorithm,” IEEE Trans. Fuzzy Syst., to be published, doi: 10.1109/TFUZZ.2017.2787561.
[33] D. Wang, H. He, X. Zhong, and D. Liu, “Event-driven nonlinear discounted optimal regulation involving a power system application,” IEEE Trans. Ind. Electron., vol. 64, no. 10, pp. 8177–8186, Oct. 2017.
[34] S. Bhasin et al., “A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica, vol. 49, no. 1, pp. 82–92, 2013.
[35] K. G. Vamvoudakis and F. L. Lewis, “Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, 2010.
[36] G.-X. Wen, C. L. P. Chen, Y.-J. Liu, and Z. Liu, “Neural-network-based adaptive leader-following consensus control for second-order non-linear multi-agent systems,” IET Control Theory Appl., vol. 9, no. 13, pp. 1927–1934, Aug. 2015.

C. L. Philip Chen (S’88–M’88–SM’94–F’07) received the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 1985, and the Ph.D. degree from Purdue
University, West Lafayette, IN, USA, in 1988.
He is currently a Chair Professor with the Department of Computer and Information Science and the Dean of the Faculty of Science and Technology, University of Macau, Macau, China. His current research interests include computational intelligence, systems, and cybernetics.

Guoxing Wen received the M.S. degree in applied mathematics from the Liaoning University of Technology, Jinzhou, China, in 2011, and the Ph.D. degree in computer and information science from Macau University, Macau, China, in 2014.
He was a Research Fellow with the Department of Electrical and Computer Engineering, Faculty of Engineering, National University of Singapore, Singapore, from 2015 to 2016. He is currently a Lecturer with the Department of Mathematics, Binzhou University, Binzhou, China. His current research interests include adaptive neural network control, optimal control, and multiagent control.

Shuzhi Sam Ge (S’90–M’92–SM’99–F’06) received the B.Sc. degree in control engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 1986, and the Ph.D. degree and the Diploma of Imperial College degree in mechanical/electrical engineering from the Imperial College of Science, Technology and Medicine, University of London, London, U.K., in 1993.
He is the Founding Director of the Social Robotics Laboratory, Interactive Digital Media Institute, National University of Singapore, Singapore, where he is a Professor with the Department of Electrical and Computer Engineering.

Fangwen Tu received the B.E. degree from the Department of Electrical Engineering, Dalian University of Technology, Dalian, China, in 2012. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore.
His current research interests include intelligent control, machine learning, data mining, and computer vision.

Shengnan Wang received the M.A. degree in multimedia marketing from Portsmouth University, Portsmouth, U.K., in 2007.
She is currently a Lecturer with the School of Economics and Management, Binzhou University, Binzhou, China. Her current research interests include adaptive nonlinear control, differential dynamic systems, machine learning, and neural networks.
