
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CYBERNETICS

ADP-Based Online Tracking Control of Partially Uncertain Time-Delayed Nonlinear System and Application to Wheeled Mobile Robots

Shu Li, Liang Ding, Senior Member, IEEE, Haibo Gao, Yan-Jun Liu, Senior Member, IEEE, Lan Huang, and Zongquan Deng

Abstract—In this paper, an adaptive dynamic programming-based online adaptive tracking control algorithm is proposed to solve the tracking problem of a partially uncertain time-delayed nonlinear affine system with uncertain resistance. Using the discrete-time Hamilton–Jacobi–Bellman function, the input time-delay separation lemma, and the Lyapunov–Krasovskii functionals, the partial state and input time delay can be determined. With the approximation of the action, critic, and resistance neural networks, a near-optimal controller and appropriate adaptive laws are defined to guarantee the uniform ultimate boundedness of all signals in the target system and the convergence of the tracking error to a small compact set around zero. A numerical simulation of the wheeled mobile robotic system is presented to verify the validity of the proposed method.

Index Terms—Adaptive dynamic programming (ADP), neural network (NN), time delay, tracking control, wheeled mobile robot.

Manuscript received October 12, 2018; revised January 24, 2019; accepted February 2, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant 51822502, Grant 61622303, Grant 61603164, and Grant 61773188, in part by the Foundation for Innovative Research Groups of the Natural Science Foundation of China under Grant 51521003, in part by the Fundamental Research Funds for the Central Universities under Grant HIT.BRETIV.201903, and in part by the 111 Project under Grant B07018. This paper was recommended by Associate Editor H. Zhang. (Corresponding author: Liang Ding.)

S. Li, L. Ding, H. Gao, L. Huang, and Z. Deng are with the State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China (e-mail: li_shu43@hotmail.com; liangding@hit.edu.cn; gaohaibo@hit.edu.cn; 18B908108@stu.hit.edu.cn; dengzq@hit.edu.cn).

Y.-J. Liu is with the College of Science, Liaoning University of Technology, Jinzhou 121001, China (e-mail: liuyanjun@live.com).

Digital Object Identifier 10.1109/TCYB.2019.2900326

I. INTRODUCTION

ADAPTIVE dynamic programming (ADP) is a near-optimal control method, usually realized by applying Pontryagin's minimum principle or by solving the HJB equation [1]–[6]. Although the existing research results show good performance, they generally adopt an offline learning mode under the assumption that complete prior knowledge of the target system is available [7]. This stringent restriction has greatly limited the development of optimization system theory and its application [8]–[10].

With the help of neural networks (NNs) or fuzzy logic systems, the uncertain system functions, disturbances, or other uncertainties in the target system can be approximated [11]–[16], which has greatly promoted the development of intelligent control algorithms and their applications [17]–[19]. A series of research works on ADP-based intelligent optimization control algorithms have been carried out [20]–[28]. By using NNs, a series of iterative or heuristic ADP-based optimal control algorithms were introduced for discrete-time (DT) nonlinear systems [24]–[30]. The above-mentioned ADP-based control algorithms exhibit good performance for nonlinear target systems without time delay. However, as the complexity of control systems increases, nonlinear characteristics such as time delays often have a non-negligible impact on the controlled system, and some theoretical results have been achieved for such systems. In general, the time delay is divided into two categories: 1) state and 2) input time delay [31].

The state time delay is primarily a result of delays caused by the internal transmission of signals during system operation and occurs mainly in complex systems, such as wheeled mobile robotic (WMR) systems and chemical systems. To eliminate the influence of the time delay, scholars have carried out a series of studies in recent years [32]–[34]. By using the Lyapunov–Krasovskii functional to eliminate the influence of the state time delay, an adaptive observer was designed, and a control algorithm for multi-input multi-output discrete nonlinear dynamic systems with unknown state delay was completed [32]. Similarly, adaptive NN consensus controls were designed for multiagent nonlinear systems with state delays [33]–[35].

Different from the state time delay, the input time delay is the hysteresis of the input signal relative to the motion system. To eliminate this problem, a series of research works were carried out in recent years [35]–[37]. The Padé transform method and other separation techniques were introduced to handle the influence of the input delay on nonlinear strict-feedback systems [35], and the tracking problem of N-link rigid robots with constraints and a time-varying state and input time delay was solved [37]. The good performance of the above-mentioned NN-based adaptive tracking control methods lays a solid foundation for research into time-delay nonlinear adaptive tracking control for other engineering systems, such as the WMR system. To verify the validity of the proposed method, a WMR model that takes the
physical characteristics of the wheeled mobile robot itself and its operating environment into consideration was introduced [38]. The establishment of a wheeled mobile robot model considering wheel–ground interactions provides a guarantee for control-oriented model analysis and control algorithm design [39], [40].

In summary, an ADP-based online tracking control for a partially uncertain time-delayed nonlinear affine system with uncertain resistance is proposed. The main contributions of this paper are as follows.

1) Taking the state and the input time delay into consideration, a separation technique and a novel Lyapunov–Krasovskii function are designed.
2) Taking the decomposed partially indeterminate state and input time delay into consideration, a novel DT Hamilton–Jacobi–Bellman (DTHJB) function is designed.
3) An NN-based ADP online method is newly introduced to handle the tracking problem of state- and input-delayed nonlinear affine systems with uncertain resistance.

II. PROBLEM FORMULATION AND PRELIMINARIES

A. System Dynamics and Time Delay

Consider the DT nonlinear system with input and state time delay

ξ(k + 1) = f(ξ(k − T1)) + g(k)τ(k − T2) + Λ(k)   (1)

where f(ξ(k − T1)) is an unknown function, Ti = Ti,0 + ΔTi,k with i = 1, 2, g(k) is a bounded constant function, and Λ(k) denotes the function of the equivalent resistance torque.

B. Analysis of Time Delay

To solve the input delay problem, Lemma 1 is given.

Lemma 1 [29]: For the delayed nonlinear system (1), if there exists a definable control τ(k) ≠ 0 at time point k, then there exist a bounded time-varying coefficient η(k) and a bounded matrix function Θ(k) that satisfy the following expression:

τ(k − (T2,0 + ΔT2,k)) = τ(k − (T2,0 + T̂2,k + ΔT))
                     = (1 + η(k))τ(k − (T2,0 + T̂2,k))
                     = (1 + η(k))Θ(k)τ(k)   (2)

where Θ(k) is bounded by 2λΘ,min ≤ ‖Θ(k)‖ ≤ 2λΘ,max and η(k) is bounded by η ≤ |η(k)| ≤ η̄.

Proof: For a detailed description, refer to [29] together with the following remark.

Remark 1: Based on the necessary and sufficient condition for the existence of a matrix inverse, the proof of this lemma can be guaranteed only when |τ(k)τᵀ(k)| ≠ 0. In this lemma, τi(k) ≠ 0 for i = 1, 2, . . . , n. In addition, based on previous works, we assume that τi(k) ≠ τj(k) for i ≠ j. Under these circumstances, |ττᵀ| ≠ 0, which indicates that there exists an inverse matrix satisfying ττᵀ(ττᵀ)⁻¹ = E, and the proof of the lemma can be further obtained.

To solve the state time delay problem, Assumption 1 and the following analysis are given.

Assumption 1: The states and control input of (1) are bounded, f(ξ(k − T1,0 − ΔT1,k)) and g(k) are continuously differentiable, and the Jacobian matrix satisfies ‖∂f(ξ(k − T1,0 − ΔT1,k))/∂ξ‖ ≤ λ, where λ ≥ 0 denotes the Lipschitz constant of the known function f(ξ(k − T1,0 − ΔT1,k)).

We notice that the assumption holds true only when the function f(ξ(k − T1,0 − ΔT1,k)) is known and neither the function itself nor the state time delay is uncertain. Therefore, T̂1,k is employed to estimate the partially uncertain time delay ΔT1,k; then, the known function f(ξ(k − T1,0 − (T̂1,k + ΔT̃1,k))) is obtained

f(ξ(k − T1,0 − ΔT1,k)) = f(ξ(k − T1,0 − (T̂1,k + ΔT̃1,k))).   (3)

Define the error of the state caused by ΔT = n0T as Δξ(k + i) = ξ(k + iΔT) − ξ(k + (i + 1)ΔT), n0 ∈ Z. According to [32], one can obtain

ξ̄(k + 1) = ψ(ξ̄(k))   (4)

where ξ̄(k) = [ξ(k), . . . , ξ(k − T1)]ᵀ and ψ(ξ̄(k)) is the nonlinear function of ξ̄(k); Lemma 2 is obtained further.

Remark 2 [32]: Equation (4) corresponds to a general state and input delayed nonlinear system with a bounded disturbance and control input, which can characterize the effect of delay errors.

Lemma 2 [32]: The Lyapunov function L0(ξ̄(k)) for (4) always satisfies the following:

c1‖ξ̄(k)‖² ≤ L0(ξ̄(k)) ≤ c2‖ξ̄(k)‖² + c3   (5)
ΔL1(ξ̄(k + 1)) ≤ L0(ξ̄(k + 1)) − L0(ξ̄(k)) ≤ −c4‖ξ̄(k)‖² + c5   (6)

where ci, i = 1, . . . , 5, denote positive constant parameters.

To deal with the state delay, the following definition is given.

Definition 1: In terms of Assumption 1, Remark 1, and Lemma 2, the following difference functions are defined as:

f̃1(k − T1,0) = f(k − T1,0) − f(k)   (7)
f̃2(k − (T1,0 + T̂1,k)) = f(k − (T1,0 + T̂1,k)) − f(k − T1,0)   (8)
f̃3(k − (T1,0 + T̂1,k + ΔT̃1,k)) = f(k − (T1,0 + T̂1,k + ΔT̃1,k)) − f(k − (T1,0 + T̂1,k))   (9)

which are abbreviated as f̃1(·), f̃2(·), and f̃3(·), respectively.

Based on (2), (3), and Definition 1, (1) can be rewritten as

ξ(k + 1) = ∑_{i=1}^{3} f̃i(·) + g0(k)τ(k) + f(k) + Λ(k)   (10)

where g0(k) = (1 + η(k))g(k)Θ(k).
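The delay structure in (1) and (10) can be exercised numerically by keeping buffers of the past states and inputs. The sketch below simulates a scalar toy instance; the functions f, g, Λ and the feedback τ are illustrative stand-ins, not the WMR model of Section V.

```python
import math
from collections import deque

def simulate_delayed_system(steps=50, T1=3, T2=2):
    """Simulate xi(k+1) = f(xi(k-T1)) + g*tau(k-T2) + Lam(k)
    for a scalar toy system; T1, T2 are integer step delays."""
    f = lambda x: 0.8 * math.sin(x)          # stand-in for the unknown f(.)
    g = 0.5                                  # stand-in constant input gain g(k)
    Lam = lambda k: 0.01 * math.cos(0.1 * k) # stand-in resistance term
    tau = lambda x: -0.4 * x                 # simple stabilizing feedback

    # ring buffers: index 0 holds the oldest (most-delayed) sample
    xi_hist = deque([0.5] * (T1 + 1), maxlen=T1 + 1)   # xi(k-T1) .. xi(k)
    tau_hist = deque([0.0] * T2, maxlen=T2 + 1)        # tau(k-T2) .. tau(k)
    traj = []
    for k in range(steps):
        xi = xi_hist[-1]                # current state xi(k)
        tau_hist.append(tau(xi))        # compute and store tau(k)
        xi_next = f(xi_hist[0]) + g * tau_hist[0] + Lam(k)
        xi_hist.append(xi_next)         # oldest sample falls out of the buffer
        traj.append(xi_next)
    return traj
```

With the contractive stand-ins above, the trajectory stays bounded despite both delays; replacing `f`, `g`, and `Lam` with the discretized WMR quantities of Section V reproduces the structure of (73).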
C. Useful Technical Supports

Assumption 2: There exist constants 0 ≤ λg0,min ≤ λg0,max, which denote the minimum and maximum eigenvalues of g0(k); that is, 2λg0,min ≤ ‖g0(k)‖ ≤ 2λg0,max.

Assumption 3: With respect to system (1), it is assumed that the weights, approximation errors, and activation functions of the critic, action, and uncertain resistance NNs are bounded by unknown positive constants: ‖ωc(k)‖ ≤ ω̄c, ‖ωu(k)‖ ≤ ω̄u, ‖ωΛ(k)‖ ≤ ω̄Λ; ‖σc(k)‖ ≤ σ̄c, ‖στ(k)‖ ≤ σ̄τ, ‖σΛ(k)‖ ≤ σ̄Λ; and ϕ_c ≤ ‖ϕc(k)‖ ≤ ϕ̄c, ϕ_τ ≤ ‖ϕτ(·)‖ ≤ ϕ̄τ, ϕ_Λ ≤ ‖ϕΛ(k)‖ ≤ ϕ̄Λ.

According to [12]–[14] and [34], uncertain nonlinear functions can be approximated by NNs to any desired precision under certain conditions. Therefore, the uncertain function f0(k) in this paper, which satisfies the linearity-in-the-parameters condition, is approximated as

f0(k) = ω0ᵀϕ0(v0ᵀs(k)) + σ0(s(k))   (11)

where ω0 denotes the target weight of the output and v0 denotes the input of the hidden layers. In addition, the activation function vector ϕ0(v0ᵀs(k)), which is always selected as a Gaussian basis function, can be abbreviated as ϕ0(k); and σ0(s(k)) denotes the functional approximation error.

D. DTHJB and Desired Control

To address the control objective of (1), the infinite-horizon cost function is defined as

J(ξ(k)) = ξᵀ(k)Q1ξ(k) + Λᵀ(k)Q0Λ(k) + τᵀ(k)R1τ(k)
        + ξᵀ(k − (T1,0 + ΔT1,k))Q3ξ(k − (T1,0 + ΔT1,k))
        + τᵀ(k − (T2,0 + ΔT2,k))R3τ(k − (T2,0 + ΔT2,k))
        + 2ξᵀ(k)Q2ξ(k − (T1,0 + ΔT1,k))
        + 2τᵀ(k)R2τ(k − (T2,0 + ΔT2,k)) + J(ξ(k + 1))   (12)

where Qi > 0 and Ri ∈ ℝ^{m×m} are positive definite symmetric matrices.

Remark 3: Qi is selected as a positive definite matrix to ensure that variations in any direction of the state affect the cost, which can be linked to the observability condition [41].

Based on Bellman's principle of optimality, a novel DTHJB function is defined as

J*(ξ(k)) = min_{τ(k)} { ξᵀ(k)Q1ξ(k) + τᵀ(k)R1τ(k)
         + ξᵀ(k − (T1,0 + ΔT1,k))Q3ξ(k − (T1,0 + ΔT1,k))
         + τᵀ(k − (T2,0 + ΔT2,k))R3τ(k − (T2,0 + ΔT2,k))
         + 2ξᵀ(k)Q2ξ(k − (T1,0 + ΔT1,k)) + Λ*ᵀ(k)Q0Λ*(k)
         + 2τᵀ(k)R2τ(k − (T2,0 + ΔT2,k)) + J*(ξ(k + 1)) }   (13)

where J*(ξ(k)) = min_{τ(k)} J(ξ(k), ξ(k − T1), τ(k), τ(k − T2)) is the infinite-horizon optimal cost function. Furthermore, the controller τ(k) should be admissible and J(ξ(k) = 0) = 0.

According to Lemma 1, the DTHJB can be rewritten as

J*(ξ(k)) = min_{τ(k)} { ξᵀ(k)Q1ξ(k) + τ*ᵀ(k)R1τ*(k) + Λ*ᵀ(k)Q0Λ*(k)
         + ξᵀ(k − (T1,0 + ΔT1,k))Q3ξ(k − (T1,0 + ΔT1,k))
         + (1 + η(k))²τ*ᵀ(k)Θ*ᵀ(k)R3Θ*(k)τ*(k)
         + 2ξᵀ(k)Q2ξ(k − (T1,0 + ΔT1,k))
         + 2(1 + η(k))τᵀ(k)R2Θ*(k)τ*(k) + J*(ξ(k + 1)) }   (14)

where τ*(k) denotes the optimal control input of the system (1) and τ*(k − (T2,0 + ΔT2,k)) = (1 + η(k))Θ(k)τ*(k).

Then, τ*(k) can be obtained as

τ*(k) = −(1/2)H1⁻¹(k)g0ᵀ(k)∂J*(ξ(k + 1))/∂ξ(k + 1)   (15)

where H1(k) = R1 + 2(1 + η(k))Θ*ᵀ(k)R2 + (1 + η(k))²‖Θ*(k)‖²R3.

Notice that the optimal control is always unavailable, since it depends on the future state ξ(k + 1) and the uncertain part (1 + η(k))Θ*(k). To circumvent this deficiency, a new approach to online optimal control is given next.

III. NEAR-OPTIMAL REGULATION

In this section, the critic NN is used to approximate the DTHJB equation and the action NN is employed to learn the control input, which could minimize the estimated DTHJB equation. A third NN is introduced to estimate the uncertain resistance.

A. NN Approximation of Optimal Cost Function

The cost function and the controller can be approximated as

J(ξ(k)) = ωcᵀϕc(k) + σc(k)   (16)

and

τ(k) = ωaᵀϕa(k) + σa(k)   (17)

where ωc and ωa denote the constant target NN weights; σc(k) and σa(k) denote the approximation errors; and ϕc(k) and ϕa(k) denote the activation functions of J(k) and τ(k), respectively.

Then, the estimate of (16) can be given as

Ĵ(k) = ω̂cᵀϕc(k)   (18)

where Ĵ(k) denotes the approximated value of (12) and ω̂c(k) denotes the estimate of the constant target online approximate parameter vector ωc. To satisfy Ĵ(ξ(k) = 0) = 0, ϕc(0) = 0 should be ensured for ‖ξ‖ = 0 [41].

Furthermore, (12) can be written as

J(k) = J(f(k − T1) + g0(k)τ(k) + Λ(k)) + r(ξ, τ, Λ)   (19)

where

r(ξ, τ, Λ) = ξᵀ(k)Q1ξ(k) + Λᵀ(k)Q0Λ(k) + τᵀ(k)R1τ(k)
           + ξᵀ(k − (T1,0 + ΔT1,k))Q3ξ(k − (T1,0 + ΔT1,k))
           + τᵀ(k − (T2,0 + ΔT2,k))R3τ(k − (T2,0 + ΔT2,k))
           + 2ξᵀ(k)Q2ξ(k − (T1,0 + ΔT1,k))
           + 2τᵀ(k)R2τ(k − (T2,0 + ΔT2,k)).   (20)
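The stage cost inside (12) and (20) is a quadratic form in the current and delayed states and inputs, plus the resistance term and two cross-terms. A minimal sketch for vector signals, with illustrative identity-multiple weight matrices:

```python
import numpy as np

def stage_cost(xi, xi_d1, tau, tau_d2, Lam,
               Q1, Q2, Q3, Q0, R1, R2, R3):
    """One stage of the cost (12)/(20): quadratic terms in the current
    state/input, the delayed state/input, the resistance term, and the
    two cross-terms coupling current and delayed signals."""
    return (xi @ Q1 @ xi + Lam @ Q0 @ Lam + tau @ R1 @ tau
            + xi_d1 @ Q3 @ xi_d1 + tau_d2 @ R3 @ tau_d2
            + 2 * xi @ Q2 @ xi_d1 + 2 * tau @ R2 @ tau_d2)

I2 = np.eye(2)
c = stage_cost(np.array([1.0, 0.0]),   # xi(k)
               np.array([0.5, 0.0]),   # xi(k - (T1,0 + dT1,k))
               np.array([0.2, 0.0]),   # tau(k)
               np.array([0.1, 0.0]),   # tau(k - (T2,0 + dT2,k))
               np.array([0.0, 0.0]),   # resistance term
               I2, 0.1 * I2, I2, I2, I2, 0.1 * I2, I2)  # → 1.404
```

The total cost (12) then accumulates this quantity along the trajectory plus the cost-to-go J(ξ(k + 1)).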
When J(·) is replaced by (18), the relationship in (19) can no longer be guaranteed. The time index in the subscript of Ĵk(·) corresponds to the same time index in ω̂c(k), so Ĵk(k + 1) = ω̂cᵀ(k)ϕc(ξ(k + 1)) and Ĵk(k) = ω̂cᵀ(k)ϕc(ξ(k)). By defining Δϕc(ξ(k + 1)) = ϕc(ξ(k + 1)) − ϕc(ξ(k)), the residual error of (19) is defined as

EJ(k) = ω̂cᵀ(k)ΔΦ(k) + Π(k)   (21)

where ΔΦ(k) = [Δϕc(k + 1), . . . , Δϕc(k + 1 − j)] and Π(k) = [r(k), . . . , r(k − j)] with 0 < j < k − 1 and j ∈ ℕ, where ℕ denotes the set of natural numbers.

Then, the OLA parameter update law can be designed as

ω̂c(k + 1) = ω̂c(k) − αJ ΔΦ(k)EJᵀ(k)/‖ΔΦ(k)ΔΦᵀ(k) + I‖F   (22)

where 0 < αJ < 1 denotes the learning rate of the OLA NN.

Remark 4: It is observed that the auxiliary residual error (21) becomes zero when ξ(k) = 0, which indicates that cost functions (16) and (18) are zero at ξ(k) = 0. In other words, the OLA process comes to an end when the states of the system come to zero. This can also be seen as the necessity of persistency of excitation for the OLA of the cost function; that is, in order to better achieve the learning of the optimal cost function, the state must last for a sufficiently long time.

Considering (18), (20) can be rewritten as

r(ξ(k), τ(k), Λ(k)) = −ω̂cᵀ(k)Δϕc(ξ(k + 1)) − Δσ(k)   (23)

where Δσ(k) = σ(k + 1) − σ(k). Substituting (23) into (21),

EJ(k) = −ω̃cᵀ(k)ΔΦ(k) − ΔΣc(k)   (24)

where ΔΣc(k) = [Δσc(k), Δσc(k − 1), . . . , Δσc(k − j)] and ‖ΔΣc(k)‖² ≤ Δ̄Σ². Then, (22) can be rewritten as

ω̃c(k + 1) = (I − αJ ΔΦ(k)ΔΦᵀ(k)/‖ΔΦ(k)ΔΦᵀ(k) + I‖F)ω̃c(k)
           − αJ ΔΦ(k)ΔΣcᵀ(k)/‖ΔΦ(k)ΔΦᵀ(k) + I‖F.   (25)

To guarantee the optimal feedback control signal, the action NN and the uncertain resistance NN will be considered.

B. Estimation of Control Input

To find the control policy which can minimize (18), the OLA approximation of (17) is designed to begin the development of a feedback control policy

τ̂(k) = ω̂τᵀϕτ(k)   (26)

where ω̂τ(k) denotes the estimation of ωτ, and the corresponding control error is defined as

τ̃(k) = ω̂τᵀϕτ(k) + (1/2)H(k)(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̂c   (27)

where H(k) = H1⁻¹(k)g0ᵀ(k).

Furthermore, the OLA parameter update can be defined as

ω̂τ(k + 1) = ω̂τ(k) − ατ ϕτ(k)τ̃ᵀ(k)/(1 + ϕτᵀ(k)ϕτ(k))   (28)

where 0 < ατ < 1 denotes the learning rate of the OLA NN.

Considering (15) and (16), (17) can be rewritten as

0 = ωτᵀϕτ(k) + (1/2)H(k)[(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀωc + ∂σc(ξ(k + 1))/∂ξ(k + 1)] + στ(k).   (29)

Based on (27), (29), and the definition of the action approximation error as ω̃τ = ω̂τ − ωτ, one can obtain

τ̃(k) = −ω̃τᵀϕτ(k) − σ̃τ(k) − (1/2)H(k)(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̃c   (30)

where σ̃τ(k) = στ(k) + (1/2)H(k)∂σc(ξ(k + 1))/∂ξ(k + 1). In addition, there exists a positive constant σ̄τ that satisfies ‖σ̃τ‖ ≤ σ̄τ. Then, the estimation error can be obtained as

ω̃τ(k + 1) = (I − ατ ϕτ(k)ϕτᵀ(k)/(1 + ϕτᵀ(k)ϕτ(k)))ω̃τ(k)
           − ατ ϕτ(k)/(1 + ϕτᵀ(k)ϕτ(k)) × (σ̃τ(k) + (1/2)H(k)(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̃c).   (31)

In terms of (2), (3), (17), and ω̃τ(k), (10) can be rewritten as

ξ(k + 1) = ∑_{i=1}^{3} f̃i(·) + g0(k)(τ*(k) − ω̃τᵀϕτ(k)) + g0(k)στ(k) + f(k) + Λ(k).   (32)

Furthermore, the uncertain resistance NN will be considered to guarantee the optimal feedback control signal.

C. Resistance Neural Network Design

With the help of an NN, Λ(k) can be approximated as

Λ(k) = ωΛᵀϕΛ(k) + σΛ(k)   (33)

where ωΛ denotes the constant update weights, and ϕΛ(k) and σΛ(k) denote the activation function and approximation error of the resistance NN, respectively.

Taking the estimate of ωΛ as ω̂Λ, we have

Λ̂(k) = ω̂ΛᵀϕΛ(k).   (34)

Based on Bellman's principle of optimality, the ideal resistance Λ*(k) can be obtained by differentiating the DTHJB equation (14) with respect to the resistance Λ(k)

Λ*(k) = (1/2)Q0⁻¹∂J*(ξ(k + 1))/∂ξ(k + 1).   (35)

To maximize the approximated cost function, the estimation error between the estimated resistance Λ̂(k) and the optimal actual resistance can be designed as

Λ̃(k) = ω̂ΛᵀϕΛ(k) − (1/2)Q0⁻¹(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̂c.   (36)

By using a gradient-based adaptation [41], one can obtain

ω̂Λ(k + 1) = ω̂Λ(k) − αΛ ϕΛ(k)Λ̃ᵀ(k)/(1 + ϕΛᵀ(k)ϕΛ(k))   (37)
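The update laws (28) and (37) share one normalized-gradient form, with the step size divided by 1 + ϕᵀϕ. A generic sketch of a single step (the basis vector and error signal below are placeholders, not quantities from the WMR example):

```python
import numpy as np

def ola_update(w_hat, phi, err, alpha):
    """Normalized gradient step of the form used in (28)/(37):
    w(k+1) = w(k) - alpha * phi * err^T / (1 + phi^T phi)."""
    return w_hat - alpha * np.outer(phi, err) / (1.0 + phi @ phi)

# one step with a Gaussian basis vector and a scalar error signal
phi = np.exp(-np.array([0.0, 0.5, 1.0]) ** 2)  # placeholder activation vector
w = np.zeros((3, 1))                           # initial weight estimate
w = ola_update(w, phi, np.array([0.2]), alpha=0.5)
```

The normalization keeps each step bounded regardless of how large the activation vector grows, which is what permits the fixed learning-rate bounds 0 < α < 1 in the text.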
where the positive constant αΛ denotes the learning rate of the resistance NN. Considering (18) and (33), one can obtain

0 = ωΛᵀϕΛ(k) − (1/2)Q0⁻¹(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀωc + σ̃Λ(k)   (38)

where σ̃Λ(k) = σΛ(k) − (1/2)Q0⁻¹∂σc(ξ(k + 1))/∂ξ(k + 1).

Based on (36), (38), and the definition of the resistance NN approximation error as ω̃Λ = ωΛ − ω̂Λ, one can obtain

Λ̃(k) = −ω̃ΛᵀϕΛ(k) + (1/2)Q0⁻¹(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̃c − σ̃Λ(k).   (39)

In addition, there exists a positive constant σ̄Λ that satisfies ‖σ̃Λ‖ ≤ σ̄Λ.

According to (38) and (39), one can obtain

ω̃Λ(k + 1) = (I − αΛ ϕΛ(k)ϕΛᵀ(k)/(1 + ϕΛᵀ(k)ϕΛ(k)))ω̃Λ(k)
           + αΛ ϕΛ(k)/(1 + ϕΛᵀ(k)ϕΛ(k)) × ((1/2)Q0⁻¹(∂ϕc(ξ(k + 1))/∂ξ(k + 1))ᵀω̃c − σ̃Λ(k)).   (40)

Based on (33), (34), and ω̃Λ(k), (32) can be rewritten as

ξ(k + 1) = ∑_{i=1}^{3} f̃i(·) + g0(k)στ(k) + g0(k)(τ*(k) − ω̃τᵀϕτ(k)) + f(k) + Λ*(k) − ω̃ΛᵀϕΛ(k) + σΛ(k).   (41)

D. Stability and Optimization Analysis

In this section, the uniform ultimate boundedness (UUB) of the NN approximation errors will be guaranteed in the following theorem. While the approximation errors σJ, στ, and σΛ of the OLAs converge to a small set around zero [43], [44], the asymptotic convergence of the proposed control algorithm can be obtained.

Select the initial stabilizing control input as τ0(k), which guarantees that the initial states reside in a compact set Ω ⊂ ℝⁿ. Then, the OLA tuning gains αJ, ατ, and αΛ are defined to ensure that future states always stay in the compact set Ω. According to Assumption 3 and the facts ‖ϕJ(·)‖ ≤ ϕ̄J, ‖ϕτ(·)‖ ≤ ϕ̄τ, ‖ϕΛ(·)‖ ≤ ϕ̄Λ, and ‖∂ϕJ(·)/∂ξ(·)‖ ≤ ϕ̄J′, the boundedness of the critic, action, and resistance NNs will be guaranteed [44].

Theorem 1: Let the initial control input τ0(k) for the target system (1) be chosen appropriately, and let the OLA parameters of the critic estimator (18), the action estimator (26), and the resistance NN (34) be updated by the adaptive laws (25), (31), and (40). Then, the UUB of the system states ξ(k) and of the critic, action, and resistance NN OLA parameter estimation errors ω̃c(k), ω̃τ(k), and ω̃Λ(k) can be guaranteed for all k ≥ k0 + T with suitable positive constants αJ, ατ, and αΛ. In addition, ‖τ̂(k) − τ̂*(k)‖ ≤ σ0, where σ0 denotes a small enough positive constant.

Proof: As shown in Appendix A.

IV. NEAR-OPTIMAL TRACKING CONTROL

Based on Assumption 2, g0(k) = (1 + η(k))g(k)Θ*(k) is invertible. The aim is to find the optimal control sequence that ensures the state ξ(k) tracks the desired trajectory ξd(k).

Define the dynamics of the desired trajectory as

ξd(k + 1) = ∑_{i=1}^{3} f̃i(ξd(·)) + f(ξd(k)) + g0(k)τd(k) + Λd(k)   (42)

where f(ξd(k)) and f̃i(ξd(·)) are the internal dynamics of (10) with the state ξd(k), which can be abbreviated as fd(·), f̃1,d(·), f̃2,d(·), and f̃3,d(·); g0(k) denotes the input transformation matrix of the transformed system (10), which is bounded and invertible. The desired control input is defined as

τd(ξ(k), ξd(k)) = g0⁻¹(k)(ξd(k + 1) − f(ξd(k)) + Λd(k)).   (43)

Then, define the tracking error as

e(k) = ξ(k) − ξd(k).   (44)

Considering (10) and (42), we can obtain

e(k + 1) = ∑_{i=1}^{3} f̃i,e(·) + fe(k) + g0(k)τe(k) + Λe(k)   (45)

where fe = f(ξ) − f(ξd), f̃i,e = f̃i(ξ) − f̃i(ξd), i = 1, 2, 3, and τe = τ(ξ) − τd(ξ, ξd).

Considering the control input of the error dynamics (44) and (45), when e(k) tends to zero and ΔT = n0T is selected properly, the admissible control policy τe(e(k)) tends to an equilibrium point of the error dynamics. Then, the cost function (12), in terms of e(k), e(k − T1,0), e(k − (T1,0 + T̂1,k)), and e(k − (T1,0 + T̂1,k + ΔT̃1,k)), can be rewritten as

Je(e(k)) = l(e(k)) + τeᵀ(e(k))S1τe(e(k)) + 2(1 + η(k))τeᵀ(e(k))S2Θ*(k)τe(e(k))
         + (1 + η(k))²‖Θ(k)‖²τeᵀ(e(k))S3τe(e(k))
         + Λeᵀ(e(k))P0Λe(e(k)) + Je(e(k + 1))   (46)

where l(e(k)) = eᵀ(k)P1e(k) + 2eᵀ(k)P2e(k − (T1,0 + ΔT1,k)) + eᵀ(k − (T1,0 + ΔT1,k))P3e(k − (T1,0 + ΔT1,k)), with l(e(k)) > 0, and P0, Si, and Pi (i = 1, 2, 3) positive definite.

Noticing that the control input of the error dynamics is admissible, the finiteness of (46) can be obtained. By solving the partial differential equation ∂Je(e(k))/∂τe(e(k)) = 0, one has

τe*(e) = −(1/2)H2⁻¹(k)g0ᵀ(k)∂Je*(e(k + 1))/∂e(k + 1)   (47)

where H2(k) = S1 + 2(1 + η(k))Θ*ᵀ(k)S2 + (1 + η(k))²‖Θ*(k)‖²S3.

Based on (44) and (47), τ*(k) is designed as

τ*(k) = τd(ξ(k), ξd(k)) − (1/2)H2⁻¹(k)g0ᵀ(k)∂Je*(e(k + 1))/∂e(k + 1).   (48)

The desired control policy τd(ξ(k), ξd(k)) and the optimal feedback term are contained in (48). However, the optimal control input is a function of the internal dynamics fe(e(k)) and the transformed control coefficient matrix g(k), and we cannot get enough knowledge of fe(e(k)), Je(e(k)), and ∂Je(e(k + 1))/∂e(k + 1) from (10).
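With a known invertible g0, the desired feedforward (43) reduces to a single linear solve. A sketch with illustrative numeric stand-ins for g0, f(ξd), and Λd:

```python
import numpy as np

def desired_feedforward(xi_d_next, f_xi_d, Lam_d, g0):
    """Desired control input of (43):
    tau_d = g0^{-1} (xi_d(k+1) - f(xi_d(k)) + Lam_d(k)).
    Solving the linear system avoids forming g0^{-1} explicitly."""
    return np.linalg.solve(g0, xi_d_next - f_xi_d + Lam_d)

g0 = np.array([[0.5, 0.0],
               [0.0, 0.4]])              # invertible input matrix (assumed)
xi_d_next = np.array([1.0, 0.0])         # desired next state
f_xi_d = np.array([0.2, 0.1])            # stand-in internal dynamics at xi_d
Lam_d = np.zeros(2)                      # stand-in desired resistance term
tau_d = desired_feedforward(xi_d_next, f_xi_d, Lam_d, g0)
```

Subtracting this feedforward from the applied control gives the error input τe of (45); the feedback part of (48) is then learned online rather than computed in closed form.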
To deal with this problem, an online approximation is used to achieve nonlinear optimal tracking control. In detail, the following three steps are proposed: 1) by approximating (46), an NN is employed to approximate the DTHJB function, which evaluates the performance of the error system (42); 2) a near-optimal control input is designed by transforming the system into the feedback form (48); and 3) by approximating f(ξd(k)) in (43), which is the desired internal dynamics of the WMR system, the feedforward control input is designed as in (48).

Furthermore, with the approximation of OLAs [5], the cost function (46), the feedback control policy (47), and the desired internal dynamics f(ξd(k)) can be estimated as

Je(k) = ωc,eᵀϕc,e(k) + σc,e(k)   (49)
τe(k) = ωτ,eᵀϕτ,e(k) + στ,e(k)   (50)

and

f(ξd(k)) = ωdᵀϕd(ξd(k)) + σd(ξd(k))   (51)

where ωc,e, ωτ,e, and ωd are bounded constant target OLA update parameters, ‖ωc,e‖ ≤ ω̄c,e, ‖ωτ,e‖ ≤ ω̄τ,e, and ‖ωd‖ ≤ ω̄d; ϕc,e(e(k)), ϕτ,e(e(k)), and ϕd(ξd(k)) are the activation functions. The approximation errors are bounded as ‖σc,e‖ ≤ σ̄c,e, ‖στ,e‖ ≤ σ̄τ,e, and ‖σd‖ ≤ σ̄d. Furthermore, the boundedness of the basis activation functions and of the gradient of the cost function basis vector can be obtained according to ‖ϕc,e(·)‖ ≤ ϕ̄c,e, ‖ϕτ,e(·)‖ ≤ ϕ̄τ,e, ‖ϕd(·)‖ ≤ ϕ̄d, ‖∂σc,e(e(k + 1))/∂e(k + 1)‖ ≤ σ̄c,e′, and ‖∂ϕc,e(e(k + 1))/∂e(k + 1)‖ ≤ ϕ̄c,e′.

A. Cost Function

With the control objective to stabilize the tracking error system, an optimal tracking control is designed to minimize the cost function. By using the approximation of the OLA, (46) can be approximated as

Ĵe(e(k)) = ω̂c,eᵀϕc,e(e(k))   (52)

where ω̂c,e is the estimation of the ideal OLA parameters ωc,e, and the activation vector ϕc,e(e(k)) satisfies ϕc,e(0) = 0, which further results in Ĵe(0) = 0.

Similar to (21), the cost-to-go error Ec,e(k) is used in the estimation of the cost function. The cost-to-go error is defined as in (21), but with the variables ξ(k), τ(k), and ω̂cᵀϕc(k + 1) replaced by e(k), τe(e(k)), and ω̂c,eᵀϕc,e(e(k + 1)), respectively. Then, the OLA parameter update law can be defined according to the vector of the cost-to-go error for tracking as

ω̂c,e(k + 1) = ω̂c,e(k) − αc,e ΔΦe(k)Ec,eᵀ(k)/‖ΔΦe(k)ΔΦeᵀ(k) + I‖F   (53)

where 0 < αc,e < 1 denotes the learning rate of the OLA NN. Then, we can further obtain the OLA parameter estimation error dynamics for tracking as

ω̃c,e(k + 1) = (I − αc,e ΔΦe(k)ΔΦeᵀ(k)/‖ΔΦe(k)ΔΦeᵀ(k) + I‖F)ω̃c,e(k)
            − αc,e ΔΦe(k)ΔΣc,eᵀ(k)/‖ΔΦe(k)ΔΦeᵀ(k) + I‖F.   (54)

B. Optimal Feedback Control

In this section, an NN is introduced to generate the optimal feedback control signal, which can minimize the approximated cost function (52). The OLA approximation of (50) is designed as

τ̂e(e(k)) = ω̂τ,eᵀϕτ,e(e(k))   (55)

where ω̂τ,e denotes the OLA estimate of the target value ωτ,e, and the activation function ϕτ,e(e(k)) is designed to satisfy ϕτ,e(0) = 0, which further results in τ̂e(0) = 0. Above all, the admissibility of the control can be guaranteed.

Similar to (26), the tracking error between the feedback control and the ideal control policy can be defined as τ̃e(e(k)). Then, the control OLA parameter update law can be defined as

ω̂τ,e(k + 1) = ω̂τ,e(k) − ατ,e ϕτ,e(k)τ̃eᵀ(k)/(1 + ϕτ,eᵀ(k)ϕτ,e(k))   (56)

where 0 < ατ,e < 1 denotes the learning rate of the OLA NN. With the definition ω̃τ,e = ω̂τ,e − ωτ,e and (31), we can further obtain the parameter estimation error dynamics ω̃τ,e(k + 1) as

ω̃τ,e(k + 1) = (I − ατ,e ϕτ,e(k)ϕτ,eᵀ(k)/(1 + ϕτ,eᵀ(k)ϕτ,e(k)))ω̃τ,e(k)
            − ατ,e ϕτ,e(k)/(1 + ϕτ,eᵀ(k)ϕτ,e(k)) × (σ̃τ,e(k) + (1/2)H2⁻¹(k)g0ᵀ(k)(∂ϕc,e(ξ(k + 1))/∂ξ(k + 1))ᵀω̃c,e).   (57)

C. Uncertain Resistance

By using the universal approximation property of the NN, the uncertain resistance error for tracking Λe(k) is approximated as

Λ̂e(e(k)) = ω̂Λ,eᵀϕΛ,e(e(k))   (58)

where ωΛ,e denotes the constant update weights and ϕΛ,e(·) denotes the resistance NN activation function, which is selected to satisfy ϕΛ,e(0) = 0 so that Λ̂e(0) = 0, in order to stabilize the tracking error system (44).

Similar to (37),

ω̂Λ,e(k + 1) = ω̂Λ,e(k) − αΛ,e ϕΛ,e(k)Λ̃eᵀ(k)/(1 + ϕΛ,eᵀ(k)ϕΛ,e(k))   (59)

where the positive constant αΛ,e denotes the learning rate of the resistance NN. Based on (40) and the definition of the
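The critic updates (53) and (54) differ from the scalar-normalized laws (56) and (59) in that they normalize by the Frobenius norm of ΔΦeΔΦeᵀ + I. A sketch of one such step, with placeholder basis-difference and residual matrices:

```python
import numpy as np

def critic_update(w_hat, Phi, E, alpha):
    """Tracking-critic step of the form (53):
    w(k+1) = w(k) - alpha * Phi E^T / ||Phi Phi^T + I||_F."""
    denom = np.linalg.norm(Phi @ Phi.T + np.eye(Phi.shape[0]), "fro")
    return w_hat - alpha * (Phi @ E.T) / denom

Phi = np.array([[0.1, 0.2],
                [0.0, 0.1],
                [0.3, 0.0]])     # placeholder: 3 weights x (j+1) stored samples
E = np.array([[0.5, -0.2]])      # placeholder cost-to-go residuals
w = critic_update(np.zeros((3, 1)), Phi, E, alpha=0.1)
```

Stacking j + 1 past samples in ΔΦe and Ec,e is what lets one update exploit a short history window rather than a single transition.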
resistance NN approximate error as ω̃ = ω − ω̂ , the Considering the OLA reconstruction errors, the following
resistance estimate

error is defined as theorem is presented to analyze the convergence of the track-
α,e ϕ,e (k) ing error, online identifier, cost function, feedback control
ω̃Δ,e(k + 1) = [I − αΔ,e ϕΔ,e(k)ϕΔ,eᵀ(k)/(1 + ϕΔ,eᵀ(k)ϕΔ,e(k))] ω̃Δ,e(k)
            − [αΔ,e ϕΔ,e(k)/(1 + ϕΔ,eᵀ(k)ϕΔ,e(k))] × [σ̃Δ,e(k) − (1/2) P0⁻¹ (∂ϕc,e(ξ(k + 1))/∂ξ(k + 1))ᵀ ω̃c,e].   (60)

In the next section, the near-optimal control strategy is designed and the stability of the tracking algorithm is analyzed.

D. Near-Optimal Control Policy

Considering the near-optimal tracking control input (48), which contains the optimal feedback part and the ideal predetermined feedforward with unknown internal dynamics, the desired internal dynamics can be re-estimated online through the desired state ξd(k)

f(ξd(k)) ≡ f(ξ(k))|ξ(k)=ξd(k).   (61)

Based on (51), (58), and (61), (10) can be rewritten as

ξ(k + 1) = Σ_{i=1}^{3} f̃i(·) + ωdᵀ ϕd(ξ(k)) + σd(ξ(k)) + Δ(k) + g0(k) τ(ξd(k), ξ(k), e(k)).   (62)

Then, the online identifier can be designed as

ξ̂(k + 1) = g0(k) τ(ξd(k), ξ(k), e(k)) + Σ_{i=1}^{3} f̃i(·) + ω̂dᵀ ϕd(ξ(k)) + Δ(k) − Kd ξ̃(k)   (63)

where ξ̃(k) = ξ(k) − ξ̂(k) denotes the identifier error and the positive constant coefficient Kd is bounded by 0 < Kd < 1. Considering (62) and (63), one can obtain

ξ̃(k + 1) = ω̃dᵀ ϕd(ξ(k)) + σd(ξ(k)) + Kd ξ̃(k).   (64)

The parameter update law is defined as

ω̃d(k + 1) = ω̃d(k) − αd ϕd(k)[ω̃dᵀ ϕd(k) + σd(k)]ᵀ.   (65)

Based on (43), (51), and the approximation property of the OLA, the estimate of the feedforward control can be rewritten as

τ̂d(ξd, ξ) = g0⁻¹(k)[ξd(k + 1) − ω̂dᵀ ϕd(k) − ω̂Δᵀ ϕΔ(k)].   (66)

The estimates of the control input (48) and the resistance (35) can then be written as

τ(ξ(k), ξd(k), e(k)) = τ̂d(ξ(k), ξd(k)) + τ̂e(e(k))   (67)
Δ(ξ(k), ξd(k), e(k)) = Δ̂d(ξ(k), ξd(k)) + Δ̂e(e(k)).   (68)

Then, based on (50) and (51), (45) can be written as

e(k + 1) = Σ_{i=1}^{3} f̃i,e(·) + fe(e(k)) + g0(k) τe(e(k)) + Δe(k)
         + ω̃dᵀ(k) ϕd(ξd(k)) + ω̃Δ,e(k) ϕΔ,e(e(k)) + σe(k) − g0(k) ω̃τ,e(k) ϕτ,e(e(k))   (69)

where σe(k) = σd(k) + σΔ(k) − g0(k) στ,e(e), and ω̃τ,e(k), ω̃Δ,e(k), and ω̃d(k) are the feedback control signal, uncertain resistance, and feedforward control signal OLA parameter estimation errors.

Theorem 2 (System Stability): Let τe,0(k) be any initial admissible control for the nonlinear system (44) such that (71) is initially asymptotically stable. With the help of NNs, the cost function, feedforward, and uncertain resistance OLAs are tuned by (54), (57), and (60), and the parameter update law for the feedforward estimator is designed as (65). In addition, there exist several positive constants Kd, αc,e, ατ,e, αΔ,e, and αd that guarantee the UUB of (54), (57), (60), (64), and (65) for k ≥ k0 + T, and the estimation error of the optimal control input is bounded, ‖τ(ξ, ξd, e) − τ*(ξ, ξd, e)‖ ≤ σ0, where σ0 is a sufficiently small positive constant.

Proof: Detailed in Appendix B.

V. EXAMPLE OF THE SIMULATION

In this section, a WMR system that takes the interaction between the wheel and the ground into consideration, as shown in Fig. 1, is introduced to verify the effectiveness of the proposed tracking control algorithm. Then, one has

M ξ̇ + A ξ + G = B(τ − T_De) − F_R   (70)

where

M = [m cos β, 0; 0, I],  A = [−m β̇ sin β, m d2 φ̇; 0, 0]
ξ = [v; ω],  F_R = [f_DP; τ_R],  G = [m g sin θ cos ψ; 0]
B = (1/rs)[1, 1; −d1, d2],  τ = [τ1; τ2]

and

T_De = [ (R_C01 + k_s^{RC1} s1 + A_s^{RC1} cos(n_L ϑ1 − ξ_01^s)) F_N ;
         (R_C02 + k_s^{RC2} s2 + A_s^{RC2} cos(n_L ϑ2 − ξ_02^s)) F_N ].

Noticing the fact that the inertia matrix M is symmetric, (70) can be rewritten in state-space form

ξ̇ = ϒ ξ + Φ τ + Λ   (71)

where ϒ = −M⁻¹A, Φ = M⁻¹B, and Λ = M⁻¹(B T_De + F_R − G).

Taking the control input and state time delays into account, one has

ξ̇ = ϒ(t) ξ(t − μ1) + Φ(t) τ(t − μ2) + Λ(t)   (72)

where μ1(t) and μ2(t) are the state and the input time delays. Then, using the first-order Taylor expansion, one has

ξ(k + 1) = f(ξ(k − T1)) + g(k) τ(k − T2) + Δ(k)   (73)

where f(ξ(k − T1)) = (1 + Tϒ(k)) ξ(k − T1), g(k) = TΦ is a constant function, and Ti = μi/T = Ti,0 + Ti,k, with T = 0.001, Ti,0 = μi,0/T, Ti,k = μi,k/T, and i = 1, 2. Additionally, Δ(k) = TΛ(k) denotes the function of the equivalent resistance torque.
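The reduction from the continuous model (70)–(72) to the DT model (73) is a first-order Euler step plus delay bookkeeping. The sketch below illustrates it numerically; all plant parameters, the constant input, and the delay steps are hypothetical placeholders rather than the paper's identified WMR values, and the lumped resistance term is dropped for brevity.

```python
import numpy as np

# Illustrative sketch of (70)-(73): form Upsilon = -M^{-1} A and
# Phi = M^{-1} B, then iterate the Euler-discretized delayed model
#   xi(k+1) = (1 + T*Upsilon) xi(k - T1) + T*Phi tau(k - T2).
# All numeric values are hypothetical placeholders.
m, I_z, beta, beta_dot, d2, phi_dot = 10.0, 2.0, 0.1, 0.05, 0.3, 0.2
M = np.array([[m * np.cos(beta), 0.0],
              [0.0, I_z]])                        # inertia matrix in (70)
A = np.array([[-m * beta_dot * np.sin(beta), m * d2 * phi_dot],
              [0.0, 0.0]])
r_s, d1 = 0.15, 0.25
B = (1.0 / r_s) * np.array([[1.0, 1.0],
                            [-d1, d2]])

Upsilon = -np.linalg.solve(M, A)                  # cf. (71)
Phi = np.linalg.solve(M, B)

T = 0.001                                         # sampling period
T1, T2 = 18, 12                                   # delay steps (18T and 12T)
N = 1000
xi = np.zeros((N + 1, 2))                         # state history = delay buffer
tau_hist = np.zeros((N, 2))
for k in range(N):
    tau_hist[k] = np.array([0.5, 0.5])            # placeholder constant input
    xd = xi[max(k - T1, 0)]                       # delayed state  xi(k - T1)
    ud = tau_hist[max(k - T2, 0)]                 # delayed input  tau(k - T2)
    xi[k + 1] = (np.eye(2) + T * Upsilon) @ xd + T * Phi @ ud   # cf. (73)
```

Storing the full state and input histories and indexing them at k − T1 and k − T2 is the simplest way to realize the delayed DT map; a fixed-length ring buffer would serve equally well.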

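The online identifier (63) and the normalized-gradient update law (65) can be illustrated on a scalar toy problem. The basis functions, "true" weights, and gains below are hypothetical, and the control, delay, and resistance terms of (63) are omitted, so this is only a sketch of the identification mechanism, not the paper's design.

```python
import numpy as np

# Scalar toy sketch of the identifier (63)/(64) and update law (65):
# learn f(x) = w_true^T phi(x) online while the identification error
# x - x_hat is filtered with gain 0 < Kd < 1. Basis and weights are
# hypothetical stand-ins, not the paper's WMR dynamics.
phi = lambda x: np.array([1.0, x, np.tanh(x)])       # hypothetical basis phi_d
w_true = np.array([0.3, -0.5, 0.8])                  # "unknown" plant weights
w_hat = np.zeros(3)
alpha_d, Kd = 0.5, 0.2                               # gains, cf. (65) and (63)
x, x_hat = 0.1, 0.0
for k in range(200):
    x_next = w_true @ phi(x)                         # true plant step
    x_hat_next = w_hat @ phi(x) - Kd * (x - x_hat)   # identifier, cf. (63)
    err = x_next - w_hat @ phi(x)                    # a priori prediction error
    # normalized gradient step, cf. (65)
    w_hat = w_hat + alpha_d * phi(x) * err / (1.0 + phi(x) @ phi(x))
    x, x_hat = x_next, x_hat_next
```

Because this toy plant settles to a fixed point, only the prediction and identification errors are driven to zero; the weight estimates converge to the true values only under persistent excitation.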
Fig. 1. Force exerted on the wheeled mobile robot.

Fig. 2. Desired (solid line) and actual (dashed line) states.

Fig. 3. Desired and actual position of the wheeled mobile robot.

Fig. 4. Controllers τ1 and τ2.
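As a plain illustration of the tracking task studied in this section — and explicitly not of the ADP controller itself — the sketch below tracks the desired states vd(k) = 1.2 and ωd(k) = 1.2 cos(kT) with a placeholder proportional law on a single-integrator stand-in plant; the gain and plant model are hypothetical.

```python
import numpy as np

# Stand-in illustration of the Section V tracking task: the desired
# states are vd(k) = 1.2 and wd(k) = 1.2*cos(kT). The controller is a
# placeholder proportional law on a single-integrator plant -- it is
# NOT the paper's ADP design; it only exhibits the error signal e(k).
T = 0.001
N = 5000
xi = np.zeros(2)                                  # state [v, w]
err_norm = []
for k in range(N):
    xi_d = np.array([1.2, 1.2 * np.cos(k * T)])   # desired state
    e = xi_d - xi                                 # tracking error e(k)
    tau = 50.0 * e                                # placeholder gain
    xi = xi + T * tau                             # stand-in plant step
    err_norm.append(float(np.linalg.norm(e)))
```

Even this crude law drives the error norm from its initial value down to a small residual set; the controller proposed in the paper additionally handles the delays, the uncertain resistance, and near-optimality.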

Remark 5: As a classic engineering system, the WMR is always subject to complex time delays, such as the time lag caused by information transmission, complex slip-rate calculation, or network hysteresis caused by uncertain interference, which are important obstacles to the fine control of wheeled mobile robots.

Then, based on the proposed tracking control framework, an ADP-based online tracking controller is applied to realize a high-precision tracking task for (73).

Fig. 5. 2-norm of the adaptive laws for the action NNs.

Take T = 0.001, T̂1 = 18T, and T̂2 = 12T. The desired states are defined as vd(k) = 1.2 and ωd(k) = 1.2 cos(kT). The design parameters are selected as αc,e = [0.21, 0.43]ᵀ, ατ,e = [0.23, 0.18]ᵀ, and αd = [0.31, 0.17]ᵀ; in addition, θ(k) = kT. The initial states and the initial adaptive laws are defined as ξ(0) = [0, 0]ᵀ, ω̂c,e(0) = [0.02, 0.02]ᵀ, ω̂τ,e(0) = [0.02, 0.04]ᵀ, ω̂Δ,e(0) = [0.02, 0.02]ᵀ, and ω̂d(0) = [0.02, 0.02]ᵀ. The node numbers of the hidden layers in the NNs correspond to nc,1 = 23, nc,2 = 30, na,1 = 27, na,2 = 25, nd,1 = 27, and nd,2 = 25.

The tracking trajectories of v(k) and ω(k) of the WMR are shown in Fig. 2, and the small trajectory errors indicate the convergence of the tracking errors to a small compact set around zero. Furthermore, the good tracking performance between the actual and the desired positions of the WMR system can be observed in Fig. 3. The trajectories of the near-optimal motor control inputs of the WMR are shown in Fig. 4. In addition, the 2-norms of the adaptive laws of the critic and action NNs are shown in Fig. 5. The UUB of all the signals in the partially uncertain time-delayed WMR system can be concluded from these trajectories, and the tracking errors converge to a small compact set around zero. For a comparative simulation with respect to the time delay, please refer to [8].

VI. CONCLUSION

An ADP-based adaptive control algorithm is proposed to solve the tracking problem of the WMR system with partially uncertain state and input time delays. Based on the assumptions and lemmas, a series of delay matrix functions and a novel Lyapunov–Krasovskii functional are introduced to solve the time-delay problem. The adaptive laws of the critic, action, and uncertain resistance NNs are defined through the standard gradient-based adaptation method. Above all, the UUB of

all signals in the WMR system is ensured, and the tracking error converges to a small compact set. The numerical simulation results validated the proposed control algorithm.

APPENDIX A
PROOF OF THEOREM 1

Considering the Lyapunov stability analysis theorem, we can define the Lyapunov function candidate as

L = [αJ ατ αΔ/(16(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))] L1(k) + [αJ αΔ ρ̄²ḡ²/(2ατ(1 + ϕΔᵀϕΔ))] L2(k)
  + [αJ ατ/(2αΔ(1 + ϕτᵀϕτ))] L3(k) + (1/αJ) L4(k)
  + [αJ ατ αΔ/(2(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))] Σ_{i=5}^{7} Li(k) + L0(k)   (74)

where

L1 = ξᵀξ,  L2 = tr{ω̃τᵀ ω̃τ},  L3 = tr{ω̃Δᵀ ω̃Δ},  L4 = ω̃cᵀ ω̃c
L5 = Σ_{j=k−N1T}^{k−T} f̃1ᵀ(j) f̃1(j),  L6 = Σ_{j=k−T0−N2T}^{k−T0−T} f̃2ᵀ(j) f̃2(j)
L7 = Σ_{j=k−(T1,0+T̂1,k)−N3T}^{k−(T1,0+T̂1,k)−T} f̃3ᵀ(j) f̃3(j)

with T chosen as the smallest common divisor of T1,0, T̂1,k, and |T1,k|; hence T1,0 = N1T, T̂1,k = N2T, and ‖T1,k‖ = N3T.

According to Remark 2 and Lemma 2, the asymptotic stability of the closed-loop system h(·) = f(ξ(k)) + Δ*(ξ(k)) + (1 + η(k)) g(k) Π(k) τ*(ξ(k)) can be guaranteed on a compact set. Furthermore, the optimal closed-loop system is upper bounded on a compact set according to ‖h(·)‖² ≤ l‖ξ(k)‖², where l is a positive constant.

Considering (41) and L1, and applying the Cauchy–Schwartz inequality, one can obtain the first difference as

ΔL1 ≤ −(1 − 4l)‖ξ(k)‖²
   + 12[ Σ_{j=k−(T1,0+T̂1,k)−N2T}^{k−(T1,0+T̂1,k)−T} f̃3ᵀ(j) f̃3(j) + Σ_{j=k−T1,0−N2T}^{k−T1,0−T} f̃2ᵀ(j) f̃2(j) + Σ_{j=k−N1T}^{k−T} f̃1ᵀ(j) f̃1(j) ]
   + 8‖ςΔ(k)‖² + 8ρ̄²ḡ²‖ςτ(k)‖² + 8ρ̄²ḡ²‖στ(k)‖² + 8‖σΔ(k)‖²   (75)

where ςΔ(k) = ω̃Δᵀ(k) ϕΔ(ξ(k)) and ςτ(k) = ρ̄ ḡ ω̃τᵀ(k) ϕτ(ξ(k)), with ρ̄ = max[(1 + η(k)) Π(k)].

Considering the parameter update law of the feedback control signal (31), and the fact that 0 < ϕτᵀϕτ/(ϕτᵀϕτ + 1) < 1 and 0 < ατ < 1, the first difference of L2 can be obtained as

ΔL2(k) ≤ −[ατ(5 − 3ατ)/(2(ϕτᵀ(k)ϕτ(k) + 1))]‖ςτ(k)‖² + [2ατ(2 − ατ)/(ϕτᵀ(k)ϕτ(k) + 1)]‖σ̃τ(k)‖²
      + (1/2)[ατ(2 − ατ)/(ϕτᵀ(k)ϕτ(k) + 1)] λ²_{H,max} ϕ̄c² ‖ω̃c(k)‖².   (76)

Considering the parameter update law of the uncertain resistance (40), and the fact that 0 < ϕΔᵀϕΔ/(ϕΔᵀϕΔ + 1) < 1 and 0 < αΔ < 1, the first difference of L3 can be obtained as

ΔL3(k) ≤ −[αΔ(3 − αΔ)/(2(ϕΔᵀ(k)ϕΔ(k) + 1))]‖ςΔ(k)‖² + (1/2)[αΔ(2 − αΔ)/(ϕΔᵀϕΔ + 1)] λ²_{Q0⁻¹,max} ϕ̄c² ‖ω̃c(k)‖²
      + [2αΔ(2 − αΔ)/(ϕΔᵀ(k)ϕΔ(k) + 1)]‖σ̃Δ(k)‖².   (77)

Considering the parameter update law of the uncertain resistance (40), the tracking error in (24), and the fact that 0 < αJ < 1 and 0 < Π̄ ≤ ΠᵀΠ/(ΠᵀΠ + I_F) < 1, the first difference of L4 can be obtained as

ΔL4 ≤ −αJ Π̄ ‖ω̃c(k)‖² + αJ Π̄²(k).   (78)

Based on [31] and Assumption 1, the first differences of L5, L6, and L7 are obtained as follows:

ΔL5 = f̃1ᵀ(k − T) f̃1(k − T) − f̃1ᵀ(k − N1T) f̃1(k − N1T)   (79)
ΔL6 = f̃2ᵀ(k − (N1 + 1)T) f̃2(k − (N1 + 1)T) − f̃2ᵀ(k − (N1 + N2)T) f̃2(k − (N1 + N2)T)   (80)
ΔL7 = f̃3ᵀ(k − (N1 + N2 + 1)T) f̃3(k − (N1 + N2 + 1)T) − f̃3ᵀ(k − (N1 + N2 + N3)T) f̃3(k − (N1 + N2 + N3)T).   (81)

Based on (5) and (6), the first difference of the Lyapunov functions (75)–(81), and Assumptions 3–5, one has

ΔL(k) ≤ −c4‖ξ̄(k)‖² − [αJ ατ αΔ(1 − 4l)/(16(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))]‖ξ(k)‖²
     − [5(1 − ατ) αJ αΔ ρ̄²ḡ²/(4(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))]‖ςτ(k)‖²
     − [3(1 − αΔ) αJ ατ/(4(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))]‖ςΔ(k)‖²
     − [Π̄ − αJ ϕ̄c² D0/((1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))]‖ω̃c(k)‖²
     + [3αJ ατ αΔ/(4(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ))]
       × [ Σ_{j=k+1−N1T}^{k+1−T} f̃1ᵀ(j) f̃1(j) + Σ_{j=k+1−(T1,0+T̂1,k)−N3T}^{k+1−(T1,0+T̂1,k)−T} f̃3ᵀ(j) f̃3(j) + Σ_{j=k+1−T1,0−N2T}^{k+1−T1,0−T} f̃2ᵀ(j) f̃2(j) ] + D²M   (82)

where

D0 = αΔ(2 − ατ) ρ̄²ḡ² λ²_{H,max} + ατ(2 − αΔ) λ²_{Q0⁻¹,max}
D²M = αJ[ατ(2 − αΔ) + αΔ ρ̄²ḡ²(2 − ατ)] σ̃̄²/((1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ)) + αJ Π̄²
    + αJ ατ αΔ(ρ̄²ḡ² σ̄τ² + σ̄Δ²)/(2(1 + ϕτᵀϕτ)(1 + ϕΔᵀϕΔ)).

Based on Assumption 1 and ψiᵀ(ξκi) ψi(ξκi) ≤ η²1,i, one has

‖h2(k)‖² ≤ η²1,max ‖ξ̄(k)‖².   (83)

Similarly, we can also obtain

‖h3(k)‖² ≤ η²2,max ‖ξ̄(k)‖²;  ‖h4(k)‖² ≤ η²3,max ‖ξ̄(k)‖²   (84)

where ηi,max = max{ηi,0, ηi,1, ..., ηi,Ni}, with i = 1, 2, 3.

Then, (82) can be represented as

ΔL(k) ≤ −[c4 − (3αJ ατ αΔ/(4(1 + ϕ̄τ²)(1 + ϕ̄Δ²))) Σ_{i=1}^{3} η²i,max]‖ξ̄(k)‖²
     − [αJ ατ αΔ(1 − 4l)/(16(1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖ξ(k)‖²
     − [5(1 − ατ) αJ αΔ ρ̄²ḡ²/(4(1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖ςτ(k)‖²
     − [3(1 − αΔ) αJ ατ/(4(1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖ςΔ(k)‖²
     − [Π̄ − αJ ϕ̄c² D0/((1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖ω̃c(k)‖² + D²M

with the tuning gains selected as 0 < αJ < 1, 0 < ατ < 1, and 0 < αΔ < 1 for the target nonlinear affine system that satisfies the boundedness of the optimal closed-loop system with 0 < l < 1/4. Then, ΔL < 0 holds when one of the following conditions is satisfied:

‖ξ̄(k)‖ > √[ (1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(c4(1 + ϕ̄τ²)(1 + ϕ̄Δ²) − (3/4)αJ ατ αΔ Σ_{i=1}^{3} η²i,max) ] D̄M   (85)
or ‖ξ(k)‖ > √[16(1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(αJ ατ αΔ(1 − 4l))] D̄M   (86)
or ‖ςτ(k)‖ > √[4(1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(5(1 − ατ) αJ αΔ ρ̄²ḡ²)] D̄M   (87)
or ‖ςΔ(k)‖ > √[4(1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(3(1 − αΔ) αJ ατ)] D̄M   (88)
or ‖ω̃c(k)‖ > √[ (1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(Π̄(1 + ϕ̄τ²)(1 + ϕ̄Δ²) − αJ ϕ̄c² D0) ] D̄M.   (89)

Based on the Lyapunov extension theorem [33], it is concluded that the states and the NN weight estimation errors for the cost function, the control input, and the uncertain resistance are UUB. Furthermore, the boundedness of the adaptive weight approximations can be ensured, and the boundedness of ω̂c(k), ω̂τ(k), ω̂Δ(k), and f̃i(ξ(k − jT)), i = 1, 2, 3, j = 1, 2, ..., Ni, is obtained.

The proof of Theorem 1 is complete.

APPENDIX B
PROOF OF THEOREM 2

Choosing the tracking Lyapunov function candidate as

LT = [αJ,e ατ,e αΔ,e αd/(24(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))] LT,1(k) + [αJ,e αΔ,e αd ρ̄²ḡ²/(2ατ,e(1 + ϕ̄Δ,e²))] LT,2(k)
   + [αJ,e ατ,e αd/(2αΔ,e(1 + ϕ̄τ,e²))] LT,3(k) + (1/αJ,e) LT,4(k)
   + [αJ,e ατ,e αΔ,e αd/(2(1 + ϕ̄τ²)(1 + ϕ̄Δ²))] Σ_{i=5}^{8} LT,i(k)   (90)

where

LT,1 = eᵀe,  LT,2 = tr{ω̃τ,eᵀ ω̃τ,e},  LT,3 = tr{ω̃Δ,eᵀ ω̃Δ,e}
LT,4 = ω̃c,eᵀ ω̃c,e,  LT,5 = ω̃dᵀ ω̃d + (αd²/6) ξ̃ᵀ ξ̃
LT,6 = Σ_{j=k−N1T}^{k−T} f̃1,eᵀ(j) f̃1,e(j)
LT,7 = Σ_{j=k−T0−N2T}^{k−T0−T} f̃2,eᵀ(j) f̃2,e(j)
LT,8 = Σ_{j=k−(T1,0+T̂1,k)−N3T}^{k−(T1,0+T̂1,k)−T} f̃3,eᵀ(j) f̃3,e(j).

Similar to (75)–(81), one has

ΔLT,1 ≤ −(1 − 4le)‖e‖²
     + 12[ Σ_{j=k−(T1,0+T̂1,k)−N2T}^{k−(T1,0+T̂1,k)−T} f̃3,eᵀ(j) f̃3,e(j) + Σ_{j=k−T1,0−N2T}^{k−T1,0−T} f̃2,eᵀ(j) f̃2,e(j) + Σ_{j=k−N1T}^{k−T} f̃1,eᵀ(j) f̃1,e(j) ]
     + 12(‖ςΔ,e(k)‖² + ρ̄²ḡ²‖ςτ,e(k)‖² + ‖ςd(k)‖²) + 4σ̄e²   (91)

where ςΔ,e(·) = ω̃Δ,eᵀ(k) ϕΔ,e(e(k)) and ςτ,e(·) = ω̃τ,eᵀ(k) ϕτ,e(e(k))

ΔLT,2 ≤ −[ατ,e(3 − ατ,e)/(2(ϕτ,eᵀ(k)ϕτ,e(k) + 1))]‖ςτ,e(k)‖² + [2ατ,e(2 + 3ατ,e)/(ϕτ,eᵀ(k)ϕτ,e(k) + 1)]‖σ̃τ,e(k)‖²
      + (1/2)[ατ,e(2 + 3ατ,e)/(ϕτ,eᵀ(k)ϕτ,e(k) + 1)] λ²_{H,max} ϕ̄c² ‖ω̃c(k)‖²   (92)

ΔLT,3(k) ≤ −[αΔ,e(3 − αΔ,e)/(2(ϕΔ,eᵀ(k)ϕΔ,e(k) + 1))]‖ςΔ,e(k)‖² + (1/2)[αΔ,e(2 − αΔ,e) λ²_{Q0⁻¹,max} ϕ̄c,e²/(ϕΔ,eᵀ(k)ϕΔ,e(k) + 1)]‖ω̃c,e(k)‖²
       + [2αΔ,e(2 − αΔ,e)/(ϕΔ,eᵀ(k)ϕΔ,e(k) + 1)]‖σ̃Δ,e(k)‖²   (93)

ΔLT,4 ≤ −αJ,e Π̄e ‖ω̃c,e(k)‖² + αJ,e Π̄e²(k).   (94)

Based on (64) and (65)

ΔLT,5 ≤ −(αd²/6)(1 − 3Kd²)‖ξ̃(k)‖² − αd(1 − (3/2)αd)‖ςd(k)‖² + αd(1 + (5/2)αd) σ̄d²   (95)
ΔLT,6 = f̃1,eᵀ(k − T) f̃1,e(k − T) − f̃1,eᵀ(k − N1T) f̃1,e(k − N1T)   (96)
ΔLT,7 = f̃2,eᵀ(k − (N1 + 1)T) f̃2,e(k − (N1 + 1)T) − f̃2,eᵀ(k − (N1 + N2)T) f̃2,e(k − (N1 + N2)T)   (97)
ΔLT,8 = f̃3,eᵀ(k − (N1 + N2 + 1)T) f̃3,e(k − (N1 + N2 + 1)T) − f̃3,eᵀ(k − (N1 + N2 + N3)T) f̃3,e(k − (N1 + N2 + N3)T).   (98)

Similar to the proof of Theorem 1, one has

‖h2,e‖² ≤ η′²2,max ‖ē‖²;  ‖h3,e‖² ≤ η′²3,max ‖ē‖²;  ‖h4,e‖² ≤ η′²4,max ‖ē‖²   (99)

where η′i,max = max{η′i,0, η′i,1, ..., η′i,Ni}, with i = 1, 2, 3.

Based on (5) and (6), the first difference of the Lyapunov functions (75)–(81), and Assumptions 3–5, one has

ΔL(k) ≤ −[c4 − (3αJ,e ατ,e αΔ,e αd/(4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))) Σ_{i=1}^{3} η′²i,max]‖ē(k)‖²
     − [αJ,e ατ,e αΔ,e αd(1 − 4le)/(24(1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖e(k)‖²
     − [αJ,e ατ,e αΔ,e αd(1 − 3Kd²)/(12(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))]‖ξ̃(k)‖²
     − [αJ,e ατ,e αΔ,e(2 − 5αd)/(4(1 + ϕ̄τ²)(1 + ϕ̄Δ²))]‖ςd(k)‖²
     − [5(1 − ατ,e) αJ,e αΔ,e αd ρ̄²ḡ²/(4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))]‖ςτ,e(k)‖²
     − [3(1 − αΔ,e) αJ,e ατ,e αd/(4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))]‖ςΔ,e(k)‖²
     − [Π̄e − αJ,e αd ϕ̄c,e² D0,e/((1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))]‖ω̃c,e(k)‖² + D²M,e

where

D0,e = αΔ,e(2 − ατ,e) λ²_{H,max} + ατ,e(2 − αΔ,e) λ²_{Q0⁻¹,max}
D²M,e = αJ,e Π̄e² + [αJ,e ατ,e αΔ,e αd/(4(1 + ϕ̄τ,eᵀϕ̄τ,e)(1 + ϕ̄Δ,eᵀϕ̄Δ,e))](ρ̄²ḡ² σ̄τ,e² + σ̄Δ,e²/3)
     + [αJ,e/((1 + ϕ̄τ,eᵀϕ̄τ,e)(1 + ϕ̄Δ,eᵀϕ̄Δ,e))] {αd[ατ,e(2 − αΔ,e) σ̃̄Δ,e² + αΔ,e ρ̄²ḡ²(2 − ατ,e) σ̃̄τ,e²] + (ατ,e αΔ,e/4)(2 + 5αd) σ̄d²}

with the tuning gains selected as 0 < αJ,e < 1, 0 < ατ,e < 1, 0 < αΔ,e < 1, and 0 < αd < 2/5 for the target nonlinear affine system that satisfies the boundedness of the optimal closed-loop system with 0 < l < 1/4, which further implies ΔL < 0 when one of the following conditions is satisfied:

‖ē(k)‖ > √[ 1/(c4 − 3αJ,e ατ,e αΔ,e αd Σ_{i=1}^{3} η′²i,max/(4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²))) ] D̄M,e   (100)
or ‖e(k)‖ > √[16(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²)/(αJ,e ατ,e αΔ,e αd(1 − 4le))] D̄M,e   (101)
or ‖ξ̃(k)‖ > √[16(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²)/(αJ,e ατ,e αΔ,e αd(1 − 4le))] D̄M,e   (102)
or ‖ςτ,e(k)‖ > √[4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²)/(5(1 − ατ,e) αJ,e αΔ,e αd ρ̄²ḡ²)] D̄M,e   (103)
or ‖ςΔ,e(k)‖ > √[4(1 + ϕ̄τ,e²)(1 + ϕ̄Δ,e²)/(3(1 − αΔ,e) αJ,e ατ,e αd)] D̄M,e   (104)
or ‖ω̃c(k)‖ > √[4(1 + ϕ̄τ²)(1 + ϕ̄Δ²)/(4Π̄(1 + ϕ̄τ²)(1 + ϕ̄Δ²) − αJ ϕ̄c² D0)] D̄M.   (105)

Based on the Lyapunov extension theorem [5], it is concluded that the states and the NN weight estimation errors for the cost function, the control input, and the uncertain resistance are UUB. Furthermore, the boundedness of the adaptive weight approximations can be guaranteed, and the boundedness of ω̂c,e(k), ω̂τ,e(k), ω̂Δ,e(k), and f̃i,e(ξ(k − jT)), i = 1, 2, 3, j = 1, 2, ..., Ni, is obtained.

The proof of Theorem 2 is completed.

REFERENCES

[1] F. Y. Wang, H. G. Zhang, and D. R. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39–47, May 2009.
[2] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits Syst. Mag., vol. 9, no. 3, pp. 32–50, Aug. 2009.
[3] H.-G. Zhang, X. Zhang, Y.-H. Luo, and J. Yang, "An overview of research on adaptive dynamic programming," Acta Automatica Sinica, vol. 39, no. 4, pp. 303–311, Apr. 2013.
[4] T. Dierks, B. T. Thumati, and S. Jagannathan, "Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence," Neural Netw., vol. 22, nos. 5–6, pp. 851–860, Jul. 2009.
[5] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[6] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control. Cambridge, MA, USA: MIT Press, 1995, pp. 67–95.
[7] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[8] H. Zhang, C. Qin, B. Jiang, and Y. Luo, "Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706–2718, Dec. 2014.
[9] H. Modares and F. L. Lewis, "Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning," Automatica, vol. 50, no. 7, pp. 1780–1792, Jul. 2014.

[10] K. G. Vamvoudakis and F. L. Lewis, "Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878–888, May 2010.
[11] H. Liang, Y. Zhou, H. Ma, and Q. Zhou, "Adaptive distributed observer approach for cooperative containment control of nonidentical networks," IEEE Trans. Syst., Man, Cybern., Syst., vol. 49, no. 2, pp. 299–307, Feb. 2019.
[12] Y.-J. Liu, J. Li, S. C. Tong, and C. L. P. Chen, "Neural network control-based adaptive learning design for nonlinear systems with full state constraints," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 7, pp. 1562–1571, Jul. 2016.
[13] Y.-J. Liu, S. M. Lu, D. J. Li, and S. C. Tong, "Adaptive controller design-based ABLF for a class of nonlinear time-varying state constraint systems," IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 7, pp. 1546–1553, Jul. 2017.
[14] C. W. Wu, J. X. Liu, Y. Y. Xiong, and L. G. Wu, "Observer-based adaptive fault-tolerant tracking control of nonlinear nonstrict-feedback systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 7, pp. 3022–3033, Jun. 2018.
[15] T. T. Gao, Y.-J. Liu, L. Liu, and D. P. Li, "Adaptive neural network-based control for a class of nonlinear pure-feedback systems with time-varying full state constraints," IEEE/CAA J. Automatica Sinica, vol. 5, no. 5, pp. 923–933, Sep. 2018.
[16] L. Liu, Y.-J. Liu, and S. C. Tong, "Neural networks-based adaptive finite-time fault-tolerant control for a class of strict-feedback switched nonlinear systems," IEEE Trans. Cybern., to be published. doi: 10.1109/TCYB.2018.2828308.
[17] C. Wu, J. Liu, X. Jing, H. Li, and L. Wu, "Adaptive fuzzy control for nonlinear networked control systems," IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 8, pp. 2420–2430, Aug. 2017.
[18] L. Liu, Y.-J. Liu, and S. C. Tong, "Fuzzy based multi-error constraint control for switched nonlinear systems and its applications," IEEE Trans. Fuzzy Syst., to be published. doi: 10.1109/TFUZZ.2018.2882173.
[19] D. Zhai, X. Liu, and Y. J. Liu, "Adaptive decentralized controller design for a class of switched interconnected nonlinear systems," IEEE Trans. Cybern., to be published. doi: 10.1109/TCYB.2018.2878578.
[20] J. Škach, B. Kiumarsi, F. L. Lewis, and O. Straka, "Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems," IEEE Trans. Cybern., vol. 48, no. 1, pp. 29–40, Jan. 2018.
[21] H. Zhang, L. Cui, and Y. Luo, "Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP," IEEE Trans. Cybern., vol. 43, no. 1, pp. 206–216, Feb. 2013.
[22] F. L. Lewis and K. G. Vamvoudakis, "Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 1, pp. 14–25, Feb. 2011.
[23] D. Vrabie and F. L. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially-unknown nonlinear systems," Neural Netw., vol. 22, no. 3, pp. 237–246, Apr. 2009.
[24] H. G. Zhang, Q. L. Wei, and D. R. Liu, "An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games," Automatica, vol. 47, no. 1, pp. 207–214, Jan. 2011.
[25] Y. Z. Huang and D. R. Liu, "Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm," Neurocomputing, vol. 125, pp. 46–56, Feb. 2014.
[26] H. Zhang, Y. Luo, and D. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490–1503, Sep. 2009.
[27] D. R. Liu and Q. L. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621–634, Mar. 2014.
[28] Q. L. Wei and D. R. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1176–1190, Oct. 2014.
[29] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2226–2236, Dec. 2011.
[30] T. Dierks, B. Brenner, and S. Jagannathan, "Neural network-based optimal control of mobile robot formations with reduced information exchange," IEEE Trans. Control Syst. Technol., vol. 21, no. 4, pp. 1407–1415, Jul. 2013.
[31] Q.-L. Wei, H.-G. Zhang, D.-R. Liu, and Y. Zhao, "An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming," Acta Automatica Sinica, vol. 36, no. 1, pp. 121–129, Jan. 2010.
[32] J. Na, G. Herrmann, X. Ren, and P. Barber, "Adaptive discrete neural observer design for nonlinear systems with unknown time-delay," Int. J. Robust Nonlin. Control, vol. 21, no. 6, pp. 625–647, Apr. 2011.
[33] D.-P. Li, Y.-J. Liu, S. C. Tong, C. L. P. Chen, and D.-J. Li, "Neural networks-based adaptive control for nonlinear state constrained systems with input delay," IEEE Trans. Cybern., to be published. doi: 10.1109/TCYB.2018.2799683.
[34] D. P. Li, C. L. P. Chen, Y.-J. Liu, and S. C. Tong, "Neural network controller design for a class of nonlinear delayed systems with time-varying full-state constraints," IEEE Trans. Neural Netw. Learn. Syst., to be published. doi: 10.1109/TNNLS.2018.2886023.
[35] H. Li, L. J. Wang, H. P. Du, and A. Boulkroune, "Adaptive fuzzy backstepping tracking control for strict-feedback systems with input delay," IEEE Trans. Fuzzy Syst., vol. 25, no. 3, pp. 642–652, Jun. 2017.
[36] D. Wang, D. H. Zhou, Y. H. Jin, and S. J. Qin, "Adaptive generic model control for a class of nonlinear time-varying processes with input time delay," J. Process Control, vol. 14, no. 5, pp. 517–531, Aug. 2004.
[37] D.-P. Li and D.-J. Li, "Adaptive neural tracking control for an uncertain state constrained robotic manipulator with unknown time-varying delays," IEEE Trans. Syst., Man, Cybern., Syst., vol. 48, no. 12, pp. 2219–2228, Dec. 2018.
[38] K. Iagnemma, H. Shibly, A. Rzepniewski, and S. Dubowsky, "Planning and control algorithms for enhanced rough-terrain rover mobility," in Proc. 6th Int. Symp. Artif. Intell. Robot. Autom. Space, 2001, pp. 1–8.
[39] L. Ding et al., "Experimental study and analysis of the wheels' steering mechanics for planetary exploration wheeled mobile robots moving on deformable terrain," Int. J. Robot. Res., vol. 32, no. 6, pp. 712–743, Jun. 2013.
[40] K. Iagnemma and S. Dubowsky, "Traction control of wheeled robotic vehicles in rough terrain with application to planetary rovers," Int. J. Robot. Res., vol. 23, nos. 10–11, pp. 1029–1040, Oct. 2004.
[41] F. L. Lewis and V. L. Syrmos, Optimal Control, 2nd ed. New York, NY, USA: Wiley, 1995.
[42] G. Tao, Adaptive Control Design and Analysis (Adaptive and Learning Systems for Signal Processing, Communications and Control Series). New York, NY, USA: Wiley, 2003.
[43] D. Swaroop, J. K. Hedrick, P. P. Yip, and J. C. Gerdes, "Dynamic surface control for a class of nonlinear systems," IEEE Trans. Autom. Control, vol. 45, no. 10, pp. 1893–1899, Oct. 2000.
[44] Z. Chen and S. Jagannathan, "Generalized Hamilton–Jacobi–Bellman formulation based neural network control of affine nonlinear discrete-time systems," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 90–106, Jan. 2008.

Shu Li received the B.S. degree in information and computing science and the M.S. degree in applied mathematics from the Liaoning University of Technology, Jinzhou, China, in 2013 and 2016, respectively. He is currently pursuing the Ph.D. degree in aeronautical and astronautical science and technology with the Harbin Institute of Technology, Harbin, China.
His current research interests include adaptive control, neural network control, reinforcement learning, and intelligent control of wheeled mobile robots.

Liang Ding (M'16–SM'18) was born in 1980. He received the Ph.D. degree in mechanical engineering from the Harbin Institute of Technology, Harbin, China, in 2009.
He is currently a Professor with the State Key Laboratory of Robotics and System, Harbin Institute of Technology. His current research interests include field and aerospace robotics and control.

Haibo Gao was born in 1970. He received the Ph.D. degree in mechanical design and theory from the Harbin Institute of Technology, Harbin, China, in 2004.
He is currently a Professor with the State Key Laboratory of Robotics and System, Harbin Institute of Technology. His current research interests include specialized and aerospace robotics and mechanisms.

Yan-Jun Liu (M'15–SM'17) received the B.S. degree in applied mathematics and the M.S. degree in control theory and control engineering from the Shenyang University of Technology, Shenyang, China, in 2001 and 2004, respectively, and the Ph.D. degree in control theory and control engineering from the Dalian University of Technology, Dalian, China, in 2007.
He is currently a Professor with the College of Science, Liaoning University of Technology, Jinzhou, China. His current research interests include adaptive fuzzy control, nonlinear control, neural network control, reinforcement learning, and optimal control.
Dr. Liu is currently an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS.

Lan Huang received the B.S. degree in mechanical engineering and automation from Northeast University, Shenyang, China, in 2016 and the M.S. degree in manufacturing engineering of aerospace vehicle from the Harbin Institute of Technology, Harbin, China, in 2018, where he is currently pursuing the Ph.D. degree in aeronautical and astronautical science and technology.
His current research interests include terramechanics, dynamic modeling, and adaptive control of wheeled mobile robots.

Zongquan Deng was born in 1956. He received the master's degree in mechanics from the Harbin Institute of Technology, Harbin, China, in 1984.
He is currently a Professor and the Vice President with the Harbin Institute of Technology. His current research interests include special robot systems and aerospace mechanisms and control.
