Adaptive Dynamic Programming for Optimal Tracking Control of Unknown Nonlinear Systems With Application to Coal Gasification

1020 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 11, NO. 4, OCTOBER 2014

Qinglai Wei, Member, IEEE, and Derong Liu, Fellow, IEEE
Abstract—In this paper, we establish a new data-based iterative optimal learning control scheme for discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) approach, and we apply the developed scheme to a coal gasification optimal tracking control problem. Using system data, neural networks (NNs) are constructed for the dynamics of the coal gasification process, the coal quality, and the reference control, respectively, so that a mathematical model of the system is unnecessary. Both the NN approximation errors and the disturbance of the controls are considered. Via system transformation, the optimal tracking control problem with approximation errors and disturbances is transformed into a two-person zero-sum optimal control problem. A new iterative ADP algorithm is then developed to obtain the optimal control laws for the transformed system. A convergence analysis guarantees that the performance index function converges to a finite neighborhood of the optimal performance index function, and a convergence criterion is also obtained. Finally, numerical results are given to illustrate the performance of the present method.

Note to Practitioners—Dynamic programming is a useful technique for solving optimal control problems. However, in many cases it is computationally difficult to apply because of the backward-in-time calculation or the "curse of dimensionality." ADP is an effective tool for solving optimal control problems forward-in-time. Most ADP algorithms require an accurate system model, an accurate iterative control, and an accurate iterative performance index function to obtain the optimal control law; such algorithms can be called "accurate iterative ADP algorithms." For many real-world control systems, such as coal gasification systems, the system model is very difficult to construct, and the optimal control and optimal performance index function cannot be obtained analytically. These issues make accurate iterative ADP algorithms difficult to apply to real-world industrial systems. In this paper, NNs built from system data are used to overcome these difficulties, and both the approximation errors and the control disturbance are considered. A system transformation is introduced that converts the tracking control system into a two-person zero-sum control system. An iterative ADP algorithm with iteration errors is then established to obtain the optimal control scheme, and its convergence is proved.

Index Terms—Adaptive dynamic programming, coal gasification, data-based control, finite approximation errors, neural networks, optimal tracking control.

Manuscript received June 21, 2013; accepted September 01, 2013. Date of publication November 06, 2013; date of current version October 02, 2014. This paper was recommended for publication by Associate Editor H. Wang and Editor M. C. Zhou upon evaluation of the reviewers' comments. This work was supported in part by the National Natural Science Foundation of China under Grant 61034002, Grant 61233001, Grant 61273140, and Grant 61374105, in part by the Beijing Natural Science Foundation under Grant 4132078, and in part by the Early Career Development Award of SKLMCCS.
The authors are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: qinglai.wei@ia.ac.cn; derong.liu@ia.ac.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASE.2013.2284545
1545-5955 © 2013 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
Coal is the world's most abundant energy resource and the cheapest fossil fuel. The development of coal gasification technologies, a primary component of the carbon-based process industries, is of primary importance for coping with limited petroleum reserves [1]. Hence, optimal control of coal gasification is a key problem in developing the carbon-based process industries. To describe the coal gasification process, many studies focus on modeling approaches [2]–[5]. The resulting models are usually very complex and highly nonlinear. To simplify the controller design, traditional control of the coal gasification process adopts the feedback linearization method [6]–[8]. However, a controller designed by feedback linearization is only effective in the neighborhood of the equilibrium point. When the required operating range is large, the nonlinearities in the system cannot be properly compensated by a linear model. Therefore, it is necessary to study optimal control approaches for the original nonlinear system [9]–[13]. To the best of our knowledge, however, there has been no discussion of optimal controller design for nonlinear coal gasification systems. One difficulty is the complexity of coal gasification systems, which makes the expression of the optimal control law very complex; generally, the optimal control law cannot be expressed analytically. Another difficulty lies in solving the time-varying Hamilton–Jacobi–Bellman (HJB) equation, which is usually too difficult to solve analytically. Furthermore, in real-world coal gasification control systems, the coal quality is unknown to the controller, which makes the optimal control law even harder to obtain. To overcome these difficulties, a new optimal control scheme must be established.

Adaptive dynamic programming (ADP), proposed by Werbos [14], [15], has played an important role as a way
to solve optimal control problems forward-in-time [16]–[22]. Iterative methods are widely used in ADP to obtain the solution of the HJB equation indirectly and have received much attention [23]–[27]. There are two main classes of iterative ADP algorithms, based on policy iteration and value iteration, respectively [28]. Policy iteration algorithms start from an initial admissible control law to obtain the optimal solution of the HJB equation [29], [30]. Value iteration algorithms start from an initial performance index function to obtain the optimal control law [31]–[33]. Although iterative ADP algorithms attract increasing attention [34]–[40], many of them require an accurate system model, an accurate iterative control, and an accurate iterative performance index function in order to obtain the optimal control law. These iterative ADP algorithms can be called "accurate iterative ADP algorithms." Moreover, for most iterative algorithms, the iterative control and the performance index function must both be obtained accurately at each iteration to guarantee convergence.

For most real-world control systems, such as the coal gasification control system, the accurate system model is complex and cannot be obtained in general. Nor can the accurate iterative control laws and performance index function be obtained at each iteration of the iterative ADP algorithm. In this situation, approximation structures such as neural networks can be used to approximate the system model, the iterative control law, and the iterative performance index function. Approximation errors between the approximated functions and the expected ones are then unavoidable, however precise the approximation. When the accurate system model, iterative control laws, and iterative performance index function cannot be obtained, the convergence properties of the accurate iterative ADP algorithms may no longer hold. Until now, only [41] has considered approximation errors in the iterative control law and iterative performance index function (iteration errors for brief), and it still requires an accurate system model. To the best of our knowledge, there has been no discussion of optimal control schemes based on iterative ADP in which both the modeling errors of unknown systems and the iteration errors are considered. This motivates our research.

In this paper, for the first time, an integrated self-learning optimal control method for the coal gasification industrial process is developed using iterative ADP. First, a data-based NN model is established for the coal gasification control system, so the mathematical expression of the system is unnecessary. Second, the coal quality and reference control models are also established by NNs. Next, via system transformation, the optimal tracking control system is transformed into a two-person zero-sum control system, where the NN approximation errors of the system, coal quality, and reference control (system errors for brief), together with the control disturbance, are regarded as a system control (disturbance control for brief). Then, the Hamilton–Jacobi–Isaacs (HJI) equation for the two-person zero-sum optimal control problem is derived. A new iterative ADP algorithm is developed to obtain the optimal control law iteratively, where the iteration errors are considered. Under the worst disturbance (the disturbance control that maximizes the performance index function), the optimal control law can be obtained by the developed iterative ADP algorithm. We emphasize that a convergence analysis is established to guarantee that the performance index function converges to a finite neighborhood of the optimal performance index function, and a convergence criterion is also obtained. Finally, numerical results are given to show the effectiveness of the developed iterative ADP algorithm.

This paper is organized as follows. Section II presents the problem statement. Section III establishes the NN modeling methods for the coal gasification system, the coal quality, and the reference control. Section IV presents the system transformation and the iterative ADP algorithm. Section V discusses the NN implementation of the optimal control scheme. Section VI analyzes numerical results that demonstrate the effectiveness of the developed optimal control scheme. Finally, Section VII draws the conclusion.

II. PROBLEM FORMULATION

A. Coal Gasification Chemistry
In coal gasification, coal water slurry (coal and water) is fed into the gasifier together with oxygen. The gasification process in the gasifier operates at high temperature, and its output includes synthesis gas and char. The coal gasification process is diagrammed in Fig. 1.

Fig. 1. Flow diagram of coal gasification process.

Suppose that the coal is composed of carbon (C), hydrogen (H), oxygen (O), and char (Char), which is expressed by

where . Let denote the coal quality function. The coal gasification reaction can be classified into two phases [3]. The first phase is the coal combustion reaction, whose chemical equations are expressed by

(1)
The other phase is the water-gas shift reaction, which is reversible and mildly exothermic:

$$\mathrm{CO} + \mathrm{H_2O} \rightleftharpoons \mathrm{CO_2} + \mathrm{H_2} \tag{2}$$

where CO is carbon monoxide, CO₂ is carbon dioxide, and H₂O is water.

The coal combustion reaction is instantaneous and irreversible. The water-gas shift reaction is reversible, and it is strongly dependent on the reaction temperature. Let be the reaction temperature and let denote the reaction equilibrium coefficient. Then, we have the following empirical formula [3]:

(3)

where denotes the molar quantity in the synthesis gas.

For the coal gasification process, the reaction temperature is a key parameter [3], [42]. Hence, in this paper, an optimal control scheme will be established to make the reaction temperature effectively track a desired one.

Remark 2.1: Generally, coal contains more elements, such as nitrogen (N), sulphur (S), chlorine (Cl), and so on. However, the reactions of these elements are not the main reactions and have almost no effect on the reaction temperature. Thus, for convenience of analysis, these reactions are omitted and only the main reactions are considered in this paper.

B. Control System Description
Let denote the flow of the control input (kg/h) and denote the flow of the output (kg/h). Then, the control input can be defined as

(4)

The system output can be defined as

(5)

According to (1)–(5), the coal gasification control system can be expressed as

(6)

where and are unknown system functions. Let the desired state trajectory be . Then, our goal is to design an optimal state-feedback tracking control law , that makes the system state track the desired state trajectory. However, it is nearly impossible to obtain a direct optimal tracking controller for system (6). First, the system functions and are unknown nonlinear functions. Second, for the desired trajectory , the corresponding reference control is also difficult to obtain for the unknown system. Furthermore, the coal quality is an unknown and uncontrollable parameter. Thus, new methods must be established to solve these problems.

III. DATA-BASED MODELING AND PROPERTIES
In this section, three-layer back-propagation (BP) NNs are introduced to approximate the system (6). We also use NNs to solve the reference control and obtain the coal quality. Let the number of hidden layer neurons be denoted by . Let the weight matrix between the input layer and hidden layer be denoted by . Let the weight matrix between the hidden layer and output layer be denoted by . Let the input vector of the NN be denoted as . Then, the output of the three-layer NN is represented by

(7)

where , , are the activation functions. The NN estimation error can be expressed by

where and are the ideal weight parameters, and is the reconstruction error. For convenience of analysis, only the output weights are updated during training, while the hidden weights are kept fixed [23], [43]. Hence, in the following, the NN function (7) is simplified by the expression , where .

A. Control System Modeling and Properties
In this subsection, using input-state-output data, a BP NN model is established to reconstruct the system (6). Let the number of hidden layer neurons be denoted by and . Let the ideal weights be denoted by and , respectively. According to the universal approximation property of NNs, the NN representation of the system (6) can be written as

where is the NN input. Let and where and are matrices with appropriate dimensions. Let for a constant . Let be the bounded NN reconstruction errors which satisfy , for , 2. The NN model for the system is constructed as

(8)

where is the estimated system state vector and is the estimated system output vector. Let be the estimation
of the ideal weight matrix and let be the estimation of the ideal weight matrix . Then, we define the system identification errors as

(9)

where and . Let and . Then, we can get

The weights are adjusted to minimize the following error:

By a gradient-based adaptation rule, the weights are updated as

(10)

where and are learning rates.

Before proceeding, the following assumption is necessary.

Assumption 1: The NN approximation errors and are assumed to be upper bounded by a function of the estimation error such that

where , are bounded constant values.

Then, we have the following theorem.

Theorem 3.1: Let the identification scheme (8) be used to identify the nonlinear system (6), and let the NN weights be updated by (10). If Assumption 1 holds, then the system identification error is asymptotically stable and the error matrices and both converge to zero, as .

Proof: Consider the following Lyapunov function candidate:

The difference of the Lyapunov function candidate is given by

With the identification error dynamics (9) and the weight tuning rules of and in (10), we can obtain

Applying the Cauchy–Schwarz inequality, we can get

Considering , we can get

(11)

As is finite, there exists a that satisfies

Then, (11) can be written as

Let be selected as

and be selected as

Then, we have . The proof is completed.
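The identification idea of (7)–(10), a three-layer network whose hidden weights stay fixed while only the output weights follow a gradient rule on the squared identification error, can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: a scalar output, tanh activation, hidden weights drawn once at random, per-sample gradient descent on one half of the squared error, and a hypothetical toy plant `toy_plant` standing in for the unknown system (6). The names `hidden_layer`, `predict`, and `train_identifier`, and all numeric values, are illustrative choices. The learning rate is kept small enough that lr·‖σ‖² < 2 (each tanh output is bounded by 1), a standard stability condition for this kind of least-mean-squares update; it is in the same spirit as, but not identical to, the learning-rate selection in the proof of Theorem 3.1.

```python
import math
import random


def hidden_layer(x, hidden_w):
    """sigma(Y^T x) with tanh activation; each row of hidden_w feeds one hidden neuron."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]


def predict(x, hidden_w, out_w):
    """Network output W^T sigma(Y^T x) for a scalar-output identifier."""
    return sum(w * h for w, h in zip(out_w, hidden_layer(x, hidden_w)))


def train_identifier(samples, n_hidden=20, lr=0.05, epochs=200, seed=0):
    """Train only the output weights by per-sample gradient descent on 0.5 * err**2.

    The hidden weights are drawn once at random and then kept fixed.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    hidden_w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    out_w = [0.0] * n_hidden
    for _ in range(epochs):
        for x, y in samples:
            h = hidden_layer(x, hidden_w)
            err = sum(w * h_j for w, h_j in zip(out_w, h)) - y  # identification error
            # dE/dW = err * sigma(Y^T x); step against the gradient.
            out_w = [w - lr * err * h_j for w, h_j in zip(out_w, h)]
    return hidden_w, out_w


def toy_plant(x, u):
    """Hypothetical stand-in dynamics x_{k+1} = 0.8 sin(x_k) + 0.5 u_k (not the gasifier)."""
    return 0.8 * math.sin(x) + 0.5 * u


# Offline data set of ((x_k, u_k, bias), x_{k+1}) pairs.
data_rng = random.Random(1)
data = []
for _ in range(100):
    x_k, u_k = data_rng.uniform(-2.0, 2.0), data_rng.uniform(-1.0, 1.0)
    data.append(((x_k, u_k, 1.0), toy_plant(x_k, u_k)))

hidden_w, out_w = train_identifier(data)
mse = sum((predict(x, hidden_w, out_w) - y) ** 2 for x, y in data) / len(data)
print(f"training MSE: {mse:.5f}")
```

Freezing the hidden layer makes the output linear in the trainable weights, which is what allows a quadratic Lyapunov argument of the kind used for Theorem 3.1; the same pattern would apply to the coal quality and reference control networks, with their own inputs and targets.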
B. Data-Based Identifications of Coal Quality and Reference Control
In this subsection, NNs will be used to identify the coal quality function and to solve the reference control law from the system data. Unlike the system modeling data, the coal quality data generally cannot be detected and identified in real time during the coal gasification process, which means that the coal quality data can only be obtained offline. Given this feature, an iterative training method for the neural networks can be adopted. According to (6), we can solve for , which is expressed as

(12)

Usually, is a highly nonlinear system, and the analytical expression of is nearly impossible to obtain. Thus, a BP NN, referred to below as the coal quality network, is established to identify the coal quality function . Let the number of hidden layer neurons be denoted as . Let the ideal weights be denoted as . The NN representation of (12) can be written as

(13)

where and is the reconstruction error. Let , where is an arbitrary matrix. The NN coal quality function is constructed as

(14)

where is the estimated coal quality function, and is the estimated weight matrix. According to (12), solving requires the data of . As we use offline data to train the NN, the corresponding data can be obtained. Define the identification error as

where and . Similarly, the weights are updated as

(15)

where is the learning rate.

Next, we will solve the reference control using a second BP NN, referred to below as the reference control network. In this paper, as we aim to design a state feedback controller to make the system state track the desired one, according to the state equation in (6), we give to approximate the reference control function , which is expressed as

(16)

Let the number of hidden layer neurons be denoted as . Let the ideal weights be denoted as . The NN representation of (16) can be written as

(17)

where and is the reconstruction error. Let where is an arbitrary matrix. The NN reference control is constructed as

(18)

where is the estimated reference control, and is the estimated weight matrix. Define the identification error as

(19)

where and . Similarly, the weights are updated as

(20)

where is the learning rate.

Next, we give the convergence properties of the coal quality network and the reference control network.

Theorem 3.2: Let the identification schemes (14) and (18) be used to identify and in (12) and (17), respectively. Let the NN weights be updated by (15) and (20), respectively. If for , the inequalities

(21)

hold, where and , then the error matrices and both converge to zero, as .

Proof: Consider the following Lyapunov function candidate:

As the activation functions and are both bounded, we can let and , respectively. The difference of the Lyapunov function candidate is given by

(22)
where and . As and are bounded, there exist and that satisfy

(23)

Then, (22) can be written as

According to (21), we can select the learning rates

Hence, we can obtain . The proof is completed.

Remark 3.1: From Theorem 3.2, we know that the coal quality function and the reference control law can be approximated by neural networks. It should be pointed out that, in real-world applications, the coal quality generally varies slowly with time. This implies that once the coal quality function is identified, it can be considered a constant vector. Hence, from the current coal gasification system, we can obtain the current state, input, and output data. We can then first use the coal quality network to identify according to (12)–(15). Then, by feeding and the state into the reference control network, we can obtain the reference control law immediately, according to (16)–(20).

Remark 3.2: From (21), if the reconstruction errors and are small, then (21) holds. When the reconstruction errors are large, inequalities (21) are not necessarily true. If we assume that the inequalities (21) hold for and , then we have for and . In this situation, we can conclude that are uniformly ultimately bounded (UUB).

IV. DESIGN OF OPTIMAL TRACKING CONTROLLER BY ITERATIVE ADP ALGORITHM WITH SYSTEM AND ITERATION ERRORS
In the previous section, we have shown how to use the system data to approximate the dynamics of system (6). NNs are also adopted to solve the reference control and obtain the coal quality function, respectively. In this section, we will present the iterative ADP algorithm to obtain the optimal tracking control law with system and iteration errors.

A. The System Transformation With System Errors and Control Disturbance
Although the control system, the reference control, and the coal quality function are approximated by NNs, the system errors are still unknown, and it is difficult to design the optimal tracking control system with unknown system errors. Thus, an effective system transformation is presented in this section. In order to transform the system, for the desired system state , a desired reference control (desired control for brief) can be obtained. Substituting the desired state trajectory into (16), we can obtain the reference control trajectory

where is defined as the desired control. Let be the neural network function which approximates the reference control . If the weights converge to sufficiently, then we have

(24)

As cannot be obtained directly, the coal quality network is used to approximate . According to (13) and (14), if the weights converge to sufficiently, we have

Let and . According to the mean value theorem, we have

(25)

where , and is some constant. As is bounded and is smooth, we have that is bounded. If we let the neural-network generated reference control be expressed by

(26)

then we can get

(27)

where . Let be the error between the control and the reference control ; then we can obtain

(28)

where we let be the estimation control.

Remark 4.1: From (28), we can see that if we have obtained the estimation control and the control error , then the control input can be determined. In real-world industrial processes, however, the control performance will be influenced by the low-level controller. For coal gasification, the fluctuation of the flow of the control inputs is important and cannot be ignored. Let be the bounded disturbance of the control; then the control input can be written as

(29)
where . In the following, the disturbance of the control is considered.

According to (8), let the weights of the neural networks converge to and sufficiently. If we let , then the system state equation can be written as

(30)

As and cannot be obtained directly, approximations are adopted. Let . As the activation function is smooth, according to the mean value theorem, we have

where , , is some constant. Let , , and is some constant. As are both bounded and and are both smooth, and are both bounded. So, we let and . Then, (30) can be written as

(31)

Let the tracking error be defined as

(32)

where is the desired state trajectory. Let

(33)

where is the neural-network generated reference control trajectory expressed by (26). According to (27), (29), and (31), we can get

(34)

where we let and . Define the matrices

(35)

where and , . Thus, (31) can be written as

(36)

where . As , , and are all bounded, the system disturbance is bounded. Let and ; then we can get

On the other hand, as mentioned in Remark 3.1, is a constant vector after it is identified. Hence, according to (26), can also be seen as a constant vector. Then, system (36) can be transformed into the following regulation system:

(37)

where . From (37), we can see that the nonlinear tracking control system (6) is transformed into a regulation system, where the system errors and the control fluctuation are transformed into an unknown bounded system disturbance.

B. Derivation of the Iterative ADP Algorithm With System Errors, Iteration Errors and Control Disturbance
In this section, our goal is to obtain an optimal control that makes the tracking error converge to zero under the system disturbance . As the system disturbance is unknown, the design of the optimal controller is very difficult. In [44], the optimal control problem for system (37) was transformed into a two-person zero-sum optimal control problem, where the system disturbance was defined as a control variable. The optimal control law is obtained under the worst case of the disturbance (the disturbance control that maximizes the performance index function). Inspired by [44], we define as a disturbance control of the system, and the two controls and of system (37) are designed to optimize the following quadratic performance index function:

(38)

where . According to (32) and (34), we have

Then, the optimal performance index function can be defined as

(39)

Let be the utility function. In this paper, we assume that the utility function for . Generally, the system errors are small, which makes the system disturbance small and the utility function larger than zero. If are large, we can reduce the matrix or enlarge the matrices and . Hence, the assumption can be guaranteed.
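The positivity assumption on the utility function, and the remedy of reducing the disturbance weight or enlarging the state and control weights, can be checked numerically on sampled data. The sketch below assumes the standard zero-sum quadratic form U(e, u, w) = eᵀQe + uᵀRu − wᵀSw; the symbols Q, R, S are our own notation, because the paper's matrices are not recoverable from this text. The helper names, the sample points, and the halving factor are hypothetical, and the check covers only the sampled points, not every state.

```python
def quad_form(v, m):
    """v^T M v for a vector v and a square matrix M given as nested lists."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def utility(e, u, w, q, r, s):
    """Assumed zero-sum quadratic utility: e^T Q e + u^T R u - w^T S w."""
    return quad_form(e, q) + quad_form(u, r) - quad_form(w, s)


def utility_positive(samples, q, r, s):
    """True if the utility is strictly positive on every (e, u, w) sample."""
    return all(utility(e, u, w, q, r, s) > 0.0 for e, u, w in samples)


def shrink_disturbance_weight(samples, q, r, s, factor=0.5, max_steps=50):
    """Scale S down (one remedy named in the text) until U > 0 on the samples."""
    for _ in range(max_steps):
        if utility_positive(samples, q, r, s):
            return s
        s = [[factor * v for v in row] for row in s]
    raise ValueError("utility could not be made positive on these samples")


q = [[1.0, 0.0], [0.0, 1.0]]
r = [[1.0]]
s = [[4.0]]
samples = [((0.1, 0.0), (0.1,), (0.3,)), ((1.0, -0.5), (0.2,), (0.1,))]
print(utility_positive(samples, q, r, s))  # False: the first sample gives U < 0
s_ok = shrink_disturbance_weight(samples, q, r, s)
print(s_ok)  # [[0.125]]
```

Enlarging Q and R instead would restore positivity in the same way; scaling S is shown only because it is the simplest one-parameter adjustment.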
According to the principle of optimality, satisfies the discrete-time HJI equation

(40)

Define the laws of optimal controls as

Hence, the HJI equation (40) can be written as

We can see that if we want to obtain the optimal control laws and , we must obtain the optimal performance index function . Generally, is unknown before all the controls and are considered, which makes the HJI equation generally unsolvable. In this paper, a new iterative ADP algorithm with system and approximation errors is developed to overcome these difficulties. In the present iterative ADP algorithm, the performance index function and control law are updated by iterations, with the iteration index increasing from 0 to infinity. Let the initial performance index function . For , the iterative control laws and can be computed as

(41)

(42)

Update the iterative performance index function by

(43)

where is expressed as (37). Let and be finite approximation error functions of the iterative control and iterative performance index function, respectively.

Remark 4.2: From (37), we can see that the system is affine in the disturbance control . According to (41), using the necessary condition of optimality, for , we have that can be obtained as

C. Properties of the Iterative ADP Algorithm With System Errors, Iteration Errors and Control Disturbance
For the two-person zero-sum iterative ADP algorithm (41)–(43), as the iteration errors are unknown, for , the properties of the iterative performance index function and the iterative control laws and are very difficult to analyze. In [41], for nonlinear systems with a single controller, a new "error bound" analysis method is proposed to prove the convergence of the iterative performance index function. In this paper, we will give the "error bound" convergence analysis of the iterative performance index functions for nonlinear two-person zero-sum optimal control problems.

For , define a new iterative performance index function as

(44)

where and

(45)

is the accurate iterative control law. According to (43), for , there exists a finite constant that makes

(46)

hold uniformly. Hence, we can give the following theorem.

Theorem 4.1: For , let be expressed as in (44) and be expressed as in (43). Let be expressed as in (37). Let be a constant that makes

hold uniformly. If there exists that makes (46) hold uniformly, then we have

(47)

where we define , for and .

Proof: The theorem can be proved by mathematical induction. First, let . We have . So, the conclusion holds for . Next, for , introducing a parameter where . According to (44), we have

Let . According to (46), we can obtain

which shows that (47) holds for . Assume that (47) holds for , where . Then, for , we have
According to (46), we can obtain (47), which proves the conclusion for .

Remark 4.3: From (47), we can see that if , which means that the accurate iterative performance index functions can be obtained, then we can get and the iterative performance index function is convergent. For , the iterative performance index function may diverge. In the following, we will give the convergence criterion of the iterative ADP algorithm (41)–(43) using the error bound method.

Theorem 4.2: Suppose Theorem 4.1 holds. If for , the inequality

(48)

holds, then as , the iterative performance index function in the iterative ADP algorithm (41)–(43) is uniformly convergent to a bounded neighborhood of the optimal performance index function , i.e., for ,

(49)

Proof: According to (47) in Theorem 4.1, we can see that for , the sequence is a geometric series. If , then for , (47) becomes

(50)

According to (46), letting , we have

(51)

According to (50) and (51), we can obtain (49).

Corollary 4.1: Suppose Theorem 4.1 holds. If for , , and the inequality (48) holds, then the iterative control laws and of the iterative ADP algorithm (41)–(43) are convergent, i.e.,

V. NEURAL NETWORK IMPLEMENTATION OF THE OPTIMAL TRACKING CONTROL SCHEME
In this section, neural networks, including an action network and a critic network, are used to implement the present iterative ADP algorithm. Both neural networks are chosen as three-layer BP networks. The whole structure diagram is shown in Fig. 2.

Fig. 2. The structure diagram of the algorithm.

A. The Critic Network
For , the critic network is used to approximate the performance index function in (44). The output of the critic network is denoted by

Let be random weight matrices. Let , where is an arbitrary matrix. We have that is upper bounded, i.e., . The target function can be written as
Then, we define the error function for the critic network as

The objective function to be minimized in the critic network training is

The gradient-based weight update rule [45] can be applied here to train the critic network:

(52)

where is the learning rate of the critic network. If the training precision is achieved, then we say that can be approximated by the critic network.

B. The Action Network
The action network is used to approximate the iterative control law , where is defined by (45). The output can be formulated as

Let , where is an arbitrary matrix. We have that is upper bounded, i.e., . So we can define the output error of the action network as

The weights in the action network are updated to minimize the following performance error measure:

The weight updating algorithm is similar to the one for the critic network. By the gradient descent rule, we can obtain

(53)

where is the learning rate of the action network.

To guarantee the effectiveness of the neural network implementation, the convergence of the neural network weights should be proved, which guarantees that the iterative performance index function and the iterative control law can be approximated by the critic and action networks, respectively. The weight convergence property of the neural networks is shown in the following theorem.

Theorem 5.1: Let the target performance index function and the target iterative control law be expressed by

respectively, where and are reconstruction errors. Let the critic and action networks be trained by (52) and (53), respectively. Let and . If for , there exist and that satisfy the following inequalities:

(54)

where and , then the error matrices both converge to zero, as .

Proof: From (52) and (53), we have

and

Consider the following Lyapunov function candidate:

(55)

Let and . Then, the difference of the Lyapunov function candidate (55) is given by

(56)
As and are both finite, there exist and that satisfy
(57)
Then, (56) can be written as

Selecting the learning rates and as
(58)
we can obtain . The proof is completed.

C. Summary of the Iterative ADP Algorithm

Based on the above analysis, the whole data-based iterative adaptive dynamic programming algorithm for the unknown nonlinear system can be summarized in the flowchart of Fig. 3.

Fig. 3. The flowchart of iterative ADP algorithm.

Remark 5.1: For the coal gasification control system, the values of the system data, such as the temperature data, the control input data and the output data, are usually very large. Therefore, the data must be normalized before they are used to build the neural network approximators. The normalization details can be found in [45] and are omitted in this paper.

Remark 5.2: Training the neural networks (the model network, the network, the network, and the action and critic networks) requires a large amount of data. For most industrial processes, these system data, such as input-state-output data and coal quality data, cannot be collected online. Thus, in this paper, we collected these data offline to train the model neural network, to identify the coal quality, and to solve the reference control law . The offline data include the system state, the system control input and the system output. Furthermore, the developed iterative ADP algorithm is also implemented offline to train the critic and action neural networks. After the weights of the critic and action neural networks have converged, which means that the optimal control law has been obtained, the optimal control law is applied to control the coal gasification system. Hence, the developed iterative ADP method in this paper is an offline optimal control method.

VI. NUMERICAL ANALYSIS

In this section, numerical experiments are studied to show the effectiveness of our iterative ADP algorithm. Let the coal gasification control system be expressed as in (6). We let the current reaction temperature in the gasifier be . Observe the corresponding system input and output data (kg/h), which are and . Let the desired reaction temperature be . To model the coal gasification control system (6), we collect 20,000 temperature samples from the real-world coal gasification system. The corresponding 20,000 system input samples and 20,000 system output samples are also recorded. Then, a three-layer BP NN with the structure 8–20–1 is established to approximate the state equation in (6); this NN is the model network. The control input is expressed by (4). We also use a three-layer BP NN with the structure 8–20–5 to approximate the input-output equation in (6); this NN is the input-output network. Let the learning rates of the model network and the input-output network both be . The neural networks are trained for 20,000 iteration steps with the gradient-based weight update rule [45] to reach the training precision . The converged weights are given by
(59)
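The model network and the input-output network are trained with the gradient-based weight update rule [45], as are the critic and action networks via (52) and (53). The following minimal sketch shows such training for a network with the 8–20–1 structure of the model network. It assumes tanh hidden units, a linear output, per-sample updates of E = 0.5(y − t)², uniform random initialization and synthetic data. The function names, learning rate, initialization and target function are all hypothetical, and the converged weights in (59) are not reproduced.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """Return (hidden activations, output) of a tanh-hidden, linear-output network."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y


def mse(data, W1, b1, W2, b2):
    """Mean squared error of the network over (input, target) pairs."""
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data) / len(data)


def train_bp(data, n_in=8, n_hidden=20, lr=0.01, epochs=50, seed=0):
    """Per-sample gradient descent on E = 0.5 * (y - t)**2 for an n_in-n_hidden-1 net.

    Returns the trained parameters and the MSE recorded before training and
    after every epoch.
    """
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = [mse(data, W1, b1, W2, b2)]
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            err = y - t  # dE/dy
            for j in range(n_hidden):
                # Hidden-layer error signal, computed with W2[j] before its update.
                delta = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta * x[i]
                b1[j] -= lr * delta
            b2 -= lr * err
        history.append(mse(data, W1, b1, W2, b2))
    return (W1, b1, W2, b2), history


if __name__ == "__main__":
    # Synthetic 8-input regression data standing in for normalized plant data.
    rng = random.Random(1)
    samples = []
    for _ in range(200):
        x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
        samples.append((x, 0.5 * x[0] - 0.3 * x[1] + 0.2 * x[2] * x[3]))
    _, hist = train_bp(samples)
    print(f"MSE before training: {hist[0]:.4f}, after: {hist[-1]:.4f}")
```

The same loop structure applies to the other three-layer networks. Only `n_in`, `n_hidden` and the output dimension change (a multi-output network would repeat the output-layer update per output).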
Next, we adopt three-layer BP NNs to identify the coal quality (13) and the reference control (16). The structures of the network and the network are chosen as 10–20–4 and 6–20–3, respectively. Using the gradient-based weight update rule, the two neural networks are trained for 20,000 iteration steps with the learning rate 0.002 to reach the training precision . The converged weights are given by

Fig. 4. The convergence trajectory of iterative performance index function.
Fig. 5. The trajectory of state.
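Remark 5.1 states that the raw temperature, input and output data must be normalized before any of these networks are trained, and it defers the details to [45]. The sketch below assumes per-channel min-max scaling onto [−1, 1]. This is one common choice and not necessarily the scheme of [45] or of the authors. The numbers in the example are illustrative, not plant measurements.

```python
def fit_minmax(rows):
    """Per-channel (min, max) over a list of equal-length sample rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(row, bounds):
    """Map each channel linearly from [lo, hi] onto [-1, 1]."""
    out = []
    for v, (lo, hi) in zip(row, bounds):
        span = hi - lo
        # A constant channel carries no information; map it to 0.
        out.append(0.0 if span == 0 else 2.0 * (v - lo) / span - 1.0)
    return out


def denormalize(row, bounds):
    """Invert normalize(), e.g. to turn a network output back into kg/h."""
    return [lo + (z + 1.0) * (hi - lo) / 2.0 for z, (lo, hi) in zip(row, bounds)]


if __name__ == "__main__":
    # Illustrative (temperature, coal flow) rows; not data from the paper.
    rows = [[1300.0, 42000.0], [1350.0, 45000.0], [1325.0, 43500.0]]
    bounds = fit_minmax(rows)
    print([normalize(r, bounds) for r in rows])  # [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
```

The bounds should be fitted on the offline training data only and then reused unchanged when new data are fed through the trained networks. Otherwise the scaling seen by the networks drifts.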
Taking the current system data , and into the network, we can obtain the coal quality . Taking the desired state and the coal quality into the network, we can obtain the desired control input, expressed by . According to the weights of the model network, the network and the network, we can then obtain the system disturbance .
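The iterative ADP algorithm applied next performs value iteration on the transformed two-person zero-sum problem. By Theorem 4.2 and Remark 6.1, bounded iteration errors mean the iterative performance index function reaches only a neighborhood of the optimum. The toy sketch below illustrates that idea with hypothetical ingredients chosen so that the value function stays exactly quadratic, V(e) = p e²:

- scalar linear dynamics e' = a e + b v + c w;
- utility q e² + r v² − g w²;
- a uniform per-iteration error.

It is not the gasifier model, and it is not the paper's critic/action implementation.

```python
import random


def game_bellman(p, a, b, c, q, r, g):
    """Exact value-iteration step for V(e) = p * e**2 on the scalar zero-sum game
    e_next = a*e + b*v + c*w, utility q*e**2 + r*v**2 - g*w**2,
    where the control v minimizes and the disturbance w maximizes."""
    m11 = r + p * b * b  # curvature in v (positive for r > 0, p >= 0)
    m12 = p * b * c
    m22 = p * c * c - g  # curvature in w (must be negative for a maximum)
    if m22 >= 0.0:
        raise ValueError("p*c**2 < g is required for the worst-case disturbance to exist")
    det = m11 * m22 - m12 * m12
    # h^T M^{-1} h with h = (b, c), from the stationarity conditions in (v, w).
    h_minv_h = (m22 * b * b - 2.0 * m12 * b * c + m11 * c * c) / det
    return q + a * a * p - (a * p) ** 2 * h_minv_h


def value_iteration(params, iters=60, err_bound=0.0, seed=0):
    """Run p_{i+1} = T(p_i) + eps_i from p_0 = 0, where |eps_i| <= err_bound
    stands in for a bounded per-iteration approximation error."""
    rng = random.Random(seed)
    p = 0.0
    history = [p]
    for _ in range(iters):
        p = game_bellman(p, *params) + rng.uniform(-err_bound, err_bound)
        history.append(p)
    return history


if __name__ == "__main__":
    params = (0.9, 1.0, 0.5, 1.0, 1.0, 4.0)  # hypothetical (a, b, c, q, r, g)
    exact = value_iteration(params)
    noisy = value_iteration(params, err_bound=1e-2)
    print(f"error-free limit: {exact[-1]:.6f}")
    print(f"largest late deviation with errors: {max(abs(x - exact[-1]) for x in noisy[20:]):.4f}")
```

With `err_bound = 0`, the iterates settle on the fixed point of the game Riccati map. With nonzero errors, they keep fluctuating in a band around it whose width scales with `err_bound`. In this strongly contracting toy the band stays small. Remark 6.1 reports that in the gasifier case, enlarged iteration errors can prevent convergence altogether.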
Next, the developed iterative ADP algorithm is used to obtain the optimal tracking control law. Let the performance index function be defined as in (38), where , and , where denotes the identity matrix with suitable dimension. The critic and action networks are both chosen as three-layer BP neural networks, with the structures 1–10–3 and 1–10–1, respectively. For each iteration, the critic and action networks are trained for 1000 iteration steps using the learning rate of , so that the neural network training errors become less than . Let the iteration index . The converged weights of the critic network and the action network are expressed as
Fig. 6. The trajectories of control and system output. (a) Coal input trajectory.
(b) input trajectory. (c) input trajectory. (d) CO output trajectory.
respectively. The convergence trajectory of the iterative performance index function is shown in Fig. 4. We apply the optimal control law to the system for time steps and obtain the following results. The optimal state trajectory is shown in Fig. 5. The corresponding control trajectories and system output trajectories are shown in Figs. 6 and 7, respectively.

Fig. 7. The trajectories of system output. (a) output trajectory. (b) output trajectory. (c) output trajectory. (d) Char output trajectory.

From the above numerical results, we can see that, under the given NN training precisions, the optimal control derived by the present iterative ADP algorithm with system and iteration errors successfully makes the system state track the desired temperature. This shows the effectiveness of the present algorithm. In the following, we change the NN training precisions to show the performance of the present iterative ADP algorithm. First, we change the training precisions of the model network, the network, and the network to , while the training precisions of the critic and action networks are kept at . Let the iteration index . The convergence trajectory of the iterative performance index function is shown in Fig. 8. We apply the optimal control law to the system for time steps and obtain the following results. The optimal state trajectory is shown in Fig. 9. The corresponding control trajectories and system output trajectories are shown in Figs. 10 and 11, respectively.

Fig. 8. The convergence trajectory of iterative performance index function.
Fig. 9. The trajectory of state.
Fig. 10. The trajectories of control and system output. (a) Coal input trajectory. (b) input trajectory. (c) input trajectory. (d) CO output trajectory.
Fig. 11. The trajectories of system output. (a) output trajectory. (b) output trajectory. (c) output trajectory. (d) Char output trajectory.

For real-world coal gasification, the flow fluctuation of the control inputs is important and cannot be ignored. In the following, we show the control system performance under a control disturbance. Let be a zero-expectation white noise on the control input, with , , . The disturbance trajectories of the control input are displayed in Fig. 12(a)–(c), respectively. Let the training precisions of the model network, the network, and the network be , and let the training precisions of the critic and action networks be kept at . The convergence trajectory of the iterative performance index function is shown in Fig. 13. The optimal state trajectory is shown in Fig. 14. The corresponding control trajectories and system output trajectories are shown in Figs. 15 and 16, respectively. These results show that, under the disturbance of the control input, the optimal tracking control of the system can still be obtained, which demonstrates the effectiveness and robustness of the developed iterative ADP method. To verify the correctness of the model and the developed method, the mass errors between the input and the output are given in Fig. 12(d).

Fig. 12. The control disturbance and the input-output mass error. (a) Control disturbance . (b) Control disturbance . (c) Control disturbance . (d) The error between the input and output mass.
Fig. 13. The convergence trajectory of iterative performance index function.
Fig. 14. The trajectory of state.
Fig. 15. The trajectories of control and system output. (a) Coal input trajectory. (b) input trajectory. (c) input trajectory. (d) CO output trajectory.
Fig. 16. The trajectories of system output. (a) output trajectory. (b) output trajectory. (c) output trajectory. (d) Char output trajectory.

From the numerical results, we can see that when the system errors and the disturbance of the control input are enlarged, the developed iterative ADP algorithm is still effective in finding the optimal tracking control scheme for the system. On the other hand, if we enlarge the iteration errors, the control performance is quite different. Let the disturbance of the control be . Let the training precisions of the model network, the network, and the network be kept at . We change the training precisions of the critic and action networks to . Let the iteration index . The convergence trajectory of the iterative performance index function is shown in Fig. 17(a), where we can see that the iterative performance index function no longer converges. The corresponding state trajectory is shown in Fig. 17(b), where we notice that the desired state is not achieved.

Fig. 17. The trajectory of iterative performance index function. (a) The trajectory of iterative performance index function. (b) The trajectory of state.

Remark 6.1: From the numerical results, we can see that the developed iterative ADP algorithm tolerates large system errors and control disturbances and still achieves the optimal tracking control law of the system. In contrast, the admissible iteration errors for the iterative ADP algorithm are relatively small. The theoretical analysis explains this difference. From (39), for the two-person zero-sum optimal control problem, we have designed an optimal control law under the worst disturbance (the disturbance control maximizes the performance index function, where includes the system errors and the disturbance of the control). On the other hand, (47) shows that, in the iteration process of the iterative ADP algorithm, the error in each iteration step is accumulated and affects the iterative performance index function in the next iteration. Hence, the developed iterative ADP algorithm is highly robust to the system errors and the control disturbances, while its robustness to the iteration errors is relatively low. Therefore, we recommend increasing the training precision of the critic and action networks to guarantee the convergence of the iterative performance index function and to obtain an effective optimal control law.

Remark 6.2: In this paper, to guarantee the effectiveness of the developed iterative ADP method in genuine industrial applications, we first collected data from a real-world industrial process to train the model network. Hence, the model established in this paper can be regarded as representative of a genuine industrial system. Second, system errors, iteration errors and control disturbances are all considered in this paper. The optimal control problem with errors and disturbances is transformed into a two-person zero-sum control problem. We have designed an optimal control scheme using the value-iteration adaptive dynamic programming algorithm under the worst case of the errors and disturbances. The numerical results have shown the effectiveness of the developed method. Furthermore, the mass errors between the input and the output show that the input and output masses are equivalent, which confirms the correctness of the developed method.
Hence, the developed optimal control scheme can be regarded as representative of the optimal control procedure for a genuine industrial application.

VII. CONCLUSION

In this paper, an effective iterative ADP algorithm is established to solve optimal tracking control problems for coal gasification systems. Using the input-state-output data of the system, NNs are used to approximate the system model, the coal quality and the reference control, respectively, so that a mathematical model of the coal gasification process is unnecessary. Considering the system errors of the NNs and the control disturbance, the optimal tracking control problem is transformed into a two-person zero-sum optimal regulation control problem. An iterative ADP algorithm is then established to obtain the optimal control law, where the approximation errors in each iteration are considered. Convergence analysis is given to guarantee that the performance index function converges to a finite neighborhood of the optimal performance index function. Finally, numerical results are displayed to illustrate the performance of the developed algorithm.

REFERENCES

[1] I. B. Matveev, V. E. Messerle, and A. B. Ustimenko, "Investigation of plasma-aided bituminous coal gasification," IEEE Trans. Plasma Sci., vol. 37, no. 4, pp. 580–585, Apr. 2009.
[2] N. Abani and A. F. Ghoniem, "Large eddy simulations of coal gasification in an entrained flow gasifier," Fuel, vol. 104, pp. 664–680, Feb. 2013.
[3] P. Ruprecht, W. Schafer, and P. Wallace, "A computer model of entrained coal gasification," Fuel, vol. 67, no. 6, pp. 739–742, 1988.
[4] S. I. Serbin and I. B. Matveev, "Theoretical investigations of the working processes in a plasma coal gasification system," IEEE Trans. Plasma Sci., vol. 38, no. 12, pp. 3300–3305, Dec. 2010.
[5] J. Xu, L. Qiao, and J. Gore, "Multiphysics well-stirred reactor modeling of coal gasification under intense thermal radiation," Int. J. Hydrogen Energy, vol. 38, no. 17, pp. 7007–7015, June 2013.
[6] R. Guo, G. Cheng, and Y. Wang, "Texaco coal gasification quality prediction by neural estimator based on dynamic PCA," in Proc. IEEE Int. Conf. Mechatronics Autom., Luoyang, China, June 2006, pp. 2241–2246.
[7] K. Kostur and J. Kacur, "Developing of optimal control system for UCG," in Proc. 13th Int. Carpathian Control Conf., Podbanske, Slovak Republic, May 2012, pp. 347–352.
[8] J. A. Wilson, M. Chew, and W. E. Jones, "State estimation-based control of a coal gasifier," IEE Proc. Control Theory Appl., vol. 153, no. 3, pp. 268–276, 2006.
[9] Y. Chen, Z. Li, and M. Zhou, "Optimal supervisory control of flexible manufacturing systems by Petri nets: A set classification approach," IEEE Trans. Autom. Sci. Eng., doi: 10.1109/TASE.2013.2241762.
[10] Q. S. Jia, "An adaptive sampling algorithm for simulation-based optimization with descriptive complexity preference," IEEE Trans. Autom. Sci. Eng., vol. 8, no. 4, pp. 720–731, Apr. 2011.
[11] X. Jin, S. J. Hu, J. Ni, and G. Xiao, "Assembly strategies for remanufacturing systems with variable quality returns," IEEE Trans. Autom. Sci. Eng., vol. 10, no. 1, pp. 76–85, Jan. 2013.
[12] Q. Kang, M. Zhou, J. An, and Q. Wu, "Swarm intelligence approaches to optimal power flow problem with distributed generator failures in power networks," IEEE Trans. Autom. Sci. Eng., vol. 10, no. 2, pp. 343–353, Feb. 2013.
[13] O. Wigstrom, B. Lennartson, A. Vergnano, and C. Breitholtz, "High-level scheduling of energy optimal trajectories," IEEE Trans. Autom. Sci. Eng., vol. 10, no. 1, pp. 57–64, Jan. 2013.
[14] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25–38, 1977.
[15] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67–95.
[16] H. He, Z. Ni, and J. Fu, "A three-network architecture for on-line learning and optimization based on adaptive dynamic programming," Neurocomputing, vol. 78, no. 1, pp. 3–13, 2012.
[17] A. Heydari and S. N. Balakrishnan, "Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics," IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 1, pp. 145–157, Jan. 2013.
[18] W. S. Lin and J. W. Sheu, "Optimization of train regulation and energy usage of metro lines using an adaptive-optimal-control algorithm," IEEE Trans. Autom. Sci. Eng., vol. 8, no. 4, pp. 855–864, Apr. 2011.
[19] D. Liu, Y. Zhang, and H. Zhang, "A self-learning call admission control scheme for CDMA cellular networks," IEEE Trans. Neural Networks, vol. 16, no. 5, pp. 1219–1228, Sep. 2005.
[20] D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997–1007, Sep. 1997.
[21] P. J. Werbos, "Intelligence in the brain: A theory of how it works and how to build it," Neural Netw., vol. 22, no. 3, pp. 200–212, 2009.
[22] X. Xu, Z. Hou, C. Lian, and H. He, "Online learning control using adaptive critic designs with sparse kernel machines," IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 5, pp. 762–775, May 2013.
[23] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, "A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems," Automatica, vol. 49, no. 1, pp. 82–92, Jan. 2013.
[24] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2012.
[25] F. Y. Wang, N. Jin, D. Liu, and Q. Wei, "Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound," IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 24–36, Jan. 2011.
[26] Q. Wei and D. Liu, "Numerical adaptive learning control scheme for discrete-time nonlinear systems," IET Control Theory Appl., vol. 7, no. 11, pp. 1472–1486, July 2013.
[27] Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 6, pp. 913–928, June 2013.
[28] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Control Syst. Mag., vol. 32, no. 6, pp. 76–105, 2012.
[29] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, May 2005.
[30] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 2, pp. 140–153, May 2002.
[31] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[32] B. Lincoln and A. Rantzer, "Relaxing dynamic programming," IEEE Trans. Autom. Control, vol. 51, no. 8, pp. 1249–1260, Aug. 2006.
[33] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 937–942, Jul. 2008.
[34] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time based policy update," IEEE Trans. Neural Netw. Learning Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[35] D. Liu, H. Javaherian, O. Kovalenko, and T. Huang, "Adaptive critic learning techniques for engine torque and air-fuel ratio control," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 988–993, Aug. 2008.
[36] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628–634, Jul. 2012.
[37] D. Liu, Y. Huang, D. Wang, and Q. Wei, "Neural network observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming," Int. J. Control, vol. 86, no. 9, pp. 1554–1566, Sep. 2013.
[38] Q. Wei, H. Zhang, and J. Dai, "Model-free multiobjective approximate dynamic programming for discrete-time nonlinear systems with general performance index functions," Neurocomputing, vol. 72, no. 7–9, pp. 1839–1848, 2009.
[39] Q. Wei and D. Liu, "An iterative ε-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state," Neural Netw., vol. 32, pp. 236–244, 2012.
[40] H. Zhang, Q. Wei, and D. Liu, "An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games," Automatica, vol. 47, no. 1, pp. 207–214, Jan. 2011.
[41] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779–789, Apr. 2013.
[42] N. Gopalsami and A. C. Raptis, "Acoustic velocity and attenuation measurements in thin rods with application to temperature profiling in coal gasification systems," IEEE Trans. Sonics Ultrasonics, vol. 31, no. 1, pp. 32–39, Jan. 1984.
[43] Q. Yang and S. Jagannathan, "Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 377–390, Apr. 2012.
[44] T. Başar and P. Bernhard, Optimal Control and Related Minimax Design Problems. Boston, MA, USA: Birkhäuser, 1995.
[45] J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264–276, Mar. 2001.

Qinglai Wei (M'11) received the B.S. degree in automation, the M.S. degree in control theory and control engineering, and the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2002, 2005, and 2008, respectively.
He was a Postdoctoral Fellow with the Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 2009 to 2011. He is currently an Associate Professor with The State Key Laboratory of Management and Control for Complex Systems. His current research interests include neural networks-based control, adaptive dynamic programming, optimal control, nonlinear systems, and their industrial applications.

Derong Liu (S'91–M'94–SM'96–F'05) received the B.S. degree in mechanical engineering from the East China Institute of Technology (now Nanjing University of Science and Technology), Nanjing, China, in 1982, the M.S. degree in automatic control theory and applications from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1987, and the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994.
Dr. Liu was a Product Design Engineer with China North Industries Corporation, Jilin, China, from 1982 to 1984. He was an Instructor with the Graduate School of the Chinese Academy of Sciences, Beijing, from 1987 to 1990. He was a Staff Fellow with the General Motors Research and Development Center, Warren, MI, USA, from 1993 to 1995. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA, from 1995 to 1999. He joined the University of Illinois at Chicago, Chicago, IL, USA, in 1999, and became a Full Professor of electrical and computer engineering and computer science in 2006. He was selected for the "100 Talents Program" by the Chinese Academy of Sciences in 2008. He has published 14 books (six research monographs and eight edited volumes).
Dr. Liu is a member of Eta Kappa Nu and a fellow of the INNS. He received the Michael J. Birck Fellowship from the University of Notre Dame in 1990, the Harvey N. Davis Distinguished Teaching Award from the Stevens Institute of Technology in 1997, the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 1999, the University Scholar Award from the University of Illinois in 2006, and the Overseas Outstanding Young Scholar Award from the National Natural Science Foundation of China in 2008. He was an Associate Editor of Automatica from 2006 to 2009. He serves as an Associate Editor of Neurocomputing, the International Journal of Neural Systems, Soft Computing, Neural Computing and Applications, the Journal of Control Science and Engineering, and Science in China Series F: Information Sciences. He was an elected member of the Board of Governors of the International Neural Network Society from 2010 to 2012. He is a Governing Board Member of Asia Pacific Neural Network Assembly. He was a member of the Conference Editorial Board of the IEEE Control Systems Society from 1995 to 2000. He was an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: FUNDAMENTAL THEORY AND APPLICATIONS from 1997 to 1999, the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2001 to 2003, the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2004 to 2009, the IEEE Computational Intelligence Magazine from 2006 to 2009, and the IEEE Circuits and Systems Magazine from 2008 to 2009. He was the Letters Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2006 to 2008. He was the Founding Editor of the IEEE COMPUTATIONAL INTELLIGENCE SOCIETY'S ELECTRONIC LETTER from 2004 to 2009. Currently, he is the Editor-in-Chief of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS and an Associate Editor of the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY and the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. He is the General Chair of the 2014 IEEE World Congress on Computational Intelligence, Beijing, China. He was an elected AdCom member of the IEEE Computational Intelligence Society from 2006 to 2008. He is the Chair of the IEEE CIS Beijing Chapter.
