IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 11, NO. 4, OCTOBER 2014

Abstract—In this paper, we establish a new data-based iterative optimal learning control scheme for discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) approach and apply the developed control scheme to solve a coal gasification optimal tracking control problem. According to the system data, neural networks (NNs) are used to construct the dynamics of the coal gasification process, the coal quality, and the reference control, respectively, where the mathematical model of the system is unnecessary. The approximation errors from the neural network construction of the disturbance and the controls are both considered. Via system transformation, the optimal tracking control problem with approximation errors and disturbances is effectively transformed into a two-person zero-sum optimal control problem. A new iterative ADP algorithm is then developed to obtain the optimal control laws for the transformed system. A convergence property is developed to guarantee that the performance index function converges to a finite neighborhood of the optimal performance index function, and the convergence criterion is also obtained. Finally, numerical results are given to illustrate the performance of the present method.

Note to Practitioners—Dynamic programming is a useful technique for solving optimal control problems. However, in many cases it is computationally difficult to apply due to the backward-in-time calculation or the “curse of dimensionality.” ADP is an effective tool for solving optimal control problems forward-in-time. For most ADP algorithms, the accurate system model, the accurate iterative control, and the accurate iterative performance index function are required to obtain the optimal control law. These iterative ADP algorithms can be called “accurate iterative ADP algorithms.” For many real-world control systems, such as coal gasification systems, the system model is very difficult to construct, and the optimal control and optimal performance index function cannot be obtained analytically. This makes the accurate iterative ADP algorithms difficult to apply to real-world industrial systems. In this paper, based on the system data, NNs are used to overcome these difficulties, where the approximation errors and the control disturbance are both considered. A system transformation is introduced that transforms the tracking control system into a two-person zero-sum control system. An iterative ADP algorithm with iteration errors is then established to obtain the optimal control scheme, where the convergence proof is developed.

Index Terms—Adaptive dynamic programming, coal gasification, data-based control, finite approximation errors, neural networks, optimal tracking control.

I. INTRODUCTION

COAL is the world’s most abundant energy resource and the cheapest fossil fuel. The development of coal gasification technologies, which is a primary component of the carbon-based process industries, is of primary importance for dealing with the limited petroleum reserves [1]. Hence, optimal control of coal gasification is a key problem in developing the carbon-based process industries. To describe the process of coal gasification, many discussions focus on coal gasification modeling approaches [2]–[5]. The established models are usually very complex with high nonlinearities. To simplify the controller design, the traditional control method for the coal gasification process adopts the feedback linearization control method [6]–[8]. However, a controller designed by the feedback linearization technique is only effective in the neighborhood of the equilibrium point. When the required operating range is large, the nonlinearities in the system cannot be properly compensated by using a linear model. Therefore, it is necessary to study an optimal control approach for the original nonlinear system [9]–[13]. But to the best of our knowledge, there are no discussions on the optimal controller design for nonlinear coal gasification systems. One of the difficulties is the complexity of coal gasification systems, which makes the expression of the optimal control law very complex; generally, the optimal control law cannot be expressed analytically. Another difficulty in obtaining the optimal control law lies in solving the time-varying Hamilton–Jacobi–Bellman (HJB) equation, which is usually too difficult to solve analytically. On the other hand, in real-world control systems of coal gasification processes, the coal quality is also unknown to the control system. This makes it more difficult to obtain the optimal control law of coal gasification systems. To overcome these difficulties, a new optimal control scheme must be established.

Adaptive dynamic programming (ADP), proposed by Werbos [14], [15], has played an important role as a way

Manuscript received June 21, 2013; accepted September 01, 2013. Date of publication November 06, 2013; date of current version October 02, 2014. This paper was recommended for publication by Associate Editor H. Wang and Editor M. C. Zhou upon evaluation of the reviewers’ comments. This work was supported in part by the National Natural Science Foundation of China under Grant 61034002, Grant 61233001, Grant 61273140, and Grant 61374105, in part by the Beijing Natural Science Foundation under Grant 4132078, and in part by the Early Career Development Award of SKLMCCS.

The authors are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: qinglai.wei@ia.ac.cn; derong.liu@ia.ac.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASE.2013.2284545
WEI AND LIU: ADP FOR OPTIMAL TRACKING CONTROL OF UNKNOWN NONLINEAR SYSTEMS WITH APPLICATION TO COAL GASIFICATION 1021
The other phase is the water-gas shift reaction, which is reversible and mildly exothermic:

CO + H2O ⇌ CO2 + H2   (2)

where CO is carbon monoxide, CO2 is carbon dioxide, and H2O is water.

The coal combustion reaction is instantaneous and nonreversible. The water-gas shift reaction is reversible, and the reaction is strongly dependent on the reaction temperature. The reaction equilibrium coefficient, as a function of the reaction temperature, satisfies the following empirical formula [3]:

(3)

Such quantities are also difficult to obtain for the unknown system. Furthermore, the coal quality is also an unknown and uncontrollable parameter. Thus, new methods must be established to solve these problems.

III. DATA-BASED MODELING AND PROPERTIES

In this section, three-layer back-propagation (BP) NNs are introduced to approximate the system (6). We also use NNs to solve the reference control and obtain the coal quality. With the number of hidden layer neurons, the weight matrix between the input layer and the hidden layer, the weight matrix between the hidden layer and the output layer, and the input vector of the NN all given, the output of the three-layer NN is represented by

(5)

Let the estimates of the ideal weight matrices be given. Then, we define the system identification errors as

(9)

With the identification error dynamics (9) and the weight tuning rules in (10), we can obtain the weight estimation error dynamics. Applying the Cauchy–Schwarz inequality, bounds on these error dynamics can be obtained. The tuning gains are selected according to the system data.

Different from the system modeling, the coal quality data cannot generally be detected and identified in the real-time coal gasification process. This means that the coal quality data can only be obtained offline. Noticing this feature, an iterative training method for the neural networks can be adopted. According to (6), we can solve the coal quality function, which is expressed as

(12)

Usually, (12) is a highly nonlinear function and its analytical expression is nearly impossible to obtain. Thus, a BP NN is established to identify the coal quality function. Let the number of hidden layer neurons and the ideal weights be given. The NN representation of (12) can be written as

(13)

where the reconstruction error is bounded. The NN coal quality function is constructed as

(14)

with the estimated coal quality function and the estimated weight matrix. According to (12), we notice that solving the coal quality function needs the corresponding system data. As we adopt offline data to train the NN, the corresponding data can be obtained. Defining the identification error, the weights are updated by gradient descent as

(15)

with a given learning rate.

Next, we solve the reference control using an NN. In this paper, as we aim to design a state feedback controller to make the system state track the desired one, according to the state equation in (6), we approximate the reference control function, which is expressed as

(16)
(17)

The NN reference control is constructed as

(18)

with the estimated reference control and the estimated weight matrix. Define the identification error as

(19)

Similarly, the weights are updated as

(20)

with a given learning rate.

Next, we give the convergence properties of the coal quality network and the reference control network.

Theorem 3.2: Let the identification schemes (14) and (18) be used to identify the coal quality function and the reference control in (12) and (17), respectively. Let the NN weights be updated by (15) and (20), respectively. If the inequalities

(21)

hold, then the weight error matrices both converge to zero as the training index tends to infinity.

Proof: Consider a Lyapunov function candidate built from the weight errors. As the activation functions are both bounded, their bounds can be defined accordingly. The difference of the Lyapunov function candidate is given by

(22)
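The three-layer BP NN output in (5) and the gradient-descent weight tuning in (15) and (20) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `V` (input-to-hidden weights), `W` (hidden-to-output weights), the sigmoid activation, and the learning rate `eta` are assumptions, since the paper's own symbols are lost in this extraction.

```python
import numpy as np

# A minimal sketch of a three-layer BP NN identifier, cf. (5), with
# gradient-descent weight tuning as in (15) and (20). V, W, sigmoid, and
# eta are illustrative assumptions, not the paper's notation.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNN:
    def __init__(self, n_in, n_hidden, n_out, eta=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.V = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input-to-hidden
        self.W = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden-to-output
        self.eta = eta  # learning rate

    def forward(self, x):
        h = sigmoid(x @ self.V)   # hidden-layer output
        return h, h @ self.W      # hidden output and network output

    def update(self, x, y_target):
        # one gradient-descent step on the squared identification error
        h, y_hat = self.forward(x)
        e = y_hat - y_target                              # identification error
        dW = np.outer(h, e)                               # output-layer gradient
        dV = np.outer(x, (self.W @ e) * h * (1.0 - h))    # back-propagated gradient
        self.W -= self.eta * dW
        self.V -= self.eta * dV
        return float(e @ e)

# Usage: identify a simple nonlinear map from sampled input-output data.
net = ThreeLayerNN(n_in=2, n_hidden=8, n_out=1)
rng = np.random.default_rng(1)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0, size=2)
    err = net.update(x, np.array([np.sin(x[0]) * x[1]]))
```

The same structure serves the model network, the coal quality network, and the reference control network; only the training data differ, and, as noted above, the coal quality network must be trained offline.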
As the relevant terms are bounded, there exist constants that satisfy

(23)

Then, (22) can be written as

(24)

where the constants are defined from the bounds above. As the weight estimates are bounded and the activation functions are smooth, the corresponding terms are bounded, and their bounds can be defined accordingly. Then, (30) can be written as

(31)

Let the tracking error be defined as

(32)

with respect to the desired state trajectory. Let

(33)

where the neural-network-generated reference control trajectory is expressed by (26). According to (27), (29), and (31), we can get

(34)
(35)

where the combined error terms are defined accordingly. Thus, (31) can be written as

(36)

The system errors are still unknown, and it is difficult to design the optimal tracking control system with unknown system errors. Thus, an effective system transformation is presented in this section.

In order to transform the system, for the desired system state, a desired reference control (desired control for brief) can be obtained. Taking the desired state trajectory into (16), we can obtain the reference control trajectory

(37)

From (37), we can see that the nonlinear tracking control system (6) is transformed into a regulation system, where the system errors and the control fluctuation are transformed into an unknown bounded system disturbance.

B. Derivation of the Iterative ADP Algorithm With System Errors, Iteration Errors and Control Disturbance

In this section, our goal is to obtain an optimal control that makes the tracking error converge to zero under the system disturbance. As the system disturbance is unknown, it makes the design of the optimal controller very difficult. In [44], the optimal control problem for system (37) was transformed into a two-person zero-sum optimal control problem, where the system disturbance was defined as a control variable. The optimal control law is obtained under the worst case of the disturbance (the disturbance control maximizes the performance index function). Inspired by [44], we define the system disturbance as a disturbance control of the system, and the two controls of system (37) are designed to optimize the following quadratic performance index function:

(38)
(39)

Let the quadratic form in (39) be the utility function. In this paper, we assume that the utility function is positive. Generally, the system errors are small. This makes the system disturbance small and the utility function larger than zero. If the disturbances are large, we can reduce the weight matrix on the disturbance or enlarge the weight matrices on the tracking error and the control. Hence, the assumption can be guaranteed.
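The quadratic zero-sum performance index in (38) can be sketched as follows. The utility is taken here as U(e, u, w) = eᵀQe + uᵀRu − wᵀSw with positive definite weight matrices; the names `Q`, `R`, `S` are assumptions, since the paper's own symbols are lost in this extraction.

```python
import numpy as np

# A hedged sketch of the quadratic zero-sum performance index (38): the
# utility is taken as U(e, u, w) = e'Qe + u'Ru - w'Sw, where Q, R, S are
# assumed positive definite weight matrices (names are illustrative).
def utility(e, u, w, Q, R, S):
    # tracking-error cost plus control cost minus disturbance reward
    return float(e @ Q @ e + u @ R @ u - w @ S @ w)

def performance_index(errors, controls, disturbances, Q, R, S):
    # finite-horizon sum of the utility along a trajectory; the tracking
    # control minimizes this sum while the disturbance control maximizes it
    return sum(utility(e, u, w, Q, R, S)
               for e, u, w in zip(errors, controls, disturbances))

# Usage on a short two-step trajectory.
Q, R, S = np.eye(2), np.eye(1), 4.0 * np.eye(1)
es = [np.array([1.0, 0.0]), np.array([0.5, 0.0])]
us = [np.array([0.2]), np.array([0.1])]
ws = [np.array([0.1]), np.array([0.0])]
J = performance_index(es, us, ws, Q, R, S)  # -> 1.26 (up to rounding)
```

Note how the sign convention matches the assumption discussed above: shrinking `S` or enlarging `Q` and `R` keeps the utility positive when the disturbance grows.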
According to the principle of optimality, the optimal performance index function satisfies the discrete-time HJI equation

(40)

Define the laws of optimal controls accordingly. Hence, the HJI equation (40) can be written in terms of these optimal control laws.

We can see that if we want to obtain the optimal control laws, we must obtain the optimal performance index function. Generally, the optimal performance index function is unknown before all the controls are considered. This makes the HJI equation generally unsolvable. In this paper, a new iterative ADP algorithm with system and approximation errors is developed to overcome these difficulties. In the present iterative ADP algorithm, the performance index function and the control law are updated by iterations, with the iteration index increasing from 0 to infinity. Let the initial performance index function be zero. Then the iterative control laws can be computed as

(41)

C. Properties of the Iterative ADP Algorithm With System Errors, Iteration Errors and Control Disturbance

For the two-person zero-sum iterative ADP algorithm (41)–(43), as the iteration errors are unknown, the properties of the iterative performance index function and the iterative control laws are very difficult to analyze. In [41], for nonlinear systems with a single controller, a new “error bound” analysis method is proposed to prove the convergence of the iterative performance index function. In this paper, we will give the “error bound” convergence analysis of the iterative performance index functions for nonlinear two-person zero-sum optimal control problems.

For each iteration, define a new iterative performance index function as

(44)

where

(45)

is the accurate iterative control law. According to (43), there exists a finite constant that makes

(46)

hold uniformly. Hence, we can give the following theorem.

Theorem 4.1: Let the new iterative performance index function be expressed as in (44) and the iterative control law as in (43). Let the system be expressed as in (37), and let the constant be the one that makes (46) hold.

This shows that (47) holds for the initial iteration index. Assume that (47) holds for some iteration index; then, for the next index, we have

(50)

According to (46), letting the iteration index tend to infinity, we have

(51)
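The zero-sum value iteration behind (41)–(43) can be sketched on a toy problem: starting from a zero initial performance index function, each iteration performs a min over the control and a max over the disturbance control of the utility plus the previous performance index function. The scalar dynamics, grids, and utility weights below are illustrative assumptions; the paper's coal gasification model is far more complex and is itself identified by NNs.

```python
import numpy as np

# A sketch of the zero-sum value iteration behind (41)-(43) on an assumed
# toy regulation system e_next = 0.8*e + u + w (not the paper's model).
E = np.linspace(-1.0, 1.0, 41)   # discretized tracking-error grid
U = np.linspace(-0.5, 0.5, 21)   # control grid
W = np.linspace(-0.2, 0.2, 11)   # disturbance-control grid
q, r, s = 1.0, 1.0, 5.0          # quadratic utility weights (assumed)

def step(e, u, w):
    # toy dynamics, clipped to stay on the grid
    return np.clip(0.8 * e + u + w, E[0], E[-1])

V = np.zeros_like(E)             # initial performance index function V_0 = 0
for _ in range(30):              # iteration index i = 0, 1, ...
    V_new = np.empty_like(V)
    for k, e in enumerate(E):
        # the control minimizes while the disturbance control maximizes
        V_new[k] = min(
            max(q * e * e + r * u * u - s * w * w
                + np.interp(step(e, u, w), E, V) for w in W)
            for u in U)
    V = V_new
```

With the toy weights above the utility stays nonnegative at the worst-case disturbance, so the iterates remain nonnegative and settle toward a fixed performance index function; the paper's analysis shows that, with iteration errors, one can only guarantee convergence to a finite neighborhood of the optimum.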
Then, we define the error function for the critic network. The objective function to be minimized in the critic network training is

(54)

By the gradient descent rule, we can obtain

(53)

The weight updating algorithm for the action network is similar to the one for the critic network.

Let the iterative performance index function and the iterative control be approximated by the critic and action networks, respectively. The weight convergence property of the neural networks is shown in the following theorem.

Theorem 5.1: Let the target performance index function and the target iterative control law be expressed by the critic and action network representations.

Fig. 6. The trajectories of control and system output. (a) Coal input trajectory. (b)–(c) Input trajectories. (d) CO output trajectory.

Fig. 10. The trajectories of control and system output. (a) Coal input trajectory. (b)–(c) Input trajectories. (d) CO output trajectory.

Fig. 11. The trajectories of system output. (a)–(c) Output trajectories. (d) Char output trajectory.

Fig. 12. The control disturbance and the input-output mass error. (a)–(c) Control disturbances. (d) The error between the input and output mass.

The training precisions of the critic and action networks are kept at the given level. The convergence trajectory of the iterative performance index function is shown in Fig. 13. The optimal state trajectory is shown in Fig. 14. The corresponding control trajectories and system output trajectories are shown in Figs. 15 and 16, respectively. From the numerical results, we can see that under the disturbance of the control input, we can still obtain the optimal tracking control of the system, which shows the effectiveness and robustness of the developed iterative ADP method. To verify the correctness of the model and the developed method, the mass error between the input and output is given in Fig. 12(d).

From the numerical results, we can see that when the system errors and the disturbance of the control input are enlarged, the developed iterative ADP algorithm is still effective in finding the optimal tracking control scheme for the system. On the other hand, if we enlarge the iteration errors, the control property is quite different. Let the disturbance of the control be enlarged. Let the training precisions for the model network, the coal quality network, and the reference control network be kept at the previous level, and change the training precisions of the critic and action networks. The convergence trajectory of the iterative performance index function is shown in Fig. 17(a), where we can see that the iterative performance index function is no longer convergent. The corresponding state trajectory is shown in Fig. 17(b), where we notice that the desired state is not achieved.

Remark 6.1: From the numerical results, we can see that the developed iterative ADP algorithm permits large system errors and control disturbances in achieving the optimal tracking control law of the system. The admissible iteration errors for the iterative ADP algorithm, however, are relatively small. We can find the reason according to the theoretical analysis. From (39), for
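The critic training step behind (53) and (54) can be sketched as follows: the critic weights are tuned by gradient descent on the squared error between the critic output and the target performance index. For brevity a critic linear in hand-picked quadratic features is assumed here; the paper itself uses BP NN critic and action networks, and the feature choice, weight vector `wc`, and step size `alpha` below are illustrative assumptions.

```python
import numpy as np

# A sketch of the critic update behind (53)-(54): gradient descent on the
# squared error between the critic output and the target performance
# index. The linear-in-features critic is an assumption for brevity.
def features(e):
    # quadratic basis in the tracking error (an illustrative choice)
    return np.array([e[0] ** 2, e[0] * e[1], e[1] ** 2])

def critic_update(wc, e, target, alpha=0.1):
    # error function e_c = V_hat(e) - target; objective 0.5 * e_c**2
    e_c = wc @ features(e) - target
    return wc - alpha * e_c * features(e)   # gradient-descent rule

# Usage: fit the critic to a few (error state, target index) samples.
wc = np.zeros(3)
samples = [(np.array([1.0, 0.0]), 2.0),
           (np.array([0.0, 1.0]), 1.0),
           (np.array([1.0, 1.0]), 3.5)]
for _ in range(1000):
    for e, tgt in samples:
        wc = critic_update(wc, e, tgt)
```

The training precision mentioned in the numerical results corresponds to the tolerance at which such sweeps are stopped; loosening it is what injects the iteration errors whose effect Fig. 17 illustrates.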
Fig. 15. The trajectories of control and system output. (a) Coal input trajectory. (b)–(c) Input trajectories. (d) CO output trajectory.

Fig. 16. The trajectories of system output. (a)–(c) Output trajectories. (d) Char output trajectory.

Fig. 17. The trajectory of the iterative performance index function. (a) The trajectory of the iterative performance index function. (b) The trajectory of the state.
Hence, we can say that the developed optimal control scheme is effectively a representative of the optimal control procedure for a genuine industrial application.

VII. CONCLUSION

In this paper, an effective iterative ADP algorithm is established to solve optimal tracking control problems for coal gasification systems. Using the input-state-output data of the system, NNs are used to approximate the system model, the coal quality, and the reference control, respectively, and the mathematical model of the coal gasification is unnecessary. Considering the system errors of the NNs and the control disturbance, the optimal tracking control problem is transformed into a two-person zero-sum optimal regulation control problem. An iterative ADP algorithm is then established to obtain the optimal control law, where the approximation errors in each iteration are considered. Convergence analysis is given to guarantee that the performance index function converges to a finite neighborhood of the optimal performance index function. Finally, numerical results are displayed to illustrate the performance of the developed algorithm.

REFERENCES

[1] I. B. Matveev, V. E. Messerle, and A. B. Ustimenko, “Investigation of plasma-aided bituminous coal gasification,” IEEE Trans. Plasma Sci., vol. 37, no. 4, pp. 580–585, Apr. 2009.
[2] N. Abani and A. F. Ghoniem, “Large eddy simulations of coal gasification in an entrained flow gasifier,” Fuel, vol. 104, pp. 664–680, Feb. 2013.
[3] P. Ruprecht, W. Schafer, and P. Wallace, “A computer model of entrained coal gasification,” Fuel, vol. 67, no. 6, pp. 739–742, 1988.
[4] S. I. Serbin and I. B. Matveev, “Theoretical investigations of the working processes in a plasma coal gasification system,” IEEE Trans. Plasma Sci., vol. 38, no. 12, pp. 3300–3305, Dec. 2010.
[5] J. Xu, L. Qiao, and J. Gore, “Multiphysics well-stirred reactor modeling of coal gasification under intense thermal radiation,” Int. J. Hydrogen Energy, vol. 38, no. 17, pp. 7007–7015, June 2013.
[6] R. Guo, G. Cheng, and Y. Wang, “Texaco coal gasification quality prediction by neural estimator based on dynamic PCA,” in Proc. IEEE Int. Conf. Mechatronics Autom., Luoyang, China, June 2006, pp. 2241–2246.
[7] K. Kostur and J. Kacur, “Developing of optimal control system for UCG,” in Proc. 13th Int. Carpathian Control Conf., Podbanske, Slovak Republic, May 2012, pp. 347–352.
[8] J. A. Wilson, M. Chew, and W. E. Jones, “State estimation-based control of a coal gasifier,” IEE Proc. Control Theory Appl., vol. 153, no. 3, pp. 268–276, 2006.
[9] Y. Chen, Z. Li, and M. Zhou, “Optimal supervisory control of flexible manufacturing systems by Petri nets: A set classification approach,” IEEE Trans. Autom. Sci. Eng., doi: 10.1109/TASE.2013.2241762.
[10] Q. S. Jia, “An adaptive sampling algorithm for simulation-based optimization with descriptive complexity preference,” IEEE Trans. Autom. Sci. Eng., vol. 8, no. 4, pp. 720–731, Apr. 2011.
[11] X. Jin, S. J. Hu, J. Ni, and G. Xiao, “Assembly strategies for remanufacturing systems with variable quality returns,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 1, pp. 76–85, Jan. 2013.
[12] Q. Kang, M. Zhou, J. An, and Q. Wu, “Swarm intelligence approaches to optimal power flow problem with distributed generator failures in power networks,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 2, pp. 343–353, Feb. 2013.
[13] O. Wigstrom, B. Lennartson, A. Vergnano, and C. Breitholtz, “High-level scheduling of energy optimal trajectories,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 1, pp. 57–64, Jan. 2013.
[14] P. J. Werbos, “Advanced forecasting methods for global crisis warning and models of intelligence,” General Systems Yearbook, vol. 22, pp. 25–38, 1977.
[15] P. J. Werbos, “A menu of designs for reinforcement learning over time,” in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67–95.
[16] H. He, Z. Ni, and J. Fu, “A three-network architecture for on-line learning and optimization based on adaptive dynamic programming,” Neurocomputing, vol. 78, no. 1, pp. 3–13, 2012.
[17] A. Heydari and S. N. Balakrishnan, “Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics,” IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 1, pp. 145–157, Jan. 2013.
[18] W. S. Lin and J. W. Sheu, “Optimization of train regulation and energy usage of metro lines using an adaptive-optimal-control algorithm,” IEEE Trans. Autom. Sci. Eng., vol. 8, no. 4, pp. 855–864, Apr. 2011.
[19] D. Liu, Y. Zhang, and H. Zhang, “A self-learning call admission control scheme for CDMA cellular networks,” IEEE Trans. Neural Netw., vol. 16, no. 5, pp. 1219–1228, Sep. 2005.
[20] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997–1007, Sep. 1997.
[21] P. J. Werbos, “Intelligence in the brain: A theory of how it works and how to build it,” Neural Netw., vol. 22, no. 3, pp. 200–212, 2009.
[22] X. Xu, Z. Hou, C. Lian, and H. He, “Online learning control using adaptive critic designs with sparse kernel machines,” IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 5, pp. 762–775, May 2013.
[23] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, “A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica, vol. 49, no. 1, pp. 82–92, Jan. 2013.
[24] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York, NY, USA: Wiley, 2012.
[25] F. Y. Wang, N. Jin, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 24–36, Jan. 2011.
[26] Q. Wei and D. Liu, “Numerical adaptive learning control scheme for discrete-time nonlinear systems,” IET Control Theory Appl., vol. 7, no. 11, pp. 1472–1486, July 2013.
[27] Z. Ni, H. He, and J. Wen, “Adaptive learning in tracking control based on the dual critic network design,” IEEE Trans. Neural Netw. Learning Syst., vol. 24, no. 6, pp. 913–928, June 2013.
[28] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Control Syst. Mag., vol. 32, no. 6, pp. 76–105, 2012.
[29] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779–791, May 2005.
[30] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 2, pp. 140–153, May 2002.
[31] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[32] B. Lincoln and A. Rantzer, “Relaxing dynamic programming,” IEEE Trans. Autom. Control, vol. 51, no. 8, pp. 1249–1260, Aug. 2006.
[33] H. Zhang, Q. Wei, and Y. Luo, “A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 937–942, Jul. 2008.
[34] T. Dierks and S. Jagannathan, “Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time based policy update,” IEEE Trans. Neural Netw. Learning Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[35] D. Liu, H. Javaherian, O. Kovalenko, and T. Huang, “Adaptive critic learning techniques for engine torque and air-fuel ratio control,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 988–993, Aug. 2008.
[36] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, “Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming,” IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628–634, Jul. 2012.
[37] D. Liu, Y. Huang, D. Wang, and Q. Wei, “Neural network observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming,” Int. J. Control, vol. 86, no. 9, pp. 1554–1566, Sep. 2013.
[38] Q. Wei, H. Zhang, and J. Dai, “Model-free multiobjective approximate dynamic programming for discrete-time nonlinear systems with general performance index functions,” Neurocomputing, vol. 72, no. 7–9, pp. 1839–1848, 2009.
[39] Q. Wei and D. Liu, “An iterative ε-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state,” Neural Netw., vol. 32, pp. 236–244, 2012.
[40] H. Zhang, Q. Wei, and D. Liu, “An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games,” Automatica, vol. 47, no. 1, pp. 207–214, Jan. 2011.
[41] D. Liu and Q. Wei, “Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems,” IEEE Trans. Cybern., vol. 43, no. 2, pp. 779–789, Apr. 2013.
[42] N. Gopalsami and A. C. Raptis, “Acoustic velocity and attenuation measurements in thin rods with application to temperature profiling in coal gasification systems,” IEEE Trans. Sonics Ultrasonics, vol. 31, no. 1, pp. 32–39, Jan. 1984.
[43] Q. Yang and S. Jagannathan, “Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 377–390, Apr. 2012.
[44] T. Basar and P. Bernard, Optimal Control and Related Minimax Design Problems. Boston, MA, USA: Birkhauser, 1995.
[45] J. Si and Y.-T. Wang, “On-line learning control by association and reinforcement,” IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264–276, Mar. 2001.

Qinglai Wei (M’11) received the B.S. degree in automation, the M.S. degree in control theory and control engineering, and the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2002, 2005, and 2008, respectively.

He was a Postdoctoral Fellow with the Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 2009 to 2011. He is currently an Associate Professor with the State Key Laboratory of Management and Control for Complex Systems. His current research interests include neural-networks-based control, adaptive dynamic programming, optimal control, nonlinear systems, and their industrial applications.

Derong Liu (S’91–M’94–SM’96–F’05) received the B.S. degree in mechanical engineering from the East China Institute of Technology (now Nanjing University of Science and Technology), Nanjing, China, in 1982, the M.S. degree in automatic control theory and applications from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1987, and the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994.

Dr. Liu was a Product Design Engineer with China North Industries Corporation, Jilin, China, from 1982 to 1984. He was an Instructor with the Graduate School of the Chinese Academy of Sciences, Beijing, from 1987 to 1990. He was a Staff Fellow with the General Motors Research and Development Center, Warren, MI, USA, from 1993 to 1995. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA, from 1995 to 1999. He joined the University of Illinois at Chicago, Chicago, IL, USA, in 1999, and became a Full Professor of electrical and computer engineering and computer science in 2006. He was selected for the “100 Talents Program” by the Chinese Academy of Sciences in 2008. He has published 14 books (six research monographs and eight edited volumes).

Dr. Liu is a member of Eta Kappa Nu and a fellow of the INNS. He received the Michael J. Birck Fellowship from the University of Notre Dame in 1990, the Harvey N. Davis Distinguished Teaching Award from the Stevens Institute of Technology in 1997, the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 1999, the University Scholar Award from the University of Illinois in 2006, and the Overseas Outstanding Young Scholar Award from the National Natural Science Foundation of China in 2008. He was an Associate Editor of Automatica from 2006 to 2009. He serves as an Associate Editor of Neurocomputing, the International Journal of Neural Systems, Soft Computing, Neural Computing and Applications, the Journal of Control Science and Engineering, and Science in China Series F: Information Sciences. He was an elected member of the Board of Governors of the International Neural Network Society from 2010 to 2012. He is a Governing Board Member of the Asia Pacific Neural Network Assembly. He was a member of the Conference Editorial Board of the IEEE Control Systems Society from 1995 to 2000, an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: FUNDAMENTAL THEORY AND APPLICATIONS from 1997 to 1999, the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2001 to 2003, the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2004 to 2009, the IEEE Computational Intelligence Magazine from 2006 to 2009, and the IEEE Circuits and Systems Magazine from 2008 to 2009, and the Letters Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2006 to 2008. He was the Founding Editor of the IEEE COMPUTATIONAL INTELLIGENCE SOCIETY'S ELECTRONIC LETTER from 2004 to 2009. Currently, he is the Editor-in-Chief of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS and an Associate Editor of the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY and the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. He is the General Chair of the 2014 IEEE World Congress on Computational Intelligence, Beijing, China. He was an elected AdCom member of the IEEE Computational Intelligence Society from 2006 to 2008. He is the Chair of the IEEE CIS Beijing Chapter.