Studies in Systems, Decision and Control 167

Ding Wang · Chaoxu Mu

Adaptive Critic Control with Robust Stabilization
for Uncertain Nonlinear Systems

Studies in Systems, Decision and Control

Volume 167

Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Studies in Systems, Decision and Control” (SSDC) covers both new
developments and advances, as well as the state of the art, in the various areas of
broadly perceived systems, decision making and control, quickly, up to date and
with a high quality. The intent is to cover the theory, applications, and perspectives
on the state of the art and future developments relevant to systems, decision
making, control, complex processes and related areas, as embedded in the fields of
engineering, computer science, physics, economics, social and life sciences, as well
as the paradigms and methodologies behind them. The series contains monographs,
textbooks, lecture notes and edited volumes in systems, decision making and
control spanning the areas of Cyber-Physical Systems, Autonomous Systems,
Sensor Networks, Control Systems, Energy Systems, Automotive Systems,
Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace
Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power
Systems, Robotics, Social Systems, Economic Systems and others. Of particular
value to both the contributors and the readership are the short publication timeframe
and the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.

More information about this series at http://www.springer.com/series/13304


Ding Wang · Chaoxu Mu

Adaptive Critic Control
with Robust Stabilization
for Uncertain Nonlinear
Systems

Ding Wang
The State Key Laboratory of Management
and Control for Complex Systems
Institute of Automation, Chinese Academy of Sciences
Beijing, China

Chaoxu Mu
School of Electrical and Information Engineering
Tianjin University
Tianjin, China

ISSN 2198-4182 ISSN 2198-4190 (electronic)
Studies in Systems, Decision and Control
ISBN 978-981-13-1252-6 ISBN 978-981-13-1253-3 (eBook)
https://doi.org/10.1007/978-981-13-1253-3

Library of Congress Control Number: 2018948621

© Springer Nature Singapore Pte Ltd. 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

Machine learning is one of the core techniques of artificial intelligence. Among
them, reinforcement learning has experienced rapid development in recent years; it
generates strategies through learning during the interaction process between
machine and environment. As an important branch of reinforcement learning, the
adaptive critic technique has roots in dynamic programming and optimization
design. In order to effectively solve optimal control problems of complex dynamical
systems, the adaptive dynamic programming approach was proposed by combining
adaptive critics, dynamic programming, and artificial neural networks and attracted
extensive attention. In particular, great progress has been made on robust
adaptive critic control design with uncertainties and disturbances. It is now regarded
as a necessary avenue toward constructing intelligent learning systems and achieving
true brain-like intelligence.
This book by Dr. Ding Wang and Dr. Chaoxu Mu presents recent results on
learning-based robust adaptive critic control theory and methods, including
self-learning robust stabilization, data-driven robust optimal control, adaptive
trajectory tracking, event-driven robust control, and adaptive H∞ control design. It
covers a general analysis for adaptive critic systems in terms of stability, conver-
gence, optimality, and robustness, with emphasis on robustness of adaptive critic
control systems under uncertain environment. In addition, by considering several
practical plants, especially power systems, some application results are provided to
verify the effectiveness of adaptive critic-based robust and tracking control methods.
The book is likely to be of interest to researchers and practitioners as well as
graduate students in automation, computer science, and electrical engineering who
wish to learn core principles, methods, algorithms, and applications in the field of
robust adaptive critic control. It is beneficial to promote the development of
adaptive critic control approaches with robustness guarantee and the construction of
high-level intelligent systems. I am sure you will enjoy reading this book.

Chicago, USA Derong Liu
March 2018
Preface

Uncertainty and nonlinearity are involved in all walks of life. Every living organism
in nature interacts with its environment and improves its own actions in order to
survive and thrive. Meanwhile, due to the limitation of various resources, most
organisms act in an optimal fashion in order to conserve resources yet achieve their goals.
Hence, obtaining optimal actions to minimize consumption or maximize reward,
i.e., the idea of optimization, is necessary and significant. In general, the optimal
control of nonlinear systems often requires solving the nonlinear Hamilton–Jacobi–
Bellman (HJB) equation, which is different from that of linear systems. Therefore,
the nonlinear optimal control design with dynamic uncertainties is a difficult and
challenging area. Traditionally, dynamic programming provides an effective avenue
to deal with optimization and optimal control problems. However, due to the
well-known “curse of dimensionality”, it is often computationally untenable to run it
to obtain the optimal solutions. Moreover, the backward search direction obviously precludes the
use of dynamic programming in real-time control. Fortunately, the combination of
dynamic programming, artificial neural networks, and reinforcement learning,
especially adaptive critic structure, results in adaptive/approximate dynamic pro-
gramming (ADP), in order to solve optimal control problems forward-in-time. ADP
and reinforcement learning are quite relevant to each other when performing
intelligent optimization. They are both regarded as promising methods involving
important components of evaluation and improvement, against the background of
information technology, such as artificial intelligence, big data, and deep learning.
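For orientation, the HJB equation mentioned above can be written down explicitly. For a continuous-time input-affine system with an infinite-horizon cost that is quadratic in the control (the notation here is generic and chosen purely for illustration, not taken from a specific chapter), it reads:

```latex
% System: \dot{x} = f(x) + g(x)u, cost J(x_0) = \int_0^\infty \big( Q(x) + u^\top R u \big)\,\mathrm{d}t
0 = \min_{u} \Big[\, Q(x) + u^\top R u
      + \big(\nabla J^*(x)\big)^{\top}\big( f(x) + g(x)u \big) \Big],
\qquad
u^*(x) = -\tfrac{1}{2}\, R^{-1} g^\top(x)\, \nabla J^*(x).
```

For linear dynamics and quadratic Q this reduces to the algebraic Riccati equation, which is why the linear case admits closed-form treatment while the general nonlinear case does not.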
In the last two decades, the ADP mechanism has proven important and effective in
solving optimal and robust control problems under uncertain environment. Many
excellent results have been established related to adaptive, optimal, and robust
control design. This book intends to report the new results of adaptive critic control
with robust stabilization for uncertain nonlinear systems. The book covers the core
theory, novel methods, and some typically industrial applications related to the
robust adaptive critic control field. A whole framework of various robust adaptive
critic strategies is developed, including theoretical analysis, algorithm design,
simulation verification, and experimental results.


Overall, ten chapters are included in this book. Dr. Ding Wang contributes to
Chaps. 1, 2, and 5–9, while Dr. Chaoxu Mu contributes to Chaps. 3, 4, and 10. Both
of them performed the revision and polishing of all ten chapters.
In Chap. 1, the overview of adaptive critic-based robust control (or robust
adaptive critic control) design of continuous-time nonlinear systems is provided.
The ADP-based nonlinear optimal regulation is reviewed, followed by robust sta-
bilization of nonlinear systems with matched uncertainties, guaranteed cost control
design of unmatched plants, and decentralized stabilization of interconnected sys-
tems. Additionally, further comprehensive discussions are presented, including
event-based robust control design, improvement of the critic learning rule,
nonlinear H∞ control design, and several notes on future perspectives.
In Chap. 2, two different robust optimal control methods of nonlinear systems
with matched uncertainties are developed. In the first part, the infinite-horizon
robust optimal control problem for continuous-time uncertain nonlinear systems is
investigated by using data-based adaptive critic designs. The neural network
identification scheme is combined with the traditional adaptive critic technique, in
order to design the robust optimal control under uncertain environment. In the
second part, the robust optimal control design is revisited by using a data-based
integral policy iteration approach, which performs the model-free policy learning.
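As a concrete (and deliberately simplified) illustration of the policy iteration idea underlying this chapter: in the linear-quadratic special case, policy evaluation reduces to a Lyapunov equation and policy improvement to a gain update (Kleinman's classical model-based algorithm). The plant matrices and the initial gain below are illustrative choices, not taken from the book; the book's data-based integral version removes the need for the model.

```python
import numpy as np

def lyap(Ac, W):
    """Solve Ac^T P + P Ac + W = 0 via Kronecker vectorization."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    p = np.linalg.solve(M, -W.flatten(order="F"))
    return p.reshape((n, n), order="F")

def policy_iteration(A, B, Q, R, K, iters=20):
    """Model-based policy iteration (Kleinman) for the LQR special case.
    K must be an initial stabilizing gain."""
    for _ in range(iters):
        Ac = A - B @ K                    # closed loop under the current policy
        P = lyap(Ac, Q + K.T @ R @ K)     # policy evaluation: cost of current policy
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K

# Illustrative second-order plant (open-loop stable, so K0 = 0 is stabilizing).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
P, K = policy_iteration(A, B, Q, R, K=np.zeros((1, 2)))

# P should satisfy the algebraic Riccati equation (the linear "HJB").
riccati_residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
```

Each policy-evaluation step solves a linear equation rather than the full nonlinear HJB; the nonlinear methods in this chapter play the same evaluation/improvement game with a critic network standing in for P.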
In Chap. 3, a novel observer-based online control strategy is proposed for a class
of continuous-time uncertain nonlinear systems based on solving the HJB equation
approximately. A neural network-based observer is designed to reconstruct all
system states by relying only on output variables, and it is also used in the online
policy iteration control scheme. Then, within the ADP framework, a critic neural
network is constructed to approximate the optimal cost function, and after that, the
approximate expression of the optimal control policy can be directly derived.
In Chap. 4, an adaptive tracking control scheme is designed for a class of
continuous-time nonlinear systems with uncertainties based on the approximate
solution of the HJB equation. The tracking control of the continuous-time uncertain
nonlinear system can be transformed into the optimal tracking control of the
associated nominal system. By building the nominal error system and modifying
the cost function, the solution of the relevant HJB equation contributes to the
adaptive tracking control of the uncertain nonlinear system.
In Chap. 5, the robust feedback stabilization for a class of continuous-time
uncertain nonlinear systems via event-triggering mechanism and adaptive critic
learning technique is investigated with stability guarantee. The main idea is to
combine the event-triggering mechanism with adaptive critic designs, so as to solve
the nonlinear robust control problem under uncertain environment. The combined
framework can not only make better use of computation and communication
resources but also conduct controller design from the view of intelligent
optimization.
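The resource-saving effect of event-triggering can be seen in a toy simulation. The plant, the feedback gain, and the relative-threshold rule below are illustrative stand-ins, not the triggering condition derived in this chapter: the control input is recomputed only when the gap between the current state and the last sampled state grows too large.

```python
import numpy as np

# Illustrative plant dx/dt = A x + B u with stabilizing gain K (A - B K is Hurwitz).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.0]])

dt, steps = 1e-3, 10_000
x = np.array([2.0, -1.0])
x_hold = x.copy()        # state sample held at the controller since the last event
updates = 0
sigma = 0.1              # relative threshold: trigger when gap exceeds 10% of ||x||

for _ in range(steps):
    # Event-triggering rule: resample the state only when the gap is too large.
    if np.linalg.norm(x - x_hold) > sigma * np.linalg.norm(x):
        x_hold = x.copy()
        updates += 1
    u = -K @ x_hold                   # zero-order-hold control between events
    x = x + dt * (A @ x + B @ u)      # forward-Euler integration
```

In a run of this kind the controller transmits only a few hundred samples instead of one per integration step, while the state still decays toward the origin; the analytical triggering thresholds of the chapter make that trade-off rigorous.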
In Chap. 6, an effective adaptive optimal regulator is developed for a class of
continuous-time nonlinear dynamical systems through an improved neural learning
mechanism. The main objective lies in establishing an additional stabilizing
term to reinforce the traditional training process of the critic neural network, so as
to reduce the requirement on the initial stabilizing control law. Then,
the novel adaptive optimal control method is also applied to perform robust sta-
bilization of dynamical systems including complex nonlinearity and uncertainty.
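The flavor of critic learning can be shown on a one-dimensional example where the HJB equation is solvable by hand. This toy problem and the plain gradient rule are purely illustrative; the chapter's contribution is precisely an improved rule with an added stabilizing term. For the scalar plant dx/dt = -x + u with cost integrand x² + u², the optimal cost is J*(x) = (√2 - 1)x², so a one-weight critic J(x) = w·x² should learn w = √2 - 1 ≈ 0.414 by driving the HJB residual to zero:

```python
import numpy as np

# Critic: J(x) = w * x^2, so dJ/dx = 2 w x and u(x) = -0.5 * dJ/dx = -w x.
# HJB residual: delta(x) = x^2 + u^2 + (dJ/dx)(-x + u) = x^2 (1 - 2w - w^2).

xs = np.linspace(-2.0, 2.0, 41)   # sampled training states
w = 0.0                           # initial critic weight
lr = 1e-4                         # learning rate for plain gradient descent

for _ in range(2000):
    delta = xs**2 * (1.0 - 2.0 * w - w**2)             # HJB residual at samples
    grad = np.sum(delta * xs**2 * (-2.0 - 2.0 * w))    # d(0.5*sum(delta^2))/dw
    w -= lr * grad

w_star = np.sqrt(2.0) - 1.0   # exact root of w^2 + 2w - 1 = 0
```

Even here the plain rule needs a carefully chosen learning rate, and in the general multi-weight setting it also needs an initially stabilizing policy; the additional stabilizing term introduced in this chapter is meant to relax exactly that requirement.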
In Chap. 7, the robust stabilization scheme of nonlinear systems with general
uncertainties is developed. The involved uncertain term is more general than the
matched case. The approximate optimal controller of the nominal plant can be
applied to accomplish robust stabilization for the original uncertain dynamics. The
neural network weight vector can be conveniently initialized by virtue of the
improved critic learning formulation. Then, the robust trajectory tracking of
uncertain nonlinear systems is investigated, where the augmented system con-
struction is performed by combining the tracking error with the reference trajectory.
In Chap. 8, an improved critic learning criterion is established to cope with the
event-based nonlinear H∞ control design. The proposed problem is regarded as a
two-player zero-sum game, and the adaptive critic mechanism is used to achieve the
minimax optimization under event-based environment. Then, the event-based
optimal control law and the time-based worst-case disturbance law are obtained
approximately by training a single critic neural network. The initial stabilizing
control is no longer required during the implementation process. The infamous
Zeno behavior of the present event-based design is also avoided through theoretical
analysis.
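In generic notation (used here only to fix ideas, not taken verbatim from the chapter), the two-player zero-sum formulation treats the control u as the minimizer and the disturbance d as the maximizer of a soft-constrained cost, and the single critic approximates the saddle-point value:

```latex
% System: \dot{x} = f(x) + g(x)u + k(x)d, attenuation level \gamma > 0
J(x_0) = \int_0^\infty \big( Q(x) + u^\top R u - \gamma^2\, d^\top d \big)\,\mathrm{d}t,
\qquad
u^*(x) = -\tfrac{1}{2}\, R^{-1} g^\top(x)\,\nabla J^*(x),
\quad
d^*(x) = \tfrac{1}{2\gamma^2}\, k^\top(x)\,\nabla J^*(x).
```

Both the control law and the worst-case disturbance law follow from the gradient of the same value function, which is why a single critic network suffices in this design.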
In Chap. 9, a computationally efficient framework for intelligent critic control
design and application of continuous-time input-affine systems is established with
the purpose of disturbance attenuation. A neural network identifier is developed to
reconstruct the unknown dynamical information incorporating stability analysis.
The optimal control law and the worst-case disturbance law are designed by
introducing and tuning a critic neural network. Then, the present method is applied
to a smart micro-grid, for ensuring the balance between all power generations and
load consumptions under uncertain and disturbed environment.
In Chap. 10, an ADP-based supplementary scheme for frequency regulation of
power systems is developed. An improved sliding mode method is employed as the
basic controller, where a new sliding mode variable is proposed for load frequency
control. The ADP strategy is used to provide the supplementary control signal.
Then, another scheme based on particle swarm optimization is developed as the
optimal parameter controller for the frequency regulation problem. Practical
experiments on single-area and multi-area benchmark systems with comparative
results are performed to illustrate the favorable performance of frequency
regulation.
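Particle swarm optimization itself is a generic population-based search; a minimal sketch is given below, with a toy quadratic cost standing in for the frequency-regulation performance index actually used in this chapter. The swarm size, inertia, and acceleration constants are common textbook defaults, not the book's settings, and the "optimal gains" are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(gains):
    # Toy stand-in for a controller-tuning objective (minimum at gains = [2.0, 0.5]).
    return float(np.sum((gains - np.array([2.0, 0.5]))**2))

n_particles, dim, iters = 20, 2, 200
w_inertia, c1, c2 = 0.7, 1.5, 1.5

pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))   # candidate gain vectors
vel = np.zeros((n_particles, dim))
pbest = pos.copy()                                      # per-particle best positions
pbest_cost = np.array([cost(p) for p in pos])
gbest = pbest[np.argmin(pbest_cost)].copy()             # swarm-wide best position

for _ in range(iters):
    r1 = rng.uniform(size=(n_particles, dim))
    r2 = rng.uniform(size=(n_particles, dim))
    # Velocity update: inertia + cognitive pull (pbest) + social pull (gbest).
    vel = w_inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    costs = np.array([cost(p) for p in pos])
    improved = costs < pbest_cost
    pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
    gbest = pbest[np.argmin(pbest_cost)].copy()

best_cost = cost(gbest)
```

Because the search only queries the cost function, the same skeleton applies when the objective is a closed-loop frequency-deviation index evaluated by simulation, which is how PSO-based gain tuning is typically set up.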

Beijing, China Ding Wang
Tianjin, China Chaoxu Mu
March 2018
Acknowledgements

The authors would like to thank Yuzhu Huang, Qichao Zhang, and Chao Li for
providing valuable discussions when conducting related research. The authors also
would like to thank Yong Zhang, Ke Wang, Jiaxu Hou, and Mingming Ha for
preparing some basic materials of this book.
The authors are very grateful to the National Natural Science Foundation of
China (Grants 61773373, 61773284, U1501251, 61533017), Beijing Natural
Science Foundation (Grant 4162065), the Young Elite Scientists Sponsorship
Program of China Association for Science and Technology, the Youth Innovation
Promotion Association of Chinese Academy of Sciences, and the Early Career
Development Award of The State Key Laboratory of Management and Control for
Complex Systems for providing necessary financial support to our research in the
past four years.

Beijing, China Ding Wang
Tianjin, China Chaoxu Mu
March 2018

Contents

1 Overview of Robust Adaptive Critic Control Design . . . . . . . . . . .. 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 1
1.1.1 Reinforcement Learning and Adaptive Critic
Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2
1.1.2 Adaptive-Critic-Based Optimal Control Design . . . . .. 3
1.1.3 Adaptive-Critic-Based Nonlinear Robust Control
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 ADP-Based Continuous-Time Nonlinear Optimal Regulation . . . 7
1.2.1 Basic Optimal Control Problem Description . . . . . . . . . 7
1.2.2 Neural Control Design with Stability Discussion . . . . . 10
1.3 Nonlinear Robust Control Design with Matched
Uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 13
1.3.1 Problem Transformation Method . . . . . . . . . . . . . . . .. 13
1.3.2 Other ADP-Based Robust Control Methods . . . . . . . .. 16
1.4 Nonlinear Guaranteed Cost Control Design with Unmatched
Uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 18
1.5 Nonlinear Decentralized Control Design with Matched
Interconnections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6 Advanced Techniques and Further Discussions . . . . . . . . . . . . . 24
1.6.1 Saving the Communication Resource . . . . . . . . . . . . . . 24
1.6.2 Improving the Critic Learning Rule . . . . . . . . . . . . . . . 28
1.7 Comparison Remarks Between ADP-Based Robust Control
and H∞ Control Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.8 Future Perspectives and Conclusions . . . . . . . . . . . . . . . . . . .. 32
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 34
2 Robust Optimal Control of Nonlinear Systems with Matched
Uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


2.3 Basics of Robust Optimal Control Methodology . . . . . . . . . . . . 49


2.4 Robust Optimal Control via Neural Network Identification . . . . 52
2.4.1 Neural Network Identification . . . . . . . . . . . . . . . . . . . 52
2.4.2 Model-Free Policy Iteration Algorithm . . . . . . . . . . . . 55
2.4.3 Implementation Process via Critic Learning . . . . . . . . . 56
2.4.4 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.5 Revisit Robust Optimal Control via Integral Policy
Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.5.2 Implementation Process with Actor-Critic
Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.6 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3 Observer-Based Online Adaptive Regulation for a Class
of Uncertain Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.3 Neural-Network-Observer-Based Online Adaptive Control . . . . . 89
3.3.1 Policy Iteration Scheme . . . . . . . . . . . . . . . . . . . . . . . 89
3.3.2 Neural-Network-Based State Observer Design . . . . . . . 90
3.3.3 Implementation of Online Adaptive Regulation . . . . . . 93
3.3.4 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4 Adaptive Tracking Control of Nonlinear Systems Subject
to Matched Uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.2 Problem Formulation and Transformation . . . . . . . . . . . . . . . . . 119
4.3 Adaptive Tracking Control Scheme Based on ADP . . . . . . . . . 124
4.3.1 Derivation of Policy Iteration Algorithm . . . . . . . . . . . 124
4.3.2 Implementation of Adaptive Tracking Control . . . . . . . 124
4.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5 Event-Triggered Robust Stabilization Incorporating
an Adaptive Critic Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.2 Problem Formulation and Transformation . . . . . . . . . . . . . . . . . 148

5.3 Adaptive-Critic-Based Event-Triggered Robust Stabilization . . . 150


5.3.1 Robust Stabilization with Event-Triggering
Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.3.2 Adaptive Critic Control with Neural
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6 An Improved Adaptive Optimal Regulation Framework
with Robust Control Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.3 Improved Neural Optimal Control Design . . . . . . . . . . . . . . . . 176
6.3.1 Approximate Optimal Regulation . . . . . . . . . . . . . . . . 176
6.3.2 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.4 Application to Perform Robust Stabilization . . . . . . . . . . . . . . . 182
6.4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.4.2 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.5 Simulation and Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
7 Robust Stabilization and Trajectory Tracking of General
Uncertain Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.3 Robust Stabilization Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 202
7.3.1 Theoretical Results of Transformation . . . . . . . . . . . . . 202
7.3.2 Neural Control Implementation . . . . . . . . . . . . . . . . . . 204
7.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
7.4 Generalization to Robust Trajectory Tracking . . . . . . . . . . . . . . 212
7.5 Simulation and Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8 Event-Triggered Nonlinear H∞ Control Design
via an Improved Critic Learning Strategy . . . . . . . . . . . . . . . . . . . 229
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
8.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.3 Event-Based Nonlinear H∞ State Feedback . . . . . . . . . . . . . . . 233
8.3.1 Feedback Control Design Method . . . . . . . . . . . . . . . . 233
8.3.2 Neural Control Implementation . . . . . . . . . . . . . . . . . . 235

8.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239


8.3.4 Zeno Behavior Exclusion . . . . . . . . . . . . . . . . . . . . . . 244
8.3.5 General Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.4 Simulation and Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9 Intelligent Critic Control with Disturbance Attenuation
for a Micro-Grid System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
9.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.3 Intelligent Critic Control with Disturbance Attenuation . . . . . . . 260
9.3.1 Identification of the Controlled Plant . . . . . . . . . . . . . . 260
9.3.2 Adaptive Critic Control Design Strategy . . . . . . . . . . . 263
9.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
9.4 Simulation and Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
10 ADP-Based Supplementary Design for Load Frequency
Control of Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
10.2 LFC Model with Parameter Uncertainties . . . . . . . . . . . . . . . . . 283
10.3 Load Frequency Control Design . . . . . . . . . . . . . . . . . . . . . . . 285
10.3.1 Improved Sliding Mode Control Design . . . . . . . . . . . 285
10.3.2 Particle Swarm Optimization Based Control
Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10.3.3 ADP-Based Sliding Mode Control Design . . . . . . . . . . 289
10.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10.4.1 Experiments Without Parameter Uncertainties . . . . . . . 293
10.4.2 Experiments with Disturbances and Uncertainties . . . . . 299
10.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Acronyms

ADP Adaptive/approximate dynamic programming
HJB Hamilton–Jacobi–Bellman equation
HJI Hamilton–Jacobi–Isaacs equation
LFC Load frequency control
UUB Uniformly ultimately bounded

Chapter 1
Overview of Robust Adaptive Critic
Control Design

Abstract Adaptive dynamic programming (ADP) and reinforcement learning are
quite relevant to each other when performing intelligent optimization. They are both
regarded as promising methods involving important components of evaluation and
improvement, against the background of information technology, such as artificial
intelligence, big data, and deep learning. Although great progress has been achieved
and surveyed when addressing nonlinear optimal control problems, the research
on robustness of ADP-based control strategies under uncertain environment has
not been fully summarized. Hence, this chapter reviews the recent main results of
adaptive-critic-based robust control design of continuous-time nonlinear systems.
The ADP-based nonlinear optimal regulation is reviewed, followed by robust sta-
bilization of nonlinear systems with matched uncertainties, guaranteed cost con-
trol design of unmatched plants, and decentralized stabilization of interconnected
systems. Additionally, further comprehensive discussions are presented, including
event-based robust control design, improvement of the critic learning rule, nonlinear
H∞ control design, and several notes on future perspectives. This overview is bene-
ficial to promote the development of adaptive critic control methods with robustness
guarantee and the construction of higher level intelligent systems.

1.1 Introduction

Nowadays, machine learning has become the core technique of artificial intelligence
and plays an important role in modern technology. Artificial intelligence, big data,
and deep learning are all hot topics of information technology. Machine learning
[6, 36] and deep learning [51, 85, 111] are extremely helpful for the study of big
data [18, 105]. In 2016, Google DeepMind developed a program called AlphaGo
[115] that achieved a level of performance previously thought to be at least a decade
away. Instead of exploring various sequences of moves, AlphaGo learns to make a
move by evaluating the strength of its position on the board. This kind of evaluation
was ensured to be possible via deep learning capabilities of artificial neural networks
[31, 39, 181]. Due to the excellent properties of adaptivity, advanced input-output

© Springer Nature Singapore Pte Ltd. 2019
D. Wang and C. Mu, Adaptive Critic Control with Robust Stabilization
for Uncertain Nonlinear Systems, Studies in Systems, Decision and Control 167,
https://doi.org/10.1007/978-981-13-1253-3_1

1 Overview of Robust Adaptive Critic Control Design

mapping, fault tolerance, nonlinearity, and self-learning, neural networks are fre-
quently used for universal function approximation in numerical algorithms. Deep-
neural-network-based learning has played a vital role in AlphaGo’s success [155].
Position evaluation, aimed at approximating the optimal cost function of the game,
is the key procedure of AlphaGo. Noticeably, reinforcement learning [120] is an
indispensable component of this advanced product.

1.1.1 Reinforcement Learning and Adaptive Critic Designs

As an important branch of artificial intelligence and especially machine learning,
reinforcement learning tackles modification of actions based on interactions with the
environment. The environment comprises everything outside the agent (the learner
and the decision-maker) and also interacts with the agent. Reinforcement learning
focuses on how an agent ought to take actions in an environment so as to maximize
the cumulative reward or minimize the punishment, where the idea of optimization
is involved. In fact, people are often interested in mimicking nature and designing
automatic control systems that are optimal, so as to effectively achieve the required
performance without unduly depending on limited resources. By prescribing a search
that tracks backward from the final step and employing the principle of optimality,
thereby finding the optimal policy, dynamic programming is a useful computational
technique for solving optimal control problems [9, 59]. However, due to the defect of
backward numerical process when coping with the high-dimensional optimization
problems, it is computationally untenable to run dynamic programming to obtain
the optimal solution (i.e., the well-known “curse of dimensionality” [9]). What’s
worse, the backward direction of the search process precludes the use of dynamic
programming in real-time control.
Reinforcement learning is highly related to dynamic programming technique.
Classical dynamic programming algorithms are of limited utility in reinforcement
learning because of their dependence on a perfect model and their great computational
expense. However, dynamic programming provides an essential foundation
for understanding reinforcement learning. There is a class of reinforcement learning
methods incorporating the actor-critic (or adaptive critic) structure, where an actor
component applies an action (or control law) to the environment and a critic com-
ponent evaluates the value of that action. The combination of the actor-critic structure,
dynamic programming, and neural networks results in the adaptive/approximate
dynamic programming (ADP) algorithm [109, 164, 165, 167], invented by Werbos
for obtaining approximate optimal solutions. The core idea of ADP is adaptive-
critic-based optimization, and it is regarded as a necessary outlet for achieving truly
brain-like intelligence [109, 167].

1.1.2 Adaptive-Critic-Based Optimal Control Design

Artificial neural networks and fuzzy systems are always regarded as important intel-
ligent complements to practical control engineering. Actually, they are often used as
fundamental components of various computational intelligence techniques and the
optimization design of complex dynamics based on them is a significant topic of
decision and control community [33, 78, 156, 157, 173]. Linear optimal regulators
have been studied by control scientists and engineers for many years. However, it
is not an easy task to acquire the analytic solution of the Hamilton-Jacobi-Bellman
(HJB) equation for general nonlinear systems. Thus, their optimal feedback design is
quite difficult but considerably important. Remarkably, the successive approxi-
mation method [1, 5, 8, 110] and the closely related ADP method have both been developed
to conquer the difficulty via approximating the HJB solution. In general, ADP is a
promising technique to approximate optimal control solutions for complex systems
[1, 5, 8, 109, 110, 164, 165, 167]. Particularly, it is regarded as an effective strategy
to design optimal controllers in online and forward-in-time manners. Among them,
the adaptive critic is the basic framework and neural networks are often involved to
serve as the function approximator. Employing the ADP method always results in
adaptive near-optimal feedback controllers and hence is useful to perform various
nonlinear intelligent control applications.
There are several synonyms used for ADP, and most of them are closely related
to neural networks. They are “adaptive critic designs” [30, 101, 102], “adaptive
dynamic programming” [55, 97], “approximate dynamic programming” [5, 112,
167], “neural dynamic programming” [76, 113], “neuro-dynamic programming”
[12], “reinforcement learning” [112, 120] including Q-learning [159], and “relaxed
dynamic programming” [62, 106]. In the basic framework, there are three compo-
nents: critic, model, and action. They are usually implemented via neural networks
and perform the function of evaluation, prediction, and decision, respectively. Some
improved structures are also proposed, such as the goal representation ADP [32,
99, 123, 200] and fuzzy ADP [123, 191]. In the last two decades, ADP has been
promoted extensively when coping with adaptive optimal control of discrete-time
systems [21, 24, 34, 35, 68, 71, 77, 90, 91, 95, 118, 143, 153, 161, 163, 174,
175, 187, 197] and continuous-time systems [13, 15, 26, 43, 47, 74, 81, 82, 87,
88, 98, 100, 117, 127, 129, 130, 182, 184, 202]. Among them, the iterative ADP
algorithm based on value iteration is important to the self-learning optimal control
design of discrete-time systems [5, 91, 143, 153, 187], while the policy iteration is
significant to the adaptive optimal control design of continuous-time systems [1, 74,
88, 127, 130]. The convergence of these iterative algorithms is a basic issue so that
it has been sufficiently studied [1, 5, 43, 47, 74, 77, 81, 88, 91, 95, 117, 118, 127,
130, 143, 153, 187]. For comprehensive survey papers and books of recent devel-
opments, please refer to [16, 44, 57, 58, 60, 65, 67, 72, 116, 131, 134, 154, 186,
192], including various topics in terms of theory, design, analysis, and applications.
As emphasized by [57, 58, 60], the ADP technique is closely related to reinforcement
learning when engaging in the research of feedback control. In general, value and
policy iterations are fundamental algorithms of reinforcement learning based ADP
in optimal control. It is easy to initialize the value iteration, but one cannot always
guarantee the stability of iterative control laws during the implementation process.
Policy iteration starts with a stabilizing controller, but it is difficult to find the initial
admissible control law in many situations. As a result, the generalized version of
these two algorithms has recently received great attention [57, 58, 60, 68, 72, 163],
as it integrates their advantages and avoids their weaknesses.
The rapid development of information technology, especially artificial intelligence,
big data, and deep learning, is profoundly affecting our society. Nowadays,
the data-driven control design has become a hot topic in the field of control theory
and control engineering [37, 38, 152, 158, 177]. The development of ADP methods
greatly promotes the research of data-based optimal control design [21, 67, 81, 95,
96, 143, 149, 184, 202]. A novel iterative neural dynamic programming algorithm
was developed in [96, 149], reflecting a combination of neural dynamic program-
ming technique and the iterative ADP algorithm. The integral reinforcement learning
proposed in [52–54] provides a new outlet of achieving the model-free optimal regu-
lation. All of these results are beneficial to the development of artificial intelligence
and computational intelligence techniques.

1.1.3 Adaptive-Critic-Based Nonlinear Robust Control Design

Existing results of ADP methods are mostly obtained under the assumption that there
are no dynamical uncertainties in the controlled plants. Nevertheless, practical control
systems are always subject to model uncertainties, exogenous disturbances or other
changes in their lifetime. They are necessarily considered during the controller design
process in order to avoid the deterioration of nominal closed-loop performance. A
controller is said to be robust if it works even if the actual system deviates from its
nominal model on which the controller design is based. The importance of the robust
control problem is evident, and it has been studied by control scientists for many
years (see [19, 49, 50, 56, 63, 64] and the references therein). In [63, 64], the robust
control problem was handled by using the optimal control approach for the nominal
system.1 This is a very important result which establishes a connection between the
two control topics. However, the detailed procedure is not discussed and it is difficult
to cope with general nonlinear systems. Then, an optimal control scheme based on
the HJB solution for robust controller design of nonlinear systems was proposed in
[3, 4]. The algorithm was constructed by using the least squares method performed
offline, while the closed-loop stability analysis was not fully discussed.

¹ It represents the portion of the system without considering the uncertainty during the feedback
control design aimed at guaranteeing the desired performance of a dynamic plant containing
uncertain elements [19, 63, 64].
Since 2013, some publications on ADP-based robust
control designs have gradually appeared [75, 119, 133, 138–141, 144, 178, 199]. In general, the problem
transformation is conducted to build a close relationship between the robustness
and optimality. Moreover, the closed-loop system is always proved to be uniformly
ultimately bounded (UUB), a notion that will be defined later. In [139], a policy iteration
algorithm was developed to solve the robust control problem of continuous-time
nonlinear systems with matched uncertainties and the algorithm was improved in
[141]. This method was extended to deal with the robust stabilization of matched
nonlinear systems with unknown dynamics [144] and with constrained inputs [75].
Incidentally, it is worth mentioning that a tentative result of ADP-based robust control
design of discrete-time nonlinear systems was given in [140]. For improving the
learning rule of the critic neural network, the adaptation-oriented near-optimal control
problem was revisited and then the robust stabilization of nonlinear systems was
studied with further results [133]. Moreover, the robust control method of nonlinear
systems with unmatched uncertainties was derived in [199]. The robust control design
with matched uncertainties and disturbances was also studied in [119] as an extension
of [141]. Note the data-driven approaches are helpful to the ADP-based robust control
design since system uncertainties can sometimes be regarded as unknown dynamics.
For discussing the optimality of the ADP-based robust controller, a novel data-based
robust optimal control method of matched nonlinear systems was constructed [138].
Data-based robust adaptive control for a class of unknown nonlinear systems with
constrained-input was studied via integral reinforcement learning [178]. These results
guarantee that ADP methods are applicable to a large class of complex nonlinear
systems in uncertain environments. Hence, they greatly broaden the application
scope of ADP, since many previous publications do not focus on the robustness
of the obtained controllers. Subsequently, because both possess the speciality
of handling system uncertainty, the combination of sliding mode control with ADP
provides a new direction for the study of self-learning control design [23, 90]. In [90],
the application issue on air-breathing hypersonic vehicle tracking was addressed
by employing an innovative combination of sliding mode control and ADP. Then,
the sliding mode control method based on ADP was used in [23] to stabilize the
closed-loop system with time-varying disturbances and guarantee the nearly optimal
performance of the sliding-mode dynamics.
To fill the gap in most of the ADP literature, where dynamic uncertainties or
unmodeled dynamics were not addressed, an important framework named robust
ADP was proposed in [14, 25, 41, 42, 45] to cope with the nonlinear robust optimal
control design from another aspect. An overview of robust ADP method for linear
and nonlinear systems was given in [45], outlining the development of robust ADP
theory with potential applications in engineering and biology. In [42], a key strat-
egy integrating several tools of modern nonlinear control theory, such as the robust
redesign and backstepping techniques as well as the nonlinear small-gain theorem
[46], was developed with ADP formulation. After that, the robust ADP method was
employed to decentralized optimal control of large-scale systems [14] and output
feedback control of interconnected systems [25]. Therein, the applications of robust
ADP to power systems were given special attention [14, 25, 41, 42, 45]. Generally,
the robust ADP design can not only stabilize the original uncertain system, but also
achieve optimality in the absence of dynamic uncertainty. It was emphasized that,
under the framework of robust ADP, computational designs for robust optimal con-
trol can be carried out based only on the online data of the state and input variables
[45]. In this sense, the robust ADP method can be regarded as a nonlinear variant
of [40], where a computational adaptive optimal control strategy was proposed to
iteratively solve the linear algebraic Riccati equation using online information of
state and input.
However, as we have seen, most of the previous research concerns only
the robustness of the uncertain system and the optimality of the nominal system
[42, 75, 133, 139, 199]. In other words, the direct optimal control design of uncer-
tain nonlinear systems is very difficult. This is because coping with the cost function
of the uncertain plant is not an easy task. Therefore, some researchers have paid atten-
tion to the study of boundedness of the cost function with respect to the uncertain
plant, in addition to optimizing it. The guaranteed cost control strategy [17] possesses
the advantage of providing an upper bound on a given cost and therefore the degrada-
tion of control performance incurred by system uncertainties can be guaranteed to be
less than this bound. When discussing the optimality with respect to the guaranteed
cost function, it leads to the optimal guaranteed cost control problem. The guaran-
teed cost control design is a somewhat mature research topic in the control community,
but there are some new results with the emerging ADP formulation [70, 94, 142, 172,
180]. Under the ADP framework, we obtain a novel self-learning optimal guaranteed
cost control scheme.
When studying complex dynamical systems, we often partition them into a number
of interconnected subsystems for convenience. The combination of these subsystems
can be seen as a large-scale system. As one of the effective control schemes for large-
scale systems, the decentralized control design has acquired much interest because of
its evident advantages, such as easy implementation and low dimensionality [66, 69,
92, 107, 114, 136, 141]. It is shown that the decentralized stabilization for a class of
interconnected nonlinear systems is closely related to the ADP-based robust control
design [66, 69, 92, 136, 141]. In this sense, the self-learning decentralized control
scheme can be constructed with the ADP formulation. Note that the robustness issue is
also included in the aforementioned guaranteed cost control and decentralized control
designs. It will be illustrated that these three control topics are closely connected
under the proposed adaptive critic framework [134].
For consistency and convenience, the following notations will be used throughout
the chapter. R represents the set of all real numbers. Rn is the Euclidean space of
all n-dimensional real vectors. Rn×m is the space of all n × m real matrices. ‖·‖
denotes the vector norm of a vector in Rn or the matrix norm of a matrix in Rn×m.
I_n represents the n × n identity matrix. λmax(·) and λmin(·) stand for the maximal
and minimal eigenvalues of a matrix, respectively. Let Ω be a compact subset of
Rn , Ωu be a compact subset of Rm , and A (Ω) be the set of admissible control
laws (defined in [1, 8, 127, 130]) on Ω. ρ is the parameter in the utility corre-
sponding to the uncertain term. L2 [0, ∞) denotes a space of functions where the
Lebesgue integral of the element is finite. γ is the L2-gain performance level. i is the
symbol of the ith subsystem in an interconnected plant, j is the sampling instant of the
event-triggering mechanism, and k is the iteration index of the policy iteration algo-
rithm. N+ = {i}ᵢ₌₁ᴺ = {1, 2, . . . , N} denotes the set of positive integers between 1
and N. N = {0, 1, 2, . . . } stands for the set of all non-negative integers. “T” is used
for representing the transpose operation and ∇(·) ≜ ∂(·)/∂x is employed to denote
the gradient operator.

1.2 ADP-Based Continuous-Time Nonlinear Optimal Regulation

In this section, we present a brief review of the continuous-time nonlinear optimal
regulation method with neural network implementation. The basic idea of the ADP
method for optimal control of continuous-time systems is involved therein.

1.2.1 Basic Optimal Control Problem Description

We consider a class of continuous-time nonlinear systems with control-affine inputs
given by

ẋ(t) = f (x(t)) + g(x(t))u(t), (1.1)

where x(t) ∈ Ω ⊂ Rn is the state vector, u(t) ∈ Ωu ⊂ Rm is the control vector, and
the system functions f (·) and g(·) are differentiable in the arguments satisfying
f (0) = 0. We let the initial state at t = 0 be x(0) = x0 and x = 0 be the equilibrium
point of the controlled plant. The internal system function f (x) is assumed to be
Lipschitz continuous on the set Ω in Rn which contains the origin. Generally, the
nonlinear plant (1.1) is assumed to be controllable.
In this chapter, we consider the undiscounted optimal control problem with an infinite-
horizon cost function. We let

U (x(t), u(t)) = Q(x(t)) + u T (t)Ru(t) (1.2)

denote the utility function,2 where the scalar function Q(x) ≥ 0 and the m-
dimensional square matrix R = R T > 0, and then define the cost function as
J(x(t), u(t)) = ∫_t^∞ U(x(τ), u(τ)) dτ. (1.3)

² The selected state-related utility Q(x(t)) is more general than the classical form xᵀ(t)Qx(t),
where Q = Qᵀ > 0. The control-related utility can be chosen as the non-quadratic form [75, 83, 187,
189] instead of the traditionally quadratic one uᵀ(t)Ru(t) when encountering input constraints.

For simplicity, the cost J(x(t), u(t)) is written as J(x(t)) or J(x) in the sequel.
What we are always concerned with is the cost function starting from t = 0, represented as
J(x(0)) = J(x0).
During optimal control design, we want to derive the optimal feedback control
law u(x) to minimize the cost function (1.3), where u(x) should be admissible.
Definition 1.1 (cf. [1, 8, 127, 130]) A control law u(x) is said to be admissible with
respect to (1.3) on Ω, denoted by u ∈ A (Ω), if u(x) is continuous on Ω, u(0) = 0,
u(x) stabilizes system (1.1) on Ω, and J (x0 , u) is finite for all x0 ∈ Ω.
For an admissible control law u(x) ∈ A (Ω), if the related cost function (1.3) is
continuously differentiable, then the infinitesimal version is the nonlinear Lyapunov
equation

0 = U (x, u(x)) + (∇ J (x))T [ f (x) + g(x)u(x)] (1.4)

with J (0) = 0. Define the Hamiltonian of system (1.1) as

H (x, u(x), ∇ J (x)) = U (x, u(x)) + (∇ J (x))T [ f (x) + g(x)u(x)]. (1.5)

Using Bellman’s optimality principle, the optimal cost function J ∗ (x), specifically
defined as
J∗(x) = min_{u∈A(Ω)} ∫_t^∞ U(x(τ), u(τ)) dτ, (1.6)

satisfies the so-called continuous-time HJB equation

min_{u∈A(Ω)} H(x, u(x), ∇J∗(x)) = 0. (1.7)

Based on optimal control theory, the optimal feedback control law is computed by

u∗(x) = arg min_{u∈A(Ω)} H(x, u(x), ∇J∗(x)) = −(1/2)R⁻¹gᵀ(x)∇J∗(x). (1.8)
Using the optimal control expression (1.8), the HJB equation turns into the form

0 = U(x, u∗(x)) + (∇J∗(x))ᵀ[f(x) + g(x)u∗(x)]
  = H(x, u∗(x), ∇J∗(x)), J∗(0) = 0. (1.9)
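The feedback form in (1.8) follows from the first-order stationarity condition of the Hamiltonian (1.5) with respect to u, which can be made explicit in one line:

```latex
\frac{\partial H}{\partial u} = 2Ru(x) + g^{T}(x)\nabla J^{*}(x) = 0
\quad\Longrightarrow\quad
u^{*}(x) = -\frac{1}{2}R^{-1}g^{T}(x)\nabla J^{*}(x),
```

and since the Hessian ∂²H/∂u² = 2R > 0, this stationary point is indeed the minimizer appearing in (1.7).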

We notice that the optimal control law can be derived if the optimal cost function can
be obtained, i.e., the equation (1.9) is solvable. However, that is not the case. Since
the continuous-time HJB equation (1.9) is difficult to deal with in theory, it is not
an easy task to obtain the optimal control law (1.8) for general nonlinear systems.
This promotes the investigation of iterative algorithms, such as policy iteration. We
first construct two sequences in terms of the cost function {J^(k)(x)} and the control
law {u^(k)(x)}, and then start iteration from an initial admissible controller as follows:

u (0) (x) → J (1) (x) → u (1) (x) → J (2) (x) → · · · (1.10)

Generally, the policy iteration includes two important iterative steps [120], i.e., policy
evaluation based on (1.4) and policy improvement based on (1.8), which are shown
in Algorithm 1.

Algorithm 1 Policy Iteration for Optimal Control Problem


1: Initialization
Let the initial iteration index be k = 0 and J (0) (·) = 0.
Give a small positive number ε as the stopping threshold.
Start iteration from an initial admissible control law u (0) .
2: Policy Evaluation
Using the control law u^(k)(x), solve the following nonlinear Lyapunov equation

0 = U(x, u^(k)(x)) + (∇J^(k+1)(x))ᵀ ẋ (1.11)

with J^(k+1)(0) = 0, where ẋ = f(x) + g(x)u^(k)(x).


3: Policy Improvement
Based on J (k+1) (x), update the control law via
u^(k+1)(x) = −(1/2)R⁻¹gᵀ(x)∇J^(k+1)(x). (1.12)

4: Stopping Criterion
If |J^(k+1)(x) − J^(k)(x)| ≤ ε, stop and obtain the approximate optimal control law; else, set
k = k + 1 and go back to Step 2.
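To make Algorithm 1 concrete, note that for linear dynamics ẋ = Ax + Bu with Q(x) = xᵀQx, the Lyapunov equation (1.11) becomes a matrix Lyapunov equation and the update (1.12) reduces to a state-feedback gain; this linear special case is the classical policy iteration often attributed to Kleinman. The following Python sketch (the matrices A, B, Q, R are illustrative choices, not from the text) iterates to the algebraic Riccati solution:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Illustrative linear special case of Algorithm 1: dx/dt = A x + B u,
# utility x^T Q x + u^T R u.  A is chosen open-loop stable so that the
# zero gain is an admissible initial control law.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))                     # initial admissible gain, u = -K x
for _ in range(50):
    Ak = A - B @ K                       # closed-loop matrix under u^(k)
    # Policy evaluation (1.11): Ak^T P + P Ak + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement (1.12): K^(k+1) = R^{-1} B^T P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.max(np.abs(K_new - K)) < 1e-10:  # stopping criterion of Algorithm 1
        break
    K = K_new

P_are = solve_continuous_are(A, B, Q, R)   # reference Riccati solution
print(np.max(np.abs(P - P_are)))           # gap shrinks to numerical noise
```

The nonlinear case replaces the Lyapunov-equation solve by the neural approximation discussed in the next subsection, but the evaluation/improvement alternation is identical.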

Note that the above policy iteration algorithm can finally converge to the optimal
cost function and optimal control law, i.e., J (k) (x) → J ∗ (x) and u (k) (x) → u ∗ (x)
as k → ∞. The convergence proof has been given in [1, 74] and related references
therein. However, it is still difficult to obtain the exact solution of the Lyapunov
equation. This motivates us to develop an approximate strategy to overcome the
difficulty [13, 15, 26, 42, 43, 74, 81, 88, 127, 130, 138, 144], which results in
the ADP-based neural control design. Besides, the knowledge of system dynamics
f (x) and g(x) is needed to perform the iterative process. Actually, some advanced
methods have been proposed to relax this requirement, such as the integral policy
iteration algorithm [130], the neural identification scheme [144], and the probing
signal method [40, 42]. As discussed in the following sections, great efforts are still
being made in this aspect.

1.2.2 Neural Control Design with Stability Discussion

As is shown in Sect. 1.1, several neural networks are often incorporated in adap-
tive critic designs. Among them, the critic network is regarded as the most funda-
mental element, even though there may be other elements involved, such as model
network [91, 143] and action network [127, 143]. Different configurations reflect
distinct objectives of control designers. The single critic structure is often employed
to emphasize the simplicity of the design procedure [75, 139].
During the neural network implementation, we take the universal approxima-
tion property into consideration and express the optimal cost function J ∗ (x) on the
compact set Ω as

J ∗ (x) = ωcT σc (x) + εc (x), (1.13)

where ωc ∈ Rlc is the ideal weight vector, lc is the number of neurons in the hidden
layer, σc (x) ∈ Rlc is the activation function, and εc (x) ∈ R is the reconstruction
error.3 Then, the gradient vector of the optimal cost function is

∇ J ∗ (x) = (∇σc (x))T ωc + ∇εc (x). (1.14)

Since the ideal weight is unknown, a critic neural network is developed to approxi-
mate the optimal cost function as

Jˆ∗ (x) = ω̂cT σc (x), (1.15)

where ω̂c ∈ Rlc denotes the estimated weight vector. Similarly, we derive the gradient
vector as

∇ Jˆ∗ (x) = (∇σc (x))T ω̂c . (1.16)

Note that the specific structure of the critic network is always an experimental choice
with engineering experience and intuition after noticing a tradeoff between control
accuracy and computational complexity [1]. Actually, selecting the proper neurons
for neural networks is more of an art than science [101]. Determining the number of
neurons needed for a particular application is still an open problem.
Considering the feedback formulation (1.8) and the neural network expression
(1.13), the optimal control law can be rewritten as a weight-related form

u∗(x) = −(1/2)R⁻¹gᵀ(x)[(∇σc(x))ᵀωc + ∇εc(x)]. (1.17)

3 For most of the general nonlinear cases, the ideal vector ωc and the ideal scalar εc are unknown
but they are both bounded.

Using the critic neural network (1.15), the approximate optimal feedback control
function is⁴

û∗(x) = −(1/2)R⁻¹gᵀ(x)(∇σc(x))ᵀω̂c. (1.18)
Based on the neural network formulation, the approximate Hamiltonian is written as

Ĥ (x, û ∗ (x), ∇ Jˆ∗ (x)) = U (x, û ∗ (x)) + ω̂cT ∇σc (x)[ f (x) + g(x)û ∗ (x)]. (1.19)

Noticing (1.9), we define the error as

ec = Ĥ (x, û ∗ (x), ∇ Jˆ∗ (x)) − H (x, u ∗ (x), ∇ J ∗ (x)) (1.20)

so that ec = Ĥ(x, û∗(x), ∇Ĵ∗(x)). As given in [1, 8, 74, 127], we define ∂ec/∂ω̂c ≜
φ ∈ R^lc and find that the set {φ1, φ2, . . . , φlc} is linearly independent.
Now, we show how to train the critic network and design the weight vector ω̂c
to minimize the objective function normally defined as E c = (1/2)ec2 . Traditionally,
based on (1.19), we can employ the normalized steepest descent algorithm
 
ω̂˙c = −αc (1/(1 + φᵀφ)²) ∂Ec/∂ω̂c = −αc (φ/(1 + φᵀφ)²) ec (1.21)

to tune the weight vector, where the constant αc > 0 is the learning rate while the
term (1 + φ T φ)2 is adopted for normalization. The simple diagram of the ADP-based
controller design method is depicted in Fig. 1.1, where (1.21) is the basic learning
criterion of the neural network.
Defining the error vector between the ideal weight and the estimated value as
ω̃c = ωc − ω̂c , we can easily find that ω̃˙ c = −ω̂˙ c . Here, let us introduce two new
variables φ1 = φ/(1 + φ T φ) and φ2 = 1 + φ T φ with φ1 ∈ Rlc and φ2 ≥ 1. Then, by
using the tuning rule (1.21), we derive that the critic weight error dynamics can be
formulated as
ω̃˙c = −αc φ1φ1ᵀ ω̃c + αc (φ1/φ2) ecH, (1.22)

where the scalar term ecH represents the residual error due to neural network approx-
imation.
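As a minimal, concrete illustration of the learning rule (1.21), consider the scalar plant ẋ = ax + bu with a = −1, b = 1 and utility x² + u², whose optimal cost is J∗(x) = (√2 − 1)x² by the scalar Riccati equation, so a critic with the single activation σc(x) = x² has ideal weight ωc = √2 − 1. The Python sketch below is an assumption-laden toy (the plant, learning rate, and uniform state sampling are illustrative choices, and φ is formed with the applied control held fixed, a common implementation convention):

```python
import numpy as np

# Scalar nominal plant dx/dt = a*x + b*u with utility U = q*x^2 + r*u^2.
# For a = -1, b = q = r = 1, the scalar Riccati equation gives
# J*(x) = (sqrt(2) - 1) * x^2, so the critic J_hat(x) = w * sigma_c(x)
# with sigma_c(x) = x^2 has ideal weight sqrt(2) - 1.
a, b, q, r = -1.0, 1.0, 1.0, 1.0
alpha_c = 0.5                        # critic learning rate
w = 0.0                              # estimated critic weight omega_hat_c
rng = np.random.default_rng(0)

for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)       # sampled states provide persistent excitation
    grad_sigma = 2.0 * x             # gradient of the activation sigma_c(x) = x^2
    u = -0.5 * (b / r) * grad_sigma * w                  # control law (1.18)
    e_c = q * x**2 + r * u**2 + w * grad_sigma * (a * x + b * u)  # error (1.19)
    phi = grad_sigma * (a * x + b * u)  # sensitivity of e_c, applied control held fixed
    w -= alpha_c * phi * e_c / (1.0 + phi * phi) ** 2    # normalized update (1.21)

print(w)  # approaches sqrt(2) - 1, i.e., about 0.4142
```

The fixed point of the update is exactly where the approximate Hamiltonian (1.19) vanishes for all sampled states, which in this toy case recovers the ideal weight; in higher dimensions the same rule drives ω̂c toward ωc up to the residual ecH in (1.22).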
In adaptive critic designs, we intend to identify the parameters of the critic net-
work so as to approximate the optimal cost function. As commonly required within
the adaptive control field [49], the persistence of excitation assumption is naturally

4 The control law function is directly computed as a closed-loop expression of the critic weight
vector in this single network structure. An additional action network is built when implementing
the synchronous policy iteration algorithm [60, 127] to improve the sequential updates [58, 130]
in terms of saving computation time and avoiding dynamics knowledge.

Fig. 1.1 The ADP-based learning process and optimal control design diagram. The solid line
represents the signal flow while the dashed line denotes the neural network back-propagating path.
The dotted component indicates whether there is an improvement module added to the learning
criterion. If it is set to “N”, there is no improvement and it is actually the traditional learning rule
(1.21). If it is set to “Y”, there will be an improved module (discussed later) during the learning
process

needed during adaptive critic learning. Note that based on [127, 129], the persistence
of excitation condition ensures that λmin (φ1 φ1T ) > 0, which is significant to perform
the closed-loop stability analysis. The following assumption is commonly used such
as in [13, 88, 100, 127].

Assumption 1.1 The control matrix g(x) is upper bounded such that ‖g(x)‖ ≤ λg,
where λg is a positive constant. On the compact set Ω, the terms ∇σc(x), ∇εc(x), and
ecH are all upper bounded such that ‖∇σc(x)‖ ≤ λσ, ‖∇εc(x)‖ ≤ λε, and |ecH| ≤ λe,
where λσ, λε, and λe are positive constants.

Definition 1.2 (cf. [88, 100, 144]) For a nonlinear system ẋ = f(x(t)), its solution
is said to be UUB, if there exists a compact set Ω ⊂ Rn such that for all x0 ∈ Ω, there
exist a bound Λ and a time T(Λ, x0) such that ‖x(t) − xe‖ ≤ Λ for all t ≥ t0 + T,
where xe is an equilibrium point.

Lemma 1.1 (cf. [127]) For system (1.1) and the constructed neural network (1.15),
we suppose that Assumption 1.1 holds. The approximate optimal control law is given
by (1.18) and the critic network is tuned based on (1.21). Then, the closed-loop
system state and the critic weight error are UUB.

The UUB stability actually implies that after a transition period T , the state vector
remains within the ball of radius Λ around the equilibrium point. Note that the proof of
such UUB stability is performed by employing the well-known Lyapunov approach.
Based on Lemma 1.1, the critic weight error ω̃c is upper bounded by a finite constant.
Then, according to (1.17) and (1.18), we can find that

u∗(x) − û∗(x) = (1/2)R⁻¹gᵀ(x)[(∇σc(x))ᵀω̃c + ∇εc(x)] (1.23)

is also upper bounded. This implies that the near-optimal controller û ∗ (x) can con-
verge to a neighborhood of the optimal value u ∗ (x) with a finite bound. Besides, this
bound can be set adequately small by adjusting the related parameters like the critic
learning rate.
It is also worth mentioning that the previous ADP-based optimal regulation
method provides the basis for further adaptive critic control designs. Note that the
dynamical uncertainties are not included in system (1.1). Considering the
universality of uncertain phenomena, it is indeed necessary to extend the ADP-based
optimal control design approach to robust stabilization problems and investigate the
robustness of ADP-based controllers under uncertain environment.

1.3 Nonlinear Robust Control Design with Matched Uncertainties

This section mainly presents the results about ADP-based robust control design
for matched uncertain nonlinear systems [3, 4, 14, 23, 25, 41, 42, 45, 75, 119,
133, 138–141, 144, 178, 199]. There are several categories of ADP-based robust
control strategies, such as the least-square-based problem transformation method
[3, 4], adaptive-design-based problem transformation method [75, 119, 133, 139–
141, 144, 199], data-based problem transformation method [138, 178], the combined
sliding mode control method [23], and the robust ADP method [14, 25, 41, 42, 45].
We will not only exhibit the robustness of the optimal controller with respect to the
nominal system but also discuss the optimality of the robust controller. Actually,
some of these methods [3, 4, 14, 25, 42, 45, 199] can be applied to unmatched
robust control design.

1.3.1 Problem Transformation Method

If dynamical uncertainties are brought into system (1.1) by various changes during
the operation of the controlled plant, we have to pay attention to the robustness of
the designed controller. We consider a class of continuous-time nonlinear systems
subjected to uncertainties and described by

ẋ(t) = f(x(t)) + g(x(t))[u(t) + d(x(t))],  (1.24)

where the term g(x)d(x) reflects a kind of dynamical uncertainty matched with the control matrix. We assume d(0) = 0, so as to keep x = 0 as an equilibrium of the controlled plant. It is often assumed that the term d(x) is bounded by a known function d_M(x), i.e., ‖d(x)‖ ≤ d_M(x) with d_M(0) = 0.
Considering the uncertain nonlinear system (1.24), to cope with the robust stabilization problem, we should design a control law u(x) such that the closed-loop state vector is stable with respect to dynamical uncertainties. In this section, by adopting a positive constant ρ and specifying Q(x) = ρd_M²(x), we show that the
robust control problem can be addressed by designing the optimal controller of the
nominal plant (1.1), where the cost function is still given by (1.3) and the modified
utility is selected as

U_R(x(t), u(t)) = ρd_M²(x(t)) + uᵀ(t)Ru(t).  (1.25)

Note that in this situation, the optimal control function is kept unchanged even if the
modified utility is employed. For system (1.1) and cost function (1.3) with modified
utility function (1.25), the Hamiltonian becomes

H_R(x, u(x), ∇J(x)) = ρd_M²(x) + uᵀ(x)Ru(x) + (∇J(x))ᵀ[f(x) + g(x)u(x)].  (1.26)

Observing the modified utility function (1.25) and using the optimal control law
(1.8) again, the HJB equation with respect to the modified optimal control problem
becomes

0 = ρd_M²(x) + (∇J∗(x))ᵀ f(x) − (1/4)(∇J∗(x))ᵀ g(x)R⁻¹gᵀ(x)∇J∗(x)
  = H_R(x, u∗(x), ∇J∗(x)),  J∗(0) = 0.  (1.27)

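To see where the quadratic term with the factor 1/4 comes from, note that (1.27) follows by substituting the optimal control law into (1.26). The form of (1.8) is not reproduced in this excerpt, so the first line below restates its standard form as an assumption:

```latex
\begin{aligned}
u^{*}(x) &= -\tfrac{1}{2}R^{-1}g^{\mathrm{T}}(x)\nabla J^{*}(x),\\
u^{*\mathrm{T}}(x)Ru^{*}(x) &= \tfrac{1}{4}\big(\nabla J^{*}(x)\big)^{\mathrm{T}}g(x)R^{-1}g^{\mathrm{T}}(x)\nabla J^{*}(x),\\
\big(\nabla J^{*}(x)\big)^{\mathrm{T}}g(x)u^{*}(x) &= -\tfrac{1}{2}\big(\nabla J^{*}(x)\big)^{\mathrm{T}}g(x)R^{-1}g^{\mathrm{T}}(x)\nabla J^{*}(x),
\end{aligned}
```

and the sum of the last two expressions yields the −(1/4)(∇J∗(x))ᵀg(x)R⁻¹gᵀ(x)∇J∗(x) term in (1.27).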
We first show the stability of the closed-loop form of the nominal system based
on the approximate optimal control law.

Theorem 1.1 (cf. [139]) For the nominal system (1.1) and cost function (1.3) with modified utility function (1.25), the approximate optimal control law obtained by (1.18) guarantees that the closed-loop system state is uniformly ultimately bounded (UUB).

Then, we show how to guarantee the robust stabilization of the matched uncertain
system (1.24) based on the designed near-optimal control law.

Theorem 1.2 (cf. [133]) For the nominal system (1.1) and cost function (1.3) with modified utility function (1.25), the approximate optimal control law obtained by (1.18) guarantees that the closed-loop form of the uncertain nonlinear plant (1.24) possesses UUB stability if ρ > λ_max(R).
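As a minimal numerical sketch of how the pieces above fit together, the snippet below evaluates an approximate control law of the form û(x) = −(1/2)R⁻¹gᵀ(x)(∇σc(x))ᵀω̂c and checks the condition ρ > λ_max(R) from Theorem 1.2. The system, basis, and weights are illustrative assumptions, not an example from the book:

```python
import numpy as np

# Illustrative (assumed) problem data -- not from the book.
R = np.array([[2.0]])      # control weighting matrix, positive definite
rho = 3.0                  # weight on d_M^2(x) in the modified utility (1.25)

def g(x):
    # assumed constant input-gain matrix, shape (2, 1)
    return np.array([[0.0], [1.0]])

def grad_sigma(x):
    # Jacobian of an assumed critic basis sigma_c(x) = [x1^2, x1*x2, x2^2]
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2,       x1],
                     [0.0,      2.0 * x2]])

def u_hat(x, w_hat):
    # approximate optimal control law in the spirit of (1.18):
    # u_hat(x) = -1/2 R^{-1} g(x)^T (grad sigma_c(x))^T w_hat
    grad_J_hat = grad_sigma(x).T @ w_hat
    return -0.5 * np.linalg.solve(R, g(x).T @ grad_J_hat)

# Theorem 1.2's sufficient condition for UUB robust stabilization
assert rho > np.max(np.linalg.eigvalsh(R)), "need rho > lambda_max(R)"

w_hat = np.array([1.0, 0.5, 1.5])   # assumed (converged) critic weights
x = np.array([1.0, -1.0])
print(u_hat(x, w_hat))              # one control input for this state
```

The point of the sketch is only that, once the critic weights ω̂c are fixed, the approximate optimal control is an explicit closed-form state feedback; no further optimization is solved online.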

Theorems 1.1 and 1.2 exhibit the closed-loop UUB stability of the nominal plant (1.1) and the uncertain plant (1.24), respectively, when applying the designed near-optimal control law (1.18). One should pay special attention to the fact that the closed-loop form of the uncertain plant is UUB when using the approximate optimal control law.
W I N D S .
The Winds, different from our Quarter of the World, in these Voyages
are either peculiar to warm Latitudes; such are Trade-Winds, Land
and Sea Breezes; or to the Coast, Tornadoes, and Air-Mattans.
Trade-Winds are easterly, blow fresh night and day, all the Year,
and every where round the Globe; that Part of it I mean that we are
upon, the Ocean, whether Atlantick, Indian, or American: for the Soil
and Position of Lands, though the same Cause of them subsists
more powerfully, gives uncertain and various Deflections. They will
extend to 30° of N. Latitude, when the Sun is on this side the
Equator, and as far S. when on the other; deflecting where he is
farthest off (here to the N. E. there to the S. E.) and always nearest
to the E. Point on the Equinoctial, or where he is vertical.
The general Causes assigned by the Ingenious for these
Phænomena, and with the greatest Probability of Truth, are;
First, the daily Rotation of the Earth Eastward upon its Axis,
whereby the Air or Wind (the enforced Stream of it) by this means
goes Westward in respect of the Superficies; and this is farther
countenanced in that these Winds are found only in the largest
Circles, where the diurnal Motion is swiftest; and also because they
blow as strong in the Night as Day; home, on the Coast of Brasil, as
near Guinea.
The second permanent Cause of this Effect, the ingenious Dr.
Halley ascribes to the Action of the Sun-beams upon the Air and
Water every day, considered together with the Nature of the Soil, and
Situation of the adjoining Continents.
The Sun heats and rarefies the Air exceedingly, in all Latitudes
within the Zodiack, (evident from the anhelous Condition it subjects
most Animals to in Calms) and therefore the Air from Latitudes more
without his Influence (as more ponderous) presses in, to restore the
Equilibrium: and to follow the Sun, must come from the Eastward.
The westerly Winds that restore this Balance, from Latitudes beyond
the Tropicks, would, I fancy, be as constant, and keep a Circulation,
were the whole a Globe of Waters: As it is, they are from 30 to 60°,
abundantly the most predominant, with a Deviation to N. or S. on
various Accidents: blow with more force, because, among other
Reasons, the Equilibrium is restored to a greater from a lesser
Circle; and as it were to confirm this, are received into the Trade-
wind, with a Deflection of N. E. or more northward at the Point of
reception.
On the Coast of Guinea, North of the Equinoctial, the true Winds
are westerly, keeping a Track with the Shore, where it trenches all
eastward. From the River Gabon again, under the Line, the Land
stretches to the Southward, and, exactly answerable thereto, the
Winds wheel from S. E. to S. by E. to keep nigh a Parallel with it; in
both, the Shore seems to deflect the true Trade, in the same manner
Capes do Tides or Currents, and obliges it, like them, on that Point
where they have the freest Passage. If at any particular Seasons (as
in the Rains is remarked) the Winds become more southerly, and set
full upon the Shore, they are weak; and as the Sun is at such time on
this side the Equinoctial, it is probably to restore an Equilibrium to
that Air at land, more rarefied from a stronger and more reflected
Heat.
I shall give two or three other Remarks on Trade-winds, proper,
tho’ made at other Periods of the Voyage.
1. You must be distant from the Influence of Land to Windward,
before the Trade blows true and fresh, (from this Coast we may
suppose twenty or thirty Leagues) and then a Ship bound to America
will make a constant and smooth Run of forty or fifty Leagues every
twenty-four Hours. And as there are no Storms, vast numbers of
flying Fish sporting near the Ship, (found every where within the
Verge of these Winds, and no where else that ever I saw,) Bonetoes
pursuing them; with Birds of various sorts, Garnets, Boobys, Tropick-
Birds, and Sheerwaters, it makes a very delightful sailing.
2. Although the N. E. and S. E. Trade-Winds on this and that side
the Line, do not blow adverse, yet by approaching to it, are in my
Thoughts, the Occasion of becalming the Latitudes between 4 and
12° N, the Point of Contest; as we found, and will be hereafter
remarked in our Passage from Brasil to the West-Indies, in July and
August: and this I think, First, because the East southerly Trade is
known ordinarily to extend E. S. E. to 4° of Northern Latitude: and
consequently, as the East northerly is bounded a little nearer or
further from the Equinoctial, as is the Station of the Sun; Calms and
small Breezes, the Attendant of them, may vary a little, yet they will
always happen about these Latitudes, and near the windward
Shores be attended with Thunder, Lightning, and perpetual Rains.
Secondly, all Ships actually find this in their Passage from Guinea to
the West-Indies in any Month, or from England thither; the true Trade
decreasing as they approach those Latitudes, and up between Cape
Verd and the Islands, those Calms by all our Navigators are said to
be as constantly attended with Rains and Thunder.
Thirdly, Because the same thing happens at the Commencement
of the Trade, from the variable Winds in 27 or 28° of Northern
Latitude, sooner or later as I observed is the Station of the Sun:
From all which I would infer, that from Guinea these calm Latitudes
are easier passed, not nigh, but within 100 Leagues of the Continent
of Africa, and at America not to get into them till a Ship has nigh run
her Distance; for the Land, I think, either to Windward or Leeward
does give a better Advantage to the Breezes, than nearer or more
remote: Ships from England do not want this Caution so much,
because the N. E. Trade does not fail till a little beyond the Parallel of
Barbadoes, the Southermost of our Islands.
Land and Sea-Breezes are Gales of no great Extent, the former
much fainter and inconstant will blow off an Island to a Roadsted, be
on which side of it you will, but whether at the same time or no, or
now here, now there, I am not experienced enough to say, tho’ their
Weakness and Inconstancy makes either way defensible.—They are
found at all shores within or near the Tropicks, the Sea-breeze
coming in about ten in the morning, fresh and sweet, enlivening
every thing. The Land-breeze when it does succeed, is at the same
distance from Sun-set or later, small, sultry, and stinking, especially
when from Rivers whose Banks are pestered with rotten Mangroves,
stagnating Waters, &c.
They seem to arise entirely from the Heat of the Sun-beams: That
the Air is more rarified by their Reflections on the solid Body of the
Earth than on a fluid, is certain; therefore till their rarified Air, made
so by three or four hours Sun, is brought to an Equilibrium, the
Breezes must be from the Sea at all parts of the Coast, because at
all parts, the same Cause is operating. And if this Rarefaction is
limited by a determined heighth of the Atmosphere, the Sea-breezes
that are to fill up the Vacuities will last a determined time only; two,
three, or more hours: this is fact, but whether properly solved, must
be submitted. Of affinity with this are the frequent Breezes we find
with meridian Suns at shores, even to the Latitude of England, tho’
very still before and after. Again, the Land-breezes which succeed at
night when the Sun has lost it’s Power, seem by their Weakness to
be the return of Air heaped up by the preceding day’s Heat, like
other Fluids when higher or fuller from any Cause (in one part than
another) of course has it’s reflux to make an even Surface.
Tornadoes, by the Spaniard called Travadoes, are in no part of the
World so frequent as at Guinea. They are fierce and violent Gusts of
Wind that give warning for some hours by a gradual lowering and
blackening of the Sky to Windward whence they come, accompanied
with Darkness, terrible Shocks of Thunder and Lightning, and end in
Rains and Calm. They are always off shore, between the N. and N.
E. here, and more Easterly at the Bites of Benin, Calabar, and Cape
Lopez; but although they are attended with this favourable Property
of blowing from the shore, and last only three or four hours, yet
Ships immediately at the appearance of them furl all their Sails and
drive before the Wind.
We have sometimes met with these Tornadoes two in a day, often
one; and to shew within what a narrow Compass their effects are,
Ships have felt one, when others at ten Leagues distance have
known nothing: Nay, at Anamaboo (3 or 4 Leagues off) they have
had serene Weather while we have suffered under a Tornado in
Cape Corso Road. And vice versa. A Proof of what Naturalists
conjecture, that no Thunder is heard above 30 Miles; in these
Storms it seems to be very near, one we felt the Afternoon of taking
Roberts the Pyrate, that seemed like the ratling of 10000 small Arms
within three yards of our Heads; it split our Maintop-Mast, and ended
as usual in excessive Showers, and then calm; the nearness is
judged by the Sound instantly following the Flash. Lightning is
common here at other times, especially with the shutting in of
Evening, and flashes perpendicularly as well as horizontally.
Both arise from a plenty of nitrous and sulphurous Exhalations that
make a Compound like Gun-powder, set on fire in the Air; and if the
Clouds that retain them be compact, and their heterogeneous
Contents strong, various, and unequal, then like a Cannon in
proportion to these, the disjection is with more or less Violence,
producing Thunder, which as with a Shot has frequently split the
Masts of Ships; and strengthens the above Observation of their
being discharged near hand; because if at any considerable
distance, they would spread in the Explosion, and lose their Force. It
furnishes also another, viz. That neither Thunder nor Lightning can
be felt or heard far from shores; Winds may impel such Exhalations
something, but at a hundred Leagues from any Land the
Appearance must be rare and uncommon, because the matter of
their Compound cannot be collected there.
Air-mattans, or Harmatans, are impetuous Gales of Wind from the
Eastern Quarter about Midsummer and Christmas; they are attended
with Fogs, last three or four hours, (seldom with Thunder or
Lightning, as the Tornados) and cease with the Rain; are very dry,
shriveling up Paper, Parchment, or Pannels of Escruitores like a Fire.
They reach sometimes this Gold Coast, but are frequentest and in a
manner peculiar to the Bite of Benin, named so some think from Aer
Montain, respecting whence they come; or by others Mattan, the
Negrish Word for a pair of Bellows, which they having seen,
compare this Wind to.
The G U I N E A Trade.
An extensive Trade, in a moral Sense, is an extensive Evil, obvious
to those who can see how Fraud, Thieving, and Executions have
kept pace with it. The great Excess in Branches feeding Pride and
Luxury, are an Oppression on the Publick; and the Peculiarity of it in
this, and the Settlement of Colonies are Infringements on the Peace
and Happiness of Mankind.
By discoursing on this particular Branch, I do not pretend to a
Sufficiency of giving full Directions; the Natives Alteration and
Diversity of Taste are Obstacles with the most experienced: It’s only
within my Design to give a general Insight to such as are Strangers,
and a Rule to improve upon by such as are not.
We may for this end divide Guinea into a windward Coast, the
Gold Coast, and the Bay, a Tract of 6 or 700 Leagues from the River
Gambia, in 13° N. to Angola, about 9 or 10° S. The Portuguese
were the first Europeans that settled and built Forts here, tho’ now
the least concerned, paying their Tribute to the Dutch for Leave:
What remains of theirs is to the Southward on the River Congo at
Loango de St. Paul, and the Islands, where they keep Priests to
teach their Language to the Natives, and baptize without making
Christians.
1. In the windward Coast, Gambia, Sierraleon, and Sherbro Rivers
may be reckoned chief; the African Company having Factors and
Settlements there. Less noted, but more frequented by private Ships
in this part of Guinea, are Cape Mont, and Montzerado, Sesthos
River, Capes Palmas, Apollonia, and Tres Puntas. A number of
others intervene, of more or less Trade; which it is their Custom to
signify at the sight of any Ship by a Smoke, and is always looked on
as an Invitation to Trade; but as each is alterable among them from
the Chance of War, the Omission shews they decline it, or are out of
Stock.
This Change of Circumstance found on different Voyages,
proceeds from weak and bad Governments among themselves,
every Town having their own Cabiceers or ruling Men, (or it may be
three or four in Confederacy) all so jealous of the others Panyarring,
that they never care to walk even a mile or two from home without
Fire-Arms; each knows it is their Villanies and Robberies upon one
another that enables them to carry on a Slave-trade with Europeans;
and as Strength fluctuates, it is not unfrequent for him who sells you
Slaves to-day, to be a few days hence sold himself at some
neighbouring Town; this I have known.
The same way of reasoning answers for the Panyarrs and
Murders so frequently between them and us, and never that I heard
with the French or Portuguese. For if any of our Ships from Bristol or
Liverpool play tricks, and under pretence of Traffick seize and carry
away such of them as come on board, and trust themselves on that
Confidence, the Friends and Relations never fail with the first
Opportunity to revenge it; they never consider the Innocence of who
comes next, but as Relations in Colour, Panyarr the Boat’s Crews
who trust themselves foolishly on shore, and now and then by
dissembling a Friendship, have come on board, surprized and
murdered a whole Ship’s Company. Captain Piercy’s Lieutenant was
killed on shore on some such Pretence, or because he had a good
Suit of Cloaths, or both. Captain Canning of the Dove Brigantine
1732, was cut off by the Natives of Grand Bassau from an
Inadvertency; first, of tempting the Negroes with the sight of a fine
Cargo, and then by trusting the Mate Mr. Tho. Coote on shore; the
one prompted them to rob, and the other was an Hostage for their
Security, they ventured off in their Canoos and murdered all the
Company under the Conduct of a Fellow they called Thomas Grey,
who run the Vessel in shore; the Mate remained with them unhurt,
about sixteen days, and was then redeemed by Captain Wheeler for
17 Pounds worth of Goods, which as an Encouragement to the
Service, he was suffered to repay at London. His Food during the
stay, was Indian Corn, Rice, Snails and Monkeys; the last they shoot
as often as they want, in the Woods, and after the Guts are taken
out, singe the Hair off, and then boil it in the Skin. He saw no other
Flesh in this part of the Country, excepting a few Fowls, tho’ he was
up it above twelve miles.
2. The Gold Coast is the middle and smallest part of the Division,
stretching from Axiem a Dutch Settlement, to near the River Volta,
an extent of 70 or 80 Leagues, but of more consequence than the
others, in respect to our’s and the Dutch Company’s Forts, who
together command the greatest part of it. There is one Danish Fort at
Accra indeed, (the Leewardmost of our Settlements) but in a
decaying State, and will probably (as that of the Brandenburghers at
Cape Tres Puntas) be relinquished in a little time.
Our Company’s principal Fort is at Cape Corso. That of the Dutch,
two or three Leagues above, called Des Minas or St. George de
Elmina; each has other little ones up and down this Coast, to gather
in the Trade that centers for the respective Companies, at one or
other of the aforesaid larger Forts.
The African Company was erected under the Duke of York in K.
Charles II’s Time, and therefore Royal; the Epithet being still
retained, tho’ that Prince’s Superstition, and Thirst after Power, have
long since justly banish’d him the Realm.
In it’s first flourishing Condition, it was allowed by authentick
Accounts to have gained annually to England 900,000l. whereof in
Teeth, Camwood, Wax and Gold, was only 100,000l. and the rest in
Slaves; which in the Infancy of their Trade were in very great
demand over all the American Plantations to supply their own wants,
and carry on a clandestine Commerce with the Spanish West-Indies.
On Computation, Barbadoes wanted annually 4000 Negroes,
Jamaica 10000, Leeward Islands 6000; and because the Company
(’twas complained by such as wished them ill Success) could not
supply this Number, having only imported 46396 Slaves between the
years 1680 and 1688; Interlopers crept in, and contended for a
Share; which the Company represented as contrary to the Privileges
of their Patent, and withal, that the Accusation was groundless and
unjust, because they did supply enough for demand, and maintained
Forts and Garisons at a great Charge, for awing and subjecting the
Natives to trade, and maintaining an Industry equal to the Dutch,
without which it was plain to all impartial Considerers, it would be but
very difficultly carried on. However, their Adversaries, after some
years of grumbling, obtained an Act of Parliament 1697, whereby
private Traders for making good this deficiency of Slaves, should
have Liberty of Trade, allowing the Company 10 per Cent. towards
defraying their extraordinary Expence.
From this time the Company more visibly decayed, insomuch that
in eight following years they only imported to the West-Indies 17760
Slaves; and the separate Traders in that time 71268.
Their 10 per Cent. in the first ten years amounted to 87465l. and
therefore finding their Trade under great disadvantages with these
new Inmates, they resolved to make the best shares they could in
this Money, by lessening their Expence about the Forts. They
accordingly withdrew all Supplies from their Garisons, leaving them
to subsist by their own Management or starve. Gambia Fort having
only twelve men, was taken by a Privateer of eight Guns in 1709,
Sierraleon thirteen men, Sherbro four, and these not of any Charge
to the Company, but were possessed by such, who having a long
time resided in their Service, by help of those Fortifications were
capable to do something for themselves, and so the private Traders
by degrees got entirely quit of their Impost; the reason in a manner
ceasing, for which it was at first allowed.
About 1719, their Affairs seemed to revive again, under the
Auspices of the Duke of Chandois, who became a very considerable
Proprietor in their Stock, and promised from his Figure and Interest a
Renewal of those Privileges that had depressed them; their
Objections ceasing, (the number demanded being now very short of
what it was formerly.) More Ships were imployed than for many
years past, but whether it were their too large Expence, or
Corruption of their chief Officers, who too often in Companys think
they are sent abroad purely for their own Service, or both; they soon
felt that without a separate Act they were uncapable of contending
with private Traders, and every year more and more explaining their
Inability, they applied to Parliament, and now support their Forts by
an annual Allowance from the Government, of 10000l.
Those who are the Favourers of Companies suggest, that if the
Trade must be allowed, and the Christian Scheme of enlarging the
Flock cannot well be carried on without it, that then it seems
necessary and better for the Publick that some rich and powerful Set
of Men should have such exclusive Powers to encourage and enable
the subsisting of Forts and Garisons, to awe the Natives and
preserve the Trade from being engrossed by our dangerous Rivals
here, the Dutch; which, as we relinquish, falls an acquisition to them,
and renders all precarious; they could also bring (as an exclusive
Company) foreign Markets to their own Price.
The Company’s Trade wanting that Encouragement, every year
grows worse; buying dearer than in times past on the Coast, and
selling cheaper in the West-Indies; the reason at Guinea, is a greater
Scarcity of Slaves, and an improved Knowledge in the trading
Negroes who dispose of them; and at the West-Indies it is the
Demand failing, more disadvantageously still for them, because
separate Traders are not under the delays they are subject to: They
take the whole Coast in their way, while the other is consigned to the
Governour, and can afford to undersel their Goods (necessary
Requisites for Dispatch and Success) because they stand exempt
from all Coast-Charges. On the other side, our Colonies are now
pretty well glutted with Slaves, and their Call consequently not nigh
so large: 2000 in a year perhaps furnishes all our Plantations, and
tho’ more are imported, it is in order to transport them again to the
Spanish West-Indies, where tho’ the Assiento Ships are of late years
only indulged by Treaty, all others being liable to Confiscation, and
the People to Slavery if taken by the Spanish Guard le Costa; yet the
Prospect of Gain inciting, they still find means to continue on, and
maintain a forcible Traffick for them, under the Protection of their
Guns. This clandestine Method, by the way, hurts the South-Sea
Company, beating down the Price of their Slaves, who cannot so
well afford it, because bought, and brought there at a greater
Charge.
The third part of our Division is the Bay of Guinea, which takes in
Whydah, Benin, Callabar, &c. to Congo and Angola in 8° S. In this
Extent Whydah is principal, there being more Slaves exported from
that place before the late Conquest of it by the King of Dauhomay,
than from all the rest of the Coast together, the Europeans being
said in some years to have carried off 20000; but more of this by and
by. I shall only observe, that as this part abounds more with Slaves,
the other does with Gold, and the windward Coast with Ivory.
I now proceed to our Method of Trade, and shall sum the Rules of
it up, under the head of Interlopers. Private trading Ships bring two
or three Boats with them upon this Coast for Dispatch, and while the
Mates go away in them with a proper Parcel of Goods, and
Instructions into the Rivers and By-places, the Ship is making good
her Trade at others near hand.
The Success of a Voyage depends first, on the well sorting, and
on the well timing of a Cargo. Secondly, in a Knowledge of the
places of Trade, what, and how much may be expected every where.
Thirdly, in dramming well with English Spirits, and conforming to the
Humours of the Negroes. Fourthly, in timely furnishing proper Food
for the Slaves. Fifthly, in Dispatch; and Lastly, the good Order and
Management of Slaves when on board; of each, a Word or two.
First, on the Timing of a Cargo: This depends at several places
much on Chance, from the fanciful and various Humours of the
Negroes, who make great demands one Voyage for a Commodity,
that perhaps they reject next, and is in part to be remedied either by
making the things they itch after, to pass off those they have not so
much mind to, or by such a continual Traffick and Correspondence
on the Coast, as may furnish the Owner from time to time with quick
Intelligence, to be done only by great Merchants, who can keep
imployed a number of Ships, that like a Thread unites them in a
Knowledge of their Demands, and a readier Supply for them, as well
as dispatch for their Master’s Interest, by putting the Purchases of
two or three Ships into one. The late Mr. Humphry Morrice was the
greatest private Trader this way, and unless Providence had fixed a
Curse upon it, he must have gained exceedingly.
Secondly, Of the Sorting, this may be observed in general; That
the Windward and Leeward Parts of the Coast are as opposite in
their Demands, as is their distance. Iron Bars, which are not asked
for to Leeward, are a substantial Part of Windward Cargoes.
Crystals, Orangos, Corals, and Brass-mounted Cutlasses are almost
peculiar to the Windward Coast;—as are brass Pans from Rio
Sesthos to Apollonia.—Cowreys (or Bouges) at Whydah.—Copper
and Iron Bars at Callabar;—but Arms, Gun-powder, Tallow, old
Sheets, Cottons of all the various Denominations, and English Spirits
are every where called for. Sealing-wax, and Pipes, are necessary in
small Quantities, they serve for Dashees (Presents) and a ready
Purchase for Fish, a Goat, Kid, or a Fowl.
To be more particular, here follows an Invoyce bought at London
about the year 1721.

A G U I N E A Cargo.
l. s. d. l. s. d.
10 Cotton Ramalls at 0 11 0 5 10 0
10 Silk Do 1 00 0 10 00 0
20 Herba-longees 0 10 0 10 00 0
20 Photees 0 17 6 17 10 0
30 Tapseils 0 12 0 18 00 0
20 Blue swaft Bafts 1 02 0 22 00 0
20 Chintz 0 12 6 12 10 0
50 Nichanees 0 13 0 32 10 0
176 Blue Paper Sletias 0 7 6 66 00 0
650 Crystal Beads No 221 per Mill. 2 00 0 13 00 0
2500 Do — No 30 2 12 0 6 10 —
4500 Do — No 36 2 18 0 13 01 0
2000 Rangos per Cwt. 0 11 0 11 00 0
4 Cases and Chests — — — 1 15 0
Charges and Entry at Custom-house — — — 3 12 6
Ct. q. l.
20 Brass Kettles qt. 2 0 02
28 Do 2 0 04
25 Do 2 0 06
251 Guinea Pans 3 0 18
-------------
9 1 02
per Cwt. 7l. 7s. 0d. 68 02 5
-------------
311 00 11

4 Casks 1 03 00
20 Chests of old Sheets each qt. 65, at 0 1 10½ 121 17 06
130 2lb. Guinea Basins.
73 3 — Do
13 4 — Do
In all 4Cwt. 1q. 11l. 18 04 09
Box of Scales, Weights and blue Pans. — 19 00
Cartage, Portage, Wharfage, &c. 4 10 00
84 Quart Tankards at 2s. 2d. 9 02 00
96 Pint Do at 1 8 8 00 00
A Cask — 14 09
11 Groce of slope-pointed Knives at 1l. 6s. 14 06 00
200 Blue Ranters at 0 08 00 80 00 00
50 Narrow green Do 0 08 00 20 00 00
50 Broad blue Do 0 11 06 28 15 00
25 Says at 1 15 06 44 09 06
8 Cases with Carriage 2 10 06
150 Trading Guns at 0 08 03 61 17 06
50 Do dock Locks 0 08 06 21 05 00
150 Cags 0 00 07 02 10 06
21 Cwt. Tallow 2 01 00 43 01 00
For melting and putting up per Cwt. — 00 02 2 03 00
Cartage, and 10 large Cags — 00 11 00 16 08
-------------
797 06 07
35 Small Cags at 0 00 08 1 03 04
10 Barrels of Powder 3 05 00 32 10 00
Wateridge and shifting the Powder 00 08 06
50 Wickered Bottles 0 03 02 9 03 04
172 Gall. malt Spirits 0 02 00 17 04 00
40 Cases of Spirits 0 07 00 14 00 00
Freight of a Vessel to Portsmouth 5 10 00
Expences and Postage of Letters 0 11 00
Commission at 2½ per Cent. 22 03 03
-------------
900 00 00
10 Cwt. of Cowrys at 5l. 50 00 00
-------------
Total 950 00 00

I was but a young Trader, and could not find out till I came upon
the Coast, that this Cargo was ill sorted. At the first place we touched
(Sierraleon) where commonly may be got twenty or thirty as good
Slaves as any upon the Coast, I found I had neither Cutlasses, iron
Bars, a better sort of Fire-Arms, Malt, and other strong Liquors, the
delight of those Traders. At none of the others, quite down to the
Gold Coast, were many considerable Articles of my Invoyce ever
asked for; so that I was forced to make friends with the Factorys, and
exchange at such a loss, that had it not been for the small Wages
our Ship was at, and some lucky hits, the Owners must have
suffered much; but to give an Insight.

The Sale of Goods.
At Sierraleon.
Gold Bars.
1 Piece of Planes 10
7 77lb. Kettles 26
3 Pieces of Chintz 12
1 Piece of Handkerchief Stuff 2
---
The Price of a Woman Slave 50

7 50lb. Kettles 20
5 Pieces of Brawls 10
1 Piece of Ramal 4
1 Bar of Iron 1
---
The Price of a Boy Slave 35

At Apollonia.
Accys.
2 Photees 14
2 Cotton Ramals 8
1 Piece Longee 4
2 Sletias 5
7 Sheets 7
32 Brass Pans 32
---
A Man Slave 70

3 Photees 21
41 Sheets 41
2 Longees 8
---
A Man Slave 70

At Gambia.
Gold Bars.
9 Gallons of Brandy 9
6 Bars of Iron 6
2 Small Guns 10
1 Cag of Powder 10
2 Strings of Pacato Beads 2
1 Paper Sletia 3
---
A Woman Slave 40

At Assinee.
8 Trading Guns 32
1 Wicker Bottle 4
2 Cases of Spirits 6
28 Sheets 28
---
A Man Slave 70

At Anamaboo and Cape Palmas.


Accys.
A Cag of Tallow 2½
A quart Pewter Tankard 1
A Pint Do ½
4lb. Pewter Basin 1
2lb. Pewter Basin ½
Sealing-Wax 3
A qr. Barrel of Powder 8
A gallon Cag of Musket-Shot 6
A gallon Cag of small Shot 8

At Whydah,
Cowrys sell per Cwt.—— 12l. 10s. or in their
way of reckoning, 10 grand Quibesses.
At Angola, the Duties are about 100l. Sterl.
every Ship; and Goods sell, viz.
Pieces.
A Gun 1
A Cag of Powder 1
A deep blue Baft 3
A Culgee 3
A Tapseil 2
A Nicanee 2
A Cutchalee 1½
A red Chintz 1½
A Bundle of Anabasses qt. 10lb. 1
10 Brass Pans small and large 1
4 2lb. Pewter Basins 1
1½ Case of Spirits 1
A whole Case Do 1½
4 Cutlasses 1
A Guinea Stuff ½
2 Bunches of Beads 1
4 King’s Cloths 1
4 Looking-Glasses 1
10 Pint Mugs 1
A Brawl ½
9 Foot of black Bays 1
16 Inches of Scarlet Cloth 1
16 Do of blue Cloth 1
1 Photee 2
1 Pair Cotton Ramal 1½

As I propos’d only a general View of the Trade, I have pointed out here the best I could, what Goods are asked for, the Price, and at
some places, the Proportion; the Slaves selling at a Medium of 15l. a
Man, and 12l. a Woman; a Gun and Barrel of Powder being always
parts of the Truck (at Cabenda) for a Slave. They have Canoos
there, will carry 200 Men; matted Sails to them, and Cordage twisted
from a wild Vine that grows in plenty about the Country; with these
they pass frequently from Congo to Loango. A Slave-Ship in the
former River would intercept much of the Trade to Cabenda and
Angola: The Duties are easy with the King of Soni, and the Harmony
they live in with a few defenceless Portuguese Missionaries, shews
they are a peaceable People.
A Second Requisite for Success in this Trade, is an acquaintance
with the Places, what may be expected at them, either as to the
Manner of Trading, bold or fearful of one another, and the Number of
Slaves they are able to bring.
Where the Company’s Factors are settled, as at Gambia, and
along the greatest part of the Gold Coast, they influence the Trade
something against private Ships; so also at Sierraleon some
separate Traders live, who voyage it with Boats into the adjacent
Rivers, and most of what a Ship can purchase, is thro’ their hands;
but those from London seldom strike higher upon the Coast than
Cape Mount, Montzerado, and Junk, falling from thence down to
Leeward; many of the places in their Course being rendered
dangerous, from the Tricks and Panyarrs the Traders have first
practised upon the Negroes; a mutual Jealousy now keeping each
side very watchful against Violence. We trade on board the Ship,
often keeping our Sailors in close quarters abaft, because few: while
the Slaves are viewing and contracting for at the fore part; at night
also keeping a good Watch, some of these Negroes attempting now
and then to steal with their Canoos athwart your Hawse, and cut the
Cable. Captain Cummin at Whydah, they stranded 1734.
They again, are as often diffident of coming nigh us, and will play
for hours together in their Canoos about the Ship, before they dare
venture. In this windward part, I have before observed, they have a
superstitious Custom, of dropping with their Finger a drop of Sea-
Water in their Eye, which they are pleased when answered in, and
passes for an Engagement of Peace and Security; and yet after all
this Ceremony, they will sometimes return to shore: If hardy enough
to come on board, they appear all the time shy and frightned, and
from the least appearance of a Panyarr, jump all over board.
Downwards to Bassam, Assinee, Jaquelahou, Cape le Hou, Jaque a
Jaques, Cape Apollonia, and Three Points, or where they have
possibly gained a Knowledge of the English Factorys, there is a
better Understanding and Security: These are places that sell off a
number of Slaves, managed however wholly on board the Ships who
anchor before the Town, hoist their Ensign, and fire a Gun: Or when
the Natives seem timorous, do it by their Boats coasting along the
Beach, and pay at some of them a small Duty to the chief Cabiceers.
Thirdly, To give dispatch, cajole the Traders with Dashees of
Brandy, and tell them, you cannot possibly stay above a day or two,
and that on their account. To a Country-Man, if he joins where there
is prospect of Goodee Trade, you are to form some Story that may
carry him farther to Leeward if possible, (two or three Leagues will
hinder his doing you any Damage for that Voyage.) The Lye did me
most Service, and for which I had the Merchant’s Dispensation, was
informing my good Friend that at Cobelahou they had taken a great
number of Captives, and that Captain —— had got his Freight there
in ten days: this I did with an air of Diffidence, to make the greater
Impression, and at the same time dashee’d his Negro Friends to go
on board and back it. If on better Intelligence such like Story should
not take, and he resolves to stay and share, your Reputation is
secured by the diffidence of your Report, and you must resolve with
him now upon a Price in your Slaves, not to outbid one another; but
at the same time make as strong a Resolution not to observe it. And
here the Master has room to display his talent, the frequency of the
Trick having made all very cautious and diffident.
When a Ship has gathered up all this Trade, she makes up the
deficiency of her Freight at Anamaboo, three Leagues below Cape
Corso, where they constantly stop, and are sometimes two or three
Months in finishing. It is a place of very considerable Trade in itself;
and besides, the Company have a House and Factor, keeping
always a number of Slaves against those demands of the
Interlopers, who, they are sensible, want dispatch, and therefore
make them pay a higher Price for it than any where on the whole
Coast; selling at six Ounces and a half a Slave (in exchange for
Goods) tho’ the poor Creatures look as meagre and thin as their
Writers.
If the Company should want rather to buy than sell, as is
sometimes the case, and fits both; then such a difference is paid by
the General, as shall make it worth the Ship’s time to go to
Windward again.
Hence I make this deduction, that if the Adventurers Stock be
small, only sufficient to employ one Vessel, to have her a Sloop;
because less hazard is run in lengthning out time, which subjects to
Sickness and Mortality among the Slaves; saves the aggregate
Charge of supporting them and a Ship’s Company, and likewise such
a Vessel will have less remains of Cargo, after her Slaving is
compleated; what is left, usually going off to the trading Cabiceers
and Factories at a low Price, or what is worse, kept on board and
spoiled.
Contrarily, great Traders who can imploy many Ships, obviate in a
great measure such Inconveniencies: They put the Trade of two or
three Ships into one at Anamaboo, (the largest and most
chargeable) and with the conjunction of their remains, go to
Windward, and begin anew.
Fourthly, giving way to the ridiculous Humours and Gestures of the
trading Negroes, is no small artifice for Success. If you look strange
and are niggardly of your Drams, you frighten him; Sambo is gone,
he never cares to treat with dry Lips, and as the Expence is in
English Spirits of two Shillings a Gallon, brought partly for that
purpose; the good Humour it brings them into, is found discounted in
the Sale of Goods.
A fifth Article, is the wholesome Victualling, and Management of
Slaves on board.
The common, cheapest, and most commodious Diet, is with
Vegetables, Horse-Beans, Rice, Indian Corn, and Farine, the former,
Ships bring with them out of England; Rice, they meet to Windward,
about Sesthos; Indian Corn, at Momford, Anamaboo, &c. and further
Supplies of them, or Farine, at the Islands of St. Thomas, and
