Advances in Industrial Control

Derong Liu
Qinglai Wei
Ding Wang
Xiong Yang
Hongliang Li

Adaptive Dynamic
Programming with
Applications in
Optimal Control
Advances in Industrial Control

Series editors
Michael J. Grimble, Glasgow, UK
Michael A. Johnson, Kidlington, UK
More information about this series at http://www.springer.com/series/1412
Derong Liu • Qinglai Wei • Ding Wang • Xiong Yang • Hongliang Li


Adaptive Dynamic
Programming
with Applications
in Optimal Control

Derong Liu
Institute of Automation
Chinese Academy of Sciences
Beijing, China

Qinglai Wei
Institute of Automation
Chinese Academy of Sciences
Beijing, China

Ding Wang
Institute of Automation
Chinese Academy of Sciences
Beijing, China

Xiong Yang
Tianjin University
Tianjin, China

Hongliang Li
Tencent Inc.
Shenzhen, China

ISSN 1430-9491 ISSN 2193-1577 (electronic)


Advances in Industrial Control
ISBN 978-3-319-50813-9 ISBN 978-3-319-50815-3 (eBook)
DOI 10.1007/978-3-319-50815-3
Library of Congress Control Number: 2016959539

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

Nowadays, nonlinearity is involved in all walks of life. It is a challenge for
engineers to design controllers for all kinds of nonlinear systems. To handle this
issue, various nonlinear control theories have been developed, such as theories of
adaptive control, optimal control, and robust control. Among these theories, the
theory of optimal control has drawn considerable attention over the past several
decades. This is mainly because optimal control provides an effective way to design
controllers with guaranteed robustness properties as well as capabilities of opti-
mization and resource conservation that are important in manufacturing, vehicle
emission control, aerospace systems, power systems, chemical engineering pro-
cesses, and many other applications.
The core challenge in deriving the solutions of nonlinear optimal control
problems is that it often boils down to solving certain Hamilton–Jacobi–Bellman
(HJB) equations. The HJB equations are nonlinear and difficult to solve for general
nonlinear dynamical systems. Indeed, no closed-form solution to such equations
exists, except for very special problems. Therefore, numerical solutions to HJB
equations have been developed by engineers. To obtain such numerical solutions, a
highly effective method known as adaptive/approximate dynamic programming
(ADP) can be used. A distinct advantage of ADP is that it can avoid the well-known
“curse of dimensionality” of dynamic programming while adaptively solving the
HJB equations. Due to this characteristic, many elegant ADP approaches and their
applications have been developed in the literature during the past several decades.
It is also notable that ADP techniques provide a link with cognitive
decision-making methods that are observed in the human brain, and thus, ADP has
become a main channel to achieve truly brain-like intelligence in human-engineered
automatic control systems.
Unlike most ADP books, the present book “Adaptive Dynamic Programming
with Applications in Optimal Control” focuses on the principles of emerging
optimal control techniques for nonlinear systems in both discrete-time and
continuous-time domains, and on creating applications of these optimal control
techniques. This book contains three themes:


1. Optimal control for discrete-time nonlinear dynamical systems, covering various
novel techniques used to derive optimal control in the discrete-time domain,
such as general value iteration, θ-ADP, finite approximation error-based value
iteration, policy iteration, generalized policy iteration, and error bounds analysis
of ADP.
2. Optimal control for continuous-time nonlinear systems, discussing the optimal
control for input-affine/input-nonaffine nonlinear systems, robust and optimal
guaranteed cost control for input-affine nonlinear systems, decentralized control
for interconnected nonlinear systems, and optimal control for differential games.
3. Applications, providing typical applications of optimal control approaches in the
areas of energy management in smart homes, coal gasification, and water gas
shift reaction.
This book provides timely and informative coverage about ADP, including both
rigorous derivations and insightful developments. It will help both specialists and
nonspecialists understand the new developments in the field of nonlinear optimal
control using online/offline learning techniques. Meanwhile, it will be beneficial for
engineers to apply the developed ADP methods to their own problems in practice.
I am sure you will enjoy reading this book.

Arlington, TX, USA Frank L. Lewis


September 2016
Series Editors’ Foreword

The series Advances in Industrial Control aims to report and encourage technology
transfer in control engineering. The rapid development of control technology has an
impact on all areas of the control discipline: new theory, new controllers, actuators,
sensors, new industrial processes, computer methods, new applications, new design
philosophies, and new challenges. Much of this development work resides in
industrial reports, feasibility study papers, and the reports of advanced collaborative
projects. The series offers an opportunity for researchers to present an extended
exposition of such new work in all aspects of industrial control for wider and rapid
dissemination.
The method of dynamic programming has a long history in the field of optimal
control. It dates back to those days when the subject of control was emerging in a
modern form in the 1950s and 1960s. It was devised by Richard Bellman who gave
it a modern revision in a publication of 1954 [1]. The name of Bellman became
linked to an optimality equation, key to the method, and like the name of Kalman
became uniquely associated with the early development of optimal control. One
notable extension to the method was that of differential dynamic programming due
to David Q. Mayne in 1966 and developed at length in the book by Jacobson and
Mayne [2]. Their new technique used locally quadratic models for the system
dynamics and cost functions and improved the convergence of the dynamic pro-
gramming method for optimal trajectory control problems.
Since those early days, the subject of control has taken many different directions,
but dynamic programming has always retained a place in the theory of optimal
control fundamentals. It is therefore instructive for the Advances in Industrial
Control monograph series to have a contribution that presents new ways of solving
dynamic programming and demonstrating these methods with some up-to-date
industrial problems. This monograph, Adaptive Dynamic Programming with
Applications in Optimal Control, by Derong Liu, Qinglai Wei, Ding Wang, Xiong
Yang and Hongliang Li, has precisely that objective.
The authors open the monograph with a very interesting and relevant discussion
of another computationally difficult problem, namely devising a computer program
to defeat human master players at the Chinese game of Go. Inspiration from the


better programming techniques used in the Go-master problem was used by the
authors to defeat the “curse of dimensionality” that arises in dynamic programming
methods.
More formally, the objective of the techniques reported in the monograph is to
control in an optimal fashion an unknown or uncertain nonlinear multivariable
system using recorded and instantaneous output signals. The algorithms’ technical
framework is then constructed through different categories of the usual state-space
nonlinear ordinary differential system model. The system model can be continuous
or discrete, have affine or nonaffine control inputs, be subject to no constraints, or
have constraints present. A set of 11 chapters contains the theory for various
formulations of the system features.
Since standard dynamic programming schemes suffer from various implemen-
tation obstacles, adaptive dynamic programming procedures have been developed
to find computable practical suboptimal control solutions. A key technique used by
the authors is that of neural networks which are trained using recorded data and
updated, or “adapted,” to accommodate uncertain system knowledge. The theory
chapters are arranged in two parts: Part 1 Discrete-Time Systems—five chapters;
and Part 2 Continuous-Time Systems—five chapters.
An important feature of the monographs of the Advances in Industrial Control
series is a demonstration of potential or actual application to industrial problems.
After a comprehensive presentation of the theory of adaptive dynamic program-
ming, the authors devote Part 3 of their monograph to three chapter-length appli-
cation studies. Chapter 12 examines the scheduling of energy supplies in a smart
home environment, a topic and problem of considerable contemporary interest.
Chapter 13 uses a coal gasification process that is suitably challenging to demon-
strate the authors’ techniques. And finally, Chapter 14 concerns the control of the
water gas shift reaction. In this example, the data used was taken from a real-world
operational system.
This monograph is very comprehensive in its presentation of the adaptive
dynamic programming theory and has demonstrations with three challenging pro-
cesses. It should find a wide readership in both the industrial control engineering
and the academic control theory communities. Readers in other fields such as
computer science and chemical engineering may also find the monograph of con-
siderable interest.
Michael J. Grimble
Michael A. Johnson
Industrial Control Centre
University of Strathclyde
Glasgow, Scotland, UK

References

1. Bellman R (1954) The theory of dynamic programming. Bulletin of the American Mathematical
Society 60(6):503–515
2. Jacobson DH, Mayne DQ (1970) Differential dynamic programming. American Elsevier Publishing
Co., New York
Preface

With the rapid development in information science and technology, many busi-
nesses and industries have undergone great changes, such as chemical industry,
electric power engineering, electronics industry, mechanical engineering, trans-
portation, and logistics business. While the scale of industrial enterprises is
increasing, production equipment and industrial processes are becoming more and
more complex. For these complex systems, decision and control are necessary to
ensure that they perform properly and meet prescribed performance objectives.
Under this circumstance, how to design safe, reliable, and efficient control for
complex systems is essential for our society. As modern systems become more
complex and performance requirements become more stringent, advanced control
methods are greatly needed to achieve guaranteed performance and satisfactory
goals.
In general, optimal control deals with the problem of finding a control law for a
given system such that a certain optimality criterion is achieved. The main differ-
ence between optimal control of linear and nonlinear systems lies in that the latter
often requires solving the nonlinear Bellman equation instead of the Riccati
equation. Although dynamic programming is a conventional method in solving
optimization and optimal control problems, it often suffers from the “curse of
dimensionality.” To overcome this difficulty, based on function approximators such
as neural networks, adaptive/approximate dynamic programming (ADP) was pro-
posed by Werbos as a method for solving optimal control problems
forward-in-time.
This book presents the recent results of ADP with applications in optimal
control. It is composed of 14 chapters which cover most of the hot research areas of
ADP and are divided into three parts. Part I concerns discrete-time systems,
including five chapters from Chaps. 2 to 6. Part II concerns continuous-time sys-
tems, including five chapters from Chaps. 7 to 11. Part III concerns applications,
including three chapters from Chaps. 12 to 14.
In Chap. 1, an introduction to the history of ADP is provided, including the basic
and iterative forms of ADP. The review begins with the origin of ADP and


describes the basic structures and the algorithm development in detail. Connections
between ADP and reinforcement learning are also discussed.
Part I: Discrete-Time Systems (Chaps. 2–6)
In Chap. 2, optimal control problems of discrete-time nonlinear dynamical systems,
including optimal regulation, optimal tracking control, and constrained optimal
control, are studied using a series of value iteration ADP approaches. First, an ADP
scheme based on general value iteration is developed to obtain near-optimal control
for discrete-time affine nonlinear systems with continuous state and control spaces.
The present scheme is also employed to solve infinite-horizon optimal tracking
control problems for a class of discrete-time nonlinear systems. In particular, using
the globalized dual heuristic programming technique, a value iteration-based
optimal control strategy of unknown discrete-time nonlinear dynamical systems
with input constraints is established as a case study. Second, an iterative θ-ADP
algorithm is given to solve the optimal control problem of infinite-horizon
discrete-time nonlinear systems, which shows that each of the iterative controls can
stabilize the nonlinear dynamical systems and the condition of initial admissible
control is avoided effectively.
In Chap. 3, a series of iterative ADP algorithms are developed to solve the
infinite-horizon optimal control problems for discrete-time nonlinear dynamical
systems with finite approximation errors. Iterative control laws are obtained by
using the present algorithms such that the iterative value functions reach the opti-
mum. Then, the numerical optimal control problems are solved by a novel
numerical adaptive learning control scheme based on the ADP algorithm. Moreover, a
general value iteration algorithm with finite approximation errors is developed to
guarantee that the iterative value function converges to the solution of the Bellman
equation. The general value iteration algorithm permits an arbitrary positive
semidefinite function to initialize the algorithm, which overcomes the disadvantage of
traditional value iteration algorithms.
In Chap. 4, a discrete-time policy iteration ADP method is developed to solve
the infinite-horizon optimal control problems for nonlinear dynamical systems. The
idea is to use an iterative ADP technique to obtain iterative control laws that
optimize the iterative value functions. The convergence, stability, and optimality
properties are analyzed for policy iteration method for discrete-time nonlinear
dynamical systems, and it is shown that the iterative value functions are nonin-
creasingly convergent to the optimal solution of the Bellman equation. It is also
proven that any of the iterative control laws obtained from the present policy
iteration algorithm can stabilize the nonlinear dynamical systems.
In Chap. 5, a generalized policy iteration algorithm is developed to solve the
optimal control problems for infinite-horizon discrete-time nonlinear systems.
The generalized policy iteration algorithm uses the idea of interaction between the policy
iteration algorithm and the value iteration algorithm of ADP. It permits an arbitrary
positive semidefinite function to initialize the algorithm, where two iteration indices
are used for policy evaluation and policy improvement, respectively. The
monotonicity, convergence, admissibility, and optimality properties of the generalized
policy iteration algorithm are analyzed.
In Chap. 6, error bounds of ADP algorithms are established for solving undis-
counted infinite-horizon optimal control problems of discrete-time deterministic
nonlinear systems. The error bounds for approximate value iteration based on a
novel error condition are developed. The error bounds for approximate policy
iteration and approximate optimistic policy iteration algorithms are also provided. It
is shown that the iterative approximate value function can converge to a finite
neighborhood of the optimal value function under some conditions. In addition,
error bounds are also established for Q-function of approximate policy iteration for
optimal control of unknown discounted discrete-time nonlinear systems. Neural
networks are used to approximate the Q-function and the control policy.
Part II: Continuous-Time Systems (Chaps. 7–11)
In Chap. 7, optimal control problems of continuous-time affine nonlinear dynamical
systems are studied using ADP approaches. First, an identifier–critic architecture
based on ADP methods is presented to derive the approximate optimal control for
uncertain continuous-time nonlinear dynamical systems. The identifier neural net-
work and the critic neural network are tuned simultaneously, while the restrictive
persistence of excitation condition is relaxed. Second, an ADP-based algorithm is
developed to solve the optimal control problems for continuous-time nonlinear
dynamical systems with control constraints. Only a single critic neural network is
utilized to derive the optimal control, and there is no special requirement on the
initial control.
In Chap. 8, the optimal control problems are considered for continuous-time
nonaffine nonlinear dynamical systems with completely unknown dynamics via
ADP methods. First, an ADP-based novel identifier–actor–critic architecture is
developed to provide approximate optimal control solutions for continuous-time
unknown nonaffine nonlinear dynamical systems, where the identifier is constructed
by a dynamic neural network to transform nonaffine nonlinear systems into a class
of affine nonlinear systems. Second, an ADP-based observer–critic architecture is
presented to obtain the approximate optimal control for nonaffine nonlinear
dynamical systems in the presence of unknown dynamics, where the observer is
composed of a three-layer feedforward neural network aiming to get the knowledge
of system states.
In Chap. 9, robust control and optimal guaranteed cost control of
continuous-time uncertain nonlinear systems are studied using the idea of ADP.
First, a novel strategy is established to design the robust controller for a class of
nonlinear systems with uncertainties based on an online policy iteration algorithm.
By properly choosing a cost function that reflects the uncertainties, regulation, and
control, the robust control problem is transformed into an optimal control problem,
which can be solved effectively under the framework of ADP. Then, the
infinite-horizon optimal guaranteed cost control problem of uncertain nonlinear
systems is investigated by employing the formulation of ADP-based online optimal

control design, which extends the application scope of ADP methods to nonlinear
and uncertain environments.
In Chap. 10, by using neural network-based online learning optimal control
approach, a decentralized control strategy is developed to stabilize a class of
continuous-time large-scale interconnected nonlinear systems. The decentralized
control strategy of the overall system can be established by adding appropriate
feedback gains to the optimal control laws of isolated subsystems. Then, an online
policy iteration algorithm is presented to solve the Hamilton–Jacobi–Bellman
equations related to the optimal control problems. Furthermore, as a generalization,
a neural network-based decentralized control law is developed to stabilize the
large-scale interconnected nonlinear systems with unknown dynamics by using an
online model-free integral policy iteration algorithm.
In Chap. 11, differential game problems of continuous-time systems, including
two-player zero-sum games, multiplayer zero-sum games, and multiplayer
nonzero-sum games, are studied via a series of ADP approaches. First, an integral
policy iteration algorithm is developed to learn online the Nash equilibrium solution
of two-player zero-sum differential games with completely unknown
continuous-time linear dynamics. Second, multiplayer zero-sum differential games
for a class of continuous-time uncertain nonlinear systems are solved by using an
iterative ADP algorithm. Finally, an online synchronous approximate optimal
learning algorithm based on policy iteration is developed to solve multiplayer
nonzero-sum games of continuous-time nonlinear systems without requiring exact
knowledge of system dynamics.
Part III: Applications (Chaps. 12–14)
In Chap. 12, intelligent optimization methods based on ADP are applied to the
challenges of intelligent price-responsive management of residential energy, with
an emphasis on home battery use connected to the power grid. First, an
action-dependent heuristic dynamic programming is developed to obtain the opti-
mal control law for residential energy management. Second, a dual iterative
Q-learning algorithm is developed to solve the optimal battery management and
control problem in smart residential environments, where two iterations, an internal
one and an external one, are introduced. Based on the dual
iterative Q-learning algorithm, the convergence property of iterative Q-learning
method for the optimal battery management and control problem is proven. Finally,
a distributed iterative ADP method is developed to solve the multibattery optimal
coordination control problem for home energy management systems.
In Chap. 13, a coal gasification optimal tracking control problem is solved
through a data-based iterative optimal learning control scheme by using an iterative
ADP approach. According to the system data, neural networks are used to construct the
dynamics of coal gasification process, coal quality, and reference control, respec-
tively. Via system transformation, the optimal tracking control problem with
approximation errors and disturbances is effectively transformed into a two-person
zero-sum optimal control problem. An iterative ADP algorithm is developed to
obtain the optimal control laws for the transformed system.
In Chap. 14, a data-driven stable iterative ADP algorithm is developed to solve
the optimal temperature control problem of the water gas shift reaction system.
According to the system data, neural networks are used to construct the dynamics of
water gas shift reaction system and solve the reference control. Considering the
reconstruction errors of neural networks and the disturbances of the system and
control input, a stable iterative ADP algorithm is developed to obtain the optimal
control law. Convergence property is developed to guarantee that the iterative value
function converges to a finite neighborhood of the optimal cost function. Stability
property is developed so that each of the iterative control laws can guarantee the
tracking error to be uniformly ultimately bounded.

Beijing, China Derong Liu


Chicago, USA Qinglai Wei
September 2016 Ding Wang
Xiong Yang
Hongliang Li
Acknowledgements

The authors would like to acknowledge the help and encouragement they have
received from colleagues in Beijing and Chicago during the course of writing this
book. Some materials presented in this book are based on the research conducted
with several Ph.D. students, including Yuzhu Huang, Dehua Zhang, Pengfei Yan,
Yancai Xu, Hongwen Ma, Chao Li, and Guang Shi. The authors also wish to thank
Oliver Jackson, Editor (Engineering) from Springer, for his patience and
encouragement.
The authors are very grateful to the National Natural Science Foundation of
China (NSFC) for providing necessary financial support to our research in the past
five years. The present book is the result of NSFC Grants 61034002, 61233001,
61273140, 61304086, and 61374105.

Contents

1 Overview of Adaptive Dynamic Programming . . . . . . . . . . . . . . . . . 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Adaptive Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Basic Forms of Adaptive Dynamic Programming . . . . . 10
1.3.2 Iterative Adaptive Dynamic Programming . . . . . . . . . . . 15
1.3.3 ADP for Continuous-Time Systems . . . . . . . . . . . . . . . . 18
1.3.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Related Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Part I Discrete-Time Systems


2 Value Iteration ADP for Discrete-Time Nonlinear Systems . . . .... 37
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 37
2.2 Optimal Control of Nonlinear Systems
Using General Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.1 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.2.2 Neural Network Implementation . . . . . . . . . . . . . . . . . . 48
2.2.3 Generalization to Optimal Tracking Control . . . . . . . . . 52
2.2.4 Optimal Control of Systems
with Constrained Inputs . . . . . . . . . . . . . . . . . . . . . .... 56
2.2.5 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . .... 59
2.3 Iterative θ-Adaptive Dynamic Programming Algorithm
for Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.3.1 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3.2 Optimality Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.3.3 Summary of Iterative θ-ADP Algorithm . . . . . . . . . . . . 80
2.3.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83


2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3 Finite Approximation Error-Based Value Iteration ADP . . . . . .... 91
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 91
3.2 Iterative θ-ADP Algorithm with Finite
Approximation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 92
3.2.1 Properties of the Iterative ADP Algorithm
with Finite Approximation Errors . . . . . . . . . . . . . . . . . 93
3.2.2 Neural Network Implementation . . . . . . . . . . . . . . . . . . 100
3.2.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.3 Numerical Iterative θ-Adaptive Dynamic Programming . . . . . . . 107
3.3.1 Derivation of the Numerical Iterative θ-ADP
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 107
3.3.2 Properties of the Numerical Iterative θ-ADP
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 111
3.3.3 Summary of the Numerical Iterative θ-ADP
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 120
3.3.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . .... 121
3.4 General Value Iteration ADP Algorithm with Finite
Approximation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 125
3.4.1 Derivation and Properties of the GVI Algorithm
with Finite Approximation Errors . . . . . . . . . . . . . .... 125
3.4.2 Designs of Convergence Criteria with Finite
Approximation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.4.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4 Policy Iteration for Optimal Control of Discrete-Time Nonlinear
Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.2 Policy Iteration Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.2.1 Derivation of Policy Iteration Algorithm . . . . . . . . . . . . 153
4.2.2 Properties of Policy Iteration Algorithm . . . . . . . . . . . . 154
4.2.3 Initial Admissible Control Law . . . . . . . . . . . . . . . . . . . 160
4.2.4 Summary of Policy Iteration ADP Algorithm . . . . . . . . 162
4.3 Numerical Simulation and Analysis . . . . . . . . . . . . . . . . . . . . . . 162
4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

5 Generalized Policy Iteration ADP for Discrete-Time Nonlinear


Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.2 Generalized Policy Iteration-Based Adaptive Dynamic
Programming Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.2.1 Derivation and Properties of the GPI Algorithm . . . . . . 179
5.2.2 GPI Algorithm and Relaxation of Initial Conditions . . . 188
5.2.3 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
5.3 Discrete-Time GPI with General Initial Value Functions . . . . . . 199
5.3.1 Derivation and Properties of the GPI Algorithm . . . . . . 199
5.3.2 Relaxations of the Convergence Criterion
and Summary of the GPI Algorithm . . . . . . . . . . . . . . . 211
5.3.3 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
6 Error Bounds of Adaptive Dynamic Programming Algorithms . . . . 223
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.2 Error Bounds of ADP Algorithms for Undiscounted Optimal
Control Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
6.2.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
6.2.2 Approximate Value Iteration . . . . . . . . . . . . . . . . . . . . . 226
6.2.3 Approximate Policy Iteration . . . . . . . . . . . . . . . . . . . . . 231
6.2.4 Approximate Optimistic Policy Iteration . . . . . . . . . . . . 237
6.2.5 Neural Network Implementation . . . . . . . . . . . . . . . . . . 241
6.2.6 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.3 Error Bounds of Q-Function for Discounted Optimal Control
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
6.3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
6.3.2 Policy Iteration Under Ideal Conditions . . . . . . . . . . . . . 249
6.3.3 Error Bound for Approximate Policy Iteration . . . . . . . . 254
6.3.4 Neural Network Implementation . . . . . . . . . . . . . . . . . . 257
6.3.5 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

Part II Continuous-Time Systems


7 Online Optimal Control of Continuous-Time Affine Nonlinear
Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 267
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 267
7.2 Online Optimal Control of Partially Unknown Affine
Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 267
7.2.1 Identifier–Critic Architecture for Solving HJB
Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 269

7.2.2 Stability Analysis of Closed-Loop System . . . . . . . .... 281


7.2.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . .... 286
7.3 Online Optimal Control of Affine Nonlinear Systems
with Constrained Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 291
7.3.1 Solving HJB Equation via Critic Architecture . . . . .... 294
7.3.2 Stability Analysis of Closed-Loop System
with Constrained Inputs . . . . . . . . . . . . . . . . . . . . . . . . . 298
7.3.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
7.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
8 Optimal Control of Unknown Continuous-Time Nonaffine
Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
8.2 Optimal Control of Unknown Nonaffine Nonlinear Systems
with Constrained Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
8.2.1 Identifier Design via Dynamic Neural Networks . . . . . . 311
8.2.2 Actor–Critic Architecture
for Solving HJB Equation . . . . . . . . . . . . . . . . . . . . . . . 316
8.2.3 Stability Analysis of Closed-Loop System . . . . . . . . . . . 318
8.2.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8.3 Optimal Output Regulation of Unknown Nonaffine Nonlinear
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
8.3.1 Neural Network Observer . . . . . . . . . . . . . . . . . . . . . . . 328
8.3.2 Observer-Based Optimal Control Scheme
Using Critic Network . . . . . . . . . . . . . . . . . . . . . . . . . . 333
8.3.3 Stability Analysis of Closed-Loop System . . . . . . . . . . . 337
8.3.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
8.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
9 Robust and Optimal Guaranteed Cost Control
of Continuous-Time Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . 345
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
9.2 Robust Control of Uncertain Nonlinear Systems. . . . . . . . . . . . . 346
9.2.1 Equivalence Analysis and Problem Transformation . . . . 348
9.2.2 Online Algorithm and Neural Network
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
9.2.3 Stability Analysis of Closed-Loop System . . . . . . . . . . . 353
9.2.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
9.3 Optimal Guaranteed Cost Control of Uncertain Nonlinear
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
9.3.1 Optimal Guaranteed Cost Controller Design . . . . . . . . . 362
9.3.2 Online Solution of Transformed Optimal Control
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

9.3.3 Stability Analysis of Closed-Loop System . . . . . . . . . . . 373


9.3.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
9.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
10 Decentralized Control of Continuous-Time Interconnected
Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
10.2 Decentralized Control of Interconnected Nonlinear Systems . . . . 388
10.2.1 Decentralized Stabilization via Optimal Control
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
10.2.2 Optimal Controller Design of Isolated Subsystems . . . . 394
10.2.3 Generalization to Model-Free
Decentralized Control . . . . . . . . . . . . . . . . . . . . . . . . . . 400
10.2.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
10.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
11 Learning Algorithms for Differential Games
of Continuous-Time Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
11.2 Integral Policy Iteration for Two-Player Zero-Sum Games . . . . . 418
11.2.1 Derivation of Integral Policy Iteration . . . . . . . . . . . . . . 420
11.2.2 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 423
11.2.3 Neural Network Implementation . . . . . . . . . . . . . . . . . . 425
11.2.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
11.3 Iterative Adaptive Dynamic Programming for Multi-player
Zero-Sum Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
11.3.1 Derivation of the Iterative ADP Algorithm . . . . . . . . . . 433
11.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
11.3.3 Neural Network Implementation . . . . . . . . . . . . . . . . . . 444
11.3.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
11.4 Synchronous Approximate Optimal Learning for Multi-player
Nonzero-Sum Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
11.4.1 Derivation and Convergence Analysis . . . . . . . . . . . . . . 460
11.4.2 Neural Network Implementation . . . . . . . . . . . . . . . . . . 464
11.4.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
11.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478

Part III Applications


12 Adaptive Dynamic Programming for Optimal Residential Energy
Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
12.2 A Self-learning Scheme for Residential Energy System
Control and Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
12.2.1 The ADHDP Method . . . . . . . . . . . . . . . . . . . . . . . . . . 488
12.2.2 A Self-learning Scheme for Residential Energy
System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
12.2.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
12.3 A Novel Dual Iterative Q-Learning Method for Optimal
Battery Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
12.3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
12.3.2 Dual Iterative Q-Learning Algorithm . . . . . . . . . . . . . . . 497
12.3.3 Neural Network Implementation . . . . . . . . . . . . . . . . . . 503
12.3.4 Numerical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
12.4 Multi-battery Optimal Coordination Control for Residential
Energy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
12.4.1 Distributed Iterative ADP Algorithm . . . . . . . . . . . . . . . 515
12.4.2 Numerical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
12.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
13 Adaptive Dynamic Programming for Optimal Control of Coal
Gasification Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
13.2 Data-Based Modeling and Properties . . . . . . . . . . . . . . . . . . . . . 538
13.2.1 Description of Coal Gasification Process
and Control Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
13.2.2 Data-Based Process Modeling and Properties . . . . . . . . 540
13.3 Design and Implementation of Optimal Tracking Control. . . . . . 546
13.3.1 Optimal Tracking Controller Design by Iterative ADP
Algorithm Under System and Iteration Errors . . . . . . . . 546
13.3.2 Neural Network Implementation . . . . . . . . . . . . . . . . . . 554
13.4 Numerical Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
13.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
14 Data-Based Neuro-Optimal Temperature Control
of Water Gas Shift Reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
14.2 System Description and Data-Based Modeling . . . . . . . . . . . . . . 572
14.2.1 Water Gas Shift Reaction . . . . . . . . . . . . . . . . . . . . . . . 572
14.2.2 Data-Based Modeling and Properties . . . . . . . . . . . . . . . 573

14.3 Design of Neuro-Optimal Temperature Controller . . . . . . . .... 575


14.3.1 System Transformation . . . . . . . . . . . . . . . . . . . . . .... 575
14.3.2 Derivation of Stable Iterative ADP Algorithm . . . . .... 576
14.3.3 Properties of Stable Iterative ADP Algorithm
with Approximation Errors and Disturbances . . . . .... 578
14.4 Neural Network Implementation for the Optimal Tracking
Control Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
14.5 Numerical Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
14.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
Abbreviations

ACD Adaptive critic designs


AD Action-dependent, e.g., ADHDP and ADDHP
ADP Adaptive dynamic programming, or approximate dynamic programming
ADPRL Adaptive dynamic programming and reinforcement learning
BP Backpropagation
DHP Dual heuristic programming
DP Dynamic programming
GDHP Globalized dual heuristic programming
GPI Generalized policy iteration
GVI General value iteration
HDP Heuristic dynamic programming
HJB Hamilton–Jacobi–Bellman, e.g., HJB equation
HJI Hamilton–Jacobi–Isaacs, e.g., HJI equation
NN Neural network
PE Persistence of excitation
PI Policy iteration
UUB Uniformly ultimately bounded
VI Value iteration
RL Reinforcement learning

Symbols

$\mathrm{T}$   The transposition symbol, e.g., $A^{\mathrm{T}}$ is the transposition of matrix $A$
$\mathbb{N}$   The set of all natural numbers
$\mathbb{Z}^{+}$   The set of all positive integers, i.e., $\mathbb{N} = \{0\} \cup \mathbb{Z}^{+}$
$\mathbb{R}$   The set of all real numbers
$\mathbb{R}^{n}$   The Euclidean space of all real $n$-vectors, e.g., a vector $x \in \mathbb{R}^{n}$ is written as $x = (x_1, x_2, \ldots, x_n)^{\mathrm{T}}$
$\mathbb{R}^{m \times n}$   The space of all $m$ by $n$ real matrices, e.g., a matrix $A \in \mathbb{R}^{m \times n}$ is written as $A = (a_{ij}) \in \mathbb{R}^{m \times n}$
$\|\cdot\|$   The vector norm or matrix norm in $\mathbb{R}^{n}$ or $\mathbb{R}^{n \times m}$
$\|\cdot\|_F$   The Frobenius matrix norm, which is the Euclidean norm of a matrix, defined as $\|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2}$ for $A = (a_{ij}) \in \mathbb{R}^{n \times m}$
$\in$   Belong to
$\forall$   For all
$\Rightarrow$   Implies
$\Leftrightarrow$   Equivalent, or if and only if
$\otimes$   Kronecker product
$\emptyset$   The empty set
$\triangleq$   Equal to by definition
$C^{n}(\Omega)$   The class of functions having continuous $n$th derivative on $\Omega$
$\mathcal{L}_2(\Omega)$   The $\mathcal{L}_2$ space defined on $\Omega$, i.e., $\int_{\Omega} \|f(x)\|^2 \, dx < \infty$ for $f \in \mathcal{L}_2(\Omega)$
$\mathcal{L}_\infty(\Omega)$   The $\mathcal{L}_\infty$ space defined on $\Omega$, i.e., $\sup_{x \in \Omega} \|f(x)\| < \infty$ for $f \in \mathcal{L}_\infty(\Omega)$
$\lambda_{\min}(A)$   The minimum eigenvalue of matrix $A$
$\lambda_{\max}(A)$   The maximum eigenvalue of matrix $A$
$I_n$   The $n$ by $n$ identity matrix
$A > 0$   Matrix $A$ is positive definite
$\det(A)$   Determinant of matrix $A$
$A^{-1}$   The inverse of matrix $A$
$\mathrm{tr}(A)$   The trace of matrix $A$

$\mathrm{vec}(A)$   The vectorization mapping from matrix $A$ into an $mn$-dimensional column vector for $A \in \mathbb{R}^{m \times n}$
$\mathrm{diag}\{\zeta_i\}$   Also written as $\mathrm{diag}\{\zeta_1, \zeta_2, \ldots, \zeta_n\}$, which is an $n \times n$ diagonal matrix with diagonal elements $\zeta_1, \zeta_2, \ldots, \zeta_n$
$\tanh(x)$   The hyperbolic tangent function of $x$
$\mathrm{sgn}(x)$   The sign function of $x$, i.e., $\mathrm{sgn}(x) = 1$ for $x > 0$, $\mathrm{sgn}(0) = 0$, and $\mathrm{sgn}(x) = -1$ for $x < 0$
$\mathcal{A}(\Omega)$   The set of admissible controls on $\Omega$
$J$   Performance index, or cost-to-go, or cost function
$J_\mu$   Performance index, or cost-to-go, or cost function associated with the policy $\mu$
$J^{*}$   Optimal performance index function or optimal cost function
$V$   Value function, or performance index associated with a specific policy
$V^{*}$   Optimal value function
$L$   Lyapunov function
$W_f, Y_f$   NN weights for function approximation
$W_m, Y_m$   Model/identifier NN weights
$W_o, Y_o$   Observer NN weights
$W_c, Y_c$   Critic NN weights
$W_a, Y_a$   Action NN weights
Chapter 1
Overview of Adaptive Dynamic Programming

1.1 Introduction

Big data, artificial intelligence (AI), and deep learning are the three most talked-about
topics in information technology lately. The recent emergence of deep learning
[10, 17, 38, 68, 88] has pushed neural networks (NNs) to become a hot research
topic again. It has also gained huge success in almost every branch of AI, includ-
ing machine learning, pattern recognition, speech recognition, computer vision, and
natural language processing [17, 25, 26, 35, 74]. On the other hand, the study of
big data often uses AI technologies such as machine learning [80] and deep learning
[17]. One particular subject of study in AI, i.e., the computer game of Go, faced
a great challenge of dealing with vast amounts of data. The ancient Chinese board
game Go has been studied for years with the hope that one day, computer programs
can defeat human professional players. The board of the game of Go consists of a 19 × 19
grid, and at the beginning of the game each of the two players has roughly
360 options for placing a stone. However, the number of potential legal board
positions grows exponentially, and it quickly becomes greater than the total number
of atoms in the whole universe [103]. Such a number leads to so many directions in which any
given game can move that it is impossible for a computer to play by brute-force
computation of all possible outcomes.
Previous computer programs focused less on evaluating the state of the board
positions and more on speeding up simulations of how the game might play out.
The Monte Carlo tree search approach was often used in computer game programs;
it randomly samples only some of the possible sequences of play at each step to
choose among different possible moves, instead of trying to calculate every possible
one. Google DeepMind, an AI company in London acquired by Google in 2014,
developed a program called AlphaGo [92] that has shown performance previously
thought to be impossible for at least a decade. Instead of exploring various sequences
of moves, AlphaGo learns to make a move by evaluating the strength of its position on
the board. Such an evaluation was made possible by NN’s deep learning capabilities.


Position evaluation (for approximating the optimal cost-to-go function of the
game) is the key to success of AlphaGo. Such ideas have been used previously
by many researchers in computer games, such as backgammon (TD-Gammon) [100,
101], checkers [87], othello [13], and chess [16]. A reinforcement learning technique
called TD(λ) was employed in AlphaGo and TD-Gammon for position evaluation.
With TD-Gammon, the program has learned to play backgammon at a grandmaster
level [100, 101]. On the other hand, AlphaGo has defeated European Go champion
Fan Hui (professional 2 dan) by 5 games to 0 [92] and defeated world Go champion
Lee Sedol (professional 9 dan) by 4 games to 1 [71, 111].
The success of reinforcement learning (RL) technique in this case relied on NN’s
deep learning capabilities [10, 17, 38, 68, 88]. The NNs used in AlphaGo have a
deep structure with 13 layers. Even though there were reports on the use of RL and
related techniques for the computer game of Go [15, 89, 93, 137, 138], it is only
with AlphaGo [92] that value networks were obtained using deep NNs to achieve
high evaluation accuracy. On the other hand, position evaluation [8, 20, 24, 89, 93]
and deep learning [18, 66, 102] have been considered for building programs to play
the game of Go; none of them achieved the level of success attained by AlphaGo [92]. The
match of AlphaGo versus Lee Sedol in March 2016 is a history-making event and
a milestone in the quest of AI. The defeat over humanity by a machine has also
generated huge public interests in AI technology around the world, especially in
China, Korea, US, and UK [111]. It will have a lasting impact on the research in AI,
deep learning, and RL [142].
RL is a very useful tool in solving optimization problems by employing the princi-
ple of optimality from dynamic programming (DP). In particular, in the control systems
community, RL is an important approach to handle optimal control problems for
unknown nonlinear systems. DP provides an essential foundation for understanding
RL. Actually, most of the methods of RL can be viewed as attempts to achieve much
the same effect as DP, with less computation and without assuming a perfect model
of the environment. One class of RL methods is built upon the actor-critic structure,
namely adaptive critic designs, where an actor component applies an action or control
policy to the environment, and a critic component assesses the value of that action
and the state resulting from it. The combination of DP, NN, and actor-critic structure
results in the adaptive dynamic programming (ADP) algorithms.
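
To make the actor-critic idea concrete, the following minimal Python sketch shows a simplified tabular one-step actor-critic on a small, hypothetical two-state environment; the states, actions, rewards, discount factor, and step sizes are all invented for illustration and are not an algorithm taken from this book.

```python
# A simplified tabular one-step actor-critic on a hypothetical two-state
# environment (illustration only; not an algorithm from this book).
import math
import random

states, actions = ["s0", "s1"], ["stay", "switch"]
V = {s: 0.0 for s in states}                        # critic: state-value estimates
H = {(s, a): 0.0 for s in states for a in actions}  # actor: action preferences
gamma, alpha_c, alpha_a = 0.9, 0.1, 0.05            # discount factor and step sizes

def step(s, a):
    """Hypothetical environment: ending up in s1 pays a reward of 1."""
    s_next = ("s1" if s == "s0" else "s0") if a == "switch" else s
    return s_next, (1.0 if s_next == "s1" else 0.0)

def action_probs(s):
    """Softmax policy over the actor's preferences."""
    prefs = [math.exp(H[(s, a)]) for a in actions]
    z = sum(prefs)
    return {a: p / z for a, p in zip(actions, prefs)}

s = "s0"
for _ in range(5000):
    probs = action_probs(s)
    a = random.choices(actions, weights=[probs[b] for b in actions])[0]
    s_next, r = step(s, a)                 # actor applies an action to the environment
    delta = r + gamma * V[s_next] - V[s]   # critic assesses the action via the TD error
    V[s] += alpha_c * delta                # critic update
    for b in actions:                      # softmax policy-gradient actor update
        H[(s, b)] += alpha_a * delta * ((1.0 if b == a else 0.0) - probs[b])
    s = s_next

print(V)  # learned state values
print(H)  # learned action preferences
```

The temporal difference error delta plays exactly the role of the critic's assessment described above: it tells the actor whether the action just taken made the predicted outcome better or worse than expected.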
The present book studies ADP with applications to optimal control. Significant
efforts will be devoted to the building of value functions, which indicate how good
the predicted system performance is and which in turn are used for developing the
optimal control strategy. Both RL and ADP provide approximate solutions to dynamic
programming, and they are closely related to each other. Therefore, there has been a
trend to consider the two together as ADPRL (ADP and RL). Examples include IEEE
International Symposium on Adaptive Dynamic Programming and Reinforcement
Learning (started in 2007), IEEE CIS Technical Committee on Adaptive Dynamic
Programming and Reinforcement Learning (started in 2008), and a survey article
[42] published in 2009. A brief overview of RL will be given in the next section,
followed by a more detailed overview of ADP. We review both the basic forms of
ADP as well as iterative forms. A few related books will be briefly reviewed before
the end of this chapter.

1.2 Reinforcement Learning

The main research results in RL can be found in the book by Sutton and Barto
[98] and the references cited in the book. Even though both RL and the main topic
studied in the present book, i.e., ADP, provide approximate solutions to dynamic
programming, research in these two directions has been somewhat independent [7]
in the past. The most famous algorithms in RL are the temporal difference algorithm
[97] and the Q-learning algorithm [112, 113]. Compared to ADP, the area of RL is
more mature and has a vast amount of literature (cf. [27, 34, 47, 98]).
An RL system typically consists of the following four components: $\{S, A, R, F\}$,
where $S$ is the set of states, $A$ is the set of actions, $R$ is the set of scalar reinforcement
signals or rewards, and $F$ is the function describing the transition from one state
to the next under a given action, i.e., $F: S \times A \to S$. A policy $\pi$ is defined as a
mapping $\pi: S \to A$. At any given time $t$, the system can be in a state $s_t \in S$, take
an action $a_t \in A$ determined by the policy $\pi$, i.e., $a_t = \pi(s_t)$, transition to the next
state $s_{t+1}$, which is denoted by $s_{t+1} = F(s_t, a_t)$, and at the same time, receive a
reward signal $r_{t+1} = r(s_t, a_t, s_{t+1}) \in R$. The goal of RL is to determine a policy to
maximize the accumulated reward starting from the initial state $s_0$ at $t = 0$.
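
As a concrete (and hypothetical) instance of this $\{S, A, R, F\}$ formalism, the minimal Python sketch below encodes the four components and a fixed policy $\pi$ as plain functions; the two states, two actions, and reward values are invented purely for illustration and are not taken from this book.

```python
# Minimal sketch of an RL system {S, A, R, F} with a fixed policy pi.
# The two-state environment below is hypothetical and only for illustration.

S = ["s0", "s1"]          # set of states
A = ["stay", "switch"]    # set of actions

def F(s, a):
    """Deterministic transition function F: S x A -> S."""
    if a == "switch":
        return "s1" if s == "s0" else "s0"
    return s

def r(s, a, s_next):
    """Scalar reward signal r(s_t, a_t, s_{t+1})."""
    return 1.0 if s_next == "s1" else 0.0

def pi(s):
    """A policy is a mapping pi: S -> A."""
    return "switch" if s == "s0" else "stay"

# One step of interaction at time t:
s_t = "s0"
a_t = pi(s_t)                  # action selected by the policy
s_t1 = F(s_t, a_t)             # transition to the next state
r_t1 = r(s_t, a_t, s_t1)       # reward received at time t + 1
print(s_t, a_t, s_t1, r_t1)    # -> s0 switch s1 1.0
```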
An RL task always involves estimating some kind of value function. A value
function estimates how good it is to be in a given state $s$, and it is defined as

 ∞

 
V π (s) = γ k rk+1 s0 =s = γ k r(sk , ak , sk+1 )s0 =s,
k=0 k=0

where $0 \le \gamma \le 1$ is a discount factor, $a_k = \pi(s_k)$, and $s_{k+1} = F(s_k, a_k)$ for
$k = 0, 1, \ldots$. The definition of $V^\pi(s)$ can also be considered to start from $s_t$, i.e.,

$$V^\pi(s) = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}\Big|_{s_t=s} = \sum_{k=0}^{\infty} \gamma^k r(s_{t+k}, a_{t+k}, s_{t+k+1})\Big|_{s_t=s}, \qquad (1.2.1)$$

where $a_{t+k} = \pi(s_{t+k})$ and $s_{t+k+1} = F(s_{t+k}, a_{t+k})$ for $k = 0, 1, \ldots$. $V^\pi(s)$ is referred
to as the state-value function for policy $\pi$. On the other hand, the action-value function
for policy $\pi$ estimates how good it is to perform a given action $a$ in a given state $s$
under the policy $\pi$, and it is defined as

$$Q^\pi(s, a) = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}\Big|_{s_t=s,\,a_t=a} = \sum_{k=0}^{\infty} \gamma^k r(s_{t+k}, a_{t+k}, s_{t+k+1})\Big|_{s_t=s,\,a_t=a}, \qquad (1.2.2)$$
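Definitions (1.2.1) and (1.2.2) can be read directly as rollouts under the policy $\pi$. The following sketch approximates $V^\pi(s)$ and $Q^\pi(s, a)$ for a toy deterministic system by truncating the infinite sums at a finite horizon; the horizon length and the illustrative dynamics, reward, and policy are assumptions made only for this example.

```python
def value_rollout(F, r, pi, s, gamma=0.9, horizon=100):
    """Approximate V^pi(s) by truncating the infinite sum in (1.2.1)."""
    total = 0.0
    for k in range(horizon):
        a = pi(s)
        s_next = F(s, a)
        total += gamma**k * r(s, a, s_next)
        s = s_next
    return total

def action_value_rollout(F, r, pi, s, a, gamma=0.9, horizon=100):
    """Approximate Q^pi(s, a) as in (1.2.2): take action a first, then follow pi."""
    s_next = F(s, a)
    return r(s, a, s_next) + gamma * value_rollout(F, r, pi, s_next, gamma, horizon)

# The same toy system as in the earlier sketch, redefined here so this
# example is self-contained (again, purely illustrative).
F = lambda s, a: min(s + 1, 3) if a == "advance" else s
r = lambda s, a, s_next: 1.0 if s_next == 3 and s != 3 else 0.0
pi = lambda s: "advance"

print("V^pi(0) ~", round(value_rollout(F, r, pi, 0), 4))
print("Q^pi(0, 'stay') ~", round(action_value_rollout(F, r, pi, 0, "stay"), 4))
```

For this toy system the rollouts give $V^\pi(0) \approx 0.81$ and $Q^\pi(0, \text{stay}) \approx 0.729$; taking a poor first action and then following $\pi$ costs exactly one extra discount factor here, which illustrates how the action-value function separates the first action from the policy followed afterwards.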