
IMPROVING THE CLOSED-LOOP PERFORMANCE

OF NONLINEAR SYSTEMS
By
Randal W. Beard
A Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Major Subject: Electrical Engineering

Approved by the
Examining Committee:

Professor George N. Saridis, Thesis Adviser

Professor John T. Wen, Member

Professor Howard Kaufman, Member

Professor Mark Levi, Member

Rensselaer Polytechnic Institute


Troy, New York
October, 1995
(For Graduation December 1995)
© Copyright 1995
by
Randal W. Beard
All Rights Reserved

CONTENTS
LIST OF FIGURES ................................................ v
ABSTRACT ....................................................... vi
ACKNOWLEDGMENT ................................................. vii
1. INTRODUCTION ................................................ 1
   1.1 Motivation .............................................. 1
   1.2 The Key Ideas ........................................... 4
   1.3 Organization of the Thesis .............................. 6
2. LITERATURE REVIEW ........................................... 7
   2.1 Methods Dependent on the Initial State .................. 7
   2.2 Linearization about a Nominal Path ...................... 8
   2.3 Perturbation Methods .................................... 8
   2.4 Regularization of the Cost Function ..................... 10
   2.5 Feedback Linearization .................................. 11
   2.6 Gain Scheduling ......................................... 11
   2.7 Other Methods ........................................... 12
   2.8 The Successive Approximation Method ..................... 13
   2.9 Conclusion .............................................. 14
3. PROBLEM FORMULATION ......................................... 16
   3.1 Infinite-Time Horizon Problem ........................... 16
   3.2 Finite-Time Horizon Problem ............................. 22
   3.3 The Generalized-Hamilton-Jacobi-Bellman Equation ........ 25
   3.4 Summary ................................................. 29
4. A NEW ALGORITHM TO IMPROVE CLOSED-LOOP PERFORMANCE .......... 31
   4.1 The Basic Idea of Galerkin's Method ..................... 31
   4.2 Galerkin Projections of the GHJB Equation ............... 32
   4.3 The Combined Algorithm .................................. 33
   4.4 Implementation Issues ................................... 38
   4.5 Summary of the Method ................................... 40
5. CONVERGENCE AND STABILITY ................................... 42
   5.1 The Convergence Problem ................................. 42
   5.2 Preliminary Tools ....................................... 43
   5.3 Convergence and Stability Proofs ........................ 61
       5.3.1 Convergence of Successive Approximations .......... 61
       5.3.2 Convergence of Galerkin Approximations ............ 63
       5.3.3 Convergence of the Main Algorithm ................. 69
   5.4 Summary of the Main Result .............................. 69
6. EXAMPLES AND COMPARISONS .................................... 70
   6.1 Illustrative Examples ................................... 70
       6.1.1 Linear System with Non-Quadratic Cost ............. 71
       6.1.2 Nonlinear System with Non-Quadratic Cost .......... 73
       6.1.3 Bilinear System: Non-Smooth Control ............... 73
       6.1.4 Inverted Pendulum ................................. 75
       6.1.5 Finite-time Chemical Reactor ...................... 77
       6.1.6 Finite-time Nonholonomic Example .................. 81
   6.2 Comparative Examples .................................... 85
       6.2.1 Comparison with Perturbation Methods .............. 85
       6.2.2 Comparison with Regularization Methods ............ 87
       6.2.3 Comparison with Exact Linearization Method ........ 89
   6.3 Design Example .......................................... 91
       6.3.1 Voltage Regulation of a Power System .............. 91
7. CONCLUSION AND FUTURE WORK .................................. 97
   7.1 Overview of the Main Results ............................ 97
   7.2 Contributions ........................................... 98
   7.3 Future Work ............................................. 99
   7.4 Conclusion .............................................. 100
REFERENCES ..................................................... 101
APPENDICES ..................................................... 107
A. AUXILIARY RESULTS ........................................... 108
   A.1 Proof of Lemma 5.2.1 .................................... 108

   A.2 Proof of Lemma 5.2.2 .................................... 109
   A.3 Proof of Lemma 5.2.3 .................................... 110
   A.4 Proof of Lemma 5.2.4 .................................... 110
   A.5 Proof of Corollary 5.2.5 ................................ 112
   A.6 Proof of Lemma 5.2.12 ................................... 112
   A.7 Proof of Theorem 5.3.1 .................................. 114
B. GALERKIN'S METHOD ........................................... 115
C. LIST OF SYMBOLS ............................................. 118

LIST OF FIGURES
3.1  Phase flow plotted against lines of constant cost ............ 27
3.2  Successive Approximation Algorithm ........................... 28
4.1  Closed Loop System ........................................... 38
4.2  Algorithm for Improving Feedback Control Laws ................ 41
5.1  Convergence Diagram .......................................... 43
5.2  Gain margins for successive approximations u^(i) ............. 60
6.1  Cost and control for linear system with non-quadratic cost ... 72
6.2  Control vs. state for bilinear system ........................ 74
6.3  Cost vs. position for inverted pendulum ...................... 76
6.4  Time varying control gains for chemical reactor .............. 78
6.5  Time histories for chemical reactor .......................... 80
6.6  Unicycle: Control Gains ...................................... 82
6.7  Unicycle: Time history for infinite-time gains ............... 83
6.8  Unicycle: Time Histories ..................................... 84
6.9  Comparison with perturbation method .......................... 86
6.10 Comparison with regularization method ........................ 88
6.11 Comparison with feedback linearization ....................... 90
6.12 Generator connected through transmission lines to infinite bus 92
6.13 Power generator, time histories of the states ................ 95
6.14 Power generator, time history of the control ................. 96

ABSTRACT
There are a variety of tools for computing stabilizing feedback control laws for nonlinear systems. The difficulty is that these tools usually do not take into account the performance of the closed-loop system. On the other hand, optimal control theory gives guaranteed closed-loop performance, but the resulting problem is difficult to solve for general nonlinear systems. While there may be many feedback control laws that provide adequate performance, optimal control theory insists on the one control that provides peak performance. In this thesis we bypass the difficulties inherent in the optimal control problem by developing a design algorithm to improve the closed-loop performance of arbitrary, stabilizing feedback control laws.

The problem of improving the closed-loop performance of a stabilizing control reduces to solving a first-order, linear partial differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. An interesting fact is that when the process is iterated, the solution to the GHJB equation converges uniformly to the solution of the Hamilton-Jacobi-Bellman equation, which solves the optimal control problem. The main contribution of the thesis is to show that Galerkin's method can be used to find a solution to the GHJB equation, that the resulting control laws are stable, and that when the process is iterated, the solution still converges to the optimal control. The thesis therefore solves an important problem that has remained open for over thirty years, i.e., it shows how to find a uniform approximation to the Hamilton-Jacobi-Bellman equation such that the approximate controls are still stable on a specified set.

The method developed in the thesis is a practical, off-line algorithm that computes closed-form feedback control laws with guaranteed performance. Our algorithm is the first practical method that computes arbitrarily close uniform approximations to the optimal control, while simultaneously guaranteeing closed-loop asymptotic stability.

ACKNOWLEDGMENT
I would first like to acknowledge the assistance and guidance, over the past four
years, of my thesis advisor Professor George Saridis. In particular I am grateful
that George allowed me the flexibility to search for a problem in which I was
really interested, and then insisted that I stick with the problem when it momentarily
appeared to be a dead end. I am also indebted to George for his advice to pursue
a curriculum loaded with mathematics and theory. I feel that the quality of my
graduate education has been greatly enriched by this emphasis.
I would also like to acknowledge the technical assistance and support of Pro-
fessor John Wen, who co-advised me while George was on sabbatical in Greece. It
was John who first suggested that I try to use Galerkin's method to approximate the
Generalized-Hamilton-Jacobi-Bellman equation. I am also grateful for the enthusi-
asm which John brings to his work and attribute the beginning of my excitement
about nonlinear control to him.
I would also like to thank the other members of my thesis committee, Pro-
fessor Howard Kaufman and Professor Mark Levi for their technical assistance and
encouragement. In addition I would like to thank Professor Joe Chow for his en-
couragement with regard to applying the methods of the thesis to voltage regulation
of power generators.
As a graduate student at RPI I have been fortunate to have friends who are
technically competent and intellectually alive. Specifically I would like to acknowledge
Fernando Lizarralde, Brian Tibbetts, Pedro Lima, Joe Musto, Lee Wilfinger,
Adam Divelbiss, and Sanjeev Seereeram. In particular I would like to thank Fer-
nando for many helpful discussions, questions and critiques concerning my thesis
work.
When we moved to Troy four years ago, three other families who attend the
same church arrived at the same time for the same purpose. I am deeply grateful
for our friendship with James Maxwell, Dan Moore, and John Palmer and their
families. We share common values, beliefs, ambitions and dreams. Their association
and friendship has made our stay here particularly enjoyable.
I gratefully acknowledge the constant support and encouragement of my wife
Andrea. Over the past four years she has willingly lived the lifestyle of a graduate
student and worked hard to support our family. She is a wonderful wife and mother
and anything that is good in this thesis would not have been possible without her.
I would like to thank my children Laurann and Kaitlyn for (somewhat unwillingly)
allowing me to spend so much time at school.
I am fortunate to come from a great family and would like to acknowledge
their encouragement and support. My father instilled in me a love and thirst for
learning which I hope never dies, while my mother taught me the great truth that
to succeed you only need to act as if you already have. It was my Grandfather's
example and love of science that instilled in me, very early in life, the desire to

obtain a Ph.D. I am also fortunate to have seven brothers and sisters who are some
of the best friends I have.
Finally I acknowledge the hand of God in my life. Without His help, this
thesis and everything else that I have accomplished or hope to accomplish would
not be possible.
Last but not least, I gratefully acknowledge the hand by which I've been fed:
financial support for this research came from the Center for Intelligent Robotic Systems for
Space Exploration (CIRSSE), which was funded by NASA grants NGT 10000 and
NAGW-1333.

CHAPTER 1
INTRODUCTION
1.1 Motivation
Control engineering is the study of physical systems that produce a desired
response to inputs. The objective of any control design is to shape the input such
that the output of the system has certain characteristics. For example, in an auto-
motive cruise control system, the throttle is automatically adjusted to maintain a
speed prescribed by the driver. Dynamical systems are composed of inputs, states
and outputs. The states correspond to internal dynamics and are usually associated
with energy storage devices in the system. The states are dynamically influenced by
the inputs to the system. The outputs, which depend statically on the states, are
variables in the system that can be measured. When the control, or input to the
system, depends on the state (resp. output) of the system, then the control is called
a state (resp. output) feedback control. Feedback control laws are desirable because,
among other things, they provide robustness with respect to external disturbances,
unmodeled dynamics and variations in the physical parameters of the system that
is being controlled. In this thesis we concentrate on state feedback control laws, i.e.,
"control" will always mean "state feedback control."
Physical systems are inherently nonlinear in nature. However, nonlinear sys-
tems are difficult to analyze mathematically. The typical approach is to linearize
the system around some operating point and to analyze the resulting linear system.
If the motion of the system is large, then the linear model of the system becomes
invalid. Therefore it is desirable to consider the full nonlinear model of the system.
To deal with the inherent mathematical difficulty of nonlinear systems, one of two
approaches is typically adopted. The first approach is to utilize specific properties
of the system to develop specific control laws that perform well for that system.
The drawback is that the results may not be applicable to any other system. The
second approach is to develop tools for general classes of nonlinear systems. The
drawback is that these tools will usually result in conservative designs since they
do not exploit specific characteristics of the system under design. To attack any
particular problem, however, it is necessary to have a number of design tools from
which to draw. Since there are relatively few design tools for nonlinear systems our
objective is to develop a feedback synthesis method for a general class of nonlinear
systems.
We will address the nonlinear regulator problem, where the control objective
is to move the system states to the origin. For the regulator problem, a closed-
loop system is stable if the states converge asymptotically to the origin. There is a
large literature devoted to the problem of designing stable regulators for nonlinear
systems. The most popular tool is Lyapunov's second method, which is important
to our arguments and will be reviewed in chapter 3. To use Lyapunov's method,
a designer first proposes a control and then tries to find a Lyapunov function for
the closed-loop system. A Lyapunov function is a generalized energy function of
the states, and is usually suggested by the physics of the problem. Using Lyapunov
theory or some other method, it is often possible to find a stabilizing control for
a particular system. However, stability is only a bare minimum requirement in a
system design; typically other performance criteria are specified. For example, an
autopilot for a commercial airliner must keep the airplane on its course despite wind
gusts and changes in the atmospheric pressure; however, in compensating for these
disturbances, corrections to the direction of the airplane must be slow enough that
passengers do not experience discomfort.
Consequently, the control synthesis problem is: how do we design a stable
control such that the closed-loop system behaves in some prescribed way. The
difficulty is that the performance specifications are usually heuristic (like passenger
discomfort) and are hard to quantify mathematically. The typical approach is to
formulate the performance in mathematical terms, i.e. in terms of a performance
index, and then to design a control based on this index. If the resulting closed-
loop behavior does not meet some of the heuristic specifications, then the control
is tuned by modifying the performance index. Therefore, it must be relatively easy
to design a new control when the performance index is changed. In this thesis we
will consider performance measures that are integral weightings of the state and
the control. Intuitively, the performance of a control will be good when the entire
state and control trajectories remain small. This performance index is the standard
H2 performance measure from classical optimal control except the state weighting
function is allowed to be more general than the typical quadratic weighting function.
Given a specific performance index, a natural question is: can we find a control
that optimizes the performance index? This is the standard question of optimal
control. The difficulty is that when the system equations are nonlinear, the solution
to the optimal control problem is very difficult, and often impossible, to find. In
fact, for many systems, an optimal control may not even exist. When a solution
does exist, the problem can often be solved numerically if the initial condition of the
system is some fixed x0. In that case the optimal control and optimal trajectory can
be found as functions of time. The difficulty is that these controls are open loop: if
the initial condition is not exactly known, or if during execution the state deviates
slightly from the optimal state (which always occurs due to imprecise modeling and
finite precision), then the control is no longer optimal and is usually not even stabilizing.
Since the performance index is heuristic and only used as a guide to system design,
it is hard to justify the tremendous effort that is often required to find the optimal
feedback control. As a consequence, practicing control engineers have not embraced
the methods of optimal control for nonlinear systems. On the other hand, the success
of LQR methods for linear systems, indicates the promise of design methods based
on the H2 performance index. To bypass the problems inherent in optimal control,
we ask a more general question that seems to make more sense from an engineering
point of view:
Given an initial stabilizing control, how do we improve the closed-loop
performance of this control, and is there a practical method of computing
the improved control law?
Instead of searching for the optimal control, which is like looking for a needle
in a haystack, we look for any control that improves upon the performance of an
existing feedback law.
A solution to this question provides a bridge between the problem of finding a
stable control law and finding "the" optimal control. Potential benefits are, first, it
will be possible to improve the closed-loop performance of existing control strategies.
In addition, by iterating the process, we obtain a general design method for nonlinear
systems that incorporates system performance. Finally, the method builds upon a
rich literature devoted to the problem of finding stabilizing controls.
For nonlinear systems, the optimal control problem reduces to the solution of
the Hamilton-Jacobi-Bellman (HJB) equation, which is a nonlinear partial differential
equation. This equation is extremely difficult to solve and so researchers have
looked for methods of approximating its solution. Most of the results, which will be
reviewed in chapter 2, are either open-loop, heuristic, or limited to a small class of
problems. Interestingly, as we iterate the process of improving an initial control, our
algorithm computes arbitrarily close approximations of the HJB equation. There-
fore, through hindsight, we recognize that this thesis solves a problem which has
been actively studied in the control literature since the 1960's (Merriam III, 1964):
How can we approximate the HJB equation arbitrarily closely, while
ensuring that the control laws based on the approximate solution are
stable, and have a specified region of attraction?
Therefore the thesis also addresses the problem of synthesizing H2 optimal feedback
control laws for a general class of nonlinear systems.
The problem of designing feedback control laws with guaranteed performance is
important for many modern applications including automatic flight control (Garrard
and Jordan, 1977), process control of chemical reactions (Hofer and Tibken, 1988),
stability of large scale power systems (Gao et al., 1992), robotic manipulators (Lewis
et al., 1993) and biomedical drug infusion systems (Mohler, 1991). Currently most
of these systems are controlled by linearizing the nonlinear system around a desired
equilibrium or path, and controlling the perturbed system. A complicated backup
system must then be designed to control the system when large deviations from the
nominal path occur. Alternatively, a nonlinear control law defined over the entire
operating range of the system would reduce the complexity and cost of the system
while simultaneously increasing the functional performance. Clearly, the subject
addressed in this thesis is important.
This thesis fills a gap in the current state-of-the-art by providing a practical
bridge between methods that synthesize stable feedback control laws and the, some-
what impractical, theory of nonlinear optimal control. The controls computed by
our method depend on an initial stabilizing control and are suboptimal, but can be
made arbitrarily close to the optimal control.

1.2 The Key Ideas


In this section we will give an overview of the key ideas involved in our ap-
proach. These ideas will be developed in detail throughout the thesis; the objective
here is to provide an intuitive overview of the method.
To develop our algorithm, we divide the problem into three questions:
1. Given an initial stabilizing control, how do we describe the performance of the
closed-loop system?
2. Given the closed-loop performance of the initial control, is there a control that
improves upon this performance?
3. How do we compute the improved control law?
The first two items have been studied by several researchers (cf. Saridis and Lee,
1979; Saridis and Balaram, 1986; Saridis and Wang, 1994) and their solutions are
not unique to this thesis. The answer to the first question comes directly from the
definition of the performance index and is given by the solution of a linear partial
differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB)
equation, which is derived in section 3.3. The answer to the second question, also
derived in section 3.3, is given by minimizing the action of the system with respect
to the solution of the GHJB equation. In particular, the improved control will be
a simple algebraic expression using the gradient of the solution of the GHJB equa-
tion. The trouble with these results is that they require the solution of a partial
differential equation which is, in general, difficult to solve. The advantage, however,
of solving the GHJB equation as opposed to the Hamilton-Jacobi-Bellman (HJB)
equation is that the GHJB equation is a linear equation whereas the HJB equa-
tion is a nonlinear equation. The third question is equivalent to asking whether there is
a way to approximate the GHJB equation, derived from question 1, such that the
improved control in question 2 is stable. To solve the GHJB equation we employ
Galerkin's method. The idea of Galerkin's method is to assume that the solution of
the GHJB equation can be expressed as an infinite sum of basis functions. Assuming
that we know a set of basis functions that are complete in the sense that they can
approximate solutions of the GHJB equation, we solve for the coefficients by first
truncating the series to N terms, which results in an error in the partial differential
equation, and then projecting (through an inner product) the error onto the same
basis functions retained in the series, and setting the projected error to zero. This
results in N simultaneous equations in N unknowns, where the equations are linear
since the GHJB equation is linear. For Galerkin's method to be applicable, the
problem must be placed in a suitable inner product space. The GHJB equation is
defined only on the set of states for which the initial control is stabilizing. If the
initial control is globally asymptotically stabilizing, then this set is all of $\mathbb{R}^n$, so
solutions to the GHJB equation become arbitrarily large as x does. In general, then,
global solutions to the GHJB equation may not exist, and if they do, they are not in any
inner product space. To circumvent both problems, we restrict the approximation
to a closed and bounded set in $\mathbb{R}^n$ that is a subset of the stability region of the
initial control. On this closed and bounded set, the solution of the GHJB equation
is in the inner product space $L_2$, and so the projection is well defined in terms of
n-dimensional integration.
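As a concrete illustration of this projection step, the following minimal sketch (not from the thesis; the scalar system, fixed control, and polynomial basis are hypothetical choices) sets up the Galerkin equations for the GHJB equation of a one-dimensional system and solves the resulting linear system numerically.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical scalar example: x' = f(x) + g(x)u with the fixed stabilizing
# control u(x) = -x, state penalty l(x) = x^2 and control penalty R = 1.
# The GHJB equation for the cost of this control is
#     V'(x) (f(x) + g(x)u(x)) + l(x) + R u(x)^2 = 0.
f = lambda x: -x
g = lambda x: 1.0
u = lambda x: -x
l = lambda x: x**2
R = 1.0

# Polynomial basis functions and their derivatives (assumed adequate here).
phi  = [lambda x: x**2, lambda x: x**4]
dphi = [lambda x: 2.0*x, lambda x: 4.0*x**3]
N = len(phi)
a, b = -1.0, 1.0                      # compact set Omega = [-1, 1]

# Galerkin projection: require <residual, phi_i> = 0 for i = 1..N, which
# yields N linear equations A c = r because the GHJB equation is linear in V.
A = np.zeros((N, N))
r = np.zeros(N)
for i in range(N):
    for j in range(N):
        A[i, j], _ = quad(lambda x, i=i, j=j:
                          dphi[j](x) * (f(x) + g(x)*u(x)) * phi[i](x), a, b)
    r[i], _ = quad(lambda x, i=i: -(l(x) + R*u(x)**2) * phi[i](x), a, b)

c = np.linalg.solve(A, r)
print(c)    # approx. [0.5, 0.0], i.e. V(x) = x^2/2, the exact cost of u(x) = -x
```

The improved control would then be formed from the gradient of this approximation, exactly as described above.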
To apply our method, we must choose
1. An initial stabilizing control that can be written in an explicit form,
2. A compact set that contains the origin and that is contained in the stability
region of the initial control,
3. A set of basis functions that can adequately approximate solutions of the
GHJB equation.
The first item has been, and still is, a major subject of research in the control
community. The problem of estimating the stability region of a feedback control has
received some attention in the literature (cf. (Genesio et al., 1985)); however, these
methods are generally computationally infeasible. The advantage of our requirement
is that we do not need to know the exact region of attraction, only a bounded set
contained in the domain of attraction. This is much simpler than knowing the region
of attraction and can often be verified directly by Lyapunov's method. Selecting
suitable basis functions has not received attention in the literature and is an area
for future research.
The basic contribution in this thesis is to show that the improved controls
obtained from approximating the GHJB equation are stable, and that as we iterate
the process, the control in fact converges to the optimal control if it exists. Since the
ideas behind the method are standard, it is surprising that the method has not been
discovered before. We would like to suggest a reason why this is so. If Galerkin's
method is applied directly to the HJB equation, the result is a set of N nonlinear
simultaneous equations. The difficulty is that this set of equations is difficult to solve
and the solution may not be unique. Solving simultaneous nonlinear equations is
a difficult numerical problem in itself. To compound the problem, the HJB equation may
have multiple solutions, one of which (the positive definite solution) corresponds
to a stabilizing control, while the others correspond to destabilizing controls. The
Galerkin approximations to these solutions will be given by the solutions of the N
nonlinear simultaneous equations mentioned above. Therefore, even if we could find
all of the solutions to these equations we must still have a way of determining which
solution corresponds to the stabilizing control. If these problems could be solved
we must then address whether the Galerkin method converges, which for nonlinear
partial differential equations is a challenging question. Due to these difficulties we
conjecture that any attempt at using Galerkin's method to approximate the HJB
equation was quickly abandoned. Likewise, the successive approximation method
has not received much attention because the GHJB equation is as difficult to solve
analytically as the HJB equation. This appears to be the first time that the two
methods have been combined.

1.3 Organization of the Thesis


The remainder of this thesis is organized as follows. In chapter 2 we review
some of the literature relevant to the thesis. In chapter 3 we define the infinite-time
horizon problem and the finite-time horizon problem, which will both be solved in
the thesis. The Hamilton-Jacobi-Bellman equation is then derived and an intuitive
explanation of its inherent complexity is given. The successive approximation
method is then derived, giving rise to the Generalized-Hamilton-Jacobi-Bellman
equation. In chapter 4 we introduce Galerkin's method and show how it can be
used to solve the GHJB equation. When this approximation is plugged into the
successive approximation algorithm, we obtain algorithm 4.3.1 for the finite-time
problem and algorithm 4.3.2 for the infinite-time problem. In chapter 5 we prove
that these algorithms converge and that for N sufficiently large, the approximate
controls are stabilizing. This chapter contains the heart of the thesis. In chapter 6
we apply the method to a number of different systems and compare the results with
some of the methods discussed in chapter 2. Finally, in chapter 7 we summarize the
thesis and suggest directions for future research.
Since the thesis is theoretical in nature, the proofs of lemmas and theorems
that are original to the thesis will be given in the body of the text. When we require
results that are critical to the arguments, but not original to the thesis, the proofs
of these results are given in Appendix A. Galerkin's method has received a lot of
attention from the mathematical community and there are a number of sufficient
conditions under which it is known that the method converges. In Appendix B
we list these conditions and show that the GHJB equation does not satisfy these
conditions. For the reader's convenience, Appendix C lists symbols that are used
throughout the thesis.
CHAPTER 2
LITERATURE REVIEW
In this chapter we review the technical literature relevant to the thesis. There is
a vast literature devoted to the topic of optimal control. Some of the standard
textbooks in this area are (Anderson and Moore, 1971; Bryson and Ho, 1975; Kirk,
1970; Lewis, 1986; Sage and White III, 1977), and we refer to the references cited
in those texts for pointers to this literature. In this chapter we focus on methods
aimed at synthesizing optimal, or suboptimal feedback control laws for nonlinear
systems. The relevant literature is divided into seven sections representing substan-
tially different approaches to the problem. Other methods that do not fit into
these categories are included in section 2.7. Many of the methods described below
are compared with our method in chapter 6.

2.1 Methods Dependent on the Initial State


There are a great number of papers that present numerical methods for finding
optimal controls that are dependent on the initial state of the system and are there-
fore open-loop. Various approaches are employed and only a few will be discussed
here since in this thesis we are interested in feedback controls.
A common approach is to numerically solve for the state and co-state equa-
tions obtained from a Hamiltonian formulation of the optimal control problem. The
problem can be reduced to a two point boundary value problem which can be solved
by various methods. In (Bosarge et al., 1973) the two point boundary value prob-
lem is solved using the Ritz-Galerkin approximation theory that is employed (in a
different context) in this thesis. In (Hofer and Tibken, 1988) the authors reduce
the optimal bilinear control problem to successive iterations of a sequence of Riccati
equations. In (Aganovic and Gajic, 1994) the same problem is further reduced to
successive approximations of a sequence of Lyapunov equations. In (Cebuhar and
Costanza, 1984) the bilinear control problem is reduced to a sequence of linear con-
trol problems that converge uniformly to the optimal bilinear control; the approach
differs from the previous two papers in that it applies to both the finite-time and
infinite-time problems.
Another approach taken in (Rosen and Luus, 1992) is to cast the nonlinear
optimal control problem in the form of a nonlinear programming problem. The
method is formulated in path space so the point that solves the nonlinear program-
ming problem is the optimal path of the system.
In (Vinter and Lewis, 1980), the authors give conditions which make a
Hamilton-Jacobi inequality necessary and sufficient for optimality of a control. They
also state exact conditions under which the HJB equation is sufficient for optimality.
The main result (and a tie to our work) is that there always exists a sequence of value
functions (satisfying HJB-inequalities) that converge to the optimal value function.
The problem is that the result is dependent on the initial state and therefore it does
not provide a necessary and sufficient condition for optimality of a feedback control,
whereas the HJB equation still provides a sufficient condition for optimality of a
feedback control. This paper is a non-technical presentation of some other papers
by Vinter and Lewis, namely (Vinter and Lewis, 1978a; Vinter and Lewis, 1978b).
Standard approaches to find the open-loop optimal control via a two-point
boundary value problem can be found in (Sage and White III, 1977).
2.2 Linearization about a Nominal Path
Feedback synthesis around a nominal path is a method that is halfway between
open-loop control and full feedback control. The basic idea is to assume that
the trajectory of the system is always contained in a small region about a nominal
trajectory. The method is not completely dependent on the initial state since it is
allowed to be in some small ball around the nominal x0 . A typical approach is to
linearize the system equations around the nominal path. Examples can be found in
(Merriam III, 1964).
Another approach is to partition the state space into regions where the system
equations are approximately linear. Knowledge of the nominal trajectory enables
the number of partitions to be kept reasonably small. An example of this approach
is (White and Cook, 1973).
In (Jamshidi, 1976) the control is separated into a linear and nonlinear part.
The linear part can be realized by direct feedback where the nonlinear portion is
dependent on the nominal initial state x0 . If the trajectory does not deviate too
far from the nominal path then the linear portion of the control will tend to push
the trajectory to the nominal. The difficulty is that it is difficult, if not impossible,
to estimate the region of attraction of the linear portion of the control around the
nominal path.
Open-loop control and feedback around a nominal path result in control sys-
tems that are not robust with respect to disturbances and modeling errors. They are
generally viewed as inferior to feedback control laws but a review has been included
here for completeness. In the remainder of this chapter we will focus on methods
for synthesizing feedback control laws.
2.3 Perturbation Methods
Feedback synthesis by the use of perturbation methods addresses the problem
of synthesizing a suboptimal feedback control for the class of nonlinear systems that
are perturbations of a linear system, i.e.
    \dot{x} = Ax + Bu + \epsilon f(x),  (2.1)
where $\epsilon$ is assumed to be small, and the performance index is assumed to be
quadratic. The first work to investigate this class of systems is (Al'brekht, 1961),
where it is assumed that f (x) can be expanded in a power series around x = 0, and
that the system $(A, B)$ is stabilizable. Al'brekht shows that, under these conditions,
the optimal cost and control can be expanded as a power series around $x = 0$, and
that a first-order stabilizing control is found by standard LQR theory. Higher order
terms are obtained by solving certain linear partial differential equations. These
higher order terms are then added to the control to improve the performance of the
system. It is shown that these higher order terms do not destroy the stability of
the system. In (Lukes, 1969), the authors present a definitive study of analytic,
infinite-time regulators. A similar treatment of the finite-time problem is given in
(Willemstein, 1977). In (Werner and Cruz, 1968), it is shown that an nth order Tay-
lor series expansion of the optimal control gives a (2n + 1)th order approximation
of the performance index.
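A minimal sketch of only the first step of such an expansion, the linear (LQR) term about which the higher-order corrections described above would be built, is shown below; the matrices are illustrative and are not taken from any of the cited papers.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# First (linear) term of a perturbation-series design for
# x' = A x + B u + eps*f(x) with quadratic cost x'Qx + u'Ru.
# Higher-order corrections in eps would come from the linear PDEs
# discussed in the text and are not computed here.
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)   # A'P + PA - P B R^{-1} B'P + Q = 0
K = np.linalg.solve(R, B.T @ P)        # linear gain of the expansion

def u0(x):
    """First-order (linear) term of the perturbation-series control."""
    return -K @ x

print(K)
```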
In (Garrard, 1969; Garrard, 1977; Garrard et al., 1967; Garrard and Jordan,
1977) the class of problems described by equation (2.1) is again considered. In these
papers it is shown that the associated Hamilton-Jacobi-Bellman equation can be
expanded as a power series around $\epsilon = 0$. The cost function is again assumed to be
quadratic. The result is a series of linear partial differential equations that can be
solved successively. The first equation in this series reduces to the standard Riccati
equation, and it is shown that the second equation reduces to an equation with an
analytic solution. The higher order equations, however, cannot be readily solved
and so they are ignored. The method, therefore, synthesizes controls that include
linear and cubic functions of the state. To obtain higher order terms in the control
one would have to use a technique such as the Ritz-Galerkin method used in this
thesis, to synthesize these terms.
An approach similar to Garrard's is presented in (Nishikawa et al., 1971). The
difference is that Pontryagin's maximum principle is used to expand the co-state
equations in a power series around $\epsilon = 0$. The problem is again reduced to solving
a series of linear PDE's. Again the linear and cubic terms in the control are readily
computed but higher order terms become difficult to obtain.
The papers (Garrard, 1969) and (Nishikawa et al., 1971) consider a similar
example, so we can compare these methods to the synthesis method obtained in
this thesis. The results are given in section 6.2.1. In that section it is shown that
for the particular example considered, if our method is used to compute a control
with linear and cubic terms, the control performs slightly better than the control
obtained in (Garrard, 1969) and slightly worse than the control derived in (Nishikawa
et al., 1971). The advantage of our method is that it is relatively easy to compute
control terms of higher order, whereas it is not possible to do so with the methods
of (Garrard, 1969) and (Nishikawa et al., 1971).
The problem with these methods is that they are inherently tied to the con-
vergence of a power series for which it is difficult, if not impossible, to estimate the
region of convergence. Consequently, it is equally difficult to estimate the stability
region of a control calculated from a truncated power series. For bilinear systems,
however, it appears that the region of attraction can be estimated, as reported
in (Cebuhar and Costanza, 1984).

In (Halme and Hamalainen, 1975), the authors present a method that is simi-
lar to perturbation methods. The basic idea is to represent the integral curve of the
solution (via Green functions) as a basic linear operator and then invert the opera-
tor. The method has several advantages over perturbation methods. Namely, it is
possible to estimate the region of convergence of the power series that makes up the
final control; therefore it is possible to estimate the stability region of a truncated
control law. The method is also extended to finite-time problems.

2.4 Regularization of the Cost Function


Another approach to approximating the HJB equation is to "regularize" the
cost function so that an analytic expression for the control can be obtained. The
basic idea is to consider a cost function of the form
    J = \int_0^\infty \left[ x^T Q x + u^T R u + \psi(x) \right] dt,  (2.2)
where $\psi(x)$ is a function chosen so that the Hamilton-Jacobi-Bellman equation reduces
to a form similar to the Riccati equation. This method computes a control
that minimizes the integral (2.2) and is therefore suboptimal with respect to the
cost function that is really of interest (i.e., (2.2) without the $\psi$ term). This approach
does result in solutions that stabilize the system, but it is difficult to estimate how
far the control deviates from the optimal.
In (Ryan, 1984; Tzafestas et al., 1984) this approach is applied to find suboptimal
control laws for bilinear systems of the form
    \dot{x} = Ax + (Bx + b)u.
The method is also applicable when hard constraints are placed on the control
variable. In section 6.2.2 we consider an example given in (Ryan, 1984) and compare
the control obtained in that paper with a control computed using the method of this
thesis. To enforce the hard constraints on the control imposed in (Ryan, 1984), we
simply "clip" the control produced by our algorithm. In section 6.2.2 it is shown
that the control obtained by our method performs substantially better than the
control obtained in (Ryan, 1984).
A similar method for more general nonlinear systems is reported in (Lu, 1993).
In this paper, the cost function is regularized by including the system equations in
the cost functional. The result is shown to be more robust than linearizing about a
nominal path, but it requires a very specific structure for the cost function, with no
possibility of tuning the control. The control is still a quasi-open loop control.
In (Freeman and Kokotovic, 1995), the problem of solving the Hamilton-
Jacobi-Bellman equation is circumvented by constructing a Lyapunov function that
solves the HJB equation for some positive definite cost functional. The difficulty is
that we don't know exactly what cost functional is being minimized.

2.5 Feedback Linearization


Suboptimal control laws can also be synthesized using exact feedback lineariza-
tion. The basic idea is to use feedback to cancel out the nonlinearities in the system.
The resulting system is linear and can be optimized with respect to a quadratic cost
function, via standard LQR theory. Feedback linearization has several disadvan-
tages: first, it is difficult to quantify the robustness of the control; second, the control
sometimes cancels out nonlinearities that enhance stability and performance; third,
to cancel the nonlinearities, the control effort can be unreasonably large; finally, even
if the resulting linear system is optimized, it is not possible to determine how close
the original nonlinear system is to optimal. The basic theory of feedback lineariza-
tion is given in (Hunt et al., 1983a; Hunt et al., 1983b; Isidori, 1989; Nijmeijer and
van der Schaft, 1990).
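A minimal sketch of the cancel-and-design idea for a hypothetical scalar system follows; the functions and gain below are illustrative assumptions, not taken from the cited references.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Exact feedback linearization for a hypothetical scalar system
# x' = f(x) + g(x) u with g(x) bounded away from zero on the operating region.
f = lambda x: x - x**3           # illustrative nonlinearity
g = lambda x: 2.0 + np.sin(x)    # input gain, never zero
k = 3.0                          # gain for the resulting linear system x' = v

def u(x):
    # Cancel the nonlinearity, then apply the linear law v = -k*x.
    v = -k * x
    return (v - f(x)) / g(x)

sol = solve_ivp(lambda t, x: [f(x[0]) + g(x[0]) * u(x[0])],
                (0.0, 5.0), [1.5], max_step=0.01)
print(sol.y[0, -1])   # state driven to (approximately) zero
```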
In (Lee and Chen, 1983) feedback linearization is used to derive a suboptimal
control law for robotic manipulators. In (Gao et al., 1992; Chapman et al., 1993;
Wang et al., 1993; Wang et al., 1994; Marino, 1984) feedback linearization is used
to synthesize suboptimal control laws for large scale power systems. With regard
to power systems, (Rajkumar and Mohler, 1995) compares generalized predictive
control, feedback linearizing control, and the LQR method to control a voltage generator.
The authors show that feedback linearizing control schemes are not very robust and
require large amounts of control effort. The LQR control laws, on the other hand,
are very robust and require less control, but are not effective on as large a stability
region. Of course, the generalized predictive control, which is the method the authors
are studying, is seen to perform the best. The method presented in this thesis is a
nonlinear extension of LQR theory, so we would expect our control to perform as
well as an LQR control law but with a larger stability region. Application of our
method to a power system, and a comparison with feedback linearization, is shown in
section 6.3.1.
In section 6.2.3 we compare our method with the method of exact feedback
linearization for another system. The example in that section shows that our method
computes a control that performs substantially better than the control synthesized
using exact feedback linearization.

2.6 Gain Scheduling


The classical engineering approach to nonlinear systems is to linearize the
system around an operating point and then design for the linearized system. Gain
scheduling is an extension of this idea. Instead of linearizing at one point, the system
is linearized at several points in the state space, linear controls are designed at each
point and then interpolated between points. A description and detailed theoretical
justification of gain scheduling is given in (Shamma and Athans, 1990).
An approach related to gain scheduling is described in (Baumann and Rugh,
1986). The gist of this paper is to parameterize the linearization by a constant
operating point. In this respect the method is similar to feedback linearization.
They also address the problem of constructing nonlinear observers and then using
them for output feedback.
Another approach that is similar to gain scheduling is presented in (Cloutier
et al., 1996). The nonlinear system is first parameterized by putting it into the form
    \dot{x} = A(x)x + g(x)u.
A state-dependent Riccati equation is then constructed at each $x$:
    A^T(x)P(x) + P(x)A(x) - P(x)g(x)R^{-1}(x)g^T(x)P(x) + Q(x) = 0.  (2.3)
For certain systems, this equation can be solved explicitly, in which case the method
produces a feedback control law. There are two difficulties. First, there are an
infinite number of parameterizations such that
    f(x) = A(x)x.
However, for each x, there is exactly one parameterization corresponding to the
optimal, and that parameterization depends on $x$. The second difficulty is that
once a parameterization is chosen, equation (2.3) must be solved for each $x$. When
this cannot be done explicitly, the equation must be solved at a discrete number of
points and interpolated in between, i.e., gain scheduling. The disadvantage is that it
is hard to judge the sub-optimality resulting from any particular parameterization.
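A minimal sketch of this pointwise construction is given below; the two-state system and the particular factorization f(x) = A(x)x are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# State-dependent Riccati equation (SDRE) idea for a hypothetical system
# with f(x) = A(x)x; only one of infinitely many factorizations is used.
def A_of_x(x):
    return np.array([[0.0, 1.0],
                     [-1.0 - x[0]**2, -0.5]])   # so that f(x) = A(x)x

def g_of_x(x):
    return np.array([[0.0], [1.0]])

Q = np.eye(2)
R = np.array([[1.0]])

def u_sdre(x):
    # Solve equation (2.3) at the current state and apply the LQR-like gain.
    P = solve_continuous_are(A_of_x(x), g_of_x(x), Q, R)
    return -np.linalg.solve(R, g_of_x(x).T @ P) @ x

print(u_sdre(np.array([0.5, -0.2])))
```

In practice the Riccati solve would be done off-line on a grid and interpolated, which is exactly the gain-scheduling step described above.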
2.7 Other Methods
Before discussing the method of successive approximation, which is the category
to which the synthesis method derived in this thesis belongs, we discuss several other
techniques that do not seem to fall into any of the categories above.
In (Goh, 1993) a feedback control is synthesized by training a neural network to
approximate the solution of the Hamilton-Jacobi-Bellman equation. The difficulty
is that stability cannot be guaranteed. The neural network is trained by computing
open-loop controls for various points in the state space.
Another synthesis technique is given in (Van Trees, 1962). In this treatise the
optimal control is realized as an infinite Volterra series. The method is similar to
feedback linearization and is limited to systems that can be "inverted" in that the
nonlinearities can be eliminated by feedback. A method similar to that of Van Trees was
introduced by Wiener in (Wiener, 1949; Wiener, 1958) where the system equations
and input/output sequences are represented by polynomial functions.
In (Chow and Kokotovic, 1981) the authors consider the class of nonlinear
dynamics that can be divided into subsystems of slow and fast dynamics where the
fast dynamics are governed by linear equations. A two-stage Lyapunov-Bellman
design is proposed where the control is shown to be stable by Lyapunov techniques
and "near-optimal" by solving the Bellman equation.
In (Rugh, 1971), a specific class of nonlinear systems is studied. The approach
is to transform the system into a linear system with non-quadratic cost functional
and then to use a truncated series expansion to find the optimal control. It is shown
that the original and transformed problems both have the same optimal control.
For robotic manipulators, an explicit optimal control is derived in (Johansson,
1990). It is shown that for this specific system the HJB equation can be reduced to
the solution of a matrix equation similar to the algebraic Riccati equation. Closed-loop
control is obtained.

2.8 The Successive Approximation Method


Successive approximation is the foundation for the algorithm presented in this
thesis. Therefore, in this section we trace the history of the ideas that provide a
framework for this thesis. The basic idea of successive approximation is to solve a
differential equation by establishing a reasonable initial guess to the solution and
then updating this guess based on the error that it produces. The successive
approximation scheme is the differential equation analog of the Newton-Raphson
approximation method for solving nonlinear algebraic equations. The method of
successive approximation was originally introduced by Bellman in his early work on
dynamic programming (Bellman, 1957). This method was first applied to optimal
control problems in (Rekasius, 1964), who used the idea to successively compute
suboptimal controls for linear systems with non-quadratic performance criteria.
Haussler (Haussler, 1963) extended this idea to nonlinear systems and showed that the
method could simultaneously provide upper and lower bounds on the deviations of
the suboptimal control from the optimal. In (Leake and Liu, 1967), the method of
successive approximations is used to derive an algorithm for computing the solution
to the Hamilton-Jacobi-Bellman equation by computing the solution to a sequence
of linear partial differential equations given by the Generalized-Hamilton-Jacobi-Bellman
equation. This paper is the first place that the successive approximation algorithm is
analyzed.
The ideas of successive approximation were placed on a sound theoretical foun-
dation by Saridis and Lee in (Saridis and Lee, 1979). The authors use successive
approximation to realize a design algorithm that improves the performance of an ini-
tial stabilizing control. The method is shown to converge monotonically and pointwise
to the optimal solution, i.e., to the solution of the Hamilton-Jacobi-Bellman equation.
The algorithm was further extended to stochastic nonlinear systems in (Saridis
and Wang, 1994). Our work is based on these methods, which will be explained in
detail in section 3.3.
A similar method was developed independently by researchers in the Soviet
Union. In (Vaisbord, 1963), the authors present a scheme that is reminiscent of the
method in (Saridis and Lee, 1979), but that is based on heuristic arguments and
whose result is restricted to systems that are infinitely differentiable. An important
contribution of this paper is that the authors are able to show that there exists
a unique solution to a partial differential equation which is similar to the GHJB
equation. The algorithm is further analyzed in (Mil'shtein, 1964), which studies the
convergence of the value function and the control on a compact set containing the
origin. Dini's theorem is invoked to show that convergence is uniform on a compact
set. These papers also show that each successive control is stable. This is also shown
in (Saridis and Balaram, 1986) under more relaxed conditions.
A supplement to this work is given in (Bertsekas, 1976), which shows that the
successive approximation algorithm is a contraction mapping on a complete space
and therefore a unique solution exists. Most importantly, the paper derives explicit
bounds on the sub-optimality of the successive approximation at each iteration.
The difficulty with all of these works is that it is unclear how to solve the
Generalized-Hamilton-Jacobi-Bellman (GHJB) equation, and so the method is not
applied to any real systems. For linear systems, the GHJB equation reduces to a matrix
Lyapunov equation, in which case the successive approximation algorithm gives rise
to an iterative solution to the Riccati equation similar to the Kleinman algorithm
(cf. (Kleinman, 1968; Kleinman, 1970; Sandell, 1974; Mageirou, 1977; Laub, 1991)).
For nonlinear systems, the problem of solving the GHJB equation was first addressed by
Balaram in (Balaram, 1985). The basic idea of the solution method presented in
(Balaram, 1985) is to partition the state space into a finite grid and to approximate
the GHJB equation with tensor product splines on regions of the grid. The spline
coefficients can be computed as the solution to a system of nonlinear algebraic
equations. However, this system of equations proves to be very difficult to solve and
the method is only applied to a limited number of small examples.
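For the linear special case mentioned above, the following minimal sketch shows the resulting Kleinman-style iteration: a Lyapunov-equation evaluation of the current control followed by an improvement step. The matrices are illustrative only.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Successive approximation for x' = Ax + Bu, cost x'Qx + u'Ru:
# the GHJB equation becomes a matrix Lyapunov equation, and iterating
# converges to the Riccati solution (Kleinman-style iteration).
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[3.0, 1.0]])       # any initial stabilizing gain
for _ in range(10):
    Acl = A - B @ K
    # GHJB step: evaluate the cost of the current control (Lyapunov equation).
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Improvement step: minimize the Hamiltonian using the current cost.
    K = np.linalg.solve(R, B.T @ P)

print(np.allclose(P, solve_continuous_are(A, B, Q, R)))   # True
```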

2.9 Conclusion
In regard to finding approximate solutions to the Hamilton-Jacobi-Bellman
(HJB) equation, an interesting quote is found in (Merriam III, 1964):
Pertinent methods of approximation must satisfy two properties. First,
the approximation must converge uniformly to the optimum control sys-
tem with increasing complexity of the approximation. Second, when the
approximation is truncated at any degree of complexity, the resulting
control system must be stable without unwanted limit cycles. At this
time (1964), no series or other known form of approximation possesses
both these required properties.
As the literature review has shown, a similar statement could also be made
prior to this thesis, but not after. There is a recognized need to find methods of
approximating the HJB equation such that the resulting controls are guaranteed to
be stable. This need is enhanced by the recent interest in the nonlinear $H_\infty$ control
problem (cf. (van der Schaft, 1992; Wise and Sedwick, 1994; Ball et al., 1993)). It
is shown in (van der Schaft, 1992) that the nonlinear $H_\infty$ control problem reduces
to the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. The HJI equation is
identical in form to the HJB equation and the method presented in this thesis is
equally applicable to the HJI equation.
CHAPTER 3
PROBLEM FORMULATION
In this chapter we give a mathematical formulation of the problem addressed in this
thesis. We will address two related control problems, namely:
• The infinite-time horizon problem, where the system equations are assumed
to be autonomous and the optimization index is over an infinite time
interval, and
• The finite-time horizon problem, where the system equations can be time-varying
and the optimization index is over a finite time interval.
The first two sections of this chapter introduce these problems and develop
the framework in which we address them. In section 3.3, the optimal control problem
is generalized to the problem of evaluating and improving the performance of
stabilizing controls. We show that the performance of an arbitrary stabilizing
control is given by the solution to a linear partial differential equation called the
Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. We also show how the
solution to this equation can be used to construct a control law that improves the
closed-loop performance. We show that the Hamilton-Jacobi-Bellman (HJB) equation
is a special case of the GHJB equation and that the GHJB equation forms a
contraction on the set of admissible controls with the fixed point corresponding to
the optimal control.
3.1 Infinite-Time Horizon Problem
We first describe the infinite-time horizon problem. The infinite-time problem
considers the class of nonlinear systems described by ordinary differential equations
that are affine in the control:
    \dot{x} = f(x) + g(x)u(x),  (3.1)
where $x \in \mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$, and $u : \mathbb{R}^n \to \mathbb{R}^m$.
To ensure that the control problem is well posed we assume that $f$ and $g$ are
Lipschitz continuous on a set $\Omega \subseteq B(0)$, where $B(x)$ denotes a ball around $x$. We also
assume that $f(0) = 0$.
Remark 3.1.1 In most of the derivations in this thesis it is not necessary to as-
sume that $f$ and $g$ are autonomous for the infinite-time horizon problem. However,
if $f$ and $g$ are not autonomous, the main algorithm derived in chapter 4 will require
the solution of a time-varying ordinary differential equation over the interval $[0, \infty)$.
Moreover, this equation must be solved backward in time. Since this is not compu-
tationally feasible, we restrict our attention to autonomous f and g, in which case
the algorithm becomes particularly simple.

Define $\varphi(t; x_0, u)$ to be the solution at time $t$ to the equation (3.1) with initial
conditions $x_0$ and control $u$. To simplify the notation we write $\varphi(t) \equiv \varphi(t; x_0, u)$
when $x_0$ and $u$ are understood.
The first requirement for any closed-loop system is that the control stabilize
the system. Consequently we define the concept of a stabilizing control. In this
thesis, stabilizing control will always mean that the system is asymptotically stable
in the sense of Lyapunov (cf. (Khalil, 1992, p. 98)).
Definition 3.1.2 Stabilizing Controls.
For the infinite-time horizon problem, the control $u : \mathbb{R}^n \to \mathbb{R}^m$ is said to
stabilize system (3.1) around 0 on $\Omega \subseteq \mathbb{R}^n$ if
• $f(0) + g(0)u(0) = 0$,
• for each $\epsilon > 0$, there exists $\delta > 0$ such that
  $\|x_0\| < \delta \implies \|\varphi(t; x_0, u)\| < \epsilon$, $\forall t \ge 0$, $\forall x_0 \in \Omega$,
• $\|x_0\| < \delta \implies \lim_{t \to \infty} \|\varphi(t; x_0, u)\| = 0$, $\forall x_0 \in \Omega$.

Throughout the thesis we will frequently use concepts from Lyapunov theory.
Therefore we briefly review the main results (cf. (Khalil, 1992; Vidyasagar, 1993)).
A function V : {\rm I\!R}^n \to {\rm I\!R} is positive (negative) definite on Ω if V(x) > 0 (< 0)
for all x ∈ Ω \ {0} and V(0) = 0. A function V : {\rm I\!R}^n \to {\rm I\!R} is a Lyapunov function
on Ω, for the system (3.1), if V is continuously differentiable on Ω, and if both V and

-\dot{V} \triangleq -\frac{\partial V^T}{\partial x}(f + gu)

are positive definite on Ω. The main result of Lyapunov theory is the following theorem.
Theorem 3.1.3 If there exists a Lyapunov function V(x) on Ω for the system (3.1),
then u(x) is a stabilizing control on Ω. The converse is also true if the Jacobian of
f + gu is bounded on Ω.
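A quick numerical check of the conditions in Theorem 3.1.3 can be illuminating. The sketch below is not from the thesis; the scalar system, the control, and the Lyapunov candidate are assumptions chosen purely for illustration. It verifies positive definiteness of V and of −V̇ on a grid over Ω.

```python
# Minimal sketch (assumed example): check the Lyapunov conditions of
# Theorem 3.1.3 for x_dot = f(x) + g(x) u(x) with f(x) = x**3, g(x) = 1,
# u(x) = -2 x, candidate V(x) = x**2, on Omega = [-1, 1].
import numpy as np

f = lambda x: x**3
g = lambda x: 1.0
u = lambda x: -2.0 * x
V = lambda x: x**2
dVdx = lambda x: 2.0 * x

xs = np.linspace(-1.0, 1.0, 2001)
xs = xs[np.abs(xs) > 1e-6]                       # exclude the origin

V_vals = V(xs)
Vdot_vals = dVdx(xs) * (f(xs) + g(xs) * u(xs))   # V_dot along trajectories

print("V positive definite on Omega\\{0} :", bool(np.all(V_vals > 0)))
print("-V_dot positive definite on Omega\\{0} :", bool(np.all(Vdot_vals < 0)))
```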
While closed-loop stability of a system is always a necessary requirement,
a control engineer usually has additional performance specifications that must be
satisfied. For example, in a commercial flight control system, disturbances from
wind gusts must be compensated so that passengers experience a minimum of
discomfort. These types of performance specifications are difficult to quantify in
mathematical terms. The standard approach is to introduce a quantitative performance
measure that can be handled mathematically. The performance measure
is then treated as a design parameter. In other words, the performance measure
is adjusted until the design specifications are satisfied. Before we define the standard
performance measure used in optimal control theory we recall the following
definition: l : Ω ⊆ {\rm I\!R}^n \to {\rm I\!R} is monotonically increasing on Ω if for all x, y ∈ Ω,
\|x\| < \|y\| \implies l(x) < l(y). In the optimal control literature, the standard performance
measure is an integral of a function of the state and control trajectories:

J(x_0, u) = \int_0^\infty l(\varphi(t)) + \|u(\varphi(t))\|_R^2 \, dt,    (3.2)

where l : {\rm I\!R}^n \to {\rm I\!R} is a positive definite, monotonically increasing function on Ω,
R \in {\rm I\!R}^{m \times m} is a symmetric, positive definite matrix, \|u\|_R^2 = u^T R u and x_0 ∈ Ω ⊆ {\rm I\!R}^n.
l is called the state penalty function and \|u\|_R^2 is the control penalty function.
Typically l is a quadratic weighting of the states, i.e., l = x^T Q x where Q is a
positive definite matrix.
Remark 3.1.4 In linear quadratic optimal control Q is only required to be positive
semi-definite, with (Q^{1/2}, A) detectable. For simplicity we require that l be positive
definite which, together with the monotonically increasing property, implies that
the system is observable through l. This assumption could obviously be relaxed.
However, we see this as an unnecessary complication since l is a design parameter
that can always be chosen to satisfy the observability condition.
For equation (3.2) to give any indication of the performance of the system, the
integral must converge. Unfortunately, stability of f + gu is not sufficient for the
integral to be finite.
Simple Example
The solution to the system

\dot{x} = x u, \qquad u = -|x|,

is

\varphi(t) = \frac{x_0}{1 + |x_0| t}.

The control u asymptotically stabilizes the system, but if l(x) = |x| then

\int_0^\infty l(\varphi(t)) \, dt = \int_0^\infty \frac{|x_0|}{1 + |x_0| t} \, dt = \int_1^\infty \frac{dv}{v} = \infty.

However, if l(x) = |x|^\alpha with \alpha > 1 then the integral is finite.
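The divergence in this example is easy to confirm numerically. The sketch below is an illustration added here (not thesis code); it evaluates the cost integral along the closed-form trajectory for several exponents.

```python
# Illustrative check: for x_dot = x u with u = -|x|, phi(t) = x0 / (1 + |x0| t).
# The integral of |phi(t)|**alpha diverges for alpha = 1 (grows like log t)
# and converges for alpha > 1.
import numpy as np
from scipy.integrate import quad

x0 = 1.0
phi = lambda t: x0 / (1.0 + abs(x0) * t)

for alpha in (1.0, 1.5, 2.0):
    val, _ = quad(lambda t: abs(phi(t))**alpha, 0.0, 1e5, limit=500)
    print(f"alpha = {alpha}: integral up to t = 1e5 is {val:.2f}")
```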
This necessitates the restriction of stabilizing controls to those controls that
render the cost function (3.2) finite with respect to a certain penalty on the states.
Definition 3.1.5 Admissible Controls.
Given the system (f, g). For the infinite-time horizon problem, a control u :
{\rm I\!R}^n \to {\rm I\!R}^m is admissible with respect to the state penalty function l on Ω, written
u ∈ A_l(Ω), if
- u is continuously differentiable on Ω,
- u(0) = 0,
- u stabilizes (f, g) on Ω,
- \int_0^\infty l(\varphi(t)) + \|u(\varphi(t))\|_R^2 \, dt < \infty.
We can show a simple result that links the idea of an admissible control with
Lyapunov theory.
Lemma 3.1.6 Given the system (f, g) and a continuously differentiable, positive
definite state penalty function l, if u ∈ A_l(Ω) then there exists a Lyapunov function
for the system on Ω.
Proof: We will show that the function

V(x) = \int_0^\infty \left[ l(\varphi(t; x, u)) + \|u(\varphi(t; x, u))\|_R^2 \right] dt

is a Lyapunov function on Ω. Since l is continuously differentiable and u is continuously
differentiable on Ω, the Lipschitz assumption on (f, g) guarantees that

\frac{\partial V}{\partial x} = \int_0^\infty \frac{\partial \varphi^T}{\partial x} \left[ \frac{\partial l}{\partial \varphi}(\varphi(t; x, u)) + 2 \frac{\partial u^T}{\partial \varphi} R \, u(\varphi(t; x, u)) \right] dt

is continuous in x, i.e. V is continuously differentiable. Since

-\dot{V} = -\frac{\partial V^T}{\partial x}(f + gu) = l(x) + \|u(x)\|_R^2,

the positive definiteness of l and \|u\|_R^2 guarantees that V and -\dot{V} are also positive
definite.
We would like to make the converse statement: if u stabilizes (f, g) on Ω, then
there exists a continuously differentiable, positive definite state penalty function
l : Ω \to {\rm I\!R} such that u ∈ A_l(Ω). Unfortunately, this is not true. The reason is that
the control penalty function is restricted to be a quadratic function of u. Therefore
there are limitations on the type of asymptotically stabilizing controls that can be
admissible. For example, if the control is a linear function of x, say u(x) = -x, and
the system decays more slowly than 1/\sqrt{t}, then \|u(\varphi(t))\|^2 decays no faster than 1/t
and the integral \int_0^\infty \|u\|^2 \, dt is not finite. Hence, we are restricted to systems that
decay sufficiently fast. We can, however, state the following result.
Lemma 3.1.7 If u stabilizes the system (f, g) on Ω, then there exists a continuously
differentiable, positive definite state penalty function l : Ω \to {\rm I\!R} such that u ∈ A_l(Ω)
if and only if u has finite energy, i.e.

\int_0^\infty \|u(\varphi(t))\|^2 \, dt < \infty, \qquad \forall x_0 \in \Omega.
Proof: The necessity is obvious. For the sufficiency, the assumption guarantees
that \int_0^\infty \|u(\varphi(t))\|_R^2 \, dt < \infty; therefore it is sufficient to construct l such that
\int_0^\infty l(\varphi(t)) \, dt < \infty.
Define the set

\Omega(t) \triangleq \{ x \in {\rm I\!R}^n : x = \varphi(t; x_0, u), \ x_0 \in \Omega \}.

\Omega(t) is the image of Ω after time t under the map defined by \dot{x} = f + gu. The
asymptotic stability of the system guarantees that the image of Ω decreases to zero
as t \to \infty. Therefore we can define the time, t_0, when all coordinates of the states
are less than one:

t_0 \triangleq \{ t : |\hat{x}_j| < 1, \ \text{where} \ \hat{x} = \arg\max_{\Omega(t)} \|x\|^2 \}.

Let

t_j \triangleq t_{j-1} + 1, \qquad j = 1, \ldots.

We can now partition Ω as follows:

\Omega_0 = \Omega \setminus \Omega(t_0), \qquad \Omega_j = \Omega(t_{j-1}) \setminus \Omega(t_j), \quad j = 1, \ldots.

We know that

x \in \Omega_j, \ j \ge 1 \implies \|x\| < 1,

so for each ε > 0, there exists a k such that \|x\|^k < \epsilon. This implies that we can choose
k_j recursively as follows:

k_0 = 1; \qquad k_j \ge k_{j-1}, \ \text{such that} \ \int_{t_{j-1}}^{t_j} \int_{\Omega(t)} \|x\|^{2 k_j} \, dx \, dt < \frac{1}{j^2}.

Define

\hat{l} \triangleq \|x\|^{2 k_j}, \qquad x \in \Omega_j.

\hat{l} is positive definite and piecewise continuously differentiable. Furthermore, for all
x_0 ∈ Ω,

\int_0^\infty \hat{l}(\varphi(t)) \, dt \le \int_0^\infty \int_{\Omega(t)} \hat{l}(x) \, dx \, dt
 = \int_0^{t_0} \int_{\Omega(t)} \|x\|^2 \, dx \, dt + \int_{t_0}^\infty \int_{\Omega(t)} \hat{l}(x) \, dx \, dt
 \le \int_0^{t_0} \int_{\Omega(t)} \|x\|^2 \, dx \, dt + \sum_{j=1}^\infty \frac{1}{j^2}
 \triangleq M < \infty.

We employ the technique of mollification (cf. (Jones, 1993)) to obtain a continuously
differentiable l from \hat{l}. Since the effect of mollification under the integral can be made
arbitrarily small, the proof is complete.
Remark 3.1.8 In general it is difficult to derive specific conditions under which
the control can be made to have finite energy. However, an important case when this
is always true is when the linearization of (f, g) at x = 0, i.e.

\left( \frac{\partial f}{\partial x}(0), \ g(0) \right),

is stabilizable. In this case the origin can be made exponentially stable by an
appropriate linear state feedback. Therefore there exists a nonlinear state feedback,
u, such that the real parts of the eigenvalues of \frac{\partial (f + gu)}{\partial x}(0) are all negative, i.e., the
origin is exponentially stable. Therefore, the integral

\int_0^\infty \|\varphi(t)\|_Q^2 + \|u(\varphi(t))\|_R^2 \, dt
 \le \int_0^{\hat{t}} \|\varphi(t)\|_Q^2 + \|u(\varphi(t))\|_R^2 \, dt + \int_{\hat{t}}^\infty C_1 e^{-\gamma_1 t} + C_2 e^{-\gamma_2 t} \, dt
 < \infty,

for all Q \ge 0 and for all R > 0, where \hat{t} is the last time that φ(t) enters the
region of state space where the linearization of the closed-loop system is valid. This
example shows why it is not necessary to be concerned about the admissibility of
linear systems with quadratic penalty functions: if the linear system is stabilizable,
then any stabilizing control is admissible with respect to any quadratic state penalty
function.
In the above discussion, the specification of the set Ω has been somewhat
arbitrary. However, Ω can be made as large as the stability region of u.
Lemma 3.1.9 Given a system (f, g). If u ∈ A_l(Ω) and the region of stability of
the system \dot{x} = f + gu is Λ ⊆ {\rm I\!R}^n where Λ ⊇ Ω, then u ∈ A_l(Λ).
Proof: Since u is asymptotically stabilizing on Ω, there exists a t_0 < \infty such
that

t > t_0 \implies \{ y : y = \varphi(t; x, u), \ x \in \Lambda \} \subset \Omega.

Then for all x ∈ Λ,

\int_0^\infty l(\varphi(\tau; x)) + \|u(\varphi(\tau; x))\|_R^2 \, d\tau
 \le \int_0^{t_0} l(\varphi(\tau; x)) + \|u(\varphi(\tau; x))\|_R^2 \, d\tau
 + \int_{t_0}^\infty l(\varphi(\tau; \varphi(t_0; x))) + \|u(\varphi(\tau; \varphi(t_0; x)))\|_R^2 \, d\tau.

The first integral is finite since it is over a finite time period, and the second integral
is finite since φ(t_0; x) ∈ Ω.
In general it is difficult to find the largest stability region, Λ, of u. However, it
is usually possible to find a region Ω ⊆ Λ, or to verify (by Lyapunov methods) that
a subset Ω is contained in Λ. Since our method will require some set Ω over which
u is stabilizing, but does not require the entire stability region Λ, we will retain the
notation u ∈ A_l(Ω).
We will assume throughout the thesis that system (3.1) is controllable on Ω,
in that for an appropriate choice of l, there exists at least one admissible control,
u ∈ A_l(Ω).
The performance of an arbitrary admissible control u^{(0)} ∈ A_l(Ω) can be
expressed in infinitesimal form by a linear partial differential equation called the
Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. An interesting fact is that
the solution to the GHJB equation can be used to find a feedback control law that
improves the closed-loop performance of u^{(0)}. The GHJB equation is central to the
results in the thesis and will be the subject of section 3.3. It will also be shown that
if the process is iterated then the solution to the GHJB equation converges to the
solution of the Hamilton-Jacobi-Bellman (HJB) equation.

3.2 Finite-Time Horizon Problem
In this section we describe the finite-time horizon problem. We consider the
class of nonlinear systems described by time-varying ordinary differential equations
that are affine in the control:

\dot{x} = f(t, x) + g(t, x) u(t, x),    (3.3)

where x \in {\rm I\!R}^n, f : {\rm I\!R} \times {\rm I\!R}^n \to {\rm I\!R}^n, g : {\rm I\!R} \times {\rm I\!R}^n \to {\rm I\!R}^{n \times m} and u : {\rm I\!R} \times {\rm I\!R}^n \to {\rm I\!R}^m.
To ensure that the control problem is well posed we assume that f and g are
Lipschitz continuous on the set [t_0, t_f] × Ω, where Ω contains a neighborhood of the
origin. Note that for the finite-time problem, it is not necessary to assume that
f(t, 0) = 0.
Define φ(τ; t_0, x_0, u) to be the solution at time τ to the equation (3.3) with
initial conditions (t_0, x_0). As in the previous section, we write φ(t) ≜ φ(t; t_0, x_0, u)
when t_0, x_0 and u are understood.
In the infinite-time horizon case we defined the concept of stability. For the
finite-time horizon problem, the system evolves over a finite time interval and so it
does not make sense to discuss the stability of the system. The analogous concept
in the finite-time setting is that of a bounded response.
Given a set \hat{\Omega} ⊆ {\rm I\!R}^n, let \Omega(t; \hat{\Omega}) be the image of \hat{\Omega} under the mapping
φ(t; t_0, ·, u), i.e.

\Omega(t; \hat{\Omega}) = \{ x \in {\rm I\!R}^n : x = \varphi(t; t_0, x_0, u), \ x_0 \in \hat{\Omega} \}.
Definition 3.2.1 Bounded Response.
Given the system (f, g), the control u(t, x) is said to have a bounded response
on [t_0, t_f] if there exists a compact set \hat{\Omega} ⊆ {\rm I\!R}^n such that, \forall t \in [t_0, t_f], the image
\Omega(t; \hat{\Omega}) is a compact set in {\rm I\!R}^n.
A system that does not exhibit finite escape in the interval [t_0, t_f] for any
initial condition has a bounded response. The converse, however, is not true.
Example
Consider the system

\dot{x} = x^2.

The integral curves of this system are given by the equation

\varphi(t; 0, x_0) = \frac{x_0}{1 - x_0 t},

which diverges to infinity at t = 1/x_0 if x_0 > 0, but is bounded if x_0 \le 0. Therefore,
this system has a bounded response for any compact set contained in the negative
half of the real line.
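The finite escape time is easy to see from the closed-form solution. The short sketch below (illustrative only, not part of the thesis) evaluates φ(t; 0, x_0) for initial conditions on either side of the origin.

```python
# Illustrative sketch: finite escape for x_dot = x**2.
# phi(t) = x0 / (1 - x0 t) blows up at t = 1/x0 when x0 > 0,
# but stays bounded (decays toward 0) when x0 <= 0.
import numpy as np

def phi(t, x0):
    return x0 / (1.0 - x0 * t)

print("x0 =  1.0, t = 0.99 :", phi(0.99, 1.0))    # grows without bound as t -> 1
print("x0 = -1.0, t = 10.0 :", phi(10.0, -1.0))   # remains bounded
```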


Remark 3.2.2 We can list several results that guarantee that a system has a
bounded response.
- Given a system (f, g), if there exists a stabilizing control u on Ω, then u has a bounded response on Ω for any interval [t_0, t_f].
- Given a system (f, g), if u makes a compact set Ω invariant under the mapping φ, then for all x_0 ∈ Ω, u has a bounded response on any interval [t_0, t_f].
- If the system f + gu is piecewise continuous in t on [t_0, t_f], and uniformly Lipschitz in x, i.e.,
  \|f(t, x) - f(t, y)\| \le L \|x - y\|, \qquad \forall x, y \in {\rm I\!R}^n,
  then u does not have finite escape on [t_0, t_f] (cf. (Khalil, 1992, p. 81)), and therefore has a bounded response on [t_0, t_f].
A bounded response is the bare minimum requirement for a finite-time problem.
As with the infinite-time problem, we would like a systematic method of
incorporating additional performance specifications. We use the standard H_2 performance
measure used in the optimal control literature:

J(t_0, x_0) = s(\varphi(t_f; t_0, x_0, u)) + \int_{t_0}^{t_f} l(t, \varphi(t; t_0, x_0, u)) + \|u(t, \varphi(t; t_0, x_0, u))\|_R^2 \, dt,    (3.4)

where s : Ω ⊆ {\rm I\!R}^n \to {\rm I\!R} is a positive definite, monotonically increasing function on
Ω, l : {\rm I\!R} × Ω \to {\rm I\!R} is a positive definite, monotonically increasing function on Ω
for each t ∈ [t_0, t_f], R \in {\rm I\!R}^{m \times m} is a symmetric, positive definite matrix, and x_0 ∈ \hat{\Omega},
where \hat{\Omega} is the pre-image of Ω.
We will again define the set of admissible controls so that equation (3.4) is
finite. For simplicity we define the notation D ≜ [t_0, t_f] × Ω.
Definition 3.2.3 Admissible Controls.
Given the system (f, g); for the finite-time horizon problem, a control u :
{\rm I\!R} \times {\rm I\!R}^n \to {\rm I\!R}^m is admissible with respect to the state penalty function l on D, and
a final state penalty function s on Ω, written u ∈ A_{l,s}(D), if
- u is continuously differentiable on D,
- u has a bounded response on D,
- s(\varphi(t_f)) + \int_{t_0}^{t_f} l(t, \varphi(t)) + \|u(t, \varphi(t))\|_R^2 \, dt < \infty, \quad \forall x_0 \in \hat{\Omega}.
Unlike the infinite-time case we have the following result.
Lemma 3.2.4 Given the system (f, g); if u is continuously differentiable on D, then
there exists a continuous state penalty function l : D \to {\rm I\!R} such that u ∈ A_{l,s}(D) if
and only if u has a bounded response on D.
Proof: The necessity is by definition 3.2.3. Sufficiency holds since if u has a
bounded response, then the integral

\int_{t_0}^{t_f} l(t, \varphi(t)) + \|u(t, \varphi(t))\|_R^2 \, dt

is finite for any continuous functions l and u.
Therefore, in the finite-time setting, bounded response and admissibility are
equivalent concepts.
As in the infinite-time case, we will assume throughout the thesis that system
(3.3) is controllable on D, in that for an appropriate choice of l and s, there exists
at least one admissible control, u ∈ A_{l,s}(D).
Similar to the infinite-time problem, the performance of an arbitrary admissible
control u^{(0)} ∈ A_{l,s}(D) can be expressed in infinitesimal form by a linear partial
differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation.
The difference with the infinite-time case is that for the finite-time case the
solution will be time-varying. As with the infinite-time problem, the solution to the
GHJB equation can be used to find a (time-varying) feedback control law that improves
the closed-loop performance of u^{(0)}. A detailed study of the GHJB equation
will be given in section 3.3.
For the finite-time problem, the standard problem of optimal control is to find
u to minimize the performance index (3.4) for all x ∈ Ω. The solution to this
problem is also given by the solution of the HJB equation. The HJB equation is a
special case of the GHJB equation and will be discussed in section 3.3.
3.3 The Generalized-Hamilton-Jacobi-Bellman Equation
In this section we derive the Generalized-Hamilton-Jacobi-Bellman (GHJB)
equation. The presentation will be intuitive, and technical discussions of the convergence,
stability and robustness of the control will be postponed until chapter 5.
Throughout the thesis, the focus will be on the finite-time problem, since its solution
is more involved. We will specifically mention when the results for the infinite-time
problem are different.
The standard optimal control problem is to find a control to minimize the cost
function given in equation (3.4). For the problem to be well posed mathematically,
a unique optimal control must exist. This requirement places limitations on the
applicability of optimal control theory. In addition, the optimal control is very
difficult to find, while many controls close to optimal may be much easier to compute.
In this section we generalize optimal control by considering the problem of improving
the performance of an arbitrary stabilizing control. We also show that by iterating
the improvement process, we converge uniformly to the optimal control, if it exists.
Given an initial control u^{(0)}(t, x) ∈ A_{l,s}(D), the performance of the control at
(t, x) ∈ D is given by the formula

V^{(0)}(t, x) = s(\varphi(t_f; t, x, u)) + \int_t^{t_f} \left[ l(\tau, \varphi(\tau; t, x, u)) + \|u(\tau, \varphi(\tau; t, x, u))\|_R^2 \right] d\tau.    (3.5)

However, this expression depends on the solution of the system \dot{x} = f + gu, which is
generally not available. To obtain an expression that is independent of the solution
of the system, we differentiate V^{(0)} along the system trajectories to obtain

\frac{\partial V^{(0)}}{\partial t} + \frac{\partial V^{(0)T}}{\partial x}(f + g u^{(0)}) + l + \|u^{(0)}\|_R^2 = 0, \qquad V^{(0)}(t_f, x) = s(x).

This partial differential equation is an incremental expression of the cost of an arbitrary
control u^{(0)}. If we can solve this equation then we have a compact expression
for equation (3.5) that does not depend on the solution φ(t). This equation will be
extremely important throughout the thesis and is termed the Generalized-Hamilton-
Jacobi-Bellman equation.
Definition 3.3.1 GHJB Equation.
Given an admissible control u : {\rm I\!R} × Ω \to {\rm I\!R}^m, the function V : {\rm I\!R} × Ω \to {\rm I\!R}
satisfies the Generalized-Hamilton-Jacobi-Bellman equation, written GHJB(V, u) = 0, if

\frac{\partial V}{\partial t} + \frac{\partial V^T}{\partial x}(f + gu) + l + \|u\|_R^2 = 0, \qquad V(T, x) = s(x).    (3.6)

For the infinite-time problem the GHJB equation is

\frac{\partial V^T}{\partial x}(f + gu) + l + \|u\|_R^2 = 0, \qquad V(0) = 0.    (3.7)
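To make the infinite-time GHJB equation concrete, the sketch below checks (3.7) on a hypothetical scalar linear system with quadratic penalties; the quadratic ansatz V = c x^2, the particular numbers, and the use of scipy are assumptions made for illustration only. The value obtained from the GHJB equation matches the cost (3.2) integrated along the closed-loop trajectory.

```python
# Minimal sketch (assumed scalar example): verify the infinite-time GHJB
# equation (3.7) reproduces the cost (3.2).
# System: x_dot = a x + b u, control u = -k x (admissible since b k > a),
# penalties l(x) = q x**2, ||u||_R^2 = r u**2.
import numpy as np
from scipy.integrate import quad

a, b, q, r, k = 1.0, 1.0, 1.0, 1.0, 2.0

# Quadratic ansatz V(x) = c x**2 in (3.7):
#   dV/dx (f + g u) + l + ||u||_R^2 = 2 c (a - b k) x**2 + (q + r k**2) x**2 = 0
c = (q + r * k**2) / (2.0 * (b * k - a))

# Compare with the cost integral along phi(t) = x0 * exp((a - b k) t).
x0 = 0.7
cost, _ = quad(lambda t: (q + r * k**2) * (x0 * np.exp((a - b * k) * t))**2, 0, 50)
print("V(x0) from the GHJB equation:", c * x0**2)
print("cost integral along the trajectory:", cost)   # the two values agree
```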
We can interpret the GHJB equation geometrically. Figure 3.1(a) shows the
phase portrait of a two dimensional, infinite-time system. The dotted lines represent
the trajectories of the system. The cost at any point x is computed by integrating
equation (3.2) along the unique trajectory of the system passing through x. The solid
lines in figure 3.1(a) are constant contours of the cost function. This geometrical
interpretation suggests an intuitive idea for improving the cost of the system. If we
fix the constant cost contours and minimize the action of the system with respect
to these contours, then the cost of the system will be reduced. For example, the
system in figure 3.1(b) will have lower cost than the system in figure 3.1(a). In both
the infinite-time and finite-time problems, the action of the system is given by the
Hamiltonian (cf. (Arnold, 1989, p. 248))

H\!\left( t, x, u, \frac{\partial V}{\partial x} \right) \triangleq \frac{\partial V^T}{\partial x}(f + gu) + l + \|u\|_R^2.

To improve the performance of an arbitrary control u^{(0)} we minimize the Hamiltonian,
i.e.,

u^{(1)} = \arg\min_{u \in A_{l,s}(D)} \left\{ \frac{\partial V^{(0)T}}{\partial x}(f + g u) + l + \|u\|_R^2 \right\}
 = -\frac{1}{2} R^{-1} g^T \frac{\partial V^{(0)}}{\partial x}.    (3.8)
The cost of u^{(1)} is given by the solution of the equation GHJB(V^{(1)}, u^{(1)}) = 0. In
(Saridis and Lee, 1979) it is shown that V^{(1)}(t, x) \le V^{(0)}(t, x) for each (t, x) ∈ D. It
is also easy to show that the iteration does not get stuck in local minima, i.e., if
for a fixed i, V^{(i+1)}(t, x) = V^{(i)}(t, x), then V^{(i)}(t, x) = V^*(t, x). This fact allows us to
give a simple derivation of the Hamilton-Jacobi-Bellman (HJB) equation. Assume
that a unique optimal control u^* exists and is an admissible control. Then the
optimal cost is given by the solution to the GHJB equation

\frac{\partial V^*}{\partial t} + \frac{\partial V^{*T}}{\partial x}(f + g u^*) + l + \|u^*\|_R^2 = 0.

From the solution to the GHJB equation we obtain a new control

\hat{u} = -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x}.

Let \hat{V}^* be the solution to the equation GHJB(\hat{V}^*, \hat{u}) = 0; then \hat{V}^* \le V^*. But V^*
is the optimal cost, so \hat{V}^* = V^*. Since the optimal control is unique, \hat{u} must be the
optimal control.
[Figure 3.1: Phase flow plotted against lines of constant cost. Panels (a) and (b) show two-dimensional phase portraits in the (x1, x2) plane, with trajectories overlaid on constant-cost contours.]

Plugging \hat{u} into the GHJB equation gives the Hamilton-Jacobi-Bellman equation

HJB(V^*) = GHJB\!\left( V^*, -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x} \right) = 0,

i.e.,

HJB(V^*) = \frac{\partial V^*}{\partial t} + \frac{\partial V^{*T}}{\partial x} f - \frac{1}{4} \frac{\partial V^{*T}}{\partial x} g R^{-1} g^T \frac{\partial V^*}{\partial x} + l = 0.    (3.9)
The derivation shows the complexity inherent in the HJB equation. Referring to
figure 3.1, instead of fixing the system or the cost we allow both to depend on each
other; hence the nonlinearity in the equation. The derivation also shows that the
GHJB equation is a generalization of the HJB equation. Interestingly, the GHJB
equation defines a contraction on the set of admissible controls. This fact will be
proved in chapter 5, but for now we will give the successive approximation algorithm
implied by this fact.
Algorithm 3.3.2 Successive Approximation.
Initial Step Given an initial control law, u^{(0)}(t, x), that is admissible on D, the
performance of u^{(0)} on D is given by the unique solution V^{(0)}(t, x) to

GHJB(V^{(0)}, u^{(0)})(t, x) = 0.

Set i = 1.
[Figure 3.2: Successive Approximation Algorithm. The block diagram shows the loop: starting from u^{(0)} with i = 0, solve \nabla^T V^{(i)} (f(x) + g(x) u^{(i)}) + l(x) + \|u^{(i)}\|_R^2 = 0, form the improved control law u^{(i+1)}(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla V^{(i)}(x), set i = i + 1, and repeat.]
Iterative Step A control law that is admissible on D and improves the performance
of u^{(i-1)} is given by

u^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \frac{\partial V^{(i-1)}}{\partial x}(t, x).    (3.10)

The performance of u^{(i)} on D is given by the unique solution V^{(i)}(t, x) to

GHJB(V^{(i)}, u^{(i)})(t, x) = 0.    (3.11)

Set i = i + 1.
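The following sketch runs Algorithm 3.3.2 by hand on an assumed scalar linear-quadratic example (not taken from the thesis). Because the value function is quadratic, each GHJB solve reduces to a scalar formula, and the iteration is the classical Newton/Kleinman iteration that converges to the Riccati solution.

```python
# Sketch of Algorithm 3.3.2 for a scalar infinite-time LQ example (assumed):
#   x_dot = a x + b u, l = q x**2, R = r, initial admissible control u0 = -k x.
import numpy as np

a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = 2.0                                        # initial admissible gain (b k > a)

for i in range(8):
    # Solve GHJB(V, u) = 0 with V = c x**2 and u = -k x (scalar formula):
    c = (q + r * k**2) / (2.0 * (b * k - a))
    # Improved control u = -(1/2) R^{-1} g^T dV/dx = -(b c / r) x
    k = b * c / r
    print(f"iteration {i}: c = {c:.6f}, k = {k:.6f}")

c_star = r / b**2 * (a + np.sqrt(a**2 + b**2 * q / r))   # scalar Riccati solution
print("optimal: c* =", c_star, ", k* =", b * c_star / r)
```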
The algorithm is shown pictorially in figure 3.2. It has been shown in numerous
papers that the algorithm converges uniformly to the optimal control and optimal
cost (cf. (Saridis and Lee, 1979; Saridis and Wang, 1994; Vaisbord, 1963; Mil'shtein,
1964; Leake and Liu, 1967)). Therefore, by iterating the process of improving an
admissible control we converge uniformly to the solution of the HJB equation. However,
previous work failed to notice that we can make definite statements about
the stability region of each successive control law u^{(i)}. In particular, u^{(i)} will be
stabilizing on the same region as u^{(0)}. In fact we can show that the stability region
of u^* is the largest possible stabilizing set in {\rm I\!R}^n, i.e., it is not possible to find an
admissible control that can stabilize an initial condition that is unstable under u^*.
In (Glad, 1985; Glad, 1987; Tsitsiklis and Athans, 1984) it was shown that the optimal
control u^* is robust in the sense that it has infinite gain margin and 50% gain
reduction margin. In (Saridis and Balaram, 1986), it was shown that u^{(i)} has similar
robustness properties. Extended versions of these results appear in chapter 5 and
Appendix A.
In this section, we have derived the GHJB equation and illustrated its impor-
tance. The GHJB equation answers three fundamental questions. First, its solution
gives a compact representation of the performance of any admissible control. Second,
its solution allows us to improve the performance of the original control. Finally,
by iterating the process we converge uniformly to the solution of the HJB equation.
The GHJB equation was also shown to be a generalization of the HJB equation.

3.4 Summary
For convenience, we conclude this chapter by succinctly summarizing the problem
and our main assumptions. Given the system

\dot{x} = f(t, x) + g(t, x) u(t, x),

with cost functional

J(t_0, x_0) = s(\varphi(T; t_0, x_0, u)) + \int_{t_0}^{T} l(t, \varphi(t; t_0, x_0, u)) + \|u(t, \varphi(t; t_0, x_0, u))\|_R^2 \, dt,

we make the following assumptions.
A3.1 f and g are Lipschitz continuous on Ω, and f(t, 0) = 0, \forall t \in [t_0, T].
A3.2 l and s are continuous, positive definite, monotonically increasing functions
on D and Ω respectively. R is a symmetric positive definite matrix.
A3.3 System (3.3) is controllable on D, in that there exists an admissible control,
u^{(0)} ∈ A(D), on D.
A3.4 In the infinite-time horizon case, the system equations f, g and l, as well as
the initial control, u^{(0)}, are independent of time.
Given an initial stabilizing control u^{(0)} ∈ A, the performance of this control is
given by the solution to the GHJB equation

\frac{\partial V^{(0)}}{\partial t} + \frac{\partial V^{(0)T}}{\partial x}\left[ f(t, x) + g(t, x) u^{(0)}(t, x) \right] + l(t, x) + \|u^{(0)}(t, x)\|_R^2 = 0,
\qquad V^{(0)}(T, x) = s(x).
The solution V^{(0)} to the GHJB equation can be used to construct a feedback control
law that improves the closed-loop performance of the system. The improved control
is given by the formula

u^{(1)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \frac{\partial V^{(0)}}{\partial x}(t, x).

When the process is iterated, we obtain a successive approximation algorithm that
converges uniformly to the optimal control, which is given on D by the formula

u^*(t, x) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}(t, x),

where V^*(t, x) satisfies the Hamilton-Jacobi-Bellman (HJB) equation:

\frac{\partial V^*}{\partial t} + \frac{\partial V^{*T}}{\partial x} f(x) + l(x) - \frac{1}{4} \frac{\partial V^{*T}}{\partial x} g(x) R^{-1} g^T(x) \frac{\partial V^*}{\partial x} = 0, \qquad V^*(T, x) = s(x).

While the GHJB equation is theoretically easier to solve than the HJB equation,
there is still no general closed form solution to this equation. The main result of
this thesis is a practical computer algorithm that computes arbitrarily close approximations
to the GHJB equation, which consequently means that we can compute
arbitrarily close approximations to the HJB equation. Chapter 4 presents the algorithm.
The mathematical justifications and convergence proofs are contained in
Chapter 5.
CHAPTER 4
A NEW ALGORITHM TO IMPROVE CLOSED-LOOP
PERFORMANCE
In the previous chapter we showed that by iteratively solving the Generalized-
Hamilton-Jacobi-Bellman (GHJB) equation we could improve the closed-loop performance
of control laws that are known to be admissible. Furthermore, we can
get arbitrarily close to the optimal control by iterating enough times. The objective
of this chapter is to show how we can approximate the solution of the GHJB
equation such that the controls which result from the solution are in feedback form.
To solve the GHJB equation we will use Galerkin's spectral approximation method.
Section 4.1 contains an explanation of Galerkin's method, which is then applied to
the GHJB equation in section 4.2. The approximate solution to the GHJB equation
will be plugged into the successive approximation algorithm in section 4.3 to produce
a new algorithm that improves the closed-loop performance of an admissible
control and converges uniformly to the solution of the HJB equation. We will then
discuss implementation issues associated with the algorithm in section 4.4. Detailed
analysis of the algorithm will be given in chapter 5.
4.1 The Basic Idea of Galerkin's Method
In this section we will present the basic idea behind the Galerkin spectral
approximation method for solving partial differential equations. The basic idea is
that of linear projection. Suppose that we are given an arbitrary vector v ∈ {\rm I\!R}^3,
and that we want to write this vector as a linear combination of three other, linearly
independent vectors in {\rm I\!R}^3, i.e., we want to find c_i, i = 1, 2, 3, such that

v = c_1 \hat{v}_1 + c_2 \hat{v}_2 + c_3 \hat{v}_3.

To find the coefficients c_i, we project the vector v onto the vectors \hat{v}_i to obtain the
following linear equation

\begin{pmatrix} \langle v, \hat{v}_1 \rangle \\ \langle v, \hat{v}_2 \rangle \\ \langle v, \hat{v}_3 \rangle \end{pmatrix}
= \begin{pmatrix} \langle \hat{v}_1, \hat{v}_1 \rangle & \langle \hat{v}_2, \hat{v}_1 \rangle & \langle \hat{v}_3, \hat{v}_1 \rangle \\ \langle \hat{v}_1, \hat{v}_2 \rangle & \langle \hat{v}_2, \hat{v}_2 \rangle & \langle \hat{v}_3, \hat{v}_2 \rangle \\ \langle \hat{v}_1, \hat{v}_3 \rangle & \langle \hat{v}_2, \hat{v}_3 \rangle & \langle \hat{v}_3, \hat{v}_3 \rangle \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix},

where the projection operator is the inner product in {\rm I\!R}^3. The matrix multiplying
the coefficients is called the Grammian matrix of the vectors \hat{v}_i and is invertible if the
\hat{v}_i are linearly independent. The coefficients are now found by inverting the Grammian
matrix.
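A few lines of code make this finite-dimensional projection explicit. The sketch below is illustrative only; the non-orthogonal basis is chosen arbitrarily.

```python
# Illustrative sketch: expand v in a non-orthogonal basis {v1, v2, v3} of R^3
# by solving the Grammian system described above.
import numpy as np

v = np.array([1.0, 2.0, 3.0])
basis = [np.array([1.0, 0.0, 0.0]),
         np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 1.0, 1.0])]          # linearly independent, not orthogonal

G = np.array([[vi @ vj for vj in basis] for vi in basis])   # Grammian matrix
rhs = np.array([v @ vi for vi in basis])                    # projections <v, vi>
c = np.linalg.solve(G, rhs)

recon = sum(ci * vi for ci, vi in zip(c, basis))
print("coefficients:", c)
print("reconstruction error:", np.linalg.norm(recon - v))
```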
Galerkin's method is an extension of this idea to function space. Given the
differential equation L V + h = 0, where L is a linear operator and h is in the range
of L, we assume that the solution V belongs to some space which is spanned by a
set of linearly independent functions {φ_j}_{1}^{∞}, i.e.,

V(x) = \sum_{j=1}^{\infty} c_j \phi_j(x).

Since there are an infinite number of c_j we must truncate the series to form the
approximation to V

V_N(x) \triangleq \sum_{j=1}^{N} c_j \phi_j(x).

When we plug V_N into the differential equation we obtain an error

\text{error} = L V_N + h.

The object is to choose the coefficients such that the error is as close to zero as
possible. In Galerkin's method we project the error onto the same functions used
to approximate V, i.e., we form the following N equations in N unknowns:

\langle \text{error}, \phi_j \rangle = 0, \qquad j = 1, \ldots, N,

where the projection operator is the inner product in an appropriate Hilbert space.
4.2 Galerkin Projections of the GHJB equation
In this section we use Galerkin's method to derive an approximate solution to
the GHJB equation on the compact set D ≜ [t_0, t_f] × Ω, where Ω is a compact subset
of {\rm I\!R}^n. To do so we assume that
A4.1 we can select a set, {φ_j(x)}_{j=1}^{∞} where φ_j(0) = 0, of functions (not necessarily
linearly independent) on Ω, such that for each t ∈ [t_0, t_f], V^{(i)} and \dot{V}^{(i)}
are linear combinations of these functions for all x ∈ Ω, where V^{(i)} solves
GHJB(V^{(i)}, u^{(i)}) = 0.
We seek an approximate solution, V_N^{(i)}, to the equation GHJB(V^{(i)}, u^{(i)}) = 0
by letting

V_N^{(i)}(t, x) = \sum_{j=1}^{N} c_j^{(i)}(t) \phi_j(x),    (4.1)

where the coefficients c_j^{(i)} are constant in the infinite-time horizon case. Substituting
this expression into the GHJB equation results in an error

\text{error} = GHJB\!\left( \sum_{j=1}^{N} c_j^{(i)} \phi_j, \ u^{(i)} \right),    (4.2)

where we have dropped the dependence on t and x to simplify the notation. The
coefficients c_j^{(i)} are determined by setting the projection of the error, (4.2), on the
finite basis {φ_j}_{1}^{N} to zero for all x ∈ Ω:

\left\langle GHJB\!\left( \sum_{j=1}^{N} c_j^{(i)} \phi_j, \ u^{(i)} \right), \ \phi_n \right\rangle_\Omega = 0, \qquad n = 1, \ldots, N,    (4.3)

where the inner product is defined as

\langle f, g \rangle_\Omega = \int_\Omega f(x) g(x) \, dx.    (4.4)

Using equation (3.6), this expression reduces to the following N equations in N
unknowns:

\sum_{j=1}^{N} \dot{c}_j^{(i)} \langle \phi_j, \phi_n \rangle_\Omega + \sum_{j=1}^{N} c_j^{(i)} \left\langle \frac{\partial \phi_j}{\partial x}\left[ f + g u^{(i)} \right], \phi_n \right\rangle_\Omega + \left\langle l + \|u^{(i)}\|_R^2, \phi_n \right\rangle_\Omega = 0,
\sum_{j=1}^{N} c_j^{(i)}(t_f) \langle \phi_j, \phi_n \rangle_\Omega = \langle s, \phi_n \rangle_\Omega, \qquad n = 1, \ldots, N.

In matrix form, the equation is

\dot{c}^{(i)}(t) + M^{-1} A(t) c^{(i)}(t) + M^{-1} b(t) = 0,    (4.5)

with boundary condition

c^{(i)}(t_f) = M^{-1} P,

where

M_{nj} = \langle \phi_j, \phi_n \rangle_\Omega,    (4.6)
A_{nj} = \left\langle \frac{\partial \phi_j}{\partial x}\left[ f + g u^{(i)} \right], \phi_n \right\rangle_\Omega,    (4.7)
b_n = \left\langle l + \|u^{(i)}\|_R^2, \phi_n \right\rangle_\Omega,    (4.8)
P_n = \langle s, \phi_n \rangle_\Omega.    (4.9)

For the infinite-time horizon, the equation reduces to

A c^{(i)} + b = 0.
We will now plug this approximation into algorithm 3.3.2 to obtain a new algo-
rithm for improving the closed loop performance of admissible controls for nonlinear
systems.
4.3 The Combined Algorithm
To simplify the notation throughout the remainder of the thesis, we define

\Phi_N(x) \triangleq (\phi_1(x), \ldots, \phi_N(x))^T,    (4.10)

and let \nabla\Phi_N be the Jacobian of \Phi_N. If Θ : {\rm I\!R}^N \to {\rm I\!R} is a real valued function then
we define the notation

\langle \Theta, \Phi_N \rangle_\Omega \triangleq (\langle \Theta, \phi_1 \rangle_\Omega, \ldots, \langle \Theta, \phi_N \rangle_\Omega)^T.

If Θ : {\rm I\!R}^N \to {\rm I\!R}^N is a vector valued function then we define the notation

\langle \Theta, \Phi_N \rangle_\Omega \triangleq \begin{pmatrix} \langle \Theta_1, \phi_1 \rangle_\Omega & \cdots & \langle \Theta_N, \phi_1 \rangle_\Omega \\ \vdots & & \vdots \\ \langle \Theta_1, \phi_N \rangle_\Omega & \cdots & \langle \Theta_N, \phi_N \rangle_\Omega \end{pmatrix}.

The key to the notation is that the j-th row corresponds to integration weighted by
φ_j.
Using this notation we can write the Galerkin projection of the GHJB equation
in the compact form

\langle GHJB(V, u), \Phi_N \rangle_\Omega = 0, \qquad \langle V(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

We will also use bold face letters to denote the coefficients in the Galerkin approximation
method, i.e.,

\mathbf{c}_N^{(i)} \triangleq \left( c_1^{(i)}, \ldots, c_N^{(i)} \right)^T.    (4.11)

We now trace the steps of algorithm 3.3.2, substituting the approximation of
the previous section for the GHJB equation. Given an initial control u^{(0)}, we can
compute an approximation to its cost V_N^{(0)} = \mathbf{c}_N^{(0)T} \Phi_N, where \mathbf{c}_N^{(0)} is the solution to

\dot{\mathbf{c}}_N^{(0)}(t) + M^{-1} A(t) \mathbf{c}_N^{(0)}(t) + M^{-1} b(t) = 0, \qquad \mathbf{c}_N^{(0)}(t_f) = M^{-1} P,

and

A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N\, g u^{(0)}, \Phi_N \rangle_\Omega,
b = \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega,
P = \langle s, \Phi_N \rangle_\Omega.

Setting i = 1, we compute an updated control based on the approximation V_N^{(i-1)}
rather than the actual cost V^{(i-1)}:

u_N^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \frac{\partial V_N^{(i-1)}}{\partial x}(t, x)
 = -\frac{1}{2} R^{-1} g^T(t, x) \nabla\Phi_N^T \mathbf{c}_N^{(i-1)}.    (4.12)
When (4.12) is substituted into (4.7) and (4.8) we obtain the approximation

V_N^{(i)} = \mathbf{c}_N^{(i)T} \Phi_N,

where \mathbf{c}_N^{(i)} is the solution to

\dot{\mathbf{c}}_N^{(i)}(t) + M^{-1} A(t) \mathbf{c}_N^{(i)}(t) + M^{-1} b(t) = 0, \qquad \mathbf{c}_N^{(i)}(t_f) = M^{-1} P,

and

A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} \sum_{k=1}^{N} c_k^{(i-1)} \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega,
b = \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} \left( \sum_{k=1}^{N} c_k^{(i-1)} \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \right) \mathbf{c}_N^{(i-1)},
P = \langle s, \Phi_N \rangle_\Omega.

For a finite-time horizon, we obtain the following algorithm.
Algorithm 4.3.1 Dual Approximation for Finite-Time Horizon.
Compute the Integrals

\langle l, \Phi_N \rangle_\Omega, \quad \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega, \quad \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega,
\langle \nabla\Phi_N\, g u^{(0)}, \Phi_N \rangle_\Omega, \quad \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega, \quad \langle \Phi_N, \Phi_N \rangle_\Omega, \quad \langle s, \Phi_N \rangle_\Omega.

Initial Step Let

A(t) = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N\, g u^{(0)}, \Phi_N \rangle_\Omega \right),
b(t) = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega \right),
P = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \langle s, \Phi_N \rangle_\Omega.

Find \mathbf{c}_N^{(0)}(t) satisfying the following linear differential equation:

\dot{\mathbf{c}}_N^{(0)}(t) + A(t) \mathbf{c}_N^{(0)}(t) + b(t) = 0, \qquad \mathbf{c}_N^{(0)}(t_f) = P.

Let i = 1.
Iterative Step An improved controller is given by

u_N^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \nabla\Phi_N^T(x) \, \mathbf{c}_N^{(i-1)}(t).
Let

M(t) = \sum_{j=1}^{N} c_j^{(i-1)}(t) \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_j}{\partial x}, \Phi_N \right\rangle_\Omega,
A(t) = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} M(t) \right),
b(t) = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} M(t) \, \mathbf{c}_N^{(i-1)}(t) \right),
P = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \langle s, \Phi_N \rangle_\Omega.

Find \mathbf{c}_N^{(i)}(t) satisfying the following linear differential equation:

\dot{\mathbf{c}}_N^{(i)}(t) + A(t) \mathbf{c}_N^{(i)}(t) + b(t) = 0, \qquad \mathbf{c}_N^{(i)}(t_f) = P.

Let i = i + 1.
For infinite-time horizon problems, the time dependence disappears and these
equations become particularly easy to compute. The following algorithm summarizes
the infinite-time horizon case.
Algorithm 4.3.2 Dual Approximation for Infinite-Time Horizon.
Compute the Integrals

\langle l, \Phi_N \rangle_\Omega, \quad \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega, \quad \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega,
\langle \nabla\Phi_N\, g u^{(0)}, \Phi_N \rangle_\Omega, \quad \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega.

Initial Step Let

A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N\, g u^{(0)}, \Phi_N \rangle_\Omega,
b = \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega.

Find \mathbf{c}_N^{(0)} satisfying the following linear equation:

A \mathbf{c}_N^{(0)} + b = 0.

Let i = 1.
Iterative Step An improved controller is given by

u_N^{(i)}(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla\Phi_N^T(x) \, \mathbf{c}_N^{(i-1)}.

Let

A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} \sum_{j=1}^{N} c_j^{(i-1)} \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_j}{\partial x}, \Phi_N \right\rangle_\Omega,
b = \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} \left( \sum_{k=1}^{N} c_k^{(i-1)} \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \right) \mathbf{c}_N^{(i-1)}.

Find \mathbf{c}_N^{(i)} satisfying the following linear equation:

A \mathbf{c}_N^{(i)} + b = 0.

Let i = i + 1.
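As a rough illustration of how Algorithm 4.3.2 might be implemented, the sketch below applies it to a hypothetical scalar system with a polynomial basis and simple numerical quadrature; the system, basis, set Ω, and quadrature rule are all assumptions made for this example and are not taken from the thesis.

```python
# Sketch of Algorithm 4.3.2 (infinite-time) for an assumed scalar example:
#   x_dot = f(x) + g(x) u, f(x) = x**3, g = 1, l(x) = x**2, R = 1,
#   Omega = [-1, 1], initial admissible control u0(x) = -2 x,
#   basis phi_j(x) = x**(2 j), j = 1..N (numerical quadrature on a grid).
import numpy as np

N = 3
xs = np.linspace(-1.0, 1.0, 4001)
dx = xs[1] - xs[0]
f, g, l, r = xs**3, 1.0, xs**2, 1.0
phi  = np.array([xs**(2 * j)           for j in range(1, N + 1)])   # phi_j
dphi = np.array([2 * j * xs**(2*j - 1) for j in range(1, N + 1)])   # dphi_j/dx

inner = lambda a, b: np.sum(a * b) * dx          # <a, b> over Omega (quadrature)

u = -2.0 * xs                                    # u^(0)
for i in range(6):
    # Galerkin system A c + b = 0 with entries (4.7)-(4.8)
    A = np.array([[inner(dphi[j] * (f + g * u), phi[n]) for j in range(N)]
                  for n in range(N)])
    b = np.array([inner(l + r * u**2, phi[n]) for n in range(N)])
    c = np.linalg.solve(A, -b)
    # improved control u^(i+1) = -(1/2) R^{-1} g^T dV_N/dx
    u = -0.5 / r * g * (c @ dphi)
    print(f"iteration {i}: c = {np.round(c, 4)}")
```

Each pass solves the linear Galerkin system and then rebuilds the control from the coefficients, mirroring the Initial and Iterative Steps above; in practice the thesis computes the inner products once (symbolically where possible) rather than re-integrating on a grid at every pass.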

Remark 4.3.3 It is, of course, possible to apply Galerkin's method to the HJB
equation directly. Assuming that the optimal cost function exists in a function
space spanned by {φ_j}_{1}^{∞}, we substitute the approximation V_N^* ≜ \mathbf{c}_N^T \Phi_N into
the HJB equation and set the projection of the error to zero:

\langle HJB(V^*), \Phi_N \rangle_\Omega = 0, \qquad \langle V^*(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

After some algebra we obtain the nonlinear ordinary differential equation

\dot{\mathbf{c}}_N + \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega \mathbf{c}_N - \frac{1}{4} \sum_{k=1}^{N} c_k \left\langle \nabla\Phi_N\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \mathbf{c}_N + \langle l, \Phi_N \rangle_\Omega \right) = 0,
\mathbf{c}_N(t_f) = \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \langle s, \Phi_N \rangle_\Omega.    (4.13)

The difficulty is that equation (4.13) is a nonlinear equation, as opposed to (4.5),
which is a linear equation. For the infinite-time problem, equation (4.13) becomes a
nonlinear algebraic expression with multiple solutions, one of which corresponds to
a stabilizing control law. We are now faced with the question of how to solve this
equation and how to guarantee that the solution produces a stabilizing control. The
answer is given by algorithm 4.3.2, which will be shown in chapter 5 to converge
to the solution of (4.13) and which, for N sufficiently large, produces a stabilizing
control law.
In the finite-time case, the situation is a little different. Using a Runge-Kutta
scheme it is possible to solve equation (4.13) directly (cf. (Burden and Faires, 1988)),
assuming that the equation does not escape in finite time.
[Figure 4.1: Closed Loop System. The block diagram shows the plant \dot{x} = f(x) + g(x)u with the feedback control u = -\frac{1}{2} \sum_{j=1}^{N} c_j R^{-1} g^T(x) \frac{\partial \phi_j}{\partial x}(x), implemented as a sum of N fixed basis-gradient channels weighted by the coefficients c_j.]

Iteratively solving equation (4.5) has advantages and disadvantages. The advantage is that it is linear
and therefore guaranteed not to escape in finite time. The disadvantage is that the
equation may be severely ill-conditioned, depending on the initial control u^{(0)}. We
have found that solving equation (4.13) is numerically easier than iteratively solving
(4.5). Therefore in all of the finite-time examples in chapter 6, the results are produced
by solving (4.13) directly.
4.4 Implementation Issues
In this section we discuss the implementation of algorithm 4.3.1 and algorithm 4.3.2.
First note that there is no dependence on an initial point or on a
nominal trajectory, i.e., the computed control is a true feedback control. We also
note that all of the computations are performed off-line. The iteration in algorithm 4.3.1
is stopped at some i and the coefficients \mathbf{c}_N^{(i)} are used to form the closed-loop
system shown in figure 4.1.
The bulk of the computations in algorithm 4.3.1 and algorithm 4.3.2 consists of
computing the integrals listed at the top of algorithm 4.3.1. These quantities involve a
total of

2N + 2N^2 + \frac{1}{2} N^2 (N + 1) \sim O(N^3)

n-dimensional integrations over the set Ω. If these integrations are performed
numerically, their computation becomes prohibitive when Ω and the dimension of
the state space n are large. We have circumvented this problem by using a symbolic
computational engine, which results in significant savings when it can be used. To
compute the integrals symbolically it is necessary that closed form solutions
of all of the integrals exist. For example, this is true when the system equations
and the basis functions are polynomials. We have found that when the dimension
of the state space is greater than three, the necessary computations, executed on a
Sun/Sparc 10 running Matlab 4.2, become prohibitive unless symbolic software is
used.
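As a sketch of the symbolic approach described above (the specific polynomial system and basis functions are assumptions chosen for illustration, and sympy stands in for the symbolic engine), a computer algebra system can evaluate a typical Galerkin inner product in closed form:

```python
# Illustrative sketch: one Galerkin inner product computed exactly for
# polynomial dynamics and polynomial basis functions on Omega = [-1, 1].
import sympy as sp

x = sp.symbols('x')
f = x**3                       # hypothetical polynomial dynamics
u0 = -2 * x                    # hypothetical initial control
phi_j, phi_n = x**2, x**4      # two polynomial basis functions

# entry A_nj = < dphi_j/dx * (f + u0), phi_n >  over Omega = [-1, 1]
A_nj = sp.integrate(sp.diff(phi_j, x) * (f + u0) * phi_n, (x, -1, 1))
print(A_nj)                    # exact rational number, no numerical quadrature
```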
The most important practical issue regarding our algorithm is the choice of
the basis functions {φ_j}. It is shown in chapter 5 that it is not necessary to require
that the {φ_j} be either orthogonal or even linearly independent. In fact, if {φ_j}_{1}^{N} are
orthogonal (resp. linearly independent) and {φ_j}_{1}^{N+1} are not orthogonal (resp. linearly
dependent), then it is shown that V_N^{(i)} \equiv V_{N+1}^{(i)}. However, from a practical point
of view it is desirable that {φ_j} be at least linearly independent, since in that case
we will be able to invert the matrix

\langle \nabla\Phi_N^T (f + gu), \Phi_N \rangle_\Omega

directly. Since orthogonalization is computationally expensive, we use functions
which are simply linearly independent.
In the examples in chapter 6 we use polynomials as basis functions. We have
found that polynomials work very well in this algorithm. From equation (3.4) and
the fact that l and R are positive definite we can conclude that V^{(i)} will also be
a positive definite function. Therefore the completeness assumption A4.1 will be
satisfied if we choose basis functions that span the positive definite functions, i.e.,
the terms obtained from the expansion of the polynomial

\sum_{j=1}^{\infty} \left( \sum_{k=1}^{n} x_k \right)^{2j}.

Therefore if n = 2 we can take the set

\{\phi_j\}_{1}^{\infty} = \left\{ x_1^2, \ x_1 x_2, \ x_2^2, \ x_1^4, \ x_1^3 x_2, \ x_1^2 x_2^2, \ x_1 x_2^3, \ x_2^4, \ \ldots \right\}.

For many systems this set of basis functions works very well. Theoretically,
the algorithm converges for any complete basis as N \to \infty. For finite values of
N, however, the algorithm will be sensitive to the chosen basis. If the system has
modes that are not spanned by the functions {φ_j}_{1}^{N} then the control will not be
able to compensate for these modes. To keep N as small as possible we want to
choose those φ_j's that capture the significant dynamics of the system. The choice of
a basis, therefore, must receive careful consideration, and it is here that engineering
ingenuity and insight are extremely important.
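A short helper (illustrative only; the function name and interface are not from the thesis) shows one way to enumerate these even-degree monomial basis terms for a given state dimension:

```python
# Illustrative sketch: generate the even-degree monomial basis terms for
# n states, i.e. the monomials appearing in (x1 + ... + xn)**(2 j).
from itertools import combinations_with_replacement

def poly_basis(n_states, max_order):
    """Exponent tuples (a1, ..., an) with even total degree 2..max_order."""
    basis = []
    for deg in range(2, max_order + 1, 2):
        for combo in combinations_with_replacement(range(n_states), deg):
            basis.append(tuple(combo.count(k) for k in range(n_states)))
    return basis

# For n = 2, max_order = 4: x1^2, x1*x2, x2^2, x1^4, x1^3*x2, x1^2*x2^2, ...
print(poly_basis(2, 4))
```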
One of the major advantages of this method is that the control laws obtained
by algorithm 4.3.1 and algorithm 4.3.2 are tunable via the state penalty functions l
and s and the control weighting matrix R. It is important to note that there are only
2N integrals involving l and s, and that R can be pulled outside of the integrals
in which it is involved. Therefore, the computations that must be performed to tune
the controller are significantly fewer than the original computations. In fact, if l and
s are appropriately chosen (e.g. quadratic penalty functions), then the control can
be tuned without re-computing any integrals.
4.5 Summary of the Method
In this chapter we used the Galerkin spectral method to approximate the
solution to the GHJB equation. To apply Galerkin's method it is first necessary
to place the solution to the differential equation in a Hilbert space. To do so we
restrict attention to a compact subset Ω of the stability region of a known stabilizing
control. When the solutions to the GHJB equation are restricted to this set they
exist in the Hilbert space L^2(D).
When Galerkin's method is used to approximate the GHJB equation, and
the result is plugged into the successive approximation algorithm, we obtain algorithm 4.3.1
and algorithm 4.3.2, which are shown pictorially in figure 4.2. In the next
chapter we will derive conditions under which these algorithms converge uniformly
to the solution of the HJB equation. While there are many methods of approximating
solutions of the HJB equation, they are either open-loop, impractical, or fail to
guarantee that the resulting approximate controls will be stabilizing. The advantages
of our algorithm are that
- All of the computations are performed off-line, and
- The resulting controls are in feedback form.
Furthermore, in chapter 5 we show that
- The algorithm converges uniformly to the optimal control, and
- When the approximation is truncated for finite (but large enough) N and I, the approximate controls are guaranteed to be stabilizing on a pre-specified set Ω.

Compute:
2
l,Φ T
ΦN f , ΦN , ΦN
∆ (0)
f, g, l, R, u , { φj }1 , Ω
N
N u (0)
R
N
∆ T (0)
ΦN g u , ΦN { ∆
ΦN h φk , ΦN
T ∆
} 1

T
ΦN f , ΦN T (0)

ΦN g u , ΦN
A= ∆
+
2
b= - l,Φ - u
(0)
, ΦN
N R

i=0

(i) -1 (i) (i) ... (i) T


cN =A b cN = ( c c )
1 N

N
ΦN h φk , ΦN
(i) T
=Σ c
(i) ∆ ∆
M
k=1 k

T (i)
ΦN f , ΦN - 1 M
A=

2
(i) (i)
b=- l,Φ - 1 M cN
N 4

i=i+1

Figure 4.2: Algorithm for Improving Feedback Control Laws.


CHAPTER 5
CONVERGENCE AND STABILITY
In this chapter we discuss the mathematical issues of algorithms 4.3.1 and 4.3.2. In
particular we will develop conditions under which the algorithm converges to the
solution of the Hamilton-Jacobi-Bellman equation. We will also show that for N
sufficiently large the approximate controls stabilize the system and are robust in
the same sense as the optimal control. In section 5.1 we explain the problem and
give a brief outline of our arguments. The backbone of the proof will be developed
in a number of lemmas in section 5.2. The convergence and stability proofs are
contained in section 5.3 and the main results are summarized in section 5.4.

5.1 The Convergence Problem
The convergence problem is summarized in figure 5.1. The notation is as
follows:
V^* is the solution to the HJB equation, HJB(V^*) = 0;
V^{(i)} is the solution to the GHJB equation with control u^{(i)}, GHJB(V^{(i)}, u^{(i)}) = 0;
V_N^* is the solution to the projected HJB equation, \langle HJB(V_N^*), \Phi_N \rangle_\Omega = 0;
V_N^{(i)} is the solution to the projected GHJB equation with the approximate control
u_N^{(i)}, \langle GHJB(V_N^{(i)}, u_N^{(i)}), \Phi_N \rangle_\Omega = 0.
There are two iteration variables: N represents the order of approximation, or the
number of terms in the series being used to approximate V^* or V^{(i)}, and i represents
the iteration variable associated with the successive approximation algorithm. These
two variables give rise to the four convergence problems shown in figure 5.1. Algorithms
4.3.1 and 4.3.2 produce the sequence V_N^{(i)} \to V_N^*. Since this is the direction for which
we do not have a strong convergence result, we prove the convergence of the algorithms
indirectly by showing, in section 5.3, the uniform convergence of both V^{(i)} \to V^*
and V_N^{(i)} \to V^{(i)}. Therefore, for any ε > 0 we can choose N and i sufficiently large
to guarantee that

\left\| V^* - V_N^{(i)} \right\| \le \left\| V^* - V^{(i)} \right\| + \left\| V^{(i)} - V_N^{(i)} \right\| < \epsilon.

While it is nice to know that the value function converges to the optimal,
it is critical to ensure that the controls associated with these approximations are
admissible. For the infinite-time problem, we would also like to know something
about the robustness of the approximate control. We will show in section 5.3 that
for N sufficiently large u_N^{(i)} is both stable and robust in the same sense as the optimal
control. An important point to note is that for N sufficiently large, the stability
region of u_N^{(i)} contains Ω, a closed and bounded subset of the stability region of
u^{(0)}; therefore we have a well defined stability region, as compared to perturbation
methods.
[Figure 5.1: Convergence Diagram. The diagram relates the four quantities V^{(i)}, V^*, V_N^{(i)} and V_N^* and the limits between them.]
5.2 Preliminary Tools
In this section we develop the tools necessary to show that V^{(i)} \to V^* and
V_N^{(i)} \to V^{(i)}. The tools in this section will be used in an inductive argument in
section 5.3 to show that the results hold for all i. To that end, we develop tools that
are independent of i. To do so we adopt the following notation:

u ∈ A_{l,s}(D) is an arbitrary admissible control,
V_N = \sum_{j=1}^{N} c_j \phi_j satisfies \langle GHJB(V_N, u), \Phi_N \rangle_\Omega = 0,
\hat{V} = \sum_{j=1}^{\infty} \hat{c}_j \phi_j satisfies GHJB(\hat{V}, u) = 0,
u_N = -\frac{1}{2} R^{-1} g^T \frac{\partial V_N}{\partial x},
\hat{u} = -\frac{1}{2} R^{-1} g^T \frac{\partial \hat{V}}{\partial x},
\mathbf{c}_N = (c_1, \ldots, c_N)^T,
\hat{\mathbf{c}}_N = (\hat{c}_1, \ldots, \hat{c}_N)^T.

This section constitutes the heart of the thesis, and so we will provide a guide
to its contents. To differentiate between proofs that are original to the thesis and
those that are not, we have placed proofs that have appeared elsewhere in the literature
in Appendix A.
- Before we can begin to analyze convergence, we need to understand the nature of the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. In lemma 5.2.1 and lemma 5.2.2 we completely describe solutions to the GHJB equation when l and R are positive definite and u is admissible.
- In lemma 5.2.3 we show that if \hat{V} satisfies the equation GHJB(\hat{V}, u) = 0 then \hat{V} exactly equals the value function of the control u. Hence, the answer to question Q1 in section 3.1 and section 3.2 is given by the solution of the GHJB equation.
- We next show in lemma 5.2.4 that if \hat{u} is derived from the solution \hat{V} of the GHJB equation GHJB(\hat{V}, u) = 0, according to equation (3.8), then \hat{u} improves the performance of u, i.e., we answer question Q2 in section 3.1 and section 3.2. As a corollary to lemma 5.2.4 we show in corollary 5.2.5 that \hat{u} is robust in the same sense as the optimal control, i.e., it has infinite gain margin and 50% gain reduction margin.
- In lemma 5.2.6 and corollary 5.2.7 we show two technical results that will be needed at several points in subsequent proofs.
- The analysis of Galerkin's method is greatly simplified by assuming that the basis functions {φ_j}_{1}^{∞} are orthonormal. However, orthonormalizing a set of functions can require extensive computational effort, and so we do not want to require orthonormality. To circumvent the problem, we show in lemma 5.2.8 and lemma 5.2.9 that we can assume that the basis functions are linearly independent and orthonormal without affecting the convergence arguments.
- Throughout the analysis we will be able to show pointwise convergence of different series on a compact set Ω. We would like to assert uniform convergence on Ω. Therefore in lemma 5.2.12 we state necessary and sufficient conditions for pointwise convergence of continuous functions on a compact set to imply uniform convergence.
- When u is admissible, the equation \langle GHJB(V_N, u), \Phi_N \rangle_\Omega = 0 forces the error caused by approximating the actual solution \hat{V}, projected on the linear space spanned by {φ_j}_{1}^{N}, to be zero. Lemma 5.2.13 shows that the residual error tends to zero as N \to \infty. We then use this result to show in lemma 5.2.14 and lemma 5.2.15 that the coefficients for the actual (\hat{V}) and the approximate (V_N) solutions of the GHJB equation converge. This implies that V_N converges to \hat{V} in the L^2(Ω) norm, which is shown in corollary 5.2.16.
- It is not sufficient to know that the value functions converge; we also need to know that the approximate control u_N is admissible. We first show in lemma 5.2.17 that u_N converges uniformly to \hat{u}, where \hat{u} is derived from \hat{V} according to equation (3.8). Since lemma 5.2.4 implies that \hat{u} is admissible, we are able to show in lemma 5.2.18 and lemma 5.2.19 that u_N is also admissible.
We first show that if u is an admissible control then the equation
GHJB(\hat{V}, u) = 0 always has a solution.
Lemma 5.2.1 GHJB: Existence of Solution.
If u ∈ A_{l,s}(Ω), then:
- There exists on D a continuously differentiable solution \hat{V}(t, x) to the equation GHJB(\hat{V}, u) = 0 with boundary conditions \hat{V}(t_f, x) = s(x) and \hat{V}(t, 0) = 0, \forall t \in [t_0, t_f],
- \hat{V}(t, x) is a Lyapunov function for the system (f, g, u) on D,
- In the infinite-time case, \hat{V} is independent of time, i.e. \hat{V}(t, x) \equiv \hat{V}(x).
Proof: See section A.1.
The proof has a nice geometrical interpretation. Since u is admissible, the
phase portrait associated with the system \dot{x} = f + gu is well-defined and continuously
differentiable. To each point in the phase plane we assign a value, \hat{V}(t, x), by
integrating the incremental cost l + \|u\|_R^2 along the unique characteristic passing
through (t, x). As we move along the characteristic, the cost-to-go is positive and
decreasing, which is the interpretation of a Lyapunov function.
We will show that the solution provided by lemma 5.2.1 is unique.
Lemma 5.2.2 GHJB: Uniqueness of Solution.
If there exists a continuously differentiable function \hat{V} and an admissible u ∈
A_{l,s}(D) such that GHJB(\hat{V}, u) = 0 with boundary conditions \hat{V}(t_f, x) = s(x) and
\hat{V}(t, 0) = 0, \forall t \in [t_0, t_f], then \hat{V} is unique.
Proof: See section A.2.
Lemma 5.2.1 and lemma 5.2.2 are the nonlinear analog of the well known fact
that there exists a unique, symmetric, positive definite solution, P, to the Lyapunov
equation

A^T P + P A + Q = 0

when A is stable and Q = Q^T > 0.
We will show that the GHJB equation is an infinitesimal version of the performance
index (3.4).
Lemma 5.2.3 Performance Index.
If u ∈ A_{l,s}(D), then \hat{V} satisfies the equation GHJB(\hat{V}, u) = 0 with boundary
conditions \hat{V}(t_f, x) = s(x) and \hat{V}(t, 0) = 0, \forall t \in [t_0, t_f], if and only if

\hat{V}(t, x) = J(t, x),

for all (t, x) ∈ D.
Proof: See section A.3.
We will show that if we choose an updated control according to algorithm 4.3.1,
then the new control is admissible and it improves the performance of the system.
The lemma is a strengthened version of (Saridis and Balaram, 1986, Theorem 3.2).
Lemma 5.2.4 Improved Control Law.
If u ∈ A_{l,s}(D), and \hat{V} satisfies the equation GHJB(\hat{V}, u) = 0 with boundary
conditions \hat{V}(t_f, x) = s(x) and \hat{V}(t, 0) = 0, \forall t \in [t_0, t_f], then

\hat{u}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \frac{\partial \hat{V}}{\partial x}(t, x)    (5.1)

is an admissible control law for the system (f, g) on D. If \hat{V}' is the unique positive
definite function satisfying GHJB(\hat{V}', \hat{u}) = 0 with boundary conditions \hat{V}'(t_f, x) = s(x)
and \hat{V}'(t, 0) = 0, \forall t \in [t_0, t_f], then \hat{V}'(t, x) \le \hat{V}(t, x) for all (t, x) ∈ D; in particular
\hat{V}'(t_0, x_0; \hat{u}) \le \hat{V}(t_0, x_0; u). Furthermore, the improvement in the performance is
given by

\int_t^{t_f} \|u(\tau, \varphi(\tau; t, x, \hat{u})) - \hat{u}(\tau, \varphi(\tau; t, x, \hat{u}))\|_R^2 \, d\tau.

Proof: See section A.4.
It has been shown in (Glad, 1985; Glad, 1987; Tsitsiklis and Athans, 1984)
that the optimal control u^* is robust in the sense that it has infinite gain margin
and 50% gain reduction margin. A similar result has been shown for the GHJB
equation in (Saridis and Balaram, 1986). Since this result will be important in the
next section we restate it here.
Corollary 5.2.5 Robustness of \hat{u}.
Consider the infinite-time problem with the system (f, g). Let \hat{u} ∈ A_l(Ω) be
a control obtained from lemma 5.2.4 and let the gain perturbation D : {\rm I\!R}^m \to {\rm I\!R}^m
satisfy

z^T R \, D(z) \ge \frac{1 + \beta}{2} \|z\|_R^2, \qquad \beta > 0;

then \dot{x} = f + g D(\hat{u}) is asymptotically stable on Ω.
Proof: See section A.5.
The situation is depicted geometrically in figure 5.2 on page 60; the system
remains stable as long as the control remains in the sector bounded above by infinity
and below by \frac{1}{2} \hat{u}.
We show that if {φ_j}_{1}^{N} are linearly independent, then so are the functions
\{ \frac{\partial \phi_j}{\partial x}(f + gu) \}_{1}^{N}. This result will be used in several proofs in this chapter.
Lemma 5.2.6 Linear Independence of \frac{\partial \phi_j}{\partial x}(f + gu).
For the infinite-time problem:

\{\phi_j\}_{1}^{N} \ \text{linearly independent and} \ u \in A_l(\Omega) \implies \left\{ \frac{\partial \phi_j}{\partial x}(f + gu) \right\}_{1}^{N} \ \text{linearly independent}.

For the finite-time problem:

\{\phi_j\}_{1}^{N} \ \text{linearly independent}, \ u \in A_{l,s}(D) \ \text{and} \ (f + gu) \not\equiv 0 \implies \left\{ \frac{\partial \phi_j}{\partial x}(f + gu) \right\}_{1}^{N} \ \text{linearly independent}.

Proof: Infinite-time: If the vector field f + gu is asymptotically stable then
along the trajectories φ(t; x, u), x ∈ Ω, we have that

\phi(x) = -\int_0^\infty \frac{d}{d\tau} \phi(\varphi(\tau; x, u)) \, d\tau = -\int_0^\infty \frac{\partial \phi}{\partial x}(f + gu)(\varphi(\tau; x, u)) \, d\tau.

Now suppose that the lemma is not true. Then there exists a nonzero c ∈ {\rm I\!R}^N such
that

c^T \nabla\Phi_N (f + gu) \equiv 0
\implies \int_0^\infty c^T \nabla\Phi_N (f + gu) \, dt = 0, \quad \text{for all} \ x(0) \in \Omega
\implies c^T \Phi_N(x) \equiv 0,

which contradicts the linear independence of {φ_j}_{1}^{N}.
Finite-time: If the vector field f + gu has a bounded response then along the
trajectories φ(t; t_0, x, u), (t, x) ∈ D, we have that

\phi(\varphi(t; t_0, x, u)) - \phi(x) = \int_{t_0}^{t} \frac{d}{d\tau} \phi(\varphi(\tau; t_0, x, u)) \, d\tau = \int_{t_0}^{t} \frac{\partial \phi}{\partial x}(f + gu)(\varphi(\tau; t_0, x, u)) \, d\tau.

Now suppose that the lemma is not true. Then there exists a nonzero c ∈ {\rm I\!R}^N such
that

c^T \nabla\Phi_N (f + gu) \equiv 0
\implies \int_{t_0}^{t} c^T \nabla\Phi_N (f + gu) \, d\tau = 0, \qquad \forall (t, x) \in D
\implies c^T \Phi_N(\varphi(t; t_0, x, u)) \equiv c^T \Phi_N(x), \qquad \forall (t, x) \in D.

Since the dynamics are continuous and non-zero at a point, they are nonzero in a ball
B ⊆ D. Therefore, either c^T \Phi_N(x) = 0, \forall x \in B, or the \Phi_N(x) are constant on B, either
of which contradicts the assumption that the set {φ_j}_{1}^{N} is linearly independent.
Corollary 5.2.7 Linear Independence of \frac{\partial \phi_j}{\partial x}.
Suppose that \frac{\partial \phi_j}{\partial x}(x) \not\equiv 0; then

\{\phi_j\}_{1}^{N} \ \text{linearly independent} \implies \left\{ \frac{\partial \phi_j}{\partial x} \right\}_{1}^{N} \ \text{linearly independent}.

Proof: Suppose not; then there exists a nonzero c ∈ {\rm I\!R}^N such that

c^T \nabla\Phi_N \equiv 0 \implies c^T \nabla\Phi_N (f + gu) \equiv 0,

which contradicts lemma 5.2.6.
In the next lemma we show that we can assume that {φ_j}_{1}^{∞} are linearly independent
without affecting the convergence arguments.
Lemma 5.2.8 Justification: Linearly Independent Basis.
Suppose that the set {φ_j}_{1}^{N} is linearly independent but the set {φ_j}_{1}^{N+1} is
linearly dependent. Let V_N = \mathbf{c}_N^T \Phi_N and V_{N+1} = \mathbf{b}_{N+1}^T \Phi_{N+1} satisfy the equations

\langle GHJB(V_N, u), \Phi_N \rangle_\Omega = 0

and

\langle GHJB(V_{N+1}, u), \Phi_{N+1} \rangle_\Omega = 0,

respectively. If

\{\phi_j\}_{1}^{N} \ \text{linearly independent} \implies \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega \ \text{invertible},

then V_N \equiv V_{N+1}.
Proof: From the hypothesis we know that there exists a nonzero \alpha_N \in {\rm I\!R}^N such
that \phi_{N+1} = \alpha_N^T \Phi_N, so

V_{N+1} = \mathbf{b}_N^T \Phi_N + b_{N+1} \phi_{N+1} = \mathbf{b}_N^T \Phi_N + b_{N+1} \alpha_N^T \Phi_N = (\mathbf{b}_N + b_{N+1} \alpha_N)^T \Phi_N.

So V_N \equiv V_{N+1} \iff \mathbf{c}_N = \mathbf{b}_N + b_{N+1} \alpha_N, \forall t \in [t_0, t_f]. From the hypothesis we know
that \mathbf{c}_N satisfies

\langle \Phi_N^T, \Phi_N \rangle_\Omega \dot{\mathbf{c}}_N + \langle \nabla\Phi_N^T (f + gu), \Phi_N \rangle_\Omega \mathbf{c}_N + \langle l + \|u\|_R^2, \Phi_N \rangle_\Omega = 0,

with the boundary condition

\langle \Phi_N^T, \Phi_N \rangle_\Omega \mathbf{c}_N(t_f) = \langle s, \Phi_N \rangle_\Omega.

We also know that \mathbf{b}_{N+1} satisfies

\langle \Phi_{N+1}^T, \Phi_{N+1} \rangle_\Omega \dot{\mathbf{b}}_{N+1} + \langle \nabla\Phi_{N+1}^T (f + gu), \Phi_{N+1} \rangle_\Omega \mathbf{b}_{N+1} + \langle l + \|u\|_R^2, \Phi_{N+1} \rangle_\Omega = 0,

with the boundary condition

\langle \Phi_{N+1}^T, \Phi_{N+1} \rangle_\Omega \mathbf{b}_{N+1}(t_f) = \langle s, \Phi_{N+1} \rangle_\Omega.

Writing this (N+1)-dimensional system in block form and extracting the first N rows
implies that

\langle \Phi_N^T, \Phi_N \rangle_\Omega \dot{\mathbf{b}}_N + \langle \phi_{N+1}, \Phi_N \rangle_\Omega \dot{b}_{N+1}
+ \langle \nabla\Phi_N^T (f + gu), \Phi_N \rangle_\Omega \mathbf{b}_N
+ \left\langle \frac{\partial \phi_{N+1}}{\partial x}(f + gu), \Phi_N \right\rangle_\Omega b_{N+1}
+ \langle l + \|u\|_R^2, \Phi_N \rangle_\Omega = 0.

Similar manipulations with the boundary conditions show that

\langle \Phi_N^T, \Phi_N \rangle_\Omega \mathbf{b}_N(t_f) + \langle \phi_{N+1}, \Phi_N \rangle_\Omega b_{N+1}(t_f) = \langle s, \Phi_N \rangle_\Omega.

After some algebraic manipulation, using \phi_{N+1} = \alpha_N^T \Phi_N, we obtain

\left\langle \frac{\partial \phi_{N+1}}{\partial x}(f + gu), \Phi_N \right\rangle_\Omega = \langle \nabla\Phi_N^T (f + gu), \Phi_N \rangle_\Omega \alpha_N,
\langle \phi_{N+1}, \Phi_N \rangle_\Omega = \langle \Phi_N^T, \Phi_N \rangle_\Omega \alpha_N.

These equations imply that \mathbf{b}_N + \alpha_N b_{N+1} satisfies the same linear differential equation
and boundary condition as \mathbf{c}_N, therefore

\mathbf{b}_N + \alpha_N b_{N+1} = \mathbf{c}_N, \qquad \forall t \in [t_0, t_f],

which proves the result in the finite-time case. For the infinite-time problem, similar
reasoning shows that \mathbf{c}_N and \mathbf{b}_N + \alpha_N b_{N+1} both satisfy the linear equation

\langle \nabla\Phi_N^T (f + gu), \Phi_N \rangle_\Omega \, \theta = -\langle l + \|u\|_R^2, \Phi_N \rangle_\Omega

and are therefore equivalent by the hypothesis.

In the remainder of this chapter we assume that fj g1 1 are linearly indepen-
dent. The importance of this lemma is that a linearly dependent basis function
will not a ect the result obtained by algorithm 4.3.1 or algorithm 4.3.2, assum-
ing
D that (in the in nite-time
E case) we use an appropriate pseudo-inverse to invert
rN (f + gu); N
.
T
1
D Tset fjEg1 can be assumed
To simplify the convergence proof, we show that the
to be orthonormal without loss of generality, i.e. that N ; N
= IN , for all N .
Lemma 5.2.9 Justification: Orthonormal Basis.
Given a set of linearly independent functions {φ_j}_{1}^{N}. Suppose that these functions
are orthonormalized to form the set {ψ_j}_{1}^{N}, i.e.

\psi_1 = \beta_{11} \phi_1
\psi_2 = \beta_{21} \phi_1 + \beta_{22} \phi_2
\vdots
\psi_N = \beta_{N1} \phi_1 + \cdots + \beta_{NN} \phi_N.

Let V_N = \mathbf{c}_N^T \Psi_N solve

\langle GHJB(V_N, u), \Psi_N \rangle_\Omega = 0, \qquad \langle V_N(t_f), \Psi_N \rangle_\Omega = \langle s, \Psi_N \rangle_\Omega,    (5.2)

and let W_N = \mathbf{b}_N^T \Phi_N solve

\langle GHJB(W_N, u), \Phi_N \rangle_\Omega = 0, \qquad \langle W_N(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

If

\{\phi_j\}_{1}^{N} \ \text{linearly independent} \implies \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega \ \text{invertible},

then V_N \equiv W_N.
Proof: First note that we can write \Phi_N = B_N \Psi_N where the entries of B_N are determined
by the coefficients \beta_{ij}. B_N is lower triangular and invertible, therefore

W_N = \mathbf{b}_N^T \Phi_N = \mathbf{b}_N^T B_N \Psi_N.

So V_N \equiv W_N \iff \mathbf{c}_N = B_N^T \mathbf{b}_N, \forall t \in [t_0, t_f]. But

\langle GHJB(W_N, u), \Phi_N \rangle_\Omega = 0, \quad \langle W_N(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega
\iff B_N^{-1} \langle GHJB(W_N, u), \Phi_N \rangle_\Omega = 0, \quad B_N^{-1} \langle W_N(t_f), \Phi_N \rangle_\Omega = B_N^{-1} \langle s, \Phi_N \rangle_\Omega
\iff \langle GHJB(W_N, u), \Psi_N \rangle_\Omega = 0, \quad \langle W_N(t_f), \Psi_N \rangle_\Omega = \langle s, \Psi_N \rangle_\Omega.

So B_N^T \mathbf{b}_N satisfies the same linear differential equation and boundary conditions
as \mathbf{c}_N, which implies that B_N^T \mathbf{b}_N \equiv \mathbf{c}_N in the finite-time case. In the infinite-time
case {φ_j}_{1}^{N} are linearly independent, so the hypothesis implies that the solution
to this equation is unique; therefore from (5.2) we have that B_N^T \mathbf{b}_N = \mathbf{c}_N.
Throughout the rest of this chapter we will assume that {φ_j} have been orthonormalized,
since this does not affect the convergence result.
The orthonormality of fg1 1
1 implies that if a function (x) 2 span fj g1 then
X
1
(x) = h ; j i
j (x);
j =1
and that for any  > 0 we can choose N suciently large to guarantee that
1
X
h ; j i
j < :
j =N +1
We will state necessary and sucient conditions for pointwise convergence of
a series to imply uniform convergence on a compact set.
De nition 5.2.10 We say that a series P1j=1 cj j (x) is pointwise decreasing on
,
written
X
1
cj j (x) 2 PD (
) ;
j =1
if 8k = 1; 2; : : :, and 8 > 0, 9 > 0 and m > 0 such that 8x 2
,
n P> m ) X 1


1 cj j (x) <  =) cj j (x) < :

j =k+1 j =k+n+1

Remark 5.2.11 This condition implies that if the tail of a sequence at some point
x 2
is small, then after removing n > m terms, it is still small, where m is
a uniform number for all x 2
. In particular, this implies that if a series is
monotonically decreasing on
, i.e.,
1 1
X X
cj j (x) > cj j (x) ;
j =k j =k+1
52

then m = 1 and  =  and so P1


j =1 cj j (x) 2 PD (
).
We can state necessary and sucient conditions for pointwise convergence to
imply uniform convergence (cf. (Apostol, 1974, Exercise 9.8)).
Lemma 5.2.12 Pointwise and Uniform Convergence.
P1 c  (x), 8x 2
and  (x) are
If
 IRn is a compact
P set and W (x) = j =1 j j j
continuous on
, then 1
j =N +1 cj j (x) converges to zero uniformly on
i
(i) W (x) is continuous on
,
(ii) P1j=1 cj j (x) 2 PD (
).
Proof: See section A.6.
The next lemma shows that as N becomes large, the error that results by
approximating the GHJB equation converges to zero.
Lemma 5.2.13 jGHJB (VN ; u) (x)j ! 0.
Given u 2 Al;s(D). Let VN (t; x) = cTN (t)N (x) satisfy
hGHJB (VN ; u) ; N i
= 0; hVN (tf ); N i
= hs; N i
;
and let V^ (t; x) = P1
j =1 c^j (t)j (x) satisfy
 
GHJB V^ ; u (x) = 0 V^ (tf ; x) = s(x):
T
If
is compact and the functions @@xj (f + gu), kuk2R, l, s, are continuous on
and
are in the space span fj g1
1 , and if the coecients jcj (t)j are uniformly bounded for
all N , then
jGHJB (VN ; u)j ! 0; VN (tf ) ? V^ (tf ) ! 0;
pointwise on
. Furthermore, if
1D
X E
l + kuk2R ; j
j (x) 2 PD (
) ;
j =1
X1 * @VN +
j =1 @x (f + gu); j j (x) 2 PD (
) for N suciently large, and

X
1
hs; j i
j (x) 2 PD (
) ; (5.3)
j =1
then convergence is uniform on
.
Proof:
53

The hypothesis implies that GHJB (VN ; u) 2 span fj g1


1 , so
1
X
jGHJB (VN ; u) (x)j = hGHJB (VN ; u) ; j i
j (x)
j=11 " N * +
X X X N @
= c_ h ;  i + c k
(f + gu); j
j=N +1 k=1 k k j
k=1 k @x

D Ei
+ l + kukR ; j
j (x) :
2

Since the set fj g1


1 are orthogonal, hk ; j i
= 0. Therefore we get that
N 1 * @k +
X X
jGHJB (VN ; u) (x)j  ck (f + gu); j j (x) (5.4)
k=1 j =N +1 @x
1

X D E
+ l + kuk2R ; j
j (x)
j=N +1
1
X D E
 AB (x) + l + kuk2R ; j
j ; (5.5)
j =N +1
where
(i)
A =4 1kmax c (t)
N;2[t ;tf ] k
1 *
0
+
X @
B (x) =4 sup @x
k
(f + gu); j j :
(t;x)2D j =N +1

For the boundary condition we obtain
1 2* N + 3
X X
VN (tf ) ? V^ (t ? f ) = 4 ck (tf )k ; j ? hs; j i
5 j
j=N +1 k=1

1
X
= hs; j i
j : (5.6)
j=N +1
The lemma follows by applying the hypothesis and lemma 5.2.12.
We show that the GHJB equation is bounded below so that the previous
lemma implies convergence of the approximation to the solution. The proofs for the
in nite-time and nite-time case are somewhat di erent and so we will give separate
lemmas for each case.
Lemma 5.2.14 kcN ? c^N k ! 0: Finite-time.
54

Given u 2 Al;s(D). Let VN (t; x) = cTN (t)N (x) satisfy


hGHJB (VN ; u) ; N i
= 0; hVN (tf ); N i
= hs; N i
;
and let V^ (t; x) = P1
j =1 c^j (t)j (x) satisfy
 
GHJB V^ ; u (x) = 0 V^ (tf ; x) = s(x);
then
kcN (t) ? c^N (t)k ! 0
pointwise for each t 2 [t0 ; tf ]. In addition, if
X
1 * @T +
c^j (t) @x (f + gu)(t); N 2 PD ([t0 ; tf ])
j
j =1

then convergence is uniform on [t0 ; tf ].


4 4
Proof: De ne N (t; x) = GHJB (VN ; u) and ^N (x) = VN (tf ; x) ? s(x), then
hN ; N i
= h^N ; N i
= 0. From the hypothesis we have that
8
< GHJB (VN ; u) (t; x) ? GHJB V^ ; u (t; x) = N (t; x)
: :
(VN ? V^ )(tf ; x) = ^N (x)
Substituting the series expansion for VN and V^ , and moving the terms in the series
that are greater than N to the right hand side we obtain
8 P
< (_cN ? c^_ N )T N + (cN ? c^N )T rN (f + gu) = NP+1 1j=N +1@c^_jj j
>
> + j=N +1 c^j @x (f + gu)
: (cN ? c^N )T (tf )N = ^N + P1
j =N +1 c^j (tf )j :
Integrating both sides over
, and taking into account the orthonormality of the set
fj g11 , we obtain
( D @j E
(_cN ? c^_ N ) + hrN (f + gu); N i
(cN ? c^N ) = P1 j =N +1 c^j @x (f + gu); N

(cN ? c^N )(tf ) = 0:


Consider the equation
( _
 + hrN (f + gu); N i
 = 0
 (tf ) = 0:
Since this is a linear ordinary di erential equation,
D @j it has a uniqueE solution, namely
 (t) = 0; 8t 2 [t0; tf ]. Noting that P1 c
^
j =N +1 j (t) @x (f + gu); N
is continuous in
t, we invoke the standard result from the theory of ordinary di erential equations
55

that a continuous perturbation in the system equations and the initial state implies
a continuous perturbation of the solution (cf. (Arnold, 1973)). This implies that for
all  > 0, there exists a (t) > 0 such that 8t 2 [t0 ; tf ],
1 * +
X @
< (t) =) kcN (t) ? c^N (t)k2 < :
j
c^j (t) @x (f + gu)(t); N
j =N +1
2
But * @T +
X
1 D E
c^j (t) @x (f + gu)(t); N = ? l(t) + ku(t)k2R ; N

j
j =1

implies that the series on the left converges pointwise at each t 2 [t0 ; tf ]. So 8(t) >
0, 9k(t) such that
1 * +
X @
N > k(t) =) c^ (t) @xj (f + gu)(t); N < (t);
j=N +1 j
2
which proves the claim on pointwise convergence. Uniform convergence follows from
the hypothesis and lemma 5.2.12.
Lemma 5.2.15 kcN ? c^N k2 ! 0: In nite-time.
Given u 2 Al(
). Let VN (x) = PNj=1 cj j (x) satisfy
hGHJB (VN ; u) ; N i
= 0
4 P1
and V^ (x) = j =1 c^j j (x) satisfy
 
GHJB V^ ; u = 0;
T
where j (0) = 0. If
is compact and the functions @@xj (f + gu), kuk2R , l, s, are
continuous on
and are in the space span fj g1
1 , and if the coecients jcj (t)j are
uniformly bounded for all N , then
kcN ? c^N k2 ! 0:
Proof: De ne
N (x) =4 GHJB (VN ; u) (x);
then from the hypothesis we have that for all x 2
,
 
GHJB (VN ; u) (x) ? GHJB V^ ; u (x) = N (x):
Substituting the series expansion for VN and V^ , and moving the terms in the series
56

that are greater than N to the right hand side we obtain


X
1
(cN ? c^N )T rN (f + gu) = N + c^j @ j 4
@x (f + gu) = ^N (x):
j =N +1
T
Since f @@xj (f + gu)gN1 are continuous and linearly independent,

(cN ? c^N )T rN (f + gu) L (
) = 0 () cN = c^N :
2
(5.7)
If cN = c^N then the theorem is proved. Assume that cN 6= c^N , then

(cN ? c^N )T rN (f + gu) L (
)
Z 2

= (cN ? c^N )T [rN (f + gu)] [rN (f + gu)]T (cN ? c^N ) dx


= (ZcN ? c^N )T W (cN ? c^N )


=

j^N (x)j2 dx
 min (W ) kcN ? c^N k22
> 0
where Z
W =4 [rN (f + gu)] [rN (f + gu)]T dx;

min(W ) is the minimum eigenvalue of W , and the last inequality follows from
equation (5.7). Therefore
Z
j^N (x)j2 dx ! 0 =) kcN ? c^N k22 ! 0:

But by the mean value theorem, 9 2


such that
Z X1 @ T
2

N (x) + c^j @xj (f + gu)(x) dx

j =N +1
2
X1 @Tj
= (
) N ( ) + c^j @x (f + gu)( ) ;
j =N +1
0 1 2 1
X @ T
 (
) B@jN ( )j2 + c^j @xj (f + gu)( ) C
A
;
j =N +1

where (
) is the Lebesgue measure of
. Lemma 5.2.13 implies the pointwise
57

convergence of N (x), so 8 > 0, 9K1 ( ) such that


N > K1( ) =) jN ( )j < q  :
(
)
 
Since GHJB V^ ; u = 0,
X1 @T
c^j @xj (f + gu)(x) = ?l(x) ? ku(x)k2R
j =1
converges pointwise so 8 > 0, 9K2 ( ) such that
1
X @Tj
N > K2 ( ) =) c^j @x (f + gu)( ) < q  ;
j =1 (
)
which proves the lemma.
Corollary 5.2.16 Under the assumptions of lemma 5.2.15,

VN ? V^ L (
) ! 0:2

Proof:
2 Z 2
VN ? V^ L (
) =
VN ? V^ dx
2

Z Z 1 2
X

(cN ? c^N )T N (x) dx +

2
c^j j (x) dx

j =N +1

Z X 2
D E 1
= (cN ? c^N )T N ; TN
(cN ? c^N ) + c^j j (x) dx

j =N +1
By the mean value theorem, 9 2
such that
1 2
2 X
VN ? V^ L (
) = kcN ? c^N k22 + (
) c^j j ( )
2 j=N +1
! 0:

The following lemma states additional conditions that ensure that the approxi-
mate control uN converges to the updated control u^, where u^ is obtained by applying
lemma 5.2.4 to V^ .
Lemma 5.2.17 kuN ? u^k ! 0.
58

If the conditions of lemma 5.2.13 are satis ed and


uN (t; x) = ? 21 R?1 gT (t; x) @V N
@x (t; x)
^
u^(t; x) = ? 21 R?1 gT (t; x) @@xV (t; x)

then kuN (x)(t; x) ? u^(t; x)kR ! 0 pointwise on D. If in addition the conditions for
uniform convergence in lemma 5.2.13 are satis ed and
X
1
c^j R?1 gT @
@x 2 PD (
)
j
(5.8)
j =1
then kuN (t; x) ? u^(t; x)kR ! 0 uniformly on D.
Proof:
1 1 X 1 @


kuN ? u^kR  ? 2 R?1 gT rTN (cN ? c^N ) R + 2 c^j R?1gT @x ;
j

j =N +1 R
so u^ = ? 21 P1 ?1 T @j
j =1 c^j R g @x implies that the second term on the right hand side con-
verges pointwise to 0 and uniformly if condition (5.8) is satis ed. By lemma 5.2.13
we know that
X
1 @T
(cN (t) ? c^N (t))T rN (f + gu)(t; x) = N (t; x) + c^j (t) @xj (f + gu)(t; x)
j =N +1
converges pointwise to 0 and uniformly to 0 if conditions (5.3) are satis ed. For
each (t; x) 2 D we have by the de nition of the inner product in IRN that
(cN ? c^N )T rN (f + gu) ! 0 () rTN (cN ? c^N ) ! (t; x)
where  is perpendicular to (f + gu) at each (t; x). Since fj g1
1 are linearly inde-
pendent and (f + gu) is admissible, we have from lemma 5.2.6 that
(cN ? c^N )T rN (f + gu) ! 0 () (cN ? c^N ) ! 0:
From corollary 5.2.7 we have that

(cN ? c^N ) ! 0 () rTN (cN ? c^N ) ! 0:

Therefore rTN (cN ? c^N ) converges in the same sense as (cN ? c^N )T rN (f + gu).
59

Since R?1gT (t; x) in continuous on D and hence uniformly bounded, we have that
?1 T
R g (t; x)rTN (x)(cN (t) ? c^N (t)) R ! 0
in the same sense as (cN ? c^N )T rN (f + gu).
The next two lemmas show that for N suciently large, uN is admissible.
Lemma 5.2.18 Admissibility of uN : nite-time.
If the conditions of lemma 5.2.17 are satis ed, then for N suciently large
uN (t; x) 2 Al;s(D).
Proof: De ne
Z tf
J (x; w) =4 s('(tf ; t0; x; w)) + l('(t; t0; x; w)) + kw('(t; t0; x; w))k2R dt:
t0
We must show that for N suciently large, J (x; uN ) < 1 when J (x; u^) < 1.
But '(t; t0; x; w) depends continuously on w, i.e., small variations in w result in
small variations in '. Also since kuN ()k2R can be made arbitrarily close to ku^()k2R ,
J (x; uN ) can be made arbitrarily close to J (x; u^). Therefore for N suciently large,
J (x; uN ) < 1 and hence uN (t; x) is admissible.
Lemma 5.2.19 Admissibility of uN : in nite-time.
Under the conditions of lemma 5.2.17, if the set
( )1
g(0)R?1gT (0) @ 2 j (0)
@x2 2 1
is uniformly bounded for all N , then for N suciently large, uN 2 Al (
).
Proof: From lemma 5.2.4 we know that u^ 2 Al (
). Therefore from corol-
lary 5.2.5, uN is stabilizing on
if
u^T (x)RuN (x) > 12 u^T (x)Ru^(x)
() u^T (x)R(2^u(x) ? uN (x))
for all x 2
. The situation is intuitive in the 1D case: if u^(x) > 0 then uN (x) >
1=2^u(x) and if u^(x) < 0 then uN (x) < 1=2^u(x). So, in gure 5.2 we see that uN
must lie in the sector bounded above by 1 and below by 1=2^u. But we know from
lemma 5.2.17 that uN is uniformly within an  ball of u^, where  can be made
arbitrarily small by making N large enough. Therefore uN is guaranteed to be
stabilizing everywhere but some ball B (0; N ) centered at the origin, where N ! 0
as N ! 1 (see gure 5.2). By Lyapunov's rst theorem, uN will be stabilizing on
a small region
^  B (0; N ) (for N suciently large) if and only if the real parts of
60

0.6

0.4 u

0.2

1/2 u
u 0

−0.2

−0.4
ε ρ
N
N

−0.6
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1

Figure 5.2: Gain margins for successive approximations u(i) .


the eigenvalues of the linearized system are less than zero. So we need to examine
the eigenvalues of the matrix @x@ (f + gu)(0). De ne

F =4 @f
@x (0)
Gj =4 ? 12 g(0)R?1gT (0) @@x2j (0):
2

Then since u^(x) is stabilizing on


, we know that
8 9
< X1 =
Re :(F + c^j Gj ); < 0;
j =1

where (M ) are the eigenvalues of M . So for N suciently large, uN will be stabi-
lizing if 8 9
< XN =
Re :(F + cj Gj ); < 0:
j =1
61

Note that
N 1
X 1 XN X X
F + c^j Gj ? F ? cj Gj  (^cj ? cj )Gj + c^j Gj :
j =1 j =1 2 j =1 2 j =N +1 2
Since we know that the series P1 j =1 c^j Gj converges, 8 > 0, 9K1 such that
1
X
N > K1 =) c^j Gj < =2:
j =N +1 2

Also, since kGj k2 are uniformly bounded, lemma 5.2.15 implies that 9K2 such that
N > K2 implies N N
X X
(^cj ? cj )Gj  jc^j ? cj jkGj k2
j =1 2 j =1
is less than =2, which proves that as N ! 1,
XN X1
(F + cj Gj ) ! (F + c^j Gj ):
j =1 j =1
Since all of the eigenvalues of F + P1
j =1 c^j Gj are strictly
P less than zero, there exists
N
some K after which all of the eigenvalues of F + j=1 cj Gj are strictly less than
zero. So for some nite K , N > K implies that uN is stabilizing on
.
To show that uN 2 Al (
) we must show that
4
Z1
J (x; uN ) = l('(t; x; uN )) + kuN ('(t; x; uN ))k2R dt < 1 8x 2
:
0
But since the eigenvalues of @x@ (f + guN )(0) can be made arbitrarily close to the
eigenvalues of @x@ (f + gu^)(0), the decay rates of f + guN and f + gu^ are of the same
order in a region close to zero. So (See remark 3.1.8)
J (x; u^) < 1 =) J (x; uN ) < 1 =) uN 2 Al (
):

We are ready to prove the main results of the thesis.

5.3 Convergence and Stability Proofs


5.3.1 Convergence of Successive Approximations
We rst show that the successive approximation algorithm 3.3.2 converges to
the solution of the Hamilton-Jacobi-Bellman equation, i.e.,
V (i) ! V  :
62

Theorem 5.3.1 Convergence of Successive Approximations.


Let (f; g; l) satisfy assumptions A3.1, A3.2, A3.3, A3.4 on page 29. If u(0) 2
Al;s(D), then
1. V (i) ! V  pointwise on D,
2. u(i) 2 Al;s(D); 8i  0:
If
is compact, then convergence is uniform on D.
Proof: See section A.7.
From corollary 5.2.5 we obtain the following robustness result.
Corollary 5.3.2 Robustness of u(i).
For the in nite-time problem, if D : IRm ! IRm satis es
zT RD(z)  1 +2 kzk2R ; > 0;
 
then for each i  0, x_ = f + gD u(i) is asymptotically stable on
.
There is an important corollary of theorem 5.3.1 that does not seem to have
been noticed in the literature. Namely, the optimal control has the largest possible
stability region of all admissible controls.
Corollary 5.3.3 Consider the in nite-time problem. Given the system (f; g), as-
sume that there exist a control u(0) such that u(0) 2 Al (
) for some set
containing
a ball around the origin. Let (i) be the stability region of u(i) and let  be the
stability region of the optimal control u, then

 (i)  (i+1)   :
Proof: Lemma 3.1.9 implies that u(0) 2 Al ((0) ) so from the proof of lem-
ma 5.2.4 we see that V (0) , which is de ned and nite for all x 2 (0) by lemma 5.2.1,
is a Lyapunov function for u(1) for all x 2 (0) . Therefore u(1) is asymptotically
stable on (0) which implies that (1)  (0) . The inductive step is shown by
letting
= (i?1) and u(0) = u(i?1) . The fact that   (i) follows from the proof
of theorem 5.3.1.
Remark 5.3.4 This corollary implies that u is also optimal in the sense that its
stability region is the largest possible stability region of any control that is admissible
with respect to l for the system (f; g).
63

5.3.2 Convergence of Galerkin Approximations


In this section we show that for each i  0, VN(i) ! V (i) as N ! 1 and that
for some nite K ,
N  K =) u(Ni) 2 Al;s(D):
Throughout the proof we will use the following notation
 (i) (i)
V (i) = P1 b
1 j j
(i)
 satis es GHJB V ;u  = 0
P
V^ (i) = 1 (i) ^ (i) ; u(Ni) = 0
1 c^j j satis es D GHJB  V  E
VN(i) = PN1 c(ji) j satis es GHJB V (i) ; u(Ni) ; N
= 0
u(i) = ? 12 R?1gT @V@xi? ( 1)

u^(i) = ? 12 R?1gT @V^@xi?( 1)

i?
u(Ni) = ? 21 R?1gT @VN@x
( 1)

V (i) is the value function from the successive approximation algorithm, V^ (i) is
the actual solution of the GHJB equation when the approximate control, uN , is used
(note that u^(i) is only used in the analysis), and VN(i) is the approximate solution of
the GHJB equation when the approximate control is used.
We will show that the following assumptions are sucient for convergence and
stability.
A5.1 V (i) 2 span fj g1j=1  L2 (
),
A5.2 u(0) 2 Al;s(D),
A5.3
is compact,

A5.4 @@xj (f + gu(0) ), u(0) 2R , l, s, @@xj gR?1gT @@xk are continuous and in span fj g11 ,
T T


A5.5 c(ji) (t) - is uniformly bounded as N ! 1 (c(ji) depends on N ),
P  (i) 2 
A5.6 j=1 l + uN R ; j
j (x) 2 PD (
),
1
 iT 
A5.7 for N suciently large, P1j=1 @V@xN (f + gu(Ni)); j j (x) 2 PD (
),
( )

P
A5.8 j=1 hs; j i
j (x)PD (
),
1
 T 
A5.9 P1j=1 c^j (t) @@xj (f + gu(Ni))(t); N 2 PD ([t0 ; tf ]),

P
A5.10 1j=1 c^j R?1gT @@xj 2 PD (
),
n o1
A5.11 g(0)R?1gT (0) @@xj (0) 2 j=1 is uniformly bounded,
2
64

A5.12 fj gN1 - linearly independent =) hrN (f + gu); N i


- invertible.
Remark 5.3.5 These conditions are not as bad as they might seem, and are proba-
bly very close to necessary. Condition A5.1 is necessary since otherwise we will never
be able to approximate V (0) arbitrarily closely. The condition that the space be con-
tained in L2(
), ensures that our integrals are nite. Condition A5.2 is necessary for
a positive de nite solution of the GHJB equation to exist. Conditions A5.4 are nec-
essary since we must be able to approximate all components of the GHJB equation
with linear combinations of j . Condition A5.5 is obviously necessary for conver-
gence since the coecients of V (i) are bounded. However this condition is awkward
and can probably be removed with additional analysis of the Galerkin approximation
procedure. Conditions A5.6{A5.10 are awkward, however these conditions simply
mean that the tail of the in nite sum decreases in some uniform manner. Since these
are necessary and sucient conditions for pointwise convergence to imply uniformly
convergence, and our stability analysis depends upon uniform convergence, these
conditions appear to be necessary. Condition A5.11 can probably be relaxed since
we have only to show that a series involving these term are bounded (see the proof
of lemma 5.2.19)a. It should also be possible to relax condition A5.12, since this is
used merely to justify the use of orthonormal basis functions.
Theorem 5.3.6 Convergence: VN(i) ! V (i) .
Let (f; g; l) satisfy the conditions of theorem 5.3.1. If the conditions A5.1{
A5.12 are satis ed,
then for each i  0,
 V ? VN L (
) ! 0
(i) (i)
2

 supD u(Ni+1) ? u(i+1) R ! 0
 for N suciently large u(Ni+1) 2 Al;s(D).
If in addition, fj (x)g1 1 are uniformly bounded on
, then VN ! V uniformly
(i) (i)
on D.
Proof: The proof will be by induction.
Basis Step: Since u(0) 2 Al;s(D), condition A5.1 and lemma 5.2.1 and lem-
ma 5.2.2
 imply that
 there exists a V (0) = V^ (0) = P1 (0)
j =1 c^j j such that
GHJB V (0) ; u(0) = 0 and satisfy the appropriate boundary conditions and
(0) (0) 2 N
X (0) 2
1 2
X
VN ? V L (
) = c(0)
j ? c
^ j + c^(0)
j :
2
j =1 j =N +1
2
So V (0) 2 L2 (
) implies that P1 (0)
j =N +1 c^j ! 0 and conditions A5.1{A5.5, A5.9
and lemma 5.2.15 and lemma 5.2.14 imply that
N
X c(0) (0) 2 (0) (0) 2
j ? c^j = cN ? c^N 2 ! 0:
j =1
65

If fj g1
1 are uniformly bounded then jj (x)j  M for all x 2
and j  0. Therefore
(0) (0) 2 XN
c(0) ? c^(0) 2 jj (x)j2 + X 1 2
c^(0) jj (x)j2
V ? VN  j j j
j =1 j =N +1
(0) 2 + M 2 X c^(0) 2
1
 M 2 c(0) N ? ^
c N 2 j
j =N +1
! 0 uniformly.
Lemma 5.2.18 and lemma 5.2.19 imply that u(1)N 2 Al;s(D). Lemma 5.2.17 implies
that
sup u(1) (1) 2
N ? u R ! 0:
D
Induction Step: Assume that

1. VN(i?1) ? V (i?1) L (
) ! 0,

2

2. For N suciently large, supD u(i) ? u(Ni) ! 0,


3. u(Ni) 2 Al;s(D).
Note that since
X
N
u(Ni) = ? 21 R?1gT c(ji?1) @
@x
j
j =1
condition A5.4 implies that
@k gu(i) = XN
c(i?1) @j
T
gR ?1 g T @k 2 span fj g1
@x N j @x @x 1
j =1
and
(i) 2 X
N X
N @T
uN R = c(ji?1) c(ki?1) @xj gR?1gT @ 1
@x 2 span fj g1 ;
k
j =1 k=1
so condition A5.4 is satis ed when u(0) is replaced with u(Ni) . Condition P
A5.1, lem-
ma 5.2.1 and lemma 5.2.2 imply that there exist unique functions V = 1i (i)
j =1 bj j
( )
and V^ (i) = P1
j =1 c^j j (V  V^ i u  uN ) that satisfy the equations
(i) (i) (i) (i) (i)
 
GHJB V (i) ; u(i) = 0
 
GHJB V^ (i) ; u(Ni) = 0
with the appropriate boundary conditions. Also let VN(i) be the galerkin approxima-
tion of V^ (i) , i.e., VN(i) satis es the equation
D   E
GHJB VN(i) ; u(Ni) ; N

=0
66

with the appropriate boundary condition, so


(i) (i)
VN ? V  VN(i) ? V^ (i)
L2 (
) L (
)
2
+ V^ (i) ? V (i) L (
) :
2

By the same arguments used to establish the basis step we have that
(i) (i)
VN ? V^ !0 L2 (
)
where the convergence is uniform if fj g are (iuniformly bounded.
By the induction step we know that uN ? u ! 0 uniformly on D. We will
) (i)
show that this implies that V^ (i) ! V (i) uniformly on D. For the nite-time problem
we have that 8t 2 [t0 ; tf )
Zt 2
V^ (i) (t; x) = l('( ; t0 ; x; u(Ni))) + u(Ni) ('( ; t0 ; x; u(Ni)) R dt
Zt t 0
2
V (i) (t; x) = l('( ; t0 ; x; u(i))) + u(i) ('( ; t0 ; x; u(i)) R dt:
t0

Since '(t; t0; x; w) depends continuously on w, V^ (i) can be made uniformly close to
V (i) by making u(Ni) uniformly close u(i). For in nite-time we note from the proof of
lemma 5.2.19 that close to the origin, the decay rates of u(Ni) and u(i) can be made
arbitrarily close. Therefore, there is some t after which the integral
Z1 2
l('( ; x; u(Ni))) + u(Ni)('( ; x; u(Ni) ) R dt
t
is uniformly close to
Z1 2
l('( ; x; u(i) )) + u(i) ('( ; x; u(i)) R dt:
t
From the continuity of '(t; x; u) with respect to u, we can also make the integrals
from 0 to t arbitrarily close, which proves the result.
The admissibility of u(Ni+1) follows from the admissibility of u(i), via theo-
rem 5.3.1, lemma 5.2.18 and lemma 5.2.19.
To complete the proof we must show that

sup u(i+1) ? u(Ni+1) ! 0:
D
By the triangle inequality we have that

sup u(i+1) ? u(Ni+1)  sup u(i+1) ? u^(i+1) + sup u^(i+1) ? u(Ni+1) :

Lemma 5.2.17 implies that sup u^(i+1) ? u(Ni+1) ! 0, so the proof reduces to showing
67

that
(i+1) (i+1) 1 ?1 T @V (i) ? V^ (i)
sup u ? u^ = sup ? R g
2 @x ! 0

given that sup u(i) ? u(i) ! 0, where V (i) = P1 b(ji) j satis es
N j =1

@V (i)T (f + gu(i)) + l + u(i) 2 = 0 (5.9)


@x R

and V^ (i) = P1 (i)


j =1 c^j j satis es

@ V^ (i)T (f + gu(i)) + l + u(i) 2 = 0: (5.10)


@x N N R

By subtracting equations (5.9) and (5.10) and rearranging we obtain


@ (V (i) ? V^ (i) )T (f + gu(i)) = @ V^ (i)T g(u(i) ? u(i) ) + u(i) 2 ? u(i) 2 ;
@x @x N N R R
which implies that
(i) ^ (i) T
@ (V ? V ) (f + gu(i))  @ V^ (i)T g u(i) ? u(i) + u(i) ? u(i) 2 :
@x @x N N R

From lemma 5.2.1 we have that @V@x ( )^i
is continuous on D, therefore @V^@xi T g is
( )

uniformly bounded on D. Taking the supremum over D of both sides and applying
the induction hypothesis, we obtain
(i) ^ (i) T
sup @ (V ? V ) (f + gu ) ! 0;
(i)
1 @x
X (i) (i) @Tj
=) sup (bj ? c^j ) (f + gu(i)) ! 0:
j=1 @x
(^c(ji) depends on N through uN .) By the de nition of the inner product we see that
1
X (i) (i) @Tj X
1 @T
sup (bj ? c^j ) (f + gu(i)) ! 0 () (b(ji) ? c^(ji) ) j ! 

j =1 @x @x j =1

where  is orthogonal to (f + gu(i)) at each (t; x) 2 D. Since fj g1


1 are linearly
independent, lemma 5.2.6 implies that
1
X (i) (i) @Tj
sup (bj ? c^j ) (f + gu(i)) ! 0 () b(ji) ? c^(ji) ! 0
j =1 @x
68

for each j = 1;   . By corollary 5.2.7,


1
(i) (i) X (i) (i) @j
bj ? c^j ! 0 () (bj ? c^j ) @x ! 0:
j=1
Since R?1gT is uniformly bounded on D the above analysis shows that
1
X (i) (i) ?1 T @j
sup (bj ? c^j )R g = sup u(i+1) ? u^(i+1) ! 0;

j =1 @x
which completes the proof.
69

5.3.3 Convergence of the Main Algorithm


Theorem 5.3.1 and theorem 5.3.6 imply the following result.
Theorem 5.3.7 If
 The system equations (f; g) and the state and control weighting functions
(s; l; R) satisfy the conditions A3.1{A3.4 on page 29
 u(0) is an admissible control for the system (f; g) on some region of the state
space,

is a compact subset of the region of state space over which u(0) is admissible,
 The set fj g1 1 is complete in the sense of condition A4.1 on page 32,
 Conditions A5.1{A5.12 on page 63 are satis ed,
then 8 > 0, 9I; K such that i > I and N > K implies that

 VN(i) ? V  < ,
 u(Ni) 2 Al;s(D).
5.4 Summary of the Main Result
In this chapter we have derived conditions under which algorithms 4.3.1 and
4.3.2 converge pointwise and uniformly to the optimal control. We have also derived
conditions that assure that for N suciently large the approximate control is ad-
missible. For the in nite-time problem this control is also robust in the same sense
as the optimal control. This chapter constitutes the main contribution of the thesis.
CHAPTER 6
EXAMPLES AND COMPARISONS
The objective of this chapter is to illustrate, through a series of examples, the wide
range of applicability and the e ectiveness of algorithm 4.3.1 and algorithm 4.3.2.
The examples will be organized into three sections. In section 6.1 we give some
simple examples to illustrate the characteristics of the method and the type results
that it produces. Section 6.2 presents a number of examples that compare the results
or our algorithm with controls obtained by methods described in chapter 2. Finally
in section 6.3 our method is used to design feedback control laws for some systems
for which it has traditionally been hard to design nonlinear controls for.
Given the dynamics and performance index for a system, to apply our algo-
rithm, a designer must choose three parameters:
1. The initial control u(0) (for in nite-time problems),
2. The set
,
3. The basis functions fj gN1 .
In all of the examples that we present in this chapter, the basis functions will
be obtained from even polynomials. In other words, if the dimension of the system is
n and the order of approximation is M , then we use all of the terms in the expansion
of the polynomial !
X2
M= X
n 2j
xk : (6.1)
j =1 k=1
The resulting basis functions for a two dimensional system is
fx2 ; xy; y2; x4; x3 y; : : : ; yM g:
The use of even polynomials is justi ed by lemma 5.2.1 since we know that we are
approximating a positive value function V (i) .
6.1 Illustrative Examples
In this section we apply our method to some simple examples to illustrate
characteristics of the algorithm. In section 6.1.1 we apply the method to a linear
system with non-quadratic cost. The optimal control is easy to compute by hand
and we can easily see the convergence in both the order of approximation N and
successive approximation i. In the second example, section 6.1.2, the optimal control
is a nite polynomial and we see that the algorithm converges quickly to this control.
The third example is of a rst order bilinear system, where the optimal control, which
can be calculated by hand, is not di erentiable, i.e., a continuously di erentiable
optimal solution does not exist, and classical optimal control cannot be applied. The
70
71

method is shown to converge to the in mum rather than the optimal. In section 6.1.4
the method is used to nd a feedback control for an inverted pendulum. Finally, in
section 6.1.5 a nite-time problem is solved.
In the rst three examples we will compare our control with the optimal. For
one dimensional systems the optimal control can be found directly by solving the
HJB equation for @V@x :
!2
@V  f (x) + l(x) ? b2 @V  = 0;
@x 4r @x
which gives s
@V  = 2r  4r2f 2(x) + 4rl(x) :
@x b2 b4 b2
The optimal control law is therefore given by
s 2
f (x
u(x) = ? b ? sign(bx) f b(2x) + l(rx) :
)
(6.2)

6.1.1 Linear System with Non-Quadratic Cost


The rst example is a simple linear system with a non-quadratic cost func-
tional. The example is from (Saridis and Lee, 1979):
x_ = ?
Z x1+ u 
J = x2 + x4 + u2 dt
0
For the example in this section, equation (6.2) becomes
p
u(x) = x ? x 2 + x2 ; (6.3)
therefore the optimal control cannot be represented by a nite polynomial series.
However, as the order of approximation increases we should be able to represent the
optimal control with better accuracy.
The initial control in this example was chosen (arbitrarily) to be u(0) = ?5x,
the estimate of the stability region was chosen (arbitrarily) to be
= [?1; 1], and
we used the basis functions fj g = fx2j g. As i ?! 1, we obtain the controls
u1(1)(x) = ?0:6475x;
u2(1)(x) = ?0:4260x ? 0:3082x3;
u3(1)(x) = ?0:4152x ? :3467x3 + 0:03x5
u4(1)(x) = ?0:4143x ? 0:3525x3 + 0:041x5 ? 0:0059x7:
For comparison, the rst four terms of the Taylor series expansion of the
72

optimal control (6.3), around zero is


u(x)j4 = ?0:4142x ? 0:3536x3 + 0:0442x5 ? 0:0884x7:
The synthesis method outlined in (Garrard et al., 1967), (Garrard, 1969), (Garrard,
1977),and (Garrard and Jordan, 1977) computes Taylor series truncations of the
optimal control. Interestingly, the cost associated with u4(1) is lower than the cost
associated with uj4. In fact, on the region
= [?1; 1] the maximum error between
V  and V4(1) is on the order or 10?6 where the error between V  and V  j4 is on the
order of 10?3.
gure 6.1 (a) shows the cost verses the approximation order M and the itera-
tion i for the xed initial state x0 = 1. Note the monotone improvement as both the
iteration variable i and the number of basis functions N are increased. Generating
a plot such as gure 6.1 (a) enables a control engineer to quickly judge the tradeo
between increased control complexity (i.e. large N ) and the possible gain in the
performance.
(a) Cost vs. Iteration (b) Control vs. State
0.8 25

N=1: −− j=1: −−
N=2: .. 20
j=2: ..
0.7 N=3: −
j=3: −.
N=4: −.− 15
optimal: −
0.6 10

5
0.5
Control
Cost

0.4
−5

0.3 −10

−15
0.2
−20

Optimal
0.1 −25
0 5 10 15 −5 0 5
Iteration State

Figure 6.1: Cost and control for linear system with non-quadratic cost.
73

6.1.2 Nonlinear System with Non-Quadratic Cost


Consider the following one-dimensional nonlinear system with non-quadratic
cost from (Saridis and Lee, 1979):
x_ = xZ 31+u 
J = x2 + 2x4 + u2 dt:
0
The optimal control for this system can be computed from (6.2) to be
p
u(x) = ?x3 ? x  x4 + 2  x2 + 1
= ?x ? 2x3:
Our method computes the following control laws for orders of approximation 2, 4,
6 and 8:
u2(1)(x) = ?2:4286x;
u4(1)(x) = ?x ? 2x3 ;
u6(1)(x) = ?x ? 2x3
u8(1)(x) = ?x ? 2x3 :
This example shows that if the optimal control is a nite polynomial in x, that our
method will in fact nd this control.
6.1.3 Bilinear System: Non-Smooth Control
In this example we apply our method to a simple one-dimension bilinear system
with quadratic cost:
x_ = xu
Z 1 
J = x2 + u2 dt:
0
This example is provided to show that the method is applicable when the optimal
control is not smooth and is not admissible. We start with an initial stabilizing
control of
u(0) = ?x2 ;
For this example, the optimal control can also be derived analytically from equation
(6.3) to be
u(x) = ? jxj :
First note that the optimal control is not continuously di erentiable and so
it is not admissible. Therefore in the standard sense of optimal control theory,
the optimal control does not exist. Our algorithm, still computes controls that
74

are arbitrarily close to the in mum. Also note that u does not have a Taylor
series expansion at x = 0 and so the method in (Garrard, 1969; Garrard, 1977;
Garrard and Jordan, 1977) fails to be valid for this example. When the order of
approximation is 4, we obtain the following control:
u4(1) = ?11:397x2 + 26:316x4 ? 31:395x6 + 13:4053x8:
In gure 6.2 we show the calculated control for iterations i = 1; 2; 10 verses the
(a) Control vs. State, N=3 (b) Control vs. State, N=14
0 0

−0.2 −0.2

−0.4 −0.4
Control

Control
(0) (0)
−0.6 u : .. −0.6 u : ..
(1)
(1)
:: u :
u 14
3
(2)
−0.8 (2) −0.8 u : −.−
u : −.− 14
3
(14)
(14) u : −−
u : −− 14
−1 3 −1
*
*
:− u :−
u

−1.2 −1.2
−1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1
State State

Figure 6.2: Control vs. state for bilinear system.


initial and optimal control, when the order of approximation is M = 3 and M = 14.
The plots show that as the order of approximation is increased, we compute a
control that approaches the optimal, even in the case when the optimal control is
not a smooth function of the state.
75

6.1.4 Inverted Pendulum


In this section we apply our algorithm to nd a sub-optimal feedback control
for the inverted pendulum with input torque:
! ! !
x_ 1 = x2 + 0 u; (6.4)
x_ 2 g sin(x ) ? klx 1
l 1 2 ml2
where g is the gravity constant, l is the length of the pendulum, k is the coecient of
friction, and m is the mass. System (6.4) has two equilibria: an unstable equilibrium
at (0; 0), and a stable equilibrium at (; 0). The objective is to design a suboptimal
regulator on the region

= f(x1; x2 )j jx1 j  3 ; jx2 j  1g:
4
For a cost function we use the standard quadratic weighting of the states and control.
The following control, which essentially inverts the pendulum, is asymptotically
stabilizing on
:
u(0) (x) = ?2mgl sin(x1 ):
When our synthesis method is used to nd a sixth order approximation we obtain
the following control
u6(1) = ?2:4146x1 ? 1:615x2 + 0:0049x32 + 0:1196x1x22 + 0:286x21x2
+0:278x31 + 0:0063x52 ? 0:0038x1x42 ? 0:0018x21x32 ? 0:0135x31x22
?0:021x41 x2 ? 0:015x51:

gure 6.3 shows the results of applying our algorithm to this system. The
cost in these gures is computed from an initial position of x1 (0) 2 [? 34 ; 34 ], and
an initial velocity of x2 (0) = 0. gure 6.3 (a) shows the improvement in cost as
the iteration i is increased. gure 6.3 (b) shows the improvement in cost as the
approximation order M is increased. Clearly, an approximation order of M = 4 (i.e.
cubic terms in the control) is sucient for this example.
76

(a) Cost vs state, N=6


25
(0)
V : ..
20 (1)
V : −.−
6
15 (2)
Cost

V : −−
6
10 (oo)
V :−
6
5

0
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5
Position x_1
(b) Cost vs state, i=oo
25
(0)
V :−
20 (oo)
V : ..
4
15 (oo)
Cost

V : −.
6
10 (oo)
V : −−
8
5

0
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5
Position x_1

Figure 6.3: Cost vs. position for inverted pendulum.


77

6.1.5 Finite-time Chemical Reactor


In this section we present our solution for a nite-time problem found in (Hofer
and Tibken, 1988). The solution given in (Hofer and Tibken, 1988) is open loop
an so comparisons won't be made with our method other than to say that the time
trajectories for their initial conditions look similar. The system is a simpli ed model
of a continuously stirred tank reactor with an exothermic reaction. The model is
given by ! ! ! !
x_1 = 13=6 5=12 x1 + ?x1 u;
x_
2 ?50=3 ?8=3 x 2 0
and the optimization index given by
Z tf  
J =  kxk2 + 10 kxk2 + juj2 dt;
0
where x1 represents the temperature and x2 represents the concentration of the
initial product of the chemical reaction and  is the weighting on the nal state.
For nite-time problems, choosing the set
is not as straightforward as in the
in nite-time case since we do not have an initial control. Therefore a little trial and
error may be necessary. In (Hofer and Tibken, 1988) the initial state is chosen to
be x0 = (0:15; 0)T and the open loop trajectories from this point are bounded by
0:3, therefore we choose
= [?1; 1]  [?1; 1]. Since all of the trajectories that we
observed in simulation were within this set the choice seems to have been judicious.
Even polynomials in x1 and x2 are chosen for the basis set. To nd the coecients
for the control, we use a sti ode solver in Matlab to solve equation (4.13). Several
comments are in order concerning the solution to this equation. If the order of
approximation is 4, (corresponding to N = 8) then the ode in equation (4.13) has
nite-escape at t = 0:28 seconds If the order of approximation is 6 (corresponding to
N = 15) then no nite-escape occurs for   350, however for   400 a nite escape
occurs close to zero. This shows the problem encountered in using equation (4.13)
as opposed to algorithm 4.3.1. The example shows the nature of our theoretical
result, i.e., an admissible control is only guaranteed for N suciently large. When
tf = 5 the controller gains are shown in gure 6.4 for  = 0; 10; 100; 350. It should be
noted that for each value of  the gains converge to the same values as tf ! 1. For
this problem we can also nd a stabilizing control law and compute (time-invariant)
gains for the in nite-time horizon case. It is interesting that as tf ! 1 the nite-
time gains converge to the in nite-time gain. Therefore, there exists an analogy
to the time-varying verses algebraic Riccati equation: as tf ! 1 the solution to
the time-vary Riccati equation converges to the solution of the algebraic Riccati
equation.
Figure 6.5 (a) shows the time history of the system from the initial condition
x0 = (0:15; 0)T , when the control gain is computed for  = 100 and tf = 10. Since
the control is a feedback control it can be applied to the nal state, thus obtaining
78

(a) Control Gains: rho = 0 (b) Control Gains: rho = 10


60 100

40
50
20

0
0
−20

−40 −50
0 2 4 6 0 2 4 6
time time
(c) Control Gains: rho = 100 (d) Control Gains: rho = 350
150 400

100
200
50

0 0

−50
−200
−100

−150 −400
0 2 4 6 0 2 4 6
time time

Figure 6.4: Time varying control gains for chemical reactor.


79

a periodic feedback control:


(
uP (t; x) = uuN ((tt;?x);t ; x); t < t 0( +t 1)ttf

N f f f
where  is a positive integer. Since all of the action in the control takes place at the
end of the period we expect good response by shortening the period and repeating
the control several times. Figure 6.5 shows four di erent cases. In gure 6.5 (a)
the time horizon is tf = 10 and the control is only repeated once. In gure 6.5 (b),
the time horizon is tf = 5 and the control is repeated twice. The plots show that
the behavior is essentially the same with slight improvement for the second case.
In gure 6.5 (c) the time horizon is tf = 2 and the control is repeated ve times.
We can see from the plots that the regulator performs best in this case. From
gure 6.4 (c) we can see that most of the variation in the control is in the last two
seconds. This implies that tf = 2 should be the best period to choose. If we shorten
the period beyond this point then the performance begins to degrade as can be seen
in gure 6.5 (d) where tf = 1 and the control is repeated ten times.
There are many interesting phenomena associated with the nite-time prob-
lem, that need to be investigated.
80

(a) States: t_f = 10, P = 1 (b) States: t_f = 5, P=2


0.4 0.4

0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4
0 5 10 0 5 10
time time
(c) States: t_f = 2, P=5 (d) States: t_f = 1, P = 10
0.2 0.4

0.2
0

−0.2
−0.2

−0.4 −0.4
0 5 10 0 5 10
time time

Figure 6.5: Time histories for chemical reactor.


81

6.1.6 Finite-time Nonholonomic Example


In this section we apply our algorithm to a rst order nonholonomic system
in chain form (Murray and Sastry, 1993):
x_ = u
y_ = v
z_ = xv

where u and v are control variables. A famous theorem by Brockett states that
this system cannot be stabilized by a continuous, constant state feedback (Brock-
ett, 1983). To control the system various methods have been proposed using dis-
continuous and/or time-varying feedback. A survey of the eld of stabilization of
nonholonomic system can be found in (Samson, 1995; Srdalen and Egeland, 1995).
We will use the nite-time version of our algorithm to nd a (continuous) time-
varying feedback control law for the system. Repeated application of the control
law results in a piecewise continuous periodic controller. The cost function for the
system will be
Z tf
J = 1 k'(tf ; 0; x; u)k2 + 2 k'( ; 0; x; u)k2 + ku(; '( ; 0; x; u))k2 d;
0
where 1 and 2 are parameters. The set
was chosen somewhat arbitrarily to be

= [?2; 2]  [?2; 2]  [?2; 2]. For basis functions we use even polynomials in x,
y and z up to order six, deleting terms which integrated to zero due to symmetry.
We had N = 19 basis functions. For certain values of 1 and 2 the projected HJB
equation (4.13) escapes in nite-time. Various values of 1 and 2 , along with their
admissibility or inadmissibility are shown in the table below.
1 2 control status Plot
0 1 admissible gure 6.6 (a)
1 0 nite-escape
1 1 admissible gure 6.6 (b)
5 1 nite-escape
5 5 admissible gure 6.6 (c)
10 10 nite-escape
1 10 admissible gure 6.6 (d)
1 100 admissible gure 6.6 (e)
1 1000 admissible gure 6.6 (f)
10 100 admissible
The admissible gains associated with this table are shown in gure 6.6 for a nal
time of tf = 30 seconds. We can see that the nal state weighting 1 directly
a ects the admissibility of the control. When 1 is large then the control gains must
82

(a) Control Gains: rho = (0,1)^T (b) Control Gains: rho = (1,1)^T
5 5

0 0

−5 −5
0 10 20 30 0 10 20 30
(c) Control Gains: rho = (5,5)^T (d) Control Gains: rho = (1,10)^T
10 10

0 0

−10 −10

−20 −20
0 10 20 30 0 10 20 30
(e) Control Gains: rho = (1,100)^T (f) Control Gains: rho = (1,1000)^T
50 100

0
0
−100

−50 −200
0 10 20 30 0 10 20 30
time time

Figure 6.6: Unicycle: Control Gains


be large at the end of the time period. In addition, most of the activity in the
control gains occurs at the end of the time period. Large values of 1 correspond
to rapid variation in the control gains. A sti ode solver is used to compute the
gains, however it is not clear if the gain equations are ill-posed or too sti for our
software. This is an area for further investigation. As the number of basis functions
is increased, 1 can be made larger while retaining admissibility.
Figure 6.6 indicates that as tf ! 1, the control gains at t0 = 0 converge to
constant values. As mentioned in the previous section this is analogous to the time-
varying verses algebraic Riccati equation. In the previous example the control gains
converge to the gains obtained from the in nite-time problem. In this example, the
in nite-time problem is not well-posed since there is not a continuous constant state
feedback that stabilizes the system. The time history of the system for x0 = (1; 1; 1)T
with the converged control gains is shown in gure 6.7. We can see from the gure
that the control is stabilizing but not asymptotically stabilizing.
To design a control for the system we let 1 = 1 and 2 = 10. When the
83

1.2

0.8

0.6

states
0.4

0.2

−0.2
0 5 10 15 20 25 30
time

Figure 6.7: Unicycle: Time history for in nite-time gains


nal time in the performance index is chosen to be tf = 30, then one period of the
closed-loop system is shown in gure 6.8 (a). When tf = 10, then three periods of
the closed-loop system are shown in gure 6.8 (b). Figure 6.8 (c) and (d) correspond
to tf = 5 and tf = 2:5 respectively. These gures show that the choice of the nal
time tf a ect the performance of the system. This e ect is another area for future
investigation.
84

(a) States: t_f = 30, P = 1 (b) States: t_f = 10, P = 3


1.5 1.5

1 1

0.5 0.5

0 0

−0.5 −0.5
0 10 20 30 0 10 20 30

(c) States: t_f = 5, P=6 (d) States: t_f = 2.5, P = 12


1.5 1.5

1 1

0.5 0.5

0 0

−0.5 −0.5
0 10 20 30 0 10 20 30
time time

Figure 6.8: Unicycle: Time Histories


85

6.2 Comparative Examples


In this section we compare our method with some of the methods described
in chapter 2. There are very few nite-time examples that appear in the literature
and so we will only make comparisons for in nite-time problems. We compare
our algorithm with perturbation methods in section 6.2.1, with the regularization
method in section 6.2.2, and with feedback linearization in section 6.2.3.
6.2.1 Comparison with Perturbation Methods
In this section we compare the control law obtained by using our algorithm,
with those obtained by using the perturbation methods of Nishikawa (Nishikawa
et al., 1971) and Garrard (Garrard, 1969) which are explained in section 2.3.
The system considered in (Nishikawa et al., 1971) and (Garrard, 1969) is
)
x_ 1 = x2 (6.5)
x_ 2 = x31 + u
subject to the performance index
Z1
J = 21 x21 + x22 + u2dt:
0
We will compare the methods for the arbitrarily selected value of  = 1.
For a fourth order approximation, the three methods produce the following
control laws:
un(x) = ?x1 ? 1:7321x2 ? x31 ? 1:3235x21x2 ? 0:6923x1x22 ? 0:1323x32;
ug (x) = ?x1 ? 1:7321x2 ? x31 ? 0:7506x21x2 ? 0:3997x1x22 ? 0:0769x32;
u4(1) (x) = ?1:0376x1 ? 1:7975x2 ? 1:3079x31 ? 1:3429x21x2
?0:4664x1x22 ? 0:74x32;
where un is the control derived in (Nishikawa et al., 1971), ug is the control derived
in
(Garrard, 1969), and u4(1) is the control derived by the method in this thesis.
The cost of using each of these controls for a xed value of x2(0) = 0 and
x1 (0) 2 [?1; 1] is shown in gure 6.9 (a). In this plot the costs are indistinguish-
able, so we also plot the di erences. The di erence Vn(x) ? V4(1)(x) is plotted in
gure 6.9 (b), while the di erence Vg (x) ? V4(1) (x) is plotted in gure 6.9 (c).
From gure 6.9 we see that the suboptimal control derived in this thesis has
comparable performance to the established methods of (Nishikawa et al., 1971) and
(Garrard, 1969). The method of (Nishikawa et al., 1971) is slightly better for this
example. However, both the method of (Garrard, 1969) and (Nishikawa et al.,
1971) become prohibitively complex as the order of approximation is increased to
include terms of higher order than cubic. The method presented in this thesis, easily
86

(a) Cost vs. State


2
V : ..
g
V : −−
1 n
(oo)
V :−
4
0
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
x_1
−3 (oo)
x 10 (b) V n −V 4 vs. State
1

−1

−2
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
x_1
(oo)
0.02 (c) V g −V 4 vs. State

0.01

−0.01
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
x_1

Figure 6.9: Comparison with perturbation method.


computes higher order approximations and is applicable to a much broader class of
systems.
87

6.2.2 Comparison with Regularization Methods


In this section we compare our method with the results obtained in (Ryan,
1984) using the Regularization method described in section 2.4. One of the examples
given in (Ryan, 1984) is the the following second-order system with variable damping
y(t) + (1 + u(t)) y_ (t) + y(t) = 0; ju(t)j  1; > 0;
which can be written as the two-dimensional bilinear system
x_ (t) = Ax(t) + Bx(t)u(t); ju(t)j  1;
where ! !
A = ?01 ?1 ; B = 00 ?0 ; > 0:
The cost function to be minimized is the standard quadratic functional of the state
and control Z1
J = xT (t)Qx(t) + ru2(t) dt;
0
where we let Q = I and r = 1=2. Let the saturation function be de ned as
8
>
< +1;  > 1;
sat() = > ; jj > 1;
: ?1;  < ?1;
The suboptimal control computed in (Ryan, 1984) is
h i
ur = ?sat r?1 (x) ;  (x) = ?x2 ( x1 + 2x2);
which is the control that minimizes the cost function
Z1
J= xT (t)Qx(t) + ru2(t) + (x) dt;
0
where (
(x) = j1r(?x1)j 2?(x2)r;; jj((xx))jj 
1 > r;
2 r;
When the method presented in this thesis is used to compute a control law to a
sixth order approximation, we obtain the following control
u6(1)(x) = 2:026x1x2 + 1:7593x22 ? 0:7822x42 ? 1:6904x1x32
?0:0925x21x22 ? 0:1481x31x2 + 0:1603x62 + 0:8166x1x52
+0:2675x21x42 + 0:0073x31x32 + 0:1443x41x22 + 0:0755x51x2 :
88

To satisfy the constraint juj  1 we implement the control


u(x) = sat(u6(1)(x)):
A comparison of the value function for both of these controls with the initial velocity
set to zero, i.e. x2 = 0 is shown in gure 6.10 (a). gure 6.10 (b) shows the value
functions when x1 = x2 is the initial condition. In these gures Vr is the value
function associated with the control, ur computed in (Ryan, 1984), an V6(1) is the
value function associated with the control u, obtained using our method. From these
gures it is clear that the control synthesized by our method performs substantially
better than the control found in (Ryan, 1984).
(a) Cost vs state (b) Cost vs state
1.8 6

1.6
5
1.4

1.2 4
: −− V : −−
V r
r
1
Cost

Cost

3 (oo)
(oo)
V :− V :−
0.8 6
6

0.6 2

0.4
1
0.2

0 0
−1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1
Position x Position x =x
1 1 2

Figure 6.10: Comparison with regularization method.


89

6.2.3 Comparison with Exact Linearization Method


In this section we will compare our method with the method of exact feedback
linearization. We will use the following system
! !
x_ = ?xx1+?xx2 + 01 u:
3
(6.6)
1 2
The control objective is to regulate the system while minimizing the quadratic func-
tional of the states Z1
J = xT (t)x(t) + u2(x(t)) dt:
0
Using the feedback method outlined in (Isidori, 1989) this system can be linearized
using the state feedback
u(x) = 3x51 + 3x21 x2 ? x2 + v (6.7)
and the coordinate transformation
z1 = x 1 (6.8)
z2 = x31 + x2
In the new coordinate system (6.6) becomes
! !
z_ = 0 ?1 + 0 v:
1 0 1
We now use standard LQR theory to optimize the transformed system with
respect to the cost function
Z1
J= zT (t)z(t) + v2(z(t)) dt
0
to obtain the control
v(z) = 0:4142z1 ? 1:3522z2: (6.9)
The suboptimal control synthesized by exact linearization is given by substituting
(6.8) into (6.9) to obtain a function v(x) and then using the control given in (6.7)
to obtain
ufl (x) = 3x51 + 3x21 x2 ? x2 + 0:4142x1 ? 1:3522(x31 + x2 ):
Using our method to a sixth order approximation, we obtain the following
control
u6(1)(x) = ?2:5822x2 + 0:1643x1 + 0:301x32 ? 0:8441x22x1
?1:3757x2x21 ? 0:9661x31 ? 0:0574x52 + 0:0995x42x1
90

?0:3463x32x21 + 0:6204x22x31 ? 0:7337x2x41 + 0:4071x51:


Cost vs state
4

3.5

3
V : ..
fl

2.5 (oo)
V :−
6
Cost

1.5

0.5

0
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
x
1

Figure 6.11: Comparison with feedback linearization.


In gure 6.11, a plot of the cost function, Vfl , associated with the control ufl ,
is compared with the cost function, V6(1) , associated with the control u6(1). For this
example, our method clearly computes a control that performs much better than
the control synthesized using exact feedback linearization.
91

6.3 Design Example


The examples given in the previous sections show the characteristics of the
algorithm and how it compares to other methods. The systems that were considered,
however, are mainly of academic interest. In this section we apply the method to
design a control for a real system that is of signi cant engineering interest. The
objective it to demonstrate that the algorithm is a powerful design tool.
6.3.1 Voltage Regulation of a Power System
The problem that we will consider is regulating the voltage of a power generator
that is subject to major disturbances like large sudden faults. An actual power
system has multiple generators interconnected through a large dynamic network.
A common approach to designing control systems for the generators is to model
the dynamics of a single generator and to approximate everything else a an in nite
bus, i.e., it is assumed that the voltage and phase of the entire network are not
e ected by the input power or eld excitation of the generator (Stevenson, 1982).
The model, shown in gure 6.12, consists of a single generator connected to the
in nite bus through parallel transmission lines. Models of this type are frequently
employed in power system design. The dynamical equations that govern the system
are highly nonlinear and so the typical approach is to linearize the system around
a desired operating point and design a control law for the linearized system. While
this approach is adequate for regulating steady state behavior, when a fault occurs
on the transmission line, large transients occur in the states which forces the system
beyond the region for which the linearized model hold. When large faults occur,
various heuristic methods are used to bring the states of the generator back into the
region where the linear control law is valid.
During the transients following a fault, a great deal of power is lost and so
it is desirable to regulate the system to equilibrium as quickly as possible. There
are several papers in the power literature that employ nonlinear control theory to
design regulators to control the transients after a large fault. These include (Wang
et al., 1993; King et al., 1994; Gao et al., 1992; Marino, 1984; Wang et al., 1994;
Chapman et al., 1993). In (Wang et al., 1993) a voltage regulator is designed using
feedback linearization. We will show how algorithm 4.3.2 can be used to enhance
the performance of their design.
In (Wang et al., 1993) the following system equations are derived to model the
generator in gure 6.12:
_ (t) = !(t)
!_ (t) = ? 2DH !(t) + 2!H0 (Pm (t) ? Pe(t))
( " #
1 1 V 0 0 V
P_e(t) = ? T 0 Pe(t) + T 0 x sin((t)) kcuf + Td0 (xd ? xd ) x0 !(t) sin((t))
s s
d0 d0 o ds ds
0
+Td0!(t) cot((t))
92

where
4
(t) = the power angle,
4
!(t) = the relative speed,
4
kc = the gain of the excitation ampli er,
4
uf (t) = the input of the SCR ampli er of the generator,
4
D = per unit damping constant,
4
H = per unit inertia constant,
4
!0 = synchronous machine speed,
4
Pm = the mechanical input power,
4
Pe = the active electrical power delivered by generator,
4
Tdo = direct axis transient short circuit time constant,
4
Vs = in nite bus voltage,
4
xd = direct axis reactance of the generator,
4
x0d = direct axis transient reactance of the generator,
0
0 4 xds
Td0 = x Td0 :
ds
We rst assume that Pm(t) is constant and that we wish to drive Pe(t) ! Pm .
De ne the following constants
a1 = ? 2DH
a2 = 2!H0
a3 = ? T10
d0

Breaker
Generator
XL

XL
Transformer
Fault

Figure 6.12: Generator connected through transmission lines to in nite bus.


93

V 2 (xd ? x0 )
a4 = s d
xdsx0ds
a5 = Tk0cVxs :
d0 ds
Rearranging the equations give
_ = !(t)
!_ = a1 !(t) + a2 (Pm ? Pe(t))
P_e = a3 Pe(t) + !(t)Pe(t) cot((t)) + a4!(t) sin2 ((t)) + a5 sin((t))uf :
The objective of the control is to shape uf to drive
0 1 0 1
B@ !(t) CA ! B@ 00 CA :
(t)
Pe(t) Pm
To put the equations in regulator form, we make the following change of vari-
able:
x =  ? 0
y = !
z = P e ? Pm :
Letting
? a3 Pm ;
uf = a usin(
5 x+ ) 0
the system equations becomes
x_ = y
y_ = a1y ? a2 z
z_ = a3z + y(z + Pm )cot(x + 0 ) + a4 ysin2(x + 0 ) + u
The equations are now in the form required by algorithm 4.3.2. For the cost function
we arbitrarily selected a quadratic weightings on the states and control,
Z1
J= (x2 + y2 + z2 + u2)dt:
0
Using feedback linearization, an initial stabilizing control is ((Wang et al., 1993))
ufl = ?y(z + Pm ) cot(x + 0) ? a4 y sin2 (x + 0 ) + k1x + k2y + k3z;
94

where K are the Kalman gains associated with the linear system
0 1 0 1
0 1 0 0
A = @ 0 a1 ?a2 A ; B = @ 0 C
B C B A ; Q = I (3); R = 1:
0 0 a3 1
Therefore the initial control is
ufl = ?y(z + Pm) cot(x + 0 ) ? a4y sin2(x + 0) + x + 1:0725y ? 9:6993z:
For the basis functions use polynomials up to degree 4,
fj g = f z2 ; yz; y2; xz; xy; x2 ; z4 ; yz3; y2z2 ; y3z; y4; xz3 ; xyz2; xy2z;
xy3; x2z2 ; x2yz; x2 y2; x3 z; x3 y; x4g:
The set
is selected by observing the uncontrolled dynamics (which are sta-
ble), and bounding the state response. The following bounds were selected:
40o    140o
:98  !  1:02
0  Pe  1:5:
When algorithm 4.3.2 is applied to system, the following control is calculated for
i = 10:
u= ?5:2205z + 0:2407y + 1:1936x ? 6:8881z3 + 3:6315z2 y
?0:9524zy2 + 0:0987y3 ? 2:1929z2x + 0:7961zxy
?0:01972xy2 ? 2:18995zx2 + 0:4273x2y ? 0:05338x3:
The physical limits of the plant saturate the control variable between
?5  u  5:
In gure 6.13 we show the response of the system to a fault occurring between
0:55  t  0:7 seconds. In the gure the dotted line represents the uncontrolled
system, the dashed line represents the system controlled by feedback linearization,
and the solid line corresponds to the system under the control derived by our method.
The key variable is angle, . Reduction in the maximum angle swing and good
transient behavior in , correspond to money saved by the power utility. It can
be seen from gure 6.13 that the control derived by our method corresponds to a
10% reduction in the maximum power swing as compared to the control obtained
by feedback linearization.
The time history of the control variable is shown in gure 6.14. This plot shows
that the feedback linearizing control has several unwanted swings before the system
95

150 1.02

100

Speed
Angle

1
50

0 0.98
0 2 4 6 0 2 4 6
time time
1 6

0.8 4
Eqp

Eq

0.6 2

0.4 0
0 2 4 6 0 2 4 6
time time
1.5 1

1
Pe

Vt

0.5
0.5

0 0
0 2 4 6 0 2 4 6
time time

Figure 6.13: Power generator, time histories of the states.


96

Control
0

−1

−2

−3

−4

−5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
time

Figure 6.14: Power generator, time history of the control.


is brought close to the equilibrium. The indicates that the control is over sensitive.
On the other hand, the control derived by our method remains in saturation, until
switching to the other limit, indicating more robust behavior.
CHAPTER 7
CONCLUSION AND FUTURE WORK
In this thesis we posed the problem of nding a practical method to improve the
closed-loop performance of a stabilizing feedback control and showed that the prob-
lem reduces to solving the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation.
We showed that Galerkin's spectral method could be used to approximate the GHJB
equation such that the resulting controls are in feedback form and stabilize the
closed-loop system. We also showed that for a high enough order of approximation,
the process can be iterated and that the resulting algorithm converges uniformly to
the solution of the Hamilton-Jacobi-Bellman (HJB) equation.
As a convenience to the reader, in the next section we give a concise statement
of the main results of the thesis.
7.1 Overview of the Main Results
Motivation:
- There is a need for general design methods that systematically improve the closed-loop performance of nonlinear systems.
- There is a need for a practical method of computing arbitrarily close approximations of the HJB equation, such that the resulting control is admissible on a well defined set.

Assumptions:
- The system equations $(f, g)$ and the state and control weighting functions $(s, l, R)$ satisfy conditions A3.1-A3.4 on page 29.
- $u^{(0)}$ is an admissible control for the system $(f, g)$ on a region of the state space.
- $\Omega$ is a compact subset of the region of state space over which $u^{(0)}$ is admissible.
- The set $\{\phi_j\}_1^\infty$ is complete in the sense of condition A4.1 on page 32.
- Conditions A5.1-A5.12 on page 63 are satisfied.
Results:
- Algorithms 4.3.1 and 4.3.2, on pages 35 and 36 respectively, converge to the optimal control as $N$ and $i$ go to $\infty$.
- For $N$ sufficiently large, $u_N^{(i)}$ is an admissible control on $\Omega$.

Advantages:
- The algorithm produces feedback control laws.
- The controls computed by the method are robust in the same sense as the optimal control.
- All computations are performed off-line.
- Once a solution is found, the control can be implemented in hardware and run in real time.
- Coefficients for the state and control weighting functions can be taken outside the integral, so tuning the control through the penalty function becomes computationally fast.

Disadvantages:
- "The curse of dimensionality": $n$-dimensional integrals must be computed. This limits the method to systems with small state spaces.
- The control is given as a series of basis functions. Therefore the control is inherently complex.
7.2 Contributions
The main contributions can be summarized as follows.
1. In chapter 3 we formalized the notion of an admissible control for both the
infinite-time horizon problem and the finite-time horizon problem. While ad-
missibility is implicitly assumed in the literature, we have never seen it explic-
itly defined.
2. Attention was restricted to a compact set contained within the stability region
of an initial control. This places the performance index and control in an
inner product space allowing Galerkin's spectral method to be applied to the
problem.
3. The successive approximation algorithm was used to show that the optimal
control has the largest possible stability region of any admissible control.
4. Galerkin's spectral method was used to reduce the GHJB equation to a linear
algebraic equation. The resulting approximation was then used in the succes-
sive approximation algorithm to develop a new algorithm that improves the
performance of arbitrary stabilizing controls and converges to the solution of
the HJB equation.
5. Sufficient conditions were developed to ensure that the dual approximation
converges uniformly to the optimal solution.
6. For a high enough (but finite) order of approximation, the approximate control
was shown to be admissible and to be robust in the same sense as the optimal
control.
7. It was shown that the region of stability of the approximate control can be made equal to $\Omega$, the estimate of the stability region of the initial control. Therefore, in contrast to other methods that approximate the HJB equation, the approximate controls are (state) feedback controls with a guaranteed region of attraction that is well defined.
8. The algorithm was applied to a variety of systems (both infinite and finite
time) and was shown to produce results that are comparable to other well
known approximation schemes, while being applicable to a much broader class
of systems.
7.3 Future Work
Suggestions for future research are summarized below.
1. The convergence result in chapter 5 shows that for a fixed i, there is a K after
which the approximate control is admissible and close to the optimal. However,
when we implement the algorithm we fix N and increase i. Therefore we would
like to know that as i increases, N is not required to increase. In other words,
if N1 works for i = 1, then we would like to know that N1 works as i ! 1.
Simulations indicate that this result is true and we leave it as a conjecture to
be shown.
2. We have found that for finite-time problems it is much easier to compute the
Galerkin approximation of the HJB equation directly, since it is as easy to
numerically solve a nonlinear ordinary differential equation as a linear one.
To justify this approach we need to show that VN ! V  as N ! 1. If we
could place a bound on N as i increases, the proof should follow as a limiting
case of the proof in chapter 5, but this needs to be investigated. We also
need to investigate the existence and uniqueness of solutions of the ordinary differential equation generated by the Galerkin approximation of the HJB equation.
3. We would like to have error bounds on the solution when i and N are fixed.
4. Selection of basis functions is a major consideration in the approach. The
quality of the control will be determined by the basis functions used. Polyno-
mials seem to work well, but for a given system it would be nice to have some
guidance in selecting a good basis.
5. The method uniformly approximates the Hamilton-Jacobi-Bellman equation.
Since other Hamilton-Jacobi equations show up in various branches of nonlin-
ear system theory, the results of this thesis should extend naturally to these
problems, which include the following:
- Nonlinear optimal control of stochastic systems,
- Nonlinear H∞ optimal control,
- Nonlinear estimation,
- Control of nonlinear systems using output feedback,
- Nonlinear differential games.
6. In its current framework, our algorithm cannot handle explicit constraints on
the state and control variables. It may be possible to extend the method to the
case where explicit constraints are placed on the control (e.g., $\|u\| \le 1$). Rather
than solving an explicit linear equation, we would solve a linear programming
problem.
7.4 Conclusion
Given a nonlinear system, it is usually possible to find a feedback control law
that renders the closed-loop system stable. The thesis provides a practical tool that
enables a control engineer to advance the design by systematically incorporating
system performance. It is hoped that the algorithm will assist practicing engineers
to design enhanced controls for a variety of nonlinear systems.
REFERENCES
Aganovic, Z. and Gajic, Z. (1994). The successive approximation procedure for
finite-time optimal control of bilinear systems. IEEE Transactions on Automatic
Control, AC-39(9):1932–1935.
Al'brekht, E. G. (1961). On the optimal stabilization of nonlinear systems.
Journal of Applied Mathematics and Mechanics, 25(5):836–844.
Anderson, B. D. O. and Moore, J. B. (1971). Linear Optimal Control.
Prentice-Hall, Englewood Cliffs, New Jersey.
Apostol, T. M. (1974). Mathematical Analysis. Addison Wesley.
Arnold, V. I. (1973). Ordinary Differential Equations. MIT Press.
Arnold, V. I. (1989). Mathematical Methods of Classical Mechanics. Springer
Verlag.
Balaram, J. (1985). Suboptimal Control of Nonlinear Systems. PhD thesis,
Rensselaer Polytechnic Institute, Troy, New York 12180.
Ball, J. A., Helton, J. W., and Walker, M. L. (1993). H∞ control for nonlinear
systems with output feedback. IEEE Transactions on Automatic Control,
38(4):548–559.
Baumann, W. T. and Rugh, W. J. (1986). Feedback control of nonlinear
systems by extended linearization. IEEE Transactions on Automatic Control,
31(1):40{46.
Bellman, R. E. (1957). Dynamic Programming. Princeton University Press,
Princeton, New Jersey.
Bertsekas, D. P. (1976). On error bounds for successive approximation methods.
IEEE Transactions on Automatic Control, 21:394{396.
Bosarge, W. E., Johnson, O. G., McKnight, R. S., and Timlake, W. P. (1973).
The Ritz-Galerkin procedure for nonlinear control problems. SIAM Journal of
Numerical Analysis, 10(1):94{110.
Brockett, R. W. (1983). Asymptotic stability and feedback stabilization. In
Millman, R. S. and Sussmann, H. J., editors, Differential Geometric Control
Theory, pages 181–191. Birkhauser.
Bryson, A. E. and Ho, Y. C. (1975). Applied Optimal Control. Hemisphere, New
York.
Burden, R. L. and Faires, J. D. (1988). Numerical Analysis. PWS-KENT
Publishing Company, Boston, fourth edition.
Cebuhar, W. A. and Costanza, V. (1984). Approximation procedures for the
optimal control of bilinear and nonlinear systems. Journal of Optimization
Theory and Applications, 43(4):615{627.
Chapman, J. W., Ilic, M. D., King, C. A., Eng, L., and Kaufman, H. (1993).
Stabilizing a multimachine power system via decentralized feedback linearizing
excitation control. IEEE Transactions on Power Systems, 8(3):830{839.
Chow, J. H. and Kokotovic, P. V. (1981). A two-stage Lyapunov-Bellman
feedback design of a class of nonlinear systems. IEEE Transactions on
Automatic Control, 26(3):656{663.
Cloutier, J. R., D'Souza, C. N., and Mracek, C. P. (1996). Nonlinear regulation
and nonlinear H∞ control via the state-dependent Riccati equation technique. In
IFAC World Congress (Preprint), San Francisco, CA.
Finlayson, B. A. (1972). The Method of Weighted Residuals and Variational
Principles. Academic Press.
Freeman, R. A. and Kokotovic, P. V. (1995). Optimal nonlinear controllers for
feedback linearizable systems. In Proceedings of the American Control
Conference, pages 2722{2726, Seattle, Washington.
Gao, L., Chen, L., Fan, Y., and Ma, H. (1992). A nonlinear control design for
power systems. Automatica, 28:975{979.
Garrard, W. L. (1969). Additional results on suboptimal feedback control of
nonlinear systems. International Journal of Control, 10(6):657{663.
Garrard, W. L. (1977). Suboptimal feedback control for nonlinear systems.
Automatica, 8:219{221.
Garrard, W. L. and Jordan, J. M. (1977). Design of nonlinear automatic flight
control systems. Automatica, 13:497{505.
Garrard, W. L., McClamroch, N. H., and Clark, L. G. (1967). An approach to
suboptimal feedback control of nonlinear systems. International Journal of
Control, 5(5):425{435.
Genesio, R., Tartaglia, M., and Vicino, A. (1985). On the estimation of
asymptotic stability regions: State of the art and new proposals. IEEE
Transactions on Automatic Control, 30(8):747{755.
Glad, S. T. (1987). Robustness of nonlinear state feedback – a survey.
Automatica, 23(4):425–435.
Glad, T. (1985). Robust nonlinear regulators based on Hamilton-Jacobi theory
and Lyapunov functions. In IEE Control 85 Conference, pages 276–280,
Cambridge.
Goh, C. J. (1993). On the nonlinear optimal regulator problem. Automatica,
29:751{756.
Halme, A. and Hamalainen, R. P. (1975). On the nonlinear regulator problem.
Journal of Optimization Theory and Applications, 16:255{275.
Haussler, R. L. (1963). On the Suboptimal Design of Nonlinear Control Systems.
PhD thesis, Purdue University, Lafayette, IN.
Hofer, E. P. and Tibken, B. (1988). An iterative method for the finite-time
bilinear-quadratic control problem. Journal of Optimization Theory and
Applications, 57(3):411{427.
Hunt, L. R., Su, R., and Meyer, G. (1983a). Design for multi-input nonlinear
systems. In Brockett, R. W., Millman, R. S., and Sussmann, H., editors,
Differential Geometric Control Theory, pages 268–298. Birkhauser.
Hunt, L. R., Su, R., and Meyer, G. (1983b). Global transformations of nonlinear
systems. IEEE Transactions on Automatic Control, 28(1):24{31.
Isidori, A. (1989). Nonlinear Control Systems. Communication and Control
Engineering. Springer-Verlag, New York, second edition.
Jamshidi, M. (1976). A feedback near-optimum control for nonlinear systems.
Information and Control, 32:75{84.
Johansson, R. (1990). Quadratic optimization of motion coordination and
control. IEEE Transactions on Automatic Control, AC-35(11):1197{1208.
Jones, F. (1993). Lebesgue Integration on Euclidean Space. Jones and Bartlett
Publishers.
Kantorovich, L. V. and Krylov, V. I. (1958). Approximate Methods of Higher
Analysis. Interscience Publishers, Inc.
Khalil, H. K. (1992). Nonlinear Systems. Macmillan Publishing Company, New
York.
King, C. A., Chapman, J. W., and Ilic, M. D. (1994). Feedback linearizing
excitation control on a full-scale power system model. IEEE Transactions on
Power Systems, 9(2):1102{1109.
Kirk, D. E. (1970). Optimal Control Theory. Prentice-Hall.
Kleinman, D. L. (1968). On an iterative technique for Riccati equation
computations. IEEE Transactions on Automatic Control, AC-13:114–115.
Kleinman, D. L. (1970). An easy way to stabilize a linear constant system.
IEEE Transactions on Automatic Control, AC-15:692.
Kreyszig, E. (1978). Introductory Functional Analysis. John Wiley.
Laub, A. J. (1991). Invariant subspace methods for the numerical solution of
Riccati equations. In Bittanti, Laub, W., editor, The Riccati Equation, pages
163{196. Springer Verlag.
Leake, R. J. and Liu, R.-W. (1967). Construction of suboptimal control
sequences. J. SIAM Control, 5(1):54{63.
Lee, C. S. G. and Chen, M. H. (1983). A suboptimal control design for
mechanical manipulators. In American Control Conference, pages 1056{1061,
Sheraton-Palace Hotel, San Francisco, CA.
Lewis, F. L. (1986). Optimal Control. John Wiley & Sons, New York.
Lewis, F. L., Abdallah, C. T., and Dawson, D. M. (1993). Control of Robot
Manipulators. Macmillan Publishing Co., New York.
Lu, P. (1993). A new nonlinear optimal feedback control law. Control Theory
and Advanced Technology, 9(4):947{954.
Lukes, D. L. (1969). Optimal regulation of nonlinear dynamical systems. SIAM
Journal on Control, 7(1):75{100.
Mageirou, E. F. (1977). Iterative techniques for Riccati game equations.
Journal of Optimization Theory and Applications, 22(1):51{61.
Marino, R. (1984). An example of a nonlinear regulator. IEEE Transactions on
Automatic Control, AC-29(3):276{279.
Merriam III, C. W. (1964). Optimization Theory and the Design of Feedback
Control Systems. McGraw-Hill.
Mikhlin, S. G. (1964). Variational Methods in Mathematical Physics. The
MacMillian Company, New York.
Mikhlin, S. G. and Smolitskiy, K. L. (1967). Approximate Methods for Solution
of Differential and Integral Equations. American Elsevier Publishing Company,
Inc., New York.
Mil'shtein, G. N. (1964). Successive approximations for solution of one optimal
problem. Automation and Remote Control, 25:298–306.
Mohler, R. R. (1991). Nonlinear Systems: Applications to Bilinear Control,
volume 2. Prentice Hall, New Jersey.
Murray, R. M. and Sastry, S. S. (1993). Nonholonomic motion planning:
Steering using sinusoids. IEEE Transactions on Automatic Control,
38(5):700{716.
Nijmeijer, H. and van der Schaft, A. J. (1990). Nonlinear Dynamical Control
Systems. Springer-Verlag, New York.
Nishikawa, Y., Sannomiya, N., and Itakura, H. (1971). A method for
suboptimal design of nonlinear feedback systems. Automatica, 7:703{712.
Petryshyn, W. V. (1965). On a class of k-p.d. and non-k-p.d. operators and
operator equations. Journal of Mathematical Analysis and Applications, 10:1{24.
Rajkumar, V. and Mohler, R. R. (1995). Nonlinear control methods for power
systems: A comparison. IEEE Transactions on Control Systems Technology,
3(2):231{237.
Rekasius, Z. V. (1964). Suboptimal design of intentionally nonlinear controllers.
IEEE Transactions on Automatic Control, AC-9(4):380{386.
Rosen, O. and Luus, R. (1992). Global optimization approach to nonlinear
optimal control. Journal of Optimization Theory and Applications,
73(3):547{562.
Rugh, W. J. (1971). System equivalence in a class of nonlinear optimal control
problems. IEEE Transactions on Automatic Control, AC-16:189{194.
Ryan, E. P. (1984). Optimal feedback control of bilinear systems. Journal of
Optimization Theory and Applications, 44(2):333{362.
Sage, A. P. and White III, C. C. (1977). Optimum Systems Control.
Prentice-Hall, 2 edition.
Samson, C. (1995). Control of chained systems, application to path following
and time-varying point-stabilization of mobile robots. IEEE Transactions on
Automatic Control, 40(1):64{77.
Sandell, N. R. (1974). On Newton's method for Riccati equation solution. IEEE
Transactions on Automatic Control, AC-19:254{255.
Saridis, G. N. and Balaram, J. (1986). Suboptimal control for nonlinear
systems. Control Theory and Advanced Technology, 2(3):547{562.
Saridis, G. N. and Lee, C.-S. G. (1979). An approximation theory of optimal
control for trainable manipulators. IEEE Transactions on Systems, Man, and
Cybernetics, SMC-9(3):152{159.
Saridis, G. N. and Wang, F. (1994). Suboptimal control of nonlinear stochastic
systems. Control Theory and Advanced Technology, 10(4):847–871.
Schultz, M. H. (1969a). Error bounds for the Rayleigh-Ritz-Galerkin method.
Journal of Mathematical Analysis and Applications, 27:524{533.
Schultz, M. H. (1969b). The Galerkin method for nonself-adjoint differential
equations. Journal of Mathematical Analysis and Applications, 28:647{651.
Shamma, J. S. and Athans, M. (1990). Analysis of gain scheduled control for
nonlinear plants. IEEE Transactions on Automatic Control, 35(8):898{907.
Sørdalen, O. J. and Egeland, O. (1995). Exponential stabilization of
nonholonomic chained systems. IEEE Transactions on Automatic Control,
40(1):35{49.
Stevenson, W. D. (1982). Elements of Power System Analysis. McGraw-Hill.
Tsitsiklis, J. N. and Athans, M. (1984). Guaranteed robustness properties of
multivariable nonlinear stochastic optimal regulators. IEEE Transactions on
Automatic Control, 29(8):690{696.
Tzafestas, S. G., Anagnostou, K. E., and Pimenides, T. G. (1984). Stabilizing
optimal control of bilinear systems with a generalized cost. Optimal Control
Applications and Methods, 5:111{117.
Vaisbord, E. M. (1963). An approximate method for the synthesis of optimal
control. Automation and Remote Control, 24:1626{1632.
van der Schaft, A. J. (1992). L2-gain analysis of nonlinear systems and nonlinear
state feedback H∞ control. IEEE Transactions on Automatic Control,
AC-37(6):770–784.
Van Trees, H. L. (1962). Synthesis of Optimum Nonlinear Control Systems.
M.I.T. Press, Cambridge, Massachusetts.
Vidyasagar, M. (1993). Nonlinear Systems Analysis. Prentice Hall, second
edition.
Vinter, R. B. and Lewis, R. M. (1978a). The equivalence of strong and weak
formulations for certain problems in optimal control. SIAM Journal of Control
and Optimization, 16(4):546–570.
Vinter, R. B. and Lewis, R. M. (1978b). A necessary and sufficient condition for
optimality of dynamic programming type, making no a priori assumption on the
controls. SIAM Journal of Control and Optimization, 16(4):571–583.
Vinter, R. B. and Lewis, R. M. (1980). A verification theorem which provides a
necessary and sufficient condition for optimality. IEEE Transactions on
Automatic Control, 25(1):84–89.
Wang, Y., Hill, D. J., Middleton, R. H., and Gao, L. (1993). Transient stability
enhancement and voltage regulation of power systems. IEEE Transactions on
Power Systems, 8(2):620{627.
Wang, Y., Hill, D. J., Middleton, R. H., and Gao, L. (1994). Transient
stabilization of power systems with an adaptive control law. Automatica,
30(9):1409{1413.
Werner, R. A. and Cruz, J. B. (1968). Feedback control which preserves
optimality for systems with unknown parameters. IEEE Transactions on
Automatic Control, 13(6):621{629.
White, R. R. and Cook, G. (1973). Use of piecewise linearization for suboptimal
control of nonlinear systems. International Journal of Control, 18(2):385{397.
Wiener, N. (1949). Extrapolation, Interpolation, and Smoothing of Stationary
Time Series. John Wiley & Sons, New York.
Wiener, N. (1958). Nonlinear Problems in Random Theory. John Wiley & Sons,
New York.
Willemstein, A. P. (1977). Optimal regulation of nonlinear dynamical systems
on a finite interval. SIAM Journal of Control and Optimization,
15(6):1050–1069.
Wise, K. A. and Sedwick, J. L. (1994). Successive approximation solution of the
HJI equation. In Proceedings of the 33rd Conference on Decision and Control,
pages 1387–1391, Lake Buena Vista, FL.
Zeidler, E. (1990a). Nonlinear Functional Analysis and its Applications, II/A:
Linear Monotone Operators. Springer-Verlag.
Zeidler, E. (1990b). Nonlinear Functional Analysis and its Applications, II/B:
Nonlinear Monotone Operators. Springer-Verlag.
APPENDIX A
AUXILIARY RESULTS
There are numerous theorems and lemmas scattered throughout the thesis. When
a theorem or lemma is original to the thesis, the proof is included in the body of
the text. In this appendix we give the proof of any theorem or lemma that is not
original to the thesis but is necessary to make the arguments complete.
A.1 Proof of Lemma 5.2.1.
Existence: We prove existence by construction. Since $u$ is admissible, assumption A3.1 and the existence and uniqueness theorem for ordinary differential equations (cf. (Arnold, 1973)) give that $\varphi(\tau; t, x, u)$ is uniquely defined for each $(t, x) \in D$. Define
\[ \hat V(t, x) = s(\varphi(t_f; t, x, u)) + \int_t^{t_f} l(\tau, \varphi(\tau; t, x, u)) + \| u(\tau, \varphi(\tau; t, x, u)) \|_R^2 \, d\tau. \qquad \mathrm{(A.1)} \]
For the finite-time case we clearly have that $\hat V(t_f, x) = s(x)$. Also, $u$ admissible implies that $\varphi(\tau; t, 0, u) = 0$ and $u(\tau, 0) = 0$, hence
\[ \hat V(t, 0) = s(\varphi(t_f; t, 0, u)) + \int_t^{t_f} l(\tau, \varphi(\tau; t, 0, u)) + \| u(\tau, \varphi(\tau; t, 0, u)) \|_R^2 \, d\tau = 0 \]
by the positive definite assumption A3.2; therefore $\hat V$ satisfies the necessary boundary conditions. By definition we know that along trajectories of the system we have
\[ \dot{\hat V}(t, x) = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T \dot x = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T \left[ f(t, x) + g(t, x) u(t, x) \right]. \]
By differentiating (A.1) along trajectories of the system we obtain
\[ \dot{\hat V}(t, x) = -l(t, \varphi(t; t, x, u)) - \| u(t, \varphi(t; t, x, u)) \|_R^2 = -l(t, x) - \| u(t, x) \|_R^2. \qquad \mathrm{(A.2)} \]
Combining these results we have that $\hat V$ satisfies the GHJB equation $\mathrm{GHJB}(\hat V, u) = 0$.
Continuously Differentiable: By differentiating (A.1) along trajectories of the system, we have that
\[ \frac{d \hat V}{dt}(t, x) = -l(t, x) - \| u(t, x) \|_R^2, \]
which is continuous by the admissibility assumption on $u$ and the continuity assumption, A3.2, on $l$.

Lyapunov Function: Positive definiteness of $\hat V$ and $-\dot{\hat V}$ follows from equations (A.1) and (A.2) and the positive definite assumption, A3.2, on $l$, $s$ and $R$.
Autonomous: In the infinite-time case, $l$ and $u$ are autonomous, so (A.1) becomes
\[ \hat V(t, x) = \int_t^{\infty} l(\varphi(\tau; t, x, u)) + \| u(\varphi(\tau; t, x, u)) \|_R^2 \, d\tau. \]
Since $f$ and $g$ are autonomous, we have that
\[ \varphi(\tau; t, x, u) = \varphi(\tau - t; 0, x, u). \]
Therefore by the change of variables $\tau = t + \sigma$ we obtain
\[ \hat V(t, x) = \int_0^{\infty} l(\varphi(\sigma; 0, x, u)) + \| u(\varphi(\sigma; 0, x, u)) \|_R^2 \, d\sigma = \hat V(0, x). \]
Since this can be done for each $t \in [0, \infty)$, $\hat V$ is independent of time.
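As an informal check of this construction, consider the scalar example $\dot x = -x + u$ with $u(x) = -kx$, $l(x) = x^2$ and $R = 1$ (an example chosen here purely for illustration). Integrating the accumulated cost along the closed-loop trajectory gives $\hat V(x) = \frac{1+k^2}{2(1+k)} x^2$, and the Python sketch below verifies numerically that this function satisfies the GHJB equation.

    # Scalar sanity check of the construction in (A.1): the cost accumulated along the
    # closed-loop trajectory defines V_hat, and V_hat satisfies the GHJB equation.
    # Assumed example (illustration only): x_dot = -x + u, u = -k*x, l = x^2, R = 1.
    import numpy as np
    from scipy.integrate import quad

    k = 0.7                                    # any stabilizing gain

    def V_hat(x0, T=40.0):
        # V_hat(x0) = integral of l(phi) + u(phi)^2 along the trajectory starting at x0
        phi = lambda t: x0 * np.exp(-(1.0 + k) * t)
        val, _ = quad(lambda t: phi(t)**2 + (k * phi(t))**2, 0.0, T)
        return val

    def ghjb_residual(x, h=1e-5):
        # GHJB(V_hat, u) = dV/dx * (f + g*u) + l + ||u||_R^2, which should be zero
        dVdx = (V_hat(x + h) - V_hat(x - h)) / (2 * h)
        return dVdx * (-(1.0 + k) * x) + x**2 + (k * x)**2

    for x in [0.5, 1.0, 2.0]:
        print(x, ghjb_residual(x), V_hat(x) - (1 + k**2) / (2 * (1 + k)) * x**2)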

A.2 Proof of Lemma 5.2.2.

To prove the lemma we first show the following fact. Suppose that there exist continuous functions $\underline V(t, x)$ and $\bar V(t, x)$ such that
\[ \frac{\partial \underline V}{\partial t} + \frac{\partial \underline V}{\partial x}^T (f + gu) + l + \| u \|_R^2 \ge 0 \ge \frac{\partial \bar V}{\partial t} + \frac{\partial \bar V}{\partial x}^T (f + gu) + l + \| u \|_R^2, \]
\[ \underline V(t_f, x) \le s(x) \le \bar V(t_f, x); \]
then
\[ \underline V(t, x) \le \hat V(t, x) \le \bar V(t, x), \]
where $\hat V(t, x)$ solves $\mathrm{GHJB}(\hat V, u) = 0$.

We will prove the lower bound $\underline V$; the upper bound is similar and has been shown in (Saridis and Balaram, 1986, Theorem 3.1). Along trajectories of the system we have that
\[ \dot{\underline V}(t, x) = \frac{\partial \underline V}{\partial t} + \frac{\partial \underline V}{\partial x}^T (f + gu), \qquad \dot{\hat V}(t, x) = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T (f + gu). \]
Assumptions A3.1 and A3.2 now give
\begin{align*}
\underline V(t, x) &= \underline V(t_f, \varphi(t_f; t, x, u)) - \int_t^{t_f} \dot{\underline V}(\tau, \varphi(\tau; t, x, u)) \, d\tau \\
&\le s(\varphi(t_f; t, x, u)) - \int_t^{t_f} \dot{\underline V}(\tau, \varphi(\tau; t, x, u)) \, d\tau \\
&\le s(\varphi(t_f; t, x, u)) + \int_t^{t_f} \left( l + \| u \|_R^2 \right)(\tau, \varphi(\tau; t, x, u)) \, d\tau \\
&= s(\varphi(t_f; t, x, u)) - \int_t^{t_f} \dot{\hat V}(\tau, \varphi(\tau; t, x, u)) \, d\tau \\
&= s(\varphi(t_f; t, x, u)) - \hat V(t_f, \varphi(t_f; t, x, u)) + \hat V(t, x) \\
&= \hat V(t, x),
\end{align*}
so $\underline V(t, x) \le \hat V(t, x)$.

To prove the lemma we suppose that $U(t, x)$ is also a solution to $\mathrm{GHJB}(U, u) = 0$. Then $U(t, x)$ satisfies both the lower and upper bounds above, which implies that $U(t, x) \le \hat V(t, x) \le U(t, x)$, and hence $U = \hat V$.
A.3 Proof of Lemma 5.2.3.

($\Rightarrow$): The existence of $\hat V(t, x)$ is guaranteed by lemma 5.2.1. By comparing the definition of $\hat V$ given in equation (A.1) with the performance functional (3.4), it is clear that $\hat V(t, x) = J(t, x)$.

($\Leftarrow$): From equation (3.4) we have that
\[ \hat V(t, x) = s(\varphi(t_f; t, x, u)) + \int_t^{t_f} l(\varphi(\tau; t, x, u)) + \| u(\tau, \varphi(\tau; t, x, u)) \|_R^2 \, d\tau. \]
The result is obtained by differentiating both sides along the trajectories of the system $\dot x = f + gu$.
A.4 Proof of Lemma 5.2.4.

Admissibility: Since $\hat V$ is continuously differentiable, the continuity assumption A3.1 on $g$ implies that $\hat u$ is continuous. Since $\hat V$ is positive definite it attains a minimum at the origin, hence $\partial \hat V / \partial x$ must vanish there; therefore $\hat u(t, 0) = 0$.

We now show that $\hat V(t, x)$ is a Lyapunov function for the system $(f, g, \hat u)$. $\hat V(t, x)$ is positive definite by lemma 5.2.1. Taking the derivative of $\hat V(t, x)$ along the system $(f, g, \hat u)$ and using the fact that $g^T \partial \hat V / \partial x = -2 R \hat u$, we obtain
\[ \dot{\hat V}(t, x) = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T \left[ f(t, x) + g(t, x) \hat u(t, x) \right] = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T f(t, x) - 2 \| \hat u \|_R^2. \qquad \mathrm{(A.3)} \]
But
\[ \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V}{\partial x}^T f(t, x) = -\frac{\partial \hat V}{\partial x}^T g(t, x) u(t, x) - l(t, x) - \| u \|_R^2 = 2 \hat u^T R u - l - \| u \|_R^2. \]
Substituting this into (A.3) we obtain
\[ \dot{\hat V}(t, x) = -l - \| \hat u \|_R^2 - \| u - \hat u \|_R^2 \le 0. \]
The admissibility of $\hat u$ now follows from lemma 3.2.4 for the finite-time case and from lemma 3.1.6 and theorem 3.1.3 for the infinite-time case.
Performance: Along trajectories of the system $(f, g, \hat u)$ we have, for each $(t, x)$,
\[ \hat{\hat V}(t, x) = s(\varphi(t_f; t, x, \hat u)) + \int_t^{t_f} l(\tau, \varphi(\tau; t, x, \hat u)) + \| \hat u(\tau, \varphi(\tau; t, x, \hat u)) \|_R^2 \, d\tau, \qquad \mathrm{(A.4)} \]
and since $\hat{\hat V}$ and $\hat V$ satisfy the same terminal condition $\hat{\hat V}(t_f, x) = \hat V(t_f, x) = s(x)$,
\[ \hat{\hat V}(t, x) - \hat V(t, x) = -\int_t^{t_f} \left( \frac{\partial (\hat{\hat V} - \hat V)}{\partial t} + \frac{\partial (\hat{\hat V} - \hat V)}{\partial x}^T \left[ f + g \hat u \right] \right) d\tau. \qquad \mathrm{(A.5)} \]
Solving for $\partial \hat{\hat V} / \partial t$ and $\partial \hat V / \partial t$ from $\mathrm{GHJB}(\hat{\hat V}, \hat u) = 0$ and $\mathrm{GHJB}(\hat V, u) = 0$ gives
\[ \frac{\partial \hat{\hat V}}{\partial t} = -\frac{\partial \hat{\hat V}}{\partial x}^T (f + g \hat u) - l - \| \hat u \|_R^2, \qquad \frac{\partial \hat V}{\partial t} = -\frac{\partial \hat V}{\partial x}^T (f + g u) - l - \| u \|_R^2. \]
Plugging into (A.5) gives
\[ \hat{\hat V}(t, x) - \hat V(t, x) = -\int_t^{t_f} \left( \| u \|_R^2 - \| \hat u \|_R^2 - \frac{\partial \hat V}{\partial x}^T g \, (\hat u - u) \right) d\tau. \]
Since $\frac{\partial \hat V}{\partial x}^T g = -2 \hat u^T R$, we have
\[ \hat{\hat V}(t, x) - \hat V(t, x) = -\int_t^{t_f} \left( \| u \|_R^2 + \| \hat u \|_R^2 - 2 \hat u^T R u \right) d\tau = -\int_t^{t_f} \| u - \hat u \|_R^2 \, d\tau \le 0. \]
A.5 Proof of Corollary 5.2.5.

The proof follows by showing that $\hat V$ is a Lyapunov function for the system $\dot x = f + g D(\hat u)$. Since $\hat V$ is positive definite, it is sufficient to show that $\frac{d \hat V}{dt}$ is negative definite. Since
\[ \frac{\partial \hat V}{\partial x}^T (f + gu) + l + \| u \|_R^2 = 0, \]
completing the square gives
\[ \frac{\partial \hat V}{\partial x}^T (f + g \hat u) = -l - \| u - \hat u \|_R^2 - \| \hat u \|_R^2. \]
Therefore
\[ \frac{\partial \hat V}{\partial x}^T (f + g D(\hat u)) = -l - \| u - \hat u \|_R^2 - \| \hat u \|_R^2 + \frac{\partial \hat V}{\partial x}^T g D(\hat u) - \frac{\partial \hat V}{\partial x}^T g \hat u. \]
Using the fact that $\frac{\partial \hat V}{\partial x}^T g = -2 \hat u^T R$, the hypothesis gives that
\[ \frac{\partial \hat V}{\partial x}^T (f + g D(\hat u)) \le -l - \| u - \hat u \|_R^2 - \| \hat u \|_R^2 < 0. \]
A.6 Proof of Lemma 5.2.12.

($\Rightarrow$). (i): $\sum_{j=N+1}^{\infty} c_j \phi_j(x) \to 0$ uniformly on $\Omega$ implies that $\forall \epsilon > 0$, $\exists K_1$ such that $N > K_1 \Rightarrow \left| \sum_{j=N+1}^{\infty} c_j \phi_j(x) \right| < \epsilon/3$, $\forall x \in \Omega$. Since the $\phi_j(x)$ are continuous on $\Omega$, $\sum_{j=1}^{N} c_j \phi_j(x)$ is uniformly continuous on $\Omega$; therefore $\forall \epsilon > 0$, $\exists \delta(N)$ such that $|x - y| < \delta(N) \Rightarrow \left| \sum_{j=1}^{N} c_j (\phi_j(x) - \phi_j(y)) \right| < \epsilon/3$. So for any $\epsilon > 0$, fix $K > K_1$; then for any $0 < \delta < \delta(K)$ we have that $|x - y| < \delta$ implies
\begin{align*}
|W(x) - W(y)| &\le \left| W(x) - \sum_{j=1}^{K} c_j \phi_j(x) \right| + \left| \sum_{j=1}^{K} c_j \phi_j(x) - \sum_{j=1}^{K} c_j \phi_j(y) \right| + \left| \sum_{j=1}^{K} c_j \phi_j(y) - W(y) \right| \\
&= \left| \sum_{j=K+1}^{\infty} c_j \phi_j(x) \right| + \left| \sum_{j=1}^{K} c_j \phi_j(x) - \sum_{j=1}^{K} c_j \phi_j(y) \right| + \left| \sum_{j=K+1}^{\infty} c_j \phi_j(y) \right| \\
&< \epsilon.
\end{align*}
Therefore $W(x)$ is continuous.

(ii): $\sum_{j=N+1}^{\infty} c_j \phi_j(x) \to 0$ uniformly on $\Omega$ implies that $\forall x \in \Omega$ and $\forall \epsilon > 0$, $\exists m$ such that
\[ n > m \Longrightarrow \left| \sum_{j=k+n+1}^{\infty} c_j \phi_j(x) \right| < \epsilon. \]
Hence, for any $\epsilon > 0$, if $n > m$ and $\left| \sum_{j=k+1}^{\infty} c_j \phi_j(x) \right| < \epsilon$, then $\left| \sum_{j=k+n+1}^{\infty} c_j \phi_j(x) \right| < \epsilon$.
($\Leftarrow$). $\sum_{j=N+1}^{\infty} c_j \phi_j(x) \to 0$ pointwise on $\Omega$ implies that $\forall \epsilon > 0$, $\exists k(x)$ such that
\[ n \ge k(x) \Longrightarrow \left| \sum_{j=n+1}^{\infty} c_j \phi_j(x) \right| < \epsilon/3. \]
Since $\sum_{j=1}^{N} c_j \phi_j(x)$ is uniformly continuous on $\Omega$, $\forall \epsilon > 0$, $\exists \delta''(N)$ such that
\[ |x - y| < \delta''(N) \Longrightarrow \left| \sum_{j=1}^{N} c_j \phi_j(x) - \sum_{j=1}^{N} c_j \phi_j(y) \right| < \epsilon/3. \]
The uniform continuity of $W(x)$ implies that $\forall \epsilon > 0$, $\exists \delta'$ such that
\[ |x - y| < \delta' \Longrightarrow |W(x) - W(y)| < \epsilon/3. \]
At each $x \in \Omega$, define $\delta(x) \triangleq \min\{\delta', \delta''(k(x))\}$. Let $B(x, \delta)$ denote the ball centered at $x$ of radius $\delta$; then $\cup_{x \in \Omega} B(x, \delta(x))$ is an open covering of $\Omega$. Since $\Omega$ is compact, we can extract a finite subcover $\cup_{j=1}^{p} B(x_j, \delta(x_j)) \supseteq \Omega$. So for any $x \in \Omega$, $\exists x_0 \in \{x_j\}_{j=1}^{p}$ such that $x \in B(x_0, \delta(x_0))$. Therefore
\[ |x - x_0| < \delta(x_0) \le \delta''(k(x_0)) \Longrightarrow \left| \sum_{j=1}^{k(x_0)} c_j \phi_j(x) - \sum_{j=1}^{k(x_0)} c_j \phi_j(x_0) \right| < \epsilon/3, \]
\[ \left| \sum_{j=k(x_0)+1}^{\infty} c_j \phi_j(x_0) \right| < \epsilon/3, \]
\[ |x - x_0| < \delta(x_0) \le \delta' \Longrightarrow |W(x) - W(x_0)| < \epsilon/3. \]
So $\forall x \in \Omega$ and $\forall \epsilon > 0$, $\exists k \in \{k(x_j)\}_{j=1}^{p}$ such that $\left| \sum_{j=k+1}^{\infty} c_j \phi_j(x) \right| < \epsilon$. From the hypothesis, $\exists m$ such that $n > m$ and $\left| \sum_{j=k+1}^{\infty} c_j \phi_j(x) \right| < \epsilon$ imply $\left| \sum_{j=n+k+1}^{\infty} c_j \phi_j(x) \right| < \epsilon$. Let $K = \max_{1 \le j \le p} \{ m + k(x_j) \}$; then
\[ N \ge K \Longrightarrow \left| \sum_{j=N+1}^{\infty} c_j \phi_j(x) \right| < \epsilon, \qquad \forall x \in \Omega. \]
A.7 Proof of Theorem 5.3.1.
By lemma 5.2.1 and lemma 5.2.2 we know that there exists a unique, positive definite solution, $V^{(0)}$, to the equation $\mathrm{GHJB}(V^{(0)}, u^{(0)}) = 0$ with appropriate boundary conditions. From lemma 5.2.4 we have that $u^{(1)} \in A_{l,s}(D)$ and that $V^{(1)}(t, x) \le V^{(0)}(t, x)$. By induction we have that $V^{(i)}(t, x) \le V^{(i-1)}(t, x) \le V^{(0)}(t, x)$ and $u^{(i)} \in A_{l,s}(D)$. We can repeat the argument used in the proof of lemma 5.2.4 to show that $\forall i \ge 0$, $V^*(t, x) \le V^{(i)}(t, x)$. Hence for each $(t, x) \in D$, $V^{(i)}(t, x)$ is a monotonically decreasing sequence that is bounded below, so $V^{(i)}(t, x)$ converges to some $V^{(\infty)}(t, x)$. However, it is easy to verify that
\[ \mathrm{GHJB}\left( V^{(\infty)}, u^{(\infty)} \right) \equiv \mathrm{HJB}\left( V^{(\infty)} \right) = 0 \]
with identical boundary conditions, so $V^{(\infty)}(t, x) \equiv V^*(t, x)$.

If $\Omega$ is a compact set, then uniform convergence follows from Dini's theorem of analysis (cf. (Apostol, 1974)).
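For intuition, in the linear-quadratic special case the successive approximation just described reduces to the Riccati iteration of (Kleinman, 1968): each iterate $V^{(i)}(x) = x^T P_i x$ is obtained from a Lyapunov equation, and the control update is a gain update. The Python sketch below, with an arbitrarily chosen stabilizable pair $(A, B)$ that is not taken from the thesis, illustrates the monotone decrease of $P_i$ toward the Riccati solution $P^*$.

    # Linear-quadratic illustration of theorem 5.3.1 (Kleinman's iteration):
    # V^{(i)}(x) = x' P_i x solves a Lyapunov equation and P_i decreases to P*.
    # The system below is an arbitrary example, not one used in the thesis.
    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

    A = np.array([[0.0, 1.0], [2.0, -1.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    K = np.array([[5.0, 3.0]])                 # an initial stabilizing gain, u = -K x
    P_star = solve_continuous_are(A, B, Q, R)  # HJB (Riccati) solution

    for i in range(8):
        Acl = A - B @ K
        # Lyapunov equation for the value of the current control:
        #   Acl' P + P Acl + Q + K' R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        print(i, "max eigenvalue of P_i - P*:", np.max(np.linalg.eigvalsh(P - P_star)))
        K = np.linalg.solve(R, B.T @ P)        # improved control: K_{i+1} = R^{-1} B' P_i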
APPENDIX B
GALERKIN'S METHOD
There are many results in the literature concerning Galerkin approximations, and sufficient conditions for general classes of equations are known. The difficulty is that the Hamilton-Jacobi-Bellman equation does not satisfy the appropriate conditions. In this section we review some of the results on Galerkin approximation and demonstrate the inadequacy of the current methods when applied to the GHJB equation.

Classical references on Galerkin's method can be found in (Kantorovich and Krylov, 1958; Mikhlin, 1964; Mikhlin and Smolitskiy, 1967). These sources show that Galerkin's method applied to the equation
\[ AV = b, \]
where $A$ is a linear operator, converges if the operator is symmetric, positive definite and positive bounded below. An operator is symmetric if for all $V, W \in D(A)$
\[ \langle AV, W \rangle = \langle V, AW \rangle; \]
positive if for all $V \in D(A) \setminus \{0\}$
\[ \langle AV, V \rangle > 0; \]
and positive bounded below if there exists $\gamma > 0$ (independent of $V$) such that
\[ \langle AV, V \rangle \ge \gamma \| V \|^2 \qquad \forall V \in D(A). \]
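To fix ideas, the following Python sketch (a toy example constructed here, not the implementation used in the thesis) carries out the Galerkin projection of chapter 4 for the scalar, asymptotically stable example $f + gu = -x$ with $l = x^2$, $R = 1$ and $u = 0$ on $\Omega = [-1, 1]$. The projection $\langle \mathrm{GHJB}(V_N, u), \phi_i \rangle_\Omega = 0$ becomes a small linear algebraic system for the coefficients, and the computed solution recovers the exact value function $V = x^2/2$.

    # Toy Galerkin projection of the GHJB equation for f + g*u = -x, l = x^2, u = 0,
    # with polynomial basis functions on Omega = [-1, 1].  The exact solution is
    # V = x^2 / 2, so the computed coefficient vector should be close to [0.5, 0, 0].
    import numpy as np
    from scipy.integrate import quad

    basis  = [lambda x: x**2, lambda x: x**4, lambda x: x**6]
    dbasis = [lambda x: 2*x,  lambda x: 4*x**3, lambda x: 6*x**5]
    f_plus_gu = lambda x: -x
    l = lambda x: x**2                         # the control weighting term is zero (u = 0)

    N = len(basis)
    G = np.zeros((N, N))
    b = np.zeros(N)
    for i in range(N):
        for j in range(N):
            # <phi_j'(x) (f + g u)(x), phi_i(x)> over Omega
            G[i, j], _ = quad(lambda x: dbasis[j](x) * f_plus_gu(x) * basis[i](x), -1, 1)
        # right-hand side: -<l + ||u||_R^2, phi_i>
        b[i], _ = quad(lambda x: -l(x) * basis[i](x), -1, 1)

    c = np.linalg.solve(G, b)
    print("Galerkin coefficients:", c)          # approximately [0.5, 0.0, 0.0]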
These results are extended in (Petryshyn, 1965) so that $A$ only needs to be $K$-symmetric and $K$-positive bounded below, where $K$ is an operator that multiplies $W$ and $V$ in the equations above. In (Schultz, 1969a), the Galerkin approximation method is placed in a Hilbert space setting. The differential operators are required to be positive bounded below and symmetric. In these cases, it is shown that the Galerkin method converges and that for any finite subspace, the Galerkin method yields the best approximation from that subspace. Both linear and nonlinear operators are considered. A particular non-self-adjoint nonlinear operator is considered in (Schultz, 1969b); however, results do not exist for general non-self-adjoint and nonlinear operators. A general survey of results concerning the convergence of the Galerkin method, and error bounds, presented in the literature prior to 1972 is given in (Finlayson, 1972). A modern treatment is given in (Zeidler, 1990a) for linear operators and (Zeidler, 1990b) for nonlinear operators.

The following definition and result are given in (Zeidler, 1990a, p. 279).

Definition B.0.1 (Uniquely Approximation-Solvable) Let $X$ be a Hilbert space and $b \in X$. The linear operator equation
\[ AV = b, \qquad V \in X \qquad \mathrm{(B.1)} \]
is uniquely approximation-solvable iff the following hold:

(i) Equation (B.1) has a unique solution $V$.

(ii) There exists a number $n_0$ such that, for all $n \ge n_0$, the Galerkin equation has a unique solution $V_N$.

(iii) The Galerkin method converges, i.e., $V_N \to V$ as $N \to \infty$.

Theorem B.0.2 For each $b \in X$, equation (B.1) is uniquely approximation-solvable if $A : X \to X$ has one of the following properties:

(a) $A = I + K$, where $K : X \to X$ is linear and $\| K \| < 1$.

(b) $A = I + C$, where $C : X \to X$ is linear and compact, and $AV = 0$ implies $V = 0$.

(c) $A : X \to X$ is linear, continuous and positive bounded below.

(d) $A = B + C$, where $B : X \to X$ is linear, continuous, and positive bounded below, $C : X \to X$ is linear and compact, and $AV = 0$ implies $V = 0$.
We will show that the linear operator associated with the GHJB equation does not satisfy any of these conditions.

Proposition B.0.3 Given an asymptotically stable vector field $f + gu$, the differential operator
\[ AV = -\frac{\partial V}{\partial x}^T (f + gu) \]
does not satisfy (a), (b), (c) or (d) in theorem B.0.2.
Proof: Since a linear operator is continuous if and only if it is bounded, and compactness implies boundedness (cf. (Kreyszig, 1978)), conditions (a), (b), (c) and (d) all imply that $A$ is a bounded operator. We will show that $A$ is not bounded; it is sufficient to exhibit one concrete example. Consider the following vector field in $\mathbb{R}^1$: $f + gu = -x$. Define the following sequence of functions on $\Omega = [-1, 1]$:
\[ V_m = x^{2m}. \]
Then $A V_m = 2m\, x^{2m}$, so
\[ \sup_{\| V_m \| \ne 0} \frac{\| A V_m \|^2}{\| V_m \|^2} = \sup_m \frac{\int_{-1}^{1} 4 m^2 x^{4m} \, dx}{\int_{-1}^{1} x^{4m} \, dx} = \sup_m 4 m^2 = \infty. \]
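The quotient above can also be checked numerically; the short Python sketch below (an added illustration) evaluates $\| A V_m \|^2 / \| V_m \|^2$ for a few values of $m$ and reproduces the value $4 m^2$.

    # Numerical check of the unboundedness argument: for f + g*u = -x and V_m = x^(2m),
    # A V_m = 2m x^(2m), so ||A V_m||^2 / ||V_m||^2 = 4 m^2, which grows without bound.
    import numpy as np
    from scipy.integrate import quad

    for m in [1, 2, 5, 10]:
        num, _ = quad(lambda x: (2*m*x**(2*m))**2, -1, 1)
        den, _ = quad(lambda x: (x**(2*m))**2, -1, 1)
        print(m, num/den)                       # equals 4*m**2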

In the following propositions we show that $A$ is neither symmetric nor positive bounded below.

Proposition B.0.4 Under the conditions of proposition B.0.3, $A$ is not symmetric.

Proof: To prove the proposition we construct a simple counterexample. Let $f + gu = -x$, $\Omega = [-1, 1]$, $U = x^2$ and $W = x^4$. Then
\[ \left\langle \frac{\partial U}{\partial x} (f + gu), W \right\rangle = \int_{-1}^{1} -2 x^6 \, dx \ne \int_{-1}^{1} -4 x^6 \, dx = \left\langle U, \frac{\partial W}{\partial x} (f + gu) \right\rangle. \]
Proposition B.0.5 Under the conditions of proposition B.0.3, $A$ is not positive bounded below.

Proof: We again construct a simple counterexample. Let $f + gu = -x$. Define
\[ \hat V_j = |x|^{1/j}, \qquad j = 1, 2, \ldots, \]
and let $V_j$ be the function obtained by mollifying $\hat V_j$ so that $V_j$ is continuously differentiable at $0$, but its integral, and the integral of its derivative along $f + gu$, are only changed by $\epsilon$. If $A$ were positive bounded below with constant $\gamma > 0$, then
\[ \left\langle -\frac{\partial V_j}{\partial x} (f + gu), V_j \right\rangle \ge \gamma \left\langle V_j, V_j \right\rangle \]
\[ \Longrightarrow \quad \frac{1}{j} \int_{-1}^{1} |x|^{2/j} \, dx + \epsilon \ge \gamma \int_{-1}^{1} |x|^{2/j} \, dx \]
\[ \Longrightarrow \quad \frac{2}{2 + j} + \epsilon \ge \gamma \, \frac{2j}{2 + j} \]
\[ \Longrightarrow \quad 2 + \epsilon (2 + j) \ge 2 \gamma j, \qquad j = 1, 2, \ldots, \]
which is impossible for any fixed $\gamma > 0$, since $\epsilon$ can be taken arbitrarily small and $j$ arbitrarily large.
To our knowledge, these conditions encompass the known sufficient conditions for the convergence of the Galerkin approximation method. Since they do not include the Generalized-Hamilton-Jacobi-Bellman equation, the convergence results in this thesis are necessary to justify the algorithm in chapter 4.
APPENDIX C
LIST OF SYMBOLS
$\Omega$ ≜ Compact subset of $\mathbb{R}^n$ containing a ball around the origin, p. 16
$\|u\|_R^2$ ≜ $u^T R u$, p. 18
$\varphi(t; t_0, x_0, u)$ ≜ The solution to $\dot x = f + gu$ with initial conditions $(t_0, x_0)$, p. 17
$A_l(\Omega)$ ≜ The set of admissible controls, p. 18
$A_{l,s}(D)$ ≜ The set of admissible controls, p. 24
$f$ ≜ System drift term, p. 22
$n$ ≜ The size of the state space, p. 22
$g$ ≜ System term multiplying the control, p. 22
$l$ ≜ Performance weighting function on the state, p. 23
$R$ ≜ Performance weighting matrix on the control, p. 23
$s$ ≜ Performance weighting function on the final state, p. 23
$D$ ≜ $[t_0, t_f] \times \Omega$, p. 24
GHJB ≜ Generalized-Hamilton-Jacobi-Bellman, p. 25
HJB ≜ Hamilton-Jacobi-Bellman, p. 25
$\mathrm{GHJB}(V, u)$ ≜ $\frac{\partial V}{\partial t} + \frac{\partial V}{\partial x}^T (f + gu) + l + \|u\|_R^2$, p. 25
$\mathrm{HJB}(V)$ ≜ $\frac{\partial V}{\partial t} + \frac{\partial V}{\partial x}^T f + l - \frac{1}{4} \frac{\partial V}{\partial x}^T g R^{-1} g^T \frac{\partial V}{\partial x}$, p. 27
$V^{(i)}$ ≜ The $i$th performance index in the successive approximation algorithm, p. 27
$u^{(i)}$ ≜ The $i$th control in the successive approximation algorithm, p. 27
$V^*$ ≜ The solution of the HJB equation, p. 26
$u^*$ ≜ The optimal control, p. 26
$\Phi_N$ ≜ $(\phi_1, \ldots, \phi_N)^T$, p. 33
$\nabla \Phi_N$ ≜ The Jacobian of $\Phi_N$, p. 33
$c_N^{(i)}$ ≜ $\left( c_1^{(i)}, \ldots, c_N^{(i)} \right)^T$, p. 34
$\phi_j$ ≜ Basis functions, p. 32
$V_N^{(i)}$ ≜ $N$th Galerkin approximation to $V^{(i)}$, p. 32
$u_N^{(i)}$ ≜ $N$th approximation to $u^{(i)}$, p. 34
$\langle f, g \rangle_\Omega$ ≜ $\int_\Omega f(x) g(x) \, dx$, p. 33
$\langle \cdot, \Phi_N \rangle_\Omega$ ≜ Vector or matrix of inner products, p. 33
$u$ ≜ An arbitrary admissible control, p. 43
$V_N$ ≜ Solution to $\langle \mathrm{GHJB}(V_N, u), \Phi_N \rangle_\Omega = 0$, p. 43
$\hat V$ ≜ Solution to $\mathrm{GHJB}(\hat V, u) = 0$, p. 43
$u_N$ ≜ $-\frac{1}{2} R^{-1} g^T \frac{\partial V_N}{\partial x}$, p. 43
$\hat u$ ≜ $-\frac{1}{2} R^{-1} g^T \frac{\partial \hat V}{\partial x}$, p. 43
$c_N$ ≜ $(c_1, \ldots, c_N)^T$, p. 43
$\hat c_N$ ≜ $(\hat c_1, \ldots, \hat c_N)^T$, p. 43
$\mathrm{PD}(\Omega)$ ≜ Pointwise decreasing, p. 51
$\hat V^{(i)}$ ≜ Solution of $\mathrm{GHJB}\left( \hat V^{(i)}, u_N^{(i)} \right) = 0$, p. 63