
CHAPTER

5 Optimal Control

BEFORE me grew the human soul,
And after I am dead and gone,
Through grades of effort and control
The marvellous work shall still go on.
—Archibald Lampman

If everything seems under control, you’re just not going fast enough.
—Mario Andretti

One of the essential elements of the control problem, discussed in Section 1.1, is a means
of testing the performance of any proposed control law. Whenever we use the term “best” or
“optimal” to describe the effectiveness of a given control strategy, we do so with respect to
some numerical index of performance called the performance index. We assume that the value
of the performance index decreases as the quality of the given admissible control law increases.
The admissible controller that ensures the completion of the system objective and at the same
time minimizes the performance index is called an optimal controller for the system. In this
chapter, we discuss various methods of constructing optimal controllers.

5.1 Performance Indices


Constructing a performance index—that is, choosing a means to measure the system
performance—can be considered a part of the system modeling. Here, we discuss some typical
choices of performance indices.
First, suppose that the objective is to control a dynamical system modeled by the equations
ẋ(t) = Ax(t) + Bu(t), x(t0 ) = x 0 , (5.1)
y(t) = C x(t) (5.2)

on a fixed interval [t0 , t f ] so that the components of the state vector are “small.” A suitable
performance index to be minimized would be
J_1 = \int_{t_0}^{t_f} x^T(t) x(t) \, dt.   (5.3)


Obviously, if J_1 is small, then the state vector norm, \|x(t)\|, is small in the sense of the above
performance index.
If the objective is to control the system so that the components of the output, y(t), are to be
small, then we could use the performance index
J_2 = \int_{t_0}^{t_f} y^T(t) y(t) \, dt = \int_{t_0}^{t_f} x^T(t) C^T C x(t) \, dt = \int_{t_0}^{t_f} x^T(t) Q x(t) \, dt,   (5.4)

where the weight matrix Q = C T C is symmetric positive semidefinite.


In the case when we wish to control the system in such a manner that the components of the
input, u(t), are “not too large,” a suitable performance index to be minimized is
J_3 = \int_{t_0}^{t_f} u^T(t) u(t) \, dt,   (5.5)

or

J_4 = \int_{t_0}^{t_f} u^T(t) R u(t) \, dt,   (5.6)

where the weight matrix R is symmetric positive definite. There is no loss of generality in
assuming the weight matrix R to be symmetric. If R were not symmetric, we could represent the
quadratic term u^T R u equivalently as

u^T R u = u^T \left( \frac{R + R^T}{2} \right) u,

where the matrix \frac{1}{2}(R + R^T) is symmetric.
As pointed out by Owens [218, p. 186], we cannot simultaneously minimize the performance
indices (5.3) and (5.5) because minimization of (5.3) requires large control signals, while min-
imization of (5.5) requires small control signals. To solve the dilemma, we could compromise
between the two conflicting objectives by minimizing the performance index that is a convex
combination of J1 and J3 ,
J = \lambda J_1 + (1 - \lambda) J_3 = \int_{t_0}^{t_f} \left( \lambda\, x^T(t) x(t) + (1 - \lambda)\, u^T(t) u(t) \right) dt,   (5.7)

where \lambda is a parameter in the range [0, 1]. If \lambda = 1, then J = J_1; and if \lambda = 0, then J = J_3. By trial
and error, we could select \lambda from the interval [0, 1] to compromise between the two extremes.
A generalization of the performance index (5.7) is
J = \int_{t_0}^{t_f} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt.

In certain applications, we may wish the final state x(t f ) to be as close as possible to 0. Then,
a possible performance measure to be minimized is

x^T(t_f) F x(t_f),   (5.8)

where F is a symmetric positive definite matrix.


We can combine the performance measures (5.4), (5.6), and (5.8) when our control aim is to
keep the state “small,” the control “not too large,” and the final state as near to 0 as possible.
The resulting performance index is

J = \frac{1}{2}\, x^T(t_f) F x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt,   (5.9)

where the factor 1/2 is placed to simplify subsequent algebraic manipulations. Minimizing


(5.9) subject to (5.1) is called the linear quadratic regulator problem, or the LQR problem for
short.
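As a brief computational aside (not part of the original text), the finite-horizon index (5.9) can be evaluated numerically along a simulated trajectory. The sketch below uses arbitrary placeholder matrices and an arbitrary stabilizing gain purely to illustrate the computation.

```python
# A sketch (with arbitrary placeholder data) of evaluating the index (5.9)
# along a trajectory of x' = Ax + Bu under a fixed state feedback u = -Kx.
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); F = np.eye(2)
K = np.array([[1.0, 1.0]])                 # some admissible feedback gain
t0, tf, x0 = 0.0, 5.0, np.array([1.0, 0.0])

sol = solve_ivp(lambda t, x: (A - B @ K) @ x, (t0, tf), x0, max_step=0.01)
x = sol.y                                  # state samples, shape (n, N)
u = -K @ x
xQx = np.einsum('it,ij,jt->t', x, Q, x)    # x'(t) Q x(t) at each sample
uRu = np.einsum('it,ij,jt->t', u, R, u)
J = 0.5 * x[:, -1] @ F @ x[:, -1] + 0.5 * trapezoid(xQx + uRu, sol.t)
print(J)
```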
In some cases, the controller’s goal is to force the system state to track a desired state trajectory,
x_d(t), throughout the interval [t_0, t_f], while keeping the deviations of the actual state x(t) from
the desired trajectory “small,” the control effort u(t) “not too large,” and the final state x(t_f)
as near as possible to some desired state x_d(t_f). A suitable performance index to
be minimized in this case might be

J = \frac{1}{2} (x(t_f) - x_d(t_f))^T F (x(t_f) - x_d(t_f)) + \frac{1}{2} \int_{t_0}^{t_f} \left( (x(t) - x_d(t))^T Q (x(t) - x_d(t)) + u^T(t) R u(t) \right) dt.   (5.10)

5.2 A Glimpse at the Calculus of Variations


The goal of this section is to provide a brief introduction to the subject of the calculus of
variations. The calculus of variations deals with minimization or maximization of functionals.
A functional is a mapping or a transformation that depends on one or more functions and the
values of the functional are numbers. Examples of functionals are performance indices that we
introduced in the previous section. We now give two more examples of functionals.

◆ Example 5.1
The length l of the arc connecting two given points A( x0 , y0 ) and B( x1 , y1 ) in the
x–y plane, as depicted in Figure 5.1, is an example of a functional. We know that
l = l(y(x)) = \int_{x_0}^{x_1} \sqrt{1 + (dy(x)/dx)^2} \, dx,

subject to the constraints y(x_0) = y_0 and y(x_1) = y_1.
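To make the idea concrete, here is a small numeric sketch (not part of the text): the arc-length functional assigns a number to each curve joining the same two points, and different curves give different values.

```python
# A numeric illustration of the arc-length functional l(y): each curve y(x)
# joining A(0, 0) and B(1, 1) is mapped to a number.
import numpy as np

def arc_length(dy, x0=0.0, x1=1.0, n=100_001):
    # l(y) = integral of sqrt(1 + (dy/dx)^2) dx, via the trapezoidal rule
    x = np.linspace(x0, x1, n)
    f = np.sqrt(1.0 + dy(x) ** 2)
    return (x[1] - x[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

print(arc_length(lambda x: np.ones_like(x)))   # straight line y = x:  ~1.4142
print(arc_length(lambda x: 2.0 * x))           # parabola y = x^2:     ~1.4789
```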


Figure 5.1 The length of arcs joining two given points in the plane is an example of a functional.

◆ Example 5.2
Another example of a functional is the area of the surface of revolution formed by
a surface connecting two circular disks as illustrated in Figure 5.2. In particular, the
surface of revolution can be formed by a soap film clinging to two circular disks.
When a wire circle is dipped into a soap solution and withdrawn, a circular disk
of soap film bounded by the circle is formed. If a second smaller circle is made to
touch the disk and then moved away, the two circles will be joined by a surface
of film that is a surface of revolution in the particular case when the circles are
parallel and have their centers on the same axis perpendicular to their planes. If
we denote the arc of intersection with the x–y plane by y = y( x), then the area of
the surface of revolution is
v = v(y(x)) = \int_{x_0}^{x_1} 2\pi\, y(x) \sqrt{1 + (dy(x)/dx)^2} \, dx,

subject to the constraints y(x_0) = y_0, y(x_1) = y_1. One may then want to minimize
v = v(y(x)) to find y = y(x) resulting in the surface of revolution of minimum area.

Figure 5.2 The area of surfaces of revolution connecting two circular disks is an example of a functional.

Figure 5.3 Illustration of the brachistochrone problem.

Figure 5.4 The problem of finding geodesics.

The problem of optimization of functionals dates back to Johann Bernoulli. In 1696 he published
a letter in which he posed the problem, which is now known as the brachistochrone problem.
Brachistochrone translates from Greek as the shortest time. The brachistochrone problem can be
described as follows. A particle of mass m is to slide down along a frictionless track connecting
two given points. The motion of the particle is to be constrained only by gravity and the shape
of the curve of the track. Find the curve of the track that minimizes the travel time between the
two given points. We illustrate the problem in Figure 5.3. The brachistochrone problem was
solved by Johann Bernoulli himself, his brother Jacob, Isaac Newton, the Marquis de L’Hôpital,
Gottfried Leibniz, and Tschirnhaus. A solution to the brachistochrone problem is presented in
Example 5.23. In 1697 Johann Bernoulli posed and solved the geodesics problem, which can be
formulated as follows: Find a line of shortest length that connects two points on a given surface;
see Figure 5.4 for an illustration of the problem. The curves of the least length connecting pairs
of points on a surface are called geodesics. A general procedure for solving the above type
of problems, using the calculus of variations, was given by Leonhard Euler (who flourished
between 1707 and 1783).
To acquaint ourselves with the methods of the calculus of variations, we first need to introduce
concepts concerning functionals.

5.2.1 Variation and Its Properties


A function can be viewed as a rule, or a mapping, that assigns to each element of some set a
unique element of a possibly different set. In particular, a function x : R → R of a real variable t

assigns to each real number a unique real number. An increment of the argument of a function of
one variable t is \Delta t = t - t_1.
Similarly, a functional is a mapping that assigns to each function, from some class of func-
tions, a unique number. Thus, we can say that a functional is a “function of a function.” Let
x : R → R be an argument of a functional. By a variation δx of an argument x of a functional
v we mean the difference of two functions
δx = x − x1 .
We assume that x can change in an arbitrary way in some class of functions.
Recall that a function x : R → R is continuous if a small change of its argument t corresponds
to a small change of the value x(t) of the function. Similarly, a functional v is said to be continuous
if a “small” change of its argument x corresponds to a small change of the value of the functional.
A functional v is called linear if
v(ax1 + x2 ) = av(x1 ) + v(x2 ),
where a is a constant.
We will now introduce the notion of the variation of a functional. This notion is analogous to
the notion of a function differential. To connect the two, we first consider the case of a function
of one variable. Let x be a differentiable function defined on an open interval U , and let t ∈ U .
The derivative of x at t is defined as

x'(t) = \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t}.   (5.11)

Let

\varphi(\Delta t) = \frac{x(t + \Delta t) - x(t)}{\Delta t} - x'(t).   (5.12)

The function \varphi is not defined at \Delta t = 0; however,

\lim_{\Delta t \to 0} \varphi(\Delta t) = 0.

We rewrite (5.12) as

x(t + \Delta t) - x(t) = x'(t)\,\Delta t + \varphi(\Delta t)\,\Delta t.   (5.13)

The above relation has meaning only when \Delta t \neq 0. To make it hold at \Delta t = 0, we define
\varphi(\Delta t)|_{\Delta t = 0} = 0. To proceed, let

\beta(\Delta t) = \varphi(\Delta t) \ \text{if}\ \Delta t > 0, \qquad \beta(\Delta t) = -\varphi(\Delta t) \ \text{if}\ \Delta t < 0.   (5.14)

Then, if x is differentiable, there exists a function \beta such that

x(t + \Delta t) - x(t) = \Delta x = x'(t)\,\Delta t + \beta(\Delta t)\,|\Delta t| = L(t, \Delta t) + \beta(\Delta t)\,|\Delta t|,   (5.15)

where \lim_{\Delta t \to 0} \beta(\Delta t) = 0, and L(t, \Delta t) = x'(t)\,\Delta t is a linear function in \Delta t.
For a real-valued function f = f(x) : R^n \to R, relation (5.15) takes the form

f(x + \Delta x) - f(x) = \Delta f = Df(x)\,\Delta x + \beta(\Delta x)\,\|\Delta x\| = L(x, \Delta x) + \beta(\Delta x)\,\|\Delta x\|,   (5.16)

where \lim_{\|\Delta x\| \to 0} \beta(\Delta x) = 0, and

L(x, \Delta x) = Df(x)\,\Delta x = \nabla f(x)^T \Delta x

is a linear function in \Delta x. Relation (5.16) holds for operators defined over Banach spaces with
scalars being either real or complex numbers. The functionals we consider are operators over a
Banach space of continuous functions, C([t_0, t_1]), with the norm given by (A.99) on page 656.
If an increment \Delta v = v(x + \delta x) - v(x) of a functional v can be represented as

\Delta v = L(x, \delta x) + \beta(\delta x)\,\|\delta x\|,

where L(x, \delta x) is a linear functional with respect to \delta x, the term \|\delta x\| = \max_{t \in [t_0, t_1]} |\delta x| denotes
the maximal value of |\delta x|, and \beta(\delta x) \to 0 as \|\delta x\| \to 0, then the linear part L of \Delta v is called
the variation of the functional and is denoted \delta v; that is,

\delta v = L(x, \delta x).

◆ Example 5.3
We find the variation of the functional
v = \int_0^1 \left( 2x^2(t) + x(t) \right) dt.

For this we first calculate its increment to get

\Delta v = v(x + \delta x) - v(x)
   = \int_0^1 \left( 2(x + \delta x)^2 + (x + \delta x) \right) dt - \int_0^1 \left( 2x^2 + x \right) dt
   = \int_0^1 \left( 2x^2 + 4x\,\delta x + 2(\delta x)^2 + x + \delta x - 2x^2 - x \right) dt
   = \int_0^1 (4x + 1)\,\delta x \, dt + 2 \int_0^1 (\delta x)^2 \, dt.

The linear part of \Delta v is

\delta v = \int_0^1 (4x + 1)\,\delta x \, dt,

which is the variation of the given functional.

To proceed, note that the linear part on the right-hand side of (5.16) can be computed as

Df(x)\,\Delta x = L(x, \Delta x) = \left. \frac{d}{d\alpha} f(x + \alpha\,\Delta x) \right|_{\alpha = 0}.   (5.17)
As mentioned before, the above also holds for operators over Banach spaces other than Euclidean

spaces. Using this observation, we can state and prove a lemma that will allow us to calculate
the variation of a given functional more efficiently than the method we used in Example 5.3.

Lemma 5.1

\delta v = \left. \frac{d}{d\alpha} v(x + \alpha\,\delta x) \right|_{\alpha = 0}.

Proof Suppose that for a given functional v there exists its variation. This means that
we can represent \Delta v as

\Delta v = v(x + \alpha\,\delta x) - v(x) = L(x, \alpha\,\delta x) + \beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|.

Then, the derivative of v(x + \alpha\,\delta x) with respect to \alpha evaluated at \alpha = 0 is equal to

\lim_{\alpha \to 0} \frac{\Delta v}{\alpha} = \lim_{\alpha \to 0} \frac{L(x, \alpha\,\delta x) + \beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha}
   = \lim_{\alpha \to 0} \frac{L(x, \alpha\,\delta x)}{\alpha} + \lim_{\alpha \to 0} \frac{\beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha}
   = L(x, \delta x),

because L(\cdot, \cdot) is linear with respect to the second argument, and hence

L(x, \alpha\,\delta x) = \alpha L(x, \delta x).

Furthermore,

\lim_{\alpha \to 0} \frac{\beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha} = \lim_{\alpha \to 0} \beta(\alpha\,\delta x)\,\|\delta x\| = 0.

This completes the proof.

◆ Example 5.4
We will now use Lemma 5.1 to find the variation of the functional considered in
Example 5.3. We have
\delta v = \left. \frac{d}{d\alpha} v(x + \alpha\,\delta x) \right|_{\alpha = 0}
   = \left. \frac{d}{d\alpha} \int_0^1 \left( 2(x + \alpha\,\delta x)^2 + (x + \alpha\,\delta x) \right) dt \right|_{\alpha = 0}
   = \left. \int_0^1 \left( 4(x + \alpha\,\delta x)\,\delta x + \delta x \right) dt \right|_{\alpha = 0}
   = \int_0^1 (4x + 1)\,\delta x \, dt,

which is the variation of the given functional in Example 5.3.
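The differentiation in Lemma 5.1 can also be carried out symbolically. The following sketch (assuming the sympy package is available) repeats the key step of Example 5.4: differentiate the integrand of v(x + α δx) with respect to α and set α = 0.

```python
# A symbolic sketch of Lemma 5.1 applied to the functional of Example 5.3.
import sympy as sp

t, alpha = sp.symbols('t alpha')
x = sp.Function('x')(t)
dx = sp.Function('deltax')(t)                    # plays the role of delta x

integrand = 2*(x + alpha*dx)**2 + (x + alpha*dx)
dI = sp.expand(sp.diff(integrand, alpha).subs(alpha, 0))
print(dI)   # prints the expanded form of (4*x(t) + 1)*deltax(t), as in Example 5.3
```

Integrating the printed expression over [0, 1] reproduces the variation found above.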



◆ Example 5.5
We will find the variation of the functional
v(x_1, x_2) = v(x_1(t), x_2(t)) = \int_0^1 x_1 e^{-x_2} \, dt,

using the formula

\delta v = \left. \frac{d}{d\alpha} v(x + \alpha\,\delta x) \right|_{\alpha = 0}.

We have

\delta v = \left. \frac{d}{d\alpha} \int_0^1 (x_1 + \alpha\,\delta x_1) e^{-(x_2 + \alpha\,\delta x_2)} \, dt \right|_{\alpha = 0}
   = \left. \int_0^1 \frac{d}{d\alpha} \left( (x_1 + \alpha\,\delta x_1) e^{-(x_2 + \alpha\,\delta x_2)} \right) dt \right|_{\alpha = 0}
   = \left. \int_0^1 \left( \delta x_1\, e^{-(x_2 + \alpha\,\delta x_2)} - \delta x_2\, e^{-(x_2 + \alpha\,\delta x_2)} (x_1 + \alpha\,\delta x_1) \right) dt \right|_{\alpha = 0}
   = \int_0^1 \left( e^{-x_2}\,\delta x_1 - x_1 e^{-x_2}\,\delta x_2 \right) dt.

Thus,

\delta v = \int_0^1 \left( e^{-x_2}\,\delta x_1 - x_1 e^{-x_2}\,\delta x_2 \right) dt.

Definition 5.1 A functional v is maximal on some function x_0 if the value of the functional
on any function x close to x_0 is less than or equal to that on x_0, that is,

\Delta v = v(x) - v(x_0) \leq 0.

We will now give a first-order necessary condition for a given functional to achieve a maximum
or minimum on a curve.

Theorem 5.1 If for a functional v there exists a variation and the functional reaches a
maximum or a minimum on a curve x0 , then
δv( x0 ) = 0.
Proof Suppose that x_0 and \delta x are fixed. Then

v(x_0 + \alpha\,\delta x) = \Phi(\alpha);

that is, v(x_0 + \alpha\,\delta x) is a function of \alpha that for \alpha = 0 achieves its maximum or minimum.
This implies that \frac{d\Phi}{d\alpha}\big|_{\alpha = 0} = 0, or, equivalently,

\left. \frac{d}{d\alpha} v(x_0 + \alpha\,\delta x) \right|_{\alpha = 0} = 0.

Hence, by Lemma 5.1, we have δv = 0. Therefore, a functional that achieves its


maximum or a minimum on a function x0 has its variation evaluated at x0 equal to
zero.

The above results will be used in the next section to determine conditions that a curve has to
satisfy to be a maximizer or a minimizer, that is, to be an extremal for a given functional.

5.2.2 Euler–Lagrange Equation


In this section, we consider the following optimization problem.

Problem 1 Suppose that we are given two points (t0 , x0 ) and (t1 , x1 ) in the (t, x)-plane. We
wish to find a curve, or trajectory, joining the given points such that the functional
v(x) = \int_{t_0}^{t_1} F(t, x, \dot{x}) \, dt   (5.18)

along this trajectory can achieve its extremal—that is, maximal or minimal—value. The problem
is illustrated in Figure 5.5. In solving Problem 1, we use the following lemma.

Lemma 5.2 Suppose that a function \phi : R \to R is continuous on the interval [t_0, t_1].
Then,

\int_{t_0}^{t_1} \phi(t)\,\delta x(t) \, dt = 0

for every variation \delta x if and only if \phi(t) = 0 at every point of the interval [t_0, t_1].

Figure 5.5 Illustration of Problem 1.



Figure 5.6 Illustration of the proof of Lemma 5.2.

"t
Proof Sufficiency is obvious because we can satisfy t01 φ(t)δx(t)dt = 0 by choosing
φ that vanishes at every point of the interval [t0 , t1 ].
We prove necessity by contraposition.
"t We thus prove that if φ is nonzero on some
subinterval of [t0 , t1 ], then t01 φ(t)δx(t)dt = 0. Specifically, assume, without loss of
generality, that φ is positive on the interval [a, b], where t0 ≤ a ≤ b ≤ t1 . Note that
if φ were nonzero at only one point, then because it is continuous, it would have to
be nonzero in some neighborhood of that point. Hence, we can assume that a < b.
Consider now a variation given by
 2
k (t − a) 2 (t − b) 2 for a ≤ t ≤ b,
δx = δx(t) = (5.19)
0 elsewhere.
If we assumed φ to be negative on the interval [a, b], then we would use δx to be
negative of (5.19). We illustrate δx in Figure 5.6. Because " t δx is positive in [a, b] and
zero elsewhere in the interval [t0 , t1 ], the integrand of t01 φ(t)δx(t)dt will have the
same sign as φ in [a, b] and will be zero outside this subinterval. Thus
t1 b
φ(t)δx(t)dt = φ(t)δx(t)dt = 0,
t0 a

and the proof is complete.

Solution to Problem 1 We assume that there is an optimal trajectory x = x(t) joining the points
(t_0, x_0) and (t_1, x_1) such that \delta v(x(t)) = 0. Let us then consider an arbitrary acceptable curve
x^* = x^*(t) that is close to x and from a family of curves

x(t, \alpha) = x(t) + \alpha (x^*(t) - x(t)) = x(t) + \alpha\,\delta x(t).

Note that for \alpha = 0 we get x(t), and for \alpha = 1 we obtain x^*(t). Furthermore,

\frac{d}{dt}(\delta x(t)) = \frac{d}{dt}(x^*(t) - x(t)) = \delta x',

and analogously

\frac{d^2}{dt^2}(\delta x(t)) = \frac{d^2}{dt^2}(x^*(t) - x(t)) = \delta x''.

We are now ready to investigate the behavior of the functional v given by (5.18) on the curves
from the family x(t, \alpha). Note that the functional v can be considered as a function of \alpha, that
is, v(x(t, \alpha)) = \Phi(\alpha). It follows from the first-order necessary condition for a point to be an
extremizer of a function of one variable that \alpha = 0 is a candidate to be an extremizer of \Phi if

\left. \frac{d\Phi(\alpha)}{d\alpha} \right|_{\alpha = 0} = 0,

where

\Phi(\alpha) = \int_{t_0}^{t_1} F(t, x(t, \alpha), x'(t, \alpha)) \, dt,

and x'(t, \alpha) = \frac{d}{dt} x(t, \alpha). We now evaluate \frac{d\Phi(\alpha)}{d\alpha}:

\frac{d\Phi(\alpha)}{d\alpha} = \int_{t_0}^{t_1} \left( F_x \frac{d}{d\alpha} x(t, \alpha) + F_{x'} \frac{d}{d\alpha} x'(t, \alpha) \right) dt,

where F_x = \frac{\partial}{\partial x} F(t, x(t, \alpha), x'(t, \alpha)) and F_{x'} = \frac{\partial}{\partial x'} F(t, x(t, \alpha), x'(t, \alpha)). Because

\frac{d}{d\alpha} x(t, \alpha) = \frac{d}{d\alpha} (x(t) + \alpha\,\delta x(t)) = \delta x(t)

and

\frac{d}{d\alpha} x'(t, \alpha) = \frac{d}{d\alpha} (x'(t) + \alpha\,\delta x'(t)) = \delta x'(t),

we can write

\frac{d}{d\alpha} \Phi(\alpha) = \int_{t_0}^{t_1} \left( F_x(t, x(t, \alpha), x'(t, \alpha))\,\delta x(t) + F_{x'}(t, x(t, \alpha), x'(t, \alpha))\,\delta x'(t) \right) dt.

By Lemma 5.1,

\frac{d}{d\alpha} \Phi(0) = \delta v.

Therefore,

\delta v = \int_{t_0}^{t_1} \left( F_x\,\delta x + F_{x'}\,\delta x' \right) dt,

where for convenience we dropped the argument t. Integrating the last term by parts, along with
taking into account that \delta x' = (\delta x)', yields

\delta v = \left. (F_{x'}\,\delta x) \right|_{t_0}^{t_1} + \int_{t_0}^{t_1} \left( F_x - \frac{d}{dt} F_{x'} \right) \delta x \, dt.

The end points (t0 , x0 ) and (t1 , x1 ) are fixed. This implies that
δx|t=t0 = x ∗ (t0 ) − x(t0 ) = 0,

and

\delta x|_{t = t_1} = x^*(t_1) - x(t_1) = 0.

Hence

\delta v = \int_{t_0}^{t_1} \left( F_x - \frac{d}{dt} F_{x'} \right) \delta x \, dt.

By Lemma 5.2,

\delta v = 0 \quad \text{if and only if} \quad F_x - \frac{d}{dt} F_{x'} = 0.

The equation F_x - \frac{d}{dt} F_{x'} = 0 can also be represented as

F_x - F_{x' t} - F_{x' x}\, x' - F_{x' x'}\, x'' = 0.

The equation

F_x - \frac{d}{dt} F_{x'} = 0   (5.20)

is known as the Euler–Lagrange equation. The curves x = x(t, C1 , C2 ) that are solutions to
the Euler–Lagrange equation are called extremals. The constants C1 and C2 are the integration
constants. Thus, a functional can only achieve its extremum on the extremals.

◆ Example 5.6
We find the extremal for the functional
v = \int_0^{\pi/2} \left( (x')^2 - x^2 \right) dt,   (5.21)

where x(0) = 0, and x(\pi/2) = 1. The Euler–Lagrange equation for the problem
takes the form

F_x - \frac{d}{dt} F_{x'} = \frac{\partial}{\partial x}\left( (x')^2 - x^2 \right) - \frac{d}{dt} \frac{\partial}{\partial x'}\left( (x')^2 - x^2 \right)
   = -2x - \frac{d}{dt}(2x')
   = -2x - 2x''
   = 0.

Equivalently, the Euler–Lagrange equation for the functional (5.21) can be repre-
sented as

x'' + x = 0.
The characteristic equation of the above differential equation is s 2 + 1 = 0.

Therefore, the general solution of the Euler–Lagrange equation has the form

x(t) = C_1 \cos t + C_2 \sin t.

Taking into account the end-point conditions gives

x(0) = C_1 = 0, \qquad x(\pi/2) = C_2 = 1.

Thus, we have one extremal

x(t) = \sin t.

The functional (5.21) is minimized or maximized only on the extremal x(t) = \sin t.

◆ Example 5.7
We now find the extremal for the functional
v = \int_0^1 \left( (x')^2 + 12xt \right) dt,   (5.22)

where x(0) = 0, and x(1) = 1. The Euler–Lagrange equation for this functional
is

F_x - \frac{d}{dt} F_{x'} = \frac{\partial}{\partial x}\left( (x')^2 + 12xt \right) - \frac{d}{dt} \frac{\partial}{\partial x'}\left( (x')^2 + 12xt \right)
   = 12t - \frac{d}{dt}(2x')
   = 12t - 2x''
   = 0.

Equivalently, the Euler–Lagrange equation for the functional (5.22) can be repre-
sented as

x'' = 6t.

The general solution to the above differential equation is

x(t) = C_1 t + C_2 + t^3.

From the end-point conditions, we get

x(0) = C_2 = 0, \qquad x(1) = C_1 + 1 = 1.

Hence, C_1 = 0, and the extremal has the form

x(t) = t^3.
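This extremal can be cross-checked symbolically. The sketch below (assuming the sympy package is available) forms the Euler–Lagrange equation of (5.22) and solves it with the given end-point conditions.

```python
# A symbolic check of Example 5.7: the Euler-Lagrange equation of (5.22)
# with x(0) = 0 and x(1) = 1 yields the extremal x(t) = t**3.
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
x = sp.Function('x')

F = x(t).diff(t)**2 + 12*x(t)*t                  # integrand of (5.22)
el = euler_equations(F, x(t), t)[0]              # 12*t - 2*x''(t) = 0
sol = sp.dsolve(el, x(t), ics={x(0): 0, x(1): 1})
print(sol)                                       # Eq(x(t), t**3)
```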

Suppose now that A = A(t0 , x10 , . . . , xn0 ) and B = B(t1 , x11 , . . . , xn1 ) are two given points
in the (n + 1)-dimensional space (t, x1 , . . . , xn ), along with the functional
v = v(x_1, \ldots, x_n) = \int_{t_0}^{t_1} F(t, x_1, \ldots, x_n, \dot{x}_1, \ldots, \dot{x}_n) \, dt.

To find a trajectory along which δv = 0, we consider only the variation in one coordinate, say xi ,
while other coordinates are fixed. We perform this process for each coordinate. The functional
v will depend only upon xi , and we write

v(x1 , . . . , xi , . . . , xn ) = ṽ(xi ).

Thus, we reduced the given n-dimensional problem to n one-dimensional problems. Proceeding


as before, we conclude that the functional v = v(x1 , . . . , xn ) can only be extremized on the
trajectories that satisfy n Euler–Lagrange equations,

F_{x_i} - \frac{d}{dt} F_{\dot{x}_i} = 0, \qquad i = 1, \ldots, n.   (5.23)

Observe that if in (5.23) we set F = L = K − U , where L is the Lagrangian, K =


K (q̇ 1 , . . . , q̇ n ) is the kinetic energy of the particle, U = U (q1 , . . . , qn ) is its potential energy,
and xi = qi , then (5.23) becomes

\frac{\partial L}{\partial q_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_i} = 0, \qquad i = 1, 2, \ldots, n,

which are the Lagrange equations of motion for a particle. Thus a particle moves along a path
on which the functional
J = \int_{t_1}^{t_2} L \, dt

is extremized, which agrees with the Hamilton principle saying that a particle moves along the
path that minimizes the integral J .
In our discussion up to now, both end points were fixed. We now would like to solve a more
general optimization problem, where the end points need not be fixed. For simplicity, we assume
first that one of the end points is fixed, while the other is free. After we solve this problem, we
can easily generalize to the case when both end points are free.

Problem 2 In the (t, x) plane the left end point A is fixed, while the right end point B is free
(see Figure 5.7). It is required to find a trajectory x = x(t) starting at A on which the functional
v = \int_{t_0}^{t_1} F(t, x, x') \, dt can achieve its extremal value.

Solution to Problem 2 Following Elsgolc [75, pp. 64–72], we begin by assuming that a solution
to Problem 2 exists; that is, there is a trajectory x = x(t) joining (t0 , x0 ) and (t1 , x1 ) such
that δv = 0. This trajectory must satisfy the Euler–Lagrange equation. Consider an adjoining

Figure 5.7 Illustration of Problem 2.

trajectory specified by x + \delta x joining (t_0, x_0) and (t_1 + \delta t_1, x_1 + \delta x_1). Then,

\Delta v = v(x + \delta x) - v(x)
   = \int_{t_0}^{t_1 + \delta t_1} F(t, x + \delta x, x' + \delta x') \, dt - \int_{t_0}^{t_1} F(t, x, x') \, dt
   = \int_{t_1}^{t_1 + \delta t_1} F(t, x + \delta x, x' + \delta x') \, dt + \int_{t_0}^{t_1} \left( F(t, x + \delta x, x' + \delta x') - F(t, x, x') \right) dt.

Our goal now is to extract from \Delta v its linear portion in \delta x, which we know is the variation of
the given functional. We begin by approximating the first integral as

\int_{t_1}^{t_1 + \delta t_1} F(t, x + \delta x, x' + \delta x') \, dt \approx F(t, x, x')|_{t = t_1}\, \delta t_1.

Expanding the integrand of the second integral into a Taylor series and retaining only first-order
terms yields

\int_{t_0}^{t_1} \left( F(t, x + \delta x, x' + \delta x') - F(t, x, x') \right) dt \approx \int_{t_0}^{t_1} \left( F_x(t, x, x')\,\delta x + F_{x'}(t, x, x')\,\delta x' \right) dt.

Integrating the second term by parts and combining with the first term gives

\int_{t_0}^{t_1} \left( F_x\,\delta x + F_{x'}\,\delta x' \right) dt = \left. (F_{x'}\,\delta x) \right|_{t_0}^{t_1} + \int_{t_0}^{t_1} \left( F_x - \frac{d}{dt} F_{x'} \right) \delta x \, dt,

where for simplicity of notation we suppressed the arguments of F_x and F_{x'}. We assumed that
a solution to the problem exists. Hence, an extremal must satisfy the Euler–Lagrange equation
F_x - \frac{d}{dt} F_{x'} = 0. Because the left end point is fixed, we obtain

\delta x|_{t = t_0} = 0.

Taking into account the results of the above manipulations yields

\Delta v \approx F(t, x, x')|_{t = t_1}\, \delta t_1 + (F_{x'}\,\delta x)|_{t = t_1}.   (5.24)

Figure 5.8 A geometric interpretation of calculating the variation of the functional for Problem 2.

The final step in obtaining \delta v is to evaluate (F_{x'}\,\delta x)|_{t_1}. The problem is two-dimensional, so we
can illustrate it graphically as shown in Figure 5.8. Referring to Figure 5.8, we obtain

DB = \delta x|_{t = t_1}, \qquad FC = \delta x_1, \qquad EC \approx x'(t_1)\,\delta t_1, \qquad DB = FC - EC.

Hence,

\delta x|_{t = t_1} \approx \delta x_1 - x'(t_1)\,\delta t_1.

Substituting the above into (5.24) gives

\delta v = F(t, x, x')|_{t = t_1}\, \delta t_1 + F_{x'}|_{t = t_1} \left( \delta x_1 - x'(t_1)\,\delta t_1 \right)
   = (F - x' F_{x'})|_{t = t_1}\, \delta t_1 + F_{x'}|_{t = t_1}\, \delta x_1.

If the variations \delta t_1 and \delta x_1 are arbitrary, we obtain \delta v = 0 if in addition to the Euler–Lagrange
equation the following conditions are also satisfied:

(F - x' F_{x'})|_{t = t_1} = 0 \quad \text{and} \quad F_{x'}|_{t = t_1} = 0.

The above end-point relations are called the transversality conditions. In conclusion, the optimal
trajectory must satisfy both the Euler–Lagrange equation and the transversality conditions.

◆ Example 5.8
Consider the functional of Example 5.7, where now the left end point is fixed as
before—that is, x(0) = 0—while the right end point x(1) is free. Here, we have
t1 = 1. Because δt1 = 0 and δx1 is arbitrary, the transversality conditions for the
right end point are reduced to one condition,

F_{x'}|_{t = t_1} = 0.

Figure 5.9 A plot of the extremal, x(t) = -3t + t^3, of Example 5.8. (The right end point lies on the line t = t_1 = 1.)

Taking into account the solution to the Euler–Lagrange equation obtained in


Example 5.7 and then using the fact that the left end point is fixed, we obtain

0 = F x |t=t1
= 2x (t1 )
= 2(C1 + 3t 2 )|t=1
= 2(C1 + 3).

Hence, C1 = −3 and the extremal for this case is

x(t) = −3t + t 3 .

In Figure 5.9, we show a plot of the above extremal.

◆ Example 5.9
We consider again the same functional as in Example 5.7, where, this time, the left
end point is fixed, as before, while at the right end point t1 is free and x(t1 ) = 1.
Hence, in this case, δx1 = 0 while δt1 is arbitrary. The transversality conditions
reduce to one condition,

( F − x F x )|t=t1 = 0.

We perform simple calculations to obtain

F − x F x = 12xt − ( x ) 2 ,

Figure 5.10 Plots of the two extremals of Example 5.9, corresponding to t_1 ≈ 0.5117 and t_1 ≈ 1.2311.

where

x' = (C_1 t + t^3)' = C_1 + 3t^2.

Note that because x(t_1) = C_1 t_1 + t_1^3 = 1, we have

C_1 = \frac{1 - t_1^3}{t_1}.

Taking into account the above and the fact that x(t_1) = 1, we perform simple
algebraic manipulations to arrive at the following equation for t_1:

(F - x' F_{x'})|_{t = t_1} = 12x(t_1)\, t_1 - (x'(t_1))^2 = -4t_1^6 + 8t_1^3 - 1 = 0.

Solving the above equation, we obtain the following two solutions:

t_1 = \left( 1 \pm \sqrt{\tfrac{3}{4}} \right)^{1/3}.
Plots of the two extremals for this example are shown in Figure 5.10.
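The sixth-order equation above is easy to check numerically; the following sketch (not part of the text) recovers the two values of t_1 shown in Figure 5.10.

```python
# A numeric cross-check of -4 t1^6 + 8 t1^3 - 1 = 0 from Example 5.9.
import numpy as np

roots = np.roots([-4, 0, 0, 8, 0, 0, -1])        # coefficients from t^6 down to t^0
real = roots[np.abs(roots.imag) < 1e-9].real
print(np.sort(real))                             # ~ [0.5117, 1.2311]
```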

If the position of the end point B is restricted to a curve x = g1 (t), then


\delta x_1 \approx g_1'(t_1)\,\delta t_1,

and hence

\delta v = \left( F + (g_1' - x') F_{x'} \right)\big|_{t = t_1}\, \delta t_1 = 0

if

\left( F + (g_1' - x') F_{x'} \right)\big|_{t = t_1} = 0.   (5.25)
The above is the transversality condition for the case when the end point B is restricted to the
curve x = g1 (t).
In the next section we discuss a specific problem of minimizing a functional subject to
constraints in the form of differential equations.

5.3 Linear Quadratic Regulator

5.3.1 Algebraic Riccati Equation (ARE)


Consider a linear system modeled by

\dot{x} = Ax + Bu, \qquad x(0) = x_0,

and the associated performance index

J = \int_0^{\infty} \left( x^T Q x + u^T R u \right) dt,

where Q = Q T ≥ 0 and R = R T > 0. Our goal is to construct a stabilizing linear state-feedback


controller of the form u = −K x that minimizes the performance index J . We denote such a
linear control law by u∗ .
First, we assume that a linear state-feedback optimal controller exists such that the optimal
closed-loop system
ẋ = ( A − B K )x
is asymptotically stable. The above assumption implies that there is a Lyapunov function V =
x T P x for the closed-loop system; that is, for some positive definite matrix P the time derivative
d V /dt evaluated on the trajectories of the closed-loop system is negative definite.

Theorem 5.2 If the state-feedback controller u∗ = −K x is such that



\min_u \left( \frac{dV}{dt} + x^T Q x + u^T R u \right) = 0,   (5.26)
for some V = x T P x, then the controller is optimal.
Proof We can rewrite the condition of the theorem as

\left. \frac{dV}{dt} \right|_{u = u^*} + x^T Q x + u^{*T} R u^* = 0.

Hence,

\left. \frac{dV}{dt} \right|_{u = u^*} = -x^T Q x - u^{*T} R u^*.

Integrating both sides of the resulting equation with respect to time from 0 to \infty, we
obtain

V(x(\infty)) - V(x(0)) = -\int_0^{\infty} \left( x^T Q x + u^{*T} R u^* \right) dt.

Because by assumption the closed-loop system is asymptotically stable, x(\infty) = 0,
and therefore

V(x(0)) = x_0^T P x_0 = \int_0^{\infty} \left( x^T Q x + u^{*T} R u^* \right) dt.

Thus, we showed that if a linear state-feedback controller satisfies the assumption of
the theorem, then the value of the performance index for such a controller is

J(u^*) = x_0^T P x_0.

To show that such a controller is indeed optimal, we use a proof by contradiction. We
assume that (5.26) holds and that u^* is not optimal. Suppose that \tilde{u} yields a smaller
value of J, that is,

J(\tilde{u}) < J(u^*).

It follows from (5.26) that

\left. \frac{dV}{dt} \right|_{u = \tilde{u}} + x^T Q x + \tilde{u}^T R \tilde{u} \geq 0,

that is,

\left. \frac{dV}{dt} \right|_{u = \tilde{u}} \geq -x^T Q x - \tilde{u}^T R \tilde{u}.

Integrating the above with respect to time from 0 to \infty yields

V(x(0)) \leq \int_0^{\infty} \left( x^T Q x + \tilde{u}^T R \tilde{u} \right) dt,

implying that

J(u^*) \leq J(\tilde{u}),
which is a contradiction, and the proof is complete.

It follows from the above theorem that the synthesis of the optimal control law involves
finding an appropriate Lyapunov function, or equivalently, the matrix P. The appropriate P is
found by minimizing
f(u) = \frac{dV}{dt} + x^T Q x + u^T R u.   (5.27)

We first apply to (5.27) the necessary condition for unconstrained minimization,

\frac{\partial}{\partial u} \left( \frac{dV}{dt} + x^T Q x + u^T R u \right) \bigg|_{u = u^*} = 0^T.

Differentiating the above yields

\frac{\partial}{\partial u} \left( \frac{dV}{dt} + x^T Q x + u^T R u \right) = \frac{\partial}{\partial u} \left( 2x^T P \dot{x} + x^T Q x + u^T R u \right)
   = \frac{\partial}{\partial u} \left( 2x^T P A x + 2x^T P B u + x^T Q x + u^T R u \right)
   = 2x^T P B + 2u^T R
   = 0^T.

Hence, the candidate for an optimal control law has the form

u^* = -R^{-1} B^T P x = -K x,   (5.28)

where K = R^{-1} B^T P. Note that

\frac{\partial^2}{\partial u^2} \left( \frac{dV}{dt} + x^T Q x + u^T R u \right) = \frac{\partial}{\partial u} \left( 2x^T P B + 2u^T R \right) = 2R > 0.
Thus the second-order sufficiency condition for u∗ to minimize (5.27) is satisfied.
We now turn our attention to finding an appropriate matrix P. The optimal closed-loop system
has the form
\dot{x} = \left( A - B R^{-1} B^T P \right) x, \qquad x(0) = x_0.

Our optimal controller satisfies the equation

\left. \frac{dV}{dt} \right|_{u = u^*} + x^T Q x + u^{*T} R u^* = 0,

that is,

2x^T P A x + 2x^T P B u^* + x^T Q x + u^{*T} R u^* = 0.

We substitute the expression for u^*, given by (5.28), into the above equation and represent it as

x^T (A^T P + P A) x - 2x^T P B R^{-1} B^T P x + x^T Q x + x^T P B R^{-1} B^T P x = 0.

Factoring out x yields

x^T \left( A^T P + P A + Q - P B R^{-1} B^T P \right) x = 0.

The above equation should hold for any x. For this to be true we have to have

A^T P + P A + Q - P B R^{-1} B^T P = O.   (5.29)

The above equation is referred to as the algebraic Riccati equation (ARE). In conclusion, the
synthesis of the optimal linear state feedback controller minimizing the performance index

J = \int_0^{\infty} \left( x^T Q x + u^T R u \right) dt

subject to
ẋ = Ax + Bu, x(0) = x 0
requires solving the ARE given by (5.29).

◆ Example 5.10
Consider the following model of a dynamical system:

ẋ = 2u1 + 2u2 , x(0) = 3,

along with the associated performance index

J = \int_0^{\infty} \left( x^2 + r u_1^2 + r u_2^2 \right) dt,

where r > 0 is a parameter.

1. We first find the solution to the ARE corresponding to the linear state-feedback
optimal controller. We have

A = 0, \qquad B = [2 \;\; 2], \qquad Q = 1, \qquad R = r I_2.

The ARE for this problem is

O = A^T P + P A + Q - P B R^{-1} B^T P = 1 - \frac{8}{r}\, p^2,

whose solution is

p = \sqrt{\frac{r}{8}}.
2. We now write the equation of the closed-loop system driven by the optimal
controller. The optimal controller has the form
 
u = -R^{-1} B^T P x = -\frac{1}{\sqrt{2r}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} x.

Hence, the closed-loop optimal system is described by

\dot{x} = \begin{bmatrix} 2 & 2 \end{bmatrix} u = -\frac{4}{\sqrt{2r}}\, x.
3. Finally, we find the value of J for the optimal closed-loop system. We have
J = x^T(0) P x(0) = \frac{9}{2} \sqrt{\frac{r}{2}}.


We can directly verify that J = \frac{9}{2}\sqrt{\frac{r}{2}} as follows:

\min J = \min \int_0^{\infty} \left( x^2 + r u_1^2 + r u_2^2 \right) dt
   = \int_0^{\infty} 9 \exp\!\left( -\frac{8t}{\sqrt{2r}} \right) (1 + 1) \, dt = 9\,\frac{\sqrt{2r}}{4} = \frac{9}{2} \sqrt{\frac{r}{2}}.
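The scalar ARE above can also be solved numerically. The sketch below (using scipy, for an arbitrarily chosen value of r) reproduces p and the optimal cost of this example.

```python
# A numeric check of Example 5.10 for one particular value of the parameter r.
import numpy as np
from scipy.linalg import solve_continuous_are

r = 2.0                                         # any r > 0
A = np.array([[0.0]])                           # x' = 2*u1 + 2*u2
B = np.array([[2.0, 2.0]])
Q = np.array([[1.0]])
R = r * np.eye(2)

P = solve_continuous_are(A, B, Q, R)            # solves A'P + PA + Q - P B R^{-1} B' P = 0
print(P[0, 0], np.sqrt(r / 8.0))                # both ~ sqrt(r/8)

x0 = 3.0
print(x0 * P[0, 0] * x0, 4.5 * np.sqrt(r / 2.0))  # J = x(0) P x(0) = (9/2) sqrt(r/2)
```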

A major difficulty when solving the ARE is that it is a nonlinear equation. In the next section
we present an efficient method for solving the ARE.

5.3.2 Solving the ARE Using the Eigenvector Method


We now present a method for solving the ARE referred to as the MacFarlane and Potter method
or the eigenvector method. In our discussion of the method, we follow the development of Power
and Simpson [237]. We begin by representing the ARE in the form

\begin{bmatrix} P & -I_n \end{bmatrix} \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} I_n \\ P \end{bmatrix} = O.   (5.30)

The 2n \times 2n matrix in the middle is called the Hamiltonian matrix. We use the symbol H to
denote the Hamiltonian matrix, that is,

H = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix}.

Thus, the ARE can be represented as

\begin{bmatrix} P & -I_n \end{bmatrix} H \begin{bmatrix} I_n \\ P \end{bmatrix} = O.

We premultiply the above equation by X^{-1} and then postmultiply it by X, where X is a nonsingular
n \times n matrix,

\begin{bmatrix} X^{-1} P & -X^{-1} \end{bmatrix} H \begin{bmatrix} X \\ P X \end{bmatrix} = O.   (5.31)

Observe that if we can find matrices X and P X such that

H \begin{bmatrix} X \\ P X \end{bmatrix} = \begin{bmatrix} X \\ P X \end{bmatrix} \Lambda,

then equation (5.31) becomes

\begin{bmatrix} X^{-1} P & -X^{-1} \end{bmatrix} \begin{bmatrix} X \\ P X \end{bmatrix} \Lambda = O.

We have thus reduced the problem of solving the ARE to that of constructing appropriate matrices
X and P X. To proceed further, let v i be an eigenvector of H and let si be the corresponding
eigenvalue; then
Hv i = si v i .
In our further discussion, we assume that H has at least n distinct real eigenvalues among its
2n eigenvalues. (The results obtained can be generalized for the case when the eigenvalues of
H are complex or nondistinct.) Thus, we can write

H [v_1 \; v_2 \; \cdots \; v_n] = [v_1 \; v_2 \; \cdots \; v_n] \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & s_n \end{bmatrix}.

Let

\begin{bmatrix} X \\ P X \end{bmatrix} = [v_1 \; v_2 \; \cdots \; v_n]

and

\Lambda = \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & s_n \end{bmatrix}.

The above choice of X and P X constitutes a possible solution to the equation

\begin{bmatrix} X^{-1} P & -X^{-1} \end{bmatrix} \begin{bmatrix} X \\ P X \end{bmatrix} \Lambda = O.

To construct P, we partition the 2n \times n eigenvector matrix [v_1 \; v_2 \; \cdots \; v_n] into two n \times n
submatrices as follows:

[v_1 \; v_2 \; \cdots \; v_n] = \begin{bmatrix} W \\ Z \end{bmatrix}.

Thus,

\begin{bmatrix} X \\ P X \end{bmatrix} = \begin{bmatrix} W \\ Z \end{bmatrix}.

Taking X = W and P X = Z and assuming that W is invertible, we obtain

P = Z W^{-1}.   (5.32)

We now have to decide what set of n eigenvalues should be chosen from those of H to construct
the particular P. In the case when all 2n eigenvalues of H are distinct, the number of different
matrices P generated by the above-described method is given by

\frac{(2n)!}{(n!)^2}.
Let Q = C^T C be a full-rank factorization of Q, as defined on page 636. It follows from
Theorem 3 of Kučera [172, p. 346] that the Hamiltonian matrix H has n eigenvalues in the
open left-half complex plane and n in the open right-half plane if and only if the pair (A, B)
is stabilizable and the pair (A, C) is detectable. The matrix P that we seek corresponds to the
asymptotically stable eigenvalues of H. The eigenvalues of the Hamiltonian matrix come in
pairs \pm s_i (see Exercise 5.15); the characteristic polynomial of H contains only even powers of
s. With the matrix P so constructed, we have the following result:

Theorem 5.3 The poles of the closed-loop system


\dot{x}(t) = \left( A - B R^{-1} B^T P \right) x(t)

are those eigenvalues of H having negative real parts.

Proof Because [v_1 \; v_2 \; \cdots \; v_n] = \begin{bmatrix} W \\ Z \end{bmatrix}, we can write

\begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} W \\ Z \end{bmatrix} = \begin{bmatrix} W \\ Z \end{bmatrix} \Lambda.

Performing appropriate multiplication of n \times n block matrices yields

A W - B R^{-1} B^T Z = W \Lambda,

or

A - B R^{-1} B^T Z W^{-1} = A - B R^{-1} B^T P = W \Lambda W^{-1},

because P = Z W^{-1}. Thus, the matrix A - B R^{-1} B^T P is similar to the matrix \Lambda whose
eigenvalues are the asymptotically stable eigenvalues of H. The proof is complete.

◆ Example 5.11
Consider the following model of a dynamical system:
ẋ = 2x + u,
along with the associated performance index

J = \int_0^{\infty} \left( x^2 + r u^2 \right) dt.
Find the value of r such that the optimal closed-loop system has its pole at −3.
We form the associated Hamiltonian matrix

H = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix} = \begin{bmatrix} 2 & -\dfrac{1}{r} \\ -1 & -2 \end{bmatrix}.

The characteristic equation of H is

\det(s I_2 - H) = s^2 - 4 - \frac{1}{r} = 0.

Hence,

r = \frac{1}{5}

results in the optimal closed-loop system having its pole located at −3.

The eigenvector method for solving the ARE can be generalized to the case when the eigenvalues
of the Hamiltonian matrix H are not necessarily distinct or real.
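The construction above translates directly into a few lines of numerical code. The following sketch (assuming the Hamiltonian eigenvalues split into n stable and n antistable ones, with scipy used only as an independent check) implements P = ZW^{-1} for the linearized arm model that appears later in Example 5.13.

```python
# A sketch of the eigenvector (MacFarlane-Potter) method for the ARE.
import numpy as np
from scipy.linalg import solve_continuous_are

def are_by_eigenvectors(A, B, Q, R):
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],        # Hamiltonian matrix
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]                  # eigenvectors of the n stable eigenvalues
    W, Z = stable[:n, :], stable[n:, :]
    return np.real(Z @ np.linalg.inv(W))       # P = Z W^{-1}, cf. (5.32)

A = np.array([[0.0, 1.0], [9.8, -1.0]])        # linearized arm model (5.40)
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[1.0]])
print(are_by_eigenvectors(A, B, Q, R))
print(solve_continuous_are(A, B, Q, R))        # the two results should agree
```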

5.3.3 Optimal Systems with Prescribed Poles


We can always find a state feedback u = −K x such that the closed-loop system ẋ = ( A− B K )x
will have its poles in prescribed locations (symmetric with respect to the real axis) if and only if
the pair ( A, B) is reachable. In the multi-input systems the gain matrix K that shifts the poles to
the prescribed locations is not unique. We would like to use the remaining degrees of freedom to
construct a gain matrix K that shifts the system poles to the desired locations and at the same
time minimizes a performance index. Specifically, we are given a system model,
ẋ(t) = Ax(t) + Bu(t),
the desired set of closed-loop poles,
{s1 , s2 , . . . , sn },
and an m × m positive definite matrix R = R T . We wish to construct a state-feedback con-
trol law u(t) = − K x(t) and an n × n positive semidefinite matrix Q = Q T such that the
closed-loop system has its poles in the desired locations, and at the same time the performance
index

J = \int_0^{\infty} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt

is minimized. In other words, we are interested in designing an optimal control system with
prescribed poles. We illustrate the problem with a very simple example.

◆ Example 5.12
For a dynamical system modeled by

ẋ = 2x + u
= Ax + B u,

and the performance index

J = \int_0^{\infty} \left( q x^2 + u^2 \right) dt = \int_0^{\infty} \left( x^T Q x + u^T R u \right) dt,

find q so that the eigenvalue of


A − B R −1 B T P
is equal to −3.
The corresponding Hamiltonian matrix is

H = \begin{bmatrix} 2 & -1 \\ -q & -2 \end{bmatrix}.

The characteristic polynomial of H is

\det(s I_2 - H) = \det \begin{bmatrix} s - 2 & 1 \\ q & s + 2 \end{bmatrix} = s^2 - 4 - q.

We want
det[s I 2 − H] = (s + 3)(s − 3) = s 2 − 9.
Hence, the desired value of q is q = 5.
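A quick numerical check of this value (a sketch using scipy, not part of the text):

```python
# With q = 5 the closed-loop pole of Example 5.12 is indeed at -3.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[2.0]]); B = np.array([[1.0]])
Q = np.array([[5.0]]); R = np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print(np.linalg.eigvals(A - B @ K))            # ~ [-3.]
```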

We now discuss a method for designing optimal control systems with prescribed poles. In
constructing the gain matrix K and the weight matrix Q, we repeatedly use a state variable
transformation. To proceed, we analyze the system model, the performance index, and the
Hamiltonian matrix in a new basis. Let
x = T \tilde{x},

where \det T \neq 0. Then, the system model in the new coordinates has the form

\dot{\tilde{x}}(t) = T^{-1} A T \tilde{x}(t) + T^{-1} B u(t) = \tilde{A} \tilde{x}(t) + \tilde{B} u(t).
The performance index J , after state variable transformation, takes the form

\tilde{J} = \int_0^{\infty} \left( \tilde{x}^T(t)\, T^T Q T\, \tilde{x}(t) + u^T(t) R u(t) \right) dt.

The Hamiltonian matrix is transformed to

\tilde{H} = \begin{bmatrix} T^{-1} A T & -T^{-1} B R^{-1} B^T T^{-T} \\ -T^T Q T & -T^T A^T T^{-T} \end{bmatrix} = \begin{bmatrix} \tilde{A} & -\tilde{B} R^{-1} \tilde{B}^T \\ -\tilde{Q} & -\tilde{A}^T \end{bmatrix},

where T^{-T} = (T^{-1})^T = (T^T)^{-1}. Observe that

\tilde{H} = \begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} H \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix};

that is, \tilde{H} and H are similar matrices because

\begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix} = I_{2n}.

Hence, the eigenvalues of H and \tilde{H} are identical. Our next step is to compute the characteristic
equation of \tilde{H} as a function of \tilde{Q}. Computing \det(s I_{2n} - \tilde{H}) is facilitated after we perform
some manipulations on the characteristic matrix s I_{2n} - \tilde{H} that do not change its determinant.
Specifically, we premultiply the characteristic matrix s I_{2n} - \tilde{H} by a 2n \times 2n matrix whose
determinant equals 1. Then, the determinant of the product is the same as \det(s I_{2n} - \tilde{H}), but it
is much easier to evaluate. We have

\begin{bmatrix} I_n & 0 \\ -\tilde{Q}(s I_n - \tilde{A})^{-1} & I_n \end{bmatrix} \begin{bmatrix} s I_n - \tilde{A} & \tilde{B} R^{-1} \tilde{B}^T \\ \tilde{Q} & s I_n + \tilde{A}^T \end{bmatrix} = \begin{bmatrix} s I_n - \tilde{A} & \tilde{B} R^{-1} \tilde{B}^T \\ 0 & s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1} \tilde{B} R^{-1} \tilde{B}^T \end{bmatrix}.

Let

F = \tilde{B} R^{-1} \tilde{B}^T;

then

\det(s I_{2n} - \tilde{H}) = \det \begin{bmatrix} s I_n - \tilde{A} & F \\ 0 & s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1} F \end{bmatrix} = \det(s I_n - \tilde{A}) \det\!\left( s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1} F \right).

Suppose now that the matrix A has n linearly independent eigenvectors; hence it is diagonaliz-
able. Let

T = [t 1 t2 ··· tn]

be a matrix composed of the eigenvectors of A. Therefore,

\tilde{A} = T^{-1} A T = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).

Note that \tilde{A}^T = \tilde{A}. Assume that

\tilde{Q} = \tilde{Q}_j = \operatorname{diag}(0, \ldots, 0, \tilde{q}_{jj}, 0, \ldots, 0);

that is, \tilde{Q}_j has only one nonzero element, \tilde{q}_{jj}, located in the jth diagonal position. Under the above
assumptions the matrix s I_n + \tilde{A}^T - \tilde{Q}_j (s I_n - \tilde{A})^{-1} F takes the form

s I_n + \tilde{A}^T - \tilde{Q}_j (s I_n - \tilde{A})^{-1} F
 = \operatorname{diag}(s + \lambda_1, \ldots, s + \lambda_n)
 - \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ \dfrac{\tilde{q}_{jj} f_{j1}}{s - \lambda_j} & \cdots & \dfrac{\tilde{q}_{jj} f_{jj}}{s - \lambda_j} & \cdots & \dfrac{\tilde{q}_{jj} f_{jn}}{s - \lambda_j} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix},

where f_{kl} denotes the (k, l) entry of F and the only nonzero row of the subtracted matrix is the jth row.
Hence,

\det\!\left( s I_n + \tilde{A}^T - \tilde{Q}_j (s I_n - \tilde{A})^{-1} F \right) = \left( \prod_{\substack{i=1 \\ i \neq j}}^{n} (s + \lambda_i) \right) \left( s + \lambda_j - \frac{\tilde{q}_{jj} f_{jj}}{s - \lambda_j} \right).

We can now evaluate the characteristic polynomial of the Hamiltonian matrix,

\det(s I_{2n} - H) = \det(s I_{2n} - \tilde{H})
 = \left( \prod_{i=1}^{n} (s - \lambda_i) \right) \left( \prod_{\substack{i=1 \\ i \neq j}}^{n} (s + \lambda_i) \right) \left( s + \lambda_j - \frac{\tilde{q}_{jj} f_{jj}}{s - \lambda_j} \right)
 = \prod_{\substack{i=1 \\ i \neq j}}^{n} (s + \lambda_i)(s - \lambda_i) \; \Bigl( (s + \lambda_j)(s - \lambda_j) - \tilde{q}_{jj} f_{jj} \Bigr).

From the form of the characteristic polynomial of the Hamiltonian matrix, we conclude that
selecting an appropriate value of q̃ j j allows us to shift the jth pole of the closed-loop system
into a prespecified location while leaving the other poles in their original positions. To see this
we represent the term that involves q̃ j j as
(s + \lambda_j)(s - \lambda_j) - \tilde{q}_{jj} f_{jj} = s^2 - \lambda_j^2 - \tilde{q}_{jj} f_{jj} = s^2 - \left( \lambda_j^2 + \tilde{q}_{jj} f_{jj} \right) = (s - s_j)(s + s_j),

where

s_j = \pm \sqrt{\lambda_j^2 + \tilde{q}_{jj} f_{jj}}.

From the above expression for s_j, we can calculate the \tilde{q}_{jj} that results in the jth pole being shifted
to -\sqrt{\lambda_j^2 + \tilde{q}_{jj} f_{jj}}. We obtain

\tilde{q}_{jj} = \frac{s_j^2 - \lambda_j^2}{f_{jj}}.   (5.33)

Note that while selecting the desired location for s j , we have to make sure that q̃ j j > 0 to ensure
positive semidefiniteness of the weight matrix Q̃ j . Having Q̃ j , we compute Q j = T −T Q̃ j T −1
and solve the Riccati equation
AT P j + P j A + Q j − P j B R−1 B T P j = O (5.34)
for P j . We compute u = −R−1 B T P j x + u j = −K j x + u j that shifts only one pole of the
closed-loop system
ẋ = ( A − B K j )x + Bu j
= A j x + Bu j
to its specified position. We then start with the new system
ẋ = A j x + Bu j
and shift the next pole to a specified location, and then we combine the obtained feedback gains.
The above method is based on the result, established by Solheim [267], that different feedback

matrices K j and the corresponding weight matrices Q j can be combined to give the final optimal
gain K and the corresponding weight matrix Q. Indeed, suppose we apply the above method
to the system ẋ = ( A − B K j )x + Bu j . After finding an appropriate weight matrix Q j+1 , we
solve the Riccati equation

( A − B K j )T P j+1 + P j+1 ( A − B K j ) + Q j+1 − P j+1 B R−1 B T P j+1 = O (5.35)

for P j+1 to compute the gain K j+1 . Adding the two Riccati equations (5.34) and (5.35), and
performing simple manipulations yields

AT ( P j + P j+1 ) + ( P j + P j+1 ) A + Q j + Q j+1 − ( P j + P j+1 )B R−1 B T ( P j + P j+1 ) = O.

Let P = P j + P j+1 and Q = Q j + Q j+1 , then

K = K j + K j+1
= R−1 B T P j + R−1 B T P j+1 .

In a similar manner we can show that the method works for more than just two iterations.
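The single-pole shift described above can be sketched in code as follows; this is an illustration under the stated assumptions (a diagonalizable A with real, distinct eigenvalues), and the data are those of Example 5.15 below.

```python
# A sketch of one pole-shifting step of Section 5.3.3.
import numpy as np
from scipy.linalg import solve_continuous_are

def shift_one_pole(A, B, R, lam_old, s_new):
    lam, T = np.linalg.eig(A)                       # A = T diag(lam) T^{-1}
    j = int(np.argmin(np.abs(lam - lam_old)))       # index of the eigenvalue to move
    Tinv = np.linalg.inv(T)
    F = Tinv @ B @ np.linalg.solve(R, B.T) @ Tinv.T # F = B~ R^{-1} B~'
    Qt = np.zeros_like(A)
    Qt[j, j] = (s_new**2 - lam[j]**2) / F[j, j]     # q~_jj from (5.33)
    Qj = Tinv.T @ Qt @ Tinv                         # Q_j = T^{-T} Q~_j T^{-1}
    Pj = solve_continuous_are(A, B, Qj, R)
    Kj = np.linalg.solve(R, B.T @ Pj)
    return Qj, Pj, Kj

# First step of Example 5.15: shift lambda_1 = 2.6702 to s_1 = -4
A = np.array([[0.0, 1.0], [9.8, -1.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[2.0]])
Q1, P1, K1 = shift_one_pole(A, B, R, 2.6702, -4.0)
print(K1)                                 # ~ [24.4813  6.6703]
print(np.linalg.eigvals(A - B @ K1))      # ~ {-4.0, -3.6702}
```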

◆ Example 5.13
Consider a simple model of one-link robot manipulator shown in Figure 5.11. The
motion of the robot arm is controlled by a DC motor via a gear. The DC motor
is armature-controlled and its schematic is shown in Figure 5.12. We assume that
the motor moment of inertia is negligible compared with that of the robot arm.
We model the arm as a point mass m attached to the end of a massless rod of
length l. Hence the arm inertia is I a = ml 2 . We assume that the gear train has
no backlash, and all connecting shafts are rigid. As can be seen from Figure 5.11,
counterclockwise rotation of the arm is defined as positive, and clockwise rotation of the arm is
defined as negative, whereas counterclockwise rotation of the motor shaft is defined as negative,
and clockwise rotation of the shaft is defined as positive.

Figure 5.11 One-link manipulator controlled by a DC motor via a gear.

Figure 5.12 Schematic of an armature-controlled DC motor of Example 5.13.
The torque delivered by the motor is

T_m = K_m i_a,

where K_m is the motor-torque constant, and i_a is the armature current. Let N
denote the gear ratio. Then we have

\frac{\theta_p}{\theta_m} = \frac{\text{radius of motor gear}}{\text{radius of arm gear}} = \frac{\text{number of teeth of motor gear}}{\text{number of teeth of arm gear}} = \frac{1}{N}.

This is because the gears are in contact and thus

θ p × radius of arm gear = θm × radius of motor gear,

and the radii of the gears are proportional to their number of teeth. The work done
by the gears must be equal. Let T p denote the torque applied to the robot arm.
Then,

T p θ p = Tmθm.

Thus, the torque applied to the pendulum is

T p = NTm = NK mi a .

We use Newton’s second law to write the equation modeling the arm dynamics,

I_a \frac{d^2 \theta_p}{dt^2} = mgl \sin \theta_p + T_p.   (5.36)

Substituting into (5.36) the expressions for I_a and T_p and then rearranging yields

ml^2 \frac{d^2 \theta_p}{dt^2} = mgl \sin \theta_p + N K_m i_a,   (5.37)

where g = 9.8 m/sec^2 is the gravitational constant. Applying Kirchhoff’s voltage
law to the armature circuit yields

L_a \frac{di_a}{dt} + R_a i_a + K_b N \frac{d\theta_p}{dt} = u,

where K_b is the back emf constant. We assume that L_a \approx 0. Hence,

u = R_a i_a + K_b N \frac{d\theta_p}{dt}.   (5.38)

We next compute i_a from (5.38) and substitute the result into (5.37) to get

ml^2 \frac{d^2 \theta_p}{dt^2} = mgl \sin \theta_p + N K_m \left( \frac{u}{R_a} - \frac{K_b N}{R_a} \frac{d\theta_p}{dt} \right).   (5.39)
We can now construct a state-space model of the one-link robot. For this we
choose the following state and output variables:

x_1 = \theta_p, \qquad x_2 = \frac{d\theta_p}{dt} = \omega_p, \qquad \text{and} \qquad y = x_1.

Then, using (5.39), we obtain the following simple state-space model of the robot
manipulator:

\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ \dfrac{g}{l} \sin x_1 - \dfrac{K_b K_m N^2}{m l^2 R_a}\, x_2 + \dfrac{N K_m}{m l^2 R_a}\, u \end{bmatrix},
\qquad y = x_1.

Reasonable parameters for the robot are: l = 1 m, m = 1 kg, N = 10, K_m = 0.1 N·m/A,
K_b = 0.1 V·sec/rad, R_a = 1 \Omega. With the above parameters the robot
model takes the form:

\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ 9.8 \sin x_1 - x_2 + u \end{bmatrix},
\qquad y = x_1.

Time histories of state trajectories of the uncontrolled nonlinear system, u = 0,
are shown in Figure 5.13 for the initial conditions x_1(0) = 1 and x_2(0) = 0. A
phase portrait of the uncontrolled nonlinear system is shown in Figure 5.14. The
linearized model about x = 0, u = 0 has the form

\frac{d}{dt} x = \begin{bmatrix} 0 & 1 \\ 9.8 & -1 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u,
\qquad y = [1 \;\; 0]\, x.   (5.40)

Figure 5.13 Plots of y = x_1 and x_2 versus time of the uncontrolled nonlinear system of Example 5.13.

Figure 5.14 A phase portrait of the uncontrolled nonlinear system of Example 5.13.

Plots of y = x_1 and x_2 versus time are shown in Figure 5.15. A phase portrait of
the linearized system is shown in Figure 5.16. Let

J = \int_0^{\infty} \left( y^2 + u^2 \right) dt.

We shall find the linear state-feedback control law u = -kx that minimizes J
subject to the equations given by (5.40). We have

Q = c^T c = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{and} \qquad R = [1].

Figure 5.15 Plots of x_1 and x_2 versus time of the linearized system of Example 5.13.

Figure 5.16 A phase portrait of the linearized system of Example 5.13.

Solving the associated Riccati equation yields

P = \begin{bmatrix} 72.3371 & 19.6509 \\ 19.6509 & 5.3484 \end{bmatrix}.   (5.41)

Hence,

k = [19.6509 \;\; 5.3484]   (5.42)

and

A_c = \begin{bmatrix} 0 & 1 \\ -9.8509 & -6.3484 \end{bmatrix}.

Figure 5.17 Plots of x_1 and x_2 versus time of the linear closed-loop system of Example 5.13.

Figure 5.18 A phase portrait of the linear closed-loop system of Example 5.13.

Plots of x_1 and x_2 versus time of the closed-loop linear system

\dot{x} = (A - bk)x = A_c x, \qquad y = x_1,

when the initial conditions are x_1(0) = 1 and x_2(0) = 0, are shown in Figure 5.17.
A phase portrait of the closed-loop linear system is shown in Figure 5.18. We now
apply the above optimal controller to the nonlinear model. Plots of x1 and x2
versus time of the closed-loop nonlinear systems are shown in Figure 5.19. A
phase portrait of the nonlinear closed-loop system is shown in Figure 5.20. The
closed-loop poles of the linearized system—that is, the eigenvalues of Ac —are
λ1 = −2.7003 and λ2 = −3.6481.
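The nonlinear closed-loop response plotted in Figures 5.19 and 5.20 can be reproduced with a short simulation; the sketch below (not part of the text) uses the gain (5.42).

```python
# A simulation sketch of the nonlinear arm under the LQR gain of Example 5.13.
import numpy as np
from scipy.integrate import solve_ivp

k = np.array([19.6509, 5.3484])                       # gain (5.42)

def arm(t, x):
    u = -k @ x                                        # linear state feedback
    return [x[1], 9.8 * np.sin(x[0]) - x[1] + u]

sol = solve_ivp(arm, (0.0, 5.0), [1.0, 0.0], max_step=0.01)
print(sol.y[:, -1])                                   # both states are driven toward 0
```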

Figure 5.19 Plots of x_1 and x_2 of the nonlinear closed-loop system of Example 5.13.

Figure 5.20 A phase portrait of the nonlinear closed-loop system of Example 5.13.

◆ Example 5.14
We use the data from the previous example to find the weight matrix Q 2 and the
corresponding k2 so that λ2 is shifted to s2 = −5; that is, the new poles of the
closed-loop system

ẋ = ( Ac − bk2 ) x

are s_1 = \lambda_1 and s_2 = -5. For the control law u = -kx + r, the linear closed-loop
system is

\dot{x} = \begin{bmatrix} 0 & 1 \\ -9.8509 & -6.3484 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r = A_c x + b r,

where for simplicity we used x rather than \Delta x, and u instead of \Delta u. The eigen-
values of A_c are \lambda_1 = -2.7003 and \lambda_2 = -3.6481. We have \lambda_1 \neq \lambda_2 and thus the
matrix A_c is diagonalizable. Note that different eigenvalues are only a sufficient
condition for diagonalizability. We construct the similarity transformation using
the eigenvectors of A_c. The eigenvectors of A_c are

v_1 = \begin{bmatrix} 0.3473 \\ -0.9378 \end{bmatrix} \qquad \text{and} \qquad v_2 = \begin{bmatrix} -0.2644 \\ 0.9644 \end{bmatrix}.

Let x = Tz, where T = [v_1 \; v_2]. Then, A_c in the new coordinates has the form

\tilde{A}_c = T^{-1} A_c T = \begin{bmatrix} -2.7003 & 0 \\ 0 & -3.6481 \end{bmatrix}

and

\tilde{b} = T^{-1} b = \begin{bmatrix} 3.0383 \\ 3.9912 \end{bmatrix}.

Next, we compute

T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix} 9.2312 & 12.1264 \\ 12.1264 & 15.9296 \end{bmatrix}.

Hence,

\tilde{q}_{22} = \frac{s_2^2 - \lambda_2^2}{15.9296} = 0.7339.

Therefore,

\tilde{Q}_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0.7339 \end{bmatrix} = T^T Q_2 T

and

Q_2 = T^{-T} \tilde{Q}_2 T^{-1} = \begin{bmatrix} 85.2450 & 31.5686 \\ 31.5686 & 11.6907 \end{bmatrix}.

Solving the ARE with A_c, b, Q_2, and R yields

P_2 = \begin{bmatrix} 9.8572 & 3.6504 \\ 3.6504 & 1.3518 \end{bmatrix}

and

k_2 = [3.6504 \;\; 1.3518].

As a check, if we take

Q_{\text{new}} = Q + Q_2 = \begin{bmatrix} 86.2450 & 31.5686 \\ 31.5686 & 11.6907 \end{bmatrix},

where Q = c^T c as in Example 5.13, and R_{\text{new}} = R, then solving the associated
ARE using A, rather than A_c, we obtain

P_{\text{new}} = \begin{bmatrix} 82.1943 & 23.3013 \\ 23.3013 & 6.7002 \end{bmatrix} = P + P_2,

where P is given by (5.41), and

k_{\text{new}} = [23.3013 \;\; 6.7002] = k + k_2,

where k is the optimal gain given by (5.42).
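This combined design can be confirmed numerically; the sketch below (using scipy) solves the ARE with Q_new and checks that the closed-loop poles are at -2.7003 and -5.

```python
# A numeric check of the combined weight matrix of Example 5.14.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [9.8, -1.0]])
b = np.array([[0.0], [1.0]])
Qnew = np.array([[86.2450, 31.5686], [31.5686, 11.6907]])
R = np.array([[1.0]])

Pnew = solve_continuous_are(A, b, Qnew, R)
knew = np.linalg.solve(R, b.T @ Pnew)
print(knew)                                   # ~ [23.3013  6.7002]
print(np.linalg.eigvals(A - b @ knew))        # ~ {-2.7003, -5.0}
```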

◆ Example 5.15
For the linearized model given by (5.40) and R = [2], we will now determine Q
and the corresponding k that result in the closed-loop system ẋ = ( A − bk) x with
poles located at s_1 = -4 and s_2 = -5, where

A = \begin{bmatrix} 0 & 1 \\ 9.8 & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

We shift the poles into the desired locations one by one. We first diagonalize A using a
similarity transformation constructed from the eigenvectors of A. The eigenvalues of A
are \lambda_1 = 2.6702 and \lambda_2 = -3.6702. We denote the corresponding eigenvectors
by v_1 and v_2, respectively. We obtain

T^{-1} A T = \begin{bmatrix} 2.6702 & 0 \\ 0 & -3.6702 \end{bmatrix},

where

T = [v_1 \; v_2] = \begin{bmatrix} 0.3507 & -0.2629 \\ 0.9365 & 0.9648 \end{bmatrix}.
We will now construct k_1 so that \lambda_1 is shifted to s_1 = -4, while \lambda_2 remains
unchanged; that is, the eigenvalues of A - bk_1 become s_1 and \lambda_2. For this, we first
compute

F = T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix} 0.1011 & 0.1349 \\ 0.1349 & 0.1800 \end{bmatrix}.

Hence, f_{11} = 0.1011,

\tilde{q}_{11} = \frac{4^2 - (2.6702)^2}{0.1011} = 87.7352,

and

\tilde{Q}_1 = \begin{bmatrix} 87.7352 & 0 \\ 0 & 0 \end{bmatrix} = T^T Q_1 T.

Hence,

Q_1 = T^{-T} \tilde{Q}_1 T^{-1} = \begin{bmatrix} 239.0024 & 65.1202 \\ 65.1202 & 17.7431 \end{bmatrix}.

Solving the associated ARE, using R = [2], yields

P_1 = \begin{bmatrix} 179.7014 & 48.9626 \\ 48.9626 & 13.3407 \end{bmatrix}.

Therefore,

k_1 = [24.4813 \;\; 6.6703]

and

A_1 = A - bk_1 = \begin{bmatrix} 0 & 1 \\ -14.6813 & -7.6703 \end{bmatrix}.
We will now construct k_2 that shifts \lambda_2 to s_2, leaving s_1 unchanged. We first
diagonalize A_1 using its eigenvectors. Note that A_1 and A share the common
eigenvector v_2. The other eigenvector of A_1, corresponding to its eigenvalue s_1, is
u_1 = [-0.2425 \;\; 0.9701]^T. We have

T^{-1} A_1 T = \begin{bmatrix} s_1 & 0 \\ 0 & \lambda_2 \end{bmatrix},

where

T = [u_1 \; v_2] = \begin{bmatrix} -0.2425 & 0.2629 \\ 0.9701 & -0.9648 \end{bmatrix}.

We next compute

T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix} 78.0607 & 72.0157 \\ 72.0157 & 66.4389 \end{bmatrix}.

Thus, \tilde{f}_{22} = 66.4389,

\tilde{q}_{22} = \frac{5^2 - (-3.6702)^2}{66.4389} = 0.1735,

and

\tilde{Q}_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0.1735 \end{bmatrix} = T^T Q_2 T.

From the above, we obtain

Q_2 = T^{-T} \tilde{Q}_2 T^{-1} = \begin{bmatrix} 368.9003 & 92.2211 \\ 92.2211 & 23.0543 \end{bmatrix}.

Solving the associated ARE, we get

P_2 = \begin{bmatrix} 42.5495 & 10.6369 \\ 10.6369 & 2.6591 \end{bmatrix}

and

k_2 = [5.3185 \;\; 1.3296].

The final weight matrix Q that results in shifting the poles into the desired locations
is

Q = Q_1 + Q_2 = \begin{bmatrix} 607.9027 & 157.3413 \\ 157.3413 & 40.7974 \end{bmatrix}.

The corresponding P is

P = P_1 + P_2 = \begin{bmatrix} 222.2509 & 59.5996 \\ 59.5996 & 15.9998 \end{bmatrix}.

The corresponding optimal gain vector k is

k = k_1 + k_2 = [29.7998 \;\; 7.9999].

5.3.4 Optimal Saturating Controllers


Asymptotic stability of an equilibrium state of a nonlinear system is a local property. This
means that the system’s trajectories starting in a neighborhood of the equilibrium state will
approach this equilibrium asymptotically. Often, we are interested in determining how far from
the equilibrium state the system’s trajectories can start and still converge to the equilibrium
asymptotically. A region around an equilibrium state within which all trajectories approach the
equilibrium is called a region of attraction, or region of asymptotic stability (RAS). Finding the
exact RAS analytically can be a challenging task. A tool often used to estimate the exact RAS
is a Lyapunov function. Specifically, if V = V (x) is a Lyapunov function of the equilibrium
state of interest—that is, V is positive definite with respect to the equilibrium state in a domain
S and the time derivative of V evaluated on the trajectories of the system is negative definite in
S—then the set

\Omega_c = \{ x : x \in S, \; V(x) \leq c \}

is an estimate of the exact RAS. In other words, a region of attraction is a set of states x for
which \dot{V}(x) < 0 and V(x) \leq c for some positive constant c. The set \Omega_c may be a conservative
estimate, often a crude underestimate, of the true RAS. In order to obtain the least conservative
estimate of the domain of attraction, corresponding to the given Lyapunov function, we could
try to maximize the size of \Omega_c by finding the largest V contour that can just fit on the equilibrium
side of the surface described by \dot{V}(x) = 0. This can be accomplished by solving the optimization
problem for c such that

c = \min_{\dot{V}(x) = 0} V(x).   (5.43)

Figure 5.21 Estimating the region of asymptotic stability.

Solving the above optimization problem corresponds to maximizing the size of the region
{x : V (x) ≤ c} in which V̇ (x) < 0 by making the boundary of {x : V (x) ≤ c} to be tangent to
the boundary of {x : V̇ (x) < 0}. This is illustrated in Figure 5.21.
We next estimate the RAS for single-input linear dynamical systems with saturating linear
optimal control. Our development follows that of Friedland [91, Section 5.1]. We will use such
controllers in Chapter 11 to stabilize chaotic systems. We consider a single-input linear system
model,
ẋ = Ax + bUmax sat(u/Umax ), (5.44)
where x ∈ R^n and
sat(z) = 1 if z ≥ 1,   sat(z) = z if |z| < 1,   sat(z) = −1 if z ≤ −1.
In the linear region, |u| ≤ Umax , (5.44) becomes ẋ = Ax + bu. We will consider using the
control law of the form
u = Umax sat(−kx/Umax ). (5.45)
When |kx| ≤ Umax , the control law given by (5.45) becomes a linear state-feedback controller,
u = −kx. For simplicity of notation, we can assume that Umax = 1. Indeed, we can always scale
the input vector b so that Umax = 1. The gain vector k is obtained by solving the associated
LQR problem with the performance index
J = ∫_0^∞ (x^T Q x + r u²) dt,

where we assume that the pair ( A, b) is reachable, Q = Q T > 0, and r > 0. Solving the algebraic

Riccati equation,
AT P + P A + Q − P bbT P/r = O, (5.46)
we use its solution, P = P T > 0, to construct the gain vector,
k = bT P/r. (5.47)
We then proceed with evaluating V̇ = (d/dt)(x^T P x) on the trajectories of the closed-loop system,
V̇ = 2x T P ẋ
= x T ( AT P + P A)x − 2x T P b sat(kx). (5.48)
By (5.47),
bT P = r k. (5.49)
Using the above and (5.46) gives
AT P + P A = r k T k − Q. (5.50)
Substituting (5.50) and (5.49) into (5.48) yields

V̇ = −x^T Q x − 2r kx sat(kx) + r(kx)².     (5.51)

For x such that |kx| ≤ 1, the above becomes


V̇ = −x T Qx − r (kx)2 ,
which is negative definite. Thus, the origin is (locally) asymptotically stable for the closed-loop
system.
An estimate of the RAS is a set of states bounded by the surface V = constant for which
V̇ < 0. To find as large an estimate as possible, we will attempt to find the surface V̇ = 0 and
find c = min V (x) on that surface. Then, the region bounded by V (x) = c is an estimate of the
RAS. Solving the above problem amounts to solving the optimization problem,
min_x  x^T P x
subject to  −x^T Q x − 2r kx sat(kx) + r(kx)² = 0.     (5.52)
Solving the above problem is not a simple task. Instead, we will solve an easier problem resulting
in a more conservative estimate of the RAS. We first rewrite (5.52) as
y 2 − 2y sat(y) − x T Qx/r = 0, (5.53)
where y = kx. We will investigate the case when |y| ≥ 1 because for |y| < 1, we have V̇ < 0
and we are interested in the solution to the above when V̇ = 0. When y ≥ 1, (5.53) becomes
y² − 2y − x^T Qx/r = 0,     (5.54)
because for y ≥ 1, sat(y) = 1. The solution to (5.54) for y ≥ 1 is
y = 1 + √(1 + x^T Qx/r) > 2,     (5.55)

while for y ≤ −1 the solution to (5.53) has the form



y = −1 − √(1 + x^T Qx/r) < −2.     (5.56)
The above solutions satisfy the inequality
|y| ≥ 2.
Thus, we have V̇ < 0 for |y| = |kx| ≤ 2 and hence the RAS includes the region
{x : V (x) ≤ c̃},
where c̃ is obtained by solving the optimization problem

c̃ = min_{|kx|=2} x^T P x.

We will use the Lagrange multipliers method to solve the above constrained optimization prob-
lem. We will solve the equivalent problem,
min x^T P x   subject to   (kx)² = 4.
The Lagrangian for the above problem is
l(x, λ) = x T P x + λ(4 − (kx)2 ).
Applying to the Lagrangian l the first-order necessary conditions for unconstrained minimization
gives
D_x l(x, λ) = 2x^T P − 2λ(kx)k = 0^T,     (5.57)
D_λ l(x, λ) = 4 − (kx)² = 0.     (5.58)
From (5.57), we obtain
x T P x = λ(kx)2 .
Because (kx)2 = 4,
λ = (1/4) x^T P x,
and therefore
c̃ = x T P x = 4λ.
Again, using (5.57), we obtain
x T P = λ(kx)k.
Postmultiplying both sides of the above by P −1 k T gives
x T k T = kx = λ(kx)k P −1 k T . (5.59)
Note that kx ≠ 0 because (kx)² = 4. Hence, from (5.59),
λ = 1/(k P^{-1} k^T).

Substituting (5.47) into the above gives
λ = 1/(k P^{-1} k^T) = r²/(b^T P b).
Hence,
c̃ = 4λ = 4r²/(b^T P b).
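The estimate c̃ is easy to compute once the ARE has been solved. Below is a minimal MATLAB sketch assuming a hypothetical second-order plant and the Control System Toolbox function lqr; the matrices A, b and the weights Q, r are illustrative assumptions, not data from the text.

% RAS estimate for a saturating LQR controller (Section 5.3.4), with Umax = 1.
% A, b, Q, r are assumed example data.
A = [0 1; 1 -1];  b = [0; 1];
Q = eye(2);       r = 1;
[k, P] = lqr(A, b, Q, r);        % P solves (5.46), k = b'*P/r as in (5.47)
c_tilde = 4*r^2/(b'*P*b);        % c~ = 4r^2/(b'Pb)
fprintf('RAS estimate: { x : x''*P*x <= %.4f }\n', c_tilde);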

5.3.5 Linear Quadratic Regulator for Discrete Systems on an Infinite Time Interval
Consider a discrete linear plant model,

x(k + 1) = Ax(k) + Bu(k), k = 0, 1, 2, . . . , (5.60)

with a specified initial condition x(0) = x 0 , where x(k) ∈ Rn and u(k) ∈ Rm . We assume that
the pair ( A, B) is reachable. Our objective is to construct a stabilizing linear state-feedback
controller,

u(k) = −K x(k),

that minimizes the quadratic performance index




J(u) = ∑_{k=0}^{∞} {x^T(k) Q x(k) + u^T(k) R u(k)},     (5.61)

where Q = Q T ≥ 0, and R = R T > 0. We assume that the components of the control vector
are unconstrained. An optimal control law will be denoted u∗ .
Because we seek the controller among the ones that stabilize the system (5.60), the closed-
loop system,

x(k + 1) = ( A − B K )x(k), (5.62)

is asymptotically stable; that is, the eigenvalues of the matrix A − B K are in the open unit disk.
This implies that there exists a Lyapunov function, V (x(k)) = x T (k) P x(k), for the
system (5.62). Therefore the first forward difference, ΔV(x(k)) = V(x(k + 1)) − V(x(k)),
evaluated on the trajectories of (5.62) is negative definite. We now state and prove a sufficient
condition for u(k) = −K x(k) to be optimal.

Theorem 5.4 If the state feedback controller, u(k) = −K x(k), is such that
min_u (ΔV(x(k)) + x^T(k) Q x(k) + u^T(k) R u(k)) = 0,     (5.63)
then it is optimal.

Proof We can represent (5.63) as


ΔV(x(k))|_{u=u*} + x^T(k) Q x(k) + u*^T(k) R u*(k) = 0.
Hence,
ΔV(x(k))|_{u=u*} = (V(x(k + 1)) − V(x(k)))|_{u=u*} = −x^T(k) Q x(k) − u*^T(k) R u*(k).
We sum both sides from k = 0 to k = ∞ to get
V(x(∞)) − V(x(0)) = −∑_{k=0}^{∞} (x^T(k) Q x(k) + u*^T(k) R u*(k)).

Because by assumption the closed-loop system is asymptotically stable, we have


x(∞) = 0. Hence,


V(x(0)) = x_0^T P x_0 = ∑_{k=0}^{∞} (x^T(k) Q x(k) + u*^T(k) R u*(k)).

Thus, if a linear stabilizing controller satisfies the hypothesis of the theorem, then the
value of the performance index (5.61) resulting from applying this controller is
J = x 0T P x 0 .
To show that such a controller is an optimal one, we will use a proof by contradiction.
We assume that the hypothesis of the theorem holds but the controller u∗ that satisfies
this hypothesis is not optimal; that is, there is a controller ũ such that
J ( ũ) < J ( u∗ ). (5.64)
The hypothesis of the theorem implies that
ΔV(x(k))|_{u=ũ} + x^T(k) Q x(k) + ũ^T(k) R ũ(k) ≥ 0.     (5.65)
We represent (5.65) as
ΔV(x(k))|_{u=ũ} ≥ −x^T(k) Q x(k) − ũ^T(k) R ũ(k).
Summing the above from k = 0 to k = ∞ yields
V(x(0)) ≤ ∑_{k=0}^{∞} (x^T(k) Q x(k) + ũ^T(k) R ũ(k));

that is,
J ( ũ) ≥ J ( u∗ ),
which contradicts (5.64). The proof is complete.

Finding an optimal controller involves finding an appropriate quadratic Lyapunov function


V (x) = x T P x, which then is used to construct the optimal controller. We first find u∗ that
minimizes the function
f = f(u(k)) = ΔV(x(k)) + x^T(k) Q x(k) + u^T(k) R u(k).     (5.66)
To proceed, we perform preliminary manipulations on the function f . In particular, we use the

definition of ΔV and substitute into (5.66) the system equation (5.60) to obtain
f (u(k)) = x T (k + 1) P x(k + 1) − x T (k) P x(k) + x T (k) Qx(k) + u T (k)Ru(k)
= ( Ax(k) + Bu(k))T P( Ax(k) + Bu(k)) − x T (k) P x(k) + x T (k) Qx(k)
+ u T (k)Ru(k). (5.67)
Applying to the above function the first-order necessary condition for a relative minimum gives
∂f(u(k))/∂u(k) = 2(Ax(k) + Bu(k))^T P B + 2u^T(k) R
               = 2x^T(k) A^T P B + 2u^T(k)(R + B^T P B)
               = 0^T.
The matrix R + B T P B is positive definite because R is positive definite, and therefore the
matrix R + B T P B is invertible. Hence,
u∗ (k) = −(R + B T P B)−1 B T P Ax(k) = −K x(k), (5.68)
where
K = (R + B T P B)−1 B T P A.
Let
S = R + B T P B.
Then, we represent (5.68) as
u∗ (k) = −S−1 B T P Ax(k). (5.69)
We next check if u∗ satisfies the second-order sufficient condition for a relative minimizer of
the function (5.66). We have
∂²/∂u²(k) (x^T(k + 1) P x(k + 1) − x^T(k) P x(k) + x^T(k) Q x(k) + u^T(k) R u(k))
    = ∂/∂u(k) (2x^T(k) A^T P B + 2u^T(k)(R + B^T P B))
    = 2(R + B^T P B)
    > 0,     (5.70)
that is, u∗ satisfies the second-order sufficient condition for a relative minimizer of the func-
tion (5.66).
The optimal controller (5.69) can be constructed if we have found an appropriate positive
definite matrix P. Our next task is to devise a method that would allow us to compute the desired
matrix P. For this, we first find the equation describing the closed-loop system driven by the
optimal controller given by (5.69):
x(k + 1) = ( A − B S−1 B T P A)x(k). (5.71)
Our controller satisfies the hypothesis of Theorem 5.4, that is,
x T (k + 1) P x(k + 1) − x T (k) P x(k) + x T (k) Qx(k) + u∗T (k)Ru∗ (k) = 0.

Substituting into (5.71) the expression for u∗ given by (5.69), along with performing simple
manipulations, gives
x T (k)( A − B S−1 B T P A)T P( A − B S−1 B T P A)x(k) − x T (k) P x(k)
+ x T (k) Qx(k) + x T (k) AT P B S−1 RS−1 B T P Ax(k)
= x T (k) AT P Ax(k) − x T (k) AT P B S−1 B T P Ax(k)
− x T (k) AT P B S−1 B T P Ax(k) + x T (k) AT P B S−1 B T P B S−1 B T P Ax(k)
− x T (k) P x(k) + x T (k) Qx(k) + x T (k) AT P B S−1 RS−1 B T P Ax(k)
= x T (k) AT P Ax(k) − x T (k) P x(k) + x T (k) Qx(k)
− 2x T (k) AT P B S−1 B T P Ax(k)
+ x T (k) AT P B S−1 (R + B T P B)S−1 B T P Ax(k)
= x T (k) AT P Ax(k) − x T (k) P x(k) + x T (k) Qx(k)
− 2x T (k) AT P B S−1 B T P Ax(k)
+ x T (k) AT P B S−1 B T P Ax(k)
= x T (k)( AT P A − P + Q − AT P B S−1 B T P A)x(k)
= 0. (5.72)
The above equation should hold for any x. For this to be true we have to have

AT P A − P + Q − AT P B S−1 B T P A = O (5.73)

The above equation is called the discrete algebraic Riccati equation.
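A simple way to solve (5.73) numerically is to iterate its right-hand side as a fixed-point map until P stops changing. The MATLAB sketch below does this for a hypothetical reachable pair; the matrices A, B, Q, R are illustrative assumptions only.

% Solve the discrete algebraic Riccati equation (5.73) by fixed-point iteration.
% A, B, Q, R are assumed example data.
A = [1 0.1; 0 1];  B = [0; 0.1];
Q = eye(2);        R = 1;
P = Q;                                    % initial guess
for i = 1:1000
    S    = R + B'*P*B;
    Pnew = A'*P*A - A'*P*B*(S\(B'*P*A)) + Q;
    if norm(Pnew - P, 'fro') < 1e-12, P = Pnew; break; end
    P = Pnew;
end
K = (R + B'*P*B)\(B'*P*A);                % optimal gain, u(k) = -K*x(k)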

5.4 Dynamic Programming


The objective of this section is to introduce the reader to the method of dynamic programming
invented by Richard E. Bellman (1920–1984) in 1957. Dynamic programming can be used to
synthesize optimal controllers for nonlinear, time-varying dynamical systems. We first present
dynamic programming for discrete-time systems and then discuss the continuous version of
dynamic programming.

5.4.1 Discrete-Time Systems


Dynamic programming is based on the principle of optimality (PO), which was formulated by
Bellman as follows:
An optimal policy has the property that whatever the initial state and initial decision are, the
remaining decisions must constitute an optimal policy with regard to the state resulting from the
first decision.
In other words, an optimal trajectory has the property that at an intermediate point, no matter how
it was reached, the rest of the trajectory must coincide with an optimal trajectory as computed
from this intermediate point as the initial point. We will justify the PO by contradiction. Sup-
pose that we are given an optimal trajectory starting from initial state x 0 = x(t0 ) at time t0 and

Figure 5.22 Illustration of the Principle of Optimality.

terminating at the final state x(t f ) at time t f . Let x(t) be an intermediate state on the optimal tra-
jectory and let J ∗ (t, x(t)) be the optimal cost of traveling from x(t) to the final state x(t f ). Thus,
J ∗ (t0 , x(t0 )) is the optimal cost of traveling from x 0 = x(t0 ) to x(t f ). It then follows that the
portion of the optimal trajectory from x(t) to x(t f ) is also optimal. For if some other path,
indicated by the dashed curve in Figure 5.22, connecting x(t) and x(t f ) resulted in a smaller
cost than the solid path, then this would provide a less expensive route from x 0 to x(t f ), thus
contradicting the optimality of the original trajectory.
We note that the PO seems to have been already known to Johann Bernoulli, who stated in 1706 that
“. . . all curves, which should give a maximum, preserve also in all their sections the laws of the
same maximum.” Here, the reader should interpret maximum as optimum.
Bellman’s principle of optimality serves to “limit the number of potentially optimal control
strategies that must be investigated” [182, p. 315]. The optimal control strategy is determined,
using the PO, by working backward from the final stage.
We illustrate this “backwardness” with an example of a sequential decision-making process.
We define a decision as the choice of alternative paths leaving a given node.

◆ Example 5.16
We use the PO to find the minimum cost path that starts at the node a and ends
at any one of the nodes k, l, m in the routing network depicted in Figure 5.23.
The travel costs are shown beside each path segment. The line segments can only
be traversed from left to right. Applying the PO to this routing problem, we find
the optimal path by performing a backward pass through the network. Let J q∗ be
the minimum cost from a generic node q to any one of the three possible final
nodes. Then, we have

J g∗ = 16, J h∗ = 8, J i∗ = 7, J j∗ = 6.

The next stage yields

J d∗ = min{3 + J g∗ , 11 + J h∗ } = min{19, 19} = 19.

Thus, in this case, we have two optimal paths from node d. We can take either dgk
or dhk paths. We then compute optimal paths from nodes e and f. We have

J e∗ = min{6 + J h∗ , 10 + J i∗ } = min{6 + 8, 10 + 7} = 14

Figure 5.23 A routing network.

and
J ∗f = min{4 + J i∗ , 12 + J j∗ } = min{4 + 7, 12 + 6} = 11.

Thus, the optimal path emanating from node e is ehk, and the path from node f is
fil. The next stage yields
J b∗ = min{9 + J d∗ , 2 + J e∗ } = min{9 + 19, 2 + 14} = 16,
and
J c∗ = min{5 + J e∗ , 14 + J ∗f } = min{5 + 14, 14 + 11} = 19.

Therefore, the optimal path starting from b is behk, and starting from c the optimal
path is cehk. Finally, the initial, and hence final, stage yields
J a∗ = min{4 + J b∗ , 7 + J c∗ } = min{4 + 16, 7 + 19} = 20.
Hence, the optimal, minimum total cost route from a to any one of the three
possible termination nodes k, l, or m is abehk. The total cost of this path is 20.
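The backward pass of Example 5.16 mechanizes easily. The following MATLAB sketch reproduces the computation using only the segment costs quoted above; the struct-based bookkeeping is merely an implementation choice.

% Backward dynamic-programming pass for the routing network of Example 5.16.
% Costs of the last segments are folded into the terminal values J(g), ..., J(j).
J = struct('g',16, 'h',8, 'i',7, 'j',6);          % cost-to-go from g, h, i, j
J.d = min(3  + J.g, 11 + J.h);                    % node d
J.e = min(6  + J.h, 10 + J.i);                    % node e
J.f = min(4  + J.i, 12 + J.j);                    % node f
J.b = min(9  + J.d,  2 + J.e);                    % node b
J.c = min(5  + J.e, 14 + J.f);                    % node c
J.a = min(4  + J.b,  7 + J.c);                    % node a: minimum total cost
fprintf('Minimum cost from a: %d\n', J.a);        % prints 20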

We now consider a class of discrete-time dynamical systems modeled by the difference equation
x(k + 1) = f (k, x(k), u(k)), (5.74)
with a given initial condition x(k0 ) = x 0 , and the associated performance index
J = J_0 = Φ(N, x(N)) + ∑_{k=0}^{N−1} F(k, x(k), u(k)).

Let Jk∗ (x(k)) denote the minimum cost of transferring the system from x(k) to the terminal
point x(N). Because x(N) is the terminal point, we have J_N*(x(N)) = Φ(N, x(N)). Using the
PO, we obtain the cost of transfer from x(N − 1) to the terminal point:
J*_{N−1}(x(N − 1)) = min_{u(N−1)} {F(N − 1, x(N − 1), u(N − 1)) + J_N*(x(N))}
                   = min_{u(N−1)} {F(N − 1, x(N − 1), u(N − 1)) + Φ(N, x(N))}.

When applying the PO to solving optimal control problems for discrete-time dynamical systems,
we usually perform two stage-by-stage passes through the time stages. We begin with a backward
pass. For this, we use (5.74) to eliminate x(N ). Next, we carry out the minimization to find
u∗ (N − 1). We then repeat the process. A typical step of the backward pass is described as
J_k*(x(k)) = min_{u(k)} {F(k, x(k), u(k)) + J*_{k+1}(x(k + 1))}
           = min_{u(k)} {F(k, x(k), u(k)) + J*_{k+1}(f(k, x(k), u(k)))}.

The backward pass is completed when the initial time k0 is reached. Now, because x 0 = x(k0 )
is known, we find u0 = u(k0 ) in terms of this state. We can then proceed with the forward pass.
We use x 0 , u0 , and (5.74) to compute x(k0 + 1). The state x(k0 + 1) is then used to compute
u(k0 + 1) from x(k0 + 2) = f (k0 + 1, x(k0 + 1), u(k0 + 1)), and so on.

◆ Example 5.17
Consider a scalar discrete-time dynamical system modeled by

x(k + 1) = 2x(k) − 3u(k), x(0) = 4,

and the performance index

J = J_0 = (x(2) − 10)² + (1/2) ∑_{k=0}^{1} (x²(k) + u²(k)).

Note that the final state, in this example, is free. We use the PO to find optimal
u(0) and u(1). There are no constraints on u(k). We begin with the backward pass.
We have

J ∗ ( x(2)) = ( x(2) − 10) 2 .

Then,
J*(x(1)) = min_{u(1)} {(1/2)(x²(1) + u²(1)) + J*(x(2))}
         = min_{u(1)} {(1/2)(x²(1) + u²(1)) + (2x(1) − 3u(1) − 10)²}.

Because there are no constraints on u(1), we find optimal u(1) as a function of x(1)
by applying the first-order necessary condition for unconstrained optimization to
get

∂/∂u(1) {(1/2)(x²(1) + u²(1)) + (2x(1) − 3u(1) − 10)²} = 0.

Hence,

u(1) = (1/19)(12x(1) − 60),

and therefore
J*(x(1)) = (1/2)(x²(1) + ((12x(1) − 60)/19)²) + (2x(1) − 3(12x(1) − 60)/19 − 10)².

Next, we compute u(0). For this observe that


J*(x(0)) = min_{u(0)} {(1/2)(x²(0) + u²(0)) + J*(x(1))}.

Taking into account that

x(1) = 2x(0) − 3u(0), x(0) = 4,

and the fact that there are no constraints on u(0), we again use the first-order nec-
essary condition for unconstrained optimization to find optimal u(0). We compute

∂/∂u(0) {(1/2)(x²(0) + u²(0)) + J*(x(1))} = 0

to get
∂/∂u(0) {(1/2)(16 + u²(0)) + (1/2)((8 − 3u(0))² + ((12(8 − 3u(0)) − 60)/19)²)
        + (2(8 − 3u(0)) − 3(12(8 − 3u(0)) − 60)/19 − 10)²} = 0.

Performing some manipulations yields 13.7895u(0) = 27.7895. Hence, u(0) =


2.0153. This completes the backward pass. We are now ready for the forward pass.
We use the system difference equation, x(0), and u(0) to compute x(1) = 1.9541.
Using the above, we find

u(1) = (12x(1) − 60)/19 = −1.9237.

Therefore,

x(2) = 2x(1) − 3u(1) = 9.6794,

which completes the forward pass.
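The two-stage problem of Example 5.17 is small enough to check by direct minimization over the pair (u(0), u(1)). A MATLAB sketch using fminsearch (the starting guess is arbitrary):

% Direct minimization of the two-stage cost of Example 5.17 over (u(0), u(1)).
x0   = 4;
cost = @(u) ( (2*(2*x0 - 3*u(1)) - 3*u(2)) - 10 )^2 ...
       + 0.5*( x0^2 + u(1)^2 + (2*x0 - 3*u(1))^2 + u(2)^2 );
u  = fminsearch(cost, [0; 0]);           % expect u(0) = 2.0153, u(1) = -1.9237
x1 = 2*x0 - 3*u(1);                      % expect x(1) = 1.9541
x2 = 2*x1 - 3*u(2);                      % expect x(2) = 9.6794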

5.4.2 Discrete Linear Quadratic Regulator Problem


In Example 5.17 above, we solved a simple discrete linear quadratic regulator problem where
we found the open-loop optimal control strategy. We will now apply the PO to find the optimal
control of a linear state-feedback form for the discrete linear quadratic regulator problem. This

problem is stated as follows. We are given a discrete linear plant model,


x(k + 1) = Ax(k) + Bu(k), 0 ≤ k ≤ N − 1,
with a specified initial condition x(0) = x 0 . We wish to calculate the optimal control sequence
{u∗ (0), u∗ (1), . . . , u∗ (N − 1)},
that minimizes the quadratic performance index
J = J_0 = (1/2) x^T(N) H_N x(N) + (1/2) ∑_{k=0}^{N−1} {x^T(k) Q x(k) + u^T(k) R u(k)},

where H N = H TN ≥ 0, Q = Q T ≥ 0, and R = R T > 0. We assume that the components of the


control vector are unconstrained. To solve this problem we use the PO. To begin, we note that

J*(x(N)) = (1/2) x^T(N) H_N x(N).

The above is the penalty for being in a state x(N ) at a time N .


We now decrement k to N − 1 to get

J_{N−1}(x(N − 1)) = (1/2) x^T(N) H_N x(N) + (1/2) x^T(N − 1) Q x(N − 1) + (1/2) u^T(N − 1) R u(N − 1).

We use the state equation to eliminate x(N ) from JN −1 to obtain

J_{N−1} = (1/2)(Ax(N − 1) + Bu(N − 1))^T H_N (Ax(N − 1) + Bu(N − 1))
        + (1/2) x^T(N − 1) Q x(N − 1) + (1/2) u^T(N − 1) R u(N − 1).

Because there are no constraints on control, we can apply the first-order necessary condition
from static optimization to find u∗ (N − 1) as a function of x(N − 1),
∂J_{N−1}/∂u(N − 1) = (Ax(N − 1) + Bu(N − 1))^T H_N B + u^T(N − 1) R = 0^T.

Solving for the control satisfying the first-order necessary condition, we obtain
u*(N − 1) = −(R + B^T H_N B)^{-1} B^T H_N A x(N − 1).
Let
K N −1 = (R + B T H N B)−1 B T H N A,
then
u∗ (N − 1) = −K N −1 x(N − 1).
Computing the optimal cost of transferring the system from N − 1 to N yields

J*_{N−1} = (1/2) x^T(N − 1)(A − B K_{N−1})^T H_N (A − B K_{N−1}) x(N − 1)
         + (1/2) x^T(N − 1)(K^T_{N−1} R K_{N−1} + Q) x(N − 1).

Let
H N −1 = ( A − B K N −1 )T H N ( A − B K N −1 ) + K TN −1 R K N −1 + Q,
then

J*_{N−1} = (1/2) x^T(N − 1) H_{N−1} x(N − 1).
Decrementing k to N − 2 yields
J_{N−2} = (1/2) x^T(N − 1) H_{N−1} x(N − 1) + (1/2) x^T(N − 2) Q x(N − 2) + (1/2) u^T(N − 2) R u(N − 2).

Note that JN −2 has the same form as JN −1 . Thus, we obtain analogous optimal feedback gain
where N is being replaced with N − 1. Continuing in this fashion, we get the following results
for each k = N − 1, N − 2, . . . , 0:
K k = (R + B T H k+1 B)−1 B T H k+1 A,
u∗ (k) = −K k x(k),
H k = ( A − B K k )T H k+1 ( A − B K k ) + K kT R K k + Q,
and

J_k* = (1/2) x^T(k) H_k x(k).

The above control scheme can be implemented by computing the sequence of gain matrices
{K_k} off-line and storing it. Then, we can implement the controller u*(k) = −K_k x(k). Note that
the time-varying nature of the feedback controller may make an implementation of the control
strategy quite complicated, especially for large n and N . However, as we go to the limit as
N → ∞, we can simplify the solution to the problem. Specifically, let us consider a (reachable)
system model
x(k + 1) = Ax(k) + Bu(k), k ≥ 0, x(0) = x 0
and the performance index

J = (1/2) ∑_{k=0}^{∞} {x^T(k) Q x(k) + u^T(k) R u(k)}

obtained from the previously considered performance index by letting N → ∞ and setting
H N = 0. Then, the optimal controller takes the form
u∗ (k) = −K ∞ x(k), k ≥ 0,
where
K ∞ = (R + B T H ∞ B)−1 B T H ∞ A (5.75)
and H ∞ is defined as the limit obtained by continuing the recursion
H k = ( A − B K k )T H k+1 ( A − B K k ) + K kT R K k + Q (5.76)

back to k = −∞, that is,


lim_{k→−∞} H_k = H_∞.

Alternatively, letting k → −∞ in (5.76), we obtain


H_∞ = (A − B K_∞)^T H_∞ (A − B K_∞) + K_∞^T R K_∞ + Q,     (5.77)
where K ∞ is given by (5.75). Substituting (5.75) into (5.77) yields
H ∞ = ( A − B(R + B T H ∞ B)−1 B T H ∞ A)T H ∞ ( A − B(R + B T H ∞ B)−1 B T H ∞ A)
+ ((R + B T H ∞ B)−1 B T H ∞ A)T R((R + B T H ∞ B)−1 B T H ∞ A) + Q. (5.78)
Thus, we can compute H ∞ by solving the above algebraic equation. We note that the above
formidable looking equation is just the discrete algebraic Riccati equation. Indeed, setting
P = H ∞ , using the notation
S = R + B T H ∞ B = R + B T P B,
and performing simple manipulations, we obtain (5.73).
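The recursion (5.76) and the gain formula are straightforward to program. The sketch below runs the backward pass for an assumed plant and horizon and shows the gain settling to its steady-state value; the data are illustrative assumptions only.

% Backward Riccati recursion for the finite-horizon discrete LQR,
% illustrating convergence of K_k to the steady-state gain K_inf.
% A, B, Q, R, N, H_N are assumed example data.
A = [1 0.1; 0 1];  B = [0; 0.1];
Q = eye(2);  R = 1;  N = 100;
H = zeros(2);                              % H_N = 0
for k = N-1:-1:0
    K = (R + B'*H*B)\(B'*H*A);             % gain K_k, computed from H_{k+1}
    H = (A - B*K)'*H*(A - B*K) + K'*R*K + Q;
end
disp(K)   % for large N this approximates K_inf in (5.75)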
We next discuss a continuous form of dynamic programming. We begin our discussion by
analyzing the minimum time regulator problem. Then, we focus our attention on a more general
optimal control problem.

5.4.3 Continuous Minimum Time Regulator Problem


In this subsection we consider a control system modeled by
ẋ = f (t, x, u)
with a given initial state x(t0) = x_0. We restrict the controller u(t) to take values in the compact
set U ⊂ Rm . For example, we may restrict the components u i , i = 1, 2, . . . , m, of the control
vector u to satisfy the amplitude constraints
|u i | ≤ ki ,
where ki are fixed real numbers. A piecewise continuous control vector u satisfying the above
constraints is called admissible. The set of admissible controllers is denoted U . Our goal is to
determine an admissible control law u∗ that steers the system’s trajectory from a given initial
state x(t0 ) = x 0 to a specified final state x f in the shortest possible time. This is equivalent to
minimizing the performance index
J = ∫_{t0}^{tf} dt = t_f − t_0.

We call such a controller time optimal. We assume that for any x ≠ x_f there exists an admissible,
time optimal control transferring the system from x to x f in minimum time. We denote by
J ∗ (t, x) = J ∗ (t, x(t)) the time of the optimal transfer from x(t) to the specified x f . Observe
that J*(t, x) = J*(t, x_1, x_2, . . . , x_n), that is,
J ∗ : Rn+1 → R,
which means that J ∗ is a real-valued function of n + 1 variables. We assume that the function
J ∗ is continuous and has continuous partial derivatives everywhere except at the point x f .

Figure 5.24 Illustration of the derivation of the condition that a controller must satisfy when transferring a dynamical system from an initial state to the final state in an optimal manner.

Now, let x_0 = x(t_0) ≠ x_f be an arbitrary initial state in R^n. Assume that we applied some
admissible u0 ∈ U on the time interval [t0 , t]. As a result of this control action, the system is
transferred from x(t0 ) to x(t). The time spent on the motion from x(t0 ) to x(t) is t − t0 . We
then apply an admissible optimal control that transfers the system in minimum time J ∗ (t, x(t))
from x(t) to x f . The above considerations are illustrated in Figure 5.24. Denoting the optimal
time of transferring the system from x(t0 ) to x f by J ∗ (t0 , x(t0 )), we obtain
(t − t0 ) + J ∗ (t, x(t)) ≥ J ∗ (t0 , x(t0 )).
Rearranging the terms of the above equation and dividing by (t − t0 ) yields
1 + (J*(t, x(t)) − J*(t_0, x(t_0)))/(t − t_0) ≥ 0.
Letting t → t0 and taking into account that ẋ i = f i (t, x, u), i = 1, 2, . . . , n, we obtain
1 + ∂J*(t, x)/∂t + ∑_{i=1}^{n} (∂J*(t, x)/∂x_i) ẋ_i = 1 + ∂J*(t, x)/∂t + ∑_{i=1}^{n} (∂J*(t, x)/∂x_i) f_i
                                                   = 1 + ∂J*/∂t + (∂J*/∂x) f
                                                   ≥ 0.
We conclude that for any x_0 ≠ x_f and for any u ∈ U,
1 + ∂J*/∂t + (∂J*/∂x) f ≥ 0.
We next define the Hamiltonian function for the time optimal control problem to be
H (t, x, u, p) = 1 + pT f (t, x, u),
where
p = [∂J*/∂x_1  ∂J*/∂x_2  · · ·  ∂J*/∂x_n]^T.
Thus, we can say that for any x_0 ≠ x_f and for any u ∈ U,
∂J*(t, x)/∂t + H(t, x, u) ≥ 0.

Figure 5.25 Illustration of the derivation of the optimality condition of Theorem 5.5.

We will now show that


∂J*(t, x)/∂t + H(t, x, u) = 0
for any optimal transfer, that is,
∂J*(t, x)/∂t + min_{u∈U} H(t, x, u) = 0   for any x ≠ x_f.

In the following, we use the PO. Suppose that u(t) ∈ U is an optimal controller that transfers the
system from the initial state x(t_0) ≠ x_f to the given final state x_f in a time t_f. Thus, we have
J ∗ (t0 , x(t0 )) = t f − t0 . The transfer from x(t) to x f requires (t f − t) time units. See Figure 5.25
for an illustration of the arguments. Hence,
t f − t = J ∗ (t0 , x(t0 )) − (t − t0 ),
and it is not possible to go from x(t) to x f in a time less than J ∗ (t0 , x(t0 )) − (t − t0 ). For if
such a more rapid motion were to exist, indicated by the dashed curve in Figure 5.25, then by
moving from x 0 to x(t) during the time t − t0 and then moving from x(t) to x f during a time
less than J ∗ (t0 , x(t0 )) − (t − t0 ), we would accomplish the transfer from x 0 to x f during a time
less than J ∗ (t0 , x(t0 )), which is impossible. Therefore, we can write
t f − t = J ∗ (t, x(t))
= J ∗ (t0 , x(t0 )) − (t − t0 ).
Rearranging terms, dividing by t − t0 , and letting t → t0 yields
∂J*(t, x)/∂t + min_{u∈U} H(t, x, u) = 0   for any x ≠ x_f.

We summarize our considerations in the following theorem.

Theorem 5.5 Let ẋ = f (t, x, u) be a control system model, let U be a set of admissible
controllers, and let x f be a prescribed final state. If we suppose that
1. for any x ≠ x_f there exists an admissible optimal control transferring x to x_f, and
2. the function J ∗ (t, x(t)), which is the time of optimal transfer of the system from x,
at time t, to x f , is continuously differentiable except at the point x f ,

then
∂J*(t, x)/∂t + H(t, x, u) ≥ 0   for any u ∈ U and x ≠ x_f,
∂J*(t, x)/∂t + min_{u∈U} H(t, x, u) = 0   for any optimal transfer where x ≠ x_f.

The above theorem is a special case of a more general result known as the continuous form
of the dynamic programming. We now use the PO to derive a continuous form of dynamic
programming for a class of problems with more general performance indices.

5.4.4 The Hamilton–Jacobi–Bellman Equation


We consider a control process model of the form
ẋ = f (t, x, u), x(t0 ) = x 0 , (5.79)
and we consider the performance index to minimize
J(t_0, x(t_0), u) = Φ(t_f, x(t_f)) + ∫_{t0}^{tf} F(τ, x(τ), u(τ)) dτ.     (5.80)

Define
J(t, x(t), u(τ)) = Φ(t_f, x(t_f)) + ∫_t^{tf} F(τ, x(τ), u(τ)) dτ

with t ≤ τ ≤ t f , and let


J*(t, x(t)) = min_u J(t, x(t), u(τ)).

We subdivide the interval [t, t f ] as


[t, t_f] = [t, t + Δt] ∪ [t + Δt, t_f].
Then, we can write
J*(t, x(t)) = min_u { ∫_t^{t+Δt} F dτ + ∫_{t+Δt}^{tf} F dτ + Φ(t_f, x(t_f)) }.

However, by the PO the trajectory on the interval [t + Δt, t_f] must be optimal. Hence,
J*(t, x(t)) = min_u { ∫_t^{t+Δt} F dτ + J*(t + Δt, x(t + Δt)) }.

We next expand J*(t + Δt, x(t + Δt)) in a Taylor series about the point (t, x(t)) to get
J*(t, x(t)) = min_u { ∫_t^{t+Δt} F dτ + J* + (∂J*/∂t) Δt + (∂J*/∂x)(x(t + Δt) − x(t)) + H.O.T. },

where H.O.T. stands for higher-order terms. Because J ∗ (t, x(t)) is independent of u, we cancel
the terms J*(t, x(t)) out and use the fact that x(t + Δt) − x(t) ≈ ẋ Δt to obtain
0 = min_u { ∫_t^{t+Δt} F dτ + (∂J*/∂t) Δt + (∂J*/∂x) ẋ Δt + H.O.T. }.
By assumption, Δt is small; hence we can write
0 = min_u { F Δt + (∂J*/∂t) Δt + (∂J*/∂x) f Δt + H.O.T. }.
Dividing the above equation by Δt and letting Δt → 0 yields
0 = ∂J*/∂t + min_u { F + (∂J*/∂x) f }.     (5.81)

Let H = F + (∂J*/∂x) f denote the Hamiltonian function. Then, we write the partial differential
equation (5.81) as
0 = ∂J*/∂t + min_u H     (5.82)

subject to the boundary condition


J*(t_f, x(t_f)) = Φ(t_f, x(t_f)).
The partial differential equation (5.82) for the optimal cost J ∗ (t, x(t)) is called the Hamilton–
Jacobi–Bellman (HJB) equation. It provides the solution to the optimal control problem for
general nonlinear dynamical systems. However, analytical solution to the HJB equation is diffi-
cult to obtain in most cases. We now illustrate, with three examples, how the HJB equation can
be used to construct optimal controllers.

◆ Example 5.18
We consider a general class of dynamical systems modeled by the scalar differential
equation
ẋ = ax + bu, x(t0 ) = x0 ,

with the associated cost index



J(t_0) = (1/2) f x²(t_f) + (1/2) ∫_{t0}^{tf} (q x²(t) + r u²(t)) dt,

where the initial time t0 equals zero, the final time t f < ∞ is fixed, and the final
state x(t_f) is free. Our goal is to construct the control u* that minimizes J(0).
We first form the Hamiltonian function for the problem:

H(t, x, u, ∂J*/∂x) = (1/2) q x² + (1/2) r u² + (∂J*/∂x)(ax + bu).

Because there are no constraints on the control u, we can determine the form
of the optimal control by applying the first-order necessary condition for static
optimization. We compute

∂H/∂u = r u + b ∂J*/∂x = 0.

The optimal control, therefore, has the form

u* = −(1/r) b ∂J*/∂x.

This control indeed minimizes the Hamiltonian function H because

∂²H/∂u² = r > 0;

that is, the second-order sufficient condition for static minimization is satisfied.
We thus reduced the problem of constructing the optimal control to that of finding
∂ J ∗ /∂ x. Substituting u∗ into the HJB equation yields

∂J*/∂t + (1/2) q x² + (b²/2r)(∂J*/∂x)² + (∂J*/∂x) ax − (b²/r)(∂J*/∂x)²
    = ∂J*/∂t + (1/2) q x² − (b²/2r)(∂J*/∂x)² + (∂J*/∂x) ax
    = 0.

Let us now assume that J ∗ is a quadratic function of the state for all t ≤ t f , that is,

J* = (1/2) p(t) x²,

where p(t) is a scalar function of time to be determined. Partial derivatives of J* are
∂J*/∂t = (1/2) ṗ(t) x²   and   ∂J*/∂x = p(t) x.

Substituting the above into the HJB equation and performing some manipulations
gives

((1/2) ṗ(t) + (1/2) q + a p(t) − (b² p²(t))/(2r)) x² = 0.

The above holds for all x = x(t) if and only if

(1/2) ṗ(t) + (1/2) q + a p(t) − (b² p²(t))/(2r) = 0,

or, equivalently,

ṗ(t) + q + 2a p(t) − (b² p²(t))/r = 0     (5.83)
subject to the boundary condition p(t_f) = f that we obtain by comparing J* = (1/2) p(t) x² and
J(t_f) = (1/2) f x²(t_f). To solve the above equation we assume that
p(t) = ρ ẇ(t)/w(t),

where the parameter ρ is to be determined. Note that

ṗ(t) = ρ (ẅ(t)w(t) − ẇ²(t))/w²(t).

Substituting the above into (5.83), we obtain

ṗ + q + 2ap − b² p²/r
    = ρ (ẅw − ẇ²)/w² + q + 2aρ ẇ/w − b² ρ² ẇ²/(r w²)
    = (ρ ẅw − ρ ẇ² + q w² + 2aρ ẇw − b² ρ² ẇ²/r)/w²
    = 0.
We now choose ρ so that the terms nonlinear in ẇ cancel out. For this it is enough
that
−ρ ẇ² − b² ρ² ẇ²/r = 0.
Hence, if we set
ρ = −r/b²,
then the equation to be solved takes the form

(−(r/b²) ẅ − (2ar/b²) ẇ + q w) w / w² = 0.
The above will be satisfied if
−(r/b²) ẅ − (2ar/b²) ẇ + q w = 0,
or, equivalently,
ẅ + 2a ẇ − (q b²/r) w = 0,

which is just a second-order linear differential equation. Its characteristic equation


has two real roots:
s_{1,2} = −a ± √(a² + q b²/r).

Hence

w(t) = C_1 e^{s_1 t} + C_2 e^{s_2 t},
and
p(t) = ρ ẇ(t)/w(t) = ρ (s_1 C_1 e^{s_1 t} + s_2 C_2 e^{s_2 t})/(C_1 e^{s_1 t} + C_2 e^{s_2 t}).

Using the boundary value p(t f ) = f , we express C1 in terms of C2 to obtain

C_1 = (ρ s_2 C_2 e^{s_2 t_f} − f C_2 e^{s_2 t_f})/(f e^{s_1 t_f} − ρ s_1 e^{s_1 t_f}).

We substitute the above expression for C1 into p(t) and cancel out C2 . Then, we
use the obtained result to construct the optimal state-feedback controller:

u* = −(1/r) b ∂J*/∂x = −(1/r) b p(t) x.
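As a numerical cross-check of Example 5.18, the scalar Riccati equation (5.83) can be integrated backward from p(t_f) = f with ode45. The parameter values in the MATLAB sketch below are assumptions chosen only for illustration.

% Backward integration of the scalar Riccati equation (5.83).
% a, b, q, r, f, tf are assumed example values.
a = -1; b = 1; q = 2; r = 1; f = 5; tf = 2;
% Reverse time: tau = tf - t, so dp/dtau = q + 2*a*p - b^2*p^2/r, p(tau=0) = f.
rhs = @(tau, p) q + 2*a*p - b^2*p^2/r;
[tau, p] = ode45(rhs, [0 tf], f);
t = tf - tau;                 % p(t) along the original time axis
% The optimal feedback is then u*(t) = -(b/r)*p(t)*x(t).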

◆ Example 5.19
Consider the schematic of an armature-controlled DC motor, given in Figure 5.26,
where the system parameters are R_a = 2 Ω, L_a ≈ 0 H, K_b = 2 V/rad/sec, and
K_i = 2 N·m/A, and the equivalent moment of inertia referred to the motor shaft is
I_eq = 1 kg·m². The friction is assumed to be negligible.

Figure 5.26 Schematic of an armature-controlled DC motor system of Example 5.19.

1. Construct the first-order differential equation modeling the DC motor.
2. Use the Hamilton–Jacobi–Bellman equation to find the optimal state-feedback

controller, ea = k(t)ω, which minimizes the performance index


J = (1/2) ω²(10) + ∫_0^{10} R_a i_a²(t) dt,

where i a is the armature current. The final state is free. There are no constraints
on e_a. Assume J* = (1/2) p(t) ω².

Applying Kirchhoff’s voltage law to the armature circuit yields


Ra i a + K b ω = ea .
The torque developed by the motor, Tm, is
I eq ω̇ = Tm = K i i a .
Substituting i a from the first equation into the second and dividing both sides by
I eq yields
ω̇ = −(K_i K_b)/(R_a I_eq) ω + K_i/(I_eq R_a) e_a.

Substituting into the above the parameter values, we obtain


ω̇ = −2ω + ea . (5.84)
We now represent the performance index, J , in terms of ω and ea . Applying Ohm’s
law to the armature circuit, we get
ea − K b ω = Ra i a .
Hence,

J = (1/2) ω²(10) + ∫_0^{10} (e_a − K_b ω)²/R_a dt.

The Hamiltonian function is



H = (e_a − K_b ω)²/R_a + (∂J*/∂ω)(−(K_i K_b)/(R_a I_eq) ω + K_i/(I_eq R_a) e_a)
  = (e_a − 2ω)²/2 + (∂J*/∂ω)(−2ω + e_a).
Because there are no constraints on ea , we can find the optimal control by solving
the equation
0 = ∂H/∂e_a = 2(e_a − K_b ω)/R_a + K_i/(I_eq R_a) ∂J*/∂ω.

Hence,
e_a = 2ω − ∂J*/∂ω.     (5.85)

Substituting (5.85) into the HJB equation yields



0 = ∂J*/∂t + (1/2)(∂J*/∂ω)² − (∂J*/∂ω)(∂J*/∂ω) = ∂J*/∂t − (1/2)(∂J*/∂ω)².

By assumption J* = (1/2) p(t) ω². Hence,
∂J*/∂t = (1/2) ṗ ω²   and   ∂J*/∂ω = p ω.

Substituting the above into the HJB equation, we get


(1/2)(ṗ − p²) ω² = 0.

We next solve the nonlinear differential equation

ṗ − p 2 = 0. (5.86)

We assume that p = ρ ẇ/w. Therefore,
ṗ = ρ (ẅw − ẇ²)/w².

We substitute the expressions for p and ṗ into (5.86) to get

(ρ ẅw − ρ ẇ² − ρ² ẇ²)/w² = 0.

Selecting ρ = −1 eliminates the nonlinear terms in the numerator, and we are left
with the equation

ẅ = 0,

whose solution is

w = w(t) = C1 t + C2 ,

where C1 and C2 are integration constants. Hence,

p = −ẇ/w = −C_1/(C_1 t + C_2).

Because

J*(10) = (1/2) ω²(10) = (1/2) p(10) ω²(10),

we conclude that p (10) = 1. We use this information to eliminate one of the


integration constants. We obtain

p(10) = 1 = −C_1/(10C_1 + C_2).

Hence,
C2 = −11C1 ,
and

p = p(t) = −C_1/(C_1 t + C_2) = −C_1/(C_1 t − 11C_1) = −1/(t − 11).

Therefore,

e_a* = 2ω − ∂J*/∂ω = 2ω − pω = (2 + 1/(t − 11)) ω.
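A quick simulation shows the time-varying gain of Example 5.19 at work. The MATLAB sketch below integrates the closed-loop motor equation from an assumed nonzero initial speed; the value ω(0) = 5 rad/sec is an illustrative assumption.

% Closed-loop simulation of Example 5.19: omega_dot = -2*omega + ea*,
% with ea* = (2 + 1/(t-11))*omega.  The initial speed is an assumed value.
ea = @(t, w) (2 + 1./(t - 11)).*w;           % optimal armature voltage
[t, w] = ode45(@(t, w) -2*w + ea(t, w), [0 10], 5);
plot(t, w), xlabel('t [sec]'), ylabel('\omega [rad/sec]')
% The closed-loop equation reduces to omega_dot = omega/(t-11), so omega
% decays linearly from omega(0) to omega(0)/11 at t = 10.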

In the above examples, the final time t f was fixed and t f < ∞. In the following example, we
consider the linear quadratic regulator design for t f = ∞ using the HJB equation.

◆ Example 5.20
For the dynamical system model
[ẋ_1; ẋ_2] = [0 1; 0 −1][x_1; x_2] + [0; 1] u

and the performance index



J = (1/2) ∫_0^∞ (x_1² + x_2² + u²) dt,

we will use the HJB equation to find the optimal state-feedback controller. We first
form the Hamiltonian function

H = (1/2)(x_1² + x_2² + u²) + (∂J*/∂x_1) x_2 + (∂J*/∂x_2)(−x_2 + u).

We assume that J ∗ has the form


 
J* = (1/2)(α x_1² + 2β x_1 x_2 + γ x_2²).

Hence,

∂J*/∂t = 0,
and the HJB equation takes the form
min_u H = 0,

with the boundary condition


J ∗ ( x(t f )) = J ∗ ( x(∞)) = 0.
Because there are no constraints on u, we apply the first-order necessary condition
for unconstrained optimization to find u minimizing H:
∂H/∂u = 0.
We denote by u∗ the solution to the above equation. Differentiating, we obtain
u* + ∂J*/∂x_2 = 0.

Hence,
u* = −∂J*/∂x_2.

We substitute u∗ into the HJB equation to get



(1/2)(x_1² + x_2² + (∂J*/∂x_2)²) + (∂J*/∂x_1) x_2 − (∂J*/∂x_2) x_2 − (∂J*/∂x_2)² = 0.

Note that

∂J*/∂x_1 = α x_1 + β x_2   and   ∂J*/∂x_2 = β x_1 + γ x_2.

Substituting the above into the HJB equation yields


 
(1/2)(x_1² + x_2² + (β x_1 + γ x_2)²) + (α x_1 + β x_2) x_2 − (β x_1 + γ x_2) x_2 − (β x_1 + γ x_2)² = 0.

We rearrange the above equation to obtain


(1/2) x_1²(1 − β²) + x_1 x_2 (α − β − βγ) + (1/2) x_2²(1 − 2γ − γ² + 2β) = 0.

For this equation to hold, we have to have


1 − β 2 = 0,
α − β − βγ = 0,
1 − 2γ − γ 2 + 2β = 0.
There are three solutions to the above system of equations:

1. α = 2, β = γ = 1.
2. α = −2, β = 1, γ = −3.
3. α = 0, β = γ = −1.

Thus, we have three candidate solutions to the problem:

u1 = −x1 − x2 ,
u2 = −x1 + 3x2 ,
u3 = x1 + x2 .
However, only u1 stabilizes the plant. Hence, the optimal control law for the
problem is
u∗ = −x1 − x2 .
Note that only for this controller is the boundary condition, J*(x(t_f)) = J*(x(∞)) = 0,
satisfied.
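The result of Example 5.20 can be cross-checked against the algebraic Riccati equation, for instance with the Control System Toolbox function lqr (an assumed tool here, not part of the example):

% Cross-check of Example 5.20 via the ARE.
A = [0 1; 0 -1];  B = [0; 1];
Q = eye(2);       R = 1;
[K, P] = lqr(A, B, Q, R);
% Expect K = [1 1] (so u* = -x1 - x2) and P = [2 1; 1 1],
% which matches J* = (1/2)*(2*x1^2 + 2*x1*x2 + x2^2).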

5.5 Pontryagin’s Minimum Principle

5.5.1 Optimal Control with Constraints on Inputs


In this section we provide an informal derivation of the minimum principle of Pontryagin, which
is a generalization of the Euler–Lagrange equations that also includes problems with constraints
on the control inputs. Only a special case of the minimum principle will be stated, but this
special case covers a large class of control problems. We consider the problem of minimizing
the performance index given by
J = Φ(t_f, x(t_f)) + ∫_{t0}^{tf} F(x(t), u(t)) dt     (5.87)

subject to
ẋ = f (x, u), x(t0 ) = x 0 , and x(t f ) = x f . (5.88)
We consider two cases: fixed final state and free final state.
It follows from our discussion of the HJB equation that the optimal controller, u∗ , minimizes
the Hamiltonian function, that is,
u* = arg min_{u∈U} H(x, u, p),

where H(x, u, p) = H = F + p^T f, and
p = [p_1  p_2  · · ·  p_n]^T = [∂J*/∂x_1  ∂J*/∂x_2  · · ·  ∂J*/∂x_n]^T.
To proceed, we need one more assumption in addition to the assumptions we made while
discussing the HJB equation. We assume that J ∗ (x) has continuous second partial derivatives
∂²J*/∂x_i∂x_j; that is, the functions p_i have continuous first derivatives ∂p_i/∂x_j. This implies that
∂p/∂x = (∂p/∂x)^T.     (5.89)
We also assume that the components of f have continuous first derivatives ∂ f i /∂ x j . We then

compute
∂H/∂x = ∂/∂x (F + p^T f)
      = ∂F/∂x + f^T ∂p/∂x + p^T ∂f/∂x,     (5.90)
where ∂ F/∂ x = ∇ xT F. From our previous discussion it follows that the Hamiltonian function
attains its minimum on the optimal trajectory. Taking this into account, we conclude that for the
optimal process we obtain
∂H/∂x = 0^T   for (x, u) = (x*, u*).
Hence, for the optimal process (x ∗ , u∗ ) we have
∂F/∂x + f^T ∂p/∂x + p^T ∂f/∂x = 0^T.     (5.91)
We now differentiate p with respect to time to get
ṗ = (∂p/∂x) ẋ = (∂p/∂x) f.     (5.92)
Combining (5.91) and (5.92) and taking into account (5.89), we obtain

ṗ = −(∂F/∂x)^T − (∂f/∂x)^T p.     (5.93)
Thus, p is the solution to the differential equation (5.93), and hence p = p(t). Therefore
∂ p/∂ x = 0, and by (5.90) we have
∂H/∂x = ∂F/∂x + p^T ∂f/∂x.     (5.94)
Comparing (5.93) and (5.94) gives

ṗ = −(∂H/∂x)^T     (5.95)
The above equation is called in the literature the adjoint or costate equation.
We summarize our development in the following theorem.

Theorem 5.6 Necessary conditions for u ∈ U to minimize (5.87) subject to (5.88) are

ṗ = −(∂H/∂x)^T,
where H = H( x, u, p) = F ( x, u) + pT f ( x, u), and
H(x*, u*, p*) = min_{u∈U} H(x, u, p).
If the final state, x(t f ), is free, then in addition to the above conditions it is required
that the following end-point condition is satisfied:
p(t_f) = ∇_x Φ|_{t=t_f}.

We now illustrate the above theorem with the following well-known example.

◆ Example 5.21
We consider a controlled object modeled by
 
    
[ẋ_1; ẋ_2] = [0 1; 0 0][x_1; x_2] + [0; 1] u = Ax + bu
with the performance index
J = ∫_0^{tf} dt.

The control is required to satisfy


|u(t)| ≤ 1
for all t ∈ [0, t f ]. This constraint means that the control must have magnitude no
greater than 1. Our objective is to find admissible control that minimizes J and
transfers the system from a given initial state x 0 to the origin.
We begin by finding the Hamiltonian function for the problem:
H = 1 + pT ( Ax + bu)
    
  = 1 + [p_1  p_2]([0 1; 0 0][x_1; x_2] + [0; 1] u)
  = 1 + p_1 x_2 + p_2 u.
The costate equations are
 
[ṗ_1; ṗ_2] = −[∂H/∂x_1; ∂H/∂x_2] = [0; −p_1].
Solving the costate equations yields
p 1 = d1 and p 2 = −d 1 t + d 2 ,
where d1 and d2 are integration constants. We now find an admissible control
minimizing the Hamiltonian:
arg min_u H = arg min_u (1 + p_1 x_2 + p_2 u)
            = arg min_u (p_2 u)
            = { u(t) = 1 if p_2 < 0;  u(t) = −1 if p_2 > 0 }.
Hence,
u∗ (t) = −sign( p∗2 ) = −sign(−d1 t + d2 ),
where

sign(z) = 1 if z > 0,   and   sign(z) = −1 if z < 0.

Thus, the optimal control law is piecewise constant taking the values 1 or −1. This
control law has at most two intervals of constancy because the argument of the sign
function is a linear function, −d1 t + d2 , that changes its sign at most once. This
type of control is called a bang-bang control because it switches back and forth
between its extreme values. System trajectories for u = 1 and u = −1 are families
of parabolas, and their equations were derived in Example 2.2. A phase portrait for
u = 1 is shown in Figure 2.4, while that for u = −1 is shown in Figure 2.5. Note
that only one parabola from each family passes through the specified terminal
point x = [0 0]T in the state plane. Segments of the two parabolas through the
origin form the switching curve,

x_1 = −(1/2) x_2² sign(x_2).

This means that if an initial state is above the switching curve, then u = −1 is used
until the switching curve is reached. Then, u = 1 is used to reach the origin. For an
initial state below the switching curve, the control u = 1 is used first to reach the
switching curve, and then the control is switched to u = −1. The above control
action can be described as
u = 1 if v < 0,   u = −sign(x_2) if v = 0,   u = −1 if v > 0,
where v = v(x_1, x_2) = x_1 + (1/2) x_2² sign(x_2) = 0 is the equation describing the switch-


ing curve. We can use the above equation to synthesize a closed-loop system such
that starting at an arbitrary initial state in the state plane, the trajectory will always
be moving in an optimal fashion toward the origin. Once the origin is reached, the
trajectory will stay there. A phase portrait of the closed-loop time optimal system
is given in Figure 5.27.

Figure 5.27 A phase-plane portrait of the time optimal closed-loop system of Example 5.21.
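The switching-curve controller of Example 5.21 can be simulated with a simple fixed-step loop, which keeps the discontinuous control law transparent. The initial state and step size in the MATLAB sketch below are assumed values.

% Time-optimal bang-bang control of the double integrator (Example 5.21).
% x0 and dt are assumed example values; fixed-step Euler integration.
x  = [4; 2];  dt = 1e-3;  X = x;
for i = 1:2e5
    v = x(1) + 0.5*x(2)^2*sign(x(2));     % switching function
    if v > 0
        u = -1;
    elseif v < 0
        u = 1;
    else
        u = -sign(x(2));
    end
    x = x + dt*[x(2); u];                 % x1' = x2, x2' = u
    X = [X, x];                           %#ok<AGROW>
    if norm(x) < 1e-2, break; end
end
plot(X(1,:), X(2,:)), xlabel('x_1'), ylabel('x_2')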

◆ Example 5.22
Consider the schematic of an armature-controlled DC motor, given in Figure 5.26.
Use the minimum principle to find the armature voltage ea (t) such that the motor
angular velocity ω changes from ω(0) = 0 rad/sec at t = 0 to ω(1) = 10 rad/sec
while minimizing the energy dissipated in the armature resistor, Ra . There are no
constraints on e_a. Note that the energy dissipated in the armature resistor is
J = ∫_0^1 R_a i_a²(t) dt,

where i a is the armature current.


The modeling equation of the DC motor was derived in Example 5.19. It is given
by (5.84) on page 288. We now represent the expression for the energy dissipated
in the armature resistor in terms of ω and ea . Applying Ohm’s law to the armature
circuit, we get
ea − K bω = Ra i a .
Hence,
J = ∫_0^1 R_a i_a²(t) dt = ∫_0^1 (e_a − K_b ω)²/R_a dt.

The Hamiltonian function is



H = (e_a − K_b ω)²/R_a + p(−(K_i K_b)/(R_a I_eq) ω + K_i/(I_eq R_a) e_a).

Because there are no constraints on ea , we can find the optimal control by solving
the equation
0 = ∂H/∂e_a = 2(e_a − K_b ω)/R_a + K_i/(I_eq R_a) p.

Hence,
ea = 2ω − p. (5.96)
The costate equation is
ṗ = −∂H/∂ω = 2K_b(e_a − K_b ω)/R_a + (K_i K_b)/(R_a I_eq) p = 2(e_a − 2ω) + 2p.     (5.97)

We substitute (5.96) into (5.97) to obtain


ṗ = 0.
Hence,
p(t) = constant = c.
Therefore,
ea = 2ω − c. (5.98)

Substituting (5.98) into (5.84), we get


ω̇ = −c.
Solving the above yields
ω(t) = −ct + ω(0) = −ct,
because ω(0) = 0. We next use the boundary condition ω(1) = 10 to find c =
−10. Hence, the optimal armature voltage is

ea∗ (t) = 20t + 10

◆ Example 5.23
In this example, we will solve Johann Bernoulli’s brachistochrone problem, intro-
duced in Section 5.2, using the minimum principle. We first recall the problem as
restated by Johann Bernoulli in 1697 (quoted from Dunham [70, p. 200]):
“Among the infinitely many curves which join the two given points . . . choose
the one such that, if the curve is replaced by a thin tube or groove, and a small
sphere placed in it and released, then this will pass from one point to the other in
the shortest time.”
It is difficult not to mention the reward that one obtains for solving the problem
(quoted from Dunham [70, p. 200]):
“Let who can seize quickly the prize which we have promised to the solver.
Admittedly this prize is neither of gold nor silver, for these appeal only to base and
venal souls. . . . Rather, since virtue itself is its own most desirable reward and a
fame is a powerful incentive, we offer the prize fitting for the man of noble blood,
compounded of honor, praise, and approbation . . .” We add that Johann Bernoulli
wrote the above after he already knew the solution to the problem!
Model and the Performance Index
The field is conservative; hence the sum of the kinetic and potential energies is
constant,
(1/2) m v²(t) − m g y(t) = (1/2) m v²(t_0) − m g y(t_0) = 0,

where we assumed that the particle is at rest at t0 . Using the above, we can find
the particle’s speed at any time t ≥ t0 :

v(t) = √(2g y(t)).
Then, the brachistochrone problem can be formulated as an optimal control prob-
lem of the form
minimize  J(t_0) = ∫_{t0}^{T} dt
subject to  ẋ = v cos θ,
            ẏ = v sin θ,

where θ is the angle that the particle velocity vector forms with the positive x axis,
while ẋ and ẏ are the horizontal and vertical components of the particle velocity
vector. The path angle θ is considered as the control input to be determined.
Particle Trajectory
To proceed, we form the Hamiltonian function
H = 1 + p1 v cos θ + p 2 v sin θ, (5.99)
where p 1 and p 2 are the costate variables. The costate equations are
 
[ṗ_1; ṗ_2] = −[∂H/∂x; ∂H/∂y] = [0; −(g/v)(p_1 cos θ + p_2 sin θ)].

It follows from the above that p 1 = K = constant. Because there are no constraints
on the control input θ, the necessary condition for the control optimality is
∂H/∂θ = 0,
where
∂H/∂θ = −p_1 v sin θ + p_2 v cos θ.
Hence,
p 2 = p 1 tan θ = K tan θ.
We next find an expression for θ̇. First note that
ṗ_2 = (∂p_2/∂θ) θ̇ = (∂(K tan θ)/∂θ) θ̇ = (K/cos²θ) θ̇.     (5.100)
On the other hand,
ṗ_2 = −(g/v)(p_1 cos θ + p_2 sin θ)
    = −(g/v)(K cos θ + K tan θ sin θ)
    = −(Kg/v)(cos θ + sin²θ/cos θ)
    = −(Kg/v)(cos²θ + sin²θ)/cos θ
    = −Kg/(v cos θ).     (5.101)
Combining (5.100) and (5.101) yields
θ̇ = −(g/v) cos θ.     (5.102)

We now use the boundary conditions to express x and y as functions of time and
θ. First, using the above we can write
ẏ = (dy/dθ) θ̇ = (dy/dθ)(−(g/v) cos θ).

On the other hand,

ẏ = v sin θ.

Equating the right-hand sides of the above two equations yields

dy/v² = dy/(2gy) = −(1/g) tan θ dθ.

Integrating both sides gives



(1/(2g)) ∫ dy/y = −(1/g) ∫ tan θ dθ,

that is,

(1/(2g)) ln|y| = (1/g) ln|cos θ|.

Performing manipulations, we obtain

y = C cos2 θ,

where C is an integration constant. Evaluating the above at t = T gives

y( T ) = y1 = C cos2 θ ( T ).

Hence,
C = y_1/cos²θ(T)
and
y(t) = (y_1/cos²θ(T)) cos²θ(t)     (5.103)

In a similar way, we can obtain an expression for x as a function of time and θ .


Indeed, on the one hand we have

ẋ = (dx/dθ) θ̇ = −(dx/dθ)(g/v) cos θ,

while on the other hand we obtain

ẋ = v cos θ.

Comparing the right-hand sides of the above two equations gives

dx/dθ = −v²/g = −2y = −2(y_1/cos²θ(T)) cos²θ.

We represent the above as


dx = −2(y_1/cos²θ(T)) cos²θ dθ.

Integrating both sides, using the identity cos²θ = (1/2)(1 + cos 2θ), yields



x = −2(y_1/cos²θ(T))(θ/2 + sin 2θ/4) + C̃,

where C̃ is the integration constant. Evaluating the above at t = T gives



x_1 = −2(y_1/cos²θ(T))(θ(T)/2 + sin 2θ(T)/4) + C̃.

Hence,

C̃ = x_1 + (y_1/cos²θ(T))(θ(T) + sin 2θ(T)/2).

Using the above, we obtain

x(t) = x_1 + (y_1/(2cos²θ(T)))(2(θ(T) − θ(t)) + sin 2θ(T) − sin 2θ(t))     (5.104)

We point out that the equations for x(t), y(t), and θ̇ specify an implicit feedback con-
trol law in the sense that the control θ can be calculated given the state ( x(t), y(t)).
We now use the boundary conditions to find the description of the time-optimal
closed loop system. We can assume, without loss of generality, that x(t0 ) = y(t0 ) =
x(0) = y(0) = 0. Then, using the expression for y(t) we note that y(t_0) = 0 =
(y_1/cos²θ(T)) cos²θ(t_0). Thus, the angle of departure is θ(t_0) = 90°. With the above
initial path angle, y, and hence v, is increased at the outset as quickly as possible.
Using these data, we next evaluate x at t = t0 to form an equation that can be
used to find the angle of arrival θ( T ). The equation is

2x1 cos2 θ( T ) + y1 (2θ( T ) − π + sin 2θ( T )) = 0.

The above equation can be solved easily using MATLAB's Symbolic Toolbox or
MATLAB's fsolve function. Once θ(T) is found, we use the expression for y
to find cos θ, where
cos θ(t) = √(y(t)/y_1) cos θ(T).

Hence,

sin θ(t) = √(1 − cos²θ(t)).

We substitute the above expression into ẋ = v cos θ and ẏ = v sin θ to obtain the
state-space model of the time optimal motion of the particle moving under the
force of gravity from a given point to another specified point. The equations are
ẋ(t) = √(2g y(t)) √(y(t)/y_1) cos θ(T),
ẏ(t) = √(2g y(t)) √(1 − (y(t)/y_1) cos²θ(T))

subject to the boundary conditions [x(t0 ) y(t0 )] = [x0 y0 ] and [x( T ) y( T )] =


[x1 y1 ].
The Time of Quickest Descent
We now derive an expression that can be used to compute the time of the particle
travel. We first establish an auxiliary result that we will use in our derivation.
Specifically, we will show that θ̇(t) is constant on the interval [t0 , T ]. Differentiating
both sides of equation (5.102) gives

dθ̇/dt = −(d/dt)((g/√(2gy)) cos θ)
       = −(g/√(2g))(d/dt)((1/√y) cos θ)
       = −(g/√(2g))(−(1/2)(ẏ cos θ)/y^{3/2} − (sin θ/√y) θ̇)
       = (g/√(2g))((1/2)(ẏ cos θ)/y^{3/2} + (sin θ/√y) θ̇).

Substituting into the above the expressions for θ̇ and ẏ gives



dθ̇/dt = (g/√(2g))((1/2)(√(2gy) sin θ cos θ)/y^{3/2} − (sin θ cos θ/√y)(g/√(2gy)))
       = (g/(2y)) sin θ cos θ − (g/(2y)) sin θ cos θ
       = 0.

Because θ̇ is constant on the interval [t0 , T ], we use (5.102) to write

dθ/dt = −(g/√(2g y(T))) cos θ(T).

Separating the variables yields
dθ = −(g/√(2g y(T))) cos θ(T) dt.

Integrating both sides of the above, we obtain


∫_{θ(t)}^{θ(T)} dθ = −(g/√(2g y(T))) cos θ(T) ∫_{t0}^{T} dt.

Hence,
θ(T) − θ(t) = −(g/√(2g y(T))) cos θ(T) (T − t_0).

To obtain an expression for the particle travel time, we set in the above equation
t = t0 = 0 and perform simple manipulations to obtain

T = √(2y(T)/g) (π/2 − θ(T))/cos θ(T)

It’s an Upside-Down Cycloid!


The path of quickest descent is an upside-down cycloid. A cycloid is a curve
described by a point on the circumference of a circle that rolls without slipping
(see Figure 5.28). The equations describing a cycloid are

x(φ) = R(φ − sin φ),


y(φ) = R(1 − cos φ).

We will now show that equations (5.104) and (5.103) describe a cycloid. To pro-
ceed, we define

R = y_1/(2cos²θ(T))   and   φ(t) = π − 2θ(t).     (5.105)

Using the above notation and taking into account that sin(π − φ) = sin φ, we
rewrite (5.104) as

x(t) − x1 + R(φ( T ) − sin φ( T )) = R(φ(t) − sin φ(t)).

Figure 5.28 Constructing an upside-down cycloid—the solution to the brachistochrone
problem.

Using the notation given in (5.105) and the fact that cos 2θ = 2 cos²θ − 1, we
represent (5.103) as
y(t) = (y_1/cos²θ(T)) cos²θ(t)
     = (y_1/cos²θ(T)) (cos 2θ(t) + 1)/2
     = R(1 + cos(π − φ(t)))
     = R(1 − cos φ(t)),

because cos(π − φ(t)) = − cos φ.
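Numerically, the arrival angle θ(T) and the descent time follow in a few lines. The MATLAB sketch below uses fzero on the arrival-angle equation and then evaluates the boxed expression for T; the terminal point (x_1, y_1) and g are assumed example values, and the bracket passed to fzero is chosen for this terminal point so as to avoid the spurious root at θ(T) = π/2.

% Brachistochrone: arrival angle theta(T), descent time T, cycloid radius R.
% x1, y1, g are assumed example values; the starting point is (0, 0).
x1 = 2;  y1 = 1;  g = 9.81;
f   = @(th) 2*x1*cos(th).^2 + y1*(2*th - pi + sin(2*th));
thT = fzero(f, [-1, 0]);                       % bracket suits this (x1, y1)
T   = sqrt(2*y1/g)*(pi/2 - thT)/cos(thT);      % time of quickest descent
R   = y1/(2*cos(thT)^2);                       % radius in (5.105)
fprintf('theta(T) = %.4f rad, T = %.4f s, R = %.4f\n', thT, T, R);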

In the following, we discuss a general minimum time regulator problem for a class of multi-
input dynamical systems modeled by
ẋ = Ax + Bu.
The goal is to transfer the system from some initial state to the origin in minimum time subject
to
|u i | ≤ ki , ki > 0,
i = 1, 2, . . . , m.
"t
Thus, the performance index for the problem is J = t0 f dt. We begin our analysis by forming
the associated Hamiltonian function
H = 1 + p^T(Ax + Bu)
  = 1 + p^T Ax + ∑_{i=1}^{m} (p^T b_i) u_i,
where bi is the ith column of B. Let
si (t) = p∗T (t)bi .
Then, applying Pontryagin’s minimum principle, we obtain the following form of the optimal
control:
u i∗ (t) = −ki sign(si (t)), i = 1, 2, . . . , m.
Thus, the optimal control for the ith control channel has the bang-bang form, u i∗ = ±ki . Note
that si (t) = p∗T (t)bi = 0 is the switching curve for the ith control channel. We compute the
costate vector from the costate equation,
ṗ*^T = −∂H/∂x
     = −∂/∂x (p*^T Ax)
     = −p*^T A,
or
ṗ∗ = − AT p∗ .

The solution to the costate equation is


p*(t) = e^{−A^T t} p*(0).

Thus, the ith switching curve can be represented as


s_i(t) = p*^T(0) e^{−At} b_i = 0.
Observe that if si (t) = 0 on some time interval, then u i∗ (t) is indefinite on this interval. We now
show that under some mild assumptions there is no time interval on which si (t) = 0; that is,
there is no time interval on which u i∗ (t) is indefinite. First, we assume that bi = 0. Note that in
our problem here, minu∈U H (x, u) = 0 because J ∗ (t, x) = J ∗ (x) and therefore ∂ J ∗ /∂t = 0.
Hence, 1 + p∗T (t)( Ax ∗ + Bu∗ ) = 0 for all t for the optimal process. Thus, the optimal costate
vector p∗ (t) cannot be zero for any value of t. Finally, if si = p∗T bi is zero on some time
interval—that is, si is constant on a time interval—then this implies that on the above time
interval, we obtain
ṡ i = ṗ∗T bi = − p∗T Abi = 0.
Similarly, higher derivatives of si (t) = p∗T bi will also be zero on the time interval on which
s_i(t) = p*^T b_i is zero, that is, s_i^{(j)}(t) = (−1)^j p*^T A^j b_i = 0, j = 1, 2, . . . . We represent the
above relations as
p∗T [bi Abi ··· An−1 bi · · ·] = 0T .
If the system is controllable by the ith input acting alone, then
rank[bi Abi ··· An−1 bi · · ·] = n,
and we would have to have p* = p*(t) = 0. However, we already ruled out this case and thus
si (t) = p∗T bi cannot be zero on any time interval. In conclusion, if the system is controllable
by each input acting alone, then there is no time interval on which optimal control vector is
indefinite.
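The "controllable by each input acting alone" condition is easy to test numerically: form the controllability matrix of (A, b_i) for every column b_i of B and check its rank. A MATLAB sketch with an assumed (A, B) pair:

% Check that the system is controllable by each input acting alone, so that
% the time-optimal control has no singular (indefinite) intervals.
% A and B are assumed example data.
A = [0 1 0; 0 0 1; -1 -3 -3];
B = [0 1; 0 0; 1 1];
n = size(A, 1);
for i = 1:size(B, 2)
    bi = B(:, i);
    C  = bi;
    for j = 1:n-1
        C = [C, A^j*bi];                 %#ok<AGROW>
    end
    fprintf('input %d: rank = %d (need %d)\n', i, rank(C), n);
end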

5.5.2 A Two-Point Boundary-Value Problem


We consider the problem of minimizing the quadratic performance index

J = (1/2) x^T(t_f) F x(t_f) + (1/2) ∫_{t0}^{tf} (x^T(t) Q x(t) + u^T(t) R u(t)) dt     (5.106)

subject to
ẋ(t) = Ax(t) + Bu(t), x(t0 ) = x 0 , x(t f ) is free. (5.107)
We assume that there are no constraints on the control input u. The weight matrices F and Q
are real symmetric positive semidefinite, and the matrix R is real symmetric positive definite.
The Hamiltonian function for the above linear quadratic control problem is

H = (1/2) x^T Q x + (1/2) u^T R u + p^T(Ax + Bu),     (5.108)

where p ∈ Rn is the costate vector. It follows from Theorem 5.6 that the optimal controller must
minimize the Hamiltonian function. Because the control vector is unconstrained, the necessary

condition for optimality of the control u is


∂H/∂u = 0^T.     (5.109)
Evaluating (5.109) yields
∂H/∂u = u^T R + p^T B = 0^T.     (5.110)
Hence, the optimal controller has the form
u = −R−1 B T p. (5.111)
Substituting the above into (5.107), we obtain
ẋ = Ax − B R−1 B T p, x(t0 ) = x 0 . (5.112)
We next find the costate equation,
ṗ = −(∂H/∂x)^T
  = −Qx − A^T p.     (5.113)
Combining (5.112) and the above yields 2n linear differential equations:
[ẋ; ṗ] = [A  −B R^{-1} B^T;  −Q  −A^T] [x; p].     (5.114)

By Theorem 5.6, the costate vector at time t_f must satisfy
p(t_f) = (1/2)(∇_x (x^T F x))|_{t=t_f} = F x(t_f).     (5.115)

In summary, we reduced the linear quadratic control problem to solving 2n linear differential
equations with mixed boundary conditions, which is an example of a two-point boundary value
problem, or TPBVP for short. We shall now solve the above TPBVP. If we had the initial
conditions x(t_0) and p(t_0), then the solution to (5.114) would have the form
[x(t_f); p(t_f)] = e^{H(t_f − t_0)} [x(t_0); p(t_0)],     (5.116)

where
H = [A  −B R^{-1} B^T;  −Q  −A^T].

However, in this particular TPBVP, p(t0 ) is unknown and instead we are given p(t f ). To proceed,
let
\begin{bmatrix} x(t_f) \\ p(t_f) \end{bmatrix} = e^{H(t_f - t)} \begin{bmatrix} x(t) \\ p(t) \end{bmatrix},    (5.117)
where
e^{H(t_f - t)} = \begin{bmatrix} \Phi_{11}(t_f, t) & \Phi_{12}(t_f, t) \\ \Phi_{21}(t_f, t) & \Phi_{22}(t_f, t) \end{bmatrix}.

Each of the blocks \Phi_{ij}(t_f, t), i, j = 1, 2, is n × n. Thus we have
\begin{bmatrix} x(t_f) \\ p(t_f) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(t_f, t) & \Phi_{12}(t_f, t) \\ \Phi_{21}(t_f, t) & \Phi_{22}(t_f, t) \end{bmatrix} \begin{bmatrix} x(t) \\ p(t) \end{bmatrix}.    (5.118)
The above matrix equation can be represented as

x(t_f) = \Phi_{11}(t_f, t) x(t) + \Phi_{12}(t_f, t) p(t),    (5.119)
p(t_f) = \Phi_{21}(t_f, t) x(t) + \Phi_{22}(t_f, t) p(t).    (5.120)
Using the boundary condition (5.115) and (5.119) gives
p(t_f) = F x(t_f) = F \Phi_{11}(t_f, t) x(t) + F \Phi_{12}(t_f, t) p(t).    (5.121)
Subtracting (5.120) from (5.121), we obtain
0 = \left( F \Phi_{11}(t_f, t) - \Phi_{21}(t_f, t) \right) x(t) + \left( F \Phi_{12}(t_f, t) - \Phi_{22}(t_f, t) \right) p(t).    (5.122)
Using the above, we compute
p(t) = \left( \Phi_{22}(t_f, t) - F \Phi_{12}(t_f, t) \right)^{-1} \left( F \Phi_{11}(t_f, t) - \Phi_{21}(t_f, t) \right) x(t) = P(t) x(t),    (5.123)

where

P(t) = \left( \Phi_{22}(t_f, t) - F \Phi_{12}(t_f, t) \right)^{-1} \left( F \Phi_{11}(t_f, t) - \Phi_{21}(t_f, t) \right).    (5.124)
Substituting (5.123) into (5.111) yields the optimal state-feedback controller
u(t) = -R^{-1} B^T P(t) x(t).    (5.125)
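As a quick numerical illustration of (5.124) and (5.125), the following MATLAB sketch assembles P(t) from the n × n blocks of e^{H(t_f - t)} and then forms the feedback gain. The matrices A, B, Q, R, F and the times below are assumed example data, not taken from the text:

% Assemble P(t) from the blocks of expm(H*(tf - t)); example data assumed.
A = [0 1; 0 0];  B = [0; 1];
Q = eye(2);      R = 1;      F = zeros(2);
tf = 2;  t = 0.5;                            % evaluate P at one time instant
n = size(A,1);
H = [A, -B*(R\B'); -Q, -A'];                 % Hamiltonian matrix of (5.114)
Phi = expm(H*(tf - t));
Phi11 = Phi(1:n,1:n);       Phi12 = Phi(1:n,n+1:2*n);
Phi21 = Phi(n+1:2*n,1:n);   Phi22 = Phi(n+1:2*n,n+1:2*n);
P = (Phi22 - F*Phi12) \ (F*Phi11 - Phi21);   % equation (5.124)
K = R \ (B'*P);                              % u(t) = -K*x(t), as in (5.125)

For this example data, one can check that as t_f - t grows, P approaches the constant solution of the corresponding algebraic Riccati equation.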
We will now show that P(t) satisfies a matrix differential equation. Indeed, differentiat-
ing (5.123) gives
Ṗ x + P ẋ − ṗ = 0. (5.126)
Substituting (5.112), (5.113), and (5.123) into the above yields

\left( \dot{P} + PA - PBR^{-1}B^T P + Q + A^T P \right) x = 0.    (5.127)
The preceding must hold throughout t0 ≤ t ≤ t f . Hence, P must satisfy
\dot{P} = PBR^{-1}B^T P - A^T P - PA - Q    (5.128)

subject to the boundary condition

P(t_f) = F.    (5.129)
Equation (5.128) is called the matrix Riccati differential equation. Note that because F is
symmetric, so is P = P(t). Observe that P = P(t) given by (5.124) is a solution to (5.128) subject to the boundary condition (5.129).
The Riccati differential equation (5.128) associated with a class of one-dimensional plants
was solved in Example 5.18.
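In practice, P(t) is often obtained by integrating (5.128) backward in time from the boundary condition (5.129). A minimal MATLAB sketch, again with assumed example data, integrates in the reversed time variable tau = t_f - t so that a standard forward ODE solver can be used (the local function requires the script form of recent MATLAB releases, or it can be placed in its own file riccati_rhs.m):

% Integrate the Riccati equation (5.128) backward from P(tf) = F.
A = [0 1; 0 0];  B = [0; 1];
Q = eye(2);      R = 1;      F = zeros(2);
tf = 2;
rhs = @(tau, p) riccati_rhs(p, A, B, Q, R);    % dP/dtau = -dP/dt
[tau, Pvec] = ode45(rhs, [0 tf], F(:));        % start from P(tf) = F
P_t0 = reshape(Pvec(end,:), size(A));          % P at t = t0 = 0

function dp = riccati_rhs(p, A, B, Q, R)
    P  = reshape(p, size(A));
    dP = -(P*B*(R\B')*P - A'*P - P*A - Q);     % minus the right side of (5.128)
    dp = dP(:);
end

Evaluating the closed-form expression (5.124) at t = t_0 should reproduce P_t0 to within the integration tolerance.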
Notes
Sussmann and Willems [273] made the following statement, in 1997, regarding the origins of
optimal control: “Optimal control was born in 1697—300 years ago—in Groningen, a university
town in the north of The Netherlands, when Johann Bernoulli, professor of mathematics at the
local university from 1695 to 1705, published his solution of the brachystochrone problem.”
For a serious student of optimal control, the seminal text by Lee and Markus [177] is recom-
mended. Less advanced than Lee and Markus are Kirk [160], Bryson and Ho [36], Owens [218],
and Pierre [234]. The classical text by Kwakernaak and Sivan [173] is also worth perusing be-
cause of its clarity and its treatment of the stochastic aspects of control problems. Informative,
and at the same time entertaining, accounts of the origins of the calculus of variations are given
by Dunham [70, Chapter 8] and by Sussmann and Willems [273]. Elsgolc [75] provides a lucid
introduction, illustrated with many examples, to the calculus of variations. Applications of the
calculus of variations in physics and engineering are provided by Weinstock [297]. Interrela-
tions between the calculus of variations and optimal control are discussed by Hestenes [121].
Frequency-domain aspects of optimal control and robust optimal control design are discussed by
Burl [39]. Informal discussion of the Pontryagin minimum principle in Section 5.5 was adapted
from Boltyanskii [30]. For more numerical examples related to optimal control, see Lewis and
Syrmos [182], and Bryson [37]. Exciting applications of optimal control to a number of control
problems of technology and science can be found in Lee and Markus [177].
EXERCISES
5.1 Find the variations of the functionals:
(a) v(x_1, x_2) = \int_0^{\pi/2} \left( x_1^4 + 2x_1^2 x_2^2 + x_2^2 \right) dt,
(b) v(x_1, x_2) = \int_0^{1} \left( e^{x_1 - x_2} - e^{x_1 + x_2} \right) dt.
5.2 Find the equations of the curves that are extremals for the functional
v(x) = \int_0^{t_f} \left( \tfrac{1}{4}\dot{x}^2 - t\dot{x} - x + \tfrac{1}{2} x\dot{x} \right) dt

for the boundary conditions
(a) x(0) = 1, t f = 2, and x(2) = 10;
(b) x(0) = 1, t f = 10, and x(10) is free.
5.3 Find the equation of the curve that is an extremal for the functional
v(x) = \int_0^{t_f} \left( \tfrac{1}{2}\dot{x}^2 - t\dot{x} + 2x^2 \right) dt
subject to the boundary conditions
x(0) = 3/4, t f = 1, and x(1) is free.
5.4 Given a dynamical system model

\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, \qquad y = [\,2 \quad 0\,] x,

and the associated performance index

J = \int_0^{\infty} (y^2 + u^2) \, dt.
Assuming that u = − kx is such that J is minimized, find the poles of the closed-loop
system.
5.5 Given a dynamical system model

\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, \qquad y = [\,2 \quad 0\,] x

and the performance index

J = \int_0^{\infty} (y^2 + u^2) \, dt.
Use the Kronecker product to represent the associated algebraic Riccati equation as a nonlinear vector algebraic equation. Solve it, and construct the optimal state-feedback control law u = -kx. Assume that
x(0) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
Find the optimal value of J .
5.6 Find a state-feedback control law u so as to minimize
J = \int_0^{1} (x^2 + u^2) \, dt
subject to
ẋ = −x + u, x(0) = 2.
Assume J = p(t)x^2.
5.7 Use the principle of optimality to find the set of numbers x_1, x_2, ..., x_n satisfying x_1 + x_2 + ··· + x_n = c, where c is a positive constant, such that the product x_1 x_2 ··· x_n is maximized (see [20]).
5.8 Use dynamic programming to find the maximum cost path between node a and node
t in the routing network depicted in Figure 5.29, where the travel costs are shown
beside each segment and toll charges are shown by each node. The network can only be traversed from left to right.

[Figure 5.29 A routing network of Exercise 5.8: nodes a through h and t, with travel costs on the segments and toll charges at the nodes.]
5.9 Use dynamic programming to find u(0) and u(1) that minimize
J = (x(2) - 2)^2 + \sum_{k=0}^{1} u^2(k)
subject to
x(k + 1) = x(k) + u(k), x(0) = 1.
5.10 Consider the following optimal control problem:
minimize \quad J_2(u) = \frac{1}{2} \sum_{k=0}^{1} u(k)^2
subject to x(k + 1) = x(k) + 2u(k), x(0) = 3, x(2) = 0.
Convert the above optimal control problem into a quadratic optimization problem with
an equality constraint using the composite input vector
z = \begin{bmatrix} u(0) \\ u(1) \end{bmatrix}.
Then, employ Lagrange's multiplier theorem to find the optimal sequence {u^*(0), u^*(1)}.
5.11 Consider the following optimal control problem:
minimize \quad J_2(u) = \frac{1}{2} x(2)^2 + \frac{1}{2} \sum_{k=0}^{1} \left( x(k)^2 + u(k)^2 \right)
subject to x(k + 1) = x(k) + 2u(k), x(0) = 3.
Convert the above optimal control problem into a quadratic optimization problem with
an equality constraint using the composite input-state vector
z = \begin{bmatrix} u(0) \\ x(1) \\ u(1) \\ x(2) \end{bmatrix}.
Then, employ Lagrange's multiplier theorem to find the optimal sequence {u^*(0), u^*(1)}.
5.12 Find u = u(t) so as to minimize
J = \int_0^{1} u^2 \, dt
subject to
ẋ = −2x + u, x(0) = 10, x(1) = 0.
5.13 For the circuit shown in Figure 5.30, find the input voltage, u(t), that changes the
current in the inductor from i = 10 A at t = 0 to i = 0 at t = 1 sec while minimizing
the performance index,
J = \int_0^{1} u^2(t) \, dt.
[Figure 5.30 A circuit for Exercise 5.13: a voltage source u(t) in series with a 2 Ω resistor and a 1 H inductor carrying the current i(t).]
5.14 Use the minimum principle to find the input voltage u(t) that charges the capacitor, in
Figure 5.31, from x(0) = 2 V at t = 0 to x(10) = 12 V while minimizing the energy
dissipated in the resistor. There are no constraints on u(t). Note that, by Kirchhoff’s
voltage law,
Ri(t) + x(t) = u(t),
[Figure 5.31 A capacitor charging circuit of Exercise 5.14: a voltage source u(t) in series with a resistor R = 100 kΩ and a capacitor C = 10 μF, where x(t) is the capacitor voltage and i(t) the loop current.]
where i(t) = C \frac{dx(t)}{dt}, and the energy dissipated is

J = \int_0^{10} R i^2(t) \, dt.
5.15 Show that if µ is an eigenvalue of the Hamiltonian matrix H, then so is −µ.
5.16 Consider the Hamiltonian function defined by
H = F + p^T f.
Show that if the functions f and F do not explicitly depend upon t, then
H\big|_{u=u^*} = \text{constant}, \qquad t_0 ≤ t ≤ t_f.
5.17 Given the Hamiltonian matrix corresponding to a dynamical system and its associated
performance index. The Hamiltonian matrix is
H = \begin{bmatrix} 2 & -5 \\ -1 & -2 \end{bmatrix}.
The MATLAB command [V,D] = eig(H) resulted in the following data:
V = \begin{bmatrix} 0.9806 & 0.7071 \\ -0.1961 & 0.7071 \end{bmatrix}, \qquad D = \begin{bmatrix} 3 & 0 \\ 0 & -3 \end{bmatrix}.
(a) Write the equation of the closed-loop system driven by the optimal controller u = -kx.
(b) Find the solution to the algebraic Riccati equation corresponding to the optimal
controller.
5.18 (a) Suppose that s_i is an eigenvalue of the matrix
A - BR^{-1}B^T P,
and v_i ≠ 0 is the corresponding eigenvector, where P is the symmetric positive definite solution of the algebraic Riccati equation. Show that

\begin{bmatrix} v_i \\ P v_i \end{bmatrix}

is an eigenvector of the associated Hamiltonian matrix corresponding to the eigenvalue s_i.
(b) Show that if Q = Q^T > 0, then the real parts of the eigenvalues of the matrix
A - BR^{-1}B^T P
are all negative, where P = P^T > 0 is the solution of the associated algebraic Riccati equation.
5.19 Given the following model of a dynamical system:
ẋ 1 = x2 + 5,
ẋ 2 = u,
where
|u| ≤ 1.
The performance index to be minimized is

J = \int_0^{t_f} dt.
Find the state-feedback control law u = u(x1 , x2 ) that minimizes J and drives the
system from a given initial condition x(0) = [x1 (0), x2 (0)]T to the final state x(t f ) = 0.
Proceed as indicated below.
(a) Derive the equations of the optimal trajectories.
(b) Derive the equation of the switching curve.
(c) Write the expression for the optimal state-feedback controller.
5.20 (Based on Rugh and Murphy [246]) Given a dynamical system model
ẋ = Ax + bu,
where
A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},
and the associated performance index
J = \frac{1}{2} \int_0^{\infty} \left( x^T Q x + u^2 \right) dt, \qquad Q = Q^T \ge 0.
We label the above problem as ( A, b). Consider now the dynamical system model
x̃˙ = Ã x̃ + b̃ũ,
where b = b̃,
\tilde{A} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix},
and the associated performance index has the form
\tilde{J} = \frac{1}{2} \int_0^{\infty} \left( \tilde{x}^T \tilde{Q} \tilde{x} + \tilde{u}^2 \right) dt,
where Q̃ = Q + D Ã + ÃT D + DbbT D, and
D = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & -a_0 \\ 0 & 0 & 1 & \cdots & 0 & -a_1 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & -a_{n-2} \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix}.
We refer to the second problem as ( Ã, b̃).
(a) Show that the symmetric matrix P is a solution to the ( A, b) problem if, and only
if, the symmetric matrix P̃ = P − D is a solution to the ( Ã, b̃) problem.
(b) Show that A - bb^T P = \tilde{A} - bb^T \tilde{P}.

Hint  Note that \tilde{A} = A - bb^T D and \tilde{b} = b. Write down the algebraic Riccati equation for the (\tilde{A}, \tilde{b}) problem and perform appropriate substitutions.
5.21 Consider the following variation of the brachistochrone problem: A particle of mass m is to slide down a frictionless track connecting two given points that do not lie on a vertical line. The particle's initial speed is v_0. The motion of the particle is constrained only by gravity and the shape of the track. Find the curve of the track that minimizes the travel time between the two given points. Solve this brachistochrone problem using the minimum principle of Pontryagin. Illustrate your derivations with a numerical example and simulations using MATLAB.
5.22 (a) Consider the ground vehicle model described in Subsection 1.7.2. Let the input
signal be u = δ f (the front wheel steer angle) and δr = 0. Write a MATLAB
script that animates the lateral motion Y versus the longitudinal motion X of the
vehicle, the steer angle δ f , and the heading angle ψ. Use the steer angle shown in
Figure 5.32.
(b) Use the linear model and the data from Table 1.1 to design a linear quadratic
regulator (LQR). In this part, the control has the form
\delta_f = -kx + Y_{ref},
where Yr e f is the command input shown in Figure 5.33. That is, the closed-loop
system here will have the form

\dot{x} = (A - bk)x + bY_{ref}.

[Figure 5.32 Plot of the steer angle δ_f (deg) versus time (sec) for Exercise 5.22.]

[Figure 5.33 Plot of the command input Y_ref (meters) versus time (sec) for Exercise 5.22.]
You as a designer will select the weights Q and r. See how well the linear vehicle tracks the command input. Observe the sign difference between the command input and the heading angle. Your LQR gain should be calculated for each chosen v_x. In
other words, if in your interactive simulation a specific value of vx is chosen, then
your program should calculate the corresponding gain for this particular value of
vx . Implement the LQR state feedback employing the vehicle model that uses the
nonlinear cornering characteristic shown in Figure 1.15 and test the performance
of the closed-loop system utilizing the command input shown in Figure 5.33. You
may need to impose constraints on δ f —for example, |δ f | ≤ 20◦ . (Do not forget
about converting degrees into radians.)