Preface
Before we leave the topic of dynamic programming, it is worthwhile to discuss some of the computational
issues that arise in its application. Although computers have come a long way since the time the
algorithm was first introduced, some problems still persist. We then introduce the subject of the calculus
of variations, which is the second method used to solve optimal control problems. We develop the
concepts which support the fundamental theorem of the calculus of variations, which then leads to the
Euler-Lagrange equations. The solution of these equations is one of our primary objectives going
forward.
Recall that we began by considering a system described by a state difference equation, i.e.
x(k+1) = a_D\bigl(x(k), u(k)\bigr), \qquad k = 0, 1, \ldots, N-1 \qquad (8.1)
We set out to find the control law that minimizes the performance criterion
J = h\bigl(x(N)\bigr) + \sum_{k=0}^{N-1} g_D\bigl(x(k), u(k)\bigr) \qquad (8.2)
We applied the method of dynamic programming to this problem which led to the recurrence relation
J^*_{N-K,N}\bigl(x(N-K)\bigr) = \min_{u(N-K)} \Bigl\{ g_D\bigl(x(N-K), u(N-K)\bigr) + J^*_{N-(K-1),N}\bigl(a_D(x(N-K), u(N-K))\bigr) \Bigr\}, \qquad K = 1, 2, \ldots, N \qquad (8.3)

J^*_{N,N}\bigl(x(N)\bigr) = h\bigl(x(N)\bigr) \qquad (8.4)
The solution of the recurrence relation is an optimal control law or optimal policy, i.e.
u^*\bigl(x(N-K), N-K\bigr), \qquad K = 1, 2, \ldots, N
We obtain the solution by trying all admissible control values at each admissible state value. If the states and controls were not quantized to begin with, then they must be quantized in order to carry out the algorithm computationally. For example, if the system is second-order, the total number of discrete state values at each time k\Delta t is s_1 s_2, where the quantities s_1 and s_2 are determined by the relationship
ECE 551 LECTURE 4
s_r = \frac{x_{r,\max} - x_{r,\min}}{\Delta x_r} + 1 \qquad (8.5)
where we have assumed that \Delta x_r is chosen such that the interval [x_{r,\min}, x_{r,\max}] contains an integer number of increments. Hence, for the n-th-order system, the number of state values for each time t = k\Delta t is given by
S = s_1 s_2 \cdots s_n \qquad (8.6)

s_r = \frac{x_{r,\max} - x_{r,\min}}{\Delta x_r} + 1, \qquad r = 1, 2, \ldots, n

Again, we assume that the ratio (x_{r,\max} - x_{r,\min}) / \Delta x_r is an integer.
The admissible range of control values is quantized in exactly the same manner. Hence, the total number
of quantized control values is given by
C = c_1 c_2 \cdots c_m \qquad (8.7)

c_q = \frac{u_{q,\max} - u_{q,\min}}{\Delta u_q} + 1, \qquad q = 1, 2, \ldots, m \qquad (8.8)
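As a sketch of how these counts work out in code; the ranges and increments below are made-up illustrative numbers, not values from the lecture.

```python
# Sketch: number of quantized state and control values (eqs. 8.5-8.8).
# All ranges and increments here are illustrative assumptions.
from math import prod

def grid_size(v_max, v_min, dv):
    """Number of quantized levels on [v_min, v_max] with increment dv,
    assuming (v_max - v_min) / dv is an integer (eq. 8.5 / 8.8)."""
    ratio = (v_max - v_min) / dv
    assert abs(ratio - round(ratio)) < 1e-9, "range must hold an integer number of increments"
    return round(ratio) + 1

# Second-order system (n = 2), one control input (m = 1)
s = [grid_size(2.0, -2.0, 0.5), grid_size(1.0, -1.0, 0.25)]  # s_1, s_2
c = [grid_size(1.0, -1.0, 0.5)]                              # c_1

S = prod(s)   # total state grid points per stage (eq. 8.6) -> 9 * 9 = 81
C = prod(c)   # total control grid points per stage (eq. 8.7) -> 5
print(S, C)
```

Note how quickly S grows with the order n: each additional state variable multiplies the grid size by another factor s_r.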
Let's denote the admissible quantized state and control values at time t = k\Delta t as follows:

x^{(i)}(k), \qquad i = 1, 2, \ldots, S
u^{(j)}(k), \qquad j = 1, 2, \ldots, C
In performing the dynamic programming algorithm, we repeatedly evaluate the following quantity:
C_{N-K,N}\bigl(x^{(i)}(N-K), u^{(j)}(N-K)\bigr) = g_D\bigl(x^{(i)}(N-K), u^{(j)}(N-K)\bigr) + J^*_{N-(K-1),N}\bigl(x^{(i,j)}(N-K+1)\bigr) \qquad (8.9)
Equation (8.9) represents the minimum cost of operation over the final K stages of an N-stage process, assuming that the control value u^{(j)}(N-K) is applied at state x^{(i)}(N-K). The objective at each stage is to find the value of u^{(j)}(N-K) that yields the quantity J^*_{N-K,N}(x^{(i)}(N-K)), which is the minimum of C_{N-K,N}(x^{(i)}(N-K), u^{(j)}(N-K)) over the admissible controls.

The smallest value of C_{N-K,N}(x^{(i)}(N-K), u^{(j)}(N-K)) and its associated control value are stored in computer memory. The process repeats until the optimal control and cost corresponding to each state value x^{(i)}(N-K) have been computed at the current time increment.
The value of K is then incremented and the overall process repeats until all stages have been computed.
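A minimal sketch of this procedure for a first-order system follows. The scalar dynamics a_D, stage cost g_D, terminal cost h, and the grids are all illustrative assumptions, not the lecture's example; snapping successor states back onto the grid stands in for the interpolation a real implementation would need.

```python
# Sketch of the backward recursion (8.3)-(8.4) on quantized grids.
# a_D, g_D, h, and the grids below are illustrative assumptions.
import math

x_grid = [i * 0.5 for i in range(-4, 5)]   # admissible quantized states
u_grid = [-1.0, 0.0, 1.0]                  # admissible quantized controls
N = 5                                      # number of stages

a_D = lambda x, u: 0.8 * x + u             # state difference equation
g_D = lambda x, u: x**2 + u**2             # stage cost
h   = lambda x: 5.0 * x**2                 # terminal cost

def nearest(x):
    """Snap a successor state back onto the grid after applying a_D."""
    return min(x_grid, key=lambda g: abs(g - x))

# J[x] holds J*_{N-K,N}(x); policy[(K, x)] holds u*(x, N-K)
J = {x: h(x) for x in x_grid}              # K = 0: boundary condition (8.4)
policy = {}
for K in range(1, N + 1):                  # work backward over the stages
    J_new = {}
    for x in x_grid:
        best_u, best_cost = None, math.inf
        for u in u_grid:                   # try every admissible control (8.3)
            cost = g_D(x, u) + J[nearest(a_D(x, u))]
            if cost < best_cost:
                best_u, best_cost = u, cost
        J_new[x] = best_cost
        policy[(K, x)] = best_u
    J = J_new
```

The stored `policy` table is exactly the optimal control law u^*(x(N-K), N-K): once it is computed, an optimal trajectory from any admissible initial state is recovered by a forward pass through the table.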
Dynamic programming makes the direct search feasible because instead of searching among the set of
all admissible controls that cause admissible trajectories, we consider only those controls that also satisfy
the principle of optimality. Figure 1 below illustrates this point.
[Figure 1: four nested sets S1 ⊃ S2 ⊃ S3 ⊃ S4, with the innermost region S4 shaded.]
Note that S1 represents all controls, S2 represents admissible controls, S3 represents controls which
cause admissible trajectories, and S4 represents controls which satisfy the principle of optimality. Thus,
only the shaded (pink) region represents the controls which the dynamic programming algorithm
searches.
Let's consider a simple example: a first-order process with one control input. We assume that the admissible state values are quantized into ten levels, and the admissible control values into four levels.
In direct enumeration, we would try all four control values at each of the ten initial state values for one time increment \Delta t. Thus x(\Delta t) may assume any of 40 resulting state values. Assuming that all of these state values are admissible, we apply all four control values at each of the 40 state values and determine the resulting x(2\Delta t). This process repeats for the appropriate number of stages.
In dynamic programming, at every stage we try all four control values at each of the ten state values.
Again, this process repeats for the appropriate number of stages.
Table 1 below summarizes and compares the results obtained using direct enumeration and dynamic
programming. Note the significant savings in the number of computations which results from the use of
dynamic programming.
Table 1. Number of cost-function evaluations after N stages (10 state levels, 4 control levels).

  N    Dynamic programming    Direct enumeration
  1            40                     40
  2            80                    200
  3           120                    840
  4           160                   3400
  5           200                  13640
  6           240                  54600
  N           40N         \sum_{k=1}^{N} 10 \cdot 4^k
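The two columns of the table can be reproduced with a few lines of code; the counting model (evaluations per stage, accumulated over all stages so far) is the one described above.

```python
# Reproduce Table 1: cumulative computation counts for 10 state levels
# and 4 control levels.
def dp_count(N):
    """DP tries 4 controls at each of 10 states per stage: 40N total."""
    return 40 * N

def direct_count(N):
    """Direct enumeration: the number of reachable state values multiplies
    by 4 each stage, giving sum over k of 10 * 4**k evaluations."""
    return sum(10 * 4**k for k in range(1, N + 1))

for N in range(1, 7):
    print(N, dp_count(N), direct_count(N))
```

The gap widens exponentially: direct enumeration grows as 4^N while dynamic programming grows only linearly in N.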
Before we leave the topic of dynamic programming, one last comment is in order. Higher-dimensional problems, i.e. higher-order state equations with multiple inputs, yield a great number of values to be stored during the computation process. Bellman called this the "curse of dimensionality." Nowadays, with the abundance of high-speed memory, it is less of a curse.
Calculus of Variations
We begin our discussion of the calculus of variations with the concept of a functional: a rule of correspondence that assigns to each function x in a certain class \Omega a unique real number. The performance measure is an example of a functional. For example, let x be a continuous function of t defined on the interval [t_0, t_f], and let
J(x) = \int_{t_0}^{t_f} x(t)\, dt
The real number assigned by the functional J is the area under the x ( t ) curve.
Let us check whether this functional is linear. Consider

J(x) = \int_{t_0}^{t_f} x(t)\, dt

where x is a continuous function of the time variable t. We first ask whether the principle of homogeneity is satisfied, i.e. whether J(\alpha x) = \alpha J(x):

J(\alpha x) = \int_{t_0}^{t_f} \alpha\, x(t)\, dt = \alpha \int_{t_0}^{t_f} x(t)\, dt = \alpha J(x)

Hence, we have that J(\alpha x) = \alpha J(x) for all real numbers \alpha and all x and \alpha x in \Omega. Next, we ask whether the principle of additivity is satisfied, i.e.
J(x_1 + x_2) = \int_{t_0}^{t_f} [\,x_1(t) + x_2(t)\,]\, dt = \int_{t_0}^{t_f} x_1(t)\, dt + \int_{t_0}^{t_f} x_2(t)\, dt

Since

J(x_1) = \int_{t_0}^{t_f} x_1(t)\, dt \qquad \text{and} \qquad J(x_2) = \int_{t_0}^{t_f} x_2(t)\, dt

we have J(x_1 + x_2) = J(x_1) + J(x_2). Additivity holds, and the functional is linear.
Now consider the functional

J(x) = \int_{t_0}^{t_f} x^2(t)\, dt

where, once again, x is a continuous function of the time variable t. First, let's look at whether the principle of homogeneity is satisfied:

J(\alpha x) = \int_{t_0}^{t_f} [\alpha\, x(t)]^2\, dt = \alpha^2 \int_{t_0}^{t_f} x^2(t)\, dt = \alpha^2 J(x) \neq \alpha J(x)

Since homogeneity fails, this functional is not linear.
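These two linearity checks can be confirmed numerically. The sample function x(t) = t, the interval [0, 1], and the midpoint-rule integrator below are illustrative choices, not part of the lecture.

```python
# Numerical check: J1(x) = ∫ x dt satisfies homogeneity,
# while J2(x) = ∫ x^2 dt does not.
def integrate(f, t0=0.0, tf=1.0, n=10_000):
    """Composite midpoint-rule approximation of ∫ f(t) dt on [t0, tf]."""
    dt = (tf - t0) / n
    return sum(f(t0 + (i + 0.5) * dt) for i in range(n)) * dt

x = lambda t: t          # a sample continuous function
alpha = 3.0

J1 = lambda f: integrate(f)                       # linear functional
J2 = lambda f: integrate(lambda t: f(t) ** 2)     # nonlinear functional

lhs1 = J1(lambda t: alpha * x(t))   # equals alpha * J1(x)
lhs2 = J2(lambda t: alpha * x(t))   # equals alpha**2 * J2(x), not alpha * J2(x)
print(abs(lhs1 - alpha * J1(x)) < 1e-9)   # homogeneity holds for J1
print(abs(lhs2 - alpha * J2(x)) > 1.0)    # homogeneity fails for J2
```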
Another useful concept is the closeness of functions, which uses the norm of a function. A norm is defined by the following three properties:

1. \|x\| \geq 0, and \|x\| = 0 if and only if x(t) = 0 for all t \in [t_0, t_f]
2. \|\alpha x\| = |\alpha|\, \|x\| for all real \alpha
3. \|x_1 + x_2\| \leq \|x_1\| + \|x_2\|
Now, let's consider two functions y(t) and z(t). In order to compare the closeness of these functions, we examine the quantity \|y - z\|. If the functions are identical, then we would expect the norm to be equal to zero. Similarly, if the norm is small, then we expect the functions to be close, whereas if the norm is large, then the functions are taken to be far apart. An example of a norm is as follows:
\|x\| = \max_{t_0 \leq t \leq t_f} |x(t)|
Many other choices are possible. Any choice must satisfy the three properties listed above.
If x and x + \delta x are functions for which the functional J is defined, then the increment of J, denoted by \Delta J, is given by \Delta J = J(x + \delta x) - J(x). Note that \delta x is called the variation of the function x. For example, let's find the increment of the following functional:
J(x) = \int_{t_0}^{t_f} x^2(t)\, dt

where, once again, x is a continuous function of the time variable t. The increment is given by

\Delta J = J(x + \delta x) - J(x)
         = \int_{t_0}^{t_f} [\,x(t) + \delta x(t)\,]^2\, dt - \int_{t_0}^{t_f} x^2(t)\, dt
         = \int_{t_0}^{t_f} [\,2\, x(t)\, \delta x(t) + \delta x^2(t)\,]\, dt
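The expansion of the increment can be checked numerically; the particular x, \delta x, and midpoint-rule integrator below are arbitrary illustrative choices.

```python
# Numerical check of the increment just derived:
# for J(x) = ∫ x^2 dt,  ΔJ = J(x + δx) - J(x) = ∫ [2 x δx + δx^2] dt.
import math

def integrate(f, t0=0.0, tf=1.0, n=10_000):
    """Composite midpoint-rule approximation of ∫ f(t) dt on [t0, tf]."""
    dt = (tf - t0) / n
    return sum(f(t0 + (i + 0.5) * dt) for i in range(n)) * dt

x  = lambda t: math.sin(t)      # an arbitrary continuous function
dx = lambda t: 0.1 * t          # an arbitrary variation δx

J = lambda f: integrate(lambda t: f(t) ** 2)

delta_J   = J(lambda t: x(t) + dx(t)) - J(x)
expansion = integrate(lambda t: 2 * x(t) * dx(t) + dx(t) ** 2)
print(abs(delta_J - expansion) < 1e-6)   # the two expressions agree
```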
The variation of a functional is very important because it plays the same role in determining extreme
values of functionals as the differential does in finding maxima and minima of functions. We start with the
increment of a functional, which can be written as follows:
\Delta J(x, \delta x) = \delta J(x, \delta x) + g(x, \delta x) \cdot \|\delta x\|
where \delta J is linear in \delta x. If g(x, \delta x) \to 0 as \|\delta x\| \to 0, then J is said to be differentiable on x, and \delta J is the variation of J evaluated for the function x. Let's take an example.
Let x be a continuous scalar function defined for t \in [0, 1]. Let's find the variation of the functional

J(x) = \int_0^1 [\,x^2(t) + 2\, x(t)\,]\, dt
\Delta J(x, \delta x) = J(x + \delta x) - J(x)
 = \int_0^1 \bigl\{ [\,x(t) + \delta x(t)\,]^2 + 2\,[\,x(t) + \delta x(t)\,] \bigr\}\, dt - \int_0^1 [\,x^2(t) + 2\, x(t)\,]\, dt
 = \int_0^1 [\,2\, x(t)\, \delta x(t) + 2\, \delta x(t)\,]\, dt + \int_0^1 \delta x^2(t)\, dt
 = \int_0^1 [\,2\, x(t) + 2\,]\, \delta x(t)\, dt + g(x, \delta x) \cdot \|\delta x\|

where g(x, \delta x) \cdot \|\delta x\| = \int_0^1 \delta x^2(t)\, dt. It remains to verify that

\lim_{\|\delta x\| \to 0} g(x, \delta x) = 0, \qquad \text{where} \quad \|\delta x\| = \max_{0 \leq t \leq 1} |\delta x(t)|
Note that

g(x, \delta x) = \frac{1}{\|\delta x\|} \int_0^1 \delta x^2(t)\, dt = \int_0^1 \frac{\delta x^2(t)}{\|\delta x\|}\, dt

Thus, since \delta x^2(t) = \delta x(t) \cdot \delta x(t) \leq \|\delta x\|\, |\delta x(t)|, we have

\int_0^1 \frac{\delta x^2(t)}{\|\delta x\|}\, dt \leq \int_0^1 |\delta x(t)|\, dt

This is a direct result of our definition of \|\delta x\|, which implies that \|\delta x\| \geq |\delta x(t)| for all t \in [0, 1]. Thus, if \|\delta x\| \to 0, then \delta x(t) \to 0 for all t \in [0, 1], and so

\lim_{\|\delta x\| \to 0} \int_0^1 |\delta x(t)|\, dt = 0

Hence g(x, \delta x) \to 0, and the variation is

\delta J(x, \delta x) = \int_0^1 [\,2\, x(t) + 2\,]\, \delta x(t)\, dt
Note that the variation of the functional J only contains terms linear in x ( t ) .
Now we must examine the extrema of functionals, since that is the objective of optimal control: to find the extrema of the performance measure (a functional). We start by considering a functional J with domain \Omega. Suppose that J has a relative extremum at x^*. Then there is an \epsilon > 0 such that for all functions x in the domain which satisfy \|x - x^*\| < \epsilon, the increment of J has the same sign. Consequently, if

\Delta J = J(x) - J(x^*) \geq 0
then J(x^*) is a relative minimum, whereas if

\Delta J = J(x) - J(x^*) \leq 0

then J(x^*) is a relative maximum. If \epsilon can be arbitrarily large, then J(x^*) is a global extremum.
The fundamental theorem of the calculus of variations states that if x^* is an extremal, then the variation of J must vanish on x^*, i.e.

\delta J(x^*, \delta x) = 0 \qquad \text{for all admissible } \delta x
If the functional

J(x) = \int_{t_0}^{t_f} g\bigl(x(t), \dot{x}(t), t\bigr)\, dt

has a relative extremum at x^*, then applying the fundamental theorem gives

\delta J(x^*, \delta x) = \int_{t_0}^{t_f} \left[ \frac{\partial g}{\partial x}\bigl(x^*(t), \dot{x}^*(t), t\bigr) - \frac{d}{dt}\, \frac{\partial g}{\partial \dot{x}}\bigl(x^*(t), \dot{x}^*(t), t\bigr) \right] \delta x(t)\, dt = 0
So, if x^* is an extremal, then it must satisfy the Euler-Lagrange equation, i.e.

\frac{\partial g}{\partial x}\bigl(x^*(t), \dot{x}^*(t), t\bigr) - \frac{d}{dt}\, \frac{\partial g}{\partial \dot{x}}\bigl(x^*(t), \dot{x}^*(t), t\bigr) = 0
In general, this is a second-order, non-linear, time-varying ordinary differential equation (ODE), so it is difficult to solve analytically. Numerical solution leads to a non-linear two-point boundary-value problem for x^*, which may also prove difficult to solve.
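As a concrete illustration of applying the Euler-Lagrange equation (an assumed example, not from the lecture), take the integrand g(x, \dot{x}, t) = \dot{x}^2(t); here the equation reduces to a linear ODE that can be solved by inspection:

```latex
% Assumed illustrative integrand: g(x, \dot{x}, t) = \dot{x}^2(t)
\frac{\partial g}{\partial x} = 0, \qquad
\frac{\partial g}{\partial \dot{x}} = 2\,\dot{x}(t)

% Euler-Lagrange equation:
0 \;-\; \frac{d}{dt}\bigl[\, 2\,\dot{x}^*(t) \,\bigr] = 0
\quad\Longrightarrow\quad \ddot{x}^*(t) = 0
\quad\Longrightarrow\quad x^*(t) = c_0 + c_1 t
```

The constants c_0 and c_1 are fixed by the boundary conditions x(t_0) and x(t_f), so the extremals are straight lines. In this special case the equation happens to be linear with constant coefficients; in general it is not, which is why the remark above about difficulty applies.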