
ECE 551 LECTURE 4

Preface

Before we leave the topic of dynamic programming, it is worthwhile to discuss some of the computational
issues that arise in its application. Although computers have come a long way since the time the
algorithm was first introduced, some problems still persist. We then introduce the subject of the calculus
of variations, which is the second method used to solve optimal control problems. We develop the
concepts which support the fundamental theorem of the calculus of variations, which then leads to the
Euler-Lagrange equations. The solution of these equations is one of our primary objectives going
forward.

Dynamic Programming Computations

Recall that we began by considering a system described by a state difference equation, i.e.

$$x(k+1) = a_D\big(x(k), u(k)\big), \qquad k = 0, 1, \ldots, N-1 \tag{8.1}$$

We set out to find the control law that minimizes the performance criterion

$$J = h\big(x(N)\big) + \sum_{k=0}^{N-1} g_D\big(x(k), u(k)\big) \tag{8.2}$$

We applied the method of dynamic programming to this problem which led to the recurrence relation

$$J^*_{N-K,N}\big(x(N-K)\big) = \min_{u(N-K)} \Big\{ g_D\big(x(N-K), u(N-K)\big) + J^*_{N-(K-1),N}\big(a_D(x(N-K), u(N-K))\big) \Big\}, \qquad K = 1, 2, \ldots, N \tag{8.3}$$

where the initial value is given by

$$J^*_{N,N}\big(x(N)\big) = h\big(x(N)\big) \tag{8.4}$$

The solution of the recurrence relation is an optimal control law or optimal policy, i.e.

$$u^*\big(x(N-K), N-K\big), \qquad K = 1, 2, \ldots, N$$

We obtain the solution by trying all admissible control values at each admissible state value. If the states
and controls were not quantized to begin with, then they must be quantized in order to carry out the algorithm
computationally. For example, if the system is second-order, the total number of discrete state values at
each time $k\Delta t$ is $s_1 s_2$, where the quantities $s_1$ and $s_2$ are determined by the relationship


$$s_r = \frac{x_{r\,\max} - x_{r\,\min}}{\Delta x_r} + 1 \tag{8.5}$$

where we have assumed that $\Delta x_r$ is chosen such that the interval $x_{r\,\max} - x_{r\,\min}$ contains an integer
number of points. Hence, for an $n$th-order system, the number of state values at each time $t = k\Delta t$ is
given by

$$S = s_1 s_2 \cdots s_n \tag{8.6}$$

where

$$s_r = \frac{x_{r\,\max} - x_{r\,\min}}{\Delta x_r} + 1, \qquad r = 1, 2, \ldots, n$$

Again, we assume that the ratio $(x_{r\,\max} - x_{r\,\min}) / \Delta x_r$ is an integer.

The admissible range of control values is quantized in exactly the same manner. Hence, the total number
of quantized control values is given by

$$C = c_1 c_2 \cdots c_m \tag{8.7}$$

where

$$c_q = \frac{u_{q\,\max} - u_{q\,\min}}{\Delta u_q} + 1, \qquad q = 1, 2, \ldots, m \tag{8.8}$$
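As an illustration of (8.5) through (8.8), the grid sizes can be computed as sketched below. The function names `levels` and `grid_points` and the example ranges are assumptions chosen for illustration; they are not part of the lecture.

```python
from math import prod


def levels(v_min, v_max, dv):
    """Number of quantized values on [v_min, v_max] with spacing dv, as in (8.5) and (8.8)."""
    ratio = (v_max - v_min) / dv
    n = round(ratio)
    # The lecture assumes this ratio is an integer; reject grids that violate it.
    if abs(ratio - n) > 1e-9:
        raise ValueError("range must contain an integer number of increments")
    return n + 1


def grid_points(ranges):
    """Total number of grid points: the product in (8.6) or (8.7)."""
    return prod(levels(lo, hi, d) for lo, hi, d in ranges)


# Hypothetical second-order system: x1 in [0, 1] with step 0.1, x2 in [-1, 1] with step 0.5.
S = grid_points([(0.0, 1.0, 0.1), (-1.0, 1.0, 0.5)])
# Hypothetical single control: u in [-1, 1] with step 0.5.
C = grid_points([(-1.0, 1.0, 0.5)])
print(S, C)
```

Each additional state variable multiplies `S` by its own number of levels, which is the source of the storage growth discussed at the end of this topic.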

Let's denote the admissible quantized state and control values at time $t = k\Delta t$ as follows:

$$x^{(i)}(k), \qquad i = 1, 2, \ldots, S$$

$$u^{(j)}(k), \qquad j = 1, 2, \ldots, C$$

In performing the dynamic programming algorithm, we repeatedly evaluate the following quantity:


$$C^*_{N-K,N}\big(x^{(i)}(N-K), u^{(j)}(N-K)\big) = g_D\big(x^{(i)}(N-K), u^{(j)}(N-K)\big) + J^*_{N-(K-1),N}\big(x^{(i,j)}(N-K+1)\big) \tag{8.9}$$


Equation (8.9) represents the minimum cost of operation over the final $K$ stages of an $N$-stage process,
assuming that the control value $u^{(j)}(N-K)$ is applied at state $x^{(i)}(N-K)$. The objective at each
stage is to find the value of $u^{(j)}(N-K)$ that yields the quantity $J^*_{N-K,N}(x^{(i)}(N-K))$, which is the
minimum of $C^*_{N-K,N}(x^{(i)}(N-K), u^{(j)}(N-K))$.

The smallest value of $C^*_{N-K,N}(x^{(i)}(N-K), u^{(j)}(N-K))$ and its associated control value are stored
in computer memory. The process repeats until the optimal control and cost corresponding to each state
value $x^{(i)}(N-K)$ have been computed at the current time increment.

The value of K is then incremented and the overall process repeats until all stages have been computed.
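The procedure just described can be sketched in code as follows. This is a minimal illustration rather than the lecture's implementation: the function name `dynamic_programming`, the snapping of the next state to the nearest grid point, and the example system ($x(k+1) = x(k) + u(k)$, $g_D = 2u^2$, $h = x^2$, $N = 2$) are all assumptions chosen for illustration.

```python
def dynamic_programming(states, controls, a_D, g_D, h, N):
    """Backward recursion (8.3)-(8.4) on a quantized scalar grid.

    `states` must be sorted in ascending order. Returns (J, policy), where
    J[k][i] is the optimal cost-to-go from states[i] at stage k and
    policy[k][i] is the minimizing control (None if no control is admissible).
    """
    def nearest(x):
        # Assumption: next states are snapped to the nearest grid point;
        # the lecture does not specify an interpolation scheme.
        return min(range(len(states)), key=lambda i: abs(states[i] - x))

    J = [[0.0] * len(states) for _ in range(N + 1)]
    policy = [[None] * len(states) for _ in range(N)]
    J[N] = [h(x) for x in states]                     # initial value (8.4)

    for K in range(1, N + 1):                         # K = 1, 2, ..., N
        k = N - K
        for i, x in enumerate(states):
            best_cost, best_u = float("inf"), None
            for u in controls:                        # try every quantized control
                x_next = a_D(x, u)
                if not states[0] <= x_next <= states[-1]:
                    continue                          # leaves the admissible state range
                cost = g_D(x, u) + J[k + 1][nearest(x_next)]   # quantity in (8.9)
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J[k][i], policy[k][i] = best_cost, best_u  # store minimum and its control
    return J, policy


# Hypothetical first-order example on a four-point state grid.
states = [0.0, 0.5, 1.0, 1.5]
controls = [-1.0, -0.5, 0.0, 0.5, 1.0]
J, policy = dynamic_programming(
    states, controls,
    a_D=lambda x, u: x + u,
    g_D=lambda x, u: 2.0 * u ** 2,
    h=lambda x: x ** 2,
    N=2,
)
print(J[0], policy[0])
```

The two inner loops visit every state and control pair once per stage, so the work per stage is the product of the number of quantized states and controls.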

Dynamic programming makes the direct search feasible because instead of searching among the set of
all admissible controls that cause admissible trajectories, we consider only those controls that also satisfy
the principle of optimality. Figure 1 below illustrates this point.

[Figure 1: Subsets of the Control Space, showing the regions S1, S2, S3 and S4]

Note that S1 represents all controls, S2 represents admissible controls, S3 represents controls which
cause admissible trajectories, and S4 represents controls which satisfy the principle of optimality. Thus,
only the shaded (pink) region represents the controls which the dynamic programming algorithm
searches.

Let's consider a simple example, i.e. a first-order process with one control input. We assume that the
admissible state values are quantized into 10 (ten) levels, and the admissible control values into 4 (four)
levels.


In direct enumeration, we would try all 4 (four) control values at each of the 10 (ten) initial state
values for one time increment $\Delta t$. Thus $x(\Delta t)$ can assume any of 40 (forty) state values.
Assuming that all of these state values are admissible, we apply all four control values at each of the 40
(forty) state values and determine the resulting $x(2\Delta t)$. This process repeats for the appropriate
number of stages.

In dynamic programming, at every stage we try all 4 (four) control values at each of 10 (ten) state values.
Again, this process repeats for the appropriate number of stages.

Table 1 below summarizes and compares the results obtained using direct enumeration and dynamic
programming. Note the significant savings in the number of computations which results from the use of
dynamic programming.

Table 1 Comparison of Dynamic Programming & Direct Enumeration

Number of Stages      Number of Calculations Required      Number of Calculations Required
in the Process        by Dynamic Programming               by Direct Enumeration

1                     40                                   40
2                     80                                   200
3                     120                                  840
4                     160                                  3400
5                     200                                  13640
6                     240                                  54600
N                     $40N$                                $10 \sum_{k=1}^{N} 4^k$
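The entries of Table 1 follow from the general expressions in its last row, and can be reproduced with the short sketch below. The function names are hypothetical; the defaults of 10 state levels and 4 control levels are taken from the example above.

```python
def dp_calculations(n_stages, n_states=10, n_controls=4):
    """Dynamic programming: every control at every state, once per stage."""
    return n_states * n_controls * n_stages


def enumeration_calculations(n_stages, n_states=10, n_controls=4):
    """Direct enumeration: the number of branches grows by a factor of n_controls per stage."""
    return n_states * sum(n_controls ** k for k in range(1, n_stages + 1))


for n in range(1, 7):
    print(n, dp_calculations(n), enumeration_calculations(n))
```

The dynamic programming count grows linearly with the number of stages, whereas the direct enumeration count grows geometrically.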

Before we leave the topic of dynamic programming, one last comment is in order. Higher-dimensional
problems, i.e. higher-order state equations with multiple inputs, require a great number of values to be
stored during the computation, because the number of grid points $S$ in (8.6) is a product over all $n$ state
variables. Bellman called this the curse of dimensionality. Nowadays, with the abundance of available
high-speed memory, this is less of a curse.


Calculus of Variations

We begin our discussion of the calculus of variations with the concept of a functional. The performance
measure is an example of a functional, which we define as follows:

A functional $J$ is a rule of correspondence that assigns to each function $x$ in a certain class
$\Omega$ a unique real number. The class $\Omega$ is the domain of the functional, and the set of
real numbers associated with the functions in $\Omega$ is called the range of the functional.

For example, let $x$ be a continuous function of $t$ defined in the interval $[t_0, t_f]$ and let

$$J(x) = \int_{t_0}^{t_f} x(t)\, dt$$

The real number assigned by the functional $J$ is the area under the $x(t)$ curve.
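To make the idea of "a function in, a number out" concrete, the sketch below approximates this functional with the trapezoidal rule. The name `J`, the number of subintervals, and the two sample functions are assumptions chosen for illustration.

```python
def J(x, t0, tf, n=1000):
    """Approximate J(x) = integral of x(t) dt over [t0, tf] by the trapezoidal rule."""
    dt = (tf - t0) / n
    total = 0.5 * (x(t0) + x(tf))
    for i in range(1, n):
        total += x(t0 + i * dt)
    return total * dt


# Two hypothetical functions on [0, 1]; each is mapped to a single real number.
area_line = J(lambda t: t, 0.0, 1.0)          # exact value is 1/2
area_parabola = J(lambda t: t * t, 0.0, 1.0)  # exact value is 1/3
print(area_line, area_parabola)
```

Note that the argument of `J` is an entire function, not a number, which is what distinguishes a functional from an ordinary function.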

We will find several other definitions useful, e.g.

A linear functional of $x$ satisfies the principle of homogeneity, i.e. $J(\alpha x) = \alpha J(x)$ for
all $x \in \Omega$ and for all real numbers $\alpha$ such that $\alpha x \in \Omega$, and the principle of additivity,
i.e. $J(x_1 + x_2) = J(x_1) + J(x_2)$ for all $x_1$, $x_2$ and $x_1 + x_2$ in the class $\Omega$.

For example, let's reconsider the following functional, i.e.

$$J(x) = \int_{t_0}^{t_f} x(t)\, dt$$

where $x$ is a continuous function of the time variable $t$. We first ask whether the principle
of homogeneity is satisfied, i.e.

$$\alpha J(x) = \alpha \int_{t_0}^{t_f} x(t)\, dt$$

$$J(\alpha x) = \int_{t_0}^{t_f} \alpha x(t)\, dt = \alpha \int_{t_0}^{t_f} x(t)\, dt$$

Hence, we have that $J(\alpha x) = \alpha J(x)$ for all real numbers $\alpha$ and all $x$ and $\alpha x$ in $\Omega$. Next, we ask
whether the principle of additivity is satisfied, i.e.


$$J(x_1 + x_2) = \int_{t_0}^{t_f} \big[x_1(t) + x_2(t)\big]\, dt = \int_{t_0}^{t_f} x_1(t)\, dt + \int_{t_0}^{t_f} x_2(t)\, dt$$

$$J(x_1) = \int_{t_0}^{t_f} x_1(t)\, dt$$

$$J(x_2) = \int_{t_0}^{t_f} x_2(t)\, dt$$

Hence, we have that $J(x_1 + x_2) = J(x_1) + J(x_2)$ for all $x_1$, $x_2$ and $x_1 + x_2$ in the class $\Omega$. Since
the principles of homogeneity and additivity are both satisfied, the functional is linear.

Now, let's consider a different functional as follows:

$$J(x) = \int_{t_0}^{t_f} x^2(t)\, dt$$

where, once again, $x$ is a continuous function of the time variable $t$. First, let's look at
whether the principle of homogeneity is satisfied, i.e.

$$\alpha J(x) = \alpha \int_{t_0}^{t_f} x^2(t)\, dt$$

$$J(\alpha x) = \int_{t_0}^{t_f} \big[\alpha x(t)\big]^2\, dt = \alpha^2 \int_{t_0}^{t_f} x^2(t)\, dt$$

Thus, we note that $J(\alpha x) = \alpha J(x)$ does not hold for all $\alpha$ (since $\alpha^2 \neq \alpha$ unless $\alpha$ is 0 or 1), and so the functional is non-linear.
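As a concrete check, assume for illustration the particular choices $x(t) = t$, $t_0 = 0$, $t_f = 1$ and $\alpha = 2$ (these values are not from the lecture). Homogeneity then fails explicitly:

```latex
\begin{aligned}
J(x) &= \int_0^1 t^2 \, dt = \tfrac{1}{3}, \\
\alpha J(x) &= 2 \cdot \tfrac{1}{3} = \tfrac{2}{3}, \\
J(\alpha x) &= \int_0^1 (2t)^2 \, dt = \tfrac{4}{3} \neq \tfrac{2}{3}.
\end{aligned}
```

A single counterexample of this kind is enough to rule out linearity, so additivity need not be tested separately.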

Another useful concept will be the closeness of functions which uses the norm of a function defined as
follows:

The norm of a function is a rule of correspondence that assigns to each function $x \in \Omega$,
defined for $t \in [t_0, t_f]$, a real number. The norm of $x$ is denoted as $\|x\|$, and satisfies the
following properties:

1. $\|x\| \geq 0$, and $\|x\| = 0$ iff $x(t) = 0$ for all $t \in [t_0, t_f]$.

2. $\|\alpha x\| = |\alpha| \, \|x\|$ for all real numbers $\alpha$.


3. $\|x_1 + x_2\| \leq \|x_1\| + \|x_2\|$.

Now, let's consider two functions $y(t)$ and $z(t)$. In order to compare the closeness of these functions,
we examine the quantity $\|y - z\|$. If the functions are identical, then we would expect the norm to
be equal to zero. Similarly, if the norm is small, then we expect the functions to be close, whereas, if the
norm is large, then the functions are taken to be far apart. An example of a norm is as follows:

$$\|x\| = \max_{t_0 \leq t \leq t_f} |x(t)|$$

Many other choices are possible. Any choice must satisfy the three properties listed above.
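The maximum-magnitude norm above can be approximated numerically by sampling, as in the sketch below. The name `sup_norm`, the sampling density, and the two functions being compared are assumptions chosen for illustration.

```python
import math


def sup_norm(x, t0, tf, n=1000):
    """Approximate the maximum of |x(t)| on [t0, tf] using n + 1 evenly spaced samples."""
    return max(abs(x(t0 + (tf - t0) * i / n)) for i in range(n + 1))


# Hypothetical pair of functions on [0, 1]: y(t) = sin(t) and z(t) = t.
y = math.sin
z = lambda t: t

# Closeness of y and z is measured by the norm of their difference.
d = sup_norm(lambda t: y(t) - z(t), 0.0, 1.0)
print(d)

# Spot checks of properties 2 and 3 on the same samples.
scaled = sup_norm(lambda t: -3.0 * y(t), 0.0, 1.0)
summed = sup_norm(lambda t: y(t) + z(t), 0.0, 1.0)
print(scaled, summed)
```

Because the maximum is taken over a finite set of samples, this only approximates the true norm from below; a finer grid tightens the estimate.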

The increment of J is an important concept which is defined as follows:

If $x$ and $x + \delta x$ are functions for which the functional $J$ is defined, then the increment of $J$,
denoted by $\Delta J$, is given by $\Delta J = J(x + \delta x) - J(x)$.

Note that $\delta x$ is the variation of the function $x$. For example, let's find the increment of the following
functional, i.e.

$$J(x) = \int_{t_0}^{t_f} x^2(t)\, dt$$

where, once again, $x$ is a continuous function of the time variable $t$. The increment is given
by

$$\Delta J = J(x + \delta x) - J(x) = \int_{t_0}^{t_f} \big[x(t) + \delta x(t)\big]^2 dt - \int_{t_0}^{t_f} x^2(t)\, dt = \int_{t_0}^{t_f} \Big[ 2 x(t)\, \delta x(t) + \big[\delta x(t)\big]^2 \Big] dt$$
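For instance, under the assumed choices $x(t) = t$, $\delta x(t) = \epsilon$ (a constant), $t_0 = 0$ and $t_f = 1$, none of which come from the lecture, the increment evaluates to:

```latex
\Delta J = \int_0^1 \left[ 2 t \epsilon + \epsilon^2 \right] dt = \epsilon + \epsilon^2
```

The first term is linear in $\epsilon$, while the second is of higher order and becomes negligible relative to the first as $\epsilon \to 0$.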

The variation of a functional is very important because it plays the same role in determining extreme
values of functionals as the differential does in finding maxima and minima of functions. We start with the
increment of a functional, which can be written as follows:

$$\Delta J(x, \delta x) = \delta J(x, \delta x) + g(x, \delta x) \cdot \|\delta x\|$$


where $\delta J$ is linear in $\delta x$. Now, if $\lim_{\|\delta x\| \to 0} g(x, \delta x) = 0$, then $J$ is said to be
differentiable on $x$ and $\delta J$ is the variation of $J$ evaluated for the function $x$. Let's take an example.
Let $x$ be a continuous scalar function defined for $t \in [0, 1]$. Let's find the variation of the following
functional, i.e.

$$J(x) = \int_0^1 \big[ x^2(t) + 2 x(t) \big]\, dt$$

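First, we find the increment of $J$, i.e.

$$\Delta J(x, \delta x) = J(x + \delta x) - J(x)$$

$$= \int_0^1 \Big\{ \big[x(t) + \delta x(t)\big]^2 + 2\big[x(t) + \delta x(t)\big] \Big\}\, dt - \int_0^1 \big[ x^2(t) + 2 x(t) \big]\, dt$$

$$= \int_0^1 \Big\{ \big[2 x(t) + 2\big]\, \delta x(t) + \big[\delta x(t)\big]^2 \Big\}\, dt$$

$$= \int_0^1 \big[2 x(t) + 2\big]\, \delta x(t)\, dt + \int_0^1 \big[\delta x(t)\big]^2\, dt$$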

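Now, let's show that

$$\int_0^1 \big[\delta x(t)\big]^2\, dt = g(x, \delta x) \cdot \|\delta x\|$$

and that

$$\lim_{\|\delta x\| \to 0} g(x, \delta x) = 0$$

Since $x$ is continuous, we let

$$\|\delta x\| = \max_{0 \leq t \leq 1} |\delta x(t)|$$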

Next, we have that


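$$\int_0^1 \big[\delta x(t)\big]^2\, dt = \frac{\|\delta x\|}{\|\delta x\|} \int_0^1 \big[\delta x(t)\big]^2\, dt = \|\delta x\| \cdot \int_0^1 \frac{\big[\delta x(t)\big]^2}{\|\delta x\|}\, dt$$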

Thus,

$$g(x, \delta x) = \int_0^1 \frac{\big[\delta x(t)\big]^2}{\|\delta x\|}\, dt$$

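Note the following, i.e.

$$\big[\delta x(t)\big]^2 = |\delta x(t)| \cdot |\delta x(t)|$$

And so,

$$\int_0^1 \frac{|\delta x(t)| \cdot |\delta x(t)|}{\|\delta x\|}\, dt \leq \int_0^1 |\delta x(t)|\, dt$$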

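This is a direct result of our definition of $\|\delta x\|$, which implies that $\|\delta x\| \geq |\delta x(t)|$ for all $t \in [0, 1]$.
Thus, if $\|\delta x\| \to 0$, then $|\delta x(t)| \to 0$ for all $t \in [0, 1]$, and so

$$\lim_{\|\delta x\| \to 0} \int_0^1 |\delta x(t)|\, dt = 0$$

This confirms that $g(x, \delta x) \to 0$ as $\|\delta x\| \to 0$. Consequently, the variation of $J$ is given by

$$\delta J(x, \delta x) = \int_0^1 \big[2 x(t) + 2\big]\, \delta x(t)\, dt$$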

Note that the variation of the functional $J$ only contains terms linear in $\delta x(t)$.
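The limit argument above can be checked numerically. The sketch below evaluates the remainder term $g = (\Delta J - \delta J) / \|\delta x\|$ for this functional and shows it shrinking with the size of the perturbation. The nominal function $x(t) = t$, the perturbation $\delta x(t) = \epsilon \sin(\pi t)$, and all function names are assumptions chosen for illustration.

```python
import math


def integrate(f, a, b, n=1000):
    """Trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def J(x):
    """J(x) = integral over [0, 1] of x(t)^2 + 2 x(t)."""
    return integrate(lambda t: x(t) ** 2 + 2.0 * x(t), 0.0, 1.0)


def variation(x, dx):
    """Linear part: integral over [0, 1] of [2 x(t) + 2] dx(t)."""
    return integrate(lambda t: (2.0 * x(t) + 2.0) * dx(t), 0.0, 1.0)


def remainder(x, dx, norm_dx):
    """g = (increment - variation) / ||dx||."""
    increment = J(lambda t: x(t) + dx(t)) - J(x)
    return (increment - variation(x, dx)) / norm_dx


x = lambda t: t  # hypothetical nominal function
results = {}
for eps in (1e-1, 1e-2, 1e-3):
    # For this perturbation the maximum-magnitude norm equals eps (attained at t = 0.5).
    dx = lambda t, e=eps: e * math.sin(math.pi * t)
    results[eps] = remainder(x, dx, eps)
    print(eps, results[eps])
```

For this particular perturbation the remainder works out analytically to $\epsilon / 2$, so it tends to zero with $\|\delta x\|$ as the derivation requires.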

Now we must examine the extrema of functionals, since that is the objective of optimal control, i.e. to find
the extrema of the performance measure (functional). We start by considering a functional $J$ with domain
$\Omega$. Suppose that $J$ has a relative extremum at $x^*$. Then, there is an $\epsilon > 0$ such that for all functions $x$
in the domain $\Omega$ which satisfy $\|x - x^*\| < \epsilon$, the increment of $J$ has the same sign. Consequently, if

$$\Delta J = J(x) - J(x^*) \geq 0$$



then $J(x^*)$ is a relative minimum, whereas, if

$$\Delta J = J(x) - J(x^*) \leq 0$$

then $J(x^*)$ is a relative maximum. If $\epsilon$ can be arbitrarily large, then $J(x^*)$ is a global
extremum.

We now state the fundamental theorem of the calculus of variations, i.e.


If $x^*$ is an extremal, then the variation of $J$ must vanish on $x^*$, i.e.
$\delta J(x^*, \delta x) = 0$ for all admissible $\delta x$.

By admissible $\delta x$, we mean that $x + \delta x$ must be a member of the class $\Omega$.

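In the next lecture, we will show that if the functional

$$J(x) = \int_{t_0}^{t_f} g\big(x(t), \dot{x}(t), t\big)\, dt$$

has a relative extremum $x^*$, then by applying the fundamental theorem we get

$$\delta J(x^*, \delta x) = \int_{t_0}^{t_f} \left\{ \frac{\partial g}{\partial x}\big(x^*(t), \dot{x}^*(t), t\big) - \frac{d}{dt}\left[ \frac{\partial g}{\partial \dot{x}}\big(x^*(t), \dot{x}^*(t), t\big) \right] \right\} \delta x(t)\, dt = 0$$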


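So, if $x^*$ is an extremal, then it must satisfy the Euler-Lagrange equation, i.e.

$$\frac{\partial g}{\partial x}\big(x^*(t), \dot{x}^*(t), t\big) - \frac{d}{dt}\left[ \frac{\partial g}{\partial \dot{x}}\big(x^*(t), \dot{x}^*(t), t\big) \right] = 0$$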

In general, this is a second-order, non-linear, time-varying ordinary differential equation (ODE), and so it is
usually difficult to solve analytically. Solving it numerically requires handling a non-linear, two-point
boundary-value problem for $x^*$, which may also prove to be difficult.
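As a small worked illustration of how the Euler-Lagrange equation is applied, consider the assumed integrand $g = \dot{x}^2 + x^2$ with fixed end points. This choice is hypothetical and deliberately simple; it is not taken from the lecture.

```latex
\begin{aligned}
\frac{\partial g}{\partial x} &= 2 x^*(t), \qquad
\frac{\partial g}{\partial \dot{x}} = 2 \dot{x}^*(t), \\
2 x^*(t) - \frac{d}{dt}\left[ 2 \dot{x}^*(t) \right] &= 0
\quad \Longrightarrow \quad \ddot{x}^*(t) - x^*(t) = 0, \\
x^*(t) &= c_1 e^{t} + c_2 e^{-t}.
\end{aligned}
```

The constants $c_1$ and $c_2$ are fixed by the boundary values $x(t_0)$ and $x(t_f)$, which is the two-point boundary-value structure mentioned above. Here the resulting ODE happens to be linear and time-invariant, so a closed-form solution exists; a general integrand does not offer this convenience.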
