
EE363 Winter 2008-09

Lecture 5
Linear Quadratic Stochastic Control

linear-quadratic stochastic control problem

solution via dynamic programming

Linear stochastic system

linear dynamical system, over finite time horizon:

x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

w_t is the process noise or disturbance at time t

w_t are IID with E w_t = 0, E w_t w_t^T = W

x_0 is independent of w_t, with E x_0 = 0, E x_0 x_0^T = X
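
to make the dynamics concrete, here is a minimal numpy sketch of one simulation rollout; the function name and the choice of Gaussian distributions for x_0 and w_t (matching the example later in this lecture) are my own assumptions:

```python
import numpy as np

def simulate(A, B, X, W, N, inputs, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t for t = 0, ..., N-1.

    inputs(t, x) returns u_t; returns the state and input trajectories.
    """
    n = A.shape[0]
    # x_0 with E x_0 = 0, E x_0 x_0^T = X (Gaussian is an assumption)
    x = rng.multivariate_normal(np.zeros(n), X)
    xs, us = [x], []
    for t in range(N):
        u = inputs(t, x)
        w = rng.multivariate_normal(np.zeros(n), W)  # E w_t = 0, E w_t w_t^T = W
        x = A @ x + B @ u + w
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)
```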



Control policies

state-feedback control: u_t = φ_t(x_t),   t = 0, ..., N-1

φ_t : R^n → R^m is called the control policy at time t

roughly speaking: we choose the input after knowing the current state, but before knowing the disturbance

closed-loop system is

x_{t+1} = A x_t + B φ_t(x_t) + w_t,   t = 0, ..., N-1

x_0, ..., x_N, u_0, ..., u_{N-1} are random



Stochastic control problem

objective:

J = E ( sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t ) + x_N^T Q_f x_N )

with Q, Q_f ≥ 0, R > 0

J depends (in a complex way) on the control policies φ_0, ..., φ_{N-1}

linear-quadratic stochastic control problem: choose control policies
φ_0, ..., φ_{N-1} to minimize J

(linear refers to the state dynamics; quadratic to the objective)

an infinite-dimensional problem: the variables are the functions φ_0, ..., φ_{N-1}
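
since J is an expectation over x_0 and w_0, ..., w_{N-1}, it can be estimated for any fixed policy by Monte Carlo simulation; a sketch, reusing the hypothetical simulate helper above:

```python
def estimate_cost(A, B, X, W, Q, R, Qf, N, policy, num_runs=1000, seed=0):
    """Monte Carlo estimate of J for the state-feedback policy u_t = policy(t, x_t)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_runs):
        xs, us = simulate(A, B, X, W, N, policy, rng)
        total += sum(x @ Q @ x for x in xs[:-1])   # stage costs x_t^T Q x_t
        total += sum(u @ R @ u for u in us)        # stage costs u_t^T R u_t
        total += xs[-1] @ Qf @ xs[-1]              # terminal cost x_N^T Qf x_N
    return total / num_runs
```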



Solution via dynamic programming

let V_t(z) be the optimal value of the objective, from t on, starting at x_t = z:

V_t(z) = min_{φ_t, ..., φ_{N-1}} E ( sum_{τ=t}^{N-1} ( x_τ^T Q x_τ + u_τ^T R u_τ ) + x_N^T Q_f x_N )

subject to x_{τ+1} = A x_τ + B u_τ + w_τ,   u_τ = φ_τ(x_τ)

we have

V_N(z) = z^T Q_f z

J = E V_0(x_0) (expectation over x_0)



V_t can be found by backward recursion: for t = N-1, ..., 0,

V_t(z) = z^T Q z + min_v ( v^T R v + E V_{t+1}(A z + B v + w_t) )

expectation is over w_t

we do not know where we will land when we take u_t = v

optimal policies have the form

φ_t(x_t) = argmin_v ( v^T R v + E V_{t+1}(A x_t + B v + w_t) )



Explicit form

let's show (via recursion) that the value functions are quadratic, with form

V_t(x_t) = x_t^T P_t x_t + q_t,   t = 0, ..., N,

with P_t ≥ 0

P_N = Q_f, q_N = 0

now assume that V_{t+1}(z) = z^T P_{t+1} z + q_{t+1}



Bellman recursion is

V_t(z) = z^T Q z + min_v { v^T R v + E( (A z + B v + w_t)^T P_{t+1} (A z + B v + w_t) + q_{t+1} ) }

       = z^T Q z + Tr(W P_{t+1}) + q_{t+1} + min_v { v^T R v + (A z + B v)^T P_{t+1} (A z + B v) }

we use E(w_t^T P_{t+1} w_t) = Tr(W P_{t+1}); the cross term vanishes since E w_t = 0


same recursion as deterministic LQR, with added constant

optimal policy is linear state feedback: φ_t(x_t) = K_t x_t,

K_t = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A

(same form as in deterministic LQR)



plugging in the optimal v gives V_t(z) = z^T P_t z + q_t, with

P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A + Q

q_t = q_{t+1} + Tr(W P_{t+1})

the first recursion is the same as for deterministic LQR

the second recursion is just a running sum

we conclude that

P_t, K_t are the same as in deterministic LQR

strangely, the optimal policy is the same as for LQR, and is independent of X, W
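
the two recursions translate directly into code; a sketch (function and variable names are mine) computing P_t, q_t, and the gains K_t:

```python
def lqs_backward(A, B, Q, R, Qf, W, N):
    """Backward recursion: P_N = Qf, q_N = 0, then P_t, q_t, K_t for t = N-1, ..., 0."""
    P = [None] * (N + 1)
    q = np.zeros(N + 1)
    K = [None] * N
    P[N] = Qf
    for t in range(N - 1, -1, -1):
        # G = (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A, so K_t = -G
        G = np.linalg.solve(B.T @ P[t + 1] @ B + R, B.T @ P[t + 1] @ A)
        K[t] = -G
        P[t] = A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ G + Q
        q[t] = q[t + 1] + np.trace(W @ P[t + 1])   # running sum from the noise
    return P, q, K
```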



optimal cost is

J = E V_0(x_0)
  = Tr(X P_0) + q_0
  = Tr(X P_0) + sum_{t=1}^{N} Tr(W P_t)

interpretation:

x_0^T P_0 x_0 is the optimal cost of deterministic LQR, with w_0 = ... = w_{N-1} = 0

Tr(X P_0) is the average optimal LQR cost, with w_0 = ... = w_{N-1} = 0

Tr(W P_t) is the average optimal LQR cost, for E x_t = 0, E x_t x_t^T = W, w_t = ... = w_{N-1} = 0
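
continuing the sketch above (with A, B, Q, R, Qf, W, X, N assumed defined as before), the optimal cost is immediate:

```python
P, q, K = lqs_backward(A, B, Q, R, Qf, W, N)
J_opt = np.trace(X @ P[0]) + q[0]   # J = Tr(X P_0) + q_0
# q_0 is exactly the running sum sum_{t=1}^{N} Tr(W P_t)
assert np.isclose(q[0], sum(np.trace(W @ P[t]) for t in range(1, N + 1)))
```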



Infinite horizon

choose policies to minimize average stage cost

J = lim_{N→∞} (1/N) E sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t )

optimal average stage cost is

J = Tr(W P_ss)

where P_ss satisfies the ARE

P_ss = Q + A^T P_ss A - A^T P_ss B (R + B^T P_ss B)^{-1} B^T P_ss A

optimal average stage cost doesn't depend on X



(an) optimal policy is constant linear state feedback

u_t = K_ss x_t

where

K_ss = -(R + B^T P_ss B)^{-1} B^T P_ss A

K_ss is the steady-state LQR feedback gain

K_ss doesn't depend on X, W
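
a sketch of computing P_ss, K_ss, and the optimal average stage cost, using scipy's discrete-time ARE solver (iterating the Riccati recursion to convergence would also work):

```python
from scipy.linalg import solve_discrete_are

def lqs_steady_state(A, B, Q, R, W):
    """Steady-state solution: P_ss from the ARE, gain K_ss, average cost Tr(W P_ss)."""
    Pss = solve_discrete_are(A, B, Q, R)
    Kss = -np.linalg.solve(R + B.T @ Pss @ B, B.T @ Pss @ A)
    J_avg = np.trace(W @ Pss)
    return Pss, Kss, J_avg
```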



Example

system with n = 5 states, m = 2 inputs, horizon N = 30

A, B chosen randomly; A scaled so max_i |λ_i(A)| = 1

Q = I, Q_f = 10I, R = I

x_0 ~ N(0, X), X = 10I

w_t ~ N(0, W), W = 0.5I
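
the lecture's random instance isn't reproducible, but the setup can be sketched as follows (the seed is arbitrary, not the lecture's data):

```python
rng = np.random.default_rng(0)                 # arbitrary seed
n, m, N = 5, 2, 30
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))      # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
X, W = 10 * np.eye(n), 0.5 * np.eye(n)
```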



Sample trajectories

sample trace of (x_t)_1 and (u_t)_1

[figure: (x_t)_1 (top) and (u_t)_1 (bottom) versus t, for t = 0, ..., 30]

blue: optimal stochastic control, red: no control (u_0 = ... = u_{N-1} = 0)



Cost histogram

cost histogram for 1000 simulations

[figure: histograms of realized cost over 1000 runs, under optimal stochastic control (J), prescient control (J_pre), open-loop control (J_ol), and no control (J_nc)]



Comparisons

we compared optimal stochastic control (J = 224.2) with

prescient control

decide input sequence with full knowledge of future disturbances
u_0, ..., u_{N-1} computed assuming all w_t are known
J_pre = 137.6

open-loop control

u_0, ..., u_{N-1} depend only on x_0
u_0, ..., u_{N-1} computed assuming w_0 = ... = w_{N-1} = 0
J_ol = 423.7

no control

u_0 = ... = u_{N-1} = 0
J_nc = 442.0
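
as a rough illustration, the no-control and open-loop baselines are easy to reproduce with the pieces sketched earlier (the prescient controller, which optimizes against the realized disturbances, is omitted); one sample path:

```python
def rollout_cost(A, B, Q, R, Qf, x0, ws, inputs):
    """Realized cost of the rule u_t = inputs(t, x_t) on one disturbance sample path."""
    x, cost = x0, 0.0
    for t, w in enumerate(ws):
        u = inputs(t, x)
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + w
    return cost + x @ Qf @ x

x0 = rng.multivariate_normal(np.zeros(n), X)
ws = rng.multivariate_normal(np.zeros(n), W, size=N)

J_nc_sample = rollout_cost(A, B, Q, R, Qf, x0, ws, lambda t, x: np.zeros(m))

# open-loop: plan u_0, ..., u_{N-1} from x_0 with the LQR gains, assuming all w_t = 0
plan, xhat = [], x0
for Kt in K:
    plan.append(Kt @ xhat)
    xhat = A @ xhat + B @ plan[-1]
J_ol_sample = rollout_cost(A, B, Q, R, Qf, x0, ws, lambda t, x: plan[t])
```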

