
EE363 Winter 2008-09

Lecture 5
Linear Quadratic Stochastic Control

linear-quadratic stochastic control problem

solution via dynamic programming

Linear stochastic system

linear dynamical system, over finite time horizon:

x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

w_t is the process noise or disturbance at time t

w_t are IID with E w_t = 0, E w_t w_t^T = W

x_0 is independent of w_t, with E x_0 = 0, E x_0 x_0^T = X
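
to make the dynamics concrete, here is a minimal numpy sketch of one simulation rollout; the function name and the choice of Gaussian distributions for x_0 and w_t (matching the example later in this lecture) are my own assumptions:

```python
import numpy as np

def simulate(A, B, X, W, N, inputs, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t for t = 0, ..., N-1.

    inputs(t, x) returns u_t; returns the state and input trajectories.
    """
    n = A.shape[0]
    # x_0 with E x_0 = 0, E x_0 x_0^T = X (Gaussian is an assumption)
    x = rng.multivariate_normal(np.zeros(n), X)
    xs, us = [x], []
    for t in range(N):
        u = inputs(t, x)
        w = rng.multivariate_normal(np.zeros(n), W)  # E w_t = 0, E w_t w_t^T = W
        x = A @ x + B @ u + w
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)
```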



Control policies

state-feedback control: u_t = φ_t(x_t),   t = 0, ..., N-1

φ_t : R^n → R^m is called the control policy at time t

roughly speaking: we choose the input after knowing the current state, but before knowing the disturbance

closed-loop system is

x_{t+1} = A x_t + B φ_t(x_t) + w_t,   t = 0, ..., N-1

x_0, ..., x_N, u_0, ..., u_{N-1} are random



Stochastic control problem

objective:

J = E ( sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t ) + x_N^T Q_f x_N )

with Q, Q_f ≥ 0, R > 0

J depends (in a complex way) on the control policies φ_0, ..., φ_{N-1}

linear-quadratic stochastic control problem: choose control policies
φ_0, ..., φ_{N-1} to minimize J

(linear refers to the state dynamics; quadratic to the objective)

an infinite-dimensional problem: the variables are the functions φ_0, ..., φ_{N-1}
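
since J is an expectation over x_0 and w_0, ..., w_{N-1}, it can be estimated for any fixed policy by Monte Carlo simulation; a sketch, reusing the hypothetical simulate helper above:

```python
def estimate_cost(A, B, X, W, Q, R, Qf, N, policy, num_runs=1000, seed=0):
    """Monte Carlo estimate of J for the state-feedback policy u_t = policy(t, x_t)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_runs):
        xs, us = simulate(A, B, X, W, N, policy, rng)
        total += sum(x @ Q @ x for x in xs[:-1])   # stage costs x_t^T Q x_t
        total += sum(u @ R @ u for u in us)        # stage costs u_t^T R u_t
        total += xs[-1] @ Qf @ xs[-1]              # terminal cost x_N^T Qf x_N
    return total / num_runs
```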



Solution via dynamic programming

let V_t(z) be the optimal value of the objective, from t on, starting at x_t = z:

V_t(z) = min_{φ_t, ..., φ_{N-1}} E ( sum_{τ=t}^{N-1} ( x_τ^T Q x_τ + u_τ^T R u_τ ) + x_N^T Q_f x_N )

subject to x_{τ+1} = A x_τ + B u_τ + w_τ,   u_τ = φ_τ(x_τ)

we have

V_N(z) = z^T Q_f z

J = E V_0(x_0) (expectation over x_0)



V_t can be found by backward recursion: for t = N-1, ..., 0,

V_t(z) = z^T Q z + min_v ( v^T R v + E V_{t+1}(A z + B v + w_t) )

expectation is over w_t

we do not know where we will land when we take u_t = v

optimal policies have the form

φ_t(x_t) = argmin_v ( v^T R v + E V_{t+1}(A x_t + B v + w_t) )



Explicit form

let's show (via recursion) that the value functions are quadratic, with form

V_t(x_t) = x_t^T P_t x_t + q_t,   t = 0, ..., N,

with P_t ≥ 0

P_N = Q_f, q_N = 0

now assume that V_{t+1}(z) = z^T P_{t+1} z + q_{t+1}



Bellman recursion is

V_t(z) = z^T Q z + min_v { v^T R v + E( (A z + B v + w_t)^T P_{t+1} (A z + B v + w_t) + q_{t+1} ) }

       = z^T Q z + Tr(W P_{t+1}) + q_{t+1} + min_v { v^T R v + (A z + B v)^T P_{t+1} (A z + B v) }

we use E(w_t^T P_{t+1} w_t) = Tr(W P_{t+1}); the cross term vanishes since E w_t = 0


same recursion as deterministic LQR, with added constant

optimal policy is linear state feedback: φ_t(x_t) = K_t x_t,

K_t = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A

(same form as in deterministic LQR)



plugging in the optimal v gives V_t(z) = z^T P_t z + q_t, with

P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A + Q

q_t = q_{t+1} + Tr(W P_{t+1})

the first recursion is the same as for deterministic LQR

the second recursion is just a running sum

we conclude that

P_t, K_t are the same as in deterministic LQR

strangely, the optimal policy is the same as for LQR, and is independent of X, W
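
the two recursions translate directly into code; a sketch (function and variable names are mine) computing P_t, q_t, and the gains K_t:

```python
def lqs_backward(A, B, Q, R, Qf, W, N):
    """Backward recursion: P_N = Qf, q_N = 0, then P_t, q_t, K_t for t = N-1, ..., 0."""
    P = [None] * (N + 1)
    q = np.zeros(N + 1)
    K = [None] * N
    P[N] = Qf
    for t in range(N - 1, -1, -1):
        # G = (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A, so K_t = -G
        G = np.linalg.solve(B.T @ P[t + 1] @ B + R, B.T @ P[t + 1] @ A)
        K[t] = -G
        P[t] = A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ G + Q
        q[t] = q[t + 1] + np.trace(W @ P[t + 1])   # running sum from the noise
    return P, q, K
```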



optimal cost is

J = E V_0(x_0)
  = Tr(X P_0) + q_0
  = Tr(X P_0) + sum_{t=1}^{N} Tr(W P_t)

interpretation:

x_0^T P_0 x_0 is the optimal cost of deterministic LQR, with w_0 = ... = w_{N-1} = 0

Tr(X P_0) is the average optimal LQR cost, with w_0 = ... = w_{N-1} = 0

Tr(W P_t) is the average optimal LQR cost, for E x_t = 0, E x_t x_t^T = W, w_t = ... = w_{N-1} = 0
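
continuing the sketch above (with A, B, Q, R, Qf, W, X, N assumed defined as before), the optimal cost is immediate:

```python
P, q, K = lqs_backward(A, B, Q, R, Qf, W, N)
J_opt = np.trace(X @ P[0]) + q[0]   # J = Tr(X P_0) + q_0
# q_0 is exactly the running sum sum_{t=1}^{N} Tr(W P_t)
assert np.isclose(q[0], sum(np.trace(W @ P[t]) for t in range(1, N + 1)))
```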



Infinite horizon

choose policies to minimize average stage cost

J = lim_{N→∞} (1/N) E sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t )

optimal average stage cost is

J = Tr(W P_ss)

where P_ss satisfies the ARE

P_ss = Q + A^T P_ss A - A^T P_ss B (R + B^T P_ss B)^{-1} B^T P_ss A

optimal average stage cost doesn't depend on X



(an) optimal policy is constant linear state feedback

u_t = K_ss x_t

where

K_ss = -(R + B^T P_ss B)^{-1} B^T P_ss A

K_ss is the steady-state LQR feedback gain

K_ss doesn't depend on X, W
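
a sketch of computing P_ss, K_ss, and the optimal average stage cost, using scipy's discrete-time ARE solver (iterating the Riccati recursion to convergence would also work):

```python
from scipy.linalg import solve_discrete_are

def lqs_steady_state(A, B, Q, R, W):
    """Steady-state solution: P_ss from the ARE, gain K_ss, average cost Tr(W P_ss)."""
    Pss = solve_discrete_are(A, B, Q, R)
    Kss = -np.linalg.solve(R + B.T @ Pss @ B, B.T @ Pss @ A)
    J_avg = np.trace(W @ Pss)
    return Pss, Kss, J_avg
```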



Example

system with n = 5 states, m = 2 inputs, horizon N = 30

A, B chosen randomly; A scaled so max_i |λ_i(A)| = 1

Q = I, Q_f = 10I, R = I

x_0 ~ N(0, X), X = 10I

w_t ~ N(0, W), W = 0.5I
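
the lecture's random instance isn't reproducible, but the setup can be sketched as follows (the seed is arbitrary, not the lecture's data):

```python
rng = np.random.default_rng(0)                 # arbitrary seed
n, m, N = 5, 2, 30
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))      # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
X, W = 10 * np.eye(n), 0.5 * np.eye(n)
```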



Sample trajectories

sample trace of (x_t)_1 and (u_t)_1

[figure: (x_t)_1 (top) and (u_t)_1 (bottom) versus t, for t = 0, ..., 30]

blue: optimal stochastic control, red: no control (u_0 = ... = u_{N-1} = 0)



Cost histogram

cost histogram for 1000 simulations

[figure: histograms of realized cost over 1000 runs, under optimal stochastic control (J), prescient control (J_pre), open-loop control (J_ol), and no control (J_nc)]



Comparisons

we compared optimal stochastic control (J = 224.2) with

prescient control

decide input sequence with full knowledge of future disturbances
u_0, ..., u_{N-1} computed assuming all w_t are known
J_pre = 137.6

open-loop control

u_0, ..., u_{N-1} depend only on x_0
u_0, ..., u_{N-1} computed assuming w_0 = ... = w_{N-1} = 0
J_ol = 423.7

no control

u_0 = ... = u_{N-1} = 0
J_nc = 442.0
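
as a rough illustration, the no-control and open-loop baselines are easy to reproduce with the pieces sketched earlier (the prescient controller, which optimizes against the realized disturbances, is omitted); one sample path:

```python
def rollout_cost(A, B, Q, R, Qf, x0, ws, inputs):
    """Realized cost of the rule u_t = inputs(t, x_t) on one disturbance sample path."""
    x, cost = x0, 0.0
    for t, w in enumerate(ws):
        u = inputs(t, x)
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + w
    return cost + x @ Qf @ x

x0 = rng.multivariate_normal(np.zeros(n), X)
ws = rng.multivariate_normal(np.zeros(n), W, size=N)

J_nc_sample = rollout_cost(A, B, Q, R, Qf, x0, ws, lambda t, x: np.zeros(m))

# open-loop: plan u_0, ..., u_{N-1} from x_0 with the LQR gains, assuming all w_t = 0
plan, xhat = [], x0
for Kt in K:
    plan.append(Kt @ xhat)
    xhat = A @ xhat + B @ plan[-1]
J_ol_sample = rollout_cost(A, B, Q, R, Qf, x0, ws, lambda t, x: plan[t])
```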

