Lecture 5
Linear Quadratic Stochastic Control
Linear stochastic system

x_{t+1} = A x_t + B u_t + w_t, with process noise w_t and state feedback u_t = \phi_t(x_t)

roughly speaking: we choose the input after knowing the current state, but before knowing the disturbance

closed-loop system is x_{t+1} = A x_t + B \phi_t(x_t) + w_t
objective:

    J = E ( \sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t ) + x_N^T Q_f x_N )

with Q, Q_f ≥ 0, R > 0
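The objective above can be estimated by simulating the closed-loop system. A minimal sketch, where the system matrices A, B, the horizon, and the covariances are arbitrary assumptions for illustration, not the lecture's example:

```python
import numpy as np

# Assumed toy problem data (not from the lecture).
rng = np.random.default_rng(0)
n, m, N = 3, 2, 20
A = 0.95 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
W = 0.5 * np.eye(n)   # disturbance covariance, E w_t w_t^T = W
X = np.eye(n)         # initial-state covariance, E x_0 x_0^T = X

def estimate_cost(policy, n_traj=500):
    """Monte Carlo estimate of J for a state-feedback policy u_t = policy(t, x_t)."""
    total = 0.0
    for _ in range(n_traj):
        x = rng.multivariate_normal(np.zeros(n), X)
        for t in range(N):
            u = policy(t, x)
            total += x @ Q @ x + u @ R @ u            # stage cost
            # x_{t+1} = A x_t + B u_t + w_t
            x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
        total += x @ Qf @ x                           # terminal cost
    return total / n_traj

J_nc = estimate_cost(lambda t, x: np.zeros(m))        # "no control" policy u_t = 0
```

Any state-feedback policy can be plugged in as `policy(t, x)`; here the trivial zero policy serves as a baseline.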
    V_t(z) = \min_{\phi_t, ..., \phi_{N-1}} E ( \sum_{\tau=t}^{N-1} ( x_\tau^T Q x_\tau + u_\tau^T R u_\tau ) + x_N^T Q_f x_N )

subject to x_{\tau+1} = A x_\tau + B u_\tau + w_\tau, u_\tau = \phi_\tau(x_\tau), with x_t = z
we have
V_N(z) = z^T Q_f z
J = E V_0(x_0) (expectation over x_0)
    V_t(z) = z^T Q z + \min_v ( v^T R v + E V_{t+1}(A z + B v + w_t) )

expectation is over w_t
we do not know where we will land when we take u_t = v
    \phi_t(x_t) = argmin_v ( v^T R v + E V_{t+1}(A x_t + B v + w_t) )
let's show (via recursion) that the value functions are quadratic, with form

    V_t(z) = z^T P_t z + q_t,  with P_t ≥ 0

the recursion starts from P_N = Q_f, q_N = 0

substituting into the recursion, and using E w_t = 0, E w_t w_t^T = W,

    V_t(z) = z^T Q z + \min_v { v^T R v + (A z + B v)^T P_{t+1} (A z + B v) } + Tr(W P_{t+1}) + q_{t+1}
minimizing over v (the minimizer is v = K_t z, so \phi_t(z) = K_t z), we conclude that V_t has the assumed quadratic form, with

    K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    q_t = q_{t+1} + Tr(W P_{t+1})

P_t, K_t are the same as in deterministic LQR
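The backward recursion is straightforward to implement; a sketch on assumed toy data (the Riccati update is standard LQR, and q_t picks up the extra Tr(W P_{t+1}) noise term):

```python
import numpy as np

# Assumed toy problem data (not from the lecture).
rng = np.random.default_rng(1)
n, m, N = 3, 2, 20
A = 0.95 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
W = 0.5 * np.eye(n)

# Backward recursion: P_N = Qf, q_N = 0, then run t = N-1, ..., 0.
P = [None] * (N + 1)
q = [0.0] * (N + 1)
K = [None] * N
P[N] = Qf
for t in range(N - 1, -1, -1):
    M = R + B.T @ P[t + 1] @ B
    K[t] = -np.linalg.solve(M, B.T @ P[t + 1] @ A)   # same gain as deterministic LQR
    P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]
    q[t] = q[t + 1] + np.trace(W @ P[t + 1])          # extra term from the noise
```

The gains K[t] define the optimal policy u_t = K_t x_t; note that W enters only through q, not through P or K.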
strangely, optimal policy is same as LQR, and independent of X, W
    J = E V_0(x_0) = E ( x_0^T P_0 x_0 + q_0 ) = Tr(X P_0) + q_0 = Tr(X P_0) + \sum_{t=1}^{N} Tr(W P_t)

where X = E x_0 x_0^T
interpretation:
x_0^T P_0 x_0 is optimal cost of deterministic LQR, with w_0 = ... = w_{N-1} = 0
Tr(X P_0) is average optimal LQR cost, with w_0 = ... = w_{N-1} = 0, averaged over x_0
Tr(W P_t) is average optimal LQR cost (starting from step t), for E x_t = 0, E x_t x_t^T = W,
w_t = ... = w_{N-1} = 0
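A quick numerical check, on assumed toy data, that q_0 accumulates exactly the noise terms, so J = Tr(X P_0) + q_0 agrees with the sum formula above:

```python
import numpy as np

# Assumed toy problem data (not from the lecture).
rng = np.random.default_rng(4)
n, m, N = 3, 2, 20
A = 0.9 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
W = 0.5 * np.eye(n)
X = np.eye(n)

# Backward Riccati recursion with the q_t bookkeeping.
P = [None] * (N + 1)
q = [0.0] * (N + 1)
P[N] = Qf
for t in range(N - 1, -1, -1):
    Kt = -np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ Kt
    q[t] = q[t + 1] + np.trace(W @ P[t + 1])

J = np.trace(X @ P[0]) + q[0]                     # optimal cost, Tr(X P_0) + q_0
noise_terms = sum(np.trace(W @ P[t]) for t in range(1, N + 1))
```

Unrolling the q recursion gives q_0 = \sum_{t=1}^{N} Tr(W P_t), which `noise_terms` recomputes directly.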
for the infinite-horizon problem with average stage cost

    J = \lim_{N \to \infty} (1/N) E \sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t )

the optimal average cost is J = Tr(W P_ss), achieved by the constant linear state feedback u_t = K_ss x_t, where

    K_ss = -(R + B^T P_ss B)^{-1} B^T P_ss A

and P_ss is the steady-state solution of the Riccati recursion
K_ss is the steady-state LQR feedback gain; it doesn't depend on X, W
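One way to obtain P_ss is to iterate the Riccati recursion until it converges; a sketch on assumed toy data (not the lecture's example):

```python
import numpy as np

# Assumed toy problem data.
rng = np.random.default_rng(2)
n, m = 3, 2
A = 0.95 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)
W = 0.5 * np.eye(n)

# Iterate the Riccati recursion to a fixed point P_ss.
P = Q.copy()
for _ in range(10000):
    Pn = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(Pn - P)) < 1e-10:
        P = Pn
        break
    P = Pn
Pss = P

Kss = -np.linalg.solve(R + B.T @ Pss @ B, B.T @ Pss @ A)  # steady-state gain
J_avg = np.trace(W @ Pss)                                 # optimal average stage cost
```

With (A, B) stabilizable and Q > 0, the iteration converges and the closed-loop matrix A + B K_ss is stable.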
example: Q = I, Q_f = 10 I, R = I
w_t ~ N(0, W), W = 0.5 I
[figure: sample trace of (x_t)_1 and (u_t)_1 versus t, for 0 ≤ t ≤ 30]
[figure: histograms of the cost J (over 0–700) under the prescient (J^pre), open-loop (J^ol), and no-control (J^nc) strategies]
prescient control
decide input sequence with full knowledge of future disturbances
u_0, ..., u_{N-1} computed assuming all w_t are known
J^pre = 137.6

open-loop control
u_0, ..., u_{N-1} depend only on x_0
u_0, ..., u_{N-1} computed assuming w_0 = ... = w_{N-1} = 0
J^ol = 423.7

no control
u_0 = ... = u_{N-1} = 0
J^nc = 442.0
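The closed-loop / open-loop / no-control comparison can be reproduced in spirit on an assumed toy system (the numbers will differ from the lecture's; prescient control is omitted, since it requires re-solving for the inputs with the disturbances known). The open-loop plan is implemented as the LQR gains applied to a noise-free model state, which is equivalent to the optimal input sequence computed assuming w_t = 0:

```python
import numpy as np

# Assumed, mildly unstable toy system (not the lecture's example).
rng = np.random.default_rng(3)
n, m, N = 3, 2, 20
A = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.3],
              [0.1, 0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
W = 0.5 * np.eye(n)
X = np.eye(n)

# Backward Riccati recursion gives the time-varying LQR gains K_t.
P = Qf
K = [None] * N
for t in range(N - 1, -1, -1):
    K[t] = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K[t]

def avg_cost(kind, n_traj=500):
    """Monte Carlo average cost under one of three strategies."""
    total = 0.0
    for _ in range(n_traj):
        x = rng.multivariate_normal(np.zeros(n), X)
        xhat = x.copy()                  # noise-free model state, for open loop
        for t in range(N):
            if kind == "closed-loop":
                u = K[t] @ x             # feedback on the true state
            elif kind == "open-loop":
                u = K[t] @ xhat          # plan from x_0 assuming w_t = 0
            else:
                u = np.zeros(m)          # no control
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
            xhat = A @ xhat + B @ u
        total += x @ Qf @ x
    return total / n_traj

J_cl = avg_cost("closed-loop")
J_ol = avg_cost("open-loop")
J_nc = avg_cost("no control")
```

Closed-loop control reacts to the realized disturbances, so its average cost comes out well below the open-loop and no-control strategies, mirroring the ordering in the lecture's example.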