ORF 544 Week 9 CFAs and Intro To MDP
Warren Powell
Princeton University
http://www.castlelab.princeton.edu
Parametrically modified constraints:

» We tune theta by optimizing:

    min_{f in F, theta} E sum_{t=0}^T C(S_t, X_t(theta))

subject to

    A x <= b(theta)

» The search over theta can be:
• Derivative-free
• Derivative-based
Logistics
» Need to accommodate
different sources of
uncertainty.
• Market behavior
• Transit times
• Supplier uncertainty
• Product quality
© 2018 Warren B. Powell
Cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_tp be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve

    X_t(S_t) = argmax_x sum_{p in P} c_p x_tp

subject to

    sum_{p in P} x_tp = D_t
    x_tp <= u_p
    x_tp >= 0
» This assumes our demand forecast D_t is accurate. Instead, we can solve the problem

subject to

    sum_{p in P} x_tp = theta^Reserve D_t      (reserve)
    x_tp <= u_p
    x_tp >= theta^buffer                       (buffer stock)
» This is a “parametric cost function approximation”
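The modified purchase problem is a small linear program. Below is a minimal sketch in Python with hypothetical costs and capacities; because the model has a single demand constraint plus box bounds, the cost-minimizing solution can be found greedily by filling suppliers in order of increasing unit cost. (The slide writes the objective as a maximization; the sketch reads it as minimizing purchase cost.)

```python
# A minimal sketch of the parametric CFA purchase policy (hypothetical data).
# With one demand constraint plus box bounds, the cost-minimizing solution
# fills suppliers in order of increasing unit cost, after reserving the
# buffer floor theta_buffer at every supplier.

def purchase_policy(D_t, costs, capacities, theta_reserve=1.2, theta_buffer=2.0):
    """Return x[p]: purchases meeting theta_reserve * D_t total demand."""
    target = theta_reserve * D_t
    x = [theta_buffer] * len(costs)        # buffer stock at each supplier
    remaining = target - sum(x)
    for p in sorted(range(len(costs)), key=lambda p: costs[p]):
        add = min(capacities[p] - x[p], max(remaining, 0.0))
        x[p] += add
        remaining -= add
    return x

x = purchase_policy(D_t=60.0, costs=[3.0, 4.0, 5.0], capacities=[50.0, 40.0, 30.0])
# x = [50.0, 20.0, 2.0]; total purchased = 72.0 = theta_reserve * 60
```

The tunable parameters (theta_reserve, theta_buffer) change the constraints, not the cost function itself, which is exactly what makes this a cost function *approximation*.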
The assignment of drivers to loads evolves over time, with new loads
being called in, along with updates to the status of a driver.
Cost function approximations
A purely myopic policy would solve this problem using the parameterized cost function C(S_t, x_t | theta).
SMART-Invest
One-dimensional parameter search
» Wind & solar
» Battery storage (a 750 GWhr battery!)
» Fossil backup (20 GW)
[Figure: hourly dispatch of wind, solar, and slow fossil units, simulated in rolling 24-36 hour windows across a full year (hours 1 through 8,760).]
» Reserve constraint: contains a tunable policy parameter theta.
» Other constraints:
• Ramping
• Capacity constraints
• Conservation of flow in storage
• ….
Given a system model (transition function)

    S_{t+1} = S^M(S_t, x_t, W_{t+1})

and the exogenous information process

    (S_0, x_0, W_1, S_1, ..., x_{t-1}, W_t, S_t),

this involves running a simulator.

» Gradients can be approximated with finite differences:

    dF(theta)/dtheta ≈ (F(theta + delta, omega) - F(theta - delta, omega)) / (2 delta)
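A sketch of what "running the simulator" means, using a hypothetical inventory model (not the model from the slides): roll the transition function forward under the policy, accumulate costs, and average over sample paths.

```python
# Estimate F(theta, omega) by simulation: roll the transition function
# S_{t+1} = S^M(S_t, x_t, W_{t+1}) forward and accumulate costs.
# The inventory model below is hypothetical, purely for illustration.
import random

def simulate_F(theta, T=50, seed=0):
    rng = random.Random(seed)
    S = 0.0                                  # state: inventory on hand
    total = 0.0
    for t in range(T):
        x = max(theta * 10.0 - S, 0.0)       # policy X_t(S_t | theta): order up to theta*10
        W = rng.uniform(0.0, 20.0)           # exogenous information W_{t+1}
        total += 2.0 * x + 5.0 * max(W - (S + x), 0.0)   # order cost + shortage penalty
        S = max(S + x - W, 0.0)              # transition function S^M
    return total

# Averaging over sample paths (seeds) estimates E F(theta):
F_bar = sum(simulate_F(0.8, seed=n) for n in range(20)) / 20.0
```

Each call to simulate_F is one sample F(theta, omega); derivative-free and derivative-based searches over theta both sit on top of this loop.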
The search algorithm
Results
» The stochastic search algorithm reliably finds the optimum
from different starting points.
[Figure: convergence of the search from different starting points.]
The search algorithm
Empirical performance of algorithm
» Starting the algorithm from different starting points appears to reliably find the optimum (verified against a full grid search).
» Algorithm tended to require < 15 iterations:
• Each iteration required 4-5 simulations to compute the
complete gradient
• Required 1-8 evaluations to find the best stepsize
• Worst case number of function evaluations is 15 x (8+5) =
195.
• Budischak paper required “28 billion” for full enumeration
(used super computer)
» Run times
• Aggregated supply stack: ~100 minutes
• Full supply stack: ~100 hours
» A high carbon tax shifts generation from slow to fast fossil, and requires a minimal reserve margin (~1 pct).

[Figure: percent of energy from renewables (total renewables, wind, solar) as a function of the $/MWhr cost of fossil fuels.]
[Figure: rolling forecasts, updated each hour, compared with the forecast made at midnight and the actual.]
Energy storage optimization
Benchmark policy – Deterministic lookahead
[Figure: the real process unfolding over t, t+1, t+2, t+3.]
Energy storage optimization
Lookahead policies peek into the future
» Optimize over deterministic lookahead model
The lookahead model
[Figure: the deterministic lookahead model at time t, advanced against the real process over t, t+1, t+2, t+3.]
Energy storage optimization
Benchmark policy – Deterministic lookahead
with:

» Lookup table modified forecasts (one adjustment term for each time t' > t in the future):

    x^wr_{tt'} + x^wd_{tt'} <= theta_{t'-t} f^E_{tt'}

    F(theta, omega) = sum_{t=0}^T C(S_t(omega), X_t(S_t(omega) | theta))

» The challenge is to optimize the parameters:

    min_theta E sum_{t=0}^T C(S_t, X_t(S_t | theta))
Energy storage optimization
One-dimensional contour plots – perfect forecast
» theta_i for i = 1, ..., 8 hours into the future.

[Figure: one-dimensional sweeps of the objective over theta_i, for theta_i ranging 0.0-2.4.]
Energy storage optimization

[Figures: two-dimensional contour plots of the objective over pairs of parameters (theta_1, theta_10), (theta_1, theta_2), (theta_2, theta_3), (theta_3, theta_5), each axis spanning 0.0-2.0.]
Energy storage optimization
Simultaneous perturbation stochastic approximation
» Let:
• theta be a p-dimensional vector.
• delta be a scalar perturbation.
• Z be a p-dimensional vector, with each element drawn from a normal (0,1) distribution.
» We can obtain a sampled estimate of the gradient using two function evaluations, F(x^n + delta^n Z^n) and F(x^n - delta^n Z^n):
    grad_x F(x^n, W^{n+1}) =
        [ (F(x^n + delta^n Z^n) - F(x^n - delta^n Z^n)) / (2 delta^n Z^n_1) ]
        [ (F(x^n + delta^n Z^n) - F(x^n - delta^n Z^n)) / (2 delta^n Z^n_2) ]
        [                            ...                                    ]
        [ (F(x^n + delta^n Z^n) - F(x^n - delta^n Z^n)) / (2 delta^n Z^n_p) ]
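The SPSA estimator above can be sketched in a few lines. The quadratic test function below is hypothetical, and the gradient components are clipped in the demo loop as a practical guard (small Z_i appear in the denominator); neither detail comes from the slides.

```python
# A sketch of the SPSA gradient estimate: two evaluations of F at
# x +/- delta*Z give a sampled gradient in all p dimensions at once.
import random

def spsa_gradient(F, x, delta, rng):
    """Sampled gradient of F at x from two function evaluations."""
    Z = [rng.gauss(0.0, 1.0) for _ in range(len(x))]   # perturbation directions
    F_plus = F([xi + delta * zi for xi, zi in zip(x, Z)])
    F_minus = F([xi - delta * zi for xi, zi in zip(x, Z)])
    return [(F_plus - F_minus) / (2.0 * delta * zi) for zi in Z]

rng = random.Random(1)
F = lambda x: sum((xi - 1.0) ** 2 for xi in x)         # minimizer at (1, ..., 1)
x = [0.0, 2.0, 0.5]
for n in range(500):
    g = spsa_gradient(F, x, delta=0.1, rng=rng)
    g = [max(-10.0, min(10.0, gi)) for gi in g]        # guard against tiny Z_i denominators
    x = [xi - (0.5 / (n + 10)) * gi for xi, gi in zip(x, g)]   # declining stepsize
# x drifts toward the minimizer (1, 1, 1)
```

The key point is the cost: one gradient estimate costs two simulations regardless of the dimension p, versus 2p simulations for coordinate-wise finite differences.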
Energy storage optimization
Notes:
» The next three slides represent applications of the
SPSA algorithm with different stepsize rules.
» These illustrate the importance of tuning stepsizes (or
more generally, stepsize policies).
» In fact, it is necessary to tune the stepsize depending on
the starting point (or at least the starting region).
» These were done with RMSProp.
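For reference, a generic RMSProp update is sketched below, with common default coefficients; these are not necessarily the settings used in the lecture's runs.

```python
# A generic RMSProp stepsize rule: scale each gradient component by a
# running root-mean-square of past gradients. Coefficients are common
# defaults, assumed for illustration.
def rmsprop_step(x, g, v, eta=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update; v carries the running mean of squared gradients."""
    v = [beta * vi + (1.0 - beta) * gi * gi for vi, gi in zip(v, g)]
    x = [xi - eta * gi / (vi ** 0.5 + eps) for xi, gi, vi in zip(x, g, v)]
    return x, v

x, v = [5.0], [0.0]
for _ in range(1000):
    g = [2.0 * x[0]]              # gradient of f(x) = x^2
    x, v = rmsprop_step(x, g, v)
# x[0] ends near the minimizer at 0
```

Because the effective step is roughly eta once v tracks g^2, the choice of eta still has to be tuned, which is the point of the surrounding notes.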
Energy storage optimization
Effect of starting points
» Using a single starting point is fairly reliable.
[Figure: percent improvement over the benchmark lookahead, by starting point.]
Energy storage optimization
Tuning the parameters
[Figures: tuned performance for theta_i = 1 and for theta_i restricted to [0,1], [0.5,1.5], and [1,2].]
Energy storage optimization
Notes:
» The behavior of stepsize policies is extremely random.
» While there are rate-of-convergence results, the actual convergence is very noisy.
Research directions:
» Performing the one-dimensional search using a "noisy Fibonacci search."
» Using a homotopy method:
• Exploits the fact that the optimal theta is known when the noise in the forecast error is zero.
• Parameterize the noise by rho and vary rho from 0 to 1.
• Try to find theta*(rho) starting from theta*(0).
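The homotopy idea can be sketched as a warm-started sweep over noise levels. The minimize_from function below is a stand-in for any local search (SPSA, for instance), and the toy "optimum" is hypothetical.

```python
# A sketch of the homotopy idea: scale the forecast noise by rho, and
# warm-start the search at each noise level from the previous optimum.
def homotopy_search(minimize_from, theta0, levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    theta = theta0
    for rho in levels:                  # gradually turn the noise on
        theta = minimize_from(theta, rho)   # local search warm-started at theta
    return theta

# Toy stand-in: the "optimum" drifts smoothly away from theta = 1 as rho grows.
toy_min = lambda theta, rho: 0.5 * theta + 0.5 * (1.0 + 0.4 * rho)
theta_star = homotopy_search(toy_min, theta0=1.0)
```

Each local search starts close to its solution, which is the whole advantage over attacking the fully noisy problem cold.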
[Figure: commitment decisions on a rolling timeline. Turbines 1-3 are notified at 1:00 pm of commitments for 2:00-3:00 pm; between notification times (1:05, 1:10, ..., 1:30) there is ramping, but no on/off decisions. Decisions occur at day-ahead, intermediate, and real-time horizons.]
29 offshore sub-blocks in 5
build-out scenarios:
» 1: 8 GW
» 2: 28 GW
» 3: 40 GW
» 4: 55 GW
» 5: 78 GW
» The deterministic lookahead policy solves:

    X_t(S_t) = min_{(x_{tt'}, y_{tt'})_{t'=1,...,24}} sum_{t'=t}^{t+H} C(x_{tt'}, y_{tt'})
» The parameterized policy adds tunable reserve constraints:

    X_t(S_t | theta) = min_{(x_{tt'}, y_{tt'})_{t'=1,...,24}} sum_{t'=t}^{t+H} C(x_{tt'}, y_{tt'})

subject to

    x^max_{t,t'} - x_{t,t'} >= theta^up L_{tt'}        (up-ramping reserve)
    x_{t,t'} - x^min_{t,t'} >= theta^down L_{tt'}      (down-ramping reserve)
» It is easy to tune this policy when there are only two parameters, theta^up and theta^down.
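With only two parameters, tuning can be as simple as a brute-force grid search over simulated costs. The sketch below assumes a simulate_cost function mapping (theta_up, theta_down) to an average simulated cost; the quadratic cost surface is a hypothetical stand-in.

```python
# Tuning two reserve parameters by brute-force grid search. simulate_cost
# is any function returning the average simulated cost of the policy.
import itertools

def tune_reserves(simulate_cost, grid):
    """Return the (theta_up, theta_down) pair with the lowest simulated cost."""
    return min(itertools.product(grid, grid),
               key=lambda th: simulate_cost(*th))

# Hypothetical cost surface with its minimum at (0.3, 0.6):
cost = lambda up, down: (up - 0.3) ** 2 + (down - 0.6) ** 2
grid = [i / 10 for i in range(11)]           # 0.0, 0.1, ..., 1.0
print(tune_reserves(cost, grid))             # -> (0.3, 0.6)
```

An 11 x 11 grid costs 121 simulations; this is why two tunable parameters are "easy" while the hour-indexed theta vector of the energy storage example needs stochastic search.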
Stochastic unit commitment
Properties of a parametric cost function
approximation
» It allows us to use domain expertise to control the
structure of the solution
» We can force the model to schedule extra reserve:
• At all points in time (or just during the day)
• Allocated across regions
» This is all accomplished without using a multiscenario
model.
» The next slides illustrate the solution before and after the reserve policy is imposed. Pay attention to the change in the pattern of gas turbines (in maroon).
[Figure: hour-ahead wind forecast vs. actual.]
Modeling
    (S_0, x_0 = X(S_0), W_1, S_1, x_1 = X(S_1), W_2, S_2, ...)

» Objective function

    max_π E { sum_{t=0}^T C(S_t, X_t^π(S_t)) | S_0 }

» An optimal policy finds the best decision given the optimal value of the state that the decision takes us to:

    X_t*(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { max_π E [ sum_{t'=t+1}^T C(S_t', X_t'^π(S_t')) | S_{t+1} ] | S_t, x_t } )

» We then replace the downstream value of starting in state S_{t+1} with a value function:

    X_t*(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )

» where

    V_t(S_t) = max_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )
[Figure: transition from pre-decision state S_t to post-decision state S_t^x to S_{t+1}, illustrated for a price P = $50.]
Discrete Markov decision processes
Notes:
» “Dynamic programming” (or more precisely, discrete
Markov decision processes) exploits the property that
there is (or may be) a compact set of states that all
paths return to.
» Capturing this avoids the explosion that happens with
decision trees.
End Step 4;
End Step 3;
Find V_t*(S_t) = max_{a_t} Q(S_t, a_t)
Store X_t*(S_t) = argmax_{a_t} Q(S_t, a_t). (This is our policy)
End Step 2;
End Step 1;
    X_t*(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { max_π E [ sum_{t'=t+1}^T C(S_t', X_t'^π(S_t')) | S_{t+1} ] | S_t, x_t } )

    X_t*(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )
» State variables: E_t^wind (wind energy), D_t (demand), P_t^grid (grid price), R_t^battery (battery level).

» Transitions:

    E_{t+1}^wind = E_t^wind + Ê_{t+1}^wind
    R_{t+1}^battery = R_t^battery + A x_t        (x_t: controllable inputs)

» Bellman equation:

    V(S_t) = min_{x_t in X} ( C(S_t, x_t) + E { V(S_{t+1}(S_t, x_t, W_{t+1})) | S_t } )
End Step 4;
End Step 3;
Find V_t*(R_t) = max_{x_t} Q(R_t, x_t)
Store X_t*(R_t) = argmax_{x_t} Q(R_t, x_t). (This is our policy)
End Step 2;
End Step 1;
Curse of dimensionality
Dynamic programming in multiple dimensions
Step 0: Initialize V_{T+1}(S_{T+1}) = 0 for all states.
Step 1: Step backward t = T, T-1, T-2, ...
Step 2: Loop over S_t = (R_t, D_t, p_t, E_t) (four loops)
Step 3: Loop over all decisions x_t (all dimensions)
Step 4: Take the expectation over each random dimension D̂_t, p̂_t, Ê_t:

    Q(S_t, x_t) = C(S_t, x_t) + sum_{w1=0}^{100} sum_{w2=0}^{100} sum_{w3=0}^{100} V_{t+1}( S^M(S_t, x_t, W_{t+1}(w1, w2, w3)) ) P^W(w1, w2, w3)

End Step 4;
End Step 3;
Find V_t*(S_t) = max_{x_t} Q(S_t, x_t)
Store X_t*(S_t) = argmax_{x_t} Q(S_t, x_t). (This is our policy)
End Step 2;
End Step 1;
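The four nested loops above can be sketched directly. The toy problem below is hypothetical and far smaller than the lecture's four-dimensional state (one storage dimension, three decisions, a binary demand outcome), and it is cast as cost minimization.

```python
# A minimal backward dynamic program following the steps above, on a toy
# one-dimensional storage problem (hypothetical). Everything is discretized.
T = 5
states = range(6)                 # R_t: units of energy in storage, 0..5
decisions = [-1, 0, 1]            # discharge / hold / charge
outcomes = [(0, 0.5), (1, 0.5)]   # random demand W_{t+1} and its probability

def cost(R, x, W):
    # hypothetical: pay 2 per unit charged, 5 per unit of unmet demand
    return 2.0 * max(x, 0) + 5.0 * max(W - max(-x, 0), 0)

V = {T + 1: {R: 0.0 for R in states}}          # Step 0: V_{T+1} = 0
policy = {}
for t in range(T, -1, -1):                     # Step 1: backward over time
    V[t] = {}
    for R in states:                           # Step 2: loop over states
        best_x, best_q = None, float("inf")
        for x in decisions:                    # Step 3: loop over decisions
            if R + x not in states:
                continue                       # infeasible transition
            q = sum(p * (cost(R, x, W) + V[t + 1][R + x])   # Step 4: expectation
                    for W, p in outcomes)
            if q < best_q:
                best_x, best_q = x, q
        V[t][R] = best_q                       # Find V_t(R_t)
        policy[(t, R)] = best_x                # Store X_t(R_t): our policy
```

The structure is exact: every state, decision, and outcome is enumerated, which is precisely what stops scaling in the curses of dimensionality discussed next.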
Curse of dimensionality
Notes:
» There are potentially three “curses of dimensionality”
when using backward dynamic programming:
• The state variable – We need to enumerate all states. If the
state variable is a vector with more than two dimensions, the
state space gets very big, very quickly.
• The random information – We have to sum over all possible
realizations of the random variable. If this has more than two
dimensions, this gets very big, very quickly.
• The decisions – Again, decisions may be vectors (our energy
example has five dimensions). Same problem as the other
two.
» Some problems fit this framework, but not very many. However, when we can use this framework, we obtain something quite rare: an optimal policy.
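The three curses compound multiplicatively. A back-of-the-envelope count (with an illustrative 100 grid points per dimension, and the dimension counts mentioned above) shows why:

```python
# The three curses, made concrete: with 100 grid points per dimension
# (illustrative), a four-dimensional state, a five-dimensional decision,
# and a three-dimensional information variable, one backward pass touches
# an astronomical number of (state, decision, outcome) triples.
n = 100                      # grid points per dimension (illustrative)
n_states = n ** 4            # the loop of Step 2
n_decisions = n ** 5         # the loop of Step 3
n_outcomes = n ** 3          # the sum of Step 4
total_per_period = n_states * n_decisions * n_outcomes
print(f"{total_per_period:.1e} Q-evaluations per time period")  # -> 1.0e+24
```

Any one loop alone is manageable; it is their product that forces the approximation strategies listed next.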
Curse of dimensionality
Strategies for computing/approximating value
functions:
» Backward dynamic programming
• Exact using lookup tables
• Backward approximate dynamic programming:
– Linear regression
– Low rank approximations
» Forward approximate dynamic programming
• To be covered next week