
DYNAMIC PROGRAMMING
Lecture 3
Prof. Preetam Basu
IIM Calcutta

Stochastic Dynamic Programming

The state at the next stage is not completely determined by the state and the decision at the current stage.

Instead, there is a probability distribution for what the next state will be.

The probability distribution of the next state is completely determined by the state and the decision at the current stage.

Formulation: Stochastic Dynamic Programming

Stochastic dynamic programming problems can be solved using recursions of the following form (for max problems):

f_t(i) = max_a { (expected reward during stage t | i, a) + Σ_j p(j | i, a) f_{t+1}(j) }

f_T(i) = boundary condition

Here p(j | i, a) is the probability that the next state is j, given that the current state is i and action a is chosen.
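The backward recursion above can be sketched directly in code. This is a minimal, generic sketch, not part of the lecture: the callables `reward(t, i, a)`, `trans(i, a)` (returning a dict mapping next state j to p(j | i, a)), and `boundary(i)` are hypothetical placeholders for whatever a specific model supplies.

```python
def solve(T, states, actions, reward, trans, boundary):
    """Finite-horizon stochastic DP (max problem) via backward recursion."""
    f = {i: boundary(i) for i in states}        # f_T(i) = boundary condition
    policy = {}
    for t in range(T - 1, -1, -1):              # work backward from T-1 down to 0
        f_next, f = f, {}
        for i in states:
            best, best_a = float("-inf"), None
            for a in actions(i):
                # expected stage reward plus expected reward-to-go
                val = reward(t, i, a) + sum(
                    p * f_next[j] for j, p in trans(i, a).items()
                )
                if val > best:
                    best, best_a = val, a
            f[i] = best
            policy[(t, i)] = best_a
    return f, policy
```

With concrete `reward`, `trans`, and `boundary` functions plugged in, `solve` returns both the value function f_0 and the optimal action for every (t, i) pair.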

Stochastic Dynamic Programming Example

When Sally arrives at the bank, 30 minutes remain in her lunch break.

If Sally makes it to the head of the line and enters service before the end of her lunch break, she earns a reward r. The assumption here is that Sally earns the reward r as soon as her transaction starts. This version is represented as Model 1.

There is another way of interpreting the reward r: if we assume that Sally earns the reward only when her transaction is completed, the model is slightly different. This version is given as Model 2.

However, Sally does not enjoy waiting in lines, so to reflect her dislike for waiting, she incurs a cost of c for each minute she waits.

During any minute in which n people are ahead of Sally, there is a probability p(x|n) that x people will complete their transactions.

Suppose that when Sally arrives, 20 people are ahead of her in line.

Use dynamic programming to determine a strategy for Sally that maximizes her expected net reward (reward minus waiting costs).

Solution: Model 1

When Sally arrives at the bank, she must decide whether to join the line or give up and leave.

At any later time, she may also decide to leave if it is unlikely that she will be served by the end of her lunch break.

We can work backward to solve the problem.

We define f_t(n) to be the maximum expected net reward that Sally can receive from time t to the end of her lunch break if, at time t, n people are ahead of her.

Solution Contd: Model 1

We let t = 0 be the present and t = 30 be the end of the problem. The boundary conditions are:

f_30(n) = 0 for n > 0, and f_t(0) = r for any t.

Since t = 29 is the beginning of the last minute of the problem, we write

f_29(n) = max { 0                                              (Leave)
              { r p(n|n) - c + Σ_{k<n} p(k|n) f_30(n - k)      (Stay)

Solution Contd

For t < 29, we write

f_t(n) = max { 0                                               (Leave)
             { r p(n|n) - c + Σ_{k<n} p(k|n) f_{t+1}(n - k)    (Stay)

Solution Contd

The last recursion follows because, if Sally stays, she will earn an expected reward (as in the t = 29 case) of r p(n|n) - c during the current minute, and with probability p(k|n) there will be n - k people ahead of her; in this case, her expected net reward from time t+1 to time 30 will be f_{t+1}(n - k).

If Sally stays, her overall expected reward received from times t+1, t+2, ..., 30 will be Σ_{k<n} p(k|n) f_{t+1}(n - k).

Solution Contd

To determine Sally's optimal waiting policy, we work backward until f_0(20) is computed.

Problems in which the decision maker can terminate the problem by choosing a particular action are known as stopping-rule problems.
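The backward pass for Model 1 can be sketched numerically. The service distribution below is an assumption made only for illustration: in any minute, exactly one of the n >= 1 people ahead finishes with probability q (so p(1|n) = q, p(0|n) = 1 - q), and the values of r, c, and q are placeholders, not from the lecture.

```python
# Illustrative parameters (assumptions): reward r, per-minute cost c,
# per-minute completion probability q, horizon T = 30 minutes.
R, C, Q, T = 10.0, 0.1, 0.8, 30

def p(x, n):
    """p(x | n): assumed probability that x people finish this minute."""
    if n == 0:
        return 1.0 if x == 0 else 0.0
    return {1: Q, 0: 1 - Q}.get(x, 0.0)

# Boundary conditions: f_30(n) = 0 for n > 0, and f_t(0) = r for any t.
f = {n: 0.0 for n in range(21)}
f[0] = R
for t in range(T - 1, -1, -1):
    g = {0: R}
    for n in range(1, 21):
        # Stay-value: r p(n|n) - c + sum over k < n of p(k|n) f_{t+1}(n-k)
        stay = R * p(n, n) - C + sum(p(k, n) * f[n - k] for k in range(n))
        g[n] = max(0.0, stay)   # 0 = Leave, stay = keep waiting
    f = g

print(round(f[20], 4))  # f_0(20): Sally's maximum expected net reward
```

Wherever the Stay-value drops to 0 in some g[n] at some t, leaving is optimal; the computed f_0(20) is the answer to the original question under these assumed probabilities.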

Model 2

Here the assumption is that Sally earns the reward r only when her transaction is complete.

As in Model 1, we let t = 0 be the present and t = 30 be the end of the problem. The boundary condition is f_30(n) = 0 for n >= 0.

For any t < 30, we write

f_t(n) = max { 0                                                    (Leave)
             { r p(n+1|n) - c + Σ_{k<=n} p(k|n) f_{t+1}(n - k)      (Stay)

The term p(n+1|n) is the probability that the n people ahead of Sally and Sally herself all complete their transactions during the minute.
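The only change from Model 1 is the Stay-value: the reward probability becomes p(n+1|n), and the sum runs over k <= n rather than k < n. A minimal side-by-side sketch (the arguments `p` and `f_next` are placeholders, not the lecture's notation):

```python
def stay_value_model1(n, r, c, p, f_next):
    # Model 1: reward r when service *starts*, i.e. all n ahead finish.
    return r * p(n, n) - c + sum(p(k, n) * f_next[n - k] for k in range(n))

def stay_value_model2(n, r, c, p, f_next):
    # Model 2: reward r only when Sally's own transaction *completes*,
    # so n+1 completions are needed; the sum now includes k = n.
    return r * p(n + 1, n) - c + sum(p(k, n) * f_next[n - k] for k in range(n + 1))
```

Plugging either function into the same backward loop used for Model 1 solves the corresponding model.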

Example 1: Inventory Management

ABC Fashion Stores is a leading retailer of men's shirts. ABC orders shirts from XYZ Manufacturers on a monthly basis. In the current month, ABC has 100 shirts in stock. The management at ABC needs to determine an optimal ordering policy for the next 12 months. Each month they cannot order more than M shirts. Demand for the shirts is uncertain and follows a probability distribution p(D_t). The purchasing cost of each shirt is c, the holding cost for each unsold shirt is h per month, and the salvage value for each unsold shirt at the beginning of the 13th month is s. The selling price of each shirt is k.

The orders made by ABC have a lead time of one month, i.e., whatever is ordered in the current month will be available to ABC in the next month.

Formulate the above as a probabilistic dynamic program that maximizes profit for ABC.
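One possible formulation can be sketched as follows; this is an assumption, not the lecture's official answer. Take the state to be the shirts on hand at the start of month t (after last month's order arrives); the decision q in {0, ..., M} arrives next month, and f_13(i) = s·i is the salvage boundary. The demand distribution, M, and the cost values below are illustrative placeholders, and the state space is kept tiny (the real problem starts from 100 shirts).

```python
# Illustrative parameters (assumptions): M = max order, K = selling price k,
# C = purchase cost c, H = holding cost h, S = salvage value s.
M, K, C, H, S = 2, 5.0, 2.0, 0.5, 1.0
demand = {0: 0.3, 1: 0.5, 2: 0.2}           # assumed p(D_t)
MAX_INV = 6                                  # small state space for the sketch

f = {i: S * i for i in range(MAX_INV + 1)}   # month 13: salvage unsold shirts
policy = {}
for t in range(12, 0, -1):                   # months 12 down to 1
    g = {}
    for i in range(MAX_INV + 1):
        best, best_q = float("-inf"), None
        for q in range(M + 1):               # order q, arrives next month
            val = -C * q
            for d, prob in demand.items():
                sold = min(d, i)             # sales limited by stock on hand
                left = i - sold              # unsold shirts incur holding cost
                nxt = min(left + q, MAX_INV)
                val += prob * (K * sold - H * left + f[nxt])
            if val > best:
                best, best_q = val, q
        g[i] = best
        policy[(t, i)] = best_q
    f = g
```

Reading off policy[(1, i)] for the starting stock level gives the first month's optimal order under these assumed numbers.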

Example 2: Capacity Expansion

The management at Khosla Constructions is strategically planning to add production facilities to their current operations over the next 10 years. They want to decide how many new facilities to add each year. Each year they cannot add more than M facilities. Presently they have 15 facilities. Each facility produces n units of a special type of generator. The cost of adding a new facility is Rs. U. The generators produced by Khosla earn Rs. B per unit. The demand for the generators follows a probability distribution given by p(D_t). At the end of the 10th year, production facilities are salvaged for Rs. S per facility. Develop a dynamic programming model that maximizes the profit for Khosla Constructions.

Assume that each year Khosla produces at full capacity, the cost of running each facility is k, and the capacity added at time t only comes into operation at t+1.
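A sketch formulation, with the assumptions flagged: the state x is the number of facilities in operation in year t; adding a <= M facilities costs U each and comes online at t+1; yearly sales are taken as min(n·x, D_t) at Rs. B per unit (an interpretation of "produces to full capacity"), minus the running cost per facility; after year 10 each facility salvages for Rs. S. All numeric values are illustrative placeholders.

```python
# Illustrative parameters (assumptions): N = units per facility n,
# RUN_COST = running cost k, SALV = salvage value S.
M, N, U, B, RUN_COST, SALV = 2, 10, 100.0, 3.0, 5.0, 20.0
demand = {100: 0.4, 200: 0.4, 300: 0.2}      # assumed p(D_t)
MAX_FAC = 15 + 10 * M                         # 15 now, at most M added per year

f = {x: SALV * x for x in range(MAX_FAC + 1)}  # end of year 10: salvage
for t in range(10, 0, -1):                     # years 10 down to 1
    g = {}
    for x in range(MAX_FAC + 1):
        best = float("-inf")
        for a in range(min(M, MAX_FAC - x) + 1):   # facilities added this year
            # expected revenue from this year's production, capped by demand
            exp_rev = sum(prob * B * min(N * x, d) for d, prob in demand.items())
            val = exp_rev - RUN_COST * x - U * a + f[x + a]
            best = max(best, val)
        g[x] = best
    f = g
```

After the loop, f[15] is the maximum expected 10-year profit from the current 15 facilities under these assumed numbers; tracking the maximizing a at each (t, x), as in the inventory sketch, recovers the expansion plan.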

Example 3
