Chapter 3 Dynamic Programming

Operations Research
Chapter 3: Dynamic Programming
Sonia REBAI
Tunis Business School
University of Tunis
Introduction
- Dynamic programming (DP) is a recursive optimization approach for making
  interdependent, sequential decisions.
- A recursive optimization approach optimizes over a number of steps so that
  each step provides the next one with the information it needs.
- Unlike LP, DP has no standard mathematical formulation that leads directly
  to a solution.
- DP is a solution method that must be adapted to each specific problem.

Introduction - continued
- Like "divide & conquer", DP combines solved subproblems to solve the
  global problem.
- Unlike "divide & conquer", the subproblems of DP are not independent.
- DP reduces the computational effort by:
  - solving the subproblems in "bottom-up" fashion;
  - storing in memory the solutions to the subproblems already encountered;
  - reusing the stored solutions whenever the related subproblems are
    involved.
Foundations of DP
The basic features that characterize DP problems are:
- The problem can be divided into subproblems, also called stages.
- Each stage has a number of states.
- A policy decision is identified recursively at each stage.
- The policy decision at each stage transforms the current state into a
  state associated with the next stage.

Foundations of DP - continued
- Given the current state, an optimal policy for the remaining stages is
  independent of the policy decisions adopted in previous stages.
- Two calculation approaches may be adopted:
  - Backward method: start from the last stage and work back to the first.
  - Forward method: start from the first stage and work forward to the last.
Consider an n-step sequential decision problem.
- We decompose the problem into n steps, each corresponding to a particular
  decision.
- Each step feeds the next one: the output of one step serves as the input
  to the next step.
[Figure: a step receives an input and a decision, and produces an output and a result.]
Backward approach
- Xi: the decision at step i.
- Si: the input, or state of the system, at step i.
- Ri(Si,Xi): the immediate result of decision Xi given that the state is Si.
- Fi(Si,Xi): the cumulative result of steps i to n given that, at step i,
  decision Xi is chosen and the state is Si.

In the backward approach, calculations are made from step n down to step 1.

  Step:        1            ...   n-1                n
  Input:       S1           ...   Sn-1               Sn
  Decision:    X1           ...   Xn-1               Xn
  Result:      R1(S1,X1)    ...   Rn-1(Sn-1,Xn-1)    Rn(Sn,Xn)
  Cumulative:  F1(S1,X1)    ...   Fn-1(Sn-1,Xn-1)    Fn(Sn,Xn) = Rn(Sn,Xn)

The output of step i is the input Si+1 of step i+1.

Example 1: Shortest path problem
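We want to travel from city A to city H. Several paths are possible.
Determine the shortest path.

[Figure: three-stage network from A to H, with the following arc lengths]
  A-B = 5   A-C = 3   A-D = 4
  B-E = 6   B-F = 2
  C-F = 7   C-G = 4
  D-G = 5
  E-H = 4   F-H = 1   G-H = 3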
Example 1: Shortest path problem - continued
We denote by:
- Xi: the destination city at step i (i = 1, 2, 3).
- Si: the departure city at step i (i = 1, 2, 3).
- Ri(Si,Xi): the distance between cities Si and Xi (i = 1, 2, 3).
- Fi(Si,Xi): the distance traveled from city Si to city H given that Xi is
  the destination at step i.
- Xi*: the destination that minimizes the distance Fi(Si,Xi).
- Fi*(Si) = Min over Xi of Fi(Si,Xi) = Fi(Si,Xi*).
In each table below, the last two columns give the optimal decision.

Step 3: F3(S3,X3) = R3(S3,X3)
  S3 \ X3 | H | F3*(S3) | X3*
  E       | 4 | 4       | H
  F       | 1 | 1       | H
  G       | 3 | 3       | H

Step 2: F2(S2,X2) = R2(S2,X2) + F3*(X2)
  S2 \ X2 | E          | F         | G         | F2*(S2) | X2*
  B       | 6 + 4 = 10 | 2 + 1 = 3 | -         | 3       | F
  C       | -          | 7 + 1 = 8 | 4 + 3 = 7 | 7       | G
  D       | -          | -         | 5 + 3 = 8 | 8       | G

Step 1: F1(S1,X1) = R1(S1,X1) + F2*(X1)
  S1 \ X1 | B         | C          | D          | F1*(S1) | X1*
  A       | 5 + 3 = 8 | 3 + 7 = 10 | 4 + 8 = 12 | 8       | B

Thus, the shortest path linking A to H is A-B-F-H with length 8.
The recursive function is written as follows:
Fi(Si,Xi) = Ri(Si,Xi) + Fi+1*(Xi) (i = 1,2)
F3(S3,X3) = R3(S3,X3)
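
The backward recursion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names GRAPH, STAGES and backward_shortest_path are hypothetical, and the arc lengths are the ones read off the worked tables. Setting F*(H) = 0 makes the step-3 rule F3(S3,X3) = R3(S3,X3) a special case of the general step.

```python
# Arc lengths taken from the worked tables: GRAPH[s][x] is the distance from s to x.
GRAPH = {
    "A": {"B": 5, "C": 3, "D": 4},
    "B": {"E": 6, "F": 2},
    "C": {"F": 7, "G": 4},
    "D": {"G": 5},
    "E": {"H": 4},
    "F": {"H": 1},
    "G": {"H": 3},
}
# Departure cities S_i at steps 1, 2, 3 (hypothetical layout for this sketch).
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"]]


def backward_shortest_path(graph, stages, goal):
    best = {goal: 0}   # F*(goal) = 0: nothing is left to travel
    choice = {}        # X_i*: best destination from each departure city
    for states in reversed(stages):          # step n down to step 1
        for s in states:
            # F_i(s, x) = R_i(s, x) + F_{i+1}*(x), minimized over x
            x_star = min(graph[s], key=lambda x: graph[s][x] + best[x])
            best[s] = graph[s][x_star] + best[x_star]
            choice[s] = x_star
    # Recover the path by following the optimal decisions from the start.
    start = stages[0][0]
    path = [start]
    while path[-1] != goal:
        path.append(choice[path[-1]])
    return best[start], path


length, path = backward_shortest_path(GRAPH, STAGES, "H")
print(length, "-".join(path))  # 8 A-B-F-H
```

The dictionary best plays the role of the F* column of the tables, and choice the role of the X* column.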
Forward approach
- Xi: the decision at step i.
- Si: the state of the system after step i.
- Ri(Si,Xi): the immediate result of step i given decision Xi and state Si.
- Fi(Si,Xi): the cumulative result of steps 1 to i given that, at step i,
  decision Xi is made and the state is Si.

Unlike the backward approach, the forward method proceeds from step 1 to step n.

  Step:        1                        2            ...   n
  Decision:    X1                       X2           ...   Xn
  Output:      S1                       S2           ...   Sn
  Result:      R1(S1,X1)                R2(S2,X2)    ...   Rn(Sn,Xn)
  Cumulative:  F1(S1,X1) = R1(S1,X1)    F2(S2,X2)    ...   Fn(Sn,Xn)

Example 1: Shortest path problem - continued
We denote by:
- Xi: the departure city at step i (i = 1, 2, 3).
- Si: the destination city at step i (i = 1, 2, 3).
- Ri(Si,Xi): the distance between cities Xi and Si.
- Fi(Si,Xi): the distance traveled from city A to city Si given that Xi is
  the departure city at step i.
- Xi*: the departure city that minimizes the distance Fi(Si,Xi).
- Fi*(Si) = Min over Xi of Fi(Si,Xi) = Fi(Si,Xi*).
In each table below, the last two columns give the optimal decision.

Step 1: F1(S1,X1) = R1(S1,X1)
  S1 \ X1 | A | F1*(S1) | X1*
  B       | 5 | 5       | A
  C       | 3 | 3       | A
  D       | 4 | 4       | A

Step 2: F2(S2,X2) = R2(S2,X2) + F1*(X2)
  S2 \ X2 | B          | C          | D         | F2*(S2) | X2*
  E       | 6 + 5 = 11 | -          | -         | 11      | B
  F       | 2 + 5 = 7  | 7 + 3 = 10 | -         | 7       | B
  G       | -          | 4 + 3 = 7  | 5 + 4 = 9 | 7       | C

Step 3: F3(S3,X3) = R3(S3,X3) + F2*(X3)
  S3 \ X3 | E           | F         | G          | F3*(S3) | X3*
  H       | 4 + 11 = 15 | 1 + 7 = 8 | 3 + 7 = 10 | 8       | F

Thus, once again we find the same shortest path linking A to H:
A-B-F-H with length 8.
The recursive function is written as follows:
Fi(Si,Xi) = Ri(Si,Xi) + Fi-1*(Xi) (i = 2, 3)
F1(S1,X1) = R1(S1,X1)
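
The forward recursion can be sketched the same way. As before, this is an illustrative sketch under stated assumptions: GRAPH, STAGES and forward_shortest_path are hypothetical names, and the arc lengths come from the worked tables. Here the states are destination cities, so the code first indexes the arcs by destination.

```python
# Arc lengths taken from the worked tables: GRAPH[x][s] is the distance from x to s.
GRAPH = {
    "A": {"B": 5, "C": 3, "D": 4},
    "B": {"E": 6, "F": 2},
    "C": {"F": 7, "G": 4},
    "D": {"G": 5},
    "E": {"H": 4},
    "F": {"H": 1},
    "G": {"H": 3},
}
# Destination cities S_i reached at steps 1, 2, 3 (hypothetical layout for this sketch).
STAGES = [["B", "C", "D"], ["E", "F", "G"], ["H"]]


def forward_shortest_path(graph, stages, start):
    # incoming[s][x] = distance of the arc from departure city x to destination s
    incoming = {}
    for x, arcs in graph.items():
        for s, dist in arcs.items():
            incoming.setdefault(s, {})[x] = dist

    best = {start: 0}  # F*(start) = 0: nothing has been traveled yet
    pred = {}          # X_i*: best departure city for each destination
    for states in stages:                    # step 1 up to step n
        for s in states:
            # F_i(s, x) = R_i(s, x) + F_{i-1}*(x), minimized over x
            x_star = min(incoming[s], key=lambda x: incoming[s][x] + best[x])
            best[s] = incoming[s][x_star] + best[x_star]
            pred[s] = x_star

    # Recover the path by walking back from the final city.
    goal = stages[-1][0]
    path = [goal]
    while path[-1] != start:
        path.append(pred[path[-1]])
    return best[goal], path[::-1]


length, path = forward_shortest_path(GRAPH, STAGES, "A")
print(length, "-".join(path))  # 8 A-B-F-H
```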
Which method to use?
- The method to adopt depends on whether the initial or the final state is
  known.
- If the initial state is known but not the final state, use the backward
  method.
- If the final state is known but not the initial state, use the forward
  method.
- If both states are known, either method applies.
Keep in mind
DP Characteristics
- The problem can be decomposed into a number of steps.
- At each step, a number of candidate states may apply.
- For each step and each corresponding state, we identify the possible
  decisions that can be made.
- We use a recursive formula so that at least one decision is retained
  for each state.

DP Characteristics - continued
- The recursive formula combines the immediate consequence of the decision,
  Ri(Si,Xi), with the best cumulative result over the steps already solved:
  steps 1 to i-1 in a forward approach, Fi-1*(Si-1), and steps i+1 to n in a
  backward approach, Fi+1*(Si+1).
- The general form of the recursive formula is:
  Fi(Si,Xi) = f(direct result, optimal cumulative result over the steps already solved)
DP Characteristics - continued
More precisely:
Fi(Si,Xi) = g(Ri(Si,Xi), Fi-1*(Si-1)) (forward)
Fi(Si,Xi) = g(Ri(Si,Xi), Fi+1*(Si+1)) (backward)
- The function g may be additive, multiplicative, or of another form.
- The state vectors Si-1 and Si+1 are expressed in terms of Si and Xi.
- DP relies on the principle of optimality, or Bellman's principle: any
  sub-policy of an optimal policy is itself optimal.
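
Written out with explicit indices, the backward form of the recursion might be stated as follows. The transition function T_i, which gives the next state from S_i and X_i, and the operator opt (Min or Max, depending on the problem) are notation assumed here for illustration; they do not appear in the slides.

```latex
\begin{align*}
F_i(S_i, X_i) &= g\bigl(R_i(S_i, X_i),\; F_{i+1}^*(S_{i+1})\bigr),
  & S_{i+1} &= T_i(S_i, X_i), \quad i = 1, \dots, n-1, \\
F_n(S_n, X_n) &= R_n(S_n, X_n), \\
F_i^*(S_i) &= \operatorname*{opt}_{X_i} F_i(S_i, X_i) = F_i(S_i, X_i^*),
  & i &= 1, \dots, n.
\end{align*}
```

In Example 1 (backward), g is addition, opt is Min, and T_i(S_i, X_i) = X_i. In Example 2, g is addition, opt is Max, and T_i(S_i, X_i) = S_i - X_i.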
Example 2: Budget Allocation
An industrial firm with a budget of 60,000 TD must allocate its entire budget
among its three plants in Tunis, Sousse, and Sfax. No plant can receive
more than 40,000 TD. All amounts must be multiples of 10,000 TD. Expected
revenues for each investment level, in thousands of TD, are given below:

  Expected revenues (thousands of TD)
  Investment | Tunis | Sousse | Sfax
  0          | 0     | 0      | 0
  10         | 30    | 45     | 35
  20         | 50    | 60     | 75
  30         | 90    | 70     | 95
  40         | 100   | 90     | 110

Example 2 - continued
Given that the Tunis and Sfax plants must each receive at least 10,000 TD,
determine the optimal budget allocation.
- Xi: the budget allocated to plant i; i = 1 for Tunis, 2 for Sousse, and 3
  for Sfax.
- Si: the budget available before deciding on plants i, i+1, ..., 3
  (i = 1, 2, 3).
- Ri(Si,Xi): the revenue of plant i resulting from an allocation of Xi,
  given an available budget of Si for plants i, i+1, ..., 3 (i = 1, 2, 3).
- Fi(Si,Xi): the cumulative revenue of plants i, ..., 3 for a total budget
  of Si when plant i receives Xi and the remaining plants are funded
  optimally (i = 1, 2, 3).
- Xi*: the budget allocated to plant i that maximizes the revenue Fi(Si,Xi).
- Fi*(Si) = Max over Xi of Fi(Si,Xi) = Fi(Si,Xi*).

In each table below, the last two columns give the optimal decision.

Step 3 (Sfax): F3(S3,X3) = R3(S3,X3)
  S3 \ X3 | 10 | 20 | 30 | 40  | F3*(S3) | X3*
  10      | 35 | -  | -  | -   | 35      | 10
  20      | -  | 75 | -  | -   | 75      | 20
  30      | -  | -  | 95 | -   | 95      | 30
  40      | -  | -  | -  | 110 | 110     | 40

Step 2 (Sousse): F2(S2,X2) = R2(S2,X2) + F3*(S2 - X2)
  S2 \ X2 | 0             | 10             | 20            | 30            | 40            | F2*(S2) | X2*
  20      | 0 + 75 = 75   | 45 + 35 = 80   | -             | -             | -             | 80      | 10
  30      | 0 + 95 = 95   | 45 + 75 = 120  | 60 + 35 = 95  | -             | -             | 120     | 10
  40      | 0 + 110 = 110 | 45 + 95 = 140  | 60 + 75 = 135 | 70 + 35 = 105 | -             | 140     | 10
  50      | -             | 45 + 110 = 155 | 60 + 95 = 155 | 70 + 75 = 145 | 90 + 35 = 125 | 155     | 10 or 20

Step 1 (Tunis): F1(S1,X1) = R1(S1,X1) + F2*(S1 - X1)
  S1 \ X1 | 10             | 20             | 30             | 40             | F1*(S1) | X1*
  60      | 30 + 155 = 185 | 50 + 140 = 190 | 90 + 120 = 210 | 100 + 80 = 180 | 210     | 30

Consequently, the optimal allocation is 30,000 TD for the Tunis plant,
10,000 TD for the Sousse plant, and 20,000 TD for the Sfax plant. The
total revenue is 210,000 TD.
The recursive function is written as follows:
Fi(Si,Xi) = Ri(Si,Xi) + Fi+1*(Si - Xi) (i = 1, 2)
F3(S3,X3) = R3(S3,X3)
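
The same recursion can be checked with a short Python sketch. It is written top-down with memoization rather than as the bottom-up tables above, so it is a variant of the method and not a transcription of it. The names REVENUE, MINIMUM, PLANTS and best_revenue are hypothetical; amounts are in thousands of TD.

```python
from functools import lru_cache

# Expected revenues from the table, in thousands of TD.
REVENUE = {
    "Tunis":  {0: 0, 10: 30, 20: 50, 30: 90, 40: 100},
    "Sousse": {0: 0, 10: 45, 20: 60, 30: 70, 40: 90},
    "Sfax":   {0: 0, 10: 35, 20: 75, 30: 95, 40: 110},
}
MINIMUM = {"Tunis": 10, "Sousse": 0, "Sfax": 10}  # minimum allocation per plant
PLANTS = ["Tunis", "Sousse", "Sfax"]              # plants 1, 2, 3


@lru_cache(maxsize=None)
def best_revenue(i, s):
    """Return (F_i*(s), allocations for plants i..3), or None if s cannot be allocated."""
    plant = PLANTS[i]
    if i == len(PLANTS) - 1:
        # The last plant must absorb the whole remaining budget: X3 = S3.
        if s in REVENUE[plant] and s >= MINIMUM[plant]:
            return REVENUE[plant][s], (s,)
        return None
    best = None
    for x in REVENUE[plant]:
        if x < MINIMUM[plant] or x > s:
            continue
        tail = best_revenue(i + 1, s - x)   # F_{i+1}*(S_i - X_i)
        if tail is None:
            continue
        total = REVENUE[plant][x] + tail[0]
        if best is None or total > best[0]:
            best = (total, (x,) + tail[1])
    return best


revenue, allocation = best_revenue(0, 60)
print(revenue, allocation)  # 210 (30, 10, 20)
```

The lru_cache decorator stores each F_i*(s) the first time it is computed, which is the "store and reuse" idea from the introduction. When two decisions tie, as for S2 = 50, this sketch keeps the smaller allocation.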

Example 3: Production Planning Problem
We want to determine the production levels of a certain product over the
next 4 months. A production run involves a fixed cost of 3 DT and a variable
cost of 1 DT per unit. At the end of each month, each unit left in stock
incurs a holding cost of 0.5 DT. In any month, the production capacity is 4
units and the storage capacity is 2 units. The demand for the next 4 months
is 1, 3, 2, and 4 units, respectively. Given that the initial stock is
empty, determine the optimal production plan.

- n = 4.
- We use the backward method because the initial stock is known.
- Xi: the quantity to produce in month i (i = 1, ..., 4).
- Si: the stock level at the start of month i (i = 1, ..., 4).
- Ri(Si,Xi): the cost in month i (i = 1, ..., 4)
    = production cost + holding cost
    = production cost of Xi units + holding cost of (Si + Xi - Di) units,
  where Di is the demand of month i (i = 1, ..., 4):
    Ri(Si,Xi) = (3 + 1*Xi) + 0.5*(Si + Xi - Di)   if Xi ≠ 0
    Ri(Si,Xi) = 0.5*(Si + Xi - Di)                 if Xi = 0
- Fi(Si,Xi): the minimum total cost for months i, i+1, ..., 4 given that the
  stock level at the start of month i is Si and Xi units are produced
  (i = 1, ..., 4).
- Fi*(Si) = Min over Xi of Fi(Si,Xi) (i = 1, ..., 4).
- Fi(Si,Xi) = g(Ri(Si,Xi), Fi+1*(Si+1))
            = g(Ri(Si,Xi), Fi+1*(Si + Xi - Di))
            = Ri(Si,Xi) + Fi+1*(Si + Xi - Di)   (i = 1, ..., 3)
- F4(S4,X4) = R4(S4,X4).
In each table below, the last two columns give the optimal decision. Each
entry is production cost + holding cost + optimal cost of the later months.

Step 4 (month 4), D4 = 4: F4(S4,X4) = R4(S4,X4)
  S4 \ X4 | 2             | 3                 | 4                 | F4*(S4) | X4*
  0       | -             | -                 | 3 + 4 + 0 = 7     | 7       | 4
  1       | -             | 3 + 3 + 0 = 6     | 3 + 4 + 0.5 = 7.5 | 6       | 3
  2       | 3 + 2 + 0 = 5 | 3 + 3 + 0.5 = 6.5 | 3 + 4 + 1 = 8     | 5       | 2

Step 3 (months 3-4), D3 = 2: F3(S3,X3) = R3(S3,X3) + F4*(S3 + X3 - D3)
  S3 \ X3 | 0             | 1                      | 2                      | 3                      | 4                  | F3*(S3) | X3*
  0       | -             | -                      | 3 + 2 + 0 + 7 = 12     | 3 + 3 + 0.5 + 6 = 12.5 | 3 + 4 + 1 + 5 = 13 | 12      | 2
  1       | -             | 3 + 1 + 0 + 7 = 11     | 3 + 2 + 0.5 + 6 = 11.5 | 3 + 3 + 1 + 5 = 12     | -                  | 11      | 1
  2       | 0 + 0 + 7 = 7 | 3 + 1 + 0.5 + 6 = 10.5 | 3 + 2 + 1 + 5 = 11     | -                      | -                  | 7       | 0

Step 2 (months 2-4), D2 = 3: F2(S2,X2) = R2(S2,X2) + F3*(S2 + X2 - D2)
  S2 \ X2 | 1                   | 2                       | 3                       | 4                       | F2*(S2) | X2*
  0       | -                   | -                       | 3 + 3 + 0 + 12 = 18     | 3 + 4 + 0.5 + 11 = 18.5 | 18      | 3
  1       | -                   | 3 + 2 + 0 + 12 = 17     | 3 + 3 + 0.5 + 11 = 17.5 | 3 + 4 + 1 + 7 = 15      | 15      | 4
  2       | 3 + 1 + 0 + 12 = 16 | 3 + 2 + 0.5 + 11 = 16.5 | 3 + 3 + 1 + 7 = 14      | -                       | 14      | 3

Step 1 (months 1-4), D1 = 1: F1(S1,X1) = R1(S1,X1) + F2*(S1 + X1 - D1)
  S1 \ X1 | 1                   | 2                       | 3                   | F1*(S1) | X1*
  0       | 3 + 1 + 0 + 18 = 22 | 3 + 2 + 0.5 + 15 = 20.5 | 3 + 3 + 1 + 14 = 21 | 20.5    | 2

Hence, the optimal production plan is X1* = 2, X2* = 4, X3* = 0, X4* = 4, with a
minimum cost of 20.5 DT.
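
The four tables can be reproduced with a short bottom-up sketch in Python. The names DEMAND, month_cost and plan_production are hypothetical. The sketch assumes, as the tables do, that demand must be met each month (no negative stock) and that stock left after month 4 is allowed and only incurs its holding cost.

```python
DEMAND = [1, 3, 2, 4]                        # D1..D4
FIXED_COST, UNIT_COST, HOLDING_COST = 3, 1, 0.5
MAX_PRODUCTION, MAX_STOCK = 4, 2


def month_cost(stock, produce, demand):
    """R_i(S_i, X_i): setup cost (only if producing) + unit cost + end-of-month holding cost."""
    setup = FIXED_COST if produce > 0 else 0
    return setup + UNIT_COST * produce + HOLDING_COST * (stock + produce - demand)


def plan_production(initial_stock=0):
    n = len(DEMAND)
    # best[i][s] = (F_i*(s), X_i*) for month i (0-based) and starting stock s.
    best = [{} for _ in range(n)]
    # After the last month nothing remains to be paid, whatever the stock.
    best.append({s: (0.0, None) for s in range(MAX_STOCK + 1)})
    for i in reversed(range(n)):             # month 4 down to month 1
        for s in range(MAX_STOCK + 1):
            options = []
            for x in range(MAX_PRODUCTION + 1):
                end_stock = s + x - DEMAND[i]
                if end_stock in best[i + 1]:  # feasible: 0 <= end stock <= 2
                    cost = month_cost(s, x, DEMAND[i]) + best[i + 1][end_stock][0]
                    options.append((cost, x))
            if options:
                best[i][s] = min(options)
    # Roll forward from the known initial stock to read off the plan.
    plan, stock = [], initial_stock
    for i in range(n):
        x = best[i][stock][1]
        plan.append(x)
        stock += x - DEMAND[i]
    return best[0][initial_stock][0], plan


total_cost, plan = plan_production()
print(total_cost, plan)  # 20.5 [2, 4, 0, 4]
```

The backward pass fills in F_i*(s) for every stock level, which is why the known initial stock is only needed at the end, when the plan is read off month by month.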
