You are on page 1of 23

This article was downloaded by: [116.102.89.

57] On: 19 August 2016, At: 07:10


Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

INFORMS Transactions on Education


Publication details, including instructions for authors and subscription information:
http://pubsonline.informs.org

Implementing Dynamic Programs in Spreadsheets


John F. Raffensperger, Pascal Richard,

To cite this article:


John F. Raffensperger, Pascal Richard, (2005) Implementing Dynamic Programs in Spreadsheets. INFORMS Transactions on
Education 5(2):25-46. http://dx.doi.org/10.1287/ited.5.2.25
Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions
This article may be used only for the purposes of research, teaching, and/or private study. Commercial use
or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher
approval, unless otherwise noted. For more information, contact permissions@informs.org.
The Publisher does not warrant or guarantee the articles accuracy, completeness, merchantability, fitness
for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or
inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or
support of claims made of that product, publication, or service.
Copyright 2005, INFORMS
Please scroll down for articleit is on subsequent pages

INFORMS is the largest professional society in the world for professionals in the fields of operations research, management
science, and analytics.
For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

Vol. 5, No. 2, January 2005, pp. 2546


issn 1532-0545  05  0502  0025

informs

doi 10.1287/ited.5.2.25
2005 INFORMS

I N F O R M S
Transactions on Education

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Implementing Dynamic Programs in Spreadsheets


John F. Raffensperger

Department of Management, Private Bag 4800, University of Canterbury, Christchurch,


New Zealand, john.raffensperger@canterbury.ac.nz

Pascal Richard

University Institute of Technology, Business Management Department,


and Laboratory of Applied Computer Science, Poitiers, France, pascal.richard@ensma.fr

his paper has two purposes: to help explain dynamic programs (DP) to students who have no background
in procedural languages, and to quantitatively motivate good spreadsheet designs that can be developed
and modied easily. Teachers and practitioners could benet from using spreadsheets to solve the common
problems that are suited to DP. We give examples of several important DPs, implemented in different ways
with a spreadsheet.
We further analyze these DPs for their constructive complexity, which is the number of keystrokes required
to write a spreadsheet for a given computational task, as a function of the input data. We show that a given DP
can be written in several ways, with varying constructive complexity. The implementation can drastically affect
the difculty of writing and modifying a spreadsheet. The different examples demonstrate the value of good
spreadsheet design.

1.

Introduction

see, complex problems can be implemented in spreadsheets using just a few formulas. In this paper, we
show that spreadsheets can reduce the gap between
theoretical foundations of DP techniques and practical
applications needed in any courses.
We identify the likely teaching environment for
this material to be the lecturer in operations research
or quantitative methods, with students in business
who do not have a background in procedural languages. Engineering students may appreciate viewing
the recursion graphically, but are likely to have done
some programming, and therefore will not benet as
much. Anyone, though, can use our spreadsheets as
templates. The lecturer can put Excels handy sensitivity analysis tools to good use. So this paper is
supplementary teaching material to the lecturer who
wishes to give students a sense of the recursive nature
of DPs, without resorting to a procedural language.
With our focus on the graphical layout, we do
not use Visual Basic (VB). We want to test the
ability of the spreadsheet to solve these models,
and using VB would in fact be using a procedural
language. VB, however, can be used to good effect,
and writing a DP in VB is easy for someone with
skill in programming. As an aside, we implemented
a simple Gilmore-Gomory algorithm as a macro in
Excel (Knapsack by VB.xls http://archive.ite.journal.
informs.org/Vol5No2/RaffenspergerRichard/

1.1.

Why Teach Dynamic Programming with


a Spreadsheet?
Students in management science learn operations
research techniques, such as linear programming,
PERT/CPM methods, and dynamic programming
(DP). Teaching is based on specic softwares that
are mainly used in the academic world. Specialists
in linear programming can use the Lindo package;
others dealing with project management can use
Microsoft Project and so on. But in contrast to linear
programming and project management, no generic
tools are available to solve DPs. So DP is often viewed
as an algorithmic approach to combinatorial problems, rather than a tool. For these reasons, DP is not
extensively studied by students learning management
science, even though the underlying principles are
foundations of optimization theory.
Spreadsheets are widely accepted in the business
world and are not dedicated to specic techniques.
Spreadsheets can be extended easily with add-ins. The
spreadsheet is also accepted as a good modeling tool,
as suggested by numerous textbooks. While spreadsheets are not particularly suitable for solving large
dynamic programs, especially for hard problems with
an exponential number of states, spreadsheets can
be helpful in teaching DP to students. As we shall
25

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

26

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

Spreadsheets/KnapsackbyVB.xls). Surprisingly, the


VB code took about 50 times longer than our simplest pure spreadsheet version. The code is in the
appendix.
Other people have used VB to good use with Excel.
In fact, preceding these, Ho (1987) gave Lotus 1-2-3
macros to solve some classic DP problems (and LP
problems). Jensen (2002) has a nice DP tutorial with
Excel using VB heavily, along with many other excellent materials. Sarmiento (2003) uses VB templates
elegantly to solve DPs.
In contrast to the VB approach, Neuwirth (1995)
uses the spreadsheet to get insight into recursive relations for combinatorial formulas, and as a graphical
tool to represent combinatorial relations between
objects. Our paper is in the same spiritwe hope
to get insight into the underlying algorithms, and
how they work, using the graphical layout of the
spreadsheet. We intend to use the graphical environment apart from procedural code, and we want
to demonstrate good spreadsheet design, so VB is
a digression. By exposing the DP in Excel, the student can visually see the number of cells required,
and get an idea of the computational complexity and
its recursive nature. In contrast, a tool like Solver
hides the complexity intentionally, using the worksheet only to hold the input and output. So students
do not always appreciate what is going on behind the
dialog boxes. Furthermore, our analysis of constructive complexity has application beyond dynamic programs, but would not apply in the same way if we
were to use VB.
We briey survey a sample of textbooks with material on DP. These books may be divided into three
sets, spanning a range from theoretical to practical.
The rst set, at one end of the range, has the
textbooks that are primarily about DP. These vary
in level, with some appropriate for undergraduate
applied courses, and others better suited for graduates. Except where noted, these books are theoretical,
and lack any hints about how these DPs would be
implemented. Beautifully written and typeset, Bather
(2000) has plenty of simple numerical examples, and
exercises (with partial answers in the back). The text
is more theoretical in the latter half, and is aimed
more at math students, with little about implementation. Beckmann (1968) ties DP clearly to LP. The text
would be suitable for undergraduate math or economic, though it lacks exercises. Bellman and Dreyfus
(1962) remains one of the most beautiful texts in spite
of its age. Flowcharts show the logic. The text has lots
of numeric examples. While it has no exercises, it is
clearly intended for practical use, with sections titled,
Description of the Computational Procedure. The
book frequently touches on issues of computer implementation. Cooper and Cooper (1981) is quite mathematical, and would be appropriate for graduate math

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

or O.R. students. Dreyfus and Law (1977) give plenty


of exercises and bless the reader with their answers in
the back of the book. They give very little about how
to actually implement DPs. The words computer
and memory do not appear in the index. Nevertheless, the text is clear with nice direct language.
Gluss (1972) discusses computation time in some
detail and shows a ow chart. The text is a bit heavy
on equations, light on diagrams, and very light on
applications. Hadley (1970) reads like the older classic
texts, and discusses computing issues to some extent.
His comments on DP are relevant to the problems
of putting DPs in spreadsheets. The reader    will
note that no two problems yielded precisely the same
form of recurrence relations. This is a characteristic of dynamic programming. The recurrence relations obtained to solve the problem are strongly
dependent on the particular structure of the problem being solved. For this reason, for example, it
is not possible to develop a general computer code
for solving all dynamic programming problems, as
we were able to do for linear problems. Nemhauser
(1966), in this reviewers mind, is one of a few with
a rm grip on the phrase classic DP textbook.
Though almost 40 years old, the writing is clear and
direct. The ow charts are the closest thing to reading a computer program, but without the difculties of the syntax. Puterman (1978), not a text, is a
broad collection of useful papers, including applications. The print suffers a bit from being typewritten rather than typeset. Sniedovich (1992) addresses
head-on the sometimes confusing question, What
is dynamic programming? He uses nice direct language, but the text lacks exercises. The book would
serve as a useful reference, especially for a theoretical course in DP. Bertsekas (2001) wrote two beautifully written comprehensive volumes, one on nite
horizons, and the second for innite horizons. The
texts are appropriate for graduate students. Denardo
(2003), the most recent of the texts, is shorter than
the Bersekas work, and would be more accessible to
undergraduates.
The second set of books with material on DP
is the general OR textbooks. Wagner (1975), while
an older text now, is beautifully written. The exercises are pencil pushing, in Wagners words, but
he points out that these problems should be solved
by a computer. He covers DP at some length, over
several chapters. The material is still appropriate
for undergraduate work, and highly approachable,
it is just 30 years old. Winston (2003) has almost
50 pages on deterministic DP, the most thorough
of the modern texts seen. He shows network diagrams and table calculations. His nal section briey
covers DP with Excel. We use his knapsack example here in this paper. Winston has a separate chapter on probabilistic DP. Hillier et al. (2000) have less

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

27

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

material than Winston, and nothing on DP in spreadsheets, but still good basic coverage. Lawrence and
Pasternack (1998) put a small section on DP in the
companion CD-ROM, rather than in the printed text,
but include specialized DP software. In general, the
OR textbooks have good basic sections on DP. They
cover the classic models in clear direct fashion. It
still seems, however, that we have not moved much
beyond Wagners pencil pushing for DP, except for
smatterings of spreadsheets and the occasional special
bit of software.
The third set of texts with material about DP is the
computer science texts. We found these to be quite
appealing, in some ways, because the texts go right
to the heart of the matter. How, exactly, do we get a
computer to solve these problems? We mention two
texts here. Weiss (1993) has only a short section on DP,
and misses the applications that would be considered
standard in O.R. However, he discusses the critical
issue of using a table rather than recursion, and shows
code in C. He goes into some detail about the number
of operations required for a given implementation.
Sedgewick (1988) similarly has a modest section on
DP, discusses implementation issues in detail, giving
Pascal-like pseudocode to show the algorithms. We
expect almost any basic text on algorithms will have
something on DP, and will emphasise implementation over DP theory. So the DP literature has quite a
range of material, from theoretical without touching
on implementation, to practical with little DP theory.
For the typical O.R. undergraduates, the textbooks
do seem to have good discussions, and could use
more only in implementation. Given the difculties
with creating a general program for DP, as Hadley
pointed out, this is understandable. Hence, for implementation, we must look to the computer science
textbooksor spreadsheets.
1.2. Outline of the Paper
The next sections present well-known DPs and
several spreadsheet implementations leading to different constructive complexities and different pedagogical presentations. The spreadsheets are linked to
the images in each section. The sections are as follows:
constructive complexity, illustrated with the wellknown Critical Path Method,
general suggestions about designing a spreadsheet DP, illustrated with the classic knapsack
problem,
the batch scheduling problemminimizing
makespan for a single batching machine,
minimizing the weighted number of late jobs on
a single machine,
minimizing the makespan on parallel unrelated
machines.

A point of notation: the top of each spreadsheet shows


the word Formula in a cell with a gray background.
This indicates that every cell with a formula has
that light gray background format. All other cells are
constant.

2.

Constructive Complexity versus


Computational Complexity

Operations researchers evaluate the efciency of


an algorithm by its complexity using the big O()
notation. This efciency is the number of operations
required to solve a problem as a function of the input,
without considering technical details about computer
technologies. Let I be the input to the algorithm and
f I be the time to run the algorithm with input I.
Then an algorithm is:
a polynomial time algorithm if f I is a polynomial in the size of the input;
a pseudo-polynomial time algorithm if f I
depends on the size of the input (for instance the maximum value of input data);
an exponential algorithm if f I is an exponentional function in the size of the input.
Most of Excels built-in functions solve problems that
can be calculated in polynomial time. Thus, the number of cells required to implement a DP in a spreadsheet leads directly to its computational complexity.
Due to their computational complexity, even polynomial DPs can take up a lot of space in a spreadsheet.
In addition to computational complexity, the spreadsheet writer should be concerned with nding a good
design for the spreadsheet, to minimize the cost of
writing and revising the spreadsheet if the problem parameters change. Similarly, lecturers should
teach good spreadsheet design. These design issues
have little to do with computational complexity.
That is why we provide a new metric to qualify
spreadsheets.
A spreadsheet that requires a few keystrokes to
create and modify is generally considered a better design than a spreadsheet that solves the same
problem, but that requires more keystrokes to create and modify. We therefore introduce the concept
of constructive complexity. Constructive complexity is
the number of keystrokes required to implement a
given computational problem in a spreadsheet, as
a function of the input data (i.e., the instance). The
measure of constructive complexity will be based as
for computational complexity on the Big O notation, to capture the asymptotic behavior of a function.
To prove constructive complexity, a necessary condition is to give the keystrokes to accomplish the
given task. Thus, constructive complexity quanties
some aspects of good spreadsheet design. Constructive complexity for a given task depends on the

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

28

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets


INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

computer-user interface, the spreadsheets features,


and the writers cleverness.
The referees have made several suggestions to
improve our spreadsheets. Others may be able to
improve these further. We welcome such suggestions.
The point is that constructive complexity helps frame
which design is better, and why. The lecturer should
make this point to students, and can motivate this
notion with our examples. A reasonable person may
be able to conceive of extreme cases where an O(1)
spreadsheet is harder to write than a spreadsheet of
higher complexity. But the point is to get students to
think about it.
For data entry, we can usually enter data in the
format we like in the fewest keystrokes, then use formulas to convert the data to the format or layout that
the spreadsheet requires. If we can create this conversion in O(1) keystrokes, then the data entry is effectively sunk. If we cannot do the conversion in O(1)
keystrokes, then data entry affects the constructive
complexity of the spreadsheet. A spreadsheet writer
could require that data be entered more than once,
whether from poor design or perhaps for data verication. So data entry may not always require a xed
number of keystrokes. Except for such strange cases,
data entry will almost always require the same order
of magnitude of input operations on any input device.
We therefore will treat data entry as a sunk cost. We
also note that a rectangular selection and ll are O(1)
operations.
Seal (2001) gave a spreadsheet implementation for
CPM and PERT. Ragsdale (2003) greatly improves on
Seals already good work. With CPM, we use recursion to nd the longest path through a network of
nodes and arcs. In this section, we briey analyze
Ragsdales two spreadsheets for their constructive
complexity.
Figure 1a

Figure 1a shows Ragsdales Standard spreadsheet


example for CPM. We have modied it only in the
formatting and some of the text. A glance at the formulas in Figure 1b for EST and LFT shows that these
formulas are not consistent. Each one must be edited
by hand, a cell for each node, with a reference for
each predecessor. As Ragsdale points out, numerous
manual changes to formulas are required if project
activities are added or deleted, or if predecessor relations among the activities are altered. Constructive
complexity quanties the difculty. The constructive
complexity of this spreadsheet is On2 , where n is
the number of activities, because each of the On2 
arcs requires an edit by hand. Addition of a new node
requires insertion of a row, and then lling in the row
of formulas. Each EST and LFT formula cannot be
copied, and must be written by hand in On time,
meaning that each formula could require an edit for
all n nodes. If we think of the spreadsheet writer as
the processor, the number of arcs and nodes as the
input data, and writing the spreadsheet as an algorithm, the spreadsheet-writing algorithm would run
in On2  time.
Figure 1c below shows Ragsdales New CPM
spreadsheet (which also has a nice Gantt chart which
we do not show). Again, we have modied it only
in the formatting and some text. This spreadsheet can
be written and modied in O(1) time, meaning that
the time required to make the spreadsheet is constant,
regardless of the size of the model. Formulas do not
need updating if the predecessors change, and addition or deletion of an activity requires only addition
or deletion of the corresponding row. Thus, an analysis of the constructive complexity quanties how much
easier the spreadsheet is to write.
Interestingly, this version uses circular references.
For example, cell G6 depends on H6, while H6

Ragsdales Standard Spreadsheet Example for CPM. Here and Throughout This Paper, Formulas Are Gray, Like the Cell with the Label
Formula

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

29

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Figure 1b

Formulas Displayed for Ragsdales On 2  CPM Spreadsheet in Figure 1a

Figure 1c

Ragsdales O1 CPM Spreadsheet

Key Cell Formulas


Cell

Formula

Copied to

E6
F6
G6
H6
I6

{=MAX(IF(ISERR(FIND($A$6:$A$15,C6)),0,$F$6:$F$15))}
=E6+D6
=H6-D6
{=MIN(IF(ISERR(FIND(A6,$C$6:$C$15)),MAX($F$6:$F$15),$G$6:$G$15))}
=G6-E6

E7:E15
F7:F15
G7:G15
H7:H15
I7:I15

depends on G6. Even so, the spreadsheet is correct.


The latest start time for a task depends on the latest
nish for previous tasks. The ISERR() function returns
TRUE if its argument is any error (except #N/A), else
FALSE. The FIND(c, string) returns an error if character c is not in string. Note that Excel formulas inside
curly brackets
are array formulas, to be entered
with Ctrl-Shift-Enter. Even if the student does not
understand the details of these formulas, the overall
effect of the stronger design should be obvious.
In the next section, we discuss general principles of
design for a spreadsheet DP, with constructive complexity in mind.

3.

Designing a DP in a Spreadsheet:
The Classic Recursion for
the Knapsack

The basic principle of dynamic programming is that


an optimal policy for a general optimization problem
also leads to an optimal policy for every subproblem
that belongs to it. This allows a complex problem
to be decomposed into stages. At each stage, a decision is taken and its return is evaluated. The result
is then used by subsequent stages. All stages (and
decisions) are then dependent and can be graphically
represented as a decision tree. Each stage takes as

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

30

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

input the values of the state variables, which leads to


new values of these variables as output (i.e., after the
decision making process). These computations allow
calculation of the optimal value of the objective function, but not the optimal policy. To obtain the optimal policy, a primal retrieval must be designed to
establish which decision at a given stage is associated
with the optimal value of the objective function. Thus,
solving a dynamic program is a sequential process,
even for the primal retrieval. The DP can be visualized as a graph. Each stage and state combination is
a node, and each decision is an arc from one node to
another node.
To see how this network structure works, consider
the general integer knapsack problem, a classic example of DP recursion. For now, we will omit the primal
retrieval to focus on the mechanics of the recursion.
Max 16x1 + 22x2 + 12x3 + 8x4
5x1 + 7x2 + 4x3 + 3x4 14
x1 x2 x3 x4 non-negative and integer
Let n be the number of items, four in this case, and
b be the capacity of the knapsack, 14 in this case. The
dynamic programming network is shown in Figure 2.
Arcs starting from every node dene a decision. The
state variable is the available knapsack capacity as
a node and corresponding prots are represented as
node labels.
Textbook DP examples usually show one table per
stage. Our spreadsheets will use only one or two
tables for enumerating all stages and values of state
variables. Generally, a spreadsheet DP can have two
alternate layouts:
a one-dimensional table, where each arc is listed
in a column or row, and stages are in a separate table,
a two-dimensional table, where each arc is a cell
in the table, with stages listed in a column at the right,
or a row at the bottom.
The layout can be simplied if a dimension can
be embedded within an Excel formula, as we shall
do to reduce the constructive complexity. We shall
see that spreadsheet DPs are sometimes conveniently
written with matrix transposition, circular references,
database lookups, and array functions, all of which
are advanced tools. These functions may intimidate
Figure 2

A DP Network for the General Integer Knapsack Problem

students, but can serve to motivate the use of such


functions. Certainly the advanced functions do not
need to be presented rst, and the spreadsheet may
be easier to develop if it starts simple, and is then
improved.
As a general approach for implementing DP in a
spreadsheet, we recommend rst nding a very simple spreadsheet for a small example. The spreadsheet formula dependencies represent dependencies
among nodes in a DP network. We recommend that
the spreadsheet be simplied to the fewest number of cells reasonably possible. Following this, the
spreadsheet writer can look for alternate formulas
that decrease the constructive complexity.
First, dene the DP network of the dynamic
program.
Second, implement dependencies among nodes
in the DP network as dependent formulas in a
spreadsheet.
Third, simplify the spreadsheet as much as
possible.
Last, rene the model to decrease its constructive
complexity.
This four step method exploits the graphical layout
of DPs as dependencies among cells in a spreadsheet.
Recursion formulas are not difcult to implement.
We found that the most difcult part was design of
the primal retrieval. Many times, we used complicated formulas (such as arrays and lookups) only
to improve the constructive complexity relative to a
simple version.
We continue our knapsack problem example. Based
on the DP network, a rst very simplied knapsack
implementation is in Figure 3a. The object sizes are
given in the Capacity and Prot table in A6:B20. For
example, an object of size 4 has prot of $12, as in
cell B10.
Notice the relationship between this spreadsheet
and the network diagram in Figure 2. Cells in
the table correspond to arcs. The Max Prot in
row 22 corresponds to the objective value at each
stage. The implementation is a two-dimensional table,
where stages are in columns and arcs are enumerated in the tables interior cells. To emphasize the network structure, consider Figure 3c, which shows the
spreadsheet precedence structure as a graph (omitting
the dependencies on constant data in column B). The

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

cell precedence structure is exactly the DP network


diagram. This network diagram cannot be seen with
a procedural language.
Unfortunately, this implementation is quite annoying to write, as every formula in the table is different. Figure 3b shows that the rst term of each formula
in the table would be copied diagonally, while the second term would be copied across. The users is therefore forced to write each formula by hand.
So we have a simple spreadsheet, but one that
is bothersome to write and modify, due to its high
constructive complexity. The difcult bit is to nd
a formula that will look up the object prots for
the diagonal, while transposing the look up from
the Max prot in row 22. Our spreadsheet in
Figure 3d will do just that, with VLOOKUP() and
TRANSPOSE().
This knapsack implementation has three distinct
formulas: rst, the formulas in the table D6:Q19; second, the MAX() formulas in row 22; and third, row 22
transposed with the formula TRANSPOSE(C22:Q22),
entered as an array into S6:S20. This version of the
knapsack algorithm requires transposition.
So how does this spreadsheet work? It lists all possible arcs in the dynamic program. For example, consider the selected cell, I10, in Figure 3d. The row label
for cell I10 is 4, meaning that we have already packed
4 units. That is, we have recursively lled up to 4
optimally. The column label for cell I10 is 6, meaning
that we are considering all options to ll an additional
2 units. What can we add? Nothing, since we have
no object of size 2 to add. This could be shown in the
network as an arc from node 4 to node 6, but since we
have no object of size 2, we have not drawn an arc.
Figure 3a

Onb Knapsack Spreadsheet

31

However, this spreadsheet formulation still requires


the calculation for an arc from node 4 to node 6. Thus,
we already have an indication that this is not the best
possible formulation. The largest of the values into
node 6 (column I) is $16, which is the greatest prot
we can get with a capacity of 6. This comes from one
object of size 5 or two objects of size 3. This value of
$16 is noted in row 22 in cell I22, then transposed to
column S in cell S10, to be used in later calculations.
Note that the cells in row 6 for Capacity of 0 match
the network arcs coming out of node 0. We have
omitted some arcs, since they are unnecessary, as we
shall see with better formulations. Each arc in the network corresponds to a cell in D6:Q19, but the network
would be unnecessarily crowded if we put in all possible b 2 b/2 arcs.
In Figure 3d, we must copy formulas carefully for
each level of capacity. If the formulas are copied to the
entire rectangle D6:Q20, we have a circular reference.
For example, D7 would ow to D22, which would
ow to S7, which would ow back to D7. Because we
must do a copy operation for each level of capacity,
1    b, this implementation is therefore of constructive complexity Ob, quite different from the computational complexity of Ob 2 .
However, we can reduce the constructive complexity to O(1), by simply adding an IF() statement
to the arc formulas, as shown in Figure 4, which also
shows the primal retrieval and optimal solution in
rows 23 and 24. We can copy a single formula to the
entire table, C6:Q20.
So how do we nd the optimal solution (Figure 4)?
Row 22 shows the maximum prot possible at each
stage, based on the incoming arcs in cells C6:Q20.

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

32

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets


INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Figure 3b

Onb Knapsack Spreadsheet Formulas. Note that Some Terms Recur Diagonally, While Other Terms Recur Across. This Spreadsheet Is
Very Difcult to Write and Modify

Figure 3c

Graph of the Cell Precedences for the Onb Knapsack Spreadsheet Formulas (Omitting the Dependencies on Constant Data in Column B).
The Cell Precedence Structure Is Exactly the DP Network Diagram

But these maxima may not be optimal. For example,


cell P22 shows a maximum prot of $40, which could
be had with ve items of value $8, but $40 is not
the optimal prot, because we can use the remaining
capacity to obtain a prot of $44. Row 23 determines
the optimal prots for each stage. The formula in cell
D23 uses SUMIF() to return the prot of the optimal
item. This was suggested by a referee and improves
on an earlier version. Row 24 shows the item values
that we should pack: we can optimally pack four

items with value $8 and one item with value $12, for
a total prot of $44.
Our next implementation can be found in Winston
(2003). The primal retrieval is original. This implementation avoids the need for transposition. In fact,
the fundamental algorithm is quite different from the
previous versions, being the following Onb recursion of Gilmore and Gomory (1966).
For the knapsack problem, max cx ax b, where
a is the vector of sizes, x is the vector of decision

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

33

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 3d

A Spreadsheet of Ob Constructive Complexity, to Solve the General Integer Knapsack Problem

Key Cell Formulas


Cell
D6
S6:S20
D22

Formula

Copied to

=VLOOKUP(D$5-$A6,$A$6:$B$20,2)+$S6
{=TRANSPOSE(C22:Q22)}
=MAX(D6:D20)

E6:Q6, F7:Q7, G8:Q8,    Q19:Q19


Single array formula.
E22:Q22

variables and b is the scalar capacity, the Gilmore and


Gomory recursion is:
for i = 0    b

f i =

max

j=1  n iaj

cj + f i aj 


The algorithm itself contains no transposition, so neither does the graphical spreadsheet implementation
in Figure 5.
So how does this spreadsheet work? It lists arcs in
the dynamic program, but fewer than in Figures 3
and 4. Those earlier examples have an arc from each
possible capacity to every greater possible capacity.
In fact, we need only the arcs associated with each
possible capacity and each item. For example, consider the selected cell, D15, in Figure 5. The row label
for cell D15 is 5, meaning that we have 5 units of
capacity available. The column label for cell D15 is
x3, item 3, with size 4 and value $12. Observe that
we have recursively already packed the capacity of 4
optimally, with a value of $8, as found in cell F14. The
formula in D15 rst checks to make sure that item 3
will t in size 5. It will, so the formula looks up an
item of size 1, discovers it has value $0, which means
we can still add this item of size 4, and get a value of

$12. This value of 12 would be noted in the Max Prot


column F, but we actually did better with item 1 of
size 5 and value $16.
As before, Max prot shows the best way to
pack each value of capacityif that were the only
capacity available, while Optimal prots shows the
true optimal way given the actual total capacity available. The Optimal items shows the value of the
optimal items to pack, again four items of value $8
and one item of value $12.
Let us analyze the constructive complexity. This
implementation requires four distinct formulas. After
entering the data in rows 5, 6, and 7, the values
0,    14 in column A are entered, then the formulas
in columns B through E are entered, all O(1) operations. Next, the MAX() formulas are entered into
column F, another O(1) operation. Finally, the primal retrieval is added in columns G and H. The
spreadsheet has 5 distinct formulas. An instance with
b = 1 000 and n = 200 would also have only 5 distinct formulas. Therefore, this spreadsheet is of O1
constructive complexity. Within the knapsack capacity, changing the size or prot of items is trivial. To
increase the capacity, we simply select the last three

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

34

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 4

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

A Spreadsheet of O1 Constructive Complexity, to Solve the General Integer Knapsack Problem

Key Cell Formulas


Cell
C6
S6:S20
D22
D23
R23
D24

Formula

Copied to

=IF(C$5<=$A6,0,VLOOKUP(C$5-$A6,$A$6:$B$20,2)+$S6)
{=TRANSPOSE(C22:Q22)}
=MAX(D6:D20)
=E23-SUMIF($B$6:$B$20,E23-C$22,$B$6:$B$20)
=Q22
=E23-D23

C6:Q20
Single array formula.
E22:Q22
E23:Q23

E24:Q24

rows (excluding the optimal objective) and ll downwards (overwriting the objective). We then add the
objective again in column H, pointing to the last cell
in column G. To add another item requires inserting a
column, perhaps between E and F, and then checking
the MAX() formula. Thus, this model can be revised
in O(1) time.
The thoughtful reader may observe that the data
entry in the previous model, Figure 4, appears to be
Ob, while the data entry in this model, Figure 5,
is On. However, we have assumed that the rectangular ll in Figure 4, A6:B20, could be done in O(1)
time. Then, if B6:B20 in Figure 4 were initialized with
zeroes, the data entry would be only On, which is
sunk.
Our next example knapsack, Figure 6, was suggested by an anonymous referee, and is a signicant improvement from an earlier draft. The primal
retrieval is original. The spreadsheet has just one dis-

tinct formula for the recursion, and this formula is


reasonably short. Primal retrieval requires two more
formulas. The spreadsheet needs only 3b total formulas. As such, it may prove useful to the practitioner. The one distinct formula relies on lookup and
array functions. The user enters item prots in column B, for each possible size in column A. Changing the prots or sizes of items up to the capacity
is trivial. Compared with the previous example, the
dimension of items has been collapsed into an array
formula.
The formula in column D uses SUMIF() to test for
the optimal item. This was suggested by a referee,
and is an improvement over our previous version.
The SUMIF() returns the prot of the optimal item.
We leave the explanation of the other formulas to the
reader, since they are quite simple. The formulas follow directly from the recursion algebra.

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

35

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 5

A Spreadsheet That Implements the Gilmore and Gomory


Knapsack Recursion

Figure 6

A Spreadsheet with Only One Distinct Formula, to Solve


the General Integer Knapsack Problem

Key Cell Formulas


Key Cell Formulas

Cell

Formula

Copied to

{=MAX(LOOKUP(A8-A$7:A7,A$7:A8,B$7:B8)
+C$7:C7)}
=D9-SUMIF($B$7:$B$21,D9-C7,$B$7:$B$21)
=C21
=D8-D7

C9:C21

Cell

Formula

Copied to

C8

B11

=IF($A11<B$6,0,B$7+INDEX($F$10:
$F10,1+$A11-B$6))
=MAX(B12:E11)
=G12-SUMIF($B$7:$E$7,G12-F10,
$B$7:$E$7)
=F24
=G12-G11

B12:E24

D8
D21
E8

F11
G11
G25
H11

F12:F24
G12:G24

H12:H24

Modifying this spreadsheet is straightforward, but


we need to keep in mind how a dynamic program
works. To add capacity, insert rows between rows 20
and 21. Starting with rows 17, 18, and 19, use the ll
handle to ll data and formulas downwards. Cell D20
points to the last row, so this needs to be overwritten.
Thus, the spreadsheet can be written or revised in
O(1) time.
We see that this implementation uses fewer cells
than earlier versions, Ob cells, rather than Ob 2  or
Onb cells. If we wish to try to solve a large knapsack
problem in Excel, we are more likely to succeed with
the better implementation. We were able to solve a
knapsack with capacity 1,000, with random prots, in
just a few seconds.
The last knapsack DP implementation lists the arcs
as a database. The database lookup function, DMAX()
serves in place of the MAX() function used above.
In his book on Lotus 1-2-3, LP, and DP, Ho (1987)
observes a connection between DPs and databases.
In dynamic programming, you do not actually write
down a problem in the process of formulating it.
Instead, you identify aspects of the problem which
will serve as the key components in a DP model. In

D9:D20

E9:E21

this sense, a DP model looks more like a database


than a clear-cut optimization problem.
This spreadsheet works very much as the previous ones, with a different layout and different
lookup functions. It enumerates all possible arcs in
the dynamic program as a one-dimension list. Consider the selected cell, C16, in Figure 7. The formula nds the value $12 for an item of size 40,
then adds this value to the best current objective $0
for a capacity of 0. This value of $12 is recorded
in the Max Prot row 8, cell F8, for use in later
calculation.
This spreadsheet rst requires an O(1) operation to
create the table in rows 5 to 8. The spreadsheet has
four distinct formulas, all of which are O(1) operations, as shown in Figure 7. This spreadsheet therefore has constructive complexity of O(1). The user
enters the object prots in row 7 for each possible
size in row 6. Like the rst knapsack implementation,
changing the size or prot of items below the capacity
is trivial. The user needs only to change the Prot data
in row 7. Adding t more units of capacity requires
inserting t columns in the summary table at the top,
and then copying nt + t 2 /2 t/2 rows downwards,
both of which are O1 operations. The range in each
database formula in row 8 needs to be corrected, but
this could be avoided if rows were inserted at the bottom of the database, and anyway this is also an O(1)
operation.

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

36

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 7

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

A O1 Spreadsheet That Lists the DP Arcs as a Database

Key Cell Formulas


Cell

Formula

Copied to

B8
C9
C10
A14
B14
C13

=DMAX($B$12:$C$117,$C$12,B5:B6)
=D9-SUMIF($B$7:$P$7,D9-B8,$B$7:$P$7)
=C9-B9
=IF(B13=$P$6,A13+1,A13)
=IF(A14=A13,B13+1,A14+1)
=HLOOKUP(-A13+B13,$B$6:$P$7,2)+HLOOKUP(A13,$B$6:$P$8,3)

C8:P8
D9:P9
C11:P10
A15:A117
B15:B117
C14:C117

If the node label formulas in A13:B117 are converted to values, then the database can be sorted
various ways. The modeler may want to think about
dominated arcs, could graph the data in various ways,
and could export the arcs to a linear programming
solver such as AMPL or Lingo, perhaps with another
column for the arc cost. Such a formulation could be
used to solve the one-dimensional cutting stock problem using the formulation of Dyckhoff (1981) or de
Carvalho (1998).

4.

The Batch Scheduling


ProblemMinimizing Makespan
for a Single Batching Machine

In this problem, a batch processing machine can process a set of jobs simultaneously. The jobs are processed together to form a batch. The processing time
of a batch is the maximum of the processing times
of jobs belonging to it. We assume that a job cannot
be preempted. If the number of jobs that can be processed simultaneously in a batch is greater than or
equal to the number of jobs, then the problem can

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

37

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

be solved in polynomial time (Lee 1999) by using DP.


More precisely, the problem can be solved in On2 
time, where n is the number of jobs.
Let ri be the release date of job i and pi be its
processing time. If job i is released before job j, and
has a smaller processing time than j, then i can be
scheduled in the same batch as j without increasing the makespan. Therefore, we need to consider
only the case where jobs that come later have smaller
processing times, and we can suppose for the implementation that r1 r2 rn and p1 > p2 > > pn .
The batch scheduling problem follows a standard
forward recursion, where the optimal makespan of
the k rst jobs in the optimal schedule is f k:
f 0 = 0

f k = min maxf k i rk  + pki+1 


1ik

for 1 k 0
Figure 8a presents the DP network.
Applying this renement method gave a rst version based on On constructive complexity. Formulas
are simple since only MAX and MIN functions have
been used. Cell H6 equals =MAX(H4,C11)+D5. Cell
G7 equals =MAX(G4,C11)+D5 and is copied to H7.
Cell F8 equals =MAX(F4,C11)+D5, and it is used to
ll cells G8:H8, and so on for cells E9, D10. As previously, column f k contains the minimum value for
the D6:D10, and the ll handle is used through cells
E11:H11. The constructive complexity is On since
n + 1 formulas have been entered. This version is presented in Figure 9, without the primal retrieval which
we leave for later.
To improve the constructive complexity, we give
a rened implementation of the spreadsheet DP in
Figure 8c. In Figure 8c, cell C10 is copied by using the
ll handle in all cells belonging to the range C10:G14.
If k i < 0, then the innity value (set for this example
to 1,000) is used as an upper bound of the objective
function. The formula in C15 calculates the minimum
value of C10:C14, and is copied through cells D15:G15.
Thus, the constructive complexity is O(1) since only
two different formulas have been entered. The second part of the spreadsheet is dedicated to the primal
retrieval, implemented in C17:G21. Rows 17 and 18
perform a backward recursion. For instance, the value
4 in F17 indicates that 4 jobs belong to the last batch,
and the remaining jobs in a batch are counted backwards in row 18. Row 17 denes an array formula
that uses the intermediate computations performed in

range C10:G15 of the forward recursion table to detect


the number of jobs in a batch:
Cell C17 is dened by {=IF(D18>0,0,MAX(IF(C15=
C10:C14,$B$10:$B$14,0)))} and copied upwards from
D17 to G17.
Cell C18 is dened by =IF(C17=0,IF(D18=0,0,
D18-1),C17-1) and is copied upwards from D18
to G18.
Start and nish times of jobs are calculated in
C20:G21, using array formulas.
A careful audit of this spreadsheet reveals several circular references. For instance, C15 depends
on C10 and vice versa due to the index function in
$C$15:$G$15. Nevertheless, this circular reference is
not detected while entering the formula in the cells.
This is the same kind of hidden circular reference
in the CPM spreadsheet presented in Figure 2. Thus,
a spreadsheet can contain circular references and still
be correct.
A reverse engineering of the rst version shows that
the underlying recursion formula has been simplied:
f 0 = 0

and

f k = min maxf i rk  + pi+1 


0ik1

for 1 k n

Figure 8a

DP Network for Scheduling a Batching Machine

Figure 8b

Second Version of Scheduling a Batching Machine

Key cell formulas


Cell

Formula

Copied to

H6
G7
F8
E9
D10
D11

=MAX(H4,C11)+D5
=MAX(G4,C11)+D5
=MAX(F4,C11)+D5
=MAX(E4,C11)+D5
=MAX(D4,C11)+D5
=MIN(D6:D10)

H7
G8:H8
F9:G9
E10:G10
E11:G11

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

38

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 8c

A Rene Version of Scheduling a Batching Machine

Key cell formulas


Cell

Formula

C10
C15
C17
C18
D19
C20
C21

=IF(C$7<$B10,$B$6, MAX(INDEX($B$15:$G$15,1,C$7-$B10+1),C$8) +INDEX($C$9:$G$9,1,C$7-$B10+1))


=MIN(C10:C14)
{=IF(D18>0,0, MAX(IF(C15=C10:C14,$B$10:$B$14,0)))}
=IF(C17=0,IF(D18=0,0,D18-1),C17-1)
=IF(C17=0,C19,C19+1)
{=MAX(IF(C19=$C$19:$G$19,$C$15:$G$15,0))}
{=C20-MAX(IF(C19=$C$19:$G$19,$C$9:$G$9,0))}

As a consequence, reasoning about organization of


Excels computations in the spreadsheet leads the student to a better understanding of recursion formulas
by analysing computation dependencies in the DP
network. We have shown through this example a
slight simplication of the original recursion formulas
proposed in Lee (1999) (but with the same computational complexity).

5.

Minimizing the Weighted Number


of Late Jobs on a Single Machine

Suppose jobs have to be scheduled on a single


machine without preemption. A job i cannot start
before time 0 and must be processed for pi units of
time (if possible) before its deadline di . For such a
problem, meeting deadlines is usually achieved by
scheduling jobs in non-decreasing order of their deadlines. This rule, called Earliest Due Date (EDD), minimizes the maximum lateness. Minimizing the number
of late jobs is achieved by applying EDD, but if one
deadline is not met for the current scheduled job, then
the longest job already scheduled is moved to the
end of the schedule. When jobs have different levels
of importance (modeled as an integer wi , then the

Copied to
C10:G14
D15:G15
D17:G17
D18:G18
E19:G19
E20:G20
F19:F22

objective function becomes the weighted number of


late jobs. There exist pseudo-polynomial time algorithms to solve the problem. The presented DP is due
to Lawler and Moore (1969) and runs in Onpj  time,
thus in pseudo-polynomial time.
The problem is solved by using recursion formulas
for calculating the minimum weighted number of late
jobs in the interval 0 t while considering the j rst
jobs. Let Uj t = 1 if job j is completed by its deadline
in the interval 0 t, and Uj t = 0 if job j is late or
not scheduled in the interval 0 t. Let fj t denote
the maximum criterion value for j jobs, subject to the
constraint that the total processing time of the on-time
jobs is at most t. Then the recursion formula can be
based on the following performance measure:

wi Ui t
fj t = min
i=1j

Stages are associated with jobs and the state variable


is time t. The method assumes that jobs are indexed in
non-decreasing order of their due dates. Then recursion formulas are:
f0 t = 0

for j = 0    n and t 0

fj t = 

for j = 0    n and t < 0

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

39

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

fj t = minfj1 t pj  fj1 t + wj 


for j = 1    n and 0 t dj

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

fj t = fj dj 

for j = 1    n and t > dj 

To start, we need an upper bound


T of the makespan.
A simple upper bound is T = i=1n pj . The optimal value is computed by fn T . The implementation
leads to the network diagram in Figure 10a, and the
spreadsheet in Figure 10b.
The interpretation of the network is as follows. Each
vertical column of nodes is a stage, corresponding
to a job. Each column of nodes is a state for time
t = 0    22. The arc from (stage, state) = j 1 t pj 
to (stage, state) = j t has length pj and cost 0. What
decision action is this? This arc means Start job j at
time t, to get it done on time. The arc from node
j 1 t to node j t has length 0 and cost wj . This
means Let job j1 be late. Put job j at the end of the
schedule. The label shown on each node j t is the
total lateness penalty weight fj t for decisions made
up to that point. For example, in the worst case, the
straight horizontal lines from left to right at time 0, 0
to 10 to 60 to 65 to 74, means all jobs are late, and we
incur a total penalty of 74.
The forward recursion is implemented with
only one distinct formula. Thus, the constructive
Figure 10a

complexity is O(1). The main difculty is nding


values from previous iterations of the DP, using the
INDEX function, in cell C9. We consider two cases.
Case 1: the deadline of the rst job is greater than or
equal to the current date d1 < t. Case 2: d1 t.
If d1 < t then the objective value is f d1 , where
d1 is the deadline of the rst job f1 t = f1 d1 . This
is calculated by using an index function that seeks a
value in the table:
INDEX(C$9:C9,C$8+1,1)
If d1 t, then we have to calculate: fj t = minfj1
t p fj1 t+wj . This is implemented by the nested
IF function IF($A9<C$7) that checks if t pi is negative or not. If it is negative then fj t =  otherwise it can be evaluated. The rst term in the min()
statement is calculated as follows: C$6+B9 and calculates fj1 t + wj . The second term corresponds
to: INDEX(B$9:B9,$A9+1-C$7,1) and thus calculates
fj1 t p.
With these two cases, we obtain the following
formula:
IF($A9<C$7,C$6+B9,MIN(C$6+B9,INDEX(B$9:B9,
$A9+1-C$7,1)))

DP Network Diagram for Minimizing the Weighted Number of Late Jobs on a Single Machine. The Optimal Path Is Shown in the Dotted
Lines. Jobs 1, 2, and 4 Are on Time. Job 3 Is Late and Incurs a Penalty of 5

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

40

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 10b

Spreadsheet DP for Minimizing the Weighted Number of Late Jobs on a Single Machine

Key cell formulas


Cell

Formula

Copied to

C9
C34
C35
C36
C37
C38
G38
G39
C39

=IF($A9<=C$8, IF($A9<C$7,C$6+B9, MIN(C$6+B9,INDEX(B$9:B9,$A9+1-C$7,1))), INDEX(C$9:C9,C$8+1,1))


=C6+INDEX(B9:B31,C37+1,1)
=INDEX(C9:C31,C37+1,1)
=C35<>C34
=MIN(D38,C8)
=C37-IF(C36,C7,0)
=F8
=SUM(C7:F7)
=IF(C36,C37,MAX(D39:$F39)+C7)

C9:F32
D34:F34
D35:F35
D36:F36
D37:F37
D37:F37

C39:F39

Thus, the complete formula in cell C9 is:


=IF($A9<=C$8, IF($A9<C$7,C$6+B9, MIN(C$6+B9,
INDEX(B$9:B9,$A9+1-C$7,1))),INDEX(C$9:C9,
C$8+1,1))
The optimal value is found in the last cell of the range
that stores fn T .
Backward recursion allows us to calculate a
schedule of jobs completed by their deadlines. Thus,

it determines starting times of jobs, their completion


times and the total weighted number of jobs completed by their deadlines. If a job has its starting
time equal to its completion time in backward recursion, then such a job is not completed by its deadline and will be scheduled after all on-time jobs (the
corresponding starting time is not computed in the
backward recursion). This is achieved by using all
fj t values computed in the forward recursion. Jobs

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

41

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

are considered in the same order as in the forward


recursion, from index 1 to n. The computation starts at
time t = dn (dened in cell G38). Then job j can nish
by time mint dj . If the job j is late fj t = fj1 t+wj ,
otherwise job i started at time t pj . The spreadsheet
implementation of the basic recursion scheme leads to
cells C34:F38. The cells in row 37 store possible completion times of jobs. If the job is completed by its
deadline, then row 38 stores related starting times of
jobs.
We now describe formulas in the range C34:F38. We
only deal with the last column of this range since the
formulas are copied to adjacent cells in each row. The
rst time period to consider by the backward recursion is in cell G38 (the latest deadline):
F37=MIN(G38,F8)
Then we need the values fj t calculated as follows:
F35=INDEX(F9:F31,F37+1,1)
and fj1 t + wj calculated as follows:
F34=F6+INDEX(E9:E31,F37+1,1)
The last job is on time if F34<>F35 (in cell F36).
Lastly, row tb stores the next job completion time to
consider for the next job (i.e., t or t pj according to
the status of job j). This is achieved by calculating the
starting time of the current job using the following
formula in cell F38:
F38=F37-IF(F36,F7,0)
Note that four formulas have been dened and copied
to adjacent cells. This number of formulas is not
related to the number of jobs. Thus, the constructive
complexity of the backward recursion is O(1).

6.

Venturing Into Higher Dimensions:


Minimizing the Makespan on
Parallel Unrelated Machines

Minimizing the makespan on identical parallel


machines requires an exponential time algorithm,
if the number of machines is greater than 2. But
for unrelated parallel machines, minimizing the
makespan is hard even for two-machine problems.
We consider a problem with n jobs and m unrelated
machines. The processing time of job j on machine k
is denoted pjk . We present two different DPs to solve
this problem, directly in their rened versions.
We rst present a backward DP to solve the problem, due to concepts presented in Rothkopf (1966).
Let tk be a state variable dening the time at which
machine k becomes idle. A state ej of the DP is dened
by a vector t1    tm . The jobs are considered one by

one according to their indices. Let fj t1    tm  be the


optimal makespan for the rst j jobs assuming that
machines become idle respectively at times t1    tm .
This optimal value can be dened by the following
recursion equations:
Initialization: f0 t1    tm  = 0 if t1 = = tm ,
f0 t1    tm  =  otherwise.
Recursion:
fj t1    tm  = min1km maxtk
fj1 t1    tk1 tk pjk    tm 
We assume that if tk pjk < 0 then fj1 ( ) returns
innity.
To bound the number of states, we need an upper
bound T on the makespan. T can be found by a
heuristic algorithm or by constructing a schedule by
hand. The number of different states is then dened
by the Cartesian product of 0    T
0    T

m times multiplied by the number of stages n + 1


(including one for the initalization).
Without loss of generality, we next deal with a twomachine scheduling problem. We consider four jobs
and an upper bound of the makespan equal to 9. The
spreadsheet is presented in Figure 11. Since 500 rows
have been required, most of them are hidden.
The rst step is to dene states of the DP. Stage
number j considers the j rst jobs. A state variable
denes the time at which the corresponding machine
completes a job. Two state variables are necessary
since we consider two parallel machines. Thus, for
every stage, all values of the state variables must
be enumerated from 0 to the upper bound T of the
makespan. Our case study requires 500 rows to enumerate all stages and states.
We enumerate stages and states with one formula
(cell C14) that is based on a simple number conversion. The state number in column B is encoded in
base 10. Then, it is converted in base T and the formula extracts the hundreds place for the stage number (column C), the tens place for t1 (column D) and
the units place for t2 (column E). Let x be a decimal number denoting a state number. Dening the
values of state variables can be obtained by converting x in a binary number. The arithmetic calculations are: x mod 2 is the unit place,
x/T mod T
is the tens place, and lastly for the hundreds place:

x/T 2 mod T . This principle leads to a general layout for enumerating the state space in a DP using the
index i as the position of digits (values 2, 1, and 0 in
cells C13:E13):
x/T i mod T . This formula is dened
using Excels built-in functions:
C14=MOD(ROUNDDOWN($B14/($C$9 C$13),0),
$C$9)
Columns F and G use VLOOKUP() statements to
report processing times in the main table. Columns H
to J dene the DP recursion formula. We need to

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

42

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 11

General Structure of the Backward DP

Key cell formulas


Cell

Formula

Copied to

C14
F14
H14
H114

=MOD(ROUNDDOWN($B14/($C$9 C$13),0),$C$9)
=IF($C14=0,0,VLOOKUP($C14,$B$4:$D$7,F$13+1,FALSE))
=IF(AND($D14=0,$E14=0),0,$C$10)
=IF(D114-F114<0,$C$10, MAX(D114, INDEX($J$14:$J$513,
(C114-1) $C$9 C$13+(D114-F114) $C$9+E114+1)))
=IF(E114-G114<0,$C$10, MAX(E114,INDEX($J$14:$J$513,
(C114-1) $C$9 C$13+D114 $C$9+E114-G114+1)))
=MIN(H14,I14)
{=MIN(IF(C14:C513=4,J14:J513,$C$10))}

C14:E513
F14:G513
H14:I113
H115:H513

I114
J14
D515

detail separately the initialization stage and the recursion formula implementations.
The initialization stage is dened by the formula:
f0 t1    tm  = 0 if t1 = = tm , f0 t1    tm  = 
otherwise. We dene the corresponding function in
cells H14 (copied to all cells belonging to the initialization stage), where D14, E14 are stage numbers and
C10 the makespan upper bound:
H14=IF(AND($D14=0,$E14=0),0,$C$10)
A max function is computed for each machine
(respectively in columns H and J for machines 1

I115:I513
J15:J513

and 2). Then, the minimum of H and I is the objective


function. In column H we need to compute the max
function corresponding to m = 1 (the rst machine):
maxt1 fj1 t1 pj1 t2 
t1 is in cell D114 and the processing time pj1 is in
cell F114. If t1 pj1 < 0 then the returned value is the
upper bound located in cell C10, otherwise the max
function is evaluated. In this case, we need to dene
the decimal number corresponding to the values of
the state variables. This decimal number is then used
to calculate the row corresponding to fj1 t1 pj1 t2 

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

43

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

in the array J14:J513 by using the INDEX function.


The decimal number is computed as the sum of hundreds digit (power 2 that is in cell C13), tens place and
units place, plus one since the rst row corresponds
to the decimal number 0:

Figure 12

Gantt Chart of the Solution Calculated in the


Spreadsheet DP

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

(C114-1) $C$9 C$13+(D114-F114) $C$9+E114+1


The complete formula in cell H114 is:
H114=IF(D114-F114<0,$C$10, MAX(D114,
INDEX($J$14:$J$513,(C114-1) $C$9 C$13
+(D114-F114) $C$9+E114+1)))
Now, for column I, the same principle is used to calculate the max function correspond to m = 2 (the second
machine):
maxfj1 t1 t2 pj2  t2 
t2 is in cell E114 and the processing time pj2 in cell
G114. If t2 pj2 < 0 then the returned value is the
upper bound located in cell C10, otherwise the max
function is evaluated. In this case, the decimal number
corresponding to the values of the state variables is
computed as for the rst machine. This decimal number is then used to calculate the row corresponding
to fj1 t1 t2 pj2  in the array J14:J513 by using the
INDEX function. The decimal number is computed as
the sum of hundreds digit (power 2 that is in cell C13),
tens place and units place, plus one since the rst row
corresponds to the decimal number 0:
(C114-1) $C$9 C$13+D114 $C$9+E114-G114+1
The complete formula in cell I114 is:
=IF(E114-G114<0,$C$10, MAX(E114,INDEX($J$14:
$J$513,(C114-1) $C$9 C$13+D114 $C$9
+E114-G114+1)))
The objective function in cell J114 is:
J114=MIN(H14,I14)
The makespan is the minimum cost found in states
belonging to stage n, where n is the number of jobs to
be scheduled. The makespan is calculated by an array
formula as follows:
{=MIN(IF(C14:C513=4,J14:J513,$C$10))}
Columns F and G store the processing times of the
job associated with the current stage. This will simplify the formula by avoiding VLOOKUP() formulas
in subsequent columns. The recursion is implemented
in columns H:I.

For the example with four jobs and two machines


in the spreadsheet of Figure 11, the optimal makespan
is 8 and the corresponding optimal schedule is presented in Figure 12.
Now we turn to the constructive complexity.
Exactly one formula is dened in every column of
the table except for columns H and G where two are
needed (one for the initialization stage and another
for remaining stages). Thus, the constructive complexity is related to the number of machines m,
but not to the number of jobs. Thus, the constructive complexity is Om. This spreadsheet can easily be extended to deal with more machines and a
larger upper bound. Nevertheless, even if the constructive complexity is low, some formulas are tricky.
They have to be entered carefully to avoid errors, and
are difcult to audit.
With this last example, we see the limit of implementing DPs in a spreadsheet. If the value T is large
enough (for our case study with 4 jobs and 2 machines,
the maximum number for T is 115), Excel would not
have enough rows to enumerate all possible states. Of
course, we can use several sheets or design a cyclic
pattern to continue the table in subsequent columns
of the sheet. But such an implementation would not
be a natural spreadsheet implementation. The most
natural implementation uses a classical programming
language that handles large arrays of data.
We now present a forward DP due to Horowitz
(1976), that enumerates jobs in the order of their
index. In each stage, exactly one job is assigned to a
machine. Thus, job 1 has to be assigned to a machine.
This leads to m possible assignments that will be
modeled as m different states in the rst stage. In the
next stage, job 2 is considered while considering every
state created in stage 1, leading to m2 new states. And
so on for the remaining jobs. Figure 13 presents the
state enumeration for a 2-machine scheduling problem. Every level l in the tree is associated with the
possible assignment for job l 1 (level 1 is associated
with the root of the tree).
The spreadsheet implementation is presented in
Figure 14. The columns From and To in the spreadsheet refer to node labels dened in Figure 13. The

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

44

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

Figure 13

State Space with Two Machines. The First Number on Each


Label Is the Job Number. The Second Number Is the Time
Used on Machine 1. The Third Number Is the Time Used
on Machine 2. An Arc Upwards Corresponds to Assigning
the Job to Machine 1. An Arc Downwards Corresponds to
Assigning the Job to Machine 2

Figure 14

Spreadsheet Implementation of the Forward DP

enumeration of states is based on the following formulas, copied downwards:


Machine, cell A15=1+MOD(A14,$C$4)
From node, cell B15=B14+(A14>A15)
To node, cell C15=C14+1
The job associated with the current state is dened by
the index of the job considered in the previous state
(i.e., in column From) plus one. This is achieved by
using a VLOOKUP statement. The formula in cell D15
that will be copied downwards is:
D15=VLOOKUP(B15,$C$14:$D15,2)+1
Columns E and F calculate the current makespan
of each machine. For a given node, the current
machine is dened in column A. The makespan on
a machine is dened by the makespan on the same
machine dened in the predecessor node, plus the
processing time of the current job if the machine is
the current one. This leads to the following formula
in cell E15 (to copy downwards and to column F):
E15=VLOOKUP($B15,$C$14:E15,E$13+2)+IF($A15=
E$13,VLOOKUP($D15,$A$5:$C$8,E$13+1,FALSE),0)
Lastly, the makespan of the state in row 15 is dened
as MAX(E15,F15) and then copied downwards. The
optimal makespan is calculated by a DMIN function

Key cell formulas


Cell

Formula

Copied to

G9
G10
A15
B15
C15
D15
E15

=$A$8
=DMIN(D13:G45,COLUMNS(D13:G13),G8:G9)
=1+MOD(A14,$C$4)
=B14+(A14>A15)
=C14+1
=VLOOKUP(B15,$C$14:$D15,2)+1
=VLOOKUP($B15,$C$14:E15,E$13+2)
+IF($A15=E$13,VLOOKUP($D15,$A$5:
$C$8,E$13+1,FALSE),0)
=MAX(E15:F15)
=AND(D15=$A$8,G15=$G$10)

A16:A45
B16:B45
C16:C45
D16:D45
E15:F45

G15
H15

G16:G45
H16:H45

as in the backward DP. To detect if the current state


is an optimal one it is sufcient to calculate cell H15:
H15=AND(D15=$A$8,G15=$G$10)
Thus, the constructive complexity is O(1), whereas the
computational complexity becomes Omn .
We now show that the DP presented in Figure 14
can easily be extended. To insert a new machine:
(1) Move C4:C8 one column to the right.
(2) Copy D4:D8 back to C4:C8. Fix the data and
labels.
(3) Insert a column between E and F.

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

(4) Copy E12:E44 to F12.


(5) Fix the column labels in F13:G13.
To insert a new job:
(1) Insert a row between rows 7 and 8.
(2) Add the data and correct the job numbers in
column A of the Process time table.
(3) Insert new rows near the bottom, and copy the
row of formulas downwards.
(4) Make sure the Jobs column reaches the number of jobs +1, else go back to step 3.
We can easily delete machines or jobs by deleting
rows or columns and correcting the labels. But if we
increase the number of jobs or machines, the spreadsheet gets huge. Still, these modications on a job
or a machine require O(1) constructive complexity,
because the number of keystrokes required to change
the spreadsheet is independent of the size of the
model.
This spreadsheet provides a relatively easy way to
do primal retrieval, visually: the user has to locate the
leaf corresponding to the optimal value and then consider the path from that node to the root in Figure 13.
There is only one path from the root to every leaf in
the decision tree and it contains all taken decisions.

7.

Summary

In this paper, we have demonstrated a variety of


dynamic programming structures in Excel. These
examples will be of benet to teachers and students
in operations research, and to the practitioner. We
have also studied each model using the concept of
constructive complexity, which allows a quantitative
approach to studying the structure of a spreadsheet.
We found that reducing the constructive complexity
of a spreadsheet is usually helped with advanced
spreadsheet features, such as lookups, array functions, and circular references. We hope that operations researchers can analyze their own spreadsheets
in similar ways, to produce better spreadsheets, and
to teach good spreadsheet design.
Acknowledgments

We would like to thank the referees for their efforts in


helping us improve the spreadsheets and the text of this
paper. Their comments and suggestions were invaluable,
and helped the presentation considerably.

Appendix

Visual Basic code for the Gilmore-Gomory algorithm.


Warning: this runs in about 50 times the time of the example
from Figure 7.
Option Explicit
Gilmore Gomory knapsack algorithm.
Public Function knapsack(capacity As Integer, sizes
As Range, prots As Range)
Dim i, j, currentcapacity

45

Dim totalprot() As Single Starts at zero.


ReDim totalprot(1 + capacity) As Single
Dim currentprot As Single
Dim solution() As Integer
Starts at zero, but rst element is item 1.
ReDim solution(sizes.Cells.Count) As Integer
Do some checks.
If sizes.Cells.Count <> prots.Cells.Count Then knapsack
= Application.Na()
For j = 1 To sizes.Cells.Count
If sizes(j).Value <= 0 Or sizes(j).Value <> Application.
Round(sizes(j).Value, 0) Then
knapsack = Application.Na()
Return
End If
Next j
Do the recursion.
For i = 0 To capacity
For j = 1 To sizes.Cells.Count
If i + sizes(j).Value <= capacity Then
totalprot(i + sizes(j).Value) = Application.
Max(totalprot(i + sizes(j).Value), totalprot(i)
+ prots(j).Value)
End If
Next j
Next i
Primal retrieval
currentprot = totalprot(capacity)
currentcapacity = capacity
Do While currentcapacity > 0
For j = 1 To sizes.Cells.Count
If totalprot(currentcapacity) = totalprot
(currentcapacity sizes(j).Value) + prots(j).Value Then
solution(j 1) = 1 + solution(j 1)
currentcapacity = currentcapacity sizes(j).Value
Exit For
End If
Next j
Loop
Put in the right orientation.
If sizes.Rows.Count = 1 Then
knapsack = solution
Else
knapsack = Application.WorksheetFunction.
Transpose(solution)
End If
End Function

References
Bather, J. 2000. Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions. Wiley, Chichester, W. Sussex,
England; New York.
Beckmann, M. J. 1968. Dynamic Programming of Economic Decisions.
Springer-Verlag, Berlin & New York.
Bellman, R., S. E. Dreyfus. 1962. Applied Dynamic Programming.
Princeton University Press, Princeton, NJ.
Bertsekas, D. P. 2001. Dynamic Programming and Optimal Control,
2nd ed. (Volumes 1 and 2), Athena Scientic.
de Carvalho, J. M. Valrio. 1998. Exact solution of cutting stock
problems using column generation and branch-and-bound.
Internat. Trans. Oper. Res. 5(1) 3544.
Cooper, L., M. W. Cooper. 1981. Introduction to Dynamic Programming, 1st ed. Pergamon Press, Oxford.

Downloaded from informs.org by [116.102.89.57] on 19 August 2016, at 07:10 . For personal use only, all rights reserved.

46

Raffensperger and Richard: Implementing Dynamic Programs in Spreadsheets

Denardo, E. V. 2003. Dynamic Programming: Models and Applications.


Dover Publications.
Dreyfus, S. E., M. L. Averill. 1977. The Art and Theory of Dynamic
Programming. Academic Press, New York.
Dyckhoff, H. 1981. A new linear programming approach to the cutting stock problem. Oper. Res. 29(6, NovDec).
Gilmore, P. C., R. E. Gomory. 1966. The theory and computation of
Knapsack functions. Oper. Res. 14(1) 10451074.
Gluss, B. 1972. An Elementary Introduction to Dynamic Programming:
A State Equation Approach. Allyn and Bacon, Boston.
Hadley, G. 1970. Nonlinear and Dynamic Programming. AddisonWesley, Reading, Mass.
Hillier, F. S., M. S. Hillier, G. J. Lieberman. 2000. Introduction to
Management Science. Irwin-McGraw-Hill, Boston.
Ho, J. K. 1987. Linear and Dynamic Programming with Lotus 1-2-3,
Release 2. Management Information Source, Portland, OR.
Horowitz, E., S. Sahni. 1976. Exact and approximate algorithms
for scheduling non-identical processors. J. Assoc. Comput.
Machinery 23(2) 317327.
Jensen, P. A., J. F. Bard. 2002. Operations Research Models and Methods.
John Wiley and Sons, 2002. http://www.me.utexas.edu/
jensen/ORMM/index.html.
Lawler, E. L., J. M. Moore. 1969. A functional equation and its application to resource allocation and sequencing problems. Management Sci. 16(1) 7784.
Lawrence, J. A., B. A. Pasternack. 1998. Applied Management Science,
Chapter 16. John Wiley and Sons, New York.
Lee, C. Y., R. Uzsoy. 1999. Minimizing makespan on a single batch
processing machine with dynamic job arrivals. Internat. J. Production Res. 37(1) 219236.
Nemhauser, G. L. 1966. Introduction to Dynamic Programming. Wiley,
New York.
Neuwirth, E. 1995. Spreadsheet structures as a model for proving
combinatorial identities. J. Comput. Math. Sci. Ed. 14(3) 419434.

INFORMS Transactions on Education 5(2), pp. 2546, 2005 INFORMS

Puterman, M. L., ed. 1978. Dynamic Programming and Its Applications. Proc. Internat. Conf. Dynamic Programming Its Appl. (University of British Columbia, Vancouver, British Columbia, Canada,
April 1416, 1977), Academic Press, New York.
Ragsdale, C. T. 2003. A new approach to implementing project networks in spreadsheets. INFORMS Trans. Ed. 3(3), http://ite.
pubs.informs.org/Vol3No3/Ragsdale/index.php.
Rothkopf, M. H. 1966. Scheduling independent tasks on parallel
processors. Management Sci. 12(5) 437447.
Sarmiento, A. 2003. Dynamic programming solver. shareware,
http://www.geocities.com/p4software/.
Seal, K. C. 2001. A generalized PERT/CPM implementation in a
spreadsheet. INFORMS Trans. Ed. 2(1), http://ite.pubs.informs.
org/Vol2No1/Seal/Seal.php.
Sedgewick, R. 1988. Algorithms, 2nd ed. Addison-Wesley, Reading,
Mass.
Sniedovich, M. 1992. Dynamic Programming. M. Dekker,
New York.
Wagner, H. M. 1975. Principles of Operations Research: With Applications to Managerial Decisions, 2d ed. Prentice-Hall, Englewood
Cliffs, NJ.
Weiss, M. A. 1993. Data Structures and Algorithm Analysis.
Benjamin/Cummings Publishing, Redwood City, CA.
Winston, W. 2003. Operations Research: Applications and Algorithms,
4th ed. Duxbury Press, Belmont, CA.

To reference this paper, please use:


Raffensperger, J. F. and R. Pascal. (2005), Implementing Dynamic
Programs in Spreadsheets, INFORMS Transactions on Education, Vol. 5, No 2, http://archive.ite.journal.informs.org/Vol5No2/
RaffenspergerRichard/.