


Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

Transportation Science
Publication details, including instructions for authors and subscription information:
http://pubsonline.informs.org

Approximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems
Dimitri J. Papageorgiou, Myun-Seok Cheon, George Nemhauser, Joel Sokol

To cite this article:


Dimitri J. Papageorgiou, Myun-Seok Cheon, George Nemhauser, Joel Sokol (2015) Approximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems. Transportation Science 49(4):870-885. https://doi.org/10.1287/trsc.2014.0542


Copyright © 2014, INFORMS

Vol. 49, No. 4, November 2015, pp. 870–885
ISSN 0041-1655 (print) — ISSN 1526-5447 (online) http://dx.doi.org/10.1287/trsc.2014.0542
© 2015 INFORMS

Approximate Dynamic Programming for a Class of Long-Horizon Maritime Inventory Routing Problems
Dimitri J. Papageorgiou, Myun-Seok Cheon
Corporate Strategic Research, ExxonMobil Research and Engineering Company, Annandale, New Jersey 08801
{dimitri.j.papageorgiou@exxonmobil.com, myun-seok.cheon@exxonmobil.com}

George Nemhauser, Joel Sokol


H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332
{gnemhaus@isye.gatech.edu, jsokol@isye.gatech.edu}

We study a deterministic maritime inventory routing problem with a long planning horizon. For instances
with many ports and many vessels, mixed-integer linear programming (MIP) solvers often require hours to
produce good solutions even when the planning horizon is 90 or 120 periods. Building on the recent successes of
approximate dynamic programming (ADP) for road-based applications within the transportation community,
we develop an ADP procedure to generate good solutions to these problems within minutes. Our algorithm
operates by solving many small subproblems (one for each time period) and by collecting information about how
to produce better solutions. Our main contribution to the ADP community is an algorithm that solves MIP
subproblems and uses separable piecewise linear continuous, but not necessarily concave or convex, value
function approximations and requires no off-line training. Our algorithm is one of the first of its kind for maritime
transportation problems and represents a significant departure from the traditional methods used. In particular,
whereas virtually all existing methods are “MIP-centric,” i.e., they rely heavily on a solver to tackle a nontrivial
MIP to generate a good or improving solution in a couple of minutes, our framework puts the effort on finding
suitable value function approximations and places much less responsibility on the solver. Computational results
illustrate that with a relatively simple framework, our ADP approach is able to generate good solutions to
instances with many ports and vessels much faster than a commercial solver emphasizing feasibility and a popular
local search procedure.
Keywords: approximate dynamic programming; deterministic inventory routing; maritime transportation;
mixed-integer linear programming; time decomposition
History: Received: June 2013; revision received: January 2014; accepted: February 2014. Published online in Articles
in Advance July 21, 2014.
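
As a rough illustration of the value function approximations mentioned in the abstract, the following Python sketch evaluates a separable, continuous piecewise linear function of port inventories that is neither concave nor convex. It is not the authors' implementation: the names `pwl_eval`, `separable_value`, and `approximations`, the breakpoint data, and the choice to hold the function constant outside the breakpoint range are all assumptions made for this example.

```python
from bisect import bisect_right


def pwl_eval(x, breakpoints, values):
    """Evaluate a continuous piecewise linear function at x.

    breakpoints must be strictly increasing and values[k] is the function
    value at breakpoints[k]. Outside the breakpoint range the function is
    held constant (an assumption of this sketch).
    """
    if x <= breakpoints[0]:
        return float(values[0])
    if x >= breakpoints[-1]:
        return float(values[-1])
    k = bisect_right(breakpoints, x) - 1  # segment with breakpoints[k] <= x
    x0, x1 = breakpoints[k], breakpoints[k + 1]
    y0, y1 = values[k], values[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def separable_value(inventory, approximations):
    """Sum one-dimensional approximations, one per port (separability)."""
    return sum(
        pwl_eval(level, *approximations[port])
        for port, level in inventory.items()
    )


# Hypothetical data: port_A has slopes 0.2, 1.0, 0.2 (neither concave nor convex).
approximations = {
    "port_A": ([0, 50, 100, 150], [0, 10, 60, 70]),
    "port_B": ([0, 100], [0, -20]),
}
total = separable_value({"port_A": 75, "port_B": 25}, approximations)
```

Separability means each port's contribution can be evaluated (and later updated) independently of the others; the total is just the sum of the one-dimensional pieces.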

1. Introduction
We consider a deterministic maritime inventory routing problem (MIRP) in which a supplier is responsible for both the routing of vessels to distribute a single product and the inventory management at all ports in the supply chain network. The supplier controls a fleet of vessels for the entire planning horizon and knows inventory bounds as well as production and consumption rates at all locations. Compared to other inventory routing applications, the routing decisions in this problem are relatively simple because heterogeneous vessels make direct deliveries and fully load and fully discharge at a port. The main complexity stems from the long planning horizon considered and the wide range of travel times required to deliver product to customers.

Inventory routing problems (IRPs) arise naturally in a vendor managed inventory setting where a supplier (e.g., a vertically integrated company) manages the distribution and inventory levels of product at his customers (Campbell et al. 1998). Surveys of general IRPs for all modes of transportation are given in Andersson et al. (2010) and Coelho, Cordeau, and Laporte (2014). Idiosyncrasies and optimization models of IRPs specific to maritime settings are discussed in Christiansen and Fagerholt (2009), Christiansen et al. (2013), and Papageorgiou et al. (2014b).

There are two motivating applications for our solution methodology. First, from a strategic planning perspective, the model considered in this paper can aid in the analysis of supply chain design decisions for applications involving valuable bulk goods such as liquefied natural gas (LNG). Because the model is needed for strategic purposes, long planning horizons must be considered and methods that attempt to solve a single mixed-integer linear programming (MIP) model often become hampered as the time dimension increases. A diverse set of users, including some without formal training in operations research or optimization, frequently wants to experiment with many scenarios to understand the impact of various design choices on the
profitability of a supply chain. These decisions include fleet size and mix; long-term contracts for vessels and with customers; investment in equipment, infrastructure, and capacity limits; and other factors not present in the traditional IRP. Meanwhile, constructing a high-fidelity integrated model to address all of the issues faced by various business users within a stochastic programming or robust optimization framework is out of the question: agreement on model fidelity and the scenarios or uncertainty sets from users across different business units would be difficult to obtain. With this backdrop, the first motivation for our algorithm is to assist “experimentalist” business users interested in solving numerous instances in a small amount of time to analyze different business questions. In this setting, speed in generating good solutions trumps the importance of finding provably optimal solutions.

Second, from a tactical planning perspective, this model may be useful within a decomposition framework for a more detailed MIRP. For example, in Papageorgiou et al. (2014a), an MIRP with multiple ports per region and split pickups and split deliveries is considered. The problem is then solved using a decomposition approach in which the model is first aggregated by region, i.e., all ports within a region are thought of as one “super-port,” and solutions routing vessels between regions are generated. The model and algorithm presented here are well suited for this setting.

There are several reasons why we chose to explore an approximate dynamic programming (ADP) framework over the myriad algorithms available for deterministic IRPs. First, ADP has a proven track record of generating high-quality solutions to dynamic resource allocation problems, of which dynamic fleet management is a special case (Topaloglu and Powell 2006). When modeled as a dynamic program, our problem shares many features of the dynamic fleet management problem (DFMP). In this context, our problem evolves as a sequence of dispatching problems where in each time period, vessels in ports are available to be dispatched to another port. The key difference between our problem and the DFMP is the presence of an inventory management component. A second reason is the success ADP has enjoyed when solving large-scale dynamic problems in road-based and rail-based applications found in industry (Simão et al. 2009; Bouzaiene-Ayari et al. 2014). A third motive is that ADP has the ability to accommodate stochasticity without drastic changes to the framework or implementation. Although we only consider deterministic problems, we find great interest in being able to adapt the framework developed here to stochastic variants of the underlying deterministic problem.

The primary contributions of this paper are the following: (1) The development of an ADP algorithm for a class of long-horizon maritime inventory routing planning problems that outperforms well-known approaches including a popular MIP-based local search heuristic and a commercial MIP solver emphasizing feasibility. (2) An ADP approach that uses separable piecewise linear continuous, but not necessarily concave, value function approximations and requires no off-line training. (3) Further evidence that the added complexity of solving MIP subproblems has the potential to yield good solutions.

This paper is organized as follows. Section 2 contains a literature review of research germane to maritime transportation and ADP. In §3, we present a detailed description of our problem along with a MIP formulation and a dynamic programming formulation for it. In §4, we provide our solution methodology using an ADP framework. Finally, computational results in §5 illustrate the effectiveness of our ADP approach.

2. Literature Review

2.1. Maritime Applications
From an application perspective, this paper focuses on long-horizon MIRPs such as those arising in the LNG industry, which are known as LNG-IRPs. An MIRP can be defined as “a planning problem where an actor has the responsibility for both the inventory management at one or both ends of the maritime transportation legs, and for the ships’ routing and scheduling” (Christiansen et al. 2013, p. 475). Using this definition, previous approaches applied to LNG-IRPs can be divided into two groups based on whether the actor has control of both the production and consumption ports or just one of the two. Rakke et al. (2011, 2015), Stålhane et al. (2012), and Halvorsen-Weare and Fagerholt (2013) treat the case where the actor only has control of production by attempting to generate annual delivery plans for the world’s largest LNG producer. The producer has to fulfill a set of long-term customer contracts. Each contract either outlines monthly demands or states that a certain amount of LNG is to be delivered fairly evenly spread throughout the year to a given consumption port. Over- and under-deliveries are accepted but incur a penalty. In contrast, there are also LNG-IRPs that arise for vertically integrated companies who have control of both the production and consumption side of the supply chain (Grønhaug and Christiansen 2009; Grønhaug et al. 2010; Fodstad et al. 2010; Goel et al. 2012, 2014; Shao et al. 2014). In some applications, the opportunity to sell LNG in the spot market using short-term contracts is also present.

Several solution methods for the case when the actor only has control of production have been investigated. Rakke et al. (2011) propose a rolling horizon heuristic in which a sequence of overlapping MIP subproblems are solved. Each subproblem involves at most three
months of data and consists of a one-month “central period” and a “forecasting period” of at most two months. Once a best solution is found (either by optimality or within a time limit), all decision variables in the central period are fixed at their respective values, and the process “rolls forward” to the next subproblem. Stålhane et al. (2012) propose a construction and improvement heuristic that creates scheduled voyages based on the availability of vessels and product while keeping inventory feasible. Halvorsen-Weare and Fagerholt (2013) study a simplified version of the LNG-IRP where cargoes for each long-term contract are pregenerated with defined time windows, and the fleet of ships can be divided into disjoint groups. The problem is decomposed into a routing subproblem and a scheduling master problem where berth, inventory, and scheduling decisions are handled in the master problem, and routing decisions are dealt with in the subproblem. Unlike branch-and-price, the subproblems are solved only once. Most recently, Rakke et al. (2015) developed a branch-price-and-cut approach that relies on delivery patterns at the customers.

Solution techniques for MIRPs faced by a vertically integrated company are presented in Papageorgiou et al. (2014b), where MIRPs with inventory tracking at every port are surveyed. Grønhaug et al. (2010) introduce a branch-and-price method in which the master problem handles the inventory management and the port capacity constraints, and the subproblems generate the ship route columns. Fodstad et al. (2010) solve an MIP directly while Uggen, Fodstad, and Nørstebø (2013) present a fix-and-relax heuristic. Goel et al. (2012) present a simple construction heuristic and adapt the local search procedure of Song and Furman (2013) to generate solutions to instances with 365 time periods. Their model seeks to minimize penalties and does not consider travel costs.

2.2. Approximate Dynamic Programming
Over the past few decades, ADP has emerged as a powerful tool for certain classes of multistage stochastic dynamic problems. The monographs by Bertsekas and Tsitsiklis (1996), Sutton and Barto (1998), and Powell (2011) provide an introduction and solid foundation to this field. In the last decade, Powell (2011) and his associates have successfully applied ADP to large-scale applications arising in transportation and logistics.

Our work builds on the ideas presented by Powell and his associates in the context of stochastic dynamic resource allocation problems. Dynamic fleet management problems are a special case in this problem class. When modeled as MIPs, these problems take place on a time-space network involving location-time pairs. Service requests (demands for service) from location i to location j appear over time (randomly, in the stochastic setting), and profit is earned by assigning vehicles of different types to fulfill these service requests. Myopically choosing the vehicle type that maximizes the immediate profit is often not best over a longer horizon. Empty repositioning is also a key issue.

Our point of departure is the class of the DFMPs studied in Godfrey and Powell (2002a, b), Topaloglu and Powell (2006), and Topaloglu (2006, 2007). Godfrey and Powell (2002a) study a stochastic DFMP in which requests for vehicles to move items from one location to another occur randomly over time and expire after a certain number of periods. Once a vehicle arrives at its destination node (location-time pair), it is available for servicing another request or for traveling empty to a new location. A single vehicle type with single-period travel times is considered, and an ADP algorithm that uses a separable piecewise linear concave value function approximation is shown to yield strong performance. This work is extended in Godfrey and Powell (2002b) to handle multiperiod travel times between locations. Further extensions are made to allow for deterministic multiperiod travel times with multiple vehicle types (Topaloglu and Powell 2006), random travel times with a single vehicle type (Topaloglu 2006), and random travel times with multiple types (Topaloglu 2007). In all of these studies, separable piecewise linear concave value function approximations are used and shown to work well.

There are two important observations to make regarding the above papers. First, they all treat dynamic fleet management problems, not inventory routing problems. In the dynamic fleet management setting, tasks occur over time and must be completed before some expiration period. In the IRP setting, the notion of a task does not exist since product is being continuously produced and consumed. Consequently, for the DFMP, the movement of vehicles is critical, whereas the amount of a product on the vehicles or at each location is not an issue and, therefore, is not modeled. Second, they all use value function approximations that are only a function of the vehicle state. That is, they value the number of each vehicle type that will be available at each location over future time periods.

In contrast, Toriello, Nemhauser, and Savelsbergh (2010) use value function approximations that are a function of the inventory state at each location to address a deterministic IRP with a planning horizon of 60 periods. Their problem involves a fleet of homogeneous vehicles that transport a single product between a single loading region and a single discharging region. Each region may have multiple ports. They assume that (1) the interregional travel time is a constant regardless of which location is last visited in the loading region and which location is the first visited in the discharging region and that (2) all locations visited in a region by the same vehicle are visited in the same time period. With these assumptions, the problem reduces to an IRP
with single-period travel times. In addition, after traveling from the loading region to the discharging region, vehicles exit the system because they are assumed to behave like voyage-chartered vessels as in Furman et al. (2011), Song and Furman (2013), Engineer et al. (2012), and Hewitt et al. (2013). They solve a nontrivial MIP in each subproblem. They employ separable piecewise linear concave value functions of the inventory to generate high-quality solutions much faster than solving a large MIP model with a commercial solver. However, their value function approximation requires hours of off-line training to construct; all of our computations are performed online.

Bouzaiene-Ayari et al. (2014) tackle locomotive planning problems arising at Norfolk Southern. One of their models involves much more detail than the one presented here, and ADP is successfully employed to generate high-quality solutions. Like Toriello, Nemhauser, and Savelsbergh (2010), they also solve MIP subproblems. They update the slopes of their value function approximations using the duals of the LP relaxation.

In this paper, we extend the ideas above for the DFMP by considering a deterministic IRP with a single loading region, multiple discharging regions, and multiperiod travel times. One-way travel times range between five and 37 periods. Like Toriello, Nemhauser, and Savelsbergh (2010), we employ value function approximations that are only a function of the inventory state. However, the presence of multiple discharging regions, multiperiod travel times, and longer time horizons makes our problem arguably more complex.

3. Problem Description and Formulations
In this section, we present a MIP formulation as well as a dynamic programming formulation of the problem. We begin with a description of the problem and introduce relevant notation.

Let $\mathcal{T}$ be the set of time periods and let the full horizon be of length $T = |\mathcal{T}|$. Let $\mathcal{J}^P$ and $\mathcal{J}^C$ denote the set of production (or loading) and consumption (or discharging) ports, respectively, and let $\mathcal{J} = \mathcal{J}^P \cup \mathcal{J}^C$ be the set of all ports. Let the parameter $\Delta_j$ be 1 if $j \in \mathcal{J}^P$ and $-1$ if $j \in \mathcal{J}^C$. Each port $j$ has a berth limit $B_j$. We assume that there is exactly one port within each region (consequently, we will use the terms “port” and “region” interchangeably). Note that each port may have only one classification: loading or discharging. Let $d_{j,t}$ denote the amount of product produced or consumed at port $j$ in time period $t$. Each port has an inventory capacity of $S^{\max}_{j,t}$. The amount of inventory at the end of time period $t$ must be between 0 and $S^{\max}_{j,t}$.

Since it may not be possible to satisfy all demand or avoid hitting tank top, we include a simplified spot market so that a consumption port may buy product and a production port can sell excess inventory whenever necessary. The penalty parameter $P_{j,t}$ denotes the unit cost associated with the spot market at port $j$ in time period $t$. We assume that $P_{j,t} > P_{j,t+1}$ for all $t \in \mathcal{T}$ so that the spot market is only used as late as possible, i.e., to ensure that a solution will not involve lost production (stockout) until the inventory level reaches capacity (falls to zero). When the penalty parameters are large (like a traditional “Big M” value), inventory bounds can be considered “hard” constraints. When they are small, however, inventory bounds can be treated as “soft” constraints. This “soft” interpretation may be beneficial in strategic planning problems for several reasons. First, a user may attempt to solve an instance with a demand forecast that cannot be met by the existing fleet to understand the limitations of the current infrastructure (see also Goel et al. 2012). Second, the inventory bounds given as input may be overly conservative to make the solution more robust when, in fact, slight bound violations may be acceptable. Third, incurring a small penalty for a particular solution (as opposed to declaring it strictly infeasible) can mitigate minor unwanted effects of using a discrete-time model (Papageorgiou et al. 2014b).

Vessels travel from port to port, loading and discharging product. We assume vessels fully load and fully discharge at a port and that direct deliveries are made. Each vessel belongs to a vessel class $vc \in \mathcal{VC}$. Vessel class $vc$ has capacity $Q^{vc}$. Vessels are owned by the supplier or time-chartered for the entire planning horizon. We assume that port capacity always exceeds vessel capacity, i.e., $S^{\max}_{j,t} \ge \max\{Q^{vc} : vc \in \mathcal{VC}\}$, and that vessels can fully load or discharge in a single period. These assumptions allow vessels to load or discharge in the same period in which they leave a port so that loading and discharging decisions do not need to be explicitly modeled.

In both formulations below, it is convenient to model the problem on a time-expanded network. The network has a set $\mathcal{N}_{0,T+1}$ of nodes and a set $\mathcal{A}$ of directed arcs. The node set is shared by all vessel classes, whereas each vessel class has its own arc set $\mathcal{A}^{vc}$. The set $\mathcal{N}_{0,T+1}$ of nodes consists of a set $\mathcal{N} = \{(j,t) : j \in \mathcal{J},\ t \in \mathcal{T}\}$ of “regular” nodes, or port-time pairs, as well as a source node $n_0$ and a sink node $n_{T+1}$.

Associated with each vessel class $vc$ is a set $\mathcal{A}^{vc}$ of arcs, which can be partitioned into source, sink, waiting, and travel arcs. A source arc $a = (n_0, (j,t))$ from the source node to a regular node represents the arrival of a vessel to its initial destination. A sink arc $a = ((j,t), n_{T+1})$ from a regular node to the sink node conveys that a vessel is no longer being used and has exited the system. A waiting arc $a = ((j,t),(j,t+1))$ from a port $j$ in time period $t$ to the same port in time period $t+1$ represents that a vessel stays at the same port in two consecutive time periods. Finally,
a travel arc $a = ((j_1,t_1),(j_2,t_2))$ with $j_1 \ne j_2$ represents travel between two distinct ports, where the travel time $t_2 - t_1$ between ports is given. If a travel or sink arc is taken, we assume that a vessel fully loads or discharges immediately before traveling. The cost of traveling on arc $a \in \mathcal{A}^{vc}$ is $C_a^{vc}$.

The set of all travel and sink arcs for each vessel class is denoted by $\mathcal{A}^{vc,\mathrm{inter}}$ (where “inter” stands for “interregional”). The sets of incoming and outgoing arcs associated with vessel class $vc \in \mathcal{VC}$ at node $n \in \mathcal{N}_{0,T+1}$ are denoted by $RS_n^{vc}$ (for reverse star) and $FS_n^{vc}$ (for forward star), respectively. Similarly, $FS_n^{vc,\mathrm{inter}}$ denotes the set of all outgoing travel and sink arcs at node $n$ for vessel class $vc$. For our strategic planning problem, modeling the flow of vessel classes avoids the additional level of detail associated with modeling each individual vessel. Moreover, we found that modeling vessel classes could remove symmetry and improve solution times by more than an order of magnitude on large instances.

Assumptions: For ease of reference, we collect the assumptions made throughout this paper: (1) There is exactly one port within each region. (2) Port capacity always exceeds the capacity of the vessels; e.g., $S^{\max}_{j,t} \ge \max\{Q^{vc} : vc \in \mathcal{VC}\}$. (3) Travel times are deterministic. (4) Vessels can fully load or discharge in a single period (in other words, the time to load/discharge is deterministic and built into the travel time). (5) Production and consumption rates are known. (6) There is a single loading port as is typically the case for LNG-IRPs (Rakke et al. 2015; Halvorsen-Weare and Fagerholt 2013; Stålhane et al. 2012; Goel et al. 2012). (7) In a single time period, at most one vessel per vessel class may begin an outgoing voyage to a given discharging region or a return voyage to the loading region.

3.1. A Discrete-Time Arc-Flow MIP Model
Before describing an MIP formulation of the problem, we need to define the decision variables. Let $x_a^{vc}$ be the number of vessels in vessel class $vc$ that travel on arc $a \in \mathcal{A}^{vc}$. Let $s_{j,t}$ be the ending inventory at port $j$ in time period $t$. Initial inventory $s_{j,0}$ is given as data. Finally, let $\alpha_{j,t}$ be the amount of inventory bought from or sold to the spot market near port $j$ in time period $t$. We consider the following discrete-time arc-flow MIP model:

MIP Model

$$\max \; \sum_{vc \in \mathcal{VC}} \sum_{a \in \mathcal{A}^{vc}} \left(-C_a^{vc} x_a^{vc}\right) + \sum_{j \in \mathcal{J}} \sum_{t \in \mathcal{T}} \left(-P_{j,t}\,\alpha_{j,t}\right) \tag{1a}$$

$$\text{s.t.} \quad \sum_{a \in FS_n^{vc}} x_a^{vc} - \sum_{a \in RS_n^{vc}} x_a^{vc} = \begin{cases} +1 & \text{if } n = n_0, \\ -1 & \text{if } n = n_{T+1}, \\ 0 & \text{if } n \in \mathcal{N}, \end{cases} \qquad \forall\, n \in \mathcal{N}_{0,T+1},\ \forall\, vc \in \mathcal{VC}, \tag{1b}$$

$$s_{j,t} = s_{j,t-1} + \Delta_j \left( d_{j,t} - \sum_{vc \in \mathcal{VC}} \sum_{a \in FS_n^{vc,\mathrm{inter}}} Q^{vc} x_a^{vc} - \alpha_{j,t} \right), \qquad \forall\, n = (j,t) \in \mathcal{N}, \tag{1c}$$

$$\sum_{vc \in \mathcal{VC}} \sum_{a \in FS_n^{vc,\mathrm{inter}}} x_a^{vc} \le B_j, \qquad \forall\, n = (j,t) \in \mathcal{N}, \tag{1d}$$

$$\alpha_{j,t} \ge 0, \qquad \forall\, n = (j,t) \in \mathcal{N}, \tag{1e}$$

$$s_{j,t} \in [0, S^{\max}_{j,t}], \qquad \forall\, n = (j,t) \in \mathcal{N}, \tag{1f}$$

$$x_a^{vc} \in \{0,1\}, \qquad \forall\, vc \in \mathcal{VC},\ \forall\, a \in \mathcal{A}^{vc,\mathrm{inter}}, \tag{1g}$$

$$x_a^{vc} \in \mathbb{Z}_+, \qquad \forall\, vc \in \mathcal{VC},\ \forall\, a \in \mathcal{A}^{vc} \setminus \mathcal{A}^{vc,\mathrm{inter}}. \tag{1h}$$

The objective is to minimize the sum of all transportation costs and penalties for lost production and stockout. (We write the model as a maximization problem to coincide with the framework used in our dynamic programming formulation below, where it is typical to maximize a value function.) Constraints (1b) require flow balance of vessels within each vessel class. Constraints (1c) are inventory balance constraints at loading and discharging ports, respectively. Berth limit constraints (1d) restrict the number of vessels that can attempt to load/discharge at a port at a given time. This formulation requires that a vessel must travel at capacity from a loading region to a discharging region and empty from a discharging region to a loading region. This model does not require decision variables for tracking inventory on vessels (vessel classes), nor does it include decision variables for the quantity loaded/discharged in a given period.

This model is similar to the one studied in Goel et al. (2012). The major differences are that they do not include travel costs in the objective function; they model each vessel individually (in other words, there is only one vessel per vessel class); they model consumption rates as decision variables with upper and lower bounds; and they include an additional set of continuous decision variables to account for cumulative unmet demand at each consumption port.

3.2. Dynamic Programming Formulation
We now formulate our MIRP as a finite-horizon dynamic programming (DP) problem. It is convenient to interpret this DP representation as a sequence of dispatching problems. At each point in time, a regional manager has a set of vessels available for dispatching in his region. If enough inventory is available for a vessel to fully load or enough excess capacity is available for a vessel to fully discharge, then the manager faces three options for each available vessel: send the vessel to another region, have the vessel remain in the region, or force the vessel to exit the system.

With this interpretation in mind, we now describe the DP formulation. The state of the system at time $t$ is given by the vector tuple $(r_t, s_t)$, where
$r_t = [r^{vc}_{j,u,t}]_{j \in \mathcal{J},\, u = t,\ldots,T,\, vc \in \mathcal{VC}}$, a vector of current and future vessel positions
$s_t = [s_{j,u,t}]_{j \in \mathcal{J},\, u = t,\ldots,T}$, a vector of current and future inventory levels
$r^{vc}_{j,u,t}$ = Just before making decisions in time period $t$ (i.e., in the time $t$ subproblem), the number of vessels in vessel class $vc$ that are or will be available for service at location $j$ in the beginning of time period $u$ when decisions are made in time period $u$ ($u \ge t$)
$s_{j,u,t}$ = The number of units of inventory “available” at location $j$ at the end of time period $u$, after making and executing all decisions in the time $t$ subproblem.

Here, “available” inventory refers to inventory that is either in storage at the port (i.e., has already been discharged) or is on vessels that are at the port but have yet to discharge. The initial state of the system, i.e., inventories and vessel positions, is given. Let $s_{j,t-1,t}$ denote the initial inventory available at port $j$ in the beginning of time period $t$ prior to any events (e.g., decisions, deliveries, consumptions, etc.) taking place.

Given a time period $t$ and the state of the system, we have restrictions on the number and weighted combination of vessels that may leave a port in a given time period:

$$\sum_{vc \in \mathcal{VC}} \sum_{a \in FS_n^{vc,\mathrm{inter}}} x_a^{vc} \le B_j, \qquad \forall\, n = (j,t) \in \mathcal{N}, \tag{2a}$$

$$\sum_{vc \in \mathcal{VC}} \sum_{a \in FS_n^{vc,\mathrm{inter}}} Q^{vc} x_a^{vc} \le \begin{cases} s_{j,t-1,t} + d_{j,t} & \text{if } j \in \mathcal{J}^P, \\ S^{\max}_{j,t} - s_{j,t-1,t} - d_{j,t} & \text{if } j \in \mathcal{J}^C, \end{cases} \qquad \forall\, n = (j,t) \in \mathcal{N}. \tag{2b}$$

Constraints (2a) are berth limit restrictions (identical to Constraints (1d)) and limit the number of vessels that may take an interregional or sink arc in time period $t$. Constraints (2b) ensure that the maximum amount of inventory that can be loaded (discharged) onto all vessels leaving a port does not exceed the amount of available inventory (remaining capacity) at that port.

Next, we have to model the dynamics of the system, i.e., the transition of vessels and inventory over time. To model the flow of vessels, we have the following requirements:

$$\sum_{a \in FS_n^{vc}} x_a^{vc} = r^{vc}_{j,t,t}, \qquad \forall\, n = (j,t) \in \mathcal{N},\ \forall\, vc \in \mathcal{VC}, \tag{3a}$$

$$r^{vc}_{j,u,t+1} - \sum_{a = ((i,t),n) \in RS_n^{vc}} x_a^{vc} = r^{vc}_{j,u,t}, \qquad \forall\, n = (j,u) \in \mathcal{N} : u > t,\ \forall\, vc \in \mathcal{VC}. \tag{3b}$$

Equation (3a) states that all vessels available at time $t$ must be assigned to an arc leaving their current node, and Equation (3b) keeps track of the number of vessels in each vessel class that will become available in some future time period $u > t$. Inventory at ports is updated according to the equations

$$s_{j,u,t} = \begin{cases} s_{j,u-1,t} + d_{j,u} - \alpha_{j,u} - q^{\mathrm{out}}_{j,u} & \text{if } j \in \mathcal{J}^P, \\ s_{j,u-1,t} - d_{j,u} + \alpha_{j,u} + q^{\mathrm{in}}_{j,u} & \text{if } j \in \mathcal{J}^C, \end{cases} \qquad \forall\, n = (j,t) \in \mathcal{N},\ \forall\, u \ge t, \tag{4}$$

where $q^{\mathrm{in}}_{j,u}$ and $q^{\mathrm{out}}_{j,u}$ represent the quantity of inventory incoming to and outgoing from port $j$ at time $u$ after decisions in time $t$ have been made. Specifically, define $q^{\mathrm{out}}_{j,u} = \sum_{vc \in \mathcal{VC}} Q^{vc} \big( \sum_{a \in FS^{vc,\mathrm{inter}}_{(j,u)}} x_a^{vc} \big)$ if $u = t$ and 0 if $u > t$, and $q^{\mathrm{in}}_{j,u} = \sum_{vc \in \mathcal{VC}} Q^{vc} \big( \sum_{a \in XS} x_a^{vc} + r^{vc}_{j,u,t} \big)$ with $XS = FS^{vc,\mathrm{inter}}_{(j,u)}$ if $u = t$ and $XS = RS^{vc}_{(j,u)}$ if $u > t$. Last, before transitioning from the time $t$ subproblem to the time $t+1$ subproblem, we must initialize $s_{j,t,t+1} = s_{j,t,t}$ for all $j \in \mathcal{J}$.

Using the principle of optimality, we can write our time $t$ optimization problem as

$$V_t(r_t, s_{t-1}) = \max \; \sum_{vc \in \mathcal{VC}} \sum_{j \in \mathcal{J}} \sum_{a \in FS^{vc}_{(j,t)}} \left(-C_a^{vc} x_a^{vc}\right) - \sum_{j \in \mathcal{J}} P_{j,t}\,\alpha_{j,t} + V_{t+1}(r_{t+1}, s_t) \tag{5a}$$

$$\text{s.t.} \quad (2),\ (3),\ (4), \tag{5b}$$

$$\alpha_{j,u} \ge 0, \qquad \forall\, n = (j,u) \in \mathcal{N} : u \ge t, \tag{5c}$$

$$s_{j,u,t} \ge 0, \qquad \forall\, n = (j,u) \in \mathcal{N} : u \ge t, \tag{5d}$$

$$x_a^{vc} \in \{0,1\}, \qquad \forall\, vc \in \mathcal{VC},\ \forall\, a \in \mathcal{A}^{vc,\mathrm{inter}} : a = ((\cdot,t),(\cdot,\cdot)), \tag{5e}$$

$$x_a^{vc} \in \mathbb{Z}_+, \qquad \forall\, vc \in \mathcal{VC},\ \forall\, a \in \mathcal{A}^{vc} \setminus \mathcal{A}^{vc,\mathrm{inter}} : a = ((\cdot,t),(\cdot,\cdot)). \tag{5f}$$

Note that $V_t$ is a function of $r_t$ and $s_{t-1}$, not $s_t$. This is because we have followed the standard notation in inventory models where a variable $s_t$ denotes the ending inventory in time period $t$. Also note that we only require the inventory variables $s_{j,u,t}$ to be nonnegative; we do not require them to stay below port capacity. This is because according to our definition, $s_{j,u,t}$ represents the amount of inventory in storage or on a vessel at port $j$ in some future time period $u$ and therefore could easily exceed capacity at a port.


4. Solution Methodology
Equation (3a) states that all vessels available at time t Solving dynamic programming problems is notoriously
must transition by remaining at the same port, moving challenging because of the curse of dimensionality:
to another port, or exiting the system. Equation (3b) As the dimension of the state space grows, the time
required to solve the problem exactly grows exponentially. The MIRP studied here is no exception. Attempting to solve Bellman’s equation (5) exactly is futile. Instead, we try to solve it approximately using ADP methods.

We accomplish this by replacing the future value function $V_{t+1}$ with a suitable approximation $\hat{V}_{t+1}$ and solving the approximate problem

$$\tilde{V}_t(r_t, s_{t-1}) = \max \Big\{ \sum_{vc \in VC} \sum_{j \in J} \sum_{a \in FS^{vc}_{(j,t)}} \{ -C^{vc}_a x^{vc}_a \} - \sum_{j \in J} P_{j,t}\, \alpha_{j,t} + \hat{V}_{t+1}(r_{t+1}, s_t) \Big\} \tag{6a}$$

$$\text{s.t. (5b)–(5f).} \tag{6b}$$

Now we describe our algorithm. Pseudocode of our approach is shown in Algorithm 1. The most common ADP methods step forward in time. The decisions made in the time $t$ subproblem are guided by the current value function approximation, as shown in step 5. After a solution to the time $t$ subproblem is obtained, we typically collect information to determine what the marginal benefit would be from having an additional vessel or an additional unit of inventory available at a given port and future time. Next, we update the state of the system. Once all subproblems have been solved, a solution to the full planning problem exists and we update the value function approximations using information obtained from the current solution and from each of the subproblems.

Algorithm 1 (Basic Deterministic ADP Algorithm)
 1: Initialization: Choose an approximation $\hat{V}_t$ for all $t \in T$.
 2: for $n = 1$ to $N$ do
 3:     Initialize the state of the system $(r_1, s_0)$.
 4:     for $t = 1$ to $T$ do
 5:         Solve the time $t$ subproblem
            $\max_{x,\alpha} \sum_{vc \in VC} \sum_{j \in J} \sum_{a \in FS^{vc}_{(j,t)}} -C^{vc}_a x^{vc}_a - \sum_{j \in J} P_{j,t}\, \alpha_{j,t} + \hat{V}_{t+1}(r_{t+1}, s_t)$.
 6:         Obtain marginal value information.
 7:         Update the state of the system using Equations (3) and (4).
 8:     end for
 9:     Update the value function approximation: $\hat{V}_t \leftarrow \mathrm{Update}(\hat{V}_t, r_t, s_t, \alpha_t)$ for all $t \in T$.
10: end for
11: return The best solution found and its corresponding value function approximations.

4.1. Value Function Approximations
For dynamic resource allocation maximization problems, separable piecewise linear (PWL) concave value function approximations (VFAs) have enjoyed much success. Toriello, Nemhauser, and Savelsbergh (2010) note that PWL concave functions are appropriate for several reasons. From a modeling viewpoint, they can easily be embedded into an MIP (when solved as a maximization problem). From a practical perspective, concavity captures the diminishing returns one expects to gain from future inventories. Finally, from a theoretical perspective, they are the “closest” continuous functions to true MIP value functions, which are known to be piecewise linear, superadditive, and upper semicontinuous, but possibly discontinuous (Blair and Jeroslow 1977, 1979). Separability in space/location is also quite natural for problems in which vehicles always fully load and fully discharge at a single location (Topaloglu and Powell 2006; Topaloglu 2007). Meanwhile, separability in time is a fairly major issue and is less understood but has proven to be effective in a stream of research papers for dynamic fleet management applications (Godfrey and Powell 2002b; Topaloglu and Powell 2006; Topaloglu 2007; Ruszczynski 2010). It should be noted that ADP may struggle on DFMPs in which a load/activity can be served over multiple time periods.

In this work, we also use a value function approximation that is a separable piecewise linear continuous function, but we deviate by removing the concavity restriction. From a modeling point of view, removing this restriction offers more flexibility in the approximation. Meanwhile, this added flexibility introduces several concerns. A first concern is that the resulting optimization subproblem will be much more challenging to solve. As will be shown in our computational experiments, this turns out not to be the case for our basic implementation. General discontinuous piecewise linear functions can be modeled well using MIP techniques and, in our case, the resulting MIPs are easily solved in seconds or less. Another worry is that the lack of concavity will prolong the convergence of the algorithm. Indeed, concavity has been shown to accelerate the rate of convergence of ADP algorithms for certain problem classes (Nascimento and Powell 2009, 2013). This, too, does not appear to be a major hurdle for our algorithm. A third complexity stems from the infamous “exploration versus exploitation” question in ADP algorithms: Should we make a decision because, given our current value function approximation, it appears to be optimal, or should we explore a new state to garner additional information (see, e.g., Chapter 12 of Powell 2011)? Without concavity, it may be necessary to introduce some form of exploration.

To construct our approximation, we replace the true value function $V_{t+1}(r_{t+1}, s_t)$ with an approximate value function that depends only on future inventory levels at discharging ports:

$$\hat{V}_{t+1}(s_t) = \sum_{j \in J^C} \sum_{u \ge t+\tau_j} \hat{V}_{j,u}(s_{j,u,t}).$$
Here, $\hat{V}_{j,u}$ is a univariate piecewise linear concave function defined by two slopes $\hat{v}^1_{j,u} \ge 0$ and $\hat{v}^2_{j,u} = 0$ and two breakpoints $\beta^1_{j,u} = d_{j,u}$ and $\beta^2_{j,u} = \infty$ for each discharging region $j \in J^C$ and for each time period $u \in T$. The parameter $\tau_j$ is the travel time between the loading region and discharging region $j$. No value is given to inventory in the loading region. Note that this approximation ignores the number of vessels that will be available in the future. Although it might at first seem like a significant amount of information is not being used, in fact, it is not the case. Since vessels always fully discharge, knowing the future amount of available inventory $s_{j,u,t}$ at a discharging port is more useful than knowing the number of vessels in each vessel class that will make the delivery. On the other hand, some information is lost at the loading port, namely, the availability of future vessels to deliver product. With this approximation, the term $\hat{V}_{t+1}(s_t)$ becomes

$$\sum_{j \in J^C} \sum_{u \ge t+\tau_j} \sum_{k \in \{1,2\}} \hat{v}^k_{j,u}\, w^k_{j,u,t},$$

where $w^k_{j,u,t}$ are continuous decision variables that relate to the inventory variables $s_{j,u,t}$ through the constraints

$$s_{j,u,t} = \sum_{k \in \{1,2\}} w^k_{j,u,t} \quad \forall\, n = (j,u) \in N : j \in J^C,\ u \ge t \tag{7a}$$

$$0 \le w^k_{j,u,t} \le \beta^k_{j,u} \quad \forall\, k \in \{1,2\},\ \forall\, n = (j,u) \in N : j \in J^C,\ u \ge t. \tag{7b}$$

As a final approximation, rather than consider the value of all future inventories at a port $j$ on and after time period $t + \tau_j$, we truncate the time horizon of the subproblem based on travel times and so-called capacity-to-rate ratios. In particular, let $C2R_{j,t}$ be the capacity-to-rate ratio at discharging port $j$ beginning in time period $t$, i.e., the number of periods it will take for port $j$ to run out of inventory when starting full in time period $t$. Then in time period $t$, we only value inventory up to time period $t + u_{j,t}$ where $u_{j,t} = \tau_j + C2R_{j,t} - 1$. The rationale for this truncation is to avoid giving ports with a high consumption rate and a short travel time an artificially high reward for sending a vessel. For example, suppose there are two discharging ports and the travel times to port 1 and port 2 are five and 30 periods, respectively. Then our subproblem needs to consider at least 30 periods. Suppose the subproblem includes 35 periods. If port 1 has a high consumption rate and port 2 has a low consumption rate, then valuing future inventory at port 1 from time period five to 35 would make it very attractive to send vessels to port 1. This is a situation we would like to avoid. With this truncation, the approximation becomes

$$\hat{V}_{t+1}(s_t) = \sum_{j \in J^C} \sum_{u = t+\tau_j}^{t+u_{j,t}} \sum_{k \in \{1,2\}} \hat{v}^k_{j,u,t}\, w^k_{j,u,t}. \tag{8}$$

Figure 1 illustrates how our approximation is used. Given an available vessel in the loading region (LR) in time period 1, the time 1 subproblem considers the trade-off between the immediate cost of moving the vessel and the reward associated with satisfying future demands. In this example, discharging region 1 (DR1) and 2 (DR2) have a capacity-to-rate ratio of four and three periods, respectively. The first slope of each VFA $\hat{V}_{j,u}$ reflects the value of having inventory $s_{j,u}$ in that specific period. By considering a single region $j$ and summing over all future time periods in which a piecewise linear concave function is shown, we obtain a single value of future inventory $\hat{V}_{t+1}(s_t)$ that is weighted by the value of inventory in each period as shown in Equation (8).

Although our value function approximations at each node are shown to be piecewise linear concave with respect to the known future inventory at that node, the resulting approximation may not be concave with respect to the future inventory at time period $t + \tau_j$. Recall that vessels dispatched from the loading port in time period $t$ will arrive at discharging port $j$ at time period $t + \tau_j$. Because travel times are assumed to be known and identical for all vessel classes, all vessels dispatched in an earlier time period will arrive before time period $t + \tau_j$. Thus, the ending inventory in time period $u > t + \tau_j$ at port $j$ is related to the ending inventory at time period $t + \tau_j$ by the equation $s_{j,u,t} = \big[ s_{j,t+\tau_j,t} - \sum_{u' :\, t+\tau_j < u' \le u} d_{j,u'} \big]^+$ where $[a]^+ = \max\{0, a\}$. Writing our VFA solely in terms of functions $\hat{W}_{j,t+\tau_j}$ that depend on the ending inventories at time $t + \tau_j$, we obtain an equivalent representation of our VFA as the sum of nonconvex PWL functions:

$$\hat{V}_{t+1}(s_t) = \sum_{j \in J} \sum_{u \ge t+\tau_j} \hat{V}_{j,u}(s_{j,u,t}) = \sum_{j \in J} \hat{W}_{j,t+\tau_j}(s_{j,t+\tau_j,t}).$$

An example of this interpretation is shown in Figure 2. In summary, our time $t$ MIP subproblem is

$$\hat{V}_t(r_t, s_{t-1}) = \max \Big\{ \sum_{vc \in VC} \sum_{j \in J} \sum_{a \in FS^{vc}_{(j,t)}} \{ -C^{vc}_a x^{vc}_a \} - \sum_{j \in J} P_{j,t}\, \alpha_{j,t} + \sum_{j \in J^C} \sum_{u \ge t+\tau_j}^{t+u_{j,t}} \sum_{k \in \{1,2\}} \hat{v}^k_{j,u}\, w^k_{j,u,t} \Big\} \tag{9a}$$

s.t.

$$\text{(2), (4), (5c)–(5f), (7)} \tag{9b}$$

$$s_{j,u,t}\, \alpha_{j,u} = 0, \quad \forall\, n = (j,u) \in N : u \ge t, \tag{9c}$$

where we linearize the constraint $s_{j,u,t}\, \alpha_{j,u} = 0$ using big $M$ parameters: $s_{j,u,t} \le M y_{j,u,t}$ and $\alpha_{j,u} \le M(1 - y_{j,u,t})$ where $y_{j,u,t}$ is a binary decision variable taking value 1 if $s_{j,u,t}$ is positive and 0 if $\alpha_{j,u}$ is positive.
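To make the node-level approximation and its translation into a single function of the arrival-period inventory concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single discharging port, a dictionary `demand` holding the per-period demands $d_{j,u}$, and a dictionary `slope` holding the first slopes $\hat{v}^1_{j,u}$; the function names `node_value`, `valuation_window`, and `arrival_value` and all numbers are hypothetical. It relies only on three facts from the text: each node function has slope $\hat{v}^1_{j,u}$ up to the breakpoint $d_{j,u}$ and slope 0 afterwards, the valuation window ends at $t + u_{j,t}$ with $u_{j,t} = \tau_j + C2R_{j,t} - 1$, and later inventories equal the arrival-period inventory minus the intervening demands, floored at zero.

```python
def node_value(s, d, v1):
    """Two-slope concave VFA at one node: slope v1 up to breakpoint d, then 0."""
    return v1 * min(max(s, 0.0), d)


def valuation_window(t, tau, c2r):
    """Periods u = t + tau, ..., t + u_jt valued in the time-t subproblem,
    where u_jt = tau + C2R - 1."""
    u_jt = tau + c2r - 1
    return list(range(t + tau, t + u_jt + 1))


def arrival_value(s_arrival, periods, demand, slope):
    """Value of ending inventory s_arrival in the arrival period periods[0].

    Inventory in a later period u is max(0, s_arrival - demands after
    arrival up to u), so the result is a sum of shifted concave pieces.
    """
    total, consumed = 0.0, 0.0
    for i, u in enumerate(periods):
        if i > 0:
            consumed += demand[u]  # demand in periods after arrival
        s_u = max(0.0, s_arrival - consumed)
        total += node_value(s_u, demand[u], slope[u])
    return total


# Illustrative data (assumed, not taken from the paper's instances).
periods = valuation_window(t=1, tau=7, c2r=4)  # periods 8, 9, 10, 11
demand = {8: 20.0, 9: 15.0, 10: 20.0, 11: 30.0}
slope = {u: 1.0 for u in periods}

w10 = arrival_value(10.0, periods, demand, slope)
w15 = arrival_value(15.0, periods, demand, slope)
w20 = arrival_value(20.0, periods, demand, slope)
```

With these assumed numbers, `w15` lies below the average of `w10` and `w20`, so the aggregated function is not concave even though every node-level piece is. This is the effect the text describes when it rewrites the VFA in terms of the arrival-period inventory.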
[Figure 1: time-space network with time on the horizontal axis and one row of nodes per region. LR has nodes (1,1), (1,2), (1,6), ..., (1,11). DR1 has nodes (2,1), (2,2), (2,6), ..., (2,11), a capacity-to-rate ratio of 4 periods, and value functions $\hat{V}_{2,8}, \ldots, \hat{V}_{2,11}$ of the inventories $s_{2,8}, \ldots, s_{2,11}$. DR2 has nodes (3,1), (3,2), (3,6), ..., (3,11), a capacity-to-rate ratio of 3 periods, and value functions $\hat{V}_{3,6}, \hat{V}_{3,7}, \hat{V}_{3,8}$ of the inventories $s_{3,6}, s_{3,7}, s_{3,8}$. The two dispatch arcs are labeled Cost = 15 and Cost = 20.]

Figure 1 Example of Dispatching Decisions Faced by the Loading Port in a Given Subproblem When Only One Vessel Class Is Present
Notes. A separable piecewise linear concave value function approximation with two slopes exists at each node. The subscript $t$ on the variables $s_{j,u,t}$ has been omitted.
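As a quick consistency check of the truncation rule $u_{j,t} = \tau_j + C2R_{j,t} - 1$ against Figure 1, the worked arithmetic below assumes that the subproblem is at $t = 1$ and that the travel times are 7 periods to DR1 and 5 periods to DR2. These travel times are inferred from the node labels in the figure and are not stated in the text.

```latex
\begin{align*}
\text{DR1:}\quad u_{j,t} &= 7 + 4 - 1 = 10,
  & \text{valued periods } t+\tau_j, \ldots, t+u_{j,t} &= 8, \ldots, 11, \\
\text{DR2:}\quad u_{j,t} &= 5 + 3 - 1 = 7,
  & \text{valued periods } t+\tau_j, \ldots, t+u_{j,t} &= 6, \ldots, 8.
\end{align*}
```

Under this reading, the windows coincide with the nodes at which value functions are drawn in Figure 1: periods 8 to 11 for DR1 and periods 6 to 8 for DR2.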

It is important to mention that our approach does not allow vessels to leave the system until the very last time period of the horizon and therefore there is no value for this option. Consequently, some needless trips at the end of the horizon may take place. The rationale for removing the option to take a vessel out of service is because, in our instances, there is not an overabundance of vessels and so all vessels are continually in operation. Moreover, some might argue that our problem is an infinite horizon problem and should not be truncated. Thus, as a final step in our solution approach, we have a simple routine, which we call “end effect polishing,” to remove these needless trips that are an artifact of the finite horizon.

[Figure 2: the upper panels show the node-level functions $\hat{V}_{2,8}, \hat{V}_{2,9}, \hat{V}_{2,10}, \hat{V}_{2,11}$ plotted against $s_{2,8}, s_{2,9}, s_{2,10}, s_{2,11}$, with demands $d_{2,8}, d_{2,9}, d_{2,10}, d_{2,11}$ and the values 20, 15, 20, and 30 marked. The lower panel shows $\hat{W}_{2,8}$ plotted against $s_{2,8}$ with ticks at 20, 35, 55, and 85. Translation of $s_{2,u}$ to $s_{2,8}$: $s_{2,9} = [s_{2,8} - d_{2,9}]^+$, $s_{2,10} = [s_{2,8} - d_{2,9} - d_{2,10}]^+$, $s_{2,11} = [s_{2,8} - d_{2,9} - d_{2,10} - d_{2,11}]^+$.]

Figure 2 From Concave to Nonconvex Piecewise Linear Value Function Approximations
Notes. In terms of ending inventory at each node, a separable PWL concave VFA with two slopes is used. In terms of ending inventory in the time period a delivery is made, a single PWL nonconvex VFA is used.

4.2. Updating the Value Function Approximation
Having described our value function approximation, we turn to the question of how we update it from iteration to iteration. Although our VFA is relatively simple, i.e., the only parameters that may be changed are the slopes $\hat{v}^1_{j,u}$, devising a suitable updating scheme takes some care.

Just as there are numerous choices for designing a value function approximation, there are also a number of techniques commonly found for updating the VFAs (see, e.g., George and Powell 2006). Perhaps the most important consideration is to determine what the goal of the update is. In early iterations of an ADP algorithm, it is often beneficial to explore the solution space. Thus, it is usually preferred to have a fast update rule that results in substantial changes to the value function. On the other hand, in later iterations, some sort of convergence is often desired, in which case small changes are sought. Regression and batch least-squares are sometimes used (Powell 2011). Toriello, Nemhauser, and Savelsbergh (2010) suggest other fitting procedures.

Our focus is on generating one or more good solutions quickly; convergence is less of a concern. Consequently, we prefer rapid updates over those that require a significant amount of fitting at each iteration.
To this end, we update each slope $\hat{v}^1_{j,u}$ using a generic smoothing technique:

$$\hat{v}^{1,(n+1)}_{j,u} = (1 - \gamma_n)\, \hat{v}^{1,(n)}_{j,u} + \gamma_n\, \pi^{(n)}_{j,u}.$$

In words, to obtain the slope $\hat{v}^{1,(n+1)}_{j,u}$ to be used on iteration $n + 1$, we take a convex combination of the previous estimate of the slope $\hat{v}^{1,(n)}_{j,u}$ with some new estimate $\pi^{(n)}_{j,u}$ obtained from the solution at iteration $n$. Here, $\gamma_n \in [0, 1]$ is a parameter that depends on the iteration.

With this updating procedure, there are two issues: the choice of the stepsize $\gamma_n$ and how the new estimate $\pi^{(n)}_{j,u}$ of the slope is obtained. We found that a simple harmonic stepsize rule $\gamma_n = 1/(C + n)$ was adequate where $C$ is some positive integer. In fact, our methods exhibited similar performance for values of $C \in \{1, \ldots, 10\}$. Consequently, $C = 1$ in all of our computational results. The more challenging issue was how to obtain the new estimate of the slope.

Our first attempt at obtaining new estimates of the slopes $\hat{v}^1_{j,u}$ was based on using the values of the dual variables $\pi_{j,u,t}$ associated with each inventory balance Equation (4). For the dynamic fleet management problem, this approach has been shown to work well because the subproblem can be recast as a network flow problem where dual information is readily available (Godfrey and Powell 2002a, b). When the subproblem is an MIP, it is not clear whether similar dual information can be obtained. Bouzaiene-Ayari et al. (2014) solve MIP subproblems and report good results from updating the slopes of their VFAs using the duals of the LP relaxation. In all of our attempts, however, using the values of the dual variables $\pi_{j,u,t}$ from the LP relaxation was not productive.

Ultimately, we turned to an updating rule that we believe is quite simple and works well but may not be applicable for other mainstream problems. One of the dominant components in the objective function is the penalty incurred for stocking out at a discharging port. Thus, after obtaining a solution to the full planning problem, we identified all nodes at which a stockout occurred by observing the spot market quantities $\alpha_{j,u}$. We then set $\pi^{(n)}_{j,u} = P_{j,u}\, \alpha_{j,u}$; i.e., the marginal value of an additional unit of inventory is proportional to the price paid to satisfy the stockout in the period.

4.3. A Multiperiod Lookahead
There are at least two potential drawbacks of the basic ADP approach outlined above. First, our value function approximation $\hat{V}_t$ is not parameterized by the vector $r_t$ and thus completely ignores the information related to future vessel arrivals at the loading port. Second, regardless of the VFA chosen, the success of the algorithm relies almost entirely on the quality of the VFA. That is, by choosing to construct a solution to a long-horizon problem by solving a sequence of single-period subproblems, we are dependent on the VFAs used in each subproblem to help us make wise dispatching decisions that ultimately lead to a good solution.

To decrease this dependence on the VFAs, we also explore the strategy of solving multiperiod subproblems. Powell refers to this as a hybrid procedure under the name “Rolling Horizon Procedures with Value Function Approximations” (see §6.5.2 of Powell 2011). The idea is straightforward. In the time $t$ subproblem, rather than consider routing decisions for only those vessels that are available to be dispatched in time period $t$, consider instead decisions for all vessels available for dispatching in time periods $t, t+1, \ldots, t+H$, where $H$ is the lookahead. Once this more complex subproblem is solved, fix only those dispatching decisions for vessels dispatched in time period $t$ and continue to the next subproblem starting in time period $t + 1$. The hope is that by looking ahead to see which vessels will be available in the near future as well as how these vessels will be dispatched, our routing decisions for vessels in time period $t$ will be improved.

For a multiperiod lookahead of $H$ periods, we modify the value function approximation as follows:

$$\hat{V}_{t+1}(s_t) = \sum_{j \in J^C} \sum_{u = t+H+\tau_j}^{t+H+u_{j,t,H}} \sum_{k \in \{1,2\}} \hat{v}^k_{j,u,t}\, w^k_{j,u,t}. \tag{10}$$

Approximation (10) is the same as the VFA in (8) with modifications to the second summation. First, because the last period in which a vessel can arrive at port $j$ is $t + H + \tau_j$, we sum over future inventories from time period $t + H + \tau_j$ onwards (as opposed to $t + \tau_j$ onwards). Second, since multiple vessels may be dispatched to the same region and, therefore, discharge more product than in the no lookahead case, we extend the number of time periods over which we value future inventory using the parameter $u_{j,t,H}$ ($\ge u_{j,t}$). In our computational experiments below, we set $u_{j,t,H} = u_{j,t} + 5$ because this parameter setting gave the best performance, on average, over all settings with $u_{j,t,H} = u_{j,t} + k$ for $k \in \{0, 1, \ldots, 10\}$.

4.4. Optimization-Based Local Search Heuristic for MIP Model (1)
We attempt to solve MIP Model (1) with two methods. These methods will act as a benchmark for our ADP algorithm. Our first approach is to solve MIP Model (1) directly using an MIP solver. One could also tighten the formulation by appending, for example, lot-sizing based cuts to the model as presented in Papageorgiou et al. (2014b). In our experiments, we found that adding such cuts to the model improved the dual bound but did not help the solver find good feasible solutions faster; in fact, they often hampered primal performance.
Our second approach is to apply an optimization-based heuristic introduced by Song and Furman (2013) for MIRPs but discussed for another application in Savelsbergh and Song (2008). In their setting, decisions for each individual vessel are modeled. Their local search procedure is akin to a 2-opt procedure and works as follows: after obtaining an initial feasible solution, the decision variables associated with all but two vessels are fixed and an exact optimization algorithm is called to locally optimize the decisions for these two vessels. This procedure is applied for up to $\binom{|V|}{2}$ iterations, where $|V|$ is the number of vessels and vessel pairs are chosen randomly in each iteration. Goel et al. (2012) adapt this local search procedure to generate solutions to MIRP instances with 365 time periods. Their main algorithmic contribution is to show how vessel pairs should be chosen to improve solution quality and reduce total solution time. This local search is also applied in Hewitt et al. (2013).

Since our problem does not deal with individual vessels, we modified the above approach to work with vessel classes. In our implementation, we construct an initial feasible solution by solving MIP Model (1) directly using an MIP solver for up to 30 seconds. Although one could employ other techniques to construct an initial feasible solution, we believe that this approach is the most sensible for comparison since it does not involve any additional algorithms to be implemented. Moreover, in other applications, the local search procedure’s performance has been shown to be nearly independent of the starting solution (Goel et al. 2012). With a feasible solution in hand, we fix all routing decisions associated with all but one vessel class and locally optimize the routing decisions for that one vessel class. We optimize one vessel class at a time because there are multiple vessels within each class (thus, multiple vessels are simultaneously being rerouted), and for large instances simultaneously optimizing two vessel classes led to challenging MIPs. Since the resulting MIP is still challenging to solve to provable optimality, we set a time limit of 30 seconds. For ease of implementation, we simply cycle through the vessel classes one at a time for a given time limit. Given the past success and popularity of this type of heuristic, we believe this implementation offers a meaningful benchmark.

4.5. Combining ADP and Local Search
Finally, we describe another hybrid procedure of potential interest. In many instances, one of our ADP methods quickly finds high-quality solutions within the first few iterations. However, it may be the case that our ADP “gets stuck” and is unable to find a solution of higher quality in the remaining iterations. This happens when the slope updates are unable to drive the ADP to a better set of slopes. To find better solutions, we apply the local search described above to the solution returned by our ADP after $N$ iterations. In this setting, we treat local search like a post-processing step to correct local suboptimalities in our best ADP solution.

5. Computational Experiments
In this section, we compare the performance of variants of our ADP method with that of the commercial MIP solver Gurobi 5.0 solving the MIP Model (1) and the MIP-based local search procedure outlined in §4.4. In all experiments in which the performance of Gurobi and local search are evaluated, we set Gurobi’s MIPFocus parameter to 1 to emphasize feasibility so that more time and effort are spent trying to find good feasible solutions. Gurobi’s default settings were used to solve the MIP subproblems in our ADP methods. All value function approximations are initialized with zero slopes and no off-line training is performed. All models and algorithms were coded in Python and run on a single thread. All experiments were carried out on a Linux machine with kernel 2.6.18 running a 64-bit x86 processor equipped with two 2.27 GHz Intel Xeon E5520 chips and 32 GB RAM.

Because we set a time limit of 30 seconds to search over each neighborhood in our local search implementation, our local search is no longer deterministic because of the idiosyncrasies of MIP solvers. Therefore, results in which local search is used are averaged over 10 runs. In general, we observed little variability across these runs.

Computational experiments were conducted on a subset of instances from the Maritime Inventory Routing Problem Library (MIRPLib) (Papageorgiou et al. 2014b), a library of MIRPs available at mirplib.scl.gatech.edu. These instances are inspired by real-world MIRPs but do not represent any particular real-world data set. The 24 instances (labeled “Group 2 Instances” in MIRPLib) can be categorized as easy, moderate, or hard. The easy instances include two discharging ports, whereas the hard instances involve as many as 12 discharging ports, 10 vessel classes, and one-way travel times of 37 periods.

The main metric that we use to compare solution methods is the average fraction of the relative gap closed over time, where the average is taken over all 24 instances. Here the relative gap is defined as $(z_{\mathrm{method}} - z_{\mathrm{best}})/z_{\mathrm{method}}$, where $z_{\mathrm{method}}$ is the objective function value of the method used (e.g., ADP, local search, or Gurobi emphasizing feasibility) and $z_{\mathrm{best}}$ is the objective function value of the best known solution found after days of computing. Specifically, for a particular instance, the value $z_{\mathrm{best}}$ was computed by warm-starting the local search procedure with the best ADP solution found, applying local search for five hours, and then warm-starting Gurobi with this solution and solving for 10 days.
Table 1 Algorithms Compared

Algorithm     Description                            Section
ADP_LA0       Basic ADP with no lookahead            4.1
ADP_LAH       ADP with a lookahead of H periods      4.3
LS            Local search                           4.4
GRB           Gurobi 5.0 emphasizing feasibility     4.4
ADP_LA0_LS    Basic ADP followed by local search     4.5

We compare the various algorithms listed in Table 1 on instances with a 120-period horizon. Although LNG-IRPs usually require one to solve instances with horizons of 90 to 365 periods, we believe that using a 120-period horizon gives the strongest algorithmic comparison for two reasons. First, we performed preliminary experiments in which we attempted to solve MIP Model (1) with a 180- and 360-period horizon directly using Gurobi or by using local search but found ADP to be vastly superior. Indeed, for instances with a 360-period horizon, our ADP with no lookahead could close, on average, 81% of the relative gap in 800 seconds, whereas Gurobi and local search could close 27% and 56%, respectively, in 1,800 seconds. Our second reason for focusing on 120-period horizon instances is because they offer a reasonable comparison with rolling horizon heuristics, arguably the most common heuristics applied to long-horizon problems. For example, to generate solutions to planning problems with a 365-period horizon, Rakke et al. (2011) solve subproblems involving 90 periods and piece together the solutions to these overlapping subproblems to create a solution for the full planning horizon. For several of our instances, one-way interregional travel times are more than 30 periods in duration and we found that solving a reduced MIP with a 90-period time horizon could lead to solutions with odd end behavior. Extending these horizons over 120 periods seemed to yield more stable results. Preliminary testing on several 360-period horizon instances showed that, when incorporated into a rolling horizon framework, our ADP could close, on average, 91% of the relative gap compared to 84% (only some easy and medium instances were compared) without the rolling horizon framework. We believe this experiment supports our claim that ADP can find good solutions to 360-period instances. The implementation of our rolling horizon heuristic was naive and only semiautomated. We solved six 120-period instances rolling forward 60 periods after each solve. Rather than attempt to improve this rolling horizon implementation further, we decided to focus on solving 120-period instances to better understand the performance of our ADP algorithm.

[Figure 3: line plot of the average fraction of relative gap closed (vertical axis, 0.50 to 1.00) against time in seconds (horizontal axis, 0 to 1,800) for ADP_LA0_LS, ADP_LA0, LS, and GRB.]

Figure 3 Comparison of Our Basic ADP with No Lookahead (ADP_LA0), Our Basic ADP Followed by Local Search (ADP_LA0_LS), Gurobi (GRB), and Local Search (LS)

5.1. Comparison of ADP with Traditional MIP Methods
Figure 3 shows the average fraction of the relative gap closed over time by four methods. The most important observation is that our basic ADP with no lookahead (ADP_LA0) outperforms both Gurobi emphasizing feasibility and the local search procedure with respect to solution time (time to best) and quality. ADP_LA0 nearly reaches its best solution within roughly two to three minutes, whereas Gurobi and local search require more time. Moreover, on average, the quality of the ADP solution is 92% of the best known, whereas local search is near 84% and Gurobi is at 72% after 30 minutes of CPU time.

A second observation is the stagnation of ADP_LA0 and local search after a given amount of time. In other words, after 10 to 30 iterations, ADP with no lookahead is unable to find a set of slopes that produce an improving solution. In light of the fact that our primary goal is to find good solutions quickly, this observation is not disconcerting since this goal is achieved in two or three minutes. However, if more time is available, we would like to improve the solution quality further. One option of overcoming this stagnation is to apply local search on the best solution found by an ADP method. The performance of this approach is shown in Figure 3 where we see that it is capable of closing 96% of the gap, on average, after 900 seconds.

Another way to overcome this stagnation is to introduce more deliberate exploitation and exploration schemes. We attempted to achieve this by modifying the ADP slope updating procedure. Borrowing from other popular heuristics (e.g., tabu search), we implemented an intensification strategy in which, after a certain number of iterations, we set the slopes used in the VFA equal to those that produced the best known
solution found thus far in the search process. Like other intensification strategies, the hope is that by returning to a good set of slopes at a later iteration when the stepsize $\gamma_n$ is smaller, the slope updates would stay closer to those that produced the best solution and find a better neighboring solution. We also experimented with a diversification strategy. Let $\mathrm{rgap}_n = \min\{(z_n - z_{\mathrm{inc}})/z_n,\ 1\}$ denote the relative gap computed on the $n$th iteration of the ADP algorithm, where $z_{\mathrm{inc}}$ is the objective function value of the incumbent solution (the best solution found by iteration $n$) and $z_n$ is the objective function value found on the $n$th iteration of the algorithm. After a certain number of iterations, we altered the stepsize updating rule to $\gamma_n = (C + (1 - \mathrm{rgap}_n)\,n)^{-1}$. If the objective function value $z_n$ of the solution found in iteration $n$ is poor, we expect $\mathrm{rgap}_n$ to be closer to 1 and so the stepsize will be close to $1/C$, leading to a more drastic change in the VFA updates. On the other hand, if $\mathrm{rgap}_n$ is small, the stepsize will be closer to $1/(C + n)$ so that the VFA updates are modest. In our experiments, these intensification and diversification strategies did not produce better results. Small improvements could be made on some instances, but typically at the expense of deteriorated performance on other instances.

Table 2 shows the time required for our ADP algorithm to find its best solution and the additional time that local search and Gurobi needed to find a solution of equal or better quality. A “>3,600” or “>18,000” means that local search or Gurobi could not find a better solution within a one-hour or five-hour time limit, respectively. A negative value means that a better solution was found in less time than in our ADP method. We see that ADP with no lookahead performs best on 21 of the 24 instances. In three instances, local search is able to find a better solution than ADP. In all but one instance, Gurobi requires more time to find a solution of equal or better quality.

Table 2 Instance-by-Instance Comparison: Additional Time (Sec) Required by Local Search (LS) and Gurobi (GRB) to Reach a Solution of Equal or Better Quality

Instance               Time to best (ADP_LA0)   Additional time (LS)   Additional time (GRB)
LR1_DR02_VC01_V6a      0                        >3,600                 456
LR1_DR02_VC02_V6a      15                       −5                     −5
LR1_DR02_VC03_V7a      9                        >3,600                 >18,000
LR1_DR02_VC03_V8a      17                       >3,600                 93
LR1_DR02_VC04_V8a      9                        >3,600                 208
LR1_DR02_VC05_V8a      1                        >3,600                 10,823
LR1_DR03_VC03_V10b     84                       >3,600                 5,016
LR1_DR03_VC03_V13b     21                       >3,600                 >18,000
LR1_DR03_VC03_V16a     13                       16                     220
LR1_DR04_VC03_V15a     96                       >3,600                 195
LR1_DR04_VC03_V15b     193                      >3,600                 >18,000
LR1_DR04_VC05_V17a     25                       >3,600                 471
LR1_DR04_VC05_V17b     72                       85                     40
LR1_DR05_VC05_V25a     37                       >3,600                 >18,000
LR1_DR05_VC05_V25b     198                      117                    435
LR1_DR08_VC05_V38a     707                      >3,600                 >18,000
LR1_DR08_VC05_V40a     26                       >3,600                 >18,000
LR1_DR08_VC05_V40b     520                      −302                   1,973
LR1_DR08_VC10_V40a     46                       >3,600                 >18,000
LR1_DR08_VC10_V40b     277                      >3,600                 >18,000
LR1_DR12_VC05_V70a     81                       >3,600                 >18,000
LR1_DR12_VC05_V70b     1,400                    −1,011                 >18,000
LR1_DR12_VC10_V70a     1,180                    >3,600                 >18,000
LR1_DR12_VC10_V70b     1,314                    >3,600                 >18,000

Our convention for naming instances is based on the number of loading and discharging regions, the number of ports, the number of vessel classes, and the number of vessels. This convention is best understood with an example. Consider an instance named LR1_DR03_VC05_V16b. LR1 means that there is one loading region/port. DR03 means that there are three discharging regions/ports. VC05 means that there are five vessel classes. V16 means that there are a total of 16 vessels (with at least one vessel belonging to each vessel class). Finally, if a letter is included at the end, this is to distinguish this instance from other instances.

As a final comment, we believe that our ADP method becomes more attractive as the number of vessel classes increases. This is because the time it takes an MIP solver to solve the MIP Model (1) directly or using local search should increase more rapidly with an increase in vessel classes than that of our ADP algorithm. Since some applications may not allow vessels to be aggregated by vessel class, our ADP approach may become more appealing if each individual vessel must be modeled.

5.2. Comparison of ADP Methods With and Without a Lookahead
We also attempted to improve our basic ADP method by including a lookahead of H = 1, 2, or 3 periods. As discussed in §4.3, our VFA ignores the availability of vessels at the loading port in future periods. By including a multiperiod lookahead of H periods, dispatching decisions for vessels available in time periods $t, \ldots, t + H$ are considered. As a consequence, we would hope that each subproblem would be less myopic as more information is at hand.

Our experiments reveal a counterintuitive result. As shown in Figure 4, given our choice of VFAs and slope updates, a multiperiod lookahead of one to three periods performs worse, on average, than having no lookahead. We observed that in a small number of instances a lookahead could outperform an ADP with no lookahead, both in terms of solution time and quality. One might ask if this inferior average performance is because each major iteration (step 2 of Algorithm 1) of an ADP method with a lookahead requires more time because each MIP subproblem is more computationally demanding. This is not the case. Although we only show the first 30 minutes of
computation time, in all but a few cases, the ADP methods with a lookahead (after hours of computing) still failed to achieve an average solution quality comparable to that of our basic ADP method with no lookahead.

[Figure 4 Comparison of Our Basic ADP with No Lookahead (ADP_LA0) with ADP Methods Using a 1-, 2-, and 3-Period Lookahead. Plot of the average fraction of relative gap closed (vertical axis, 0.50 to 1.00) against time in seconds (horizontal axis, 0 to 1,800) for ADP_LA0, ADP_LA1, ADP_LA2, and ADP_LA3.]

One possible explanation for this lack of improvement is that a longer lookahead (i.e., setting H > 3) is required to take advantage of future information for these instances. Because the median one-way travel time is typically between 15 and 18 periods (giving rise to round-trip travel times of 30 to 36 periods), a lookahead of H = 1, 2, or 3 periods may not capture any relevant additional information versus having no lookahead at all. The problem with increasing the lookahead, however, is that each subproblem becomes even more time consuming, thus making it difficult to find good solutions quickly.

We also tried warm-starting a multiperiod lookahead with the best VFA found by our basic ADP with no lookahead. This strategy is similar to the intensification idea described above. Specifically, after applying our basic ADP method with no lookahead for N = 100 iterations, we initialized the VFA of an ADP with a lookahead with the best VFAs, i.e., the set of slopes that produced the best solution over 100 iterations, found by the ADP with no lookahead. Even after enforcing smaller stepsizes, this approach still failed, on average, to generate solutions of equal or better quality than ADP_LA0.

5.3. Profiling the ADP Algorithm
Table 3 shows the percentage of time our basic ADP algorithm with no lookahead spends in each of its major functions, averaged over all instances and all time horizons considered. Almost 70% of the solution time is spent either initializing or solving all of the MIP subproblems. This large percentage is expected since MIP solving is costly compared to all of the other operations. However, a more intelligent implementation should be able to shrink the percentage of time spent on MIP initialization since this involves nothing more than several loops. End effect polishing is discussed at the end of §4.1 and refers to a simple procedure that we apply after obtaining a solution for the full horizon. Since our algorithm does not allow vessels to leave the system, needless trips at the end of the horizon may take place. End effect polishing seeks to remove these needless trips that are an artifact of truncating what some might argue is an infinite horizon problem. In our approach, value function (or slope) updating requires virtually no time because convex combinations of information are used. This percentage of time would increase if one were to use more sophisticated schemes for updating the value function, e.g., regression.

Table 3 Average Percentage of Time Spent in Each ADP-Related Function

Function                 % of time
MIP initialization       21.58
MIP solving              47.51
State updating           17.64
End effect polishing     6.02
Slope updating           0.84
Miscellaneous            6.41

6. Conclusions and Future Work
This paper introduced an ADP framework for generating good solutions quickly to a class of MIRP with a long planning horizon. The ADP approach appears to be one of the first of its kind in the maritime routing and scheduling domain and represents a significant departure from previous methods for this class of problems. Rather than putting the burden on an MIP solver to produce good solutions or improving solutions, our approach shifts this effort to identifying value function approximations that lead to good solutions. Computational experiments indicate that this framework is capable of obtaining better solutions than a commercial MIP solver and a popular local search method tasked with considering many periods simultaneously.

Regarding future research directions, Powell (2011) and his associates have laid the groundwork for an algorithmic approach that can incorporate stochastic elements, e.g., stochastic demands, travel times, etc. Although we have not explored these extensions, the attractive feature of the proposed ADP framework is that it requires only minor changes. In particular, it considers sample realizations of the uncertain elements to solve each time t subproblem and then proceeds as normal. When the value function approximations are updated
with a convex combination procedure as we have done, the stepsize may become dependent on the noise in the estimates obtained over the course of the algorithm.

Our framework concentrates on a setting involving a single loading port/region. Considering multiple loading ports/regions would be a useful extension and address an industrial setting that is likely to become more prevalent. At the same time, the presence of multiple loading ports introduces more complicated routing decisions. In addition, we would have to overcome what Godfrey and Powell (2002b) refer to as the “long-haul bias,” a phenomenon in which more costly decisions are often made to satisfy high value opportunities before less costly ones could be made to satisfy the same opportunities.

Another interesting experiment would be to assess the benefit from storing value function approximations when the model is reoptimized. For example, within the context of a general decision support tool that is called every month to obtain an updated long-term plan, it seems likely that warm-starting our ADP framework with the best known value function approximations would lead to better solutions faster.

Acknowledgments
The authors wish to thank Warren Powell for references to recent ADP research as well as helpful “lessons from the field” that helped them avoid common pitfalls. They also thank Belgacem Bouzaiene-Ayari for instructive algorithmic suggestions related to ADP implementation. They are grateful to two anonymous referees for their perceptive comments that helped improve the quality of the paper.

References
Andersson H, Hoff A, Christiansen M, Hasle G, Løkketangen A (2010) Industrial aspects and literature survey: Combined inventory management and routing. Comput. Oper. Res. 37(9):1515–1536.
Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).
Blair CE, Jeroslow RG (1977) The value function of a mixed integer program: I. Discrete Math. 19(2):121–138.
Blair CE, Jeroslow RG (1979) The value function of a mixed integer program: II. Discrete Math. 25(1):7–19.
Bouzaiene-Ayari B, Cheng C, Das S, Fiorillo R, Powell WB (2014) From single commodity to multiattribute models for locomotive optimization: A comparison of integer programming and approximate dynamic programming. Transportation Sci., ePub ahead of print July 28, http://dx.doi.org/10.1287/trsc.2014.0536.
Campbell A, Clarke L, Kleywegt A, Savelsbergh MWP (1998) The inventory routing problem. Crainic TG, Laporte G, eds. Fleet Management and Logistics (Kluwer, Boston), 95–113.
Christiansen M, Fagerholt K (2009) Maritime inventory routing problems. Floudas CA, Pardalos PM, eds. Encyclopedia of Optimization, 2nd ed. (Springer-Verlag, New York), 1947–1955.
Christiansen M, Fagerholt K, Nygreen B, Ronen D (2013) Ship routing and scheduling in the new millennium. Eur. J. Oper. Res. 228(3):467–483.
Coelho LC, Cordeau J-F, Laporte G (2014) Thirty years of inventory-routing. Transportation Sci. 48(1):1–19.
Engineer FG, Furman KC, Nemhauser GL, Savelsbergh MWP, Song J-H (2012) A branch-price-and-cut algorithm for single product maritime inventory routing. Oper. Res. 60(1):106–122.
Fodstad M, Uggen KT, Rømo F, Lium A-G, Stremersch G (2010) LNGScheduler: A rich model for coordinating vessel routing, inventories and trade in the liquefied natural gas supply chain. J. Energy Markets 3(4):31–64.
Furman KC, Song J-H, Kocis GR, McDonald MK, Warrick PH (2011) Feedstock routing in the ExxonMobil downstream sector. Interfaces 41(2):149–163.
George AP, Powell WB (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.
Godfrey GA, Powell WB (2002a) An adaptive dynamic programming algorithm for dynamic fleet management, I: Single period travel times. Transportation Sci. 36(1):21–39.
Godfrey GA, Powell WB (2002b) An adaptive dynamic programming algorithm for dynamic fleet management, II: Multiperiod travel times. Transportation Sci. 36(1):40–54.
Goel V, Furman KC, Song J-H, El-Bakry AS (2012) Large neighborhood search for LNG inventory routing. J. Heuristics 18(6):821–848.
Goel V, Slusky M, van Hoeve W-J, Furman KC, Shao Y (2014) Constraint programming for LNG ship scheduling and inventory management. Eur. J. Oper. Res. Forthcoming.
Grønhaug R, Christiansen M (2009) Supply chain optimization for the liquefied natural gas business. Innovations in Distribution Logistics, Lecture Notes in Economics and Mathematical Systems, Vol. 619 (Springer-Verlag, Berlin), 195–218.
Grønhaug R, Christiansen M, Desaulniers G, Desrosiers J (2010) A branch-and-price method for a liquefied natural gas inventory routing problem. Transportation Sci. 44(3):400–415.
Halvorsen-Weare E, Fagerholt K (2013) Routing and scheduling in a liquefied natural gas shipping problem with inventory and berth constraints. Ann. Oper. Res. 203(1):167–186.
Hewitt M, Nemhauser GL, Savelsbergh MWP, Song J-H (2013) A branch-and-price guided search approach to maritime inventory routing. Comput. Oper. Res. 40(5):1410–1419.
Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.
Nascimento JM, Powell WB (2013) An optimal approximate dynamic programming algorithm for concave, scalar storage problems with vector-valued controls. Automatic Control, IEEE Trans. 58(12):2995–3010.
Papageorgiou DJ, Keha AB, Nemhauser GL, Sokol J (2014a) Two-stage decomposition algorithms for single product maritime inventory routing. INFORMS J. Comput. 26(4):825–847.
Papageorgiou DJ, Nemhauser GL, Sokol J, Cheon M-S, Keha AB (2014b) MIRPLib – A library of maritime inventory routing problem instances: Survey, core model, and benchmark results. Eur. J. Oper. Res. 235(2):350–366.
Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (John Wiley & Sons, Hoboken, NJ).
Rakke JG, Andersson H, Christiansen M, Desaulniers G (2015) A new formulation based on customer delivery patterns for a maritime inventory routing problem. Transportation Sci. 49(2):384–401.
Rakke JG, Stålhane M, Moe CR, Christiansen M, Andersson H, Fagerholt K, Norstad I (2011) A rolling horizon heuristic for creating a liquefied natural gas annual delivery program. Transportation Res. Part C 19(5):896–911.
Ruszczynski A (2010) Post-decision states and separable approximations are powerful tools of approximate dynamic programming. INFORMS J. Comput. 22(1):20–22.
Savelsbergh MWP, Song J-H (2008) An optimization algorithm for the inventory routing problem with continuous moves. Comput. Oper. Res. 35(7):2266–2282.
Shao Y, Furman KC, Goel V, Hoda S (2014) Bound improvement for LNG inventory routing. Transportation Sci. Forthcoming.
Simão HP, Day J, George AP, Gifford T, Nienow J, Powell WB (2009) An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transportation Sci. 43(2):178–197.
Song J-H, Furman KC (2013) A maritime inventory routing problem: Practical approach. Comput. Oper. Res. 40(3):657–665.
Stålhane M, Rakke JG, Moe CR, Andersson H, Christiansen M, Fagerholt K (2012) A construction and improvement heuristic for a liquefied natural gas inventory routing problem. Comput. Indust. Engrg. 62(1):245–255.
Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
Topaloglu H (2006) A parallelizable dynamic fleet management model with random travel times. Eur. J. Oper. Res. 175(2):782–805.
Topaloglu H (2007) A parallelizable and approximate dynamic programming-based dynamic fleet management model with random travel times and multiple vehicle types. Zeimpekis VS, Giaglis GM, Tarantilis CD, Minis I, eds. Dynamic Fleet Management: Concepts, Systems, Algorithms and Case Studies (Springer, New York), 65–93.
Topaloglu H, Powell WB (2006) Dynamic-programming approximations for stochastic time-staged integer multicommodity-flow problems. INFORMS J. Comput. 18(1):31–42.
Toriello A, Nemhauser GL, Savelsbergh MWP (2010) Decomposing inventory routing problems with approximate value functions. Naval Res. Logist. 57(8):718–727.
Uggen K, Fodstad M, Nørstebø V (2013) Using and extending fix-and-relax to solve maritime inventory routing problems. TOP 21(2):355–377.
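
Supplementary sketch for §5.1. The diversification rule discussed there sets the stepsize to the reciprocal of C + (1 − rgap_n) n, where rgap_n is the relative gap on iteration n. The Python fragment below is only an illustration of that arithmetic, not the authors' implementation. It assumes a minimization objective with positive objective values; the function names, the constant C = 10, and the sample objective values are hypothetical.

```python
def relative_gap(z_n: float, z_inc: float) -> float:
    """Relative gap min{(z_n - z_inc) / z_n, 1} on iteration n.

    z_n is the objective value found on iteration n and z_inc is the
    incumbent (best so far). Assumes z_n > 0 (an assumption of this sketch).
    """
    if z_n <= 0:
        raise ValueError("z_n must be positive in this sketch")
    return min((z_n - z_inc) / z_n, 1.0)


def diversified_stepsize(n: int, rgap: float, C: float) -> float:
    """Stepsize 1 / (C + (1 - rgap) * n) from the diversification rule."""
    return 1.0 / (C + (1.0 - rgap) * n)


C = 10.0  # hypothetical constant, chosen only for illustration
n = 50    # iteration counter

# A poor iterate (objective twice the incumbent) gives rgap = 0.5.
poor = diversified_stepsize(n, relative_gap(z_n=200.0, z_inc=100.0), C)
# An iterate matching the incumbent gives rgap = 0, i.e., 1 / (C + n).
good = diversified_stepsize(n, relative_gap(z_n=100.0, z_inc=100.0), C)
# The limiting case rgap = 1 gives the largest stepsize, 1 / C.
limit = diversified_stepsize(n, 1.0, C)
```

With these sample values the stepsize grows as the iterate gets worse, which is the behavior the text describes: poor solutions trigger larger changes to the VFA, while good solutions lead to modest updates.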

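
Supplementary sketch for the instance naming convention explained in §5.1 (e.g., LR1_DR03_VC05_V16b). The parser below is a hypothetical helper written for illustration; the regular expression and the field names are assumptions inferred from the instance names listed in Table 2, not part of any published tool.

```python
import re
from typing import NamedTuple, Optional


class InstanceName(NamedTuple):
    loading_regions: int       # LR: number of loading regions/ports
    discharging_regions: int   # DR: number of discharging regions/ports
    vessel_classes: int        # VC: number of vessel classes
    vessels: int               # V: total number of vessels
    suffix: Optional[str]      # trailing letter distinguishing instances


# Assumed pattern: LR<digits>_DR<digits>_VC<digits>_V<digits><optional letter>
_PATTERN = re.compile(r"^LR(\d+)_DR(\d+)_VC(\d+)_V(\d+)([a-z]?)$")


def parse_instance_name(name: str) -> InstanceName:
    """Split an instance name into its numeric components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance name: {name!r}")
    lr, dr, vc, v, suffix = match.groups()
    return InstanceName(int(lr), int(dr), int(vc), int(v), suffix or None)


info = parse_instance_name("LR1_DR03_VC05_V16b")
```

For the example in the text, `info` holds one loading region, three discharging regions, five vessel classes, 16 vessels, and the distinguishing letter "b".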