An Approximate Dynamic Programming Approach To Network Revenue Management With Customer Choice

An Approximate Dynamic Programming Approach to Network Revenue
Management with Customer Choice

Dan Zhang and Daniel Adelman
Graduate School of Business
University of Chicago
Chicago, IL 60637
August 29, 2006
Abstract
We consider a network revenue management problem where customers choose among open
fare products according to some pre-specified choice model. Starting with a Markov decision
process (MDP) formulation, we approximate the value function with an affine function of the
state vector. We show that the resulting problem provides a tighter bound for the MDP value
than the choice-based linear program proposed by Gallego et al. (2004) and van Ryzin and Liu
(2004). We develop a column generation algorithm to solve the problem for a multinomial logit
choice model with disjoint consideration sets. We also provide new results on the well-known
decomposition heuristic. Our numerical study shows the policies from our solution approach
can significantly outperform heuristics from the choice-based linear program.
While a substantial amount of research has been done on methods for solving the network
revenue management problem, much less work has been done in solving the version where customers choose among available network products. Usually, when airlines open up a menu of fares
for a given set of flights, customers will make substitutions between those available, or purchase
nothing. Although incorporating customer choice is important in practice, methodologically it is
more difficult than the independent demand case, which already suffers from Bellmans curse of
dimensionality.
Recently, van Ryzin and Liu (2004) studied a linear programming formulation, which they
call the choice-based linear program, to construct heuristics that parallel the bid-price control and
decomposition heuristic approaches from traditional network revenue management. In essence, their
LP formulation is the choice equivalent of the widely used deterministic linear program (DLP)
for network revenue management. It approximates the original stochastic problem by replacing
stochastic demand with its expected value. They provide a column generation algorithm to solve
the problem for the multinomial logit choice model with disjoint consideration sets (MNLD). The
LP formulation is closely related to the model proposed in Gallego et al. (2004), where the focus
is on analyzing so-called flexible products.
The purpose of this paper is to extend the approximate dynamic programming approach of
Adelman (2006) to the customer choice setting, and compare it with van Ryzin and Liu (2004).
dan.zhang@chicagogsb.edu
dan.adelman@chicagogsb.edu, corresponding author
This is an emerging approach to a wide variety of problems in operations research, for which a
lot of active research is ongoing; see, e.g., de Farias and Van Roy (2003). The idea here is to
formulate the underlying dynamic program as a linear program, and then make an affine functional
approximation to the value function to obtain dynamic bid-prices. In the independent demand
setting, these dynamic bid-prices perform better, in terms of both the bound and policy obtained,
than the static bid-prices obtained from the standard linear program. We discover in this paper
that this statement remains true when the method is extended to the choice setting. In fact, the
gap between the bounds obtained empirically can be as much as 50%. However, even when the two
bounds are close, the performance of policies based on the dynamic bid-prices can be much better.
In addition to showing that the basic results of Adelman (2006) continue to hold in the choice
case, there are two unique contributions in the present paper:
1. Solving the subproblem: Unlike in Adelman (2006), the column generation subproblem
for solving our linear program is a non-trivial nonlinear integer programming problem for
general discrete choice models. For the MNLD choice model, the subproblem belongs to the
class of integer generalized fractional programs. General algorithms for efficient solution of
such problems are not currently available (Schaible and Shi (2003)). However, we show that
in this case the subproblem is equivalent to a linear mixed integer programming problem
which can be solved easily.
2. New decomposition results: We provide new theory and results on the decomposition
heuristic, which was extended by van Ryzin and Liu (2004) to the choice setting. (See Talluri
and van Ryzin (2004b) for a discussion of the decomposition heuristic for network revenue
management.) In this scheme, the network problem is decomposed into many tractable singleleg problems, which then give single-leg value functions that are both time and capacity
dependent. These are then substituted into the right-hand side of Bellmans equation to
construct a control policy.
We show that the value function approximation obtained from the decomposition heuristic
with static bid-prices, proposed by van Ryzin and Liu (2004), yields a state-wise upper bound
on the optimal value function. Furthermore, we show that when used with our dynamic bidprices, the value function approximation obtained from the decomposition heuristic yields a
lower upper bound than our approximate linear program. To our knowledge, this new and
improved bound is the strongest reported thus far in the literature, and until now it has not
been known that the decomposition heuristic even provides a bound at all. Empirically, we
find that the decomposition heuristic with dynamic bid-prices gives superior policy performance as compared with the version using static bid-prices in van Ryzin and Liu (2004).
This is significant in practice, because the decomposition heuristic with static bid-prices is
generally regarded as one of the best strawmen available, and is widely used in industry. For
example, one of the leading software houses uses it.
We also relate the notion of efficient choice sets (Talluri and van Ryzin, 2004a) to optimal solutions
of our approximate linear program, giving additional insight into their structure.
2
Brief review of literature. Our work is closely related to the revenue management and approximate dynamic programming literatures. For a comprehensive review of revenue management
literature, see Talluri and van Ryzin (2004b). Many researchers have realized the deficiency of
independent demand models, and therefore have studied problems with rich customer choice activities. Examples include Brumelle et al. (1990), Belobaba and Weatherford (1996), Zhao and Zheng
(2001), among many others. Talluri and van Ryzin (2004a) consider customer choice among fare
classes on a single-leg flight. Zhang and Cooper (2005,2006) consider seat allocation and pricing
issues for multiple flights on the same origin and destination. van Ryzin and Vulcano (2004) introduce a simulation-based optimization approach for network revenue management under a fairly
general choice scheme.
The body of literature on approximate dynamic programming is relatively small but growing;
see Bertsekas and Tsitsiklis (1996) for a review. Our approach is most closely related to the linear
programming approach for dynamic programs. Puterman (1994) gives an excellent review of the
area. The functional approximation idea was first considered by Schweitzer and Seidmann (1985).
Organization of the paper. The rest of the paper is organized as follows. Section 1 introduces
the problem. Section 2 considers the affine functional approximation. Section 3 studies structural
properties of the approximate linear program. Section 4 provides a column generation algorithm.
Section 5 introduces heuristics from solutions of the approximate linear program. Section 6 reports
numerical results.
Problem formulation
In this section, we provide the basic formulations used in the paper. The MDP formulation in
Section 1.1 is essentially the same as the one presented in van Ryzin and Liu (2004). The LP
formulation in Section 1.2 was first studied by Gallego et al. (2004), and later considered by van
Ryzin and Liu (2004).
1.1
MDP formulation
For ease of exposition, we use airline terminology throughout the paper. Consider a flight network
with m legs and the set of capacities c = (c1 , . . . , cm ), where ci is the capacity of leg i. There
are n products offered, where a product is a flight itinerary and fare class combination. Let
N = {1, . . . , n} be the set of products. The fare for product j is fj . The consumption matrix is
an (m n)-matrix A (aij ). The entry aij represents the integer amount of resource i required
by a class j customer. The i-th row Ai is the incidence vector for leg i, and the j-th column Aj
is the incidence vector for product j. There are discrete time periods that are counted forward,
so period is the last period. To simplify notation, we reserve the symbols i, j, and t for legs,
products, and time, respectively.
In each period, there is one customer arrival with probability , and no customer arrival with
probability 1 . When a customer arrives, the firm must decide what products to offer. Let
S N be the offer set of the firm. Note that S = means that no product is offered. The
3
customer chooses the product j S with probability Pj (S), and makes no purchase with probability
P
P0 (S) = 1 jS Pj (S).
The state at the beginning of any period t is an m-vector of unsold seats x. So the state space is
X = {0, . . . , c1 } {0, . . . , cm }. Let vt (x) be the maximum total expected revenue over periods
t, . . . , starting at state x at the beginning of period t. The optimality equation is
vt (x) = max
Pj (S)(fj + vt+1 (x Aj )) + (P0 (S) + 1 )vt+1 (x)
SN (x)
jS
Pj (S)[fj (vt+1 (x) vt+1 (x Aj ))] + vt+1 (x)

t, x.
(1)
= max
SN (x)
jS
The boundary conditions are v +1 (x) = 0 for all x. In the above, the set N (x) = {j N : x Aj }
is the set of products that can be offered when the state is x.
The value function at the initial state c can be computed by the following linear program
(D0)
min v1 (c)
v()
X
vt (x)
Pj (S)(fj + vt+1 (x Aj )) + (P0 (S) + 1 )vt+1 (x)
t, x, S N (x)
jS
with decision variables vt (x) t, x.

The equivalence between (D0) and the dynamic program can be shown from complementary
slackness. The finite horizon dynamic program associated with (1) can also be viewed as an infinitehorizon dynamic program with time index included in the state space. It can be shown by induction
that any feasible solution v() to (D0) is an upper bound on vt (). See also Adelman (2006) for
relevant discussions.
1.2
Choice-based linear programming formulation
van Ryzin and Liu (2004) propose a linear programming formulation to generate heuristics for the
problem in (1). A similar formulation is studied in Gallego et al. (2004) in a different problem
context. The model, which ignores the randomness in the system, can be viewed as a deterministic
expected value formulation of (1). Let S denote the firms offer set. Customer demand (viewed as
continuous quantity) flows in at rate . If the set S is offered, product j is sold at rate Pj (S) (i.e.,
a proportion Pj (S) of the demand is satisfied by product j). Let R(S) denote the revenue from
one unit of customer demand when the set S is offered. Then
R(S) =
fj Pj (S).
jS
Note that R(S) is a scalar. Similarly, let Qi (S) denote the resource consumption rate on flight i,
i = 1, . . . , m, given that the set S is offered. Let Q(S) = (Q1 (S), . . . , Qm (S))T . The vector Q(S)
satisfies Q(S) = AP (S), where P (S) = (P1 (S), . . . , Pn (S))T is the vector of purchase probabilities.
4
Let h(S) be the total time the set S is offered. Since the demand is deterministic as seen by
the model and the choice probabilities are time-homogeneous, only the total time a set is offered
matters; i.e., we do not care about the order in which different offer sets are used. The objective is
to find the total time h(S) each set S should be offered to maximize the firms revenue. The linear
program can be written as follows
(LP)
zLP = max
h
R(S)h(S)
SN
Q(S)h(S) c
(2)
h(S) =
(3)
SN
SN
h(S) 0,
S N.
Note that using equality instead of inequality in (3) is without loss of optimality since N ,
so there is a slack variable h().
In the no-choice case, i.e., when Pj (S) = pj j S and Pj (S) = 0 otherwise, it can be
shown that (LP) is equivalent to the deterministic linear programming (DLP) model in revenue
management literature. In this sense, the (LP) model is an extension of (DLP) to the choice case.
The dual values corresponding to (2) can be used to approximate the value of each resource.
Such an approach is taken in van Ryzin and Liu (2004) and Gallego et al. (2004). The (LP) model
is a static model where a single value i is attached to resource i for each i. We can use the
approximation
vt (x) vt (x Aj )
aij i
t.
(4)
A policy can be determined by using the approximation (4) in the dynamic programming recursion
(1). We need to solve
"
#
X
X
max
Pj (S) fj
aij i .
(5)
SN
jS
We have limited ourselves to the homogeneous case where the problem parameters, including
arrival rates and choice parameters, are stationary over time. One way to accommodate nonstationary parameters is to define a decision variable ht (S) for each possible set S in each period t.
Such an extension will have the same number of capacity constraints as (LP) with one constraint
for each resource; hence, a single dual value is attached to each resource. In contrast, the dynamic
programming formulation and the formulation (D0) are both dynamic models that can easily
accommodate non-stationary problem parameters.
The (LP) model has an exponential number of decision variables. Solving (LP) is in general a
hard problem. Furthermore, solving (5) for general choice probability P is a non-trivial problem.
For the MNLD choice model, van Ryzin and Liu (2004) provide an efficient column generation
5
algorithm to solve (LP) and a ranking procedure for solving (5). In MNLD, each customer is
interested in a subset of the products. Let L = {1, . . . , l} be the set of customer segments. We
assume that customer segment l L has consideration set Nl N . We assume l1 6= l2 , Nl1 Nl2 =
; i.e., different customer segments have disjoint consideration sets. Within each segment, customer
choice follows a Multinomial Logit (MNL) model. Under MNL, the choice probability can be defined
by a choice vector. To completely specify the choice probability, we need to specify the preference
value vlj for l L, j Nl , and the no-purchase value vl0 for l L. In general, a choice set S can
also be represented by an availability vector. We use a vector ul to denote the product availability
for segment l such that
(
1 if product k is available
ulk =
k Nl .
0 if product k is not available
For convenience, we use vector and set notations interchangeably. Then the probability a segment
l customer purchases product j Nl is
ulj vlj
.
jNl ulj vlj + vl0
Plj (ul ) = P
To accommodate MNLD choice model in the framework of Section 1.1, we can assume an
arriving customer first chooses which segment he belongs to, and then chooses products within the
given segment. In particular, we assume the probability an arriving customer belongs to segment
P
l is l / where l l = . It then follows that for j Nl ,
Pj (S) =
where
ulk (S) =
l Plj (ul (S))

,
1 if k S
0 if k
/S
k Nl .
Functional approximation
The formulation (D0) has a huge number of decision variables and constraints, making its exact
solution impractical for moderately sized problem instances. One way to reduce the size of the
problem is to approximate vt () by a set of pre-selected basis functions. One potential approach is
to use a linearly parameterized function class
vt (x)
K
X
Vt,k k (x),
(6)
k=1
where k (x) is a pre-specified basis function and Vt,k is the weight on k (x). After plugging (6) into
(D0), the weights can then be determined by solving the resulting linear program. In this paper,
we consider an affine functional approximation, which is a special case of (6).
2.1
Formulation
Consider the affine functional approximation

vt (x) t +
Vt,i xi ,
(7)
where Vt,i estimates the marginal value of a seat on flight i in period t, and t is a constant offset.
We assume +1 = 0 and V +1,i = 0 i.
Plugging (7) to (D0), we obtain
X
(D1) min 1 +
V1,i ci
(8)
,V
t t+1 +
X
i
Vt,i xi Vt+1,i (xi
X
jS
Pj (S)aij )
Pj (S)fj
t, x, S N (x).
jS
(9)
The dual of (D1) is

(P1) zP 1 = max
Y
t,x,SN (x)
xi Yt,x,S =
x,SN (x)
x,SN (x)
Y 0.
Yt,x,S =
X
jS
Pj (S)fj Yt,x,S
ci
if t = 1

P
x,SN (x) xi
jS Pj (S)aij Yt1,x,S t = 2, . . . ,
if t = 1
t = 2, . . . ,
x,SN (x) Yt1,x,S
i, t
(10)
(11)
The constraint (10) is a flow-balance constraint, which says that the mass of each resource i
flowing into time t must equal that flowing out. The constraint (11) can be replaced by
X
Yt,x,S = 1 t.
(12)
x,SN (x)
Therefore, we can interpret the decision variables Yt,x,S as approximate state-action probabilities.
2.2
Relationship to LP
To derive (LP) from (P1), define

h(S)
Yt,x,S
S N.
(13)
t,x
Note that Yt,x,S = 0 S * N (x). Since Yt,x,S can be interpreted as the probability that the set S
is offered in period t given capacity level x, the right hand side of (13) can be interpreted as the
7
total time the set S is offered throughout the time horizon, which is exactly the interpretation of
decision variable h(S) in (LP). Then the objective function in (P1) can be written as
X
X X
Pj (S)fj h(S) =
R(S)h(S).
SN
jS
SN
Summing (10) over t, we obtain

X
xi Yt,x,S = ci +
X
X
t=2 x,SN
t,x,SN
Canceling terms and rearranging, we obtain

ci =
1 X X
X
xi
X
jS
Pj (S)aij Yt1,x,S
Pj (S)aij Yt,x,S +
t=1 x,SN jS
xi Y,x,S
x,SN
xi Y,x,S .
(14)
x,SN
If Yt,x,S > 0, we must have xi aij i, j S; so xi

X
i.
X X
jS
Pj (S)aij . It then follows that
Pj (S)aij Y,x,S .
x,SN jS
Hence, (14) implies that

ci
XX
Pj (S)aij Yt,x,S =
t,x,S jS
Qi (S)h(S).
SN
Summing (11) over t, we get

X
h(S) = .
SN
Together with Proposition 1 in Adelman (2006), we have the following proposition.

Proposition 1 Any feasible solution to (P1) yields a feasible solution to (LP) having the same
objective value. Hence zLP zP 1 v1 (c).
Theorem 1 in Adelman (2006) shows that a similar relation holds when there is no customer
choice among products for a classic revenue management problem. van Ryzin and Liu (2004) show
that zLP is asymptotically optimal, i.e., converges to v1 (c), as demand, capacity, and time horizon
scale linearly. It follows from Proposition 1 that zP 1 is also asymptotically optimal.
Structural properties
We discuss some structural properties of (D1)-(P1). First, we show that only efficient sets are
used in the solution to (P1); i.e., Yt,x,S > 0 only if S is an efficient set. Such a result parallels
that discussed in van Ryzin and Liu (2004) for (LP). In view of the definition of efficient sets
given below, this result has some intuitive appeal. On the other hand, it is not clear that only
8
efficient sets are used in an optimal solution to the dynamic programming formulation (1); therefore,
(P1) inherits the limitation of a linear additive structure of the functional approximation (7),
which is not necessarily satisfied by the value function v(). Second, we show that an optimal
is non-increasing over
solution (V , ) satisfies a time-monotonicity property. In particular, Vt,i
time for each i. Since Vt can be interpreted as the value of a unit of resource i, and less selling
opportunities exist as time moves forward, this property is also quite intuitive. Furthermore, adding
time monotonicity constraints to (D1)-(P1) speeds up the solution algorithm substantially in our
numerical experiments. Hence, such a property is also quite useful from a practical standpoint.
van Ryzin and Liu (2004) study the concept of efficient sets in the context of network revenue
management with customer choice. They show that only efficient sets are used in the solution to
(LP); i.e., if h(S) > 0 in an optimal solution to (LP) for some offer set S, then S must be an
efficient set. The idea of efficient sets is closely related to the earlier work of Talluri and van Ryzin
(2004a) for a single-leg revenue management problem with customer choice, where they show that
only efficient sets are offered in an optimal solution to the dynamic programming formulation of
the problem. Informally, an offer set is inefficient if on average it consumes no less resource but
produces less revenue compared with another (possibly randomized) offer set.
The following definition and lemma are taken from van Ryzin and Liu (2004).
Definition 1 A set T is said to be inefficient if there exists a set of weights (S), S N satisfying
P
SN (S) = 1 and (S) 0 S N such that
R(T ) <
(S)R(S)
SN
Q(T )
(S)Q(S).
SN
Otherwise, the set T is said to be efficient.

Lemma 1 A set T is efficient if and only if, for some = (1 , . . . , m )T 0, T is an optimal
solution to
max{R(S) T Q(S)}.
SN
In the following we show that only efficient sets are used in the solution to (P1).
Proposition 2 If Yt,x,T > 0 in the optimal solution to (P1), then T is an efficient set.
Proof. Fix t, x. From (D1), the reduced cost of column S is
X
X
X
Vt,i xi Vt+1,i (xi
Pj (S)fj t + t+1
Pj (S)aij ) .
jS
jS
If Yt,x,T > 0, then the set T has a reduced cost of zero at an optimal solution ( , V ) to (D1). It
then follows that for all S
X
X
X
Vt,i
Pj (S)fj t + t+1
xi Vt+1,i
(xi
Pj (S)aij )
i
jS
0=
Pj (T )fj t + t+1
jS
X
i
jT
Vt,i
xi Vt+1,i
(xi
X
jT
Pj (T )aij ) .
Canceling terms and rearranging, we obtain

X
X
X
X
X
X
Pj (S)fj
Vt+1,i
Pj (S)aij
Pj (T )fj
Vt+1,i
Pj (T )aij .
jS
jS
jT
jT
The above inequality is satisfied for all S N . Therefore T is an optimal solution to

X
X
X
max
Pj (S)fj
Vt+1,i
Pj (S)aij .
SN
jS
jS
By Lemma 1, T is an efficient set.

We note that solutions to (P1) overcome some of the difficulties encountered when trying to
use solutions of (LP) in the dynamic setting of (1); namely, a solution to (LP) only gives the
time duration for which each offer set should be used, but does not specify an order in which the
sets should be used. The solution to (P1), however, allows different sets to be used at different
times. In fact, from constraint (11), the decision variable Yt,x,S may be interpreted as approximate
state-action probabilities.
Next, we show that there exists an optimal solution (V , ) to (D1) that is time-monotonic.
Given a feasible solution Y to (P1), define the first time resource i is used by
X
ti = arg min t {1, . . . , } : x, s with Yt,x,S > 0,
Pj (S)aij > 0 .
(15)
jS
We have the following monotonicity result.
Theorem 1 Assume ci > 0 i. There exists an optimal solution ( , V ) of (D1) and a set of
indices {ti : i} such that
t t+1
Vt,i
= Vt+1,i
Vt,i
(16)
i, t = 1, . . . , ti 1
Vt+1,i
i, t =
V , 0.
ti , . . . ,
(17)
(18)
(19)
Proof. The proof follows Adelman (2006). See appendix.

can be interpreted
Conditions (17) and (18) show that V is non-increasing over time. Since Vt,i
as approximate marginal value of resource i at time t, this result is intuitively appealing, since as
time moves forward, we have less opportunities to sell.
10
700
resource 1
resource 2
resource 3
resource 4
600
500
400
300
200
100
0
10
t
15
20
Figure 1: Time trajectory of V in an example with 4 resources

Figure 1 shows the time trajectory of V for an example with 4 resources. The value of resource
, is non-increasing over time. The values are constant for part of the selling horizon,
i at time t, Vt,i
and then start to decrease. Such behavior is quite common in other numerical examples.
Column generation algorithm
The program (P1) has a large number of variables but relatively few constraints, so it can be
potentially solved via column generation. In this section, we develop such an algorithm.
First, it is easy to find an initial feasible solution to start the column generation algorithm.
There is one corresponding to closing all products in each period; i.e. let
(
1 if x = c, S =
Yt,x,S =
t, x, S.
0 otherwise
At a given iteration, suppose the primal solution is (V, ). Let t,x,S be the reduced profit of
the column corresponding to x, S in period t. The maximum reduced profit can be computed by
solving
X
X
X
Vt,i xi Vt+1,i (xi
max t,x,S = max
Pj (S)fj
Pj (S)aij ) t + t+1
t,x,SN (x)
t,x,SN (x)
max
t,x,SN (x)
jS
X
jS
"
Pj (S) fj
jS
aij Vt+1,i
11
X
i
(Vt,i Vt+1,i )xi t + t+1 .
If the objective value is greater than 0, then we add the column corresponding to the optimal
solution to the existing set of columns for (P1); otherwise, optimality is attained.
For fixed t > 1, we need to solve the following optimization problem
"
#
X
X
X
(S0)
max
Pj (S) fj
aij Vt+1,i
(Vt,i Vt+1,i )xi t + t+1
x,S
jS
xi aij
i, j S
xi {0, . . . , ci }
(20)
i.
The effectiveness of a column generation algorithm hinges on efficient solution of the column
generation subproblems. For general choice probability P , (S0) is potentially a nonlinear integer
constrained optimization problem, which is quite difficult to solve. Because of constraint (20), the
optimization problem is not separable in x and S. Even if x is fixed, the optimization on S is a
combinatorial optimization which is potentially difficult by itself. For the multinomial logit choice
model with disjoint consideration sets (MNLD), van Ryzin and Liu (2004) develop an efficient
ranking procedure to solve the optimization on S; see also Gallego et al. (2004). However, such a
procedure cannot be extended to solve (S0) which also optimizes over x.
In the following we show that for MNLD, (S0) can be reduced to a linear integer program, the
solution of which is relatively easy. MNLD was first introduced by van Ryzin and Liu (2004) in
the airline network revenue management context.
P
Let l l L be such that lL l = , where l is the probability of a customer arrival in
segment l for a given period. Plugging in the MNLD choice probabilities, (S0) becomes
!
P
P
X l jN ulj vlj [fj i aij Vt+1,i ]
X
l
P
(S-MNLD) max
(Vt,i Vt+1,i )xi t + t+1

x,u
jNl ulj vlj + vl0
i
lL
xi aij ulj
i, j Nl , l L
xi {0, . . . , ci } i
ul {0, 1}|Nl |
l L.
If the integrality constraints are relaxed, the optimization problem above belongs to a class of
optimization problems called Generalized Linear-Fractional Programs (see,e.g. Boyd and Vandenberghe, 2004). A general approach to solve such a problem efficiently is not available. See Schaible
and Shi (2003) for a recent review of literature on this subject.
If x is fixed, (S-MNLD) becomes a linear fractional program with integrality constraints,
which can be reduced to a linear integer program through change of variables. In van Ryzin and
Liu (2004), an efficient greedy algorithm is devised to solve the problem in that case. When the
optimization is on both x and u as in (S-MNLD), one way to solve the problem is to optimize
over x and u sequentially and iterate. It can be shown that such a procedure will not necessarily
lead to an optimal solution.
In the following, we show that (S-MNLD) can be reduced to an equivalent linear integer program by exploiting the structure of the problem. In particular, we use the fact that the consideration
sets are disjoint and that ulj is binary.
12
Let
ulj
jNl vlj ulj + vl0
1
l = P
jNl vlj ulj + vl0
zlj = P
j Nl , l L
(21)
l L.
(22)
It follows from the definition of zlj and l that

X
vlj zlj + vl0 l = 1 l L
jNl
l 0 l.
(23)
Plugging (21)(23) into (S-MNLD), we obtain

"
#
XX
X
X
(S-MNLD1)
max
l vlj fj
aij Vt+1,i zlj
(Vt,i Vt+1,i ) xi t + t+1
x,z,
lL jNl
aij zlj
xi
l
i, j Nl , l L
(24)
xi {0, . . . , ci } i
zlj {0, l } j Nl , l L
X
(25)
l 0
(27)
(26)
jNl
l.
Lemma 2 (S-MNLD) is equivalent to (S-MNLD1); i.e., both optimization problems have the
same optimal objective value and an optimal solution to one can be obtained from an optimal
solution of the other.
Proof. Since (S-MNLD1) is obtained from (S-MNLD) through change of variables, it can
be easily shown that an optimal solution to (S-MNLD) is a solution to (S-MNLD1) and both
optimization problems have the same objective value at the solution.
Suppose (
x, z,
) is an optimal solution to (S-MNLD1). From (25) and (26), we must have
>
0. Let u
lj = zlj /
l l, j. Then (
x, u
) clearly satisfies the constraints in (S-MNLD). Furthermore,
l, j,
u
lj
u
lj
l
zlj
P
=P
=P
= zlj ,
lj vlj + vl0
lj vlj
l + vl0
l
lj vlj + vl0
l
jNl u
jNl u
jNl z
where the last equation follows by using (26). It then follows that the two optimization problems
have the same objective value at the given solution. This completes the proof.
From Lemma 2, it suffices to solve (S-MNLD1). There is potentially one difficulty in solving
(S-MNLD1): the constraint (24) is nonlinear. However, we show that if vl0 > 0 l L and
aij {0, 1} i, j it can be replaced by an equivalent linear constraint.
13
Theorem 2 Suppose vl0 > 0 for all l L and aij {0, 1}. We only need to solve the following
linear integer program to find the maximum reduced profit for each t > 1
"
#
XX
X
X
(S-MNLD2)
max
l vlj fj
aij Vt+1,i zlj
(Vt,i Vt+1,i ) xi t + t+1
x,z,
lL jNl
xi aij vl0 zlj
i, j Nl , l L
xi {0, . . . , ci }
(28)
zlj {0, l } j Nl , l L
X
(29)
jNl
l 0
l.
Proof. We will show that the feasible region does not change before and after replacing the
constraint (24) by (28). From (26),
1
1
l
v
+
v
v
l0
l0
jNl lj
It then follows from (24) that
xi
aij zlj
aij vl0 zlj
l
l L.
(30)
i, j Nl , l L.
Hence all feasible solutions to (S-MNLD1) satisfy (28). Next we show that by replacing (24) with
(28), all feasible solutions satisfy (24). Let (
x, z, )
be a feasible solution after (24) is replaced by
i 1, then
(28). For fixed i, if x
i = 0, then zlj = 0 from (28) for all j such that aij > 0; if x
aij zlj
zlj
x
i 1 l since l {0, 1} and aij {0, 1}; note that from (30)
> 0. This shows that any
feasible solution after replacing the constraint is also feasible for (24). This completes the proof.
Subsequently, we will refer to (S-MNLD2) as the column generation subproblem.
Some existing linear programming solvers, such as CPLEX, provide ways to handle (29) directly.
If the constraint needs to be dealt with explicitly, we can replace it with several linear constraints.
Let M be a big number. Then (29) can be replaced by the following set of constraints
zlj l M (1 lj ) j Nl , l L
zlj M lj
0 zlj l
lj {0, 1}
j Nl , l L
j Nl , l L
l L, j N.
If lj = 1, then zlj = l ; otherwise if lj = 0, then zlj = 0. Let v = minl vl0 , then M can be any
number greater than 1/v.
Policies from functional approximation
In this section we show how to use the value function approximations to construct control policies.
14
5.1
Direct use of dynamic bid-prices
Let (V , ) be the optimal solution for (D1). Using the approximation

X
vt (x) vt (x Aj )
aij Vt,i
,
i
a control policy in period t and state x can be computed by solving for each l
h
i
P
P
v
u
a
V
f
j
jNl lj lj
i ij t+1,i
.
P
max
v
u
+
v
ulj {0,1{xAj } }jNl
lj
lj
l0
jNl
(31)
The constraint ulj {0, 1{xAj } } in (31) incorporates the constraint on capacity. The heuristic is
motivated by the dynamic programming recursion (1). The maximization in (31) can be solved
efficiently using a simple ranking procedure. This fact was previously established by van Ryzin and
Liu (2004) and Gallego et al. (2004). The resulting policy is called ADP in Section 6.
5.2
LP Decomposition
Decomposition approaches where a network program is decomposed into many single resource
problems are widely used in practice. In van Ryzin and Liu (2004), this approach is extended
to the network revenue management problem with customer choice. Let be the vector of dual
values of the resource constraints in (LP). Given i, the value function vt (x) can be approximated
by taking
X
vt (x) vti (xi ) +
k xk .
k6=i
It then follows that
vt (x) vt (x Aj ) vti (xi ) vti (xi aij ) +
k akj .
(32)
k6=i
Plugging (32) into the optimality equation (1) and rearrange, we obtain
X
X
i
vti (xi ) = max
P (S) fj vti (xi ) + vti (xi aij )
i akj + vt+1
(xi ).
SN (x)
(33)
k6=i
jS
Equation (33), together with boundary conditions vi +1 (xi ) = 0 xi , determines a value function
vti () for resource i.
Repeating the above for i = 1, . . . , m gives a set of value functions {
vti ()}i , which can be used
to compute a control policy. Specifically, we use the approximation
X
vt (x)
vti (xi ).
i
Plugging this approximation into the dynamic programming recursion as in (1), a policy can be
determined by solving for each l

!
P
P i
i (x a ))
v
u
f
(
v
(x
)
j
i
i
ij
lj
lj
t
jNl
i t
P
max
t, x.
v
u
+
v
ulj {0,1{xAj } }jNl
lj
lj
l0
jNl
15
This policy is called LPD in Section 6.

We finish this section by showing the following result.
Proposition 3 vti (xi ) +
k6=i k xk
vt (x) i.
Proof. Note that vti () is the solution to the linear program (D0) with vt (x) replaced by vti (xi ) +
P
k6=i k xk . The proof is similar to the proof for Part (i) in Proposition 4 and is omitted.
Proposition 3 shows that LP decomposition produces a bound for the value function of the
problem. Although decomposition heuristic is studied in much of existing literature (see, e.g.,
Talluri and van Ryzin, 2004b), the bound in Proposition 3 is new.
5.3
Decomposition based on solutions to (D1)
We can also use an optimal solution (V , ) to (D1) in a decomposition approach. Given i, we

use the following approximation
X
vt (x) vti (xi ) +

Vt,k
xk t, x,
(34)
k6=i
where vti () is the value function for the leg-i problem. It then follows that
X
Vt,k
akj .
vt (x) vt (x Aj ) vti (xi ) vti (xi aij ) +
k6=i
Substituting (34) into (D0), we obtain

X
(LPi )
min v1i (ci ) +
V1,k
ck
vti ()
k6=i
vti (xi )
Vt,k
xk
k6=i
i
Pj (S)(fj + vt+1
(x aij ) +
(xk akj )Vt+1,k

)
k6=i
jS
i
+ (P0 (S) + 1 )(vt+1
(xi ) +
xk Vt+1,k
)
t, x, S N (x).
(35)
k6=i
Proposition 4 relates the objective value from (LPi ) to zLP , zP 1 , and the MDP value.
Proposition 4
(i) mini {vti (xi ) +
(ii) zLP zP 1 maxi {v1i (ci ) +
k6=i Vt,k xk }
k6=i Vt,1 ck }
vt (x) t, x;
mini {v1i (ci ) +
k6=i Vt,1 ck }
v1 (c).
P
x } v (x) for all i, t, x. We prove the
Proof. (i) We only need to prove vti (xi ) + k6=i Vt,k
t
k
proposition by induction. For t = , we note from (35) that
X
X
vi (xi ) +
V,k
xk
Pj (S)fj
x, S N (x).
k6=i
jS
It follows that
vi (xi ) +
X
k6=i
V,k
xk max
SN (x)
16
X
jS
Pj (S)fj = v (x).
This shows that the proposition holds for t = . Next, assume the proposition holds for t + 1. Then
by (35) and the inductive assumption for all x, S we have
X
X
vti (xi ) +
Vt,k
xk
Pj (S)(fj + vt+1 (x Aj )) + (P0 (S) + 1 )vt+1 (x).
k6=i
jS
From the optimality equation for period t, we obtain

X
vti (xi ) +
Vt,k
xk vt (x).
k6=i
(ii) The first inequality is established in Proposition 1, and the third inequality is immediate.
The last inequality is implied by Part (i). So we will only need to show the second inequality. It
P
c i. Let (V , ) be an optimal solution to (D1). From
suffices to show zP 1 v1i (ci ) + k6=i Vt,1
k
i
x t, x is a feasible solution to (LP ). This
the feasibility of (V , ) to (D1), vt (xi ) = t + Vt,i
i
i
i
establishes the inequality.
P
c } is a tighter bound of v (c) than
Part (ii) in Proposition 4 shows that mini {v1i (ci ) + k6=i Vt,1
1
k
zP 1 . Such a bound is useful because it provides a better benchmark in numerical studies.
(LPi ) has the same number of constraints as (D1) but more decision variables. So solving
(LPi ) is potentially even harder than solving (D1), although it is possible to solve the program via
column generation. Instead, we consider the following dynamic program
(
X
X
max
P (S) fj vti (xi ) + vti (xi aij )
Vt,k
akj
vti (xi ) =
xi ,SN (x)
k6=i
jS
(Vt,k
Vt+1,k
)xk
k6=i
i
(xi ).
+ vt+1
(36)
with boundary conditions vi +1 (xi ) = 0 xi , i. We can use backward recursion algorithm for dynamic programs to solve for vti (). In each iteration, the maximization problem in (36) is almost the
same as the column generation subproblem (S0). The only difference is that here in each iteration
the value of xi is fixed. As discussed in Section 4, for MNLD choice model we only need to solve
a linear mixed integer program. Proposition 5 shows that v1i (ci ) from (36) is equal to v1i (ci ) from
(LPi ).
Proposition 5 v1i (ci ) = v1i (ci ) i.
Proof. We first show that vti (xi ) vti (xi ). The proof is by induction and is similar to the proof of
Part (i) in Proposition 4. It then follows that v1i (ci ) v1i (ci ). Furthermore, by (36), vti () satisfies
the constraints in (LPi ), and is therefore feasible for (LPi ). Hence, v1i (ci ) v1i (ci ). This completes
the proof.
After vti () is determined for each i, we can use the approximation
vt (x) vt (x Aj )
n
X
i=1
17

vti (xi ) vti (xi aij )
to compute a heuristic policy for (1) by solving
"
#
n
X
X

i
i
max
Pj (S) fj
vt (xi ) vt (xi aij )
.
SN (x)
jS
i=1
This optimization again can be done easily for MNLD choice model.
Numerical experiments
We conducted numerical experiments to study the performance of the proposed solution approach
for (P1), the relationships among the different bounds, and the performance of the proposed policy
approaches.
In the no-choice (independent demand) setting, Adelman (2006) shows that the performance of
policies based on dynamic bid-prices depends heavily on the load factor. Because demand depends
on the offer set, the notion of load factor is not immediately clear for network choice problems.
Given choice probability P , let
X
S arg max
Pj (S)fj .
SN
jS
Note that S is a revenue-maximizing set of open products when there is ample capacity of each
resource. We call
P P
P
t=1 jS m
aij Pj (S )
Pm i=1
=
(37)
i=1 ci
the load factor. In the no-choice setting S = N and the load factor defined in (37) is the same as
the one commonly used in the literature; see, e.g., Adelman (2006).
6.1
Computational and bound performance
We study the performance of the proposed algorithm to solve (P1) on randomly generated huband-spoke network instances. We implemented the solution methods using C++ and CPLEX on
an Intel Xeon 3.6GZ workstation. In the column generation algorithm for (P1), the objective value
of the restricted problem plus the sum over time of the maximum reduced profit in each period
serves as an upper bound for the optimal objective value (see Adelman, 2006). In this subsection,
we use a 5% optimality tolerance; that is the column generation procedure terminates when the
sum of the maximum reduced profit in each period over time is within 5% of the objective value of
the restricted problem.
We consider a set of hub-and-spoke network instances with K {2, 4, 8, 12} non-hub locations.
This set of instances is called HS1 subsequently. Figure 2 shows the network structure when K = 4,
where each arc in the network represents a flight leg. Half of the non-hub locations each have two
parallel flights to the hub, and the other half each have two parallel flights from the hub. There
are 2K flights in total. Table 1 shows key statistics for HS1 instances. For K = 12, there are
48 customer segments and 336 products. We assume that all products with the same origin and
destination belong to the same segment, so the number of segments is the same as the number of
18
Figure 2: Network structure for HS1 instances with K = 4.

origin-destination (O-D) pairs. Two classes, a high-fare class and a low-fare class, are offered for
each possible itinerary. The high class fares of local itineraries to the hub are drawn from the Poisson
distribution with mean 30; the corresponding low fares are drawn from the Poisson distribution
with mean 10. All other high fares are drawn from the Poisson distribution with mean 300, and all
other low fares are drawn from the Poisson distribution with mean 100. Choice parameters for high
and low fare products are drawn from the Poisson distribution with mean 50 and 200, respectively.
The no-purchase weight for each segment is drawn from the Poisson distribution with mean 10.
To randomly generate the arrival rate for each customer segment, we draw El from the Poisson
P
distribution with mean 20 for each l and set l = 0.9El / l El . Note that the total arrival rate in
each period is 0.9. We generated problem instances with {50, 100, 200, 400, 800}.
Table 2 shows the load factor and capacity per leg for the problem instances. The capacity
across legs is taken to be the same for each instance. The CPU seconds to solve (P1) is also
reported. The biggest instance with 12 non-hub locations and 800 periods takes about 24 minutes
to solve, which is practical in real application. Also the CPU seconds increases as and K increases.
This observation is not surprising, since the size of (P1) increases with both and K.
Table 3 reports the CPU seconds for HS1 instances except for half the load factor. The load
was cut in half by cutting the arrival rate in each period in half. The table shows that it is 3.63
times faster on average to solve these problems with a range of 1.18 to 7.88. This speedup can be
explained by the fact that as load factor decreases, the combination of a few offer sets provides
near-optimal performance. In particular, when there is enough capacity to accept all customer
requests, the policy that only offers the revenue maximization set S is optimal. As a result, fewer
columns need to be added when solving (P1).
Table 4 reports the approximate relative difference zLP /zP 1 for HS1. Note that the difference is
not exact since (P1) is not solved to optimality. For this set of instances, the difference is relatively
small, with the largest difference about 8% for the case with = 50 and K = 12.
However, we use another set of instances to show that the relative difference zLP /zP 1 can be
huge. We consider hub-and-spoke networks with K {2, 4, 8, 16} non-hub locations. This set
of instances is called HS2 subsequently. There are flight to and from the hub for each non-hub
location. Figure 3 shows the network structure when K = 4. One product is offered for each
itinerary. The fare is generated from the Uniform[75,250]. The choice parameter value for each
19
Figure 3: Network structure for HS2 instances with K = 4.

product is three times a random number drawn from the Poisson distribution with mean 20. The
no-purchase value for each segment is set to 1. We draw El from the Uniform distribution with
P
range 0.1 to 0.7 for each l and set l = 0.8El / l El . We generated problem instances with
= {20, 50, 100, 200, 400, 800}. Capacity is taken to be the same across legs in each instance.
Table 5 report load factor and capacity for the instances. Table 6 shows the approximate relative
difference zLP /zP 1 for HS2. For = 20 and K = 16, the relative difference is about 53%. This
shows that zP 1 can serve as a much better revenue bound. The magnitude of difference reported
here is similar to that reported in Adelman (2006).
In both Tables 4 and 6, the approximate relative difference increases as the horizon length
decreases. This behavior can be explained by the result of van Ryzin and Liu (2004), which shows
that LP is asymptotically optimal as the problem scales up linearly in capacity and time. The
difference also appears to be bigger for problems with more complex network structure.
6.2
Description of simulated instances
We studied the relative performance of four different heuristic policies. LP is the policy specified
by (5). LPD was introduced in Section 5.2. Both LP and LPD are based on dual values from (LP)
and were previously studied by van Ryzin and Liu (2004). ADP was introduced in Section 5.1,
and ADPD was introduced in Section 5.3. In addition, we also tested each heuristic where (LP)
and (P1) are resolved 5 times for equally-spaced time intervals. Each heuristic is simulated 100
times using the same arrival streams. We tested the heuristics on four sets of instances. Table 7
reports the capacity configuration and load factor of the test instances, and Table 8 summarizes
key statistics of the simulated instances. These test instances cover a wide variety of cases with
different network structure and randomly generated choice parameters. For each set of instances,
we consider {20, 50, 100, 200, 400}.
PF1 (4 parallel flights). Parallel flights are scheduled flights on the same origin and destination on the same day. In practice, substitution among parallel flights is widely observed. We
randomly generated instances with 4 parallel flights. A single class is offered on each flight. All the
products belongs to the same segment, and the arrival probability is 0.9 in each period. The fare
is generated from Uniform[10,100]. The MNL choice parameter for product j, vj , is generated from
P
the Poisson distribution with mean 100. The no-purchase weight, v0 , equals 0.5 nj=1 vj ; hence
20
when all products are open, the no purchase probability is 1/3.

PF2 (8 parallel flights). This set of instances is the same as PF1 except there are 8 instead
of 4 parallel flights.
HS3 (Hub-and-spoke network with 2 non-hub locations). There are 4 parallel flights
from location 1 to the hub, and 4 parallel flights from the hub to location 2. The fare from location
1 to the hub is generated from Uniform[1,10], and the fare from the hub to location 2 is generated
from Uniform[10,100]. Each through itinerary fare is 0.95 times the corresponding sum of local
itinerary fares. There are three disjoint product segments, one for location 1 to the hub, one for
the hub to location 2, and one for the through itinerary (location 1 to location 2). The arrival rates
are 0.45,, 0.225, and 0.225, respectively. The MNL choice parameter for product j, vlj , is generated
from the Poisson distribution with mean 100. The no-purchase weight for segment l, vl0 , equals
P
0.5 jNl vlj .
HS4 (Hub-and-spoke network with 4 non-hub locations). This set of instances is generated the same way as HS1 except for the arrival rates. We assume the locations are marked
1 to 4. Locations 1 and 2 each have two parallel flights to the hub, and locations 3 and 4 each
have two parallel flights from the hub. All products with the same origin and destination belong
to the same segment, so there are 8 segments. Segments 1 to 4 correspond with local itineraries
for locations 1 to 4, respectively. Segments 5 to 8 contain through itinerary products. The arrival
rates for segments 1 to 8 are 0.2, 0.2, 0.05, 0.05, 0.1, 0.1, 0.1, and 0.1, respectively.
6.3
Policy results
The bounds and simulated averages for the test instances are reported in Tables 912. To make it
easy to compare the bounds and performance, we also report the relative difference of bounds and
policy performance in Tables 13 and 14.
We report four different bounds. (LP) is solved to optimality in all the examples, and the
P
v1i (ci ) + k6=i k ck }
objective value zLP is reported in the LP bound column. The value mini {
is reported in the LPD bound column. The objective value from (P1) is reported in P1 bound
P
c } is reported in
column . (P1) is solved to 0.5% of optimality. The value mini {v1i (ci ) + k6=i V1,k
k
the ADPD bound column. Note that since (P1) is not solved to optimality, the ADPD bound
inherits errors from the solution of (P1). Consequently, the reported ADPD bound is not always
tighter than P1 bound, as suggested by Proposition 4.
Our numerical results indicate that ADP can outperform LP with a revenue difference of up to
440%. Even with resolving, ADP beats LP by up to 8%. Such a difference in revenue performance
is quite significant for revenue management applications. We postulate that the difference in performance is due to the fact that (P1)-(D1) gives better bid-prices, which are time-dependent. We
should also note that the bid-price of one resource affects the set of products offered; even products
that do not use this particular resource can be affected because of customer choice among products.
This should be contrasted with the bid-price controls for the independent demand case, where the
bid-price of a particular resource only affects the controls of products that use this resource. We
also note that the performance difference between LP and ADP tends to, although not always,
21
be smaller for problems with longer time horizon. This behavior again can be attributed to the
asymptotic result of van Ryzin and Liu (2004).
6.4
Decomposition
LPD and ADPD are both based on decomposition of the network problem into many leg-based
problems. van Ryzin and Liu (2004) report that LPD performs consistently better than LP. It
is also our experience that decomposition-based approaches, including LPD and ADPD, usually
perform better than naive bid-price controls.
We are however more interested in the relative performance of LPD and ADPD. Tables 13 and
14 show that without resolving ADPD can perform up to 9% better than LPD. With resolving,
the difference can still be as high as 6%. Of course, to use ADPD, we need to solve (P1) and the
harder dynamic programming recursion outlined in Section 5.3.
The relative performance of LPD and ADPD is affected by several factors. First, it depends
on the decomposability of a particular problem. The basic decomposition idea is to decompose the
dynamic programming value function vt (x) so that
vt (x)
m
X
vti (xi ) t, x
(38)
i=1
for a set of functions {vti ()}t,i . We say a problem is fully decomposable if the approximation in
(38) is exact. To gain insights into the performance of decomposition heuristics, consider two fully
decomposable problems:
1. One flight with multiple fare classes;
2. A parallel flights problem with independent demand for each flight.
In both cases, the problems can be trivially decomposed into single-leg problems. Consequently,
the decomposition heuristics actually give an optimal policy for both problems. In fact, the bidprices from (LP) and (P1) are not needed for the decomposition.
Now, consider two modified problems:
1. A hub-and-spoke network problem with one flight for each local itinerary and very low arrival
rates for through itineraries;
2. A parallel flights problem where only a small portion of customers switch flights if their most
preferred flight is not offered.
Both problems listed above are clearly not fully decomposable because of a network effect and
a customer choice effect, respectively. However, we expect that decomposition-based approaches
work well for these problems, because the network effect and customer choice effect are weak. On
the other hand, we usually see that the relative performance difference is bigger for problems with
a stronger network effect or customer choice effect. The customer choice effect is demonstrated by
22
the results of PF1 and PF2. The two example sets differ only in the number of parallel flights.
With more parallel flights, the customer choice effect is more pronounced. Table 13 shows the
difference between LPD and ADPD is much larger for PF2 than PF1 cases.
Other factors that affect the relative performance between LPD and ADPD includes load
factor and capacity asymmetry. When load factor is either very high or very low, the performance
difference is usually small. This is not surprising since effective revenue management control is most
needed when the load factor is in an intermediate range. We also observe that when flights differ
significantly in their capacity, the performance difference is usually bigger. When flight capacity is
asymmetric, it is more likely that some resources are more critical than others. Hence, it is more
important to use accurate bid-prices in such cases.
Appendix A: Proof of Theorem 1

We first prove the following lemma.
Lemma 3 Fix i and assume ti exists (which implies ci > 0). Then
(i) t > ti , Yt,x,S > 0 such that xi < ci .
(ii) t ti , Yt,x,S > 0 only if xi = ci .
Proof. Assume ti < . Let x and S be such that Yt,x ,S > 0 and
definition of ti . Then
X
xi
Pj (S )aij < xi ci .
jS
Pj (S )aij > 0 as in the
jS
For period
ti
+ 1,
xi Yti +1,x,S =
x,SN (x)
x,SN (x)
xi
xi
jS
< ci .
X
jS
Pj (S)aij Yti ,x,S
Pj (S )aij Yti ,x ,S +
ci Yti ,x,S
x6=x ,S6=S
If xi = ci for all x, S such that Yti +1,x,S > 0, the left hand side equals ci , a contradiction. Repeat
the argument shows Part (i).
The proof for Part (ii) is by induction. The statement is true for t = 1 from problem definition
(xi = ci for t = 1). For 1 < t ti , suppose the statement is true for t 1. Then
X
X
X
xi Yt,x,S =
xi Yt1,x,S = ci
Yt1,x,S = ci .
x,SN (x)
x,SN (x)
x,SN (x)
Suppose xi < ci for some x , S such that Yt,x ,S > 0. Then

X
X
xi Yt,x,S =
xi Yt,x,S + xi Yt,x ,S < ci ,
x,SN (x)
x6=x ,S6=S
23
which is a contradiction. This completes the proof.

The main proof follows.
Proof. Suppose ti defined by (15) exists. Consider ti = ti . The case where ti does not exist is
considered later. Monotonicity of V is proved in three steps.
V
1. Vt,i
t+1,i t > ti .
V
2. Vt,i
t+1,i t < ti .
< V t + 1, i for some t t , then we can raise V to V
3. If Vt,i
i
t,i
t+1,i without loss of optimality.
Finally, the remaining claims are proved.

Step 1. Assume ti < . Consider an optimal primal-dual solution Y , (V , ). From Lemma 3,
Yti ,x,S > 0 such that xi < ci . Therefore, complementary slackness implies that
#
"
X
X
X

Pj (S) fj
aij Vt+1,i
Vt,k
Vt+1,k
xk t + t+1
= 0.
(39)
i
jS
Consider an x such that xi = xi + 1 and xk = xk k 6= i. Note that S N (x) N (x ). By dual

feasibility
#
"
X
X
X

Vt,k
xk Vt,i
(xi + 1)
Pj (S) fj
aij Vt+1,i
Vt+1,k
Vt+1,i
jS
k6=i
0.
t + t+1
(40)
Combining (39) and (40) leads to (17).

V
Step 2. We now show that Vt,i

t+1,i for all t < ti . The constraint (11) implies that there
exists a Yt,x,S
> 0 for some x and S. From Lemma 3, it has the property xi = ci . By complementary
slackness, we have
"
#
X
X
X

Vt,k
xk Vt,i
ci
Pj (S) fj
aij Vt+1,i
Vt+1,k
Vt+1,i
i
jS
k6=i
t + t+1
0.
(41)
P
Now consider an x equals to x except xi = ci 1 0. From definition of ti , j Pj (S)aij = 0 for
all t < ti . Hence S is still feasible. By dual feasibility
"
#
X
X
X

Pj (S) fj
aij Vt+1,i
Vt,k
Vt+1,k
xk Vt,i
Vt+1,i
(ci 1)
jS
k6=i
t + t+1
0.
(42)
V
Combining (41) and (42) yields Vt,i
t+1,i .
Step 3. Given an optimal primal-dual solution Y , ( , V ), consider the largest t ti such
< V
that Vt,i
t+1,i . We will construct an optimal solution that satisfies (17) (or (18) if t = ti )
24
by showing the solution preserves dual feasibility and complementary slackness. Denote the new
solution ( , V ), and let it be the same as ( , V ) except that
= Vt+1,i
Vt,i
t = t + (Vt , i Vt+1,i
)ci .
We first check period t inequality (9). Given x, S, the reduced profit is

"
#
X
X
X

Pj (S) fj
aij Vt+1,i
Vt,k
Vt+1,k
xk Vt,i
Vt+1,i
xi t + t+1
=
k6=i
jS
"
Pj (S) fj
jS
aij Vt+1,i
X
k6=i
0.

Vt,k
xk t Vt,i
ci + t+1
Vt+1,k
Vt+1,i
(43)
The inequality follows because (43) is the reduced profit under dual feasible solution ( , V ) for an
> 0. Because
x with xi = ci and a feasible set S. In particular, consider any x, S such that Yt,x,S
t ti , it follows from Lemma 3 that xi = ci and (41) holds under ( , V ). Observe that the
expression in (43) is the same as (41); i.e., the reduced profit is the same under ( , V ) as under
( , V ). Therefore, complementary slackness is preserved.
Next, we check dual feasibility and complementary slackness for period t 1 inequalities (9),
assuming t > 1. For given x, S, (9) can be rewritten as
"
#
X
X
X

Pj (S) fj
aij Vt,i
Vt1,k
Vt,k
xk Vt1,i
Vt,i
xi
k6=i
jS
t1
+ t 0.
Under ( , V ) the reduced profit is
X
X
X

Pj (S) fj
akj Vt,k
aij Vt+1,i
Vt1,k
Vt,k
xk Vt1,i
Vt+1,i
xi
jS
k6=i
(44)
k6=i
t1
+ t + (Vt,i
Vt+1,i
)ci .
(45)
Subtracting (44) from (45), we obtain
(Vt,i
Vt+1,i
)
X
jS
Pj (S)aij + (ci xi ) 0.
Hence dual feasibility is preserved. Furthermore, for any Yt1,x,S

> 0, we know from Lemma 3
P
that xi = ci and from the definition of ti that jS Pj (S)aij = 0. Therefore, the last inequality
changes to equality; i.e., complementary slackness is preserved.
and appear no where else besides considered above, and hence
The modified variables Vt,i
t
overall complementary slackness and dual feasibility is preserved. The same argument applied
successively for each smaller t leads to (17).
25
V
If no ti exists, step 2 of the argument above shows that Vt,i
t+1,i for all 1 t . The
construction procedure in step 3 can be applied to achieve Vt,i = Vt+1,i for all t. Because V+1,i = 0,
= 0 for all t, hence (17) holds for t = .
it follows that Vt,i
i
The nonnegativity of Vt,i in (19) follows from (18) and V+1,i = 0. The time monotonicity of
t given in (16) follows from (9) because x = 0 and S = is a feasible state-action pair. The
nonnegativity of in (19) follows from (16) and +1 = 0.
26
Case
# of non-hub locations
# of O-D pairs (segments)
# of resources
# of itineraries
# of products
HS1
K {2, 4, 8, 12}
2
K + K4
2K
2K + K 2
4K + 2K 2
HS2
K {2, 4, 8.16}
2K + K(K 1)
2K
2K + K(K 1)
4K + 2K(K 1)
Table 1: Statistics of hub-and-spoke test instances HS1 and HS2.
27
50
100
200
400
800
load
factor
1.37
1.35
1.36
1.35
1.40
2,4,16
capacity
per leg
10
21
38
79
156
CPU
seconds
0.33
0.85
1.24
3.59
6.73
# non-hub locations, #
4,8,48
load
capacity
CPU
factor
per leg
seconds
1.31
6
1.98
1.39
11
2.66
1.39
22
5.87
1.42
45
12.31
1.38
88
24.93
resources, # products
8,16,160
load
capacity
CPU
factor
per leg
seconds
1.42
3
18.74
1.47
6
35.15
1.46
12
55.04
1.43
25
164.98
1.42
49
239.12
load
factor
1.54
1.53
1.37
1.43
1.43
Table 2: Load factor, capacity, and CPU seconds for HS1.
50
100
200
400
800
2
0.24
0.42
0.86
1.18
5.72
locations
4
8
0.33 6.15
0.99 5.09
2.14 22.16
3.83 42.19
7.46 50.35
12
104.42
29.31
34.36
100.14
525.17
Table 3: CPU seconds to solve (P1) for HS1 with half the load factor.
12,24,336
capacity
CPU
per leg
seconds
2
332.77
4
150.42
9
270.63
17
557.12
34
1432.72
locations
50
100
200
400
800
zLP
4769.99
10400.20
16996.20
39471.10
69810.90
2
zP 1
4685.68
10286.70
16748.40
39334.40
69079.90
4
zLP
zP 1
1.02
1.01
1.01
1.00
1.01
zLP
7829.55
14512.40
28820.00
59009.30
114076.00
zP 1
7681.12
14304.70
28540.00
58415.30
112267.00
zLP
zP 1
1.02
1.01
1.01
1.01
1.02
zLP
8865.23
17906.80
34230.40
70714.00
137934.00
zP 1
8400.01
17343.90
33448.50
69336.90
135953.00
zLP
zP 1
1.06
1.03
1.02
1.02
1.01
zLP
8756.77
17266.10
38154.10
73638.30
144887.00
Table 4: Approximate upper bounds and relative difference for HS1.
28
20
50
100
200
400
800
2,4,12
load
capacity
factor
per leg
1.81205
3
1.5718
8
1.59299
19
1.5843
34
1.58434
68
1.58774
126
# non-hub locations, #
4,8,40
load
capacity
factor
per leg
1.57753
2
1.54017
5
1.6015
10
1.58113
20
1.60127
40
1.58524
81
resources, # products
8,16,144
16,32,544
load
capacity
load
capacity
factor
per leg
factor
per leg
1.79335
1
0.92483
1
1.47362
3
2.29805
1
1.46147
6
1.53856
3
1.5985
11
1.54098
6
1.61746
22
1.66522
11
1.57865
43
1.62501
23
Table 5: Load factor and capacity for HS2.
20
50
100
200
400
800
2
1.05
1.02
1.01
1.01
1.01
1.00
locations
4
8
1.17 1.41
1.05 1.12
1.03 1.07
1.02 1.04
1.02 1.03
1.02 1.03
16
1.53
1.35
1.13
1.07
1.05
1.04
Table 6: Approximate relative difference for HS2.
12
zP 1
8125.07
16648.40
37198.00
72106.40
142255.00
zLP
zP 1
1.08
1.04
1.03
1.02
1.02
PF1
20
50
100
200
400
Capacity
1133
4488
8 8 16 16
20 20 30 30
40 40 60 60
PF2
Load
Factor
1.11
1.12
0.59
1.09
1.2
Capacity
11111111
22224444
44448888
9 9 9 9 16 16 16 16
20 20 20 20 30 30 30 30
HS3
Load
Factor
1.26
1.1
1.04
1.09
1.09
Capacity
11111111
22442244
44884488
9 9 15 15 9 9 15 15
17 17 31 31 17 17 31 31
HS4
Capacity
Capacity
22222222
55555555
11 11 11 11 11 11 11 11
21 21 21 21 21 21 21 21
43 43 43 43 43 43 43 43
Load
Factor
1.56
1.41
1.25
1.48
1.48
Load
Factor
1.52
1.48
1.35
1.42
1.36
Table 7: Load factor and capacities for simulated instances.
29
Case
# of non-hub locations
# of O-D pairs (segments)
# of resources
# of itineraries
# of products
PF1
1
4
4
4
PF2
1
8
8
8
HS3
2
3
8
24
24
HS4
4
8
8
24
48
Table 8: Statistics of simulated test instances.
Re-solving
20
50
100
200
400
20
50
100
200
400
LP
bound
305.499
1734.61
1609.69
3550.27
7359.56
LPD
bound
304.261
1733.54
1609.63
3550.27
7359.56
LP
128.56
378.69
1422.97
3075.73
3393.99
281.49
1566.05
1546.57
3385.39
6932.41
Stdev
(6.41)
0.00
(194.74)
(266.17)
0.00
(34.32)
(155.77)
(77.32)
(132.20)
(283.70)
LPD
286.95
1589.82
1486.29
3341.04
7175.14
290.44
1636.20
1531.20
3433.37
7232.18
Stdev
(36.93)
(153.29)
(110.67)
(173.92)
(232.30)
(28.18)
(118.83)
(76.06)
(115.59)
(152.03)
P1
bound
301.918
1728
1608.83
3550.24
7359.56
ADPD
bound
300.085
1721.08
1608.62
3550.35
7377.72
ADP
295.74
1666.95
1462.21
3526.60
7351.05
296.18
1670.61
1564.52
3525.36
7318.45
Table 9: Bounds and simulated average revenue for PF1.
Stdev
(19.72)
(123.85)
(163.93)
(56.33)
(42.54)
(22.04)
(114.00)
(74.23)
(55.95)
(128.24)
ADPD
289.28
1642.89
1551.98
3460.37
7295.36
296.11
1666.27
1575.57
3518.41
7338.04
Stdev
(31.14)
(117.78)
(83.33)
(137.83)
(144.87)
(23.19)
(102.39)
(50.46)
(70.71)
(67.18)
Re-solving
20
50
100
200
400
20
50
100
200
400
LP
bound
424.272
1177.18
2498.59
6263.01
12620.3
LPD
bound
422.495
1172.82
2498.59
6262.97
12620.3
LP
166.18
1071.83
1573.23
4596.93
11500.01
360.35
1047.74
2276.78
5867.58
12029.51
Stdev
(14.04)
(107.29)
(21.80)
(520.95)
(454.43)
(55.75)
(96.35)
(135.91)
(238.17)
(393.85)
LPD
390.77
1001.83
2249.15
5294.28
11427.90
389.41
1069.40
2346.49
5899.84
12131.83
Stdev
(40.86)
(125.58)
(174.79)
(400.87)
(651.68)
(39.64)
(90.52)
(128.18)
(251.61)
(336.61)
P1
bound
399.752
1149.93
2493.98
6250.53
12618.3
ADPD
bound
399.925
1135.45
2496.9
6260.75
12663.7
ADP
395.81
1106.87
2443.76
6040.58
12543.88
391.99
1105.67
2449.23
6145.10
12496.28
Stdev
(36.65)
(87.33)
(78.10)
(213.50)
(161.25)
(41.23)
(91.97)
(71.97)
(136.46)
(182.70)
ADPD
395.04
1094.14
2397.51
5998.74
12354.94
393.57
1114.75
2445.00
6116.54
12520.95
Stdev
(38.32)
(99.16)
(119.18)
(297.17)
(370.28)
(38.61)
(82.59)
(70.29)
(182.68)
(163.59)
Table 10: Bounds and simulated average revenue for PF2.
30
Re-solving
20
50
100
200
400
20
50
100
200
400
LP
bound
282.75
885.26
936.57
3322.46
6680.41
LPD
bound
276.20
873.80
931.79
3316.06
6674.38
LP
198.47
708.21
828.25
2787.66
6123.25
219.84
748.63
844.74
3088.68
6368.18
Stdev
(69.10)
(133.30)
(107.30)
(277.20)
(403.73)
(61.51)
(113.22)
(81.24)
(184.74)
(242.72)
LPD
215.59
728.02
849.25
3110.89
6393.09
223.02
768.14
875.66
3169.09
6486.49
Stdev
(61.70)
(131.93)
(96.11)
(242.15)
(319.41)
(60.54)
(99.57)
(74.21)
(167.84)
(222.64)
P1
bound
250.24
854.10
923.70
3307.20
6664.83
ADPD
bound
250.49
843.17
920.54
3314.41
6681.96
ADP
235.58
772.03
847.81
2894.62
6226.28
237.25
785.59
877.89
3166.23
6432.14
Table 11: Bounds and simulated average revenue for HS3.
Stdev
(51.41)
(109.20)
(72.71)
(275.36)
(305.55)
(48.66)
(98.27)
(73.81)
(176.80)
(278.87)
ADPD
235.58
779.50
881.78
3134.01
6480.61
237.17
791.82
885.71
3229.29
6522.37
Stdev
(51.23)
(113.09)
(70.62)
(220.60)
(275.52)
(48.05)
(93.28)
(62.41)
(122.21)
(214.97)
Re-solving
20
50
100
200
400
20
50
100
200
400
LP
bound
2534.41
6418.83
13889.00
26454.40
55660.10
LPD
bound
2501.63
6388.56
13877.90
26422.00
55611.30
LP
1790.82
3965.17
11110.40
22504.33
51122.14
1963.30
5398.34
12279.58
24357.47
51863.12
Stdev
(386.52)
(753.34)
(1115.40)
(1341.46)
(2409.41)
(379.91)
(537.58)
(862.97)
(1169.01)
(1961.96)
LPD
1975.15
5478.94
12933.26
24437.91
53684.53
2054.06
5642.24
12929.16
24901.04
53722.25
Stdev
(367.98)
(579.81)
(728.68)
(1243.38)
(1666.95)
(332.89)
(460.03)
(687.85)
(947.68)
(1315.53)
P1
bound
2371.81
6300.67
13660.20
26347.20
55409.80
ADPD
bound
2369.42
6292.95
13671.60
26413.10
55541.50
ADP
2053.02
5626.97
12848.76
24319.58
51638.53
2095.46
5769.49
12805.56
24783.93
52225.56
Stdev
(328.91)
(558.80)
(716.39)
(1261.57)
(2300.02)
(312.13)
(486.50)
(796.65)
(1060.12)
(1864.89)
ADPD
2095.84
5783.79
12912.86
25081.66
53712.35
2082.81
5793.13
12980.10
25117.72
53855.08
Table 12: Bounds and simulated average revenue for HS4.
31
PF1
Re-solving
20
50
100
200
400
20
50
100
200
400
LP bound
P1 bound
101.19%
100.38%
100.05%
100.00%
100.00%
LPD bound
ADPD bound
101.39%
100.72%
100.06%
100.00%
99.75%
PF2
ADP
LP
230.04%
440.19%
102.76%
114.66%
216.59%
105.22%
106.68%
101.16%
104.13%
105.57%
ADPD
LPD
100.81%
103.34%
104.42%
103.57%
101.68%
101.95%
101.84%
102.90%
102.48%
101.46%
LP bound
P1 bound
106.13%
102.37%
100.18%
100.20%
100.02%
LPD bound
ADPD bound
105.64%
103.29%
100.07%
100.04%
99.66%
ADP
LP
238.18%
103.27%
155.33%
131.40%
109.08%
108.78%
105.53%
107.57%
104.73%
103.88%
ADPD
LPD
101.09%
109.21%
106.60%
113.31%
108.11%
101.07%
104.24%
104.20%
103.67%
103.21%
Table 13: Relative difference in bounds and policy performance for parallel flights instances PF1 and PF2.
Stdev
(338.19)
(462.26)
(736.21)
(1095.66)
(1654.14)
(313.85)
(459.11)
(690.57)
(946.74)
(1369.68)
HS3
Re-solving
32
5
20
50
100
200
400
20
50
100
200
400
LP bound
P1 bound
112.99%
103.65%
101.39%
100.46%
100.23%
LPD bound
ADPD bound
110.27%
103.63%
101.22%
100.05%
99.89%
HS4
ADP
LP
118.70%
109.01%
102.36%
103.84%
101.68%
107.92%
104.94%
103.92%
102.51%
101.00%
ADPD
LPD
109.27%
107.07%
103.83%
100.74%
101.37%
106.34%
103.08%
101.15%
101.90%
100.55%
LP bound
P1 bound
106.86%
101.88%
101.67%
100.41%
100.45%
LPD bound
ADPD bound
105.58%
101.52%
101.51%
100.03%
100.13%
ADP
LP
114.64%
141.91%
115.65%
108.07%
101.01%
106.73%
106.88%
104.28%
101.75%
100.70%
ADPD
LPD
106.11%
105.56%
99.84%
102.63%
100.05%
101.40%
102.67%
100.39%
100.87%
100.25%
Table 14: Relative difference in bounds and policy performance for hub-and-spoke network instances HS3 and HS4.
References
D. Adelman. Dynamic bid-prices in revenue management. Forthcoming in Operations Research,
2006.
P. P. Belobaba and L. R. Weatherford. Comparing decision rules that incorporate customer diversion in perishable asset revenue management situations. Decision Sciences, 27(2):343363,
1996.
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont,
MA, 1996.
S. Boyd and L. Vandenberghe. Convex Programming. Cambridge University Press, 2004.
S. L. Brumelle, J. I. McGill, T. H. Oum, K. Sawaki, and M. W. Tretheway. Allocation of airline
seats between stochastically dependent demands. Transportation Science, 24(3):183192, 1990.
D. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850865, 2003.
G. Gallego, G. Iyengar, R. Phillips, and A. Dubey. Managing flexible products on a network. CORC
Technical Report Tr-2004-01, IEOR Department, Columbia University, 2004.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John
Wiley & Sons, New York, 1994.
S. Schaible and J. Shi. Fractional programming: The sum-of-ratios case. Optimization Methods
and Software, 18(2):219229, 2003.
P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian decision
processes, 1985.
K. Talluri and G. van Ryzin. Revenue management under a general discrete choice model of
consumer behavior. Management Science, 50(1):1533, 2004a.
K. Talluri and G. van Ryzin. The Theory and Practice of Revenue Management. Kluwer Academic
Publishers, 2004b.
G. van Ryzin and Q. Liu. On the choice-based linear programming model for network revenue
management. Working paper, Columbia Graduate School of Business, 2004.
G. van Ryzin and G. Vulcano. Simulation-based optimization of virtual nesting controls under
consumer choice behavior. Working paper, Columbia Graduate School of Business, 2004.
D. Zhang and W. L. Cooper. Revenue management for parallel flights with customer-choice behavior. Operations Research, 53(3):415431, 2005.
33
D. Zhang and W. L. Cooper. Pricing substitutable flights in airline revenue management. Working
paper, Graduate School of Business, University of Chicago, 2006.
W. Zhao and Y.-S. Zheng. A dynamic model for airline seat allocation with passenger diversion
and no-shows. Transportation Science, 35(1):8098, 2001.
34

An Approximate Dynamic Programming Approach To Network Revenue Management With Customer Choice

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

An Approximate Dynamic Programming Approach To Network Revenue Management With Customer Choice

Uploaded by

Copyright:

Available Formats

An Approximate Dynamic Programming Approach to Network Revenue

Management with Customer Choice

Pj (S)[fj (vt+1 (x) vt+1 (x Aj ))] + vt+1 (x)

with decision variables vt (x) t, x.

Choice-based linear programming formulation

l Plj (ul (S))

Consider the affine functional approximation

Vt,i xi Vt+1,i (xi

The dual of (D1) is

x,SN (x) Yt1,x,S

To derive (LP) from (P1), define

Summing (10) over t, we obtain

Canceling terms and rearranging, we obtain

If Yt,x,S > 0, we must have xi aij i, j S; so xi

Pj (S)aij . It then follows that

Hence, (14) implies that

Summing (11) over t, we get

Together with Proposition 1 in Adelman (2006), we have the following proposition.

Otherwise, the set T is said to be efficient.

Canceling terms and rearranging, we obtain

The above inequality is satisfied for all S N . Therefore T is an optimal solution to

By Lemma 1, T is an efficient set.

We have the following monotonicity result.

Proof. The proof follows Adelman (2006). See appendix.

Figure 1: Time trajectory of V in an example with 4 resources

Column generation algorithm

(Vt,i Vt+1,i )xi t + t+1 .

(Vt,i Vt+1,i )xi t + t+1

It follows from the definition of zlj and l that

vlj zlj + vl0 l = 1 l L

Plugging (21)(23) into (S-MNLD), we obtain

xi aij vl0 zlj

It then follows from (24) that

Policies from functional approximation

Direct use of dynamic bid-prices

Let (V , ) be the optimal solution for (D1). Using the approximation

It then follows that

vt (x) vt (x Aj ) vti (xi ) vti (xi aij ) +

This policy is called LPD in Section 6.

Decomposition based on solutions to (D1)

We can also use an optimal solution (V , ) to (D1) in a decomposition approach. Given i, we

vt (x) vti (xi ) +

Substituting (34) into (D0), we obtain

(xk akj )Vt+1,k

(i) mini {vti (xi ) +

(ii) zLP zP 1 maxi {v1i (ci ) +

mini {v1i (ci ) +

From the optimality equation for period t, we obtain

to compute a heuristic policy for (1) by solving

Computational and bound performance

Figure 2: Network structure for HS1 instances with K = 4.

Figure 3: Network structure for HS2 instances with K = 4.

Description of simulated instances

when all products are open, the no purchase probability is 1/3.

Appendix A: Proof of Theorem 1

Pj (S )aij > 0 as in the

Pj (S)aij Yti ,x,S

Suppose xi < ci for some x , S such that Yt,x ,S > 0. Then

which is a contradiction. This completes the proof.

Finally, the remaining claims are proved.

Consider an x such that xi = xi + 1 and xk = xk k 6= i. Note that S N (x) N (x ). By dual

Combining (39) and (40) leads to (17).

Step 2. We now show that Vt,i