Optimal Scheduling of a Two-Stage Hybrid Flow Shop

Math. Meth. Oper. Res.
(2006) 64: 107–124

DOI 10.1007/s00186-006-0066-4
ORIGINAL ARTICLE
Mohamed Haouari · Lotfi Hidri · Anis Gharbi
Received: 11 June 2004 / Accepted: 20 May 2005 / Published online: 9 June 2006
© Springer-Verlag 2006
Abstract We present an exact branch-and-bound algorithm for the two-stage
hybrid flow shop problem with multiple identical machines in each stage. The
objective is to schedule a set of jobs so as to minimize the makespan. This is the
first exact procedure which has been specifically designed for this strongly
NP-hard problem. Among other features, our algorithm is based on the exact
solution of identical parallel machine scheduling problems with heads and tails.
We report the results of extensive computational experiments, which show that
the proposed algorithm solves large-scale instances in moderate CPU time.
Keywords Deterministic scheduling · Hybrid flow shop · Branch-and-bound method
1 Introduction
The two-stage hybrid flow shop scheduling problem may be formulated as follows.
A set J of n jobs has to be scheduled in a manufacturing system with two stages
(machining centers) Z_1 and Z_2. Each stage Z_i (i = 1, 2) has m_i identical machines
in parallel. Each job j (j = 1, ..., n) has to be processed first for a_j units of time
by one machine of Z_1, and then for b_j units of time by one machine of Z_2. These
operations must be processed without preemption. Moreover, a job cannot be
processed by more than one machine at the same time and each machine processes
at most one job at a time. All processing times are assumed to be deterministic
and integer and all machines are ready from time zero onwards. The objective is
to construct a schedule for which the maximum completion time, or makespan, is
M. Haouari (B) · L. Hidri · A. Gharbi
Combinatorial Optimization Research Group–ROI,
Ecole Polytechnique de Tunisie, BP 743,
2078 La Marsa, Tunisia
E-mail: mohamed.haouari@ept.rnu.tn
minimized. Using the notation of Hoogeveen et al. (1991), this problem is denoted
F2(P) || Cmax .
The F2(P) || Cmax can be viewed as a generalization of two fundamental
scheduling problems. Indeed, if all processing times at Z_2 are equal to zero, then
the problem amounts to solving a parallel machine problem with identical
processors (P || Cmax), which is NP-hard in the ordinary sense (Karp 1972). Also, if
both stages contain a single machine, then the problem reduces to the two-machine
flow shop problem (F2 || Cmax), which is solvable in O(n log n) time. However, Gupta
et al. (1997) show that when at least one stage contains multiple machines (i.e.
max(m_1, m_2) > 1), the problem is NP-hard in the strong sense.
Moreover, a further indication of the intrinsic hardness of this problem is that even
its preemptive version is NP-hard in the strong sense (Hoogeveen et al. 1996).
The F2(P) || Cmax and its variants are adequate models for several
manufacturing settings. Practical applications are discussed in Lin and Liao (2003),
Narasimhan and Panwalkar (1984), and Sherali et al. (1990), to quote just a
few. The theoretical and practical importance of the F2(P) || Cmax motivated
several researchers to investigate it. In particular, most efforts have been focused
on developing and analyzing heuristic algorithms with worst-case error bounds.
These methods are based on a predefined ordering of the jobs and have a low
complexity of O(n log n) (see for instance Buten and Shen 1973; Langston 1987;
Lee and Vairaktarakis 1994; Sriskandarajah and Sethi 1989). In particular, Lee
and Vairaktarakis (1994) describe a heuristic with an error bound of
2 − 1/max(m_1, m_2). Recently, a remarkable result has been obtained by Schuurman
and Woeginger (2000), who demonstrated the existence of a polynomial time
approximation scheme for this challenging problem.
In contrast, the literature dealing with the exact solution of F2(P) || Cmax is
surprisingly scant. Indeed, the only relevant work that we are aware of is the
branch-and-bound algorithm described by Gupta et al. (1997), who addressed the
particular case where the second stage contains a single machine (m_2 = 1). They
presented computational tests with up to 250 jobs. In addition, several authors
developed exact procedures for the multiple-stage hybrid flow shop problem (see
Brah and Hunsucker 1991; Carlier and Néron 2000; Moursli and Pochet 2000; Néron
et al. 2001; Perregaard 1995; Portman et al. 1998; Rajendran and Chaudhuri 1992;
Salvador 1973). Néron et al. (2001) describe an exact approach which outperforms all
previous ones and report the optimal solution of small-sized instances with up to
15 jobs, 5 stages, and 3 machines in each stage. For an overview of exact methods
for the multiple-stage hybrid flow shop problem the reader is referred to the survey
paper of Kis and Pesch (2004).
In this paper, we present an effective branch-and-bound algorithm which has
been specifically designed for solving the F2(P) || Cmax problem with an arbitrary
number of machines in each stage. Although our approach could be easily
modified to handle the particular case where one of the two stages contains a single
machine, we assume, for the sake of simplicity, that each stage contains at least two
parallel machines (i.e. min(m_1, m_2) ≥ 2). A distinctive feature of our
branch-and-bound is that the evaluation of terminal nodes of the search tree requires
the optimal solution of a P | r_j | Cmax. Although this problem is known to be
intractable, we provide evidence that its hardness does not preclude its effectiveness
for lower bound computation. Other features that are peculiar to our procedure include
a branching strategy that is based on a representation of an F2(P) || Cmax solution
as a permutation of jobs, tight lower and upper bounding procedures, dominance
rules, and procedures for adjusting heads and tails. Our algorithm has produced
proven optimal solutions for a number of randomly generated instances with up to
1,000 jobs.
The remainder of this paper is organized as follows. In the next section, we pres-
ent an overview of the branch-and-bound algorithm. In the subsequent sections,
we provide a detailed description of the implemented lower and upper bounds
as well as several important enhancements. Finally, we present the results of our
computational experiments and we provide some concluding remarks.
2 An overview of the branch-and-bound algorithm
2.1 Problem representation
It is instructive to view the F2(P) || Cmax in another way: as an identical parallel
machine scheduling problem with a complex optimality criterion. To develop this
interpretation, let Σ denote the set of feasible schedules of stage Z_1. Obviously,
each σ ∈ Σ induces a well-defined completion time C^1_j(σ) for each j ∈ J. For a
given σ ∈ Σ, consider the Pm_2 | r_j | Cmax that is obtained by setting for all j ∈ J
a release date r_j = C^1_j(σ) and a processing time p_j = b_j. Let C̃max(σ) denote
its optimal makespan. Clearly, the F2(P) || Cmax amounts to finding a schedule
σ* ∈ Σ satisfying:
$$\tilde{C}_{\max}(\sigma^*) = \min_{\sigma \in \Sigma} \tilde{C}_{\max}(\sigma).$$
A schedule σ ∈ Σ could be represented as a permutation of the n jobs. This
permutation is simply obtained by ranking the jobs according to the nondecreasing
order of their starting times. Conversely, given a permutation of the n jobs
(σ_1, σ_2, ..., σ_n), the starting times of the associated schedule are computed in
O(n) time using the list scheduling rule which successively schedules the jobs
σ_1, σ_2, ..., σ_n, in that order, whenever a machine becomes idle. It is noteworthy
that Carlier and Néron (2000) and Néron et al. (2001) proposed a similar schedule
representation but as a permutation of operations rather than jobs.
2.2 A branching scheme
Since each feasible schedule could be represented as a permutation of n jobs, we
adopted the following branching scheme. Each node N_l of level l of the search
tree corresponds to a partial permutation (i.e. schedule) σ(N_l) = (σ_1, σ_2, ..., σ_l)
of l jobs. Therefore, the corresponding set of unscheduled jobs is J̄(N_l) = J \
{σ_1, σ_2, ..., σ_l}. Obviously, the root node N_0 corresponds to the empty
permutation. Each node N_l has n − l = |J̄(N_l)| descendants. Each of these descendants
corresponds to a partial permutation (σ_1, σ_2, ..., σ_l, j_0) where j_0 ∈ J̄(N_l). In this
way, a node at level n − 1 corresponds to a well defined schedule of the first stage.
In our branch-and-bound algorithm, we adopted the depth-first strategy.
Now, we provide a detailed description of the implemented lower and upper
bounding strategies as well as two additional important enhancements: a
procedure for detecting infeasible nodes and/or tightening the computed bounds, and
dominance rules for restricting the search tree size.
3 Lower bounds
3.1 Lower bounds that are computed at the root node
In this section, we describe the lower bounds that are only computed at the root
node.
3.1.1 A parallel machine based lower bound
Consider an instance of the F2(P) || Cmax. If we relax the constraint that each
machine of the first stage can process at most one job at a time, then a relaxation
can be obtained by setting for all j ∈ J a release date r_j = a_j. The resulting
relaxation is a Pm_2 | r_j | Cmax. Obviously, the optimal makespan C^1_max of this latter
problem is a valid lower bound on the optimal makespan of the F2(P) || Cmax.
Similarly, a lower bound denoted C^2_max is derived by relaxing the constraint
that each machine of the second stage can process at most one job at a time. In
this case, the resulting relaxation is a Pm_1 | q_j | Cmax with q_j = b_j for all j ∈ J.
Hence, a lower bound is
$$LB_1 = \max(C^1_{\max}, C^2_{\max}).$$
It is noteworthy that since the exact computation of LB1 requires the optimal
solution of an NP-hard problem, then for the sake of efficiency a weaker version of
LB1 could be alternatively computed by replacing C^1_max and/or C^2_max by their
corresponding lower bounds. The reader is referred to Haouari and Gharbi (2004) for a
description of several fast lower bounds for parallel machine scheduling problems
with heads and tails. In our implementation, LB1 is obtained as a by-product of the
heuristic that is described in Section 4.1. In any case, if the optimization algorithm
fails to solve exactly the parallel machine problem within a preset CPU time limit
then it delivers the best lower bound obtained upon termination. This lower bound
is used for the computation of LB1.
3.1.2 An SPT-rule based lower bound
Let S ⊆ J denote a subset of jobs. Define I_2(S) as a lower bound on the total idle
time in stage Z_2. This idle time is a direct consequence of the flow shop constraints.
Following Haouari and M'Hallah (1997), who considered the particular case where
S = J, we take I_2(S) equal to the minimum sum of completion times, on stage Z_1,
of the m_2 jobs of S whose processing times are the shortest. Clearly, I_2(S) can be
obtained by applying the shortest processing time (SPT) rule (Brucker 1998).
It is easy to establish that the value
$$LB^2_{SPT}(S) = \left\lceil \frac{I_2(S) + \sum_{j \in S} b_j}{m_2} \right\rceil$$
defines a lower bound. Consequently, we can enhance this bound by considering
the maximal value over all subsets. Hence, a valid lower bound is
$$LB^2_{SPT} = \max_{S \subseteq J} LB^2_{SPT}(S).$$
By using the symmetry of the hybrid flow shop problem, and by interchanging the
roles of stages Z_1 and Z_2, we get the following lower bound
$$LB^1_{SPT} = \max_{S \subseteq J} \left\lceil \frac{I_1(S) + \sum_{j \in S} a_j}{m_1} \right\rceil.$$
Hence, a valid lower bound is
$$LB_2 = \max(LB^1_{SPT}, LB^2_{SPT}).$$
The following lemma provides evidence that LB2 can be computed efficiently.
Lemma 1 LB2 can be computed in O(n^2 max(m_1, m_2)) time.
Proof Given S ⊆ J with |S| = m_2, let ā_k(S) denote the k-th smallest a_j (j ∈ S)
and let j[k] denote a job in S satisfying a_{j[k]} = ā_k(S). Assume that the jobs in S are
scheduled on machines M_1, M_2, ..., M_{m_1} of stage Z_1 according to the SPT rule.
We make the following observations:
(i) The completion time C_{j[k]} of a job j[k] (k = 1, ..., m_2 − 1) satisfies
C_{j[k]} ≤ C_{j[k+1]};
(ii) If k = β m_1 + i with β = ⌊k/m_1⌋ and i > 0, then j[k] is scheduled on M_i.
Otherwise, j[k] is scheduled on M_{m_1};
(iii) The jobs that are scheduled on M_i (i = 1, ..., m_1 − 1) are j[β m_1 + i]
(β = 0, ..., ⌊(m_2 − i)/m_1⌋). The jobs that are scheduled on M_{m_1} are j[β m_1]
(β = 1, ..., ⌊m_2/m_1⌋);
(iv) If k = ⌊k/m_1⌋ m_1 + i, then the contribution w_k of job j[k] in I_2(S) is
$$
\begin{aligned}
w_k &= \left\lfloor \frac{m_2 - i}{m_1} \right\rfloor - \left\lfloor \frac{k}{m_1} \right\rfloor + 1 \\
    &= \left\lfloor \frac{m_2 - i}{m_1} - \frac{k - i}{m_1} \right\rfloor + 1 \\
    &= \left\lfloor \frac{m_2 - k}{m_1} \right\rfloor + 1 \\
    &= \left\lceil \frac{m_2 - k + 1}{m_1} \right\rceil .
\end{aligned}
$$
Hence, we express I_2(S) in the following way
$$I_2(S) = \sum_{j=1}^{m_2} w_j \bar{a}_j(S).$$
Now, define
$$g(S) = \sum_{j=1}^{m_2} w_j \bar{a}_j(S) + \sum_{j \in S} b_j \quad \text{for } S \subseteq J.$$
We prove that finding S* ⊆ J such that g(S*) = max_{S ⊆ J} g(S) amounts to solving a
longest path problem in a digraph.
For simplicity, in the sequel of this proof we assume w.l.o.g. that the jobs are
indexed so that a_1 ≤ a_2 ≤ ... ≤ a_n. To that aim, consider the layered weighted
digraph G = (V, A) where the set of nodes is partitioned into m_2 + 2 subsets
V_0, V_1, ..., V_{m_2+1} where V_0 and V_{m_2+1} are singletons that contain a 'start' node
s and an 'end' node t, respectively, and V_k (k = 1, ..., m_2) contains nodes (j, k)
with j = k, k + 1, ..., n − m_2 + k.
The set of arcs is constructed in the following way:
• For each node (j, 1) ∈ V_1 there is an arc (s, (j, 1)) with weight w_1 a_j + b_j
• For each node (j, k) ∈ V_k (k = 2, ..., m_2) there are j − k + 1 arcs of the form
((h, k − 1), (j, k)) for h = k − 1, ..., j − 1. Each arc incident to (j, k) has a
weight w_k a_j + b_j
• For each node (j, m_2) ∈ V_{m_2} there is an arc ((j, m_2), t) with weight
$\sum_{k=j+1}^{n} b_k$.
Consider a path P = (s, (j_1, 1), (j_2, 2), ..., (j_{m_2}, m_2), t) in G between s and
t with cost c(P). Then, the subset of jobs S(P) = {j_1, j_2, ..., j_{m_2}, j_{m_2} + 1,
j_{m_2} + 2, ..., n} satisfies g(S(P)) = c(P). Conversely, given a subset S ⊆ J satisfying
|S| ≥ m_2, one could easily associate to it a unique path in G with a corresponding
cost g(S).
Consequently, and since a subset S satisfying |S| < m_2 is necessarily
dominated, one could find a subset maximizing g(.) by finding a longest path between s
and t in G. The graph G being acyclic, the computation of a longest path can be
carried out in a time complexity linear in the number of arcs. Since |A| = O(n^2 m_2),
we conclude that computing LB^2_SPT can be done in O(n^2 m_2) time.
Similarly, by interchanging the roles of stages Z_1 and Z_2, we conclude that
computing LB^1_SPT can be done in O(n^2 m_1) time. This completes the proof.
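The longest-path argument of the proof can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names `ceil_div` and `lb_spt_longest_path` are hypothetical, jobs are indexed from 0, the instance is assumed to satisfy n ≥ m_work, and the arcs of G are not stored explicitly. Instead, a running maximum over the previous layer is kept, which is a swapped-in shortcut that yields the same longest-path value.

```python
def ceil_div(x, m):
    """Integer ceiling of x / m for a positive integer m."""
    return -(-x // m)


def lb_spt_longest_path(head, work, m_head, m_work):
    """Return max over S of ceil(g(S) / m_work) using the layered recursion.

    head[j] : time of job j on the stage that creates the idle time
              (a_j when computing LB^2_SPT); that stage has m_head machines
    work[j] : time of job j on the bounded stage (b_j for LB^2_SPT);
              that stage has m_work machines
    """
    n = len(head)
    order = sorted(range(n), key=lambda j: head[j])   # a_1 <= ... <= a_n
    h = [head[j] for j in order]
    p = [work[j] for j in order]
    # suffix[j] = p[j] + ... + p[n-1]: weight of the arc into the end node
    suffix = [0] * (n + 1)
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + p[j]
    # w[k-1] = ceil((m_work - k + 1) / m_head), see observation (iv)
    w = [ceil_div(m_work - k + 1, m_head) for k in range(1, m_work + 1)]
    neg = float("-inf")
    best = [w[0] * h[j] + p[j] for j in range(n)]      # layer V_1
    for k in range(1, m_work):                         # layers V_2 .. V_{m_work}
        nxt = [neg] * n
        reach = neg                                    # max of best[0..j-1]
        for j in range(n):
            if reach != neg:
                nxt[j] = reach + w[k] * h[j] + p[j]
            reach = max(reach, best[j])
        best = nxt
    total = max(best[j] + suffix[j + 1] for j in range(n) if best[j] != neg)
    return ceil_div(total, m_work)
```

Calling the function with `(a, b, m_1, m_2)` gives LB^2_SPT, and calling it with `(b, a, m_2, m_1)` gives LB^1_SPT.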

Example 1 Consider the instance with n = 6, m_1 = 2, and m_2 = 3 defined by
Table 1.

Table 1 Data of example 1
j     1   2   3   4   5   6
a_j   3   4   6   7   8   10
b_j   1   3   5   7   9   11

We have
$$LB^2_{SPT}(J) = \left\lceil \frac{(3 + 4 + 9) + (1 + 3 + 5 + 7 + 9 + 11)}{3} \right\rceil = 18$$
$$LB^1_{SPT}(J) = \left\lceil \frac{(1 + 3) + (3 + 4 + 6 + 7 + 8 + 10)}{2} \right\rceil = 21$$
Then, LB_SPT(J) = max(18, 21) = 21. Now, consider the subset S = {3, 4, 5, 6}.
We have
$$LB^2_{SPT}(S) = \left\lceil \frac{(6 + 7 + 14) + (5 + 7 + 9 + 11)}{3} \right\rceil = 20$$
$$LB^1_{SPT}(S) = \left\lceil \frac{(5 + 7) + (6 + 7 + 8 + 10)}{2} \right\rceil = 22$$
Thus, LB_SPT(S) = max(20, 22) = 22. This value corresponds to LB2.
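The figures of Example 1 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, jobs are indexed from 0 (so the subset {3, 4, 5, 6} becomes `[2, 3, 4, 5]`), and the last function enumerates all subsets by brute force, which is only meant as a check on tiny instances and is not the polynomial procedure of Lemma 1.

```python
from itertools import combinations


def ceil_div(x, m):
    """Integer ceiling of x / m for a positive integer m."""
    return -(-x // m)


def spt_completion_sum(times, machines):
    """Sum of completion times when `times` are scheduled by the SPT rule."""
    loads = [0] * machines
    total = 0
    for k, p in enumerate(sorted(times)):
        i = k % machines          # SPT fills the machines in round-robin order
        loads[i] += p
        total += loads[i]         # completion time of the k-th shortest job
    return total


def lb_spt(subset, head, work, m_head, m_work):
    """LB_SPT(S): idle time caused by the other stage plus total work."""
    shortest = sorted(head[j] for j in subset)[:m_work]
    idle = spt_completion_sum(shortest, m_head)
    return ceil_div(idle + sum(work[j] for j in subset), m_work)


def lb2_brute_force(a, b, m1, m2):
    """LB2 by enumerating every non-empty subset (tiny instances only)."""
    best = 0
    for size in range(1, len(a) + 1):
        for s in combinations(range(len(a)), size):
            best = max(best,
                       lb_spt(s, a, b, m1, m2),   # LB^2_SPT(S)
                       lb_spt(s, b, a, m2, m1))   # LB^1_SPT(S)
    return best
```

With the data of Table 1, `lb_spt(range(6), a, b, 2, 3)` returns 18 and `lb_spt([2, 3, 4, 5], b, a, 3, 2)` returns 22.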
3.2 Lower bounds that are computed at non-root nodes
Assume that at a given node N ≠ N_0 of the search tree, a set J_S of jobs has
already been scheduled and define J̄ = J \ J_S. Each job j ∈ J_S has a well defined
completion time on stage Z_1 which is denoted by C_{1j}. Also, each machine M_i
(i = 1, ..., m_1) of the first stage has an availability time τ_i at which it becomes
ready for processing jobs from J̄. We assume that τ_1 ≤ τ_2 ≤ ... ≤ τ_{m_1}. In the
following two sub-sections we present two lower bounds on the optimal makespan
of the subproblem corresponding to node N.

3.2.1 A P | r_j | Cmax based lower bound

Clearly, a valid relaxation is a P | r_j | Cmax which is defined on the second stage
and where each job j has a release date r_j such that r_j = C_{1j} if j ∈ J_S and
r_j = a_j + τ_1 otherwise. In the sequel, we assume that the jobs are ranked
according to nondecreasing heads (that is, r_1 ≤ r_2 ≤ ... ≤ r_n). Now, we describe how
to derive a very fast lower bound for this parallel machine problem. To that aim,
for a given subset S ⊆ J, define r̄_k(S) as the k-th smallest release date of S. Using
the same arguments as Carlier (1987), we claim that the value
$$LB_3(S) = \left\lceil \frac{\sum_{k=1}^{m_2} \bar{r}_k(S) + \sum_{j \in S} b_j}{m_2} \right\rceil$$
is a valid lower bound on the subproblem corresponding to node N. Again, we
enhance this lower bound by considering the lower bound
$$LB_3 = \max_{S \subseteq J} LB_3(S).$$
Lemma 2 LB3 can be computed in O(n log m_2) time.

Proof Clearly, we can restrict our attention to the subsets S ⊆ J satisfying
r̄_{m_2}(S) ∈ {r_{m_2}, r_{m_2+1}, ..., r_n}. Define S_k (k = m_2, ..., n) as a subset of J
satisfying
$$LB_3(S_k) = \max_{S \subseteq J :\, \bar{r}_{m_2}(S) = r_k} LB_3(S).$$
Therefore, we get
$$LB_3 = \max_{m_2 \le k \le n} LB_3(S_k).$$
Define $f(S) = \sum_{j=1}^{m_2} \bar{r}_j(S) + \sum_{j \in S} b_j$ for S ⊆ J. We observe that
$$f(S_{k+1}) = f(S_k) + r_{k+1} - r_{j_k} - b_{j_k} \quad \text{for } k = m_2, \ldots, n - 1,$$
where $j_k = \arg\min_{j \in S_k} (r_j + b_j)$.
The computation of f(S_{m_2}) requires O(n) time. Given f(S_k), the main effort
for computing f(S_{k+1}) is the determination of the job j_k. This can be done in
O(log m_2) time. Therefore, the computation of f(S_k) for k = m_2, ..., n can be
done in O(n log m_2) time.
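One way to realize the incremental computation of Lemma 2 is sketched below. It is a hedged illustration rather than the authors' procedure: the name `lb3` is hypothetical, the sketch sorts the jobs itself (so the sort adds O(n log n) when the jobs are not already ranked), and it assumes that the best subset whose m_2-th smallest release date is r_k consists of job k, the m_2 − 1 earlier jobs with largest r_j + b_j, and all later jobs. A min-heap of size m_2 − 1 then plays the role of the job j_k that is removed at each step.

```python
import heapq


def lb3(release, b, m2):
    """max over subsets S (|S| >= m2) of
    ceil((sum of the m2 smallest release dates of S + sum of b_j over S) / m2).
    """
    jobs = sorted(zip(release, b))          # nondecreasing release dates
    n = len(jobs)
    # suffix_b[k] = total second-stage work of the jobs ranked k, k+1, ..., n-1
    suffix_b = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix_b[k] = suffix_b[k + 1] + jobs[k][1]
    heap = []        # the (m2 - 1) largest values r_j + b_j among earlier jobs
    heap_sum = 0
    best = 0
    for k, (r, p) in enumerate(jobs):
        if len(heap) == m2 - 1:
            # job k supplies the m2-th smallest release date of the subset
            f = heap_sum + r + p + suffix_b[k + 1]
            best = max(best, -(-f // m2))   # integer ceiling
        heapq.heappush(heap, r + p)
        heap_sum += r + p
        if len(heap) > m2 - 1:
            heap_sum -= heapq.heappop(heap)  # drop the smallest r_j + b_j
    return best
```

For instance, with release dates (2, 3, 5, 9), processing times (4, 1, 6, 2) and m_2 = 2, the best subset is the pair of jobs released at 5 and 9, which gives ⌈(5 + 9 + 6 + 2)/2⌉ = 11.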

3.2.2 A P, NC_inc | q_j | Cmax based lower bound
A relaxation of the hybrid flow shop problem is derived by setting for each job j ∈ J̄
a tail q_j = b_j. In this way, we define on stage Z_1 a parallel machine problem
with machine availability times and tails. This problem is denoted
P, NC_inc | q_j | Cmax where NC_inc indicates that the number of available machines is
nondecreasing with time (Schmidt 2000). Papers dealing with similar parallel machine
scheduling problems are Lee (1991) and Lee et al. (2000).
It is worth noting that due to availability times, some machines may not process
any job in any optimal solution. Gharbi and Haouari (2005) prove the following
proposition.
Proposition 3 Let UB denote an upper bound on the optimal makespan of the
P, NC_inc | q_j | Cmax problem. Define
• $m_l(\bar{J}) = \left\lceil \frac{\sum_{j \in \bar{J}} a_j}{UB - \tau_1 - \bar{q}_1(\bar{J})} \right\rceil$ where $\bar{q}_1(\bar{J}) = \min_{j \in \bar{J}} q_j$
• m_u(J̄) as the smallest k (k = 1, ..., m_1 − 1) satisfying
τ_{k+1} + min_{j ∈ J̄} (a_j + q_j) > UB. If no k satisfies this condition, then m_u(J̄) = m_1.
Then, the number of machines m that are processing in an optimal schedule
satisfies m_l(J̄) ≤ m ≤ m_u(J̄).
Assume that the jobs of J̄ are assigned to exactly m machines of stage Z_1, then
a valid lower bound which is defined for a given subset S ⊆ J̄ is:
$$LB_4(S, m) = \left\lceil \frac{\sum_{i=1}^{m} \tau_i + \sum_{j \in S} a_j + \sum_{k=1}^{m} \bar{q}_k(S)}{m} \right\rceil$$
where q̄_k(S) is defined as the k-th smallest tail of S. Again, we can improve this
lower bound by maximizing it over all of the subsets of J̄. Define
$$LB_4(m) = \max_{S \subseteq \bar{J}} LB_4(S, m).$$
Provided that the jobs are sorted according to nondecreasing tails, the following
result holds.
Lemma 4 LB4(m) can be computed in O(n log m).
Proof This result could be proved using the same arguments as those of
Lemma 2.

Consequently, a valid lower bound for the subproblem defined by node N is
$$LB_4 = \min_{m_l(\bar{J}) \le m \le m_u(\bar{J})} LB_4(m).$$
An immediate consequence of Lemma 4 is
Corollary 5 LB4 can be computed in O(m_1 n log m_1).
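The pieces of this bound fit together as in the following sketch. It is an illustration under explicit assumptions, not the authors' implementation: all function names are hypothetical, `tau` is assumed sorted in nondecreasing order, both m_l and LB_4(S, m) are rounded up (valid because the data are integer), subsets with fewer than m jobs are ignored, and the sketch returns `None` when the machine range of Proposition 3 is empty.

```python
import heapq


def ceil_div(x, m):
    """Integer ceiling of x / m for a positive integer m."""
    return -(-x // m)


def machine_range(tau, a, q, ub):
    """Bounds (m_l, m_u) on the number of busy machines (Proposition 3)."""
    slack = ub - tau[0] - min(q)
    if slack <= 0:
        raise ValueError("no schedule with makespan <= ub can exist")
    m_l = max(1, ceil_div(sum(a), slack))
    shortest = min(aj + qj for aj, qj in zip(a, q))
    m_u = len(tau)
    for k in range(1, len(tau)):            # tau[k] is tau_{k+1} in the text
        if tau[k] + shortest > ub:
            m_u = k
            break
    return m_l, m_u


def lb4_m(tau, a, q, m):
    """LB_4(m): maximum of LB_4(S, m) over subsets S with |S| >= m."""
    jobs = sorted(zip(q, a))                # nondecreasing tails
    n = len(jobs)
    suffix_a = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix_a[k] = suffix_a[k + 1] + jobs[k][1]
    base = sum(tau[:m])
    heap, heap_sum, best = [], 0, 0         # (m - 1) largest a_j + q_j so far
    for k, (tail, proc) in enumerate(jobs):
        if len(heap) == m - 1:
            total = base + heap_sum + proc + tail + suffix_a[k + 1]
            best = max(best, ceil_div(total, m))
        heapq.heappush(heap, proc + tail)
        heap_sum += proc + tail
        if len(heap) > m - 1:
            heap_sum -= heapq.heappop(heap)
    return best


def lb4(tau, a, q, ub):
    """LB_4: minimum of LB_4(m) over the admissible numbers of machines."""
    m_l, m_u = machine_range(tau, a, q, ub)
    return min((lb4_m(tau, a, q, m) for m in range(m_l, m_u + 1)), default=None)
```

On a toy node with availability times (0, 4, 12), processing times (3, 2, 5), tails (1, 4, 2) and UB = 10, the admissible range reduces to m = 2 and the bound equals 9.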
4 Upper bounds
At each node of the search tree a heuristic is called for delivering an upper bound
on the optimal makespan. Two heuristics were implemented. The first one is based
on the exact solution of a parallel machine scheduling problem and is only used at
the root node for generating an initial upper bound, while the second one, which
is a very fast priority-rule based heuristic, is called at each node of the search tree.
Now, we successively describe these two procedures.
4.1 An optimization based heuristic
This heuristic has been designed in the same vein as the celebrated Shifting Bottle-
neck Procedure (Adams et al. 1988). Basically, it consists in alternatively solving a
parallel machine problem on stage Z 1 and on stage Z 2 until a termination condition
holds. A detailed description of this procedure is the following.
4.1.1 Heuristic H1
Phase 1: Construction of an initial feasible schedule
1.1. Construct a Pm_1 | q_j | Cmax instance by setting for each job j ∈ J a
processing time p_j = a_j and a tail q_j = b_j
1.2. Solve exactly the instance defined in 1.1. Let C_{1j} denote the completion
time of j ∈ J
1.3. Construct a Pm_2 | r_j | Cmax instance by setting for each job j ∈ J a
processing time p_j = b_j and a release date r_j = C_{1j}
1.4. Solve exactly the instance defined in 1.3. Let t_{2j} denote the start time of
j ∈ J. Set UB equal to the value of the optimal makespan
Phase 2: Improvement of the upper bound
2.1. Construct a Pm_1 || Lmax instance by setting for each job j ∈ J a
processing time p_j = a_j and a due date d_j = t_{2j}
2.2. Solve exactly the instance defined in 2.1. Let C_{1j} denote the completion time
of j ∈ J. If Lmax = 0 then Stop, Else Set UB = UB + Lmax
2.3. Construct a Pm_2 | r_j | Cmax as indicated in 1.3
2.4. Solve exactly the instance defined in 2.3. Let t_{2j} denote the start time of
j ∈ J. If Cmax < UB then Set UB = Cmax
2.5. Go to Step 2.1
In Phase 1, the procedure constructs a schedule on the first stage (Step 1.2) and
a second one on the second stage (Step 1.4). These two schedules are concatenated
in order to get an initial feasible solution. In the second phase, the procedure
alternately keeps the assignment of the operations on stage Z_2 fixed, and attempts to
reschedule the first stage in order to get a better solution (Step 2.2). Obviously, an
optimal solution to the problem defined in Step 2.2 satisfies Lmax ≤ 0. Moreover,
one could easily check that the existence of a schedule on stage Z_1 with Lmax < 0
implies that concatenating it with the schedule of stage Z_2 yields an improved
global solution (and vice versa). Similarly, the procedure keeps the assignment of
the operations on stage Z_1 fixed, and attempts to reschedule the second stage in
order to get a better solution (Step 2.4). The process is continued until no
improvement is possible.
In our implementation, the parallel machine problems are solved using the
branch-and-bound algorithm described in Gharbi and Haouari (2004).
By interchanging the roles of stages Z 1 and Z 2 , a second upper bound is com-
puted in a similar way. We take the best of the two derived solutions.
4.2 A priority-rule based heuristic
In order to get very quickly an upper bound associated with each non-root node N
of the search tree, we implemented the following priority-rule based heuristic. This
heuristic takes as an input the subset of unscheduled jobs J¯ and the availability
times τi (i = 1, . . . , m 1 ) of the machines of stage Z 1 .
4.2.1 Heuristic H2
Phase 1: Scheduling of stage Z_1
1.1. Order the jobs in J̄ by non-increasing b_j. Set U = J̄
1.2. Whenever a machine is idle, schedule a job j ∈ U with largest b_j. Set
U = U \ {j}
1.3. If U ≠ ∅ then go to Step 1.2
Phase 2: Scheduling of stage Z_2
2.1. For each job j ∈ J set a release date r_j = C_{1j} (completion time of j on
stage Z_1). Set U = J
2.2. Whenever a machine is idle, schedule an already released job j ∈ U with
largest b_j. Set U = U \ {j}
2.3. If U = ∅ then Stop, Else go to Step 2.2
The main effort in H2 is the job sorting in Step 1.1, which requires O(n log n)
time. This ranking is performed just once. Hence, at each node the computational
effort of running H2 is O(n).
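The two phases of H2 can be sketched as follows for the case where no job has been scheduled yet (J̄ = J). This is a hypothetical illustration, not the authors' code: the name `h2` and its arguments are assumptions, jobs are indexed from 0, and the sketch re-sorts the jobs and uses binary heaps at each call, so it runs in O(n log n) rather than the O(n) per-node effort reported above.

```python
import heapq


def h2(a, b, m1, m2, tau=None):
    """Makespan of the priority-rule schedule (largest b_j first)."""
    n = len(a)
    # Phase 1: list-schedule stage Z_1 by non-increasing b_j
    free1 = list(tau) if tau is not None else [0] * m1
    heapq.heapify(free1)
    c1 = [0] * n
    for j in sorted(range(n), key=lambda j: -b[j]):
        t = heapq.heappop(free1)
        c1[j] = t + a[j]
        heapq.heappush(free1, c1[j])
    # Phase 2: on stage Z_2, an idle machine takes the released job
    # with the largest b_j
    free2 = [0] * m2
    by_release = sorted(range(n), key=lambda j: c1[j])
    ready = []            # max-heap on b_j, stored as (-b_j, j)
    nxt = 0
    cmax = 0
    for _ in range(n):
        t = heapq.heappop(free2)
        if not ready and c1[by_release[nxt]] > t:
            t = c1[by_release[nxt]]       # the machine waits for the next release
        while nxt < n and c1[by_release[nxt]] <= t:
            j = by_release[nxt]
            heapq.heappush(ready, (-b[j], j))
            nxt += 1
        _, j = heapq.heappop(ready)
        heapq.heappush(free2, t + b[j])
        cmax = max(cmax, t + b[j])
    return cmax
```

On the instance of Example 1 this sketch returns a makespan of 24, to be compared with the lower bound LB2 = 22.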
5 Further enhancements
5.1 Adjustments
Several authors have shown that adjusting the data often improves the efficiency
of branch-and-bound algorithms (see for instance Brucker et al. 1994; Carlier and
Pinson 1994; Gharbi and Haouari 2002; Lopez et al. 1992; Néron et al. 2001).
In our branch-and-bound algorithm, we implemented the so-called Feasibility and
Adjustment Procedure (FAP) which has been successfully used by Gharbi and
Haouari (2002) for the P | r_j, q_j, d_j | Cmax. The objective of the FAP is twofold:
it aims at adjusting the heads and tails, and at checking the feasibility of a
nonpreemptive schedule. The FAP can be extended to deal with the F2(P) || Cmax in the
following way. First, let LB and UB denote a lower and an upper bound on the
optimal solution of an F2(P) || Cmax instance. The problem is to check the
feasibility of a nonpreemptive schedule with makespan less than or equal to a value
C ∈ [LB, UB − 1]. The FAP is applied to the P | r_j, q_j, d_j | Cmax defined on stage Z_1
by associating with each job j ∈ J a head r_{1j} = τ_1, a tail q_{1j} = b_j and a deadline
d_{1j} = C − q_{1j}. Similarly, the FAP is applied to the P | r_j, q_j, d_j | Cmax defined on
stage Z_2 by associating with each job j ∈ J a head r_{2j} = a_j, a tail q_{2j} = 0 and a
deadline d_{2j} = C. From this point, the heads and tails in both stages are adjusted
as described in [9]; the detailed description of the adjustment procedure
is not reproduced here. However, it is worth noting that any adjustment of a
head r_{1j} is propagated to Z_2 by setting r_{2j} = max{r_{2j}, r_{1j} + a_j}. Also, any
adjustment of a deadline d_{2j} is propagated to Z_1 by setting d_{1j} = min{d_{1j}, d_{2j} − b_j}.
The process is stopped whenever a schedule is proved infeasible or no adjustment
has been performed. Hence, each job has in each stage a well defined head, tail and
deadline.
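The initialization of the time windows and the two propagation rules can be illustrated by the sketch below. It covers only what is stated above: the adjustment procedure itself is not reproduced, and the function name `propagate_windows` is hypothetical. The final test, that each operation fits in its own window, is a simple necessary condition added here for illustration and is an assumption, not the feasibility check of the FAP.

```python
def propagate_windows(a, b, tau1, c):
    """Initial heads/deadlines for a trial makespan c, then one propagation.

    Returns (r1, d1, r2, d2, fits), where `fits` is False as soon as some
    operation cannot fit between its head and its deadline.
    """
    n = len(a)
    r1 = [tau1] * n                     # heads on stage Z_1
    d1 = [c - b[j] for j in range(n)]   # deadlines on Z_1: d_1j = c - q_1j
    r2 = list(a)                        # heads on stage Z_2
    d2 = [c] * n                        # deadlines on Z_2
    for j in range(n):
        r2[j] = max(r2[j], r1[j] + a[j])   # head of Z_1 propagated to Z_2
        d1[j] = min(d1[j], d2[j] - b[j])   # deadline of Z_2 propagated to Z_1
    fits = all(r1[j] + a[j] <= d1[j] and r2[j] + b[j] <= d2[j]
               for j in range(n))
    return r1, d1, r2, d2, fits
```

For two jobs with a = (3, 4), b = (2, 5) and τ_1 = 2, the trial value C = 10 is rejected by this simple test because the second job needs 2 + 4 + 5 = 11 time units, whereas C = 11 passes it.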
In the branch-and-bound algorithm, once a node N is selected, the FAP is
applied. Then, either an infeasibility is detected and the node is consequently
pruned, or the heads and tails are adjusted. In this latter case, we compute for each
non-root node the lower bound
LB(N ) = max{LB3 , LB4 }.
5.2 Dominance rules
Dominance rules aim at removing dominated nodes from the set of candidate nodes
to be branched. These rules play a crucial role in speeding up the branch-and-bound
algorithm. Several dominance rules have been proposed by Gharbi and Haouari
(2004) for the P, NC_inc | r_j, q_j | Cmax. These dominance rules can be immediately
extended to the F2(P) || Cmax. Consider the P, NC_inc | r_j, q_j | Cmax defined on stage
Z_1. Let J̄ = {j_1, j_2, ..., j_K} denote the set of unscheduled jobs sorted according
to the nondecreasing order of their release dates. The following dominance rules
hold:
R1: If two jobs j_k and j_{k+1} have equal heads, processing times and tails, then job
j_{k+1} is not a candidate to be appended to σ
R2: All jobs h such that r_h ≥ min_{j ∈ J̄} (r_j + p_j) are not candidates to be appended
to σ
R3: Assume that there is a job j_k such that r_{j_k} ≥ τ_2. Then, only jobs j_h (h =
k + 1, ..., K) are candidates to be appended to σ j_k
R4: If r_{j_K} ≥ τ_2, then job j_K is not a candidate to be appended to σ
5.3 A cyclic implementation
In order to take advantage of the symmetry of the F2(P) || Cmax, we propose a
cyclic implementation of our branch-and-bound algorithm. It consists in iteratively
solving the Forward and the Backward problem (i.e. the original problem and its
symmetric counterpart, respectively). If the branch-and-bound algorithm fails to find an
optimal solution within a given time limit for the Forward (Backward) problem,
then it is applied to the Backward (Forward) problem using at the root node the best
upper and lower bounds computed so far. The process continues until a solution
is proved optimal or there is no improvement of either the lower or the upper
bound.
6 Computational experiments
The proposed branch-and-bound algorithm was coded in C and implemented on
a Pentium IV 2.8 GHz Personal Computer with 1 GB RAM. The time limit for the
Forward (Backward) problem was set equal to 600 s. Now, we describe the test
problems and present computational results.
6.1 Test problems
Three sets of test problems have been randomly generated:
• Set A: These instances were generated in a similar way as in [20]. The number
of jobs n is taken equal to 10, 20, 30, 40, 50, 100, 150, 300, 500, and 1,000.
The machine configurations (m_1, m_2) are (2, 2), (2, 4), (4, 2), and (4, 4). The
processing times are drawn randomly either from the discrete uniform distribution
on [1,20] or [1,40]. We combined these problem characteristics to obtain 12
different problem classes for each fixed n. For each combination, 20 instances
were generated. Hence, Set A contains a total of 2,400 instances.
• Set B: The numbers of machines m_1 and m_2 were drawn randomly from the
discrete uniform distribution on [2,6]. The processing times on stage Z_i were
drawn randomly from the discrete uniform distribution on [1, 5m_i] (i = 1, 2).
The number of jobs n was taken equal to 10, 20, 30, 40, 50, 100, 150, 200, 500,
750, and 1,000. For each n, 50 instances were generated, which results in 550
instances.
• Set C: This set contains 550 instances that were generated in a similar way as
those of Set B, but the processing times on both stages were drawn randomly
from the discrete uniform distribution on [1, 20].
Thus, we generated a total of 3,500 instances. Set A contains a diversified
mix of shop and size configurations. For Set B, the workloads at the two stages
tend to be well balanced, while for Set C the workloads are mostly unbalanced.
6.2 Performance analysis
The results of the computational study on the set A are summarized in Table 2. The
column headings are as follows:
Table 2 Performance on set A

n (m 1 , m 2 ) PTR(40 :40) PTR(40 :20) PTR(20 :40)
US NN Time Gap US NN Time Gap US NN Time Gap
10 (2, 2) 0 1, 878 2.20 – 0 8, 183 1.20 – 0 3, 496 30.18 –
(2, 4) 0 310 0.13 – 0 1 1.16 – 0 8, 319 7.10 –
(4, 2) 0 1 0.15 – 0 1, 311 0.12 – 0 1 0.48 –
(4, 4) 0 1, 542 0.17 – 0 246 0.08 – 0 1 0.37 –
20 (2, 2) 2 1, 093, 411 72.59 0.88 0 983, 549 30.14 – 0 21 6.75 –
(2, 4) 0 1 0.24 – 0 1 0.08 – 3 24, 061 97.70 1.71
(4, 2) 0 5 5.48 – 2 2, 161, 536 84.11 6.31 0 1 5.76 –
(4, 4) 4 4, 752, 792 347.72 1.78 0 1, 113, 987 45.23 – 1 3, 025 16.10 1.25
30 (2, 2) 0 1, 197, 850 95.23 – 0 506 0.13 – 0 52 0.79 –
(2, 4) 0 10 1.10 – 0 1 0.31 – 4 22, 552 61.29 1.68
(4, 2) 0 13 13.86 – 1 641, 798 174.66 6.36 0 1 2.14 –
(4, 4) 5 873, 327 138.61 3.54 0 521, 573 46.81 – 0 9, 562 22.00 –
40 (2, 2) 0 238, 851 60.35 – 0 1 0.31 – 0 4 0.76 –
(2, 4) 0 405, 130 30.16 – 0 1 0.22 – 0 540, 679 108.28 –
(4, 2) 0 4 6.63 – 2 38, 967 38.51 2.14 0 1 5.71 –
(4, 4) 2 696, 646 109.64 0.89 0 781, 358 66.45 – 0 15, 607 18.04 –
50 (2, 2) 0 1, 466 0.75 – 0 21 1.64 – 0 1 1.39 –
(2, 4) 0 3 0.75 – 0 1 0.25 – 0 1, 619 13.54 –
(4, 2) 0 1,103 9.18 – 0 307, 777 181.84 – 0 1 2.29 –
(4, 4) 3 51, 786 57.84 0.97 0 301, 217 48.79 – 0 50, 169 21.07 –
100 (2, 2) 0 2, 850 3.38 – 0 200 1.19 – 0 505 3.11 –
(2, 4) 0 5 1.09 – 0 1 1.12 – 0 10, 362 49.36 –
(4, 2) 0 1 12.32 – 0 3, 898 49.06 – 0 1 2.29 –
(4, 4) 1 49, 594 99.42 0.20 2 1 8.25 0.19 2 1, 418 7.67 0.18
150 (2, 2) 0 1, 690 38.85 – 0 1 2.74 – 0 1 11.06 –
(2, 4) 0 1 3.16 – 0 1 1.91 – 0 5, 667 13.73 –
(4, 2) 0 1 10.52 – 0 3, 460 39.05 – 0 1 14.64 –
(4, 4) 3 17, 571 83.11 0.13 1 12, 966 27.69 0.14 0 1, 163 35.48 –
300 (2, 2) 0 11, 964 106.08 – 0 1 13.88 – 0 1 42.34 –
(2, 4) 0 1 29.94 – 0 1 16.68 – 0 6, 077 59.46 –
(4, 2) 0 1 66.96 – 1 23, 845 216.93 0.06 0 1 52.65 –
(4, 4) 0 14, 321 183.83 – 1 1 33.62 0.07 0 1 92.28 –
500 (2, 2) 0 28, 412 511.08 – 0 59 82.49 – 0 3, 401 206.54 –
(2, 4) 0 2, 745 184.86 – 0 175 90.79 – 0 1, 838 152.69 –
(4, 2) 0 4, 159 264.00 – 0 106 167.99 – 0 2, 285 245.86 –
(4, 4) 1 10, 061 371.55 0.04 1 4, 236 242.69 0.04 3 2, 716 296.61 0.05
1000 (2, 2) 0 2, 680 383.19 – 2 977 224.71 0.01 1 3, 790 437.33 0.01
(2, 4) 1 911 274.40 0.01 0 2, 027 322.07 – 0 4, 064 416.44 –
(4, 2) 0 1, 952 390.14 – 5 2, 276 348.69 0.02 0 1, 785 388.12 –
(4, 4) 10 7, 140 602.32 0.03 13 4, 119 443.33 0.02 7 9, 341 757.72 0.03
• n : number of jobs
• (m_1, m_2) : number of machines in the first and second stage, respectively
• PTR(α : β) : indicates that the processing times of the first stage and the
second stage were drawn from the discrete uniform distribution on [1, α] and
[1, β], respectively
• US : number of instances for which optimality was not proved after reaching
the time limit
• NN : mean number of nodes for solved instances
• Time : average CPU time (in seconds) for solved instances
• Gap : average gap of unsolved instances, where the gap is
$100 \times (UB - LB)/LB$
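The four summary columns can be derived from per-instance outcomes along the following lines. This is only a minimal sketch: the record layout `Run` and the function name `summarize` are illustrative assumptions and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """Outcome of one instance (hypothetical record layout)."""
    solved: bool     # True if optimality was proved within the time limit
    nodes: int       # number of explored branch-and-bound nodes
    cpu_time: float  # CPU time in seconds
    lb: int          # best lower bound at termination
    ub: int          # best upper bound at termination


def summarize(runs):
    """Return (US, NN, Time, Gap) following the legend's definitions."""
    solved = [r for r in runs if r.solved]
    unsolved = [r for r in runs if not r.solved]
    us = len(unsolved)
    # NN and Time are averaged over solved instances only.
    nn = mean(r.nodes for r in solved) if solved else None
    time = mean(r.cpu_time for r in solved) if solved else None
    # Gap is averaged over unsolved instances only, in percent.
    gap = (mean(100 * (r.ub - r.lb) / r.lb for r in unsolved)
           if unsolved else None)
    return us, nn, time, gap
```

A class with no unsolved instance yields `None` for the gap, which corresponds to the "–" entries in the table.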
Table 2 provides strong evidence that the proposed algorithm can solve
large-scale instances within moderate CPU time. We observe that nearly half
(48%) of the unsolved instances are very large ones with 1,000 jobs. Moreover,
the hardest instances are those where the workloads in the two stages are
balanced. Also, the problems get harder as the number of machines increases.
By contrast, when the workloads are unbalanced, the problems are much easier
to solve. For instance, our procedure solved to optimality all of the 200
instances with (m_1, m_2) = (2, 4) and PTR(40 : 20). For this problem class,
branching was only required for very large instances (n ≥ 500). Surprisingly,
we found that solving medium-sized balanced instances (n = 20 or 30) could be
more challenging than solving large-sized ones.
The overall performance of the proposed procedure is confirmed by the
computational results that were obtained on the sets B and C. In particular, we
observe from Tables 3 and 4 that 82% of the balanced instances (Set B) and 95%
of the unbalanced instances (Set C) were solved to optimality. Furthermore, the
average gap of the unsolved instances is strictly less than 0.28% for n ≥ 200.
Overall, our algorithm produced proven optimal solutions for 94% of the
instances (3,290 instances out of 3,500).
Table 3 Performance on set B

n       US   NN         Time     Gap
10       0   1,003      0.05     –
20      15   872,657    134.98   3.99
30      16   497,285    82.45    3.26
40      12   412,450    140.48   4.56
50      13   223,893    121.07   2.95
100      6   124,819    105.40   1.69
150      5   28,772     29.85    0.29
200     10   46,974     83.99    0.24
500     10   391        97.08    0.10
750      9   2,018      210.67   0.06
1,000    6   1,388      256.53   0.08
Table 4 Performance on set C

n       US   NN         Time     Gap
10       0   439        0.10     –
20       4   1,407,467  79.76    3.78
30       1   128,426    9.00     1.37
40       1   2,921      10.59    0.95
50       2   54,742     27.66    3.93
100      1   66,611     54.54    0.52
150      1   3,123      18.63    0.96
200      1   1,184      18.10    0.28
500      1   6,076      259.86   0.10
750      5   3,781      353.28   0.06
1,000    9   2,646      409.88   0.04
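The solved percentages quoted for Sets B and C can be cross-checked against the US columns of Tables 3 and 4. The sketch below assumes 50 instances for every value of n in both sets; the text states this figure only for Set B with n = 100, so the per-size count is an assumption everywhere else.

```python
# US columns of Tables 3 and 4, listed for n = 10, 20, ..., 1,000.
US_SET_B = [0, 15, 16, 12, 13, 6, 5, 10, 10, 9, 6]
US_SET_C = [0, 4, 1, 1, 2, 1, 1, 1, 1, 5, 9]
INSTANCES_PER_SIZE = 50  # assumption: 50 instances per value of n


def solved_share(us_column, per_size=INSTANCES_PER_SIZE):
    """Percentage of instances solved to optimality."""
    total = per_size * len(us_column)
    return 100 * (total - sum(us_column)) / total


print(f"Set B: {solved_share(US_SET_B):.1f}% solved")
print(f"Set C: {solved_share(US_SET_C):.1f}% solved")
```

Under this assumption the shares come out at roughly 81.5% for Set B and 95.3% for Set C, close to the rounded 82% and 95% reported in the text.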
6.3 Impact of the different components
In order to evaluate the pertinence of each component (lower bounds, upper bound,
dominance rules, cyclic implementation), several variants of our algorithm have
been implemented. These variants are the following:
• B&B\LBi : The lower bound LBi (i = 1, ..., 4) is not implemented
• B&B\LBroot : The lower bounds that are computed at the root node (namely
LB1 and LB2) are replaced by the following much simpler lower bound

$$
LB_5 = \max\left\{
\left\lceil \frac{1}{m_1} \sum_{j \in J} a_j \right\rceil + \min_{j \in J} b_j,\;
\min_{j \in J} a_j + \left\lceil \frac{1}{m_2} \sum_{j \in J} b_j \right\rceil,\;
\max_{j \in J} (a_j + b_j)
\right\}
$$

• B&B\UBroot : The upper bound that is computed at the root node is not
implemented
• B&B\DR : The dominance rules are not implemented
• B&B\Cyclic : There is no cyclic implementation
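To make the simple bound LB5 concrete, the sketch below reads it as the maximum of three terms: the average first-stage load per machine plus the smallest b_j, the smallest a_j plus the average second-stage load per machine, and the longest job a_j + b_j. The averages are rounded up here, which is valid because processing times are integer. The function name `lb5` and the sample data are illustrative assumptions.

```python
def lb5(a, b, m1, m2):
    """Simple root-node lower bound for integer processing times a[j], b[j]."""

    def ceil_div(x, m):
        # Integer ceiling of x / m without floating-point arithmetic.
        return -(-x // m)

    # Stage-1 workload bound: every job still needs at least min(b) afterwards.
    stage1 = ceil_div(sum(a), m1) + min(b)
    # Stage-2 workload bound: no second-stage operation starts before min(a).
    stage2 = min(a) + ceil_div(sum(b), m2)
    # No job can finish before its own total processing time.
    longest_job = max(aj + bj for aj, bj in zip(a, b))
    return max(stage1, stage2, longest_job)


# Hypothetical 4-job instance with two machines in each stage.
print(lb5([3, 5, 2, 4], [4, 1, 6, 2], m1=2, m2=2))  # prints 9
```

For this instance the three terms are 8, 9 and 8, so the second-stage workload term determines the bound.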
We compared each of these variants to our branch-and-bound algorithm on the
two following sample sets of instances:
• Set 1: The 60 instances of Set A with n = 40 and (m 1 , m 2 ) = (4, 4);
• Set 2: The 50 instances of Set B with n = 100.
The results are displayed in Tables 5 and 6. In these tables, we adopted the
following notation:
Timeratio : mean ratio of the CPU time of the proposed variant to the original
algorithm,
NNratio : mean ratio of the number of explored nodes of the proposed variant to
the original algorithm,
US : the number of instances that remain unsolved.
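One way to read "mean ratio" is as the average of per-instance ratios, which differs in general from the ratio of the two averages. The sketch below contrasts both on made-up numbers; the function names and data are illustrative assumptions, and the text does not state how unsolved instances enter the average.

```python
from statistics import mean


def mean_ratio(variant, original):
    """Average of the per-instance ratios variant[i] / original[i]."""
    return mean(v / o for v, o in zip(variant, original))


def ratio_of_means(variant, original):
    """Ratio of the two averages; dominated by the largest instances."""
    return mean(variant) / mean(original)


# Hypothetical CPU times (seconds) on two instances.
original = [1.0, 10.0]
variant = [2.0, 10.0]
print(mean_ratio(variant, original))      # (2.0 + 1.0) / 2 = 1.5
print(ratio_of_means(variant, original))  # 6.0 / 5.5, about 1.09
```

The per-instance reading weights every instance equally, so a large slowdown on an easy instance can dominate the reported figure.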
Tables 5 and 6 show the worth of implementing each of the proposed components,
since our algorithm consistently outperforms all its variants. We observe that
the dominance rules have a modest (but not negligible) impact: removing these
rules caused, for the instances of Set 1, an increase of the average CPU time
by 15.2%. All the other components play a much more significant role. For
instance, we observe that skipping the upper bound computation at the root node
makes the algorithm about 77 and 24 times slower for the instances of Set 1 and
Set 2, respectively. Also, if LB1 and LB2 were replaced at the root node by the
trivial bound LB5, then the algorithm becomes 10 and 57 times slower for the
instances of Set 1 and Set 2, respectively, and the mean number of explored
nodes increases dramatically.

Table 5 Impact of the different components on set 1
Variant       US   NNratio     Timeratio
B&B\LB1        2   1.099       1.070
B&B\LB2        2   1.009       1.094
B&B\LB3       11   1.246       4.347
B&B\LB4        3   4.650       1.548
B&B\LBroot     2   20.095      9.923
B&B\UBroot     3   97416.562   77.380
B&B\DR         3   1.069       1.152
B&B\Cyclic     5   2.211       5.320

Table 6 Impact of the different components on set 2
Variant       US   NNratio     Timeratio
B&B\LB1        7   1.002       1.027
B&B\LB2       11   76978.054   28.963
B&B\LB3       17   1           1
B&B\LB4        7   4.007       1.347
B&B\LBroot    11   77106.562   20.462
B&B\UBroot    11   99645.593   23.899
B&B\DR         9   1.001       1.019
B&B\Cyclic     9   1.301       1.613
7 Conclusion
In this paper, we presented an exact algorithm for the two-stage hybrid flow
shop problem with at least two identical machines in each stage. Our algorithm
incorporates several features including a representation of a schedule as a
permutation of jobs, fast lower bounds, effective heuristics, adjustment
procedures, and dominance rules. Moreover, one of the results of our work is to
provide evidence that embedding the exact solution of an NP-hard problem within
a branch-and-bound algorithm does not preclude its effectiveness. We presented
extensive computational results which provide evidence that the proposed
approach produces optimal solutions for large-sized instances.
Future research efforts need to be focused on the development of exact
procedures for solving more complex hybrid flow shop problems involving setup
times, inter-stage transport times, release dates, and due dates. A second
issue worth investigating is a solution procedure for the F2(P) || Cmax based
on an integer programming formulation. Such a procedure might be more effective
for solving the medium-sized balanced instances, which were found to be
particularly challenging.
References
Adams J, Balas E, Zawack D (1988) The shifting bottleneck procedure for job shop scheduling.
Manage Sci 34:391–401
Brah SA, Hunsucker JL (1991) Branch and bound method for the flow shop with multiple pro-
cessors. Eur J Oper Res 51:88–99
Brucker P, Jurisch B, Kramer A (1994) The job–shop problem and immediate selection. Ann
Oper Res 50:73–114
Brucker P (1998) Scheduling algorithms. Springer, Berlin Heidelberg New York Germany
Buten RE, Shen VY (1973) A scheduling model for computer systems with two classes of
processors. In: Proceedings of the sagmore computer conference on parallel processing, pp
130–138
Carlier J (1987) Scheduling jobs with release dates and tails on identical machines to minimize
the makespan. Eur J Oper Res 29:298–306
Carlier J, Pinson E (1994) Adjustment of heads and tails for the job-shop problem. Eur J Oper
Res 78:146–161
Carlier J, Néron E (2000) An exact method for solving the multiprocessor flowshop. RAIRO-Oper
Res 34:1–25
Gharbi A, Haouari M (2002) Minimizing makespan on parallel machines subject to release dates
and delivery times. J Scheduling 5:329–355
Gharbi A, Haouari M (2004) Optimal parallel machines scheduling with initial and final avail-
ability constraints. In: Proceedings of the ninth international workshop on project management
and scheduling PMS, pp 218–221
Gharbi A, Haouari M (2005) Optimal parallel machines scheduling with availability constraints.
Discrete Appl Math (in press)
Gupta JND, Hariri AMA, Potts CN (1997) Scheduling a two-stage hybrid flow shop with parallel
machines at the first stage. Ann Oper Res 69:171–191
Haouari M, M’Hallah R (1997) Heuristic algorithms for the two-stage hybrid flowshop problem.
Oper Res Lett 21:43–53
Haouari M, Gharbi A (2004) Lower bounds for scheduling on identical parallel machines with
heads and tails. Ann Oper Res 129:187–204
Hoogeveen JA, Lenstra JK, Veltman B (1996) Preemptive scheduling in a two-stage multipro-
cessor flow shop is NP-Hard. Eur J Oper Res 89:172–175
Karp RM (1972) Reducibility among combinatorial problems in complexity of computer com-
putations. In: Miller RE, Thatcher JW, (eds) Plenum Press, New York, pp 85–103
Kis T, Pesch E (2004) A review of exact solution methods for the non-preemptive multiprocessor
flowshop problem. Eur J Oper Res (in press)
Langston MA (1987) Interstage transportation planning in the deterministic flowshop environ-
ment. Oper Res 35:556–564
Lee CY (1991) Parallel machine scheduling with non-simultaneous machine available time.
Discrete Appl Math 30:53–61
Lee CY, Vairaktarakis GL (1994) Minimizing makespan in hybrid flowshop. Oper Res Lett
16:149–158
Lee CY, He Y, Tang G (2000) A note on parallel machine scheduling with non-simultaneous
machine available time. Discrete Appl Math 100-133–135
Lin HT, Liao CJ (2003) A case study in a two-stage hybrid flow shop with setup time and dedicated
machines. Int J Product Econ 86:133–143
Lopez P, Erschler J , Esquirol P (1992) Ordonnancement de tâches sous contraintes: une approche
énergétique. RAIRO-APII 26:453–481
Moursli O, Pochet Y (2000) A branch and bound algorithm for the hybrid flowshop. Int J Product
Econ 64:113–125
Narasimhan SL, Panwalker SS (1984) Scheduling in a two-stage manufacturing process. Int J
Product Res 22:555–564
Néron E, Baptiste Ph, Gupta JND (2001) Solving hybrid flow shop problem using the energetic
reasoning and global operations. Omega 29:501–511
Perregaard M (1995) Branch and bound method for the multiprocessor jobshop and flowshop
scheduling problem. Master thesis, Department of Computer Science, University of Copenha-
gen
Portman MC, Vignier A, Dardilhac D, Dezalay D (1998) Branch and bound crossed with GA to
solve hybrid flowshops. Eur J Oper Res 107:389–400
Rajendran C, Chaudhuri D (1992) Scheduling in n-job, m-stage flowshop with parallel processors
to minimize makespan. Int J Product Econ 27:137–143
Salvador MS (1973) A solution to a special class of flow shop scheduling problems. In: El-
maghraby SE (ed) Symposium of the theory of scheduling and applications. Springer, Berlin,
Heidelberg New York pp 83–91
Schmidt G (2000) Scheduling with limited machine availability. Eur J Oper Res 121:1–15
Schuurman P, Woeginger GJ (2000) A polynomial time approximation scheme for the two-stage
multiprocessor flow shop problem. Theor Comput Sci 237:105–122
Sherali HD, Sarin SC, Kodialam MS (1990) Models and algorithms for a two-stage production
process. Product Plan Control 1:27–39
Sriskandarajah C, Sethi SP (1989) Scheduling algorithms for flexible flowshops : worst and
average case performance. Eur J Oper Res 43:143–160
