Thesis: Testing in Scheduling Problems For Information Retrieval

Testing in scheduling problems for information
retrieval
Ioannis Samaras
Student number: 2577900
31 July 2016
Master Thesis Operations Research and Business Econometrics
Thesis committee:
Thesis supervisor: Dr. Ir. R.A. Sitters
Second Reader: Dr. D.A. van der Laan
Abstract
This thesis focuses on the two machine flow shop problem with unknown delays. This a problem
often found in the design of manufacturing facilities, where the equipment can be ordered only
after its specifications are known with one-hundred-percent confidence. In this industry, the man-
agement uses the approach of Critical Path Method (CPM), which, given the stochastic nature of
the problem, can lead to undesired prolongation of the project duration. For this reason, the idea
of testing some of the jobs is proposed as an exploration through exploitation concept.
Keywords: Two-machine flow shop with unknown delays, scheduling with testing, unknown
time lags, exploration through exploitation, information retrieval
1
Contents
1 Introduction 4
1.1 Motivation for this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Some details about the various project phases . . . . . . . . . . . . . . . . . . . 5
1.3 What happens in reality and what can be achieved . . . . . . . . . . . . . . . . . 5
1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Literature Review 6
2.1 Flow shops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Main categories of scheduling problems . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Learning and Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Model Description 9
3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.1 Deterministic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.2 Stochastic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.3 Stochastic Model with Testing . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Model Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Preliminaries 11
4.1 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Theorems and Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Local search properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5 Integer Linear Programming 26

5.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 Modeling 27
6.1 Project phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2.1 Free Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2.2 Costly Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
7 Computational Results 31
7.1 Deterministic Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
7.1.1 Equal Processing Times . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.1.2 General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
7.1.3 Deterministic problem - Discussion . . . . . . . . . . . . . . . . . . . . 35
7.2 Stochastic Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.2.1 Normal processing times at WS1 and equal processing times at WS2 . . . 35
7.2.2 General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.2.3 Stochastic problem discussion . . . . . . . . . . . . . . . . . . . . . . . 42
7.3 k-free Testing vs Costly Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.3.1 Equal Processing Times . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.3.2 General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.3.3 Testing discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8 Conclusions and Future Research 46
2
Abbreviations
CPM Critical Path Method
EDD Earliest Due Date
FCFS First Come First Served
ILP Integer Linear Program
LB Lower Bound
LDF Longest Delay First
LPT Longest Processing Time
SPT Shortest Processing Time
WS Work Station
3
1 Introduction
1.1 Motivation for this thesis
In the field of chemical and pharmaceutical manufacturing the introduction of new products is nec-
essary for a companys innovation and sustainability. With the new products, new challenges also
incur with respect to the expansion or modification of the current production facilities and more
specifically the production lines making the facility design an absolutely essential operation. The
planning and scheduling functions in a company rely mostly on the critical path method (CPM) to
allocate limited resources to the activities that have to be done. This allocation of resources has
to be done in such a way that the company optimizes its objectives and achieves its goals. These
resources can be engineering teams, licenses for computer-aided-engineering software, the project
budget, equipment or machinery that can be re-used from other finished projects etc. The above
mentioned planning also defines the deadlines of the projects according to the organizations ob-
jectives often ignoring details from a microscopic view of the project. The following example
illustrates the role of planning and scheduling in a real industrial situation, namely in the manage-
ment of large construction and installation projects that consist of many stages. Consider a phar-
maceutical company that intends to produce a new drug (e.g. liquid form). The project involves
a number of distinct tasks including electrical and instrumentation design, the process design, the
procurement of instruments and equipment, the construction phase, the programming phase (im-
plementation) and documentation phase. A precedence relationship structure exists among these
tasks: some can be done in parallel (concurrently), whereas others can only start when certain
predecessors have been completed. The goal is to complete the entire project in minimum time,
in other words to minimize the maximum completion time also known as the makespan. Planning
and scheduling provide a coherent process to manage the project as well as a good estimate for
its completion time but sometimes fail to reveal which tasks are critical and determine the actual
duration of the entire project. This failure is attributed mostly to the outsourcing policy that such
organizations employ. For example, the engineering teams for electrical & instrumentation design
and process design are outsourced whereas the initial planning (e.g. deadlines, resources and bud-
get), the coordination and supervision are performed by the organization. A good approximation
of such a project is shown in the following figure.
Figure 1: Illustration of a middle size project
4
1.2 Some details about the various project phases
Table 1 shows possible phases for a project in the design of manufacturing facilities. It is impor-
tant to mention at this point that the graph in Figure 1 will not be used explicitly in this thesis but
only two of its nodes for reasons explained in the following section. Therefore, Table 1 intends
primarily to introduce one instance of a real application, which was the motivation for this the-
sis. In addition to this, since the characteristics of the possible phases are realistic we intend to
emphasize that in order to build a basic model one has to make assumptions and simplifications
mentioned in later sections.
Table 1: Overview of possible project phases
S: Basic Provides the basis (basic engineering design in engineering terms) for the jobs
Engineering in A and B. It also contains information about the processing times of the jobs
in A and B with 30% precision. However, this is a not entirely an optimistic-
pessimistic approach but rather assumptions about additional work (jobs that
might need to be reprocessed due to engineering refinements).
A: Electrical The processing times are known with 30% precision, whereas the release
Engineering dates are equal to 0. In principal the duration of the same job in A is less or
equal to its processing in B.
B: Process The processing times are known with 30% precision whereas the release
Engineering dates are 0. The sequence policy of the jobs in B is defined by the priority
rules that are out of the scope of this thesis. The machines in B cannot be idle.
The processing times are unknown until they are fully processed.
C: The processing times are known with 30% precision whereas the release
Automation date of is job is dependent on its completion time in B. The processing times
Specification are unknown until they are fully processed.
D: Montage The processing times are known with 30% precision whereas the release
date of each job is dependent on its completion time in B plus the delay, which
is the lead (delivery/transportation) time of the equipment. The delay of each
job remains unknown until its completion in B.
E: The processing times are known, whereas the release date of each job is de-
Automation pendent on its completion time in D.
G: Safety The processing times are known, whereas the release date of each job is de-
Documentation pendent on its completion time in C.
F: Final The processing times are known, whereas the release date of each job is de-
Documentation pendent on its latest completion time in E or G.
1.3 What happens in reality and what can be achieved

The initial design from S will be given to the engineering teams in A and B, who will work on
parallel the same jobs. The order of the jobs is usually decided by a rule that seems to be an
EDD rule (earliest due date) but it is more a weighted EDD rule as some parts are considered
as basis. Sometimes there is no priority rule as it is important to meet the general completion
deadline. Typically the parts that are longer have earlier deadlines so that the general deadline will
not be affected in case some unforeseen problem comes up. It is noteworthy that the production
line is not operational until the whole project is finished so the idea of only one final deadline is
reasonable. We assume a job j starts being processed by a machine in A and by another machine
in B. Note that the job j is the same job that can be processed by parallel machines but can have
different completion times (job j can be the design of a specific part of the production line and
has to be processed both by process engineering and by electrical engineering can be done in
5
parallel). When the job j is finished in B, there is a minimum delay for each job before it can
then be processed by D. This delay becomes known only when the job is completed in B and
as a result the earliest possible release time rj for phase D can be determined. Moreover, there
is constant communication between teams in A and B and this is the reason why it is assumed
that if a job is stopped being processed by a machine in A, it also stops being processed by a
machine in B and vice versa. This implies also that preemption is allowed as well as every stop is
considered preemption. The processing times in D are known as soon as the job j in A is finished
(also known with the 30% precision when S is finished). Summarizing, in an attempt to also
state the problem, the minimum time that is needed between B and D is unknown and can lead to
undesired prolongation of the completion time of the project. The fact that these delays between
B and D are typically longer than the processing times suggests that the critical path is the path
S-B-D-E-F. Aim of this thesis is to focus on the two most significant nodes of the graph (B and
D), which consist a two-machine flow shop problem with minimum delays. The reason behind
this is that every reduction in the completion time of this flow shop is an actual reduction in the
completion time of the whole project. We will attempt to use testing to reveal information for the
delays and assess whether this is a strategy that can be employed when information on the delays
is missing. It will also be investigated whether the initial policy of scheduling the jobs according
to EDD, usually equivalent to the LPT schedule, can be replaced by another scheduling rule.
1.4 Outline
Main characteristic of our attempt is the concept of exploration through exploitation, which con-
sists testing some of the jobs in B in order to learn their delays and employ another scheduling
policy that will lead to a shorter completion time. The notion of testing is also the novelty of this
thesis, which can be applied in the above realistic project management scenario. Due to the fact
that not much work has been done in the field of testing in scheduling problems, we consider a
single machine for each stage of the flow shop. Section 2 provides a detailed literature review to
build up the necessary theoretical background for this scheduling problem and its innovative ap-
proach. In Section 3 we give a formal description of the basic problem and this is divided into three
distinct models. Afterwards, in Section 4 we define the lower bounds and we give some theorems
and general observations about the three models. Moreover, we present the optimal algorithm for
a special case of the deterministic problem and we propose five algorithms that are believed to pro-
duce near optimal schedules. In an attempt to extend some of the observations for more complex
settings we also give three properties of local search. In Section 5 the problem is also formulated
as an Integer Linear Program aiming to obtain the optimal solution of every instance. After that,
in Section 6 we show how the industrial application described in the introduction was modeled
as a two-machine flow shop with delays for two cases, namely when testing is for free and when
testing is costly including a detailed cost function. We also show how the proposed algorithms
are modified to cope with the stochastic version and the two cases of testing. Section 7 is devoted
to the results obtained by simulation for various cases and different distributions for the delays.
Finally, this thesis is concluded with general discussion of the findings and suggestions for further
research in Section 8.
2 Literature Review
2.1 Flow shops
Flow shop is one of the most classic scheduling problems whose conception is traced back to the
1950s. However, one of the main assumptions of those problems has always been that the time
6
required to move a job from one workstation to another is negligible (Johnson, 1954). This as-
sumption might be the case in some situations, but according to the description with regard to the
facility design a "transportation" time will be required for each and every job. Our problem is the
two-machine flow shop with minimum delays. Nawijn and Kern (1991) study a single-machine
problem with two operations per job and intermediate minimum delays, which is equivalent to
the two-machine flow shop problem with minimum delays. They show that the problem is NP-
hard in the ordinary sense if the solution space is not restricted to permutation schedules. The
result is strengthened to NP-hardness in the strong sense for F 2|dj |Cmax (Yu et al., 2004), for
F 2|dj , p1j = p2j |Cmax (DellAmico, 1996), for F 2|dj {0, d}, p1j = p2j |Cmax (Yu, 1996), and
finally for F 2|dj , p1j = p2j = 1|Cmax (Yu et al., 2004). DellAmico (1996) proposed several
lower bounds, which were used in the derivation of several polynomial 2-approximation algo-
rithms for the makespan objective. In the same paper, he also proposed a tabu search algorithm
that produced good results. Orman and C. Potts (1997) study the problem of minimizing the idle
time of a radar, which they formulate as a single-machine scheduling problem with exact rather
than minimum delays. They identify some special cases that are polynomially solvable. The coun-
terpart of the above problem, in which the delay constraints are restricted to be "maximum", has
also been studied in the literature. Yang and Chern (1995) show that the problem to minimize
the makespan objective is NP-hard and propose a branch and bound algorithm. Another closely
related problem is the proportionate two-machine flow shop (p1j = p2j ), for which Ageev (2007)
improved the results of DellAmico giving a 32 -approximation for this special case. The last de-
velopment in the two-machine flow shop problem with exact delays was made by Leung et al.
(2007), where they focused on objectives of the makespan and total completion times. For the
makespan objective they show that the problem is strongly NP-hard even if there are only two
possible delay values. They showed that some special cases are solvable in polynomial time. They
also designed approximation algorithms for the general case and some NP-hard special cases. It
is also noteworthy that they showed that the optimal schedule for the problem F 2|dj |Cmax does
not have to be a permutation schedule. This comes also to an agreement with the findings of C. N.
Potts et al. (1991) according to which, for the problem of minimizing the maximum completion
time, the value of the best permutation schedule is worse than that of the true optimal schedule by

a factor of more than 12 m, where m the number of machines. In another paper Strusevich and
Zwaneveld (1994), consider the two machine flow shop with setup, processing, and removal times
separated. They show that there may not exist an optimal solution that is a permutation schedule,
and that the problem is NP-hard in the strong sense. Furthermore, long before all these problems
were mentioned, the first who introduced the notion of delays were Johnson (1958), Mitten (1959),
Nabeshima (1963) and Swarc (1968). They refer to them as time lags, which can be viewed as
processing times of a non-bottleneck machine in between the two workstations, thus comprising
a special case of a F 3||Cmax problem. So one could solve the problem by applying the Johnsons
rule to the processing times (p1j + dj , dj + p2j ). More specifically, Johnson proposed that for
two jobs i and j: job i precedes job j if min{p1j + dj , di + p2i } > min{p1i + di , dj + p2j }.
With respect to the Johnsons solution of the F 3||Cmax problem and the definition of a non bot-
tleneck machine, Kamburowski (2000) summarized all the progress regarding the non-bottleneck
machines. According to that paper a middle machine can be perceived as a non-bottleneck ma-
chine and the Johnsons algorithm is applicable under any of the following conditions:
1. Johnson (1954): p1j di or p2i dj for all i 6= j, which basically implies that the middle
machine is dominated by machine 1 or machine 3.
2. Burns and Rooker (1975): min{p1k , p2k } dk for all k.
3. Monma and Rinnooy Kan (1983): p1j + (1 )p2i dk for all i, j and k, and some ,
0 1.
7
4. Kamburowski (2000): (p1j di ) + (1 )(p2i dj ) 0 for all i 6= j and some ,
0 1.
2.2 Main categories of scheduling problems

Lageweg et al. (1981) were the first to systematically tabulate the complexity of scheduling prob-
lems, thus confirming that most of them are NP-Hard (Lageweg et al., 1982). Due to the NP-
Hardness, the algorithmic approach comprises indeed a difficult task, and particularly even more
difficult when the data is unknown or gradually revealed. For this reason the scheduling problems
are divided into three main categories, namely deterministic, stochastic and online. In determin-
istic scheduling, all problem data is known with certainty in advance and the problem simply
becomes an optimization process. The main characteristic of stochastic scheduling is the fact that
parts of the input data may be subject to random fluctuations. Hence, the effective processing times
are not known with certainty in advance. More precisely, it is assumed that the processing times of
any job are governed by a random variable, and the actual processing times become known only
upon completion (Uetz, 2001). So the stochastic scheduling answers the question: What is the best
that can be achieved under the given uncertainty about the future? Unlike to those two categories,
in the online scheduling no knowledge regarding the processing or the arrival time to the system
is assumed, and information is revealed gradually. Thus, the main question of online schedul-
ing is mainly: What was achieved under uncertainty about the future, and what could have been
achieved if the future would not have been uncertain? The main attempts of research in the field
of stochastic scheduling were mostly focused on solving stochastic counterparts of deterministic
scheduling problems. While these models are often more realistic than the deterministic models,
they also tend to be more difficult to solve (Uetz, 2001). Some results can be generalized to the
stochastic case (e.g., the WSEPT rule, Rothkopf (1966)), while others require special assumptions
regarding the distribution functions (e.g., exponential distribution, Weiss and M. Pinedo (1980)).
On certain occasions, stochasticity actually simplifies problems and their analysis (M. L. Pinedo,
2008) but in general, the analysis of stochastic models is considered much harder.
2.3 Learning and Testing

In the classical scheduling theory, the processing times of the jobs are considered to be deter-
ministic. Nevertheless, this assumption might be unrealistic in many situations. This finding is
supported by several industrial empirical studies, which show that unit costs decline as firms pro-
duce more of a product and gain knowledge or experience. In fact, the repetition of similar tasks
continuously improves the workers skills; workers are able to perform setup, to deal with machine
operations and software, or to handle raw materials and components at a greater pace (Xingong
and Guangle, 2010). This phenomenon is known as the learning effect, as proposed by M. L.
Pinedo (2008). Biskup (1999) was one of the first to introduce the innovative idea of learning to
scheduling problems pointing out that the learning effect has been observed in numerous indus-
trial situations and various business activities. The concept has been widely employed in manage-
ment science since its discovery by Wright (1936). Biskup (1999) also proposed a position-based
learning model and showed that two single-machine scheduling problems remain polynomially
solvable. Ever since, this are received gradually more attention by the researchers. In many real-
istic situations, the machine and human learning effects might exist simultaneously. For example,
Wu and Lee (2009) indicated that the boot time of Windows Vista is shortened each time due to
machine learning, while the word processing speed is shortened due to human learning. In these
models, the effect of learning is related to changes in processing times. Another aspect of learning
is deemed to be the information revelation. For example, in many settings the exact nature of the
various jobs is uncertain; i.e. the time and amount of resources required to process a given job and
8
its relative priority might not be known exactly. In a recent article, Levi et al. (2015) identify that
there are cases, where collecting more information on a job requires the allocation of the same
resources used to process the job. This gives rise to operational trade-offs of exploration versus
exploitation, specifically, how to dynamically allocate resources between diagnostic work called
testing that aims to collect more information on the arriving jobs, and processing work called
working that simply serves the jobs (customers) in the systems. In the same article, they are the
first to introduce this new class of problems that capture exploration versus exploitation trade-offs
in the service environments. Moreover, they provide a structural analysis after formulating the
problem as a high-dimensional Dynamic Program and give a characterization of optimal policies.
Apart from this article and despite the wide spectrum of problems that has been explored in the
area of scheduling, the topic of testing itself seems to have not received attention.
3 Model Description
This section gives detailed information regrading the three models used in this thesis. In detail,
these are the deterministic model, the stochastic and the stochastic with testing. Before breaking
down the three models we state the basis of them, which is the same for all.
3.1 Problem Definition

We consider the following problem. There are two work stations (WS), and a set of jobs J =
{1, ..., n}. Each job j J consists of two successive operations O1j and O2j , where O1j is
processed first for p1j time units at WS1, and then O2j is processed for p2j time units at WS2.
Each job j is available for WS1 at time zero and becomes available for WS2 dj time units after
its completion at WS1 and as a result it is defined by a triple (p1j , p2j , dj ). Note that even though
the processing times in the project described in Section 1.2 are known with some precision, we
assume that p1j and p2j are deterministic for all models and we do not consider a final deadline or
due dates. The dj is the minimum time interval between the completion time of O1j at WS1 and
the starting time of O2j at WS2 and becomes known only after the O1j is complete at WS1. As a
result, the release date for each job j J at the WS2 is given by:
rj = C1j + dj , (1)
where C1j is the completion time of O1j . Clearly the sequence at WS1 will affect the release dates
for WS2. The problem is to find a feasible schedule with minimal length, namely to minimize the
completion time of the last job. Obviously, when all dj are zero or uniform the problem becomes
the classical problem F 2||Cmax for which the Johnsons rule is an optimal solution but restricted
to permutation schedules. According to M. L. Pinedo (2008), finding an optimal schedule when
sequence changes are allowed is significantly harder than finding an optimal schedule when se-
quence changes are not allowed.
Unfortunately, to the best of our knowledge permutation schedules have not been discussed by the
first researchers who studied the two-machine flow shop with time delays (Mitten, Johnson) but
C. N. Potts et al. (1991) exhibited a family of instances for which the value of the best permuta-

tion schedule is worse than that of the true optimal schedule by a factor of more than 21 m. Our
model can be then defined as a two-machine flow shop with minimum delays and will follow the
notation: F 2|dj |Cmax .
3.1.1 Deterministic Model

In the deterministic setting of the problem each job is defined by a triple (p1j , p2j , dj ), where the
three parameters are known in advance. Moreover, several other assumptions that hold are the
9
following:
All jobs are available for processing at WS1.
All jobs must be processed first at WS1 and then at WS2 .
Preemption is not allowed.
There is unlimited capacity between the two work stations - no blocking effect.
No job can skip either O1j or O2j .
Non-permutation schedules are allowed (the jobs are not required to follow the same se-
quence in the two WS).
3.1.2 Stochastic Model

In the stochastic setting of the problem each job is defined by a triple (p1j , p2j , dj ), where the
processing times p1j , p2j are known in advance, whereas only the distribution of dj is known.
Moreover, several other assumptions that hold are the following:
The actual value of dj is revealed only after the job j is processed at WS1 (O1j is com-
pleted).
All jobs are available for processing at WS1.
All jobs must be processed first at WS1 and then at WS2.
Preemption is not allowed.
There is unlimited capacity between the two work stations - no blocking effect.
No job can skip either O1j or O2j .
Non-permutation schedules are allowed (the jobs are not required to follow the same se-
quence in the two WS).
3.1.3 Stochastic Model with Testing

This model is exactly the same model as the previous model but with an additional assumption:
The actual value of dj can also be revealed after the job j is tested instead of processed at
some cost.
The testing assumption corresponds to the investment of an additional amount of time in the job
(some percentage of its processing time p1j at WS1) and consequently the retrieval of some infor-
mation about its delay prior to its actual processing. The most common example to describe the
basic two-machine flow-shop without delays is the one of the paint shop. Consider a paint shop
and a building with multiple levels. Every level is a job, which has to undergo two operations:
sanding and painting. If it were just for this problem, it is solved optimally by Johnsons rule.
Now consider that every level has to be painted with a specific color and this color can be speci-
fied and purchased only after the sanding is done. Assume that not all the colors are immediately
available so their delivery times are the delays. Testing one job would mean to decide on the color
in advance (before sanding) and then contact the color supplier to learn its approximate delivery
10
time. The amount of time spent on this task is called Cost of Testing (CoT) and its cost function
for a job j can be defined as follows:
CoTj = j p1j (2)
where j is a percentage different for each job and depends, in general, on the difficulty level
of each job. In our model, it is assumed that testing occurs before processing the jobs at WS1,
which implies that the jobs are available for processing at WS1 at time nj=1 CoTj instead of zero
P
as in the other two models. Note that the jobs are available for processing at WS1 at time zero if
the testing is for free. Of course, the cost function can be defined appropriately for every different
problem as it will happen in Section 6.2.2 for the project management problem mentioned in the
beginning. For instance, for the paint shop example the j might be a constant, same for all jobs
and thus CoTj = p1j .
All in all, the assumption of testing is the novelty of this model and this thesis, since it provides
an alternative way to tackle the stochastic version of the problem and introduces a new category
of scheduling problems.
3.2 Model Complexity

The deterministic version of the problem is equivalent to the two machine flow shop with delays.
Nawijn and Kern (1991) study a single-machine problem with two operations per job and interme-
diate minimum delays, which is equivalent to the two-machine flow shop problem with minimum
delays. They show that the problem is NP-hard in the ordinary sense if the solution space is not
restricted to permutation schedules. The result is strengthened to NP-hardness in the strong sense
for F 2|dj , p1j = p2j |Cmax (DellAmico, 1996), for F 2|dj {0, d}, p1j = p2j |Cmax (Yu, 1996),
and finally for F 2|dj , p1j = p2j = 1|Cmax (Yu et al., 2004). The last proof above immediately
implies that the general problem F 2|dj |Cmax is strongly NP-hard. In our model these delays are
also unknown, so the stochastic problem remains quite complex.
4 Preliminaries
In this section, the theoretical background for dealing with the problem is built up and several
observations for the general case but also special cases are made. Moreover, some algorithms that
are expected to perform near optimally will be described.
4.1 Lower Bounds

Lower bounds (LB) on the optimal value can be very useful when the optimal solution cannot be
found. They can be used to bound the performance ratio of an algorithm. Given an instance I of
this problem with an algorithms solution value Z(I) and a lower bound LB(I), the ratio of the
Z(I)
algorithms solution and the optimal solution is at most LB(I) . Next, we give three lower bounds.
All jobs j have to be processed at WS1 and WS2. Assuming that a job j with min{p1j + dj } is
j
scheduled first at WS1 and has starting time s2j = p1j + dj , this gives an immediate LB equal to:
n
X
LB1 = min{p1j + dj } + p2j (3)
j
j=1
Moreover, there is another job with max{p1j + dj } and let this job be scheduled first at WS1.
j
Ideally this job would finish last at WS2, suggesting that all other jobs have started later at WS1
11
but completed earlier at WS2 after the FCFS rule was applied. This gives the second LB equal to:
LB2 = max{p1j + dj } + p2j (4)

j
Finally, some job j is scheduled last at WS1 and in the best case this job has also a delay dj = dmin .
Assuming that this job is completed last at WS2, this gives another LB with significant value,
especially for the case where p2j = p or d is uniformly distributed, which is equal to:
n
X
LB3 = p1j + dj + p2j (5)
j=1
As a result the lower bound can be defined as:
LB = max{LB1, LB2, LB3} (6)
4.2 Theorems and Observations

Theorem 1. For the deterministic problem and for every instance of two jobs with equal process-
ing times for both work stations (p1j = p2j = p), it is optimal to schedule the job with the longest
delay first.
Proof. We consider a pair of adjacent jobs a, b with processing times p at WS1 and WS2. Ev-
ery job is characterized by a triple (p1j , p2j , dj ). For a: (p1a , p2a , da ) = (p, p, dL ) and for b:
(p1b , p2b , db ) = (p, p, dS ), with dL > dS . We argue that it is optimal to schedule a first.
Schedule 1: First b, then a

At WS1 the release dates are zero, so there are no idle times between b and a.
C1b = p.
C1a = p + p = 2p.
rb = dS + p.
ra = dL + 2p.
Obviously, ra > rb .
For this reason, C2b = rb + p = dS + p + p = dS + 2p.
C2b ra = dS + 2p dL 2p = dS dL < 0.
C2a = max {C2b , ra } + p = ra + p = dL + 2p + p = dL + 3p.
Denote the length of the schedule in this case by Cmax0 = C2a . Next, we shall prove that
0
scheduling job a first gives a schedule of length at most Cmax .
Figure 2: Schedules at WS1 and WS2 when b is scheduled first
Schedule 2: First a, then b.

At WS1 there are no release dates, so there are no idle times between a and b.
12
C1a = p.
C1b = p + p = 2p.
ra = dL + p.
rb = dS + 2p.
There are three possible cases:
(i) ra = rb ,
which implies that dL + p = dS + 2p dL = dS + p. Then, it does not matter which
jobs comes first at WS2. Let that job be a. As a result:
C2a = ra + p = dL + p + p = dL + 2p. Obviously, rb < C2a , so b is processed
without idle time.
Cmax = C2b = C2a + p = dL + 2p + p = dL + 3p = Cmax 0 .
(ii) ra < rb ,
which implies dL + p < dS + 2p dL < dS + p. Since ra < rb , a can be processed
first at WS2 and it is not known whether it is going to be idle time after that or not. So
for the WS2 we have:
C2a = ra + p = dL + p + p = dL + 2p.
C2a rb = dL + 2p dS 2p = dL dS > 0, so again no idle time at WS2.
Cmax = C2b = max{rb , C2a } + p = C2a + p = dL + 2p + p = dL + 3p = Cmax 0 .
Figure 3: Schedules at WS1 and WS2 when a is scheduled first and ra rb
(iii) ra > rb ,
which implies that dL + p > dS + 2p dL > dS + p. In this case b is processed first
at WS2 and it is unknown if there will be idle time in-between.
C2b = rb + p = dS + 2p + p = dS + 3p.
C2b ra = dS + 3p dL p = dS dL + 2p = dS + p dL + p < dL dL + p = p.
We need to take cases again, since we cannot determine if there is idle time just from
the inequality C2b ra < p.
C2b ra
So, Cmax = C2a = max{C2b , ra } + p = C2b + p = dS + 3p + p = dS + 4p < Cmax 0 .
(dS + 4p < dL + 3p dS + p < dL )
C2b < ra
So, Cmax = C2a = max{C2b , ra } + p = ra + p = dL + p + p = dL + 2p < Cmax 0 .
13
Figure 4: ra > rb without idle time Figure 5: ra > rb with idle time
0
We observe that Cmax Cmax , which means it is optimal to schedule a before b, as claimed
initially.
Example 1. The following shows that Theorem 1 cannot be generalized for more than 2 jobs,
in other words the rule of longest delay first is not optimal (see Figure 6). In the first schedule
Table 2: Example instance where the assumption for the LDF optimality does not hold
Job p1j p2j dj

A 1 1 3
B 1 1 5
C 1 1 4
D 1 1 2
the jobs at WS1 are scheduled according to the rule longest delay first giving a Cmax = 10. The
reason for this is that with this schedule rA = rB = rC = rD = 6. Another scheduling rule could
be to schedule the jobs according to their delays in pairs LPT-SPT. In detail, the longest delay
comes first but after that follows a job with short delay so that the jobs are scheduled as early as
possible, thus releasing jobs at different times. The schedule provided by the LPT-SPT gives also
the optimal solution for the given instance Cmax = 9.
Figure 6: Illustration of the schedules at WS1 and WS2 for Example 1
14
Theorem 2. For the deterministic problem and for every instance with unit processing times for
both work stations (p1j = p2j = 1), where also holds dj dj+1 2, if delays are sorted in an
increasing order, it is optimal to schedule the jobs according to the rule longest delay first (LDF)
at WS1 and apply the rule First Come First Served (FCFS) at WS2.
Proof. We will prove that this algorithm gives Cmax = LB. We assume that the jobs are indexed
and sorted in an order such that dn < dn1 < ... < d2 < d1 . From the LB definition in Section
4.1 we can determine the LB for this instance as:
X n n
X
LB = max{min{p1j + dj } + p2j , max{p1j + dj } + p2j , p1j + dn + p2j } =
j j
j=1 j=1
= max{1 + dn + n, 1 + d1 + 1, n + dn + 1} = max{1 + dn + n, 1 + d1 + 1}.
According to the LDF we schedule job 1 first, then job 2 and so on. As a result, we get:
C
P11n = p11 , C12 = p11 +p12 , ... , C1(n1) = p11 +p12 +...+p1(n1) , C1n = p11 +p12 +...+p1n =
j=1 p1j .
The release dates also become:
r1 = C11 + d1 , r2 = C12 + d2 , ... , rn 1 = C1(n1) + dn 1, rn = C1n + dn
or equivalently:
r1 = p11 + d1 , r2 = p11 + P p12 + d2 , ... , rn1 = p11 + p12 + ... + p1(n1) + dn1 , rn =
p11 + p12 + ... + p1n + dn = nj=1 p1j + dn .
Moreover, it can be proved that:
rn < rn1 < ... < r2 < r1 n + dn < 1 + d1 (7)
since rn rn1 = p11 +p12 +...+p1n +dn (p11 +p12 +...+p1(n1) +dn1 ) = p1n +dn dn1 =
1 + dn dn1 1 rn < rn1 (dn1 dn 2 dn1 + dn 2).
7
Due to (7) the lower bound becomes: LB = 1 + d1 + 1. If we compare sequentially rn1
and rn2 until r2 and r1 , we show that the inequality between two adjacent jobs holds.
Consequently, the job that finished last at WS1 is scheduled first at WS2.
At WS2 the order is n, n 1, ..., 2, 1 and thus their respective completion times are:
C2n = rn + 1 = n + dn + 1, C2(n1) = rn1 + 1 = n 1 + dn1 + 1 = n + dn1 , ... ,
C22 = r2 + 1 = 2 + d2 + 1, C21 = r1 + 1 = 1 + d1 + 1.
Note, that every job at WS2 is released no sooner than the previous job is completed. This is easily
shown because: rn1 C2n = p11 + p12 + ... + p1(n1) + dn1 (rn + 1) = p11 + p12 + ... +
p1(n1) + dn1 (p11 + p12 + ... + p1n + dn + 1) = dn1 p1n dn 1 0.
The job that is completed last is the one with the longest delay:
Cmax = C21 = r1 + 1 = p11 + d1 + 1 = 1 + 1 + d1 = LB = OP T .
The job with max{p1j + dj }, namely the job associated with the longest delay (d1 ) is scheduled
j
first and completed last without any delay, which is optimal, since it is equal to the lower bound.
Theorem 3. For the stochastic problem where d is an IID random variable and the processing
times for WS2 are equal (p2j = p) it is optimal to apply Shortest Processing Time First (SPT) at
WS1 and FCFS at WS2.
Proof. The two main ideas are that SPT reveals faster information about the time delays d but
also SPT does not use any information from the time delays d. Intuitively, the SPT creates release
dates at faster rate and minimizes them. Consider the work stations 1 and 2. There are n jobs
in total that have to be processed at WS1 and afterwards at WS2. For WS1 the release dates
are 0 for all jobs, whereas for WS2 are given by the formula: rj = C1j + dj , with dj an IID
random variable. Assume that the processing times are indexed from 1, 2, ..., n with the property
15
p11 p12 ... p1n for WS1. With the use of SPT the completion times, which comprise
the first term of the sum that gives the P release dates, are as follows: C11 = p11 , C12 = p11 +
p12 , ..., C1n = p11 + p12 + ... + p1n = nj=1 C1j .
Respectively,
Pn the release dates become r1 = C11 + d1 , r2 = C12 + d2 , ..., rn = C1n + dn =
j=1 C 1j + d n , where dj is equal to the expected value.
Due to SPT it holds that: C11 < C12 < ... < C1n . As a result, the expected values satisfy:
Ehr1 i < Ehr2 i < ... < Ehrn i and the expected sequence of the WS2 is the same as at the
WS1. By definition, for WS2 it is optimal to process the jobs according the rule PnFCFS as it is not
optimal for a machine to remain idle. It is also known that SPT minimizes j=1 C1j and thus
C11 , C12 , ..., C1n and in this case the r1 , r2 , ..., rn .
We distinguish 3 cases for the analysis:
If p minj p1j for all jobs, the expected Cmax becomes: EhCmax i = Ehrn i + p with
expected idle times. This holds because:
C11 = p11 and r1 = p11 + Ehdi.
C12 = p11 + p12 and r2 = p11 + p12 + Ehdi.
...
C1n = p11 + p12 + .. + p1n and rn = p11 + p12 + .. + p1n + Ehdi.
For the WS2 we also have:
C21 = r1 + p.
C22 = max{r2 , C21 } + p, for which it can easily be proved that r2 C21 ,
since p11 + p12 + Ehdi p11 + Ehdi + p.
C2n = max{rn , C2(n1) } + p, with rn C2(n1) that holds for all jobs.
If p maxj p1j for all jobs, the expected Cmax becomes: EhCmax i = Ehr1 i + nj=1 C2j
P
without expected idle times. This holds because:
C11 = p11 and r1 = p11 + Ehdi.
C12 = p11 + p12 and r2 = p11 + p12 + Ehdi.
...
C1n = p11 + p12 + .. + p1n and rn = p11 + p12 + .. + p1n + Ehdi.
For the WS2 we also have:
C21 = r1 + p.
C22 = max{r2 , C21 } + p, for which it can easily be proved that r2 C21 ,
since p11 + p12 + Ehdi p11 + Ehdi + p.
C2n = max{rn , C2(n1) } + p, with rn C2(n1) that holds for all jobs.
If minj p1j < p < maxj p1j the expected Cmax satisfies the relation:
Ehrn i + p < EhCmax i < Ehr1 i + ni=1 C2j .
P
In the first case the SPT has minimized C1n = nj=1 C1j and thus the Ehrn i = C1n + Ehdn i
P
and in the second the earlier starting time of the WS2 Ehr1 i = C11 + Ehd1 i. In all three cases, it
minimizes the total expected completion time, which completes the proof.
Example 2. This instance shows that if the above assumptions do not hold and the processing
times at WS2 are arbitrary, the SPT rule might not be optimal (see Figure 7). The first schedule is
the one, whose sequence at WS1 is constructed by SPT instead of the LPT in the second.
16
Table 3: Example instance where SPT at WS1 is not optimal
Job p1j p2j Ehdj i

A 4 1 5
B 5 5 5
Figure 7: Illustration of the schedules at WS1 and WS2 constructed by SPT-FCFS and LPT-FCFS
respectively for Example 2
Observation 1. For the stochastic problem with testing, if testing is for free, it is optimal to test
all jobs in advance.
By definition, if the d component is known the problem becomes the deterministic F 2|dj |Cmax .
Then, if the transportation delay is uniform (equal to d), the problem is in P and one has to apply
the Johnsons algorithm to processing times (p1j + dj , dj + p2j ). Due to the fact that the problem
we study is not the variant that can be solved optimally by Johnsons rule but several algorithms
have been proposed in literature (Karuno and Nagamochi, 2003) apart from a 32 -approximation
algorithm for the case where all processing times are equal (Ageev, 2007), it is better to solve the
deterministic version. As a result, it is optimal to test all jobs in advance.
Observation 2. For the stochastic problem with testing, where d is an IID random variable, and
unit processing times at WS2 for all jobs (p = 1 = p2j p1j ), if it is allowed to test a single job
without any testing cost, then the decision maker chooses to test the job with longest processing
time at WS1.
We now give a rough sketch of the proof. As shown by Theorem 3 it is optimal to apply the SPT
rule as the d components are revealed faster and the release times are minimized. Also with distri-
bution of d known, it is possible to classify d into three categories, namely {dSmall , dM edium , dLarge }.
Any d after becoming known, we say that is dLarge if there is an interval [dLargemin , +) such
that d [dLargemin , +). For simplicity, we write that dLarge Ehdj i + 1. Sorting the jobs
according to SPT at WS1 gives:
p11 p12 ... p1(n1) p1n . It has been proved by Theorem 3 that the expected completion
time at WS2 will be:
EhCmax i = Ehrn i+p = nj=1 C1j +Ehdj i+p. We will denote the last job (index n) as job y and
P
the second last (n 1) as x for the analysis. If we test the longest job y at WS1 its dy component
17
is revealed. If dy Ehdj i we keep the sequence as it is and proceed to WS2.
In case dy > Ehdj i, then dy is said to be dLarge . For the previous expected completion time it also
holds:
0 i = Ehrn i + p = nj=1 C1j + Ehdj i + p = nj=1 C1j + dLarge + p,
P P
EhCmax
since d is not expected value anymore. We need to show that the best we can do knowing
the real d of only one job is to swap the sequence of the last two jobs. Originally, it was:
p11 p12 ... px py . Thus, swapping x and y and leaving the sequence of all other
jobs the same we get for WS1 the following sequence: 1, 2 , 3 , .... , n 2, y, x, where also holds:
p11 p12 ... p1(n2) and py px .
Job y has a completion time at WS1 equal to:
C1y = p11 + p12 + ... + p1(n2) + py .
Job x now completes last:
C1x = p11 + p12 + ... + p1(n2) + py + px .
As a result their completion times are respectively ry = C1y + dLarge and rx = C1x + dj .
Knowing that dLarge Ehdj i + 1 is not enough to determine, which of the jobs x and y will be
scheduled first at WS2. So we need to take all possible cases:
ry < rx
This also implies that dLarge < px + Ehdj i. The job y is scheduled first so EhC2y i =
C1y + dLarge + p. Job x completes last but we cannot determine if there will be idle time
between x and y at WS2. C2x = max{rx , C2y } + p
rx C2y .
EhCmax i = EhC2x i = rx + p = nj=1 C1j + Ehdi + p < EhCmax
0
P
i.
rx < C2y
EhCmax i = EhC2x i = EhC2y i + p = C1y + dLarge + p = nj=1 C1j px + dLarge + p <
P
0
EhCmax i.
ry = rx Pn
EhC
Pn max i = EhC 2x i = rx + p + p = ry + p + p = j=1 C1j px + dLarge + 2p
0
j=1 C1j + dLarge + p = EhCmax i.
ry > rx
EhCmax i = EhC2y i + p = nj=1 C1j py + dLarge + p < EhCmax
0
P
i.
0
For all cases, EhCmax i EhCmax i, which completes the claim that it is optimal to test the longest
job at WS1.
Example 3. In this example it is shown how the test of the longest job can improve the expected
completion time. We assume that at WS1 there are n 2 other jobs that have already been
Table 4: Example instance for the optimality of testing the longest job
Job p1j p2j Ehdj i actual d

x 15 1 15 -
y 16 1 15 16
processed and the n2

P
j=1 C1j is equal to 100.
The last two jobs have Ehdi = 15 but after testing job y its delay is revealed to be dy = 16.
Initially the EhCmax i = C2y + Ehdi + p = 131 + 15 + 1 = 147. After the testing and the realized
18
delay dy , the completion time becomes EhCmax i = 148 also depicted in Figure 8. The second
schedule is constructed after swapping the jobs x and y resulting to an expected improvement of
the EhCmax i.
Observation 3. For the stochastic problem with testing, where d is an IID random variable, and
unit processing times at WS2 for all jobs (p = 1 = p2j p1j ), if it is allowed to test k jobs without
any testing cost, then the decision maker chooses to test the k longest jobs.
According to Observation 2 but also due to improvements in the Cmax that can be given when
scheduling the jobs with the longest delay first we generalize the testing of the longest job to the k
longest jobs and we claim that under circumstances changing the sequence of these k jobs at WS1
can produce a better schedule, otherwise the schedule remains as it was originally. The Examples
4 and 5 show the application of Observation 3, where in the former an actual improvement can be
achieved unlike in the latter.
Example 4. We consider an instance of n jobs, where the first n 2 jobs are completed at WS1 in
100 time units. The decision maker is allowed to test 3 jobs for free and chooses the longest jobs
x, y and z, with data shown in Table 5.
Table 5: Example instance where testing the k-longest jobs is effective

x 10 1 10 9
y 15 1 10 12
z 17 1 10 15
19
Initially, the EhCmax i = 153 but after testing the realized Cmax = 158. Knowing this the
decision maker can swap the order of the jobs at WS1 and minimize the completion time at WS2
Cmax = 152, as can be also seen in Figure 9.
Example 5. We consider the same instance as before with the only difference being the actual d
of the jobs x, y, and z. After the testing, it is revealed that the job associated with the shortest d is
scheduled last as can be observed in Figure 10. For this instance there is no possible swap between
these 3 jobs that can produce a smaller Cmax at WS2. This can be easily verified if one creates all
3! = 6 possible schedules at WS1 and their corresponding schedules at WS2 according to FCFS
to calculate all completion times.
Table 6: Example instance where testing the k-longest jobs is not effective

x 10 1 10 12
y 15 1 10 15
z 17 1 10 9
Remark 1. In a more complicated setting, the Observations 2 and 3 can also hold under circum-
stances and in order to determine the optimal sequence between two jobs we give some properties
of local search in Section 4.4.
20
4.3 Algorithms
As mentioned before, for the problem F 2|dj |Cmax the optimal schedule does not have to be a per-
mutation schedule, which is given by the Johnsons rule. This means that ideally in order to find an
optimal non-permutation schedule one should generate all permutations for WS1 and then sched-
ule the jobs at WS2 according to the rule FCFS. The optimal solution would be the min{Cmax }
but this would require n! time.
For the general case, which concerns the stochastic problem with testing in addition to the O(nlogn)
Johnsons algorithm we will propose also four algorithms that generate non-permutation sched-
ules. Note that the Johnsons rule is applied for cases where the delays are uniform, which implies
that the sequence at WS2 is also the FCFS rule. However, this is not the case for the delays in
our model and since the Johnsons rule is considered to be an algorithm that produces permu-
tation schedules we modify this rule to produce non-permutation schedules (Flexible Johnsons).
The following algorithms concern the deterministic model (or equivalently the stochastic with free
testing for all jobs) and will later be modified for the general stochastic variant where testing is
subject to limitations. As far as the 4 algorithms are concerned, Theorem 1 and Example 1 were
the basis for their development. The LDF simply schedules the jobs in a descending order of their
delays, whereas the LPT1LPT2 and LPT1SPT2 prioritize the jobs with long delays but differ from
each other when choosing the job to be scheduled after a long delay job. Unlike those 3 algo-
rithms, the SPT1LPT2 prioritizes the short delay jobs but at the same time schedules a long delay
job after every short delay job, thus being the LPT1SPT2 with reverse order in every pair.
Johnsons rule
J OHNSON S RULE
1: Test all jobs for free
2: Partition the jobs into two sets A and B
3: . A contains the jobs with p1j p2j and B the jobs with p1j > p2j
4: Sort the jobs from set A in increasing order of p1j + dj (SPT(p1j + dj ))
5: Sort the jobs from set B in decreasing order of p2j + dj (LPT (p2j + dj ))
6: Apply list scheduling for the jobs in A followed by the jobs in B at WS1
7: Apply list scheduling for the same sequence at WS2
8: return Cmax = C2(last)
21
Flexible Johnsons
F LEXIBLE J OHNSON S
2: Partition the jobs into two sets A and B
3: . A contains the jobs with p1j p2j and B the jobs with p1j > p2j
4: Sort the jobs from set A in increasing order of p1j + dj (SPT(p1j + dj ))
5: Sort the jobs from set B in decreasing order of p2j + dj (LPT (p2j + dj ))
6: Apply list scheduling for the jobs in A followed by the jobs in B at WS1
7: Calculate the rj for all jobs
8: Apply FCFS at WS2
Algorithm 1: Longest Delay First (LDF)
LDF
2: Sort d in descending order
3: Apply list scheduling for d at WS1
Algorithm 2: Delays in pairs LPTlong LPTshort (LPT1LPT2)
LPT1LPT2
3: Partition jobs into two sets A and B
4: . set A contains the first half of the jobs with the longest delays whereas
B contains the rest (short delay jobs)
5: repeat
6: Create a pair of jobs (in total n2 ), where the first job is the one with
longest delay from set A and the second is the job with longest delay
from B, and index this pair counting from 1 onwards
7: Delete the two jobs from A and B respectively
8: until Sets A and B are empty
9: Apply list scheduling of the pairs according to lowest index first
22
Algorithm 3: Delays in pairs SPTshort LPTlong (SPT1LPT2)
SPT1LPT2
4: . set A contains the first half of the jobs with longest delays whereas B
contains the rest (short delay jobs)
5: repeat
6: Create n2 a pair of jobs, where the first job is the one with shortest
delay from set B and the second the job with longest delay from A, and
index this pair counting from 1 onwards
Algorithm 4: Delays in pairs LPTlong SPTshort (LPT1SPT2)
LPT1SPT2
4: . set A contains the first half of the jobs with longest delays (long delay
jobs) whereas B contains the rest (short delay jobs)
5: repeat
longest delay from set A and the second is the job with shortest delay
4.4 Local search properties

In this section we provide 3 properties of local search that intend to extend the Observations 2
and 3 for more complex cases. With these properties, the decision maker can decide on possible
swaps to minimize the completion time when it is allowed to test a limited number of jobs without
any testing cost. However, we should mention that these properties were not implemented in the
aforementioned algorithms and thus will not be used later on.
Property 1. For the problem F 2|dj |Cmax , if two adjacent jobs h and k at WS1 satisfy:
(i) p1k p2h dh
23
(ii) p1h p2k dh
(iii) p1h p1k dh dk 0
(iv) p2k p1h dh dk 0
then it is optimal to schedule k before h at WS1 and then apply FCFS at WS2.
Proof. If k is scheduled first then we define the completion times of jobs k and h for the WS1 as
follows:
C1k = p1k .
C1h = p1k + p1h .
rk = p1k + dk .
rh = p1k + p1h + dh .
For WS2, it is needed to be found which job can start first.
rh rk = p1k + p1h + dh p1k dk = p1h + dh dk > 0.
As a result, k is scheduled first.
C2k = rk + p2k = p1k + dk + p2k .
C2k rh = p1k + dk + p2k (p1k + p1h + dh ) = dk dh + p2k p1h 0.
So, C2h = C2k + p2h = dk dh + p2k p1h + p2h = Cmaxkf irst .
If h is scheduled first then:

C1h = p1h .
C1k = p1h + p1k .
rh = p1h + dh .
rk = p1h + p1k + dk .
rk rh = p1h + p1k + dk (p1h + dh ) = p1k + dk dh .
Also, p1h p1k dh dk 0 dk dh p1k p1h .
Then, rk rh p1k + p1k p1h = 2p1k p1h .
We distinguish two cases:
2p1k p1h rk rh
Hence, job h starts first.
C2h = rh + p2h = p1h + dh + p2h .
We need to determine if the release time of k happens before the completion of h.
C2h rk = p1h + dh + p2h (p1h + p1k + dk ) = p2h p1k + dh dk 0.
So, C2k = C2h + p2k = p1h + dh + p2h + p2k = Cmaxhf irst .
Comparing this with the Cmax when k was scheduled first we get:
Cmaxhf irst Cmaxkf irst = p1h +dh +p2h +p2k (dk dh +p2k p1h +p2h ) = p2k p1k 0.
2p1k < p1h rk < rh

Hence, job k can start first.
C2k = rk + p2k = p1h + p1k + dk + p2k .
We need to determine if the release time of h happens before the completion of k.
C2k rh = p1h + p1k + dk + p2k (p1h + dh ) = p1k + dk + p2k dh > 0, since
p1h p1k dh dk and p2k p1h .
Job h will start as soon as k is finished.
C2h = C2k + p2h = p1h + p1k + dk + p2k + p2h = Cmaxhf irst .
Comparing this with the Cmax when k was scheduled first we get:
Cmaxhf irst Cmaxkf irst = p1h + p1k + dk + p2k + p2h (dk dh + p2k p1h + p2h ) =
dk dh + p2k 0.
24
Because p1h p1k dh dk dk dh p1h + p1k .
Also, p1h p2k .
Then dk dh p2k + p1k dk dh + p2k p1k 0.
We observe that in both cases Cmaxkf irst Cmaxhf irst , so the claim is proved.
(i) p1k p2k dk
(ii) p1h p2h dh
(iii) p2k p1h dh dk p1k p1h 0

Then it is optimal to schedule k first at WS1 and apply FCFS at WS2.
Proof. If k is scheduled first, then the jobs k, h are scheduled successively without idle times.
C1k = p1k .
C1h = p1k + p1h .
rk = C1k + dk = p1k + dk .
rh = C1h + dh = p1k + p1h + dh .
Clearly, rk < rh , so k will be scheduled first at WS2. C2k = rk + p2k = p1k + dk + p2k .
It remains to determine whether there will be idle time between k and h at WS2.
C2k rh = p1k + dk + p2k (p1k + p1h + dh ) = dk + p2k p1h dh 0.
So, h is processed as soon as k is finished.
C2h = rh + p2h = p1k + dk + p2k + p2h = Cmaxkf irst .
If h is scheduled first, then the jobs h, k are scheduled successively without idle times.
C1h = p1h .
C1k = p1h + p1k .
rh = C1h + dh = p1h + dh .
rk = C1k + dk = p1h + p1k + dk .
We need to determine which of the two jobs should start first:
rk rh = p1h + p1k + dk p1h dh = p1k + dk dh .
It is not clear, so we need to take cases:
p1k + dk dh 0 rk rh
Job h will be scheduled first. C2h = rh + p2h = p1h + dh + p2h . Now, it remains to
determine, whether there will be idle time between k and h at WS2. C2h rk = p1h + dh +
p2h (p1h + p1k + dk ) = dh + p2h p1k dk 0.
As a result, no idle time between h and k. C2k = C2h + p2k = p1h + dh + p2h + p2k =
Cmaxhf irst Cmaxkf irst .
p1k + dk dh < 0 rk < rh

Job k will be scheduled first. C2k = rk + p2k = p1h + p1k + dk + p2k . Now, it remains
to determine, whether there will be idle time between k and h at WS2. C2k rh = p1h +
p1k + dk + p2k (p1h + dh ) = p1h + p1k + dk + p2k p1h dh = p1k + dk + p2k dh 0.
C2h = C2k + p2h = p1h + p1k + dk + p2k + p2h = Cmaxhf irst > Cmaxkf irst .
We observe that Cmaxhf irst Cmaxkf irst , so the claim is proved.
25
(i) p1k p2h
(ii) p1h p2k
(iii) p1k p2k dh dk p1k p2h
Then, it is optimal to schedule h first at WS1 and apply FCFS at WS2.
Proof. If k is scheduled first at WS1 then:

C1k = p1k .
C1h = p1k + p1h .
rk = C1k + dk = p1k + dk .
rh = C1h + dh = p1k + p1h + dh .
Clearly, rk < rh , so k will be scheduled first at WS2.
C2k = rk + p2k = p1k + dk + p2k .
It remains to determine whether there will be idle time between k and h at WS2.
C2k rh = p1k + dk + p2k p1k p1h dh = dk + p2k dh p1h 0.
There will be some idle time between h and k and the completion time of h will be:
C2h = rh + p2h = p1k + p1h + dh + p2h = Cmaxkf irst .
If h is scheduled first then:

C1h = p1h .
C1k = p1h + p1k .
rh = C1h + dh = p1h + dh .
rk = C1k + dk = p1h + p1k + dk .
To determine, which job can start first at WS2:
rh rk = p1h + dh p1h p1k dk = dh p1k dk 0,
as p1k p2k dh dk p1k dh dk p1k + dk dh .
As a result h can be scheduled first at WS2. C2h = rh + p2h = p1h + dh + p2h C2h rk =
p1h + dh + p2h p1h p1k dk = dh + p2h p1k dk 0.
We conclude that the completion time of k is: C2k = C2h + p2k = p1h + dh + p2h + p1h + p2k =
Cmaxhf irst .
Comparing the Cmaxkf irst and Cmaxhf irst we get:
Cmaxkf irst Cmaxhf irst = p1k + p1h + dh + p2h (p1h + dh + p2h + p1h + p2k ) = p1k + p1h +
dh + p2h p1h dh p2h p1h p2k = p1k p2k 0 Cmaxkf irst Cmaxhf irst .
We observe that Cmaxkf irst Cmaxhf irst , which completes the claim.
5 Integer Linear Programming

In this section the problem will be modeled as an integer linear program (ILP). In the literature only
the permutation flow shop has been formulated as an ILP with Wagners model to be considered the
best formulation (Pan, 1997). The two-machine flow shop with delays can be more complicated
as the number of variables and constraints increases. Boudhar and Chikhi (2011) consider the
delays as transportation times and that for these transportation a single robot is used. This requires
a specific capacity between the two machines and the transportation time need is always constant.
The problem we study can be similarly formulated with the difference that the number of robots
is the same as the number of jobs, the capacity is infinite as there is no blocking effect and the
transportation time is not a constant.
26
5.1 Formulation
Mixed integer linear programming model for the general problem in order to determine the se-
quence of jobs that minimizes the makespan criterion. The notation that will be used is the fol-
lowing:
i: index of jobs, i = 1, .., n
P1i : processing time of job i at WS1
P2i : processing time of job i at WS2
Di : duration of delay for job i
s1i : starting time of operation 1 of job i at WS1
s2i : starting time of operation 2 of job i at WS2
ij = 1, if s1i < s1j , 0 otherwise
ij = 1, if s2i < s2j , 0 otherwise
Z = Cmax : maximum completion time at WS2
minimize Z
subject to ij + ji = 1, i, j, i < j and i 6= j (C1)
s1i s1j + ij M M P1i , i 6= j (C2)
s1i s2i (P1i + Di ), i (C3)
ij + ji = 1, i, j i < j and i 6= j (C4)
s2i s2j + ij M M P2i , i, j and i 6= j (C5)
s2i Z P2i , i (C6)
ij , ij {0, 1}, i, j (C7)
s1i 0, i (C8)
s2i 0, i (C9)
Note that all variables and constants are integers and M is a very large number. Constraint
(C1) means that for any two jobs i and j, either i precedes or j. The second constraint defines the
requirement that the first machine executes only one job at a time. Constraint (C3) suggests that
the processing time of the second operation of a job can only begin once the job has arrived to the
second machine. Constraint (C4) ensures that all jobs must be executed by the second machine.
The fifth constraint ensures that the second machine executes only one job at a time. Constraint
(C6) implies that the end of processing of any job at WS2 is lower or equal to the makespan.
Constraint (C7) means that and are binary. Constraints (C8) and (C9) state that the starting
times at WS1 and WS2 are non-negative.
6 Modeling
The problem in the facility design mentioned in the introduction has to be modeled to automate the
current method used by the management. This chapter will focus on the modeling of the problem.
It starts with the modeling of the two phases, the algorithms and provides information about the
data and how this was modified to create other datasets.
6.1 Project phases

As mentioned before the most critical part (Nodes B and D) of this project management problem
shown in Figure 1 will be modeled as a two-machine flow shop with minimum delays. Each of
27
the two nodes will be perceived as a Work Station (WS). Ideally, each work station can have m
machines at its disposal and the jobs could be processed in parallel but due to simplicity reasons
we use only a single machine at each WS. Furthermore, single machine models are important in
decomposition methods, when scheduling problems in more complicated machine environments
are broken down into a number of smaller single machine scheduling problems and their results
can provide a basis for heuristics regarding the complicated setting (M. L. Pinedo, 2004).
The processing times at WS1, WS2 as well as the delays were initially raw data collected
during my professional experience as a Systems Engineer for the design of chemical facilities. The
data was extracted from data sheets of cost estimations and other tools that monitored the progress
of the projects and was converted to hour units. Then, with the use of Matlab the data was fitted
into various distributions until a close match was found. Table 7 summarizes the empirical data
used for the problem.
Table 7: Basic Case: empirical data for the industrial problem
component Distribution
p1 normal 10.4 3.2
p2 normal 15.1 4.2
d lognormal 4.51 1
In order to test the algorithms and their validity when the input data changes several other dis-
tributions were created scaled to match the expected value of the original distributions. Obviously,
the empirical data have very large variance, which was reduced for the other distributions but still
was large enough. This happened to make the comparison of the results easier and secondly to
maintain the originality of the problem as close as possible to reality.
Table 8: General Case: modified empirical data to include other distributions
component Distribution Remarks

d lognormal d LogN orm(4.51, 1), Ehi 150, Ehi 215
d normal d N (150, 552 )
d bimodal d (N (60, 152 ) + N (240, 502 ))
d uniform d U(145, 155)
d constant 150
6.2 Algorithms
In this section we describe how the algorithms proposed in Section 4.3 will function given the fact
that the delays are unknown but testing is allowed.
6.2.1 Free Testing

In this scenario, testing is for free with the restriction that a percentage of the total jobs can be
tested to reveal the actual delay, in other words we can test k jobs for free. The two most obvious
questions that are posed here are: how do we schedule the remaining unknown jobs and which
jobs do we test? In Section 4, it was claimed that it is optimal to test the k longest jobs as a
28
long job associated with a long delay should be scheduled first. However, no claim was made
regarding the question how to schedule the unknown jobs and considering that it is hard to prove
such a claim mathematically, the simulations are the next best option. Intuitively, considering the
nature of the proposed scheduling algorithms, which is basically to avoid jeopardizing a prolonged
random schedule by ensuring that jobs with long delays will not be scheduled last, it is plausible to
choose to schedule the unknown jobs first or somewhere in the middle. Tables 9 and 10 illustrate
all the options that are included in each and every algorithm that was implemented. These two
categories of options are also two major questions that this research intends to answer, in other
words: "which jobs to test" and "how to schedule the unknown jobs".
Table 9: Option: "which jobs to test"
Option Explanation
LPT Tests the k allowed jobs starting from the job with longest
processing time at WS1.
SPT Tests the k allowed jobs starting from the job with shortest
processing time at WS1.
Random Tests the k allowed jobs randomly.
Table 10: Option: "how to schedule the unknown jobs"
Option Explanation
Unknown Schedules the unknown jobs according to SPT and then the algorithm creates the
first sequence of the known jobs, which are scheduled right after.
Unknown The algorithm creates the sequence of the known jobs, which are split in two
middle groups, then the unknown jobs are scheduled according to SPT in the middle.
Unknown Schedules the unknown jobs according to SPT and then the algorithm creates
last the sequence of the known jobs, which are scheduled right before the set of the
unknown.
Unknown Schedules the unknown jobs according to SPT and then the algorithm creates the
nested sequence of the known jobs. After each pair of known jobs follows one unknown
job.
Next, we show how the algorithms of Section 4.3 are extended. We only show this for Algo-
rithm 2 (LPT1LPT2). The other algorithms, including the Johnsons and the modified Johnsons,
are modified similarly.
29
Algorithm 2a: Delays in pairs LPTlong LPTshort (LPT1LPT2) with k-free testing
LPT1LPT2
1: Choose which k jobs to test . LPT, SPT, Random
2: Sort unknown jobs according to SPT rule
3: Choose how to schedule the rest unknown jobs . Unknown first,
Unknown middle, Unknown last, Unknown Nested
4: Sort d in descending order for the known jobs
5: Partition jobs into two sets A and B . set A contains the first half of the
long delay jobs whereas B contains the rest (short delay jobs)
6: repeat
10: Sort the pairs according to lowest index first
11: Merge the unknown jobs with the pairs according to the selected option
12: Apply list scheduling at WS1
The following instance shows how the algorithm LPT1LPT2 works in the case of k-free testing
but the option "how to schedule the unknown jobs" is replaced by another idea. The unknown
delays are replaced by the expected delay and they are treated by the algorithm as "known" jobs.
Algorithm 2b: Delays in pairs LPTlong LPTshort (LPT1LPT2) with k-free testing and expected
delay
LPT1LPT2
1: Choose which k jobs to test . LPT, SPT, Random
2: Sort unknown jobs according to SPT rule
3: Sort d in descending order for all the jobs . The Not-Tested jobs have
delay equal to the expected delay
4: Partition jobs into two sets A and B . set A contains the first half of the
long delay jobs whereas B contains the rest (short delay jobs)
5: repeat
9: Apply list scheduling at WS1
6.2.2 Costly Testing

This section describes a more realistic scenario according to which the testing is not for free. The
concept of testing in a real industrial environment is essentially the investment of an additional
30
amount of time in the task and the retrieval of some information
Pn prior to its actual processing.
Pn This
practically implies that the processing at WS1 will last j=1 (p1j + CoTj ), where j=1 CoTj is
not only the cost of testing but also the starting time of the O1j of the very first job. Alternatively,
considering that it has been decided to test k jobs first and process all the jobs at WS1 after testing,
this means that the release dates of all jobs are equal to the total testing cost. As mentioned before,
in the industrial application the delays correspond to lead times of equipment. The delays can be
revealed only after the equipment is completely specified and ordered. In this case, the testing
option offers the following alternative: The engineering team (Node B in Figure 1) instead of
designing and specifying the equipment (O1j ) can spend some amount of time to investigate the
task, suggest potential equipment and contact the respective suppliers to learn the preparation and
delivery times of the equipment. The described task can last 10%-30% of the O1j but it reveals
the corresponding delay and this consists the idea of learning through testing. As a result the cost
function of CoT for a job j in this specific industrial problem can be defined as follows:
CoTj = j p1j (8)
where
0.1, if dj {dmin , dmin + }

j = 0.2, if dj {dmin + , dmin + 2} (9)

0.3, if dj {dmin + 2, dmax }

dmax dmin
and is equal to 3 .
7 Computational Results
In this section we will present the results obtained by the simulation of the industrial problem for
the three models described in Section 3.1. The data was modified further from the Basic Case
shown in Table 7 to include simulations regarding the Theorem 3 and some special cases (e.g.
proportionate two-machine flow shop with minimum delays).
7.1 Deterministic Problem

This section is devoted to presenting the simulation results obtained for the deterministic problem.
The deterministic scenario is equivalent to the stochastic scenario, where all jobs can be tested for
free. The algorithms that were implemented are the ones described in Section 4.3 in addition to the
SPT(p1j ) and the SPT(p1j + dj ). In the following two subsections, the results of the simulations
for both equal and not equal processing times will be presented.
As far as the ILP is concerned, it was implemented according to the formulation in Section
5 and its results could be verified for instances up to 9 jobs. In detail, another function was built
for verification purposes, which produces all possible schedules at WS1 and then applies FCFS
at WS2. Clearly, this function finds the optimal solution but requires n! time. For example, to
find the optimal solution for one instance of 10 jobs the computation time was approximately 36
hours. With regard to the verification procedure, the ILP works correct for instances of 7, 8 and 9
jobs but produces infeasible schedules for instances of 2 to 6 jobs (5%-20% of the schedules are
not feasible). It is worth mentioning that the very large number M needs calibration whenever
the number of jobs changes but no specific algorithm was used to find M as suggested by another
ILP not relevant to our problem (Frasch et al., 2011). All in all, the ILP was expected to be the
comparison criterion to measure the performance of the algorithms due to the fact that it would
give the optimal solution but also due to its computational efficiency. Thus, not being able to verify
the correctness of the ILP we will not be presenting its results.
31
7.1.1 Equal Processing Times
The following 4 tables show the results that were obtained for different number of jobs for 4
different distributions and a sample size of 10000 simulations. Since the processing times at WS1
and WS2 are equal to 13 for all jobs, this is a special case of the two machine flow shop called
proportionate flow shop. The minimum of each column is colored to make it easier to spot the
algorithm that achieved the shortest schedule.
Table 11: p1j = p2j and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 415.60 573.61 702.15 817.30 923.80 1022.41 1367.08 2098.79
SPT(p1j + dj ) 439.57 627.30 784.52 926.75 1057.67 1183.92 1621.47 2544.61
Flexible Johnsons 439.57 627.30 784.52 926.75 1057.67 1183.92 1621.47 2544.61
Johnsons 439.57 627.30 784.52 926.75 1057.67 1183.92 1621.47 2544.61
LDF 391.56 519.67 619.96 712.09 793.74 877.38 1162.17 1830.54
LPT1-LPT2 392.31 518.72 614.39 699.27 774.00 850.70 1111.64 1728.10
SPT1-LPT2 405.27 532.01 628.36 713.50 788.21 864.25 1119.75 1723.56
LPT1-SPT2 393.37 520.25 617.06 703.07 778.61 855.46 1113.14 1720.40
LB 387.81 510.61 603.14 681.98 748.91 812.35 1022.85 1513.72
Table 12: p1j = p2j and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 273.99 343.45 408.25 475.76 539.81 606.94 868.58 1522.12
SPT(p1j + dj ) 292.77 377.28 452.15 524.74 595.70 665.23 936.25 1600.68
Johnsons 292.77 377.28 452.15 524.74 595.70 665.23 936.25 1600.68
LDF 253.48 327.89 420.95 500.87 573.84 644.69 919.50 1586.88
LPT1-LPT2 253.52 310.51 384.74 456.04 529.66 597.24 867.87 1528.64
SPT1-LPT2 267.02 326.20 391.99 457.35 522.49 587.45 846.68 1495.65
LPT1-SPT2 257.73 321.32 388.48 454.45 519.92 585.04 844.71 1494.11
LB 241.17 262.20 283.18 325.12 381.60 442.69 692.56 1331.14
32
Table 13: p1j = p2j and d (N (150, 552 ) + N (150, 552 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 321.85 397.76 464.08 529.41 595.37 659.47 920.92 1572.49
SPT(p1j + dj ) 343.09 437.42 515.43 588.22 659.63 729.21 1000.72 1664.35
Johnsons 343.09 437.42 515.43 588.22 659.63 729.21 1000.72 1664.35
LDF 297.34 339.06 390.92 484.81 583.78 679.70 982.57 1651.22
LPT1-LPT2 298.80 350.38 396.04 456.58 524.86 594.89 865.40 1528.19
SPT1-LPT2 312.45 365.20 417.16 475.96 539.87 603.71 862.25 1514.70
LPT1-SPT2 300.99 354.26 407.21 470.49 536.26 601.33 861.18 1514.22
LB 291.66 320.59 333.45 343.35 377.27 437.37 693.87 1339.45
Table 14: p1j = p2j and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
SPT(p1j + dj ) 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
Johnsons 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LDF 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LPT1-LPT2 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
SPT1-LPT2 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LPT1-SPT2 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LB 224.70 288.86 353.55 418.39 483.27 548.21 808.09 1458.01
7.1.2 General Case

Tables 15, 16, 17 and 18 depict the simulation results that used the data from the industrial problem
including the 3 additional distributions for the delays. With these distributions we intended to
study the performance of the implemented algorithms.
33
Table 15: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 407.11 567.74 690.89 804.48 895.51 975.17 1248.67 1893.21
SPT(p1j + dj ) 428.54 618.31 768.73 904.93 1022.23 1124.00 1485.33 2282.48
Johnsons 426.49 613.56 761.57 894.82 1009.13 1107.94 1458.84 2229.99
LDF 391.45 534.99 646.70 746.11 835.20 914.93 1206.75 1986.40
LPT1-LPT2 391.77 533.14 638.01 728.42 804.69 868.83 1115.16 1807.32
SPT1-LPT2 401.62 543.09 647.78 739.25 815.51 880.08 1123.21 1796.45
LPT1-SPT2 392.31 533.98 639.37 731.55 808.66 874.42 1120.16 1798.25
LB 387.52 525.36 625.66 712.73 782.79 838.19 1051.59 1664.76
Table 16: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 263.85 321.34 382.18 445.22 512.56 582.18 873.81 1620.49
SPT(p1j + dj ) 281.87 353.14 416.93 476.30 533.90 591.41 843.22 1567.56
Johnsons 279.77 348.60 409.53 466.44 521.78 577.48 832.65 1563.59
LDF 252.48 329.09 432.45 529.77 617.21 698.70 1015.33 1789.83
LPT1-LPT2 252.34 313.74 398.49 475.49 557.73 632.74 942.25 1705.87
SPT1-LPT2 262.39 322.88 395.28 468.14 540.45 614.87 909.18 1656.63
LPT1-SPT2 255.38 320.92 395.24 468.77 541.31 615.79 911.12 1659.38
LB 240.42 264.19 302.10 361.11 430.21 502.29 792.61 1537.24
Table 17: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 316.04 375.20 426.30 480.24 534.60 589.64 846.47 1584.78
SPT(p1j + dj ) 336.27 416.02 478.10 540.82 597.54 654.80 876.47 1550.45
Johnsons 334.20 411.12 470.68 530.86 584.58 639.08 851.60 1549.20
LDF 300.71 340.36 399.23 497.66 600.29 702.28 1075.54 1855.05
LPT1-LPT2 301.53 349.35 394.60 455.45 524.80 601.21 916.34 1682.54
SPT1-LPT2 312.66 361.84 413.63 474.09 538.31 607.29 901.49 1653.84
LPT1-SPT2 303.32 353.34 406.17 469.39 536.85 608.43 905.30 1657.73
LB 295.25 321.85 332.26 361.15 422.59 497.25 795.60 1546.13
34
Table 18: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT(p1j ) 233.18 306.76 382.24 456.12 530.77 606.18 906.78 1662.75
SPT(p1j + dj ) 232.23 305.15 380.48 454.40 528.97 604.16 904.59 1660.63
Johnsons 231.64 304.89 380.37 454.33 528.93 604.11 904.58 1660.62
LDF 239.58 315.96 392.55 467.53 542.69 618.06 919.57 1676.52
LPT1-LPT2 239.59 315.69 392.29 467.16 542.42 617.79 919.12 1676.16
SPT1-LPT2 236.02 311.51 387.90 462.66 537.87 612.71 914.26 1671.27
LPT1-SPT2 239.15 315.25 391.61 466.35 541.58 616.68 918.19 1674.99
LB 231.12 304.73 380.26 454.27 528.88 604.06 904.57 1660.61
7.1.3 Deterministic problem - Discussion

Tables 11 to 14 depict the case, where the processing times are equal. Theorem 1 was able to
determine a scheduling policy for every instance of two jobs but it was not possible to generalize it
for many jobs. The algorithms that were proposed and the intuition that the job with longest delay
should be followed by a job with shorter delay was captured by the algorithms LDF, LPT1LPT2
and LPT1SPT2. The last two, thought, perform much better on average with the LPT1SPT2 to be
slightly better. The worst-case ratios LP TLB
1LP T 2
and LP TLB
1SP T 2
range within 1.01 1.40 for the
3
proportionate flow shop for instances up to 100 jobs, where a 2 -approximation is the best known
algorithm. Nevertheless, our algorithms have no approximation guarantee.
The scenario with processing times given by a normal distribution represents the general case,
namely a realistic problem and its results are shown in Tables 15 to 18. When the time delays
are uniformly distributed even though the problem does not satisfy the conditions to apply the
Johnsons rule, comparing the best achieved solution, which is a permutation schedule, to the
LB we have a clear indication that this might be the optimal solution. Note that the modified
Johnsons algorithm gives the same solution as the original only for the uniform distribution or
when the processing times are equal, where the two algorithms behave as the SPT(p1j + dj ). The
ratios LP TLB
1LP T 2
and LP TLB
1SP T 2
range within 1.01 1.32 for the general case, which indicates
a good performance but also has to do with the fact that the nature of the algorithms captures the
large variance.
7.2 Stochastic Problem

This section intends to present the results relevant to the stochastic problem. As a reminder, in
the stochastic version the processing times at WS1 and WS2 are deterministic, whereas only the
distribution of the delays is known and thus the expected value.
7.2.1 Normal processing times at WS1 and equal processing times at WS2
This section presents the simulation results related to the Theorem 3, for which the sample size
was 10000. The tables are divided into 3 groups, which is consistent with the 3 distinct cases
that were used in the proof of Theorem 3. Note that for the stochastic problem the SPT(p1j ) and
SPT(p1j + dj ) yield the same results due to the fact that the first does not use the information of
35
the delays and the second uses the expected value of the distribution, which makes no difference
in the sequence of the jobs. Hence, SPT(p1j ) and SPT(p1j + dj ) were concatenated to SPT.
Tables 19, 20 and 20 show the computational results specifically for the Theorem 3 where the
values for the SPT and the LB are the expected completion time and the expected lower bound.
Hence, there is no need for distinction between the various distributions since they all have the
same expected value. The data was modified accordingly such as the properties of each case were
satisfied.
Table 19: minj p1j p2j and Ehdi = 150
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 204.35 256.09 308.01 360.05 411.93 463.98 671.68 1192.07
LB 204.35 256.09 308.01 360.05 411.93 463.98 671.68 1192.07
Table 20: minj p1j < p2j < maxj p1j and Ehdi = 150
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 212.96 265.07 317.16 369.31 421.25 473.10 680.84 1200.73
LB 212.96 265.07 317.16 369.31 421.25 473.10 680.84 1200.73
Table 21: maxj p1j p2j and Ehdi = 150
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 231.82 305.60 379.96 454.52 529.22 603.99 903.41 1652.78
LB 231.82 305.60 379.96 454.52 529.22 603.99 903.41 1652.78
The rest simulations of this section follow the same structure but this time the results show
what is the best that can be achieved under uncertainty. Tables 22 to 25 show the results for the
case where minj p1j p2j for all the jobs. For these simulations the processing times were
adjusted such that p1j 2 and p2j = 2.
36
Table 22: minj p1j p2j and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 387.83 540.80 672.12 778.53 877.09 949.40 1239.83 1826.79
LPT 393.61 557.37 699.41 812.88 918.34 992.75 1316.19 1989.97
Random 390.77 549.97 687.45 797.38 895.84 970.78 1280.21 1905.20
Johnsons 394.03 556.69 695.96 808.81 917.63 991.20 1307.34 1965.51
LB 368.88 499.41 607.25 687.25 758.78 802.18 990.33 1370.33
Table 23: minj p1j p2j and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 245.95 299.31 348.99 402.47 453.35 505.98 711.65 1228.96
LPT 252.69 313.20 368.84 422.59 474.76 528.30 738.21 1261.78
Random 248.71 305.36 359.37 410.85 463.17 515.46 722.34 1242.74
Johnsons 252.38 311.86 363.97 416.35 468.22 520.19 729.54 1248.63
LB 225.71 246.76 259.31 279.59 309.24 353.65 551.00 1059.20
Table 24: minj p1j p2j and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 299.74 359.26 409.37 460.26 511.06 562.11 766.79 1285.54
LPT 305.36 372.57 426.64 479.43 536.13 588.80 799.45 1324.12
Random 302.38 366.41 418.10 467.22 522.16 578.72 781.27 1303.74
Johnsons 305.71 370.95 425.83 478.10 528.24 579.79 787.27 1308.98
LB 280.12 308.52 321.76 328.87 335.19 357.77 551.67 1069.28
37
Table 25: minj p1j p2j and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 204.01 256.00 308.25 360.02 411.98 464.29 671.98 1193.17
LPT 204.27 256.49 308.91 360.49 412.69 465.11 673.04 1194.38
Random 204.12 256.13 308.32 360.22 412.08 464.31 672.04 1193.52
Johnsons 204.04 256.03 308.34 360.10 412.08 464.30 672.04 1193.19
LB 200.70 251.92 303.87 355.52 406.76 459.47 666.97 1188.27
Tables 26 to 30 show the results for the case where minj p1j < p2j < maxj p1j for all the
jobs. For these simulations only the processing times were adjusted such that p2j = 10.
Table 26: minj p1j < p2j < maxj p1j and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 396.23 549.23 680.45 786.89 883.35 957.79 1248.22 1835.04
LPT 402.07 565.83 707.80 821.38 929.29 1001.04 1324.58 1998.48
Random 399.19 558.31 695.92 805.74 908.45 979.11 1288.49 1913.45
Johnsons 414.64 592.03 747.20 873.69 991.22 1082.38 1449.12 2201.75
LB 376.89 507.42 615.25 695.28 766.83 810.26 998.58 1378.88
Table 27: minj p1j < p2j < maxj p1j and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 255.16 309.78 359.89 413.33 463.90 516.36 721.96 1239.04
LPT 262.03 323.31 380.12 434.84 487.79 543.73 764.32 1332.54
Random 258.18 315.42 370.28 422.57 473.88 527.89 735.65 1256.66
Johnsons 271.75 339.41 397.70 454.26 506.98 560.21 767.43 1284.66
LB 233.72 254.80 267.48 288.42 318.99 364.14 561.19 1068.81
38
Table 28: minj p1j < p2j < maxj p1j and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 308.57 368.48 418.91 471.33 519.27 571.47 775.95 1294.60
LPT 314.19 381.95 436.24 489.20 543.48 598.92 810.56 1349.76
Random 311.28 375.43 427.47 478.99 531.43 588.21 790.94 1314.05
Johnsons 325.68 401.61 462.48 516.76 569.82 623.89 829.09 1348.33
LB 288.16 316.52 329.76 336.87 343.19 366.71 562.03 1078.72
Table 29: minj p1j < p2j < maxj p1j and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 213.44 265.40 317.46 369.40 420.80 473.87 681.26 1202.23
LPT 219.31 277.38 335.30 392.69 449.84 507.99 737.60 1313.44
Random 216.08 270.05 322.84 375.92 428.27 481.24 690.29 1212.63
Johnsons 213.73 265.79 317.91 369.99 421.42 474.41 681.95 1202.91
LB 209.83 261.37 313.28 365.12 416.37 469.23 676.47 1197.34
Tables 30 to 33 show the results for the case where maxj p1j p2j for all the jobs. For these
simulations the processing times p1j were capped to 15, whereas p2j = 15.
Table 30: maxj p1j p2j and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 401.65 554.89 686.11 793.14 890.14 964.66 1262.47 1897.84
LPT 407.34 571.01 712.44 826.52 933.84 1006.00 1333.51 2041.80
Random 404.57 563.60 701.20 811.18 913.48 984.72 1299.66 1962.42
Johnsons 428.91 619.14 787.05 925.89 1057.99 1158.59 1580.53 2475.92
LB 382.02 513.27 621.72 705.19 779.04 827.81 1064.16 1670.13
39
Table 31: maxj p1j p2j and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 262.04 321.29 378.85 442.16 507.97 578.41 869.11 1610.66
LPT 268.77 334.35 400.60 472.81 547.34 624.40 931.70 1690.29
Random 265.10 326.53 389.01 457.52 526.40 600.55 901.03 1650.96
Johnsons 286.38 368.68 442.63 516.75 587.07 661.58 956.08 1703.17
LB 239.09 264.11 298.49 360.15 427.91 499.68 789.52 1527.71
Table 32: maxj p1j p2j and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 314.73 375.49 426.50 479.64 530.02 587.13 839.26 1572.58
LPT 319.90 388.31 443.51 497.67 556.60 622.93 921.75 1687.43
Random 317.32 381.83 434.59 487.03 542.32 604.54 879.75 1628.65
Johnsons 340.11 429.38 504.64 575.06 648.05 722.41 1017.32 1760.88
LB 293.39 321.41 335.08 356.75 420.48 494.04 790.60 1535.80
Table 33: maxj p1j p2j and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 231.88 305.43 379.83 454.29 528.98 603.62 903.26 1652.15
LPT 239.31 315.70 391.40 466.97 542.42 617.70 918.38 1669.12
Random 235.63 310.78 385.83 460.91 535.61 610.81 910.68 1660.87
Johnsons 231.91 305.51 379.96 454.42 529.17 603.85 903.68 1652.64
LB 230.39 303.86 378.05 452.54 527.07 601.77 901.10 1650.11
7.2.2 General Case

The following 4 tables show the simulation results for the case of the industrial problem. These
included the 3 additional distributions for the delays in order to observe the performance of the
algorithms whenever the actual delays are unknown.
40
Table 34: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 407.11 567.74 690.89 804.48 895.51 975.17 1248.67 1893.21
LPT 412.97 584.20 717.59 832.41 934.27 1019.21 1328.31 2040.63
Random 410.37 575.09 703.12 815.97 913.62 998.27 1288.02 1973.52
Johnsons 433.83 630.03 789.12 931.40 1056.51 1164.70 1559.98 2452.49
LB 387.52 525.36 625.66 712.73 782.79 838.19 1051.59 1664.76
Table 35: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 263.85 321.34 382.18 445.22 512.56 582.18 873.81 1620.49
LPT 270.40 336.42 404.45 476.26 552.80 631.07 941.02 1711.11
Random 266.95 328.75 392.76 461.44 531.54 605.57 906.48 1661.40
Johnsons 287.38 367.91 443.23 516.13 589.59 662.09 960.56 1712.13
LB 240.42 264.19 302.10 361.11 430.21 502.29 792.61 1537.24
Table 36: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 316.04 375.20 426.30 480.24 534.60 589.64 846.47 1584.78
LPT 322.41 390.58 445.91 505.00 563.47 629.89 932.99 1709.77
Random 319.46 382.35 434.60 491.75 547.41 608.12 886.71 1640.84
Johnsons 341.66 430.16 503.75 579.62 650.81 725.91 1022.16 1775.05
LB 295.25 321.85 332.26 361.15 422.59 497.25 795.60 1546.13
41
Table 37: p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 233.18 306.76 382.24 456.12 530.77 606.18 906.78 1662.75
LPT 240.93 319.11 396.90 472.97 549.15 625.93 930.56 1695.29
Random 237.05 312.54 388.94 463.64 538.56 614.14 915.28 1672.34
Johnsons 232.93 306.74 382.34 456.30 530.95 606.42 907.15 1663.22
LB 231.12 304.73 380.26 454.27 528.88 604.06 904.57 1660.61
7.2.3 Stochastic problem discussion

Tables 19, 20 and 21 show that Theorem 3 is also confirmed by the simulations, whereas Tables
22 to 33 show what can be achieved under uncertainty. The modified Johnsons algorithm seems
to give similar results to the SPT as its construction is such that it minimizes idle times but also
produces non-permutation schedules. Note that in Tables 30 to 33 the Flexible Johnsons yields
the exact same results as the SPT since it holds p1j p2j and all jobs are in set A and sorted
according to SPT(p1j + dj ). Tables 34 to 37 illustrate results from a more general setting of
the problem, where the testing is not allowed. This is also closely related to the real problem that
inspired this thesis. Clearly, the SPT is not optimal any more but the performance of the algorithms
is also dependent on the distribution. The SPT seems to be the ideal option if the delays follow
the lognormal distribution whereas the modified Johnsons algorithm outperforms all the other
algorithms if the delays are normally and uniformly distributed. For the case, where d follows
the bimodal distribution the SPT behaves better when the number of jobs is small (less than 15)
but for larger number of jobs the modified Johnsons gives better solutions. However, the overall
performance of the two algorithms for the bimodal does not differ more than 0.5%. As far as the
uniform distribution is concerned, the results of the SPT, modified Johnsons and the Johnsons
are almost the same with minor differences that can be observed in Table 37. Producing similar
results with the Johnsons rule implies that most of the schedules created are indeed permutation
schedules in the case of uniform distribution. Still, the existence of non-permutation schedules
can be attributed to the fact that the delays dominate the processing times (Kamburowski, 2000),
the analysis of which is out of the scope of this thesis. The fact that the SPT is marginally worse
than the Flexible Johnsons for the normal, bimodal and uniform distribution but slightly better for
the lognormal for large number of jobs could make the SPT the overall preferable choice for the
stochastic problem.
7.3 k-free Testing vs Costly Testing

In this part, the decision maker is allowed to test k jobs in advance to come closer to the determin-
istic version. Ideally, the testing would be for free but practically as explained the testing comes
at the cost of some increase in the processing time of the tested jobs at WS1. Due to the testing
option, the algorithms proposed can be modified to produce a schedule knowing only the delays
of the k jobs. Section 6.2 explained in detail the various options that were implemented in the
algorithms so that they could deal with partial information about the delays. The following figures
illustrate the results of the free and the costly testing for different cases and used a sample size of
1000. Although, the simulations were performed for various number of jobs only the instances
with 25 are included as was a representative instance with long enough completion times that in
42
some cases could compensate the costly testing. The comparison of worst-case ratios between
the stochastic and the deterministic problem is also deemed useful to identify instances where the
testing can be used effectively (see the Appendix). Moreover, all the 4 proposed algorithms use
the option "Schedule the Unknown Jobs First", whenever 0 < k < n, where n the number of total
jobs. For the same case, the Johnsons algorithm as well as its modified version use the Ehdi and
treat the unknown jobs as "known", something that did not have the same effect when implemented
in the other 4 algorithms. For the results of testing, we used figures instead of tables to illustrate
the results since it is more interesting to observe the qualitative behavior of the algorithms as they
use more information on the delays. Finally, all algorithms choose to test the k longest jobs in all
cases as it seems to be the most prevalent option compared to the other two. More details about
the two policies ("schedule the unknown first" and "test k-longest") and why they were chosen can
be found in the Appendix.
7.3.1 Equal Processing Times

Figures 11 to 14 show the results that were obtained for 25 jobs when the testing varies from 0 to
100%. For each of the 4 different distributions one can observe the effects on the total completion
time when the testing is costly. The processing times at WS1 and WS2 are equal to 13 for all jobs
and the sample size was 1000.
(a) k-testing for free (b) k-testing costly
Figure 11: 25 Jobs, p1j = p2j = 13, d LogN orm(4.51, 1)
Figure 12: 25 Jobs, p1j = p2j = 13, d N (150, 552 )
43
Figure 13: 25 Jobs, p1j = p2j = 13, d (N (60, 152 ) + N (240, 502 ))
Figure 14: 25 Jobs, p1j = p2j = 13, d U(145, 155)
7.3.2 General Case

Figures 15 to 18 show the results that were obtained for 25 jobs when the testing varies from 0
to 100%. The data of the industrial problem were used for these simulations including the 3 ad-
ditional distributions in order to observe not only the performance of the implemented algorithms
but also the effects of costly testing on the completion time with regard to the distribution of the
delays.
Figure 15: 25 Jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d LogN orm(4.51, 1)
44
Figure 16: 25 Jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d N (150, 552 )
Figure 17: 25 Jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d (N (60, 152 ) +
N (240, 502 ))
Figure 18: 25 Jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 ) and d U(145, 155)
7.3.3 Testing discussion

In Figures 11 to 14, one can observe the results for the four selected distributions and equal pro-
cessing times at WS1 and WS2. For the free testing it was expected that the Cmax would decrease
as the percentage of tested jobs increased. This, however, does not happen for all distributions and
algorithms. For example, for the deterministic problem in the case of equal processing times and
uniformly distributed delays, we already saw that all algorithms are nearly optimal and these sim-
ulations show that the solution in the stochastic is the same. So, a costly testing would only worsen
the results as shown by the Figure 14. In Figures 11 to 13 and the case of k-testing for free one
can observe that the algorithms SPT(p1j + dj ) and Flexible Johnsons increase their solutions as
the percentage of tested jobs increases until the problem becomes deterministic, where they both
give same solutions as the original Johnsons, proving that stochasticity improves their results for
45
those cases. Remember that in the case of equal processing times the algorithms SPT(p1j + dj ),
Flexible Johnsons and Johnsons are generally bad since they have no sorting ability, so the results
of the deterministic problem can only be worse than those of the stochastic, where they all apply
their free stochastic counterparts. For the lognormal distribution the algorithms LDF, LPT1LPT2,
SPT1LPT2 and LPT1SPT2 improve their solution as they utilize more information of the delays
even if the testing is not for free. With regard to the normal and bimodal distributions, the LDF
algorithm to our surprise after a certain point cannot use the extra information to its benefit even
if it is for free. With that said, note that there are also algorithm curves that are concave, which is
result of the fact that the algorithms are memory-less. For example, the LDF in Figure 13 achieves
its best solution when it tests 40% of the jobs. Yet, if it can test 50% of the jobs it produces a
whole new schedule without taking into account that there might be a better schedule if it would
use less information.
For the general case, the Flexible Johnsons and the Johnsons algorithm are not disabled any-
more and for the case of the normally and uniformly distributed delays they actually improve their
results as the problem tends to the deterministic version. Except for the case, where the delays
follow the lognormal distribution, no algorithm can give a better result than the SPT in the general
case of the full costly testing. Note, that the SPT is actually the SPT(p1j ), a free algorithm, and
thus does not require to test any jobs. It should also be mentioned that all the algorithms presented
in Section 7.2 can be used as free algorithms using only the Ehdj i but we chose SPT as the only
free algorithm in this section due to its overall performance for the stochastic problem. Finally,
from the Figure 17 there is some indication that the stochastic problem with testing can yield a
schedule better than that of the purely stochastic and deterministic schedule.
8 Conclusions and Future Research

In this thesis, we considered the problem F 2|dj |Cmax , which is NP-hard in its deterministic ver-
sion. Due to the fact that the delays might be unknown in reality we introduced the testing ap-
proach as a method to go from the stochastic as close as possible to the deterministic problem.
The main conclusion of this research can be summarized by the famous quotation "The theory
of sequencing and scheduling, more than any other area in operations research, is characterized
by a virtually unlimited number of problem types" (Lawler et al., 1993). In other words, every
different case that was studied can comprise a different problem for which an in depth study can
result in different conclusions. In the beginning of this research an engineering approach enriched
with operations research methods was adopted as a real problem motivated the research, so it is
deemed plausible to comment the findings accordingly. Unfortunately, there is no panacea for the
different variations of this problem. In detail, as far as the deterministic problem is considered,
if the processing times are equal then it is recommended to use the LPT1SPT2. If processing
times are not equal, in the case of lognormal distribution we would advise the decision maker to
choose LPT1LPT2, for normal and uniform distribution the Flexible Johnsons and for bimodal
the SPT. If only one algorithm could be chosen for the whole deterministic problem then the sug-
gestion would probably be for the LPT1SPT2 as it performs better on average. Before discussing
the stochastic case, it is worth mentioning that indeed the optimal schedule does not have to be a
permutation schedule. This observation was made about 20 years ago (Strusevich and Zwaneveld,
1994) but has not received significant attention. For the stochastic variant, Theorem 3 suggests
that if processing times at WS2 are equal then SPT is always optimal. In a more general setting
of the problem with known only the distribution of delays, if the decision maker has the luxury of
choosing an algorithm per distribution then it is recommended to choose SPT for the lognormal,
and Flexible Johnsons for normal, bimodal and uniform. On the contrary, if only one algorithm
could be chosen then the SPT seems to be a very good choice as it performs better on average.
46
Choosing the SPT compared to the LPT or Random can lead to significant reductions in the com-
pletion time as the number of jobs increases. As a result the EDD rule used by the CPM could
be replaced by another scheduling rule. Returning to the other initial research question whether
testing can be employed as a means to retrieve the delays, which was the very innovation of this
research, there is no straight answer. We could speak, though, only for the instances we simulated.
We could say, in general, that choosing to test and not use one of the "free" algorithms (e.g. SPT
or Flexible Johnsons) to create a schedule when the delays are unknown really depends on the
distribution, the number of total jobs as well as the cost function of testing. Specifically, whenever
the delays are uniformly distributed the stochastic version does not differ form the deterministic
as the improvement is 0.3%, which makes testing obsolete for instances of 25 jobs. The same
applies more or less to the normal distribution as the improvement with free testing is almost 2%.
However, the improvement in the case of lognormal and testing for free is 10% for the general
case and 16% if the processing times are equal, which make an improvement of 6.3% and 11.6%
respectively possible in the case of costly testing for the given cost function. The overview of
detailed recommendations for the simulated instances can be found in the Appendix. For further
research, it is recommended that the given algorithms acquire memory. We saw that in the case of
bimodal the LPT1LPT2 achieves a solution better than that of the deterministic setting when 40%
of the jobs are tested. Having memory, the algorithm will be able to produce the best obtained
schedule even if that was achieved under uncertainty. As a result, the testing would be more effec-
tive. Furthermore, it was mentioned before that the value of testing is related to the distribution but
also the cost function. It would be interesting to study how these algorithms would perform not
only with other distributions but with the same distributions and different values. Another aspect
worth investigating is that of learning and particularly when the testing can reduce the processing
time of O1j , which would be based on that assumption that testing is not only information retrieval
but also a sort of processing. Finally, we would recommend further research on the ILP in order
to investigate in depth if the calibration of M is the right strategy to get optimal feasible solutions
or additional constraints are required.
References
Ageev, Alexander A. (2007). A 3/2-Approximation for the Proportionate Two-Machine Flow Shop
Scheduling with Minimum Delays. 5th International Workshop, WAOA 2007 Eilat, Israel, Oc-
tober 11-12, 2007 Revised Papers, pp. 55-66.
Biskup, D. (1999). Single-machines scheduling with learning considerations. European of Journal
Operation Research 115, pp. 173-178.
Boudhar, Mourad and Nacira Chikhi (2011). Two machine Flow shop with transportation time.
Available at http://studia.complexica.net/Art/RI090204.pdf.
Burns, Fennell and John Rooker (1975). A special case of the 3 x n flow shop problem. Naval
Research Logistics Quarterly 22, pp. 811-817.
DellAmico, Mauro (1996). Shop Problems with Two Machines and Time Lags. INFORMS, pp.
777-787.
Frasch, Janick V., Sven Oliver Krumke, and Stephan Westphal (2011). MIP Formulations for
Flowshop Scheduling with Limited Buffers. Theory and Practice of Algorithms in (Computer)
Systems 6595, pp. 127-138.
Johnson, S. M. (1954). Optimal two- and three-stage production schedules with setup times in-
cluded. Elsevier - Discrete Applied Mathematics 1, pp. 61-68.
(1958). Sequencing n Jobs on Two Machines with Arbitrary Time Lags; Alternate proof and
discussion of the general case. Available at http://www.rand.org/pubs/papers/
P1526.html.
47
Kamburowski, Jerzy (2000). Non-bottleneck machines in three-machine flow shops. Journal of
Scheduling 3, pp. 209-223.
Karuno, Yoshiyuki and Hiroshi Nagamochi (2003). A Better Approximation for the Two-Machine
Flowshop Scheduling Problem with Time Lags. 14th International Symposium, ISAAC 2003
Kyoto, Japan, December 15-17, 2003 Proceedings, pp. 309-318.
Lageweg, B. J., E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan (1981). Computer-aided
complexity classification of deterministic scheduling problems. Tech. rep. BW138. Amster-
dam, The Netherlands: Centre for Mathematics and Computer Science.
(1982). Computer-aided complexity classification of combinatorial problems. Communica-
tions of the ACM 25, pp. 817-822.
Lawler, E.L., J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (1993). Sequencing and schedul-
ing: algorithms and complexity, Handbooks in Operations Research and Management Science,
Vol.4: Logistics of Production and Inventory. North-Holland.
Leung, Joseph Y-T., Haibing Li, and Hairong Zhao (2007). Scheduling two-machine flow shops
with exact delays. International Journal of Foundations of Computer Science 18.2, pp. 341-
359.
Levi, Retsef, Thomas Magnanti, and Yaron Shaposhnik (2015). Scheduling with Testing. IN-
FORMS.
Mitten, L. G. (1959). Sequencing n Jobs on Two Machines with Arbitrary Time Lags. Available at
http://pubsonline.informs.org/doi/pdf/10.1287/mnsc.5.3.293.
Monma, C.L. and A.H.G. Rinnooy Kan (1983). A concise survey of efficiently solvable of the
permutation flow-shop problem. RAIRO - Operations Research 17.2, pp. 105-119.
Nabeshima, I. (1963). Sequencing on two machines with start lag and stop lag. Journal of the
Operations Research Society 5, pp. 97-101.
Nawijn, W.M. and W. Kern (1991). Scheduling multi-operation jobs with time lags on a single
machine. Proceedings 2nd Twente Workshop on Graphs and Combinatorial Optimization, U.
Faigle and C. Hoede (eds.), Enschede.
Orman, A.J. and C.N. Potts (1997). On the complexity of coupled-task scheduling. Discrete Ap-
plied Mathematics 72, pp. 141-154.
Pan, Chao-Hsien (1997). A study of integer programming formulations for scheduling problems.
International Journal of Systems Science 28, pp. 33-41.
Pinedo, Michael L. (2004). Planning and Scheduling in Manufacturing and Services. Springer.
(2008). Scheduling: Theory Algorithms and systems. Springer.
Potts, Chris N., David B. Shmoys, and David P. Williamson (1991). Permutation vs. non-permutation
flow shop schedules. Operations Research Letters 10, pp. 281-284.
Rothkopf, M. H. (1966). Scheduling with random service times. Management Science 12, pp.
703-713.
Strusevich, Vitaly A. and Carin M. Zwaneveld (1994). On Non-Permutation Solutions to Some
Two Machine Flow Shop Scheduling Problems. ZOR - Mathematical Methods of Operations
Research 39, pp. 305-319.
Swarc, W. (1968). On some sequencing problems. Naval Research Logistics Quarterly 15, pp.
127-155.
Uetz, Marc (2001). Algorithms for deterministic and stochastic scheduling. PhD Thesis, TU Berlin.
Weiss, G. and M. Pinedo (1980). Scheduling tasks with exponential service times on non-identical
processors to minimize various cost functions. Journal of Applied Probability 17, pp. 187-202.
Wright, T.P. (1936). Factors affecting the cost of airplanes. Journal of Aeronautical Science 3, pp.
122-128.
Wu, C.C. and W.C. Lee (2009). Single-machine and flow shop scheduling with a general learning
effect model. Computers and Industrial Engineering 56, pp. 1553-1558.
48
Xingong, Zhang and Yan Guangle (2010). Machine scheduling problems with a general learning
effect. Mathematical and Computer Modelling 51, pp. 84-90.
Yang, D.L. and M.S. Chern (1995). A two-machine flow shop sequencing problem with limited
waiting time constraints. Computers and Industrial Engineering 28, pp. 63-70.
Yu, Wenci (1996). The two-machine flow shop problem with delays and the one-machine total
tardiness problem. PhD Thesis, Available at http://repository.tue.nl/461119.
Yu, Wenci, Han Hoogeveen, and Jan Karel Lenstra (2004). Minimizing makespan in a two-machine
flow shop with delays and unit-time operations is NP-Hard. Journal of Scheduling 7, pp. 333-
348.
49
Appendix
In this section, we will present the results of the simulations that led to useful conclusions for the
scheduling policy whenever the testing option is limited. These conclusions were implemented and
used to form the final configuration of algorithms and determine the behavior during the limited
test occasion. The two most important questions when the resources allow one to test and learn
the delays of only some jobs are:
1. Which jobs to test first?
2. How to schedule the unknown jobs?
The following sections are devoted to the most important comparisons that determined the above
mentioned scheduling policies.
Which jobs to test first?
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 19: Behavior of the algorithms Flexible Johnsons and Johnsons w.r.t. the k-tested jobs for
various instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 )
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 20: Behavior of the algorithms Flexible Johnsons and Johnsons w.r.t. the k-tested jobs for
50
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 21: Behavior of the algorithms LDF and LPT1LPT2 w.r.t. the k-tested jobs for various
instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 )
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 22: Behavior of the algorithms LDF and LPT1LPT2 w.r.t. the k-tested jobs for various
instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 )
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 23: Behavior of the algorithms SPT1LPT2 and LPT1SPT2 w.r.t. the k-tested jobs for
51
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 24: Behavior of the algorithms SPT1LPT2 and LPT1SPT2 w.r.t. the k-tested jobs for
How to schedule the unknown jobs?
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 25: Behavior of the algorithms Flexible Johnsons and Johnsons w.r.t. the position
of k-tested jobs for various instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j
N (15.1, 4.22 )
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 26: Behavior of the algorithms Flexible Johnsons and Johnsons w.r.t. the position
of k-tested jobs for various instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j
N (15.1, 4.22 )
52
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 27: Behavior of the algorithms LDF and LPT1LPT2 w.r.t. the position of k-tested jobs for
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 28: Behavior of the algorithms LDF and LPT1LPT2 w.r.t. the position of k-tested jobs for
(a) d LogN orm(4.51, 1) (b) d N (150, 552 )
Figure 29: Behavior of the algorithms SPT1LPT2 and LPT1SPT2 w.r.t. the position of k-tested
jobs for various instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 )
53
(a) d (N (60, 152 ) + N (240, 502 )) (b) d U(145, 155)
Figure 30: Behavior of the algorithms SPT1LPT2 and LPT1SPT2 w.r.t. the position of k-tested
jobs for various instances of free testing: 25 jobs, p1j N (10.4, 3.22 ), p2j N (15.1, 4.22 )
Identification of instances where testing might give improvements

Another important question of this thesis has been "how to identify instances (number of jobs and
distributions) that might lead to improvement of the Cmax ". Since the assumption of testing can
convert the stochastic problem to deterministic, by comparing the ratios CLB max
for the two cases,
we can identify instances to unlock improvements in the completion time. For example, in the
case of lognormal distribution and 5 jobs the ratio in the stochastic scenario is already below 1.1
whereas for 25 jobs is around 1.25. At the same time the various deterministic algorithms (see
Figure 31) have ratios below 1.05 for number of jobs up to 25 which suggests that there is room
for improvement at least when testing is for free for 25 jobs but we cannot say the same for 5
jobs. Note that for the following figures the Stochastic case does not use testing, whereas the
Deterministic induces no cost for testing.
(a) Stochastic (b) Deterministic
Figure 31: Ratios of the algorithms w.r.t. the number jobs, p1j = p2j = 13
and d LogN orm(4.51, 1)
54
and d N (150, 552 )
and d (N (60, 152 ) + N (240, 502 ))
and d U(145, 155)
55
Figure 35: Ratios of the algorithms w.r.t. the number jobs, p1j N (10.4, 3.22 ),
p2j N (15.1, 4.22 ) and d LogN orm(4.51, 1)
p2j N (15.1, 4.22 ) and d N (150, 552 )
p2j N (15.1, 4.22 ) and d (N (60, 152 ) + N (240, 502 ))
56
p2j N (15.1, 4.22 ) and d U(145, 155)
Stochastic problem with equal processing times at WS1 and WS2

Since we presented the results of the deterministic proportionate flow shop it was deemed reason-
able to show also what is the best that can be achieved under uncertainty. The comparison between
the stochastic and deterministic proportionate flow shop shows immediately where the free testing
can be used effectively.
Table 38: p1j = p2j = 13 and d LogN orm(4.51, 1)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 415.60 573.61 702.15 817.30 923.80 1022.41 1367.08 2098.79
LPT 415.60 573.61 702.15 817.30 923.80 1022.41 1367.08 2098.79
Random 415.60 573.61 702.15 817.30 923.80 1022.41 1367.08 2098.79
Johnsons 439.57 627.31 784.52 926.76 1057.67 1183.92 1621.47 2544.62
LB 387.81 510.61 603.14 681.98 748.91 812.35 1022.85 1513.72
Table 39: p1j = p2j = 13 and d N (150, 552 )
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 273.99 343.45 408.25 475.76 539.81 606.94 868.58 1522.12
LPT 273.99 343.45 408.25 475.76 539.81 606.94 868.58 1522.12
Random 273.99 343.45 408.25 475.76 539.81 606.94 868.58 1522.12
Johnsons 292.77 377.28 452.15 524.74 595.70 665.23 936.25 1600.68
LB 241.17 262.20 283.18 325.12 381.60 442.69 692.56 1331.14
57
Table 40: p1j = p2j = 13 and d (N (60, 152 ) + N (240, 502 ))
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 321.85 397.76 464.08 529.41 595.37 659.47 920.92 1572.49
LPT 321.85 397.76 464.08 529.41 595.37 659.47 920.92 1572.49
Random 321.85 397.76 464.08 529.41 595.37 659.47 920.92 1572.49
Johnsons 343.09 437.42 515.43 588.22 659.63 729.21 1000.72 1664.35
LB 291.66 320.59 333.45 343.35 377.27 437.37 693.87 1339.45
Table 41: p1j = p2j = 13 and d U(145, 155)
# jobs
5 10 15 20 25 30 50 100
Algorithm
SPT 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LPT 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
Random 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
Johnsons 231.37 297.10 362.41 427.57 492.69 557.76 817.92 1468.00
LB 224.70 288.86 353.55 418.39 483.27 548.21 808.09 1458.01
58
Detailed recommendations
The following tables summarize the recommendations for instances of 25 jobs. Note that these
recommendations apply only for this specific number of jobs, since the behavior of the algorithms
is affected by the number of jobs.
Table 42: Overview of recommendations for 25-job instances and testing for free
Distribution Stochastic Stochastic with Testing

p1j = p2j p1j 6= p2j p1j = p2j p1j 6= p2j
lognormal SPT SPT Test + LPT1LPT2 Test + LPT1LPT2
normal SPT Flex. Johnsons Test + LPT1SPT2 Test + Flex. Johnsons
bimodal SPT Flex. Johnsons Test + LPT1LPT2 Test 40% + LPT1LPT2
uniform Flex. Johnsons Flex. Johnsons DO NOT TEST Test + Flex. Johnsons
Table 43: Overview of recommendations for 25-job instances and costly testing
Distribution Stochastic Stochastic with Testing

p1j = p2j p1j 6= p2j p1j = p2j p1j 6= p2j
lognormal SPT SPT Test + LPT1LPT2 Test + LPT1LPT2
normal SPT Flex. Johnsons DO NOT TEST DO NOT TEST
bimodal SPT Flex. Johnsons Test 60% + LPT1LPT2 DO NOT TEST
uniform Flex. Johnsons Flex. Johnsons DO NOT TEST DO NOT TEST
59

Thesis: Testing in Scheduling Problems For Information Retrieval

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Thesis: Testing in Scheduling Problems For Information Retrieval

Uploaded by

Copyright:

Available Formats

Testing in scheduling problems for information

Master Thesis Operations Research and Business Econometrics

5 Integer Linear Programming 26

8 Conclusions and Future Research 46

EDD Earliest Due Date

FCFS First Come First Served

ILP Integer Linear Program

LDF Longest Delay First

LPT Longest Processing Time

SPT Shortest Processing Time

Figure 1: Illustration of a middle size project

1.3 What happens in reality and what can be achieved

2. Burns and Rooker (1975): min{p1k , p2k } dk for all k.

2.2 Main categories of scheduling problems

2.3 Learning and Testing

3.1 Problem Definition

3.1.1 Deterministic Model

All jobs are available for processing at WS1.

All jobs must be processed first at WS1 and then at WS2 .

Preemption is not allowed.

No job can skip either O1j or O2j .

3.1.2 Stochastic Model

All jobs are available for processing at WS1.

All jobs must be processed first at WS1 and then at WS2.

Preemption is not allowed.

No job can skip either O1j or O2j .

3.1.3 Stochastic Model with Testing

CoTj = j p1j (2)

3.2 Model Complexity

4.1 Lower Bounds

LB2 = max{p1j + dj } + p2j (4)

As a result the lower bound can be defined as:

LB = max{LB1, LB2, LB3} (6)

4.2 Theorems and Observations

Schedule 1: First b, then a

Figure 2: Schedules at WS1 and WS2 when b is scheduled first

Schedule 2: First a, then b.

Figure 3: Schedules at WS1 and WS2 when a is scheduled first and ra rb

Job p1j p2j dj

Figure 6: Illustration of the schedules at WS1 and WS2 for Example 1

rn < rn1 < ... < r2 < r1 n + dn < 1 + d1 (7)

Job p1j p2j Ehdj i

Job p1j p2j Ehdj i actual d

processed and the n2

Figure 8: Illustration of the schedules at WS1 and WS2 for Example 3

Table 5: Example instance where testing the k-longest jobs is effective

Job p1j p2j Ehdj i actual d

Figure 9: Illustration of the schedules at WS1 and WS2 for Example 4

Job p1j p2j Ehdj i actual d

Algorithm 1: Longest Delay First (LDF)

Algorithm 2: Delays in pairs LPTlong LPTshort (LPT1LPT2)

Algorithm 4: Delays in pairs LPTlong SPTshort (LPT1SPT2)

4.4 Local search properties

(i) p1k p2h dh

(iii) p1h p1k dh dk 0

(iv) p2k p1h dh dk 0

If h is scheduled first then:

2p1k < p1h rk < rh

(ii) p1h p2h dh

(iii) p2k p1h dh dk p1k p1h 0

p1k + dk dh < 0 rk < rh

(ii) p1h p2k

(iii) p1k p2k dh dk p1k p2h