* This work has been funded by the DoD under the LUCITE contract #MDA904-98-C-A081 and by the NSF grant INT-0109038.
c. perform centralized job scheduling that matches all available resources with all submitted jobs according to the predefined policies [10],
d. allocate resources and initiate job execution,
e. monitor all jobs and collect accounting information.

To perform these basic tasks, a JMS must include at least the following major functional units, shown in Fig. 1:
1. User server – lets the user submit jobs and their requirements to a JMS (task b), and additionally may allow the user to inquire about the status of a job and to change it (e.g., to suspend or terminate the job).
2. Job scheduler – performs job scheduling and queuing based on the resource requirements, resource availability, and the scheduling policies (task c).
3. Resource manager – monitors resources and dispatches jobs on a given execution host (tasks a, d, e).

Figure 1. Major functional blocks of a Job Management System: the User Server passes jobs and their requirements from the user to the Job Scheduler, which matches them against the resource availability reported by the Resource Manager (Resource Monitor and Job Dispatcher); the Resource Manager performs resource allocation and job execution.
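To make this division of responsibilities concrete, the following minimal sketch models the three units of Fig. 1 and their interaction. It is an illustration only; all types and names are hypothetical rather than taken from any of the JMSs studied.

// Minimal sketch of the three JMS functional units (hypothetical names).
#include <queue>
#include <string>
#include <vector>

struct Job  { std::string command; std::string requirements; };
struct Host { std::string name; bool idle = true; };

class ResourceManager {          // monitors resources, dispatches jobs (tasks a, d, e)
public:
    std::vector<Host> hosts;
    void dispatch(const Job&, Host& h) { h.idle = false; /* start and monitor the job */ }
};

class JobScheduler {             // matches jobs with resources per policy (task c)
public:
    void schedule(std::queue<Job>& pending, ResourceManager& rm) {
        for (Host& h : rm.hosts) {
            if (pending.empty()) return;
            if (h.idle) { rm.dispatch(pending.front(), h); pending.pop(); }
        }
    }
};

class UserServer {               // accepts jobs and their requirements (task b)
public:
    std::queue<Job> pending;
    void submit(const Job& j) { pending.push(j); }
};

int main() {
    UserServer us; JobScheduler sched; ResourceManager rm;
    rm.hosts.push_back({"host1"});
    us.submit({"./benchmark", "1 CPU"});
    sched.schedule(us.pending, rm);   // central scheduler running on a single node
}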
2.2. Choice of Job Management Systems

More than twenty JMS packages, both commercial and public domain, are currently in use [1, 7]. In the interest of time, we selected four representative and commonly used JMSs:
• LSF – Load Sharing Facility,
• PBS – Portable Batch System,
• Sun Grid Engine / CODINE, and
• Condor.
The common feature of these JMSs is that all of them are based on a central Job Scheduler running on a single node.

LSF (Load Sharing Facility) is a commercial JMS from Platform Computing Corp. [11, 12]. It evolved from the Utopia system developed at the University of Toronto [13], and is currently probably the most widely used JMS.

PBS (Portable Batch System) has both a public domain and a commercial version [14]. The commercial version, called PBS Pro, is supported by Veridian Systems; this version was used in our experiments. PBS was originally developed to manage aerospace computing resources at NASA Ames Research Center.

Sun Grid Engine/CODINE is an open source package supported by Sun Inc. It evolved from DQS (Distributed Queuing System), developed by Florida State University. Its commercial version, called CODINE, was offered by GENIAS GmbH in Germany and became widely deployed in Europe.

Condor is a public domain software package that was started at the University of Wisconsin. It was one of the first systems that utilized idle workstation cycles and supported checkpointing and process migration.
2.3. Functional similarities and differences among selected Job Management Systems

The most important functional characteristics of the four selected JMSs are presented and contrasted in Table 1. From this table, it can be seen that LSF supports all operating systems, job types, and features included in the table. CODINE lacks support for Windows NT, stage-in and stage-out, and checkpointing. PBS and Condor trail LSF and CODINE in terms of support for parallel jobs, dynamic load balancing, and master daemon fault recovery. They also support a smaller number of operating systems compared to LSF.
Table 1. Conceptual functional comparison of selected Job Management Systems

                            LSF            CODINE          PBS             Condor
  Distribution              commercial     public domain   commercial and  public domain
                                                           public domain
  Operating System Support
    Linux, Solaris          yes            yes             yes             yes
    Tru64                   yes            yes             yes             no
    Windows NT              yes            no              no              partial
  Types of Jobs
    Interactive jobs        yes            yes             yes             no
    Parallel jobs           yes            yes             partial         limited to PVM
  Features Supporting Efficiency, Utilization, and Fault Tolerance
    Stage-in and stage-out  yes            no              yes             yes
    Process migration       yes            yes             no              yes
    Dynamic load balancing  yes            yes             no              no
    Checkpointing           yes            using external  only            yes
                                           libraries       kernel-level
    Daemon fault recovery   master and     master and      only for        only for
                            execution      execution       execution       execution
                            hosts          hosts           hosts           hosts
3. Experimental Setup

3.1. Metrics

The following performance measures were investigated in our study:

1. Throughput is defined in general as the number of jobs completed in a unit of time. Since this number depends strongly on how many jobs are taken into account, we consider throughput to be a function of the number of jobs, k, and define it as k divided by the amount of time necessary to complete k JMS jobs (see Fig. 2a). We also define total throughput as a special case of throughput for the parameter k equal to the total number of jobs submitted to a JMS during the experiment, N (see Fig. 2b).

In Fig. 3, we show the typical dependence of throughput on the number of jobs taken into account, k. Throughput increases sharply as a function of k until the moment when either all system CPUs become busy, or the numbers of jobs submitted and completed in a unit of time become equal.

Figure 2. Definitions of throughput: (a) Throughput(k) = k / T_k, where T_k is the time from the first job submission (time = 0) to the completion of the k-th JMS job; (b) Total Throughput = N / T_N, where N is the total number of jobs submitted to a JMS during the experiment.
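As an illustration of these definitions (our own sketch, not code used in the study), throughput can be computed directly from the recorded job completion timestamps:

#include <algorithm>
#include <cstddef>
#include <vector>

// completionTimes: one entry per finished job, in hours since time = 0.
// Requires 1 <= k <= completionTimes.size().
double throughput(std::vector<double> completionTimes, std::size_t k) {
    std::sort(completionTimes.begin(), completionTimes.end());
    const double Tk = completionTimes.at(k - 1);  // time at which the k-th job finished
    return static_cast<double>(k) / Tk;           // Throughput(k) = k / T_k  [jobs/hour]
}

// Total Throughput = N / T_N, the special case k = N.
double totalThroughput(const std::vector<double>& completionTimes) {
    return throughput(completionTimes, completionTimes.size());
}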
The network structure of the testbed is flat, so that every machine can communicate directly with every other machine.
In Experiment 2, regarding the long job list, the total number of jobs was reduced to 75 to keep the time of each experiment within the range of 2 hours.

Each experiment was repeated for all four JMSs under exactly the same initial conditions, including the same initial seeds of the pseudo-random generators. Additionally, all experiments were repeated 3 times for the same JMS to minimize the effects of random events in the machines participating in the experiment.

Additionally, each experiment was repeated for several different average job submission rates. These rates were chosen experimentally in such a way that they correspond to qualitatively different JMS loads. For the smallest submission rate, each system is very lightly loaded: only a subset of all available CPUs is utilized at any point in time, and any new job submitted to the system can be immediately dispatched to one of the execution hosts. For the highest submission rate, a majority of CPUs are busy all the time, and almost any new job submitted to a JMS must spend some time in a queue before being dispatched to one of the execution hosts.

The characteristic features of the five experiments performed during our experimental study are summarized in Table 2. Experiments 1-4 were designed to measure the performance of each JMS for different job submission rates. Experiment 5 was aimed at quantifying the fault tolerance of each JMS by determining its resistance to master and execution daemon failures. Five minutes after the beginning of this experiment, the master daemons on the master host or the execution daemons on a single execution host were killed. In one version of the experiment, the killed daemons were restarted one minute later; in the other version, no further action was taken. In all cases, the total number of jobs that completed execution was recorded and compared with the total number of jobs submitted to the system for execution.

3.5. Common settings of Job Management Systems

An attempt was made to set all JMSs to an equivalent configuration, using the following major configuration settings.

A. Maximum Number of Jobs per CPU

In all experiments except Experiment 4, the maximum number of jobs assigned simultaneously to each CPU was set to one; in other words, no timesharing of CPUs was allowed. This setting was chosen as the optimum because of the numerical character of the benchmarks used in our study. All benchmarks from the short, medium, and long job lists have no input or output. For this kind of benchmark, timesharing can improve only the response time, and has a negative effect on the two most important performance parameters: turn-around time and throughput. This deteriorating effect of timesharing was clearly demonstrated in Experiment 4, where two jobs were allowed to share the same CPU.
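As a simple illustration of this effect (our own example, not taken from the study): consider two identical jobs, each requiring computation time T, assigned to one CPU at the same time. Executed one after the other, they complete at times T and 2T, giving an average turn-around time of 1.5T. Under timesharing, both jobs run at half speed and complete at approximately 2T, giving an average turn-around time of 2T, while the completion time of the last job, and hence the throughput, is no better. Since the benchmarks produce no intermediate output, there are no early partial results to gain, so timesharing only adds overhead.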
B. CPU factor of execution hosts

The CPU factors determine the relative performance of execution hosts for a given type of load. Based on the recommendations given in the JMS manuals, the CPU factors for LSF and CODINE were set according to the relative performance of benchmarks representing a typical load. For each list of benchmarks, two representative benchmarks were selected and run on all machines of distinctly different types. The CPU factors were set to the average ratio of the execution time on the slowest machine to the execution time on the machine for which the CPU factor was determined. With this procedure, the slowest machine always had a CPU factor equal to 1.0. The CPU factors of the remaining machines varied in the range from 1.2 to 1.7 for the short job list, and from 1.4 to 1.95 for the medium and long job lists. The CPU factors of Condor were computed automatically by this JMS, based on Condor-specific benchmarks running on the execution hosts in their spare time. The CPU factors in LSF, CODINE, and Condor affect the operation of the scheduler. In PBS, the equivalent parameter has no effect on scheduling, and affects only accounting and time limit enforcement.
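This procedure reduces to a simple ratio computation. A minimal sketch (our own illustration, not the scripts used in the study), assuming the measured execution times of the representative benchmarks are already available:

#include <cstddef>
#include <vector>

// CPU factor of a machine: the average, over the representative benchmarks,
// of (execution time on the slowest machine) / (execution time on this
// machine). The slowest machine therefore always receives a factor of 1.0.
double cpuFactor(const std::vector<double>& timesOnSlowest,
                 const std::vector<double>& timesOnThisMachine) {
    double sum = 0.0;
    for (std::size_t i = 0; i < timesOnSlowest.size(); ++i)
        sum += timesOnSlowest[i] / timesOnThisMachine[i];
    return sum / static_cast<double>(timesOnSlowest.size());
}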
C. Dispatching interval

The dispatching interval determines how often the JMS scheduler attempts to dispatch pending jobs. This parameter clearly affects the average response time, as well as the scheduler overhead. It may also influence the remaining performance parameters.

LSF on one side, and PBS, CODINE, and Condor on the other, use different definitions of this parameter. In all systems, the parameter describes the maximum time in seconds between subsequent attempts to schedule jobs. However, in PBS, CODINE, and Condor, attempts to schedule jobs also occur whenever a new job is submitted, and whenever a running batch job terminates. This is not the case for LSF. On the other hand, LSF has two additional parameters that can be used to limit the time spent by a job in the queue, and thus reduce the response time.
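The difference between the two definitions can be made concrete with a small sketch (hypothetical logic, not code from any of the four systems):

#include <string>

// Decides whether a scheduling pass should run in response to an event.
bool shouldSchedule(const std::string& event, bool lsfSemantics) {
    if (event == "timer")   // fires at most `dispatching interval` seconds
        return true;        // after the previous attempt - all four systems
    // PBS, CODINE, and Condor additionally schedule on job submission and
    // on batch job termination; LSF relies on the periodic timer alone.
    return !lsfSemantics && (event == "job_submitted" || event == "job_terminated");
}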
F. Scheduling policies

No changes were made to the parameters describing scheduling policies, which means that the default First Come First Serve (FCFS) scheduling policy was used in all systems. One should, however, be aware that within this policy, a different ranking of the hosts fulfilling the job requirements may be used by different JMSs.

4. Methodology and measurement collection
Each experiment was aimed at determining the values of all performance measures defined in Section 3.1. All parameters were measured in the same way for all JMSs, using only utilities and mechanisms of the operating systems.
In particular, timestamps generated using the C function gettimeofday() were used to determine the exact time of a job submission, as well as the beginning and end of the execution time. The function gettimeofday() gets the current time from the operating system. The time is expressed in seconds and microseconds elapsed since Jan 1, 1970 00:00 GMT. The actual resolution of the returned time depends on the accuracy of the system clock, which is hardware dependent.
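For reference, a minimal timestamp helper built on this call could look as follows; the wrapper function is our own, only gettimeofday() itself is the mechanism used in the study:

#include <sys/time.h>

// Wall-clock time in seconds (with microsecond resolution) elapsed
// since Jan 1, 1970 00:00 GMT, as returned by the operating system.
double timestamp() {
    struct timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}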
The Unix Network Time Protocol (NTP) was used to synchronize the clocks of all machines of our Micro-Grid. The protocol provides accuracy ranging from milliseconds (on LANs) to tens of milliseconds (on WANs).
In order to determine the JMS utilization, the Unix top utility was used. This utility records the approximate percentage of CPU time used by each process running on a single machine, averaged over a short period of time, e.g., 15 seconds (see Fig. 5). For each point in time, the sum of the percentages corresponding to all JMS jobs is computed. These sums are then averaged over the entire duration of an experiment to determine the average utilization of each machine by all JMS jobs. The execution host utilizations, averaged over all execution hosts, determine the overall utilization of a JMS.
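This two-level averaging can be sketched as follows, assuming the top samples have already been parsed into per-process CPU percentages of the JMS jobs (a sketch of the computation, not the Perl scripts actually used):

#include <vector>

// Average utilization of one machine by JMS jobs: for every top sample,
// sum the CPU percentages of all JMS processes, then average these sums
// over the whole experiment.
double machineUtilization(const std::vector<std::vector<double>>& samples) {
    double total = 0.0;
    for (const std::vector<double>& perJobPercent : samples) {
        double sum = 0.0;                         // one point in time
        for (double p : perJobPercent) sum += p;  // all JMS jobs on this machine
        total += sum;
    }
    return total / samples.size();                // averaged over the experiment
}

// Overall JMS utilization: machine utilizations averaged over all execution hosts.
double overallUtilization(const std::vector<double>& perHostUtilization) {
    double total = 0.0;
    for (double u : perHostUtilization) total += u;
    return total / perHostUtilization.size();
}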
Three programs were developed to support the experiments and were used in the way shown in Fig. 7. A C++ Job Submission program was written to emulate a random submission of jobs from a given host. This program takes as input a list of jobs, a total number of submissions, an average interval between two consecutive submissions, and the name of the JMS used in a given experiment. Two post-processing Perl scripts, the Timing and Utilization post-processing utilities, were developed to process the log files generated by the benchmarks and the top utility. These scripts generate exhaustive reports including the values of all performance measures, separately for every execution host and jointly for the entire Micro-Grid testbed.

Figure 7. Software used to collect performance measures: the Job Submission program takes a list of jobs and the average interval between jobs, and submits jobs to the Job Management System; the timing log files (job submission time, beginning and end of execution time) and the utilization log files (from top) are processed by the Timing and Utilization post-processing programs into timing and utilization parameters.
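A minimal sketch of such a submission emulator is given below. The command-line interface follows the description above, but the mapping from the JMS name to a submission command, the exponential distribution of intervals, and all identifiers are our own assumptions:

#include <cstdlib>    // std::atoi, std::atof, std::system
#include <fstream>
#include <iostream>
#include <random>
#include <string>
#include <vector>
#include <unistd.h>   // sleep()

int main(int argc, char* argv[]) {
    if (argc != 5) {
        std::cerr << "usage: submit <job_list> <count> <mean_interval_sec> <jms_command>\n";
        return 1;
    }
    std::ifstream list(argv[1]);
    std::vector<std::string> jobs;
    for (std::string line; std::getline(list, line); ) jobs.push_back(line);
    if (jobs.empty()) return 1;

    const int count = std::atoi(argv[2]);
    const double mean = std::atof(argv[3]);   // average interval between submissions
    const std::string jms = argv[4];          // hypothetical: e.g. "qsub" for PBS

    std::mt19937 gen(42);                     // fixed seed: identical submission
                                              // pattern for every JMS under test
    std::exponential_distribution<double> interval(1.0 / mean);
    std::uniform_int_distribution<std::size_t> pick(0, jobs.size() - 1);

    for (int i = 0; i < count; ++i) {
        const std::string cmd = jms + " " + jobs[pick(gen)];
        std::system(cmd.c_str());             // hand the job over to the JMS
        sleep(static_cast<unsigned int>(interval(gen)));
    }
    return 0;
}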
5. Experimental Results

The two most important parameters determining the performance of a Job Management System are turn-around time and throughput. Throughput is particularly important when a user submits a large batch of jobs to a JMS and does not do any further processing till all jobs complete execution. Turn-around time is particularly important when a user tends to work in a pseudo-interactive mode and awaits the results of each subsequent experiment. Additionally, the average response time may be important in the case of interactive jobs that require user input and thus the constant presence of the user.
Figure 8. Average throughput and average turn-around time for the medium job list.

Figure 9. Average throughput and average turn-around time for the long job list.

(Figures 8 and 9 plot throughput [jobs/hour] and turn-around time [s] against the average job submission rate, 0.5 job/min and 2 jobs/min, for LSF, CODINE, PBS, and Condor.)
Figure 10. Average throughput [jobs/hour] and average turn-around time [s] for the short job list, for average job submission rates of 4, 6, 12, 30, and 60 jobs/min.
Figure 11. Average partial throughput [jobs/hour] and response time [s] for medium jobs with two jobs allowed to share a single CPU (maximum of 1 job/CPU vs. 2 jobs/CPU, at 4 jobs/min).
In Experiment 1, with the medium job list, LSF and PBS were consistently the best in terms of average throughput, while PBS was the best in terms of average turn-around time. The overall differences among all four systems were smaller than 21% for average throughput, and 33% for average turn-around time.

In Experiment 2, with the long job list, the throughputs of LSF, Condor, and PBS were almost identical, while CODINE was trailing by 29%. The turn-around time was the best for PBS and LSF for both investigated submission rates. For the higher submission rate, the performance of Condor was approximately the same as the performance of the two best systems.

In Experiment 3, with the short job list, the throughputs of LSF and CODINE were the highest of all investigated systems. At the same time, the turn-around time was consistently the best for CODINE.

The analysis of the system utilization and job distribution revealed the following reasons for the different relative performance of each system in terms of throughput and turn-around time. LSF tends to dispatch jobs to all execution hosts, independently of their relative speed (see Fig. 12a). It also uses a complex scheduling algorithm, which guarantees that jobs are executed tightly one after the other. Both factors contribute to high throughput. At the same time, distributing jobs to all machines, including slow ones, increases the average turn-around time.
In Experiment 5 (see Table 3), LSF was shown to be resistant to master daemon failure. When the master daemons were killed, they were automatically restarted shortly afterwards, and no jobs were lost. PBS and Condor do not have the capability to restart killed master daemons, but they resume normal operation after these daemons are restarted manually. The interruption has a limited effect on the number of jobs that complete execution. All JMSs appeared to be resistant to the failure of execution host daemons. These daemons were not automatically restarted, but as a result of their failure, affected jobs were redirected to other execution hosts.
References