
Combining traditional jobs with virtualized jobs

in a batch computing infrastructure


Pau Tallada Crespí

April 6, 2011

Objectives
The Port d'Informació Científica (PIC) is a center for scientific data processing
and storage supporting scientific groups working on projects that require
strong computing resources for the analysis of massive sets of distributed data.
Currently, most of our resources are devoted to the LHC experiments and, as
a consequence, the computing environment in our worker nodes is tailored to
their needs.
However, we also work with many other projects whose requirements might
not be fulfilled using the LHC computing environment, not to mention all the
other projects with which we may work in the future. Therefore, in order
to provide a better service to those experiments and to increase the number of
projects we can support, we want to use virtualization technology to provide a
customized computing environment to each experiment. In order to achieve that
goal, the following objectives have to be met:
1. Virtualized jobs must coexist in the same physical infrastructure as the
traditional ones. As a small center, we should avoid partitioning our
resources or duplicating maintenance efforts.
2. All the projects must use the same set of computing services (worker nodes,
batch servers, storage facilities, etc.).
3. The virtualization technology used must either be provided by the computing
environment or be compatible with it. That is a harsh limitation for a rapidly
changing technology such as virtualization.
4. When a user submits a job, she must specify whether it has to be run inside a
virtual machine.
5. In that case, the user must also specify, at submission time, the name of the
virtual machine inside which the job must be run (see the example after this list).
6. Test the viability of this technique by measuring and comparing the performance
of the traditional computing environment with the virtualized one.
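
As an illustration of objectives 4 and 5, the submission interface could look like
the minimal sketch below. It only illustrates one possible convention, assumed here
and not taken from the actual implementation: the virtual machine name is passed to
Torque through the standard qsub -v option as an environment variable (called
VM_IMAGE, a hypothetical name), so that the batch system carries it along with the job.

    #!/usr/bin/env python
    # Minimal sketch (not the actual PIC tooling): submit a job and, optionally,
    # ask for it to be run inside a named virtual machine. VM_IMAGE is a
    # hypothetical variable name passed to Torque with "qsub -v".
    import subprocess

    def submit(job_script, vm_image=None):
        cmd = ["qsub"]
        if vm_image is not None:
            # Flag the job as virtualized and name the machine image to boot.
            cmd += ["-v", "VM_IMAGE=%s" % vm_image]
        cmd.append(job_script)
        # qsub prints the identifier of the newly created job.
        return subprocess.check_output(cmd).decode().strip()

    if __name__ == "__main__":
        print(submit("analysis.sh"))                      # traditional job
        print(submit("analysis.sh", vm_image="sl53-wn"))  # virtualized job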

Infrastructure
This work is a continuation of previous efforts from Marc Rodríguez to execute
PBS jobs inside virtual machines running on a worker node. For the implementation,
we have used two physical worker nodes and a virtual machine running
the PBS server. In addition, the virtualized jobs execute inside a virtualized
worker node. Table 1 details the hardware specifications of those machines.
Name       CPU             RAM       HDD      Purpose
pbs-dev    1 @ 2.33 GHz    512 MiB   10 GiB   batch server, scheduler, frontend
wncloud01  2x2 @ 3.00 GHz  8192 MiB  250 GiB  worker node
wncloud02  2x2 @ 3.00 GHz  8192 MiB  250 GiB  worker node
vwn        1 @ 3.00 GHz    2048 MiB  10 GiB   virtualized worker node

Table 1: Machine specifications

One of the objectives of this work is to compare the performance of the
virtual machines depending on the virtualization technology in use. That usually
includes the operating system and its kernel. Table 2 contains a summary of
the software installed on each machine, along with its version. In all cases,
the PBS version used has been Torque 2.5.3-1cri.
Name       O.S.          Linux                KVM
pbs-dev    SL 5.3        2.6.18-194.32.1.el5  N/A
wncloud01  SL 5.3        2.6.18-194.32.1.el5  kvm-83-164.el5_5.30 (Jan 2009)
wncloud02  Ubuntu 10.10  2.6.35-25-generic    2.6.35-25-generic (Sep 2010)
vwn        SL 5.3        2.6.18-194.32.1.el5  N/A

Table 2: Software versions

Implementation
In a batch system, the jobs are run following the procedure described in
Figure 1:

Figure 1: Traditional job execution flow

1. The user submits a job to the batch server at pbs-dev.

2. The batch system matches the job requirements and dispatches it to a
suitable worker node.
3. The job runs on the worker node under the same privileges as the user
who submitted it.
4. When the job finishes, its output is stored in the batch server.
5. Finally, the batch server dispatches the job output to the user.
On the other hand, the jobs that need to be run inside a virtual machine follow
the procedure described in Figure 2:

Figure 2: Virtualized job execution flow

1. The user submits a job to the batch server at pbs-dev, with a flag specifying
the virtual machine inside which the job has to be run.
2. The batch server matches the job requirements and dispatches it to a
suitable worker node.
3. On the worker node, the prologue script analyzes the job flags and intercepts
it if it is a virtualized job.
4. The prologue script stores the job in an ISO9660 image, along with additional
data such as authentication keys (see the sketch after this list).
5. Then, it replaces the job with the instantiation of the virtual machine
specified by the user.

6. When the virtual machine boots, it mounts the externally supplied ISO9660
image and launches a contextualization script.
7. The contextualization script configures the virtual machine and launches
the job with the user's privileges.
8. When the job ends, the contextualization script sends the output to the
physical worker node and shuts down the virtual machine.
9. When the virtual machine has been shut down, the epilogue script runs
and cleans out the temporary data created.
10. Finally, the batch server dispatches the job output to the user.
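
The interception performed in steps 3 to 5 could be sketched as follows. This is
only an illustration under several assumptions, not the actual PIC prologue: for
brevity the virtual machine name is read from a VM_IMAGE environment variable (a
real prologue would have to recover it from the job itself), genisoimage is assumed
to be available to build the ISO9660 image, and the machine is booted with a plain
kvm invocation. Handling of authentication keys, accounting and error recovery is
left out.

    #!/usr/bin/env python
    # Sketch of the prologue-side logic; paths and variable names are hypothetical.
    import os
    import subprocess

    def handle_job(job_id, job_dir, vm_dir="/var/lib/vwn"):
        # Step 3: only intercept jobs flagged as virtualized.
        vm_image = os.environ.get("VM_IMAGE")
        if not vm_image:
            return  # traditional job: let the batch system run it as usual

        # Step 4: pack the job script and any additional data (e.g. SSH keys)
        # into an ISO9660 image that will be attached to the virtual machine.
        iso_path = os.path.join(vm_dir, "%s.iso" % job_id)
        subprocess.check_call(["genisoimage", "-quiet", "-R",
                               "-o", iso_path, job_dir])

        # Step 5: boot the user-selected machine image with the job ISO attached
        # as a CD-ROM; the contextualization script inside the VM does the rest.
        disk = os.path.join(vm_dir, "%s.img" % vm_image)
        subprocess.check_call(["kvm", "-m", "2048", "-smp", "1",
                               "-drive", "file=%s" % disk,
                               "-cdrom", iso_path,
                               "-nographic"])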

Benchmarking results
To be able to compare the results between the virtual machines and the physical
ones, the latter were booted with only one core active and the amount of RAM
limited to 2048 MiB. Several tests were run to benchmark the performance of
the network, the disk and the CPU.
Each test has its own chart with the results corresponding to both physical
machines (the one running SL 5.3 and the one running Ubuntu 10.10) and both
virtual machines (both running SL 5.3, one on top of each physical machine).
In addition, some tests have an extra category, which usually corresponds to a
more realistic configuration of the virtual machine, that is, the configuration
that would be used in a production environment.
The network test consisted of a series of 10 runs of the iperf tool. The first
two virtual machines use a bridged network (on a TAP device), and the last one
uses user-mode networking on top of SL 5.3. The results are shown in Figure 3.
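
For reference, the measurement amounts to a loop like the following sketch. The
server name is a placeholder, an iperf server (iperf -s) is assumed to be listening
there, and the exact report format may vary between iperf versions.

    #!/usr/bin/env python
    # Sketch of the network test: 10 iperf client runs against a fixed server.
    import re
    import subprocess

    RUNS = 10
    results = []
    for _ in range(RUNS):
        out = subprocess.check_output(["iperf", "-c", "iperf-server", "-f", "m"])
        # The report line ends with something like "941 Mbits/sec".
        match = re.search(r"([\d.]+)\s+Mbits/sec", out.decode())
        if match:
            results.append(float(match.group(1)))

    print("mean throughput: %.1f Mb/s over %d runs"
          % (sum(results) / len(results), len(results)))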
Figure 3: Network benchmarking: iperf -c (throughput in Mb/s for wncloud01, wncloud02, vwn01, vwn02 and vwn01-u)

The disk performance has been measured using two tools, each of which has been
run 10 times. The results shown in Figure 4 are those corresponding to the
hdparm tool. It seems that the KVM shipped with SL 5.3 is not able to virtualize
this kind of operation efficiently.
Figure 4: Disk benchmarking: hdparm -t (throughput in MB/s for wncloud01, wncloud02, vwn01 and vwn02)

The second tool used has been dt, which is based upon the well-known dd. The
results shown in Figure 5 are somewhat strange:
• The virtual machine on top of Ubuntu 10.10 can read faster than the
underlying physical one. Maybe the caches are doing something tricky
and altering the results.
• The virtual machine on top of SL 5.3 loses half of its write performance
compared to the physical one, and it gets completely knocked down when
using the snapshot feature (the fifth value).
Finally, the CPU performance has also been measured using two different
tools. The first test involved compressing a 1.36 GiB ISO image using
bzip2. The results shown in Figure 6 reveal that the virtualized CPU is nearly
as efficient as the physical one.
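
The bzip2 test amounts to timing the compression of the same ISO image on every
machine, along the lines of the sketch below (the image path is a placeholder).

    #!/usr/bin/env python
    # Sketch of the CPU test: time bzip2 compression of a ~1.36 GiB ISO image.
    import subprocess
    import time

    ISO = "/tmp/test-image.iso"  # placeholder path for the 1.36 GiB image

    start = time.time()
    # -k keeps the original file, -f overwrites any previous .bz2 output.
    subprocess.check_call(["bzip2", "-k", "-f", ISO])
    elapsed = time.time() - start
    print("bzip2 compression took %.2f minutes" % (elapsed / 60.0))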
The second tool is the HEP SPEC 2006 benchmark suite, which is the reference
tool used by WLCG to measure the performance of a worker node. The results in
Figure 7 show that the impact of CPU virtualization is nearly negligible, as
the virtualized worker nodes achieve a relative performance of 92-95% compared
to the physical ones.

Concluding remarks
On the positive side, the main conclusion is that the basic functionality works
very well. In fact, the network and CPU performance is excellent, and even the
disk read performance in a virtual machine is on par with that of the physical one.
Figure 5: Disk benchmarking: dt limit=2g bs=1m (read and write throughput, plain and direct, in MiB/s for wncloud01, wncloud02, vwn01, vwn02 and vwn01-u)

All this setup is completely transparent to the user and only needs minor
changes in the configuration of the worker nodes. Furthermore, this approach
provides a set of useful features such as snapshotting, reverse VNC, and the
ability for the user to be in complete control of the virtual machine.
However, there are several shortcomings to this approach. The main one
is that the disk write performance penalty is so huge that it is not feasible
for production use. Besides, root privileges are needed to get a good network
throughput using a bridge. Finally, there is still a lot of work to undertake, such
as the problem of image cataloging and distribution.

Figure 6: CPU benchmarking: bzip2 compression of a 1.36 GiB ISO image (time in minutes for wncloud01, wncloud02, vwn01 and vwn02)

Figure 7: CPU benchmarking: HEP SPEC 2006 (score for wncloud01, wncloud02, vwn01 and vwn02)

Workarounds and future work
There are some techniques that can be used to overcome the limitations described
in the previous section. The low performance of snapshotting and disk writing,
and the need for root privileges to use a bridged network, can be mitigated by
running a newer version of the hypervisor on the worker nodes. There are two
implementations of the worker node environment, one based on SL 5.3 and another
based on Debian 5.0. The latter uses a more recent version of the kernel (2.6.26
vs 2.6.18) and may solve some of the performance problems detected. If Debian
cannot be used or if the disk write performance problem persists, a
network-exported filesystem could be used to avoid the performance penalty of
the virtualized disk. There are several network filesystem solutions, but the
most promising in this context seem to be the lightweight ones based on FUSE.
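
As an illustration of that last idea, a FUSE-based filesystem such as sshfs (named
here only as an example, not a tested choice) could be mounted from the
contextualization script so that the job writes to a directory exported by the
physical worker node instead of the virtualized disk:

    #!/usr/bin/env python
    # Sketch: export a scratch area from the physical worker node into the VM
    # through a FUSE filesystem (sshfs is only one example of such a tool).
    # It assumes the mount point exists and that passwordless SSH keys were
    # supplied to the VM, e.g. inside the contextualization ISO.
    import subprocess

    PHYSICAL_WN = "wncloud01"     # placeholder: the node hosting this VM
    REMOTE_SCRATCH = "/scratch"   # placeholder: directory exported to the job
    MOUNT_POINT = "/mnt/scratch"

    # Mount the remote scratch area; the job is then pointed at MOUNT_POINT
    # so its writes bypass the (slow) virtualized disk.
    subprocess.check_call(["sshfs",
                           "%s:%s" % (PHYSICAL_WN, REMOTE_SCRATCH),
                           MOUNT_POINT])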
If the workarounds described above do not suffice, then maybe it is time
to make the leap and embrace a full cloud model. There are several sites
which have a fully virtualized farm (including the LHC experiments) and several
software implementations for cloud management. However, adopting the
cloud model is a long-term project which needs a lot of commitment from all
the departments, as without strong directive guidance and motivation it will
most certainly fail. One final comment on the cloud model is that we have
to be able to differentiate ourselves from the traditional cloud providers, as
we cannot compete on price. Our advantages have to be the proximity to the user,
our excellence in technical support, and additional services and features.
