
Parallel compilation of CMS software

Giulio Eulisse*, Stefan Schmid**, Lassi A. Tuura*


*) Northeastern University, Boston (MA), U.S.A.
**) Eidgenössische Technische Hochschule (ETH), Zürich (ZH), Switzerland

LHC experiments have large amounts of software to build. CMS has studied ways to shorten project build times using
parallel and distributed builds as well as improved ways to decide what to rebuild. We have experimented with making idle
desktop and server machines easily available as a virtual build cluster using distcc and zeroconf. We have also tested
variations of ccache and more traditional make dependency analysis. We report on our test results, with analysis of the
factors that most improve or limit build performance.

The compiling process

Albeit often seen as a single operation, the compiling process -- or, better, the building process -- is actually made up of four different phases: preprocessing, the actual compilation, assembly and linking.

Preprocessing: the source code (.c, .cc, .cpp files) and all the .h headers included in it are retrieved by the preprocessor (CPP) and merged into a single, self-consistent and installation-independent .i file. All macros are substituted with their values as well.

Compiling: the compiler processes the standalone .i file and produces an assembly source file targeted to the configured platform.

Parallelization of the compilation process (as performed by distcc)

The problem of improving software building speed is probably the second most common problem in software development (second only to getting the software source to compile at all). For this reason there are already a number of projects that try to address it by sharing the workload among multiple machines in various ways.

After some investigation we have chosen the open-source tool distcc, for two main reasons:

1) it works and it is easy to install;

2) it is open source, and big corporations (such as Apple) are actively contributing to it.
Assembling: the assembler (AS) turns the assembly source (.s) into binary instructions and produces a .o object file.

Linking: the linker (LD) links the object file against the requested libraries (.so) and possibly against other object files, producing an executable or a library.

Preprocessing has to be performed on the master node, because the development environment could be different on the remote nodes. Since preprocessed sources are self-contained, the compilation and assembling phases can be performed on remote nodes (the compiling clients). Linking must also occur on the master node: it depends on libraries and object files, and it is intrinsically serial because its product is singular (either one executable or one library).

Theory of Parallel Algorithms

The performance gain obtained by parallelizing a job on a cluster of machines is measured in terms of a quantity called speed-up, defined as the ratio between the time required to perform the task on a single machine and the time obtained by doing it on a cluster:

    speedup = t_single / t_cluster

Naively, one could assume that by parallelizing a job on n different machines one could get an n-fold improvement in performance. This is not actually true, because every task has an intrinsic serial component, f, that cannot be parallelized. This results in the actual maximum speed-up on P processors being:

    speedup_max = 1 / (f + (1 - f) / P)

so that even considering an infinite number of CPUs, the maximum speed-up is limited by the serial component f:

    speedup_inf = 1 / f

Performance test with SCRAM V0.20

CMS uses a custom tool called SCRAM for building its software. The current production version, SCRAM V0.20, adds several bottlenecks for parallelization besides the intrinsic ones present in the compilation task as such. For example, because of its recursive nature, Perl is started over and over again, adding overhead to the process.

The picture above shows the building time for IGUANA 4.5.0 (the interactive framework used by CMS), separating the different contributions to the building time. As can be seen, only the pure compilation can be sped up; the remaining components, 'Linking' and 'Rest' (consisting of SCRAM, preprocessing, and networking), contribute a significant part of the whole execution time and are not sped up.

In contrast to IGUANA, COBRA (the CMS application framework) has no inter-module dependencies, which allows different modules to be compiled in parallel. However, even if SCRAM is wrapped, the speedup with more than 2 hosts is small.

Performance test with SCRAM V1

To overcome the bottlenecks of the old version of SCRAM, a total rewrite was started in late 2003, and it is now entering "prime time". The new version of SCRAM -- dubbed V1 -- abandons the recursive nature of the old one and has only one makefile. This reduces the serial component due to SCRAM's time-stamping work, and it also uses different algorithms for dependency analysis. The overhead of a preliminary version of SCRAM V1 turned out to be less than one second, which makes it very appealing when compared to the old V0.20 version.

In our test we built the projects SEAL 1.3.3, POOL 1.5.0 and COBRA 7.6.2. The results can be found in the following table. As can be seen, the speedups are very different. This is due to the dictionary generation that occurs in the SEAL building process, which cannot be parallelized using distcc. COBRA, which does not have such a problem, shows a good speedup.

Finally, we also integrated the compiler cache ccache, which yields very good performance (cache hits only); see the figure "Cobra 7.5.0 (ccache)".
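The speed-up limit set by the serial fraction f can be checked numerically with a one-line helper; the f and P values below are illustrative, not measured results:

```shell
# Expected speed-up for serial fraction f on P hosts: 1 / (f + (1 - f)/P).
speedup() { awk -v f="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 1 / (f + (1 - f) / p) }'; }

speedup 0.4 2        # two hosts, 40% serial part
speedup 0.4 1000000  # with f = 0.4 no cluster can beat 1/f = 2.5
```

Even with a million hosts the second call stays below 2.5, which is why reducing f (SCRAM overhead, preprocessing, linking) matters more than adding machines.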
Automatic compiling peers discovery

As is, distcc works very well in static environments where the peers that share the compilation workload are well defined. This is not always the case, because it is not always possible to afford a separate building farm. Moreover, one would like to take advantage of occasional spare resources that might appear under certain circumstances, and one would like this to happen in an automatic way.

For this reason we developed a simple server that, once installed on a client machine, notifies the network neighbourhood about the availability of the client once a certain policy-specified condition is satisfied.

This is similar to work already done by Apple in its private version of distcc, but since that is not available under Linux we decided to develop our own prototype.

Like Apple, we based our server on the ZeroConf service discovery mechanism. ZeroConf (a.k.a. Rendezvous) is an industry-standard protocol, mainly developed by Apple Computer Inc., for advertising and discovering services over IP-based networks. It works by using the well-defined DNS-SD protocol and can be used both in server-less environments (via multicast DNS) and in managed ones.

Every machine in our prototype system runs two processes, and it can serve as a compiling client as well as share its compilation workload with others. The "server daemon" observes the status of its underlying host. Whenever the host is idling, the daemon advertises the machine as available on the network neighbourhood via multicast. The "client daemon" notices the availability of new machines and adds them to the list of compiling hosts, making sure that they actually have a compatible compiler.

[Figure: a developer's machine normally ships preprocessed sources to a dedicated compiling farm. When a meeting starts, some users join it and leave their desktops idle; the idle nodes signal their availability via ZeroConf, and the user's machine starts shipping compilation jobs to the newly spotted machines as well. Even when the compiling farm goes down for maintenance, the user still has extra CPU power from these "casual" working nodes. Preprocessing and linking stay on the user's machine.]

References

Hennessy, J.: Computer Architecture: A Quantitative Approach (2002); 3rd Edition, San Francisco: Morgan Kaufmann Publishers.

Tanenbaum, A.S.: Moderne Betriebssysteme (1995); 2nd Edition, Munich, Vienna: Hanser, p. 697ff.

Lutz, M.: Programming Python (2001); 2nd Edition, Sebastopol: O'Reilly.

Powers, S., Peek, J., O'Reilly, T., Loukides, M.: Unix Power Tools (2002); 3rd Edition, O'Reilly.

http://iguana.web.cern.ch/iguana/
http://cobra.web.cern.ch/cobra/
http://cmsdoc.cern.ch/orca/
http://cmsdoc.cern.ch/oscar/
http://seal.web.cern.ch/seal/
http://pool.cern.ch
http://cmsdoc.cern.ch/Releases/SCRAM/current/doc/html/SCRAM.html
http://distcc.samba.org
http://www.zeroconf.org
http://dotlocal.org
http://ccache.samba.org
http://per.bothner.com/papers/GccSummit03-slides/
http://www.mosix.org
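The advertise/browse cycle described above can be tried by hand with the Avahi command-line tools on Linux; this is only a sketch of the idea, not our prototype's actual protocol, and the service type and port are assumptions (3632 is distcc's default port):

```shell
# On an idle node: advertise this host as a compilation peer
# (service name, type _distcc._tcp and port 3632 are illustrative assumptions).
avahi-publish-service "$(hostname) build slot" _distcc._tcp 3632 &

# On the client side: browse for advertised peers and resolve their addresses,
# then add the resolved hosts to the distcc host list.
avahi-browse -r -t _distcc._tcp
```

Both commands require a running avahi-daemon, so this is environment-dependent; our prototype performed the equivalent steps programmatically via DNS-SD.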
Other enhancements and future directions

As has been explained, when dealing with parallelization it is fundamental to reduce the intrinsic serial component of a job. In the case of compilation, a number of additional tricks could be used: for example, precompiled headers to reduce the preprocessing phase, a compiler server to eliminate compiler start-up time, and compiler caches such as ccache to avoid recompiling sources that have not changed between two different versions of the software.

Conclusions

From the analysis made above, we can draw the following conclusions:

1. In most cases, one additional host reduces the execution time remarkably. The utility of a third or fourth machine is less obvious.

2. It is crucial to reduce serial components such as preprocessing, SCRAM and networking in order to get a good speed-up.

3. SCRAM V0.20 and the generation of SEAL's dictionary are significant bottlenecks for parallelization.

4. The speed-up depends on many parameters, which are related not only to the computer infrastructure (number of hosts, latency, bandwidth, ...) but also to the project's properties (e.g. the number and size of its files). It is not possible to give a formula that calculates lower bounds on the performance gains for an arbitrary project.

5. Pragmatic rules of thumb have to be applied, as it is difficult to predict which -j option is best or whether to include the local host or not.

6. Integration of ccache is easy and very useful.

7. It is important to avoid simultaneous writes to the same AFS volume. For example, temporary files should not be written to the home directory in a parallel build. With the present technologies, the distribution of compilation jobs to idling hosts provides only moderate speedups, i.e. a factor of two can hardly be achieved even with dozens of desktops.
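Point 7 can be addressed by giving each build process a private scratch area on local disk instead of the shared volume; a minimal sketch (the /tmp layout and naming are assumptions, not what CMS builds used):

```shell
# Keep temporary build files off the shared (AFS) home directory:
# give each host/process a private scratch area on local disk.
export TMPDIR="/tmp/build-$(hostname)-$$"
mkdir -p "$TMPDIR"

# Compilers and tools that honour TMPDIR now write their
# intermediate files here instead of the shared volume.
echo "scratch area: $TMPDIR"
```

Embedding the hostname and process id in the path keeps concurrent jobs on different nodes from ever writing to the same directory.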
