PARC++: A Parallel C++
KAI TÖDTER AND CARSTEN HAMMER
Siemens AG, ZFE T SE 3, 81739 Munich, Otto-Hahn-Ring 6, Germany (email:
{kai.toedter, hammer}@zfe.siemens.de)
AND
WERNER STRUCKMANN
Institute of Programming Languages and Information Systems, Technical University
Braunschweig, P.O. Box 3329, Germany (email: struck@ips.cs.tu-bs.de)
SUMMARY
PARC++ is a system that supports object-oriented parallel programming in C++. PARC++ provides
the user with a set of predefined C++ classes that can easily be used for the construction of parallel
C++ programs. With the help of PARC++ objects, the programmer is able to create and start new
processes (threads), to synchronize their activities (Blocklock, Monitor) and to manage communication
via message passing (Mailbox). PARC++ is written in C++ and currently runs on top of the EMEX
operating system on a FORCE machine with 11 processing elements and an EDS (European Declarative
System) machine with 28 processing elements. The paper also contains information about the run-time system
model, the implementation and some performance measurements.
KEY WORDS: C++; object-oriented programming; parallel computing; European Declarative System (EDS)
INTRODUCTION
The objective of this project is the design of a general-purpose C++ class library
to provide the user with a programming system for writing object-oriented parallel
programs. The primary design objectives were:
1. Portability. PARC++ should be portable among computer architectures that
provide shared or virtually shared memory.
2. User-friendliness. The constructs should be user-friendly and easy to understand.
3. Broad range of MIMD architectures. PARC++ should provide message passing
as well as shared memory constructs (the target architectures include shared
memory machines as well as distributed memory machines with virtually shared
memory).
4. Compiler independence. The constructs should be implementable in pure C++
without changes to the compiler.
We chose C++ because we wanted a widely used programming language that
provides data abstraction and object-oriented programming. In addition, C++ is
relatively easy to learn for C programmers. For further understanding of this article,
basic knowledge of C++ is useful. Some concepts have been taken from PRESTO.3
system. But the user can explicitly set the target node where a thread should run
(int set_target(int node)).
There are further member functions to handle thread objects, for example, to
get the processing element on which a thread was started or is running, the level in
the process tree, the parent process etc.
class ProdCon {
    buffer *buff;
public:
    ProdCon()
    {
        buff = new buffer(); // new buffer monitor
    }
    void producer() {...} // produces a character and puts it in the buffer
    void consumer() {...} // consumes a character from the buffer
};

main()
{
    ProdCon *e = new ProdCon();   // new producer/consumer
    Thread *p = new Thread();     // creates two threads
    Thread *c = new Thread();
    p->start(e, e->producer);     // starts producer and consumer
    c->start(e, e->consumer);
}
Figure 1. Visit
state of a thread (busy, terminated, waiting for mail, waiting for a lock, holding a
lock etc.).
Furthermore, there are displays for message queues (mailboxes), thread queues
(locks), bar charts for processor utilization and sequence charts for the time stamps
of the trace events.
The second trace format that PARC++ provides can be visualized with Para-
Graph,6 a visualization tool developed by M. T. Heath and J. A. Etheridge at Oak
Ridge National Laboratory. Whereas Visit is used for debugging (it helps the
programmer to understand the relationship between the parallel constructs), Para-
Graph is more suitable for performance analysis and tuning. ParaGraph provides
many displays for process utilization and communication. To analyse the perform-
ance, ParaGraph displays the three processor states ‘busy’, ‘idle’ and ‘overhead’.
There is one handicap* in visualizing PARC++ traces with ParaGraph: the only
objects ParaGraph handles are nodes (processing elements). In a multi-threaded
object-oriented environment such as PARC++ it is more interesting to represent
objects such as threads and mailboxes than the nodes on which the methods of the
objects run. Because of this, we have changed the semantics of the ParaGraph
trace-file and have mapped PARC++ objects to ParaGraph nodes. For example,
if we have two PARC++ threads and one PARC++ mailbox, we map them to
ParaGraph nodes 0, 1 and 2. The advantage of this mapping is that the user is able
to analyse the performance of single threads.
The third trace format of PARC++ can be visualized with POPAI7 (Parallel
Object-oriented Program Animation Instrument). POPAI has been developed by
K. Kiderle as a diploma thesis at the Technical University Munich. All PARC++ objects
except threads can be visualized (see Figure 2). For example, POPAI shows the
queues of mailboxes and monitors. Furthermore, it is possible to create user events
which can easily be visualized with POPAI.
Figure 2. POPAI
* ParaGraph usually visualizes PICL8 trace-files. In PICL there is only one process running on each node, and
so the handicap does not appear.
In recent years, there have been many attempts to define parallel extensions of
C++. In this section we want to compare PARC++ with some of these projects.
In PARC++ we have taken the PRESTO3 thread concept because it is easy to
handle and user-friendly. In addition, there are some differences from and extensions
to PRESTO which further increase the user's comfort: in PRESTO (1.0) the user
has to invoke the member function will_join() before a thread is started if he wants
to get the result after the execution of the thread. After starting, he has to wait
with Objany tval = t->join(). There is no possibility of testing whether a thread
execution is completed or of waiting for all descendants of a started thread. In
PARC++ the user may use test(), result(), wait() and wait_all(). Furthermore, there
is no way of saying explicitly on which PE a thread shall execute, whereas PARC++
provides methods such as set_target(). The locking mechanisms of PRESTO are similar
to those of PARC++. In addition, PARC++ provides mechanisms to test whether a lock
is free or not. In PARC++ monitors, the condition variables are included; in
PRESTO they are external objects which have to be bound to monitors. PRESTO
does not provide any communication via message passing.
COOL9 (Concurrent Object-Oriented Language) was developed by R. Chandra,
A. Gupta and J. L. Hennessy at Stanford University. PARC++, PRESTO and
COOL are similar in that all three attempt to exploit the object model for con-
currency. But there are several differences from PARC++ and PRESTO: COOL
is a syntactical extension of C++. It provides comfortable mechanisms for con-
currency and synchronization. But to implement COOL, the C++ compiler has to
be changed. In COOL, the programmer can declare a function as parallel, for
example, parallel int foo(). If the user invokes foo(), the function will be executed
in parallel (it is also possible to invoke a sequential function in parallel and a parallel
function sequentially). As a result, it is easier to encapsulate parallelism within the
implementation of a class. Another difference from PARC++ is that COOL does
not provide monitors, but mutex functions. If several member functions of the same
object are declared as mutex, they all execute with mutual exclusion. COOL is no
more powerful (in semantics) than PRESTO or PARC++, but it is more flexible and
concurrency may be finer-grained. For this, the implementor of COOL has to pay
the high price of changing a C++ compiler for every special parallel machine.
C++ Parmacs10 (B. Beck, Sequent Computer Systems) is based on the M4
macros, also called Parmacs, which have been developed at the Argonne National
Laboratory. The major difference from PARC++, PRESTO and COOL is the
process model: in C++ Parmacs, the same C++ program (pmain()) runs on every
PE of the system. It is not possible to create new processes or threads dynamically.
Synchronization is done via several types of monitors.
Threads,11 developed by T. W. Doeppner and A. J. Gebele, is based on a system
by B. Stroustrup,12 which was designed for a single processor. The only way to
execute code in parallel is to define a class as a subclass of Task and to implement
the parallel code within the constructor. Synchronization of tasks can be done by
using monitors which are very similar to PARC++ monitors (we have adapted the
signal-wait from Threads monitors).
The major advantage of PARC++ (in comparison with the four parallel C++
dialects above) is the provision of both message passing via mailboxes and synchronization
via locks and monitors. Furthermore, PARC++ supports several trace formats
which can be visualized by different tools.
a work queue and sends this work (if there is any) to the requesting managing
thread.
3. The managing thread. Has various tasks. First, the managing thread creates a
user thread (see below) and sends a work request to the distribution thread.
If work is available, it will be passed to the user thread. The user thread can
now start execution. If the user thread is interrupted (for example, if a lock
is not available), the managing thread will create another user thread and will
send a new work request to the distribution thread. Furthermore, the managing
thread manages all locks, monitors and mailboxes created on its own processing
element.
4. The user thread. The task of a user thread is to execute the function or method
invoked by Thread::start. When the execution is finished, a message will be
sent to the managing thread.
5. The nowork thread. This thread has the lowest priority in the whole team.
Thus, if the nowork thread gets control, all other system threads (especially
the user threads) wait for some events. The only task of the nowork thread is
to send a nowork message to the managing thread. After receiving such a
message, the managing thread creates a new user thread and sends a work
request to the distribution thread.
PERFORMANCE MEASUREMENTS
PARC++ currently runs on a FORCE machine (one MC68030 processor on every
board) and an EDS machine (see below). The FORCE is used as a testbed, where
virtually shared memory is emulated relatively slowly because message passing
between all processing elements is realized via one single bus. The major target
machine of PARC++ is an EDS machine with 40 MHz SPARC processors, a
Delta net for message passing and a separate processor on each processing element
supporting the virtually shared memory and the message passing. Performance
measurements on this machine are more relevant.
The initialization time of PARC++ depends on the number of available processing
elements. On both machines, the initialization time is about 0.14 s for the first processing
element. For every further processing element, the initialization time is about 0.4 s.
The reason for this ‘long’ time is the copying of the whole PARC++ run-time
system to every processing element.
The asynchronous start of a member function costs about 4 ms on FORCE and
about 1.5 ms on EDS. The real time between asynchronous invocation, parallel
execution and getting the return value on the invoker’s site is about 20 ms on
FORCE and about 8 ms on EDS (average values). The time for wait(), wait_all(),
test() and result() is about 0.01–0.015 ms on both machines.
The time for requesting a lock depends on the processing element on which the
lock was created. For example, if a lock was created on the same processing element,
the request time is about 1.8 ms on EDS (3.6 ms on FORCE). If the lock was
created on another processing element, the request time increases up to 18 ms
(32 ms on FORCE) for the first request. This time includes the initialization of lock
objects. Once initialized, the request time decreases to 1.8 ms on EDS (3.5 ms on
FORCE). So the request time is lower than the page fault time! The time to unlock
a lock is about 0.2 ms on EDS and about 0.8 ms on FORCE. The timing of mailboxes
(receive and send one integer value) and monitors (enter and exit) is similar to the
timing of locks. A monitor's wait() lasts about the time of 2 * lock() + unlock(); a
signal() lasts about the time of lock() + unlock().
Because of the overhead of the asynchronous invocation, only functions with an
execution time of more than 20 ms (on the FORCE machine) or more than 8 ms
(on the EDS machine) should be executed in parallel. The performance of PARC++
depends basically on the performance of the message passing constructs and the
mapper of the virtually shared memory, both provided by the EMEX operating
system.
CONCLUSIONS
PARC++ is a programming tool to write object-oriented parallel programs, providing
several synchronization and communication constructs. In contrast to other
parallel C++ versions, PARC++ provides both shared memory and message passing
constructs. As a result, some kinds of algorithms can be implemented in a more
natural way. Furthermore, data exchange in a virtually shared memory environment
via message passing may be more efficient than the reading and writing of virtually
shared data. PARC++ provides interfaces to several visualization tools such as
Visit, POPAI and ParaGraph, which may be used for debugging and/or performance
analysis. PARC++ is implemented as a C++ class library without changing the
compiler. In this way, it is easy to port it to other systems. The PARC++ implementation
was the first application of the newly developed EMEX operating system.
We assume that we can increase the efficiency of PARC++ and hope that further
PARC++ applications will show that we have reached our objective.
ACKNOWLEDGEMENTS
We would like to thank Jürgen Knopp and Thomas Henties for their feedback on
the design and implementation of PARC++ and Friederike Richter for helpful
ideas and comments on this paper.
REFERENCES
1. B. Stroustrup, The C++ Programming Language, Addison-Wesley, March 1986.
2. B. Stroustrup, ‘An overview of C++’, AT&T Bell Laboratories, Murray Hill, 1986.
3. B. N. Bershad, E. D. Lazowska and H. M. Levy, ‘PRESTO: a system for object-oriented parallel programming’, Software: Practice and Experience, 18, (8), 713-732 (1988).
4. P. Brinch Hansen, ‘The programming language Concurrent Pascal’, IEEE Trans. Software Engineering, 2, 199-206 (1975).
5. H. Ilmberger and C. P. Wiedemann, ‘Visualization and control environment for parallel program debugging’, HICSS-26, Hawaii, January 1993.
6. M. T. Heath and J. A. Etheridge, ‘Visualizing the performance of parallel programs’, IEEE Software, 8, (5), 29-39 (1991).
7. K. Kiderle, ‘Visualisierung und Analyse des dynamischen Ablaufverhaltens paralleler objektorientierter Programme’, Diploma thesis, Technical University Munich, February 1993.
8. G. A. Geist, M. T. Heath, B. W. Peyton and P. H. Worley, ‘PICL: a portable instrumented communication library’, Technical Report, 1990.
9. R. Chandra, A. Gupta and J. L. Hennessy, ‘COOL: a language for parallel programming’, Technical Report No. CSL-TR-89-396, Stanford University, 1989.
10. B. Beck, ‘Shared-memory parallel programming in C++’, IEEE Software, July 1990, pp. 38-48.
11. T. W. Doeppner and A. J. Gebele, ‘C++ on a parallel machine’, USENIX C++ Papers, Department of Computer Science, Brown University, 1987, pp. 95-107.
12. B. Stroustrup, ‘A set of C++ classes for co-routine style programming’, AT&T Bell Laboratories Computer Science Technical Report, available with release notes for C++ 1.2.1.
13. H. G. Baumgarten, L. Borrmann, H. Hartlage, N. Holt, P. Istavrinos and S. Prior, ‘Specification of the process control language (PCL)’, ESPRIT EP 2025, EDS.DD.lS.0007, Munich, 1989.
14. G. Haworth, S. Leuning, C. Hammer and M. Reeve, ‘The European declarative system, database and languages’, IEEE Micro, December 1990.
15. C. J. Skelton, C. Hammer, M. Lopez, M. J. Reeve, P. Townsend and K. F. Wong, ‘EDS: a parallel computer system for advanced information processing’, Conference on Parallel Architectures and Languages Europe, PARLE 92, Paris, June 1992.
16. M. Ward, P. Townsend and G. Watzlawik, ‘EDS hardware architecture’, Conference on Vector and Parallel Processing, Zurich, September 1990.
17. K. Tödter, ‘Entwicklung eines parallelen C++-Dialektes’, Diploma thesis, Technical University Braunschweig, January 1992.
18. F. Armand, F. Herrmann, J. Lipkis and M. Rozier, ‘Multithreaded processes in CHORUS/MiX’, Proc. EUUG Spring ’90 Conference, Munich, April 1990.
19. C. A. R. Hoare, ‘Monitors: an operating system structuring concept’, Communications of the ACM, 17, (10), 549-557 (1974).