MPI

Dr. Rajkumar Buyya

Outline
Introduction to Message Passing Environments
HelloWorld MPI Program
Compiling and Running MPI programs
On interactive clusters and batch clusters
Elements of Hello World Program
MPI Routines Listing
Communication in MPI programs
Summary
Message-Passing Programming Paradigm
Each processor in a message-passing program runs a sub-program
written in a conventional sequential language
all variables are private
communicate via special subroutine calls

SPMD: A dominant paradigm for writing data parallel applications
main(int argc, char **argv)
{
    if (process is assigned Master role)
    {
        /* Assign work and coordinate workers and collect results */
        MasterRoutine(/*arguments*/);
    }
    else /* it is worker process */
    {
        /* Interact with master and other workers. Do the work and send
           results to the master */
        WorkerRoutine(/*arguments*/);
    }
}
Messages
Messages are packets of data moving between sub-programs.
The message passing system has to be told the following information:
Sending processor
Source location
Data type
Data length
Receiving processor(s)
Destination location
Destination size
Messages
Access:
Each sub-program needs to be connected to a message passing
system
Addressing:
Messages need to have addresses to be sent to
Reception:
It is important that the receiving process is capable of dealing
with the messages it is sent
A message passing system is similar to:
Post-office, Phone line, Fax, E-mail, etc
Message Types:
Point-to-Point, Collective, Synchronous (telephone)/Asynchronous
(Postal)
Message Passing Systems and MPI - www.mpi-forum.org
Initially each manufacturer developed their own message passing interface
Wide range of features, often incompatible
The MPI Forum brought together several vendors and users of HPC systems from the US and Europe to overcome the above limitations.
It produced a document defining a standard, called Message Passing Interface (MPI), which is derived from experience with the common features and issues addressed by many message-passing libraries. It aimed:
to provide source-code portability
to allow efficient implementation
to provide a high level of functionality
to support heterogeneous parallel architectures
to support parallel I/O (in MPI 2.0)
MPI 1.0 contains over 115 routines/functions that can be
grouped into 8 categories.
General MPI Program Structure
[Figure: general MPI program structure]
MPI programs
MPI is a library - there are NO language changes
Header Files
C: #include <mpi.h>
MPI Function Format
C: error = MPI_Xxxx(parameter,...);
MPI_Xxxx(parameter,...);
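Example - C
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* main part of the program */
    MPI_Finalize();
    return 0;
}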
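MPI helloworld.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int numtasks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello World from process %d of %d\n", rank, numtasks);
    MPI_Finalize();
    return 0;
}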
MPI Programs Compilation and Execution
Manjra: GRIDS Lab Linux Cluster
Master: manjra.cs.mu.oz.au
Internal worker nodes:
node1
node2
....
node13
How the Manjra cluster looks
[Figure: a snapshot of the Manjra cluster, front view and back view]
Compile and Run Commands
Compile:
manjra> mpicc helloworld.c -o helloworld
Run:
manjra> mpirun -np 3 helloworld [hosts picked from configuration]
manjra> mpirun -np 3 -machinefile machines.list helloworld
The file machines.list contains the list of nodes:
manjra.cs.mu.oz.au
node1
node2
..
node6
node13
Some nodes may be unavailable on a given day if they have failed.
Sample Run and Output
A Run with 3 Processes:
Hello World from process 0 of 3
A Run by default
manjra> helloworld
Note: Process execution need not be in process number order.
Note: The order of process output can change. For each run, the process mapping can be different, and processes may run on machines with different load, hence the difference.
Running Applications using PBS (Portable Batch System) on Manjra cluster
PBS
PBS is a batch system - jobs get submitted to a queue
The job is a shell script to execute your program
The shell script can contain job management instructions (note that these instructions can also be given on the command line)
PBS will allocate your job to some other computer, log in as you, and execute your script, i.e. your script must contain cd's or absolute references to access files (or globus objects)
Useful PBS commands:
qsub - submits a job
qstat - monitors status
qdel - deletes a job from a queue
PBS directives
Some PBS directives to insert at the start of your shell script; each one is a line beginning with #PBS.
[Directive listing not recoverable]
Manjra and PBS
<manjra.cs.mu.oz.au> runs a batch system called PBS:
You submit a script telling the system how to run your job
The script requests the number of nodes in DEDICATED mode.
The batch system is PBS
mpich on manjra
Run with
qsub jobscript
where jobscript is a shell script such as those shown under PBS Script
PBS Script
[raj@manjra mpi]$ cat hello.bat
cd mpi
/usr/local/mpich/mpich-1.2.5.2/bin/mpirun -np 5 helloworld-hostname
[raj@manjra mpi]$ cat hello.sh
#!/bin/bash
cd /home/mpi678-2010/mpi
mpirun -np 5 ./helloworld

Submitting to a Queue
[raj@manjra mpi]$ qsub hello.bat
2811.manjra.cs.mu.oz.au
[raj@manjra mpi]$ qsub -V hello.sh
2811.manjra.cs.mu.oz.au
Queue Status
[raj@manjra mpi]$ qstat
[qstat output listing not recoverable]
Output - Result/Error
Output
hello.bat.oXXXXX
Error, if any
hello.bat.eXXXXX
Where XXXXX is the ID assigned to your job by PBS
References
PBS User Guide:
http://www.doesciencegrid.org/public/pbs
More on MPI Program Elements and Error Checking
Handles
MPI controls its own internal data structures
MPI releases "handles" to allow programmers to refer to these
C handles are of distinct typedef'd types and arrays are indexed from 0
Some arguments can be of any type - in C these are declared as void *
Initializing MPI
The first MPI routine called in any MPI program must be MPI_Init.
The C version accepts the arguments to main
int MPI_Init(int *argc, char ***argv);
MPI_Init must be called by every MPI program
Making multiple MPI_Init calls is erroneous
MPI_Initialized is an exception to the first rule: it may be called before MPI_Init
MPI_COMM_WORLD
MPI_Init defines a communicator called MPI_COMM_WORLD for every process that calls it.
All MPI communication calls require a communicator argument
MPI processes can only communicate if they share a communicator.
A communicator contains a group which is a list of processes
Each process has its rank within the communicator
A process can have several communicators
Communicators
MPI uses objects called communicators that define which collection of processes may communicate with each other.
Every process has a unique integer identifier (its rank) assigned by the system when the process initialises.
A rank is sometimes called a process ID.
Processes can request information from a communicator
MPI_Comm_rank(MPI_Comm comm, int *rank)
Returns the rank of the process in comm
MPI_Comm_size(MPI_Comm comm, int *size)
Returns the size of the group in comm
Finishing up
An MPI program should call MPI_Finalize when all communications have completed.
Once called, no other MPI calls can be made
Aborting:
MPI_Abort(MPI_Comm comm, int errorcode)
Attempts to abort all processes listed in comm
if comm == MPI_COMM_WORLD the whole program terminates
Hello World with Error Check
Display Hostname of MPI Process
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int numtasks, rank;
    int resultlen;
    static char mpi_hostname[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(mpi_hostname, &resultlen);
    printf("Hello World from process %d of %d running on %s\n", rank,
           numtasks, mpi_hostname);
    MPI_Finalize();
    return 0;
}
MPI Routines
MPI Routines - C and Fortran
Environment Management
Environment Management Routines
Point-to-Point Communication
The simplest form of message passing
One process sends a message to another
Several variations on how sending a message can interact with execution of the sub-program
Point-to-Point variations
Synchronous Sends
provide information about the completion of the message
e.g. fax machines
Asynchronous Sends
Only know when the message has left
e.g. post cards
Blocking operations
only return from the call when operation has completed
Non-blocking operations
return straight away - can test/wait later for completion
Collective Communications
Collective communication routines are higher level routines involving several processes at a time
Can be built out of point-to-point communications
Barriers
synchronise processes
Broadcast
one-to-many communication
Reduction operations
combine data from several processes to produce a (usually) single result
MPI Communication Routines and Examples
MPI Messages
A message contains a number of elements of some particular data type
MPI data types
Basic Types
Derived types
Derived types can be built up from basic types
C types are different from Fortran types
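MPI Basic Data types - C
MPI Datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE              (no C equivalent)
MPI_PACKED            (no C equivalent)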
Point-to-Point Communication
Communication between two processes
Source process sends message to destination process
Communication takes place within a communicator
Destination process is identified by its rank in the communicator
MPI provides four communication modes for sending messages
standard, synchronous, buffered, and ready
Only one mode for receiving
Standard Send
Completes once the message has been sent
Note: it may or may not have been received
Programs should obey the following rules:
It should not assume the send will complete before the
receive begins - can lead to deadlock
It should not assume the send will complete after the
receive begins - can lead to non-determinism
processes should be eager readers - they should guarantee
to receive all messages sent to them - else network
overload
Can be implemented as either a buffered
send or synchronous send
Standard Send (cont.)
C: int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
buf       the address of the data to be sent
count     the number of elements of datatype buf contains
datatype  the MPI datatype
dest      rank of destination in communicator
tag       a marker used to distinguish different message types
comm      the communicator shared by sender and receiver
ierror    the Fortran return value of the send
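Standard Blocking Receive
Note: all sends so far have been blocking (but this only makes a difference for synchronous sends)
Completes when message received
C: int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
source    rank of source process in communicator
status    returns information about message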
Synchronous Blocking Message-Passing
processes synchronise
sender process specifies the synchronous mode
blocking - both processes wait until transaction completed
For a communication to succeed
Sender must specify a valid destination rank
Receiver must specify a valid source rank
The communicator must be the same
Tags must match
Message types must match
Receiver's buffer must be large enough
Receiver can use wildcards
MPI_ANY_SOURCE
MPI_ANY_TAG
actual source and tag are returned in status parameter
Standard/Blocked Send/Receive
MPI Send/Receive a Character
// mpi_com.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    int rank, dest, source, rc, tag = 1;
    char inmsg, outmsg = 'X';
    MPI_Status Stat;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank must be set before it is tested */
    if (rank == 0) {
        dest = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        printf("Rank0 sent: %c\n", outmsg);
        source = 1;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        printf("Rank0 recv: %c\n", inmsg);
    }
    else if (rank == 1) {
        source = 0;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag,
                      MPI_COMM_WORLD, &Stat);
        printf("Rank1 received: %c\n", inmsg);
        dest = 0;
        outmsg = 'Y'; /* reply with a different character, as in the demo output */
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag,
                      MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
Execution Demo
mpicc mpi_com.c
[raj@manjra mpi]$ mpirun -np 2 a.out
Rank0 sent: X
Rank0 recv: Y
Rank1 received: X
Non Blocking Message Passing
Exercise: Ping Pong
1. Write a program in which two processes repeatedly pass a message back and forth.
2. Insert timing calls to measure the time taken for one message.
3. Investigate how the time taken to exchange messages varies with the size of the message.
A simple Ping Pong.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    int rank, dest, source, rc, tag = 1;
    char pingmsg[10]; char pongmsg[10]; char buff[100];
    MPI_Status Stat;
    strcpy(pingmsg, "ping");
    strcpy(pongmsg, "pong");
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) { /* Send Ping, Receive Pong */
        dest = 1;
        source = 1;
        rc = MPI_Send(pingmsg, strlen(pingmsg)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(buff, strlen(pongmsg)+1, MPI_CHAR, source, tag, MPI_COMM_WORLD,
                      &Stat);
        printf("Rank0 Sent: %s & Received: %s\n", pingmsg, buff);
    }
    else if (rank == 1) { /* Receive Ping, Send Pong */
        dest = 0;
        source = 0;
        rc = MPI_Recv(buff, strlen(pingmsg)+1, MPI_CHAR, source, tag,
                      MPI_COMM_WORLD, &Stat);
        printf("Rank1 received: %s & Sending: %s\n", buff, pongmsg);
        rc = MPI_Send(pongmsg, strlen(pongmsg)+1, MPI_CHAR, dest,
                      tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
Timers
C: double MPI_Wtime(void);
Returns an elapsed wall clock time in seconds (double precision) on the calling processor.
Time to perform a task is measured by consulting the time before and after
Upcoming Evaluations
Mid term exam: "peer" evaluation
Review your understanding of topics covered so far.
No official marking - "How you are going" test.
Date: April 27 (Monday),
Time: 20 min (exam), 15 min (for peer marking)
Microsoft Guest Lecture (May?)
Assignment 2:
Implementation of "parallel" Matrix multiplication (using MPI)
Deadline: April 30 from 1: 10-12; 2: 2-4pm
Acknowledgements:
MPI Slides are derived from
Dirk van der Knijff, High Performance Parallel Programming, PPT Slides
MPI Notes, Maui HPC Centre:
http://www.buyya.com/csc433/MPITut.pdf
Melbourne Advanced Research Computing Center
http://www.hpc.unimelb.edu.au
