Table of contents
Collective Communication
Communicator
Intercommunicator
Collective Communication
Communication involving a group of processes
The collective group is selected via a suitable communicator
All members of the group issue an identical call
No message tags are used
Collective communication does not necessarily mean that all
processes are involved (i.e. global communication)
18. May 2015 Thorsten Grahs Parallel Computing I SS 2015 Seite 3
Collective Communication
The amount of data sent must exactly match the amount of
data received
Collective routines are collective across an entire
communicator and must be called in the same order by
all processes within the communicator
Collective routines are all blocking:
the buffer can be reused upon return
Multi-task functions
Multi-broadcast operation
MPI_Allgather()
All participating tasks make their data available to all
other participating tasks
Multi-accumulation operation
MPI_Allreduce()
All participating tasks get the result of the operation
Total exchange
MPI_Alltoall()
Each involved task sends to and receives from all others
Synchronisation
Barrier operation
MPI_Barrier(comm)
All tasks in comm wait for each other at a barrier.
The only collective routine which provides explicit
synchronization
Returns at any process only after all processes have
entered the call
A barrier can be used to ensure that all processes have
reached a certain point in the computation
Mostly used to synchronize a sequence of tasks
(e.g. for debugging)
Example: MPI_Barrier
Broadcast operation
MPI_Bcast(buffer,count,datatype,root,communicator)
All processes in the communicator use the same function call.
Data from the process with rank root are distributed to all
processes in the communicator
The call is blocking, but implies no synchronization
Accumulation operation
MPI_Reduce(sendbf,recvbf,count,type,op,master,comm)
The root of the reduction is the process master
Combining operation op (e.g. summation)
All processes involved put their local data into sendbf
master collects the result in recvbf
Reduce operation
Pre-defined operations
MPI_MAX     maximum
MPI_MAXLOC  maximum and index of maximum
MPI_MIN     minimum
MPI_SUM     summation
MPI_PROD    product
MPI_LXOR    logical exclusive OR
MPI_BXOR    bitwise exclusive OR
...
Gather operation
MPI_Gather(sbf,scount,stype,rbf,rcount,rtype,ma,comm)
sbf local send-buffer
rbf receive-buffer of master ma
Each process sends scount elements of data type
stype to master ma
The order of data in rbf corresponds to the rank order
in communicator comm
Scatter operation
MPI_Scatter(sbf,scount,stype,rbf,rcount,rtype,ma,comm)
Master ma distributes/scatters the data in sbf
Each process receives its sub-block of sbf in the local
receive buffer rbf
Master ma also sends to itself
The order of the sub-blocks corresponds to the rank order
in communicator comm
Example: Scatter
Three processes involved in comm
Send-buffer: int sbuf[6]={3,14,15,92,65,35};
Receive-buffer: int rbuf[2];
The function call
MPI_Scatter(sbuf,2,MPI_INT,rbuf,2,MPI_INT,0,comm);
leads to the following distribution:
Process   rbuf
0         { 3, 14}
1         {15, 92}
2         {65, 35}
Example (code fragment; the rest of the listing was lost in extraction):
if (world_rank == 0)
    rand_nums = create_rand_nums(elements_per_proc * world_size);
Multi-broadcast operation
MPI_Allgather(sbuf,scount,stype,rbuf,rcount,rtype,comm)
Data from the local sbuf are sent to the rbuf of all processes
Specifying a master is unnecessary, since all processes
receive the same data
MPI_Allgather corresponds to MPI_Gather followed by an
MPI_Bcast
Example output (the code listing and the computed values were lost in
extraction; each process printed a line of the form "Avg of all ...").
Total exchange
MPI_Alltoall(sbuf,scount,stype,rbuf,rcount,rtype,comm)
Matrix view
Before MPI_Alltoall process k has row k of the matrix
After MPI_Alltoall process k has column k of the matrix
MPI_Alltoall corresponds to
MPI_Gather followed by a MPI_Scatter
Variable gather
MPI_Gatherv(sbuf,scount,stype,
rbuf,rcount,displs,rtype,ma,comm)
Variable variants also exist for Scatter, Allgather and
Alltoall (MPI_Scatterv, MPI_Allgatherv, MPI_Alltoallv)
Example MPI_Scatterv
/* Initialising */
if (myrank == root) init(sbuf, N);

/* Splitting work and data */
MPI_Comm_size(comm, &size);
Nopt = N / size;
Rest = N - Nopt * size;
displs[0] = 0;
for (i = 0; i < size; i++) {
    scount[i] = Nopt;
    if (Rest > 0) { scount[i]++; Rest--; }
    if (i > 0) displs[i] = displs[i-1] + scount[i-1];
}

/* Distributing data */
MPI_Scatterv(sbuf, scount, displs, MPI_DOUBLE,
             rbuf, scount[myrank], MPI_DOUBLE, root, comm);

Note: the loop runs over the size processes (not over the N elements),
and the displacements displs[] are given in elements of the send type,
not in bytes.
Example comparison
Compare different approaches
Task: y = Ax with A ∈ R^(N×M), N rows, M columns
Row-wise distribution: BLAS routine
Column-wise distribution: reduction operation
Example row-wise
Row-wise distribution
Result vector y distributed
void local_mv(int N, int M, double *y, const double *A, int lda,
              const double *x)
{
    int i, j;
    double s;
    /* partial sum - local operation */
    for (i = 0; i < M; i++) {
        s = 0.0;
        for (j = 0; j < N; j++)
            s += A[i*lda + j] * x[j];
        y[i] = s;
    }
}
Timing
arith.:         2 N M Ta
mem. access x:  M Tm(N, 1)
mem. access y:  Tm(M, 1)
mem. access A:  M Tm(N, 1)
Example column-wise
Task: column-wise distribution; the solution vector is
assembled by a reduction operation
Distributing vector x with MPI_Scatter: cost (p-1)*Tk(M)
Communicator
Communicators
Motivation
Communicator: Distinguish different contexts
Conflict-free organization of groups
Integration of third party software
Example: Distinction between
library functions
application
Predefined communicators
MPI_COMM_WORLD
MPI_COMM_SELF
MPI_COMM_NULL
Duplicate communicators
MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm);
Creates a copy newcomm of comm
Identical process group
Allows
clear delineation
characterisation
of process groups
Example
MPI_Comm myworld;
...
MPI_Comm_dup(MPI_COMM_WORLD, &myworld);
Splitting communicators
MPI_Comm_split(MPI_Comm comm, int color, int key,
MPI_Comm *newcomm);
Divides the communicator comm into multiple communicators
with disjoint process groups
MPI_Comm_split has to be called by all processes in comm
Processes with the same value of color form a new
communicator group
Figure: MPI_COMM_WORLD with nine processes P0-P8, each calling
MPI_Comm_split with a color and a key. Processes with the same color
form a new communicator (e.g. comm1 = {P1, P4, P7},
comm2 = {P2, P5, P8}, and a third communicator from {P0, P3, P6});
within each new communicator the ranks 0, 1, 2 are assigned
according to the keys.
Grouping communicators
MPI_Comm_group(MPI_Comm comm, MPI_Group *grp)
Returns the process group of a communicator
More group constructors
MPI_Comm_create
Generates a communicator from a group
MPI_Group_incl
Include processes into a group
MPI_Group_excl
Exclude processes from a group
MPI_Group_range_incl
Forms a group from a simple pattern
MPI_Group_range_excl
Excludes processes from a group by simple pattern
MPI_Group_incl(grp, n, ranks, &newgrp)
Include n processes in the new group newgrp
With grp=(a,b,c,d,e,f,g), n=3, ranks=[5,0,2]:
newgrp=(f,a,c)
MPI_Group_excl(grp, n, ranks, &newgrp)
Exclude the n=3 processes ranks=[5,0,2] from grp:
newgrp=(b,d,e,g)
MPI_Group_range_incl(grp, 3, ranges, &newgrp)
Include the ranks given by n=3 range triples (first, last, stride),
ranges=[[6,7,1],[1,6,2],[0,9,4]], with grp=(a,...,j):
newgrp=(g,h,b,d,f,a,e,i)
MPI_Group_range_excl(grp, 3, ranges, &newgrp)
Exclude the ranks given by the n=3 range triples:
newgrp=(c,j)
Further group operations
MPI_Group_union         union of two groups
MPI_Group_intersection  intersection of two groups
MPI_Group_difference    difference of two groups
MPI_Group_compare       compares two groups
MPI_Group_free          frees a group
MPI_Group_size          number of processes in a group
MPI_Group_rank          rank of the calling process in a group
Intercommunicator
Intracommunicator
Up to now we have only handled communication inside a
contiguous group.
This communication took place inside (intra) a
communicator.
Intercommunicator
A communicator that establishes a context between groups
Intercommunicators are associated with two groups of
disjoint processes
Intercommunicators are associated with a remote group
and a local group
The target process (destination for a send, source for a
receive) is addressed by its rank in the remote group
A communicator is either intra or inter, never both
Create intercommunicator
MPI_Intercomm_create(local_comm, local_bridge,
bridge_comm, remote_bridge, tag, &newcomm )
local_comm
local intracommunicator (handle)
local_bridge
rank of a distinguished process in local_comm (integer)
bridge_comm
remote intracommunicator, which is connected to
local_comm by the newly built intercommunicator
remote_bridge
rank of a distinguished process in the remote communicator
newcomm
the newly created intercommunicator (handle)
Example (code fragments; most of the three listings was lost in
extraction):
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Do work ... */
MPI_Finalize();
Motivation Intercommunicator
Used for
Meta-computing
Cloud computing
Low bandwidth between components
e.g. cluster <-> PC
A bridge head controls the communication with the
remote computer