
/* Histogram update with locks */

#include <stdio.h>
#include <omp.h>

#define INP_SIZE  (1 << 26)
#define HIST_SIZE (1 << 20)

omp_lock_t lock[HIST_SIZE];
int hist[HIST_SIZE];
int inp[INP_SIZE];

int main(int argc, char *argv[])
{
    int i, key, sum = 0;
    double t1, t2;

    for (i = 0; i < HIST_SIZE; i++)
        omp_init_lock(&(lock[i]));
    /* Initialize inp to random values and hist entries to 0 */
    t1 = omp_get_wtime();
    #pragma omp parallel for private(key)
    for (i = 0; i < INP_SIZE; i++) {
        key = inp[i];
        omp_set_lock(&(lock[key]));    /* one lock per histogram entry */
        hist[key]++;
        omp_unset_lock(&(lock[key]));
    }
    t2 = omp_get_wtime();
    for (i = 0; i < HIST_SIZE; i++)
        omp_destroy_lock(&(lock[i]));
    /* Add up hist entries in sum */
    printf("Sum=%d. Time=%f\n", sum, t2 - t1);
}

Output:
Sum=67108864. Time=3.425    (4 threads)
Sum=67108864. Time=0.522617 (64 threads)
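Not on the slide: with GCC, OpenMP programs like this are built with the -fopenmp flag, e.g. gcc -fopenmp hist_locks.c, where the file name hist_locks.c is an assumed example.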
Bucket locks

Advantage
• Per-entry locking needs 2^20 locks, far too many to fit in the cache.
• Instead, take a block of entries, called a bucket, and associate one lock with each bucket; with far fewer locks, the lock array fits in the cache, so the cache can be used effectively (see the sketch below).

Disadvantage
• If two threads try to update entries in the same bucket, one has to wait.
• They contend for the same lock.
• Hence parallelism is reduced.
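A minimal sketch of the bucket-locking idea, assuming a bucket count NUM_BKTS and a simple contiguous entry-to-bucket mapping (both the names and the mapping are assumptions, not from the slide):

#include <omp.h>

#define HIST_SIZE (1 << 20)
#define NUM_BKTS  1024                       /* assumed bucket count */
#define BKT_SIZE  (HIST_SIZE / NUM_BKTS)     /* entries per bucket */

omp_lock_t bkt_lock[NUM_BKTS];               /* one lock per bucket */
int hist[HIST_SIZE];

/* Initialize the bucket locks once, before the parallel region. */
void init_bucket_locks(void)
{
    for (int b = 0; b < NUM_BKTS; b++)
        omp_init_lock(&bkt_lock[b]);
}

/* Lock the whole bucket that contains this entry, then update it. */
void hist_update(int key)
{
    omp_set_lock(&bkt_lock[key / BKT_SIZE]);
    hist[key]++;
    omp_unset_lock(&bkt_lock[key / BKT_SIZE]);
}

Varying NUM_BKTS trades the lock array's cache footprint against contention; that trade-off is what the timings below measure.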
Running the same histogram code with bucket locks, for varying numbers of buckets (times in seconds):

Num Bkts    4 threads    64 threads
2           18.7102
4           14.7779
8            8.9611
16           8.15825     102.275
32           7.86844
64           6.73602
128          6.01224
256          4.49584       9.11782
512          3.63246
1024         3.5995        3.0454
32768                      0.796329
262144                     0.548705
1048576      3.425         0.522617

Still not linear speedup.
/* Histogram update with atomic */

#include <stdio.h>
#include <omp.h>

#define INP_SIZE  (1 << 26)
#define HIST_SIZE (1 << 20)

int hist[HIST_SIZE];
int inp[INP_SIZE];

int main(int argc, char *argv[])
{
    int i, key, sum = 0;
    double t1, t2;

    /* Initialize inp to random values and hist entries to 0 */
    t1 = omp_get_wtime();
    #pragma omp parallel for private(key)
    for (i = 0; i < INP_SIZE; i++) {
        key = inp[i];
        #pragma omp atomic    /* a "mini critical section" on one location */
        hist[key]++;
    }
    t2 = omp_get_wtime();
    /* Add up hist entries in sum */
    printf("Sum=%d. Time=%f\n", sum, t2 - t1);
}

What atomic says: the following memory update will be performed atomically, for this memory location. It is not a critical section around a piece of code; it applies to a single memory location, and it supports only a fixed set of operations, such as memory-update operations.

Th-1: hist[1]++    Th-2: hist[2]++
These are two different memory locations, hence both updates are allowed to proceed at the same time.
Running the atomic version:
Sum=67108864. Time=0.769491 (4 threads)
Sum=67108864. Time=0.061616 (64 threads)

• So critical sections and locks have overheads.
• Atomic applies only to a memory location, and only for a limited set of operations.
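As an illustration of that limited set (a sketch with made-up variable names, not from the slides): atomic covers single memory-update operations such as x++ or x += expr, while anything longer still needs a critical section.

#include <omp.h>

int x = 0;

void update(void)
{
    #pragma omp atomic      /* OK: one update of one memory location */
    x += 2;

    #pragma omp critical    /* read-test-write spans several operations,
                               so atomic cannot express it */
    {
        if (x > 100)
            x = 0;
    }
}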
Distributed Memory Applications

MPI – Message Passing Interface – specification of a library

• Different platforms had different kinds of libraries built; MPI specifies one common interface.
• Programs are portable: if you build a program for one distributed system, you can carry it to another.
• If you make an assumption that is not in the specification, the program may work on one platform and not on another.
• Here we use the message-passing parallel programming model, used in distributed-memory systems.
• It can be used in shared-memory systems as well.
MPI – Message Passing Interface – specification of a library

• Each process has its own address space; nothing is shared.
• Hence processes rely on message passing to communicate.
• Distributed-memory systems are very large-scale systems organized as nodes connected by some interconnection network.
MPI – Message Passing Interface – specification of a library

• So MPI is suitable for such environments.
• But we can also run it on a single node, with multiple ranks on that node; they are launched as separate processes, each with its own address space.
• Its biggest strength is portability.
MPI – Message Passing Interface – specification of a library

• How to compile MPI programs:
    mpicc for C
    mpic++ or mpicxx for C++

• How to run MPI programs:
    mpirun -np 4 a.out
    mpiexec -np 4 a.out   (the request goes to the scheduler, which launches the processes)

All 4 processes will execute the same code.
MPI – Message Passing Interface – specification of a library

To run an MPI (C) program on multiple machines:

1. Create the output file.
   Syntax: mpicc programName -o outputfile       Example: mpicc hello.c -o hello
2. Create a text file in which the IP addresses of all the machines are written. The first address must be your own.
   Syntax: kwrite machine_file_name              Example: kwrite machine
3. Give this file to the run command:
   Syntax: mpirun -machinefile machine_file_name -np no_of_processes outputfile
   Example: mpirun -machinefile machine -np 2 hello

Note: -np sets the number of processes. When running the program on multiple machines, it is necessary to store all the files at the same location on every machine.
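For reference, the machine file is just a plain-text list of addresses, one per line (the IP addresses below are placeholders, not real machines):

192.168.1.10    <- your own machine, listed first
192.168.1.11
192.168.1.12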
MPI basic functions/subroutines

#include <mpi.h>

• MPI_INIT: initialize MPI
• MPI_COMM_SIZE: how many processes?
• MPI_COMM_RANK: identify this process
• MPI_SEND: send a message
• MPI_RECV: receive a message
• MPI_FINALIZE: close MPI

All you need to know is these 6 calls; a sketch using all of them follows.
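A minimal sketch that exercises all six calls from C (the file name and the message value are illustrative assumptions):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 42;     /* 42 is just an example value */
    MPI_Status status;

    MPI_Init(&argc, &argv);                 /* initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* identify this process */

    if (rank == 0 && size > 1) {
        /* send one int with tag 0 to rank 1 */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive one int with tag 0 from rank 0 */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank %d received %d from rank 0\n", rank, token);
    }

    MPI_Finalize();                         /* close MPI */
    return 0;
}

Compile and run, e.g.: mpicc hello.c -o hello, then mpirun -np 2 ./hello.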
