
/* Histogram update with locks */

#include <stdio.h>
#include <omp.h>

#define INP_SIZE  (1 << 26)
#define HIST_SIZE (1 << 20)

omp_lock_t lock[HIST_SIZE];
int hist[HIST_SIZE];
int inp[INP_SIZE];

int main(int argc, char *argv[])
{
    int i, key, sum = 0;
    double t1, t2;

    for (i = 0; i < HIST_SIZE; i++)
        omp_init_lock(&(lock[i]));
    /* Initialize inp to random values and hist entries to 0 */
    t1 = omp_get_wtime();
    #pragma omp parallel for private(key)
    for (i = 0; i < INP_SIZE; i++) {
        key = inp[i];
        omp_set_lock(&(lock[key]));    /* one lock per histogram entry */
        hist[key]++;
        omp_unset_lock(&(lock[key]));
    }
    t2 = omp_get_wtime();
    for (i = 0; i < HIST_SIZE; i++)
        omp_destroy_lock(&(lock[i]));
    /* Add up hist entries in sum */
    printf("Sum=%d. Time=%f\n", sum, t2 - t1);
}

Output:
Sum=67108864. Time=3.425    (4 threads)
Sum=67108864. Time=0.522617 (64 threads)
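Not on the slide: with GCC, OpenMP programs like this are built with the -fopenmp flag, e.g. gcc -fopenmp hist_locks.c, where the file name hist_locks.c is an assumed example.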
Bucket locks

Advantage
• Per-entry locking needs 2^20 locks, far too many to fit in the cache.
• Instead, take a block of entries, called a bucket, and associate one lock with each bucket; with far fewer locks, the lock array fits in the cache, so the cache can be used effectively (see the sketch below).

Disadvantage
• If two threads try to update entries in the same bucket, one has to wait.
• They contend for the same lock.
• Hence parallelism is reduced.
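A minimal sketch of the bucket-locking idea, assuming a bucket count NUM_BKTS and a simple contiguous entry-to-bucket mapping (both the names and the mapping are assumptions, not from the slide):

#include <omp.h>

#define HIST_SIZE (1 << 20)
#define NUM_BKTS  1024                       /* assumed bucket count */
#define BKT_SIZE  (HIST_SIZE / NUM_BKTS)     /* entries per bucket */

omp_lock_t bkt_lock[NUM_BKTS];               /* one lock per bucket */
int hist[HIST_SIZE];

/* Initialize the bucket locks once, before the parallel region. */
void init_bucket_locks(void)
{
    for (int b = 0; b < NUM_BKTS; b++)
        omp_init_lock(&bkt_lock[b]);
}

/* Lock the whole bucket that contains this entry, then update it. */
void hist_update(int key)
{
    omp_set_lock(&bkt_lock[key / BKT_SIZE]);
    hist[key]++;
    omp_unset_lock(&bkt_lock[key / BKT_SIZE]);
}

Varying NUM_BKTS trades the lock array's cache footprint against contention; that trade-off is what the timings below measure.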
Running the same histogram code with bucket locks, for varying numbers of buckets (times in seconds):

Num Bkts    4 threads    64 threads
2           18.7102
4           14.7779
8            8.9611
16           8.15825     102.275
32           7.86844
64           6.73602
128          6.01224
256          4.49584       9.11782
512          3.63246
1024         3.5995        3.0454
32768                      0.796329
262144                     0.548705
1048576      3.425         0.522617

Still not linear speedup.
/* Histogram update with atomic */

#include <stdio.h>
#include <omp.h>

#define INP_SIZE  (1 << 26)
#define HIST_SIZE (1 << 20)

int hist[HIST_SIZE];
int inp[INP_SIZE];

int main(int argc, char *argv[])
{
    int i, key, sum = 0;
    double t1, t2;

    /* Initialize inp to random values and hist entries to 0 */
    t1 = omp_get_wtime();
    #pragma omp parallel for private(key)
    for (i = 0; i < INP_SIZE; i++) {
        key = inp[i];
        #pragma omp atomic    /* a "mini critical section" on one location */
        hist[key]++;
    }
    t2 = omp_get_wtime();
    /* Add up hist entries in sum */
    printf("Sum=%d. Time=%f\n", sum, t2 - t1);
}

What atomic says: the following memory update will be performed atomically, for this memory location. It is not a critical section around a piece of code; it applies to a single memory location, and it supports only a fixed set of operations, such as memory-update operations.

Th-1: hist[1]++    Th-2: hist[2]++
These are two different memory locations, hence both updates are allowed to proceed at the same time.
Running the atomic version:
Sum=67108864. Time=0.769491 (4 threads)
Sum=67108864. Time=0.061616 (64 threads)

• So critical sections and locks have overheads.
• Atomic applies only to a memory location, and only for a limited set of operations.
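As an illustration of that limited set (a sketch with made-up variable names, not from the slides): atomic covers single memory-update operations such as x++ or x += expr, while anything longer still needs a critical section.

#include <omp.h>

int x = 0;

void update(void)
{
    #pragma omp atomic      /* OK: one update of one memory location */
    x += 2;

    #pragma omp critical    /* read-test-write spans several operations,
                               so atomic cannot express it */
    {
        if (x > 100)
            x = 0;
    }
}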
Distributed Memory Applications

MPI – Message Passing Interface – specification of a library

• Different platforms had different kinds of libraries built; MPI specifies one common interface.
• Programs are portable: if you build a program for one distributed system, you can carry it to another.
• If you make an assumption that is not in the specification, the program may work on one platform and not on another.
• Here we use the message-passing parallel programming model, used in distributed-memory systems.
• It can be used in shared-memory systems as well.
MPI – Message Passing Interface – specification of a library

• Each process has its own address space; nothing is shared.
• Hence processes rely on message passing to communicate.
• Distributed-memory systems are very large-scale systems organized as nodes connected by some interconnection network.
MPI – Message Passing Interface – specification of a library

• So MPI is suitable for such environments.
• But we can also run it on a single node, with multiple ranks on that node; they are launched as separate processes, each with its own address space.
• Its biggest strength is portability.
MPI – Message Passing Interface – specification of a library

• How to compile MPI programs:
    mpicc for C
    mpic++ or mpicxx for C++

• How to run MPI programs:
    mpirun -np 4 a.out
    mpiexec -np 4 a.out   (the request goes to the scheduler, which launches the processes)

All 4 processes will execute the same code.
MPI – Message Passing Interface – specification of a library

To run an MPI (C) program on multiple machines:

1. Create the output file.
   Syntax: mpicc programName -o outputfile       Example: mpicc hello.c -o hello
2. Create a text file in which the IP addresses of all the machines are written. The first address must be your own.
   Syntax: kwrite machine_file_name              Example: kwrite machine
3. Give this file to the run command:
   Syntax: mpirun -machinefile machine_file_name -np no_of_processes outputfile
   Example: mpirun -machinefile machine -np 2 hello

Note: -np sets the number of processes. When running the program on multiple machines, it is necessary to store all the files at the same location on every machine.
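For reference, the machine file is just a plain-text list of addresses, one per line (the IP addresses below are placeholders, not real machines):

192.168.1.10    <- your own machine, listed first
192.168.1.11
192.168.1.12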
MPI basic functions/subroutines

#include <mpi.h>

• MPI_INIT: initialize MPI
• MPI_COMM_SIZE: how many processes?
• MPI_COMM_RANK: identify this process
• MPI_SEND: send a message
• MPI_RECV: receive a message
• MPI_FINALIZE: close MPI

All you need to know is these 6 calls; a sketch using all of them follows.
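A minimal sketch that exercises all six calls from C (the file name and the message value are illustrative assumptions):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 42;     /* 42 is just an example value */
    MPI_Status status;

    MPI_Init(&argc, &argv);                 /* initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* identify this process */

    if (rank == 0 && size > 1) {
        /* send one int with tag 0 to rank 1 */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive one int with tag 0 from rank 0 */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank %d received %d from rank 0\n", rank, token);
    }

    MPI_Finalize();                         /* close MPI */
    return 0;
}

Compile and run, e.g.: mpicc hello.c -o hello, then mpirun -np 2 ./hello.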
