/* #pragma omp task depend(in: <list>) depend(out: <list>) depend(inout: <list>) */
• A() : depend ( out: x )
• B() : depend ( out: y )
• C() : depend ( out: z )
• E() : depend ( in: x, w )
Matrix Multiplication using tasks
We have already seen matrix multiplication: A x B = C.
Blocked algorithm (assume n is a multiple of b):
n x n matrices, b = block size
Here we create a task for every pair of blocks (i, k) of A and (k, j) of B, which updates block (i, j) of C, for k = 1, 2, 3, ….
• We create tasks out of every pair of blocks in A and B, not one task per block of C
• So the same C block gets updated by multiple tasks
-----
/* Two different critical sections */
-----
#pragma omp parallel default( shared )
{
----
    #pragma omp for
    for ( i = 0 ; i < ARR_SIZE ; i++) {
        psum += a[i]; pprod *= a[i];
    }
    #pragma omp critical (section1)
    {
        printf( "In CS 1\n" );
        for ( j = 0 ; j < 100000000; j++)
            sum += psum;
        printf( "Out CS 1\n" );
    }
    #pragma omp critical (section2)
    {
        printf( "In CS 2\n" );
        for ( j = 0 ; j < 100000000; j++)
            prod += pprod;
        printf( "Out CS 2\n" );
    }
}
• Naming the critical sections ensures that no two threads can be inside CS-1 at the same time, and no two threads inside CS-2 at the same time; but one thread in CS-1 while another is in CS-2 at the same time is OK
• Here we have given names to the critical sections
• What if the number of critical sections we need depends on the number of elements, or on the size of the data we have?
• If it is not known statically, we cannot name the critical sections in advance.
t1 = omp_get_wtime();
#pragma omp parallel for private ( key )
for ( i = 0 ; i < INP_SIZE ; i++)
{
key = inp[i] ;
hist[key]++;
}
t2 = omp_get_wtime();
-----/* Add up hist entries in sum */
• Each thread looks at its elements, extracts the key, and goes to update that
particular histogram entry
• If two different threads read the same integer value in the input, possibly at
different locations, they end up with the same value of key
• Then they may update the same histogram entry at the same time, hence a
race condition
/* Histogram Updation */
#include <omp.h>
#include <stdio.h>
#define INP_SIZE (1 << 26)
#define HIST_SIZE (1 << 20)
int hist[HIST_SIZE] ;
int inp[INP_SIZE] ;
int main( int argc, char *argv[] )
{
    int i, key, sum=0; double t1, t2;
    ---- /* Initialize inp to random values and hist entries to 0 */
    t1 = omp_get_wtime();
    #pragma omp parallel for private ( key )
    for ( i = 0 ; i < INP_SIZE ; i++)
    {
        key = inp[i] ;
        hist[key]++;
    }
    t2 = omp_get_wtime();
    -----/* Add up hist entries in sum */
}
Sum=67108864 (= 2^26). Time=2.93 (1 thread)
Sum=67104683. Time=…. (4 threads) (race condition)
/* Histogram Updation with critical */
#define INP_SIZE (1 << 26)
#define HIST_SIZE (1 << 20)
int hist[HIST_SIZE] ;
int inp[INP_SIZE] ;
-----
t1 = omp_get_wtime();
#pragma omp parallel for private ( key )
for ( i = 0 ; i < INP_SIZE ; i++) {
    key = inp[i] ;
    #pragma omp critical
    hist[key]++;
}
t2 = omp_get_wtime();
/* Histogram Updation with locks */
-----
omp_lock_t lock[HIST_SIZE] ;   /* one lock variable per entry: 2^20 locks */
int main( int argc, char *argv[] )
{
    int i, key, sum=0; double t1, t2;
    for ( i = 0 ; i < HIST_SIZE ; i++)   /* initialize the locks */
        omp_init_lock( &(lock[i]) );
    ---- /* Initialize inp to random values and hist entries to 0 */
}
• With locks, different threads are able to hold their respective locks
simultaneously; the critical section did not allow different threads to
enter that region of code at the same time
• Here, if different threads are working on different elements, they can
at least enter the code concurrently
• So we get that advantage.
Disadvantage:
• We need HIST_SIZE (2^20) lock variables, and we pay the cost of initializing all of them up front.