
5: Introductory OpenMP

John Burkardt
Virginia Tech

10-12 June 2008



Introductory OpenMP

OpenMP enables parallel execution on shared memory systems.


The user inserts special comment statements, which are actually
OpenMP directives.
These directives indicate which parts of the program can be done
in parallel, and whether any data needs special treatment.
The user decides whether the OpenMP directives should be used
(for parallel execution) or ignored (in which case the program runs
sequentially).



Introductory OpenMP

OpenMP enables cooperative execution of a user program by processors sharing a common memory.

OpenMP works well with:

multicore chips;
NUMA systems: multiple chips with separate, but linked, memories.

OpenMP can also be used in combination with MPI, combining the power of fast chips and networking, but this is an advanced topic!



Introductory OpenMP
OpenMP is an efficient language for modern multicore chips.



Introductory OpenMP

OpenMP is available for C, C++, FORTRAN77 and FORTRAN90 programmers.
OpenMP is mainly a series of “comments” to the language compiler, pointing out opportunities for parallel execution.
At compile time, the user declares whether the program should be prepared for sequential or parallel execution.
If the program is compiled for parallel execution, many details are only decided at execution time (how many processors will be used, and which processors do what).
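As a concrete sketch (assuming the GNU compiler and a Unix-style shell; the file name myprog.c is hypothetical):

gcc -fopenmp myprog.c -o myprog      # directives honored: parallel executable
gcc          myprog.c -o myprog      # same source: directives ignored
OMP_NUM_THREADS=4 ./myprog           # thread count chosen at execution time

OMP_NUM_THREADS is the standard OpenMP environment variable for requesting a number of threads at run time.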



Introductory OpenMP

Glendower: “I can call spirits from the vasty deep.”
Hotspur: “Why, so can I, or so can any man. But will they come when you do call for them?”
Shakespeare, Henry IV, Part 1.

Any C/C++/Fortran program can “call” OpenMP...
...but you only get parallel code if the compiler supports OpenMP.
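A portable way to find out is the standard _OPENMP preprocessor symbol, which a compiler defines when OpenMP support is enabled. A minimal sketch:

#include <stdio.h>

int main ( int argc, char *argv[] )
{
#ifdef _OPENMP
  printf ( "OpenMP is enabled (_OPENMP = %d).\n", _OPENMP );
#else
  printf ( "OpenMP is not enabled; the program runs sequentially.\n" );
#endif
  return 0;
}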



Compilers Support OpenMP

Compiler writers do support OpenMP:

Gnu gcc/g++ 4.2, gfortran 2.0;
IBM xlc, xlf;
Intel icc, ifort;
Microsoft Visual C++ (2005 Professional edition);
Portland C/C++/Fortran, pgcc, pgf95;
Sun Studio C/C++/Fortran.
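Each compiler turns on OpenMP with its own flag; typical invocations (flags vary by version, so check your compiler's documentation) look like:

gcc -fopenmp myprog.c          (Gnu)
icc -openmp myprog.c           (Intel)
xlc -qsmp=omp myprog.c         (IBM)
pgcc -mp myprog.c              (Portland)
cc -xopenmp myprog.c           (Sun Studio)
cl /openmp myprog.c            (Microsoft)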



The OpenMP “Language”

OpenMP can be divided into five parts:

parallel control structures;
data classification (private/shared);
environment inquiry and setting;
work sharing;
synchronization.

We will concentrate on the first three parts.
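As a first taste of the parallel control and environment parts, here is a minimal sketch using functions from the standard OpenMP runtime library:

#include <stdio.h>
#include <omp.h>

int main ( int argc, char *argv[] )
{
  omp_set_num_threads ( 4 );              /* environment setting */

#pragma omp parallel                      /* parallel control */
  {
    int id = omp_get_thread_num ( );      /* environment inquiry */
    printf ( "Thread %d of %d says hello.\n",
      id, omp_get_num_threads ( ) );
  }
  return 0;
}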



OpenMP Ideas: Threads

OpenMP uses the idea of threads to describe the multiple agents that are cooperating.
A thread is an entity that can carry out some part of a program,
with its own list of tasks and data.
In any computation, a single master thread begins the
computation.
When a parallel loop is encountered, the master thread calls up
more threads. Technically, the single thread of execution forks into
multiple threads.
The threads cooperate until the parallel work is done. Then the
multiple threads join into a single thread again.



OpenMP Ideas: Loops

OpenMP assumes that the parallel work is done in the for loops
or do loops in the program.
The user marks which loops are to be executed in parallel.
So the program begins as a single thread of execution.
When a parallel loop is reached, extra threads of execution are
generated.
Each thread will execute some loop iterations.
When the loop is complete, the extra threads disappear.



OpenMP Ideas: Shared and Private Data

In the ideal case, each iteration of the loop uses data in a way that
doesn’t depend on other iterations. Loosely, this is the meaning of
the term shared data.
For instance, in the SAXPY task, each iteration is
y(i) = s * x(i) + y(i)
You can imagine that any number of processors could cooperate in
a calculation like this, with no interference.
We will start by assuming that all data can be shared, and wait for
reality to correct us!
This is OpenMP’s default assumption, as well.



SAXPY: Sequential Version

#include <stdlib.h>
#include <stdio.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double x[1000], y[1000], s;

  s = 123.456;

  for ( i = 0; i < n; i++ )
  {
    x[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
    y[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
  }

  for ( i = 0; i < n; i++ )
  {
    y[i] = y[i] + s * x[i];
  }
  return 0;
}



The SAXPY task

The SAXPY task adds a multiple of vector X to vector Y.

The arrays X and Y can be shared, because only the thread associated with loop index I needs to look at the I-th entries.
Each thread will need to know the value of S, but they can all agree on what that value is. (They “share” the same value.)
This is a “perfect” parallel application: no private data, no memory contention.
SAXPY is a common low-level task, as in Gaussian elimination.



SAXPY: with OpenMP Directives

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double x[1000], y[1000], s;

  s = 123.456;

  for ( i = 0; i < n; i++ )
  {
    x[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
    y[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
  }

#pragma omp parallel for
  for ( i = 0; i < n; i++ )
  {
    y[i] = y[i] + s * x[i];
  }
  return 0;
}



OpenMP C Syntax

The #pragma omp string is a marker that indicates to the compiler that this is an OpenMP directive.
The parallel directive indicates that parallel execution is being requested.
The for directive indicates that the parallel execution will take place in a for loop.



OpenMP Fortran Syntax

The rules for Fortran are similar.

The marker string is !$omp.
The parallel directive indicates that parallel execution is being requested.
The do directive indicates that the parallel execution will take place in a do loop.
In Fortran, but not C, the end of the parallel loop is marked by a matching !$omp end parallel do directive.



SAXPY: with OpenMP Directives

program main

  integer i, n
  double precision x(1000), y(1000), s

  n = 1000
  s = 123.456

  do i = 1, n
    x(i) = rand ( )
    y(i) = rand ( )
  end do

!$omp parallel do
  do i = 1, n
    y(i) = y(i) + s * x(i)
  end do
!$omp end parallel do

  stop
end



The SAXPY task - How To Think About Threads

To visualize parallel execution, suppose 4 threads will execute the SAXPY loop.
OpenMP might assign the first 250 iterations to thread 1, and so
on.
Then you also have to imagine that the four threads each execute
their loop more or less simultaneously.
Even this simple model of what’s going on will suggest some of the
things that can go wrong in a parallel program!



The SAXPY loop, as OpenMP might think of it

if ( thread_id == 1 ) then
  do i = 1, 250
    y(i) = y(i) + s * x(i)
  end do
else if ( thread_id == 2 ) then
  do i = 251, 500
    y(i) = y(i) + s * x(i)
  end do
else if ( thread_id == 3 ) then
  do i = 501, 750
    y(i) = y(i) + s * x(i)
  end do
else if ( thread_id == 4 ) then
  do i = 751, 1000
    y(i) = y(i) + s * x(i)
  end do
end if



The SAXPY task - Comments

What about the loop that initializes X and Y? The problem here is that we’re calling the rand function.
Normally, inside a parallel loop, you can call a function and it will
also run in parallel. However, the function cannot have side effects.
The rand function is a special case; it has an internal “static” or
“saved” variable whose value is changed and remembered
internally.
Getting random numbers in a parallel loop requires care. We will
leave this topic for later discussion.
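As a preview, one common workaround (a sketch assuming the POSIX rand_r function is available) gives each thread its own seed, so no hidden state is shared between threads:

#include <stdlib.h>
#include <omp.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double x[1000];

#pragma omp parallel
  {
    /* Each thread owns a separate seed variable. */
    unsigned int seed = 123456789 + omp_get_thread_num ( );

#pragma omp for
    for ( i = 0; i < n; i++ )
    {
      x[i] = ( double ) rand_r ( &seed ) / ( double ) RAND_MAX;
    }
  }
  return 0;
}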



DOT: Sequential Version

#include <stdlib.h>
#include <stdio.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double x[1000], y[1000], xdoty;

  for ( i = 0; i < n; i++ )
  {
    x[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
    y[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
  }

  xdoty = 0.0;
  for ( i = 0; i < n; i++ )
  {
    xdoty = xdoty + x[i] * y[i];
  }
  printf ( "XDOTY = %e\n", xdoty );
  return 0;
}



The DOT task

The DOT task seems similar to the SAXPY task, but there’s an
important difference:
The arrays X and Y can still be shared: only the thread associated with loop index I needs to look at the I-th entries.
But the results of every computation are now added to the same, single variable, XDOTY.
The threads doing iterations I1 and I2 will both want to “read” the current copy of XDOTY, add their increment to it, and “write” the updated copy.
Uncontrolled sequences of reads and writes cause errors!



The DOT task

The dot product is one example of a reduction operation. Examples include computing:

the sum of the entries of a vector;
the product of the entries of a vector;
the maximum or minimum of a vector;
the Euclidean norm of a vector.

Reduction operations, if recognized, can be carried out in parallel.
In OpenMP, this is done with a reduction declaration, which tells the compiler what it needs to know so that the result is computed in an orderly and correct way.



DOT: with OpenMP directives

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double x[1000], y[1000], xdoty;

  for ( i = 0; i < n; i++ )
  {
    x[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
    y[i] = ( double ) rand ( ) / ( double ) RAND_MAX;
  }

  xdoty = 0.0;
#pragma omp parallel for reduction ( + : xdoty )
  for ( i = 0; i < n; i++ )
  {
    xdoty = xdoty + x[i] * y[i];
  }
  printf ( "XDOTY = %e\n", xdoty );
  return 0;
}



OpenMP Ideas: Private and Shared Data

OpenMP assumes that most of the problem data is shared; that is, there is a single copy, to which all threads have access.
If data is only “read”, it’s obviously shared.
Some data may be shared even though it is written. An example is
an array A. If entry A[I] is only written during loop iteration I,
then the array A can probably remain shared.
Examples of private data include temporary variables that are
assigned during a loop iteration.
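For instance, a scratch value like T below must be declared private, so that each thread gets its own copy (a minimal sketch; the arrays and arithmetic are illustrative):

#include <omp.h>

int main ( int argc, char *argv[] )
{
  int i, n = 1000;
  double t, x[1000], y[1000];

  for ( i = 0; i < n; i++ )
  {
    x[i] = ( double ) i;
  }

#pragma omp parallel for private ( t )
  for ( i = 0; i < n; i++ )
  {
    t = x[i] * x[i];    /* T is a temporary: each thread needs its own copy */
    y[i] = t + 1.0;     /* Y[I] is written only by iteration I, so Y stays shared */
  }
  return 0;
}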



The PRIME SUM task

The PRIME SUM task will illustrate the concept of private and
shared variables.
Our task is to compute the sum of the prime numbers from 2 to N.
A natural formulation stores the result in TOTAL, then checks
each number I from 2 to N.
To check if the number I is prime, we ask whether it can be evenly
divided by any of the numbers J from 2 to I − 1.
We can use a temporary variable PRIME to help us.



The PRIME SUM task: Sequential Version
#include <cstdlib>
#include <iostream>
using namespace std;

int main ( int argc, char *argv[] )
{
  int i, j, total;
  int n = 1000;
  bool prime;

  total = 0;
  for ( i = 2; i <= n; i++ )
  {
    prime = true;

    for ( j = 2; j < i; j++ )
    {
      if ( i % j == 0 )
      {
        prime = false;
        break;
      }
    }
    if ( prime )
    {
      total = total + i;
    }
  }
  cout << "PRIME SUM (2:" << n << ") = " << total << "\n";
  return 0;
}



The PRIME SUM Task: Eliminating Data Conflicts!

Let’s imagine we parallelize the I loop. So each thread:

works on an integer I;
initializes PRIME to be TRUE;
checks whether any J can divide I, and resets PRIME if necessary;
if PRIME ends up TRUE, adds I to TOTAL.

The variables J, PRIME and TOTAL represent possible data conflicts that we must resolve.



PRIME SUM: With OpenMP Directives
#include <cstdlib>
#include <iostream>
using namespace std;

int main ( int argc, char *argv[] )
{
  int i, j, n = 1000, total = 0;
  bool prime;

#pragma omp parallel for private ( i, prime, j ) shared ( n ) \
  reduction ( + : total )
  for ( i = 2; i <= n; i++ )
  {
    prime = true;

    for ( j = 2; j < i; j++ )
    {
      if ( i % j == 0 )
      {
        prime = false;
        break;
      }
    }
    if ( prime )
    {
      total = total + i;
    }
  }
  cout << "PRIME SUM (2:" << n << ") = " << total << "\n";
  return 0;
}



STEPS

For the STEPS task, we suppose we want to evaluate a function at equally spaced points in a rectangle.
A common way starts (X,Y) at (0,0) and then increments X by DX. If we exceed the maximum value of X, we reset X to 0 and increment Y by DY.
In sequential computing, this is a natural way to “visit” every
point.
This simple idea won’t work in parallel without some changes.
Each thread needs a separate copy of (X,Y), but the value of
these variables depends on having all the previous iterations.
This is a simple example of a common problem, data
dependence. Some cases are curable, some not.



STEPS: Sequential Version

program main

  integer i, j, m, n
  real dx, dy, total, x, y

! (Assume m, n, dx and dy have been set, and a function f(x,y) is defined.)

  total = 0.0
  y = 0.0
  do j = 1, n
    x = 0.0
    do i = 1, m
      total = total + f ( x, y )
      x = x + dx
    end do
    y = y + dy
  end do

  stop
end



STEPS

To make the loop iterations independent, we could either:

precompute X(1:M) and Y(1:N) in arrays, or
notice that X = I/M and Y = J/N.

The second solution is simple, and saves us a separate preparation loop and extra storage.
The first solution, converting some temporary scalar variables to vectors and precomputing them, may be able to help you parallelize a stubborn loop (a sketch of this idea appears below).
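Here is a sketch of that first solution in C (the arrays xv and yv, the problem sizes, and the function f are illustrative assumptions):

#include <omp.h>

#define M 10
#define N 10

double f ( double x, double y )    /* an illustrative integrand */
{
  return x * x + y * y;
}

int main ( int argc, char *argv[] )
{
  int i, j;
  double total = 0.0, xv[M], yv[N];

/*  Precompute the coordinates, so every (I,J) iteration is independent. */
  for ( i = 0; i < M; i++ ) xv[i] = ( double ) ( i + 1 ) / ( double ) M;
  for ( j = 0; j < N; j++ ) yv[j] = ( double ) ( j + 1 ) / ( double ) N;

#pragma omp parallel for private ( i ) reduction ( + : total )
  for ( j = 0; j < N; j++ )
  {
    for ( i = 0; i < M; i++ )
    {
      total = total + f ( xv[i], yv[j] );
    }
  }
  return 0;
}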



STEPS: With OpenMP directives

program main

  integer i, j, m, n
  real total, x, y

! (Assume m and n have been set, and a function f(x,y) is defined.)

  total = 0.0
!$omp parallel do private ( i, j, x, y ) shared ( m, n ) reduction ( + : total )
  do j = 1, n
    y = j / real ( n )
    do i = 1, m
      x = i / real ( m )
      total = total + f ( x, y )
    end do
  end do
!$omp end parallel do

  stop
end



STEPS

Another issue pops up in the STEPS program. What happens when you call the function f(x,y) inside the loop?
Notice that f is not a variable, it’s a function, so it is not declared
private or shared.
The function might have internal variables, loops, might call other
functions, and so on.
OpenMP works in such a way that a function called within a
parallel loop will also participate in the parallel execution. We
don’t have to make any declarations about the function or its
internal variables at all.
Each thread runs a separate copy of f.
(But if f includes static or saved variables, trouble!)
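Here is a sketch (the function is hypothetical) of the kind of trouble a static variable causes:

double f ( double x, double y )
{
  static int calls = 0;   /* ONE copy, shared by ALL the threads! */

  calls = calls + 1;      /* unsynchronized updates: a race condition */

  return x * x + y * y;
}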



SUMMARY

I hope you have started to become familiar with some of the basic
concepts of OpenMP.
We will come back to OpenMP in more detail for:

a few more useful features of the language;
some obstacles to parallelization;
how to get or set the number of threads;
how to measure the improvement in performance (a small preview appears below).
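As a small preview of those last two items, a sketch using two standard runtime functions:

#include <stdio.h>
#include <omp.h>

int main ( int argc, char *argv[] )
{
  double wtime;

  omp_set_num_threads ( 4 );        /* request 4 threads */

  wtime = omp_get_wtime ( );        /* start the wall-clock timer */

  /* ... the parallel work would go here ... */

  wtime = omp_get_wtime ( ) - wtime;
  printf ( "Work took %f seconds.\n", wtime );
  return 0;
}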

