Fdi 2008 Lecture4

4: Some Model Programming Tasks
John Burkardt1
1 Virginia Tech
10-12 June 2008
Burkardt Some Model Programming Tasks

Model Programming Tasks
Every program is different...however

There are some basic calculations that:
occur frequently in scientific programming;
are simple enough to analyze and understand;
illustrate issues parallel programming.

SUM, sum a list of 10,000 numbers
DOT, compute the dot product of two vectors
SORT, sort a list of 10,000 numbers
Rb
INT, estimate a f (x) dx
MC-RANDOM, generate 10,000 random numbers
MC-INT, Monte Carlo integration
MC-ISING, Monte Carlo magnetism simulation
MC-FIRE, Monte Carlo forest fire
MC-RADIATION, Monte Carlo radiation simulation
DE, solve an ordinary differential equation
DE-HEAT, model heat flow over time on a wire
DE-NBODY, N-body evolutions (molecules or stars)
DOT, SAXPY, MV, MM, multiplying vectors and matrices
JAC, solve a sparse linear system A*x=y
SGEFA, solve a dense linear system A*x=y
The SUM and DOT tasks
The SUM task: compute the sum of 10,000 numbers.

We’ll assume the numbers are available in a vector:
sum = x[0] + x[1] + x[2] + ... + x[9999]
It’s easy to see how multiple processors could help here!
The DOT task is similar:
dot = x[0] * y[0] + x[1] * y[1] + ... + x[9999]*y[9999]

SUM: The Sum Task in Shared Memory
The teacher writes down 0 on a piece of paper, the current sum.

She has a list of numbers, which she does not intend to add herself.
Each student may come to the front, copy the current sum
estimate and one number from the list (crossing it off), sit down,
add the number to the sum, and come back to the front with an
updated sum, which shall replace the value in the teacher’s sheet.
What could go wrong here?

SUM: The SUM Task in Distributed Memory
The teacher writes down 0 on a piece of paper, the current sum.

She divides her list of numbers up into roughly equal parts, and
reads out the numbers, assigning one sublist to each student.
Each student sums their numbers, and then shouts out their result.
The teacher tries to hear each result and add it to the running
total.
What could go wrong here?

SORT: The Sorting Task
The SORT task: sort 10,000 numbers.

If a single person or processor is involved, then sorting is
straightforward. There are fast ways (quicksort, heapsort) and slow
ways (bubblesort), but they all get there in the end.
When we have several people available to sort one list, can we
actually take advantage of the extra help?
(Merging is much faster than sorting.)

SORT: The Sorting Task
The SORT task is an example of a problem involving a certain

amount of conditional logic.
The fundamental task is to compare two quantities:
if X(I) < X(I+1) then interchange X(I) and X(J)
Sorting involves no arithmetic, just comparisons and interchanges,
but it is certainly not a “zero cost” operation!
Complicated conditional logic can make it difficult for the compiler
to do a good job of parallelizing your code.

INT: The Integration Task
Rb
The INT task: estimate a f (x) dx.
A naive way to approximate an integral multiplies the average of
many values of f (x) by the length of the interval (b − a).
Complications arise if f (x) is hard to evaluate, or if its behavior
varies in parts of the interval.
Another issue arises if we need an accurate estimate, and intend to
average as many values as necessary to get there.

MC-RANDOM: The Monte Carlo Randomization Task
Monte Carlo simulations analyze a situation by randomly sampling

parameters and averaging the outcomes.
This usually means computing a lot of “random” numbers;
How are random numbers generated? Two examples:
LC-RNG, Linear congruential random number generator
LF-RNG, Lagged Fibonacci random number generator
A random number generator has a period, after which it will

repeat its results. A long period is better!

The LC-RNG is often implemented in integer arithmetic:
I1 = ( A * I0 + B ) mod C ==> X1 = I1 / IMAX

I2 = ( A * I1 + B ) mod C ==> X2 = I2 / IMAX
I3 = ( A * I2 + B ) mod C ==> X3 = I3 / IMAX
There is a default value of I0, the “initial seed”.

The user can reset the seed at any time, which will cause the
random number sequence to jump back or ahead.

For us, the use of random number raises these issues:

efficiency 1000 values in 1 call vs 1000 calls for 1 value at a
time;
contention in a shared memory system, how are random
numbers shared?
coordination in a distributed memory system, you want your
random subsequences not to overlap.
period: for really big computations, you want a long period
on the sequence.

A big problem arises in shared memory computations.

The random number generator may store a seed value internally or
the user may supply it.
If seed is a single number, then multiple processes will be trying to
access it and modify it.
This can cause delays (if the processes are “polite”) or errors (if
processes read and write the seed chaotically).
This problem is fixable (but you have to notice it first!)

MC-INT: The Monte Carlo Integration Task
Rb
The MC-INT task: estimate a f (x) dx.
The integral can be estimated by evaluating f (x) at points
randomly selected in [a, b], and multiply the average value by the
length of the interval b − a.
A general multidimensional integral over D can be estimated in the
same way, by evaluating f (x, y , z) at points randomly selected
from D, and multiplying the average value by the volume of D.

MC-INT: The Monte Carlo Integration Task
Interesting features:
works the same in any spatial dimension.
works the same for irregular regions.
the computation includes an estimate of its accuracy.
no matter how many samples have been taken, the user can stop
with a valid estimate, or take more samples for an improved
estimate.
the computation is embarassingly parallel.

MC-ISING: The Monte Carlo Ising Model
To understand an observed physical process, it is helpful to find a

simplified model that seems to have some of the same properties.
In the Ising model of magnetism, each cell in a rectangular is
assigned an initial spin +1 or -1.
The total magnetism of the system is the sum of the spins.
In the ferromagnetic model, the energy is the negative of the sum
of neighboring spins:
X
E =− (si,j si,j+1 + si,j si+1,j )
i,j
(This clever sum counts all neighbor pairs just once, assuming
wraparound at the boundaries.)

Start with random spins in each cell, and total energy E .

Let T be a nonnegative “temperature”
Choose a cell at random. Compute E2 , the energy the system
would have if we reversed this single spin.
If E2 < E , reverse the spin always.

−(E2 −E
If E < E2 , reverse the spin with probability e T .
For 0 < T , the system will organize itself into regions of positive
and negative spin.


MC-FIRE: The Monte Carlo Forest Fire Model
A forest fire simulation allows each cell the states

alive
burning
dead
At each time step, a cell’s new state is computed:
if alive
if any neighbor is burning, this cell is burning
else, with probability B, spontaneously begin burning
if burning
become dead
if dead
with probability A, spontaneously become alive

MC-RADIATION: The Monte Carlo Radiation Model
Neutrons emitted at origin with random energy E.

A shield wall of thickness T.
A neutron will penetrate D units of shield, where D depends on
energy.
If still inside shield, a neutron is either absorbed (end of story) or
re-emitted with random direction and lower energy.
For a given level of radiation (say 10,000 particles a second), and
shield thickness T, what portion of the initial energy penetrates the
shield?
The Monte Carlo method was created because of the necessity of
making estimates of this kind!

DE: Differential Equations
The DE task: determine the values of x(t) over a range

t0 <= t <= t1 given an initial value x(t0 ) and a differential
equation
dx
= f (x, t) (1)
dt

DE: Differential Equations
A differential equation is solved one step at a time!

DE: An Ordinary Differential Equation
Because a differential equation is solved one step at a time, it

resists parallelization.
The solution data is like a list of values, where the n + 1-th value
depends on the n-th value.
But differential equations often include parts that can be
parallelized, often because there is a spatial dependence of the
data.

DE-HEAT: The Heat Differential Equation
The HEAT task: determine the values of H(x, t) over a range

t0 <= t <= t1 and space x0 <= x <= x1 , given an initial value
H(x, t0 ), boundary conditions, a heat source function f (x, t), and
a partial differential equation
∂H ∂2H
− k 2 = f (x, t)
∂t ∂x
Parallel processing is actually useful for this problem!

Choose a set of nodes in the X direction.

Assign the initial value of H at all the nodes.
Rewrite the differential equation at node i:
∂2H (H(i−1)−2H(i)+H(i+1))
replace ∂x 2
by ∆x 2
.
∂H HNEW (i)−H(i)
replace ∂t by ∆t

Now, at each node, you have a formula for computing the new
value of H(i) based on data at the old time.
The computations of the values at the next time are independent,
and can be done in parallel.
For a distributed memory version, perhaps using domain
decomposition, each processor lets its two neighbors know the
updated internal boundary values.


DE-NBODY: The N Body Differential System
The N Body task:
initial locations (molecules or stars)

properties (mass, charge)
velocities
Apply forces:
gravity
electric field
van der Waal attraction
Update locations and velocities.
Parallelizable, but need to communicate new locations frequently.

The DOT, SAXPY, MV and MM Tasks
The DOT, MV and MM tasks are very clean, very busy linear
algebra operations.
They are easy to analyze and program.
But even these ”almost perfectly straightforward” tasks can be
programmed well or poorly!
1 DOT: s = x ∗ y , (scalar result)

2 SAXPY: z = s ∗ x + y , (vector result)
3 MV : y = A ∗ x, (vector result)
4 MM: C = A ∗ B, (matrix result)

For vectors of length N, and matrices of order N by N, the FLOP

count is
DOT N
SAXPY N
MV N ∗ N
MM N ∗ N ∗ N

These linear algebra tasks are so mathematically simple that it’s

hard to see how a performance issue could arise.
But as an illustration, consider the two implementations of the
MV task.

MV in the I,J order
integer i , j , n
real a(n , n ) , x (n ) , y(n)
do i = 1 , n
y ( i ) = 0.0
do j = 1 , n
y( i ) = y( i ) + a( i , j ) ∗ x( j )
end do
end do

MV in the J,I order
integer i , j , n
real a(n , n ) , x (n ) , y(n)
do i = 1 , n
y ( i ) = 0.0
end do
do j = 1 , n
do i = 1 , n
y( i ) = y( i ) + a( i , j ) ∗ x( j )
end do
end do

JAC: The Jacobi Iteration Task
The JAC task is a sort of inverse of the MV task.

Given matrix A and vector y , find x so that y = A ∗ x.
We’ll assume A allows the use of Jacobi iteration.

JAC: The Jacobi Iteration Task
A simple version of one step of Jacobi iteration is as follows. We

start with an initial vector x.
1 In equation 1, use old values for x(2) through x(n), and solve
for new x(1);
2 In equation 2, use old values for x(1), and x(3) through x(n),
and solve for new x(2);
3 ...and so on until...
4 In equation n, use old values for x(1) through x(n − 1), and
solve for new x(n)
At the end of this process (and only then!) replace all the old
values by the new values.
Then repeat the solution step.

SGEFA: The Direct Linear Solution Task
The SGEFA task, like the JAC task, seeks x so that y = A ∗ x.

The all-purpose solution method is Gauss elimination.
1 Eliminate x(1) from equations 2 through n;

2 eliminate x(2) from equations 3 through n.
4 eliminate x(n − 1) from equation n.
5 solve equation n for x(n)
6 solve equation n − 1 for x(n − 1)
8 solve equation 1 for x(1).

SGEFA: The Direct Linear Solution Task
SGEFA searches through the data for maximum pivot values,

moves data to put the pivot row in a special place, and does a lot
of arithmetic, making it a “typical” scientific calculation.
For an N by N matrix, the FLOP count is
2N 3 /3 for the elimination,
N 2 for the back substitution;
The case N = 100 was the original LINPACK Benchmark. Soon

N = 1000 gave more reliable results. The current distributed
memory version, HPL, lets N increase to suit the system.

In our discussions and exercises, I will refer to some of the tasks we

have mentioned.
Many scientific calculations have parts that are understandable in
terms of these model tasks.
Understanding these simple tasks will help you to identify
corresponding features of your own programs, and thus help you
apply parallel programming ideas.

Fdi 2008 Lecture4

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Fdi 2008 Lecture4

Uploaded by

Copyright:

Available Formats

4: Some Model Programming Tasks

10-12 June 2008

Burkardt Some Model Programming Tasks

Every program is different...however

Burkardt Some Model Programming Tasks

The SUM task: compute the sum of 10,000 numbers.

Burkardt Some Model Programming Tasks

The teacher writes down 0 on a piece of paper, the current sum.

Burkardt Some Model Programming Tasks

The teacher writes down 0 on a piece of paper, the current sum.

Burkardt Some Model Programming Tasks

The SORT task: sort 10,000 numbers.

Burkardt Some Model Programming Tasks

The SORT task is an example of a problem involving a certain

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

Monte Carlo simulations analyze a situation by randomly sampling

A random number generator has a period, after which it will

Burkardt Some Model Programming Tasks

The LC-RNG is often implemented in integer arithmetic:

I1 = ( A * I0 + B ) mod C ==> X1 = I1 / IMAX

There is a default value of I0, the “initial seed”.

Burkardt Some Model Programming Tasks

For us, the use of random number raises these issues:

Burkardt Some Model Programming Tasks

A big problem arises in shared memory computations.

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

To understand an observed physical process, it is helpful to find a

Burkardt Some Model Programming Tasks

Start with random spins in each cell, and total energy E .

If E2 < E , reverse the spin always.

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

A forest fire simulation allows each cell the states

Burkardt Some Model Programming Tasks

Neutrons emitted at origin with random energy E.

Burkardt Some Model Programming Tasks

The DE task: determine the values of x(t) over a range

Burkardt Some Model Programming Tasks

A differential equation is solved one step at a time!

Burkardt Some Model Programming Tasks

Because a differential equation is solved one step at a time, it

Burkardt Some Model Programming Tasks

The HEAT task: determine the values of H(x, t) over a range

Burkardt Some Model Programming Tasks

Choose a set of nodes in the X direction.

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

The N Body task:

initial locations (molecules or stars)

Burkardt Some Model Programming Tasks

1 DOT: s = x ∗ y , (scalar result)

Burkardt Some Model Programming Tasks

For vectors of length N, and matrices of order N by N, the FLOP

Burkardt Some Model Programming Tasks

These linear algebra tasks are so mathematically simple that it’s

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks

Burkardt Some Model Programming Tasks