
HPC - Report on the MPI assignment

Andrea Bricola

January 2, 2023
Chapter 1

Report

Background on the number π

[Figure 1.1]

Pi is a mathematical constant, equal to the ratio of the circumference of a circle to its diameter. Pi can be computed as the integral of the following function over the interval [0, 1]:
$$\pi = \int_0^1 f(x)\,dx, \qquad f(x) = \frac{4}{1+x^2}$$
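This identity holds because the integrand has the antiderivative $4\arctan(x)$:
$$\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(x)\Big|_0^1 = 4\cdot\frac{\pi}{4} = \pi.$$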

Pi computation
The computational problem consists of dividing the integral into $10^{10}$ intervals and calculating the Riemann sum, which approximates the value of the integral with adequate precision. The Riemann-sum method divides the domain of a function into many subintervals, computes the area over each subinterval, and finally approximates the integral as the sum of all these areas.
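A minimal serial sketch of this computation in C, assuming the midpoint rule as the choice of sample point (the interval count here is reduced for a quick run; names and constants are illustrative):

    #include <stdio.h>

    /* f(x) = 4 / (1 + x^2); its integral over [0,1] equals pi */
    static double f(double x) {
        return 4.0 / (1.0 + x * x);
    }

    int main(void) {
        const long long n = 10000000LL;    /* subinterval count (illustrative; the real run uses 10^10) */
        const double h = 1.0 / (double)n;  /* width of each subinterval */
        double sum = 0.0;

        /* midpoint rule: each rectangle has area h * f(midpoint) */
        for (long long i = 0; i < n; i++) {
            double x = ((double)i + 0.5) * h;
            sum += f(x);
        }
        printf("pi ~= %.15f\n", sum * h);
        return 0;
    }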

[Figure 1.2]

The program computes an approximation of π with precision to the 8th decimal digit.

Integral computation with MPI

[Figure 1.3]

The idea for making the computation faster is to distribute the intervals of the Riemann sum among processing units that work in parallel, so that each processor obtains the sum over a subset of the intervals; the global sum is then obtained by aggregating the subset sums. In order to aggregate the local sums I used the procedure MPI_Reduce(): each process sends a message with its local sum to the root process, and the root process computes the total integral.
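A simplified sketch of this scheme (the round-robin assignment of intervals to ranks is one possible choice, not necessarily the exact layout used in the experiments):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long long n = 1000000000LL;   /* illustrative interval count */
        const double h = 1.0 / (double)n;
        double local_sum = 0.0;

        /* each rank sums the subintervals i = rank, rank + size, rank + 2*size, ... */
        for (long long i = rank; i < n; i += size) {
            double x = ((double)i + 0.5) * h;
            local_sum += 4.0 / (1.0 + x * x);
        }

        /* every rank sends its local sum; rank 0 receives the grand total */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi ~= %.15f\n", global_sum * h);

        MPI_Finalize();
        return 0;
    }

Such a program is built with mpicc and launched with, e.g., mpiexec -n 256 ./pi (file name and process count illustrative).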

Experiments with MPI


Hardware
The experiments were conducted on the OCAPIE computer cluster (Ottimizzazione di CAlcolo Parallelo Intensivo Applicate a problematiche in ambito Energetico). There are 8 available nodes, and each machine has a Xeon Phi processor with 64 cores that supports hyperthreading, for a total of 256 logical cores per node.

1.0.1 Measuring performance


An MPI program involves the initialization of processes inside the local operating system and possibly on remote computers, so it requires a certain amount of time before the processes are initialized and can read their own rank. After compiling a multiprocess program with mpicc, I executed it with mpiexec and observed a delay of a few seconds before the processes started and could print their rank to standard output. This delay was around 12 seconds when I initialized 64 processes (one per core) on each of the 8 available nodes of the computer cluster. The initialization time is a parallelization overhead which cannot be removed.

In order to measure the computation time of the integral, I inserted in my code a chronometer which starts before the parallel computation of the intervals and stops once the root process has received the local sums and computed the total integral.
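A minimal sketch of this chronometer, assuming MPI_Wtime() as the timing call (the specific timer is an assumption) and dropping it into the MPI program sketched earlier:

    /* start the chronometer just before the parallel interval computation */
    double t_start = MPI_Wtime();

    /* ... each rank computes local_sum over its subintervals ... */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* stop once the root has the total integral */
    double t_end = MPI_Wtime();
    if (rank == 0)
        printf("elapsed: %.3f s, pi ~= %.15f\n", t_end - t_start, global_sum * h);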

Results
• 32 processes on one node: 239 seconds
• 64 processes on one node: 119 seconds
• 128 processes on one node: 61 seconds
• 256 processes on one node: 37 seconds
• 256 processes distributed across the cluster: 29 seconds
• 500 processes distributed across the cluster: 15.7 seconds
• 700 processes distributed across the cluster: 11.2 seconds
• 800 processes distributed: 10.9 seconds
• 900 processes: 9.8 seconds
• 1000 processes: 8.9 seconds
• 1200 processes: 14.9 seconds (cluster busy?)

To do: check whether performance improves with more than 64 processes on a single node, and then decide how to organize the remote processes.

Experiments with MPI and OMP
Measured precision: 12 decimal digits. (A sketch of the hybrid MPI+OpenMP scheme follows the results below.)

• 64 threads on one process: 121.1 seconds
• 128 threads on one process: 62.7 seconds
• 256 threads on one process: 34.5 seconds
• 512 threads distributed: 19.5 seconds
• 768 threads distributed: 13.6 seconds
• 1024 threads: 11.3 seconds
• 2048 threads: 10 seconds
• 4096 threads: 9 seconds
• 8192 threads: 9.2 seconds
• 16384 threads:
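A minimal sketch of the hybrid arrangement, assuming each MPI process splits its subintervals among OpenMP threads with a reduction (compiled with mpicc -fopenmp; the exact work distribution is an assumption):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        /* all MPI calls happen outside parallel regions, so plain MPI_Init suffices */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long long n = 1000000000LL;   /* illustrative interval count */
        const double h = 1.0 / (double)n;
        double local_sum = 0.0;

        /* this rank's subintervals are split among OpenMP threads;
           the reduction clause combines the per-thread partial sums */
        #pragma omp parallel for reduction(+ : local_sum)
        for (long long i = rank; i < n; i += size) {
            double x = ((double)i + 0.5) * h;
            local_sum += 4.0 / (1.0 + x * x);
        }

        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi ~= %.15f\n", global_sum * h);

        MPI_Finalize();
        return 0;
    }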
