
Student ID: 2022480266

Name: Jovan Huang Tian Chun

Report
Brief description of your implementation

Below is a snapshot of my implementation, which reduces the integers without using MPI_Reduce.

It is a parallel reduction, as taught in Lecture 2 (MPI), Slide 67.


From the lecture slide, every pair of numbers is reduced in each round. In this case, we
reduce every pair of arrays in each round.
From my implementation, you can see that only MPI_Send and MPI_Recv were used to
transfer data from one process to another.

Assuming NP = 8 processes (as given by the question), this requires only log2(8) = 3 rounds of
reduction. This is achievable with one while loop and one nested while loop, adjusting the
increment on every iteration. The outer while loop runs 3 times when NP = 8, since log2(8) = 3
(x goes from 0 to 2). The inner while loop's index y starts from 0 and increases by 2^(x+1)
every time it finishes one iteration, as long as y is still below size (which is NP); each
receiver y is paired with the sender y + 2^x. This method allows us to perform parallel
reduction. Please see the sketch below and follow along to visualise how this code flows.
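
Since the code screenshot does not reproduce here, below is a minimal C sketch of the loop
structure just described (not my exact submission); the array length N and the initial
values are placeholder assumptions.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 8;                      /* placeholder array length */
        int *data = malloc(N * sizeof(int));
        int *incoming = malloc(N * sizeof(int));
        for (int i = 0; i < N; i++) data[i] = rank;   /* dummy values */

        int x = 0;
        while ((1 << x) < size) {             /* log2(size) rounds */
            int y = 0;
            while (y < size) {                /* y walks over the receivers */
                int sender = y + (1 << x);    /* partner is 2^x ranks away */
                if (rank == sender) {
                    MPI_Send(data, N, MPI_INT, y, 0, MPI_COMM_WORLD);
                } else if (rank == y && sender < size) {
                    MPI_Recv(incoming, N, MPI_INT, sender, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    for (int i = 0; i < N; i++)
                        data[i] += incoming[i];   /* reduce into own array */
                }
                y += 1 << (x + 1);            /* next receiver: step 2^(x+1) */
            }
            x++;
        }

        if (rank == 0)                        /* root now holds the full sum */
            printf("reduced[0] = %d\n", data[0]);

        free(data);
        free(incoming);
        MPI_Finalize();
        return 0;
    }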

In my code, you can see from my comments that:


- In the first loop (x=0), the receivers are processes 0, 2, 4 and 6, while the senders are
processes 1, 3, 5 and 7. This means the array of sender process 1 is sent to process 0,
which then adds it to its own array.
- Basically, in the first loop when x=0:
o process 1 sends its array to process 0
o process 3 sends its array to process 2
o process 5 sends its array to process 4
o process 7 sends its array to process 6
- When the first loop ends, we have already finished with half of the processes. At the
receivers' end, they add the incoming array to their own array.
- Now, we are left with processes 0, 2, 4 and 6.
- Then, in the second loop when x=1, the receivers are processes 0 and 4, while the
senders are processes 2 and 6.
- Basically, in the second loop when x=1:
o process 2 sends its array to process 0
o process 6 sends its array to process 4
- When the second loop ends, we have again finished with half of the remaining processes. At
the receivers' end, they add the incoming array to their own array again.
- Now, we are left with processes 0 and 4.
- In the third loop, when x=2, the receiver is process 0 and the sender is process 4.
- Hence, process 4 sends its array to process 0, and process 0 adds the incoming
array to its own array (the small trace after this list reproduces these pairings).
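
As a sanity check on the walkthrough, this small standalone helper (a hypothetical aid, not
part of my submission) prints the sender-to-receiver pairs that the loop bounds produce for
NP = 8:

    #include <stdio.h>

    /* Prints the pairings of the tree reduction for a fixed process count. */
    int main(void) {
        int size = 8;                         /* NP = 8, as in the walkthrough */
        for (int x = 0; (1 << x) < size; x++) {
            printf("round x=%d:", x);
            for (int y = 0; y < size; y += 1 << (x + 1)) {
                int sender = y + (1 << x);    /* partner is 2^x ranks away */
                if (sender < size)
                    printf("  %d->%d", sender, y);
            }
            printf("\n");
        }
        return 0;
    }

It prints 1->0 3->2 5->4 7->6 for x=0, then 2->0 6->4 for x=1, and finally 4->0 for x=2,
matching the three rounds above.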

With this implementation, I went on to conduct experiments and show my results in the next
section.

Time results of the 4 configurations for array size

Array size time results


NP   Implementation   64K (us)    1M (us)    16M (us)    256M (us)
2    Mine             1845        19441      242853      3864188
2    MPI_Reduce       3987        58985      931829      13849416
4    Mine             4401        32182      422248      6609738
4    MPI_Reduce       4014        42418      652634      10571879
8    Mine             7759        47012      612596      8912455
8    MPI_Reduce       10662       40504      614588      9479453

From here, we can see that my implementation of the reduction is faster than the official
MPI_Reduce in most configurations; the exceptions are NP = 4 with the 64K array and NP = 8
with the 1M array, where MPI_Reduce is faster.
