Report
Brief description of your implementation
Assuming there are 8 processes (NP = 8, as given by the question), the reduction needs only
log2(8) = 3 rounds. This is achieved with one outer while loop and one nested while loop, with
the stride adjusted on every round. The outer while loop runs 3 times if NP = 8, as
log2(8) = 3 (x = 0, 1, 2). The inner while loop's y starts from index 0 and increases by
2^(x+1) every time it finishes one iteration, as long as y is still below size (which is NP).
Each y is a receiver, and its sender is process y + 2^x (for example, process 4 when x = 2 and
y = 0). This method allows the transfers within one round to run in parallel. Please see below
and follow along to visualise how this code flows.
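The loop structure described above can be sketched as follows. This is a minimal plain C sketch with no MPI calls, not the submitted code: the names Transfer, reduction_schedule and print_schedule are hypothetical, and the sender rule y + 2^x is an assumption inferred from the walkthrough. It only lists which process sends to which in every round.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One transfer in the reduction tree (hypothetical helper type). */
typedef struct {
    int round;    /* x in the text */
    int receiver; /* y in the text */
    int sender;   /* assumed to be y + 2^x */
} Transfer;

/* Fills `out` with every transfer for `size` processes and returns the
 * number of transfers. `out` needs room for size - 1 entries. */
int reduction_schedule(int size, Transfer *out) {
    assert(size > 0);
    int count = 0;
    int x = 0;
    while ((1 << x) < size) {          /* log2(size) rounds when size is a power of two */
        int y = 0;
        while (y < size) {
            int sender = y + (1 << x);
            if (sender < size) {       /* guard for sizes that are not a power of two */
                out[count].round = x;
                out[count].receiver = y;
                out[count].sender = sender;
                count++;
            }
            y += 1 << (x + 1);         /* stride 2^(x+1) */
        }
        x++;
    }
    return count;
}

/* Prints the schedule, one line per transfer. */
void print_schedule(int size) {
    assert(size > 0);
    Transfer *t = malloc((size_t)size * sizeof *t);
    if (t == NULL) {
        return;
    }
    int n = reduction_schedule(size, t);
    for (int i = 0; i < n; i++) {
        printf("x=%d: process %d -> process %d\n",
               t[i].round, t[i].sender, t[i].receiver);
    }
    free(t);
}
```

Calling print_schedule(8) lists 7 transfers: four in the first round, two in the second and one in the third, so every process except 0 sends exactly once.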
- When the second loop ends, half of the remaining processes have finished again. At the
receiver's end, each receiver adds the incoming array to its own array again.
- Now, we are left with processes 0 and 4.
- In the third loop, when x = 2, the receiver will be process 0 and the sender will be
process 4.
- Hence, process 4 will send its array to process 0 and process 0 will add the incoming
array to its own array.
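To check that the steps above leave the full sum on process 0, the rounds can be simulated on ordinary arrays. This is a hedged sketch under stated assumptions, not the submitted MPI code: add_into and simulate_tree_reduce are hypothetical names, the element type long is an assumption, and each call to add_into stands in for one send by the sender plus one receive-and-add by the receiver.

```c
#include <assert.h>
#include <stddef.h>

/* Adds src into dst element by element, as a receiver does with an incoming array. */
static void add_into(long *dst, const long *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        dst[i] += src[i];
    }
}

/* Simulated tree reduction. `data` holds np arrays of `len` elements back to
 * back, so data[p * len + i] is element i of process p. Afterwards the array
 * of process 0 holds the element-wise sum over all processes. */
void simulate_tree_reduce(long *data, int np, size_t len) {
    assert(np > 0);
    for (int step = 1; step < np; step *= 2) {            /* step = 2^x */
        for (int y = 0; y + step < np; y += 2 * step) {   /* receivers y, stride 2^(x+1) */
            add_into(&data[(size_t)y * len],
                     &data[(size_t)(y + step) * len], len);
        }
    }
}
```

With 8 simulated processes, the last round performs the single addition of process 4's array into process 0's array, matching the walkthrough.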
With this implementation, I went on to conduct experiments and show my results in the next
section.
Student ID: 2022480266
Name: Jovan Huang Tian Chun
From here, we can see that my own reduce implementation is mostly faster than
the official MPI_Reduce, except when there are 8 processes for an array size of 1M.