You are on page 1of 4

Assignment 1

Name Vishal Jasrotia Class ESE 536


Date 09/21/2017 Period Fall 2017
SBUID 111578708 Netid Jvishal
Email jasrotia.vishal@stonybrook.edu

1.Given n/log n processors, design an algorithm to find the maximum of n numbers in


O(log n) time on an EREW-PRAM model. Assume that initially each shared memory
location holds one input value. Give necessary explanation and analysis.
Number of elements = n. Number of processors = n/logn
1.Divide n element into n/logn group of size log n elements. Sequential algorithm can find maximum
number from an array of x elements in O(x) time by traversing all x elements and comparing one by
one. So, in this case each processor can work on one group of size logn to find maximum in each
group in O(logn) time as the size of the group is logn.
2.After this step, we will get 1 maximum element from each group i.e. n/logn elements. Now, one
processor can find maximum from 2 element in O(1) time. So, n/(logn*2) processors can find
maximums between all pair of two elements. So, n/logn*2 elements will be left after iteration.so after
k iteration n/(logn*2k) elements will be there in final array. For finding maximum, elements
remaining after k iteration should be 1.

n/(log(n)*2k) = 1 n/(log(n) = 2k Hence, k = log(n/logn)


1. Time complexity = Time complexity of step1 + Time complexity of step2 + time complexity of read
and write operation between the iteration in step2.

2. Time complexity = O(logn) + O(n/logn) + time complexity of read and write operation between the
iteration in step2.

Time taken by write after first iteration = number of elements x time taken by single write operation,
Which is n/logn elements x O(1) = O(n/logn) for write after first iteration. Time complexity of read
after first iteration is also O(n/logn). Similarly, time complexity of read and write after second
iteration is o(n/logn*2) and so on and so forth. Total read/write time complexity = O(n/logn) +
O(n/logn*2) + O(n/logn*4) O(1) = O(n/logn).Use this in equation 2.

2. Time complexity = O(logn) + O(n/logn) + O(n/logn) = O(logn)

Algorithm:
Divide n elements into n/logn groups of logn elements:
For each Pi in 1<= i <= n/logn: Do in parallel
find max in group Gi in logn element in time O(logn)
write to max of each group in location M[Gi].
while i not equal to 1 for Gi://only one element left(max)
For each Pi in 1<= i <= Gi/2: Do in parallel.
Read from M[i*2-1] and M[i*2]
Compare and find max between i*2-1 and i*2 element.
Write to max of two to memory M[i].
Gi= Gi/2

2.Given n processors and assume that initially each shared memory location M[i](1<=i<=n)
holds an input value ai. Design an O(logn) algorithm on a CREW-PRAM model such that at
the end of the algorithm M[i] = ik=1 ak. Give necessary explanation and analysis.

2. Number of elements = n. Number of processors = n


Step 1. Sum all adjacent pairs in M[i] 1 i n using n processors and save in memory. Time
complexity is sum of time taken n processors and the time taken by write operation, which is O(1).
Now, we have n/2 elements.
Step 2. Calculate M[i] for 1 i n/2 using sequential algorithm in O(n/2) time.
Step 3. After step2, we have every alternate element of the resultant array.

Initial array = [a1, a2, a3, a4,.an]


Array after step 3 M = [ f2 , , f4 fn-1 ]

Step 4. To find remaining elements by adding the previous sum and original element at that place
because previous elements in final array is already sum of all previous element.
Example f1 = a1, f3 = f2 + a3, f5 = f4 +a5, so on and so forth.

*Step 2 can be solved recursively, by again adding adjacent number in pairs and jump to step 1 until
only one element remaining in array.
Algorithm:
1.For each Pi 1<=i<=n/2: do in parallel:
add two adjacent numbers in pairs. : T(1)
2.find sum M[i] 1<=i<=n/2 : T(n/2)
3.for each Pi in 1<=i<=n/2: do in parallel: : T(1)
add f(n-1) + a(n) and place in M[n] : T(1)

Time complexity T(n) = T(1) + T(n/2) + T(1) + T(1). Step 2 can be recursively solved by passing n/2 array
again to step1 until only 1 element left in array. So, T(n) = T(n/2K) + C recursively, which is O(logn).

Page 2 of 4
3.Design an algorithm on a CRCW-PRAM model for fast multiplication of two n x n matrix
for the following case:
(a). The number of processors P(n) = n and the time complexity of the algorithm T(n) =
O(n2).

Algorithm for matrix A(n, n) and B(n, n) multiplication:


For each i 1<=i<=n and j 1<=j<=n: -- T(n2)
For each Pk 1<=k<=n: do in parallel -- T(1)
A(0,k) x B(k,0)
Add result of all Pk and write to M(i,j) --T(n/2) add n number using n processor O(logn).

T(n) = T(n2) + T(n/2) + T(1). Time complexity is O(n2)

Traverse n x n location and find each element by multiplying A(0,k) and B(k,o) in parallel by Pk
processor and, then add all to get M(i,j). Repeat this for all n x n element. Only advantage this
algorithm has over the 1 processor sequential algorithm is that it can multiply all n x n number to
find i,j element in final matrix in O(1) time because of parallelism.

(b). The number of processors P(n) = n2 and the time complexity of the algorithm T(n) =
O(n2).
Algorithm for matrix A(n, n) x B(n, n) multiplication using n2 processors:
For each Pi,j in 1<=i,j<=n : do in parallel.
M(i,j) = nk=0 A(i,k)*B(k,j) -- T(k)

Since we have n2 processors, we can use each processer to find M(i, j) number in final matrix. Each
processor can calculate sigma of A(i,k) * B(k,j) for k 0 to n to find number at position i,j.

Time complexity = T(k), which is O(n). k* is number of element in matrix row/column.

4.Prove that the best parallel algorithm written for an n processors EREW-PRAM model can
be no more than O(logn) times slower than any algorithm for a CRCW model of PRAM
having same number of processors.

Difference between the EREW and CRCW model is that, in CRCW all processors can read and
write to memory location mi concurrently. But, in EREW processors cannot read memory location
in same cycle. So, we need to broadcast the data for other processors. Every processor can
broadcast data using following algorithm:

Page 3 of 4
Start broadcast of mi:
Foreach Pi 1<=i<=n:
For j in 1<=j<=log2 n:
M[i+ 2j] = M[i]
j = j+1
then Pi reads from M[i] location

So the idea is before Pi reads from memory, it will be broadcasted to other locations exponentially.
Time complexity is O(logn) because of For j in 1<=j<=log2 n.
Case: In matrix multiplication in Answer 3 (a). Time complexity is O(n2) CRCW model. If we use
EREW model for same problem then only extra step is read and write, which can be done with
above broadcasting algorithm. So time complexity for EREW model will be O(logn) more than
CRCW model for matrix multiplication.

Page 4 of 4