Solution to parallel algorithms.

Date 09/21/2017 Period Fall 2017

SBUID 111578708 Netid Jvishal

Email jasrotia.vishal@stonybrook.edu

O(log n) time on an EREW-PRAM model. Assume that initially each shared memory

location holds one input value. Give necessary explanation and analysis.

Number of elements = n. Number of processors = n/log n.

1. Divide the n elements into n/log n groups of log n elements each. A sequential algorithm finds the maximum of an array of x elements in O(x) time by scanning the elements and comparing them one by one. Each processor therefore works on one group and finds its maximum in O(log n) time. The groups are disjoint, so no memory location is read by more than one processor, as EREW requires.

2. After this step we have one maximum per group, i.e. n/log n candidates. One processor can find the maximum of 2 elements in O(1) time, so n/(2 log n) processors can compare all pairs at once, leaving n/(2 log n) candidates after the first iteration. After k iterations, n/(2^k log n) candidates remain. The maximum is found when exactly one candidate remains, i.e. when 2^k = n/log n, so k = log(n/log n) = O(log n) iterations are needed.

1. Time complexity = time of step 1 + time of step 2, where each iteration of step 2 includes its read and write operations.

2. Time complexity = O(log n) + O(log(n/log n)) x O(1) = O(log n).

In every iteration of step 2 each active processor reads two locations, compares them, and writes one result, all in O(1) time, because the processors work in parallel. No location is read or written by two processors in the same iteration, so the EREW restriction costs nothing extra. The quantity n/log n + n/(2 log n) + n/(4 log n) + ... + 1 = O(n/log n) is the total number of read/write operations summed over all processors (the work of step 2), not the elapsed time. Together with the O(n) work of step 1, the total work is O(n), which matches the sequential algorithm.

Algorithm:

Divide the n elements into n/log n groups G1, ..., G(n/log n) of log n elements each.

For each Pi, 1 <= i <= n/log n: do in parallel

find the maximum of group Gi sequentially in O(log n) time

write the maximum of the group to location M[i].

m = n/log n // number of candidates remaining

while m > 1: // stop when only one element (the maximum) is left

For each Pi, 1 <= i <= floor(m/2): do in parallel

Read M[2i-1] and M[2i]

Compare them and find the larger of the two.

Write the larger one to M[i]. // all reads of a round precede its writes

If m is odd, one more processor copies M[m] to M[ceil(m/2)].

m = ceil(m/2)

The maximum is in M[1].
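The two phases above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the function name `erew_max` is hypothetical, each list comprehension stands in for one synchronous parallel step, and the group size is taken as floor(log2 n).

```python
import math

def erew_max(values):
    """Simulate the two-phase maximum: group maxima, then pairwise rounds.

    Returns (maximum, number_of_pairwise_rounds).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one element")
    # Group size is about log2(n); at least 1 so tiny inputs still work.
    g = max(1, int(math.log2(n))) if n > 1 else 1
    # Phase 1: each simulated processor scans one group sequentially.
    m = [max(values[s:s + g]) for s in range(0, n, g)]
    rounds = 0
    # Phase 2: every round roughly halves the number of candidates.
    while len(m) > 1:
        nxt = [max(m[i], m[i + 1]) for i in range(0, len(m) - 1, 2)]
        if len(m) % 2 == 1:
            nxt.append(m[-1])  # an unpaired candidate is carried over
        m = nxt
        rounds += 1
    return m[0], rounds
```

For example, `erew_max([3, 1, 4, 1, 5, 9, 2, 6])` forms the groups [3, 1, 4], [1, 5, 9], [2, 6] and needs two pairwise rounds to reduce the candidates 4, 9, 6 to the maximum 9.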

2. Given n processors, assume that initially each shared memory location M[i] (1 <= i <= n) holds an input value a_i. Design an O(log n) algorithm on a CREW-PRAM model such that at the end of the algorithm $M[i] = \sum_{k=1}^{i} a_k$. Give necessary explanation and analysis.

Step 1. Sum all adjacent pairs a(2i-1) + a(2i), 1 <= i <= n/2, using n/2 processors, and save the sums in memory (in a separate array, so that the original values a_i stay available for step 4). Each processor does one read, one addition and one write, so this takes O(1) time. Now we have n/2 elements.

Step 2. Compute the prefix sums of these n/2 pair sums. This is the same problem on an input of half the size, so it is solved recursively in T(n/2) time: sum adjacent pairs again (step 1) and continue until only one element remains in the array.

Step 3. After step 2 we have every alternate element of the resulting array, because the i-th prefix sum of the pair sums equals f(2i) = a1 + ... + a(2i).

Array after step 3 (for even n): M = [ _, f2, _, f4, ..., _, fn ]

Step 4. Find the remaining elements by adding the preceding prefix sum to the original element at that position, because the preceding entry of the final array is already the sum of all earlier elements.

Example: f1 = a1, f3 = f2 + a3, f5 = f4 + a5, and so on.

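Algorithm:

1. For each Pi, 1 <= i <= n/2: do in parallel: : O(1)

add the adjacent pair a(2i-1) + a(2i).

2. Find the prefix sums f(2i), 1 <= i <= n/2, of the pair sums by recursion. : T(n/2)

3. For each Pi, 1 <= i <= n/2: do in parallel: : O(1)

write f(2i) to M[2i]; add f(2i-2) + a(2i-1) and place the result in M[2i-1], taking f(0) = 0.

Here f(2i-2) is read by both P(i-1) and Pi in step 3, which the CREW model permits.

Time complexity: T(n) = T(n/2) + c for a constant c, since steps 1 and 3 take O(1) time and step 2 is the recursive call on n/2 elements. Unrolling k times gives T(n) = T(n/2^k) + kc, and the recursion stops at one element when k = log n, so T(n) = O(log n).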
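The recursive pairing scheme can be sketched as follows. This is a sequential simulation under stated assumptions, not a PRAM implementation: the name `prefix_sums` is hypothetical, indices are 0-based (so the 1-based even positions f2, f4, ... are the odd Python indices), and each loop stands in for one parallel step.

```python
def prefix_sums(a):
    """Prefix sums by pairing, recursing on half the input, then filling gaps."""
    n = len(a)
    if n == 0:
        raise ValueError("need at least one element")
    if n == 1:
        return [a[0]]
    # Step 1: sum adjacent pairs (one simulated processor per pair).
    pairs = [a[2 * i] + a[2 * i + 1] for i in range(n // 2)]
    # Step 2: recurse on the half-size array; yields f2, f4, ... (1-based).
    even = prefix_sums(pairs)
    # Steps 3-4: place the known prefixes and fill in the remaining ones.
    f = [0] * n
    for i in range(n):  # conceptually all i in parallel
        if i % 2 == 1:
            f[i] = even[i // 2]             # already a complete prefix sum
        elif i == 0:
            f[i] = a[0]                     # f1 = a1
        else:
            f[i] = even[i // 2 - 1] + a[i]  # e.g. f3 = f2 + a3
    return f
```

Each recursion level does a constant number of parallel steps and halves the input, so the depth of the recursion, about log2 n, mirrors the T(n) = T(n/2) + c recurrence. An odd-length input is handled by leaving the last element unpaired.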

3. Design an algorithm on a CRCW-PRAM model for fast multiplication of two n x n matrices for the following cases:

(a). The number of processors P(n) = n and the time complexity of the algorithm T(n) = O(n^2).

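For each i, 1 <= i <= n, and j, 1 <= j <= n: -- n^2 iterations

For each Pk, 1 <= k <= n: do in parallel -- O(1)

compute A(i,k) x B(k,j)

Add the results of all Pk and write the sum to M(i,j) -- O(1) with a sum-combining concurrent write, O(log n) with a tree reduction

Traverse the n x n locations of the result and, for each one, let processor Pk multiply A(i,k) by B(k,j) in parallel with the others, then add all n products to get M(i,j). Repeat this for all n x n elements. The only advantage this algorithm has over the one-processor sequential algorithm is that the n multiplications needed for element (i,j) of the final matrix take O(1) time because of parallelism. The total is O(n^2) if the CRCW model combines concurrent writes to one location by summing them; if the n products are instead added pairwise in a tree, each element costs O(log n) and the total becomes O(n^2 log n).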
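A minimal sketch of scheme (a), simulated sequentially. The name `matmul_n_processors` and the step counter are hypothetical, and the sketch assumes a CRCW variant in which concurrent writes to one cell are summed, so the combining step counts as one parallel step.

```python
def matmul_n_processors(A, B):
    """Multiply two n x n matrices, one result element per outer iteration.

    Returns (C, steps) where steps counts the sequential outer iterations.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    steps = 0
    for i in range(n):
        for j in range(n):
            # One parallel step: simulated processor k forms A[i][k] * B[k][j].
            products = [A[i][k] * B[k][j] for k in range(n)]
            # Assumed sum-combining concurrent write into C[i][j].
            C[i][j] = sum(products)
            steps += 1
    return C, steps
```

The step counter comes out as n * n, which is the O(n^2) bound under the combining-write assumption; replacing the combining write by a tree reduction would multiply it by about log2 n.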

(b). The number of processors P(n) = n^2 and the time complexity of the algorithm T(n) = O(n^2).

Algorithm for matrix A(n, n) x B(n, n) multiplication using n^2 processors:

For each Pi,j in 1 <= i,j <= n: do in parallel

$M(i,j) = \sum_{k=1}^{n} A(i,k) \cdot B(k,j)$ -- O(n)

Since we have n^2 processors, each processor can compute one element M(i,j) of the final matrix. Processor Pi,j computes the sum of A(i,k) * B(k,j) for k = 1 to n, which takes O(n) time and is therefore within the required bound. Several processors read the same A(i,k) or B(k,j) at the same time, which the CRCW model allows.
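Scheme (b) can be sketched the same way. In this hypothetical `matmul_n2_processors` simulation the outer loop over k represents the n sequential time steps, and the two inner loops stand in for the n^2 processors that all act within one step.

```python
def matmul_n2_processors(A, B):
    """Multiply two n x n matrices with one simulated processor per element."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for k in range(n):  # n sequential time steps
        # Within a step, every simulated processor P(i, j) adds one product
        # to its own cell, so no two processors write the same location.
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

In step k, all processors in row i read A[i][k] and all processors in column j read B[k][j], which is where the concurrent-read capability is used.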

4. Prove that the best parallel algorithm written for an n-processor EREW-PRAM model can be no more than O(log n) times slower than any algorithm for a CRCW model of PRAM having the same number of processors.

The difference between the EREW and CRCW models is that in CRCW all processors can read and write the same memory location mi concurrently, whereas in EREW two processors cannot read (or write) the same memory location in the same cycle. So, before a value can be read by many processors, it has to be broadcast to separate locations, one per processor. A value can be broadcast using the following algorithm:

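Start broadcast of mi (B is an auxiliary array of n cells):

B[1] = mi

For j = 0 to log2 n - 1:

For each Pi, 1 <= i <= 2^j and i + 2^j <= n: do in parallel

B[i + 2^j] = B[i]

then each Pi, 1 <= i <= n, reads from its own location B[i]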

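So the idea is that before the processors read a value from memory, it is broadcast to other locations, with the number of copies doubling in every round.

The time complexity is O(log n) because the loop over j runs log2 n times and each round takes O(1). Concurrent writes have to be resolved within the same O(log n) bound; the standard proof does this by sorting the access requests by address.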
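The doubling broadcast can be sketched as a small simulation. The name `erew_broadcast` and the round counter are hypothetical, and indices are 0-based; in each round the already filled cells are copied into the next block of empty cells.

```python
def erew_broadcast(value, n):
    """Fill n cells with value by doubling; returns (cells, rounds)."""
    cells = [None] * n
    cells[0] = value
    filled, rounds = 1, 0
    while filled < n:
        # Simulated processor i (i < filled) copies cells[i] to cells[i + filled].
        # All source indices and all target indices are distinct, so no cell
        # is read or written by two processors in the same round.
        for i in range(min(filled, n - filled)):
            cells[i + filled] = cells[i]
        filled = min(2 * filled, n)
        rounds += 1
    return cells, rounds
```

For n = 8 the filled prefix grows 1, 2, 4, 8, so three rounds suffice, matching log2 8 = 3.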

Case: matrix multiplication in Answer 3 (a). The time complexity is O(n^2) on the CRCW model. If we use the EREW model for the same problem, the only extra work is in the read and write steps, which can be done with the broadcasting algorithm above at a cost of O(log n) per step. So the EREW algorithm for matrix multiplication is at most a factor of O(log n) slower than the CRCW one, i.e. O(n^2 log n).
