VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai

EXAMINATION: End Semester Examination, April 2021
DATE OF EXAM: 27th April 2021
SEMESTER & PROGRAM: Semester-VIII, Final Year B Tech (IT)
TIME: 11:00 am to 1:00 pm
TIME ALLOWED: 2.0 HRS.
MARKS: 60 Marks
COURSE NAME (CODE): Multicore Technologies (IT4115T)

Instructions
1. All questions carry equal marks.
2. Figures to the right indicate full marks.

Q.1 a. Describe the relationship between processes and threads. Explain the various models used for mapping between threads and processors. Do kernel-level threads share code or data? [CO1]
    b. What is a cluster? Draw the supercomputer architecture for the working of a cluster system. [Use 1 master node and 8 slave nodes.] [CO]

Q.2 a. Explain the typical steps for constructing a parallel algorithm for:
       i. Solving the 15-puzzle problem using exploratory decomposition.
       ii. Solving parallel quicksort using hybrid decomposition.
    b. In the DNS algorithm, after broadcasting the values of the given matrices, find the matrices' distribution for k = 0, 1, 2, 3. [CO2] (04)
       [Two input matrices; entries not recoverable from the source]
       [Note: No need to solve the problem; describe only the broadcast matrix values for k = 0, 1, 2, 3.]

Q.3 a. The following reservation table corresponds to a two-function pipeline. [CO3] (08)
       [Reservation table: stages 1 to 3 against time steps 0 to 4, with entries A, B and AB; cell positions not recoverable from the source]
       Draw and explain the state diagram for the two-function pipeline.
    b. Given a kernel launch add<<<dimGrid, dimBlock>>> with dimGrid(4,8) and dimBlock(10,10), can you draw the thread block diagram for mapping thread ID = 9132? [CO3] (04)

Q.4 a. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution. [CO4]
       i. for (i = k; i < 2*k; i++)
              a[i] = a[i] + a[i-k];
       [CO4] (04)
    b.
Define the MPI functions in terms of:
       i. Rank
       ii. Communications
       iii. Format of MPI calls
       iv. Communicator
    c. Explain the methodology to construct MPI + OpenMP (hybrid) programming. Can we construct an MPI + CUDA (hybrid) program? Justify your answer. [CO4]

Q.5 a. The intensity of the graph nodes is given in the following figure. [CO2] (08)
       [Figure: graph with a value per node; not recoverable from the source]
       Find the best possible path to reach the Goal node using the Best-First Search method. Also explain a general schematic for parallel Best-First Search using a centralized strategy for finding the best possible path to reach the Goal node.
    b. Can we terminate the given Best-First Search method using Huang's termination detection algorithm? Justify. (04)

***Best of Luck***


VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai

EXAMINATION: End Semester Examination, April 2021
DATE OF EXAM: 27th April 2021
SEMESTER & PROGRAM: VII, Final Year B Tech (IT)
TIME: 11:00 am to 1:00 pm
TIME ALLOWED: 2.0 HRS.
MARKS: 60 Marks
COURSE NAME (CODE): Multicore Technologies (IT4115T)

Instructions
1. All questions carry equal marks.
2. Figures to the right indicate full marks.

Q.1 a. Explain Hyper-Threading Technology (architecture). A video playing in Windows Media Player can slow down while a web page is loading in Internet Explorer. Is this problem solved by Hyper-Threading Technology (architecture)? How? [CO1] (08)
    b. Differentiate between Top 500 and Green 500. Which supercomputers are mostly used in multicore technology from an energy perspective (Top 500 or Green 500)? Why? [CO1]

Q.2 a. Explain Data-Parallel Models and the Master-Slave Model in terms of decomposition, mapping techniques and applying an appropriate strategy to minimize interactions. [CO2] (08)
    b. Find the communication steps in Cannon's algorithm on 9 processes for matrices A and B. [CO2]
       A blocks: A(1,0) A(1,1) A(1,2) | A(2,0) A(2,1) A(2,2) | A(0,0) A(0,1) A(0,2)
       B blocks: B(1,0) B(1,1) B(1,2) | B(2,0) B(2,1) B(2,2) | B(0,0) B(0,1) B(0,2)
       (04)

Q.3 a. A pipeline with the five segments S1, S2, S3, S4, S5 is characterized by the following reservation table: [CO3] (08)
       [Reservation table: segments S1 to S5 against time steps, with X marking the occupied cells; cell positions not recoverable from the source]
       Draw and briefly explain the state diagram for this pipeline. Determine the minimum average latency (MAL) and all greedy cycles.
    b. Consider the following CUDA diagram. [CO3] (04)
       [Figure: CUDA grid of thread blocks with one thread highlighted in an oval; not recoverable from the source]
       i. Fill in the X, Y parameters for the blocks.
       ii. Find the threadIdx, blockIdx, and blockDim of the thread highlighted in the oval.

Q.4 a. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution. [CO4]
       i. for (i = 0; i < (int) sqrt(x); i++) {
              a[i] = 2.3 * i;
              if (i < 10) b[i] = a[i];
          }
       ii. flag = 0;
           for (i = 0; (i < n) & (!flag); i++) {
               a[i] = 2.3 * i;
               if (a[i] < b[i]) flag = 1;
           }
    b. Write an MPI program that has 2 processes, where the process with rank = 0 should send the letters 'MT' to all the processes using MPI_Scatter. [CO4]
    c. Explain the methodology to construct MPI + OpenMP programming. Can we construct an MPI + CUDA program? Justify your answer. [CO4]

Q.5 a. Is the parallel formulation of Dijkstra's single-source shortest path algorithm similar to the parallel formulation of Prim's algorithm for the minimum spanning tree shown in the figure with starting node A? Justify your answer briefly. [CO2, CO4]
    b. How can we convert the given Dijkstra's single-source shortest path algorithm to Johnson's algorithm for handling sparse graphs efficiently? Justify. (04)
***Best of Luck***


VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai

EXAMINATION: Mid Semester Examination, February 2021
DATE OF EXAM: 26th February 2021
SEMESTER & PROGRAM: Semester-VIII, Final Year B Tech (Computer Engg. & IT)
TIME: 12:00 pm to 1:00 pm
TIME ALLOWED: 1.0 HRS.
MARKS: 20 Marks
COURSE NAME (CODE): Multicore Technologies (COMNAT & IT4115T)

Instructions
All questions carry equal marks. Figures to the right indicate full marks.

Q.1 In a co-operative network, tasks are solved by assigning the processors to complete the mission-oriented problems. There is failure uncertainty in the co-operative network, where one or more processors fail. This type of uncertainty problem can be solved using the robustness property. (Robustness: If one or more processors fail, then those failed processors can be replaced by other processors with the same tasks as the failed processors.) [CO2] (10)

    Note: 1 - task assigned, 0 - no task assigned, and 2 - task assigned two times.

    Table 1: Processors and their tasks
    [Matrix of processors p1 to p5 against tasks, with entries 0, 1 or 2; cell values not recoverable from the source. The final row lists the processors per task: p2,p4,p5 | p1,p3,p5 | p2,p4,p5 | p2,p3,p4 | p1,p3,p (last entry truncated in the source).]

    Table 2: Tasks required to complete the corresponding missions
    [Matrix of missions against tasks; cell values not recoverable from the source]

    The preference of missions is: m1 -> m2 -> m3 -> m4 -> m5.

    Table 3: Work load balance to the processors to complete the tasks of each mission
    [Matrix of missions m1 to m5 against tasks t1 to t4, with entries that are processor names or 0; cell values not reliably recoverable from the source]

    Using the above table information, design best-fit parallel algorithm models in terms of decomposition, mapping techniques and strategy to minimize interactions to complete the mission-oriented problems.

Design a suitable architecture to predict the weather forecast system's result concerning storage and processing flow from a machine perspective. [CO1] (04)
Using the given matrices, describe three steps to determine the total cost of the parallel DNS algorithm. If we ignore the communication time for the first step and ignore the computation time for addition in the final phase, can we make this algorithm cost-optimal? [CO2] (04)
    [Input matrices; entries not recoverable from the source]
    [Note: No need to solve the problem; describe the steps to get the results.]

If we modify the hyperquicksort algorithm by making better estimates of the median keys, to ensure a more balanced key distribution among the processor nodes and to avoid the worst case of a nearly sorted input sequence, can we get a better performance than the other Parallel Quicksort, Parallel Bitonic Sort, and Hyperquicksort algorithms? Justify your answer. [CO2] (02)

***Best of Luck***


VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai

EXAMINATION: Mid Semester Examination, February 2020
DATE OF EXAM: 28th February 2020
SEMESTER & PROGRAM: Sem-II, First Year M.Tech (Computer Engg. and NIMS)
TIME: 10:00 am to 11:30 am
TIME ALLOWED: 1.5 HRS.
MARKS: 40 Marks
COURSE NAME (CODE): Parallel & Distributed Algorithm (COSI 115)

Instructions
1. All questions carry equal marks.
2. Figures to the right indicate full marks.

Q.1 Let A = {11, 50, 53, 95, 36, 67, 86, 44, 35, 16, 81, 1, 44, 23, 15, 5, 97, 48, 16, 8, 66, 96, 17, 49, 58, 76, 54, 39, 82, 47, 65, 51} be a set of unsorted elements. [CO1]
    a. Illustrate a high-level view of a parallel quicksort approach to sort the given set of elements. (04)
    b. Determine the isoefficiency of hyperquicksort. (03)
    c. If we modify the hyperquicksort algorithm by making better estimates of the median keys, to ensure a more balanced key distribution among the processor nodes and to avoid the worst case of a nearly sorted input sequence, can we get a better performance than the other Parallel Quicksort, Parallel Bitonic Sort, and Hyperquicksort algorithms? Justify your answer. (03)
    (Assume this array is sorted on four processes logically organized as a two-dimensional hypercube.)

Q.2 a. The following reservation table corresponds to a two-function pipeline. [CO1]
       [Reservation table: stages 1 to 3 against time steps, with entries A, B and AB; cell positions not recoverable from the source]
       i. List all four cross forbidden lists of latencies and the corresponding combined cross-collision matrices.
       ii. Draw and explain the state diagram for the two-function pipeline.
       (02+06)
    b. For the given sparse matrix-vector product interaction graph, the following assumptions are given: (02)
       1. Each node takes unit time to process.
       2. Each interaction (edge) causes an overhead of a unit time.
       [Figure: interaction graph; not recoverable from the source]
       Find the communication and computation:
       i. If node 0 is a task.
       ii. If nodes 0, 4 and 5 are a task.

Q.3 a. The traveling salesman problem (TSP) is defined as follows: Given a set of cities and the distance between each pair of cities, determine a tour through all cities of minimum length. A tour of all cities is a trip visiting each city once and returning to the starting point. Its length is the sum of the distances traveled. This problem can be solved using a DP formulation. View the cities as vertices in a graph G(V, E). Let the set of cities V be represented by {v1, v2, ..., vn} and let S ⊆ {v1, v2, v3, ..., vn}. Further, let c_{i,j} be the distance between cities i and j. If f(S, k) represents the cost of starting at city v1, passing through all cities in set S, and terminating in city k, then the following recursive equation can be used to compute f(S, k):

       f(S, k) = \begin{cases} c_{1,k} & S = \{k\} \\ \min_{m \in S - \{k\}} \{ f(S - \{k\}, m) + c_{m,k} \} & S \neq \{k\} \end{cases}

       Based on the above equation, derive a parallel formulation. Compute the run time and the speedup. Is this parallel formulation cost-optimal?

    Why does CUDA divide the computation twice, first into grids and then into blocks?

    Honey bees collectively select the best nectar source available using simple behavioral rules. In the process of foraging, bees have one possible behavior: to dance to communicate the quality of the food source to other bees.
    The intensity of the honey bees' dance is given by the figure.
    [Figure: graph of dance intensities; not recoverable from the source]
    Find the honey bees' best possible path to reach the Goal node using the Best-First Search method. Also explain a general schematic for parallel Best-First Search using a centralized strategy for the said example.
    Marks and course outcomes printed in the margin for Q.3: (08) [CO2], (02) [CO2], (04), (06)


VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai - 400 019
[Autonomous]

Examination: IST4
Sem & Programme: VI B. Tech. (IT)
Duration: 1:30 Hours
Course Code & Course: Parallel Computing
Max. Marks: 20

Instructions
1. All questions are compulsory.
2. Figures to the right indicate full marks.

Q1. A. Distinguish among computer terminologies in each of the following groups:
       1. Parallelism versus pipelining
       2. Serial processing versus parallel processing
    B. In the following block of computations, a and b are two external inputs and z is the final output. Two intermediate results are labeled x and y.
           x <- a * a;
           y <- b * b;
           z <- (x + y) / (x - y)
       a. Draw a data flow graph for this code block, where *, +, - and / are arithmetic operators.
       b. Show a template implementation of the data flow graph in (a). Indicate the events that can be done in parallel in the execution of the above block of code.

Q2. A. Differentiate between synchronous multiprocessor architectures and asynchronous multiprocessor architectures.
    B. Derive the following terms:
       a. Speedup according to Folk Theorem 1.1
       b. Efficiency according to Folk Theorem 1.2

Q3. A. Explain the special features of OpenMP.
    B. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution.
       a. for (i = 0; i < (int) sqrt(x); i++) {
              a[i] = 2.3 * i;
              if (i < 10) b[i] = a[i];
          }
       b. flag = 0;
          for (i = 0; (i < n) & (!flag); i++) {
              a[i] = 2.3 * i;
              if (a[i] < b[i]) flag = 1;
          }
       c. for (i = 0; i < n; i++)
              a[i] = foo(i);
       d. i
