VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai
EXAMINATION: End Semester Examination, April 2021 | DATE OF EXAM: 27th April 2021
SEMESTER & PROGRAM: Semester-VIII, Final Year B Tech (IT) | TIME: 11:00 am to 1:00 pm
TIME ALLOWED: 2.0 HRS. | MARKS: 60 Marks
COURSE NAME (CODE): Multicore Technologies (IT4115T)
Instructions 1. All questions carry equal marks.
2. Figures to the right indicate full marks.
Q.1 a. Describe the relationship between processes and threads. Explain the various models used for mapping between threads and processors. Do kernel-level threads share code or data? [CO1]
b. What is a cluster? Draw a supercomputer architecture showing the working of a cluster system [Use 1 master node and 8 slave nodes]. [CO1]
Q.2 a. Explain the typical steps for constructing a parallel algorithm.
i. Solving the 15-puzzle problem using exploratory decomposition.
ii. Solving parallel quick sort using hybrid decomposition.
b. In the DNS algorithm, after broadcasting values for the given matrices, find the distribution of the matrices for k = 0, 1, 2, 3. [CO2] (04)
[Two 4 x 4 input matrices; entries illegible in the source.]
[Note: No need to solve the problem; describe only the broadcast matrix values for k = 0, 1, 2, 3.]
Q.3 a. The following reservation table corresponds to a two-function pipeline: [CO3] (08)
Stage \ Time | 0 | 1 | 2  | 3 | 4
1            | A | B |    | A | B
2            |   | A |    | B |
3            | B |   | AB |   | A
Draw and explain the state diagram for the two-function pipeline.
b. Given a kernel [CO3] (04)
add <<< dimGrid, dimBlock >>>; dimGrid(4,8); dimBlock = (10,10);
Can you draw the thread block diagram for mapping threadid = 9132?
Q.4 a. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution. [CO4]
i. for (i = k; i < 2*k; i++)
a[i] = a[i] + a[i-k];
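As a hedged illustration of the kind of reasoning this segment invites: the loop reads only a[0..k-1] and writes only a[k..2k-1], so the iterations do not depend on one another. The sketch below wraps the loop in a function (the name `add_shifted` is an assumption for illustration, not part of the question) and marks it with an OpenMP pragma. Without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds the first k elements of a onto the next k.
 * Reads touch a[0..k-1] and writes touch a[k..2k-1], so no iteration
 * reads a value that another iteration writes. */
void add_shifted(double *a, int k)
{
    assert(a != NULL && k >= 0);
    int i;
#pragma omp parallel for
    for (i = k; i < 2 * k; i++)
        a[i] = a[i] + a[i - k];
}
```

Calling `add_shifted(a, 3)` on an array of six elements updates only the last three. If the upper bound were larger than 2*k, later iterations would read values written by earlier ones and the argument above would no longer hold.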
b. Define MPI functions in terms of: [CO4] (04)
i. Rank
ii. Communications
iii. Format of MPI calls
iv. Communicator
c. Explain the methodology to construct MPI + OpenMP (hybrid) programming. Can we construct an MPI + CUDA (hybrid) program? Justify your answer. [CO4]
Q.5 a. The intensity of each graph node is given in the following figure. [CO2] (08)
[Figure: graph with node intensity values and a Goal node.]
Find the best possible path to reach the Goal node using the Best-First Search method. Also explain a general schematic for parallel Best-First Search using a centralized strategy to find the best possible path to reach the Goal node.
b. Can we terminate the given Best-First Search method using Huang's termination detection algorithm? Justify. (04)
**Best of Luck**
VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai
EXAMINATION: End Semester Examination, April 2021 | DATE OF EXAM: 27th April 2021
SEMESTER & PROGRAM: VII, Final Year B Tech (IT) | TIME: 11:00 am to 1:00 pm
TIME ALLOWED: 2.0 HRS. | MARKS: 60 Marks
COURSE NAME (CODE): Multicore Technologies (IT4115T)
Instructions 1. All questions carry equal marks.
2. Figures to the right indicate full marks.
Q1 a. Explain Hyper-Threading Technology (Architecture). A video playing in Windows Media Player can slow down while a Web page is loading in Internet Explorer. Is this problem solved by Hyper-Threading Technology (Architecture)? How? [CO1] (08)
b. Differentiate between Top 500 and Green 500. Which supercomputers are mostly used in multicore technology from an energy perspective (Top 500 or Green 500)? Why? [CO1]
Q2 a. Explain Data-Parallel Models and the Master-Slave Model in terms of decomposition, mapping techniques and applying an appropriate strategy to minimize interactions. [CO2] (08)
b. Find the communication steps in Cannon's algorithm on 9 processes for matrices A and B. [CO2] (04)
A(0,0) A(0,1) A(0,2)    B(0,0) B(0,1) B(0,2)
A(1,0) A(1,1) A(1,2)    B(1,0) B(1,1) B(1,2)
A(2,0) A(2,1) A(2,2)    B(2,0) B(2,1) B(2,2)
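The block movement in Cannon's algorithm can be traced with a small serial sketch. It assumes that process P(i,j) of the 3 x 3 grid starts with blocks A(i,j) and B(i,j), that the initial alignment shifts row i of A left by i and column j of B up by j, and that each later step shifts A left by one and B up by one. All function names are illustrative, and each block is treated as a single number so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>

#define Q 3  /* 3 x 3 process grid, i.e. 9 processes */

/* Column index of the A block held by P(i,j) after the initial
 * alignment and `step` further left shifts. */
int a_block_col(int i, int j, int step) { return (i + j + step) % Q; }

/* Row index of the B block held by P(i,j) after the initial
 * alignment and `step` further upward shifts. */
int b_block_row(int i, int j, int step) { return (i + j + step) % Q; }

/* Prints which blocks every process holds at each compute step. */
void print_cannon_steps(void)
{
    for (int step = 0; step < Q; step++) {
        printf("step %d\n", step);
        for (int i = 0; i < Q; i++) {
            for (int j = 0; j < Q; j++)
                printf("  P(%d,%d): A(%d,%d) B(%d,%d)", i, j,
                       i, a_block_col(i, j, step),
                       b_block_row(i, j, step), j);
            printf("\n");
        }
    }
}

/* Accumulates C in the order Cannon's algorithm would, with every
 * block reduced to one number. */
void cannon_multiply(int A[Q][Q], int B[Q][Q], int C[Q][Q])
{
    for (int i = 0; i < Q; i++)
        for (int j = 0; j < Q; j++) {
            C[i][j] = 0;
            for (int step = 0; step < Q; step++) {
                int k = a_block_col(i, j, step);
                assert(k == b_block_row(i, j, step)); /* blocks always match */
                C[i][j] += A[i][k] * B[k][j];
            }
        }
}
```

Under these assumptions, a 3 x 3 grid needs the initial alignment followed by two single-step shift rounds, with a local block multiplication before each shift round and after the last one.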
Q3 a. A pipeline with the five segments S1, S2, S3, S4, S5 is characterized by the following reservation table: [CO3] (08)
[Reservation table for segments S1 to S5; the X entries are illegible in the source.]
Draw and explain briefly the state diagram for this pipeline. Determine the minimum average latency (MAL) and all greedy cycles.
b. Consider the following CUDA diagram: [CO3] (04)
[Figure: CUDA grid of thread blocks with one thread highlighted in an oval.]
i. Fill in the X, Y parameters for the blocks.
ii. Find the threadIdx, blockIdx, and blockDim of the thread highlighted in the oval.
Q.4 a. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution. [CO4]
i. for (i = 0; i < (int) sqrt(x); i++)
{
a[i] = 2.3 * i;
if (i < 10) b[i] = a[i];
}
ii. flag = 0;
for (i = 0; (i < n) & (!flag); i++)
{
a[i] = 2.3 * i;
if (a[i] < b[i]) flag = 1;
}
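One possible treatment of segment i is sketched below, under the assumption that the bound (int) sqrt(x) does not change inside the loop. The bound is passed in as `limit` (a hypothetical parameter that the caller computes once), which keeps the loop in the fixed-trip-count form that an OpenMP parallel for expects. The function name `fill_scaled` is also illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* limit stands for (int) sqrt(x), evaluated once by the caller so the
 * loop bound is invariant during the loop. */
void fill_scaled(double *a, double *b, int limit)
{
    assert(a != NULL && b != NULL && limit >= 0);
    int i;
#pragma omp parallel for
    for (i = 0; i < limit; i++) {
        a[i] = 2.3 * i;      /* each iteration writes only a[i] */
        if (i < 10)
            b[i] = a[i];     /* and, for i < 10, only b[i] */
    }
}
```

Segment ii differs in kind: its loop condition depends on flag, which earlier iterations may set, so the number of iterations is not known when the loop starts. That is why it does not fit the same pragma directly.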
b. Write an MPI program that has 2 processes, where the process with rank = 0 should send the letters 'MT' to all the processes using MPI_Scatter. [CO4]
c. Explain the methodology to construct MPI + OpenMP programming. Can we construct an MPI + CUDA program? Justify your answer. [CO4]
Q.5 a. Is the parallel formulation of Dijkstra's single-source shortest path algorithm similar to the parallel formulation of Prim's algorithm for the minimum spanning tree shown in the figure with starting node A? Justify your answer briefly. [CO2, CO4]
b. How can we convert the given Dijkstra's single-source shortest path algorithm to Johnson's algorithm, which handles sparse graphs efficiently? Justify. (04)
**Best of Luck**
VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai
EXAMINATION: Mid Semester Examination, February 2021 | DATE OF EXAM: 26th February 2021
SEMESTER & PROGRAM: Semester-VIII, Final Year B Tech (Computer Eng. & IT) | TIME: 12:00 pm to 1:00 pm
TIME ALLOWED: 1.0 HRS. | MARKS: 20 Marks
COURSE NAME (CODE): Multicore Technologies (CO4115T & IT4115T)
Instructions 1. All questions carry equal marks.
2. Figures to the right indicate full marks.
Q.1 In a co-operative network, tasks are solved by assigning the processors to complete the mission oriented problems. There is failure uncertainty in the co-operative network, where one or more processors fail. This type of uncertainty problem can be solved using the robustness property. (Robustness: If one or more processors fail, then those failed processors can be replaced by other processors with the same tasks of the failed processors.) [CO2] (10)
Note: 1 - Task assigned, 0 - No task assigned, and 2 - Task assigned two times.
Table 1: Processors and their tasks
[Assignment entries for the processors are illegible in the source; the last row survives only in part:]
p2,p4,p5 | p1,p3,p5 | p2,p4,p5 | p2,p3,p4 | p1,p3,p
Table 2: Tasks required to complete the corresponding missions
[Table entries illegible in the source.]
The preference of missions is: m1 -> m2 -> m3 -> m4 -> m5.
Table 3: Work load balance to the processor to complete tasks of mission
[Table entries illegible in the source.]
Using the above table information, design best-fit parallel algorithm models in terms of decomposition, mapping techniques and strategy to minimize interactions to complete the mission oriented problems.
Design a suitable architecture to predict the weather forecast system's result concerning storage and processing flow in machine perspective APL. [CO1] (04)
Using the given matrices, describe three steps to determine the total cost of the parallel DNS algorithm. If we ignore the communication time for the first step and ignore the computation time for addition in the final phase, can we make this algorithm cost-optimal? [CO2] (04)
[Matrices illegible in the source.]
[Note: No need to solve the problem; describe the steps to get the results.]
If we modify the hyperquick sort algorithm by making better estimates of the median keys, to ensure a more balanced key distribution among the processor nodes and to avoid the worst case of a nearly sorted input sequence, can we get better performance than the other Parallel Quicksort, Parallel Bitonic Sort, and Hyperquicksort algorithms? Justify your answer. [CO2] (02)
**Best of Luck**
VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai-400 019
Autonomous Institute affiliated to University of Mumbai
EXAMINATION: Mid Semester Examination, February 2020 | DATE OF EXAM: 28th February 2020
SEMESTER & PROGRAM: Sem-II, First Year MTech (Computer Eng. and NIMS) | TIME: 10:00 am to 11:30 am
TIME ALLOWED: 1.5 HRS. | MARKS: 40 Marks
COURSE NAME (CODE): Parallel & Distributed Algorithm (COSI 115)
Instructions 1. All questions carry equal marks.
2. Figures to the right indicate full marks.
Q1 Let A = {11,50,53,95,36,67,86,44,35,16,81,1,44,23,15,5,97,48,16,8,66,96,17,49,58,76,54,39,82,47,65,51} be a set of unsorted elements. [CO1]
a. Illustrate a high-level view of a parallel quick sort approach to sort the given set of elements. (04)
b. Determine the isoefficiency of hyperquick sort. (03)
c. If we modify the hyperquick sort algorithm by making better estimates of the median keys, to ensure a more balanced key distribution among the processor nodes and to avoid the worst case of a nearly sorted input sequence, can we get better performance than the other Parallel Quicksort, Parallel Bitonic Sort, and Hyperquicksort algorithms? Justify your answer. (03)
(Assume this array is sorted on four processes logically organized as a two-dimensional hypercube.)
Q2 a. The following reservation table corresponds to a two-function pipeline: [CO1]
Stage \ Time | 0 | 1 | 2  | 3 | 4
1            | A | B |    | A | B
2            |   | A |    | B |
3            | B |   | AB |   | A
i. List all four cross forbidden lists of latencies and the corresponding combined cross-collision matrices. (02+06)
ii. Draw and explain the state diagram for the two-function pipeline.
b. For a given sparse matrix-vector product interaction graph, the following assumptions are given: (02)
1. Each node takes unit time to process.
2. Each interaction (edge) causes an overhead of a unit time.
[Figure: interaction graph.]
Find out the communication and computation:
i. If node 0 is a task.
ii. If nodes 0, 4 and 5 are a task.
Q3 a. The traveling salesman problem (TSP) is defined as follows: Given a set of cities and the distance between each pair of cities, determine a tour through all cities of minimum length. A tour of all cities is a trip visiting each city once and returning to the starting point. Its length is the sum of the distances traveled. This problem can be solved using a DP formulation. View the cities as vertices in a graph G(V, E). Let the set of cities V be represented by {v_1, v_2, ..., v_n} and let S ⊆ {v_2, v_3, ..., v_n}. Further, let c_{i,j} be the distance between cities i and j. If f(S, k) represents the cost of starting at city v_1, passing through all cities in set S, and terminating in city k, then the following recursive equation can be used to compute f(S, k):
\[
f(S,k) =
\begin{cases}
c_{1,k} & S = \{k\} \\
\min_{m \in S - \{k\}} \{\, f(S - \{k\}, m) + c_{m,k} \,\} & S \neq \{k\}
\end{cases}
\]
Based on the above equation, derive a parallel formulation. Compute the parallel run time and the speedup. Is this parallel formulation cost-optimal? (08) [CO2]
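For reference, the recurrence for f(S, k) can be evaluated serially as in the sketch below. This is only a baseline under stated assumptions, not the parallel formulation the question asks for: city 0 plays the role of v_1, the set S is stored as a bitmask over cities 1..n-1, and the names `tsp_dp` and `MAX_N` are illustrative.

```c
#include <assert.h>
#include <limits.h>

#define MAX_N 12  /* illustrative upper bound on the number of cities */

/* Returns the minimum tour length over n cities; c[i][j] is the distance
 * between cities i and j. f[S][k] is the cheapest path that starts at
 * city 0, visits every city in S and ends at city k (k must be in S). */
int tsp_dp(int n, int c[MAX_N][MAX_N])
{
    assert(n >= 2 && n <= MAX_N);
    static int f[1 << (MAX_N - 1)][MAX_N];
    int full = (1 << (n - 1)) - 1;          /* all cities except city 0 */

    for (int S = 1; S <= full; S++) {
        for (int k = 1; k < n; k++) {
            int kbit = 1 << (k - 1);
            if (!(S & kbit))
                continue;                   /* k is not in S */
            if (S == kbit) {                /* base case: S = {k} */
                f[S][k] = c[0][k];
                continue;
            }
            int rest = S ^ kbit;            /* S - {k}, numerically below S */
            int best = INT_MAX;
            for (int m = 1; m < n; m++) {
                if (!(rest & (1 << (m - 1))))
                    continue;
                int cand = f[rest][m] + c[m][k];
                if (cand < best)
                    best = cand;
            }
            f[S][k] = best;
        }
    }

    int tour = INT_MAX;                     /* close the tour back to city 0 */
    for (int k = 1; k < n; k++) {
        int cand = f[full][k] + c[k][0];
        if (cand < tour)
            tour = cand;
    }
    return tour;
}
```

Every entry f(S, k) depends only on entries for the smaller set S - {k}, so all entries for subsets of the same size are independent of one another. That observation is one natural starting point for a parallel formulation.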
b. Why does CUDA divide the computation requirement twice, first into grids and then into blocks? (02)
Q4 Honey bees collectively select the best nectar source available using simple behavioral rules. In the process of foraging, bees have one possible behavior: to dance to communicate the quality of the food source to other bees. The intensity of the honey bees' dance is given by the figure. [CO2]
[Figure: graph of dance intensities with a Goal node.]
Find the honey bees' best possible path to reach the Goal node using the Best-First Search method. (04)
Also explain a general schematic for parallel Best-First Search using a centralized strategy for the said example. (06)
VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
Matunga, Mumbai - 400 019
[Autonomous]
IST4: Examination
Sem & Programme: VI B. Tech. (IT) | Duration: 1:30 Hours
Course Code & Course: Parallel Computing | Max. Marks: 20
Instructions 1. All questions are compulsory.
2. Figures to the right indicate full marks.
Q1 A. Distinguish among computer terminologies in each of the following groups:
1. Parallelism versus Pipelining
2. Serial Processing versus Parallel Processing
B. In the following block of computations, a and b are two external inputs and z is the final output. Two intermediate results are labeled x and y.
x ← a * a; y ← b * b; z ← (x + y)/(x - y)
a. Draw a data flow graph for this code block, where *, +, - and / are arithmetic operators.
b. Show a template implementation of the data flow graph in (a).
c. Indicate the events that can be done in parallel in the execution of the above block of code.
Q2. A. Differentiate between Synchronous Multiprocessor Architectures and Asynchronous Multiprocessor Architectures.
B. Derive the following terms:
a. Speedup according to Folk Theorem 1.1
b. Efficiency according to Folk Theorem 1.2
Q3. A. Explain the special features of OpenMP.
B. For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution.
a. for (i = 0; i < (int) sqrt(x); i++)
a[i] = 2.3 * i;
b. for (i = 0; (i < n) & (!flag); i++)
{
a[i] = 2.3 * i;
if (a[i] < b[i]) flag = 1;
}
c. for (i = 0; i < n; i++)
a[i] = foo(i);
d i