
USN

RV COLLEGE OF ENGINEERING®
(Autonomous Institution affiliated to VTU, Belagavi)
VII Semester B.E. Nov/Dec-19 Examinations
DEPARTMENT: Computer Science and Engineering
COURSE CODE / TITLE: 16CS71 Parallel Architecture and Distributed Programming
(COMMON FORMAT FOR 2018 AND 2016 SCHEME)

Time: 03 Hours Maximum Marks: 100


Instructions to candidates:
1. Answer all questions from Part A. Part A questions should be answered in the first three
pages of the answer book only.
2. Answer FIVE full questions from Part B. In Part B, question numbers 2, 7 and 8 are
compulsory. Answer any one full question from 3 and 4, and any one full question from 5 and 6.
PART-A
1 1.1 Differentiate between write invalidate and write update (write broadcast) protocols. 02
1.2 Differentiate between a true sharing miss and a false sharing miss. 02
1.3 A vector processor can handle strides greater than one, called ___________ 01
1.4 What is chaining? 01
1.5 Demonstrate with an example how strip mining can be used to handle large vectors. 02
1.6 Differentiate between coarse-grained and fine-grained decomposition. 02
1.7 Define task dependency graph. 01
1.8 Write OpenMP C code to create 8 threads (see the sketch following Part A). 01
1.9 Message-passing programs are generally written using the _____ paradigms. 01
1.10 Write an MPI C program that prints a “Hello World” message from each processor
(see the sketch following Part A). 02
1.11 ___ is called prior to any calls to other MPI routines. 01
1.12 With reference to the OpenACC programming paradigm, when encountering the
___ directive, the compiler will generate one or more parallel ___, which execute 02
redundantly.
1.13 Mention the use of the copyin and copyout data clauses in OpenACC. 02
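
For reference, a minimal sketch answering Q1.8, creating 8 threads with OpenMP in C (the count is requested via omp_set_num_threads; the runtime may grant fewer):

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(8);              /* request 8 threads */
    #pragma omp parallel
    {
        printf("Thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

And a minimal sketch answering Q1.10, an MPI C “Hello World” printed by each process (note that MPI_Init, per Q1.11, precedes all other MPI calls):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);              /* must be called first */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello World from processor %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}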
PART-B
2 a Illustrate the snooping write-invalidate cache coherence protocol for a write-back
cache, showing the states in a state-transition diagram for the requests generated
by the CPU. 08
b Suppose a speedup of 90 has to be achieved with 100 processors. What fraction of
the original computation can be sequential? (A worked sketch follows this question.) 04
c Discuss briefly the models of memory consistency. 04
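
For Q2.b, a worked sketch using Amdahl's law, assuming the parallel fraction scales perfectly across the p processors:

\[
\text{Speedup} = \frac{1}{f_{\text{seq}} + \frac{1 - f_{\text{seq}}}{p}}
\qquad\Rightarrow\qquad
90 = \frac{1}{f_{\text{seq}} + \frac{1 - f_{\text{seq}}}{100}}
\]
\[
100\,f_{\text{seq}} + (1 - f_{\text{seq}}) = \frac{100}{90}
\qquad\Rightarrow\qquad
99\,f_{\text{seq}} = \frac{1}{9}
\qquad\Rightarrow\qquad
f_{\text{seq}} = \frac{1}{891} \approx 0.11\%
\]

So only about 0.1% of the original computation can be sequential.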

3 a Illustrate the basic structure and functionality of the VMIPS vector architecture. 08


b Briefly discuss the differences between vector architectures and GPUs. 04
c Briefly discuss conditional branching in GPUs. 04
OR
4 a With a neat diagram, explain the process of scheduling threads of SIMD instructions. 08
b Analyze the similarities and differences between Multimedia SIMD Computers and
GPUs. 08

5 a Briefly describe the recursive decomposition technique with an example. 05


b Bring out the different mapping techniques for load balancing. 05
c Briefly comment on the Work Pool model and the Master-Slave model. 06
OR
6 a Describe a message-transfer protocol for buffered sends and receives in which the
buffering is performed only by the sending process. What kind of additional
hardware support is needed to make these types of protocols practical? 08
b The MPI standard allows for two different implementations of the MPI_Send
operation: one using buffered sends and the other using blocked sends. Discuss
some of the potential reasons why MPI allows these two different implementations.
In particular, consider the cases of different message sizes and/or different
architectural characteristics. 08

7 a Consider Cannon's matrix-matrix multiplication algorithm. Our discussion of
Cannon's algorithm has been limited to cases in which A and B are square matrices,
mapped onto a square grid of processes. However, Cannon's algorithm can be
extended for cases in which A, B, and the process grid are not square. In particular,
let matrix A be of size n x k and matrix B be of size k x m. The matrix C obtained
by multiplying A and B is of size n x m. Also, let q x r be the number of processes
in the grid, arranged in q rows and r columns. Develop an MPI program for
multiplying two such matrices on a q x r process grid using Cannon's algorithm.
(A square-grid starting sketch follows this question.) 08
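
The q x r generalization is the exercise itself; as a hedged starting point, here is a sketch of the standard square-grid Cannon kernel in MPI C, assuming p = s x s processes each holding one nb x nb block (function and variable names are illustrative):

#include <math.h>
#include <string.h>
#include <mpi.h>

/* Local block multiply-accumulate: c += a * b, all nb x nb, row-major */
static void matmul_local(int nb, const double *a, const double *b, double *c) {
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < nb; k++)
            for (int j = 0; j < nb; j++)
                c[i*nb + j] += a[i*nb + k] * b[k*nb + j];
}

/* Cannon's algorithm on a square s x s grid; a, b, c are local blocks */
void cannon_square(int nb, double *a, double *b, double *c, MPI_Comm comm) {
    int p, rank, src, dst;
    int dims[2], periods[2] = {1, 1}, coords[2];
    MPI_Comm grid;

    MPI_Comm_size(comm, &p);
    dims[0] = dims[1] = (int)sqrt((double)p);  /* assume p is a perfect square */
    MPI_Cart_create(comm, 2, dims, periods, 1, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);

    /* Initial alignment: row i of A shifts left by i, column j of B up by j */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, nb*nb, MPI_DOUBLE, dst, 1, src, 1, grid,
                         MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(b, nb*nb, MPI_DOUBLE, dst, 1, src, 1, grid,
                         MPI_STATUS_IGNORE);

    memset(c, 0, (size_t)nb * nb * sizeof(double));
    for (int step = 0; step < dims[0]; step++) {
        matmul_local(nb, a, b, c);
        MPI_Cart_shift(grid, 1, -1, &src, &dst);   /* A one block left */
        MPI_Sendrecv_replace(a, nb*nb, MPI_DOUBLE, dst, 1, src, 1, grid,
                             MPI_STATUS_IGNORE);
        MPI_Cart_shift(grid, 0, -1, &src, &dst);   /* B one block up */
        MPI_Sendrecv_replace(b, nb*nb, MPI_DOUBLE, dst, 1, src, 1, grid,
                             MPI_STATUS_IGNORE);
    }
    MPI_Comm_free(&grid);
}

For the q x r case, the shift amounts along each dimension and the block sizes per process must be generalized, while the align-then-rotate structure stays the same.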
b Write an MPI program that has a total of 4 processes. The processes with ranks 1, 2
and 3 should send the messages HELLO, CSE and RVCE, respectively, to the
process with rank 0 (a sketch follows this question). 08
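
A minimal sketch for Q7.b in MPI C, assuming the program is launched with exactly 4 processes (e.g., mpirun -np 4):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;
    char buf[16];
    const char *msgs[] = {"", "HELLO", "CSE", "RVCE"};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Receive one message from each of ranks 1, 2 and 3 in turn */
        for (int src = 1; src <= 3; src++) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 0 received: %s\n", buf);
        }
    } else if (rank <= 3) {
        /* Send this rank's message, including the terminating NUL */
        MPI_Send((void *)msgs[rank], (int)strlen(msgs[rank]) + 1, MPI_CHAR,
                 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}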

8 a How is parallel computing implemented using the 3 levels of parallelism in
OpenACC? (An illustrative sketch follows this question.) 08
b Draw and explain the OpenACC execution model. 08
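
For reference, a minimal OpenACC C sketch touching Q8.a and the copyin/copyout clauses of Q1.13: the three levels of parallelism are gang, worker and vector, all requested here on one loop (how the compiler maps each level to hardware is implementation-defined):

#include <stdio.h>
#define N 1024

int main(void) {
    float a[N], b[N];
    for (int i = 0; i < N; i++) a[i] = (float)i;

    /* copyin(a): host -> device before the region
       copyout(b): device -> host after the region */
    #pragma acc parallel loop gang worker vector copyin(a) copyout(b)
    for (int i = 0; i < N; i++)
        b[i] = 2.0f * a[i];

    printf("b[10] = %f\n", b[10]);
    return 0;
}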
