Reg. No. :
M.E. DEGREE EXAMINATION, JUNE 2010
Second Semester
Applied Electronics
AP9222 - COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
(Common to M.E-Computer and Communication, M.E-VLSI Design and M.E-Embedded System Technologies)
(Regulation 2009)
Time : Three hours    Maximum : 100 Marks
Answer ALL Questions

Question Paper Code: J7605

PART A - (10 × 2 = 20 Marks)

1. Define Bernstein conditions related to parallelism and dependence relations.

2. A workstation uses a 15-MHz processor with a claimed 10-MIPS rating to execute a given program mix. What is the effective CPI of this computer assuming a one-cycle delay for each memory access?

3. List the parameters used for evaluating parallel computations.

4. Topologically equivalent networks are those whose graph representations are isomorphic with the same interconnection capabilities. Prove that the Omega network is topologically equivalent to the Baseline network.

5. A two-level memory system has eight virtual pages on a disk to be mapped into four page frames in the main memory. A certain program generated the following page trace:

1,0,2,2,1,7,6,7,0,1,2,0,3,0,4,5,1,5,2,4,5,6,7,6,7,2,4,2,7,3,3,2,3

Show the successive virtual pages residing in the four page frames with respect to the above trace using the LRU replacement policy. Compute the hit ratio in the main memory. Assume the page frames are initially empty.

6. State the two sufficient conditions to achieve sequential consistency in shared memory access.

7. Why are MIMD, MPMD or SPMD control preferred over SIMD data parallelism?

8. Compare the advantages and disadvantages of chained directories for cache coherence control in large-scale multiprocessor systems.

9. Distinguish between spin locks and suspended locks for sole access to a critical section.

10. Bring out the differences in the message passing OS models.

PART B - (5 × 16 = 80 Marks)

11. (a) (i) Analyze the data dependencies among the following statements in the given program:

s1: Load R1, 1024
s2: Load R2, M(10)
s3: Add R1, R2
s4: Store M(1024), R1
s5: Store M((R2)), 1024

where (Ri) means the content of register Ri and M(10) contains 64 initially.

(1) Draw a dependence graph to show all the dependencies.
(2) Are there any resource dependencies if only one copy of each functional unit is available in the CPU? (8)

(ii) Explain about the theoretical models of parallel computers used by algorithm designers and chip developers. (8)

Or

(b) Characterize the architectural operations of SIMD and MIMD computers. Distinguish between multiprocessors and multicomputers based on their structures, resource sharing and interprocessor communications. Also explain the differences among UMA, NUMA, COMA and NORMA computers. (16)

12. (a) Explain the applicability and restrictions involved in using Amdahl's law, Gustafson's law, and Sun and Ni's law to estimate the speedup performance of an n-processor system compared with that of a single-processor system, ignoring all communication overheads. (16)

Or

(b) (i) Compare control flow, data flow and reduction computers in terms of the program flow mechanism used. Comment on the advantages and disadvantages of the above computer models. (8)

(ii) Explain the steps involved in calculating the grain size and communication latency for multiplying two 2 × 2 matrices. (8)
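The three speedup laws named in Question 12 can be compared numerically. The Python sketch below uses the common textbook forms with a normalized workload, where `alpha` is the sequential fraction, `n` the number of processors, and `g` the workload growth factor G(n) in the memory-bounded model. The function names and this notation are assumptions made for illustration; they are not part of the question paper, and communication overheads are ignored as the question states.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Fixed-workload speedup: S = 1 / (alpha + (1 - alpha) / n)."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


def gustafson_speedup(alpha: float, n: int) -> float:
    """Fixed-time (scaled) speedup: S = alpha + (1 - alpha) * n."""
    return alpha + (1.0 - alpha) * n


def sun_ni_speedup(alpha: float, n: int, g: float) -> float:
    """Memory-bounded speedup with parallel workload scaled by g = G(n).

    S = (alpha + (1 - alpha) * g) / (alpha + (1 - alpha) * g / n)
    """
    scaled_parallel = (1.0 - alpha) * g
    return (alpha + scaled_parallel) / (alpha + scaled_parallel / n)


if __name__ == "__main__":
    alpha, n = 0.1, 16  # illustrative values only
    print("Amdahl   :", amdahl_speedup(alpha, n))
    print("Gustafson:", gustafson_speedup(alpha, n))
    # g = 1 reduces to the fixed-workload case, g = n to the fixed-time case.
    print("Sun-Ni, g=1:", sun_ni_speedup(alpha, n, 1.0))
    print("Sun-Ni, g=n:", sun_ni_speedup(alpha, n, float(n)))
```

In this form the memory-bounded expression reduces to the fixed-workload case when G(n) = 1 and to the fixed-time case when G(n) = n, which is one way to frame the applicability of each law.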

13. (a) (i) Explain the difference between superscalar and VLIW architectures in terms of hardware and software requirements. (8)

(ii) Consider a two-level memory hierarchy, M1 and M2. Denote the hit ratio of M1 as h. Let c1 and c2 be the costs per kilobyte, s1 and s2 the memory capacities, and t1 and t2 the access times, respectively.

(1) Under what conditions will the average cost of the entire memory approach c2?
(2) What is the effective memory access time ta of this hierarchy?
(3) Let r = t2/t1 be the speed ratio of the two memories. Let E = t1/ta be the access efficiency of the memory system. Express E in terms of r and h.
(4) What is the required hit ratio h to make E > 0.95 if r = 100? (8)

Or

(b) (i) Describe the daisy chaining and the distributed arbiter for bus arbitration on a multiprocessor system. State the advantages and shortcomings of each from both the implementational and operational points of view. (8)

(ii) Consider the following three interleaved memory designs for a main memory system with 16 memory modules. Each module is assumed to have a capacity of 1 Mbyte. The machine is byte-addressable.

Design 1: 16-way interleaving with one memory bank.
Design 2: 8-way interleaving with two memory banks.
Design 3: 4-way interleaving with four memory banks.

(1) Specify the address formats for each of the above memory organizations.
(2) Determine the maximum memory bandwidth obtained if only one memory module fails in each of the above memory organizations.
(3) Comment on the relative merits of the three interleaved memory organizations. (8)

14. (a) (i) Why are fine-grain processors chosen for future multiprocessors over medium-grain processors used in the past? From the scalability point of view, why is fine-grain parallelism more appealing than medium-grain or coarse-grain parallelism for building MPP systems? (8)

(ii) Compare the connection machines CM-2 and CM-5 in their architectures, operation modes, functional capabilities and potential performance. Comment on the improvement made in CM-5 over CM-2 from the viewpoints of a computer architect and a machine programmer. (8)

Or

(b) (i) Prove that the greedy algorithm for multicast routing on a wormhole routed hypercube network always yields the minimum network traffic and minimum distance from the source to any of the destinations. (6)

(ii) Consider the following reservation table for a four stage pipeline with a clock cycle τ = 20 ns.

[Reservation table: stages S1 to S4 as rows, clock cycles 1 to 6 as columns; the positions of the X marks are not recoverable here.]

One non-compute delay stage can be inserted into the pipeline to make a latency of 1 permissible in the shortest greedy cycle. The purpose is to yield a new reservation table leading to an optimal latency equal to the lower bound.

(1) Show the modified reservation table with five rows and seven columns.
(2) Draw the state transition diagram for the optimal cycle.
(3) List all the simple and greedy cycles from the state diagram.
(4) Prove that the new MAL equals the lower bound.
(5) What is the optimal throughput of this pipeline? (10)

15. (a) (i) What is perfect decomposition? Discuss the differences in program replication techniques on multi-computers as opposed to program partitioning on multiprocessors. (8)

(ii) Explain the multiprocessor UNIX design goals in the areas of compatibility, portability, address space, load balancing, parallel I/O and network services. (8)

Or

(b) Explain loop transformation theory and discuss how it can be applied for loop vectorization or parallelization.
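As a cross-check for the page-trace question in Part A (eight virtual pages, four page frames, LRU replacement, frames initially empty), the trace can be replayed with a short Python simulation. This is a minimal sketch rather than a model answer; the function name `simulate_lru` and its return format are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_lru(trace, num_frames):
    """Replay a page trace under LRU and return (hits, snapshots).

    snapshots[i] lists the resident pages after reference i,
    ordered from least recently used to most recently used.
    """
    frames = OrderedDict()  # keys kept in LRU -> MRU order
    hits = 0
    snapshots = []
    for page in trace:
        if page in frames:
            hits += 1
            frames.move_to_end(page)  # page becomes most recently used
        else:
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict least recently used page
            frames[page] = True
        snapshots.append(list(frames))
    return hits, snapshots


# Page trace copied from the question.
TRACE = [1, 0, 2, 2, 1, 7, 6, 7, 0, 1, 2, 0, 3, 0, 4, 5, 1,
         5, 2, 4, 5, 6, 7, 6, 7, 2, 4, 2, 7, 3, 3, 2, 3]

if __name__ == "__main__":
    hits, snapshots = simulate_lru(TRACE, num_frames=4)
    for page, resident in zip(TRACE, snapshots):
        print(page, resident)
    print(f"hits = {hits}, references = {len(TRACE)}, "
          f"hit ratio = {hits / len(TRACE):.3f}")
```

For the trace in the question this sketch counts 16 hits in 33 references, a hit ratio of about 0.485; the per-reference snapshots give the successive frame contents the question asks for.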
