Important Note: 1. On completing your answers, compulsorily draw diagonal cross lines on the remaining blank pages. 2. Any revealing of identification, appeal to evaluator and/or equations written, e.g. 42+8 = 50, will be treated as malpractice.

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, December 2012
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer FIVE full questions, selecting at least TWO questions from each part.

PART - A
1 a. List and explain four important technologies which have led to the improvements in computer system. (10 Marks)
  b. Give a brief explanation about trends in power in integrated circuits and cost. (10 Marks)
2 a. Explain the pipeline hazards in detail. (10 Marks)
  b. Show how a loop is unrolled so that there are four copies of the loop body, assuming R1 - R2 (that is, the size of the array) is initially a multiple of 32, which means that the number of loop iterations is a multiple of 4. Eliminate any obviously redundant computations and do not reuse any of the registers. (10 Marks)
3 a. What is dynamic prediction? Draw the state transition diagram for a 2-bit prediction scheme. (04 Marks)
  b. What is the basic compiler technique for exposing ILP? (06 Marks)
  c. How to overcome the data hazards with dynamic scheduling? (10 Marks)
4 a. How do we exploit ILP using multiple issue and dynamic scheduling? (10 Marks)
  b. What is the basic concept of the VLIW approach? (10 Marks)

PART - B
5 a. Explain the symmetric shared memory architecture in detail. (10 Marks)
  b. Explain in detail the distributed shared memory and directory based coherence. (10 Marks)
6 a. How to protect virtual memory and virtual machines? (10 Marks)
  b. Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock.
     Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time? (05 Marks)
  c. Suppose you measure a new DDR3 DIMM to transfer at 16000 MB/sec. What do you think its name will be? What is the clock rate of that DIMM? What is your guess of the name of the DRAMs used in that DIMM? (05 Marks)
7 a. Describe eleven advanced optimizations for cache performance. (12 Marks)
  b. What is memory technology and optimization? (08 Marks)
8 a. How to enhance the loop level parallelism? (10 Marks)
  b. What is the hardware support for exposing parallelism? (10 Marks)

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, May/June 2010
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer any FIVE full questions, selecting at least TWO questions from each part.

PART - A
1 a. Define computer architecture. Illustrate the seven dimensions of an ISA. (08 Marks)
What is dependability? Explain two main measures of dependability. (06 Marks)
Given the following measurements:
     Frequency of FP operations = 25%
     Average CPI of FP operations = [illegible]
     Average CPI of other instructions = 1.33
     Frequency of FPSQR = 2%
     CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare the two design alternatives using the processor performance equations. (06 Marks)
With a neat diagram, explain the classic five-stage pipeline for a RISC processor. (10 Marks)
What are the major hurdles of pipelining? Illustrate the branch hazards in detail. (10 Marks)
What are the techniques used to reduce branch costs? Explain both static and dynamic branch prediction used for same.
(10 Marks)
With a neat diagram, give the basic structure of Tomasulo based MIPS FP unit and explain the various fields of reservation stations. (10 Marks)
Explain the basic VLIW approach for exploiting ILP, using multiple issues. (10 Marks)
What are the key issues in implementing advanced speculation techniques? Explain them in detail. (10 Marks)

PART - B
Explain the basic schemes for enforcing coherence in a shared memory multiprocessor system. (10 Marks)
Explain the directory based coherence for a distributed memory multiprocessor system. (10 Marks)
Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits? (10 Marks)
Explain in brief, the types of basic cache optimization. (10 Marks)
Which are the major categories of advanced optimizations of cache performance? Explain any one in detail. (10 Marks)
Explain in detail, the architecture support for protecting processes from each other via virtual memory. (10 Marks)
Explain in detail, the hardware support for preserving exception behaviour during speculation. (10 Marks)
Explain the prediction and speculation support provided in IA-64. (10 Marks)
* * * * *

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, June 2012
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer FIVE full questions, selecting at least TWO questions from each part.

PART - A
1 a. Define computer architecture. Illustrate the seven dimensions of an ISA. (08 Marks)
  b. Explain in brief measuring, reporting and summarizing performance of computer system. (08 Marks)
  c. Assume a disk subsystem with the following components and MTTF:
     * 10 disks, each rated at 1,000,000-hour MTTF.
     * 1 SCSI controller, 500,000-hour MTTF.
     * 1 power supply, 200,000-hour MTTF.
     * 1 fan, 200,000-hour MTTF.
     * 1 SCSI cable, 1,000,000-hour MTTF.
     Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole. (04 Marks)
2 a. Explain how pipeline is implemented in MIPS. (06 Marks)
  b. Explain different techniques in reducing pipeline branch penalties. (06 Marks)
  c. What are the major hurdles of pipelining? Explain briefly. (04 Marks)
  d. Consider the unpipelined processor in RISC. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline? (04 Marks)
3 a. What are the basic compiler techniques for exposing ILP? Explain briefly. (08 Marks)
  b. Explain Tomasulo's algorithm, sketching the basic structure of a MIPS floating point unit. (08 Marks)
  c. Explain true data dependence, name dependence and control dependence with an example code fragment. (04 Marks)
4 a. Explain exploiting ILP using dynamic scheduling, multiple issue and speculation. (08 Marks)
  b. Explain Pentium 4 pipeline supporting multiple issue with speculation. (08 Marks)
  c. Suppose we have a VLIW that could issue two memory references, two FP operations and one integer operation or branch in every clock cycle. Show an unrolled version of the loop x(i) = x(i) + s for such a processor. Unroll as many times as necessary to eliminate any stalls. Ignore delayed branches.
     MIPS code:
     Loop: L.D     F0, 0(R1)
           ADD.D   F4, F0, F2
           S.D     F4, 0(R1)
           DADDUI  R1, R1, #-8
           BNE     R1, R2, Loop
     (04 Marks)

PART - B
Explain basic schemes for enforcing coherence. (08 Marks)
Explain performance of symmetric shared memory multiprocessors. (08 Marks)
Suppose we have an application running on a 32-processor multiprocessor, which has a 200 ns time to handle reference to a remote memory. For this application, assume that all the references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference? (04 Marks)
Explain the six basic cache optimization techniques. (10 Marks)
Given the data below, what is the impact of second level cache associativity on its miss penalty?
     * Hit time L2 for direct mapped = 10 clock cycles.
     * Two-way set associativity increases hit time by 0.1 clock cycles to 10.1 clock cycles.
     * Local miss rate L2 for direct mapped = 25%.
     * Local miss rate L2 for two-way set associative = 20%.
     * Miss penalty L2 = 200 clock cycles.
     (06 Marks)
What are the techniques for fast address translation? Explain. (04 Marks)
Explain any 3 advanced cache optimization techniques. (08 Marks)
Explain memory technology and optimizations. (06 Marks)
Assume that the hit time of a two-way set associative first level data cache is 1.1 times faster than a four-way set associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume that the miss penalty is 10 clock cycles to the L2 cache for the two-way set associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
(06 Marks)
Explain detecting and enhancing loop level parallelism for VLIW. (06 Marks)
Explain Intel IA-64 architecture with a neat diagram. (06 Marks)
Explain hardware support for exposing parallelism for VLIW and EPIC. (08 Marks)
* * * * *

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, December 2011
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer any FIVE full questions, selecting at least TWO questions from each part.

PART - A
1 a. Define the computer architecture. Explain the response time, throughput, elapsed time and processor clock. (06 Marks)
  b. Briefly explain Amdahl's law. (07 Marks)
  c. Two code sequences for a particular machine are considered by a compiler designer.

     Instruction class    CPI for this instruction class
     A                    [illegible]
     B                    [illegible]
     C                    3

     The compiler designer considers 2 code sequences that require the following instruction counts for a particular high-level language statement.

     Code sequence    Instruction counts for instruction class
                      A     B     C
     1                20    10    20
     2                40    10    10

     i) Which code sequence executes most of the instructions?
     ii) What is the CPI for each sequence?
     iii) Which will be faster? (07 Marks)
2 a. What are the major hurdles of pipelining? Illustrate the data hazard, briefly. (10 Marks)
  b. With a neat block diagram, explain how an instruction can be executed in 4 or 5 clock cycles in MIPS data path, without the pipeline register. (10 Marks)
3 a. List the steps to unroll the code and schedule. (05 Marks)
  b. Explain how Tomasulo's algorithm can be extended to support speculation. (10 Marks)
  c. Explain the dynamic branch prediction state diagram. (05 Marks)
4 a. Explain the basic VLIW approach. List its drawbacks. (08 Marks)
  b. With a neat diagram, explain the steps involved in handling an instruction, with a branch target buffer. Also evaluate how well it works.
(12 Marks)

PART - B
5 a. Explain the different taxonomy of parallel architecture. (08 Marks)
  b. With a neat diagram, explain the basic structure of a centralized shared-memory and distributed-memory multiprocessor. (06 Marks)
  c. Explain the snooping, with respect to cache-coherence protocols. (06 Marks)
Explain the six basic optimizations. (12 Marks)
With a neat diagram, explain the hypothetical memory hierarchy. (08 Marks)
Explain the DRAM technology. How do you improve memory performance inside a DRAM chip? (10 Marks)
Explain the compiler optimizations to reduce miss rate. (10 Marks)
Find all the true dependences, output dependences and antidependences and eliminate the output and antidependences by renaming, in the code given below:
     for (i = 1; i <= 100; i++) {
         y[i] = x[i] / c;    /* s1 */
         x[i] = x[i] + c;    /* s2 */
         z[i] = y[i] + c;    /* s3 */
         y[i] = c - y[i];    /* s4 */
     }
     (10 Marks)
Write short notes on: i) The Itanium 2 processor ii) IA-64 register model. (10 Marks)

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, June/July 2011
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer FIVE full questions, selecting at least TWO questions each from Part - A and Part - B.

PART - A
1 a. Explain with a learning curve, how the cost of processor varies with time along with factors influencing the cost. (06 Marks)
Find the number of dies per 200 cm wafer of circular shape that is used to cut die that is 1.5 cm side and compare the number of dies produced on the same wafer if die is 1.25 cm. (06 Marks)
Define Amdahl's law. Derive an expression for CPU clock as a function of instruction count, clocks per instruction and clock cycle time. (08 Marks)
What are major hazards in a pipeline?
Explain data hazard and methods to minimize data hazard with example. (08 Marks)
Consider the following calculations: x = y + z; a = b * [illegible]. Assume the calculations are done using registers. Show, using 5 stage pipeline, how many clock pulses are required for direct operations. By reordering with stalls, show how many clock pulses are required and the saving in the number of clock pulses to solve data hazard. (12 Marks)
What are data dependencies? Explain name dependences with example between two instructions. (06 Marks)
What are correlating predictors? Explain with examples. (06 Marks)
For the following instructions, using dynamic scheduling show the status of the ROB and reservation stations when only MUL.D is ready to commit and the two L.D have committed:
     L.D     F6, 32(R2)
     L.D     F2, 44(R3)
     MUL.D   F0, F2, F4
     SUB.D   F8, F2, F6
     DIV.D   F10, F0, F6
     ADD.D   F6, F8, F2
Also show the type of hazards between instructions. (08 Marks)
Explain the basic VLIW approach for exploiting ILP, using multiple issues. (08 Marks)
What are the key issues in implementing advanced speculation techniques? Explain in detail. (08 Marks)
Write a note on value predictors. (04 Marks)

PART - B
Explain the directory based cache coherence for a distributed memory multiprocessor system along with state transition diagram. (10 Marks)
Explain any two hardware primitives to implement synchronization with example. (10 Marks)
Explain block replacement strategies to replace a block, with example when a cache [illegible] (06 Marks)
Explain the types of basic cache optimization. (09 Marks)
With a diagram, explain organization of data cache in the Opteron microprocessor. (05 Marks)
Explain the following advanced optimizations of cache:
     i) Compiler optimizations to reduce miss rate.
     ii) Merging write buffer to reduce miss penalty.
     iii) Non-blocking caches to increase cache bandwidth.
     (09 Marks)
Explain in detail the architecture support for protecting processes from each other via virtual machines.
(06 Marks)
Explain internal organization of 64 Mb DRAM. (05 Marks)
Explain in detail the hardware support for preserving exception behaviour during speculation. (10 Marks)
Explain the architecture of the Intel IA-64 processor and also the prediction and speculation support provided. (10 Marks)

USN                                                        06CS81
Eighth Semester B.E. Degree Examination, December 2010
Advanced Computer Architecture
Time: 3 hrs.                                               Max. Marks: 100
Note: Answer any FIVE full questions, selecting at least TWO questions from each part.

PART - A
List and explain four important technologies, which have led to the improvements in computer system. (07 Marks)
The given data presents the power consumption of several computer system components:

     Component     Product               Performance    Power
     Processor     Sun Niagara 8-core                   72-79 W
     DRAM          Kingston 1 GB                        3.7 W
     Hard drive    Diamond Max                          7.9 W read, 4.0 W idle

     i) Assuming the maximum load for each component, and a power supply efficiency of 70%, what wattage must the server's power supply deliver to a system with a Sun Niagara 8-core chip, 2 GB 184-pin Kingston DRAM and 7200 rpm hard drives?
     ii) How much power will the 7200 rpm disk drive consume, if it is idle roughly 40% of the time?
     iii) Assume that for the same set of requests, a 5400 rpm disk will require twice as much time to read data as a 10800 rpm disk. What percentage of time would the 5400 rpm disk drive be idle to perform the same transaction as in part (i)? (07 Marks)
We will run two applications on a dual Pentium processor, but the resource requirements are not the same. The first application needs 80% of the resources, and the other only 20% of the resources.
     i) Given that 40% of the first application is parallelizable, how much speed up will we achieve with that application, if run in isolation?
     ii) Given that 99% of the second application is parallelizable, how much speed up will this application observe, if run in isolation?
     iii) Given that 40% of the first application is parallelizable, how much overall system speedup would you observe, if we parallelized it? (06 Marks)
List pipeline hazards. Explain any one in detail. (07 Marks)
List and explain five different ways of classifying exception in a computer system. (07 Marks)
An unpipelined machine has 10 ns clock cycle and it uses four cycles for ALU operations and branches, five cycles for memory operations. Assume that relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose due to clock skew and set up, pipelining the machine adds 1 ns overhead to the clock. Find the speed up from pipelining. (06 Marks)
Show how the below loop would look on MIPS 5-stage pipeline, under the following situations. Find the number of cycles per iteration, for each case. Assume the latencies for integer and floating point operations, as given in the prescribed text book.
     Loop: L.D     F0, 0(R1)
           ADD.D   F4, F0, F2
           S.D     F4, 0(R1)
           DADDUI  R1, R1, #-8
           BNE     R1, R2, Loop
     i) Without scheduling and without loop unrolling.
     ii) With scheduling and without loop unrolling.
     iii) With loop unrolling four times and without scheduling.
     iv) With loop unrolling four times and with scheduling.
     (12 Marks)
What is the drawback of 1-bit dynamic branch prediction method? Clearly state how it is overcome in 2-bit prediction. Give the state transition diagram of 2-bit predictor. (08 Marks)
Explain the salient features of VLIW processor. (08 Marks)
Explain branch-target buffer. (08 Marks)
Write a short note on value predictors. (04 Marks)

PART - B
What is multiprocessor cache coherence? List two approaches to cache coherence protocol. Give the state diagram for write-invalidate write-back cache coherence protocol. Explain the three states of a block.
(12 Marks)
List and explain any three hardware primitives to implement synchronization. (08 Marks)
Assume we have a computer where CPI is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 cycles and miss rate is 2%, how much faster would the computer be, if all instructions were cache hits? (08 Marks)
Briefly explain four basic cache optimization methods. (12 Marks)
List and explain three C's model that sorts all cache misses. (06 Marks)
Explain the optimization methods mentioned below:
     i) Trace cache to reduce hit time.
     ii) Non-blocking cache to increase cache bandwidth.
     iii) Multi-banked cache to increase cache bandwidth.
     (09 Marks)
Briefly explain how memory protection is enforced via virtual memory. (05 Marks)
Consider the loop below:
     for (i = 1; i <= 100; i = i + 1) {
         A[i] = A[i] + B[i];        /* S1 */
         B[i+1] = C[i] + D[i];      /* S2 */
     }
What are the dependences between S1 and S2? Is this loop parallel? If not, show how to make it parallel. (08 Marks)
Explain Intel IA-64 architecture. (12 Marks)
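
Several numerical questions in the December 2010 paper above (the dual-processor speedup, the unpipelined machine with 1 ns overhead, and the 2% miss rate CPI question) reduce to short formulas. The following is a minimal sketch of how such answers might be checked, not a model answer. The function names are hypothetical, the formulas are the standard textbook forms of Amdahl's law and the CPU performance equation, and two assumptions are made that the questions do not state: the pipelined machine is taken to complete one instruction per clock with no stalls, and every instruction is taken to make one instruction-fetch access in addition to its data accesses.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only part of the work uses all processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def pipelining_speedup(clock_ns, overhead_ns, mix):
    """Speedup of an ideal pipeline over an unpipelined machine.

    mix is a list of (frequency, cycles) pairs for the unpipelined machine.
    Assumption: the pipeline finishes one instruction per (clock + overhead).
    """
    average_cycles = sum(freq * cycles for freq, cycles in mix)
    unpipelined_time_ns = clock_ns * average_cycles
    pipelined_time_ns = clock_ns + overhead_ns
    return unpipelined_time_ns / pipelined_time_ns


def cache_slowdown(base_cpi, data_refs_per_instr, miss_rate, miss_penalty):
    """Ratio of real CPI to all-hits CPI.

    Assumption: one instruction fetch per instruction plus the data accesses.
    """
    accesses_per_instr = 1.0 + data_refs_per_instr
    stall_cpi = accesses_per_instr * miss_rate * miss_penalty
    return (base_cpi + stall_cpi) / base_cpi


if __name__ == "__main__":
    # 40% parallelizable application on two processors
    print(f"Amdahl speedup: {amdahl_speedup(0.4, 2):.2f}")
    # 10 ns clock, 1 ns overhead, 40% ALU (4 cycles), 20% branch (4), 40% memory (5)
    mix = [(0.4, 4), (0.2, 4), (0.4, 5)]
    print(f"Pipelining speedup: {pipelining_speedup(10.0, 1.0, mix):.2f}")
    # CPI 1.0, 50% loads/stores, 2% miss rate, 25-cycle miss penalty
    print(f"All-hits machine is faster by: {cache_slowdown(1.0, 0.5, 0.02, 25):.2f}")
```

Under these assumptions the three results come out at about 1.25, 4.0 and 1.75. Changing the assumed number of memory accesses per instruction, or adding pipeline stalls, changes the last two figures.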
