
QUESTION 9.1
The suboperations performed in each segment of the pipeline are as follows:

R1 <- Ai,  R2 <- Bi         Input Ai and Bi
R3 <- Ci,  R4 <- Di         Input Ci and Di
R5 <- R1+R2,  R6 <- R3+R4   Add the inputs
R7 <- R5*R6                 Multiply

Each segment contains one or more registers and a combinational circuit, as shown in the configuration below:

Segment 1:  Ai -> R1,  Bi -> R2,  Ci -> R3,  Di -> R4
Segment 2:  R1, R2 -> Adder -> R5;   R3, R4 -> Adder -> R6
Segment 3:  R5, R6 -> Multiplier -> R7

Fig.1 Pipeline Configuration for (Ai+Bi)*(Ci+Di)

The following table shows the content of all the registers for i = 1 through 6:

Clock   Segment 1         Segment 2        Segment 3
Pulse   R1  R2  R3  R4    R5     R6        R7
1       A1  B1  C1  D1    -      -         -
2       A2  B2  C2  D2    A1+B1  C1+D1     -
3       A3  B3  C3  D3    A2+B2  C2+D2     (A1+B1)*(C1+D1)
4       A4  B4  C4  D4    A3+B3  C3+D3     (A2+B2)*(C2+D2)
5       A5  B5  C5  D5    A4+B4  C4+D4     (A3+B3)*(C3+D3)
6       A6  B6  C6  D6    A5+B5  C5+D5     (A4+B4)*(C4+D4)
7       -   -   -   -     A6+B6  C6+D6     (A5+B5)*(C5+D5)
8       -   -   -   -     -      -         (A6+B6)*(C6+D6)
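The register contents for this three-segment pipeline can be reproduced with a small simulation. The following is a minimal Python sketch, not part of the original solution; the function name `simulate_pipeline` and the use of `None` for an empty register are assumptions made for illustration.

```python
def simulate_pipeline(a, b, c, d):
    """Return one tuple (pulse, R1, ..., R7) per clock pulse.

    All registers load at the same clock edge, so the new values of
    R5, R6 and R7 are computed from the old register contents first.
    None stands for a register that holds no valid data.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = r6 = r7 = None
    rows = []
    for pulse in range(n + 2):  # k + (n - 1) pulses with k = 3 segments
        new_r7 = r5 * r6 if r5 is not None else None   # segment 3: multiply
        new_r5 = r1 + r2 if r1 is not None else None   # segment 2: add
        new_r6 = r3 + r4 if r3 is not None else None
        if pulse < n:                                   # segment 1: input
            r1, r2, r3, r4 = a[pulse], b[pulse], c[pulse], d[pulse]
        else:
            r1 = r2 = r3 = r4 = None
        r5, r6, r7 = new_r5, new_r6, new_r7
        rows.append((pulse + 1, r1, r2, r3, r4, r5, r6, r7))
    return rows


if __name__ == "__main__":
    A = [1, 2, 3, 4, 5, 6]
    B = [10, 20, 30, 40, 50, 60]
    C = [2, 2, 2, 2, 2, 2]
    D = [1, 1, 1, 1, 1, 1]
    for row in simulate_pipeline(A, B, C, D):
        print(row)
```

With six input sets the simulation runs for eight clock pulses: the first product appears in R7 at pulse 3 and the last at pulse 8.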

QUESTION 9.2
Firstly, we determine the number of clock cycles.
Number of segments, k = 6
Number of tasks, n = 8
Therefore, the number of clock cycles is k+(n-1) = 6+(8-1) = 13 clock cycles.

The space-time diagram for a 6-segment pipeline is shown below:

Clock cycle 1   2   3   4   5   6   7   8   9   10  11  12  13
Segment 1   T1  T2  T3  T4  T5  T6  T7  T8
Segment 2       T1  T2  T3  T4  T5  T6  T7  T8
Segment 3           T1  T2  T3  T4  T5  T6  T7  T8
Segment 4               T1  T2  T3  T4  T5  T6  T7  T8
Segment 5                   T1  T2  T3  T4  T5  T6  T7  T8
Segment 6                       T1  T2  T3  T4  T5  T6  T7  T8

It takes 13 clock cycles to process 8 tasks in a 6-segment pipeline.

QUESTION 9.3
Number of segments, k = 6
Number of tasks, n = 200
Number of clock cycles = k+(n-1)
Therefore, 6+(200-1) = 205 clock cycles.

QUESTION 9.4
tn = 50ns, tp = 10ns, n = 100, number of segments k = 6.

i. Speedup:

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p} = \frac{100 \times 50}{(6 + 100 - 1) \times 10} = \frac{5000}{1050} = \frac{100}{21} \approx 4.76$$

Hence, the speedup ratio = 100/21 = 4.76.

ii. For the maximum speedup, we assume tn = k tp, i.e.

$$S = \frac{k\,t_p}{t_p} = k = \frac{6 \times 10}{10} = 6$$

Hence, the maximum speedup that can be achieved is 6. Note that this bound relies on the assumption tn = k tp = 60ns; with the given tn = 50ns, the speedup for a large number of tasks approaches tn/tp = 50/10 = 5.
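The clock-cycle count k+(n-1) and the speedup ratio used in Questions 9.2 to 9.4 can be checked with a short script. This is a minimal Python sketch under the same formulas; the function names `pipeline_cycles`, `speedup` and `space_time` are chosen here for illustration and do not come from the original solution.

```python
from fractions import Fraction


def pipeline_cycles(k, n):
    """Clock cycles needed to push n tasks through a k-segment pipeline."""
    return k + (n - 1)


def speedup(k, n, tn, tp):
    """Speedup S = n*tn / ((k + n - 1)*tp), kept as an exact fraction."""
    return Fraction(n * tn, pipeline_cycles(k, n) * tp)


def space_time(k, n):
    """Grid with one row per segment and one column per clock cycle.

    Each entry is the task number occupying that segment in that cycle,
    or None when the segment is idle.
    """
    cycles = pipeline_cycles(k, n)
    return [
        [(c - s + 1) if 0 <= c - s < n else None for c in range(cycles)]
        for s in range(k)
    ]


if __name__ == "__main__":
    print(pipeline_cycles(6, 8))      # Question 9.2
    print(pipeline_cycles(6, 200))    # Question 9.3
    s = speedup(6, 100, 50, 10)       # Question 9.4
    print(s, round(float(s), 2))
    for row in space_time(6, 8):
        print(" ".join(f"T{t}".ljust(3) if t else "   " for t in row))
```

Keeping the speedup as a `Fraction` makes the intermediate value 100/21 visible before it is rounded.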

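QUESTION 9.5
[Figure: pipeline with inputs Ai, Bi and Ci; registers R1 and R2 (40ns operand load); Multiplier (45ns); registers R3 and R4; Adder]

Number of segments, k = 3

a. Minimum clock cycle time: tp = (45+5)ns = 50ns

b. For the non-pipeline system, tn = (40+45+15)ns = 100ns.
Number of tasks, n = 7: n tn = 7 × 100 = 700ns

c. Speedup for 10 tasks. With n = 10, tn = 100ns, tp = 50ns we have:

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p} = \frac{10 \times 100}{(3 + 10 - 1) \times 50} = \frac{1000}{600} \approx 1.67$$

d. Speedup of the pipeline for 100 tasks:

$$S = \frac{n\,t_n}{(k + n - 1)\,t_p} = \frac{100 \times 100}{(3 + 100 - 1) \times 50} = \frac{10000}{5100} \approx 1.96$$

e. Maximum speedup. In the ideal case tn = k tp, the maximum speedup that can be achieved is the number of segments in the pipeline, i.e.

$$S = \frac{k\,t_p}{t_p} = k = 3$$

With the delays given here, tn = 100ns is less than k tp = 150ns, so the speedup for a large number of tasks approaches tn/tp = 100/50 = 2.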
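To see how the speedup in Question 9.5 behaves as the number of tasks grows, the formula can be evaluated for several values of n. The sketch below is illustrative only: the helper names are hypothetical, and the segment delays are passed as the list [40, 45, 15] with a 5ns register delay, which is an assumption about how the figure's delays map onto the three segments.

```python
def min_clock_cycle(segment_delays, register_delay):
    """Clock cycle = slowest segment plus the interface register delay."""
    return max(segment_delays) + register_delay


def speedup(n, k, tn, tp):
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks."""
    return (n * tn) / ((k + n - 1) * tp)


if __name__ == "__main__":
    delays = [40, 45, 15]              # ns, assumed per-segment delays
    tp = min_clock_cycle(delays, 5)    # ns
    tn = sum(delays)                   # ns, non-pipelined time per task
    k = len(delays)
    for n in (10, 100, 10_000):
        print(n, round(speedup(n, k, tn, tp), 3))
    print("limit tn/tp =", tn / tp)
```

The printed values rise with n and level off at tn/tp, which is why the ratio of the two times bounds the achievable speedup for a given set of delays.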

Question 9.7
The time delays of the four segments in the pipeline in figure 3 are as follows: t1 = 50ns, t2 = 30ns, t3 = 95ns, t4 = 45ns. The interface registers delay time tr = 5ns.
a. How long would it take to add 100 pairs of numbers in the pipeline?
b. How can we reduce the total time to about one half of the time calculated in part (a)?

Solution
Time delays for each of the four segments are: t1 = 50ns, t2 = 30ns, t3 = 95ns, t4 = 45ns.
Interface register delay time tr = 5ns
Maximum time delay = t3
Clock cycle tp = t3 + tr = 95 + 5 = 100ns
Hence, the minimum clock cycle for each task is 100ns.
For 100 tasks (100 pairs of numbers) the pipeline needs k+(n-1) = 4+(100-1) = 103 clock cycles, so we have: (103 * 100)ns = 10300ns, or roughly (100 * 100)ns = 10000ns if the time to fill the pipeline is ignored.

Question 9.8
How would you use the floating-point pipeline adder of fig 9.6 to add 100 floating-point numbers X1 + X2 + X3 + ... + X100?

Solution
Let the floating-point numbers X1, X2, X3, ..., X100 be represented in the form below:

X1 = A × 2^a
X2 = B × 2^b
X3 = C × 2^c
...
X100 = M × 2^m

where A, B, C, ..., M are the mantissas and a, b, c, ..., m are the exponents, with the assumption that the floating-point numbers are binary numbers.
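Returning to the timing in Question 9.7, the total time follows from the slowest segment and the cycle count k+(n-1). The Python sketch below is an illustration, not the original author's working. The second call shows one hypothetical way to approach part (b), namely splitting the 95ns segment into two segments of 50ns and 45ns; that split is an assumption made here and is not stated in the text above.

```python
def total_time_ns(segment_delays, register_delay, n_tasks):
    """Total time for n_tasks through a pipeline with the given segment delays.

    The clock cycle is set by the slowest segment plus the register delay,
    and the pipeline needs k + (n - 1) cycles for n tasks.
    """
    k = len(segment_delays)
    tp = max(segment_delays) + register_delay
    return (k + n_tasks - 1) * tp


if __name__ == "__main__":
    # Part (a): the four segments as given.
    original = total_time_ns([50, 30, 95, 45], 5, 100)
    # Part (b), hypothetical: split the 95 ns segment into 50 ns + 45 ns.
    split = total_time_ns([50, 30, 50, 45, 45], 5, 100)
    print(original, split, round(split / original, 2))
```

Under this assumed split the clock cycle drops from 100ns to 55ns at the cost of one extra segment, which brings the total close to one half of the part (a) figure.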


Question 9.10 Pipeline Processing
Four possible hardware schemes that can be used in an instruction pipeline in order to minimize the performance degradation caused by instruction branching

What is Branch Instruction?
A branch (or jump on some computer architectures such as the PDP-8 and Intel x86) is a point in a computer program where the flow of control is altered. The term branch is usually used when referring to a program written in machine code or assembly language. In a high-level programming language, branches usually take the form of conditional statements, subroutine calls or GOTO statements.

An instruction that causes a branch, a branch instruction, can be taken or not taken. If a branch is not taken, the flow of control is unchanged and the next instruction to be executed is the instruction immediately following the current instruction in memory. If taken, the next instruction to be executed is an instruction at some other place in memory; the branch alters the sequential program flow by loading the program counter with the target address. There are usually two forms of branch instruction: a conditional branch, which can be either taken or not taken depending on a condition such as a CPU flag, and an unconditional branch, which is always taken.

Pipelined computers employ various hardware techniques to minimize the performance degradation caused by instruction branching.

PREFETCH TARGET INSTRUCTION
Pre-fetching of the target instruction is a way of handling a conditional branch: the target instruction is pre-fetched in addition to the instruction following the branch. Both are saved until the branch is executed. If the branch condition is successful, the pipeline continues from the branch target instruction. An extension of this procedure is to continue fetching instructions from both places until the branch decision is made. At that time control chooses the instruction stream of the correct program flow.

BRANCH TARGET BUFFER (BTB)
The branch target buffer is an associative memory included in the fetch segment of the pipeline. Each entry in the branch target buffer consists of the address of a previously executed branch instruction and the target instruction for that branch. It also stores the next few instructions after the branch target instruction. When the pipeline decodes a branch instruction, it searches the branch target buffer for the address of the instruction. If it is in the branch target buffer, the instruction is available directly and pre-fetch continues from the new path. If the instruction is not in the branch target buffer, the pipeline shifts to a new instruction stream and stores the target instruction in the branch target buffer. The advantage of this scheme is that branch instructions that have occurred previously are readily available in the pipeline without interruption.

LOOP BUFFER
A variation of the branch target buffer is the loop buffer. This is a very high speed register file maintained by the instruction fetch segment of the pipeline. When a program loop is detected in the program, it is stored in the loop buffer in its entirety, including all branches. The program can be executed directly without having to access memory until the loop mode is removed by the final branching out.

BRANCH PREDICTION
A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed. The pipeline then begins pre-fetching the instruction stream from the predicted path. A correct prediction eliminates the wasted time caused by branch penalties.

Question 9.16
Consider the multiplication of two 40*40 matrices using a vector processor.
a. How many product terms are there in each inner product, and how many inner products must be evaluated?
b. How many multiply-add operations are needed to calculate the product matrix?

Solutions
a. There are 40 product terms in each inner product. The product matrix has 40*40 elements, each requiring one inner product. Hence the number of inner products = 40*40 = 1600.
b. Multiply-add operations = inner products * 40 = 1600 * 40 = 64000. There are 64000 multiply-add operations needed to calculate the product matrix.

Question 9.19
Flops is the number of floating-point operations performed per second by a computer system. A megaflop rating counts millions of floating-point operations per second and a gigaflop rating counts billions of floating-point operations per second. A typical supercomputer has a basic cycle time of 4 to 20ns. If the processor of this supercomputer can complete one floating-point operation through a pipeline each cycle time, it will have the ability to perform 50 to 250 megaflops.

The supercomputer can perform 100 million floating-point operations per second, i.e. 100 megaflops. The number of operations is 250 billion floating-point operations, i.e. 250 × 1000 million operations. Hence, the time that it will take this computer to carry out the operation is:

$$\frac{250 \times 10^{9}}{100 \times 10^{6}} = \frac{1000 \times 250}{100} = 2500 \text{ seconds}$$

QUESTION 9.20
i) To perform 400 floating-point operations using four processors with a cycle time of 40ns in each, the total time required is:

$$\frac{\text{number of operations}}{\text{number of processors}} \times \text{processor cycle time} = \frac{400}{4} \times 40\,\text{ns} = 4000\,\text{ns}$$

ii) When using a single processor with a clock cycle of 10ns to perform the same task, the total time required = (400/1) × 10ns = 4000ns.

Therefore, there is no difference in the time taken to perform the jobs between the two cases.
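The arithmetic in Questions 9.16, 9.19 and 9.20 is simple enough to verify in a few lines. The following Python sketch is only a check of those calculations; the function names are hypothetical, and the model assumes one floating-point operation completes per processor per clock cycle, as the solutions above do.

```python
def multiply_adds(n):
    """Multiply-add operations for the product of two n x n matrices.

    There are n*n inner products, each with n product terms.
    """
    inner_products = n * n
    return inner_products * n


def seconds_for(operations, flops):
    """Time in seconds to run `operations` at a rate of `flops` per second."""
    return operations / flops


def parallel_time_ns(operations, processors, cycle_ns):
    """Total time when the operations are shared evenly among the processors."""
    return operations / processors * cycle_ns


if __name__ == "__main__":
    print(multiply_adds(40))                       # Question 9.16
    print(seconds_for(250 * 10**9, 100 * 10**6))   # Question 9.19
    print(parallel_time_ns(400, 4, 40))            # Question 9.20 (i)
    print(parallel_time_ns(400, 1, 10))            # Question 9.20 (ii)
```

The last two calls return the same value, which is the point of Question 9.20: four processors at 40ns deliver the same throughput as one processor at 10ns.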
