
Tikrit University                              Academic year 2019-2020
College of Petroleum Process Engineering
Petroleum and Control Engineering Dept.        Course: Computer Architecture 1

Chapter 7: Parallel and Pipelined Processing
Basic Ideas
• Parallel processing vs. pipelined processing

  Parallel processing (time →)      Pipelined processing (time →)
  P1: a1 a2 a3 a4                   P1: a1 b1 c1 d1
  P2: b1 b2 b3 b4                   P2: a2 b2 c2 d2
  P3: c1 c2 c3 c4                   P3: a3 b3 c3 d3
  P4: d1 d2 d3 d4                   P4: a4 b4 c4 d4

• Parallel: less inter-processor communication; pipelined: more inter-processor communication
• Parallel: complicated processor hardware; pipelined: simpler processor hardware
• Subscripts 1-4: different types of operations performed
• a, b, c, d: different data streams processed
Data Dependence

• Parallel processing requires NO data dependence between processors:
  P1-P4 each work on their own data stream independently over time.
• Pipelined processing will involve inter-processor communication:
  each of P1-P4 passes its result to the next processor over time.
Basic Pipeline
Five-stage “RISC” load-store architecture

1. Instruction Fetch (IF)
   • get instruction from memory, increment PC
2. Instruction Decode (ID)
   • translate opcode into control signals and read registers
3. Execute (EX)
   • perform ALU operation, compute jump/branch targets
4. Memory (MEM)
   • access memory if needed
5. Writeback (WB)
   • update register file
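As an illustrative sketch (not from the notes), the following computes the clock cycle in which each instruction occupies each of the five stages above, assuming one instruction issues per cycle with no stalls:

```python
# Ideal 5-stage pipeline schedule: instruction i enters IF in cycle i+1
# and advances one stage per cycle.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(n_instructions):
    """Return {instruction index: {stage: cycle}} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        schedule[i] = {stage: i + 1 + s for s, stage in enumerate(STAGES)}
    return schedule

sched = pipeline_schedule(3)
print(sched[0])  # {'IF': 1, 'ID': 2, 'EX': 3, 'MEM': 4, 'WB': 5}
```

The first instruction writes back in cycle 5; each later instruction finishes exactly one cycle after its predecessor, which is why steady-state throughput is one instruction per cycle.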
Time Graphs
        Clock cycle
        1   2   3   4   5   6   7   8   9
add     IF  ID  EX  MEM WB
lw          IF  ID  EX  MEM WB
                IF  ID  EX  MEM WB
                    IF  ID  EX  MEM WB
                        IF  ID  EX  MEM WB

Latency: 5 cycles
Throughput: 1 instruction/cycle
Concurrency: 5
CPI = 1
Cycles Per Instruction (CPI)

• Instruction mix for some program P, assume:
  • 25% load/store (3 cycles/instruction)
  • 60% arithmetic (2 cycles/instruction)
  • 15% branches (1 cycle/instruction)

• Multi-cycle performance for program P:
  • 3 × 0.25 + 2 × 0.60 + 1 × 0.15 = 2.1
  • average cycles per instruction (CPI) = 2.1
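The weighted-average calculation above can be sketched directly (same mix and cycle counts as stated):

```python
# CPI as a weighted average over the instruction mix of program P.
mix = {  # class -> (fraction of instructions, cycles per instruction)
    "load/store": (0.25, 3),
    "arithmetic": (0.60, 2),
    "branch":     (0.15, 1),
}

cpi = sum(frac * cycles for frac, cycles in mix.values())
print(round(cpi, 2))  # 2.1
```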
SIX STAGES OF INSTRUCTION PIPELINING

• Fetch Instruction (FI)
  Read the next expected instruction into a buffer.
• Decode Instruction (DI)
  Determine the opcode and the operand specifiers.
• Calculate Operands (CO)
  Calculate the effective address of each source operand.
• Fetch Operands (FO)
  Fetch each operand from memory. Operands in registers need not be fetched.
• Execute Instruction (EI)
  Perform the indicated operation and store the result.
• Write Operand (WO)
  Store the result in memory.
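Under ideal conditions (no branches or stalls), n instructions flow through a k-stage pipeline in k + (n - 1) cycles: the first instruction takes k cycles, and each subsequent one completes a cycle later. A minimal sketch (illustrative, not from the notes):

```python
# Total cycles for n instructions through an ideal k-stage pipeline.

def pipeline_cycles(k, n):
    return k + (n - 1)

# Nine instructions through the six-stage pipeline above:
print(pipeline_cycles(6, 9))  # 14
```

Without pipelining the same nine instructions would need 6 × 9 = 54 cycles.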
[Figure: timing diagram for the six-stage CPU instruction pipeline operation]
Pipeline Performance: Clock & Timing

Each stage Si is separated from the next stage Si+1 by a latch. Let τm be
the delay of stage m and d the latch delay. Then:

Clock cycle of the pipeline: τ = max{τm} + d
Pipeline frequency: f = 1/τ
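A small sketch of the clock-cycle formula, with hypothetical stage and latch delays (the values below are assumptions for illustration, not from the notes):

```python
# tau = max{tau_m} + d: the clock must accommodate the slowest stage
# plus the latch delay between stages.

stage_delays_ns = [10, 8, 12, 9]  # tau_m for each stage, in nanoseconds
latch_delay_ns = 1                # d

tau = max(stage_delays_ns) + latch_delay_ns  # clock cycle in ns
f_ghz = 1 / tau                              # frequency f = 1/tau (GHz, since tau is in ns)

print(tau)  # 13
```

Note that the slowest stage dictates the clock for every stage, which is why balancing stage delays matters in pipeline design.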
Advantages

• Pipelining makes efficient use of hardware resources.
• Faster execution of a large number of instructions.
• The parallelism is invisible to the programmer.
Speed Up Equation for Pipelining

For a simple RISC pipeline with CPI = 1, a k-stage pipeline executes n
instructions in k + (n - 1) cycles instead of the n·k cycles needed
without pipelining, so the speedup is:

  Speedup = n·k / (k + (n - 1))

which approaches k, the number of stages, as n grows large.
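For a k-stage pipeline with CPI = 1, n instructions finish in k + (n - 1) cycles versus n·k cycles unpipelined, giving a speedup of n·k / (k + n - 1). A minimal numeric sketch (illustrative values):

```python
# Pipeline speedup: unpipelined cycles (n*k) over pipelined cycles (k + n - 1).

def speedup(k, n):
    return (n * k) / (k + n - 1)

# For many instructions the speedup approaches k, the number of stages:
print(round(speedup(5, 1000), 2))  # 4.98, close to the 5-stage ideal of 5
```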


Reduced Instruction Set Computer (RISC) Pipelining
• Key Features of RISC
– Limited and simple instruction set
– Memory access instructions limited to memory <-> registers
– Operations are register to register
– Large number of general purpose registers
(and use of compiler technology to optimize register use)
– Emphasis on optimising the instruction pipeline
(& memory management)
– Hardwired for speed (no microcode)
Memory-to-Memory vs. Register-to-Memory Operations

• RISC uses only register-to-memory operations: data moves between memory
  and registers through explicit load/store instructions, and all other
  operations are register to register.
RISC Pipelining Basics
• Define two phases of execution for register-based instructions:
  – I: Instruction fetch
  – E: Execute
    • ALU operation with register input and output
• For load and store there are three phases:
  – I: Instruction fetch
  – E: Execute
    • Calculate the memory address
  – D: Memory
    • Register-to-memory or memory-to-register operation


Effects of RISC Pipelining

[Figure: (b) three-stage pipelined timing]

Optimization of RISC Pipelining

• Delayed branch
  – Leverages a branch that does not take effect until after execution of
    the following instruction
  – The following instruction occupies the branch's delay slot
Normal vs Delayed Branch
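The comparison table that accompanied this slide is not reproduced here. As an illustrative sketch (the tiny three-instruction program and `trace` function below are hypothetical, not from the notes), the difference is that under delayed branching the instruction after a branch always executes before the branch takes effect:

```python
# Trace which instructions execute, with and without delayed branching.
# Program: index 0 = "add", 1 = "jump 4", 2 = "sub" (delay slot),
#          3 = "mul", 4 = "end".

def trace(program, delayed):
    pc, executed = 0, []
    while program[pc] != "end":
        instr = program[pc]
        executed.append(instr)
        if instr.startswith("jump"):
            target = int(instr.split()[1])
            if delayed:
                # The delay-slot instruction runs before the branch takes effect.
                executed.append(program[pc + 1])
            pc = target
        else:
            pc += 1
    return executed

prog = ["add", "jump 4", "sub", "mul", "end"]
print(trace(prog, delayed=False))  # ['add', 'jump 4']
print(trace(prog, delayed=True))   # ['add', 'jump 4', 'sub']
```

On a real delayed-branch machine the compiler tries to move a useful instruction into the delay slot (or inserts a NOP), so the extra executed instruction does productive work instead of being wasted.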
