Computer Hardware

Pipeline
Conventional Datapath
11-1
[Figure: datapath timing, (a) Conventional and (b) Pipelined. Delays: register file 0.6 ns, MUX B 0.2 ns, function unit 0.8 ns, MUX D 0.2 ns.]

  2.4 ns is required to perform a single operation (i.e. 416.7 MHz).

© 2008 Pearson Education, Inc.
M. Morris Mano & Charles R. Kime
LOGIC AND COMPUTER DESIGN FUNDAMENTALS, 4e
Production Line Analogy
  Automated car wash: cars are pulled through a series of
stations, at each of which a particular step is performed:
1. Wash
2. Rinse
3. Dry
  Latency = time needed to wash, rinse and dry one car.
  Throughput = rate of delivery of washed cars.
  Based on this analogy → a pipelined datapath with n
stages has a processing rate or throughput for
instructions that is ideally n times that of a non-pipelined
datapath.
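The analogy can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that every station takes the same time; the value of `STATION_MINUTES` is hypothetical, since the slide gives no timings.

```python
# Car-wash analogy: latency vs. throughput.
# STATION_MINUTES is a hypothetical value; the slides give no station times.
STATION_MINUTES = 2.0
STATIONS = ["wash", "rinse", "dry"]

n = len(STATIONS)
latency = n * STATION_MINUTES  # time for one car to pass all stations

# Non-pipelined: the next car enters only after the previous one has left.
cars_per_hour_serial = 60.0 / latency
# Pipelined: once every station is busy, one car leaves per station time.
cars_per_hour_pipelined = 60.0 / STATION_MINUTES

gain = cars_per_hour_pipelined / cars_per_hour_serial

print(f"latency per car: {latency} min")
print(f"throughput gain: {gain:.1f}x with {n} stations")
```

Latency per car is unchanged by pipelining; only the delivery rate improves, and the factor of n holds only when all stations take equal time and are always busy.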
Pipelined Datapath
11-1
[Figure: datapath timing, (a) Conventional and (b) Pipelined, with pipeline platforms separating the OF, EX and WB stages.]

  A pipelined datapath is formed by breaking a conventional datapath into
parts by inserting registers as pipeline platforms between these parts
  These registers provide temporary storage for data passing through the
pipeline
  Data moves synchronously with the clock
  Delay of operand fetch (OF) is 0.8 ns, delay of execution (EX) is 1.0 ns,
delay of write-back (WB) is 1.0 ns
  min clock period = 1.0 ns
  Operating frequency = 1.0 GHz (2.4 times that of the non-pipelined datapath)
  Even though there are 3 stages, the improvement factor is not quite 3. Why?
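A minimal sketch of the clock-period arithmetic, using the delays quoted on the slides. It assumes the register file costs 0.6 ns for the read and again 0.6 ns for the write, which is what makes the conventional path add up to 2.4 ns.

```python
# Component delays in ns for the conventional datapath.
# Assumption: register file read and write each cost 0.6 ns.
conventional_ns = [0.6, 0.2, 0.8, 0.2, 0.6]  # read, MUX B, function unit, MUX D, write

# Stage delays in ns as stated on the slide.
stage_ns = {"OF": 0.8, "EX": 1.0, "WB": 1.0}

t_conv = sum(conventional_ns)        # one operation per clock period
t_pipe = max(stage_ns.values())      # the clock must fit the slowest stage

f_conv_mhz = 1e3 / t_conv
f_pipe_mhz = 1e3 / t_pipe
speedup = t_conv / t_pipe

print(f"conventional: {t_conv:.1f} ns, {f_conv_mhz:.1f} MHz")
print(f"pipelined:    {t_pipe:.1f} ns, {f_pipe_mhz:.1f} MHz")
print(f"speedup:      {speedup:.1f} (ideal would be {len(stage_ns)})")
print(f"sum of stage delays: {sum(stage_ns.values()):.1f} ns")
```

The stage delays add up to 2.8 ns, more than the 2.4 ns conventional path, and the stages are not equal in length; both effects keep the factor below 3. Attributing the extra 0.4 ns to the pipeline platform registers is an interpretation of the figure, not something the slide states.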
Pipelined Datapath
11-2
[Figure: pipelined datapath. Operand Fetch (OF): register file (AA, BA, A data, B data), Constant in, MUX B (MB). Execute (EX): function unit (FS; status V, C, N, Z; output F), Address out, Data out, Data in. Write-back (WB): MUX D (MD), register file write (RW, DA, D data).]

  OF consists of reading register values (A and B), or
selecting a constant value (MB). The pipeline
platform stores the operand(s) to be used in
EX during the next clock cycle
  In EX a function unit operation occurs, and the
results are captured by the 2nd pipeline platform
  WB is the write-back stage: the result from the EX
stage, or the value on Data in (selected by MUX D),
is saved in the register file
11-3

Pipelined Execution Pattern

                       Clock cycle
   Microoperation      1   2   3   4   5   6   7   8   9
1  R1 ← R2 − R3        OF  EX  WB
2  R4 ← sl R6              OF  EX  WB
3  R7 ← R7 + 1                 OF  EX  WB
4  R1 ← R0 + 2                     OF  EX  WB
5  Data out ← R3                       OF  EX  WB
6  R4 ← Data in                            OF  EX  WB
7  R5 ← 0                                      OF  EX  WB

  What is the total time required by the conventional datapath
for execution?
→ 7 (microoperations) × 2.4 (ns) = 16.8 ns
  What is the total time required by the pipelined datapath for
execution?
→ (9 cycles) × 1 ns = 9 ns
Pipelined Execution Pattern
  Maximum improvement of pipelined over conventional is
obtained when the pipeline is fully utilized (all stages are
active), e.g. over the 5 clock cycles 3 to 7 (the pipeline is
full), 5 operations are completed in 5 ns. In the same
time the conventional datapath can execute 5 ns ÷ 2.4 ns/
microoperation = 2.083 microoperations
→ the pipelined datapath executes 5 ÷ 2.083 = 2.4 times as many
microoperations as the conventional
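The fully-utilized window can be reproduced by laying out the execution pattern and counting write-backs. This is a sketch that assumes microoperation i enters OF in clock cycle i and never stalls:

```python
STAGES = ["OF", "EX", "WB"]
N_OPS = 7

# Microoperation i (1-based) enters OF in cycle i, so it reaches WB two cycles later.
wb_cycle = {op: op + STAGES.index("WB") for op in range(1, N_OPS + 1)}

window = range(3, 8)  # clock cycles 3..7, when all three stages are busy
completed = sum(1 for c in wb_cycle.values() if c in window)

T_PIPE_NS, T_CONV_NS = 1.0, 2.4   # clock periods from the slides
window_ns = len(window) * T_PIPE_NS
conv_ops = window_ns / T_CONV_NS  # what the conventional datapath manages in that time

print(completed)
print(round(conv_ops, 3))
print(round(completed / conv_ops, 1))
```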
Pipelined Computer
11-4
[Figure: pipelined computer, control and datapath. Stage 1, IF: PC, instruction memory. Stage 2, DOF: IR, instruction decoder, register file (AA, BA), zero fill, MUX B (MB). Stage 3, EX: function unit (FS; status V, C, N, Z), data memory (MW). Stage 4, WB: MUX D (MD), register file write (RW, DA, D data).]

  Registers are added to the pipeline platforms
to pass the instruction information through the
pipeline.
11-5

Performance of Pipelined Computer

                 Clock cycle
   Instruction   1    2    3    4    5    6    7    8    9    10
   1             IF   DOF  EX   WB
   2                  IF   DOF  EX   WB
   3                       IF   DOF  EX   WB
   4                            IF   DOF  EX   WB
   5                                 IF   DOF  EX   WB
   6                                      IF   DOF  EX   WB
   7                                           IF   DOF  EX   WB

  Compare the performance of the single-cycle
computer with the performance of the pipelined
computer (compare for the situation in which the
pipeline is fully utilized).
  With the pipeline full, 4 instructions complete in 20 ns,
versus 20 ns ÷ 17 ns/instruction = 1.18 instructions
for the single-cycle computer.
  Throughput: Pipelined = 3.4 × Single-Cycle
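A sketch of the comparison. The 17 ns single-cycle time comes from the slide; the 5 ns pipelined clock is an inference from "4 instructions in 20 ns" and is labeled as such in the code.

```python
T_SINGLE_NS = 17.0     # single-cycle computer, time per instruction (from the slide)
N_STAGES = 4           # IF, DOF, EX, WB
T_PIPE_NS = 20.0 / 4   # assumption: 4 instructions per 20 ns implies a 5 ns clock

# Pipeline full: one instruction completes every pipelined clock period.
steady_state_gain = T_SINGLE_NS / T_PIPE_NS

# The seven instructions in the execution pattern, including fill and drain.
n_instr = 7
cycles = N_STAGES + n_instr - 1
gain_7 = (n_instr * T_SINGLE_NS) / (cycles * T_PIPE_NS)

print(f"steady-state gain: {steady_state_gain:.1f}")
print(f"cycles for {n_instr} instructions: {cycles}")
print(f"gain over {n_instr} instructions: {gain_7:.2f}")
```

Under these assumptions the seven-instruction sequence gains only about 2.38, since three of its ten cycles are fill and drain; the 3.4 figure applies to the fully utilized pipeline.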
Performance Issues

  If a pipeline has 4 stages
–  performance is not improved a full 4 times! Why?
  Pipelining hazards cause the pipe to stall
because of some conflict in the pipe (preventing the
next instruction in the pipe from executing in its turn)
  Types of hazards
–  Structural: contention for the same hardware resource
–  Data: dependency on an earlier instruction for the correct
sequencing of register reads and writes
–  Control: branch/jump instructions stall the pipe until
the correct target address is loaded into the PC
Reduction in Throughput
  Filling and flushing of the pipeline reduces the
throughput below the maximum level.
  Data and control hazards are timing problems
that arise because the execution of an operation
in a pipeline is delayed by one or more clock
cycles from the time at which the instruction
containing the operation was fetched.
Data Hazard Problem
A hardware-based solution
11-11

                                       Clock cycle
Instruction         Register transfer  1    2    3    4    5    6    7    8
MOVA R1, R5         R1 ← R5            IF   DOF  EX   WB
(ADD R2, R1, R6)    R2 ← R1 + R6            IF   DOF
ADD R2, R1, R6      R2 ← R1 + R6                 IF   DOF  EX   WB
(ADD R3, R1, R2)    R3 ← R1 + R2                      IF   DOF
ADD R3, R1, R2      R3 ← R1 + R2                           IF   DOF  EX   WB

Rows in parentheses are the stalled attempts.
Cycle 3: R1 data hazard detected, pipeline stalled, and bubble launched.
Cycle 4: R1 write and reads.
Cycle 5: R2 data hazard detected, pipeline stalled, and bubble launched.
Cycle 6: R2 write and read.
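The stall pattern in the figure can be reproduced with a small scheduling model. This is a sketch, not the hardware described in the textbook: it assumes no forwarding, and that a register written in WB can be read by DOF in the same clock cycle (the figure's "write and read" annotations suggest this). The function name `schedule` and the tuple encoding of instructions are introduced here for illustration.

```python
def schedule(program):
    """Return the WB cycle of each instruction in a 4-stage IF/DOF/EX/WB pipeline.

    program: list of (dest, [sources]) register names, in program order.
    Assumptions: no forwarding; a register written in WB may be read by DOF
    in the same cycle, so DOF waits until the producer reaches WB.
    """
    last_writer_wb = {}  # register -> WB cycle of its most recent writer
    wb_cycles = []
    prev_dof = 1         # so that the first instruction does DOF in cycle 2
    for dest, sources in program:
        dof = prev_dof + 1  # in order, at most one DOF per cycle
        for reg in sources:
            # Stall on read-after-write until the producer writes back.
            dof = max(dof, last_writer_wb.get(reg, 0))
        wb = dof + 2        # EX in the next cycle, WB in the one after
        last_writer_wb[dest] = wb
        wb_cycles.append(wb)
        prev_dof = dof
    return wb_cycles


program = [
    ("R1", ["R5"]),        # MOVA R1, R5
    ("R2", ["R1", "R6"]),  # ADD  R2, R1, R6
    ("R3", ["R1", "R2"]),  # ADD  R3, R1, R2
]
print(schedule(program))  # [4, 6, 8]
```

Each dependent ADD loses one cycle, so the three instructions finish in 8 cycles instead of the 6 a hazard-free sequence would need.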
Control Hazards
11-16

Clock cycles 1 to 7

Address  Instruction    Outcome
1        BZ R1, 18      IF DOF EX WB; R1 = 0 evaluated, PC set to 20
2        MOVA R2, R3    fetched, then cancelled (bubble): no change
3        MOVA R1, R2    fetched, then cancelled (bubble): no change
20       MOVA R5, R6    IF DOF EX WB; fetched from the target address

Branch detected and bubbles launched.
Instruction MOVA R5, R6 fetched from target address.
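The cost of a taken branch can be folded into the cycle count. In the sketch below, the penalty of 2 cycles is read from the figure (two instructions are fetched and cancelled before the target is fetched); the function name and the 20% taken-branch rate are hypothetical, introduced only for illustration.

```python
def cycles_with_branches(n_completed: int, n_taken: int,
                         n_stages: int = 4, penalty: int = 2) -> int:
    """Cycles to finish n_completed useful instructions when n_taken of them
    are taken branches, each costing `penalty` bubble cycles."""
    return n_stages + n_completed - 1 + penalty * n_taken


# The figure: BZ (taken) and the target instruction MOVA R5, R6 complete.
print(cycles_with_branches(n_completed=2, n_taken=1))  # 7

# Hypothetical workload: 20% of instructions are taken branches.
taken_fraction = 0.20
effective_cpi = 1 + 2 * taken_fraction
print(f"effective cycles per instruction: {effective_cpi:.1f}")
```

The first result matches the seven clock cycles shown in the figure; the second shows how even a modest share of taken branches pulls throughput below one instruction per cycle.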
