Pipelining

© All Rights Reserved


why wait . . . ?

... let's solve a "real problem"

device: washer

function: fill, agitate, spin

washerPD = 30 mins

device: dryer

function: heat, fast spin

dryerPD = 60 mins

one load at a time

the reason that students put off doing laundry so long is not because they procrastinate, are lazy, or even because they are working with their computation slides: doing one load at a time is not smart

(... of pipelining yet!)

doing N loads of laundry the "combinational" way

(figure: step 1, step 2, step 3, step 4, ..., one load washed and then dried per step)

total = N*90 mins

doing N loads... the 1st year E.E. way

1st year E.E.s "pipeline" the laundry process

(figure: step 1, step 2, step 3, ..., with the next wash overlapping the current drying)

total = N*60 mins (N*60 + 30 mins if we account for the startup transient correctly)

when doing pipeline analysis, we're mostly interested in the "steady state" where we assume we have an infinite supply of inputs

some definitions

latency:

the delay from when an input is established until the output associated with that input becomes valid

(0th year's laundry = 90 mins)

(1st year's laundry = 120 mins, assuming that the wash is started as soon as possible and waits (wet) in the washer until the dryer is available)

throughput:

the rate at which inputs or outputs are processed

(0th year's laundry = 1/90 min^-1 ≈ 0.011 min^-1)

(1st year's laundry = 1/60 min^-1 ≈ 0.017 min^-1)

this implies that in a six hour wait a 0th year gets 4 loads done, while a 1st year gets 5 and goes home half an hour earlier
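The six hour comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 30 minute washer and 60 minute dryer from the earlier slide; the function names are made up for illustration.

```python
WASH_MIN = 30  # washer propagation delay (from the slides)
DRY_MIN = 60   # dryer propagation delay (from the slides)


def finish_time_sequential(n_loads: int) -> int:
    """Minutes to finish n loads when each load is washed and dried before the next starts."""
    return n_loads * (WASH_MIN + DRY_MIN)


def finish_time_pipelined(n_loads: int) -> int:
    """Minutes to finish n loads when the next wash overlaps the current drying.

    Includes the startup transient: the first wash cannot overlap anything.
    """
    if n_loads == 0:
        return 0
    return WASH_MIN + n_loads * DRY_MIN


def loads_done(finish_time, budget_min: int) -> int:
    """Largest number of loads whose finish time fits in the budget."""
    n = 0
    while finish_time(n + 1) <= budget_min:
        n += 1
    return n


if __name__ == "__main__":
    six_hours = 6 * 60
    for name, f in [("sequential", finish_time_sequential),
                    ("pipelined", finish_time_pipelined)]:
        n = loads_done(f, six_hours)
        print(f"{name}: {n} loads, done after {f(n)} minutes")
```

The pipelined schedule finishes its fifth load after 330 minutes, which is the half hour saved.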

okay, back to circuits...

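(figure: combinational circuit in which F and G both take X and feed H, which outputs P(X); timing diagram of X, F(X), G(X), P(X))

latency = tPD

throughput = 1/tPD

we can't get the answer faster, but are we making effective use of our hardware at all times?

F and G are idle, just holding their outputs stable while H performs its computation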

pipelined circuits

use registers to hold H's input stable!

now F and G can be working on input Xi+1 while H is still working on Xi.

because of the 2-stage pipeline: if X is valid during clock j, P(X) is valid during clock j+2.

suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers:

                     latency (ns)    throughput (1/ns)
unpipelined          45              1/45
2-stage pipeline     50              1/25
                     (worse)         (better)
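The latency and throughput figures follow directly from the propagation delays. The sketch below assumes that F and G work in parallel on X and both feed H (which matches the unpipelined latency of 45 ns), and that the pipeline registers are ideal; the function names are illustrative only.

```python
from fractions import Fraction


def unpipelined(t_f: int, t_g: int, t_h: int):
    """Latency (ns) and throughput (results per ns) of P(X) = H(F(X), G(X))."""
    latency = max(t_f, t_g) + t_h        # slowest of F, G, then H
    return latency, Fraction(1, latency)


def two_stage(t_f: int, t_g: int, t_h: int):
    """Same circuit with ideal registers after F/G and after H."""
    clock = max(max(t_f, t_g), t_h)      # the slowest stage sets the clock period
    return 2 * clock, Fraction(1, clock)


if __name__ == "__main__":
    print("unpipelined:", unpipelined(15, 20, 25))
    print("2-stage:    ", two_stage(15, 20, 25))
```

With delays of 15, 20 and 25 ns the pipelined version needs 50 ns per result instead of 45 ns, but delivers a new result every 25 ns.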

pipeline diagrams

                  clock cycle
                  i          i+1        i+2        i+3
pipeline stages
  ...
  G reg                      G(Xi)      G(Xi+1)    G(Xi+2)

the results for a given input move diagonally through the diagram, progressing through one pipeline stage each clock cycle
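A pipeline diagram of this kind can be generated mechanically. The sketch below assumes the 2-stage F/G/H pipeline from the previous slide, with F and G registers at depth 1 and the H register at depth 2; the row labels other than "G reg" are assumptions, and "H(Xi)" is shorthand for H(F(Xi), G(Xi)).

```python
def subscript(offset: int) -> str:
    """Format a clock-cycle or input index relative to i."""
    return "i" if offset == 0 else f"i+{offset}"


def pipeline_diagram(rows, n_cycles: int) -> str:
    """rows: (label, function name, depth in registers from the input)."""
    table = [["stage"] + [subscript(c) for c in range(n_cycles)]]
    for label, fn, depth in rows:
        cells = [label]
        for cycle in range(n_cycles):
            if cycle < depth:
                cells.append("")  # nothing from input Xi has reached this stage yet
            else:
                arg = "X" + subscript(cycle - depth)
                cells.append(f"{fn}({arg})" if fn else arg)
        table.append(cells)
    width = max(len(cell) for row in table for cell in row) + 2
    return "\n".join(
        "".join(cell.ljust(width) for cell in row).rstrip() for row in table
    )


# Assumed stage layout for the F/G/H example.
ROWS = [("input", "", 0), ("F reg", "F", 1), ("G reg", "G", 1), ("H reg", "H", 2)]

if __name__ == "__main__":
    print(pipeline_diagram(ROWS, 4))
```

Reading the printed table along a diagonal follows one input Xi from the input row through the F/G registers to the H register.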

pipeline conventions

definition:

a K-stage pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output

a combinational circuit is thus a 0-stage pipeline

convention:

every pipeline stage, hence every K-stage pipeline, has a register on its output (not on its input)

always:

the clock common to all registers must have a period sufficient to cover propagation over combinational paths PLUS (input) register tPD PLUS (output) register tSETUP

the latency of a K-pipeline is K times the period of the clock common to all registers

the throughput of a K-pipeline is the frequency of the clock
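The clocking rule can be turned into a small calculator. This is a sketch under the conventions above; the register delay and setup time used in the second call are made-up values for illustration, not numbers from the slides.

```python
def k_pipeline_timing(stage_delays_ns, t_pd_reg_ns=0.0, t_setup_ns=0.0):
    """Return (clock period, latency, throughput) for a K-pipeline.

    stage_delays_ns: longest combinational path delay of each of the K stages.
    """
    k = len(stage_delays_ns)
    # The period must cover register tPD, the slowest stage, and register setup.
    period = t_pd_reg_ns + max(stage_delays_ns) + t_setup_ns
    latency = k * period          # K clock periods from input to output
    throughput = 1.0 / period     # one result per clock period
    return period, latency, throughput


if __name__ == "__main__":
    # Ideal registers, stages of 20 ns and 25 ns.
    print(k_pipeline_timing([20, 25]))
    # Hypothetical non-ideal registers: tPD = 1 ns, tSETUP = 2 ns.
    print(k_pipeline_timing([20, 25], t_pd_reg_ns=1, t_setup_ns=2))
```

Register overhead is paid once per stage, so it lengthens the latency K times over.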

ill-formed pipelines

consider a bad job of pipelining:

none

problem:

successive inputs get mixed: e.g., B(A(Xi+1), Yi)

this happened because some paths from inputs to outputs had 2 registers, and some had only 1!

can this happen in a well-formed K-pipeline?
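Whether a circuit is a well-formed K-pipeline can be checked by counting registers along every path from an input to an output. The sketch below uses a hypothetical circuit (the slide's figure is not available here): module A feeds module B, input Y also feeds B, and each edge carries a number of registers.

```python
def register_counts(edges, inputs, outputs):
    """Set of register counts seen over all input-to-output paths.

    edges maps a node to a list of (successor, registers on that connection).
    The circuit is assumed to be acyclic.
    """
    outputs = set(outputs)
    memo = {}

    def counts_from(node):
        if node not in memo:
            result = {0} if node in outputs else set()
            for succ, regs in edges.get(node, []):
                result |= {regs + c for c in counts_from(succ)}
            memo[node] = result
        return memo[node]

    total = set()
    for node in inputs:
        total |= counts_from(node)
    return total


def pipeline_depth(edges, inputs, outputs):
    """Return K if the circuit is a well-formed K-pipeline, else None."""
    counts = register_counts(edges, inputs, outputs)
    return next(iter(counts)) if len(counts) == 1 else None


# Hypothetical ill-formed circuit: the X path has 1 register, the Y path has 2.
BAD = {"X": [("A", 0)], "A": [("B", 0)], "Y": [("B", 1)], "B": [("out", 1)]}
# Adding a register between A and B gives 2 registers on every path.
GOOD = {"X": [("A", 0)], "A": [("B", 1)], "Y": [("B", 1)], "B": [("out", 1)]}

if __name__ == "__main__":
    print(pipeline_depth(BAD, ["X", "Y"], ["out"]))
    print(pipeline_depth(GOOD, ["X", "Y"], ["out"]))
```

In the ill-formed variant B sees A's result for a newer input than the Y value it is paired with, which is the mixing described above.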

a pipelining methodology

step 1:

draw a line that crosses every output in the circuit, and select one endpoint as an origin

step 2:

continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs

adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline

STRATEGY:

focus your attention on placing pipelining registers around the slowest circuit elements (bottlenecks)

pipeline example

            LATENCY    THROUGHPUT
0-pipe      4          1/4
1-pipe      4          1/4
2-pipe      4          1/2
3-pipe      6          1/2

observations:

• 1-pipeline improves neither latency nor throughput

• throughput is improved by breaking long combinational paths, allowing a faster clock

• too many stages cost latency while not improving throughput

• back-to-back registers are often required to keep the pipeline well-formed

considering pipelining

• advantages

– higher throughput than the corresponding combinational device

– different parts of the logic work on different parts of the problem

• disadvantages

– generally, increases latency

– only as good as the weakest link

how 1st year EEs do laundry, A.D. 2010

they work around the bottleneck: first they find a place with twice as many dryers as washers

(figure: laundry schedule, steps 1 to 5)

latency = 90 min
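The effect of the second dryer can be seen with a small schedule simulation. This is a sketch that assumes one 30 minute washer, 60 minute dryers, and that a finished load waits in the washer until a dryer is free; the function name is hypothetical.

```python
import heapq


def laundry_finish_times(n_loads: int, n_dryers: int,
                         wash_min: int = 30, dry_min: int = 60):
    """Finish time in minutes of each load, with one washer and n_dryers dryers."""
    dryer_free = [0] * n_dryers   # min-heap of times at which each dryer is free
    washer_free = 0               # time at which the washer can take a new load
    finish = []
    for _ in range(n_loads):
        wash_done = washer_free + wash_min
        # The load moves to the earliest available dryer, waiting if needed.
        dry_start = max(wash_done, heapq.heappop(dryer_free))
        washer_free = dry_start   # the washer is emptied when drying starts
        done = dry_start + dry_min
        heapq.heappush(dryer_free, done)
        finish.append(done)
    return finish


if __name__ == "__main__":
    print("1 dryer: ", laundry_finish_times(4, 1))
    print("2 dryers:", laundry_finish_times(4, 2))
```

With one dryer a load finishes every 60 minutes. With two dryers a load finishes every 30 minutes and each load takes 90 minutes from start to finish.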

circuit interleaving

one way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies

N-1 registers

latency = 2 clocks

to N pipeline stages

combining pipelining and interleaving

C' interleaves two C-elements with a propagation delay of 8 ns, giving a throughput of one result per 4 ns and a latency of 8 ns.

this can be considered as an extra pipelining stage that passes through the middle of the C' module

combining interleaving with pipelining moves the bottleneck from the C-element to the F-element
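The timing of an interleaved element can be sketched as a round-robin schedule. The example assumes two copies of an element with an 8 ns propagation delay, so that a new input can be accepted every 4 ns; the copy count and function name are assumptions made for illustration.

```python
from fractions import Fraction


def interleave_schedule(n_inputs: int, t_pd_ns: int, n_copies: int):
    """Return (input index, copy used, start time, done time) per input.

    Inputs arrive once per clock of t_pd_ns / n_copies and alternate
    between the copies in round-robin order.
    """
    clock = Fraction(t_pd_ns, n_copies)
    copy_free = [Fraction(0)] * n_copies
    schedule = []
    for k in range(n_inputs):
        copy = k % n_copies
        start = k * clock
        # Each copy must have finished its previous input before it is reused.
        assert start >= copy_free[copy]
        done = start + t_pd_ns
        copy_free[copy] = done
        schedule.append((k, copy, start, done))
    return schedule


if __name__ == "__main__":
    for row in interleave_schedule(4, 8, 2):
        print(row)
```

Each result still takes 8 ns, but results appear 4 ns apart, as if the element had been split into two pipeline stages.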

assignment B4

Pipeline a combinational encryptor for throughput!

The device takes an integer value X and computes an encrypted version C(X).

The propagation delay of each module is given in ms (contamination delays are zero).

(figure: encryptor circuit from X to C(X), with the propagation delay of each module and numbered edges)

• what is the latency and throughput of the unpipelined device?

• give the locations for registers (ideal, zero-delay) by edge number after maximizing the throughput! use as few registers as possible!

• give the latency and throughput of your pipelined device!

before monday, march 8, 9:00

From: student@tue.nl
To: computation@ics.ele.tue.nl
Subject: B4

27 0.40
4 9 10
30 0.80
drawing in attachment

attachment <student_B3.xxx>

multiplication (positive numbers)

                      A3    A2    A1    A0      multiplicand
                x     B3    B2    B1    B0      multiplier
                ----------------------------
                      A3B0  A2B0  A1B0  A0B0
                A3B1  A2B1  A1B1  A0B1
          A3B2  A2B2  A1B2  A0B2
  + A3B3  A2B3  A1B3  A0B3

ABi is called a "partial product"

easy part:

forming partial products (just an AND gate since Bi is either 0 or 1)

hard part:

adding partial products column by column with carry

multiplication

          1 0 0 1       multiplicand A
      x   1 0 1 1       multiplier B
      -----------
          1 0 0 1       A*B0
        1 0 0 1         A*B1
      0 0 0 0           A*B2
  + 1 0 0 1             A*B3
  ---------------
    1 1 0 0 0 1 1

multiplying N-bit number by M-bit number gives (N+M)-bit result

easy part:

forming partial products (just an AND gate since Bi is either 0 or 1)

hard part:

adding M N-bit partial products
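The two parts can be sketched in a few lines: each partial product is A ANDed with one bit of B and shifted into place, and the product is their sum. The operand values 1001 and 1011 below are chosen for illustration, and the function names are hypothetical.

```python
def partial_products(a: int, b: int, m_bits: int):
    """Partial product i is A if bit i of B is 1, else 0, shifted left i places."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(m_bits)]


def multiply(a: int, b: int, m_bits: int) -> int:
    """Add the M partial products of an N-bit A and an M-bit B."""
    return sum(partial_products(a, b, m_bits))


if __name__ == "__main__":
    a, b = 0b1001, 0b1011
    for i, pp in enumerate(partial_products(a, b, 4)):
        print(f"A*B{i} = {pp:07b}")
    print(f"product = {multiply(a, b, 4):07b}")
```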

sequential multiplier

assume the multiplicand (A) has N bits and the multiplier (B) has M bits.

init: P ← 0, load A & B

repeat M times {

P ← P + (B_LSB == 1 ? A : 0)

shift P/B right one bit

}

we can process one partial product at a time and then cycle the circuit M times
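The loop can be simulated directly. This is a minimal sketch of the shift-and-add scheme, assuming P and B are shifted right as one combined register so that the low bit of P moves into the top bit of B; the function name and the explicit bit widths are assumptions.

```python
def sequential_multiply(a: int, b: int, n_bits: int, m_bits: int) -> int:
    """Multiply an N-bit A by an M-bit B with one addition per cycle."""
    assert 0 <= a < (1 << n_bits) and 0 <= b < (1 << m_bits)
    p = 0
    for _ in range(m_bits):
        if b & 1:
            p += a  # add the multiplicand when the multiplier LSB is 1
        # Shift the combined P/B register right one bit:
        # the LSB of P becomes the MSB of B.
        b = (b >> 1) | ((p & 1) << (m_bits - 1))
        p >>= 1
    # After M cycles P holds the high bits and B the low bits of the product.
    return (p << m_bits) | b


if __name__ == "__main__":
    print(f"{sequential_multiply(0b1001, 0b1011, 4, 4):b}")
```

The adder only ever sees N-bit operands plus a carry, which is why a single N-bit adder is enough.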

sequential multiplier (64-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:

{ if 1==LSB in multiplier, then add multiplicand;

shift multiplier 1 right;

shift multiplicand 1 left; }

(figure: datapath with a 64-bit multiplicand register (shift left), a 32-bit multiplier register (shift right), a 64-bit ALU (add / zero), a 64-bit product register, and a finite state machine that reads the multiplier LSB and drives the enable (E) and shift (<<, >>) signals)
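The control loop of this first version can be mimicked in software. The sketch below assumes unsigned 32-bit operands and models the 64-bit registers by masking; it illustrates the loop only, not the slide's actual datapath.

```python
MASK64 = (1 << 64) - 1


def multiply_64bit_alu(multiplicand: int, multiplier: int) -> int:
    """Shift-and-add multiply with a 64-bit multiplicand register and ALU."""
    assert 0 <= multiplicand < (1 << 32) and 0 <= multiplier < (1 << 32)
    product = 0                    # product register starts at 0
    for _ in range(32):
        if multiplier & 1:         # LSB of the multiplier selects add or add-zero
            product = (product + multiplicand) & MASK64
        multiplier >>= 1           # shift multiplier 1 right
        multiplicand = (multiplicand << 1) & MASK64  # shift multiplicand 1 left
    return product


if __name__ == "__main__":
    print(multiply_64bit_alu(9, 11))
```

Half of the multiplicand register and half of the ALU width hold zeros at any given step, which is what the 32-bit ALU versions on the next slides avoid.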

sequential multiplier (32-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:

{ if 1==LSB in multiplier, then add multiplicand;

shift multiplier 1 right;

shift product 1 right; }

(figure: datapath with a 32-bit multiplicand register, a 32-bit ALU (add / zero), a 32-bit multiplier register (shift right), a 64-bit product register (shift right), and a finite state machine that reads the multiplier LSB)

sequential multiplier (32-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:

{ if 1==LSB in multiplier, then add multiplicand;

shift content of product register 1 right; }

(figure: datapath with a 32-bit multiplicand register, a 32-bit ALU (add / zero), and a 64-bit product register (shift right) shared with the multiplier, whose LSB controls the addition)

a combinational multiplier

(figure: array of full adders (FA), one row of four per multiplier bit B0..B3, each row fed by A3 A2 A1 A0; outputs P7 P6 P5 P4 P3 P2 P1 P0)

tPD = 10*tPD,FA (follow the path from A0 to P7)

pipelined multiplier

"carry save" configuration

(figure: rows of full adders (FA), one row of four per multiplier bit B0..B3, each row fed by A3 A2 A1 A0, followed by a final row of full adders; outputs P7 P6 P5 P4 P3 P2 P1 P0)
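The idea behind a carry-save configuration is that each row of full adders passes its carries down to the next row instead of rippling them sideways, so carries are only propagated once, in a final adder. A minimal word-level sketch, with hypothetical function names:

```python
def carry_save_add(x: int, y: int, z: int):
    """Reduce three numbers to a sum word and a carry word with x + y + z == s + c.

    Each bit position acts as an independent full adder; no carry ripples
    between positions.
    """
    s = x ^ y ^ z                                  # per-bit sum outputs
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit carry outputs, weight 2
    return s, c


def carry_save_multiply(a: int, b: int, m_bits: int) -> int:
    """Accumulate partial products in carry-save form, then do one final add."""
    s, c = 0, 0
    for i in range(m_bits):
        pp = (a << i) if (b >> i) & 1 else 0       # partial product A*Bi
        s, c = carry_save_add(s, c, pp)            # one row of full adders
    return s + c                                   # final carry-propagate adder


if __name__ == "__main__":
    print(f"{carry_save_multiply(0b1001, 0b1011, 4):b}")
```

Because every row has a delay of one full adder, the rows make natural pipeline stages.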

summary

• latency (L) = time it takes for a given input to produce its output

• throughput (T) = rate at which new outputs appear

• for combinational circuits: L = tPD of device, T = 1/L

• for K-stage pipeline (K > 0):

– always have registers on output(s)

– K registers on every path from input to output

– T = 1/(tPD,reg + tPD,slowest pipeline stage + tSETUP)

– L = K/T

• pipelined latency ≥ combinational latency

• to increase throughput: split the slowest stage

• no further splitting possible, use replication/interleaving

• pipelining can be combined with circuit interleaving

reading: chapter 4.5 (p. 332) and 3.3
