
Uploaded by Deshitha Chamikara Wickramrathna

Comp Architecture Chapter 4_Pipelining



Computer Architecture - II

Pipelining

Parallel processing

Parallel processing provides simultaneous data processing to achieve faster execution time.

A parallel system may be able to execute two or more instructions at the same time.

The aim is to increase the amount of processing that can be accomplished during a given interval of time.

Parallel processing

instructions read from memory

on the memory

focused on the behavioral aspects of Parallel Processing

Parallel processing classification

Single instruction stream, single data stream (SISD)

A processor unit

A memory unit

Parallel processing may be achieved by means of multiple functional units or by pipeline processing.

Single instruction stream, multiple data stream (SIMD)

A single control unit

Many processor units

A memory unit

The processor units operate under a common control unit. All processors receive the same instruction, but operate on different data.

Multiple instruction stream, single data stream (MISD)

Many processor units, each of which contains:

A control unit

A local memory

Theoretical only: different instructions operate on the same data.

Multiple instruction stream, multiple data stream (MIMD)

Many processor units

Many control units

A computer system capable of processing several programs at the same time.

Most multiprocessor and multicomputer systems can be classified in this category.

concerns operational and structural interconnections

What is a Pipeline

Pipelining is used by all modern microprocessors to

enhance performance by overlapping the execution

of instructions.

A common analogy for a pipeline is a factory assembly line. Assume that there are three stages:

o Welding

o Painting

o Polishing

What is a Pipeline

A single person would take three hours to produce one product.

With three people, one per stage, each person could pass the product on to the next person upon completing their stage (since each stage takes one hour there will be no waiting).

One product is then completed every hour once the assembly line has been filled.

Pipelining: Laundry Example

With one washer, one dryer and one operator, it takes 90 minutes to finish one load (loads A, B, C, D):

Washer takes 30 minutes

Dryer takes 40 minutes

Operator folding takes 20 minutes

Sequential Laundry

[Figure: timeline from 6 PM to midnight in task order; loads A, B, C and D run one after another, each taking 30 + 40 + 20 = 90 min]

This operator schedules his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he is done with the previous one.

The process is sequential. Sequential laundry takes 6 hours for 4 loads.

Efficiently scheduled laundry: Pipelined Laundry

[Figure: timeline from 6 PM to midnight in task order; loads A, B, C and D overlap, with successive intervals of 30, 40, 40, 40, 40 and 20 minutes]

Another operator asks for loads to be delivered to the laundry every 40 minutes, the time taken by the slowest stage (the dryer).

Pipelined laundry takes 3.5 hours for 4 loads.
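The two laundry totals can be checked with a small scheduling sketch. It is an illustration only: the function name `finish_time`, and the rule that a load enters a stage as soon as it has left the previous stage and the stage is free, are assumptions rather than something the slides specify. The stage times (30, 40, 20 minutes) and the four loads come from the example.

```python
def finish_time(stage_minutes, n_loads, pipelined):
    """Minutes needed to push n_loads through the given stages."""
    if not pipelined:
        # Sequential: a new load starts only after the previous one is folded.
        return n_loads * sum(stage_minutes)
    # stage_free[s] = time at which stage s finishes the load it is working on.
    stage_free = [0] * len(stage_minutes)
    for _ in range(n_loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])  # wait for the load and the stage
            ready = start + minutes
            stage_free[s] = ready
    return stage_free[-1]


stages = [30, 40, 20]  # washer, dryer, folding
print(finish_time(stages, 4, pipelined=False) / 60)  # 6.0 hours
print(finish_time(stages, 4, pipelined=True) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 * 40 + 20 = 210 minutes: the dryer, as the slowest stage, sets the pace once the pipeline is full.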

Pipelining Facts

Multiple tasks operate simultaneously.

Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.

Pipeline rate is limited by the slowest pipeline stage.

Potential speedup = number of pipe stages.

Unbalanced lengths of pipe stages reduce speedup.

Time to fill the pipeline and time to drain it reduce speedup.

[Figure: pipelined laundry timeline, 6 PM to 9 PM, loads A to D in task order with intervals of 30, 40, 40, 40, 40 and 20 minutes; the washer waits for the dryer for 10 minutes]

Building a Car

Unpipelined: start and finish a job before moving to the next.

Parallelism = 1 car

Latency = 24 hrs.

Throughput = 1 car per 24 hrs.

[Figure: jobs against time; three jobs of 24 hrs. each, one after another]

Latency: the amount of time that a single operation takes to execute.

Throughput: the rate at which operations get executed (generally expressed as operations/second or operations/cycle).

The Assembly Line

Pipelined: break the job into smaller stages (Eng., Body, Paint; stages A, B, C; 8 hrs. each).

Parallelism = 3 cars

Latency = 24 hrs.

Throughput = 1 car per 8 hrs. (3X)

[Figure: jobs against time; each job passes through Eng., Body and Paint, and a new job starts every 8 hrs.]
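The car figures show that pipelining leaves latency unchanged while raising throughput. A minimal sketch of that arithmetic follows; the helper name `latency_and_throughput` is hypothetical, and it assumes a finished car leaves the line each time the slowest stage completes.

```python
def latency_and_throughput(stage_hours, pipelined):
    """Return (latency in hours, throughput in cars per hour)."""
    latency = sum(stage_hours)  # one car still passes through every stage
    # Unpipelined: a car is finished every `latency` hours.
    # Pipelined: a car is finished each time the slowest stage completes.
    interval = max(stage_hours) if pipelined else latency
    return latency, 1 / interval


lat_u, thr_u = latency_and_throughput([8, 8, 8], pipelined=False)
lat_p, thr_p = latency_and_throughput([8, 8, 8], pipelined=True)
print(lat_u, lat_p)   # 24 24
print(thr_p)          # 0.125 cars per hour, i.e. one car per 8 hrs.
```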

In a computer

Unpipelined: start and finish a job before moving to the next.

[Figure: jobs against time; each job passes through FET, DEC and EXE before the next job starts]

Pipelined: break the job into smaller stages.

[Figure: jobs against time; instructions I1, I2 and I3 enter the FET, DEC and EXE stages (labeled A, B, C) in cycles 1, 2 and 3; each stage takes 1 ns and one instruction takes 3 ns in total]

Clock speed = 1/1 ns = 1 GHz


Clocks and Latches

[Figure: Stage 1 and Stage 2, each followed by a latch (L) driven by a common clock (Clk); timing diagram of Clock and Input showing S1 R1, S2 R2, S3 R3, S4 R4]

Example

Assume a 2 ns flip-flop delay

Characteristics Of Pipelining

Decomposes a sequential process into segments.

is dedicated to a particular segment.

processor operates concurrently with all other segments.

segments.

If one stage is slower than another, the entire throughput of the pipeline is affected.

Pipelining

Instruction execution is divided into k segments or stages.

An instruction exits pipe stage k-1 and proceeds into pipe stage k.

All pipe stages take the same amount of time, called one processor cycle.

The length of the processor cycle is determined by the slowest pipe stage.

Pipeline Performance

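n: number of instructions (equivalent to the number of loads in the laundry example)

k: number of stages in the pipeline (washing, drying and folding in the laundry example)

$\tau$: clock cycle (the slowest task time)

$T_k$: total time with the k-stage pipeline

$$T_k = (k + (n - 1))\,\tau$$

$T_1$: total time without pipelining

$$T_1 = n k \tau$$

$$\text{Speedup} = \frac{T_1}{T_k} = \frac{n k \tau}{(k + (n - 1))\,\tau} = \frac{n k}{k + n - 1}$$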

Efficiently scheduled laundry: Pipelined Laundry

[Figure: pipelined laundry timeline for loads A to D, with successive intervals of 30, 40, 40, 40, 40 and 20 minutes]

Speedup

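Consider a k-segment pipeline operating on n data sets. (In the above example, k = 3 and n = 4.)

It takes k clock cycles to fill the pipeline and get the first result from the output of the pipeline.

After that, the remaining (n - 1) results come out at each clock cycle.

It therefore takes (k + n - 1) clock cycles to complete the task.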

Speedup

If we execute the same task sequentially in a

single processing unit, it takes (k * n) clock

cycles.

The speedup gained by using the pipeline is:

S = k * n / (k + n - 1 )

For n >> k (many data sets on a short pipeline),

S ~ k

So the speedup approaches the number of functional units for large data sets. This is because the multiple functional units can work in parallel except for the filling and cleaning-up cycles.
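One way to see why S approaches k is to divide the numerator and denominator of the speedup formula by n. This short derivation is added for illustration; the only step not on the slides is that division.

```latex
S = \frac{k\,n}{k + n - 1}
  = \frac{k}{1 + \dfrac{k - 1}{n}}
  \;\longrightarrow\; k
  \quad \text{as } n \to \infty
```

For small n the fill and drain cycles still dominate: with k = 3 and n = 4 the formula gives S = 12 / 6 = 2, well short of the limit of 3.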

Speedup

Example

- 4-stage pipeline

- sub-operation in each stage: tp = 20 ns

- 100 tasks to be executed

- 1 task in a non-pipelined system: 20 * 4 = 80 ns

Pipelined System

(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System

n * k * tp = 100 * 80 = 8000 ns

Speedup

Sk = 8000 / 2060 = 3.88

identical function units
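The numbers in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `pipeline_times` is hypothetical, while k, n and tp carry the values given above.

```python
def pipeline_times(k, n, tp):
    """Return (pipelined time, non-pipelined time, speedup) for n tasks."""
    pipelined = (k + n - 1) * tp   # k cycles to fill, then one result per cycle
    non_pipelined = n * k * tp     # each task runs all k sub-operations in turn
    return pipelined, non_pipelined, non_pipelined / pipelined


t_pipe, t_seq, speedup = pipeline_times(k=4, n=100, tp=20)
print(t_pipe, t_seq, round(speedup, 2))  # 2060 8000 3.88
```

With 100 tasks the speedup is already close to the limit of k = 4.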

Example of Pipelining

Suppose we want to perform the combined

multiply and add operations with a stream

of numbers:

Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Example of Pipelining

The sub-operations performed in each segment of the pipeline are as follows:

Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)

Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)

Segment 3: R5 ← R3 + R4 (add Ci to product)

[Figure: Ai and Bi are loaded into R1 and R2, which feed the multiplier; the product goes to R3 while Ci is loaded into R4; R3 and R4 feed the adder, whose output goes to R5]

Content of registers in pipeline example

Clock pulse  Segment 1   Segment 2      Segment 3
number       R1    R2    R3       R4    R5
1            A1    B1    ----     ----  ----
2            A2    B2    A1*B1    C1    ----
3            A3    B3    A2*B2    C2    A1*B1+C1
4            A4    B4    A3*B3    C3    A2*B2+C2
5            A5    B5    A4*B4    C4    A3*B3+C3
6            A6    B6    A5*B5    C5    A4*B4+C4
7            A7    B7    A6*B6    C6    A5*B5+C5
8            ----  ----  A7*B7    C7    A6*B6+C6
9            ----  ----  ----     ----  A7*B7+C7
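The register contents in the table can be reproduced with a short simulation. This is a sketch under stated assumptions: the generator name `simulate` and the sample input values are made up for illustration, and `None` stands for an empty register (shown as ---- in the table).

```python
def simulate(a, b, c):
    """Yield (clock pulse, R1, R2, R3, R4, R5) for the 3-segment pipeline."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None  # None stands for an empty register
    for pulse in range(1, n + 3):  # n inputs plus 2 pulses to drain the pipe
        # All registers load on the same clock pulse, so compute every new
        # value from the old register contents before overwriting anything.
        new_r5 = r3 + r4 if r3 is not None else None      # segment 3
        new_r3 = r1 * r2 if r1 is not None else None      # segment 2
        new_r4 = c[pulse - 2] if r1 is not None else None
        i = pulse - 1                                     # segment 1
        new_r1, new_r2 = (a[i], b[i]) if i < n else (None, None)
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        yield pulse, r1, r2, r3, r4, r5


# Hypothetical sample data: Ai = i, Bi = 2, Ci = 10 for i = 1..7.
rows = list(simulate(list(range(1, 8)), [2] * 7, [10] * 7))
results = [row[5] for row in rows if row[5] is not None]
print(len(rows))   # 9 clock pulses, i.e. k + n - 1 = 3 + 7 - 1
print(results)     # [12, 14, 16, 18, 20, 22, 24]
```

The first result appears in R5 at clock pulse 3, and one result follows on every pulse after that.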

Ai*Bi + Ci*Di+ Ei

is executed using a pipeline

Arithmetic Pipeline

Arithmetic has been an important part of computing from its earliest days, yet arithmetic operations consume much of the time spent within the arithmetic and logic unit.

and has opened up to many means of High performance of

computing.

operations and floating point operations.

Arithmetic Pipeline: Floating Point Adder

X = A * 2^a

A is defined to be the mantissa and a is called the exponent.

Arithmetic Pipeline: Floating Point Adder

X = A * 2^a

Y = B * 2^b

A floating point addition can be carried out in 4 simple sub-operations:

Compare the exponents.

Align the mantissas.

Add or subtract the mantissas.

Normalize the result.

Arithmetic Pipeline: Floating Point Adder

Consider an example in which two decimal floats are added.

X = 0.9832 * 10^3

Y = 0.8929 * 10^2

The larger exponent is 3 and thus it is chosen as the exponent for the result.

Since Y has the smaller exponent, its mantissa is shifted to the right, and the two values become

X = 0.9832 * 10^3

Y = 0.08929 * 10^3

The mantissas are then added and the value Z is obtained:

Z = 1.07249 * 10^3

Finally the result is normalized so that the mantissa is a fraction with a nonzero first digit after the decimal point:

Z = 0.107249 * 10^4
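The four sub-operations can be sketched in Python for decimal floats. This is an illustration, not a hardware description: the function name `fp_add` and the (mantissa, exponent) representation are assumptions, and `decimal.Decimal` is used so that the shifts by powers of ten are exact.

```python
from decimal import Decimal


def fp_add(mant_x, exp_x, mant_y, exp_y):
    """Add two decimal floats given as (mantissa, exponent) pairs."""
    # Segment 1: compare the exponents by subtraction.
    diff = exp_x - exp_y
    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right by |diff| decimal places.
    if diff >= 0:
        exp, mant_y = exp_x, mant_y.scaleb(-diff)
    else:
        exp, mant_x = exp_y, mant_x.scaleb(diff)
    # Segment 3: add the mantissas.
    mant = mant_x + mant_y
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1, adjusting the exponent.
    while abs(mant) >= 1:
        mant, exp = mant.scaleb(-1), exp + 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant, exp = mant.scaleb(1), exp - 1
    return mant, exp


z_mant, z_exp = fp_add(Decimal("0.9832"), 3, Decimal("0.8929"), 2)
print(z_mant, z_exp)  # 0.107249 4
```

In a pipelined adder each commented segment would be a separate stage, so up to four additions could be in progress at once.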

Arithmetic Pipeline for Floating Point Adder

Segment 1: compare exponents by subtraction

Segment 2: choose exponent, align mantissas

Segment 3: add or subtract mantissas

Segment 4: normalize result, adjust exponent

[Figure: exponents a, b and mantissas A, B enter through registers (R); registers (R) separate the four segments, with the difference from segment 1 used to align the mantissas in segment 2]

Instruction Pipeline

The Instruction Pipeline is similar to the Arithmetic Pipeline even though it works with an instruction stream as opposed to a data stream.

Instruction Pipeline

In general, the processing of an instruction requires the following sequence of steps.

Fetch the instruction from memory.

Decode the instruction.

Calculate the effective address.

Fetch the operands from memory.

Execute the instruction.

Store the result in the proper place.

Instruction Pipeline

Consider the following specification of a pipeline meant to have 4 separate segments

processed at the same time.

Pipeline Conflicts

Difficulties in general can be caused by the reasons specified below.

Resource conflicts

occur when two segments access memory at the same time.

Data dependency conflicts

occur when an instruction depends on the result of a previous instruction which is not available yet.

Branch difficulties

arise from branch and other instructions that change the value of the PC.
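A data dependency conflict can be illustrated with a small check over a list of instructions. Everything in this sketch is hypothetical: the (destination, sources) encoding, the register names and the function name `raw_hazards` are made up for illustration, and the check only looks for an instruction reading a register written by one of the `window` instructions just before it.

```python
def raw_hazards(program, window=1):
    """Return (producer index, consumer index, register) for each dependency.

    program: list of (destination register, tuple of source registers).
    window:  how many preceding instructions may still be in the pipeline.
    """
    hazards = []
    for j, (_, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][0]
            if dest in sources:  # instruction j needs a result not yet available
                hazards.append((i, j, dest))
    return hazards


program = [
    ("R1", ("R2", "R3")),  # R1 <- R2 + R3
    ("R4", ("R1", "R5")),  # R4 <- R1 * R5, needs R1 from the line above
    ("R6", ("R7", "R8")),  # independent of both
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```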

Four-segment CPU pipeline for overcoming Pipeline Conflicts

Segment 1: fetch instruction from memory

Segment 2: decode instruction and calculate effective address

Branch? If yes, update PC and empty pipe; if no, continue to segment 3

Segment 3: fetch operand from memory

Segment 4: execute instruction

Interrupt? If yes, go to interrupt handling, then update PC and empty pipe; if no, continue with the next instruction

[Figure: flowchart of the four segments with the Branch? and Interrupt? decisions]

Timing of Instruction Pipeline

Step:          1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction: 1 FI  DA  FO  EX
             2     FI  DA  FO  EX
    (Branch) 3         FI  DA  FO  EX
             4             FI  --  --  FI  DA  FO  EX
             5                 --  --  --  FI  DA  FO  EX
             6                                 FI  DA  FO  EX
             7                                     FI  DA  FO  EX
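The timing table can be regenerated with a small scheduling sketch. It assumes, as the table suggests, that the instruction fetched right behind a branch is discarded and fetching resumes only after the branch has executed; the function name `schedule` and its arguments are hypothetical, and the idle steps of later instructions (instruction 5 in the table) are not modeled.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def schedule(n_instructions, branch_at):
    """Return {instruction: {step: stage}} for the four-segment pipeline."""
    rows = {}
    next_fetch = 1
    for i in range(1, n_instructions + 1):
        row = {}
        if i - 1 in branch_at:
            # Fetched right behind the branch, then discarded and held
            # until the step in which the branch executes.
            row[next_fetch] = "FI"
            branch_ex = max(rows[i - 1])
            for step in range(next_fetch + 1, branch_ex + 1):
                row[step] = "--"
            next_fetch = branch_ex + 1
        for offset, stage in enumerate(STAGES):
            row[next_fetch + offset] = stage
        rows[i] = row
        next_fetch += 1
    return rows


def total_steps(rows):
    return max(max(row) for row in rows.values())


with_branch = schedule(7, branch_at={3})
print(total_steps(with_branch))        # 13
print(total_steps(schedule(7, set()))) # 10, i.e. k + n - 1 = 4 + 7 - 1
```

The branch at instruction 3 therefore costs three extra steps in this example.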

The four segments illustrated in the above table have the following meanings:

FI: fetch an instruction from memory.

DA: decode the instruction and calculate the effective address.

FO: fetch the operand.

EX: execute the instruction.

Thank You
