Advanced Computer Architecture
Lec 9
Pipelining
Dr. Noman Hasany
Pipelining concept
Scalar Pipeline
Instruction Pipeline
Linear Pipeline Processors
• A linear pipeline processor is a cascade of processing stages which
are linearly connected to perform a fixed function over a stream
of data flowing from one end to the other.
• In modern computers, linear pipelines are applied for instruction
execution, arithmetic computation, and memory-access
operations.
• External inputs (operands) are fed into the pipeline at the first
stage S1. The processed results are passed from stage Si to stage
Si+1, for all i = 1, 2, ..., k-1, for a k-stage pipeline. The final result
emerges from the pipeline at the last stage Sk.
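As a minimal sketch of the cascade described above, the following Python treats each stage as a function and passes every input through S1 to Sk in order. It models only the data flow, not the timing or the overlap between tasks. The name run_linear_pipeline and the three stage functions are hypothetical, chosen only for illustration.

```python
from functools import reduce


def run_linear_pipeline(stages, stream):
    """Pass each input through stages S1..Sk in order (data flow only, no timing)."""
    return [reduce(lambda value, stage: stage(value), stages, item)
            for item in stream]


# Three hypothetical stages, so k = 3.
stages = [
    lambda x: x + 1,   # S1
    lambda x: x * 2,   # S2
    lambda x: x - 3,   # S3
]

print(run_linear_pipeline(stages, [1, 2, 3]))  # prints [1, 3, 5]
```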
Asynchronous Pipeline
• Controlled by a handshaking protocol.
• When stage Si is ready to transmit, it sends a ready signal to stage Si+1.
After stage Si+1 receives the incoming data, it returns an acknowledge
signal to Si.
• Different amounts of delay may be experienced in different stages.
• Asynchronous pipelines may have a variable throughput rate.
Synchronous Pipeline
• Upon the arrival of a clock pulse, all latches transfer data to the next
stage simultaneously.
• Approximately equal delays are desired in all stages. These delays
determine the clock period and thus the speed of the pipeline.
• Unless otherwise specified, only synchronous pipelines are studied in
this book.
Reservation Table
• The utilization pattern of successive
stages in a synchronous pipeline is
specified by a reservation table.
• Once the pipeline is filled up, one result
emerges from the pipeline for each
additional cycle.
• This throughput is sustained only if the
successive tasks are independent of each
other.
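The usage pattern can be sketched in Python for the linear case. The sketch assumes that each task occupies stage Si for exactly one cycle, that a new independent task is initiated every cycle, and that no stage is reused; the function names are illustrative and do not come from the slides.

```python
def linear_reservation_table(k):
    """Rows are stages S1..Sk, columns are cycles 1..k; 'X' marks a busy stage."""
    return [["X" if cycle == stage else "." for cycle in range(k)]
            for stage in range(k)]


def occupancy(k, n):
    """Space-time diagram: entry [stage][cycle] is the task number, or 0 if idle."""
    total_cycles = k + (n - 1)
    diagram = [[0] * total_cycles for _ in range(k)]
    for task in range(n):        # task number task+1 enters S1 in cycle task+1
        for stage in range(k):   # and advances one stage per cycle
            diagram[stage][task + stage] = task + 1
    return diagram


if __name__ == "__main__":
    for row in linear_reservation_table(4):
        print(" ".join(row))
    print()
    for row in occupancy(4, 3):
        print(" ".join(str(cell) for cell in row))
```

In the second diagram the last row (stage Sk) shows one task finishing per cycle once the pipeline has filled.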
Clocking and Timing control
• Denote the maximum stage delay as τm.
• The clock pulse has a width equal to d. In general, τm >> d, by one to
two orders of magnitude.
• This implies that the maximum stage delay τm dominates the clock
period.
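A small numeric sketch of this point follows. It assumes the clock period is the maximum stage delay plus the pulse width, τ = τm + d, which is a common textbook relation but is not stated explicitly in the text above. The delay values are invented for illustration.

```python
def clock_period(stage_delays, d):
    """Assumed relation: tau = tau_m + d, with tau_m the maximum stage delay."""
    tau_m = max(stage_delays)
    return tau_m + d


stage_delays_ns = [9.0, 10.0, 8.0, 10.0]  # hypothetical stage delays in ns
d_ns = 0.5                                # hypothetical pulse width in ns
tau_ns = clock_period(stage_delays_ns, d_ns)

print(tau_ns)                         # prints 10.5
print(max(stage_delays_ns) / tau_ns)  # share of the period due to tau_m
```

With τm twenty times larger than d, almost all of the period comes from the slowest stage.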
Pipeline Frequency and Throughput
• Orders of magnitude are used to make approximate comparisons. If two numbers
x and y differ by one order of magnitude, x is about ten times larger or smaller
than y. If they differ by two orders of magnitude, they differ by a factor of
about 100.
• The pipeline frequency is defined as the inverse of the clock period τ:
f = 1/τ
• f is the maximum throughput of the pipeline if one result is expected to come
out of the pipeline per cycle.
• Depending on the initiation rate of successive tasks entering the pipeline, the
actual throughput of the pipeline may be lower than f. This happens when more
than one clock cycle elapses between successive task initiations.
• The total number of clock cycles required to process n tasks is k + (n-1),
where k cycles are needed to complete the execution of the
very first task and the remaining n-1 tasks require n-1 cycles.
• In practice, most pipelining is staged at the functional level with
2 ≤ k ≤ 15.
Very few pipelines are designed to exceed 10 stages in real computers.
The optimal choice of the number of pipeline stages should be able to
maximize the performance to cost ratio for the target processing load.
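The frequency and cycle-count relations above can be tried out with a short Python sketch. It assumes a clock period given in nanoseconds and, for the reduced-throughput case, a fixed number of cycles between task initiations; both the helper names and the sample numbers are illustrative assumptions.

```python
def pipeline_frequency_mhz(tau_ns):
    """f = 1 / tau, returned in MHz for a clock period given in ns."""
    return 1000.0 / tau_ns


def total_cycles(k, n):
    """k cycles for the first task, then one more cycle per remaining task."""
    return k + (n - 1)


def actual_throughput_mhz(tau_ns, cycles_between_initiations=1):
    """Assumes a fixed initiation interval; equals f when the interval is 1."""
    return pipeline_frequency_mhz(tau_ns) / cycles_between_initiations


k, n, tau_ns = 5, 100, 10.0              # hypothetical pipeline and workload
print(total_cycles(k, n))                # prints 104
print(total_cycles(k, n) * tau_ns)       # total time in ns
print(pipeline_frequency_mhz(tau_ns))    # maximum throughput f in MHz
print(actual_throughput_mhz(tau_ns, 2))  # one initiation every 2 cycles
```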
Speedup Factor
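• The speedup factor of a k-stage pipeline over an equivalent
nonpipelined processor is defined as:
Sk = T1/Tk = nkτ / [kτ + (n-1)τ] = nk / [k + (n-1)]
where n is the number of tasks and τ is the clock period.
• The maximum speedup is Sk → k as n → ∞.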
• This maximum speedup is very difficult to achieve because of data
dependences between successive tasks (instructions), program
branches, interrupts, and similar disruptions.
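To see how the speedup behaves as the number of tasks grows, here is a hedged Python sketch. It assumes the speedup takes the form nk / (k + (n-1)), obtained by comparing nk cycles on a nonpipelined processor with the k + (n-1) cycles of the pipeline, and it ignores the dependences, branches, and interrupts mentioned above. Exact fractions avoid rounding noise.

```python
from fractions import Fraction


def speedup(k, n):
    """Ideal speedup of a k-stage pipeline on n independent tasks."""
    return Fraction(n * k, k + (n - 1))


k = 4  # hypothetical stage count
for n in (1, 4, 64, 1024):
    print(n, float(speedup(k, n)))
```

For a single task the speedup is 1, and it climbs toward k without reaching it for any finite n.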
Efficiency of Linear Pipeline
• The efficiency Ek of a linear k-stage pipeline is defined as
Ek = Sk/k = n / [k + (n-1)]
• That is, Ek is the ratio of the actual speedup Sk to the ideal speedup k.
• A lower bound on Ek is 1/k when n = 1.
• However, the efficiency approaches 1 as n → ∞.
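A short Python sketch can show how many tasks are needed before the efficiency gets close to 1. It assumes the efficiency has the form n / (k + (n-1)), that is, the speedup divided by k. The helper min_tasks_for_efficiency is a hypothetical name; pass the target as a Fraction or a string such as "9/10" so the arithmetic stays exact.

```python
from fractions import Fraction
from math import ceil


def efficiency(k, n):
    """Efficiency of a k-stage pipeline on n independent tasks."""
    return Fraction(n, k + (n - 1))


def min_tasks_for_efficiency(k, target):
    """Smallest n with efficiency(k, n) >= target, for 0 < target < 1."""
    target = Fraction(target)
    # n / (k + n - 1) >= e  is equivalent to  n >= e * (k - 1) / (1 - e)
    return max(1, ceil(target * (k - 1) / (1 - target)))


print(efficiency(4, 1))                     # prints 1/4, the lower bound 1/k
print(min_tasks_for_efficiency(4, "9/10"))  # prints 27
```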


Throughput of Linear Pipeline
• The pipeline throughput Hk is defined as the number of tasks
(operations) performed per unit time:
Hk = n / ([k + (n-1)]τ) = nf / [k + (n-1)]
• The maximum throughput f occurs when Ek → 1 as n → ∞.
• Note that Hk = Ek · f = Ek/τ = Sk/(kτ).
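The throughput relations can be checked numerically with a brief Python sketch. It assumes throughput is n tasks divided by the total time (k + (n-1))τ, with f = 1/τ, efficiency n / (k + (n-1)), and speedup nk / (k + (n-1)). The sample values are hypothetical.

```python
from math import isclose


def throughput(k, n, tau):
    """Tasks per second for n tasks on a k-stage pipeline with clock period tau (s)."""
    return n / ((k + (n - 1)) * tau)


k, n, tau = 4, 97, 40e-9        # hypothetical: 4 stages, 97 tasks, 40 ns period
f = 1.0 / tau                   # pipeline frequency, 25 MHz here
E = n / (k + (n - 1))           # efficiency
S = n * k / (k + (n - 1))       # speedup
H = throughput(k, n, tau)

print(H / 1e6)                  # throughput in millions of tasks per second
print(isclose(H, E * f))        # prints True
print(isclose(H, S / (k * tau)))  # prints True
```

The throughput stays below f for any finite n and equals the efficiency times f.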
Assignment 2a
Arithmetic Pipeline
Important Note
Assignment 2b
Superscalar Pipeline Design
Superscalar
• To compare the relative performance of a superscalar processor with
that of a scalar base machine, we estimate the ideal execution time of
N independent instructions through the pipeline.
• The time required by the scalar base machine is k + N - 1 base cycles
for a k-stage pipeline, matching the k + (n-1) cycle count of a linear pipeline.