
ECE 695R: SYSTEM-ON-CHIP DESIGN
Module 3: Behavioral Synthesis
Lecture 3.16: Advanced Pipelining

Anand Raghunathan
raghunathan@purdue.edu
Fall 2014, ME 1052, T Th 12:00PM-1:15PM
ECE 695R: System-on-Chip Design, Fall 2014 © 2013 Anand Raghunathan 1
Pipelining
• Utilizing pipelined functional units
– Simple extension of scheduling – can initiate a new
operation on the FU before the previous one completes

Resource constraint:
1 adder, 1 multiplier (2-stage pipelined)

[Figure: schedule of multiplications *1 and *2 and addition +1 using the 2-stage pipelined multiplier and the adder]

Note: Pipelining the functional units has a direct impact on both the clock period and the number of clock cycles.
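A minimal Python sketch of how a pipelined functional unit shortens a schedule. The DFG (two independent multiplications *1 and *2 feeding one addition +1), the 2-cycle multiplier latency and the 1-cycle adder latency are assumptions chosen for illustration, not read from the slide's figure. The sketch counts clock cycles only; the shorter clock period that a 2-stage multiplier allows is not modeled.

```python
# Hypothetical DFG: op -> (FU type, predecessors), listed in topological order.
OPS = {
    "*1": ("mul", []),
    "*2": ("mul", []),
    "+1": ("add", ["*1", "*2"]),
}
LATENCY = {"mul": 2, "add": 1}  # assumed: 2-stage multiplier, 1-cycle adder


def schedule(ops, latency, mul_pipelined):
    """Greedy list scheduling with 1 adder and 1 multiplier.

    A pipelined multiplier accepts a new operation every cycle; a
    non-pipelined one stays busy for its full latency.
    Returns (start_cycle_per_op, total_cycles).
    """
    free_at = {"mul": 0, "add": 0}  # first cycle each FU can accept a new op
    start, finish = {}, {}
    for name, (kind, preds) in ops.items():
        ready = max((finish[p] for p in preds), default=0)
        t = max(ready, free_at[kind])
        start[name] = t
        finish[name] = t + latency[kind]
        busy = 1 if (kind == "mul" and mul_pipelined) else latency[kind]
        free_at[kind] = t + busy
    return start, max(finish.values())


print(schedule(OPS, LATENCY, mul_pipelined=False))  # ({'*1': 0, '*2': 2, '+1': 4}, 5)
print(schedule(OPS, LATENCY, mul_pipelined=True))   # ({'*1': 0, '*2': 1, '+1': 3}, 4)
```

Under these assumptions the pipelined multiplier lets *2 start one cycle after *1 instead of waiting for it to finish, saving one cycle.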
Pipelining
• Pipelining the execution of the entire
algorithm
– Start one instance of the computation
before the previous one has completed
Example:
The CDFG has a longest path of 4 cycles. One instance of the computation cannot be completed in fewer than 4 cycles because of data dependencies.
However, we can increase the throughput or initiation rate (the rate at which new inputs are accepted)!

Resources:
2 MUL
1 ADD
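To make the latency bound concrete, here is a hedged Python sketch that computes ASAP start cycles and the longest path of a small DFG, assuming every operation takes one cycle. The DFG below (operations m1..m4, a1..a3) is hypothetical; the slides do not list their graph, but this one also has a longest path of 4 cycles.

```python
# Hypothetical DFG: op -> (type, predecessors), listed in topological order.
DFG = {
    "m1": ("mul", []),
    "a1": ("add", ["m1"]),
    "m2": ("mul", ["a1"]),
    "m3": ("mul", []),
    "m4": ("mul", []),
    "a3": ("add", ["m3", "m4"]),
    "a2": ("add", ["m2", "a3"]),
}


def asap(dfg):
    """Earliest start cycle of each op, assuming unit-latency operations."""
    start = {}
    for op, (_, preds) in dfg.items():
        start[op] = max((start[p] + 1 for p in preds), default=0)
    return start


def longest_path(dfg):
    """Cycles on the longest dependence chain: a lower bound on latency."""
    return max(asap(dfg).values()) + 1


print(asap(DFG))          # {'m1': 0, 'a1': 1, 'm2': 2, 'm3': 0, 'm4': 0, 'a3': 1, 'a2': 3}
print(longest_path(DFG))  # 4
```

No schedule can finish one input in fewer than `longest_path(DFG)` cycles, however many resources are available; pipelining instead raises how often a new input can start.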
Pipelining

• Example:
Initiation rate = 2 cycles (a new input is accepted every 2 cycles)
Latency = 4 cycles

[Figure: overlapped execution of Input 1, Input 2 and Input 3]

Resources:
2 MUL
2 ADD
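A minimal sketch of the timing in this example, assuming input k (counting from 0) is accepted at cycle k × II, where II is the number of cycles between initiations (2 here), and each input then takes the 4-cycle latency. The function names are illustrative.

```python
def timeline(n_inputs, latency, ii):
    """(first_cycle, last_cycle) occupied by each input instance."""
    return [(k * ii, k * ii + latency - 1) for k in range(n_inputs)]


def total_cycles(n_inputs, latency, ii):
    """Cycles from accepting the first input to producing the last result."""
    return latency + (n_inputs - 1) * ii


def in_flight(n_inputs, latency, ii, cycle):
    """Number of input instances being processed during a given cycle."""
    return sum(first <= cycle <= last
               for first, last in timeline(n_inputs, latency, ii))


print(timeline(3, 4, 2))       # [(0, 3), (2, 5), (4, 7)]
print(total_cycles(3, 4, 2))   # 8  (a new input every 2 cycles)
print(total_cycles(3, 4, 4))   # 12 (no overlap between inputs)
print(in_flight(3, 4, 2, 3))   # 2
```

For many inputs the cost per input approaches II cycles, so going from II = 4 to II = 2 doubles throughput while the latency of each input stays at 4 cycles.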



Pipelining and Resource Utilization

• Doubling the initiation rate does NOT double the hardware resources needed. Why?
Pipelining can improve resource utilization.
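One way to see why is a per-type lower bound on functional units. With single-cycle FUs that can each start one operation per cycle, an FU can serve at most II operations per initiation interval, so at least ⌈N/II⌉ FUs are needed for N operations of a type. The Python sketch below assumes a hypothetical DFG with 4 multiplications and 3 additions; these counts are an assumption, though they are consistent with the 2 MUL/2 ADD (II = 2) and 4 MUL/3 ADD (II = 1) figures in these slides.

```python
import math

# Hypothetical operation counts per input (assumption, not from the slides).
OP_COUNTS = {"mul": 4, "add": 3}


def min_fus(op_counts, ii):
    """Lower bound on FUs per type when a new input starts every ii cycles."""
    return {kind: math.ceil(n / ii) for kind, n in op_counts.items()}


def utilization(op_counts, fus, ii):
    """Fraction of available FU-cycles that do useful work, per type."""
    return {kind: op_counts[kind] / (fus[kind] * ii) for kind in op_counts}


print(min_fus(OP_COUNTS, 2))  # {'mul': 2, 'add': 2}
print(min_fus(OP_COUNTS, 1))  # {'mul': 4, 'add': 3}
# A new input only every 4 cycles on 2 MUL, 1 ADD (no overlap between inputs):
print(utilization(OP_COUNTS, {"mul": 2, "add": 1}, 4))  # {'mul': 0.5, 'add': 0.75}
# Pipelined with II = 2 on 2 MUL, 2 ADD:
print(utilization(OP_COUNTS, {"mul": 2, "add": 2}, 2))  # {'mul': 1.0, 'add': 0.75}
```

Under these assumptions, halving II from 2 to 1 doubles the multipliers (2 to 4) but adds only one adder (2 to 3), because the adders at II = 2 were already idle a quarter of the time. The bound ignores dependences, so a real schedule may need more units.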



Pipelining

• Initiation rate = 1 cycle

Resources:
4 MUL
3 ADD



Pipelined Scheduling
• Generating pipelined data paths
– Divide DFG/CDFG into parts that represent pipeline
stages
– Assume that the DFGs corresponding to each of the stages
execute concurrently
– Use any scheduling algorithm

[Figure: Behavior (CDFG) divided into Stage 1, Stage 2 and Stage 3, which are passed to the Scheduler]
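The slides leave the scheduling algorithm open. Below is a hedged Python sketch of one common formulation, modulo scheduling (named here as an assumption, not as the course's method): each operation gets a start cycle t and belongs to pipeline stage t // II, and in steady state it shares FUs with every operation whose t % II is the same, because those operations work on different inputs at the same time. The DFG, the two candidate schedules and the II = 2, 2 MUL/2 ADD limits are hypothetical, and all operations are assumed to take one cycle.

```python
from collections import Counter

# Hypothetical DFG: op -> (type, predecessors).
DFG = {
    "m1": ("mul", []), "a1": ("add", ["m1"]), "m2": ("mul", ["a1"]),
    "m3": ("mul", []), "m4": ("mul", []),
    "a3": ("add", ["m3", "m4"]), "a2": ("add", ["m2", "a3"]),
}


def is_valid_pipelined(dfg, start, ii, limits):
    """Check dependences and steady-state resource use of a pipelined schedule."""
    for op, (_, preds) in dfg.items():
        if any(start[op] < start[p] + 1 for p in preds):  # unit latency
            return False
    # Ops whose start cycles are equal modulo ii run concurrently in steady state.
    usage = Counter((dfg[op][0], t % ii) for op, t in start.items())
    return all(n <= limits[kind] for (kind, _), n in usage.items())


def stages(start, ii):
    """Group ops by pipeline stage (stage = start cycle // ii)."""
    groups = {}
    for op, t in sorted(start.items(), key=lambda kv: kv[1]):
        groups.setdefault(t // ii, []).append(op)
    return groups


LIMITS = {"mul": 2, "add": 2}
asap_sched = {"m1": 0, "a1": 1, "m2": 2, "m3": 0, "m4": 0, "a3": 1, "a2": 3}
good_sched = {"m1": 0, "a1": 1, "m2": 2, "m3": 1, "m4": 1, "a3": 2, "a2": 3}

print(is_valid_pipelined(DFG, asap_sched, 2, LIMITS))  # False: 4 multiplies share slot 0
print(is_valid_pipelined(DFG, good_sched, 2, LIMITS))  # True
print(stages(good_sched, 2))  # {0: ['m1', 'a1', 'm3', 'm4'], 1: ['m2', 'a3', 'a2']}
```

With the second schedule, stage 0 of input k+1 runs alongside stage 1 of input k, which is the sense in which the stages execute concurrently.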



Loop Pipelining

• Loops often inhibit exploitation of parallelism
• Option #1: Fully (or partially) unroll the loop
– Benefit: Scheduler sees and optimizes
entire iteration space
– Problems: ______________________
• Option #2: Loop Pipelining
– Also called loop rolling, loop rotation, …



Loop Pipelining

Loop body: 4 data-dependent operations A, B, C, D

time (↓)
A1                  Prologue: fill the pipe
B1 A2
C1 B2 A3
D1 C2 B3 A4         Kernel: steady state (D C B A)
D2 C3 B4 A5
…
Dn-2 Cn-1 Bn        Epilogue: drain the pipe
Dn-1 Cn
Dn

Steady state: 4 iterations are executed simultaneously, 1 operation from each iteration. An iteration starts and finishes in each cycle.

Steady state can be expressed as a loop!
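A minimal Python sketch of the prologue/kernel/epilogue schedule, assuming one new iteration per cycle (II = 1), a loop body of 4 operations in which each depends on the previous one, and n ≥ 4 iterations. It builds the prologue, then the kernel as an ordinary loop, then the epilogue, and checks that every operation runs exactly once, one cycle after its predecessor in the same iteration. Function names are illustrative.

```python
OPS = ["A", "B", "C", "D"]  # loop body; each op depends on the previous one


def pipelined_schedule(n, ops=OPS):
    """Loop-pipelined schedule with II = 1, one list of ops per cycle.

    Ops in a cycle are listed newest stage first ("D C B A").
    Assumes n >= len(ops) so the kernel executes at least once.
    """
    d = len(ops)
    rows = []
    # Prologue: fill the pipe.
    for t in range(d - 1):
        rows.append([f"{ops[s]}{t - s + 1}" for s in reversed(range(t + 1))])
    # Kernel: the steady state is itself a loop over i = 1 .. n-d+1.
    for i in range(1, n - d + 2):
        rows.append([f"{ops[s]}{i + d - 1 - s}" for s in reversed(range(d))])
    # Epilogue: drain the pipe.
    for k in range(1, d):
        rows.append([f"{ops[s]}{n + k - s}" for s in reversed(range(k, d))])
    return rows


def respects_dependences(rows, n, ops=OPS):
    """True if every op of every iteration runs once, one cycle after its predecessor."""
    names = [name for row in rows for name in row]
    expected = [f"{op}{i}" for op in ops for i in range(1, n + 1)]
    if sorted(names) != sorted(expected):
        return False
    when = {name: t for t, row in enumerate(rows) for name in row}
    return all(when[f"{ops[s]}{i}"] == when[f"{ops[s - 1]}{i}"] + 1
               for i in range(1, n + 1) for s in range(1, len(ops)))


for row in pipelined_schedule(6):
    print(" ".join(row))
```

Under these assumptions, `pipelined_schedule(6)` takes 9 cycles (n + 3) instead of the 24 cycles needed to run the 6 iterations one after another.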

