You are on page 1of 30

Logic Synthesis

Sequential Synthesis

Courtesy RK Brayton (UCB) 1


and A Kuehlmann (Cadence)
Introduction
• Design optimization from System level to layout
– far too complex to approach in one big step
– divide and conquer approach with fine tuned balance between
• capability to apply clean mathematical modeling and
abstraction
• algorithmic complexity to compute solutions
• loss of optimality based on hard partitioning
• design and verification methodology that requires user
guidance
– sweet spots change over time due to:
• semi-conductor technology improvements
• changes of design architectures/requirements
• new algorithmic solutions, etc.
Introduction
• Example: traditional ASIC methodology:
– RTL verification based on simulation
– logic synthesis from RTL to gate level using combinational
paradigm
– static timing analysis
– formal equivalence checking based on combinational paradigm
– ATPG and scan-based testing based on combinational paradigm
– standard cell place & route methodology with zero clock-skew
distribution
Introduction
• However:
– clean boundaries between modeling levels get blurred
• larger chips and shrinking device sizes require more detailed
modeling
• aggressive performance and power requirements
• new modeling and algorithmic approaches
– Example:
• RTL sign-off methodology
• combined approach to logic synthesis and physical design

4
Overview of Circuit Optimizations
Distance from Physical Implementation
Combinational Optimization

Necessity of Integrated Solution


Verification Challenge
Optimization Space

Clock Skew Scheduling

Retiming

Architectural Restructuring

System-Level Optimization

5
Sequential Optimization Techniques
• State assignment
– Lots of theory, practical only for small FSMs, that too targeting 2-level
control logic
• Sequential don’t cares
– Compute unreachable states, use them as external don’t cares for the
next-state logic
• State minimization
– Easy for completely specified FSMs (n ¢ log n algorithm)
– Incompletely specified FSMs
• Retiming
– balancing of path delays by moving registers within circuit topology
– interleaving with combinational optimization techniques

6
Integration in Design Flow
• Optimization Space
– significant more optimization freedom for improving performance,
power, and area
• Distance from Physical Implementation
– difficult to accurately model impact on final implementation
– difficult to mathematically characterize optimization space
• Verification Challenge
– departure from combinational comparison model would break
formal equivalence checking
– different simulation behavior causes acceptance problems

7
Retiming
r3
r2 4 5

r1 2 3
r4

4 5

2 r’1 3
r4
Skew =0
Dmax=6 Tcycle=8
Dmax=8 Dmax=0
Dmin=2 Dmin=3 Dmin=0
( Skew
T =7
cycle
)
= -1
r’1 r4
8
Retiming
+ • Only setup time constraint (0 clock skew)
• Simple integration with other logical (e.g. combinational) or
physical optimizations
• Easy combination with clock skew scheduling to obtain global
optimum

- • Changes combinational model of design


– severe impact on verification methodology
• Inaccurate delay model if applied globally
• Computation of equivalent reset state required

9
Retiming - Architectural Restructuring
r3
2 20
r2

{
2 ... ... 2
r2 r4
r1

r3
r2 2 10 10
{
2
r’4
{
...
r’1
... 2
r4

10
Retiming - Architectural Restructuring
+ • Smooth extension of regular retiming
• Potential to alleviate global performance bottlenecks by adding
sequential redundancy and pipelining

- • Significant change of design structure


– substantial impact on verification methodology
• Flexible architectural restructuring changes I/O behavior
– existing RTL specification methods not always applicable

11
Example
Design example:
- 360 I/O
- 2240 flip-flops
- 41665 timing edges

Target cycle time (norm): 1.5

Worst slack: -0.079 (5%)

Distribution:
20% 30 edges
40% 63 edges
60% 130 edges
80% 249 edges
100% 425 edges

12
Verification
• Timing verification unchanged
• Sequential optimizations change the next-state and output functions
– traditional combinational equivalence checking not applicable
– simulation runs not recognizable by designer - acceptance problems

• Generic solution:
– preserve retime function (mapping function) from synthesis for:
• reducing sequential EC problem back to combinational case
– no false positives possible!!!!
• modifying simulation model to reproduce original simulation output

13
Optimizing Circuits by Retiming

Netlist of gates and registers:

Inputs

Outputs

Various Goals:
– Reduce clock cycle time
– Reduce area
• Reduce number of latches

14
Retiming
Problem
– Pure combinational optimization can be suboptimal since relations
across register boundaries are disregarded
Solutions
– Retiming: Move register(s) so that
• clock cycle decreases, or number of registers decreases and
• input-output behavior is preserved
– RnR: Combine retiming with combinational optimization techniques
• Move latches out of the way temporarily
• optimize larger blocks of combinational

15
Circuit Representation

[Leiserson, Rose and Saxe (1983)]


Circuit represented as retiming graph G(V,E,d,w)
– V  set of gates
– E  set of connections
– d(v) = delay of gate/vertex v, (d(v)0)
– w(e) = number of registers on edge e, (w(e)0)

16
Circuit Representation
Example: Correlator (from Leiserson and Saxe) (simplified)

+ 7
0
Host 0
0 0
  2 3 3
0
(x, y) = 1 if x=y Retiming Graph (Directed)
0 otherwise a b
Circuit Operation delay

Every cycle in Graph has at least one register i.e. no  3


combinational loops.
+ 7

17
Preliminaries
e0 e1 ek 1
For a path p : v0  v1  vk 1  vk
k
d ( p)   d (vi ) (includes endpoints)
i 0
k 1
w( p)   w(ei )
i 0

7 Path with
Clock cycle 0

c  max {d ( p)}
0
0 0 w(p)=0
p: w( p ) 0
2 3 3
0

For correlator c = 13

18
Basic Operation

• Movement of registers from input to output of a gate or vice versa

Retime by -1

Retime by 1

• Does not affect gate functionality's


• Mathematical formulation:
– r: V  Z, an integer vertex labeling
– wr(e) = w(e) + r(v) - r(u) for edge e = (u,v)
19
Basic Operation

Thus in the example, r(u) = -1, r(v) = -1 results in

7 7
0 0
0 1
0 0 1
0 v
u v u
2 1 3 3
3 3 0
0

• For a path p: st, wr(p) = w(p) + r(t) - r(s)


• Retiming:
– r: VZ, an integer vertex labeling
– wr(e) =w(e) + r(v) - r(u) for edge e= (u,v)
– A retiming r is legal if wr(e)  0, eE

20
Retiming for Minimum Clock Cycle
Problem Statement: (minimum cycle time)
Given G (V, E, d, w), find a legal retiming r so that

c  max {d ( p)}
p: wr ( p ) 0
is minimized

Retiming: 2 important matrices


• Register weight matrix

W (u, v)  min{w( p) : u 
p
 v}
p
• Delay matrix

D(u, v)  max{d ( p) : u 
p
 v, w( p)  w(u, v)}
p

21
Retiming for minimum clock cycle
W = register path weight matrix
7 (minimum # latches
0
0 on all paths between
v0 0 0 u and v)
2 D = path delay matrix
3 3
0 (maximum delay on
V1 V2
all paths between
W D u and v)
V0 V1 V2 V3 V0 V1 V2 V3

V0 0222 0 3 6 13 V0
V1 0000 13 3 6 13 V1
V2 0200 10 13 3 10 V2
V3 0220 7 10 13 7 V3

c    p, if d(p)   then w(p)  1


22
Conditions for Retiming

Assume that we are asked to check if a retiming exists for a clock cycle 
Legal retiming: wr(e)  0 for all e. Hence
wr(e) = w(e) = r(v) - r(u)  0 or
r (u) - r (v)  w (e)
For all paths p: u  v such that d(p)  , we require wr(p)  1
– Thus k 1
1  wr ( p )   wr (ei )
i 0
k 1
  [ w(ei )  r (vi 1 )  r (vi )]
i 0

 w( p )  r (vk )  r (v0 )
 w( p )  r (v)  r (u )
Take the least w(p) (tightest constraint) r(u)-r(v)  W(u,v)-1
Note: this is independent of the path from u to v, so we just need to apply
it to u, v such that D(u,v)   23
Solving the constraints
• All constraints in difference-of-2-variable form
• Related to shortest path problem
W D
Correlator:  = 7 V0 V1 V2 V3 V0 V1 V2 V3

V0 0222 0 3 6 13 V0
V1 0000 13 3 6 13 V1
D>7: V2
V2 0200 10 13 3 10
Legal: r(u)-r(v)w(e) r(u)-r(v)W(u,v)-1 7 10 13 7 V3
V3 0220
r (v0 )  r (v1 )  2 r (v0 )  r (v3 )  1
r (v1 )  r (v2 )  0 r (v1 )  r (v0 )  1
r (v1 )  r (v3 )  0 r (v1 )  r (v3 )  1 0
7

r (v2 )  r (v3 )  0 r (v2 )  r (v0 )  1 v0 0 0


0

r (v3 )  r (v0 )  0 r (v2 )  r (v1 )  1 2 3 3


r (v2 )  r (v3 )  1 v1
0
V2
r (v3 )  r (v1 )  1
r (v3 )  r (v2 )  1 24
Solving the constraints
• Do shortest path on constraint graph: (O(|V||E| )) (Bellman Ford Algorithm)
• A solution exists if and only if there exists no negative weighted cycle.

Constraint graph
D>7:
Legal: r(u)-r(v)w(e) r(u)-r(v)W(u,v)-1
r (v0 )  r (v1 )  2 -1
r (v0 )  r (v3 )  1 0
r (v1 )  r (v2 )  0 r(0) 2 -1
r (v1 )  r (v0 )  1 0 r(1)
r (v1 )  r (v3 )  0
r (v1 )  r (v3 )  1 0 0 0 1 1
r (v2 )  r (v3 )  0
r (v2 )  r (v0 )  1
r (v3 )  r (v0 )  0 0 -1 1 0,-1
r (v2 )  r (v1 )  1 1

r (v2 )  r (v3 )  1 0
r(2) r(3)
r (v3 )  r (v1 )  1 0,-1
0
-1
r (v3 )  r (v2 )  1 1

A solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1 25


Retiming
To find the minimum cycle time, do a binary search among the entries of
the D matrix (0(V E  logV))

7
0
0
W D
V0 V1 V2 V3 V0 V1 V2 V3
v0 0 0
V0 0222 0 3 6 13 V0
2 3 3 V1
0 V1 0000 13 3 6 13
v1 V2 V2 0200 10 13 3 10 V2
V3 7 10 13 7 V3
0220
Retimed correlator:
+ +
Retime
Host Host

   
Clock cycle
= 3+3+7=13 Clock cycle = 7
26
a b a b
Retiming: 2 more algorithms
1. Relaxation based:
– Repeatedly find critical path;
– retime vertex at end of path by +1 (O(VElogV))

+1
v

Critical path
u

2. Also, Mixed Integer Linear Program formulation

27
Retiming for Minimum Area

Goal: minimize number of registers used


min N r   wr (e)
eE

  ( w(e)  r (v)  r (u ))
e:u  v

  w(e)   (r (v)  r (u ))
eE e:u v

 N   (r (v)  r (u ))
u v

 N    r (v)(# fanin(v )  # fanout (v ) 


vV

 N   aV r (v )
vV
where av is a constant. 28
Minimum Registers - Formulation

 a r (v )
Minimize:
v
vV

Subject to: wr(e) =w(e) + r(v) - r(u)  0

• Reducible to a flow problem

29
Problems with Retiming
• Computation of equivalent initial states
– do not exist necessarily
1
?

0 ?

– General solution requires replication of logic for initialization

• Timing models
– too far away from actual implementation

30

You might also like