
Digital VLSI Design

Lecture 5: Logic Synthesis


Semester A, 2016-17
Lecturer: Dr. Adam Teman

5 December 2016

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
What is Logic Synthesis?
• Synthesis is the process that converts RTL into a technology-specific gate-level netlist, optimized for a set of pre-defined constraints.
• You start with:
  • A behavioral RTL design
  • A standard cell library
  • A set of design constraints
• You finish with:
  • A gate-level netlist, mapped to the standard cell library
  • (For FPGAs: LUTs, flip-flops, and RAM blocks)
  • Hopefully, it’s also efficient in terms of speed, area, power, etc.

Input to synthesis (behavioral RTL, together with the Standard Cell Library and the Design Constraints):
    module counter(
      input clk, rstn, load,
      input [1:0] in,
      output reg [1:0] out);
      always @(posedge clk)
        if (!rstn) out <= 2'b0;
        else if (load) out <= in;
        else out <= out + 1;
    endmodule

Output of synthesis (gate-level netlist):
    module counter ( clk, rstn, load, in, out );
      input [1:0] in;
      output [1:0] out;
      input clk, rstn, load;
      wire N6, N7, n5, n6, n7, n8;

      HDDFFPQ1 out_reg_1 (.D(N7),.CK(clk),.Q(out[1]));
      HDDFFPQ1 out_reg_0 (.D(N6),.CK(clk),.Q(out[0]));
      HDNAN2DL U8 (.A1(out[0]),.A2(n5),.Z(n8));
      HDNAN2DL U9 (.A1(n5),.A2(n7),.Z(n6));
      HDINVDL U10 (.A(load),.Z(n5));
      HDOA211DL U11 (.A1(in[0]),.A2(n5),.B(rstn),.C(n8),.Z(N6));
      HDOA211DL U12 (.A1(in[1]),.A2(n5),.B(rstn),.C(n6),.Z(N7));
      HDEXNOR2DL U13 (.A1(out[1]),.A2(out[0]),.Z(n7));
    endmodule
What is Logic Synthesis?
• Given: Finite-State Machine F(X,Y,Z, λ, δ)
where:
• X: Input alphabet
• Y: Output alphabet
• Z: Set of internal states
• λ: X × Z → Z (next state function)
• δ: X × Z → Y (output function)

• Target: Circuit C(G, W) where:


• G: set of circuit components
g = {Boolean gates, flip-flops, etc.}
• W: set of wires connecting G

Motivation
• Why perform logic synthesis?
• Automatically manages many details of the design process:
• Fewer bugs
• Improves productivity
• Abstracts the design data (HDL description) from any particular implementation technology
• Designs can be re-synthesized targeting different chip technologies;
• E.g.: first implement in FPGA then later in ASIC
• In some cases, leads to a more optimal design than could be achieved by
manual means (e.g.: logic optimization)
• Why not logic synthesis?
• May lead to less than optimal designs in some cases

Simple Example
module foo (a,b,s0,s1,f);
  input [3:0] a;
  input [3:0] b;
  input s0,s1;
  output [3:0] f;
  reg [3:0] f;   // width must match the 4-bit output port

  // combinational 2:1 selection between a and b
  always @ (a or b or s0 or s1)
    if (!s0 && s1 || s0)
      f=a;
    else
      f=b;
endmodule
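
The select condition in this example is a natural target for logic optimization. As a small sketch (Python is used here only as a calculator; the function names are illustrative and not part of the lecture), the truth table below suggests that `!s0 && s1 || s0` collapses to `s0 || s1`, so a synthesizer could plausibly implement `foo` as a 4-bit 2:1 multiplexer selected by a single OR gate.

```python
from itertools import product

def original(s0, s1):
    # Same grouping as the Verilog expression: (!s0 && s1) || s0
    return (not s0 and s1) or s0

def simplified(s0, s1):
    # Candidate simplification: s0 || s1
    return s0 or s1

# Enumerate all four input combinations and compare both forms.
rows = [(s0, s1, bool(original(s0, s1)), bool(simplified(s0, s1)))
        for s0, s1 in product([False, True], repeat=2)]

for s0, s1, orig_val, simp_val in rows:
    print(int(s0), int(s1), int(orig_val), int(simp_val))
```

The two result columns agree on every row, which is the kind of Boolean identity that the minimization steps later in the lecture exploit automatically.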

Goals of Logic Synthesis
• Minimize area
• in terms of literal count, cell count, register count, etc.
• Minimize power
• in terms of switching activity in individual gates,
deactivated circuit blocks, etc.
• Maximize performance
• in terms of maximal clock frequency of synchronous
systems, throughput for asynchronous systems
• Any combination of the above
• combined with different weights
• formulated as a constraint problem
• “minimize area for a clock speed > 300MHz”
• More global objectives
• feedback from layout
• actual physical sizes, delays, placement and routing

How does it work?
Variety of general and ad-hoc (special case) methods:
• Instantiation:
• maintains a library of primitive modules (AND, OR, etc.) and user defined modules
• “Macro expansion”/substitution:
• a large set of language operators (+, -, Boolean operators, etc.)
and constructs (if-else, case) expand into special circuits
• Inference:
• special patterns are detected in the language description and treated specially
(e.g., inferring memory blocks from variable declarations and read/write statements,
FSM detection and generation from “always @ (posedge clk)” blocks)
• Logic optimization:
• Boolean operations are grouped and optimized with logic minimization techniques
• Structural reorganization:
• advanced techniques including sharing of operators,
and retiming of circuits (moving FFs), and others
Basic Synthesis Flow
• Syntax Analysis:
  • Read in HDL files and check for syntax errors.
    read_hdl -verilog sourceCode/toplevel.v
• Library Definition:
  • Provide standard cells and IP libraries.
    set_attribute library "/design/data/my_fab/digital/lib/TT1V25C.lib"
• Elaboration and Binding:
  • Convert RTL into technology-independent generics.
  • State reduction, encoding, register inferring.
  • Bind all leaf cells to provided libraries.
    elaborate toplevel
• Pre-mapping Optimization:
  • Reduce Boolean literals.

Basic Synthesis Flow
• Constraint Definition:
  • Define clock frequency and other design constraints.
    read_sdc sdc/constraints.sdc
• Technology Mapping:
  • Map generic logic to technology libraries.
    synthesize -to_mapped -effort medium
• Post-mapping Optimization:
  • Iterate over the design, changing gate sizes, Boolean literals and architectural approaches to try and meet constraints.
• Report and export:
  • Report final results with an emphasis on timing reports.
    report timing -num_paths 10 > reports/timing_reports.rpt
  • Export netlist and other results for further use.
    write_hdl > export/netlist.v
Lecture outline: 1 Intro, 2 Compile, 3 Lib. Def., 4 Boolean Min, 5 SDC, 6 Tech. Mapping, 7 Verilog for Synt., 8 Opt.

Compilation
…but aren’t we talking about synthesis?
Compilation in the synthesis flow
• Before starting to synthesize, we need to check the syntax for correctness.
• Synthesis vs. Compilation:
  • Compiler
    • Recognizes all possible constructs in a formally defined program language
    • Translates them to a machine language representation of execution process
  • Synthesis
    • Recognizes a target dependent subset of a hardware description language
    • Maps to collection of concrete hardware resources
    • Iterative tool in the design flow

Library Definition

It’s all about the standard cells…
• Well, we discussed quite a bit about standard cell libraries.
• The library definition stage tells the synthesizer where to look for leaf cells for binding and the target library for technology mapping.
• We can provide a list of paths to search for libraries in:
    set_attribute lib_search_path "/design/data/my_fab/digital/lib/"
• And we have to provide the name of a specific library, usually characterized for a single corner:
    set_attribute library "TT1V25C.lib"
• We also need to provide .lib files for IPs, such as memory macros, I/Os, and others.
Make sure you understand all the warnings about the libs that the synthesizer spits out, even though you probably can’t fix them.
Boolean Minimization
Mapping to Generics and Libs, Basics of Boolean Minimization
(BDDs, Two-Level Logic, Espresso)

Elaboration and Binding
• During the next step of logic synthesis, the tool:
  • Compiles the RTL into a Boolean data structure (elaboration)
  • Binds the non-Boolean modules to leaf cells (binding), and
  • Optimizes the Boolean logic (minimization).
• The resulting design is mapped to generic, technology independent logic gates.
• This is the core of synthesis and has been a very central subject of research in computer science since the eighties.

Two-Level Logic
• During elaboration, primary inputs and outputs (ports) are defined and sequential elements (flip-flops, latches) are inferred.
• This results in a set of combinational logic clouds with:
  • Input ports and register outputs are inputs to the logic
  • Output ports and register inputs are the outputs of the logic
  • The outputs can be described as Boolean functions of the inputs.
• The goal of Boolean minimization is to reduce the number of literals in the output functions.
• Many different data structures are used to represent the Boolean functions:
  • Truth tables, cubes, Binary Decision Diagrams, equations, etc.
  • A lot of the research was developed upon SOP or POS representation, which is better known as “Two-Level Logic”
[Figure: gate networks with inputs A, B, C, D and output F; cube representation of f = x1x2 on the axes x1, x2, x3]
Two-Level Logic Minimization
• In our freshman year we learned about Karnaugh maps:
  • For n inputs, the map contains 2^n entries
  • Objective is to find minimum prime cover

          AB
    CD    00  01  11  10
    00     X   1   0   1
    01     0   1   1   1
    11     0   X   X   0
    10     0   1   0   1

• However…
  • Difficult to automate (NP-complete)
  • Number of cells is exponential (practical only for fewer than 6 variables)
• A different approach is the Quine-McCluskey method
  • Easy to implement in software
  • BUT computational complexity too high
• Some Berkeley students fell asleep while solving a Quine-McCluskey exercise.
• They needed a shot of Espresso.
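
To make the "easy to implement in software" remark concrete, here is a minimal sketch of the first half of the Quine-McCluskey method: repeatedly merging cubes that differ in one bit until only prime implicants remain. The three-variable function (minterms 0, 1, 2, 5, 6, 7) and the helper names `combine` and `prime_implicants` are assumptions chosen for illustration; the second half of the method (choosing a minimum cover from the primes) is not shown.

```python
from itertools import combinations

def combine(a, b):
    """Merge two cubes (strings over '0', '1', '-') that differ in exactly
    one position where both hold a fixed bit; return None otherwise."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, nvars):
    """Return the set of prime implicants of the given ON-set."""
    cubes = {format(m, f"0{nvars}b") for m in minterms}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= cubes - used   # cubes that could not be merged are prime
        cubes = merged
    return primes

# Illustrative function of three variables, most significant bit first.
primes = prime_implicants([0, 1, 2, 5, 6, 7], 3)
print(sorted(primes))
```

For this function every minterm pair merges once and no further, leaving six two-literal primes. The pairwise comparison in each round is what drives the cost up as the number of variables grows.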
Espresso Heuristic Minimizer
• Start with an SOP solution.
• Expand
  • Make each cube as large as possible without covering a point in the OFF-set.
  • A larger cube has fewer literals.
• Irredundant
  • Throw out redundant cubes.
  • Remove smaller cubes whose points are covered by larger cubes.
• Reduce
  • The cubes in the cover are reduced in size.
  • This increases the number of literals (a temporarily worse solution).
  • In general, the new cover will be different from the initial cover.
  • “expand” and “irredundant” steps can possibly find out a new way to cover the points in the ON-set.
  • Hopefully, the new cover will be smaller.

    ESPRESSO(F) {
      do {
        reduce(F);
        expand(F);
        irredundant(F);
      } while (fewer terms in F);
      verify(F);
    }

Espresso Example
• K-map of f (the same map is drawn twice, with different cube groupings):

          AB
    CD    00  01  11  10
    00     1   1   0   0
    01     1   1   1   1
    11     0   0   1   1
    10     1   1   1   1

• Initial set of primes found by steps 1 and 2 of the Espresso method:
    f = A'C' + C'D + AC + CD'
  • 4 primes, irredundant cover, but not a minimal cover!
• Result of REDUCE: shrink primes while still covering the ON-set:
    f = A'C' + AC'D + AC + A'CD'
  • Choice of order in which to perform shrink is important
Espresso Example
• K-map of f (the same map is drawn twice, with different cube groupings):

          AB
    CD    00  01  11  10
    00     1   1   0   0
    01     1   1   1   1
    11     0   0   1   1
    10     1   1   1   1

• Second EXPAND generates a different set of prime implicants:
    f = A'C' + AD + AC + CD'
• IRREDUNDANT COVER found by final step of espresso:
    f = A'C' + AD + CD'
  • Only three prime implicants!
  • Only 6 literals!
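
The covers in this example can be checked against each other by brute force. The sketch below is an illustration, not part of Espresso itself: it encodes each cover as a list of cubes (the `cover` helper and the dictionary encoding are assumptions) and confirms that the initial, reduced and final covers describe the same function over all 16 input points, while the literal count drops from 8 to 6.

```python
from itertools import product

def cover(cubes):
    """Build a Boolean function from cubes given as {variable: required value}."""
    def func(point):
        return any(all(point[v] == val for v, val in cube.items()) for cube in cubes)
    return func

def literals(cubes):
    """Total number of literals in a sum-of-products cover."""
    return sum(len(cube) for cube in cubes)

# f = A'C' + C'D + AC + CD'   (initial primes)
initial = [{"A": 0, "C": 0}, {"C": 0, "D": 1}, {"A": 1, "C": 1}, {"C": 1, "D": 0}]
# f = A'C' + AC'D + AC + A'CD'   (after REDUCE)
reduced = [{"A": 0, "C": 0}, {"A": 1, "C": 0, "D": 1}, {"A": 1, "C": 1}, {"A": 0, "C": 1, "D": 0}]
# f = A'C' + AD + CD'   (final irredundant cover)
final = [{"A": 0, "C": 0}, {"A": 1, "D": 1}, {"C": 1, "D": 0}]

points = [dict(zip("ABCD", bits)) for bits in product([0, 1], repeat=4)]
on_sets = [[cover(c)(p) for p in points] for c in (initial, reduced, final)]

print(on_sets[0] == on_sets[1] == on_sets[2])      # all three covers agree
print(literals(initial), literals(reduced), literals(final))
```

Note that REDUCE temporarily raises the literal count before the second EXPAND and IRREDUNDANT steps bring it below the starting point.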
Multi-level Logic Minimization
• Two-level logic minimization has been widely researched and many famous methods have come out of it.
• However, often it is better and/or more practical to use many levels of logic (remember logical effort?).
• Therefore, a whole new optimization regime, known as multi-level logic minimization, was developed.
• We will not cover multi-level minimization in this course; however, you should be aware that the output of logic minimization will generally be multi-level and not two-level.

Multi-level Logic Minimization
• For example:
  • Given the following logic set (17 literals):
      t1 = a + bc;
      t2 = d + e;
      t3 = ab + d;
      t4 = t1t2 + fg;
      t5 = t4h + t2t3;
      F = t5’;
• Multi-level Logic Minimization can result in (13 literals):
      t1 = d + e;
      t2 = b + h;
      t3 = at2 + c;
      t4 = t1t3 + fgh;
      F = t4’;
[Figure: each logic set drawn as a network of its intermediate nodes feeding F]
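
The literal counts quoted for the two networks can be reproduced mechanically. This is a small sketch under the assumption that a "literal" is one occurrence of an input or intermediate variable on the right-hand side of an equation; the `literal_count` helper and the string encoding of the equations are illustrative only.

```python
import re

def literal_count(equations):
    """Count variable occurrences on the right-hand side of each equation.
    Intermediate signals are named t<digits>; inputs are single letters."""
    total = 0
    for eq in equations:
        rhs = eq.split("=", 1)[1]
        total += len(re.findall(r"t\d+|[a-z]", rhs))
    return total

before = ["t1 = a + bc", "t2 = d + e", "t3 = ab + d",
          "t4 = t1t2 + fg", "t5 = t4h + t2t3", "F = t5'"]
after = ["t1 = d + e", "t2 = b + h", "t3 = at2 + c",
         "t4 = t1t3 + fgh", "F = t4'"]

print(literal_count(before), literal_count(after))
```

The script only counts literals; it makes no statement about how the second network was derived from the first.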
Binary Decision Diagrams (BDD)
• BDDs are DAGs that represent the truth table of a given function

    x1 x2 x3 | f(x1,x2,x3)
     0  0  0 | 1
     0  0  1 | 1
     0  1  0 | 1
     0  1  1 | 0
     1  0  0 | 0
     1  0  1 | 0
     1  1  0 | 1
     1  1  1 | 0

f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3
[Figure: decision tree for f(x1,x2,x3) with root node x1, followed by x2 nodes and x3 nodes; every node branches on 0 and 1 and the eight leaves hold the truth-table values]
Binary Decision Diagrams (BDD)
• The Shannon Expansion of a function relates the function to its cofactors:
  • Given a Boolean function f(x1,x2,…,xi,…,xn)
  • Positive cofactor: f_i^1 = f(x1,x2,…,1,…,xn)
  • Negative cofactor: f_i^0 = f(x1,x2,…,0,…,xn)
• Shannon’s expansion theorem states that
  • f = xi’ f_i^0 + xi f_i^1
  • f = (xi + f_i^0)(xi’ + f_i^1)
• This leads to the formation of a BDD:
  • Example: f = ac + bc + a’b’c’
             = a’ (b’c’ + bc) + a (c + bc)
             = a’ (b’c’ + bc) + a (c)
[Figure: BDD node a of f, with one branch leading to b’c’ + bc and the other to c]
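
The expansion can be verified numerically for the example function. The following sketch (the helper `cofactor` and the function names are assumptions made for this illustration) fixes `a` to 0 and 1 to obtain the two cofactors, then checks both forms of Shannon's theorem on all eight input combinations.

```python
from itertools import product

def f(a, b, c):
    # f = ac + bc + a'b'c'
    return bool((a and c) or (b and c) or (not a and not b and not c))

def cofactor(func, name, value):
    """Return func with the input called `name` fixed to `value`."""
    return lambda **kw: func(**{**kw, name: value})

f_a0 = cofactor(f, "a", False)   # negative cofactor, expected b'c' + bc
f_a1 = cofactor(f, "a", True)    # positive cofactor, expected c

def shannon_sop(a, b, c):
    # f = a' * f_a0 + a * f_a1
    return bool((not a and f_a0(b=b, c=c)) or (a and f_a1(b=b, c=c)))

def shannon_pos(a, b, c):
    # f = (a + f_a0) * (a' + f_a1)
    return bool((a or f_a0(b=b, c=c)) and (not a or f_a1(b=b, c=c)))

inputs = list(product([False, True], repeat=3))
print(all(f(*p) == shannon_sop(*p) == shannon_pos(*p) for p in inputs))
```

Applying the same split recursively to each cofactor, one variable per level, is what produces the decision diagram.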
Reduced Ordered BDD (ROBDD)
• BDDs can get very big.
• So let’s see if we can provide a reduced representation.
• Reduction Rule 1: Merge equivalent leaves
  [Figure: the decision tree for f(x1,x2,x3) with its leaves merged into a single 0 terminal and a single 1 terminal]
  f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3 = ~x1~x2 + ~x1x2~x3 + x1x2~x3
• Reduction Rule 2: Merge isomorphic nodes
  [Figure: two nodes x that test the same variable and have the same children y and z are merged into one node; in the example the x3 nodes computing ~x3 are shared]
• Reduction Rule 3: Eliminate redundant tests
  [Figure: a node x whose two branches both lead to the same node y is removed and replaced by y]
  f(x1, x2, x3) = ~x1~x2 + ~x1x2~x3 + x1x2~x3

Binary Decision Diagrams (BDD)
• Some benefits of BDDs:
  • Check for tautology is trivial.
    • BDD is a constant 1.
  • Complementation.
    • Given a BDD for a function f, the BDD for f’ can be obtained by interchanging the terminal nodes.
  • Equivalence check.
    • Two functions f and g are equivalent if their BDDs (under the same variable ordering) are the same.
• An Important Point:
  • The size of a BDD can vary drastically if the order in which the variables are expanded is changed.
  • The number of nodes in the BDD can be exponential in the number of variables in the worst case, even after reduction
[Figure: BDD of f = ab+a’c+a’bd with root node a, internal nodes b, c and d (sub-functions c+bd and c+d) and terminals 0 and 1]
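
The reduction rules and the ordering effect can both be observed with a very small builder. This is a sketch, not a production BDD package: the class `ROBDD`, its methods and the second test function `g` are assumptions for illustration, and the construction enumerates the full truth table, so it only suits tiny functions. Terminals are the integers 0 and 1 (merged leaves), `mk` drops a node whose two children are equal (redundant test) and reuses an existing node with the same variable and children (isomorphic nodes).

```python
class ROBDD:
    """Tiny reduced ordered BDD builder for a fixed variable order."""

    def __init__(self, order):
        self.order = order      # variable names, root variable first
        self.unique = {}        # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                       # eliminate redundant test
            return low
        key = (var, low, high)
        if key not in self.unique:            # merge isomorphic nodes
            self.unique[key] = len(self.unique) + 2   # 0 and 1 are terminals
        return self.unique[key]

    def build(self, func, level=0, env=None):
        env = env or {}
        if level == len(self.order):
            return int(bool(func(**env)))     # merged 0/1 leaves
        var = self.order[level]
        low = self.build(func, level + 1, {**env, var: 0})
        high = self.build(func, level + 1, {**env, var: 1})
        return self.mk(var, low, high)

    def size(self):
        return len(self.unique)               # number of internal nodes


def f(x1, x2, x3):
    # Function from the truth table in this lecture: ~x1~x2 + x2~x3
    return (not x1 and not x2) or (x2 and not x3)

def g(a1, b1, a2, b2, a3, b3):
    # Classic ordering-sensitive function: a1b1 + a2b2 + a3b3
    return (a1 and b1) or (a2 and b2) or (a3 and b3)

bdd_f = ROBDD(["x1", "x2", "x3"]); bdd_f.build(f)
good = ROBDD(["a1", "b1", "a2", "b2", "a3", "b3"]); good.build(g)
bad = ROBDD(["a1", "a2", "a3", "b1", "b2", "b3"]); bad.build(g)
print(bdd_f.size(), good.size(), bad.size())
```

Keeping each `a` next to its `b` lets the diagram forget a pair as soon as it is resolved, while the separated order must remember every `a` value until the `b` variables arrive.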
Constraint Definition

Constraint Definition
• Following Elaboration, the design is loaded into the synthesis tool and stored inside a data structure.
• Hierarchical ports (inputs/outputs) and registers can be accessed by name.
    set in_ports [get_ports IN*]
    set regs [get_cells -hier *_reg]
• At this point, we can load the design constraints in SDC format, as we learned in the previous lecture.
    read_sdc -verbose sdc/constraints.sdc
• Carefully check that all constraints were accepted by the tool!

Technology Mapping

Technology mapping
• Technology mapping is the phase of logic synthesis when gates are selected from a technology library to implement the circuit.
• Why technology mapping?
  • Straight implementation may not be good.
    • For example, F=abcdef as a 6-input AND gate causes a long delay.
  • Gates in the library are pre-designed; they are usually optimized in terms of area, delay, power, etc.
  • Fastest gates along the critical path, area-efficient gates (combination) off the critical path.
• Can apply a minimum cost tree-covering algorithm to solve this problem.

Technology Mapping Algorithm
• Using a recursive tree-covering algorithm, we can easily, and almost optimally, map a logic network to a technology library.
• This process involves three steps:
  • Map netlist and tech library to simple gates
    • Describe the netlist with only NAND2 and NOT gates
    • Describe SC library with NAND2 and NOT gates and associate a cost with each gate
  • Tree-ifying the input netlist
    • Tree covering can only be applied to trees!
    • Split the network at every node with fanout ≥ 2
  • Minimum Cost Tree matching
    • For each node in your tree, recursively find the minimum cost target pattern at that node.
• Let us briefly go through these steps

1. Simple Gate Mapping
• Apply De Morgan laws to your Boolean function to make it a collection of NAND2 and NOT gates.
• Let’s take the example of multi-level logic minimization:
    t1 = d + e;
    t2 = b + h;
    t3 = at2 + c;
    t4 = t1t3 + fgh;
    F = t4’;
• Rewritten with NAND2 and NOT only:
    $t_1 = d + e = \mathrm{NAND}(\bar{d}, \bar{e})$
    $t_2 = b + h = \mathrm{NAND}(\bar{b}, \bar{h})$
    $t_3 = a t_2 + c = \overline{\overline{a t_2} \cdot \bar{c}} = \mathrm{NAND}(\mathrm{NAND}(a, t_2), \bar{c})$
    $t_4 = t_1 t_3 + fgh = \mathrm{NAND}(\overline{t_1 t_3}, \overline{fgh})$
    $\overline{fgh} = \overline{fh \cdot g} = \mathrm{NAND}(\overline{\mathrm{NAND}(f, h)}, g)$
    $F = \overline{t_4}$
[Figure: the resulting NAND2/NOT network with inputs a, b, c, d, e, f, g, h, internal nodes t1, t2, t3 and fgh, and output F]
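
A quick exhaustive check can confirm that a NAND2/NOT rewrite of this kind preserves the function. The sketch below assumes the decomposition t1 = NAND(d', e'), t2 = NAND(b', h'), t3 = NAND(NAND(a, t2), c'), t4 = NAND(NAND(t1, t3), NAND(NAND(f, h)', g)) and F = t4'; the helper names `nand` and `inv` and both function names are illustrative.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def inv(x):
    return 1 - x

def original(a, b, c, d, e, f, g, h):
    """Multi-level network written with AND/OR/NOT."""
    t1 = d | e
    t2 = b | h
    t3 = (a & t2) | c
    t4 = (t1 & t3) | (f & g & h)
    return inv(t4)

def nand_not(a, b, c, d, e, f, g, h):
    """Same network using only NAND2 and NOT gates (De Morgan rewrite)."""
    t1 = nand(inv(d), inv(e))
    t2 = nand(inv(b), inv(h))
    t3 = nand(nand(a, t2), inv(c))
    fgh_n = nand(inv(nand(f, h)), g)      # complement of f*g*h
    t4 = nand(nand(t1, t3), fgh_n)
    return inv(t4)

vectors = list(product([0, 1], repeat=8))
mismatches = [v for v in vectors if original(*v) != nand_not(*v)]
print(len(vectors), len(mismatches))
```

With eight inputs the full comparison is only 256 evaluations, so brute force is a reasonable sanity check at this scale.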
1. Simple Gate Mapping
• And then, given a set of gates (standard cell library) with cost metrics (area/delay/power):
• We need to define the gates with the same NAND2/NOT set:
[Figure: each library gate drawn as an equivalent NAND2/NOT network and labeled with its cost: inv (1), nand2 (2), nand3 (3), nor2 (2), nor3 (3), aoi21 (3), oai22 (4), xor (5)]

2. Tree-ifying
• To apply a tree covering algorithm, we must work on a tree!
• Is any given logic network a tree?
  • No!
• We must break the network at any node with fanout ≥ 2
[Figure: the example network split at its multi-fanout nodes. We get 3 trees]
3. Minimum Tree Covering
• Now, we can apply a recursive algorithm to achieve a minimum cover:
  • Start at the output of the graph.
  • For each node, find all the matching target patterns.
  • The cost of node i for using gate g is:
      $\mathrm{cost}(i) = \min_{g} \left\{ \mathrm{cost}(g_i) + \sum_{k} \mathrm{cost}(k_i) \right\}$
  • where $k_i$ are the inputs to gate g.
• For simplicity, we will redraw our graph and show an example:
  • Every NOT is just an empty circle: I
  • Every NAND is just a full circle: N
  • Every input is just a box: A
[Figure: node i covered by gate g_i, with k1 and k2 as the k inputs to g_i]

3. Minimum Tree Covering - Example
• Library patterns and costs: NOT (2), NAND2 (3), AND2 (4), NOR2 (6), AOI21 (6)
• Subject tree (I = NOT, N = NAND2), from the output F down to the inputs:
    f = I(w), w = N(y, z), y = I(A), z = N(B, x), x = N(C, D)
• Costs per node:
    f: NOT    2 + min(w) = 2 + 11 = 13
       AND2   4 + min(y) + min(z) = 4 + 2 + 6 = 12
       AOI21  6 + min(x) = 6 + 3 = 9
    w: NAND2  3 + min(y) + min(z) = 3 + 2 + 6 = 11
    y: NOT    2
    z: NAND2  3 + min(x) = 3 + 3 = 6
    x: NAND2  3
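
The cost table above can be reproduced with a short recursive matcher. This is a sketch under stated assumptions: the subject tree is F = NOT(NAND2(NOT(A), NAND2(B, NAND2(C, D)))), the pattern shapes for AND2, NOR2 and AOI21 are their usual NAND2/NOT decompositions, and the names `LIBRARY`, `match` and `best` are hypothetical. Trees are nested tuples, pattern inputs are `None`, and NAND2 inputs are tried in both orders.

```python
from functools import lru_cache

# (name, cost, pattern); None marks a pattern input.
LIBRARY = [
    ("NOT",   2, ("NOT", None)),
    ("NAND2", 3, ("NAND2", None, None)),
    ("AND2",  4, ("NOT", ("NAND2", None, None))),
    ("NOR2",  6, ("NOT", ("NAND2", ("NOT", None), ("NOT", None)))),
    ("AOI21", 6, ("NOT", ("NAND2", ("NAND2", None, None), ("NOT", None)))),
]

def match(pat, node):
    """Yield tuples of subject subtrees bound to the pattern's inputs."""
    if pat is None:                      # pattern input matches anything
        yield (node,)
        return
    if isinstance(node, str) or node[0] != pat[0]:
        return                           # primary input or wrong gate type
    if pat[0] == "NOT":
        yield from match(pat[1], node[1])
    else:                                # NAND2 is commutative
        for a, b in ((node[1], node[2]), (node[2], node[1])):
            for left in match(pat[1], a):
                for right in match(pat[2], b):
                    yield left + right

@lru_cache(maxsize=None)
def best(node):
    """Return (minimum cost, gate chosen at this node)."""
    if isinstance(node, str):
        return 0, "input"
    options = []
    for name, cost, pat in LIBRARY:
        for inputs in match(pat, node):
            options.append((cost + sum(best(i)[0] for i in inputs), name))
    return min(options)

x = ("NAND2", "C", "D")
z = ("NAND2", "B", x)
y = ("NOT", "A")
w = ("NAND2", y, z)
f = ("NOT", w)
print(best(x), best(z), best(y), best(w), best(f))
```

Memoizing `best` means each subtree is costed once, which is what keeps tree covering linear in the size of the tree for a fixed library.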
Verilog for Synthesis - revisited

Some things we may have missed
• Now that we’ve seen how synthesis works, let’s revisit some of the things we may have skipped or only briefly mentioned earlier…
• Let’s take a simple 4-to-2 encoder as an example:
  • Take a one-hot encoded vector and output the position of the ‘1’ bit.
  • One possibility would be to describe this logic with a nested if-else block:
    always @(x)
    begin : encode
      if (x == 4'b0001) y = 2'b00;
      else if (x == 4'b0010) y = 2'b01;
      else if (x == 4'b0100) y = 2'b10;
      else if (x == 4'b1000) y = 2'b11;
      else y = 2'bxx;
    end
• The result is known as “priority logic”
  • i.e., some bits have priority over others…
Some things we may have missed
• It would have been better to use a case construct:
  • all cases are matched in parallel
  • and better yet, synthesis can optimize away the constants and other Boolean equalities:
    always @(x)
    begin : encode
      case (x)
        4'b0001: y = 2'b00;
        4'b0010: y = 2'b01;
        4'b0100: y = 2'b10;
        4'b1000: y = 2'b11;
        default: y = 2'bxx;
      endcase
    end

Some things we may have missed
• In the previous example, if the encoding was wrong (i.e., not one-hot), we would have propagated an x in the logic simulation.
• But what if we guarantee that the input was one-hot encoded?
• Then we could write our code differently…
    always @(x)
    begin : encode
      if (x[0]) y = 2'b00;
      else if (x[1]) y = 2'b01;
      else if (x[2]) y = 2'b10;
      else if (x[3]) y = 2'b11;
      else y = 2'bxx;
    end
• In fact, we have implemented a “priority encoder” (the least significant ‘1’ gets priority)

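
The difference between the two coding styles only shows up on inputs that are not one-hot. As a behavioral sketch (Python models with hypothetical names `encode_case` and `encode_priority`; `None` stands in for the `2'bxx` default), the two descriptions agree on every one-hot code and diverge elsewhere:

```python
def encode_case(x):
    """Model of the case-based encoder: only exact one-hot codes match."""
    table = {0b0001: 0, 0b0010: 1, 0b0100: 2, 0b1000: 3}
    return table.get(x)              # None models the 2'bxx default

def encode_priority(x):
    """Model of the if/else chain on x[0]..x[3]: least significant '1' wins."""
    for bit in range(4):
        if (x >> bit) & 1:
            return bit
    return None                      # no bit set, models 2'bxx

one_hot = [0b0001, 0b0010, 0b0100, 0b1000]
agree_on_one_hot = all(encode_case(x) == encode_priority(x) for x in one_hot)
differ = [x for x in range(16) if encode_case(x) != encode_priority(x)]

print(agree_on_one_hot)
print([format(x, "04b") for x in differ])
```

If the one-hot guarantee holds, the simpler bit tests cost less logic; if it does not hold, the priority version silently returns an answer where the case version would have flagged an x in simulation.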
A few points about operators
• Logical operators map into primitive logic gates
• Arithmetic operators map into adders, subtractors, …
  • Unsigned 2’s complement
  • Model carry: target is one bit wider than source
  • Watch out for *, %, and /
• Relational operators generate comparators
• Shifts by constant amount are just wire connections
  • No logic involved
• Variable shift amounts are a whole different story → shifter
• Conditional expression generates logic or MUX
[Figure: Y = ~X << 2 drawn as wiring, with X[3], X[2], X[1], X[0] connected to Y[5], Y[4], Y[3], Y[2]; Y[1] and Y[0] are the shifted-in bits]
Datapath Synthesis
• Complex operators (Adders, Multipliers, etc.) are implemented in a special way
• Pre-written descriptions can be found in Synopsys DesignWare or Cadence ChipWare IP libraries.

Clock Gating
• As you know, since a clock is continuously toggling, it is a major consumer of dynamic power.
• Therefore, in order to save power, we will try to turn off the clock for gates that are not in use.
• Block level (Global) clock-gating
  • If certain operating modes do not use an entire module/component, a clock gate should be defined in the RTL.
• Register level (Local) clock-gating
  • However, even at the register level, if a flip-flop doesn’t change its output, internal power is still dissipated due to the clock toggling.
  • This is very typical of an enabled signal sampling, and therefore can be automatically detected and gated by the synthesis tool.
[Figure “Global Clock Gating”: clk gated by the enables enF, enE and enM for the FSM, Execution Unit and Memory Control blocks]
[Figure “Local Clock Gating”: a flip-flop with enable (din, en, clk, dout) next to the same flip-flop driven through a gated clock]

Clock Gating
• Local clock gating: 3 methods
  • Logic synthesizer finds and implements local gating opportunities
  • RTL code explicitly specifies clock gating
  • Clock gating cell explicitly instantiated in RTL
• Global clock gating: 2 methods
  • RTL code explicitly specifies clock gating
  • Clock gating cell explicitly instantiated in RTL
• Conventional RTL Code
    //always clock the register
    always @ (posedge clk) begin
      if (enable) q <= din;
    end
• Low Power Clock Gated RTL
    //only clock the ff when enable is true
    assign gclk = enable && clk;
    always @ (posedge gclk) begin
      q <= din;
    end
• Instantiated Clock Gating Cell
    //instantiate a clock gating cell
    clkgx1 i1 (.en(enable), .cp(clk), .gclk_out(gclk));
    always @ (posedge gclk) begin
      q <= din;
    end
Clock Gating – Glitch Problem
• What happens if there is a glitch on the enable signal?
[Figure: waveforms of clk, en and gclk, annotated “Ah, we live in a perfect world!”, “Not so fast! What if the glitch happened during the high phase?” and “Maybe the world ain’t so perfect after all…”]

Solution: Glitch-free Clock Gate
• By latching the enable signal during the positive phase, we can eliminate glitches:
    //clock gating with glitch prevention latch
    always @ (enable or clk)
    begin
      if (!clk)
        en_out <= enable;   // latch is transparent only while clk is low
    end
    assign gclk = en_out && clk;
[Figure: waveforms of clk, en, en_out and gclk]
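
The effect of the latch can be seen in a tiny sample-by-sample model. This is a behavioral sketch only: the waveforms are made-up lists of 0/1 samples, and the names `naive_gate`, `latched_gate` and `rising_edges` are assumptions. The latch follows `en` while `clk` is low and holds its value while `clk` is high, mirroring the Verilog above.

```python
def naive_gate(clk, en):
    """gclk = en AND clk, evaluated sample by sample."""
    return [c & e for c, e in zip(clk, en)]

def latched_gate(clk, en):
    """Enable passes through a latch that is transparent while clk is low."""
    en_out, gclk = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            en_out = e               # transparent phase
        gclk.append(en_out & c)      # held value gates the clock
    return gclk

def rising_edges(wave):
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)

clk    = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
glitch = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # en pulses only while clk is high
steady = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # en rises while clk is low

print(rising_edges(naive_gate(clk, glitch)), rising_edges(latched_gate(clk, glitch)))
print(latched_gate(clk, steady))
```

With the plain AND gate the glitch produces a spurious clock edge; with the latch it is ignored, while a legitimate enable still yields one full clock pulse.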
Merging clock enable gates
• Clock gates with common enable can be merged
  • Lower clock tree power, fewer gates
  • May impact enable signal timing and skew.
[Figure: separate clock gates (E) driven by the same clk and enable, merged into a single clock gate]

Data Gating
• While clock gating is very well understood and automated, a similar situation occurs due to the toggling of data signals that are not used.
• These situations should be recognized and data gated.
• Without data gating:
    assign add_out = A+B;
    assign shift_out = A<<B;
    assign out = shift_add ? shift_out : add_out;
• With data gating on the shifter inputs:
    // bitwise AND with the replicated select; WIDTH is the operand width (assumed name).
    // A logical && would collapse A and B to a single bit.
    assign shift_in_A = A & {WIDTH{shift_add}};
    assign shift_in_B = B & {WIDTH{shift_add}};
    assign shift_out = shift_in_A << shift_in_B;
    assign out = shift_add ? shift_out : add_out;
Design and Verification – HDL Linting
• HDL Linting tools provide a quick easy check of likely coding inconsistencies:
  • Simulation problems
  • Synthesis Problems
  • Simulation Synthesis mismatches
  • Clock gating
  • Latch inference
  • Clock Domain Crossing issues
  • Nonsensical assignments / implicit bit widths issues
• Not for checking syntactic correctness
  • Use your simulator for that. (Will generally be more helpful)
• Alternatively some synthesis tools will give you basic lint warnings
  • For simulation-synthesis mismatch errors
• Simulation/Synthesis Mismatches
    always @(a)
      z = a & b;
• Latch Inference
    always @(a or b or c)
      if (c) z = a & b;
• Clock Gating
    assign clka = clk & cond;
    always @(posedge clka)
      z <= a & b;
Timing Optimization

How can we optimize timing?
• There are many ‘transforms’ that the synthesizer applies to the logic to improve the cost function:
  • Resize cells
  • Buffer or clone to reduce load on critical nets
  • Decompose large cells
  • Swap connections on commutative pins or among equivalent nets
  • Move critical signals forward
  • Pad early paths
  • Area recovery
• Simple example:
  • Double inverter removal transform:
  [Figure: a path with Delay = 4 and the same path with Delay = 2 once the double inverter is removed]
Resizing, Cloning and Buffering
• Resize a logic gate to better drive a load:
  [Figure: a gate with inputs a, b driving loads d, e and f; candidate gate sizes A, B and C; plot of delay versus load for each size]
• Or make a copy (clone of the gate) to distribute the load:
  [Figure: a gate driving five loads d, e, f, g, h of 0.2 each, duplicated so that each copy drives part of the fanout]
• Or just buffer the fanout net:
  [Figure: buffers B inserted on the fanout net so that the driving gate sees a smaller load]
Redesign Fan-In/Fan-out Trees
• Redesign Fan-In Tree
  [Figure: inputs with arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 combined by gates of delay 1; one tree gives Arr(e)=6, the restructured tree gives Arr(e)=5]
• Redesign Fan-Out Tree
  [Figure: two buffer trees with unit delays, one with Longest Path = 5 and one with Longest Path = 4; note: slowdown of buffer due to load]
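
The fan-in numbers in this slide can be reproduced with a few lines. The sketch below assumes 2-input gates with a delay of 1 and compares a balanced pairing of the inputs with a greedy restructuring that always combines the two earliest-arriving signals first; the function names are illustrative and the greedy rule is one common heuristic, not necessarily what a given synthesis tool does.

```python
import heapq

def balanced_arrival(arrivals, gate_delay=1):
    """Pair inputs in the given order, level by level (power-of-two count)."""
    level = list(arrivals)
    while len(level) > 1:
        level = [max(level[i], level[i + 1]) + gate_delay
                 for i in range(0, len(level), 2)]
    return level[0]

def greedy_arrival(arrivals, gate_delay=1):
    """Repeatedly combine the two earliest signals with a 2-input gate."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + gate_delay)
    return heap[0]

arr = [4, 3, 1, 0]                    # Arr(a), Arr(b), Arr(c), Arr(d)
print(balanced_arrival(arr), greedy_arrival(arr))
```

Late signals end up close to the output, so they pass through fewer gates, which is how the restructured tree gains one delay unit.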
Decomposition and Swapping
• Consider decomposing complex gates into less complex ones:
• Swap commutative pins:
  • Simple sorting on arrival times and delays can help
  [Figure: a gate with inputs a, b and c shown before and after swapping pins, annotated with input arrival times and pin delays]

Retiming
• Given the following network:
  [Figure: three flip-flops (FF) on a common clock with combinational blocks of delay 6, 4, 2, 4 and 4 between them; Cycle = 10]
• How would you meet the 10ns clock cycle time?
• Re-order sequential elements and combinational logic
  [Figure: the same blocks of delay 6, 4, 2, 4 and 4 with the flip-flops moved; Cycle = 10]
A last point: Topographical Synthesis
• Also called Physical Aware Synthesis

Main References
• Rob Rutenbar “From Logic to Layout”
• IDESA
• Rabaey, “Low Power Design Essentials”
• vlsicad.ucsd.edu ECE 260B – CSE 241A
• Roy Shor, BGU
• Synopsys slides
