Professional Documents
Culture Documents
5 December 2016
Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
What is Logic Synthesis? module counter(
input clk, rstn, load,
input [1:0] in,
• Synthesis is the process that converts RTL into a technology- output reg [1:0] out);
always @(posedge clk)
specific gate-level netlist, optimized for a set of pre-defined if (!rstn) out <= 2'b0;
else if (load) out <= in;
constraints. else out <= out + 1;
• You start with: endmodule
Synthesis
Standard Cell Library
• A standard cell library
• A set of design constraints Design Constraints
• You finish with: module counter ( clk, rstn, load, in, out );
• A gate-level netlist, mapped to the input [1:0] in;
output [1:0] out;
standard cell library input clk, rstn, load;
wire N6, N7, n5, n6, n7, n8;
• (For FPGAs: LUTs, flip-flops, and RAM blocks) HDDFFPQ1 out_reg_1 (.D(N7),.CK(clk),.Q(out[1]));
3
Motivation
• Why perform logic synthesis?
• Automatically manages many details of the design process:
• Fewer bugs
• Improves productivity
• Abstracts the design data (HDL description) from any particular implementation technology
• Designs can be re-synthesized targeting different chip technologies;
• E.g.: first implement in FPGA then later in ASIC
• In some cases, leads to a more optimal design than could be achieved by
manual means (e.g.: logic optimization)
• Why not logic synthesis?
• May lead to less than optimal designs in some cases
4
Simple Example
module foo (a,b,s0,s1,f);
input [3:0] a;
input [3:0] b;
input s0,s1;
output [3:0] f;
reg f;
always @ (a or b or s0 or s1)
if (!s0 && s1 || s0)
f=a;
else
f=b;
endmodule
5
Goals of Logic Synthesis
• Minimize area
• in terms of literal count, cell count, register count, etc.
• Minimize power
• in terms of switching activity in individual gates,
deactivated circuit blocks, etc.
• Maximize performance
• in terms of maximal clock frequency of synchronous
systems, throughput for asynchronous systems
• Any combination of the above
• combined with different weights
• formulated as a constraint problem
• “minimize area for a clock speed > 300MHz”
• More global objectives
• feedback from layout
• actual physical sizes, delays, placement and routing
6
How does it work?
Variety of general and ad-hoc (special case) methods:
• Instantiation:
• maintains a library of primitive modules (AND, OR, etc.) and user defined modules
• “Macro expansion”/substitution:
• a large set of language operators (+, -, Boolean operators, etc.)
and constructs (if-else, case) expand into special circuits
• Inference:
• special patterns are detected in the language description and treated specially
(e.g.,: inferring memory blocks from variable declaration and read/write statements,
FSM detection and generation from “always @ (posedge clk)” blocks)
• Logic optimization:
• Boolean operations are grouped and optimized with logic minimization techniques
• Structural reorganization:
• advanced techniques including sharing of operators,
and retiming of circuits (moving FFs), and others
7
Syntax Analysis
Basic Synthesis Flow
Library Definition
• Syntax Analysis:
• Read in HDL files and check for syntax errors. Elaboration and
read_hdl –verilog sourceCode/toplevel.v Binding
• Pre-mapping Optimization:
• Reduce Boolean literals.
8
Syntax Analysis
Basic Synthesis Flow
Library Definition
• Constraint Definition:
• Define clock frequency and other design constraints. Elaboration and
read_sdc sdc/constraints.sdc Binding
Compilation
…but aren’t we talking about synthesis?
Syntax
Analysis
• Compiler Post-mapping
Optimization
• Recognizes all possible constructs in a formally defined program language Report and
of execution process
• Synthesis
• Recognizes a target dependent subset of a hardware
description language
• Maps to collection of concrete hardware resources
• Iterative tool in the design flow
11
Syntax
Analysis
1 3 4 6 7 Library
2 5 8 Definition
Intro Lib. Boolean Tech. Verilog
Compile SDC Opt. Elaboration
Def. Min Mapping for Synt.
and Binding
Pre-mapping
Optimization
Constraint
Definition
Technology
Mapping
Post-mapping
Optimization
Report and
export
Library Definition
12
Syntax
Analysis
• The library definition stage is tells the synthesizer where to look for
Constraint
Definition
Technology
leaf cells for binding and the target library for technology mapping. Mapping
Boolean Minimzation
Mapping to Generics and Libs, Basics of Boolean Minimization
(BDDs, Two-Level Logic, Espresso)
14
Syntax
Analysis
15
Syntax
Analysis
B
Two-Level Logic C
Library
Definition
Elaboration
B and Binding
• During elaboration, primary inputs and outputs (ports) are D Pre-mapping
Optimization
defined and sequential elements (flip-flops, latches) are A
Constraint
C
inferred. D F Definition
• Input ports and register outputs are inputs to the logic C Post-mapping
Optimization
• Output ports and register inputs are the outputs of the logic D
Report and
• The outputs can be described as Boolean functions of the C export
inputs. D
• The goal of Boolean minimization is to reduce the number of
literals in the output functions. f = x1x2
A
• Many different data structures are used to represent the B F
D
Boolean functions:
• Truth tables, cubes, Binary Decision Diagrams, A x3
equations, etc. B x2
• A lot of the research was developed upon SOP or POS C
x1
representation, which is better known as “Two-Level Logic”
16
Syntax
Analysis
• However… 01 0 1 1 1
D
Post-mapping
Optimization
• Difficult to automate (NP-complete) 11 0 X X 0 Report and
export
• Number of cells is exponential (<6 variables)
C
10 0 1 0 1
18
Syntax
f AC CD AC CD Analysis
A Constraint
A
AB AB Definition
CD 00 01 11 10 CD 00 01 11 10 Technology
Mapping
00 1 1 0 0 00 1 1 0 0
Post-mapping
Optimization
01 1 1 1 1 01 1 1 1 1 Report and
D export
D
11 0 0 1 1 11 0 0 1 1
C C ESPRESSO(F) {
10 1 1 1 1 10 1 1 1 1 do {
reduce(F);
expand(F);
B B irredundant(F);
Result of REDUCE: } while (F smaller);
Initial Set of Primes found by verify(F);
Steps1 and 2 of the Espresso Shrink primes while still }
Method covering the ON-set
A A Constraint
AB AB Definition
CD 00 01 11 10 CD 00 01 11 10 Technology
Mapping
00 1 1 0 0 00 1 1 0 0
Post-mapping
Optimization
01 1 1 1 1 01 1 1 1 1 Report and
D D export
11 0 0 1 1 11 0 0 1 1
C C ESPRESSO(F) {
10 1 1 1 1 10 1 1 1 1 do {
reduce(F);
expand(F);
B B irredundant(F);
} while (F smaller);
Second EXPAND generates a IRREDUNDANT COVER found by verify(F);
different set of prime implicants final step of espresso }
20
Syntax
Analysis
21
Syntax
Analysis
13
• Multi-level Logic Minimization can result in: Literals
t1 = d + e;
t2 = b + h; d+e t1t3 + fgh t4’ F
t3 = at2 + c;
t4 = t1t3 + fgh;
b+h at2 +c
F = t4’;
22
Syntax
Analysis
• BDDs are DAGs that represent the truth table of a given function Pre-mapping
Optimization
Constraint
Root node Definition
x1 x2 x3 f(x1x2x3) Technology
f(x1,x2,x3) Mapping
0 0 0 1 x1
Post-mapping
0 1 Optimization
0 0 1 1
~(x2x3) Report and
x2 x2 x2 ~x3 export
0 1 0 1
0 1 0 1
0 1 1 0
1 0 0 0 ~x3
x3 x3 ~x3 x3 x3
1 0 1 0
0 1 0 1 0 1 0 1
1 1 0 1
1 1 1 0 0 0 1 0
24
Syntax
Analysis
1 1 1 0 0 1 0 1 0
f(x1, x2, x3) = ~x1~x2~x3 + ~x1~x2x3 + ~x1x2~x3 + x1x2~x3 = ~x1~x2 + ~x1x2~x3 + x1x2~x3
25
Syntax
Analysis
x1
f(x1,x2,x3) f(x1,x2,x3) x1
Optimization
x x x x Report and
export
~(x2x3) x2 x2 ~x3 x2
~(x2x3) x2 x2 ~x3 x2
y z y z
1 0 1 0
26
Syntax
Analysis
x3 x3 ~x3 x3 x3
f(x1, x2, x3) = ~x1~x2 + ~x3
~x1x2~x3 + x1x2~x3
1 0 1 0
27
Syntax
Analysis
• Given a BDD for a function f, the BDD for f’ can be c+bd b Post-mapping
Optimization
obtained by interchanging the terminal nodes. b b
• Equivalence check.
Report and
c export
c+d
• Two functions f and g are equivalent if their BDDs (under c c
the same variable ordering) are the same.
• An Important Point: d
• The size of a BDD can vary drastically if the order d
in which the variables are expanded is changed.
• The number of nodes in the BDD can be
0 1
exponential in the number of variables in the
worst case, even after reduction
28
Syntax
Analysis
1 3 4 6 7 Library
2 5 8 Definition
Intro Lib. Boolean Tech. Verilog
Compile SDC Opt. Elaboration
Def. Min Mapping for Synt.
and Binding
Pre-mapping
Optimization
Constraint
Definition
Technology
Mapping
Post-mapping
Optimization
Report and
export
Constraint Definition
29
Syntax
Analysis
30
Syntax
Analysis
1 3 4 6 7 Library
2 5 8 Definition
Intro Lib. Boolean Tech. Verilog
Compile SDC Opt. Elaboration
Def. Min Mapping for Synt.
and Binding
Pre-mapping
Optimization
Constraint
Definition
Technology
Mapping
Post-mapping
Optimization
Report and
export
Technology Mapping
31
Syntax
Analysis
• Technology mapping is the phase of logic synthesis when gates are Pre-mapping
Optimization
• For example, F=abcdef as a 6-input AND gate causes a long delay. Report and
export
• Gates in the library are pre-designed, they are usually optimized in
terms of area, delay, power, etc.
• Fastest gates along the critical path, area-efficient gates (combination)
off the critical path.
• Can apply a minimum cost tree-covering algorithm to solve this
problem.
32
Syntax
Analysis
• Describe SC library with NAND2 and NOT gates and associate a cost with each gate Report and
export
• Tree-ifying the input netlist
• Tree covering can only be applied to trees!
• Split tree at all places, where fanout > 2
• Minimum Cost Tree matching
• For each node in your tree, recursively find the minimum cost target pattern at that
node.
• Let us briefly go through these steps
33
Syntax
Analysis
34 F t4
Syntax
Analysis
xor (5)
nor3 (3)
nor2(2)
35
Syntax
Analysis
2. Tree-ifying Library
Definition
Elaboration
and Binding
We get 3 trees
36
Syntax
Analysis
cost i min k cost gi k cost ki k1 k2
Optimization
Report and
• where ki are the inputs to gate g. export
37
Syntax
Analysis
NAND2
A B N x
B I I
A 6
6
I N N
4
C D I N N I I I I
2 3
CD
NOT NAND2 AND2 NOR2 AOI21
38
1 3 4 6 7
2 5 8
Intro Lib. Boolean Tech. Verilog
Compile SDC Opt.
Def. Min Mapping for Synt.
39
Some things we may have missed
• Now that we’ve seen how synthesis works, let’s revisit some of
the things we may have skipped or only briefly mentioned earlier…
• Let’s take a simple 42 encoder as an example:
• Take a one-hot encoded vector and output the position of the ‘1’ bit.
• One possibility would be to describe this logic with a nested if-else block:
always @(x)
begin : encode
if (x == 4'b0001) y = 2'b00;
else if (x == 4'b0010) y = 2'b01;
else if (x == 4'b0100) y = 2'b10;
else if (x == 4'b1000) y = 2'b11;
else y = 2'bxx;
end
41
Some things we may have missed
• In the previous example, if the encoding was wrong (i.e., not one-hot), we would
have propagated an x in the logic simulation.
• But what if we guarantee that the input was one hot encoded?
• Then we could write our code differently…
always @(x)
begin : encode
if (x[0]) y = 2'b00;
else if (x[1]) y = 2'b01;
else if (x[2]) y = 2'b10;
else if (x[3]) y = 2'b11;
else y = 2'bxx;
end
42
A few points about operators
• Logical operators map into primitive logic gates Y = ~X << 2
Y[1]
• Shifts by constant amount are just wire connections
• No logic involved
Y[0]
• Variable shift amounts a whole different story shifter
• Conditional expression generates logic or MUX
43
Datapath Synthesis
• Complex operators (Adders, Multipliers, etc.) are implemented in a special way
44
Global Clock Gating
Clock Gating enF FSM
45
Clock Gating
• Local clock gating: 3 methods • Conventional RTL Code
• Logic synthesizer finds and //always clock the register
always @ (posedge clk) begin
implements local gating if (enable) q <= din;
opportunities end
• RTL code explicitly specifies
clock gating • Low Power Clock Gated RTL
• Clock gating cell explicitly • ><
//only clock the ff when enable is true
assign gclk = enable && clk;
instantiated in RTL always @ (posedge gclk) begin
en
47
Solution: Glitch-free Clock Gate
• By latching the enable signal during the
positive phase, we can eliminate glitches:
clk
en
//clock gating with glitch prevention latch
always @ (enable or clk)
begin
if (!clk)
en_out en_out <= enable
end
assign gclk = en_out && clk;
gclk
48
Merging clock enable gates
• Clock gates with common enable can be merged
• Lower clock tree power, fewer gates
• May impact enable signal timing and skew.
clk
E clk
en E
enable E
49
Data Gating
• While clock gating is very well understood and automated, a similar situation
occurs due to the toggling of data signals that are not used.
• These situations should be
recognized and data gated.
Timing Optimization
52
How can we optimize timing?
• There are many ‘transforms’ that the synthesizer applies to the logic to
improve the cost function:
• Resize cells
• Buffer or clone to reduce load on
critical nets
• Decompose large cells Delay = 4
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery
• Simple example: Delay = 2
• Double inverter removal transform:
53
0.05
Resizing, Cloning and Buffering 0.04
0.03
d
• Resize a logic gate to better drive a load:
0.02
0.01
d 0.2
a a a 0
? e 0.2 A C 0 0.2 0.4 0.6 0.8 1
b b b 0.026
f 0.035 load
0.3
1 1
1 2
1 1
Longest Path = 5
55
Decomposition and Swapping
• Consider decomposing complex gates into less complex ones:
56
Retiming
FF FF FF
• Given the following network:
D Q D Q D Q
6 4 2 4 4
clock Cycle = 10
D Q D Q D Q
6 4 2 4 4
clock Cycle = 10
57
A last point: Topographical Synthesis
• Also called Physical Aware Synthesis
58
Main References
• Rob Rutenbar “From Logic to Layout”
• IDESA
• Rabaey, “Low Power Design Essentials”
• vlsicad.ucsd.edu ECE 260B – CSE 241A
• Roy Shor, BGU
• Synopsys slides
59