Datapath Optimizations

Datapath Optimizations in Oasys
A general rule in Oasys is that datapaths are merged into one partition only if Carry-Save (CS)
edges can exist between them. These datapaths are generally non-lossy additions, or lossy
additions truncated to the same precision. Multipliers feeding adders are also marked CS if they
follow the non-lossy/lossy rule mentioned above.
Lossy to same precision example:

input [3:0] a, b, c;
output [3:0] out;
wire [3:0] temp = a+b;
assign out = temp+c;
=> out = a+b+c (a, b, c in one partition)
Here “out” is a lossy adder, but “out” and “temp” both have size 4, and “temp” is also a
lossy addition of size 4. Hence we merge all the inputs into one partition.
=> This will have to be taken care of when we introduce hierarchy/partitions for datapaths.
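The same-precision rule can be sanity-checked by brute force. The sketch below is a plain Python model, not Oasys code; the function names and the width parameters are made up for illustration. It truncates the intermediate sum and the output separately and compares the result with a single three-input adder truncated once.

```python
from itertools import product

def mask(width):
    """All-ones mask for a wire of the given width."""
    return (1 << width) - 1

def chained(a, b, c, temp_width, out_width):
    """temp = a+b truncated to temp_width; out = temp+c truncated to out_width."""
    temp = (a + b) & mask(temp_width)
    return (temp + c) & mask(out_width)

def merged(a, b, c, out_width):
    """One three-input adder, truncated once at the output."""
    return (a + b + c) & mask(out_width)

def merge_is_safe(in_width, temp_width, out_width):
    """True if chained and merged agree for every input combination."""
    values = range(1 << in_width)
    return all(chained(a, b, c, temp_width, out_width) == merged(a, b, c, out_width)
               for a, b, c in product(values, repeat=3))
```

merge_is_safe(4, 4, 4) models the example above (lossy to the same precision) and holds. merge_is_safe(4, 4, 5) models a 4-bit temp feeding a 5-bit out and fails: a=15, b=1, c=0 gives 0 chained versus 16 merged.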
Optimizations:
CSE opt:
RTL:
assign out1 = a+b+c;
assign out2 = a+b+d;
Synthesis:
assign temp = a+b;
assign out1 = temp+c;
assign out2 = temp+d;
Signal “temp” now has 2 fanouts, so the synthesis netlist will not have any Carry-Save
additions. Right after synthesis there will be 3 datapath partitions, one for each line.
If either out1 or out2 becomes timing-critical, we will undo the CSE and try to infer
Carry-Save additions.
Optimized netlist if, say, out1 is critical:

assign out1 = a+b+c; // implemented using carry-save additions, in one partition
=> This is already done by rtlc-FP. Have to check for other complex variations.
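To show what undoing the CSE buys, here is a minimal Python model of a carry-save (3:2 compressor) implementation of a+b+c. It is only an illustration; the function names are hypothetical and nothing here reflects how Oasys or rtlc builds the adder. The compressor produces a sum vector and a carry vector without propagating carries, so only one carry-propagate addition remains, instead of the two in the temp = a+b; out1 = temp+c shape.

```python
def compress_3_2(a, b, c):
    """3:2 compressor: three operands in, (sum, carry) vectors out, no carry propagation."""
    sum_vec = a ^ b ^ c                               # per-bit sum
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, weighted by 2
    return sum_vec, carry_vec

def add3_carry_save(a, b, c, width=4):
    """a+b+c using one compressor stage and a single carry-propagate add."""
    sum_vec, carry_vec = compress_3_2(a, b, c)
    return (sum_vec + carry_vec) & ((1 << width) - 1)

def add3_chained(a, b, c, width=4):
    """The post-CSE shape: temp = a+b, then temp+c (two carry-propagate adds)."""
    m = (1 << width) - 1
    temp = (a + b) & m
    return (temp + c) & m
```

Both functions return the same value for every 4-bit a, b, c; the difference is only in how many carry chains sit on the critical path.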
==============================================================================
Conditional Pruning opt:

Flavour 1: Constant Propagation to eliminate addition
RTL:
assign out = (sel == 4'b1010) ? (sel + 1'b1) : (a+b);
Netlist:
assign out = (sel == 4'b1010) ? (4'b1011) : (a+b);
=> Done by rtlc-FP
Flavour 2: Range propagation to reduce size of adder

RTL:
input [2:0] a;
input [15:0] sel;
wire [15:0] temp = sel+a;
assign out = (sel < 2'b11) ? temp : a;
Netlist:
wire [3:0] temp = sel+a; // sel can be at most 2'b10 when temp is selected, so temp <= 2+7 = 9
assign out = (sel < 2'b11) ? temp : a;
The “temp” adder is reduced from 16 bits to 4 bits.
=> Done by rtlc-precision but not by rtlc-FP.
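The width of 4 follows from the largest value temp can take while it is selected: sel is at most 2 and a is at most 7. A small Python model of this reasoning is sketched below; the helper names are hypothetical and the model only covers this one example, not the general range-propagation pass.

```python
def narrowed_width(sel_limit, a_width):
    """Bits needed for sel+a when the sum is only used while sel < sel_limit."""
    max_sum = (sel_limit - 1) + ((1 << a_width) - 1)
    return max_sum.bit_length()

def out_with_adder(sel, a, adder_width):
    """out = (sel < 3) ? temp : a, with temp = sel+a truncated to adder_width bits."""
    temp = (sel + a) & ((1 << adder_width) - 1)
    return temp if sel < 3 else a

def narrowing_is_safe():
    """Compare the 16-bit and 4-bit adders for every 16-bit sel and 3-bit a."""
    return all(out_with_adder(sel, a, 16) == out_with_adder(sel, a, 4)
               for sel in range(1 << 16) for a in range(1 << 3))
```

narrowed_width(3, 3) gives 4, matching the netlist. A 3-bit adder would not be enough: sel=2, a=7 needs the value 9.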

Optimization in precision done at Nomlevel(nomSweepAfterFlatten)
Done here: BreakCombinationalLoops
Next here: After_MuxDataPathOptForModgens
==============================================================================
Collapse adder Vector writes opt:
RTL:
module test(d,idx,q);
input [7:0] d; input [3:0] idx; output reg [31:0] q;
wire [4:0] t; assign t = idx+1;
always @(*) begin
q = 0;
{q[t+7], q[t+6], q[t+5], q[t+4], q[t+3], q[t+2], q[t+1], q[t]} = d;
end
endmodule
Netlist:
input [7:0] d; input [3:0] idx; output reg [31:0] q; wire [4:0] t;
assign t = idx+1;
always @(*) begin
q = 0;
q[t +: 8] = d;
end
=> Not performed in rtlc-FP & rtlc-precision.
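The equivalence of the eight bit-selects and the single indexed part-select can be checked with a small Python model. This is a sketch with made-up function names; q is modelled as a 32-bit integer that starts at zero, as in the always block.

```python
def write_bitwise(d, idx):
    """{q[t+7], ..., q[t]} = d on a zeroed 32-bit q, one bit-select at a time."""
    t = (idx + 1) & 0x1F          # wire [4:0] t = idx+1
    q = 0
    for i in range(8):
        bit = (d >> i) & 1        # d[i] lands in q[t+i]
        q |= bit << (t + i)
    return q & 0xFFFFFFFF

def write_slice(d, idx):
    """q[t +: 8] = d on a zeroed 32-bit q, as one indexed part-select."""
    t = (idx + 1) & 0x1F
    return ((d & 0xFF) << t) & 0xFFFFFFFF

def collapse_is_safe():
    """True if both forms agree for every 4-bit idx and 8-bit d."""
    return all(write_bitwise(d, idx) == write_slice(d, idx)
               for idx in range(16) for d in range(256))
```

With a 4-bit idx, t+7 is at most 23, so neither form writes outside q[31:0] in this testcase.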
Add Shift Opt:

RTL:
assign out = c << (a+1);
Netlist:
assign out = {c,1'b0} << a;
iff (a+1) is non-lossy
=> Not performed in rtlc-FP & rtlc-precision.
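The non-lossy condition matters because a truncated a+1 can wrap to a small shift amount. The Python sketch below makes that explicit; the function names and the amount_width, c_width and out_width parameters are assumptions added for illustration.

```python
def shift_original(c, a, amount_width, out_width):
    """out = c << (a+1), with the shift amount a+1 truncated to amount_width bits."""
    amount = (a + 1) & ((1 << amount_width) - 1)
    return (c << amount) & ((1 << out_width) - 1)

def shift_rewritten(c, a, out_width):
    """out = {c, 1'b0} << a: the +1 is folded into a fixed one-bit shift of c."""
    return ((c << 1) << a) & ((1 << out_width) - 1)

def rewrite_is_safe(a_width, amount_width, c_width=4, out_width=16):
    """True if both forms agree for every a and c of the given widths."""
    return all(shift_original(c, a, amount_width, out_width)
               == shift_rewritten(c, a, out_width)
               for a in range(1 << a_width) for c in range(1 << c_width))
```

For a 2-bit a, rewrite_is_safe(2, 3) holds because a+1 fits in 3 bits. rewrite_is_safe(2, 2) fails: a=3 wraps the amount to 0, so c=1 gives 1 originally but 16 after the rewrite.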
=============================================================
Cancellations opt:
RTL:
assign out = (a+b+c) == (a+d+e);
Netlist:
assign out = (b+c) == (d+e);
=> Disabled in rtlc-FP and enabled in rtlc-precision.
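A brute-force check of the cancellation is sketched below in Python. It assumes both sides of the comparison are evaluated at the same width, so that adding a is the same modular offset on each side; the function names and the width parameter are illustrative only. It says nothing about the case where the two sides are truncated to different widths.

```python
from itertools import product

def compare_original(a, b, c, d, e, width):
    """(a+b+c) == (a+d+e), both sides truncated to the same width."""
    m = (1 << width) - 1
    return ((a + b + c) & m) == ((a + d + e) & m)

def compare_cancelled(b, c, d, e, width):
    """(b+c) == (d+e) after cancelling the common term a."""
    m = (1 << width) - 1
    return ((b + c) & m) == ((d + e) & m)

def cancellation_is_safe(width=3):
    """True if both comparisons agree for every input combination."""
    values = range(1 << width)
    return all(compare_original(a, b, c, d, e, width)
               == compare_cancelled(b, c, d, e, width)
               for a, b, c, d, e in product(values, repeat=5))
```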
==============================================================================
Carry-Save Additions opt:
RTL: Simple additions
input [3:0] a,b,c;
output [3:0] out;
wire [3:0] temp;
assign temp = a+b+4;
assign out = temp+c+5;
Netlist:
assign out = a+b+c+9; // 4+5 = 9: separate constant terms are folded into one equivalent term
We will create one datapath partition that implements a ripple adder after synthesis
and a carry-save addition after timing optimization.
=> Not performed in rtlc-precision and rtlc-FP.

-----------------------------------------------------------------------------------------------------------
Carry-save marking across wire nodes:

RTL:
input signed [3:0] c;
input signed [2:0] b;
input signed [7:0] a;
output signed [7:0] out;
wire signed [6:0] mult; wire signed [7:0] temp;
assign mult = b*c;
assign temp = {mult[6],mult[6:0]};
assign out = a + temp;
Netlist:
assign out = $signed(b)*$signed(c) + $signed(a); // both the mult and the addition get merged into one partition
=> Datapath partition problem.
For this testcase you will see just one datapath partition with inputs a, b and c.
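The merge across the wire node is legal here because the 7-bit product never overflows and the concatenation is a plain sign extension. A Python model of the testcase is sketched below (helper names are hypothetical; signed values are modelled as Python integers wrapped to two's complement).

```python
def wrap_signed(value, width):
    """Truncate to width bits and reinterpret as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value

def through_wire(a, b, c):
    """mult = b*c (7 bits); temp = {mult[6], mult[6:0]}; out = a + temp (8 bits)."""
    mult_bits = (b * c) & 0x7F                        # wire signed [6:0] mult
    temp_bits = ((mult_bits >> 6) << 7) | mult_bits   # replicate the sign bit on top
    return wrap_signed(a + wrap_signed(temp_bits, 8), 8)

def merged(a, b, c):
    """out = b*c + a in one partition, truncated once to 8 bits."""
    return wrap_signed(a + b * c, 8)

def merge_is_safe():
    """Check every signed 8-bit a, 3-bit b and 4-bit c."""
    return all(through_wire(a, b, c) == merged(a, b, c)
               for a in range(-128, 128)
               for b in range(-4, 4)
               for c in range(-8, 8))
```

The product of a signed 3-bit and a signed 4-bit value lies in [-28, 32], which fits in 7 signed bits, so nothing is lost before the final 8-bit truncation.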
-----------------------------------------------------------------------------------------------------------
RTL:
input [4:0] a,b,c;
output [6:0] out;
assign out = {(a+b),3'b000}+c;
Netlist:
assign out = {a,3'b000}+{b,3'b000}+c;
But if (a+b) already has 2 fanouts, we won't do this optimization in the synthesize step, as it
would blow up area. However, if (a+b) is de-cloned during timing and has one fanout, we will do
this CS optimization in the timing flow.
=> Does not seem to happen in Oasys for one case. Not happening in RTLC either.
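Whether the shift can be distributed over a+b depends on the output width. The Python sketch below assumes a+b inside the concatenation is evaluated at its own 5-bit width (so its carry is dropped) and then checks the rewrite for different out widths; the function names and the out_width parameter are illustrative.

```python
from itertools import product

def original(a, b, c, out_width=7):
    """out = {(a+b), 3'b000} + c, with a+b kept at its own 5-bit width."""
    partial = (a + b) & 0x1F                 # assumed 5-bit sum, carry dropped
    return ((partial << 3) + c) & ((1 << out_width) - 1)

def distributed(a, b, c, out_width=7):
    """out = {a, 3'b000} + {b, 3'b000} + c."""
    return ((a << 3) + (b << 3) + c) & ((1 << out_width) - 1)

def distribution_is_safe(out_width=7):
    """True if both forms agree for every 5-bit a, b, c."""
    values = range(1 << 5)
    return all(original(a, b, c, out_width) == distributed(a, b, c, out_width)
               for a, b, c in product(values, repeat=3))
```

With the 7-bit out of the testcase the two forms agree, because the dropped carry of a+b would land at bit 8, above out[6:0]. With a 9-bit out they differ (a=b=c=31 is one counterexample), which is the lossy-to-different-precision case.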
-----------------------------------------------------------------------------------------------------------
Carry-save marking across inequalities:

RTL:
assign out = ((d1+d2) > (d3+d4)) ? a : b;
Netlist:
assign temp = (d1+d2) - (d3+d4); // non-lossy signed difference

assign out = (temp > 0) ? a : b;
=> Not done in rtlc-FP/precision. We always create a LESS_THAN macro.
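For the subtraction form to match the comparison, the select has to depend on the sign of the difference, not merely on the difference being non-zero, and the difference has to be wide enough to be non-lossy. The Python sketch below models this under the assumption of unsigned inputs of equal width; the function names and the in_width parameter are made up for illustration.

```python
from itertools import product

def greater_direct(d1, d2, d3, d4):
    """(d1+d2) > (d3+d4) with non-lossy sums."""
    return (d1 + d2) > (d3 + d4)

def greater_by_subtraction(d1, d2, d3, d4, in_width=4):
    """Same comparison read off temp = (d1+d2) - (d3+d4) in two's complement."""
    diff_width = in_width + 2                      # one bit of sum growth, one sign bit
    temp = ((d1 + d2) - (d3 + d4)) & ((1 << diff_width) - 1)
    negative = (temp >> (diff_width - 1)) & 1      # sign bit of the difference
    return negative == 0 and temp != 0             # strictly positive

def rewrite_is_safe(in_width=4):
    """True if both forms agree for every input combination."""
    values = range(1 << in_width)
    return all(greater_direct(*d) == greater_by_subtraction(*d, in_width=in_width)
               for d in product(values, repeat=4))
```

For example, d1=d2=0, d3=1, d4=0 gives a non-zero but negative temp, so the comparison must be false.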

=====================================================================================
Resource Sharing opt:

RTL:
assign out = sel ? (a+b) : (a+c);
Netlist:
assign out = a + (sel ? b : c);
=> Not done in rtlc-FP. Performed in rtlc-precision.
