ABSTRACT
A probabilistic power estimation technique for combinational circuits is presented. A novel set of simple waveforms is the kernel of this technique. The transition density of each circuit node is estimated. Existing methods use local glitch filtering approaches that fail to model this phenomenon correctly: glitches originating from a node may be filtered in some, but not necessarily all, of its successor nodes. Our waveform set approach allows us to utilize a global glitch filtering technique that can model the removal of glitches in more detail. It produces error-free estimates for tree-structured circuits. For other circuits, experimental results using the ISCAS'85 benchmarks show that the waveform set method generally provides significantly better estimates of the transition density compared to previous techniques.

Categories and Subject Descriptors
B.6.3 [LOGIC DESIGN]: Design Aids - Switching theory; B.6.1 [LOGIC DESIGN]: Design Styles - Combinational logic

General Terms
Algorithms

Keywords
probabilistic power estimation, gate-level, probability waveform, combinational logic, transition density

1. INTRODUCTION
The demand for fast and accurate power estimation at early design stages is steadily increasing as energy consumption is becoming a more important design factor. The total power consumption in CMOS circuits results from a combination of dynamic and static sources. In this paper we focus on the dynamic part, caused by charging and discharging of the load capacitances at the outputs of the nodes, i.e., logic gates, in the circuit. Several power estimation techniques have been proposed [1], [2], [3], [4], [5], [6], [7], [8], [9] and [10].

[Figure 1: Combinational logic with synchronous primary inputs as the target architecture of the power estimator. Block diagram: input registers, combinational logic and output registers driven by a common clock CLK.]

The average dynamic power consumed in a combinational circuit is given by [11]:

$$P_{av} = \frac{1}{2} V_{DD}^{2} \sum_{i=1}^{N} C_i D_i \qquad (1)$$

where $V_{DD}$ is the supply voltage, $N$ is the total number of nodes, $C_i$ is the total load capacitance at node $i$ and $D_i$ is the transition density at node $i$, defined as [12]:

$$D_i = \lim_{T \to \infty} \frac{n_i(T)}{T} \qquad (2)$$

Here $n_i(T)$ is the number of transitions at node $i$ in a time interval of length $T$. Blocks of combinational logic within a synchronous system are very common in digital circuits (Fig. 1). The transitions on the primary combinational inputs then typically appear on the clock edge, and the total delay of the combinational circuit is less than the clock period. With these assumptions the transition density $D_i$ in a purely combinational circuit can be rewritten as:

$$D_i = f_{clk}\, n_i \qquad (3)$$

where $n_i$ is the average number of transitions at node $i$ in one clock cycle and $f_{clk}$ is the clock frequency.

Deterministic delay assignments for the gates lead to a finite set of transition times for each node. Assuming a given transition probability at the primary inputs, we get a set $S_i$ of transition times with their probabilities at each node $i$. $D_i$ can then be specified as follows:

$$D_i = f_{clk} \sum_{j \in S_i} p_j \qquad (4)$$
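As a rough illustration of how Eq. 1 and Eq. 4 combine, the following Python sketch computes the transition density of each node from a list of transition probabilities and then sums the per-node contributions to the average dynamic power. The function names and all numeric values (clock frequency, capacitances, probabilities, supply voltage) are hypothetical and chosen only for the example; they are not taken from the paper.

```python
from typing import Sequence


def transition_density(f_clk: float, transition_probs: Sequence[float]) -> float:
    """Eq. 4: D_i = f_clk * sum of p_j over the transition times in S_i."""
    return f_clk * sum(transition_probs)


def average_dynamic_power(v_dd: float,
                          caps: Sequence[float],
                          densities: Sequence[float]) -> float:
    """Eq. 1: P_av = 0.5 * V_DD^2 * sum_i C_i * D_i."""
    if len(caps) != len(densities):
        raise ValueError("one capacitance and one density per node are required")
    return 0.5 * v_dd ** 2 * sum(c * d for c, d in zip(caps, densities))


# Hypothetical two-node circuit (illustrative values only).
f_clk = 100e6                        # clock frequency in Hz
s_sets = [[0.25, 0.25], [0.5]]       # transition probabilities p_j per node
caps = [10e-15, 20e-15]              # load capacitances in farads
v_dd = 1.0                           # supply voltage in volts

densities = [transition_density(f_clk, s) for s in s_sets]
p_av = average_dynamic_power(v_dd, caps, densities)
print(densities, p_av)
```

Keeping the two steps separate mirrors the structure of the equations: the waveform analysis only has to deliver the probabilities $p_j$, while capacitances and supply voltage enter in the final sum.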
simulation (TPS) [4] presents an estimation technique based on the notion of tagged waveforms, which models the set of all possible events at the output of each circuit node. The origin of the estimation errors in the probabilistic power

[Figure: example waveforms Ψ(1, 2) and Ψ(3, 0), together with two waveforms whose rise and fall times lie at infinity.]
[Figure 4: An example of non-simple waveform generation and its decomposition into simple waveforms. The figure shows the MUX circuit (nodes a to g), the waveforms at nodes f and g, and the decomposition of the waveform at g.]

3. COMPUTATION OF SIMPLE WAVEFORM SETS
For a given node in a combinational circuit, the output SWS is computed using its input SWSs as shown in Algorithm I in Figure 5. The computation for a node can commence once all of its input SWSs have been computed. The list of available nodes is initially set to the primary inputs. After completion of the computations for each node, the list of available nodes is updated by adding the new nodes that can be computed and removing the computed node. After computation of the SWS of a given node $i$, its transition density is calculated. When SWSs are computed for all nodes having node $i$ as input, the SWS for node $i$ can be removed from memory to reduce the estimator's run-time memory requirement.

The waveform $W = \Psi(t_r, t_f)$ resulting from applying $W_1$ and $W_2$ to a logic gate $f$ has the occurrence probability $p(W_1)p(W_2)$, where $p(W_1)$ and $p(W_2)$ are the occurrence probabilities of $W_1$ and $W_2$, respectively. This assumption is correct if $W_1$ and $W_2$ are independent. The inertial and transport delays of the logic gate $f$, denoted as $d_{fi}$ and $d_{ft}$ respectively, are taken into account by adding the delay values $d_{fi}$ and $d_{ft}$ to the timestamps of the resulting waveform's rise and fall edges, i.e. $t_r$ and $t_f$. For a non-simple output waveform, if the glitch length $|t_r - t_f|$ is smaller than the inertial delay $d_{fi}$, it will be replaced by constant zero or one depending on the waveform value at $+\infty$.

Although our proposed technique is not limited to a particular delay model, we will utilize a fanout delay model. The delays of the gates are assumed to be proportional to their fanout number. The Simplify procedure in Algorithm I locates waveforms with equal rise and fall times within the given SWS and replaces them with a waveform with the same rise and fall times and an oc[...]

function ComputeSWS(S1, S2)
    OutputSWS = ∅;
    for all W1 ∈ S1
        for all W2 ∈ S2
            Compute W = f(W1, W2);
            if W is simple then
                Add W to OutputSWS;
            else
                [W1, W2, W3] = DECOMP(W);
                Add W1, W2 and W3 to OutputSWS;
    simplify OutputSWS;
    return OutputSWS;

Figure 5: Algorithm I

4. DEALING WITH INTERDEPENDENCIES
The problem of interdependencies between different nodes arises in the presence of reconvergent fanouts in a combinational logic circuit. For circuits with reconvergent fanout paths, the exact computation of transition probabilities is a problem of NP complexity. Several methods try to overcome this problem. [13] uses the supergate concept introduced in [14] and tries to estimate spatial dependencies, while [4] uses macroscopic correlations. Correlation coefficients can also be approximated using local ordered binary decision diagrams (LOBDDs) [15], [16]. [17] and [10] use Markov chain models to capture correlations. [7] employs Bayesian networks to simplify the heavy global computation needed to overcome the problem of reconvergent fanouts. The computations are localized to small units, referred to as cliques, assuming a zero-delay model.

We adopt the concept of macroscopic correlation. Macroscopic correlations are correlations between the different nodes for fixed input values and a zero-delay model. Since the values of nodes are assumed to be settled and delay independent at times $+\infty$ and $-\infty$, we use these values for capturing macroscopic correlations. These macroscopic correlations are encapsulated by correlation coefficients, which are defined as:

$$\kappa^{a,b}_{A,B} = \frac{p(A = a \wedge B = b)}{p(A = a)\, p(B = b)} \qquad (7)$$

where $A$ and $B$ are two logic nodes in the circuit and $a$ and $b$ are two logic values ($a, b \in \{0, 1\}$). We can rewrite the correlation coefficients $\kappa^{0,0}_{A,B}$, $\kappa^{1,0}_{A,B}$ and $\kappa^{0,1}_{A,B}$ as functions of $\kappa_{A,B} = \kappa^{1,1}_{A,B}$:

$$\kappa^{0,0}_{A,B} = 1 + \frac{p_A p_B (\kappa_{A,B} - 1)}{(1 - p_A)(1 - p_B)}, \qquad \kappa^{1,0}_{A,B} = \frac{1 - p_B \kappa_{A,B}}{1 - p_B}, \qquad \kappa^{0,1}_{A,B} = \frac{1 - p_A \kappa_{A,B}}{1 - p_A} \qquad (8)$$

where $p_A$ and $p_B$ are the one-probabilities of nodes $A$ and $B$, respectively. These correlation coefficients can be computed under the zero-delay model. In [17], Marculescu et al. give a technique for computation of temporal and spatial correlations. This is, however, computationally complex. A method for computing pairwise correlation coefficients recursively is presented by Ercolani et al. in [18]. This method approximates joint probabilities using only pairwise correlation coefficients of all involved signals and ignores higher orders of correlations. However, signal probabilities computed using this method introduce errors into the estimates because they deviate from the exact signal probabilities. Another source of error is the use of macrocorrelations instead of microcorrelations and the neglect of temporal correlations. Microcorrelation provides the accurate correlation of two signals at a particular time with real delay models. We will discuss these errors in Section 6.

In our power estimation method, the correlation coefficients are multiplied with the occurrence probabilities. Assume that $W_A$ and $W_B$ are two waveforms at inputs $A$ and $B$ of gate $f$. The occurrence probability for the waveform $W$ at the output of $f$ is $\kappa\, p(W_A) p(W_B)$, where

$$\kappa = \kappa^{W_A|_{+\infty},\, W_B|_{+\infty}}_{A,B} \cdot \kappa^{W_A|_{-\infty},\, W_B|_{-\infty}}_{A,B} \qquad (9)$$

As an example let us consider the multiplexer circuit in Figure 3 again. Waveforms $W_1 = \Psi(1, +\infty)$ and $W_2 = \Psi(+\infty, 2)$ are applied to inputs $e$ and $f$, respectively. A one-probability assignment of 0.5 at the primary inputs gives an occurrence probability of $\frac{3}{16}$ for each of these waveforms. The reconverging fanout from node $b$ to node $g$ means that the waveforms at the inputs of node $g$ are not independent. Using Ercolani's method and Eq. 8 gives the following correlation coefficients:

$$\kappa^{1,1}_{e,f} = \frac{8}{9}, \qquad \kappa^{1,0}_{e,f} = \kappa^{0,1}_{e,f} = \frac{4}{3} \qquad \text{and} \qquad \kappa^{0,0}_{e,f} = 0 \qquad (10)$$

function ComputeSWS(S1, S2)
    OutputSWS = ∅;
    i = Current Node ID;
    for all W1 ∈ S1
        for all W2 ∈ S2
            if (M_i^W1 and M_i^W2)
                Compute W = f(W1, W2);
                if W is simple then
                    M^W = M^W1 and M^W2;
                    Add W to OutputSWS;
                else
                    Compute newM^W;
                    [W1, W2, W3, W4, W5] = DECOMP2(W);
                    M^W1..4 = M^W1 and M^W2 and newM^W;
                    M^W5 = M^W1 and M^W2;
                    Add W1..5 to OutputSWS;
    simplify OutputSWS;
    return OutputSWS;

Figure 6: Algorithm II for computing SWSs with glitch filtering

[Figure 7: example circuit with nodes a to i; waveform labels Ψ(0, +∞) and Ψ(4, +∞).]
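The relations in Eq. 8 and Eq. 9 can be checked numerically against the values quoted in Eq. 10. The Python sketch below is only an illustration: the function names are hypothetical, and the one-probabilities $p_e = p_f = 3/4$ are an assumption (they are not stated explicitly in the text, but they are the values for which Eq. 8 reproduces Eq. 10 from $\kappa^{1,1}_{e,f} = 8/9$). The end values used for the two waveforms likewise assume that $\Psi(t_r, t_f)$ rises at $t_r$ and falls at $t_f$.

```python
from fractions import Fraction


def derived_correlations(p_a, p_b, kappa_11):
    """Eq. 8: derive kappa^{0,0}, kappa^{1,0}, kappa^{0,1} from kappa^{1,1}.

    p_a, p_b are the one-probabilities of nodes A and B (strictly between 0 and 1).
    Returns a dict keyed by the logic value pair (a, b).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("one-probabilities must lie strictly between 0 and 1")
    k00 = 1 + p_a * p_b * (kappa_11 - 1) / ((1 - p_a) * (1 - p_b))
    k10 = (1 - p_b * kappa_11) / (1 - p_b)
    k01 = (1 - p_a * kappa_11) / (1 - p_a)
    return {(0, 0): k00, (1, 0): k10, (0, 1): k01, (1, 1): kappa_11}


def waveform_kappa(kappas, a_end, b_end, a_start, b_start):
    """Eq. 9: product of the coefficients for the values at +inf and at -inf."""
    return kappas[(a_end, b_end)] * kappas[(a_start, b_start)]


# Assumed one-probabilities of nodes e and f (not given explicitly in the text).
p_e = p_f = Fraction(3, 4)
kappas = derived_correlations(p_e, p_f, Fraction(8, 9))

# Assumed reading of the example: W1 = Psi(1, +inf) is 0 at -inf and 1 at +inf,
# W2 = Psi(+inf, 2) is 1 at -inf and 0 at +inf.
kappa = waveform_kappa(kappas, a_end=1, b_end=0, a_start=0, b_start=1)
p_w = kappa * Fraction(3, 16) * Fraction(3, 16)
print(kappas, kappa, p_w)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the fractions in Eq. 10 is not blurred by floating-point rounding.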
W1..4 so that they will be removed, leaving only waveform W5. The glitch is consequently filtered out. Let us consider the example shown in Figure 7. A glitch with length 3 will be created at node f due to unequal delays of the input paths of node f. The successor nodes i and h have different inertial delays. The glitch with length 3 created in node f will pass through nodes g and i, while it will be filtered out in node h. Therefore the node h mask tag of waveforms W1..4 generated at node f will be false, while all other mask tags will be true. During the SWS computation for node h, all waveforms with a false mask tag value for node h will be removed.

6. EXPERIMENTAL RESULTS
The proposed technique is implemented in C++. This prototype tool includes the procedure for signal probability and correlation coefficient estimation. The inputs to the tool are the netlist, the delay model and the one-probabilities of the primary inputs. We want to emphasize that any realistic delay model can be used in our system; it is not restricted to the fanout model. One-probabilities for primary inputs are set to 0.5 in our experiments, but can be any arbitrary value between 0 and 1. The reference transition densities for all experiments are acquired from a circuit logic simulation with 100000 random input vectors that satisfy the one-probabilities of the primary inputs (0.5). A sample tree-structured circuit is analyzed in [5] and computation errors are reported. Node errors are up to 42% for the proba[...]

Table 1: Error comparison for ISCAS'85 benchmark circuits with fanout delay assignment. All errors are in percent.

          TPS                 TPS-DT              TPS-EDT             SWS
Circuit   Eav   σ     Etot    Eav   σ     Etot    Eav   σ     Etot    Eav   σ     Etot
C17        2.3   2.6   0.1     2.3   2.6   0.1     2.3   2.6   0.1     0.7   1.1   0.7
C432      29.9  38.8  35.8     9.5  11.8   6.5    11.5  16.6  11.5     5.2   9.6   2.8
C499       6.8  14.0   7.0     3.6   8.2   0.6     2.3   3.0   3.0     1.0   1.6   0.3
C880       8.3  15.3   1.6     8.0  15.7   5.2     4.8   9.0   0.0     4.4   7.9   2.7
C1355     24.2  31.6  32.9     5.8  11.2   5.4     5.0   9.5   0.5     6.3   9.5   0.2
C1908     15.0  23.1   4.1    17.7  27.9  11.2     7.0  16.3   2.0     7.8  11.1   2.3
C2670     16.6  29.8   7.2    16.7  28.3   9.9    13.2  23.6   6.2     9.7  16.7   2.9
C3540     13.8  26.3   9.8    10.3  25.6   2.4    10.5  26.4   3.7     7.3  17.9   1.5
C5315     11.8  24.4   2.3    13.4  31.5  10.1    11.3  27.0   3.4     7.1  11.5   2.1
C6288     27.4  27.5  32.1    15.7  18.8   4.1    12.7  15.4   0.2    15.7  20.1   4.9
C7552     14.5  27.5   3.2    14.8  31.4   7.8    14.1  27.6   1.3    11.5  27.7   0.9
AVE.      15.5  23.7  12.4    10.7  19.4   5.8     8.6  16.1   2.9     7.0  12.2   1.9

[Figure 8: Average node error, average number of waveforms per node and runtime versus pth for C1355.]
Table 2: Runtime and computation complexity for ISCAS'85 benchmark circuits with fanout delay assignment. All errors are in percent.
[Columns: Number of nodes, Maximum logic depth, Ave. logic depth, Ave. # of waveforms, Runtime (sec), Eav1, Etot1, Eav2, Etot2.]

[3] M.G. Xakellis and F.N. Najm. Statistical estimation of the switching activity in digital circuits. In Proc. of DAC, pages 728–733, 1994.
[4] C.S. Ding, C.Y. Tsui, and M. Pedram. Gate-level power estimation using tagged probabilistic simulation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17:1099–1107, November 1998.