generation
B. Pal, A. Sinha, P. Dasgupta, P.P. Chakrabarti and K. De
Abstract: Recent design and verification languages, such as SystemVerilog, support a rich test
bench language, which provides significant support towards developing layered, structured,
constrained random test bench architectures. Typically, the test bench language offers many
features that are not synthesisable and therefore cannot be carried into the hardware for
hardware accelerated simulation. One of the main challenges in improving the performance of
hardware accelerated simulation is to run the task of random value selection under specified
constraints in hardware. This problem (possibly for the first time) is addressed and a two-step
approach is presented. In the first step, the constraints are pre-processed in software to
generate a set of entailed regions. In the second step, random value selection is performed in
hardware using the entailed regions pre-computed in the first step. It is shown that this method
has modest area overhead and produces constraint satisfying random valuations within very few
cycles. Results on test bench architectures for the ARM AMBA Bus and IBM CoreConnect protocol
suites have been reported.
Authorized licensed use limited to: Princeton University. Downloaded on August 19, 2009 at 15:31 from IEEE Xplore. Restrictions apply.
Fig. 1 Emerging trends in hardware accelerated simulation
a Pure software simulation
b Hardware accelerated simulation
c Outline of the proposal
From our experience in modelling the verification IPs for several standard bus protocols (including ARM AMBA [12], IBM CoreConnect (IBMCC) [13], and PCI Bus [14]), we have observed that constrained value selection in randomised test benches is broadly of two types:

(1) Selection of control bits: For example, the model of an arbiter in the test bench may randomly select a grant line from among requesting master devices. Also the test bench may randomly select the transfer type (single/burst) from a set of allowable transfer types.
(2) Selection of data words: For example, in a memory mapped architecture, the address must be within a specific range for a given device to be selected. As another example, the starting address may have to be aligned to 10 K byte boundaries in a burst transfer.

For the first type, the constraints are typically Boolean, or can be modelled as Boolean constraints without much overhead. Value selection under Boolean constraints has been well studied [15–17]. There have also been several attempts towards synthesising Boolean constraints in hardware, including attempts to synthesise binary decision diagrams (BDDs) that model the solution space [18, 19].
Our interest in this paper is in constraints of the second type. Since these are constraints on words, any Boolean encoding of the problem will lead to combinatorial explosion because there will be too many variables. For example, BDDs grow alarmingly after about 100 variables, whereas each word in a 64 bit architecture will add 64 variables.
How complex are our constraints? This is an important question in terms of feasibility, since solving constraint satisfaction problems using real variables is a notoriously complex problem in general and it is hard to conceive any efficient implementation of existing algorithms in hardware. On the other hand, the types of constraints typically used by verification engineers in constrained random test benches are often very simple. The most common constraints are range constraints, which specify the range of individual variables. Another common form of constraint specifies that the value of a variable must be within a given range of another variable (e.g. a randomly generated address variable may be constrained within some specified offset of a base address). Such constraints involve two variables. Individual constraints involving more than two variables are rare in most applications. For example, in the entire test bench architectures for ARM advanced microcontroller bus architecture (AMBA), IBM CC and PCI Bus, we never required any single constraint involving more than two variables, although many variables were constrained. Sample test benches of IBMCC device control register (DCR), on-chip peripheral bus (OPB) protocols and AMBA AHB Master component are given on our website for reference [20].
We target systems of linear constraints where each constraint involves at most two variables. Since all the constraints are hard-coded in the test bench, we have the liberty to simplify the set of constraints prior to hardware accelerated simulation. Our methodology works in two steps.

(1) Constraint pre-processing in software: In the first step, we use an integer linear programming (ILP) solver to
424 IET Comput. Digit. Tech., Vol. 1, No. 4, July 2007
Fig. 2 Example constraint random test-bench
deduce a set of regions covering the constraint satisfying valuations. Since this is a pre-processing step executed in software, and since this step is executed only once before the start of the emulation, the complexity of this step is not a major issue and we can afford to use sophisticated algorithms.
(2) Constrained random selection in hardware: In the second step we synthesise a constrained random test generator in hardware. This circuit uses a pseudo-random generator based on a linear feedback shift register (LFSR) and uses constraint specific circuitry to constrain the generator within the regions entailed by the first step.

The main challenge is that the first step must produce entailed regions in some form such that the second step can work efficiently, both in terms of the number of clock cycles required to produce the random valuations, and in terms of the area overhead of the generating circuit. We present a simple example for illustration as follows.
Example 1.1: Consider a Bus connecting a Master device and two Slave devices. The Master, when required, can initiate transfers (read or write) to the Slave devices. The address map of the Slaves is [0, 1023], where Slave1 has the range [0, 511] and Slave2 has the range [512, 1023]. For slave-specific tests, a test bench that models the Master needs to constrain the values of base and offset in order to provide addresses within the desired range. A SystemVerilog constraint random test bench fragment is given as follows. For readability, only relevant portions of the test bench are given (Fig. 2).
The proposed methodology is performed in two steps. In the first step, we take the input test bench constraints and pre-compute the solution region and entail this in some well-defined data structure. For example, the solution region (see Fig. 3) for the above constraints is entailed by the lines:

C1: offset 0; C2: base + offset 511; C3: base + 2offset 512

This processing is done in software. In the second step, we use the stored information to search for a random valuation of the inputs in hardware. The pre-computation and entailment of the solution region make the search algorithm deterministic, non-backtracking, synthesisable and workable within limited hardware resources.
Fig. 3 Solution region
The paper is organised as follows. We present the overall idea around suitable examples in Section 2. Section 3 presents the software preprocessing phase. Section 4 describes the synthesis methodology with suitable examples. Section 5 extends the basic pre-processing step in presence of inequality constraints. Section 6 presents the application aspects of the tool for the layered verification
choose a random value within [z_lower, z_upper]. In general, in the k-th step, a random value is assigned to the k-th variable and the given problem reduces to a sub-problem having (n − k) variables. More importantly, the choice of the value of the k-th variable is done in such a way that guarantees the existence of a solution for the reduced problem involving (n − k) variables.
By virtue of projections, our target in hardware is to get a random value for a constrained variable from the projected polygon in any given plane. Essentially, the hardware should be simple enough to churn out random values in a stipulated time frame, otherwise the whole purpose of hardware acceleration would be defeated. For this, suitable logic is required to store and propagate the information obtained from each plane. We partition our methodology into two halves: software preprocessing of the 2D data we have from each plane, followed by hardware synthesis of the oracle targeted to generate valid assignments to the set of constrained variables.
Fig. 4 Overall idea
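The variable-by-variable selection described above can be illustrated in software. The following is only a minimal sketch under stated assumptions: the toy constraint system, the brute-force projection (standing in for the ILP-based preprocessing) and all function names are hypothetical and are not taken from the paper.

```python
import random
from itertools import product

# Hypothetical two-variable system (not from the paper):
#   0 <= x <= 7, 0 <= y <= 7, x + y <= 8, y - x <= 2, x + 2y >= 10
DOMAIN = range(0, 8)

def satisfies(x, y):
    """True when (x, y) meets every constraint of the toy system."""
    return x + y <= 8 and y - x <= 2 and x + 2 * y >= 10

def project_x():
    """Software step: x values that can be extended to a full solution.
    Brute force is used here purely for illustration."""
    return sorted({x for x, y in product(DOMAIN, DOMAIN) if satisfies(x, y)})

def y_range(x):
    """Refined bounds of y once x is fixed (each constraint solved for y)."""
    lower = max(0, -(-(10 - x) // 2))   # ceil((10 - x) / 2) from x + 2y >= 10
    upper = min(7, 8 - x, x + 2)        # from x + y <= 8 and y - x <= 2
    return lower, upper

def sample(rng):
    """Pick x from the projection, then y from its refined range.
    No backtracking is needed because every projected x has some valid y."""
    x = rng.choice(project_x())
    lo, hi = y_range(x)
    return x, rng.randint(lo, hi)
```

Because x is drawn only from the projected range, the refined interval for y is never empty, which is the property that allows the hardware search to remain non-backtracking.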
Fig. 6 Procedure compute strip
Our estimate gives the square of this ratio. From the intuitive dimensional analysis, we find
the following (the operator dim() gives the dimension of a quantity)
Fig. 10 Outline of the procedure GenRandNum
We shift the LFSR only once (in step 2) such that it requires one cycle to generate R. A sample schematic of the associated logic is shown in Fig. 11. The logic is produced by Synopsys Design Compiler.

Fig. 11 Sample schematic for the value adjustment logic

Example 4.1: Let x_min = 1 and x_max = 21. Therefore d = 20. Also assume that D = 2^5. Thus the width of the LFSR becomes 5 bits. Suppose the 5-bit LFSR generates R (= 5'b11100). So, R > d. Thus we check R from the MSB. It has a '1' at the MSB. So we flip it. The modified R becomes 5'b01100 (decimal 12), which is less than d. Thus the generated value for x becomes 13 (1 + 12).

A detailed analysis of the circuit complexity is given in Table 1. The first column shows the width (in bits) of R (i.e. D). The second column shows the width (in bits) of d. The third column shows the gate count of the generated logic. The fourth column enumerates the area overhead and the fifth column shows the critical path delay (CPD). Note that the circuit complexity grows linearly with the width of R. The synthesis has been done using Design Compiler [24] (of Synopsys).

4.2 Refinement of the bounds of a variable

Each strip for a variable x has its own OCS (defined in Section 3.2). After choosing a random value for a variable x within a chosen strip, we need to refine the bounds of the other variables associated with the overlapping constraints for that strip. The refinement is done by (simple) combinational logic. The logic uses the linear constraints involving x and every other associated variable y, and substitutes the value of x (in the constraints) to get a refined domain for y. As the OCS for a strip is constant, the amount of logic is also constant.

Example 4.2: Consider the following two constraints on two variables x and y.

Con1: $x + y \ge 1$; Con2: $x + 2y \le 6$

Initially, the lower and upper bounds of x and y are given as [1, 6] and [1, 3]. Assume that during randomization, we take x first and then y. For this, a sample bound update logic (for variable y) is given as follows.

$y_{ltemp} = \lceil 1 - x \rceil$
$y_{lower} = (y_{lower} \ge y_{ltemp})\ ?\ y_{lower} : y_{ltemp}$
$y_{htemp} = \lfloor 3 - (x/2) \rfloor$
$y_{upper} = (y_{upper} \le y_{htemp})\ ?\ y_{upper} : y_{htemp}$

Now assume we get value 3 for variable x from the LFSR. For this, the refined bounds of y (from the above logic) become [1, 2].
Note that with reference to Fig. 9, the choice of x0 (for variable x) makes only the strips Sy2 (completely covered), Sy1 and Sy3 (partially covered) belong to the updated bound [y_lower, y_upper] of y. We will refer to this logic as a procedure called RefineBound.
A sample synthesised circuit (by Synopsys Design Compiler) for the update logic (of Example 4.2) is
8    3    34    8901.27     1.55
10   3    42    10874.5     1.64
12   3    49    12811.73    1.55
15   3    58    15680.46    1.64
Fig. 12 Outline of three primary procedures
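The random value generation of Section 4.1 can be imitated in software as a rough sketch. The assumptions here are ours: the LFSR tap positions (recurrence polynomial x^5 + x^2 + 1), the rule of clearing set bits from the MSB downwards until R ≤ d (inferred from Example 4.1 only), and the function names `lfsr_step`, `adjust` and `gen_rand_num`, none of which are taken from the paper's GenRandNum outline.

```python
def lfsr_step(state, width=5, taps=(0, 2)):
    """One shift of a Fibonacci LFSR; returns the next state.
    The taps are an assumed choice giving a maximal-length 5-bit sequence."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return (state >> 1) | (feedback << (width - 1))

def adjust(r, d):
    """Clear set bits of r, starting from the most significant one,
    until r <= d (assumed generalisation of Example 4.1)."""
    while r > d:
        r &= ~(1 << (r.bit_length() - 1))
    return r

def gen_rand_num(state, x_min, x_max, width=5):
    """Shift the LFSR once and map R into [x_min, x_max]."""
    state = lfsr_step(state, width)
    return state, x_min + adjust(state, x_max - x_min)

# Example 4.1: R = 5'b11100, d = 20 -> adjusted R = 12, so x = 1 + 12 = 13
x = 1 + adjust(0b11100, 20)
```

Note that this adjustment maps several LFSR outputs onto the same value, so the sketch makes no claim about the uniformity of the resulting distribution.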
Table 2: Logic complexity of the adjustment hardware
3 2 19 5535.0 1.57
2 31 5935.74 1.69
3 35 7215.7 2.04
5 42 9149.08 2.55
Fig. 13 Sample circuit for the re-computation logic
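The bound update logic of Example 4.2 can be mirrored in software as a small sketch. It assumes that Con1 reads x + y ≥ 1 and Con2 reads x + 2y ≤ 6, so that Con1 yields the lower bound and Con2 the upper bound of y; the function name `refine_bound_y` is hypothetical, and exact rational arithmetic is used before rounding.

```python
from fractions import Fraction
from math import ceil, floor

def refine_bound_y(x, y_lower=1, y_upper=3):
    """Refine [y_lower, y_upper] after x has been chosen.
    Assumes Con1: x + y >= 1 and Con2: x + 2y <= 6."""
    yl_temp = ceil(1 - x)                    # Con1 solved for y: y >= 1 - x
    y_lower = y_lower if y_lower >= yl_temp else yl_temp
    yh_temp = floor(3 - Fraction(x, 2))      # Con2 solved for y: y <= 3 - x/2
    y_upper = y_upper if y_upper <= yh_temp else yh_temp
    return y_lower, y_upper

def holds(x, y):
    """Check both (assumed) constraints for a concrete pair."""
    return x + y >= 1 and x + 2 * y <= 6

bounds_for_2 = refine_bound_y(2)   # x = 2 gives the interval (1, 2)
```

With exact arithmetic, x = 3 gives an upper bound of ⌊1.5⌋ = 1; a datapath that truncates x/2 before the subtraction would instead produce 3 − 1 = 2, so the rounding order matters when this logic is moved into hardware.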
$(a_{i1} x_k + a_{i2} x_{k+1})\ \theta\ a_{i3}$ where $\theta \in \{\le, \ge\}$    (1)

Let C_p and C_q be the members of the overlapping constraint set for the strip constituting $\tilde{x}_k$. As $\tilde{x}_{k+1}$ is out of the refined bound

$(a_{p1} x_k + a_{p2} x_{k+1})\ \neg\theta\ a_{p3}$ or, $(a_{q1} x_k + a_{q2} x_{k+1})\ \neg\theta\ a_{q3}$

Theorems 4.1 and 4.2, respectively, show that our methodology is sound and complete.

Observe that the sets of constraints {C1, C2, C3, C4, C15} and {C1, C2, C3, C4, C25} produce two disjoint convex regions in the four-dimensional hyperspace. Therefore the task is to compute the strips for both the regions during the software preprocessing stage and, during the actual hardware test-generation, first randomly choose the solution region and next apply the aforesaid methodology.
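The idea of first choosing a solution region and then sampling inside it can be sketched on a one-variable case. Everything in this sketch is a hypothetical illustration rather than the paper's procedure: the variable range, the excluded value, the name `pick_value`, and in particular the choice to weight regions by their size.

```python
import random

# Hypothetical example: a 4-bit variable a with the constraint a != 8.
# The excluded value splits [0, 15] into two disjoint intervals.
REGIONS = [(0, 7), (9, 15)]

def pick_value(rng):
    """Choose a region first, then a value inside the chosen region.
    Weighting by region size is an assumption made for this sketch."""
    sizes = [hi - lo + 1 for lo, hi in REGIONS]
    lo, hi = rng.choices(REGIONS, weights=sizes, k=1)[0]
    return rng.randint(lo, hi)
```

Each region is convex on its own, so the per-region strips can be pre-computed independently and the run-time selection only adds one extra random choice.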
Table 4: Results on industry-standard Bus protocol
protocol (for interaction with the DUT), these BFMs are designed using several in-line constraints. The methodology presented in this paper generates hardware for these in-line constraints.

7 Results

We have applied our prototype tool on the test-benches of two industry standard on-chip bus protocols, namely ARM AMBA AHB and IBM CoreConnect. The test-benches have been modelled following standard test-bench modelling guidelines (such as RVM/VM). These test-benches have also been used as environment models for the verification of different components of the two protocols. The experimental results serve two goals. Firstly, they establish a proof of concept on industry standard designs and justify our claim that systems of constraints involving only bi-variate constraints are both relevant and often adequate in practice. Secondly, they demonstrate that our synthesis methodology generates circuits that have modest area overheads and can generate random valuations within a few cycles.
In order to provide an idea of the complexity of constraint synthesis for a complex test-bench, we present only a few instances of the constraints (see Table 4). Sample constrained random test-benches for IBM CC OPB Master and DCR Master device are given in [20].
Table 4 shows the results of our tool on IBM CoreConnect and AMBA AHB on-chip Bus protocols. Analysis is given for some specific constraints modelled in the environment of the DUT. The name of the DUT is given first (e.g. AHB Arbiter, Master, Slave etc.). Next, details of the associated environment constraints are given. The third column shows the number of variables and the fourth column enumerates the widths of these variables in the constraint. For example, a specific constraint in the environment for verifying the AHB Master includes six variables (of width 32, 3, 3, 2, 2 and 2 bits, respectively). Note that the width of the variables greatly affects the number of bits of memory in the generated hardware. The fifth column shows the number of planes and the seventh column shows the number of strips in all the planes. The sixth column indicates the number of input constraints (cons). The eighth and ninth columns show the run-times (preprocessing and synthesis times) on an Intel(R) Pentium(R) 2.8 GHz, 512 MB RAM machine. The run-times are given in milliseconds (ms). The 10th and 11th columns enumerate the area overhead (bit count and gate count) in the generated circuit. We also provide the number of bits of memory required by the LFSRs. We have used Design Compiler [24] of Synopsys for synthesising the hardware. Note that the area complexity of IBM CC OPB and AHB Master is considerably lower than that of the others. This is because the widths of the variables associated with the environment of these components are small. Other components include constraints involving address buses. The last column shows the number of clock cycles required to generate a random valuation satisfying the constraints.
Example 7.1: To illustrate the above table further, consider the following scenario from the AMBA AHB protocol. A master component (master number: 4) gets a Split or Retry response for a transaction targeting a slave component (slave number: 1). The slave has an address range: [128:255]. The transfer is a half-word (16 bit) read transfer for specific byte lanes. The byte lanes are selected by two variables fracad (2 bits) and size (3 bits). The span of the transfer is characterised by variables addr (32 bits) and burst (3 bits). Transfer mode and response types are selected by variables trans (2 bits) and resp (2 bits), respectively. Now we analyse the complexity of one of the constraints required to model this scenario in verifying the AHB arbiter component. To verify the arbiter component, we need to model an environment consisting of the master and the slave device. The in-line constraint for this environment is given in Fig. 17.
Note that this constraint requires six variables (see Table 4). The variable widths are (32, 3, 3, 2, 2, 2). The number of planes is $\binom{6}{2} = 15$. The number of constraints is 11 (see above). Analysis for the other bus components is given in a similar fashion.
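The plane count quoted in Example 7.1 follows from pairing the six variables in all possible ways, since each bi-variate constraint lives in the plane of one variable pair. A short sketch of that arithmetic is given below; the dictionary layout and the name `planes` are our own illustration, while the variable names and widths are those listed in the example.

```python
from itertools import combinations
from math import comb

# Variables of Example 7.1 with their bit widths
widths = {"addr": 32, "burst": 3, "size": 3, "fracad": 2, "trans": 2, "resp": 2}

# Every unordered pair of variables defines one 2D plane
planes = list(combinations(widths, 2))

num_planes = len(planes)   # equals comb(6, 2)
```

The same enumeration gives the set of planes over which the software preprocessing has to compute strips.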
8 Conclusions

By investigating several complex test benches for verifying on-chip Bus protocols, we have identified that these test benches in general use constraints which involve either ranges of individual variables or linear constraints involving two variables. In this work, we concentrate on synthesis of these types of test benches. However the problem of synthesis of general constraints (involving any number of variables) remains an open problem.

9 Acknowledgment

We take this opportunity to thank the referees for their time in reviewing this paper, and for their comments. The authors acknowledge Synopsys (India) Pvt. Ltd. for partial support of this work. Pallab Dasgupta and P.P. Chakrabarti further acknowledge the Department of Science and Technology (DST), Govt. of India for partial support of this work.

10 References

1 OpenVera LRM 2.0, http://www.open-vera.com
2 e Reuse Methodology (eRM), www.verisity.com\products\erm.html
3 SystemVerilog 3.1a Language Reference Manual. http://www.eda.org/sv/SystemVerilog_3.1a.pdf
4 Dechter, R.: ‘Constraint processing’ (Morgan Kaufmann Publishers, 2003), ISBN 1-55860-890-7
5 Tsang, E.: ‘Foundations of constraint satisfaction’ (Academic Press, 1993)
6 Bauer, J., Bershteyn, M., Kaplan, I., and Vyedin, P.: ‘A reconfigurable logic machine for fast event-driven simulation’. Proc. Design Automation Conf. (DAC), June 1998, pp. 668–671
7 Cadambi, S., Mulpuri, C., and Ashar, P.: ‘A fast, inexpensive and scalable hardware acceleration technique for functional simulation’. Proc. Design Automation Conf. (DAC), June 2002, pp. 570–575
8 Suyama, T., Yokoo, M., Sawada, H., and Nagoya, A.: ‘Solving satisfiability problems using reconfigurable computing’. Proc. IEEE Trans. on VLSI Systems, February 2001, vol. 9, pp. 109–116
9 Varghese, J., Butts, M., and Batcheller, J.: ‘An efficient logic emulation system’, Proc. IEEE Trans. VLSI Syst., 1993, 1, (2), pp. 171–174
10 Bergeron, J., Cerny, E., Hunter, A., and Nightingale, A.: ‘Verification methodology manual for systemverilog’ (Springer, 2005), vol. XVIII, p. 510
11 Reference Verification Methodology for Vera, http://www.synopsys.com/products/simulation/pdf/va_vol4_ids1_vera.pdf
12 AMBA Specification Rev2.0, http://www.arm.com/products/solutions/AMBA_Spec.html
13 IBM CoreConnect Bus Specification. www-306.ibm.com\chis\techlib\techlib.nsf\techdocs\
14 PCI Bus Specification 3.0, http://www.pcisig.com/membersdownloads/specifications/conventional/PCI_LB3.0-2-6-04.pdf
15 Chandra, A.K., and Iyengar, V.S.: ‘Constraint solving for test generation – a technique for high level design verification’. Proc. Intl. Conf. on Computer Design (ICCD), October 1992, pp. 245–248, ISBN: 0-8186-3110-4
16 Shimizu, K., and Dill, D.: ‘Deriving a simulation input generator and a coverage metric from a formal specification’. Proc. Design Automation Conf. (DAC), June 2002, pp. 801–806
17 Yuan, J., Shultz, K., Pixley, C., Miller, H., and Aziz, A.: ‘Modeling design constraints and biasing in simulation using BDDs’. Proc. Int. Conf. on Computer-Aided Design (ICCAD), 1999, pp. 584–589
18 Albin, K., Yuan, J., Aziz, A., and Pixley, C.: ‘Constraint synthesis for environment modeling in functional verification’. Proc. of the Design Automation Conf. (DAC), USA, June 2003, pp. 296–299
19 Kukula, J.H., and Shiple, T.R.: ‘Building circuits from relations’. Proc. 12th Int. Conf. on Computer Aided Verification (CAV), LNCS, 2000, vol. 1855, pp. 113–123
20 Example Constrained Random Test Benches. http://www.facweb.iitkgp.ernet.in/~pallab/Constraint1.htm
21 Berg de, M., Kreveld, V.M., Overmars, M., and Schwarzkopf, O.: ‘Computational geometry: algorithms and applications’ (Springer Verlag, 2000, 2nd rev.,), p. 367, ISBN: 3-540-65620-0
22 Boissonnat, J., and Yvinec, M. (translated by Bronniman H.): ‘Algorithmic geometry’ (Cambridge University Press), p. 541, ISBN: 13:9780521565295
23 ILOG CPLEX: High-performance software for mathematical programming. http://www.ilog.com/products/cplex/
24 Design Compiler. www.synopsys.com\products\logic\logic.html
25 Young-II, K.: ‘TPartition: testbench partitioning for hardware accelerated functional verification’, Proc. IEEE Design and Test Comput., 2004, 21, (6), pp. 484–493