9 .Efficient Design For Fixed Width Adder

Efficient Design for Fixed-Width Adder-Tree
Abstract
Conventionally, fixed-width adder-tree (AT) design is obtained from the full-width AT
design by employing direct or post-truncation. In direct-truncation, one lower order bit of
each adder output of full-width AT is post-truncated, and in case of post-truncation, {p}
lower order-bits of final-stage adder output are truncated, where p = dlog2 Ne and N is the
input-vector size. Both these methods does not provide an efficient design. In this paper, a
novel scheme is presented to obtain fixed-width AT design using truncated input. A bias
estimation formula based on probabilistic approach is presented to compensate the truncation
error. The proposed fixed-width AT design for input-vector sizes 8 and 16 offers
(37%,23%,22%) and (51%,30%,27%) areadelay product (ADP) saving for word-length sizes
(8,12,16), respectively, and calculates the output almost with the same accuracy as the post-
truncated fixed-width AT which has the highest accuracy among the existing fixed-width AT.
Further, we observed that Walsh-Hadamard transform based on the proposed fixed-width AT
design reconstruct higher-texture images with higher peak signal to noise ratio (PSNR) and
moderate-texture images with almost the same PSNR compared to those obtained using the
existing AT designs. Besides, the proposed design creates an additional advantage to
optimize other blocks appear at the upstream of the AT in a complex design
CHAPTER 1
INTRODUCTION
Low power, area-efficient and high-performance computing systems are increasingly used in
portable and mobile devices. For such applications, digital signal processing (DSP)
algorithms are implemented in fixed-point VLSI systems. Adder-tree (AT) commonly used in
parallel designs of inner-product computation and matrix-vector multiplication. Multiplier
design also involves a shift-adder-tree (SAT) for accumulation of partial product bits. Word-
length growth is a common problem encountered when multiplication and addition are
performed in fixed-point arithmetic.
The shape of the bitmatrix of SAT is different from the AT. Consequently, word length
grows in a different order in SAT and AT. Besides, there are few other bits also added in the
SAT to take care of negative partial products of multiplier. Specific designs have been
suggested for efficient realization of fixed-width multipliers with less truncation error [1].
However, the scheme used in fixed-width multiplier is not appropriate to develop a fixed-
width AT design due to different shaped bit-matrix. The full-width AT (FL-AT) design
produces (w + p)-bit output for every N-point input-vector, where p = log2 N. For the same
size input-vector, the fixed-width AT (FX-AT) design produces w-bit output. Conventionally,
FX-AT design is obtained from the FL-AT design by employing direct or post-truncation. In
direct-truncation (DT), one lower order bit of each adder output of FL-AT is post-truncated,
and in case post-truncation, {p} lower order-bits of final adder output of FL-AT are
truncated. In recent years, several schemes have been suggested for approximate computation
of addition using ripple carry adder (RCA) to save critical path delay (CPD) and area [2]–[5].
The bio-inspired lower part OR adder is proposed based on approximate logic [2]. Four
different types of approximate adder designs are proposed in [3]. An approximate 2-bit adder
is proposed in [5] for approximate computation of triple multiplicand without carry
propagation. These approximate designs can be used to implement RCA with less delay and
area with some loss of accuracy. The approximate RCA design can be used to obtain fixed-
width AT employing post-truncation. However, the approximate fixedwidth AT (APX-FX-
AT) does not offer an area-delay efficient design.
Bit-level optimization of FL-AT for multiple constant multiplication (MCM) is proposed to
take advantage of shifting operation [6]. An efficient FL-AT design is proposed in [7] using
the approximate adder of [3] for imprecise realization of Gaussian filter for image processing
applications. We find that the optimized AT of [6] is specific to MCM based design and none
of the existing design discusses the issues related to fixed-width implementation of AT. It is
observed that direct-truncation and post-truncation methods does not provide an efficient FX-
AT design. It is necessary to have a different approach for developing efficient FX-AT design
which is currently missing in the literature. An efficient FX-AT design certainly help to
improve the efficiency of dedicated VLSI systems implementing complex DSP algorithm.
In this research, we propose a scheme to develop an efficient FX-AT design with truncated
input. Use of truncated input in FX-AT offers two fold advantages: (1) area and delay saving
within the FX-AT due to reduction in adder-width (by p-bits), and (2) creates a scope to
optimize other computing blocks appear at the upstream of AT in a complex design.
However, the use of truncated input introduces a large amount of error in the FXAT output
which needs to be biased appropriately. The main contribution of the research are:
 Use of truncated input in fixed-width AT design.
 Formula to estimate the bias for error compensation.
1.2 LITERATURE SURVEY

IMPrecise adders for low-power Approximate CompuTing by Vaibhav Gupta,
DebabrataMohapatra, Sang Phill Park, AnandRaghunathan and KaushikRoy:
Low-power is an imperative requirement for portable multimedia devices employing various
signal processing algorithms and architectures. In most multimedia applications, the final
output is interpreted by human senses, which are not perfect.
Energy-efficient signal processing via algorithmic noise-tolerance, by R. Hegde; N.R.

Shanbhag:
In this a framework for low energy digital signal processing (DSP) where the supply voltage
is scaled beyond the critical voltage required to match the critical path delay to the
throughput Design of low-power high-speed truncation-error tolerant adder and its
application in digital signal processing, by N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z.
H. Kong: In modern VLSI technology, the occurrence of all kinds of errors has become
inevitable. By adopting an emerging concept in VLSI design and test, error tolerance (ET), a
novel error-tolerant adder (ETA) is proposed
Inexact designs for approximate low power addition by cell replacement, by H. A. F.

Almurib, T. N. Kumar, and F. Lombard:
It has three designs of an inexact adder cell for approximate computing. These cells require a
substantially smaller number of transistors compared to an exact full adder cell as well as
known inexact designs Accuracy configurable adder for approximate arithmetic designs by
A. B. Kahng and S. Kang Approximation can increase performance or reduce power
consumption with a simplified or inaccurate circuit in application contexts where strict
requirements are relaxed.
A low-power, high-performance approximate multiplier with configurable partial error

recovery by Cong Liu ; Jie Han; Fabrizio Lombard;
Approximate circuits have been considered for error-tolerant applications that can tolerate
some loss of accuracy with improved performance and energy efficiency. Multipliers are key
arithmetic circuits in many such applications such as digital signal processing (DSP). In this a
novel approximate multiplier with a lower power consumption and a shorter critical path than
traditional multipliers is proposed for high-performance DSP applications.
CHAPTER 2
ADDER STRUCTURES
2.1 Adders
Adders are used in many aspects [11], [12]. It is generally recognized that most of the time
required by adders is due to carry propagation, so how to reduce the propagation time is the
focus on today’s techniques. Different binary adder schemes have their own characters, such
as area and energy dissipation. No such adder scheme is the best for every condition, so to
choose in a specific context with specific requirement and constraint is important. Because
this thesis work does not focus on analysis of delay time of different adders, here the function
of some commonly used adders is given.
2.1.1 Two’s Complement Representation

Two’s complement representation uses the most significant bit as a sign bit, making it easy to
test whether an integer is positive or negative. Range of two’s complement representation is
from −2n−1 to 2n−1 − 1. Consider an n bits integer A, in two’s complement representation. If A
is positive, then the sign bit an−1 is zero. The remaining bits represent the magnitude of the
number, in the same fashion as for sign magnitude:
The number zero is identified as positive and therefore has a 0 sign bit and a magnitude of all
0s, we can see that the range of positive integers that maybe represented is from 0 to 2 n−1 −1.
Any larger number would require more bits.
2.1.2 Fixed Time Type
Most commonly implemented is the fixed time type adder scheme. The character is that no
signal is indicated when addition is completed. Therefore, the worst case delay should be
considered.
2.1.3 Variable Time Type
Contrary to fixed time type adder scheme, the variable time type adders have a completion
signal so that the result of the addition can be used as soon as the completion signal is
asserted.
2.1.4 Carry-Propagate Adder

Carry-propagate adders (CPA) can get the result in conventional number system, also called
fixed-radix system. The property of fixed-radix system is that every number has a unique
representation, so that no two sequences have the same numerical value. A digit set from 0 to
r−1, where r means radix.
2.1.4.1 Ripple-Carry Adder
An n -bit adder used to add two n -bit binary numbers can build by connecting n full adders
in series. Each full adder represents a bit position i (from 0 ton −1). Each carry out from a full
adder at position i is connected to the carry in of the full adder at the higher position i +1.
The sum output of a full adder at position i as shown in Figure 1 is given by:
Si = X i ⊕Yi ⊕Ci
The carry output of each FA as shown in Figure 1 is given by:

Ci+1 = X iYi + X iCi +YiCi
Fig2.1 Ripple-carry adder
In the expression of the sum, Ci must be generated by the full adder at the lower position i −1.
tc is the delay from the input from the full adder to the carry output and ts is the delay form the
input to the sum output. The worst case delay is given by
TCRA = (n −1)tc + max(tc ,ts )
This adder is slow for large n. The main advantage of this adder is the simplicity of its cell
and connection among them.
2.1.4.2 Carry-Look ahead Adder

The basic idea of carry-lookahead adder is computing the carries simultaneously, i.e. in this
type of adder all the carries in the same groups are computed at the same time.
The carry-lookahead adder has two functions, first is to compute all the carries then the
operation Si = X i ⊕Yi ⊕Ci is implemented by a simple 3-input XOR gate. The design of the
lookahead carry generator involves two Boolean functions named Generate and Propagate.
For each pair of input bits these functions are defined as:
Gi = X iYi
Pi = X i ⊕Yi
The carry bit Ci+1 generated when adding two bits X i and Yi , is '1' when the function Gi is '1'
or if the CI is ’1’ and the function Pi is '1' simultaneously. In the first case, the carry bit is
activated by the local conditions (the values of X i and Yi). In the second, the carry bit is
received from the less significant elementary addition and is propagated further to the more
significant elementary addition depending on the function Pi. Therefore, the carry-out bit
corresponding to a pair of bits X i and Yi is computed according to the equation:
Ci = Gi + PiCi−1
Hence, the carry signal can be computed by carry in, Generate and Propagate signals.
For example, consider a four bit adder
C1 = G0 + P0Cin
C2 = G1 + P1G0 + P1P0Cin
C3 = G2 + P2G1 + P2P1G0 + P2P1P0Cin
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Cin
Figure 2 can help us understand the carry out signal computation procedure more clearly.
Fig2.2 Carry out of Carry Lookahead Adder
The sum output of each column is given in Figure 3.

sum_outi = X i ⊕Yi ⊕carry_in
Fig2.3: Sum of Carry Lookahead Adder
The advantage of carry-lookahead adder is if we consider the input vector of n bits is divided
into groups of m bits and groups connected like a ripple-carry adder, the worst delay should
be:
The worst delay is less than ripple-carry adder because tgroups is smaller than mtc.
Hence the carry-lookahead adder is faster than ripple-carry adder.
2.1.5 Redundant Adders

The character of redundant adders is that no carry propagation is required. In other words,
independence of numbers of bits of the adders. The operand is represented using a redundant
set. The main purpose of the redundant adder is to reduce the addition time. But this kind of
adder have some disadvantages, first is the increase of the number of bits needed for
representation of a number, which depend on the degree of the redundancy. Another
disadvantage is that some of operations can’t be performed in redundant numbers such as
magnitude comparison or sign detection.
2.1.5.1 Carry-Save Adder

Carry-save Adder(CSA) have the same circuit as the full adder, as show in Figure 4.
Fig2.4 Function of Carry-save adder
The carry in signal is considered as an input of the CSA, and the carry out signal is
considered as an output of the CSA. Figure 5 show hown carry save adders are arranged to
adder threen bit numbers x , y , z . into two numbers c and s.
Fig2.5 CSA used in n bit numbers
In Figure 5, note that all full adders are independent

Fig2.6 CSA computation
Figure 6 show the CSA compute flow and Table 1 will show how the CSA works (basic on
binary numbers).
Table 2.1 CSA Computation
The computation can be divided into two steps, first we compute S and C using a CSA, then
we use a CPA to compute the total sum. From this example, we can see that the carry signal
and the sum signal can be computed independently to get only two n -bits numbers. A CPA is
used for the last step computation and the carry propagation exist only in the last step.
2.1.5.2 Signed-Digit Adder (SDA)
Signed-digit (SD) number representation systems have been defined for any radix r with digit
values ranging over the set (- alpha, . . ., -1, 0, 1, . . ., alpha), where alpha r −1 is an arbitrary
integer in the range ≤ alpha ≤ r −1. Such number representation 2
systems possess sufficient redundancy to allow for the cut up of carry or borrow chains and
hence result in fast propagation-free addition and subtraction. The result of the addition uses
signed digit representation. Use fixed-radix representation with digit value from a signed-
integer set.
with a digit set (- alpha, . . ., -1, 0, 1, . . ., alpha).
Here the addition algorithm is not mention in detail.

The objective of SDA is to eliminate carry propagation. A signed-digit addition is performed
in two steps.
Step 1: to compute sum(w) and transfer(t ), the transfer’s function is something like carriers
in CPA.
x + y = w+t
At the digital level this correspond to
xi + yi = wi + rti+1
Fig2.7 First step of Signed-digit Addition

Figure 7 show the addition of the first two bits of n -bit numbers
Step 2: compute s = w + t At the digital level

si = wi +ti
We can compute si without produce a carry, as shown in Figure 8.
Fig2.8 Second step of Sign-digit Addition
Finally, we can conclude SDA structure, as shown in Figure 9.
Fig2.9 Sign-digit Addition
A.Avizienis [13] proposed a redundant binary number (a radix-2 signed-digit number). With
this type of number, the propagation of carry figures is absorbed into its redundancy and the
addition processes are unrelated to the number of digits and can be executed in only two
steps. More detail to compute ti and representation of operands has been mentioned in [14].
2.1.6 Multi-operand Addition
A common structure for adding several operands is an adder tree, such as Wallace tree,
Dadda tree, carry save adder tree and so on. In this thesis, carry save adder tree structure and
Wallace tree are used. The primitive operation performed on the inputs bit-array is reduction,
to achieve an output bit-array with a small number of bits. There are two methods used:
reduction by rows and reduction by columns, carry save adder tree belong to first method and
the Wallace tree belong to second method. Modules to reduce the rows are called adders and
reduce the columns are called counters.
2.1.6.1 Carry Save Adder Tree

The carry save adder tree can be used to add three operands in two’s complement
representation and produces a result as the sum of two vectors. A 3-to-2 reduction is called
[3:2] adder, and using this tree, we can use a [ p :2] adder to reduce p bit-vectors to 2 bit-
vectors using CSAs.
Fig2.10 A [ p :2] adder
From Figure 10, each column’s bit numbers are k, and have p levels. We can use [3:2] adders
to reduce the rows and get 2 bit vectors. No propagation of the carries are required except on
the last two rows which result in a speed up of the computation.
Fig2.11 Reduction by rows
From Figure 11, the number of input vectors were reduced by the rows. Finally, we should
estimate the numbers of levels of the CSA tree as
where k is the number of input operands.
2.1.6.2 Wallace tree

Wallace tree structures are widely used in additions with several operands. The reduction by
column is similar to reduction by rows if the number of bits in each column of the array is the
same. But conditions are always not like this. For example the partial products of the
multiplier, the Least-significant column can’t receive bits from other columns. So reduction
by columns is introduced.
The basic concept is to reduce bit numbers in each column of each level. So full adder and
half adder are used as (3:2) counter adder and (2:2) counter.
Fig2.12 FA and HA as (3:2) counter and (2:2) counter
In Figure 12, three nodes inside pane represent the FA’s three inputs and two nodes outside
represents the FA’s carry out and sum. The half adder has two inputs, abd one sum and one
carry out. Here is a example used in this thesis presented.
Example: a = [2 3] means 2-bit numbers with weight 2 1 and 3-bit numbers with weight 20.
We can use a Wallace tree as shown in Figure 13 to achieve fast addition. The basic module
in the Wallace tree is (3:2) counter and (2:2) counter.
Fig2.13 Example of Reduction by Columns

The vector change from a = [2 3] to a = [1 2 1]. The max change from 3 bits to 2 bits. Carry
propagation delay was eliminated except for the last row. The last step use the CPA, like
carry-lookahead adder, to compute the sum, and fast addition is achieved.
The Dadda tree is a special condition of the Wallace tree where all bit-numbers are collected
and using Wallace tree with minimum number of counters and critical path. Wallace tree was
chosen as basic algorithm of program for this thesis.
CHAPTER 3
PROPOSED DESIGN
3.1 EXISTING SYSTEM:

Low-power, area-efficient and high performance computing systems are increasingly used in
portable and mobile devices. For such applications, digital signal processing (DSP)
algorithms are implemented in fixed-point VLSI systems. Adder-tree (AT) commonly used in
parallel designs of inner product computation and matrix-vector multiplication. Multiplier
design also involves a shift-adder-tree (SAT) for accumulation of partial product bits. Word-
length growth is a common problem encountered when multiplication and addition are
performed in fixed-point arithmetic. The shape of the bit matrix of SAT is different from the
AT. Consequently, word length grows in a different order in SAT and AT. Besides, there are
few other bits also added in the SAT to take care of negative partial products of multiplier.
Specific designs have been suggested for efficient realization of fixed-width multipliers with
less truncation error. However, the scheme used in fixed-width multiplier is not appropriate
to develop a fixed-width AT design due to different shaped bit-matrix. The full-width AT
(FLAT) design produces (w + p)-bit output for every N-point input-vector, where p = log2 N.
For the same size input-vector, the fixed- width AT (FX-AT) design produces w-bit output.
Conventionally, FX-AT design is obtained from the FL-AT design by employing direct or
post-truncation. In direct-truncation (DT), one lower orderbit of each adder output of FL-AT
is posttruncated, and in case post-truncation, {p} lower order-bits of final adder output of
FLAT are truncated. In recent years, several schemes have been suggested for approximate
computation of addition using ripple carry adder (RCA) to save critical path delay (CPD) and
area. The bio-inspired lower part OR adder is proposed based on approximate logic. Four
different types of approximate adder designs are proposed. An approximate 2-bit adder is
proposed in for approximate computation of triple multiplicand without carry propagation.
These approximate designs can be used to implement RCA with less delay and area with
some loss of accuracy. The approximate RCA design can be used to obtain fixed-width AT
employing posttruncation. However, the approximate fixed width AT (APX-FX-AT) does not
offer an area-delay efficient design.
Bit-level optimization of FL-AT for multiple constant multiplication (MCM) is proposed to
take advantage of shifting operation. An efficient FL-AT design is proposed using the
approximate adder of for imprecise realization of Gaussian filter for image processing
applications. We find that the optimized AT of is specific to MCM based design and none of
the existing design discusses the issues related to fixed-width implementation of AT. It is
observed that direct truncation and post-truncation methods does not provide an efficient
FXAT design. It is necessary to have a different approach for developing efficient FX-AT
design which is currently missing in the literature. An efficient FX-AT design certainly help
to improve the efficiency of dedicated VLSI systems implementing complex DSP algorithm.
3.2 TRUNCATION AND BIAS ESTIMATION FORMULA
To discuss the truncation scheme, bits of N-point input vector are arranged in a (N × w) size
matrix A such that the i-th row of bits correspond to i-th component {xi} of input vector x and
w is the bit-width. The bit-matrix A for N = 8 and w = 8 is shown in Fig.1. Bits of A are
added column-wise along with the carry bits to calculate one bit AT output and from w
columns of A, w lower order bits of FL-AT output {y} are calculated. The remaining p upper
bits of {y} are calculated from the carry bits generated from the w-th column of A. The
structure of FL-AT for N = 8 and w = 8 is shown in Fig.2(a), where each adder is
implemented using a RCA. Structure of FX-AT-PT and FX-AT-DT are shown in Fig.2(b)
and Fig.2(c), respectively. It is observed that both FX-AT-PT and FX-AT-DT does not
provide an area-delay efficient FXAT design. To address this issue, a different type of
truncation method and error compensation scheme is discussed here.
Fig.3.1. Input bit-matrix A of adder tree for N = 8 and w = 8. y0 and y1, respectively,
represents the output of FL-AT and FX-AT-PT
Fig. 3.2. (a) Full-width adder-tree for N = 8 and w = 8. (b) Fixed-width post-truncated adder-
tree (FX-AT-PT) for N = 8 and w = 8. (c) Fixed-width with 1-bit direct-truncated adder-tree
(FX-AT-DT) for N = 8 and w = 8. (d) Function of carry-generator used in final stage of FX-
AT-PT
The bit-matrix A is partitioned into two parts named most significant part (MSP) and least
significant part (LSP). The lower p columns of A forms the LSP and the upper (w − p)
columns forms the MSP. LSP columns are partially or fully truncated to design a fix-width
AT which is discussed in the following section.
3.3 Truncated Fixed-width Adder-Tree

The MSP of input bit matrix is only considered to calculate the output of proposed truncated
fixed-width AT (TFX-AT). Due to truncation of LSP, a significant amount of error is
introduced in the AT output. A fixed bias for the truncated LSP is added to the MSP for error
compensation. The fixed-bias is estimated using the probabilistic approach. The probability
of each component of LSP (xi,j, for 1 ≤ i≤ N and 0 ≤ j ≤ p − 1, where p = log2 N) for binary ’0’
or ’1’ is . The output of TFX-AT is expressed as:
y = MSP + 2p. σ
where, the term (2p.σ) corresponds to the LSP and (σ) represents the estimated bias. The
expected value of the LSP is calculated using the formula:
The quantized value of σ is calculated as
Using (3) the fixed-bias for N = 8 and 16 is found to be {σ=4 and 8}. The input bit-matrix of
TFX-AT with fixedbias is shown in Fig.3(a) for N = 8 and w = 8. The binary values of σ=4 is
added to the least significant column of MSP for error-compensation. Structure of proposed
TFX-AT with fixed-bias is shown in Fig.3(b)
3.4 Proposed Improved Truncated Fixed-width Adder-Tree
To estimate the bias of the truncated part more precisely the LSP of A is further partitioned
into two parts as major-part and minor-part. The most significant column of LSP constitutes
the major-part (MJP) and the remaining (p − 1) lower order columns constitute the minor-part
(MNP). The bias in this case is estimated using the MJP and MNP of LSP as:
σ = σmajor + σminor (4)
where, σmajor and σminor are the estimated bias of MJP and MNP of LSP. σmajor is estimated
accurately using the actual signal value of MJP where the σminor is estimated using the
probabilistic approach. The quantized value of σmajor is estimated using the relation:
The quantized value of σminor is estimated as:
The input bit-matrix of the proposed improved truncated fixed-width adder-tree (ITFX-AT) is
shown in Fig.4 for N = 8 and w = 8. The logic-block corresponding to σmajor calculates the
carry bits {c0,c1,c2,c3,c4,c5,c6}. These carry bits and the fixed-bias corresponding to σminor are
added to the least significant column of MSP. According to (6b), the value of σminor for N = 8
is found to be 2. The structure of proposed ITFX-AT is shown in Fig.5(a). The seven
halfadders (A) connected in a tree structure calculates the carry bits {c0,c1,c2,c3,c4,c5,c6}
corresponding to σmajor. Out of these 7 half-adders, 4 half-adders of first tree level are replaced
with four full-adders with fixed input-carry 1 to add the fixedbias (+4) to the MJP of LSP
instead of least significant column of MSP. Full-adders with fixed input-carry ’1’ are further
optimized into a modified half-adder (A*) comprising of a XNOR and OR-gate.
(a)
Fig. 3.3(a) Input bit-matrix of proposed truncated fixed-width adder-tree
(TFX-AT) with fixed-bias for error-compensation. (b) Structure of proposed TFX-AT design
Fig. 3.4 Input bit-matrix of proposed improved truncated fixed-width adder tree (ITFX-AT)
for N = 8,w = 8.
MJP and MNP represents the major part and minor part of LSP. {c0,c1,c2,c3,c4,c5,c6} represent
the carry bits corresponding to the estimate of σmajor
Fig.3.5 (a) Structure of proposed ITFX-AT. (b) Logic function of half-adder (A).
(c) Logic function of modified adder (A*) (d) Logic function of carry cell
(a)
(b)
Fig. 3.6 (a) Structure of approximate fixed-width adder-tree (APX-FX-AT) using accurate
RCA and 3-bit approximate (APX) RCA of [3] for N = 8 and w = 8. (b) Approximate Full
Adder (AFA) {Type-1, Type-2, Type-3, Type-4}
Approximate full-adders of [3] are considered to add the LSP of the input matrix for reducing
the logic complexity and CPD of the FL-AT. The approximate adder (AXA) is implemented
using the approximate full-adder (AFA) {Type1, Type-2, Type-3 and Type-4} of [3]. Post-
truncated approximate FX-AT (APX-FX-AT-PT) is obtained from approximate full-width
adder-tree (APX-FL-AT) to study the performance of the proposed FX-AT designs. Note that
APX-FL-AT-Type-4 is identical to the APX-AT of [7]. Structure of APX-FX-AT-PT is
shown in Fig.6(a) using accurate RCA and 3-bit approximate RCA for N = 8 and w = 8.
CHAPTER 4
SOFTWARE REQUIREMENT
4.1 Verilog HDL

HDL stands for Hardware Description Language. Using this, we can design modern
digital circuits. It is standardized on 1995 by IEEE.
4.2 Language Elements

Identifiers
Identifier contains the sequence of digits, letters, underscore, $. The very first letter
should be underscore or a letter.
e.g:
COUNTER
AND$
_DECODE
Keywords
It has some keywords. eg: always Escaped Identifier: It contains ASCII character in
identifier. e.g: Gate
Comments
The two ways for writing the comments are
a)/*Four-bit shift register*/
b) // Four-bit shift register
Format
Construct is written using one or multiple lines.
eg1:
initial
begin
A=0;
Y=0; …
end
Value Set
It contains four values. They are
1. 0 [logic 0]
2. 1 [logic 1]
3. Z [High impedance]
4. X [Unknown value]
System Task and Functions

It starts with character $.
eg : $ time -> It returns the time of current simulation
Compiler Directives
It starts with backquote character. eg: ‘define, ‘include etc.
4.3 Modeling
In Verilog HDL, three modelling are used to write the coding. They are
1) Gate level / structural modelling

2) Dataflow modelling
3) Behavioral modelling
4.3.1 Gate Level Modeling
Multiple Input Gates – Gate Primitives. These are

nand
and
or
nor
xnor
xor
The above gates have more number of inputs & only one output.
Syntax: gate type [instance name] (Output, input 1, input 2,.....input N)
Multiple Output Gates – Gate Primitives

These gates have one input and more number of outputs. These are
buf
not
Syntax: gate type [instance name] (out 1, out 2,......out N, input)
Tristate Gates
Tristate gates have 1 output, one data input and one control input. These are
buf if 1(output =z, if control input is 0)
buf if 0(output=z, if control is one, or data is transferred to output)
not if 1(output is z, if control is one, or output = ~(input))
not if 0(output is z, if control is one, or output = ~(input))
Pullup and Pulldown Gates

These gates have no input and only 1 output. Pull up gate produce one at its output and pull-
down gates produce zero at its output.
Syntax: pullup PUP (output), pulldown PDOWN (output)
Various MOS Switches

The various MOS switches are
cmos
n mos
p mos
r cmos
r nmos
r pmos
Syntax: type of gate [instance name] (output, input, control)
Various Bidirectional Switches

The various bidirectional switches are
tran
tranif1
tranif0
r tran
r tranif1
r tranif0
Syntax: type of gate [instance name] (signal 1, signal 2, control)
Gate Delays
The time taken for propagation of signal from the input of gate to the output of gate is
specified using gate delays.
Syntax: gate name [delay] [instance name] (terminal list);
Array of Instances
Syntax: gate name[delay] instance name[left bound : right bound](terminal list)
4.3.2 Dataflow Modeling

The combinational circuits are modeled using data flow model.
Continuous Assignments
This one assigns value to a net.
Syntax: assign target = expression;
Net Declaration Assignments

The continuous assignment is one part of net declaration itself.
wire y;
assign y= ‘b;
It is equivalent to wire y= ‘b;
Various Delays
assign #(3,6)z = x & y; ( Rise delay =3, Fall delay = 6, Transition delay = 3)
assign z = x & y; (No delay)
4.3.3 Behavioural Modelling
The behavioural modelling is the third type of modelling used in this HDL.
Procedural Constructs
Procedural constructs are of two types. They are
1) Initial statement
2) Always statement
1) Initial Statement
Syntax: initial [timing control] procedural statement;
2) Always Statement
Syntax: always [timing control] procedural statement;
The procedural statements are

 Wait statement
 Loop statement
 Conditional statement
 Case statement
4.4 Modelsim
Modelsim software simulates VHDL, Verilog HDL, System C and built in C debugger.
The simulation is performed with GUI. It uses an unified kernel for all supporting
languages simulation. It enables simulation, debugging & verification of supported
languages.
4.5 Quartus II
Quartus II software is introduced by Altera for PLD design. It analyses and synthesis
HDL language. The developer is able to compile the designs, do timing analysis, allows
the programmer to configuring the target device, examine the diagram of RTL,
implement HDL etc,.
The features include:

- For JTAG, generates STAPL / JAM files
- A tool named SOPC builder eliminates the integration of tasks manually by
producing interconnect logic automatically.
- Qsys uses optimized FPGA NOC architecture that increases the performance doubly
when compared to that of SOPC builder.
- DSP builder, a bridge tool between Quartus II and Simulink tool.
4.6 Xilinx FPGA
FPGA consists of configurable logic blocks along with configurable interconnects. The
design engineers configure such devices to design several tasks. The main advantage of
FPGA is that parallel execution of code. Also, the requirement of RAM is reduced. Another
advantage is that it does not rely on word length. Efficiency is maximum when utilizing
smallest word length. It also allows greater flexibility.
Fig4.1: FPGA Structure
FPGA architecture consists of 5 functional elements. They are

 CLBs which contains LUTs which are flexible.
- CLB will store data and performs larger logic functions
 Block RAM
- 18 Kilobit storage dual port
 Multiplier blocks
- Computes the multiplication of two 18- bit numbers
 I/O blocks
- Controls the data flowing
 DCM
-Provides solutions digitally for delaying, dividing, multiplying,
distributing and phase shifting of clock signals.
FEATURES
 90nm process technology
 Enhanced DDR support
 Up to 648 Kilobits of block RAM
 Frequency range: 5 MHz - 300 MHz
 Flexible and abundant logic resources
 Very low cost

 Frequency division, multiplication and synthesis
4.7 XILINX SOFTWARE
Overview on the software

The ISE® Design Suite controls all aspects of the design flow. Through the Project
Navigator interface, we can access all of the design entry and design implementation
tools. We can also access the files and documents associated with our project.
Project Navigator Interface

By default, the Project Navigator interface is divided into four panel sub-windows, as
seen in Figure 4.2. On the top left are the Start, Design, Files, and Libraries panels,
which include display and access to the source files in the project as well as access to
running processes for the currently selected source. The Start panel provides quick
access to opening projects as well as frequently access reference material,
documentation and tutorials. At the bottom of the Project Navigator are the Console,
Errors, and Warnings panels, which display status messages, errors, and warnings. To
the right is a multi-document interface (MDI) window referred to as the Workspace. The
Workspace enables us to view design reports, text files, schematics, and simulation
waveforms. Each window can be resized, undocked from Project Navigator, moved to a new
location within the main Project Navigator window, tiled, layered, or closed. We can use the
View > Panels menu commands to open or close panels. We can use the Layout > Load
Default Layout to restore the default window layout. These windows are discussed in more
detail in the following sections.
Fig4.2: Project Navigator

Design panel
The design panel provides access to the view, Hierarchy ad process panes
View Pane
The View pane radio buttons enable us to view the source modules associated with the
Implementation or Simulation Design View in the Hierarchy pane. If we select Simulation,
we must select a simulation phase from the drop-down
Hierarchy Pane
The Hierarchy pane displays the project name, the target device, user documents, and
design source files associated with the selected Design View. The View pane at the
top of the Design panel allows you to view only those source files associated with the
selected Design View, such as Implementation or Simulation.
Each file in the Hierarchy pane has an associated icon. The icon indicates the file type
(HDL file, schematic, core, or text file, for example). For a complete list of possible
sources types and their associated icons, see the “Source File Types” topic in the ISE
Help. From Project Navigator, select Help > Help Topics to view the ISE Help.
If a file contains lower levels of hierarchy, the icon has a plus symbol (+) to the left of
the name. We can expand the hierarchy by clicking the plus symbol (+). We can open a file
for editing by double-clicking on the filename. Processes Pane
The Processes pane is context sensitive, and it changes based upon the source
type selected in the Sources pane and the top-level source in our project. From the Processes
pane, we can run the functions necessary to define, run, and analyse your design. The
Processes pane provides access to the following functions:
• Design Summary/Reports
Provides access to design reports, messages, and summary of results data.
Message filtering can also be performed.
• Design Utilities
Provides access to symbol generation, instantiation templates, viewing
command line history, and simulation library compilation.
• User Constraints
Provides access to editing location and timing constraints.
• Synthesis
Provides access to Check Syntax, Synthesis, View RTL or Technology Schematic, and
synthesis reports. Available processes vary depending on the synthesis tools we use.
• Implement Design
Provides access to implementation tools and post-implementation analysis tools.
• Generate Programming File

Provides access to bitstream generation.
• Configure Target Device

Provides access to configuration tools for creating programming files and programming the
device.
The Processes pane incorporates dependency management technology. The tools keep track
of which processes have been run and which processes need to be run. Graphical status
indicators display the state of the flow at any given time. When you select a process in the
flow, the software automatically runs the processes necessary to get to the desired step. For
example, when you run the Implement Design process, Project Navigator also runs the
Synthesis process because implementation is dependent on up-to-date synthesis results. To
view a running log of command line arguments used on the current project, expand Design
Utilities and select View Command Line Log File Files Panel. The Files panel provides a
flat, sortable list of all the source files in the project. Files can be sorted by any of the
columns in the view. Properties for each file can be viewed and modified by right-clicking
on the file and selecting Source Properties.
Libraries Panel
The Libraries panel enables you to manage HDL libraries and their associated HDL source
files. You can create, view, and edit libraries and their associated sources.
Console Panel
The Console provides all standard output from processes run from Project Navigator. It
displays errors, warnings, and information messages. Errors are signified by a red X next to
the message; while warnings have a yellow exclamation mark (!).
Errors Panel
The Errors panel displays only error messages. Other console messages are filtered out.
Warnings Panel
The Warnings panel displays only warning messages. Other console messages are filtered
out.
Error Navigation to Source

You can navigate from a synthesis error or warning message in the Console, Errors, or
Warnings panel to the location of the error in a source HDL file. To do so, select the error
or warning message, right-click the mouse, and select Go to Source from the right-click
menu. The HDL source file opens, and the cursor moves to the line with the error. Error
Navigation to Answer Record You can navigate from an error or warning message in the
Console, Errors, or Warnings panel to relevant Answer Records on the Product Support and
Documentation page of the Xilinx® website. To navigate to the Answer Record, select the
error or warning message, right-click the mouse, and select Search for Answer Record from
the right-click menu. The default Web browser opens and displays all Answer Records
applicable to this message.
Workspace
The Workspace is where design editors, viewers, and analysis tools open. These include
ISE Text Editor, Schematic Editor, Constraint Editor, Design Summary/Report Viewer,
RTL and Technology Viewers, and Timing Analyzer. Other tools such as the PlanAhead™
tool for I/O planning and floorplanning, ISim, third-party text editors, Xpower Analyzer,
and iMPACT open in separate windows outside the main Project Navigator environment
when invoked.
Design Summary/Report Viewer
The Design Summary provides a summary of key design data as well as access to all of
the messages and detailed reports from the synthesis and implementation tools. The
summary lists high-level information about your project, including overview information, a
device utilization summary, performance data gathered from the Place and Route (PAR)
report, constraints information, and summary information from all reports with links to the
individual reports. A link to the System Settings report provides information on
environment variables and tool settings used during the design implementation. Messaging
features such as message filtering, tagging, and incremental messaging are also available
from this view
4.8 HDL Based Design:
Overview of HDL-Based Design
This chapter guides you through a typical HDL-based design procedure using a design
of a runner’s stopwatch. The design example used in this tutorial demonstrates many device
features, software features, and design flow practices you can apply to your own design.
This design targets a Spartan®-3A device; however, all of the principles and flows taught
are applicable to any Xilinx® device family, unless otherwise noted. The design is
composed of HDL elements and two cores. You can synthesize the design using Xilinx
Synthesis Technology (XST), Synplify/Synplify Pro, or Precision software
Required Software
To perform this tutorial, you must have Xilinx ISE® Design Suite installed. This tutorial
assumes that the software is installed in the default location c:\xilinx\release_number\
ISE_DS\ISE. If you installed the software in a different location, substitute your installation
path in the procedures that follow.
Note: For detailed software installation instructions, refer to the Xilinx Design Tools:
Installation and Licensing Guide (UG798) available from the Xilinx website
VERILOG
This software supports both VHDL and Verilog designs and applies to both designs
simultaneously, noting differences where applicable. You will need to decide which HDL
language you would like to work through for the tutorial and download the appropriate files
for that language. XST can synthesize a mixed-language design. However, this tutorial
does not cover the mixed language feature. Starting the ISE Design Suite To start the ISE
Design Suite, double-click the Project Navigator icon on your desktop, or select Start > All
Programs > Xilinx ISE Design Suite > Xilinx Design Suite 14 > ISE Design Tools >
Project Navigator.
Fig4.3: Project navigator Desktop
Creating a New Project

To create a new project using the New Project Wizard, do the following:
1. From Project Navigator, select File > New Project. The New Project Wizard appears.
Go to New project Wizard-> Create New Project Page
2. In the Location field, browse to c:\xilinx_tutorialor to the directory in which you
installed the project.
3. In the Name field, enter wtut_vhdor wtut_ver.
4. Verify that HDL is selected as the Top-Level Source Type, and click Next
The New Project Wizard—Device Properties page appears.

5. Select the following values in the New Project Wizard—Device Properties page:
• Product Category: All
• Family: Spartan3A and Spartan3AN
• Device: XC3S700A
• Package: FG484
• Speed: -4
• Synthesis Tool: XST (VHDL/Verilog)
• Simulator: ISim (VHDL/Verilog)
• Preferred Language: VHDL or Verilog depending on preference.
This will determine the default language for all processes that generate HDL files.
Other properties can be left at their default values.
6. Click Next, then Finish to complete the project creation.
7. Stopping the Tutorial, we may stop the tutorial at any time and save our work by
selecting File >Save All. Design Description The design used in this tutorial is a
hierarchical, HDL-based design, which means that the top-level design file is an HDL
file that references several other lower-level macros. The lower-level macros are
either HDL modules or IP modules. The design begins as an unfinished design.
Throughout the tutorial, we will complete the design by generating some of the
modules from scratch and by completing others from existing files. When the design
is complete, we will simulate it to verify the design functionality. In the runner’s
stopwatch design, there are five external inputs and four external output buses. The
system clock is an externally generated signal.
Inputs
The following are input signals for the tutorial stopwatch design:
• strtstop
Starts and stops the stopwatch. This is an active low signal which acts like the start/
stop button on a runner’s stopwatch.
• reset
Puts the stopwatch in clocking mode and resets the time to 0:00:00.
• clk
Externally generated system clock.
• mode
Toggles between clocking and timer modes. This input is only functional while the
clock or timer is not counting.
• lap_load
This is a dual function signal. In clocking mode, it displays the current clock value
in the ‘Lap’ display area. In timer mode, it loads the pre-assigned values from the
ROM to the timer display when the timer is not counting.
Outputs
The following are outputs signals for the design:
• lcd_e, lcd_rs, lcd_rw
These outputs are the control signals for the LCD display of the Spartan-3A demo
board used to display the stopwatch times.
• sf_d[7:0]
Provides the data values for the LCD display.
Functional Blocks
The completed design consists of the following functional blocks:
• clk_div_262k
Macro that divides a clock frequency by 262,144. Converts 26.2144 MHz clock into
100 Hz 50% duty cycle clock.
• dcm1
Clocking Wizard macro with internal feedback, frequency controlled output, and
duty-cycle correction. The CLKFX_OUT output converts the 50 MHz clock of the
Spartan-3A demo board to 26.2144 MHz.
• debounce
Schematic module implementing a simplistic debounce circuit for the
strtstop, mode, and lap_load input signals.
• lcd_control
Module controlling the initialization of and output to the LCD display.
• statmach
State machine HDL module that controls the state of the stopwatch.
• timer_preset
CORE Generator™ tool 64x20 ROM. This macro contains 64 preset times from
0:00:00 to 9:59:99 that can be loaded into the timer.
• time_cnt
Up/down counter module that counts between 0:00:00 to 9:59:99 decimal. This macro
has five 4-bit outputs, which represent the digits of the stopwatch time.
Synthesizing the Design

So far, we have been using Xilinx Synthesis Technology (XST) for syntax checking. Next,
we synthesize the design using either XST, Synplify/Synplify Pro, or Precision software. The
synthesis tool uses the design’s HDL code and generates a supported netlist type (EDIF or
NGC) for the Xilinx implementation tools. The synthesis tool performs the following general
steps (although all synthesis tools further break down these general steps) to create the netlist:
• Analyze/Check Syntax
Checks the syntax of the source code.
• Compile
Translates and optimizes the HDL code into a set of components that the
synthesis tool can recognize.
• Map
Translates the components from the compile stage into the target technology’s primitive
components. The synthesis tool can be changed at any time during the design flow. To
change the synthesis tool, do the following:
1. In the Hierarchy pane of the Project Navigator Design panel, select the
targeted part.
2. Right-click and select Design Properties.
3. In the Design Properties dialog box, click the Synthesis Tool value and use the
pull-down arrow to select the desired synthesis tool from the list
Note: If we do not see our synthesis tool among the options in the list, we may not
have the software installed or may not have it configured in the ISE Design Suite. The
synthesis tools are configured in the Preferences dialog box. Select Edit >
Preferences, expand ISE General, and click Integrated Tools.
Changing the design flow results in the deletion of implementation data. We have not yet
created any implementation data in this tutorial. For projects that contain
implementation data, Xilinx recommends that we make a copy of the project using File >
Copy Project if one would like to make a backup of the project before continuing.
Using XST Now that we have created and analyzed the design, the next step is to
synthesize the design. During synthesis, the HDL files are translated into gates and
optimized for the target architecture.Processes available for synthesis using XST are as
follows:
• View RTL Schematic

-Generates a schematic view of our RTL netlist.
• View Technology Schematic

-Generates a schematic view of our technology netlist.
• Check Syntax
-Verifies that the HDL code is entered properly.
• Generate Post-Synthesis Simulation Model

-Creates HDL simulation models based on the synthesis netlist.
Entering Synthesis Options
Synthesis options enable us to modify the behavior of the synthesis tool to

make optimizations according to the needs of the design. One commonly used option is to
control synthesis to make optimizations based on area or speed. Other options include
controlling the maximum fanout of a flip-flop output or setting the desired frequency of the
design. To enter synthesis options, do the following:
1. In the Hierarchy pane of the Project Navigator Design panel, select
stopwatch.vhd (or stopwatch.v).
2. In the Processes pane, right-click the Synthesize process, and select Process
Properties.
3. Under the Synthesis Options tab, set the Netlist Hierarchy property to a value of
Rebuilt.
Note: To use this property, we must set the Property display level to Advanced.
4. Click OK.
Now we are ready to synthesize our design. To take the HDL code and generate a compatible
netlist, do the following:
1. In the Hierarchy pane, select stopwatch.vhd(or stopwatch.v).
2. In the Processes pane, double-click the Synthesize process
3. Using the RTL/Technology Viewer XST can generate a schematic representation
of the HDL code that we have entered. A schematic view of the code helps us
analyze our design by displaying a graphical connection between the various
components that XST has inferred. Following are the two forms of schematic
representation:
• RTL View: Pre-optimization of the HDL code.
• Technology View: Post-synthesis view of the HDL design mapped to the target
technology.
To view a schematic representation of the HDL code, do the following:
1. In the Processes pane, expand Synthesize, and double-click View RTL Schematic
or View Technology Schematic.
2. If the Set RTL/Tech Viewer Startup Mode dialog appears, select Start with the
Explorer Wizard.
3. In the Create Schematic start page, select the clk_divider and lap_load_debounce
components from the Available Elements list, and then click the Add button to move
the selected items to the Selected Elements list.
4. Click Create Schematic. The schematic viewer allows us to select the portions of
the design to display as schematics. When the schematic is displayed, double-click on
the symbol to push into the schematic and view the various design elements and
connectivity. Right-click the schematic to view the various operations that can be
performed in the schematic viewer.
Figure 4.4: RTL Schematic

CHAPTER 5
RESULTS & SIMULATION
CHAPTER 6
CONCLUSION
In this research, a novel scheme is presented to obtain fixed width AT design using truncated
input. A bias estimation formula based on probabilistic approach is presented to compensate
the truncation error. Based on the proposed scheme, two separate fixed-width AT designs are
derived. Both the proposed designs offer a substantial amount of area and CPD saving over
the existing fixed-width AT designs. For vector sizes 8 and 16, the proposed ITFX-AT offers
(37%,23%,22%) and (51%,30%,27%) ADP saving for word-length sizes (8,12,16),
respectively, and calculates the output almost with the same accuracy as the post-truncated
fixed-width AT which has the highest accuracy among the existing fixed-width AT. Further,
we observed that Walsh-Hadamard transform based on proposed adder design reconstruct
higher-texture images with higher PSNR and moderate-texture images with almost the same
PSNR compared to those obtained from the existing fixed-width adder designs. Besides, the
use of truncated input samples in fixed-width AT design is an interesting feature which
creates an additional advantage to optimize other blocks appear at the upstream of the AT in a
complex design.
Advantages:
 Proposed system used truncated input for efficient fixed width adder tree design
 Better Area Delay Product(ADP)
 Using bias estimation formula based on probabilistic approach for compensating the
truncation error.
REFERENCES
[1] B. K. Mohanty and V. Tiwari, “Modified probabilistic estimation bias formulation for
hardware efficient fixed-width Booth multiplier”, Circuits, Systems and Signal
Processing, Springer, vol.33, no.12, pp. 3981– 3994, Dec., 2014
[2] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie and C. Lucas, “Bioinspired imprecise

computational blocks for efficient VLSI implementation of soft-computing
applications”, IEEE Transactions on Circuits and Systems-I, Regular Papers, vol. 57,
no. 4, pp. 850–862, Apr.2010.
[3] V. Gupta, D. Mahapatra, A. Raghunathan, and K. Roy, “Low-power digital signal

processing using approximate adders”, IEEE Transactions on Computer Aided Design of
Integrated Circuits and Systems, vol. 32, no. 1, pp. 124–137, Jan.2013.
[4] J. Liang, J. Han and F. Lombardi, “New metrics for the reliability of approximation and
probabilistic adders”, IEEE Transactions on Computers, vol. 62, no. 9, pp. 1760–1771,
Sept.2013.
[5] H. Jiang, J. Han, F. Qiao and F. Lambardi, “Approximate radix-8 Booth multipliers for
low-power and high-performance operations”, IEEE Transactions on Computers, vol.
65, no. 8, pp. 2638–2644, Aug.2016.
[6] Y. Pan and P. K. Meher, “Bit-level optimization of adder-trees for multiplie constant
multiplications for effcient FIR filter implementation”, IEEE Transactions on Circuits
and Systems-I, Regular Papers, vol. 61, no. 2, pp. 455–462, Feb.2014.
[7] R. O. Julio, L. B. Soares, E. A. C. Costa and S. Bampi, “Energy-efficient Gaussian filter

for image processing using approximate adder”, In. Proc. International Conference on
Electronics, Circuits and Systems, 2015, pp.450-453, 2015.
[8] D. Gizopoulos, M. Psarakis, A. Paschalis, and Y. Zorian, “Easily Testable Cellular Carry
Lookahead Adders,” Journal of Electronic Testing: Theory and Applications 19, 285-
298, 2003.
[9] S. Xing and W. W. H. Yu, “FPGA Adders: Performance Evaluation and Optimal
Design,” IEEE Design & Test of Computers, vol. 15, no. 1, pp. 24-29, Jan. 1998.
[10] M. Bečvář and P. Štukjunger, “Fixed-Point Arithmetic in FPGA,” ActaPolytechnica, vol.

45, no. 2, pp. 67-72, 2005.
[11] P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a
General Class of Recurrence Equations,” IEEE Trans. on Computers, Vol. C-22, No 8,
August 1973.
[12] [6]. P. Ndai, S. Lu, D. Somesekhar, and K. Roy, “Fine-Grained Redundancy in Adders,”
Int. Symp. on Quality Electronic Design, pp. 317-321, March 2007.
[13] [7]. T. Lynch and E. E. Swartzlander, “A Spanning Tree Carry Lookahead Adder,” IEEE
Trans. on Computers, vol. 41, no. 8, pp. 931-939, Aug. 1992. [8]. N. H. E. Weste and D.
Harris, CMOS VLSI Design, 4th edition, Pearson–AddisonWesley, 2011.
[14] [9]. R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Trans.
Comput., vol. C-31, pp. 260-264, 1982.
[15] [10]. D. Harris, “A Taxonomy of Parallel Prefix Networks,” in Proc. 37th Asilomar
Conf. Signals Systems and Computers, pp. 2213–7, 2003.
[16] https://homepages.cae.wisc.edu/ ece533/images/
[17] http://seamless-pixels.blogspot.in/2014/07/grass-2-turf-lawn-greenground-field.html
[18] https://www.decoist.com/2013-02-28/flower-beds/
[19] http://www.cosasexclusivas.com/2014/06/daily-overview-el-planetatierra-visto.html
[20] https://healthtipsfr.blogspot.com/2017/04/blog-post−40.html
[21] http://www.ugaoo.com/knowledge-center/how-to-design-a-flower-bed/

9 .Efficient Design For Fixed Width Adder

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

9 .Efficient Design For Fixed Width Adder

Uploaded by

Copyright:

Available Formats

Efficient Design for Fixed-Width Adder-Tree

1.2 LITERATURE SURVEY

Energy-efficient signal processing via algorithmic noise-tolerance, by R. Hegde; N.R.

Inexact designs for approximate low power addition by cell replacement, by H. A. F.

A low-power, high-performance approximate multiplier with configurable partial error

2.1.1 Two’s Complement Representation

2.1.4 Carry-Propagate Adder

The carry output of each FA as shown in Figure 1 is given by:

Fig2.1 Ripple-carry adder

2.1.4.2 Carry-Look ahead Adder

For example, consider a four bit adder

Fig2.2 Carry out of Carry Lookahead Adder

The sum output of each column is given in Figure 3.

Fig2.3: Sum of Carry Lookahead Adder

2.1.5 Redundant Adders

2.1.5.1 Carry-Save Adder

Fig2.4 Function of Carry-save adder

Fig2.5 CSA used in n bit numbers

In Figure 5, note that all full adders are independent

Table 2.1 CSA Computation

with a digit set (- alpha, . . ., -1, 0, 1, . . ., alpha).

Here the addition algorithm is not mention in detail.

Fig2.7 First step of Signed-digit Addition

Step 2: compute s = w + t At the digital level

We can compute si without produce a carry, as shown in Figure 8.

Fig2.8 Second step of Sign-digit Addition

Finally, we can conclude SDA structure, as shown in Figure 9.

Fig2.9 Sign-digit Addition

2.1.6.1 Carry Save Adder Tree

Fig2.10 A [ p :2] adder

where k is the number of input operands.

2.1.6.2 Wallace tree

Fig2.13 Example of Reduction by Columns

3.1 EXISTING SYSTEM:

3.3 Truncated Fixed-width Adder-Tree

The quantized value of σ is calculated as

The quantized value of σminor is estimated as:

4.1 Verilog HDL

4.2 Language Elements

System Task and Functions

1) Gate level / structural modelling

4.3.1 Gate Level Modeling

Multiple Input Gates – Gate Primitives. These are

Multiple Output Gates – Gate Primitives

Pullup and Pulldown Gates

Various MOS Switches

Various Bidirectional Switches

4.3.2 Dataflow Modeling

Net Declaration Assignments

4.3.3 Behavioural Modelling

The procedural statements are

The features include:

4.6 Xilinx FPGA

FPGA architecture consists of 5 functional elements. They are

 90nm process technology

 Enhanced DDR support

 Up to 648 Kilobits of block RAM

 Frequency range: 5 MHz - 300 MHz

 Flexible and abundant logic resources

 Very low cost

4.7 XILINX SOFTWARE

Overview on the software

Project Navigator Interface