
CHAPTER-1 INTRODUCTION

The residue number system (RNS) has been employed for efficient parallel carry-free arithmetic computations (addition, subtraction, and multiplication) in DSP applications, because the computations for each residue channel can be done independently, without carry propagation between channels. A residue number system is defined by a set of N integer constants {m1, m2, m3, ..., mN}, referred to as the moduli. Let M be the least common multiple of all the mi. Any arbitrary integer X smaller than M can be represented in the defined residue number system as a set of N smaller integers {x1, x2, x3, ..., xN}, with xi = X modulo mi representing the residue class of X with respect to that modulus. Because RNS-based computations can achieve significant speedup over binary-system-based computation, they are widely used in DSP processors, FIR filters, and communication components.

Arithmetic modulo 2^n + 1 is one of the most common RNS operations and is also used in pseudorandom number generation and cryptography. Modulo 2^n + 1 addition is the most crucial step for the commonly used moduli sets, such as {2^n - 1, 2^n, 2^n + 1}, {2^n - 1, 2^n, 2^n + 1, 2^(2n) + 1} and {2^n - 1, 2^n, 2^n + 1, 2^(n+1) + 1}.

Many methods have been reported to speed up modulo 2^n + 1 addition. Depending on the input/output data representation, these methods can be classified into two categories, namely diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation.

The general operations in modulo 2^n + 1 addition, including diminished-1 and weighted modulo addition, have been discussed in the literature, and parallel-prefix adders have been proposed for diminished-1 modulo 2^n + 1 addition. To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition. The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. This leads to increased hardware cost.

Vergos and Efstathiou proposed a unified approach for weighted and diminished-1 modulo 2^n + 1 addition. This approach is based on making the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition of Y and U. Vergos and Bakalis first use translators to decrease the sum of two n-bit inputs A and B by 1 and then perform the weighted modulo 2^n + 1 addition using diminished-1 adders. It should be noted that, for the latter architecture, the range of the two inputs A and B is smaller than for the former (i.e., {0, 2^n - 1} versus {0, 2^n}).

In this work, we propose an improved area-efficient weighted modulo 2^n + 1 adder design using diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant 2^n + 1 from the sum of two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed using parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. Compared with the previous work, the area cost of the proposed adders is lower. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition.

CHAPTER-2

AIM AND SCOPE OF PROJECT


In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation.

To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition. The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. This leads to increased hardware cost. The Brent-Kung tree-based prefix structure uses less area than the Sklansky-style prefix structure.

The unified approach for weighted and diminished-1 modulo 2^n + 1 addition is based on making the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition of Y and U. Vergos and Bakalis first used translators to decrease the sum of two n-bit inputs A and B by 1 and then performed the weighted modulo 2^n + 1 addition using diminished-1 adders. In this design, the two previous styles of modulo 2^n + 1 adder (diminished-1 and weighted) are combined to reduce the area and improve the performance.

CHAPTER-3 EXISTING METHODOLOGY



3.1 THEORY
Residue arithmetic has been used in digital computing systems for many years. In particular, arithmetic modulo 2^n + 1 plays an important role in a variety of applications. Modulo 2^n + 1 arithmetic is most commonly met in the residue number system (RNS), an arithmetic system well suited to applications in which the operations are limited to addition, subtraction and multiplication, which is a common case for several digital signal processor (DSP) algorithms. The RNS has been used for the design of digital signal processors, finite-impulse response (FIR) filters and communication components [16].

Three-moduli sets of the form {2^n - 1, 2^n, 2^n + 1} have received significant attention as the RNS base, mainly because of the existence of efficient residue-to-binary converters. Addition in such systems is performed using three channels: a modulo 2^n - 1 adder (equivalently, a ones' complement adder), a modulo 2^n adder and a modulo 2^n + 1 adder. From this, we conclude that the design of an efficient modulo 2^n + 1 adder is a vital task in RNS-based applications that include a modulus of this form. Unfortunately, in an RNS that uses the three-moduli set {2^n - 1, 2^n, 2^n + 1}, the modulo 2^n + 1 channel becomes the execution-rate bottleneck, since it has to deal with (n + 1)-bit operands, while the other two channels operate on n-bit ones.

The diminished-1 representation was introduced to alleviate this problem, by representing each operand decreased by one compared to its weighted representation and by deriving the results in an alternative manner when one or both operands or the result are zero. The diminished-1 sum S* of two diminished-1 operands A* = A - 1 and B* = B - 1 is then computed by a diminished-1 adder, which is an adder that increments the integer sum of A* and B* whenever the carry flag of their integer addition is not set. A diminished-1 adder can be derived by connecting the inverted carry output of an integer adder back to its carry input. However, such solutions are inefficient due to the resulting oscillations. Therefore, a number of efficient architectures that do not suffer from oscillations have been proposed.

The need for handling zero operands and results separately, as well as the need for time- and hardware-consuming input (output) translators from (to) the weighted to (from) the diminished-1 representation, makes the use of the diminished-1 representation efficient only when a large number of calculations take place before a new conversion is required. In all other cases, including all applications apart from RNS implementations, modulo adders with operands in weighted representation are more suitable. Efficient architectures for modulo adders with operands in weighted representation have also been proposed.

These two cases, namely modulo adders that operate on operands in the diminished-1 representation (hereafter called diminished-1 adders) and those that operate on operands in a weighted representation (hereafter called weighted adders), have so far been considered distinct, and efficient architectures for them have been studied independently. It has been shown that these two alternatives can be unified. Given two (n + 1)-bit numbers A and B, the problem attacked is that of computing two n-bit numbers Y and U such that A + B is congruent to Y + U + 1 modulo 2^n + 1. It is shown that this problem has a constant-time solution, enabling every architecture that has been or will be proposed for diminished-1 addition to also be used for the addition of operands in the weighted representation. The required unifying arithmetic operator is just a simplified inverted end-around carry-save-adder (CSA) stage [12], [15].

Fig 3.1 CSA stage with inverted end-around carry

3.2 REVIEW OF TWO PREVIOUS WEIGHTED MODULO 2^n + 1 ADDERS

Given two (n + 1)-bit numbers A and B, where 0 <= A, B <= 2^n, the diminished-1 values of A and B are denoted by A* = A - 1 and B* = B - 1, respectively. The diminished-1 sum S* can be computed by

$S^* = |S - 1|_{2^n+1} = |A^* + B^* + 1|_{2^n+1} = |A^* + B^*|_{2^n} + \overline{c_{out}}$    (1)

where |X|_Z denotes X modulo Z, and $\overline{c_{out}}$ is the inverted end-around carry of the modulo 2^n sum of the n-bit values A* and B*.
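The diminished-1 addition rule in (1) can be checked numerically. The following is a minimal behavioural sketch, not a hardware description: the function name `diminished1_add` and the returned zero flag are illustrative assumptions, and the sketch only covers non-zero operands (A, B >= 1), since zero has no diminished-1 code.

```python
def diminished1_add(a_star, b_star, n):
    """Diminished-1 modulo 2^n + 1 addition of two n-bit codes A* and B*.

    Returns (s_star, is_zero). is_zero is True when the true sum is
    congruent to 0, which has no diminished-1 code (assumed convention).
    """
    mask = (1 << n) - 1
    total = a_star + b_star        # plain integer addition of the codes
    c_out = total >> n             # carry out of the n-bit addition
    if total == mask:
        # A + B = 2^n + 1, so the weighted result is 0.
        return None, True
    # Add the inverted carry: increment only when no carry was produced.
    s_star = (total & mask) + (1 - c_out)
    return s_star, False


if __name__ == "__main__":
    n = 4
    s_star, is_zero = diminished1_add(9 - 1, 12 - 1, n)
    print(s_star + 1, (9 + 12) % ((1 << n) + 1))
```

The special case `total == mask` is where a real diminished-1 design needs its zero handling; the remaining cases follow (1) directly.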

3.2.1 VERGOS AND EFSTATHIOU

In this method, the congruent modulo sum of A + B is first computed to produce Y and U, and the final modulo sum is then performed by any diminished-1 modulo adder. Suppose A and B are two (n + 1)-bit input numbers, i.e., A = a_n a_{n-1} ... a_0 = a_n 2^n + A_n and B = b_n b_{n-1} ... b_0 = b_n 2^n + B_n, where 0 <= A, B <= 2^n, and A_n and B_n are two n-bit numbers; then

$|A + B|_{2^n+1} = ||A_n + B_n + D + 1|_{2^n+1} + 1|_{2^n+1} = |Y + U + 1|_{2^n+1}$

where $D = 2^n - 4 + 2\overline{c_{n+1}} + \overline{s_n}$, which is equivalent to the bit string 11...1 followed by $\overline{c_{n+1}}$ and $\overline{s_n}$. Here c_{n+1} = a_n AND b_n and s_n = a_n XOR b_n, and their complements are the bits of D with binary weights 2^1 and 2^0, respectively. The first step of this equation computes a modulo 2^n + 1 carry-save addition, giving the carry vector Y and the sum vector U, where $Y = y_{n-2} y_{n-3} \ldots y_0 \overline{y_{n-1}}$ and $U = u_{n-1} u_{n-2} \ldots u_0$ are produced by adding A_n, B_n, and D. It can be seen that the bits of D with binary weights 2^2 through 2^{n-1} are all 1, which simplifies the adders that produce the carries and sums: OR and XNOR gates can be used directly for every such bit position. For the bits of D with binary weights 2^1 and 2^0, the adders should be modified to accept the values $\overline{c_{n+1}}$ and $\overline{s_n}$, respectively.
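The congruence above can be checked exhaustively for small n. The sketch below is a behavioural model under stated assumptions: the function names are illustrative, D is built from the complemented bits as in the expression for D, and the carry vector is rotated left with its wrapped most significant bit inverted, which is how an inverted end-around carry-save stage is modelled here.

```python
def ve_operands(a, b, n):
    """Split (n+1)-bit inputs into A_n, B_n and the constant-like term D."""
    mask = (1 << n) - 1
    a_n, b_n = a >> n, b >> n          # most significant bits a_n, b_n
    c = a_n & b_n                      # c_{n+1}
    s = a_n ^ b_n                      # s_n
    d = (1 << n) - 4 + 2 * (1 - c) + (1 - s)
    return a & mask, b & mask, d


def ve_mod_add(a, b, n):
    """Evaluate ||A_n + B_n + D + 1| + 1| modulo 2^n + 1."""
    m = (1 << n) + 1
    low_a, low_b, d = ve_operands(a, b, n)
    return ((low_a + low_b + d + 1) % m + 1) % m


def ve_carry_save(a, b, n):
    """Carry-save stage with inverted end-around carry, returning (Y, U)."""
    mask = (1 << n) - 1
    low_a, low_b, d = ve_operands(a, b, n)
    u = low_a ^ low_b ^ d                                    # sum vector U
    carries = (low_a & low_b) | (low_a & d) | (low_b & d)    # y_{n-1} ... y_0
    y_msb = (carries >> (n - 1)) & 1
    # Rotate left by one position; the wrapped bit y_{n-1} is inverted.
    y = ((carries << 1) & mask) | (1 - y_msb)
    return y, u


if __name__ == "__main__":
    n = 4
    y, u = ve_carry_save(13, 9, n)
    print((y + u + 1) % 17, ve_mod_add(13, 9, n), (13 + 9) % 17)
```

Both `ve_mod_add` and the value `(Y + U + 1) mod (2^n + 1)` should agree with the direct modular sum for every admissible input pair.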

Fig 3.2 Architecture of Vergos and Efstathiou

3.2.2 VERGOS AND BAKALIS

In this method, the sum of the two n-bit inputs A and B is first decreased by 1 to produce two values A' and B' with A' + B' = A + B - 1, and the modulo 2^n + 1 sum of A and B can then be performed by any diminished-1 architecture, as follows:

$||A + B|_{2^n+1}|_{2^n} = |A' + B'|_{2^n} + \overline{c_{out}}$

The value $\overline{c_{out}}$ is the inverted end-around carry produced by A' + B', and the architecture is shown in Fig 3.3. The architecture of Vergos and Efstathiou makes use of a constant-time operator, composed of a simplified carry-save adder stage, leading to efficient modulo 2^n + 1 adders. The architecture of Vergos and Bakalis can be applied in the design of area-efficient residue generators and multioperand modulo adders. However, in the former, the values subtracted from the inputs A and B are not constants, and in the latter, the way to implement the translator that decreases the sum of the two inputs by 1 was not described. The range of the two inputs A and B in the latter is also smaller than in the former (i.e., {0, 2^n - 1} versus {0, 2^n}) [1].

Fig 3.3 Architecture of Vergos and Bakalis

3.3 DIMINISHED-1 ADDER

A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation, if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions that are based on the use of binary adders and/or weighted modulo 2^n + 1 adders in both area and delay terms. The diminished-1 adder used here is a Sklansky-style adder, shown in Fig 3.4. The Sklansky-style parallel-prefix operation requires N/2 operations at each stage of the tree. Since all operations at a given stage of the tree are completely independent, they can be run in parallel. This is what makes the technique attractive for parallelizing associative functions. The Sklansky-style structure uses more area than the Brent-Kung tree prefix structure [1].
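The N/2 operations per stage can be seen in a small software model of a Sklansky prefix tree. This is a minimal sketch of a plain binary adder, assuming N is a power of two; it does not include the modulo correction circuits, and the function name `sklansky_add` and its return values are illustrative.

```python
def sklansky_add(a, b, n, cin=0):
    """Add two n-bit integers with a Sklansky prefix tree (n a power of two).

    Returns (sum modulo 2^n, carry_out, prefix operations per stage).
    """
    # Preprocessing: bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    ops_per_stage = []
    span = 1
    while span < n:
        ops = 0
        for i in range(n):
            if i & span:
                # i lies in the upper half of its block of size 2 * span;
                # j is the last index of the lower half of that block.
                j = (i & ~(span - 1)) - 1
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
                ops += 1
        ops_per_stage.append(ops)
        span *= 2
    # Postprocessing: carries into each position, then the sum bits.
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(n)]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n], ops_per_stage


if __name__ == "__main__":
    print(sklansky_add(0b10110101, 0b01101110, 8))
```

Each stage only reads nodes from the lower half of a block, which the same stage never writes, so all operations of a stage are independent and could run in parallel. For n = 8 the model performs three stages of four prefix operations each.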

Fig 3.4 Sklansky-style parallel-prefix structure

Fig 3.5 Basic cells in the Sklansky-style structure

The Sklansky prefix tree takes the fewest logic levels to compute the carries. It also uses fewer cells than the Kogge-Stone structure, at the cost of higher fan-out, but it uses a larger area than the Brent-Kung parallel-prefix structure. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of the Brent-Kung structure, in which the number of logic levels is reduced and the fan-out is increased. Fig 3.4 shows the Sklansky-style parallel-prefix structure with correction circuits for the proposed weighted modulo 2^8 + 1 adder. The square (□) and diamond (◇) nodes denote the pre- and postprocessing stages of the operands, respectively. The black nodes (●) evaluate the prefix operator, and the white nodes (○) pass the unchanged signals to the next prefix level [1].


CHAPTER-4 PROPOSED SYSTEM

4.1 INTRODUCTION


This chapter presents an area-efficient weighted modulo 2^n + 1 adder design using diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant 2^n + 1 from the sum of two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed using parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. The area cost of the proposed adders is lower than that of the previous work. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition.

4.2 THEORY

Instead of subtracting D, which is not a constant, from the sum of A and B, as in the method of Vergos and Efstathiou, we add the constant -(2^n + 1) to the sum of A and B. In addition, we allow the two inputs A and B to be in the range {0, 2^n}, which is 1 more than the range {0, 2^n - 1} in the method of Vergos and Bakalis. The design of the proposed weighted modulo 2^n + 1 adder is presented next. Consider two (n + 1)-bit inputs A = a_n a_{n-1} ... a_0 and B = b_n b_{n-1} ... b_0, where 0 <= A, B <= 2^n. The weighted modulo 2^n + 1 sum of A and B can be represented as follows
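One pair of expressions consistent with this description is given here as a reconstruction under stated assumptions; the exact form and numbering used in the source are not known. Since 0 <= A, B <= 2^n, the sum satisfies 0 <= A + B <= 2^{n+1} < 2(2^n + 1), so at most one subtraction of the modulus is needed. Reducing the shifted sum modulo 2^n then gives the second form, which holds for A + B > 0.

```latex
|A + B|_{2^n+1} =
\begin{cases}
A + B - (2^n + 1), & \text{if } A + B > 2^n,\\
A + B,             & \text{otherwise,}
\end{cases}
\qquad
|A + B|_{2^n+1} =
\begin{cases}
\left|A + B - (2^n + 1)\right|_{2^n},     & \text{if } A + B > 2^n,\\
\left|A + B - (2^n + 1)\right|_{2^n} + 1, & \text{if } 0 < A + B \le 2^n.
\end{cases}
```

In the second form the added 1 in the lower case plays the role of the inverted end-around carry: it is applied exactly when subtracting 2^n + 1 does not leave a non-negative result.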


From these equations, it can easily be seen that the value of the weighted modulo 2^n + 1 addition can be obtained by first subtracting (2^n + 1) from the sum of A and B (i.e., adding the (n + 1)-bit constant 0111...1) and then using the diminished-1 adder to get the final modulo sum, with the inverted end-around carry as the carry-in. We now present the method of weighted modulo 2^n + 1 addition of A and B. Denoting Y and U as the carry and sum vectors of the summation of A, B and -(2^n + 1), where Y = y_{n-2} y_{n-3} ... y_0 y_{n-1} and U = u_{n-1} u_{n-2} ... u_0, the modulo addition can be expressed as follows:
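As an illustration of the idea rather than the source's own expression, the following arithmetic-level sketch adds the constant 0111...1 to A + B, takes the carry out of bit position n as the end-around carry, and adds its inverse to the n low-order bits. It is a behavioural model only: the carry and sum vectors and the FIX correction are not modelled, the all-zero input is special-cased as an assumption, and the name `weighted_mod_add` is hypothetical.

```python
def weighted_mod_add(a, b, n):
    """Weighted modulo 2^n + 1 addition of two (n+1)-bit inputs in [0, 2^n]."""
    limit = 1 << n
    if not (0 <= a <= limit and 0 <= b <= limit):
        raise ValueError("inputs must lie in the range 0 .. 2^n")
    if a == 0 and b == 0:
        # Assumed special case: the hardware correction for this input
        # is not modelled in this sketch.
        return 0
    # Adding 0111...1 (= 2^n - 1) equals subtracting 2^n + 1 modulo 2^(n+1).
    s = a + b + (limit - 1)
    c_out = (s >> (n + 1)) & 1        # set exactly when a + b >= 2^n + 1
    low = s & (limit - 1)             # n low-order bits of the shifted sum
    return low + (1 - c_out)          # add the inverted end-around carry


if __name__ == "__main__":
    n = 4
    print(weighted_mod_add(16, 16, n), (16 + 16) % 17)
    print(weighted_mod_add(3, 5, n), (3 + 5) % 17)
```

Unlike the diminished-1 case, the inputs here are ordinary weighted values, so the result 2^n (when A + B = 2^n) is produced directly as an (n + 1)-bit output.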


For i = 0 to n - 2, the values of y_i and u_i can be expressed as y_i = a_i OR b_i and u_i = NOT (a_i XOR b_i), respectively, because the corresponding bit of the added constant is 1. Since Y and U are only n bits wide, the values of y_{n-1} and u_{n-1} must be computed taking a_n, b_n, a_{n-1}, and b_{n-1} into consideration (i.e., y_{n-1} and u_{n-1} are the carry and the sum produced by 2a_n + 2b_n + a_{n-1} + b_{n-1} + 1, respectively). It should be noted that 0 <= A, B <= 2^n, which means that a_n = a_{n-1} = 1 or b_n = b_{n-1} = 1 would cause the value of A or B to exceed the range {0, 2^n}. Thus, these input combinations are not allowed and can be viewed as don't-care conditions, which helps simplify the circuits for generating y_{n-1} and u_{n-1}. Consequently, the maximum value of 2a_n + 2b_n + a_{n-1} + b_{n-1} + 1 is 5, which occurs at a_n = b_n = 1 (i.e., the maximum value of y_{n-1} is 2). A correction signal FIX is needed because, under some conditions, y_{n-1} = 2 (e.g., a_n = b_n = 1 and a_{n-1} = b_{n-1} = 0), which cannot be represented by a 1-bit line; therefore, the value of y_{n-1} is set to 1, and the remaining carry value (i.e., 1) is assigned to FIX [1].

4.3 RESIDUE NUMBER SYSTEM

A basic number system consists of a correspondence between sequences of digits and numbers. In a fixed-point number system, each sequence corresponds to exactly one number, and the radix point (the "decimal point" in the ordinary decimal number system) that is used to separate the integral and fractional parts of a representation is in a fixed position. In contrast, in a floating-point number system, a given sequence may correspond to several numbers: the position of the radix point is not fixed, and each position of the radix point in a digit sequence indicates the particular number represented. Usually, floating-point systems are used for the representation of real numbers, and fixed-point systems are used to represent integers (in which the radix point is implicitly assumed to be at the right-hand end) or as parts of floating-point representations, but there are a few exceptions to this general rule. Almost all applications of RNS are as fixed-point number systems.

If we consider a number such as 271.834 in the ordinary decimal number system, we can observe that each digit has a weight that corresponds to its position: hundreds for the 2, tens for the 7, and so on down to thousandths for the 4. This number system is therefore an example of a positional (or weighted) number system; residue number systems, on the other hand, are non-positional. The decimal number system is also a single-radix (or fixed-radix) system, as it has only one base (i.e. ten). Although mixed-radix (i.e. multiple-radix) systems are relatively rare, there are a few useful ones. Indeed, for the purposes of conversion to and from other number systems, as well as for certain operations, it is sometimes useful to associate a residue number system with a weighted, mixed-radix number system.

Residue number systems are based on the congruence relation, which is defined as follows. Two integers a and b are said to be congruent modulo m if m divides exactly the difference of a and b; it is common, especially in mathematics texts, to write a ≡ b (mod m) to denote this. Thus, for example, 10 ≡ 7 (mod 3), 10 ≡ 4 (mod 3), 10 ≡ 1 (mod 3), and 10 ≡ -2 (mod 3). The number m is a modulus or base.
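The congruence relation can be checked directly. The helper names `is_congruent` and `residue` below are illustrative, and the sketch assumes a positive modulus, for which Python's `divmod` returns a remainder in the range 0 .. m - 1.

```python
def is_congruent(a, b, m):
    """True when m divides a - b exactly, i.e. a is congruent to b modulo m."""
    return (a - b) % m == 0


def residue(a, m):
    """Least non-negative residue of a with respect to a positive modulus m."""
    q, r = divmod(a, m)
    assert a == q * m + r      # quotient and remainder of the integer division
    return r


if __name__ == "__main__":
    for b in (7, 4, 1, -2):
        print(10, b, is_congruent(10, b, 3))
    print(residue(10, 3), residue(-2, 3))
```

All four values 7, 4, 1 and -2 are congruent to 10 modulo 3, and only 1 is the residue in the set {0, 1, 2}.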
If q and r are the quotient and remainder, respectively, of the integer division of a by m (that is, a = q.m + r), then, by definition, we have a ≡ r (mod m). The number r is said to be the residue of a with respect to m, and we shall usually denote this by r = |a|_m. The set of m smallest values, {0, 1, 2, ..., m - 1}, that the residue may assume is called the set of least positive residues modulo m. Unless otherwise specified, we shall assume that these are the only residues in use. Consider a set {m_1, m_2, ..., m_N} of N positive and pairwise relatively prime moduli. Let M be the product of the moduli. Then every number X < M has a unique representation in the residue number system, which is the set of residues {|X|_{m_i} : 1 <= i <= N}.

A partial proof of this is as follows. Suppose X_1 and X_2 are two different numbers with the same residue set. Then |X_1|_{m_i} = |X_2|_{m_i}, and so |X_1 - X_2|_{m_i} = 0 for every i. Therefore X_1 - X_2 is a common multiple of the m_i and hence a multiple of their least common multiple (lcm). But if the m_i are pairwise relatively prime, then their lcm is M, and it must be that X_1 - X_2 is a multiple of M. So it cannot be that X_1 < M and X_2 < M. Therefore, the set {|X|_{m_i} : 1 <= i <= N} is unique and may be taken as the representation of X. The number M is called the dynamic range of the RNS, because the number of numbers that can be represented is M. For unsigned numbers, that range is [0, M - 1] [17].

Representations in a system in which the moduli are not pairwise relatively prime will not be unique: two or more numbers will have the same representation. As an example, the residues of the integers zero through fifteen relative to the moduli two, three, and five (which are pairwise relatively prime) are given in the left half of Table 4.1, and the residues of the same numbers relative to the moduli two, four, and six (which are not pairwise relatively prime) are given in the right half of the same table. Observe that no sequence of residues is repeated in the first half, whereas there are repetitions in the second.

The preceding discussion defines what may be considered standard residue number systems, and it is with these that we shall primarily be concerned. Nevertheless, there are useful examples of "non-standard" RNS, the most common of which are the redundant residue number systems. Such a system is obtained, essentially, by adding extra (redundant) moduli to a standard system. The dynamic range then consists of a "legitimate" range, defined by the non-redundant moduli, and an "illegitimate" range; for arithmetic operations, initial operands and results should be within the legitimate range. Redundant number systems of this type are especially useful in fault-tolerant computing. The redundant moduli mean that digit positions with errors may be excluded from computations while still retaining a sufficient part of the dynamic range. Furthermore, both the detection and correction of errors are possible: with k redundant moduli, it is possible to detect up to k errors and to correct up to ⌊k/2⌋ errors. A different form of redundancy can be introduced by extending the size of the digit set corresponding to a modulus, in a manner similar to redundant signed-digit (RSD) representations. For a modulus m, the normal digit set is {0, 1, ..., m - 1}, but if instead the digit set used is {0, 1, ..., m' - 1}, where m' > m, then some residues will have redundant representations [10], [18].

Table 4.1 Residues for various moduli
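The two halves of the table can be regenerated from the definitions above. The sketch below computes the residues of 0 through 15 for the moduli (2, 3, 5) and (2, 4, 6) and checks for repeated residue sets; the function names and the printed layout are illustrative and are not the layout of Table 4.1.

```python
def residues(x, moduli):
    """Residue representation of x with respect to each modulus."""
    return tuple(x % m for m in moduli)


def residue_table(numbers, moduli):
    """Map each number to its tuple of residues."""
    return {x: residues(x, moduli) for x in numbers}


def has_repeats(table):
    """True when two different numbers share the same residue tuple."""
    return len(set(table.values())) < len(table)


coprime = residue_table(range(16), (2, 3, 5))
non_coprime = residue_table(range(16), (2, 4, 6))

if __name__ == "__main__":
    print("N   (2,3,5)     (2,4,6)")
    for x in range(16):
        print(f"{x:<3} {str(coprime[x]):<11} {non_coprime[x]}")
    print("repeats:", has_repeats(coprime), has_repeats(non_coprime))
```

With the moduli 2, 4 and 6 the least common multiple is 12, so 0 and 12 (and likewise 1 and 13, and so on) receive the same residues, whereas the moduli 2, 3 and 5 distinguish all integers below 30.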

4.3.1 Moduli selection

In general, there are at least four considerations that should be taken into account in the selection of moduli. First, the selected moduli must provide an adequate range whilst also ensuring that RNS representations are unique. The second is the efficiency of binary representations; in this regard, a balance between the different moduli in a given moduli set is also important. The third is that, ideally, the implementations of arithmetic units for RNS should to some extent be compatible with those for conventional arithmetic, especially given the "legacy" that exists for the latter. The fourth is the size of individual moduli. Although certain RNS arithmetic operations do not require carries between digits, which is one of the primary advantages of RNS, this is so only between digits. Since a digit is ultimately represented in binary, there will be carries between bits, and therefore it is important to ensure that digits (and, therefore, the moduli) are not too large. Low-precision digits also make it possible to realize cost-effective table-lookup implementations of arithmetic operations. On the other hand, if the moduli are small, then a large number of them may be required to ensure a sufficient dynamic range. Ultimately, the choices made, and indeed whether RNS is useful or not, depend on the particular applications and technologies at hand.

4.3.2 Negative numbers

Some applications require that it be possible to represent negative numbers as well as positive ones. As with the conventional number systems, any one of the radix complement, diminished-radix complement, or sign-and-magnitude notations may be used in RNS for such representation. The merits and drawbacks of choosing one over the other are similar to those for the conventional notations. In contrast with the conventional notations, however, the determination of sign is much more difficult with the residue notations, as is magnitude comparison. This is the case even with sign-and-magnitude notation, since determining the sign of the result of an arithmetic operation such as addition or subtraction is not easy, even if the signs of the operands are known. The extension of sign-and-magnitude notation to RNS involves either the use of a single sign digit or prepending to each residue in a representation an extra bit or digit for the sign; we shall assume the former. For the complement notations, the range of representable numbers is usually partitioned into two approximately equal parts, such that approximately half of the numbers are positive and the rest are negative.

4.3.3 Basic arithmetic

The standard arithmetic operations of addition/subtraction and multiplication are easily implemented with residue notation, depending on the choice of the moduli, but division is much more difficult. The latter is not surprising, in light of the statement above on the difficulties of sign determination and magnitude comparison. Residue addition is carried out by individually adding corresponding digits, relative to the modulus for their position. That is, a carry-out from one digit position is not propagated into the next digit position. Subtraction may be carried out by negating (in whatever is the chosen notation) the subtrahend and adding to the minuend. This is straightforward for numbers in diminished-radix complement or radix complement notation. For numbers represented in residue sign-and-magnitude, a slight modification of the algorithm for conventional sign-and-magnitude is necessary: the sign digit is fanned out to all positions in the residue representation, and addition then proceeds as in the case for unsigned numbers but with a conventional sign-and-magnitude algorithm. Multiplication too can be performed simply by multiplying corresponding residue digit pairs, relative to the modulus for their position; that is, multiply digits and ignore or adjust an appropriate part of the result.

4.3.4 Conversion

The most direct way to convert from a conventional representation to a residue one, a process known as forward conversion, is to divide by each of the given moduli and then collect the remainders. This, however, is likely to be a costly operation if the number is represented in an arbitrary radix and the moduli are arbitrary. If, on the other hand, the number is represented in radix 2 (or a radix that is a power of two) and the moduli are of a suitable form (e.g. 2^n - 1), then there are procedures that can be implemented more efficiently. The conversion from residue notation to a conventional notation, a process known as reverse conversion, is more difficult (conceptually, if not necessarily in the implementation) and so far has been one of the major impediments to the adoption of RNS. One way in which it can be done is to assign weights to the digits of a residue representation and then produce a "conventional" (i.e. positional, weighted) mixed-radix representation from this. This mixed-radix representation can then be converted into whatever conventional form is desired. In practice, the use of a direct conversion procedure for the latter can be avoided by carrying out the arithmetic of the conversion in the notation for the result. Another approach involves the use of the Chinese Remainder Theorem, which is the basis for many algorithms for conversion from residue to conventional notation; this too involves, in essence, the extraction of a mixed-radix representation. Residue number systems are also useful in error detection and correction.
This is apparent, given the independence of digits in a residue-number representation: an error in one digit does not corrupt any other digits. In general, the use of redundant moduli, i.e. extra moduli that play no role in determining the dynamic range, facilitates both error detection and correction. But even without redundant moduli, fault tolerance is possible, since computation can still continue after the isolation of faulty digit positions, provided that a smaller dynamic range is acceptable. RNS can also help speed up complex-number arithmetic [2].

4.4 MODULO 2^n + 1 ADDER DESIGN

Efficient modulo 2^n + 1 adders are important for several applications, including residue number systems, digital signal processors and cryptography algorithms. In a conventional modulo 2^n + 1 adder, all operands are (n + 1) bits long. To avoid using (n + 1)-bit circuits, the diminished-1 and carry-save diminished-1 number systems can be used effectively. Two architectures for modulo 2^n + 1 adders based on the n-bit ripple-carry adder have also been derived: the first is a faster design, whereas the second uses less hardware. In that method, the special treatment required for zero operands in the diminished-1 number system is removed. The fastest modulo 2^n + 1 adders in the normal binary system contain 3-operand adders. An efficient design reduces the hardware overhead and the power consumption and, in some cases, the power-delay product as well.

The modular characteristic of the residue number system (RNS) offers the potential for high-speed and parallel arithmetic. In RNS logic, each operand is represented by its residues with respect to a set of numbers comprising the base. Addition, subtraction and multiplication are performed in parallel on the residues in distinct design units (often called channels), avoiding carry propagation among residues. Arithmetic operations such as addition, subtraction and multiplication can therefore be carried out more efficiently in RNS than in conventional two's complement systems. That makes RNS a good candidate for many application fields. Typical applications of the RNS can be found in digital signal processing (DSP) for filtering, convolutions, correlations and FFT computation, as well as in fault-tolerant computer systems, communication and cryptography.
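Channel-wise arithmetic can be illustrated with the three-moduli set {2^n - 1, 2^n, 2^n + 1}. The sketch below is a software illustration only, with hypothetical helper names; the reverse conversion uses the textbook Chinese Remainder Theorem formula rather than any particular hardware converter, and it assumes Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod


def moduli_set(n):
    """The three-moduli set {2^n - 1, 2^n, 2^n + 1}."""
    return ((1 << n) - 1, 1 << n, (1 << n) + 1)


def to_rns(x, moduli):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def rns_add(x, y, moduli):
    """Each channel adds independently; no carry crosses between channels."""
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))


def rns_mul(x, y, moduli):
    """Each channel multiplies independently."""
    return tuple((a * b) % m for a, b, m in zip(x, y, moduli))


def from_rns(res, moduli):
    """Reverse conversion with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(res, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # pow(.., -1, m): modular inverse
    return total % big_m


if __name__ == "__main__":
    moduli = moduli_set(4)                    # (15, 16, 17)
    x, y = to_rns(123, moduli), to_rns(29, moduli)
    print(from_rns(rns_add(x, y, moduli), moduli))
    print(from_rns(rns_mul(x, y, moduli), moduli))
```

The results are only meaningful while the true sum or product stays below the dynamic range M, which is 15 x 16 x 17 = 4080 for n = 4.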
The choice of moduli set is very important, and nearly equal delay of the channels is desirable. Special moduli sets have been used extensively to reduce the hardware complexity in the implementation of converters and arithmetic operations. Among these, the triple moduli set {2^n - 1, 2^n, 2^n + 1} has some benefits. Because of the operand lengths of these moduli, the operation delay of this system is determined by the modulo 2^n + 1 channel. This means that, if we cut down the time required for modulo 2^n + 1 addition, the delay of the whole system is reduced. In order to speed up the modulo 2^n + 1 arithmetic operations, the diminished-1 representation of binary numbers has been introduced. In the diminished-1 number system, each number X is represented by X* = X - 1, while zero is handled separately. In these circuits, however, it is necessary to use special treatment for zero operands. To overcome this problem, a number representation called carry-save diminished-1 has been proposed, together with an addition algorithm in the carry-save diminished-1 system that does not require special treatment for zero operands.

Modulo 2^n + 1 adders can also be designed as a special case of general modulo m adders. The ripple-carry-based methodology for modulo 2^n + 1 adders removes some significant problems of older structures and reduces both area and power dissipation. Although a ripple-carry adder has more delay than a carry-accelerated adder, it is useful for low-power and low-area applications. Using an implementation in a CMOS technology, it has been shown that the ripple-carry design methodology leads to considerably less area and power consumption than those reported in the related papers and that, in some cases, the power-delay product is also reduced. The conventional methods for modulo 2^n + 1 addition include general modulo adders and diminished-1 and carry-save diminished-1 modulo adders, implemented with ripple-carry and parallel-prefix addition. To remove the problem of (n + 1)-bit wide circuits for the modulo 2^n + 1 channel, the diminished-1 and carry-save diminished-1 number systems have been proposed [3].


Fig 4.1 General block diagram of modulo 2^n + 1 adder

The general block diagram of the modulo 2^n + 1 adder is given in Fig 4.1. The only difference between the modulo 2^n + 1 adder and the modulo 2^n - 1 adder is the inverter that takes cout as input. In this end-around adder, cout needs to be inverted before going to the incrementer. The ways of building a modulo 2^n + 1 adder can also be divided into three categories. The first utilizes the reduced parallel-prefix tree with an extra logic level at the bottom. The second uses an idea similar to the full parallel-prefix tree. The third is the end-around adder, built from any type of adder followed by an incrementer. [12]
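The single-inverter difference between the two end-around adders can be sketched behaviourally in Python. The sketch assumes that the modulo 2^n + 1 operands are in diminished-1 form and that the result is non-zero; `end_around_add` and its `invert_carry` flag are hypothetical names used only for this illustration.

```python
def end_around_add(a, b, n, invert_carry):
    """n-bit adder followed by an incrementer driven by (possibly inverted) cout.

    invert_carry=False models the modulo 2**n - 1 adder,
    invert_carry=True models the diminished-1 modulo 2**n + 1 adder.
    """
    mask = (1 << n) - 1
    total = a + b                          # any type of n-bit adder
    cout = total >> n
    carry_in = (1 - cout) if invert_carry else cout   # the only difference
    return ((total & mask) + carry_in) & mask         # incrementer stage


n = 4
# modulo 15: the all-ones word is a second encoding of zero
print(end_around_add(9, 11, n, invert_carry=False) % 15)   # (9 + 11) mod 15
# modulo 17 on diminished-1 operands: 9 -> 8, 11 -> 10, result is S - 1
print(end_around_add(8, 10, n, invert_carry=True) + 1)     # (9 + 11) mod 17
```

In hardware the first stage can be any adder; only the carry fed to the incrementer changes between the two moduli.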

4.4.1 Parallel-Prefix Ling Structures for Modulo 2^n + 1 Adders

Ling's scheme can be applied to modulo 2^n + 1 adders. Efficient modulo 2^n + 1 adders are important for several applications, including residue number systems, digital signal processors and cryptography algorithms. In a conventional modulo 2^n + 1 adder, all operands are (n + 1) bits long. To avoid (n + 1)-bit circuits, the diminished-1 and carry-save diminished-1 number systems can be used. The idea is also applicable to the full parallel-prefix structure for modulo 2^n + 1 adders.

A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay, and it has also been applied to the design of residue generators (RGs) and multi-operand modulo adders (MOMAs). To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition.

The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. The approach adopted here makes the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Its three main parts are the translator, the diminished-1 adder and the correction scheme. Thus, any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition through Y and U.

However, due to the complexity of the full parallel-prefix structure, the benefit tends to diminish when Ling's equations are utilized, especially for wide adders (i.e. 64 bits or larger). As there is an inverted carry-in for modulo 2^n + 1 adders, the same logic is present in Ling's reduced prefix tree. [13]


Fig 4.2 Modified modulo 2^n + 1 adder with the reduced parallel-prefix Ling structure

4.4.2 Combination of Binary and Modulo 2^n ± 1 Adder

Reviewing the binary and modulo 2^n ± 1 adder architectures, it can be seen that the prefix tree can be applied to all of these adders. Modulo adders are an extension of binary adders. The reduced parallel-prefix structure applies to both modulo 2^n - 1 and modulo 2^n + 1 adders, the only difference being one inverter.


Fig 4.3 Combined Binary and Modulo Adders

In Figure 4.3, g_i/p_i comes from the pre-computation stage. The selections 0, 1 and 2 in the multiplexer are for modulo 2^n + 1, modulo 2^n - 1 and binary addition, respectively. The combined-function adder can be formulated from the general equation for binary addition.

Inserting this structure between pre- and post-computation completes the adder architecture. The modified parallel-prefix tree does not handle the carry input; this is the only difference between this special prefix tree and one built solely for a binary adder. The carry input is handled at the last row of gray cells, which agrees with the associativity of the synthesis rule. The prefix tree can be modified from any type of normal binary prefix tree. [4]

4.5 CARRY PROPAGATION ADDITION

Carry-propagate addition converts the redundant carry-save output of the carry-save adder into an irredundant binary representation by performing a carry propagation. A variety of schemes exist to speed up carry propagation, trading off area against speed. The relevant adder architectures and their characteristics are summarized in Table 4.2. Two principles have to be distinguished here: the prefix structure employed to propagate carries from lower to upper bits, and the sum bit generation that determines how the sum bits are calculated from the carries.

Table 4.2 ADDER ARCHITECTURE CHARACTERISTICS

4.5.1 PREFIX STRUCTURE

Carry propagation in binary addition is a prefix problem [8], which can be calculated using prefix structures. Besides the straightforward serial-prefix structure (implemented by the ripple-carry adder), many different parallel-prefix structures exist, which speed up carry propagation at the cost of increased area. They basically differ in depth (= circuit speed), size (= circuit area) and maximum fan-out, which can be bounded (constant) or unbounded (dependent on the operand width) and influences circuit speed and area in a more subtle way. The internal signals of a prefix implementation can be coded in different ways, resulting in different possible logic implementations. Most common are generate/propagate signal pairs computed by AND-OR gates and carry-in-0/carry-in-1 signal pairs.

4.5.2 Architecture Performance Comparison

The relative performance of all these adder architectures varies greatly among technology libraries, so only qualitative characteristics regarding area and speed are summarized in Table 4.2, instead of quantitative comparison results. In addition, the following observations can be made:

1) The ripple-carry adder implemented using full-adder cells is always the smallest and slowest adder.
2) The carry-skip adder massively speeds up the ripple-carry adder at a very moderate area penalty but is still slower than any other architecture. However, due to its false paths it cannot readily be used in synthesis-based design.
3) The carry-select adder is very area efficient for medium speeds if special carry-select adder cells are available in the library. Its prefix structure has the special property of allowing at most 2 prefix nodes per bit position.
4) The carry-increment adder is an optimization of the carry-select adder that uses the carry-lookahead scheme instead of the carry-select scheme for the same prefix structure. It has the same delay but a 30% smaller gate count.
5) The Brent-Kung parallel-prefix adder gives a good trade-off between area and speed, with an area reduction of 15% to 30% at a delay increase of 15% to 30% compared to the faster Sklansky parallel-prefix adder.
6) The Sklansky parallel-prefix adder (which uses the prefix structure first proposed by Sklansky for conditional-sum adders) has a prefix structure of minimal depth and is therefore among the fastest adder architectures. Its unbounded-fanout property helps reduce circuit area (fewer prefix nodes) but adds some extra delay for driving the high-fanout nodes.
7) The Kogge-Stone parallel-prefix adder also has a minimal-depth prefix structure. Its bounded-fanout property eliminates the need for driving high-fanout nodes, making it the fastest adder in most technologies, but this comes at the cost of a much bigger area (more prefix nodes) and more wiring. Compared to the Sklansky prefix adder, it shows an area increase between 23% (8 bit) and 75% (128 bit) at a fairly constant delay reduction of around 4% (all widths). [5]

4.6 PARALLEL PREFIX TREE STRUCTURE

Parallel-prefix trees have various architectures. These prefix trees can be distinguished by four major factors: 1) Radix/Valency 2) Logic Levels 3) Fan-out 4) Wire Tracks. In the following discussion about prefix trees, the radix is assumed to be 2 (i.e. the number of inputs to the logic gates is always 2). The more aggressive prefix schemes have ⌈log2(n)⌉ logic levels, where n is the width of the inputs. However, these schemes require higher fan-out, many wire tracks or dense logic gates, which compromise performance (e.g. speed or power). Some other schemes relieve fan-out and wire tracks at the cost of more logic levels. When the radix is fixed, the design trade-off is made among logic levels, fan-out and wire tracks. Sklansky, Kogge-Stone and Brent-Kung are the classic prefix structures. These prefix networks achieve three extreme goals: minimal logic levels and wire tracks, minimal max-fanout and logic levels, and minimal wire tracks and max-fanout, respectively. In addition, Ladner-Fischer, Han-Carlson and Knowles implemented trade-offs between each pair of the extreme cases.
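All of the radix-2 prefix trees discussed in this section combine (generate, propagate) pairs with the same two-input associative operator; they differ only in the order in which the pairs are combined. The following Python sketch shows that operator together with the serial (ripple) ordering and the sum-bit generation. The helper names `prefix_op`, `bits` and `serial_prefix_add` are illustrative assumptions, not part of any cited design.

```python
def prefix_op(hi, lo):
    """Combine a more significant (g, p) group with a less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def bits(x, n):
    """Little-endian list of the n least significant bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def serial_prefix_add(a, b, n):
    """n-bit addition with carry-in 0 using the serial-prefix structure."""
    a_bits, b_bits = bits(a, n), bits(b, n)
    # pre-computation: bit generate and propagate
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]
    # prefix computation: group[i] = (G_{i:0}, P_{i:0})
    group, acc = [], None
    for pair in gp:
        acc = pair if acc is None else prefix_op(pair, acc)
        group.append(acc)
    # post-computation: c_0 = 0, c_{i+1} = G_{i:0}, s_i = p_i xor c_i
    carries = [0] + [g for g, _ in group]
    total = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return total | (carries[n] << n)


print(serial_prefix_add(11, 6, 4))   # 11 + 6
```

Because `prefix_op` is associative, any parallel ordering of the same combinations produces the same group signals, which is what the tree structures below exploit.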

The structure of the prefix network determines the type of the prefix adder. Ziegler et al. considered sparsity, fan-out and radix as three dimensions in the design space of regular parallel-prefix adders and presented a unified formalism to describe such structures. The Kogge-Stone tree was reported to be a better choice than the Ladner-Fischer tree.

4.6.1 Kogge-Stone Parallel Prefix Structure

The Kogge-Stone prefix tree is among the prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 4.4. In fact, Kogge-Stone is a member of the Knowles prefix tree family, whose members are labelled with bracketed numbers that represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 in all logic levels for Kogge-Stone prefix trees of all widths. The key to building a prefix tree is to follow the specific features of that type of prefix tree and apply the rules described in the previous section. Gray cells are inserted like black cells, except that the gray cells output the final carry-outs instead of intermediate G/P groups. Kogge-Stone is described first because it is the easiest prefix tree to build programmatically. The example in Figure 4.4 is a 16-bit (a power of 2) prefix tree.

Fig 4.4 Kogge-Stone Prefix Tree

For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes the inputs at bit 4 and bit 3). Group (4:3) is then combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is combined with group (10:7) to generate group (10:3) at logic level 3, and so on.

4.6.2 Brent-Kung Prefix Tree

The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fan-out is among the minimum, with f = 0, and so are its wire tracks, with t = 0. The cost is L - 1 extra logic levels, where L = log2(n). A 16-bit example is shown in Figure 4.5, where the critical path is drawn with a thick gray line. The Brent-Kung tree uses less area than the Sklansky prefix tree.
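The two constructions can be compared with a small Python model that builds each network level by level and counts prefix cells and logic levels. This is a sketch under the assumption that n is a power of 2 and that buffers are ignored; `pre`, `kogge_stone` and `brent_kung` are illustrative names, and the code is not the pseudo-code of the cited thesis.

```python
def prefix_op(hi, lo):
    """Associative (g, p) combination used by every prefix cell."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def pre(a, b, n):
    """Pre-computation: per-bit (generate, propagate) pairs."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def kogge_stone(gp):
    """Return (group signals, cell count, logic levels)."""
    n, cur, cells, levels, d = len(gp), list(gp), 0, 0, 1
    while d < n:
        nxt = list(cur)
        for i in range(d, n):            # every bit with a partner d bits below
            nxt[i] = prefix_op(cur[i], cur[i - d])
            cells += 1
        cur, levels, d = nxt, levels + 1, d * 2
    return cur, cells, levels


def brent_kung(gp):
    """Return (group signals, cell count, logic levels)."""
    n, cur, cells, levels, d = len(gp), list(gp), 0, 0, 1
    while d < n:                         # up-sweep: binary reduction tree
        for i in range(2 * d - 1, n, 2 * d):
            cur[i] = prefix_op(cur[i], cur[i - d])
            cells += 1
        levels, d = levels + 1, d * 2
    d //= 4
    while d >= 1:                        # down-sweep: fill in missing prefixes
        for i in range(3 * d - 1, n, 2 * d):
            cur[i] = prefix_op(cur[i], cur[i - d])
            cells += 1
        levels, d = levels + 1, d // 2
    return cur, cells, levels


gp = pre(0xAAAA, 0x5556, 16)
for build in (kogge_stone, brent_kung):
    _, cells, levels = build(gp)
    print(build.__name__, cells, levels)
```

For n = 16 the model counts 49 cells in 4 levels for Kogge-Stone and 26 cells in 7 levels for Brent-Kung, which illustrates the area-versus-depth trade-off described above.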

Fig 4.5 16-bit Brent-Kung Prefix Tree

4.6.3 Sklansky Prefix Tree

The Sklansky prefix tree takes the fewest logic levels to compute the carries. In addition, it uses fewer cells than the Kogge-Stone structure, at the cost of higher fan-out. Figure 4.6 shows a 16-bit example of the Sklansky prefix tree with the critical path drawn as a solid line. The Sklansky structure uses more area than the Brent-Kung structure. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of Brent-Kung's, where the number of logic levels is reduced and the fan-out is increased. The number of logic levels is log2(n). Each logic level has n/2 cells, as can be observed in Figure 4.6. The area is estimated as (n/2)log2(n). When n = 16, 32 cells are required.
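The cell count and the fan-out growth of the Sklansky structure can be checked with a short Python model. It assumes n is a power of 2, ignores buffers, and uses the illustrative names `pre` and `sklansky`; the load count returned is the number of prefix cells driven by the busiest node, which is an assumption about how to measure fan-out in this sketch.

```python
def prefix_op(hi, lo):
    """Associative (g, p) combination used by every prefix cell."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def pre(a, b, n):
    """Pre-computation: per-bit (generate, propagate) pairs."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def sklansky(gp):
    """Return (group signals, cell count, logic levels, max cells per driver)."""
    n, cur, cells, levels, k, max_load = len(gp), list(gp), 0, 0, 0, 0
    while (1 << k) < n:
        nxt, load = list(cur), {}
        for i in range(n):
            if (i >> k) & 1:                 # upper half of a 2^(k+1) block
                j = ((i >> k) << k) - 1      # last bit of the lower half
                nxt[i] = prefix_op(cur[i], cur[j])
                cells += 1
                load[j] = load.get(j, 0) + 1
        max_load = max(max_load, max(load.values()))
        cur, levels, k = nxt, levels + 1, k + 1
    return cur, cells, levels, max_load


_, cells, levels, max_load = sklansky(pre(0x00FF, 0x0001, 16))
print(cells, levels, max_load)
```

For n = 16 the model gives 32 cells in 4 levels, and the busiest node (bit 7 at the last level) feeds 8 prefix cells; the fan-out of 9 quoted in the text presumably also counts that node's own straight-through path, which the sketch does not model.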

Fig 4.6 16-bit Sklansky Prefix Tree

4.6.4 Ladner-Fischer Prefix Tree

The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree was proposed to relieve this problem. To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 4.7 shows a 16-bit example of the Ladner-Fischer prefix tree. The Ladner-Fischer prefix tree is a structure that sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 4.7 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, a fan-out of more than 2 is allowed (i.e. f > 0). Compared with Sklansky's, the fan-out of Ladner-Fischer's is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree. [4]

Fig 4.7 11-bit Ladner-Fischer Prefix Tree Synthesis

4.6.5 Knowles Prefix Tree

Knowles proposed a family of prefix trees with flexible architectures. Knowles prefix trees use the fan-out at each logic level to name their family members. Figure 4.8 shows a 16-bit Knowles prefix tree. Different fan-outs within the same logic level are also allowed, which gives a hybrid Knowles prefix tree. Overlapping by more than 1 bit is permitted, as it is in prefix trees in general. Because the Knowles family covers multiple architectures, the construction algorithm is not difficult to extend once the basic concepts of prefix trees are firmly established. Kogge-Stone and Knowles prefix trees have the same number of logic levels. In the Knowles prefix tree, the fan-out at logic level 4 is 3 instead of 2. To build such prefix trees, the pseudo-code made for the Kogge-Stone prefix tree can be reused except for the change at the last level. The two trees also have the same number of cells. Hence, the area of the Knowles prefix tree is also estimated as n log2(n) - n + 1.

Fig 4.8 16-bit Knowles Prefix Tree

4.6.6 HAN-CARLSON PREFIX TREE

The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2 (f = 0). The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level. The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e. 2). The pseudo-code for Kogge-Stone's structure can be easily modified to build a Han-Carlson prefix tree. The major difference is that in each logic level the Han-Carlson prefix tree places cells at every other bit, and the last logic level accounts for the missing carries. Figure 4.9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line. This type of Han-Carlson prefix tree has log2(n) + 1 logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has n/2 cells. The area is estimated as (n/2)log2(n). When n = 16, the number is 32.
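The "cells at every other bit" construction can be sketched in Python as a Kogge-Stone network on the odd bit positions, preceded by one pairing level and followed by one level that fills in the even positions. The sketch assumes n is a power of 2 and ignores buffers; `pre` and `han_carlson` are illustrative names.

```python
def prefix_op(hi, lo):
    """Associative (g, p) combination used by every prefix cell."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def pre(a, b, n):
    """Pre-computation: per-bit (generate, propagate) pairs."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def han_carlson(gp):
    """Return (group signals, cell count, logic levels)."""
    n, cur, cells, levels = len(gp), list(gp), 0, 0
    for i in range(1, n, 2):             # first level: pair odd bit with even bit
        cur[i] = prefix_op(cur[i], cur[i - 1])
        cells += 1
    levels += 1
    d = 2
    while d < n:                         # Kogge-Stone on the odd positions only
        nxt = list(cur)
        for i in range(d + 1, n, 2):
            nxt[i] = prefix_op(cur[i], cur[i - d])
            cells += 1
        cur, levels, d = nxt, levels + 1, d * 2
    for i in range(2, n, 2):             # last level: the missing even carries
        cur[i] = prefix_op(cur[i], cur[i - 1])
        cells += 1
    levels += 1
    return cur, cells, levels


_, cells, levels = han_carlson(pre(0x0F0F, 0x00F1, 16))
print(cells, levels)
```

For n = 16 this yields 32 cells in 5 logic levels, matching the (n/2)log2(n) estimate and the log2(n) + 1 level count given above.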

Fig 4.9 Han-Carlson Prefix Tree

4.6.7 Harris Prefix Tree

The idea from Harris is to balance logic levels, fan-out and wire tracks. Harris proposed a cube to show the taxonomy of prefix trees, which illustrates the idea for 16-bit prefix trees. All the prefix trees mentioned above are on the cube, with the Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic-levels extreme and Kogge-Stone at the wire-track extreme. The balanced prefix structure is close to the center of the cube. For 16 bits, the number of logic levels is log2(n) + 1 = 5, the maximum fan-out is 2^f + 1 = 3 and the number of wire tracks is 2^t = 2. The diagram is shown in Figure 4.10 with the critical path drawn as a solid line.

Fig 4.10 Harris Prefix Tree

These are the various types of parallel-prefix structures used for adder design. Each has its own advantages and disadvantages, so the prefix structure can be selected according to the purpose of the task.

4.6.8 Algorithmic Analysis for Prefix Trees

Unfolding the algorithms mentioned, prefix trees can be built structurally either by HDL or by schematic entry. Each type of prefix tree differs in area, logic levels, fan-out and wire tracks, and the prefix tree structure for the diminished-1 adder is chosen according to the intended usage. Table 4.3 summarizes the prefix trees' parameters, including logic levels, area estimation, fan-out and wire tracks. [5], [6]

Table 4.3 Algorithmic Analysis

Type             Logic levels   Area                     Fan-out    Wire tracks
Brent-Kung       2log2n - 1     2n - log2n - 2           2          1
Kogge-Stone      log2n          nlog2n - n + 1           2          n/2
Ladner-Fischer   log2n + 1      (n/4)log2n + 3n/4 - 1    n/4 + 1    -
Knowles          log2n          nlog2n - n + 1           3          -
Sklansky         log2n          (n/2)log2n               n/2 + 1    -
Han-Carlson      log2n + 1      (n/2)log2n               2          n/4
Harris           log2n + 1      (n/2)log2n               3          n/8

4.7 DESIGN OF AREA EFFICIENT WEIGHTED MODULO 2^n + 1 ADDER

This section presents an improved area-efficient weighted modulo 2^n + 1 adder design that uses diminished-1 adders with simple correction schemes. The design subtracts the constant 2^n + 1 from the sum of two (n + 1)-bit input numbers and produces carry and sum vectors. The modulo 2^n + 1 addition can then be performed by parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. The area cost of the proposed adders is lower than that of previously reported weighted modulo adders. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition. The design consists of three blocks: 1) translator 2) diminished-1 adder 3) correction circuit. Fig 4.11 shows the architecture of the area-efficient weighted modulo 2^n + 1 adder using correction schemes.

Fig 4.11 Architecture of proposed modulo 2^n + 1 adder

4.7.1 TRANSLATOR

The translator subtracts the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers and produces carry and sum vectors; that is, the constant -(2^n + 1) is added to the sum of A and B. In addition, the two inputs A and B may lie in the range {0, 2^n}, which is 1 more than the range {0, 2^n - 1} of the existing system. For n = 8, the translator changes the 9-bit inputs into 8-bit outputs, which are the inputs of the diminished-1 adder. The architecture of the translator is given in Fig 4.12.

Fig 4.12 Architecture of translator

4.7.2 TRANSLATOR CIRCUITS

The translator is built from FAF and FA+ cells. The values given to the translator pass through these cells, which reduce the operand width by 1 bit and produce proper inputs for the diminished-1 adder. Fig 4.13 shows the structure of the basic cells in the translator. [1]

Fig 4.13 Basic cells in translator

4.7.3 TRANSLATOR FROM MODULO 2^n + 1 TO THE PROPOSED REPRESENTATION

Let $X = x_n x_{n-1} x_{n-2} \ldots x_1 x_0$ be a binary number with $0 \le X \le 2^n$, and $x_z X^* = x_z x^*_{n-1} x^*_{n-2} \ldots x^*_1 x^*_0$ the targeted representation. The zero indication bit $x_z$ can be computed by:

$$x_z = \overline{x_n \vee x_{n-1} \vee \cdots \vee x_1 \vee x_0}$$

while

$$X^* = \begin{cases} X - 1, & \text{if } x_z = 0 \\ 0, & \text{if } x_z = 1 \end{cases}$$

or equivalently

$$X^* = |X - 1 + x_z|_{2^n} = |X + 2^n - 1 + x_z|_{2^n}$$

The last relation reveals that $X^*$ can be computed by a modulo $2^n$ adder that accepts as inputs the all-1s operand and the n least significant bits of the X operand, and as carry input the $x_z$ signal. Assuming an inclusive-OR implementation of the adder, we have $g_i = x_i$ and $t_i = 1$. Therefore, the carry at each position i is given by:

$$c_i = g_i \vee t_i c_{i-1} = x_i \vee c_{i-1} = \cdots = x_i \vee x_{i-1} \vee \cdots \vee x_0 \vee c_{in} = x_i \vee x_{i-1} \vee \cdots \vee x_0 \vee \overline{x_n \vee x_{n-1} \vee \cdots \vee x_1 \vee x_0}$$
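The relations above can be checked with a bit-level Python sketch of the translator. It models the adder as one exclusive-NOR per sum bit and an OR chain for the carries, exactly as the recurrence suggests; the function name `translate` and the loop-based carry chain (instead of a gate tree) are assumptions made for illustration.

```python
def translate(x, n):
    """Return (x_z, X*) for a weighted operand 0 <= x <= 2**n."""
    xb = [(x >> i) & 1 for i in range(n + 1)]      # x_0 ... x_n
    x_z = 0 if any(xb) else 1                      # NOR of all n + 1 bits
    carry = x_z                                    # carry input of the modulo 2^n adder
    x_star = 0
    for i in range(n):                             # only the n least significant bits
        # sum bit of x_i + 1 + carry: x_i xor 1 xor carry = XNOR(x_i, carry)
        x_star |= (1 - (xb[i] ^ carry)) << i
        # g_i = x_i, t_i = 1, so c_i = x_i or c_{i-1}
        carry = xb[i] | carry
    return x_z, x_star


n = 4
for x in (0, 1, 5, 16):
    print(x, translate(x, n))
```

For x = 2^n the n least significant bits are all zero and the zero flag is 0, so the sketch returns the all-ones word 2^n - 1, as the definition of X* requires.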

The carry relation shows that the adder required for a translator from the binary system to the adopted representation is composed of an exclusive-NOR gate per bit and a carry computation unit that is easily implemented as trees of NOR gates.

4.7.4 DIMINISHED-1 ADDER

Depending on the input/output data representations, modulo 2^n + 1 addition methods can be classified into two categories, diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed, leading to smaller and faster components; however, this incurs an overhead due to the translators from/to the binary weighted system. The weighted representation uses (n + 1)-bit operands, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation. Efficient parallel-prefix adders have been proposed for diminished-1 modulo 2^n + 1 addition, and, to improve the area-time and time-power products, the circular carry selection scheme was used to select the correct carry-in signals for the final modulo addition.

These methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. An earlier approach first used translators to decrease the sum of two n-bit inputs A and B by 1 and then performed the weighted modulo 2^n + 1 addition using diminished-1 adders. It should be noted that, for the architecture of Vergos and Bakalis, the ranges of the two inputs A and B are smaller than those of Vergos and Efstathiou (i.e., {0, 2^n - 1} versus {0, 2^n}).

This work uses an improved area-efficient weighted modulo 2^n + 1 adder design based on diminished-1 adders with simple correction schemes. A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay. It has also been applied to residue generators (RGs) and multi-operand modulo adders (MOMAs), where the resulting components remove at least a whole parallel adder from the critical path of the most efficient earlier proposals; the reported experimental results indicate savings of more than 30% in execution time and approximately 19% in implementation area.

The diminished-1 adder can be built on any of the classic parallel-prefix structures, including Sklansky, Kogge-Stone and Brent-Kung. The works discussed above are based on ASIC technology. Vitoroulis investigated the performance of parallel-prefix adders implemented with FPGA technology and reported the area requirements and critical path delay for a variety of classical parallel-prefix adder structures. However, the parallel-prefix trees were implemented as single adders, without being part of bigger designs.

The diminished-1 adder's result forms the least significant bits of the weighted sum. The indication of complementary input vectors at the diminished-1 adder is the most significant bit of the weighted sum. Parallel-prefix networks are widely used in high-performance adders. Networks in the literature represent trade-offs between the number of logic levels, fan-out and wiring tracks. Adders using these networks have been compared using the method of logical effort, and a newly proposed architecture was found competitive in latency and area for some technologies. Common prefix computations include addition, incrementation, priority encoding, etc. [1]

4.7.4.1 Brent-Kung Parallel Prefix Tree

The Brent-Kung adder is a parallel-prefix form of the carry-lookahead adder that requires 2log2(N) - 1 stages. It was originally proposed as a simple and regular design of a parallel adder that addresses the problem of connecting gates in a way that minimizes chip area. Accordingly, it is considered one of the better tree adders for minimizing wiring tracks, fan-out and gate count, and it is used as a basis for many other networks.

To implement a parallel-prefix tree, a half adder is needed at each bit position to calculate the generated carry and the propagated carry. Using these carry signals, further cells compute the group-generated and group-propagated carries. Fig 4.15 shows gate-level basic cells that calculate the group-propagated carry $P_{i:j}$ and the group-generated carry $G_{i:j}$ in the intermediate stages of a parallel-prefix tree. The quadrate cell calculates $P_{i:j}$ and $G_{i:j}$ simultaneously, whereas the triangular cell calculates only $G_{i:j}$; the circuit of the quadrate cell is therefore more complex than that of the triangular cell. With the help of these basic cells, the rough implementation of the Brent-Kung tree is shown in Fig 4.14, where $HA_i$ ($0 \le i \le 7$) denotes a half adder and buffers are not taken into account.

For a regular parallel-prefix adder that adds two addends, the incoming carry is always assumed to be $c_0 = 0$. For two n-bit binary addends $x = (x_{n-1} x_{n-2} \ldots x_0)$ and $y = (y_{n-1} y_{n-2} \ldots y_0)$, the carry and sum at bit position i in the parallel-prefix tree are

$$c_i = G_{i-1:0} \vee (P_{i-1:0} \wedge c_0), \qquad s_i = P_i \oplus c_i, \qquad 0 \le i \le n - 1$$

Because $c_0 = 0$, this reduces to $c_i = G_{i-1:0}$. That is why two different basic cells can be used to build the regular Brent-Kung tree: where only the signal $G_{i-1:0}$ is needed, the simpler triangular cell can be used to reduce the complexity.

Vitoroulis compared the performance and area of regular parallel-prefix trees implemented on FPGA technology. When the parallel-prefix trees are implemented as components of the end-around-carry (EAC) adder, however, they cannot be designed in the regular way: both $G_{i:0}$ and $P_{i:0}$ must be kept as outputs for reuse in the next stage. For example, if the Brent-Kung tree is used as the parallel-prefix tree in the EAC adder, only the quadrate cell can be used to calculate the signals in the intermediate stages, so the regular design of the Brent-Kung tree must be modified. Therefore, on FPGA technology, properties of the different parallel-prefix trees such as area and performance will differ from the results listed in Vitoroulis's report. As a result, to implement different parallel-prefix trees in the EAC adder, the implementation of the parallel-prefix tree itself must first be changed, and the relationship between the parallel-prefix tree and the other parts of the EAC adder must also be taken into account.

Fig 4.14 8-bit Brent-Kung tree

The Brent-Kung based diminished-1 adder consists of half adders and two basic cells, a square (quadrate) cell and a triangular cell, which compute the generate and propagate signals for the least significant i bits. The output of the translator is given to the half adders, then passes to the quadrate cells and then to the triangular cells. The equations are:

$$g_i = A_i \cdot B_i, \qquad p_i = A_i \oplus B_i$$

$$(G_0, P_0) = (g_0, p_0)$$

$$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_0, p_0), \qquad i > 0$$
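Why both group signals have to be kept in the end-around-carry adder can be seen in a small Python model of the diminished-1 addition built from the equations above. The group signals are evaluated serially here for brevity, which is a simplification: a Brent-Kung tree would produce the same values. The carry-in is taken as the inverted group generate of the whole word, and the names `bits` and `diminished1_prefix_add` are illustrative.

```python
def prefix_op(hi, lo):
    """(g, p) o (g', p') = (g or (p and g'), p and p')."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def bits(x, n):
    """Little-endian list of the n least significant bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def diminished1_prefix_add(a_star, b_star, n):
    """(A* + B* + inverted carry-out) mod 2**n using group G and P signals."""
    a_bits, b_bits = bits(a_star, n), bits(b_star, n)
    g = [x & y for x, y in zip(a_bits, b_bits)]    # half-adder generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]    # half-adder propagate
    group = [(g[0], p[0])]                         # (G_0, P_0) = (g_0, p_0)
    for i in range(1, n):
        group.append(prefix_op((g[i], p[i]), group[-1]))
    cin = 1 - group[n - 1][0]                      # inverted end-around carry
    # c_i = G_{i-1:0} or (P_{i-1:0} and cin): P is needed because cin may be 1
    carries = [cin] + [group[i][0] | (group[i][1] & cin) for i in range(n - 1)]
    return sum((p[i] ^ carries[i]) << i for i in range(n))


print(diminished1_prefix_add(8, 10, 4) + 1)        # (9 + 11) mod 17
```

With a carry-in fixed at 0 the term containing P vanishes, which is the case in which the simpler triangular cell suffices.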

Fig 4.15 Basic cells in Brent-Kung tree based diminished-1 adder

Fig 4.15 shows the gate-level circuits of the basic cells that calculate P_{i:j} and G_{i:j} in the intermediate stages of the parallel-prefix tree. When the Brent-Kung tree is used as the parallel-prefix tree in the EAC adder, only one basic cell, the quadrate cell, is used.

4.7.5 CORRECTION CIRCUIT

The reason for FIX is that, under some conditions, $y_{n-1} = 2$ (e.g., $a_n = b_n = 1$ and $a_{n-1} = b_{n-1} = 0$), which cannot be represented by a 1-bit line (marked as 1* in Table 4.4). Therefore, the value of $y_{n-1}$ is set to 1, and the remaining carry (i.e., 1) is assigned to FIX. FIX is wired-OR with the carry-out of Y + U (i.e., $c_{out}$), and the result is inverted to form the inverted end-around carry $\overline{c_{out} \vee FIX}$, which is the carry-in of the subsequent diminished-1 addition stage. When $y_{n-1} = 2$, FIX = 1; otherwise, FIX = 0.

According to Table 4.4, $y_{n-1} = a_n \vee b_n \vee a_{n-1} \vee b_{n-1}$, $u_{n-1} = \overline{a_{n-1} \oplus b_{n-1}}$ and $FIX = a_n b_n \vee b_n a_{n-1} \vee a_n b_{n-1}$. Based on the aforementioned, the proposed weighted modulo $2^n + 1$ addition of A and B is carried out as a diminished-1 addition of Y and U whose carry-in is $\overline{c_{out} \vee FIX}$.

Fig 4.16 Correction circuit

The correction circuit consists of AND and OR gates. The FIX signal can therefore be computed in parallel with the translation to Y + U, leading to an efficient correction.

Table 4.4 Truth table for FIX

a_n   b_n   a_{n-1}   b_{n-1}   u_{n-1}   y_{n-1}   FIX
0     0     0         0         1         0         0
0     0     0         1         0         1         0
0     0     1         0         0         1         0
0     0     1         1         1         1         0
0     1     0         0         1         1         0
0     1     0         1         X         X         X
0     1     1         0         0         1*        1
0     1     1         1         X         X         X
1     0     0         0         1         1         0
1     0     0         1         0         1*        1
1     0     1         0         X         X         X
1     0     1         1         X         X         X
1     1     0         0         1         1*        1
1     1     0         1         X         X         X
1     1     1         0         X         X         X
1     1     1         1         X         X         X

X: don't care. 1*: y_{n-1} would be 2, so it is set to 1 and FIX = 1.
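The three Boolean expressions for the correction logic can be checked against the defined rows of Table 4.4 with a few lines of Python. The operators in the printed equations are partly lost in this copy, so the exclusive-NOR form of u_{n-1} used here is a reconstruction that was chosen because it reproduces the table; `correction_bits` and `TABLE` are illustrative names.

```python
def correction_bits(an, bn, an1, bn1):
    """Return (u_{n-1}, y_{n-1}, FIX) from the two top bits of A and B."""
    u = 1 - (an1 ^ bn1)                          # XNOR of a_{n-1} and b_{n-1}
    y = an | bn | an1 | bn1                      # saturated at 1 when the value is 2
    fix = (an & bn) | (bn & an1) | (an & bn1)    # the carry that y cannot hold
    return u, y, fix


# Defined (non don't-care) rows of Table 4.4:
# (a_n, b_n, a_{n-1}, b_{n-1}) -> (u_{n-1}, y_{n-1}, FIX)
TABLE = {
    (0, 0, 0, 0): (1, 0, 0),
    (0, 0, 0, 1): (0, 1, 0),
    (0, 0, 1, 0): (0, 1, 0),
    (0, 0, 1, 1): (1, 1, 0),
    (0, 1, 0, 0): (1, 1, 0),
    (0, 1, 1, 0): (0, 1, 1),
    (1, 0, 0, 0): (1, 1, 0),
    (1, 0, 0, 1): (0, 1, 1),
    (1, 1, 0, 0): (1, 1, 1),
}

mismatches = [row for row, out in TABLE.items() if correction_bits(*row) != out]
print(mismatches)                                # empty when equations match the table
```

The three rows with FIX = 1 are the ones marked 1* in the table, where the remaining carry is passed on through FIX instead of y_{n-1}.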


CHAPTER-5

5.1 RESULTS AND ANALYSIS

5.1.1 The waveform of the translator is given below:

5.1.2 The output waveform of the modulo $2^n + 1$ adder without the correction scheme is shown below:

5.1.3 The output waveform of the modulo $2^n + 1$ adder with the correction scheme is given below:

For 8-bit operation (n = 8), sums up to 256 appear at the output unchanged; beyond that the result wraps around, so a sum of 257 gives 0, a sum of 258 gives 1, and so on.
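The wrap-around behaviour described above can be illustrated with a small behavioural reference model. This is a minimal Python sketch, not the gate-level design: the function name `weighted_mod_add` and the sample operands are hypothetical, and the model simply evaluates $(A + B) \bmod (2^n + 1)$ for weighted operands in the range 0 to $2^n$.

```python
def weighted_mod_add(a, b, n=8):
    """Behavioural model of weighted modulo 2^n + 1 addition.

    Operands and result are weighted (n + 1)-bit values in [0, 2^n].
    """
    limit = 1 << n          # 2^n, the largest legal operand
    modulus = limit + 1     # 2^n + 1
    if not (0 <= a <= limit and 0 <= b <= limit):
        raise ValueError("operands must lie in [0, 2^n]")
    return (a + b) % modulus


if __name__ == "__main__":
    # Sample operand pairs whose raw sums are 256, 257 and 258.
    for a, b in [(100, 156), (128, 129), (200, 58)]:
        print(a, "+", b, "->", weighted_mod_add(a, b))
```

With n = 8 the modulus is 257, so the raw sum 256 is returned as is, while the raw sums 257 and 258 are reduced to 0 and 1. A model like this can serve as the expected-value source when comparing against simulation waveforms.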

5.2 SYNTHESIS REPORT

5.2.1 SKLANSKY-STYLE PARALLEL PREFIX STRUCTURE

Target Device: XA3S250E-4VQG100

RESOURCE                    USED    AVAILABLE    UTILIZATION
NUMBER OF SLICES              30         2448             1%
NUMBER OF 4 INPUT LUTS        52         4896             1%
NUMBER OF IOS                 27            -              -
NUMBER OF BONDED IOBS         27           66            40%

5.2.2 BRENT-KUNG STYLE PARALLEL PREFIX STRUCTURE


Target Device: XA3S250E-4VQG100

RESOURCE                    USED    AVAILABLE    UTILIZATION
NUMBER OF SLICES              24         2448             0%
NUMBER OF 4 INPUT LUTS        44         4896             0%
NUMBER OF IOS                 27            -              -
NUMBER OF BONDED IOBS         27           66            40%
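The two synthesis reports can be compared numerically. The short Python sketch below is only an illustration built from the figures quoted in Sections 5.2.1 and 5.2.2; the dictionary and function names are hypothetical. It computes the exact utilization of each resource and the relative saving of the Brent-Kung structure over the Sklansky structure. The whole-number percentages in the reports appear to be truncated values, which is an inference from the numbers rather than something the reports state.

```python
# Figures taken from the synthesis reports above (XA3S250E-4VQG100).
AVAILABLE = {"slices": 2448, "luts": 4896, "bonded_iobs": 66}
SKLANSKY = {"slices": 30, "luts": 52, "bonded_iobs": 27}
BRENT_KUNG = {"slices": 24, "luts": 44, "bonded_iobs": 27}


def utilization_percent(used, available):
    """Exact utilization of one resource, in percent."""
    return 100.0 * used / available


def saving_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for name in ("slices", "luts", "bonded_iobs"):
        sk = utilization_percent(SKLANSKY[name], AVAILABLE[name])
        bk = utilization_percent(BRENT_KUNG[name], AVAILABLE[name])
        save = saving_percent(SKLANSKY[name], BRENT_KUNG[name])
        print(f"{name}: Sklansky {sk:.2f}%, Brent-Kung {bk:.2f}%, saving {save:.1f}%")
```

On these figures the Brent-Kung structure needs 20% fewer slices than the Sklansky structure (24 versus 30), while the number of bonded IOBs is identical.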

CHAPTER-6 CONCLUSION

An improved area-efficient weighted modulo $2^n + 1$ adder has been designed using a Brent-Kung parallel prefix tree based diminished-1 adder. This has been achieved by modifying existing diminished-1 modulo adders to incorporate simple correction schemes. The proposed adders perform weighted modulo $2^n + 1$ addition and produce sums within the range [0, $2^n$].

The design subtracts the constant $2^n + 1$ from the sum of the two (n + 1)-bit input numbers and produces carry and sum vectors. The modulo $2^n + 1$ addition is then performed by parallel-prefix diminished-1 adders that take the sum and carry vectors plus the inverted end-around carry, together with a simple correction scheme. The correction scheme generates the FIX signal, which is 1 when the value at the most significant position cannot be represented on a single bit line and 0 otherwise. The main modules are the translator, the diminished-1 adder, and the correction circuit.

The area cost of the proposed adders is lower, and they do not require the zero-detection hardware that is needed in diminished-1 modulo $2^n + 1$ addition. The Brent-Kung prefix structure uses less area than the Sklansky prefix structure (24 slices and 44 LUTs versus 30 slices and 52 LUTs in the synthesis reports of Section 5.2). The proposed adders have been implemented using 0.13-μm CMOS technology, and synthesis results show that they require less area than previously reported weighted modulo $2^n + 1$ adders under the same delay constraints.

REFERENCE
[1] H.T.Vergos and C.Efstathiou,A unifying approach for weighted and diminished-1 modulo 2n+1 additionIEEE Trans.circuit system 0ct 2008.

49

[2] M.A.soderstrand,W.K.Jenkins,Residue Number System Arithmetic Modern application in Digital Signal Processing. [3] . Somayeh Timarchi, Keivan Navi Improved Modulo 2n +1 Adder Design International Journal of Computer and Information Engineering 2:7 2008 [4] Jun chen parallel-prefix structures for binary and modulo december, 2008. [5] Zimmermann and David Q. Tran Asilomar optimized synthesis of sum-ofA Comparative Study of products Reto Conference on Signals, Systems, and Computers, November 2003 [6] Feng Liu, Fariborz F.F, Otmane Ait Mohamed Conference on Digital System Design [7] F. Liu, Q. Tan Field programmable gate array prototyping of end-around carry parallel prefix tree architectures IET Computers & Digital Techniques Received on 27th March 2009 [8] J.Sklansky,conditional sum addition logicIRE Trans. Electron comput june 1960 [9] Amir Sabbagh Molahosseini, Keivan Navi, Chitra Dadkhah, Omid Kavehei, and Somayeh Timarchi .Efficient Reverse Converter Designs for the New 4-Moduli Sets IEEE transactions on circuits and systems, april 2010. [10]Residue number system world scientific publishing Pvt.Ltd. http://www.worldscibooks.com/engineering/p523.html [11] H T Vergos, D Nicholas Diminished one modulo 2n+1 adder design IEEE Tran comput.Dec 2002. [12] T. B Juang,M Y Tsai Corrections on VLSI Design of diminished one modulo 2n+1 adder using circular carry selection. [13] R. Zimmermann, Efficient VLSI implementation of modulo 2n 1 addition and multiplication, in Proc. 14th IEEE Symp. Comput. Arithmetic,Apr. 1999,. [14] A. S. Madhukumar and F. Chin, Enhanced architecture for residue number system-based CDMA for high-rate data transmission, IEEE Trans. Wireless Communn Sep. 2004 [15] G. L. Bernocchi, G. C. Cardarilli, Low-power adaptive filter based on RNS components, in Proc. IEEE ISCAS, May 2007, .. [16] N. Kostaras and H. T. Vergos, KoVer: A sophisticated residue arithmetic core generator, in Proc. 16th IEEE Int. 
Workshop Rapid Syst.Prototyp., 2005, Parallel Prefix Adders in FPGA Implementation of EAC 2009 12th Euromicro

50

[17] T. Keller, T. H. Liew, and L. Hanzo, Adaptive redundant residue number system coded multicarrier modulation, IEEE J. Sel. Areas Commun., , Nov. 2000. [18] G. C. Cardarilli, A. Nannarelli, and M. Re, Reducing power dissipation in FIR filters using the residue number system, in Proc. IEEE 43rd IEEE Midw. Symp. Circuits Syst.jan 2000,

51
