Uploaded by Uday Rao

multiplier using radix 8


INTRODUCTION

1.1

In recent years, we have experienced a great development in the field of digital

computers and communications systems. Several public-key cryptosystems were

proposed in order to enable the encryption of messages using a public encryption key e

without a prior communication of a secret key. The secrecy relies on the fact that the decryption key is computationally infeasible to deduce from the public encryption key. Then, the only person who can decrypt the ciphertext is the receiver, who knows the secret decryption key d.

Public-key cryptography plays an important role in digital communication and

storage systems. Processing public-key cryptosystems requires a huge amount of computation, and there is therefore a great demand for developing dedicated hardware

to speed up the computations. Speeding up the computation using specialized hardware

enables the use of larger keys in public-key cryptosystems. This is translated into an

increase of the security of the system. Also, this enables the speedup of a secure link

between two distant points using an insecure channel, which is critical in real-time

systems. The reduction of the hardware amount is another important aspect when

implementing in dedicated hardware because it allows for the miniaturization of portable

devices and reduces fabrication costs.

The Residue Number System (RNS) is a non-weighted number system that can

map large numbers to smaller residues, without any need for carry propagation. Its most important property is that additions, subtractions, and multiplications are inherently carry-free. These arithmetic operations can be performed on residue digits concurrently and independently. Thus, using residue arithmetic would, in principle, increase the speed of computations. RNS has shown high efficiency in realizing special-purpose applications such as digital filters, image processing, RSA cryptography and other applications in which only additions, subtractions and multiplications are used and the dynamic range of the numbers is fixed. Special moduli sets have been used extensively to reduce the hardware complexity in the implementation of converters and arithmetic operations. Among these, the triple moduli set {2^n+1, 2^n, 2^n-1} has some benefits. Since the operation of multiplication is of major importance for almost all kinds of processors, efficient implementation of multiplication modulo 2^n-1 is important for the application of RNS.

1.2

RNS DEFINITION

A residue number system is characterized by a base that is not a single radix but an N-tuple of integers (m_N, m_{N-1}, ..., m_1), each of which is called a modulus. An integer X is represented in the residue number system by an N-tuple (x_N, x_{N-1}, ..., x_1), where x_i is a nonnegative integer satisfying

X = m_i * q_i + x_i,    (1)

where q_i is the largest integer such that 0 <= x_i <= (m_i - 1). x_i is known as the residue of X modulo m_i, and the notations X mod m_i and |X|_{m_i} are commonly used.

Example:

The RNS divides an integer into a number of smaller integers (i.e. with a shorter

binary representation) that can be processed in parallel independently of each other. This

provides a speed-up for arithmetic operations that are inherently dependent on operand

length, such as addition and multiplication. The disadvantages of using an RNS are the

complexity involved in division, magnitude comparisons and conversions between binary

and residue number.
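The parallel, carry-free behaviour described above can be illustrated with a minimal Python sketch. The moduli set {2^n+1, 2^n, 2^n-1} is the one named in the introduction; the choice n = 4, the operand values and all function names are assumptions made only for illustration, and the reconstruction step uses the Chinese Remainder Theorem, which the text does not describe.

```python
from math import prod

def to_rns(x, moduli):
    """Map an integer to its tuple of residues, one per modulus."""
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli):
    """Rebuild the integer with the Chinese Remainder Theorem.

    Assumes the moduli are pairwise coprime."""
    dynamic_range = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = dynamic_range // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % dynamic_range

def rns_add(a, b, moduli):
    """Residue-wise addition: each channel works independently, no carries between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli):
    """Residue-wise multiplication, again one independent channel per modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

n = 4
MODULI = (2**n + 1, 2**n, 2**n - 1)   # (17, 16, 15); dynamic range 17*16*15 = 4080
rx, ry = to_rns(25, MODULI), to_rns(13, MODULI)
print(rx, ry)                                      # (8, 9, 10) (13, 13, 13)
print(from_rns(rns_mul(rx, ry, MODULI), MODULI))   # 325
```

The result is only meaningful while the true sum or product stays below the dynamic range (4080 here), which is the "predetermined range" restriction mentioned for RNS applications.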

1.3 ADVANTAGES

The RNS system provides a unique feature of parallelism that makes arithmetic operations such as addition, subtraction and multiplication very easy to handle and perform, increasing speed and reducing chip area.

Carry free

High-Speed

Parallel Operation

Medium Security

Error Detection and Correction Capability

Fault Tolerant

RNS has its disadvantages too. Operations such as division, sign-detection, and

magnitude comparison and overflow detection are complex and hard to implement. This

has limited the application of RNS to certain fields where addition/multiplication

operations are used extensively and the result is known to be within a predetermined

range. RNS works only on integer values, and conversion from binary to RNS and back adds extra cost.

CHAPTER-2

LITERATURE SURVEY

CRYPTO SYSTEM:

There are two different meanings of the word cryptosystem. One is used by the

cryptographic community, while the other is the meaning understood by the public. In

this meaning, the term cryptosystem is used as shorthand for "cryptographic system". A

cryptographic system is any computer system that involves cryptography. Such systems

include for instance, a system for secure electronic mail which might include methods for

digital signatures, cryptographic hash functions, key management techniques, and so on.

Cryptographic systems are made up of cryptographic primitives, and are usually rather

complex.

Typically, a cryptosystem consists of three algorithms: one for key generation, one

for encryption, and one for decryption. The term cipher (sometimes cypher) is often used

to refer to a pair of algorithms, one for encryption and one for decryption. Therefore, the

term "cryptosystem" is most often used when the key generation algorithm is important.

For this reason, the term "cryptosystem" is commonly used to refer to public key

techniques; however both "cipher" and "cryptosystem" are used for symmetric key

techniques.

Public-key cryptography refers to a cryptographic system requiring two separate

keys, one of which is secret and one of which is public. Although different, the two parts

of the key pair are mathematically linked. One key locks or encrypts the plaintext, and

the other unlocks or decrypts the cipher text. Neither key can perform both functions by

itself. Public-key cryptography is a fundamental, important, and widely used

technology. It is an approach used by many cryptographic algorithms and cryptosystems.

It underpins such Internet standards as Transport Layer Security (TLS), PGP, and GPG.

There are three primary kinds of public key systems.

Fast multipliers are essential parts of digital signal processing systems. The speed

of multiply operation is of great importance in digital signal processing as well as in the

general purpose processors today, especially since the media processing took off. In the

past multiplication was generally implemented via a sequence of addition, subtraction,

and shift operations. Multiplication can be considered as a series of repeated additions.

The number to be added is the multiplicand, the number of times that it is added is the

multiplier, and the result is the product. Each step of addition generates a partial product.

In most computers, the operands usually contain the same number of bits.

The basic multiplication principle is twofold i.e. evaluation of partial products and

accumulation of the shifted partial products. It is performed by the successive additions

of the columns of the shifted partial product matrix. The multiplier is successively shifted and gates the appropriate bit of the multiplicand.

In order to achieve high-speed multiplication, multiplication algorithms using

parallel counters, such as the modified Booth algorithm, have been proposed, and some

multipliers based on the algorithms have been implemented for practical use. This type of

multiplier operates much faster than an array multiplier for longer operands because its

computation time is proportional to the logarithm of the word length of operands.

Booth multiplication is a technique that allows for smaller, faster multiplication

circuits, by recoding the numbers that are multiplied. It is possible to reduce the number

of partial products by half, by using the technique of radix-4 Booth recoding. The basic

idea is that, instead of shifting and adding for every column of the multiplier term and

multiplying by 1 or 0, we only take every second column, and multiply by ±1, ±2, or 0, to obtain the same results. Grouping starts from the LSB, and the first block only uses two bits of the multiplier.

Each block is decoded to generate the correct partial product. The encoding of the

multiplier Y, using the modified booth algorithm, generates the following five signed

digits, -2, -1, 0, +1, +2. Each encoded digit in the multiplier performs a certain operation

on the multiplicand, X, as illustrated in Table 1.

For the partial product generation, we adopt the Radix-4 Modified Booth algorithm to reduce the number of partial products by roughly one half. For multiplication of 2's complement numbers, the two-bit encoding using this algorithm scans a triplet of bits.

When the multiplier B is divided into groups of two bits, the algorithm is applied to this

group of divided bits.

The PP generator generates five candidates of the partial products, i.e., {-2A,-A,

0, A, 2A}. These are then selected according to the Booth encoding results of the operand

B. When the operand besides the Booth encoded one has a small absolute value, there are

opportunities to reduce the spurious power dissipated in the compression tree.

Carry-save adder is a type of digital adder, used in computer micro architecture to

compute the sum of three or more n-bit numbers in binary. It differs from other digital

adders in that it outputs two numbers of the same dimensions as the inputs, one which is a

sequence of partial sum bits and another which is a sequence of carry bits.

Motivation

Consider the sum:

  12345678
+ 87654322
= 100000000

Using the arithmetic we learned as children, we go from right to left, "8+2=0,

carry 1", "7+2+1=0, carry 1", "6+3+1=0, carry 1", and so on to the end of the sum.

Although we know the last digit of the result at once, we cannot know the first digit until

we have gone through every digit in the calculation, passing the carry from each digit to


the one on its left. Thus adding two n-digit numbers has to take a time proportional to n,

even if the machinery we are using would otherwise be capable of performing many

calculations simultaneously.

The carry-save unit consists of n full adders, each of which computes a single sum and carry bit based solely on the corresponding bits of the three input numbers. Given the three n-bit numbers a, b, and c, it produces a partial sum ps and a shift-carry sc:

ps_i = a_i \oplus b_i \oplus c_i

sc_i = (a_i \wedge b_i) \vee (a_i \wedge c_i) \vee (b_i \wedge c_i)

The entire sum can then be computed by:

1. Shifting the carry sequence sc left by one place.

2. Appending a 0 to the front (most significant bit) of the partial sum sequence ps.

3. Using a ripple carry adder to add these two together and produce the resulting (n + 1)-bit value.

When adding together three or more numbers, using a carry-save adder followed

by a ripple carry adder is faster than using two ripple carry adders. This is because a

ripple carry adder cannot compute a sum bit without waiting for the previous carry bit to

be produced, and thus has a delay equal to that of n full adders.
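As a rough illustration of the carry-save idea, the following Python sketch models the n full adders with bitwise operations on whole integers. The function names are assumptions for illustration; the final `+` stands in for the ripple carry adder and is not a model of its delay.

```python
def carry_save_add(a, b, c):
    """One carry-save stage: every bit position is handled independently.

    Returns the partial-sum word and the (not yet shifted) carry word."""
    partial_sum = a ^ b ^ c                      # sum bit of each full adder
    shift_carry = (a & b) | (a & c) | (b & c)    # carry bit of each full adder
    return partial_sum, shift_carry

def add_three(a, b, c):
    """Carry-save stage followed by a single carry-propagating addition."""
    partial_sum, shift_carry = carry_save_add(a, b, c)
    # The carry word is shifted left by one place before the final addition.
    return partial_sum + (shift_carry << 1)

print(add_three(0b1011, 0b0110, 0b1101))   # 30
```

Only the last addition propagates carries, which is why a tree of carry-save stages is attractive for accumulating many partial products.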

Advantages:

1. It produces all of its outputs in parallel, so its delay is the same as that of a single full adder.

2. Very little propagation delay: a carry-save adder followed by a ripple carry adder has a delay of about n+1 full adders, whereas two ripple carry adders in series have a delay of about 2n.

3. It allows for high clock speeds.

Disadvantages:

1. We do not know whether the result is positive or negative.

2. This is a drawback when performing modular multiplication, since we do not know whether the intermediate result is greater than or less than the modulus.

CHAPTER-3

ADDERS & BINARY MULTIPLIERS

ADDER

In electronics, an adder is a digital circuit that performs addition of numbers. In

modern computers adders reside in the arithmetic logic unit (ALU) where other

operations are performed. Although adders can be constructed for many numerical

representations, such as Binary-coded decimal or excess-3, the most common adders

operate on binary numbers. In cases where two's complement is being used to represent

negative numbers it is trivial to modify an adder into an adder-subtractor.

Types of adders

For single bit adders, there are two general types.

A half adder has two inputs, generally labelled A and B, and two outputs, the sum

S and carry C. S is the two-bit XOR of A and B, and C is the AND of A and B. Essentially

the output of a half adder is the sum of two one-bit numbers, with C being the most

significant of these two outputs.

The second type of single bit adder is the full adder. The full adder takes into

account a carry input such that multiple adders can be used to add larger numbers. To

remove ambiguity between the input and output carry lines, the carry in is labelled Ci or

Cin while the carry out is labelled Co or Cout.

Half adder

A half adder is a logical circuit that performs an addition operation on two binary digits. The half adder produces a sum and a carry value which are both binary digits.

Input    Output
A  B     C  S
0  0     0  0
0  1     0  1
1  0     0  1
1  1     1  0

Full adder

Fig 3.2: Inputs: {A, B, Carry In} Outputs: {Sum, Carry Out}

A full adder is a logical circuit that performs an addition operation on three binary digits. The full adder produces a sum and a carry value, which are both binary digits. It

can be combined with other full adders (see below) or work on its own.

Input       Output
A  B  Ci    Co  S
0  0  0     0   0
0  0  1     0   1
0  1  0     0   1
0  1  1     1   0
1  0  0     0   1
1  0  1     1   0
1  1  0     1   0
1  1  1     1   1

Note that the final OR gate before the carry-out output may be replaced by an

XOR gate without altering the resulting logic. This is because the only discrepancy

between OR and XOR gates occurs when both inputs are 1; for the adder shown here, one

can check this is never possible. Using only two types of gates is convenient if one

desires to implement the adder directly using common IC chips.

A full adder can be constructed from two half adders by connecting A and B to the input

of one half adder, connecting the sum from that to an input to the second adder,

connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B, and Ci, and Co could be made the three-bit majority function of A, B, and Ci. The output of the full adder is the two-bit arithmetic sum of three one-bit numbers.
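The construction of a full adder from two half adders, and of a ripple carry adder from full adders, can be sketched in Python as follows. The function names and the bit-at-a-time modelling are illustrative assumptions, not part of the original design files.

```python
def half_adder(a, b):
    """Return (sum, carry) for two one-bit inputs."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Full adder built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, carry_in)
    # c1 and c2 are never both 1, so OR and XOR give the same carry-out here.
    return s, c1 | c2

def ripple_carry_add(x, y, n):
    """Add two n-bit numbers by chaining n full adders; returns (sum, carry_out)."""
    carry, result = 0, 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(0b1011, 0b0110, 4))   # (1, 1): 11 + 6 = 17 = 1_0001 in binary
```

Each loop iteration needs the carry from the previous one, which is the n-full-adder delay of the ripple carry adder discussed in the carry-save section.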

BINARY MULTIPLIER

A Binary multiplier is an electronic hardware device used in digital electronics or

a computer or other electronic device to perform rapid multiplication of two numbers in

binary representation. It is built using binary adders.

The rules for binary multiplication can be stated as follows

1. If the multiplier digit is a 1, the multiplicand is simply copied down and

represents the product.

2. If the multiplier digit is a 0 the product is also 0.

For designing a multiplier circuit we should have circuitry to provide or do the following four things:

1. It should be capable of identifying whether a bit is 0 or 1.

2. It should be capable of shifting left partial products.

3. It should be able to add all the partial products to give the products as sum of

partial products.

4. It should examine the sign bits. If they are alike, the sign of the product will be a

positive, if the sign bits are opposite product will be negative. The sign bit of the

product stored with above criteria should be displayed along with the product.

From the above discussion we observe that it is not necessary to wait until all the partial

products have been formed before summing them. In fact the addition of partial product

can be carried out as soon as the partial product is formed.

Notations:

a: multiplicand
b: multiplier
p: product

Binary multiplication (e.g. n = 4)

p = a × b
a = a_{n-1} a_{n-2} ... a_1 a_0
b = b_{n-1} b_{n-2} ... b_1 b_0
p = p_{2n-1} p_{2n-2} ... p_1 p_0

        xxxx      a
     ×  xxxx      b
     --------
        xxxx      b_0 · a · 2^0
       xxxx       b_1 · a · 2^1
      xxxx        b_2 · a · 2^2
     xxxx         b_3 · a · 2^3
     --------
     xxxxxxxx     p
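The shift-and-add scheme above can be written as a short Python sketch for unsigned operands. The function name and the returned list of partial products are assumptions made for illustration; sign handling (rule 4 above) is deliberately left out.

```python
def shift_add_multiply(a, b, n=4):
    """Unsigned n-bit multiplication by accumulating shifted partial products.

    Returns the product and the list of partial products b_i * a * 2^i."""
    partial_products = []
    product = 0
    for i in range(n):
        bit = (b >> i) & 1                 # identify whether multiplier bit i is 0 or 1
        partial = (a << i) if bit else 0   # copy the shifted multiplicand, or 0
        partial_products.append(partial)
        product += partial                 # accumulate as soon as the partial product is formed
    return product, partial_products

product, rows = shift_add_multiply(0b1011, 0b0101)
print(product, [format(r, "08b") for r in rows])
# 55 ['00001011', '00000000', '00101100', '00000000']
```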


Multiplication followed by accumulation is a common operation in many digital systems, particularly highly interconnected ones such as digital filters, neural networks, data quantisers, etc.

One typical MAC (multiply-accumulate) architecture is illustrated in the figure. It consists of multiplying two values, then adding the result to the previously accumulated value, which must then be stored back in the registers for future accumulations. Another feature of a MAC circuit is that it must check for overflow, which might happen when the number of MAC operations is large.

This design can be built from components because we have already designed each of the units shown in the figure. However, since it is a relatively simple circuit, it can also be designed directly. In any case the MAC circuit, as a whole, can be used as a component in applications like digital filters and neural networks.


The architecture of a radix-2^n multiplier is given in the Figure. This block

diagram shows the multiplication of two numbers with four digits each. These numbers

are denoted as V and U while the digit size was chosen as four bits. The reason for this

will become apparent in the following sections. Each circle in the figure corresponds to

a radix cell which is the heart of the design. Every radix cell has four digit inputs and

two digit outputs. The input digits are also fed through the corresponding cells.

The dots in the figure represent latches for pipelining. Every dot consists of four

latches. The ellipses represent adders which are included to calculate the higher order

bits.


BOOTH MULTIPLIER

A Radix-4 modified Booth algorithm is used rather than the Radix-2 Booth algorithm because in Radix-4 the number of partial products is reduced to n/2. Wallace tree multipliers could also be used, but in that format the multiplier array becomes very large and requires large numbers of logic gates and interconnecting wires, which makes the chip design large and slows down the operating speed.

Booth Multiplication Algorithm for radix 2

The Booth algorithm gives a procedure for multiplying binary integers in signed 2's complement representation. It is illustrated with the following example: 2_ten × (-4)_ten, i.e. 0010_two × 1100_two.

Step 1: Making the Booth table

I. From the two numbers, pick the one with the fewest changes between consecutive bits, and make it the multiplier.

i.e., 0010: from 0 to 0 no change, 0 to 1 one change, 1 to 0 another change, so there are two changes in this one.

1100: from 1 to 1 no change, 1 to 0 one change, 0 to 0 no change, so there is only one change in this one.

Therefore, in the multiplication 2 × (-4), 2_ten (0010_two) is the multiplicand and (-4)_ten (1100_two) is the multiplier.

II. Let X = 1100 (multiplier)

Let Y = 0010 (multiplicand)

III. Load the X value in the table.

IV. Load 0 for the X_{-1} value; it should be the previous first least significant bit of X.

V. Load 0 in the U and V rows, which will hold the product of X and Y at the end of the operation.

VI. Make four rows, one for each cycle; this is because we are multiplying four-bit numbers.

Booth algorithm requires examination of the multiplier bits, and shifting of the

partial product. Prior to the shifting, the multiplicand may be added to partial product,

subtracted from the partial product, or left unchanged according to the following rules:

Look at the first least significant bit of the multiplier, X_0, and the previous least significant bit of the multiplier, X_{-1}:

X_0  X_{-1}  Operation
0    0       Shift only
1    1       Shift only
0    1       Add Y to U, and shift
1    0       Subtract Y from U, and shift (or add (-Y) to U and shift)

Take U and V together and apply an arithmetic right shift, which preserves the sign bit of a 2's complement number. Thus a positive number remains positive, and a negative number remains negative.

Apply a circular right shift to X, because this will prevent us from using two registers for the X value.


We have finished four cycles, so the answer is shown in the last rows of U and V, which is 11111000_two (-8_ten).

Note: By the fourth cycle, the two algorithms have the same values in the Product

register.
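The cycle-by-cycle table for this example did not survive in the text, so the following Python sketch re-creates the procedure from the rules stated above. The register names U, V, X and X_{-1} follow the text; the function names and the bit-mask modelling of the registers are assumptions. The n-bit U register cannot hold the negation of the most negative multiplicand, so that single case is outside the scope of this sketch.

```python
def booth_radix2(y, x, n=4):
    """Radix-2 Booth multiplication of two n-bit 2's complement numbers.

    y is the multiplicand, x the multiplier; returns the 2n-bit product pattern."""
    mask = (1 << n) - 1
    y &= mask
    x &= mask
    u = v = 0          # U:V holds the product at the end
    x_prev = 0         # X_{-1}
    for _ in range(n):
        x0 = x & 1
        if (x0, x_prev) == (0, 1):
            u = (u + y) & mask            # add Y to U
        elif (x0, x_prev) == (1, 0):
            u = (u - y) & mask            # subtract Y from U
        # Arithmetic right shift of U:V (the sign bit is copied).
        uv = (u << n) | v
        sign = uv >> (2 * n - 1)
        uv = (uv >> 1) | (sign << (2 * n - 1))
        u, v = uv >> n, uv & mask
        # Circular right shift of X; the bit shifted out becomes X_{-1}.
        x_prev = x0
        x = (x >> 1) | (x0 << (n - 1))
    return (u << n) | v

def to_signed(value, bits):
    """Interpret a bit pattern as a 2's complement number."""
    return value - (1 << bits) if value >> (bits - 1) else value

result = booth_radix2(2, -4)
print(format(result, "08b"), to_signed(result, 8))   # 11111000 -8
```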


One of the solutions of realizing high speed multipliers is to enhance parallelism

which helps to decrease the number of subsequent calculation stages. The original version

of the Booth algorithm (Radix-2) had two drawbacks. They are: (i) The number of add

subtract operations and the number of shift operations becomes variable and becomes

inconvenient in designing parallel multipliers. (ii) The algorithm becomes inefficient

when there are isolated 1s. These problems are overcome by using the modified Radix-4 Booth algorithm, which scans strings of three bits with the algorithm given below:

1) Extend the sign bit by one position if necessary to ensure that n is even.

2) Append a 0 to the right of the LSB of the multiplier.

3) According to the value of each vector, each partial product will be 0, +y, -y, +2y or -2y.

The negative values of y are made by taking the 2's complement, and in this paper carry-look-ahead (CLA) fast adders are used. The multiplication of y by 2 is done by shifting y by one bit to the left. Thus, in any case, in designing an n-bit parallel multiplier, only n/2 partial products are generated.

X(i)  X(i-1)  X(i-2)  Partial product
0     0       0       +0
0     0       1       +y
0     1       0       +y
0     1       1       +2y
1     0       0       -2y
1     0       1       -y
1     1       0       -y
1     1       1       +0
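The three steps of the modified Radix-4 Booth algorithm can be sketched in Python as below. The function names are assumptions for illustration; each overlapping triplet is turned into a digit using the rule digit = -2·(high bit) + (middle bit) + (low bit), which reproduces the entries of the table above.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit 2's complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, +1, +2}, least significant digit first."""
    if n % 2:
        n += 1                          # step 1: extend the sign bit so that n is even
    bits = x & ((1 << n) - 1)           # 2's complement pattern (sign-extended)
    padded = bits << 1                  # step 2: append a 0 to the right of the LSB
    digits = []
    for i in range(0, n, 2):            # step 3: scan overlapping triplets
        triplet = (padded >> i) & 0b111
        low, mid, high = triplet & 1, (triplet >> 1) & 1, (triplet >> 2) & 1
        digits.append(-2 * high + mid + low)
    return digits

def booth_radix4_multiply(y, x, n):
    """Sum the n/2 partial products d_i * y * 4^i."""
    return sum(d * y * 4**i for i, d in enumerate(booth_radix4_digits(x, n)))

print(booth_radix4_digits(-4, 4))         # [0, -1]
print(booth_radix4_multiply(2, -4, 4))    # -8
```

For the running example only one of the two partial products is non-zero, compared with four add/shift cycles in the radix-2 procedure.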


CHAPTER-4

PROBLEM IDENTIFICATION

Multipliers are commonly used in various electronic applications, e.g. digital signal processing, in which multipliers are used to implement algorithms such as FIR and IIR filters. Earlier, the major challenge for the VLSI designer was to reduce chip area by using efficient optimization techniques to keep pace with Moore's law. The next phase was to increase the speed of operation to achieve fast calculations; in today's microprocessors, millions of instructions are performed per second. Speed of operation is one of the major constraints in designing DSP processors and today's general-purpose processors. However, area and speed are two conflicting constraints, so improving speed generally results in larger area. Now, most of today's commercial electronic products, such as mobile phones and laptops, are portable and require longer battery life. Therefore, a lot of research is going on to reduce power consumption. This work therefore tries to find the best solution to achieve low power consumption, small area and high speed for the multiplier operation. The basic principle used for multiplication is to evaluate partial products and accumulate the shifted partial products. This requires a number of successive addition operations. Therefore one of the major components required to design a multiplier is the adder. Adders can be Ripple Carry, Carry Look Ahead, Carry Select, Carry Skip and Carry Save [1-3]. A lot of research work has been done to analyze the performance of different fast adders. The effect of the RCA word-length on the time complexities of each constituent component of the multiplier is analyzed qualitatively, and the multiplier delay is shown to be almost linearly dependent on the RCA word-length. Consequently, the delay of the multiplier can be directly controlled by the word-length of the RCAs. By means of modulo arithmetic properties, we show that the compensation constant that negates the effect of the bias introduced in this process can be precomputed and implemented by direct hardwiring with no delay overhead for all feasible parameter combinations, and it is shown that the proposed multiplier lowers the power dissipation of the radix-4 Booth encoded multiplier.


CHAPTER-5

IMPLEMENTATION DETAILS

5.1 MODULAR MULTIPLICATION

Modular arithmetic operations (i.e., inversion, multiplication and exponentiation)

are used in several cryptography applications, such as decipherment operation of RSA

algorithm, Diffie-Hellman key exchange algorithm, elliptic curve cryptography, and the

Digital Signature Standard including the Elliptic Curve Digital Signature Algorithm.

Modular Multiplication is the key algorithm of RSA and other public key

cryptosystems, and so provides an indication of the efficiency of the RNS

implementation. The majority of the currently established Public-Key Cryptosystems

(RSA, Diffie-Hellman, Digital Signature Algorithm (DSA), Elliptic Curves (ECC), etc.)

require modular multiplication in finite fields as their core operation which accounts for

up to 99% of the time spent for encryption and decryption.

Modular Multiplication in Public Key Cryptosystems

One of the cornerstones of public-key cryptography is modular arithmetic, on

which nearly all established schemes are based. An efficient software implementation of

modular arithmetic is therefore desirable. While modular additions and subtractions are

rather trivial cases, efficient modular multiplication remains an elusive target for

optimization.

Multiplier: modulo 2^n - 1

modulo 2^n - 1 with the preceding adder.

Partial products: reduction modulo 2^n - 1

Let

A = \sum_{k=0}^{n-1} a_k 2^k    (1)

A multiplied by a power of 2, 2^j, results in a left cyclic shift of A by j bits:

A \cdot 2^j = \sum_{k=0}^{n-1} a_k 2^{k+j}
            = \sum_{k=0}^{n-1-j} a_k 2^{k+j} + \sum_{k=0}^{j-1} a_{n-j+k} 2^k    mod (2^n - 1)

A \cdot 2^j = { a_{n-1-j} a_{n-2-j} ... a_0 a_{n-1} a_{n-2} ... a_{n-j} }    mod (2^n - 1)

Now let

B = \sum_{m=0}^{n-1} b_m 2^m    (4)

AB = \sum_{m=0}^{n-1} b_m (A \cdot 2^m)    mod (2^n - 1)

The terms of this sum are the partial products, each of which is the ANDing of b_m and the bit row of Eq. (). The multiplication is thus converted to the summation of the partial product array to get the final product.
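The cyclic-shift property and the resulting partial-product summation can be checked with a small Python sketch. The function names, and the use of an end-around-carry addition to accumulate the rows, are assumptions for illustration. Note that modulo 2^n - 1 the all-ones word is a second representation of zero, so results are compared modulo 2^n - 1.

```python
def cls(a, j, n):
    """Circular left shift of the n-bit word a by j positions."""
    j %= n
    mask = (1 << n) - 1
    return ((a << j) | (a >> (n - j))) & mask

def add_mod(x, y, n):
    """Addition modulo 2^n - 1 with end-around carry (carry-out is added back at the LSB)."""
    mask = (1 << n) - 1
    s = x + y
    return (s & mask) + (s >> n)

def mul_mod(a, b, n):
    """Multiply modulo 2^n - 1 by summing cyclically shifted partial products."""
    acc = 0
    for m in range(n):
        if (b >> m) & 1:                 # AND of b_m with the shifted bit row
            acc = add_mod(acc, cls(a, m, n), n)
    return acc

n = 4
print(cls(0b1001, 1, n), (0b1001 * 2) % 15)   # 3 3
print(mul_mod(11, 7, n) % 15, (11 * 7) % 15)  # 2 2
```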

GENERAL CLASSIFICATION OF RNS-BASED MULTIPLIERS

Modular RNS-based multipliers can be classified into three main groups.

I. The first group deals with specific moduli, i.e. 2^n-1, 2^n, or 2^n+1. Figure 2 below shows an example of a type-1 modular multiplier that makes use of an LUT.

II. The second type uses any moduli value and utilizes special ROM architectures in order

III. The third group of multipliers handles medium to large values of moduli but uses

mainstream arithmetic components that have been developed beforehand, thus facilitating

the job of the hardware designer by reducing the overall project lifespan. These

components could be regular binary multipliers, adders, subtractors, logic components

and small size ROM architectures.

Booth multiplication is a technique that allows for smaller, faster multiplication

circuits, by recoding the numbers that are multiplied. It is the standard technique used in

chip design, and provides significant improvements over the "long multiplication"

technique.


which is more aggressive than the radix-4 Booth encoding. However, in the radix-8

Booth encoded modulo 2^n-1 multiplication, not all modulo-reduced partial products can

be generated using the bitwise circular-left-shift operation and bitwise inversion.

Radix-4 and radix-8 multiplication

Recoding of binary numbers was first hinted at by Booth four decades ago. MacSorley proposed a modification of Booth's algorithm a decade after. The modified Booth's algorithm (radix-4 recoding) starts by appending a zero to the right of x_0 (multiplier LSB).

X_{i+2}  X_{i+1}  X_i  Partial product
0        0        0    0Y
0        0        1    +1Y
0        1        0    +1Y
0        1        1    +2Y
1        0        0    -2Y
1        0        1    -1Y
1        1        0    -1Y
1        1        1    0Y

X = \sum_{i=0}^{(n/2)-1} D_i \cdot 4^i    (6)

Quartet values    Signed digit values
0000              0
0001              +1
0010              +1
0011              +2
0100              +2
0101              +3
0110              +3
0111              +4
1000              -4
1001              -3
1010              -3
1011              -2
1100              -2
1101              -1
1110              -1
1111              0

Here we have an odd multiple of the multiplicand, 3Y, which is not immediately

available. To generate it we need to perform this previous add: 2Y+Y=3Y. But we are

designing a multiplier for specific purpose and thereby the multiplicand belongs to a

previously known set of numbers which are stored in a memory chip. We have tried to

take advantage of this fact, to ease the bottleneck of the radix-8 architecture, that is, the

generation of 3Y. In this manner we try to attain a better overall multiplication time, or at

least comparable to the time we could obtain using radix-4 architecture (with the

additional advantage of using fewer transistors). To generate 3Y with 21-bit words we only have to add 2Y+Y, that is, to add the number to the same number shifted one position to the left, getting in this way a new 23-bit word, as shown in figure 4:

In fact, only a 21-bit adder is needed to generate the bit positions from z1 to z21. Bits z0 and z22 are directly known because z0=y0 and z22=y20 (sign bit of the 2's-complement number; 3Y and Y have the same sign). If, in the memory from which we take the numbers, just two additional bits are stored together with each value of the set of numbers, we can decompose the previous add into three shorter adds that can be done in parallel. In this way, the delay is the same as that of a 7-bit adder:

Bits which are going to be stored are the two intermediate carry signals c8 and

c15. Before each word of the set of numbers is stored in the memory, the value of its

intermediate carries has to be obtained and stored beside it. In this way, they are

immediately available when it is required to perform the previous add to get the multiple

3Y of one of the numbers that belongs to the set.
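The figures for this scheme are missing, so the following Python sketch is only one plausible reading of it: the 21-bit addition that produces z1..z21 is split into three 7-bit segments (z1..z7, z8..z14, z15..z21), and c8 and c15 are taken to be the carries entering bit positions 8 and 15. That segment layout, and all function names, are assumptions.

```python
MASK23 = (1 << 23) - 1
SEG = 0x7F   # 7-bit segment mask

def stored_carries(y):
    """Carries c8 and c15 of Y + 2Y, computed once before Y is written to memory."""
    a, b = y & MASK23, (y << 1) & MASK23          # Y and 2Y as 23-bit patterns
    c8 = ((a & 0xFF) + (b & 0xFF)) >> 8           # carry into bit 8
    c15 = ((a & 0x7FFF) + (b & 0x7FFF)) >> 15     # carry into bit 15
    return c8, c15

def triple_from_stored_carries(y, c8, c15):
    """Build 3Y from three independent 7-bit additions plus the two stored carries."""
    a, b = y & MASK23, (y << 1) & MASK23

    def segment(lo, carry_in):
        return (((a >> lo) & SEG) + ((b >> lo) & SEG) + carry_in) & SEG

    z = a & 1                               # z0 = y0
    z |= segment(1, 0) << 1                 # z1..z7 (no carry can enter bit 1)
    z |= segment(8, c8) << 8                # z8..z14
    z |= segment(15, c15) << 15             # z15..z21
    z |= ((a >> 20) & 1) << 22              # z22 = y20, the sign bit
    return z - (1 << 23) if z >> 22 else z  # interpret as 23-bit 2's complement

y = -54321
print(triple_from_stored_carries(y, *stored_carries(y)))   # -162963
```

Because the three segments no longer wait for each other's carries, the longest carry chain is seven bits, which matches the delay claim in the text.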

The digit set conversion is given by

D_i = y_{3i-1} + y_{3i} + 2 y_{3i+1} - 4 y_{3i+2}    (7)

where y_{-1}, y_n, y_{n+1} and y_{n+2} are zero. For the radix-8 Booth encoded modulo 2^n-1 multiplier, the required modulo-reduced partial products are shown in Table 4.
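The digit set conversion can be checked numerically with a short Python sketch. The function name is an assumption, as is the digit count of floor(n/3) + 1, chosen so that every multiplier bit is covered once the out-of-range bits are taken as zero.

```python
def booth_radix8_digits(y, n):
    """Recode an n-bit multiplier into radix-8 Booth digits D_i in {-4, ..., +4}.

    Bits y_{-1}, y_n, y_{n+1} and y_{n+2} are treated as zero."""
    def bit(i):
        return (y >> i) & 1 if 0 <= i < n else 0

    return [bit(3 * i - 1) + bit(3 * i) + 2 * bit(3 * i + 1) - 4 * bit(3 * i + 2)
            for i in range(n // 3 + 1)]

digits = booth_radix8_digits(0b10110111, 8)
print(digits, sum(d * 8**i for i, d in enumerate(digits)))   # [-1, -1, 3] 183
```

An 8-bit multiplier is thus covered by three digits, against four digits for radix-4 recoding, at the price of the hard multiples +3 and -3.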

From Table 3, the necessary modulo-reduced partial products except 3X can be

generated by circular-left-shift operation and/or bit-wise complementation of the

multiplicand, X. The generation of


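ENCODING ALGORITHM:

Let X = \sum_{i=0}^{n-1} x_i 2^i and Y = \sum_{i=0}^{n-1} y_i 2^i represent the multiplicand and the multiplier of the modulo 2^n-1 multiplier, respectively. The radix-8 Booth encoding algorithm can be viewed as a digit set conversion of four consecutive overlapping multiplier bits y_{3i+2} y_{3i+1} y_{3i} (y_{3i-1}) to a signed digit d_i, d_i \in [-4, 4], for i = 0, 1, ..., \lfloor n/3 \rfloor:

d_i = y_{3i-1} + y_{3i} + 2 y_{3i+1} - 4 y_{3i+2}    (1)

where y_{-1} = y_n = y_{n+1} = y_{n+2} = 0.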

TABLE 5: Modulo-Reduced Multiples for the Radix-8 Booth Encoding

d_i    |d_i·X|_{2^n-1}      d_i    |d_i·X|_{2^n-1}
+0     0...0                -0     1...1
+1     X                    -1     X'
+2     CLS(X, 1)            -2     CLS(X', 1)
+3     |+3X|_{2^n-1}        -3     |-3X|_{2^n-1}
+4     CLS(X, 2)            -4     CLS(X', 2)

Table 5 summarizes the modulo-reduced multiples of X for all possible values of the radix-8 Booth encoded multiplier digit, d_i, where CLS(X, j) denotes a circular-left-shift of X by j bit positions and X' denotes the bitwise complement of X. Three unique properties of modulo 2^n-1 arithmetic that will be used for simplifying the combinatorial logic circuit of the proposed modulo multiplier design are reviewed here.
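The multiples in Table 5 rely on two facts of modulo 2^n - 1 arithmetic: multiplying by 2^j is a circular left shift, and negation is a bitwise complement. A minimal Python sketch that checks every digit value follows. The function names are assumptions, and computing |-3X| as the complement of |+3X| is a simplification chosen here, not necessarily how the original design generates it.

```python
def cls(x, j, n):
    """Circular left shift of the n-bit word x by j positions."""
    mask = (1 << n) - 1
    return ((x << j) | (x >> (n - j))) & mask

def reduced_multiple(d, x, n):
    """Modulo-reduced multiple |d*X| mod (2^n - 1) for a radix-8 Booth digit d in -4..+4."""
    mask = (1 << n) - 1
    magnitude = abs(d)
    if magnitude == 0:
        value = 0
    elif magnitude == 1:
        value = x
    elif magnitude == 2:
        value = cls(x, 1, n)
    elif magnitude == 3:
        total = x + cls(x, 1, n)              # the hard multiple: X + 2X
        value = (total & mask) + (total >> n) # end-around carry
    else:
        value = cls(x, 2, n)
    # Negative digits: bitwise complement, since x + ~x = 2^n - 1, which is 0 modulo 2^n - 1.
    return value ^ mask if d < 0 else value

n, x = 8, 0b10010110
for d in (+3, -3, -4):
    print(d, reduced_multiple(d, x, n) % 255, (d * x) % 255)
```

Each printed line shows the digit followed by two equal residues, one from the shift/complement rule and one from ordinary arithmetic.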

Of all possible two-operand adder implementations, the RCA has indubitably the least area and dynamic power dissipation. The addends |X|_{2^n-1} and |2X|_{2^n-1} are added with carry propagation through full adders (FAs), and the end-around-carry addition is realized with carry propagation through half adders.

This amounts to two additions in series such that the carry propagation length is twice the operand length, n. In the worst case, the late arrival of |+3X|_{2^n-1} can delay the subsequent stages of the modulo 2^n-1 multiplier. Hence, this approach for hard multiple generation can no longer categorically ensure that the multiplication in the modulo 2^n-1 channel still falls in the noncritical path of an RNS multiplier.
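The two-pass structure described above (a full-adder chain followed by a half-adder chain for the end-around carry) can be modelled bit by bit in Python. The function name is an assumption, and the loop structure only mirrors the order of carry propagation; it is not a timing model.

```python
def hard_multiple_rca(x, n):
    """Compute |3X| mod (2^n - 1) with an RCA and an end-around-carry pass."""
    mask = (1 << n) - 1
    two_x = ((x << 1) | (x >> (n - 1))) & mask   # |2X| is a circular left shift by 1

    # Pass 1: n full adders, carry ripples from bit 0 to bit n-1.
    carry, total = 0, 0
    for i in range(n):
        a, b = (x >> i) & 1, (two_x >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))

    # Pass 2: n half adders add the carry-out back in at bit 0 (end-around carry).
    result = 0
    for i in range(n):
        s = (total >> i) & 1
        result |= (s ^ carry) << i
        carry = s & carry
    return result

print(hard_multiple_rca(0b10010110, 8), (3 * 0b10010110) % 255)   # 195 195
```

In the worst case the carry travels through all n positions of the first pass and then through the second pass as well, which is the 2n-long propagation the text is concerned about.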

PROPOSED RADIX-8 BOOTH ENCODED MODULO 2^n-1 MULTIPLIER DESIGN

To ensure that the radix-8 Booth encoded modulo multiplier does not constitute

the system critical path of a high-DR moduli set based RNS multiplier, the carry

propagation length in the hard multiple generation should not exceed n bits. To this end, the carry propagation through the HAs in Fig. 5 can be eliminated by making the end-around-carry bit c7 a partial product bit to be accumulated in the CSA tree. This technique reduces the carry propagation length to n bits by representing the hard multiple as a sum and a redundant end-around-carry bit pair.

GENERATION OF PARTIALLY-REDUNDANT HARD MULTIPLE

Let $|X|_{2^n-1}$ and $|2X|_{2^n-1}$ be added by a group of M = n/k k-bit RCAs such that there is no carry propagation between the adders. Fig. 6 shows this addition for n = 8 and k = 4.

Each RCA block produces its own sum bits and a carry-out bit. In Fig. 6, the carry-out of RCA 0 is not propagated to the carry input of RCA 1 but preserved as one of the partial product bits to be accumulated in the CSA tree. The binary weight of the carry-out of RCA 1 has, however, exceeded the maximum range of the modulus and has to be modulo reduced before it can be accumulated by the CSA tree. Since $2^n \equiv 1 \pmod{2^n-1}$, this carry-out is reinserted at the least significant bit position.
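The block-wise addition can be modelled arithmetically as follows. This is a minimal sketch, not the circuit itself: the function name is hypothetical, and the placement of each block's carry-out at the next block's least significant position (with the top block's carry wrapping to bit 0) is an assumption drawn from the description above.

```python
def hard_multiple_partially_redundant(x: int, n: int, k: int) -> tuple[int, int]:
    """Return (S, C) with S + C congruent to 3x modulo 2^n - 1.

    M = n/k independent k-bit adders are used, so no carry crosses a block
    boundary; each block's carry-out is kept in the carry word C instead.
    """
    assert n % k == 0, "n is assumed to be a multiple of k"
    mask_n = (1 << n) - 1
    mask_k = (1 << k) - 1
    two_x = ((x << 1) | (x >> (n - 1))) & mask_n   # |2X| as CLS(X, 1)

    s = 0
    c = 0
    for j in range(n // k):
        a = (x >> (k * j)) & mask_k
        b = (two_x >> (k * j)) & mask_k
        t = a + b                                  # carry-in of each block is 0
        s |= (t & mask_k) << (k * j)
        # Weight 2^(k(j+1)); the top block's carry has weight 2^n = 1 (mod 2^n - 1).
        c |= (t >> k) << ((k * (j + 1)) % n)
    return s, c


if __name__ == "__main__":
    s, c = hard_multiple_partially_redundant(0b10010110, 8, 4)
    print(f"S = {s:08b}, C = {c:08b}")
```

For n = 8 and k = 4 the carry word can only have ones at bit positions 0 and 4, which is what makes the representation only partially redundant.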

From Fig. 6, the partially-redundant form of $|{+3X}|_{2^n-1}$ is given by the partial-sum and partial-carry pair (S, C) of (5). The negative hard multiple in a partially-redundant form is computed as in (6).

A bias B is added to the hard multiple such that both C and its complement are sparse, where

$B = \sum_{j=0}^{M-1} 2^{k \cdot j} = \underbrace{0 \ldots 01}_{k} \cdots \underbrace{0 \ldots 01}_{k}$ (7)

The addends for the computation of the biased hard multiple, $|B+3X|_{2^n-1}$, in a partially-redundant form are $|X|_{2^n-1}$, $|2X|_{2^n-1}$ and B, or equivalently S, C and B. Since B is chosen to be a binary word that has logic ones at the bit positions of weight $2^{kj}$ and logic zeros at all other bit positions, $|B+3X|_{2^n-1}$ can be generated by simple XNOR and OR operations on the bits of S and C at the bit positions of weight $2^{kj}$. Fig. 7 illustrates how these bits in the sum and the carry outputs of RCA 0 and RCA 1 are modified. In general, $|B+3X|_{2^n-1}$ is given by the partial-sum and partial-carry pair (BS, BC) that satisfies (8), with the terms defined by (9) and (10) for j = 0, 1, ..., M-1.

Let

.. (11)

2B|2n-1. Therefore,

modulo 2n-1 is |

The proposed technique represents the hard multiple in a biased partially-redundant form. Since the occurrences of the hard multiple cannot be predicted at design time, all multiples must be uniformly represented. Similar to the hard multiple, all other Booth encoded multiples listed in Table 5 must also be biased and generated in a partially-redundant form. Fig. 8 shows the biased simple multiples, $|B+0|_{2^n-1}$, $|B+X|_{2^n-1}$, $|B+2X|_{2^n-1}$ and $|B+4X|_{2^n-1}$, represented in partially-redundant form for n = 8. From Fig. 8, it can be seen that the generation of these biased multiples involves only shift and selective complementation of the multiplicand bits without additional hardware overhead.

PARTIALLY-REDUNDANT PARTIAL PRODUCTS

The i-th partial product of a radix-8 Booth encoded modulo $2^n-1$ multiplier is given by

$PP_i = |2^{3i} \cdot d_i \cdot X|_{2^n-1}$ (12)

To include the bias B necessary for the partially-redundant representation of $PP_i$, (12) is modified to

$PP_i = |2^{3i} (B + d_i \cdot X)|_{2^n-1}$ (13)

Using Property 3, the modulo $2^n-1$ multiplication by $2^{3i}$ in (13) is efficiently implemented as a bitwise circular-left-shift of the biased multiple, $(B + d_i \cdot X)$. For n = 8 and k = 4, Fig. 9 illustrates the partial product matrix of $|X \cdot Y|_{2^8-1}$ with $\lfloor n/3 \rfloor + 1$ partial products in partially-redundant representation. Each $PP_i$ consists of an n-bit vector, $pp_{i7}, \ldots, pp_{i1}, pp_{i0}$, and a vector of n/k = 2 redundant carry bits, $q_{i1}, q_{i0}$. Since $q_{i0}$ and $q_{i1}$ are the carry-out bits of the RCAs, they are displaced by k bit positions for a given $PP_i$. The bit $q_{ij}$ is displaced circularly to the left of $q_{(i-1)j}$ by 3 bits, i.e., $q_{20}$ and $q_{21}$ are displaced circularly to the left of $q_{10}$ and $q_{11}$ by 3 bits, respectively, and $q_{10}$ and $q_{11}$ are in turn displaced to the left of $q_{00}$ and $q_{01}$ by 3 bits, respectively. The last partial product in Fig. 9 is the Compensation Constant (CC) for the bias introduced in the partially-redundant representation.
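The role of the bias and of the compensation constant can be illustrated with an arithmetic model of (13). This is a hedged sketch: `mod_mul_biased` and its helpers are hypothetical names, integer arithmetic stands in for the CSA tree, and the compensation constant is assumed to be the modulo-reduced negative of the accumulated bias, $|-\sum_i 2^{3i} B|_{2^n-1}$, which is what (13) implies but is not stated explicitly in the text.

```python
def cls(x: int, j: int, n: int) -> int:
    """Circular-left-shift of the n-bit word x by j positions (multiplies by 2^j mod 2^n - 1)."""
    j %= n
    return ((x << j) | (x >> (n - j))) & ((1 << n) - 1)


def booth_digit(y: int, i: int, n: int) -> int:
    """Radix-8 Booth digit d_i of the n-bit multiplier y."""
    def bit(pos: int) -> int:
        return (y >> pos) & 1 if 0 <= pos < n else 0
    return bit(3 * i - 1) + bit(3 * i) + 2 * bit(3 * i + 1) - 4 * bit(3 * i + 2)


def mod_mul_biased(x: int, y: int, n: int, k: int) -> int:
    """Compute |x * y| modulo 2^n - 1 from biased partial products plus a compensation constant."""
    m = (1 << n) - 1
    bias = sum(1 << (k * j) for j in range(n // k))       # B of equation (7)
    num_pp = n // 3 + 1
    total = 0
    for i in range(num_pp):
        biased_multiple = (bias + booth_digit(y, i, n) * x) % m
        total += cls(biased_multiple, 3 * i, n)           # equation (13)
    # Assumed compensation constant: cancels the bias added to every partial product.
    cc = (-bias * sum(1 << (3 * i) for i in range(num_pp))) % m
    return (total + cc) % m


if __name__ == "__main__":
    print(mod_mul_biased(150, 181, 8, 4), (150 * 181) % 255)
```

Because the bias is the same for every partial product, the compensation constant depends only on n and k and can be fixed at design time.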

The generation of the modulo-reduced partial products, $PP_0$, $PP_1$ and $PP_2$, in a partially-redundant representation using Booth Encoder (BE) and Booth Selector (BS) blocks is illustrated in Fig. 10. The BE block produces a signed one-hot encoded digit from adjacent overlapping multiplier bits as illustrated in Fig. 11(a). The signed one-hot encoded digit is then used to select the correct multiple to generate $PP_i$. A bit-slice of the radix-8 BS for the partial product bit $pp_{ij}$ is shown in the accompanying figure.

As the bit positions of the redundant carry bits $q_{ij}$ do not overlap, as shown in the figure, they can be merged into a single partial product for accumulation. The merged partial products, $PP_i$, and the constant CC are accumulated using a CSA tree with end-around-carry addition at each CSA level and a final two-operand modulo $2^n-1$ adder.

MOTIVATIONS

To humans, decimal numbers are easy to comprehend and to use for arithmetic. However, in digital systems, such as a microprocessor, DSP (Digital Signal Processor) or ASIC (Application-Specific Integrated Circuit), binary numbers are more practical for a given computation.

Binary carry-propagate adders have been studied extensively, with much of the work attacking the carry-chain problem. Binary adders evolved from linear adders, which have a delay approximately proportional to the width of the adder, e.g. the ripple-carry adder (RCA), to logarithmic-delay adders, such as the carry-lookahead adder (CLA).

Modulo addition, an operation with a small variation from binary addition, can also be implemented with prefix architectures. Common modulo addition can even be found in memory addressing. Modulo $2^n-1$ addition is one of the most common operations that has been put to hardware implementation because of its circuit efficiency. Furthermore, modulo $2^n+1$ addition is critical to improving advanced cryptography techniques.

MODULO $2^n-1$ ADDERS

Arithmetic modulo $2^n-1$ (Mersenne numbers) and modulo $2^n+1$ (Fermat numbers) is used in various applications, e.g., residue number systems (RNS) and cryptography. There are several ways of doing modulo $2^n-1$ addition. The basic idea is to add the carry-out back to the sum, in the fashion of an end-around-carry addition.

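MODULO ($2^n-1$) ADDITION

Modulo ($2^n-1$) addition or, which is the same, one's complement addition can be formulated as

$(A + B) \bmod (2^n - 1) = \begin{cases} (A + B + 1) \bmod 2^n & \text{if } A + B \ge 2^n - 1 \\ A + B & \text{otherwise} \end{cases}$ (5.3.1)

Note that the value 11...1 never occurs and that only one single representation 00...0 of zero exists. Equation (5.3.1) can be rewritten using the condition $A + B \ge 2^n$:

$(A + B) \bmod (2^n - 1) = \begin{cases} (A + B + 1) \bmod 2^n & \text{if } A + B \ge 2^n \\ A + B & \text{otherwise} \end{cases}$ (5.3.2)

With this condition, the sum $A + B = 2^n - 1$ is left as 11...1, which acts as a second representation of zero. Since the new condition $A + B \ge 2^n$ is equivalent to $c_{out} = 1$, where $c_{out}$ is the carry-out of the addition A + B, equation (5.3.2) can be rewritten as

$(A + B) \bmod (2^n - 1) = (A + B + c_{out}) \bmod 2^n$ (5.3.3)

which can be realized by an n-bit end-around-carry parallel-prefix adder.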
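The end-around-carry formulation is easy to check in software. The following Python sketch is illustrative, with a hypothetical function name; it follows the carry-out form of the equation above, so the all-ones word may appear as a second representation of zero.

```python
def add_mod_mersenne(a: int, b: int, n: int) -> int:
    """End-around-carry addition of two n-bit words modulo 2^n - 1."""
    mask = (1 << n) - 1
    total = a + b
    c_out = total >> n              # carry-out of the plain n-bit addition
    return (total + c_out) & mask   # feed the carry-out back in as carry-in


if __name__ == "__main__":
    n = 4
    print(add_mod_mersenne(9, 11, n), (9 + 11) % 15)
    print(add_mod_mersenne(7, 8, n))   # 7 + 8 = 15 stays as 1111, i.e. zero
```

When the result must have a single representation of zero, the all-ones output has to be mapped to 00...0 separately.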

UNDERSTANDING PARALLEL-PREFIX STRUCTURES

To resolve the delay of carry-lookahead adders, the scheme of multilevel-lookahead adders or parallel-prefix adders can be employed. The idea is to compute small groups of intermediate prefixes and then find larger group prefixes, until all the carry bits are computed. These adders have tree structures within a carry-computing stage similar to the carry propagate adder. The process steps involved in parallel-prefix addition are depicted in Fig. 15.

A parallel prefix adder can be seen as a 3-stage process:

Pre-computation:

In the pre-computation stage, each bit computes its carry generate (g) and propagate (p) signals and a temporary sum as below. These two signals describe how the carry-out signal will be handled.

$g_i = a_i \cdot b_i$

$p_i = a_i \oplus b_i$

$c_{i+1} = g_i + p_i \cdot c_i$

Prefix:

In the prefix stage, the group carry generate/propagate signals are computed to form the carry chain and provide the carry-in for the adder below. Various signal graphs/architectures can be used to calculate the carry-outs for the final sum. A few of them are as follows.

Sklansky

Brent-Kung

Ladner-Fischer

Post-computation:

In the post-computation stage, the sum and carry-out are finally produced. The carry-out can be omitted if only a sum needs to be produced.

$s_i = p_i \oplus c_i$
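The three stages can be sketched as a bit-level software model. The example below uses a Kogge-Stone style prefix stage purely as one possible choice; the function name is hypothetical and the model is not a description of the hardware used in this work.

```python
def kogge_stone_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit words with a three-stage parallel-prefix model; returns (sum, carry_out)."""
    # Pre-computation: per-bit generate and propagate signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix stage: after log2(n) levels, G[i] is the carry out of bit i.
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c_in)          # fold the carry-in into bit 0
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2

    # Post-computation: s_i = p_i xor c_i, where c_0 = c_in and c_{i+1} = G[i].
    carries = [c_in] + G
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]


if __name__ == "__main__":
    print(kogge_stone_add(0b10110101, 0b01101110, 8))
```

Any other prefix graph (Sklansky, Brent-Kung, Ladner-Fischer) would replace only the middle loop; the pre- and post-computation stages stay the same.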

Parallel-prefix structures are common in high-performance adders because the delay is logarithmically proportional to the adder width. An example of an 8-bit parallel-prefix structure is shown in the figure below.

In the prefix tree, group generate/propagate are the only signals used. The group generate/propagate equations are based on the single-bit generate/propagate signals, which are computed in the pre-computation stage.

$g_i = a_i \cdot b_i$

$p_i = a_i \oplus b_i$ (5.5.1)

where $0 \le i < n$, $g_{-1} = c_{in}$ and $p_{-1} = 0$. Sometimes, $p_i$ can be computed with OR logic instead of an XOR gate.

In the prefix tree, group generate/propagate signals are computed at each bit.

$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$

$P_{i:k} = P_{i:j} \cdot P_{j-1:k}$ (5.5.2)

More practically, Equation (5.5.2) can be expressed using the symbol $\circ$ introduced by Brent and Kung. Its function is exactly the same as that of a black cell. That is,

$(G_{i:k}, P_{i:k}) = (G_{i:j}, P_{i:j}) \circ (G_{j-1:k}, P_{j-1:k})$ (5.5.3)

or

$G_{i:k} = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_k, p_k)$

$P_{i:k} = p_i \cdot p_{i-1} \cdots p_k$ (5.5.4)

In the post-computation, the sum and carry-out are the final output.

$s_i = p_i \oplus G_{i-1:-1}$

$c_{out} = G_{n-1:-1}$ (5.5.5)

where -1 is the position of the carry input. The generate/propagate signals can be grouped in different fashions to get the same correct carries. Based on different ways of grouping the generate/propagate signals, different prefix architectures can be created. Figure 17 shows the definitions of the cells that are used in prefix structures, including the black cell and the gray cell. Black/gray cells implement Equation (5.5.2) or (5.5.3), which will be heavily used in the following discussion on prefix trees.
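That different groupings give the same carries follows from the associativity of the black-cell operation. The short Python sketch below illustrates this; the function names are hypothetical, and (g, p) pairs are listed from the most significant bit of the group to the least significant.

```python
from functools import reduce


def prefix_op(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """Black-cell operation: combine a more significant group with a less significant one."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def group_serial(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Combine the pairs one after another (ripple-like grouping)."""
    return reduce(prefix_op, pairs)


def group_tree(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Combine the pairs by recursive halving (tree-like grouping)."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return prefix_op(group_tree(pairs[:mid]), group_tree(pairs[mid:]))


if __name__ == "__main__":
    pairs = [(0, 1), (1, 0), (0, 1), (0, 0)]
    print(group_serial(pairs), group_tree(pairs))
```

Both groupings return the same group generate/propagate pair, which is what allows prefix trees to trade logic depth against cell count and fan-out.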

EMPTY PREFIX TREE

Fig 5.4.5: 8-bit Empty Prefix Tree

[Figures: Step 1, Step 2 and Step 3 of building the prefix tree]

The prefix tree is built in the order the arrows indicate (i.e. from LSB to MSB horizontally and then from the top logic level down to the bottom logic level vertically).

The example shown in Figure 19.3 is an 8-bit Sklansky prefix tree. The Sklansky prefix tree takes the fewest logic levels to compute the carries. In addition, it uses fewer cells than the Knowles and Kogge-Stone structures, at the cost of higher fan-out. Figure 19.4 shows a 16-bit example of the Sklansky prefix tree with the critical path drawn as a solid line. A few other prefix trees are listed below.

Kogge-Stone prefix tree

Brent-Kung prefix tree

Ladner-Fischer prefix tree

Han-Carlson prefix tree

TYPES | LOGIC LEVELS | AREA | FAN-OUT | WIRE TRACKS
Brent-Kung | $2\log_2 n - 1$ | $2n - \log_2 n - 2$ | |
Kogge-Stone | $\log_2 n$ | $n\log_2 n - n + 1$ | | $n/2$
Ladner-Fischer | $\log_2 n + 1$ | | $n/4 + 1$ |
Knowles [2,1,1,1] | $\log_2 n$ | $n\log_2 n - n + 1$ | | $n/4$
Sklansky | $\log_2 n$ | $(n/2)\log_2 n$ | $n/2 + 1$ |
Han-Carlson | $\log_2 n$ | $(n/2)\log_2 n$ | | $n/4$
Harris | $\log_2 n + 1$ | $(n/2)\log_2 n$ | | $n/8$
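Two of the area expressions in the comparison, $(n/2)\log_2 n$ for Sklansky and $n\log_2 n - n + 1$ for Kogge-Stone, can be reproduced by enumerating the prefix cells. This is a sketch with hypothetical function names, assuming n is a power of two; each cell is recorded as (level, bit position, position of the less significant input).

```python
def sklansky_cells(n: int) -> list[tuple[int, int, int]]:
    """Enumerate the prefix cells of an n-bit Sklansky tree (n a power of two)."""
    levels = n.bit_length() - 1
    return [(l, i, ((i >> l) << l) - 1)
            for l in range(levels)
            for i in range(n) if (i >> l) & 1]


def kogge_stone_cells(n: int) -> list[tuple[int, int, int]]:
    """Enumerate the prefix cells of an n-bit Kogge-Stone tree (n a power of two)."""
    levels = n.bit_length() - 1
    return [(l, i, i - (1 << l))
            for l in range(levels)
            for i in range(1 << l, n)]


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, len(sklansky_cells(n)), len(kogge_stone_cells(n)))
```

For n = 16 the enumeration gives 32 cells for Sklansky and 49 for Kogge-Stone, in agreement with the two formulas.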

In a prefix problem, n inputs $x_{n-1}, x_{n-2}, \ldots, x_0$ and an arbitrary associative operator $\bullet$ are used to compute n outputs $y_i = x_i \bullet x_{i-1} \bullet \cdots \bullet x_0$ for $i = 0, 1, \ldots, n-1$. Thus each output $y_i$ is dependent on all inputs $x_j$ of the same or lower magnitude ($j \le i$).

Carry propagation in binary addition is a prefix problem. The n-bit carry propagate addition with input operands A and B, carry-in $c_{in}$, sum output S, and carry-out $c_{out}$ can be expressed by the logic equations:

$(c_{out}, S):\quad 2^n c_{out} + S = A + B + c_{in}$

PRE-PROCESSING:

$g_i = \begin{cases} a_0 b_0 + a_0 c_0 + b_0 c_0 & \text{if } i = 0 \\ a_i b_i & \text{otherwise} \end{cases}$

$p_i = a_i \oplus b_i$

PREFIX COMPUTATION:

$(G^0_{i:i}, P^0_{i:i}) = (g_i, p_i)$

$(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:j+1} + P^{l-1}_{i:j+1} G^{l-1}_{j:k},\; P^{l-1}_{i:j+1} P^{l-1}_{j:k})$

POST-PROCESSING:

$c_{i+1} = G^m_{i:0}$

$s_i = p_i \oplus c_i$

where $c_0 = c_{in}$ and m is the final prefix level.

The cell definitions for the above equations are depicted in Figure 19.

The prefix-structure size is only increased by n black nodes and the critical path

by one black node, which results in highly area and delay efficient end-around-carry

adders. Note that an n-bit end-around-carry parallel-prefix adder has the same delay but is

smaller compared to an ordinary 2n-bit parallel-prefix adder.
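The extra row of n cells can be modelled in software as follows. This is a hedged sketch, not the exact circuit: the function name is hypothetical, a Kogge-Stone style prefix stage is assumed, and the final row feeds the carry-out $G_{n-1:0}$ back into every bit position as $c_i = G_{i-1:0} + P_{i-1:0} \cdot c_{out}$.

```python
def eac_prefix_add(a: int, b: int, n: int) -> int:
    """n-bit end-around-carry parallel-prefix addition (modulo 2^n - 1, double zero)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Ordinary prefix stage with carry-in 0: G[i], P[i] describe the group i:0.
    G, P = g[:], p[:]
    dist = 1
    while dist < n:
        G, P = (
            [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(n)],
            [P[i] & P[i - dist] if i >= dist else P[i] for i in range(n)],
        )
        dist *= 2

    # Extra row: feed the carry-out back as carry-in to every bit position.
    c_out = G[n - 1]
    carries = [c_out] + [G[i] | (P[i] & c_out) for i in range(n - 1)]
    return sum((p[i] ^ carries[i]) << i for i in range(n))


if __name__ == "__main__":
    print(eac_prefix_add(200, 100, 8), (200 + 100) % 255)
```

The feedback adds only one cell to the critical path because the group signals for every bit position are already available when the carry-out is known.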

COMPARISON OF DIFFERENT PREFIX ADDERS

SKLANSKY ADDER

Minimal depth

High fan-out nodes

KOGGE-STONE ADDER

Low depth

High node count (implies more area)

Minimal fan-out of 1 at each node (implies faster performance)

LADNER-FISCHER ADDER

Low depth

High fan-out nodes

This adder topology appears the same as the Sklansky adder. Ladner and Fischer formulated a parallel prefix network design space which included this minimal depth case.

BRENT-KUNG ADDER

Maximum logic depth in parallel-prefix adders (implies longer calculation time)

Minimum number of nodes (implies minimum area)

CHAPTER-6

EXECUTION DETAILS

6.1 SOFTWARE REQUIREMENTS

MODELSIM 6.4b

XILINX 14.6

It requires the Xilinx ISE 10.1 version of the software, in which Verilog source code can be used for design implementation.

Introduction To ModelSim

In ModelSim, all designs are compiled into a library. You typically start a new simulation in ModelSim by creating a working library called "work". "Work" is the library name used by the compiler as the default destination for compiled design units.

Compiling Your Design

After creating the working library, you compile your design units into it. The ModelSim library format is compatible across all supported platforms. You can simulate your design on any platform without having to recompile your design.

Loading the Simulator with Your Design and Running the Simulation

With the design compiled, you load the simulator with your design by invoking the simulator on a top-level module (Verilog) or a configuration or entity/architecture pair (VHDL). Assuming the design loads successfully, the simulation time is set to zero, and you enter a run command to begin simulation.

Debugging Your Results

If you don't get the results you expect, you can use ModelSim's robust debugging environment to track down the cause of the problem.

Basic Simulation Flow

The basic steps for simulating a design in ModelSim are:

Create a working library

Compile design files

Load and run simulation

Debug results

Project Design Flow

The project flow is similar to the basic simulation flow, with one important difference: you do not have to create a working library in the project flow; it is done for you automatically.

Create a project

Add files to the project

Compile design files

Run simulation

Debug results

Xilinx ISE can be used to create, implement, simulate, and synthesize Verilog designs for implementation on FPGA chips.

ISE: Integrated Software Environment

Environment for the development and test of digital system designs targeted to FPGA or CPLD

Integrated collection of tools accessible through a GUI

Based on a logic synthesis engine (XST: Xilinx Synthesis Technology)

XST supports different languages:

Verilog

VHDL

XST produces a netlist integrated with constraints

Supports all the steps required to complete the design:

Translate, map, place and route

Bitstream generation

Supports verification at different steps of the design

Step 1: Design entry

HDL (Verilog or VHDL, ABEL for CPLD), schematic drawings, bubble diagrams

Step 2: Synthesis

Translates .v, .vhd and .sch files into a netlist file (.ngc)

Step 3: Implementation

FPGA: Translate/Map/Place & Route, CPLD: Fitter

Step 4: Configuration/Programming

Download a BIT file into the FPGA

Program a JEDEC file into the CPLD

Program an MCS file into the Flash PROM

Simulation can occur after steps 1, 2 and 3

INTRODUCTION

FPGA stands for Field Programmable Gate Array, a device that contains an array of logic modules, I/O modules and routing tracks (programmable interconnect). An FPGA can be configured by the end user to implement specific circuitry. Early devices ran at up to about 100 MHz, while present-day devices reach the GHz range.

FPGA DESIGN FLOW

An FPGA contains a two-dimensional array of logic blocks and interconnections between the logic blocks. Both the logic blocks and the interconnects are programmable. Logic blocks are programmed to implement a desired function, and the interconnects are programmed using the switch boxes to connect the logic blocks.

FPGAs, an alternative to custom ICs, can be used to implement an entire System on a Chip (SoC). The main advantage of an FPGA is the ability to reprogram it. The user can reprogram an FPGA to implement a design after the FPGA is manufactured, which gives rise to the name "Field Programmable".

SRAM is used to implement a LUT. A k-input logic function is implemented using a $2^k \times 1$ SRAM. The number of different possible functions for a k-input LUT is $2^{2^k}$. The advantage of such an architecture is that it supports the implementation of a very large number of logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large.

LUT-based design provides better logic block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with a trade-off between performance and logic density. An n-LUT can be seen as a direct implementation of a function truth table. Each latch holds the value of the function corresponding to one input combination. For example, a 2-LUT can be used to implement any of 16 functions, such as AND, OR and A + NOT B.
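The truth-table view of a LUT can be made concrete with a small software model. This is an illustrative sketch with hypothetical helper names; a Python list stands in for the $2^k \times 1$ memory.

```python
from itertools import product


def make_lut(k: int, func) -> list[int]:
    """Fill 2^k one-bit memory cells with the truth table of a k-input function."""
    return [func(*(((addr >> i) & 1) for i in range(k))) & 1
            for addr in range(1 << k)]


def lut_read(table: list[int], inputs: tuple[int, ...]) -> int:
    """Use the input bits as the memory address and return the stored bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


if __name__ == "__main__":
    and_lut = make_lut(2, lambda a, b: a & b)
    a_or_not_b = make_lut(2, lambda a, b: a | (1 - b))
    print(and_lut, a_or_not_b)
    print(lut_read(and_lut, (1, 1)), lut_read(a_or_not_b, (0, 1)))
    # Every possible content of the 4 cells is a distinct 2-input function.
    print(len(set(product((0, 1), repeat=4))))
```

The last line counts the possible memory contents of a 2-LUT, which equals the number of 2-input functions it can implement.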

FPGA DESIGN FLOW

This part gives a short introduction to the FPGA design flow. A simplified version of the design flow is given in the following diagram.

Design Entry

There are different techniques for design entry: schematic based, Hardware Description Language (HDL) based, or a combination of both. The choice of method depends on the design and the designer. If the designer wants to deal more with hardware, then schematic entry is the better choice.

Synthesis

Synthesis is the process that translates VHDL or Verilog code into a device netlist format, i.e. a complete circuit with logical elements (gates, flip-flops, etc.) for the design. If the design contains more than one sub-design (for example, to implement a processor we need a CPU as one design element, RAM as another, and so on), then the synthesis process generates a netlist for each design element.

Implementation

This process consists of a sequence of three steps:

1. Translate

2. Map

3. Place and Route

Translate

The translate process combines all the input netlists and constraints into a logic design file, which is saved as an NGD (Native Generic Database) file.

Map

The map process divides the whole circuit with logical elements into sub-blocks such that they fit into the FPGA logic blocks. That is, the map process fits the logic defined by the NGD file into the targeted FPGA elements (Configurable Logic Blocks (CLBs) and Input Output Blocks (IOBs)) and generates an NCD (Native Circuit Description) file, which physically represents the design mapped to the components of the FPGA. The MAP program is used for this purpose.

Place and Route

The PAR program is used for this process. The place and route process places the sub-blocks from the map process into logic blocks according to the constraints and connects the logic blocks.

Device Programming

Now the design must be loaded onto the FPGA, but it must first be converted to a format that the FPGA can accept. The BITGEN program handles this conversion. The routed NCD file is given to the BITGEN program to generate a bitstream (a .BIT file), which can be used to configure the target FPGA device. This is done using a cable; the selection of the cable depends on the design.

Behavioral Simulation

This is the first of the simulation steps encountered throughout the hierarchy of the design flow. This simulation is performed before the synthesis process to verify the RTL (behavioral) code and to confirm that the design is functioning as intended.

6.3 RESULT

6.3.1 SIMULATION RESULT

Macro Statistics
# Registers                          : 8
 Flip-Flops                          : 8
# Xors                               : 109
 1-bit xor2                          : 61
 1-bit xor3                          : 48
=====================================================================
Final Register Report

Macro Statistics
# Registers                          : 8
 Flip-Flops                          : 8
=====================================================================
*                           Final Report                            *
=====================================================================
Final Results
RTL Top Level Output File Name       : TOP_NIS.ngr
Top Level Output File Name           : TOP_NIS
Output Format                        : NGC
Optimization Goal                    : Speed
Keep Hierarchy                       : NO

Design Statistics
# IOs                                : 26

Cell Usage:
# BELS                               : 193
#      LUT2                          : 10
#      LUT3                          : 56
#      LUT4                          : 120
#      MUXF5                         : 7
# Flip Flops/Latches                 : 8
#      FDR                           : 8
# Clock Buffers                      : 1
#      BUFGP                         : 1
# IO Buffers                         : 25
#      IBUF                          : 17
#      OBUF                          : 8
=====================================================================
Number of Slices                     : 2%
                                     : 1%
Number of IOs                        : 26
Number of bonded IOBs                : 26 out of 232   11%
 IOB Flip Flops                      : 8
Number of GCLKs                      : 1 out of 24   4%
=====================================================================
Timing Summary:
Speed Grade: -4

Minimum input arrival time before clock  : 24.020ns
Maximum output required time after clock : 4.283ns
Maximum combinational path delay         : No path found

Total                                    : 24.020ns (56.3% logic, 43.7% route)

Timing Detail:
All values displayed in nanoseconds (ns)

RTL SCHEMATICS


CHAPTER-7

APPLICATIONS

The residue number system (RNS) has been a very attractive solution for many researchers, especially during the last decade. Extensive research has been devoted to the theory of improving the RNS and to applying it in application areas such as digital signal processing, digital filters, the fast Fourier transform (FFT), and image processing.

The RNS is inherently parallel, modular and fault tolerant. Operations such as addition, subtraction, and multiplication are inherently carry-free between residue channels, thus saving a great amount of circuit integration area where carry-detection circuitry previously had to be implemented.

RSA Algorithm

Digital Signal Processing

Digital Filtering

Image Processing

Error Detection and Correction

CHAPTER-8

CONCLUSION

A new approach for modulo $2^n-1$ multiplication is proposed. In this design, a Partial Product Generator, a Carry Save Adder and a Parallel Prefix Adder are used. Similar to the binary multiplier, the generation of the partial products is accomplished by AND gates. The radix-8 Partial Product Generator (PPG) increases the speed by reducing the number of partial product rows from N to $\lfloor N/3 \rfloor + 1$. The Carry Save Adder is used to add the PPG output values. To fully exploit the unequal delays of a full adder, an algorithm for delay optimization of the Wallace tree is developed. The proposed parallel-prefix adding approach exhibits superior performance, in terms of either speed or hardware requirement, in comparison with a recent counterpart for the same purpose. In addition, the proposed modulo $2^n-1$ multiplier shows an extremely regular structure and is very suitable for VLSI implementation.

I have used ModelSim 6.4b for simulation, Xilinx ISE 14.6 for synthesis, timing analysis and power analysis, and an FPGA Spartan-3E kit for downloading and post-simulation of the design. The total delay achieved is 24 ns and the total power is 0.0076 W.

CHAPTER-9

FUTURE SCOPE

The Montgomery modular multiplication algorithm is a well-known method that is employed in efficient modular multiplication architectures and is therefore widely used in GF(p) elliptic curve applications.

The complexity of the Montgomery multiplier makes the testing process a big challenge. A methodology for developing testing modules is needed. Including a self-testing block in the multiplier's system will be beneficial and will reduce the time and effort for testing. A self-testing block will perform Montgomery multiplication of hardwired numbers and compare the result with predefined values. A flag bit can be used to indicate an error.

A power dissipation study of the design is also needed in the context of differential power analysis attacks. This type of attack on a cryptographic system tries to deduce parameters of the system by observing the system's power dissipation. This study would also show the suitability of this design approach for low-power devices, such as portable computers.

More study needs to be done to see the effect of applying the re-timing technique to the radix-2 design, and how the re-timing will affect the performance of the design. Some investigations need to be done to show how the radix-4 design presented in this text can be extended to cover the unified architecture. The integration of multiplication and exponentiation can be included as part of a hardware co-processor.

