
# International Journal of Advances in Science and Technology, Vol. 3, No. 6, 2011

## VLSI Implementation of Modified Montgomery Multiplication Algorithm

V. Narasimha Nayak¹*, Suneel Mudunuru¹, Md. Sami²*, M. Nagesh Babu³*

¹ Assistant Professor, Dept. of ECE, K L University, Vijayawada, A.P., India. narasimhanayak17@gmail.com, suneel_127@yahoo.co.in

² Assistant Professor, Dept. of ECE, SBIT, Khammam, A.P., India. sami_849@yahoo.com

³ Associate Professor, Dept. of ECE, K L University, Vijayawada, A.P., India. maile.nageshbabu@gmail.com

Abstract
Modular multiplication is the basic building block of the computations performed in the RSA cryptosystem, in which both encryption and decryption are carried out using modular multiplication and exponentiation. This work describes how modular multiplication and exponentiation can be implemented using the Montgomery algorithm. Performance results for the Montgomery algorithm, in terms of area and time delay, are presented and discussed for two different operand sizes and technologies. An FPGA prototype with an iterative sequential architecture is also designed to implement modular multiplication using the Montgomery algorithm. The entire design was described in VHDL and simulated using the Xilinx Project Manager (version Build 13.0).

Keywords: GF (Galois Field), Field Programmable Gate Array (FPGA), Montgomery Multiplication.

## 1. Introduction

The performance of the RSA cryptosystem depends mainly on how efficiently modular multiplication and exponentiation are implemented. To speed up encryption and decryption, the number of modular multiplications must be minimized, along with the time and area each one requires [4]. Many algorithms have been proposed for modular multiplication, such as Brickell's algorithm [5] and Booth's algorithm [6]. The Montgomery algorithm, however, is among the most efficient ways to implement modular multiplication and exponentiation in RSA cryptosystems. In this paper, a modified Montgomery algorithm is used to implement modular multiplication and exponentiation. The method computes the Montgomery product of two numbers A and B over a modulus M, and this Montgomery multiplication is then used to implement Montgomery exponentiation for encryption [4], [8].

## 2. Montgomery Algorithm

Multiplication is the basic building block of many computational systems, including the RSA cryptosystem. An RSA cryptosystem consists of a modulus M and two integers D and E, called the private and public keys respectively, which satisfy $T^{DE} \equiv T \pmod{M}$ for any plain text T with $0 \le T < M$. A message is encrypted using the public key as $C = T^{E} \bmod M$ and decrypted using the private key as $T = C^{D} \bmod M$. Modular exponentiation is therefore used for both encryption and decryption, and it consists of repeated modular multiplications. Montgomery introduced an efficient technique for multiplying two integers modulo M, commonly known as Montgomery multiplication [4]. The algorithm computes $(A \times B \times 2^{-n}) \bmod M$ through an iterative process of additions and shifts, and it is one of the more popular and efficient ways to implement modular multiplication and exponentiation.

In general, a modular multiplication algorithm consists of two steps: one generates the product $P = A \times B$, and the other reduces this product to $P \bmod M$. The Montgomery algorithm yields the reduced product using a series of additions. Let A, B and M be the multiplicand, the multiplier and the modulus respectively, and let n be the number of digits in their binary representation (radix 2). We then denote A, B and M as follows:

$$A = \sum_{i=0}^{n-1} a_i \times 2^i, \qquad B = \sum_{i=0}^{n-1} b_i \times 2^i, \qquad M = \sum_{i=0}^{n-1} m_i \times 2^i$$
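The binary representation above is what the hardware scans bit by bit. As a minimal sketch (the helper names `to_bits` and `from_bits` are illustrative and do not come from the paper), the digits $a_i$ can be extracted and recombined as follows:

```python
def to_bits(x, n):
    """Return the n binary digits of x, least significant first (x_0 ... x_{n-1})."""
    if not 0 <= x < (1 << n):
        raise ValueError("x does not fit in n bits")
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Rebuild the integer as the sum of x_i * 2**i, mirroring the sums above."""
    return sum(bit << i for i, bit in enumerate(bits))


n = 5
a_bits = to_bits(0b01010, n)   # A = 10 in decimal
print(a_bits, from_bits(a_bits))
```

The least-significant-first ordering matches the order in which a right-shifting register would present the bits.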

The Montgomery algorithm requires the following conditions: the modulus M must be relatively prime to the radix, i.e. M and the radix have no common divisor other than 1 (for radix 2, M must be odd); and the multiplier and the multiplicand must be smaller than M, where M is a prime number in the range $2^{n-1} < M < 2^{n}$.

Algorithm for Montgomery Multiplication (A, B, M), where $r_0$ denotes the least significant bit of R:

    1: Int R = 0
    2: For i = 0 to n-1 {
    3:     R = R + a_i * B
    4:     If r_0 = 0 then
    5:         R = R div 2
           Else
    6:         R = (R + M) div 2
       }
    7: Return R

This algorithm can produce an output that exceeds the modulus. For example, with inputs $A = 01010_2$ ($10_{10}$), $B = 10011_2$ ($19_{10}$) and $M = 11111_2$ ($31_{10}$), the value of R is $39_{10}$ at i = 3, which already exceeds the modulus. As a result, the time delay and area increase. To overcome this limitation, a modified algorithm for Montgomery multiplication is proposed as follows:

Modified Algorithm for Montgomery Multiplication (A, B, M):

    1: Int R = 0
    2: For i = 0 to n-1 {
    3:     R = R + a_i * B
    4:     If r_0 = 0 then
    5:         R = R div 2
           Else
    6:         R = (R + M) div 2
       }
    7: If R > M then
    8:     R = R - M
    9: Return R
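The two algorithms above can be checked in software. The following is a minimal sketch, not the authors' VHDL design: the function names and the optional `trace` argument are illustrative. It tests `R >= M` in the final correction rather than the `R > M` of step 7, an assumption made here so that R = M would also map to 0.

```python
def montgomery_basic(a, b, m, n, trace=None):
    """Basic algorithm: returns R congruent to a*b*2**(-n) mod m (R < 2*m when a, b < m)."""
    if m % 2 == 0:
        raise ValueError("modulus must be odd (relatively prime to radix 2)")
    r = 0
    for i in range(n):
        a_i = (a >> i) & 1      # bit i of the multiplicand
        r += a_i * b            # step 3: R = R + a_i * B
        if r & 1:               # r_0 = 1: adding M makes the sum even
            r += m
        r >>= 1                 # steps 5/6: exact division by 2
        if trace is not None:
            trace.append(r)     # record R after iteration i
    return r


def montgomery_modified(a, b, m, n):
    """Modified algorithm: one conditional subtraction brings R below m."""
    r = montgomery_basic(a, b, m, n)
    if r >= m:                  # assumption: >= instead of the paper's >
        r -= m
    return r


steps = []
basic = montgomery_basic(0b01010, 0b10011, 0b11111, 5, trace=steps)
modified = montgomery_modified(0b01010, 0b10011, 0b11111, 5)
print(steps, basic, modified)   # [0, 25, 28, 39, 35] 35 4
```

For the example operands, R reaches 39 after i = 3 and the basic loop returns 35, both above the modulus 31, while the modified version returns 4. Since $2^5 \equiv 1 \pmod{31}$, this agrees with $10 \times 19 \bmod 31 = 4$.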


Using the modified algorithm, the output for the same inputs is R = 4 (at i = 4), which is below the modulus. Hence, area and latency are reduced. The table below gives the time delay and area of the implementation for 5-bit and 16-bit operands; the design can also be implemented for larger operand sizes.

| Parameter | 5-bit | 16-bit |
|---|---|---|
| Time (ns) | 15.5 | 38.84 |
| Area (slices) | 113 | 1294 |

## 3. Iterative Sequential Architecture for Montgomery Multiplication

Montgomery multiplication is performed using this architecture: A, B and M are taken as inputs, and $R = (A \times B \times 2^{-n}) \bmod M$ is produced as output. The architecture of the Montgomery modular multiplier is given in Figure 1. It uses adders, multiplexers, shift registers and a controller. The first multiplexer, MUX21, passes 0 or the content of B depending on whether the bit $a_i$ held in the LSB of SHIFT REGISTER1 is 0 or 1. The second multiplexer, MUX22, passes 0 or the content of M depending on whether bit $r_0$ is 0 or 1, respectively. The first adder, ADDER1, produces the sum $R + a_i B$, and the second adder, ADDER2, produces the sum R + M. The shift register SHIFT REGISTER1 supplies the bit $a_i$: it is shifted right once per iteration, so that its LSB always contains $a_i$.
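A behavioural model can make the data flow through this architecture easier to follow. The sketch below is only an approximation under stated assumptions: the names MUX21, MUX22, ADDER1, ADDER2 and SHIFT REGISTER1 come from the text, but feeding ADDER2 from the output of ADDER1, modelling the halving as a right shift, replacing the controller with a `for` loop, and applying a final conditional subtraction are assumptions of this sketch rather than details confirmed by Figure 1.

```python
def mux2(sel, in0, in1):
    """Two-input multiplexer: passes in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def iterative_multiplier(a, b, m, n):
    """Software model of the iterative datapath, one loop pass per clocked iteration."""
    shift_register1 = a                      # holds A; its LSB is a_i at iteration i
    r = 0                                    # result register R
    for _ in range(n):                       # the controller counts n iterations
        a_i = shift_register1 & 1
        adder1 = r + mux2(a_i, 0, b)         # MUX21 feeds ADDER1: R + a_i * B
        r0 = adder1 & 1
        adder2 = adder1 + mux2(r0, 0, m)     # MUX22 feeds ADDER2: add M when r_0 = 1
        r = adder2 >> 1                      # halving modelled as a right shift
        shift_register1 >>= 1                # expose a_{i+1} for the next iteration
    # Assumed final correction so that the output lies in [0, M).
    return r - m if r >= m else r


print(iterative_multiplier(0b01010, 0b10011, 0b11111, 5))   # 4
```

Keeping both adder outputs as named intermediate values mirrors the two-adder structure described above, at the cost of being less compact than a direct arithmetic implementation.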

Figure 1: Iterative prototype for Montgomery multiplication

| Parameter | 6-bit | 16-bit |
|---|---|---|
| Time (ns) | 12.12 | 25.7 |
| Area (slices) | 28 | 10 |


## 4. Montgomery Exponentiation

In the RSA cryptosystem, encryption and decryption are performed by exponentiation over a modulus, which consists of repeated Montgomery modular multiplications. In the RSA algorithm, the private key of a user consists of two primes p and q and an exponent D. The public key consists of the modulus $N = p \times q$ and an exponent E such that $E = D^{-1} \bmod \operatorname{lcm}(p-1, q-1)$. To encrypt a message M (in this section M denotes the message, not the modulus), the user computes $C = M^{E} \bmod N$. Decryption is done by $M = C^{D} \bmod N$. Modular exponentiation can be realized using the standard square-and-multiply algorithm given below [15].

Montgomery Exponentiation Algorithm to compute $M^{E} \bmod N$

Input: integers N, $0 \le M < N$, $0 < E < N$, $E = (e_{t-1}, e_{t-2}, \ldots, e_0)_2$ with $e_{t-1} = 1$

    1: A ← M
    2: for i from t-2 to 0 do
    3:     A ← A × A mod N
    4:     if e_i = 1 then
    5:         A ← A × M mod N
    6:     end if
    7: end for
    8: Return (A)
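The square-and-multiply loop and the RSA relations above can be exercised with small numbers. This is a minimal sketch using ordinary modular multiplication rather than Montgomery multiplication. The primes 61 and 53, the exponent 17 and the message 65 are illustrative values chosen here, not taken from the paper, and the three-argument `pow` with a negative exponent assumes Python 3.8 or later.

```python
from math import gcd


def square_and_multiply(m, e, n):
    """Left-to-right binary exponentiation: returns m**e mod n for e >= 1."""
    if e < 1:
        raise ValueError("exponent must be positive")
    a = m % n                       # step 1: A <- M (covers the leading bit e_{t-1} = 1)
    for bit in bin(e)[3:]:          # remaining bits e_{t-2} ... e_0, after '0b1'
        a = (a * a) % n             # step 3: square
        if bit == "1":
            a = (a * m) % n         # step 5: multiply when e_i = 1
    return a


# Toy RSA key (illustrative values only, far too small for real use).
p, q = 61, 53
N = p * q
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
E = 17
D = pow(E, -1, lam)                 # modular inverse, so E*D = 1 mod lcm(p-1, q-1)

message = 65
cipher = square_and_multiply(message, E, N)
recovered = square_and_multiply(cipher, D, N)
print(cipher, recovered)
```

For an exponent such as 22 = 10110 in binary, the loop performs four squarings and two multiplications, which is the saving that square-and-multiply offers over 21 successive multiplications.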

Figure 2: Calculation of $a^{22}$ using the square-and-multiply method

| Parameter | 5-bit | 16-bit |
|---|---|---|
| Time (ns) | 53 | 136.76 |
| Area (slices) | 566 | 5177 |
| Frequency (MHz) | 184.78 | 239.8 |

Table 3: Montgomery Exponentiation

## 5. Simulation Results

## 5.1 Montgomery Multiplication Using Modified Algorithm

5.1.1. 5-bit multiplication: In Figure 3, a and b are the two 5-bit inputs and m is the 5-bit modulus, which is relatively prime to the radix and greater than a and b. The simulation shows a latency of 15.5 ns and an area of 113 slices.


Figure 3: RTL schematic and simulation of 5-bit multiplication

5.1.2. 16-bit multiplication: In Figure 4, a and b are the two 16-bit inputs and m is the 16-bit modulus, which is relatively prime to the radix and greater than a and b. The simulation shows a latency of 38.8 ns and an area of 1294 slices.
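The functional outputs of simulations like those in Section 5.1 can be cross-checked against a software reference model. The sketch below is one possible way to do this and is not part of the authors' flow. The moduli 31 and 65521 (both prime and within $2^{n-1} < M < 2^{n}$), the random operands and the function names are assumptions chosen for illustration, and `pow(2, -n, m)` assumes Python 3.8 or later.

```python
import random


def mont_mul(a, b, m, n):
    """Modified Montgomery multiplication: a*b*2**(-n) mod m for odd m and a, b < m."""
    r = 0
    for i in range(n):
        r += ((a >> i) & 1) * b     # add B when bit a_i is set
        if r & 1:
            r += m                  # make R even before halving
        r >>= 1
    return r - m if r >= m else r   # final correction (>= is an assumption, see above)


def reference(a, b, m, n):
    """Direct computation using the modular inverse of 2**n."""
    return (a * b * pow(2, -n, m)) % m


def check(m, n, trials=1000, seed=0):
    """Compare mont_mul with the reference on random operands below m."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(m), rng.randrange(m)
        if mont_mul(a, b, m, n) != reference(a, b, m, n):
            return False
    return True


print(check(31, 5), check(65521, 16))
```

A fixed seed keeps the operand sequence reproducible, so the same vectors could be replayed in an HDL test bench.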


## 5.2 Iterative Sequential Architecture:

5.2.1. 6-bit multiplication: In Figure 5, the iterative Montgomery multiplier is implemented with an operand size of 6 bits. The simulation shows a latency of 12.12 ns and an area of 28 slices.


5.2.2. 16-bit multiplication: In Figure 6, the iterative Montgomery multiplier is implemented with an operand size of 16 bits. The simulation shows a latency of 25.7 ns and an area of 10 slices.

## 5.3 Montgomery Exponentiation Using Modified Montgomery Algorithm:

In Figure 7, Montgomery exponentiation is implemented for 5-bit operands, with a latency of 53 ns and an area of 566 slices. Figures 8, 9 and 10 show the simulation results of exponentiation for different operand sizes.

5.3.1. RTL design and simulation of $X^{2} \bmod M$:


5.3.2. RTL design and simulation of $X^{16} \bmod M$ (X, M = 5 bit):

Figure 8: RTL design and simulation of $X^{16} \bmod M$ (X, M = 5 bit)

5.3.3. Simulation of $X^{32} \bmod M$ (X, M = 5 bit):

Figure 9: Simulation of $X^{32} \bmod M$ (X, M = 5 bit)

5.3.4. Simulation of $X^{16} \bmod M$ (X, M = 16 bit): Figure 10 shows Montgomery exponentiation implemented for 16-bit operands.
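The exponentiations simulated in this section ($X^{2}$, $X^{16}$ and $X^{32} \bmod M$) can be reproduced in software by building square-and-multiply on top of Montgomery multiplication. The sketch below is an assumption about how the two pieces may fit together, not a description of the authors' RTL: converting operands into and out of the Montgomery domain with a precomputed $2^{2n} \bmod M$ is standard practice but is not described in the paper, and the function names and the moduli 31 and 65521 are illustrative.

```python
def mont_mul(a, b, m, n):
    """Modified Montgomery multiplication: a*b*2**(-n) mod m for odd m and a, b < m."""
    r = 0
    for i in range(n):
        r += ((a >> i) & 1) * b
        if r & 1:
            r += m
        r >>= 1
    return r - m if r >= m else r


def mont_exp(x, e, m, n):
    """Compute x**e mod m (e >= 1) using only Montgomery multiplications."""
    r2 = pow(2, 2 * n, m)               # precomputed constant 2^(2n) mod m (assumed step)
    x_bar = mont_mul(x % m, r2, m, n)   # enter Montgomery domain: x * 2^n mod m
    acc = x_bar                         # accounts for the leading exponent bit
    for bit in bin(e)[3:]:              # remaining exponent bits, most significant first
        acc = mont_mul(acc, acc, m, n)  # square
        if bit == "1":
            acc = mont_mul(acc, x_bar, m, n)   # multiply
    return mont_mul(acc, 1, m, n)       # leave Montgomery domain


for e in (2, 16, 32):
    print(e, mont_exp(7, e, 31, 5), mont_exp(12345, e, 65521, 16))
```

Staying in the Montgomery domain for the whole loop means the factor $2^{-n}$ introduced by each multiplication cancels, so only two extra multiplications are needed per exponentiation.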


## 6. Conclusion

Modular multiplication and exponentiation with the modified Montgomery algorithm have been described in VHDL and simulated. The VHDL implementations produce the expected results for modular multiplication and exponentiation. The Montgomery algorithm gives better area and delay when the modification is applied. The simulation results for modular multiplication and exponentiation with the modified Montgomery algorithm show that latency and area both increase with operand size. For the iterative multiplication architecture, the simulation results show an increase in latency and a decrease in area over the small operand sizes tested, but the area is expected to increase for large operand sizes.

## 7. References

[1] H. Imai, G. Hanaoka, J. Shikata, A. Otsuka, A. C. Nascimento, "Cryptography with Information Theoretic Security," Information Theory Workshop, 2002, Proceedings of the IEEE, 20-25 Oct. 2002.

[2] Behrouz A. Forouzan, "Cryptography and Network Security," Special Indian edition 27, Tata McGraw-Hill, 2006.

[3] Debdeep Mukhopadhyay, Gaurav Sengar, and Dipanwita Roy Chowdhury, "Hierarchical Verification of Galois Field Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 26, No. 10, October 2007.

[4] Nadia Nedjah and Luiza de Macedo Mourelle, "Two hardware implementations for the Montgomery multiplication: Sequential versus parallel," Department of Systems Engineering and Computation, Faculty of Engineering, State University of Rio de Janeiro, Proceedings of the 15th Symposium on Integrated Circuits and Systems Design (SBCCI'02), 0-7695-1807-9/0, 2002.

[5] C. D. Walter, "A verification of Brickell's fast modular multiplication algorithm," International Journal of Computer Mathematics, 33:153-169, 1990.

[6] G. W. Bewick, "Fast multiplication algorithms and implementation," Ph.D. Thesis, Department of Electrical Engineering, Stanford University, United States of America, 1994.

[7] R. L. Rivest, A. Shamir, and L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, vol. 21, pp. 120-126, February 1978.

[8] N. Nedjah and L. de Macedo Mourelle, "Reconfigurable hardware implementation of Montgomery modular multiplication and parallel binary exponentiation," Department of Systems Engineering and Computation, Faculty of Engineering, State University of Rio de Janeiro, Brazil, 2002, pp. 226-233.

[9] S. Morioka, Y. Katayama, and T. Yamane, "Towards efficient verification of arithmetic algorithms over Galois fields GF(2^m)," in Proc. CAV, 2001, vol. 2102, pp. 465-477.

[10] Ç. K. Koç, "RSA Hardware Implementation," RSA Laboratories, version 1.0, August 1995.

[11] Chinuk Kim, "VHDL Implementation of Systolic Modular Multiplications on RSA Cryptosystems," The City College of the City University of New York, Jan. 2001.

[12] Henry G. Baker, "Computing A*B (mod N) Efficiently in ANSI C," Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436.

[13] J.-H. Oh and S.-J. Moon, "Modular multiplication method," IEE Proc.-Comput. Digit. Tech., Vol. 145, No. 4, July 1998.

[14] L. Batina, S. B. Örs, B. Preneel, and J. Vandewalle, "Hardware architectures for public key cryptography," Elsevier Science Integration, the VLSI Journal, in print, 2002.

[15] "Montgomery Modular Exponentiation on Reconfigurable Hardware," Worcester Polytechnic Institute, ECE Department, Worcester, MA 01609-2280, USA.

[16] Ching-Chao Yang, Tian-Sheuan Chang, and Chein-Wei Jen, "A New RSA Cryptosystem Hardware Design Based on Montgomery's Algorithm," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 45, No. 7, pp. 908-913, July 1998.

[17] Richa Garg and Renu Vig, "An efficient Montgomery multiplication algorithm and RSA cryptographic processor," University Institute of Engineering and Technology, Sector-25, Panjab University, Chandigarh, India, 0-7695-3050-8/07, 2007 IEEE.

[18] Che-Wun Chiou, Chiou-Yng Lee, An-Wen Deng, and Jim-Min Lin, "Efficient implementation of Montgomery Multiplication in GF(2^m)," Tamkang Journal of Science and Engineering, Vol. 9, No. 4, pp. 365-372, 2006.

[19] T. Blum and C. Paar, "Montgomery modular exponentiation on reconfigurable hardware," in Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pp. 70-77, Adelaide, Australia, April 14-16, 1999.

[20] S. E. Eldridge and C. D. Walter, "Hardware implementation of Montgomery's modular multiplication algorithm," IEEE Trans. Comput., vol. 42, pp. 693-699, June 1993.

[21] G. Gaubatz, "Versatile Montgomery multiplier architectures," M.S. Thesis, Worcester Polytechnic Institute, Dept. of Electrical Engineering, April 2002.

[22] Ç. K. Koç, T. Acar, and B. Kaliski, "Analyzing and comparing Montgomery multiplication algorithms," IEEE Micro, June 1996, pp. 26-33.

Authors Profile
V. Narasimha Nayak was born in Khanapuram, Khammam (Dist.), AP, India. He received his B.Tech. in Electronics & Communication Engineering from Swarna Bharathi Institute of Science and Technology, Khammam (Dist.), AP, India, and his M.Tech. from NIT Rourkela, Rourkela, Orissa, India. He is working as an Assistant Professor in the Department of Electronics & Communication Engineering, K L University, Vijayawada, AP, India. He has published one paper in an international journal.

E-mail: narasimhanayak17@gmail.com
Suneel Mudunuru was born in Vijayawada, Krishna (Dist.), AP, India. He received his B.E. in Electronics & Communication Engineering from S.R.K.R. Engineering College, Bhimavaram, AP, India, and his M.Tech. from Nalanda Institute of Engineering and Technology, Sattenapalli, Guntur, AP, India. He is working as an Assistant Professor in the Department of Electronics & Communication Engineering, K L University, Vijayawada, AP, India. He has published one paper in an international journal and presented one paper at an international conference.

E-mail: suneel_127@yahoo.co.in
Md. Sami was born in Khammam (Dist.), AP, India. He received his B.Tech. in Electronics & Communication Engineering from Vazir Sultan College of Engineering, Khammam (Dist.), AP, India, and his M.Tech. from Sreekavitha Engineering College, Khammam, AP, India. He is working as an Assistant Professor in the Department of Electronics & Communication Engineering, Swarna Bharathi Institute of Science and Technology, Khammam (Dist.), AP, India.

E-mail: sami_849@yahoo.com
M. Nagesh Babu was born in Kurnool, Kurnool (Dist.), AP, India. He received his B.Tech. in Electronics & Communication Engineering from JNTU Anantapur, AP, India, and his M.Tech. from Hyderabad Institute of Technology and Management, R.R. (Dist.), AP, India. He is working as an Associate Professor in the Department of Electronics & Communication Engineering, K L University, Vijayawada, AP, India. He has 9 years of industry experience and 9 years of teaching experience. He has presented 2 papers at national conferences.

E-mail: maile.nageshbabu@gmail.com
