International Conference on Communication and Signal processing ICCOS 2011 Karunya University, Coimbatore-641 114.

17-18, March 2011

Department of Electronics and Communication Engineering, Kongu Engineering College, Perundurai, India

Abstract: This work presents a highly efficient multimode multiplier supporting prime field, polynomial field, and matrix-vector (MV) multiplications based on an asymmetric word-based Montgomery multiplication (AWBMM) algorithm. In this paper, the multimode multiplier supporting matrix-vector and dual-field multiplication is designed and simulated using the ModelSim software tool. It can then be implemented in cryptographic algorithms to verify the performance.

Keywords: AES, Asymmetric Word-Based Modular Multiplication Algorithm, Prime Field, Galois Field (GF)

Many communication applications have been invented to make daily life more convenient, such as credit card transaction systems and Internet transaction services. However, transmitting private data over an insecure network carries significant risk and may result in huge losses. One of the most useful methods to protect data is to employ a cryptographic system, since the design of cipher algorithms rests on advanced mathematical theory. Recently, there have been many works on designing cost-effective encryption hardware for portable applications [1]–[12]. Some works [1]–[5] focus on area reduction of AES, while others [6]–[10] propose to reduce hardware cost for both ECC and RSA cryptosystems. Three different area reduction strategies for the InvSubBytes/SubBytes transformations are proposed in [3], [8]–[10]. In our survey, most papers [1]–[10] focus on area minimization of a single type of cryptosystem (either AES/DES or RSA/ECC). When considering security issues in portable applications, users often need both types of cryptographic hardware to speed up certain applications. Thus, it is necessary to study efficient architectures that support dual-type cryptosystems with higher performance but lower area cost. The purpose of this paper is to design a cost-effective multiplier that can enhance the performance of multiple encryption algorithms, particularly AES. Based on a word-based Montgomery multiplication (MM) algorithm [12], [13], we propose a multimode multiplier supporting the essential operations used by AES and the public-key cryptosystems.

AES is a private-key block cipher algorithm, which is composed of three key procedures: the encryption, decryption, and round-key expansion processes. It deals with data blocks of 128 b using keys with three standard lengths of 128, 192, or 256 b. Figure 3.1 shows the AES algorithm. Each 128-b block, arranged as a 4x4 state, is operated on by four primitive transformations. During the encryption/decryption process, the four primitive transformations are executed iteratively in rounds, where the number of rounds is 10, 12, or 14, depending on which key size is selected. In the encryption procedure, the incoming data are first bitwise XORed with an initial key, and then the four transformations are executed in the following order: SubBytes, ShiftRows, MixColumns, and AddRoundKey. Notice that the MixColumns transformation is not performed in the last round. As each round needs a round key, an initial key is used to generate all round keys before encryption/decryption. The execution sequence is reversed in the decryption process, where the inverse transformations are InvSubBytes, InvShiftRows, InvMixColumns, and AddRoundKey, respectively.

In the AES algorithm, the SubBytes transformation is a nonlinear byte substitution composed of two operations: 1) modular inversion over $GF(2^8)$, modulo the irreducible polynomial $p(x) = x^8 + x^4 + x^3 + x + 1$, and 2) an affine transformation defined as $y = Mx + v$ (with addition over $GF(2)$), where $M$ is an 8x8 b matrix, $v$ is an 8-b constant, and $x$/$y$ denotes the 8-b input/output.
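The two SubBytes steps described above can be illustrated in software. The following is a minimal sketch, not the hardware datapath of this paper: the helper names (`gf_mul`, `gf_inv`, `affine`, `sub_byte`) are hypothetical, the inverse is computed by plain exponentiation ($a^{254} = a^{-1}$) for brevity, and $M$ and $v$ are assumed to be the standard AES values ($v$ = 0x63).

```python
AES_POLY = 0x11B  # p(x) = x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo p(x) (shift-and-XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:            # degree 8 reached: reduce by p(x)
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Modular inversion over GF(2^8); 0 maps to 0 (AES convention)."""
    result = 1
    for _ in range(254):         # a^254 = a^-1 because a^255 = 1 for a != 0
        result = gf_mul(result, a)
    return result


def affine(x: int) -> int:
    """Affine step y = Mx + v, assuming the standard AES matrix and v = 0x63."""
    y = 0
    for i in range(8):
        bit = 0
        for offset in (0, 4, 5, 6, 7):   # row i of M selects these input bits
            bit ^= (x >> ((i + offset) % 8)) & 1
        y |= bit << i
    return y ^ 0x63


def sub_byte(x: int) -> int:
    """SubBytes for one byte: inversion followed by the affine step."""
    return affine(gf_inv(x))
```

For instance, `sub_byte(0x53)` evaluates to `0xED`, which agrees with the published AES S-box.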

Department of Electronics and Communication, Karunya University


Figure 3.1: AES Algorithm

The ShiftRows transformation is a simple operation in which each row of the state is cyclically shifted left by a different offset. In the MixColumns transformation, the 128-b data arranged as a 4x4 state are operated on column by column. The four elements of each column form a four-term polynomial that is multiplied by a constant polynomial $c(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ modulo $x^4 + 1$. The AddRoundKey transformation is a bitwise XOR operation of each round key and the current state.

1) MM Algorithm

Modular multiplication is the major operation of many popular public-key cryptosystems, and the MM algorithm, which was proposed by Montgomery in 1985, is the most effective algorithm to compute modular multiplication. It replaces the modular multiplication with a series of additions and right shifts. Given the inputs $X$, $Y$, and $N$, where $N$ is an $m$-bit modulus and $0 \le X, Y < N$, the output of the MM algorithm is equal to $XY2^{-m} \bmod N$. The MM algorithm, shown in Algorithm I, generally consists of four phases: parity generation, accumulation, reduction, and final correction. We use the labels {P1, P2, P3, and P4} to mark the four operation phases, respectively. As shown in Algorithm I, the for-loop iteratively generates the parity ($q_i$), accumulates each partial product, and performs the modular reduction. After the for-loop, the final correction adjusts the final result to fall within the range $[0, N)$.

Algorithm I: MM(X, Y, N)
Inputs: X = (x_{m-1}, ..., x_1, x_0)_2, Y = (y_{m-1}, ..., y_1, y_0)_2, N = (n_{m-1}, ..., n_1, n_0)_2, 0 <= X, Y < N
Output: S = X Y 2^{-m} mod N
{ S = 0;
  for (i = 0 to m-1) {
    P1: q_i = (S + x_i Y) mod 2;   // parity generation
    P2: S = S + x_i Y;             // accumulation
    P3: S = (S + q_i N) / 2;       // reduction
  }
  P4: return S = (S >= N) ? S - N : S;   // final correction
}

2) AWBMM Algorithm

We modify the MM algorithm into an asymmetric word-based MM (AWBMM) algorithm. It is similar to the MM algorithm in that the AWBMM algorithm has the same operation phases: parity generation, accumulation, reduction, and final correction. The difference is that all operands in the AWBMM algorithm are processed word by word. In the AWBMM algorithm (Algorithm II), all operands are represented in word-based form, but the word width of different operands may be different. For instance, an $m$-bit integer $X$ is represented in word-based form with $u$ $s$-bit words ($s \times u = m$), and an $m$-bit integer $Y$ is represented as $v$ $t$-bit words ($t \times v = m$). The inputs of Algorithm II are therefore $X = (X_{u-1}, X_{u-2}, \ldots, X_1, X_0)_{2^s}$, $Y = (Y_{v-1}, Y_{v-2}, \ldots, Y_1, Y_0)_{2^t}$, $N = (N_{v-1}, N_{v-2}, \ldots, N_1, N_0)_{2^t}$, and $W = -N^{-1} \bmod 2^s$, with $0 \le X, Y < N$, and its output is $C = (C_{u-1}, C_{u-2}, \ldots, C_1, C_0)_{2^s} = XY2^{-m} \bmod N$. The asymmetric feature of the operand size helps us design an efficient multiplier to support MV and dual-field multiplications. Notice that $X_i$ denotes the $i$th word of $X$ and that $X_i^{j:k}$ denotes the sequence of bits from the $j$th bit to the $k$th bit of $X_i$. The variable $T$ denotes the parity, and $S$, $Z$, $k$, and $l$ are four variables used to store the temporary values in the inner for-loop.
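To make the four phases concrete, the following is a minimal software sketch of the bit-serial MM algorithm and of a word-based variant in the spirit of AWBMM. It is an illustration under stated assumptions rather than the authors' hardware algorithm: the function names `mm` and `awbmm` are hypothetical, $N$ is assumed to be odd (required for $W$ to exist), the parity word is computed from the low $s$ bits of $C + X_i Y$, and Python's arbitrary-precision integers stand in for the word registers.

```python
def mm(x: int, y: int, n: int, m: int) -> int:
    """Bit-serial Montgomery multiplication: x*y*2^(-m) mod n, n odd, n < 2^m."""
    s = 0
    for i in range(m):
        xi = (x >> i) & 1
        q = (s + xi * y) & 1          # P1: parity generation
        s = s + xi * y                # P2: accumulation
        s = (s + q * n) >> 1          # P3: reduction (exact division by 2)
    return s - n if s >= n else s     # P4: final correction


def awbmm(x: int, y: int, n: int, m: int, s: int, t: int) -> int:
    """Word-based sketch: X in s-bit words, Y and N in t-bit words (s*u = t*v = m)."""
    u, v = m // s, m // t
    mask_s, mask_t = (1 << s) - 1, (1 << t) - 1
    w = (-pow(n, -1, 1 << s)) & mask_s        # W = -N^(-1) mod 2^s
    c = 0
    for i in range(u):
        xi = (x >> (i * s)) & mask_s
        par = ((c + xi * y) * w) & mask_s     # P1: parity word T
        acc, z = 0, 0
        for j in range(v):
            cj = (c >> (j * t)) & mask_t
            yj = (y >> (j * t)) & mask_t
            nj = (n >> (j * t)) & mask_t
            sj = cj + xi * yj + par * nj + z  # P2: accumulate one t-bit column
            z = sj >> t                       # P3: carry into the next column
            acc |= (sj & mask_t) << (j * t)
        top = z + (c >> m)                    # carry plus any overflow bit of C
        c = (acc | (top << m)) >> s           # low s bits are zero: shift them out
    return c - n if c >= n else c             # P4: final correction


if __name__ == "__main__":
    n, m = 0xF123, 16                         # odd 16-bit modulus (example value)
    x, y = 0x1234, 0xABCD
    print(hex(mm(x, y, n, m)), hex(awbmm(x, y, n, m, s=8, t=4)))
```

Both functions return the same residue, $XY2^{-m} \bmod N$; choosing `s` larger than `t` mirrors the asymmetric operand widths discussed above.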

Algorithm II: AWBMM(X, Y, N)
{ C = 0;
  for (i = 0 to u-1) {
    Z = 0;
    P1: T = (C_0 + X_i Y_0) W mod 2^s;   // parity generation
    for (j = 0 to v-1) {                 // accumulation and reduction
      P2: S = C_j + X_i Y_j + T N_j + Z;
      P3: Z = S / 2^t;
          k = (j - v/u) / (v/u);  l = j % (v/u);
          if (v/u <= j <= v-1)  C_k^{(l+1)t-1 : lt} = S mod 2^t;
    }
    C_{u-1} = Z;
  }
  P4: return C = (C >= N) ? C - N : C;   // final correction
}

II PROPOSED MULTIMODE MULTIPLIER

Based on the modified AES and MM algorithms, we propose a multimode multiplier to support both MV and dual-field multiplications. The multimode multiplier is modified from the dual-field multiplier, as both their multiplier and ours are designed based on the word-based MM algorithm. Therefore, the description in this section focuses on the matrix-vector multiplication, using an example.

A. Proposed Multimode 8x8 b Multiplier

The MV multiplication of (11) can be reformulated as eight vectors XORed together, namely $C_0 b_0 \oplus C_1 b_1 \oplus \cdots \oplus C_7 b_7$, where $C_0, \ldots, C_7$ are the eight columns of matrix $M$ shown above and $b_0, \ldots, b_7$ are the bits of the input vector.

Figure 3.4 shows the proposed multimode multiplier, which is a multimode 8x8 b multiplier. Figure 3.4(a) shows the eight partial products of an 8x8 b multiplication. The eight partial products are padded with some zeros and partitioned as P1, P2, and P3. The new eight partial products of P2 are labeled pp1, pp2, ..., and pp8 and are fed into an XOR tree calculator (XTC), shown in Figure 3.4(b). In the figure, the square and circle at the output of a CSA or HA denote the produced carry vector and sum vector, respectively. The module is named XTC since only the sum vectors of each CSA are XORed. The carry vectors of each CSA and the final sum vector are sent to the Wallace tree accumulator (WTA) module. It is obvious that the final sum vector of the XTC is equal to the XOR value of pp1, pp2, ..., and pp8. By replacing {pp1, pp2, ..., pp8} with {$C_0 b_0$, $C_1 b_1$, ..., $C_7 b_7$}, the MV product can be obtained at the XTC module. In the meantime, the other partial products in P1 and P3 are sent to the WTA as well for carry-save accumulation. Finally, an adder is used to convert the WTA's outputs (carry and sum) into a normal binary number. In addition, the polynomial product is easy to obtain by concatenating the sum vectors of P1, P2 (final sum vector), and P3. Hence, we can get the MV product from the XTC module, the polynomial product from the WTA module, and the integer product from the final-stage adder.

Figure 3.4: Proposed multimode 8x8 b multiplier

B. Width Selection of Multimode Multiplier

Until now, a novel multimode 8x8 b multiplier has been presented. The multiplier size is further enlarged to handle all MV multiplications needed in the new AES round function.
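The three products that the multimode 8x8 b multiplier delivers can be mimicked in a few lines of software. This is only a behavioral sketch with hypothetical function names, not a model of the CSA/XTC/WTA structure in Figure 3.4: the integer product sums the partial products with carries, the polynomial product XORs them (carry-free), and the MV product XORs the matrix columns selected by the input bits. The example matrix is assumed to be the standard AES affine matrix, whose column $k$ is the byte 0x1F rotated left by $k$ bits.

```python
from functools import reduce
from operator import xor


def partial_products(a: int, b: int, width: int = 8) -> list[int]:
    """Shifted copies of a, one per bit of b (zero where the bit is 0)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def integer_product(a: int, b: int) -> int:
    """Prime-field style accumulation: partial products added with carries."""
    return sum(partial_products(a, b))


def polynomial_product(a: int, b: int) -> int:
    """GF(2)[x] accumulation: partial products XORed, so no carries propagate."""
    return reduce(xor, partial_products(a, b), 0)


def mv_product(columns: list[int], b: int) -> int:
    """Matrix-vector product over GF(2): XOR of the columns C_i with b_i = 1."""
    return reduce(xor, (col for i, col in enumerate(columns) if (b >> i) & 1), 0)


def rotl8(value: int, shift: int) -> int:
    """Rotate an 8-bit value left by `shift` bits."""
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


# Assumed example matrix: the eight columns of the standard AES affine matrix.
AFFINE_COLUMNS = [rotl8(0x1F, k) for k in range(8)]

if __name__ == "__main__":
    print(hex(integer_product(0x57, 0x83)))
    print(hex(polynomial_product(0x57, 0x83)))
    print(hex(mv_product(AFFINE_COLUMNS, 0xCA) ^ 0x63))
```

The same partial-product array thus serves all three modes; only the way the rows are combined changes, which is the property the XTC and WTA modules exploit.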

For example, Figure 3.5(a) shows the MixColumns transformation represented by the 16x4 blocks. Each block, e.g., {02*}B3, indicates an MV multiplication, and it produces an 8-b intermediate value after the MV multiplication. In the figure, a 128-b input is partitioned into 16 8-b vectors, and the new MixColumns coefficients are defined as {01*}, {02*}, and {03*}. The positions of all blocks are carefully arranged so that the result of the MixColumns transformation is defined as the 16x4 intermediate values XORed column by column. For example, the MixColumns transformation of the first column of the 4x4 state is represented as the first four columns on the rightmost side. The results, such as B3', B2', B1', and B0', are obtained by vertically XORing the rightmost 4x4 intermediate values. Each Inv-/MixColumns transformation needs 64 MV multiplications executed concurrently.

In fact, each 8-b result of the MixColumns transformation, e.g., B3', can be represented as 32 8-b vectors XORed together. In the same way, the 32 8-b vectors of each column can be concatenated into 32 128-b vectors. The MixColumns transformation is finally formulated as the XOR value of 32 128-b vectors, as shown in Figure 3.5(b); hence, it needs a 128x32 b multiplier to do the XOR operation.

Figure 3.5(c) shows the extension from a multimode 8x8 b multiplication (see Figure 3.4) to a multimode 128x32 b multiplication. It shows the partial products of a multiplication whose size is 128x32 b. The 32 128-b partial products are padded with some zeros in the upper and bottom-right corners. The central 32 128-b partial products with the padded zeros are computed by an enlarged XTC. If the new Inv-/MixColumns transformations are needed, the multiplier simply replaces the central partial products with the new 32 128-b vectors arranged as in Figure 3.5(b). This does not affect the result of the dual-field multiplication since only zeros are padded into the partial products.

Figure 3.5: Arrangement of new MixColumns coefficients and 128-b data

III CIPHER CORE ARCHITECTURE

Figure 3.6 shows the proposed cipher core architecture based on a multimode 128x32 b multiplier (highlighted by the solid rectangle). Other components include dedicated hardware for the AES function (encircled by the dashed rectangle), as well as the I/O interface, I/O controller, main controller, and storage element unit.

Figure 3.6: Proposed cipher core architecture based on a multimode 128x32 b multiplier
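The reformulation of MixColumns as a set of XORed vectors, described in Section II-B, can be checked with a small software sketch. It rests on assumptions that should be stated plainly: it uses the standard MixColumns coefficients {01}, {02}, {03} rather than the modified coefficients {01*}, {02*}, {03*} of the new AES round function, and the names `const_columns` and `mix_column_xor` are hypothetical. Multiplication by a constant in $GF(2^8)$ is linear over $GF(2)$, so it is an 8x8 b MV multiplication; one output byte is then the XOR of up to 4 x 8 = 32 column vectors.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def const_columns(c: int) -> list[int]:
    """Columns of the 8x8 b matrix for 'multiply by constant c': column k = c * x^k."""
    return [gf_mul(c, 1 << k) for k in range(8)]


# Standard MixColumns coefficient rows (assumed; the paper uses modified ones).
MIX_ROWS = [(2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2)]


def mix_column_xor(column: list[int]) -> list[int]:
    """MixColumns on one 4-byte column using only XORs of selected 8-b vectors."""
    output = []
    for row in MIX_ROWS:
        acc = 0
        for coeff, byte in zip(row, column):
            cols = const_columns(coeff)
            for k in range(8):
                if (byte >> k) & 1:   # input bit b_k selects column C_k
                    acc ^= cols[k]
        output.append(acc)
    return output


if __name__ == "__main__":
    print([hex(b) for b in mix_column_xor([0xDB, 0x13, 0x53, 0x45])])
```

Repeating this for the four output bytes of each of the four state columns gives 16 output bytes built from 64 constant multiplications, which matches the 64 concurrent MV multiplications mentioned in the text.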

RESULTS

In this paper, a high-efficiency and high-performance cipher core based on the multimode multiplier is presented. The multimode multiplier also supports modular addition and subtraction, as well as multiplication in both the prime field $GF(p)$ and the polynomial field $GF(2^m)$. In the next phase, composite field arithmetic is going to be used to decompose the Inv-/SubBytes transformations. In [2], the authors present a "fully rolled inner pipelined architecture" that uses only two 8-b basis conversion units ($GF(2^8) \rightarrow GF((2^4)^2)$), while others need 16 conversion and 16 inverse-conversion units. When comparing the hardware efficiency for both the AES and MM algorithms, the proposed architecture will have higher efficiency.

IV CONCLUSION

In this paper, the AES round function is regrouped as new linear and nonlinear functions. The new Inv-/MixColumns transformations, which are the most area-consuming part, are reformulated as multiple MV multiplications. Therefore, they are efficiently executed by the proposed multimode multiplier. The multimode multiplier also supports modular addition and subtraction, as well as multiplication in both the prime field $GF(p)$ and the polynomial field $GF(2^m)$. Then, a high-efficiency and high-performance cipher core based on the multimode multiplier is presented. As the proposed integration architecture efficiently shares the hardware resources, it saves more area than the straightforward method of directly integrating different cipher cores into a single core architecture. In addition, the integration architecture supports more features than other low-cost AES designs, and it also supports scalable key sizes for the MM algorithm by changing the storage size.

REFERENCES

[1] Alam M., Ray S., Mukhopadhayay D., Ghosh S., RoyChowdhury D., and Sengupta I. (2007) "An area optimized reconfigurable encryptor for AES-Rijndael," in Proc. DATE, pp. 1–6.
[2] Chen-Hsing Wang, Chieh-Lin Chuang, and Cheng-Wen Wu (2010) "An efficient multimode multiplier supporting AES and fundamental operations of public-key cryptosystems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, p. 553, Apr.
[3] Chih-Chung Lu and Shau-Yin Tseng, "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP'02), 1063-6862/02.
[4] Crowe F., Daly A., and Marnane W. (2005) "A scalable dual mode arithmetic unit for public key cryptosystems," in Proc. Int. Conf. ITCC, vol. 1.
[5] Eslami Y., Sheikholeslami A., Gulak P. G., Masui S., and Mukaida K. (2006) "An area-efficient universal cryptography processor for smart cards," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 1, pp. 43–56.
[6] Harris D., Krishnamurthy R., Anders M., Mathew S., and Hsu S. (2005) "An improved unified scalable radix-2 Montgomery multiplier," in Proc. 17th IEEE Symp. Comput. Arithmetic.
[7] Koç Ç. K. and Acar T. (1998) "Montgomery multiplication in GF(2^k)," Des. Codes Cryptography, vol. 14, no. 1, pp. 57–69.
[8] Lai Y.-K., Chang L.-C., Chen L.-F., Chou C.-C., and Chiu C.-W. (2004) "A novel memoryless AES cipher architecture for networking applications," in Proc. IEEE ISCAS, vol. 4.
[9] Li H. and Li J. (2007) "A new compact architecture for AES with optimized ShiftRows operation," in Proc. IEEE ISCAS, pp. 1851–1854.
[10] Lu C.-C. and Tseng S.-Y. (2002) "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," in Proc. IEEE Int. Conf. Appl.-Specific Syst., Architectures, Processors, pp. 277–285.
[11] Mangard S., Aigner M., and Dominikus S. (2003) "A highly regular and scalable AES hardware architecture," IEEE Trans. Comput., vol. 52, no. 4, Apr.
[12] Montgomery P. L. (1985) "Modular multiplication without trial division," Math. Comput., vol. 44, no. 170, pp. 519–521, Apr.
