
Prof. P. Magesh Kannan et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES
Vol No. 5, Issue No. 2, 310 - 314

VLSI Architecture and ASIC Implementation of ICE Encryption Algorithm
Prof. P. Magesh Kannan
VLSI Division, School of Electronics Engineering, VIT University, Vellore, India
E-mail: mageshkannan.p@vit.ac.in

Venkatesan Sellappan
PG Student – VLSI Design, SENSE, VIT University, Vellore, India
E-mail: venkatesan.srec@gmail.com

Abstract: In modern security applications, cryptographic algorithms that are both secure and suitable for hardware implementation are essential. This paper proposes a hardware architecture for the Information Concealment Engine (ICE) encryption algorithm. ICE was designed for use in software applications, which are slow because of their use of modular arithmetic, so faster implementations are needed. These can be achieved in hardware. The system performs the encryption process and has been optimized for low hardware resources and high speed. The paper discusses ICE, DES and Triple DES and compares their performance. The proposed architecture has been implemented as an ASIC in 180 nm technology.

Keywords: ICE, Montgomery Multiplication, Modular multiplication, Systolic architecture

I. INTRODUCTION
Security is a primary requirement of any wireless cryptographic protocol. To address this ever-present problem, cryptographic algorithms are constructed to provide secure communication. There are many good algorithms with different uses and characteristics, but not all of them can be considered fully secure [1], [2]. In cryptography, ICE (Information Concealment Engine) is a block cipher published by Kwan in 1997. The algorithm is similar in structure to DES, but with the addition of a key-dependent bit permutation in the round function. The key-dependent bit permutation is implemented efficiently in software. The ICE algorithm is not subject to patents, and the source code has been placed into the public domain.

The ICE algorithm was designed for use in software applications. Those applications, however, are slow due to the use of modular arithmetic [3], so the need for faster implementations is great. These can be achieved through hardware implementations. Since hardware implementations are generally faster and more reliable than software implementations, the outcome of a hardware design is even more interesting.

In this paper, the architecture and the VLSI implementation of the ICE encryption algorithm are proposed. The system performs both encryption and decryption and has been optimized for low hardware resources and for high-speed performance. The proposed architecture has very encouraging performance results in terms of speed and throughput. This makes the design very useful in current applications that use DES as the base of a cryptographic protocol [2].

ICE is a Feistel network with a block size of 64 bits. The standard ICE algorithm takes a 64-bit key and has 16 rounds. A fast variant, Thin-ICE, uses only 8 rounds. An open-ended variant, ICE-n, uses 16n rounds with a 64n-bit key. A published attack on Thin-ICE recovers the secret key using 2^23 chosen plaintexts with a 25% success probability. If 2^27 chosen plaintexts are used, the probability can be improved to 95%. For the standard version of ICE, an attack on 15 out of 16 rounds was found, requiring 2^56 work and at most 2^56 chosen plaintexts.

In this paper we discuss ICE, DES and Triple DES, and compare the performance of all three.

A. DES
The Data Encryption Standard, or DES, is a 64-bit block cipher. Its key is supplied as 64 bits, of which 56 bits (seven characters) are effective; the remaining 8 bits are parity bits. Both ICE and DES are standard Feistel block ciphers, but one of the differences from DES is that in ICE a keyed permutation is applied after the expansion function E. When DES was introduced, it was believed that trying out all 2^56 = 72,057,594,037,927,936 possible keys (roughly a seven followed by 16 zeros) would be impossible because computers could never become fast enough. In 1998 the Electronic Frontier Foundation (EFF) built a special-purpose machine that recovered a DES key by exhaustive search in less than three days. The machine cost less than $250,000 and searched over 88 billion keys per second.

B. Triple DES
The Triple-DES variant was developed after it became clear that DES by itself was too easy to crack. It uses three 56-bit DES keys, giving a total key length of 168 bits. Encryption using Triple-DES is simply

cipher text = EK3(DK2(EK1(plaintext)))

that is, DES encrypt with K1, DES decrypt with K2, then DES encrypt with K3. Because Triple-DES applies the DES algorithm three times (hence the name), Triple-DES takes

ISSN: 2230-7818 @ 2011 http://www.ijaest.iserp.org. All rights Reserved. Page 310



three times as long as standard DES. Decryption using Triple-DES is the same as encryption, except that it is executed in reverse:

plaintext = DK1(EK2(DK3(cipher text)))
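
The encrypt-decrypt-encrypt composition above can be illustrated with a short Python sketch. This is only an illustration of how the three stages compose: `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for the DES operations E_K and D_K (they are not DES and are not secure), and all function names are assumptions introduced here.

```python
MASK64 = (1 << 64) - 1  # blocks are treated as 64-bit integers


def _rotl(x, s):
    """Rotate a 64-bit value left by s bits."""
    return ((x << s) | (x >> (64 - s))) & MASK64


def _rotr(x, s):
    """Rotate a 64-bit value right by s bits."""
    return ((x >> s) | (x << (64 - s))) & MASK64


def toy_encrypt(block, key):
    """Hypothetical stand-in for E_K: an invertible keyed map, NOT DES."""
    return (_rotl(block ^ key, 7) + key) & MASK64


def toy_decrypt(block, key):
    """Inverse of toy_encrypt, standing in for D_K."""
    return _rotr((block - key) & MASK64, 7) ^ key


def ede_encrypt(plaintext, k1, k2, k3):
    """cipher text = E_K3(D_K2(E_K1(plaintext)))."""
    return toy_encrypt(toy_decrypt(toy_encrypt(plaintext, k1), k2), k3)


def ede_decrypt(ciphertext, k1, k2, k3):
    """plaintext = D_K1(E_K2(D_K3(cipher text)))."""
    return toy_decrypt(toy_encrypt(toy_decrypt(ciphertext, k3), k2), k1)
```

One consequence of the decrypt step in the middle is visible in the sketch: when K1 equals K2 the first two stages cancel, so the construction reduces to a single encryption under K3.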

The paper is organized as follows. Section II introduces the ICE encryption algorithm. Section III describes the F function. Section IV gives a thorough analysis of the architecture and the VLSI implementation. Section V describes Montgomery multiplication. Section VI presents the performance results, and Section VII presents some conclusions.

II. THE ICE ENCRYPTION ALGORITHM
ICE is a standard Feistel block cipher [2], [3] with a structure similar to DES. It takes a 64-bit plaintext, which is split into two 32-bit halves. In each round of the algorithm the right half and a 60-bit subkey are fed into the function F. The output of F is XORed with the left half, and then the halves are swapped. This is the Transformation Round of the ICE algorithm. This process is repeated for 16 rounds [2], [3]. However, the final swap is left out. The decryption process is the same, except that the subkeys are used in reverse order.

The Feistel structure has three advantages. First, it gives a one-to-one mapping between plaintext and cipher text, which is necessary for a cipher to be decryptable. Secondly, Feistel ciphers have been publicly cryptanalysed for more than two decades, and no systematic weakness has been uncovered. Finally, Feistel ciphers are reasonably fast and simple to implement in software. Speed and simplicity were two important design aims for ICE [3].

III. THE F FUNCTION
The ICE F function is similar in structure to the one used in DES, with the exception of the keyed permutation described below [2]. The function as a whole is illustrated in Figure 1.

Figure 1. ICE F Function

A. The Expansion Function E
The 32-bit plaintext half P (bits P31 down to P0) is expanded to four 10-bit values, E1, E2, E3, E4, in the following manner [2], [3].

E1 = P1 P0 P31 P30 P29 P28 P27 P26 P25 P24
E2 = P25 P24 P23 P22 P21 P20 P19 P18 P17 P16
E3 = P17 P16 P15 P14 P13 P12 P11 P10 P9 P8
E4 = P9 P8 P7 P6 P5 P4 P3 P2 P1 P0

This expansion function was chosen because four 10-bit values were needed for the S-boxes, and it was reasonably fast to implement in software [3].

B. Keyed Permutation
After expansion, a keyed permutation is applied. The permutation subkey is 20 bits long, and is used to swap bits between E1 and E3, and between E2 and E4. The odd-numbered bits of the permutation subkey control swaps between corresponding bits of E1 and E3, and the even-numbered bits control swaps between corresponding bits of E2 and E4. For example, if bit 1 of the subkey is set, bit 0 of E1 and E3 will be swapped. If bit 2 of the subkey is set, bit 1 of E2 and E4 will be swapped [2], [3].

The values E1, E2, E3, and E4, after being permuted, are XORed with 40 bits of the subkey, then used as input to the four S-boxes, S1, S2, S3, and S4. Each S-box takes a 10-bit input and produces an 8-bit output [3].

C. The Permutation Function P
The four 8-bit S-box outputs are combined via a P-box into a 32-bit value, which becomes the output of the F function. The P-box, which is specified in [3], was designed to maximize diffusion from each S-box, and to ensure that bits which are separated by 16 places never come from the same S-box, nor from S-boxes separated by two places (S1 and S3) [3].
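
The round structure of Section II and the first two stages of the F function (Sections III.A and III.B) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the rule "subkey bit 2i+1 swaps bit i of E1 and E3, subkey bit 2i swaps bit i of E2 and E4" is generalized from the two examples in the text, the split of the 60-bit subkey into its low 20 permutation bits and high 40 XOR bits is an assumed layout, and `toy_f` replaces the real S-boxes and P-box with a simple truncation so that the Feistel skeleton can be exercised.

```python
MASK32 = 0xFFFFFFFF


def expand(p):
    """Expansion function E: 32-bit half -> [E1, E2, E3, E4], 10 bits each."""
    e1 = ((p & 0x3) << 8) | ((p >> 24) & 0xFF)  # P1 P0 P31..P24
    e2 = (p >> 16) & 0x3FF                      # P25..P16
    e3 = (p >> 8) & 0x3FF                       # P17..P8
    e4 = p & 0x3FF                              # P9..P0
    return [e1, e2, e3, e4]


def keyed_permute(e, perm_key):
    """Swap bits between E1/E3 and E2/E4 under a 20-bit permutation subkey.

    Assumed rule: subkey bit 2i+1 swaps bit i of E1 and E3,
    subkey bit 2i swaps bit i of E2 and E4.
    """
    e1, e2, e3, e4 = e
    mask13 = mask24 = 0
    for i in range(10):
        mask13 |= ((perm_key >> (2 * i + 1)) & 1) << i
        mask24 |= ((perm_key >> (2 * i)) & 1) << i
    t = (e1 ^ e3) & mask13          # bits that differ and must be swapped
    e1, e3 = e1 ^ t, e3 ^ t
    t = (e2 ^ e4) & mask24
    e2, e4 = e2 ^ t, e4 ^ t
    return [e1, e2, e3, e4]


def toy_f(right, subkey):
    """Placeholder round function: the real S-boxes and P-box are NOT modelled."""
    perm_key = subkey & 0xFFFFF     # assumed: low 20 bits drive the permutation
    xor_key = subkey >> 20          # assumed: high 40 bits are XORed in
    e = keyed_permute(expand(right), perm_key)
    out = 0
    for i, value in enumerate(e):
        value ^= (xor_key >> (10 * (3 - i))) & 0x3FF
        out = (out << 8) | (value & 0xFF)  # stand-in for a 10-to-8-bit S-box
    return out


def feistel(block, subkeys, f=toy_f):
    """Run one pass over a 64-bit block; the final swap is left out."""
    left, right = block >> 32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ f(right, k)
    return (right << 32) | left     # undo the last swap
```

Calling `feistel(block, subkeys)` encrypts and `feistel(block, subkeys[::-1])` decrypts, which reflects the statement that decryption is the same process with the subkeys in reverse order. This holds for any round function, which is why the placeholder `toy_f` is sufficient for the demonstration.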


D. The S-Boxes
The S-boxes in ICE are similar in structure to those used in LOKI [4] in their use of Galois field exponentiation. Each S-box takes a 10-bit input X. Bits X9 and X0 are concatenated to form the row selector R. Bits X8 ... X1 are concatenated to form the 8-bit column selector C. For each row R, there is an XOR offset value O_R and a Galois field prime P_R [2], [3]. The 8-bit output of an S-box for an input X is given by (C XOR O_R)^7 mod P_R under 8-bit Galois field arithmetic. The exponent 7 was chosen because exponentiation by 7 is a one-to-one function [2], [3]. The XOR offsets for each row in each S-box are given in Table 1, while the prime numbers are specified in Table 2.
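
The S-box lookup described above can be sketched in Python as follows. This is a hedged illustration only: the offsets and the reduction polynomial used below are placeholder values chosen for the example (0x11B is a well-known irreducible polynomial of degree 8), not the ICE values from Table 1 and Table 2, and the function names are assumptions.

```python
def gf_mul(a, b, poly):
    """Multiply two 8-bit values in GF(2^8), reducing by the 9-bit polynomial poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a         # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly           # reduce as soon as the degree reaches 8
        b >>= 1
    return result


def gf_exp7(x, poly):
    """Compute x^7 in GF(2^8) with four multiplications."""
    x2 = gf_mul(x, x, poly)
    x3 = gf_mul(x2, x, poly)
    x6 = gf_mul(x3, x3, poly)
    return gf_mul(x6, x, poly)


# Placeholder parameters for one S-box (NOT the ICE table values).
EXAMPLE_OFFSETS = [0x00, 0x55, 0xAA, 0xFF]
EXAMPLE_POLYS = [0x11B, 0x11B, 0x11B, 0x11B]


def sbox(x, offsets=EXAMPLE_OFFSETS, polys=EXAMPLE_POLYS):
    """Map a 10-bit input X to an 8-bit output: (C XOR O_R)^7 mod P_R."""
    row = (((x >> 9) & 1) << 1) | (x & 1)  # R = X9 X0
    col = (x >> 1) & 0xFF                  # C = X8 ... X1
    return gf_exp7(col ^ offsets[row], polys[row])
```

Because 7 shares no factor with 255, the order of the multiplicative group of GF(2^8), raising to the seventh power permutes the field. Each row of the sketch therefore produces every 8-bit value exactly once, which is the one-to-one property mentioned in the text.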
Table 1. The S-Box offset values

Table 2. The S-Box Galois Field Prime values

IV. THE PROPOSED ARCHITECTURE
The proposed architecture is shown in Figure 2. This architecture performs both encryption and decryption on the input plaintext using a 64-bit input key. It uses an input register and an output register. Each of them stores the left and right halves of every round and swaps the two halves if needed [2]. The Key Expansion Unit creates the round keys from the 64-bit Input Key, following the specifications of ICE. The keys for every round are stored in the RAM [2].

Figure 2. The Proposed Feedback Architecture

The encryption process is simple. In each clock cycle, the data stored in the input register are fed into the ICE Transformation Round along with the subkeys stored in the RAM. This process is repeated for 16 rounds. At the final round, however, the Output Register is used to swap the left and right halves of the ICE Transformation Round output. The value of the Output Register is then the cipher text. The decryption process follows the same steps, except that the subkeys are used in reverse order [2].

From the analysis of ICE, the main design effort lies in the ICE transformation round, and especially in the implementation of the F function, shown in Figure 3. The F function has four parts: the expansion function E, the keyed permutation, the S-boxes and the permutation function P [2].

The keyed permutation can be easily implemented using two 2-to-1 multiplexers, while the expansion and permutation functions are just rearrangements of wires. The main implementation cost of the F function therefore lies in the design of the S-boxes [2].

Considering that each S-box uses modular exponentiation to calculate its output, a VLSI architecture based on the Montgomery Multiplication algorithm is proposed [2]. This component computes the function A^7 mod P. Its architecture is pipelined, with 6 stages, and is based on the following algorithm:

FUNCTION A^7 mod P (X, P)
1. A = MM(X, R', P)
2. B = MM(A, A, P)
3. C = MM(B, A, P)
4. D = MM(C, C, P)
5. E = MM(D, A, P)
6. Out = MM(E, 1, P)
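
The six steps above can be checked with a small Python sketch. It assumes that MM(a, b, P) returns a·b·R^(-1) mod P with R = 2^n and that R' = R^2 mod P, as the paper states for the Montgomery function. Here `mm` is written directly with a modular inverse (Python 3.8 or later) rather than as the hardware algorithm, and the modulus 251 is an illustrative odd value, not an ICE parameter.

```python
def mm(a, b, p, n):
    """Montgomery product a * b * R^-1 mod p with R = 2**n (p must be odd)."""
    r_inv = pow(2, -n, p)          # modular inverse of R, Python 3.8+
    return (a * b * r_inv) % p


def pow7_mod(x, p, n):
    """Follow the six-step chain; comments track the value in Montgomery form."""
    r_prime = pow(2, 2 * n, p)     # R' = R^2 mod P, precalculated
    a = mm(x, r_prime, p, n)       # step 1: x * R      (into Montgomery form)
    b = mm(a, a, p, n)             # step 2: x^2 * R
    c = mm(b, a, p, n)             # step 3: x^3 * R
    d = mm(c, c, p, n)             # step 4: x^6 * R
    e = mm(d, a, p, n)             # step 5: x^7 * R
    return mm(e, 1, p, n)          # step 6: x^7        (out of Montgomery form)
```

For example, `pow7_mod(x, 251, 8)` can be compared with Python's built-in `pow(x, 7, 251)`. Every intermediate value carries exactly one factor of R, so six multiplications give x^7 without any explicit division by P.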


Let A, B, and M be the multiplicand, multiplier and modulus respectively, and let n be the bit length of the operands [9]. The preconditions of the Montgomery algorithm are as follows.
1) 0 < A, B < M
2) 2^(n-1) < M < 2^n and gcd(M, 2) = 1, so M must be an odd number.

Function MM (A, B, M)
1. R = 0
2. For k = 0 to n-1 do begin
3. q = (R_0 + A_k B_0) mod b
4. R = R + A_k B + q M
5. R = R / b end
6. Return R
End

Here A_k denotes bit k of A, and R_0 and B_0 denote the least significant bits of R and B. This algorithm is a modified version of the original Montgomery Multiplication algorithm. The base b is radix 2 (b = 2), and the Montgomery constant is 2^n (written R elsewhere in the paper, and not to be confused with the accumulator R in the listing), where n is the bit length of the operands A and B [2].
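
The bit-serial loop of the MM function can be sketched in Python as follows. It is a sketch under stated assumptions rather than the paper's hardware description: the accumulator is named `r`, the quotient bit is computed as (R_0 + A_k·B_0) mod 2, and a final conditional subtraction is added (it is not part of the listing above) so that the returned value is fully reduced.

```python
def mont_mul(a, b, m, n):
    """Radix-2 Montgomery product: a * b * 2**(-n) mod m, for odd m < 2**n."""
    r = 0
    for k in range(n):
        a_k = (a >> k) & 1            # bit k of the multiplicand
        q = (r + a_k * b) & 1         # q = (R_0 + A_k * B_0) mod 2
        r = (r + a_k * b + q * m) >> 1  # the sum is even, so the shift is exact
    if r >= m:                        # assumption: final reduction added here
        r -= m
    return r


def mod_mul(a, b, m, n):
    """Ordinary modular product built from Montgomery products."""
    r2 = pow(2, 2 * n, m)             # R^2 mod m, with R = 2**n
    a_mont = mont_mul(a, r2, m, n)    # a * R
    b_mont = mont_mul(b, r2, m, n)    # b * R
    prod = mont_mul(a_mont, b_mont, m, n)  # a * b * R
    return mont_mul(prod, 1, m, n)    # a * b
```

The loop uses only additions and one-bit right shifts. The quotient bit q is chosen so that the running sum is always even, which is why no division by the modulus is needed.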
Figure 3. The ICE Transformation Round

MM is the Montgomery Multiplication function [5-15], and R' = R^2 mod P is a precalculated, fixed number. Step 1 is needed to transform the input value X into Montgomery format, and step 6 to convert the Montgomery-formatted result E back into a normal number. The Montgomery Multiplication algorithm was implemented using a systolic architecture, shown in Figure 4 and Figure 5 [2].
Figure 4. The Processing Element (a) and the PE for the calculation of q (b) of the Conventional Architecture

V. MONTGOMERY MULTIPLICATION
The operation of modular multiplication consists of two steps: one generates the product P = A×B and the other reduces this product P modulo M [5-15]. A straightforward implementation of the multiplication is based on an iterative adder-accumulator for the generated partial products. This solution is quite slow, as the final result is only available after n clock cycles, where n is the size of the operands [9].

The advantage of the Montgomery calculation is that no subtractions are needed to reduce the intermediate results. The disadvantage is that Montgomery's modular multiplication calculates A·B·2^(-n) mod M instead of A·B mod M, so two Montgomery modular multiplications are required for one modular multiplication [6]. The algorithm computes the product of two integers modulo a third one without performing division by M. It yields the reduced product using a series of additions [9].

By introducing carry-save redundant logic, the modified Montgomery multiplication can be transformed to [8]

Function MMcs (A, B, M)
1. C2in = 0, C1in = 0, Sin = 0
2. For k = 0 to n-1 do begin
3. q = (Sin_0 + C1in_0 + C2in_0 + A_k B_0) mod 2
4. C1 + C2 + S = C2in + C1in + Sin + A_k B + q M
5. C2in = C2/2, C1in = C1/2, Sin = S/2 end
6. Return C2in/2, C1in/2 and Sin/2

The Carry2 signal of a PE is connected to the next PE of the next row of PEs, the Carry1 signal is connected to the same PE of the next row (the same column), while the Sum


signal is connected to the previous PE of the next row, starting from bit 0 [8].

Figure 5. The Systolic Conventional Architecture

VI. PERFORMANCE
The proposed architecture has been captured using Verilog HDL. All the internal components of the design were synthesized, placed and routed as an ASIC using Cadence tools. The VLSI synthesis results are shown in Table 3, which contains the frequency, area and power comparisons.

Table 3. Covered Frequency, Area and Power comparisons

Encryption Algorithm   Freq (GHz)   Power (pW)   Area
ICE                    3            68631.1      571191
DES                    1            76236.8      293458
TRIPLE DES             1            251144.4     223921

In Table 3, the implementations are compared in terms of power, covered area and operating frequency. The operating frequency of ICE is better than that of the DES and Triple DES block ciphers.

VII. CONCLUSIONS
ICE is a symmetric key block cipher designed for software applications. An efficient architecture for its VLSI implementation is proposed in this paper. It is designed for high clock speed and minimized area resources using the Montgomery multiplication method. The results show that the ICE algorithm used for this architecture can be implemented in hardware. The ASIC implementation is a system that has an external clock of 3 GHz and a throughput of 12 Gbits/sec (consistent with one 64-bit block completing every 16 clock cycles). Compared with the DES and Triple DES implementations considered here, the ICE implementation achieves a higher operating frequency.

REFERENCES
[1] Bruce Schneier, Applied Cryptography - Protocols, Algorithms and Source Code in C, John Wiley & Sons, Second ed., New York, 1996.
[2] A. P. Fournaris, N. Sklavos and O. Koufopavlou, "VLSI Architecture and FPGA Implementation of ICE Encryption Algorithm,"
[3] M. Kwan, "The Design of the ICE Encryption Algorithm", in Proc. of Fast Software Encryption Workshop, 1997.
[4] L. Brown, J. Pieprzyk and J. Seberry, "LOKI: A Cryptographic Primitive for Authentication and Secrecy Applications," Advances in Cryptology - AUSCRYPT '90 Proceedings, Springer-Verlag, pp. 229-236, 1990.
[5] Peter L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, 1985.
[6] David Narh Amanor, Christof Paar, Jan Pelzl, Viktor Bunimov and Manfred Schimmler, "Efficient hardware architectures for Modular Multiplication on FPGAs",
[7] C. T. Koc, T. Acar and B. S. Kaliski, "Analysing and Comparing Montgomery Multiplication Algorithm," IEEE Micro, vol. 16, no. 3, pp. 26-33, June 1996.
[8] A. P. Fournaris and O. Koufopavlou, "Montgomery Modular Multiplication Architectures and hardware implementations for an RSA cryptosystem,"
[9] Nadia Nedjah and Luiza De Macedo Mourelle, "Systolic hardware Implementation for the Montgomery Modular Multiplication",
[10] S. E. Eldridge and C. D. Walter, "Hardware implementation of Montgomery's Modular Multiplication Algorithm", IEEE Transactions on Computers, 42(6):619-624, 1993.
[11] C. D. Walter, "Montgomery Exponentiation Needs No Final Subtraction", Electronics Letters, vol. 35, no. 21, October 1999, pp. 1831-1832.
[12] D. J. Guan, "Montgomery Algorithm for Modular Multiplication," Lecture notes, National Sun Yat Sen University, 2001.
[13] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, The MIT Press, Cambridge, 1990.
[14] Shay Gueron, "Enhanced Montgomery Multiplication," in Proc. of Workshop on Cryptographic Hardware and Embedded Systems, San Francisco, August 13-15, 2002.
[15] Taek-Won Kwan, Chang-Seok You, Won-Seok Heo, Yong-Kyu Kang, and Jun-Rim Choi, "Two implementation methods of a 1024-bit RSA cryptoprocessor Based on Modified Montgomery Algorithm", Proceedings of 2001 IEEE ISCAS '01, May 6-9, Sydney, Australia.
