The Design of Low-Area 32-Bit AES Encryption-Decryption System on FPGA

Published by Wattanit Hotrakool

This is unpublished work under Creative Commons Attribution-ShareAlike 3.0 Unported License.

For more detail visit http://creativecommons.org/licenses/by-sa/3.0/

https://www.scribd.com/doc/58343899/The-Design-of-Low-Area-32-Bit-AES-Encryption-Decryption-System-on-FPGA

05/24/2012

**The Design of Low-Area 32-Bit AES Encryption/Decryption System on FPGA**

Wattanit Hortrakool, AAI AlHarbiy, Ji Song, Xiao-Yang Ji, Yu-Ou Jiang

Abstract – Most published FPGA designs of the Advanced Encryption Standard (AES) Rijndael algorithm focus on high throughput, reaching up to twenty gigabits per second (Gbps). However, few applications need such throughput; for many of them a low-cost, low-area design is more suitable. This paper presents a 32-bit core architecture that occupies only 288 slices of a Spartan-3 device and provides a throughput of up to 195 Mbps.

Index Terms – Advanced Encryption Standard (AES), Field Programmable Gate Array (FPGA), encryption, decryption, low area.

I. INTRODUCTION

The objective of this coursework is to design an encryption and decryption unit using the Advanced Encryption Standard (AES) algorithm and to implement the system on a Field Programmable Gate Array (FPGA) board.

The National Institute of Standards and Technology (NIST) adopted the Rijndael cipher algorithm as the AES in 2001. It is a digital encryption standard that replaces the Data Encryption Standard (DES). It is a symmetric-key cryptosystem, which means that encryption and decryption use the same cipher key. The algorithm accepts cipher keys of 128, 192 or 256 bits and operates on a 128-bit data block, and it is more flexible, secure and effective [1].

Recently, low-area AES implementations have been applied in Wireless Local Area Networks (WLAN), Wireless Personal Area Networks (WPAN), Wireless Sensor Networks (WSN) and other fields [2]. A loop-unrolled AES design with a 128-bit datapath is typically fast, but its power consumption and area are also high. Reducing the datapath from 128 bits to 32 bits decreases the number of slices occupied, so our design uses a 32-bit datapath with a 128-bit cipher key.

This paper is organized as follows. Section II gives an overview of AES. Section III presents our 32-bit low-area architecture. The implementation results and a comparison with other works are given in Section IV. Finally, Section V concludes the paper.

II. ADVANCED ENCRYPTION STANDARD

The Advanced Encryption Standard is a round-based symmetric block cipher algorithm. AES uses a cipher key of length 128, 192 or 256 bits to encrypt or decrypt a data block of 128 bits [3]. The number of rounds Nr depends on the key size and is 10, 12 or 14 respectively. Each 128-bit data block is called the state. Each of rounds 1 to Nr-1 applies four basic operations: SubByte, ShiftRow, MixColumn and AddRoundKey. The final round omits MixColumn, and an initial AddRoundKey precedes the first round. SubByte is a nonlinear byte substitution that uses a substitution table (S-box) to operate on each byte of the state independently. ShiftRow circularly shifts each row of the state by a different number of bytes. MixColumn mixes the bytes of each column by multiplying the column, treated as a polynomial, by a fixed polynomial modulo x^4 + 1. Finally, AddRoundKey is an XOR operation that adds a round key from the Key Expansion unit to the state in each iteration. The encryption and decryption flow diagrams are shown in Figure 1.

Figure 1. 128-bit AES encryption/decryption flow diagram. (a) Encryption: AddRoundKey, then repeated rounds of SubByte, ShiftRow, MixColumn and AddRoundKey, then a final round of SubByte, ShiftRow and AddRoundKey, from normal text to cipher text. (b) Decryption: the corresponding flow built from AddInvRoundKey, InvSubByte, InvShiftRow and InvMixColumn.
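The round structure described in Section II can be written out as a short sketch. The function name `encryption_schedule` and the table `ROUNDS_BY_KEY_BITS` are illustrative names introduced here, not part of the paper; the sketch only lists the order of operations and does not perform any encryption.

```python
# Number of rounds Nr for each AES key size (FIPS-197).
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def encryption_schedule(key_bits):
    """Return the ordered list of (round, operation) pairs for AES encryption."""
    nr = ROUNDS_BY_KEY_BITS[key_bits]
    steps = [(0, "AddRoundKey")]          # initial key addition
    for rnd in range(1, nr + 1):
        steps.append((rnd, "SubByte"))
        steps.append((rnd, "ShiftRow"))
        if rnd != nr:                     # the final round has no MixColumn
            steps.append((rnd, "MixColumn"))
        steps.append((rnd, "AddRoundKey"))
    return steps


if __name__ == "__main__":
    for rnd, op in encryption_schedule(128):
        print(rnd, op)
```

For a 128-bit key this gives 11 AddRoundKey steps, which is why 11 round keys (the cipher key plus 10 derived keys) are needed.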

III. PROPOSED ARCHITECTURE

In this section, our low-area AES architecture is presented. The design is implemented on the smallest Xilinx Spartan-3 device (XC3S50). It is a round-based design using a 32-bit datapath with a 128-bit key; however, the system can easily be redesigned for 192-bit and 256-bit keys as well. The high-level architecture of the system is shown in Figure 2.

Figure 2. High-level architecture.

A. SubByte and InverseSubByte

In this design, SubByte and InvSubByte (shown in Figure 3) are implemented using two 2048x9 dual-port Block RAMs (RAMB16_S9_S9). These Block RAMs are treated as ROMs that store the lookup tables (LUTs) for the S-box and the inverse S-box. Each 8-bit output of the Block RAMs provides the SubByte result for one row. The address of each Block RAM port is 11 bits wide: the first 8 address bits are connected to the input data byte of the corresponding row, whereas the 9th bit is used to select between the encryption and decryption LUTs. Altogether, these 2 BRAMs perform the SubByte or InvSubByte operation on 4 bytes, which is 1 column, at a time. Using BRAMs saves the slices that 4 sets of combinational S-box logic would occupy, and it also increases throughput, especially when the system has no pipelining.

Figure 3. Implementation of SubByte/InvSubByte. [Block diagram: input rows 1-4 and the E/D signal drive the address ports of two BRAMs, which produce output rows 1-4.]

B. ShiftRow and InverseShiftRow

ShiftRow and InvShiftRow (shown in Figure 4) are implemented using a group of 16-bit shift registers (SRL16) and multiplexers. Eight shift registers are grouped as a byte. Because of the 32-bit datapath, this design has 4 groups of shift registers to handle the data of one column, which means that 32 shift registers are present. However, because each SRL16 requires only 1 LUT, 2 shift registers can be placed in a single slice (a CLB slice of Spartan-3 contains 2 LUTs).

The output of the ShiftRow and InvShiftRow operation is produced by controlling the 7-to-1 multiplexers with a 4-state Finite State Machine. With a simple calculation, each multiplexer can be controlled efficiently using only a single-bit control signal. As a result, the ShiftRow and InvShiftRow component occupies only 18 slices. Table 1 shows the input data and the output data of ShiftRow and InvShiftRow.

Figure 4. Design of ShiftRow and InvShiftRow. [Block diagram: input bytes 1-4 each feed a group of 8 SRL16 shift registers, whose outputs are multiplexed to output bytes 1-4.]
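The lookup scheme of Section III-A and the byte permutation of Section III-B can be modelled in software as a rough sketch. Everything below is an illustration under stated assumptions, not the authors' HDL: the names `ROM`, `rom_address`, `sub_column` and `shift_rows` are hypothetical, the ROM is modelled as a single 512-entry list (S-box in the lower half, inverse S-box in the upper half, selected by address bit 8), and the state is stored column by column so that byte index = 4 × column + row, matching the numbering in Table 1.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = 0x63
        for shift in range(5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value

# Assumed ROM layout: entries 0..255 hold the S-box, 256..511 the inverse S-box.
ROM = SBOX + INV_SBOX


def rom_address(byte, decrypt):
    """Low 8 bits carry the data byte; bit 8 is the E/D select."""
    return (int(decrypt) << 8) | byte


def sub_column(column, decrypt=False):
    """SubByte or InvSubByte on one 4-byte column via the shared ROM."""
    return [ROM[rom_address(b, decrypt)] for b in column]


def shift_rows(state, decrypt=False):
    """ShiftRow or InvShiftRow on a 16-byte state stored column by column."""
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            src_col = (col - row) % 4 if decrypt else (col + row) % 4
            out[4 * col + row] = state[4 * src_col + row]
    return out


if __name__ == "__main__":
    print(shift_rows(list(range(16))))
    print([hex(b) for b in sub_column([0x00, 0x53, 0x01, 0xFF])])
```

Sharing one table between both directions through a single select bit is what lets the same memory serve encryption and decryption without extra logic.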

Table 1. Result of ShiftRow and InvShiftRow (byte indices of the state).

| (a) Input | (b) After ShiftRow | (c) After InvShiftRow |
|---|---|---|
| 0 4 8 12 | 0 4 8 12 | 0 4 8 12 |
| 1 5 9 13 | 5 9 13 1 | 13 1 5 9 |
| 2 6 10 14 | 10 14 2 6 | 10 14 2 6 |
| 3 7 11 15 | 15 3 7 11 | 7 11 15 3 |

C. MixColumn and InverseMixColumn

Galois field multiplication is essential for the MixColumn transformation. In the 32-bit system, each column is represented as a polynomial with coefficients in GF(2^8), so every output byte can be calculated by a function that differs between encryption and decryption. The polynomial has the form

$$a(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$$

In MixColumn, the column is multiplied modulo $x^4 + 1$ by a constant polynomial $c(x)$:

$$c(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$

In decryption, the column is instead multiplied by the fixed inverse polynomial $d(x)$, given by

$$d(x) = c^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$

This inverse MixColumn equation is costly to implement directly, owing to the complicated multiplications by {0b}, {0d}, {09} and {0e}. It could be implemented as follows:

$$d(x) = c(x) + e(x) + f(x)$$

where

$$e(x) = \{08\}x^3 + \{08\}x^2 + \{08\}x + \{08\}$$

$$f(x) = \{04\}x^2 + \{04\}$$

However, this method of implementing InvMixColumn is very complex and inefficient, and it results in a large circuit. In this design, instead of the method mentioned above, we apply a different method, following [4].

Figure 5. Implementation of MixColumn and InvMixColumn.

As shown in Figure 5, MixColumn and InvMixColumn in this system are designed to share logic and resources in order to minimize the number of slices occupied, by using the relations

$$c(x) \ast d(x) = \{01\}$$

$$c(x) \ast d(x)^2 = d(x)$$

$$d(x)^2 = \{04\}x^2 + \{05\}$$

Instead of computing $d(x)$ directly, $d(x)^2$ can be applied after $c(x)$ to obtain the inverse. As can be seen, $d(x)^2$ contains only two multiplications, and each of them is straightforward to implement. Moreover, {05} equals {04} + {01}, which means that only one multiplication needs to be calculated.
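The polynomial relations of Section III-C can be checked numerically with a small sketch. The helper names (`gf_mul`, `poly_mul`, `mix_column`, `inv_mix_column_shared`) are introduced here for illustration and say nothing about how the hardware in Figure 5 is wired; polynomials are stored lowest degree first, so `[a0, a1, a2, a3]` stands for a_3 x^3 + a_2 x^2 + a_1 x + a_0.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p


def poly_mul(p, q):
    """Multiply two 4-term polynomials over GF(2^8) modulo x^4 + 1."""
    out = [0, 0, 0, 0]
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[(i + j) % 4] ^= gf_mul(pi, qj)   # x^4 wraps around to 1
    return out


# Coefficients listed lowest degree first.
C = [0x02, 0x01, 0x01, 0x03]          # c(x)
D = [0x0E, 0x09, 0x0D, 0x0B]          # d(x) = c^-1(x)
E = [0x08, 0x08, 0x08, 0x08]          # e(x)
F = [0x04, 0x00, 0x04, 0x00]          # f(x)
D_SQUARED = [0x05, 0x00, 0x04, 0x00]  # d(x)^2 = {04}x^2 + {05}


def mix_column(column):
    """MixColumn on one 4-byte column."""
    return poly_mul(C, column)


def inv_mix_column_shared(column):
    """InvMixColumn computed as d(x)^2 applied after c(x)."""
    return poly_mul(D_SQUARED, mix_column(column))


if __name__ == "__main__":
    mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
    print([hex(b) for b in mixed])
    print([hex(b) for b in inv_mix_column_shared(mixed)])
```

Because the inverse path reuses the forward multiplier and adds only the sparse polynomial d(x)^2, the extra decryption logic reduces to multiplications by {04} and {05}.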

With this shared MixColumn/InvMixColumn method, the complexity of the system is much reduced. As a result, this component occupies a total of 56 slices for both MixColumn and InvMixColumn.

D. Key Expansion

Generally, there are two ways to implement key expansion. The first is to run the key schedule for every block of encrypted data. In this way the key can be changed very quickly, with no delay, but the key schedule is repeated every time, which reduces speed and is inefficient. The other way is to complete the whole key expansion first, store all round keys in block RAM, and read them just before key addition. This second way also helps to share resources, especially the SubByte component.

In order to achieve a low-area design, the second way is used. In this way, the SubByte component can be shared with the Encryption/Decryption unit. We first precompute all round keys and then store them in a 512x9 single-port block RAM (RAMB16_36); the diagram is shown in Figure 6.

Figure 6. Structure of the Key Expander. [Block diagram with blocks and signals: key_in, reset, FSM, delay, RotWord, S-Box, Rcon, 3-deep SRL16.]

When a new key comes in, the first three columns are stored in a 3-deep shift register after one clock of delay. When the fourth column arrives, it passes through RotWord and the S-box and is then added to Rcon and to the first column read from the shift register. After that, the newly created column is delayed by one clock and added to the second column. This process is repeated until all 10 round keys have been computed and stored. The control of Rcon and the selection of the column being calculated are done by a Finite State Machine.

E. Control Unit

The datapath used for encryption/decryption is controlled through the multiplexers. The control signals for the multiplexers come from a Finite State Machine that works as the master controller. Moreover, this master controller is also used to control the round-key read order when reading from the block RAM. In this design, the controller is a 256-state FSM, which is implemented using an 8-bit counter.

In general, the value of the 8-bit counter corresponds to the address of the block RAM. In this way, the round key can easily be read from the block RAM and then added to the data.

To control the datapath, the multiplexers are driven by a single Encryption/Decryption signal (E/D) through simple logic gates. Moreover, in the case of decryption, the round keys need to be read in reverse order. This is done by using a simple subtraction circuit to reverse the state of the FSM.
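The precomputed key schedule of Section III-D and the reverse-order read of Section III-E can be modelled together in a short sketch. It follows the standard AES-128 key expansion from FIPS-197 [3]; the memory layout (one 4-byte word per address, address = 4 × round + column) and the helper names `round_key_address` and `read_round_key` are assumptions made for illustration, since the paper does not give the exact address mapping or subtraction constant.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = 0x63
        for shift in range(5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def expand_key_128(key):
    """Return the 44 four-byte words (11 round keys) for a 16-byte key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # S-box
            temp[0] ^= rcon                   # add Rcon
            rcon = gf_mul(rcon, 0x02)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


def round_key_address(round_index, column, decrypt=False, rounds=10):
    """Hypothetical word address; decryption counts the rounds backwards."""
    r = rounds - round_index if decrypt else round_index
    return 4 * r + column


def read_round_key(ram, round_index, decrypt=False):
    """Read the four words of one round key from the modelled RAM."""
    return bytes(
        b
        for col in range(4)
        for b in ram[round_key_address(round_index, col, decrypt)]
    )


if __name__ == "__main__":
    ram = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    print(read_round_key(ram, 10).hex())
    print(read_round_key(ram, 0, decrypt=True).hex())
```

With this layout, the first key read during decryption is the last key written during expansion, so the same stored schedule serves both directions.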

IV. RESULTS AND COMPARISON

TABLE 2. PERFORMANCE COMPARISON BETWEEN AES IMPLEMENTATIONS

| | Ours | P. Chodowiec et al. [4] | G. Rouvroy et al. [5] | S. McMillan et al. [6] | K. Gaj et al. [7] |
|---|---|---|---|---|---|
| Device used | Spartan3 | Spartan2-6 | Spartan3 | Virtex | Virtex-6 |
| Functionality | Both encryption and decryption | Both encryption and decryption | Both encryption and decryption | Encryption only | Both encryption and decryption |
| Key length | 128-bit | 128-bit | 128-bit | External key expansion | External key expansion |
| CLB slices | 288 | 222 | 163 | 240 | 2902 |
| BRAMs | 3 | 3 | 3 | 8 | 0 |
| Throughput (Mbps) | 195 | 166 | 208 | 250 | 331.5 |
| Clock (MHz) | 130 | 60 | 71 | 136 | 26 |
| Clock cycles per round | 8 | 4 | 4 | 7 | 1 |

As shown in Table 2, this design is comparable to the other available designs. The first issue is the device used. Our design uses a Spartan-3 device, a small and low-cost FPGA, which is comparable to the devices used in [4] and [5], whereas [6] and [7] use Virtex FPGAs, which are much more advanced and thus more expensive. In terms of functionality, most of the designs support both encryption and decryption, except [6], which provides only an encryption core. In terms of key length, our design, as well as [4] and [5], has an internal key expansion unit that works with 128-bit keys. The designs of [6] and [7], however, have no internal key expansion, so key expansion from an outside source is required. In terms of area, our design requires 288 slices and 3 block RAMs, which is slightly more than the other low-area designs [4], [5] and [6]. However, our throughput of 195 Mbps is higher than that of [4], on which our design is based. The interesting part of our result is the maximum clock frequency. Our design achieves a maximum clock frequency of 130 MHz, which is much higher than that of the other designs on Spartan-family devices and almost equal to that of [6], which is implemented on a Virtex device. With this design, the Spartan-3 device can run at a very high speed, which allows a high clock frequency for other circuits implemented in the same FPGA.
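The figures in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below computes the throughput per slice of each design and the number of clock cycles per 128-bit block implied by each clock/throughput pair. It assumes the usual relation for a non-pipelined block cipher, throughput = block size × clock frequency / cycles per block, which the paper does not state explicitly; the dictionary `DESIGNS` simply copies the slice, throughput and clock rows of Table 2.

```python
# Slices, throughput (Mbps) and clock (MHz) copied from Table 2.
DESIGNS = {
    "Ours": {"slices": 288, "mbps": 195.0, "mhz": 130.0},
    "[4]": {"slices": 222, "mbps": 166.0, "mhz": 60.0},
    "[5]": {"slices": 163, "mbps": 208.0, "mhz": 71.0},
    "[6]": {"slices": 240, "mbps": 250.0, "mhz": 136.0},
    "[7]": {"slices": 2902, "mbps": 331.5, "mhz": 26.0},
}


def throughput_per_slice(design):
    """Throughput divided by occupied slices, in Mbps per slice."""
    return design["mbps"] / design["slices"]


def cycles_per_block(design, block_bits=128):
    """Clock cycles per block implied by the clock and throughput figures."""
    return block_bits * design["mhz"] / design["mbps"]


if __name__ == "__main__":
    for name, design in DESIGNS.items():
        print(
            f"{name:5s} {throughput_per_slice(design):6.3f} Mbps/slice, "
            f"{cycles_per_block(design):6.1f} cycles per block"
        )
```

These ratios ignore block RAM usage, so they compare slice efficiency only.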

V. CONCLUSION

In this work, a compact and fast AES solution on FPGA was implemented. The design achieves a throughput per slice of about 0.68 Mbps/slice, which is of the same order as the other low-area designs in Table 2. The implementation, on the smallest Spartan-3 FPGA, occupies 288 slices and 3 block RAMs and achieves a throughput of 195 Mbps at a clock frequency of 130 MHz. The design can serve a wide range of embedded systems, from applications that are sensitive to latency and need a high-speed connection, such as video conferencing, down to applications that require low area, such as smart cards.

REFERENCES

[1] S. N. Han and X. J. Li, “Area and Power Optimized Serial AES Encrypt/Decrypt Circuit,” Microelectronics, vol. 40, Beijing: Chinese Academy of Sciences, 2010.

[2] W.-K. Chen, Linear Networks and Systems. Belmont, CA: Wadsworth, 1993, pp. 123–135.

[3] National Institute of Standards and Technology (NIST), Information Technology Laboratory (ITL), Advanced Encryption Standard (AES), Federal Information Processing Standards (FIPS) Publication 197, November 2001.

[4] P. Chodowiec, K. Gaj, P. Bellows and B. Schott, “Experimental Testing of the Gigabit IPSec-Compliant Implementations of Rijndael and Triple DES Using SLAAC-1V FPGA Accelerator Board,” Information Security Conference (ISC 2001), Malaga, Spain, 2001.

[5] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater and J.-D. Legat, “Compact and efficient encryption/decryption module for FPGA implementation of the AES Rijndael very well suited for small embedded applications,” in Proc. IEEE Int. Conf. on Inf. Tech.: Coding and Computing, vol. 2, pp. 583–587, Las Vegas, NV, USA, April 2004.

[6] S. McMillan and C. Patterson, “JBits Implementations of the Advanced Encryption Standard (Rijndael),” Field-Programmable Logic and Application (FPL 2001), Belfast, Northern Ireland, UK, 2001.

[7] K. Gaj and P. Chodowiec, “Comparison of the hardware performance of the AES candidates using reconfigurable hardware,” Third Advanced Encryption Standard (AES3) Candidate Conference, New York, 2000.
