
Minimal Instruction Set AES Processor using Harvard Architecture

J. H. Kong, L.-M. Ang and K. P. Seng


Department of Electrical and Electronic Engineering
University Of Nottingham Malaysia Campus
Jalan Broga, Semenyih, 43500 Selangor, Malaysia.
E-mail: keyx9kjh@nottingham.edu.my, kenneth.ang@nottingham.edu.my, jasmine.seng@nottingham.edu.my

Abstract-This paper presents an FPGA implementation of the Advanced Encryption Standard (AES) using a Minimal Instruction Set Computer (MISC) with a Harvard architecture. With simple logic components and a minimum set of fundamental instructions, the MISC with a Harvard architecture enables AES encryption in severely constrained hardware environments, with fewer execution clock cycles. The MISC architecture was verified using the Handel-C hardware description language and implemented on a Xilinx Spartan3 FPGA. The implementation uses two separate block RAMs and occupies only 1% of the total available chip area.

Keywords-Minimal Instruction Set Computer; Computer Security; AES

I. INTRODUCTION

A Minimal Instruction Set Computer (MISC) is similar to an OISC, with the difference that the MISC incorporates not one instruction but several simple instructions, sufficient to execute the necessary operations. The one instruction set computer (OISC), also called the Ultimate Reduced Instruction Set Computer (URISC), is a single-instruction universal computing machine. This computer organization can be further developed into a customized processor for any computing need. With a sufficient number of minimized instructions, a new purpose-designed computer can be built.

To meet the stringent chip-size requirements of some systems, e.g. wireless sensor networks, mobile communication and embedded systems, the MISC can be used as a custom hardware module for AES encryption. A full AES encryption requires a total of 10 rounds of substitution and permutation. All four byte-oriented transformations, together with a 128-bit cipher key, are applied to the input plain text. For each round, a unique round key is generated by the Key Expansion algorithm. In every round, the Round Key is added to the plain text before it goes through transformations such as the S-Box Substitution (Sub Bytes), Mix Columns, and Shift Rows [5].

This paper presents a further application of the MISC, using a Harvard architecture to lower the number of execution clock cycles of the instructions. Security applications with constraints such as small physical size and small memory size can be served by a practical implementation of the MISC running encryption algorithms, especially the Advanced Encryption Standard (AES).

II. MISC AES PROCESSOR

The MISC is a derivation of the OISC. Unlike the OISC, which uses only one instruction, the MISC AES uses four minimized instructions to perform the complete encryption process. This section discusses the MISC AES architecture in detail, including the instruction set, data path architecture, control circuit, memory map and program flow.

A. Instruction Set

To perform the AES computations on the plain text, byte-oriented transformations are adapted from the AES encryption method. Instructions for transformations such as the Sub Bytes transformation and the Mix Columns transformation have to be custom developed for the MISC AES architecture. The 4 instructions used to perform the complete AES encryption are shown in Figure 1. The four instructions are SBN (Subtract and Branch if Negative), XOR, xTime and Sub Bytes. They are differentiated using the two MSBs of each instruction. The SBN instruction takes in two data values and subtracts them. If the result is negative, the Target Address is added to the Program Counter and a branch is executed. The XOR instruction takes in two data values and performs the XOR operation on them. The xTime instruction is one part of the operation needed to perform the complete Mix Column transformation. The Sub Bytes instruction takes in a data byte and performs the Sub Bytes transformation.

As shown in Figure 2, each instruction uses 3 bytes in the program memory. The first byte holds the Op Code and the address of Mem_A, the second byte holds the address of Mem_B and the last byte holds the target address. With four different op codes embedded in the first byte of the instruction, the MISC selects the appropriate output from the corresponding processor block.
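The four instructions and the three-word instruction format of Section II-A can be illustrated with a small software model. The following Python sketch is not the authors' Handel-C design; the names `encode`, `xtime` and `step` are hypothetical, and several details are assumptions made only for illustration: program words are treated as 10 bits wide (2-bit op code plus 8-bit address), a subtraction is considered negative when Mem_B - Mem_A is below zero before wrapping to a byte, the branch target is added to the word address of the current instruction, and the S-Box is passed in as a lookup table.

```python
PROGRAM_WORDS = 1024  # program memory depth stated in the Memory Map section

def encode(op, addr_a, addr_b, target):
    """Pack one instruction into three program words (assumed 10 bits each)."""
    return [(op << 8) | (addr_a & 0xFF), addr_b & 0xFF, target & 0x3FF]

def xtime(b):
    """Multiply a byte by x in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b

def step(program, data, pc, sbox):
    """Execute the instruction at word address pc and return the next pc."""
    word0, addr_b, target = program[pc:pc + 3]
    op, addr_a = word0 >> 8, word0 & 0xFF      # two MSBs select the operation
    a, b = data[addr_a], data[addr_b]
    next_pc = pc + 3                           # fall through to the next instruction
    if op == 0:                                # SBN: Mem_B = Mem_B - Mem_A
        diff = b - a
        data[addr_b] = diff & 0xFF
        if diff < 0:                           # assumed branch rule: PC + target
            next_pc = (pc + target) % PROGRAM_WORDS
    elif op == 1:                              # XOR: Mem_B = Mem_B XOR Mem_A
        data[addr_b] = a ^ b
    elif op == 2:                              # xTime: Mem_B = xTime(Mem_B)
        data[addr_b] = xtime(b)
    else:                                      # Sub Bytes: Mem_B = S-Box[Mem_B]
        data[addr_b] = sbox[b]
    return next_pc

# Usage: xTime on data[0], XOR data[0] into data[1], then an SBN that branches.
data = [0] * 256
data[0], data[1], data[2], data[3] = 0x57, 0x0F, 1, 0
program = encode(2, 0, 0, 0) + encode(1, 0, 1, 0) + encode(0, 2, 3, 6)
identity_sbox = list(range(256))               # placeholder table, not the AES S-Box
pc = 0
for _ in range(3):
    pc = step(program, data, pc, identity_sbox)
```

In this model the write-back always targets Mem_B, which mirrors the behaviour listed in Figure 1; the per-cycle register transfers of the real data path are deliberately left out.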

978-1-4244-5540-9/10/$26.00 ©2010 IEEE

SBN (Subtract and Branch If Negative)
    Mem_B = Mem_B - Mem_A
    If Mem_B < 0 Goto (PC + C)
    Else Goto (PC + 1)

XOR
    Mem_B = Mem_B XOR Mem_A

xTime
    Mem_B = xTime(Mem_B)

Sub Bytes
    Mem_B = Sub_Bytes(Mem_B)

Figure 1. The MISC instruction set.

Operation    Op Code (2 MSB)    Instruction Format
SBN          0 (00)             (00 @Mem_A), (Mem_B), Target
XOR          1 (01)             (01 @Mem_A), (Mem_B), Target
xTime        2 (10)             (10 @00000000), (Mem_B), Target
Sub Bytes    3 (11)             (11 @00000000), (Mem_B), Target

Figure 2. The MISC instruction format.

B. MISC Data Path Architecture

[Block diagram; signal labels include MDR, D_MEM_READ and D_MEM_WRITE.]

Figure 3. The MISC AES Data Path Architecture.

Figure 3 shows an overview of the MISC AES data path architecture. The architecture consists of 8 registers, 3 multiplexers (MUX), 2 block RAMs (BRAM), an Adder, an XOR block, an xTime block and a Sub Bytes block. At the top of the architecture, the PC register stores the Program Counter (PC) value, which holds the address of the program instruction. The R register stores the first data item (Mem_A) read from the data memory. The Program Memory Address Register (P_MAR) stores the address of the next program instruction to be read from the program memory. The Data Memory Address Register (D_MAR) stores the address of the memory, providing a pointer to the location in the data memory that is to be written or read. The Z and N registers are used to indicate whether the arithmetic result produced by the Adder block is zero or negative. The Memory Data Register (MDR) stores the output computed by the ALU. The Operation Code Register (OP) stores the Op Code of the current instruction, and the ALU output MUX is used to select the output from the ALU. The computation result can be the output of the Adder, XOR, xTime or Sub Bytes block. After both Mem_A and Mem_B are loaded, the ALU output multiplexer selects the appropriate output from the processor blocks, according to the op code stored in the op code register. The result is then written back to the memory, replacing the value of the second data item read (Mem_B) from the data memory.

C. Memory Map

The memory used in the AES MISC architecture is based on the Harvard architecture. The architecture includes a 1024 x 10-bit Program Memory and a 256 x 8-bit Data Memory. Figure 4 shows the size of the implemented memory in the MISC architecture.

In the program section, instructions are stored in the sequence in which the MISC executes them. In the data section, the breakdown of the memory allocated to the plain text, master key and other temporary variables is shown in Figure 4 and Figure 5.

Data Section (128 bytes): addresses 0 to 127
Program Code Section (1024 bytes): addresses 0 to 1023

Figure 4. The MISC AES Memory Map.

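As a rough cross-check of the memory allocation given in Figure 5, the data and program regions can be tabulated and tested for gaps. The Python sketch below is only illustrative: the dictionary and function names are hypothetical, the two Mix Column scratch areas are given distinct labels, and the Loop1 and Loop2 bounds (321 to 323 and 948 to 950) are assumed to be the values that make the program regions contiguous with three words per instruction.

```python
# Inclusive (first, last) addresses, following Figure 5.
DATA_MAP = {
    "CipherKey": (0, 15),
    "Plain Text": (16, 31),
    "Temp for Shift Row": (32, 47),
    "Rcon[i]": (48, 63),
    "(Reserved)": (64, 79),
    "Temp for Mix Column (1)": (80, 95),
    "Temp for Mix Column (2)": (96, 111),
    "Temp Variables": (112, 127),
}

PROGRAM_MAP = {
    "Key Generation": (0, 176),
    "Shift Row": (177, 272),
    "Sub Bytes": (273, 320),
    "Loop1": (321, 323),        # assumed bounds, inferred from the neighbours
    "Mix Columns": (324, 899),
    "Add RoundKey": (900, 947),
    "Loop2": (948, 950),        # assumed bounds, inferred from the neighbours
    "END": (951, 953),
    "NOP": (954, 1023),
}

WORDS_PER_INSTRUCTION = 3       # Op Code + Mem_A, Mem_B, target
CYCLES_PER_INSTRUCTION = 6      # one pass of the 3-bit control counter

def region_size(bounds):
    """Number of addresses covered by an inclusive (first, last) pair."""
    first, last = bounds
    return last - first + 1

def is_contiguous(regions):
    """True when the regions start at 0 and follow each other without gaps."""
    expected = 0
    for first, last in regions.values():
        if first != expected:
            return False
        expected = last + 1
    return True

# Words that hold real instructions (everything except the NOP padding).
used_words = sum(region_size(b) for name, b in PROGRAM_MAP.items() if name != "NOP")
instruction_count = used_words // WORDS_PER_INSTRUCTION
cycles_per_pass = instruction_count * CYCLES_PER_INSTRUCTION
```

Under these assumptions the totals agree with the figures quoted in the implementation results: 954 program bytes, 318 instructions, and 1908 clock cycles for one pass through the stored instructions.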
Data Section
    CipherKey              (0 - 15)
    Plain Text             (16 - 31)
    Temp for Shift Row     (32 - 47)
    Rcon[i]                (48 - 63)
    (Reserved)             (64 - 79)
    Temp for Mix Column    (80 - 95)
    Temp for Mix Column    (96 - 111)
    Temp Variables         (112 - 127)

Program Code Section
    Key Generation         (0 - 176)
    Shift Row              (177 - 272)
    Sub Bytes              (273 - 320)
    Loop1                  (321 - 323)
    Mix Columns            (324 - 899)
    Add RoundKey           (900 - 947)
    Loop2                  (948 - 950)
    END                    (951 - 953)
    NOP                    (954 - 1023)

Figure 5. The Memory allocation for Data and Program Code Sections.

Figure 7. The Mix Column Transformation.
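The Mix Column transformation is described in this paper as a combination of XOR and xTime steps. A minimal Python sketch of one column is given below, assuming the standard AES factoring in which each output byte is the input byte XORed with the column sum and with xTime of two neighbouring bytes. The function names are illustrative, and the exact instruction sequence used in the MISC program may differ.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b

def mix_column(column):
    """Apply Mix Column to one 4-byte column using only XOR and xtime."""
    a0, a1, a2, a3 = column
    t = a0 ^ a1 ^ a2 ^ a3                  # XOR of the whole column
    return [
        a0 ^ t ^ xtime(a0 ^ a1),           # 2*a0 + 3*a1 + a2 + a3
        a1 ^ t ^ xtime(a1 ^ a2),           # a0 + 2*a1 + 3*a2 + a3
        a2 ^ t ^ xtime(a2 ^ a3),           # a0 + a1 + 2*a2 + 3*a3
        a3 ^ t ^ xtime(a3 ^ a0),           # 3*a0 + a1 + a2 + 2*a3
    ]

# Usage with a commonly quoted test column.
mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Each output byte costs one xTime and a handful of XORs, which is why an ALU with only XOR and xTime blocks suffices for this transformation.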

D. AES ALU

The MISC AES ALU consists of 4 logic blocks: Adder, XOR, xTime, and Sub Bytes. The result of the ALU depends on the Op Code value embedded in the instruction, which is stored in the Op Code Register during processing. The ALU output MUX chooses the appropriate output from the desired logic block, corresponding to the op code stored in the program memory.

[Block diagram of the AES ALU; labels include Mem_B [9:0], Mem_A [9:0], Mem_B [7:0], Mem_A [7:0], XOR_Out and the ALU Output Mux.]

Figure 6. The AES Processor ALU.

1) Adder and XOR Block: The Adder circuit takes in 2 inputs and adds them together. The adder is important in performing the subtract operation for the SBN instruction. In the XOR block, the circuit takes in 2 inputs and performs an XOR operation on them.

2) xTime Block: The xTime block is a part of the Mix Column transformation. In [1], by using the substructure shared within the computation of a byte and between the computations of the four bytes in a column, the Mix Column transformation is written in a form that involves only several XOR operations and xTime operations. The Mix Column and xTime logic transformations are shown in Figure 7 and Figure 8.

Figure 8. The AES Processor ALU.

3) Sub Bytes Block using combinational circuit: The Sub Bytes block in the MISC AES implements a non-linear substitution cipher. The conventional S-Box implementation method is to store the values of the S-Box in a ROM and use it as a look-up table. The input byte is mapped to its multiplicative inverse in the Galois field GF(2^8). The combinational logic approach in [1] involves inversions in the Galois Field (GF). The composite field arithmetic implementations exploited in [4], [5] can be employed such that the field elements are mapped to elements in isomorphic composite fields. In the isomorphic composite fields, operations can be carried out using lower-cost subfield operations, with complex hardware. Figure 11 shows the combinational logic components for the Sub Bytes transformation.

E. Control Circuit

The control signals are produced by a combinational logic circuit during each clock cycle. The Control Circuit is driven by a 3-bit counter (C2C1C0) and takes a total of 6 clock cycles. At each clock cycle, the control signals set and control the registers, MUXes and memory loading. At clock cycle 0, the value of the program counter (PC) is loaded into the P_MAR and the Z register is set accordingly by the Adder block to determine whether the PC has restarted at 0x00. At clock cycle 1, the address of the data is read from the program memory location addressed by the P_MAR and loaded into the D_MAR; at the same time, the Op Code of the instruction is written to the OP register. Meanwhile, the incremented value of the PC is stored in the P_MAR, so that the second address (Mem_B) can be loaded into the D_MAR in the next cycle. At clock cycle 2, the value of Mem_A is read and stored into the R register. These processes are repeated for a
second data item during clock cycle 2. The PC value is increased by 1 after each data loading operation is done. At clock cycles 1 and 2, the PC value is pipelined into the P_MAR and the memory address corresponding to the PC value is pipelined into the D_MAR. Both Mem_A and Mem_B are sequentially loaded into the data path for processing.

At clock cycle 3, the value of Mem_B is read and sent to the ALU for computation, together with the data item Mem_A, which is stored in the R register. The Adder and the other hardware blocks perform their individual operations on the two given inputs (Mem_A and Mem_B). At that clock cycle, depending on the value of the OP register, the desired output is chosen via the ALU MUX. At the same clock cycle, the output from the ALU is stored in the MDR register. With the arithmetic operations performed on the data, clock cycle 4 loads the jump address from memory. The jump address is then added to the PC value at the same clock cycle, provided that the OP value corresponds to the SBN instruction and a negative result is found at the output. At clock cycle 5, the value of the PC is incremented.

Figure 9 shows the control signals as Boolean equations of the bits of the 3-bit counter. If a branch occurs, the N register would have a value of 0 stored. The PC register takes in the value of the target address and increases by 1. Then the following instruction in the written program code is performed. The 6-clock-cycle sequence repeats until the end of the program is reached, and the 3-bit counter resets once the value C = 5 is reached.

ALU_B0     = C2C0 + C1C0
ALU_B1     = C2C0
ALU_A0     = C1C0
ALU_A1     = C2C0 + C1C0
CIN        = C1 + C0
DMAR_Write = C1C0 + C2C1C0
PC_Write   = C2C1C0 + C2C1C0 + C2C1C0N
PMem_Read  = C2C0 + C1C0 + C2C1C0
DMem_Read  = C1
DMem_Write = C2C0
R_Write    = C1C0
Z_Write    = C2C1C0
N_Write    = C1C0
MDR_Write  = C1C0
Op_Write   = C2C1C0
Op_SEL     = C1C0

(The overbars marking complemented counter bits did not survive in this copy; the product terms are listed without polarity.)

Figure 9. The Boolean expressions of the control signals.

F. AES Program Flow

In the AES program, the program flow has to be arranged to suit the location of the instructions. By using SBN and branching instructions, program code can be reused to save memory. Instead of duplicating the same instruction codes, the rounds of operations can be executed by looping. Figure 10 shows the program flow of the AES in the Program Section.

[Flow chart boxes, in order: Program Start; Key Generation; Shift Row; Sub Bytes; Check Loop1 (if Loop1 is negative, jump to Add Key, else proceed); Mix Column; Add Key; Check Loop2 (if Loop2 is negative, jump to Key Generation, else proceed); END.]

Figure 10. The Memory allocation for Data and Program Code Sections.

III. IMPLEMENTATION RESULTS

The MISC AES Harvard architecture implementation occupies only 1% of the available flip-flops and LUTs in the hardware. Only 1024 bytes of memory are used for the AES program and 256 bytes for the data and temporary variables in the MISC architecture implementation. Two BRAMs, which is only 6% of the total available block RAMs on the RC10 board, are sufficient for this implementation. The AES program occupies 954 bytes of the program memory; each instruction takes 3 bytes, so a total of 318 instructions is used. Each instruction takes 6 clock cycles to execute, giving a total of 1908 clock cycles. The hardware usage can be found in Table I and Table II.

TABLE I. THE HARDWARE USAGE OF THE AES IMPLEMENTATION

Component            Quantity    Total     Usage
Slice Flip Flop      300         13,312    2%
4 input LUTs         540         26,624    2%
 - Logic             485         -         -
 - Route thru        54          -         -
 - Dual Port RAMs    0           -         -
 - Shift registers   1           -         -
Bonded IOB           44          221       19%
Block RAMs           2           32        6%
BUFGMUX              4           8         50%
DCMs                 1           4         25%

TABLE II. THE NUMBER OF LOGIC GATES USED IN THE AES ALU

ALU block         AND Gate    XOR Gate    OR Gate
xTime             -           3           -
Sub Bytes         45          186         -
XOR               -           1           -
Adder (10-bit)    20          20          10

IV. CONCLUSION

In this paper, the MISC AES Processor using the Harvard architecture is proposed. The total program memory required for the AES program is less than 1k bytes. During simulation and development of the MISC, indicator modules such as a 7-segment display were added to the design, which increased the total number of slices used. By using the Harvard architecture, the number of clock cycles during the encryption process is reduced; in addition, the simplified MISC architecture, implemented on simple hardware, is well suited to stringent hardware requirements and faster encryption.

REFERENCES

[1] Xinmiao Zhang and Keshab K. Parhi, "High-Speed VLSI Architectures for the AES Algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, 2004.
[2] Tim Good and Mohammed Benaissa, "Very Small FPGA Application-Specific Instruction Processor for AES," IEEE Transactions on Circuits and Systems, vol. 53, no. 7, 2006.
[3] William F. Gilreath and Phillip A. Laplante, "Subtract and Branch if Negative (SBN)," Computer Architecture: A Minimalist Perspective, Springer, United States of America, pp. 41-42, 2003.
[4] Edwin NC Mui (Custom R&D Engineer, Texco Enterprise Ptd. Ltd.), "Practical Implementation of Rijndael SBox using Combinational Logic."
[5] Gael Rouvroy, François-Xavier Standaert, Jean-Jacques Quisquater and Jean-Didier Legat, "Compact and Efficient Encryption/Decryption Module for FPGA Implementation of the AES Rijndael Very Well Suited for Small Embedded Applications," International Conference on Information Technology: Coding and Computing (ITCC'04), vol. 2, p. 583, 2004.
[6] E. Kane, "An Ultimate Minimal RISC Processor for Space Applications," Colorado Space Grant Consortium's Undergraduate Symposium Proceedings, Aug. 2005.
[7] D. Engels, X. Fan, G. Gong, H. Hu and E. M. Smith, "Hummingbird: Ultra-Lightweight Cryptography for Resource-Constrained Devices," The 14th International Conference on Financial Cryptography and Data Security - FC 2010, January 25-28, 2010, Tenerife, Canary Islands, Spain.

[Circuit diagram: 8-bit input (1 byte) to 8-bit output (1 byte).]

Figure 11. The complete circuit of the Sub Bytes Transformation.
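The Sub Bytes mapping realised by this circuit can be checked in software by computing the multiplicative inverse in GF(2^8) and then applying the AES affine transformation. The Python sketch below does exactly that, but it computes the inverse by direct exponentiation (a^254) rather than by the composite field decomposition used in the hardware; it is a functional reference only, and the function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def sub_byte(x):
    """S-Box value of x: field inversion followed by the AES affine transform."""
    inv = gf_inv(x)
    out = 0
    for i in range(8):
        bit = (
            (inv >> i)
            ^ (inv >> ((i + 4) % 8))
            ^ (inv >> ((i + 5) % 8))
            ^ (inv >> ((i + 6) % 8))
            ^ (inv >> ((i + 7) % 8))
            ^ (0x63 >> i)                  # affine constant 0x63
        ) & 1
        out |= bit << i
    return out

# Usage: build the full 256-entry table, e.g. to compare against a ROM version.
sbox = [sub_byte(x) for x in range(256)]
```

A table generated this way can serve as the reference when verifying a combinational S-Box implementation byte by byte.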
