SBN (Subtract and Branch If Negative)
    Mem_B = Mem_B - Mem_A
    If Mem_B < 0 Goto (PC + C)
    Else Goto (PC + 1)

XOR
    Mem_B = Mem_B XOR Mem_A

xTime
    Mem_B = xTime(Mem_B)

Sub Bytes
    Mem_B = Sub_Bytes(Mem_B)

Figure 1. The MISC instruction set.

Operation    Op Code (2 MSB)    Instruction Format
SBN          0 (00)             (00 @Mem_A), (Mem_B), Target
XOR          1 (01)             (01 @Mem_A), (Mem_B), Target
XTime        2 (10)             (10 @00000000), (Mem_B), Target
Sub Bytes    3 (11)             (11 @00000000), (Mem_B), Target

Figure 2. The MISC instruction format.

B. MISC Data Path Architecture
[...] Memory Address Register (D_MAR) stores the address of the data memory location that is to be written or read. The Z and N registers indicate whether the arithmetic result produced by the Adder block is zero or negative, respectively. The Memory Data Register (MDR) stores the output computed by the ALU. The Operation Code Register (OP) stores the op code of the current instruction, and the ALU output MUX selects the output of the ALU, which can be the result of the Adder, XOR, xTime or Sub Bytes block. After both Mem_A and Mem_B are loaded, the ALU output MUX selects the appropriate block output according to the op code stored in the OP register. The result is then written back to the data memory, replacing the value of the second data item read (Mem_B).

C. Memory Map
The memory used in the AES MISC architecture is based on the Harvard architecture. It comprises a 1024 x 10-bit Program Memory and a 256 x 8-bit Data Memory. Figure 4 below shows the size of the implemented memory in the MISC architecture.
In the program section, instructions are stored in the order in which the MISC executes them. In the data section, the allocation of memory to the plain text, master key and other temporary variables is shown in Figure 4 and Figure 5 below.
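The instruction format of Figure 2 and the SBN and XOR semantics of Figure 1 can be illustrated with a minimal Python sketch. Several points are assumptions rather than details given in the text: each instruction is taken to occupy three consecutive 10-bit program words (op code plus @Mem_A, then @Mem_B, then Target), Target is treated as an absolute word address used only when an SBN branch is taken, operands are treated as unsigned bytes, and the helper names `encode`, `decode` and `step` are hypothetical.

```python
from typing import List, Tuple

# Op codes from Figure 2 (the two most significant bits of the first word).
OP_SBN, OP_XOR, OP_XTIME, OP_SUB_BYTES = 0b00, 0b01, 0b10, 0b11

WORD_MASK = 0x3FF  # program memory words are 10 bits wide
BYTE_MASK = 0xFF   # data memory addresses and contents are 8 bits wide


def encode(op: int, mem_a: int, mem_b: int, target: int) -> List[int]:
    """Pack one instruction into three 10-bit words (assumed layout)."""
    if op not in (OP_SBN, OP_XOR, OP_XTIME, OP_SUB_BYTES):
        raise ValueError("op code must fit in 2 bits")
    first = (op << 8) | (mem_a & BYTE_MASK)  # op code in bits 9..8, @Mem_A below
    return [first, mem_b & BYTE_MASK, target & WORD_MASK]


def decode(words: List[int]) -> Tuple[int, int, int, int]:
    """Unpack three program words into (op, @Mem_A, @Mem_B, target)."""
    first, second, third = words
    return (first >> 8) & 0b11, first & BYTE_MASK, second & BYTE_MASK, third & WORD_MASK


def step(program: List[int], data: List[int], pc: int) -> int:
    """Run the SBN or XOR instruction at word address pc and return the next pc."""
    op, a, b, target = decode(program[pc:pc + 3])
    if op == OP_SBN:
        diff = data[b] - data[a]
        data[b] = diff & BYTE_MASK  # the result replaces Mem_B
        # Assumption: a taken branch goes to the absolute address in Target.
        return target if diff < 0 else pc + 3
    if op == OP_XOR:
        data[b] ^= data[a]
        return pc + 3
    raise NotImplementedError("xTime and Sub Bytes are omitted from this sketch")


# Usage: 3 - 5 is negative, so the SBN at address 0 branches to word address 6.
program = encode(OP_SBN, 0, 1, 6) + encode(OP_XOR, 0, 1, 0) + encode(OP_XOR, 1, 0, 0)
data = [5, 3] + [0] * 254
next_pc = step(program, data, 0)
```

Because SBN both subtracts and branches, a counter cell decremented by SBN can serve as a loop condition, which is how a single copy of a code sequence could be reused across iterations.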
[Figure 4 (placeholder): Program Section, addresses 0 to 1023; Data Section (128 bytes), addresses 0 to 127. The labels D_MEM_READ and D_MEM_WRITE also appear in the extracted figure text.]

Data memory address    Contents
0-15                   CipherKey
16-31                  Plain Text
32-47                  Temp for Shift Row
48-63                  Rcon[i]
64-79                  (Reserved)
80-95                  Temp for Mix Column
96-111                 Temp for Mix Column
112-127                Temp Variables

Figure 5. The memory allocation for Data and Program Code sections.
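The data-section allocation in Figure 5 can be written down as a small lookup table, which makes the 16-byte block boundaries easy to check. The sketch below is illustrative only: the names `DATA_MAP`, `region_of` and `load_block` are hypothetical, and the key and plain text bytes are placeholders, not test vectors from the paper.

```python
from typing import List, Tuple

# (label, first address, last address), transcribed from Figure 5.
DATA_MAP: List[Tuple[str, int, int]] = [
    ("CipherKey", 0, 15),
    ("Plain Text", 16, 31),
    ("Temp for Shift Row", 32, 47),
    ("Rcon[i]", 48, 63),
    ("(Reserved)", 64, 79),
    ("Temp for Mix Column", 80, 95),
    ("Temp for Mix Column", 96, 111),
    ("Temp Variables", 112, 127),
]


def region_of(address: int) -> str:
    """Return the Figure 5 label of the block that contains a data address."""
    for label, first, last in DATA_MAP:
        if first <= address <= last:
            return label
    raise ValueError(f"address {address} is outside the 128-byte data section")


def load_block(data: List[int], first: int, block: bytes) -> None:
    """Copy one 16-byte block into data memory starting at address first."""
    if len(block) != 16:
        raise ValueError("blocks in the data section are 16 bytes long")
    data[first:first + 16] = list(block)


# Usage with placeholder values: a 256 x 8-bit data memory, key at 0, text at 16.
data_memory = [0] * 256
load_block(data_memory, 0, bytes(range(16)))
load_block(data_memory, 16, bytes(range(16, 32)))
```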
D. AES ALU
The MISC AES ALU consists of 4 logic blocks: Adder, XOR, xTime, and Sub Bytes. The result of the ALU depends on the op code value embedded in the instruction, which is stored in the Op Code Register during processing. The ALU output MUX chooses the output of the desired logic block, corresponding to the op code stored in the program memory.
Figure 8. The AES Processor ALU.
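The four logic blocks and the output MUX described above can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the paper's gate-level design: the Adder is modelled as a 10-bit two's-complement subtraction with the N flag taken from bit 9, and Sub Bytes is computed by direct inversion in GF(2^8) (exponentiation to the power 254) followed by the standard AES affine transformation, which is a substitute for the composite-field combinational logic the paper refers to. All function names are hypothetical.

```python
from typing import Tuple

OP_SBN, OP_XOR, OP_XTIME, OP_SUB_BYTES = 0, 1, 2, 3


def adder(mem_b: int, mem_a: int) -> Tuple[int, bool, bool]:
    """Mem_B - Mem_A on a 10-bit adder; returns (result, Z flag, N flag)."""
    result = (mem_b - mem_a) & 0x3FF
    return result, result == 0, bool(result & 0x200)


def xtime(b: int) -> int:
    """Multiply by x in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using repeated xTime and XOR."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a**254; 0 maps to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(b: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def sub_bytes(b: int) -> int:
    """AES S-box: field inversion followed by the affine transformation."""
    s = gf_inv(b)
    return s ^ rotl8(s, 1) ^ rotl8(s, 2) ^ rotl8(s, 3) ^ rotl8(s, 4) ^ 0x63


def alu(op: int, mem_b: int, mem_a: int) -> int:
    """Compute every block output, then let the op code select one (the MUX)."""
    outputs = {
        OP_SBN: adder(mem_b, mem_a)[0],
        OP_XOR: mem_b ^ mem_a,
        OP_XTIME: xtime(mem_b),
        OP_SUB_BYTES: sub_bytes(mem_b),
    }
    return outputs[op]
```

Computing all four outputs before selecting one mirrors the hardware arrangement, where every block is driven by the same inputs and only the MUX depends on the op code.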
Figure 6. The AES Processor ALU. (Labels recovered from the figure: Mem_B [9:0], Mem_A [9:0], Mem_B [7:0], Mem_A [7:0], XOR_Out, ALU Output Mux.)

1) Adder and XOR Block: The Adder circuit takes two inputs and adds them together. The adder performs the subtract operation for the SBN instruction. In the XOR block, the circuit takes two inputs and performs an XOR operation on them.
2) xTime Block: The xTime block is part of the Mix Column transformation. In [1], by using the substructure of the computation of a byte, and between the computations of the four bytes in an array of bytes, the Mix Column transformation is rewritten in a form that involves only several XOR operations and xTime operations. The Mix Column and xTime logic transformations are shown in Figure 7 and Figure 8.
3) Sub Bytes Block using combinational circuit: The Sub Bytes block in the MISC AES implements a non-linear byte substitution. The conventional S-Box implementation stores the values of the S-Box in a ROM and uses it as a look-up table. In the substitution, the input byte is mapped to its multiplicative inverse in the Galois field GF(2^8). The combinational logic approach in [1] involves inversions in the Galois field (GF). The composite field arithmetic exploited in [4], [5] can be employed such that the field elements are mapped to elements of isomorphic composite fields. In the isomorphic composite fields, operations can be carried out by lower-cost subfield operations, although the resulting hardware is complex. Figure 11 shows the combinational logic components for the Sub Bytes transformation.

E. Control Circuit
The control signals are produced by a combinational logic circuit during each clock cycle. The Control Circuit is driven by a 3-bit counter (C2C1C0) and runs over a total of 6 clock cycles. At each clock cycle, the control signals set and control the registers, MUX and memory loading. At clock cycle 0, the value of the program counter (PC) is loaded into the P_MAR, and the Z register is set accordingly by the Adder block to indicate whether the PC has restarted at 0x00. At clock cycle 1, the address of the data is read from the program memory location addressed by the P_MAR and loaded into D_MAR; at the same time, the op code of the instruction is written to the OP register. Meanwhile, the incremented value of PC is stored in P_MAR, so that the second address, that of Mem_B, is loaded into D_MAR for the second data item during clock cycle 2. The PC value is increased by 1 after each data loading operation. At clock cycles 1 and 2, the PC value is pipelined into the P_MAR and the memory address corresponding to the PC value is pipelined into the D_MAR. Both Mem_A and Mem_B are loaded sequentially into the data path for processing.
At clock cycle 3, the value of Mem_B is read and sent to the ALU for computation, together with the data item Mem_A, which is stored in the R register. The Adder and the other hardware blocks perform their individual operations on the two given inputs (Mem_A and Mem_B).

F. AES Program Flow
In the AES program, the program flow has to be arranged to suit the location of the instructions. By using SBN and branching instructions, program code can be reused to save memory. Instead of duplicating the same instruction codes for every round, the rounds of operations can be executed from one copy of the code. Figure 10 shows the program flow of the AES in the Program Section.
[Figure 10 (flow chart, partially recovered): Key Generation -> Shift Row -> Sub Bytes -> Check Loop1. If Loop1 is negative, jump to Add Key. Else, proceed.]

TABLE II. THE NUMBER OF LOGIC GATES USED IN THE AES ALU
ALU block         AND Gate    XOR Gate    OR Gate
xTime             -           3           -
Sub Bytes         45          186         -
XOR               -           1           -
Adder (10-bit)    20          20          10

IV. CONCLUSION
In this paper, a MISC AES processor using the Harvard architecture is proposed. The total program memory required for the AES program is less than 1k bytes. During simulation and development of the MISC, indicator modules such as a 7-segment display were added to the design, resulting in an increase in the total number of slices used. By using the Harvard architecture, the number of clock cycles during the encryption process is reduced; moreover, the simplified MISC architecture is implemented on simple hardware, which helps it meet stringent hardware requirements and achieve faster encryption.

REFERENCES
[1] Xinmiao Zhang and Keshab K. Parhi, High-Speed VLSI Architectures for the AES Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, 2004.
[2] Tim Good and Mohammed Benaissa, Very Small FPGA Application Specific Instruction Processor for AES, IEEE Transactions on Circuits and Systems, vol. 53, no. 7, 2006.
[3] William F. Gilreath, Phillip A. Laplante, "Subtract and Branch if Negative (SBN)," Computer Architecture: A Minimalist Perspective, Springer, United States of America, pp. 41-42, 2003.
[4] Edwin NC Mui (Custom R&D Engineer, Texco Enterprise PTD. LTD), Practical Implementation of Rijndael SBox using Combinational Logic.
[5] Gael Rouvroy, François-Xavier Standaert, Jean-Jacques Quisquater and Jean-Didier Legat, "Compact and Efficient Encryption / Decryption Module for FPGA Implementation of the AES Rijndael Very Well Suited for Small Embedded Applications", International Conference on Information Technology: Coding and Computing (ITCC'04), vol. 2, p. 583, 2004.
[6] Kane, E. 2005. "An Ultimate Minimal RISC Processor for Space Applications". Colorado Space Grant Consortium's: Undergraduate Symposium Proceedings. (Aug. 2005).
[7] D. Engels, X. Fan, G. Gong, H. Hu, and E. M. Smith, Hummingbird: ultra-lightweight cryptography for resource-constrained devices, The 14th International Conference on Financial Cryptography and Data Security - FC 2010, January 25-28, 2010, Tenerife, Canary Islands, Spain.
[Figure residue: block diagram with an 8-bit input (1 byte) and an 8-bit output (1 byte).]