CHAPTER 3
In late 2001, AES replaced the aging DES as the Federal Information Processing Standard (FIPS) for encryption. DES is seen as having reached the end of its life, because exhaustive search of its 56-bit key has become tractable on current computer hardware. The AES algorithm will be used for many applications within the government and in the private sector. Breaking AES-encrypted ciphertext by trying all possible keys is currently computationally infeasible.
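As a rough, back-of-the-envelope illustration of why exhaustive key search is infeasible, the sketch below estimates how long it would take to try every key. The attacker speed of 10^12 keys per second is an assumed, illustrative figure, not a measurement.

```python
# Back-of-the-envelope brute-force estimate.
# KEYS_PER_SECOND is an assumed, illustrative attacker speed.
KEYS_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(key_bits: int, rate: float = KEYS_PER_SECOND) -> float:
    """Years needed to try all 2**key_bits keys at the given rate."""
    return (2 ** key_bits) / rate / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"DES-size key (56 bits):  {years_to_search(56):.2e} years")
    print(f"AES-128 key (128 bits): {years_to_search(128):.2e} years")
```

At the assumed rate, the 56-bit space is exhausted in well under a year, while the 128-bit space needs on the order of 10^19 years.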
In this section, all the notations, symbols and parameters used are
based on the convention used in the NIST FIPS-197 AES Standard. The key
parameters used in the NIST FIPS-197 are:
The AES variants differ in the length of the secret key and in the key expansion process, which differs between AES-128/192 and AES-256. In NIST FIPS-197, the only Key-Block-Round combinations that conform to the standard are given in Table 3.1 below.
Example 3.1: The byte with hexadecimal value ‘57’ (binary 01010111) corresponds to the polynomial

$$x^6 + x^4 + x^2 + x + 1 \qquad (3.2)$$
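A byte maps to its polynomial by reading each set bit b_i as the term x^i. The small sketch below (the function name is our own) reproduces Example 3.1.

```python
def byte_to_poly(b: int) -> str:
    """Render a byte b7...b0 as the polynomial sum of b_i * x^i over GF(2)."""
    terms = []
    for i in range(7, -1, -1):  # highest degree first
        if (b >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) if terms else "0"

print(byte_to_poly(0x57))  # x^6 + x^4 + x^2 + x + 1
```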
In GF(2^8) multiplication, the product polynomial is reduced to the remainder left after division by a special irreducible polynomial of degree eight, which for AES is

$$m(x) = x^8 + x^4 + x^3 + x + 1$$
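Example 3.3: {57} • {83} = {c1}. Multiplying $(x^6 + x^4 + x^2 + x + 1)$ by $(x^7 + x + 1)$ term by term gives the partial products

$$\begin{aligned}
(x^6 + x^4 + x^2 + x + 1) \bullet x^7 &= x^{13} + x^{11} + x^9 + x^8 + x^7\\
(x^6 + x^4 + x^2 + x + 1) \bullet x &= x^7 + x^5 + x^3 + x^2 + x\\
(x^6 + x^4 + x^2 + x + 1) \bullet 1 &= x^6 + x^4 + x^2 + x + 1
\end{aligned}$$

whose sum (coefficients added modulo 2) is

$$x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$$

Reducing modulo $m(x) = x^8 + x^4 + x^3 + x + 1$:

$$\begin{aligned}
(x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1) + x^5\,m(x) &= x^{11} + x^4 + x^3 + 1\\
(x^{11} + x^4 + x^3 + 1) + x^3\,m(x) &= x^7 + x^6 + 1
\end{aligned}$$

and $x^7 + x^6 + 1$ is the byte {c1}.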
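The multiplication and reduction worked through in Example 3.3 can be checked with a short shift-and-add sketch (the name gf_mul is our own). It reduces by m(x) (0x11B) whenever an intermediate multiple of a reaches degree 8, so the reduction is interleaved with the multiplication instead of being done at the end as in the example; the result is the same.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1.

    Each set bit of b adds (XORs) the current multiple of a; a is multiplied
    by x (shift left) each step and reduced by 0x11B when it reaches degree 8.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a      # polynomial addition is XOR
        a <<= 1              # a := a * x
        if a & 0x100:
            a ^= 0x11B       # subtract (XOR) m(x)
        b >>= 1
    return result

print(f"{gf_mul(0x57, 0x83):02x}")  # c1
```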
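The multiplicative inverse of ‘a’ is the element ‘x’ that satisfies

$$a \cdot x = 1$$

or, in modular arithmetic,

$$(a \cdot x) \bmod n = 1$$

However, this multiplicative inverse exists only if ‘a’ and ‘n’ are relatively prime.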
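The condition above can be illustrated with the extended Euclidean algorithm, which finds x when gcd(a, n) = 1 and otherwise reports that no inverse exists. This is a sketch over ordinary integers with a function name of our own; in the AES field the same idea applies to polynomials modulo m(x), where every nonzero element has an inverse because m(x) is irreducible.

```python
def mod_inverse(a: int, n: int):
    """Return x with (a * x) % n == 1, or None if gcd(a, n) != 1."""
    old_r, r = a % n, n
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r   # remainder sequence of Euclid's algorithm
        old_s, s = s, old_s - q * s   # running coefficient of a
    if old_r != 1:                    # a and n are not relatively prime
        return None
    return old_s % n

print(mod_inverse(3, 7))   # 5, since 3 * 5 = 15 = 2 * 7 + 1
print(mod_inverse(4, 8))   # None, gcd(4, 8) = 4
```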
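Example 3.4
For two four-term polynomials $a(x) = a_3x^3 + a_2x^2 + a_1x + a_0$ and $b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$ with coefficients in GF(2^8), the product is $c(x) = a(x)\,b(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$, where:

$$\begin{aligned}
c_0 &= a_0 \cdot b_0 & c_4 &= a_3 \cdot b_1 \oplus a_2 \cdot b_2 \oplus a_1 \cdot b_3\\
c_1 &= a_1 \cdot b_0 \oplus a_0 \cdot b_1 & c_5 &= a_3 \cdot b_2 \oplus a_2 \cdot b_3\\
c_2 &= a_2 \cdot b_0 \oplus a_1 \cdot b_1 \oplus a_0 \cdot b_2 & c_6 &= a_3 \cdot b_3\\
c_3 &= a_3 \cdot b_0 \oplus a_2 \cdot b_1 \oplus a_1 \cdot b_2 \oplus a_0 \cdot b_3 & &
\end{aligned}$$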
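The coefficient formulas of Example 3.4 follow one rule: c_k is the XOR of every product a_i · b_j with i + j = k. The sketch below (function names our own) computes all seven coefficients that way, using GF(2^8) multiplication for each product.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def poly_mul4(a: list, b: list) -> list:
    """Coefficients [c0, ..., c6] of c(x) = a(x) * b(x).

    a and b are [a0, a1, a2, a3] and [b0, b1, b2, b3] (lowest degree first);
    coefficient products use gf_mul and additions are XOR.
    """
    c = [0] * 7
    for i in range(4):
        for j in range(4):
            c[i + j] ^= gf_mul(a[i], b[j])
    return c

print([f"{v:02x}" for v in poly_mul4([0x02, 0x01, 0x01, 0x03],
                                     [0xDB, 0x13, 0x53, 0x45])])
```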
i. Key Expansion
In the Cipher, the array is called the State (denoted S). It consists of 4 rows of bytes, each row has 4 bytes, and each byte consists of 8 bits, so the State holds 4 rows × 4 bytes × 8 bits = 128 bits in total. Each individual byte has two indices, the row number r with range 0 ≤ r < 4 and the column number c with range 0 ≤ c < 4, so it can be referred to as $S_{r,c}$. All transformations in the cipher are performed on this State array.
In the key scheduler, the array used in data processing is called the Key State (denoted W). The Key State is a one-dimensional array of eight 32-bit words, indexed by r with range 0 ≤ r < 8, so each word can be referred to as $W_r$. One Key State array of eight words W is called the round key K.
Figure 3.1 (a) Initial input bytes, (b) State Array and (c) Key State Array
Plaintext: 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34
Cipher key: 60 3d eb 10 15 ca 71 be 2b 73 ae f0 85 7d 77 81
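The State mapping of Figure 3.1 can be sketched with the plaintext above. Following FIPS-197, the 16 input bytes fill the State column by column, so that $S_{r,c}$ = in[r + 4c]; the function name is our own.

```python
def bytes_to_state(block: bytes) -> list:
    """Map 16 input bytes to the 4x4 State: S[r][c] = in[r + 4*c] (FIPS-197)."""
    if len(block) != 16:
        raise ValueError("AES block must be 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
for row in bytes_to_state(plaintext):
    print(" ".join(f"{b:02x}" for b in row))
```

Each printed line is one State row; reading down a column gives four consecutive input bytes.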
The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted $[w_i]$, with $i$ in the range $0 \le i < Nb(Nr + 1)$.
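The word count Nb(Nr + 1) can be evaluated for each Key-Block-Round combination. The parameter values below are the FIPS-197 combinations (Nb is fixed at 4 for every AES variant); the dictionary and function names are our own.

```python
# FIPS-197 Key-Block-Round combinations: Nk = key words, Nb = State columns,
# Nr = rounds. Nb is 4 (128-bit block) for every AES variant.
AES_PARAMS = {
    "AES-128": {"Nk": 4, "Nb": 4, "Nr": 10},
    "AES-192": {"Nk": 6, "Nb": 4, "Nr": 12},
    "AES-256": {"Nk": 8, "Nb": 4, "Nr": 14},
}

def key_schedule_words(nb: int, nr: int) -> int:
    """Number of 32-bit words produced by Key Expansion: Nb * (Nr + 1)."""
    return nb * (nr + 1)

for name, p in AES_PARAMS.items():
    words = key_schedule_words(p["Nb"], p["Nr"])
    print(f"{name}: Nk={p['Nk']}, Nr={p['Nr']}, key schedule = {words} words")
```

For AES-128 this gives 44 words, w0 to w43.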
The expansion of the input key into the key schedule proceeds according to the pseudocode in Figure 3.3. SubWord is a function that takes a four-byte input word and applies the S-box to each of the four bytes to produce an output word. The function RotWord takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns the word [a1, a2, a3, a0].
Figure 3.4 Round key expansion (key words W0 to W11, with the Sbox(Rot(Y)) step)
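The key expansion routine can be sketched in software for AES-128. This is a model under stated assumptions, not the Verilog key expander: it covers only Nk = 4, generates the S-box from its definition instead of storing a table, and derives each Rcon value by doubling the previous one in GF(2^8). The check values use the FIPS-197 Appendix A.1 example key 2b7e151628aed2a6abf7158809cf4f3c, which is not the cipher key listed earlier in this chapter.

```python
"""Sketch of AES-128 key expansion (Nk = 4, Nb = 4, Nr = 10)."""

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Inverse via a^254 (square-and-multiply); maps 0 to 0 as the S-box requires."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def build_sbox() -> list:
    """S-box = FIPS-197 affine transform (XOR of bit rotations, plus 0x63) of the inverse."""
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def key_expansion_128(key: bytes) -> list:
    """Return the 44 four-byte words w0..w43 of the AES-128 key schedule."""
    if len(key) != 16:
        raise ValueError("AES-128 key must be 16 bytes")
    nk, nb, nr = 4, 4, 10
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01
    for i in range(nk, nb * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]      # RotWord: [a0,a1,a2,a3] -> [a1,a2,a3,a0]
            temp = [SBOX[b] for b in temp]  # SubWord: S-box on each byte
            temp[0] ^= rcon                 # XOR with round constant Rcon
            rcon = gf_mul(rcon, 0x02)       # next Rcon: 01, 02, 04, ..., 1b, 36
        w.append([x ^ y for x, y in zip(w[i - nk], temp)])
    return [bytes(word) for word in w]

words = key_expansion_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(words[4].hex(), words[43].hex())
```

Words w0 to w3 are the cipher key itself; each later group of four words forms one of the ten round keys.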
ROT rotates each byte in a word one position to the left. Suppose a word consists of four bytes {a0, a1, a2, a3}; after the ROT transformation, the new word is {a1, a2, a3, a0}. For example, in hexadecimal, if the word is [{EA}, {31}, {D4}, {F0}], ROT returns the word [{31}, {D4}, {F0}, {EA}]. Figure 3.5 illustrates the ROT process.
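When the four bytes are packed into one 32-bit word with a0 as the most significant byte (a packing we assume here), ROT is simply a left rotation by 8 bits, which in hardware costs no logic, only a reordering of wires. The sketch below (function name our own) reproduces the example above.

```python
def rot_word(word: int) -> int:
    """ROT on a 32-bit word whose most significant byte is a0.

    Moving each byte one position left is a left rotation by 8 bits.
    """
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF

print(hex(rot_word(0xEA31D4F0)))  # 0x31d4f0ea
```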
LUT Method
column with index ‘3’ in Figure 3.7. This would result in $S'_{1,1}$ having a value of {ed}.
Figure 3.7 S-box: Substitution values for the byte xy (in hexadecimal
format)
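The LUT in Figure 3.7 can also be generated from its mathematical definition: the multiplicative inverse in GF(2^8), followed by the FIPS-197 affine transformation, written here as an XOR of four bit rotations plus the constant 0x63. The sketch below, with helper names of our own choosing, rebuilds the table and reproduces the {ed} entry described above, which sits at row 5, column 3 (input {53}).

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """a^254 equals a^-1 for nonzero a; 0 maps to 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def build_sbox() -> list:
    """SubBytes table: affine transform of the GF(2^8) inverse of each byte."""
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        s = inv
        for k in range(1, 5):  # XOR of the inverse rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
print(f"{SBOX[0x53]:02x}")  # ed
```

A hardware LUT would store these 256 values directly; computing them once in software is a convenient way to cross-check the table.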
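$$(s_h x + s_l)^{-1} = (s_h \otimes \theta)\,x + (s_h \oplus s_l) \otimes \theta \qquad (3.15)$$

where

$$\theta = \left(s_h^2 \otimes \lambda \oplus s_h \otimes s_l \oplus s_l^2\right)^{-1}$$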
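Equation (3.15) can be checked numerically. The sketch below assumes GF(2^4) is built with the polynomial x^4 + x + 1 and the extension GF((2^4)^2) with x^2 + x + λ, picking the smallest λ for which that quadratic is irreducible. These field choices are assumptions for illustration and may differ from those used in this design, and the mapping between GF(2^8) and the composite field is not modelled. Under these assumptions, it confirms that $(s_h \otimes \theta)x + (s_h \oplus s_l) \otimes \theta$ is the inverse of every nonzero $s_h x + s_l$.

```python
"""Numerical check of the composite-field inverse, Eq. (3.15).

Assumed fields (illustrative only): GF(2^4) modulo x^4 + x + 1, and
GF((2^4)^2) modulo x^2 + x + LAM with LAM chosen to make it irreducible.
"""

def gf16_mul(a: int, b: int) -> int:
    """Multiply two 4-bit values in GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf16_inv(a: int) -> int:
    """Brute-force inverse in GF(2^4); a must be nonzero."""
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)

def find_lambda() -> int:
    """Smallest lam with x^2 + x + lam irreducible (no root t in GF(2^4))."""
    for lam in range(1, 16):
        if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)):
            return lam
    raise ValueError("no irreducible quadratic found")

LAM = find_lambda()

def composite_mul(a: tuple, b: tuple) -> tuple:
    """(ah x + al)(bh x + bl), reduced with x^2 = x + LAM."""
    (ah, al), (bh, bl) = a, b
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAM) ^ gf16_mul(al, bl)
    return (high, low)

def composite_inv(sh: int, sl: int) -> tuple:
    """Eq. (3.15): returns (sh * theta, (sh ^ sl) * theta)."""
    norm = gf16_mul(gf16_mul(sh, sh), LAM) ^ gf16_mul(sh, sl) ^ gf16_mul(sl, sl)
    theta = gf16_inv(norm)
    return (gf16_mul(sh, theta), gf16_mul(sh ^ sl, theta))
```

The point of the decomposition is that the only inversion left is the 4-bit inverse of θ's argument, which is far cheaper in gates than a direct 8-bit inverse.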
Table 3.2 Gate counts and critical paths of functional blocks in the
subbytes transformation
At the start of encryption, the cipher input is copied into the internal State array. An initial round key is then added, and the State is transformed by iterating a Round Transformation for a number of cycles. The number of cycles depends on the key length (10, 12 or 14 rounds for AES-128, AES-192 and AES-256). There are 4 functions involved in the Round Transformation:
i. AddRoundKey
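AddRoundKey, used both for the initial key addition and at the end of every round, is a bytewise XOR of the State with a 16-byte round key. The sketch below (function name our own) checks it against the FIPS-197 Appendix B example, whose plaintext is the one given earlier in this chapter, paired with the FIPS-197 example key 2b7e151628aed2a6abf7158809cf4f3c.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """AddRoundKey: bytewise XOR of the 16-byte State with a 16-byte round key."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))

plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(add_round_key(plaintext, key).hex())
```

Because XOR is its own inverse, the same operation also serves in decryption.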
Figure: (a) Block diagram with KEY, KEY EXPANSION UNIT, INPUT, OUTPUT and CLK (b) AES-128 CRYPTO core with ports CLK, RST, ENC, KEY_I, DATA_I and DATA_OUT
Table 3.3 lists the primary input and output signals of the AES-128 core, which are used to select the AES specification and operation mode, supply the data and key inputs, and deliver the generated output.
Signal Name   Width (bit)   Type     Description
Clk           1             Input    Processor main clock signal
Rst           1             Input    Processor main reset signal (0 = normal operation; 1 = system reset)
Enc           1             Input    Processor mode of operation signal (0 = decryption mode; 1 = encryption mode)
Key_in        128           Input    Secret key used by the AES key expander to expand all round keys
Data_in       128           Input    Initial data block to be encrypted or decrypted
Data_out      128           Output   Final result of the AES transformation
There are five major components that determine the throughput, area and power of AES encryption and decryption: inverters, 2:1 multiplexers, XOR gates, D flip-flops, and totally self-checking two-rail checkers. All of these components must be fault-free and must produce two-rail outputs for a valid two-rail input: when the inputs are valid, the output is valid and correct, and when an input is non-valid, the output is non-valid. The truth table in Table 3.4 shows that, for every possible stuck-at fault, there is an input set that yields a non-valid output; hence, the XOR cell is totally self-checking for all single stuck-at faults and non-valid inputs. Pseudo-nMOS logic is normally avoided because it consumes more static power than CMOS, but it is preferred here because the devices are fast and the short between power and ground makes the output predictable in the presence of a fault, as shown in Figure 3.24.
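To illustrate what "totally self-checking" means for a two-rail checker, the sketch below models the classic two-rail checker cell behaviourally, with e0 = x0·y0 + x1·y1 and e1 = x0·y1 + x1·y0, where a pair is valid when its two rails differ. These equations, and the restriction of faults to single stuck-at faults on the six input and output lines, are assumptions; the gate-level cells of Table 3.4 and the pseudo-nMOS implementation are not modelled. The checks confirm that valid inputs give valid outputs, any non-valid input gives a non-valid output, and every modelled stuck-at fault is exposed by some valid input.

```python
from itertools import product

VALID = [(0, 1), (1, 0)]
ALL_PAIRS = list(product((0, 1), repeat=2))

def is_valid(pair: tuple) -> bool:
    """A two-rail pair (x0, x1) is a valid code word when its rails differ."""
    return pair[0] != pair[1]

def two_rail_checker(x: tuple, y: tuple) -> tuple:
    """Classic two-rail checker cell (assumed equations)."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 & y0) | (x1 & y1), (x0 & y1) | (x1 & y0))

def checker_with_fault(x: tuple, y: tuple, line: int, value: int) -> tuple:
    """Evaluate with one line stuck at `value`.
    Lines 0-3 are inputs x0, x1, y0, y1; lines 4-5 are outputs e0, e1."""
    bits = [x[0], x[1], y[0], y[1]]
    if line < 4:
        bits[line] = value
    out = list(two_rail_checker((bits[0], bits[1]), (bits[2], bits[3])))
    if line >= 4:
        out[line - 4] = value
    return tuple(out)

def code_disjoint() -> bool:
    """Output is valid exactly when both input pairs are valid."""
    return all(
        is_valid(two_rail_checker(x, y)) == (is_valid(x) and is_valid(y))
        for x, y in product(ALL_PAIRS, repeat=2)
    )

def self_testing() -> bool:
    """Every single stuck-at fault gives a non-valid output for some valid input."""
    return all(
        any(not is_valid(checker_with_fault(x, y, line, value))
            for x, y in product(VALID, repeat=2))
        for line in range(6) for value in (0, 1)
    )

print(code_disjoint(), self_testing())
```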
Table 3.5 Delay and Power report of the gates used in Fault Detection
Figure: Design flow (system partitioning, design entry, simulation, synthesis, floor plan, placement, layout simulation, routing, extraction)
Subbytes
Analysis: The above waveform shows the simulation results of the SubBytes module. Here signal ‘in’ is the 8-bit input of the module and signal ‘out’ is the 8-bit output. In the SubBytes operation, each 8-bit value is substituted with another 8-bit value using a lookup table.
Invsubbytes
Key Expansion
Analysis: The above waveform shows the simulation results of the key expansion module. Here signal ‘key’ is the 128-bit input and signals ‘w0’ to ‘w43’ are the outputs. From the 128-bit input key, the module generates a total of 10 round keys (words w4 to w43), one for each round, in addition to the initial key words w0 to w3.
Analysis: The above waveform shows the simulation results of the single-round encryption operation. Here signal ‘round_in’ is the 128-bit input, the words ‘w0’, ‘w1’, ‘w2’ and ‘w3’ together form the round key, and ‘round_out’ is the 128-bit output.
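A behavioural sketch of one non-final encryption round, in the FIPS-197 order SubBytes, ShiftRows, MixColumns, AddRoundKey, may help when reading the waveform. Here the 16-byte round key stands in for w0, w1, w2 and w3 concatenated, and the State is stored column by column (byte index r + 4c). It is a software model under those assumptions, not the Verilog module itself; the check values come from the FIPS-197 Appendix B example (start of round 1 to start of round 2), not from this design's simulation.

```python
"""Behavioural sketch of one (non-final) AES encryption round."""

def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiply modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox() -> list:
    """S-box from the GF(2^8) inverse (x^254) and the FIPS-197 affine map."""
    sbox = []
    for x in range(256):
        inv, base, e = 1, x, 254
        while e:  # square-and-multiply: inv = x^254
            if e & 1:
                inv = gf_mul(inv, base)
            base = gf_mul(base, base)
            e >>= 1
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def sub_bytes(s: list) -> list:
    return [SBOX[b] for b in s]

def shift_rows(s: list) -> list:
    # State byte (r, c) is stored at index r + 4c; row r moves left by r.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(s: list) -> list:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [
            gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
            a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
            a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
            gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
        ]
    return out

def encrypt_round(round_in: bytes, round_key: bytes) -> bytes:
    """SubBytes, ShiftRows, MixColumns, then AddRoundKey (XOR with round key)."""
    s = mix_columns(shift_rows(sub_bytes(list(round_in))))
    return bytes(x ^ k for x, k in zip(s, round_key))

round_out = encrypt_round(bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808"),
                          bytes.fromhex("a0fafe1788542cb123a339392a6c7605"))
print(round_out.hex())
```

The final round of AES omits MixColumns, so a complete cipher would need a second variant of this function for the last round.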
Analysis: The above waveform shows the simulation results of the single-round decryption operation. Here signal ‘round_in’ is the 128-bit input, the words ‘w0’, ‘w1’, ‘w2’ and ‘w3’ together form the round key, and ‘round_out’ is the 128-bit output.
Encryption operation
Input1 = 128'h00112233445566778899aabbccddeeff;
Input2 = 128'h10112233445566778899aabbccddeeff;
Output1 = 128'h69c4e0d86a7b0430d8cdb78070b4c55a;
Output2 = 128'h0761adfd2febd4d105b1ac2ff88171b3;
Decryption operation
Input1 = 128'h0761adfd2febd4d105b1ac2ff88171b3;
Input2 = 128'h69c4e0d86a7b0430d8cdb78070b4c55a;
Output1 = 128'h10112233445566778899aabbccddeeff;
Output2 = 128'h00112233445566778899aabbccddeeff;
5. The output can be verified on a monitor using the ChipScope Pro Analyzer software.
The design is modeled in Verilog HDL and simulated with ModelSim and Cadence NCsim. Synthesis is done with RTL Compiler, and the physical design is done with SOC Encounter. The transistor-level design is done in Cadence ADE, and simulation is carried out using SPECTRE. The proposed architecture achieves a throughput of 32.32 Gbps with the 180 nm TSMC technology library. The design has also been targeted on an FPGA, where it achieved a throughput of 31.9 Gbps on a Xilinx xc5vlx110t-1 device, which is faster than the fastest previous FPGA implementations known to date.