AMBA-AHB interface
Paola Ceminari, UNS / INTI - CMNB, Bahía Blanca, pceminari@inti.gob.ar
Ariel Arelovich, INTI - CMNB, Bahía Blanca, ariela@inti.gob.ar
Martín Di Federico, INTI - CMNB, Bahía Blanca, martind@inti.gob.ar
978-1-5090-3963-0/17/$31.00 © 2017 IEEE

Abstract: This work describes three architectural designs for the AES cipher, a symmetric block encryption standard. The three architectures target different applications and are designed using different approaches, such as pipeline structures and resource sharing. Each one includes an AMBA AHB interface, an open standard that defines the interconnection of blocks in a System-on-Chip (SoC).

I. INTRODUCTION

The Advanced Encryption Standard (AES) resulted from an initiative carried out by the National Institute of Standards and Technology (NIST) in 2000 [1]. The algorithm selection was an open process and the winning algorithm was Rijndael [2], a substitution-permutation block cipher. Originally, AES was intended to protect sensitive information in United States governmental institutions, but some years later it became a global de facto standard. One of the main features of AES is that it can be implemented efficiently on different hardware and software platforms [3] [4] [5].

This paper presents three architectures for the AES cipher. They represent solutions for different applications and include a standard communication protocol that allows their incorporation into more complex systems. The paper is organized as follows: Section II describes the AES algorithm, considering all the transformations that take place during the encryption and key expansion processes. Section III presents the design strategies for each architecture. Section IV details the AMBA AHB protocol and its interface with the AES ciphers. Finally, Sections V and VI show simulation and synthesis results.

II. AES ALGORITHM DESCRIPTION

AES operates on 128-bit data blocks, while the secret key size can be 128, 192 or 256 bits. The encryption process consists of the iterated application of invertible transformations, called rounds, as shown in Fig. 1. First an initial round is performed, followed by Nr − 1 iterations of a general round and one iteration of a final round. The number of round iterations, Nr, depends on the secret key length: Nr = 10 for a 128-bit key, Nr = 12 for a 192-bit key and Nr = 14 for a 256-bit key. Each round requires a sub-key (also called round key), which is generated from the secret key by an expansion algorithm. The initial round consists of an XOR operation between the plain text block and the first round key, while general rounds consist of four transformations called SubBytes (SB), ShiftRows (SR), MixColumns (MC) and AddRoundKey (ARK). The final round is similar to the general rounds, except that it does not include the MixColumns transformation. All input and output data blocks, as well as the intermediate results during encryption, can be visualized as a 4 × 4 matrix, called the state matrix, whose elements correspond to the sixteen bytes of a 128-bit data block. The next sections describe the transformations used during encryption.

Fig. 1. AES encryption diagram.

A. SubBytes

This transformation corresponds to the substitution layer of the algorithm and is the only nonlinear operation in the AES standard. It consists of the application of a transformation
(called S-BOX) to each element of the state matrix. Mathematically, the S-BOX is an inversion in the Galois finite field $GF(2^8)$, using the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$, followed by an affine mapping. This mapping increases the complexity of the transformation and avoids fixed and opposite fixed points, i.e., no byte is mapped to itself or to its bitwise complement.

The S-BOX transformation can be implemented by arithmetic operations in finite fields [6] [7], or by lookup tables (LUTs).

B. ShiftRows

This transformation consists of a shift of the state matrix rows. The first row is not shifted, the second row is shifted circularly one byte to the left, the third is shifted circularly two bytes to the left, and the last one is shifted circularly three bytes to the left.

C. MixColumns

This transformation is a linear operation that is applied independently to each column of the state matrix. Mathematically, it corresponds to a multiplication of polynomials over the finite field $GF(2^8)$. Each column is considered as a polynomial with coefficients in $GF(2^8)$ and is multiplied, modulo the polynomial $x^4 + 1$, by a constant polynomial defined in the standard as $03x^3 + 01x^2 + 01x + 02$. Equation 1 shows this operation for one column. The prime ($'$) denotes a state matrix element after the transformation is applied.

\[
\begin{bmatrix} S'_{0,j} \\ S'_{1,j} \\ S'_{2,j} \\ S'_{3,j} \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\times
\begin{bmatrix} S_{0,j} \\ S_{1,j} \\ S_{2,j} \\ S_{3,j} \end{bmatrix}
\tag{1}
\]

D. Key Expansion

A recursive algorithm is used to obtain the sub-keys needed during encryption. This algorithm depends on the secret key size. For a 128-bit secret key, the first sub-key is identical to the secret key. Then, every remaining sub-key word is computed by the algorithm shown in Eq. 2. Function g() is nonlinear and consists of a circular shift followed by a substitution, using the same S-BOX as in the encryption process, and an XOR operation between the most significant byte and an 8-bit constant, called RC, whose value is different for each iteration.

E. Operation modes

The operation modes define how a plain text that is longer than the block size is sent to the cipher. The most used operation modes are detailed in recommendations issued by NIST [8]. The basic modes are Electronic Codebook (ECB), Cipher Block Chaining (CBC), Cipher Feedback (CFB), Output Feedback (OFB) and Counter (CTR). The architectures presented in this work are designed for ECB mode, in which the plain text is divided into 128-bit segments that are encrypted independently. This operation mode is weak from a cryptographic point of view, because it preserves the statistical properties of the plain text. However, it is implemented in this work because it is the basis for the other operation modes and, as it is not a feedback mode, it admits the encryption of multiple data blocks simultaneously (parallelism).

III. DESIGN

The architectures developed in this work have a modular structure that consists of three blocks: key expansion, encryption and control. The expansion block generates the round keys from the secret key, following the expansion algorithm described in the standard. The control block, on the other hand, consists of an FSM that manages the communication between the expansion and encryption blocks, avoiding errors and ensuring that data encryption starts only once the key expansion is completed. The control block also signals when the module can receive a new plain text block and when the output data is valid. The three architectures presented are called basic, pipeline and compact. The difference between them lies in the implementation of their blocks, mainly the encryption one.

A. Basic architecture

All the internal buses in this architecture are 128 bits wide, and one round is executed per clock cycle. Fig. 2 shows a diagram in which it can be seen that the encryption block consists of the hardware implementation of one general round, whose output is fed back to the input through a register. The general round consists of sixteen instances of the LUT that implements the SubBytes transformation (one for each byte), four instances of the MixColumns transformation (one for each word), and sixteen 8-bit XORs that implement AddRoundKey. The ShiftRows transformation is carried out by appropriately routing the SubBytes output to the MixColumns input. The multiplexer is used to differentiate between a general round and a final round.
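As an illustration of the iterative structure of the basic architecture, the following is a minimal software sketch of AES-128 encryption in which one round function is reused for every iteration and a condition plays the role of the multiplexer that skips MixColumns in the final round. It is a behavioral model under stated assumptions, not the hardware description used in this work: the function names (`gf_mul`, `build_sbox`, `expand_key`, `shift_rows`, `mix_columns`, `encrypt_block`) are hypothetical, the S-BOX is computed from the field inversion and affine mapping instead of being stored as a LUT, and the state is held as a flat list indexed as row + 4 × column.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """S-BOX: inversion in GF(2^8) (0 stays 0) followed by the affine mapping."""
    sbox = []
    for x in range(256):
        # Brute-force search for the multiplicative inverse (illustrative only).
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):  # XOR with the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """AES-128 key expansion: returns 11 round keys of 16 bytes each."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rc = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:  # function g(): circular shift, S-BOX, XOR with RC
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]
            temp[0] ^= rc
            rc = gf_mul(rc, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def shift_rows(state):
    """Rotate row r by r bytes to the left (state index is row + 4 * column)."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    """Multiply each column by the constant matrix of Equation 1."""
    out = []
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(col[r], 2) ^ gf_mul(col[(r + 1) % 4], 3)
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return out


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key (Nr = 10)."""
    round_keys = expand_key(key)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]  # initial round
    for rnd in range(1, 11):
        state = shift_rows([SBOX[b] for b in state])
        if rnd < 10:  # the final round skips MixColumns
            state = mix_columns(state)
        state = [s ^ k for s, k in zip(state, round_keys[rnd])]
    return bytes(state)


if __name__ == "__main__":
    key = bytes(range(16))
    block = bytes.fromhex("00112233445566778899aabbccddeeff")
    print(encrypt_block(block, key).hex())
```

With the example vector of FIPS-197 Appendix C.1 used above, the printed value should be `69c4e0d86a7b0430d8cdb78070b4c55a`. Each pass through the loop corresponds to one clock cycle of the basic architecture, which is why a hardware version needs only one instance of the general round.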
The main goal of the pipeline architecture is to obtain a higher throughput than the basic module. This is accomplished by carrying out multiple rounds simultaneously. The encryption block consists of nine general rounds and a final round, as shown in Fig. 4. Every general round has sixteen SubBytes and AddRoundKey instances, besides four MixColumns instances. Rounds are connected by registers, allowing ten data blocks to be processed at the same time (fully-pipelined structure). The throughput is maximum in this architecture once the pipeline is full, producing one cipher text block at the output every clock cycle. The key expansion block in this architecture is identical to the one presented for the basic architecture.

In the compact architecture, the key expansion block also has 32-bit internal buses [9]. One sub-key word is calculated per clock cycle, as shown in Fig. 6. Each calculated word is sent to the encryption block, where the round keys are stored in four memory blocks (one for each key matrix row). These blocks are accessed every time the AddRoundKey transformation is carried out.
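The word-serial key expansion described for the compact architecture can be modeled in software as a generator that produces one 32-bit sub-key word per step, with each byte of the word written to the memory of its row. The sketch below is only a behavioral illustration for a 128-bit key. The names `key_words` and `fill_row_memories`, the four-word sliding window and the use of Python lists as memories are assumptions made for this example and are not taken from the design itself.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """S-BOX: inversion in GF(2^8) (0 stays 0) followed by the affine mapping."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def key_words(key):
    """Yield the 44 sub-key words of AES-128, one per modeled clock cycle."""
    # Assumed model: only the last four words are kept, as a sliding window.
    window = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for word in window:  # the first sub-key is the secret key itself
        yield word
    rc = 0x01
    for i in range(4, 44):
        temp = list(window[3])
        if i % 4 == 0:  # function g(): circular shift, S-BOX, XOR with RC
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]
            temp[0] ^= rc
            rc = gf_mul(rc, 2)
        new_word = [a ^ b for a, b in zip(window[0], temp)]
        window = window[1:] + [new_word]
        yield new_word


def fill_row_memories(key):
    """Store byte r of every sub-key word in memory r (one memory per row)."""
    memories = [[], [], [], []]
    for word in key_words(key):
        for row in range(4):
            memories[row].append(word[row])
    return memories


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    memories = fill_row_memories(key)
    # Reading address 4 of the four memories gives the first computed word.
    print(bytes(m[4] for m in memories).hex())
```

In this model, reading the same address from the four memories returns one full key word, which is the access needed each time AddRoundKey processes a 32-bit column.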