
A High-Throughput FPGA-Based Architecture for Advanced Encryption Standard: AES-512 Using Pre-ciphered Lookup Table

Vivek Kumar, Purnendu Shekhar Pandey and Praful Ranjan

Abstract This paper proposes an FPGA architecture for a 512-bit AES implementation using a pre-ciphered lookup table approach. The hardware realization uses a 512-bit message block and a 512-bit key. The architecture is designed to increase throughput for applications where session keys are used for communication. It exploits the fact that a session key does not change for the entire session, which lasts a substantial duration; therefore, a pre-ciphered lookup table can be used to enhance the encryption throughput. The design is suitable for applications where communication is performed in sessions and the key does not change frequently, such as HTTP and Telnet remote login sessions in the application layer. The FPGA architecture is developed in Verilog HDL and synthesized for a Virtex-7 device, where it achieves about 2.9 times (290.71% of) the throughput of the previous 512-bit implementation.


Keywords Session key · Advanced encryption standard · Crypto-accelerator · Symmetric key encryption

V. Kumar (✉)
Department of Computer Science and Engineering, THDC Institute of Hydropower
Engineering and Technology, Bhagirathipuram, Tehri, Uttarakhand, India
e-mail: vivek9837@gmail.com
P.S. Pandey · P. Ranjan
Department of Electronics and Communication Engineering, THDC-IHET, Bhagirathipuram,
Tehri, Uttarakhand, India
e-mail: purnendu12345@gmail.com
P. Ranjan
e-mail: prf98354@rediffmail.com

© Springer Nature Singapore Pte Ltd. 2018


R. Singh et al. (eds.), Intelligent Communication, Control and Devices,
Advances in Intelligent Systems and Computing 624,
https://doi.org/10.1007/978-981-10-5903-2_5

1 Introduction

Encryption of digital information is the most widely used technique for securing data communicated over insecure channels. Data encryption falls into two main categories: symmetric key encryption, in which the communicating parties use a shared key, and asymmetric key encryption, in which a separate pair of public and private keys is used. Asymmetric key encryption is widely employed for digital signatures, where the data is short, but its mathematical complexity hinders its use for encrypting large amounts of data. Large data is therefore encrypted with symmetric key algorithms, one of the best known of which is the Advanced Encryption Standard (AES). AES [1] was published by NIST in 2001 as a replacement for its predecessor, the DES algorithm [2]. AES has since become the most commonly used symmetric encryption algorithm because of its simplicity, and it has consequently attracted many crypto-accelerator architectures aimed at enhancing throughput.
AES-128 and AES-256 [3, 4] are the commonly used variants of AES, with key sizes of 128 and 256 bits, respectively. This paper proposes an architecture for a more secure version, AES-512, that uses a pre-ciphered lookup table (LUT) and suits digital communication in which information is exchanged in sessions and the symmetric key remains valid for a substantial duration. The LUT enhances throughput at the cost of an acceptable increase in area, which is due to the 512-bit size of both the key and the message block. For a particular key, a given plain character always maps to the same ciphertext. Therefore, if all known characters are encrypted beforehand and stored in an LUT together with their ciphertexts, a message can be encrypted simply by replacing each plain character with its corresponding stored ciphertext.
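The substitution idea can be illustrated in software. The sketch below is a minimal Python model, not the authors' design: because AES-512 is not available in the Python standard library, it uses HMAC-SHA-512 (from the standard `hmac` and `hashlib` modules) as a hypothetical stand-in for a keyed, deterministic 512-bit mapping. HMAC is not a block cipher and cannot be decrypted; it only reproduces the property the text relies on, namely that under a fixed key a given character always maps to the same 512-bit output. The names `stand_in_cipher`, `build_lut`, and `encrypt_with_lut` are illustrative.

```python
import hashlib
import hmac


def stand_in_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES-512 (NOT AES): HMAC-SHA-512 is keyed and
    deterministic and yields 64 bytes (512 bits), but it is not invertible."""
    return hmac.new(key, block, hashlib.sha512).digest()


def build_lut(key: bytes) -> dict[int, bytes]:
    """Phase 1: pre-cipher all 128 standard ASCII codes once for this key."""
    return {code: stand_in_cipher(key, bytes([code])) for code in range(128)}


def encrypt_with_lut(lut: dict[int, bytes], message: str) -> list[bytes]:
    """Phase 2: replace each plain character with its stored 512-bit ciphertext."""
    return [lut[ord(ch)] for ch in message]


session_key = bytes(64)  # placeholder 512-bit key (all zeros), illustration only
lut = build_lut(session_key)
blocks = encrypt_with_lut(lut, "hello")
print(len(lut), len(blocks), blocks[2] == blocks[3])  # prints: 128 5 True
```

The two `l` characters yield identical ciphertext blocks, which is exactly the determinism that lets a table replace per-character encryption once the key is fixed.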
The proposed AES-512 architecture performs encryption in two phases. In Phase 1, all characters of the standard ASCII character set are ciphered and stored in an LUT. In Phase 2, the input plaintext is encrypted by substituting each character with its corresponding ciphertext. Phase 1 uses four major byte-oriented transformations. The first is AddRoundKey, in which a 512-bit key is XORed with the 512-bit input matrix. The second is SubBytes, in which each byte is substituted with a predefined byte using a substitution box (S-box). The third is ShiftRows, in which each row of the input matrix is cyclically shifted by a different offset. The fourth is MixColumns, in which each column of the input matrix is multiplied by a constant matrix. The input plaintext passes through a number of rounds, each containing the transformations above; the proposed 512-bit architecture uses a total of 16 rounds.
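The round sequence can be written out programmatically. This Python sketch assumes, as an illustration rather than a statement of the authors' exact sequencing, that AES-512 follows the standard AES pattern: an initial AddRoundKey, then 16 rounds of SubBytes, ShiftRows, MixColumns, and AddRoundKey, with MixColumns omitted in the final round. `round_schedule` is a hypothetical helper.

```python
NUM_ROUNDS = 16  # AES-512 round count stated in the text


def round_schedule(num_rounds: int = NUM_ROUNDS) -> list[tuple[int, list[str]]]:
    """Return (round index, transformations) pairs, assuming the standard AES
    pattern: an initial AddRoundKey, then num_rounds rounds, the last of which
    omits MixColumns."""
    schedule = [(0, ["AddRoundKey"])]
    for r in range(1, num_rounds + 1):
        ops = ["SubBytes", "ShiftRows"]
        if r < num_rounds:
            ops.append("MixColumns")  # every round except the last
        ops.append("AddRoundKey")
        schedule.append((r, ops))
    return schedule


for index, ops in round_schedule():
    print(f"round {index:2d}: {' -> '.join(ops)}")
```

Under this assumption the key expansion has to supply 17 round keys, one for each AddRoundKey step.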

2 Related Work

Several hardware implementations of AES have been proposed, the majority of which use 128- or 256-bit keys. The emphasis has always been on achieving high throughput with a small area. The parallel hardware implementation of 128-bit AES proposed in [5] processes four 32-bit data blocks in parallel and outperformed previous 128-bit architectures in both throughput and area usage. In [6], the authors implemented several cryptographic algorithms, such as AES, Camellia, and SMS4, on two FPGA devices with different LUT sizes and analyzed their performance and area. Because pipelining is an effective way to attain high throughput, many pipelined AES architectures have been proposed over the years. A reconfigurable pipelined architecture with parallel connections [7] excelled in processing speed and throughput. Another pipelined hardware architecture for AES over GF((2^4)^2) [8] was implemented by partitioning the ten rounds of AES-128 into repeated AES modules. The AES-512 architecture proposed in [9], a more secure variant of AES, achieved high throughput with a tolerable area increase and, with a key search space of 2^512, was also more resilient to cryptanalysis.

The proposed AES-512 architecture targets high encryption throughput. By using pre-ciphered LUTs it achieves, as anticipated, a faster processing speed with a tolerable area increase.

3 Hardware Implementation of AES-512

The AES-512 module assumes that network entities communicate using the standard ASCII character set. In secure communication, the communicating parties share a session key when the information exchange starts, and for a limited period of time (changing the session key frequently is recommended) all data sent or received is encrypted with this key. As long as the encryption key does not change, a given character always encrypts to the same ciphertext. The architecture exploits this fact: all characters of the standard ASCII set are encrypted with the session key and stored in a pre-ciphered lookup table.
This task is carried out in Phase 1 (Fig. 1). The input plaintext message is encrypted in Phase 2. Because the plaintext message is an array of characters, each of which has already been ciphered in Phase 1, only a substitution operation is now required: each character in the message is replaced by its corresponding ciphertext from the pre-ciphered lookup table. Substitution is a low-cost operation, so the input message is encrypted quickly and, as anticipated, a significant throughput gain is achieved.

Fig. 1 Pre-ciphered lookup table encryption approach

Fig. 2 Architecture layout of the AES-512-bit LUT-based encryption

However, the technique remains valid only as long as the key does. After a key change, the pre-ciphered LUT becomes obsolete, because the plain characters now encrypt to entirely different ciphertexts under the new key; using the old LUT would produce incorrect results, so a new LUT must be generated. The design may therefore switch between the normal course of message encryption and substitution-based encryption depending on the validity of the session key. If the key is valid only for a short duration, the normal course of encryption should be adopted, but for a key with a substantial validity period, LUT-based substitution encryption boosts the throughput achieved.
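The switching between substitution-based and normal encryption can be sketched as a small Python class. Everything here is hypothetical: HMAC-SHA-512 again stands in for AES-512, and `min_chars_for_lut` is an assumed threshold for deciding whether a key will stay valid long enough to justify building the 128-entry table. The sketch shows two properties from the text: installing a new key discards the old LUT, and both paths produce the same ciphertext for the same key.

```python
import hashlib
import hmac
from typing import Optional


def stand_in_cipher(key: bytes, block: bytes) -> bytes:
    # Hypothetical keyed 512-bit mapping standing in for AES-512 (not AES).
    return hmac.new(key, block, hashlib.sha512).digest()


class SessionEncryptor:
    """Switches between LUT substitution and per-character encryption."""

    def __init__(self, min_chars_for_lut: int = 128) -> None:
        self.min_chars_for_lut = min_chars_for_lut  # assumed break-even threshold
        self.key: Optional[bytes] = None
        self.lut: Optional[dict[int, bytes]] = None
        self.lut_builds = 0                         # how many tables were generated

    def set_key(self, key: bytes, expected_chars: int) -> None:
        """Install a new session key; any existing LUT is discarded as obsolete."""
        self.key = key
        self.lut = None
        if expected_chars >= self.min_chars_for_lut:
            # Key expected to stay valid long enough: pre-cipher all 128 codes.
            self.lut = {c: stand_in_cipher(key, bytes([c])) for c in range(128)}
            self.lut_builds += 1

    def encrypt(self, message: str) -> list[bytes]:
        if self.key is None:
            raise ValueError("no session key installed")
        if any(ord(ch) > 127 for ch in message):
            raise ValueError("only standard ASCII characters are supported")
        if self.lut is not None:
            return [self.lut[ord(ch)] for ch in message]  # substitution path
        # Short-lived key: normal course of encryption, one character at a time.
        return [stand_in_cipher(self.key, bytes([ord(ch)])) for ch in message]


session = SessionEncryptor()
session.set_key(b"k1" * 32, expected_chars=10_000)  # long session: LUT is built
print(session.lut is not None, len(session.encrypt("hi")))
```

Keeping both paths behind one `encrypt` call means the caller does not need to know which mode is active; only the key-installation step decides.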
The architectural layout of the proposed AES-512 encryption technique is depicted in Fig. 2. The architecture comprises three major communicating functional blocks: the character encryption generator (CEG) module, the encrypted character storage (ECS) module, and the message encryption (ME) module. A write control logic (WCL) block between the CEG and ECS modules ensures synchronous write operations from the CEG to the ECS module. The functionality of each module is detailed in the following subsections.

3.1 Character Encryption Generator (CEG) Module

The CEG module is the functional block that encrypts the ASCII character set using the AES-512 algorithm. The standard ASCII character set contains 128 characters, so 7 bits suffice to represent them all; each character is supplied to the module as an 8-bit byte. The first input to the module is an 8-bit ASCII character. The other input is a 512-bit key, the session key shared between the communicating entities; it is presumed that this key has been exchanged using a standard key-sharing protocol. Because the message block and the key block must both be 512 bits in size, the 8-bit character is padded with 504 filler bits to yield a 512-bit message block. The padded message block and the key are then fed to the algorithm, and the resulting cipher block is transmitted, together with the original 8-bit character, to the encrypted character storage (ECS) module. The procedure is repeated for all 128 characters of the ASCII set.
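The padding step can be sketched as follows. The text specifies only that the 8-bit character is padded with 504 bits of filler; this Python sketch assumes, hypothetically, that the character occupies the first byte and that the filler is zero bytes. `pad_character` and `ceg_inputs` are illustrative names.

```python
BLOCK_BYTES = 64  # 512-bit message block
FILLER = 0x00     # assumed filler value; the text only calls it "bogus" data


def pad_character(code: int) -> bytes:
    """Place one ASCII code in a 512-bit block followed by 504 filler bits."""
    if not 0 <= code < 128:
        raise ValueError("standard ASCII codes are 0..127")
    return bytes([code]) + bytes([FILLER]) * (BLOCK_BYTES - 1)


def ceg_inputs() -> list[tuple[int, bytes]]:
    """All (character, padded block) pairs the CEG module encrypts, in order."""
    return [(code, pad_character(code)) for code in range(128)]


block = pad_character(ord("A"))
print(len(block) * 8, block[0], len(ceg_inputs()))  # prints: 512 65 128
```

Because each character is encrypted only once per key and later looked up by its 8-bit code, the filler choice does not affect the substitution phase.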

3.2 Encrypted Character Storage (ECS) Module

The ECS module is an EPROM that stores the ASCII characters together with their corresponding ciphertexts. It takes each plain character and its encrypted text and stores them in the form of a 2-D array indexed by the 8-bit character representation. A write control logic between the CEG and ECS modules ensures synchronous write operations. Whenever the CEG module generates a 512-bit encrypted character, it also generates an enc_done signal, which is fed to a down counter D1. D1 counts the characters that have been encrypted and transmitted to the ECS module. Once all characters have been encrypted, D1 asserts a low count-done signal that disables further write operations on the ECS module. Once it holds all 128 encrypted ASCII characters, the ECS module becomes available to the message encryption (ME) module, which encrypts the actual input message.
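The D1 handshake can be modeled behaviorally in Python. This is a sketch, not the Verilog design: the class name `EcsWriteControl`, the initial count of 128, and treating the low count-done signal as a write-disable flag are assumptions; only the signal names enc_done and count-done come from the text.

```python
class EcsWriteControl:
    """Behavioral model of the write control logic (WCL) with down counter D1."""

    def __init__(self, num_chars: int = 128) -> None:
        self.d1 = num_chars              # D1: characters still to be written
        self.write_enable = True         # cleared when count-done goes low
        self.ecs: dict[int, bytes] = {}  # ECS table indexed by 8-bit character

    def on_enc_done(self, char: int, cipher_block: bytes) -> bool:
        """Handle one enc_done pulse from the CEG; return True if written."""
        if not self.write_enable:
            return False                 # ECS is full; further writes are blocked
        self.ecs[char] = cipher_block
        self.d1 -= 1
        if self.d1 == 0:
            self.write_enable = False    # low count-done: disable writes
        return True


wcl = EcsWriteControl()
for code in range(128):
    wcl.on_enc_done(code, bytes([code]) * 64)
print(wcl.d1, wcl.write_enable, len(wcl.ecs))
```

Once D1 reaches zero, a stray enc_done pulse cannot overwrite a stored entry, which is what keeps the table stable while the ME module reads it.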

3.3 Message Encryption (ME) Module

The actual input message is encrypted by the ME module. The module takes a 512-bit message as input and substitutes each of its characters with the corresponding ciphertext from the pre-ciphered lookup table. The 8-bit-selector sub-module splits the input message into 8-bit character chunks. Each selected 8-bit chunk is transmitted to the ECS module, which looks up the matching character. Once the match is found, the corresponding ciphertext is transmitted back to the ME module together with a load-done signal. The received 512-bit cipher block is produced at the output port, and the load-done signal is fed to the down counter D2, which counts the input message fragments that have been processed. D2 sends a high frag-count signal to the bit-selector sub-module, which selects the next 8 bits, and the process continues. A 512-bit message contains 64 chunks of 8-bit characters. D2 counts these 64 chunks, and when all of them have been processed, it sends a low frag-count signal to the bit-selector sub-module, which then resets to accept a new 512-bit message.

Fig. 3 Architectural design of character encryption generator
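The fragment loop of the ME module can be modeled in Python. In this hypothetical sketch, `ecs` is a plain dictionary standing in for the ECS table (character code to 512-bit cipher block), and the variables `d2` and `position` mirror the down counter D2 and the 8-bit selector; the handshake signals are reduced to loop steps. In this model each 8-bit character yields a 512-bit cipher block, so one 512-bit message produces 64 cipher blocks, 32,768 bits in total.

```python
FRAGMENTS = 64  # a 512-bit message holds 64 8-bit characters


def me_encrypt_block(message_block: bytes, ecs: dict[int, bytes]) -> list[bytes]:
    """Behavioral model of the ME module for one 512-bit message block."""
    if len(message_block) != FRAGMENTS:
        raise ValueError("expected a 512-bit (64-byte) message block")
    cipher_blocks = []
    d2 = FRAGMENTS    # down counter D2: fragments still to process
    position = 0      # 8-bit selector position
    while d2 > 0:     # high frag-count: select the next 8 bits
        char = message_block[position]
        cipher_blocks.append(ecs[char])  # ECS returns the stored 512-bit cipher
        position += 1
        d2 -= 1       # load-done decrements D2
    # low frag-count: the selector would now reset for the next 512-bit message
    return cipher_blocks


toy_ecs = {code: bytes([code]) * 64 for code in range(128)}  # toy table, not real ciphertext
out = me_encrypt_block(b"A" * 32 + b"B" * 32, toy_ecs)
print(len(out))
```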
Figure 3 depicts the detailed architecture of the AES-512 character encryption generator (CEG) module. The architecture comprises a KeyGeneration module and four transformation modules, namely AddKey, SubstituteByte, ShiftRows, and MixColumn. An external 512-bit key is fed to the KeyGeneration module, which expands it. The key, together with the input message, is given as input to the AddKey module, and after the subsequent transformations, the output data is fed back to the AddKey module. A control logic selects the data fed to the AddKey module: a counter, initially set to 0, selects the input plaintext message through mux m1 for the first round of AES, and for subsequent rounds, the transformed data from MixColumn is selected as the input to AddKey.

Table 1 AES-512-bit implementation results

Device      Freq (MHz)   Area (CLBs)   Throughput (Mbps)
Virtex-7    926.6        5813          3381
Virtex-6    695.3        5813          2540

Table 2 AES-512-bit implementation results

Design              Area (CLBs)   Throughput (Mbps)
AES-512 (proposed)  5813          3381
[9]                 6701          1163
[10]                3528          294
[11]                5673          353

The last round of AES has no MixColumn transformation; instead, ShiftRows is followed directly by the final AddRoundKey transformation. Mux m2 therefore selects the data from the MixColumn module in every round except the last, where the output of the ShiftRows transformation is selected. Once the counter logic has completed the 16 rounds, a 512-bit ciphertext is produced at the gating circuit.
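The selection logic for muxes m1 and m2 reduces to a function of the round counter. This Python sketch assumes the counter runs from 0 (first round) to 15 (last round) and that the gating circuit releases the ciphertext once the counter reaches 16; `mux_selects`, `ciphertext_ready`, and the returned labels are illustrative.

```python
NUM_ROUNDS = 16


def mux_selects(counter: int, num_rounds: int = NUM_ROUNDS) -> dict[str, str]:
    """Data selected by muxes m1 and m2 for a given round-counter value."""
    if not 0 <= counter < num_rounds:
        raise ValueError("round counter out of range")
    # m1 feeds AddKey: the plaintext in the first round, fed-back data afterwards.
    m1 = "plaintext" if counter == 0 else "feedback"
    # m2 picks the MixColumn output, except in the last round (no MixColumn).
    m2 = "ShiftRows" if counter == num_rounds - 1 else "MixColumn"
    return {"m1": m1, "m2": m2}


def ciphertext_ready(counter: int, num_rounds: int = NUM_ROUNDS) -> bool:
    """Assumed gating condition: output is valid once all rounds are done."""
    return counter >= num_rounds


for c in (0, 1, 14, 15):
    print(c, mux_selects(c))
```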

4 Results and Evaluation

The proposed AES-512 encryption technique was described in Verilog for verification and simulation. Devices from both the Virtex-6 and Virtex-7 families were used as targets, and Xilinx ISE 14.1 was used to synthesize the Verilog code. The Virtex-7 family is a newer generation of FPGA devices offering higher performance and bandwidth. Table 1 shows the synthesis results, including the device family, area (in configurable logic blocks, CLBs), operating frequency, and throughput obtained. Table 2 compares the implemented design with previous implementations; [9] is the previous 512-bit design. On Virtex-7, the proposed technique reaches 3381 Mbps, about 2.9 times (290.71% of) the 1163 Mbps of [9]. Its area of 5813 CLBs is 64.76% higher than the 3528 CLBs of [10], which is attributable to the larger key and block size, while remaining below the 6701 CLBs reported for [9]. The increased key size also makes the algorithm more resilient to brute-force attacks.
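The comparison in Table 2 can be checked with a few lines of Python. The figures are copied from the table and the script only computes ratios; it shows that 290.71% is the ratio of the two throughputs (3381/1163), i.e. about 2.9 times, and that the 64.76% area figure corresponds (up to rounding) to the comparison with [10].

```python
# Area (CLBs) and throughput (Mbps) copied from Table 2.
designs = {
    "AES-512": (5813, 3381),
    "[9]": (6701, 1163),
    "[10]": (3528, 294),
    "[11]": (5673, 353),
}

own_area, own_tput = designs["AES-512"]
for name, (area, tput) in designs.items():
    if name == "AES-512":
        continue
    # Ratios expressed as a percentage of the compared design's figure.
    print(f"vs {name}: throughput {own_tput / tput * 100:.2f}%, "
          f"area {own_area / area * 100:.2f}%")
```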

5 Conclusion

High-throughput crypto-architectures are required in real-time applications such as multimedia and scientific research. This paper proposed and implemented a high-throughput, LUT-based AES-512 architecture in which a pre-ciphered lookup table boosts the throughput. The architecture is most viable for applications that communicate in sessions, such as Telnet- and HTTP-based applications.

References

1. Daemen J, Rijmen V. The Rijndael Block Cipher AES Proposal. NIST, Version. 1999 Mar; 2.
2. Parikh C, Patel P. Performance evaluation of AES algorithm on various development
platforms. In 2007 IEEE International Symposium on Consumer Electronics 2007 Jun 20
(pp. 1–6). IEEE.
3. Liberatori M, Otero F, Bonadero JC, Castiñeira J. AES-128 cipher. High speed, low cost FPGA
implementation. In 2007 3rd Southern Conference on Programmable Logic 2007 Feb
(pp. 195–198). IEEE.
4. Orlic VD, Peric M, Banjac Z, Milicevic S. Some aspects of practical implementation of AES
256 crypto algorithm. In Telecommunications Forum (TELFOR), 2012 20th 2012 Nov 20
(pp. 584–587). IEEE.
5. Chang CJ, Huang CW, Chang KH, Chen YC, Hsieh CC. High throughput 32-bit AES
implementation in FPGA. In Circuits and Systems, 2008. APCCAS 2008. IEEE Asia Pacific
Conference on 2008 Nov 30 (pp. 1806–1809). IEEE.
6. Gao X, Lu E, Li L, Lang K. LUT-based FPGA Implementation of SMS4/AES/Camellia. In
Embedded Computing, 2008. SEC’08. Fifth IEEE International Symposium on 2008 Oct 6
(pp. 73–76). IEEE.
7. Guo Z, Li G, Liu Y. Dynamic reconfigurable implementations of AES algorithm based on
pipeline and parallel structure. In Computer and Automation Engineering (ICCAE), 2010 The
2nd International Conference on 2010 Feb 26 (Vol. 3, pp. 257–260). IEEE.
8. Abdel-hafeez S, Sawalmeh A, Bataineh S. High performance AES design using pipelining
structure over GF((2^4)^2). In Signal Processing and Communications, 2007. ICSPC 2007.
IEEE International Conference on 2007 Nov 24 (pp. 716–719). IEEE.
9. Moh’d A, Jararweh Y, Tawalbeh LA. AES-512: 512-bit Advanced Encryption Standard
algorithm design and evaluation. In Information Assurance and Security (IAS), 2011 7th
International Conference on 2011 Dec 5 (pp. 292–297). IEEE.
10. Wolkerstorfer J, Oswald E, Lamberger M. An ASIC implementation of the AES SBoxes. In
Cryptographers' Track at the RSA Conference 2002 Feb 18 (pp. 67–78). Springer Berlin
Heidelberg.
11. Elbirt AJ, Yip W, Chetwynd B, Paar C. An FPGA-based performance evaluation of the AES
block cipher candidate algorithm finalists. IEEE Transactions on Very Large Scale Integration
(VLSI) Systems. 2001 Aug; 9(4):545–57.
