You are on page 1of 51

11

CHAPTER-I

INTRODUCTION

1.1 Advanced Encryption Standard

The Advanced Encryption Standard (AES), a symmetric key block which is


published by the National Institute of Standards and Technology (NIST) in December
2001. It is a non-Feistel block cipher that encrypts and decrypts a fixed data block of 128-
bits. There are three different key lengths. The encryption/decryption consists of 10 rounds
of processing for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys.

Cryptography is associated with the process of converting ordinary plain text into
unintelligible text and vice versa. There are three types of cryptographic technique namely
- Symmetric key cryptography, Hash functions and public key cryptography. Symmetric
key algorithms namely Advanced Encryption Standard (AES), Data Encryption Standard
use the same key for encryption and decryption. It is much faster, easy to implement and
requires less processing power.

Cryptography is the science of secret, or hidden writing

It has two main Components:

 Encryption
Practice of hiding messages so that they cannot be read by anyone other than the
intended recipient
 Authentication & Integrity
Ensuring that users of data/resources are the persons they claim to be and that a
message has not been surreptitiously altered
Requirements of secure communication

 Secrecy
Only intended receiver understands the message
 Authentication
Sender and receiver need to confirm each other’s identity
 Message Integrity
12

Ensure that their communication has not been altered, either maliciously or by
accident during transmission
Cipher: Cipher is a method for encrypting messages.
13

Fig.1.1 Symmetric Key Encryption

Encryption algorithms are standardized & published

The key which is an input to the algorithm is secret

– Key is a string of numbers or characters

– If same key is used for encryption & decryption the algorithm is called symmetric

– If different keys are used for encryption & decryption the algorithm is called
asymmetric.

Uses a pair of keys for encryption

– Public key for encryption

– Private key for decryption

Messages encoded using public key can only be decoded by the private key

– Secret transmission of key for decryption is not required

– Every entity can generate a key pair and release its public key

Symmetric algorithms:

Algorithms in which the key for encryption and decryption are the same are Symmetric

Example: Caesar Cipher

1.2 Types of Ciphers:

Block Ciphers

o Encrypt data one block at a time (typically 64 bits, or 128 bits)

o Used for a single message

Stream Ciphers

o Encrypt data one bit or one byte at a time

o Used if data is a constant stream of information


14

In modern-day block ciphers, the role of substitution-boxes is to transform the plaintext


data nonlinearly to generate cipher-text data with sufficient confusion. It has been well-
confirmed that the robustness and security of such block ciphers heavily based on the
cryptographic strength of the underlying substitution-boxes. Reason being, they are the
only components that are held responsible to bring required nonlinearity and complexity
into the security system which can frustrate the attackers. Accordingly, a number of
different concepts have been explored to construct strong S-boxes. To move forward with
the same aim, a novel simple modular approach, the very first time, is investigated to
construct nonlinear S-box in this paper. The new modular approach consists of three
operations such as new transformation, modular inverses, and permutation. A number of
highly nonlinear S-boxes can be easily constructed with slight changes in the novel
transformation parameters. An example S-box is presented whose critical performance
assessment against some benchmarking criterions such as high nonlinearity, absence of
fixed points, fulfillment of SAC and BIC properties, low differential uniformity and linear
approximation probability and comparison with recent S-boxes demonstrate its upright
cryptographic potentiality. In addition, an image encryption algorithm is also proposed
wherein the generated S-box is applied to perform the pixels shuffling and substitution for
strong statistical and differential encryption performance.

Data and information communication has become very important ingredient of today’s
technological life and considered as significant assets of an individual or organization. If
the confidentiality of information is compromised, then the information can be used for
harmful purposes. Current innovations in information technology and their prolific
applications in our life have caused in a gigantic growth in the size of the data being
transmitted online. The private information being very sensitive assets require protection
from attackers. Therefore, prior to its transmission, data demand its protection and needs
methods for its transformation into a meaningless form for the invaders. Cryptographic
algorithms are the mathematical methods and techniques that assist in the protection of
data. Stream ciphers transform the data in a bit-by-bit or byte-by-byte manner. Whereas,
the block ciphers transform data in chunks which comprise large number of bits or bytes at
a time. In modern symmetric encryption, block ciphers are considered as one of the most
effective tools for data protection. Data Encryption Standard (DES), Blowfish, Advanced
Encryption Standard (AES), RC5, etc. are examples of contemporary block ciphers.
Precise implementations of block ciphers are easy and are more general in nature than the
15

stream ciphers. One category of prevalent block ciphers is known as the SP network-based
block ciphers. These block ciphers use two major operations of substitution and
permutation for the transformation of data into a perplexing form.

Fig.1.2 Architecture of 256 AES Algorithm

A substitution operation substitutes a byte/block with another byte/block using a


substitution table known as a substitution box or S-box. On the other hand, a permutation
process shuffles the input bits or bytes in some linear fashion. A substitution-box is a
pivotal constituent of modern-day block ciphers that helps in the generation of a muddled
cipher text for the specified plaintext. Through the incorporation of S-box, a nonlinear
mapping among the input and output data is established to create confusion. The more
confusion an S-box can create in the output data, the more secure a block cipher is. As a
result, the provision of security by a block cipher employing one or more S-boxes directly
depends on how much stronger S-boxes are. Block ciphers consist of many components in
addition to one or more S-boxes. Contrary to other components, an S-box is the alone
nonlinear component of block ciphers that supports the enhancement of data protection.
Generally, a block cipher uses either a static S-box or one or more dynamic S-boxes. A
static S-box is fixed for every incoming data and secret key which is used repeatedly in the
block cipher. A block cipher based on a static S-box employs that S-box in all its rounds. A
static S-box allows an attacker to inspect its characteristics, discover its fragilities, and
eventually find the chance of getting plaintext from the respective cipher-text. As an
example, static S-boxes employed in Data Encryption Standard (DES) were an easy target
16

for the attackers. Consequently, to overcome the weaknesses due to static S-boxes, many
cryptographers have explored innovative techniques to design dynamic S-boxes. Dynamic
S-boxes are generated using cipher key and provide a way to augment the cryptographic
power of a block cipher. Construction and usage of the key-dependent and dynamic S-
boxes in a cipher enhance its cryptographic power. Blowfish cipher employs such dynamic
S-boxes in its working.

1.3 ENCRYPTION PROTOCOLS:

Pretty Good Privacy (PGP)

 Used to encrypt e-mail using session key encryption

 Combines RSA, Triple DES, and other algorithms

Secure/Multipurpose Internet Mail Extension (S/MIME)

 Newer algorithm for securing e-mail

 Backed by Microsoft, RSA, AOL

Secure Socket Layer (SSL) and Transport Layer Socket (TLS)

 Used for securing TCP/IP Traffic

 Mainly designed for web use

 Can be used for any kind of internet traffic


17

CHAPTER-II

LITERATURE REVIEW

M. Rajeswara Rao, Dr.R.K.Sharma, SVE Department, NIT Kurushetra


“FPGA Implementation of combined S box and Inv S box of AES” 2017 4th
International conference on signal processing and integrated networks (SPIN).

An implementation of a combinational memory-less S-Box and inv S-Box (combinedly)


for ByteSub and InvByteSub transformations of AES on a same hardware. This is a part of
the combined architecture of AES in which both encryption and decryption can be
performed with an enable pin. Previously LUTs are used to implement the S-Box and Inv
S-Box of AES separately, which causes large amount of memory and area. In this paper,
the proposed architecture is implementing using composite field arithmetic in finite fields
GF (28) which is advantageous than LUT approach on the basis of hardware complexity.
As both S-Box and Inv S-Box are implementing on a same hardware, there is large
reduction in gate count as well as in area. The power consumption is also reduced because
of the resource sharing of multiplicative inverse module in both S-Box and Inv S-Box. The
AES is suitable in wide range of applications, such as banking, digital video recorders, web
servers, ATMs and cellular phones. Hardware implementation of AES offers more
physical security and higher speed too when compared to software implementation of
AES. The main advantage of AES over most of the cryptographic algorithms is that it
doesn’t include Feistel structure.

Summary: By comparing with the LUT based in this hardware complexity is reduced.

Nalini C. Iyer ; Deepa ; P.V. Anandmohan ; D.V. Poornaiah 2010


“Mix/InvMixColumn decomposition and resource sharing in AES”.

In this paper, compact architectures for AES Mix Column and its inverse is presented to
reduce the area cost in resulting AES implementation. In the hardware implementation of
AES with direct mapping substitute byte optimization, MixColumn/Inverse MixColumn
transformation demands the utilization of logic resources and then effects the critical path
delay and resulting throughput. The proposed MixColumn/Inverse MixColumn design
based on byte and bit-level decomposition leads to two types of architecture which
demonstrates deeper resource sharing within byte and between bytes and rearrangement of
18

output terms with respect to FPGA architecture in bit level resupply. The proposed
architectures have been investigated on a FPGA based implementation platform.
Application of the proposed architectures resulted in reduction of reconfigurable logic area
by 40% as compared to separate implementation of MixColumn and Inverse MixColumn
reduction and also path delay by 9% resply. Experimental results show that our proposed
architecture can reduce the area cost significantly and compared with other previous
implementations reported so far.

There exist many presentations of hardware implementations (ASIC and FPGA) of


Rijndael AES algorithms in literature. Static implementations based on ASICs are
inherently impossible to update or upgrade in response to new security threats. On the
other hand, Field-programmable gate array (FPGA) technology has much greater potential
for providing higher security level in response to new threats because of its capability for
dynamic reconfiguration and also time to market as compared to ASIC implementation.

Summary: The proposed Mix/Inv mix architecture can reduce the area cost significantly.

Yulin Zhang ; Xinggang Wang; 2010 “Pipelined implementation of AES encryption


based on FPGA” IEEE International Conference on Information Theory and
Information Security.

This paper presents the outer-round only pipelined architecture for a FPGA
implementation of the AES-128 encryption processor. The proposed design uses the Block
RAM storing the S-box values and exploits two kinds of Block RAM. By combining the
operations in a single round, we can reduce the critical delay. As the network transmission
speed upgrades to the gigabits per second (Gbps), the software-based implementations of
cryptographic algorithms cannot meet its needs. The hardware-based implementations
using some special optimization techniques (such as pipeline and lookup tables, etc.), can
greatly improve throughput and reduce the key generation time. Besides, the processes of
cryptographic algorithms and the key generation packaged in chip, which cannot easily be
read or changed by external attacker, so hardware-based implementations can get the
higher physical security .In recent years , many hardware based implementations had been
proposed in literature[3- 5,7-15].Some implementations use the field programmable gate
arrays (FPGA) and the others use the application specific integrated circuit(ASIC).ASIC
lacks of flexibility and has high development costs and long development cycle.
19

Summary: By combining the operations in a single round, we can reduce the critical
delay.

C. Sivakumar ; A. Velmurugan ; 2007 “High Speed VLSI Design CCMP AES Cipher
for WLAN (IEEE 802.11i)” International Conference on Signal Processing,
Communications and Networking.

The Advanced Encryption Standard (AES) algorithm has become the default choice for
various security services in numerous applications. In this paper, we propose a high speed,
non-pipelined FPGA implementation of the AES-CCMP (Counter-mode/CBCMAC
Protocol) cipher for wireless LAN using Xilinx development tools and Virtex-It Pro FPGA
circuits. IEEE 802.11i defines the AES-based cipher system, which is operated on CCMP
Mode. All the modules in this core are described by using Verilog 2001 language. The
developed AES CCMP core is aimed at providing high speed with sufficient security. The
encryption/decryption data path operates at 194/148MHz resulting in a throughput of 2.257
Gbits/sec for the encryption and 1.722 Gbits/sec for decryption. Compared to software
implementation, migrating to hardware provides higher level of security and faster
encryption speed. A comparison is provided between our design and similar existing
implementations. Each input byte of the State matrix is independently replaced by another
byte from a look-up table called S-Box. The AES S-box is a 256-entry table composed of
two transformations: First each input byte is replaced with its multiplicative inverse in GF
(28) with the element {00} being mapped onto itself, followed by an affine transformation
over GF (28). For decryption, inverse S-box is obtained by applying inverse affine
transformation followed by multiplicative inversion in G(28).

Summary: In the current wireless LAN environment, WEP-the current algorithm for
security-is not safe against attacks. We consider AES CCMP algorithms for wireless LAN
security.

P. S. Abhijith ; Mallika Srivastava ; Aparna Mishra ; Manish Goswami ; B. R.


Singh ; “High performance hardware implementation of AES using minimal
resources” 2013 International Conference on Intelligent Systems and Signal
Processing (ISSP).

Increasing need of data protection in computer networks led to the development of several
cryptographic algorithms hence sending data securely over a transmission link is critically
20

important in many applications. Hardware implementation of cryptographic algorithms are


physically secure than software implementations since outside attackers cannot modify
them. In order to achieve higher performance in today’s heavily loaded communication
networks, hardware implementation is a wise choice in terms of better speed and
reliability. This paper presents the hardware implementation of Advanced Encryption
Standard (AES) algorithm using Xilinx– virtex5 Field Programmable Gate Array (FPGA).
In order to achieve higher speed and lesser area, Sub Byte operation, Inverse Sub Byte
operation, Mix Column operation and Inverse Mix Column operations are designed as
Look Up Tables (LUTs) and Read Only Memories (ROMs).

Encryption is usually done just before sending data. To utilize the channel resources
completely encryption algorithm must have a speed at least equivalent to data transmission
speed. Achieving high throughput for encryption algorithm for a communication channel
of high data rate is a challenging task.

Summary: With the designing of all the operations as LUTs and ROMs, the proposed
architecture achieves a throughput and thereby utilizing only slices in the targeted FPGA.

N. S. Sai Srinivas; Md. Akbaruddin; 2016 “FPGA based hardware implementation of


AES Rijndael algorithm for Encryption and Decryption” International Conference
on Electrical, Electronics, and Optimization Techniques (ICEEOT).

AES algorithm or Rijndael algorithm is a network security algorithm which is most


commonly used in all types of wired and wireless digital communication networks for
secure transmission of data between two end users, especially over a public network. This
paper presents the hardware implementation of AES Rijndael Encryption and Decryption
Algorithm by using Xilinx Virtex-7 FPGA. The hardware design approach is entirely
based on pre-calculated look-up tables (LUTs) which results in less complex architecture,
thereby providing high throughput and low latency. There are basically three different
formats in AES. They are AES-128, AES-192 and AES-256. The encryption and
decryption blocks of all the three formats are efficiently designed by using Verilog-HDL
and are synthesized on Virtex-7 XC7VX690T chip (Target Device) with the help of Xilinx
ISE Design Suite-14.7 Tool. The synthesis tool was set to optimize speed, area and power.
The power analysis is made by using Xilinx XPower Analyzer. Pre-calculated LUTs are
used for the implementation of algorithmic functions, namely S-Box and Inverse S-Box
21

transformations and also for GF (28) i.e. Galois Field Multiplications involved in Mix-
Columns and Inverse Mix-Columns transformations. The proposed architecture is found to
be having good efficiency in terms of latency, throughput, speed/delay, area and power.

Summary: The LUT based design approach gives less complex architecture and saves the
processing time to a great extent by retrieving the necessary values from memory
locations.

Ashwini M. Deshpande; Mangesh S. Deshpande; Devendra N. Atazanavir; 2009


“FPGA implementation of AES encryption and decryption” International Conference
on Control, Automation, Communication and Energy Conservation

Advanced Encryption Standard (AES), a Federal Information Processing Standard (FIPS),


is an approved cryptographic algorithm that can be used to protect electronic data. The
AES can be programmed in software or built with pure hardware. However Field
Programmable Gate Arrays (FPGAs) offer a quicker and more customizable solution. This
paper presents the AES algorithm with regard to FPGA and the Very High Speed
Integrated Circuit Hardware Description language (VHDL). ModelSim SE PLUS 5.7g
software is used for simulation and optimization of the synthesizable VHDL code.
Synthesizing and implementation (i.e. Translate, Map and Place and Route) of the code is
carried out on Xilinx - Project Navigator, ISE 8.2i suite. All the transformations of both
Encryption and Decryption are simulated using an iterative design approach in order to
minimize the hardware consumption. Xilinx XC3S400 device of Spartan Family is used
for hardware evaluation. This paper proposes a method to integrate the AES encrypter and
the AES decrypter. This method can make it a very low-complexity architecture, especially
in saving the hardware resource in implementing the AES (Inv) Sub Bytes module and
(Inv) Mix columns module etc. Most designed modules can be used for both AES
encryption and decryption. Besides, the architecture can still deliver a high data rate in
both encryption/decryption operations. The proposed architecture is suited for hardware-
critical applications, such as smart card, PDA, and mobile phone, etc.

Finally, the normalized Hamming distance algorithm is used for matching retrieval. In
order to protect the speech security in the cloud, a speech encryption algorithm based on a
4D hyper chaotic system is proposed. The experimental results show that the proposed
method has good discrimination, robustness, recall and precision compared with the
22

existing methods, and it has good retrieval efficiency and retrieval accuracy for longer
speech.

Summary: This method can make it a very low-complexity architecture and low hardware
utilization.

CHAPTER-III

EXISTING METHOD

In the existing method, we are implementing the AES 128-bits key for 128-bit data with
the same s-box. 128-bit AES encryption block is implemented in 14 rounds. Each round
consists of Add Round Key, Sub Bytes, Shift Rows, Mix column. Round 0 consists of only
Add round Key operation.

Round 14 consists of Sub Bytes, Shift Rows and Add Round Key operations, which need 3
clock cycles. Rounds 1 to 13 consists of all the four operations. We do a distinct operation
in each clock cycle. Hence once the hardware has been implemented for Add Round Key,
Sub Bytes, Shift Rows, mix column, the same hardware can be used for all the 14 rounds.
None of the four operations shares the same clock cycle.

The sequence of round operation with specific sequence of 4 operations to complete the
AES encryption. AES algorithm is serial process, i.e., output of first round is the input to
the second round. Hence, we can use the same hardware for each round. The data structure
of 128-bit matrix. Each column consists of 4 elements of 8 bits each, so in total we have 32
bits per word. The number of S-box required and Mix Column required to implement
conventional AES algorithm.

AES algorithm implementation is done using four operations namely Sub Bytes, Shift
Rows, Mix Columns and Add Round Key. Fig. 1 shows the architecture of 256-bit AES
algorithm. In total there are 14 rounds of operation for encryption and 14 rounds for
decryption. The cipher text after encryption will be transmitted across the channel. The
receiver side will decrypt the message using same key which is used in encryption. In 256-
bit AES algorithm, the key size is 256 bits, but all the data size is 128 bits. Data include
message to be encrypted, cipher text and the decrypted message.
23

The internal data structure of 128-bit data. The 128-bit data is used as 4x4 matrix, where
each element of the matrix is of 8 bits. Since all the four operations are performed on
columns basis, we convert the 128-bit data in 4x4 matrix with each element being 8 bits.

256-bit AES encryption block is implemented in 14 rounds. Each round consists of Add
Round Key, Sub Bytes, Shift Rows, Mix column. Round 0 consists of only Add round Key
operation as shown in Fig. 3. Round 14 consists of Sub Bytes, Shift Rows and Add Round
Key operations, which need 3 clock cycles as shown in Fig. 3. Rounds 1 to 13 consists of
all the four operations as shown in Fig. 13. We do a distinct operation in each clock cycle.
Hence once the hardware has been implemented for Add Round Key, Sub Bytes, Shift
Rows, mix column, the same hardware can be used for all the 14 rounds. None of the four
operations shares the same clock cycle. Fig. 3 shows the sequence of round operation with
specific sequence of 4 operations to complete the AES encryption. AES algorithm is serial
process, i.e., output of first round is the input to the second round. Hence, we can use the
same hardware for each round.

Table.1.1 Cycles Required in Each Round


24

Table 1.2 Structure of Conventional Implementation

3.1 FOUR STAGES OF EACH ROUND:

1) Substitute Bytes: The first transformation, SubBytes, is used at the encryption site.
It is a non-linear byte substitution that operates independently on each byte of the
state using a substitution table (S-Box). All the 16 bytes of the state are substituted
by the corresponding values which are found from the lookup table. In decryption,
InvSubBytes is used. Bytes of a state are substituted from InvSubBytes table.
Figure 2 shows the SubBytes operation.
2) Shift Rows: In the encryption, the state bytes are shifted left in each row. It is called
ShiftRows operation. The number of the shifts depends on the row number (0, 1, 2
or 3) of the state matrix. Row 0 bytes are not shifted and row 1, 2, 3 are shifted to 1,
2, 3 bytes left accordingly. Figure 3 shows the ShiftRows operation.
3) Mix Column: The MixColumns transformation operates at the column level. It
transforms each column of the state to a new column. The transformation is
actually the matrix multiplication of a state column by a constant square matrix. All
the arithmetic operations are conducted in the Galois Field (Finite Field). The bytes
are treated as polynomials rather than numbers. Figure 4 shows the MixColumns
operation.
4) Add Round Key: AddRoundKey proceeds one column at a time. It is similar to
MixColumns in this respect. AddRoundKey adds a round keyword to each column
matrix. Matrix addition operation is performed in the AddRoundKey stage. Figure
5 shows the AddRoundKey operation.
In encryption, SubBytes, ShiftRows, MixColumns, and AddRoundKey are performed in all
rounds except the last round. MixColumns transformation operation is not performed in the
last round of encryption. The decryption process essentially follows the same structure as
the encryption, in addition to the nine rounds of Inverse ShiftRows, Inverse SubBytes,
Inverse AddRoundKey and Inverse MixColumns Transformation.
25

Rijndael S-Box Generation Method The Rijndael S-Box is a square matrix which is used in
the Rijndael cipher. The S-Box serves as a lookup table. It is generated by determining the
multiplicative inverse for a given number in GF (28 ) and then transforming the
multiplicative inverse using affine transformation.

1) Multiplicative Inverse Phase: In multiplicative inverse phase, the input byte is


inversed by substituting value from multiplicative inverse table.
2) Affine Transformation: Selection of the irreducible polynomial and the designated
byte are the two most important factors of affine transformation phase. In Rijndael
AES, x 8 + x 4 + x 3 + x + 1 is used as the irreducible polynomial and as the
constant column matrix 0x63 specially designated byte is chosen. Basically, the
affine transformation consists of two operations. Firstly, 8x8 square matrix’s
multiplication and secondly, 8x1 constant column matrix addition.
26

CHAPTER-IV

PROPOSED METHOD

The proposed 32-bit AES implementation. We are doing operations per word (32-bits) in
each cycle. Number of blocks required for conventional (256 bit) and proposed (32-bit)
implementation are as follows

1) S box - 16, 4 per clock cycle


2) Mix column block - 4, 1 per clock cycle
In the proposed implementation, we are using the same Sbox hardware for both encryption
and decryption. Affine transform is the only difference between S-box (encryption) and
inverse S-box (decryption). All the other logic (for encryption and decryption) is same for
the S-box and inverse S-box. Hence, we are reusing every logic other than the Affine
transform for encryption and decryption. Fig. shows the mux selection between S-box and
inverse S-box [1]. While doing AES encryption S-box path is chosen and while doing AES
decryption inverse S-box path is chosen [1].

Figure 4.1 Combined Structure of S Box and Inverse S Box

In proposed 32-bit operation method, we are reusing S-box and Mix Column blocks. In
proposed design “Mix Column” and “Add Round Key” together we called Mix block.

Figure 4.2 Combined Structure of Mix Column and Add Round Key – Mix
27

Table 4.1 Pipelined Structure of Proposed Method

the pipeline structure of the proposed design, where each color represents different round
as follows, Mix – round 0

Mix – round 1

Mix – round 2

Mix – round 3 and Mix – round 14

Each word having a size of 32 bits. In cycle 1, we are doing Mix operation of word 0
(mix_0). We can denote this as cycle1 [round0 (mix_0)].

Hence this 32-bit word is available to undergo 32- bit Sub operation. Hence in cycle 2 we
are doing sub operation of word 0 (sub_0) and Mix operation of word 1 (mix_1). We have
valid input for “sub” block word 0 in clock cycle 2, and hence we don’t need to wait for all
28

the 4 words “mix” block operation to complete. We can denote this as


cycle2[round1(sub_0), round0(mix_1)].

In clock cycle 3, we are doing sub operation for word 1 (sub_1), shift operation of word 0
(shift_0) and mix operation of word 2 (mix_2), and. We can denote this as cycle3[round1
(sub_1, shift_0), round0(mix_2)].

In clock cycle 4, we are doing sub operation for word 2 (sub_2) and shift operation for
word 1 (shift_1) and mix operation for word 3 (mix_3). We can denote this as
cycle4[round1(sub_2, shift_1), round0(mix_3)].

Since all 128-bit (4 words) round 0 mix operation completed, we don’t have mix operation
in cycle 5. In clock cycle 5, we are doing sub operation for word 3 (sub_3) and shift
operation for word 2 (shift_2). We can denote this as cycle5[round1(sub_3, shift_2)].

Since all 128-bit (4 words) round 0 sub operation completed, we don’t have sub operation
in cycle 6. In clock cycle 6, we are doing round 1 shift operation of word 3 (shift_3) and
mix operation of word 0 (mix_0) and. Since we already have the last byte value from
sub_3, we are using that for mix_0. In this way we don’t need to wait extra one cycle of
shift operation to start the mix operation. We can denote this as cycle6[round1(shift_3,
mix_0)].

In clock cycle 7, we are doing sub operation for word 0 (sub_0) and mix operation for
word 1 (mix_1). We can denote this as cycle7[round2(sub_0), round1(mix_1)].

In clock cycle 8, we are doing sub operation for word 1 (sub_1) shift operation for word 0
(shift_0) and mix operation for word 2 (mix_2). We can denote this as
cycle8[round2(sub_1, shift_0), round1(mix_2)].

In clock cycle 9, we are doing sub operation for word 2 (sub_2), shift operation for word 1
(shift_1) and mix operation for word 3 (mix_3). We can denote this as cycle9[round2
(sub_2, shift_1), round1 (mix_3)].

In clock cycle 10, we are doing sub operation for word 3 (sub_3) shift operation for word 2
(shift_2). The same order of execution repeats for all 14 rounds. We can denote this as
cycle10[round2(sub_3, shift_2)]. The same sequence repeats for all 14 rounds.
29

As shown in Fig. 10, in cycle 6 we are using sub bytes S box for key generation block,
because of this we don’t need extra S box for key generation block. We are generating
128-bit key for every 5 cycles, so that it requires only 4 S box in one cycle. In conventional
method, we need 8 S box for key generation block. 128-bit key generated in cycle 6 will be
used in cycle 14 mix operation. Similarly, key generated in cycle 11 used in mix operation
of cycle 19.

As shown in Fig. 10, in cycle 9 we have valid output for round 1 (cycle 5 to 9), so we need
5 cycles to perform round 1. Total we need 74 clock cycles to complete AES encryption.

4.1 SIDE CHANNEL ATTACKS

Scan test has been widely adopted as a default testing technique among most VLSI
designs, including crypto cores. Unfortunately, these scan chains might be used as a “side
channel” to recover the secret keys from the hardware implementations of cryptographic
algorithms, for example scan-based attacks on Data Encryption Standard (DES), Advanced
Encryption Standard (AES), and Elliptic Curve Cryptography (ECC) have been illustrated
in [1]–[3], respectively.

SE

FF
DI
0
D
O
SI 1
SO

CK
Figure 4.3- Normal scan FF

In general, the existing scan-based side channel attacks (SSCA) could be viewed as one
kind of differential cryptanalysis by using scan chains of crypto cores. Unlike other known
side channel attacks, SSCA is much easier. It is because that in SSCA, in addition to the
primary outputs of the crypto cores, a hacker could use scan chain to shift out the
intermediate contents during a
30

Cryptographic operation. It was illustrated in [2] that on average overall only 544
plaintexts are required to discover the AES key by using SSCA, which clearly shows the
great potential threat of scan-based side channel attack.

4.2 PREVIOUS IMPLEMENTATIONS OF THE S-BOX

One of the most common and straight forward implementations of the S-Box for the
SubByte operation which was done in previous work was to have the pre-computed values
stored in a ROM based lookup table. In this implementation, all 256 values are stored in a
ROM and the input byte would be wired to the ROM’s address bus. However, this method
suffers from an unbreakable delay since ROMs have a fixed access time for its read and
write operation. [3] Furthermore, such implementation is expensive in terms of hardware.

A more refined way of implementing the S-Box is to use combinational logic. Such
examples of work that implements the S-Box using this method were [1], [3] and [5]. This
S-Box has the advantage of having small area occupancy, in addition to be capable of
being pipelined for increased performance in clock frequency. The S-Box architecture
discussed in this paper is based on the combinational logic implementation.

4.3 THE SUB BYTE AND INV SUB BYTE TRANSFORMATION

The Sub Byte transformation is computed by taking the multiplicative inverse in GF (2^8)
followed by an affine transformation. For its reverse, the Inv Sub Byte transformation, the
inverse affine transformation is applied first prior to computing the multiplicative inverse.
The steps involved for both transformation is shown below.

SubByte: Multiplicative Inversion in GF (2^8) Affine Transformation

InvSubByte: Inverse Affine Transformation  Multiplicative Inversion in GF (2^8).

The AT and AT -1 are the Affine Transformation and its inverse while the vector a is
the multiplicative inverse of the input byte from the state array. From here, it is observed
that both the SubByte and the InvSubByte transformation involve a multiplicative
inversion operation. Thus, both transformations may actually share the same multiplicative
inversion module in a combined architecture. An example of such hardware architecture is
shown below. Switching between SubByte and InvSubByte is just a matter of changing the
value of INV. INV is set to 0 for SubByte while 1 is set when InvSubByte operation is
desired.
31

Figure 4.4- Sub byte/inverse sub byte implementation

4.4 S-BOX CONSTRUCTION METHODOLOGY

This section illustrates the steps involved in constructing the multiplicative inverse
module for the S-Box using composite field arithmetic. Since both the SubByte and
InvSubByte transformation are similar other than their operations

Which involve the Affine Transformation and its inverse, therefore only the
Implementation of the Sub Byte operation will be discussed in this paper. The
multiplicative inverse computation will first be covered and the affine transformation will
then follow to complete the methodology involved for constructing the S-Box for the
SubByte operation. For the InvSubByte operation, the reader can reuse multiplicative
inversion module and combine it with the Inverse Affine Transformation.

4.5 SUBBYTE TRANSFORMATION

First multiplicative inversion of the eight-bit value is taken then affine transformation
is done. The following matrix is the affine transformation
32

4.5.1 MULTIPLICATIVE INVERSION:

Figure 4.5- Multiplicative inverse.

δ and δ inverse Multiplication:

For x 2

Figure 4.6- Squaring in GF.

(x3, x2, x1, and x0)=> (q3, q2, q1, q0).


q3=x3.
q2=x3^x2.
q1=x2^x1.
q0=x3^x1^x0.
33

For x−1:
(q3, q2, q1, q0)=> (q3, q2, q1, q0)-1

For xλ:

Figure 4.7- Multiplication in GF.

(X3, x2, x1, and x0) => (q3, q2, q1, q0).
q3=x2^x0.
q2=x3^x2^x1^x0.
q1=x3.
q0=x2.
34

For 4x4 multiplication:

Figure 4.8 4x4 multiplication.

4.5.2 INVERSE SUBBYTE:

After taking inverse affine transform the eight-bit sub byte value is transformed into
eight bit value by undergoing multiplicative inversion. The matrix shows the inverse affine
transform.

Figure 4.9
Inverse SubByte

4.6 PROPOSED
SCAN FLIP FLOP
Due to the
security and testability
requirements as mentioned
above, a novel robust secure scan-based test approach is proposed as a countermeasure
against scan-based differential cryptanalysis.
35

Figure 4.10- Proposed bit masking FF model.

When in normal function mode (SE==0) SFF loads data from the logic through DI,
and the output to logic is DO. Because the additional inverter and the XOR gate are
inserted along the scan path, they do not affect the timing of the design. Thus in function
mode, RSSF works like a traditional scan flip flop…. When in scan test mode, we can
observe from Fig. 4.7 during scan shift operation, the content of FF is XOR ed with SI to
be shifted out to the next SFF and the inverted scan-in data (SI) will be loaded into FF.
Thus, for hackers, it becomes extremely complicated to identify the relationship between
the captured response and the scan-out.

4.7 BIT LEVEL MASKING IN AES


The basic idea of the proposed RSS design is to encrypt the contents in scan chains
during scan operation, so as to reduce the controllability and observability of unintended
users. By doing this, it becomes more complicated for hackers to identify the bit
differences between pairs of related plaintexts when they are encrypted under the same
key. One kind of the proposed RSS design is shown in Fig. 4.7, in which the contents of
two neighboring SFFs are encoded during scan operation from a security aspect.
When compared with the traditional SFF, an extra inverter and an XOR gate are
introduced in the RSS design. This simple logic could be used for encryption during scan
operations. Observe that the proposed robust scan flip-flop (RSSF) has identical pin outs
when compared with the traditional scan flip-flop as shown in Fig. 4.1, and is therefore
fully compatible with industry standard design tools from a design perspective, when
integrated into current design flows it only requires the RSSF added into the cell library.
36

CHAPTER 5

ADVANTAGES & APPLICATIONS


Advantages:

 Low complexity.

 High security.

Applications:

 Wireless security,

 Processor security,

 File encryption.
37

CHAPTER 6

XILINX AND VERILOG HDL

6.1 INTRODUCTION

 HDL is an abbreviation of Hardware Description Language. Any digital system can


be represented in a REGISTER TRANSFER LEVEL (RTL) and HDLs are used to
describe this RTL.
 Verilog is one such HDL and it is a general-purpose language –easy to learn and
use. Its syntax is similar to C.
 The idea is to specify how the data flows between registers and how the design
processes the data.
 To define RTL, hierarchical design concepts play a very significant role.
Hierarchical design methodology facilitates the digital design flow with several
levels of abstraction.
 Verilog HDL can utilize these levels of abstraction to produce a simplified and
efficient representation of the RTL description of any digital design.
 For example, an HDL might describe the layout of the wires, resistors and
transistors on an Integrated Circuit (IC) chip, i.e., the switch level or, it may
describe the design at a more micro level in terms of logical gates and flip flops in a
digital system, i.e., the gate level. Verilog supports all of these levels.

6.2 DESIGN STYLES

Any hardware description language like Verilog can be design in two ways one is bottom-
up design and other one is top-down design.

Bottom-Up Design:

The traditional method of electronic design is bottom-up (designing from transistors and
moving to a higher level of gates and, finally, the system). But with the increase in design
complexity traditional bottom-up designs have to give way to new structural, hierarchical
design methods.
38

Top-Down Design:

For HDL representation it is convenient and efficient to adapt this design-style. A real top-
down design allows early testing, fabrication technology independence, a structured system
design and offers many other advantages. But it is very difficult to follow a pure top-down
design. Due to this fact most, designs are mix of both the methods, implementing some key
elements of both design styles.

6.3 FEATURES OF VERILOG HDL

 Verilog is case sensitive.


 Ability to mix different levels of abstract freely.
 One language for all aspects of design, testing, and verification.
 In Verilog, Keywords are defined in lower case.
 In Verilog, most of the syntax is adopted from "C" language.
 Verilog can be used to model a digital circuit at Algorithm, RTL, Gate and Switch
level.
 There is no concept of package in Verilog.
 It also supports advanced simulation features like TEXTIO, PLI, and UDPs.

6.4 VLSI DESIGN FLOW


The VLSI design cycle starts with a formal specification of a VLSI chip, follows a series
of steps, and eventually produces a packaged chip.

Figure 6.1 - VLSI DESIGN FLOW


39

Specification:
The first step of any design process is to lay down the specifications of the system. System
specification is a high level representation of the system. The factors to be considered in
this process include: performance, functionality, and physical dimensions like size of the
chip. The specification of a system is a compromise between market requirements,
technology and economical viability. The end results are specifications for the size, speed,
power, and functionality of the VLSI system.
Architectural Design
The basic architecture of the system is designed in this step. This includes, such decisions
as RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction Set
Computer), number of ALUs, Floating Point units, number and structure of pipelines, and
size of caches among others. The outcome of architectural design is a Micro-Architectural
Specification (MAS).
Behavioral or Functional Design:
In this step, main functional units of the system are identified. This also identifies the
interconnect requirements between the units. The area, power, and other parameters of
each unit are estimated.
Modules. The key idea is to specify behavior, in terms of input, output and timing of each
unit, without specifying its internal structure.
The outcome of functional design is usually a timing diagram or other relationships
between units.
Logic Design:
In this step the control flow, word widths, register allocation, arithmetic operations, and
logic operations of the design that represent the functional design are derived and tested.
This description is called Register Transfer Level (RTL) description. RTL is expressed in
a Hardware Description Language (HDL), such as VHDL or Verilog. This description can
be used in simulation and verification
Circuit Design:
The purpose of circuit design is to develop a circuit representation based on the logic
design. The Boolean expressions are converted into a circuit representation by taking into
40

consideration the speed and power requirements of the original design. Circuit Simulation
is used to verify the correctness and timing of each component
The circuit design is usually expressed in a detailed circuit diagram. This diagram shows
the circuit elements (cells, macros, gates, transistors) and interconnection between these
elements. This representation is also called a netlist. And each stage verification of logic is
done.
Physical design:
In this step the circuit representation (or netlist) is converted into a geometric
representation. As stated earlier, this geometric representation of a circuit is called a layout.
Layout is created by converting each logic component (cells, macros, gates, transistors)
into a geometric representation (specific shapes in multiple layers), which perform the
intended logic function of the corresponding component. Connections between different
components are also expressed as geometric patterns typically lines in multiple layers.
Layout verification:
Physical design can be completely or partially automated and layout can be generated
directly from netlist by Layout Synthesis tools. Layout synthesis tools, while fast, do have
an area and performance penalty, which limit their use to some designs. These are verified.
Fabrication and Testing:
Silicon crystals are grown and sliced to produce wafers. The wafer is fabricated and diced
into individual chips in a fabrication facility. Each chip is then packaged and tested to
ensure that it meets all the design specifications and that it functions properly.

6.5 Work Flow - Xilinx Verilog HDL

Getting started

Frist we need to download and install Xilinx and ModelSim. These tools both have free
student versions. Please accomplish Appendix B, C, and D in that order before continuing
with this tutorial. Additionally if you wish to purchase your own Spartan3 board, you can
do so at Digilent’s Website. Digilent offers academic pricing. Please note that you must
download and install Digilent Adept software. The software contains the drivers for the
board that you need and also provides the interface to program the board.
41

Creating a New Project

Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows
desktop. This should open up the Project Navigator window on your screen. This window
below window shows the last accessed project.

Fig 6.2 Xilinx Project Navigator Window

Opening a project

Select File->New Project to create a new project. This will bring up a new project window
(Figure 2) on the desktop. Fill up the necessary entries as follows:
42

Fig 6.3 Opening a Project

Project Name: Write the name of your new project which is user defined.
Project Location: The directory where you want to store the new project in the specified
location in one of your drive. In above window they are stored in location c drive which is
not correct , the location of software and code should not be same location.

Clicking on NEXT should bring up the following window:

Fig 6.4 Device Properties


For each of the properties given below, click on the ‘value’ area and select from the list of
values that appear.
 Device Family: Family of the FPGA/CPLD used. In this laboratory we will be
using the Spartan3E FPGA’s.
 Device: The number of the actual device. For this lab you may enter XC3S250E
(this can be found on the attached prototyping board)
 Package: The type of package with the number of pins. The Spartan FPGA used in
this lab is packaged in CP132 package.
 Speed Grade: The Speed grade is “-4”.
 Synthesis Tool: XST [VHDL/Verilog]
 Simulator: The tool used to simulate and verify the functionality of the design.
Modelsim simulator is integrated in the Xilinx ISE. Hence choose “Modelsim-XE
Verilog” as the simulator or even Xilinx ISE Simulator can be used.
43

Then click on NEXT to save the entries.


All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored
in a subdirectory with the project name.
In order to open an existing project in Xilinx Tools, select File->Open Project to show the
list of projects on the machine. Choose the project you want and click OK.

Clicking on NEXT on the above window brings up the following window:

Fig 6.5 Create New Source

Create New source window (snapshot from Xilinx ISE software)


If creating a new source file, click on the NEW SOURCE.
Creating a Verilog HDL input file for a combinational logic design:
In this lab we will enter a design using a structural or RTL description using the Verilog
HDL. You can create a Verilog HDL input file (.v file) using the HDL Editor available in
the Xilinx ISE Tools (or any text editor).
In the previous window, click on the NEW SOURCE
A window pops up as shown in Figure 4. (Note: “Add to project” option is selected by
default. If you do not select it then you will have to add the new source file to the project
manually.)
44

Fig 6.6 Select Source Type

Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source
file you are going to create. Also make sure that the option Add to project is selected so
that the source need not be added to the project again. Then click on Next to accept the
entries.

Fig 6.7 Define Module


In the Port Name column, enter the names of all input and output pins and specify the
Direction accordingly. A Vector/Bus can be defined by entering appropriate bit numbers
in the MSB/LSB columns. Then click on Next>to get a window showing all the new
source information above window. If any changes are to be made, just click on <Back to
go back and make changes. If everything is acceptable, click on Finish > Next > Next >
Finish to continue.
45

Fig 6.8 Project Summary

Synthesis and Implementation of the Design:


The design has to be synthesized and implemented before it can be checked for
correctness, by running functional simulation or downloaded onto the prototyping board.
With the top-level Verilog file opened (can be done by double-clicking that file) in the
HDL editor window in the right half of the Project Navigator, and the view of the project
being in the Module view , the implement design option can be seen in the process view.
Design entry utilities and Generate Programming File options can also be seen in the
process view.
To synthesize the design, double click on the Synthesize Design option in the Processes
window.
To implement the design, double click the Implement design option in the Processes
window. It will go through steps like Translate, Map and Place & Route. If any of these
steps could not be done or done with errors, it will place a X mark in front of that,
otherwise a tick mark will be placed after each of them to indicate the successful
completion
After synthesis right click on synthesis and click view text report in order to generate the
report of our design.
46

Fig 6.9 Implementing the Design

By double clicking it opens the top level module showing only input(s) and output(s) as
shown below.

Fig 6.10 Top Level Hierarchy of the design


47

By double clicking the rectangle, it opens the realized internal logic as shown below.

Fig 6.11 Internal logic

CHAPTER 7

MODEL SIM
7.1 CREATING A NEW PROJECT
After starting Model Sim, click on File > New > Project and select a directory on your U:
drive for the project as shown in Figure 2 and Figure 3. This directory will contain all files
for the new project

Fig 7.1 Create a new project.


48

Fig 7.2: Selecting a directory for a new project


After completing these steps, ModelSim creates necessary project and preset files to later
ease the process of opening projects and view previous simulation logs.

7.2 ADDING FILES TO MODELSIM

The next step is to add files to the project. There are two options, either to create a new file
within ModelSim built-in text editor or add an existing file from the directory. The choice
is made with the pop-up window shown in Figure 4, which should show up automatically
after creating a new project. The files that can be added here are. v files, i.e., Verilog HDL
files.

Fig 7.3: Pop-up window to select file additions


49

7.3 COMPILING FILES


To compile a file, right click on the file name within the project directory and then
Compile > Compile All (see Figure 8). If the compilation is successful, a green tick mark
will appear in the status column for this file.

Fig 7.4: Compiling files in ModelSim

7.5 STARTING A SIMULATION

Before you proceed to this step, make sure that all files in your project are compiled
successfully. This is necessary because there could be dependencies between the files.

To run the simulation click Simulate > Start Simulation as shown in Figure 9. A pop-
up window will prompt you to select the file that you want to simulate. In the “Design”
tab, look for an item called “work” and then click the “+” button that is immediately to
its left (see Figure 10). This will show more files. Click on the file that you want to test
and then click OK.

Fig 7.5: Starting a simulation in


ModelSim
50

7.6 OBSERVATION FROM SIMULATION

After a successful compilation start simulation with required input files added from
a project folder, then stop the simulation.

Now we can get a snap for simulation result, from a project folder we need to select
a individual module for analysis. We can zoom the simulation window and note down the
required area of simulation.

Fig 7.6: Observation From Simulation


51
52

CHAPTER 8
8.1 RESULTS

Fig 8.1 AES RTL schematic

Fig 8.2 AES Technology schematic


53

Fig 8.3 AES Simulation result

Fig 8.4 AES Without Masking


54

Fig 8.5 AES With Masking


55

Fig 8.6 Multistage AES


56

Fig 8.7 AES Composite Output (128 bits)


57

Fig 8.8 AES Pipeline Output (128 bits)


58

Fig 8.9 AES Pipeline Output (256 bits)

8.2 RESULTS COMPARISON

Table 8.1 Result Comparison


59

CHAPTER 9

CONCLUSION
In this project, we implemented the composite s box-based AES for 256-bit data
encryption. The proposed AES algorithm is highly optimized in Key leakage avoidance
and Sub bytes blocks, for Area and Power. The optimization has been done by reusing the
S-box block. We proposed reusing the and Key Schedule operations, reusing the same
hardware for both encryption and decryption. The proposed method is generic and can be
used for any Key size. In this simple architecture is used for the key selection and multi-
level confusion matrices were added to make the key strong.
60

REFERENCES
[1] M. Rajeswara Rao, Dr.R.K.Sharma, SVE Department, NIT Kurushetra “FPGA
Implementation of combined S box and Inv S box of AES” 2017 4th International
conference on signal processing and integrated networks (SPIN).

[2] Nalini C. Iyer ; Deepa ; P.V. Anandmohan ; D.V. Poornaiah “Mix/InvMixColumn


decomposition and resource sharing in AES”.

[3] Xinmiao Zhang, Student Member, IEEE, and Keshab K. Parhi, Fellow, “High Speed
VLSI architectures for the AES Algorithm”, IEEE. VOL.12. No.9. September 2004

[4] Shrivathsa Bhargav, larry Chen, abhinandan Majumdar, Shiva Ramudit “128 bit AES
Decryption”, CSEE 4840 – Embedded system Design spring 2008, Columbia University.

[5] Atul M. Borkar ; R. V. Kshirsagar ; M. V. Vyawahare “FPGA implementation of AES


algorithm”.

[6] Announcing the Advanced Encryption Standard (AES), November 26 2001.

[7] Yulin Zhang ; Xinggang Wang; “Pipelined implementation of AES encryption based
on FPGA” 2010 IEEE International Conference on Information Theory and Information
Security.

[8] Yuwen Zhu ; Hongqi Zhang ; Yibao Bao ; “Study of the AES Realization Method on
the Reconfigurable Hardware” 2013 International Conference on Computer Sciences and
Applications.

[9] Tsung-Fu Lin ; Chih-Pin Su ; Chih-Tsun Huang ; Cheng-Wen Wu; “A high-throughput


low-cost AES cipher chip” Proceedings. IEEE AsiaPacific Conference on ASIC.

[10] C. Sivakumar ; A. Velmurugan ; “High Speed VLSI Design CCMP AES Cipher for
WLAN (IEEE 802.11i)” 2007 International Conference on Signal Processing,
Communications and Networking.

[11] Vatchara Saicheur ; Krerk Piromsopa ; “An implementation of AES128 and AES-512
on Apple mobile processor” 2017 14th International Conference on Electrical
Engineering/Electronics, Computer, Telecommunications and Information Technology
(ECTI-CON)
61

[12] S.P Guruprasad ; B.S Chandrasekar ; “An evaluation framework for security
algorithms performance realization on FPGA” 2018 IEEE International Conference on
Current Trends in Advanced Computing (ICCTAC)

[13] N. S. Sai Srinivas ; Md. Akramuddin; “FPGA based hardware implementation of AES
Rijndael algorithm for Encryption and Decryption” 2016 International Conference on
Electrical, Electronics, and Optimization Techniques (ICEEOT).

[14] P. S. Abhijith ; Mallika Srivastava ; Aparna Mishra ; Manish Goswami ; B. R. Singh ;


“High performance hardware implementation of AES using minimal resources” 2013
International Conference on Intelligent Systems and Signal Processing (ISSP).

[15] Wei Wang ; Jie Chen ; Fei Xu ; “An implementation of AES algorithm Based on
FPGA” 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery

[16] Ashwini M. Deshpande ; Mangesh S. Deshpande ; Devendra N. Kayatanavar; “FPGA


implementation of AES encryption and decryption” 2009 International Conference on
Control, Automation, Communication and Energy Conservation

You might also like