Chapter 1

Introduction
1.1 Preamble

In 1997, the National Institute of Standards and Technology (NIST), U.S. Department of
Commerce, invited proposals from research and academic groups for
developing a new symmetric-key encryption standard. The evaluation criteria were the
security, cost, and implementation characteristics of the algorithm. Security was the
most important criterion; it encompassed features such as the resistance of the
algorithm to cryptanalysis, the soundness of its mathematical basis, and the randomness of the
algorithm output [1] (James Nechvatal, 2000). Cost was the second most important
criterion; it encompassed licensing requirements, computational efficiency
on various platforms, memory requirements, and hardware implementations. The third
criterion was implementation characteristics such as flexibility, hardware and
software suitability, and algorithm simplicity. After reviewing the results of the
preliminary research and analysis by the cryptographic research community, NIST decided
to propose the Rijndael algorithm, developed by Joan Daemen and Vincent Rijmen, as the
Advanced Encryption Standard (AES). In that evaluation, Rijndael demonstrated
the best performance on both hardware and software platforms. It also had the
shortest encryption/decryption time and is known to be resistant to the known linear
and differential cryptanalysis attacks.

AES is an unclassified, publicly disclosed, royalty-free encryption algorithm. It is a
symmetric-key block cipher that supports a block size of 128 bits and key
sizes of 128, 192, and 256 bits [2] (Federal Information Processing Standards
Publication 197, November 26, 2001). Although a software implementation of
AES has advantages such as ease of upgrade, flexibility, and portability, it lacks strong
physical security. Hardware implementations, by contrast, have been shown to be
more physically secure, being very difficult for an attacker to modify or read [3]
(Akashi Satoh, 2001). Hardware implementations of AES can be optimized for
speed, size, and power consumption. The two major target platforms for AES
implementations are Field Programmable Gate Arrays (FPGAs) and Application Specific
Integrated Circuits (ASICs). There are many architectural options for the
hardware design of AES on FPGAs and ASICs, including
internal pipelining, external pipelining, rolling, and loop unrolling. The choice among these
options depends on the speed/area trade-off required by the target
application of the AES algorithm.
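
The speed/area trade-off between a rolled and a fully pipelined design can be illustrated with a rough first-order model. The sketch below is not taken from the thesis: it assumes one AES round per clock cycle, uses a hypothetical 100 MHz clock, and treats the number of round units as a crude proxy for area. The function names are illustrative only.

```python
# Rounds per key size as specified in FIPS-197.
AES_ROUNDS = {128: 10, 192: 12, 256: 14}
BLOCK_BITS = 128


def cycles_per_block(key_bits, architecture):
    """Clock cycles between consecutive output blocks (simplified model).

    Assumption: one AES round completes per clock cycle and the initial
    AddRoundKey is folded into the first round.
    """
    rounds = AES_ROUNDS[key_bits]
    if architecture == "rolled":
        # A single round unit is reused for every round.
        return rounds
    if architecture == "pipelined":
        # One round unit per round with registers in between: once the
        # pipeline is full, one block leaves every cycle.
        return 1
    raise ValueError(f"unknown architecture: {architecture}")


def throughput_bps(key_bits, architecture, clock_hz):
    """Steady-state throughput in bits per second."""
    return BLOCK_BITS * clock_hz / cycles_per_block(key_bits, architecture)


def round_units(key_bits, architecture):
    """Number of round units instantiated (crude area proxy)."""
    return 1 if architecture == "rolled" else AES_ROUNDS[key_bits]


if __name__ == "__main__":
    clock_hz = 100_000_000  # hypothetical clock frequency
    for arch in ("rolled", "pipelined"):
        gbps = throughput_bps(128, arch, clock_hz) / 1e9
        print(arch, round_units(128, arch), "round unit(s),", gbps, "Gbit/s")
```

Under these assumptions the pipelined design gains a factor equal to the number of rounds in throughput and pays roughly the same factor in round-unit area. The pipelined figure holds only when consecutive blocks are independent; in feedback modes such as CBC encryption the pipeline cannot be kept full.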

1.2 Background and Motivation

Real-time applications such as electronic transactions and audio/video
communication require significantly large network bandwidth. In addition to network
bandwidth, such applications require strong security measures. Security processing
includes encryption and decryption of data, which are normally computation-intensive and
demand powerful architectures to reduce the impact of delay overheads on the
primary application. Solutions such as a security (cryptographic) coprocessor that
offloads cryptographic algorithms from the main processor have gained importance in
high-speed network applications. Some applications require vendor
secrets to be protected inside a device to support gradual feature activation, secure firmware updates,
and aspects of user privacy. Most embedded systems are based on small
microprocessors with limited computing power, and executing computationally
costly cryptographic algorithms on these platforms without severely degrading
performance is impractical. Compared with such microprocessor-based
implementations, dedicated hardware implementations can be designed
optimally for speed-critical and area-constrained applications.

Applications using the AES algorithm may require different speed/area/power trade-offs.
Applications such as cellular phones and smart cards prefer small-area,
low-power implementations of AES, whereas speed-critical
applications such as CCTV recorders and other real-time systems require a higher
speed/area ratio. A hardware implementation of AES offers strong physical security and
typically consumes much less power than a software implementation. Various architecture-level
optimizations need to be explored to suit the demands of resource-critical applications
such as USB pen drives, inductively powered RF identification (RFID) devices, smart cards, and
wireless sensor networks (WSNs). The optimization methodology may include
resource sharing between the encryptor and decryptor units and on-the-fly computation to
reduce area. Computational complexity can be reduced through the use of look-up
tables (LUTs), but at the cost of more memory. Duplicating and pipelining the
hardware required for the round units can achieve higher speed, while folding the
architecture can achieve a smaller implementation area.
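
The LUT versus on-the-fly trade-off can be made concrete with the AES S-box. The sketch below is an illustration rather than the thesis design: it computes each S-box entry directly from its FIPS-197 definition (multiplicative inverse in GF(2^8) followed by the affine transformation) using plain GF(2^8) arithmetic, not composite field arithmetic. The same function is then used to fill a 256-entry table, which is what a memory-based design would store. All function names are assumptions made for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_otf(x):
    """On-the-fly S-box: field inverse, then the FIPS-197 affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Memory-based alternative: 256 bytes of storage, one read per substitution.
SBOX_LUT = [sbox_otf(x) for x in range(256)]

if __name__ == "__main__":
    print(hex(sbox_otf(0x53)), hex(SBOX_LUT[0x53]))
```

In hardware the table costs 256 x 8 bits of memory per S-box instance, while the computed form costs combinational logic instead; the software loop used here for the inverse is only a stand-in for that logic.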

The motivation for our research is to develop architectures for high throughput
and low area, with as little trade-off between the two as possible, so that the
cryptographic hardware can be integrated as an IP core into application chipsets. This ensures
that data is encrypted at its point of origin.

1.3 Objectives

The research aims to develop optimal-performance architectures that achieve high
throughput and minimum area. The following principal objectives have been set
for the proposed research work.
1. To explore various configurations, architectural styles, and techniques, such as rolled,
unrolled, partially rolled, parallel, and systolic-array architectures, and to exploit composite field
arithmetic, for designing AES architectures for high-throughput and/or low-area
implementation.
2. To evolve possibly novel architectures with the highest possible sharing of
resources between encryption and decryption.
3. To design the architectures for implementation on an ASIC platform.
4. To compare the evolved architectures with existing designs and verify them.

Detailed objectives are set in order to achieve the principal objectives above and to
provide a methodology for pursuing them. They are listed below:
1. To develop on-the-fly computation for a memoryless architecture for low-area
and low-power implementation of the AES sub-processes listed below:
a. Substitute Byte operation
b. MixColumn operation
c. Key Expansion
2. To reduce the memory requirement of the Look-Up-Table (LUT) approach to
implementing Substitute Byte, MixColumn, and Key Expansion for
encryption and decryption.
3. To develop a combined architecture for encryption and decryption, sharing
maximum resources between them.
4. To design and verify a higher-throughput rolled architecture for all key sizes,
sharing maximum hardware resources between encryption and decryption.
5. To design and verify a higher-throughput pipelined architecture for all key
sizes, sharing maximum hardware resources between encryption and
decryption.
6. To develop a combined systolic architecture for the encryption and decryption
data path.
7. To develop a systolic architecture for the Key Expansion unit.
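
One commonly used way to share resources between encryption and decryption, as called for in the objectives above, is to reuse a single GF(2^8) inversion unit for both the Substitute Byte operation and its inverse, switching only the affine stage. The sketch below illustrates that idea in software under the assumption that the inversion is the expensive part; it is not claimed to be the architecture developed in the thesis, and the names `sub_byte`, `affine`, and `inv_affine` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Shared unit: multiplicative inverse as a^254, with 0 mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(b):
    """Forward affine transformation of the AES S-box."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(s):
    """Inverse affine transformation used by the inverse S-box."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05


def sub_byte(x, decrypt=False):
    """Substitute one byte; both directions pass through the same gf_inv."""
    if decrypt:
        x = inv_affine(x)
    x = gf_inv(x)  # the resource shared between encryption and decryption
    if not decrypt:
        x = affine(x)
    return x
```

For example, `sub_byte(0x53)` gives `0xED` and `sub_byte(0xED, decrypt=True)` returns `0x53`; only the small affine stages differ between the two directions.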

1.4 Methodology followed

The research is organized primarily in two directions: one investigates
high-speed architectures and the other investigates minimum-area architectures for
implementing AES. The AES architecture has two concurrent data paths, namely the
Encryption/Decryption (ED) data path and the Key Expansion (KE) data path. The round
key generated in the KE data path is required in the corresponding round of the ED data
path. To achieve the objectives of high throughput and low area, architectural
transformations are employed. In addition to the concurrency between the ED and KE data paths,
the design must account for the concurrency and scheduling of the sub-processes within each data path
when implementing AES on hardware platforms. Our investigations are divided
into two stages. The first stage is the design of the architecture, supporting
either high-throughput or low-area implementation. The second stage is the
layout and physical design using 180 nm standard cell libraries, followed by computation of
the throughput, area, and power consumption of the implementations.
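
The coupling between the two data paths, where each round of the ED path consumes the round key produced for it by the KE path, can be sketched for the 128-bit key size. The generator below derives each round key from the previous one only, so no expanded-key memory is needed. It is a minimal software illustration that follows the FIPS-197 key schedule, not the thesis hardware; the S-box is computed directly from its definition and the name `round_keys_128` is hypothetical.

```python
def xtime(b):
    """Multiply a byte by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox(x):
    """AES S-box computed from its definition (inverse, then affine map)."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):  # x^254 is the inverse of x
            inv = gf_mul(inv, x)
    s = inv
    for n in (1, 2, 3, 4):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return s ^ 0x63


def round_keys_128(key):
    """Yield the 11 AES-128 round keys, each derived from the previous one."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    rk = list(key)
    rcon = 0x01
    yield bytes(rk)  # round 0 uses the cipher key itself
    for _ in range(10):
        # RotWord and SubWord on the last word, Rcon XORed into its first byte.
        temp = [sbox(rk[13]) ^ rcon, sbox(rk[14]), sbox(rk[15]), sbox(rk[12])]
        nxt = []
        for i in range(16):
            # First word uses temp; later words use the word just produced.
            prev_word_byte = temp[i] if i < 4 else nxt[i - 4]
            nxt.append(rk[i] ^ prev_word_byte)
        rk = nxt
        rcon = xtime(rcon)
        yield bytes(rk)


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
    for rnd, rk in enumerate(round_keys_128(key)):
        print(rnd, rk.hex())
```

Because only the current round key is held, the storage is 128 bits instead of the full 1408-bit expanded key for AES-128. Decryption needs the keys in reverse order, which this forward-only sketch does not address.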

The architectures designed are first implemented on a Virtex-4 FPGA, prior to physical
layout. An oversized device has been selected intentionally, so that the logic and
routing resource constraints of the device do not affect the functional verification of
the design. A timing simulation verifies the post-synthesis and post-technology-mapping
functionality of the design. The physical layout is done in 180 nm
technology using Taiwan Semiconductor Manufacturing Company (TSMC) standard
cell libraries.

The chart in Figure 1.1 shows the development stages of the investigation
carried out to meet the goals. The memory-based design employs LUTs, requires
minimal computation, and is less complex to implement in hardware, whereas the
on-the-fly (OTF) computation methodology requires no memory blocks but is more complex
to implement in hardware. Either design strategy, LUT or OTF, can be further optimized
for low area by using a rolled architecture or for high throughput
by using a pipelined architecture. The upper part of the chart illustrates the
two strategies, whereas the lower part gives the architectural options that determine
the throughput and area of the design. The systolic array architecture is a midway solution
and can be considered an optimal compromise between the low-area and high-throughput
implementations.
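
As an illustration of the OTF strategy for the MixColumn step, the sketch below multiplies one state column by the fixed FIPS-197 matrices using only shifts and XORs (the `xtime` operation), with no table. It is a software analogue written for this explanation, not the thesis circuit; the helper `_circulant_mul` is a hypothetical name.

```python
def xtime(b):
    """Multiply a byte by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8) built from xtime."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def _circulant_mul(col, first_row):
    """Multiply a 4-byte column by the circulant matrix with the given first row."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(first_row[(c - r) % 4], col[c])
        out.append(acc)
    return out


def mix_column(col):
    """MixColumn on one column: first matrix row is (02, 03, 01, 01)."""
    return _circulant_mul(col, (0x02, 0x03, 0x01, 0x01))


def inv_mix_column(col):
    """Inv MixColumn on one column: first matrix row is (0e, 0b, 0d, 09)."""
    return _circulant_mul(col, (0x0E, 0x0B, 0x0D, 0x09))


if __name__ == "__main__":
    column = [0xD4, 0xBF, 0x5D, 0x30]  # column from the FIPS-197 worked example
    mixed = mix_column(column)
    print([hex(v) for v in mixed], inv_mix_column(mixed) == column)
```

A LUT-based design would instead fold this multiplication, together with the S-box, into precomputed tables such as T-boxes, trading memory for logic. The forward and inverse forms share the same structure and differ only in their constants, which is one place where encryption and decryption hardware can overlap.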

[Figure 1.1 (chart, not reproduced). Top: AES Architecture, divided into a memory-based design (Look-Up-Table based) and a memoryless design (On-The-Fly computations). Block labels recovered from the chart: SBox; TBox; Substitute Byte; Mix Column, Inv Mix Column; Key Expansion (all key sizes); Key Expansion using SBox and expanded key registers (all key sizes). Bottom: Rolled Architecture (low-area implementations), Pipelined Architecture (high-speed implementations), and Systolic Architecture (optimal implementations).]

Figure 1.1 Development stages and flow of investigation


1.5 Platforms used and Implementation Strategy

All three architectures (rolled, pipelined, and systolic) are developed and
first verified in Xilinx ISE 9.1 on Virtex-4 FPGA devices.
All simulations (functional, post-synthesis, and post-routing) are performed using
ModelSim. The verified architecture is then synthesized using Cadence's RTL Compiler
12.1 with 180 nm TSMC standard cell libraries. The first
estimate of the clock frequency is obtained from the post-synthesis design netlist. The
design is then optimized to minimize the hardware used in the post-synthesis design,
on the basis of the synthesis report. Sufficient timing slack is allowed when
writing the synthesis constraints, to accommodate contingencies that may
arise during physical layout. The final post-synthesis netlist is taken
to physical layout, which is performed using Cadence's SoC Encounter 12.1.
Multiple iterations of clock synthesis and routing are performed to determine the maximum clock
frequency attainable by the design.

1.6 Organization of the thesis

The thesis is organized as follows. Chapter 2 presents the literature survey
and highlights the work of various researchers in the field of hardware
implementation of AES. Chapter 3 describes the AES algorithm, its specifications, and key
expansion. Chapter 4 covers the Look-Up-Table and On-The-Fly
computation approaches used for implementing AES. The design of the rolled, pipelined,
and systolic architectures, their performance, comparison, and conclusions
are described in Chapters 5 and 6. The latter chapter summarizes the
contributions of our research work, conclusions, and scope for further investigation.
The thesis ends with references, publications, and information regarding the Indian patent
filed on the basis of the work carried out.
