IET Computers & Digital Techniques
Research Article
Abstract: This study presents a high-throughput field-programmable gate array (FPGA) implementation of advanced encryption standard-128 (AES-128). AES is a well-known symmetric-key encryption algorithm that offers high security against different attacks and is widely used in different applications. The main goal of this study is to design a cryptosystem with high throughput and high FPGA efficiency (FPGA-Eff) for high-traffic applications. To achieve high throughput, loop unrolling and inner and outer pipelining techniques are employed. In AES, substitution bytes (Sub-Bytes) is one of the costly functions: it occupies a large number of resources and has a large delay. To reduce the area of Sub-Bytes, a new-affine-transformation, which combines the inverse isomorphic mapping and the affine transformation, is proposed and employed. In addition, AES has been modified to suit the proposed architecture: for the first nine rounds, Shift-Rows and Sub-Bytes are exchanged, and Shift-Rows is merged with Add-Round-Key. To equalise the latency between stages, Mix-Columns is divided into two stages. AES is implemented in counter mode on a Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, an FPGA-Eff of 13.3 Mbps/slice, and a frequency of 622.4 MHz. Compared with the state-of-the-art work, the proposed design improves data throughput by 8.02% and FPGA-Eff by 22.63%.
IET Comput. Digit. Tech., 2020, Vol. 14 Iss. 6, pp. 344-352 344
© The Institution of Engineering and Technology 2020
DOI: 10.1049/iet-cdt.2019.0179
• The AES algorithm is modified: Sub-Bytes and Shift-Rows are exchanged for the first nine rounds, and Add-Round-Key and Shift-Rows are merged into one stage.
• Sub-Bytes is optimised by combining the inverse isomorphic mapping and the affine transformation (Af) into a single new-affine-transformation, hereafter referred to as New-Aff.
• The critical path of the composite field arithmetic for multiplicative inverters in GF(2^8) is considered, and the delay of Sub-Bytes is reduced by inserting registers in optimal places.
• Mix-Columns is implemented with logic gates and split into two stages. To equalise the delay between these two stages, the first stage is added to Sub-Bytes.
• Full loop unrolling, inner (inserting registers between rounds) and outer (inserting registers among functions) pipeline stages are used.
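The exchange of Sub-Bytes and Shift-Rows in the first contribution relies on a general property: Sub-Bytes acts on each byte independently, while Shift-Rows only permutes byte positions, so the two steps commute. The following minimal Python sketch checks this property. The column-major state layout follows the AES specification; the function names and the randomly generated stand-in S-box are illustrative assumptions and are not taken from the proposed design.

```python
import random


def shift_rows(state):
    """Cyclically shift row r of the 4x4 state left by r positions.

    The state is a list of 16 bytes in column-major order, so the byte in
    row r and column c is state[r + 4 * c].
    """
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def sub_bytes(state, sbox):
    """Apply a byte-wise substitution table to every byte of the state."""
    return [sbox[b] for b in state]


rng = random.Random(0)

# Stand-in S-box (assumption): any byte-wise table shows the property,
# so a random permutation is used instead of the real AES table.
sbox = list(range(256))
rng.shuffle(sbox)

state = [rng.randrange(256) for _ in range(16)]

# Both orders give the same state because one step is byte-wise and the
# other is a pure permutation of positions.
print(sub_bytes(shift_rows(state), sbox) == shift_rows(sub_bytes(state, sbox)))  # True
```

Because the check holds for any byte-wise table, it also holds for the real AES S-box, which is why reordering the two steps leaves the cipher output unchanged.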
Fig. 3 Different dependent cryptography modes
(a) CBC, (b) PCBC, (c) CFB, (d) OFB (IV stands for initialisation vector)
is observed in Fig. 5, the edges of the encrypted image in ECB can be easily detected, so the main features of the plain-image remain distinguishable. In comparison with ECB, the encrypted image in CTR is uniform and indistinguishable.
4 AES functions implementation
This section describes efficient techniques for implementing Mix-Columns and Sub-Bytes and presents the method employed in the proposed design.
4.1 Mix-Columns
There are different methods for calculating Mix-Columns. As Mix-Columns has fixed coefficients, one way is to pre-calculate the values and store them in a LUT. This method, however, is not suitable because of its high area consumption and high latency, and it is not efficient for area-limited applications. As the Mix-Columns calculation is not very complicated, it is possible to calculate the values from the incoming data according to (1)
$$
\begin{bmatrix} a' \\ b' \\ c' \\ d' \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\times
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
\;\rightarrow\;
\begin{cases}
a' = 2 \cdot a + 3 \cdot b + c + d \\
b' = a + 2 \cdot b + 3 \cdot c + d \\
c' = a + b + 2 \cdot c + 3 \cdot d \\
d' = 3 \cdot a + b + c + 2 \cdot d
\end{cases}
\qquad (1)
$$
Since 3·v = 2·v + v, each product by 3 can be replaced by a product by 2 and one addition:
$$
\rightarrow\;
\begin{cases}
a' = 2 \cdot a + (2 \cdot b + b) + c + d \\
b' = a + 2 \cdot b + (2 \cdot c + c) + d \\
c' = a + b + 2 \cdot c + (2 \cdot d + d) \\
d' = (2 \cdot a + a) + b + c + 2 \cdot d
\end{cases}
$$
$$
x = (2 \cdot a) \bmod (x^8 + x^4 + x^3 + x + 1)
\;\rightarrow\;
\begin{cases}
x_7 = a_6 \\
x_6 = a_5 \\
x_5 = a_4 \\
x_4 = a_7 \oplus a_3 \\
x_3 = a_7 \oplus a_2 \\
x_2 = a_1 \\
x_1 = a_7 \oplus a_0 \\
x_0 = a_7
\end{cases}
\qquad (4)
$$
Fig. 6 Proposed Mix-Columns-1 with Shift, NOT, and Mux 2 × 1
As 2·α is essentially a one-bit left shift (α denotes the input byte a of (4)), another way to calculate 2·α uses a one-bit left shift, one 2 × 1 Mux, and a NOT logic gate. In (4), when a7 = 0, 2·α equals α shifted left by one bit; when a7 = 1, 2·α is α shifted left by one bit and XORed with x^4 + x^3 + x + 1, which is equal to ‘1b’ in hex. As ‘1b’ is fixed, the XOR reduces to NOT gates on the affected bits. The diagram for calculating 2·α by Mux, Shift, and NOT is shown in Fig. 6. In the proposed design, Mix-Columns is executed in two stages: in the first stage, called ‘Mix-Columns-1’, 2·α is calculated; in the second stage, called ‘Mix-Columns-2’, the final values of (1) are computed.
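The two-stage split described above can be sketched in Python as follows. This is only a software illustration under stated assumptions: the function names are hypothetical, the conditional expression stands in for the 2 × 1 Mux, and the additions in (1) are taken as XOR, which is how addition works in GF(2^8). The sketch checks that the bit-level equations in (4) agree with the shift-and-XOR form, and then computes one column in two stages.

```python
def xtime_shift_xor(a):
    """2*a in GF(2^8): one-bit left shift, then XOR with 0x1b when a7 = 1."""
    shifted = (a << 1) & 0xFF
    return shifted ^ 0x1B if a & 0x80 else shifted


def xtime_bits(a):
    """2*a written bit by bit, following the eight equations of (4)."""
    bit = [(a >> i) & 1 for i in range(8)]  # bit[i] is a_i
    a7 = bit[7]
    x = [
        a7,           # x0 = a7
        bit[0] ^ a7,  # x1 = a7 xor a0
        bit[1],       # x2 = a1
        bit[2] ^ a7,  # x3 = a7 xor a2
        bit[3] ^ a7,  # x4 = a7 xor a3
        bit[4],       # x5 = a4
        bit[5],       # x6 = a5
        bit[6],       # x7 = a6
    ]
    return sum(b << i for i, b in enumerate(x))


def mix_columns_1(column):
    """Stage 1: compute 2*v for each of the four bytes of a column."""
    return [xtime_shift_xor(v) for v in column]


def mix_columns_2(column, doubled):
    """Stage 2: combine v and 2*v with XORs, using 3*v = 2*v xor v."""
    a, b, c, d = column
    a2, b2, c2, d2 = doubled
    return [
        a2 ^ (b2 ^ b) ^ c ^ d,
        a ^ b2 ^ (c2 ^ c) ^ d,
        a ^ b ^ c2 ^ (d2 ^ d),
        (a2 ^ a) ^ b ^ c ^ d2,
    ]


def mix_column(column):
    """Full Mix-Columns on one column as the composition of both stages."""
    return mix_columns_2(column, mix_columns_1(column))


# Both ways of doubling agree for every byte value.
print(all(xtime_bits(a) == xtime_shift_xor(a) for a in range(256)))  # True

# Widely used Mix-Columns test column.
print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

Splitting the work this way leaves only XORs in the second stage, which matches the stated motivation of balancing the delay between pipeline stages.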
Fig. 7 Implementation of Sub-Bytes using composite field with the combination of Af and δ^-1
4.2 Sub-Bytes
Sub-Bytes is the most costly transformation in AES, in both time and area. The first step in finding the values of this function is to calculate the multiplicative inverse of the input over GF(2^8). The multiplicative inverse of f(x) is the g(x) such that f(x) ⋅ g(x) mod (x^8 + x^4 + x^3 + x + 1) = 1. The second step is the computation of an Af. The important effect of applying an invertible Af before and after the Sub-Bytes is that the resistance of S-boxes against most attacks remains unchanged [18]. Equation (5) is the Af, in which g(x) = (a′7 a′6 a′5 a′4 a′3 a′2 a′1 a′0) is the multiplicative inverse of f(x), and Y is the Sub-Bytes of f(x)
$$
Y = \mathrm{Af}(g(x)) + \Phi =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix} a'_7 \\ a'_6 \\ a'_5 \\ a'_4 \\ a'_3 \\ a'_2 \\ a'_1 \\ a'_0 \end{bmatrix}
\oplus
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
\qquad (5)
$$
There are different hardware implementations of the Sub-Bytes function. Direct calculation of the multiplicative inverse is not easy. As all of the values are available, one straightforward implementation of the S-box is to pre-calculate the values and store them in a LUT read-only memory. The most significant part of the input determines the row of the table and the least significant part determines the column. Reading and writing between the LUT and registers incurs high latency and more area. Thus, a LUT is not an efficient method for high-speed applications and pipelined designs.
Another way is Boolean simplification of the pre-calculated Sub-Bytes truth table, based on the Karnaugh map, to obtain a direct relation between the input and output of the Sub-Bytes function. Simplifying the whole S-box table is not easy. The authors of [4] divided the S-box table into 16 sub-tables, and for each sub-table the SoP is calculated as a logic-function module. A 16-to-1 multiplexer was used to select the output. The most significant 4 bits are the inputs of the modules and the least significant 4 bits are used as the selector of the multiplexer. Although this method reduces the latency compared with the LUT-based S-box, it also needs a large area.
Instead of working with the S-box table, the multiplicative inverse is calculated by composite field arithmetic and followed by the Af. The composite field arithmetic is based on GF(2^1), GF(2^2), and GF((2^2)^2). As a result, an element of GF(2^8) should be mapped to its composite field representation. The isomorphic function, ‘δ’, and its inverse, ‘δ^-1’, are used to transfer between GF(2^8) and its composite field and vice versa ((6) and (7)). After mapping to the composite field, the elements in GF(2^8) are decomposed to the lower order fields of GF((2^2)^2), GF(2^2), and GF(2^1) by the irreducible polynomials (8)
$$
\delta \times x =
\begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}
\times
\begin{bmatrix} x_7 \\ x_6 \\ x_5 \\ x_4 \\ x_3 \\ x_2 \\ x_1 \\ x_0 \end{bmatrix}
\qquad (6)
$$
$$
\delta^{-1} \times x =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix} x_7 \\ x_6 \\ x_5 \\ x_4 \\ x_3 \\ x_2 \\ x_1 \\ x_0 \end{bmatrix}
\qquad (7)
$$
$$
\begin{aligned}
GF(2^2) \rightarrow GF(2) &: \quad P_0(x):\ x^2 + x + 1 \\
GF((2^2)^2) \rightarrow GF(2^2) &: \quad P_1(x):\ x^2 + x + \varphi, \text{ where } \varphi = \{10\}_2 \\
GF(((2^2)^2)^2) \rightarrow GF((2^2)^2) &: \quad P_2(x):\ x^2 + x + \lambda, \text{ where } \lambda = \{1100\}_2
\end{aligned}
\qquad (8)
$$
Composite field arithmetic reduces the area of Sub-Bytes and is well suited to pipelined design; its disadvantage is hardware complexity. As mentioned, Sub-Bytes consists of two steps: in the first step, the multiplicative inverse in GF(2^8) is calculated; the second step is to apply the Af. Let ‘Mul’ and ‘S’ denote the multiplicative inverse and Sub-Bytes, respectively. Thus, Sub-Bytes can be written as follows:
$$
\begin{aligned}
S(f(x)) &= \mathrm{Af}(g(x)) + \Phi \\
S(f(x)) &= \mathrm{Af}(\delta^{-1}(\mathrm{Mul}(\delta \cdot f(x)))) + \Phi \\
S(f(x)) &= \mathrm{Af} \cdot \delta^{-1}(\mathrm{Mul}(\delta \cdot f(x))) + \Phi \\
S(f(x)) &= \gamma(\mathrm{Mul}(\delta \cdot f(x))) + \Phi
\end{aligned}
\qquad (9)
$$
Here ‘γ’ is calculated as ‘γ = Af·δ^-1’; thus, instead of using two different matrices in two different stages, one matrix is used in the design. γ is called New-Aff. New-Aff is written as
$$
\gamma = \mathrm{Af} \cdot \delta^{-1} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}
\times
\begin{bmatrix} x_7 \\ x_6 \\ x_5 \\ x_4 \\ x_3 \\ x_2 \\ x_1 \\ x_0 \end{bmatrix}
\qquad (10)
$$
Fig. 7 shows the multiplicative inverse in GF(2^8) employed in the proposed architecture. In Fig. 7, ‘δ’ is the isomorphic mapping to composite fields, ‘x^2’ is the squarer in GF(2^4), ‘λ’ is the multiplication with constant ‘λ’ in GF(2^4), ‘x^-1’ is the multiplicative inversion in GF(2^4), and ‘X’ is the multiplication operation in GF(2^4).
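The matrix merge behind New-Aff can be checked numerically. The Python sketch below enters the matrices of (5), (6), (7), and (10) row by row (most significant bit first), verifies that Af·δ^-1 over GF(2) reproduces γ, and confirms that the merged form in the last line of (9) gives the same S-box values as the two-step form. All function names are illustrative assumptions. In particular, `mul_stand_in` is not the composite-field inverter of Fig. 7: it is a hypothetical stand-in that maps back to GF(2^8), inverts there by brute force, and maps forward again, which is enough to test the algebra of (9).

```python
def rows(*bit_rows):
    """Build an 8x8 GF(2) matrix; each row is an int whose MSB is the x7 column."""
    return [int(r, 2) for r in bit_rows]


AF = rows("11111000", "01111100", "00111110", "00011111",
          "10001111", "11000111", "11100011", "11110001")         # (5)
DELTA = rows("10100000", "11011110", "10101100", "10101110",
             "11000110", "10011110", "01010010", "01000011")      # (6)
DELTA_INV = rows("11100010", "01000100", "01100010", "01110110",
                 "00111110", "10011110", "00110000", "01110101")  # (7)
GAMMA = rows("10001100", "11110000", "10000100", "10010011",
             "00000111", "01111101", "10000001", "11000111")      # (10)
PHI = 0x63  # constant vector of (5), read from bit 7 down to bit 0


def matmul(a, b):
    """Matrix product over GF(2): XOR the rows of b selected by each row of a."""
    out = []
    for row in a:
        acc = 0
        for j in range(8):
            if (row >> (7 - j)) & 1:
                acc ^= b[j]
        out.append(acc)
    return out


def apply(m, x):
    """Multiply matrix m by byte x; the first row produces output bit 7."""
    y = 0
    for row in m:
        y = (y << 1) | (bin(row & x).count("1") & 1)
    return y


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Brute-force multiplicative inverse in GF(2^8); 0 is mapped to 0."""
    return next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)


def sbox_direct(x):
    """First line of (9): inverse in GF(2^8), then Af and the constant."""
    return apply(AF, gf_inv(x)) ^ PHI


def mul_stand_in(v):
    """Stand-in (assumption) for 'Mul' acting on composite-field values."""
    return apply(DELTA, gf_inv(apply(DELTA_INV, v)))


def sbox_merged(x):
    """Last line of (9): a single matrix gamma replaces Af and delta^-1."""
    return apply(GAMMA, mul_stand_in(apply(DELTA, x))) ^ PHI


print(matmul(AF, DELTA_INV) == GAMMA)                      # True
print(hex(sbox_direct(0x53)), hex(sbox_merged(0x53)))      # 0xed 0xed
```

Merging the two matrices removes one matrix stage from the Sub-Bytes datapath without changing its output, which is the area saving the text attributes to New-Aff.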
In Fig. 7, λx^2 is the squarer in GF(2^4) followed by multiplication with the constant λ in GF(2^4). Every binary number in the Galois field ‘q’ can be represented by bx + c, where ‘b’ is the upper half-term (‘qH’) and ‘c’ is the lower half-term (‘qL’). By considering this idea
Fig. 8 Implementations of inversion in GF(2^4) by square and multiplication approach
Fig. 10 The proposed block diagram of AES with pipeline and loop unrolled technique
(a) Loop unrolled and outer pipelined stages, (b) Inner pipelined stages
AES-128 runs ten rounds, which together with the initial Add-Round-Key need 4 × 11 = 44 columns of keys. In the proposed design, keys are expanded before running the algorithm and are stored in registers.
5.3 Security analysis of the proposed cryptosystem
The security of a cryptosystem has two parts: the algorithm and the implementation. In the proposed architecture, AES is implemented without any alteration to the algorithm's functions. Since the content of AES has not been changed, the design is expected to keep the same level of confidentiality and security.
On the implementation side, some attacks and threats depend on the implementation design. A side-channel attack is implementation-dependent. The target of side-channel attacks is to exploit the relationship between physical attributes, such as power consumption and timing behaviour, and the manipulated data. Loop unrolling is a recommended technique for increasing the security against side-channel attacks [21]. Loop unrolling, inner-round pipelining, and outer-round pipelining are used in the design, which guarantees reasonable protection from side-channel attacks. Also, the proposed architecture is secure against timing attacks because (i) there are no conditional branches and no dependency between the inputs and the cipher-text; (ii) for encrypting each plaintext, the proposed architecture executes in a constant number of clock cycles; and (iii) the critical path delay of the proposed architecture is constant during the execution [22].
6 Implementation results, simulation, and comparison
The results of the implementation are reported in this section. The target FPGA is a Xilinx Virtex-5 (XC5VLX85-FF676-3); ISE 14.7 and ISim 14.7 are used for synthesis, simulation, and post-placement timing results with VHDL. In the pipelined AES implementation, a plaintext block is accepted in each clock cycle. As the design is pipelined, the throughput is calculated as the number of bits processed per cycle, which is 128 for the design, multiplied by the frequency. Thanks to the pipelined implementation, the reduced latency obtained by inserting pipeline stages in optimised places, and the exchanging and merging of some parts of AES, the achieved clock period is 1.607 ns and the frequency is 622.4 MHz. The throughput of the design is therefore 128 bit × 622.4 MHz ≈ 79.7 Gbps. Moreover, the FPGA efficiency (FPGA-Eff) is a parameter that takes the hardware resources of a design into account; it is calculated as the design's throughput divided by the number of slices (in Mbps per slice). However, different articles calculate FPGA-Eff with different slice counts, namely the number of slice LUTs or the number of occupied slices (the details of the implementation results are given in Table 2), whose totals in the Virtex-5 (XC5VLX85) are 51,840 and 12,960, respectively. Table 2 shows the results of the proposed work and some other comparable FPGA implementations. The first five implementations, including our work, are on the same platform. The experiments performed in [5, 13, 23] are in CTR mode. In [5], four 128-bit AES cores are executed in parallel; as the authors of [5] state that the throughput of a single core is a quarter of that of their four-core design, the throughput for one core is equal to 44.7 Gbps. The authors of [5] did not mention which slice count they used for calculating FPGA-Eff. Since the slice number reported in [5] is close to the total number of slice LUTs in the FPGA, it is assumed that the reported number is the number of slice LUTs. The encryption part of the AES algorithm designed by Farashahi et al. [25] included a feedback loop and an iterative architecture, which occupied less area and required more clock cycles. To make a fair comparison with [25], AT = area × time has been calculated, which is a common and widely used measure in the literature, including [31]. As the proposed cryptosystem has been tested with 688 × 576 pixel images, the encryption time in [25] has been manually calculated, and it is equal to 507 µs (for the proposed design it is equal to 40 µs). Then, AT is 1.8 and 0.6 s × slices for [25] and the proposed design, respectively, which is a 66% improvement for the same encryption task. As a result, although the design in [25] occupies few LUTs, it is not suitable for encrypting large sequential and dependent data, such as images. Although the method proposed in [13] was implemented on a different FPGA, it is pipelined with CTR mode, which is very similar to our work. As is observed from Table 2, the proposed design has the highest throughput, 79.7 Gbps. The throughput of the proposed design is improved by 8.02% compared with the best previous work [23] implemented on the mentioned FPGA and in the same cryptography mode. By considering the number of occupied slices, the proposed design's FPGA-Eff is 13.3 Mbps/slice, a 22.63% improvement over [29].
The proposed cryptosystem can encrypt and decrypt different plaintexts and plain-images. The proposed design is tested on some medical images for two reasons: first, images have sequential and dependent data blocks; second, the transmission of medical images over the network has increased rapidly. To show that the design is independent of the input, coloured images of different sizes have also been tested with the proposed cryptosystem. Block random access memories (BRAM) are used to store image data when testing encryption or decryption using the proposed design. Afterwards, the data is sent to the cryptosystem, and the en/decrypted results are then sent back to the host PC. Table 2 lists the results of our proposed AES cryptosystem and offers a direct comparison to related works.
The histogram of an image is a statistical feature that represents the total distribution of pixel values in a digital image. A secure and effective cryptosystem should produce a uniform cipher histogram distribution, which guarantees the resistance against statistical attacks. Fig. 13 depicts the main tested images, the encryption (in noiseless status) of images, and the histograms of the images and their corresponding ciphers.
Fig. 13 Encryption, decryption, and histogram of coloured images with different size by using the proposed cryptosystem
(a) Main images, (b) Histograms of main images, (c) Encrypted images, (d) Histograms of encrypted images
The histogram of a coloured image is produced by taking the histograms over each of its three channels (red, green, and blue). As can be seen in Fig. 13, the histograms of the images encrypted by the proposed cryptosystem are uniform for every encrypted image and different from those of the input images. The proposed cryptosystem encrypts a grey-scale image with 688 × 576 pixels in 40 µs. Every coloured image has three channels (red, green, and blue); to encrypt (or decrypt) a coloured image, every channel should be encrypted (or decrypted) independently and the channels finally merged together. A coloured image with 688 × 576 × 3 pixels is encrypted in 120 µs.
7 Conclusion
In this work, a high-throughput and high FPGA-Eff implementation of the AES-128 algorithm in CTR mode for high-traffic applications has been presented. In this design, loop unrolling, inner pipelining, and outer pipelining techniques are employed. Using the pipelining and loop unrolling techniques has increased the security against hardware-dependent attacks. The main factors in achieving the high throughput and efficiency are: reducing the design's delay by exchanging and merging AES functions; optimising Sub-Bytes, by employing New-Aff, and Mix-Columns; and reducing the critical path of the composite field arithmetic. To achieve a low delay and hence a high frequency, effort has been devoted to making the delays of the different sub-pipeline stages approximately equal. The target FPGA was a Virtex-5 (XC5VLX85-FF676-3), and the achieved frequency, throughput, and FPGA-Eff were 622.4 MHz, 79.7 Gbps, and 13.3 Mbps/slice, respectively. In comparison with others, the results show that this implementation of AES provides the best performance in terms of throughput, maximum frequency, and efficiency.
8 Acknowledgments
This work was supported by the University of Saskatchewan Dean's Scholarship and NSERC.
9 References
[1] Granado-Criado, J.M., Vega-Rodríguez, M.A., Sánchez-Pérez, J.M., et al.: ‘A new methodology to implement the AES algorithm using partial and dynamic reconfiguration’, Integration, 2010, 43, (1), pp. 72–80
[2] Shahbazi, K., Eshghi, M.: ‘Design of a specific instructions set processor for AES algorithm’, Int. J. Comput. Appl., 2014, 93, (4), pp. 36–40
[3] Ali, L., Aris, I., Hossain, F.S., et al.: ‘Design of an ultra high speed AES processor for next generation IT security’, Comput. Electr. Eng., 2011, 37, (6), pp. 1160–1170
[4] Ahmad, N., Hasan, R., Jubadi, W.M.: ‘Design of AES S-box using combinational logic optimization’. IEEE Symp. on Industrial Electronics & Applications (ISIEA), Penang, Malaysia, October 2010, pp. 696–699
[5] Soltani, A., Sharifian, S.: ‘An ultra-high throughput and fully pipelined implementation of AES algorithm on FPGA’, Microprocess. Microsyst., 2015, 39, (7), pp. 480–493
[6] Oukili, S., Bri, S., Kumar, A.V.S.: ‘High speed efficient FPGA implementation of pipelined AES S-box’. 4th IEEE Int. Colloquium on Information Science and Technology (CiSt), Marrakesh, Morocco, October 2016, pp. 901–905
[7] Rachh, R.R., Anami, B.S., Mohan, P.V.A.: ‘Efficient implementations of S-box and inverse S-box for AES algorithm’. TENCON, IEEE Region 10 Int. Conf., Singapore, January 2009, pp. 1–6
[8] Shastry, P.V.S., Agnihotri, A., Kachhwaha, D., et al.: ‘A combinational logic implementation of S-box of AES’. IEEE 54th Int. Midwest Symp. on Circuits and Systems (MWSCAS), Seoul, South Korea, August 2011, pp. 1–4
[9] Shahbazi, K., Eshghi, M., Mirzaee, R.F.: ‘Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5’, Eng. Sci. Technol. Int. J., 2017, 20, (4), pp. 1308–1317
[10] Soliman, M.I., Abozaid, G.Y.: ‘FPGA implementation and performance evaluation of a high throughput crypto coprocessor’, J. Parallel Distrib. Comput., 2011, 71, (8), pp. 1075–1084
[11] Li, Z., Zhuang, Y., Zhang, C., et al.: ‘Low-power and area-optimized VLSI implementation of AES coprocessor for Zigbee system’, J. China Univ. Posts Telecommun., 2009, 16, (3), pp. 89–94
[12] Anwar, H., Daneshtalab, M., Ebrahimi, M., et al.: ‘FPGA implementation of AES-based crypto processor’. IEEE 20th Int. Conf. on Electronics, Circuits, and Systems (ICECS), Abu Dhabi, United Arab Emirates, December 2013, pp. 369–372
[13] Granado-Criado, J.M., Vega-Rodríguez, M.A.: ‘Hardware coprocessors for high-performance symmetric cryptography’, J. Supercomput., 2017, 73, (6), pp. 2456–2482
[14] Yoo, S.-M., Kotturi, D., Pan, D.W., et al.: ‘An AES crypto chip using a high-speed parallel pipelined architecture’, Microprocess. Microsyst., 2005, 29, (7), pp. 317–326
[15] Guo, Z., Li, G., Liu, Y.: ‘Dynamic reconfigurable implementations of AES algorithm based on pipeline and parallel structure’. The 2nd Int. Conf. on Computer and Automation Engineering (ICCAE), Singapore, February 2010, pp. 257–260
[16] Wang, C., Heys, H.M.: ‘Using a pipelined S-box in compact AES hardware implementations’. Proc. 8th IEEE Int. NEWCAS Conf., Montreal, Canada, June 2010, pp. 101–104
[17] Canright, D.: ‘A very compact S-box for AES’. Int. Workshop on Cryptographic Hardware and Embedded Systems, Edinburgh, United Kingdom, August 2005, pp. 441–455
[18] Aldabbagh, S.S.M., Al Shaikhli, I.F.T.: ‘Security of present S-box’. Int. Conf. on Advanced Computer Science Applications and Technologies (ACSAT), Kuala Lumpur, Malaysia, November 2012, pp. 219–222
[19] Abed, S., Jaffal, R., Mohd, B.J., et al.: ‘FPGA modeling and optimization of a SIMON lightweight block cipher’, Sensors, 2019, 19, (4), pp. 913–940
[20] Mazumdar, B., Ali, S.S., Sinanoglu, O.: ‘A compact implementation of Salsa20 and its power analysis vulnerabilities’, ACM Trans. Des. Autom. Electron. Syst., 2016, 22, (1), pp. 1–26
[21] Bhasin, S., Graba, T., Danger, J.-L., et al.: ‘A look into SIMON from a side-channel perspective’. IEEE Int. Symp. on Hardware-Oriented Security and Trust (HOST), Arlington, USA, May 2014, pp. 56–59
[22] Kocher, P.C.: ‘Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems’. Annual Int. Cryptology Conf., Santa Barbara, USA, August 1996, pp. 104–113
[23] Qu, S., Shou, G., Hu, Y., et al.: ‘High throughput, pipelined implementation of AES on FPGA’. Int. Symp. on Information Engineering and Electronic Commerce, Ternopil, Ukraine, May 2009, pp. 542–545
[24] Kouzehzar, H., Moghadam, M.N., Torkzadeh, P.: ‘A high data rate pipelined architecture of AES encryption/decryption in storage area networks’. Iranian Conf. on Electrical Engineering (ICEE), Mashhad, Iran, May 2018, pp. 23–28
[25] Farashahi, R.R., Rashidi, B., Sayedi, S.M.: ‘FPGA based fast and high-throughput 2-slow retiming 128-bit AES encryption algorithm’, Microelectron. J., 2014, 45, (8), pp. 1014–1025
[26] Li, H., Ding, J., Pan, Y.: ‘Cell array reconfigurable architecture for high-efficiency AES system’, Microelectron. Reliab., 2012, 52, (11), pp. 2829–2836
[27] El Maraghy, M., Hesham, S., El Ghany, M.A.A.: ‘Real-time efficient FPGA implementation of AES algorithm’. IEEE Int. SOC Conf., Erlangen, Germany, September 2013, pp. 203–208
[28] Rais, M.H., Qasim, S.M.: ‘FPGA implementation of Rijndael algorithm using reduced residue of prime numbers’. IEEE 4th Int. Design and Test Workshop (IDT), Riyadh, Saudi Arabia, November 2009, pp. 1–4
[29] Rais, M.H., Qasim, S.M.: ‘Efficient hardware realization of advanced encryption standard algorithm using Virtex-5 FPGA’, Int. J. Comput. Sci. Netw. Secur., 2009, 9, (9), pp. 59–63
[30] Banu, J.S., Vanitha, M., Vaideeswaran, J., et al.: ‘Loop parallelization and pipelining implementation of AES algorithm using OpenMP and FPGA’. IEEE Int. Conf. on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN), Tirunelveli, India, March 2013, pp. 481–485
[31] Ebrahimi, S., Bayat-Sarmadi, S., Mosanaei-Boorani, H.: ‘Post-quantum cryptoprocessors optimized for edge and resource-constrained devices in IoT’, IEEE Internet of Things J., 2019, 6, (3), pp. 5500–5507