
5500 IEEE INTERNET OF THINGS JOURNAL, VOL. 6, NO. 3, JUNE 2019

Post-Quantum Cryptoprocessors Optimized for Edge and Resource-Constrained Devices in IoT
Shahriar Ebrahimi, Siavash Bayat-Sarmadi, Member, IEEE, and Hatameh Mosanaei-Boorani

Abstract—With the exponential increase in applications of the Internet of Things (IoT), such as smart ecosystems or e-health, more security threats have been introduced. In order to resist known attacks on IoT networks, multiple security protocols must be established among nodes. Thus, IoT devices are required to execute various cryptographic operations, such as public key encryption/decryption. However, classic public key cryptosystems, such as Rivest–Shamir–Adleman and elliptic curve cryptography, are computationally too complex to be efficiently implemented on IoT devices and are vulnerable to quantum attacks. Therefore, once quantum computing is fully developed, these cryptosystems will be neither secure nor practical. In this paper, we propose InvRBLWE, an optimized variant of the binary learning with errors over the ring (Ring-LWE) scheme that is proven to be secure against quantum attacks and is highly efficient for hardware implementations. We propose two architectures for InvRBLWE: 1) a high-speed architecture targeting edge and powerful IoT devices and 2) an ultralightweight architecture, which can be implemented on resource-constrained nodes in IoT. The proposed architectures are scalable regarding security levels and we provide experimental results for two versions of the InvRBLWE scheme providing 84 and 190 bits of classic security. Our implementation results on field programmable gate array dominate the best of the previous classic and post-quantum implementations. Moreover, our two different application specific integrated circuit (ASIC) implementations show improvement in terms of speed, area, power, and/or energy. To the best of our knowledge, we are the first to implement learning with errors-based cryptosystems on an ASIC platform.

Index Terms—Hardware implementation, Internet of Things (IoT), lattice-based cryptography, post-quantum cryptography, ring learning with errors (Ring-LWEs).

Manuscript received December 19, 2018; revised February 4, 2019; accepted February 18, 2019. Date of publication March 5, 2019; date of current version June 19, 2019. This work was supported in part by the Sharif University of Technology under Grant G960803. (Corresponding author: Siavash Bayat-Sarmadi.)
The authors are with the Department of Computer Engineering, Sharif University of Technology, Tehran 145888-9694, Iran (e-mail: shebrahimi@ce.sharif.edu; sbayat@sharif.edu; mosanaei@ce.sharif.edu).
Digital Object Identifier 10.1109/JIOT.2019.2903082

I. INTRODUCTION

The Internet of Things (IoT) introduces a dynamic and highly adaptive network by extending the connectivity abilities to embedded devices, such as sensors and actuators. Therefore, in recent years, IoT applications, such as smart transportation, home automation, manufacturing automation, and e-healthcare have grown exponentially among various customers ranging from end-users to big organizations. Connecting more IoT devices, especially resource-constrained ones, to the network results in higher security threats [1].

While IoT shapes the future of the Internet, communications among IoT nodes must be secured against different attacks, malware, and viruses. To implement secure channels for communications, each IoT node must execute at least a few operations from cryptographic primitives, such as public key encryption/decryption. Fig. 1 presents the overall architecture of current advanced IoT networks that are constructed from three major layers: 1) cloud; 2) edge devices; and 3) IoT end-nodes. While servers and some of the edge devices can benefit from high-performance resources, such as 64-bit processors and field programmable gate arrays (FPGAs), the end-nodes have constrained resources. A practical cryptographic hardware solution for such resource-constrained devices can only be achieved through implementation of the architecture on application specific integrated circuit (ASIC) platforms.

Fig. 1. Architecture of IoT network and available hardware resources in each layer.

We note that most classic public key cryptosystems are computationally too complex to be efficiently implemented on IoT nodes [2], [3]. In addition to such high complexity, current classic public key cryptosystems, such as Rivest–Shamir–Adleman (RSA) [4] and elliptic curve cryptography (ECC) [5], [6], rely on hard problems that have polynomial-time solutions using quantum algorithms [7]. Thus, when quantum computing gains the required computation power, all of the classic cryptosystems lose their security levels and will need very large keys to remain secure. This makes current implementations of RSA and ECC impractical in the post-quantum era due to increased hardware implementation complexity. Therefore, it is necessary to consider alternative cryptosystems that rely on other hard problems and are quantum-resistant. Some examples include shortest/closest
2327-4662 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
EBRAHIMI et al.: POST-QUANTUM CRYPTOPROCESSORS OPTIMIZED FOR EDGE AND RESOURCE-CONSTRAINED DEVICES IN IoT 5501

vector problems (SVP/CVP) in lattices [8], learning with errors (LWEs) [9] and its variants [10]–[12], code-based cryptography [13], and isogenies over elliptic curves [14], [15].

There are multiple cryptosystems proposed based on the aforementioned hard problems, among which lattice-based cryptography is more practical for resource-constrained nodes in IoT due to its relatively fast operations with low complexity. In the recent post-quantum cryptography standardization by the National Institute of Standards and Technology (NIST) [16], most of the proposed schemes in the first round of general submissions are based on lattice hard problems [17]–[19].

In 2009, Regev [9] introduced a new lattice-based cryptosystem that relies on the hardness of the LWE problem. The search version of the problem is defined as finding the secret s in a linear combination a·s + e, where a is known and e is an error drawn from a certain distribution. LWE and its variants have shown resistance to various types of classic and quantum attacks [10]–[12], [20]–[22]. In 2010, a new variant of LWE based on ring theory, called Ring-LWE, was introduced in [10]. It utilizes ideal lattices and has relatively smaller key sizes compared to the original LWE scheme. In 2016, Buchmann et al. [11] proposed a new variant of Ring-LWE, namely Ring-BinLWE, by choosing errors from a binary distribution instead of a Gaussian one.

Many hardware implementations for LWE-based schemes have been proposed [17], [23]–[25]. Recently, a hardware implementation based on Ring-BinLWE [11] has been proposed in [25] that is relatively faster than previous implementations for Ring-LWE schemes.

In this paper, we propose a hardware-optimized variant of Ring-BinLWE, hereafter referred to as InvRBLWE. Our optimization utilizes the inverted ring of Ring-BinLWE and therefore matches the 2's-complement notation range, which is highly optimized for hardware implementation. In Section III, we justify that operations over the ring require no reduction when ring elements are represented in 2's-complement notation and thus, all reduction operations are omitted from the InvRBLWE scheme. We propose two architectures for InvRBLWE targeting different IoT devices.
1) High-Speed: A fast and low-complexity architecture, which is a good match for edge and high-performance devices in IoT.
2) Ultralightweight: An ultra low-power and low-area architecture targeting resource-constrained end-nodes that are powered by batteries or energy harvesting units in IoT.

Regarding implementation results, the high-speed architecture has been implemented on FPGA, which is considered a high-performance platform practical for edge and powerful IoT devices. Our FPGA implementations dominate previous hardware implementations of any LWE-based scheme by improving Area × Time (AT) complexity by at least 52% in encryption/decryption operations. The proposed architectures for InvRBLWE are platform-independent and can also be implemented on ASIC platforms, which are ideal for crypto-processors in IoT devices. The ASIC high-speed implementation results appear to be at least two times faster than previous work. Additionally, the ultralightweight ASIC implementation consumes 62% and 66% less area and power compared to the most lightweight implementation of ECC, respectively. Furthermore, the proposed architectures are shown to be resistant against simple power analysis (SPA) [26] and timing attacks [27]. The main contributions of this paper are summarized as follows.
1) A variant of the Ring-BinLWE scheme has been proposed, namely InvRBLWE, which is fully optimized for hardware implementation.
2) The operation cost in the proposed scheme is reduced by omitting all of the required reduction operations.
3) Our high-speed implementation dominates previous work in terms of speed on both FPGA and ASIC platforms.
4) The high-speed ASIC implementation of the LWE-based cryptosystem requires significantly lower energy compared to the best ASIC implementations of classic and post-quantum cryptosystems. This architecture is best suited for battery-based IoT devices.
5) The ultralightweight implementation of InvRBLWE on the ASIC platform consumes less power and area than ECC requires. To the best of our knowledge, this is the first public key implementation to have such low power consumption that it can even be supplied by Vibration Piezo or electromagnetic (EM) [3] energy harvesting units.

The rest of the paper is organized as follows. In Section II, the required background is described. Section III presents the proposed scheme optimization, and Section IV provides detailed information on the proposed architectures. We compare our implementation results against previous work on both FPGA and ASIC platforms in Section V. Finally, the paper is concluded in Section VI.

II. PRELIMINARIES

In this section, we describe the required background and notations to follow the rest of this paper. At first, we make a few clarifications regarding mathematical notations used in this paper. Later, this paper presents a brief history of developments in lattice-based cryptosystems. Finally, we describe the Ring-BinLWE scheme [11] in detail.

A. Ring Theory

For an integer $q \in \mathbb{Z}$, we define the finite ring $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z} = \{0, 1, \ldots, q-1\}$. In other words, $\mathbb{Z}_q$ is a finite ring, where q is its modulus. For an integer $n \in \mathbb{Z}$, the set of all n-dimensional vectors whose components each belong to $\mathbb{Z}_q$ is denoted as $\mathbb{Z}_q^n = \{(w_0, w_1, \ldots, w_{n-1}) \mid w_i \in \mathbb{Z}_q\}$.

The set of all polynomials whose coefficients belong to $\mathbb{Z}$ is denoted as $\mathbb{Z}[x]$. Similarly, we refer to the set of all polynomials that have their coefficients chosen from $\mathbb{Z}_q$ as $\mathbb{Z}_q[x]$. Thus, each vector in $\mathbb{Z}_q^n$ can be mapped to a unique polynomial of degree at most n − 1 in $\mathbb{Z}_q[x]$. In addition to simple rings over integers, we can extend ring definitions to polynomials. Therefore, the ring of polynomials R is defined as $R = \mathbb{Z}[x]/f(x)$, where f(x) is the modulus of the ring. If all of the coefficients of the polynomials in the ring are chosen from $\mathbb{Z}_q$, we refer to the polynomial ring as $R_q = \mathbb{Z}_q[x]/f(x)$.

B. Ring-BinLWE Scheme

In 2009, Regev introduced a new lattice-based cryptosystem that relies on the hardness of the LWE problem [9]. The search version of the problem is defined as finding a secret $s \in \mathbb{Z}_q^n$ hidden in a known pair (a, b) such that b = as + e, while $b, a \in \mathbb{Z}_q^n$ and $e \in \mathbb{Z}_q$. Moreover, $a \in \mathbb{Z}_q^n$ is known and uniformly random and $e \in \mathbb{Z}_q$ is an error drawn from a distribution ψ over $\mathbb{Z}_q$. There are different choices proposed for ψ, such as Gaussian [9] and binary [11], [12].

A more efficient and practical variant of the LWE scheme, called Ring-LWE, has been introduced in [10]. It is defined over $R_q = \mathbb{Z}_q[x]/f(x)$, whose coefficients are represented in the range (−q/2 + 1, q/2). There are multiple choices for f(x) as the modulus for the ring $R_q$. Assigning $f(x) = x^n + 1$ causes the shift operation of the ring to turn into an anti-circular rotation [10] that is more efficient for software and hardware implementation. It is worth mentioning that after each operation over the ring $R_q$ the result must be checked to decide whether a reduction is required or not. In hardware, the reduction module increases both the overall area and the critical path delay (CPD) of the system.

There are multiple variants of LWE proposed by researchers that suggest new choices for LWE characteristics [12], [20]. In 2016, Buchmann et al. [11] proposed a variant of the Ring-LWE problem using a binary error distribution. Ring-BinLWE has smaller key sizes and does not require any complex operations such as convolutions in the Gaussian distribution. Detailed information on the Ring-BinLWE scheme operations is provided in Fig. 2. The scheme consists of three main phases: 1) key generation; 2) encryption; and 3) decryption. We describe the operations in each phase in the following.

1) Key Generation: This phase starts with choosing two random binary vectors $r_1, r_2 \in \{0, 1\}^n$. These vectors can be mapped to unique polynomials in $R_q$. In addition, a publicly known polynomial $a \in R_q$ is used to calculate the public key $p = r_1 - a \cdot r_2 \in R_q$. The binary vector $r_1$ is a one-time error and will be discarded afterwards, while $r_2$ is Alice's private key. In this scheme, the private and public keys consist of $n$ and $n \times \log_2 q$ bits, respectively.

2) Encryption: To encrypt an n-bit message $m \in \{0, 1\}^n$, Bob encodes the message as a unique polynomial $\bar{m}$ in $R_q$ using the function described in (1). Then three n-bit random binary vectors $e_1, e_2, e_3 \in R_q$ are chosen to calculate the ciphertext using the known polynomial a and Alice's public key p. The ciphertext consists of a pair of two polynomials $c_1 = a \cdot e_1 + e_2$ and $c_2 = p \cdot e_1 + e_3 + \bar{m}$ in $R_q$. The length of the ciphertext is $2 \times n \times \log_2 q$ bits

$$\mathrm{ENCODE}: \{0,1\}^n \to R_q, \qquad (m_0, \ldots, m_{n-1}) \mapsto \sum_{i=0}^{n-1} m_i \left(\frac{q}{2}\right) x^i. \tag{1}$$

3) Decryption: To decrypt the ciphertext, Alice uses the private key $r_2$ to calculate $\bar{m} = c_1 \cdot r_2 + c_2 \in R_q$. Alice uses the function described in (2) to decode $\bar{m}$

$$\mathrm{DECODE}: R_q \to \{0,1\}^n, \qquad \sum_{i=0}^{n-1} a_i x^i \mapsto (m_0, \ldots, m_{n-1}), \qquad m_i = \begin{cases} 1, & \left| a_i - i - \left\lfloor \frac{n-3}{2} \right\rfloor \right| > \frac{q}{4} \\ 0, & \text{else.} \end{cases} \tag{2}$$

Fig. 2. Ring-BinLWE scheme.

The main advantage of Ring-BinLWE compared to the other Ring-LWE-based schemes is the binary error distribution, which can be implemented in hardware more efficiently. Ring-BinLWE improves key and ciphertext sizes compared to the standard Ring-LWE [11].

There are multiple classic and quantum attacks introduced against the LWE problem with binary errors [11], [12], [22]. According to the latest and most successful attacks in [22], choosing the parameter set n = 256 and q = 256 provides 84 and 73 bits of classic and quantum security level, respectively. Moreover, the parameter set n = 512 and q = 256 results in 190 and 140 bits of classic and quantum security level, respectively.

III. SCHEME OPTIMIZATION

As discussed in the previous section, one of the critical modules in the implementation of ring $R_q$ operations is reduction, which should be evaluated after every single operation. In order to make Ring-BinLWE [11] more efficient for hardware implementation, we reconsider the underlying ring $R_q$ and propose an optimized version of the scheme, called InvRBLWE. In the proposed InvRBLWE scheme, the coefficients of the polynomials in $R_q$ are selected from the inverted range compared to the original Ring-BinLWE. In InvRBLWE, every member A(x) in $R_q$ is defined as $A(x) = \sum_{i=0}^{n-1} \alpha_i x^i$, where $\alpha_i \in \mathbb{Z}_q$, which in this scheme is represented in the range (−q/2, q/2 − 1); notice the difference with the original range of Ring-LWE. In InvRBLWE, the range of coefficients exactly matches the 2's-complement notation range for a $\log_2 q$-bit integer. The advantage of InvRBLWE is that the reductions over the inverted ring can be handled easily in hardware implementation. We show that all necessary reductions for any modular operation are performed automatically by normal overflow and underflow in 2's-complement notation according to Lemma 1.

Lemma 1: Assume $q = 2^k$ and $k \in \mathbb{Z}$. Moreover, the ring $\mathbb{Z}_q$ is represented with 2's-complement notation. Then, modular addition/subtraction of two members $a, b \in \mathbb{Z}_q$ does not require any reduction.
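Before the formal proof, the claim of Lemma 1 can be checked exhaustively for a small register width. The sketch below is an illustration, not the authors' hardware: it assumes k = 8 (q = 256, the value used for the implementations in this paper), emulates a k-bit register in Python by masking, and compares the wrapped adder and subtractor outputs against an explicit reduction into the 2's-complement range. All helper names are hypothetical.

```python
# Exhaustive check of Lemma 1 for q = 2^k, assuming k = 8 (q = 256).
# Python integers are unbounded, so a k-bit register is emulated by masking.
# Helper names are illustrative and not taken from the paper.

K = 8
Q = 1 << K
MASK = Q - 1

def to_signed(u):
    """Interpret the low k bits of u as a 2's-complement integer."""
    u &= MASK
    return u - Q if u >= Q // 2 else u

def reduce_centered(x):
    """Explicit modular reduction of x into the range -q/2 .. q/2 - 1."""
    return (x + Q // 2) % Q - Q // 2

def add_register(a, b):
    """k-bit adder: the carry out of the top bit is simply dropped."""
    return to_signed((a & MASK) + (b & MASK))

def sub_register(a, b):
    """k-bit subtractor built as a + (~b) + 1, carry dropped."""
    return to_signed((a & MASK) + ((~b) & MASK) + 1)

# Compare the wrapped results with explicit reduction for every pair (a, b).
ok = all(
    add_register(a, b) == reduce_centered(a + b)
    and sub_register(a, b) == reduce_centered(a - b)
    for a in range(-Q // 2, Q // 2)
    for b in range(-Q // 2, Q // 2)
)
```

With these definitions `ok` is `True`; for instance `add_register(127, 1)` wraps to −128, which is exactly what reducing 128 modulo 256 into the 2's-complement range gives. The subtractor is written as an adder with inverted input and a carry-in of 1, the same trick the high-speed architecture uses for its first adder.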

Proof: The opposite of a number x in 2's-complement notation is computed as $\bar{x} = 2^k - x$. As mentioned earlier, the underflow/overflow in 2's-complement notation exactly matches underflow/overflow in ring $\mathbb{Z}_q$ operations. In case of overflow, in order to execute a reduction in ring $\mathbb{Z}_q$, we subtract q from the result. Equation (3) computes the reduction in this scenario. Similarly, for the case of underflow, an addition of q has to be performed. This is shown in (4).

$$\text{REDUCTION: if } x \ge \tfrac{q}{2} \text{ then } x_{\mathrm{new}} = x - q = x + \bar{q} = x + (2^k - q) = x + (2^k - 2^k) = x + 0 \implies x_{\mathrm{new}} = x. \tag{3}$$

$$\text{REDUCTION: if } x < -\tfrac{q}{2} \text{ then } x_{\mathrm{new}} = x + q = x + 2^k = 2^k - \bar{x} = -\bar{x} \implies x_{\mathrm{new}} = x. \tag{4}$$

Thus, no reduction is required once elements of $\mathbb{Z}_q$ are represented using 2's-complement notation with $k = \log_2 q$ bits.

Apart from the notation of the underlying ring, all operations and parameters of InvRBLWE are exactly the same as in the original Ring-BinLWE. Hence, all of the correctness and security claims in [11] and [12] also apply to InvRBLWE.

In the proposed InvRBLWE, all of the modular operations are performed in an inverted manner compared to the standard Ring-BinLWE. In order to maintain the correctness of the scheme, the encode and decode functions have to be updated accordingly. The encode function in the original Ring-BinLWE maps a binary vector $m \in \{0, 1\}^n$ to a vector $\bar{m} \in R_q$ by assigning q/2 to every nonzero bit in the message [11]. In the notation used in InvRBLWE, which is the same as 2's-complement, (q/2) is congruent to −(q/2). Thus, in contrast to the original encode function [11], the proposed encode function for InvRBLWE maps every nonzero bit of the message m to −(q/2) as shown in (5). The decode function is similarly updated to be the opposite of the original decode function presented in [11], as shown in (6)

$$\mathrm{ENCODE}: \{0,1\}^n \to R_q, \qquad (m_0, \ldots, m_{n-1}) \mapsto \sum_{i=0}^{n-1} m_i \left(-\frac{q}{2}\right) x^i. \tag{5}$$

$$\mathrm{DECODE}: R_q \to \{0,1\}^n, \qquad \sum_{i=0}^{n-1} a_i x^i \mapsto (m_0, \ldots, m_{n-1}), \qquad m_i = \begin{cases} 0, & \left| a_i - i - \left\lfloor \frac{n-3}{2} \right\rfloor \right| > \frac{q}{4} \\ 1, & \text{else.} \end{cases} \tag{6}$$

We note that every implementation of the proposed InvRBLWE scheme can benefit from the omission of reduction modules compared to the standard scheme that has been implemented in [11] and [25]. Therefore, as we show in Sections IV and V, our implementations of binary Ring-LWE have lower complexity, which results in faster operations and higher efficiency compared to previous work.

IV. PROPOSED ARCHITECTURE

In this section, we discuss the proposed architecture and its detailed features. First, we provide information for our high-speed architecture, which offers a straightforward solution for implementing the InvRBLWE scheme. Later, we optimize the area of the proposed scheme for tiny devices to achieve an ultralightweight architecture for InvRBLWE. Finally, the proposed architectures are analyzed for resistance against SPA and timing attacks.

A. High-Speed Architecture

As shown in Section III, the InvRBLWE scheme can be optimized for hardware implementation due to the fact that it does not require any reduction over the ring $R_q$ (while $q = 2^k$). Thus, unlike the standard Ring-BinLWE implementations [25], all of the modular operations can be implemented without any considerations for reduction. Therefore, while decreasing the CPD, the required hardware contains only n parallel adders with a few control signals. Different choices for scheme parameters (n and q) provide multiple security levels of Ring-BinLWE [11], [12]. In this paper, we have chosen two parameter sets of Ring-BinLWE to be implemented by the InvRBLWE scheme. The scheme parameter q is set to 256, while n is set to 256 and 512 to achieve 84 and 190 bits of security, respectively [11].

We propose an all-in-one unified architecture for all three main phases of the scheme (key generation, encryption, and decryption) for hardware implementation using scheme parameter q = 256. Fig. 3 shows the finite state machine (FSM) of all three scheme operations. As shown in Fig. 2, encryption has the highest time-complexity because of its two multiplication and three addition operations. In the following, the detailed information regarding hardware implementation of the decryption phase is provided. The other two phases (key generation and encryption) have similar types of operations as shown in Figs. 2 and 3 and their implementations have only a few alternative control signals. Because all registers and adder units are shared between the different phases of the scheme, all three phases are implemented using one unified architecture.

Fig. 3. FSM diagram for the InvRBLWE scheme.

Decryption: Fig. 4 shows the proposed architecture for decryption. In this phase, one must calculate $\bar{m} = c_1 \cdot r_2 + c_2$, which consists of one multiplication and one addition. The architecture must decode the calculated $\bar{m}$ using the function provided in (6). As shown in Fig. 3, during the multiplication, the control signal S1 is set to zero. Therefore, all of the adders, except

the first one, perform addition. The multiplication inputs are a polynomial in $R_q$ (a in key generation, $c_1$ in decryption) and $r_2 \in \{0, 1\}^n$. Hence, the multiplication can be calculated in n clock cycles using the shift-and-add method, which requires n parallel adders of 8-bit ($\log_2 q$-bit) length as shown in Fig. 4. It is worth mentioning that $r_2$ is loaded into a shift-register, which outputs one bit of $r_2$ in each clock cycle during the multiplication operation.

Fig. 4. High-speed architecture for decryption phase.

Due to the characteristics of the ring $R_q$, the shift operation is performed using an anti-circular rotation because of the modulus $f(x) = x^n + 1$ [11]. To implement such anti-circular rotation in hardware, we simply feed each 8-bit ($\log_2 q$-bit) register to the input of the next adder and finally feed the negative of Res[n-1] (the last 8-bit register) to the first adder. Setting carry_in of the first adder to 1 changes its function to subtraction, which completes the anti-circular rotation. During multiplication, control signal S1 is set to zero and the first adder performs subtraction. The rest of the adders always perform addition and the corresponding carry_in signal is set to zero.

After completion of the multiplication, an additional clock cycle is required to calculate the final addition operation. In this state, the control signal S1 is set to one and all of the adders perform addition. The result ($\bar{m}$ in decryption, or the public key p in key generation) is available after n + 1 clock cycles. During each clock cycle, the results are stored in n registers of 8-bit length. It is worth mentioning that $r_2$ is a binary vector and each bit of $r_2$ needs to be extended to 8 sequential bits ($\log_2 q$ bits). This bit adjustment is performed in hardware with no significant overhead by simple rewiring.

At the end of the decryption phase, the result must be decoded from a polynomial in $R_q$ to a binary vector based on (6). We implement the decode function by an array of n 2-input XOR gates that compare the two most significant bits of each register block, namely Res[7] and Res[6].

B. Ultralightweight Architecture

In order to achieve a lightweight architecture for the InvRBLWE scheme, instead of the straightforward and parallel use of components (i.e., 8-bit adders and registers), we have utilized only one set of such components in a serial manner as shown in Fig. 5. Moreover, in contrast with the high-speed method of performing multiplication, the multiplicand is shifted during each cycle instead of the product. This choice gives us the capability of using only one 8-bit register for the product instead of n × 8-bit registers as in the high-speed architecture. Clearly, this architectural modification results in exploiting only one 8-bit adder as well. By decreasing the number of adder and register pairs from n to only one, the required area and corresponding power consumption are decreased. As mentioned earlier, the multiplication requires n rounds to be completed. This architectural modification causes each round of the multiplication to require n clock cycles in the lightweight architecture. Hence, by taking the very final addition into account, the total number of cycles for this architecture to perform decryption is (n + 1) × n.

Fig. 5. Ultralightweight architecture for decryption phase.

All of the control signals are very similar to the high-speed architecture with the difference that a counter to keep track of each state is added. To perform anti-circular rotation in the shift-and-add algorithm over the ring $R_q$, two additional control signals Crtl_1 and Crtl_2 are added, which decide whether the current operation is an addition or a subtraction.

C. Side-Channel Analysis (SPA and Timing)

The main targets for the proposed architectures are IoT devices, which mostly have constrained resources. Therefore, no additional countermeasures are used in the proposed architectures. We aim to provide SPA and timing attack resistant architectures according to the principles proposed by [26]. We note that regarding stronger attacks, such as differential power analysis (DPA), the complex and resource-consuming countermeasures that are proposed in previous work, such as [25] and [31], can also be applied to our architectures and can be considered for future work.

In the proposed architecture for the InvRBLWE scheme, there are no conditional branches that are taken or not taken according to a secret value. Therefore, the architecture operates independently from input values during each clock cycle, which results in a constant number of clock cycles too. More precisely, regarding the high-speed architecture, the key generation, encryption, and decryption phases require exactly n + 1, 2n + 3, and n + 1 clock cycles, respectively. Moreover, using the ultralightweight architecture, the key generation, encryption, and decryption phases require exactly (n + 1) × 256, (2n + 3) × 256, and (n + 1) × 256 clock cycles, respectively. In addition, the CPD of the proposed architectures is always the same during each phase of execution. Thus, the proposed architecture is secure against timing attacks [27].

To quantitatively evaluate the SPA resistance, we implemented InvRBLWE on a Sakura-X board in order to capture

power traces. The power measurements have been performed using a deep-memory high-performance USB oscilloscope (PicoScope 6403D) that can sample at a rate of 5 GS/s. Using the same method as described in [25], we performed a difference-of-means test for up to 100 000 traces on r2[2] (the input key for decryption) while r2[1:0] is fixed to "10", the same as in [25]. As shown in Fig. 6, SPA is not possible even after 100 000 traces with a confidence of 99.99% (bounded by the dashed lines).

Fig. 6. Quantitative analysis regarding SPA resistance.

V. IMPLEMENTATION RESULTS

As discussed in the previous section, we propose two architectures for InvRBLWE targeting different devices in edge and IoT nodes. We have implemented the high-speed architecture on both FPGA and ASIC platforms due to the fact that edge and powerful devices in IoT can take advantage of having: 1) a high-performance crypto-processor; 2) a system-on-chip (SoC); or 3) FPGA devices. On the other hand, the ultralightweight architecture is a proper match for tiny end-nodes and resource-constrained devices that in many cases are powered by energy harvesting systems and have limited power thresholds.

In the following, first, we provide results over the three FPGA devices most used by previous work. Second, regarding ASIC implementations of the proposed architectures, the 65 nm Taiwan Semiconductor Manufacturing Company (TSMC) [32] and 45 nm Nangate digital standard cell libraries are used. We compare our ASIC implementation results against ECC [6] and SIKE [29], which suggests that the proposed implementations are a proper match for the future (quantum) and even the current era in IoT.

TABLE I
COMPARISON OF FPGA IMPLEMENTATION RESULTS

A. FPGA Implementation

Table I presents the detailed results of the comparison between the proposed implementations and previous work (using n = 256 and n = 512, which provide 84 and 190 bits of classic security, respectively). The evaluations cover all previous LWE-based FPGA implementations. As the underlying scheme is considered a post-quantum scheme [11], [12], we have also compared our implementations against the post-quantum schemes submitted to the first round of the NIST process [16] that provided hardware implementation results. Additionally, to show the practicality of the proposed

architectures, we compare our implementation results against the best implementations of ECC as a classic cryptosystem. We use Area × Time as the measure of complexity, which is a common and widely used metric [6] in previous work.

In [25], three hardware implementations for the decryption phase of standard Ring-BinLWE on a Xilinx Spartan-6 device for scheme parameters n = 256 and q = 256 have been provided. Our implementations using the optimized InvRBLWE scheme benefit from avoiding reduction and therefore have a shorter CPD. As a result, our implementations with the same set of parameters on the same device have up to three times higher frequency and perform about 3.1 times faster than the high-performance implementation in [25]. Thus, InvRBLWE improves decryption time and AT complexity by at least 68% and 87% compared to the fastest implementation in [25], respectively.

Compared to standard Ring-LWE [10] implementations [17], [24], InvRBLWE improves time and AT complexity by at least 83% and 52% on a Xilinx Virtex-6 device. Moreover, compared to the best of the available hardware implementations of other post-quantum cryptosystems, our maximum security implementation (n = 512 and q = 256) achieves higher frequency while consuming fewer FPGA slices and improves AT complexity significantly.

Finally, compared to the most recent and the best of the ECC implementations [6], we achieve higher speed and more than 92% AT improvement. Regarding time and AT comparison, we have compared our most secure implementation (190 bits of security) against the ECC scheme over $GF(2^{283})$ in order to offer a fair result.

B. ASIC Implementation

To show that the proposed hardware architectures are platform-independent, we have implemented them on ASIC using two different standard cell libraries, 65 nm TSMC and 45 nm Nangate. To the best of our knowledge, this is the first ASIC implementation of an LWE-based cryptosystem.

According to the recent NIST lightweight crypto standardization reports [3], a practical crypto-processor for resource-constrained IoT end-nodes should meet certain power and energy thresholds. Battery-powered devices have limited energy, in contrast to energy-harvesting devices that can provide unlimited energy with bounded power.

TABLE II
COMPARISON OF ASIC IMPLEMENTATIONS

Table II compares our ASIC implementation results against those for SIKE [29] as a post-quantum key generation scheme based on isogenies over elliptic curves. Moreover, the best ASIC implementation results available for ECC have been compared against ours. Due to the low-complexity operations of the optimized InvRBLWE scheme, our ASIC implementation is at least two times faster than ECC implementations. Additionally, the ultralightweight ASIC implementation consumes 62% and 66% less area and power compared to the most lightweight implementation of ECC, respectively.

As seen in the table, our high-speed architecture provides fast scheme operations compared to all previous work, while consuming lower energy. This makes the high-speed implementation of InvRBLWE a proper choice for IoT devices powered by batteries. On the other hand, the ultralightweight architecture can be implemented with only 7.5k gates and consumes only 0.18 mW. Such low power consumption makes this architecture the first public key implementation that can be supplied by ultra low-power energy harvesters such as Vibration Piezo or EM [3].

We note that the ASIC implementation results for both architectures indicate that InvRBLWE has the potential to be considered as an alternative to classic cryptosystems.

VI. CONCLUSION

Security of IoT is becoming a concern considering the exponential growth in IoT nodes and applications. Moreover, post-quantum security is also an issue for the future of Internet
and IoT due to the vulnerability of classic cryptosystems such as ECC and RSA to quantum attacks [7]. Thus, many organizations are already looking for reliable and practical alternatives to classic cryptosystems [16], [34]. Among current post-quantum crypto-schemes, lattice-based cryptography has gained high attention from researchers due to its relatively faster operations compared to other post-quantum cryptosystems, such as code-based [13] or isogeny-based [14] schemes.

In this paper, we propose an optimized version of Ring-BinLWE [11], namely InvRBLWE, which is highly efficient for hardware implementation. Moreover, we propose two architectures for the InvRBLWE scheme targeting IoT devices with different capabilities, varying from edge and powerful devices to resource-constrained and tiny end-nodes. Our FPGA implementations improve time and AT compared to the best previous Ring-LWE implementations [17], [24], [25] by at least 68% and 52%, respectively. These implementations also improve AT by at least 92% compared to the best of the ECC implementations. We are the first to propose a practical ASIC implementation of LWE-based cryptosystems for IoT devices. The proposed high-speed ASIC implementation is at least two times faster than the best of the ECC implementations while consuming less energy. This makes the high-speed architecture suitable for battery-based solutions. Moreover, our lightweight ASIC implementation has power consumption as low as 0.18 mW, which can be supplied by low-power energy harvesting devices such as Vibration Piezo or EM [3]. In other words, the lightweight architecture is the first public key ASIC implementation that satisfies the NIST report criteria [3] and can be practically exploited in IoT end-nodes.

As future work, the proposed hardware implementations can be extensively analyzed against side-channel analysis (SCA). Our current implementations are resistant to SPA and timing attacks. However, there exist other powerful attacks, such as DPA [26] and simple and differential fault attacks (SFA and DFA) [31].

REFERENCES

[1] Z. Ling et al., “Security vulnerabilities of Internet of Things: A case study of the smart plug system,” IEEE Internet Things J., vol. 4, no. 6, pp. 1899–1909, Dec. 2017.
[2] R. Chaudhary, G. S. Aujla, N. Kumar, and S. Zeadally, “Lattice based public key cryptosystem for Internet of Things environment: Challenges and solutions,” IEEE Internet Things J., to be published. doi: 10.1109/JIOT.2018.2878707.
[3] C. Patrick and P. Schaumont, “The role of energy in the lightweight cryptographic profile,” in Proc. NIST Lightweight Cryptography Workshop, 2016, pp. 1–16.
[4] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120–126, 1978.
[5] N. Koblitz, “Elliptic curve cryptosystems,” Math. Comput., vol. 48, no. 177, pp. 203–209, 1987.
[6] R. Salarifard, S. Bayat-Sarmadi, and H. Mosanaei-Boorani, “A low-latency and low-complexity point-multiplication in ECC,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 9, pp. 2869–2877, Sep. 2018.
[7] P. W. Shor, “Algorithms for quantum computation: Discrete logarithms and factoring,” in Proc. IEEE 35th Annu. Symp. Found. Comput. Sci., Santa Fe, NM, USA, 1994, pp. 124–134.
[8] C. Peikert, “Public-key cryptosystems from the worst-case shortest vector problem,” in Proc. ACM 41st Annu. ACM Symp. Theory Comput., 2009, pp. 333–342.
[9] O. Regev, “On lattices, learning with errors, random linear codes, and cryptography,” J. ACM, vol. 56, no. 6, p. 34, 2009.
[10] V. Lyubashevsky, C. Peikert, and O. Regev, “On ideal lattices and learning with errors over rings,” in Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2010, pp. 1–23.
[11] J. Buchmann, F. Göpfert, T. Güneysu, T. Oder, and T. Pöppelmann, “High-performance and lightweight lattice-based public-key encryption,” in Proc. ACM 2nd Int. Workshop IoT Privacy Trust Security, 2016, pp. 2–9.
[12] J. Buchmann, F. Göpfert, R. Player, and T. Wunderer, “On the hardness of LWE with binary error: Revisiting the hybrid lattice-reduction and meet-in-the-middle attack,” in Proc. Int. Conf. Cryptol. Africa, 2016, pp. 24–43.
[13] Classic McEliece. Accessed: Jan. 2019. [Online]. Available: https://classic.mceliece.org/
[14] D. Jao and L. De Feo, “Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies,” in Proc. Int. Workshop Post Quant. Cryptography, 2011, pp. 19–34.
[15] B. Koziel, R. Azarderakhsh, and M. M. Kermani, “A high-performance and scalable hardware architecture for isogeny-based cryptography,” IEEE Trans. Comput., vol. 67, no. 11, pp. 1594–1609, Nov. 2018.
[16] National Institute of Standards and Technology. Accessed: Mar. 2018. [Online]. Available: https://www.nist.gov/
[17] S. S. Roy, F. Vercauteren, N. Mentens, D. D. Chen, and I. Verbauwhede, “Compact ring-LWE cryptoprocessor,” in Proc. Workshop Cryptograph. Hardw. Embedded Syst., 2014, pp. 371–391.
[18] The Cryptographic Suite for Algebraic Lattices (CRYSTALS). Accessed: Jan. 2019. [Online]. Available: https://pq-crystals.org/
[19] NTRU Prime Family: Streamlined NTRU Prime and NTRU LPRime. Accessed: Jan. 2019. [Online]. Available: https://ntruprime.cr.yp.to/index.html
[20] D. Micciancio and C. Peikert, “Hardness of SIS and LWE with small parameters,” in Proc. Adv. Cryptol. (CRYPTO), 2013, pp. 21–39.
[21] R. De Clercq, S. S. Roy, F. Vercauteren, and I. Verbauwhede, “Efficient software implementation of ring-LWE encryption,” in Proc. IEEE Design Autom. Test Europe Conf. (DATE), 2015, pp. 339–344.
[22] F. Göpfert, C. van Vredendaal, and T. Wunderer, “A hybrid lattice basis reduction and quantum search attack on LWE,” in Proc. Int. Workshop Post Quant. Cryptography, 2017, pp. 184–202.
[23] N. Göttert, T. Feller, M. Schneider, J. Buchmann, and S. Huss, “On the design of hardware building blocks for modern lattice-based encryption schemes,” in Proc. Int. Workshop Cryptograph. Hardw. Embedded Syst., 2012, pp. 512–529.
[24] T. Pöppelmann and T. Güneysu, “Towards practical lattice-based public-key encryption on reconfigurable hardware,” in Proc. Int. Conf. Sel. Areas Cryptography, 2013, pp. 68–85.
[25] A. Aysu, M. Orshansky, and M. Tiwari, “Binary ring-LWE hardware with power side-channel countermeasures,” in Proc. IEEE Design Autom. Test Europe Conf. Exhibit. (DATE), 2018, pp. 1253–1258.
[26] P. Kocher, J. Jaffe, B. Jun, and P. Rohatgi, “Introduction to differential power analysis,” J. Cryptograph. Eng., vol. 1, no. 1, pp. 5–27, 2011.
[27] P. C. Kocher, “Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems,” in Proc. Annu. Int. Cryptol. Conf., 1996, pp. 104–113.
[28] A. A. Kamal and A. M. Youssef, “An FPGA implementation of the NTRUEncrypt cryptosystem,” in Proc. IEEE Int. Conf. Microelectron. (ICM), 2009, pp. 209–212.
[29] D. Jao et al. Supersingular Isogeny Key Encapsulation (SIKE). Accessed: Jan. 2019. [Online]. Available: https://sike.org
[30] W. Wang, J. Szefer, and R. Niederhagen, “FPGA-based key generator for the Niederreiter cryptosystem using binary Goppa codes,” in Proc. Int. Conf. Cryptograph. Hardw. Embedded Syst., 2017, pp. 253–274.
[31] T. Schneider, A. Moradi, and T. Güneysu, “ParTI—Towards combined hardware countermeasures against side-channel and fault-injection attacks,” in Proc. Annu. Cryptol. Conf., 2016, pp. 302–332.
[32] Semiconductor Manufacturing Company (TSMC). Accessed: Jan. 2018. [Online]. Available: http://www.tsmc.com/
[33] NanGate Standard Cell Library. Accessed: Jan. 2018. [Online]. Available: http://www.si2.org/
[34] L. Chen et al., Post-Quantum Cryptography, U.S. Dept. Commerce, Nat. Inst. Stand. Technol., Gaithersburg, MD, USA, 2016.

Authors’ photographs and biographies not available at the time of publication.
