This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2017.2675449, IEEE Transactions on Circuits and Systems II: Express Briefs
Abstract—We present a more memory-efficient method for encoding binary information into words of prescribed length and weight. Existing solutions require either complicated floating-point arithmetic or additional memory overhead, which makes them challenging to deploy in resource-constrained computing environments. The solution we propose solves these problems and obtains better coding efficiency through a memory-efficient approximation of the critical intermediate value in constant weight coding. For the time being, the design presented in this work is the most compact one for any code-based encryption scheme.

Index Terms—Code-based Cryptography, McEliece/Niederreiter Cryptosystem, Constant Weight Coding, FPGA Implementation.

J. Hu and R. Cheung are with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong (e-mail: j.hu@my.cityu.edu.hk; r.cheung@cityu.edu.hk). T. Güneysu is with the Department of Mathematics and Computer Science, University of Bremen, Germany (e-mail: tim.gueneysu@uni-bremen.de).
Manuscript received XX XX, 2016; revised XX XX, 2016.
1549-7747 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
Most modern public-key cryptographic systems rely on either the integer factorization or the discrete logarithm problem, both of which would be easily solvable on large enough quantum computers using Shor's algorithm [1]. Recent advances in quantum computing have shown its potential for solving the problems mentioned above [2]. The cryptographic research community has recognized the urgency of this challenge and has begun, in recent years, to base security on alternative hard problems, such as multivariate-quadratic, lattice-based and code-based cryptosystems [3], [4]. We address in this paper the problem of encoding information into binary words of prescribed length and Hamming weight in resource-constrained computing environments, e.g. reconfigurable hardware and embedded microcontroller systems. This is of interest, in particular, when one wishes to efficiently implement McEliece's scheme [5] or Niederreiter's scheme [6], the most promising candidates for code-based cryptography.

For Niederreiter/McEliece encryption, one needs to convert any binary stream of plaintext into the form of constant weight words. Constant weight means that the number of '1' bits in the binary plaintext must be constant. The exact solution [7] of constant weight coding requires the computation of large binomial coefficients and has quadratic complexity, though it is optimal as a source coding algorithm. In [8], Sendrier proposed the first solution with linear complexity, using Huffman codes. Afterwards, the author of [8] improved its coding efficiency to very close to 1 by exploiting Golomb's run-length encoding method [9]. His proposal is fairly easy to implement and runs in linear time. The price to pay is the variable length encoding [10]. In the same paper, he also proposed an approximation method that restricts the value of d to a power of two. This method greatly simplifies the encoding process and improves coding throughput. Heyse et al. [11], [12] followed this up and proposed to adapt the approximation method from [10] for embedded system applications.

However, we find that the method derived by Heyse et al. is not applicable to large parameter sets of the Niederreiter scheme [13]–[15]. Their design maintains a lookup table of pre-stored data with space complexity of O(n) to encode input messages into constant weight words. This table is still quite big for small embedded systems, making their design less scalable when n is large. For instance, the classical code-based signature scheme, CFS [13], uses very large Goppa codes and needs to compress its lengthy constant weight signatures ($n = 2^{16}$, $2^{18}$ and $2^{20}$) into a binary stream. MDPC-McEliece/Niederreiter encryption [16] uses very large n for practical security levels: n is set to 19712 for 128-bit security and 65536 for 256-bit security. Recently, Baldi et al. proposed a novel LDGM sparse syndrome signature scheme [17], which also requires large constant weight coding during signature generation. At present, there is no solution for constant weight encoding that is both lightweight and efficient for these applications.

The purpose of this work is to propose an efficient constant weight coding engine for all secure parameters of the code-based cryptosystems proposed in the literature. Our contributions include:
1) We propose a new approximation method for computing a critically important value d in constant weight coding without complicated floating-point arithmetic or a large memory footprint. This method allows us to implement a compact yet fast constant weight encoder for resource-constrained computing environments.
2) We improve the coding efficiency, from the point of view of source coding, by fine-tuning the precision used for computing the value of d, when compared with other approximation methods.
3) Our secure implementation of the Niederreiter encryptor, which integrates the proposed constant weight encoder, can encrypt approximately 1 million plaintexts per second on a Xilinx Virtex-6 FPGA, requiring 183 slices and 18 memory blocks.

The rest of the paper is structured as follows. We first revisit Sendrier's proposal of constant weight coding and its approximation variant [10] in Section II. This motivates us to propose a new approximation method and to fine-tune it for optimal source coding performance, presented in Section III. Section IV describes our implementations of the proposed constant weight encoder/decoder and the Niederreiter encryption unit. Our results are compared with the state of the art in Section V. Section VI concludes this paper.

returns an integer such that $1 \le \mathrm{best\_d}(n, t) \le n - t$, and Sendrier suggested choosing it close to the number $d = \left(n - \frac{t-1}{2}\right)\left(1 - \frac{1}{2^{1/t}}\right)$ to obtain high coding efficiency. Sendrier
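As a numerical illustration, Sendrier's suggested target $d = \left(n - \frac{t-1}{2}\right)\left(1 - 2^{-1/t}\right)$ can be evaluated directly in floating point. The sketch below is only meant to give a feel for the magnitude of d for the parameter sets discussed in this paper. The function names `sendrier_d` and `clamped_d` are hypothetical, and rounding to the nearest integer before clamping into $1 \le d \le n - t$ is an assumption, not something the text specifies.

```python
def sendrier_d(n: int, t: int) -> float:
    """Real-valued target d = (n - (t - 1) / 2) * (1 - 2 ** (-1 / t))."""
    return (n - (t - 1) / 2) * (1 - 2 ** (-1 / t))


def clamped_d(n: int, t: int) -> int:
    """Round the target (assumed: to nearest) and keep 1 <= d <= n - t."""
    d = round(sendrier_d(n, t))
    return max(1, min(d, n - t))


if __name__ == "__main__":
    # Parameter sets (n, t) that appear in the paper's tables.
    for n, t in [(2**10, 38), (2**11, 27), (2**16, 9), (2**20, 8)]:
        print(n, t, round(sendrier_d(n, t), 2), clamped_d(n, t))
```

This floating-point evaluation is exactly the kind of computation a resource-constrained platform would rather avoid, which is what motivates the approximations compared in the paper.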
B. Represent θ[t] in fixed-point format
As mentioned above, θ[t] is an array of fractional numbers and should be stored in fixed-point format. Note that the integer part of θ[t] must be 0, so we only need to preserve its fractional part. Hereafter, we denote our fixed-point format as fixed_0_i, where i indicates the number of bits used for storing the fractional part of θ[t].

In practice, we hope to use fixed_0_i with the smallest i while maintaining desirable coding efficiency. A smaller i means lower data storage and a simpler multiplication, which is particularly advisable for resource-constrained computing platforms. Indeed, one of the key issues of this paper is to determine the best fixed_0_i for θ[t]. In the next section, we describe our experiments on exploring the optimal fixed_0_i.

C. Find the optimal precision for constant weight encoding
The purpose of the precision tuning is to find the lowest precision that still maintains a relatively high coding efficiency. A lower precision means we can use a multiplier with a smaller operand size, leading to better path delay and slice utilization. A higher coding efficiency means one can encode more bits from the source into a constant weight word. This is of interest, in particular, when someone encrypts a relatively large file using code-based cryptography: encryption takes much less time if the coding efficiency is close to 1.

To find the optimal precision of fixed_0_i, we studied how different values of i affect the coding efficiency. In our experiments, we investigated all possible precisions from fixed_0_32 to fixed_0_1 and compared them with Sendrier's methods [10] and Heyse's approximate version [11]. Table I compares our proposed method with Sendrier's original method [10], Sendrier's approximation using powers of 2 [10] and Heyse's table lookup approximation [11]. From this table it is seen that our proposal is the closest approximation of Sendrier's method among all five parameter sets used for the Niederreiter scheme. Note that the average number of bits we have to read before producing the constant weight words is upper bounded by $\log_2 \binom{n}{t}$, and thus the ratio of the average number of bits read to this upper bound measures the coding efficiency [10]. It is also worth mentioning that for ($n = 2^{20}$, $t = 8$), the performance of our proposal falls slightly behind, with only 0.68% of loss when compared with Sendrier's method; it nonetheless outperforms Sendrier's approximation and Heyse's approximation. In particular, the performance of Heyse's approximation turns out to be unfavourable, with a 56.73% loss: we are pushing the limits of Heyse's method here, as the lower bits of n are not negligible and cannot be removed with such large n.

TABLE I: The coding performance of the optimal d chosen from our approximation method
n, t            method         avg. number of bits read   coding efficiency   efficiency improved
$2^{10}$, 38    origin. [10]   229.34                     99.54%              -
$2^{10}$, 38    approx. [10]   225.52                     97.86%              -1.68%
$2^{10}$, 38    approx. [12]   221.51                     96.12%              -3.42%
$2^{10}$, 38    n*fixed_0_6    227.29                     98.65%              -0.89%
$2^{11}$, 27    origin. [10]   202.74                     99.58%              -
$2^{11}$, 27    approx. [10]   199.62                     98.04%              -1.54%
$2^{11}$, 27    approx. [12]   199.83                     98.14%              -1.44%
$2^{11}$, 27    n*fixed_0_6    200.27                     98.36%              -1.22%
$2^{16}$, 9     origin. [10]   125.24                     99.79%              -
$2^{16}$, 9     approx. [10]   123.13                     98.09%              -1.70%
$2^{16}$, 9     approx. [12]   119.67                     95.33%              -4.46%
$2^{16}$, 9     n*fixed_0_4    124.27                     99.02%              -0.77%
$2^{20}$, 8     origin. [10]   144.41                     99.80%              -
$2^{20}$, 8     approx. [10]   142.59                     98.54%              -1.26%
$2^{20}$, 8     approx. [12]   62.30                      43.06%              -56.73%
$2^{20}$, 8     n*fixed_0_4    143.41                     99.11%              -0.68%

IV. PROPOSED CONSTANT WEIGHT ENCODER AND DECODER
The most critical arithmetic unit is the best_d module, which computes the best value of d according to the inputs n and t. With our proposed computation of best_d, it consists of three stages performing the following tasks:
1) Compute θ[t] via a priority encoder. As discussed in the previous section, format fixed_0_6 is used to represent θ[t] for ($n = 2^{10}$, $t = 38$) and ($n = 2^{11}$, $t = 27$), and fixed_0_4 is used for ($n = 2^{16}$, $t = 9$) and ($n = 2^{20}$, $t = 8$). We find by analysis that for some t the values of θ[t] are identical, which lets us use a priority encoder to simplify the encoding of θ[t].
2) Compute n · θ[t] via a fixed-point multiplier. We configured the Xilinx LogiCORE IP to implement high-performance, optimized multipliers for different n and t. The fractional part of the multiplication result is truncated and its integer part is preserved for the next stage to process.
3) Output the values of d and u. Recall that the value of n · θ[t] must be rounded to $d = 2^u$. Here again, another priority encoder is used to decode the integer part of n · θ[t].

A. Bin2CW encoding and decoding engine
Figure 1 depicts the architecture of the proposed constant weight encoder. The input binary message is passed in through a non-symmetric 8-to-1 FIFO, which imitates the function read(B, 1). We use a serial-in-parallel-out shift register to perform read(B, u), $0 \le u \le \lceil \log_2(n/2) \rceil$. The proposed best_d module is used here to compute the value of d. The values of n, t, δ are updated accordingly using three registers.

Figure 2 depicts the architecture of the proposed constant weight decoder. An m-to-m bit FIFO is used instead to transfer the input t-tuple word by word. This logic is actually the bottleneck in the constant weight decoder when compared with the encoder. We similarly use three registers to update the values of n, t, δ. The major difference is that the shift register here outputs the value of δ bit by bit, as step 9 of Algorithm 2 requires.

B. Integrating with the Niederreiter encryptor
In this section, we demonstrate how the proposed Bin2CW encoder can be integrated with the Niederreiter encryptor for data encryption, as shown in Algorithm 3.

The Bin2CW encoder is used to perform the first step in this algorithm. Recall that the Bin2CW encoder returns a t-tuple
[Figure: block diagram (text lost in extraction); recoverable labels: CW Word, INIT, BEST_d, ADDR, Row, READ, Message bits, bit Register, Initial value, bit Integer Adder]
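The three stages of the best_d module described in Section IV (obtain θ[t], multiply by n in fixed point, reduce the integer part to a power of two) can be mimicked in software as follows. This is a minimal sketch under stated assumptions, not the hardware design: the definition θ[t] = 1 − 2^(−1/t), the round-to-nearest quantization to `frac_bits` fractional bits, and the choice of rounding the product down to a power of two by taking its most significant bit are all assumptions, and the names `theta_fixed` and `best_d` are hypothetical.

```python
from typing import Tuple


def theta_fixed(t: int, frac_bits: int) -> int:
    """theta[t] in fixed_0_i format, returned as an integer numerator.

    Assumption: theta[t] = 1 - 2 ** (-1 / t), quantized to the nearest
    multiple of 2 ** -frac_bits. Hardware would use a small table or a
    priority encoder instead of floating point.
    """
    return round((1.0 - 2.0 ** (-1.0 / t)) * (1 << frac_bits))


def best_d(n: int, t: int, frac_bits: int) -> Tuple[int, int]:
    """Return (d, u) with d = 2 ** u, following the three stages."""
    theta = theta_fixed(t, frac_bits)      # stage 1: theta[t]
    product = n * theta                    # stage 2: fixed-point multiply
    integer_part = product >> frac_bits    # truncate the fractional bits
    # Keep the value inside 1 <= d <= n - t before taking the MSB.
    integer_part = max(1, min(integer_part, n - t))
    u = integer_part.bit_length() - 1      # stage 3: MSB position (assumed floor)
    return 1 << u, u


if __name__ == "__main__":
    print(best_d(2**11, 27, 6))   # fixed_0_6, as used for n = 2^11
    print(best_d(2**16, 9, 4))    # fixed_0_4, as used for n = 2^16
```

The only arithmetic involved is one small integer multiplication, a shift, and a leading-one detection, which is why a narrow multiplier and two priority encoders suffice in hardware.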
TABLE II: Compact implementations of CW encoder and decoder on Xilinx Virtex-6 FPGA
n, t                     Method             Coding aspects            Platform            Area [slices]   Memory blocks   Frequency [MHz]   Throughput [Mbps]   Coding efficiency
$n = 2^{11}$, $t = 27$   This work          Bin2CW, new approx.       Xilinx xc6vlx240t   91              0+1             350               190.9               98.36%
$n = 2^{11}$, $t = 27$   This work          CW2Bin, new approx.       Xilinx xc6vlx240t   95              0+1             340               172.0               98.36%
$n = 2^{11}$, $t = 27$   Heyse et al. [12]  Bin2CW, Heyse's approx.   Xilinx xc6vlx240t   118             0+1             340               218.1               98.14%
$n = 2^{11}$, $t = 27$   Heyse et al. [12]  CW2Bin, Heyse's approx.   Xilinx xc6vlx240t   110             0+1             340               172.2               98.14%
$n = 2^{11}$, $t = 27$   Sendrier [10]*     Bin2CW, original          Intel Pentium 4     -               -               2400              18.0                99.58%
$n = 2^{11}$, $t = 27$   Sendrier [10]*     Bin2CW, approximate       Intel Pentium 4     -               -               2400              35.1                98.04%
$n = 2^{16}$, $t = 9$    This work          Bin2CW, new approx.       Xilinx xc6vlx240t   103             0+1             440               322.0               99.02%
$n = 2^{16}$, $t = 9$    This work          CW2Bin, new approx.       Xilinx xc6vlx240t   109             0+1             310               216.9               99.02%
$n = 2^{16}$, $t = 9$    Heyse et al. [12]  Bin2CW, Heyse's approx.   Xilinx xc6vlx240t   90              10+3            240               187.5               95.33%
$n = 2^{16}$, $t = 9$    Heyse et al. [12]  CW2Bin, Heyse's approx.   Xilinx xc6vlx240t   90              10+3            230               175.2               95.33%
$n = 2^{16}$, $t = 9$    Sendrier [10]      Bin2CW, original          Intel Pentium 4     -               -               2400              18.4                99.79%
$n = 2^{16}$, $t = 9$    Sendrier [10]      Bin2CW, approximate       Intel Pentium 4     -               -               2400              23.0                98.09%
* Sendrier implemented a different but very close parameter set, $n = 2^{11}$, $t = 30$. We also list it here for reference.
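The coding-efficiency column can be cross-checked from the definition used in this paper: the average number of message bits read per constant weight word, divided by the upper bound $\log_2 \binom{n}{t}$. The sketch below recomputes that ratio from the average bit counts reported in Table I. The function names are hypothetical, and small differences in the last digit relative to the published percentages are to be expected from rounding of the reported averages.

```python
import math


def upper_bound_bits(n: int, t: int) -> float:
    """log2 of the number of length-n binary words of weight t."""
    return math.log2(math.comb(n, t))


def coding_efficiency(avg_bits_read: float, n: int, t: int) -> float:
    """Ratio of average bits read to the information-theoretic bound."""
    return avg_bits_read / upper_bound_bits(n, t)


if __name__ == "__main__":
    # (n, t, average bits read by Sendrier's original method, from Table I)
    for n, t, avg in [(2**11, 27, 202.74), (2**16, 9, 125.24)]:
        print(n, t, f"{100 * coding_efficiency(avg, n, t):.2f}%")
```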
module, which grows logarithmically from 11 bit × 6 bit to 16 bit × 4 bit. On the other hand, the memory overhead of Heyse's implementations grows linearly with n and might be problematic when n is large, as mentioned above. To show this, we re-implemented Heyse's work for ($n = 2^{16}$, $t = 9$). The experimental results verify this point. Additionally, another negative side effect of heavy memory overhead is that the working frequency of the circuits drops rapidly, as shown in Table II. For large parameters (Table II (b)), such a lookup table is extremely large, consuming 13 memory blocks, which leads to slower speed due to relatively long and complicated routing. These block memories are the real bottleneck of Heyse's work as n grows.

We finally implemented the Niederreiter encryptor, a cryptographic application where constant weight coding is used exactly as mentioned in Section IV. Table III compares our work with the state of the art [12]. Our new implementation is the most compact one, with better area-time tradeoffs. We use the same block memory as [12], where 16 × 36Kb + 1 × 18Kb RAMs are utilized to store the public key matrix Ĥ, and one 18Kb RAM is used for the 8-to-1 FIFO within the constant weight encoder.

TABLE III: FPGA implementation results of Niederreiter encryption with n = 2048, t = 27 compared with [12] after PAR
Aspect (Virtex6-VLX240)    Niederreiter [12]   This work
Slices                     315                 183
LUTs                       926                 505
FFs                        875                 498
BRAMs                      17                  18*
Frequency                  300 MHz             340 MHz
CW Encode e = Bin2CW(m)    ≈ 200 cycles        349.1 cycles
Encrypt c = e · Ĥ          ≈ 200 cycles        352.1 cycles
Area×Time                  ≈ 210               189.5
* We used 16×36Kb RAMs and 2×18Kb RAMs.

VI. CONCLUSION
We proposed a new method of determining the optimal value d in constant weight coding. This method enabled a more compact yet efficient implementation of the constant weight encoder and decoder for resource-constrained computing systems. Afterwards, we used this new encoder to implement the Niederreiter encryptor on a Xilinx device. Experiments show that our lightweight implementation of the Niederreiter encryption unit can encrypt approximately 1 million plaintexts per second on a Xilinx Virtex-6 FPGA, requiring 183 slices and 18 memory blocks.

REFERENCES
[1] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Journal on Computing, vol. 26, no. 5, pp. 1484–1509, 1997.
[2] N. Xu, J. Zhu, D. Lu, X. Zhou, X. Peng, and J. Du, “Quantum factorization of 143 on a dipolar-coupling nuclear magnetic resonance system,” Phys. Rev. Lett., vol. 108, p. 130501, Mar 2012.
[3] D. J. Bernstein, “Introduction to post-quantum cryptography,” in Post-Quantum Cryptography. Springer, 2009, pp. 1–14.
[4] R. Overbeck and N. Sendrier, “Code-based cryptography,” in Post-Quantum Cryptography. Springer, 2009, pp. 95–145.
[5] R. J. McEliece, “A public-key cryptosystem based on algebraic coding theory,” DSN Progress Report, vol. 42, no. 44, pp. 114–116, 1978.
[6] H. Niederreiter, “Knapsack-type cryptosystems and algebraic coding theory,” Problems of Control and Information Theory-Problemy Upravleniya i Teorii Informatsii, vol. 15, no. 2, pp. 159–166, 1986.
[7] T. M. Cover, “Enumerative source encoding,” IEEE Transactions on Information Theory, vol. 19, no. 1, pp. 73–77, 1973.
[8] N. Sendrier, “Efficient generation of binary words of given weight,” in Cryptography and Coding. Springer, 1995, pp. 184–187.
[9] S. W. Golomb, “Run-length encodings,” IEEE Transactions on Information Theory, vol. 12, pp. 399–401, 1966.
[10] N. Sendrier, “Encoding information into constant weight words,” in 2005 IEEE International Symposium on Information Theory Proceedings (ISIT). IEEE, 2005, pp. 435–438.
[11] S. Heyse, “Low-reiter: Niederreiter encryption scheme for embedded microcontrollers,” in Post-Quantum Cryptography. Springer, 2010, pp. 165–181.
[12] S. Heyse and T. Güneysu, “Towards one cycle per bit asymmetric encryption: code-based cryptography on reconfigurable hardware,” in Cryptographic Hardware and Embedded Systems–CHES 2012. Springer, 2012, pp. 340–355.
[13] N. T. Courtois, M. Finiasz, and N. Sendrier, “How to achieve a McEliece-based digital signature scheme,” in Advances in Cryptology-ASIACRYPT 2001. Springer, 2001, pp. 157–174.
[14] G. Landais and N. Sendrier, “Implementing CFS,” in Progress in Cryptology - INDOCRYPT 2012: 13th International Conference on Cryptology. Springer, 2012, pp. 474–488.
[15] M. Finiasz, “Parallel-CFS,” in Selected Areas in Cryptography. Springer, 2011, pp. 159–170.
[16] R. Misoczki, J.-P. Tillich, N. Sendrier, and P. S. Barreto, “MDPC-McEliece: New McEliece variants from moderate density parity-check codes,” in 2013 IEEE International Symposium on Information Theory Proceedings (ISIT). IEEE, 2013, pp. 2069–2073.
[17] M. Baldi, M. Bianchi, F. Chiaraluce, J. Rosenthal, and D. Schipani, “Using LDGM codes and sparse syndromes to achieve digital signatures,” in Post-Quantum Cryptography. Springer, 2013, pp. 1–15.