CryptoProcessor for Elliptic Curve Digital Signature Algorithm using FPGA
By: Hadis Mehari and Dr.-Ing Getahun Mokuria
Addis Ababa Institute of Technology. © Addis Ababa, Ethiopia, September 2012.

Abstract

ECC has become popular due to its superior strength per bit compared to traditional public key algorithms such as RSA. RSA is now being replaced by ECC in many systems because ECC gives higher security with a shorter bit length. In this thesis, a hardware accelerator for point multiplication is proposed. All the elliptic curve blocks are simulated using FPGA Advantage and ModelSim SE 10.0b, and synthesized using Xilinx ISE Design Suite 13.2. Experimental results show that a single point multiplication executes in 93 µs.

1 Introduction

Elliptic curve cryptography (ECC) was independently introduced by Koblitz and Miller in 1985. Since then, this public-key cryptosystem has attracted increasing attention due to its shorter key size requirement in comparison with other established systems such as RSA. For instance, it is widely accepted that 160-bit ECC offers security equivalent to 1024-bit RSA [15]. Smaller key sizes lead to faster processing, which is attractive for those implementing encryption on small, mobile devices with limited resources in terms of power, CPU and memory. It is also very desirable for large servers that handle many encrypted sessions. The security of a cryptosystem depends on how hard it is to solve the underlying mathematical problem.

2 EC Cryptography

Elliptic Curve Cryptography is a public key cryptographic system (PKCS) which utilizes points on elliptic curves. These points can be represented graphically in a two-dimensional plane.

The previous condition guarantees that there does not exist more than one tangent line for a given point on the curve, i.e., the curve is smooth. After simplification of some variables the above equation becomes

$y^2 + xy = x^3 + ax^2 + b$    (4.4)

where $a$ and $b \neq 0$ belong to $K$, and $4a^3 + 27b^2 \neq 0$.

2.1 Elliptic Curve Arithmetic

To best understand the way point formulae are derived, elementary point operations are typically described geometrically. The basic principle is that a secant line drawn between two points also intersects the elliptic curve at a third point, as illustrated by the line between P and Q in Figure 2.1. The result of point addition, R, is the reflection of this third point across the x-axis.

Figure 2.1: point addition (P + Q = R)

When a point is added to itself, a tangent line to the curve at P is used in place of the secant line between P and Q, as shown in Figure 2.2.

Figure 2.2: point doubling (P + P = 2P = R)

2.1.1 Algebraic point addition

1. The additive inverse of a point $P = (x, y)$ is $-P = (x, x + y)$.
2. $P + O = P$, where $O$ is the point at infinity.
3. If $P = (0, y)$ then $2P = O$.
4. Point addition for two distinct points P and Q: let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$ with $P \neq -Q$ and $P \neq Q$. Then $R = (x_R, y_R)$ is given by
   $x_R = \lambda^2 + \lambda + x_P + x_Q + a$,
   $y_R = \lambda(x_P + x_R) + x_R + y_P$,
   where $\lambda = (y_P + y_Q)/(x_P + x_Q)$.
5. Similarly, for point doubling ($2P = R$ with $x_P \neq 0$) the operation is
   $x_R = \lambda^2 + \lambda + a$,
   $y_R = x_P^2 + (\lambda + 1)x_R$,
   where $\lambda = x_P + y_P/x_P$.

3 Architecture of ECC

Addition in a binary Galois field is trivial: it is a bit-by-bit addition with no carry bit, so it can be implemented using only XOR gates. Multiplication, inversion and squaring are more complicated to implement in hardware.

3.1 Adder Block

If $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$ are two different points on the elliptic curve and $R = (x_R, y_R) = P + Q$, the coordinates of R can be computed as follows (the formulas of Section 2.1.1 with $a = 1$):

$\lambda = (y_P + y_Q)/(x_P + x_Q)$    (3.1)
$x_R = \lambda^2 + \lambda + x_P + x_Q + 1$    (3.2)
$y_R = \lambda(x_P + x_R) + x_R + y_P$    (3.3)

According to the structure of the datapath, the computation time is approximately equal to

$T_{\text{point-addition}} = m(T_{\text{mod-f-product}} + T_{\text{mod-f-division}})$    (3.4)

The datapath for computing the preceding equations is shown in Figure 3.1.

Figure 3.1: point addition datapath

3.2 Finite Field Squarer

The finite field square is a specific case of general multiplication and can be performed by the multiplier, but that takes too much time. Performance can be improved significantly by optimizing the architecture specifically for the case of squaring. The square is computed as follows:

$C = A^2 \bmod f(x) = (a_{m-1}x^{2(m-1)} + a_{m-2}x^{2(m-2)} + \dots + a_1x^2 + a_0) \bmod f(x)$    (3.5)

The finite field square can be implemented by expanding A to double its bit length, interleaving a 0 bit between each pair of original bits of A, and then reducing the double-length result.

Equation (3.5) can be changed to

$C = A^2 \bmod f(x) = x^m A_h(x) \bmod f(x) + A_l(x)$    (3.6)

where

$A_h(x) = 0 + a_{m-1}x^{m-2} + 0 + \dots + a_{(m+3)/2}x^3 + 0 + a_{(m+1)/2}x + 0$    (3.7)
$A_l(x) = a_{(m-1)/2}x^{m-1} + 0 + a_{(m-3)/2}x^{m-3} + 0 + \dots + a_1x^2 + 0 + a_0$    (3.8)

Then the high part of Equation (3.6) can be computed as follows with a pentanomial irreducible polynomial $f(x) = x^m + x^i + x^k + x^l + 1$:

$x^m A_h(x) \bmod f(x) = (x^i + x^k + x^l + 1)(0 + a_{m-1}x^{m-2} + 0 + \dots + a_{(m+1)/2}x + 0) \bmod f(x)$    (3.9)

According to Equation (3.9), only shift operations, which are simple in hardware, are required, so the squaring reduces to finite field additions.

3.3 The Finite Field Inversion

This component calculates the quotient of two 164-bit vectors modulo the irreducible polynomial for K-163. It utilizes the "binary inversion" algorithm, which is a modified version of the Euclidean algorithm that has been optimized for binary Galois fields. The quotient of two polynomials in $GF(2^m)$ can be computed using the polynomial version of the binary algorithm that is used to calculate the gcd of two polynomials. The binary algorithm for computing $z(x) = g(x)h(x)^{-1} \bmod f(x)$ is given below.

Algorithm 3.1 Binary Algorithm
    a := f; b := h; c := zero; d := g; alpha := m; beta := m - 1;
    while beta >= 0 loop
        if b_0 = 0 then
            b := shift_one(b); d := divide_by_x(d, f); beta := beta - 1;
        else
            old_b := b; old_d := d; old_beta := beta;
            b := shift_one(add(a, b));
            d := divide_by_x(add(c, d), f);
            if alpha > beta then
                a := old_b; c := old_d; beta := alpha - 1; alpha := old_beta;
            else
                beta := beta - 1;
            end if;
        end if;
    end loop;
    z := c;

3.4 Scalar Multiplication

This section gives an example of a finite-field application, namely, the implementation of the scalar product (point multiplication) over an elliptic curve. It is the basic computation primitive of elliptic curve cryptography.

Scalar multiplication is a building block of all elliptic curve cryptosystems. It is the operation of calculating an integer multiple of an element in the additive group of an elliptic curve. In other words, it is an operation of the form KP, where P is a point on the elliptic curve and K is a positive integer. Computing KP means adding the point P exactly K − 1 times to itself, which results in another point Q on the elliptic curve.

In this section both K and P are unknown until run time, i.e., they are supplied to the program at run time, since K and P may vary. Because the curve is a Koblitz curve, the Frobenius map $\tau(x, y) = (x^2, y^2)$ maps curve points to curve points, so point doubling can be substituted by squaring, a simple operation over a binary field. The strategy used for developing an efficient point multiplication algorithm is to find, for a given integer K, a τ-ary expression of the form

$K = \sum_{i=0}^{l-1} \mu_i \tau^i$, where $\tau = (1 + \sqrt{-7})/2$ and $\mu_i \in \{0, \pm 1\}$.

The following Algorithm 3.2 computes KP.

Algorithm 3.2 Point Multiplication (Q = KP)
    Q_infinity := true; a := K; b := 0;
    while ((a /= 0) or (b /= 0)) loop
        a_div_2 := div_2(a);
        if a mod 2 = 0 then
            a := b + a_div_2; b := -a_div_2;
        else if (a/2) mod 2 = b mod 2 then
            if Q_infinity then
                xQ := xP; yQ := yP; Q_infinity := false;
            else
                point_addition(xP, yP, xQ, yQ, f, new_xQ, new_yQ);
                xQ := new_xQ; yQ := new_yQ;
            end if;
            a := b + a_div_2; b := -a_div_2;
        else
            if Q_infinity then
                xQ := xP; yQ := add(xP, yP); Q_infinity := false;
            else
                point_addition(xP, add(xP, yP), xQ, yQ, f, new_xQ, new_yQ);
                xQ := new_xQ; yQ := new_yQ;
            end if;
            a := b + a_div_2 + 1; b := -(a_div_2 + 1);
        end if;
        xP := product_mod_f(xP, xP, f);
        yP := product_mod_f(yP, yP, f);
    end loop;

A datapath for executing Algorithm 3.2 is shown in Figure 3.2, and its computation time is approximately equal to

$T \approx m\,T_{\text{point-addition}} \approx m^2 (T_{\text{mod-f-product}} + T_{\text{mod-f-division}})$    (3.10)
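As a cross-check of Algorithm 3.2, the following Python sketch models it in software and compares the result with plain repeated addition. It is an illustration under stated assumptions, not the VHDL design: it uses a toy field GF(2^5) with f(x) = x^5 + x^2 + 1 so that every point can be tested exhaustively, the curve y^2 + xy = x^3 + x^2 + 1 (a = b = 1, the same form as K-163), and a Fermat-style inverse in place of the divider of Algorithm 3.1. All function names are hypothetical.

```python
# Software model of Algorithm 3.2 (tau-adic point multiplication).
# Assumptions: toy field GF(2^5), f(x) = x^5 + x^2 + 1, curve a = b = 1.
# Field elements are ints (bit i = coefficient of x^i); None is the point O.

M, F = 5, 0b100101
A, B = 1, 1

def gf_mul(u, v):
    """Shift-and-add product of u and v modulo f(x)."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        u <<= 1
        if u >> M:                       # degree reached M: add (XOR) f(x)
            u ^= F
        v >>= 1
    return r

def gf_inv(u):
    """Inverse as u^(2^M - 2); a stand-in for the divider of Algorithm 3.1."""
    r = 1
    for _ in range(M - 1):
        u = gf_mul(u, u)
        r = gf_mul(r, u)
    return r

def on_curve(P):
    """Check y^2 + xy = x^3 + a*x^2 + b for an affine point."""
    x, y = P
    x2 = gf_mul(x, x)
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(x2, x) ^ gf_mul(A, x2) ^ B
    return lhs == rhs

def negate(P):
    return None if P is None else (P[0], P[0] ^ P[1])

def point_add(P, Q):
    """General addition using the formulas of Section 2.1.1."""
    if P is None:
        return Q
    if Q is None:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq:
        if yp != yq or xp == 0:          # Q = -P, so the sum is O
            return None
        lam = xp ^ gf_mul(yp, gf_inv(xp))            # doubling slope
        xr = gf_mul(lam, lam) ^ lam ^ A
        return (xr, gf_mul(xp, xp) ^ gf_mul(lam ^ 1, xr))
    lam = gf_mul(yp ^ yq, gf_inv(xp ^ xq))           # secant slope
    xr = gf_mul(lam, lam) ^ lam ^ xp ^ xq ^ A
    return (xr, gf_mul(lam, xp ^ xr) ^ xr ^ yp)

def tau_multiply(k, P):
    """Algorithm 3.2: Q = kP, consuming one tau-adic digit of a + b*tau per pass."""
    a, b, Q = k, 0, None
    while a != 0 or b != 0:
        half = a // 2                    # floor division, as div_2
        if a % 2 == 0:                   # digit 0
            a, b = b + half, -half
        elif half % 2 == b % 2:          # digit +1: add P
            Q = point_add(Q, P)
            a, b = b + half, -half
        else:                            # digit -1: add -P
            Q = point_add(Q, negate(P))
            a, b = b + half + 1, -(half + 1)
        P = (gf_mul(P[0], P[0]), gf_mul(P[1], P[1]))   # P := tau(P)
    return Q

def naive_multiply(k, P):
    """Reference: sum k copies of P."""
    Q = None
    for _ in range(k):
        Q = point_add(Q, P)
    return Q

if __name__ == "__main__":
    P = next((x, y) for x in range(1, 1 << M) for y in range(1 << M)
             if on_curve((x, y)))
    print(tau_multiply(13, P) == naive_multiply(13, P))
```

The loop body performs no point doubling: each pass squares the coordinates of P and adds or subtracts P only when the current digit is nonzero, which is the property the datapath exploits.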
Figure 3.2: Point multiplication
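The field-level blocks that the point operations rely on, the squarer of Section 3.2 and the divider of Algorithm 3.1, can also be modeled in software. The sketch below is a Python model rather than the VHDL of the design. The reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the NIST choice for K-163 and is an assumption here, since the paper does not state it; the function names are illustrative.

```python
# Software model of the squarer (Section 3.2) and the divider (Algorithm 3.1).
# Field elements are Python ints: bit i holds the coefficient of x^i.
# Assumption: f(x) = x^163 + x^7 + x^6 + x^3 + 1 (NIST K-163 pentanomial).

M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def reduce_mod_f(v, f=F, m=M):
    """Reduce a polynomial of any degree modulo f, where deg f = m."""
    while v.bit_length() > m:
        v ^= f << (v.bit_length() - 1 - m)
    return v

def gf_mul(a, b, f=F, m=M):
    """Reference shift-and-add multiplication, used here only for checking."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce_mod_f(r, f, m)

def gf_square(a, f=F, m=M):
    """Square by interleaving zero bits between the bits of a, then reducing."""
    spread = 0
    for i in range(m):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)       # coefficient a_i moves to x^(2i)
    return reduce_mod_f(spread, f, m)

def divide_by_x(d, f=F):
    """Return d / x mod f; adding f first clears bit 0 when it is set."""
    if d & 1:
        d ^= f
    return d >> 1

def gf_div(g, h, f=F, m=M):
    """Algorithm 3.1: z = g * h^-1 mod f, for h != 0."""
    a, b, c, d = f, h, 0, g
    alpha, beta = m, m - 1
    while beta >= 0:
        if (b & 1) == 0:
            b >>= 1                      # shift_one(b)
            d = divide_by_x(d, f)
            beta -= 1
        else:
            old_b, old_d, old_beta = b, d, beta
            b = (a ^ b) >> 1             # shift_one(add(a, b))
            d = divide_by_x(c ^ d, f)
            if alpha > beta:
                a, c = old_b, old_d
                beta = alpha - 1
                alpha = old_beta
            else:
                beta -= 1
    return c
```

In this model the loop of `gf_div` keeps b·z ≡ d and a·z ≡ c (mod f) true on every pass, which is why the quotient can be read from c once b has been reduced to zero.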
4 RESULTS AND COMPARISONS

4.1 Simulation Results

This section presents results of the implementation of the elliptic curve scalar multiplication over $GF(2^{163})$ using a polynomial basis. In this scheme, the proposed ECC co-processor is implemented in VHDL. Xilinx ISE 13.2 and ModelSim 10.0b are used for synthesis and RTL simulation of the design, respectively. The optimization goal during synthesis is set to speed and area.

After completing the finite field arithmetic operations, the scalar multiplication is designed, synthesized and simulated.

The simulation results for the scalar multiplication for the field order m = 163, with the FPGA XC5VLX20t-2FF323 as the target device, are shown in the waveform in Figure 4.1.

Figure 4.1: Simulation result for the scalar multiplication with field order m=163.

4.2 Performance Comparisons

This section compares the time latency and resource utilization of some recent scalar multiplication hardware prototypes with the design proposed in this thesis. The comparison is restricted to processors that compute the scalar multiplication serially, as shown in Table 4.1. As seen from Table 4.1, the slowest implementation, which also has the most efficient resource utilization, was designed by Orlando & Paar [16]. Among all the works, N. Gura et al.'s design is the worst in resource utilization.
Table 4.1: Performance comparison of timing results and resource utilizations with other published results. The table is printed rotated and its row alignment is not recoverable from the layout, so each column is listed in the order in which its values appear.

Implementation: Mathias Schmalisch [61]; Orlando & Paar [39]; Kimmo Järvinen [8]; N. Gura et al. [25]; Jonathan Lutz [62]; Mubarek Kedir [1]; Proposed Design; Jian Huang [63]
FPGA: Xilinx XC4VFX140-FF1517; Xilinx XCV400E-8-BG560; Altera II EP2C20F484C7; XC5VLX20t-2FF323; Xilinx XCV2000E; Xilinx XCV2000E; Xilinx XCV400E; Xilinx XCV2000
#LUT: 4,245 (44%); 3,357 (16%); 19,508; 7,362; 3,479; 3,002
#Flip-flops: 1,393 (14%); 1,393 (6%); 1,769; 6,442; 1,930; 2,010
KP (µs): 298.4; 1300; 940; 144; 304; 135; 210; 93

As described in Table 4.1, the performance results presented in this paper are not easily comparable to previous hardware implementations. This is because each design uses a different hardware platform, finite field, field size, and multiplier. In addition, the research objective of each design differs somewhat. However, according to Table 4.1, it is evident that the proposed architecture is the fastest and more efficient than the other designs. For example, the execution speed of this design is 3 times faster than the architecture in [17]. The proposed design also uses roughly one fifth of the hardware resources of N. Gura et al.'s architecture. Table 4.2 compares the scalar multiplication timing results and resource utilizations on the same hardware platform; the designs and algorithms of the other methods were ported to this platform to make the comparison fair.

Table 4.2: Performance comparisons of timing results and resource utilizations with the same hardware platform.

Method             KP (µs)
Proposed Design    93.922
Montgomery         198.6
double-and-add     333.186

FPGA (all three methods): XC5VLX20t-2FF323
Resource figures, in the order printed (row assignment is not recoverable from the rotated layout):
#Slices: 1852; 1712; 2016
#Flip-flops: 1845; 1703; 2010
#LUTs: 3426; 3256; 3479
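The latency ratios implied by the figures reported for Table 4.2 can be checked with a few lines of arithmetic. The sketch below is an illustration, not part of the reported experiments; the dictionary keys and the function name are hypothetical labels, and the values are the three KP latencies quoted in the text.

```python
# Latencies in microseconds as reported for Table 4.2 (labels are illustrative).
latency_us = {
    "proposed": 93.922,
    "montgomery": 198.6,
    "double_and_add": 333.186,
}

def speedup_over(method, baseline="proposed"):
    """Return how many times longer `method` takes than `baseline`."""
    return latency_us[method] / latency_us[baseline]

for name in ("montgomery", "double_and_add"):
    print(f"{name}: {speedup_over(name):.2f}x")
```

With these inputs the ratios come out at about 2.11 for the Montgomery method and about 3.55 for double-and-add.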
According to Table 4.2, the simulated latency of this design can be compared with designs based on the Montgomery and double-and-add methods of point multiplication. The point multiplication in this design (93.922 µs) is roughly 2 times faster than bit-serial Montgomery point multiplication (198.6 µs), and more than 3 times faster than the bit-serial basic (traditional) double-and-add point multiplication (333.186 µs).

5 Conclusions

The work presented here concentrated on the development of a high performance elliptic curve processor for the computation of point multiplications for curves defined over fields $GF(2^m)$. The architecture presented in this thesis is based on a bit-serial scalar multiplication algorithm. The arithmetic units over the finite field $GF(2^{163})$ and the elliptic curve scalar multiplication crypto-processor are designed and simulated. The design provides a time latency of 93 µs on a Xilinx XC5VLX20t-2FF323 FPGA. Thus, the proposed design outperforms all other implementations mentioned in the literature review.

References

[1] Mubarek Kedir, Hardware Acceleration of Elliptic Curve Based Cryptographic Algorithms: Design and Simulation, Addis Ababa University, School of Graduate Studies, Faculty of Technology, April 2008.
[2] Kristin Cordwell and Chen Zhao, Elliptic Curve Computations, New Mexico Supercomputing Challenge, April 1, 2009.
[3] Mostafa Abd-El-Barr, Alaaeldin Amin and Turki F. Al-Somani, Design, Analysis, and FPGA Prototyping of High-Performance Arithmetic for Cryptographic Applications, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia.
[4] FIPS PUB 186-2, Digital Signature Standard (DSS), U.S. Department of Commerce / National Institute of Standards and Technology, January 27, 2000.
[5] eBACS, "ECRYPT Benchmarking of Cryptographic Systems," 2010, http://bench.cr.yp.to/ebats.html.
[6] Alfred Menezes and Scott Vanstone, The Elliptic Curve Digital Signature Algorithm (ECDSA), University of Waterloo, Canada.
[7] Oguz Yayla, Scalar Multiplication on Elliptic Curves, Middle East Technical University, 2006.
[8] Kimmo Järvinen, Cryptoprocessor for Elliptic Curve Digital Signature Algorithm (ECDSA), Helsinki University of Technology, Finland, August 7, 2007.
[9] Benjamin Glas, Prime Field ECDSA Signature Processing for Reconfigurable Embedded Systems, International Journal of Reconfigurable Computing, Volume 2011.
[10] Bart Preneel, Cryptographic Hardware and Embedded Systems, CHES 2011, 13th International Workshop, Nara, Japan, Sept/Oct 2011.
[11] Chang Hoon Kim, FPGA Implementation of High Performance Elliptic Curve Cryptographic Processor over $GF(2^{163})$, Journal of Systems Architecture 54 (2008) 893–900.
[12] Rahila Bilal, Area Efficient High Speed ECC Coprocessor over $GF(2^m)$, European Journal of Scientific Research, ISSN 1450-216X, Vol. 58, No. 3 (2011).
[13] V. S. Dimitrov, K. U. Järvinen, M. J. Jacobson, W. F. Chan and Z. Huang, FPGA Implementation of Point Multiplication on Koblitz Curves Using Kleinian Integers, in Cryptographic Hardware and Embedded Systems.
[14] Joseph Sterling Grah, Hash Functions in Cryptography, University of Bergen, June 1, 2008.
[15] Patrick Longa, Accelerating the Scalar Multiplication on Elliptic Curve Cryptosystems over Prime Fields, University of Ottawa, Canada, 2007.
[16] Nils Gura, Sheueling Chang Shantz, An End-to-End Systems Approach to Elliptic Curve Cryptography, Sun Microsystems Laboratories.
[17] Jian Huang, Hao Li and Phil Sweany, An FPGA Implementation of Elliptic Curve Cryptography for Future Secure Web Transaction, University of North Texas.
