
An FPGA Implementation of 30Gbps Security Module for GPON Systems


Truong Quang Vinh¹, Ju-Hyun Park¹, Young-Chul Kim¹, Kwang-Ok Kim²
¹Department of Electronics and Computer Engineering, Chonnam National University, 300 Yongbong-dong, Buk-gu, Gwangju 500-757, Korea, tqvinh@soc.chonnam.ac.kr
²Electronics and Telecommunication Research Institute, 161 Gajeong-dong, Yuseong-gu, Daejeon, Korea, kwangok@etri.re.kr

Abstract
GPON systems require gigabit-throughput data encryption for security and privacy. This paper presents an implementation of a very high-speed security module for GPON on a Virtex-4 FPGA. The security module supports payload encryption with constant delay by using the counter-mode AES algorithm. Our AES design has three advanced features: composite field arithmetic SubByte, an efficient MixColumn transformation, and on-the-fly key scheduling. A full-pipelined architecture is employed for AES in order to achieve high performance for the security module. Experiments show that the proposed architecture can achieve a throughput of 30 Gbits/s on a Xilinx Virtex-4 VLX100-12 device, which is well suited to the encryption requirements of GPON systems.

1. Introduction
Recently, GPONs (Gigabit-capable Passive Optical Networks) have become attractive for cost-effective delivery of high-bandwidth data directly to the building, curb, and home. This creates a strong requirement for the access network to be trustworthy, secure, and reliable. Because of the multicast nature of GPONs, an encryption module is an essential part of GPON systems for protecting broadcast data from eavesdropping. The ITU-T G.984 recommendation [1] specifies the Advanced Encryption Standard (AES) for payload encryption in GPONs. The National Institute of Standards and Technology (NIST) defined five modes of operation for AES [2]; however, only counter-mode AES (CTR-AES) is used for GPON payload encryption. In this paper, we present a GPON security module based on the CTR-AES algorithm, implemented with a full-pipelined architecture optimized for area and performance.

For hardware implementation of the security module, there are two critical constraints: performance and the amount of resources needed. To achieve high throughput for gigabit links in GPONs, we apply pipelined architectures to all processing blocks of the security module, especially the AES core. A pipelined AES architecture improves throughput, but it occupies much area because the hardware for all 11 round stages (the initial key addition plus 10 rounds) is duplicated. Therefore, researchers have proposed several speed-area trade-offs for implementing the AES algorithm, focusing on improving individual blocks of the cipher to optimize resources. In [8]-[10], efficient S-box implementations are proposed to minimize area and delay; instead of look-up tables that require much memory, the proposed S-box architecture combines the SubBytes and Inverse SubBytes transformations. To enhance the key schedule, some authors propose on-the-fly key expansion, which generates the round keys concurrently with encryption or decryption without extra memory to store them [8], [9].

This paper explores efficient schemes for designing the security module in order to achieve the target performance of GPON systems. Our design employs a composite field arithmetic architecture for the SubByte transformation, and we sub-pipeline this block to improve the throughput of the AES algorithm. The key expander is also improved: we propose an area-efficient key expander that computes round keys on the fly, sub-pipeline the key expansion block, and use an optimized set of registers to store the round keys. Because our key expander can start at the same time as data encryption, it suits the pipelined AES architecture.

The paper is organized as follows. Section 2 presents the architecture of the GPON security module. Section 3 describes the hardware implementation of the AES algorithm. The advanced features of the AES hardware implementation are presented in Section 4. Section 5 presents the implementation results and performance comparisons with different architectures. In Section 6, we give the conclusion.

978-1-4244-2358-3/08/$20.00 © 2008 IEEE

CIT 2008

2. The architecture of the GPON security module
The GPON security module is implemented to guarantee secure communication on the Tx/Rx links of a GPON. The module ensures the confidentiality, integrity, and origin authenticity of each frame sent and received by the OLT (Optical Line Termination) and the ONT (Optical Network Termination) [1]. The top structure of the GPON security module is shown in Fig. 1. It consists of the following blocks:
a. Port-ID Table: implemented as 4K 12-bit registers that store the port identifiers. Only frames with an appropriate Port-ID are encrypted by the CTR-AES core.
b. Security Decoder: generates the crypto counter with the format (Inter Frame Count[19:0] & Intra Frame Count[15:0]) & (Inter Frame Count[29:0] & Intra Frame Count[15:0]) & (Inter Frame Count[29:0] & Intra Frame Count[15:0]), where & denotes bit concatenation, giving 36 + 46 + 46 = 128 bits. The crypto counter increases with every 128-bit data block. The decoder also registers the 128-bit GTC payload for the Payload Bypass.
c. CTR-AES Core: performs the same process as the AES algorithm, except that its input values are the crypto counters. The 128-bit input blocks are transformed into 128-bit pseudorandom cipher blocks.
d. Key Expander: restores the initial key and generates the round keys for CTR-AES from the 128-bit input key. A shadow key is used when the OLT requests a key exchange: the ONT responds by generating, storing, and sending a new key. When the new key has been transferred successfully to the OLT, both the OLT and the ONU (Optical Network Unit) begin using the new key at precisely the same frame boundary.
e. Security Encoder: multiplexes the cipher GEM (G-PON Encapsulation Method) payloads from the Bypass GEM Payload and the Encrypted GEM Payload, depending on whether the security function is enabled. For authentic frames, the encoder XORs the 128-bit pseudorandom cipher block with the delayed GEM payload to generate the cipher GEM payload.
f. Payload Bypass: delivers the unsecured payload without encryption. It has the same delay as the encryption path, so that it stays synchronized with the cipher GEM payload at the output.

Fig. 1. The top structure of the GPON security module.

3. AES core implementation

3.1. AES general architecture
The AES algorithm is a symmetric-key cipher, in which the sender and the receiver use a single key for both encryption and decryption. In AES encryption, each round except the final round consists of four steps: SubByte, ShiftRow, MixColumn, and AddRoundKey (the final round omits MixColumn). SubByte is a nonlinear transformation that substitutes each byte of the round data according to a substitution table called the SBox. ShiftRow circularly shifts the bytes in each row of the round data. MixColumn operates on the State column by column, treating each column as a four-term polynomial. AddRoundKey is simply the exclusive OR of the round key with the data block. The round keys differ in every round and are generated by Key Expansion; together they total 128 × (10 + 1) = 1408 bits.

The AES algorithm in the GPON security module uses counter mode to encrypt data [2]. In counter-mode encryption, the forward cipher function is invoked on each counter block, and the resulting output blocks are exclusive-ORed with the corresponding plaintext blocks to produce the ciphertext blocks; in our module, this XOR is executed in the security encoder block. Because the forward cipher function is used in both CTR encryption and CTR decryption, only one hardware implementation serves both directions.

3.2. The full-pipelined architecture for AES algorithm
In order to achieve very high throughput, we apply pipelining to both the outer rounds and the inner rounds of the AES architecture. For outer-round pipelining, pipeline registers are placed between the data-path instances of consecutive rounds. For inner-round pipelining, we decompose the four processes SubByte, ShiftRow, MixColumn, and AddRoundKey into sub-pipelined stages with equal delay. Fig. 2 shows the full-pipelined architecture of the AES algorithm.
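The counter-mode data path of Sections 2 and 3.1 can be sketched in Python as follows. The sketch assumes that & in the Security Decoder format denotes bit concatenation (20 + 16 + 30 + 16 + 30 + 16 = 128 bits), and that the intra-frame count starts at 0 for the first 128-bit block of a frame and increases by one per block; both are readings of the text, not confirmed details. `forward_cipher` stands in for the CTR-AES core, and `toy_cipher` is a hypothetical, insecure placeholder, not AES.

```python
from typing import Callable, List

MASK128 = (1 << 128) - 1


def crypto_counter_block(inter_frame: int, intra_frame: int) -> int:
    """Concatenate fields MSB-first (assumed reading of the '&' format):
    Inter[19:0] | Intra[15:0] | Inter[29:0] | Intra[15:0] | Inter[29:0] | Intra[15:0]."""
    inter20 = inter_frame & ((1 << 20) - 1)
    inter30 = inter_frame & ((1 << 30) - 1)
    intra16 = intra_frame & 0xFFFF
    fields = [(inter20, 20), (intra16, 16), (inter30, 30),
              (intra16, 16), (inter30, 30), (intra16, 16)]
    block = 0
    for value, width in fields:
        block = (block << width) | value
    return block


def ctr_transform(payload_blocks: List[int], inter_frame: int,
                  forward_cipher: Callable[[int], int]) -> List[int]:
    """XOR each 128-bit payload block with the cipher output of its counter block.
    The same function both encrypts and decrypts."""
    out = []
    for intra_frame, block in enumerate(payload_blocks):  # assumed: one count per block
        keystream = forward_cipher(crypto_counter_block(inter_frame, intra_frame))
        out.append(block ^ keystream)
    return out


def toy_cipher(x: int) -> int:
    """Hypothetical stand-in for the CTR-AES core. NOT AES and NOT secure."""
    return (x * 0x9E3779B97F4A7C15F39CC0605CEDC835 + 0x1234) & MASK128
```

Running `ctr_transform` a second time with the same inter-frame count returns the original payload, which is why a single forward-cipher core serves both encryption and decryption.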

Fig. 2. Full-pipelined architecture for AES algorithm.

4. Advanced features for AES hardware implementation
This section presents the innovative features of our AES hardware implementation. Our improvements to the AES architecture focus on the SubByte, MixColumn, and Key Expander blocks, and each sub-block of the encryption process is optimized for area and delay. The detailed hardware implementations of these blocks are described as follows.

4.1. SubByte transformation
In the SubByte transformation (SBox), the input is considered as an element of GF(2^8). First, its multiplicative inverse in GF(2^8) is calculated; then, an affine transformation over GF(2) is applied. An SBox can be implemented as a look-up table, but this consumes many resources. To overcome this drawback, we implement the SBox using Galois field operations [10], with arithmetic in GF(2^4) instead of GF(2^8) to optimize area. In this architecture, the input value is first mapped to two elements of GF(2^4). Then, the multiplicative inverse is calculated using GF(2^4) operations. Next, the two GF(2^4) elements are inverse-mapped to one element of GF(2^8). Last, the affine transformation is performed.

Although the composite field implementation of the SBox is very area-efficient, it suffers from a long critical path. Among the round processes of the AES algorithm, the SubByte phase has the largest delay, so this block is given more sub-stages than the other phases. We implemented two full-pipelined architectures with 3-stage and 5-stage sub-pipelines for each round process, in which the SubByte block is decomposed into 2 stages and 3 stages, respectively. With the 2-stage pipelined architecture, which uses three 8-bit registers (Fig. 3), the critical path is broken in half. To reduce the path delay further, the 3-stage pipelined architecture can be applied (Fig. 4). The 5-stage sub-pipelined AES architecture achieves a very high throughput, and further pipelining remains possible.

Fig. 3. 2-stage pipelined SBox using GF operations.
Fig. 4. 3-stage pipelined SBox using GF operations.

4.2. MixColumn
In the MixColumn transformation, the columns of the State are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with the fixed polynomial c(x) = {03}x^3 + {01}x^2 + {01}x + {02}. In direct form, the MixColumn transformation can be expressed as

$$
\begin{aligned}
s'_{0,c} &= (\{02\}\cdot s_{0,c}) \oplus (\{03\}\cdot s_{1,c}) \oplus s_{2,c} \oplus s_{3,c}\\
s'_{1,c} &= s_{0,c} \oplus (\{02\}\cdot s_{1,c}) \oplus (\{03\}\cdot s_{2,c}) \oplus s_{3,c}\\
s'_{2,c} &= s_{0,c} \oplus s_{1,c} \oplus (\{02\}\cdot s_{2,c}) \oplus (\{03\}\cdot s_{3,c})\\
s'_{3,c} &= (\{03\}\cdot s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\}\cdot s_{3,c})
\end{aligned} \tag{1}
$$

Several architectures have been proposed for the MixColumn transformation; a substructure-sharing architecture is applied in [4], [6], [7], [9]. Following that approach, we also use substructure sharing to build efficient MixColumn hardware. To apply this technique, equation (1) is rewritten, using $\{03\}\cdot s = (\{02\}\cdot s) \oplus s$, in the equivalent form

$$
\begin{aligned}
s'_{0,c} &= \{02\}\cdot(s_{0,c} \oplus s_{1,c}) \oplus s_{1,c} \oplus (s_{2,c} \oplus s_{3,c})\\
s'_{1,c} &= \{02\}\cdot(s_{1,c} \oplus s_{2,c}) \oplus s_{2,c} \oplus (s_{3,c} \oplus s_{0,c})\\
s'_{2,c} &= \{02\}\cdot(s_{2,c} \oplus s_{3,c}) \oplus s_{3,c} \oplus (s_{0,c} \oplus s_{1,c})\\
s'_{3,c} &= \{02\}\cdot(s_{3,c} \oplus s_{0,c}) \oplus s_{0,c} \oplus (s_{1,c} \oplus s_{2,c})
\end{aligned} \tag{2}
$$

The equation is now more symmetrical: each of the four pairwise sums $s_{0,c} \oplus s_{1,c}$, $s_{1,c} \oplus s_{2,c}$, $s_{2,c} \oplus s_{3,c}$, and $s_{3,c} \oplus s_{0,c}$ is used by two outputs, so substructure sharing can be applied to reduce hardware area. Multiplication by the constant {02} is computed by the function a = xtime(b), which can be implemented at the byte level as a left shift followed by a conditional bitwise XOR with {1b} if the most significant bit of the input byte is one (b7 = 1).

Using this xtime() architecture together with XOR sharing, the MixColumn transformation can be implemented as shown in Fig. 5. The xtime() block requires three 2-input XOR gates. The total gate count for the MixColumn transformation is 324, corresponding to 108 2-input XOR gates (each XOR gate counted as 3 gates). With this scheme, we reduce the area further compared with previous architectures.

Fig. 5. (a) The efficient architecture of the MixColumn. (b) The implementation of xtime() function.

4.3. Key-Expander
The Key Expansion routine generates a total of 11 round keys from the initial key in the 128-bit AES algorithm, and a pipelined AES architecture needs all round keys to be available at the same time. Some researchers implemented a key expansion routine that computes one round key and duplicated this hardware 10 times for the 10 rounds [4], [5]. These architectures can calculate all round keys at the same time, but they consume much area. Other researchers have proposed methods to reduce this area; Xinmiao Zhang [9] proposed a key expander that operates in an on-the-fly manner. Building on that idea, we implement an area-efficient key expander that also computes the round keys on the fly.

To operate synchronously with the sub-pipelined round process, the key expansion must have the same number of sub-pipelined stages as each encryption round, so the key expander is divided into r sub-stages. A new round key is generated every r clock cycles, so all round keys are available after (r × Nr) + 1 clock cycles, where Nr = 10 is the number of rounds. Since the round keys are generated on the fly, data encryption and key expansion can start simultaneously. We use 11 registers to store the 11 round keys. This differs from the key expansion architecture in [9], in which the author used r sets of registers for all round keys and for the temporary values of the sub-pipelined stages. The sub-pipelined architecture of the on-the-fly key expander with 3 sub-stages (r = 3) is shown in Fig. 6.

Fig. 6. The architecture of on-the-fly key expander.

5. Performance results and comparisons
We implemented the GPON security module with a full-pipelined architecture of 128-bit CTR-AES on a Virtex-4 VLX100-12, using two full-pipelined AES cores with 3 sub-pipelined stages (r = 3) and 5 sub-pipelined stages (r = 5). The 3-stage sub-pipelined design has 31 stages in total (r × 10 + 1); thus, after 31 clock cycles, ciphertext blocks appear every clock cycle. The 5-stage sub-pipelined design has higher performance than the 3-stage design; however, it consumes more area for pipeline registers and takes more clock cycles for the round processes. For simulation, we used ModelSim 5.8c to verify the encrypt/decrypt operations. Xilinx ISE 8.2i was used to synthesize the design and provided post-placement timing results. The synthesis report shows that our design with the 5-stage sub-pipelined architecture achieves a throughput of 31.6 Gbits/s, while the 3-stage sub-pipelined design achieves 26.7 Gbits/s.

Since previous architectures have been implemented on VirtexE devices, we also target a Xilinx VirtexE-family device in addition to Virtex-4 in order to compare the results fairly. We evaluated the hardware cost in terms of BRAMs and slices, together with the maximum frequency and throughput. Table 1 compares existing AES implementations with ours. The designs in [3]-[5] have lower performance because they use only an outer-pipeline architecture. In the implementations of [7] and [9], the authors improve the throughput by applying sub-pipeline architectures, but these designs require more slices for extra hardware. In terms of throughput per slice, our implementation is more efficient than the published approaches.
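The stage counts and throughput figures of Section 5 follow from simple pipeline arithmetic, sketched below. The sketch assumes that each of the 10 rounds is split into r sub-stages plus one extra stage (reading the "+1" as the initial AddRoundKey is an assumption), and that the full pipeline accepts one 128-bit block per clock cycle, so throughput = 128 bits × clock rate. `implied_clock_mhz` therefore derives a clock rate from a reported throughput; it is not a reported measurement. All function names are hypothetical.

```python
AES128_ROUNDS = 10
BLOCK_BITS = 128


def pipeline_depth(r: int) -> int:
    """Stages in the full pipeline: r sub-stages per round plus one extra stage."""
    return r * AES128_ROUNDS + 1


def cycles_to_finish(r: int, n_blocks: int) -> int:
    """Cycles until the last of n_blocks leaves the pipeline (one block accepted per cycle)."""
    return pipeline_depth(r) + n_blocks - 1


def implied_clock_mhz(throughput_mbps: float) -> float:
    """Clock rate implied by a throughput figure, assuming one 128-bit block per cycle."""
    return throughput_mbps / BLOCK_BITS
```

For example, `pipeline_depth(3)` gives the 31 stages quoted for the r = 3 design, and `implied_clock_mhz(31640)` gives about 247 MHz for the r = 5 Virtex-4 result.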
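The composite-field SBox of Section 4.1 (map, inversion using GF(2^4) operations, inverse map, affine transformation) can be prototyped in software before committing to a pipelined netlist. The sketch below is a functional model only, and its parameters are assumptions rather than the paper's or [10]'s: GF(2^4) uses the modulus x^4 + x + 1, the extension GF((2^4)^2) uses y^2 + y + λ with the smallest λ that makes it irreducible, and the isomorphism is obtained by searching for a root β of the AES polynomial, so the resulting mapping matrices need not match published ones.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed ground field)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a: int) -> int:
    return next(x for x in range(1, 16) if gf16_mul(a, x) == 1)


# y^2 + y + LAM is irreducible over GF(2^4) iff t^2 + t != LAM for every t.
LAM = next(l for l in range(1, 16) if all(gf16_mul(t, t) ^ t != l for t in range(16)))


def comp_mul(a: int, b: int) -> int:
    """Multiply (ah*y + al)(bh*y + bl) in GF((2^4)^2) using y^2 = y + LAM."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAM) ^ gf16_mul(al, bl)
    return (high << 4) | low


def comp_inv(a: int) -> int:
    """Inverse in GF((2^4)^2); needs a single GF(2^4) inversion of the norm."""
    if a == 0:
        return 0
    ah, al = a >> 4, a & 0xF
    delta = gf16_mul(gf16_mul(ah, ah), LAM) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d = gf16_inv(delta)
    return (gf16_mul(ah, d) << 4) | gf16_mul(ah ^ al, d)


def comp_pow(a: int, n: int) -> int:
    r = 0x01  # multiplicative identity (0*y + 1)
    for _ in range(n):
        r = comp_mul(r, a)
    return r


# Isomorphism GF(2^8) -> GF((2^4)^2): send x to a root BETA of x^8 + x^4 + x^3 + x + 1.
BETA = next(b for b in range(256)
            if comp_pow(b, 8) ^ comp_pow(b, 4) ^ comp_pow(b, 3) ^ b ^ 0x01 == 0)
BASIS = [comp_pow(BETA, i) for i in range(8)]


def to_comp(x: int) -> int:
    """'map': linear over GF(2), an 8x8 XOR network in hardware."""
    r = 0
    for i in range(8):
        if (x >> i) & 1:
            r ^= BASIS[i]
    return r


FROM_COMP = {to_comp(x): x for x in range(256)}  # 'inverse map'


def rotl8(v: int, k: int) -> int:
    return ((v << k) | (v >> (8 - k))) & 0xFF


def affine(b: int) -> int:
    """AES affine transformation over GF(2)."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def sbox(x: int) -> int:
    return affine(FROM_COMP[comp_inv(to_comp(x))])
```

In hardware, `to_comp` and its inverse become fixed XOR networks, and the GF(2^4) multiplications and the single GF(2^4) inversion inside `comp_inv` are the small blocks between which pipeline registers can be placed.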
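Equations (1) and (2) of Section 4.2 can be checked against each other in a few lines. The sketch implements xtime() as described (left shift, then a conditional XOR with {1b}), computes {03}·x as xtime(x) ⊕ x in the direct form, and uses the four shared pair sums in the rewritten form. The function names are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def xtime(b: int) -> int:
    """Multiply by {02} in GF(2^8): left shift, then XOR {1b} if b7 was 1."""
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0)


def mul3(b: int) -> int:
    """Multiply by {03} = {02} XOR {01}."""
    return xtime(b) ^ b


def mix_column_direct(s: Sequence[int]) -> List[int]:
    """Equation (1), evaluated term by term."""
    s0, s1, s2, s3 = s
    return [xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
            s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
            s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
            mul3(s0) ^ s1 ^ s2 ^ xtime(s3)]


def mix_column_shared(s: Sequence[int]) -> List[int]:
    """Equation (2): four shared pair sums feed four xtime() blocks."""
    s0, s1, s2, s3 = s
    t01, t12, t23, t30 = s0 ^ s1, s1 ^ s2, s2 ^ s3, s3 ^ s0
    return [xtime(t01) ^ s1 ^ t23,
            xtime(t12) ^ s2 ^ t30,
            xtime(t23) ^ s3 ^ t01,
            xtime(t30) ^ s0 ^ t12]
```

For the column [0xdb, 0x13, 0x53, 0x45], both functions return [0x8e, 0x4d, 0xa1, 0xbc]; the shared form computes only four xtime() results per column instead of eight.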
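The on-the-fly idea of Section 4.3, where each round key is derived from the previous one so key expansion can run alongside encryption, has a simple software analog: a generator that keeps only the previous round key as state. The sketch below is standard AES-128 key expansion as specified in FIPS-197. It does not model the r sub-stages or the 11 round-key registers, and for brevity it builds the S-box directly in GF(2^8) rather than through the composite field; `round_keys_on_the_fly` is a hypothetical name.

```python
from typing import Iterator


def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _rotl8(v: int, k: int) -> int:
    return ((v << k) | (v >> (8 - k))) & 0xFF


def _sbox_entry(x: int) -> int:
    """Multiplicative inverse (0 maps to 0), then the affine transformation."""
    inv = 0 if x == 0 else next(y for y in range(1, 256) if gf256_mul(x, y) == 1)
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def round_keys_on_the_fly(key: bytes) -> Iterator[bytes]:
    """Yield the 11 AES-128 round keys in order; only the previous
    round key (four 32-bit words) is kept as state."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    yield bytes(key)
    for rcon in RCON:
        # SubWord(RotWord(w[i-1])) XOR Rcon
        temp = [SBOX[b] for b in words[3][1:] + words[3][:1]]
        temp[0] ^= rcon
        new_words = []
        for w in words:
            temp = [x ^ y for x, y in zip(w, temp)]  # w[i] = w[i-4] XOR w[i-1]
            new_words.append(temp)
        words = new_words
        yield bytes(b for w in words for b in w)
```

Because the generator yields round key i as soon as it is computed, a consumer can start round 1 before round key 10 exists, which is the property the hardware key expander exploits.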

Table 1. Comparison of FPGA implementations of the AES algorithm

Design                       Device         Throughput (Mbps)   Slices   BRAMs   Mbps/slice
Shuenn-Shyang [3]            XCV1000e-8     1604                1857     0       0.864
Jae-Gon Lee [4]              XCV3200e-8     5120                8009     104     0.639
Saqib [5]                    XCV812e-8      2584                2744     0       0.942
Jarvinen [7]                 XCV1000e-8     16500               11719    0       1.408
Xinmiao Zhang (r=3) [9]      XCV812e-8      11965               9406     0       1.272
Xinmiao Zhang (r=7) [9]      XCV1000e-8     21556               11022    0       1.956
Our AES core design (r=3)    XCV1000e-8     11661               8914     0       1.308
Our AES core design (r=5)    XCV1000e-8     19232               9820     0       1.958
Our AES core design (r=3)    XC4VLX100-12   26686               9478     0       2.816
Our AES core design (r=5)    XC4VLX100-12   31640               9904     0       3.195

The whole GPON security architecture, including the AES core, was synthesized on a Xilinx Virtex-4 VLX100-12. The 5-stage sub-pipelined design is a full-pipelined architecture with 51 stages (5 × 10 + 1). The total areas of the security module with the AES core (r = 3) and the AES core (r = 5) are 11958 slices and 13384 slices, respectively; the extra resources beyond the AES core are needed for the security decoder, security encoder, and payload bypass.

6. Conclusions
We presented an FPGA implementation of a high-speed GPON security module using the counter-mode AES algorithm. Our design has three main efficient features: composite field arithmetic SubByte, area-efficient MixColumn, and an on-the-fly sub-pipelined Key Expander. With these features, our design combines a compact area with high throughput, achieving 30 Gbits/s on a Virtex-4 VLX100 device. Our implementation is well suited to encryption applications in GPON systems.

Acknowledgement
This research was financially supported by the Electronics and Telecommunication Research Institute (ETRI) in Korea. The CAD tools used in this work were supported by IDEC.

References
[1] "Gigabit-capable Passive Optical Networks (G-PON): Transmission convergence layer specification", ITU-T G.984.3 Amendment 1, July 2005.
[2] M. Dworkin, "Recommendation for Block Cipher Modes of Operation", NIST Special Publication 800-38A, Dec. 2001. http://csrc.nist.gov/CryptoToolkit/modes/
[3] Shuenn-Shyang Wang, Wan-Sheng Ni, "An efficient FPGA implementation of advanced encryption standard algorithm", Proceedings of the International Symposium on Circuits and Systems, vol. 2, pp. 597-600, May 2004.
[4] Jae-Gon Lee, Woong Hwangbo, Seonpil Kim, Chong-Min Kyung, "Top-down implementation of pipelined AES cipher and its verification with FPGA-based simulation accelerator", Proceedings of 6th International Conference on ASIC, pp. 68-72, Oct. 2005.
[5] N. A. Saqib, F. Rodriguez-Henriquez, A. Diaz-Perez, "AES algorithm implementation - an efficient approach for sequential and pipeline architectures", Proceedings of the Fourth Mexican International Conference on Computer Science, pp. 126-130, Sept. 2003.
[6] N. Nedjah, L. de Macedo Mourelle, M. P. Cardoso, "A Compact Pipelined Hardware Implementation of the AES-128 Cipher", Third International Conference on Information Technology: New Generations, pp. 216-221, April 2006.
[7] Yongzhi Fu, Lin Hao, Xuejie Zhang, Rujin Yang, "Design of an extremely high performance counter mode AES reconfigurable processor", Second International Conference on Embedded Software and Systems, Dec. 2005.
[8] A. Hodjat, I. Verbauwhede, "Area-throughput trade-offs for fully pipelined 30 to 70 Gbits/s AES processors", IEEE Transactions on Computers, vol. 55, no. 4, pp. 366-372, April 2006.
[9] Xinmiao Zhang, K. K. Parhi, "High-speed VLSI architectures for the AES algorithm", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, pp. 957-967, Sept. 2004.
[10] J. Wolkerstorfer, E. Oswald, M. Lamberger, "An ASIC Implementation of the AES SBoxes", Proceeding of RSA Conference, Feb. 2002.
