
Ch 04. Hardware Implementation for Fast Encryption

이 훈 재 (李 焄 宰) Hoon-Jae Lee
CNSL: Cryptography and Network Security Lab.
hjlee@dongseo.ac.kr
http://kowon.dongseo.ac.kr/~hjlee
http://crypto.dongseo.ac.kr

Agenda

I. Overview of hardware implementation
II. Hardware Design Approaches
  2.1 Hardware design approaches
  2.2 ASIC/FPGA
  2.3 Pipeline/Parallel-structures
III. High-speed Block cipher
  3.1 High-speed components
  3.2 Comparisons for implemented AES/SEED/ARIA/IDEA
IV. High-speed Stream cipher
  4.1 Parallel-Structured/Shifting LFSR
  4.2 Word-Based Stream Cipher

2011-04-03 CNSL-Internet-DongseoUniv. 1

High-Speed Implementation
- Memory Access Time (ns)
• Samsung DDR SRAM Speed (2004.7)

High-Speed Implementation
- Adder & Shifter Speed (ns)
• 64-bit Adder Speed [0.13um ASIC, 2004]: 180/326 ps
  - Belgium, Neve et al., IEEE Trans. VLSI system, 2004
  - 720 ps @ 1.1 V, 0.18um
  - 326 ps @ 1.1 V, 0.13um
  - 180 ps @ 2.5 V, 0.13um

- Adder & Shifter Speed (ns)
• 32-bit Adder Speed [SAMSUNG 0.13um ASIC, 2002]
  - Samsung ASIC Speed (2002)

- Multiplier Speed (ns)
• 64-bit Multiplier Speed [0.35um ASIC, 2003]: 12.8 ns
  - European, SEDA
  - 2003, M.S. thesis "Design and Realization of a high-speed 64 x 64 multiplier for low power applications"
  - 12.8 ns @ 0.35um


High-Speed Implementation
– Core Operation

Operation | Max. speed (ASIC)
XOR | several 10 ps
Mod 2^32 add | 180 ps
Mod 2^32 sub | 180 ps
Fixed shift | several 10 ps
Variable rotate (>>>, <<<) | several 10 ps
Mod 2^32 mul | 12.8 ns
GF(2^8) mul (*LUT) | several ns
S-box (*LUT) | several ns

Core operations used by each algorithm (one ● per operation used):
MARS ● ● ● ● ● ● ●
RC6 ● ● ● ● ●
Rijndael ● ● ● ●
Serpent ● ● ●
Twofish ● ● ● ● ●

1.1 AES evaluation - Initial (1997, NIST)

1.1 AES evaluation - Final (2000, NIST)

1.1 AES evaluation – hardware efficiency

Government  ASIC
NSA
IBM, Inc.
Mitsubishi, Inc.
Universities  FPGA
UC Berkeley
USC(University of Southern California)
WPI(Worcester Polytechnic Institute )
GMU(George Mason University)
Micronic, Inc.

1.2 Considerations for design approaches

• Crypto/Non-Crypto Module
  - Crypto Module: refer to FIPS 140-1, 140-2, 140-3 (CMVP)
  - Non-Crypto Module: power module, interface module, etc.

• Security device approaches
  - End-to-end (ETE) device: mainly software-oriented design
  - Link encryption (LE) device: mainly hardware-oriented design
  - Hybrid device: ETE & LE

• Noiseless or noisy channel approaches
  - Block cipher (PKC): wired communication, computer security. Not suitable for noisy channels such as satellite or wireless links.
  - Stream cipher: bit-oriented, suitable for satellite or wireless channels. A block cipher has to be switched to OFB mode to be used in this way.
  - Word-based stream cipher: block cipher & stream cipher
    - Europe ECRYPT-eSTREAM (http://www.ecrypt.eu.org/stream/)

1.2 Considerations for design approaches

Implementation Platforms

1.2 Considerations for design approaches

Which Platform?
• The choice of the implementation platform is driven by a multitude of factors, which include:
  - Performance needed
  - Development and per-unit costs
  - Power consumption (important in the case of wireless devices!)
  - Flexibility
  - Physical security
• The choice depends heavily on application requirements, but…

1.2 Considerations for design approaches

Platform Characteristics
1.2 Considerations for design approaches

Reconfigurable Hardware and Cryptography
• Why Hardware?
  - Software implementations are too slow for time-critical applications
  - Hardware implementations are intrinsically more secure
• Why Reconfigurable? …

1.2 Considerations for design approaches

Reconfigurable Hardware and Cryptography
• Advantages of reconfigurable platforms
  - Algorithm agility
  - Algorithm upgradability
  - Architecture efficiency
  - Resource efficiency
  - Algorithm modification
  - Throughput (relative to software)
  - Cost efficiency (relative to ASICs)

1.2 Considerations for design approaches

Hardware Implementation Methodologies
• HDL design is done using an FSM model
  - Termed a CFSM (Cryptographic FSM)
• Modification of the generic CFSM for implementing hash functions…

1.2 Considerations for design approaches

Hardware Implementation Methodologies
• Four basic components of a CFSM
  - State Register
  - Key Register
  - Updating Logic
  - Control & Load Logic
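To make the four CFSM components concrete, here is a minimal Python sketch. The class name `ToyCFSM`, the register width, and the rotate-and-XOR update rule are illustrative assumptions that do not come from the slides, and the update rule has no cryptographic strength. It only shows how a state register, a key register, updating logic, and control/load logic fit together.

```python
class ToyCFSM:
    """Toy cryptographic FSM (hypothetical example, NOT a secure cipher)."""

    def __init__(self, width: int = 32) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.state = 0  # State Register
        self.key = 0    # Key Register

    def load(self, key: int, initial_state: int) -> None:
        """Control & Load Logic: latch the key and the initial state."""
        self.key = key & self.mask
        self.state = initial_state & self.mask

    def _update(self, state: int) -> int:
        """Updating Logic: rotate left by one bit, then XOR with the key.

        This rule is an assumption made for illustration only.
        """
        rotated = ((state << 1) | (state >> (self.width - 1))) & self.mask
        return rotated ^ self.key

    def step(self) -> int:
        """One clock tick: the state register takes the updated value."""
        self.state = self._update(self.state)
        return self.state


if __name__ == "__main__":
    fsm = ToyCFSM(width=8)
    fsm.load(key=0x0F, initial_state=0x81)
    print([hex(fsm.step()) for _ in range(4)])
```

In an HDL design the same split would typically map to two registers, one combinational block, and a small controller. The Python class only mirrors that partitioning.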

1.2 Considerations for design approaches

Hardware Implementation Methodologies
• CFSM for Hash Functions

1.2 Considerations for design approaches

• Module approaches
  - S/W module: designed/implemented at client PC
  - u-P module: crypto-microprocessor → low/medium speed
  - H/W module: FPGA or ASIC → high speed or ultra-high speed (higher costs)

• Hardware design approaches
  - FPGA (Field-Programmable Gate Array):
    - ALTERA, FPGA
    - XILINX, CPLD
  - ASIC (Application-Specific Integrated Circuit):
    - CAD company supported

1.2 Considerations for design approaches

• Examples
  - Project 25 OTAR Documents: TIA/EIA APCO Project 25
    - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA, "Over-The-Air-Rekeying (OTAR) Protocol New Technology Standards Project Digital Radio Technical Standards," Jan. 1996.
    - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA-1, "Over-The-Air-Rekeying (OTAR) Protocol Addendum 1," Dec. 2000.
    - TIA/EIA Telecommunications Systems Bulletin, TSB 102.AACB, "Over-The-Air-Rekeying (OTAR) Operational Description," Jan. 1997.
  - TIA, IS 102.AAAA, "DES Encryption Protocol"
  - TIA, TSB-102.BAAA, "Recommended Common Air Interface"
    - Addendum BAAA-1
  - TIA, TSB-102.BAAD, "Common Air Interface Operational Description for Conventional Channels"

1.2 Considerations for design approaches

• Examples
[Figure: block diagram showing the Crypto Module and the Non-Crypto Modules]
II. Hardware Design Approaches

• Design tools
  - FPGA (Field Programmable Gate Array)
  - ASIC (Application Specific Integrated Circuits)
  - NT (nano-technology)
• Design approaches
  - Round-common: implementation of a one-round function → fewer gates, lower costs, but low speed
  - Pipeline-structured: implementation of full-round sequential functions → more gates, higher costs, but high speed
  - Parallel: parallel operation of round-common or pipeline-structured units
  - Parallel & pipeline: combinations of parallel and pipeline-structured

2.1 Higher speed approaches in block cipher

Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex/Altera FLEX | Low cost, easy design at lab/research institute; slower than ASIC for the same layout
 | ASIC (application specific IC), e.g. Samsung/Hynix/Lucent Tech. | High cost, difficult design at a design house; about 5~10 times faster than FPGA for the same layout
2) Semiconductor layout | 0.5 µm → 0.35 µm → 0.22 µm → 0.18 µm → 0.13 µm (→ 90nm → 70nm → 60nm → 50nm) | Decreasing layout (downsizing) → high speed, low power
3) Pipeline approaches | Round-common →◘→ | Single-round chip (1)
 | Pipeline-structured →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip (k times in speed); k = #round, #round/divisor, or #round x positive integer
 | Parallel-structured →◘→ →◘→ | Single-round chip x n parallel (n times)
 | Parallel pipeline →◘◘◘◘◘→ →◘◘◘◘◘→ | Multiple k-round chip x n parallel (n x k times)
4) Component approaches | Selection of high-speed components. Ex) XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security requirements come first)
 | Combination or optimization of high-speed components. Ex) # rounds, # steps in F-function | Optimized combination of components: speed is inversely proportional to the number of rounds; speed is inversely proportional to the number of steps per round; the faster each step, the better

2.2 ASIC vs. FPGA

Items | ASIC | FPGA
Throughput | High (0.13 µm/0.5 µm) | Medium (0.22um, 0.35um)
Power consumption | Light-weight | Medium
Costs for implementation | High costs (50,000$-100,000$/item) | Low costs (design tools & PC)
Costs for sales | Low cost (large volumes), high cost (small volumes) | Medium cost
Design approaches | Expert company (customized) | Field-programmable

2.2 FPGA Device - example (Target)

• FPGA – Altera chip
  - The FPGA device is "Cyclone II" by Altera
• Specification of "Cyclone II"
  - EP2C35F672C8
  - 350,000 gates
  - 33,216 logic elements
  - 672 pins
  - Speed grade 8 (4 is the fastest speed)

2.2 FPGA Device - example (Target)

2.2 ASIC – companies in Korea

• Foundry – manufacturing company


 Samsung LSI (http://www.samsungelectronics.com/semiconductors/asic/ASIC.htm)
 Hynix Semiconductor (http://www.hynix.com/)
 ANAM Semiconductor (http://www.aaww.com/)
 DongBu Electronics (http://www.dsemi.com/)

 Design House
 Samsung (http://www.samsungelectronics.com/semiconductors/asic/ASIC.htm)
 Hynix (http://www.hynix.com/)
 CNC technology (http://www.cnstec.com/)
 ADC (http://www.adc.co.kr/)
 TLI (http://www.tli.co.kr/)
 ECT (http://www.ect.co.kr/)
 INC technology (http://www.inctech.co.kr/)
 ARALION (http://www.aralion.com)

2.3 Pipeline-structured

• Input → (unit 1 … unit n) → Output

• Throughput (s: # of input data, n: # of pipeline steps):
  $S_p = \frac{T_1}{T_n} = \frac{s \times n}{n + (s - 1)}$

• Efficiency:
  $E = \frac{S_p}{n} \times 100$

• Max. throughput:
  $\lim_{s \to \infty} S_p = \lim_{s \to \infty} \frac{n}{\frac{n}{s} + \left(1 - \frac{1}{s}\right)} = n$
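The pipeline formulas above can be checked numerically. The following Python sketch evaluates the speedup S_p and the efficiency E for a few input counts. The function names and the sample values of s and n are assumptions chosen for illustration.

```python
def pipeline_speedup(s: int, n: int) -> float:
    """Speedup S_p = s*n / (n + (s - 1)) for s inputs and n pipeline steps."""
    if s < 1 or n < 1:
        raise ValueError("s and n must be positive integers")
    return (s * n) / (n + (s - 1))


def pipeline_efficiency(s: int, n: int) -> float:
    """Efficiency E = S_p / n * 100, in percent."""
    return pipeline_speedup(s, n) / n * 100.0


if __name__ == "__main__":
    n = 10  # e.g. one pipeline step per round of a 10-round cipher
    for s in (1, 10, 100, 10_000):
        print(f"s={s:>6}: S_p={pipeline_speedup(s, n):6.3f}, "
              f"E={pipeline_efficiency(s, n):6.2f} %")
```

With a single input the pipeline gives no gain (S_p = 1). As s grows, S_p approaches n, which matches the limit on the slide.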

2.3 Pipeline vs. round-common

[Figure, left panel: total output bits vs. number of rounds executed, for the round-common scheme and the pipeline scheme]
[Figure, right panel: total gate count (in units of 10,000 gates) vs. number of rounds, for the round-common scheme and the pipeline scheme]
• round-common (blue): # of gates is independent of the number of rounds
• pipeline (red): # of gates increases with the number of rounds

III. High-speed Block cipher

• Round function for high speed
  - S-box → memory speed (RAM/ROM)
  - Bit-wise adder/multiplier → XOR/AND
  - Word-wise modulo adder/multiplier (mod 2^32) → (+) mod 2^32, (⊙) mod 2^32
  - S-P/MA/SSMA → LUT (Look-Up Table)
• # of rounds vs. throughput or security
• Pipeline vs. throughput or area

Block Ciphers: Key Elements

Block Cipher: Core Operations

• Bitwise XOR, AND, OR.
• Addition or subtraction modulo 2^n.
• Shift or rotation by a constant number of bits.
• Data-dependent rotation by a variable number of bits.
• Multiplication modulo the table entry value.
• Multiplication in the Galois field specified by the table entry value.
• Inversion modulo the table entry value.
• Look-up-table substitution.
3.1 High-speed core components
- Memory Access Time (ns)
• Samsung Electronics DDR SRAM Speed (July 2004)

3.1 High-speed core components
- Adder & Shifter Speed (ns)
• 64-bit Adder Speed [0.13um ASIC]: 180/372 ps
  - Belgium, Neve et al., IEEE Trans. VLSI system, 2004
  - 720 ps @ 1.1 V, 0.18um
  - 326 ps @ 1.1 V, 0.13um
  - 180 ps @ 2.5 V, 0.13um
• 64-bit adder [2006]
  - 372 ps @ 0.18-um CMOS
  - IEEE CS ISCAS 2006, Kim JooYoung, et al.

3.1 High-speed core components
- Adder & Shifter Speed (ns)
• 32-bit Adder Speed [SAMSUNG 0.13um ASIC]: 387 ps
  - Samsung ASIC Speed

3.1 High-speed core components
- Multiplier Speed (ns)
• 64-bit Multiplier Speed [0.35um ASIC]: 10 ns
  - Europe, SEDA et al.
  - 2003, M.S. thesis "Design and Realization of a high-speed 64 x 64 multiplier for low power applications"
  - 12.8 ns @ 0.35um
  - 10 ns [2003-11, Joseph Gebis, UCB]
3.1 High-speed core components
– Core Operation

Operation | Max. delay (ASIC)
XOR | ~10 ps
Mod 2^32 add | 180 ps
Mod 2^32 sub | 180 ps
Fixed shift | ~10 ps
Variable rotate (>>>, <<<) | ~10 ps
Mod 2^64 mul | 10.0 ns
GF(2^8) mul (*LUT) | ~1 ns
S-box (*LUT) | ~1 ns

Core operations used by each algorithm (one ● per operation used):
MARS ● ● ● ● ● ● ●
RC6 ● ● ● ● ●
Rijndael ● ● ● ●
Serpent ● ● ●
Twofish ● ● ● ● ●

3.2 Implementation examples - AES (Final, 2000)

Designed by | ASIC or FPGA | Performance [Mbps] | # Integrated [Gates] | Year | Remarks
GMU (George Mason Univ.) | FPGA 0.22um | 414 | 2,507 CLB slices | 2000 |
USC | FPGA 0.22um | 353 | 4312 | 2000 |
WPI (Worcester Polytechnic Institute) | FPGA 0.22um | 294 | 3528 | 2000 |
MICRONIC | FPGA 0.22um | 179 | - | 2000 |
NSA | ASIC 0.5um | 606 | - | 2000 | 2-Gen. Previous
3.2 Implementation Examples in Block cipher
- AES: George Mason Univ’2000
3.2 Implementation Examples in Block cipher
- AES pipeline [2002-2006]

Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
Lee, YoonKyung et al. | ASIC 0.25um | 2.56 | | CISC'2002 |
Goo, BonSeog et al. | FPGA | 20 | | WISC'2002 | 40-stage(10) pipeline
McLoone, McCanny | FPGA | 12.02 | | IEEE Syp2000 |
Hodjat et al. /UCLA | FPGA | 21.54 | 5177 CLB slices | FCCM2004 |
Saggese et al. | FPGA | 20.3 | 5810 CLB slices | FPL2003 |
Goo, BonSeog et al. | FPGA 0.25um | Under Gbps | 3,992 | KIISC Journal 2006.10 | RFID for low-power

3.2 Implementation Examples in Block cipher
- SEED

Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
Seo, YongHo et al. | FPGA | 2.62 | 16,770 | APASIC'2000 |
Choi, ByungYoon et al. | ASIC 0.25um | 237 | 14,110 | KICS, 2000 |
Jeon, ShinWoo et al. | ASIC 0.5um | 258.9 | 17,610 | KIISC, 2001 | Non-pipeline
Choi, HeongMug et al. /Samsung | FPGA | 35.34 | 10,610 | KIISC, 2004.10 | Smartcard, etc.
Um, SungYong et al. | FPGA Xilinx Virtex-II | 6,400 | (54,803 LUT + 2,048 buf+gates) | KISS, 2003.3 | Pipeline
3.2 Implementation Examples in Block cipher
- ARIA

Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
NSRI Team | ASIC Hynix 0.25um | 1,871 ([AES]* [1,839]*) | 8,935 ([9,088]*) | WISC2003 | Non-pipeline; high-speed, low-integrated
Park, JinSeob et al. | FPGA 0.22um | 1,142 | 29,930 ga / 1,599 sl | WISC2004 | Non-pipeline
Yoo, YeongGab et al. | Xilinx VirtexE-1600 FPGA | 437 | 1,490 slices | IEEK, 2005.4 | Non-pipeline
Jang, HwanSug et al. | 64-bit microprocessor | - | - | KIISC Journal 2006.6 | -

3.2 Implementation steps – AES = 4 steps
c) Round function

3.2 Implementation steps – IDEA = 7 steps

- 8-round transformation (6 x 8 = 48 subkeys)
- Output transformation (4 subkeys)
- Key generation (16-bit, 52 subkeys)
- Multiplication-Addition (MA) structures

3.2 Implementation steps – SEED = 8 steps

a) Block diagram
b) Round functions
3.2 Implementation steps – ARIA = 3 steps

3.2 Implementation steps – comparisons

Parameters | DES | T-DES | IDEA | ARIA | SEED | AES
Block size | 64 | 64 | 64 | 128 | 128 | 128
Key size | 56 | 112 | 128 | 128/192/256 | 128 | 128/192/256
# rounds | 16 | 16 x 3 | 8 | 12/14/16 | 16 | 10/12/14
Roundkey size | 48x16 | 48x16, 3 | 16x52 | 128x13/128x15/128x17 | 32x2x16 | 128x11/128x13/128x15
S-BOX | 6x4, 8 | 6x4, 8 | 16-bit MA | 8x8, 4; 8x32, 4 | 8x8, 2 | 8x8
Key space | 2^56 | 2^112 | 2^128 | 2^128~2^256 | 2^128 | 2^128~2^256
Proper CPU (word size) | 8-bit | 8-bit | 16-bit | 32-bit | 32-bit | 32-bit
Year | 1975-77 | 1979 | 1990-92 | 2003 | 1997 | 2000-01
Country | USA | USA | Switzerland | KOREA | KOREA | USA
Standard | FIPS-46 | - | - | - | TTAS.KO-12.0004 | FIPS-197

3.2 Block cipher implementation analysis

3.2 Implementation steps – comparisons
- Block cipher mode vs. performance (by KISA)

Cipher | Clock period (ns) | # of rounds (# of clocks) | # steps in 1 round | Speed (Mbps)
MARS | 5 | 32 (114) | - | 224.6
RC6 | 5 | 20 (122) | - | 209.8
RIJNDAEL | 9.7 | 10 (10) | 4 steps | 1,319.6
SERPENT | 6.3 | 32 (32) | - | 634.9
TWOFISH | 5 | 16 (128) | - | 200
SEED | 10.3 | 16 (48) | 8 steps | 258.9
ARIA | - | 12 () | 3 steps | 1,781.0

[Summary]
(1) The number of steps in one round is an important factor for a high-speed version
    ( SEED=8 > IDEA=7 > DES=5 > AES=4 > ARIA=3 )
(2) A more important factor is the critical time of each step
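The speed column in the KISA table is consistent with a simple relation: block size divided by (clock period x number of clocks). The slides do not state this formula, so the Python sketch below is an inference. It assumes a 128-bit block for every listed cipher and omits ARIA because its clock period is not given.

```python
BLOCK_BITS = 128  # assumption: all ciphers in the table use 128-bit blocks

# cipher: (clock period in ns, clocks per block), taken from the table above
KISA_ROWS = {
    "MARS": (5.0, 114),
    "RC6": (5.0, 122),
    "RIJNDAEL": (9.7, 10),
    "SERPENT": (6.3, 32),
    "TWOFISH": (5.0, 128),
    "SEED": (10.3, 48),
}


def speed_mbps(clock_ns: float, clocks: int, block_bits: int = BLOCK_BITS) -> float:
    """Throughput in Mbps: bits per block / time per block (ns) * 1000."""
    return block_bits / (clock_ns * clocks) * 1000.0


if __name__ == "__main__":
    for name, (clock_ns, clocks) in KISA_ROWS.items():
        print(f"{name:<9} {speed_mbps(clock_ns, clocks):8.1f} Mbps")
```

Under these assumptions the computed values round to the figures in the table, which illustrates why fewer clocks per block and a shorter critical path both raise throughput.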

Conclusion (On Designing Block cipher)

Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex/Altera FLEX | Low cost, easy design at lab/research institute; slower than ASIC for the same layout
 | ASIC (application specific IC), e.g. Samsung/Hynix/Lucent Tech. | High cost, difficult design at a design house; about 5~10 times faster than FPGA for the same layout
2) Semiconductor layout | 0.5 µm → 0.35 µm → 0.22 µm → 0.18 µm → 0.13 µm (→ 90nm → 70nm → 60nm → 50nm) | Decreasing layout (downsizing) → high speed, low power
3) Pipeline approaches | Round-common →◘→ | Single-round chip (1)
 | Pipeline-structured →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip (k times in speed); k = #round, #round/divisor, or #round x positive integer
 | Parallel-structured →◘→ →◘→ | Single-round chip x n parallel (n times)
 | Parallel pipeline →◘◘◘◘◘→ →◘◘◘◘◘→ | Multiple k-round chip x n parallel (n x k times)
4) Component approaches | Selection of high-speed components. Ex) XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security requirements come first)
 | Combination or optimization of high-speed components. Ex) # rounds, # steps in F-function | Optimized combination of components: speed is inversely proportional to the number of rounds; speed is inversely proportional to the number of steps per round; the faster each step, the better

IV. High performance in stream cipher

• High-speed components
  - Higher speed LFSR
  - Higher speed nonlinear combiner
  - Higher speed filter function
  - Higher speed irregularly clocked device
• Parallel Stream Cipher
• Word-Based Stream Cipher

4.1 Higher speed of LFSR
- LFSR-based stream cipher

• Basic components in stream cipher
  - LFSR, NFSR, nonlinear function, etc.
• LFSR (Linear Feedback Shift Register)
  - Primitive polynomial → maximal period = 2^n - 1
  - Ext. clock → speed control (1-bit output/clock)
  - Linear output → output bits are predictable

[Figure: n-stage LFSR driven by the system clock, producing the output]

4.1 Higher speed of LFSR
- LFSR-based stream cipher

- Primitive polynomial: $P(x) = x^5 + x^3 + x^2 + x + 1$

[Figure a) Block diagram of the n=5 stage LFSR: stages 0 to 4 with feedback taps 1, x, x^2, x^3 and x^5, clocked by the system clock]
[Figure b) Implementation of the n=5 stage LFSR with five D flip-flops (D, Q, ~Q) sharing the system clock]
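The LFSR described above can be simulated in a few lines. The Python sketch below assumes a Fibonacci-style register whose recurrence is s(t+5) = s(t+3) ^ s(t+2) ^ s(t+1) ^ s(t), which is one common way to realize P(x) = x^5 + x^3 + x^2 + x + 1. The tap convention, function names, and seed are assumptions and may differ from the flip-flop wiring in the figure.

```python
from typing import List, Sequence


def lfsr_step(reg: List[int], taps: Sequence[int]) -> List[int]:
    """Shift once: drop the oldest bit and append the XOR of the tapped bits."""
    new_bit = 0
    for i in taps:
        new_bit ^= reg[i]
    return reg[1:] + [new_bit]


def lfsr_stream(seed: Sequence[int], taps: Sequence[int], count: int) -> List[int]:
    """Return `count` output bits, one bit per clock."""
    reg = list(seed)
    out = []
    for _ in range(count):
        out.append(reg[0])
        reg = lfsr_step(reg, taps)
    return out


def lfsr_period(seed: Sequence[int], taps: Sequence[int]) -> int:
    """Count clocks until the register returns to the seed state."""
    start = list(seed)
    reg = lfsr_step(start, taps)
    steps = 1
    while reg != start:
        reg = lfsr_step(reg, taps)
        steps += 1
    return steps


if __name__ == "__main__":
    TAPS = (0, 1, 2, 3)      # s(t+5) = s(t+3) ^ s(t+2) ^ s(t+1) ^ s(t)
    SEED = [1, 0, 0, 0, 0]   # any non-zero 5-bit state
    print(lfsr_stream(SEED, TAPS, 10))
    print("period:", lfsr_period(SEED, TAPS))
```

Because the polynomial is primitive, every non-zero seed cycles through all 2^5 - 1 = 31 non-zero states before repeating.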

4.1 Higher speed of LFSR
- LFSR-based stream cipher

• Parallel Stream Cipher
  - Basic components in parallel stream cipher: PS-LFSR, Serial/Parallel Converter, m-Parallel Nonlinear Functions
• Parallel-Shifting LFSR (PS-LFSR)
  - Crypto-degree → similar to LFSR
  - Ext. clock → m-bit parallel output per clock → m times faster ( 2≤m≤n )

[Ref] H.J. Lee, S.J. Moon, "Parallel Stream Cipher for Secure High-Speed Communications," Signal Processing, Vol. 82, No. 2, Feb. 2002.

4.1 Higher speed of LFSR
- LFSR-based stream cipher

• (n, m) PS-LFSR

[Figure a) n-stage LFSR: stages 0 … n-1 clocked by SYSTEM CLOCK; the feedback connection is the original connection]
[Figure b) (n,m) PS-LFSR: an (m-1)-stage LBUF (stages -(m-1) … -1) followed by the n-stage PS-LFSR (stages 0 … n-1), arranged in m rows so that m bits are output per SYSTEM CLOCK]
  - Feedback connection 1: the original combination
  - Feedback connection 2: 1-bit left-shifting of the original combination
  - …
  - Feedback connection m: (m-1)-bit left-shifting of the original combination

4.1 Higher speed of LFSR

• (n=40, m=8) PS-LFSR - example

[Figure a) n=40 stage LFSR: stages 0 … 39 clocked by SYSTEM CLOCK]
[Figure b) (n,m)=(40,8) PS-LFSR: 8-stage LBUF plus 40-stage PS-LFSR, arranged in 8 rows (stages -8 … 39), m=8 output bits per SYSTEM CLOCK]
  - Feedback connection 1: s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t)
  - Feedback connection 2: s(39+t) = s(34+t) ^ s(1+t) ^ s(t) ^ s(-1+t)
  - …
  - Feedback connection 8: s(33+t) = s(28+t) ^ s(-5+t) ^ s(-6+t) ^ s(-7+t)

4.1 Higher speed of LFSR

• (n=39, m=8) PS-LFSR

  $p(x) = x^{39} + x^{37} + x^{25} + x^{24} + x^{22} + x^{8} + x^{6} + x^{4} + 1$

[Figure b) (n,m)=(39,8) PS-LFSR: 7-stage LBUF plus 39-stage PS-LFSR, arranged in 8 rows (stages -7 … 38) with feedback connections 1 to 8, m=8 output bits per SYSTEM CLOCK]
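The idea of emitting m bits per clock can be illustrated in software. The Python sketch below uses the recurrence of feedback connection 1 from the (40, 8) example, s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t), and checks that a generator producing 8 bits per "clock" yields the same sequence as the bit-serial one. It is a simplified model with assumed function names: it does not reproduce the LBUF stage layout of the figure, and bits computed late in a clock reuse bits computed earlier in the same clock, which hardware would resolve with shifted feedback networks.

```python
from typing import List, Sequence

N, M = 40, 8  # register length and parallel width from the (40, 8) example


def serial_bits(seed: Sequence[int], count: int) -> List[int]:
    """Bit-serial reference: one new bit per clock."""
    s = list(seed)
    while len(s) < count:
        t = len(s) - N
        s.append(s[t + 35] ^ s[t + 2] ^ s[t + 1] ^ s[t])
    return s[:count]


def parallel_words(seed: Sequence[int], clocks: int, m: int = M) -> List[List[int]]:
    """Emit one m-bit word per clock using m shifted copies of the feedback."""
    window = list(seed)  # holds s(t) ... s(t+N-1)
    words = []
    for _ in range(clocks):
        words.append(window[:m])  # m output bits in a single clock
        for k in range(m):        # feedback copy k yields s(t+N+k)
            window.append(window[k + 35] ^ window[k + 2] ^ window[k + 1] ^ window[k])
        window = window[m:]       # shift the whole register by m positions
    return words


if __name__ == "__main__":
    seed = [1] + [0] * (N - 1)
    words = parallel_words(seed, clocks=4)
    flat = [bit for word in words for bit in word]
    print(flat == serial_bits(seed, 4 * M))
```

At the same system clock, the word-per-clock version delivers m times as many bits, which is the source of the "m times faster" claim.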

4.1 Higher speed of LFSR

• m-Parallel Stream Cipher

[Figure: transmitter, channel and receiver. On each side, N LFSRs (l1-LFSR1 … lN-LFSRN, outputs x1 … xN, connections C1 … CN) feed the nonlinear combiners f1, f2, ..., fm (memoryless or not). The plaintext passes through a serial/parallel converter, is combined m bits at a time with the combiner outputs, and a parallel/serial converter produces the ciphertext. The receiver mirrors this structure to recover the plaintext.]

4.1 Higher speed of LFSR

• m-parallel summation generator

[Figure: m PS-LFSRs (l1-PS-LFSR1 … lm-PS-LFSRm), each with m-bit outputs (x11 … xm1 through x1m … xmm), feed m summation generators SUM-BSG1 … SUM-BSGm with carries c11, c12, .., c1M through cm1, cm2, .., cmM, producing outputs y1 … ym.]
Note: N = m, M = M1 = .. = Mm

4.1 Higher speed of LFSR

• Higher speed clock-controlled LFSR (compensate for down-speed)

[Figure: clock-controlled generator. A 39-stage LFSR with its feedback connection, loaded through KEYDATA, KEYLDEN, SEL and CLK, drives the CLOCK-CONTROL block (functions fac(t), fb(t), z(t)). An 89-stage 4-bit parallel LFSR (stages d0 … d88) forms the DATA GENERATION block, and the fd function (filter function) output sequence is the KEYSTREAM. All blocks share the SYSTEM CLOCK.]

4.1 Higher speed of LFSR

• Higher speed clock-controlled LFSR (compensate for down-speed)

[Figure: 89-stage register with a 3-stage LBUF (d-3, d-2, d-1) and feedback connections 1 to 4, e.g. taps d[t], d[6+t], d[9+t], d[34+t], d[36+t], d[47+t], d[50+t] for d[88+t] and d[-3+t], d[3+t], d[6+t], d[31+t], d[33+t], d[44+t], d[47+t] for d[85+t]. Each stage d-3 … d88 has a 4-to-1 selector (inputs 0 to 3) controlled by S0 and S1, where sum = S0 and carry = S1 come from a full adder (FA) fed by the 127-LFSR and 129-LFSR outputs. Inputs: KEY_DATA, KEY_LDEN, SYSTEM_CLOCK.]
4.1 Higher speed of LFSR

• (n=39, m=8) PS-LFSR

  $p(x) = x^{39} + x^{37} + x^{25} + x^{24} + x^{22} + x^{8} + x^{6} + x^{4} + 1$

[Figure b) (n,m)=(39,8) PS-LFSR: 7-stage LBUF plus 39-stage PS-LFSR, arranged in 8 rows (stages -7 … 38) with feedback connections 1 to 8, m=8 output bits per SYSTEM CLOCK]

4.1 Higher speed of LFSR

• (n=39, m=8) PS-LFSR: Graphic Design

4.1 Higher speed of LFSR

• (n=39, m=8) PS-LFSR: Simulation

4.1 Higher speed of LFSR

• (n=39, m=8) PS-LFSR: Time Delay
4.1 Higher speed of LFSR

• LFSR vs. PS-LFSR

Items | 39-LFSR | (39,8) PS-LFSR
Period | 2^39 − 1 | 2^39 − 1
Processing rate @ 500 MHz system clock (max. delay = 1.73 ns) | 500 Mbps | 500 x 8 = 4 Gbps
Hardware complexity [gates] | 219 gates | 401 gates (1.83 : 1)
  D F/Fs | 39 | 46
  (2-1) MUX | 1 | 1
  XORs | 7 | 56

4.2 Word Based Stream Ciphers
- Basic components in WBSC

• Word Based Linear Feedback Shift Register (LFSR)
• Dynamic Tables
• Pseudorandom Functions
• Generic Finite State Machine (FSM)

4.2 Word Based Stream Ciphers
- Word Based LFSR

[Figure: word-based LFSR made of W-bit word stages, feeding a nonlinear filter that produces the output]

4.2 Word Based Stream Ciphers
- Dragon

• Word-based Stream Cipher (WBSC)
  - Fast, word-operation (@Tx, @Rx)
  - Examples: Snow, Sober, Turing, Dragon
4.2 Word Based Stream Ciphers
- Dragon

• Dragon
  - Word-based stream cipher: ICISC’2004
  - Throughput: 23 Gbps (H/W), 3.8 Gbps (S/W)
  - Joint work with Australia QUT-ISRC (Dr. Dawson)
  - Submitted to the international ECRYPT-eSTREAM project (phase 3)

[Figure: Dragon structure with NLFSR, FPDS selection, F function, feedback and keystream output]

4.2 Word Based Stream Ciphers
- Dragon

• H/W performance analysis (32-bit word)
  - Step 1: 0.387 ns
  - Step 2: 1.0 ns
  - Step 3: 1.0 ns
  - Step 4: 0.387 ns
  - Total: 2.774 ns
  - Performance = 64 bit / 2.774 ns = 23 Gb/s (maximal)
                = 64 bit / 4.27 ns = 15 Gb/s (normal)
  * In SAMSUNG 0.13um ASIC, 32-bit adder delay time = 0.387 ns
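The arithmetic behind the Dragon figures can be reproduced directly. This Python sketch sums the four step delays listed above and divides the 64 output bits by the total delay. The variable and function names are assumptions, and the 4.27 ns "normal" delay is taken from the slide as given.

```python
STEP_DELAYS_NS = (0.387, 1.0, 1.0, 0.387)  # Step 1 .. Step 4 from the slide
OUTPUT_BITS = 64                            # keystream bits per iteration
NORMAL_DELAY_NS = 4.27                      # "normal" case delay from the slide


def throughput_gbps(bits: int, delay_ns: float) -> float:
    """Bits per nanosecond equals gigabits per second."""
    return bits / delay_ns


if __name__ == "__main__":
    total_ns = sum(STEP_DELAYS_NS)
    print(f"total delay : {total_ns:.3f} ns")
    print(f"maximal     : {throughput_gbps(OUTPUT_BITS, total_ns):.1f} Gb/s")
    print(f"normal      : {throughput_gbps(OUTPUT_BITS, NORMAL_DELAY_NS):.1f} Gb/s")
```

The maximal case evaluates to about 23.1 Gb/s and the normal case to about 15.0 Gb/s, matching the rounded values on the slide.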

4.2 Word Based Stream Ciphers
- Dragon

• Word-based FSR (WFSR)
  - Similar to the bit-based LFSR (LFSR)
• Parallel-Structured Word-based LFSR (PS-WLFSR)
  - Feedback shifted by the unit of the word size (W)

4.2 Word Based Stream Ciphers
- Dragon

• Implementing the S-box in the F function
  - Fast-access SRAM memory (compiled SRAM)
  - Each S-box uses a 256 x 32 bit (1 KB) SRAM
  - 24 S-boxes in total, so 24 KB of SRAM is required
4.2 Word Based Stream Ciphers
- Dragon

• (New) Dragon in Parallel-Structured (m=8)
• (New) Dragon in Parallel-Structured (m=16)

4.2 Word Based Stream Ciphers
- Dragon

• Performance Analysis

Items | Worst case | Typical case | Best case
Area (gate size): memory | 287,600 | 287,600 | 287,544
Area (gate size): comb. | 8,126 | 8,068 | 8,219
Area (gate size): total | 295,726 | 295,688 | 295,763
Critical path delay (ns) | 14.36 | 10.26 | 6.72
Throughput | 4.4 Gbps | 6.2 Gbps | 9.5 Gbps
Parallel throughput (m=8) | 35.2 Gbps | 49.6 Gbps | 76 Gbps
Parallel throughput (max. m=16) | 70.4 Gbps | 99.2 Gbps | 152 Gbps

※ [Note] 1) comb.: combinational logic
2) Best/Typical/Worst case: according to the synthesis library environment
3) Throughput [bps] ≒ (output bits) × speed

Conclusion

• Higher speed in block cipher
  - Latest ASIC technology (layout): 0.13um, 90–40nm
  - Pipeline/Parallel pipeline-structured
  - The fewer the steps per round, the faster
  - Selection of the faster components in each step (security is the foremost requirement)
• Higher speed in stream cipher
  - Parallel-Structured LFSR → m times faster
  - Clock-controlled (compensate) → 1 time
  - Word-Based Stream Cipher → W times faster
