http://kowon.dongseo.ac.kr/~hjlee
http://crypto.dongseo.ac.kr
2011-04-03 CNSL-Internet-DongseoUniv. 1
1.1 AES evaluation - Initial (1997, NIST)
1.1 AES evaluation - Final (2000, NIST)
1.1 AES evaluation - hardware efficiency
Government ASIC
NSA
IBM, Inc.
Mitsubishi, Inc.
Universities FPGA
UC Berkeley
USC(University of Southern California)
WPI(Worcester Polytechnic Institute )
GMU(George Mason University)
Micronic, Inc.
1.2 Things to consider in design approaches
Crypto/Non-Crypto Module
  Crypto Module: Refer to FIPS 140-1,2,3 (CMVP)
  Non-Crypto Module: Power module, Interface module, etc.
Implementation Platforms
  Performance needed
  Development and per-unit costs
  Power consumption (Important in case of wireless devices!)
  Flexibility
  Physical Security
Hardware Implementation Methodologies
Module approaches
  S/W module: Designed/implemented at Client PC
II. Hardware Design Approaches
Design Tools
  FPGA (Field Programmable Gate Array)
  ASIC (Application Specific Integrated Circuit)
  NT (nano-technology)
Design approaches
  Round-common: implementation of a one-round function
    fewer gates, lower cost, but low speed
  Pipeline-structured: implementation of the full-round sequential functions
    more gates, higher cost, but high speed
  Parallel: parallel operation of round-common or pipeline-structured units
  Parallel & Pipeline: combination of parallel and pipeline-structured units

2.1 Higher speed approaches in block ciphers
Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex, Altera FLEX | Low cost; easy to design in a lab or research institute; slower than ASIC for the same layout
 | ASIC (Application Specific IC), e.g. Samsung, Hynix, Lucent Tech. | High cost; difficult to design, done at a design house; about 5~10 times faster than FPGA for the same layout
2) Semiconductor layout | 0.5 µm → 0.35 µm → 0.22 µm → 0.18 µm → 0.13 µm (→ 90 nm → 70 nm → 60 nm → 50 nm) | Decreasing layout → high speed, low power, downsizing
3) Pipeline approaches | Round-common →◘→ | Single-round chip (1)
 | Pipeline-structured →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip (k times in speed); k = #rounds, #rounds/divisor, or #rounds x positive integer
 | Parallel-structured →◘→ →◘→ | Single-round chip x n parallel (n times)
 | Parallel pipeline →◘◘◘◘◘→ →◘◘◘◘◘→ | Multiple k-round chip x n parallel (n x k times)
4) Component approaches | Selection of high-speed components, e.g. XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security requirements come first)
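The speed factors in row 3) of the table can be written out as a small calculation. This is only an idealized sketch: the function name and the example values of k and n are assumptions made for illustration, and the model ignores pipeline fill time and any loss from feedback modes of operation.

```python
def relative_throughput(k: int = 1, n: int = 1) -> int:
    """Throughput relative to a single round-common chip.

    k: number of pipelined rounds on one chip (1 = round-common)
    n: number of such units operated in parallel (1 = no parallelism)
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive integers")
    return k * n


# Example values for k and n are hypothetical, chosen only to show the pattern.
designs = {
    "round-common": relative_throughput(),
    "pipeline-structured (k=10)": relative_throughput(k=10),
    "parallel-structured (n=4)": relative_throughput(n=4),
    "parallel pipeline (k=10, n=4)": relative_throughput(k=10, n=4),
}

for name, factor in designs.items():
    print(f"{name}: x{factor}")
```

The gate count grows with k and n as well, which is the cost side of the same trade-off.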
2.2 FPGA Device - example (Target)
2.2 ASIC - company in Korea
Design House
Samsung (http://www.samsungelectronics.com/semiconductors/asic/ASIC.htm)
Hynix (http://www.hynix.com/)
CNC technology (http://www.cnstec.com/)
ADC (http://www.adc.co.kr/)
TLI (http://www.tli.co.kr/)
ECT (http://www.ect.co.kr/)
INC technology (http://www.inctech.co.kr/)
ARALION (http://www.aralion.com)
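Speedup:
$$S_p = \frac{T_1}{T_n} = \frac{s \times n}{n + (s - 1)}$$
Efficiency:
$$E = \frac{S_p}{n} \times 100$$
Max. throughput:
$$\lim_{s \to \infty} S_p = \lim_{s \to \infty} \frac{n}{\frac{n}{s} + \left(1 - \frac{1}{s}\right)} = n$$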
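The speedup and efficiency relations can be checked numerically. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: n is the number of pipeline stages and s is the number of blocks fed through the pipeline, so that T1 = s x n and Tn = n + (s - 1) in units of one stage delay. The function names are illustrative.

```python
def speedup(n: int, s: int) -> float:
    """Sp = T1 / Tn = (s * n) / (n + (s - 1))."""
    return (s * n) / (n + (s - 1))


def efficiency(n: int, s: int) -> float:
    """E = Sp / n * 100, in percent."""
    return speedup(n, s) / n * 100


n = 10  # assumed example: a 10-stage pipeline
for s in (1, 10, 100, 10_000):
    # As s grows, Sp approaches n and E approaches 100 %.
    print(f"s={s:>6}  Sp={speedup(n, s):6.3f}  E={efficiency(n, s):6.2f} %")
```

With a single block (s = 1) the pipeline gives no gain; the limit Sp = n is only approached when many blocks are processed back to back.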
2.3 pipeline vs. round-common
[Figure: total output bits vs. number of rounds executed, for the round-common scheme and the pipeline scheme]
[Figure: total number of gates required (in units of 10,000) vs. number of rounds required, for the round-common scheme and the pipeline scheme]
• round-common (blue): the number of gates is independent of the number of rounds
• pipeline (red): the number of gates increases with the number of rounds

III. High-speed Block cipher
Round-Function for high-speed
  S-box memory speed (RAM/ROM)
  Bit-wise adder/multiplier: XOR/AND
  Word-wise modulo adder/multiplier (mod 2^32): (+) mod 2^32, (⊙) mod 2^32
  S-P/MA/SSMA
  LUT (Look-Up Table)
# of Rounds vs. Throughput or Security
Pipeline vs. Throughput or Area
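The bit-wise and word-wise round-function operations listed above can be illustrated in a few lines. This is a minimal software sketch with assumed helper names; it shows only the arithmetic, not the hardware delay of each operation.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, keeps results inside one 32-bit word


def xor32(a: int, b: int) -> int:
    """Bit-wise addition (XOR) of two 32-bit words."""
    return (a ^ b) & MASK32


def and32(a: int, b: int) -> int:
    """Bit-wise multiplication (AND) of two 32-bit words."""
    return (a & b) & MASK32


def add32(a: int, b: int) -> int:
    """Word-wise addition modulo 2**32."""
    return (a + b) & MASK32


def mul32(a: int, b: int) -> int:
    """Word-wise multiplication modulo 2**32."""
    return (a * b) & MASK32


print(hex(add32(0xFFFFFFFF, 1)))   # wraps around to 0
print(hex(mul32(0x80000000, 2)))   # the carry out of bit 31 is discarded
```

XOR and AND have no carry chain, which is why they are the fast components; the modular adder and multiplier must propagate carries across the whole word.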
3.1 High-speed core components
- Memory Access Time (ns)
  Samsung Electronics DDR SRAM speed (July 2004)
- Adder & Shifter Speed (ns)
  64-bit adder speed [0.13um ASIC]: 180/372 ps
  (Neve et al., Belgium, IEEE Trans. VLSI Systems, 2004)
  720 ps @ 1.1 V, 0.18um
  326 ps @ 1.1 V, 0.13um
  180 ps @ 2.5 V, 0.13um
  12.8 ns @ 0.35um
3.1 High-speed core components
- Core Operation
Operation | Max. delay (ASIC)
XOR | ~10 ps
Mod 2^32 add | 180 ps
Mod 2^32 sub | 180 ps
Fixed shift | ~10 ps
Variable rotate | ~10 ps
Mod 2^64 Mul | 10.0 ns
GF(2^8) Mul (*LUT) | ~1 ns
S-box (*LUT) | ~1 ns
Notation for shift/rotate: >>>, <<<

Algorithm | Core operations used (one ● per operation)
MARS | ● ● ● ● ● ● ●
RC6 | ● ● ● ● ●
Rijndael | ● ● ● ●
Serpent | ● ● ●
Twofish | ● ● ● ● ●

3.2 Implementation examples - AES (Final, 2000)
Designed by | ASIC or FPGA | Performance [Mbps] | # Integrated [Gates] | Year | Remarks
GMU - George Mason Univ. | FPGA 0.22um | 414 | 2,507 CLB slices | 2000 |
USC | FPGA 0.22um | 353 | 4312 | 2000 |
WPI - Worcester Polytechnic Instit. | FPGA 0.22um | 294 | 3528 | 2000 |
MICRONIC | FPGA 0.22um | 179 | - | 2000 |
NSA | ASIC 0.5um | 606 | - | 2000 | 2-Gen. Previous
3.2 Implementation Examples in Block cipher
- AES: George Mason Univ'2000
3.2 Implementation Examples in Block cipher
- AES pipeline [2002-2006]
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
Lee, YoonKyung et al. | ASIC 0.25um | 2.56 | | CISC'2002 |
Goo, BonSeog et al. | FPGA | 20 | | WISC'2002 | 40-stage(10) Pipeline
McLoone, McCanny | FPGA | 12.02 | | IEEE Syp2000 |
Hodjat et al. /UCLA | FPGA | 21.54 | 5177 CLB slices | FCCM2004 |
Saggese et al. | FPGA | 20.3 | 5810 CLB slices | FPL2003 |
Goo, BonSeog et al. | FPGA 0.25um | Under Gbps | 3,992 | KIISC Journal 2006.10 | RFID for low-power

3.2 Implementation Examples in Block cipher
- SEED
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
Seo, YongHo et al. | FPGA | 2.62 | 16,770 | APASIC'2000 |
Choi, ByungYoon et al. | ASIC 0.25um | 237 | 14,110 | KICS, 2000 |
Jeon, ShinWoo et al. | ASIC 0.5um | 258.9 | 17,610 | KIISC, 2001 | Non-pipeline
Choi, HeongMug et al. /Samsung | FPGA | 35.34 | 10,610 | KIISC, 2004.10 | Smartcard, etc.
Um, SungYong et al. | FPGA Xilinx Virtex-II | 6,400 | (54,803 LUT + 2,048 buf+gates) | KISS, 2003.3 | pipeline
3.2 Implementation Examples in Block cipher
- ARIA
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published to | Remarks
NSRI Team | ASIC Hynix 0.25um | 1,871 [1,839]* | 8,935 [9,088]* | WISC2003 | Non-pipeline; high-speed, low-integrated; [ ]* = AES
Park, JinSeob et al. | FPGA 0.22um | 1,142 | 29,930 ga / 1,599 sl | WISC2004 | Non-pipeline
Yoo, YeongGab et al. | Xilinx VirtexE-1600 FPGA | 437 | 1,490 slices | IEEK, 2005.4 | Non-pipeline
Jang, HwanSug et al. | 64-bit microprocessor | - | - | KIISC Journal 2006.6 | -

3.2 Implementation steps - AES = 4 steps
c) Round function
3.2 Implementation steps - IDEA = 7 steps
- 8-round transformation (6 x 8 = 48 subkeys)
- output transformation (4 subkeys)
- Key generation (16-bit, 52 subkeys)
- Multiplication-Addition (MA) structures

3.2 Implementation steps - SEED = 8 steps
a) Block Diagram
b) Round functions
3.2 Implementation steps - ARIA = 3 steps

3.2 Implementation steps - comparisons
Item | DES | 3DES | IDEA | ARIA | SEED | AES
Key size | 56 | 112 | 128 | 128/192/256 | 128 | 128/192/256
# rounds | 16 | 16 x 3 | 8 | 12/14/16 | 16 | 10/12/14
Roundkey size | 48x16 | 48x16, 3 | 16x52 | 128x13/128x15/128x17 | 32x2x16 | 128x11/128x13/128x15
S-BOX | 6x4, 8 | 6x4, 8 | 16-bit MA | 8x8, 4 | 8x8, 2; 8x32, 4 | 8x8
Key Space | $2^{56}$ | $2^{112}$ | $2^{128}$ | $2^{128 \sim 256}$ | $2^{128}$ | $2^{128 \sim 256}$
Proper CPU (word size) | 8-bit | 8-bit | 16-bit | 32-bit | 32-bit | 32-bit
Year | 1975-77 | 1979 | 1990-92 | 2003 | 1997 | 2000-01
Country | USA | USA | Switzerland | KOREA | KOREA | USA
Standard | FIPS-46 | - | - | - | TTAS.KO-12.0004 | FIPS-197
Conclusion (On Designing Block cipher)
Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex, Altera FLEX | Low cost; easy to design in a lab or research institute; slower than ASIC for the same layout
 | ASIC (Application Specific IC), e.g. Samsung, Hynix, Lucent Tech. | High cost; difficult to design, done at a design house; about 5~10 times faster than FPGA for the same layout
2) Semiconductor layout | 0.5 µm → 0.35 µm → 0.22 µm → 0.18 µm → 0.13 µm (→ 90 nm → 70 nm → 60 nm → 50 nm) | Decreasing layout → high speed, low power, downsizing
3) Pipeline approaches | Round-common →◘→ | Single-round chip (1)
 | Pipeline-structured →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip (k times in speed); k = #rounds, #rounds/divisor, or #rounds x positive integer
 | Parallel-structured →◘→ →◘→ | Single-round chip x n parallel (n times)
 | Parallel pipeline →◘◘◘◘◘→ →◘◘◘◘◘→ | Multiple k-round chip x n parallel (n x k times)
4) Component approaches | Selection of high-speed components, e.g. XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security requirements come first)

IV. High performance in stream cipher
High speed components
  Higher speed LFSR
  Higher speed Nonlinear Combiner
  Higher speed Filter Function
  Higher speed Irregular Clocked Device
Parallel-Stream Cipher
Word-Based Stream Cipher
[Figure: LFSR built from D flip-flops (stages 0 to 4) with a single output]
4.1 Higher speed of LFSR
- LFSR-based stream cipher
Parallel Stream Cipher
  Ext. clock
  m-bit parallel output per clock: m times faster ( 2≤m≤n )
  [Ref] HJ Lee, SJ Moon, "Parallel Stream Cipher for Secure High-Speed Communications", Signal Processing, Vol. 82, No. 2, Feb. 2002.
(n, m) PS-LFSR
  Feedback connection 1: the original connection
  Feedback connection 2: 1-bit left-shifting of the original connection
  [Figure: a) n-stage LFSR (stages 0 to n-1); b) (n,m) PS-LFSR with feedback 1, feedback 2, m-bit output and system clock]
Example
  Feedback connection 1: s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t)
  [Figure: a) n=40 stage LFSR; b) 40-stage PS-LFSR with 8-stage LBUF, feedback connections 1 and 2, m=8 bits output, system clock]
  $p(x) = x^{39} + x^{37} + x^{25} + x^{24} + x^{22} + x^{8} + x^{6} + x^{4} + 1$
  [Figure: b) (n,m)=(39,8) PS-LFSR]
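The idea of producing m bits per clock can be sketched in software for the 40-stage example, using the recurrence s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t) given above. Several points are assumptions made for illustration and are not taken from the slides: the initial state, the choice of the low-index stages as the output, and the sequential loop that stands in for the m shifted feedback connections, which hardware would evaluate in parallel. The function names are hypothetical.

```python
def lfsr_serial(state, nbits):
    """Bit-serial 40-stage LFSR: one output bit per clock."""
    s = list(state)              # s[0] holds s(t), s[39] holds s(t+39)
    out = []
    for _ in range(nbits):
        out.append(s[0])
        fb = s[35] ^ s[2] ^ s[1] ^ s[0]   # s(t+40)
        s = s[1:] + [fb]                  # shift by one stage
    return out


def ps_lfsr(state, nclocks, m=8):
    """Parallel-structured emulation: m output bits per clock."""
    s = list(state)
    out = []
    for _ in range(nclocks):
        out.append(s[:m])                 # m-bit parallel output
        ext = s[:]
        for j in range(m):
            # j-th feedback: the original connection shifted by j positions
            ext.append(ext[35 + j] ^ ext[2 + j] ^ ext[1 + j] ^ ext[j])
        s = ext[m:]                       # shift by m stages at once
    return out


init = [1] + [0] * 39                     # assumed nonzero initial state
serial_bits = lfsr_serial(init, 64)
parallel_words = ps_lfsr(init, 8, m=8)
flat = [bit for word in parallel_words for bit in word]
print(flat == serial_bits)                # same sequence, 8 clocks instead of 64
```

Both generators produce the same bit sequence; the parallel form needs one eighth of the clock cycles, at the cost of m feedback networks.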
[Figure: LFSR-based stream cipher. N LFSRs (l1-LFSR1 ... lN-LFSRN, clocks C1 ... CN, outputs x1 ... xN) feed nonlinear combiners (memoryless or not) f1, f2, ..., fm. Serial/Parallel and Parallel/Serial converters sit between plaintext and ciphertext. Transmitter, Channel, Receiver.]
[Figure: m PS-LFSRs (l1-PS-LFSR1 ... lm-PS-LFSRm) with m-bit outputs (x11 ... xmm) feed summation generators SUM-BSG1 ... SUM-BSGm with outputs y1 ... ym. Note: N = m, M = M1 = ... = Mm]
[Figure: generator built from a 127-LFSR and a 129-LFSR with full adders (sum = S0, carry = S1), feedback connections 1 to 4, KEY_LDEN]
[Figure: clock-controlled generator. 39-stage LFSR (clock control) and 89-stage 4-bit parallel LFSR (data generation) with taps d[t], d[6+t], d[9+t], d[34+t], d[36+t], d[47+t], d[50+t], d[88+t], 3-stage LBUF (d-3, d-2, d-1), KEYDATA, SEL, KEYLDEN, system clock, filter function fd, keystream output]
4.1 Higher speed of LFSR
LFSR vs. PS-LFSR
Items | 39-LFSR | (39,8) PS-LFSR
Period | $2^{39} - 1$ | $2^{39} - 1$
Processing rate @ 500 MHz system clock (Max. Delay = 1.73 ns) | 500 Mbps | 500 x 8 = 4 Gbps

4.2 Word Based Stream Ciphers
- Basic components in WBSC
Word Based Linear Feedback Shift Register (LFSR)
Dynamic Tables
Pseudorandom Functions
Generic Finite State Machine (FSM)
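The processing-rate row of the LFSR vs. PS-LFSR comparison is simple arithmetic and can be reproduced as follows. Reading the 1.73 ns maximum delay as the critical path that has to fit inside one 500 MHz clock period is an interpretation, not something the slide states, and the function names are illustrative.

```python
def processing_rate_bps(clock_hz: float, bits_per_clock: int) -> float:
    """Output rate in bits per second."""
    return clock_hz * bits_per_clock


def fits_clock(clock_hz: float, max_delay_s: float) -> bool:
    """True if the critical-path delay fits in one clock period."""
    return max_delay_s <= 1.0 / clock_hz


CLOCK_HZ = 500e6      # 500 MHz system clock (from the table)
MAX_DELAY_S = 1.73e-9 # maximum delay (from the table)

print(processing_rate_bps(CLOCK_HZ, 1) / 1e6, "Mbps for the 39-LFSR")
print(processing_rate_bps(CLOCK_HZ, 8) / 1e9, "Gbps for the (39,8) PS-LFSR")
print("1.73 ns fits in a 2 ns period:", fits_clock(CLOCK_HZ, MAX_DELAY_S))
```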
4.2 Word Based Stream Ciphers
- Word Based LFSR
Word-based Stream Cipher (WBSC)
  Word Based LFSR
  Fast, word-operation (@Tx, @Rx)
  Examples: Snow, Sober, Turing, Dragon
[Figure: LFSR followed by a Nonlinear Filter]
4.2 Word Based Stream Ciphers
- Dragon
Dragon
  Word-based stream cipher: ICISC'2004
  Throughput: 23 Gbps (H/W), 3.8 Gbps (S/W)
  Cooperative results with Australia QUT-ISRC (Dr. Dawson)
  Submitted to the Int'l standard ECRYPT-eSTREAM (phase 3)
  [Figure: NLFSR, M, FPDS Selection, F, feedback, keystream]
H/W performance analysis (32-bit word)
  Step 1: 0.387 ns
  Step 2: 1.0 ns
  Step 3: 1.0 ns
  Step 4: 0.387 ns
  Total: 2.774 ns
  Performance = 64 bit / 2.774 ns = 23 Gb/s (maximal)
              = 64 bit / 4.27 ns = 15 Gb/s (normal)
  * In SAMSUNG 0.13um ASIC, 32-bit adder delay time = 0.387 ns
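The throughput figures in the performance analysis follow directly from the step delays. The short calculation below reproduces them; only the variable names are new.

```python
STEP_DELAYS_NS = [0.387, 1.0, 1.0, 0.387]  # steps 1 to 4, from the slide
BITS_PER_ITERATION = 64                    # keystream bits per iteration
NORMAL_DELAY_NS = 4.27                     # "normal" case delay, from the slide

total_ns = sum(STEP_DELAYS_NS)
max_gbps = BITS_PER_ITERATION / total_ns          # bits per ns equals Gb/s
normal_gbps = BITS_PER_ITERATION / NORMAL_DELAY_NS

print(f"total delay  : {total_ns:.3f} ns")
print(f"maximal rate : {max_gbps:.1f} Gb/s")
print(f"normal rate  : {normal_gbps:.1f} Gb/s")
```

Because one bit per nanosecond is one gigabit per second, dividing 64 bits by the delay in nanoseconds gives the rate in Gb/s without further conversion.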
4.2 Word Based Stream Ciphers
- Dragon
Parallel-Structured Word-based LFSR (PS-WLFSR)
  Word-based FSR (WFSR)
    Similar to the bit-based LFSR (LFSR)
    Feedback shifted by the unit of the word-size (W)
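A word-based feedback shift register can be sketched as a bit-based LFSR whose cells hold W-bit words and whose shift moves one whole word per clock. The code below is a generic illustration only: the register length, the tap positions, and the XOR feedback are hypothetical and are not the Dragon feedback function.

```python
W = 32
MASK = (1 << W) - 1  # keeps every cell inside one W-bit word


def wfsr_step(state, taps=(0, 3)):
    """One clock of a word-based FSR.

    state: list of W-bit words, state[0] is the oldest word
    taps:  hypothetical tap positions whose words are XORed as feedback
    Returns (new_state, output_word).
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    output_word = state[0]
    new_state = state[1:] + [feedback & MASK]  # shift by one word
    return new_state, output_word


state = [0x00000001, 0x00000002, 0x00000003, 0x00000004, 0x00000005]
for _ in range(3):
    state, word = wfsr_step(state)
    print(f"out={word:#010x}")
```

Each clock delivers W output bits, so the register advances W times faster than a bit-serial LFSR at the same clock rate.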
[Figure: (New) Dragon in Parallel-Structured (m=16)]
[Figure: (New) Dragon in Parallel-Structured (m=8)]