
Ch 05.

Hardware Implementation
for Fast Encryption

이 훈 재 (李 焄 宰) Hoon-Jae Lee
CNSL
Cryptography and Network Security Lab.
hjlee@dongseo.ac.kr

http://kowon.dongseo.ac.kr/~hjlee
http://crypto.dongseo.ac.kr

2013-05-09 CNSL-Internet-DongseoUniv. 1

Agenda
I. Overview of hardware implementation
II. Hardware Design Approaches
2.1 Hardware design approaches
2.2 ASIC/FPGA
2.3 Pipeline/Parallel-structures

III. High-speed Block cipher


3.1 high-speed for components
3.2 Comparisons for implemented AES/SEED/ARIA/IDEA

IV. High-speed Stream cipher


4.1 Parallel-Structured/Shifting LFSR
4.2 Word-Based Stream Cipher

V. Highly Reliable Stream Synchronous Systems

High-Speed Implementation
- Memory Access Time (ns)
Samsung DDR SRAM Speed (2004.7)


- Adder & Shifter Speed (ns)


• 64-bit adder speed [0.13 um ASIC, 2004] → 180/326 ps
  - Belgium, Neve et al.
  - IEEE Trans. VLSI Systems, 2004
  - 720 ps @ 1.1 V, 0.18 um
  - 326 ps @ 1.1 V, 0.13 um
  - 180 ps @ 2.5 V, 0.13 um

- Adder & Shifter Speed (ns)

q 32-bit Adder Speed[SAMSUNG 0.13um ASIC, 2002]


Samsung ASIC Speed (2002)


- Multiplier Speed (ns)

• 64-bit multiplier speed [0.35 um ASIC, 2003] → 12.8 ns
  - Europe, SEDA
  - 2003, M.S. thesis "Design and Realization of a high-speed 64 x 64 multiplier for low power applications"
  - 12.8 ns @ 0.35 um

High-Speed Implementation
– Core Operation
Algorithm | XOR | Mod 2^32 add | Mod 2^32 sub | Fixed shift | Variable rotate | Mod 2^32 mul | GF(2^8) mul (*LUT) | S-box (*LUT)
Notation (shift/rotate): >>>, <<<
Max. speed (ASIC) | several 10 ps | 180 ps | 180 ps | several 10 ps | several 10 ps | 12.8 ns | several ns | several ns
MARS ● ● ● ● ● ● ●

RC6 ● ● ● ● ●

Rijndael ● ● ● ●

Serpent ● ● ●

Twofish ● ● ● ● ●


1.1 AES evaluation-Initial(1997, NIST)


1.1 AES evaluation - Final(2000, NIST)


1.1 AES evaluation – hardware efficiency

• Government → ASIC
  - NSA
  - IBM, Inc.
  - Mitsubishi, Inc.
• Universities → FPGA
  - UC Berkeley
  - USC (University of Southern California)
  - WPI (Worcester Polytechnic Institute)
  - GMU (George Mason University)
  - Micronic, Inc.

1.2 Somethings to design approaches
• Crypto/Non-Crypto Module
  - Crypto module: refer to FIPS 140-1, 140-2, 140-3 (CMVP)
  - Non-crypto module: power module, interface module, etc.

• Security device approaches
  - End-to-end (ETE) device: mainly software-oriented design
  - Link encryption (LE) device: mainly hardware-oriented design
  - Hybrid device: ETE & LE

• Noiseless or noisy channel approaches
  - Block cipher (PKC): wired communication, computer security;
    not well suited to noisy satellite or wireless channels
  - Stream cipher: bit-oriented, well suited to satellite or wireless channels;
    a block cipher has to be switched to OFB mode to be used in this way
  - Word-based stream cipher: combines block cipher & stream cipher
    - Europe ECRYPT-eSTREAM (http://www.ecrypt.eu.org/stream/)
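
The remark about running a block cipher in OFB mode on a noisy channel can be illustrated with a small sketch. It assumes a 128-bit block and uses a hypothetical stand-in `toy_block_encrypt` (a truncated SHA-256, not a real block cipher) in place of AES/SEED/ARIA; the key, IV and message are illustrative only. It shows that in OFB a single flipped ciphertext bit damages only the same plaintext bit.

```python
import hashlib

BLOCK = 16  # assumed block size in bytes (128-bit block)

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for E_K; NOT a real block cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ofb_keystream(key: bytes, iv: bytes, nbytes: int) -> bytes:
    """OFB: O_0 = IV, O_i = E_K(O_{i-1}); keystream = O_1 || O_2 || ..."""
    out, feedback = b"", iv
    while len(out) < nbytes:
        feedback = toy_block_encrypt(key, feedback)
        out += feedback
    return out[:nbytes]

def ofb_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encryption and decryption are the same XOR with the keystream."""
    ks = ofb_keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key, iv = b"k" * 16, b"\x00" * 16          # illustrative values
plaintext = b"link encryption over a noisy channel"
ciphertext = ofb_xor(key, iv, plaintext)
recovered = ofb_xor(key, iv, ciphertext)

# Flip one ciphertext bit, as channel noise would.
noisy = bytearray(ciphertext)
noisy[5] ^= 0x01
damaged = ofb_xor(key, iv, bytes(noisy))
bit_errors = sum(bin(a ^ b).count("1") for a, b in zip(plaintext, damaged))
```

Because the keystream never depends on the ciphertext, the bit error does not propagate; the price is that sender and receiver must stay synchronized.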

1.2 Somethings to design approaches

Implementation Platforms

1.2 Somethings to design approaches
Which Platform?
q The choice of the implementation platform is driven by a
multitude of factors, which include,

Ø Performance needed
Ø Development and per-unit costs
Ø Power consumption (Important in case of wireless devices!)
Ø Flexibility
Ø Physical Security

q Choice heavily depends on application requirements, but…


1.2 Somethings to design approaches

Platform Characteristics

1.2 Somethings to design approaches

Reconfigurable Hardware and Cryptography


• Why hardware?
  - Software implementations are too slow for time-critical applications
  - Hardware implementations are intrinsically more secure

• Why reconfigurable? …

1.2 Somethings to design approaches

Reconfigurable Hardware and Cryptography

q Advantages of reconfigurable platforms

Ø Algorithm agility
Ø Algorithm Upgradability
Ø Architecture Efficiency
Ø Resource Efficiency
Ø Algorithm Modification
Ø Throughput (Relative to software)
Ø Cost Efficiency (Relative to ASICs)

1.2 Somethings to design approaches

Hardware Implementation Methodologies

• HDL design is done using an FSM model

• The model is termed a CFSM (Cryptographic FSM)

1.2 Somethings to design approaches

Hardware Implementation Methodologies

q Four basic components of a CFSM

Ø State Register
Ø Key Register
Ø Updating Logic
Ø Control & Load Logic

q Modification of generic CFSM for


implementing hash functions…

1.2 Somethings to design approaches

Hardware Implementation Methodologies

q CFSM for Hash Functions


1.2 Somethings to design approaches


• Module approaches
  - S/W module: designed/implemented on the client PC
  - u-P module: crypto-microprocessor → low/medium speed
  - H/W module: FPGA or ASIC → high speed or ultra-high speed (higher costs)

• Hardware design approaches
  - FPGA (Field-Programmable Gate Array): e.g. ALTERA (FPGA), XILINX (CPLD)
  - ASIC (Application-Specific Integrated Circuit): supported by CAD companies
1.2 Somethings to design approaches

• Examples
  Project 25 OTAR documents: TIA/EIA APCO Project 25
  - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA, "Over-The-Air-Rekeying (OTAR) Protocol, New Technology Standards Project, Digital Radio Technical Standards," Jan. 1996.
  - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA-1, "Over-The-Air-Rekeying (OTAR) Protocol, Addendum 1," Dec. 2000.
  - TIA/EIA Telecommunications Systems Bulletin, TSB 102.AACB, "Over-The-Air-Rekeying (OTAR) Operational Description," Jan. 1997.
  - TIA, IS 102.AAAA, "DES Encryption Protocol"
  - TIA, TSB-102.BAAA, "Recommended Common Air Interface"
    - Addendum BAAA-1
  - TIA, TSB-102.BAAD, "Common Air Interface Operational Description for Conventional Channels"

1.2 Somethings to design approaches


• Examples
[Figure: one crypto module (Crypto M) and two non-crypto modules (Non-Crypto M)]
II. Hardware Design Approaches
• Design tools
  - FPGA (Field Programmable Gate Array)
  - ASIC (Application-Specific Integrated Circuit)
  - NT (nano-technology)

• Design approaches
  - Round-common: implements a single round function that is reused for every round
    → fewer gates, lower cost, but low speed
  - Pipeline-structured: implements all rounds as sequential stages
    → more gates, higher cost, but high speed
  - Parallel: parallel operation of round-common or pipeline-structured units
  - Parallel & pipeline: combination of parallel and pipeline-structured designs

2.1 Higher-speed approaches in block ciphers

Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex, Altera FLEX | Low cost, easy to design in a lab or research institute; slower than an ASIC for the same layout
 | ASIC (Application-Specific IC), e.g. Samsung, Hynix, Lucent Tech. | High cost, difficult design done at a design house; about 5~10 times faster than an FPGA for the same layout
2) Semiconductor layout | 0.5 μm → 0.35 μm → 0.22 μm → 0.18 μm → 0.13 μm (→ 90 nm → 70 nm → 60 nm → 50 nm) | Shrinking layout → high speed, low power, downsizing
3) Pipeline approaches | Round-common: →◘→ | Single-round chip (speed x 1)
 | Pipeline-structured: →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip, k times the speed (k = #rounds, #rounds/divisor, or #rounds x positive integer)
 | Parallel-structured: n copies of →◘→ | Single-round chip x n in parallel (n times)
 | Parallel pipeline: n copies of →◘◘◘◘◘→ | Multiple k-round chip x n in parallel (n x k times)
4) Component approaches | Selection of high-speed components, e.g. XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security needs come first)
 | Combination or optimization of high-speed components, e.g. # rounds, # steps in F-function | Optimized combination of components: speed is inversely related to # rounds; speed is inversely related to # steps per round; the faster each step, the better
2.2 ASIC vs. FPGA
Items | ASIC | FPGA
Throughput | High (0.13 μm / 0.5 μm) | Medium (0.22 um, 0.35 um)
Power consumption | Light-weight | Medium
Costs for implementation | High costs ($50,000-$100,000 per item) | Low costs (design tools & PC)
Costs for sales | Low cost (large volumes), high cost (small volumes) | Medium cost
Design approaches | Expert company (customized) | Field-programmable

2.2 FPGA Device - example(Target)


• FPGA: Altera chip
• The FPGA device is "Cyclone II" by Altera
• Specification of "Cyclone II"
  - EP2C35F6728C
  - 350,000 gates
  - 33,216 logic elements
  - 672 pins
  - 8 speed (4 is the fastest speed)
2.2 FPGA Device - example(Target)

2.2 ASIC – company in Korea

• Foundry (manufacturing company)
  - Samsung LSI (http://www.samsungelectronics.com/semiconductors/asic/ASIC.htm)
  - Hynix Semiconductor (http://www.hynix.com/)
  - ANAM Semiconductor (http://www.aaww.com/)
  - DongBu Electronics (http://www.dsemi.com/)

• Design house
  - Samsung (http://www.samsungelectronics.com/semiconductors/asic/ASIC.htm)
  - Hynix (http://www.hynix.com/)
  - CNC technology (http://www.cnstec.com/)
  - ADC (http://www.adc.co.kr/)
  - TLI (http://www.tli.co.kr/)
  - ECT (http://www.ect.co.kr/)
  - INC technology (http://www.inctech.co.kr/)
  - ARALION (http://www.aralion.com)
2.3 Pipeline-structured
q Input – (unit 1 … unit n) - Output

2.3 Pipeline-structured
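• Throughput (s: # of input data, n: # of pipeline steps)

\[ S_p = \frac{T_1}{T_n} = \frac{s \times n}{n + (s - 1)} \]

• Efficiency:

\[ E = \frac{S_p}{n} \times 100 \]

• Max. throughput:

\[ \lim_{s \to \infty} S_p = \lim_{s \to \infty} \frac{n}{\frac{n}{s} + \left(1 - \frac{1}{s}\right)} = n \]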
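
As a quick numerical check of the speed-up and efficiency formulas on this slide, the following minimal sketch evaluates them for a few input counts. The function names and the sample values (n = 10 stages) are illustrative assumptions, not taken from the slides.

```python
def pipeline_speedup(s: int, n: int) -> float:
    """S_p = (s * n) / (n + (s - 1)) for s inputs and n pipeline steps."""
    return (s * n) / (n + (s - 1))

def pipeline_efficiency(s: int, n: int) -> float:
    """E = S_p / n * 100, in percent."""
    return pipeline_speedup(s, n) / n * 100

# With a single input the pipeline gives no gain; with many inputs
# the speed-up approaches the number of stages n.
for s in (1, 10, 91, 10_000):
    print(s, round(pipeline_speedup(s, 10), 3), round(pipeline_efficiency(s, 10), 1))
```

The loop shows the trend stated by the limit: the speed-up grows from 1 towards n as the number of inputs s increases.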

2.3 pipeline vs. round-common

[Figure, left panel: "Output produced versus round organization"; x-axis: number of round executions; y-axis: total output bits; curves: round-common and pipeline]
[Figure, right panel: "Gates required versus round organization"; x-axis: number of rounds; y-axis: total gates required (in units of 10,000); curves: round-common and pipeline]

• round-common (blue): the number of gates is independent of the number of rounds
• pipeline (red): the number of gates increases with the number of rounds

III. High-speed Block cipher


• Round function for high speed
  - S-box ⇒ memory speed (RAM/ROM)
  - Bit-wise adder/multiplier → XOR/AND
  - Word-wise modulo adder/multiplier (mod 2^32) ⇒ (+) mod 2^32, (⊙) mod 2^32
  - S-P/MA/SSMA ⇒ LUT (Look-Up Table)
• # of rounds vs. throughput or security
• Pipeline vs. throughput or area
Block Ciphers: Key Elements
• Bitwise XOR, AND, OR.
• Addition or subtraction modulo 2^n
• Shift or rotation by a constant number of bits.
• Data-dependent rotation by a variable number of bits.
• Multiplication modulo the table entry value.
• Multiplication in the Galois field specified by the table entry value.
• Inversion modulo the table entry value.
• Look-up-table substitution
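
Several of the key elements listed above can be written down in a few lines. The sketch below is illustrative only: it assumes 32-bit words, uses the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) as the example Galois field, and `TOY_SBOX` is a hypothetical table, not the S-box of any real cipher.

```python
MASK32 = 0xFFFFFFFF

def add32(a: int, b: int) -> int:
    """Addition modulo 2^32."""
    return (a + b) & MASK32

def sub32(a: int, b: int) -> int:
    """Subtraction modulo 2^32."""
    return (a - b) & MASK32

def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits (fixed or data-dependent r)."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32

def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiplication in GF(2^8), shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

# Hypothetical 8x8 look-up table (a simple permutation of 0..255).
TOY_SBOX = [(7 * i + 3) % 256 for i in range(256)]

def sbox_lookup(byte: int) -> int:
    """Look-up-table substitution: one memory access per byte."""
    return TOY_SBOX[byte]

# Data-dependent rotation: the rotation amount comes from another word.
word, amount_source = 0x80000001, 0x00000021
rotated = rotl32(word, amount_source & 31)
```

In hardware the XOR, shift and fixed rotation are almost free, while the adder, multiplier and table look-up set the critical path.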

Block Cipher: Core Operations

3.1 High-speed core components
- Memory Access Time (ns)
Samsung Electronics DDR SRAM Speed (July 2004)

3.1 High-speed core components


- Adder & Shifter Speed (ns)
• 64-bit adder speed [0.13 um ASIC] → 180/372 ps
  - Belgium, Neve et al.
  - IEEE Trans. VLSI Systems, 2004
  - 720 ps @ 1.1 V, 0.18 um
  - 326 ps @ 1.1 V, 0.13 um
  - 180 ps @ 2.5 V, 0.13 um

• 64-bit adder [2006]
  - 372 ps @ 0.18-um CMOS
  - IEEE CS ISCAS 2006
  - Kim JooYoung, et al.
3.1 High-speed core components
- Adder & Shifter Speed (ns)
q 32-bit Adder Speed[SAMSUNG 0.13um ASIC] à 387ps

Samsung ASIC Speed

3.1 High-speed core components


- Multiplier Speed (ns)
• 64-bit multiplier speed [0.35 um ASIC] → 10 ns
  - Europe, SEDA et al.
  - 2003, M.S. thesis "Design and Realization of a high-speed 64 x 64 multiplier for low power applications"
  - 12.8 ns @ 0.35 um
  - 10 ns [2003-11, Joseph Gebis, UCB]
3.1 High-speed core components
– Core Operation
Algorithm | XOR | Mod 2^32 add | Mod 2^32 sub | Fixed shift | Variable rotate | Mod 2^64 mul | GF(2^8) mul (*LUT) | S-box (*LUT)
Notation (shift/rotate): >>>, <<<
Max. delay (ASIC) | ~10 ps | 180 ps | 180 ps | ~10 ps | ~10 ps | 10.0 ns | ~1 ns | ~1 ns

MARS ● ● ● ● ● ● ●

RC6 ● ● ● ● ●

Rijndael ● ● ● ●

Serpent ● ● ●

Twofish ● ● ● ● ●

3.2 Implementation examples- AES (Final,2000)

Designed by | ASIC or FPGA | Performance [Mbps] | # Integrated [Gates] | Year | Remarks
GMU (George Mason Univ.) | FPGA 0.22 um | 414 | 2,507 CLB slices | 2000 |
USC | FPGA 0.22 um | 353 | 4312 | 2000 |
WPI (Worcester Polytechnic Inst.) | FPGA 0.22 um | 294 | 3528 | 2000 |
MICRONIC | FPGA 0.22 um | 179 | - | 2000 |
NSA | ASIC 0.5 um | 606 | - | 2000 | 2-Gen. previous
3.2 Implementation Examples in Block cipher
- AES: George Mason Univ’2000

3.2 Implementation Examples in Block cipher
- AES pipeline [2002-2006]
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published in | Remarks
Lee, YoonKyung et al. | ASIC 0.25 um | 2.56 | | CISC'2002 |
Goo, BonSeog et al. | FPGA | 20 | | WISC'2002 | 40-stage(10) pipeline
McLoone, McCanny | FPGA | 12.02 | | IEEE Syp2000 |
Hodjat et al. / UCLA | FPGA | 21.54 | 5177 CLB slices | FCCM2004 |
Saggese et al. | FPGA | 20.3 | 5810 CLB slices | FPL2003 |
Goo, BonSeog et al. | FPGA 0.25 um | Under Gbps | 3,992 | KIISC Journal 2006.10 | RFID for low-power

3.2 Implementation Examples in Block cipher


- SEED
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published in | Remarks
Seo, YongHo et al. | FPGA | 2.62 | 16,770 | APASIC'2000 |
Choi, ByungYoon et al. | ASIC 0.25 um | 237 | 14,110 | KICS, 2000 |
Jeon, ShinWoo et al. | ASIC 0.5 um | 258.9 | 17,610 | KIISC, 2001 | Non-pipeline
Choi, HeongMug et al. / Samsung | FPGA | 35.34 | 10.610 | KIISC, 2004.10 | Smartcard, etc.
Um, SungYong et al. | FPGA Xilinx Vertex-II | 6,400 | (54,803 LUT + 2,048 buf+gates) | KISS, 2003.3 | pipeline
3.2 Implementation Examples in Block cipher
- ARIA
Designed by | ASIC or FPGA | Performance [Gbps] | # Integrated [Gates] | Published in | Remarks
NSRI Team | ASIC Hynix 0.25 um | 1,871 ([AES]*: [1,839]*) | 8,935 ([AES]*: [9,088]*) | WISC2003 | Non-pipeline; high-speed, low-integrated
Park, JinSeob et al. | FPGA 0.22 um | 1,142 | 29,930 gates, 1,599 slices | WISC2004 | Non-pipeline
Yoo, YeongGab et al. | Xilinx VertexE-1600 FPGA | 437 | 1,490 slices | IEEK, 2005.4 | Non-pipeline
Jang, HwanSug et al. | 64-bit microprocessor | - | - | KIISC Journal 2006.6 | -

3.2 Implementation steps– AES= 4 steps

c) Round function

3.2 Implementation steps– IDEA= 7 steps

- 8-round transformation
(6 x 8 = 48 subkeys)
- output transformation
(4 subkeys)
- Key generation
(16-bit, 52 subkeys)
- Multiplication-Addition
(MA) structures

a) Block Diagram
b) Round functions
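
The subkey count and the Multiplication-Addition (MA) structure named on this slide can be sketched as follows. This is a minimal illustration written from the commonly published description of IDEA (multiplication modulo 2^16 + 1 with the word 0 standing for 2^16, addition modulo 2^16); the function names are hypothetical and the sketch is not a full IDEA implementation.

```python
ROUNDS = 8
SUBKEYS_PER_ROUND = 6
OUTPUT_SUBKEYS = 4
TOTAL_SUBKEYS = ROUNDS * SUBKEYS_PER_ROUND + OUTPUT_SUBKEYS  # 48 + 4

def idea_mul(a: int, b: int) -> int:
    """Multiplication modulo 2^16 + 1; the 16-bit word 0 represents 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF

def idea_add(a: int, b: int) -> int:
    """Addition modulo 2^16."""
    return (a + b) & 0xFFFF

def ma_structure(p: int, q: int, k5: int, k6: int) -> tuple:
    """MA structure: two multiplications and two additions on 16-bit words."""
    t0 = idea_mul(p, k5)
    t1 = idea_add(q, t0)
    t2 = idea_mul(t1, k6)
    t3 = idea_add(t0, t2)
    return t2, t3

print(TOTAL_SUBKEYS, ma_structure(0x1234, 0x5678, 0x9ABC, 0xDEF0))
```

The two chained multiplications are the slowest elements of the round, which is why the MA structure dominates the critical path in hardware.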

3.2 Implementation steps– SEED= 8 steps

3.2 Implementation steps– ARIA= 3 steps

3.2 Implementation steps– comparisons

Parameters | DES | T-DES | IDEA | ARIA | SEED | AES
Block size | 64 | 64 | 64 | 128 | 128 | 128
Key size | 56 | 112 | 128 | 128/192/256 | 128 | 128/192/256
# rounds | 16 | 16 x 3 | 8 | 12/14/16 | 16 | 10/12/14
Round key size | 48x16 | 48x16, 3 | 16x52 | 128x13 / 128x15 / 128x17 | 32x2x16 | 128x11 / 128x13 / 128x15
S-box | 6x4, 8 | 6x4, 8 | 16-bit MA | 8x8, 4; 8x32, 4 | 8x8, 2 | 8x8
Key space | 2^56 | 2^112 | 2^128 | 2^128~2^256 | 2^128 | 2^128~2^256
Proper CPU (word size) | 8-bit | 8-bit | 16-bit | 32-bit | 32-bit | 32-bit
Year | 1975-77 | 1979 | 1990-92 | 2003 | 1997 | 2000-01
Country | USA | USA | Switzerland | KOREA | KOREA | USA
Standard | FIPS-46 | - | - | - | TTAS.KO-12.0004 | FIPS-197
3.2 Implementation steps– comparisons
Cipher | Clock period (ns) | # of rounds (# of clocks) | # steps in 1 round | Speed (Mbps)
MARS | 5 | 32 (114) | - | 224.6
RC6 | 5 | 20 (122) | - | 209.8
RIJNDAEL | 9.7 | 10 (10) | 4 steps | 1,319.6
SERPENT | 6.3 | 32 (32) | - | 634.9
TWOFISH | 5 | 16 (128) | - | 200
SEED | 10.3 | 16 (48) | 8 steps | 258.9
ARIA | - | 12 () | 3 steps | 1,781.0

[Summary]
(1) The number of steps in one round is an important factor for a high-speed design
    ( SEED=8 > IDEA=7 > DES=5 > AES=4 > ARIA=3 )
(2) An even more important factor is the critical time of each step
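
The speed column of the comparison table appears to follow from throughput = block size / (number of clocks x clock period). The sketch below recomputes it under the assumption of a 128-bit block for every listed cipher; ARIA is left out because its clock period and clock count are not given in the table.

```python
BLOCK_BITS = 128  # assumed block size for all ciphers in the table

# cipher: (clock period in ns, number of clocks per block), from the table
TIMING = {
    "MARS": (5.0, 114),
    "RC6": (5.0, 122),
    "RIJNDAEL": (9.7, 10),
    "SERPENT": (6.3, 32),
    "TWOFISH": (5.0, 128),
    "SEED": (10.3, 48),
}

def throughput_mbps(period_ns: float, clocks: int) -> float:
    """Bits per nanosecond, scaled to Mbps."""
    return BLOCK_BITS / (clocks * period_ns) * 1000

SPEEDS = {name: round(throughput_mbps(p, c), 1) for name, (p, c) in TIMING.items()}
for name, mbps in SPEEDS.items():
    print(f"{name:9s} {mbps:8.1f} Mbps")
```

Rijndael is fastest here because it needs only 10 clocks per block, even though its clock period is almost twice that of MARS, RC6 and Twofish.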

3.2 Block cipher implementation analysis


- Block cipher mode vs. Performance(by KISA)

Conclusion (On Designing Block cipher)
Approaches | Descriptions | Properties
1) HW design tools | FPGA (Field Programmable Gate Array), e.g. Xilinx Virtex, Altera FLEX | Low cost, easy to design in a lab or research institute; slower than an ASIC for the same layout
 | ASIC (Application-Specific IC), e.g. Samsung, Hynix, Lucent Tech. | High cost, difficult design done at a design house; about 5~10 times faster than an FPGA for the same layout
2) Semiconductor layout | 0.5 μm → 0.35 μm → 0.22 μm → 0.18 μm → 0.13 μm (→ 90 nm → 70 nm → 60 nm → 50 nm) | Shrinking layout → high speed, low power, downsizing
3) Pipeline approaches | Round-common: →◘→ | Single-round chip (speed x 1)
 | Pipeline-structured: →◘◘◘◘◘→ (partial pipeline or full pipeline) | Multiple k-round chip, k times the speed (k = #rounds, #rounds/divisor, or #rounds x positive integer)
 | Parallel-structured: n copies of →◘→ | Single-round chip x n in parallel (n times)
 | Parallel pipeline: n copies of →◘◘◘◘◘→ | Multiple k-round chip x n in parallel (n x k times)
4) Component approaches | Selection of high-speed components, e.g. XOR, mod 2^32 adder, MUL, SFT, S-P | High-speed components (security needs come first)
 | Combination or optimization of high-speed components, e.g. # rounds, # steps in F-function | Optimized combination of components: speed is inversely related to # rounds; speed is inversely related to # steps per round; the faster each step, the better

IV. High performance in stream cipher


q High speed components
ØHigher speed LFSR
ØHigher speed Nonlinear Combiner
ØHigher speed Filter Function
qHigher speed Irregular Clocked
Device
qParallel-Stream Cipher
qWord-Based Stream Cipher

4.1 Higher speed of LFSR
- LFSR-based stream cipher

• Basic components in stream cipher
  - LFSR, NFSR, nonlinear function, etc.

• LFSR (Linear Feedback Shift Register)
  - Primitive polynomial → maximal period = 2^n - 1
  - External clock → speed control (1-bit output per clock)
  - Linear output → output bits are predictable

[Figure: n-stage LFSR driven by the system clock, with serial output]

4.1 Higher speed of LFSR


- LFSR-based stream cipher
- Primitive polynomial: P(x) = x^5 + x^3 + x^2 + x + 1

[Figure a) Block diagram of the 5-stage LFSR (stages 0 to 4): feedback taps, system clock, serial output]
[Figure b) Implementation of the 5-stage LFSR (stages 0 to 4) with D flip-flops sharing the system clock]
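
The maximal-period property can be checked in software for the polynomial shown on this slide. The sketch below assumes the recurrence s(t+5) = s(t+3) ^ s(t+2) ^ s(t+1) ^ s(t), which is one conventional reading of P(x) = x^5 + x^3 + x^2 + x + 1; the stage ordering in the hardware figure may differ.

```python
TAPS = (3, 2, 1, 0)   # exponents of P(x) below the leading term x^5
N = 5                 # number of stages

def step(state: list) -> list:
    """Shift by one position; state[i] holds s(t+i)."""
    feedback = 0
    for tap in TAPS:
        feedback ^= state[tap]
    return state[1:] + [feedback]

def lfsr_bits(state: list, count: int) -> list:
    """Produce `count` output bits, one per clock."""
    out = []
    for _ in range(count):
        out.append(state[0])
        state = step(state)
    return out

def period(state: list) -> int:
    """Number of clocks until the register returns to its start state."""
    start, current, clocks = list(state), step(list(state)), 1
    while current != start:
        current = step(current)
        clocks += 1
    return clocks

print(period([1, 0, 0, 0, 0]))     # expected to equal 2^5 - 1 for a primitive P(x)
print(lfsr_bits([1, 0, 0, 0, 0], 10))
```

Any nonzero start state lies on the same cycle, so the period does not depend on the seed.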

4.1 Higher speed of LFSR
- LFSR-based stream cipher
- Parallel Stream Cipher
• Basic components in parallel stream cipher
  - PS-LFSR, serial/parallel converter
  - m-parallel nonlinear functions

• Parallel-Shifting LFSR (PS-LFSR)
  - Crypto-degree → similar to LFSR
  - External clock → m-bit parallel output per clock → m times faster (2 ≤ m ≤ n)

----------------------------------------
[Ref] H.J. Lee, S.J. Moon, "Parallel Stream Cipher for Secure High-Speed Communications,"
Signal Processing, Vol. 82, No. 2, Feb. 2002.

4.1 Higher speed of LFSR


- LFSR-based stream cipher
- (n, m) PS-LFSR

[Figure a) n-stage LFSR: stages 0, 1, ..., i, ..., n-1, system clock, serial output]
  Feedback connection: the original connection

[Figure b) (n,m) PS-LFSR: (m-1)-stage LBUF followed by the n-stage PS-LFSR, clocked by the system clock, with m-bit output per clock]
  Feedback connection 1: the original combination
  Feedback connection 2: 1-bit left shift of the original combination
  ...
  Feedback connection m: (m-1)-bit left shift of the original combination

4.1 Higher speed of LFSR

Ì (n=40, m=8) PS-LFSR - example


[Figure a) n=40 stage LFSR: stages 0 ... 39, system clock, serial output]

[Figure b) (n,m)=(40,8) PS-LFSR: 8-stage LBUF (stages -8 ... -1) and 40-stage PS-LFSR (stages 0 ... 39), m=8 output bits per system clock]
  Feedback connection 1: s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t)
  Feedback connection 2: s(39+t) = s(34+t) ^ s(1+t) ^ s(t) ^ s(-1+t)
  ...
  Feedback connection 8: s(33+t) = s(28+t) ^ s(-5+t) ^ s(-6+t) ^ s(-7+t)
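
The idea of producing m = 8 bits per clock from the same recurrence can be checked with a small behavioural model. The sketch below is not the LBUF wiring of the figure; it is a generic look-ahead equivalent (an assumption made for illustration) in which each of the 8 new bits is a fixed XOR of current state bits, so all 8 can be computed concurrently. It uses the recurrence s(40+t) = s(35+t) ^ s(2+t) ^ s(1+t) ^ s(t) from feedback connection 1.

```python
N, M = 40, 8
TAPS = (35, 2, 1, 0)

def serial_bits(state: int, count: int) -> list:
    """Bit i of `state` holds s(t+i); one output bit per clock."""
    out = []
    for _ in range(count):
        out.append(state & 1)
        fb = 0
        for tap in TAPS:
            fb ^= (state >> tap) & 1
        state = (state >> 1) | (fb << (N - 1))
    return out

def lookahead_masks() -> list:
    """Express s(t+40) ... s(t+47) as XOR masks over the current 40 state bits."""
    masks = [1 << i for i in range(N)]
    for k in range(N, N + M):
        mask = 0
        for tap in TAPS:
            mask ^= masks[k - N + tap]
        masks.append(mask)
    return masks[N:]

def parallel_bits(state: int, clocks: int) -> list:
    """M output bits and M new state bits per clock."""
    masks = lookahead_masks()
    out = []
    for _ in range(clocks):
        out.extend((state >> j) & 1 for j in range(M))
        new = 0
        for j, mask in enumerate(masks):
            parity = bin(state & mask).count("1") & 1
            new |= parity << j
        state = (state >> M) | (new << (N - M))
    return out

seed = 0x123456789A  # illustrative nonzero 40-bit start state
same = serial_bits(seed, 10 * M) == parallel_bits(seed, 10)
```

Both generators emit the same sequence; the parallel one needs one clock where the serial one needs eight, which is the m-times speed-up claimed for the PS-LFSR.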

4.1 Higher speed of LFSR

Ì (n=39, m=8) PS-LFSR


[Figure b) (n,m)=(39,8) PS-LFSR: 7-stage LBUF (stages -7 ... -1) and 39-stage PS-LFSR (stages 0 ... 38), feedback connections 1 to 8, m=8 output bits per system clock]

p(x) = x^39 + x^37 + x^25 + x^24 + x^22 + x^8 + x^6 + x^4 + 1

4.1 Higher speed of LFSR

Ì m-Parallel Stream Cipher

[Figure: m-parallel stream cipher. Transmitter and receiver each contain N LFSRs (l1-LFSR 1 ... lN-LFSR N, clocks C1 ... CN, outputs x1 ... xN) feeding nonlinear combiners f1, f2, ..., fm (memoryless or not). The plaintext passes through a serial/parallel converter (m bits), is combined with the m keystream bits, and a parallel/serial converter produces the ciphertext sent over the channel; the receiver uses the same converters to recover the plaintext.]

4.1 Higher speed of LFSR

Ì m-parallel summation generator


[Figure: m-parallel summation generator. PS-LFSRs l1-PS-LFSR1, l2-PS-LFSR2, ..., lm-PS-LFSRm each deliver m bits per clock; summation generators SUM-BSG1 ... SUM-BSGm take the inputs (x1j), (x2j), ..., (xmj) with carries cj1, cj2, ..., cjM and produce the outputs (y1) ... (ym).]

Note: N = m, M1 = M2 = ... = Mm = M

4.1 Higher speed of LFSR

Ì Higher speed clock-controlled LFSR


(compensate for down-speed)

[Figure: clock-controlled generator. CLOCK-CONTROL part: 39-stage LFSRc with function fc(t); DATA GENERATION part: 89-stage 4-bit parallel LFSRd with feedback connections 1 to 4 (selected via SEL) and function fd(t) producing z(t). Inputs: KEYDATA, KEYLDEN, SYSTEM CLOCK.]

4.1 Higher speed of LFSR

Ì Higher speed clock-controlled LFSR


(compensate for down-speed)
[Figure: 89-stage 4-bit parallel LFSRd (stages d0 ... d88) with 3-stage LBUF (d-3, d-2, d-1). Feedback connections 1 to 4 are selected by S0 (sum) and S1 (carry) from the full adder (FA). Inputs: KEY_DATA, KEY_LDEN, SYSTEM_CLOCK. Stages d0, d1, d2, ..., d88 feed the filter function fd, whose output sequence is the KEYSTREAM.]


4.1 Higher speed of LFSR

Ì (n=39,m=8)PS-LFSR: Graphic Design

4.1 Higher speed of LFSR

Ì (n=39, m=8) PS-LFSR : Simulation

4.1 Higher speed of LFSR

Ì (n=39, m=8) PS-LFSR : Time Delay

4.1 Higher speed of LFSR
Ì LFSR vs. PS-LFSR
Items | 39-LFSR | (39,8) PS-LFSR
Period | 2^39 - 1 | 2^39 - 1
Processing rate @ 500 MHz system clock (max. delay = 1.73 ns) | 500 Mbps | 500 x 8 = 4 Gbps
Hardware complexity [gates] | 219 gates: 39 D F/Fs, 1 (2-1) MUX, 7 XORs | 401 gates (1.83 : 1): 46 D F/Fs, 1 (2-1) MUX, 56 XORs

4.2 Word Based Stream Ciphers


- Basic components in WBSC
qWord Based Linear Feedback Shift
Register (LFSR)
qDynamic Tables
qPseudorandom Functions
qGeneric Finite State Machine (FSM)

4.2 Word Based Stream Ciphers
- Word Based LFSR
qWord Based LFSR

LFSR
q Examples:
l Snow
l Sober
l Turing
l Dragon Nonlinear Filter

Output

4.2 Word Based Stream Ciphers


- Dragon
q Word-based Stream Cipher (WBSC)
Ø Fast, word-operation (@Tx, @ Rx)
Ø Ex) Snow, Sober, Turing, Dragon

W-bit word W-bit word

W-bit word W-bit word W-bit word W-bit word

4.2 Word Based Stream Ciphers
- Dragon
q Dragon
Ø Word –based stream cipher : ICISC’2004
Ø Throughput: 23Gbps (H/W), 3.8Gbps (S/W)
Ø Cooperative results with Australia QUT-ISRC(Dr. Dawson)
Ø Submitted to the Int’l standard ECRYPT-eSTREAM (phase 3)

NLFSR M

FPDS Selection

feedback keystream

4.2 Word Based Stream Ciphers


- Dragon

q H/W performance analysis (32-bit word)

4.2 Word Based Stream Ciphers
- Dragon
vWord-based FSR (WFSR)
lSimilar to the bit-based LFSR (LFSR)
lFeedback shifted by the unit of the word-size (W)

4.2 Word Based Stream Ciphers


- Dragon
vParallel-Structured Word-based LFSR (PS-WLFSR)

4.2 Word Based Stream Ciphers
- Dragon

• Implementing the S-box in the F function
  - Fast-access SRAM (compiled SRAM)
  - Each S-box uses a 256 x 32-bit (1 KB) SRAM
  - 24 S-boxes in total, so 24 KB of SRAM is required

4.2 Word Based Stream Ciphers
- Dragon
v (New) Dragon in Parallel-Structured (m=8)

4.2 Word Based Stream Ciphers


- Dragon
v (New) Dragon in Parallel-Structured (m=16)

4.2 Word Based Stream Ciphers
- Dragon
v Performance Analysis
Items | Worst case | Typical case | Best case
Area (gate size), memory | 287,600 | 287,600 | 287,544
Area (gate size), comb. | 8,126 | 8,068 | 8,219
Area (gate size), total | 295,726 | 295,688 | 295,763
Critical path delay (ns) | 14.36 | 10.26 | 6.72
Throughput | 4.4 Gbps | 6.2 Gbps | 9.5 Gbps
Parallel throughput (m=8) | 35.2 Gbps | 49.6 Gbps | 76 Gbps
Parallel throughput (max. m=16) | 70.4 Gbps | 99.2 Gbps | 152 Gbps

※ [Note] 1) comb.: combinational logic
2) Best/typical/worst case: according to the synthesis library environment
3) Throughput [bps] ≒ (output bits) × speed
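
The throughput rows of the table can be reproduced from the critical path delays with the relation in note 3. The sketch below assumes 64 output bits per critical-path cycle (the figure that fits the tabulated values) and truncates to 0.1 Gbps, which also appears to match the table; both are assumptions rather than statements from the slides.

```python
import math

OUTPUT_BITS = 64  # assumed keystream bits produced per critical-path cycle

def throughput_gbps(delay_ns: float, m: int = 1) -> float:
    """(output bits) x speed, truncated to 0.1 Gbps, times m parallel units."""
    per_unit = math.floor(OUTPUT_BITS / delay_ns * 10) / 10
    return per_unit * m

DELAYS_NS = {"worst": 14.36, "typical": 10.26, "best": 6.72}
for case, delay in DELAYS_NS.items():
    print(case, throughput_gbps(delay), throughput_gbps(delay, 8), throughput_gbps(delay, 16))
```

The parallel rows are simply the single-unit throughput multiplied by m, so they scale linearly as long as the critical path is unchanged.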

Conclusion
• Higher speed in block ciphers
  - Latest ASIC technology (layout): 0.13 um, 90–40 nm
  - Pipeline / parallel pipeline structures
  - The fewer steps per round, the faster
  - Select the faster components for each step (security remains the first requirement)
• Higher speed in stream ciphers
  - Parallel-structured LFSR → m times faster
  - Clock-controlled (compensated) → 1 time
  - Word-based stream cipher → W times faster
V. Highly Reliable Synchronous
Stream Cipher System

Lee, HoonJae
이훈재 (李焄宰)
Division of Computer Information
Engineering,
Dongseo University, Busan, Korea
hjlee@dongseo.ac.kr
http://kowon.dongseo.ac.kr/~hjlee

2013-05-09 Dongseo University, Korea 89

I. Introduction
• Two kinds of stream cipher:
  - Self-synchronous stream cipher
  - Synchronous stream cipher

• In a synchronous stream cipher,
  - the keystreams on both ends of the connection (the sending part and the receiving part) must be synchronized
  - a 1-bit insertion or a 1-bit deletion causes loss of synchronization
  - frequent resynchronization is undesirable from a security point of view

• This presentation covers
  - a highly reliable keystream synchronization method and a synchronous stream cipher system
  - the performance of the stream cipher system
2. Keystream Synchronization
In Synchronous Stream Cipher

(1) In-synchronization case:

Plaintext : 11110000 11110000 11110000 11110000 11110000
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Ciphertext: 01000100 01011101 00111011 00100010 01011110

Ciphertext: 01000100 01011101 00111011 00100010 01011110
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Plaintext : 11110000 11110000 11110000 11110000 11110000

2. Keystream Synchronization
In Synchronous Stream Cipher

(2) Out-of-synchronization case: (1 bit inserted, shifted @ receiver)

Plaintext : 11110000 11110000 11110000 11110000 11110000
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Ciphertext: 01000100 01011101 00111011 00100010 01011110

Ciphertext: 00100010 00101110 10011101 10010001 00101111 0
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Plaintext : 10010110 10000011 01010110 01000011 10000001 0

2. Keystream Synchronization
In Synchronous Stream Cipher

(3) Out-of-synchronization case: (1 bit deleted, shifted @ receiver)

Plaintext : 11110000 11110000 11110000 11110000 11110000
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Ciphertext: 01000100 01011101 00111011 00100010 01011110

Ciphertext:0 10001000 10111010 01110110 01000100 1011110
Keystream : 10110100 10101101 11001011 11010010 10101110
--------------------------------------------------
Plaintext : 00111100 00010111 10111101 10010110 00010010
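
The three cases above can be replayed with a few lines of code. The sketch uses the plaintext and keystream bits from the slides; the helper name `xor_bits` is an illustrative choice. It shows that a single inserted or deleted bit garbles everything that follows, because the receiver's keystream is no longer aligned with the ciphertext.

```python
def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two bit strings (stops at the shorter one)."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

plaintext = "11110000" * 5
keystream = "10110100" "10101101" "11001011" "11010010" "10101110"

ciphertext = xor_bits(plaintext, keystream)

# (1) In synchronization: the receiver recovers the plaintext.
in_sync = xor_bits(ciphertext, keystream)

# (2) One bit inserted in front of the ciphertext at the receiver.
inserted = xor_bits("0" + ciphertext, keystream)

# (3) The first ciphertext bit deleted at the receiver.
deleted = xor_bits(ciphertext[1:], keystream)

wrong_bits_inserted = sum(p != d for p, d in zip(plaintext, inserted))
print(in_sync == plaintext, wrong_bits_inserted)
```

Roughly half of the recovered bits are wrong after the slip, which is what random data would give; the error never heals until the keystreams are resynchronized.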

2. Keystream Synchronization

• Stream ciphers require synchronization of key streams on both ends of the connection
  - This is not suitable when packet losses are common
• Example: OTAR (Over-the-Air Rekeying)
  - Sending new keys to a remote device over the communications link (keys are encrypted) and automatically loading them into the crypto devices
Ex) OTAR Documents – TIA/EIA APCO Project 25

• Project 25 OTAR documents
  - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA, "Over-The-Air-Rekeying (OTAR) Protocol, New Technology Standards Project, Digital Radio Technical Standards," Jan. 1996.
  - TIA/EIA Telecommunications Systems Bulletin, APCO Project 25, TSB102.AACA-1, "Over-The-Air-Rekeying (OTAR) Protocol, Addendum 1," Dec. 2000.
  - TIA/EIA Telecommunications Systems Bulletin, TSB 102.AACB, "Over-The-Air-Rekeying (OTAR) Operational Description," Jan. 1997.
  - TIA, IS 102.AAAA, "DES Encryption Protocol"
  - TIA, TSB-102.BAAA, "Recommended Common Air Interface"
    - Addendum BAAA-1
  - TIA, TSB-102.BAAD, "Common Air Interface Operational Description for Conventional Channels"

2. Keystream Synchronization
- TIA/EIA APCO Project 25 -
• APCO Project 25 - CAI message format
  → ENCRYPT-SYNC is not highly reliable in a noisy wireless channel
  (the window size N = 72 bits is small; 128 bits are required)
3. Stream Cipher System
• Enciphering/deciphering system
[Figure: enciphering/deciphering system. Blocks on each side: CODEC, Cipher Equipment (Fig. 2), CAI (Fig. 3), KMF. Keys: KEK, TEK.]

3. Stream Cipher System


• Enciphering/deciphering system
  - Encryption system with OTAR function for protecting wireless end data
  - Transmits the TEK (Traffic Encryption Key) through the wireless MR (Mobile Radio) or CAI
  - Encrypts after the CODEC has converted the analog signal to digital data
• Encryption system ⇒ Fig. 2
  - Highly secure ENCRYPTION ALGORITHM (>128-bit key)
    for block cipher: AES, SEED, ARIA
    for stream cipher: LILI-II, Dragon, PingPong
  - Highly reliable ENCRYPTION SYNCHRONIZATION (= 128-bit SYNC)
3. Stream Cipher System


Technical Approach
q Stream Synchronization : Software Implementation

[Figure: voice (analog) → encrypted voice (digital) on the sending side; encrypted voice (digital) → voice (analog) on the receiving side]
Technical Approach
• Stream Synchronization: SW
[Figure: A/D or D/A conversion and encrypted voice]
3. Stream Cipher System
n System Blocks (See Fig.2)
n Block (1) is codec interface or KMF identification interface
circuit, and
n Block (2) is circuit with CAI function.
n Block (3) is main controller to control whole system and
n Block (4) is a synchronization pattern generator which
generates encryption synchronization pattern.
n Block (5) and (12) are transmission and reception session
key buffer.
n Block (6) and (11) are transmission and reception session
key construction in order to distribute session key in secure.
n Block (7) and (13) are transmission and reception encryptio
n algorithm which have high security (i.e. SEED, AES).
n Block (8) and (14) are ZS (Zero-suppression), encryption/
decryption operation (XOR).
n Block (9) is data selector to divide among synchronization
pattern/session key/cipher text.
n Block (10) is synchronization pattern detector to detect patt
ern in transmission occurred


3. Stream Cipher System


■ Encryption Algorithms

                   Previous algorithm (OTAR)   Required algorithm (OTAR)
Confidentiality    DES                         1) Block cipher: AES, SEED, ARIA, T-DES, IDEA
algorithm                                      2) Stream cipher: LILI-II, Dragon, PingPong
Identification     MD5                         SHA-160, SHA-1, MD5, SEED-CBC, AES-CBC,
algorithm                                      T-DES-CBC, IDEA-CBC

3. Stream Cipher System

❑ Parameters for Encryption-SYNC Performance

➢ PD (detection probability): probability that the receiver correctly detects the transmitted SYNPAT (no more than NT bit errors within the detection window).
➢ PM (missing probability): probability that the receiver misses the transmitted SYNPAT because of bit errors.
➢ PF (false-detection probability): probability that the receiver detects a SYNPAT in a random sequence when the sender has not transmitted one.
➢ TF (mean false-detection time): mean time between false detections.
➢ N: detection window (bits)
➢ B: channel bit error rate
➢ R (bps): transmission rate
➢ NT (0 ≤ NT ≤ N): threshold (number of mismatched bits tolerated)
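
A threshold detector using these parameters can be sketched as follows. This is an illustrative sketch, not the circuit from the slides: the function names and the randomly chosen SYNPAT are assumptions, while N = 128 and NT = 25 are the values used later in this chapter.

```python
import random

N = 128        # detection window in bits
NT = 25        # tolerated mismatches
THR = N - NT   # minimum number of matching bits (103)


def mismatches(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def to_bits(value: int, n: int = N) -> list:
    """Unpack an n-bit integer into a list of bits, most significant first."""
    return [(value >> (n - 1 - j)) & 1 for j in range(n)]


def find_sync(bits, synpat: int, n: int = N, nt: int = NT) -> int:
    """Return the start index of the first window within nt mismatches, else -1."""
    mask = (1 << n) - 1
    window = 0
    for i, bit in enumerate(bits):
        window = ((window << 1) | bit) & mask   # slide the window by one bit
        if i + 1 >= n and mismatches(window, synpat) <= nt:
            return i + 1 - n
    return -1


rng = random.Random(2013)
synpat = rng.getrandbits(N)          # hypothetical SYNPAT
noisy = synpat
for pos in range(0, 120, 12):        # flip 10 bits to imitate channel errors
    noisy ^= 1 << pos
stream = [0] * 40 + to_bits(noisy)   # 40 idle bits, then the corrupted SYNPAT
print(find_sync(stream, synpat))
```

A pattern with 25 flipped bits is still accepted, while one with 26 flipped bits is rejected, which is the trade-off between PM and PF that the threshold NT controls.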


3. Stream Cipher System

❑ Encryption-SYNC Performances
➢ PD: detection probability
➢ PM: missing probability
➢ N: detection window
➢ B: channel bit error rate
➢ R (bps): transmission rate
➢ NT (0 ≤ NT ≤ N): threshold
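
The expressions for these probabilities did not survive on this slide. The following is a reconstruction, not the slide's own text: it assumes independent bit errors with rate B and a uniformly random sequence for the false-detection case. For N = 128 and B = 0.1 it reproduces the tabulated values in this chapter, e.g. PD = 0.9^128 ≈ 1.39 x 10^-6 and PF = 2^-128 ≈ 2.94 x 10^-39 at NT = 0.

```latex
P_D = \sum_{k=0}^{N_T} \binom{N}{k} B^{k} (1-B)^{N-k}, \qquad
P_M = 1 - P_D, \qquad
P_F = \sum_{k=0}^{N_T} \binom{N}{k} \, 2^{-N}
```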

3. Stream Cipher System
❑ [Table] Sync. probability for variable NT (@ N = 128, BER = 10^-1).
NT PF PD PM
0 2.938735877055719e-39 1.390084229174165e-06 9.999986099157708e-01
1 3.790969281401877e-37 2.116017137142591e-05 9.999788398286286e-01
2 2.426514213684907e-35 1.606491218512547e-04 9.998393508781488e-01
3 1.027479040902622e-33 8.115975682014417e-04 9.991884024317985e-01
4 3.237791337733303e-32 3.071835266561997e-03 9.969281647334380e-01
5 8.098686849208071e-31 9.300045916275050e-03 9.906999540837249e-01
6 1.674842950156203e-29 2.348652596439370e-02 9.765134740356063e-01
7 2.945347751630233e-28 5.095875762354045e-02 9.490412423764596e-01
8 4.496053253292625e-27 9.712736992623532e-02 9.028726300737647e-01
9 6.051629962835398e-26 1.655253152108301e-01 8.344746847891700e-01
10 7.271572314915841e-25 2.559625999178171e-01 7.440374000821830e-01
11 7.878396318751689e-24 3.637565370098624e-01 6.362434629901376e-01
12 7.760297741953771e-23 4.805333041263665e-01 5.194666958736335e-01
13 6.997607780111669e-22 5.963119811331635e-01 4.036880188668365e-01
14 5.810342711442406e-21 7.019829976111680e-01 2.980170023888320e-01
15 4.465076540551982e-20 7.912163018922382e-01 2.087836981077618e-01
16 3.189612506824416e-19 8.612396598832773e-01 1.387603401167227e-01
17 2.126183271330397e-18 9.124985894574300e-01 8.750141054256999e-02
18 1.327071906532612e-17 9.476204491915974e-01 5.237955080840262e-02
19 7.779171576740664e-17 9.702134587513881e-01 2.978654124861191e-02
20 4.294311477937454e-16 9.838947814335587e-01 1.610521856644131e-02
21 2.237862512500631e-15 9.917126802385244e-01 8.287319761475564e-03
22 1.103341505902957e-14 9.959375044101476e-01 4.062495589852388e-03
23 5.156943983868469e-14 9.981009409782960e-01 1.899059021703953e-03
24 2.289145482496759e-13 9.991526115496695e-01 8.473884503304996e-04
25 9.666701992393990e-13 9.996387170662638e-01 3.612829337361623e-04
26 3.889317585852534e-12 9.998526865920681e-01 1.473134079319482e-04
27 1.493042993527993e-11 9.999425009624112e-01 5.749903758878183e-05
28 5.475729948142875e-11 9.999784979923908e-01 2.150200760919763e-05
29 1.920913323991833e-10 9.999922899581286e-01 7.710041871389350e-06
30 6.452936410277733e-10 9.999973470123162e-01 2.652987683759989e-06

3. Stream Cipher System


❑ [Table] Sync. probability for variable BER (@ NT = 25).
BER     PF                     PD            PM
10^-1   9.666701992 x 10^-13   0.9996387171  3.612829337 x 10^-4
10^-2   9.666701992 x 10^-13   1.0000000000  0.000000000
10^-3   9.666701992 x 10^-13   1.0000000000  1.110223025 x 10^-14
10^-4   9.666701992 x 10^-13   1.0000000000  1.110223025 x 10^-14
10^-5   9.666701992 x 10^-13   1.0000000000  1.110223025 x 10^-14
10^-6   9.666701992 x 10^-13   1.0000000000  0.000000000
10^-7   9.666701992 x 10^-13   1.0000000000  0.000000000
10^-8   9.666701992 x 10^-13   1.0000000000  0.000000000
❑ @ N = 128, NT = 25 → THR = 128 - 25 = 103
❑ @ BER = 0.1 → PF = 0.96667 x 10^-12, PD = 0.9996387, PM = 0.36 x 10^-3, TF ≈ 10 years
❑ @ BER = 0.01 → PD = 1 - 10^-15
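
The slide does not say how TF was obtained or which rate R was used. As a rough sketch, assume the detector tests one window per received bit, so false detections occur about PF x R times per second and TF ≈ 1 / (PF x R). The rate of 2400 bps below is a hypothetical value chosen only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mean_false_detection_time_years(pf: float, rate_bps: float) -> float:
    """TF in years, assuming one detection attempt per received bit."""
    return 1.0 / (pf * rate_bps) / SECONDS_PER_YEAR


PF = 9.666701992e-13      # PF at N = 128, NT = 25
R_HYPOTHETICAL = 2400     # bps; assumed, not given on the slide

years = mean_false_detection_time_years(PF, R_HYPOTHETICAL)
print(f"TF is roughly {years:.1f} years at {R_HYPOTHETICAL} bps")
```

Under these assumptions the result is of the same order of magnitude as the figure of about 10 years quoted above.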