This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

### Categories

Editors' Picks Books

Hand-picked favorites from

our editors

our editors

Editors' Picks Audiobooks

Hand-picked favorites from

our editors

our editors

Editors' Picks Comics

Hand-picked favorites from

our editors

our editors

Editors' Picks Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

Bruce Schneier

∗

John Kelsey

†

Doug Whiting

‡

David Wagner

§

Chris Hall

¶

Niels Ferguson

15 June 1998

Abstract

Twoﬁsh is a 128-bit block cipher that accepts a variable-length key up to 256 bits. The cipher is a

16-round Feistel network with a bijective F function made up of four key-dependent 8-by-8-bit S-boxes,

a ﬁxed 4-by-4 maximum distance separable matrix over GF(2

8

), a pseudo-Hadamard transform, bitwise

rotations, and a carefully designed key schedule. A fully optimized implementation of Twoﬁsh encrypts

on a Pentium Pro at 17.8 clock cycles per byte, and an 8-bit smart card implementation encrypts at

1660 clock cycles per byte. Twoﬁsh can be implemented in hardware in 14000 gates. The design of both

the round function and the key schedule permits a wide variety of tradeoﬀs between speed, software size,

key setup time, gate count, and memory. We have extensively cryptanalyzed Twoﬁsh; our best attack

breaks 5 rounds with 2

22.5

chosen plaintexts and 2

51

eﬀort.

Keywords: Twoﬁsh, cryptography, cryptanalysis, block cipher, AES.

Current web site: http://www.counterpane.com/twofish.html

∗

Counterpane Systems, 101 E Minnehaha Parkway, Minneapolis, MN 55419, USA; schneier@counterpane.com.

†

Counterpane Systems; kelsey@counterpane.com.

‡

Hi/fn, Inc., 5973 Avenida Encinas, Suite 110, Carlsbad, CA 92008, USA; dwhiting@hifn.com.

§

University of California Berkeley, Soda Hall, Berkeley, CA 94720, USA; daw@cs.berkeley.edu.

¶

Counterpane Systems; hall@counterpane.com.

**Counterpane Systems; niels@counterpane.com.
**

1

Contents

1 Introduction 3

2 Twoﬁsh Design Goals 3

3 Twoﬁsh Building Blocks 4

3.1 Feistel Networks . . . . . . . . . . . . 4

3.2 S-boxes . . . . . . . . . . . . . . . . . 5

3.3 MDS Matrices . . . . . . . . . . . . . 5

3.4 Pseudo-Hadamard Transforms . . . . . 5

3.5 Whitening . . . . . . . . . . . . . . . . 5

3.6 Key Schedule . . . . . . . . . . . . . . 5

4 Twoﬁsh 5

4.1 The Function F . . . . . . . . . . . . . 7

4.2 The Function g . . . . . . . . . . . . . 7

4.3 The Key Schedule . . . . . . . . . . . 8

4.4 Round Function Overview . . . . . . . 12

5 Performance of Twoﬁsh 12

5.1 Performance on Large Microprocessors 12

5.2 Performance on Smart Cards . . . . . 15

5.3 Performance on Future Microprocessors 16

5.4 Hardware Performance . . . . . . . . . 17

6 Twoﬁsh Design Philosophy 18

6.1 Performance-Driven Design . . . . . . 18

6.2 Conservative Design . . . . . . . . . . 19

6.3 Simple Design . . . . . . . . . . . . . . 20

6.4 S-boxes . . . . . . . . . . . . . . . . . 21

6.5 The Key Schedule . . . . . . . . . . . 22

7 The Design of Twoﬁsh 23

7.1 The Round Structure . . . . . . . . . . 23

7.2 The Key-dependent S-boxes . . . . . . 24

7.3 MDS Matrix . . . . . . . . . . . . . . 27

7.4 PHT . . . . . . . . . . . . . . . . . . . 29

7.5 Key Addition . . . . . . . . . . . . . . 29

7.6 Feistel Combining Operation . . . . . 29

7.7 Use of Diﬀerent Groups . . . . . . . . 29

7.8 Diﬀusion in the Round Function . . . 29

7.9 One-bit Rotation . . . . . . . . . . . . 30

7.10 The Number of Rounds . . . . . . . . 31

7.11 The Key Schedule . . . . . . . . . . . 31

7.12 Reed-Solomon Code . . . . . . . . . . 36

8 Cryptanalysis of Twoﬁsh 36

8.1 Diﬀerential Cryptanalysis . . . . . . . 36

8.2 Extensions to Diﬀerential Cryptanal-

ysis . . . . . . . . . . . . . . . . . . . . 41

8.3 Search for the Best Diﬀerential Char-

acteristic . . . . . . . . . . . . . . . . . 41

8.4 Linear Cryptanalysis . . . . . . . . . . 44

8.5 Interpolation Attack . . . . . . . . . . 45

8.6 Partial Key Guessing Attacks . . . . . 46

8.7 Related-key Cryptanalysis . . . . . . . 46

8.8 A Related-Key Attack on a Twoﬁsh

Variant . . . . . . . . . . . . . . . . . 48

8.9 Side-Channel Cryptanalysis and

Fault Analysis . . . . . . . . . . . . . . 50

8.10 Attacking Simpliﬁed Twoﬁsh . . . . . 50

9 Trap Doors in Twoﬁsh 52

10 When is a Cipher Insecure? 53

11 Using Twoﬁsh 53

11.1 Chaining Modes . . . . . . . . . . . . 53

11.2 One-Way Hash Functions . . . . . . . 53

11.3 Message Authentication Codes . . . . 53

11.4 Pseudo-Random Number Generators . 53

11.5 Larger Keys . . . . . . . . . . . . . . . 54

11.6 Additional Block Sizes . . . . . . . . . 54

11.7 More or Fewer Rounds . . . . . . . . . 54

11.8 Family Key Variant: Twoﬁsh-FK . . . 54

12 Historical Remarks 56

13 Conclusions and Further Work 57

14 Acknowledgments 58

A Twoﬁsh Test Vectors 65

A.1 Intermediate Values . . . . . . . . . . 65

A.2 Full Encryptions . . . . . . . . . . . . 67

2

1 Introduction

In 1972 and 1974, the National Bureau of Standards

(now the National Institute of Standards and Tech-

nology, or NIST) issued the ﬁrst public request for an

encryption standard. The result was DES [NBS77],

arguably the most widely used and successful en-

cryption algorithm in the world.

Despite its popularity, DES has been plagued with

controversy. Some cryptographers objected to the

“closed-door” design process of the algorithm. The

debate about whether DES’ key is too short for ac-

ceptable commercial security has raged for many

years [DH79], but recent advances in distributed key

search techniques have left no doubt in anyone’s

mind that its key is simply too short for today’s

security applications [Wie94, BDR+96]. Triple-

DES has emerged as an interim solution in many

high-security applications, such as banking, but it

is too slow for some uses. More fundamentally,

the 64-bit block length shared by DES and most

other well-known ciphers opens it up to attacks

when large amounts of data are encrypted under the

same key.

In response to a growing desire to replace DES,

NIST announced the Advanced Encryption Stan-

dard (AES) program in 1997 [NIST97a]. NIST so-

licited comments from the public on the proposed

standard, and eventually issued a call for algorithms

to satisfy the standard [NIST97b]. The intention is

for NIST to make all submissions public and even-

tually, through a process of public review and com-

ment, choose a new encryption standard to replace

DES.

NIST’s call requested a block cipher. Block ciphers

can be used to design stream ciphers with a variety of

synchronization and error extension properties, one-

way hash functions, message authentication codes,

and pseudo-random number generators. Because of

this ﬂexibility, they are the workhorse of modern

cryptography.

NIST speciﬁed several other design criteria: a longer

key length, larger block size, faster speed, and

greater ﬂexibility. While no single algorithm can

be optimized for all needs, NIST intends AES to be-

come the standard symmetric algorithm of the next

decade.

Twoﬁsh is our submission to the AES selection pro-

cess. It meets all the required NIST criteria—128-

bit block; 128-, 192-, and 256-bit key; eﬃcient on

various platforms; etc.—and some strenuous design

requirements, performance as well as cryptographic,

of our own.

Twoﬁsh can:

• Encrypt data at 285 clock cycles per block on

a Pentium Pro, after a 12700 clock-cycle key

setup.

• Encrypt data at 860 clock cycles per block on

a Pentium Pro, after a 1250 clock-cycle key

setup.

• Encrypt data at 26500 clock cycles per block

on a 6805 smart card, after a 1750 clock-cycle

key setup.

This paper is organized as follows: Section 2 dis-

cusses our design goals for Twoﬁsh. Section 3 de-

scribes the building blocks and general design of the

cipher. Section 4 deﬁnes the cipher. Section 5 dis-

cusses the performance of Twoﬁsh. Section 6 talks

about the design philosophy that we used. In Sec-

tion 7 we describe the design process, and why the

various choices were made. Section 8 contains our

best cryptanalysis of Twoﬁsh. In Section 9 we dis-

cuss the possibility of trapdoors in the cipher. Sec-

tion 10 compares Twoﬁsh with some other ciphers.

Section 11 discusses various modes of using Twoﬁsh,

including a family-key variant. Section 12 contains

historical remarks, and Section 13 our conclusions

and directions for future analysis.

2 Twoﬁsh Design Goals

Twoﬁsh was designed to meet NIST’s design criteria

for AES [NIST97b]. Speciﬁcally, they are:

• A 128-bit symmetric block cipher.

• Key lengths of 128 bits, 192 bits, and 256 bits.

• No weak keys.

• Eﬃciency, both on the Intel Pentium Pro and

other software and hardware platforms.

• Flexible design: e.g., accept additional key

lengths; be implementable on a wide variety

of platforms and applications; and be suitable

for a stream cipher, hash function, and MAC.

• Simple design, both to facilitate ease of analy-

sis and ease of implementation.

Additionally, we imposed the following performance

criteria on our design:

• Accept any key length up to 256 bits.

3

• Encrypt data in less than 500 clock cycles per

block on an Intel Pentium, Pentium Pro, and

Pentium II, for a fully optimized version of the

algorithm.

• Be capable of setting up a 128-bit key (for op-

timal encryption speed) in less than the time

required to encrypt 32 blocks on a Pentium,

Pentium Pro, and Pentium II.

• Encrypt data in less than 5000 clock cycles per

block on a Pentium, Pentium Pro, and Pen-

tium II with no key setup time.

• Not contain any operations that make it inef-

ﬁcient on other 32-bit microprocessors.

• Not contain any operations that make it inef-

ﬁcient on 8-bit and 16-bit microprocessors.

• Not contain any operations that reduce its ef-

ﬁciency on proposed 64-bit microprocessors;

e.g., Merced.

• Not include any elements that make it ineﬃ-

cient in hardware.

• Have a variety of performance tradeoﬀs with

respect to the key schedule.

• Encrypt data in less than less than 10 millisec-

onds on a commodity 8-bit microprocessor.

• Be implementable on a 8-bit microprocessor

with only 64 bytes of RAM.

• Be implementable in hardware using less than

20,000 gates.

Our cryptographic goals were as follows:

• 16-round Twoﬁsh (without whitening) should

have no chosen-plaintext attack requiring

fewer than 2

80

chosen plaintexts and less than

2

N

time, where N is the key length.

• 12-round Twoﬁsh (without whitening) should

have no related-key attack requiring fewer

than 2

64

chosen plaintexts, and less than 2

N/2

time, where N is the key length.

Finally, we imposed the following ﬂexibility goals:

• Have variants with a variable number of

rounds.

• Have a key schedule that can be precomputed

for maximum speed, or computed on-the-ﬂy

for maximum agility and minimum memory

requirements. Additionally, it should be suit-

able for dedicated hardware applications: e.g.,

no large tables.

• Be suitable as a stream cipher, one-way hash

function, MAC, and pseudo-random number

generator, using well-understood construction

methods.

• Have a family-key variant to allow for diﬀer-

ent, non-interoperable, versions of the cipher.

We feel we have met all of these goals in the design

of Twoﬁsh.

3 Twoﬁsh Building Blocks

3.1 Feistel Networks

A Feistel network is a general method of transform-

ing any function (usually called the F function) into

a permutation. It was invented by Horst Feistel

[FNS75] in his design of Lucifer [Fei73], and popular-

ized by DES [NBS77]. It is the basis of most block ci-

phers published since then, including FEAL [SM88],

GOST [GOST89], Khufu and Khafre [Mer91], LOKI

[BPS90, BKPS93], CAST-128 [Ada97a], Blowﬁsh

[Sch94], and RC5 [Riv95].

The fundamental building block of a Feistel network

is the F function: a key-dependent mapping of an

input string onto an output string. An F function

is always non-linear and possibly non-surjective

1

:

F : {0, 1}

n/2

×{0, 1}

N

→ {0, 1}

n/2

where n is the block size of the Feistel Network, and

F is a function taking n/2 bits of the block and N

bits of a key as input, and producing an output of

length n/2 bits. In each round, the “source block”

is the input to F, and the output of F is xored with

the “target block,” after which these two blocks swap

places for the next round. The idea here is to take

an F function, which may be a weak encryption al-

gorithm when taken by itself, and repeatedly iterate

it to create a strong encryption algorithm.

Two rounds of a Feistel network is called a “cycle”

[SK96]. In one cycle, every bit of the text block has

been modiﬁed once.

2

1

A non-surjective F function is one in which not all outputs in the output space can occur.

2

The notion of a cycle allows Feistel networks to be compared with unbalanced Feistel networks [SK96, ZMI90] such as

MacGuﬃn [BS95] (cryptanalyzed in [RP95a]) and Bear/Lion [AB96b], and with SP-networks (also called uniform transforma-

tion structures [Fei73]) such as IDEA, SAFER, and Shark [RDP+96] (see also [YTH96]). Thus, 8-cycle (8-round) IDEA is

comparable to 8-cycle (16-round) DES and 8-cycle (32-round) Skipjack.

4

Twoﬁsh is a 16-round Feistel network with a bijec-

tive F function.

3.2 S-boxes

An S-box is a table-driven non-linear substitution

operation used in most block ciphers. S-boxes vary

in both input size and output size, and can be cre-

ated either randomly or algorithmically. S-boxes

were ﬁrst used in Lucifer, then DES, and afterwards

in most encryption algorithms.

Twoﬁsh uses four diﬀerent, bijective, key-dependent,

8-by-8-bit S-boxes. These S-boxes are built using

two ﬁxed 8-by-8-bit permutations and key material.

3.3 MDS Matrices

A maximum distance separable (MDS) code over a

ﬁeld is a linear mapping from a ﬁeld elements to b

ﬁeld elements, producing a composite vector of a+b

elements, with the property that the minimum num-

ber of non-zero elements in any non-zero vector is at

least b +1 [MS77]. Put another way, the “distance”

(i.e., the number of elements that diﬀer) between

any two distinct vectors produced by the MDS map-

ping is at least b + 1. It can easily be shown that

no mapping can have a larger minimum distance be-

tween two distinct vectors, hence the term maximum

distance separable. MDS mappings can be repre-

sented by an MDS matrix consisting of a × b ele-

ments. Reed-Solomon (RS) error-correcting codes

are known to be MDS. A necessary and suﬃcient

condition for an a ×b matrix to be MDS is that all

possible square submatrices, obtained by discarding

rows or columns, are non-singular.

Serge Vaudenay ﬁrst proposed MDS matrices as a

cipher design element [Vau95]. Shark [RDP+96]

and Square [DKR97] use MDS matrices (see also

[YMT97]), although we ﬁrst saw the construction

used in the unpublished cipher Manta

3

[Fer96].

Twoﬁsh uses a single 4-by-4 MDS matrix over

GF(2

8

).

3.4 Pseudo-Hadamard Transforms

A pseudo-Hadamard transform (PHT) is a sim-

ple mixing operation that runs quickly in software.

Given two inputs, a and b, the 32-bit PHT is deﬁned

as:

a

= a +b mod 2

32

b

= a + 2b mod 2

32

SAFER [Mas94] uses 8-bit PHTs extensively for dif-

fusion. Twoﬁsh uses a 32-bit PHT to mix the out-

puts from its two parallel 32-bit g functions. This

PHT can be executed in two opcodes on most mod-

ern microprocessors, including the Pentium family.

3.5 Whitening

Whitening, the technique of xoring key material be-

fore the ﬁrst round and after the last round, was used

by Merkle in Khufu/Khafre, and independently in-

vented by Rivest for DES-X [KR96]. In [KR96], it

was shown that whitening substantially increases the

diﬃculty of keysearch attacks against the remain-

der of the cipher. In our attacks on reduced-round

Twoﬁsh variants, we discovered that whitening sub-

stantially increased the diﬃculty of attacking the ci-

pher, by hiding from an attacker the speciﬁc inputs

to the ﬁrst and last rounds’ F functions.

Twoﬁsh xors 128 bits of subkey before the ﬁrst Feis-

tel round, and another 128 bits after the last Feistel

round. These subkeys are calculated in the same

manner as the round subkeys, but are not used any-

where else in the cipher.

3.6 Key Schedule

The key schedule is the means by which the key bits

are turned into round keys that the cipher can use.

Twoﬁsh needs a lot of key material, and has a com-

plicated key schedule. To facilitate analysis, the key

schedule uses the same primitives as the round func-

tion.

4 Twoﬁsh

Figure 1 shows an overview of the Twoﬁsh block ci-

pher. Twoﬁsh uses a 16-round Feistel-like structure

with additional whitening of the input and output.

The only non-Feistel elements are the 1-bit rotates.

The rotations can be moved into the F function to

create a pure Feistel structure, but this requires an

additional rotation of the words just before the out-

put whitening step.

The plaintext is split into four 32-bit words. In the

input whitening step, these are xored with four key

words. This is followed by sixteen rounds. In each

3

Manta is a block cipher with a large block size and an emphasis on long-term security rather than speed. It uses an SP-like

network with DES as the S-boxes and MDS matrices for the permutations.

5

15

more

rounds

one

round

PHT

g

g

g g g g

g g g g

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

h

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

(

S-box 1

S-box 2

S-box 3

MDS

g

:

:

K4 K5 K6 K7

K0 K1 K2 K3

input

whitening

undo

last

swap

output

whitening

<<<8

S-box 0

S-box 1

S-box 2

S-box 3

MDS

K2r+9

K2r+8

F

S-box 0

g

P (128 bits)

C (128 bits)

>>>1

<<<1

Figure 1: Twoﬁsh

6

round, the two words on the left are used as input

to the g functions. (One of them is rotated by 8

bits ﬁrst.) The g function consists of four byte-wide

key-dependent S-boxes, followed by a linear mix-

ing step based on an MDS matrix. The results of

the two g functions are combined using a Pseudo-

Hadamard Transform (PHT), and two keywords are

added. These two results are then xored into the

words on the right (one of which is rotated left by 1

bit ﬁrst, the other is rotated right afterwards). The

left and right halves are then swapped for the next

round. After all the rounds, the swap of the last

round is reversed, and the four words are xored with

four more key words to produce the ciphertext.

More formally, the 16 bytes of plaintext p

0

, . . . , p

15

are ﬁrst split into 4 words P

0

, . . . , P

3

of 32 bits each

using the little-endian convention.

P

i

=

3

j=0

p

(4i+j)

· 2

8j

i = 0, . . . , 3

In the input whitening step, these words are xored

with 4 words of the expanded key.

R

0,i

= P

i

⊕K

i

i = 0, . . . , 3

In each of the 16 rounds, the ﬁrst two words are

used as input to the function F, which also takes

the round number as input. The third word is xored

with the ﬁrst output of F and then rotated right by

one bit. The fourth word is rotated left by one bit

and then xored with the second output word of F.

Finally, the two halves are exchanged. Thus,

(F

r,0

, F

r,1

) = F(R

r,0

, R

r,1

, r)

R

r+1,0

= ROR(R

r,2

⊕F

r,0

, 1)

R

r+1,1

= ROL(R

r,3

, 1) ⊕F

r,1

R

r+1,2

= R

r,0

R

r+1,3

= R

r,1

for r = 0, . . . , 15 and where ROR and ROL are func-

tions that rotate their ﬁrst argument (a 32-bit word)

left or right by the number of bits indicated by their

second argument.

The output whitening step undoes the ‘swap’ of the

last round, and xors the data words with 4 words

of the expanded key.

C

i

= R

16,(i+2) mod 4

⊕K

i+4

i = 0, . . . , 3

The four words of ciphertext are then written as 16

bytes c

0

, . . . , c

15

using the same little-endian conver-

sion used for the plaintext.

c

i

=

_

C

i/4

2

8(i mod 4)

_

mod 2

8

i = 0, . . . , 15

4.1 The Function F

The function F is a key-dependent permutation on

64-bit values. It takes three arguments, two input

words R

0

and R

1

, and the round number r used to

select the appropriate subkeys. R

0

is passed through

the g function, which yields T

0

. R

1

is rotated left

by 8 bits and then passed through the g function to

yield T

1

. The results T

0

and T

1

are then combined

in a PHT and two words of the expanded key are

added.

T

0

= g(R

0

)

T

1

= g(ROL(R

1

, 8))

F

0

= (T

0

+T

1

+K

2r+8

) mod 2

32

F

1

= (T

0

+ 2T

1

+K

2r+9

) mod 2

32

where (F

0

, F

1

) is the result of F. We also deﬁne the

function F

for use in our analysis. F

is identical to

the F function, except that it does not add any key

blocks to the output. (The PHT is still performed.)

4.2 The Function g

The function g forms the heart of Twoﬁsh. The in-

put word X is split into four bytes. Each byte is

run through its own key-dependent S-box. Each S-

box is bijective, takes 8 bits of input, and produces

8 bits of output. The four results are interpreted as

a vector of length 4 over GF(2

8

), and multiplied by

the 4×4 MDS matrix (using the ﬁeld GF(2

8

) for the

computations). The resulting vector is interpreted

as a 32-bit word which is the result of g.

x

i

=

_

X/2

8i

_

mod 2

8

i = 0, . . . , 3

y

i

= s

i

[x

i

] i = 0, . . . , 3

_

_

_

_

z

0

z

1

z

2

z

3

_

_

_

_

=

_

_

_

· · · · ·

.

.

. MDS

.

.

.

· · · · ·

_

_

_·

_

_

_

_

y

0

y

1

y

2

y

3

_

_

_

_

Z =

3

i=0

z

i

· 2

8i

where s

i

are the key-dependent S-boxes and Z is the

result of g. For this to be well-deﬁned, we need to

specify the correspondence between byte values and

the ﬁeld elements of GF(2

8

). We represent GF(2

8

)

as GF(2)[x]/v(x) where v(x) = x

8

+x

6

+x

5

+x

3

+1

is a primitive polynomial of degree 8 over GF(2).

The ﬁeld element a =

7

i=0

a

i

x

i

with a

i

∈ GF(2)

7

is identiﬁed with the byte value

7

i=0

a

i

2

i

. This is

in some sense the “natural” mapping; addition in

GF(2

8

) corresponds to a xor of the bytes.

The MDS matrix is given by:

MDS =

_

_

_

_

01 EF 5B 5B

5B EF EF 01

EF 5B 01 EF

EF 01 EF 5B

_

_

_

_

where the elements have been written as hexadeci-

mal byte values using the above deﬁned correspon-

dence.

4.3 The Key Schedule

The key schedule has to provide 40 words of ex-

panded key K

0

, . . . , K

39

, and the 4 key-dependent

S-boxes used in the g function. Twoﬁsh is deﬁned

for keys of length N = 128, N = 192, and N = 256.

Keys of any length shorter than 256 bits can be used

by padding them with zeroes until the next larger

deﬁned key length.

We deﬁne k = N/64. The key M consists of 8k

bytes m

0

, . . . , m

8k−1

. The bytes are ﬁrst converted

into 2k words of 32 bits each

M

i

=

3

j=0

m

(4i+j)

· 2

8j

i = 0, . . . , 2k −1

and then into two word vectors of length k.

M

e

= (M

0

, M

2

, . . . , M

2k−2

)

M

o

= (M

1

, M

3

, . . . , M

2k−1

)

A third word vector of length k is also derived from

the key. This is done by taking the key bytes

in groups of 8, interpreting them as a vector over

GF(2

8

), and multiplying them by a 4 ×8 matrix de-

rived from an RS code. Each result of 4 bytes is then

interpreted as a 32-bit word. These words make up

the third vector.

_

_

_

_

s

i,0

s

i,1

s

i,2

s

i,3

_

_

_

_

=

_

_

_

· · · · ·

.

.

. RS

.

.

.

· · · · ·

_

_

_·

_

_

_

_

_

_

_

_

_

_

_

_

m

8i

m

8i+1

m

8i+2

m

8i+3

m

8i+4

m

8i+5

m

8i+6

m

8i+7

_

_

_

_

_

_

_

_

_

_

_

_

S

i

=

3

j=0

s

i,j

· 2

8j

for i = 0, . . . , k −1, and

S = (S

k−1

, S

k−2

, . . . , S

0

)

Note that S lists the words in “reverse” order. For

the RS matrix multiply, GF(2

8

) is represented by

GF(2)[x]/w(x), where w(x) = x

8

+x

6

+x

3

+x

2

+1 is

another primitive polynomial of degree 8 over GF(2).

The mapping between byte values and elements of

GF(2

8

) uses the same deﬁnition as used for the MDS

matrix multiply. Using this mapping, the RS matrix

is given by:

RS =

_

_

_

_

01 A4 55 87 5A 58 DB 9E

A4 56 82 F3 1E C6 68 E5

02 A1 FC C1 47 AE 3D 19

A4 55 87 5A 58 DB 9E 03

_

_

_

_

The three vectors M

e

, M

o

, and S form the basis of

the key schedule.

4.3.1 Additional Key Lengths

Twoﬁsh can accept keys of any byte length up to

256 bits. For key sizes that are not deﬁned above,

the key is padded at the end with zero bytes to the

next larger length that is deﬁned. For example, an

80-bit key m

0

, . . . , m

9

would be extended by setting

m

i

= 0 for i = 10, . . . , 15 and treating it as a 128-bit

key.

4.3.2 The Function h

Figure 2 shows an overview of the function h. This

is a function that takes two inputs—a 32-bit word

X and a list L = (L

0

, . . . , L

k−1

) of 32-bit words of

length k—and produces one word of output. This

function works in k stages. In each stage, the four

bytes are each passed through a ﬁxed S-box, and

xored with a byte derived from the list. Finally,

the bytes are once again passed through a ﬁxed S-

box, and the four bytes are multiplied by the MDS

matrix just as in g. More formally: we split the

words into bytes.

l

i,j

=

_

L

i

/2

8j

_

mod 2

8

x

j

=

_

X/2

8j

_

mod 2

8

for i = 0, . . . , k − 1 and j = 0, . . . , 3. Then the se-

quence of substitutions and xors is applied.

y

k,j

= x

j

j = 0, . . . , 3

8

q

0

q

0

q

0

q

0

q

0

q

0

q

0

q

0

q

0

q

0

g

g

g

g

? ? ? ?

?

C

C

C

C

? ? ? ?

?

? ? ? ?

?

? ? ? ?

?

? ? ? ?

?

MDS

Z

X

k = 4

k < 4

k > 2

k = 2

L

3

L

2

L

1

L

0

q

1

q

1

q

1

q

1

q

1

q

1

q

1

q

1

q

1

q

1

Figure 2: The function h

9

If k = 4 we have

y

3,0

= q

1

[y

4,0

] ⊕l

3,0

y

3,1

= q

0

[y

4,1

] ⊕l

3,1

y

3,2

= q

0

[y

4,2

] ⊕l

3,2

y

3,3

= q

1

[y

4,3

] ⊕l

3,3

If k ≥ 3 we have

y

2,0

= q

1

[y

3,0

] ⊕l

2,0

y

2,1

= q

1

[y

3,1

] ⊕l

2,1

y

2,2

= q

0

[y

3,2

] ⊕l

2,2

y

2,3

= q

0

[y

3,3

] ⊕l

2,3

In all cases we have

y

0

= q

1

[q

0

[q

0

[y

2,0

] ⊕l

1,0

] ⊕l

0,0

]

y

1

= q

0

[q

0

[q

1

[y

2,1

] ⊕l

1,1

] ⊕l

0,1

]

y

2

= q

1

[q

1

[q

0

[y

2,2

] ⊕l

1,2

] ⊕l

0,2

]

y

3

= q

0

[q

1

[q

1

[y

2,3

] ⊕l

1,3

] ⊕l

0,3

]

Here, q

0

and q

1

are ﬁxed permutations on 8-bit val-

ues that we will deﬁne shortly. The resulting vector

of y

i

’s is multiplied by the MDS matrix, just as in

the g function.

_

_

_

_

z

0

z

1

z

2

z

3

_

_

_

_

=

_

_

_

· · · · ·

.

.

. MDS

.

.

.

· · · · ·

_

_

_·

_

_

_

_

y

0

y

1

y

2

y

3

_

_

_

_

Z =

3

i=0

z

i

· 2

8i

where Z is the result of h.

4.3.3 The Key-dependent S-boxes

We can now deﬁne the S-boxes in the function g by

g(X) = h(X, S)

That is, for i = 0, . . . , 3, the key-dependent S-box

s

i

is formed by the mapping from x

i

to y

i

in the h

function, where the list L is equal to the vector S

derived from the key.

4.3.4 The Expanded Key Words K

j

The words of the expanded key are deﬁned using the

h function.

ρ = 2

24

+ 2

16

+ 2

8

+ 2

0

A

i

= h(2iρ, M

e

)

B

i

= ROL(h((2i + 1)ρ, M

o

), 8)

K

2i

= (A

i

+B

i

) mod 2

32

K

2i+1

= ROL((A

i

+ 2B

i

) mod 2

32

, 9)

The constant ρ is used here to duplicate bytes; it

has the property that for i = 0, . . . , 255, the word iρ

consists of four equal bytes, each with the value i.

The function h is applied to words of this type. For

A

i

the byte values are 2i, and the second argument

of h is M

e

. B

i

is computed similarly using 2i +1 as

the byte value and M

o

as the second argument, with

an extra rotate over 8 bits. The values A

i

and B

i

are combined in a PHT. One of the results is further

rotated by 9 bits. The two results form two words

of the expanded key.

4.3.5 The Permutations q

0

and q

1

The permutations q

0

and q

1

are ﬁxed permutations

on 8-bit values. They are constructed from four dif-

ferent 4-bit permutations each. For the input value

x, we deﬁne the corresponding output value y as

follows:

a

0

, b

0

= x/16, x mod 16

a

1

= a

0

⊕b

0

b

1

= a

0

⊕ROR

4

(b

0

, 1) ⊕8a

0

mod 16

a

2

, b

2

= t

0

[a

1

], t

1

[b

1

]

a

3

= a

2

⊕b

2

b

3

= a

2

⊕ROR

4

(b

2

, 1) ⊕8a

2

mod 16

a

4

, b

4

= t

2

[a

3

], t

3

[b

3

]

y = 16 b

4

+a

4

where ROR

4

is a function similar to ROR that ro-

tates 4-bit values. First, the byte is split into two

nibbles. These are combined in a bijective mixing

step. Each nibble is then passed through its own 4-

bit ﬁxed S-box. This is followed by another mixing

step and S-box lookup. Finally, the two nibbles are

recombined into a byte. For the permutation q

0

the

4-bit S-boxes are given by

t

0

= [ 8 1 7 D 6 F 3 2 0 B 5 9 E C A 4 ]

t

1

= [ E C B 8 1 2 3 5 F 4 A 6 7 0 9 D ]

t

2

= [ B A 5 E 6 D 9 0 C 8 F 3 2 4 7 1 ]

t

3

= [ D 7 F 4 1 2 6 E 9 B 3 0 8 5 C A ]

where each 4-bit S-box is represented by a list of the

entries using hexadecimal notation. (The entries for

the inputs 0, 1, . . . , 15 are listed in order.) Similarly,

for q

1

the 4-bit S-boxes are given by

t

0

= [ 2 8 B D F 7 6 E 3 1 9 4 0 A C 5 ]

t

1

= [ 1 E 2 B 4 C 3 7 6 D A 5 F 9 0 8 ]

t

2

= [ 4 C 7 5 1 6 9 A 0 E D 8 2 B 3 F ]

t

3

= [ B 9 5 1 C 3 D E 6 4 7 F 2 0 8 A ]

10

PHT

PHT

h h

h h

h h

h h

MDS

h

MDS

g

MDS

g

MDS

h

M0

M1 M3

2i

2i

2i

2i + 1

2i + 1

2i + 1

2i + 1

<<<8

R1

<<<8 <<<9

2i

M2

S0 S1

R0 F0

F1

Figure 3: A view of a single round F function (128-bit key)

11

4.4 Round Function Overview

Figure 3 shows a more detailed view of how the func-

tion F is computed each round when the key length

is 128 bits. Incorporating the S-box and round-

subkey generation makes the Twoﬁsh round function

look more complicated, but is useful for visualizing

exactly how the algorithm works.

5 Performance of Twoﬁsh

Twoﬁsh has been designed from the start with per-

formance in mind. It is eﬃcient on a variety of plat-

forms: 32-bit CPUs, 8-bit smart cards, and dedi-

cated VLSI hardware. More importantly, though,

Twoﬁsh has been designed to allow several layers of

performance tradeoﬀs, depending on the relative im-

portance of encrytion speed, key setup, memory use,

hardware gate count, and other implementation pa-

rameters. The result is a highly ﬂexible algorithm

that can be implemented eﬃciently in a variety of

cryptographic applications.

All these options are interoperable; these are sim-

ply implementation trade-oﬀs and do not aﬀect the

mathematics of Twoﬁsh. One end of a communi-

cation could use the fastest Pentium II implemen-

tation, and the other the cheapest hardware imple-

mentation.

5.1 Performance on Large Micropro-

cessors

Table 1 gives Twoﬁsh’s performance, encryption or

decryption, for diﬀerent key scheduling options and

on several modern microprocessors using diﬀerent

languages and compilers. The times for encryption

and decryption are usually extremely close, so only

the encryption time is given. There is no time re-

quired to set up the algorithm except for key setup.

The time required to change a key is the same as

the time required to setup a key. The approximate

total code size (in bytes) of the routines for encryp-

tion, decryption, and key setup is also listed, where

available.

All timing data is given in clock cycles per block, or

clock cycles to set up the complete key. For exam-

ple, on a Pentium Pro a fully optimized assembly-

language version of Twoﬁsh can encrypt or decrypt

data in 285 clock cycles per block, or 17.8 clock cy-

cles per byte, after a 12700-clock key setup (equiv-

alent to encrypting 45 blocks). On a 200 MHz

Pentium Pro microprocessor, this translates to a

throughput of just under 90 Mbits/sec.

We have implemented four diﬀerent keying options.

All of our keying options precompute K

i

for i =

0, . . . , 39 and use 160 bytes of RAM to store these

constants. The diﬀerences occur in the way the

function g is implemented. There are several other

possible keying options, each with slightly diﬀerent

setup/throughput tradeoﬀs, but the examples listed

below are representative of the range of possibilities.

Full Keying This option performs the full key

precomputations. Using 4 Kb of table space, each

S-box is expanded to a 8-by-32-bit table that com-

bines both the S-box lookup and the multiply by

the column of the MDS matrix. Using this option, a

computation of g consists of four table lookups, and

three xors. Encryption and decyption speeds are

constant regardless of key size.

Partial Keying For applications where few blocks

are encrypted with a single key, it may not make

sense to build the complete key schedule. The par-

tial keying option precomputes the four S-boxes in 8-

by-8 bit tables, and uses four ﬁxed 8-by-32-bit MDS

tables to perform the MDS multiply. This reduces

the key-schedule table space to 1 Kb. For each byte,

the last of the q-box lookups is in fact incorporated

into the MDS table, so only k of the q-boxes are

incorporated into the 8-by-8-bit S-box table that is

built by the key schedule. Encryption and decryp-

tion speed are again constant regardless of key size.

Minimal Keying For applications where very few

blocks are encrypted with a single key, there is a

further possible optimization. Compared to partial

keying, one less layer of q-boxes is precomputed into

the S-box table, and the remaining q-box is done

during the encryption. For the 128-bit key this is

particularly eﬃcient as precomputing the S-boxes

now consists of copying the table of the appropriate

q-box and xoring it with a constant (which can be

done word-by-word instead of byte-by-byte). This

option uses a 1 Kb table to store the partially pre-

computed S-boxes. The necessary key bytes from

S are of course precomputed as they are needed in

every round.

Zero Keying The zero keying option does not

precompute any of the S-boxes, and thus needs no

extra tables. Instead, every entry is computed on

12

Processor Language Keying Code Clocks to Key Clocks to Encrypt

Option Size 128-bit 192-bit 256-bit 128-bit 192-bit 256-bit

Pentium Pro/II Assembly Compiled 8900 12700 15400 18100 285 285 285

Pentium Pro/II Assembly Full 8450 7800 10700 13500 315 315 315

Pentium Pro/II Assembly Partial 10700 4900 7600 10500 460 460 460

Pentium Pro/II Assembly Minimal 13600 2400 5300 8200 720 720 720

Pentium Pro/II Assembly Zero 9100 1250 1600 2000 860 1130 1420

Pentium Pro/II MS C Full 11200 8000 11200 15700 600 600 600

Pentium Pro/II MS C Partial 13200 7100 9700 14100 800 800 800

Pentium Pro/II MS C Minimal 16600 3000 7800 12200 1130 1130 1130

Pentium Pro/II MS C Zero 10500 2450 3200 4000 1310 1750 2200

Pentium Pro/II Borland C Full 14100 10300 13600 18800 640 640 640

Pentium Pro/II Borland C Partial 14300 9500 11200 16600 840 840 840

Pentium Pro/II Borland C Minimal 17300 4600 10300 15300 1160 1160 1160

Pentium Pro/II Borland C Zero 10100 3200 4200 4800 1910 2670 3470

Pentium Assembly Compiled 8900 24600 26800 28800 290 290 290

Pentium Assembly Full 8200 11300 14100 16000 315 315 315

Pentium Assembly Partial 10300 5500 7800 9800 430 430 430

Pentium Assembly Minimal 12600 3700 5900 7900 740 740 740

Pentium Assembly Zero 8700 1800 2100 2600 1000 1300 1600

Pentium MS C Full 11800 11900 15100 21500 630 630 630

Pentium MS C Partial 14100 9200 13400 19800 900 900 900

Pentium MS C Minimal 17800 3800 11100 16900 1460 1460 1460

Pentium MS C Zero 11300 2800 3900 4900 1740 2260 2760

Pentium Borland C Full 12700 14200 18100 26100 870 870 870

Pentium Borland C Partial 14200 11200 16500 24100 1100 1100 1100

Pentium Borland C Minimal 17500 4700 12100 19200 1860 1860 1860

Pentium Borland C Zero 11800 3700 4900 6100 2150 2730 3270

UltraSPARC C Full 16600 21600 24900 750 750 750

UltraSPARC C Partial 8300 13300 19900 930 930 930

UltraSPARC C Minimal 3300 11600 16600 1200 1200 1200

UltraSPARC C Zero 1700 3300 5000 1450 1680 1870

PowerPC 750 C Full 12200 17100 22200 590 590 590

PowerPC 750 C Partial 7800 12200 17300 780 780 780

PowerPC 750 C Minimal 2900 9100 14200 1280 1280 1280

PowerPC 750 C Zero 2500 3600 4900 1030 1580 2040

68040 C Full 16700 53000 63500 96700 3500 3500 3500

68040 C Partial 18100 36700 47500 78500 4900 4900 4900

68040 C Minimal 23300 11000 40000 71800 8150 8150 8150

68040 C Zero 16200 9800 13300 17000 6800 8600 10400

Table 1: Twoﬁsh performance with diﬀerent key lengths and options

13

the ﬂy. The key setup time consists purely of com-

puting the K

i

values and S. For an application that

cannot have any key setup time, the time it takes to

encrypt one block is the sum of the key setup time

and encryption time for the zero keying option.

Compiled In this option, available only in assem-

bly language, the subkey constants are directly em-

bedded into a key-speciﬁc copy of the code, saving

memory fetches and allowing the use of the Pentium

LEA opcode to perform both the PHT and the sub-

key addition all in a single clock. Some additional

setup time is required, as well as about an extra

5000 bytes of memory to hold the “compiled” code,

but this option allows the fastest execution time of

285 clocks per block on the Pentium Pro. The setup

time for a Pentium more than doubles over the full

keying case, because of its smaller cache size, but the

Pentium MMX setup time is still comparable to the

Pentium Pro setup time. However, almost all the ex-

tra time required is consumed merely in copying the

code; the table does not reﬂect the fact that, once

a single key has been initialized, future keys can be

compiled at a cost of only a few hundred more clocks

than full key schedule time.

Language, Compiler, and Processor Choice

As with most algorithms, the choices of language and

compiler can have a huge impact on performance. It

is clear that the Borland C 5.0 compiler chosen as the

standard AES reference is not the best optimizing

compiler. For example, the Microsoft Visual C++

4.2 compiler generates Twoﬁsh code that is at least

20% faster than Borland on a Pentium computer,

with both set to optimize for speed (630 clocks per

block for Microsoft Visual C++ 4.2 versus 870 clocks

per block for Borland C 5.0); on a Pentium Pro/II,

the diﬀerence between the compilers is not quite as

large (e.g., 600 clocks/block vs. 640 clocks/block),

but it is still signiﬁcant. Part of the diﬀerence stems

from the inability of the Borland compiler to gener-

ate intrinsic rotate instructions, despite documenta-

tion claiming that it is possible. This problem alone

accounts for nearly half of the speed diﬀerence be-

tween Borland and Microsoft. The remaining speed

diﬀerence comes simply from poorer code genera-

tion. The Borland compiler is uniformly slower than

Microsoft’s compiler. The encryption speed in Mi-

crosoft C of 40 Pentium clocks per byte (i.e., 630

clocks/block at 16 bytes/block) is over ten percent

faster than the best known DES assembly language

implementation on the same platform. However,

coding the Twoﬁsh algorithm in assembly language

achieves speeds of 285 clocks, achieving a very sig-

niﬁcant speedup over any of the C implementations.

To make matters even more complicated, the as-

sembly language that optimizes performance on a

Pentium (or Pentium MMX) is drastically diﬀerent

from the assembly language required to maximize

speed on a Pentium Pro or Pentium II, even though

the ﬁnal code size and speed achieved on each plat-

form are almost identical. For example, the Pen-

tium Pro/II CPUs can only perform one memory

read per clock cycle, while the Pentium and Pen-

tium MMX can perform two. However, the Pentium

Pro/II can peform two ALU operations per clock in

addition to memory accesses, while the Pentium can

process only a total of two ALU operations or mem-

ory accesses per clock. These (and other) signiﬁcant

architectural diﬀerences result in the fact that run-

ning the optimal Pentium Twoﬁsh encryption code

on a Pentium Pro results in a slowdown of nearly

2:1, and vice versa! Fortunately, it is relatively sim-

ple to detect the CPU type at run-time and select

which version of the assembly code to use. Another

anomaly is that the key schedule setup time is con-

siderably faster (43%) on the Pentium MMX CPU

than on the Pentium, not because the key schedule

uses any MMX instructions, but simply because of

the larger cache size of the MMX chip.

Empirically, there also seems to be some anomalous

behavior of the C compilers. In almost all cases, the

encryption and decryption routines in C achieved

speeds within a few percent of each other. However,

there were cases in the table where the two speeds

diﬀered by considerably more than ten percent (we

used the larger number in the table), which is very

odd because the “inner loop” C code used is vir-

tually identical. We also noted several cases where

compiler switches which seemed unrelated to per-

formance optimization sometimes caused very large

changes in timings.

It should be noted that performance numbers for

Pentium II processors are almost identical in all

cases to those for a Pentium Pro, which is not sur-

prising since Intel claims they have the same core.

The Pentium and Pentium MMX achieve almost

identical speeds for encryption and decryption, al-

though, as noted above, the MMX key setup times

are faster, due mainly to the larger cache size.

The bottom line is that, when comparing the rela-

tive performance of diﬀerent algorithms, using the

same language and compiler for all implementations

helps to make the comparison meaningful, but it

does not guarantee a valid measure of the relative

speeds. We have listed many diﬀerent software per-

14

formance metrics across platforms and languages to

facilitate speed comparisons between Twoﬁsh and

other algorithms. Our belief is that, on any given

platform (e.g., Pentium Pro), the assembly language

performance numbers are the best numbers to use

to gauge absolute performance, since they are unaf-

fected by the vagaries and limitations of the compiler

(e.g., inability to produce rotate opcodes). High-

level languages (e.g., C, Java) are also important

because of the ease of porting to diﬀerent platforms,

but once an algorithm becomes standardized, it will

ultimately be coded in assembly for the most popu-

lar platforms.

Code and Data Size As shown in Table 1, the

code size for a fully optimized Twoﬁsh on a Pen-

tium Pro ranges from about 8450 bytes in assem-

bler to 14100 bytes in Borland C. In assembler, the

encryption and decryption routines are each about

2250 bytes in size; the remaining 4000 bytes of code

are in the key scheduling routines, but about 2200

bytes of that total can be discarded if only 128-bit

keys are needed. In Borland C, the encryption and

decryption routines are each about 4500 bytes in

length, and the key schedule routine is slightly less

than 5000 bytes. Note that each routine ﬁts easily

within the code cache of a Pentium or a Pentium

Pro. These sizes are larger than Blowﬁsh but very

similar to a fully optimized assembly language ver-

sion of DES. Note that, with the exception of the

zero keying option, the code sizes in the table are

for fully unrolled implementations; in either C or as-

sembler, it is possible to achieve signiﬁcantly smaller

code sizes using loops with round counters, at a cost

in performance.

In addition to the code, there are about 4600 bytes of

ﬁxed tables for the MDS matrix, q

0

, and q

1

required

for key setup, and each key requires about 4300 bytes

of key-dependent data tables for the full keying op-

tion. During encryption or decryption, these tables

ﬁt easily in the Pentium data cache. The other key-

ing options use less data and table space, as dis-

cussed above.

Total Encryption Times Any performance mea-

sures that do not take key setup into account are

only valid for asymptotically large amounts of text.

For shorter messages, performance is the sum of key

setup and encryption. For very short messages, the

key setup time can overwhelm the encryption speed.

Table 2 gives Twoﬁsh’s performance on the Pen-

tium Pro (assembly-language version), both 128-bit

key setup and encryption, for a variety of message

lengths. This table assumes the best of our imple-

mentations for the particular length of text.

5.2 Performance on Smart Cards

Twoﬁsh has been implemented on a 6805 CPU,

which is a typical smart card processor, with several

diﬀerent space–time tradeoﬀ options. Our diﬀerent

options result in the following numbers:

RAM Code and Clocks Time per block

(bytes) Table Size per Block @ 4MHz

60 2200 26500 6.6 msec

60 2150 32900 8.2 msec

60 2000 35000 8.7 msec

60 1760 37100 9.3 msec

The code size includes both encryption and decryp-

tion.

4

The block encryption and decryption times

are almost identical. If only encryption is required,

minor improvements in code size and speed can be

obtained. The only key schedule precomputation

time required in this implementation is the Reed-

Solomon mapping used to generate the S-box key

material S from the key M, which requires slightly

over 1750 clocks per key. This setup time could

be cut considerably at the cost of two additional

512-byte ROM tables. It should also be observed

that the lack of a second index register on the 6805

has a signiﬁcant impact on the code size and perfor-

mance, so a diﬀerent CPU with multiple index reg-

isters (e.g., 6502) might be a better ﬁt for Twoﬁsh.

Size and speed estimates for larger key sizes are very

straightforward, given the 128-bit implementation.

The extra code size is fairly negligible—less than 100

extra bytes for a 192-bit key, and less than 200 bytes

for a 256-bit key. The encryption time per block in-

creases by less than 2600 clocks per block (for any

of the code size/speed tradeoﬀs above) for 192-bit

keys, and by about 5200 clocks per block for 256-bit

keys. Similarly, the key schedule precomputation in-

creases to 2550 clocks for 192-bit keys, and to 3400

clocks for 256-bit keys.

The 60 bytes of RAM for the smart card implemen-

tation include 16 bytes for the plaintext/ciphertext

block and 16 bytes for the key. If the 16 key bytes

and the extra 8 bytes of the Reed-Solomon results

(S) are retained in memory between encryption op-

erations, the remaining 36 bytes of RAM are avail-

able for other functions, and there is zero startup

4

For comparison purposes: DES on a 6805 takes about 2K code, 23 bytes of RAM, and 20000 clock cycles per block.

15

Plaintext Keying Clocks Clocks to Total Clocks

(bytes) Option to Key Encrypt per Byte

16 Zero 1250 860 131.9

32 Zero 1250 1720 92.8

64 Zero 1250 4690 73.3

128 Zero 1250 6880 63.5

256 Partial 4900 7360 47.9

512 Full 7800 10080 34.9

1K Full 7800 20160 27.3

2K Full 7800 40320 23.5

4K Compiled 12700 72960 20.9

8K Compiled 12700 145920 19.4

16K Compiled 12700 291840 18.6

32K Compiled 12700 583680 18.2

64K Compiled 12700 1167360 18.0

1M Compiled 12700 18677760 17.8

Table 2: Best speed to encrypt a message with a new 128-bit key on a Pentium Pro

time for the next encryption operation with the same

key. Note that larger key sizes also require more

RAM to store the larger keys and the larger Reed-

Solomon results. In some applications it might be

viable to store all key material in non-volatile mem-

ory, reducing the RAM requirements of the imple-

mentation signiﬁcantly.

Observe that it is possible to save further ROM space

by computing q

0

and q

1

lookups using the underly-

ing 4-bit construction, as speciﬁed in Section 4.3.5.

Such a scheme would replace 512 bytes of ROM ta-

ble with 64 bytes of ROM and a small subroutine

to compute the full 8-bit q

0

and q

1

, saving perhaps

350 bytes of ROM; unfortunately, encryption speed

would decrease by a factor of ten or more. Thus, this

technique is only of interest in smart card applica-

tions for which ROM size is extremely critical but

performance is not. Nonetheless, such an approach

illustrates the implementation ﬂexibility aﬀorded by

Twoﬁsh.

5.3 Performance on Future Micro-

processors

Given the ever-advancing capabilities of CPUs, it is

worthwhile to make some observations about how

the Twoﬁsh algorithm will run on future proces-

sors, including Intel’s Merced. Not many details are

known about Merced, other than that it includes an

Explicitly Parallel Instruction Computing (EPIC)

architecture, as well the ability to run existing Pen-

tium code. EPIC is related to VLIW architectures

that allow many parallel opcodes to be executed at

once, while the Pentium allows only two opcodes in

parallel, and the Pentium Pro/Pentium II may pro-

cess up to three opcodes per clock. However, access

to memory tables is limited in most VLIW imple-

mentations to only a few parallel operations, and we

expect similar restrictions to hold for Merced. For

example, an existing Philips VLIW CPU can pro-

cess up to ﬁve opcodes in parallel, but only two of

the opcodes can read from memory.

Since Twoﬁsh relies on 8-bit non-linear S-boxes, it

is clear that table access is an integral part of the

algorithm. Thus, Twoﬁsh might not be able to take

advantage of all the parallel execution units avail-

able on a VLIW processor. However, there is still

plenty of parallelism in Twoﬁsh that can be well-

utilized in an optimized VLIW software implemen-

tation. Equally important, the alternative of not

using large S-boxes, while it may allow greater paral-

lelism, also naturally involves less non-linearity and

thus generally requires more rounds. For example,

Serpent [BAK98], based on “inline” computation

of 4-bit S-boxes, may experience a relatively larger

speedup than Twoﬁsh on a VLIW CPU, but Serpent

also requires 32 rounds, and is considerably slower

to start with.

It should also be noted that, as with most encryp-

tion algorithms, the primitive operations used in

Twoﬁsh could very easily be added to a CPU in-

struction set to improve software performance sig-

niﬁcantly. Future mainstream CPUs may include

such support for the new AES standard. However,

it is also worthwhile to remember that DES has been

16

a standard for more than twenty years, and no pop-

ular CPU has added instruction set support for it,

even though DES software performance would ben-

eﬁt greatly from such features.

5.4 Hardware Performance

No actual logic design has been implemented for

Twoﬁsh, but estimates in terms of gates for each

building block have been made. As in software, there

are many possible space–time tradeoﬀs in hardware

implementations of Twoﬁsh. Thus, it is not mean-

ingful to give just one ﬁgure for the speed and size

attributes of Twoﬁsh in hardware. Instead, we will

try to outline several of the options and give esti-

mate for speed and gate count of several diﬀerent

architectures.

For example, the round subkeys could be precom-

puted and stored in a RAM, or they could be com-

puted on the ﬂy. If computed on the ﬂy, the h func-

tion logic could be time-multiplexed between sub-

keys and the round function to save size at a cost in

speed, or the logic could be duplicated, adding gates

but perhaps running twice as fast. If the subkeys

were precomputed, the h function logic would be

used during a key setup phase to compute the sub-

keys, saving gates but adding a startup time roughly

equal to one block encryption time. Similarly, a sin-

gle h function logic block could be time-multiplexed

between computing T

0

and T

1

, halving throughput

but saving even more gates.

As another example of the possible tradeoﬀs, the S-

boxes could be precomputed and stored in on-chip

RAMs, allowing faster operation because there is no

need to ripple through several layers of key mate-

rial xors and q permutations. The addition of such

RAMs (e.g., eight 256-byte RAMs) would perhaps

double or triple the size of the logic, and it would

also impose a signiﬁcant startup time on key change

to initialize the RAMs. Despite these disadvantages,

such an architecture might raise the throughput by

a factor of two or more (particularly for the larger

key sizes), so for high-performance systems with in-

frequent re-keying, this option may be attractive.

The construction method speciﬁed in Section 4.3.5

for building the 8-bit permutations q

0

and q

1

from

four 4-bit permutations was selected mainly to min-

imize gate count in many hardware implementations

of Twoﬁsh. These permutations can be built either

directly in logic gates or as full 256-byte ROMs in

hardware, but such a ROM is usually several times

larger than the direct logic implementation. Since

each full h block in hardware (see Figure 2) involves

six q

0

blocks and six q

1

blocks (for N = 128), the

gate savings mount fairly quickly. The circuit delays

in building q

0

or q

1

using logic gates are typically at

least as fast as those using ROMs, although this met-

ric is certainly somewhat dependent on the particu-

lar silicon technology and circuit library available.

It should also be noted that the Twoﬁsh round struc-

ture can be very nicely pipelined to break up the

overall function into smaller and much faster blocks

(e.g., q’s, key xors, MDS, PHT, subkey addition,

Feistel xor). None of these operations individu-

ally is slow, but trying to run all of them in a

single clock cycle does aﬀect the cycle time. In

ECB mode, counter mode, or an interleaved chain-

ing mode, the throughput can be dramatically in-

creased by pipelining the Twoﬁsh round structure.

As a very simple example of two-level pipelining, we

could compute the q

i

’s, key xors, and MDS multiply

for one block during “even” clocks, while the PHT,

subkey addition, and Feistel xor would be computed

on the “odd” clocks; a second block is processed in

parallel on the alternate clock cycles. Using care-

ful balancing of circuit delays between the two clock

phases, this approach allows us to cut the logic delay

in half, thus running the clock at twice the speed of

an unpipelined approach. Such an approach does

not require duplicating the entire Twoﬁsh round

function logic, but merely the insertion of one extra

layer of clocked storage elements (128 bits). Thus,

for a very modest increase in gate count, throughput

can be doubled, assuming that the application can

use one of these cipher modes. It is clear that this

general approach can be applied with more levels of

pipelining to get higher throughput, although dimin-

ishing returns are achieved past a certain point. For

even higher levels of performance, multiple indepen-

dent engines can be used to achieve linear speedups

at a linear cost in gates. We see no problem meet-

ing NSA’s requirement to “be able to encrypt data

at a minimum of 1 Gb/s, pipelined if necessary, in

existing technology” [McD97].

Table 3 gives hardware size and speed estimates for

the case of 128-bit keys. Depending on the architec-

ture, the logic will grow somewhat in size for larger

keys, and the clock speed (or startup time) may in-

crease, but it is believed that a 128-bit AES scheme

will be acceptable in the market long enough that

most vendors will choose to implement that recom-

mended key length. These estimates are all based

on existing 0.35 micron CMOS technology. All the

examples in the table are actually quite small in to-

day’s technology, except the ﬁnal (highest perfor-

mance non-pipelined) instance, but even that is very

17

Gate h Clocks Pipeline Clock Throughput Startup

count blocks per Block Levels Speed (Mbits/sec) clocks Comments

14000 1 64 1 40 MHz 80 4 Subkeys on the ﬂy

19000 1 32 1 40 MHz 160 40

23000 2 16 1 40 MHz 320 20

26000 2 32 2 80 MHz 640 20

28000 2 48 3 120 MHz 960 20

30000 2 64 4 150 MHz 1200 20

80000 2 16 1 80 MHz 640 300 S-box RAMs

Table 3: Hardware tradeoﬀs (128-bit key)

doable today and will become fairly inexpensive as

the next generation silicon technology (0.25 micron)

becomes the industry norm.

6 Twoﬁsh Design Philosophy

In the design of Twoﬁsh, we tried to stress the fol-

lowing principles:

Performance When comparing diﬀerent options,

compare them on the basis of relative perfor-

mance.

Conservativeness Do not design close to the edge.

In other words, leave a margin for error and

provide more security than is provably re-

quired. Also, try to design against attacks that

are not yet known.

Simplicity Do not include ad hoc design elements

without a clear reason or function. Try to de-

sign a cipher whose details can be easily kept

in one’s head.

These principles were applied not only to the overall

design of Twoﬁsh, but to the design of the S-boxes

and the key schedule.

6.1 Performance-Driven Design

The goal of performance-driven design is to build

and evaluate ciphers on the basis of performance

[SW97]. The early post-DES cipher designs would

often compete on the number of rounds in the cipher.

The original FEAL paper [SM88], for example, dis-

cussed the beneﬁts of a stronger round function and

fewer rounds. Other cipher designs of the period—

REDOC II [CW91], LOKI [BPS90] and LOKI 93

[BKPS93], IDEA [LM91, LMM91]—only considered

performance as an afterthought. Khufu/Khafre

[Mer91] was the ﬁrst published algorithm that ex-

plicitly used operations that were eﬃcient on 32-bit

microprocessors; SEAL [RC94, RC97] is a more re-

cent example. RC2 [Riv97, KRRR98] was designed

for 16-bit microprocessors, SOBER [Ros98] for 8-bit

ones. Other, more recent designs, do not seem to

take performance into account at all. Two 1997 de-

signs, SPEED [Zhe97]

5

and Zhu-Guo [ZG97], are sig-

niﬁcantly slower than alternatives that existed years

previous.

Arbitrary metrics, such as the number of rounds,

are not good measures of performance. What is im-

portant is the cipher’s speed: the number of clock

cycles per byte encrypted. When ciphers are ana-

lyzed according to this property, the results can be

surprising [SW97]. RC5 might have twice the num-

ber of rounds of DES,

6

but since its round function

is more than twice as fast as DES’, RC5 is faster

than DES on most microprocessors.

Even when cryptographers made eﬀorts to use eﬃ-

cient 32-bit operations, they often lacked a full ap-

preciation of low-level software optimization princi-

ples associated with high-performance CPUs. Thus,

many algorithms are not as eﬃcient as they could

be. Minor modiﬁcations in the design of Blow-

ﬁsh [Sch94], SEAL [RC94, RC97], and RC4 [Sch96]

could improve performance without aﬀecting secu-

rity [SW97] (or, alternatively, increase the algo-

rithms’ complexity without aﬀecting performance).

In designing Twoﬁsh, we tried to evaluate all design

decisions in terms of performance.

5

Speed has been cryptanalyzed in [HKSW98, HKR+98].

6

Here we use the term “round” in the traditional sense: as it was deﬁned by DES [NBS77] and has been used to describe

Feistel-network ciphers ever since. The RC5 documentation [Riv95] uses the term “round” diﬀerently: one RC5-deﬁned round

equals two Feistel rounds.

18

Since NIST’s platform of choice was the Intel Pen-

tium Pro [NIST97b], we concentrated on that plat-

form. However, we did not ignore performance on

other 32-bit CPUs, as well as 8-bit and 16-bit CPUs.

If there is any lesson from the past twenty years of

microprocessors, it is that the high end gets bet-

ter and the low end never goes away. Yesterday’s

top-of-the-line CPUs are currently in smart cards.

Today’s CPUs will eventually be in smart cards,

while the 8-bit microprocessors will move to devices

even smaller. The only thing we did not consider

in our performance metrics are bitslice implementa-

tions [Bih97, SAM97, NM97], since these can only

be used in very specialized applications and often

require unrealistic implementations: e.g., 32 simul-

taneous ECB encryptions, or 32 interleaved IVs.

7

6.1.1 Performance-driven Tradeoﬀs

During our design, we constantly evaluated the rel-

ative performance of diﬀerent modiﬁcations to our

round function. Twoﬁsh’s round function encrypts

at about 20 clock cycles; 16 rounds translates to

about 320 clock cycles per block encrypted. When

we contemplated a change to the round function, we

evaluated it in terms of increasing or decreasing the

number of rounds to keep performance constant.

Three examples:

• We could have added a data-dependent rota-

tion to the output of the two MDS matrices

in each round. This would add 10 clock cycles

to the round function on the Pentium (2 on

the Pentium Pro). To keep the performance

constant, we would have to reduce the num-

ber of rounds to 11. The question to ask is:

Are 11 rounds of the modiﬁed cipher more or

less secure than 16 rounds of the unmodiﬁed

cipher?

• We could have removed the one-bit rotation.

This would have saved clocks equivalent to one

Twoﬁsh round. Are 17 rounds of this new

round function more or less secure than 16

rounds of the old one?

• We could have deﬁned the key-dependent S-

boxes using the whole key, instead of half

of it. This would have doubled key setup

time on high-end machines, and halved en-

cryption speed on memory-poor implementa-

tions (where the S-boxes could not be precom-

puted). On memory-poor machines, we would

have to cut the number of rounds in half to

be able to aﬀord this. Are 8 rounds of this

improved cipher better than 16 rounds of the

current design?

This analysis is necessarily dependent on the micro-

processor architecture the algorithm is being com-

pared on. While we focused on the Intel Pentium

architecture, we also tried to keep 8-bit smart card

and hardware implementations in mind. For exam-

ple, we considered using a 8-by-8 MDS matrix over

GF(2

4

) to ensure a ﬁner-grained diﬀusion, instead of

a 4-by-4 MDS matrix over GF(2

8

); the former would

have been no slower on a Pentium but at least twice

as slow on a low-memory smart card.

6.2 Conservative Design

There has been considerable research in design-

ing ciphers to be resistant to known attacks

[Nyb91, Nyb93, OCo94a, OCo94b, OCo94c, Knu94a,

Knu94b, Nyb94, DGV94b, Nyb95, NK95, Mat96,

Nyb96], such as diﬀerential [BS93], linear [Mat94],

and related-key cryptanalysis [Bih94, KSW96,

KSW97]. This research has culminated in strong

cipher designs—CAST-128 [Ada97a] and MISTY

[Mat97] are probably the most noteworthy—as well

as some excellent cryptanalytic theory.

However, it is dangerous to rely solely on theory

when designing ciphers. Ciphers provably secure

against diﬀerential cryptanalysis have been attacked

with higher-order diﬀerentials [Lai94, Knu95b] or

the interpolation attack [JK97]: KN-cipher [NK95]

was attacked in [JK97, SMK98], Kiefer [Kie96] in

[JK97], and a version of CAST in [MSK98a]. The

CAST cipher cryptanalyzed in [MSK98a] is not

CAST-128, but it does illustrate that while the

CAST design procedure [AT93, HT94] can create

ciphers resistant to diﬀerential and linear cryptanal-

ysis, it does not create ciphers resistant to whatever

form of cryptanalysis comes next. SNAKE [LC97],

another cipher provably secure against diﬀerential

and linear cryptanalysis, was successfully broken us-

ing the interpolation attack [MSK98b]. When de-

signing a cipher, it is prudent to assume that new

attacks will be developed in order to break it.

We took a slightly diﬀerent approach in our design.

Instead of trying to optimize Twoﬁsh against known

attacks, we tried to make Twoﬁsh strong against

both known and unknown attacks. While it is im-

possible to optimize a cipher design for resisting

7

One AES submission, Serpent [BAK98], uses ideas from bitslice implementations to create a cipher that is very eﬃcient on

32-bit processors while sacriﬁcing performance on 8-bit microprocessors.

19

attacks that are unknown, conservative design and

over-engineering can instill some conﬁdence.

Many elements of Twoﬁsh reﬂect this philosophy.

We used well-studied design elements throughout

the algorithm. We started with a Feistel network,

probably the most studied block-cipher structure,

instead of something newer like an unbalanced Feis-

tel network [SK96, ZMI90] or a generalized Feistel

network [Nyb96].

We did not implement multiplication mod 2

16

+ 1

(as in IDEA or MMB [DGV93]) or data-dependent

rotations (as in RC5

8

or Akelarre [AGMP96]

9

) for

non-linearity. The most novel design elements we

used—MDS matrices and PHTs—are only intended

for diﬀusion (and are used in Square [DKR97] and

SAFER, respectively).

We used key-dependent S-boxes, because they of-

fer adequate protection against known statistical at-

tacks and are likely to oﬀer protection to any un-

known similar attacks. We deﬁned Twoﬁsh at 16

rounds, even though our analysis cannot break any-

where near that number. We added one-bit rota-

tions to prevent potential attacks that relied solely

on the byte structure. We designed a very thorough

key schedule to prevent related-key and weak-key

attacks.

6.3 Simple Design

A guiding design principle behind Twoﬁsh is that

the round function should be simple enough for us

to keep in our heads. Anecdotal evidence from al-

gorithms like FEAL [SM88], CAST, and Blowﬁsh

indicates that complicated round functions are not

always better than simple ones. Also, complicated

round functions are harder to analyze and rely on

more ad-hoc arguments for security (e.g., REDOC-

II [CW91]).

However, with enough rounds, even bad round func-

tions can be made to be secure.

10

Even a simple

round function like TEA’s [WN95] or RC5’s seems

secure after 32 rounds [BK98]. In Twoﬁsh, we tried

to create a simple round function and then iterate it

more than enough times for security.

6.3.1 Reusing Primitives

One of the ways to simplify a design is to reuse the

same primitives in multiple parts of a cipher. Cryp-

tographic design does not lend itself to the adage

of not putting all your eggs in one basket. Since

any particular “basket” has the potential of break-

ing the entire cipher, it makes more sense to use

as few baskets as possible—and to scrutinize those

baskets intensely.

To that end, we used essentially the same construc-

tion (8-by-8-bit key-dependent S-boxes consisting of

alternating ﬁxed permutations and subkey xors fol-

lowed by an MDS matrix followed by a PHT) in

both the key schedule and the round function. The

diﬀerences were in the key material used (the round

function’s g function uses a list of key-derived words

processed by an RS code; the key schedule’s h func-

tion uses individual key bytes directly) and the rota-

tions. The rotations represent a performance-driven

design tradeoﬀ: putting the additional rotations into

F would have unacceptably slowed down the cipher

performance on high-end machines. The use of the

RS code to derive the key material for g adds sub-

stantial resistance to related-key attacks.

While many algorithms reuse the encryption oper-

ation in their key schedule (e.g., Blowﬁsh, Panama

[DC98], RC4, CRISP [Lee96], YTH [YTH96]), and

several alternative DES key schedules reuse the DES

operation [Knu94b, BB96], we are unaware of any

that reuse the same primitives in exactly this man-

ner.

11

We feel that doing so greatly simpliﬁes the

analysis of Twoﬁsh, since the same kinds of analysis

can apply to the cipher in two diﬀerent ways.

6.3.2 Reversibility

While it is essential that any block cipher be re-

versible, so that ciphertext can be decrypted back

into plaintext, it is not necessary that the identi-

cal function be used for encryption and decryption.

Some block ciphers are reversible with changes only

in the key schedule (e.g., DES, IDEA, Blowﬁsh),

while others require diﬀerent algorithms for encryp-

tion and decryption (e.g., SAFER, Serpent, Square).

The Twoﬁsh encryption and decryption round func-

tions are slightly diﬀerent, but are built from the

8

RC5’s security is almost wholly based on data-dependent rotations. Although initial cryptanalysis was promising [KY95]

(see also [Sel98]), subsequent research [KM97, BK98] suggests that there is considerably more to learn about the security

properties of data-dependent rotations.

9

Akelarre was severely broken in [FS97, KR97].

10

Student cryptography projects bear this observation out. At 16 rounds, the typical student cipher fares rather badly against

a standard suite of statistical tests. At 32 rounds, it looks better. At 128 rounds, even the worst designs look very good.

11

The closest idea is an alternate DES key schedule that uses the DES round function, both the 32-bit block input and 48-bit

key input, to create round subkeys [Ada97b].

20

same blocks. That is, it is simple to build a hardware

or software module that does both encryption and

decryption without duplicating much functionality,

but the exact same module cannot both encrypt and

decrypt.

Note that having the cipher work essentially the

same way in both directions is a nice feature in

terms of analysis, since it lets analysts consider

chosen-plaintext and chosen-ciphertext attacks at

once, rather than considering them as separate at-

tacks with potentially radically diﬀerent levels of dif-

ﬁculty [Cop98].

6.4 S-boxes

The security of a cipher can be very sensitive to the

particulars of its S-boxes: size, number, values, us-

age. Ciphers invented before the public discovery of

diﬀerential cryptanalysis sometimes used arbitrary

sources for their S-box entries.

Randomly constructed known S-boxes are unlikely

to be secure. Khafre uses S-boxes taken from the

RAND tables [RAND55], and it is vulnerable to dif-

ferential cryptanalysis [BS92]. NewDES [Sco85],

12

with S-boxes derived from the Declaration of Inde-

pendence [Jeﬀ+76], could be made much stronger

with good S-boxes. DES variants with random ﬁxed

S-boxes are very likely to be weak [BS93, Mat95],

and CMEA was weakened extensively because of a

poor S-box choice [WSK97].

Some cipher designers responded to this threat by

carefully crafting S-boxes to resist known attacks—

DES [Cop94], s

n

DES [KPL93, Knu93c, KLPL95],

CAST [MA96, Ada97a]—while others relied on ran-

dom key-dependent S-boxes for security—Khufu,

Blowﬁsh, WAKE [Whe94].

13

The best existing at-

tack on Khufu breaks 16 rounds [GC94], while the

best attack on Blowﬁsh breaks only four [Rij97]. Ser-

pent [BAK98] reused the DES S-boxes.

GOST [GOST89] navigated a middle course: each

application has diﬀerent ﬁxed S-boxes, turning them

into an application-speciﬁc family key.

6.4.1 Large S-boxes

S-boxes vary in size, from GOST’s 4-by-4-bit S-

boxes to Tiger’s 8-by-64-bit S-boxes [AB96b]. Large

S-boxes are generally assumed to be more secure

than smaller ones—a view we share—but at the

price of increased storage requirements; DES’ eight

6-by-4-bit S-boxes require 256 bytes of storage, while

Blowﬁsh’s four 8-by-32-bit S-boxes require 4 kilo-

bytes. Certainly input size matters more than out-

put size; an 8-by-64-bit S-box can be stored in 2

kilobytes, while a 16-by-16-bit S-box requires 128

kilobytes. (Note that there is a limit to the advan-

tages of making S-boxes bigger. S-boxes with small

input size and very large output size tend to have

very good linear approximations; S-boxes with suﬃ-

ciently large outputs relative to input size are guar-

anteed to have at least one perfect linear approxi-

mation [Bih95].)

Twoﬁsh used the same solution as Square: mid-sized

S-boxes (8-by-8-bit) used to construct a large S-box

(8-by-32-bit).

6.4.2 Algorithmic S-boxes

S-boxes can either be large tables, like DES,

Khufu/Khafre, and YLCY [YLCY98], or derived

algebraically, like FEAL, LOKI-89/LOKI-91 (and

LOKI97 [Bro98]), IDEA, and SAFER. The advan-

tage of the former is that there is no mathematical

structure that can potentially be used for cryptanal-

ysis. The advantage of the latter is that the S-boxes

are more compact, and can be more easily imple-

mented in applications where the ROM or RAM for

large tables is not available.

Algebraic S-boxes can result in S-boxes that are

vulnerable to diﬀerential cryptanalysis: [Mur90]

against FEAL, and [Knu93a, Knu93b] against

LOKI. Higher-order diﬀerential cryptanalysis is es-

pecially powerful against algorithms with simple al-

gebraic S-boxes [Knu95b, JK97, SMK98]. Both tab-

ular and algebraic techniques, however, can be used

to generate S-boxes with given cryptographic prop-

erties, simply by testing the results of the generation

algorithm.

In Twoﬁsh we tried to do both: we chose to build our

8-by-8-bit S-boxes algorithmically out of random 4-

by-4-bit S-boxes. However, we chose the 4-by-4-bit

S-boxes randomly and then extensively tested the re-

sulting 8-by-8-bit S-boxes against the cryptographic

properties we required. This idea is similar to the

one used in CS-cipher [SV98].

12

Despite the algorithm name, NewDES is neither a DES variant nor a new algorithm based on DES.

13

The WAKE design has several variants [Cla97, Cla98]; neither the basic algorithm nor its variants have been extensively

cryptanalyzed.

21

6.4.3 Key-dependent S-boxes

S-boxes are either ﬁxed for all keys or key depen-

dent. It is our belief that ciphers with key-dependent

S-boxes are, in general, more secure than ﬁxed S-

boxes.

There are two diﬀerent philosophies regarding key-

dependent S-boxes. In some ciphers, the S-box is

constructed speciﬁcally to ensure that no two entries

are identical—Khufu and WAKE—while others sim-

ply create the S-box randomly and hope for the best:

REDOC II [CW91] and Blowﬁsh [Sch94]. The latter

results in a simpler key schedule, but may result in

weaknesses (e.g., a weakness in reduced-round vari-

ants of Blowﬁsh [Vau96a]). Another strategy is to

generate key-dependent S-boxes from a known se-

cure S-box and a series of strict mathematical rules:

e.g., Biham-DES [BB94].

Most key-dependent S-boxes are created by some

process completely orthogonal to the underlying ci-

pher. SEAL, for example, uses SHA [NIST93] to

create its key-dependent S-boxes. Blowﬁsh uses re-

peated iterations of itself. The results are S-boxes

that are eﬀectively random, but the cost is an enor-

mous performance penalty in key-setup time.

14

An

alternative is to build the S-boxes using fairly sim-

ple operations from the key. This results in a much

faster key setup, but unless the creation algorithm

is extensively cryptanalyzed together with the en-

cryption algorithm, unwanted synergies could lead

to attacks on the resulting cipher.

To avoid diﬀerential (as well as high-order diﬀer-

ential, linear, and related-key) attacks, we made

the small S-boxes key dependent. It is our belief

that while random key-dependent S-boxes can of-

fer acceptable security if used correctly, the bene-

ﬁts of a surjective S-box are worth the additional

complexities that constructing them entails. So, to

avoid attacks based on non-surjective round func-

tions [BB95, RP95b, RPD97, CWSK98], we made

the 8-by-8-bit S-boxes, as well as the 8-by-32-bit S-

boxes, bijective.

However, there is really no such thing as a key-

dependent S-box. Twoﬁsh uses a complex multi-

stage series of S-boxes and round subkeys that are

often precomputed as key-dependent S-boxes for ef-

ﬁciency purposes. (For example, see Figure 3 on

page 11.) We often used this conceptualization when

carrying out our own cryptanalysis against Twoﬁsh.

6.5 The Key Schedule

An algorithm’s key schedule is the mechanism that

distributes key material to the diﬀerent parts of the

cipher that need it, expanding the key material in

the process. This is necessary for three reasons:

• There are fewer key bits provided as input to

the cipher than are needed by the cipher.

• The key bits used in each round must be

unique to the round, in order to avoid “slide”

attacks [Wag95b].

• The cipher must be secure against an attacker

with partial knowledge or control over some

key bits.

When key schedules are poorly designed, they often

lead to strange properties of the cipher: large classes

of equivalent keys, self-inverse keys, etc. These prop-

erties can often aid an attacker in a real-world at-

tack. For example, the DES weak (self-inverse) keys

have been exploited in many attacks on larger cryp-

tographic mechanisms built from DES [Knu95a], and

the S-1 [Anon95] cipher was broken due to a bad

key-schedule design [Wag95a]. Even worse, they can

make attacks on the cipher easier, and some at-

tacks on the cipher will be focused directly at the

key schedule, such as related-key diﬀerential attacks

[KSW96, KSW97]. These attacks can be especially

devastating when the cipher is used in a hash func-

tion construction.

Key schedules can be divided into several broad cate-

gories [CDN98]. In some key schedules, knowledge of

a round subkey uniquely speciﬁes bits of other round

subkeys. In some ciphers the bits are just reused, as

in DES, IDEA, and LOKI, and in others some ma-

nipulation of the round subkeys is required to de-

termine the other round subkeys: e.g., CAST and

SAFER. Other key schedules are designed so that

knowledge of one round subkey does not directly

specify bits of other round subkeys. Either the round

subkey itself is used to generate the other round sub-

keys in some cryptographically secure manner, as in

RC5 and CS-Cipher, or a one-way function is used

to generate the sound subkeys (sometimes the block

cipher itself): e.g., Blowﬁsh, Serpent [BAK98], ICE

[Kwa97], and Shark.

Some simple design principles guided our develop-

ment of the key schedule for Twoﬁsh:

• Design the Key Schedule for the Ci-

pher. This is not simply a cryptographic

PRNG grafted onto the cipher; the Twoﬁsh

14

For example, setting up a single Blowﬁsh key takes as much time as encrypting 520 blocks, or 4160 bytes, of data.

22

key schedule is instead an integral part of the

whole cipher design.

• Reuse the Same Primitives. The Twoﬁsh

key schedule’s subkey generation mechanism,

h, is built from the same primitives as the

Twoﬁsh round function. This allowed us to

apply much of the same analysis to both the

round function and the subkey generation.

This also makes for a relatively simple picture

of the cipher and key schedule together. It is

reasonable to consider one round’s operations,

and the derivation of its subkeys, at the same

time.

• Use All Key Bytes the Same Way. All

key material goes through h (or g, which is

the same function). That is, the only way a

key bit can aﬀect the cipher is after it deﬁnes

a key-dependent S-box. This allows us to ana-

lyze the properties of the key schedule in terms

of the properties of the byte permutations.

• Make It Hard to Attack Both S-box and

Subkey Generation. The key material used

to derive the key-dependent S-boxes in g is de-

rived from the key using an RS code having

properties similar to those of the MDS matrix.

Deriving the key material in this way maxi-

mizes the diﬃculties of an attacker trying to

mount any kind of related-key attack on the

cipher, by giving him conﬂicting requirements

between controlling the S-box keys and con-

trolling the subkeys.

6.5.1 Performance Issues

For large messages, performance of the key schedule

is minor compared to performance of the encryption

and decryption functions. For smaller messages, key

setup can overwhelm encryption speed. In the de-

sign of Twoﬁsh, we tried to balance these two items.

Our performance criteria included:

• The key schedule must be precomputable for

maximal eﬃciency. This involves trying to

minimize the amount of storage required to

keep the precomputed key material.

• The key schedule must work “on the ﬂy,” de-

riving each block of subkey material as it is

needed, with as little required memory as pos-

sible.

• The key schedule must be reasonably eﬃcient

for hardware implementations.

• The key schedule must have minimal latency

for changing keys.

15

If performance were not an issue, it would make

sense to simply use a one-way hash function to ex-

pand the key into the subkeys and S-box entries, as

is done in Khufu, Blowﬁsh, and SEAL. However, the

AES eﬃciency requirements make such an approach

unacceptable. This led to a much simpler key sched-

ule with much more complicated analysis.

The key schedule design of some other ciphers has

led to various undesirable properties. These proper-

ties, such as the existence of equivalent keys; DES-

style weak, semi-weak, and quasi-weak keys; and

DES-style complementation properties, do not nec-

essarily make the cipher weak. However, they tend

to make it harder to use the cipher securely. With

our key schedule, we can make convincing arguments

that none of these properties exists.

7 The Design of Twoﬁsh

7.1 The Round Structure

Twoﬁsh was designed as a Feistel network, because

it is one of the most studied block cipher building

blocks. Additionally, use of a Feistel network means

that the F function need only be calculated in one

direction.

16

This means that we were able to use

operations in our F function that are ineﬃcient in

the other direction, and make do with tables and

constants for one direction only. Contrast this with

an SP-network, which must execute its encryption

function in both the forward and backward direc-

tions.

15

In its comments on the AES criteria, the NSA suggested that “a goal should be that two blocks could be enciphered with

diﬀerent keys in virtually the same time as two blocks could be enciphered with the same key” [McD97]. The cynical reader

would immediately conclude that the NSA is concerned with the eﬃciency of their brute-force keysearch machines. However,

there are implementations where key agility is a valid concern. Key-stretching techniques can always be used to frustrate

brute-force attacks [QDD86, KSHW98]. A better defense, of course, is to always use keys too large to make a brute-force search

practicable, and to generate them randomly.

16

In fact, the F function can be non-surjective, as it is in DES or Blowﬁsh.

23

7.2 The Key-dependent S-boxes

A fundamental component of Twoﬁsh is the set of

four key-dependent S-boxes. These must have sev-

eral properties:

• The four diﬀerent S-boxes need to actually be

diﬀerent, in terms of best diﬀerential and lin-

ear characteristics and other kinds of analysis.

• Few or no keys may cause the S-boxes used

to be “weak,” in the sense of having high-

probability diﬀerential or linear characteris-

tics, or in the sense of having a very simple

algebraic representation.

• There should be few or no pairs of keys that

deﬁne the same S-boxes. That is, changing

even one bit of the key used to deﬁne an S-

box should always lead to a diﬀerent S-box.

In fact, these pairs of keys should lead to ex-

tremely diﬀerent S-boxes.

7.2.1 The Fixed Permutations q

0

and q

1

The construction method for building q

0

and q

1

from

4-bit permutations (speciﬁed in Section 4.3.5) was

chosen because it decreases hardware and memory

costs for some implementations, as discussed pre-

viously, without adding any apparent weaknesses to

the cipher. It is helpful to recall that these individual

ﬁxed-byte permutations are used only to construct

the key-dependent S-boxes, which, in turn, are used

only within the h and g functions. In particular, the

individual characteristics of q

0

and q

1

are not terri-

bly relevant (except perhaps in some related-key at-

tacks), because Twoﬁsh always uses at least three of

these permutations in series, with at least two xors

with key material bytes. Consideration was initially

given to using random full 8-bit permutations for

q

0

and q

1

, as well as algebraically derived permuta-

tions (e.g., multiplicative inverses over GF(2

8

)) that

have slightly better individual permutation charac-

teristics, but no substantial improvement was found

when composite keyed S-boxes were constructed and

compared to the q

0

and q

1

used in Twoﬁsh.

The q

0

and q

1

permutations were chosen by random

search, constructing random 4-bit permutations t

0

,

t

1

, t

2

, and t

3

for each. Using the notation of Matsui

[Mat96], we deﬁne

DP

max

(q) = max

a=0,b

Pr

X

[q(X ⊕a) ⊕q(X) = b]

and

LP

max

(q) = max

a,b=0

_

2 Pr

X

[X · a = q(X) · b] −1

_

2

where q is the mapping DP

max

and LP

max

are be-

ing computed for, the probabilities are taken over

a uniformly distributed X, and the operator · com-

putes the overall parity of the bitwise-and of its two

operands. Only ﬁxed permutations with DP

max

≤

10/256, LP

max

≤ 1/16, and fewer than three ﬁxed

points were accepted as potential candidates. These

criteria alone rejected over 99.8 percent of all ran-

domly chosen permutations of the given construc-

tion. Pairs of permutations meeting these criteria

were then evaluated as potential (q

0

, q

1

) pairs, com-

puting various metrics when combined with key ma-

terial into Twoﬁsh’s S-box structure, as described

below.

The actual q

0

and q

1

chosen were one of several

pairs with virtually identical statistics that were

found with only a few tens of hours of searching

on Pentium class computers. Each permutation has

DP

max

= 10/256 and LP

max

= 1/16; q

0

has one

ﬁxed point, while q

1

has two ﬁxed points.

7.2.2 The S-boxes

Each S-box is deﬁned with two, three, or four bytes

of key material, depending on the Twoﬁsh key size.

This is done as follows for 128-bit Twoﬁsh keys:

s

0

(x) = q

1

[q

0

[q

0

[x] ⊕s

0,0

] ⊕s

1,0

]

s

1

(x) = q

0

[q

0

[q

1

[x] ⊕s

0,1

] ⊕s

1,1

]

s

2

(x) = q

1

[q

1

[q

0

[x] ⊕s

0,2

] ⊕s

1,2

]

s

3

(x) = q

0

[q

1

[q

1

[x] ⊕s

0,3

] ⊕s

1,3

]

where the s

i,j

are the bytes derived from the key

bytes using the RS matrix. Note that with equal

key bytes, no pair of these S-boxes is equal. When

all s

i,j

= 0, s

0

(x) = q

1

[s

1

(q

−1

1

[x])]. There are other

similar relationships between S-boxes. We have not

been able to ﬁnd any weaknesses resulting from this,

so long as q

0

and q

1

have no high-probability diﬀer-

ential characteristics.

In some sense this construction is similar to a rotor

machine citeDK85, with two diﬀerent types of rotor

(q

0

and q

1

). The ﬁrst rotor is ﬁxed, and the ﬁxed

oﬀset between two rotors speciﬁed by a key byte.

We did not ﬁnd any useful cryptanalysis from this

parallel, but someone else might.

For the 128-bit key, we have experimentally veriﬁed

that each N/8-bit key used to deﬁne a byte permuta-

tion results in a distinct permutation. For example,

in the case of a 128-bit key, the S-box s

0

uses 16 bits

of key material. Each of the 2

16

s

0

permutations

24

deﬁned is distinct, as is also the case for s

1

, s

2

, and

s

3

. We have not yet exhaustively tested longer key

length, but we conjecture that all S-boxes generated

by our construction are distinct. We also conjecture

that this would be the case for almost all choices of

q

0

and q

1

meeting the basic criteria discussed above.

7.2.3 Exhaustive and Statistical Analysis

Given the ﬁxed permutations q

0

and q

1

, and the def-

initions for how to construct s

0

, s

1

, s

2

, and s

3

from

them, we have performed extensive testing of the

characteristics of these key-dependent S-boxes. In

the 128-bit key case, all testing has been performed

exhaustively, which is feasible because each S-box

uses only 16 bits of key material. In many cases,

however, only statistical (i.e., Monte Carlo) testing

has been possible for the larger key sizes. In this

section, we present and discuss these results. It is

our hope to complete exhaustive testing over time

for the larger key sizes where feasible, but the prob-

ability distributions obtained for the statistical tests

give us a fairly high degree of conﬁdence that no

surprises are in store.

Table 4 shows the DP

max

distribution for the var-

ious key sizes. A detailed explanation of the for-

mat of this table will also help in understanding the

other tables in this section. An asterisk (

∗

) next to

a key size in the tables discussed in this section in-

dicates that the entries in that row are a statistical

distribution using Monte Carlo sampling of key bits,

not an exhaustive test. For example, in Table 4,

only the 256-bit key row uses statistical sampling;

the 128-bit and 192-bit cases involve an exhaustive

test. Clearly, the maximum value from a statistical

sample is not a guaranteed maximum. Note that,

for 128-bit keys, each S-box has a DP

max

value no

larger than 18/256. The remaining columns give the

distribution of observed DP

max

values. Each entry

is expressed as the negative base-2 logarithm of the

fraction of S-boxes with the given value (or value

range), with blank entries indicating that no value

in the range was found. For example, in the N = 128

case only 1 of every 2

12.0

key-dependent S-boxes has

DP

max

= 18/256, while over half (1 in 2

0.9

) have

DP

max

= 12/256. These statistics are taken over

all four S-boxes (s

0

, s

1

, s

2

, s

3

), so a total of 4 ×2

16

(i.e., 256K) S-boxes were evaluated for the 128-bit

key case. Each Monte Carlo sampling involves at

least 2

16

S-boxes, but in many cases the number is

considerably larger.

Table 5 shows the distribution of LP

max

for the

various key sizes. Observe that the vast majority

of all Twoﬁsh S-boxes have LP

max

< (88/256)

2

,

although there is a small fraction of S-boxes with

larger values. For 128-bit keys, no Twoﬁsh S-box

has an LP

max

value greater than (100/256)

2

, while

the maximum value is somewhat higher for larger

key sizes. Monte Carlo statistics are given for the

larger two key sizes, since the computational load

for computing LP

max

is roughly a factor of ﬁfteen

higher than for DP

max

.

The distribution of the number of permutation ﬁxed

points for various key sizes is given in Table 6. A

ﬁxed point of an S-box s

i

is a value x for which

x = s

i

(x). This metric does not necessarily have

any direct cryptographic signﬁcance. However, it is

a useful way to verify that the S-boxes are behaving

similarly to random S-boxes, since it is possible to

compute the theoretical distribution of ﬁxed points

for random S-boxes. The probabilities for random

S-boxes is given in the last row. (The probabil-

ity of n ﬁxed points is approximately e

−1

/n!.) The

Twoﬁsh S-box distributions of ﬁxed points over all

keys match theory fairly well.

The metrics discussed so far give us a fair level of

conﬁdence that the Twoﬁsh method of constructing

key-dependent S-boxes does a reasonable job of ap-

proximating a random set of S-boxes, when viewed

individually. Another possible concern is how dif-

ferent the various Twoﬁsh S-boxes are from each

other, for a given key length. This issue is of par-

ticular interest in dealing with related-key attacks,

but it also has an important bearing on the ability

of an attacker to model the S-boxes. Despite the

fact that the Twoﬁsh S-boxes are key-dependent, if

the entire set (or large subsets) of S-boxes, taken as

a black box, are very closely related, the beneﬁt of

key-dependency is severely weakened. As a patho-

logical example, consider a ﬁxed 8-bit permutation

q[x] which has very good DP

max

and LP

max

val-

ues, but which is used as a key-dependent family

of S-boxes simply by deﬁning s

k

(x) = q[x] ⊕ k. It

is true that each s

k

permutation also has good in-

dividual metrics, but the class of s permutations is

so closely related that conventional diﬀerential and

linear cryptanalysis techniques can probably be ef-

fectively applied without knowing the key k. The

Twoﬁsh S-box structure has been carefully designed

with this issue in mind.

In one class of related-key attacks, an attacker at-

tempts to modify the key bytes in such a way as to

minimize the diﬀerences in the round subkey values

A

i

, B

i

. Since the Twoﬁsh S-box structure is used

in computing A

i

and B

i

, with M

e

and M

o

as key

material, respectively, a measure of the diﬀerential

25

−log

2

(Pr(DP

max

= x/256))

Key Size Max Value x = 8 x = 10 x = 12 x = 14 x = 16 x = 18 x = 20 x = 22 x = 24

128 bits 18/256 15.4 1.3 0.9 4.1 8.0 12.0

192 bits 24/256 15.2 1.3 0.9 4.1 8.0 12.2 16.5 20.8 25.0

256 bits

∗

22/256 15.1 1.3 0.9 4.1 8.0 12.2 16.7 22.0

Table 4: DP

max

over all keys

−log

2

(Pr(LP

max

= (x/256)

2

))

Key Size Max Value x = 56..63 64..71 72..79 80..87 88..95 96..103 104..111

128 bits (100/256)

2

9.3 1.0 1.2 4.2 8.0 12.4

192 bits

∗

(104/256)

2

9.3 1.0 1.2 4.2 8.2 12.2 17.0

256 bits

∗

(108/256)

2

9.4 1.0 1.2 4.2 8.1 12.5 17.4

Table 5: LP

max

over all keys.

−log

2

(Pr(# ﬁxed points = x))

Key Size Max Value x = 0 x = 1 x = 2 x = 3 x = 4 x = 5 x = 6 x = 7 x = 8 x = 9 x = 10

128 bits 8 1.4 1.4 2.4 4.1 6.0 8.2 11.1 14.1 17.0

192 bits 10 1.4 1.4 2.4 4.0 6.0 8.4 10.9 13.8 16.8 19.8 23.4

256 bits

∗

10 1.4 1.4 2.4 4.0 6.0 8.4 10.9 13.7 16.8 21.0 21.0

random 1.4 1.4 2.4 4.0 6.0 8.3 10.9 13.7 16.7 19.9 23.2

Table 6: Number of ﬁxed points over all keys

characteristics of A

i

(or B

i

) across keys will help us

understand both how diﬀerent the S-boxes are from

each other and how likely such an attack is to suc-

ceed.

To this end, let us ﬁrst look at how many consec-

utive values of A

i

with a ﬁxed xor diﬀerence can

be generated for two diﬀerent keys. Without loss of

generality, we consider only A

i

, and in particular we

consider a change in only the key material M

e

that

aﬀects one of the four S-boxes. Let y

i

be the out-

put sequence for one S-box used in generating the

sequence A

i

, and the sequence for another key be

y

i

. Consider the diﬀerence sequence y

∗

i

= y

i

⊕ y

i

.

For example, in the 128-bit key case, with 16 bits of

key material per S-box, there are about 2

31

pairs of

distinct keys for each S-box, so there would be 2

31

such diﬀerence sequences, each of length 20. What

is the probability of having a “run” of n consecu-

tive equal values in the sequence? If n can be made

to approach 20, then a related-key attack might be

able to control the entire sequence of A

i

values, and,

even worse, our belief in the independence of the

key-dependent S-boxes must be called seriously into

question. Note that an attacker has exactly 16 bits

of freedom for a single S-box in the 128-bit key case,

so intuitively it seems unlikely that he should be able

to force a given diﬀerence sequence that is very long.

Table 7 shows the distribution of run lengths of

the same xor diﬀerence y

∗

i

for consecutive i. For

random byte sequences, we would theoretically ex-

pect that Pr(xor run length = n) should be roughly

2

−8(n−1)

, which matches quite nicely with the be-

havior observed in the table. It can be seen that

the probability of injecting a constant diﬀerence

into more than ﬁve consecutive subkey entries is ex-

tremely low, which is reassuring.

Table 8 shows the results of measuring this same

property in a diﬀerent way. Instead of requiring a

run of n consecutive identical diﬀerences, this metric

is similar to a “mini”-DP

max

, computed over only

the 20 input values used in the subkey generation

(e.g., 0, 2, 4, . . . , 38). The quantity measured is the

maximum number of equal diﬀerences out of the 20

values generated, across key pairs. In other words,

while it may be diﬃcult to generate a large run, is

it possible to generate equal values in a large frac-

tion of the set of diﬀerences? The results are again

very encouraging, showing that it is extremely diﬃ-

cult to force more than ﬁve or six diﬀerences to be

identical. This also shows that it is not possible to

inﬂuence only a few A

i

, as that would require many

zero diﬀerences.

26

−log

2

(Pr(xor run length = n))

Key Size Max Value n = 1 n = 2 n = 3 n = 4 n = 5

128 bits 5 0.01 7.98 15.95 23.89 30.00

192 bits

∗

4 0.01 7.98 15.93 23.69

256 bits

∗

5 0.01 7.98 15.93 24.02 30.29

Table 7: Subkey xor diﬀerence run lengths

−log

2

(Pr(max # equal xor diﬀerences = n))

Key Size Max Value x = 1 x = 2 x = 3 x = 4 x = 4 x = 6 x = 7

128 bits 7 1.1 0.9 5.9 11.6 17.6 23.9 31.0

192 bits

∗

7 1.1 0.9 5.9 11.7 17.8 23.9 29.0

256 bits

∗

7 1.1 0.9 5.9 11.7 17.7 23.8 29.3

Table 8: “Mini”-DP

max

subkey distribution

For every metric discussed here, a similar distribu-

tion using randomly generated 8-bit permutations

has been generated for purposes of comparison. For

example, DP

max

was computed for a set of 2

16

randomly generated permutations, and the result-

ing distribution of DP

max

values compared to that

of the Twoﬁsh key-dependent S-boxes. For each

metric, the probability distributions looked virtu-

ally identical to those obtained for the Twoﬁsh set

of key-dependent S-boxes, except for small ﬂuctua-

tions on the tail ends of the distribution, as should

be expected. This similarity is comforting, as is the

fact that the probability distributions for each met-

ric look quite similar across key sizes. These results

help conﬁrm our belief that, from a statistical stand-

point, the Twoﬁsh S-box sets behave largely like a

randomly chosen set of permutations.

7.3 MDS Matrix

The four bytes output from the four S-boxes are mul-

tiplied by a 4-by-4 MDS matrix over GF(2

8

). This

matrix multiply is the principal diﬀusion mechanism

in Twoﬁsh. The MDS property here guarantees that

the number of changed input bytes plus the number

of changed output bytes is at least ﬁve. In other

words, any change in a single input byte is guaran-

teed to change all four output bytes, any change in

any two input bytes is guaranteed to change at least

three output bytes, etc. More than 2

127

such MDS

matrices exist, but the Twoﬁsh MDS matrix is also

carefully chosen with the property of preserving the

number of bytes changed even after the rotation in

the round function.

The MDS matrix used in Twoﬁsh has ﬁxed coeﬃ-

cients. Initially, some thought was given to mak-

ing the matrix itself key-dependent, but such a

scheme would require the veriﬁcation that the key-

dependent values in fact formed an MDS matrix,

adding a non-trivial computational load to the key

selection and key scheduling process. However, it

should be noted that there are many acceptable

MDS matrices, even with the extended properties

discussed below.

For software implementation on a modern micropro-

cessor, the MDS matrix multiply is normally imple-

mented using four lookup tables, each consisting of

256 32-bit words, so the particular coeﬃcients used

in the matrix do not aﬀect performance. However,

for smart cards and in hardware, “simple” coeﬃ-

cients, as in Square [DKR97], can make implemen-

tations cheaper and faster. Unlike the MDS matrix

used in Square, Twoﬁsh does not use the inverse ma-

trix for decryption because of its Feistel structure,

nor is there a requirement that the matrix be a cir-

culant matrix.

However, because of the rotation applied after the

xor within the round function, it is desirable to se-

lect the MDS matrix carefully to preserve the dif-

fusion properties even after the rotation. For both

encryption and decryption, a right rotation by one

bit occurs after the xor. This direction of rotation

is chosen to preserve the MDS property with respect

to the PHT addition a+2b, since the rotation undoes

the shift applied to b as part of the PHT. It is true

that the most signiﬁcant bit of b is still “lost” in this

half of the PHT, but the MDS properties for three

of the bytes are still fully guaranteed with respect

to b, and they are met with probability 254/255 for

the fourth byte.

27

The eﬀect of the rotation on the unshifted PHT ad-

ditions also needs to be addressed. A single byte

input change to the MDS matrix will change all four

output bytes, which aﬀect the round function out-

put, but after rotation there is no such guarantee. If

a byte value diﬀerence of 1 is one output from the

matrix multiply, performing a 32-bit rotate on the

result will shift in one bit from the next byte in the

32-bit word and shift out the only non-zero bit. The

MDS matrix coeﬃcients in Twoﬁsh are carefully se-

lected so that, if a byte diﬀerence 1 is output from

the matrix multiply with a single byte input change,

the next highest byte is guaranteed to have its least

signiﬁcant bit changed too. Thus, if the rotation

shifts out the only original ﬂipped bit, it will also

shift in a ﬂipped bit from the next byte.

The construction used to preserve this property af-

ter rotation is actually very simple. The idea is to

choose a small set of non-zero elements of GF(2

8

)

with the property that, for each pair x, y in the set,

if x∗a = 1, then y ∗a (i.e., y/x) has the least signiﬁ-

cant bit set. Observe that this property is reﬂexive;

i.e., if x = y, then y/x = 1, so the property holds. It

is intuitively obvious (and empirically veriﬁed) that

the probability that two random ﬁeld elements sat-

isfy this property is roughly one half, so it should

not be diﬃcult to ﬁnd such sets of elements. Also,

since over 75% of all 4-by-4 matrices over GF(2

8

)

are MDS, the hope of ﬁnding such a matrix sounds

reasonable.

A computer search was executed over all primitive

polynomials of degree eight, looking for sets of three

“simple” elements x,y,z with the above property.

Simple in this context means that x ∗ a for arbi-

trary a can be easily computed using at most a few

shifts and xors. Several dozen sets of suitable val-

ues were found, each of which allowed several MDS

matrixes with the three values. The primitive poly-

nomial v(x) = x

8

+ x

6

+ x

5

+ x

3

+ x

0

was selected,

together with the ﬁeld elements 1, EF, and 5B (us-

ing hexadecial notation and the ﬁeld element to byte

value correspondence of Section 4). The element EF

is actually β

−2

+β

−1

+1, where β is a root of v(x),

and 5B is β

−2

+ 1, so multiplication by these ele-

ments consists of two LFSR right shifts mod v(x),

plus a few byte xors.

Another constraint imposed on the Twoﬁsh MDS

matrix is that no row (or column) of the matrix be

a rotation of another row (or column) of the matrix.

This property guarantees that all single-byte input

diﬀerences result in unique output diﬀerences, even

when rotations over 8 bits are applied to the output,

as is done in generating the round subkeys. This con-

straint did not seem to limit the pool of candidate

matrices signiﬁcantly. In fact, the Twoﬁsh MDS ma-

trix actually exhibits a much stronger property: all

1020 MDS output diﬀerences for single-byte input

changes are distinct from all others, even under bit-

wise rotations by each of the rotation values in the

range 6..26. The subkey generation routine takes

advantage of this property to help thwart related-

key diﬀerential attacks. For all single byte input

changes, the rotation of the output 32-bit word B

by eight bits has an output diﬀerence that is guar-

anteed to be unique from output diﬀerences in the

unrotated A quantity.

There are many MDS matrices containing only

the three elements 1, EF, and 5B. The particular

Twoﬁsh matrix was also chosen to maximize the

minimum binary Hamming weight of the output dif-

ferences over all single-byte input diﬀerences. This

Twoﬁsh MDS matrix guarantees that any single-

byte input change will produce an output Hamming

diﬀerence of at least eight bits, in addition to the

property that all four output bytes will be aﬀected.

In fact, as shown in the table below, only 7 of the

1020 possible single-byte input diﬀerences result in

output diﬀerences with Hamming weight diﬀerence

eight; the remaining 1013 diﬀerences result in higher

Hamming weights. Also, only one of the seven out-

puts with Hamming weight 8 has its most signiﬁcant

bit set, meaning that at least eight bits will be af-

fected even in the PHT term T

0

+ 2T

1

for 1019 of

the 1020 possible single-byte input diﬀerences. In-

put diﬀerences in two bytes are guaranteed to aﬀect

three output bytes, and they can result in an output

Hamming diﬀerence of only three bits, with proba-

bility 0.000018; the probability that the output dif-

ference Hamming weight is less than eight bits for

two byte input diﬀerences is only 0.00125, which is

less than the corresponding probability (0.0035) for

a totally random binomial distribution.

Hamming Number of Number with

Weight occurrences MSB Set

8 7/1020 1

9 23/1020 4

10 42/1020 15

There are many other MDS matrices, using either

the same or another set of simple ﬁeld elements, that

can guarantee the same minimum output Hamming

diﬀerence, and the particular matrix chosen is repre-

sentative of the class. No higher minimum Hamming

weight was found for any matrices with such simple

elements.

28

It is fairly obvious that this matrix multiply can be

implemented at high speed with minimal cost, both

in hardware and in ﬁrmware on a smart card.

It should also be noted that a very diﬀerent type of

construction could be used to preserve MDS prop-

erties on rotation. Use of an 8-by-8 MDS matrix

over GF(2

4

) will guarantee eight output 4-bit nib-

ble changes for every input nibble change. Because

the changes now are nibble-based, a one-bit rotation

may shift the only non-zero bit of a nibble out of

a byte, but the other nibble remains entirely con-

tained in the byte. In fact, it can easily be seen that

this construction preserves the MDS property nicely

even for multi-bit rotations. Unfortunately, 8-by-8

MDS matrices over GF(2

4

) are nowhere nearly as

plentiful as 4-by-4 matrices over GF(2

8

), so very lit-

tle freedom is available to pick simple coeﬃcients.

The best construction for such a matrix seems to

be an extended RS-(15,7) code over GF(2

4

), which

requires considerably more gates in hardware and

more tables in smart card ﬁrmware than the Twoﬁsh

matrix. Because of this additional cost, we decided

not to use an 8-by-8 MDS matrix over GF(2

4

).

The actual transformation deﬁned by the MDS ma-

trix is purely linear over GF(2). That is, each of the

output bits is the xor of a subset of the input bits.

7.4 PHT

The PHT operation, including the addition of the

round subkeys, was chosen to facilitate very fast op-

eration on the Pentium CPU family using the LEA

(load eﬀective address) opcodes. The LEA opcodes

allow the addition of one register to a shifted (by

1,2,4,8) version of another register, along with a 32-

bit constant, all in a single clock cycle, with the re-

sult placed in an arbitrary Pentium register. For

best performance, a version of the encryption and

decryption code can be “compiled” for a given key,

with the round subkeys inserted as constant values

in LEA opcodes in the instruction stream. This ap-

proach requires a full instantiation in memory of the

code for each key in use, but it provides a speedup

for bulk encryption.

Instead of using 4 key-dependent S-boxes, a 4-by-4

MDS matrix, and the PHT, we could have used 8

key-dependent S-boxes and an 8-by-8 MDS matrix

over GF(2

8

). Such a construction would be easier

to analyse and would have nicer properties, but it

is much slower in virtually all implementations and

would not be worth it.

7.5 Key Addition

As noted in the previous section, the round subkeys

are combined with the PHT output via addition to

enable optimal performance on the Pentium CPU

family. From a cryptographic standpoint, an xor

operation could have been used, but it would reduce

the best Pentium software performance for bulk en-

cryption. It should be noted that using addition

instead of xor does impose a minor gate count and

speed penalty in hardware, but this additional over-

head was considered to be well worth the extra per-

formance in software. On a smart card, using addi-

tion instead of xor has virtually no impact on code

size or speed.

7.6 Feistel Combining Operation

Twoﬁsh uses xor to combine the output of F with

the target block. This is done primarily for sim-

plicity; xor is the most eﬃcient operation in both

hardware and software. We chose not to use addi-

tion (used in MD4 [Riv91], MD5 [Riv92], RIPEMD-

160 [DBP96] and SHA [NIST93]), or a more compli-

cated combining function like Latin squares (used in

DESV [CDN95]). We did not implement dynamic

swapping [KKT94] or any additional complexity.

7.7 Use of Diﬀerent Groups

By design, the general ordering of operations in

Twoﬁsh alternates as follows: 8-by-8 S-box, MDS

matrix, addition, and xor. The underlying alge-

braic operations thus alternate between non-linear

table lookup, a GF(2)-linear combination of the bits

by the MDS matrix, integer addition mod 2

32

, and

GF(2) addition (xor). Within the S-boxes, several

levels of alternating xor and 8-by-8 permutations

are applied. The goal of this ordering is to help de-

stroy any hope of using a single algebraic structure

as the basis of an attack. No two consecutive oper-

ations use the same structure, except for the PHT

and the key addition that are designed to be merged

for faster implementations.

7.8 Diﬀusion in the Round Function

There are two major mechanisms for providing diﬀu-

sion in the round function. The ﬁrst is the MDS ma-

trix multiply, which ensures that each output byte

depends on all input bytes. The two outputs of the

g functions (T

0

and T

1

) are then combined using a

PHT so that both of them will aﬀect both 32-bit

29

Feistel xor quantities. The half of the PHT involv-

ing the quantity T

0

+ 2T

1

will lose the most signif-

icant bit of T

1

due to the multiply by two. This

bit could be regained using some extra operations,

but the software performance would be signiﬁcantly

decreased, with very little apparent cryptographic

beneﬁt. In general, the most signiﬁcant byte of this

PHT output will still have a non-zero output diﬀer-

ence with probability 254/255 over all 1-byte input

diﬀerences.

We cannot guarantee that a single byte input change

to the F function will change 7 or 8 of the output

bytes of F. The reason is that the carries in the

addition of the PHT can remove certain byte diﬀer-

ences. For example, an addition with constant might

turn a diﬀerence of 00000180

16

into 00000080

16

. The

chances of this happening depend on the distance

between the two bits that inﬂuence each other. A

large reduction in the number of changed bytes is

very unlikely.

7.9 One-bit Rotation

Within each round, both of the 32-bit words that

are xored with the round function results are also

rotated by a single bit. One word is rotated before

the xor, and one after the xor. This structure pro-

vides symmetry for decryption in the sense that the

same software pipeline structure can be applied in

either direction. By rotating a single bit per round,

each 32-bit quantity is used as an input to the round

function once in each of the eight possible bit align-

ments within the byte.

These rotations in the Twoﬁsh round functions were

included speciﬁcally to help break up the byte-

aligned nature of the S-box and MDS matrix oper-

ations, which we feared might otherwise permit at-

tacks using statistics based on strict byte alignment.

For example, Square uses an 8-by-8-bit permutation

and an MDS matrix in a fashion fairly similar to

Twoﬁsh, but without any rotations. An early draft

of the Square paper proposed a very simple and pow-

erful attack, based solely on such byte statistics, that

forced the authors to increase the number of rounds

from six to eight. Also, an attack against SAFER

is based on the cipher’s reliance on a byte structure

[Knu95c].

Choosing a rotation by an odd number of bits en-

sures that each of the four 32-bit words are used as

input to the g function in each of the eight possible

bit positions within a byte. Rotating by only one bit

position helps optimize performance on the Pentium

CPU (which, unlike the Pentium Pro, has a one-

clock penalty for multi-bit rotations) and on smart

card CPUs (which generally do not have multi-bit

rotate opcodes). Limiting rotations to single-bit also

helps minimize hardware costs, since the wiring over-

head of ﬁxed multi-bit rotations is not negligible.

There are three downsides to the rotations in

Twoﬁsh. First, there is a minor performance im-

pact (less than 7% on the Pentium) in software due

to the extra rotate opcodes. Second, the rotations

make the cipher non-symmetric in that the encryp-

tion and decryption algorithms are slightly diﬀer-

ent, thus requiring distinct code for encryption and

decryption. It is only the rotations that separate

Twoﬁsh from having a “reversible” Feistel structure.

Third, the rotates make it harder to analyze the ci-

pher for security against diﬀerential and linear at-

tacks. In particular, they make the simple technique

of merely counting active S-boxes quite a bit more

complicated. On the other hand, it is much harder

for the attacker to analyze the cipher. For instance,

it is much harder to ﬁnd iterative characteristics,

since the bits do not line up. The rotates also make

it harder to use the same high-probability charac-

teristic several times as the bits get rotated out of

place. On the whole, the advantages were considered

to outweigh the disadvantages.

It is possible to convert Twoﬁsh to a pure Feistel

structure by incorporating round-dependent rota-

tion amounts in F, and adding some ﬁxed rotations

just before the output whitening. This might be a

useful view of the cipher for analysis purposes, but

we do not expect any implementation to use such a

structure.

The rotations were also very carefully selected to

work together with the PHT and the MDS matrix

to preserve the MDS diﬀerence properties for single

byte input diﬀerences to g. In particular, for both

encryption and decryption, the one-bit right rotation

occurs after the Feistel xor with the PHT output.

The MDS matrix was chosen to guarantee that a 32-

bit word rotate right by one bit will preserve the fact

that all four bytes of the 32-bit word are changed

for all single input byte diﬀerences. Thus, placing

the right rotation after the xor preserves this prop-

erty. However, during decryption, the rotate right

is done after the xor with the Feistel quantity in-

volving T

0

+2T

1

. Note that, in this case, the rotate

right puts the 2T

1

quantity back on its byte bound-

ary, except that the most signiﬁcant bit has been

lost. Therefore, given a single input byte diﬀerence

that aﬀects only T

1

, the least signiﬁcant three bytes

of the Feistel xor output are guaranteed to change

30

after the rotation, and the most signiﬁcant byte will

change with probability 254/255.

The fact that the rotate left by one bit occurs be-

fore the Feistel xors during encryption guarantees

that the same relative ordering (i.e., rotate right af-

ter xor) occurs during decryption, preserving the

diﬀerence-property for both directions. Also, per-

forming one rotation before and one after the Feistel

xor imposes a symmetry between encryption and

decryption that helps guarantee very similar soft-

ware performance for both operations on the Pen-

tium CPU family, which has only one ALU pipeline

capable of performing rotations.

7.10 The Number of Rounds

Sixteen rounds corresponds to 8 cycles, which seems

to be the norm for block ciphers. DES, IDEA, and

Skipjack all have 8 cycles. Twoﬁsh was deﬁned to

have 16 rounds primarily out of pessimism. Al-

though our best non-related-key attack only breaks 5

rounds of the cipher, we cannot be sure that undis-

covered cryptanalysis techniques do not exist that

can do better. Hence, we consider 16 rounds to be

a good balance between our natural skepticism and

our desire to optimize performance.

Even so, we took pains to ensure that the Twoﬁsh

key schedule works with a variable number of

rounds. It is easy to deﬁne Twoﬁsh variants with

more or fewer rounds.

7.11 The Key Schedule

To understand the design of the key schedule, it is

necessary to consider how key material is used in

Twoﬁsh:

The Whitening Subkeys 128 bits of key mate-

rial are xored into the plaintext block before en-

cryption, and another 128 bits after encryption.

Since the rest of the encryption is a permutation,

this can be seen as selecting among as many as 2

256

diﬀerent (but closely related) 128-bit to 128-bit per-

mutations. This key material has the eﬀect of mak-

ing many cryptanalytic attacks a little more diﬃcult,

at a very low cost. Note that nearly all the added

strength against cryptanalytic attack is added by the

xor of subkeys into the input to the ﬁrst and last

rounds’ F functions.

The Round Subkeys 64 bits of key material are

combined into the output of each round’s F function

using addition modulo 2

32

. The F function with-

out the round subkey addition is a permutation on

64-bit values; the round subkey selects among one

of 2

64

closely related permutations in each round.

These subkeys must be slightly diﬀerent per round,

to prevent a slide attack, as will be discussed below.

The Key-dependent S-boxes To create the S-

boxes, the key is mapped down to a block of data half

its size, and that block of data is used to specify the

S-boxes for the whole cipher. As discussed before,

the key-dependent S-boxes are derived by alternat-

ing ﬁxed S-box lookups with xors of key material.

7.11.1 Equivalence of Subkeys

In this section, we discuss whether diﬀerent se-

quences of subkeys can give equivalent encryption

functions. Recall that F

**is the F function without
**

the addition of the round subkeys. For this analysis

we keep the function F

**constant. That is, we only
**

vary the subkeys K

i

and not S. The properties of

a Feistel cipher ensure that no pair of 2-round sub-

key sequences can be equivalent for all inputs. It

is natural to ask next whether any pairs of three se-

quential rounds’ subkeys can exist that cause exactly

the same encryption.

For a pair of subkey sequences, (k

0

, k

1

, k

2

) and

(k

∗

0

, k

∗

1

, k

∗

2

), to be equivalent in their eﬀects, every

input block (L

0

, R

0

) must encrypt to the same out-

put block (L

1

, R

2

) for both sequences of subkeys.

Note that k

0

= k

∗

0

and k

2

= k

∗

2

; as we would other-

wise have two sequences of 2-round keys that would

deﬁne the same 2-round encryption function. We

have the following equalities

R

1

= R

0

⊕(k

0

+F

(L

0

))

R

∗

1

= R

0

⊕(k

∗

0

+F

(L

0

))

L

1

= L

0

⊕(k

1

+F

(R

1

))

L

1

= L

0

⊕(k

∗

1

+F

(R

∗

1

))

R

2

= R

1

⊕(k

2

+F

(L

1

))

R

2

= R

∗

1

⊕(k

∗

2

+F

(L

1

))

where ⊕represents bitwise xoring, and + represents

32-bit word-wise addition. Using the two equations

for L

1

we get

k

1

+F

(R

1

) = k

∗

1

+F

(R

∗

1

)

δ

1

= F

(R

1

) −F

(R

∗

1

)

where δ

1

= k

∗

1

−k

1

is ﬁxed. Let T = F

(L

0

)+k

0

and

observe that when L

0

goes over all possible values,

so does T. We get

δ

1

= F

(R

0

⊕T) −F

(R

0

⊕(T +δ

0

)) (1)

31

where δ

0

= k

∗

0

− k

0

. Note that δ

1

and δ

0

depend

only on the round keys, and that the equation must

hold for all values of R

0

and T. Set T = 0 and look

at the cases R

0

= 0 and R

0

= δ

0

. We get

F

(0) −F

(δ

0

) = δ

1

= F

(δ

0

) −F

(0) = −δ

1

The substraction here is modulo 2

32

for each of the

two 32-bit words. That leaves us with the following

possible values for δ

1

:

δ

1

∈ {(0, 0), (0, 2

31

), (2

31

, 0), (2

31

, 2

31

)}

These are the possible diﬀerence values at the out-

put of F

**in equation 1. We can easily convert them
**

to diﬀerence values at the input of the PHT of F

.

Each of the possible values for δ

1

corresponds to ex-

actly one possible value for (δ

T0

, δ

T1

):

(δ

T0

, δ

T1

) ∈ {(0, 0), (0, 2

31

), (2

31

, 0), (2

31

, 2

31

)}

We can write down the analogue to equation 1 for

g:

δ

T0

= g(R

⊕T

) −g(R

⊕(T

+δ

0

))

for all R

and T

and where δ

0

is the appropriate half

of δ

0

. Observe that for the speciﬁc values of δ

T0

that

are possible, subtraction and xor are the same. For

T

**= 0 this translates in a simple diﬀerential equa-
**

tion

δ

T0

= g(R

) ⊕g(R

⊕δ

0

)

for all R

**. We know that g has only one perfect
**

diﬀerential: 0 → 0, so we conclude that δ

0

= 0.

Similarly, we can conclude that the other half of δ

0

must also be zero, and thus δ

0

= 0. This is a con-

tradiction, as k

0

= k

∗

0

.

We conclude that there are no two sets of 3-round

subkey sequences that result in the same encryption

function.

A Conjecture About Equivalent Subkeys in

Twoﬁsh We believe that, for up to 16 rounds of

Twoﬁsh, there are no pairs of equivalent round keys,

where equivalent round keys lead to the same en-

cryption/decryption function for all possible inputs.

There simply do not appear to be enough degrees of

freedom in choosing the diﬀerent subkeys to make

pairs of equivalent subkey sequences in as few as 16

rounds. However, we have been unable to prove this.

We also conjecture that there are no pairs of Twoﬁsh

keys that lead to identical encryptions for all inputs.

(Recall that the Twoﬁsh keys are used to derive both

subkey sequences and also S-boxes.) The function g

depends only on half the key entropy, but within

that restriction we have veriﬁed that no two keys

lead to the same function g. This property does

not guarantee that the round function F has no

key-equivalences (since other key material is added

into the outputs of the PHT), but it provides partial

heuristic evidence for that contention.

7.11.2 Byte Sequences

The subkeys in Twoﬁsh are generated by using the

h function, which can be seen as four key-dependent

S-boxes followed by an MDS matrix. The input to

the S-boxes is basically a counter. In this section we

analyze the sequences of outputs that this construc-

tion can generate.

All key material is used to deﬁne key-dependent S-

boxes in h, which are then used to derive subkeys.

Each S-box gets a sequence of inputs, (0, 2, 4, . . . , 38)

or (1, 3, 5, . . . , 39). The S-box generates a corre-

sponding sequence of outputs. The corresponding

outputs from the four S-boxes are combined using

the MDS matrix multiply to produce the sequence

of A

i

and B

i

words, and those words are processed

with the PHT (with a couple of rotations thrown

in) to produce a pair of subkey words. Analyzing

these byte sequences thus gives us important insights

about the whole key schedule.

We can model each byte sequence generated by a

key-dependent S-box as a randomly selected non-

repeating byte sequence of length 20. This allows us

to make many useful predictions about the likelihood

of ﬁnding keys or pairs of keys with various interest-

ing properties. Because we will be analyzing the key

schedule using this assumption in the remainder of

this section, we should discuss how reasonable it is

to treat this byte sequence as randomly generated.

As discussed in Section 7.2.3 we have not found any

statistical deviations between our key-dependent S-

boxes and the random model in any of our extensive

statistical tests.

We are looking at 20-byte-long sequences of distinct

bytes. There are 256!/236! of those sequences, which

is close to 2

159

.

7.11.3 Equivalent S-box Keys

We have veriﬁed that there are no equivalent S-box

keys that generate the same sequence of 20 bytes. In

the random model, the chance of this happening for

the N = 256 case is about 2

63

· 2

−159

= 2

−96

. This

is the chance of such equivalent S-boxes existing at

all.

In fact, if we peel oﬀ one layer of our q construction

and assume the rest of the construction is random,

32

we can improve that bound. Without loss of gener-

ality we look at

17

:

s

0

(x) = q

0

[q

0

[q

1

[q

1

[q

0

[x] ⊕k

0

] ⊕k

1

] ⊕k

2

] ⊕k

3

]

Identical sequences are possible only if the inputs to

that last q

0

ﬁxed permutation are identical for both

sequences. That means that the task of ﬁnding a

pair of identical sequences comes down to a simple

task: ﬁnding a pair of (k

0

, k

1

, k

2

) byte values that

leads to a pair of sequences before the xor with k

3

that have a ﬁxed xor diﬀerence. Then, k

3

can be

changed to include that xor diﬀerence, and identical

sequences of inputs will go into that last q

0

S-box.

Let

t[i] := q

0

[q

1

[q

1

[q

0

[i] ⊕k

0

] ⊕k

1

] ⊕k

2

]

The goal is to ﬁnd a pair of t[i] sequences such that

t[i] ⊕t

∗

[i] = constant

Let us assume that t generates a random sequence.

The chances of any pair of t, t

∗

generating such a

constant diﬀerence is about 2

−151

. This brings the

chance of ﬁnding a pair with such a constant diﬀer-

ence down to 2

47

· 2

−151

= 2

−104

.

7.11.4 Byte Diﬀerence Sequences

Let us consider the more general problem of how

to get a given 20-byte diﬀerence sequence between

a pair of S-boxes. Suppose we have two S-boxes,

each deﬁned using 32 bits of key material, which

are not equal, but which must be chosen to give us

a given diﬀerence sequence in the xor of their byte

sequences. We can estimate the probability of a pair

of 4-byte inputs existing with the desired xor diﬀer-

ence sequence as 2

63

· 2

−159

= 2

−96

. Note that this

is the probability that such a pair of inputs exists,

not the probability that a random pair of keys will

have this property.

7.11.5 The A and B Sequences

From the properties of the byte sequences, we can

discuss the properties of the A and B sequences gen-

erated by each key M.

A

i

= MDS(s

0

(i, M), s

1

(i, M), s

2

(i, M), s

3

(i, M))

Since the MDS matrix multiply is invertible, and

since i is diﬀerent for each round’s subkey words gen-

erated, we can see that no A or B value can repeat

itself.

Similarly, we can see from the construction of h that

each key byte aﬀects exactly one S-box used to gen-

erate A or B. Changing a single key byte always

alters every one of the 20 bytes of output from that

S-box, and so always alters every word in the 20-

word A or B sequence to which it contributes.

Consider a single byte of output from one of the S-

boxes. If we cycle any one of the key bytes that con-

tributes to that S-box through all 256 possible val-

ues, the output of the S-box will also cycle through

all 256 possible values. If we take four key bytes

that contribute to four diﬀerent S-boxes, and we cy-

cle those four bytes through all possible values, then

the result of h will also cycle through all possible

values. This proves that A and B are uniformly dis-

tributed for all key lengths, assuming the key M is

uniformly distributed.

7.11.6 Diﬀerence Sequences in A and B

Let us also consider diﬀerence sequences. If we have

a speciﬁc diﬀerence sequence we want to see in A,

we are faced with an interesting problem: since the

MDS matrix multiply is xor-linear, each desired

output xor from the matrix multiply allows only

one possible input xor. This means that:

1. A zero output xor diﬀerence in A can occur

only with a zero output xor diﬀerence in all

four of the byte sequences used to build A.

2. Only 1020 possible output diﬀerences (out of

the 2

32

) in A

i

can occur with a single “active”

(altered) S-box. Most diﬀerences require all

four S-boxes used to form A

i

to be active.

3. Each desired output xor in A requires a spe-

ciﬁc output xor in each of the four byte se-

quences used to form A. This means that get-

ting any desired diﬀerence sequence into all 20

A

i

values requires getting a desired xor se-

quence into all four 20-byte sequences. (Note

that if the desired output xor in A

i

is an ap-

propriate value, up to three of the four byte

sequences can be identical without much trou-

ble, simply by leaving their key material un-

changed.) As mentioned above, this is very

unlikely to be possible for a randomly chosen

diﬀerence pattern in the A sequence. (There

are of course diﬀerence sequences of A

i

’s that

can occur.)

17

For ease of discussion, we number the key bytes as k

0

..k

3

going into one S-box. The actual byte ordering for s

0

and a

256-bit key’s subkey-generating S-box is k

0

, k

8

, k

16

, k

24

. The numbering of the key bytes has no eﬀect on the security arguments

in this section.

33

The above analysis is of course also valid for the B

sequence.

7.11.7 The Sequence (K

2i

, K

2i+1

)

As A

i

and B

i

are uniformly distributed (over all

keys), so are all the K

i

. As all pairs (A

i

, B

i

) are

distinct, all the pairs (K

2i

, K

2i+1

) are distinct, al-

though it might happen that K

i

= K

j

for any pair

of i and j.

7.11.8 Diﬀerence Sequences in the Subkeys

Diﬀerence sequences in A and B translate into dif-

ference sequences in (K

2i

, K

2i+1

). However, while it

is natural to consider A and B diﬀerence sequences

in terms of xor diﬀerences, subkeys can reasonably

be considered either as xor diﬀerences or as diﬀer-

ences modulo 2

32

. Thus, we may discuss diﬀerence

sequences:

D[i, M, M

∗

] = K

i,M

−K

i,M

∗

X[i, M, M

∗

] = K

i,M

⊕K

i,M

∗

where the diﬀerence is computed between the key

value M and M

∗

.

7.11.9 XOR Diﬀerences in the Subkeys

Each round, the subkeys are added to the results of

the PHT of two g functions, and the results of those

additions are xored into half of the cipher block. An

xor diﬀerence in the subkeys has a fairly high prob-

ability of passing through the addition operation and

ending up in the cipher block. (The probability of

this is determined by the Hamming weight of the

xor diﬀerence, not counting the highest-order bit.)

However, to get into the subkeys, a xor diﬀerence

must ﬁrst pass through the ﬁrst addition.

Consider

x +y = z

(x ⊕δ

0

) +y = z ⊕δ

1

Let k be the number of bits set in δ

0

, not counting

the highest-order bit. Then, the highest probability

value for δ

1

is δ

0

, and the probability that this will

hold is 2

−k

. This is true because addition and xor

are very closely related operations. The only diﬀer-

ence between the two is the carry between bit posi-

tions. If ﬂipping a given bit changes the carry into

the next bit position, this alters the output xor dif-

ference. This happens with probability 1/2 per bit.

The situation is more complex for multiple adjacent

bits, but the general rule still holds: for every bit in

the xor diﬀerence not in the high-order bit position,

the probability that the diﬀerence will pass through

correctly is cut in half.

For the subkey generation, consider an xor diﬀer-

ence, δ

0

, in A. This aﬀects two subkey words:

K

2i

= A

i

+B

i

K

2i+1

= ROL(A

i

+ 2B

i

, 9)

where the additions are modulo 2

32

. If we assume

these xor diﬀerences propagate independently in

the two subkeys (which appears to be the case), we

see that this leads to an xor diﬀerence of δ

0

in the

even subkey word with probability 2

−k

, and the xor

diﬀerence ROL(δ

0

, 9) in the odd subkey with the

same probability. The most probable xor diﬀerence

in the round’s subkey block thus occurs with prob-

abiity 2

−2k

. A desired xor diﬀerence sequence for

all 20 pairs of subkey words is thus quite diﬃcult to

get to work when k ≥ 3, assuming the desired xor

diﬀerence sequence can be created in the A sequence

at all.

When the xor diﬀerence is in B, the result is slightly

more complicated; the most probable xor diﬀerence

in a round’s pair of subkey words may be either

2

−(2k−1)

or 2

−2k

, depending on whether or not the

xor diﬀerence in B covers the next-to-highest-order

bit.

7.11.10 Diﬀerences in the Subkeys

An xor diﬀerence in A or B is easy to analyze in

terms of additive diﬀerences modulo 2

32

: an xor

diﬀerence with k active bits has 2

k

equally likely

additive diﬀerences. Note that if we have a addi-

tive diﬀerence in A, we get it in both subkey words,

just rotated left nine bits in the odd subkey word.

Thus, k-bit xor diﬀerences lead to a given additive

diﬀerence in a pair of subkey words with probabil-

ity 2

−k

. (The rotation does not really complicate

things much for the attacker, who knows where the

changed bits are.)

Note that when additive subkey diﬀerences modulo

2

32

are used in an attack, they survive badly through

the xor with the plaintext block. We estimate that

xor diﬀerences are much more likely to be directly

useful in mounting an attack.

7.11.11 Properties of the Key Schedule and

Cipher

One NIST requirement is that the AES candidates

have no weak keys. Here we argue that Twoﬁsh has

34

none.

Equivalent Keys A pair of equivalent keys,

M, M

∗

, is a pair of keys that encrypt all plaintexts

into the same ciphertexts. We are almost certain

that there are no equivalent keys in Twoﬁsh. There

is no pair of keys, M, M

∗

, that gives the same sub-

key sequence. This is easy to see; to get identical

subkeys, we have to get an identical A and B se-

quence at the same time, which requires getting all

eight of the key scheduling S-boxes to give the same

output for the same input for two diﬀerent keys. We

have veriﬁed that this cannot be done.

It is conceivable that diﬀerent sequences of subkeys

and diﬀerent S-boxes in g could end up producing

the same encryption function, thus giving equiva-

lent keys. This seems extremely unlikely to us, but

we cannot prove that such equivalent keys do not

exist.

Self-Inverse Keys Self-inverse keys are keys for

which encrypting a block of data twice with the same

key gives back the original data. We do not believe

that self-inverse keys exist for Twoﬁsh. Keys cannot

generate a self-inverse sequence of subkeys, because

the same round subkey value can never appear more

than once in the cipher. Again, it is conceivable

that some keys are self-inverse despite using diﬀer-

ent subkeys at diﬀerent points, but it is extremely

unlikely.

Pairs of Inverse Keys A pair of inverse keys is a

pair of keys M

0

, M

1

, such that E

M0

(E

M1

(X)) = X,

for all X. Pairs of these keys that have identical

subkeys, just running backwards, are astronomically

unlikely to exist at all. For this to work, each S-box

must have a pair of keys that reverses it. This is

a kind of collision property; for each S-box, it has

probability 2

−96

of existing at all.

Simple Relations A key complementation prop-

erty exists when:

E

M

(P) = C ⇒ E

M

(P

) = C

where P

, C

, and K

**are the bitwise complement
**

of P, C, and K, respectively. No such property has

been observed for Twoﬁsh.

More generally, a simple relation [Knu94b] is deﬁned

as:

E

M

(P) = C ⇒ E

f(M)

(g(P, M)) = h(C, M)

where f, g, and h are simple functions. We have

found no simple relations for Twoﬁsh, and strongly

doubt that they exist.

7.11.12 Key-dependent Characteristics and

Weak Keys

The concept of a key-dependent characteristic seems

to have been introduced in [BB93] in their cryptanal-

ysis of Lucifer, and also appears in [DGV94a] in an

analysis of IDEA.

18

The idea is that certain iterative

properties of the block cipher useful to an attacker

become more eﬀective against the cipher for a spe-

ciﬁc subset of keys.

A diﬀerential attack on Twoﬁsh may consider xor-

based diﬀerences, additive diﬀerences, or both. If an

attacker sends xor diﬀerences through the PHT and

subkey addition steps, his diﬀerential characteristic

probabilities will be dependent on the subkey values

involved. In general, low-weight subkeys will give

an attacker some advantage, but this advantage is

relatively small. (Zero bits in the subkeys improve

the probabilities of cleanly getting xor-based diﬀer-

ential characteristics through the subkey addition.)

Since there appears to be no special way to choose

the key to make the subkey sequence especially low

weight, we do not believe this kind of key-dependent

diﬀerential characteristic will have any relevance in

attacking Twoﬁsh.

A much more interesting issue in terms of key-

dependent characteristics is whether the key-

dependent S-boxes are ever generated with espe-

cially high probability diﬀerential or high bias linear

characteristics. The statistical analysis presented

earlier shows that the best linear and diﬀerential

characteristics over all possible keys are still quite

unlikely.

Note that the structure of both diﬀerential and

linear attacks in Twoﬁsh is such that such at-

tacks appear to generally require good characteris-

tics through at least three of the four key-dependent

S-boxes (if not all four), so a single high-probability

diﬀerential or linear characteristic for one S-box will

not create a weakness in the cipher as a whole.

18

See [Haw98] for further cryptanalysis of IDEA weak keys.

35

7.12 Reed-Solomon Code

The RS structure helps defend against many possible

related-key attacks by diﬀusing the key material in

a direction “orthogonal” to the ﬂow used in comput-

ing the 8-by-8-bit S-boxes of Twoﬁsh. For example,

a single byte change in the key is guaranteed to aﬀect

all four key-dependent S-boxes in g. Since RS codes

are MDS [MS77], the minimum number of diﬀerent

bytes between distinct 12-byte vectors generated by

the RS code is guaranteed to be at least ﬁve. Notice

that any attempt in a related-key attack to aﬀect

only a single byte in the computation of A or B is

guaranteed to aﬀect all four bytes in the computa-

tion of T

0

and T

1

. The S-box keys are used in reverse

order from the associated key bytes so that related-

key material is used in a diﬀerent order in the round

function than in the subkey generation.

The reversible RS code used in Twoﬁsh was cho-

sen via computer search to minimize implementation

cost. The code generator polynomial is

x

4

+ (α +

1

α

)x

3

+αx

2

+ (α +

1

α

)x + 1

where α is a root of the primitive polynomial w(x)

used to deﬁne the ﬁeld.

Because all of these coeﬃcients are “simple,” this RS

computation is easily performed with no tables, us-

ing only a few shifts and xor operations; this is par-

ticularly attractive for smart cards and hardware im-

plementations. This computation is only performed

once per key schedule setup per 64 bits of key, so

the concern in choosing an RS code with such sim-

ple coeﬃcients was not the performance overhead,

but saving ROM space or gates. Precomputing the

RS remainders requires only 8 bytes of RAM on a

smart card for 128-bit keys.

Note that the RS matrix multiply can be imple-

mented as a simple loop using the generator polyno-

mial without storing the coeﬃcients of the RS ma-

trix.

8 Cryptanalysis of Twoﬁsh

We have spent over one thousand man-hours at-

tempting to cryptanalyze Twoﬁsh. A summary of

our successful attacks is as follows:

• 5-round Twoﬁsh (without the post-whitening)

with 2

22.5

chosen plaintext pairs and 2

51

com-

putations of the function g.

• 10-round Twoﬁsh (without the pre- and post-

whitening) with a chosen-key attack, requiring

2

32

chosen plaintexts and about 2

11

adaptive

chosen plaintexts, and about 2

32

work.

The fact that Twoﬁsh seems to resist related-key at-

tacks well is arguably the most interesting result, be-

cause related-key attacks give the attacker the most

control over the cipher’s inputs. Conventional crypt-

analysis allows an attacker to control both the plain-

text and ciphertext inputs into the cipher. Related-

key cryptanalysis gives the attacker an additional

way into a cipher: the key schedule. A cipher that

is resistant to attacks with related keys is necessarily

resistant to simpler techniques that only involve the

plaintext and ciphertext.

19

Based on our analysis, we conjecture that there ex-

ists no more eﬃcient attack on Twoﬁsh than brute

force. That is, we conjecture that the most eﬃ-

cient attack against Twoﬁsh with a 128-bit key has a

complexity of 2

128

, the most eﬃcient attack against

Twoﬁsh with a 192-bit key has a complexity of 2

192

,

and the most eﬃcient attack against Twoﬁsh with a

256-bit key has a complexity of 2

256

.

8.1 Diﬀerential Cryptanalysis

This attack breaks a 5-round Twoﬁsh variant with a

128-bit key, without the post-whitening, with 2

22.5

chosen plaintexts and 2

51

work. Due to the high

computational requirements, we have not imple-

mented this attack, though we have validated some

of its elements. We also discuss an attack on four

rounds of Twoﬁsh with no post-xor requiring 2

32

work, and attacks on a Twoﬁsh variant with ﬁxed S-

boxes breaking six rounds with 2

67

work, and break-

ing seven rounds with 2

131

work. None of the attacks

appears to be extensible to enough rounds to pose

any substantial threat to the full 16 round Twoﬁsh.

8.1.1 Overview of the Attack

The general idea of the attack is to force a character-

istic for the F function of the form (a, b) → (X, 0) to

occur in the second round. We realize this character-

istic internally by causing the same low Hamming-

weight xor diﬀerence to occur in the outputs of both

g computations in the second round of the cipher.

When this happens, and there are k bits in that

xor diﬀerence, there is a 2

−k

probability that the

19

We have discussed the relevance of related-key attacks to practical implementations of a block cipher in [KSW96, KSW97].

Most importantly, related-key attacks aﬀect a cipher’s ability to be used as a one-way hash function.

36

output of the PHT will be unchanged in one of its

two words. That, in turn, leads to a detectable pat-

tern in the output diﬀerence from the third round,

and thus to a detectable pattern (with a great deal

of work) in the output diﬀerence in the output from

the cipher’s ﬁfth round.

8.1.2 The diﬀerential

The attack requires that we get a speciﬁc, pre-

dictable diﬀerence sequence. We choose:

r ∆

Rr,0

∆

Rr,1

∆

Rr,2

∆

Rr,3

0 0 0 a

b

1 a b 0 0

2 0 X a b

3 Y Z 0 X

4 Q R Y Z

5 S T Q R

where the table gives the diﬀerence patterns for

the round values for each of the 5 rounds, and

where a

= ROL(a, 1), b

= ROR(b, 1). We have

Z = ROL(a, 1) ⊕F

2,1

, and the low-order bit of F

2,1

is guaranteed to be zero; also, we choose a so that

the low-order bit of ROL(a, 1) is also zero. Then

the diﬀerence Z is detectable, because the low-order

bit of Z is always zero. To recognize right pairs, we

shall (roughly speaking) guess the last-round sub-

key and decrypt up one round, checking whether the

low-order bit of the diﬀerence is as expected. There

are also several other ways to distinguish the Y, Z

diﬀerence, but they do not change the diﬃculty of

the attack. Note that X, Y , Z, S, T, Q, and R are

all diﬀerences whose values we do not care about, so

long as the low-order bit of Z is always zero.

The only critical event for this attack occurs at r =

2, where we need the characteristic (a, b) → (0, X)

to happen with high probability. Based on the prop-

erties of the MDS matrix, we know that there are

three single-byte output xors for s

0

that lead to an

output xor of g with Hamming weight of only 8. If

the same one of those diﬀerences goes into the MDS

matrix in both g computations in the second round,

then, with probability 2

−8

, we get an oﬀsetting pair

of values in the two g outputs; one g output has some

value added to it modulo 2

32

, and the other output

has the same value subtracted from it. When the

outputs go through the PHT, they lead to only one

output word from the whole F being changed, which

is what makes this attack possible. Therefore, we

choose a = (α, 0, 0, 0) to be a single-byte diﬀerence

in s

0

, and b = ROR(a, 8) so that the single-byte

diﬀerence into the second g computation in round 2

lines up with the single-byte diﬀerence in a.

The thing that makes this attack hard is getting that

event to occur several times in succession. We choose

input structures to guarantee that, after 2

22.5

cho-

sen plaintexts, we are very likely to get at least one

batch of 2048 “right pairs,” which have the 5-round

diﬀerential characteristic shown above. With 2048

successive events of this kind, we can mount an at-

tack in which we attempt to recover the S-box en-

tries. When our guess is successful, these will be

correct, and we will be able to recover most of the

key.

8.1.3 The Input Structure

The input structure works as follows: The input pair

is

(A, B, C, D), (A

∗

, B

∗

, C

∗

, D

∗

)

The input structures we use must thus accomplish

two things:

1. Select 2

n

plaintext pairs in such a way that

there is a high probability of at least one pair

being a right pair. Recall that a right pair is

a pair that follows the whole diﬀerential char-

acteristic described above.

2. For each plaintext pair generated in the pre-

vious step, generate about 2048 related plain-

text pairs, such that if the ﬁrst plaintext pair

is a right pair, so are all 2048 related plaintext

pairs.

The ﬁrst step allows us to get the characteristic

through; the second step allows us to mount the

whole 5-round attack, by giving us enough data to

recover the low bits of the S-box entries.

Getting One Potential Right Pair Let us con-

sider pairs of inputs to the F function in the second

round. We must ﬁrst get a desired xor diﬀerence to

occur in the output of the active S-box of the attack,

inside the g function computation. This xor diﬀer-

ence must occur in the output of the active S-box

s

0

in both parallel g functions in the second round’s

F function. We will call a pair of texts for which

this happens a potential right pair. We expect there

to be between three and four output xor diﬀerences

from s

0

that will lead to some 8-bit output xor dif-

ference from the whole g function, and we do not

care which of these diﬀerences we hit. However, we

do not know s

0

, and so we cannot expect to choose

the best bytewise input diﬀerence to s

0

to get one of

our desired xor diﬀerences.

37

Simulations show that, with a randomly selected s

0

,

a randomly selected (non-zero) input xor diﬀerence

has about a 1/2 probability of causing any desired

output xor diﬀerence, when it is tried on all 256

possible input values. (That is, s

0

[i] ⊕ s

0

[i ⊕ δ] will

hit any desired output xor with probability about

1/2, for any nonzero δ, when i is stepped through all

possible byte values.) Our structure attempts to get

this sequence of inputs into the same byte position

in both parallel g functions in the second round’s F

function at the same time. This requires 256 diﬀer-

ent input pairs, there is a probability of 1/2 of one

of them being a potential right pair for each of the

three or four useful s

0

output xors. This will gen-

erate a potential right pair with probability 7/8 if

there are three useful s

0

output xors.

There is a complication to all this. The second

round’s F function input is the right half of the

plaintext, xored with the ﬁrst round’s F function

output. Since we do not know the ﬁrst round’s F

function output, we cannot directly control the in-

puts to the second round. Instead, we must guess

the xor of the high-order byte of each word of the F

function output. (This ignores the input whitening,

but that merely oﬀsets our guess and has no eﬀect.)

This means that, to have at least a 7/8 probability

of getting one potential right pair, we must try 2

16

diﬀerent input pairs.

Getting Right Pairs from Potential Right

Pairs A potential right pair has the property that

it is getting the same xor diﬀerence in the output

of both parallel g functions in the second round. Be-

cause this xor diﬀerence has a Hamming weight of

only 8, the probability is at least 2

−8

that the two g

functions will have a very useful relationship mod-

ulo 2

32

: one g function will have some value added

to it modulo 2

32

; the other g function in the same

plaintext will have that value subtracted from it.

Consider two random values, U and V . If we xor

some nonzero value with Hamming weight 8, T, into

both, it is possible that each time a bit in U is

changed from a zero to a one, the corresponding bit

in V is changed from a one to a zero. When this

happens:

U +V = (U ⊕T) + (V ⊕T) (2)

This is what happens in a right pair.

For any U and T, the values of V for which this holds

are characterized by (U⊕T ⊕V )∧T = 0 where the ∧

is the bitwise-and operation. That is, in all bit posi-

tions where T has a 1-bit, the bit values of U and V

must be opposite. Looking only at one bit position,

xoring T into both U and V changes one of the bits

to be added from zero to one, and the other bit to

be added from one to zero. Thus, the result remains

the same. When T has a Hamming weight of 8, then

2

−8

of all V values will satisfy equation 2 for given

U and T.

From a single potential right pair, we must now try

many diﬀerent values for the input to the second g

function in the second round, without altering the

high-order byte values. When we try about 256 dif-

ferent values of this kind with a potential right pair,

we expect to get a real right pair. This can be

done by changing any of the bytes that go to the

second round’s second g function, without changing

any other bytes. This means that, from potential

right pair (A, B, C, D), (A

∗

, B

∗

, C

∗

, D

∗

), we derive

a new potential right pair by leaving the relation-

ship between the pair the same, but altering D in

its low-order two bytes.

Because we do not know which of the plaintext pairs

described above has the potential right pair, we must

apply this process to all of them. Thus, we gener-

ate 256 diﬀerent plaintext pairs for each of the 2

16

plaintext pairs we already have, giving 2

24

plaintext

pairs. Out of all this, we expect to get at least one

right pair.

In Section 8.3.1, we found empirically that a char-

acteristic of the form (a, b) → (0, X) for the round

function has a probability of about 2

−20

. This allows

us to improve the probability of the characteristic for

the F function, and reduce the number of plaintext

pairs needed to about 2

20

.

Turning One Right Pair into 2048 Right Pairs

We now have 2

20

plaintext pairs, of which at least

one is, with fairly high probability, a right pair. Un-

fortunately, to mount the 5-round attack, we need

a batch of 2048 such pairs. The reason we need

2048 pairs is that we will recover two bits of infor-

mation about each entry in each S-box; this makes

for 2 · 4 · 256 = 2048 unknown bits, and each of the

2048 pairs give us one bit of information on those un-

knowns. Once we have recovered information on all

(or most) S-box entries, we can then readily recover

the keying material entering each S-box by brute

force.

Consider, again, equation 2. We already saw that

this equation is equivalent to (U ⊕T ⊕V ) ∧ T = 0.

This implies ((U ⊕W) ⊕T ⊕(V ⊕W)) ∧T = 0, for

any value of W. So if we have a (U, V, T) that satis-

ﬁes equation 2, then (U ⊕W, V ⊕W, T) also satisﬁes

that equation for any value of W.

38

This gives us a way to derive many plaintext pairs

from one right pair. If we have a right pair

(A, B, C, D), (A

∗

, B

∗

, C

∗

, D

∗

), then we can generate

a new right pair, (A

i

, B

i

, C

i

, D

i

), (A

∗

i

, B

∗

i

, C

∗

i

, D

i

).

To do this, we must manipulate the C

i

, D

i

values in

such a way that the same xor diﬀerence occurs in

the output of both g functions in the second round.

In doing this, we face the same constraints as in get-

ting a desired output xor diﬀerence from s

0

, and

so we use the same solution: send in the same byte

inputs to s

1

in C and D. We then alter the byte in-

puts to s

1

in the altered C

i

and D

i

in the same way,

so that the same actual pair of inputs occurs in s

1

in both g functions. This is guaranteed to give the

same output xor in both g functions. The C

∗

i

, D

∗

i

values have the same relationship to the C

i

, D

i

val-

ues as do the C

∗

, D

∗

values to the C, D values.

This introduces still one more complication to our

input structure. We again do not know the out-

put from the ﬁrst round’s F function, and so do not

know how to get identical bytes into s

1

in the second

round. We must try 256 diﬀerent possible values in

our inputs, to deal with all possible byte xors be-

tween the s

1

input in C and in D in the second

round. This increases the total number of batches

of plaintext pairs requested by a factor of 256.

Summary of the Input Structure To get a ﬁve

round attack to work, the input structure is highly

complex. We thus summarize how it is built:

1. We ﬁx A and B throughout the attack.

2. Let the bytes of C, D be denoted by

(c

0

, c

1

, c

2

, c

3

),(d

0

, d

1

, d

2

, d

3

). We ﬁx c

3

, d

2

, d

3

.

3. We let c

1

range over all possible values, and

force d

1

= c

1

.

4. We let c

2

range over 16 possible values.

5. We let the pair (c

0

, d

0

) range over any 1450

possible values; those values can be arbitrary,

so long as they are distinct.

6. Finally, we form all 2

8

· 16 · 1450 = 2

22.5

possi-

ble combinations of c

1

, d

1

, c

2

, (c

0

, d

0

), and get

the encryption of the 2

22.5

resulting plaintexts.

This gives us 2

44

possible plaintext pairs which

can be chosen from the set of available texts;

we shall use them as described below. The in-

tuition is that 2

44

> 2

18

· 256 · 2048, so it is

plausible that there could be enough plaintext

pairs, if the input structure is well-chosen.

Using the Structure We shall describe how we

use the available plaintexts to construct the neces-

sary plaintext pairs. First, we guess the value v of

the byte xor mentioned above (used to ensure that

the s

1

inputs are the same in round 2). From now

on, we shall think of v as ﬁxed. We can restrict our

attention to those plaintext pairs (C, D), (C

∗

, D

∗

)

with c

1

⊕ c

∗

1

= v; for those pairs, we’ll also have

d

1

⊕ d

∗

1

= v for free; and those two relations ensure

that the byte inputs into s

1

match up in the second

round as desired if we have guessed v correctly. This

gives us about 2

44

/2

8

= 2

36

plaintext pairs with ap-

propriate values of (c

1

, d

1

), (c

∗

1

, d

∗

1

). We shall also

insist that c

2

= c

∗

2

; this reduces the number of us-

able plaintext pairs to 2

36

/16 = 2

32

.

Next let’s look for a right pair. Suppose that some

value of (C, D), (C

∗

, D

∗

) leads to a right pair. Then

we note that what makes this a right pair depends

only on (c

0

, d

0

), (c

∗

0

, d

∗

0

). Therefore, given any one

right pair, you can easily get a batch of 2

32

/1450

2

=

2048 right pairs by ﬁxing (c

0

, d

0

), (c

∗

0

, d

∗

0

) and letting

c

1

, c

∗

1

, c

2

, c

∗

2

range over all possible values such that

c

1

⊕c

∗

1

= v and c

2

= c

∗

2

.

Based on these observations, we try each of the 1450

2

possibilities for (c

0

, d

0

), (c

∗

0

, d

∗

0

) in turn, testing each

of them for a right pair. When we ﬁnd one right

pair, the structure ensures we’ll ﬁnd 2048 simulta-

neous right pairs.

8.1.4 Recovering Key Material

We describe next how to use a batch of 2048 simul-

taneous right pairs to recover all of the key mate-

rial entering the S-boxes. We focus on one value of

(c

0

, d

0

), (c

∗

0

, d

∗

0

), considering it ﬁxed; the issue is how

to identify whether it contains 2048 right pairs, and

if so, how to recover key material.

Suppose we knew we had a batch of right pairs.

Then, the diﬀerence T is visible from a ciphertext

pair, and T = ROL(Z, 1) ⊕ ∆F

4,1

. Of course, we

know that the low-order bit of Z is zero; therefore,

we need only look at T and F

4,1

mod 4. We let

t

0

, t

1

be the outputs of the two g functions in round

four, so that F

4,1

= t

0

+ 2t

1

+ K

17

mod 2

32

. We

guess the low-order bit of K

17

, so that the low-order

bit of F

4,1

depends only on the low-order bit of t

0

,

and the next-lowest bit of F

4,1

depends only on an

xor of the low-order bit from t

1

, the next-lowest

bit from t

0

(and possibly the low-order bit of t

0

, de-

pending on K

17

mod 2). This, in turn, means that

the next-lowest bit of the observable diﬀerence T de-

pends only on the xor of low-order bits from the g

function outputs.

39

At this stage, we could guess the key material en-

tering the S-boxes, compute the low-order bits of

t

0

and t

1

for each ciphertext pair, and check our

guess against the observed values of T. However,

this would already require 2

64

work for a 128-bit

key, and 2

128

work for a 256-bit key, which is too

high for comfort. Therefore, we propose an alter-

nate analysis method with much lower complexity.

Our key recovery technique is based on establish-

ing and solving linear equations. We construct

2 · 4 · 256 = 2048 formal unknowns x

i,j,k

∈ GF(2) as

two linear combinations on each of the 256 entries

in each of the four S-boxes. For example, x

0,3,17

would represent the low-order bit of the output of

the MDS matrix when applied to s

3

(17); similarly

for the other x’s. The main observation is that

the observable diﬀerence T gives us one equation

on those unknowns, for each ciphertext pair which

arises from a right pair. This is true because the low-

order bit of t

0

is obtained as a GF(2)-linear combi-

nation of x’s, namely x

0,0,p

⊕x

0,1,q

⊕x

0,2,r

⊕x

0,3,s

if

the input to g is (p, q, r, s). With this idea in mind,

we use our batch of 2048 pairs to generate 2048 lin-

ear equations on the 2048 unknowns. This system

can be solved by standard linear algebra techniques

(such as Gaussian elimination, although a more so-

phisticated algorithm is more appropriate since we

will have a very sparse matrix).

Therefore, our algorithm for identifying right pairs

and recovering key material is as follows. Look at

the 2048 candidate pairs generated by one value of

(c

0

, d

0

), (c

∗

0

, d

∗

0

), using them to generate a system of

2048 linear equations on 2048 unknowns. If this sys-

tem of linear equations has no solution, then we

know that those candidate pairs do not represent

right pairs, and we may move on to another value of

(c

0

, d

0

), (c

∗

0

, d

∗

0

). (When we are looking at the wrong

value of (c

0

, d

0

), (c

∗

0

, d

∗

0

), the system of linear equa-

tions will be inconsistent with good probability, and

thus the cost of ﬁltering should not be too high.)

Otherwise, we solve the system of linear equations,

thus obtaining two bits of information on each S-

box entry (or on most of them). This enables us

to isolate the key material entering each S-box, and

solve for each S-box in turn. The key material enter-

ing, say, s

0

can be recovered by a brute force search

using the known values of x

·,0,·

; this between 2

16

work (for a 128-bit key) and 2

32

work (for a 256-

bit key). (Alternatively, the key material entering

s

0

could be recovered by a table lookup in a 2

16

–

2

32

-entry table.) If some S-box does not admit any

choice of keys, then we can also reject the batch of

2048 candidate pairs, and move on to another value

of (c

0

, d

0

), (c

∗

0

, d

∗

0

). However, when we ﬁnally en-

counter a right pair, this procedure will recover all

of the key material entering the S-boxes. Finally,

the rest of the key can be recovered by other tech-

niques; for example, see below for a more eﬀective

diﬀerential attack which works when the S-boxes are

known.

In the end, we reject each wrong batch of pairs

with about 2048

2

work, on average, and recover the

key material from the right pairs with at most 2

32

work. This leads to a total work factor of about

1450

2

· 2048

2

· 256 + 2

32

= 2

51

; in addition, the at-

tack needs about 2

22.5

chosen plaintexts.

8.1.5 Reﬁnements and Extensions

Attacking Four Rounds The above structure

can be used in a much easier 4-round attack. The

attacker can immediately rule out all but the right

batch of ciphertext pairs, since he can see the Z dif-

ference word directly. He can then mount a 2

32

eﬀort

search for S; the right key will reveal an unchanged

low-order bit in A after g is computed. The total

work expected is 2

32

trial g computations.

Known g Keys We can also consider a variant of

Twoﬁsh with known S-boxes in g. In this case, we

can use the same basic attack, but push it through

more rounds.

Consider a 6-round Twoﬁsh variant with this input

structure. To expose the low-order bit of the Z dif-

ference, we must now guess the last round’s subkey.

A correct guess will allow us to compute the ﬁfth

round’s g functions, which will allow us to recognize

the sequence of right pairs by the low-order bit of

the Z diﬀerence, just as in the above attack. There

is an improvement, however: we only really need

to know the ﬁrst g computation result in the ﬁfth

round, since the PHT has the eﬀect of making the

low-order bit of the second word of F function out-

put dependent only on that value and one bit of the

ﬁfth round subkey. This means we need to know

only the ﬁrst input to the g function in the ﬁfth

round, which means we need only guess 32 bits of

the sixth round subkey. The resulting attack breaks

six rounds with about 2

67

work.

Consider a 7-round Twoﬁsh variant with this input

structure. We now need to know the whole input to

the sixth round F function. If we guess the whole

seventh round subkey, and 32 bits of the sixth round

subkey, we can mount the same attack. We thus get

an attack with 2

131

work, which is of some theoret-

ical interest against 192-bit and 256-bit keys.

40

It may also be possible to improve the number of

plaintexts required by a factor of several thousand,

because we now know the diﬀerence tables for our

S-boxes.

8.2 Extensions to Diﬀerential Crypt-

analysis

8.2.1 Higher-Order Diﬀerential Cryptanaly-

sis

Higher-order diﬀerential cryptanalysis looks at

higher order relations (e.g., quadratic) between pairs

of plaintext and ciphertexts [Lai94, Knu95b]. These

attacks seem to work best against algorithms with

simple algebraic functions, only a few rounds, and

poor short-term diﬀusion. In particular, we are

not aware of any higher-order diﬀerential attack

reported in the open literature that is successful

against more than 6 rounds of the target cipher. We

cannot ﬁnd any higher-order diﬀerentials that can

be exploited in the cryptanalysis of Twoﬁsh.

8.2.2 Truncated Diﬀerentials

Attacks using truncated diﬀerentials apply a dif-

ferential attack to only a partial block [Knu95b].

We have not found any truncated attacks against

Twoﬁsh. The almost complete diﬀusion within a

round function makes it very diﬃcult to isolate a

portion of the block and ignore the rest of the block.

Truncated diﬀerential attacks are most often suc-

cessful when most of the cipher’s internal compo-

nents operate on narrow bit paths; Twoﬁsh’s 64-bit-

wide F function seems to make truncated diﬀerential

characteristics hard to ﬁnd. Additionally, truncated

diﬀerential attacks are often diﬃcult to extend to

more than a few rounds in a cipher. We believe that

Twoﬁsh is secure against truncated diﬀerential at-

tacks.

8.3 Search for the Best Diﬀerential

Characteristic

When assessing the security of ciphers built out of

S-boxes and simple diﬀusion boxes, it is often easy

to obtain a crude lower bound on the complexity of

a diﬀerential attack by simply bounding the num-

ber of active S-boxes that must be involved in any

successful diﬀerential characteristic. (An inactive S-

box is one that is fed with the trivial characteristic

0 → 0; all others are considered active.) If any diﬀer-

ential characteristic of suﬃcient length must include

at least n active S-boxes, and there is no character-

istic for any of the S-boxes with probability greater

than DP

max

, then one can conclude that the proba-

bility of any diﬀerential characteristic can be at most

(DP

max

)

n

. Of course, this approach often results

in extremely crude and conservative bounds, in the

sense that the true complexity of mounting a practi-

cal diﬀerential attack is often much higher than the

bound indicates. Nonetheless, when (DP

max

)

n

is

suﬃciently small, the results can provide powerful

evidence of the cipher’s security against diﬀerential

cryptanalysis.

We follow this approach to develop additional ev-

idence that Twoﬁsh is likely to be secure against

diﬀerential attacks. Section 7.2.3 already provides

convenient values of DP

max

for us; the hard part

is to determine a bound on the number n of active

S-boxes.

Our approach was based on the observation that the

number of active S-boxes can be counted by examin-

ing the active bytes positions of the examined diﬀer-

ential characteristics. By an active byte of a diﬀeren-

tial characteristic, we mean one that has a non-zero

diﬀerence at that position of the characteristic. Of

course, an S-box in the F function is active exactly

when its input byte is active, so characterizing the

pattern of active bytes in a given diﬀerential charac-

teristic allows us to count the number of S-boxes it

includes. Moreover, due to the heavily byte-oriented

nature of the F function, we can obtain bounds on

the propagation of active bytes through the F func-

tion; for example, if there is one active byte at the

input to the F function, there must be at least four

active bytes at its output. Furthermore, knowing

the pattern of active bytes in the diﬀerence at the

output of the F function lets one derive constraints

on the pattern of active bytes at the inputs of F

functions at neighboring rounds.

For convenience, we introduce a simple notation to

indicate the location of active bytes: a x repre-

sents an (unspeciﬁed) non-zero byte diﬀerence (and

thus an active byte position), while a 0 represents a

zero byte diﬀerence (and thus an inactive byte po-

sition). An even shorter notation treats the string

speciﬁcation of active bytes as a binary string, and

packs it into a hexadecimal string for more con-

venient display. For example, the characteristic

0000000000B4000C

16

→ 591FE47201234500

16

for

the F function might be represented in our ﬁrst no-

tation as 00000x0x → xxxxxxx0, which indicates

that two of the byte positions in the input diﬀer-

ence are active and all but one of the bytes in the

output diﬀerence are active; a shorter form for the

41

same characteristic is 05

16

→ FE

16

. Thus, any one

such pattern represents a large class of possible dif-

ferential characteristics.

In this approach, we can count the minimum number

of S-boxes over all diﬀerential characteristic by fully

searching the set of all patterns of active bytes. Our

implementation used Matsui’s algorithm for pruned

search of all diﬀerential characteristics [Mat95]. We

conservatively assumed that a good analyst might be

able to bypass the ﬁrst round of Twoﬁsh and mount

a 3-R attack, thus needing to cover only rounds 2–13.

Therefore, we concentrated our eﬀorts on 12-round

characteristics.

When implementing this approach, it is diﬃcult to

model the eﬀects of the one-bit rotations and the

PHT precisely, since they introduce non-linear de-

pendence across byte boundaries. To simplify the

task, we considered a restricted model that intro-

duces two small inaccuracies:

1. We ignore the one-bit rotations. (In some

cases, the one-bit rotations can cause diﬀusion

across byte boundaries in a form that we did

not model.)

2. A few troublesome diﬀerential characteristics

for the PHT are ignored: namely, those

that rely on a carry bit propagating through

at least 8 bit positions. For example, the

characteristic (01FFFE80

16

, 01FFFE80

16

) →

(00000100

16

, 00000080

16

) for the PHT has

positive probability, yet it is not considered in

our model. These sorts of characteristics were

omitted because they are harder to character-

ize in full generality, and also because they

have very low probability. (Experiments sug-

gest that the previous example has a probabil-

ity of perhaps 2

−36

or so.)

Any attack which does not violate this model must

obey our bounds below.

In this model, we found that there are no good high-

probability diﬀerential characteristics covering more

than 12 rounds of Twoﬁsh. Our results show that,

under this model, the best 12-round diﬀerential char-

acteristic must involve at least 20 active S-boxes.

The reasoning is as follows. Our implementation

of Matsui’s algorithm took too long to search the

space of all diﬀerential characteristics that were 9

rounds or longer, so we were forced to generalize

from data on characteristics covering 8 or fewer

rounds. The program did show that covering 4

rounds requires at least 6 active S-boxes, covering 6

rounds requires at least 10 active S-boxes, and cov-

ering 8 rounds requires at least 14 active S-boxes.

Of course, any 12-round characteristic must include

a sub-characteristic covering the ﬁrst 4 rounds and a

sub-characteristic covering the last 8 rounds; there-

fore, any characteristic covering 12 rounds must in-

clude at least 6 + 14 = 20 active S-boxes. (This

bound is probably not tight.)

For example, one of the best 8-round characteristics

found was

1 88->FE (2 active)

2 88<-00 (0 active)

3 88->00 (2 active)

4 88<-F0 (4 active)

5 00->F0 (0 active)

6 00<-F0 (4 active)

7 88->F0 (2 active)

8 88<-00 (0 active)

9 88 00

Here we see that the characteristics for the F func-

tion were 88

16

→ FE

16

, 0 → 0, 88

16

→ F0

16

, and so

on.

We may now add in the results of Section 7.2.3.

For all 128-bit keys, we are guaranteed that

DP

max

≤ 18/256, so any attacker would need at

least (DP

max

)

−20

≈ 2

76.6

chosen plaintexts; for 192-

bit keys, we have DP

max

≤ 24/256, so the best an

attacker could hope for is an attack needing 2

68.3

chosen plaintexts. Of course, even if these attacks

were possible, they could only work for a very small

class of weak keys (constituting about 2

−48

of the

128-bit keyspace, or in the second case, about 2

−64

of the 192-bit keyspace).

It is much more natural to demand that the at-

tack work for a signiﬁcant percentage of all keys. In

this case, DP

max

= 12/256 is a more representative

value (for all key sizes), and then any diﬀerential at-

tack working for a signiﬁcant fraction of the keyspace

would require at least 2

88.3

chosen plaintexts. This

already rules out all practical attacks.

These estimates are likely to signiﬁcantly underesti-

mate the complexity of a diﬀerential attack. We list

here some of the practical barriers a real attacker

would have to surmount:

1. The attacker would have to ﬁnd speciﬁc dif-

ferences for the active byte positions that

ﬁt together and that still make the attack

work. This involves picking speciﬁc diﬀer-

ences, where the average probability of any

particular diﬀerential of an S-box is of course

much lower than the maximum probability.

42

(Doing so is already hard within the model;

the one-bit rotations make it even harder, since

they eliminate any hope for good iterative

characteristics.)

2. Those speciﬁc diﬀerences would have to prop-

agate through the key-dependent S-boxes, the

MDS matrix, the PHT, and the subkey ad-

dition with high probability. The PHT and

subkey addition are particularly thorny barri-

ers: in our model, the attacker was not charged

with the cost of pushing a diﬀerence through

them, but in real life, the overall probability

of the diﬀerential characteristic would be sig-

niﬁcantly reduced by the combined probabili-

ties of passing the non-trivial round character-

istics through the PHT and subkey addition.

An attacker would have to ﬁnd a way to min-

imize those costs. Because the PHT and the

subkey addition rely on addition modulo 2

32

,

pushing a diﬀerence with signiﬁcant probabil-

ity through requires a diﬀerence with relatively

low Hamming weight. However, this imposes

signiﬁcant constraints on the diﬀerence pushed

through the MDS matrix; and Section 7 shows

that it is very diﬃcult to cause the diﬀerence

at the output MDS matrix to have low Ham-

ming weight.

3. The attacker would most probably have to

ﬁnd a high-probability characteristic for each

of the four S-boxes s

0

, . . . , s

3

at the same time.

We have seen earlier that attaining DP

max

for even one S-box requires exceptional luck—

only about 2

−12

of all 128-bit keys attain

DP

max

= 18/256 at any one S-box—and at-

taining DP

max

at all four S-boxes places even

more constraints on the key. Of course, even

for the small class of weak keys where this con-

straint is satisﬁed, the attacker would have to

learn which input and output diﬀerences for

the S-box attain that maximum, and iden-

tify a way to piece them together into full 12-

round characteristics. Since the S-boxes are

key-dependent, even the former task is non-

trivial, and the latter may well be impossible

in most cases.

Therefore, we believe that any realistic diﬀerential

attack is likely to require many more texts, in prac-

tice, than our bound might indicate.

A brief note on the implications of these results is in

order. Recall that our model is an imperfect ideal-

ization of Twoﬁsh; in some cases, the abstraction has

introduced inaccuracies. The right way to interpret

these results, then, is to conclude that any diﬀer-

ential attack which respects this model is extremely

unlikely to succeed.

This provides compelling evidence that attackers

who respect our model will probably get nowhere.

But what about the adversary who “breaks the

rules”?

We believe that piecing together an attack that

avoids the diﬃculties imposed by our model will not

be easy. There are three choices:

1. Try to take advantage of the one-bit rotations.

We believe that the one-bit rotations make

cryptanalysis harder, if they have any eﬀect at

all. By forcing neighboring characteristics for

the round function to “line up” with shifted

versions of each other, the one-bit rotation

makes it very hard to piece together several

short characteristics into a useful characteris-

tic covering many rounds. As an important

special case, this should make good iterative

characteristics very hard to ﬁnd (if they even

exist). As another important special case, if s

0

(say) has only one input diﬀerence with proba-

bility DP

max

, it makes it diﬃcult to use that

input diﬀerence every time that s

0

is active;

this serves to increase the diﬃculty of ﬁnding

high-probability characteristics even further.

2. Try to model the PHT more accurately than

we have done. This might be an promising

avenue for further exploration. However, the

characteristics for the PHT which we mod-

eled inaccurately are hard to use in real at-

tacks: they are rare (which means they are

hard to piece together with good character-

istics for other cipher components), and they

typically have a very low probability.

3. Try to do both at the same time. This should

be even harder than either of the above.

While our results with this model do not rule out

the possibility that such an approach to diﬀeren-

tial cryptanalysis might be made to work, it seems

clear that the analyst will be hard-pressed to achieve

much success.

In short, all evidence available to us suggests that

diﬀerential attacks on Twoﬁsh are likely to be well

out of reach for the foreseeable future.

43

8.3.1 Diﬀerential characteristics of F

We ran a test for diﬀerential characteristics of the F

function with one active byte in each 4-byte g input

and at most 4 active bytes at the 8-byte F output.

We found some interesting diﬀerentials. For exam-

ple:

Prob. Diﬀerential

2

−20

x000 0x00 → 0000 xxxx

2

−26

x000 0x00 → 0000 0xxx

2

−22

x000 0x00 → xxxx 0000

2

−26

000x x000 → 0000 xx0x

2

−21

000x x000 → 0000 xxxx

2

−22

000x x000 → xxxx 0000

2

−20

00x0 000x → 0000 xxxx

2

−22

00x0 000x → xxxx 0000

2

−14

0x00 0000 → 0x0x 0x0x

These probabilities are very approximate as they de-

rive from numerical experiments with not too many

samples. Note that in all but the last of these, the

two active input bytes are going through the same

key-dependent S-box due to the 8-bit rotate at the

input of the second g function. The leading cause of

all of these diﬀerentials seems to be that these two

bytes generate the same output diﬀerential from the

S-box, and thus the same diﬀerential at the output

of the MDS matrix. With some positive probability,

the PHT maps this into an output diﬀerential where

one half is zero. This probability depends on the bit

pattern of the diﬀerential just before the PHT.

The last characteristic has a diﬀerent explanation.

The input diﬀerential results in a diﬀerential at the

output of the MDS matrix that has a relatively high

chance of being converted into a 0x0x pattern after a

constant is added to it. For example, the diﬀerential

03f2037a

16

has a fair chance of being converted to

a 0x0x pattern after a 32-bit addition.

All of these characteristics have a relatively low

probability. We have not found any attack that di-

rectly relates to this property, but it provides a use-

ful starting point for further research. In particular,

upper bounds on the diﬀerential characteristics of F

of this type could be used to improve our bound on

the best diﬀerential characteristic for the full cipher.

8.4 Linear Cryptanalysis

We perform a similar analysis to that of Section 8.3

in the context of linear cryptanalysis. The approach

is much the same, as are the results.

Our model is much as before: it is centered around

the pattern of active bytes, and it involves a few

inaccuracies in special cases. (Here a byte position

is considered active if any of its bits are active in

the linear approximation for that round; i.e., if the

mask Γ selects at least one bit from that byte posi-

tion.) Our characterization of linear approximations

for the PHT was a bit weaker than the correspond-

ing analysis for diﬀerential attacks; as a result, our

model is less accurate than before. In particular:

1. As before, the one-bit rotations are not always

treated properly.

2. More importantly, our model of the PHT fails

to accurately treat some low-probability ap-

proximations for the PHT where there are

fewer active bytes at the output than at the

input.

Because of the second limitation, our model should

be considered only a heuristic; it may fail to capture

some important features of the round function.

A search for the best linear characteristics within

this model found that every 12-round characteristic

has at least 36 active S-boxes. Here is an example

of one characteristic we found:

1 FE->FF (1 active)

2 FF<-FF (2 active)

3 FF->DD (4 active)

4 CC<-DD (6 active)

5 CC->00 (0 active)

6 CC<-00 (6 active)

7 CC->77 (4 active)

8 00<-77 (0 active)

9 00->77 (4 active)

10 66<-77 (6 active)

11 66->00 (0 active)

12 66<-00 (3 active)

13 66 07

The ﬁrst few approximations for the F function are

01

16

→ FF

16

, 22

16

→ FF

16

, 33

16

→ DD

16

, and so on.

To translate these results into an estimated attack

complexity, we turn to the results of Section 7.2.3.

We have LP

max

≤ (108/256)

2

, so this suggests an

attacker would need at least (LP

max

)

−36

≈ 2

89.6

known plaintexts; and such an analysis would only

work for a very small class of weak keys (represent-

ing about 2

−49.6

of the keyspace for 128-bit keys, or

about 2

−68

of the 192-bit keyspace).

It is much more natural to demand that the attack

work for a signiﬁcant percentage of all keys. In this

44

case, LP

max

= (80/256)

2

is a much more represen-

tative value, and thus in this model any linear at-

tack working for a signiﬁcant fraction of the keyspace

would require at least 2

120.8

chosen plaintexts.

These results are only heuristic estimates, and prob-

ably overestimate the probability of the best linear

characteristic. Nonetheless, they present some use-

ful evidence for the security of Twoﬁsh against linear

cryptanalysis.

8.4.1 Multiple Linear Approximations

Multiple linear approximations [KR94, KR95] allow

one to combine the bias of several high-probability

linear approximations. However, it only provides a

signiﬁcant advantage over traditional linear crypt-

analysis when there are a number of linear approxi-

mations whose bias is close to that of the best linear

approximation. In practice, this seems to improve

linear attacks by a small constant factor. Hence, we

do not feel that Twoﬁsh is vulnerable to this kind of

cryptanalysis.

8.4.2 Non-linear Cryptanalysis

Another generalization of linear cryptanalysis looks

at non-linear relations [KR96a]: e.g., quadratic re-

lations. While this attack, combined with the tech-

nique of multiple approximations [KR94], managed

to improve the best linear attack against DES a

minute amount [SK98], we do not believe it can be

brought to bear against Twoﬁsh for the same reasons

that it is immune to linear cryptanalysis.

8.4.3 Generalized Linear Cryptanalysis

This generalization of linear cryptanalysis uses the

notion of binary I/O sums [HKM95, Har96, JH96].

An attacker attempts to ﬁnd a statistical imbalance

that can be described as the result of some group op-

eration on some function of the plaintext and some

function of the ciphertext. We have not found any

such statistical imbalances, and believe Twoﬁsh to

be immune to this kind of analysis.

8.4.4 Partitioning Cryptanalysis

Partitioning cryptanalysis is another generalization

of linear cryptanalysis [Har96, JH96, HM97].

20

An

attacker trying to carry out a partitioning attack

is generally trying to ﬁnd some way of partitioning

the input and output spaces of the round function

so that knowledge of which partition the input to a

round is in gives some information about which par-

tition the output from a round is in. This can be seen

as a general form of a failure of the block cipher to

get good confusion; an attacker after N rounds can

distinguish the output from the Nth round from a

random block of bits, because the output is some-

what more likely to be in one speciﬁc partition than

in any of the others. This can be used in a straight-

forward way to attack the last round of the cipher.

We have been unable to ﬁnd any useful way to par-

tition the input and output spaces of the Twoﬁsh F

function or a Twoﬁsh round that works consistently

across many keys, because of the key-dependent S-

boxes. For the 128-bit key case, there are 2

64

diﬀer-

ent F functions, presumably each with its own most

useful partitioning. We are not aware of any general

way to partition F’s inputs and outputs to facilitate

such attacks.

8.4.5 Diﬀerential-linear Cryptanalysis

Diﬀerential-linear cryptanalysis uses a combination

of techniques from both diﬀerential and linear crypt-

analysis [LH94]. Due to the need to cover the

last part of the cipher with two copies of a lin-

ear characteristic, the bias of the linear character-

istic is likely to be extremely small unless the lin-

ear portion of the attack is conﬁned to just three

or four rounds. (The available linear characteristics

for Twoﬁsh’s round function have a relatively low

probability, and are very hard to combine due to

the MDS and PHT mappings.) This means that the

cryptanalyst would need to cover almost all of the

rounds with the diﬀerential characteristic, making

a diﬀerential-linear analysis not much more power-

ful than a purely diﬀerential analysis. Therefore, we

feel that diﬀerential-linear cryptanalysis is unlikely

to be successful against the Twoﬁsh structure. In

our analysis, we have found no diﬀerential-linear at-

tacks that work against Twoﬁsh.

8.5 Interpolation Attack

The interpolation attack [JK97, MSK98b] is eﬀective

against ciphers that use simple alegbraic functions.

The principle of the attack is simple: if the cipher-

text can be represented as a polynomial or rational

expresson (with N coeﬃcients) of the plaintext, then

the polynomial or rational expression can be recon-

structed using N plaintext/ciphertext pairs.

20

Similar ideas can be found in [Vau96b].

45

However, interpolation attacks are often only work-

able against ciphers with a very small number of

rounds, or against ciphers whose rounds functions

have very low algebraic degree. Twoﬁsh’s S-boxes

already have relatively large algebraic degree, and

the combination of operations from diﬀerent alge-

braic groups (including both addition mod 2

32

and

xor) should help increase the degree even further.

Therefore, we believe that Twoﬁsh is secure against

interpolation attacks after even only a small number

of rounds.

8.6 Partial Key Guessing Attacks

A good key schedule should have the property that,

when an attacker guesses some subset of the key bits,

he does not learn very much about the subkey se-

quence or other internal operations in the cipher.

The Twoﬁsh key schedule has this property.

Consider an attacker who guesses the even words

of the key M

e

. He learns nothing of the key S to g.

For each round subkey block, he now knows A

i

. If he

guesses K

0

, he can compute the corresponding K

1

.

He can carry this attack out against as many round

subkeys as he likes, but each guess takes 32 bits. We

can see no way for the attacker to actually test the

96-bit guess that it would take to attack even one

round’s subkey in this way on the full Twoﬁsh.

An alternative is to guess the key input S to g. This

is only half the length of the full key M, but provides

no information about the round keys K

i

. The dif-

ferential attack described in Section 8.1 is the best

way we were able to ﬁnd to test such a partial key

guess. We can see no way to test a guess of S on the

full sixteen round Twoﬁsh.

8.7 Related-key Cryptanalysis

Related-key cryptanalysis [Bih94, KSW96, KSW97]

uses a cipher’s key schedule to break plaintexts en-

crypted with related keys. In its most advanced

form, diﬀerential related-key cryptanalysis, both

plaintexts and keys with chosen diﬀerentials are used

to recover the keys. This type of analysis has had

considerable success against ciphers with simplistic

key schedules—e.g., GOST [GOST89] and 3-Way

[DGV94b]—and is a realistic attack in some circum-

stances. A conventional attack is usually judged

in terms of the number of plaintexts or ciphertexts

needed for the attack, and the level of access to the

cipher needed to get those texts (e.g., known plain-

text, chosen plaintext, adaptive chosen plaintext).

8.7.1 Resistance to Related-key Slide At-

tacks

A “slide” attack occurs in an iterated cipher when

the encryption of one block for rounds 1 through n

is the same as the encryption of another block for

rounds s + 1 to s + n. An attacker can look at

two encryptions, and can slide the rounds forward

in one of them relative to another. S-1 [Anon95]

can be broken with a slide attack [Wag95a]. Travois

[Yuv97] has identical round functions, and can also

be broken with a slide attack. Conventional slide at-

tacks allow one to break the cipher with only known-

or chosen-plaintext queries; however, as we shall see

next, there is a generalization to related-key attacks

as well.

Related-key slide attacks were ﬁrst discovered by Bi-

ham in his attack on a DES variant [Bih94]. To

mount a related-key slide attack on Twoﬁsh, an at-

tacker must ﬁnd a pair of keys M, M

∗

such that

the key-dependent S-boxes in g are unchanged, but

the subkey sequences slide down one round. This

amounts to ﬁnding, for each of the eight byte-

permutations used for subkey generation, a change

in the keys such that:

s

i

(j, M) = s

i

(j + 2s, M

∗

)

for n values of j. In total, this requires 8n of these

relations to hold.

Let us look in more detail for a ﬁxed key M. Let

m ∈ {5, . . . , 8} be the number of S-boxes used to

compute the round keys that are aﬀected by the dif-

ference between M and M

∗

. Observe that m ≥ 5

due to the restriction that S cannot change and the

properties of the RS matrix that at least 5 inputs

must change to keep the output constant. There are

at most

_

8

m

_

2

32m−128

possible choices of M

∗

. We

have a total of nm 8-bit relations that need to be sat-

isﬁed. The expected number of M

∗

that satisfy these

relations is thus

_

8

m

_

· 2

−8nm+32m−128

. For n ≥ 4 this

is dominated by the case m = 5; we will ignore the

other cases for now. So for each M we can expect

about 2

38−40n

keys M

∗

that support a slide attack

for n ≥ 4. This means that any speciﬁc key is un-

likely to support a slide attack with n ≥ 4. Over all

possible key pairs, we expect 2

293−40n

pairs M, M

∗

for which a slide of n ≥ 4 occurs. Thus, it is unlikely

that a pair exists at all with n ≥ 8.

Swapping Key Halves It is worth considering

what happens when we swap key halves. That is,

we swap the key bytes so that the values of M

e

and

M

o

are exchanged. In that case, the sequence of A

i

46

and B

i

values totally changes, because of the diﬀer-

ent index values used. We can see no useful attack

that could come from this.

Permuting the Subkeys Although there is no

known attack stemming from it, it is interesting to

ask whether there exist pairs of keys that give per-

mutations of one another’s subkeys. There are 20!

ways that the rounds’ subkey blocks could be per-

muted. This is almost as large as 2

64

, and so there

may very well be pairs of keys that give permuta-

tions of one another’s round subkey blocks.

8.7.2 Resistance to Related-key Diﬀerential

Attacks

A related-key diﬀerential attack seeks to mount a

diﬀerential attack on a block cipher through the

key, as well as or instead of through the plain-

text/ciphertext port. Against Twoﬁsh, such an at-

tack must control the subkey diﬀerence sequence for

at least the rounds in the middle. For the sake

of simplifying discussions of the attack, let us con-

sider an attacker who wants to put a chosen subkey

diﬀerence into the middle twelve rounds’ subkeys.

That is, he wants to change M to M

∗

, and control

D[i, M, M∗] for i = 12..35. At the same time, he

needs to keep the g function, and thus the key S,

from changing. All else being equal, the longer the

key, the more freedom an attacker has to mount a

related-key diﬀerential attack. We thus will assume

the use of 256-bit keys for the remainder of this sec-

tion. Note that a successful related-key attack on

128- or 192-bit keys that gets only zero subkey diﬀer-

ences in the rounds whose subkey diﬀerences it must

control translates directly to an equivalent related-

key attack on 256-bit keys.

Consider the position of the attacker if he attempts a

related-key diﬀerential attack with diﬀerent S keys.

This must result in diﬀerent g outputs for all inputs,

since we know that there are no pairs of S values

that lead to identical S-boxes. Assuming the pair of

S values does not lead to linearly related S-boxes, it

will not be possible to compensate for this change

in S with changes in the subkeys in single rounds.

The added diﬃculty is approximately that of adding

24 active S-boxes to the existing related-key attack.

For this reason, we believe that any useful related-

key attack will require a pair of keys that keeps S

unchanged.

8.7.3 The Zero Diﬀerence Case

The simplest related-key attack to analyze is the one

that keeps both S and also the middle twelve rounds’

subkeys unchanged. It thus seeks to generate iden-

tical A and B sequences for twelve rounds, and thus

to keep the individual byte sequences used to derive

A and B identical.

The RS code used to derive S from M strictly limits

the ways an attacker can change M without alter-

ing S. The attacker must try to keep the number of

active subkey generating S-boxes as low as possible,

since each active S-box is another constraint on his

attack. The attacker can keep the number of active

S-boxes down to ﬁve without altering S, and so this

is what he should do. With only the key bytes af-

fecting these ﬁve subkey generation S-boxes active,

he can alter between one and four bytes in all ﬁve S-

boxes; the nature of the RS matrix is that if he needs

to alter four bytes in any one of these S-boxes, he

must alter bytes in all ﬁve. In practice, in order to

maximize his control over the byte sequences gener-

ated by these S-boxes, he must alter four bytes in

all ﬁve active S-boxes.

To get zero subkey diﬀerences, the attacker must

get zero diﬀerences in the byte sequences generated

by all ﬁve active S-boxes. Consider a single such

byte sequence: the attacker tries to ﬁnd a pair of

four-byte key inputs such that they lead to identical

byte sequences in the middle twelve rounds, which

means the middle twelve bytes. There are 2

63

pairs

of key inputs from which to choose, and about 2

95

possible byte sequences available. If the byte se-

quences behave more-or-less like random functions

of the key inputs, this implies that it is extremely

unlikely that an attacker can ﬁnd a pair of key in-

puts that will get identical byte sequences in these

middle twelve rounds. We discuss this kind of anal-

ysis of byte sequences in Section 7.11.2. From this

analysis, we would not expect to see a pair of keys

for even one S-box with more than eight successive

bytes unchanged, and we would expect even eight

successive bytes of unchanged byte sequence to re-

quire control of all four key bytes into the S-box.

We would expect a speciﬁc pair of key bytes to be

required to generate these similar byte sequences.

To extend this to ﬁve active S-boxes, we expect there

to be, at best, a single pair of values for the twenty

active key bytes that leave the middle eight subkeys

unchanged.

47

8.7.4 Other Diﬀerence Sequences

An attacker who has control of the xor diﬀerence

sequences in A

i

, B

i

does not necessarily have great

control over the xor or modulo 2

32

diﬀerence se-

quence that appears in the subkeys.

First, we must consider the context of a related-key

diﬀerential attack. The attacker does not generally

know all of the key bytes generating either A

i

or B

i

.

Instead, he knows the xor diﬀerence sequence in A

i

and B

i

.

Consider an A

i

value with an xor diﬀerence of δ.

If the Hamming weight of δ is k, not including the

high-order bit, then the best estimate for the xor

diﬀerence that ends up in the two subkey words for

a given round generally has probability about 2

−2k

.

(Control of the A

i

, B

i

xor diﬀerence sequence does

not make controlling the subkey xor diﬀerences sub-

stantially easier.)

Consider an A

i

value with an xor diﬀerence of δ.

If the Hamming weight of δ is k, then the best esti-

mate for the modulo 2

32

diﬀerence of the two subkey

words for a given round has probability about 2

−k

.

This points out one of the diﬃculties in mount-

ing any kind of successful related-key attack with

nonzero A

i

, B

i

diﬀerence sequences. If an attacker

can ﬁnd a diﬀerence sequence for A

i

, B

i

that keeps

k = 3, and needs to control the subkey diﬀerences

for twelve rounds, he has a probability of about 2

−72

of getting the most likely xor subkey diﬀerence se-

quence, and about 2

−36

of getting the most likely

modulo 2

32

diﬀerence sequence.

8.7.5 Probability of a Successful Attack

With One Related-Key Query

We consider the use of the RS matrix in deriving

S from M to be a powerful defense against related-

key diﬀerential attacks, because it forces an attacker

to keep at least ﬁve key generation S-boxes active.

Our analysis suggests that any useful control of the

subkey diﬀerence sequence requires that each active

S-box in the attack have all four key bytes changed.

Further, our analysis suggests that, for nearly any

useful diﬀerence sequence, each active S-box in the

attack has a speciﬁc pair of deﬁning key bytes it

needs to work. An attacker specifying his key rela-

tion in terms of bytewise xor has ﬁve pairs of se-

quences of four key bytes each, which he wants to

get. This leaves him with a probability of a pair

of keys with his desired relation actually leading to

the desired attack of about 2

−115

, which moves the

attack totally outside the realm of practical attacks.

So long as an attacker is unable to improve this,

either by ﬁnding a way to get useful diﬀerence se-

quences into the subkeys without having so many ac-

tive key bytes, or by ﬁnding a way to mount related-

key attacks with diﬀerent S values for the diﬀerent

keys, we do not believe that any kind of related-key

diﬀerential attack is feasible.

Note the implication of this: clever ways to control a

couple extra rounds’ subkey diﬀerences are not going

to make the attacks feasible, unless they also allow

the attacker to use far fewer active key bytes. For

reference, note that with one altered key byte per ac-

tive subkey generation S-box, the attacker ends up

with a 2

−39

probability that a pair of related keys

will yield an attack; with two key bytes per active

S-box, this increases to 2

−78

; with three key bytes

per active S-box, it increases to 2

−117

. In practice,

this means that any key relation requiring more than

one byte of key changed per active S-box appears to

be impractical.

8.7.6 Conclusions

Our analysis suggests that related-key diﬀerential

attacks against the full Twoﬁsh are not workable.

Note, however, that we have spent less time work-

ing on resistance to chosen key attacks, such as will

be available to an attacker if Twoﬁsh is used in the

straightforward way to deﬁne a hash function. For

this reason, we recommend that more analysis be

done before Twoﬁsh is used in the straightforward

way as a hash function, and we note that it appears

to be much more secure to use Twoﬁsh in this way

with 128-bit keys than with 256-bit keys, despite the

fact that this also slows the speed of a hash function

down by a factor of two.

8.8 A Related-Key Attack on a

Twoﬁsh Variant

8.8.1 Overview of the Attack

Results We deﬁne a partial chosen key attack on

a 10-round Twoﬁsh variant without the pre- or post

xors. The attack requires us to control twenty se-

lected bytes of the key, and to set them diﬀerently for

a pair of related keys K, K

∗

. The remaining twelve

bytes of the two keys, which are the same for both

keys, are unknown to us; recovering them is the ob-

jective of the attack. This attack can be converted

into a related-key diﬀerential attack, but it requires

2

155

randomly selected related key pairs be tried be-

fore one will prove vulnerable to the attack.

48

The attack requires two keys chosen to have the de-

sired relation. Under K, it requires about 1024 cho-

sen plaintexts. Under K

∗

, it requires about 2

32

cho-

sen plaintexts, and about 2

11

adaptive chosen plain-

texts. The resulting attack recovers the twelve un-

known key bytes with about 2

32

eﬀort.

How The Attack Works We start by request-

ing one plaintext encrypted under key K, and 2

64

related plaintexts encrypted under K

∗

. We expect

one of these to be a right pair, which means it will

give the same result for its left ciphertext half under

K

∗

as the original plaintext block did under key K.

We request another plaintext from K and K

∗

to ver-

ify that this was really a right pair, rather than just

an accidental occurrence. We now vary the plain-

texts requested under K

∗

, to isolate and learn the

diﬀerential properties of each S-box in g. This allows

us to learn S. With knowledge of the active twenty

key bytes of the attack, and knowledge of S, we are

able to learn the remaining twelve bytes of key, and

thus to break the cipher.

8.8.2 Finding a Key Pair

The ﬁrst step in the attack is ﬁnding a pair of keys,

K, K

∗

, with some ﬁxed xor relationship, such that

we get a useful subkey diﬀerence sequence, while

leaving the S key unchanged. We described above

how such keys can exist. Here, we assume the exis-

tence of some pair of keys with a subkey xor diﬀer-

ence sequence zero in rounds 1..8 but with nonzero

xor subkey diﬀerences in rounds 0 and 9.

Structure of the Key Pair Based on the discus-

sion above, we assume that the keys K, K

∗

diﬀer in

twenty bytes, with the changed bytes selected in such

a way to leave S unchanged between the two keys,

and to change all four key bytes used to deﬁne ﬁve

of the S-boxes used for subkey generation. We note

that a random pair of keys chosen to have this prop-

erty has only a 2

−155

probability of being the pair

of keys that will generate our desired subkey diﬀer-

ence sequence. This makes this attack impractical as

a diﬀerential related-key attack, though an attacker

who is able to control the active twenty bytes of key

involved in a pair of keys could mount the attack for

a chosen pair of keys. (That is, the attacker could

choose the values of the active twenty bytes of both

K and K

∗

, while remaining ignorant of the other

twelve bytes of K and K

∗

, which must be identical

for both keys. The attacker would then attempt to

learn those remaining twelve bytes of key.)

Subkey Diﬀerences We actually care only about

the subkey diﬀerence in the round subkey words. Re-

call that the subkey words are generated from a pair

of 32-bit words, which are generated from the sub-

key generation S-boxes. Let’s call these values U, V

and U

∗

, V

∗

. We don’t know any of U, V, U

∗

, V

∗

, but

we do know U

= U ⊕U

∗

and V

= V ⊕V

∗

.

8.8.3 Choosing the Plaintexts to Request

From U

, V

**, it is possible to generate a set of about
**

2

32

xor diﬀerence values that we can expect to see

result from the changed subkeys when they’re used

in the F function in the ﬁrst round. We thus do the

following:

1. Request the encryption of some ﬁxed 128-bit

block A, B, C, D under key K.

2. For each of the 2

32

diﬀerent 64-bit xor dif-

ference values, X

i

, Y

i

, we might expect from

U

, V

**, request the encryption of 128-bit block
**

A, B, C ⊕X

i

, D ⊕Y

i

).

3. Consider only those ciphertexts from the sec-

ond step that lead to the same right half of

ciphertext as the encryption under K in the

ﬁrst step. We may need to request one more

encryption under both K and K

∗

, if we have

more than one such ciphertext. The goal is

to determine the output xor from the ﬁrst

round’s F function when the input is A, B,

under the two keys. (The only diﬀerence in

the output is caused by the diﬀerence in round

subkeys.)

8.8.4 Extracting the Key Material

At this point, we know the single xor diﬀerence from

the F function. We can now learn the S-boxes in the

g function. To do this, we replace the high-order

byte of A in the right pair with 256 diﬀerent values.

Thus, we request the encryption of A

j

, B, C, D un-

der K

∗

. This causes one of 256 possible xor dif-

ferences in the g function output. We expect to

have to try about 512 requests with diﬀerent xor

diﬀerence values X

i

, Y

i

, where these values are de-

rived from the X

i

, Y

i

values that generated the orig-

inal right pair based on the possible xor diﬀerences

in g. We thus request 512 plaintexts of the form

A

j

, B, C ⊕ X

i

, D ⊕ Y

i

) to be encrypted under K

∗

.

One of these, we expect to be a right pair, which we

can recognize, again, by the fact that the right half of

49

the ciphertext is the same for A

j

, B, C, D encrypted

under K, and for A

j

, B, C ⊕ X

i

, D ⊕ Y

i

encrypted

under K

∗

.

After carrying this out for each of the A

j

, we have

good information about the xor diﬀerences being

generated from g by the changes between A and A

j

as input. In particular, we will note the fourteen

cases where we probably got 8-bit Hamming weights

in the output xor diﬀerence of g. This information

should be enough to brute-force this S-box, s3. That

is, we try all 2

32

possible S-boxes for s3, and expect

one to be the best ﬁt for our data.

We repeat the attack for each of the other bytes in

A, so that we recover all four S-boxes in g. When

we know these S-boxes, this means we know S. Note

that at the beginning of the attack, we already know

twenty of the thirty-two key bytes used. More im-

portantly, due to the structure of the RS matrix used

to derive S from the raw key bytes, we know all but

three bytes used to derive each four byte “row” of

values in S. This allows us to guess and check the

remaining key bytes three bytes at a time, thus re-

covering the entire key.

8.9 Side-Channel Cryptanalysis and

Fault Analysis

Resistance to these attacks was not part of the AES

criteria, and hence not a major concern in this de-

sign. However, we do have these comments to make

on the design.

Side-channel cryptanalysis [KSWH98b] uses infor-

mation about the cipher in addition to the plaintext

or ciphertext. Examples include timing [Koc96],

power consumption (including diﬀerential power

analysis [Koc98]), NMR scanning, and electronic

emanations.

21

With many algorithms it is possi-

ble to reconstruct the key from these side channels.

While total resistance to side-channel cryptanalysis

is probably impossible, we note that Twoﬁsh exe-

cutes in constant time on most processors.

Fault analysis [BDL97, BS97] can be used to suc-

cessfully cryptanalyze this cipher. Again, we believe

that total resistance to fault analysis is an impossi-

ble design constraint for a cipher. The resistance to

fault analysis of any block cipher can be improved

using classical fault tolerance techniques.

8.10 Attacking Simpliﬁed Twoﬁsh

8.10.1 Twoﬁsh with Known S-boxes

As discussed above, our diﬀerential attack on

Twoﬁsh with ﬁxed S-boxes works for 6-round

Twoﬁsh, and requires only 2

67

eﬀort. A 7-round

diﬀerential attack requires 2

131

eﬀort.

Additionally, much of Twoﬁsh’s related-key attack

resistance comes from the derivation of the S-boxes,

so a related-key attack is much easier against a

known S-box variant.

8.10.2 Twoﬁsh Without Round Subkeys

We attack a Twoﬁsh variant without any sub-

keys, thus whose whole key material is in the key-

dependent S-boxes. The attack requires about 2

33

chosen plaintexts; it breaks the Twoﬁsh variant with

about 2

36

eﬀort, even when the cipher has a 128-bit

key. (Note that this is the amount of key material

used to deﬁne the S-boxes in normal Twoﬁsh with a

256-bit key.)

It is obvious that a Twoﬁsh variant with ﬁxed S-

boxes and no subkeys would be insecure—there

would be no key material injected. We develop a

slide attack on Twoﬁsh with any number of rounds,

and a 128-bit key. (The Twoﬁsh key would be 256

bits, but since we never use more than 128 bits of

key in the key-dependent S-boxes, the eﬀective key

size is 128 bits.)

Note that this attack would not work if we used

round subkeys that were simply counter values, as

would happen if we used the identity permutation for

all the key-scheduling S-boxes. We are not currently

aware of any attack on such a Twoﬁsh variant.

Overview of the Attack In a slide attack, we at-

tempt to ﬁnd a pair of plaintexts, (P, P

∗

), such that

P

∗

has the same value as the intermediate value af-

ter one or more rounds of encrypting P. If we can

ﬁnd and expose this value, we gain direct access to

the results of a single round of the cipher; we then

attack that single round to recover the key.

Our attack has two parts:

1. We must ﬁrst ﬁnd a pair of plaintexts, (P, P

∗

),

such that P

∗

is the result of encrypting P with

one round.

2. We use this pair to derive four relations on g

and thus determine the speciﬁc S-boxes used

in g. This yields the eﬀective key of the cipher.

21

The NSA refers to this particular side channel as “TEMPEST.”

50

Finding Our Related Pair of Texts Twoﬁsh

is a Feistel cipher, operating on two 64-bit halves in

each block. We use the representation where the left

half of the block is the input to the Feistel function,

and the halves are swapped after each round. Let

(L

0

, R

0

) represent a given pair of 64-bit halves in-

put into the cipher, and let (L

N

, R

N

) represent the

resulting ciphertext. To mount this attack, we need

to ﬁnd some (L

0

, R

0

, L

1

) such that L

1

is the value

of the left half of the block after the ﬁrst round. To

test whether we have such a pair, we can try en-

crypting (L

0

, R

0

) and (L

1

, L

0

). When we have a

triple with the desired properties, we get (L

N

, R

N

)

from the ﬁrst encryption, and (L

N+1

, L

N

) from the

second encryption.

We use a trick by Biham [Bih94] to get such a pair

of plaintexts with only 2

33

chosen plaintexts: we

choose a ﬁxed L

0

, and encrypt it with 2

32

randomly

selected R

0

values as (L

0

, R

0

). We then encrypt the

same L

0

with 2

32

random R

1

values, as (R

1

, L

0

).

By the birthday paradox, we expect to see one pair

of values for which R

1

= R

0

⊕ F(L

0

). To ﬁnd this

pair, we sort the ciphertexts from the ﬁrst batch of

encryptions on their left halves, and the ciphertexts

from the second batch of encryptions on their right

halves. When we ﬁnd a match, the probability is

good that we have a pair with the desired property.

Extracting g Values Once we have this triple

(L

0

, R

0

, L

1

), we also have two relations on F:

F(L

0

) = R

0

⊕L

1

and F(L

N

) = R

N

⊕L

N+1

. Note,

however, that we do not yet have direct g values.

Instead, we have two instances of the results of com-

bining two g outputs with a PHT. Since we have the

actual PHT output values, we can simply undo the

PHT, yielding relations on g. Our pair has given us

two relations on F, and thus four relations on g.

Extracting the S-boxes The g outputs are the

result of applying four key-dependent S-boxes to the

input bytes, and then combining those bytes with an

MDS matrix multiply. Since the MDS multiply is

invertible, we invert it to get back the actual S-box

outputs for all four diﬀerent values. If those values

are all diﬀerent, then we have four bytes of output

from each S-box. We can try all 2

32

possible key

input values for each S-box, and see which ones are

consistent with the results; for most sets byte values,

only one or two S-boxes will match them. We thus

learn each key-dependent S-box in g, perhaps with

a couple of alternative values. We try all possible

alternatives against any plaintext/ciphertext values

we have available, and very quickly recover the cor-

rect S-boxes. Since this is the only key material in

this Twoﬁsh variant, the attack is done.

8.10.3 Twoﬁsh with Non-bijective S-boxes

We decided early in the design process to use purely

bijective S-boxes. One rationale was to ensure that

the 64-bit round function be bijective. That way,

iterative 2-round diﬀerential characteristics cannot

exist; when they do exist, they often result in

the highest-probability multi-round characteristic,

so avoiding them should help to reduce the risk of a

successful diﬀerential attack. Also, attacks based

on non-surjective round functions [BB95, RP95b,

RPD97, CWSK98] are sure to fail when the 64-bit

Feistel round function is bijective.

22

We argue here that this was a good design decision,

by showing that a Twoﬁsh variant which uses non-

bijective S-boxes is likely to be much easier to break.

Observe that when q

0

and q

1

are non-bijective, their

3-way composition into s

i

is likely to be even more

non-surjective than either of q

0

or q

1

on its own. It

is easily seen that the expected size of the range of

a random function on 8 bits (such as q

0

or q

1

) is

r

1

= 1 − (1 − 1/256)

256

≈ 1 − e

−1

≈ 0.632. When

we compose such a function twice, its expected range

becomes r

2

= 1−(1−1/256)

256r1

≈ 1−e

−r1

≈ 0.469;

and the expected size of the range of a 3-way compo-

sition will be r

3

≈ 1 −e

−r2

≈ 0.374. In other words,

we expect that only about 96 = 0.374 × 256 of all

possible 8-bit values can appear as the output of S.

Therefore, the output of the h function can attain

only about 2

26.3

≈ 96

4

possible values, and the 64-

bit Feistel function can attain only about 2

52.6

of all

2

64

possible outputs. This is certainly a rather large

certiﬁcational weakness. (This gets worse when the

key size grows, since the number of compositions of

the q functions gets larger.)

We point out a serious diﬀerential attack when us-

ing non-bijective S-boxes. Consider the probabil-

ity p

∆x

that a given input diﬀerence ∆x yields a

collision in the S-box output; i.e., ∆x → 0. Let

p =

∆x=0

p

∆x

/255 be the average probability

over all non-zero input diﬀerences, and let m =

max

∆x=0

p

∆x

be the maximum probability over all

non-zero input diﬀerences. We have Ep = 3/256;

also, Pr(p ≥ 2/256) ≈ 0.78, Pr(p ≥ 10/256) ≈ 0.02,

and Pr(p ≥ 16/256) ≈ 0.0002. As for the distri-

bution of m, empirically Em ≈ 11.7/256; experi-

ments suggest Pr(m ≥ 10/256) ≈ 0.975, Pr(m ≥

16/256) ≈ .04, and Pr(m ≥ 22/256) ≈ 0.0002.

22

Vaudenay’s attack on Blowﬁsh took advantage of non-bijectivity in the Blowﬁsh round function [Vau96a].

51

Consider a Twoﬁsh variant with non-bijective S-

boxes and no rotations. We obtain a 2-round iter-

ative diﬀerential characteristic with probability m,

and thus a 13-round diﬀerential characteristic with

probability m

6

. We ﬁnd that, for 97.5% of the keys,

one can break the variant with about 2

28

chosen

plaintexts; for 4% of the keys, it can be broken with

2

24

chosen plaintexts; and for a class of weak keys

consisting of 0.02% of the keys, the variant cipher

can be broken with roughly 2

21

chosen plaintexts.

(Of course, one can trade oﬀ the number of texts

needed against the size of the weak key class; the

ﬁgures given are just examples of points on a smooth

curve.)

Next we consider a Twoﬁsh variant with non-

bijective S-boxes, but with all rotations left intact.

If we look at any 2-round diﬀerential characteris-

tic whose ﬁrst round is the trivial characteristic and

whose second round involves just one active S-box,

we expect its probability to be about p. One dif-

ﬁculty is that the rotations prevent us from ﬁnding

an iterative 2-round characteristic. However, we can

certainly piece together 6.5 diﬀerent 2-round diﬀer-

ential characteristics (each of the right form so it will

have probability about p) to ﬁnd a 13-round char-

acteristic with expected probability p

6

. (The latter

probability can probably be improved to about 3p

6

,

due to the extra degrees of freedom, but as we are

only doing a back-of-the-envelope estimate anyway,

we will omit these considerations.) Thus, we can

ﬁnd a diﬀerential attack that succeeds with about

2

39

chosen plaintexts for the majority (about 78%)

of the keys; also, for 2% of the keys, this variant

can be broken with 2

28

chosen plaintexts; and for a

class of weak keys consisting of 0.02% of the keys,

the variant cipher can be broken with roughly 2

24

chosen plaintexts.

This analysis clearly shows the value of bijective S-

boxes in stopping diﬀerential cryptanalysis. Also,

it helps motivate the beneﬁt of the rotations: they

make short iterative characteristic much harder to

ﬁnd, thereby conferring (we hope) some additional

resistance against diﬀerential attacks.

9 Trap Doors in Twoﬁsh

We assert that Twoﬁsh has no trap doors. As de-

signers, we have made every eﬀort to make Twoﬁsh

secure against all known (and unknown) cryptanal-

yses, and we have made no eﬀort to build in a secret

way of breaking Twoﬁsh. However, there is no way

to prove this, and not much reason for anyone to

believe us. We can oﬀer some assurances.

In this paper, we have outlined all of the design ele-

ments of Twoﬁsh and our justiﬁcations for including

them. We have explained, in great detail, how we

chose Twoﬁsh’s “magic constants”: the RS code, q

0

,

q

1

, and the MDS matrix. There are no mysterious

design elements; everything has an explicit purpose.

Moreover, we feel that the use of key-dependent S-

boxes makes it harder to install a trap door into the

cipher. As diﬃcult as it is to create a trap door for a

particular set of carefully constructed S-boxes, it is

much harder to create one that works with all pos-

sible S-boxes or even a reasonably useful subset of

them (of relative size 2

−20

or so).

Additionally, any trap door would have to survive

16 rounds of Twoﬁsh. It would have to work even

though there is almost perfect diﬀusion in each

round. It would have to survive the pre- and post-

whitening. These design elements have long been

known to make any patterns diﬃcult to detect; trap

doors would be no diﬀerent.

None of this constitutes a proof. Any reasonable

proof of general security for a block cipher would

also prove P = NP. Rather than outlining the proof

here, we would likely skip the AES competition and

go collect our Fields Medal.

However, we have made headway towards a philo-

sophical proof. Assume for a moment that, despite

the diﬃculties listed above, we did manage to put

a trap door into Twoﬁsh. This would imply one of

two things:

One, that we have invented a powerful new cryptan-

alytic attack and have carefully crafted Twoﬁsh to

be resistant to all known attacks but vulnerable to

this new one. We cannot prove that this is not true.

However, we can point out that as cryptographers we

would achieve much more fame and glory by publish-

ing our powerful new cryptanalytic attack. In fact,

we would probably publish it along with this paper,

making sure Twoﬁsh is immune so that we can prof-

itably attack the other AES submissions.

The other possibility is that we have embedded a

trap door into the Twoﬁsh magic constants and then

transformed them by some means so that ﬁnding

them would be a statistical impossibility (see [Har96]

for some discussion of this possibility). The re-

sulting construction would seem immune to current

cryptanalytic techniques, but we as designers would

know a secret transformation rule that we could ap-

ply to facilitate cryptanalysis. Again, we cannot

prove that this is not true. However, it has been

shown that this type of cipher, called a “master-key

cryptosystem,” is equivalent to a public-key cryp-

tosystem [BFL96]. Again, as cryptographers we

52

would achieve far greater recognition by publishing

a public-key cryptosystem that is not dependent on

factoring [RSA78] or the discrete logarithm problem

[DH76, ElG85, NIST94]. And the resulting algo-

rithm’s dual capabilities as both a symmetric and

public-key algorithm would make it far more ﬂexi-

ble than the AES competition.

There is a large gap between a weakness that

is exploitable in theory and one that is ex-

ploitable in practice. Even the best attack

against DES (a linear-cryptanalysis-like attack com-

bining quadratic approximations and a multiple-

approximation method) requires just under 2

43

plaintext/ciphertext blocks [SK98], which is equiva-

lent to about 64 terabytes of plaintext/ciphertext en-

crypted under a single key. A useful trap door would

need to work with much less plaintext—a few thou-

sand blocks—or it would have to reduce the eﬀec-

tive keyspace to something on the order of 2

72

. We

believe that, given the quality of the public crypt-

analytic research community, it would be impossible

to put a weakness of this magnitude into a block

cipher and have it remain undetected through the

AES process. And we would be foolish to even try.

10 When is a Cipher Insecure?

More and more recent ciphers are being deﬁned

with a variable number of rounds: e.g., SAFER-K64

[Mas94], RC5, and Speed [Zhe97]. This means that

it is impossible to categorically state that a given

cipher construction is insecure: there might always

be a number of rounds n for which the cipher is still

secure. However, while this might theoretically be

true, this is not a useful engineering deﬁnition of “se-

cure.” After all, the user of the cipher actually has to

choose how many rounds to use. In a performance-

driven model, it is useful to compare ciphers of equal

speed, and compare their security, or compare ci-

phers of equal security and compare their speeds.

For example, FEAL-32 is secure against both diﬀer-

ential and linear attacks. Its speed is 65 clock cycles

per byte of encryption, which makes it less desirable

than faster, also secure, alternatives.

With that in mind, Table 9 gives performance met-

rics for block and stream ciphers on the Pentium

processor.

23

11 Using Twoﬁsh

11.1 Chaining Modes

All standard block-cipher chaining modes work with

Twoﬁsh: CBC, CFB, OFB, counter [NBS80]. We

are aware of no problems with using Twoﬁsh with

any commonly used chaining mode. (See [Sch96] for

a detailed comparison of the various modes of op-

eration.) A cryptanalyst considering OFB-, CFB-,

or CBC-mode encryption with Twoﬁsh may collapse

the pre- and post-xor of key material into a single

xor, but does not appear to beneﬁt much from this.

11.2 One-Way Hash Functions

The most common way of using a block cipher

as a hash function is a Davies-Meyer construction

[Win84]:

H

i

= H

i−1

⊕E

Mi

(H

i−1

)

There are ﬁfteen other variants [Pre93]. We believe

that Twoﬁsh can be used securely in any of these for-

mats; note, however, that the key schedule has been

analyzed mainly for related-key attacks, not for the

class of chosen-key attack that hash functions must

resist. Additionally, the 128-bit block size makes

straightforward use of the Davies-Meyer construc-

tion useful only when collision-ﬁnding attacks can

be expected to be unable to try 2

64

trial hashes to

ﬁnd a collision.

As keys that are non-standard sizes have equivalent

keys that are longer, any use of Twoﬁsh in a Davies-

Meyer construction must ensure that only a single

key length is used.

11.3 Message Authentication Codes

Any one-way hash function can be used to build

a message authentication code using existing tech-

niques [BCK96]. Again, we believe Twoﬁsh’s strong

key schedule makes it very suitable for these con-

structions.

11.4 Pseudo-Random Number Gen-

erators

Twoﬁsh can also be used as a primitive in a

pseudo-random number generator suitable for gen-

erating session keys, public-key parameters, proto-

col nonces, and so on [Plu94, KSWH98a, Gut98,

KSWH98c].

23

These metrics are based on theoretical analyses of the algorithms and actual hand-tooled assembly-language implementa-

tions [SW97, PRB98].

53

Algorithm Key Length Width (bits) Rounds Cycles Clocks/Byte

Twoﬁsh variable 128 16 8 18.1

Blowﬁsh variable 64 16 8 19.8

Square 128 128 8 8 20.3

RC5-32/16 variable 64 32 16 24.8

CAST-128 128 64 16 8 29.5

DES 56 64 16 8 43

Serpent 128, 192, 256 128 32 32 45

SAFER (S)K-128 128 64 8 8 52

FEAL-32 64, 128 64 32 16 65

IDEA 128 64 8 8 74

Triple-DES 112 64 48 24 116

Table 9: Performance of diﬀerent block ciphers (on a Pentium)

11.5 Larger Keys

Even though it would be straightforward to extend

the Twoﬁsh key schedule scheme to larger key sizes,

there is currently no deﬁnition of Twoﬁsh for key

lengths greater than 256 bits. We urge caution in

trying to extend the key length; our experience with

Twoﬁsh has taught us that extending the key length

can have important security implications.

11.6 Additional Block Sizes

There is no deﬁnition of Twoﬁsh for block lengths

other than 128 bits. While it may be theoretically

possible to extend the construction to larger block

sizes, we have not evaluated these constructions at

all. We urge caution in trying to extend the block

size; many of the constructions we use may not scale

well to 256 bits, 512 bits, or larger blocks.

11.7 More or Fewer Rounds

Twoﬁsh is deﬁned to have 16 rounds. We designed

the key schedule to allow natural extensions to more

or fewer rounds if and when required. We strongly

advise against reducing the number of rounds. We

believe it is safe to increase the number of rounds,

although we see no need to do so.

11.8 Family Key Variant: Twoﬁsh-

FK

We often see systems that use a proprietary vari-

ant of an existing cipher, altered in some hopefully

security-neutral way to prevent that system from in-

teroperating with standard implementations of the

cipher. A family key is a way of designing this into

the algorithm: each diﬀerent family key is used to

deﬁne a diﬀerent variant of the cipher. In some

sense, the family key is like an additional key to

the cipher, but in general, it is acceptable for the

family key to be very computationally expensive to

change. We would expect nearly all Twoﬁsh imple-

mentations that used any family key to use only one

family key.

Our goals for the family key algorithm are as follows:

• No family key variant should be substantially

weaker than the original cipher.

• Related-key attacks between diﬀerent un-

known but related family keys, or between

a known family key and the original cipher,

should be hard to mount.

• The family key should not merely reorder the

set of 128-by-128-bit permutations provided by

the cipher; it should change that set.

A Twoﬁsh family key is simply a 256-bit random

Twoﬁsh key. This key, FK, is used to derive several

blocks of bits, as follows:

1. Preprocessing Step:

This step is done once per family key.

(a) T

0

= FK.

(b) T

1

= (E

FK

(0), E

FK

(1)) ⊕FK.

(c) T

2

= E

FK

(2).

(d) T

3

= E

FK

(3).

(e) T

4

= First 8 bytes of E

FK

(4).

Note that, using our little-endian convention,

the small integers used as plaintext inputs

should occur in the ﬁrst plaintext byte.

54

2. Key Scheduling Step: This step is done once

per key used under this family key.

(a) Before subkey generation, T

0

is xored

into the key, using as many of the leading

bytes of T

0

as necessary.

(b) After subkey generation, T

2

is xored into

the pre-xor subkeys, T

3

is xored into the

post-xor subkeys, and T

4

is xored into

each round’s 64-bit subkey block.

(c) Before the cipher’s S-boxes are derived,

T

1

is xored into the key. Once again, we

use as many of the leading bytes of T

1

as

we need. Each byte of the key is then

passed through the byte permutation q

0

,

and the result is passed through the RS

matrix to get S. Note that the deﬁnition

of T

1

means that the eﬀect of the ﬁrst

xor of FK into the key is undone.

Note the properties of the alterations made by any

family key:

• The keyspace is simply permuted for the initial

subkey generation.

• The subkeys are altered in a simple way. How-

ever, there is strong evidence that this alter-

ation cannot occur by changing keys, based on

the same diﬀerence sequence analysis used in

discussing related-key attacks on Twoﬁsh.

• The S-boxes used in the cipher are altered in a

powerful way, but one which does not alter the

basic required properties of the key schedule.

Getting no changes in the S-boxes used in the

cipher still requires changing at least ﬁve bytes

of key, and those ﬁve bytes of key must change

ﬁve of the S-boxes used for subkey generation.

11.8.1 Analysis

Eﬀects of Family Keys on Cryptanalysis We

are not aware of any substantial diﬀerence in the

diﬃculty of cryptanalyzing the family key version

of the cipher rather than the regular version. The

cipher’s operations are unchanged by the family

key; only subkey and S-box generation are changed.

However, they are changed in simple ways; the S-

boxes are generated in exactly the same way as be-

fore, but the key material provided to them is pro-

cessed in a simple way ﬁrst; the round subkeys are

generated in the same way as before (again, with

the key material processed in a simple way ﬁrst), and

then have a constant 64-bit value xored in. Related-

key attacks of the kind we have been able to consider

are made slightly harder, rather than easier, by this

64-bit value. The new constants xored into the pre-

and post-xor subkeys simply permute the input and

output space of the cipher in a very simple way.

Related Keys Across Family Keys Related-

key attacks under the same family key appear, as

we said above, to be at least as hard as related-

key attacks in normal Twoﬁsh. There is still the

question, however, of whether there are interesting

related keys across diﬀerent family keys. It is hard

to see how such related keys would be used, but the

analysis may be worth pursuing anyway.

An interesting question, from the perspective of an

attacker, is whether there are pairs of keys that

give identical subkeys except for the constant val-

ues xored into them, and that also give identical

S-boxes. By allowing an attacker to put a constant

xor into all round subkeys, such pairs of keys would

provide a useful avenue of attack.

This can be done by ﬁnding a pair of related fam-

ily keys, FK, FK

∗

, which are identical in their ﬁrst

128 bits, and a pair of 128-bit cipher keys, M, M

∗

,

such that M generates the same set of S-boxes with

FK that M

∗

does with FK

∗

. For a random pair of

M, M

∗

values, this has probability 2

−64

of happen-

ing. Thus, an attacker given complete control over

M, M

∗

and knowledge of FK, FK

∗

can ﬁnd such a

pair of keys and family keys. However, this does not

seem to translate into any kind of clean related-key

attack; the attacker must actually choose the speciﬁc

keys used.

We do not consider this to be a valid attack against

the system. In general, related-key attacks between

family keys seem unrealistic, but one which also re-

quires the attacker to be able to choose speciﬁc key

values he is trying to recover is also pointless.

A Valid Related-key Attack An attacker can

also try to ﬁnd quadruples FK, FK

∗

, M, M

∗

such

that the subkey generation and the S-box genera-

tion both get identical values. This requires that

T

0

⊕M = T

∗

0

⊕M

∗

T

1

⊕M = T

∗

1

⊕M

∗

If FK, FK

∗

have the property that

T

0

⊕T

∗

0

= T

1

⊕T

∗

1

= δ

then related-key pairs M, M⊕δ will put a ﬁxed xor

diﬀerence into every round subkey, and may allow

55

some kind of related-key attack. Again, we do not

think this is an interesting attack; the attacker must

force the family keys to be chosen this way, since the

probability that any given pair of family keys will

work this way is (for 128-bit cipher keys) 2

−128

. We

do not expect such relationships to occur by chance

until about 2

64

family keys are in use.

12 Historical Remarks

Twoﬁsh originated from an attempt to take the origi-

nal Blowﬁsh design and modify it for a 128-bit block.

We wanted to leverage the speed and good diﬀusion

of Blowﬁsh, while also improving it where we could.

We wanted the new cipher to have a bijective F func-

tion, a much more eﬃcient key schedule, and to be

implementable in custom hardware and smart cards

in addition to 32-bit processors (i.e., have smaller

tables). And we wanted it to be even faster than

Blowﬁsh (per byte encrypted), if possible.

Initial thoughts were to have the Blowﬁsh round

structure operate on the four 32-bit subblocks in a

circular structure, but there were problems getting

the diﬀusion to work in both the encryption and

decryption directions. Having two parallel Blow-

ﬁsh round functions and letting them interact via a

two-dimensional Feistel structure ran into the same

problems. Our solution was to have a single Feistel

structure with two Blowﬁsh-like 32-bit round func-

tions and to combine them using a PHT (an idea

stolen from SAFER). This idea also provided nearly

complete avalanche during the round.

Round subkeys are required to avoid slide attacks

against identical round functions. We used addition

instead of xor to take advantage of the Pentium

LEA opcode and implement them in eﬀectively zero

time.

We used 8-by-8-bit S-boxes and an MDS matrix (an

idea stolen from Square, although Square uses a sin-

gle ﬁxed S-box) instead of random 8-by-32-bit S-

boxes, both to simplify the key schedule and ensure

that the g function is bijective. This construction

ensured that Twoﬁsh would be eﬃcient on 32-bit

processors (by precomputing the S-boxes and MDS

matrix into four 8-by-32-bit S-boxes) while still al-

lowing it to be computed on the ﬂy in smart cards.

And since our MDS matrix is only ever computed

in one direction, we did not have to worry about

the matrix’s eﬃciency in the reverse direction (which

Square had to consider).

The construction also gave us considerable perfor-

mance ﬂexibility. We worked hard to keep this ﬂex-

ibility, so implementers would have a choice of how

much key pre-processsing to do depending on the

amount of plaintext to be encrypted. And we tried

to maintain these tradeoﬀs for 32-bit microproces-

sors, 8-bit microprocessors, and custom hardware.

Since one goal was to be able to keep the com-

plete design in our heads, any complication that did

not have a clear purpose was deleted. Additional

complications we chose not to introduce were key-

dependent MDS matrices or round-dependent vari-

ations on the byte ordering and the PHT. We also

toyed with the idea of swapping 32-bit words within

the Feistel halves (something we called the “twist”),

but abandoned it because we saw no need for the

additional complexity.

We did keep in the one-bit rotations of the target

block, primarily to reduce the vulnerability to any

attack based on byte boundaries. The particular

manifestation of this one-bit rotation was due to a

combination of performance and cryptanalytic con-

cerns.

We considered using all the key bytes, rather than

just half, to deﬁne the key-dependent S-boxes. Un-

fortunately, this made the key setup time for high-

end machines unreasonably large, and also made en-

cryption too slow on low-end machines. By dropping

this to half the key bits, these performance ﬁgures

were improved substantially. By carefully selecting

how the key bytes are folded down to half their size

before being used to generate the cipher’s S-boxes,

we were able to ensure that pairs of keys with the

same S-boxes would have very diﬀerent subkey se-

quences.

The key schedule gave us the most trouble. We had

to resist the temptation to build a cryptographic key

schedule like the ones used by Blowﬁsh, Khufu, and

SEAL, because low-end implementations needed to

be able to function with minimal additional memory

and, if necessary, to compute the subkeys as needed

by the cipher. However, a simple key schedule can

be an important weak point in a cipher design, leav-

ing the whole cipher vulnerable to partial-key guess-

ing attacks, related-key attacks, or to attacks based

on waiting for especially weak keys to be selected

by a user. Though our ﬁnal key schedule is rather

complex, it is conceptually much simpler than many

of our intermediate designs. Reusing many of the

primitives of the cipher (each round’s subkeys are

generated by a computation nearly identical to the

one that goes on inside the round function, except

for the speciﬁc inputs involved) made it possible to

investigate properties of both the key schedule and

of the cipher at the same time.

56

We spent considerable time choosing q

0

and q

1

.

Since these provide the primary non-linearity in the

cipher, they had to be strong. We wanted to be

able to construct them algebraically, for applications

where storing 512 bytes of ﬁxed tables was not possi-

ble. In the end, we built permutations from random

parameters, and tested the permutations against our

required criteria.

And ﬁnally, we cryptanalyzed Twoﬁsh. We cryptan-

alyzed and cryptanalyzed and cryptanalyzed, right

up to the morning of the submission deadline. We’re

still cryptanalyzing; there’s no stopping.

13 Conclusions and Further

Work

We have presented Twoﬁsh, the rationale behind its

design, and the results of our initial cryptanalysis.

Design and cryptanalysis go hand in hand—it is im-

possible to do one without the other—and it is only

in the analysis that the strength of an algorithm can

be demonstrated.

During the design process, we learned several lessons

about cipher design:

• The encryption algorithm and key schedule

must be designed in tandem; subtle changes

in one aﬀect the other. It is not enough to de-

sign a strong round function and then to graft

a strong key schedule onto it (unless you are

satisﬁed with an ineﬃcient and inelegant con-

struction, like Blowﬁsh has); both must work

together.

• There is no such thing as a key-dependent S-

box, only a complicated multi-stage nonlin-

ear function that is implemented as a key-

dependent S-box for eﬃciency.

• Keys should be as short as possible. It is much

harder to design an algorithm with a long key

than an algorithm with a short key. Through-

out our design process, we found it easier to

design and analyze Twoﬁsh with a 128-bit key

than Twoﬁsh with a 192- or 256-bit key.

• Build a cipher with strong local encryption and

let the round function handle the global diﬀu-

sion. Designing Twoﬁsh in this manner made

it very hard to mount any statistical cryptan-

alytical attacks.

• Consider performance at every stage of the de-

sign. Having a code optimization guru on our

team from the beginning drastically changed

the way we looked at design tradeoﬀs, to the

ultimate beneﬁt of Twoﬁsh.

• Analysis can go on forever. If the submission

deadline were not on 15 June 1998, we would

still be cryptanalyzing and tweaking Twoﬁsh.

We believe Twoﬁsh to be an ideal algorithm choice

for AES. It is eﬃcient on large microprocessors,

smart cards, and dedicated hardware. The multiple

layers of performance tradeoﬀs in the key schedule

make it suitable for a variety of implementations.

And the attention to cryptographic detail in the

design—both the encryption function and the key

schedule—make it suitable as a codebook, output-

feedback and cipher-feedback stream cipher, one-

way hash function, and pseudo-random number gen-

erator.

We welcome any new cryptanalysis from the crypto-

graphic community. We plan on continuing to eval-

uate Twoﬁsh all through the AES selection process.

Speciﬁcally:

• Whether the number of rounds can safely be

reduced. At this point our best non-related-

key attack—a diﬀerential attack—can only

break ﬁve rounds. If no better attacks are

found after a few years, it may be safe to

reduce the number of rounds to 14 or even

12. Twelve-round Twoﬁsh can encrypt and de-

crypt data at about 250 clock cycles per block

on a Pentium, Pentium Pro, and Pentium II.

• Whether there are alternative ﬁxed tables that

increase security. We have chosen both the

MDS matrix and the ﬁxed permutations, q

0

and q

1

, to meet our mathematical require-

ments. In the event we ﬁnd better constants

that make Twoﬁsh even harder to cryptana-

lyze, we may want to revise the algorithm.

• Whether we can deﬁne a Twoﬁsh variant with

ﬁxed S-boxes. This variant would have a faster

key-setup time than the algorithm presented

here—about 1200 clock cycles instead of 7800

clock cycles on a Pentium Pro—and the same

encryption and decryption speeds. Further re-

search is required on what the ﬁxed S-boxes

would look like, and how much data could be

safely encrypted with this variant.

• Whether we can improve our lower bound on

the complexity of a diﬀerential attack.

57

Developing Twoﬁsh was a richly rewarding experi-

ence, and one of our most satisfying cryptographic

projects to date. We look forward to the next phase

of the AES selection process.

14 Acknowledgments

The authors would like to thank Carl Ellison, Paul

Kocher, and Randy Milbert, who read and com-

mented on drafts of the paper, and Beth Fried-

man, who copyedited the (almost) ﬁnal version of

the paper. Additionally, the authors would like

to thank NIST for initiating the AES process, and

Miles Smid, Jim Foti, and Ed Roback for putting

up with a never-ending steam of questions and com-

plaints about its details. This work has been funded

by Counterpane Systems and Hi/fn Inc.

References

[AB96a] R. Anderson and E. Biham, “Two Prac-

tical and Provably Secure Block Ciphers:

BEAR and LION,” Fast Software Encryp-

tion, Third International Workshop Proceed-

ings, Springer-Verlag, 1996, pp. 113–120.

[AB96b] R. Anderson and E. Biham, “Tiger: A Fast

New Hash Function,” Fast Software Encryp-

tion, Third International Workshop Proceed-

ings, Springer-Verlag, 1996, pp. 89–97.

[Ada97a] C. Adams, “Constructing Symmetric Ci-

phers Using the CAST Design Procedure,”

Designs, Codes and Cryptography, v.12, n.3,

Nov 1997, pp. 71–104.

[Ada97b] C. Adams, “DES-80,” Workshop on Selected

Areas in Cryptography (SAC ’97) Workshop

Record, School of Computer Science, Car-

leton University, 1997, pp. 160–171.

[AGMP96] G.

´

Alvarez, D. De la Guia, F. Montoya, and

A. Peinado, “Akelarre: A New Block Cipher

Algorithm,” Workshop on Selected Areas in

Cryptography (SAC ’96) Workshop Record,

Queens University, 1996, pp. 1–14.

[Anon95] Anonymous, “this looked like it might be in-

teresting,” sci.crypt Usenet posting, 9 Aug

1995.

[AT93] C.M. Adams and S.E. Tavares, “Designing

S-boxes for Ciphers Resistant to Diﬀerential

Cryptanalysis,” Proceedings of the 3rd Sym-

posium on State and Progress of Research in

Cryptography, Rome, Italy, 15–16 Feb 1993,

pp. 181–190.

[BAK98] E. Biham, R. Anderson, and L. Knud-

sen, “Serpent: A New Block Cipher Pro-

posal,” Fast Software Encryption, 5th In-

ternational Workshop Proceedings, Springer-

Verlag, 1998, pp. 222–238.

[BB93] I. Ben-Aroya and E. Biham, “Diﬀeren-

tial Cryptanalysis of Lucifer,” Advances in

Cryptology — CRYPTO ’93 Proceedings,

Springer-Verlag, 1994, pp. 187–199.

[BB94] E. Biham and A. Biryukov, “How to

Strengthen DES Using Existing Hardware,”

Advances in Cryptology — ASIACRYPT ’94

Proceedings, Springer-Verlag, 1994, pp. 398–

412.

[BB95] E. Biham and A. Biryukov, “An Improve-

ment of Davies’ Attack on DES,” Advances

in Cryptology — EUROCRYPT ’94 Proceed-

ings, Springer-Verlag, 1995, pp. 461–467.

[BB96] U. Blumenthal and S. Bellovin, “A Bet-

ter Key Schedule for DES-Like Ciphers,”

Pragocrypt ’96 Proceedings, 1996, pp. 42–54.

[BCK96] M. Bellare, R. Canetti, and H. Karwczyk,

“Keying Hash Functions for Message Au-

thentication,” Advances in Cryptology —

CRYPTO ’96 Proceedings, Springer-Verlag,

1996, pp. 1–15.

[BDL97] D. Boneh, R.A. DeMillo, and R.J. Lipton

“On the Importance of Checking Crypto-

graphic Protocols for Faults,” Advances in

Cryptology — EUROCRYPT ’97 Proceed-

ings, Springer-Verlag, 1997, pp. 37–51.

[BDR+96] M. Blaze, W. Diﬃe, R. Rivest, B. Schneier,

T. Shimomura, E. Thompson, and M.

Weiner, “Minimal Key Lengths for Symmet-

ric Ciphers to Provide Adequate Commercial

Security,” Jan 1996.

[BFL96] M. Blaze, J. Feigenbaum, and F. T.

Leighton, Master-Key Cryptosystems, DI-

MACS Technical Report 96-02, Rutgers Uni-

versity, Piscataway, 1996.

[Bih94] E. Biham, “New Types of Cryptanalytic At-

tacks Using Related Keys,” Journal of Cryp-

tology, v. 7, n. 4, 1994, pp. 229–246.

[Bih95] E. Biham, “On Matsui’s Linear Cryptanal-

ysis,” Advances in Cryptology — EURO-

CRYPT ’94 Proceedings, Springer-Verlag,

1995, pp. 398–412.

[Bih97] E. Biham, “A Fast New DES Implemen-

tation in Software,” Fast Software Encryp-

tion, 4th International Workshop Proceed-

ings, Springer-Verlag, 1997, pp. 260–271.

58

[BK98] A. Biryukov and E. Kushilevitz, “Improved

Cryptanalysis of RC5,” Advances in Cryp-

tology — EUROCRYPT ’98 Proceedings,

Springer-Verlag, 1998, pp. 85–99.

[BKPS93] L. Brown, M. Kwan, J. Pieprzyk, and J. Se-

berry, “Improving Resistance to Diﬀerential

Cryptanalysis and the Redesign of LOKI,”

Advances in Cryptology — ASIACRYPT ’91

Proceedings, Springer-Verlag, 1993, pp. 36–

50.

[BPS90] L. Brown, J. Pieprzyk, and J. Seberry,

“LOKI: A Cryptographic Primitive for Au-

thentication and Secrecy Applications,” Ad-

vances in Cryptology — AUSCRYPT ’90

Proceedings, Springer-Verlag, 1990, pp. 229–

236.

[Bro98] L. Brown, “Design of LOK97,” draft AES

submission, 1998.

[BS92] E. Biham and A. Shamir, “Diﬀerential

Cryptanalysis of Snefru, Khafre, REDOC II,

LOKI, and Lucifer,” Advances in Cryptol-

ogy — CRYPTO ’91 Proceedings, Springer-

Verlag, 1992, pp. 156–171.

[BS93] E. Biham and A. Shamir, Diﬀerential Crypt-

analysis of the Data Encryption Standard,

Springer-Verlag, 1993.

[BS95] M. Blaze and B. Schneier, “The MacGuf-

ﬁn Block Cipher Algorithm,” Fast Software

Encryption, Second International Workshop

Proceedings, Springer-Verlag, 1995, pp. 97–

110.

[BS97] E. Biham and A. Shamir, “Diﬀerential Fault

Analysis of Secret Key Cryptosystems,” Ad-

vances in Cryptology — CRYPTO ’97 Pro-

ceedings, Springer-Verlag, 1997, pp. 513–525.

[CDN95] G. Carter, E. Dawson, and L. Nielsen,

“DESV: A Latin Square Variation of DES,”

Proceedings of the Workshop on Selected Ar-

eas in Cryptography (SAC ’95), Ottawa,

Canada, 1995, pp. 158–172.

[CDN98] G. Carter, E. Dawson, and L. Nielsen,,

“Key Schedules of Iterative Block Ciphers,”

Third Australian Conference, ACISP ’98,

Springer-Verlag, to appear.

[Cla97] C.S.K. Clapp, “Optimizing a Fast Stream

Cipher for VLIW, SIMD, and Super-

scalar Processors,” Fast Software Encryp-

tion, 4th International Workshop Proceed-

ings, Springer-Verlag, 1997, pp. 273–287.

[Cla98] C.S.K. Clapp, “Joint Hardware/Software

Design of a Fast Stream Cipher,” Fast Soft-

ware Encryption, 5th International Work-

shop Proceedings, Springer-Verlag, 1998, pp.

75–92.

[CM98] H. Chabanne and E. Michon, “JER-

OBOAM” Fast Software Encryption, 5th In-

ternational Workshop Proceedings, Springer-

Verlag, 1998, pp. 49–59.

[Cop94] D. Coppersmith, “The Data Encryption

Standard (DES) and its Strength Against

Attacks,” IBM Journal of Research and De-

velopment, v. 38, n. 3, May 1994, pp. 243–

250.

[Cop98] D. Coppersmith, personal communication,

1998.

[CW91] T. Cusick and M.C. Wood, “The REDOC-II

Cryptosystem,” Advances in Cryptology —

CRYPTO ’90 Proceedings, Springer-Verlag,

1991, pp. 545–563.

[CWSK98] D. Coppersmith, D. Wagner, B. Schneier,

and J. Kelsey, “Cryptanalysis of

TWOPRIME,” Fast Software Encryption,

5th International Workshop Proceedings,

Springer-Verlag, 1998, pp. 32–48.

[Dae95] J. Daemen, “Cipher and Hash Function De-

sign,” Ph.D. thesis, Katholieke Universiteit

Leuven, Mar 95.

[DBP96] H. Dobbertin, A. Bosselaers, and B. Pre-

neel, “RIPEMD-160: A Strengthened Ver-

sion of RIPEMD,” Fast Software Encryp-

tion, Third International Workshop Proceed-

ings, Springer-Verlag, 1996, pp. 71–82.

[DC98] J. Daemen and C. Clapp, “Fast Hashing

and Stream Encryption with PANAMA,”

Fast Software Encryption, 5th Interna-

tional Workshop Proceedings, Springer-

Verlag, 1998, pp. 60–74.

[DGV93] J. Daemen, R. Govaerts, and J. Vandewalle,

“Block Ciphers Based on Modular Arith-

metic,” Proceedings of the 3rd Symposium

on: State and Progress of Research in Cryp-

tography, Fondazione Ugo Bordoni, 1993, pp.

80–89.

[DGV94a] J. Daemen, R. Govaerts, and J. Vande-

walle, “Weak Keys for IDEA,” Advances in

Cryptology — EUROCRYPT ’93 Proceed-

ings, Springer-Verlag, 1994, pp. 159–167.

[DGV94b] J. Daemen, R. Govaerts, and J. Vandewalle,

“A New Approach to Block Cipher Design,”

Fast Software Encryption, Cambridge Secu-

rity Workshop Proceedings, Springer-Verlag,

1994, pp. 18–32.

59

[DH76] W. Diﬃe and M. Hellman, “New Directions

in Cryptography,” IEEE Transactions on In-

formation Theory, v. IT-22, n. 6, Nov 1976,

pp. 644–654.

[DH79] W. Diﬃe and M. Hellman, “Exhaustive

Cryptanalysis of the NBS Data Encryption

Standard,” Computer, v. 10, n. 3, Mar 1979,

pp. 74–84.

[DK85] C. Deavours and L.A. Kruh, Machine Cryp-

tography and Modern Cryptanalysis, Artech

House, Dedham MA, 1985.

[DKR97] J. Daemen, L. Knudsen, and V. Rijmen,

“The Block Cipher Square,” Fast Soft-

ware Encryption, 4th International Work-

shop Proceedings, Springer-Verlag, 1997, pp.

149–165.

[ElG85] T. ElGamal, “A Public-Key Cryptosystem

and a Signature Scheme Based on Discrete

Logarithms,” IEEE Transactions on Infor-

mation Theory, v. IT-31, n. 4, 1985, pp. 469–

472.

[Fei73] H. Feistel, “Cryptography and Computer

Privacy,” Scientiﬁc American, v. 228, n. 5,

May 1973, pp. 15–23.

[Fer96] N. Ferguson, personal communication, 1996.

[FNS75] H. Feistel, W.A. Notz, and J.L. Smith,

“Some Cryptographic Techniques for

Machine-to-Machine Data Communications,

Proceedings on the IEEE, v. 63, n. 11, 1975,

pp. 1545–1554.

[FS97] N. Ferguson and B. Schneier, “Cryptanalysis

of Akelarre,” Workshop on Selected Areas in

Cryptography (SAC ’97) Workshop Record,

School of Computer Science, Carleton Uni-

versity, 1997, pp. 201–212.

[GC94] H. Gilbert and P. Chauvaud, “A Chosen-

Plaintext Attack on the 16-Round Khufu

Cryptosystem,” Advances in Cryptology —

CRYPTO ’94 Proceedings, Springer-Verlag,

1994, pp. 359–368.

[GOST89] GOST, Gosudarstvennyi Standard 28147-89,

“Cryptographic Protection for Data Pro-

cessing Systems,” Government Committee

of the USSR for Standards, 1989.

[Gut98] P. Gutmann, “Software Generation of Ran-

dom Numbers for Cryptographic Purposes,”

Proceedings of the 1998 Usenix Security

Symposium, 1998, pp. 243–257.

[Har96] C. Harpes, Cryptanalysis of Iterated Block

Ciphers, ETH Series on Information Pro-

cessing, v. 7, Hartung-Gorre Verlang Kon-

stanz, 1996.

[Haw98] P. Hawkes, “Diﬀerential-Linear Weak Key

Classes of IDEA,” Advances in Cryptology

— EUROCRYPT ’98 Proceedings, Springer-

Verlag, 1998, pp. 112–126.

[HKM95] C. Harpes, G. Kramer, and J. Massey,

“A Generalization of Linear Cryptanalysis

and the Applicability of Matsui’s Piling-up

Lemma,” Advances in Cryptology — EURO-

CRYPT ’95 Proceedings, Springer-Verlag,

1995, pp. 24–38.

[HKR+98] C. Hall, J. Kelsey, V. Rijmen, B. Schneier,

and D. Wagner, “Cryptanalysis of SPEED,”

unpublished manuscript, 1998.

[HKSW98] C. Hall, J. Kelsey, B. Schneier, and D.

Wagner, “Cryptanalysis of SPEED,” Finan-

cial Cryptography ’98 Proceedings, Springer-

Verlag, 1998, to appear.

[HM97] C. Harpes and J. Massey, “Partitioning

Cryptanalysis,” Fast Software Encryption,

4th International Workshop Proceedings,

Springer-Verlag, 1997, pp. 13–27.

[HT94] H.M. Heys and S.E. Tavares, “On the

Security of the CAST Encryption Algo-

rithm,” Canadian Conference on Electrical

and Computer Engineering, 1994, pp. 332–

335.

[Jeﬀ+76] T. Jeﬀerson et al., “Declaration of Indepen-

dence,” Philadelphia PA, 4 Jul 1776.

[JH96] T. Jakobsen and C. Harpes, “Bounds

on Non-Uniformity Measures for General-

ized Linear Cryptanalysis and Partitioning

Cryptanalysis,” Pragocrypt ’96 Proceedings,

1996, pp. 467–479.

[JK97] T. Jakobsen and L. Knudsen, “The Interpo-

lation Attack on Block Ciphers,” Fast Soft-

ware Encryption, 4th International Work-

shop Proceedings, Springer-Verlag, 1997, pp.

28–40.

[Kie96] K. Kiefer, “A New Design Concept for

Building Secure Block Ciphers,” Proceed-

ings of the 1st International Conference on

the Theory and Applications of Cryptogra-

phy, Pragocrypt ’96, CTU Publishing House,

1996, pp. 30–41.

[KKT94] T. Kaneko, K. Koyama, and R. Terada, “Dy-

namic Swapping Schemes and Diﬀerential

Cryptanalysis,” IEICE Tranactions, v. E77-

A, 1994, pp. 1328–1336.

60

[KLPL95] K. Kim, S. Lee, S. Park, and D. Lee, “Se-

curing DES S-boxes Against Three Robust

Cryptanalysis,” Proceedings of the Work-

shop on Selected Areas in Cryptography

(SAC ’95), Ottawa, Canada, 1995, pp. 145–

157.

[KM97] L.R. Knudsen and W. Meier, “Diﬀerential

Cryptanalysis of RC5,” European Transac-

tions on Communication, v. 8, n. 5, 1997,

pp. 445–454.

[Knu93a] L.R. Knudsen, “Cryptanalysis of LOKI,”

Advances in Cryptology — ASIACRYPT

’91, Springer-Verlag, 1993, pp. 22–35.

[Knu93b] L.R. Knudsen, “Cryptanalysis of LOKI91,”

Advances in Cryptology — AUSCRYPT ’92,

Springer-Verlag, 1993, pp. 196–208.

[Knu93c] L.R. Knudsen, “Iterative Characteristics of

DES and s

2

DES,” Advances in Cryptology

— CRYPTO ’92, Springer-Verlag, 1993, pp.

497–511.

[Knu94a] L.R. Knudsen, “Block Ciphers — Analysis,

Design, Applications,” Ph.D. dissertation,

Aarhus University, Nov 1994.

[Knu94b] L.R. Knudsen, “Practically Secure Feis-

tel Ciphers,” Fast Software Encryption,

Cambridge Security Workshop Proceedings,

Springer-Verlag, 1994, pp. 211–221.

[Knu95a] L.R. Knudsen, “New Potentially ‘Weak’

Keys for DES and LOKI,” Advances in

Cryptology — EUROCRYPT ’94 Proceed-

ings, Springer-Verlag, 1995, pp. 419–424.

[Knu95b] L.R. Knudsen, “Truncated and Higher Or-

der Diﬀerentials,” Fast Software Encryp-

tion, 2nd International Workshop Proceed-

ings, Springer-Verlag, 1995, pp. 196–211.

[Knu95c] L.R. Knudsen, “A Key-Schedule Weakness

in SAFER K-64,” Advances in Cryptology—

CRYPTO ’95 Proceedings, Springer-Verlag,

1995, pp. 274–286.

[Koc96] P. Kocher, “Timing Attacks on Implemen-

tations of Diﬃe-Hellman, RSA, DSS, and

Other Systems,” Advances in Cryptology —

CRYPTO ’96 Proceedings, Springer-Verlag,

1996, pp. 104–113.

[Koc98] P. Kocher, personal communication, 1998.

[KPL93] K. Kim, S. Park, and S. Lee, “Reconstruc-

tion of s

2

DES S-Boxes and their Immunity

to Diﬀerential Cryptanalysis,” Proceedings

of the 1993 Japan-Korea Workshop on In-

formation Security and Cryptography, Seoul,

Korea, 24–26 October 1993, pp. 282–291.

[KR94] B. Kaliski Jr., and M. Robshaw, “Linear

Cryptanalysis Using Multiple Approxima-

tions,” Advances in Cryptology — CRYPTO

’94 Proceedings, Springer-Verlag, 1994, pp.

26–39.

[KR95] B. Kaliski Jr., and M. Robshaw, “Linear

Cryptanalysis Using Multiple Approxima-

tions and FEAL,” Fast Software Encryption,

Second International Workshop Proceedings,

Springer-Verlag, 1995, pp. 249–264.

[KR96a] L. Knudsen and M. Robshaw, “Non-Linear

Approximations in Linear Cryptanalysis,”

Advances in Cryptology — EUCROCRYPT

’96, Springer-Verlag, 1996, pp. 224–236.

[KR96] J. Kilian and P. Rogaway, “How to Protect

DES Against Exhaustive Key Search,” Ad-

vances in Cryptology — CRYPTO ’96 Pro-

ceedings, Springer-Verlag, 1996, pp. 252–267.

[KR97] L.R. Knudsen and V. Rijmen, “Two Rights

Sometimes Make a Wrong,” Workshop on

Selected Areas in Cryptography (SAC ’97)

Workshop Record, School of Computer Sci-

ence, Carleton University, 1997, pp. 213–

223.

[KRRR98] L.R. Knudsen, V. Rijmen, R. Rivest, and

M. Robshaw, “On the Design and Security

of RC2,” Fast Software Encryption, 5th In-

ternational Workshop Proceedings, Springer-

Verlag, 1998, pp. 206–221.

[KSHW98] J. Kelsey, B. Schneier, C. Hall, and D. Wag-

ner, “Secure Applications of Low-Entropy

Keys,” Information Security. First Inter-

national Workshop ISW ’97 Proceedings,

Springer-Verlag, 1998, 121–134.

[KSW96] J. Kelsey, B. Schneier, and D. Wagner,

“Key-Schedule Cryptanalysis of IDEA, G-

DES, GOST, SAFER, and Triple-DES,” Ad-

vances in Cryptology — CRYPTO ’96 Pro-

ceedings, Springer-Verlag, 1996, pp. 237–251.

[KSW97] J. Kelsey, B. Schneier, and D. Wag-

ner, “Related-Key Cryptanalysis of 3-WAY,

Biham-DES, CAST, DES-X, NewDES, RC2,

and TEA,” Information and Communica-

tions Security, First International Confer-

ence Proceedings, Springer-Verlag, 1997, pp.

203–207.

[KSWH98a] J. Kelsey, B. Schneier, D. Wagner, and

C. Hall, “Cryptanalytic Attacks on Pseu-

dorandom Number Generators,” Fast Soft-

ware Encryption, 5th International Work-

shop Proceedings, Springer-Verlag, 1998, pp.

168–188.

61

[KSWH98b] J. Kelsey, B. Schneier, D. Wagner, and C.

Hall, “Side Channel Cryptanalysis of Prod-

uct Ciphers,” ESORICS ’98 Proceedings,

Springer-Verlag, 1998, to appear.

[KSWH98c] J. Kelsey, B. Schneier, D. Wagner, and C.

Hall, “Yarrow: A Pseudorandom Number

Generator,” in preparation.

[Kwa97] M. Kwan, “The Design of ICE Encryp-

tion Algorithm,” Fast Software Encryp-

tion, 4th International Workshop Proceed-

ings, Springer-Verlag, 1997, pp. 69–82.

[KY95] B.S. Kaliski and Y.L. Yin, “On Diﬀer-

ential and Linear Cryptanalysis of the

RC5 Encryption Algorithm,” Advances

in Cryptology—CRYPTO ’95 Proceedings,

Springer-Verlag, 1995, pp. 445–454.

[Lai94] X. Lai, “Higher Order Derivations and

Diﬀerential Cryptanalysis,” Communica-

tions and Cryptography: Two Sides of

One Tapestry, Kluwer Academic Publishers,

1994, pp. 227–233.

[LC97] C.-H. Lee and Y.-T. Cha, “The Block Ci-

pher: SNAKE with Provable Resistance

Against DC and LC Attacks,” Proceedings

of JW-ISC ’97, KIISC and ISEC Group of

IEICE, 1997, pp. 3–17.

[Lee96] M. Leech, “CRISP: A Feistel Network with

Hardened Key Scheduling,” Workshop on

Selected Areas in Cryptography (SAC ’96)

Workshop Record, Queens University, 1996,

pp. 15–29.

[LH94] S. Langford and M. Hellman, “Diﬀerential-

Linear Cryptanalysis,” Advances in Cryptol-

ogy — CRYPTO ’94 Proceedings, Springer-

Verlag, 1994, pp. 17–26.

[LM91] X. Lai and J. Massey, “A Proposal for a

New Block Encryption Standard,” Advances

in Cryptology — EUROCRYPT ’90 Proceed-

ings, Springer-Verlag, 1991, pp. 389–404.

[LMM91] X. Lai, J. Massey, and S. Murphy, “Markov

Ciphers and Diﬀerential Cryptanalysis,” Ad-

vances in Cryptology — CRYPTO ’91 Pro-

ceedings, Springer-Verlag, 1991, pp. 17–38.

[MA96] S. Mister and C. Adams, “Practical S-Box

Design,” Workshop on Selected Areas in

Cryptography (SAC ’96) Workshop Record,

Queens University, 1996, pp. 61–76.

[Mad84] W.E. Madryga, “A High Performance En-

cryption Algorithm,” Computer Security: A

Global Challenge, Elsevier Science Publish-

ers, 1984, pp. 557–570.

[Mas94] J.L. Massey, “SAFER K-64: A Byte-

Oriented Block-Ciphering Algorithm,” Fast

Software Encryption, Cambridge Security

Workshop Proceedings, Springer-Verlag,

1994, pp. 1–17.

[Mat94] M. Matsui, “Linear Cryptanalysis Method

for DES Cipher,” Advances in Cryptology

— EUROCRYPT ’93 Proceedings, Springer-

Verlag, 1994, pp. 386–397.

[Mat95] M. Matsui, “On Correlation Between the Or-

der of S-Boxes and the Strength of DES,”

Advances in Cryptology — EUROCRYPT

’94 Proceedings, Springer-Verlag, 1995, pp.

366–375.

[Mat96] M. Matsui, “New Structure of Block Ci-

phers with Provable Security Against Diﬀer-

ential and Linear Cryptanalysis,” Fast Soft-

ware Encryption, 3rd International Work-

shop Proceedings, Springer-Verlag, 1996, pp.

205–218.

[Mat97] M. Matsui, “New Block Encryption Al-

gorithm MISTY,” Fast Software Encryp-

tion, 4th International Workshop Proceed-

ings, Springer-Verlag, 1997, pp. 54–68.

[McD97] T.J. McDermott, “NSA comments on crite-

ria for AES,” letter to NIST, National Secu-

rity Agency, 2 Apr 97.

[Mer91] R.C. Merkle, “Fast Software Encryption

Functions,” Advances in Cryptology —

CRYPTO ’90 Proceedings, Springer-Verlag,

1991, pp. 476–501.

[MS77] F.J. MacWilliams and N.J.A. Sloane, “The

Theory of Error-Correcting Codes,” North-

Holland, Amsterdam, 1977.

[MSK98a] S. Moriai, T. Shimoyama, and T. Kaneko,

“Higher Order Diﬀerential Attack of a CAST

Cipher,” Fast Software Encryption, 5th In-

ternational Workshop Proceedings, Springer-

Verlag, 1998, pp. 17–31.

[MSK98b] S. Moriai, T. Shimoyama, and T. Kaneko,

“Interpolation Attacks of the Block Cipher:

SNAKE,” unpublished manuscript, 1998.

[Mur90] S. Murphy, “The Cryptanalysis of FEAL-

4 with 20 Chosen Plaintexts,” Journal of

Cryptology, v. 2, n. 3, 1990, pp. 145–154.

[NBS77] National Bureau of Standards, NBS FIPS

PUB 46, “Data Encryption Standard,” Na-

tional Bureau of Standards, U.S. Depart-

ment of Commerce, Jan 1977.

62

[NBS80] National Bureau of Standards, NBS FIPS

PUB 46, “DES Modes of Operation,” Na-

tional Bureau of Standards, U.S. Depart-

ment of Commerce, Dec 1980.

[NIST93] National Institute of Standards and Technol-

ogy, “Secure Hash Standard,” U.S. Depart-

ment of Commerce, May 1993.

[NIST94] National Institute of Standards and Tech-

nologies, NIST FIPS PUB 186, “Digital Sig-

nature Standard,” U.S. Department of Com-

merce, May 1994.

[NIST97a] National Institute of Standards and Technol-

ogy, “Announcing Development of a Federal

Information Standard for Advanced Encryp-

tion Standard,” Federal Register, v. 62, n. 1,

2 Jan 1997, pp. 93–94.

[NIST97b] National Institute of Standards and Technol-

ogy, “Announcing Request for Candidate Al-

gorithm Nominations for the Advanced En-

cryption Standard (AES),” Federal Register,

v. 62, n. 117, 12 Sep 1997, pp. 48051–48058.

[NK95] K. Nyberg and L.R. Knudsen, “Provable Se-

curity Against Diﬀerential Cryptanalysis,”

Journal of Cryptology, v. 8, n. 1, 1995, pp.

27–37.

[NM97] J. Nakajima and M. Matsui, “Fast Software

Implementation of MISTY on Alpha Pro-

cessors,” Proceedings of JW-ISC ’97, KIISC

and ISEC Group of IEICE, 1997, pp. 55–63.

[Nyb91] K. Nyberg, “Perfect Nonlinear S-boxes,” Ad-

vances in Cryptology — EUROCRYPT ’91

Proceedings, Springer-Verlag, 1991, pp. 378–

386.

[Nyb93] K. Nyberg, “On the Construction of

Highly Nonlinear Permutations,” Advances

in Cryptology — EUROCRYPT ’92 Proceed-

ings, Springer-Verlag, 1993, pp. 92–98.

[Nyb94] K. Nyberg, “Diﬀerentially Uniform Map-

pings for Cryptography,” Advances in Cryp-

tology — EUROCRYPT ’93 Proceedings,

Springer-Verlag, 1994, pp. 55–64.

[Nyb95] K. Nyberg, “Linear Approximation of Block

Ciphers,” Advances in Cryptology — EURO-

CRYPT ’94 Proceedings, Springer-Verlag,

1995, pp. 439–444.

[Nyb96] K. Nyberg, “Generalized Feistel Networks,”

Advances in Cryptology — ASIACRYPT ’96

Proceedings, Springer-Verlag, 1996, pp. 91–

104.

[OCo94a] L. O’Connor, “Enumerating Nondegener-

ate Permutations,” Advances in Cryptology

— EUROCRYPT ’93 Proceedings, Springer-

Verlag, 1994, pp. 368–377.

[OCo94b] L. O’Connor, “On the Distribution of Char-

acteristics in Bijective Mappings,” Advances

in Cryptology — EUROCRYPT ’93 Proceed-

ings, Springer-Verlag, 1994, pp. 360–370.

[OCo94c] L. O’Connor, “On the Distribution of Char-

acteristics in Composite Permutations,” Ad-

vances in Cryptology — CRYPTO ’93 Pro-

ceedings, Springer-Verlag, 1994, pp. 403–412.

[Plu94] C. Plumb, “Truly Random Numbers,” Dr.

Dobbs Journal, v. 19, n. 13, Nov 1994, pp.

113-115.

[Pre93] B. Preneel, Analysis and Design of Crypto-

graphic Hash Functions, Ph.D. dissertation,

Katholieke Universiteit Leuven, Jan 1993.

[PRB98] B. Preneel, V. Rijmen, A. Bosselaers, “Re-

cent Developments in the Design of Conven-

tional Cryptographic Algorithms,” State of

the Art and Evolution of Computer Security

and Industrial Cryptography, Lecture Notes

in Computer Science, B. Preneel, R. Gov-

aerts, J. Vandewalle, Eds., Springer-Verlag,

1998, to appear.

[QDD86] J.-J. Quisquater, Y. Desmedt, and M. Davio,

“The Importance of ‘Good’ Key Schedul-

ing Schemes,” Advances in Cryptology —

CRYPTO ’85 Proceedings, Springer-Verlag,

1986, pp. 537–542.

[RAND55] RAND Corporation, A Million Random Dig-

its with 100,000 Normal Deviates, Glencoe,

IL, Free Press Publishers, 1955.

[RC94] P. Rogaway and D. Coppersmith, “A

Software-Optimized Encryption Algorithm,”

Fast Software Encryption, Cambridge Secu-

rity Workshop Proceedings, Springer-Verlag,

1994, pp. 56–63.

[RC97] P. Rogaway and D. Coppersmith, “A

Software-Optimized Encryption Algo-

rithm,” full version of [RC94], available at

http://www.cs.ucdavis.edu/~rogaway/

papers/seal.ps, 1997.

[RDP+96] V. Rijmen, B. Preneel, A. Bosselaers, and E.

DeWin, “The Cipher SHARK,” Fast Soft-

ware Encryption, 3rd International Work-

shop Proceedings, Springer-Verlag, 1996, pp.

99–111.

[Rij97] V. Rijmen, Cryptanalysis and Design of Iter-

ated Block Ciphers, Ph.D. thesis, Katholieke

Universiteit Leuven, Oct 1997.

63

[RIPE92] Research and Development in Advanced

Communication Technologies in Europe,

RIPE Integrity Primitives: Final Report

of RACE Integrity Primitives Evaluation

(R1040), RACE, June 1992.

[Riv91] R.L. Rivest, “The MD4 Message Digest

Algorithm,” Advances in Cryptology —

CRYPTO ’90 Proceedings, Springer-Verlag,

1991, pp. 303–311.

[Riv92] R.L. Rivest, “The MD5 Message Digest Al-

gorithm,” RFC 1321, Apr 1992.

[Riv95] R.L. Rivest, “The RC5 Encryption Algo-

rithm,” Fast Software Encryption, 2nd In-

ternational Workshop Proceedings, Springer-

Verlag, 1995, pp. 86–96.

[Riv97] R. Rivest, “A Description of the RC2(r) En-

cryption Algorithm,” Internet-Draft, work

in progress, June 1997.

[Ros98] G. Rose, “A Stream Cipher Based on Lin-

ear Feedback over GF(2

8

),” Third Australian

Conference, ACISP ’98, Springer-Verlag, to

appear.

[RP95a] V. Rijmen and B. Preneel, “Cryptanalysis

of MacGuﬃn,” Fast Software Encryption,

Second International Workshop Proceedings,

Springer-Verlag, 1995, pp. 353–358.

[RP95b] V. Rijmen and B. Preneel, “On Weaknesses

of Non-surjective Round Functions,” Pro-

ceedings of the Workshop on Selected Ar-

eas in Cryptography (SAC ’95), Ottawa,

Canada, 1995, pp. 100–106.

[RPD97] V. Rijman, B. Preneel and E. DeWin, “On

Weaknesses of Non-surjective Round Func-

tions,” Designs, Codes, and Cryptography, v.

12, n. 3, 1997, pp. 253–266.

[RSA78] R. Rivest, A. Shamir, and L. Adleman,

“A Method for Obtaining Digital Signatures

and Public-Key Cryptosystems,” Communi-

cations of the ACM, v. 21, n. 2, Feb 1978,

pp. 120–126.

[SAM97] T. Shimoyama, S. Amada, and S. Moriai,

“Improved Fast Software Implementation of

Block Ciphers,” Information and Commu-

nications Security, First International Con-

ference, ICICS ’97 Proceedings, Springer-

Verlag, 1997, pp. 203–207.

[Sch94] B. Schneier, “Description of a New Variable-

Length Key, 64-Bit Block Cipher (Blow-

ﬁsh),” Fast Software Encryption, Cambridge

Security Workshop Proceedings, Springer-

Verlag, 1994, pp. 191–204.

[Sch96] B. Schneier, Applied Cryptography, Second

Edition, John Wiley & Sons, 1996.

[Sco85] R. Scott, “Wide Open Encryption Design

Oﬀers Flexible Implementation,” Cryptolo-

gia, v. 9, n. 1, Jan 1985, pp. 75–90.

[Sel98] A.A. Sel¸cuk, “New Results in Linear Crypt-

analysis of RC5,” Fast Software Encryp-

tion, 5th International Workshop Proceed-

ings, Springer-Verlag, 1998, pp. 1–16.

[SK96] B. Schneier and J. Kelsey, “Unbalanced

Feistel Networks and Block Cipher De-

sign,” Fast Software Encryption, 3rd In-

ternational Workshop Proceedings, Springer-

Verlag, 1996, pp. 121–144.

[SK98] T. Shimoyama and T. Kaneko, “Quadratic

Relation of S-box and Its Application to

the Linear Attack of Full Round DES,” Ad-

vances in Cryptology — CRYPTO ’98 Pro-

ceedings, Springer-Verlag, 1998, in prepara-

tion.

[SM88] A. Shimizu and S. Miyaguchi, “Fast Data

Encipherment Algorithm FEAL,” Advances

in Cryptology — EUROCRYPT ’87 Proceed-

ings, Springer-Verlag, 1988, pp. 267–278.

[SMK98] T. Shimoyama, S. Moriai, and T. Kaneko,

“Improving the Higher Order Diﬀerential

Attack and Cryptanalysis of the KN Ci-

pher,” Information Security. First Inter-

national Workshop ISW ’97 Proceedings,

Springer-Verlag, 1998, pp. 32–42.

[SV98] J. Stern and S. Vaudenay, “CS-Cipher,”

Fast Software Encryption, 5th Interna-

tional Workshop Proceedings, Springer-

Verlag, 1998, pp. 189–205.

[SW97] B. Schneier and D. Whiting, “Fast Soft-

ware Encryption: Designing Encryption Al-

gorithms for Optimal Speed on the Intel

Pentium Processor,” Fast Software Encryp-

tion, 4th International Workshop Proceed-

ings, Springer-Verlag, 1997, pp. 242–259.

[Vau95] S. Vaudenay, “On the Need for Multi-

permutations: Cryptanalysis of MD4 and

SAFER,” Fast Software Encryption, Sec-

ond International Workshop Proceedings,

Springer-Verlag, 1995, pp. 286–297.

[Vau96a] S. Vaudenay, “On the Weak Keys in Blow-

ﬁsh,” Fast Software Encryption, 3rd Inter-

national Workshop Proceedings, Springer-

Verlag, 1996, pp. 27–32.

64

[Vau96b] S. Vaudenay, “An Experiment on DES Sta-

tistical Cryptanalysis,” 3rd ACM Confer-

ence on Computer and Communications Se-

curity, ACM Press, 1996, pp. 139–147.

[Wag95a] D. Wagner, “Cryptanalysis of S-1,” sci.crypt

Usenet posting, 27 Aug 1995.

[Wag95b] D. Wagner, personal communication, 1995.

[WH87] R. Winternitz and M. Hellman, “Chosen-key

Attacks on a Block Cipher,” Cryptologia, v.

11, n. 1, Jan 1987, pp. 16–20.

[Whe94] D. Wheeler, “A Bulk Data Encryp-

tion Algorithm,” Fast Software Encryption,

Cambridge Security Workshop Proceedings,

Springer-Verlag, 1994, pp. 127–134.

[Wie94] M.J. Wiener, “Eﬃcient DES Key Search,”

TR-244, School of Computer Science, Car-

leton University, May 1994.

[Win84] R.S. Winternitz, “Producing One-Way Hash

Functions from DES,” Advances in Cryp-

tology: Proceedings of Crypto 83, Plenum

Press, 1984, pp. 203–207.

[WN95] D. Wheeler and R. Needham, “TEA, a Tiny

Encryption Algorithm,” Fast Software En-

cryption, 2nd International Workshop Pro-

ceedings, Springer-Verlag, 1995, pp. 97–110.

[WSK97] D. Wagner, B. Schneier, and J. Kelsey,

“Cryptanalysis of the Cellular Message En-

cryption Algorithm,” Advances in Cryptol-

ogy — CRYPTO ’97 Proceedings, Springer-

Verlag, 1997, pp. 526–537.

[YLCY98] X. Yi, K.Y. Lam, S.X. Cheng, and X.H.

You, “A New Byte-Oriented Block Cipher,”

Information Security. First International

Workshop ISW ’97 Proceedings, Springer-

Verlag, 1998, 209–220.

[YMT97] A.M. Youssef, S. Mister, and S.E. Tavares,

“On the Design of Linear Transformations

for Substitution Permutation Encryption

Networks,” Workshop on Selected Areas in

Cryptography (SAC ’97) Workshop Record,

School of Computer Science, Carleton Uni-

versity, 1997, pp. 40–48.

[YTH96] A.M. Youssef, S.E. Tavares, and H.M. Heys,

“A New Class of Substitution-Permutation

Networks,” Workshop on Selected Areas in

Cryptography (SAC ’96) Workshop Record,

Queens University, 1996, pp. 132–147.

[Yuv97] G. Yuval, “Reinventing the Travois: Encry-

tion/MAC in 30 ROM Bytes,” Fast Soft-

ware Encryption, 4th International Work-

shop Proceedings, Springer-Verlag, 1997, pp.

205–209.

[ZG97] F. Zhu and B.-A. Guo, “A Block-Ciphering

Algorithm Based on Addition-Multiplication

Structure in GF(2

n

),” Workshop on Selected

Areas in Cryptography (SAC ’97) Workshop

Record, School of Computer Science, Car-

leton University, 1997, pp. 145–159.

[Zhe97] Y. Zheng, “The SPEED Cipher,” Finan-

cial Cryptography ’97 Proceedings, Springer-

Verlag, 1997, pp. 71–89.

[ZMI90] Y. Zheng, T. Matsumoto, and H. Imai, “On

the Construction of Block Ciphers Prov-

ably Secure and Not Relying on Any Un-

proved Hypotheses,” Advances in Cryptol-

ogy — CRYPTO ’89 Proceedings, Springer-

Verlag, 1990, pp. 461–480.

[ZPS93] Y. Zheng, J. Pieprzyk, and J. Seberry,

“HAVAL — A One-Way Hashing Algorithm

with Variable Length of Output,” Advances

in Cryptology — AUSCRYPT ’92 Proceed-

ings, Springer-Verlag, 1993, pp. 83–104.

A Twoﬁsh Test Vectors

A.1 Intermediate Values

The following ﬁle shows the intermediate values of

three Twoﬁsh computations. This particular im-

plementation does not swap the halves but instead

applies the F function alternately between the two

halves.

FILENAME: "ecb_ival.txt"

Electronic Codebook (ECB) Mode

Intermediate Value Tests

Algorithm Name: TWOFISH

Principal Submitter: Bruce Schneier, Counterpane Systems

==========

KEYSIZE=128

KEY=00000000000000000000000000000000

;

;makeKey: Input key --> S-box key [Encrypt]

; 00000000 00000000 --> 00000000

; 00000000 00000000 --> 00000000

; Subkeys

; 52C54DDE 11F0626D Input whiten

; 7CAC9D4A 4D1B4AAA

; B7B83A10 1E7D0BEB Output whiten

; EE9C341F CFE14BE4

; F98FFEF9 9C5B3C17 Round subkeys

; 15A48310 342A4D81

; 424D89FE C14724A7

; 311B834C FDE87320

; 3302778F 26CD67B4

; 7A6C6362 C2BAF60E

; 3411B994 D972C87F

; 84ADB1EA A7DEE434

; 54D2960F A2F7CAA8

; A6B8FF8C 8014C425

; 6A748D1C EDBAF720

; 928EF78C 0338EE13

; 9949D6BE C8314176

; 07C07D68 ECAE7EA7

; 1FE71844 85C05C89

; F298311E 696EA672

;

65

PT=00000000000000000000000000000000

Encrypt()

R[-1]: x= 00000000 00000000 00000000 00000000.

R[ 0]: x= 52C54DDE 11F0626D 7CAC9D4A 4D1B4AAA.

R[ 1]: x= 52C54DDE 11F0626D C38DCAA4 7A0A91B6. t0=C06D4949. t1=41B9BFC1.

R[ 2]: x= 55A538DE 5C5A4DB6 C38DCAA4 7A0A91B6. t0=7C4536B9. t1=67A58299.

R[ 3]: x= 55A538DE 5C5A4DB6 899063BD 893E49A9. t0=60DAC1A4. t1=2D84C23D.

R[ 4]: x= 2AE61A96 84BC42D3 899063BD 893E49A9. t0=607AAEAD. t1=6ED2DBF9.

R[ 5]: x= 2AE61A96 84BC42D3 F14F2618 821B5F36. t0=067D0B49. t1=318EACB4.

R[ 6]: x= 0FFE0AD1 D6B87B70 F14F2618 821B5F36. t0=58554EDB. t1=62585CF7.

R[ 7]: x= 0FFE0AD1 D6B87B70 CD0D38A1 C069BD9B. t0=839B0017. t1=B3A89DB0.

R[ 8]: x= A85CE579 DE2661CE CD0D38A1 C069BD9B. t0=E9BC6975. t1=F0DDA4C3.

R[ 9]: x= A85CE579 DE2661CE 7A39754C 973ABD2A. t0=54687CDF. t1=9044BF4B.

R[10]: x= 013077D7 B3528BA1 7A39754C 973ABD2A. t0=77FC927F. t1=8B8678CC.

R[11]: x= 013077D7 B3528BA1 D57933FD F8EA8B1B. t0=E3C81108. t1=828E7493.

R[12]: x= 64F0EAA1 DA27090C D57933FD F8EA8B1B. t0=B33C25D6. t1=83068533.

R[13]: x= 64F0EAA1 DA27090C F64F1005 99149A52. t0=A0AA2F81. t1=FFF30DB7.

R[14]: x= B0681C46 606D0273 F64F1005 99149A52. t0=114C17C5. t1=EB143CFF.

R[15]: x= B0681C46 606D0273 EB27628F 2C51191D. t0=677DA87D. t1=989D1459.

R[16]: x= C1708BA9 9522A3CE EB27628F 2C51191D. t0=9357B338. t1=AC9926BF.

R[17]: x= 5C9F589F 322C12F6 2FECBFB6 5AC3E82A.

CT=9F589F5CF6122C32B6BFEC2F2AE8C35A

Decrypt()

CT=9F589F5CF6122C32B6BFEC2F2AE8C35A

R[17]: x= 5C9F589F 322C12F6 2FECBFB6 5AC3E82A.

R[16]: x= C1708BA9 9522A3CE EB27628F 2C51191D. t0=9357B338. t1=AC9926BF.

R[15]: x= B0681C46 606D0273 EB27628F 2C51191D. t0=677DA87D. t1=989D1459.

R[14]: x= B0681C46 606D0273 F64F1005 99149A52. t0=114C17C5. t1=EB143CFF.

R[13]: x= 64F0EAA1 DA27090C F64F1005 99149A52. t0=A0AA2F81. t1=FFF30DB7.

R[12]: x= 64F0EAA1 DA27090C D57933FD F8EA8B1B. t0=B33C25D6. t1=83068533.

R[11]: x= 013077D7 B3528BA1 D57933FD F8EA8B1B. t0=E3C81108. t1=828E7493.

R[10]: x= 013077D7 B3528BA1 7A39754C 973ABD2A. t0=77FC927F. t1=8B8678CC.

R[ 9]: x= A85CE579 DE2661CE 7A39754C 973ABD2A. t0=54687CDF. t1=9044BF4B.

R[ 8]: x= A85CE579 DE2661CE CD0D38A1 C069BD9B. t0=E9BC6975. t1=F0DDA4C3.

R[ 7]: x= 0FFE0AD1 D6B87B70 CD0D38A1 C069BD9B. t0=839B0017. t1=B3A89DB0.

R[ 6]: x= 0FFE0AD1 D6B87B70 F14F2618 821B5F36. t0=58554EDB. t1=62585CF7.

R[ 5]: x= 2AE61A96 84BC42D3 F14F2618 821B5F36. t0=067D0B49. t1=318EACB4.

R[ 4]: x= 2AE61A96 84BC42D3 899063BD 893E49A9. t0=607AAEAD. t1=6ED2DBF9.

R[ 3]: x= 55A538DE 5C5A4DB6 899063BD 893E49A9. t0=60DAC1A4. t1=2D84C23D.

R[ 2]: x= 55A538DE 5C5A4DB6 C38DCAA4 7A0A91B6. t0=7C4536B9. t1=67A58299.

R[ 1]: x= 52C54DDE 11F0626D C38DCAA4 7A0A91B6. t0=C06D4949. t1=41B9BFC1.

R[ 0]: x= 52C54DDE 11F0626D 7CAC9D4A 4D1B4AAA.

R[-1]: x= 00000000 00000000 00000000 00000000.

PT=00000000000000000000000000000000

==========

KEYSIZE=192

KEY=0123456789ABCDEFFEDCBA98765432100011223344556677

;

;makeKey: Input key --> S-box key [Encrypt]

; EFCDAB89 67452301 --> B89FF6F2

; 10325476 98BADCFE --> B255BC4B

; 77665544 33221100 --> 45661061

; Subkeys

; 38394A24 C36D1175 Input whiten

; E802528F 219BFEB4

; B9141AB4 BD3E70CD Output whiten

; AF609383 FD36908A

; 03EFB931 1D2EE7EC Round subkeys

; A7489D55 6E44B6E8

; 714AD667 653AD51F

; B6315B66 B27C05AF

; A06C8140 9853D419

; 4016E346 8D1C0DD4

; F05480BE B6AF816F

; 2D7DC789 45B7BD3A

; 57F8A163 2BEFDA69

; 26AE7271 C2900D79

; ED323794 3D3FFD80

; 5DE68E49 9C3D2478

; DF326FE3 5911F70D

; C229F13B B1364772

; 4235364D 0CEC363A

; 57C8DD1F 6A1AD61E

;

PT=00000000000000000000000000000000

Encrypt()

R[-1]: x= 00000000 00000000 00000000 00000000.

R[ 0]: x= 38394A24 C36D1175 E802528F 219BFEB4.

R[ 1]: x= 38394A24 C36D1175 9C263D67 5E68BE8F. t0=988C8223. t1=33D1ECEC.

R[ 2]: x= C8F5099F 0C4B8F53 9C263D67 5E68BE8F. t0=E8C880BC. t1=19C23B0A.

R[ 3]: x= C8F5099F 0C4B8F53 69948F5E E67C030F. t0=C615F1F6. t1=17AE5B7E.

R[ 4]: x= 07633866 59421079 69948F5E E67C030F. t0=90AB32AA. t1=7F56EB43.

R[ 5]: x= 07633866 59421079 C015BE79 149B9CEC. t0=52971E00. t1=F6BC546D.

R[ 6]: x= A042B99D 709EF54B C015BE79 149B9CEC. t0=DAA00849. t1=2D2F5FCE.

R[ 7]: x= A042B99D 709EF54B 0CD39FA6 B250BEDA. t0=EE03FB5B. t1=FB5A051C.

R[ 8]: x= F7B097FA 9E5C4FF7 0CD39FA6 B250BEDA. t0=09A1B597. t1=18041948.

R[ 9]: x= F7B097FA 9E5C4FF7 77FC8B29 CC2B3F88. t0=99C9694E. t1=F1687F43.

R[10]: x= A279C718 421A8D38 77FC8B29 CC2B3F88. t0=5D174956. t1=2F7D5E04.

R[11]: x= A279C718 421A8D38 5B1A0904 12FEBF99. t0=5BC40012. t1=78D2617B.

R[12]: x= E4409C22 702548A2 5B1A0904 12FEBF99. t0=C251B3CE. t1=4AC0BD46.

R[13]: x= E4409C22 702548A2 5DDAA2A1 EFB2F051. t0=91BC2070. t1=6FC0BBF3.

R[14]: x= 8561A604 825D2480 5DDAA2A1 EFB2F051. t0=A7D24F8E. t1=84878F62.

R[15]: x= 8561A604 825D2480 5CC6CB7B 62A2CE64. t0=93690387. t1=0EB8FA83.

R[16]: x= 17738CD3 B5142D18 5CC6CB7B 62A2CE64. t0=5FE8370B. t1=F3D5AB78.

R[17]: x= E5D2D1CF DF9CBEA9 B8131F50 4822BD92.

CT=CFD1D2E5A9BE9CDF501F13B892BD2248

Decrypt()

CT=CFD1D2E5A9BE9CDF501F13B892BD2248

R[17]: x= E5D2D1CF DF9CBEA9 B8131F50 4822BD92.

R[16]: x= 17738CD3 B5142D18 5CC6CB7B 62A2CE64. t0=5FE8370B. t1=F3D5AB78.

R[15]: x= 8561A604 825D2480 5CC6CB7B 62A2CE64. t0=93690387. t1=0EB8FA83.

R[14]: x= 8561A604 825D2480 5DDAA2A1 EFB2F051. t0=A7D24F8E. t1=84878F62.

R[13]: x= E4409C22 702548A2 5DDAA2A1 EFB2F051. t0=91BC2070. t1=6FC0BBF3.

R[12]: x= E4409C22 702548A2 5B1A0904 12FEBF99. t0=C251B3CE. t1=4AC0BD46.

R[11]: x= A279C718 421A8D38 5B1A0904 12FEBF99. t0=5BC40012. t1=78D2617B.

R[10]: x= A279C718 421A8D38 77FC8B29 CC2B3F88. t0=5D174956. t1=2F7D5E04.

R[ 9]: x= F7B097FA 9E5C4FF7 77FC8B29 CC2B3F88. t0=99C9694E. t1=F1687F43.

R[ 8]: x= F7B097FA 9E5C4FF7 0CD39FA6 B250BEDA. t0=09A1B597. t1=18041948.

R[ 7]: x= A042B99D 709EF54B 0CD39FA6 B250BEDA. t0=EE03FB5B. t1=FB5A051C.

R[ 6]: x= A042B99D 709EF54B C015BE79 149B9CEC. t0=DAA00849. t1=2D2F5FCE.

R[ 5]: x= 07633866 59421079 C015BE79 149B9CEC. t0=52971E00. t1=F6BC546D.

R[ 4]: x= 07633866 59421079 69948F5E E67C030F. t0=90AB32AA. t1=7F56EB43.

R[ 3]: x= C8F5099F 0C4B8F53 69948F5E E67C030F. t0=C615F1F6. t1=17AE5B7E.

R[ 2]: x= C8F5099F 0C4B8F53 9C263D67 5E68BE8F. t0=E8C880BC. t1=19C23B0A.

R[ 1]: x= 38394A24 C36D1175 9C263D67 5E68BE8F. t0=988C8223. t1=33D1ECEC.

R[ 0]: x= 38394A24 C36D1175 E802528F 219BFEB4.

R[-1]: x= 00000000 00000000 00000000 00000000.

PT=00000000000000000000000000000000

==========

KEYSIZE=256

KEY=0123456789ABCDEFFEDCBA987654321000112233445566778899AABBCCDDEEFF

;

;makeKey: Input key --> S-box key [Encrypt]

; EFCDAB89 67452301 --> B89FF6F2

; 10325476 98BADCFE --> B255BC4B

; 77665544 33221100 --> 45661061

; FFEEDDCC BBAA9988 --> 8E4447F7

; Subkeys

; 5EC769BF 44D13C60 Input whiten

; 76CD39B1 16750474

; 349C294B EC21F6D6 Output whiten

; 4FBD10B4 578DA0ED

; C3479695 9B6958FB Round subkeys

; 6A7FBC4E 0BF1830B

; 61B5E0FB D78D9730

; 7C6CF0C4 2F9109C8

; E69EA8D1 ED99BDFF

; 35DC0BBD A03E5018

; FB18EA0B 38BD43D3

; 76191781 37A9A0D3

; 72427BEA 911CC0B8

; F1689449 71009CA9

; B6363E89 494D9855

; 590BBC63 F95A28B5

; FB72B4E1 2A43505C

; BFD34176 5C133D12

; 3A9247F7 9A3331DD

; EE7515E6 F0D54DCD

;

PT=00000000000000000000000000000000

Encrypt()

R[-1]: x= 00000000 00000000 00000000 00000000.

R[ 0]: x= 5EC769BF 44D13C60 76CD39B1 16750474.

R[ 1]: x= 5EC769BF 44D13C60 D38B6C9F A23B7169. t0=29C0736C. t1=E4D3D68D.

R[ 2]: x= 99424DFF FBC14BFC D38B6C9F A23B7169. t0=9D16BBB3. t1=64AD7A3F.

R[ 3]: x= 99424DFF FBC14BFC 698BE047 6A997290. t0=E66B9D19. t1=B87B2DFD.

R[ 4]: x= 2C125DD7 5A526278 698BE047 6A997290. t0=0BB41F61. t1=3945E62C.

R[ 5]: x= 2C125DD7 5A526278 E35CD910 7CB57D06. t0=D5397903. t1=F35A3092.

R[ 6]: x= D5178F25 00D35CC5 E35CD910 7CB57D06. t0=8C8927A1. t1=C3D8103E.

R[ 7]: x= D5178F25 00D35CC5 D8447F91 65C2BD96. t0=4D8B7489. t1=0B2FC79F.

R[ 8]: x= FF92E109 DF621C97 D8447F91 65C2BD96. t0=C1176720. t1=F301CE95.

R[ 9]: x= FF92E109 DF621C97 28BFEFF5 D45666FB. t0=9F3BEC03. t1=77BD388E.

R[10]: x= BB79AD2E AA410F41 28BFEFF5 D45666FB. t0=8C6DB451. t1=0B8B72BA.

R[11]: x= BB79AD2E AA410F41 6576A3ED BFF8215E. t0=8A317EF8. t1=A1EAEAAE.

R[12]: x= 4A6BBAFF 439F4766 6576A3ED BFF8215E. t0=8F8307AA. t1=472014C3.

R[13]: x= 4A6BBAFF 439F4766 F7186836 04CA5304. t0=CEB0BBE1. t1=C12302BE.

R[14]: x= CBD3C29D BC31FEBE F7186836 04CA5304. t0=5CF5C93C. t1=C1033512.

R[15]: x= CBD3C29D BC31FEBE D4E77B7C 5415D5D3. t0=853A6BB2. t1=9F09EB26.

R[16]: x= 85411C2B 7777DC05 D4E77B7C 5415D5D3. t0=877AF61D. t1=4B61EEC7.

R[17]: x= E07B5237 B8342305 CAFC0C9F 20FA7CE8.

CT=37527BE0052334B89F0CFCCAE87CFA20

Decrypt()

CT=37527BE0052334B89F0CFCCAE87CFA20

R[17]: x= E07B5237 B8342305 CAFC0C9F 20FA7CE8.

R[16]: x= 85411C2B 7777DC05 D4E77B7C 5415D5D3. t0=877AF61D. t1=4B61EEC7.

R[15]: x= CBD3C29D BC31FEBE D4E77B7C 5415D5D3. t0=853A6BB2. t1=9F09EB26.

R[14]: x= CBD3C29D BC31FEBE F7186836 04CA5304. t0=5CF5C93C. t1=C1033512.

R[13]: x= 4A6BBAFF 439F4766 F7186836 04CA5304. t0=CEB0BBE1. t1=C12302BE.

R[12]: x= 4A6BBAFF 439F4766 6576A3ED BFF8215E. t0=8F8307AA. t1=472014C3.

R[11]: x= BB79AD2E AA410F41 6576A3ED BFF8215E. t0=8A317EF8. t1=A1EAEAAE.

R[10]: x= BB79AD2E AA410F41 28BFEFF5 D45666FB. t0=8C6DB451. t1=0B8B72BA.

R[ 9]: x= FF92E109 DF621C97 28BFEFF5 D45666FB. t0=9F3BEC03. t1=77BD388E.

R[ 8]: x= FF92E109 DF621C97 D8447F91 65C2BD96. t0=C1176720. t1=F301CE95.

66

R[ 7]: x= D5178F25 00D35CC5 D8447F91 65C2BD96. t0=4D8B7489. t1=0B2FC79F.

R[ 6]: x= D5178F25 00D35CC5 E35CD910 7CB57D06. t0=8C8927A1. t1=C3D8103E.

R[ 5]: x= 2C125DD7 5A526278 E35CD910 7CB57D06. t0=D5397903. t1=F35A3092.

R[ 4]: x= 2C125DD7 5A526278 698BE047 6A997290. t0=0BB41F61. t1=3945E62C.

R[ 3]: x= 99424DFF FBC14BFC 698BE047 6A997290. t0=E66B9D19. t1=B87B2DFD.

R[ 2]: x= 99424DFF FBC14BFC D38B6C9F A23B7169. t0=9D16BBB3. t1=64AD7A3F.

R[ 1]: x= 5EC769BF 44D13C60 D38B6C9F A23B7169. t0=29C0736C. t1=E4D3D68D.

R[ 0]: x= 5EC769BF 44D13C60 76CD39B1 16750474.

R[-1]: x= 00000000 00000000 00000000 00000000.

PT=00000000000000000000000000000000

A.2 Full Encryptions

The following ﬁle shows a number of (plaintext, ci-

phertext, key) pairs. These pairs are related, and

can easily be tested automatically. The plaintext of

each entry is the ciphertext of the previous one. The

key of each entry is made up of the ciphertext two

and three entries back. We believe that these test

vectors provide a thorough test of a Twoﬁsh imple-

mentation.

FILENAME: "ecb_tbl.txt"

Electronic Codebook (ECB) Mode

Tables Known Answer Test

Tests permutation tables and MDS matrix multiply tables.

Algorithm Name: TWOFISH

Principal Submitter: Bruce Schneier, Counterpane Systems

==========

KEYSIZE=128

I=1

KEY=00000000000000000000000000000000

PT=00000000000000000000000000000000

CT=9F589F5CF6122C32B6BFEC2F2AE8C35A

I=2

KEY=00000000000000000000000000000000

PT=9F589F5CF6122C32B6BFEC2F2AE8C35A

CT=D491DB16E7B1C39E86CB086B789F5419

I=3

KEY=9F589F5CF6122C32B6BFEC2F2AE8C35A

PT=D491DB16E7B1C39E86CB086B789F5419

CT=019F9809DE1711858FAAC3A3BA20FBC3

I=4

KEY=D491DB16E7B1C39E86CB086B789F5419

PT=019F9809DE1711858FAAC3A3BA20FBC3

CT=6363977DE839486297E661C6C9D668EB

I=5

KEY=019F9809DE1711858FAAC3A3BA20FBC3

PT=6363977DE839486297E661C6C9D668EB

CT=816D5BD0FAE35342BF2A7412C246F752

I=6

KEY=6363977DE839486297E661C6C9D668EB

PT=816D5BD0FAE35342BF2A7412C246F752

CT=5449ECA008FF5921155F598AF4CED4D0

I=7

KEY=816D5BD0FAE35342BF2A7412C246F752

PT=5449ECA008FF5921155F598AF4CED4D0

CT=6600522E97AEB3094ED5F92AFCBCDD10

I=8

KEY=5449ECA008FF5921155F598AF4CED4D0

PT=6600522E97AEB3094ED5F92AFCBCDD10

CT=34C8A5FB2D3D08A170D120AC6D26DBFA

I=9

KEY=6600522E97AEB3094ED5F92AFCBCDD10

PT=34C8A5FB2D3D08A170D120AC6D26DBFA

CT=28530B358C1B42EF277DE6D4407FC591

I=10

KEY=34C8A5FB2D3D08A170D120AC6D26DBFA

PT=28530B358C1B42EF277DE6D4407FC591

CT=8A8AB983310ED78C8C0ECDE030B8DCA4

:

:

:

I=48

KEY=137A24CA47CD12BE818DF4D2F4355960

PT=BCA724A54533C6987E14AA827952F921

CT=6B459286F3FFD28D49F15B1581B08E42

I=49

KEY=BCA724A54533C6987E14AA827952F921

PT=6B459286F3FFD28D49F15B1581B08E42

CT=5D9D4EEFFA9151575524F115815A12E0

==========

KEYSIZE=192

I=1

KEY=000000000000000000000000000000000000000000000000

PT=00000000000000000000000000000000

CT=EFA71F788965BD4453F860178FC19101

I=2

KEY=000000000000000000000000000000000000000000000000

PT=EFA71F788965BD4453F860178FC19101

CT=88B2B2706B105E36B446BB6D731A1E88

I=3

KEY=EFA71F788965BD4453F860178FC191010000000000000000

PT=88B2B2706B105E36B446BB6D731A1E88

CT=39DA69D6BA4997D585B6DC073CA341B2

I=4

KEY=88B2B2706B105E36B446BB6D731A1E88EFA71F788965BD44

PT=39DA69D6BA4997D585B6DC073CA341B2

CT=182B02D81497EA45F9DAACDC29193A65

I=5

KEY=39DA69D6BA4997D585B6DC073CA341B288B2B2706B105E36

PT=182B02D81497EA45F9DAACDC29193A65

CT=7AFF7A70CA2FF28AC31DD8AE5DAAAB63

I=6

KEY=182B02D81497EA45F9DAACDC29193A6539DA69D6BA4997D5

PT=7AFF7A70CA2FF28AC31DD8AE5DAAAB63

CT=D1079B789F666649B6BD7D1629F1F77E

I=7

KEY=7AFF7A70CA2FF28AC31DD8AE5DAAAB63182B02D81497EA45

PT=D1079B789F666649B6BD7D1629F1F77E

CT=3AF6F7CE5BD35EF18BEC6FA787AB506B

I=8

KEY=D1079B789F666649B6BD7D1629F1F77E7AFF7A70CA2FF28A

PT=3AF6F7CE5BD35EF18BEC6FA787AB506B

CT=AE8109BFDA85C1F2C5038B34ED691BFF

I=9

KEY=3AF6F7CE5BD35EF18BEC6FA787AB506BD1079B789F666649

PT=AE8109BFDA85C1F2C5038B34ED691BFF

CT=893FD67B98C550073571BD631263FC78

I=10

KEY=AE8109BFDA85C1F2C5038B34ED691BFF3AF6F7CE5BD35EF1

PT=893FD67B98C550073571BD631263FC78

CT=16434FC9C8841A63D58700B5578E8F67

:

:

:

I=48

KEY=DEA4F3DA75EC7A8EAC3861A9912402CD5DBE44032769DF54

PT=FB66522C332FCC4C042ABE32FA9E902F

CT=F0AB73301125FA21EF70BE5385FB76B6

I=49

KEY=FB66522C332FCC4C042ABE32FA9E902FDEA4F3DA75EC7A8E

PT=F0AB73301125FA21EF70BE5385FB76B6

CT=E75449212BEEF9F4A390BD860A640941

==========

KEYSIZE=256

I=1

KEY=0000000000000000000000000000000000000000000000000000000000000000

PT=00000000000000000000000000000000

CT=57FF739D4DC92C1BD7FC01700CC8216F

I=2

KEY=0000000000000000000000000000000000000000000000000000000000000000

PT=57FF739D4DC92C1BD7FC01700CC8216F

CT=D43BB7556EA32E46F2A282B7D45B4E0D

I=3

KEY=57FF739D4DC92C1BD7FC01700CC8216F00000000000000000000000000000000

PT=D43BB7556EA32E46F2A282B7D45B4E0D

CT=90AFE91BB288544F2C32DC239B2635E6

I=4

KEY=D43BB7556EA32E46F2A282B7D45B4E0D57FF739D4DC92C1BD7FC01700CC8216F

PT=90AFE91BB288544F2C32DC239B2635E6

CT=6CB4561C40BF0A9705931CB6D408E7FA

I=5

KEY=90AFE91BB288544F2C32DC239B2635E6D43BB7556EA32E46F2A282B7D45B4E0D

PT=6CB4561C40BF0A9705931CB6D408E7FA

CT=3059D6D61753B958D92F4781C8640E58

I=6

KEY=6CB4561C40BF0A9705931CB6D408E7FA90AFE91BB288544F2C32DC239B2635E6

PT=3059D6D61753B958D92F4781C8640E58

67

CT=E69465770505D7F80EF68CA38AB3A3D6

I=7

KEY=3059D6D61753B958D92F4781C8640E586CB4561C40BF0A9705931CB6D408E7FA

PT=E69465770505D7F80EF68CA38AB3A3D6

CT=5AB67A5F8539A4A5FD9F0373BA463466

I=8

KEY=E69465770505D7F80EF68CA38AB3A3D63059D6D61753B958D92F4781C8640E58

PT=5AB67A5F8539A4A5FD9F0373BA463466

CT=DC096BCD99FC72F79936D4C748E75AF7

I=9

KEY=5AB67A5F8539A4A5FD9F0373BA463466E69465770505D7F80EF68CA38AB3A3D6

PT=DC096BCD99FC72F79936D4C748E75AF7

CT=C5A3E7CEE0F1B7260528A68FB4EA05F2

I=10

KEY=DC096BCD99FC72F79936D4C748E75AF75AB67A5F8539A4A5FD9F0373BA463466

PT=C5A3E7CEE0F1B7260528A68FB4EA05F2

CT=43D5CEC327B24AB90AD34A79D0469151

:

:

:

I=48

KEY=2E2158BC3E5FC714C1EEECA0EA696D48D2DED73E59319A8138E0331F0EA149EA

PT=248A7F3528B168ACFDD1386E3F51E30C

CT=431058F4DBC7F734DA4F02F04CC4F459

I=49

KEY=248A7F3528B168ACFDD1386E3F51E30C2E2158BC3E5FC714C1EEECA0EA696D48

PT=431058F4DBC7F734DA4F02F04CC4F459

CT=37FE26FF1CF66175F5DDF4C33B97A205

68

Contents

1 Introduction 2 Twoﬁsh Design Goals 3 Twoﬁsh Building Blocks 3.1 Feistel Networks . . . . . . . . 3.2 S-boxes . . . . . . . . . . . . . 3.3 MDS Matrices . . . . . . . . . 3.4 Pseudo-Hadamard Transforms . 3.5 Whitening . . . . . . . . . . . . 3.6 Key Schedule . . . . . . . . . . 4 Twoﬁsh 4.1 The Function F . . . . . . 4.2 The Function g . . . . . . 4.3 The Key Schedule . . . . 4.4 Round Function Overview 3 3 4 4 5 5 5 5 5

8.6 8.7 8.8

. . . . . .

. . . . . .

. . . . . .

. . . . . .

Partial Key Guessing Attacks . . . . Related-key Cryptanalysis . . . . . . A Related-Key Attack on a Twoﬁsh Variant . . . . . . . . . . . . . . . . 8.9 Side-Channel Cryptanalysis and Fault Analysis . . . . . . . . . . . . . 8.10 Attacking Simpliﬁed Twoﬁsh . . . . 9 Trap Doors in Twoﬁsh 10 When is a Cipher Insecure? 11 Using Twoﬁsh 11.1 Chaining Modes . . . . . . . . . . . 11.2 One-Way Hash Functions . . . . . . 11.3 Message Authentication Codes . . . 11.4 Pseudo-Random Number Generators 11.5 Larger Keys . . . . . . . . . . . . . . 11.6 Additional Block Sizes . . . . . . . . 11.7 More or Fewer Rounds . . . . . . . . 11.8 Family Key Variant: Twoﬁsh-FK . . 12 Historical Remarks 13 Conclusions and Further Work 14 Acknowledgments

. 46 . 46 . 48 . 50 . 50 52 53 53 53 53 53 53 54 54 54 54 56 57 58

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

5 . 7 . 7 . 8 . 12 12 12 15 16 17 18 18 19 20 21 22 23 23 24 27 29 29 29 29 29 30 31 31 36

5 Performance of Twoﬁsh 5.1 Performance on Large Microprocessors 5.2 Performance on Smart Cards . . . . . 5.3 Performance on Future Microprocessors 5.4 Hardware Performance . . . . . . . . . 6 Twoﬁsh Design Philosophy 6.1 Performance-Driven Design 6.2 Conservative Design . . . . 6.3 Simple Design . . . . . . . . 6.4 S-boxes . . . . . . . . . . . 6.5 The Key Schedule . . . . . 7 The 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12

. . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

A Twoﬁsh Test Vectors 65 A.1 Intermediate Values . . . . . . . . . . 65 A.2 Full Encryptions . . . . . . . . . . . . 67

Design of Twoﬁsh The Round Structure . . . . . . . The Key-dependent S-boxes . . . MDS Matrix . . . . . . . . . . . PHT . . . . . . . . . . . . . . . . Key Addition . . . . . . . . . . . Feistel Combining Operation . . Use of Diﬀerent Groups . . . . . Diﬀusion in the Round Function One-bit Rotation . . . . . . . . . The Number of Rounds . . . . . The Key Schedule . . . . . . . . Reed-Solomon Code . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

8 Cryptanalysis of Twoﬁsh 8.1 Diﬀerential Cryptanalysis . . . . . . 8.2 Extensions to Diﬀerential Cryptanalysis . . . . . . . . . . . . . . . . . . . 8.3 Search for the Best Diﬀerential Characteristic . . . . . . . . . . . . . . . . 8.4 Linear Cryptanalysis . . . . . . . . . 8.5 Interpolation Attack . . . . . . . . .

36 . 36 . 41 . 41 . 44 . 45 2

1

Introduction

Twoﬁsh can: • Encrypt data at 285 clock cycles per block on a Pentium Pro, after a 12700 clock-cycle key setup. • Encrypt data at 860 clock cycles per block on a Pentium Pro, after a 1250 clock-cycle key setup. • Encrypt data at 26500 clock cycles per block on a 6805 smart card, after a 1750 clock-cycle key setup. This paper is organized as follows: Section 2 discusses our design goals for Twoﬁsh. Section 3 describes the building blocks and general design of the cipher. Section 4 deﬁnes the cipher. Section 5 discusses the performance of Twoﬁsh. Section 6 talks about the design philosophy that we used. In Section 7 we describe the design process, and why the various choices were made. Section 8 contains our best cryptanalysis of Twoﬁsh. In Section 9 we discuss the possibility of trapdoors in the cipher. Section 10 compares Twoﬁsh with some other ciphers. Section 11 discusses various modes of using Twoﬁsh, including a family-key variant. Section 12 contains historical remarks, and Section 13 our conclusions and directions for future analysis.

In 1972 and 1974, the National Bureau of Standards (now the National Institute of Standards and Technology, or NIST) issued the ﬁrst public request for an encryption standard. The result was DES [NBS77], arguably the most widely used and successful encryption algorithm in the world. Despite its popularity, DES has been plagued with controversy. Some cryptographers objected to the “closed-door” design process of the algorithm. The debate about whether DES’ key is too short for acceptable commercial security has raged for many years [DH79], but recent advances in distributed key search techniques have left no doubt in anyone’s mind that its key is simply too short for today’s security applications [Wie94, BDR+96]. TripleDES has emerged as an interim solution in many high-security applications, such as banking, but it is too slow for some uses. More fundamentally, the 64-bit block length shared by DES and most other well-known ciphers opens it up to attacks when large amounts of data are encrypted under the same key. In response to a growing desire to replace DES, NIST announced the Advanced Encryption Standard (AES) program in 1997 [NIST97a]. NIST solicited comments from the public on the proposed standard, and eventually issued a call for algorithms to satisfy the standard [NIST97b]. The intention is for NIST to make all submissions public and eventually, through a process of public review and comment, choose a new encryption standard to replace DES. NIST’s call requested a block cipher. Block ciphers can be used to design stream ciphers with a variety of synchronization and error extension properties, oneway hash functions, message authentication codes, and pseudo-random number generators. Because of this ﬂexibility, they are the workhorse of modern cryptography. NIST speciﬁed several other design criteria: a longer key length, larger block size, faster speed, and greater ﬂexibility. While no single algorithm can be optimized for all needs, NIST intends AES to become the standard symmetric algorithm of the next decade. Twoﬁsh is our submission to the AES selection process. It meets all the required NIST criteria—128bit block; 128-, 192-, and 256-bit key; eﬃcient on various platforms; etc.—and some strenuous design requirements, performance as well as cryptographic, of our own. 3

2

Twoﬁsh Design Goals

Twoﬁsh was designed to meet NIST’s design criteria for AES [NIST97b]. Speciﬁcally, they are: • A 128-bit symmetric block cipher. • Key lengths of 128 bits, 192 bits, and 256 bits. • No weak keys. • Eﬃciency, both on the Intel Pentium Pro and other software and hardware platforms. • Flexible design: e.g., accept additional key lengths; be implementable on a wide variety of platforms and applications; and be suitable for a stream cipher, hash function, and MAC. • Simple design, both to facilitate ease of analysis and ease of implementation. Additionally, we imposed the following performance criteria on our design: • Accept any key length up to 256 bits.

and pseudo-random number generator. where N is the key length. Blowﬁsh [Sch94]. • Have a family-key variant to allow for diﬀerent. and RC5 [Riv95]. It is the basis of most block ciphers published since then. which may be a weak encryption algorithm when taken by itself. and less than 2N/2 time. Pentium Pro. Khufu and Khafre [Mer91].. In each round. LOKI [BPS90. • Not include any elements that make it ineﬃcient in hardware. where N is the key length. ZMI90] such as MacGuﬃn [BS95] (cryptanalyzed in [RP95a]) and Bear/Lion [AB96b].g. 8-cycle (8-round) IDEA is comparable to 8-cycle (16-round) DES and 8-cycle (32-round) Skipjack. 3 3. and repeatedly iterate it to create a strong encryption algorithm. • Encrypt data in less than 5000 clock cycles per block on a Pentium. Pentium Pro. Finally. • Not contain any operations that make it inefﬁcient on other 32-bit microprocessors.g. 4 . the “source block” is the input to F . GOST [GOST89]. e. and F is a function taking n/2 bits of the block and N bits of a key as input. and Shark [RDP+96] (see also [YTH96]). MAC. Two rounds of a Feistel network is called a “cycle” [SK96]. The fundamental building block of a Feistel network is the F function: a key-dependent mapping of an input string onto an output string. An F function is always non-linear and possibly non-surjective1 : F : {0. • Not contain any operations that make it inefﬁcient on 8-bit and 16-bit microprocessors. including FEAL [SM88]. and producing an output of length n/2 bits. no large tables. SAFER. • Be implementable on a 8-bit microprocessor with only 64 bytes of RAM. • Encrypt data in less than less than 10 milliseconds on a commodity 8-bit microprocessor. In one cycle.1 Twoﬁsh Building Blocks Feistel Networks A Feistel network is a general method of transforming any function (usually called the F function) into a permutation. non-interoperable. Our cryptographic goals were as follows: • 16-round Twoﬁsh (without whitening) should have no chosen-plaintext attack requiring fewer than 280 chosen plaintexts and less than 2N time. We feel we have met all of these goals in the design of Twoﬁsh. we imposed the following ﬂexibility goals: • Have variants with a variable number of rounds. notion of a cycle allows Feistel networks to be compared with unbalanced Feistel networks [SK96. CAST-128 [Ada97a]. 1}n/2 × {0. it should be suitable for dedicated hardware applications: e.” after which these two blocks swap places for the next round.000 gates. and with SP-networks (also called uniform transformation structures [Fei73]) such as IDEA. It was invented by Horst Feistel [FNS75] in his design of Lucifer [Fei73]. • Have a variety of performance tradeoﬀs with respect to the key schedule. and the output of F is xored with the “target block. • Be capable of setting up a 128-bit key (for optimal encryption speed) in less than the time required to encrypt 32 blocks on a Pentium. Thus. and Pentium II. versions of the cipher. one-way hash function.• Encrypt data in less than 500 clock cycles per block on an Intel Pentium. • Be implementable in hardware using less than 20. Additionally. Merced. and popularized by DES [NBS77]. 1}N → {0. 1}n/2 where n is the block size of the Feistel Network. Pentium Pro.. and Pentium II. The idea here is to take an F function. or computed on-the-ﬂy for maximum agility and minimum memory requirements. • 12-round Twoﬁsh (without whitening) should have no related-key attack requiring fewer than 264 chosen plaintexts. and Pentium II with no key setup time. every bit of the text block has been modiﬁed once. using well-understood construction methods.2 non-surjective F function is one in which not all outputs in the output space can occur. 1A 2 The • Have a key schedule that can be precomputed for maximum speed. for a fully optimized version of the algorithm. • Not contain any operations that reduce its efﬁciency on proposed 64-bit microprocessors. BKPS93]. • Be suitable as a stream cipher.

S-boxes were ﬁrst used in Lucifer. It can easily be shown that no mapping can have a larger minimum distance between two distinct vectors. The plaintext is split into four 32-bit words. the number of elements that diﬀer) between any two distinct vectors produced by the MDS mapping is at least b + 1. by hiding from an attacker the speciﬁc inputs to the ﬁrst and last rounds’ F functions.e. obtained by discarding rows or columns. In our attacks on reduced-round Twoﬁsh variants. the key schedule uses the same primitives as the round function. In each 3 Manta is a block cipher with a large block size and an emphasis on long-term security rather than speed. Serge Vaudenay ﬁrst proposed MDS matrices as a cipher design element [Vau95]. the technique of xoring key material before the ﬁrst round and after the last round. and another 128 bits after the last Feistel round.6 Key Schedule The key schedule is the means by which the key bits are turned into round keys that the cipher can use. b = a + 2b mod 232 3.Twoﬁsh is a 16-round Feistel network with a bijective F function.5 Whitening 3. the 32-bit PHT is deﬁned as: a = a + b mod 232 Figure 1 shows an overview of the Twoﬁsh block cipher. but are not used anywhere else in the cipher. This is followed by sixteen rounds. MDS mappings can be represented by an MDS matrix consisting of a × b elements. These S-boxes are built using two ﬁxed 8-by-8-bit permutations and key material. 8-by-8-bit S-boxes. producing a composite vector of a + b elements. 3. S-boxes vary in both input size and output size. but this requires an additional rotation of the words just before the output whitening step. was used by Merkle in Khufu/Khafre. with the property that the minimum number of non-zero elements in any non-zero vector is at least b + 1 [MS77]. Twoﬁsh needs a lot of key material. Whitening. In [KR96]. Shark [RDP+96] and Square [DKR97] use MDS matrices (see also [YMT97]). Put another way. This PHT can be executed in two opcodes on most modern microprocessors. SAFER [Mas94] uses 8-bit PHTs extensively for diffusion. bijective. and can be created either randomly or algorithmically.. 4 Twoﬁsh 3.3 MDS Matrices A maximum distance separable (MDS) code over a ﬁeld is a linear mapping from a ﬁeld elements to b ﬁeld elements. Given two inputs. we discovered that whitening substantially increased the diﬃculty of attacking the cipher. then DES. A necessary and suﬃcient condition for an a × b matrix to be MDS is that all possible square submatrices. the “distance” (i.4 Pseudo-Hadamard Transforms A pseudo-Hadamard transform (PHT) is a simple mixing operation that runs quickly in software. The only non-Feistel elements are the 1-bit rotates.2 S-boxes An S-box is a table-driven non-linear substitution operation used in most block ciphers. it was shown that whitening substantially increases the diﬃculty of keysearch attacks against the remainder of the cipher. these are xored with four key words. To facilitate analysis. although we ﬁrst saw the construction used in the unpublished cipher Manta3 [Fer96]. Twoﬁsh uses a 16-round Feistel-like structure with additional whitening of the input and output. Twoﬁsh uses a 32-bit PHT to mix the outputs from its two parallel 32-bit g functions. Twoﬁsh uses a single 4-by-4 MDS matrix over GF(28 ). including the Pentium family. are non-singular. The rotations can be moved into the F function to create a pure Feistel structure. hence the term maximum distance separable. a and b. These subkeys are calculated in the same manner as the round subkeys. 3. Reed-Solomon (RS) error-correcting codes are known to be MDS. and afterwards in most encryption algorithms. and independently invented by Rivest for DES-X [KR96]. and has a complicated key schedule. Twoﬁsh uses four diﬀerent. key-dependent. Twoﬁsh xors 128 bits of subkey before the ﬁrst Feistel round. It uses an SP-like network with DES as the S-boxes and MDS matrices for the permutations. In the input whitening step. 5 .

P (128 bits) K0 g g K1 K2 g g K3 input whitening F <<<1 g S-box 0 S-box 1 S-box 2 S-box 3 MDS PHT K2r+8 g g S-box 0 S-box 1 <<<8 S-box 2 S-box 3 MDS K2r+9 >>>1 one round g hh (((( (( hhhh hhhhhhhh (((( hhhh hhhh (((( (((((((( ( hhhh hhh ( h(((((( h((((( h (((hhhhhhhh hhhhhhh ((( ( (( h hhhh (((( (((((((( hhhh hhhhhh (((( (((( hhhh ( ( : : 15 more rounds undo last swap hh (( (((( hhhh hhhhhhhh (((( (((((((( hhhh ( hhhh hhhhhhhh (( (((( (((( ((h( (((( hhh (hh(hhhhh ((( hhhh ((( (((( h hhhh (((( hhhh hhhhhhh (((( (((((((( hhhh (( h K4 g g K5 C (128 bits) K6 g g K7 output whitening Figure 1: Twoﬁsh 6 .

Ci = R16. . Each Sbox is bijective. the other is rotated right afterwards). . Fr. More formally. .1 = ROL(Rr. we need to specify the correspondence between byte values and the ﬁeld elements of GF(28 ). Thus. . 3 where (F0 . These two results are then xored into the words on the right (one of which is rotated left by 1 bit ﬁrst. . 3 4. except that it does not add any key blocks to the output. .round. P3 of 32 bits each using the little-endian convention. (The PHT is still performed.3 = = Rr. and multiplied by the 4×4 MDS matrix (using the ﬁeld GF(28 ) for the computations).3 . ci = C i/4 8(i mod 4) 2 mod 28 i = 0. T0 T1 F0 F1 = g(R0 ) = g(ROL(R1 . . F1 ) is the result of F . . the ﬁrst two words are used as input to the function F . these words are xored with 4 words of the expanded key. .1 Rr+1. which yields T0 . . 1) Rr+1. . The results T0 and T1 are then combined in a PHT and two words of the expanded key are added. . MDS . . 7 The ﬁeld element a = i=0 ai xi with ai ∈ GF(2) 7 . · . Finally. Each byte is run through its own key-dependent S-box. the two halves are exchanged. . and xors the data words with 4 words of the expanded key. 3 In the input whitening step. p15 are ﬁrst split into 4 words P0 . The output whitening step undoes the ‘swap’ of the last round.0 Rr. 15 and where ROR and ROL are functions that rotate their ﬁrst argument (a 32-bit word) left or right by the number of bits indicated by their second argument. The third word is xored with the ﬁrst output of F and then rotated right by one bit. .1 4. . .2 Rr+1. . . . . For this to be well-deﬁned. We also deﬁne the function F for use in our analysis. The results of the two g functions are combined using a PseudoHadamard Transform (PHT). . .2 ⊕ Fr. which also takes the round number as input.1 The Function F The function F is a key-dependent permutation on 64-bit values. R0. two input words R0 and R1 . .2 The Function g The function g forms the heart of Twoﬁsh. . .) In each of the 16 rounds. The fourth word is rotated left by one bit and then xored with the second output word of F . .0 .0 . 3 y0 y1 y2 y3 for r = 0. the two words on the left are used as input to the g functions. R0 is passed through the g function. and two keywords are added. It takes three arguments. The input word X is split into four bytes. . After all the rounds. 3 · ··· · . · ··· · 3 i=0 = zi · 28i The four words of ciphertext are then written as 16 bytes c0 . . . the 16 bytes of plaintext p0 . . We represent GF(28 ) as GF(2)[x]/v(x) where v(x) = x8 + x6 + x5 + x3 + 1 is a primitive polynomial of degree 8 over GF(2). . 8)) = (T0 + T1 + K2r+8 ) mod 232 = (T0 + 2T1 + K2r+9 ) mod 232 Pi = j=0 p(4i+j) · 28j i = 0. . followed by a linear mixing step based on an MDS matrix. r) Rr+1. . F is identical to the F function. Rr. R1 is rotated left by 8 bits and then passed through the g function to yield T1 .i = Pi ⊕ Ki i = 0. 3 si [xi ] i = 0. . . and the round number r used to select the appropriate subkeys. . 15 where si are the key-dependent S-boxes and Z is the result of g. .0 . The four results are interpreted as a vector of length 4 over GF(28 ). (One of them is rotated by 8 bits ﬁrst. The left and right halves are then swapped for the next round. takes 8 bits of input. c15 using the same little-endian conversion used for the plaintext. .0 = ROR(Rr. (Fr. . the swap of the last round is reversed. .1 . . .1 ) = F (Rr. xi z0 z1 z2 z3 Z yi = = = X/28i mod 28 i = 0. and the four words are xored with four more key words to produce the ciphertext. 1) ⊕ Fr. The resulting vector is interpreted as a 32-bit word which is the result of g.(i+2) mod 4 ⊕ Ki+4 i = 0.) The g function consists of four byte-wide key-dependent S-boxes. and produces 8 bits of output.

1 m = . . The key M consists of 8k bytes m0 . GF(28 ) is represented by GF(2)[x]/w(x). . . . and S form the basis of the key schedule. The mapping between byte values and elements of GF(28 ) uses the same deﬁnition as used for the MDS matrix multiply. · 8i+3 . M2k−2 ) = (M1 . and the 4 key-dependent S-boxes used in the g function. . and multiplying them by a 4 × 8 matrix derived from an RS code. k − 1.2 The Function h and then into two word vectors of length k.1 Additional Key Lengths 5B EF 01 EF where the elements have been written as hexadecimal byte values using the above deﬁned correspondence. . Using this mapping. 15 and treating it as a 128-bit key. . . yk. . . and the four bytes are multiplied by the MDS matrix just as in g. For the RS matrix multiply. interpreting them as a vector over GF(28 ). . .2 m8i+4 · ··· · m8i+5 si.3. This is in some sense the “natural” mapping. Finally. . . m8k−1 . si. 2k − 1 Twoﬁsh can accept keys of any byte length up to 256 bits. .j · 28j . Each result of 4 bytes is then interpreted as a 32-bit word. For key sizes that are not deﬁned above. . the four bytes are each passed through a ﬁxed S-box. The bytes are ﬁrst converted into 2k words of 32 bits each 3 Mi = j=0 m(4i+j) · 28j i = 0. . Keys of any length shorter than 256 bits can be used by padding them with zeroes until the next larger deﬁned key length. 3. . . . . . an 80-bit key m0 . Mo . . . and N = 256. . M2k−1 ) A third word vector of length k is also derived from the key. . . The MDS matrix is given by: 01 EF 5B EF MDS = EF 5B EF 01 5B 01 EF 5B 7 for i = 0. For example. . This is done by taking the key bytes in groups of 8.j xj = = Li /28j mod 28 X/28j mod 28 for i = 0. . . the bytes are once again passed through a ﬁxed Sbox. . . . M2 . . These words make up the third vector. Twoﬁsh is deﬁned for keys of length N = 128.3. k − 1 and j = 0. . . and S = (Sk−1 .j = xj 8 j = 0. . and xored with a byte derived from the list. the RS matrix is given by: 01 A4 55 87 5A 58 DB 9E A4 56 82 F3 1E C6 68 E5 RS = 02 A1 FC C1 47 AE 3D 19 A4 55 87 5A 58 DB 9E 03 The three vectors Me .3 The Key Schedule The key schedule has to provide 40 words of expanded key K0 . More formally: we split the words into bytes. .is identiﬁed with the byte value i=0 ai 2i . . 4. Sk−2 . Lk−1 ) of 32-bit words of length k—and produces one word of output. m8i m8i+1 m8i+2 si. . where w(x) = x8 +x6 +x3 +x2 +1 is another primitive polynomial of degree 8 over GF(2). . . li. m9 would be extended by setting mi = 0 for i = 10. the key is padded at the end with zero bytes to the next larger length that is deﬁned. This function works in k stages. In each stage. M3 . . . . . 4. S0 ) Note that S lists the words in “reverse” order. . addition in GF(28 ) corresponds to a xor of the bytes. . 3 Si = j=0 si. Me Mo = (M0 .3 m8i+6 m8i+7 3 Figure 2 shows an overview of the function h. K39 . . 4.0 · ··· · si. . . We deﬁne k = N/64. Then the sequence of substitutions and xors is applied. N = 192. . . RS . . This is a function that takes two inputs—a 32-bit word X and a list L = (L0 . .

X ? q1 ? q0 ? q0 ? g L 3 ? q1 C k=4 k < 4C ? q0 ? q0 ? q1 ? q1 C k>2 k = 2C ? q1 ? q0 ? q1 ? g L 2 ? q0 ? g L 1 ? q1 ? q1 ? q0 ? g L 0 ? q0 ? q1 ? q0 ? q1 ? q0 MDS ? Z Figure 2: The function h 9 .

1) ⊕ 8a2 mod 16 t2 [a3 ].3 ] ⊕ l1.1 ] ⊕ l1.0 ] ⊕ l0. This is followed by another mixing step and S-box lookup. . x mod 16 a0 ⊕ b0 Here. and the second argument of h is Me . .0 q0 [y4. .0 ] ⊕ l3. each with the value i.1 q0 [y3. Bi is computed similarly using 2i + 1 as the byte value and Mo as the second argument. t1 [b1 ] a2 ⊕ b2 Z = i=0 zi · 28i = 16 b4 + a4 where Z is the result of h. q0 and q1 are ﬁxed permutations on 8-bit values that we will deﬁne shortly.2 ] ⊕ l2.1 ] q1 [q1 [q0 [y2. we deﬁne the corresponding output value y as follows: a0 . 4. . . 3. 9) . 255. it has the property that for i = 0.2 ] ⊕ l0.3.3 = = = = q1 [y4.0 y2. Each nibble is then passed through its own 4bit ﬁxed S-box.1 y2.3 The Key-dependent S-boxes We can now deﬁne the S-boxes in the function g by g(X) = h(X. 1. · y1 = .0 ] ⊕ l2.3 ] ⊕ l3. .1 ] ⊕ l0. .3 ] ⊕ l2.3 ] ⊕ l0.0 ] q0 [q0 [q1 [y2. For the permutation q0 the 4-bit S-boxes are given by t0 t1 t2 t3 = = = = [ [ [ [ 8 E B D 1 C A 7 7 B 5 F D 8 E 4 6 1 6 1 F 2 D 2 3 3 9 6 2 5 0 E 0 F C 9 B 4 8 B 5 A F 3 9 6 3 0 E 7 2 8 C 0 4 5 A 9 7 C 4 D 1 A ] ] ] ] The words of the expanded key are deﬁned using the h function. (The entries for the inputs 0. b2 a3 b3 a4 .1 y3.3. the word iρ consists of four equal bytes. Me ) ROL(h((2i + 1)ρ. . 1) ⊕ 8a0 mod 16 t0 [a1 ].1 ] ⊕ l3. . The values Ai and Bi are combined in a PHT.3 ] = = = = q1 [y3. . the key-dependent S-box si is formed by the mapping from xi to yi in the h function.If k = 4 we have y3. the byte is split into two nibbles.3 In all cases we have y0 y1 y2 y3 = = = = q1 [q0 [q0 [y2.4 The Expanded Key Words Kj where ROR4 is a function similar to ROR that rotates 4-bit values. The two results form two words of the expanded key. with an extra rotate over 8 bits. y . . for q1 the 4-bit S-boxes are given by t0 t1 t2 t3 10 = = = = [ [ [ [ 2 1 4 B 8 E C 9 B 2 7 5 D B 5 1 F 4 1 C 7 C 6 3 6 3 9 D E 7 A E 3 6 0 6 1 D E 4 9 A D 7 4 5 8 F 0 F 2 2 A 9 B 0 C 0 3 8 5 8 F A ] ] ] ] = (Ai + Bi ) mod 232 = ROL((Ai + 2Bi ) mod 232 .1 ] ⊕ l2. 8) where each 4-bit S-box is represented by a list of the entries using hexadecimal notation. S) That is. .2 ] q0 [q1 [q1 [y2. 4.2 y2.2 y3.5 The Permutations q0 and q1 The permutations q0 and q1 are ﬁxed permutations on 8-bit values. The resulting vector of yi ’s is multiplied by the MDS matrix. the two nibbles are recombined into a byte.2 q1 [y4. t3 [b3 ] a0 ⊕ ROR4 (b0 . .) Similarly. Finally. MDS .1 q0 [y4. b0 a1 b1 a2 .3. .0 q1 [y3. The function h is applied to words of this type.0 y3. for i = 0. 15 are listed in order.0 ] ⊕ l1.3 The constant ρ is used here to duplicate bytes.3 If k ≥ 3 we have y2. For the input value x. z2 2 · ··· · z3 y3 3 a2 ⊕ ROR4 (b2 .2 q0 [y3. They are constructed from four different 4-bit permutations each.2 ] ⊕ l1. These are combined in a bijective mixing step. b4 y = = = = = = = x/16 . One of the results is further rotated by 9 bits. For Ai the byte values are 2i. just as in the g function. ρ Ai Bi K2i K2i+1 = = = 224 + 216 + 28 + 20 h(2iρ. Mo ).2 ] ⊕ l3. where the list L is equal to the vector S derived from the key. First. 4. y z0 0 · ··· · z1 .

h 2i 2i 2i 2i h M2 h M0 PHT MDS h 2i + 1 2i + 1 2i + 1 2i + 1 h M3 h M1 MDS < <8 < < <9 < g R0 h h PHT MDS F0 S0 g S1 R1 < <8 < h h MDS F1 Figure 3: A view of a single round F function (128-bit key) 11 .

and three xors. . On a 200 MHz . this translates to a throughput of just under 90 Mbits/sec. All of our keying options precompute Ki for i = 0. The approximate total code size (in bytes) of the routines for encryption. For the 128-bit key this is particularly eﬃcient as precomputing the S-boxes now consists of copying the table of the appropriate q-box and xoring it with a constant (which can be done word-by-word instead of byte-by-byte). these are simply implementation trade-oﬀs and do not aﬀect the mathematics of Twoﬁsh. Twoﬁsh has been designed to allow several layers of performance tradeoﬀs. For each byte. Encryption and decyption speeds are constant regardless of key size. where available. Partial Keying For applications where few blocks are encrypted with a single key. and key setup is also listed. Minimal Keying For applications where very few blocks are encrypted with a single key. key setup. on a Pentium Pro a fully optimized assemblylanguage version of Twoﬁsh can encrypt or decrypt data in 285 clock cycles per block. This option uses a 1 Kb table to store the partially precomputed S-boxes. Zero Keying The zero keying option does not precompute any of the S-boxes. More importantly. All these options are interoperable. We have implemented four diﬀerent keying options. 5. There are several other possible keying options. One end of a communication could use the fastest Pentium II implementation. The necessary key bytes from S are of course precomputed as they are needed in every round.1 Performance on Large Microprocessors Table 1 gives Twoﬁsh’s performance. and the other the cheapest hardware implementation. one less layer of q-boxes is precomputed into the S-box table. though. there is a further possible optimization. but the examples listed below are representative of the range of possibilities.4 Round Function Overview Pentium Pro microprocessor.4. so only the encryption time is given. each with slightly diﬀerent setup/throughput tradeoﬀs. Using 4 Kb of table space. Compared to partial keying. it may not make sense to build the complete key schedule. The partial keying option precomputes the four S-boxes in 8by-8 bit tables. Full Keying This option performs the full key precomputations. memory use. For example. Using this option. for diﬀerent key scheduling options and on several modern microprocessors using diﬀerent languages and compilers. and dedicated VLSI hardware. It is eﬃcient on a variety of platforms: 32-bit CPUs. encryption or decryption. Instead. There is no time required to set up the algorithm except for key setup. or clock cycles to set up the complete key. . The result is a highly ﬂexible algorithm that can be implemented eﬃciently in a variety of cryptographic applications. each S-box is expanded to a 8-by-32-bit table that combines both the S-box lookup and the multiply by the column of the MDS matrix. All timing data is given in clock cycles per block. decryption. 5 Performance of Twoﬁsh Twoﬁsh has been designed from the start with performance in mind. The time required to change a key is the same as the time required to setup a key. Encryption and decryption speed are again constant regardless of key size. or 17.8 clock cycles per byte. . and other implementation parameters. The diﬀerences occur in the way the function g is implemented. but is useful for visualizing exactly how the algorithm works. so only k of the q-boxes are incorporated into the 8-by-8-bit S-box table that is built by the key schedule. every entry is computed on 12 Figure 3 shows a more detailed view of how the function F is computed each round when the key length is 128 bits. a computation of g consists of four table lookups. Incorporating the S-box and roundsubkey generation makes the Twoﬁsh round function look more complicated. This reduces the key-schedule table space to 1 Kb. . 8-bit smart cards. and the remaining q-box is done during the encryption. the last of the q-box lookups is in fact incorporated into the MDS table. depending on the relative importance of encrytion speed. after a 12700-clock key setup (equivalent to encrypting 45 blocks). and uses four ﬁxed 8-by-32-bit MDS tables to perform the MDS multiply. The times for encryption and decryption are usually extremely close. hardware gate count. 39 and use 160 bytes of RAM to store these constants. and thus needs no extra tables.

Processor Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pro/II Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium Pentium UltraSPARC UltraSPARC UltraSPARC UltraSPARC PowerPC 750 PowerPC 750 PowerPC 750 PowerPC 750 68040 68040 68040 68040 Language Assembly Assembly Assembly Assembly Assembly MS C MS C MS C MS C Borland C Borland C Borland C Borland C Assembly Assembly Assembly Assembly Assembly MS C MS C MS C MS C Borland C Borland C Borland C Borland C C C C C C C C C C C C C Keying Option Compiled Full Partial Minimal Zero Full Partial Minimal Zero Full Partial Minimal Zero Compiled Full Partial Minimal Zero Full Partial Minimal Zero Full Partial Minimal Zero Full Partial Minimal Zero Full Partial Minimal Zero Full Partial Minimal Zero Code Clocks to Key Clocks to Encrypt Size 128-bit 192-bit 256-bit 128-bit 192-bit 256-bit 8900 12700 15400 18100 285 285 285 8450 7800 10700 13500 315 315 315 10700 4900 7600 10500 460 460 460 13600 2400 5300 8200 720 720 720 9100 1250 1600 2000 860 1130 1420 11200 8000 11200 15700 600 600 600 13200 7100 9700 14100 800 800 800 16600 3000 7800 12200 1130 1130 1130 10500 2450 3200 4000 1310 1750 2200 14100 10300 13600 18800 640 640 640 14300 9500 11200 16600 840 840 840 17300 4600 10300 15300 1160 1160 1160 10100 3200 4200 4800 1910 2670 3470 8900 24600 26800 28800 290 290 290 8200 11300 14100 16000 315 315 315 10300 5500 7800 9800 430 430 430 12600 3700 5900 7900 740 740 740 8700 1800 2100 2600 1000 1300 1600 11800 11900 15100 21500 630 630 630 14100 9200 13400 19800 900 900 900 17800 3800 11100 16900 1460 1460 1460 11300 2800 3900 4900 1740 2260 2760 12700 14200 18100 26100 870 870 870 14200 11200 16500 24100 1100 1100 1100 17500 4700 12100 19200 1860 1860 1860 11800 3700 4900 6100 2150 2730 3270 16600 21600 24900 750 750 750 8300 13300 19900 930 930 930 3300 11600 16600 1200 1200 1200 1700 3300 5000 1450 1680 1870 12200 17100 22200 590 590 590 7800 12200 17300 780 780 780 2900 9100 14200 1280 1280 1280 2500 3600 4900 1030 1580 2040 16700 53000 63500 96700 3500 3500 3500 18100 36700 47500 78500 4900 4900 4900 23300 11000 40000 71800 8150 8150 8150 16200 9800 13300 17000 6800 8600 10400 Table 1: Twoﬁsh performance with diﬀerent key lengths and options 13 .

630 clocks/block at 16 bytes/block) is over ten percent faster than the best known DES assembly language implementation on the same platform. For an application that cannot have any key setup time. However.0 compiler chosen as the standard AES reference is not the best optimizing compiler. The Pentium and Pentium MMX achieve almost identical speeds for encryption and decryption. The encryption speed in Microsoft C of 40 Pentium clocks per byte (i. not because the key schedule uses any MMX instructions. We have listed many diﬀerent software per- . which is not surprising since Intel claims they have the same core. Compiler. but simply because of the larger cache size of the MMX chip. because of its smaller cache size.0). and vice versa! Fortunately.g.. The setup time for a Pentium more than doubles over the full keying case. almost all the extra time required is consumed merely in copying the code. Language. the choices of language and compiler can have a huge impact on performance. once a single key has been initialized. the diﬀerence between the compilers is not quite as large (e. future keys can be compiled at a cost of only a few hundred more clocks than full key schedule time. the assembly language that optimizes performance on a Pentium (or Pentium MMX) is drastically diﬀerent from the assembly language required to maximize speed on a Pentium Pro or Pentium II. the Pentium Pro/II CPUs can only perform one memory read per clock cycle. the table does not reﬂect the fact that. which is very odd because the “inner loop” C code used is virtually identical.e.2 versus 870 clocks per block for Borland C 5. when comparing the relative performance of diﬀerent algorithms. with both set to optimize for speed (630 clocks per block for Microsoft Visual C++ 4. However. using the same language and compiler for all implementations helps to make the comparison meaningful. coding the Twoﬁsh algorithm in assembly language 14 achieves speeds of 285 clocks. on a Pentium Pro/II. there also seems to be some anomalous behavior of the C compilers. there were cases in the table where the two speeds diﬀered by considerably more than ten percent (we used the larger number in the table). it is relatively simple to detect the CPU type at run-time and select which version of the assembly code to use. as well as about an extra 5000 bytes of memory to hold the “compiled” code. available only in assembly language. In almost all cases. It is clear that the Borland C 5. even though the ﬁnal code size and speed achieved on each platform are almost identical. We also noted several cases where compiler switches which seemed unrelated to performance optimization sometimes caused very large changes in timings. but it is still signiﬁcant. despite documentation claiming that it is possible. For example. Compiled In this option. the subkey constants are directly embedded into a key-speciﬁc copy of the code. although. but the Pentium MMX setup time is still comparable to the Pentium Pro setup time. However. while the Pentium can process only a total of two ALU operations or memory accesses per clock. the Pentium Pro/II can peform two ALU operations per clock in addition to memory accesses. Another anomaly is that the key schedule setup time is considerably faster (43%) on the Pentium MMX CPU than on the Pentium. the Microsoft Visual C++ 4. These (and other) signiﬁcant architectural diﬀerences result in the fact that running the optimal Pentium Twoﬁsh encryption code on a Pentium Pro results in a slowdown of nearly 2:1. while the Pentium and Pentium MMX can perform two. Empirically. the encryption and decryption routines in C achieved speeds within a few percent of each other. For example. This problem alone accounts for nearly half of the speed diﬀerence between Borland and Microsoft. the time it takes to encrypt one block is the sum of the key setup time and encryption time for the zero keying option. as noted above.. Some additional setup time is required.2 compiler generates Twoﬁsh code that is at least 20% faster than Borland on a Pentium computer. The bottom line is that. The remaining speed diﬀerence comes simply from poorer code generation. It should be noted that performance numbers for Pentium II processors are almost identical in all cases to those for a Pentium Pro. 640 clocks/block). However. saving memory fetches and allowing the use of the Pentium LEA opcode to perform both the PHT and the subkey addition all in a single clock.the ﬂy. but this option allows the fastest execution time of 285 clocks per block on the Pentium Pro. due mainly to the larger cache size. Part of the diﬀerence stems from the inability of the Borland compiler to generate intrinsic rotate instructions. The Borland compiler is uniformly slower than Microsoft’s compiler. The key setup time consists purely of computing the Ki values and S. and Processor Choice As with most algorithms. To make matters even more complicated. achieving a very signiﬁcant speedup over any of the C implementations. 600 clocks/block vs. but it does not guarantee a valid measure of the relative speeds. the MMX key setup times are faster.

for a variety of message lengths. as discussed above.. For shorter messages.6 msec 60 2150 32900 8. During encryption or decryption. Size and speed estimates for larger key sizes are very straightforward.g. Note that each routine ﬁts easily within the code cache of a Pentium or a Pentium Pro.2 Performance on Smart Cards Twoﬁsh has been implemented on a 6805 CPU. This table assumes the best of our implementations for the particular length of text. C. For very short messages. with the exception of the zero keying option. the key setup time can overwhelm the encryption speed. and the key schedule routine is slightly less than 5000 bytes. and 20000 clock cycles per block. but once an algorithm becomes standardized. This setup time could be cut considerably at the cost of two additional 512-byte ROM tables. minor improvements in code size and speed can be obtained. It should also be observed that the lack of a second index register on the 6805 has a signiﬁcant impact on the code size and performance. In assembler. it is possible to achieve signiﬁcantly smaller code sizes using loops with round counters. Pentium Pro). so a diﬀerent CPU with multiple index registers (e. These sizes are larger than Blowﬁsh but very similar to a fully optimized assembly language version of DES. but about 2200 bytes of that total can be discarded if only 128-bit keys are needed.g. If the 16 key bytes and the extra 8 bytes of the Reed-Solomon results (S) are retained in memory between encryption operations. 15 .. the assembly language performance numbers are the best numbers to use to gauge absolute performance.formance metrics across platforms and languages to facilitate speed comparisons between Twoﬁsh and other algorithms. Code and Data Size As shown in Table 1.4 The block encryption and decryption times are almost identical. the encryption and decryption routines are each about 4500 bytes in length. Our belief is that. the encryption and decryption routines are each about 2250 bytes in size. both 128-bit 4 For key setup and encryption. Our diﬀerent options result in the following numbers: RAM Code and Clocks Time per block (bytes) Table Size per Block @ 4MHz 60 2200 26500 6. given the 128-bit implementation. and there is zero startup comparison purposes: DES on a 6805 takes about 2K code. The extra code size is fairly negligible—less than 100 extra bytes for a 192-bit key. there are about 4600 bytes of ﬁxed tables for the MDS matrix. the remaining 36 bytes of RAM are available for other functions. which requires slightly over 1750 clocks per key. 23 bytes of RAM. If only encryption is required. the remaining 4000 bytes of code are in the key scheduling routines.3 msec The code size includes both encryption and decryption.7 msec 60 1760 37100 9. 5. Note that.. with several diﬀerent space–time tradeoﬀ options. on any given platform (e. and to 3400 clocks for 256-bit keys. Table 2 gives Twoﬁsh’s performance on the Pentium Pro (assembly-language version). at a cost in performance. performance is the sum of key setup and encryption. The other keying options use less data and table space. Similarly. inability to produce rotate opcodes). which is a typical smart card processor.g.2 msec 60 2000 35000 8. and by about 5200 clocks per block for 256-bit keys. and less than 200 bytes for a 256-bit key. Highlevel languages (e. Total Encryption Times Any performance measures that do not take key setup into account are only valid for asymptotically large amounts of text.g. it will ultimately be coded in assembly for the most popular platforms.. The only key schedule precomputation time required in this implementation is the ReedSolomon mapping used to generate the S-box key material S from the key M . 6502) might be a better ﬁt for Twoﬁsh. Java) are also important because of the ease of porting to diﬀerent platforms. the code size for a fully optimized Twoﬁsh on a Pentium Pro ranges from about 8450 bytes in assembler to 14100 bytes in Borland C. q0 . and q1 required for key setup. In addition to the code. the key schedule precomputation increases to 2550 clocks for 192-bit keys. since they are unaffected by the vagaries and limitations of the compiler (e. The 60 bytes of RAM for the smart card implementation include 16 bytes for the plaintext/ciphertext block and 16 bytes for the key. the code sizes in the table are for fully unrolled implementations. In Borland C. The encryption time per block increases by less than 2600 clocks per block (for any of the code size/speed tradeoﬀs above) for 192-bit keys. and each key requires about 4300 bytes of key-dependent data tables for the full keying option. in either C or assembler. these tables ﬁt easily in the Pentium data cache.

4 18. Twoﬁsh might not be able to take advantage of all the parallel execution units available on a VLIW processor. as well the ability to run existing Pentium code. other than that it includes an Explicitly Parallel Instruction Computing (EPIC) architecture. and the Pentium Pro/Pentium II may process up to three opcodes per clock. Such a scheme would replace 512 bytes of ROM table with 64 bytes of ROM and a small subroutine to compute the full 8-bit q0 and q1 .0 17. EPIC is related to VLIW architectures that allow many parallel opcodes to be executed at . access to memory tables is limited in most VLIW implementations to only a few parallel operations.9 27. based on “inline” computation of 4-bit S-boxes. Note that larger key sizes also require more RAM to store the larger keys and the larger ReedSolomon results. Observe that it is possible to save further ROM space by computing q0 and q1 lookups using the underlying 4-bit construction. as speciﬁed in Section 4. while the Pentium allows only two opcodes in parallel. However. including Intel’s Merced. unfortunately. also naturally involves less non-linearity and thus generally requires more rounds. and we expect similar restrictions to hold for Merced. For example. For example. as with most encryption algorithms.9 34. Since Twoﬁsh relies on 8-bit non-linear S-boxes.2 18. Thus.3 63. Thus. saving perhaps 350 bytes of ROM. this technique is only of interest in smart card applications for which ROM size is extremely critical but performance is not.5 20. reducing the RAM requirements of the implementation signiﬁcantly. but Serpent also requires 32 rounds.3 Performance on Future Microprocessors Given the ever-advancing capabilities of CPUs. may experience a relatively larger speedup than Twoﬁsh on a VLIW CPU.5. Serpent [BAK98]. once. but only two of the opcodes can read from memory. and is considerably slower to start with. In some applications it might be viable to store all key material in non-volatile memory. such an approach illustrates the implementation ﬂexibility aﬀorded by Twoﬁsh.5 47. there is still plenty of parallelism in Twoﬁsh that can be wellutilized in an optimized VLIW software implementation.9 92. It should also be noted that.6 18. the alternative of not using large S-boxes. it is also worthwhile to remember that DES has been 16 5. Equally important.Plaintext (bytes) 16 32 64 128 256 512 1K 2K 4K 8K 16K 32K 64K 1M Keying Option Zero Zero Zero Zero Partial Full Full Full Compiled Compiled Compiled Compiled Compiled Compiled Clocks to Key 1250 1250 1250 1250 4900 7800 7800 7800 12700 12700 12700 12700 12700 12700 Clocks to Encrypt 860 1720 4690 6880 7360 10080 20160 40320 72960 145920 291840 583680 1167360 18677760 Total Clocks per Byte 131.3 23. an existing Philips VLIW CPU can process up to ﬁve opcodes in parallel. encryption speed would decrease by a factor of ten or more. Nonetheless.8 Table 2: Best speed to encrypt a message with a new 128-bit key on a Pentium Pro time for the next encryption operation with the same key. Future mainstream CPUs may include such support for the new AES standard. it is clear that table access is an integral part of the algorithm. it is worthwhile to make some observations about how the Twoﬁsh algorithm will run on future processors. the primitive operations used in Twoﬁsh could very easily be added to a CPU instruction set to improve software performance signiﬁcantly. However. while it may allow greater parallelism.8 73. Not many details are known about Merced. However.9 19.3.

It should also be noted that the Twoﬁsh round structure can be very nicely pipelined to break up the overall function into smaller and much faster blocks (e. such an architecture might raise the throughput by a factor of two or more (particularly for the larger key sizes). we could compute the qi ’s. Feistel xor). although this metric is certainly somewhat dependent on the particular silicon technology and circuit library available. the h function logic would be used during a key setup phase to compute the subkeys. Table 3 gives hardware size and speed estimates for the case of 128-bit keys. but merely the insertion of one extra layer of clocked storage elements (128 bits). but it is believed that a 128-bit AES scheme will be acceptable in the market long enough that most vendors will choose to implement that recommended key length. throughput can be doubled.g.4 Hardware Performance six q0 blocks and six q1 blocks (for N = 128). the gate savings mount fairly quickly. Depending on the architecture. MDS. but such a ROM is usually several times larger than the direct logic implementation. If the subkeys were precomputed.5 for building the 8-bit permutations q0 and q1 from four 4-bit permutations was selected mainly to minimize gate count in many hardware implementations of Twoﬁsh. and Feistel xor would be computed on the “odd” clocks. for a very modest increase in gate count. In ECB mode. All the examples in the table are actually quite small in today’s technology. counter mode. and the clock speed (or startup time) may increase. For even higher levels of performance. q’s. it is not meaningful to give just one ﬁgure for the speed and size attributes of Twoﬁsh in hardware. allowing faster operation because there is no need to ripple through several layers of key material xors and q permutations. halving throughput but saving even more gates. or an interleaved chaining mode. As a very simple example of two-level pipelining. the Sboxes could be precomputed and stored in on-chip RAMs. Instead. or the logic could be duplicated. Using careful balancing of circuit delays between the two clock phases. As another example of the possible tradeoﬀs.a standard for more than twenty years. while the PHT. saving gates but adding a startup time roughly equal to one block encryption time. We see no problem meeting NSA’s requirement to “be able to encrypt data at a minimum of 1 Gb/s. It is clear that this general approach can be applied with more levels of pipelining to get higher throughput.. thus running the clock at twice the speed of an unpipelined approach. pipelined if necessary. Since each full h block in hardware (see Figure 2) involves . Similarly. but trying to run all of them in a single clock cycle does aﬀect the cycle time. there are many possible space–time tradeoﬀs in hardware implementations of Twoﬁsh. in existing technology” [McD97]. The circuit delays in building q0 or q1 using logic gates are typically at least as fast as those using ROMs. and no popular CPU has added instruction set support for it. the throughput can be dramatically increased by pipelining the Twoﬁsh round structure. assuming that the application can use one of these cipher modes. a second block is processed in parallel on the alternate clock cycles. but even that is very 17 No actual logic design has been implemented for Twoﬁsh. eight 256-byte RAMs) would perhaps double or triple the size of the logic.35 micron CMOS technology. a single h function logic block could be time-multiplexed between computing T0 and T1 . If computed on the ﬂy. the h function logic could be time-multiplexed between subkeys and the round function to save size at a cost in speed. The addition of such RAMs (e. subkey addition. even though DES software performance would beneﬁt greatly from such features.3. and MDS multiply for one block during “even” clocks. PHT. These estimates are all based on existing 0. For example. the round subkeys could be precomputed and stored in a RAM. and it would also impose a signiﬁcant startup time on key change to initialize the RAMs. As in software.g. These permutations can be built either directly in logic gates or as full 256-byte ROMs in hardware. except the ﬁnal (highest performance non-pipelined) instance.. so for high-performance systems with infrequent re-keying. adding gates but perhaps running twice as fast. we will try to outline several of the options and give estimate for speed and gate count of several diﬀerent architectures. 5. multiple independent engines can be used to achieve linear speedups at a linear cost in gates. although diminishing returns are achieved past a certain point. key xors. None of these operations individually is slow. this approach allows us to cut the logic delay in half. Thus. but estimates in terms of gates for each building block have been made. The construction method speciﬁed in Section 4. Such an approach does not require duplicating the entire Twoﬁsh round function logic. subkey addition. this option may be attractive. the logic will grow somewhat in size for larger keys. or they could be computed on the ﬂy. key xors. Thus. Despite these disadvantages.

25 micron) becomes the industry norm. Also. Even when cryptographers made eﬀorts to use eﬃcient 32-bit operations. leave a margin for error and provide more security than is provably required. In other words. RC97]. we tried to evaluate all design decisions in terms of performance. When ciphers are analyzed according to this property. RC2 [Riv97. LMM91]—only considered performance as an afterthought. 18 . HKR+98]. alternatively. SEAL [RC94. Simplicity Do not include ad hoc design elements without a clear reason or function. SEAL [RC94. Other. RC5 is faster than DES on most microprocessors. the results can be surprising [SW97]. they often lacked a full appreciation of low-level software optimization principles associated with high-performance CPUs. Thus. are not good measures of performance. 6. The RC5 documentation [Riv95] uses the term “round” diﬀerently: one RC5-deﬁned round equals two Feistel rounds. more recent designs. SOBER [Ros98] for 8-bit ones. compare them on the basis of relative performance. we use the term “round” in the traditional sense: as it was deﬁned by DES [NBS77] and has been used to describe Feistel-network ciphers ever since. Try to design a cipher whose details can be easily kept in one’s head.6 but since its round function is more than twice as fast as DES’. increase the algorithms’ complexity without aﬀecting performance). RC5 might have twice the number of rounds of DES. many algorithms are not as eﬃcient as they could be. The early post-DES cipher designs would often compete on the number of rounds in the cipher. Other cipher designs of the period— REDOC II [CW91]. Minor modiﬁcations in the design of Blowﬁsh [Sch94]. Khufu/Khafre [Mer91] was the ﬁrst published algorithm that explicitly used operations that were eﬃcient on 32-bit microprocessors. LOKI [BPS90] and LOKI 93 [BKPS93]. In designing Twoﬁsh. Conservativeness Do not design close to the edge. we tried to stress the following principles: Performance When comparing diﬀerent options. KRRR98] was designed for 16-bit microprocessors. such as the number of rounds. but to the design of the S-boxes and the key schedule. SPEED [Zhe97]5 and Zhu-Guo [ZG97]. Two 1997 designs. and RC4 [Sch96] could improve performance without aﬀecting security [SW97] (or. fewer rounds. What is important is the cipher’s speed: the number of clock cycles per byte encrypted. RC97] is a more recent example. The original FEAL paper [SM88]. IDEA [LM91.1 Performance-Driven Design The goal of performance-driven design is to build and evaluate ciphers on the basis of performance [SW97]. Arbitrary metrics. discussed the beneﬁts of a stronger round function and 5 Speed 6 Here has been cryptanalyzed in [HKSW98. These principles were applied not only to the overall design of Twoﬁsh. are signiﬁcantly slower than alternatives that existed years previous. 6 Twoﬁsh Design Philosophy In the design of Twoﬁsh. try to design against attacks that are not yet known. for example.Gate count 14000 19000 23000 26000 28000 30000 80000 h blocks 1 1 2 2 2 2 2 Clocks per Block 64 32 16 32 48 64 16 Pipeline Levels 1 1 1 2 3 4 1 Clock Speed 40 MHz 40 MHz 40 MHz 80 MHz 120 MHz 150 MHz 80 MHz Throughput (Mbits/sec) 80 160 320 640 960 1200 640 Startup clocks 4 40 20 20 20 20 300 Comments Subkeys on the ﬂy S-box RAMs Table 3: Hardware tradeoﬀs (128-bit key) doable today and will become fairly inexpensive as the next generation silicon technology (0. do not seem to take performance into account at all.

However. Today’s CPUs will eventually be in smart cards. While it is impossible to optimize a cipher design for resisting 7 One AES submission. When we contemplated a change to the round function. This would add 10 clock cycles to the round function on the Pentium (2 on the Pentium Pro). we did not ignore performance on other 32-bit CPUs. Nyb95. On memory-poor machines. OCo94b. HT94] can create ciphers resistant to diﬀerential and linear cryptanalysis.2 Conservative Design During our design. KSW97]. The CAST cipher cryptanalyzed in [MSK98a] is not CAST-128. We took a slightly diﬀerent approach in our design. Are 8 rounds of this improved cipher better than 16 rounds of the current design? This analysis is necessarily dependent on the microprocessor architecture the algorithm is being compared on. Twoﬁsh’s round function encrypts at about 20 clock cycles.g. we concentrated on that platform. since these can only be used in very specialized applications and often require unrealistic implementations: e. Are 17 rounds of this new round function more or less secure than 16 rounds of the old one? • We could have deﬁned the key-dependent Sboxes using the whole key.1 Performance-driven Tradeoﬀs have to cut the number of rounds in half to be able to aﬀord this. While we focused on the Intel Pentium architecture. we would There has been considerable research in designing ciphers to be resistant to known attacks [Nyb91. such as diﬀerential [BS93]. NK95. Kiefer [Kie96] in [JK97]. as well as 8-bit and 16-bit CPUs. we also tried to keep 8-bit smart card and hardware implementations in mind. we considered using a 8-by-8 MDS matrix over GF(24 ) to ensure a ﬁner-grained diﬀusion. it is prudent to assume that new attacks will be developed in order to break it. To keep the performance constant. Ciphers provably secure against diﬀerential cryptanalysis have been attacked with higher-order diﬀerentials [Lai94. SAM97. When designing a cipher. For example. Nyb96]. This would have doubled key setup time on high-end machines. If there is any lesson from the past twenty years of microprocessors. and a version of CAST in [MSK98a]. instead of half of it. it does not create ciphers resistant to whatever form of cryptanalysis comes next. it is that the high end gets better and the low end never goes away. 6. OCo94a. 32 simultaneous ECB encryptions. NM97]. KSW96. 19 . was successfully broken using the interpolation attack [MSK98b]. Three examples: • We could have added a data-dependent rotation to the output of the two MDS matrices in each round. The only thing we did not consider in our performance metrics are bitslice implementations [Bih97. Mat96.Since NIST’s platform of choice was the Intel Pentium Pro [NIST97b]. Knu94b. the former would have been no slower on a Pentium but at least twice as slow on a low-memory smart card. uses ideas from bitslice implementations to create a cipher that is very eﬃcient on 32-bit processors while sacriﬁcing performance on 8-bit microprocessors. The question to ask is: Are 11 rounds of the modiﬁed cipher more or less secure than 16 rounds of the unmodiﬁed cipher? • We could have removed the one-bit rotation. However. 16 rounds translates to about 320 clock cycles per block encrypted. Nyb94.7 6. we constantly evaluated the relative performance of diﬀerent modiﬁcations to our round function. Yesterday’s top-of-the-line CPUs are currently in smart cards. DGV94b. Nyb93. Serpent [BAK98]. linear [Mat94]. Instead of trying to optimize Twoﬁsh against known attacks. we tried to make Twoﬁsh strong against both known and unknown attacks. Knu95b] or the interpolation attack [JK97]: KN -cipher [NK95] was attacked in [JK97. OCo94c. we evaluated it in terms of increasing or decreasing the number of rounds to keep performance constant. SMK98]. another cipher provably secure against diﬀerential and linear cryptanalysis. This would have saved clocks equivalent to one Twoﬁsh round. instead of a 4-by-4 MDS matrix over GF(28 ).. Knu94a. but it does illustrate that while the CAST design procedure [AT93. or 32 interleaved IVs. it is dangerous to rely solely on theory when designing ciphers. and halved encryption speed on memory-poor implementations (where the S-boxes could not be precomputed). while the 8-bit microprocessors will move to devices even smaller. SNAKE [LC97]. and related-key cryptanalysis [Bih94. This research has culminated in strong cipher designs—CAST-128 [Ada97a] and MISTY [Mat97] are probably the most noteworthy—as well as some excellent cryptanalytic theory. we would have to reduce the number of rounds to 11.1.

The diﬀerences were in the key material used (the round function’s g function uses a list of key-derived words processed by an RS code. KR97].. conservative design and over-engineering can instill some conﬁdence. the typical student cipher fares rather badly against a standard suite of statistical tests. Blowﬁsh. respectively). and Blowﬁsh indicates that complicated round functions are not always better than simple ones. it is not necessary that the identical function be used for encryption and decryption. DES. ZMI90] or a generalized Feistel network [Nyb96]. Since any particular “basket” has the potential of breaking the entire cipher. even though our analysis cannot break anywhere near that number. Panama [DC98]. We used key-dependent S-boxes. but are built from the 8 RC5’s security is almost wholly based on data-dependent rotations. probably the most studied block-cipher structure. At 32 rounds. subsequent research [KM97. At 128 rounds. CAST.. Although initial cryptanalysis was promising [KY95] (see also [Sel98]). it looks better. we tried to create a simple round function and then iterate it more than enough times for security. CRISP [Lee96]. Serpent. 6. One of the ways to simplify a design is to reuse the same primitives in multiple parts of a cipher. However.g.g. because they offer adequate protection against known statistical attacks and are likely to oﬀer protection to any unknown similar attacks. 10 Student cryptography projects bear this observation out. 6. Square). At 16 rounds. REDOCII [CW91]). Also. since the same kinds of analysis can apply to the cipher in two diﬀerent ways.11 We feel that doing so greatly simpliﬁes the analysis of Twoﬁsh. While many algorithms reuse the encryption operation in their key schedule (e.. We added one-bit rotations to prevent potential attacks that relied solely on the byte structure. Blowﬁsh). The use of the RS code to derive the key material for g adds substantial resistance to related-key attacks.2 Reversibility While it is essential that any block cipher be reversible. We deﬁned Twoﬁsh at 16 rounds. We used well-studied design elements throughout the algorithm. instead of something newer like an unbalanced Feistel network [SK96.g. 9 Akelarre was severely broken in [FS97. to create round subkeys [Ada97b]. even bad round functions can be made to be secure.10 Even a simple round function like TEA’s [WN95] or RC5’s seems secure after 32 rounds [BK98].. we used essentially the same construction (8-by-8-bit key-dependent S-boxes consisting of alternating ﬁxed permutations and subkey xors followed by an MDS matrix followed by a PHT) in both the key schedule and the round function. To that end. Cryptographic design does not lend itself to the adage of not putting all your eggs in one basket. The Twoﬁsh encryption and decryption round functions are slightly diﬀerent. Anecdotal evidence from algorithms like FEAL [SM88]. even the worst designs look very good. YTH [YTH96]). The rotations represent a performance-driven design tradeoﬀ: putting the additional rotations into F would have unacceptably slowed down the cipher performance on high-end machines.3. The most novel design elements we used—MDS matrices and PHTs—are only intended for diﬀusion (and are used in Square [DKR97] and SAFER.3. Some block ciphers are reversible with changes only in the key schedule (e. BK98] suggests that there is considerably more to learn about the security properties of data-dependent rotations. while others require diﬀerent algorithms for encryption and decryption (e. with enough rounds. BB96].1 Reusing Primitives 6.g.3 Simple Design A guiding design principle behind Twoﬁsh is that the round function should be simple enough for us to keep in our heads. Many elements of Twoﬁsh reﬂect this philosophy. it makes more sense to use as few baskets as possible—and to scrutinize those baskets intensely. complicated round functions are harder to analyze and rely on more ad-hoc arguments for security (e. 20 . IDEA. We started with a Feistel network. In Twoﬁsh. the key schedule’s h function uses individual key bytes directly) and the rotations. and several alternative DES key schedules reuse the DES operation [Knu94b.attacks that are unknown. both the 32-bit block input and 48-bit key input. We designed a very thorough key schedule to prevent related-key and weak-key attacks. so that ciphertext can be decrypted back into plaintext. We did not implement multiplication mod 216 + 1 (as in IDEA or MMB [DGV93]) or data-dependent rotations (as in RC58 or Akelarre [AGMP96]9 ) for non-linearity. 11 The closest idea is an alternate DES key schedule that uses the DES round function. we are unaware of any that reuse the same primitives in exactly this manner. SAFER. RC4.

This idea is similar to the one used in CS-cipher [SV98]. JK97. KLPL95]. and YLCY [YLCY98]. Knu93c. neither the basic algorithm nor its variants have been extensively cryptanalyzed. Randomly constructed known S-boxes are unlikely to be secure. CAST [MA96. NewDES is neither a DES variant nor a new algorithm based on DES. like DES. could be made much stronger with good S-boxes. usage. it is simple to build a hardware or software module that does both encryption and decryption without duplicating much functionality. Serpent [BAK98] reused the DES S-boxes.same blocks. Higher-order diﬀerential cryptanalysis is especially powerful against algorithms with simple algebraic S-boxes [Knu95b. Certainly input size matters more than output size. 6.4 S-boxes The security of a cipher can be very sensitive to the particulars of its S-boxes: size. while Blowﬁsh’s four 8-by-32-bit S-boxes require 4 kilobytes. 6.12 with S-boxes derived from the Declaration of Independence [Jeﬀ+76]. (Note that there is a limit to the advantages of making S-boxes bigger. Note that having the cipher work essentially the same way in both directions is a nice feature in terms of analysis. 6-by-4-bit S-boxes require 256 bytes of storage. sn DES [KPL93. The advantage of the latter is that the S-boxes are more compact. and [Knu93a. turning them into an application-speciﬁc family key. Khufu/Khafre. we chose the 4-by-4-bit S-boxes randomly and then extensively tested the resulting 8-by-8-bit S-boxes against the cryptographic properties we required.2 Algorithmic S-boxes S-boxes can either be large tables. S-boxes with small input size and very large output size tend to have very good linear approximations. DES variants with random ﬁxed S-boxes are very likely to be weak [BS93. like FEAL. rather than considering them as separate attacks with potentially radically diﬀerent levels of difﬁculty [Cop98]. GOST [GOST89] navigated a middle course: each application has diﬀerent ﬁxed S-boxes. can be used to generate S-boxes with given cryptographic properties.1 Large S-boxes 6. Ciphers invented before the public discovery of diﬀerential cryptanalysis sometimes used arbitrary sources for their S-box entries. In Twoﬁsh we tried to do both: we chose to build our 8-by-8-bit S-boxes algorithmically out of random 4by-4-bit S-boxes. from GOST’s 4-by-4-bit Sboxes to Tiger’s 8-by-64-bit S-boxes [AB96b]. simply by testing the results of the generation algorithm. That is. since it lets analysts consider chosen-plaintext and chosen-ciphertext attacks at once. Knu93b] against LOKI. and SAFER. and it is vulnerable to differential cryptanalysis [BS92]. LOKI-89/LOKI-91 (and LOKI97 [Bro98]). Algebraic S-boxes can result in S-boxes that are vulnerable to diﬀerential cryptanalysis: [Mur90] against FEAL. SMK98]. However. The advantage of the former is that there is no mathematical structure that can potentially be used for cryptanalysis. IDEA. DES’ eight 12 Despite 13 The the algorithm name. or derived algebraically. WAKE design has several variants [Cla97. number. NewDES [Sco85]. Mat95]. while the best attack on Blowﬁsh breaks only four [Rij97]. but the exact same module cannot both encrypt and decrypt. Some cipher designers responded to this threat by carefully crafting S-boxes to resist known attacks— DES [Cop94]. Cla98]. Khafre uses S-boxes taken from the RAND tables [RAND55].13 The best existing attack on Khufu breaks 16 rounds [GC94]. Ada97a]—while others relied on random key-dependent S-boxes for security—Khufu. and can be more easily implemented in applications where the ROM or RAM for large tables is not available. however. WAKE [Whe94]. Large S-boxes are generally assumed to be more secure than smaller ones—a view we share—but at the price of increased storage requirements. values. while a 16-by-16-bit S-box requires 128 kilobytes. an 8-by-64-bit S-box can be stored in 2 kilobytes. 21 . and CMEA was weakened extensively because of a poor S-box choice [WSK97].) Twoﬁsh used the same solution as Square: mid-sized S-boxes (8-by-8-bit) used to construct a large S-box (8-by-32-bit). Both tabular and algebraic techniques.4. Blowﬁsh. S-boxes vary in size.4. S-boxes with suﬃciently large outputs relative to input size are guaranteed to have at least one perfect linear approximation [Bih95].

setting up a single Blowﬁsh key takes as much time as encrypting 520 blocks. bijective. Either the round subkey itself is used to generate the other round subkeys in some cryptographically secure manner. uses SHA [NIST93] to create its key-dependent S-boxes. and in others some manipulation of the round subkeys is required to determine the other round subkeys: e. we made the small S-boxes key dependent. self-inverse keys. such as related-key diﬀerential attacks [KSW96. Key schedules can be divided into several broad categories [CDN98]. in order to avoid “slide” attacks [Wag95b]. Blowﬁsh. in general. as in DES. but the cost is an enormous performance penalty in key-setup time. as in RC5 and CS-Cipher. Blowﬁsh uses repeated iterations of itself. However.g. 14 For An algorithm’s key schedule is the mechanism that distributes key material to the diﬀerent parts of the cipher that need it. For example. they can make attacks on the cipher easier. ICE [Kwa97]. RP95b. IDEA. they often lead to strange properties of the cipher: large classes of equivalent keys. there is really no such thing as a keydependent S-box. Biham-DES [BB94]. 22 .3 Key-dependent S-boxes 6. • The key bits used in each round must be unique to the round. In some key schedules. knowledge of a round subkey uniquely speciﬁes bits of other round subkeys. and Shark. etc. the DES weak (self-inverse) keys have been exploited in many attacks on larger cryptographic mechanisms built from DES [Knu95a]. This is not simply a cryptographic PRNG grafted onto the cipher. and related-key) attacks. CWSK98].4. It is our belief that while random key-dependent S-boxes can offer acceptable security if used correctly. of data. KSW97]. These attacks can be especially devastating when the cipher is used in a hash function construction.) We often used this conceptualization when carrying out our own cryptanalysis against Twoﬁsh. see Figure 3 on page 11.g. or a one-way function is used to generate the sound subkeys (sometimes the block cipher itself): e. (For example. linear. The latter results in a simpler key schedule. to avoid attacks based on non-surjective round functions [BB95. • The cipher must be secure against an attacker with partial knowledge or control over some key bits. When key schedules are poorly designed. Some simple design principles guided our development of the key schedule for Twoﬁsh: • Design the Key Schedule for the Cipher.g.. Even worse.. unwanted synergies could lead to attacks on the resulting cipher. It is our belief that ciphers with key-dependent S-boxes are. for example. In some ciphers. the S-box is constructed speciﬁcally to ensure that no two entries are identical—Khufu and WAKE—while others simply create the S-box randomly and hope for the best: REDOC II [CW91] and Blowﬁsh [Sch94]. and some attacks on the cipher will be focused directly at the key schedule. and LOKI.5 The Key Schedule S-boxes are either ﬁxed for all keys or key dependent. expanding the key material in the process. but unless the creation algorithm is extensively cryptanalyzed together with the encryption algorithm. Twoﬁsh uses a complex multistage series of S-boxes and round subkeys that are often precomputed as key-dependent S-boxes for efﬁciency purposes. and the S-1 [Anon95] cipher was broken due to a bad key-schedule design [Wag95a].. To avoid diﬀerential (as well as high-order diﬀerential. Serpent [BAK98]. The results are S-boxes that are eﬀectively random. the Twoﬁsh example. Another strategy is to generate key-dependent S-boxes from a known secure S-box and a series of strict mathematical rules: e. These properties can often aid an attacker in a real-world attack. but may result in weaknesses (e. or 4160 bytes. we made the 8-by-8-bit S-boxes.6. Most key-dependent S-boxes are created by some process completely orthogonal to the underlying cipher. So. RPD97. In some ciphers the bits are just reused. CAST and SAFER.14 An alternative is to build the S-boxes using fairly simple operations from the key. the beneﬁts of a surjective S-box are worth the additional complexities that constructing them entails. This results in a much faster key setup. as well as the 8-by-32-bit Sboxes.g. more secure than ﬁxed Sboxes.. There are two diﬀerent philosophies regarding keydependent S-boxes. a weakness in reduced-round variants of Blowﬁsh [Vau96a]). Other key schedules are designed so that knowledge of one round subkey does not directly specify bits of other round subkeys. SEAL. This is necessary for three reasons: • There are fewer key bits provided as input to the cipher than are needed by the cipher.

as is done in Khufu.15 If performance were not an issue. and quasi-weak keys. 7 7.5. we can make convincing arguments that none of these properties exists. the only way a key bit can aﬀect the cipher is after it deﬁnes a key-dependent S-box. In the design of Twoﬁsh. The key material used to derive the key-dependent S-boxes in g is derived from the key using an RS code having properties similar to those of the MDS matrix. and to generate them randomly. Contrast this with an SP-network. However. All key material goes through h (or g. However. 6. do not necessarily make the cipher weak. 16 In fact. and DES-style complementation properties. This allows us to analyze the properties of the key schedule in terms of the properties of the byte permutations. This led to a much simpler key schedule with much more complicated analysis. A better defense. This allowed us to apply much of the same analysis to both the round function and the subkey generation. Key-stretching techniques can always be used to frustrate brute-force attacks [QDD86. The Twoﬁsh key schedule’s subkey generation mechanism.key schedule is instead an integral part of the whole cipher design. • The key schedule must be reasonably eﬃcient for hardware implementations. • The key schedule must have minimal latency for changing keys. Our performance criteria included: • The key schedule must be precomputable for maximal eﬃciency.16 This means that we were able to use operations in our F function that are ineﬃcient in the other direction. • Use All Key Bytes the Same Way. use of a Feistel network means that the F function need only be calculated in one direction. such as the existence of equivalent keys. The cynical reader would immediately conclude that the NSA is concerned with the eﬃciency of their brute-force keysearch machines. we tried to balance these two items. the F function can be non-surjective. Deriving the key material in this way maximizes the diﬃculties of an attacker trying to mount any kind of related-key attack on the cipher. It is reasonable to consider one round’s operations. Twoﬁsh was designed as a Feistel network. there are implementations where key agility is a valid concern. key setup can overwhelm encryption speed. • Reuse the Same Primitives. which is the same function). 15 In its comments on the AES criteria.1 Performance Issues • The key schedule must work “on the ﬂy. and make do with tables and constants for one direction only. Blowﬁsh. and the derivation of its subkeys. KSHW98].1 The Design of Twoﬁsh The Round Structure For large messages. because it is one of the most studied block cipher building blocks. semi-weak. and SEAL. The key schedule design of some other ciphers has led to various undesirable properties. with as little required memory as possible. which must execute its encryption function in both the forward and backward directions. as it is in DES or Blowﬁsh. However. DESstyle weak. by giving him conﬂicting requirements between controlling the S-box keys and controlling the subkeys. is built from the same primitives as the Twoﬁsh round function. h. they tend to make it harder to use the cipher securely. it would make sense to simply use a one-way hash function to expand the key into the subkeys and S-box entries. That is. Additionally.” deriving each block of subkey material as it is needed. This also makes for a relatively simple picture of the cipher and key schedule together. 23 . These properties. For smaller messages. the NSA suggested that “a goal should be that two blocks could be enciphered with diﬀerent keys in virtually the same time as two blocks could be enciphered with the same key” [McD97]. performance of the key schedule is minor compared to performance of the encryption and decryption functions. With our key schedule. of course. at the same time. This involves trying to minimize the amount of storage required to keep the precomputed key material. the AES eﬃciency requirements make such an approach unacceptable. • Make It Hard to Attack Both S-box and Subkey Generation. is to always use keys too large to make a brute-force search practicable.

Each permutation has DPmax = 10/256 and LPmax = 1/16. s0 (x) = q1 [s1 (q1 [x])].1 ] ⊕ s1. t1 . and the operator · computes the overall parity of the bitwise-and of its two operands. which. but no substantial improvement was found when composite keyed S-boxes were constructed and compared to the q0 and q1 used in Twoﬁsh.3 ] ⊕ s1. Each of the 216 s0 permutations 24 and 2 LPmax (q) = max a. This is done as follows for 128-bit Twoﬁsh keys: s0 (x) = s1 (x) = s2 (x) = s3 (x) = q1 [q0 [q0 [x] ⊕ s0. three.. or four bytes of key material. so long as q0 and q1 have no high-probability diﬀerential characteristics.7.b X Each S-box is deﬁned with two. computing various metrics when combined with key material into Twoﬁsh’s S-box structure.0 ] ⊕ s1.j = 0. these pairs of keys should lead to extremely diﬀerent S-boxes. as described below. The ﬁrst rotor is ﬁxed. These must have several properties: • The four diﬀerent S-boxes need to actually be diﬀerent. It is helpful to recall that these individual ﬁxed-byte permutations are used only to construct the key-dependent S-boxes. and the ﬁxed oﬀset between two rotors speciﬁed by a key byte. When −1 all si. without adding any apparent weaknesses to the cipher.2 ] ⊕ s1. For the 128-bit key. with at least two xors with key material bytes. and t3 for each.5) was chosen because it decreases hardware and memory costs for some implementations.” in the sense of having highprobability diﬀerential or linear characteristics. 7. while q1 has two ﬁxed points. Using the notation of Matsui [Mat96]. but someone else might. There are other similar relationships between S-boxes. LPmax ≤ 1/16.2 The Key-dependent S-boxes A fundamental component of Twoﬁsh is the set of four key-dependent S-boxes. in the case of a 128-bit key. depending on the Twoﬁsh key size.3 ] where the si.2. These criteria alone rejected over 99. • Few or no keys may cause the S-boxes used to be “weak. In some sense this construction is similar to a rotor machine citeDK85. • There should be few or no pairs of keys that deﬁne the same S-boxes. the S-box s0 uses 16 bits of key material.b=0 2 Pr[X · a = q(X) · b] − 1 X . and fewer than three ﬁxed points were accepted as potential candidates. Note that with equal key bytes. Pairs of permutations meeting these criteria were then evaluated as potential (q0 . in turn. the individual characteristics of q0 and q1 are not terribly relevant (except perhaps in some related-key attacks). as well as algebraically derived permutations (e. constructing random 4-bit permutations t0 . For example. as discussed previously. in terms of best diﬀerential and linear characteristics and other kinds of analysis. the probabilities are taken over a uniformly distributed X. We have not been able to ﬁnd any weaknesses resulting from this. q1 ) pairs.8 percent of all randomly chosen permutations of the given construction. t2 . Only ﬁxed permutations with DPmax ≤ 10/256. no pair of these S-boxes is equal.1 ] q1 [q1 [q0 [x] ⊕ s0. In particular. or in the sense of having a very simple algebraic representation. with two diﬀerent types of rotor (q0 and q1 ). The q0 and q1 permutations were chosen by random search.2. because Twoﬁsh always uses at least three of these permutations in series. That is. multiplicative inverses over GF(28 )) that have slightly better individual permutation characteristics. 7. we deﬁne DPmax (q) = max Pr[q(X ⊕ a) ⊕ q(X) = b] a=0.j are the bytes derived from the key bytes using the RS matrix. changing even one bit of the key used to deﬁne an Sbox should always lead to a diﬀerent S-box. q0 has one ﬁxed point.g. In fact. Consideration was initially given to using random full 8-bit permutations for q0 and q1 .3. The actual q0 and q1 chosen were one of several pairs with virtually identical statistics that were found with only a few tens of hours of searching on Pentium class computers. are used only within the h and g functions. we have experimentally veriﬁed that each N/8-bit key used to deﬁne a byte permutation results in a distinct permutation.2 The S-boxes The construction method for building q0 and q1 from 4-bit permutations (speciﬁed in Section 4.2 ] q0 [q1 [q1 [x] ⊕ s0.0 ] q0 [q0 [q1 [x] ⊕ s0.1 The Fixed Permutations q0 and q1 where q is the mapping DPmax and LPmax are being computed for. We did not ﬁnd any useful cryptanalysis from this parallel.

However. with blank entries indicating that no value in the range was found. when viewed individually. We have not yet exhaustively tested longer key length. For example. in Table 4. but the probability distributions obtained for the statistical tests give us a fairly high degree of conﬁdence that no surprises are in store. we present and discuss these results. and the definitions for how to construct s0 . since the computational load for computing LPmax is roughly a factor of ﬁfteen higher than for DPmax . The Twoﬁsh S-box structure has been carefully designed with this issue in mind. In the 128-bit key case. not an exhaustive test. consider a ﬁxed 8-bit permutation q[x] which has very good DPmax and LPmax values. Table 4 shows the DPmax distribution for the various key sizes. but the class of s permutations is so closely related that conventional diﬀerential and linear cryptanalysis techniques can probably be effectively applied without knowing the key k. in the N = 128 case only 1 of every 212. for a given key length. This metric does not necessarily have any direct cryptographic signﬁcance. only the 256-bit key row uses statistical sampling. Monte Carlo) testing has been possible for the larger key sizes. Bi . (The probability of n ﬁxed points is approximately e−1 /n!. which is feasible because each S-box uses only 16 bits of key material. and s3 . but we conjecture that all S-boxes generated by our construction are distinct. while the maximum value is somewhat higher for larger key sizes.0 key-dependent S-boxes has DPmax = 18/256. The distribution of the number of permutation ﬁxed points for various key sizes is given in Table 6. for 128-bit keys. only statistical (i. As a pathological example. an attacker attempts to modify the key bytes in such a way as to minimize the diﬀerences in the round subkey values Ai .2. Table 5 shows the distribution of LPmax for the various key sizes. a measure of the diﬀerential 25 Given the ﬁxed permutations q0 and q1 . This issue is of particular interest in dealing with related-key attacks. It is true that each sk permutation also has good individual metrics.deﬁned is distinct. but which is used as a key-dependent family of S-boxes simply by deﬁning sk (x) = q[x] ⊕ k.) The Twoﬁsh S-box distributions of ﬁxed points over all keys match theory fairly well. In one class of related-key attacks. Another possible concern is how different the various Twoﬁsh S-boxes are from each other. the maximum value from a statistical sample is not a guaranteed maximum. The metrics discussed so far give us a fair level of conﬁdence that the Twoﬁsh method of constructing key-dependent S-boxes does a reasonable job of approximating a random set of S-boxes. the 128-bit and 192-bit cases involve an exhaustive test. s2 . taken as a black box. s1 . Note that. The remaining columns give the distribution of observed DPmax values. Observe that the vast majority . A detailed explanation of the format of this table will also help in understanding the other tables in this section. s2 . s2 . These statistics are taken over all four S-boxes (s0 . 256K) S-boxes were evaluated for the 128-bit key case. we have performed extensive testing of the characteristics of these key-dependent S-boxes. so a total of 4 × 216 (i. but it also has an important bearing on the ability of an attacker to model the S-boxes. For example.9 ) have DPmax = 12/256. We also conjecture that this would be the case for almost all choices of q0 and q1 meeting the basic criteria discussed above. it is a useful way to verify that the S-boxes are behaving similarly to random S-boxes. Monte Carlo statistics are given for the larger two key sizes.e.. A ﬁxed point of an S-box si is a value x for which x = si (x). s3 ). as is also the case for s1 . Each entry is expressed as the negative base-2 logarithm of the fraction of S-boxes with the given value (or value range). Despite the fact that the Twoﬁsh S-boxes are key-dependent. are very closely related. since it is possible to compute the theoretical distribution of ﬁxed points for random S-boxes. no Twoﬁsh S-box has an LPmax value greater than (100/256)2 . all testing has been performed exhaustively. and s3 from them. the beneﬁt of key-dependency is severely weakened. however. each S-box has a DPmax value no larger than 18/256.. if the entire set (or large subsets) of S-boxes. but in many cases the number is considerably larger. although there is a small fraction of S-boxes with larger values. 7. The probabilities for random S-boxes is given in the last row. In many cases.3 Exhaustive and Statistical Analysis of all Twoﬁsh S-boxes have LPmax < (88/256)2 . Since the Twoﬁsh S-box structure is used in computing Ai and Bi . In this section. Each Monte Carlo sampling involves at least 216 S-boxes. An asterisk (∗ ) next to a key size in the tables discussed in this section indicates that the entries in that row are a statistical distribution using Monte Carlo sampling of key bits. with Me and Mo as key material. s1 .e. Clearly. It is our hope to complete exhaustive testing over time for the larger key sizes where feasible. For 128-bit keys. while over half (1 in 20. respectively.

4 2. It can be seen that the probability of injecting a constant diﬀerence into more than ﬁve consecutive subkey entries is extremely low. so intuitively it seems unlikely that he should be able 26 to force a given diﬀerence sequence that is very long.4 2. .3 0.2 Table 6: Number of ﬁxed points over all keys characteristics of Ai (or Bi ) across keys will help us understand both how diﬀerent the S-boxes are from each other and how likely such an attack is to succeed. Consider the diﬀerence sequence yi = yi ⊕ yi . Note that an attacker has exactly 16 bits of freedom for a single S-box in the 128-bit key case.0 8. there are about 231 pairs of distinct keys for each S-box. For example. Instead of requiring a run of n consecutive identical diﬀerences.2 192 bits 10 1.0 6.9 4.1 1. which is reassuring. and the sequence for another key be ∗ yi .0 12..8 25.0 1. even worse.1 8.9 13. For random byte sequences.0 Table 4: DPmax over all keys − log2 (Pr(LPmax = (x/256)2 )) Key Size Max Value x = 56. 0.4 2.9 4. in the 128-bit key case.8 10. 2.0 8. which matches quite nicely with the behavior observed in the table.2 4.8 16..3 0..2 4. What is the probability of having a “run” of n consecutive equal values in the sequence? If n can be made to approach 20.0 256 bits∗ 22/256 15.4 4.2 16.4 1.87 88.9 13.1 6. Without loss of generality.0 23. In other words..0 12.. Table 8 shows the results of measuring this same property in a diﬀerent way. let us ﬁrst look at how many consecutive values of Ai with a ﬁxed xor diﬀerence can be generated for two diﬀerent keys. and in particular we consider a change in only the key material Me that aﬀects one of the four S-boxes.1 14. .4 256 bits∗ 10 1.5 17. with 16 bits of key material per S-box.2 4.9 13.0 16.7 x=8 17.4 1. The quantity measured is the maximum number of equal diﬀerences out of the 20 values generated.0 6. while it may be diﬃcult to generate a large run. showing that it is extremely diﬃcult to force more than ﬁve or six diﬀerences to be identical.7 x = 9 x = 10 19.4 1.g. . Table 7 shows the distribution of run lengths of ∗ the same xor diﬀerence yi for consecutive i.63 64. as that would require many zero diﬀerences.95 96.4 Table 5: LPmax over all keys..2 8.4 1.0 19.4 4. .4 2.0 8.0 1. This also shows that it is not possible to inﬂuence only a few Ai .2 8.7 22.1 8.79 80..0 1.0 256 bits∗ (108/256)2 9.71 72.0 12.3 1.111 128 bits (100/256)2 9. 4. and. is it possible to generate equal values in a large fraction of the set of diﬀerences? The results are again very encouraging.9 23. Let yi be the output sequence for one S-box used in generating the sequence Ai .8 21. .4 192 bits∗ (104/256)2 9.7 10. our belief in the independence of the key-dependent S-boxes must be called seriously into question.1 10.2 8.8 16.4 4.0 6. we consider only Ai .4 21.− log2 (Pr(DPmax = x/256)) Key Size Max Value x = 8 x = 10 x = 12 x = 14 x = 16 x = 18 x = 20 x = 22 x = 24 128 bits 18/256 15.4 4. across key pairs.4 1.0 192 bits 24/256 15. 38). − log2 (Pr(# ﬁxed Key Size Max Value x = 0 x = 1 x = 2 x = 3 x = 4 x = 5 128 bits 8 1. To this end.4 1. computed over only the 20 input values used in the subkey generation (e.3 0. each of length 20. this metric is similar to a “mini”-DPmax . so there would be 231 such diﬀerence sequences.0 12.3 points = x)) x=6 x=7 11.3 1.5 20. then a related-key attack might be able to control the entire sequence of Ai values.2 16.4 random 1.9 4..1 12.2 1.2 17. we would theoretically expect that Pr(xor run length = n) should be roughly 2−8(n−1) .103 104.2 12.1 8.0 8.

Key Size 128 bits 192 bits∗ 256 bits∗ Max Value 5 4 5 − log2 (Pr(xor run length = n)) n=1 n=2 n=3 n=4 n=5 0. any change in any two input bytes is guaranteed to change at least three output bytes. “simple” coeﬃcients. For software implementation on a modern microprocessor.3 MDS Matrix The four bytes output from the four S-boxes are multiplied by a 4-by-4 MDS matrix over GF(28 ). as in Square [DKR97]. These results help conﬁrm our belief that.29 Table 7: Subkey xor diﬀerence run lengths − log2 (Pr(max # equal xor diﬀerences = n)) x=1 x=2 x=3 x=4 x=4 x=6 x=7 1. This matrix multiply is the principal diﬀusion mechanism in Twoﬁsh. the MDS matrix multiply is normally implemented using four lookup tables. 27 7.8 29. even with the extended properties discussed below. each consisting of 256 32-bit words. Unlike the MDS matrix used in Square.1 0. etc. but the MDS properties for three of the bytes are still fully guaranteed with respect to b.7 17.0 1.98 15. as should be expected. for smart cards and in hardware.95 23.9 5.93 23.98 15.1 0. More than 2127 such MDS matrices exist. and the resulting distribution of DPmax values compared to that of the Twoﬁsh key-dependent S-boxes.01 7.9 5.6 23.02 30. However. can make implementations cheaper and faster.0 1. This similarity is comforting.9 11. a similar distribution using randomly generated 8-bit permutations has been generated for purposes of comparison.69 0.9 5. cients.7 17. nor is there a requirement that the matrix be a circulant matrix. For each metric.98 15. It is true that the most signiﬁcant bit of b is still “lost” in this half of the PHT. but the Twoﬁsh MDS matrix is also carefully chosen with the property of preserving the number of bytes changed even after the rotation in the round function. as is the fact that the probability distributions for each metric look quite similar across key sizes. and they are met with probability 254/255 for the fourth byte.9 11. However. However.01 7. but such a scheme would require the veriﬁcation that the keydependent values in fact formed an MDS matrix. any change in a single input byte is guaranteed to change all four output bytes.9 29. The MDS matrix used in Twoﬁsh has ﬁxed coeﬃ- . DPmax was computed for a set of 216 randomly generated permutations. adding a non-trivial computational load to the key selection and key scheduling process.89 30.9 31. from a statistical standpoint. In other words.93 24. the probability distributions looked virtually identical to those obtained for the Twoﬁsh set of key-dependent S-boxes. since the rotation undoes the shift applied to b as part of the PHT.01 7. so the particular coeﬃcients used in the matrix do not aﬀect performance. except for small ﬂuctuations on the tail ends of the distribution.9 11.7 23. For both encryption and decryption. a right rotation by one bit occurs after the xor. it should be noted that there are many acceptable MDS matrices.8 23. For example. because of the rotation applied after the xor within the round function. Twoﬁsh does not use the inverse matrix for decryption because of its Feistel structure. The MDS property here guarantees that the number of changed input bytes plus the number of changed output bytes is at least ﬁve. the Twoﬁsh S-box sets behave largely like a randomly chosen set of permutations.00 0.1 0. This direction of rotation is chosen to preserve the MDS property with respect to the PHT addition a+2b. it is desirable to select the MDS matrix carefully to preserve the diffusion properties even after the rotation.3 Key Size 128 bits 192 bits∗ 256 bits∗ Max Value 7 7 7 Table 8: “Mini”-DPmax subkey distribution For every metric discussed here. some thought was given to making the matrix itself key-dependent.6 17. Initially.

so it should not be diﬃcult to ﬁnd such sets of elements. .e. i. EF. In fact. that can guarantee the same minimum output Hamming diﬀerence. the probability that the output difference Hamming weight is less than eight bits for two byte input diﬀerences is only 0. No higher minimum Hamming weight was found for any matrices with such simple elements. since over 75% of all 4-by-4 matrices over GF(28 ) are MDS. There are many MDS matrices containing only the three elements 1. which aﬀect the round function output.0035) for a totally random binomial distribution. each of which allowed several MDS matrixes with the three values. A single byte input change to the MDS matrix will change all four output bytes. it will also shift in a ﬂipped bit from the next byte. as shown in the table below. so multiplication by these elements consists of two LFSR right shifts mod v(x). only one of the seven outputs with Hamming weight 8 has its most signiﬁcant bit set. with probability 0. The primitive polynomial v(x) = x8 + x6 + x5 + x3 + x0 was selected. For all single byte input changes.000018. This Twoﬁsh MDS matrix guarantees that any singlebyte input change will produce an output Hamming diﬀerence of at least eight bits. using either the same or another set of simple ﬁeld elements. together with the ﬁeld elements 1. Thus. and the particular matrix chosen is representative of the class. the rotation of the output 32-bit word B by eight bits has an output diﬀerence that is guaranteed to be unique from output diﬀerences in the unrotated A quantity. if x = y. It is intuitively obvious (and empirically veriﬁed) that the probability that two random ﬁeld elements satisfy this property is roughly one half. and 5B. This property guarantees that all single-byte input diﬀerences result in unique output diﬀerences. if x ∗ a = 1. y/x) has the least signiﬁcant bit set. If a byte value diﬀerence of 1 is one output from the matrix multiply.The eﬀect of the rotation on the unshifted PHT additions also needs to be addressed. in addition to the property that all four output bytes will be aﬀected. A computer search was executed over all primitive polynomials of degree eight. which is less than the corresponding probability (0. Several dozen sets of suitable values were found.. even when rotations over 8 bits are applied to the output. Another constraint imposed on the Twoﬁsh MDS matrix is that no row (or column) of the matrix be a rotation of another row (or column) of the matrix. even under bitwise rotations by each of the rotation values in the range 6. and 5B (using hexadecial notation and the ﬁeld element to byte value correspondence of Section 4). meaning that at least eight bits will be affected even in the PHT term T0 + 2T1 for 1019 of the 1020 possible single-byte input diﬀerences.26. then y/x = 1. EF. The construction used to preserve this property after rotation is actually very simple. The subkey generation routine takes advantage of this property to help thwart relatedkey diﬀerential attacks. The idea is to choose a small set of non-zero elements of GF(28 ) with the property that. so the property holds. only 7 of the 1020 possible single-byte input diﬀerences result in output diﬀerences with Hamming weight diﬀerence eight. then y ∗ a (i. and 5B is β −2 + 1. The particular Twoﬁsh matrix was also chosen to maximize the minimum binary Hamming weight of the output differences over all single-byte input diﬀerences.. Also. if a byte diﬀerence 1 is output from the matrix multiply with a single byte input change. Simple in this context means that x ∗ a for arbitrary a can be easily computed using at most a few shifts and xors. Input diﬀerences in two bytes are guaranteed to aﬀect three output bytes. The element EF is actually β −2 + β −1 + 1. plus a few byte xors. the remaining 1013 diﬀerences result in higher Hamming weights. for each pair x. y in the set. In fact. This con28 straint did not seem to limit the pool of candidate matrices signiﬁcantly. The MDS matrix coeﬃcients in Twoﬁsh are carefully selected so that.00125.y. the Twoﬁsh MDS matrix actually exhibits a much stronger property: all 1020 MDS output diﬀerences for single-byte input changes are distinct from all others. Also. looking for sets of three “simple” elements x. but after rotation there is no such guarantee. Hamming Weight 8 9 10 Number of occurrences 7/1020 23/1020 42/1020 Number with MSB Set 1 4 15 There are many other MDS matrices. if the rotation shifts out the only original ﬂipped bit. the hope of ﬁnding such a matrix sounds reasonable. as is done in generating the round subkeys. Observe that this property is reﬂexive. where β is a root of v(x).z with the above property. and they can result in an output Hamming diﬀerence of only three bits. the next highest byte is guaranteed to have its least signiﬁcant bit changed too.. performing a 32-bit rotate on the result will shift in one bit from the next byte in the 32-bit word and shift out the only non-zero bit.e.

it can easily be seen that this construction preserves the MDS property nicely even for multi-bit rotations. The ﬁrst is the MDS matrix multiply. an xor operation could have been used. but the other nibble remains entirely contained in the byte. 7. including the addition of the round subkeys. Because of this additional cost. Such a construction would be easier to analyse and would have nicer properties. For best performance. xor is the most eﬃcient operation in both hardware and software. 8-by-8 MDS matrices over GF(24 ) are nowhere nearly as plentiful as 4-by-4 matrices over GF(28 ). but it is much slower in virtually all implementations and would not be worth it. Use of an 8-by-8 MDS matrix over GF(24 ) will guarantee eight output 4-bit nibble changes for every input nibble change. a 4-by-4 MDS matrix. Unfortunately. We chose not to use addition (used in MD4 [Riv91]. both in hardware and in ﬁrmware on a smart card. It should also be noted that a very diﬀerent type of construction could be used to preserve MDS properties on rotation.It is fairly obvious that this matrix multiply can be implemented at high speed with minimal cost. 7. or a more complicated combining function like Latin squares (used in DESV [CDN95]). The LEA opcodes allow the addition of one register to a shifted (by 1. the round subkeys are combined with the PHT output via addition to enable optimal performance on the Pentium CPU family. Because the changes now are nibble-based.2. which requires considerably more gates in hardware and more tables in smart card ﬁrmware than the Twoﬁsh matrix. a GF(2)-linear combination of the bits by the MDS matrix. each of the output bits is the xor of a subset of the input bits. Within the S-boxes. The two outputs of the g functions (T0 and T1 ) are then combined using a PHT so that both of them will aﬀect both 32-bit . so very little freedom is available to pick simple coeﬃcients. addition. using addition instead of xor has virtually no impact on code size or speed. along with a 32bit constant. The goal of this ordering is to help destroy any hope of using a single algebraic structure as the basis of an attack. This is done primarily for simplicity. 7. No two consecutive operations use the same structure. 29 By design. From a cryptographic standpoint. RIPEMD160 [DBP96] and SHA [NIST93]). integer addition mod 232 . That is.7 Use of Diﬀerent Groups The PHT operation. we could have used 8 key-dependent S-boxes and an 8-by-8 MDS matrix over GF(28 ).8 Diﬀusion in the Round Function There are two major mechanisms for providing diﬀusion in the round function. which ensures that each output byte depends on all input bytes. except for the PHT and the key addition that are designed to be merged for faster implementations. and xor.8) version of another register. The best construction for such a matrix seems to be an extended RS-(15. but it provides a speedup for bulk encryption. the general ordering of operations in Twoﬁsh alternates as follows: 8-by-8 S-box. MD5 [Riv92]. We did not implement dynamic swapping [KKT94] or any additional complexity.4 PHT 7. a version of the encryption and decryption code can be “compiled” for a given key. The actual transformation deﬁned by the MDS matrix is purely linear over GF(2). all in a single clock cycle.6 Feistel Combining Operation Twoﬁsh uses xor to combine the output of F with the target block. and GF(2) addition (xor). a one-bit rotation may shift the only non-zero bit of a nibble out of a byte. but this additional overhead was considered to be well worth the extra performance in software. with the result placed in an arbitrary Pentium register. It should be noted that using addition instead of xor does impose a minor gate count and speed penalty in hardware. On a smart card. we decided not to use an 8-by-8 MDS matrix over GF(24 ). In fact. but it would reduce the best Pentium software performance for bulk encryption. was chosen to facilitate very fast operation on the Pentium CPU family using the LEA (load eﬀective address) opcodes. Instead of using 4 key-dependent S-boxes. The underlying algebraic operations thus alternate between non-linear table lookup. with the round subkeys inserted as constant values in LEA opcodes in the instruction stream. and the PHT. 7.7) code over GF(24 ).5 Key Addition As noted in the previous section. MDS matrix. several levels of alternating xor and 8-by-8 permutations are applied.4. This approach requires a full instantiation in memory of the code for each key in use.

an addition with constant might turn a diﬀerence of 0000018016 into 0000008016 . These rotations in the Twoﬁsh round functions were included speciﬁcally to help break up the bytealigned nature of the S-box and MDS matrix operations. In general. This structure provides symmetry for decryption in the sense that the same software pipeline structure can be applied in either direction. A large reduction in the number of changed bytes is very unlikely. and one after the xor. placing the right rotation after the xor preserves this property. The chances of this happening depend on the distance between the two bits that inﬂuence each other. This bit could be regained using some extra operations. Also. We cannot guarantee that a single byte input change to the F function will change 7 or 8 of the output bytes of F . The rotates also make it harder to use the same high-probability characteristic several times as the bits get rotated out of place. The MDS matrix was chosen to guarantee that a 32bit word rotate right by one bit will preserve the fact that all four bytes of the 32-bit word are changed for all single input byte diﬀerences. the one-bit right rotation occurs after the Feistel xor with the PHT output. On the other hand.Feistel xor quantities. For example. in this case. An early draft of the Square paper proposed a very simple and powerful attack. the most signiﬁcant byte of this PHT output will still have a non-zero output diﬀerence with probability 254/255 over all 1-byte input diﬀerences. Limiting rotations to single-bit also helps minimize hardware costs. For example. Rotating by only one bit position helps optimize performance on the Pentium . with very little apparent cryptographic beneﬁt. both of the 32-bit words that are xored with the round function results are also rotated by a single bit. This might be a useful view of the cipher for analysis purposes. For instance. given a single input byte diﬀerence that aﬀects only T1 . it is much harder for the attacker to analyze the cipher. the least signiﬁcant three bytes of the Feistel xor output are guaranteed to change 30 7. Square uses an 8-by-8-bit permutation and an MDS matrix in a fashion fairly similar to Twoﬁsh. but without any rotations. In particular. the advantages were considered to outweigh the disadvantages. The rotations were also very carefully selected to work together with the PHT and the MDS matrix to preserve the MDS diﬀerence properties for single byte input diﬀerences to g. but the software performance would be signiﬁcantly decreased. the rotate right puts the 2T1 quantity back on its byte boundary.9 One-bit Rotation Within each round. it is much harder to ﬁnd iterative characteristics. The half of the PHT involving the quantity T0 + 2T1 will lose the most significant bit of T1 due to the multiply by two. except that the most signiﬁcant bit has been lost. In particular. By rotating a single bit per round. and adding some ﬁxed rotations just before the output whitening. One word is rotated before the xor. It is only the rotations that separate Twoﬁsh from having a “reversible” Feistel structure. Third. for both encryption and decryption. First. the rotate right is done after the xor with the Feistel quantity involving T0 + 2T1 . There are three downsides to the rotations in Twoﬁsh. unlike the Pentium Pro. based solely on such byte statistics. during decryption. they make the simple technique of merely counting active S-boxes quite a bit more complicated. that forced the authors to increase the number of rounds from six to eight. there is a minor performance impact (less than 7% on the Pentium) in software due to the extra rotate opcodes. Choosing a rotation by an odd number of bits ensures that each of the four 32-bit words are used as input to the g function in each of the eight possible bit positions within a byte. The reason is that the carries in the addition of the PHT can remove certain byte diﬀerences. Therefore. Note that. has a oneclock penalty for multi-bit rotations) and on smart card CPUs (which generally do not have multi-bit rotate opcodes). an attack against SAFER is based on the cipher’s reliance on a byte structure [Knu95c]. but we do not expect any implementation to use such a structure. However. Thus. which we feared might otherwise permit attacks using statistics based on strict byte alignment. On the whole. It is possible to convert Twoﬁsh to a pure Feistel structure by incorporating round-dependent rotation amounts in F . the rotates make it harder to analyze the cipher for security against diﬀerential and linear attacks. the rotations make the cipher non-symmetric in that the encryption and decryption algorithms are slightly diﬀerent. each 32-bit quantity is used as an input to the round function once in each of the eight possible bit alignments within the byte. thus requiring distinct code for encryption and decryption. since the bits do not line up. Second. CPU (which. since the wiring overhead of ﬁxed multi-bit rotations is not negligible.

rotate right after xor) occurs during decryption. the key-dependent S-boxes are derived by alternating ﬁxed S-box lookups with xors of key material. Since the rest of the encryption is a permutation. Although our best non-related-key attack only breaks 5 rounds of the cipher. we discuss whether diﬀerent sequences of subkeys can give equivalent encryption functions. the key is mapped down to a block of data half its size. k2 ) and ∗ ∗ ∗ (k0 . The Round Subkeys 64 bits of key material are combined into the output of each round’s F function 31 In this section. to be equivalent in their eﬀects. and the most signiﬁcant byte will change with probability 254/255. performing one rotation before and one after the Feistel xor imposes a symmetry between encryption and decryption that helps guarantee very similar software performance for both operations on the Pentium CPU family. using addition modulo 232 . R0 ) must encrypt to the same output block (L1 . The fact that the rotate left by one bit occurs before the Feistel xors during encryption guarantees that the same relative ordering (i. Twoﬁsh was deﬁned to have 16 rounds primarily out of pessimism. It is easy to deﬁne Twoﬁsh variants with more or fewer rounds. to prevent a slide attack. This key material has the eﬀect of making many cryptanalytic attacks a little more diﬃcult. DES. Recall that F is the F function without the addition of the round subkeys. That is. The properties of a Feistel cipher ensure that no pair of 2-round subkey sequences can be equivalent for all inputs. which seems to be the norm for block ciphers. We get δ1 = F (R0 ⊕ T ) − F (R0 ⊕ (T + δ0 )) (1) . and that block of data is used to specify the S-boxes for the whole cipher. at a very low cost. Let T = F (L0 )+k0 and observe that when L0 goes over all possible values. as we would otherwise have two sequences of 2-round keys that would deﬁne the same 2-round encryption function. k1 . every input block (L0 . It is natural to ask next whether any pairs of three sequential rounds’ subkeys can exist that cause exactly the same encryption. R2 ) for both sequences of subkeys. Even so.. this can be seen as selecting among as many as 2256 diﬀerent (but closely related) 128-bit to 128-bit permutations.e. as will be discussed below. and + represents 32-bit word-wise addition. For this analysis we keep the function F constant. we cannot be sure that undiscovered cryptanalysis techniques do not exist that can do better. Note that nearly all the added strength against cryptanalytic attack is added by the xor of subkeys into the input to the ﬁrst and last rounds’ F functions. IDEA. we took pains to ensure that the Twoﬁsh key schedule works with a variable number of rounds.1 Equivalence of Subkeys 7. Hence. As discussed before.11 The Key Schedule To understand the design of the key schedule. we consider 16 rounds to be a good balance between our natural skepticism and our desire to optimize performance. Also. 7. The F function without the round subkey addition is a permutation on 64-bit values.after the rotation. 7. k2 ). which has only one ALU pipeline capable of performing rotations. preserving the diﬀerence-property for both directions. and another 128 bits after encryption. The Key-dependent S-boxes To create the Sboxes. We have the following equalities R1 ∗ R1 = = = = = = L1 L1 R2 R2 R1 ⊕ (k2 + F (L1 )) ∗ ∗ R1 ⊕ (k2 + F (L1 )) ∗ R0 ⊕ (k0 + F (L0 )) L0 ⊕ (k1 + F (R1 )) ∗ ∗ L0 ⊕ (k1 + F (R1 )) R0 ⊕ (k0 + F (L0 )) where ⊕ represents bitwise xoring. and Skipjack all have 8 cycles. For a pair of subkey sequences. the round subkey selects among one of 264 closely related permutations in each round.11. ∗ ∗ Note that k0 = k0 and k2 = k2 . These subkeys must be slightly diﬀerent per round.10 The Number of Rounds Sixteen rounds corresponds to 8 cycles. Using the two equations for L1 we get k1 + F (R1 ) = δ1 = ∗ ∗ k1 + F (R1 ) ∗ F (R1 ) − F (R1 ) ∗ where δ1 = k1 −k1 is ﬁxed. k1 . so does T . it is necessary to consider how key material is used in Twoﬁsh: The Whitening Subkeys 128 bits of key material are xored into the plaintext block before encryption. we only vary the subkeys Ki and not S. (k0 .

the chance of this happening for the N = 256 case is about 263 · 2−159 = 2−96 . 0).∗ where δ0 = k0 − k0 . Observe that for the speciﬁc values of δT0 that are possible. which can be seen as four key-dependent S-boxes followed by an MDS matrix.2. (231 . 231 ). This allows us to make many useful predictions about the likelihood of ﬁnding keys or pairs of keys with various interesting properties. Because we will be analyzing the key schedule using this assumption in the remainder of this section. That leaves us with the following possible values for δ1 : δ1 ∈ {(0. This property does not guarantee that the round function F has no key-equivalences (since other key material is added into the outputs of the PHT). 7. . (0. 0). (231 .11. The S-box generates a corresponding sequence of outputs. Set T = 0 and look at the cases R0 = 0 and R0 = δ0 . which is close to 2159 . (0. Each of the possible values for δ1 corresponds to exactly one possible value for (δT0 . There simply do not appear to be enough degrees of freedom in choosing the diﬀerent subkeys to make pairs of equivalent subkey sequences in as few as 16 rounds. (0. . 0). (231 . 231 )} We can write down the analogue to equation 1 for g: δT0 = g(R ⊕ T ) − g(R ⊕ (T + δ0 )) for all R and T and where δ0 is the appropriate half of δ0 . where equivalent round keys lead to the same encryption/decryption function for all possible inputs. subtraction and xor are the same. . This is the chance of such equivalent S-boxes existing at all. and those words are processed with the PHT (with a couple of rotations thrown in) to produce a pair of subkey words. we have been unable to prove this. We can model each byte sequence generated by a key-dependent S-box as a randomly selected nonrepeating byte sequence of length 20. For T = 0 this translates in a simple diﬀerential equation δT0 = g(R ) ⊕ g(R ⊕ δ0 ) for all R . As discussed in Section 7. 4. This is a con∗ tradiction. we can conclude that the other half of δ0 must also be zero. A Conjecture About Equivalent Subkeys in Twoﬁsh We believe that. 231 ). We are looking at 20-byte-long sequences of distinct bytes. there are no pairs of equivalent round keys.2 Byte Sequences F (0) − F (δ0 ) = δ1 = F (δ0 ) − F (0) = −δ1 The substraction here is modulo 232 for each of the two 32-bit words. 2. Each S-box gets a sequence of inputs.3 Equivalent S-box Keys We have veriﬁed that there are no equivalent S-box keys that generate the same sequence of 20 bytes. and that the equation must hold for all values of R0 and T . In fact. but it provides partial heuristic evidence for that contention. so we conclude that δ0 = 0. However. There are 256!/236! of those sequences. 38) or (1. . Similarly. In this section we analyze the sequences of outputs that this construction can generate. δT1 ) ∈ {(0. 0). .3 we have not found any statistical deviations between our key-dependent Sboxes and the random model in any of our extensive statistical tests. We can easily convert them to diﬀerence values at the input of the PHT of F . We know that g has only one perfect diﬀerential: 0 → 0. . . In the random model. which are then used to derive subkeys. 231 )} These are the possible diﬀerence values at the output of F in equation 1. The input to the S-boxes is basically a counter. 3. but within that restriction we have veriﬁed that no two keys 32 The subkeys in Twoﬁsh are generated by using the h function. we should discuss how reasonable it is to treat this byte sequence as randomly generated. (231 . 5. as k0 = k0 . for up to 16 rounds of Twoﬁsh. . if we peel oﬀ one layer of our q construction and assume the rest of the construction is random. We conclude that there are no two sets of 3-round subkey sequences that result in the same encryption function. and thus δ0 = 0. Note that δ1 and δ0 depend only on the round keys. Analyzing these byte sequences thus gives us important insights about the whole key schedule. We also conjecture that there are no pairs of Twoﬁsh keys that lead to identical encryptions for all inputs.11. All key material is used to deﬁne key-dependent Sboxes in h. δT1 ): (δT0 . . 39).) The function g depends only on half the key entropy. (Recall that the Twoﬁsh keys are used to derive both subkey sequences and also S-boxes. We get lead to the same function g. 7. The corresponding outputs from the four S-boxes are combined using the MDS matrix multiply to produce the sequence of Ai and Bi words.

k2 ) byte values that leads to a pair of sequences before the xor with k3 that have a ﬁxed xor diﬀerence. The chances of any pair of t. This proves that A and B are uniformly distributed for all key lengths. each desired output xor from the matrix multiply allows only one possible input xor. The numbering of the key bytes has no eﬀect on the security arguments in this section. k8 . M ). 7. the output of the S-box will also cycle through all 256 possible values. M )) Since the MDS matrix multiply is invertible. we can see that no A or B value can repeat itself. but which must be chosen to give us a given diﬀerence sequence in the xor of their byte sequences. we can discuss the properties of the A and B sequences generated by each key M . Let t[i] := q0 [q1 [q1 [q0 [i] ⊕ k0 ] ⊕ k1 ] ⊕ k2 ] The goal is to ﬁnd a pair of t[i] sequences such that t[i] ⊕ t∗ [i] = constant Let us assume that t generates a random sequence. we can see from the construction of h that each key byte aﬀects exactly one S-box used to generate A or B. The actual byte ordering for s and a 0 3 0 256-bit key’s subkey-generating S-box is k0 . up to three of the four byte sequences can be identical without much trouble. s3 (i. assuming the key M is uniformly distributed. Note that this is the probability that such a pair of inputs exists. then the result of h will also cycle through all possible values. each deﬁned using 32 bits of key material. Suppose we have two S-boxes. 33 . s1 (i. we are faced with an interesting problem: since the MDS matrix multiply is xor-linear.11.) Let us consider the more general problem of how to get a given 20-byte diﬀerence sequence between a pair of S-boxes. t∗ generating such a constant diﬀerence is about 2−151 . A zero output xor diﬀerence in A can occur only with a zero output xor diﬀerence in all four of the byte sequences used to build A. this is very unlikely to be possible for a randomly chosen diﬀerence pattern in the A sequence. and identical sequences of inputs will go into that last q0 S-box. 3. This brings the chance of ﬁnding a pair with such a constant diﬀerence down to 247 · 2−151 = 2−104 . k1 . Only 1020 possible output diﬀerences (out of the 232 ) in Ai can occur with a single “active” (altered) S-box. If we take four key bytes that contribute to four diﬀerent S-boxes. k16 . (There are of course diﬀerence sequences of Ai ’s that can occur. Ai = MDS(s0 (i. If we cycle any one of the key bytes that contributes to that S-box through all 256 possible values.6 Diﬀerence Sequences in A and B Let us also consider diﬀerence sequences. 17 For ease of discussion. k24 . This means that getting any desired diﬀerence sequence into all 20 Ai values requires getting a desired xor sequence into all four 20-byte sequences. Without loss of generality we look at17 : s0 (x) = q0 [q0 [q1 [q1 [q0 [x] ⊕ k0 ] ⊕ k1 ] ⊕ k2 ] ⊕ k3 ] Identical sequences are possible only if the inputs to that last q0 ﬁxed permutation are identical for both sequences.11.5 The A and B Sequences From the properties of the byte sequences. 7.) As mentioned above. (Note that if the desired output xor in Ai is an appropriate value. Then. not the probability that a random pair of keys will have this property. 7. Changing a single key byte always alters every one of the 20 bytes of output from that S-box. Each desired output xor in A requires a speciﬁc output xor in each of the four byte sequences used to form A. Consider a single byte of output from one of the Sboxes. which are not equal. M ). That means that the task of ﬁnding a pair of identical sequences comes down to a simple task: ﬁnding a pair of (k0 .k going into one S-box. This means that: 1. simply by leaving their key material unchanged. we number the key bytes as k . and so always alters every word in the 20word A or B sequence to which it contributes.. and we cycle those four bytes through all possible values. M ). 2. Most diﬀerences require all four S-boxes used to form Ai to be active.4 Byte Diﬀerence Sequences Similarly. s2 (i.11. We can estimate the probability of a pair of 4-byte inputs existing with the desired xor diﬀerence sequence as 263 · 2−159 = 2−96 . k3 can be changed to include that xor diﬀerence. If we have a speciﬁc diﬀerence sequence we want to see in A. and since i is diﬀerent for each round’s subkey words generated.we can improve that bound.

this alters the output xor difference. consider an xor diﬀerence. to get into the subkeys.11 Properties of the Key Schedule and Cipher One NIST requirement is that the AES candidates have no weak keys. the probability that the diﬀerence will pass through correctly is cut in half. As all pairs (Ai . Here we argue that Twoﬁsh has . who knows where the changed bits are.11. the most probable xor diﬀerence in a round’s pair of subkey words may be either 2−(2k−1) or 2−2k . Consider x+y (x ⊕ δ0 ) + y = = z z ⊕ δ1 Let k be the number of bits set in δ0 .11. M ∗ ] = X[i. Thus. This is true because addition and xor are very closely related operations. This aﬀects two subkey words: K2i K2i+1 = Ai + Bi = ROL(Ai + 2Bi . depending on whether or not the xor diﬀerence in B covers the next-to-highest-order bit. while it is natural to consider A and B diﬀerence sequences in terms of xor diﬀerences. For the subkey generation. and the results of those additions are xored into half of the cipher block. the highest probability value for δ1 is δ0 .7 The Sequence (K2i .) However. they survive badly through the xor with the plaintext block. the subkeys are added to the results of the PHT of two g functions. 7. 7. 9) in the odd subkey with the same probability.9 XOR Diﬀerences in the Subkeys where the additions are modulo 232 . 9) Diﬀerence sequences in A and B translate into difference sequences in (K2i .11. all the pairs (K2i . K2i+1 ) are distinct. The only diﬀerence between the two is the carry between bit positions.M ⊕ Ki. just rotated left nine bits in the odd subkey word. An xor diﬀerence in the subkeys has a fairly high probability of passing through the addition operation and ending up in the cipher block. The situation is more complex for multiple adjacent 34 An xor diﬀerence in A or B is easy to analyze in terms of additive diﬀerences modulo 232 : an xor diﬀerence with k active bits has 2k equally likely additive diﬀerences. When the xor diﬀerence is in B.The above analysis is of course also valid for the B sequence. 7. 7. not counting the highest-order bit. 7. If we assume these xor diﬀerences propagate independently in the two subkeys (which appears to be the case). the result is slightly more complicated. δ0 . we get it in both subkey words. we may discuss diﬀerence sequences: D[i.11. and the xor diﬀerence ROL(δ0 .M ∗ where the diﬀerence is computed between the key value M and M ∗ . in A. K2i+1 ). and the probability that this will hold is 2−k . A desired xor diﬀerence sequence for all 20 pairs of subkey words is thus quite diﬃcult to get to work when k ≥ 3.11.8 Diﬀerence Sequences in the Subkeys bits. but the general rule still holds: for every bit in the xor diﬀerence not in the high-order bit position.M − Ki. (The rotation does not really complicate things much for the attacker. Note that if we have a additive diﬀerence in A. If ﬂipping a given bit changes the carry into the next bit position. we see that this leads to an xor diﬀerence of δ0 in the even subkey word with probability 2−k . M ∗ ] = Ki. a xor diﬀerence must ﬁrst pass through the ﬁrst addition. Thus.) Note that when additive subkey diﬀerences modulo 232 are used in an attack. We estimate that xor diﬀerences are much more likely to be directly useful in mounting an attack. subkeys can reasonably be considered either as xor diﬀerences or as diﬀerences modulo 232 . k-bit xor diﬀerences lead to a given additive diﬀerence in a pair of subkey words with probability 2−k . so are all the Ki .10 Diﬀerences in the Subkeys Each round. (The probability of this is determined by the Hamming weight of the xor diﬀerence. Bi ) are distinct. This happens with probability 1/2 per bit. not counting the highest-order bit. M.M ∗ Ki. Then. assuming the desired xor diﬀerence sequence can be created in the A sequence at all. K2i+1 ) As Ai and Bi are uniformly distributed (over all keys). However. M. The most probable xor diﬀerence in the round’s subkey block thus occurs with probabiity 2−2k . although it might happen that Ki = Kj for any pair of i and j.

which requires getting all eight of the key scheduling S-boxes to give the same output for the same input for two diﬀerent keys. If an attacker sends xor diﬀerences through the PHT and subkey addition steps. respectively. but it is extremely unlikely.12 Key-dependent Characteristics and Weak Keys The concept of a key-dependent characteristic seems to have been introduced in [BB93] in their cryptanalysis of Lucifer. For this to work. a simple relation [Knu94b] is deﬁned as: EM (P ) = C 18 See ⇒ Ef (M ) (g(P. we do not believe this kind of key-dependent diﬀerential characteristic will have any relevance in attacking Twoﬁsh. and h are simple functions. This seems extremely unlikely to us.11. 7. A diﬀerential attack on Twoﬁsh may consider xorbased diﬀerences. we have to get an identical A and B sequence at the same time. or both. M. We are almost certain that there are no equivalent keys in Twoﬁsh. M ) [Haw98] for further cryptanalysis of IDEA weak keys. We do not believe that self-inverse keys exist for Twoﬁsh. Keys cannot generate a self-inverse sequence of subkeys. Again. but this advantage is relatively small. C . In general. because the same round subkey value can never appear more than once in the cipher. Simple Relations erty exists when: EM (P ) = C A key complementation prop⇒ EM (P ) = C where f . M )) = h(C. The statistical analysis presented earlier shows that the best linear and diﬀerential characteristics over all possible keys are still quite unlikely. There is no pair of keys. thus giving equivalent keys. and K. his diﬀerential characteristic probabilities will be dependent on the subkey values involved. Pairs of these keys that have identical subkeys. More generally. and also appears in [DGV94a] in an analysis of IDEA. where P . This is easy to see. so a single high-probability diﬀerential or linear characteristic for one S-box will not create a weakness in the cipher as a whole. for each S-box. it has probability 2−96 of existing at all. that gives the same subkey sequence. Equivalent Keys A pair of equivalent keys. it is conceivable that some keys are self-inverse despite using diﬀerent subkeys at diﬀerent points. M1 . such that EM0 (EM1 (X)) = X. M ∗ . each S-box must have a pair of keys that reverses it. g.18 The idea is that certain iterative properties of the block cipher useful to an attacker become more eﬀective against the cipher for a speciﬁc subset of keys. (Zero bits in the subkeys improve the probabilities of cleanly getting xor-based diﬀerential characteristics through the subkey addition.) Since there appears to be no special way to choose the key to make the subkey sequence especially low weight. are astronomically unlikely to exist at all. It is conceivable that diﬀerent sequences of subkeys and diﬀerent S-boxes in g could end up producing the same encryption function. and strongly doubt that they exist. M. for all X. We have veriﬁed that this cannot be done. M ∗ . is a pair of keys that encrypt all plaintexts into the same ciphertexts.none. and K are the bitwise complement of P. Self-Inverse Keys Self-inverse keys are keys for which encrypting a block of data twice with the same key gives back the original data. additive diﬀerences. just running backwards. low-weight subkeys will give an attacker some advantage. Note that the structure of both diﬀerential and linear attacks in Twoﬁsh is such that such attacks appear to generally require good characteristics through at least three of the four key-dependent S-boxes (if not all four). We have found no simple relations for Twoﬁsh. 35 . C. No such property has been observed for Twoﬁsh. A much more interesting issue in terms of keydependent characteristics is whether the keydependent S-boxes are ever generated with especially high probability diﬀerential or high bias linear characteristics. This is a kind of collision property. to get identical subkeys. Pairs of Inverse Keys A pair of inverse keys is a pair of keys M0 . but we cannot prove that such equivalent keys do not exist.

though we have validated some of its elements. Relatedkey cryptanalysis gives the attacker an additional way into a cipher: the key schedule. and attacks on a Twoﬁsh variant with ﬁxed Sboxes breaking six rounds with 267 work. Notice that any attempt in a related-key attack to aﬀect only a single byte in the computation of A or B is guaranteed to aﬀect all four bytes in the computation of T0 and T1 . That is. related-key attacks aﬀect a cipher’s ability to be used as a one-way hash function.5 chosen plaintexts and 251 work.5 chosen plaintext pairs and 251 computations of the function g. and breaking seven rounds with 2131 work. We also discuss an attack on four rounds of Twoﬁsh with no post-xor requiring 232 work. we have not implemented this attack. and about 232 work. without the post-whitening. this is particularly attractive for smart cards and hardware implementations.1 Overview of the Attack 8 Cryptanalysis of Twoﬁsh The general idea of the attack is to force a characteristic for the F function of the form (a. Conventional cryptanalysis allows an attacker to control both the plaintext and ciphertext inputs into the cipher. and the most eﬃcient attack against Twoﬁsh with a 256-bit key has a complexity of 2256 . Note that the RS matrix multiply can be implemented as a simple loop using the generator polynomial without storing the coeﬃcients of the RS matrix.12 Reed-Solomon Code The RS structure helps defend against many possible related-key attacks by diﬀusing the key material in a direction “orthogonal” to the ﬂow used in computing the 8-by-8-bit S-boxes of Twoﬁsh. so the concern in choosing an RS code with such simple coeﬃcients was not the performance overhead. there is a 2−k probability that the We have spent over one thousand man-hours attempting to cryptanalyze Twoﬁsh. 19 We have discussed the relevance of related-key attacks to practical implementations of a block cipher in [KSW96. because related-key attacks give the attacker the most control over the cipher’s inputs. a single byte change in the key is guaranteed to aﬀect all four key-dependent S-boxes in g. Because all of these coeﬃcients are “simple. We realize this characteristic internally by causing the same low Hammingweight xor diﬀerence to occur in the outputs of both g computations in the second round of the cipher.and postwhitening) with a chosen-key attack. with 222. we conjecture that the most eﬃcient attack against Twoﬁsh with a 128-bit key has a complexity of 2128 . Due to the high computational requirements. 0) to occur in the second round.1 Diﬀerential Cryptanalysis This attack breaks a 5-round Twoﬁsh variant with a 128-bit key. A cipher that is resistant to attacks with related keys is necessarily resistant to simpler techniques that only involve the plaintext and ciphertext. 8.7. When this happens. 8.” this RS computation is easily performed with no tables. The code generator polynomial is x4 + (α + 1 3 1 )x + αx2 + (α + )x + 1 α α • 10-round Twoﬁsh (without the pre. This computation is only performed once per key schedule setup per 64 bits of key. the minimum number of diﬀerent bytes between distinct 12-byte vectors generated by the RS code is guaranteed to be at least ﬁve. requiring 232 chosen plaintexts and about 211 adaptive chosen plaintexts. A summary of our successful attacks is as follows: • 5-round Twoﬁsh (without the post-whitening) with 222. the most eﬃcient attack against Twoﬁsh with a 192-bit key has a complexity of 2192 . using only a few shifts and xor operations. 36 . and there are k bits in that xor diﬀerence. The S-box keys are used in reverse order from the associated key bytes so that relatedkey material is used in a diﬀerent order in the round function than in the subkey generation. Most importantly. KSW97]. The fact that Twoﬁsh seems to resist related-key attacks well is arguably the most interesting result. None of the attacks appears to be extensible to enough rounds to pose any substantial threat to the full 16 round Twoﬁsh. where α is a root of the primitive polynomial w(x) used to deﬁne the ﬁeld. For example.19 Based on our analysis. but saving ROM space or gates. b) → (X.1. Since RS codes are MDS [MS77]. Precomputing the RS remainders requires only 8 bytes of RAM on a smart card for 128-bit keys. The reversible RS code used in Twoﬁsh was chosen via computer search to minimize implementation cost. we conjecture that there exists no more eﬃcient attack on Twoﬁsh than brute force.

The ﬁrst step allows us to get the characteristic through. There are also several other ways to distinguish the Y. 1) ⊕ F2. 2.3 The Input Structure The input structure works as follows: The input pair is (A. If the same one of those diﬀerences goes into the MDS matrix in both g computations in the second round.2 a 0 a 0 Y Q ∆Rr. We expect there to be between three and four output xor diﬀerences from s0 that will lead to some 8-bit output xor difference from the whole g function. D∗ ) The input structures we use must thus accomplish two things: 1. 1). Getting One Potential Right Pair Let us consider pairs of inputs to the F function in the second round. predictable diﬀerence sequence. which is what makes this attack possible. the second step allows us to mount the whole 5-round attack. 8) so that the single-byte diﬀerence into the second g computation in round 2 lines up with the single-byte diﬀerence in a. We have Z = ROL(a.1. by giving us enough data to recover the low bits of the S-box entries. S. we choose a = (α. but they do not change the diﬃculty of the attack. and b = ROR(a. That. This xor diﬀerence must occur in the output of the active S-box s0 in both parallel g functions in the second round’s F function. inside the g function computation. and R are all diﬀerences whose values we do not care about. Q. However. For each plaintext pair generated in the previous step. and where a = ROL(a. Y . Based on the properties of the MDS matrix. b) → (0. these will be correct. 0.2 The diﬀerential The attack requires that we get a speciﬁc. Select 2n plaintext pairs in such a way that there is a high probability of at least one pair being a right pair.1 0 b X Z R T ∆Rr.1 is guaranteed to be zero. we shall (roughly speaking) guess the last-round subkey and decrypt up one round.3 b 0 b X Z R The thing that makes this attack hard is getting that event to occur several times in succession. D). then. X) to happen with high probability. we know that there are three single-byte output xors for s0 that lead to an output xor of g with Hamming weight of only 8. we get an oﬀsetting pair of values in the two g outputs.0 0 0 1 a 2 0 3 Y 4 Q 5 S ∆Rr. they lead to only one output word from the whole F being changed. Z diﬀerence. Z. Recall that a right pair is a pair that follows the whole diﬀerential characteristic described above. generate about 2048 related plaintext pairs. Therefore. and so we cannot expect to choose the best bytewise input diﬀerence to s0 to get one of our desired xor diﬀerences. 8. 0) to be a single-byte diﬀerence in s0 . The only critical event for this attack occurs at r = 2. We choose: r ∆Rr. b = ROR(b. we do not know s0 . We must ﬁrst get a desired xor diﬀerence to occur in the output of the active S-box of the attack. T . 8. When the outputs go through the PHT. When our guess is successful. We will call a pair of texts for which this happens a potential right pair. 37 where the table gives the diﬀerence patterns for the round values for each of the 5 rounds. leads to a detectable pattern in the output diﬀerence from the third round. To recognize right pairs. 0. so long as the low-order bit of Z is always zero. 1) is also zero.5 chosen plaintexts.1. . B ∗ . and thus to a detectable pattern (with a great deal of work) in the output diﬀerence in the output from the cipher’s ﬁfth round. C. we are very likely to get at least one batch of 2048 “right pairs. with probability 2−8 . and the low-order bit of F2.1 . and we will be able to recover most of the key.” which have the 5-round diﬀerential characteristic shown above. We choose input structures to guarantee that. 1). C ∗ . B. and the other output has the same value subtracted from it. because the low-order bit of Z is always zero. so are all 2048 related plaintext pairs. Note that X.output of the PHT will be unchanged in one of its two words. after 222. one g output has some value added to it modulo 232 . we choose a so that the low-order bit of ROL(a. Then the diﬀerence Z is detectable. and we do not care which of these diﬀerences we hit. also. checking whether the low-order bit of the diﬀerence is as expected. in turn. such that if the ﬁrst plaintext pair is a right pair. (A∗ . where we need the characteristic (a. With 2048 successive events of this kind. we can mount an attack in which we attempt to recover the S-box entries.

it is possible that each time a bit in U is changed from a zero to a one. with fairly high probability. to mount the 5-round attack. Because this xor diﬀerence has a Hamming weight of only 8. V. we expect to get at least one right pair. when i is stepped through all possible byte values. and reduce the number of plaintext pairs needed to about 220 . there is a probability of 1/2 of one of them being a potential right pair for each of the three or four useful s0 output xors. without changing any other bytes. If we xor some nonzero value with Hamming weight 8. B ∗ . Getting Right Pairs from Potential Right Pairs A potential right pair has the property that it is getting the same xor diﬀerence in the output of both parallel g functions in the second round. Thus. of which at least one is. s0 [i] ⊕ s0 [i ⊕ δ] will hit any desired output xor with probability about 1/2. V ⊕ W. That is. we must now try many diﬀerent values for the input to the second g function in the second round. without altering the high-order byte values. T ) also satisﬁes that equation for any value of W . Once we have recovered information on all (or most) S-box entries. to have at least a 7/8 probability of getting one potential right pair. again. This requires 256 diﬀerent input pairs. (This ignores the input whitening. B. Thus. There is a complication to all this. U and V . xoring T into both U and V changes one of the bits to be added from zero to one. X) for the round function has a probability of about 2−20 . For any U and T . we found empirically that a characteristic of the form (a. When this happens: U + V = (U ⊕ T ) + (V ⊕ T ) This is what happens in a right pair. The second round’s F function input is the right half of the plaintext. this makes for 2 · 4 · 256 = 2048 unknown bits. when it is tried on all 256 possible input values. Out of all this.3. C ∗ . we need a batch of 2048 such pairs. In Section 8. This means that. This will generate a potential right pair with probability 7/8 if there are three useful s0 output xors. we must try 216 diﬀerent input pairs. The reason we need 2048 pairs is that we will recover two bits of information about each entry in each S-box. for any value of W . Because we do not know which of the plaintext pairs described above has the potential right pair. Consider two random values. we expect to get a real right pair. then 2−8 of all V values will satisfy equation 2 for given U and T . and the other bit to be added from one to zero. Instead. This allows us to improve the probability of the characteristic for the F function. b) → (0. When T has a Hamming weight of 8. Unfortunately. xored with the ﬁrst round’s F function output. T . the values of V for which this holds are characterized by (U ⊕T ⊕V )∧T = 0 where the ∧ is the bitwise-and operation. into both. a randomly selected (non-zero) input xor diﬀerence has about a 1/2 probability of causing any desired output xor diﬀerence.) This means that. we derive a new potential right pair by leaving the relationship between the pair the same. When we try about 256 different values of this kind with a potential right pair. but that merely oﬀsets our guess and has no eﬀect. This can be done by changing any of the bytes that go to the second round’s second g function. (That is. (A∗ . but altering D in its low-order two bytes. from potential right pair (A. D∗ ). we generate 256 diﬀerent plaintext pairs for each of the 216 plaintext pairs we already have. This implies ((U ⊕ W ) ⊕ T ⊕ (V ⊕ W )) ∧ T = 0. We already saw that this equation is equivalent to (U ⊕ T ⊕ V ) ∧ T = 0. So if we have a (U. equation 2. the probability is at least 2−8 that the two g functions will have a very useful relationship modulo 232 : one g function will have some value added to it modulo 232 . . with a randomly selected s0 . C. and each of the 2048 pairs give us one bit of information on those unknowns. for any nonzero δ. the corresponding bit in V is changed from a one to a zero. the other g function in the same plaintext will have that value subtracted from it.) Our structure attempts to get this sequence of inputs into the same byte position in both parallel g functions in the second round’s F function at the same time. we cannot directly control the inputs to the second round. a right pair. giving 224 plaintext pairs. Turning One Right Pair into 2048 Right Pairs We now have 220 plaintext pairs. we must guess the xor of the high-order byte of each word of the F function output. we can then readily recover the keying material entering each S-box by brute force. From a single potential right pair. Looking only at one bit position.1. in all bit positions where T has a 1-bit.Simulations show that. then (U ⊕ W. D). Consider. the bit values of U and V 38 (2) must be opposite. we must apply this process to all of them. Since we do not know the ﬁrst round’s F function output. T ) that satisﬁes equation 2. the result remains the same.

1 mod 4. We ﬁx c3 . those values can be arbitrary. d0 ). to deal with all possible byte xors between the s1 input in C and in D in the second round.This gives us a way to derive many plaintext pairs from one right pair. c∗ range over all possible values such that 2 1 c1 ⊕ c∗ = v and c2 = c∗ . and the next-lowest bit of F4. (A∗ . d1 .5 resulting plaintexts. The intuition is that 244 > 218 · 256 · 2048. t1 be the outputs of the two g functions in round four. d∗ ) and letting 0 0 c1 . (Ai . we know that the low-order bit of Z is zero. so that the same actual pair of inputs occurs in s1 in both g functions. d∗ ). We must try 256 diﬀerent possible values in our inputs. and T = ROL(Z. D). 8. (C ∗ . We thus summarize how it is built: 1. This is guaranteed to give the ∗ ∗ same output xor in both g functions.5 possible combinations of c1 . Di ). D∗ ) leads to a right pair. B ∗ . we face the same constraints as in getting a desired output xor diﬀerence from s0 . D be denoted by (c0 . (c∗ . Then. C ∗ . depending on K17 mod 2). Let the bytes of C. c1 . Of course. Suppose that some value of (C. in turn. the structure ensures we’ll ﬁnd 2048 simultaneous right pairs. and if so. how to recover key material. From now on. We can restrict our attention to those plaintext pairs (C. When we ﬁnd one right pair. We let c2 range over 16 possible values. (c∗ .1 . d∗ ) in turn. we guess the value v of the byte xor mentioned above (used to ensure that the s1 inputs are the same in round 2). we need only look at T and F4. the next-lowest bit from t0 (and possibly the low-order bit of t0 . . d2 . Finally. Ci . D∗ values to the C. so that the low-order bit of F4. In doing this. c2 . d2 . This. the diﬀerence T is visible from a ciphertext pair. the issue is how 0 0 to identify whether it contains 2048 right pairs. d1 ). d3 . the input structure is highly complex. First. 1) ⊕ ∆F4. We let t0 . Bi . d∗ ). d3 ). 3.(d0 . d0 ). means that the next-lowest bit of the observable diﬀerence T depends only on the xor of low-order bits from the g function outputs. (c∗ . so it is plausible that there could be enough plaintext pairs. C. (c∗ . D∗ ) with c1 ⊕ c∗ = v. c2 . B. If we have a right pair (A. Next let’s look for a right pair. Suppose we knew we had a batch of right pairs. (A∗ . d0 ). D values. if the input structure is well-chosen. d1 . Ci . Summary of the Input Structure To get a ﬁve round attack to work. then we can generate ∗ ∗ a new right pair. 2 1 Based on these observations. for those pairs. c2 . 6. we must manipulate the Ci . Di ). 5. we’ll also have 1 d1 ⊕ d∗ = v for free. Di values in such a way that the same xor diﬀerence occurs in the output of both g functions in the second round. d0 ) range over any 1450 possible values. and get the encryption of the 222. Then we note that what makes this a right pair depends only on (c0 . d0 ). D). this reduces the number of us2 able plaintext pairs to 236 /16 = 232 . so that F4. We then alter the byte inputs to s1 in the altered Ci and Di in the same way.1. so long as they are distinct. Di values as do the C ∗ . given any one 0 0 right pair. and force d1 = c1 . and so we use the same solution: send in the same byte inputs to s1 in C and D. i To do this. This introduces still one more complication to our input structure. (c∗ . D). 2. c∗ .1 depends only on an xor of the low-order bit from t1 . 39 Using the Structure We shall describe how we use the available plaintexts to construct the necessary plaintext pairs. We let the pair (c0 . Therefore. We ﬁx A and B throughout the attack. 4. The Ci . we try each of the 14502 possibilities for (c0 . testing each 0 0 of them for a right pair. We guess the low-order bit of K17 . you can easily get a batch of 232 /14502 = 2048 right pairs by ﬁxing (c0 . d∗ ). d0 ). therefore. we form all 28 · 16 · 1450 = 222.1 = t0 + 2t1 + K17 mod 232 . We let c1 range over all possible values. (C ∗ . we shall use them as described below. We shall also 1 1 insist that c2 = c∗ . D∗ ). Di values have the same relationship to the Ci .4 Recovering Key Material We describe next how to use a batch of 2048 simultaneous right pairs to recover all of the key material entering the S-boxes. This increases the total number of batches of plaintext pairs requested by a factor of 256. This gives us about 244 /28 = 236 plaintext pairs with appropriate values of (c1 . considering it ﬁxed. This gives us 244 possible plaintext pairs which can be chosen from the set of available texts. c3 ). (c0 . We focus on one value of (c0 . we shall think of v as ﬁxed. and those two relations ensure 1 that the byte inputs into s1 match up in the second round as desired if we have guessed v correctly. We again do not know the output from the ﬁrst round’s F function. and so do not know how to get identical bytes into s1 in the second round.1 depends only on the low-order bit of t0 . Bi .

we could guess the key material entering the S-boxes. If we guess the whole seventh round subkey. however: we only really need to know the ﬁrst g computation result in the ﬁfth round. A correct guess will allow us to compute the ﬁfth round’s g functions. Therefore. 8.j. and move on to another value 40 of (c0 . The resulting attack breaks six rounds with about 267 work. This means we need to know only the ﬁrst input to the g function in the ﬁfth round. (c∗ . which will allow us to recognize the sequence of right pairs by the low-order bit of the Z diﬀerence. However. since he can see the Z difference word directly. namely x0.) If some S-box does not admit any choice of keys.0. just as in the above attack. Known g Keys We can also consider a variant of Twoﬁsh with known S-boxes in g. and check our guess against the observed values of T . s0 can be recovered by a brute force search using the known values of x·. The attacker can immediately rule out all but the right batch of ciphertext pairs. the right key will reveal an unchanged low-order bit in A after g is computed. we use our batch of 2048 pairs to generate 2048 linear equations on the 2048 unknowns. using them to generate a system of 0 0 2048 linear equations on 2048 unknowns.k ∈ GF(2) as two linear combinations on each of the 256 entries in each of the four S-boxes. The total work expected is 232 trial g computations. However. (c∗ . we can mount the same attack.p ⊕ x0. Our key recovery technique is based on establishing and solving linear equations. We thus get an attack with 2131 work.q ⊕ x0. To expose the low-order bit of the Z difference. This enables us to isolate the key material entering each S-box. This leads to a total work factor of about 14502 · 20482 · 256 + 232 = 251 . which is too high for comfort. thus obtaining two bits of information on each Sbox entry (or on most of them). d0 ).17 would represent the low-order bit of the output of the MDS matrix when applied to s3 (17). and 2128 work for a 256-bit key. see below for a more eﬀective diﬀerential attack which works when the S-boxes are known. and solve for each S-box in turn. Finally. (c∗ .1.3. . (Alternatively. for example. since the PHT has the eﬀect of making the low-order bit of the second word of F function output dependent only on that value and one bit of the ﬁfth round subkey. d0 ).) Otherwise. similarly for the other x’s. the key material entering s0 could be recovered by a table lookup in a 216 – 232 -entry table.· . d0 ). With this idea in mind. we propose an alternate analysis method with much lower complexity.0. Look at the 2048 candidate pairs generated by one value of (c0 . the attack needs about 222.At this stage. the system of linear equa0 0 tions will be inconsistent with good probability. In the end. In this case. in addition. this procedure will recover all of the key material entering the S-boxes. and 32 bits of the sixth round subkey. We construct 2 · 4 · 256 = 2048 formal unknowns xi. then we can also reject the batch of 2048 candidate pairs. d∗ ). d∗ ). this between 216 work (for a 128-bit key) and 232 work (for a 256bit key). q. For example. This is true because the loworder bit of t0 is obtained as a GF(2)-linear combination of x’s. the rest of the key can be recovered by other techniques.5 Reﬁnements and Extensions Attacking Four Rounds The above structure can be used in a much easier 4-round attack. (When we are looking at the wrong 0 0 value of (c0 . compute the low-order bits of t0 and t1 for each ciphertext pair. we must now guess the last round’s subkey. x0. which means we need only guess 32 bits of the sixth round subkey. on average. although a more sophisticated algorithm is more appropriate since we will have a very sparse matrix). He can then mount a 232 eﬀort search for S. r. d∗ ). then we know that those candidate pairs do not represent right pairs. when we ﬁnally en0 0 counter a right pair. (c∗ . for each ciphertext pair which arises from a right pair. The key material entering. but push it through more rounds. and recover the key material from the right pairs with at most 232 work. s).5 chosen plaintexts. we reject each wrong batch of pairs with about 20482 work. we can use the same basic attack. which is of some theoretical interest against 192-bit and 256-bit keys. Consider a 7-round Twoﬁsh variant with this input structure. d∗ ). We now need to know the whole input to the sixth round F function. This system can be solved by standard linear algebra techniques (such as Gaussian elimination.r ⊕ x0. Therefore. our algorithm for identifying right pairs and recovering key material is as follows.3. The main observation is that the observable diﬀerence T gives us one equation on those unknowns. and we may move on to another value of (c0 . There is an improvement. Consider a 6-round Twoﬁsh variant with this input structure. say. and thus the cost of ﬁltering should not be too high. d0 ).2. we solve the system of linear equations.1.s if the input to g is (p. this would already require 264 work for a 128-bit key. If this system of linear equations has no solution.

These attacks seem to work best against algorithms with simple algebraic functions. Additionally. Truncated diﬀerential attacks are most often successful when most of the cipher’s internal components operate on narrow bit paths. The almost complete diﬀusion within a round function makes it very diﬃcult to isolate a portion of the block and ignore the rest of the block. Our approach was based on the observation that the number of active S-boxes can be counted by examining the active bytes positions of the examined diﬀerential characteristics. all others are considered active. Twoﬁsh’s 64-bitwide F function seems to make truncated diﬀerential characteristics hard to ﬁnd. We cannot ﬁnd any higher-order diﬀerentials that can be exploited in the cryptanalysis of Twoﬁsh.It may also be possible to improve the number of plaintexts required by a factor of several thousand. 8. when (DPmax )n is suﬃciently small. We follow this approach to develop additional evidence that Twoﬁsh is likely to be secure against diﬀerential attacks. the hard part is to determine a bound on the number n of active S-boxes.. this approach often results in extremely crude and conservative bounds. truncated diﬀerential attacks are often diﬃcult to extend to more than a few rounds in a cipher. a shorter form for the 41 Attacks using truncated diﬀerentials apply a differential attack to only a partial block [Knu95b]. while a 0 represents a zero byte diﬀerence (and thus an inactive byte position). because we now know the diﬀerence tables for our S-boxes. 8. For convenience. knowing the pattern of active bytes in the diﬀerence at the output of the F function lets one derive constraints on the pattern of active bytes at the inputs of F functions at neighboring rounds.) If any diﬀerential characteristic of suﬃcient length must include . Moreover. so characterizing the pattern of active bytes in a given diﬀerential characteristic allows us to count the number of S-boxes it includes. (An inactive Sbox is one that is fed with the trivial characteristic 0 → 0.3 already provides convenient values of DPmax for us.2. for example.2. In particular. By an active byte of a diﬀerential characteristic. An even shorter notation treats the string speciﬁcation of active bytes as a binary string. the results can provide powerful evidence of the cipher’s security against diﬀerential cryptanalysis. we are not aware of any higher-order diﬀerential attack reported in the open literature that is successful against more than 6 rounds of the target cipher. For example.2 Truncated Diﬀerentials at least n active S-boxes. we introduce a simple notation to indicate the location of active bytes: a x represents an (unspeciﬁed) non-zero byte diﬀerence (and thus an active byte position). it is often easy to obtain a crude lower bound on the complexity of a diﬀerential attack by simply bounding the number of active S-boxes that must be involved in any successful diﬀerential characteristic. We believe that Twoﬁsh is secure against truncated diﬀerential attacks. due to the heavily byte-oriented nature of the F function. the characteristic 0000000000B4000C16 → 591FE4720123450016 for the F function might be represented in our ﬁrst notation as 00000x0x → xxxxxxx0. then one can conclude that the probability of any diﬀerential characteristic can be at most (DPmax )n . We have not found any truncated attacks against Twoﬁsh. Nonetheless.2. and packs it into a hexadecimal string for more convenient display.3 Search for the Best Diﬀerential Characteristic When assessing the security of ciphers built out of S-boxes and simple diﬀusion boxes. an S-box in the F function is active exactly when its input byte is active. quadratic) between pairs of plaintext and ciphertexts [Lai94. which indicates that two of the byte positions in the input diﬀerence are active and all but one of the bytes in the output diﬀerence are active. we mean one that has a non-zero diﬀerence at that position of the characteristic.2 8. and poor short-term diﬀusion.g. only a few rounds. Of course. Section 7. Knu95b]. 8. Of course. there must be at least four active bytes at its output.1 Extensions to Diﬀerential Cryptanalysis Higher-Order Diﬀerential Cryptanalysis Higher-order diﬀerential cryptanalysis looks at higher order relations (e. we can obtain bounds on the propagation of active bytes through the F function. Furthermore. and there is no characteristic for any of the S-boxes with probability greater than DPmax . if there is one active byte at the input to the F function. in the sense that the true complexity of mounting a practical diﬀerential attack is often much higher than the bound indicates.

) For example. and so on. Our implementation used Matsui’s algorithm for pruned search of all diﬀerential characteristics [Mat95]. 0 → 0. Of course.3 chosen plaintexts. A few troublesome diﬀerential characteristics for the PHT are ignored: namely. so the best an attacker could hope for is an attack needing 268. one of the best 8-round characteristics found was 1 2 3 4 5 6 7 8 9 88->FE 88<-00 88->00 88<-F0 00->F0 00<-F0 88->F0 88<-00 88 00 (2 (0 (2 (4 (0 (4 (2 (0 active) active) active) active) active) active) active) active) Here we see that the characteristics for the F function were 8816 → FE16 . it is diﬃcult to model the eﬀects of the one-bit rotations and the PHT precisely. and also because they have very low probability. thus needing to cover only rounds 2–13. and then any diﬀerential attack working for a signiﬁcant fraction of the keyspace would require at least 288. so we were forced to generalize from data on characteristics covering 8 or fewer rounds. the best 12-round diﬀerential characteristic must involve at least 20 active S-boxes. those that rely on a carry bit propagating through at least 8 bit positions.) 2. Therefore. We may now add in the results of Section 7. The reasoning is as follows. 8816 → F016 . about 2−64 of the 192-bit keyspace). therefore. We ignore the one-bit rotations. any one such pattern represents a large class of possible differential characteristics. For all 128-bit keys. yet it is not considered in our model. Our implementation of Matsui’s algorithm took too long to search the space of all diﬀerential characteristics that were 9 rounds or longer. 01FFFE8016 ) → (0000010016 . For example.) Any attack which does not violate this model must obey our bounds below. These sorts of characteristics were omitted because they are harder to characterize in full generality. The program did show that covering 4 rounds requires at least 6 active S-boxes. . where the average probability of any particular diﬀerential of an S-box is of course much lower than the maximum probability. or in the second case. In this model. The attacker would have to ﬁnd speciﬁc differences for the active byte positions that ﬁt together and that still make the attack work. under this model. we are guaranteed that DPmax ≤ 18/256. This already rules out all practical attacks. Thus. we concentrated our eﬀorts on 12-round characteristics. In this case.3. we have DPmax ≤ 24/256.same characteristic is 0516 → FE16 . so any attacker would need at least (DPmax )−20 ≈ 276. we can count the minimum number of S-boxes over all diﬀerential characteristic by fully searching the set of all patterns of active bytes. we found that there are no good highprobability diﬀerential characteristics covering more than 12 rounds of Twoﬁsh. We list here some of the practical barriers a real attacker would have to surmount: 1. since they introduce non-linear dependence across byte boundaries. (Experiments suggest that the previous example has a probability of perhaps 2−36 or so. These estimates are likely to signiﬁcantly underestimate the complexity of a diﬀerential attack.3 chosen plaintexts. To simplify the task. any 12-round characteristic must include a sub-characteristic covering the ﬁrst 4 rounds and a sub-characteristic covering the last 8 rounds. (In some cases. (This bound is probably not tight. Our results show that. the characteristic (01FFFE8016 . 0000008016 ) for the PHT has positive probability. covering 6 42 rounds requires at least 10 active S-boxes. In this approach. Of course. even if these attacks were possible. DPmax = 12/256 is a more representative value (for all key sizes). any characteristic covering 12 rounds must include at least 6 + 14 = 20 active S-boxes. and covering 8 rounds requires at least 14 active S-boxes. for 192bit keys. we considered a restricted model that introduces two small inaccuracies: 1. This involves picking speciﬁc diﬀerences.6 chosen plaintexts. they could only work for a very small class of weak keys (constituting about 2−48 of the 128-bit keyspace.2. It is much more natural to demand that the attack work for a signiﬁcant percentage of all keys. the one-bit rotations can cause diﬀusion across byte boundaries in a form that we did not model. We conservatively assumed that a good analyst might be able to bypass the ﬁrst round of Twoﬁsh and mount a 3-R attack. When implementing this approach.

By forcing neighboring characteristics for the round function to “line up” with shifted versions of each other. Those speciﬁc diﬀerences would have to propagate through the key-dependent S-boxes. The right way to interpret 43 these results. even for the small class of weak keys where this constraint is satisﬁed. 2. and the subkey addition with high probability. Because the PHT and the subkey addition rely on addition modulo 232 .) 2. since they eliminate any hope for good iterative characteristics. in practice. than our bound might indicate. in some cases. . The attacker would most probably have to ﬁnd a high-probability characteristic for each of the four S-boxes s0 . the one-bit rotations make it even harder. As an important special case. the overall probability of the diﬀerential characteristic would be signiﬁcantly reduced by the combined probabilities of passing the non-trivial round characteristics through the PHT and subkey addition. s3 at the same time. Recall that our model is an imperfect idealization of Twoﬁsh. and they typically have a very low probability. An attacker would have to ﬁnd a way to minimize those costs. all evidence available to us suggests that diﬀerential attacks on Twoﬁsh are likely to be well out of reach for the foreseeable future. Try to model the PHT more accurately than we have done. .(Doing so is already hard within the model. However. is to conclude that any diﬀerential attack which respects this model is extremely unlikely to succeed. the one-bit rotation makes it very hard to piece together several short characteristics into a useful characteristic covering many rounds. even the former task is nontrivial. the characteristics for the PHT which we modeled inaccurately are hard to use in real attacks: they are rare (which means they are hard to piece together with good characteristics for other cipher components). Try to take advantage of the one-bit rotations. Of course. As another important special case. and the latter may well be impossible in most cases. We have seen earlier that attaining DPmax for even one S-box requires exceptional luck— only about 2−12 of all 128-bit keys attain DPmax = 18/256 at any one S-box—and attaining DPmax at all four S-boxes places even more constraints on the key. A brief note on the implications of these results is in order. . if s0 (say) has only one input diﬀerence with probability DPmax . . There are three choices: 1. it makes it diﬃcult to use that input diﬀerence every time that s0 is active. it seems clear that the analyst will be hard-pressed to achieve much success. This might be an promising avenue for further exploration. Try to do both at the same time. if they have any eﬀect at all. We believe that the one-bit rotations make cryptanalysis harder. the attacker would have to learn which input and output diﬀerences for the S-box attain that maximum. Therefore. But what about the adversary who “breaks the rules”? We believe that piecing together an attack that avoids the diﬃculties imposed by our model will not be easy. pushing a diﬀerence with signiﬁcant probability through requires a diﬀerence with relatively low Hamming weight. Since the S-boxes are key-dependent. this serves to increase the diﬃculty of ﬁnding high-probability characteristics even further. . then. and Section 7 shows that it is very diﬃcult to cause the diﬀerence at the output MDS matrix to have low Hamming weight. this imposes signiﬁcant constraints on the diﬀerence pushed through the MDS matrix. we believe that any realistic diﬀerential attack is likely to require many more texts. The PHT and subkey addition are particularly thorny barriers: in our model. the attacker was not charged with the cost of pushing a diﬀerence through them. In short. but in real life. 3. and identify a way to piece them together into full 12round characteristics. the PHT. the MDS matrix. This should be even harder than either of the above. This provides compelling evidence that attackers who respect our model will probably get nowhere. While our results with this model do not rule out the possibility that such an approach to diﬀerential cryptanalysis might be made to work. 3. the abstraction has introduced inaccuracies. this should make good iterative characteristics very hard to ﬁnd (if they even exist). However.

8. The input diﬀerential results in a diﬀerential at the output of the MDS matrix that has a relatively high chance of being converted into a 0x0x pattern after a constant is added to it. We have not found any attack that directly relates to this property.4 Linear Cryptanalysis We perform a similar analysis to that of Section 8. The last characteristic has a diﬀerent explanation. as a result. the PHT maps this into an output diﬀerential where one half is zero.6 known plaintexts.2. The leading cause of all of these diﬀerentials seems to be that these two bytes generate the same output diﬀerential from the S-box. the one-bit rotations are not always treated properly. Note that in all but the last of these. the two active input bytes are going through the same key-dependent S-box due to the 8-bit rotate at the input of the second g function. We found some interesting diﬀerentials. as are the results. upper bounds on the diﬀerential characteristics of F of this type could be used to improve our bound on the best diﬀerential characteristic for the full cipher. 8.3 in the context of linear cryptanalysis. i. and it involves a few inaccuracies in special cases. we turn to the results of Section 7.3.1 Diﬀerential characteristics of F We ran a test for diﬀerential characteristics of the F function with one active byte in each 4-byte g input and at most 4 active bytes at the 8-byte F output. it may fail to capture some important features of the round function. 2−20 2−26 2−22 2−26 2−21 2−22 2−20 2−22 2−14 Diﬀerential x000 0x00 → x000 0x00 → x000 0x00 → 000x x000 → 000x x000 → 000x x000 → 00x0 000x → 00x0 000x → 0x00 0000 → 0000 xxxx 0000 0xxx xxxx 0000 0000 xx0x 0000 xxxx xxxx 0000 0000 xxxx xxxx 0000 0x0x 0x0x Our model is much as before: it is centered around the pattern of active bytes. or about 2−68 of the 192-bit keyspace). A search for the best linear characteristics within this model found that every 12-round characteristic has at least 36 active S-boxes. and thus the same diﬀerential at the output of the MDS matrix. so this suggests an attacker would need at least (LPmax )−36 ≈ 289. the diﬀerential 03f2037a16 has a fair chance of being converted to a 0x0x pattern after a 32-bit addition. 2. (Here a byte position is considered active if any of its bits are active in the linear approximation for that round. and such an analysis would only work for a very small class of weak keys (representing about 2−49.) Our characterization of linear approximations for the PHT was a bit weaker than the corresponding analysis for diﬀerential attacks. our model should be considered only a heuristic. 2216 → FF16 . Because of the second limitation. We have LPmax ≤ (108/256)2 . our model is less accurate than before.6 of the keyspace for 128-bit keys. For example. In particular. This probability depends on the bit pattern of the diﬀerential just before the PHT.3.. and so on. In this . but it provides a useful starting point for further research. More importantly. The approach is much the same. For example: Prob.e. All of these characteristics have a relatively low probability. In particular: 1. our model of the PHT fails to accurately treat some low-probability approximations for the PHT where there are fewer active bytes at the output than at the input. To translate these results into an estimated attack complexity. As before. With some positive probability. It is much more natural to demand that the attack work for a signiﬁcant percentage of all keys. 3316 → DD16 . Here is an example of one characteristic we found: 1 2 3 4 5 6 7 8 9 10 11 12 13 FE->FF FF<-FF FF->DD CC<-DD CC->00 CC<-00 CC->77 00<-77 00->77 66<-77 66->00 66<-00 66 07 (1 (2 (4 (6 (0 (6 (4 (0 (4 (6 (0 (3 active) active) active) active) active) active) active) active) active) active) active) active) These probabilities are very approximate as they derive from numerical experiments with not too many samples. if the mask Γ selects at least one bit from that byte position. 44 The ﬁrst few approximations for the F function are 0116 → FF16 .

we feel that diﬀerential-linear cryptanalysis is unlikely to be successful against the Twoﬁsh structure. Hence. and believe Twoﬁsh to be immune to this kind of analysis. Nonetheless. and probably overestimate the probability of the best linear characteristic. The principle of the attack is simple: if the ciphertext can be represented as a polynomial or rational expresson (with N coeﬃcients) of the plaintext. they present some useful evidence for the security of Twoﬁsh against linear cryptanalysis. we do not believe it can be brought to bear against Twoﬁsh for the same reasons that it is immune to linear cryptanalysis. and are very hard to combine due to the MDS and PHT mappings. We are not aware of any general way to partition F ’s inputs and outputs to facilitate such attacks. Due to the need to cover the last part of the cipher with two copies of a linear characteristic.8 chosen plaintexts. combined with the technique of multiple approximations [KR94]. 8. MSK98b] is eﬀective against ciphers that use simple alegbraic functions. (The available linear characteristics for Twoﬁsh’s round function have a relatively low probability. managed to improve the best linear attack against DES a minute amount [SK98].20 An attacker trying to carry out a partitioning attack is generally trying to ﬁnd some way of partitioning the input and output spaces of the round function 20 Similar The interpolation attack [JK97. However.4.4 Partitioning Cryptanalysis Diﬀerential-linear cryptanalysis uses a combination of techniques from both diﬀerential and linear cryptanalysis [LH94]. 8. We have been unable to ﬁnd any useful way to partition the input and output spaces of the Twoﬁsh F function or a Twoﬁsh round that works consistently across many keys. In practice. there are 264 diﬀerent F functions. because the output is somewhat more likely to be in one speciﬁc partition than in any of the others. the bias of the linear characteristic is likely to be extremely small unless the linear portion of the attack is conﬁned to just three or four rounds.1 Multiple Linear Approximations so that knowledge of which partition the input to a round is in gives some information about which partition the output from a round is in. In our analysis. 8. 8. because of the key-dependent Sboxes. JH96]. KR95] allow one to combine the bias of several high-probability linear approximations. These results are only heuristic estimates. 45 . presumably each with its own most useful partitioning.4. For the 128-bit key case. This can be seen as a general form of a failure of the block cipher to get good confusion.2 Non-linear Cryptanalysis Another generalization of linear cryptanalysis looks at non-linear relations [KR96a]: e. then the polynomial or rational expression can be reconstructed using N plaintext/ciphertext pairs. 8.5 Diﬀerential-linear Cryptanalysis Multiple linear approximations [KR94. it only provides a signiﬁcant advantage over traditional linear cryptanalysis when there are a number of linear approximations whose bias is close to that of the best linear approximation. Therefore.g.4.4. This can be used in a straightforward way to attack the last round of the cipher. 8. Har96.5 Interpolation Attack Partitioning cryptanalysis is another generalization of linear cryptanalysis [Har96. we do not feel that Twoﬁsh is vulnerable to this kind of cryptanalysis.4. and thus in this model any linear attack working for a signiﬁcant fraction of the keyspace would require at least 2120. JH96. ideas can be found in [Vau96b]. HM97].) This means that the cryptanalyst would need to cover almost all of the rounds with the diﬀerential characteristic.. an attacker after N rounds can distinguish the output from the N th round from a random block of bits. While this attack.case. An attacker attempts to ﬁnd a statistical imbalance that can be described as the result of some group operation on some function of the plaintext and some function of the ciphertext. We have not found any such statistical imbalances.3 Generalized Linear Cryptanalysis This generalization of linear cryptanalysis uses the notion of binary I/O sums [HKM95. making a diﬀerential-linear analysis not much more powerful than a purely diﬀerential analysis. LPmax = (80/256)2 is a much more representative value. we have found no diﬀerential-linear attacks that work against Twoﬁsh. quadratic relations. this seems to improve linear attacks by a small constant factor.

We can see no way for the attacker to actually test the 96-bit guess that it would take to attack even one round’s subkey in this way on the full Twoﬁsh. This type of analysis has had considerable success against ciphers with simplistic key schedules—e. diﬀerential related-key cryptanalysis. he now knows Ai . there is a generalization to related-key attacks as well. A “slide” attack occurs in an iterated cipher when the encryption of one block for rounds 1 through n is the same as the encryption of another block for rounds s + 1 to s + n. the sequence of Ai 46 8. If he guesses K0 . or against ciphers whose rounds functions have very low algebraic degree. he can compute the corresponding K1 . adaptive chosen plaintext). as we shall see next. and the level of access to the cipher needed to get those texts (e.7 Related-key Cryptanalysis Related-key cryptanalysis [Bih94. but the subkey sequences slide down one round. Thus. There are 8 at most m 232m−128 possible choices of M ∗ . it is unlikely that a pair exists at all with n ≥ 8. M ∗ for which a slide of n ≥ 4 occurs. A conventional attack is usually judged in terms of the number of plaintexts or ciphertexts needed for the attack. Let m ∈ {5. this requires 8n of these relations to hold. We have a total of nm 8-bit relations that need to be satisﬁed. Related-key slide attacks were ﬁrst discovered by Biham in his attack on a DES variant [Bih94]. For n ≥ 4 this is dominated by the case m = 5. Swapping Key Halves It is worth considering what happens when we swap key halves.1 is the best way we were able to ﬁnd to test such a partial key guess. and the combination of operations from diﬀerent algebraic groups (including both addition mod 232 and xor) should help increase the degree even further. An attacker can look at two encryptions. In its most advanced form. This means that any speciﬁc key is unlikely to support a slide attack with n ≥ 4.g. . but each guess takes 32 bits. Travois [Yuv97] has identical round functions. . M ∗ such that the key-dependent S-boxes in g are unchanged. Twoﬁsh’s S-boxes already have relatively large algebraic degree. we expect 2293−40n pairs M. when an attacker guesses some subset of the key bits. Observe that m ≥ 5 due to the restriction that S cannot change and the properties of the RS matrix that at least 5 inputs must change to keep the output constant.. In that case.. M ∗ ) for n values of j. S-1 [Anon95] can be broken with a slide attack [Wag95a]. Therefore. Conventional slide attacks allow one to break the cipher with only knownor chosen-plaintext queries. The Twoﬁsh key schedule has this property.7. Let us look in more detail for a ﬁxed key M . 8. but provides no information about the round keys Ki . M ) = si (j + 2s. To mount a related-key slide attack on Twoﬁsh. for each of the eight bytepermutations used for subkey generation. however. This is only half the length of the full key M . we swap the key bytes so that the values of Me and Mo are exchanged. an attacker must ﬁnd a pair of keys M. So for each M we can expect about 238−40n keys M ∗ that support a slide attack for n ≥ 4. He learns nothing of the key S to g. known plaintext. We can see no way to test a guess of S on the full sixteen round Twoﬁsh. He can carry this attack out against as many round subkeys as he likes. . GOST [GOST89] and 3-Way [DGV94b]—and is a realistic attack in some circumstances. That is. This amounts to ﬁnding. 8} be the number of S-boxes used to compute the round keys that are aﬀected by the difference between M and M ∗ . An alternative is to guess the key input S to g. In total. KSW97] uses a cipher’s key schedule to break plaintexts encrypted with related keys. .g. The differential attack described in Section 8. a change in the keys such that: si (j.6 Partial Key Guessing Attacks A good key schedule should have the property that. both plaintexts and keys with chosen diﬀerentials are used to recover the keys. The expected number of M ∗ that satisfy these 8 relations is thus m ·2−8nm+32m−128 . we believe that Twoﬁsh is secure against interpolation attacks after even only a small number of rounds. chosen plaintext. he does not learn very much about the subkey sequence or other internal operations in the cipher. Consider an attacker who guesses the even words of the key Me . For each round subkey block. and can slide the rounds forward in one of them relative to another. and can also be broken with a slide attack.1 Resistance to Related-key Slide Attacks 8. KSW96.However. we will ignore the other cases for now. . Over all possible key pairs. interpolation attacks are often only workable against ciphers with a very small number of rounds.

With only the key bytes affecting these ﬁve subkey generation S-boxes active. he wants to change M to M ∗ .. at best. the attacker must get zero diﬀerences in the byte sequences generated by all ﬁve active S-boxes. and thus the key S. the nature of the RS matrix is that if he needs to alter four bytes in any one of these S-boxes.2. All else being equal. and we would expect even eight successive bytes of unchanged byte sequence to require control of all four key bytes into the S-box. we believe that any useful relatedkey attack will require a pair of keys that keeps S unchanged. it will not be possible to compensate for this change in S with changes in the subkeys in single rounds.35. he needs to keep the g function. this implies that it is extremely unlikely that an attacker can ﬁnd a pair of key inputs that will get identical byte sequences in these middle twelve rounds. let us consider an attacker who wants to put a chosen subkey diﬀerence into the middle twelve rounds’ subkeys. since each active S-box is another constraint on his attack. we would not expect to see a pair of keys for even one S-box with more than eight successive bytes unchanged. it is interesting to ask whether there exist pairs of keys that give permutations of one another’s subkeys. we expect there to be. from changing. M. which means the middle twelve bytes.7. There are 263 pairs of key inputs from which to choose. the longer the key. Consider a single such byte sequence: the attacker tries to ﬁnd a pair of four-byte key inputs such that they lead to identical byte sequences in the middle twelve rounds. The attacker can keep the number of active S-boxes down to ﬁve without altering S. and thus to keep the individual byte sequences used to derive A and B identical. Consider the position of the attacker if he attempts a related-key diﬀerential attack with diﬀerent S keys.or 192-bit keys that gets only zero subkey diﬀerences in the rounds whose subkey diﬀerences it must control translates directly to an equivalent relatedkey attack on 256-bit keys. This must result in diﬀerent g outputs for all inputs. the more freedom an attacker has to mount a related-key diﬀerential attack. and control D[i. We discuss this kind of analysis of byte sequences in Section 7. The simplest related-key attack to analyze is the one that keeps both S and also the middle twelve rounds’ subkeys unchanged. The added diﬃculty is approximately that of adding 24 active S-boxes to the existing related-key attack. as well as or instead of through the plaintext/ciphertext port.3 The Zero Diﬀerence Case Permuting the Subkeys Although there is no known attack stemming from it. For the sake of simplifying discussions of the attack. a single pair of values for the twenty active key bytes that leave the middle eight subkeys unchanged. since we know that there are no pairs of S values that lead to identical S-boxes. It thus seeks to generate identical A and B sequences for twelve rounds.and Bi values totally changes. 47 8. We thus will assume the use of 256-bit keys for the remainder of this section. That is. and so there may very well be pairs of keys that give permutations of one another’s round subkey blocks. . At the same time. because of the diﬀerent index values used. If the byte sequences behave more-or-less like random functions of the key inputs.7. 8. This is almost as large as 264 . in order to maximize his control over the byte sequences generated by these S-boxes. and about 295 possible byte sequences available. Note that a successful related-key attack on 128. such an attack must control the subkey diﬀerence sequence for at least the rounds in the middle. From this analysis. he must alter four bytes in all ﬁve active S-boxes. Assuming the pair of S values does not lead to linearly related S-boxes. he can alter between one and four bytes in all ﬁve Sboxes.2 Resistance to Related-key Diﬀerential Attacks A related-key diﬀerential attack seeks to mount a diﬀerential attack on a block cipher through the key.11. and so this is what he should do. For this reason. The RS code used to derive S from M strictly limits the ways an attacker can change M without altering S. We would expect a speciﬁc pair of key bytes to be required to generate these similar byte sequences. he must alter bytes in all ﬁve. In practice. M ∗] for i = 12. The attacker must try to keep the number of active subkey generating S-boxes as low as possible. To get zero subkey diﬀerences. There are 20! ways that the rounds’ subkey blocks could be permuted. Against Twoﬁsh. To extend this to ﬁve active S-boxes. We can see no useful attack that could come from this.

note that with one altered key byte per active subkey generation S-box. Bi does not necessarily have great control over the xor or modulo 232 diﬀerence sequence that appears in the subkeys. we must consider the context of a related-key diﬀerential attack.6 Conclusions Our analysis suggests that related-key diﬀerential attacks against the full Twoﬁsh are not workable. 8. In practice.1 A Related-Key Twoﬁsh Variant Attack on a Overview of the Attack Results We deﬁne a partial chosen key attack on a 10-round Twoﬁsh variant without the pre.7. Bi diﬀerence sequences. If an attacker can ﬁnd a diﬀerence sequence for Ai .8. 48 8. with two key bytes per active S-box.7. because it forces an attacker to keep at least ﬁve key generation S-boxes active. are unknown to us. Note. This points out one of the diﬃculties in mounting any kind of successful related-key attack with nonzero Ai .8 8. For this reason. The attack requires us to control twenty selected bytes of the key. this increases to 2−78 .8. Bi that keeps k = 3. despite the fact that this also slows the speed of a hash function down by a factor of two. which are the same for both keys.5 Probability of a Successful Attack With One Related-Key Query So long as an attacker is unable to improve this. This attack can be converted into a related-key diﬀerential attack. the attacker ends up with a 2−39 probability that a pair of related keys will yield an attack. for nearly any useful diﬀerence sequence. The attacker does not generally know all of the key bytes generating either Ai or Bi . For reference. then the best estimate for the modulo 232 diﬀerence of the two subkey words for a given round has probability about 2−k .4 Other Diﬀerence Sequences An attacker who has control of the xor diﬀerence sequences in Ai . Instead. however. it increases to 2−117 .) Consider an Ai value with an xor diﬀerence of δ. each active S-box in the attack has a speciﬁc pair of deﬁning key bytes it needs to work. he has a probability of about 2−72 of getting the most likely xor subkey diﬀerence sequence. with three key bytes per active S-box. K ∗ . we do not believe that any kind of related-key diﬀerential attack is feasible. not including the high-order bit. Bi xor diﬀerence sequence does not make controlling the subkey xor diﬀerences substantially easier. our analysis suggests that. such as will be available to an attacker if Twoﬁsh is used in the straightforward way to deﬁne a hash function.or post xors. If the Hamming weight of δ is k. that we have spent less time working on resistance to chosen key attacks. Further. Our analysis suggests that any useful control of the subkey diﬀerence sequence requires that each active S-box in the attack have all four key bytes changed. this means that any key relation requiring more than one byte of key changed per active S-box appears to be impractical. but it requires 2155 randomly selected related key pairs be tried before one will prove vulnerable to the attack. Consider an Ai value with an xor diﬀerence of δ. We consider the use of the RS matrix in deriving S from M to be a powerful defense against relatedkey diﬀerential attacks. we recommend that more analysis be done before Twoﬁsh is used in the straightforward way as a hash function. he knows the xor diﬀerence sequence in Ai and Bi . or by ﬁnding a way to mount relatedkey attacks with diﬀerent S values for the diﬀerent keys. unless they also allow the attacker to use far fewer active key bytes. This leaves him with a probability of a pair of keys with his desired relation actually leading to the desired attack of about 2−115 . then the best estimate for the xor diﬀerence that ends up in the two subkey words for a given round generally has probability about 2−2k . and about 2−36 of getting the most likely modulo 232 diﬀerence sequence. which he wants to get. and needs to control the subkey diﬀerences for twelve rounds. and we note that it appears to be much more secure to use Twoﬁsh in this way with 128-bit keys than with 256-bit keys. The remaining twelve bytes of the two keys. An attacker specifying his key relation in terms of bytewise xor has ﬁve pairs of sequences of four key bytes each. First. and to set them diﬀerently for a pair of related keys K. . recovering them is the objective of the attack. which moves the attack totally outside the realm of practical attacks. (Control of the Ai . If the Hamming weight of δ is k. Note the implication of this: clever ways to control a couple extra rounds’ subkey diﬀerences are not going to make the attacks feasible.7. 8. either by ﬁnding a way to get useful diﬀerence sequences into the subkeys without having so many active key bytes.

We expect to have to try about 512 requests with diﬀerent xor diﬀerence values Xi . we might expect from U . Let’s call these values U. the attacker could choose the values of the active twenty bytes of both K and K ∗ . again. B. it is possible to generate a set of about 232 xor diﬀerence values that we can expect to see result from the changed subkeys when they’re used in the F function in the ﬁrst round.3 Choosing the Plaintexts to Request From U . This makes this attack impractical as a diﬀerential related-key attack. we assume that the keys K. How The Attack Works We start by requesting one plaintext encrypted under key K. We thus do the following: 1. The goal is to determine the output xor from the ﬁrst round’s F function when the input is A. The resulting attack recovers the twelve unknown key bytes with about 232 eﬀort. D ⊕ Yi ) to be encrypted under K ∗ . we are able to learn the remaining twelve bytes of key.8 but with nonzero xor subkey diﬀerences in rounds 0 and 9. which must be identical for both keys. D ⊕ Yi ). This allows us to learn S. which are generated from the subkey generation S-boxes. Yi . V . with the changed bytes selected in such a way to leave S unchanged between the two keys.8. and about 211 adaptive chosen plaintexts. We can now learn the S-boxes in the g function. while remaining ignorant of the other twelve bytes of K and K ∗ . Structure of the Key Pair Based on the discussion above. such that we get a useful subkey diﬀerence sequence. 8. with some ﬁxed xor relationship. 2. Thus.2 Finding a Key Pair Subkey Diﬀerences We actually care only about the subkey diﬀerence in the round subkey words. and thus to break the cipher. With knowledge of the active twenty key bytes of the attack. where these values are derived from the Xi .4 Extracting the Key Material The ﬁrst step in the attack is ﬁnding a pair of keys. and 264 related plaintexts encrypted under K ∗ . V. 8. To do this. rather than just an accidental occurrence. We note that a random pair of keys chosen to have this property has only a 2−155 probability of being the pair of keys that will generate our desired subkey diﬀerence sequence. K. Under K. C. We expect one of these to be a right pair. B. and to change all four key bytes used to deﬁne ﬁve of the S-boxes used for subkey generation. For each of the 232 diﬀerent 64-bit xor difference values. which we can recognize. This causes one of 256 possible xor differences in the g function output. but we do know U = U ⊕ U ∗ and V = V ⊕ V ∗ . while leaving the S key unchanged. V and U ∗ . The attacker would then attempt to learn those remaining twelve bytes of key. We don’t know any of U. (The only diﬀerence in the output is caused by the diﬀerence in round subkeys. V ∗ . request the encryption of 128-bit block A. We described above how such keys can exist. K ∗ diﬀer in twenty bytes. 3. which means it will give the same result for its left ciphertext half under K ∗ as the original plaintext block did under key K. it requires about 232 chosen plaintexts. C ⊕ Xi .8. We now vary the plaintexts requested under K ∗ . Here. Yi values that generated the original right pair based on the possible xor diﬀerences in g. we request the encryption of Aj . though an attacker who is able to control the active twenty bytes of key involved in a pair of keys could mount the attack for a chosen pair of keys. K ∗ . we expect to be a right pair. V ∗ . B. Recall that the subkey words are generated from a pair of 32-bit words. by the fact that the right half of .The attack requires two keys chosen to have the desired relation. V . Request the encryption of some ﬁxed 128-bit block A. we know the single xor diﬀerence from the F function. One of these. (That is.) 49 At this point. C. we assume the existence of some pair of keys with a subkey xor diﬀerence sequence zero in rounds 1. B.) 8. if we have more than one such ciphertext. Consider only those ciphertexts from the second step that lead to the same right half of ciphertext as the encryption under K in the ﬁrst step. U ∗ . B. Xi . Yi . D under K ∗ . We may need to request one more encryption under both K and K ∗ . We request another plaintext from K and K ∗ to verify that this was really a right pair. it requires about 1024 chosen plaintexts. under the two keys. and knowledge of S. D under key K. Under K ∗ . We thus request 512 plaintexts of the form Aj . C ⊕ Xi .8. to isolate and learn the diﬀerential properties of each S-box in g. we replace the high-order byte of A in the right pair with 256 diﬀerent values..

BS97] can be used to successfully cryptanalyze this cipher. due to the structure of the RS matrix used to derive S from the raw key bytes. we attempt to ﬁnd a pair of plaintexts. A 7-round diﬀerential attack requires 2131 eﬀort.10. B. B. s3. such that P ∗ is the result of encrypting P with one round. Overview of the Attack In a slide attack. even when the cipher has a 128-bit key. NSA refers to this particular side channel as “TEMPEST. We use this pair to derive four relations on g and thus determine the speciﬁc S-boxes used in g. so a related-key attack is much easier against a known S-box variant. C ⊕ Xi . and a 128-bit key. Additionally. 8. NMR scanning. This information should be enough to brute-force this S-box. but since we never use more than 128 bits of key in the key-dependent S-boxes. power consumption (including diﬀerential power analysis [Koc98]). and requires only 267 eﬀort.10 8. this means we know S.10. More importantly. We develop a slide attack on Twoﬁsh with any number of rounds.1 Attacking Simpliﬁed Twoﬁsh Twoﬁsh with Known S-boxes As discussed above. Side-channel cryptanalysis [KSWH98b] uses information about the cipher in addition to the plaintext or ciphertext.the ciphertext is the same for Aj . This allows us to guess and check the remaining key bytes three bytes at a time. 2. and electronic emanations. such that P ∗ has the same value as the intermediate value after one or more rounds of encrypting P . In particular. 21 The We attack a Twoﬁsh variant without any subkeys. we already know twenty of the thirty-two key bytes used. C. much of Twoﬁsh’s related-key attack resistance comes from the derivation of the S-boxes. Examples include timing [Koc96]. it breaks the Twoﬁsh variant with about 236 eﬀort.21 With many algorithms it is possible to reconstruct the key from these side channels. the eﬀective key size is 128 bits. Again. That is.2 Twoﬁsh Without Round Subkeys 8. However. We must ﬁrst ﬁnd a pair of plaintexts. thus whose whole key material is in the keydependent S-boxes. P ∗ ).” 50 . When we know these S-boxes.) It is obvious that a Twoﬁsh variant with ﬁxed Sboxes and no subkeys would be insecure—there would be no key material injected. D ⊕ Yi encrypted under K ∗ . and for Aj . and expect one to be the best ﬁt for our data. we do have these comments to make on the design. After carrying this out for each of the Aj . (The Twoﬁsh key would be 256 bits. and hence not a major concern in this design. We repeat the attack for each of the other bytes in A. (P. thus recovering the entire key. we gain direct access to the results of a single round of the cipher. we know all but three bytes used to derive each four byte “row” of values in S. The attack requires about 233 chosen plaintexts. we note that Twoﬁsh executes in constant time on most processors.9 Side-Channel Cryptanalysis and Fault Analysis Resistance to these attacks was not part of the AES criteria. The resistance to fault analysis of any block cipher can be improved using classical fault tolerance techniques. P ∗ ). D encrypted under K. If we can ﬁnd and expose this value. While total resistance to side-channel cryptanalysis is probably impossible.) Note that this attack would not work if we used round subkeys that were simply counter values. our diﬀerential attack on Twoﬁsh with ﬁxed S-boxes works for 6-round Twoﬁsh. as would happen if we used the identity permutation for all the key-scheduling S-boxes. Our attack has two parts: 1. we have good information about the xor diﬀerences being generated from g by the changes between A and Aj as input. we believe that total resistance to fault analysis is an impossible design constraint for a cipher. Note that at the beginning of the attack. (P. we will note the fourteen cases where we probably got 8-bit Hamming weights in the output xor diﬀerence of g. so that we recover all four S-boxes in g. This yields the eﬀective key of the cipher. we try all 232 possible S-boxes for s3. We are not currently aware of any attack on such a Twoﬁsh variant. we then attack that single round to recover the key. 8. (Note that this is the amount of key material used to deﬁne the S-boxes in normal Twoﬁsh with a 256-bit key. Fault analysis [BDL97.

iterative 2-round diﬀerential characteristics cannot exist.6 of all 264 possible outputs. we also have two relations on F : F (L0 ) = R0 ⊕ L1 and F (LN ) = RN ⊕ LN +1 . however. empirically E m ≈ 11.374 × 256 of all possible 8-bit values can appear as the output of S.. that we do not yet have direct g values. RN ) from the ﬁrst encryption. by showing that a Twoﬁsh variant which uses nonbijective S-boxes is likely to be much easier to break. Our pair has given us two relations on F . then we have four bytes of output from each S-box.04. attack on Blowﬁsh took advantage of non-bijectivity in the Blowﬁsh round function [Vau96a]. As for the distribution of m. experiments suggest Pr(m ≥ 10/256) ≈ 0. It is easily seen that the expected size of the range of a random function on 8 bits (such as q0 or q1 ) is r1 = 1 − (1 − 1/256)256 ≈ 1 − e−1 ≈ 0. and thus four relations on g. By the birthday paradox. We can try all 232 possible key input values for each S-box. Therefore. That way. and (LN +1 . and encrypt it with 232 randomly selected R0 values as (L0 .3 ≈ 964 possible values. their 3-way composition into si is likely to be even more non-surjective than either of q0 or q1 on its own.10. Let p = ∆x=0 p∆x /255 be the average probability over all non-zero input diﬀerences. We use a trick by Biham [Bih94] to get such a pair of plaintexts with only 233 chosen plaintexts: we choose a ﬁxed L0 .469.Finding Our Related Pair of Texts Twoﬁsh is a Feistel cipher. One rationale was to ensure that the 64-bit round function be bijective. i. This is certainly a rather large certiﬁcational weakness. we expect that only about 96 = 0.0002. so avoiding them should help to reduce the risk of a successful diﬀerential attack.975.02. We have E p = 3/256. When we compose such a function twice. and the 64bit Feistel function can attain only about 252. In other words. Note. attacks based on non-surjective round functions [BB95. its expected range becomes r2 = 1−(1−1/256)256r1 ≈ 1−e−r1 ≈ 0. perhaps with a couple of alternative values. R0 ). and Pr(p ≥ 16/256) ≈ 0. we expect to see one pair of values for which R1 = R0 ⊕ F (L0 ). for most sets byte values. and very quickly recover the correct S-boxes. Pr(m ≥ 16/256) ≈ . ∆x → 0. we sort the ciphertexts from the ﬁrst batch of encryptions on their left halves. R0 . RP95b. RPD97. Pr(p ≥ 10/256) ≈ 0. and the expected size of the range of a 3-way composition will be r3 ≈ 1 − e−r2 ≈ 0. To test whether we have such a pair. when they do exist. only one or two S-boxes will match them. and let (LN . 8. 51 . Since this is the only key material in this Twoﬁsh variant.3 Twoﬁsh with Non-bijective S-boxes We decided early in the design process to use purely bijective S-boxes. they often result in the highest-probability multi-round characteristic. To mount this attack. Consider the probability p∆x that a given input diﬀerence ∆x yields a collision in the S-box output. we invert it to get back the actual S-box outputs for all four diﬀerent values. RN ) represent the resulting ciphertext. We thus learn each key-dependent S-box in g.22 We argue here that this was a good design decision. and the halves are swapped after each round. To ﬁnd this pair.78. and let m = max∆x=0 p∆x be the maximum probability over all non-zero input diﬀerences. the output of the h function can attain only about 226. R0 . and Pr(m ≥ 22/256) ≈ 0. we get (LN . and then combining those bytes with an MDS matrix multiply. (This gets worse when the key size grows. we have two instances of the results of combining two g outputs with a PHT. the attack is done. Extracting the S-boxes The g outputs are the result of applying four key-dependent S-boxes to the input bytes. L0 ). we can try encrypting (L0 . as (R1 . R0 ) represent a given pair of 64-bit halves input into the cipher. operating on two 64-bit halves in each block. and the ciphertexts from the second batch of encryptions on their right halves. LN ) from the second encryption. Let (L0 . Since the MDS multiply is invertible. we can simply undo the PHT. since the number of compositions of the q functions gets larger. If those values are all diﬀerent. Pr(p ≥ 2/256) ≈ 0.632. L1 ) such that L1 is the value of the left half of the block after the ﬁrst round. also. Also. yielding relations on g. and see which ones are consistent with the results. Extracting g Values Once we have this triple (L0 .e.374. Since we have the actual PHT output values. We then encrypt the same L0 with 232 random R1 values. When we ﬁnd a match. Observe that when q0 and q1 are non-bijective. CWSK98] are sure to fail when the 64-bit Feistel round function is bijective. the probability is good that we have a pair with the desired property. L1 ). Instead.7/256.) We point out a serious diﬀerential attack when using non-bijective S-boxes. we need to ﬁnd some (L0 . We use the representation where the left half of the block is the input to the Feistel function. When we have a triple with the desired properties.0002. L0 ). We try all possible alternatives against any plaintext/ciphertext values 22 Vaudenay’s we have available. R0 ) and (L1 .

there is no way to prove this. . Assume for a moment that. we did manage to put a trap door into Twoﬁsh. also. and the MDS matrix. This would imply one of two things: One. As diﬃcult as it is to create a trap door for a particular set of carefully constructed S-boxes. we feel that the use of key-dependent Sboxes makes it harder to install a trap door into the cipher. and we have made no eﬀort to build in a secret way of breaking Twoﬁsh. and not much reason for anyone to believe us. We cannot prove that this is not true. that we have invented a powerful new cryptanalytic attack and have carefully crafted Twoﬁsh to be resistant to all known attacks but vulnerable to this new one. called a “master-key cryptosystem. (Of course. and for a class of weak keys consisting of 0. In this paper. It would have to work even though there is almost perfect diﬀusion in each round. we expect its probability to be about p. any trap door would have to survive 16 rounds of Twoﬁsh. We can oﬀer some assurances. despite the diﬃculties listed above. for 97. Again. one can trade oﬀ the number of texts needed against the size of the weak key class. Also. In fact. Additionally. However. However. If we look at any 2-round diﬀerential characteristic whose ﬁrst round is the trivial characteristic and whose second round involves just one active S-box. (The latter probability can probably be improved to about 3p6 . However.02% of the keys.5 diﬀerent 2-round diﬀerential characteristics (each of the right form so it will have probability about p) to ﬁnd a 13-round characteristic with expected probability p6 .) Next we consider a Twoﬁsh variant with nonbijective S-boxes. for 4% of the keys. q1 . it has been shown that this type of cipher. trap doors would be no diﬀerent. we will omit these considerations. The other possibility is that we have embedded a trap door into the Twoﬁsh magic constants and then transformed them by some means so that ﬁnding them would be a statistical impossibility (see [Har96] for some discussion of this possibility). This analysis clearly shows the value of bijective Sboxes in stopping diﬀerential cryptanalysis. we can ﬁnd a diﬀerential attack that succeeds with about 239 chosen plaintexts for the majority (about 78%) of the keys. making sure Twoﬁsh is immune so that we can profitably attack the other AES submissions. one can break the variant with about 228 chosen plaintexts. We obtain a 2-round iterative diﬀerential characteristic with probability m. Moreover. These design elements have long been known to make any patterns diﬃcult to detect. we cannot prove that this is not true. but with all rotations left intact. in great detail. we would probably publish it along with this paper. this variant can be broken with 228 chosen plaintexts. the ﬁgures given are just examples of points on a smooth curve. we have made headway towards a philosophical proof. We have explained. There are no mysterious design elements. we can certainly piece together 6. due to the extra degrees of freedom. None of this constitutes a proof. for 2% of the keys. Again. The resulting construction would seem immune to current cryptanalytic techniques. as cryptographers we 52 9 Trap Doors in Twoﬁsh We assert that Twoﬁsh has no trap doors. the variant cipher can be broken with roughly 221 chosen plaintexts. it can be broken with 224 chosen plaintexts. it helps motivate the beneﬁt of the rotations: they make short iterative characteristic much harder to ﬁnd. It would have to survive the pre. However. and for a class of weak keys consisting of 0. how we chose Twoﬁsh’s “magic constants”: the RS code. but as we are only doing a back-of-the-envelope estimate anyway. everything has an explicit purpose. Rather than outlining the proof here.” is equivalent to a public-key cryptosystem [BFL96].) Thus. we would likely skip the AES competition and go collect our Fields Medal. thereby conferring (we hope) some additional resistance against diﬀerential attacks. We ﬁnd that. but we as designers would know a secret transformation rule that we could apply to facilitate cryptanalysis. q0 . As designers. and thus a 13-round diﬀerential characteristic with probability m6 . Any reasonable proof of general security for a block cipher would also prove P = NP.5% of the keys. However. we have outlined all of the design elements of Twoﬁsh and our justiﬁcations for including them. One difﬁculty is that the rotations prevent us from ﬁnding an iterative 2-round characteristic. we have made every eﬀort to make Twoﬁsh secure against all known (and unknown) cryptanalyses.02% of the keys. the variant cipher can be broken with roughly 224 chosen plaintexts. it is much harder to create one that works with all possible S-boxes or even a reasonably useful subset of them (of relative size 2−20 or so). we can point out that as cryptographers we would achieve much more fame and glory by publishing our powerful new cryptanalytic attack.Consider a Twoﬁsh variant with non-bijective Sboxes and no rotations.and postwhitening.

Even the best attack against DES (a linear-cryptanalysis-like attack combining quadratic approximations and a multipleapproximation method) requires just under 243 plaintext/ciphertext blocks [SK98]. For example. any use of Twoﬁsh in a DaviesMeyer construction must ensure that only a single key length is used. There is a large gap between a weakness that is exploitable in theory and one that is exploitable in practice. Again. (See [Sch96] for a detailed comparison of the various modes of operation. As keys that are non-standard sizes have equivalent keys that are longer. Its speed is 65 clock cycles per byte of encryption. counter [NBS80]. And we would be foolish to even try. We believe that Twoﬁsh can be used securely in any of these formats. it is useful to compare ciphers of equal speed. and so on [Plu94. however. This means that it is impossible to categorically state that a given cipher construction is insecure: there might always be a number of rounds n for which the cipher is still secure.1 Using Twoﬁsh Chaining Modes All standard block-cipher chaining modes work with Twoﬁsh: CBC. given the quality of the public cryptanalytic research community. the 128-bit block size makes straightforward use of the Davies-Meyer construction useful only when collision-ﬁnding attacks can be expected to be unable to try 264 trial hashes to ﬁnd a collision. Gut98. 11.” After all. the user of the cipher actually has to choose how many rounds to use.g.) A cryptanalyst considering OFB-.23 There are ﬁfteen other variants [Pre93]. PRB98]. which makes it less desirable than faster. In a performancedriven model. SAFER-K64 [Mas94]. and Speed [Zhe97]. And the resulting algorithm’s dual capabilities as both a symmetric and public-key algorithm would make it far more ﬂexible than the AES competition. protocol nonces. alternatives. Table 9 gives performance metrics for block and stream ciphers on the Pentium processor.3 Message Authentication Codes Any one-way hash function can be used to build a message authentication code using existing techniques [BCK96]. KSWH98c]. KSWH98a. or CBC-mode encryption with Twoﬁsh may collapse the pre. this is not a useful engineering deﬁnition of “secure.4 Pseudo-Random Number Generators Twoﬁsh can also be used as a primitive in a pseudo-random number generator suitable for generating session keys.would achieve far greater recognition by publishing a public-key cryptosystem that is not dependent on factoring [RSA78] or the discrete logarithm problem [DH76.and post-xor of key material into a single xor. 11 11. it would be impossible to put a weakness of this magnitude into a block cipher and have it remain undetected through the AES process. We believe that. With that in mind. also secure. that the key schedule has been analyzed mainly for related-key attacks. we believe Twoﬁsh’s strong key schedule makes it very suitable for these constructions. OFB.2 One-Way Hash Functions 10 When is a Cipher Insecure? More and more recent ciphers are being deﬁned with a variable number of rounds: e. but does not appear to beneﬁt much from this. We are aware of no problems with using Twoﬁsh with any commonly used chaining mode. or compare ciphers of equal security and compare their speeds. A useful trap door would need to work with much less plaintext—a few thousand blocks—or it would have to reduce the eﬀective keyspace to something on the order of 272 . FEAL-32 is secure against both diﬀerential and linear attacks. and compare their security.. 23 These metrics are based on theoretical analyses of the algorithms and actual hand-tooled assembly-language implementations [SW97. ElG85. NIST94]. Additionally. which is equivalent to about 64 terabytes of plaintext/ciphertext encrypted under a single key. 11. CFB-. The most common way of using a block cipher as a hash function is a Davies-Meyer construction [Win84]: Hi = Hi−1 ⊕ EMi (Hi−1 ) 11. 53 . RC5. note. not for the class of chosen-key attack that hash functions must resist. However. CFB. while this might theoretically be true. public-key parameters.

Algorithm Twoﬁsh Blowﬁsh Square RC5-32/16 CAST-128 DES Serpent SAFER (S)K-128 FEAL-32 IDEA Triple-DES

Key Length variable variable 128 variable 128 56 128, 192, 256 128 64, 128 128 112

Width (bits) 128 64 128 64 64 64 128 64 64 64 64

Rounds 16 16 8 32 16 16 32 8 32 8 48

Cycles 8 8 8 16 8 8 32 8 16 8 24

Clocks/Byte 18.1 19.8 20.3 24.8 29.5 43 45 52 65 74 116

Table 9: Performance of diﬀerent block ciphers (on a Pentium)

11.5

Larger Keys

Even though it would be straightforward to extend the Twoﬁsh key schedule scheme to larger key sizes, there is currently no deﬁnition of Twoﬁsh for key lengths greater than 256 bits. We urge caution in trying to extend the key length; our experience with Twoﬁsh has taught us that extending the key length can have important security implications.

cipher. A family key is a way of designing this into the algorithm: each diﬀerent family key is used to deﬁne a diﬀerent variant of the cipher. In some sense, the family key is like an additional key to the cipher, but in general, it is acceptable for the family key to be very computationally expensive to change. We would expect nearly all Twoﬁsh implementations that used any family key to use only one family key. Our goals for the family key algorithm are as follows: • No family key variant should be substantially weaker than the original cipher. • Related-key attacks between diﬀerent unknown but related family keys, or between a known family key and the original cipher, should be hard to mount. • The family key should not merely reorder the set of 128-by-128-bit permutations provided by the cipher; it should change that set. A Twoﬁsh family key is simply a 256-bit random Twoﬁsh key. This key, F K, is used to derive several blocks of bits, as follows: 1. Preprocessing Step: This step is done once per family key. (a) (b) (c) (d) (e) T0 T1 T2 T3 T4 = F K. = (EF K (0), EF K (1)) ⊕ F K. = EF K (2). = EF K (3). = First 8 bytes of EF K (4).

11.6

Additional Block Sizes

There is no deﬁnition of Twoﬁsh for block lengths other than 128 bits. While it may be theoretically possible to extend the construction to larger block sizes, we have not evaluated these constructions at all. We urge caution in trying to extend the block size; many of the constructions we use may not scale well to 256 bits, 512 bits, or larger blocks.

11.7

More or Fewer Rounds

Twoﬁsh is deﬁned to have 16 rounds. We designed the key schedule to allow natural extensions to more or fewer rounds if and when required. We strongly advise against reducing the number of rounds. We believe it is safe to increase the number of rounds, although we see no need to do so.

11.8

Family Key Variant: TwoﬁshFK

We often see systems that use a proprietary variant of an existing cipher, altered in some hopefully security-neutral way to prevent that system from interoperating with standard implementations of the 54

Note that, using our little-endian convention, the small integers used as plaintext inputs should occur in the ﬁrst plaintext byte.

2. Key Scheduling Step: This step is done once per key used under this family key. (a) Before subkey generation, T0 is xored into the key, using as many of the leading bytes of T0 as necessary. (b) After subkey generation, T2 is xored into the pre-xor subkeys, T3 is xored into the post-xor subkeys, and T4 is xored into each round’s 64-bit subkey block. (c) Before the cipher’s S-boxes are derived, T1 is xored into the key. Once again, we use as many of the leading bytes of T1 as we need. Each byte of the key is then passed through the byte permutation q0 , and the result is passed through the RS matrix to get S. Note that the deﬁnition of T1 means that the eﬀect of the ﬁrst xor of F K into the key is undone. Note the properties of the alterations made by any family key: • The keyspace is simply permuted for the initial subkey generation. • The subkeys are altered in a simple way. However, there is strong evidence that this alteration cannot occur by changing keys, based on the same diﬀerence sequence analysis used in discussing related-key attacks on Twoﬁsh. • The S-boxes used in the cipher are altered in a powerful way, but one which does not alter the basic required properties of the key schedule. Getting no changes in the S-boxes used in the cipher still requires changing at least ﬁve bytes of key, and those ﬁve bytes of key must change ﬁve of the S-boxes used for subkey generation. 11.8.1 Analysis

then have a constant 64-bit value xored in. Relatedkey attacks of the kind we have been able to consider are made slightly harder, rather than easier, by this 64-bit value. The new constants xored into the preand post-xor subkeys simply permute the input and output space of the cipher in a very simple way. Related Keys Across Family Keys Relatedkey attacks under the same family key appear, as we said above, to be at least as hard as relatedkey attacks in normal Twoﬁsh. There is still the question, however, of whether there are interesting related keys across diﬀerent family keys. It is hard to see how such related keys would be used, but the analysis may be worth pursuing anyway. An interesting question, from the perspective of an attacker, is whether there are pairs of keys that give identical subkeys except for the constant values xored into them, and that also give identical S-boxes. By allowing an attacker to put a constant xor into all round subkeys, such pairs of keys would provide a useful avenue of attack. This can be done by ﬁnding a pair of related family keys, F K, F K ∗ , which are identical in their ﬁrst 128 bits, and a pair of 128-bit cipher keys, M, M ∗ , such that M generates the same set of S-boxes with F K that M ∗ does with F K ∗ . For a random pair of M, M ∗ values, this has probability 2−64 of happening. Thus, an attacker given complete control over M, M ∗ and knowledge of F K, F K ∗ can ﬁnd such a pair of keys and family keys. However, this does not seem to translate into any kind of clean related-key attack; the attacker must actually choose the speciﬁc keys used. We do not consider this to be a valid attack against the system. In general, related-key attacks between family keys seem unrealistic, but one which also requires the attacker to be able to choose speciﬁc key values he is trying to recover is also pointless. A Valid Related-key Attack An attacker can also try to ﬁnd quadruples F K, F K ∗ , M, M ∗ such that the subkey generation and the S-box generation both get identical values. This requires that T0 ⊕ M = =

∗ T1 ⊕ M ∗ ∗ T0 ⊕ M ∗

Eﬀects of Family Keys on Cryptanalysis We are not aware of any substantial diﬀerence in the diﬃculty of cryptanalyzing the family key version of the cipher rather than the regular version. The cipher’s operations are unchanged by the family key; only subkey and S-box generation are changed. However, they are changed in simple ways; the Sboxes are generated in exactly the same way as before, but the key material provided to them is processed in a simple way ﬁrst; the round subkeys are generated in the same way as before (again, with the key material processed in a simple way ﬁrst), and 55

T1 ⊕ M

**If F K, F K ∗ have the property that
**

∗ ∗ T0 ⊕ T0 = T1 ⊕ T1 = δ

then related-key pairs M, M ⊕ δ will put a ﬁxed xor diﬀerence into every round subkey, and may allow

some kind of related-key attack. Again, we do not think this is an interesting attack; the attacker must force the family keys to be chosen this way, since the probability that any given pair of family keys will work this way is (for 128-bit cipher keys) 2−128 . We do not expect such relationships to occur by chance until about 264 family keys are in use.

ibility, so implementers would have a choice of how much key pre-processsing to do depending on the amount of plaintext to be encrypted. And we tried to maintain these tradeoﬀs for 32-bit microprocessors, 8-bit microprocessors, and custom hardware. Since one goal was to be able to keep the complete design in our heads, any complication that did not have a clear purpose was deleted. Additional complications we chose not to introduce were keydependent MDS matrices or round-dependent variations on the byte ordering and the PHT. We also toyed with the idea of swapping 32-bit words within the Feistel halves (something we called the “twist”), but abandoned it because we saw no need for the additional complexity. We did keep in the one-bit rotations of the target block, primarily to reduce the vulnerability to any attack based on byte boundaries. The particular manifestation of this one-bit rotation was due to a combination of performance and cryptanalytic concerns. We considered using all the key bytes, rather than just half, to deﬁne the key-dependent S-boxes. Unfortunately, this made the key setup time for highend machines unreasonably large, and also made encryption too slow on low-end machines. By dropping this to half the key bits, these performance ﬁgures were improved substantially. By carefully selecting how the key bytes are folded down to half their size before being used to generate the cipher’s S-boxes, we were able to ensure that pairs of keys with the same S-boxes would have very diﬀerent subkey sequences. The key schedule gave us the most trouble. We had to resist the temptation to build a cryptographic key schedule like the ones used by Blowﬁsh, Khufu, and SEAL, because low-end implementations needed to be able to function with minimal additional memory and, if necessary, to compute the subkeys as needed by the cipher. However, a simple key schedule can be an important weak point in a cipher design, leaving the whole cipher vulnerable to partial-key guessing attacks, related-key attacks, or to attacks based on waiting for especially weak keys to be selected by a user. Though our ﬁnal key schedule is rather complex, it is conceptually much simpler than many of our intermediate designs. Reusing many of the primitives of the cipher (each round’s subkeys are generated by a computation nearly identical to the one that goes on inside the round function, except for the speciﬁc inputs involved) made it possible to investigate properties of both the key schedule and of the cipher at the same time. 56

12

Historical Remarks

Twoﬁsh originated from an attempt to take the original Blowﬁsh design and modify it for a 128-bit block. We wanted to leverage the speed and good diﬀusion of Blowﬁsh, while also improving it where we could. We wanted the new cipher to have a bijective F function, a much more eﬃcient key schedule, and to be implementable in custom hardware and smart cards in addition to 32-bit processors (i.e., have smaller tables). And we wanted it to be even faster than Blowﬁsh (per byte encrypted), if possible. Initial thoughts were to have the Blowﬁsh round structure operate on the four 32-bit subblocks in a circular structure, but there were problems getting the diﬀusion to work in both the encryption and decryption directions. Having two parallel Blowﬁsh round functions and letting them interact via a two-dimensional Feistel structure ran into the same problems. Our solution was to have a single Feistel structure with two Blowﬁsh-like 32-bit round functions and to combine them using a PHT (an idea stolen from SAFER). This idea also provided nearly complete avalanche during the round. Round subkeys are required to avoid slide attacks against identical round functions. We used addition instead of xor to take advantage of the Pentium LEA opcode and implement them in eﬀectively zero time. We used 8-by-8-bit S-boxes and an MDS matrix (an idea stolen from Square, although Square uses a single ﬁxed S-box) instead of random 8-by-32-bit Sboxes, both to simplify the key schedule and ensure that the g function is bijective. This construction ensured that Twoﬁsh would be eﬃcient on 32-bit processors (by precomputing the S-boxes and MDS matrix into four 8-by-32-bit S-boxes) while still allowing it to be computed on the ﬂy in smart cards. And since our MDS matrix is only ever computed in one direction, we did not have to worry about the matrix’s eﬃciency in the reverse direction (which Square had to consider). The construction also gave us considerable performance ﬂexibility. We worked hard to keep this ﬂex-

We cryptanalyzed and cryptanalyzed and cryptanalyzed. And the attention to cryptographic detail in the design—both the encryption function and the key schedule—make it suitable as a codebook. The multiple layers of performance tradeoﬀs in the key schedule make it suitable for a variety of implementations. • Keys should be as short as possible. Speciﬁcally: • Whether the number of rounds can safely be reduced. It is not enough to design a strong round function and then to graft a strong key schedule onto it (unless you are satisﬁed with an ineﬃcient and inelegant construction. we found it easier to design and analyze Twoﬁsh with a 128-bit key than Twoﬁsh with a 192. they had to be strong. we learned several lessons about cipher design: • The encryption algorithm and key schedule must be designed in tandem. team from the beginning drastically changed the way we looked at design tradeoﬀs. We welcome any new cryptanalysis from the cryptographic community. This variant would have a faster key-setup time than the algorithm presented here—about 1200 clock cycles instead of 7800 clock cycles on a Pentium Pro—and the same encryption and decryption speeds. to the ultimate beneﬁt of Twoﬁsh. • Whether we can improve our lower bound on the complexity of a diﬀerential attack. subtle changes in one aﬀect the other. Throughout our design process. Having a code optimization guru on our . • Analysis can go on forever. • Whether there are alternative ﬁxed tables that increase security. If the submission deadline were not on 15 June 1998. outputfeedback and cipher-feedback stream cipher. Since these provide the primary non-linearity in the cipher. • There is no such thing as a key-dependent Sbox. only a complicated multi-stage nonlinear function that is implemented as a keydependent S-box for eﬃciency. It is eﬃcient on large microprocessors. oneway hash function. there’s no stopping. During the design process. smart cards. • Whether we can deﬁne a Twoﬁsh variant with ﬁxed S-boxes. like Blowﬁsh has). for applications where storing 512 bytes of ﬁxed tables was not possible. and pseudo-random number generator. We wanted to be able to construct them algebraically. we cryptanalyzed Twoﬁsh. it may be safe to reduce the number of rounds to 14 or even 12. We plan on continuing to evaluate Twoﬁsh all through the AES selection process. At this point our best non-relatedkey attack—a diﬀerential attack—can only break ﬁve rounds. Designing Twoﬁsh in this manner made it very hard to mount any statistical cryptanalytical attacks. In the event we ﬁnd better constants that make Twoﬁsh even harder to cryptanalyze. both must work together. Further research is required on what the ﬁxed S-boxes would look like. and tested the permutations against our required criteria. we may want to revise the algorithm. • Build a cipher with strong local encryption and let the round function handle the global diﬀusion. to meet our mathematical requirements. If no better attacks are found after a few years. We have chosen both the MDS matrix and the ﬁxed permutations. We’re still cryptanalyzing. • Consider performance at every stage of the design.We spent considerable time choosing q0 and q1 . and dedicated hardware. Pentium Pro. In the end. and how much data could be safely encrypted with this variant. Design and cryptanalysis go hand in hand—it is impossible to do one without the other—and it is only in the analysis that the strength of an algorithm can be demonstrated. we would still be cryptanalyzing and tweaking Twoﬁsh. 57 13 Conclusions Work and Further We have presented Twoﬁsh. and the results of our initial cryptanalysis.or 256-bit key. right up to the morning of the submission deadline. q0 and q1 . It is much harder to design an algorithm with a long key than an algorithm with a short key. Twelve-round Twoﬁsh can encrypt and decrypt data at about 250 clock cycles per block on a Pentium. the rationale behind its design. we built permutations from random parameters. We believe Twoﬁsh to be an ideal algorithm choice for AES. And ﬁnally. and Pentium II.

School of Computer Science. 222–238.” Jan 1996. 1994. pp. Blaze. and A.A. R. Adams. Lipton “On the Importance of Checking Cryptographic Protocols for Faults. “Two Practical and Provably Secure Block Ciphers: BEAR and LION.J. Biham. Additionally. and Randy Milbert. Italy. C. “Constructing Symmetric Ciphers Using the CAST Design Procedure.” Designs.” Fast Software Encryption. n. Jim Foti. 1996. 71–104. pp. and one of our most satisfying cryptographic projects to date. Weiner. and Ed Roback for putting up with a never-ending steam of questions and complaints about its details. pp. R.Developing Twoﬁsh was a richly rewarding experience. “On Matsui’s Linear Cryptanalysis. Leighton. 1997. Queens University.” Workshop on Selected Areas in Cryptography (SAC ’97) Workshop Record. 398– 412. “DES-80. pp. pp. C. Biham and A. [Bih94] [Bih95] [AT93] [Bih97] 58 . [BAK98] E. Springer-Verlag. Biham. Tavares. Peinado. Biryukov. pp. and H. Canetti. Springer-Verlag. n. This work has been funded by Counterpane Systems and Hi/fn Inc. Anderson. Rutgers University. “Keying Hash Functions for Message Authentication. Biryukov. Springer-Verlag. J. DeMillo. 113–120. Springer-Verlag. Biham and A. Master-Key Cryptosystems. 1–14. “Diﬀerential Cryptanalysis of Lucifer. Springer-Verlag. 1996. 14 Acknowledgments [BB93] The authors would like to thank Carl Ellison. “Akelarre: A New Block Cipher Algorithm. Springer-Verlag. “New Types of Cryptanalytic Attacks Using Related Keys. v. R.” Pragocrypt ’96 Proceedings. pp. 42–54. Anderson and E. T. 1996. Ben-Aroya and E.” Advances in Cryptology — EUROCRYPT ’94 Proceedings. C. M. E.” Advances in Cryptology — CRYPTO ’93 Proceedings. v. “Serpent: A New Block Cipher Proposal. Biham.” Advances in Cryptology — EUROCRYPT ’97 Proceedings. Montoya. DIMACS Technical Report 96-02. “Tiger: A Fast New Hash Function. Codes and Cryptography.3. Bellare. Bellovin. 5th International Workshop Proceedings. Karwczyk. 160–171.E. 1996. 187–199. W. 15–16 Feb 1993. Adams. R. 1996.” Fast Software Encryption.” Advances in Cryptology — ASIACRYPT ’94 Proceedings. Piscataway. F. who copyedited the (almost) ﬁnal version of the paper. 229–246. and Beth Friedman. “this looked like it might be interesting. Nov 1997. Third International Workshop Proceedings.” Fast Software Encryption. 7. E. 37–51. Carleton University. I. Boneh. and M. Schneier. SpringerVerlag. 4. Biham. 398–412. Feigenbaum. Springer-Verlag. Springer-Verlag. E. the authors would like to thank NIST for initiating the AES process. and Miles Smid. 1995. “Designing S-boxes for Ciphers Resistant to Diﬀerential Cryptanalysis. Thompson. pp. pp. 1995. 181–190.” Workshop on Selected Areas in Cryptography (SAC ’96) Workshop Record.” Advances in Cryptology — EUROCRYPT ’94 Proceedings. Rivest. 89–97.” sci. Biham. 4th International Workshop Proceedings. [BB96] [BCK96] [AB96b] [BDL97] [Ada97a] [Ada97b] [BDR+96] M. [BB94] [BB95] References [AB96a] R. 1997. Blaze. We look forward to the next phase of the AES selection process. B. E. Third International Workshop Proceedings.” Proceedings of the 3rd Symposium on State and Progress of Research in Cryptography. 1994. 1–15.M. Alvarez. Paul Kocher.12. Rome. and F. [BFL96] M. “How to Strengthen DES Using Existing Hardware. Adams and S. De la Guia. [Anon95] Anonymous. 1997. 1996. and R. pp. 461–467. “An Improvement of Davies’ Attack on DES.” Journal of Cryptology. pp. and L. 1994. “Minimal Key Lengths for Symmetric Ciphers to Provide Adequate Commercial Security. D. E. Knudsen. Springer-Verlag. Blumenthal and S.” Fast Software Encryption. Biham. pp. U. “A Better Key Schedule for DES-Like Ciphers. who read and commented on drafts of the paper. E. pp.” Advances in Cryptology — CRYPTO ’96 Proceedings. pp. pp. “A Fast New DES Implementation in Software. T. Biham. Anderson and E. 9 Aug 1995. pp. Diﬃe.crypt Usenet posting. D. R. Shimomura. 1998. 260–271. ´ [AGMP96] G.

“A New Approach to Block Cipher Design. Preneel. 85–99. and J. “The MacGufﬁn Block Cipher Algorithm. pp. Daemen. 97– 110. Diﬀerential Cryptanalysis of the Data Encryption Standard. 38. D. Bosselaers. pp.” Advances in Cryptology — EUROCRYPT ’93 Proceedings. 1991. E. pp. [Cla97] 59 . 5th International Workshop Proceedings. 1993. 545–563. 5th International Workshop Proceedings. “Design of LOK97. 1997. Coppersmith. SpringerVerlag. Biham and A.” Advances in Cryptology — AUSCRYPT ’90 Proceedings. “Joint Hardware/Software Design of a Fast Stream Cipher. Brown. Mar 95.. and Lucifer. Blaze and B. Brown. D. 1993.” Fast Software Encryption. Third International Workshop Proceedings. Springer-Verlag. pp. H. Canada. Clapp. Govaerts. Springer-Verlag. 3. “RIPEMD-160: A Strengthened Version of RIPEMD.” draft AES submission. 513–525. 80–89.” Fast Software Encryption.” Advances in Cryptology — CRYPTO ’90 Proceedings. Springer-Verlag. Dawson. Seberry. E. J.” Advances in Cryptology — CRYPTO ’91 Proceedings. “LOKI: A Cryptographic Primitive for Authentication and Secrecy Applications. Springer-Verlag. and J. “Key Schedules of Iterative Block Ciphers. Springer-Verlag. Shamir. 71–82. L. 156–171.” Fast Software Encryption. 1993. 1997. [BS93] [BS95] [DBP96] [BS97] [DC98] [CDN95] [DGV93] [CDN98] [DGV94a] J. Second International Workshop Proceedings. Nielsen. Springer-Verlag.” Ph. Springer-Verlag. Clapp.” Fast Software Encryption. Dawson. Clapp. 1998. May 1994. “Block Ciphers Based on Modular Arithmetic. “The REDOC-II Cryptosystem. Wood. Kushilevitz. Springer-Verlag. J. Carter. Daemen. [Dae95] J. Schneier. 158–172. “Fast Hashing and Stream Encryption with PANAMA. and L. “Optimizing a Fast Stream Cipher for VLIW. Dobbertin. Vandewalle. pp. and J. Vandewalle. [CM98] [Cop94] [Cop98] [CW91] [Bro98] [BS92] [CWSK98] D. Coppersmith.” Advances in Cryptology — ASIACRYPT ’91 Proceedings. R. Govaerts. Daemen.” Third Australian Conference. “JEROBOAM” Fast Software Encryption. “Weak Keys for IDEA. Biham and A. pp. pp.” Advances in Cryptology — CRYPTO ’97 Proceedings. SpringerVerlag. Springer-Verlag. and J. Springer-Verlag.” Proceedings of the 3rd Symposium on: State and Progress of Research in Cryptography. G. R. Biryukov and E. pp. 1998. Khafre. Pieprzyk.” Fast Software Encryption. R. “Diﬀerential Fault Analysis of Secret Key Cryptosystems. [BKPS93] L. Springer-Verlag. 1996. Brown. T.” Fast Software Encryption. Springer-Verlag. 36– 50. Wagner. Kwan. Springer-Verlag. Fondazione Ugo Bordoni. M. Daemen and C. pp. 229– 236. E. H. pp. “Diﬀerential Cryptanalysis of Snefru. 1994. Seberry. “Cryptanalysis of TWOPRIME. 1990. 18–32.” Proceedings of the Workshop on Selected Areas in Cryptography (SAC ’95). personal communication.D. Biham and A. 1998. 1995. pp. 4th International Workshop Proceedings. J. 273–287. Chabanne and E. to appear.K. “Improving Resistance to Diﬀerential Cryptanalysis and the Redesign of LOKI. 32–48. pp. n. A. “Cipher and Hash Function Design. pp.” IBM Journal of Research and Development.” Advances in Cryptology — EUROCRYPT ’98 Proceedings. B. [BPS90] L. D. 1994. 1998. Katholieke Universiteit Leuven.S. 1998. 243– 250. pp. and B. [DGV94b] J. LOKI. 60–74. “Improved Cryptanalysis of RC5. C. 49–59. and L. ACISP ’98. Cusick and M. Carter. Springer-Verlag. Kelsey. SpringerVerlag. SIMD.[BK98] A. E. pp. v. M. Coppersmith. Ottawa. Nielsen. and Superscalar Processors. pp. 1998. 5th International Workshop Proceedings. Pieprzyk. 1995.S. Shamir. “DESV: A Latin Square Variation of DES. and J. 1992. pp. and J. REDOC II. pp. thesis. G. Schneier. [Cla98] C. 5th International Workshop Proceedings.C. 159–167. E. Shamir. J. 1998.” Fast Software Encryption. Cambridge Security Workshop Proceedings. Michon.K. Govaerts. Daemen. 75–92. “The Data Encryption Standard (DES) and its Strength Against Attacks. Vandewalle.

” Proceedings of the 1st International Conference on the Theory and Applications of Cryptography.” Advances in Cryptology — EUROCRYPT ’98 Proceedings. B. “On the Security of the CAST Encryption Algorithm. Rijmen. pp. 15–23. “Cryptography and Computer Privacy. Hellman. [Haw98] P. Chauvaud. T. pp. “Bounds on Non-Uniformity Measures for Generalized Linear Cryptanalysis and Partitioning Cryptanalysis. K. Springer-Verlag. H. pp. 11. Heys and S.E. [HKSW98] C. pp. and V. 1545–1554. Dedham MA. 74–84. Ferguson. 7. 201–212. H. Notz.” Advances in Cryptology — EUROCRYPT ’95 Proceedings. 30–41. Kelsey. v. [DH79] [HKM95] [DK85] [DKR97] [HKR+98] C. pp.” Pragocrypt ’96 Proceedings. Machine Cryptography and Modern Cryptanalysis. personal communication.” Philadelphia PA. “Dynamic Swapping Schemes and Diﬀerential Cryptanalysis. 1997. IT-22.” Government Committee of the USSR for Standards.” Scientiﬁc American. 1995. [Kie96] [Har96] [KKT94] 60 . Diﬃe and M. Massey. [Gut98] P. Mar 1979. K. G. 1996. W. Tavares. 1994. v. 1996. Terada. E77A. 243–257. “A ChosenPlaintext Attack on the 16-Round Khufu Cryptosystem.” Fast Software Encryption. 1989. pp. Proceedings on the IEEE. J. 4th International Workshop Proceedings.L. Springer-Verlag.” Computer. Springer-Verlag. T. “A Generalization of Linear Cryptanalysis and the Applicability of Matsui’s Piling-up Lemma. n. Jeﬀerson et al. N. N. “Cryptanalysis of SPEED. and D. SpringerVerlag. Kramer. “Cryptanalysis of SPEED. Harpes. Nov 1976. and D.. Schneier. H. “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. 1996. Knudsen. Wagner. Harpes. 1975. 1997. 24–38. “Some Cryptographic Techniques for Machine-to-Machine Data Communications. 4th International Workshop Proceedings. Hall.” IEEE Transactions on Information Theory. Ferguson and B. 1994. 1998. Wagner. Jakobsen and C. Kelsey. and R. Carleton University. 1996. pp. Harpes and J. Schneier. Harpes. T. Schneier. n. and J. 112–126. v. 1998. 28–40. n. 4. “The Interpolation Attack on Block Ciphers. School of Computer Science. v. to appear. “Cryptographic Protection for Data Processing Systems. J. W. 1985. ETH Series on Information Processing. H. pp.” IEEE Transactions on Information Theory. Hartung-Gorre Verlang Konstanz. 359–368. “Cryptanalysis of Akelarre. 467–479. T. and J. 4th International Workshop Proceedings. pp.M. pp. 6.” Financial Cryptography ’98 Proceedings. Rijmen. pp. Artech House. L. B. Jakobsen and L. 1994. “New Directions in Cryptography. Kaneko. 1328–1336. Gosudarstvennyi Standard 28147-89. 1998. “Exhaustive Cryptanalysis of the NBS Data Encryption Standard. Springer-Verlag. pp. “Declaration of Independence. “Partitioning Cryptanalysis. “Software Generation of Random Numbers for Cryptographic Purposes. 13–27. “Diﬀerential-Linear Weak Key Classes of IDEA. Cryptanalysis of Iterated Block Ciphers. Springer-Verlag.A.” Fast Software Encryption. [ElG85] [Fei73] [HT94] [Fer96] [FNS75] [Jeﬀ+76] [FS97] [JH96] [GC94] [JK97] [GOST89] GOST. 149–165. Feistel. 3. 1997.[DH76] W. n. C. Hellman. Knudsen. Kiefer. pp. 10.A. 228. SpringerVerlag. Gutmann. May 1973.” Advances in Cryptology — CRYPTO ’94 Proceedings. 1985. 5. pp. J. pp. Massey. Gilbert and P. Koyama. Deavours and L. 469– 472.” Canadian Conference on Electrical and Computer Engineering. Daemen. 644–654. ElGamal. Feistel. pp. C.” Proceedings of the 1998 Usenix Security Symposium. 332– 335. CTU Publishing House. 1998.” Workshop on Selected Areas in Cryptography (SAC ’97) Workshop Record. Diﬃe and M. V. IT-31. C. Pragocrypt ’96. Hawkes.” IEICE Tranactions. v. Hall.” Fast Software Encryption. [HM97] C. pp. 63. “A New Design Concept for Building Secure Block Ciphers.” unpublished manuscript. v. T. n. 4 Jul 1776. “The Block Cipher Square. v. 1997. Smith. Kruh.

Springer-Verlag. “New Potentially ‘Weak’ Keys for DES and LOKI. Springer-Verlag. [KR95] [Knu93a] [KR96a] [Knu93b] [KR96] [Knu93c] [KR97] [Knu94a] [Knu94b] [Knu95a] [KRRR98] L.” Advances in Cryptology — ASIACRYPT ’91. Knudsen and V. GOST. S.D. Springer-Verlag. pp. Kelsey. [Knu95b] [Knu95c] [Koc96] [KSW97] [Koc98] [KPL93] [KSWH98a] J. 1998. 1995. Second International Workshop Proceedings. “Non-Linear Approximations in Linear Cryptanalysis. L.R. pp. pp. L. C. n. Knudsen. 1996. L. Kaliski Jr. K. L. Knudsen and M. Knudsen. School of Computer Science. v. Hall. and D. DSS. 1997. 282–291. Meier. and TEA.R.R. First International Workshop ISW ’97 Proceedings. 61 . S. Rijmen. pp. and D. Rijmen.” Fast Software Encryption. 252–267. B. “Secure Applications of Low-Entropy Keys. Knudsen. L.R. 1996. “Two Rights Sometimes Make a Wrong. and C.” Fast Software Encryption. “Securing DES S-boxes Against Three Robust Cryptanalysis. RSA. SpringerVerlag. Springer-Verlag. Korea.” Proceedings of the 1993 Japan-Korea Workshop on Information Security and Cryptography. 8. 26–39. personal communication. Aarhus University. Knudsen. and S. Lee. 104–113. “Block Ciphers — Analysis. 1995. Canada. 206–221. and M. 5th International Workshop Proceedings. “Practically Secure Feistel Ciphers.” Advances in Cryptology — CRYPTO ’96 Proceedings. and Triple-DES.” Advances in Cryptology — CRYPTO ’96 Proceedings. Hall. D. L. B. “Cryptanalysis of LOKI91. Springer-Verlag. [KR94] B. “Timing Attacks on Implementations of Diﬃe-Hellman. Kocher.” Workshop on Selected Areas in Cryptography (SAC ’97) Workshop Record. 1998. Rogaway. 22–35. Seoul. Schneier.” Advances in Cryptology — CRYPTO ’94 Proceedings. 1998. pp. B. Schneier. Robshaw.” Proceedings of the Workshop on Selected Areas in Cryptography (SAC ’95). 419–424.” European Transactions on Communication.” Advances in Cryptology — EUCROCRYPT ’96.” Information Security. [KM97] L. pp. 1994. J.. Kaliski Jr. “Truncated and Higher Order Diﬀerentials. Park. GDES.” Advances in Cryptology— CRYPTO ’95 Proceedings. Knudsen. Lee. 121–134. Kilian and P. Wagner. Park. P. CAST. L. Carleton University. pp. pp. Robshaw. and D. “Linear Cryptanalysis Using Multiple Approximations and FEAL. “On the Design and Security of RC2.” Information and Communications Security. Knudsen. 211–221. Schneier. Kocher. Cambridge Security Workshop Proceedings. pp. DES-X.” Fast Software Encryption.R. Kim. “Diﬀerential Cryptanalysis of RC5. Springer-Verlag. 213– 223.R. First International Conference Proceedings. RC2.. 1993. 224–236.” Advances in Cryptology — CRYPTO ’96 Proceedings. V. Knudsen and W.R. “Reconstruction of s2 DES S-Boxes and their Immunity to Diﬀerential Cryptanalysis. Knudsen. J. Springer-Verlag. pp. pp. pp. P. 203–207. pp. Wagner. “Iterative Characteristics of DES and s2 DES.” Advances in Cryptology — EUROCRYPT ’94 Proceedings. 249–264.R. “Cryptanalytic Attacks on Pseudorandom Number Generators. Ottawa.” Advances in Cryptology — CRYPTO ’92. 145– 157. B. Nov 1994. Springer-Verlag. L. 2nd International Workshop Proceedings.[KLPL95] K. Robshaw. 1994. pp.R.” Fast Software Encryption. “How to Protect DES Against Exhaustive Key Search. 1996. pp. 196–211. Robshaw. B. dissertation. “Cryptanalysis of LOKI. Knudsen.” Fast Software Encryption.” Advances in Cryptology — AUSCRYPT ’92. 1996. Kim. Applications. Lee. L. Springer-Verlag. Design. [KSHW98] J. Wagner. “A Key-Schedule Weakness in SAFER K-64. pp. pp. Schneier. “Linear Cryptanalysis Using Multiple Approximations. Springer-Verlag. Springer-Verlag. Rivest. 1998. 1995. pp. 1997. Springer-Verlag. and Other Systems. Biham-DES. Springer-Verlag. 1993. Springer-Verlag. “Related-Key Cryptanalysis of 3-WAY. 5th International Workshop Proceedings. 24–26 October 1993. SAFER. 168–188. [KSW96] J. Kelsey. 237–251. R. L. 497–511. S.R. 1995. and M. “Key-Schedule Cryptanalysis of IDEA. Kelsey. pp.” Ph. 274–286. and D. and M. 5. 1993. Kelsey. Springer-Verlag. Wagner. Knudsen. 1995. 445–454. 196–208. 1997.R. Springer-Verlag. NewDES. pp.

Murphy. Schneier. Madryga. “Side Channel Cryptanalysis of Product Ciphers. Springer-Verlag.” letter to NIST. T. “DiﬀerentialLinear Cryptanalysis. SpringerVerlag. MacWilliams and N. Lai. Lai. Wagner. pp. Springer-Verlag.” Fast Software Encryption. “Interpolation Attacks of the Block Cipher: SNAKE. and C. pp. S. pp. 1996. Queens University. 3. [KSWH98c] J. NBS FIPS PUB 46.” Workshop on Selected Areas in Cryptography (SAC ’96) Workshop Record. 2 Apr 97. 205–218.” Fast Software Encryption. 1991. “A Proposal for a New Block Encryption Standard. 1995. Lee and Y. D.” Advances in Cryptology—CRYPTO ’95 Proceedings. 69–82. pp. Springer-Verlag. pp. 5th International Workshop Proceedings.L. “The Theory of Error-Correcting Codes. D. 1998. M. 1994. Wagner. Schneier. 15–29. “Yarrow: A Pseudorandom Number Generator. pp. SpringerVerlag. 54–68. “Fast Software Encryption Functions. 2. 17–31. Springer-Verlag. McDermott.” National Bureau of Standards. “Higher Order Diﬀerential Attack of a CAST Cipher.” Fast Software Encryption. X.” Computer Security: A Global Challenge.” Advances in Cryptology — CRYPTO ’91 Proceedings. 1997.S.-H. v. pp.” ESORICS ’98 Proceedings. pp. pp. [Kwa97] M. F. M. Kluwer Academic Publishers. Hall. Kelsey. 1977. 386–397. 17–38. 1994.” Advances in Cryptology — CRYPTO ’90 Proceedings. Moriai. 389–404. [Mur90] S.” Journal of Cryptology. pp. T.” Fast Software Encryption. 557–570.” in preparation. pp.” Advances in Cryptology — EUROCRYPT ’93 Proceedings.J. Springer-Verlag. “Markov Ciphers and Diﬀerential Cryptanalysis. Matsui. Massey. “NSA comments on criteria for AES. Massey. “Higher Order Derivations and Diﬀerential Cryptanalysis. pp. 1997. Cha. 1996.” NorthHolland. M.-T.” Workshop on Selected Areas in Cryptography (SAC ’96) Workshop Record. Langford and M. 3–17. Shimoyama. [Mas94] J. Leech.[KSWH98b] J. 1991. and S. 3rd International Workshop Proceedings. Kwan. B. “Data Encryption Standard. 1998. “Practical S-Box Design. 4th International Workshop Proceedings. “A High Performance Encryption Algorithm. U. “On Correlation Between the Order of S-Boxes and the Strength of DES. “On Diﬀerential and Linear Cryptanalysis of the RC5 Encryption Algorithm. and T. 366–375.S. KIISC and ISEC Group of IEICE. 145–154. B. Sloane. Lai and J. Springer-Verlag. Kelsey. 1991. X. Matsui. [MSK98b] S. “Linear Cryptanalysis Method for DES Cipher. 476–501. 17–26. M. 1994.L. Hall. Adams. Springer-Verlag. “New Block Encryption Algorithm MISTY.” Advances in Cryptology — CRYPTO ’94 Proceedings. pp. Springer-Verlag. Springer-Verlag. C. 1996. SpringerVerlag. National Security Agency. X. 227–233. Department of Commerce. Springer-Verlag.E. [MA96] [Mad84] [NBS77] 62 . Moriai. Kaneko. 445–454. Yin. Jan 1977. 4th International Workshop Proceedings. 1984. Hellman.A. Cambridge Security Workshop Proceedings. Shimoyama. National Bureau of Standards. M. Massey. “SAFER K-64: A ByteOriented Block-Ciphering Algorithm. and C. to appear. B. “The Design of ICE Encryption Algorithm.” Proceedings of JW-ISC ’97.” unpublished manuscript. Merkle. J. Kaneko. pp.” Communications and Cryptography: Two Sides of One Tapestry. and T. pp. Matsui. [Mat94] [Mat95] [KY95] [Mat96] [Lai94] [Mat97] [LC97] [McD97] [Lee96] [Mer91] [LH94] [MS77] [LM91] [LMM91] [MSK98a] S. Elsevier Science Publishers. “CRISP: A Feistel Network with Hardened Key Scheduling. pp. T.J. “New Structure of Block Ciphers with Provable Security Against Diﬀerential and Linear Cryptanalysis. 61–76.C. pp. 1995. “The Block Cipher: SNAKE with Provable Resistance Against DC and LC Attacks.” Advances in Cryptology — EUROCRYPT ’90 Proceedings. R. n. Amsterdam.” Fast Software Encryption. S. 1990. Murphy. Mister and C. pp. 1997. Matsui. W.J. 1998. Kaliski and Y. “The Cryptanalysis of FEAL4 with 20 Chosen Plaintexts. Queens University. 1994. 1–17.” Advances in Cryptology — EUROCRYPT ’94 Proceedings.

V. 3rd International Workshop Proceedings.ps. Nyberg. P. Springer-Verlag. Springer-Verlag.S. 1994.” Journal of Cryptology. Department of Commerce. Dobbs Journal. pp. “Perfect Nonlinear S-boxes. O’Connor. National Institute of Standards and Technology. 378– 386. A Million Random Digits with 100. 117. “Announcing Development of a Federal Information Standard for Advanced Encryption Standard. Springer-Verlag. Rijmen. Preneel. J.” Advances in Cryptology — CRYPTO ’85 Proceedings. pp. pp. 439–444. pp. Springer-Verlag. “Announcing Request for Candidate Algorithm Nominations for the Advanced Encryption Standard (AES). Nyberg. A. O’Connor. 1. Springer-Verlag. J. Nyberg. L. 1991. Cryptanalysis and Design of Iterated Block Ciphers. 368–377.ucdavis.” Federal Register. A. K.S. pp. Oct 1997. pp. Rijmen. Preneel. K. “On the Construction of Highly Nonlinear Permutations. K.” U. “Fast Software Implementation of MISTY on Alpha Processors. thesis. J. “DES Modes of Operation.” Dr.S. 93–94. 12 Sep 1997. [Nyb93] [Nyb94] [RC97] [Nyb95] [Nyb96] [RDP+96] V. Dec 1980. 1955. 1993. “Truly Random Numbers.” Proceedings of JW-ISC ’97.” Advances in Cryptology — EUROCRYPT ’93 Proceedings. Analysis and Design of Cryptographic Hash Functions. Eds. 1995. Nyberg and L. 1994. pp. [NK95] K. Springer-Verlag.[NBS80] National Bureau of Standards. Coppersmith. KIISC and ISEC Group of IEICE. National Institute of Standards and Technologies. Ph. Cambridge Security Workshop Proceedings. O’Connor. Nyberg.” Advances in Cryptology — EUROCRYPT ’93 Proceedings.” Advances in Cryptology — EUROCRYPT ’91 Proceedings. Govaerts. Rijmen. 1996.edu/~rogaway/ papers/seal. K. pp. Springer-Verlag. 1994. Nyberg. Preneel. 8. “Secure Hash Standard. n. pp. IL. 48051–48058. Department of Commerce. pp. 62.” Federal Register. Glencoe. 99–111. [NIST93] [OCo94b] [NIST94] [OCo94c] [NIST97a] National Institute of Standards and Technology. pp. n. C. Coppersmith. Katholieke Universiteit Leuven. pp. SpringerVerlag. 1986. 360–370. 2 Jan 1997. Knudsen. 1. “Enumerating Nondegenerate Permutations. B. Lecture Notes in Computer Science. 56–63. Jan 1993. “On the Distribution of Characteristics in Bijective Mappings.” Advances in Cryptology — EUROCRYPT ’93 Proceedings. Matsui. R. Preneel. 1994. May 1994. “A Software-Optimized Encryption Algorithm. [Plu94] [Pre93] [PRB98] [NM97] [QDD86] [Nyb91] [RAND55] RAND Corporation.” National Bureau of Standards. B. 55–64. L. Ph. Bosselaers.cs. 403–412. “Diﬀerentially Uniform Mappings for Cryptography. 1994.D. 13. v.” Advances in Cryptology — CRYPTO ’93 Proceedings. “Digital Signature Standard. Nov 1994. v.” Advances in Cryptology — EUROCRYPT ’92 Proceedings.D. Quisquater. “Recent Developments in the Design of Conventional Cryptographic Algorithms. 55–63. Free Press Publishers. Vandewalle. Department of Commerce. 19.” State of the Art and Evolution of Computer Security and Industrial Cryptography. pp. to appear. [RC94] P. 1996. [OCo94a] L. Y. n.” Advances in Cryptology — EUROCRYPT ’94 Proceedings. n. NBS FIPS PUB 46. pp. [NIST97b] National Institute of Standards and Technology. Springer-Verlag.” Advances in Cryptology — ASIACRYPT ’96 Proceedings. 537–542. 1998. 1997. Rogaway and D. 91– 104. Springer-Verlag. 1995. Springer-Verlag. available at http://www. Nakajima and M. “Generalized Feistel Networks. Desmedt.” Fast Software Encryption. “The Cipher SHARK. NIST FIPS PUB 186.-J. Davio. 62. 63 .” Fast Software Encryption. “Provable Security Against Diﬀerential Cryptanalysis. “A Software-Optimized Encryption Algorithm.. 92–98. pp. “On the Distribution of Characteristics in Composite Permutations. and E. pp. B. Katholieke Universiteit Leuven. May 1993. U. K. “The Importance of ‘Good’ Key Scheduling Schemes. dissertation.” U.000 Normal Deviates. v. 27–37. Springer-Verlag. Rogaway and D. 1997. 113-115. “Linear Approximation of Block Ciphers. Bosselaers. Plumb. [Rij97] V.R. and M. DeWin.” full version of [RC94]. B. v.

Second International Workshop Proceedings. Jan 1985. 1997. Scott. pp. Sel¸uk. 121–144. SpringerVerlag. A. Springer-Verlag. Schneier. Shimizu and S.” Proceedings of the Workshop on Selected Areas in Cryptography (SAC ’95). “Fast Data Encipherment Algorithm FEAL. 21. Canada. DeWin. Stern and S.A. Springer-Verlag.” Cryptologia. v.L. ICICS ’97 Proceedings. 5th International Workshop Proceedings. Feb 1978. T. B. [Sco85] [Riv91] [Sel98] [Riv92] [Riv95] [SK96] [Riv97] [SK98] [Ros98] [SM88] [RP95a] [SMK98] [RP95b] [RPD97] [SV98] [RSA78] [SW97] [SAM97] [Vau95] [Sch94] [Vau96a] 64 .” Advances in Cryptology — EUROCRYPT ’87 Proceedings. 191–204. pp.” Fast Software Encryption. pp. pp. Adleman. 100–106. 9. v. Rijmen and B. 32–42. Moriai. 1. pp. “The RC5 Encryption Algorithm. “On Weaknesses of Non-surjective Round Functions. Miyaguchi. Rijmen and B. 253–266. R. “Quadratic Relation of S-box and Its Application to the Linear Attack of Full Round DES. “Cryptanalysis of MacGuﬃn.” Fast Software Encryption.” Communications of the ACM. 267–278. R.” Information and Communications Security. 1995. Kaneko. 1995. Springer-Verlag.” Fast Software Encryption. RACE. Preneel. Second Edition. 203–207. First International Workshop ISW ’97 Proceedings. Shimoyama. 1988. “New Results in Linear Cryptc analysis of RC5. “The MD5 Message Digest Algorithm. B. pp. Preneel and E. 3rd International Workshop Proceedings.” Third Australian Conference. “On Weaknesses of Non-surjective Round Functions. and L. Applied Cryptography. Amada. S. Rivest. A. 353–358. Schneier and J. Schneier and D. “Improving the Higher Order Diﬀerential Attack and Cryptanalysis of the KN Cipher. 1998. Vaudenay. 1998. Kaneko. 1991. SpringerVerlag. Whiting. Preneel. and S. June 1992.” Advances in Cryptology — CRYPTO ’98 Proceedings. SpringerVerlag. 1996. “On the Weak Keys in Blowﬁsh. “On the Need for Multipermutations: Cryptanalysis of MD4 and SAFER. “The MD4 Message Digest Algorithm. v. “A Description of the RC2(r) Encryption Algorithm. 64-Bit Block Cipher (Blowﬁsh). S.” Fast Software Encryption. Springer-Verlag. 1997. R. pp. R. pp. and Cryptography. Rivest. 2. Rose. 4th International Workshop Proceedings. First International Conference. 1998. “Description of a New VariableLength Key. “Wide Open Encryption Design Oﬀers Flexible Implementation. to appear. Vaudenay. Cambridge Security Workshop Proceedings. V. 27–32. Kelsey. 1994. 1995. pp.” Internet-Draft.” RFC 1321.” Advances in Cryptology — CRYPTO ’90 Proceedings. 3rd International Workshop Proceedings. Rivest. RIPE Integrity Primitives: Final Report of RACE Integrity Primitives Evaluation (R1040). 1998. R. pp. “A Stream Cipher Based on Linear Feedback over GF(28 ). 189–205. V. Rivest. S. ACISP ’98. 5th International Workshop Proceedings. and T. Schneier.L. B. J. Shamir. n. 86–96. pp.” Fast Software Encryption. n. Moriai. pp. G. T. Springer-Verlag. V. 1–16. pp. Springer-Verlag. “Unbalanced Feistel Networks and Block Cipher Design. work in progress.” Information Security. Springer-Verlag. S. “CS-Cipher.” Fast Software Encryption. Shimoyama and T. Codes. John Wiley & Sons. pp. “Improved Fast Software Implementation of Block Ciphers. R. June 1997. Rivest. Shimoyama. SpringerVerlag.” Fast Software Encryption. A. 3. 242–259. 1996. Second International Workshop Proceedings. Ottawa. “Fast Software Encryption: Designing Encryption Algorithms for Optimal Speed on the Intel Pentium Processor. [Sch96] B. Springer-Verlag.” Fast Software Encryption. 75–90. in preparation. pp.” Designs. n. T. SpringerVerlag. SpringerVerlag.” Fast Software Encryption. Apr 1992. B. pp.[RIPE92] Research and Development in Advanced Communication Technologies in Europe. 120–126. 12. Rijman. 286–297. 1996. “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. 1995. 2nd International Workshop Proceedings. pp. Springer-Verlag. 303–311. 1997. Vaudenay.L.

Imai. “TEA. “The SPEED Cipher. . 1996. . This particular implementation does not swap the halves but instead applies the F function alternately between the two halves. Springer-Verlag. 16–20. May 1994. 11. Y.” sci.crypt Usenet posting. 40–48. . 127–134. Carleton University.-A. pp. 71–89. 1993. Seberry. . 1997. . 205–209. “Eﬃcient DES Key Search. D.makeKey: . . .1 Twoﬁsh Test Vectors Intermediate Values [WSK97] [YLCY98] X. [ZG97] [Wag95a] [Wag95b] [WH87] F.M. ACM Press. R. SpringerVerlag. pp. D. 139–147. Wheeler. 1997. Tavares.” Advances in Cryptology — AUSCRYPT ’92 Proceedings. . pp. Winternitz and M. Lam.” Workshop on Selected Areas in Cryptography (SAC ’96) Workshop Record. . 132–147. . Carleton University. First International Workshop ISW ’97 Proceedings. 1. 203–207. [YMT97] A. .X. Zheng. 2nd International Workshop Proceedings. The following ﬁle shows the intermediate values of three Twoﬁsh computations. Schneier. R. and H. Needham. [Zhe97] [ZMI90] [Whe94] [Wie94] [ZPS93] [Win84] [WN95] A A. 209–220. You. personal communication. pp. and J. Yuval. SpringerVerlag. 83–104. v. “A Block-Ciphering Algorithm Based on Addition-Multiplication Structure in GF(2n ). J. “Reinventing the Travois: Encrytion/MAC in 30 ROM Bytes. Heys. pp. S. T. Y. n. Youssef. 1997. 1997.S. “An Experiment on DES Statistical Cryptanalysis. Springer-Verlag.” Fast Software Encryption. Wheeler and R. Hellman. Wagner. Wiener. .M. pp.M. 4th International Workshop Proceedings. Counterpane Systems [YTH96] [Yuv97] Input key 00000000 00000000 Subkeys 52C54DDE 7CAC9D4A B7B83A10 EE9C341F F98FFEF9 15A48310 424D89FE 311B834C 3302778F 7A6C6362 3411B994 84ADB1EA 54D2960F A6B8FF8C 6A748D1C 928EF78C 9949D6BE 07C07D68 1FE71844 F298311E 00000000 00000000 11F0626D 4D1B4AAA 1E7D0BEB CFE14BE4 9C5B3C17 342A4D81 C14724A7 FDE87320 26CD67B4 C2BAF60E D972C87F A7DEE434 A2F7CAA8 8014C425 EDBAF720 0338EE13 C8314176 ECAE7EA7 85C05C89 696EA672 --> --> --> S-box key 00000000 00000000 [Encrypt] Input whiten Output whiten Round subkeys 65 . and X. S. pp. D.” Advances in Cryptology — CRYPTO ’89 Proceedings. Zheng.Y. “A New Byte-Oriented Block Cipher. 1984. Jan 1987. SpringerVerlag. 461–480. pp.” Cryptologia.” Fast Software Encryption.txt" Electronic Codebook (ECB) Mode Intermediate Value Tests Algorithm Name: Principal Submitter: ========== KEYSIZE=128 KEY=00000000000000000000000000000000 .” Advances in Cryptology: Proceedings of Crypto 83. Wagner. Springer-Verlag. Wagner. S. K. “A New Class of Substitution-Permutation Networks. Guo. B. pp. Y.” Fast Software Encryption. Matsumoto.H.” 3rd ACM Conference on Computer and Communications Security.J. .” TR-244. M. 1995. Kelsey. .” Workshop on Selected Areas in Cryptography (SAC ’97) Workshop Record. . Tavares. 1998. G. “On the Construction of Block Ciphers Provably Secure and Not Relying on Any Unproved Hypotheses.” Financial Cryptography ’97 Proceedings.[Vau96b] S. Cheng. 1996. pp. Queens University. “HAVAL — A One-Way Hashing Algorithm with Variable Length of Output. and J. School of Computer Science. a Tiny Encryption Algorithm. “Cryptanalysis of S-1. . 145–159. 1990. D. 526–537. Youssef. SpringerVerlag. Cambridge Security Workshop Proceedings. Mister. and H. 1995. Plenum Press. . pp. 27 Aug 1995. . . pp. Zhu and B. . FILENAME: "ecb_ival. School of Computer Science.E.” Advances in Cryptology — CRYPTO ’97 Proceedings. Winternitz. Yi. Zheng. A. Vaudenay. School of Computer Science. 1997. Pieprzyk. 97–110. . pp.” Information Security. TWOFISH Bruce Schneier. “Chosen-key Attacks on a Block Cipher.” Workshop on Selected Areas in Cryptography (SAC ’97) Workshop Record. .E. Carleton University. . “A Bulk Data Encryption Algorithm. D. 1994. . “On the Design of Linear Transformations for Substitution Permutation Encryption Networks. and S. Springer-Verlag. “Cryptanalysis of the Cellular Message Encryption Algorithm. “Producing One-Way Hash Functions from DES.

t1=9044BF4B. BFF8215E. 2C51191D. 57C8DD1F 6A1AD61E . t1=472014C3. 12FEBF99. EE7515E6 F0D54DCD . t0=E8C880BC. t1=7F56EB43. t1=2D2F5FCE. t0=DAA00849. t1=9F09EB26. t1=B87B2DFD. CC2B3F88. t1=C12302BE. F8EA8B1B. 5E68BE8F. FB18EA0B 38BD43D3 . B250BEDA. 16750474. 4D1B4AAA. t0=E3C81108. C069BD9B. t0=09A1B597. 5415D5D3. t0=853A6BB2. t0=E66B9D19. 10325476 98BADCFE --> B255BC4B . 6A7FBC4E 0BF1830B . 20FA7CE8. FB72B4E1 2A43505C . t0=607AAEAD. t1=33D1ECEC. t1=318EACB4. 219BFEB4. 2C51191D. CC2B3F88. t0=C06D4949. t1=8B8678CC. t0=CEB0BBE1. t0=9D16BBB3.PT=00000000000000000000000000000000 Encrypt() R[16]: x= 17738CD3 R[17]: x= E5D2D1CF B5142D18 DF9CBEA9 5CC6CB7B B8131F50 62A2CE64. 04CA5304. t0=8C6DB451. t1=6ED2DBF9. t0=839B0017. t1=41B9BFC1. 99149A52. t1=8B8678CC. 7A0A91B6. t0=114C17C5. 65C2BD96. CT=CFD1D2E5A9BE9CDF501F13B892BD2248 R[-1]: R[ 0]: R[ 1]: R[ 2]: R[ 3]: R[ 4]: R[ 5]: R[ 6]: R[ 7]: R[ 8]: R[ 9]: R[10]: R[11]: R[12]: R[13]: R[14]: R[15]: R[16]: R[17]: x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= 00000000 52C54DDE 52C54DDE 55A538DE 55A538DE 2AE61A96 2AE61A96 0FFE0AD1 0FFE0AD1 A85CE579 A85CE579 013077D7 013077D7 64F0EAA1 64F0EAA1 B0681C46 B0681C46 C1708BA9 5C9F589F 00000000 11F0626D 11F0626D 5C5A4DB6 5C5A4DB6 84BC42D3 84BC42D3 D6B87B70 D6B87B70 DE2661CE DE2661CE B3528BA1 B3528BA1 DA27090C DA27090C 606D0273 606D0273 9522A3CE 322C12F6 00000000 7CAC9D4A C38DCAA4 C38DCAA4 899063BD 899063BD F14F2618 F14F2618 CD0D38A1 CD0D38A1 7A39754C 7A39754C D57933FD D57933FD F64F1005 F64F1005 EB27628F EB27628F 2FECBFB6 00000000. t0=60DAC1A4. 4822BD92. 7A0A91B6. t0=90AB32AA. t1=989D1459. t1=318EACB4. t1=64AD7A3F. 5E68BE8F. C069BD9B. t1=2F7D5E04. 821B5F36. 62A2CE64. t0=A0AA2F81. t1=0B8B72BA. 893E49A9. t0=607AAEAD. 973ABD2A. t1=41B9BFC1. t1=19C23B0A. BFF8215E. B6315B66 B27C05AF . C069BD9B. PT=00000000000000000000000000000000 R[17]: R[16]: R[15]: R[14]: R[13]: R[12]: R[11]: R[10]: R[ 9]: R[ 8]: R[ 7]: R[ 6]: R[ 5]: R[ 4]: R[ 3]: R[ 2]: R[ 1]: R[ 0]: R[-1]: x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= 5C9F589F C1708BA9 B0681C46 B0681C46 64F0EAA1 64F0EAA1 013077D7 013077D7 A85CE579 A85CE579 0FFE0AD1 0FFE0AD1 2AE61A96 2AE61A96 55A538DE 55A538DE 52C54DDE 52C54DDE 00000000 322C12F6 9522A3CE 606D0273 606D0273 DA27090C DA27090C B3528BA1 B3528BA1 DE2661CE DE2661CE D6B87B70 D6B87B70 84BC42D3 84BC42D3 5C5A4DB6 5C5A4DB6 11F0626D 11F0626D 00000000 2FECBFB6 EB27628F EB27628F F64F1005 F64F1005 D57933FD D57933FD 7A39754C 7A39754C CD0D38A1 CD0D38A1 F14F2618 F14F2618 899063BD 899063BD C38DCAA4 C38DCAA4 7CAC9D4A 00000000 5AC3E82A. 4016E346 8D1C0DD4 . t1=FFF30DB7. FFEEDDCC BBAA9988 --> 8E4447F7 . t0=8C6DB451. 5E68BE8F. 12FEBF99. B250BEDA. 62A2CE64. t0=8C8927A1. t1=F6BC546D. t1=F1687F43. t0=877AF61D. t1=77BD388E. t0=677DA87D. 72427BEA 911CC0B8 . t0=C615F1F6. 99149A52. t1=F35A3092. t1=33D1ECEC. 00000000. t1=9044BF4B. F8EA8B1B. 12FEBF99. t1=62585CF7. D45666FB. t0=A0AA2F81. t0=99C9694E. t0=877AF61D. t1=2D84C23D. 821B5F36. C3479695 9B6958FB Round subkeys . 2C51191D. 714AD667 653AD51F . t0=52971E00. t1=EB143CFF. A7489D55 6E44B6E8 . 2C51191D. t1=2D2F5FCE. AF609383 FD36908A . E67C030F. t0=90AB32AA. B250BEDA. t0=5FE8370B. t1=0EB8FA83. 973ABD2A. 61B5E0FB D78D9730 . t1=84878F62. t0=E3C81108. 04CA5304. t1=C12302BE. Subkeys . t0=B33C25D6. 2D7DC789 45B7BD3A . t0=60DAC1A4. 7A0A91B6. F8EA8B1B. A23B7169. t1=F6BC546D. 12FEBF99. t0=58554EDB. t0=C615F1F6. t0=4D8B7489. t0=A7D24F8E. 893E49A9. t0=93690387. t0=067D0B49. 5EC769BF 44D13C60 Input whiten . t0=D5397903. t1=62585CF7. EFCDAB89 67452301 --> B89FF6F2 . 99149A52. t1=67A58299. t1=B3A89DB0. EFCDAB89 67452301 --> B89FF6F2 . t1=6FC0BBF3. t1=F301CE95. t0=0BB41F61. 26AE7271 C2900D79 . 3A9247F7 9A3331DD . 10325476 98BADCFE --> B255BC4B . t0=9357B338. t0=54687CDF. t1=78D2617B. 973ABD2A. t0=52971E00. t1=4B61EEC7. t0=E8C880BC. 76191781 37A9A0D3 . A23B7169. EFB2F051. t1=989D1459. t0=988C8223. t0=C1176720. t0=9357B338. 65C2BD96. t1=E4D3D68D. t0=77FC927F. PT=00000000000000000000000000000000 Encrypt() R[-1]: R[ 0]: R[ 1]: R[ 2]: R[ 3]: R[ 4]: R[ 5]: R[ 6]: R[ 7]: R[ 8]: R[ 9]: R[10]: R[11]: R[12]: R[13]: R[14]: R[15]: x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= 00000000 38394A24 38394A24 C8F5099F C8F5099F 07633866 07633866 A042B99D A042B99D F7B097FA F7B097FA A279C718 A279C718 E4409C22 E4409C22 8561A604 8561A604 00000000 C36D1175 C36D1175 0C4B8F53 0C4B8F53 59421079 59421079 709EF54B 709EF54B 9E5C4FF7 9E5C4FF7 421A8D38 421A8D38 702548A2 702548A2 825D2480 825D2480 00000000 E802528F 9C263D67 9C263D67 69948F5E 69948F5E C015BE79 C015BE79 0CD39FA6 0CD39FA6 77FC8B29 77FC8B29 5B1A0904 5B1A0904 5DDAA2A1 5DDAA2A1 5CC6CB7B 00000000. t1=0B8B72BA. B9141AB4 BD3E70CD Output whiten . C069BD9B. 99149A52. t0=114C17C5. t1=B3A89DB0. 149B9CEC. . t1=4AC0BD46. t1=F3D5AB78. t1=F1687F43. t1=18041948. t0=EE03FB5B. B6363E89 494D9855 . E802528F 219BFEB4 . ED323794 3D3FFD80 . t1=6FC0BBF3. 38394A24 C36D1175 Input whiten . BFD34176 5C133D12 . PT=00000000000000000000000000000000 Encrypt() R[-1]: R[ 0]: R[ 1]: R[ 2]: R[ 3]: R[ 4]: R[ 5]: R[ 6]: R[ 7]: R[ 8]: R[ 9]: R[10]: R[11]: R[12]: R[13]: R[14]: R[15]: R[16]: R[17]: x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= 00000000 5EC769BF 5EC769BF 99424DFF 99424DFF 2C125DD7 2C125DD7 D5178F25 D5178F25 FF92E109 FF92E109 BB79AD2E BB79AD2E 4A6BBAFF 4A6BBAFF CBD3C29D CBD3C29D 85411C2B E07B5237 00000000 44D13C60 44D13C60 FBC14BFC FBC14BFC 5A526278 5A526278 00D35CC5 00D35CC5 DF621C97 DF621C97 AA410F41 AA410F41 439F4766 439F4766 BC31FEBE BC31FEBE 7777DC05 B8342305 00000000 76CD39B1 D38B6C9F D38B6C9F 698BE047 698BE047 E35CD910 E35CD910 D8447F91 D8447F91 28BFEFF5 28BFEFF5 6576A3ED 6576A3ED F7186836 F7186836 D4E77B7C D4E77B7C CAFC0C9F 00000000. t1=F3D5AB78. 973ABD2A. CC2B3F88. t0=9F3BEC03. t1=AC9926BF. t1=A1EAEAAE. t0=5CF5C93C. t0=54687CDF. t0=5FE8370B. t0=93690387. t0=8A317EF8. 219BFEB4. BFF8215E. 57F8A163 2BEFDA69 . t0=5D174956. 149B9CEC. [Encrypt] t0=29C0736C. 00000000. t1=FB5A051C. t1=83068533. t1=FB5A051C. t0=8F8307AA. 77665544 33221100 --> 45661061 . t1=6ED2DBF9. t0=677DA87D. t0=C1176720. 7A0A91B6. t0=7C4536B9. 5E68BE8F. 5415D5D3. t1=2D84C23D. 65C2BD96. t1=84878F62. E69EA8D1 ED99BDFF . t1=C1033512. 893E49A9. t0=8A317EF8. t1=7F56EB43. t0=9F3BEC03. 149B9CEC.makeKey: Input key --> S-box key . D45666FB. t0=067D0B49. t1=A1EAEAAE. Decrypt() CT=37527BE0052334B89F0CFCCAE87CFA20 R[17]: R[16]: R[15]: R[14]: R[13]: R[12]: R[11]: R[10]: R[ 9]: R[ 8]: x= x= x= x= x= x= x= x= x= x= E07B5237 85411C2B CBD3C29D CBD3C29D 4A6BBAFF 4A6BBAFF BB79AD2E BB79AD2E FF92E109 FF92E109 B8342305 7777DC05 BC31FEBE BC31FEBE 439F4766 439F4766 AA410F41 AA410F41 DF621C97 DF621C97 CAFC0C9F D4E77B7C D4E77B7C F7186836 F7186836 6576A3ED 6576A3ED 28BFEFF5 28BFEFF5 D8447F91 20FA7CE8. Subkeys . [Encrypt] PT=00000000000000000000000000000000 ========== KEYSIZE=192 KEY=0123456789ABCDEFFEDCBA98765432100011223344556677 . t0=5BC40012. E67C030F. t1=AC9926BF. t0=5CF5C93C. t1=828E7493. t0=58554EDB. D45666FB. t0=09A1B597. F8EA8B1B. 62A2CE64. t1=FFF30DB7. t1=17AE5B7E. 7CB57D06. 6A997290. E67C030F. EFB2F051. Decrypt() t0=C06D4949.makeKey: Input key --> S-box key . t0=E9BC6975. D45666FB. 76CD39B1 16750474 . t0=B33C25D6. t1=C1033512. t1=3945E62C. 4FBD10B4 578DA0ED . CC2B3F88. 4D1B4AAA. t0=853A6BB2. t1=0EB8FA83. t1=83068533. t1=472014C3. t1=18041948. 349C294B EC21F6D6 Output whiten . t1=828E7493. 66 . 5AC3E82A. t1=F0DDA4C3. 7CB57D06. 6A997290. t1=F0DDA4C3. DF326FE3 5911F70D . A06C8140 9853D419 . t0=EE03FB5B. t1=F301CE95. 893E49A9. t1=0B2FC79F. E67C030F. t1=4AC0BD46. t1=4B61EEC7. 590BBC63 F95A28B5 . CT=CFD1D2E5A9BE9CDF501F13B892BD2248 CT=9F589F5CF6122C32B6BFEC2F2AE8C35A Decrypt() CT=9F589F5CF6122C32B6BFEC2F2AE8C35A R[17]: R[16]: R[15]: R[14]: R[13]: R[12]: R[11]: R[10]: R[ 9]: R[ 8]: R[ 7]: R[ 6]: R[ 5]: R[ 4]: R[ 3]: R[ 2]: R[ 1]: R[ 0]: R[-1]: x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= x= E5D2D1CF 17738CD3 8561A604 8561A604 E4409C22 E4409C22 A279C718 A279C718 F7B097FA F7B097FA A042B99D A042B99D 07633866 07633866 C8F5099F C8F5099F 38394A24 38394A24 00000000 DF9CBEA9 B5142D18 825D2480 825D2480 702548A2 702548A2 421A8D38 421A8D38 9E5C4FF7 9E5C4FF7 709EF54B 709EF54B 59421079 59421079 0C4B8F53 0C4B8F53 C36D1175 C36D1175 00000000 B8131F50 5CC6CB7B 5CC6CB7B 5DDAA2A1 5DDAA2A1 5B1A0904 5B1A0904 77FC8B29 77FC8B29 0CD39FA6 0CD39FA6 C015BE79 C015BE79 69948F5E 69948F5E 9C263D67 9C263D67 E802528F 00000000 4822BD92. t0=5BC40012. F05480BE B6AF816F . t0=99C9694E. 03EFB931 1D2EE7EC Round subkeys . t0=91BC2070. t1=67A58299. 5415D5D3. 149B9CEC. 5DE68E49 9C3D2478 . B250BEDA. 5415D5D3. t0=839B0017. t1=EB143CFF. t0=E9BC6975. t1=9F09EB26. t0=DAA00849. t0=5D174956. t0=C251B3CE. t1=17AE5B7E. t1=78D2617B. t0=77FC927F. 04CA5304. 35DC0BBD A03E5018 . 4235364D 0CEC363A . 77665544 33221100 --> 45661061 . t0=7C4536B9. 821B5F36. t1=77BD388E. 821B5F36. t0=CEB0BBE1. CT=37527BE0052334B89F0CFCCAE87CFA20 t0=988C8223. t1=2F7D5E04. t0=8F8307AA. t1=C3D8103E. t0=91BC2070. 04CA5304. t0=A7D24F8E. BFF8215E. t0=C251B3CE. t1=19C23B0A. 7C6CF0C4 2F9109C8 . ========== KEYSIZE=256 KEY=0123456789ABCDEFFEDCBA987654321000112233445566778899AABBCCDDEEFF . F1689449 71009CA9 . EFB2F051. . EFB2F051. C229F13B B1364772 .

These pairs are related. t0=29C0736C. The key of each entry is made up of the ciphertext two and three entries back.R[ 7]: R[ 6]: R[ 5]: R[ 4]: R[ 3]: R[ 2]: R[ 1]: R[ 0]: R[-1]: x= x= x= x= x= x= x= x= x= D5178F25 D5178F25 2C125DD7 2C125DD7 99424DFF 99424DFF 5EC769BF 5EC769BF 00000000 00D35CC5 00D35CC5 5A526278 5A526278 FBC14BFC FBC14BFC 44D13C60 44D13C60 00000000 D8447F91 E35CD910 E35CD910 698BE047 698BE047 D38B6C9F D38B6C9F 76CD39B1 00000000 65C2BD96. t1=E4D3D68D. t0=4D8B7489. Counterpane Systems KEYSIZE=256 I=1 KEY=0000000000000000000000000000000000000000000000000000000000000000 PT=00000000000000000000000000000000 CT=57FF739D4DC92C1BD7FC01700CC8216F I=2 KEY=0000000000000000000000000000000000000000000000000000000000000000 PT=57FF739D4DC92C1BD7FC01700CC8216F CT=D43BB7556EA32E46F2A282B7D45B4E0D I=3 KEY=57FF739D4DC92C1BD7FC01700CC8216F00000000000000000000000000000000 PT=D43BB7556EA32E46F2A282B7D45B4E0D CT=90AFE91BB288544F2C32DC239B2635E6 I=4 KEY=D43BB7556EA32E46F2A282B7D45B4E0D57FF739D4DC92C1BD7FC01700CC8216F PT=90AFE91BB288544F2C32DC239B2635E6 CT=6CB4561C40BF0A9705931CB6D408E7FA I=5 KEY=90AFE91BB288544F2C32DC239B2635E6D43BB7556EA32E46F2A282B7D45B4E0D PT=6CB4561C40BF0A9705931CB6D408E7FA CT=3059D6D61753B958D92F4781C8640E58 I=6 KEY=6CB4561C40BF0A9705931CB6D408E7FA90AFE91BB288544F2C32DC239B2635E6 PT=3059D6D61753B958D92F4781C8640E58 67 . We believe that these test vectors provide a thorough test of a Twoﬁsh implementation. 6A997290. 16750474. and can easily be tested automatically. A23B7169. t0=9D16BBB3. 7CB57D06. FILENAME: "ecb_tbl.txt" I=2 KEY=000000000000000000000000000000000000000000000000 PT=EFA71F788965BD4453F860178FC19101 CT=88B2B2706B105E36B446BB6D731A1E88 I=3 KEY=EFA71F788965BD4453F860178FC191010000000000000000 PT=88B2B2706B105E36B446BB6D731A1E88 CT=39DA69D6BA4997D585B6DC073CA341B2 I=4 KEY=88B2B2706B105E36B446BB6D731A1E88EFA71F788965BD44 PT=39DA69D6BA4997D585B6DC073CA341B2 CT=182B02D81497EA45F9DAACDC29193A65 I=5 KEY=39DA69D6BA4997D585B6DC073CA341B288B2B2706B105E36 PT=182B02D81497EA45F9DAACDC29193A65 CT=7AFF7A70CA2FF28AC31DD8AE5DAAAB63 I=6 KEY=182B02D81497EA45F9DAACDC29193A6539DA69D6BA4997D5 PT=7AFF7A70CA2FF28AC31DD8AE5DAAAB63 CT=D1079B789F666649B6BD7D1629F1F77E I=7 KEY=7AFF7A70CA2FF28AC31DD8AE5DAAAB63182B02D81497EA45 PT=D1079B789F666649B6BD7D1629F1F77E CT=3AF6F7CE5BD35EF18BEC6FA787AB506B I=8 KEY=D1079B789F666649B6BD7D1629F1F77E7AFF7A70CA2FF28A PT=3AF6F7CE5BD35EF18BEC6FA787AB506B CT=AE8109BFDA85C1F2C5038B34ED691BFF I=9 KEY=3AF6F7CE5BD35EF18BEC6FA787AB506BD1079B789F666649 PT=AE8109BFDA85C1F2C5038B34ED691BFF CT=893FD67B98C550073571BD631263FC78 I=10 KEY=AE8109BFDA85C1F2C5038B34ED691BFF3AF6F7CE5BD35EF1 PT=893FD67B98C550073571BD631263FC78 CT=16434FC9C8841A63D58700B5578E8F67 : : : I=48 KEY=DEA4F3DA75EC7A8EAC3861A9912402CD5DBE44032769DF54 PT=FB66522C332FCC4C042ABE32FA9E902F CT=F0AB73301125FA21EF70BE5385FB76B6 I=49 KEY=FB66522C332FCC4C042ABE32FA9E902FDEA4F3DA75EC7A8E PT=F0AB73301125FA21EF70BE5385FB76B6 CT=E75449212BEEF9F4A390BD860A640941 ========== Electronic Codebook (ECB) Mode Tables Known Answer Test Tests permutation tables and MDS matrix multiply tables. A23B7169. 7CB57D06. t1=0B2FC79F. 00000000. key) pairs. CT=6B459286F3FFD28D49F15B1581B08E42 I=49 KEY=BCA724A54533C6987E14AA827952F921 PT=6B459286F3FFD28D49F15B1581B08E42 CT=5D9D4EEFFA9151575524F115815A12E0 ========== KEYSIZE=192 PT=00000000000000000000000000000000 I=1 KEY=000000000000000000000000000000000000000000000000 PT=00000000000000000000000000000000 CT=EFA71F788965BD4453F860178FC19101 A. Algorithm Name: Principal Submitter: ========== KEYSIZE=128 I=1 KEY=00000000000000000000000000000000 PT=00000000000000000000000000000000 CT=9F589F5CF6122C32B6BFEC2F2AE8C35A I=2 KEY=00000000000000000000000000000000 PT=9F589F5CF6122C32B6BFEC2F2AE8C35A CT=D491DB16E7B1C39E86CB086B789F5419 I=3 KEY=9F589F5CF6122C32B6BFEC2F2AE8C35A PT=D491DB16E7B1C39E86CB086B789F5419 CT=019F9809DE1711858FAAC3A3BA20FBC3 I=4 KEY=D491DB16E7B1C39E86CB086B789F5419 PT=019F9809DE1711858FAAC3A3BA20FBC3 CT=6363977DE839486297E661C6C9D668EB I=5 KEY=019F9809DE1711858FAAC3A3BA20FBC3 PT=6363977DE839486297E661C6C9D668EB CT=816D5BD0FAE35342BF2A7412C246F752 I=6 KEY=6363977DE839486297E661C6C9D668EB PT=816D5BD0FAE35342BF2A7412C246F752 CT=5449ECA008FF5921155F598AF4CED4D0 I=7 KEY=816D5BD0FAE35342BF2A7412C246F752 PT=5449ECA008FF5921155F598AF4CED4D0 CT=6600522E97AEB3094ED5F92AFCBCDD10 I=8 KEY=5449ECA008FF5921155F598AF4CED4D0 PT=6600522E97AEB3094ED5F92AFCBCDD10 CT=34C8A5FB2D3D08A170D120AC6D26DBFA I=9 KEY=6600522E97AEB3094ED5F92AFCBCDD10 PT=34C8A5FB2D3D08A170D120AC6D26DBFA CT=28530B358C1B42EF277DE6D4407FC591 I=10 KEY=34C8A5FB2D3D08A170D120AC6D26DBFA PT=28530B358C1B42EF277DE6D4407FC591 CT=8A8AB983310ED78C8C0ECDE030B8DCA4 : : : I=48 KEY=137A24CA47CD12BE818DF4D2F4355960 PT=BCA724A54533C6987E14AA827952F921 TWOFISH Bruce Schneier. The plaintext of each entry is the ciphertext of the previous one. t1=F35A3092. t0=8C8927A1. t1=64AD7A3F. 6A997290. t0=E66B9D19. t0=0BB41F61. t1=B87B2DFD. ciphertext. t1=3945E62C. t1=C3D8103E. t0=D5397903.2 Full Encryptions The following ﬁle shows a number of (plaintext.

CT=E69465770505D7F80EF68CA38AB3A3D6 I=7 KEY=3059D6D61753B958D92F4781C8640E586CB4561C40BF0A9705931CB6D408E7FA PT=E69465770505D7F80EF68CA38AB3A3D6 CT=5AB67A5F8539A4A5FD9F0373BA463466 I=8 KEY=E69465770505D7F80EF68CA38AB3A3D63059D6D61753B958D92F4781C8640E58 PT=5AB67A5F8539A4A5FD9F0373BA463466 CT=DC096BCD99FC72F79936D4C748E75AF7 I=9 KEY=5AB67A5F8539A4A5FD9F0373BA463466E69465770505D7F80EF68CA38AB3A3D6 PT=DC096BCD99FC72F79936D4C748E75AF7 CT=C5A3E7CEE0F1B7260528A68FB4EA05F2 I=10 KEY=DC096BCD99FC72F79936D4C748E75AF75AB67A5F8539A4A5FD9F0373BA463466 PT=C5A3E7CEE0F1B7260528A68FB4EA05F2 CT=43D5CEC327B24AB90AD34A79D0469151 : : : I=48 KEY=2E2158BC3E5FC714C1EEECA0EA696D48D2DED73E59319A8138E0331F0EA149EA PT=248A7F3528B168ACFDD1386E3F51E30C CT=431058F4DBC7F734DA4F02F04CC4F459 I=49 KEY=248A7F3528B168ACFDD1386E3F51E30C2E2158BC3E5FC714C1EEECA0EA696D48 PT=431058F4DBC7F734DA4F02F04CC4F459 CT=37FE26FF1CF66175F5DDF4C33B97A205 68 .

Ensuring Distributed Accountability for Data Sharing Using Reversible Data Hiding in Cloud Over-Lay Network

by International Organization of Scientific Research (IOSR)

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd