
by

Bethany Delman

A Thesis Submitted in Partial Fulﬁllment of the Requirements for the Degree of

Master of Science in Computer Engineering

Supervised by

Professor Dr. Muhammad Shaaban

Department of Computer Engineering

Kate Gleason College of Engineering

Rochester Institute of Technology

Rochester, New York

July 2004

Approved By:

Dr. Muhammad Shaaban

Professor

Primary Advisor

Dr. Stanisław Radziszowski

Professor, Computer Science Department

Dr. Shanchieh Jay Yang

Assistant Professor

Thesis Release Permission Form

Rochester Institute of Technology

Kate Gleason College of Engineering

Master of Science, Computer Engineering

Title: Genetic Algorithms in Cryptography

I, Bethany Delman, hereby grant permission to the Rochester Institute of Technology to

reproduce my print thesis in whole or in part. Any reproduction will not be for commercial

use or proﬁt.

I, Bethany Delman, additionally grant to the Rochester Institute of Technology Digi-

tal Media Library (RIT DML) the non-exclusive license to archive and provide electronic

access to my thesis in whole or in part in all forms of media in perpetuity.

I understand that my work, in addition to its bibliographic record and abstract, will be

available to the world-wide community of scholars and researchers through the RIT DML.

I retain all other ownership rights to the copyright of the thesis. I also retain the right to

use in future works (such as articles or books) all or part of this thesis. I am aware that the

Rochester Institute of Technology does not require registration of copyrights for ETDs.

I hereby certify that, if appropriate, I have obtained and attached written permission

statements from the owners of each third party copyrighted matter to be included in my

thesis. I certify that the version I submitted is the same as that approved by my committee.

Bethany Delman

Date

Abstract

Genetic algorithms (GAs) are a class of optimization algorithms. GAs attempt to solve

problems through modeling a simpliﬁed version of genetic processes. There are many

problems for which a GA approach is useful. It is, however, undetermined if cryptanalysis

is such a problem.

Therefore, this work explores the use of GAs in cryptography. Both traditional crypt-

analysis and GA-based methods are implemented in software. The results are then com-

pared using the metrics of elapsed time and percentage of successful decryptions. A de-

termination is made for each cipher under consideration as to the validity of the GA-based

approaches found in the literature. In general, these GA-based approaches are typical of

the ﬁeld.

Of the genetic algorithm attacks found in the literature, totaling twelve, seven were

re-implemented. Of these seven, only three achieved any success. The successful attacks

were those on the transposition and permutation ciphers by Matthews [20], Clark [4], and

Gründlingh and Van Vuuren [13], respectively. These attacks were further investigated in

an attempt to improve or extend their success. Unfortunately, this attempt was unsuccessful,

as was the attempt to apply the Clark [4] attack to the monoalphabetic substitution cipher

and achieve the same or indeed any level of success.

Overall, the standard ﬁtness equation genetic algorithm approach, and the scoreboard

variant thereof, are not worth the extra effort involved. Traditional cryptanalysis methods

are more successful, and easier to implement. While a traditional method takes more time,

a faster unsuccessful attack is worthless. The failure of the genetic algorithm approach

indicates that supplementary research into traditional cryptanalysis methods may be more

useful and valuable than additional modiﬁcation of GA-based approaches.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Document Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Monoalphabetic Substitution cipher . . . . . . . . . . . . . . . . . . . . . 9

3.2 Polyalphabetic Substitution cipher . . . . . . . . . . . . . . . . . . . . . . 11

3.3 Permutation/Transposition ciphers . . . . . . . . . . . . . . . . . . . . . . 12

3.3.1 Permutation cipher . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3.2 Transposition cipher . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.4 Knapsack ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4.1 Merkle-Hellman Knapsack cipher . . . . . . . . . . . . . . . . . . 16

3.4.2 Chor-Rivest Knapsack cipher . . . . . . . . . . . . . . . . . . . . 18

3.5 Vernam cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4 Relevant Prior Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.1 Spillman et al. - 1993 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.1.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.2 Matthews - 1993 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.2.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Spillman - 1993 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


4.3.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.4 Clark - 1994 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.4.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.5 Lin, Kao - 1995 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.5.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.6 Clark, Dawson, Bergen - 1996 . . . . . . . . . . . . . . . . . . . . . . . . 37

4.6.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.7 Clark, Dawson, Nieuwland - 1996 . . . . . . . . . . . . . . . . . . . . . . 39

4.7.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.8 Clark, Dawson - 1997 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.9 Kolodziejczyk - 1997 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.9.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.10 Clark, Dawson - 1998 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.10.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.11 Yaseen, Sahasrabuddhe - 1999 . . . . . . . . . . . . . . . . . . . . . . . . 45

4.11.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.12 Gründlingh, Van Vuuren - submitted 2002 . . . . . . . . . . . . . . . . . . 47

4.12.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.13 Overall Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5 Traditional Cryptanalysis Methods . . . . . . . . . . . . . . . . . . . . . . 52

5.1 Distinguishing among the three ciphers . . . . . . . . . . . . . . . . . . . 53

5.2 Monoalphabetic substitution cipher . . . . . . . . . . . . . . . . . . . . . . 53

5.3 Polyalphabetic substitution cipher . . . . . . . . . . . . . . . . . . . . . . 54

5.4 Permutation/Transposition ciphers . . . . . . . . . . . . . . . . . . . . . . 56

5.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6 Attack Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


6.2 Classical Attack Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

6.3 Genetic Algorithm Attack Results . . . . . . . . . . . . . . . . . . . . . . 64

6.4 Comparison of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

7 Extensions and Additions . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.1 Genetic Algorithm Attack Extensions . . . . . . . . . . . . . . . . . . . . 74

7.1.1 Transposition Cipher Attacks . . . . . . . . . . . . . . . . . . . . . 75

7.1.2 Permutation Cipher Attack . . . . . . . . . . . . . . . . . . . . . . 76

7.1.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.2 Additions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.2.1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8 Evaluation and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

8.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

8.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85


List of Figures

4.1 Substitution Cipher Family and Attacking Papers . . . . . . . . . . . . . . 23

4.2 Permutation Cipher Family and Attacking Papers . . . . . . . . . . . . . . 24

4.3 Knapsack Cipher Family and Attacking Papers . . . . . . . . . . . . . . . 24

4.4 Vernam Cipher Family and Attacking Paper . . . . . . . . . . . . . . . . . 24

6.1 Classical Substitution Attack . . . . . . . . . . . . . . . . . . . . . . . . . 62

6.2 Classical Vigenère Attack . . . . . . . . . . . . . . . . . . . . . . . . . 63

6.3 Classical Permutation Attack . . . . . . . . . . . . . . . . . . . . . . . . . 64

6.4 Classical Transposition Attack . . . . . . . . . . . . . . . . . . . . . . . . 65

6.5 Effect of Number of Generations at Population Size 10 . . . . . . . . . . . 67

6.6 Effect of Number of Generations at Population Size 20 . . . . . . . . . . . 67

6.7 Effect of Number of Generations at Population Size 40 . . . . . . . . . . . 68

6.8 Effect of Number of Generations at Population Size 50 . . . . . . . . . . . 68

6.9 Effect of Population Size at 25 Generations . . . . . . . . . . . . . . . . . 69

6.10 Effect of Population Size at 50 Generations . . . . . . . . . . . . . . . . . 69

6.11 Effect of Population Size at 100 Generations . . . . . . . . . . . . . . . . . 70

6.12 Effect of Population Size at 200 Generations . . . . . . . . . . . . . . . . . 70

6.13 Effect of Population Size at 400 Generations . . . . . . . . . . . . . . . . . 71


Glossary

block A sequence of consecutive characters encoded at one

time

block length The number of characters in a block

chromosome The genetic material of an individual - represents the information about a possible solution to the given problem

cipher An algorithm for performing encryption (and the reverse,

decryption) - a series of well-deﬁned steps that can be

followed as a procedure. Works at the level of individual

letters, or small groups of letters

ciphertext A text in the encrypted form produced by some cryp-

tosystem. The convention is for ciphertexts to contain no

white space or punctuation

crossover (mating) Crossover is the process by which two chromosomes

combine some portion of their genetic material to pro-

duce a child or children

cryptanalysis The analysis and deciphering of cryptographic writings

or systems

cryptography The process or skill of communicating in or deciphering

secret writings or ciphers

cryptosystem The package of all processes, formulae, and instructions

for encoding and decoding messages using cryptography

decryption Any procedure used in cryptography to convert cipher-

text (encrypted data) into plaintext

digram Sequence of two consecutive characters

encryption The process of putting text into encoded form


ﬁtness The extent to which a possible solution successfully

solves the given problem - usually a numerical value

generation The average interval of time between the birth of parents

and the birth of their offspring - in the genetic algorithm

case, this is one iteration of the main loop of code

genetic algorithm (GA) Search/optimization algorithm based on the mechanics

of natural selection and natural genetics

key A relatively small amount of information that is used by

an algorithm to customize the transformation of plaintext

into ciphertext (during encryption) or vice versa (during

decryption)

key length The size of the key - how many values comprise the key

monoalphabetic Using one alphabet - refers to a cryptosystem where each

alphabetic character is mapped to a unique alphabetic

character

mutation Simulation of transcription errors that occur in nature

with a low probability - a child is randomly changed from

what its parents produced in mating

one-time pad Another name for the Vernam cipher

order-based GA A form of GA where the chromosomes represent per-

mutations. Special care must be taken to avoid illegal

permutations

plaintext A message before encryption or after decryption, i.e., in

its usual form which anyone can read, as opposed to its

encrypted form

polyalphabetic Using many alphabets - refers to a cipher where each al-

phabetic character can be mapped to one of many possi-

ble alphabetic characters


population The possible solutions (chromosomes) currently under

investigation, as well as the number of solutions that can

be investigated at one time, i.e., per generation

scoreboard Method of determining the ﬁtness of a possible solution

- takes a small list of the most common digrams and tri-

grams and gives each chromosome a score based on how

often these combinations occur

trigram Sequence of three consecutive characters

unigram Single character

Vigenère The specific polyalphabetic substitution cipher used in

this work


Chapter 1

Introduction

The application of a genetic algorithm (GA) to the field of cryptanalysis is unusual.

Few works exist on this topic. This nontraditional application is investigated to determine

the beneﬁts of applying a GA to a cryptanalytic problem, if any. If the GA-based approach

proves successful, it could lead to faster, more automated cryptanalysis techniques. How-

ever, since this area is so different from the application areas where GAs developed, the

GA-based methods will likely prove to be less successful than the traditional methods.

The primary goals of this work are to produce a performance comparison between

traditional cryptanalysis methods and genetic algorithm based methods, and to determine

the validity of typical GA-based methods in the ﬁeld of cryptanalysis.

The focus will be on classical ciphers, including substitution, permutation, transposi-

tion, knapsack and Vernam ciphers. The principles used in these ciphers form the founda-

tion for many of the modern cryptosystems. Also, if a GA-based approach is unsuccessful

on these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to

more complicated systems. A thorough search of the available literature found GA-based

attacks on only these ciphers. In many cases, these ciphers are some of the simplest pos-

sible versions. For example, one of the knapsack systems attacked is the Merkle-Hellman

knapsack scheme. In this scheme, a superincreasing sequence of length n is used as a pub-

lic key. According to Shamir [23], a typical value for n is 100. In the GA-based attacks on

this cipher, n was less than 16. This is almost an order of magnitude below the typical case,

and makes these attacks useful only as proof-of-concept examples for learning the use of


GAs.

In other GA-based attacks, there is a huge gap between the knowledge needed to re-implement

the attack and the information provided in the paper. For example, the values of p and h are

vitally necessary for an implementation of the Chor-Rivest knapsack scheme. This system

utilizes a finite field, F_q, of characteristic p, where q = p^h, p ≥ h, and the discrete logarithm problem is feasible. In the GA-based attack on this cipher, these values are not provided.

This information hole means that this attack could have been run using trivial parameters,

or parameters known to be easy to attack. This attack cannot be considered remotely valid

without knowing these parameters.

Many of the GA-based attacks also lack information required for comparison to the tra-

ditional attacks. The number of generations it took for a run of the GA to get to some state

is not at all useful when attempting to compare and contrast attacks. The most coherent

and seemingly valid attacks were re-implemented so that a consistent, reasonable set of

metrics could be collected. These metrics include elapsed time required for the attack and

the success percentage of each attack.

1.1 Document Overview

This thesis is broken into a number of chapters. Each chapter covers one aspect of the thesis

in detail. Chapter 2 gives a brief introduction to genetic algorithms. The general form of

the selection, reproduction and mutation operators are discussed, as is the general sequence

of events for a GA-based approach. Chapter 3 introduces the necessary classical ciphers in

detail. The algorithm for each cipher is given, in some cases including a small example to

illustrate the process of using the given cryptosystem.

The next two chapters cover the different attacks found in the literature. Chapter 4 dis-

cusses the various GA-based attacks in detail. These attacks are covered in chronological

order, with up to four papers attacking the same cipher or variants thereof. The following


chapter, Chapter 5, details the classical attack on those ciphers chosen for further investiga-

tion. The ciphers were selected for further investigation due to the difﬁculty or lack thereof

of the traditional attack method, as well as the clarity and quality of the GA-based work.

For example, the Vernam cipher was not selected for further investigation. This cipher is

unconditionally secure, meaning that an attack is successful only when there is a protocol

or implementation failure. There is no basis for attacking this cipher when no mistakes

have been made.

The next chapters detail the results of the traditional and GA-based attacks, as well as

extensions to the GA-based attacks. Chapter 6 covers the results of the traditional attacks,

and discusses those GA-based attacks that achieved some success. Chapter 7 discusses the

extensions made to several of the GA-based attacks, as well as new work attempted.

Finally, Chapter 8 discusses the overall conclusions drawn from this work, as well as

any other directions that need further investigation.


Chapter 2

Genetic Algorithms

The genetic algorithm is a search algorithm based on the mechanics of natural selection

and natural genetics [12]. As summarized by Tomassini [29], the main idea is that in order

for a population of individuals to adapt to some environment, it should behave like a natu-

ral system. This means that survival and reproduction of an individual is promoted by the

elimination of useless or harmful traits and by rewarding useful behavior. The genetic algo-

rithm belongs to the family of evolutionary algorithms, along with genetic programming,

evolution strategies, and evolutionary programming. Evolutionary algorithms can be con-

sidered as a broad class of stochastic optimization techniques. An evolutionary algorithm

maintains a population of candidate solutions for the problem at hand. The population is

then evolved by the iterative application of a set of stochastic operators. The set of opera-

tors usually consists of mutation, recombination, and selection or something very similar.

Globally satisfactory, if sub-optimal, solutions to the problem are found in much the same

way as populations in nature adapt to their surrounding environment.

Using Tomassini’s terms [29], genetic algorithms (GAs) consider an optimization prob-

lem as the environment where feasible solutions are the individuals living in that environ-

ment. The degree of adaptation of an individual to its environment is the counterpart of the

ﬁtness function evaluated on a solution. Similarly, a set of feasible solutions takes the place

of a population of organisms. An individual is a string of binary digits or some other set of

symbols drawn from a ﬁnite set. Each encoded individual in the population may be viewed

as a representation of a particular solution to a problem.


In general, a genetic algorithm begins with a randomly generated set of individuals.

Once the initial population has been created, the genetic algorithm enters a loop [29]. At

the end of each iteration, a new population has been produced by applying a certain num-

ber of stochastic operators to the previous population. Each such iteration is known as a

generation [29].

A selection operator is applied ﬁrst. This creates an intermediate population of n “par-

ent” individuals. To produce these “parents”, n independent extractions of an individual

from the old population are performed [29]. The probability of each individual being ex-

tracted should be (linearly) proportional to the ﬁtness of that individual. This means that

above average individuals should have more copies in the new population, while below

average individuals should have few to no copies present, i.e., a below average individual

risks extinction.
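As an illustrative sketch, not the implementation used in this work, fitness-proportional selection can be written in a few lines; the population and fitness values below are placeholders:

```python
import random

def select_parents(population, fitnesses, n):
    """Perform n independent extractions from the population, where each
    individual's chance of being drawn is proportional to its fitness."""
    return random.choices(population, weights=fitnesses, k=n)

# Above-average individuals tend to receive multiple copies in the
# intermediate population, while below-average individuals may receive
# none (extinction).
parents = select_parents(["ab", "cd", "ef"], [5.0, 3.0, 1.0], n=4)
```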

Once the intermediate population of “parents” (those individuals selected for reproduc-

tion) has been produced, the individuals for the next generation will be created through the

application of a number of reproduction operators [29]. These operators can involve one or

more parents. An operator that involves just one parent, simulating asexual reproduction,

is called a mutation operator. When more than one parent is involved, sexual reproduction

is simulated, and the operator is called recombination [29]. The genetic algorithm uses two

reproduction operators - crossover and mutation.

To apply a crossover operator, parents are paired together. There are several differ-

ent types of crossover operators, and the types available depend on what representation

is used for the individuals. For binary string individuals, one-point, two-point, and uni-

form crossover are often used. For permutation or order-based individuals, order, partially

mapped, and cycle crossover are options. The one-point crossover means that the par-

ent individuals exchange a random preﬁx when creating the child individuals. Two-point

crossover is an exchange of a random substring, and uniform crossover takes each bit in

the child arbitrarily from either parent. Order and partially mapped crossover are similar

to two-point crossover in that two cut points are selected. For order crossover, the section


between the ﬁrst and second cut points is copied from the ﬁrst parent to the child [28]. The

remaining places are ﬁlled using elements not occurring in this section, in the order that

they occur in the second parent starting from the second cut point and wrapping around as

needed. For partially mapped crossover, the section between the two cut points deﬁnes a

series of swapping operations to be performed on the second parent [28]. Cycle crossover

satisﬁes two conditions - every position of the child must retain a value found in the cor-

responding position of a parent, and the child must be a valid permutation. Each cycle, a

random parent is selected. For example [28]:

• Parent 1 is (h k c e f d b l a i g j) and parent 2 is (a b c d e f g h i j k l)

• For position #1 in the child, ‘h’ is selected

• Therefore, ‘a’ cannot be selected from parent 2 and must be chosen from parent 1

• Since ‘a’ is above ‘i’ in parent 2, ‘i’ must also be selected from parent 1

• So, if ‘h’ is selected from parent 1 for position #1, then ‘a’, ‘i’, ‘j’, and ‘l’ must also

be selected from parent 1

• The positions of these elements are said to form a cycle, since selection continues

until the ﬁrst element (‘h’ here) is reached again
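The cycle construction above can be sketched directly in code. One simplifying assumption in this sketch: the traced cycle is always copied from parent 1, whereas in general a random parent is selected per cycle:

```python
def cycle_crossover(p1, p2):
    """Create a child that keeps, at every position, a value found at the
    same position in one of the parents, and is a valid permutation."""
    child = [None] * len(p1)
    i = 0
    # Trace the cycle starting at position 0, copying values from parent 1.
    while child[i] is None:
        child[i] = p1[i]
        i = p2.index(p1[i])  # the position where parent 2 holds this value
    # All positions outside the cycle take their values from parent 2.
    for j in range(len(child)):
        if child[j] is None:
            child[j] = p2[j]
    return child

p1 = list("hkcefdblaigj")
p2 = list("abcdefghijkl")
child = cycle_crossover(p1, p2)  # 'h', 'a', 'i', 'j', 'l' come from parent 1
```

Tracing from the first position picks up exactly the cycle {'h', 'a', 'i', 'j', 'l'} described in the example; every other position is filled from parent 2, so the child is still a permutation.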

After crossover, each individual has a small chance of mutation. The purpose of the

mutation operator is to simulate the effect of transcription errors that can happen with a

very low probability when a chromosome is mutated [29]. A standard mutation operator

for binary strings is bit inversion. Each bit in an individual has a small chance of mutating

into its complement, i.e., a ‘0’ would mutate into a ‘1’.

In principle, the loop of selection-crossover-mutation is inﬁnite [29]. However, it is

usually stopped when a given termination condition is met. Some common termination

conditions are [29]:

1. A pre-determined number of generations have passed


2. A satisfactory solution has been found

3. No improvement in solution quality has taken place for a certain number of genera-

tions

The different termination conditions are possible since a genetic algorithm is not guar-

anteed to converge to a solution. The evolutionary cycle can be summarized as follows

[29]:

generation = 0
seed population
while not (termination condition) do
    generation = generation + 1
    calculate fitness
    selection
    crossover
    mutation
end while
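The evolutionary cycle translates almost line for line into code. The following generic skeleton is a sketch rather than the implementation used in this work; the fitness function and the operators are supplied by the caller as placeholders:

```python
import random

def genetic_algorithm(fitness, random_individual, crossover, mutate,
                      pop_size=20, max_generations=100, target=None):
    """Generic skeleton of the evolutionary cycle: seed, then loop over
    fitness calculation, selection, crossover, and mutation."""
    population = [random_individual() for _ in range(pop_size)]   # seed population
    for generation in range(max_generations):                     # termination: generation cap
        scores = [fitness(ind) for ind in population]             # calculate fitness
        if target is not None and max(scores) >= target:          # termination: satisfactory solution
            break
        # Selection: fitness-proportional extraction of the "parent" pool.
        parents = random.choices(population, weights=scores, k=pop_size)
        # Crossover and mutation produce the next generation.
        population = [mutate(crossover(parents[i], parents[(i + 1) % pop_size]))
                      for i in range(pop_size)]
    return max(population, key=fitness)
```

Note that the loop is not guaranteed to converge; the generation cap and the optional target score correspond to termination conditions 1 and 2 above.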

Typical application areas for genetic algorithms include:

• Function Optimization

• Graph Applications such as

– Coloring

– Paths

– Circuit Layout

• Scheduling

• Satisﬁability

• Game Strategies

• Chess Problems


• Computing Models

• Neural Networks

• Lens Design

Many of these application areas are concerned with problems which are hard to solve

but have easily veriﬁable solutions. Another trait common to these application areas is

the equation style of ﬁtness function. Cryptography and cryptanalysis could be considered

to meet these criteria. However, cryptanalysis is not closely related to the typical GA

application areas and, subsequently, ﬁtness equations are difﬁcult to generate. This makes

the use of a genetic algorithm approach to cryptanalysis rather unusual.


Chapter 3

Cryptography

In general, the genetic algorithm approach has only been used to attack fairly simple ci-

phers. Most of the ciphers attacked are considered to be classical, i.e. those created before

1950 A.D. [21] which have become well known over time. Ciphers attacked include:

• Monoalphabetic Substitution cipher

• Polyalphabetic Substitution cipher

• Permutation cipher

• Transposition cipher

• Merkle-Hellman Knapsack cipher

• Chor-Rivest Knapsack cipher

• Vernam cipher

3.1 Monoalphabetic Substitution cipher

The simplest cipher of those attacked is the monoalphabetic substitution cipher. This cipher

is described as follows:

• Let the plaintext and the ciphertext character sets be the same, say the alphabet Z


• Let the keys, K, consist of all possible permutations of the symbols in Z

• For each permutation π ∈ K,

1. define the encryption function e_π(x) = π(x)

2. and define the decryption function d_π(y) = π⁻¹(y), where π⁻¹ is the inverse permutation to π.

For example, deﬁne Z to be the 26 letter English alphabet. Then, a random permutation

π could be (plaintext characters are in lower case and ciphertext characters in upper case)

[27]:

a b c d e f g h i j k l m

X N Y A H P O G Z Q W B T

n o p q r s t u v w x y z

S F L R C V M U E K J D I

This defines the encryption function: e_π(a) = X, e_π(b) = N, etc. The decryption function, d_π(y), is the inverse permutation, and can be obtained by switching the rows. π⁻¹ is, therefore:

A B C D E F G H I J K L M

d l r y v o h e z x w p t

N O P Q R S T U V W X Y Z

b g f j q n m u s k a c i

Using the encryption function, a plaintext of ‘plain’ becomes a ciphertext of ‘LBXZS’.

A ciphertext of ‘YZLGHC’ would decrypt to ‘cipher’.
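This example can be captured in a minimal sketch; the helper names are illustrative, not from the thesis implementation:

```python
# The example permutation pi as a plaintext -> ciphertext mapping.
pi = dict(zip("abcdefghijklmnopqrstuvwxyz", "XNYAHPOGZQWBTSFLRCVMUEKJDI"))
pi_inv = {c: p for p, c in pi.items()}  # inverse permutation: switch the rows

def encrypt(plaintext):
    return "".join(pi[ch] for ch in plaintext)

def decrypt(ciphertext):
    return "".join(pi_inv[ch] for ch in ciphertext)

encrypt("plain")    # -> "LBXZS"
decrypt("YZLGHC")   # -> "cipher"
```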

Monoalphabetic substitution is easy to recognize, since the frequency distribution of the character set is unchanged [21]; only which character carries each frequency changes. This attribute allows associations between ciphertext and plaintext characters, i.e., if the most frequent plaintext character X occurred ten times, then the ciphertext character that X maps to will also appear ten times.
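This recognition property is easy to demonstrate; a short sketch using the example permutation from this section:

```python
from collections import Counter

# Plaintext -> ciphertext permutation from the example above.
pi = dict(zip("abcdefghijklmnopqrstuvwxyz", "XNYAHPOGZQWBTSFLRCVMUEKJDI"))

plaintext = "attackatdawn"
ciphertext = "".join(pi[ch] for ch in plaintext)

# The multiset of character frequencies is identical; only the labels move.
assert sorted(Counter(plaintext).values()) == sorted(Counter(ciphertext).values())
```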

3.2 Polyalphabetic Substitution cipher

This cipher is a more complex version of the substitution cipher. Instead of mapping the

plaintext alphabet onto ciphertext characters, different substitution mappings are used on

different portions of the plaintext [21]. The result is called polyalphabetic substitution. The

simplest case consists of using different alphabets sequentially and repeatedly, so that the

position of each plaintext character determines which mapping is applied to it. Under dif-

ferent alphabets, the same plaintext character is encrypted to different ciphertext characters,

making frequency analysis harder.

This cipher has the following properties (assuming m mappings):

• The key space K consists of all ordered sets of m permutations

• These permutations are represented as (k_1, k_2, . . . , k_m)

• Each permutation k_i is defined on the alphabet in use

• Encryption of the message x = (x_1, x_2, . . . , x_m) under the key k = (k_1, k_2, . . . , k_m) is given by e_k(x) = k_1(x_1) k_2(x_2) . . . k_m(x_m)

• The decryption key associated with e_k(x) is d_k(y) = (k_1⁻¹, k_2⁻¹, . . . , k_m⁻¹).

The encryption and decryption functions could also be written as follows (assuming that the addition/subtraction operations occur modulo the size of the alphabet):

• e_K(x_1, x_2, . . . , x_m) = (x_1 + k_1, x_2 + k_2, . . . , x_m + k_m)

• d_K(y_1, y_2, . . . , y_m) = (y_1 − k_1, y_2 − k_2, . . . , y_m − k_m)


The most common polyalphabetic substitution cipher is the Vigenère cipher. This cipher encrypts m characters at a time, making each plaintext character equivalent to m alphabetic characters. An example of the Vigenère cipher is:

• Suppose m = 6

• Using the mapping A ↔ 0, B ↔ 1, . . . , Z ↔ 25

• If the keyword K = CIPHER, numerically (2, 8, 15, 7, 4, 17)

• Suppose the plaintext is the string ‘cryptosystem’

• Convert the plaintext elements to residues modulo 26: 2 17 24 15 19 14 18 24 18 19 4 12

• Write the plaintext elements in groups of 6 and add the keyword modulo 26:

  2 17 24 15 19 14 18 24 18 19  4 12
  2  8 15  7  4 17  2  8 15  7  4 17
  4 25 13 22 23  5 20  6  7  0  8  3

• The alphabetic equivalent of the ciphertext string is: EZNWXFUGHAID.

• To decrypt, the same keyword is used but is subtracted modulo 26 from the ciphertext
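The worked example can be reproduced with a short sketch (not the implementation used in this work); encryption and decryption differ only in the sign of the shift:

```python
def vigenere(text, keyword, decrypt=False):
    """Vigenère cipher under the mapping A <-> 0, ..., Z <-> 25."""
    shifts = [ord(k) - ord('A') for k in keyword]
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text.upper()):
        # Add (or subtract) the keyword element for this position, modulo 26.
        value = (ord(ch) - ord('A') + sign * shifts[i % len(shifts)]) % 26
        out.append(chr(value + ord('A')))
    return "".join(out)

vigenere("cryptosystem", "CIPHER")                # -> "EZNWXFUGHAID"
vigenere("EZNWXFUGHAID", "CIPHER", decrypt=True)  # -> "CRYPTOSYSTEM"
```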

3.3 Permutation/Transposition ciphers

Although simple transposition ciphers change the dependencies between consecutive char-

acters, they are fairly easily recognized since the frequency distribution of the characters

is preserved [21]. This is true for both ciphers considered, the permutation cipher and the

columnar transposition cipher. Both ciphers apply the same principles, but differ in how

the transformation is applied. The permutation cipher is applied to blocks of characters

while the columnar transposition cipher is applied to the entire text at once.


3.3.1 Permutation cipher

The idea behind a permutation cipher is to keep the plaintext characters unchanged, but

alter their positions by rearrangement using a permutation [27].

This cipher is defined as:

• Let m be a positive integer, and let K consist of all permutations of {1, . . . , m}

• For a key (permutation) π ∈ K, define:

• The encryption function e_π(x_1, . . . , x_m) = (x_π(1), . . . , x_π(m))

• The decryption function d_π(y_1, . . . , y_m) = (y_π⁻¹(1), . . . , y_π⁻¹(m))

A small example, assuming m = 6, and the key is the permutation π:

i 1 2 3 4 5 6

π(i) 3 5 1 6 4 2

The first row is the value of i, and the second row is the corresponding value of π(i). The inverse permutation π^−1 is constructed by interchanging the two rows, and rearranging the columns so that the first row is in increasing order. Therefore, π^−1 is:

x        1 2 3 4 5 6

π^−1(x)  3 6 1 5 2 4

If the plaintext given is ‘apermutation’, it is ﬁrst partitioned into groups of m letters

(‘apermu’ and ‘tation’) and then rearranged according to the permutation π. This yields

‘EMAURP’ and ‘TOTNIA’, making the ciphertext equal ‘EMAURPTOTNIA’. The ciphertext is decrypted in a similar process, using the inverse permutation π^−1.
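The permutation cipher above can be sketched in Python; `pi` is given 1-based, as in the example, and decryption applies the inverse permutation (the helper name is illustrative):

```python
def permute_blocks(text, pi, inverse=False):
    """Apply e_pi (or d_pi when inverse=True) block by block."""
    m = len(pi)
    if inverse:
        # Build pi^{-1} by interchanging the rows of the permutation table.
        inv = [0] * m
        for i, p in enumerate(pi):
            inv[p - 1] = i + 1
        pi = inv
    out = []
    for start in range(0, len(text), m):
        block = text[start:start + m]
        # Output position i takes the character at position pi(i).
        out.append(''.join(block[pi[i] - 1] for i in range(m)))
    return ''.join(out)

pi = [3, 5, 1, 6, 4, 2]
print(permute_blocks('apermutation', pi).upper())  # EMAURPTOTNIA
```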

3.3.2 Transposition cipher

Another type of cipher in this category is the columnar transposition cipher. As described in

[24], this cipher consists of writing the plaintext into a rectangle with a prearranged number

of columns and transcribing it vertically by columns to yield the ciphertext. The key is

a prearranged sequence of numbers which determines both the width of the inscription


rectangle and the order in which to transcribe the columns [24]. This key is usually derived

from an agreed keyword by assigning numbers to the individual letters according to their

alphabetical order.

For example, if the keyword is SORCERY, the resulting numerical key would be 6 3

4 1 2 5 7 [24], where numbers are assigned to the keyword letters according to the alpha-

betical order of the letters. If we wish to encipher the message LASER BEAMS CAN BE

MODULATED TO CARRY MORE INTELLIGENCE THAN RADIO WAVES, it would

ﬁrst be written out on a width of 7 under the numerical key. As is traditional, white spaces

and punctuation are ignored in this and all ciphertexts used. The rectangle would look like

this:

S O R C E R Y

6 3 4 1 2 5 7

L A S E R B E

A M S C A N B

E M O D U L A

T E D T O C A

R R Y M O R E

I N T E L L I

G E N C E T H

A N R A D I O

W A V E S Q R

Since the message length was not a multiple of the keyword length, the message does

not ﬁll the entire rectangle. Deciphering is simpliﬁed if the last row contains no blank

cells, so two dummy letters are added (here, Q and R) to make the rectangle completely

ﬁlled [24]. Enciphering consists of reading the ciphertext out vertically in the order of the

numbered columns. It can be written in groups of ﬁve letters at the same time, if so desired

[24]. The ciphertext is, therefore, ECDTM ECAER AUOOL EDSAM MERNE NASSO

DYTNR VBNLC RLTIQ LAETR IGAWE BAAEI HOR.


The decipherer would count the number of letters in the message (63) and determine

the length of the inscription rectangle by dividing this value by the length of the keyword.

This makes the inscription rectangle 7 × 9 in this case. The numerical key is then written

above the rectangle and the ciphertext characters entered into the diagram in the order of

the key numbers [24]. The plaintext is read off by rows, as it was originally entered.
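The enciphering procedure above can be sketched in Python (padding with dummy letters Q, R, . . . when the rectangle is not filled; the helper names are illustrative):

```python
def numeric_key(keyword):
    """Assign 1..n to keyword letters in alphabetical order (ties left to right)."""
    order = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    key = [0] * len(keyword)
    for rank, i in enumerate(order, start=1):
        key[i] = rank
    return key

def encipher(plaintext, keyword):
    key = numeric_key(keyword)
    n = len(key)
    # Pad with dummy letters so the last row contains no blank cells.
    dummies, d = 'QRSTUVWXYZ', 0
    while len(plaintext) % n:
        plaintext += dummies[d]
        d += 1
    rows = [plaintext[i:i + n] for i in range(0, len(plaintext), n)]
    # Read the columns out in the order of the key numbers.
    cols = sorted(range(n), key=lambda c: key[c])
    return ''.join(r[c] for c in cols for r in rows)

print(numeric_key('SORCERY'))  # [6, 3, 4, 1, 2, 5, 7]
```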

3.4 Knapsack ciphers

Knapsack public-key encryption systems are based on the subset sum problem, an NP-complete problem. Given in this problem are a finite set S ⊂ N+ (N+ is the set of natural numbers {1, 2, . . .}), and a target t ∈ N+. The question is whether there exists a subset S′ ⊆ S whose elements sum to t [9]. For example:

• If S = {1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993}

• And t = 138457

• Then the subset S′ = {1, 2, 7, 98, 343, 686, 2409, 17206, 117705} is a solution.

The basic idea of a knapsack cipher is to select an instance of the subset sum problem

that is easy to solve, and then disguise it as an instance of the general subset sum problem

which is hopefully difﬁcult to solve [21]. The original (easy) knapsack set can serve as the

private key, while the transformed knapsack set serves as the public key.

The Merkle-Hellman knapsack cipher is important as it was the ﬁrst implementation

of a public-key encryption scheme. Many variations on it have been suggested but most,

including the original, have been demonstrated to be insecure [21]. A notable exception to

this is the Chor-Rivest knapsack system [21].


3.4.1 Merkle-Hellman Knapsack cipher

The Merkle-Hellman knapsack cipher attempts to disguise an easily solved instance of the subset sum problem, called a superincreasing subset sum problem, by modular multiplication and a permutation [21]. A superincreasing sequence is a sequence (b_1, b_2, . . . , b_n) of positive integers with the property that b_i > Σ_{j=1}^{i−1} b_j, for each i, 2 ≤ i ≤ n.

The integer n is a common system parameter. (b_1, b_2, . . . , b_n) is the superincreasing sequence and M is the modulus, selected such that M > b_1 + b_2 + . . . + b_n. W is a random integer, such that 1 ≤ W ≤ M − 1 and gcd(W, M) = 1. π is a random permutation of the integers {1, 2, . . . , n}. The public key of A is (a_1, a_2, . . . , a_n), where a_i = W b_π(i) mod M, and the private key of A is (π, M, W, (b_1, b_2, . . . , b_n)).

The steps in the transmission of a message m are as follows (B is the sender, A is the

receiver) [21]:

1. Encryption - B should do the following:

• Obtain A’s authentic public key (a_1, a_2, . . . , a_n)

• Represent the message m as a binary string of length n, m = m_1 m_2 . . . m_n

• Compute the integer c = m_1 a_1 + m_2 a_2 + . . . + m_n a_n

• Send the ciphertext c to A

2. Decryption - To recover the plaintext m from c, A should do the following:

• Compute d = W^−1 c mod M

• By solving a superincreasing subset sum problem, find the integers r_1, r_2, . . . , r_n, r_i ∈ {0, 1}, such that d = r_1 b_1 + r_2 b_2 + . . . + r_n b_n

• The message bits are m_i = r_π(i), i = 1, 2, . . . , n

A small example is as follows [21]:


1. Let n = 6

2. Entity A chooses the superincreasing sequence (12, 17, 33, 74, 157, 316)

3. M = 737

4. W = 635

5. The permutation π of ¦1, 2, 3, 4, 5, 6¦ is deﬁned as:

6. π(1) = 3, π(2) = 6, π(3) = 1, π(4) = 2, π(5) = 5, π(6) = 4

7. A’s public key is the knapsack set (319, 196, 250, 477, 200, 559)

8. A’s private key is (π, M, W, (12, 17, 33, 74, 157, 316))

9. To encrypt the message m = 101101, B computes c as:

10. c = 319 + 250 + 477 + 559 = 1605

11. and sends it to A

1. To decrypt, A computes

2. d = W^−1 c mod M = 136

3. and solves the superincreasing subset sum problem of:

4. 136 = 12r_1 + 17r_2 + 33r_3 + 74r_4 + 157r_5 + 316r_6

5. to get 136 = 12 + 17 + 33 + 74

6. Therefore, r_1 = 1, r_2 = 1, r_3 = 1, r_4 = 1, r_5 = 0, r_6 = 0

7. Applying the permutation π produces the message bits:

8. m_1 = r_3 = 1, m_2 = r_6 = 0, m_3 = r_1 = 1, m_4 = r_2 = 1, m_5 = r_5 = 0, m_6 = r_4 = 1
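The worked example can be checked with a short Python sketch (the greedy loop solves the superincreasing problem; `pow(W, -1, M)` computes W^−1 mod M and needs Python 3.8+):

```python
# Parameters from the Merkle-Hellman example in the text.
b = [12, 17, 33, 74, 157, 316]   # superincreasing private sequence
M, W = 737, 635
pi = [3, 6, 1, 2, 5, 4]          # pi(i), 1-based

# Public key: a_i = W * b_{pi(i)} mod M.
a = [(W * b[pi[i] - 1]) % M for i in range(6)]

def encrypt(bits):
    return sum(a[i] for i, m in enumerate(bits) if m)

def decrypt(c):
    d = (pow(W, -1, M) * c) % M
    # Greedy solve of the superincreasing subset sum problem.
    r = [0] * 6
    for i in reversed(range(6)):
        if d >= b[i]:
            r[i], d = 1, d - b[i]
    # Undo the permutation: m_i = r_{pi(i)}.
    return [r[pi[i] - 1] for i in range(6)]

print(a)  # [319, 196, 250, 477, 200, 559]
```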


3.4.2 Chor-Rivest Knapsack cipher

The Chor-Rivest cipher is the only known knapsack public-key system that does not use

some form of modular multiplication to disguise an easy subset sum problem [21]. This

cipher is by far the most sophisticated of those attacked by a genetic algorithm. When the

parameters of this system are carefully chosen, and no information is leaked, there is no feasible attack on the Chor-Rivest system. It cannot be covered in detail in a few pages, so only a summary of the system is given here. Further number-theoretic algorithms are needed for the set-up and operation of the cipher.

Most of the parameters are based on a finite field F_q of characteristic p, where q = p^h, p ≥ h, and for which the discrete logarithm problem is feasible [21]. f(x) is a random monic irreducible polynomial of degree h over Z_p. The elements of F_q are represented as polynomials in Z_p[x] of degree less than h, with multiplication done modulo f(x). g(x) is a random primitive element of F_q. For each ground field element i ∈ Z_p, the discrete logarithm a_i = log_g(x)(x + i) of the field element (x + i) to the base g(x) must be found. The random permutation π is on the set of integers {0, 1, 2, . . . , p − 1}, and the random integer d is selected such that 0 ≤ d ≤ p^h − 2. c_i is then computed as c_i = (a_π(i) + d) mod (p^h − 1), 0 ≤ i ≤ p − 1. A’s public key is ((c_0, c_1, . . . , c_{p−1}), p, h) and its private key is (f(x), g(x), π, d).

A message m is transmitted as follows (B is the sender, A is the receiver) [21]:

1. Encryption - B should do the following:

• Obtain A’s authentic public key ((c_0, c_1, . . . , c_{p−1}), p, h)

• Represent the message m as a binary string of length ⌊lg C(p, h)⌋, where C(p, h) is a binomial coefficient

• Consider m as the binary representation of an integer

• Transform this integer into a binary vector M = (M_0, M_1, . . . , M_{p−1}) of length p having exactly h 1’s as follows:

(a) Set l ← h

(b) For i from 1 to p do the following:

(c) If m ≥ C(p − i, l) then set M_{i−1} ← 1, m ← m − C(p − i, l), l ← l − 1

(d) Otherwise, set M_{i−1} ← 0

(e) Note: C(n, 0) = 1 for n ≥ 0, and C(0, l) = 0 for l ≥ 1

• Compute c = Σ_{i=0}^{p−1} M_i c_i mod (p^h − 1)

• Send the ciphertext c to A

2. Decryption - To recover the plaintext m from c, A should do the following:

• Compute r = (c − hd) mod (p^h − 1)

• Compute u(x) = g(x)^r mod f(x)

• Compute s(x) = u(x) + f(x), a monic polynomial of degree h over Z_p (Z_p is the set of integers {0, . . . , p − 1}; addition, subtraction, and multiplication in Z_p are performed modulo p)

• Factor s(x) into linear factors over Z_p: s(x) = Π_{j=1}^{h} (x + t_j), where t_j ∈ Z_p

• Compute a binary vector M = (M_0, M_1, . . . , M_{p−1}) as follows:

(a) The components of M that are 1 have indices π^−1(t_j), 1 ≤ j ≤ h

(b) The remaining components are 0

• The message m is recovered from M as follows:

(a) Set m ← 0, l ← h

(b) For i from 1 to p do the following:

(c) If M_{i−1} = 1, then set m ← m + C(p − i, l) and l ← l − 1

An example using artiﬁcially small parameters would go as follows [21]:

1. Entity A selects p = 7 and h = 4

2. Entity A selects the irreducible polynomial f(x) = x^4 + 3x^3 + 5x^2 + 6x + 2 of degree 4 over Z_7. The elements of the finite field F_{7^4} are represented as polynomials in Z_7[x] of degree less than 4, with multiplication performed modulo f(x).


3. Entity A selects the random primitive element g(x) = 3x^3 + 3x^2 + 6

4. Entity A computes the following discrete logarithms:

• a_0 = log_g(x)(x) = 1028

• a_1 = log_g(x)(x + 1) = 1935

• a_2 = log_g(x)(x + 2) = 2054

• a_3 = log_g(x)(x + 3) = 1008

• a_4 = log_g(x)(x + 4) = 379

• a_5 = log_g(x)(x + 5) = 1780

• a_6 = log_g(x)(x + 6) = 223

5. Entity A selects the random permutation π on ¦0, 1, 2, 3, 4, 5, 6¦ deﬁned by π(0) =

6, π(1) = 4, π(2) = 0, π(3) = 2, π(4) = 1, π(5) = 5, π(6) = 3

6. Entity A selects the random integer d = 1702

7. Entity A computes:

• c_0 = (a_6 + d) mod 2400 = 1925

• c_1 = (a_4 + d) mod 2400 = 2081

• c_2 = (a_0 + d) mod 2400 = 330

• c_3 = (a_2 + d) mod 2400 = 1356

• c_4 = (a_1 + d) mod 2400 = 1237

• c_5 = (a_5 + d) mod 2400 = 1082

• c_6 = (a_3 + d) mod 2400 = 310

8. Entity A’s public key is ((c_0, c_1, c_2, c_3, c_4, c_5, c_6), p = 7, h = 4), and its private key is (f(x), g(x), π, d).


9. To encrypt a message m = 22 for A, entity B does the following:

(a) Obtains A’s authentic public key ((c_0, c_1, c_2, c_3, c_4, c_5, c_6), p, h)

(b) Represents m as a binary string of length 5: m = 10110 (note that ⌊lg C(7, 4)⌋ = 5)

(c) Transforms m to the binary vector M = (1, 0, 1, 1, 0, 0, 1) of length 7

(d) Computes c = (c_0 + c_2 + c_3 + c_6) mod 2400 = 1521

(e) Sends c = 1521 to A

10. To decrypt the ciphertext c = 1521, A does the following:

(a) Computes r = (c − hd) mod 2400 = 1913

(b) Computes u(x) = g(x)^1913 mod f(x) = x^3 + 3x^2 + 2x + 5

(c) Computes s(x) = u(x) + f(x) = x^4 + 4x^3 + x^2 + x

(d) Factors s(x) = x(x + 2)(x + 3)(x + 6) (so t_1 = 0, t_2 = 2, t_3 = 3, t_4 = 6)

(e) The components of M that are 1 have indices π^−1(0) = 2, π^−1(2) = 3, π^−1(3) = 6, π^−1(6) = 0. Hence, M = (1, 0, 1, 1, 0, 0, 1)

(f) Transforms M to the integer m = 22, recovering the plaintext
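The integer-to-vector transform used in the encryption steps, and its inverse, can be sketched in Python; with p = 7, h = 4 it reproduces the example's M = (1, 0, 1, 1, 0, 0, 1) for m = 22 (function names are illustrative):

```python
from math import comb

def int_to_vector(m, p, h):
    """Map an integer to a length-p binary vector with exactly h ones."""
    M, l = [], h
    for i in range(1, p + 1):
        if m >= comb(p - i, l):
            M.append(1)
            m -= comb(p - i, l)
            l -= 1
        else:
            M.append(0)
    return M

def vector_to_int(M, p, h):
    """Inverse transform: recover the integer from the binary vector."""
    m, l = 0, h
    for i in range(1, p + 1):
        if M[i - 1] == 1:
            m += comb(p - i, l)
            l -= 1
    return m

print(int_to_vector(22, 7, 4))  # [1, 0, 1, 1, 0, 0, 1]
```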

To illustrate how small the above parameters are, the parameters originally suggested

for the Chor-Rivest system are [21]:

• p = 197 and h = 24

• p = 211 and h = 24

• p = 3^5 and h = 24

• p = 2^8 and h = 25


3.5 Vernam cipher

The Vernam cipher is a stream cipher defined on the alphabet A = {0, 1}. A binary message m_1 m_2 . . . m_t is operated on by a binary key string k_1 k_2 . . . k_t of the same length to produce a ciphertext string c_1 c_2 . . . c_t, where c_i = m_i ⊕ k_i, 1 ≤ i ≤ t [21]. If the key string is randomly chosen and never used again, this cipher is called a one-time system or one-time pad.

The one-time pad can be shown to be theoretically unbreakable [21]. If a cryptanalyst has a ciphertext string c_1 c_2 . . . c_t encrypted using a random, non-reused key string, the cryptanalyst can do no better than guessing the plaintext from among all binary strings of length t.

It has been proven that an unbreakable Vernam system requires a random key of the same

length as the message [21].
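A minimal sketch of the Vernam operation on bytes (an illustration only; a true one-time pad additionally requires secure key distribution and strict non-reuse):

```python
import secrets

def otp_encrypt(message: bytes):
    # Random key of the same length as the message, never to be reused.
    key = secrets.token_bytes(len(message))
    cipher = bytes(m ^ k for m, k in zip(message, key))
    return cipher, key

def otp_decrypt(cipher: bytes, key: bytes):
    # XOR is its own inverse: c_i ^ k_i = m_i.
    return bytes(c ^ k for c, k in zip(cipher, key))

c, k = otp_encrypt(b'attack at dawn')
assert otp_decrypt(c, k) == b'attack at dawn'
```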

3.6 Comments

The above ciphers comprise all those cryptosystems attacked using a genetic algorithm in

a fairly exhaustive search of the literature. The newest of these systems are the knapsack

ciphers, with the Chor-Rivest system published in 1985, and the Merkle-Hellman system

published in 1978 [21]. The other ciphers attacked are all older systems, in some cases centuries older. None of the modern ciphers, such as DES, AES, RSA, or elliptic curve systems, has even been attempted using a genetic algorithm based approach. The only ciphers that are considered even remotely mathematically difficult are the knapsack systems, and even they are not true number-theoretic protocols. This gap leaves many of the genetic algorithm attacks with little importance in the field of cryptanalysis.


Chapter 4

Relevant Prior Works

While there have been many papers written on genetic algorithms and their application to

various problems, there are relatively few papers that apply genetic algorithms to cryptanal-

ysis. The earliest papers that use a genetic algorithm approach for a cryptanalysis problem

were written in 1993, almost twenty years after the primary paper on genetic algorithms by

John Holland [12] [14]. These papers focus on the simplest classical and modern ciphers,

and are cited frequently by later works.

Figures 4.1 through 4.4 show which ciphers are attacked by which authors.

Substitution

Monoalphabetic Polyalphabetic

Spillman et al. 1993

Clark 1994

Clark & Dawson 1998

Grundlingh & Van Vuuren 2002

Clark et al. 1996

Clark & Dawson 1997

Figure 4.1: Substitution Cipher Family and Attacking Papers


Permutation/Transposition

Permutation Transposition

Clark 1994

Matthews 1993

Grundlingh & Van Vuuren 2002

Figure 4.2: Permutation Cipher Family and Attacking Papers

Knapsack

Merkle-Hellman

Chor-Rivest

Spillman 1993

Clark, Dawson, Bergen 1996

Kolodziejczuk 1997

Yaseen, Sahasrabuddhe 1999

Figure 4.3: Knapsack Cipher Family and Attacking Papers

Vernam

Lin, Kao 1995

Figure 4.4: Vernam Cipher Family and Attacking Paper


4.1 Substitution: Spillman et al. - 1993

The first paper published in 1993, by Spillman, Janssen, Nelson and Kepner [26], focuses

on the cryptanalysis of a simple substitution cipher using a genetic algorithm. This paper

selects a genetic algorithm approach so that a directed random search of the key space

can occur. The paper begins with a review of genetic algorithms, covering the genetic

algorithm life cycle, and the systems needed to apply a genetic algorithm. The systems

are considered to be a key representation, a mating scheme, and a mutation scheme. The

selected systems are then discussed. The key representation is an ordered list of alphabet

characters, where the ﬁrst character in the list is the plaintext character for the most frequent

ciphertext character, the second character is the plaintext character for the second most

frequent ciphertext character, and so on. The mating scheme randomly selects two keys for

mating, weighted in favor of keys with a high ﬁtness value. The two parent keys create two

child keys by selecting from each slot in the parents the more frequent ciphertext character.

The ﬁrst child key is created by scanning left to right, and the second by scanning right to

left. If the more frequent character is already in the child, then the character from the other

parent is selected. The mutation scheme consists of swapping two characters in the key

string. The fitness function uses unigrams and digrams, and is as follows:

Fitness = (1 − {Σ_{i=1}^{26} |SF[i] − DF[i]| + Σ_{i=1}^{26} Σ_{j=1}^{26} |SDF[i, j] − DDF[i, j]|}/4)^8    (4.1.1)

The function represents a normalized error value. Larger ﬁtness values represent smaller

errors. Sensitivity to small differences is increased by raising the result to the 8th power,

while sensitivity to large differences is decreased by the division of the summation terms

by 4. SF[i] is the standard frequency of character i in English plaintext, while DF[i] is

the measured frequency of the decoded character i in the ciphertext. This makes the ﬁrst

summation term equal to the sum of the differences between each character’s standard fre-

quency and the frequency of the ciphertext character that decodes to that character. SDF

is the standard frequency for a digram and DDF the measured frequency. When the mea-

sured and standard frequencies are the same, the summation terms equal zero, making the


ﬁtness value equal one. This ﬁtness function is bounded below by zero, but cannot evaluate

to zero. Since high fitness values are more important, this does not affect the overall result.
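The fitness function above can be sketched in Python; SF/SDF hold the standard English unigram and digram frequencies, DF/DDF the frequencies measured in the trial decryption (array names follow the text; the code is an illustration, not the paper's implementation):

```python
def fitness(SF, DF, SDF, DDF):
    """Normalized-error fitness: 1 means a perfect frequency match."""
    uni = sum(abs(SF[i] - DF[i]) for i in range(26))
    di = sum(abs(SDF[i][j] - DDF[i][j]) for i in range(26) for j in range(26))
    # Division by 4 damps large differences; the 8th power sharpens small ones.
    return (1 - (uni + di) / 4) ** 8
```

With a perfect key the measured frequencies equal the standard ones, both sums vanish, and the fitness is exactly 1.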

The complete algorithm used in this work is:

1. Generate a random population of key strings

2. Determine a ﬁtness value for each key in the population

3. Do a biased random selection of parents

4. Apply the crossover operation to the parents

5. Apply the mutation process to the children

6. Scan the new population of key strings and update the list of the 10 “best” keys seen

The process stops after a ﬁxed number of generations, at which point, the best keys are

used to decipher the ciphertext. Results are given for 100 generation runs of the algorithm,

using populations of 10 and 50 keys, and mutation rates of 0.05, 0.10, 0.20, and 0.40. In

the smaller population size, the exact key is found after searching less than 4×10^−23 of

the key space, or after the examination of about 1000 keys. In the larger population size,

only about 20 generations are needed to examine 1000 keys. The best mutation rates for

this experiment were those less than 0.20.

4.1.1 Comments

Overall, this paper is a good choice for re-implementation and comparison. It is an early

work, so does not compare its results to others, but provides a decent level of result report-

ing and discussion, making comparison possible. It presents the basic genetic algorithm

concepts clearly, and uses examples and diagrams to clarify important points.

• Typical results:

– The exact key is not always found


– Visual inspection of the plaintext is required most of the time to determine the

exact key

• Re-implementation approach:

– The same parameters are re-used

– Additional parameters are used to expand the range of parameter values

• Observed cryptanalytic usefulness:

– None - no exact keys found in testing

• Comment on result discrepancy:

– Different measures of success were used

4.2 Transposition: Matthews - 1993

The second paper published in 1993 [20], by R. A. J. Matthews, uses an order-based ge-

netic algorithm to attack a simple transposition cipher. The paper begins by discussing the

archetypal genetic algorithm. This includes the typical genetic algorithm components, as

well as the two characteristics a problem must have in order for it to be attackable by a

genetic algorithm. The ﬁrst characteristic is that a partially-correct chromosome must have

a higher ﬁtness than an incorrect chromosome. The second characteristic is an encoding for

the problem that allows genetic operators to work. The next section of the paper discusses

the cryptanalytic use of the genetic algorithm. It concludes that substitution, transposition,

permutation, and other ciphers are attackable by a genetic algorithm, while ciphers such as

DES (Data Encryption Standard), which is a Feistel cipher, are not attackable.

The cipher selected for attack in this work is a classical two-stage transposition cipher.

Here, a key of length N takes the form of a permutation of the integers 1 to N. The

plaintext, L characters long, is written beneath the key to form a matrix N characters wide

by (at least) ⌊L/N⌋ characters deep. The second stage consists of reading the text off


the columns in the order indicated by the integers in the key. Decryption consists of writing

down the key, determining the length of the columns, writing the ciphertext back into the

matrix in sequence, and reading across the columns. Breaking this cipher is typically a two-

stage process as well. The stages are determining the length of the transposition sequence

(N), and then ﬁnding the permutation of the N integers.

The genetic algorithm system used to attack the transposition cipher is known as GEN-

ALYST. It incorporates all the standard genetic algorithm features, with the addition of the

items needed for an order-based approach. The GENALYST system uses a ﬁtness scoring

system based on the number of times common English digrams and trigrams appear in the

decrypted text. Ten digram and trigrams were used - TH, HE, IN, ER, AN, ED, THE, ING,

AND, and EEE. The presence of EEE resulted in a subtraction of ﬁtness points, as it prac-

tically never appears in English. This process was checked for correlation between ﬁtness

value and correctness of chromosome, to improve conﬁdence in GENALYST. GENALYST

selects the ﬁttest chromosomes using a roulette wheel approach based on normalized ﬁtness

values. This means that the probability of a chromosome getting selected is proportional

to its normalized ﬁtness value. “Elitism” is present to ensure that the ﬁttest or most elite

chromosomes found so far are always included in the breeding pool. After selection occurs,

breeding is done with a proportion of the remaining chromosomes. The proportion used

decreases linearly as the search continues, in order to help optimize the effectiveness of

GENALYST. Breeding is done using the position based crossover method for order-based

genetic algorithms. In this method, a random bit string of the same length as the chromo-

somes is used. The random bit string indicates which elements to take from parent #1 by a

1 in the string, and the elements omitted from parent #1 are added in the order they appear

in parent #2. The second child is produced by taking elements from parent #2 indicated by

a 0 in the string, and ﬁlling in from parent #1 as before. Two possible mutation types may

then be applied. The ﬁrst type is a simple two-point mutation where two random chromo-

some elements are swapped. The second type is a shufﬂe mutation, where the permutation

is shifted forward by a random number of places.
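The position-based crossover described above can be sketched in Python (an interpretation of the description, not GENALYST's actual code; a fixed mask is shown for reproducibility):

```python
import random

def position_crossover(p1, p2, mask=None):
    """Order-based crossover: a 1 in the mask keeps that element of parent #1;
    omitted elements are filled in the order they appear in the other parent."""
    n = len(p1)
    if mask is None:
        mask = [random.randint(0, 1) for _ in range(n)]
    def child(keep, fill, bit):
        kept = [g if m == bit else None for g, m in zip(keep, mask)]
        filler = iter(g for g in fill if g not in kept)
        return [g if g is not None else next(filler) for g in kept]
    # Child 1 keeps mask-1 positions of p1; child 2 keeps mask-0 positions of p2.
    return child(p1, p2, 1), child(p2, p1, 0)

c1, c2 = position_crossover([1, 2, 3, 4, 5, 6], [3, 5, 1, 6, 4, 2],
                            mask=[1, 0, 1, 0, 1, 0])
print(c1, c2)  # [1, 6, 3, 4, 5, 2] [1, 5, 3, 6, 4, 2]
```

Both children are valid permutations, which is the property an order-based representation must preserve.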


A test procedure is given to ensure that test texts have a true ﬁtness value that is repre-

sentative of English text. This is calculated mathematically, based on digram/trigram per-

centage frequency in the text, the ﬁtness score given by GENALYST to that di- or tri-gram,

and the summation over all di- and tri-grams checked. The text length is also important, as

it is easier to attack texts with a length that is an integer multiple of the key length. Based

on these considerations, a test text with a length of 181 characters was selected. This length

is prime, but close to multiples of 6, 9, 10, and 12. This is a worst-case scenario since not

all column lengths are the same, and multiple possibilities for the key length are present.

A Monte Carlo search is run on the same text so that the genetic algorithm results can be

compared to the results from a traditional search technique. In a Monte Carlo search, a

series of guesses is made to determine the correct key, and the most elite chromosome is

kept.

The breaking of a transposition cipher has two parts: ﬁnding the correct keylength, and

ﬁnding the correct permutation of the key. GENALYST was tested on both parts. For the

ﬁrst part, the test text was encrypted using a target key. GENALYST was then used to

ﬁnd the best decryption of the text, using seven different keylengths in the range of 6 to

12, which includes the length of the target key. The goal of this section was to see if the

ﬁttest of the seven chromosomes had the same length as the target key. The population size

used was 20, and the number of generations was 25. The probability of crossover started

at 0.8 and decreased to 0.5. Mutation probability started at 0.1 for both point mutation and

shufﬂe mutation, increasing to 0.5 for point mutation and to 0.8 for shufﬂe mutation. The

experiment was run ﬁve times each for three target keylengths, K = 7, 9, 11. GENALYST

was successful in 13 of the 15 runs (87%). However, the random Monte Carlo algorithm

performed equally well in this task, also producing a success rate of 87%. For the second

part, the same parameters of population size and number of generations were used. GEN-

ALYST managed to completely break the cipher in one of the runs for K = 7 and one of

the runs for K = 9. Overall, GENALYST involves little more computational effort than

the Monte Carlo algorithm, since the single largest task, ﬁtness rating, is present in both.


Therefore, a genetic algorithm approach is useful even when only a small advantage is

produced. Using a larger gene pool and number of generations boosts GENALYST’s per-

formance signiﬁcantly over the performance of Monte Carlo. The greatest improvement in

GENALYST’s performance comes from hybridizing the algorithm with other techniques,

such as perming.

4.2.1 Comments

Overall, this is one of the best reference papers available. It is an early work, but is very

complete. It covers background material and problem considerations well, and has a very

balanced approach to considering the genetic algorithm as a possible solution. The param-

eters are clearly reported, as are the algorithms used. This is an excellent candidate for

re-implementation and comparison.

• Typical results:

– Keylength correctly determined 87% of the time

– Correct key found 13.33% of the time

• Re-implementation approach:

– Different text lengths used, same parameters re-used

– Additional parameters are used to expand the range of parameter values

• Observed cryptanalytic usefulness:

– Low - low decryption percentage

• Comment on result discrepancy:

– Percentages reported are based on number of tests and different numbers of

tests were used


4.3 Merkle-Hellman Knapsack: Spillman - 1993

The third paper published in 1993, by R. Spillman [25], applies a genetic algorithm ap-

proach to a Merkle-Hellman knapsack system. This paper considers the genetic algorithm

as simply another possibility for the cryptanalysis of knapsack ciphers. A knapsack cipher

is based on the NP-complete problem of knapsack packing, which is an application of the

subset sum problem. The problem statement is: “Given n objects each with a known vol-

ume and a knapsack of ﬁxed volume, is there a subset of the n objects which exactly ﬁlls

the knapsack?”. The idea behind the knapsack cipher is that finding the subset that yields a given sum is hard, and

this fact can be used to protect an encoding of information. There must also be a decod-

ing algorithm that is relatively easy for the intended recipient to apply. An easy decoding

algorithm can be made by using the easy knapsacks. An easy knapsack consists of a se-

quence of numbers that is superincreasing, i.e., each number is greater than the sum of the

previous numbers. However, an easy knapsack does not protect the data if the elements of

the knapsack are known. A Merkle-Hellman knapsack system converts an easy knapsack

into a trapdoor knapsack. A trapdoor knapsack is not superincreasing, making it harder

to attack. The Merkle-Hellman procedure to create a trapdoor knapsack sequence A is as

follows:

1. Given a simple knapsack sequence A′ = (a′_1, . . . , a′_n)

2. Select an integer u > 2a′_n

3. Select another integer w such that gcd(u, w) = 1

4. Find the inverse of w mod u

5. Construct the trapdoor sequence A = wA′ mod u

Once the easy knapsack has been converted into a trapdoor knapsack, both the easy knapsack and the value of w^−1 must be known to decode the data. The target sum for the superincreasing sequence is calculated by multiplying the sum of the trapdoor sequence and w^−1 together and then reducing it modulo u. This system can become a public-key cipher, with the trapdoor sequence the public key. The private key would be the superincreasing sequence and the values of u, w, and w^−1.

The next section of the paper introduces genetic algorithms using binary chromosomes.

The applicable processes of selection, mating, and mutation are described. Selection is a

random process which is biased so that the chromosomes with higher ﬁtness values are

more likely to be selected. The mating process is a simple crossover operation, i.e., bits

s + 1 to r of parent #1 are swapped with bits s + 1 to r of parent #2. Mutation consists of

switching a bit in the chromosome to its complement, i.e., a 0 becomes a 1.

The three systems that need to be deﬁned for a genetic algorithm application are: rep-

resentation scheme, mating process, and mutation process. The representation used here

is that of binary chromosomes, which represent the summation terms of the knapsack, i.e.,

a 1 in the chromosome indicates that that term should be included when calculating the

knapsack sum. The evaluation function is determined once a representation scheme has

been selected. Here, the function measures how close a given sum of terms is to the target

sum of the knapsack. Other properties included for this experiment are:

• Function range between 0 and 1, with 1 indicating an exact match

• Chromosomes that produce a sum greater than the target should have a lower ﬁtness,

in general, than those that produce a sum less than the target

• It should be difﬁcult to produce a high ﬁtness value

The actual evaluation function used depends on the relationship between the target sum

and the chromosome sum. The function is determined as follows:

1. Calculate the maximum difference that could occur between a chromosome and the

target sum

MaxDiff = max(Target, FullSum−Target) (4.3.1)

where FullSum is the sum of all components in the knapsack


2. Determine the Sum of the current chromosome

3. If Sum ≤ Target then

Fitness = 1 − (|Sum − Target|/Target)^(1/2)    (4.3.2)

4. If Sum > Target then

Fitness = 1 − (|Sum − Target|/MaxDiff)^(1/6)    (4.3.3)

As described above, the mating process is simple crossover. The mutation process

consists of three different possibilities. About half the time, bits are randomly mutated as

described above (switched to their complement). Next most likely is the swapping of a

bit with a neighboring bit. The ﬁnal possibility is a reversal of the bit order between two

points.
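The fitness computation can be sketched in Python; the square-root and sixth-root exponents are this sketch's reading of the paper's two cases, and the bit-string representation follows the text:

```python
def fitness(bits, knapsack, target):
    """Fitness of a binary chromosome against a knapsack target sum."""
    s = sum(a for a, b in zip(knapsack, bits) if b)
    full_sum = sum(knapsack)
    # Equation (4.3.1): largest possible distance from the target.
    max_diff = max(target, full_sum - target)
    if s <= target:
        return 1 - (abs(s - target) / target) ** 0.5
    # Overshooting sums are penalized more heavily.
    return 1 - (abs(s - target) / max_diff) ** (1 / 6)
```

An exact match gives a fitness of 1; any overshoot falls under the harsher second case, matching the stated design property.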

A description of the complete algorithm is given, as well as results for one 15-item

knapsack applied to a 5-character text. The characters are encoded as 8-bit ASCII characters. Each of the 5 sums (one per character) was attacked using the genetic algorithm, with

an initial population of 20 random 15-bit binary strings.

4.3.1 Comments

This paper is reasonable for its length (about 10 pages) and year. A minimal level of experi-

mentation is included but the results and parameters are decently reported. The introductory

material and discussion are reasonable, although short. Re-implementation should not be

difﬁcult, and there are sufﬁcient results to make a comparison possible. Overall, this paper

is a reasonable choice for re-implementation and comparison.

• Typical results:

– Used 5 ASCII characters, each of which was decrypted

• Re-implementation approach:


– Not re-implemented

• Observed cryptanalytic usefulness:

– None - the knapsack size is trivial

• Comment on result discrepancy:

– No comment, not re-implemented

4.4 Substitution/Permutation: Clark - 1994

The only paper published in 1994, by Andrew Clark [4], includes the genetic algorithm as

one of three optimization algorithms applied to cryptanalysis. The other two algorithms are

tabu search and simulated annealing. The ﬁrst section of the paper describes the two ciphers

used - simple substitution and permutation. Next, suitability assessment is discussed. This

section is where the ﬁtness functions are developed. For the substitution cipher, the ﬁtness

function is

F_k = (α Σ_{i∈Λ} |SF[i] − DF[i]| + β Σ_{i∈Λ} Σ_{j∈Λ} |SDF[i][j] − DDF[i][j]|)   (4.4.1)

Λ denotes the set of characters in the alphabet - (A, B, C, . . . , Z, space). SF[i] is the

relative frequency of character i in English, while DF[i] is the relative frequency of the

decoded character i in the message decrypted using the key k. SDF[i][j] is the relative

frequency of the digram ij in English, while DDF[i][j] is the relative frequency of that

digram in the message decrypted using k. Varying α and β allows one to favor either the

unigram or digram frequencies. For the permutation cipher, the same approach was used

as in [20] - the scoring table for digrams and trigrams.
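This style of cost function can be sketched in Python (names are ours; the reference frequency tables are assumed to be supplied as dictionaries, and lower values indicate a closer statistical match to English):

```python
from collections import Counter

def substitution_cost(decrypted, sf, sdf, alpha=0.5, beta=0.5):
    """Weighted unigram/digram frequency discrepancy, in the spirit of (4.4.1)."""
    n = len(decrypted)
    # Observed unigram and digram relative frequencies of the decryption
    df = Counter(decrypted)
    ddf = Counter(zip(decrypted, decrypted[1:]))
    uni = sum(abs(sf.get(c, 0.0) - df[c] / n) for c in set(sf) | set(df))
    dig = sum(abs(sdf.get(d, 0.0) - ddf[d] / (n - 1))
              for d in set(sdf) | set(ddf))
    return alpha * uni + beta * dig
```

A decryption whose statistics exactly match the reference tables scores 0; increasing α favors unigram agreement, increasing β favors digram agreement.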

The next three sections introduce the basic concepts of simulated annealing, genetic

algorithms, and tabu search. The genetic algorithm section is fairly standard, and con-

tains much the same material as earlier papers. The section is short and does not give any


speciﬁcs about the algorithm used. The ﬁnal section in the paper discusses the results. It is

fairly short, and does not give many details.

4.4.1 Comments

Overall, this paper is too short (5 pages) to be useful. It is well written, but lacks details.

The tiny genetic algorithm section and few reported results make it difficult to re-implement

or even to compare against. This paper is not a good candidate for further use.

• Typical results:

– Convergence rates and computation time are the only reported results

• Re-implementation approach:

– Parameters were not reported

– Standard parameter values used in other re-implementations were used

– Only the permutation attack was re-implemented as a later work from this au-

thor was used for the substitution attack

• Observed cryptanalytic usefulness:

– Medium - the correct key was found with a high success rate

• Comment on result discrepancy:

– No real discrepancy, different combination of metrics used

4.5 Vernam: Lin, Kao - 1995

This is the only paper published in 1995. By Feng-Tse Lin and Cheng-Yan Kao, [18] is a

ciphertext-only attack on a Vernam cipher. The paper begins by introducing cryptography

and cryptanalysis, including attack and cryptosystem types, as well as the Vernam cipher

itself. The Vernam cipher is a one-time pad style system. Let M = m_1, m_2, . . . , m_n denote a plaintext bit stream and K = k_1, k_2, . . . , k_r a key bit stream. The Vernam cipher generates a ciphertext bit stream C = E_k(M) = c_1, c_2, . . . , c_n, where c_i = (m_i + k_i) mod p and p is a base. Since this is a one-key (symmetric or private-key) cryptosystem, the decryption formula uses the same key bit stream K, only M = D_k(C) = m_1, m_2, . . ., where m_i = (c_i − k_i) mod p. The proposed approach is to determine K from an intercepted ciphertext C, and use it to break the cipher.
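The mod-p system as described (which, as the next paragraph notes, is not the true Vernam cipher) can be sketched as follows; cycling a short key stream over the message is our assumption for the case r < n:

```python
def encrypt(m, k, p=2):
    # c_i = (m_i + k_i) mod p; the key stream repeats if shorter than M
    return [(mi + k[i % len(k)]) % p for i, mi in enumerate(m)]

def decrypt(c, k, p=2):
    # m_i = (c_i - k_i) mod p
    return [(ci - k[i % len(k)]) % p for i, ci in enumerate(c)]
```

For p = 2 the scheme reduces to XOR of the message bits with the (repeated) key bits.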

Since the described cryptosystem is not the Vernam cipher, and there are many errors in

this paper, both in notation and logic, this application needs no further discussion. The ge-

netic algorithm approach is rendered invalid by the errors present in the reported approach

and assumptions.

4.5.1 Comments

This paper is of below average quality. It is rather short, and was written for the IEEE

International Conference on Systems, Man, and Cybernetics. There are some problems

with English grammar, possibly due to translation from the original language. The systems

are adequately described but are, however, incorrect. The cipher given in this paper is not

the Vernam cipher. A re-implementation is impossible due to the many errors present in

this work.

• Typical results:

– One run of the genetic algorithm is shown, with parameters

• Re-implementation approach:

– Not re-implemented

• Observed cryptanalytic usefulness:

– None - inconsistent notation, leading one to believe that the cipher attacked is

not the true Vernam cipher


• Comment on result discrepancy:

– No comment, not re-implemented

4.6 Merkle-Hellman Knapsack: Clark, Dawson, Bergen -

1996

This paper, by Clark, Dawson, and Bergen [7] is an extension of [25]. It contains a detailed

analysis of the ﬁtness function used in [25], as well as a modiﬁed version of the same ﬁtness

function. The paper begins by introducing the subset sum problem and previous work in

genetic algorithm cryptanalysis. The fitness function from [25] is then discussed. Since this function penalizes solutions whose sum is greater than the target sum for no clear reason, the function is altered to:

Fitness = 1 − (|Sum − Target| / MaxDiff)^(1/2)   (4.6.1)

The altered ﬁtness function is found to be more efﬁcient by running a genetic algorithm

attack on 100 different knapsack sums from the knapsack of size 15 used by Spillman

[25]. This function requires fewer generations, on average, and searches less of the key

space to ﬁnd a solution.

The genetic algorithm attack is then introduced and Spillman’s [25] attack summarized.

In this work, for each of three different knapsack sizes - 15, 20, and 25, 100 different

knapsack sums were formed, and the algorithm run until the solution was found. The

amount of key space searched is about half that of an exhaustive attack, which is not a

signiﬁcant gain for a large knapsack. The gain is insigniﬁcant due to the overhead of the

genetic algorithm approach. Also, the number of generations required, and the percentage

of the key space searched vary widely between runs.

This attack is poor because the ﬁtness function does not accurately describe the suit-

ability of a given solution. The Hamming distance between the proposed solution and

the true solution is not always indicated correctly by the ﬁtness function. A high ﬁtness


value does not necessarily mean a low Hamming distance. The distribution of Hamming

distances for a given range of ﬁtness values appears to be binomial-like. Therefore, a ﬁt-

ness function based on a difference of sums is not adequate when attempting to solve the

subset-sum problem with an optimization algorithm. Experimental results show that the

genetic algorithm gives little improvement over an exhaustive search in terms of solution

space covered, and, in fact, an exhaustive attack will be faster, on average, due to reduced

algorithmic complexity.
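The objection can be illustrated with a small example (a toy knapsack of our own choosing, scored with the altered fitness of (4.6.1)): two chromosomes can both reach a perfect fitness while differing in most of their bits.

```python
def fitness(bits, items, target):
    # Altered fitness (4.6.1): 1 - sqrt(|Sum - Target| / MaxDiff)
    max_diff = max(target, sum(items) - target)
    s = sum(a for a, b in zip(items, bits) if b)
    return 1 - (abs(s - target) / max_diff) ** 0.5

items, target = [2, 3, 5, 10], 5      # toy knapsack: {5} and {2, 3} both sum to 5
true_key = (0, 0, 1, 0)
imposter = (1, 1, 0, 0)
hamming = sum(a != b for a, b in zip(true_key, imposter))
# Both chromosomes score a perfect 1.0 yet differ in 3 of 4 bits.
```

Because equal sums yield equal fitness regardless of which bits produce them, a high fitness value carries no guarantee of a small Hamming distance to the true key.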

4.6.1 Comments

This paper clearly discusses the problems with an earlier work, [25]. It is well written

and gives a reasonable number of details. It is not, however, a stand-alone work. A re-implementation is possible but unnecessary, since the work establishes that the fitness function used is inappropriate. A comparison may be possible but is unlikely to be useful, since a new fitness function for the attack is needed.

• Typical results:

– Reports the number of generations and the key space searched

– These results are for three knapsack sizes (15, 20, 25), each attacked until the

solution was found

• Re-implementation approach:

– Not re-implemented

• Observed cryptanalytic usefulness:

– None - this work disproves the usefulness of an earlier work [25]

• Comment on result discrepancy:

– No comment, not re-implemented


4.7 Vigenère: Clark, Dawson, Nieuwland - 1996

This paper, by Clark, Dawson, and Nieuwland [8], is the ﬁrst to use a parallel genetic al-

gorithm for cryptanalysis. It attacks a polyalphabetic substitution cipher. The ﬁrst section

introduces the simple and polyalphabetic substitution ciphers, while the second section in-

troduces the genetic algorithm. The parallel approach chosen for this work is to have a

number of serial genetic algorithms, each working on a separate part of the problem. In this case,

since more than one key must be found, each processor will work on ﬁnding a different

key. Next, the application of the parallel genetic algorithm to the polyalphabetic substitu-

tion cipher is discussed. The ﬁtness function uses weighted unigram, digram, and trigram

frequencies as follows:

F(K) = w_1 Σ_{i=1..N} (K_1[i] − D_1[i])^2 + w_2 Σ_{i,j=1..N} (K_2[i,j] − D_2[i,j])^2 + w_3 Σ_{i,j,k=1..N} (K_3[i,j,k] − D_3[i,j,k])^2   (4.7.1)

Here, K is a single N-character key for a simple substitution cipher. K_1, K_2, and K_3 are the known unigram, digram, and trigram statistics, respectively. D_1, D_2, and D_3 are the statistics for the decrypted message, and w_1, w_2, and w_3 are weights chosen to sum to one and emphasize different statistics. Since computation of a fitness function including trigrams can be computationally intensive, this work starts out using only unigrams and digrams, and later adds in the trigram statistics.
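The weighted n-gram fitness can be sketched as follows (a sketch only; helper names are ours, and the reference statistics are assumed to be passed in as dictionaries keyed by n-gram):

```python
from collections import Counter

def ngram_stats(text, n):
    # Relative frequencies of the n-grams of a text
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return {g: c / len(grams) for g, c in Counter(grams).items()}

def weighted_fitness(decrypted, known, weights=(0.2, 0.3, 0.5)):
    # Weighted squared differences over unigram/digram/trigram statistics,
    # in the spirit of (4.7.1); lower values indicate a better key
    cost = 0.0
    for w, ref, n in zip(weights, known, (1, 2, 3)):
        obs = ngram_stats(decrypted, n)
        cost += w * sum((ref.get(g, 0.0) - obs.get(g, 0.0)) ** 2
                        for g in set(ref) | set(obs))
    return cost
```

Setting a weight to 0.0 skips that statistic's contribution, which mirrors the deferred use of trigrams described above.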

M processors in parallel are used, where M is the period of the polyalphabetic substi-

tution cipher. Each processor works on one of the M keys, and communicates with other

processors after a number of iterations. This communication involves sending the best key

to each of a processor’s neighbors, which allows digram and trigram statistics to be com-

puted. The mating function is borrowed from [26] and requires the same special ordering

of the key. Mutation has two possibilities, depending on the relative ﬁtness of the child key.

If the child has a ﬁtness greater than the median in the gene pool, each character is swapped

with the character to its right. If the child’s ﬁtness is less than the median, each character is


swapped with a randomly chosen character in the key. The selection mechanism used is to

combine all the children and all the keys from the previous generation together and select

the most ﬁt to be the new generation.
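The median-based mutation scheme described above might be sketched as follows (the names and the exact order of swaps are our assumptions; the source gives only the verbal description):

```python
import random

def mutate(key, pool_fitnesses, child_fitness):
    # Median-based mutation: neighbor swaps for children above the pool
    # median, random swaps otherwise
    key = list(key)
    median = sorted(pool_fitnesses)[len(pool_fitnesses) // 2]
    for i in range(len(key)):
        if child_fitness > median:
            j = (i + 1) % len(key)           # swap with the character to the right
        else:
            j = random.randrange(len(key))   # swap with a random position
        key[i], key[j] = key[j], key[i]
    return ''.join(key)
```

Both branches merely permute the key's characters, so a valid substitution key remains valid after mutation.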

Fixed parameters for the experiment include:

• A gene pool size of 40

• A period of 3

• 200 iterations of the genetic algorithm

• A mutation rate of 0.05

• 10 iterations between slave-to-slave communications

The ﬁrst test was done to determine the amount of known ciphertext needed to retrieve a

significant portion of the plaintext. A plot was created, with each point found by running the

algorithm on ten different encrypted messages, ﬁve times per message. In the ﬁve runs per

message, the random number generator was seeded differently every time. About 90% of

the original message was recovered using 600 known ciphertext characters per key. In this

test, the weights were w_1 = 0.4, w_2 = 0.6, and w_3 = 0.0 initially, and w_1 = 0.2, w_2 = 0.3, and w_3 = 0.5 after 100 iterations.

The second test was done to indicate the optimal choice of weights. The weights were chosen such that w_1, w_2, w_3 ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and w_1 + w_2 + w_3 = 1.0. There are 66 combinations that satisfy these conditions. The algorithm was run 20 times per combination on ten different messages, twice per message. Of the two runs per message, the best result was taken, with the ten best results then averaged together. The values of w_1, w_2, and w_3 were fixed for all 200 iterations of the algorithm. The best results occurred when w_3 = 0.8.


4.7.1 Comments

This paper is short but reasonably detailed. The algorithms are clearly deﬁned and the result

graphs are a nice touch. However, only the parallel case is considered, making the paper of

limited usefulness. Re-implementation and comparison are reasonably possible, both in a

serial and a parallel version. If re-implemented, the serial version would be created ﬁrst to

determine if parallelization is warranted.

• Typical results:

– Reports on the amount of known ciphertext required in order to regain a certain

percentage of the original message

– Reports on the optimal choice of ﬁtness function weights

• Re-implementation approach:

– Re-implemented in a serial version only

– Most of the original parameters were used in the re-implementation

– Additional parameters are used to expand the range of parameter values

• Observed cryptanalytic usefulness:

– None - no messages were successfully decrypted

• Comment on result discrepancy:

– Different conclusions based on the different metrics used

4.8 Vigenère: Clark, Dawson - 1997

This is the ﬁrst paper published in 1997. By Clark and Dawson, [5] is, overall, a slightly

more detailed, longer version of [8]. Therefore, it shares all the same characteristics and is

evaluated as above.


4.9 Merkle-Hellman Knapsack: Kolodziejczyk - 1997

This paper [17], published in 1997, is by Kolodziejczyk. It is an extension of [25]. It

focuses on the Merkle-Hellman knapsack system, and the effect of initial parameters on

the approach reported in [25]. The paper introduces the Merkle-Hellman knapsack, and

then discusses the application of a genetic algorithm to the cryptanalysis thereof. Certain

restrictions are placed on the encoding algorithm. These are:

• Only the ASCII code will be used in encryption

• The superincreasing sequence will have 8 elements, ensuring that each character has

a unique encoding

• The plaintext is not more than 100 characters in length

Due to these restrictions, the knapsack is a completely trivial problem. The genetic

algorithm approach needs no discussion, as this application has no point.

4.9.1 Comments

Overall, this paper presents a reasonable amount of numerical data and is a good extension

work to [25]. Unfortunately, the paper is not well written, has poor English grammar and

word usage, and uses a completely trivial knapsack. The only contribution of this paper is

the addition of parameter variation to [25]. There is no point in re-implementation of this

paper.

• Typical results:

– Used 5 ASCII characters, each of which was decrypted

• Re-implementation approach:

– Not re-implemented


• Observed cryptanalytic usefulness:

– None - knapsack size is trivial

• Comment on result discrepancy:

– No comment, not re-implemented

4.10 Substitution: Clark, Dawson - 1998

This is the only paper published in 1998. By Clark and Dawson, [6] compares three op-

timization algorithms applied to the cryptanalysis of a simple substitution cipher. These

algorithms are simulated annealing, the genetic algorithm, and tabu search. Performance

criteria, such as speed and efﬁciency, are investigated. The paper begins by introducing

the application area and giving the basis for the current work. A simple substitution cipher

is attacked by all three algorithms, with a new attack for tabu search, as compared to the

attack done in [4]. Each of the attacks are compared on three criteria:

• The amount of known ciphertext available to the attack

• The number of keys considered before the correct solution was found

• The time required by the attack to determine the correct solution

The next section of the paper gives a detailed overview of each of the three algorithms.

It is followed by an overview of the attack, as well as details about the attack for each

algorithm. The overall attack uses a general suitability assessment or fitness function. Assuming A denotes the language alphabet, here {A, . . . , Z, _} (where _ is the symbol for a space), the function is:

C_k = α Σ_{i∈A} |K_u(i) − D_u(i)| + β Σ_{i,j∈A} |K_b(i,j) − D_b(i,j)| + γ Σ_{i,j,k∈A} |K_t(i,j,k) − D_t(i,j,k)|   (4.10.1)

k is the proposed key, K is the known language statistics, D is the decrypted message statistics, and u/b/t denote the unigram, digram, and trigram statistics. α, β, and γ are the weights. The complexity of this formula is O(N^3), where N is the alphabet size, when

trigram statistics are used. The effectiveness of using the different n-grams is evaluated

using a range of weights, and it is concluded that trigrams are the most effective basis for

a cost function, but the beneﬁt of using trigrams over digrams is small. In fact, due to

the added complexity of using trigrams, it is usually more practical to use only unigrams

and digrams. Therefore, the remainder of the paper uses a ﬁtness function based only on

digrams. This function is:

C_k = Σ_{i,j∈A} |K_b(i,j) − D_b(i,j)|   (4.10.2)

The genetic algorithm attack section begins with detailed pseudocode for the mating

process, which is similar to that used in [26]. Selection is random, with a bias towards

the more ﬁt chromosomes. Next, the mutation operator is discussed. Mutation randomly

swaps two characters in a chromosome. Pseudocode is then given for the overall genetic

algorithm attack. The attack parameters are M, the solution pool size, and MAX_ITER,

the maximum number of iterations. The M best solutions after reproduction continue on

to the next iteration.
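The digram-only cost of (4.10.2) and the swap mutation can be sketched as follows (helper names are ours; the reference digram table is assumed to be a dictionary of relative frequencies):

```python
import random
from collections import Counter

def digram_cost(decrypted, known):
    # (4.10.2)-style cost: sum over digrams of |K_b(i,j) - D_b(i,j)|
    pairs = list(zip(decrypted, decrypted[1:]))
    obs = {d: c / len(pairs) for d, c in Counter(pairs).items()}
    return sum(abs(known.get(d, 0.0) - obs.get(d, 0.0))
               for d in set(known) | set(obs))

def swap_mutation(key):
    # Mutation: randomly swap two characters of the key
    key = list(key)
    i, j = random.sample(range(len(key)), 2)
    key[i], key[j] = key[j], key[i]
    return ''.join(key)
```

Dropping the unigram and trigram terms reduces the cost of each fitness evaluation from O(N^3) to O(N^2) in the alphabet size, which is the practicality argument made above.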

Experimental results for the three algorithms were generated with 300 runs per data

point. One hundred different messages were created for each case, and the attack run three

times per message. The best result for each message was then averaged together to produce

the data point. This is done to provide a good representation of the algorithm’s ability. In

the overall outcome of the attack, all the algorithms performed approximately the same. The

algorithms also perform equally well when comparing the amount of ciphertext provided

in the attack. Considering complexity, however, simulated annealing is more efﬁcient than

the genetic algorithm, and tabu search is the most efﬁcient.

4.10.1 Comments

Overall, this paper is well written and provides a good level of detail. It provides good

discussion of important points, especially in interpretation of the results. It is not focused


on a genetic algorithm attack and only uses a simple substitution cipher; however, the

details given make a re-implementation fairly easy, and provide enough information for

comparison purposes. This paper is a good choice for further study.

• Typical results:

– Results include number of key elements correct vs. amount of known cipher-

text, number of key elements correct vs. total keys considered and number of

key elements correct vs. time

• Re-implementation approach:

– Parameters were not reported

– Standard parameter values used in other re-implementations were used

• Observed cryptanalytic usefulness:

– None - no successful decryptions of the message occurred

• Comment on result discrepancy:

– Different conclusions based on the different metrics used

4.11 Chor-Rivest Knapsack: Yaseen, Sahasrabuddhe - 1999

This is the only paper from 1999. By Yaseen and Sahasrabuddhe, [31] is also the only paper

that considers a genetic algorithm attack on the Chor-Rivest public key cryptosystem. The

paper begins with a review of the Chor-Rivest scheme, followed by a review of genetic

algorithms. The genetic algorithm attack is then discussed.

The input to the genetic algorithm is A = (a_1, . . . , a_n), a knapsack of positive integers, and c, an integer which represents the cipher. The output is M = (m_1, . . . , m_n), a binary vector such that c = A · M. The chromosome is a binary vector that represents the solution message M. The length of the vector is n, the same as the size of the knapsack; a bit is 1 if the corresponding element has been chosen and 0 otherwise, the same as in [25]. The initial population is created randomly, such that each n-bit binary chromosome contains exactly h 1's, where h is the degree of the monic irreducible polynomial f(x) over Z_p. The fitness function is:

• Determine the value of the current chromosome (call it Sum)

• Fit = 1 − |Sum − Target| / FullSum   (4.11.1)

where FullSum is the sum of all the components in the knapsack

Two genetic operators are used during the alteration (mutation) phase of the algorithm.

They are:

• Swapbits: choose two positions in the chromosome randomly and swap their contents

• Invert: choose two positions in the chromosome randomly and reverse the sequence

of bits between them
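Both operators can be sketched as follows (names are ours). Note that each preserves the number of 1's in the chromosome, which keeps the required count of exactly h set bits intact:

```python
import random

def swapbits(chromosome):
    # Choose two positions at random and swap their contents
    c = list(chromosome)
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c

def invert(chromosome):
    # Choose two positions at random and reverse the bits between them
    c = list(chromosome)
    i, j = sorted(random.sample(range(len(c)), 2))
    c[i:j + 1] = c[i:j + 1][::-1]
    return c
```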

The parameters used throughout the experiment are:

• p_inv (the probability of the invert operation occurring) = 0.75

• p_m (the probability of the swapbits operation occurring) = 0.75 initially

• Population size of 30

To assist the ﬁtness function, the chromosomes with a Hamming distance of 2 and 4

from a target were the ones used. This approach is based on some observations made in [7].

Experimental results were provided for the knapsacks of the fields GF(13^5), GF(17^5), GF(19^5), and GF(23^5). These results are for the average of 30 experiments, both with and without

the Hamming distance technique. With the Hamming technique, as the size of the knapsack

increased, the percentage of search space visited decreased. Without the technique, the per-

centage visited had no relation to the knapsack size. With the technique, 1 to 2 % of search


space was visited, versus 50 to 90 % without. The average number of generations showed

a similar increase - 1 to 10 generations with the technique and 23 to 900 generations with-

out, on average. Of the 120 experiments done in each case, 33 experiments without the

technique did not reach a solution even after 5000 generations.

4.11.1 Comments

The main focus of this paper is on the Hamming distance technique, which is not explained

that clearly. A reasonable level of detail is included; however, due to length (5 pages) and

clarity issues, this paper is a poor choice for re-implementation.

• Typical results:

– Reports the average number of chromosomes required to ﬁnd an exact solution

• Re-implementation approach:

– Not re-implemented

• Observed cryptanalytic usefulness:

– None - lacks information about parameters required to analyze the attack

• Comment on result discrepancy:

– No comment, not re-implemented

4.12 Substitution/Transposition: Gründlingh, Van Vuuren - submitted 2002

This paper has not yet been published, but was submitted for publication in 2002. By

Gründlingh and Van Vuuren, [13] combines operations research with cryptology and attacks two classical ciphers with a genetic algorithm approach. The two ciphers studied


are substitution and transposition. The paper begins with some historical background on

the two ﬁelds, and then covers basic components of symmetric (private key) ciphers. The

next two sections discuss genetic algorithms and letter frequencies in natural languages.

The main difﬁculty in a genetic algorithm implementation is said to be the consideration

of constraints. As for character frequencies, these were generated for 7 languages from

modern bible texts.

The mono-alphabetic substitution cipher was attacked ﬁrst. In this case, consider a

ciphertext T of length N, formed from a plaintext of the same length in a natural language

L. The ﬁtness function of a candidate key k, is:

T_1(L, N, T, k) = [2(N − ρ^N_min(L)) − Σ_{i=A..Z} |ρ^N_i(L) − f^{T(N)}_i(k)|] / [2(N − ρ^N_min(L))]   (4.12.1)

Here, ρ^N_i(L) represents the (scaled) expected number of times the letter i will appear per N characters in a natural language L text. f^{T(N)}_i(k) is the observed number of occurrences of letter i when the ciphertext T, of length N, is decrypted using the key k. Also, ρ^N_min(L) = min_i {ρ^N_i(L)}. The limits of T_1 are [0, 1], and both bounds may be reached in

extreme cases. The rationale behind this ﬁtness function involves discrepancies between

the expected number of occurrences and the observed number of occurrences of a letter i.
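Equation (4.12.1) can be sketched as follows (a sketch; the expected-count table ρ and its minimum are assumed to be precomputed for the given N and language):

```python
def t1_fitness(decrypted, rho, rho_min):
    # T_1-style score: 1 when observed letter counts exactly match the
    # expected counts, falling toward 0 as the discrepancy grows
    n = len(decrypted)
    denom = 2 * (n - rho_min)
    err = sum(abs(rho.get(ch, 0) - decrypted.count(ch))
              for ch in set(rho) | set(decrypted))
    return (denom - err) / denom
```

The denominator scales the worst-case total discrepancy so that the score stays within [0, 1], matching the limits stated above.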

An alternative option for the ﬁtness measure, this time including digrams is:

T_2(L, N, T, k) = [2(2N − 1 − ρ^N_min(L) − δ^N_min(L)) − Σ_{i=A..Z} |ρ^N_i(L) − f^{T(N)}_i(k)| − Σ_{i,j=A..Z} |δ^N_ij(L) − f^{T(N)}_ij(k)|] ÷ [2(2N − 1 − ρ^N_min(L) − δ^N_min(L))]   (4.12.2)

Here, ρ^N_i(L) and δ^N_ij(L) represent the expected number of occurrences of, respectively, the letter i and the digram ij, per N characters in a natural language L text. f^{T(N)}_i(k) and f^{T(N)}_ij(k) are the observed number of occurrences of, respectively, the letter i and the digram ij, when the ciphertext T is decrypted using the key k. Also, ρ^N_min(L) = min_i {ρ^N_i(L)} and δ^N_min(L) = min_ij {δ^N_ij(L)}. The same limits apply to T_2 as to T_1.

A detailed algorithm overview, including pseudocode, is discussed next, including the

genetic operators of mutation, crossover, and selection. The mutation operator is a single


random swap of two characters. Crossover occurs during a traversal of the parent chromo-

somes. The ﬁrst child chromosome is produced as follows:

• Begin a left-to-right traversal of the parent chromosomes

• Select character i from the parent whose i-th character has the better frequency match

to the expected number of occurrences of i

• If the i-th character already appears in the child

– Select the character from the other parent

– If the characters are both already in the child, leave the position blank

• If any characters do not appear in the child, insert them to minimize discrepancies

between observed and expected frequency counts

The second child is produced by the same process using a right-to-left traversal. Keys

are selected stochastically, such that keys that have a higher ﬁtness value are more likely to

be mated. The algorithm was run on a ciphertext of 2,519 characters and a population size

of 20. Using (4.12.1) an average pool ﬁtness of 90 % was reached within 7 generations, and

using (4.12.2), an average of 75 % within 15 generations. The algorithm was run for 200

generations, and sets of the best 3 keys overall were kept as well as the ﬁnal population.

The transposition cipher was attacked second, using a similar genetic algorithm ap-

proach. This attack assumes that the number of columns is known, but the column shufﬂing

is unknown. The ﬁtness function then becomes (using the same symbols as before):

T_3(L, N, T, k) = [2(N − 1 − δ^N_min(L)) − Σ_{i,j=A..Z} |δ^N_ij(L) − f^{T(N)}_ij(k)|] / [2(N − 1 − δ^N_min(L))]   (4.12.3)

The crossover operator uses a random binary vector to select entries from the parents.

The ﬁrst child is formed by taking values from parent 1 where 1 occurs in the binary vector.

The rest of the values are ﬁlled from parent 2, in the order in which they occur. The second

child is formed starting from parent 2, and using the locations of the 0’s in the binary


vector. Two mutation operators are considered - point and shift. A point mutation consists

of randomly selecting two positions in a key and swapping their values. Shift mutation, on

the other hand, consists of shifting the values in a key a random number of positions to the

left or right modulo the length of the key. Otherwise, the same algorithm was used in this

attack as for the simple substitution cipher. No convergence of ﬁtness values was found for

this attack.
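The transposition operators described above can be sketched as follows (names are ours; keys are represented as lists of column indices):

```python
import random

def binary_crossover(p1, p2, mask):
    # Take p1's entries where the mask is 1, then fill the remaining slots
    # with p2's unused entries in the order they occur in p2
    child = [v if m else None for v, m in zip(p1, mask)]
    rest = iter(v for v in p2 if v not in child)
    return [v if v is not None else next(rest) for v in child]

def shift_mutation(key):
    # Shift the key a random number of positions, modulo its length
    s = random.randrange(len(key))
    return key[s:] + key[:s]
```

Filling the gaps in parent-2 order (rather than copying positions directly) guarantees the child is still a valid permutation; the second child is obtained symmetrically from parent 2 and the mask's 0-positions.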

The paper concludes by describing a decision tool created to assist cryptanalysts who

routinely attack substitution ciphers. This tool is graphical in nature, and is based on the

genetic algorithm attack on substitution ciphers discussed above.

4.12.1 Comments

Overall, this paper is pretty good and includes good introductory material. It lacks enough

detail in the results section to be easily comparable, but gives a great level of detail for the

functions, algorithms, and operators used. It is a good candidate for re-implementation, but

not for comparison.

• Typical results:

– Gives the ﬁnal population pool and the partially decrypted ciphertext produced

by the best key for the substitution attack

– Transposition attack declared to be nonconvergent and no results given

• Re-implementation approach:

– Both the substitution and transposition ciphers were re-implemented

– Different text lengths used, same parameters re-used

– Additional parameters are used to expand the range of parameter values

• Observed cryptanalytic usefulness:

– Substitution: None - no messages successfully decrypted


– Transposition: Low - low decryption percentage

• Comment on result discrepancy:

– Different conclusions based on the different metrics used

4.13 Overall Comments

The quality of the reference works found in the literature varies widely. This issue is com-

pounded by the low number of reference works found - twelve. Several of these works were

too poorly speciﬁed for further use or used trivial parameters, leaving even fewer valid ref-

erence works. Seven works contained enough information for a re-implementation to be

completed. Some of these works depended on other, earlier works for the re-implementation

to be possible. A common set of metrics for these re-implemented works is gathered. It

is not possible to compare the results using the new metrics to the results reported in the

previous works due to the differences in metric composition. The metrics previously used

are a diverse collection, with very different bases. This diversity is one of the motivating

factors for this work.


Chapter 5

Traditional Cryptanalysis Methods

Only four of the seven ciphers attacked using a genetic algorithm will be further inves-

tigated. These ciphers are monoalphabetic substitution, polyalphabetic substitution (e.g.

the Vigenère cipher), permutation, and columnar transposition. The other three ciphers are excluded for various reasons.

The Merkle-Hellman knapsack cipher was initially marked for further study but has

been removed due to the difﬁculty of implementing the seminal attack by Shamir [23].

This attack involves Lenstra's integer programming algorithm and a fast Diophantine approximation algorithm. The Chor-Rivest knapsack cipher is difficult to implement and was

only considered for a genetic algorithm attack in one paper, which used trivial examples of

the system. The paper [31] is short and does not explain the attack technique well, making

it difﬁcult to re-implement the work. The third cipher that will not be considered further

is the Vernam cipher. This cipher is one of the few that exhibits perfect secrecy, however,

it is not widely used due to its key length requirements. This cipher is also attacked using a genetic algorithm in only one paper [18], which does not actually attack the cipher it claims to. Therefore, due to the lack of replicable works, this cipher is excluded from further

investigation.


5.1 Distinguishing among the three ciphers

If the cipher type is not already known, inspection of the frequency of occurrence of the

ciphertext letters (i.e. how often each ciphertext character appears in the ciphertext), will

distinguish between a permutation and a substitution cipher. Each language (e.g. French,

English, German) has a different set of frequencies for its letters. A permutation cipher

will produce a ciphertext with the same frequency distribution as that of the language of

the plaintext [24]. For example, if the plaintext was written in English, a permutation cipher

will produce ciphertext with the same frequency distribution as any similar English text.

The index of coincidence (IC) measures the variation in letter frequencies in a ciphertext

[22]. If simple (monoalphabetic) substitution has been used, then there will be considerable

variation in the letter frequencies and the IC will be high [22]. As the period of the cipher

increases (i.e. more alphabets are used), the variation in letter frequencies decreases and the IC falls.

5.2 Monoalphabetic substitution cipher

The traditional attack for the monoalphabetic substitution cipher utilizes frequency analysis

and examination. The ﬁrst step is to analyze the ciphertext to determine the frequency of

occurrence of each character [27]. The frequencies of digrams and trigrams may also be

determined in this step. Then, based on the frequencies, the cryptanalyst examines the ciphertext and proceeds by guessing, and revising guesses of, which ciphertext character corresponds to which plaintext character.

For example, the most frequent character in an English text is E [27]. Therefore, the

ciphertext character that appears most frequently is highly likely to be the equivalent of

the plaintext E. The next most frequent English letters are T, A, O, I, N, S, H, and R, all

with around the same frequency. Digram and trigram statistics can be used to help identify

these. TH, HE, and IN are the three most common digrams, and the three most common

trigrams are THE, ING, and AND. In the end, however, the cryptanalyst must gain a feel for


the text and proceed by guessing letters until the text becomes legible.

5.3 Polyalphabetic substitution cipher

The polyalphabetic substitution cipher (here, the Vigenère cipher) is attacked using two

techniques - the Kasiski test and the index of coincidence [27]. The ﬁrst step is to determine

the keyword length, denoted m.

The Kasiski test was described by Friedrich Kasiski in 1863, but apparently discovered

earlier, around 1854, by Charles Babbage [27]. This test is based on the observation that

“two identical segments of plaintext will be encrypted to the same ciphertext whenever their

occurrence in the plaintext is δ positions apart, where δ ≡ 0 (mod m)” [27]. Conversely,

if two segments of ciphertext are identical and have at least a length of three, then there is

a good chance they correspond to identical segments of plaintext [27].

The Kasiski test follows this procedure [27]:

1. Search the ciphertext for pairs of identical segments of length at least three

2. Record the distance between the starting positions of the two segments

3. If several such distances are obtained (δ_1, δ_2, . . .), then:

• m divides all of the δ_i's

• and, therefore, m divides the greatest common divisor (gcd) of the δ_i's
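The procedure can be sketched in code. The following is a minimal illustration (not the implementation used in this work) that records the distances between repeated segments and returns their gcd as a candidate keyword length:

```python
from math import gcd
from functools import reduce

def kasiski(ciphertext, seg_len=3):
    """Record the distances between repeated segments of length >= 3 and
    return the gcd of those distances as a candidate keyword length m."""
    text = "".join(c for c in ciphertext.upper() if c.isalpha())
    last_seen = {}
    distances = []
    for i in range(len(text) - seg_len + 1):
        seg = text[i:i + seg_len]
        if seg in last_seen:
            distances.append(i - last_seen[seg])
        last_seen[seg] = i
    return reduce(gcd, distances) if distances else None
```

Coincidental repeats can perturb the gcd, so in practice the candidate length is verified with the index of coincidence, as described next.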

The index of coincidence is used to further verify the value of m. This concept was defined by William Friedman in 1920, as follows [27]:

• Suppose x = x_1 x_2 · · · x_n is a string of n alphabetic characters

• The index of coincidence of x, denoted I_c(x), is defined to be the probability that two random elements of x are identical


Suppose the frequencies of A, B, C, . . . , Z in x are denoted by f_0, f_1, . . . , f_25, respectively. Two elements of x can be chosen in \binom{n}{2} ways [27]. For each i, 0 ≤ i ≤ 25, there are \binom{f_i}{2} ways of choosing both elements to be i. This produces the formula [27]:

I_c(x) = \frac{\sum_{i=0}^{25} \binom{f_i}{2}}{\binom{n}{2}} = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n (n - 1)}

Suppose x is a string of English text. Then, if the expected probabilities of occurrence of the letters A, B, . . . , Z are denoted by p_0, . . . , p_25, respectively, we would expect that I_c(x) ≈ \sum_{i=0}^{25} p_i^2 = 0.065 [27]. The same reasoning applies if x is a ciphertext string obtained using any monoalphabetic cipher - the individual probabilities are permuted, but the quantity \sum_{i=0}^{25} p_i^2 is unchanged.
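The index of coincidence formula translates directly into code. A minimal sketch, not tied to the implementation used in this work:

```python
from collections import Counter

def index_of_coincidence(text):
    """I_c(x) = sum over the 26 letters of f_i * (f_i - 1) / (n * (n - 1))."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

English-like text should score near 0.065 and uniformly random text near 1/26 ≈ 0.038, the two values used in this chapter.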

Now, suppose we have a ciphertext string y = y_1 y_2 · · · y_n produced by a Vigenère cipher. Define m substrings of y, denoted Y_1, Y_2, . . . , Y_m, by writing out the ciphertext, in columns, in a rectangular array of dimensions m × (n/m) [27]. The rows of this matrix are the substrings Y_i, 1 ≤ i ≤ m. This means that we have:

Y_1 = y_1 y_{m+1} y_{2m+1} · · ·
Y_2 = y_2 y_{m+2} y_{2m+2} · · ·
. . .
Y_m = y_m y_{2m} y_{3m} · · ·

If Y_1, Y_2, . . . , Y_m are constructed as shown, and m is indeed the keyword length, then I_c(Y_i) should be approximately equal to 0.065 for 1 ≤ i ≤ m. If m is not the keyword length, then the substrings Y_i will look much more random, since they will have been produced by shift encryption with different keys. A completely random string produces an I_c ≈ 26 (1/26)^2 = 1/26 = 0.038 [27]. These two values (0.065 and 0.038) are far enough apart that the correct keyword length can be obtained or confirmed using the index of coincidence.


After the value of m has been determined, the actual key, K = (k_1, k_2, . . . , k_m), needs to be found. The method in [27] is as follows:

• Let 1 ≤ i ≤ m, and let f_0, . . . , f_25 denote the frequencies of A, B, . . . , Z in the string Y_i

• Let n' = n/m be the length of the string Y_i

• The probability distribution of the 26 letters in Y_i is then f_0/n', . . . , f_25/n'

• Since each substring Y_i is obtained by shift encryption of a subset of the plaintext elements using a shift k_i, it is hoped that the shifted probability distribution f_{k_i}/n', . . . , f_{25+k_i}/n' will be close to the ideal probability distribution p_0, . . . , p_25, with subscripts evaluated modulo 26

• Suppose that 0 ≤ g ≤ 25 and define the quantity

M_g = \sum_{i=0}^{25} \frac{p_i f_{i+g}}{n'}

• If g = k_i, then M_g ≈ \sum_{i=0}^{25} p_i^2 = 0.065

• If g ≠ k_i, then M_g will usually be significantly smaller than 0.065

This technique should allow the determination of the correct value of k_i for each value of i, 1 ≤ i ≤ m.
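As an illustration of this method, the sketch below computes M_g for all 26 shifts of a substring and returns the maximizing g. The letter-probability table is a standard approximate English distribution, not a table taken from this thesis:

```python
from collections import Counter

# Approximate English letter probabilities p_0..p_25 (A..Z); a standard
# reference table, not one specific to this work.
P = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
     0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
     0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]

def best_shift(substring):
    """Return the shift g in 0..25 maximizing M_g = sum_i p_i * f_{i+g} / n'."""
    n = len(substring)
    counts = Counter(substring)
    f = [counts.get(chr(ord('A') + i), 0) for i in range(26)]

    def m_g(g):
        return sum(P[i] * f[(i + g) % 26] for i in range(26)) / n

    return max(range(26), key=m_g)
```

For example, a substring produced by shift encryption with k_i = 3 should yield best_shift(...) == 3, once the substring is long enough for the statistics to be reliable.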

5.4 Permutation/Transposition ciphers

The permutation cipher defined in [27] is the same as that used in [4], while [20] and [13] use a columnar transposition cipher. The permutation cipher will be attacked by exhaustion, as per [22]. This attack considers the d! permutations of d characters, where d is the period of the cipher. The ciphertext is divided into blocks of various lengths, and all possible permutations are considered. For each possible value of d, starting at d = 2, all d! possible permutations are attempted on the first two groups of d letters. For example, if d = 2, the first two groups of 2 letters are permuted as (letter 1)(letter 2) and (letter 2)(letter 1). This continues until one or both of the groups permute into something that looks like a valid word or word part. At that point, the possibly valid permutation is applied to the other groups of letters in the ciphertext. If the permutation produces valid text, i.e., the human cryptanalyst accepts the text as valid in the language used, then the attack is complete; otherwise, the process continues.
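A minimal sketch of the exhaustive step described above; the looks_valid predicate is a hypothetical stand-in for the human cryptanalyst's judgment:

```python
from itertools import permutations

def apply_perm(block, perm):
    """Rearrange a block of d characters according to a permutation of 0..d-1."""
    return "".join(block[p] for p in perm)

def candidate_perms(ciphertext, d, looks_valid):
    """Try all d! permutations on the first two groups of d letters, keeping
    those under which either group looks like a valid word or word part."""
    group1, group2 = ciphertext[:d], ciphertext[d:2 * d]
    return [p for p in permutations(range(d))
            if looks_valid(apply_perm(group1, p)) or looks_valid(apply_perm(group2, p))]

# d = 2 on "HELLOWORLD" enciphered by swapping each pair of letters;
# the word-part set here is purely illustrative.
print(candidate_perms("EHLLWOLROD", 2, lambda s: s in {"HE"}))  # [(1, 0)]
```

Each surviving permutation would then be applied to the remaining groups and judged as valid or not, exactly as the text describes.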

The columnar transposition cipher is attacked differently for an incompletely filled rectangle than for a completely filled rectangle. We will assume that the rectangle is completely filled in order to simplify the attack. The width of a completely filled rectangle must be a divisor of the length of the message; i.e., if the message has 77 letters, the rectangle must be either 7 or 11 letters wide [24]. The ciphertext is written vertically into rectangles of the possible widths, and a solution is obtained by rearranging the columns to form plaintext. The process of restoring a disarranged set of letters to its original positions is called anagramming [24].

There are several ways to proceed with anagramming. These include the use of a probable word, the use of a probable sequence, digram/trigram charts, contact charts, and vowel frequencies. A contact chart is an extension of the digram/trigram charts, containing information on the likelihood that a specific letter will be preceded or followed by another letter. Vowel frequencies are language-specific. In English, vowels comprise approximately 40% of the text, independent of text length [11]. Therefore, when putting the ciphertext into matrix (rectangle) form, a period should be chosen which produces the most even distribution of vowels across the columns of the matrix [19]. The text is deciphered more easily if the columns are then exchanged so that the most frequently occurring letter combinations appear first [19]. This fact helps the cryptanalyst determine the most likely width for the inscription rectangle.
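The vowel-distribution heuristic can be sketched as follows. The evenness measure used here (variance of the per-column vowel fractions) is one reasonable choice for "most even distribution", not necessarily the measure intended in [19]:

```python
def vowel_spread(ciphertext, width):
    """Write the ciphertext vertically into a completely filled rectangle of
    the given width and return the variance of per-column vowel fractions;
    a low variance (even spread) suggests a plausible width."""
    depth = len(ciphertext) // width
    columns = [ciphertext[i * depth:(i + 1) * depth] for i in range(width)]
    fractions = [sum(ch in "AEIOU" for ch in col) / depth for col in columns]
    mean = sum(fractions) / width
    return sum((f - mean) ** 2 for f in fractions) / width

def best_width(ciphertext, candidate_widths):
    """Prefer the divisor of the message length with the most even vowel spread."""
    return min(candidate_widths, key=lambda w: vowel_spread(ciphertext, w))
```

For a 77-letter message, candidate_widths would be the divisors 7 and 11, as in the example above.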

The next step after vowel frequency examination is usually the use of a probable word or sequence. A probable word is used when the cryptanalyst has some information about the message which would lead to the assumption of a word in the text [24]. For example, if the message is of a military nature, words such as division, regiment, attack, report, communication, enemy, retreat, etc. are likely to be present [11]. If a probable word cannot be assumed, the next option is to look for the presence of one of the less frequent digrams, such as QU or YP [11]. If this is not possible either, reconstruction may default to working on the first word of the message, as in [24].

If there is some difficulty in anagramming the first row, the next step is to try to fit two columns together to produce good digrams [24]. One technique for column fitting is to assign each resulting digram its frequency in plaintext and then add the frequencies [15]. The combination with the highest total is most likely to be correct [15]. Next, one would attempt to extend the digrams into trigrams, and so on. Columns can be added both before and after the current set, so the extension into trigrams and longer combinations must consider both situations.
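The column-fitting score can be illustrated with a toy digram table; DIGRAM_FREQ below is a tiny hypothetical sample, where a real attack would use counts gathered from a large corpus of the plaintext language:

```python
# Tiny hypothetical digram-frequency sample, for illustration only.
DIGRAM_FREQ = {"TH": 27, "HE": 23, "IN": 20, "ER": 19, "AN": 18}

def fit_score(col_a, col_b):
    """Sum the plaintext frequencies of the digrams formed when col_b is
    placed immediately to the right of col_a; higher totals are more likely."""
    return sum(DIGRAM_FREQ.get(a + b, 0) for a, b in zip(col_a, col_b))

print(fit_score("TI", "HN"))  # TH + IN = 27 + 20 = 47
print(fit_score("HN", "TI"))  # HT and NI are not in the table: 0
```

Ranking all ordered column pairs by this score reproduces the "highest total wins" rule from [15].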

5.5 Comments

Overall, the traditional attack methods are not easily automated and tend to require a trained and experienced cryptanalyst. The polyalphabetic substitution cipher is the only one that is attacked mostly mathematically. Mathematical information is used for both the monoalphabetic substitution cipher and the permutation/transposition ciphers; however, the main component of these attacks is the human cryptanalyst. The final decision on whether or not the plaintext has been found is always made by the human cryptanalyst, no matter which method is used in the attack.


Chapter 6

Attack Results

6.1 Methodology

The main metric for both classical and genetic algorithm attacks was elapsed time, measured using the tcsh built-in command, "time". In the classical case, the elapsed time is measured from the time the attack begins to the time the ciphertext is completely decrypted. This measurement consists almost entirely of time spent interacting with the user, as processing time is less than is measurable with the "time" command. In the genetic algorithm case, each attack was run using a tcsh shell script. Therefore, the amount of time spent in user interaction is minimal, and the elapsed time consists almost wholly of processing time. The differences in the composition of elapsed time for the two categories of attack make it difficult to compare the attacks on a time basis. Therefore, an additional metric is used for the genetic algorithm attacks.

This additional metric, used only for the genetic algorithm attacks, is the percentage of attacks which completely decrypted the ciphertext, as determined by the human cryptanalyst upon examination of the final result. This metric is meaningless for a classical attack, as that class of attack runs until the text is decrypted. A genetic algorithm attack, on the other hand, runs for a specific number of generations, independent of whether the ciphertext is decrypted. Using this metric, as well as the elapsed time, gives a better feel for the usefulness of each attack. The time measurement is irrelevant if the attack is unsuccessful.

These two metrics were chosen so that traditional and genetic algorithm attacks could be compared. Elapsed time measures the efficiency of the attack, while the percentage of successful attacks measures how useful the attack is. Both metrics are needed to gauge the success of an attack.

The same set of ciphertexts was used for all attacks. This set consists of 100, 500,

and 1000 characters of ﬁve different English texts. These texts are: Chapter 1 of Genesis

from the King James Version of the Bible [2], Patrick Henry’s “Give me liberty or give

me death!” speech [3], Section 1/Part 1 of Charles Darwin’s The Origin of Species [10],

Chapter 1 of H. G. Wells’s The Time Machine [30], and Chapter 1/Part 1 of Louisa May

Alcott’s Little Women [1].

Additional parameters were required for the genetic algorithm attacks. The parameters were compiled from the various genetic algorithm attacks found in the literature and then extended to cover a greater parameter space. All attacks were run using the following parameters:

• Population sizes of 10, 20, 40, or 50

• Number of generations equaling 25, 50, 100, 200 or 400

• Mutation rates of 0.05, 0.10, 0.20, or 0.40

Each possible combination of parameters was run once per ciphertext and length combination.
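For concreteness, the parameter grid above amounts to 4 × 5 × 4 = 80 runs per ciphertext and length combination, which can be enumerated as:

```python
from itertools import product

# Parameter grid from this section.
POPULATION_SIZES = [10, 20, 40, 50]
GENERATIONS = [25, 50, 100, 200, 400]
MUTATION_RATES = [0.05, 0.10, 0.20, 0.40]

runs = list(product(POPULATION_SIZES, GENERATIONS, MUTATION_RATES))
print(len(runs))  # 4 * 5 * 4 = 80 combinations per ciphertext/length pair
```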

Certain genetic algorithm attacks required additional parameters. The columnar transposition attacks were run using keylengths of 7, 9, or 11. The Vigenère attack was run using a period of 3, 5, or 7. The fitness weights used in the Vigenère attack were 0.1 for unigrams and digrams, and 0.8 for trigrams. The permutation attack was run using keylengths of 2, 4, or 6. The only attack that varied from the others attacking its cipher type was the Matthews [20] columnar transposition attack. An additional set of parameters was used for this attack. These parameters were:

• Initial and Final Crossover probabilities


• Initial and Final Point Mutation probabilities

• Initial and Final Shufﬂe Mutation probabilities

In order to use the same set of parameters as the other attacks, both crossover probabilities were set to 1, and both the point mutation and shuffle mutation values were set to 0.05, 0.10, 0.20, or 0.40. The additional set of parameters consisted of setting the initial crossover probability to 0.8, the final crossover probability to 0.5, the initial point mutation probability to 0.1, the final point mutation probability to 0.5, the initial shuffle mutation probability to 0.1, and the final shuffle mutation probability to 0.8. This additional set of parameters was used throughout the original results reported for the Matthews attack.

6.2 Classical Attack Results

As stated above, all classical attacks are run until complete decryption of the ciphertext is obtained. Since each ciphertext differs in both its statistics and its distribution of common and uncommon words, the time needed to decrypt a text can vary widely. The more mathematical an attack is, the less time it requires.

Since a classical substitution attack is based on statistical information about the text, as well as the user's intuition, the time this attack takes can vary widely from text to text. Usually, a longer text is easier to decrypt, as its statistical characteristics are more distinct. Some texts are statistically very well-defined, making them easy to decrypt even at shorter lengths. The first chapter of Genesis, for example, is easy to decrypt due to the clarity of the statistical evidence. Other texts that use less common words or word patterns, such as the Liberty or Death speech, are consistently difficult.

As Figure 6.1 shows, times can vary widely between texts, and between different

lengths of the same text. Times range from 1:50 to 10:25, and can vary as much as 6:25 or

as little as :45 between different lengths of a text.

The classical Vigen` ere attack, on the other hand, is very mathematically based. This

means that the time to decrypt a text is much more consistent. Longer decryption times


[Chart: elapsed time (mm:ss), scale 0:00 to 10:30, for each text file (genesis, liberty, littlew, origin, timem) at 100, 500, and 1000 characters.]

Figure 6.1: Classical Substitution Attack

mean that there was more than one good candidate for the period, leading to multiple

attempts at decryption. This is what occurred for most of the 100 character texts when a

period of 5 or 7 was used.

As Figure 6.2 shows, times are fairly consistent among different texts and text lengths.

Times range from :58 to 6:25, and vary as much as 4:55 or as little as :10 between different

text lengths.

The classical permutation attack is a brute force, exhaustive search. Therefore, the

amount of time the attack takes depends both on the characteristics of the text and the

block length used. A longer block length takes more time to decrypt because there are

many more possibilities that must be considered and sorted through. Texts that contain less

common character combinations will take longer to decrypt because fewer permutations

result in that particular combination.

Figure 6.3 shows that this attack is more consistent for elapsed time values at smaller,

more trivial block lengths. At longer block lengths, it takes more time to sort through the

possible permutations to determine if a correct arrangement has been found. The word


[Chart: elapsed time (mm:ss), scale 0:00 to 6:30, for each text file (genesis, liberty, littlew, origin, timem) at 100, 500, and 1000 characters with periods 3, 5, and 7.]

Figure 6.2: Classical Vigenère Attack

choice of the text can make it more difﬁcult to determine if a correct permutation has been

found, since longer or less common words are more difﬁcult to distinguish from a partial

root. Times range from :13 to :15 for texts at block length 2, :21 to :38 for texts at block

length 4, and 1:21 to 2:25 for texts at block length 6.

A classical columnar transposition attack begins with statistical analysis of the ciphertext. The user then proceeds with anagramming, i.e., rearranging the columns to produce recognizable words. Like the classical substitution attack, a great deal depends on the user's intuition. The more often easily recognizable words occur in the text, the more easily and quickly the user can rearrange the columns correctly. The characteristics of the text,

As shown in Figure 6.4, block length does not heavily inﬂuence the time it takes to

decrypt a text. The text attacked, and its length, are more inﬂuential factors. Times range

from :30 to 3:15, with certain texts fairly consistently taking more time.


[Chart: elapsed time (mm:ss), scale 0:00 to 2:30, for each text file (genesis, liberty, littlew, origin, timem) at 100, 500, and 1000 characters with block lengths 2, 4, and 6.]

Figure 6.3: Classical Permutation Attack

6.3 Genetic Algorithm Attack Results

Only the attacks on the monoalphabetic substitution, polyalphabetic substitution (i.e. the Vigenère cipher), permutation, and columnar transposition ciphers were selected for further investigation. This includes the attacks by Spillman et al. (1993) [26], Matthews (1993) [20], Clark (1994) [4], Clark et al. (1996) [8] / Clark & Dawson (1997) [5], Clark & Dawson (1998) [6], and both attacks from Gründlingh & Van Vuuren (2002) [13].

The majority of the genetic algorithm based attacks did not correctly decrypt any texts. Two of the attacks, Matthews (1993) [20] and Gründlingh & Van Vuuren (2002) [13], decrypted some texts correctly. One attack, Clark (1994) [4], correctly decrypted many texts. Only these three successful attacks will be discussed here, as time measurements for the four unsuccessful attacks are irrelevant.

The Matthews (1993) [20] attack and the Gründlingh & Van Vuuren (2002) [13] attack were against the columnar transposition cipher, while the Clark (1994) [4] attack was against the permutation cipher. The Matthews attack was successful only for key length


[Chart: elapsed time (mm:ss), scale 0:00 to 3:15, for each text file (genesis, liberty, littlew, origin, timem) at 100, 500, and 1000 characters with keylengths 7, 9, and 11.]

Figure 6.4: Classical Transposition Attack

7; there were no successes at a keylength of 9 or 11. The per-text percentage of complete

breaks at key length 7 ranged from 0 % to 20 %, with most texts in the 2 to 4 % range.

Mutation and mating rates had little effect on decryption success, with more decryptions

occurring at larger population sizes and/or more generations.

The Gründlingh & Van Vuuren attack was also successful only at a keylength of 7. Per-text decryption percentages at key length 7 ranged from 0 % to 28 %. The 100 character text segments failed almost uniformly - one of the five texts was broken once, giving it a success rate of 0.416 % as opposed to 0 %. Also, Chapter 1 of Genesis failed to decode at any text length. The other 500 and 1000 character text segments achieved success rates of 20 % to 28 %. Mutation rate and type had no noticeable effect, and more decryptions occurred with larger population sizes and/or more generations. A population size of 50 resulted in decryption most of the time, compared to the other population sizes (10, 20, 40).

In comparison to the other attacks, Clark’s attack of 1994 [4] was wildly successful.

A block length of 2 always decrypted, while block lengths of 4 and 6 had success rates


ranging from 5 % to 91 %. Most of the per-text decryption percentages at block length 4

were in the high seventies or low eighties, while the percentages at block length 6 were

usually in the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500

character text tests, and 74.75 % of the 1000 character text tests were correctly decrypted.

As above, larger population sizes and/or more generations increased the decryption rate,

with no obvious effect from mutation rate. Block length 6 texts were especially sensitive

to the number of generations and population size, failing most of the time on a population

size of 10 or 20.

The effects of varying the population size and number of generations in this attack (Clark 1994 [4]) are shown in the following graphs. The first set of graphs, Figures 6.5 through 6.8, shows the effect of the number of generations at a specific population size. This effect is independent of the block length used. Each graph shows all three block lengths (2, 4, & 6), but the times are so close as to form one line when displayed on the same graph. This set of graphs also illustrates the minimal effect of the mutation rate. Each group of tests at 25, 50, 100, 200, or 400 generations forms a plateau in time, indicating that varying the mutation rate from 5 % to 40 % has no measurable effect.

The second set of graphs, Figures 6.9 through 6.13, shows the effect of the population size at a specific number of generations. Again, this effect is independent of the block length used. Each graph shows all three block lengths, but only in Figures 6.9 and 6.10 can the difference between block lengths be seen. This visibility is due to the tiny range used in these graphs - a maximum elapsed time of 0.65 or 1.30 seconds. Again, the minimal effect of mutation rate on the elapsed time is apparent.

Overall, these graphs show the expected effect of increasing the number of generations

or the population size - more time is required due to the larger pool of chromosomes that

need evaluation.


[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:2.10, versus number of generations-mutation rate combinations (25-05 through 400-40) for 100, 500, and 1000 characters.]

Figure 6.5: Effect of Number of Generations at Population Size 10

[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:4.25, versus number of generations-mutation rate combinations (25-05 through 400-40) for 100, 500, and 1000 characters.]

Figure 6.6: Effect of Number of Generations at Population Size 20


[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:8.50, versus number of generations-mutation rate combinations (25-05 through 400-40) for 100, 500, and 1000 characters.]

Figure 6.7: Effect of Number of Generations at Population Size 40

[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:10.50, versus number of generations-mutation rate combinations (25-05 through 400-40) for 100, 500, and 1000 characters.]

Figure 6.8: Effect of Number of Generations at Population Size 50


[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:0.65, versus population size-mutation rate combinations (10-05 through 50-40) for 100, 500, and 1000 characters at block lengths 2, 4, and 6.]

Figure 6.9: Effect of Population Size at 25 Generations

[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:1.30, versus population size-mutation rate combinations (10-05 through 50-40) for 100, 500, and 1000 characters at block lengths 2, 4, and 6.]

Figure 6.10: Effect of Population Size at 50 Generations


[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:2.60, versus population size-mutation rate combinations (10-05 through 50-40) for 100, 500, and 1000 characters.]

Figure 6.11: Effect of Population Size at 100 Generations

[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:5.25, versus population size-mutation rate combinations (10-05 through 50-40) for 100, 500, and 1000 characters.]

Figure 6.12: Effect of Population Size at 200 Generations


[Chart: elapsed time (mm:ss.ss), scale 0:0.00 to 0:10.50, versus population size-mutation rate combinations (10-05 through 50-40) for 100, 500, and 1000 characters.]

Figure 6.13: Effect of Population Size at 400 Generations

6.4 Comparison of Results

Overall, the GA-based methods were unsuccessful. Traditional methods took more time, but were always successful. Only three of the seven re-implemented GA-based attacks achieved any success. The successful attacks were on the permutation and transposition ciphers.

The traditional attack on the transposition cipher produced elapsed times ranging from

:30 to 3:15, with certain texts fairly consistently taking more time. It was successful 100 %

of the time, as is standard with a classical attack run to completion.

The two GA-based attacks on the transposition cipher were by Matthews [20] and Gründlingh & Van Vuuren [13]. Both GA attacks were successful only at a key length of 7. The Matthews attack achieved success rates of 2 - 4 %, on average. The Gründlingh & Van Vuuren attack averaged success rates of 6 - 7 %.

The Matthews attack took from :00.05 to :06.27 in the tests run on the 100 character texts, from :00.06 to :07.29 on the 500 character text tests, and from :00.08 to :07.46 on the 1000 character text tests. These elapsed time values depend on the population size and number of generations used in a given test. There is no correlation between elapsed time and success rate.

The Gründlingh & Van Vuuren attack took from :00.00 (according to the resolution of

the measurement tool) to :01.21 in the tests run on the 100 character texts, from :00.00 to

:02.27 on the 500 character text tests, and from :00.00 to :02.99 on the 1000 character text

tests. Again, there is no correlation between elapsed time and success rate.

The GA-based attacks took much smaller amounts of time to complete each test run. At the tiny success rates these two attacks achieved, however, one success generally takes more time to produce than running a traditional attack. At the 2 % to 4 % success rate of the Matthews attack, the test which took the longest time (:07.46) would need to be run repeatedly, for a total time of about 6:22, to produce one success. For the Gründlingh & Van Vuuren attack, with its success rate of 6 % to 7 %, the test which took the longest time (:02.99) would need to be run multiple times, for a total time of about :50. Both of these estimates assume that success occurs randomly and is spread evenly throughout the tests, which is not the case in actual testing: the successful results tend to cluster together. This being the case, it is not likely that either of the GA-based attacks will produce a result faster than the traditional attack in the typical case.
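The break-even arithmetic above follows from the geometric distribution: with per-run time t and per-run success probability p, the expected number of independent runs until the first success is 1/p, so the expected total time is t/p. A minimal sketch (illustrative Python only; the thesis tooling was C with perl/tcsh scripts):

```python
def expected_time_to_success(run_time_s: float, success_rate: float) -> float:
    """Expected total time (seconds) until one success, assuming independent
    runs that each succeed with probability `success_rate` (geometric model)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success rate must be in (0, 1]")
    return run_time_s / success_rate  # E[runs] = 1/p

# Longest Matthews run (:07.46) at a 2 % success rate: roughly six minutes.
matthews_s = expected_time_to_success(7.46, 0.02)
# Longest Gründlingh & Van Vuuren run (:02.99) at 6 %: under a minute.
gvv_s = expected_time_to_success(2.99, 0.06)
```

The 6:22 figure quoted above corresponds to a per-run success rate just under 2 %.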

The classical permutation attack is a brute force, exhaustive search. Times range from

:13 to :15 for texts at block length 2, :21 to :38 for texts at block length 4, and 1:21 to 2:25

for texts at block length 6.

The only GA-based attack on the permutation cipher was the Clark (1994) attack [4].

In comparison to the other attacks, this GA-based attack was wildly successful. A block

length of 2 always decrypted, while block lengths of 4 and 6 had success rates ranging

from 5 % to 91 %. Most of the per-text decryption percentages at block length 4 were in

the high seventies or low eighties, while the percentages at block length 6 were usually in

the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500 character

text tests, and 74.75 % of the 1000 character text tests were correctly decrypted.


This attack took from :00.06 to :06.12 in tests run on the 100 character texts, from

:00.08 to :07.96 on the 500 character text tests, and from :00.10 to :10.25 on the 1000

character text tests. These elapsed time values are dependent on the population size and

number of generations used in a given test. There is no correlation between elapsed time

and success rate.

The Clark (1994) attack is very competitive with the traditional attack. The elapsed time

required is much smaller, and the success rates offer an excellent chance that this GA-based

approach will be faster than the traditional method, on average.


Chapter 7

Extensions and Additions

Three genetic algorithm attacks achieved some success in decryption. These attacks were:

• Matthews’ [20] attack on the transposition cipher

• Gründlingh and Van Vuuren’s [13] attack on the transposition cipher

• Clark’s [4] attack on the permutation cipher

Since the Matthews [20] and Gründlingh & Van Vuuren [13] attacks were only slightly successful, they were re-run with larger population sizes and more generations in an attempt to improve their success rates. The main problem with these two attacks, besides the low rate of success, is that only a key length of 7 was successfully attacked. The Clark [4] attack was very successful originally, so an attempt was made to extend its success to larger block sizes and to apply this attack to a different cryptosystem.

7.1 Genetic Algorithm Attack Extensions

The Matthews attack was successful only for key length 7; there were no successes at a key length of 9 or 11. The per-text percentage of complete breaks at key length 7 ranged from 0 % to 20 %, with most texts in the 2 % to 4 % range. Mutation and mating rates had little effect on decryption success, with more decryptions occurring at larger population sizes and/or more generations.


The Gründlingh & Van Vuuren attack was also successful only at a key length of 7.

Per-text decryption percentages at key length 7 ranged from 0 % to 28 %. The overall

average for success rate was in the 6 % to 7 % range. Mutation rate and type had no

noticeable effect, and more decryptions occurred with larger population sizes and/or more

generations.

In comparison to the other attacks, Clark’s attack of 1994 [4] was wildly successful.

A block length of 2 always decrypted, while block lengths of 4 and 6 had success rates

ranging from 5 % to 91 %. Most of the per-text decryption percentages at block length 4

were in the high seventies or low eighties, while the percentages at block length 6 were

usually in the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500

character text tests, and 74.75 % of the 1000 character text tests were correctly decrypted.

As above, larger population sizes and/or more generations increased the decryption rate,

with no obvious effect from mutation rate.

7.1.1 Transposition Cipher Attacks

The two attacks on the transposition cipher were run with similar parameters. Population

sizes of 100 and 200 were used, in combination with 1,000, 5,000, or 10,000 generations.

The other parameters for each attack were:

• Matthews

– A point mutation rate starting at 0.1 and ﬁnishing at 0.5

– A shufﬂe mutation rate starting at 0.1 and ﬁnishing at 0.8

– A mating rate starting at 0.8 and ﬁnishing at 0.5

• Gründlingh and Van Vuuren

– A mutation rate of 0.10

– Mutation using both point and shift mutation
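Rates that “start” at one value and “finish” at another imply a schedule across the run. One simple realization, assumed here purely for illustration (the exact interpolation used by Matthews [20] is not specified above), is linear in the generation number:

```python
def scheduled_rate(start: float, finish: float, gen: int, max_gen: int) -> float:
    """Linearly interpolate an operator rate from `start` at generation 0
    to `finish` at generation `max_gen` (assumed schedule, for illustration)."""
    if max_gen <= 0:
        return finish
    frac = min(gen, max_gen) / max_gen
    return start + (finish - start) * frac

# Point mutation rising 0.1 -> 0.5, mating rate falling 0.8 -> 0.5:
point_mid = scheduled_rate(0.1, 0.5, 5_000, 10_000)    # 0.3 at the halfway mark
mating_end = scheduled_rate(0.8, 0.5, 10_000, 10_000)  # 0.5 at the final generation
```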


As before, the following set of ciphertexts was used: Chapter 1 of Genesis from the

King James Version of the Bible [2], Patrick Henry’s “Give me liberty or give me death!”

speech [3], Section 1/Part 1 of Charles Darwin’s The Origin of Species [10], Chapter 1 of

H. G. Wells’s The Time Machine [30], and Chapter 1/Part 1 of Louisa May Alcott’s Little

Women [1].

Unfortunately, the larger values for the population size and number of generations did

not result in any improvement in the results. The success rate for the Matthews [20] attack

remained in the 2 % to 4 % range, while the rate for Gründlingh and Van Vuuren [13] was

in the 6 % to 7 % range. Again, only texts at a key length of 7 were decrypted.

7.1.2 Permutation Cipher Attack

The attack on the permutation cipher, Clark’s 1994 attack [4], was extended to attempt

larger block sizes than before. To aid in this attempt, the parameter values were changed,

mostly to larger values. The parameters used were:

• Block lengths of 8, 16, and 32

• Population sizes of 50, 100, and 200

• A number of generations of 200, 400, 1000 or 5000

• A mutation rate of 0.001, 0.01, 0.05 or 0.10

The mutation rate parameter was the only one lowered. The mutation rates used in the

original versions of the genetic algorithm attacks are atypically high for genetic algorithm

applications. More common values are in the 0.001 and 0.01 range, so these values were

used in the extension of this attack.

Again, the following set of ciphertexts was used: Chapter 1 of Genesis from the King

James Version of the Bible [2], Patrick Henry’s “Give me liberty or give me death!” speech

[3], Section 1/Part 1 of Charles Darwin’s The Origin of Species [10], Chapter 1 of H. G.


Wells’s The Time Machine [30], and Chapter 1/Part 1 of Louisa May Alcott’s Little Women

[1].

The attack was erratically successful at a block length of 8, slightly successful at a block length of 16, and unsuccessful at a block length of 32. The particular ciphertext attacked seemed to play a larger role in the attack’s success than it had at smaller block lengths. With a block length of 8, two ciphertexts were not decrypted at all, nine were decrypted more than 50 % of the time, and four were decrypted anywhere from 12 % to 40 % of the time. The longer texts (500 and 1000 characters) were attacked more successfully than the shortest text (100 characters). For a block length of 16, several successes were seen against two of the 1000 character ciphertexts. No attacks were successful against a block length of 32.

As before, mutation rate had no apparent effect, and a greater rate of success was

achieved among larger populations with many generations.
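One plausible explanation for the failure at block length 32, offered here as a back-of-the-envelope argument rather than a result from the tests, is raw keyspace size: a permutation cipher over blocks of length b has b! keys, while even the largest GA runs above evaluate at most population × generations candidates.

```python
from math import factorial

def keyspace(block_length: int) -> int:
    """Number of distinct permutation-cipher keys at a given block length."""
    return factorial(block_length)

# Upper bound on distinct keys a single GA run can even visit:
# population 200 for 5000 generations sees at most 10**6 candidates.
evaluated = 200 * 5000

# Block length 6: 720 keys, trivially coverable.
# Block length 8: 40,320 keys, still coverable.
# Block length 32: about 2.6e35 keys; blind coverage is hopeless.
coverage_32 = evaluated / keyspace(32)
```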

7.1.3 Comments

In their current form, the two transposition cipher attacks do not justify the effort of implementing this style of genetic algorithm attack. Their success rates are stable, but low, even with extraordinarily large parameter values.

The permutation cipher attack is possibly worthwhile at small block lengths, probably under 10 or so. It does not work at larger block lengths in its current form. This approach would need to be evaluated on a case-by-case basis to determine whether this genetic algorithm approach would be cost-effective in comparison with a more traditional, brute-force approach.

7.2 Additions

A new attack was implemented on the substitution cipher. Two variants of this attack were

tested. The difference between the variants was solely in the scoreboard used.

This attack is a conversion of the Clark [4] attack from the permutation cipher to the


simple substitution cipher. The only changes made were those necessary to apply the attack

to a different cipher.

The original version of this attack used order-based GAs, since each chromosome rep-

resented a permutation. Breeding was done using the position based crossover method for

order-based genetic algorithms. In this method, a random bit string of the same length as

the chromosomes is used. The random bit string indicates which elements to take from par-

ent #1 by a 1 in the string, and the elements omitted from parent #1 are added in the order

they appear in parent #2. The second child is produced by taking elements from parent #2

indicated by a 0 in the string, and ﬁlling in from parent #1 as before. Two possible muta-

tion types may then be applied. The ﬁrst type is a simple two-point mutation where two

random chromosome elements are swapped. The second type is a shufﬂe mutation, where

the permutation is shifted forward by a random number of places. Either one or both of these mutation types could occur, depending on the initial parameters. The population for the next generation was produced by merging the parent and child populations and keeping only the best solutions, up to the population size.
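The crossover and mutation operators described above can be sketched as follows (an illustrative Python reconstruction of the described behavior; the thesis implementation was in C):

```python
import random

def position_based_crossover(p1, p2, mask=None):
    """Position-based crossover for order-based (permutation) chromosomes.
    A random bit string marks which positions child 1 keeps from parent 1;
    the omitted elements are filled in the order they appear in parent 2.
    Child 2 keeps parent 2's elements at the 0 positions and fills from
    parent 1 in the same way."""
    n = len(p1)
    if mask is None:
        mask = [random.randint(0, 1) for _ in range(n)]

    def build(keep_parent, fill_parent, keep_bit):
        child = [keep_parent[i] if mask[i] == keep_bit else None for i in range(n)]
        kept = {e for e in child if e is not None}
        filler = (e for e in fill_parent if e not in kept)
        return [e if e is not None else next(filler) for e in child]

    return build(p1, p2, 1), build(p2, p1, 0)

def point_mutation(perm, rng=random):
    """Two-point mutation: swap two random chromosome elements."""
    i, j = rng.sample(range(len(perm)), 2)
    perm = list(perm)
    perm[i], perm[j] = perm[j], perm[i]
    return perm

def shuffle_mutation(perm, rng=random):
    """Shuffle mutation: shift the permutation forward by a random number of places."""
    k = rng.randrange(1, len(perm))
    return list(perm[-k:]) + list(perm[:-k])
```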

In order to apply this attack to a cipher that was not permutation-based, some changes

had to be made. The major change made was the conversion of the chromosomes from

numerical genes to alphabetical genes. Another change made was to allow both mutation

types to occur all the time - the input parameter previously present was removed. This step

was deemed reasonable as mutation type had no apparent effect in prior testing. Breeding

and new population creation occurred as before.

The difference between the two variants of the new attack is in the scoreboard used.

The ﬁrst variant uses the same scoreboard as the implementation of Clark’s [4] attack on

the permutation cipher. The second variant adds several additional items to the scoreboard.

The original scoreboard contains the digrams and trigrams TH, HE, IN, ER, AN, ED, THE,

ING, AND, and EEE. The variant scoreboard adds RE, ON, HER, and ENT, with the same values as the corresponding original digrams and trigrams. These values are +1 for digrams and +5 for trigrams; TH is +2 while EEE is −5.
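The two scoreboards can be written down directly from the weights above (an illustrative Python reconstruction; counting overlapping n-gram occurrences is one reasonable reading of the scoreboard, and the thesis implementation was in C):

```python
# Scoreboard n-gram weights: +1 for digrams, +5 for trigrams,
# with the stated exceptions TH = +2 and EEE = -5.
ORIGINAL = {"TH": 2, "HE": 1, "IN": 1, "ER": 1, "AN": 1, "ED": 1,
            "THE": 5, "ING": 5, "AND": 5, "EEE": -5}
EXTENDED = {**ORIGINAL, "RE": 1, "ON": 1, "HER": 5, "ENT": 5}

def scoreboard_fitness(text: str, board: dict) -> int:
    """Score a candidate decryption by summing weighted n-gram hits,
    counting overlapping occurrences."""
    text = text.upper()
    total = 0
    for gram, weight in board.items():
        count = sum(1 for i in range(len(text) - len(gram) + 1)
                    if text.startswith(gram, i))
        total += weight * count
    return total
```

For example, "THE THING" scores two TH hits (+4), one each of HE and IN (+2), and one each of THE and ING (+10) under the original scoreboard.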


The original variant was run with the following parameters:

• Population sizes of 10, 20, 40, and 50

• Number of generations of 25, 50, 100, 200, and 400

• Mutation rates of 0.05, 0.10, 0.20, or 0.40

The variant with the extended scoreboard was run with these parameters:

• Population sizes of 50, 100, and 200

• Number of generations of 200, 400, and 1000

• Mutation rates of 0.001, 0.01, 0.05, or 0.10

Both variants used the following set of ciphertexts: Chapter 1 of Genesis from the King

James Version of the Bible [2], Patrick Henry’s “Give me liberty or give me death!” speech

[3], Section 1/Part 1 of Charles Darwin’s The Origin of Species [10], Chapter 1 of H. G.

Wells’s The Time Machine [30], and Chapter 1/Part 1 of Louisa May Alcott’s Little Women

[1].

Both attacks were completely unsuccessful. None of the ciphertexts were decrypted in

any of the tests.

7.2.1 Comments

This approach appeared promising at the outset, since the scoreboard style of attack had achieved at least some success on both the transposition and permutation ciphers. Additional modification of the parameters may

improve performance, but that seems doubtful given the complete lack of success displayed

in these tests. Another option would be to further tune the operators to the new cryptosys-

tem.

If this attack had proved successful, it could have served as a springboard to attacking a

modern cipher such as DES or AES with a GA. These ciphers are based on substitution and


permutation principles. It may have been possible to attack the components of the system

with a GA tuned for that component’s cipher.


Chapter 8

Evaluation and Conclusion

8.1 Summary

The primary goals of this work were to produce a performance comparison between tradi-

tional cryptanalysis methods and genetic algorithm (GA) based methods, and to determine

the validity of typical GA-based methods in the ﬁeld of cryptanalysis.

The focus was on classical ciphers, including substitution, permutation, transposition,

knapsack and Vernam ciphers. The principles used in these ciphers form the foundation

for many of the modern cryptosystems. Also, if a GA-based approach is unsuccessful on

these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to

more complicated systems. A thorough search of the available literature resulted in GA-

based attacks on only these ciphers. In many cases, these ciphers were among the simplest

possible versions.

Many of the GA-based attacks lacked information required for comparison to the tra-

ditional attacks. Dependence on parameters unique to one GA-based attack does not allow

for effective comparison among the studied approaches. The most coherent and seemingly

valid GA-based attacks were re-implemented so that a consistent, reasonable set of metrics

could be collected.

The main metric for both classical and genetic algorithm attacks was elapsed time. In the classical algorithm case, the elapsed time consisted almost entirely of time spent interacting with the user, while in the genetic algorithm case it consisted almost entirely of processing time. This difference in the composition of elapsed time makes it difficult to compare the attacks on a time basis. Therefore, an additional metric was required.

The secondary metric used was the percentage of successful attacks per test set. Success

was deﬁned as complete decryption of the ciphertext. This metric was only calculated for

the GA-based attacks, as a classical attack will run until the text is decrypted. A GA-based

attack, on the other hand, runs for a speciﬁc number of generations, independent of the

decryption of the ciphertext. Using this metric alongside elapsed time gives a better sense of the usefulness of each attack; the time measurement is irrelevant if the attack is unsuccessful.

These two metrics were chosen so that traditional and GA-based attacks could be com-

pared. Elapsed time measures the efﬁciency of the attack while the percentage of successful

attacks measures how useful the attack is. Both of these metrics are needed to gain a com-

plete picture of an attack.

Of the twelve genetic algorithm based attacks found in the literature, seven were se-

lected for re-implementation. None of the knapsack or Vernam cipher attacks were re-

implemented, for varying reasons. Of these seven, only three attacks were successful in

decrypting any of the ciphertexts. The successful attacks were those GA-based attacks on

the permutation and transposition ciphers. Clark’s [4] GA-based attack on the permutation

cipher was by far the most successful. Both of the attacks which used a scoreboard for

ﬁtness calculation, Matthews [20] and Clark [4], achieved at least some success. The third

successful GA-based attack was by Gründlingh and Van Vuuren [13], the only successful

attack to use a ﬁtness equation. These three successful attacks were extended to further ex-

amine their success rates. The two transposition attacks were tested using much larger pop-

ulation sizes and numbers of generations, while the permutation attack was attempted on

larger block lengths. The two transposition cipher attacks obtained about the same success

rate as before, while the permutation cipher attack experienced decreased success. Since


the scoreboard approach was successful in both cases in which it was used, this style of GA-based

attack was attempted on the monoalphabetic substitution cipher. The attack was based on

Clark’s [4] attack, simply altered to apply to a different cipher. Unfortunately, this attempt

was unsuccessful. No ciphertexts were completely decrypted with either variant attempted.

8.2 Future Work

There are several areas in which this work could be expanded. One of these is in the

investigation of the scoreboard approach to fitness calculation. This method achieved the better results of the two methods used, yet does not seem to be in widespread use. Applying

this approach to other problems may prove useful.

A comparison of the scoreboard approach to the ﬁtness equation approach in other ap-

plications would be useful as well. This would determine whether the scoreboard approach

is limited to speciﬁc applications only, or if it has widespread applicability.

Another area for investigation is the use of less common ﬁtness calculation measures.

The scoreboard approach, for example, proved useful in this application despite the rarity

of its use. Other seldom-used approaches may prove similarly valuable.

This work was coded in C and used perl/tcsh scripts. Re-working of the code to make

it modular and re-usable for other projects would be a useful extension.

8.3 Conclusion

The primary goals of this work were to produce a performance comparison between tradi-

tional cryptanalysis methods and genetic algorithm (GA) based methods, and to determine

the validity of typical GA-based methods in the ﬁeld of cryptanalysis.

The focus was on classical ciphers, including substitution, permutation, transposition,

knapsack and Vernam ciphers. The principles used in these ciphers form the foundation

for many of the modern cryptosystems. Also, if a GA-based approach is unsuccessful on


these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to

more complicated systems. A thorough search of the available literature turned up GA-based attacks on only these ciphers. In many cases, these ciphers were among the simplest

possible versions.

Many of the GA-based attacks lacked information required for comparison to the tra-

ditional attacks. Dependence on parameters unique to one GA-based attack does not allow

for effective comparison among the studied approaches. The most coherent and seemingly

valid GA-based attacks were re-implemented so that a consistent, reasonable set of metrics

could be collected.

The two metrics used, elapsed time and percentage of successful attacks, were chosen

so that traditional and GA-based attacks could be compared. Elapsed time measures the

efﬁciency of the attack while the percentage of successful attacks measures how useful the

attack is. Both of these metrics are needed to gain a complete picture of an attack.

Of the genetic algorithm attacks found in the literature, totaling twelve, seven were

re-implemented. Of these seven, only three achieved any success. The successful attacks

were those on the transposition cipher by Matthews [20] and Gründlingh and Van Vuuren [13], and on the permutation cipher by Clark [4]. These attacks were further investigated in

an attempt to improve or extend their success. Unfortunately, this attempt was unsuccessful,

as was the attempt to apply the Clark [4] attack to the monoalphabetic substitution cipher

and achieve the same or indeed any level of success.

Overall, the standard ﬁtness equation genetic algorithm approach, and the scoreboard

variant thereof, are not worth the extra effort involved. Traditional cryptanalysis methods

are more successful, and easier to implement. While a traditional method takes more time,

a faster unsuccessful attack is worthless. The failure of the genetic algorithm approach

indicates that supplementary research into traditional cryptanalysis methods may be more

useful and valuable than additional modiﬁcation of GA-based approaches.


Bibliography

[1] Alcott, Louisa May. Little Women Parts I & II. Great Literature Online 1997-2003. Retrieved August 20, 2003 from http://www.underthesun.cc/Classics/Alcott/littlewomen/.

[2] The Holy Bible, King James Version. New York: American Bible Society: 1999;

Bartleby.com, 2000. Retrieved August 20, 2003 from http://www.bartleby.com/108/.

[3] Bryan, William Jennings, ed. The World’s Famous Orations. New York: Funk and

Wagnalls, 1906; New York: Bartleby.com, 2003. Retrieved August 20, 2003 from

http://www.bartleby.com/268/.

[4] Clark, A. (1994). Modern optimisation algorithms for cryptanalysis. In Proceedings of

the 1994 Second Australian and New Zealand Conference on Intelligent Information

Systems, November 29 - December 2, (pp. 258-262).

[5] Clark, A., & Dawson, Ed. (1997). A Parallel Genetic Algorithm for Cryptanalysis of

the Polyalphabetic Substitution Cipher. Cryptologia, 21 (2), 129-138.

[6] Clark, A., & Dawson, Ed. (1998). Optimisation Heuristics for the Automated Crypt-

analysis of Classical Ciphers. Journal of Combinatorial Mathematics and Combina-

torial Computing, 28, 63-86.

[7] Clark, A., Dawson, Ed, & Bergen, H. (1996). Combinatorial Optimisation and the

Knapsack Cipher. Cryptologia, 20(1), 85-93.

[8] Clark, A., Dawson, Ed, & Nieuwland, H. (1996). Cryptanalysis of Polyalphabetic

Substitution Ciphers Using a Parallel Genetic Algorithm. In Proceedings of IEEE

International Symposium on Information and its Applications, September 17-20.

[9] Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2001). Introduction to

Algorithms: Second Edition. Cambridge, Boston: MIT Press, McGraw-Hill.


[10] Darwin, Charles Robert. The Origin of Species. Vol. XI. The Harvard Classics. New York: P.F. Collier & Son, 1909-14; Bartleby.com, 2001. Retrieved August 20, 2003 from http://www.bartleby.com/11/.

[11] Gaines, H. F. (1956). Cryptanalysis: a Study of Ciphers and their Solutions. New

York: Dover.

[12] Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine

Learning. Boston: Addison-Wesley.

[13] Gründlingh, W. & van Vuuren, J. H. (submitted 2002). Using Genetic Algorithms to Break a Simple Cryptographic Cipher. Retrieved March 31, 2003 from http://dip.sun.ac.za/~vuuren/abstracts/abstr genetic.htm

[14] Holland, J. H. (1975). Adaptation in natural and artiﬁcial systems. Ann Arbor: The

University of Michigan Press.

[15] Kahn, D. (1967). The Codebreakers: The Story of Secret Writing. New York: Macmil-

lan.

[16] Kreher, D. L. & Stinson, D. R. (1999). Combinatorial Algorithms: Generation, Enu-

meration, and Search. Boca Raton: CRC Press.

[17] Kolodziejczyk, J., Miller, J., & Phillips, P. (1997). The application of genetic al-

gorithm in cryptoanalysis of knapsack cipher. In Krasnoproshin, V.; Soldek, J.;

Ablameyko, S.; Shmerko, V. (Eds.), Proceedings of Fourth International Conference

PRIP ’97 Pattern Recognition and Information Processing, May 20-22, (pp. 394-401).

Poland: Wydawnictwo Uczelniane Politechniki Szczecinskiej.

[18] Lin, Feng-Tse, & Kao, Cheng-Yan. (1995). A genetic algorithm for ciphertext-only

attack in cryptanalysis. In IEEE International Conference on Systems, Man and Cy-

bernetics, 1995, (pp. 650-654, vol. 1).

[19] Lubbe, J. C. A. van der. (1998). Basic Methods of Cryptography. (S. Gee, Trans.).

Cambridge: Cambridge University Press. (Original work published 1997).

[20] Matthews, R.A.J. (1993, April). The use of genetic algorithms in cryptanalysis. Cryp-

tologia, 17(4), 187-201.

[21] Menezes, A., van Oorschot, P., & Vanstone, S. (1997). Handbook of Applied Cryp-

tography. Boca Raton: CRC Press.


[22] Seberry, J. & Pieprzyk, J. (1989). Cryptography: An Introduction to Computer Secu-

rity. Sydney: Prentice Hall.

[23] Shamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-

Hellman cryptosystem. In Proceedings of the 23rd IEEE Symposium on Foundations

of Computer Science, 1982, (pp. 145-152).

[24] Sinkov, A. (1968). Elementary Cryptanalysis: A Mathematical Approach. New York:

Random House.

[25] Spillman, R. (1993, October). Cryptanalysis of knapsack ciphers using genetic algo-

rithms. Cryptologia, 17(4), 367-377.

[26] Spillman, R., Janssen, M., Nelson, B., & Kepner, M. (1993, January). Use of a genetic

algorithm in the cryptanalysis of simple substitution ciphers. Cryptologia, 17(1), 31-

44.

[27] Stinson, D. (2002). Cryptography: Theory and Practice. Boca Raton: CRC Press.

[28] Oliver, I.M., Smith, D.J., & Holland, J.R.C. (1987, July). A Study of Permutation

Crossover Operators on the Traveling Salesman Problem. In John J. Grefenstette

(Ed.), Proceedings of the 2nd International Conference on Genetic Algorithms, Cam-

bridge, MA, USA, July 1987, (pp. 224-230). Lawrence Erlbaum Associates.

[29] Tomassini, M. (1999). Parallel and Distributed Evolutionary Algorithms: A Review.

In K. Miettinen, M. Mäkelä, P. Neittaanmäki and J. Periaux (Eds.), Evolutionary Al-

gorithms in Engineering and Computer Science (pp. 113 - 133). Chichester: J. Wiley

and Sons.

[30] Wells, H. G. The Time Machine. Bartleby.com, 2000. Retrieved August 20, 2003 from

http://www.bartleby.com/1000/

[31] Yaseen, I.F.T., & Sahasrabuddhe, H.V. (1999). A genetic algorithm for the crypt-

analysis of Chor-Rivest knapsack public key cryptosystem (PKC). In Proceedings of

Third International Conference on Computational Intelligence and Multimedia Ap-

plications, 1999, (pp. 81-85).




.3. . . . . . . . . . . . . . . . . . . . . . . . .6 Vernam cipher . . . . . . . . 27 4. . . . . . . . . . .2 Matthews . . . . . 15 3. . . . . . . .2 Merkle-Hellman Knapsack cipher . . 25 4. 11 Permutation/Transposition ciphers . . . .1 3. . . . 22 Comments . . . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . .5 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Permutation cipher . . . . .2. . . . . . . . . . . .1 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Spillman . . . . . . . . . . 18 3. . . . . . . . . . . . . 22 23 4 Relevant Prior Works . . . . . .Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Polyalphabetic Substitution cipher . . . . . . . . . . . . . . . . . . . . 4. . . . . . . 13 Transposition cipher . .1993 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Cryptography . . . . 16 Chor-Rivest Knapsack cipher . . . . . . .4. . . . . . . . . .1 Spillman et al. . . . . . . . . . . . . . . . . . . . . . . 31 i . .2 3. . . . . . . . . .1 3. . . .1993 . . 26 4.4. . . . . . . . 3. . . . . . . .1. . . . . . . . . . . .4 Knapsack ciphers . . . . . . . . . . . . . . . . . . . . . . . . 12 3. 1 2 4 9 9 2 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Document Overview . . . . . . . . . . . . . . . . . 1.3 Monoalphabetic Substitution cipher .1 3. . . . . . . . . . . . . . . . . .1993 . . . . . . . . . . . . 13 3. . . 30 4. . . . . . . . . . . . . . . . . .1 Comments . . . . . . . . . . . . . . . . . . . . . . .

4.4  Clark - 1994 . . . 33
     4.4.1  Comments . . . 34
4.5  Lin, Kao - 1995 . . . 35
     4.5.1  Comments . . . 36
4.6  Clark, Dawson, Bergen - 1996 . . . 37
     4.6.1  Comments . . . 38
4.7  Clark, Dawson, Nieuwland - 1996 . . . 39
     4.7.1  Comments . . . 41
4.8  Kolodziejczyk - 1997 . . . 41
     4.8.1  Comments . . . 42
4.9  Clark, Dawson - 1997 . . . 42
     4.9.1  Comments . . . 43
4.10 Clark, Dawson - 1998 . . . 44
     4.10.1 Comments . . . 45
4.11 Yaseen, Sahasrabuddhe - 1999 . . . 45
     4.11.1 Comments . . . 47
4.12 Gründlingh, Van Vuuren - submitted 2002 . . . 47
     4.12.1 Comments . . . 50
4.13 Overall Comments . . . 50

5  Traditional Cryptanalysis Methods . . . 52
   5.1  Distinguishing among the three ciphers . . . 53
   5.2  Monoalphabetic substitution cipher . . . 53
   5.3  Polyalphabetic substitution cipher . . . 54
   5.4  Permutation/Transposition ciphers . . . 56
   5.5  Comments . . . 58

6  Attack Results . . . 59
   6.1  Methodology . . . 59

   6.2  Classical Attack Results . . . 61
   6.3  Genetic Algorithm Attack Results . . . 64
   6.4  Comparison of Results . . . 71
   6.5  Summary . . . 71

7  Extensions and Additions . . . 74
   7.1  Genetic Algorithm Attack Extensions . . . 74
        7.1.1  Transposition Cipher Attacks . . . 75
        7.1.2  Permutation Cipher Attack . . . 76
        7.1.3  Comments . . . 77
   7.2  Additions . . . 77
        7.2.1  Comments . . . 79

8  Evaluation and Conclusion . . . 81
   8.1  Conclusion . . . 81
   8.2  Future Work . . . 83

Bibliography . . . 85

List of Figures

4.1   Substitution Cipher Family and Attacking Papers . . . 23
4.2   Permutation Cipher Family and Attacking Papers . . . 24
4.3   Knapsack Cipher Family and Attacking Papers . . . 24
4.4   Vernam Cipher Family and Attacking Paper . . . 24
6.1   Classical Substitution Attack . . . 62
6.2   Classical Vigenère Attack . . . 63
6.3   Classical Permutation Attack . . . 64
6.4   Classical Transposition Attack . . . 65
6.5   Effect of Number of Generations at Population Size 10 . . . 67
6.6   Effect of Number of Generations at Population Size 20 . . . 67
6.7   Effect of Number of Generations at Population Size 40 . . . 68
6.8   Effect of Number of Generations at Population Size 50 . . . 68
6.9   Effect of Population Size at 25 Generations . . . 69
6.10  Effect of Population Size at 50 Generations . . . 69
6.11  Effect of Population Size at 100 Generations . . . 70
6.12  Effect of Population Size at 200 Generations . . . 70
6.13  Effect of Population Size at 400 Generations . . . 71

Glossary

block: A sequence of consecutive characters encoded at one time

block length: The number of characters in a block

chromosome: The genetic material of an individual; represents the information about a possible solution to the given problem

cipher: An algorithm for performing encryption (and the reverse, decryption); a series of well-defined steps that can be followed as a procedure. Works at the level of individual letters, or small groups of letters

ciphertext: A text in the encrypted form produced by some cryptosystem. The convention is for ciphertexts to contain no white space or punctuation

crossover (mating): The process by which two chromosomes combine some portion of their genetic material to produce a child or children

cryptanalysis: The analysis and deciphering of cryptographic writings or systems

cryptography: The process or skill of communicating in or deciphering secret writings or ciphers

cryptosystem: The package of all processes, formulae, and instructions for encoding and decoding messages using cryptography

decryption: Any procedure used in cryptography to convert ciphertext (encrypted data) into plaintext

digram: A sequence of two consecutive characters

encryption: The process of putting text into encoded form

fitness: The extent to which a possible solution successfully solves the given problem; usually a numerical value

generation: The average interval of time between the birth of parents and the birth of their offspring; in the genetic algorithm case, this is one iteration of the main loop of code

genetic algorithm (GA): A search/optimization algorithm based on the mechanics of natural selection and natural genetics

key: A relatively small amount of information that is used by an algorithm to customize the transformation of plaintext into ciphertext (during encryption) or vice versa (during decryption)

key length: The size of the key; how many values comprise the key

monoalphabetic: Using one alphabet; refers to a cryptosystem where each alphabetic character is mapped to a unique alphabetic character

mutation: Simulation of the transcription errors that occur in nature with a low probability; a child is randomly changed from what its parents produced in mating

one-time pad: Another name for the Vernam cipher

order-based GA: A form of GA where the chromosomes represent permutations. Special care must be taken to avoid illegal permutations

plaintext: A message before encryption or after decryption, i.e., in its usual form which anyone can read, as opposed to its encrypted form

polyalphabetic: Using many alphabets; refers to a cipher where each alphabetic character can be mapped to one of many possible alphabetic characters

population: The possible solutions (chromosomes) currently under investigation, as well as the number of solutions that can be investigated at one time, i.e., per generation

scoreboard: A method of determining the fitness of a possible solution; takes a small list of the most common digrams and trigrams and gives each chromosome a score based on how often these combinations occur

trigram: A sequence of three consecutive characters

unigram: A single character

Vigenère: The specific polyalphabetic substitution cipher used in this work

Chapter 1

Introduction

The application of a genetic algorithm (GA) to the field of cryptanalysis is rather unique. Few works exist on this topic, since this area is so different from the application areas where GAs were developed. This nontraditional application is investigated to determine the benefits of applying a GA to a cryptanalytic problem. The primary goals of this work are to produce a performance comparison between traditional cryptanalysis methods and genetic algorithm based methods, and to determine the validity of typical GA-based methods in the field of cryptanalysis.

The focus will be on classical ciphers, including substitution, transposition, permutation, knapsack and Vernam ciphers. The principles used in these ciphers form the foundation for many of the modern cryptosystems. In many cases, these ciphers are some of the simplest possible versions. A thorough search of the available literature found GA-based attacks on only these ciphers. If the GA-based approach proves successful, it could lead to faster, more automated cryptanalysis techniques. However, if a GA-based approach is unsuccessful on these simple systems, it is unlikely to be worthwhile to apply it to more complicated systems. Also, the GA-based methods will likely prove to be less successful than the traditional methods.

In many cases, the GA-based attacks were also run with unrealistically small parameters. For example, one of the knapsack systems attacked is the Merkle-Hellman knapsack scheme. In this scheme, the public key is derived from a superincreasing sequence of length n. According to Shamir [23], a typical value for n is 100. In the GA-based attacks on this cipher, n was less than 16. This is almost an order of magnitude below the typical case, and makes these attacks useful only as proof-of-concept examples for learning the use of GAs.

Many of the GA-based attacks also lack the information required for comparison to the traditional attacks. These metrics include the elapsed time required for the attack and the success percentage of each attack. The number of generations it took for a run of the GA to get to some state is not at all useful when attempting to compare and contrast attacks. In other GA-based attacks, there is a huge gap between the knowledge needed to re-implement the attack and the information provided in the paper. For example, the values of p and h are vitally necessary for an implementation of the Chor-Rivest knapsack scheme. This system utilizes a finite field Fq, where q = p^h, of characteristic p, with p ≥ h, for which the discrete logarithm problem is feasible. In the GA-based attack on this cipher, these values are not provided. This information hole means that the attack could have been run using trivial parameters, or parameters known to be easy to attack. The attack cannot be considered remotely valid without knowing these parameters. The most coherent and seemingly valid attacks were therefore re-implemented so that a consistent, reasonable set of metrics could be collected.

1.1 Document Overview

This thesis is broken into a number of chapters. Each chapter covers one aspect of the thesis in detail. Chapter 2 gives a brief introduction to genetic algorithms. The general form of the selection, reproduction and mutation operators is discussed, as is the general sequence of events for a GA-based approach. Chapter 3 introduces the necessary classical ciphers in detail. The algorithm for each cipher is given, in some cases including a small example to illustrate the process of using the given cryptosystem.

The next two chapters cover the different attacks found in the literature. Chapter 4 discusses the various GA-based attacks in detail. These attacks are covered in chronological order, with up to four papers attacking the same cipher or variants thereof. The following chapter, Chapter 5, details the classical attack on those ciphers chosen for further investigation. The ciphers were selected for further investigation due to the difficulty, or lack thereof, of the traditional attack method, as well as the clarity and quality of the GA-based work. For example, the Vernam cipher was not selected for further investigation. This cipher is unconditionally secure, meaning that an attack is successful only when there is a protocol or implementation failure. There is no basis for attacking this cipher when no mistakes have been made.

The next chapters detail the results of the traditional and GA-based attacks, as well as extensions to the GA-based attacks. Chapter 6 covers the results of the traditional attacks, and discusses those GA-based attacks that achieved some success. Chapter 7 discusses the extensions made to several of the GA-based attacks, as well as new work attempted. Finally, Chapter 8 discusses the overall conclusions drawn from this work, as well as any other directions that need further investigation.

Chapter 2

Genetic Algorithms

The genetic algorithm is a search algorithm based on the mechanics of natural selection and natural genetics [12]. It belongs to the family of evolutionary algorithms, along with genetic programming, evolution strategies, and evolutionary programming. Evolutionary algorithms can be considered as a broad class of stochastic optimization techniques. An evolutionary algorithm maintains a population of candidate solutions for the problem at hand. The population is then evolved by the iterative application of a set of stochastic operators. The set of operators usually consists of mutation, recombination, and selection, or something very similar. Globally satisfactory, if sub-optimal, solutions to the problem are found in much the same way as populations in nature adapt to their surrounding environment.

As summarized by Tomassini [29], the main idea is that in order for a population of individuals to adapt to some environment, it should behave like a natural system. This means that survival and reproduction of an individual is promoted by the elimination of useless or harmful traits and by rewarding useful behavior. Using Tomassini's terms [29], a set of feasible solutions takes the place of a population of organisms. An individual is a string of binary digits or some other set of symbols drawn from a finite set. Each encoded individual in the population may be viewed as a representation of a particular solution to a problem. Similarly, genetic algorithms (GAs) consider an optimization problem as the environment where feasible solutions are the individuals living in that environment. The degree of adaptation of an individual to its environment is the counterpart of the fitness function evaluated on a solution.

In general, a genetic algorithm begins with a randomly generated set of individuals. Once the initial population has been created, the genetic algorithm enters a loop [29]. At the end of each iteration, a new population has been produced by applying a certain number of stochastic operators to the previous population. Each such iteration is known as a generation [29].

A selection operator is applied first. To produce an intermediate population of n "parent" individuals, n independent extractions of an individual from the old population are performed [29]. The probability of each individual being extracted should be (linearly) proportional to the fitness of that individual. This means that above average individuals should have more copies in the new population, while below average individuals should have few to no copies present, i.e., a below average individual risks extinction.

Once the intermediate population of "parents" (those individuals selected for reproduction) has been produced, the individuals for the next generation will be created through the application of a number of reproduction operators [29]. These operators can involve one or more parents. An operator that involves just one parent, simulating asexual reproduction, is called a mutation operator. When more than one parent is involved, sexual reproduction is simulated, and the operator is called recombination [29]. The genetic algorithm uses two reproduction operators: crossover and mutation.

There are several different types of crossover operators, and the types available depend on what representation is used for the individuals. To apply a crossover operator, parents are paired together. For binary string individuals, one-point, two-point, and uniform crossover are often used. One-point crossover means that the parent individuals exchange a random prefix when creating the child individuals. Two-point crossover is an exchange of a random substring, and uniform crossover takes each bit in the child arbitrarily from either parent.

For permutation or order-based individuals, order, partially mapped, and cycle crossover are options. Order and partially mapped crossover are similar to two-point crossover in that two cut points are selected. For order crossover, the section between the first and second cut points is copied from the first parent to the child [28]. The remaining places are filled using elements not occurring in this section, in the order that they occur in the second parent, starting from the second cut point and wrapping around as needed. For partially mapped crossover, the section between the two cut points defines a series of swapping operations to be performed on the second parent [28]. Cycle crossover satisfies two conditions: every position of the child must retain a value found in the corresponding position of a parent, and the child must be a valid permutation. For example [28]:

• Parent 1 is (h k c e f d b l a i g j) and parent 2 is (a b c d e f g h i j k l)
• For position #1 in the child, a random parent is selected; suppose 'h' is selected from parent 1
• Then 'a' cannot be selected from parent 2 for position #1, and must be chosen from parent 1
• Since 'a' is above 'i' in parent 2, 'i' must also be selected from parent 1
• So, 'a', 'i', 'j', and 'l' must all be selected from parent 1
• The positions of these elements are said to form a cycle, since selection continues until the first element ('h' here) is reached again

After crossover, each individual has a small chance of mutation. The purpose of the mutation operator is to simulate the effect of the transcription errors that can happen in nature with a very low probability [29]. A standard mutation operator for binary strings is bit inversion: each bit in an individual has a small chance of mutating into its complement, i.e., a '0' would mutate into a '1'.

In principle, the loop of selection-crossover-mutation is infinite [29]. However, it is usually stopped when a given termination condition is met. Some common termination conditions are [29]:

1. A pre-determined number of generations have passed
2. A satisfactory solution has been found
3. No improvement in solution quality has taken place for a certain number of generations

The different termination conditions are possible since a genetic algorithm is not guaranteed to converge to a solution. The evolutionary cycle can be summarized as follows [29]:

generation = 0
seed population
while not (termination condition) do
    generation = generation + 1
    calculate fitness
    selection
    crossover
    mutation
end while

Typical application areas for genetic algorithms include:

• Function Optimization
• Graph Applications such as
  - Coloring
  - Paths
  - Circuit Layout
• Scheduling
• Satisfiability
• Game Strategies
• Chess Problems
• Computing Models
• Neural Networks
• Lens Design

Many of these application areas are concerned with problems which are hard to solve but have easily verifiable solutions. Another trait common to these application areas is the equation style of fitness function. Cryptography and cryptanalysis could be considered to meet these criteria. However, cryptanalysis is not closely related to the typical GA application areas and, subsequently, fitness equations are difficult to generate. This makes the use of a genetic algorithm approach to cryptanalysis rather unusual.
Another trait common to these application areas is the equation style of ﬁtness function. This makes the use of a genetic algorithm approach to cryptanalysis rather unusual. However. Cryptography and cryptanalysis could be considered to meet these criteria. cryptanalysis is not closely related to the typical GA application areas and. subsequently.• Computing Models • Neural Networks • Lens Design Many of these application areas are concerned with problems which are hard to solve but have easily veriﬁable solutions. ﬁtness equations are difﬁcult to generate. 8 .

e.Chapter 3 Cryptography In general. [21] which have become well known over time. the genetic algorithm approach has only been used to attack fairly simple ciphers. Most of the ciphers attacked are considered to be classical.1 Monoalphabetic Substitution cipher The simplest cipher of those attacked is the monoalphabetic substitution cipher.D. This cipher is described as follows: • Let the plaintext and the ciphertext character sets be the same. say the alphabet Z 9 . those created before 1950 A. i. Ciphers attacked include: • Monoalphabetic Substitution cipher • Polyalphabetic Substitution cipher • Permutation cipher • Transposition cipher • Merkle-Hellman Knapsack cipher • Chor-Rivest Knapsack cipher • Vernam cipher 3.


This attribute allows associations between ciphertext and plaintext characters, i.e., if the most frequent plaintext character occurred ten times, the ciphertext character it maps to will also appear ten times.

3.2 Polyalphabetic Substitution cipher

This cipher is a more complex version of the substitution cipher. Instead of mapping the plaintext alphabet onto ciphertext characters with a single mapping, different substitution mappings are used on different portions of the plaintext [21]. The simplest case consists of using different alphabets sequentially and repeatedly, so that the position of each plaintext character determines which mapping is applied to it. Under different alphabets, the same plaintext character is encrypted to different ciphertext characters, making frequency analysis harder. The result is called polyalphabetic substitution. This cipher has the following properties (assuming m mappings):

• The key space K consists of all ordered sets of m permutations, represented as (k1, k2, ..., km)
• Each permutation ki is defined on the alphabet in use
• Encryption of the message x = (x1, x2, ..., xm) is given by ek(x) = k1(x1) k2(x2) ... km(xm)
• The decryption key associated with ek(x) is dk(y) = (k1^-1, k2^-1, ..., km^-1)

The encryption and decryption functions could also be written as follows, assuming that the key is k = (k1, k2, ..., km) and that the addition/subtraction operations occur modulo the size of the alphabet:

• eK(x1, x2, ..., xm) = (x1 + k1, x2 + k2, ..., xm + km)
• dK(y1, y2, ..., ym) = (y1 − k1, y2 − k2, ..., ym − km)

The most common polyalphabetic substitution cipher is the Vigenère cipher. This cipher encrypts m characters at a time, making each plaintext character equivalent to m alphabetic characters. An example of the Vigenère cipher is:

• Suppose m = 6
• Use the mapping A ↔ 0, B ↔ 1, ..., Z ↔ 25
• If the keyword K = CIPHER, it is, numerically, (2, 8, 15, 7, 4, 17)
• Suppose the plaintext is the string 'cryptosystem'
• Convert the plaintext elements to residues modulo 26: 2 17 24 15 19 14 18 24 18 19 4 12
• Write the plaintext elements in groups of 6 and add the keyword modulo 26:

   2 17 24 15 19 14      18 24 18 19  4 12
   2  8 15  7  4 17       2  8 15  7  4 17
   4 25 13 22 23  5      20  6  7  0  8  3

• The alphabetic equivalent of the ciphertext string is: EZNWXFUGHAID
• To decrypt, the same keyword is used, but is subtracted modulo 26 from the ciphertext

3.3 Permutation/Transposition ciphers

Although simple transposition ciphers change the dependencies between consecutive characters, they are fairly easily recognized, since the frequency distribution of the characters is preserved [21]. This is true for both ciphers considered here, the permutation cipher and the columnar transposition cipher. Both ciphers apply the same principles, but differ in how the transformation is applied: the permutation cipher is applied to blocks of characters, while the columnar transposition cipher is applied to the entire text at once.

3.3.1 Permutation cipher

The idea behind a permutation cipher is to keep the plaintext characters unchanged, but alter their positions by rearrangement using a permutation [27]. This cipher is defined as:

• Let m be a positive integer, and K consist of all permutations of {1, ..., m}
• For a key (permutation) π ∈ K, define the encryption function eπ(x1, ..., xm) = (xπ(1), ..., xπ(m)) and the decryption function dπ(y1, ..., ym) = (yπ^-1(1), ..., yπ^-1(m)), where π^-1 is the inverse permutation to π

A small example, assuming m = 6 and that the key is the permutation π:

i      1 2 3 4 5 6
π(i)   3 5 1 6 4 2

The first row is the value of i, and the second row is the corresponding value of π(i). If the plaintext given is 'apermutation', it is first partitioned into groups of m letters ('apermu' and 'tation') and then rearranged according to the permutation π. This yields 'EMAURP' and 'TOTNIA', making the ciphertext equal 'EMAURPTOTNIA'. The ciphertext is decrypted in a similar process, using the inverse permutation π^-1. The inverse permutation is constructed by interchanging the two rows, and rearranging the columns so that the first row is in increasing order. Therefore, π^-1 is:

x        1 2 3 4 5 6
π^-1(x)  3 6 1 5 2 4

3.3.2 Transposition cipher

Another type of cipher in this category is the columnar transposition cipher. As described in [24], this cipher consists of writing the plaintext into a rectangle with a prearranged number of columns and transcribing it vertically by columns to yield the ciphertext. The key is a prearranged sequence of numbers which determines both the width of the inscription

rectangle and the order in which to transcribe the columns [24]. This key is usually derived from an agreed keyword, with numbers assigned to the individual keyword letters according to their alphabetical order. For example, if the keyword is SORCERY, the resulting numerical key would be 6 3 4 1 2 5 7 [24].

Suppose we wish to encipher the message LASER BEAMS CAN BE MODULATED TO CARRY MORE INTELLIGENCE THAN RADIO WAVES. (As is traditional, white spaces and punctuation are ignored in this and all ciphertexts used.) It would first be written out on a width of 7 under the numerical key. The rectangle would look like this:

S O R C E R Y
6 3 4 1 2 5 7
L A S E R B E
A M S C A N B
E M O D U L A
T E D T O C A
R R Y M O R E
I N T E L L I
G E N C E T H
A N R A D I O
W A V E S Q R

Enciphering consists of reading the ciphertext out vertically in the order of the numbered columns. It can be written in groups of five letters at the same time, if so desired [24]. The ciphertext is, therefore:

ECDTM ECAER AUOOL EDSAM MERNE NASSO DYTNR VBNLC RLTIQ LAETR IGAWE BAAEI HOR.

Since the message length was not a multiple of the keyword length, the message does not fill the entire rectangle, so two dummy letters are added (here,

Q and R) to make the rectangle completely filled [24]. Deciphering is simplified if the last row contains no blank cells. The decipherer would count the number of letters in the message (63) and determine the length of the inscription rectangle by dividing this value by the length of the keyword, making the inscription rectangle 7 × 9 in this case. The numerical key is then written above the rectangle and the ciphertext characters entered into the diagram in the order of the key numbers [24]. The plaintext is read off by rows, as it was originally entered.

3.4 Knapsack ciphers

Knapsack public-key encryption systems are based on the subset sum problem, an NP-complete problem. Given in this problem are a finite set S ⊂ N+ (N+ is the set of natural numbers {1, 2, 3, ...}) and a target t ∈ N+. The question is whether there exists a subset S' ⊆ S whose elements sum to t [9]. For example:

• If S = {1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993}
• And t = 138457
• Then the subset S' = {1, 2, 7, 98, 343, 686, 2409, 17206, 117705} is a solution

The basic idea of a knapsack cipher is to select an instance of the subset sum problem that is easy to solve, and then disguise it as an instance of the general subset sum problem which is hopefully difficult to solve [21]. The original (easy) knapsack set can serve as the private key, while the transformed knapsack set serves as the public key. The Merkle-Hellman knapsack cipher is important as it was the first implementation of a public-key encryption scheme. Many variations on it have been suggested but most, including the original, have been demonstrated to be insecure [21]. A notable exception to this is the Chor-Rivest knapsack system [21].

3.4.1 Merkle-Hellman Knapsack cipher

The Merkle-Hellman knapsack cipher attempts to disguise an easily solved instance of the subset sum problem, called a superincreasing subset sum problem, by modular multiplication and a permutation [21]. A superincreasing sequence is a sequence (b1, b2, ..., bn) of positive integers with the property that bi > b1 + b2 + ... + b(i−1) for each i, 2 ≤ i ≤ n.

The integer n is a common system parameter. The sequence (b1, b2, ..., bn) is the superincreasing sequence, and M is the modulus, selected such that M > b1 + b2 + ... + bn. W is a random integer, selected such that 1 ≤ W ≤ M − 1 and gcd(W, M) = 1, and π is a random permutation of the integers {1, 2, ..., n}. The public key of A is (a1, a2, ..., an), where ai = W bπ(i) mod M, and the private key of A is (π, M, W, (b1, b2, ..., bn)).

The steps in the transmission of a message m are as follows (B is the sender, A is the receiver) [21]:

1. Encryption. B should do the following:
• Obtain A's authentic public key (a1, a2, ..., an)
• Represent the message m as a binary string of length n, m = m1 m2 ... mn
• Compute the integer c = m1 a1 + m2 a2 + ... + mn an
• Send the ciphertext c to A

2. Decryption. To recover the plaintext m from c, A should do the following:
• Compute d = W^-1 c mod M
• By solving a superincreasing subset sum problem, find the integers r1, r2, ..., rn, with ri ∈ {0, 1}, such that d = r1 b1 + r2 b2 + ... + rn bn
• The message bits are mi = rπ(i), i = 1, 2, ..., n

A small example is as follows [21]:

1. Let n = 6.
2. Entity A chooses the superincreasing sequence (12, 17, 33, 74, 157, 316) and the modulus M = 737.
3. Entity A chooses W = 635.
4. The permutation π of {1, 2, 3, 4, 5, 6} is defined as: π(1) = 3, π(2) = 6, π(3) = 1, π(4) = 2, π(5) = 5, π(6) = 4.
5. A's public key is the knapsack set (319, 196, 250, 477, 200, 559), and A's private key is (π, M, W, (12, 17, 33, 74, 157, 316)).
6. To encrypt the message m = 101101, B computes c = 319 + 250 + 477 + 559 = 1605, and sends it to A.
7. To decrypt, A computes d = W^-1 c mod M = 136, and solves the superincreasing subset sum problem 136 = 12r1 + 17r2 + 33r3 + 74r4 + 157r5 + 316r6, to get 136 = 12 + 17 + 33 + 74. Therefore, r1 = 1, r2 = 1, r3 = 1, r4 = 1, r5 = 0, r6 = 0.
8. Applying the permutation π produces the message bits: m1 = r3 = 1, m2 = r6 = 0, m3 = r1 = 1, m4 = r2 = 1, m5 = r5 = 0, m6 = r4 = 1.

3.4.2 Chor-Rivest Knapsack cipher

The Chor-Rivest cipher is the only known knapsack public-key system that does not use some form of modular multiplication to disguise an easy subset sum problem [21]. This cipher is by far the most sophisticated of those attacked by a genetic algorithm. It cannot be covered in detail in a few pages, so only a summary of the system will be given; further number theoretical algorithms are needed for the set-up and operation of the cipher. When the parameters of this system are carefully chosen, there is no feasible attack on the Chor-Rivest system, and no information is leaked.

Most of the parameters are based on a finite field Fq, where q = p^h, of characteristic p, with p ≥ h, and for which the discrete logarithm problem is feasible [21]. f(x) is a random monic irreducible polynomial of degree h over Zp. The elements of Fq are represented as polynomials in Zp[x] of degree less than h, with multiplication done modulo f(x). g(x) is a random primitive element of Fq. For each ground field element i ∈ Zp, the discrete logarithm ai = log_g(x)(x + i) of the field element (x + i) to the base g(x) must be found. The random permutation π is on the set of integers {0, 1, ..., p − 1}, and the random integer d is selected such that 0 ≤ d ≤ p^h − 2. Each ci is then computed as ci = (aπ(i) + d) mod (p^h − 1), for 0 ≤ i ≤ p − 1. A's public key is ((c0, c1, ..., c(p−1)), p, h), and its private key is (f(x), g(x), π, d).

A message m is transmitted as follows (B is the sender, A is the receiver) [21]:

1. Encryption. B should do the following:
• Obtain A's authentic public key ((c0, c1, ..., c(p−1)), p, h)
• Represent the message m as a binary string of length ⌊lg C(p, h)⌋, where C(p, h) is a binomial coefficient
• Consider m as the binary representation of an integer
• Transform this integer into a binary vector M = (M0, M1, ..., M(p−1)) of length p having exactly h 1's, as follows:
  (a) Set l ← h

  (b) For i from 1 to p do the following:
  (c) If m ≥ C(p − i, l), then set M(i−1) ← 1, m ← m − C(p − i, l), and l ← l − 1
  (d) Otherwise, set M(i−1) ← 0
  (e) Note: C(n, 0) = 1 for n ≥ 0, and C(0, l) = 0 for l ≥ 1
• Compute c = (M0 c0 + M1 c1 + ... + M(p−1) c(p−1)) mod (p^h − 1)
• Send the ciphertext c to A

2. Decryption. To recover the plaintext m from c, A should do the following:
• Compute r = (c − hd) mod (p^h − 1)
• Compute u(x) = g(x)^r mod f(x)
• Compute s(x) = u(x) + f(x), a monic polynomial of degree h over Zp (Zp is the set of integers {0, 1, ..., p − 1}; addition, subtraction, and multiplication in Zp are performed modulo p)
• Factor s(x) into linear factors over Zp: s(x) = (x + t1)(x + t2)...(x + th), where tj ∈ Zp, 1 ≤ j ≤ h
• Compute a binary vector M = (M0, M1, ..., M(p−1)) as follows:
  (a) The components of M that are 1 have indices π^-1(tj), 1 ≤ j ≤ h
  (b) The remaining components are 0
• The message m is recovered from M as follows:
  (a) Set m ← 0 and l ← h
  (b) For i from 1 to p do the following:
  (c) If M(i−1) = 1, then set m ← m + C(p − i, l) and l ← l − 1

An example using artificially small parameters would go as follows [21]:

1. Entity A selects p = 7 and h = 4.
2. Entity A selects the irreducible polynomial f(x) = x^4 + 3x^3 + 5x^2 + 6x + 2 of degree 4 over Z7. The elements of the finite field F(7^4) are represented as polynomials in Z7[x] of degree less than 4, with multiplication performed modulo f(x).
(b) For i from 1 to p do the following:
(c) If m ≥ C(p − i, l), then set Mi−1 ← 1, m ← m − C(p − i, l), and l ← l − 1
(d) Otherwise, set Mi−1 ← 0
(e) Note: C(n, 0) = 1 for n ≥ 0, and C(0, l) = 0 for l ≥ 1
• Compute c = Σ (from i = 0 to p − 1) Mi ci mod (p^h − 1)
• Send the ciphertext c to A

2. Decryption. To recover the plaintext m from c, A should do the following:
• Compute r = (c − hd) mod (p^h − 1)
• Compute u(x) = g(x)^r mod f(x)
• Compute s(x) = u(x) + f(x), a monic polynomial of degree h over Zp (Zp is the set of integers {0, 1, . . . , p − 1}; addition, subtraction, and multiplication in Zp are performed modulo p)
• Factor s(x) into linear factors over Zp: s(x) = Π (from j = 1 to h) (x + tj), where tj ∈ Zp
• Compute a binary vector M = (M0, M1, . . . , Mp−1) as follows:
(a) The components of M that are 1 have indices π^−1(tj), 1 ≤ j ≤ h
(b) The remaining components are 0
• The message m is recovered from M as follows:
(a) Set m ← 0, l ← h
(b) For i from 1 to p do the following:
(c) If Mi−1 = 1, then set m ← m + C(p − i, l) and l ← l − 1

An example using artificially small parameters would go as follows [21]:

1. Entity A selects p = 7 and h = 4
2. Entity A selects the irreducible polynomial f(x) = x^4 + 3x^3 + 5x^2 + 6x + 2 of degree 4 over Z7. The elements of the finite field F(7^4) are represented as polynomials in Z7[x] of degree less than 4, with multiplication performed modulo f(x)

3. Entity A selects the random primitive element g(x) = 3x^3 + 3x^2 + 6
4. Entity A computes the following discrete logarithms:
• a0 = log_g(x)(x) = 1028
• a1 = log_g(x)(x + 1) = 1935
• a2 = log_g(x)(x + 2) = 2054
• a3 = log_g(x)(x + 3) = 1008
• a4 = log_g(x)(x + 4) = 379
• a5 = log_g(x)(x + 5) = 1780
• a6 = log_g(x)(x + 6) = 223
5. Entity A selects the random permutation π on {0, 1, 2, 3, 4, 5, 6} defined by π(0) = 6, π(1) = 4, π(2) = 0, π(3) = 2, π(4) = 1, π(5) = 5, π(6) = 3
6. Entity A selects the random integer d = 1702
7. Entity A computes:
• c0 = (a6 + d) mod 2400 = 1925
• c1 = (a4 + d) mod 2400 = 2081
• c2 = (a0 + d) mod 2400 = 330
• c3 = (a2 + d) mod 2400 = 1356
• c4 = (a1 + d) mod 2400 = 1237
• c5 = (a5 + d) mod 2400 = 1082
• c6 = (a3 + d) mod 2400 = 310
8. Entity A's public key is ((c0, c1, c2, c3, c4, c5, c6), p = 7, h = 4), and its private key is (f(x), g(x), π, d)

9. To encrypt a message m = 22 for A, entity B does the following:
(a) Obtains A's authentic public key ((c0, c1, c2, c3, c4, c5, c6), p, h)
(b) Represents m as a binary string of length 5: m = 10110 (note that ⌊lg C(7, 4)⌋ = ⌊lg 35⌋ = 5)
(c) Transforms m to the binary vector M = (1, 0, 1, 1, 0, 0, 1) of length 7
(d) Computes c = (c0 + c2 + c3 + c6) mod 2400 = 1521
(e) Sends c = 1521 to A
10. To decrypt the ciphertext c = 1521, A does the following:
(a) Computes r = (c − hd) mod 2400 = 1913
(b) Computes u(x) = g(x)^1913 mod f(x) = x^3 + 3x^2 + 2x + 5
(c) Computes s(x) = u(x) + f(x) = x^4 + 4x^3 + x^2 + x
(d) Factors s(x) = x(x + 2)(x + 3)(x + 6), so t1 = 0, t2 = 2, t3 = 3, t4 = 6
(e) The components of M that are 1 have indices π^−1(0) = 2, π^−1(2) = 3, π^−1(3) = 6, and π^−1(6) = 0. Hence, M = (1, 0, 1, 1, 0, 0, 1)
(f) Transforms M to the integer m = 22, recovering the plaintext

To illustrate how small the above parameters are, the parameters originally suggested for the Chor-Rivest system are [21]:
• p = 197 and h = 24
• p = 211 and h = 24
• p = 3^5 and h = 24
• p = 2^8 and h = 25
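The message transform above, which maps an integer to a length-p binary vector of weight h and back, can be sketched directly from the algorithm (a sketch; the function names are our own):

```python
from math import comb  # comb(n, 0) = 1 for n >= 0 and comb(0, l) = 0 for l >= 1

# Sketch of the Chor-Rivest message transform above: an integer m in
# [0, C(p, h)) becomes a length-p binary vector with exactly h ones.
def int_to_vector(m, p, h):
    M, l = [], h
    for i in range(1, p + 1):
        if m >= comb(p - i, l):      # take this position as a 1
            M.append(1)
            m -= comb(p - i, l)
            l -= 1
        else:
            M.append(0)
    return M

def vector_to_int(M, p, h):
    m, l = 0, h
    for i in range(1, p + 1):
        if M[i - 1] == 1:
            m += comb(p - i, l)
            l -= 1
    return m
```

On the example values, `int_to_vector(22, 7, 4)` reproduces M = (1, 0, 1, 1, 0, 0, 1), and `vector_to_int` inverts it.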

3.5 Vernam cipher

The Vernam cipher is a stream cipher defined on the alphabet A = {0, 1}. A binary message m1 m2 . . . mt is operated on by a binary key string k1 k2 . . . kt of the same length to produce a ciphertext string c1 c2 . . . ct, where ci = mi ⊕ ki, 1 ≤ i ≤ t [21]. If the key string is randomly chosen and never used again, this cipher is called a one-time system or one-time pad. The one-time pad can be shown to be theoretically unbreakable [21]. If a cryptanalyst has a ciphertext string c1 c2 . . . ct encrypted using a random, non-reused key string, the cryptanalyst can do no better than guess at the plaintext being any binary string of length t. It has been proven that an unbreakable Vernam system requires a random key of the same length as the message [21].

3.6 Comments

The above ciphers comprise all those cryptosystems attacked using a genetic algorithm in a fairly exhaustive search of the literature. The only ciphers that are considered even remotely mathematically difficult are the knapsack systems, and even they are not true number theoretical protocols. None of the modern ciphers, such as DES, AES, RSA, or Elliptic Curve, were even attempted using a genetic algorithm based approach. This lack makes many of the genetic algorithm attacks of little importance in the field of cryptanalysis. The newest of these systems are the knapsack ciphers, with the Chor-Rivest system published in 1985, and the Merkle-Hellman system published in 1978 [21]. The other ciphers attacked are all systems older than that, in some cases centuries older.
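The ci = mi ⊕ ki construction of the Vernam cipher in Section 3.5 can be sketched in a few lines (a minimal sketch; the names are our own):

```python
import secrets

# Sketch of the Vernam cipher above: ci = mi XOR ki, bit by bit.
def vernam(bits, key):
    return [m ^ k for m, k in zip(bits, key)]

msg = [1, 0, 1, 1, 0, 1]
key = [secrets.randbits(1) for _ in msg]  # random key, same length, never reused
ct = vernam(msg, key)
pt = vernam(ct, key)                      # XOR with the same key is self-inverse
```

Because XOR is its own inverse, applying the same key twice recovers the plaintext exactly.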

Chapter 4

Relevant Prior Works

While there have been many papers written on genetic algorithms and their application to various problems, there are relatively few papers that apply genetic algorithms to cryptanalysis. The earliest papers that use a genetic algorithm approach for a cryptanalysis problem were written in 1993, almost twenty years after the primary paper on genetic algorithms by John Holland [12] [14]. These papers focus on the simplest classical and modern ciphers, and are cited frequently by later works. Figures 4.1 through 4.4 show which ciphers are attacked by which authors.

Figure 4.1: Substitution Cipher Family and Attacking Papers
[Tree diagram: Substitution branches into Monoalphabetic and Polyalphabetic. Monoalphabetic: Spillman et al. 1993; Clark 1994; Clark & Dawson 1998; Grundlingh & Van Vuuren 2002. Polyalphabetic: Clark et al. 1996; Clark & Dawson 1997.]

Figure 4.2: Permutation Cipher Family and Attacking Papers
[Tree diagram: Permutation/Transposition branches into Permutation and Transposition. Permutation: Clark 1994. Transposition: Matthews 1993; Grundlingh & Van Vuuren 2002.]

Figure 4.3: Knapsack Cipher Family and Attacking Papers
[Tree diagram: Knapsack branches into Merkle-Hellman and Chor-Rivest. Merkle-Hellman: Spillman 1993; Clark, Dawson, Bergen 1996; Kolodziejczuk 1997. Chor-Rivest: Yaseen, Sahasrabuddhe 1999.]

Figure 4.4: Vernam Cipher Family and Attacking Paper
[Tree diagram: Vernam: Lin, Kao 1995.]

4.1 Substitution: Spillman et al. - 1993

The first paper published in 1993, by Spillman, Janssen, Nelson and Kepner [26], focuses on the cryptanalysis of a simple substitution cipher using a genetic algorithm. This paper selects a genetic algorithm approach so that a directed random search of the key space can occur. The paper begins with a review of genetic algorithms, covering the genetic algorithm life cycle and the systems needed to apply a genetic algorithm. The systems are considered to be a key representation, a mating scheme, and a mutation scheme.

The selected systems are then discussed. The key representation is an ordered list of alphabet characters, where the first character in the list is the plaintext character for the most frequent ciphertext character, the second character is the plaintext character for the second most frequent ciphertext character, and so on. The mating scheme randomly selects two keys for mating, weighted in favor of keys with a high fitness value. The two parent keys create two child keys by selecting from each slot in the parents the more frequent ciphertext character. If the more frequent character is already in the child, then the character from the other parent is selected. The first child key is created by scanning left to right, and the second by scanning right to left. The mutation scheme consists of swapping two characters in the key string.

The fitness function uses unigrams and digrams, and is as follows:

Fitness = (1 − (Σ (from i = 1 to 26) {|SF[i] − DF[i]| + Σ (from j = 1 to 26) |SDF[i, j] − DDF[i, j]|}) / 4)^8    (4.1)

SF[i] is the standard frequency of character i in English plaintext, while DF[i] is the measured frequency of the decoded character i in the ciphertext. SDF is the standard frequency for a digram and DDF the measured frequency. The function represents a normalized error value; the first summation term is the sum of the differences between each character's standard frequency and the frequency of the ciphertext character that decodes to that character. Larger fitness values represent smaller errors. When the measured and standard frequencies are the same, the summation terms equal zero, making the fitness value equal one. Sensitivity to small differences is increased by raising the result to the 8th power, while sensitivity to large differences is decreased by the division of the summation terms by 4. This fitness function is bounded below by zero, but cannot evaluate to zero. Since high fitness values are more important, this does not affect the overall result.

The complete algorithm used in this work is:

1. Generate a random population of key strings
2. Determine a fitness value for each key in the population
3. Do a biased random selection of parents
4. Apply the crossover operation to the parents
5. Apply the mutation process to the children
6. Scan the new population of key strings and update the list of the 10 "best" keys seen

The process stops after a fixed number of generations, at which point the best keys are used to decipher the ciphertext. Results are given for 100 generation runs of the algorithm, using populations of 10 and 50 keys, and mutation rates of 0.05, 0.10, 0.20, and 0.40. The exact key is found after searching less than 4x10^-23 of the key space, or after the examination of about 1000 keys. In the larger population size, only about 20 generations are needed to examine 1000 keys. The best mutation rates for this experiment were those less than 0.20.

4.1.1 Comments

Overall, this paper is a good choice for re-implementation and comparison. It presents the basic genetic algorithm concepts clearly, and uses examples and diagrams to clarify important points. It is an early work, so does not compare its results to others, but provides a decent level of result reporting and discussion, making comparison possible.

• Typical results:
– The exact key is not always found

– Visual inspection of the plaintext is required most of the time to determine the exact key
• Re-implementation approach:
– The same parameters are re-used
– Additional parameters are used to expand the range of parameter values
• Observed cryptanalytic usefulness:
– None - no exact keys found in testing
• Comment on result discrepancy:
– Different measures of success were used

4.2 Transposition: Matthews - 1993

The second paper published in 1993 [20], by R. A. J. Matthews, uses an order-based genetic algorithm to attack a simple transposition cipher. The paper begins by discussing the archetypal genetic algorithm. This includes the typical genetic algorithm components, as well as the two characteristics a problem must have in order for it to be attackable by a genetic algorithm. The first characteristic is that a partially-correct chromosome must have a higher fitness than an incorrect chromosome. The second characteristic is an encoding for the problem that allows genetic operators to work.

The next section of the paper discusses the cryptanalytic use of the genetic algorithm. It concludes that substitution, transposition, permutation, and other ciphers are attackable by a genetic algorithm, while ciphers such as DES (Data Encryption Standard), which is a Feistel cipher, are not attackable.

The cipher selected for attack in this work is a classical two-stage transposition cipher. Here, a key of length N takes the form of a permutation of the integers 1 to N. The plaintext, L characters long, is written beneath the key to form a matrix N characters wide by (at least) L/N characters deep. The second stage consists of reading the text off the columns in the order indicated by the integers in the key. Decryption consists of writing down the key, writing the ciphertext back into the matrix in sequence, and reading across the columns. Breaking this cipher is typically a two-stage process as well. The stages are determining the length of the transposition sequence (N), determining the length of the columns, and then finding the permutation of the N integers.

The genetic algorithm system used to attack the transposition cipher is known as GENALYST. It incorporates all the standard genetic algorithm features, with the addition of the items needed for an order-based approach. The GENALYST system uses a fitness scoring system based on the number of times common English digrams and trigrams appear in the decrypted text. Ten digrams and trigrams were used - TH, HE, IN, ER, AN, ED, THE, ING, AND, and EEE - in order to help optimize the effectiveness of GENALYST. The presence of EEE resulted in a subtraction of fitness points, as it practically never appears in English. This process was checked for correlation between fitness value and correctness of chromosome, to improve confidence in GENALYST.

GENALYST selects the fittest chromosomes using a roulette wheel approach based on normalized fitness values. This means that the probability of a chromosome getting selected is proportional to its normalized fitness value. "Elitism" is present to ensure that the fittest or most elite chromosomes found so far are always included in the breeding pool. After selection occurs, breeding is done with a proportion of the remaining chromosomes. The proportion used decreases linearly as the search continues. Breeding is done using the position based crossover method for order-based genetic algorithms. In this method, a random bit string of the same length as the chromosomes is used. The random bit string indicates which elements to take from parent #1 by a 1 in the string, and the elements omitted from parent #1 are added in the order they appear in parent #2. The second child is produced by taking elements from parent #2 indicated by a 0 in the string, and filling in from parent #1 as before.

Two possible mutation types may then be applied. The first type is a simple two-point mutation where two random chromosome elements are swapped. The second type is a shuffle mutation, where the permutation is shifted forward by a random number of places.
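The position-based crossover described above can be sketched as follows; the mask and parent permutations are illustrative values, not taken from the paper:

```python
# Sketch of position-based crossover for order-based chromosomes, as
# described above. The mask plays the role of the random bit string.
def position_crossover(p1, p2, mask):
    # child 1: keep p1's elements where the mask is 1; fill the gaps with
    # the omitted elements in the order they appear in p2
    kept1 = {e for e, b in zip(p1, mask) if b}
    fill1 = iter(x for x in p2 if x not in kept1)
    child1 = [e if b else next(fill1) for e, b in zip(p1, mask)]
    # child 2: keep p2's elements where the mask is 0; fill from p1 as before
    kept2 = {e for e, b in zip(p2, mask) if not b}
    fill2 = iter(x for x in p1 if x not in kept2)
    child2 = [next(fill2) if b else e for e, b in zip(p2, mask)]
    return child1, child2

c1, c2 = position_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [0, 1, 1, 0, 0])
```

Both children remain valid permutations, which is the property that makes this operator suitable for transposition keys.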

A test procedure is given to ensure that test texts have a true ﬁtness value that is representative of English text. This is calculated mathematically, based on digram/trigram percentage frequency in the text, the ﬁtness score given by GENALYST to that di- or tri-gram, and the summation over all di- and tri-grams checked. The text length is also important, as it is easier to attack texts with a length that is an integer multiple of the key length. Based on these considerations, a test text with a length of 181 characters was selected. This length is prime, but close to multiples of 6, 9, 10, and 12. This is a worst-case scenario since not all column lengths are the same, and multiple possibilities for the key length are present. A Monte Carlo search is run on the same text so that the genetic algorithm results can be compared to the results from a traditional search technique. In a Monte Carlo search, a series of guesses is made to determine the correct key, and the most elite chromosome is kept. The breaking of a transposition cipher has two parts: ﬁnding the correct keylength, and ﬁnding the correct permutation of the key. GENALYST was tested on both parts. For the ﬁrst part, the test text was encrypted using a target key. GENALYST was then used to ﬁnd the best decryption of the text, using seven different keylengths in the range of 6 to 12, which includes the length of the target key. The goal of this section was to see if the ﬁttest of the seven chromosomes had the same length as the target key. The population size used was 20, and the number of generations was 25. The probability of crossover started at 0.8 and decreased to 0.5. Mutation probability started at 0.1 for both point mutation and shufﬂe mutation, increasing to 0.5 for point mutation and to 0.8 for shufﬂe mutation. The experiment was run ﬁve times each for three target keylengths, K = 7, 9, 11. GENALYST was successful in 13 of the 15 runs (87%). 
However, the random Monte Carlo algorithm performed equally well in this task, also producing a success rate of 87%. For the second part, the same parameters of population size and number of generations were used. GENALYST managed to completely break the cipher in one of the runs for K = 7 and one of the runs for K = 9. Overall, GENALYST involves little more computational effort than the Monte Carlo algorithm, since the single largest task, ﬁtness rating, is present in both.
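The digram/trigram scoring that dominates the run time of both searches can be sketched as follows; the point values here are illustrative only, since the paper's exact scoring table is not reproduced in this chapter:

```python
# Illustrative n-gram scoring in the spirit of GENALYST's fitness function.
# The point values are assumptions; only the n-gram list comes from the text.
REWARDS = {"TH": 2, "HE": 1, "IN": 1, "ER": 1, "AN": 1,
           "ED": 1, "THE": 5, "ING": 5, "AND": 5}
PENALTY = {"EEE": -5}  # EEE practically never appears in English

def fitness(text):
    score = 0
    for gram, pts in {**REWARDS, **PENALTY}.items():
        score += pts * text.count(gram)  # count non-overlapping occurrences
    return score
```

A candidate decryption containing common English n-grams scores higher, while runs of E are penalized.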


Therefore, a genetic algorithm approach is useful even when only a small advantage is produced. Using a larger gene pool and number of generations boosts GENALYST’s performance signiﬁcantly over the performance of Monte Carlo. The greatest improvement in GENALYST’s performance comes from hybridizing the algorithm with other techniques, such as perming.

4.2.1 Comments

Overall, this is one of the best reference papers available. It is an early work, but is very complete. It covers background material and problem considerations well, and has a very balanced approach to considering the genetic algorithm as a possible solution. The parameters are clearly reported, as are the algorithms used. This is an excellent candidate for re-implementation and comparison. • Typical results: – Keylength correctly determined 87% of the time – Correct key found 13.33% of the time • Re-implementation approach: – Different text lengths used, same parameters re-used – Additional parameters are used to expand the range of parameter values • Observed cryptanalytic usefulness: – Low - low decryption percentage • Comment on result discrepancy: – Percentages reported are based on number of tests and different numbers of tests were used


4.3 Merkle-Hellman Knapsack: Spillman - 1993

The third paper published in 1993, by R. Spillman [25], applies a genetic algorithm approach to a Merkle-Hellman knapsack system. This paper considers the genetic algorithm as simply another possibility for the cryptanalysis of knapsack ciphers. A knapsack cipher is based on the NP-complete problem of knapsack packing, which is an application of the subset sum problem. The problem statement is: "Given n objects each with a known volume and a knapsack of fixed volume, is there a subset of the n objects which exactly fills the knapsack?". The idea behind the knapsack cipher is that finding the sum is hard, and this fact can be used to protect an encoding of information. There must also be a decoding algorithm that is relatively easy for the intended recipient to apply. An easy decoding algorithm can be made by using the easy knapsacks. An easy knapsack consists of a sequence of numbers that is superincreasing, i.e., each number is greater than the sum of the previous numbers. However, an easy knapsack does not protect the data if the elements of the knapsack are known. A Merkle-Hellman knapsack system converts an easy knapsack into a trapdoor knapsack. A trapdoor knapsack is not superincreasing, making it harder to attack. The Merkle-Hellman procedure to create a trapdoor knapsack sequence A' is as follows:

1. Given a simple knapsack sequence A = (a1, . . . , an)
2. Select an integer u > 2an
3. Select another integer w such that gcd(u, w) = 1
4. Find the inverse of w mod u
5. Construct the trapdoor sequence A' = wA mod u

Once the easy knapsack has been converted into a trapdoor knapsack, both the easy knapsack and the value of w^-1 must be known to decode the data. The target sum for the superincreasing sequence is calculated by multiplying the sum of the trapdoor sequence and w^-1 together and then reducing it modulo u. This system can become a public-key cipher, with the trapdoor sequence the public key. The private key would be the superincreasing sequence and the values of u, w, and w^-1.

The next section of the paper introduces genetic algorithms using binary chromosomes. The applicable processes of selection, mating, and mutation are described. Selection is a random process which is biased so that the chromosomes with higher fitness values are more likely to be selected. The mating process is a simple crossover operation, i.e., bits s + 1 to r of parent #1 are swapped with bits s + 1 to r of parent #2. Mutation consists of switching a bit in the chromosome to its complement, i.e., a 0 becomes a 1.

The three systems that need to be defined for a genetic algorithm application are: representation scheme, mating process, and mutation process. The representation used here is that of binary chromosomes, which represent the summation terms of the knapsack. Here, a 1 in the chromosome indicates that that term should be included when calculating the knapsack sum. The evaluation function is determined once a representation scheme has been selected. In general, the function measures how close a given sum of terms is to the target sum of the knapsack. Other properties included for this experiment are:

• Function range between 0 and 1, with 1 indicating an exact match
• Chromosomes that produce a sum greater than the target should have a lower fitness than those that produce a sum less than the target
• It should be difficult to produce a high fitness value

The actual evaluation function used depends on the relationship between the target sum and the chromosome sum. The function is determined as follows:

1. Calculate the maximum difference that could occur between a chromosome and the target sum:

MaxDiff = max(Target, FullSum − Target)    (4.1)

where FullSum is the sum of all components in the knapsack

2. Determine the Sum of the current chromosome
3. If Sum > Target then

Fitness = 1 − (|Sum − Target|/Target)^(1/6)    (4.2)

4. If Sum ≤ Target then

Fitness = 1 − (|Sum − Target|/MaxDiff)^(1/2)    (4.3)

As described above, the mating process is simple crossover. The mutation process consists of three different possibilities. About half the time, bits are randomly mutated as described above (switched to their complement). Next most likely is the swapping of a bit with a neighboring bit. The final possibility is a reversal of the bit order between two points.

A description of the complete algorithm is given, as well as results for one 15-item knapsack applied to a 5 character text, with an initial population of 20 random 15-bit binary strings. The characters are encoded as 8 bit ASCII characters. Each of the 5 sums (one per character) was attacked using the genetic algorithm.

4.3.1 Comments

This paper is reasonable for its length (about 10 pages) and year. The introductory material and discussion are reasonable, although short, and there are sufficient results to make a comparison possible. A minimal level of experimentation is included, but the results and parameters are decently reported. Re-implementation should not be difficult. Overall, this paper is a reasonable choice for re-implementation and comparison.

• Typical results:
– Used 5 ASCII characters, each of which was decrypted
• Re-implementation approach:
– Not re-implemented
• Observed cryptanalytic usefulness:
– None - the knapsack size is trivial
• Comment on result discrepancy:
– No comment, not re-implemented

4.4 Substitution/Permutation: Clark - 1994

The only paper published in 1994, by Andrew Clark [4], includes the genetic algorithm as one of three optimization algorithms applied to cryptanalysis. The other two algorithms are tabu search and simulated annealing. The first section of the paper describes the two ciphers used - simple substitution and permutation. The next three sections introduce the basic concepts of simulated annealing, genetic algorithms, and tabu search. The genetic algorithm section is fairly standard, and contains much the same material as earlier papers. Next, suitability assessment is discussed. This section is where the fitness functions are developed. For the substitution cipher, the fitness function is

Fk = α Σ (over i ∈ Λ) |SF[i] − DF[i]| + β Σ (over i ∈ Λ) Σ (over j ∈ Λ) |SDF[i][j] − DDF[i][j]|    (4.1)

Λ denotes the set of characters in the alphabet (A, B, C, . . . , Z, space). SF[i] is the relative frequency of character i in English, while DF[i] is the relative frequency of the decoded character i in the message decrypted using the key k. SDF[i][j] is the relative frequency of the digram ij in English, while DDF[i][j] is the relative frequency of that digram in the message decrypted using k. Varying α and β allows one to favor either the unigram or digram frequencies. For the permutation cipher, the same approach was used as in [20] - the scoring table for digrams and trigrams. The final section in the paper discusses the results. The section is short and does not give any specifics about the algorithm used.
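The substitution cost function above can be sketched as follows; note that, unlike Spillman's fitness, this is an error measure, so lower values are better. The frequency tables in the example call are tiny stand-ins, not real English statistics:

```python
# Sketch of the cost function F_k above (lower is better). SF/SDF hold the
# reference English unigram/digram frequencies; DF/DDF hold the measured
# frequencies of the candidate decryption. All tables are dictionaries.
def cost(SF, DF, SDF, DDF, alpha=1.0, beta=1.0):
    uni = sum(abs(SF[c] - DF.get(c, 0.0)) for c in SF)
    di = sum(abs(SDF[d] - DDF.get(d, 0.0)) for d in SDF)
    return alpha * uni + beta * di

# toy call: unigram frequencies match exactly, digram frequencies differ by 0.2
err = cost({"A": 0.5}, {"A": 0.5}, {"AB": 0.3}, {"AB": 0.1})
```

Raising alpha favors unigram agreement; raising beta favors digram agreement, mirroring the role of α and β in the equation.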

4.4.1 Comments

Overall, this paper is too short (5 pages) to be useful. It is well written, but lacks details. The tiny genetic algorithm section and few reported results make it difficult to re-implement or even to compare against. This paper is not a good candidate for further use.

• Typical results:
– Convergence rates and computation time are the only reported results
• Re-implementation approach:
– Parameters were not reported
– Standard parameter values used in other re-implementations were used
– Only the permutation attack was re-implemented, as a later work from this author was used for the substitution attack
• Observed cryptanalytic usefulness:
– Medium - the correct key was found with a high success rate
• Comment on result discrepancy:
– No real discrepancy, different combination of metrics used

4.5 Vernam: Lin, Kao - 1995

This is the only paper published in 1995. By Feng-Tse Lin and Cheng-Yan Kao, [18] is a ciphertext-only attack on a Vernam cipher. It is fairly short, and does not give many details. The paper begins by introducing cryptography and cryptanalysis, including attack and cryptosystem types, as well as the Vernam cipher itself.

The cipher given in this paper is not the Vernam cipher. Let M = m1 m2 . . . mn denote a plaintext bit stream and K = k1 k2 . . . kr a key bit stream. The cipher described generates a ciphertext bit stream C = Ek(M) = c1 c2 . . . cn, where ci = (mi + ki) mod p and p is a base. Since this is a one-key (symmetric or private-key) cryptosystem, the decryption formula has the same key bit stream K: M = Dk(C) = m1 m2 . . . mn, where mi = (ci − ki) mod p. The Vernam cipher is a one-time pad style system. The proposed approach is to determine K from an intercepted ciphertext C, and use it to break the cipher.

4.5.1 Comments

This paper is of below average quality. It is rather short, and was written for the IEEE International Conference on Systems, Man, and Cybernetics. There are some problems with English grammar, possibly due to translation from the original language, and there are many errors in this paper, both in notation and logic. The systems are adequately described, but are, however, incorrect. The genetic algorithm approach is rendered invalid by the errors present in the reported approach and assumptions. Since the described cryptosystem is not the Vernam cipher, this application needs no further discussion. A re-implementation is impossible due to the many errors present in this work.

• Typical results:
– One run of the genetic algorithm is shown, with parameters
• Re-implementation approach:
– Not re-implemented
• Observed cryptanalytic usefulness:
– None - inconsistent notation, leading one to believe that the cipher attacked is not the true Vernam cipher

• Comment on result discrepancy:
– No comment, not re-implemented

4.6 Merkle-Hellman Knapsack: Clark, Dawson, Bergen - 1996

This paper, by Clark, Dawson, and Bergen [7], is an extension of [25]. It contains a detailed analysis of the fitness function used in [25], as well as a modified version of the same fitness function. The paper begins by introducing the subset sum problem and previous work in genetic algorithm cryptanalysis. The genetic algorithm attack is then introduced and Spillman's [25] attack summarized. The fitness function from [25] is then discussed. This attack is poor because the fitness function does not accurately describe the suitability of a given solution. The Hamming distance between the proposed solution and the true solution is not always indicated correctly by the fitness function. Also, the number of generations required, and the percentage of the key space searched, vary widely between runs. Since this function penalizes solutions which have a sum greater than the target sum for no clear reason, the function is altered to:

Fitness = 1 − (|Sum − Target|/MaxDiff)^(1/2)    (4.1)

The altered fitness function is found to be more efficient by running a genetic algorithm attack on 100 different knapsack sums from the knapsack of size 15 used by Spillman [25]. This function requires fewer generations, on average, and searches less of the key space to find a solution. In this work, 100 different knapsack sums were formed for each of three different knapsack sizes - 15, 20, and 25 - and the algorithm run until the solution was found. The amount of key space searched is about half that of an exhaustive attack, which is not a significant gain for a large knapsack. The gain is insignificant due to the overhead of the genetic algorithm approach.
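The altered fitness function can be sketched in a few lines; as illustrative input we reuse the size-6 superincreasing knapsack from Section 3.4.1, whose components sum to 609 with target 136:

```python
from math import sqrt

# Sketch of the altered fitness above: 1 - sqrt(|Sum - Target| / MaxDiff),
# where MaxDiff = max(Target, FullSum - Target).
def fitness(subset_sum, target, full_sum):
    max_diff = max(target, full_sum - target)
    return 1 - sqrt(abs(subset_sum - target) / max_diff)
```

An exact match scores 1; any miss, above or below the target, is penalized symmetrically, unlike the original two-branch function.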

Experimental results show that the genetic algorithm gives little improvement over an exhaustive search in terms of solution space covered, and, due to reduced algorithmic complexity, an exhaustive attack will, on average, be faster. A high fitness value does not necessarily mean a low Hamming distance. The distribution of Hamming distances for a given range of fitness values appears to be binomial-like. Therefore, a fitness function based on a difference of sums is not, in fact, adequate when attempting to solve the subset-sum problem with an optimization algorithm.

4.6.1 Comments

This paper clearly discusses the problems with an earlier work, [25]. It is well written and gives a reasonable number of details. It is not, however, a stand-alone work. A re-implementation is possible but unnecessary, since the work establishes that the fitness function used is inappropriate. A comparison may be possible but unlikely, since a new fitness function for the attack is needed.

• Typical results:
– Reports the number of generations and the key space searched
– These results are for three knapsack sizes (15, 20, 25), each attacked until the solution was found
• Re-implementation approach:
– Not re-implemented
• Observed cryptanalytic usefulness:
– None - this work disproves the usefulness of an earlier work [25]
• Comment on result discrepancy:
– No comment, not re-implemented

4.7 Vigenère: Clark, Dawson, Nieuwland - 1996

This paper, by Clark, Dawson, and Nieuwland [8], is the first to use a parallel genetic algorithm for cryptanalysis. It attacks a polyalphabetic substitution cipher. The first section introduces the simple and polyalphabetic substitution ciphers, while the second section introduces the genetic algorithm. The fitness function uses weighted unigram, digram, and trigram frequencies as follows:

F(K) = w1 Σ (from i = 1 to N) (K1[i] − D1[i])^2 + w2 Σ (from i, j = 1 to N) (K2[i, j] − D2[i, j])^2 + w3 Σ (from i, j, k = 1 to N) (K3[i, j, k] − D3[i, j, k])^2    (4.1)

Here, K is a single N character key for a simple substitution cipher. K1, K2, and K3 are the known unigram, digram, and trigram statistics, respectively. D1, D2, and D3 are the statistics for the decrypted message, and w1, w2, and w3 are weights, chosen to sum to one, that emphasize different statistics. Since computation of a fitness function including trigrams can be computationally intensive, this work starts out using only unigrams and digrams, and later adds in the trigram statistics. The mating function is borrowed from [26] and requires the same special ordering of the key. Mutation has two possibilities, depending on the relative fitness of the child key. If the child has a fitness greater than the median in the gene pool, each character is swapped with the character to its right. If the child's fitness is less than the median, each character is swapped with a randomly chosen character in the key.

Next, the application of the parallel genetic algorithm to the polyalphabetic substitution cipher is discussed. The parallel approach chosen for this work is to have a number of serial genetic algorithms, each working on a separate part of the problem. In this case, M processors in parallel are used, where M is the period of the polyalphabetic substitution cipher, since more than one key must be found. Each processor works on finding a different one of the M keys, and communicates with other processors after a number of iterations. This communication involves sending the best key to each of a processor's neighbors, which allows digram and trigram statistics to be computed.

Fixed parameters for the experiment include: • A gene pool size of 40 • A period of 3 • 200 iterations of the genetic algorithm • A mutation rate of 0.5 after 100 iterations. the weights were w1 = 0. and w3 = 0. 0. The algorithm was run 20 times per combination on ten different messages. The selection mechanism used is to combine all the children and all the keys from the previous generation together and select the most ﬁt to be the new generation.5. and w3 = 0. 40 . 0. About 90% of the original message was recovered using 600 known ciphertext characters per key. w2 = 0.4. w2 and w3 were ﬁxed for all 200 iterations of the algorithm. the best result was taken.0 initially. A plot was created. There are 66 combinations that satisfy these conditions.8.2. In this test. w3 ∈ {0. 0.4. 1. The values of w1 . 0. the random number generator was seeded differently every time.swapped with a randomly chosen character in the key. The second test was done to indicate the optimal choice of weights.8.2. 0.3. 0. and w1 = 0. Of the two runs per message.6. twice per message. 0.1. In the ﬁve runs per message. with each point found by running the algorithm on ten different encrypted messages.9. 0.7. The best results occurred when w3 = 0.05 • 10 iterations between slave-to-slave communications The ﬁrst test was done to determine the amount of known ciphertext needed to retrieve a signiﬁcant portion of the plaintext.0.6.3. w2 = 0.0. 0. w2 .0} and 3 i=1 wi = 1. ﬁve times per message. The weights were chosen such that w1 . with the ten best results then averaged together.
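As a concrete illustration, the weighted n-gram cost of equation (4.7.1) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names (`ngram_stats`, `fitness`), the default weights, and the use of relative frequencies are illustrative assumptions.

```python
from collections import Counter

def ngram_stats(text, n):
    """Relative frequency of each n-gram appearing in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = max(len(grams), 1)
    return {g: c / total for g, c in Counter(grams).items()}

def fitness(decrypted, language, weights=(0.2, 0.3, 0.5)):
    """Weighted sum of squared differences between the known language
    statistics (K1..K3) and the decrypted-message statistics (D1..D3),
    as in equation (4.7.1); a lower cost indicates a fitter key."""
    cost = 0.0
    for n, w in zip((1, 2, 3), weights):
        if w == 0.0:  # skip the expensive trigram pass while unweighted
            continue
        known, observed = language[n], ngram_stats(decrypted, n)
        for gram in set(known) | set(observed):
            cost += w * (known.get(gram, 0.0) - observed.get(gram, 0.0)) ** 2
    return cost
```

A decryption whose statistics match the reference language exactly has zero cost, and setting a weight to zero skips that n-gram pass entirely, which is how a schedule can defer the trigram term until trigram statistics become available.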

4.7.1 Comments

This paper is short but reasonably detailed. The algorithms are clearly defined, and the result graphs are a nice touch. Re-implementation and comparison are reasonably possible, both in a serial and a parallel version. If re-implemented, the serial version would be created first, to determine whether parallelization is warranted.

• Typical results:
  – Reports on the amount of known ciphertext required in order to regain a certain percentage of the original message
  – Reports on the optimal choice of fitness function weights
• Re-implementation approach:
  – Re-implemented in a serial version only
  – Most of the original parameters were used in the re-implementation
  – Additional parameters are used to expand the range of parameter values
• Observed cryptanalytic usefulness:
  – None - no messages were successfully decrypted
• Comment on result discrepancy:
  – Different conclusions based on the different metrics used

4.8 Vigenère: Clark, Dawson - 1997

This is the first paper published in 1997. By Clark and Dawson, [5] is, overall, a slightly more detailed, longer version of [8]. It shares all the same characteristics and is evaluated as above. However, only the parallel case is considered, making the paper of limited usefulness.

4.9 Merkle-Hellman Knapsack: Kolodziejczyk - 1997

This paper [17], published in 1997, is by Kolodziejczyk. It is an extension of [25], and focuses on the Merkle-Hellman knapsack system and the effect of initial parameters on the approach reported in [25]. The paper introduces the Merkle-Hellman knapsack, and then discusses the application of a genetic algorithm to the cryptanalysis thereof. The genetic algorithm approach needs no discussion. Certain restrictions are placed on the encoding algorithm. These are:

• Only the ASCII code will be used in encryption
• The superincreasing sequence will have 8 elements, ensuring that each character has a unique encoding
• The plaintext is not more than 100 characters in length

Due to these restrictions, the knapsack is a completely trivial problem.

4.9.1 Comments

Overall, this paper presents a reasonable amount of numerical data and is a good extension work to [25]. Unfortunately, the paper is not well written: it has poor English grammar and word usage, and uses a completely trivial knapsack. The only contribution of this paper is the addition of parameter variation to [25]. There is no point in re-implementation of this paper, as this application has no point.

• Typical results:
  – Used 5 ASCII characters, each of which was decrypted
• Re-implementation approach:
  – Not re-implemented

• Observed cryptanalytic usefulness:
  – None - knapsack size is trivial
• Comment on result discrepancy:
  – No comment; not re-implemented

4.10 Substitution: Clark, Dawson - 1998

This is the only paper published in 1998. By Clark and Dawson, [6] compares three optimization algorithms applied to the cryptanalysis of a simple substitution cipher. These algorithms are simulated annealing, the genetic algorithm, and tabu search, with a new attack for tabu search. Performance criteria, such as speed and efficiency, are investigated. The paper begins by introducing the application area and giving the basis for the current work, as compared to the attack done in [4]. It is followed by an overview of the attack, as well as details about the attack for each algorithm. A simple substitution cipher is attacked by all three algorithms. Each of the attacks is compared on three criteria:

• The amount of known ciphertext available to the attack
• The number of keys considered before the correct solution was found
• The time required by the attack to determine the correct solution

The next section of the paper gives a detailed overview of each of the three algorithms. The overall attack uses a general suitability assessment, or fitness function. Assuming A denotes the language alphabet, here {A, ..., Z, _} (where _ is the symbol for a space), the function is:

Ck = α Σ_{i∈A} |K^u(i) − D^u(i)| + β Σ_{i,j∈A} |K^b(i,j) − D^b(i,j)| + γ Σ_{i,j,k∈A} |K^t(i,j,k) − D^t(i,j,k)|   (4.10.1)

Here k is the proposed key, K is the known language statistics, D is the decrypted message statistics, the superscripts u/b/t denote the unigram, digram, and trigram statistics, respectively, and α, β, and γ are the weights.

The effectiveness of using the different n-grams is evaluated using a range of weights, and it is concluded that trigrams are the most effective basis for a cost function; the benefit of using trigrams over digrams is small, however. The complexity of this formula is O(N³) when trigram statistics are used, where N is the alphabet size. Due to the added complexity of using trigrams, it is usually more practical to use only unigrams and digrams. Therefore, the remainder of the paper uses a fitness function based only on digrams. This function is:

Ck = Σ_{i,j∈A} |K^b(i,j) − D^b(i,j)|   (4.10.2)

The genetic algorithm attack section begins with detailed pseudocode for the mating process, which is similar to that used in [26]. Selection is random, with a bias towards the more fit chromosomes. Next, the mutation operator is discussed; mutation randomly swaps two characters in a chromosome. The M best solutions after reproduction continue on to the next iteration. Pseudocode is then given for the overall genetic algorithm attack. The attack parameters are M, the solution pool size, and MAX_ITER, the maximum number of iterations.

Experimental results for the three algorithms were generated with 300 runs per data point. One hundred different messages were created for each case, and the attack was run three times per message. The best results for each message were then averaged together to produce the data point. This is done to provide a good representation of each algorithm's ability. In overall outcome of the attack, all the algorithms performed approximately the same. The algorithms also perform equally well when comparing the amount of ciphertext provided to the attack. Considering complexity, however, simulated annealing is more efficient than the genetic algorithm, and tabu search is the most efficient.

4.10.1 Comments

Overall, this paper is well written and provides a good level of detail. It provides good discussion of important points, especially in interpretation of the results. It is not focused solely on a genetic algorithm attack.

It only uses a simple substitution cipher. However, the details given make a re-implementation fairly easy, and provide enough information for comparison purposes. This paper is a good choice for further study.

• Typical results:
  – Results include number of key elements correct vs. amount of known ciphertext, number of key elements correct vs. total keys considered, and number of key elements correct vs. time
• Re-implementation approach:
  – Parameters were not reported
  – Standard parameter values used in other re-implementations were used
• Observed cryptanalytic usefulness:
  – None - no successful decryptions of the message occurred
• Comment on result discrepancy:
  – Different conclusions based on the different metrics used

4.11 Chor-Rivest Knapsack: Yaseen, Sahasrabuddhe - 1999

This is the only paper from 1999. By Yaseen and Sahasrabuddhe, [31] is also the only paper that considers a genetic algorithm attack on the Chor-Rivest public key cryptosystem. The paper begins with a review of the Chor-Rivest scheme, followed by a review of genetic algorithms. The genetic algorithm attack is then discussed. The input to the genetic algorithm is A = (a1, ..., an), a knapsack of positive integers, and c, an integer which represents the cipher. The output is M = (m1, ..., mn), a binary vector such that c = A · M. The chromosome is a binary vector that represents the solution message M. The length of the vector is

n, the same as the size of the knapsack: mi = 1 if the corresponding element has been chosen and 0 otherwise. The fitness function, the same as in [25], is computed as follows:

• Determine the value of the current chromosome (call it Sum)
• Fit = 1 − |Sum − Target| / FullSum   (4.11.1)

where FullSum is the sum of all the components in the knapsack.

Two genetic operators are used during the alteration (mutation) phase of the algorithm. They are:

• Swapbits: choose two positions in the chromosome randomly and swap their contents
• Invert: choose two positions in the chromosome randomly and reverse the sequence of bits between them

The parameters used throughout the experiment are:

• pinv (the probability of the invert operation occurring) = 0.75
• pm (the probability of the swapbits operation occurring), initially = 0.75
• Population size of 30

To assist the fitness function, a Hamming distance technique is used, based on some observations made in [7]: the chromosomes with a Hamming distance of 2 and 4 from a target were the ones used. The initial population is created randomly, such that each n-bit binary chromosome contains exactly h 1's, where h is the degree of the monic irreducible polynomial f(x) over Zp.

Experimental results were provided for the knapsacks of the fields GF(13^5), GF(17^5), GF(19^5), and GF(23^5), both with and without the Hamming distance technique. These results are for the average of 30 experiments. With the technique, as the size of the knapsack increased, the percentage of search space visited decreased. Without the technique, the percentage visited had no relation to the knapsack size. With the technique, 1 to 2% of the search

space was visited, versus 50 to 90% without. The average number of generations showed a similar increase: 1 to 10 generations with the technique, and 23 to 900 generations without. Of the 120 experiments done in each case, 33 experiments without the technique did not reach a solution even after 5000 generations.

4.11.1 Comments

The main focus of this paper is on the Hamming distance technique, which is not explained that clearly. A reasonable level of detail is included; however, due to length (5 pages) and clarity issues, this paper is a poor choice for re-implementation.

• Typical results:
  – Reports the average number of chromosomes required to find an exact solution
• Re-implementation approach:
  – Not re-implemented
• Observed cryptanalytic usefulness:
  – None - lacks information about parameters required to analyze the attack
• Comment on result discrepancy:
  – No comment; not re-implemented

4.12 Substitution/Transposition: Gründlingh, Van Vuuren - submitted 2002

This paper has not yet been published, but was submitted for publication in 2002. By Gründlingh and Van Vuuren, [13] combines operations research with cryptology and attacks two classical ciphers with a genetic algorithm approach. The two ciphers studied

are substitution and transposition. The paper begins with some historical background on the two fields, and then covers basic components of symmetric (private key) ciphers. The next two sections discuss genetic algorithms and letter frequencies in natural languages. As for character frequencies, these were generated for 7 languages from modern bible texts.

The mono-alphabetic substitution cipher was attacked first. The fitness function of a candidate key k, when a ciphertext T of length N, formed from a plaintext of the same length in a natural language L, is decrypted using the key k, is:

F1(L, T, N, k) = [2(N − ρ_min^N(L)) − Σ_{i=A..Z} |ρ_i^N(L) − f_i^T(N)(k)|] ÷ 2(N − ρ_min^N(L))   (4.12.1)

Here, ρ_i^N(L) represents the (scaled) expected number of times the letter i will appear per N characters in a natural language L text, f_i^T(N)(k) is the observed number of occurrences of letter i when the ciphertext T is decrypted using the key k, and ρ_min^N(L) = min_i {ρ_i^N(L)}. The limits of F1 are [0, 1], and both bounds may be reached in extreme cases. The rationale behind this fitness function involves discrepancies between the expected number of occurrences and the observed number of occurrences of a letter i.

An alternative option for the fitness measure, this time including digrams, is:

F2(L, T, N, k) = [2(2N − 1 − ρ_min^N(L) − δ_min^N(L)) − Σ_{i=A..Z} |ρ_i^N(L) − f_i^T(N)(k)| − Σ_{i,j=A..Z} |δ_ij^N(L) − f_ij^T(N)(k)|] ÷ 2(2N − 1 − ρ_min^N(L) − δ_min^N(L))   (4.12.2)

Here, ρ_i^N(L) and δ_ij^N(L) represent the expected number of occurrences of, respectively, the letter i and the digram ij, per N characters in a natural language L text; f_i^T(N)(k) and f_ij^T(N)(k) are the observed number of occurrences of, respectively, the letter i and the digram ij, when the ciphertext T is decrypted using the key k; and δ_min^N(L) = min_ij {δ_ij^N(L)}. The same limits apply to F2 as to F1.

A detailed algorithm overview, including pseudocode, is discussed next, covering the genetic operators of mutation, crossover, and selection. The main difficulty in a genetic algorithm implementation is said to be the consideration of constraints. The mutation operator is a single random swap of two characters.
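To make the normalization concrete, here is a minimal Python sketch of F1 as defined in (4.12.1). The function name and the form of `expected_counts` (expected occurrence counts already scaled to the text length N) are illustrative assumptions, not the authors' code:

```python
from collections import Counter

def f1_fitness(decrypted, expected_counts):
    """F1 of equation (4.12.1): 1.0 when the observed letter counts match
    the expected counts exactly, decreasing toward 0.0 as the total
    discrepancy approaches its maximum of 2(N - rho_min)."""
    n = len(decrypted)
    rho_min = min(expected_counts.values())
    observed = Counter(decrypted)
    discrepancy = sum(abs(expected_counts[c] - observed.get(c, 0))
                      for c in expected_counts)
    scale = 2 * (n - rho_min)
    return (scale - discrepancy) / scale
```

With expected counts taken from the plaintext itself, a correct key scores exactly 1.0, and in the extreme case where the whole text decrypts to a least-expected letter the score can fall to 0.0, matching the stated [0, 1] limits.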

Keys are selected stochastically, such that keys that have a higher fitness value are more likely to be mated. Crossover occurs during a traversal of the parent chromosomes. The first child chromosome is produced as follows:

• Begin a left-to-right traversal of the parent chromosomes
• Select character i from the parent whose i-th character has the better frequency match to the expected number of occurrences of i
• If the i-th character already appears in the child:
  – Select the character from the other parent
  – If the characters are both already in the child, leave the position blank
• If any characters do not appear in the child, insert them to minimize discrepancies between observed and expected frequency counts

The second child is produced by the same process using a right-to-left traversal.

The algorithm was run on a ciphertext of 2,519 characters and a population size of 20. The algorithm was run for 200 generations, and sets of the best 3 keys overall were kept, as well as the final population. Using (4.12.1), an average pool fitness of 90% was reached within 7 generations; using (4.12.2), an average of 75% was reached within 15 generations.

The transposition cipher was attacked second, using a similar genetic algorithm approach. This attack assumes that the number of columns is known, but the column shuffling is unknown. The fitness function then becomes (using the same symbols as before):

F3(L, T, N, k) = [2(N − 1 − δ_min^N(L)) − Σ_{i,j=A..Z} |δ_ij^N(L) − f_ij^T(N)(k)|] ÷ 2(N − 1 − δ_min^N(L))   (4.12.3)

The crossover operator uses a random binary vector to select entries from the parents. The first child is formed by taking values from parent 1 where a 1 occurs in the binary vector; using the locations of the 0's in the binary vector, the rest of the values are filled from parent 2, in the order in which they occur. The second child is formed starting from parent 2.

Two mutation operators are considered: point and shift. A point mutation consists of randomly selecting two positions in a key and swapping their values. Shift mutation, on the other hand, consists of shifting the values in a key a random number of positions to the left or right, modulo the length of the key. Otherwise, the same algorithm was used in this attack as for the simple substitution cipher. No convergence of fitness values was found for this attack.

The paper concludes by describing a decision tool created to assist cryptanalysts who routinely attack substitution ciphers. This tool is graphical in nature, and is based on the genetic algorithm attack on substitution ciphers discussed above.

4.12.1 Comments

Overall, this paper is pretty good and includes good introductory material. It lacks enough detail in the results section to be easily comparable, but gives a great level of detail for the functions, algorithms, and operators used. It is a good candidate for re-implementation, but not for comparison.

• Typical results:
  – Gives the final population pool and the partially decrypted ciphertext produced by the best key for the substitution attack
  – Transposition attack declared to be nonconvergent and no results given
• Re-implementation approach:
  – Both the substitution and transposition ciphers were re-implemented
  – Different text lengths used, same parameters re-used
  – Additional parameters are used to expand the range of parameter values
• Observed cryptanalytic usefulness:
  – Substitution: None - no messages successfully decrypted

  – Transposition: Low - low decryption percentage
• Comment on result discrepancy:
  – Different conclusions based on the different metrics used

4.13 Overall Comments

The quality of the reference works found in the literature varies widely. Several of these works were too poorly specified for further use, or used trivial parameters. Some of these works depended on other, earlier works for the re-implementation to be possible, leaving even fewer valid reference works. This issue is compounded by the low number of reference works found - twelve. Seven works contained enough information for a re-implementation to be completed.

The metrics previously used are a diverse collection, with very different bases. It is not possible to compare the results using the new metrics to the results reported in the previous works, due to the differences in metric composition. This diversity is one of the motivating factors for this work: a common set of metrics for these re-implemented works is gathered.

Chapter 5

Traditional Cryptanalysis Methods

Only four of the seven ciphers attacked using a genetic algorithm will be further investigated. These ciphers are monoalphabetic substitution, polyalphabetic substitution (e.g. the Vigenère cipher), permutation, and columnar transposition. The other three ciphers are excluded due to various reasons.

The Merkle-Hellman knapsack cipher was initially marked for further study, but has been removed due to the difficulty of implementing the seminal attack by Shamir [23]. This attack involves Lenstra's integer programming algorithm and a fast diophantine approximation algorithm, making it difficult to re-implement the work.

The Chor-Rivest knapsack cipher is difficult to implement, and was only considered for a genetic algorithm attack in one paper, which used trivial examples of the system. The paper [31] is short and does not explain the attack technique well. Therefore, this cipher is excluded from further investigation.

The third cipher that will not be considered further is the Vernam cipher. This cipher is one of the few that exhibits perfect secrecy; however, it is not widely used due to the key length requirements. This cipher is also attacked using a genetic algorithm in only one paper [18], which really is not attacking the cipher it claims it is. This cipher is therefore excluded due to the lack of replicable works.

5.1 Distinguishing among the three ciphers

If the cipher type is not already known, inspection of the frequency of occurrence of the ciphertext letters (i.e., how often each ciphertext character appears in the ciphertext) will distinguish between a permutation and a substitution cipher. A permutation cipher will produce a ciphertext with the same frequency distribution as that of the language of the plaintext [24]. For example, if the plaintext was written in English, a permutation cipher will produce ciphertext with the same frequency distribution as any similar English text.

The index of coincidence (IC) measures the variation in letter frequencies in a ciphertext [22]. If simple (monoalphabetic) substitution has been used, then there will be considerable variation in the letter frequencies and the IC will be high [22]. As the period of the cipher increases (i.e., more alphabets are used), the variation in letter frequencies decreases and the IC is low.

5.2 Monoalphabetic substitution cipher

The traditional attack for the monoalphabetic substitution cipher utilizes frequency analysis and examination. The first step is to analyze the ciphertext to determine the frequency of occurrence of each character [27]. The frequencies of digrams and trigrams may also be determined in this step. Each language (e.g. English, French, German) has a different set of frequencies for its letters. For example, the most frequent character in an English text is E [27]. Therefore, the ciphertext character that appears most frequently is highly likely to be the equivalent of the plaintext E. The next most frequent English letters are T, A, O, I, N, S, H, and R, all with around the same frequency. Digram and trigram statistics can be used to help identify these, based on the frequencies. For example, TH, HE, and IN are the three most common digrams, and the three most common trigrams are THE, ING, and AND.

In the end, the cryptanalyst examines the ciphertext and proceeds by guessing, and changing guesses, for which cipher character is which plaintext character. The cryptanalyst must gain a feel for the text and proceed by guessing letters until the text becomes legible.
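The frequency-analysis starting point described above can be sketched in a few lines of Python. The helper names and the nine-letter frequency order are illustrative assumptions (taken from standard English frequency tables), not part of the thesis implementation:

```python
from collections import Counter

# The most frequent English letters, in approximate descending order.
ENGLISH_FREQ_ORDER = "ETAOINSHR"

def letter_frequencies(ciphertext):
    """Ciphertext letters, sorted from most to least frequent."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return [c for c, _ in Counter(letters).most_common()]

def initial_guess(ciphertext):
    """Map each frequent ciphertext letter to the English letter of the
    same frequency rank - the starting guess a cryptanalyst then refines
    using digram and trigram statistics."""
    ranked = letter_frequencies(ciphertext)
    return dict(zip(ranked, ENGLISH_FREQ_ORDER))
```

On real ciphertext the rank-for-rank mapping is only a first guess, which is exactly why the attack continues with guessing and re-guessing individual letters.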

5.3 Polyalphabetic substitution cipher

The polyalphabetic substitution cipher (here, the Vigenère cipher) is attacked using two techniques: the Kasiski test and the index of coincidence [27]. The first step is to determine the keyword length, denoted m.

The Kasiski test was described by Friedrich Kasiski in 1863, but apparently discovered earlier, around 1854, by Charles Babbage [27]. This test is based on the observation that "two identical segments of plaintext will be encrypted to the same ciphertext whenever their occurrence in the plaintext is δ positions apart, where δ ≡ 0 (mod m)" [27]. Conversely, if two segments of ciphertext are identical and have at least a length of three, then there is a good chance they correspond to identical segments of plaintext [27]. The Kasiski test therefore follows the below procedure [27]:

1. Search the ciphertext for pairs of identical segments of length at least three
2. Record the distance between the starting positions of the two segments
3. If several such distances are obtained (δ1, δ2, ...), then:
   • m divides all of the δi's
   • and, therefore, m divides the greatest common divisor (gcd) of the δi's

The index of coincidence is used to further verify the value of m. This concept was defined by Wolfe Friedman in 1920, as follows [27]:

• Suppose x = x1 x2 · · · xn is a string of n alphabetic characters
• The index of coincidence of x, denoted Ic(x), is defined to be the probability that two random elements of x are identical
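The Kasiski procedure is mechanical enough to sketch directly. The following minimal Python version (function names are illustrative) implements the three steps:

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski_distances(ciphertext, seg_len=3):
    """Steps 1 and 2: distances between the starting positions of
    repeated ciphertext segments of length `seg_len`."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seg_len + 1):
        positions[ciphertext[i:i + seg_len]].append(i)
    distances = []
    for pos in positions.values():
        distances += [b - a for a, b in zip(pos, pos[1:])]
    return distances

def keyword_length_guess(ciphertext):
    """Step 3: the keyword length m divides every distance, and hence
    divides their greatest common divisor."""
    deltas = kasiski_distances(ciphertext)
    return reduce(gcd, deltas) if deltas else None
```

In practice the gcd of all distances can be disturbed by chance repetitions, so a cryptanalyst would also inspect divisors of the individual δi values and confirm the candidate m with the index of coincidence.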

Suppose the frequencies of A, B, C, ..., Z in x are denoted by f0, f1, ..., f25, respectively. Two elements of x can be chosen in C(n, 2) ways [27]. For each i, there are C(fi, 2) ways of choosing both elements to be i. This produces the formula [27]:

Ic(x) = Σ_{i=0..25} C(fi, 2) ÷ C(n, 2) = Σ_{i=0..25} fi(fi − 1) ÷ (n(n − 1))

Suppose x is a string of English text. If the expected probabilities of occurrence of the letters A, B, ..., Z are denoted by p0, ..., p25, respectively, we would expect that

Ic(x) ≈ Σ_{i=0..25} pi² = 0.065 [27].

The same reasoning applies if x is a ciphertext string obtained using any monoalphabetic cipher: the individual probabilities are permuted, but the quantity Σ pi² is unchanged. A completely random string produces an Ic ≈ 26(1/26)² = 1/26 = 0.038 [27]. These two values (0.065 and 0.038) are far enough apart that the correct keyword length can be obtained or confirmed using the index of coincidence.

Now, suppose we have a ciphertext string y = y1 y2 · · · yn produced by a Vigenère cipher. Define m substrings of y, denoted y1, y2, ..., ym, by writing out the ciphertext, in columns, in a rectangular array of dimensions m × (n/m) [27]. The rows of this matrix are the substrings yi. This means that we have:

y1 = y1 y(m+1) y(2m+1) · · ·
y2 = y2 y(m+2) y(2m+2) · · ·
...
ym = ym y(2m) y(3m) · · ·

If y1, y2, ..., ym are constructed as shown, and m is indeed the keyword length, then Ic(yi) should be approximately equal to 0.065 for 1 ≤ i ≤ m. If m is not the keyword length, then the substrings yi will look much more random, since they will have been produced by shift encryption with different keys.
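Both the Ic formula and the column test are easy to express in Python. The sketch below (illustrative names, no handling beyond short strings) computes Ic and the per-substring values used to confirm m:

```python
from collections import Counter

def index_of_coincidence(text):
    """Ic(x) = sum of fi(fi - 1), divided by n(n - 1): the probability
    that two randomly chosen elements of `text` are identical."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))

def column_ics(ciphertext, m):
    """Ic of each substring y1..ym formed by taking every m-th character;
    values near 0.065 (English) support keyword length m, while values
    near the random-text figure of 0.038 argue against it."""
    return [index_of_coincidence(ciphertext[i::m]) for i in range(m)]
```

Running `column_ics` for each candidate m from the Kasiski test and looking for uniformly high values is the verification step described above.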

After the value of m has been determined, the actual key, K = (k1, k2, ..., km), needs to be found. The method in [27] is as follows:

• Let 1 ≤ i ≤ m, and let f0, ..., f25 denote the frequencies of A, B, ..., Z in the string yi
• Let n′ = n/m be the length of the string yi
• The probability distribution of the 26 letters in yi is then f0/n′, ..., f25/n′
• Since each substring yi is obtained by shift encryption of a subset of the plaintext elements using a shift ki, it is hoped that the shifted probability distribution f(ki)/n′, ..., f(25+ki)/n′ will be close to the ideal probability distribution p0, ..., p25, with subscripts evaluated modulo 26
• Suppose that 0 ≤ g ≤ 25 and define the quantity Mg = Σ_{i=0..25} pi f(i+g) / n′
• If g = ki, then Mg ≈ Σ_{i=0..25} pi² = 0.065
• If g ≠ ki, then Mg will usually be significantly smaller than 0.065

This technique should allow the determination of the correct value of ki for each value of i.

5.4 Permutation/Transposition ciphers

The permutation cipher defined in [27] is the same as that used in [4], while [20] and [13] use a columnar transposition cipher. The permutation cipher will be attacked by exhaustion, as per [22]. This attack consists of considering d! permutations of the d characters, where d is the period of the cipher. The ciphertext is divided into blocks of various lengths, starting at d = 2, and all possible permutations considered. For each possible value of d, all d! possible permutations are attempted on the first two groups of d letters. For example, if d = 2, the first two groups of 2 letters are permuted as (letter 1)(letter 2) and (letter 2)(letter

1). This continues until one or both of the groups permute into something that looks like a valid word or word part. At this time, the possibly valid permutation is applied to the other groups of letters in the ciphertext. If the permutation produces valid text, i.e., the human cryptanalyst accepts the text as valid in the language used, then the attack is completed; otherwise, the process continues.

The columnar transposition cipher is attacked differently for an incompletely filled rectangle than for a completely filled rectangle. We will assume that the rectangle is completely filled, in order to simplify the attack. The width of a completely filled rectangle must be a divisor of the length of the message. This fact helps the cryptanalyst to determine the most likely possible widths for the inscription rectangle. For example, if the message has 77 letters, the rectangle must be either 7 or 11 letters wide [24]. The ciphertext is written vertically into rectangles of the possible widths, and a solution is obtained by rearranging the columns to form plaintext. The process of restoring a disarranged set of letters into its original positions is called anagramming [24].

There are several ways to proceed with the process of anagramming. These include the use of a probable word, the use of a probable sequence, digram/trigram charts, contact charts, and vowel frequencies. The probable word is used when the cryptanalyst has some information about the message which would lead to the assumption of a word in the text [24]. For example, if the message is of a military nature, words such as division, regiment, report, attack, communication, retreat, or enemy are likely to be present [11]. A contact chart is an extension of the digram/trigram charts, and contains information relating to the likelihood that a specific letter will be preceded or followed by another letter.

Vowel frequencies are language-specific. In English, vowels comprise approximately 40% of the text, independent of text length [11]. Therefore, when putting the ciphertext into matrix (rectangle) form, a period should be chosen which produces the most even distribution of vowels across the columns of the matrix [19]. The text is deciphered more easily if the columns are then exchanged so that the most frequently occurring letter combinations appear first [19]. The next step after vowel frequency examination is usually the use of a probable word or sequence.
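The exhaustive permutation attack lends itself to a short sketch. In the Python below, the human judgment step is stood in for by a caller-supplied `looks_valid` predicate; all names are illustrative assumptions:

```python
from itertools import permutations

def candidate_permutations(ciphertext, d, looks_valid):
    """Try all d! orderings on the first two d-letter groups and keep
    those producing at least one plausible-looking rearrangement; the
    survivors would then be applied to the rest of the ciphertext."""
    groups = [ciphertext[i:i + d] for i in range(0, 2 * d, d)]
    survivors = []
    for perm in permutations(range(d)):
        rearranged = ["".join(group[p] for p in perm) for group in groups]
        if any(looks_valid(r) for r in rearranged):
            survivors.append(perm)
    return survivors
```

For d = 2 this tries (letter 1)(letter 2) and (letter 2)(letter 1) on each group, exactly the two orderings described above; real use would iterate d = 2, 3, ... and score candidates with a dictionary or n-gram model rather than a hand-written predicate.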

If a probable word can not be assumed, the next step is to try to fit two columns together to produce good digraphs [24]. One technique for column fitting is to give each assumed digram its frequency in plaintext and then add the frequencies [15]. The combination with the highest total is most likely to be correct [15]. Next, one would attempt to extend the digrams into trigrams, and so on. Columns can be added both before and after the current set, so the extension into trigrams and longer combinations must consider both situations. If this is not possible either, the next option is to look for the presence of one of the less frequent digrams, such as QU or YP [11]. If there is some difficulty involved in anagramming the first row, reconstruction may default to working on the first word of the message, as in [24].

5.5 Comments

Overall, no matter which method is used in the attack, the traditional attack methods are not easily automated, and tend to require a trained and experienced cryptanalyst. The polyalphabetic substitution cipher is the only one that is attacked mostly mathematically. Mathematical information is used for both the monoalphabetic substitution cipher and the permutation/transposition ciphers; however, the main component of these attacks is the human cryptanalyst. The final decision on whether or not the plaintext is found is always made by the human cryptanalyst.
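The column-fitting heuristic described above (summing plaintext digram frequencies across a candidate column pairing) can be sketched as follows; the function names and the frequency-table format are assumptions for illustration, not part of the thesis code:

```python
def column_pair_score(col_a, col_b, digram_freq):
    """Score placing column b immediately to the right of column a by
    summing the plaintext frequencies of the digrams formed row by row;
    the highest-scoring pairing is the most likely to be correct."""
    return sum(digram_freq.get(a + b, 0.0) for a, b in zip(col_a, col_b))

def best_right_neighbour(col_a, candidates, digram_freq):
    """Among candidate columns, pick the one forming the best digraphs."""
    return max(candidates,
               key=lambda col: column_pair_score(col_a, col, digram_freq))
```

Repeating this greedy pairing, and also trying each candidate as a left neighbour, mechanizes the digram-to-trigram extension, though the final accept/reject decision still rests with the human cryptanalyst.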

Chapter 6

Attack Results

6.1 Methodology

The main metric for both classical and genetic algorithm attacks was elapsed time. This was measured using the tcsh built-in command, "time"; each attack was run using a tcsh shell script. In the classical algorithm case, the elapsed time is measured from the time the attack begins to the time the ciphertext is completely decrypted. This measurement is almost completely comprised of time spent interacting with the user, as processing time is less than is measurable with the "time" command. A genetic algorithm attack, on the other hand, runs for a specific number of generations, independent of the decryption of the ciphertext. In this case, the amount of time spent in user interaction is minimal, and the elapsed time measurement is comprised almost wholly of processing time. The differences in composition of elapsed time for the two categories of attack make it difficult to compare the attacks on a time basis.

Therefore, an additional metric is used for the genetic algorithm attacks. This additional metric, used only for the genetic algorithm attacks, is the percentage of attacks which completely decrypted the ciphertext, as determined by the human cryptanalyst upon examination of the final result. This metric is invalid in a classical attack, as this class of attacks runs until the text is decrypted; the time measurement is irrelevant if the attack is unsuccessful. Using this metric, as well as the elapsed time, gives one a better feel for the usefulness of each attack. These two metrics were chosen so that traditional and genetic algorithm attacks could

be compared. Both of these metrics are needed to gauge the success of an attack. Elapsed time measures the efficiency of the attack while the percentage of successful attacks measures how useful the attack is.

The same set of ciphertexts was used for all attacks. This set consists of 100, 500, and 1000 characters of five different English texts. These texts are: Chapter 1 of Genesis from the King James Version of the Bible [2], Patrick Henry's "Give me liberty or give me death!" speech [3], Chapter 1/Part 1 of Louisa May Alcott's Little Women [1], Section 1/Part 1 of Charles Darwin's The Origin of Species [10], and Chapter 1 of H. G. Wells's The Time Machine [30]. The permutation attack was run using keylengths of 2, 4, or 6. The Vigenère attack was run using a period of 3, 5, or 7. The fitness weights used in the Vigenère attack were: 0.1 for unigrams and digrams, 0.8 for trigrams. The columnar transposition attacks were run using keylengths of 7, 9, or 11.

Additional parameters were required for the genetic algorithm attacks. The parameters were compiled from the various genetic algorithm attacks found in the literature and then extended to cover greater parameter space. All attacks were run using the following parameters:

• Population sizes of 10, 20, 40, or 50
• Mutation rates of 0.05, 0.10, 0.20, or 0.40
• Number of generations equaling 25, 50, 100, 200, or 400

Each possible combination of parameters was run once per ciphertext and length combination. Certain genetic algorithm attacks required additional parameters. The only attack that varied from others attacking its cipher type was the Matthews [20] columnar transposition attack. An additional set of parameters was used for this attack. These parameters were:

• Initial and Final Crossover probabilities

• Initial and Final Point Mutation probabilities
• Initial and Final Shuffle Mutation probabilities

In order to use the same set of parameters as the other attacks, both crossover probabilities were set to 1 and both of the point mutation and shuffle mutation values were set to 0.05, 0.10, 0.20, or 0.40. The additional set of parameters used consisted of setting the initial crossover probability to 0.8, the final crossover probability to 0.5, the initial point mutation probability to 0.1, the final point mutation probability to 0.5, the initial shuffle mutation probability to 0.1, and the final shuffle mutation probability to 0.8. This additional set of parameters was used throughout the original results reported for the Matthews attack.

6.2 Classical Attack Results

As stated above, all classical attacks are run until complete decryption of the ciphertext is obtained. Since a classical substitution attack is based on statistical information about the text, as well as the user's intuition, the time needed to decrypt a text can vary widely. Usually, a longer text is easier to decrypt, as the statistical characteristics are more distinct. Some texts are statistically very well-defined, making them easy to decrypt even at shorter text lengths. The first chapter of Genesis, for example, is easy to decrypt due to the clarity of the statistical evidence. Other texts that use less common words or word-patterns, such as the Liberty or Death speech, are consistently difficult. As Figure 6.1 shows, the time this attack takes can vary widely from text to text, and between different lengths of the same text. Times range from 1:50 to 10:25, and can vary as much as 6:25 or as little as :45 between different lengths of a text.

The classical Vigenère attack, on the other hand, is very mathematically based. The more mathematical an attack is, the shorter the period of time required. This means that the time to decrypt a text is much more consistent. Since each ciphertext is different in both statistics and distribution of common/uncommon words, times can vary widely between texts.

Longer decryption times mean that there was more than one good candidate for the period, leading to multiple attempts at decryption. This is what occurred for most of the 100 character texts when a period of 5 or 7 was used. As Figure 6.2 shows, times range from :58 to 6:25, and vary as much as 4:55 or as little as :10 between different text lengths.

Figure 6.1: Classical Substitution Attack

The classical permutation attack is a brute force, exhaustive search. The amount of time the attack takes depends both on the characteristics of the text and the block length used. A longer block length takes more time to decrypt because there are many more possibilities that must be considered and sorted through. Texts that contain less common character combinations will take longer to decrypt because fewer permutations result in that particular combination. At longer block lengths, it takes more time to sort through the possible permutations to determine if a correct arrangement has been found. Therefore, Figure 6.3 shows that this attack is more consistent for elapsed time values at smaller, more trivial block lengths. Times are fairly consistent among different texts and text lengths. The word

choice of the text can make it more difficult to determine if a correct permutation has been found, since longer or less common words are more difficult to distinguish from a partial root. Times range from :13 to :15 for texts at block length 2, :21 to :38 for texts at block length 4, and 1:21 to 2:25 for texts at block length 6.

Figure 6.2: Classical Vigenère Attack

A classical columnar transposition attack begins with statistical analysis of the ciphertext. The user then proceeds with anagramming, or rearranging the columns to produce recognizable words. Like the classical substitution attack, a great deal depends on the user's intuition. The more often easily recognizable words occur in the text, the easier and faster the user can rearrange the columns correctly. The characteristics of the text attacked, and its length, are more influential factors; block length, therefore, does not heavily influence the time it takes to decrypt a text. These text characteristics are a significant factor in the length of time required to decrypt the text. As shown in Figure 6.4, times range from :30 to 3:15, with certain texts fairly consistently taking more time.

Figure 6.3: Classical Permutation Attack

6.3 Genetic Algorithm Attack Results

Only the attacks on the monoalphabetic substitution, polyalphabetic substitution (i.e. the Vigenère cipher), permutation and columnar transposition ciphers were selected for further investigation. The majority of the Genetic Algorithm based attacks did not correctly decrypt any texts. This includes the attacks by Spillman et al. (1993) [26], Clark et al. (1996) [8] / Clark & Dawson (1997) [5], Clark & Dawson (1998) [6], and one of the two attacks from Gründlingh & Van Vuuren (2002) [13]. Two of the attacks, Matthews (1993) [20] and Gründlingh & Van Vuuren (2002) [13], decrypted some texts correctly. One attack, Clark (1994) [4], correctly decrypted many texts. Only the three successful attacks will be discussed here, as time measurements for the four unsuccessful attacks are irrelevant.

The Matthews (1993) [20] attack and the Gründlingh & Van Vuuren (2002) [13] attack were against the columnar transposition cipher, while the Clark (1994) [4] attack was against the permutation cipher. The Matthews attack was successful only for key length

7; there were no successes at a keylength of 9 or 11. The per-text percentage of complete breaks at key length 7 ranged from 0 % to 20 %, with most texts in the 2 to 4 % range. Mutation and mating rates had little effect on decryption success, with more decryptions occurring at larger population sizes and/or more generations.

The Gründlingh & Van Vuuren attack was also only successful at a keylength of 7. Per-text decryption percentages at key length 7 ranged from 0 % to 28 %. The 100 character text segments failed almost uniformly: there was one text of the five that was broken once, giving it a success rate of 0.416 % as opposed to 0 %. Chapter 1 of Genesis failed to decode at any text length. The other 500 and 1000 character text segments achieved a success rate of 20 % to 28 %. Mutation rate and type had no noticeable effect. Also, a population size of 50 resulted in decryption most of the time, as compared to the other population sizes (10, 20, 40), and more decryptions occurred with larger population sizes and/or more generations.

Figure 6.4: Classical Transposition Attack

In comparison to the other attacks, Clark's attack of 1994 [4] was wildly successful. A block length of 2 always decrypted, while block lengths of 4 and 6 had success rates

ranging from 5 % to 91 %. Most of the per-text decryption percentages at block length 4 were in the high seventies or low eighties, while the percentages at block length 6 were usually in the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500 character text tests, and 74.75 % of the 1000 character text tests were correctly decrypted. Block length 6 texts were especially sensitive to the number of generations and population size, failing most of the time on a population size of 10 or 20. As above, larger population sizes and/or more generations increased the decryption rate, with no obvious effect from mutation rate.

The effects of varying the population size and number of generations in this attack (Clark 1994 [4]) are shown in the following graphs. The first set of graphs, Figures 6.5 through 6.8, show the effect of the number of generations on a specific population size. Each graph shows all three block lengths, but the times are so close as to form one line when displayed on the same graph. Overall, these graphs show the expected effect of increasing the number of generations or the population size: more time is required due to the larger pool of chromosomes that need evaluation. This effect is independent of the block length used. This set of graphs also illustrates the minimal effect of the mutation rate, indicating that varying the mutation rate from 5 % to 40 % has no measurable effect.

The second set of graphs, Figures 6.9 through 6.13, show the effect of the population size at a specific number of generations. Each group of tests at 25, 50, 100, 200 or 400 generations forms a plateau for time. Each graph shows all three block lengths (2, 4, & 6), but only in Figures 6.9 and 6.10 can the difference between block lengths be seen. This visibility is due to the tiny range used in these graphs: a maximum elapsed time of 0.30 seconds. Again, the minimal effect of mutation rate on the elapsed time is apparent, and again, this effect is independent of the block length used.

Figure 6.5: Effect of Number of Generations at Population Size 10

Figure 6.6: Effect of Number of Generations at Population Size 20

Figure 6.7: Effect of Number of Generations at Population Size 40

Figure 6.8: Effect of Number of Generations at Population Size 50

Figure 6.9: Effect of Population Size at 25 Generations

Figure 6.10: Effect of Population Size at 50 Generations

Figure 6.11: Effect of Population Size at 100 Generations

Figure 6.12: Effect of Population Size at 200 Generations

Figure 6.13: Effect of Population Size at 400 Generations

6.4 Comparison of Results

Overall, the GA-based methods were unsuccessful: only three of the seven GA-based attacks re-implemented achieved any success. The successful attacks were on the permutation and transposition ciphers.

The two GA-based attacks on the transposition cipher were by Matthews [20] and Gründlingh & Van Vuuren [13]. Both GA attacks were only successful at a key length of 7. The Matthews attack achieved success rates of 2 - 4 %. The Gründlingh & Van Vuuren attack averaged success rates of 6 - 7 %. The traditional attack on the transposition cipher produced elapsed times ranging from :30 to 3:15, with certain texts fairly consistently taking more time. It was successful 100 % of the time, as is standard with a classical attack run to completion. Traditional methods took more time, on average, but were always successful.

The GA-based attacks took much, much smaller amounts of time to complete each test run. The Matthews attack took from :00.05 to :06.27 in the tests run on the 100 character texts, from :00.08 to :07.29 on the 500 character text tests, and from :00.06 to :07.46 on the

1000 character text tests. The Gründlingh & Van Vuuren attack took from :00.00 (according to the resolution of the measurement tool) to :01.21 in the tests run on the 100 character texts, from :00.00 to :02.27 on the 500 character text tests, and from :00.00 to :02.99 on the 1000 character text tests. These elapsed time values are dependent on the population size and number of generations used in a given test. There is no correlation between elapsed time and success rate.

At the tiny success rates these two attacks achieved, it is not likely that either of the GA-based attacks will produce a result faster than the traditional attack. At the 2 - 4 % success rate of the Matthews attack, the test which took the longest time (:07.46) would need to be run over and over for a total time of about 6:22 to produce one success. For the Gründlingh & Van Vuuren attack, with the success rate of 6 - 7 %, the test which took the longest time (:02.99) would need to be run multiple times for a total time of about :50. In both of these estimates, it is assumed that success occurs randomly and is spread evenly throughout the tests, which is not the case in actual testing: the successful results tend to cluster together. Even so, in the typical case one success generally takes more time to produce than running a traditional attack.

The only GA-based attack on the permutation cipher was the Clark (1994) attack [4]. The classical permutation attack is a brute force, exhaustive search; its times range from :13 to :15 for texts at block length 2, :21 to :38 for texts at block length 4, and 1:21 to 2:25 for texts at block length 6. In comparison to the other attacks, this GA-based attack was wildly successful. A block length of 2 always decrypted, while block lengths of 4 and 6 had success rates ranging from 5 % to 91 %. Most of the per-text decryption percentages at block length 4 were in the high seventies or low eighties, while the percentages at block length 6 were usually in the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500 character text tests, and 74.75 % of the 1000 character text tests were correctly decrypted.

12 in tests run on the 100 character texts. The Clark (1994) attack is very competitive with the traditional attack. and the success rates offer a excellent chance that this GA-based approach will be faster than the traditional method.06 to :06.10 to :10.96 on the 500 character text tests. on average. from :00.This attack took from :00.25 on the 1000 character text tests. 73 . and from :00. These elapsed time values are dependent on the population size and number of generations used in a given test. There is no correlation between elapsed time and success rate. The elapsed time required is much smaller.08 to :07.

Chapter 7

Extensions and Additions

There were three genetic algorithm attacks that achieved some success in decryption. These attacks were:

• Matthews' [20] attack on the transposition cipher
• Gründlingh and Van Vuuren's [13] attack on the transposition cipher
• Clark's [4] attack on the permutation cipher

Since the Matthews [20] and Gründlingh & Van Vuuren [13] attacks were only slightly successful, these attacks were re-run with larger population sizes and more generations, to possibly improve success rates. The Clark [4] attack was very successful originally, so an attempt was made to extend its success to larger block sizes and to apply this attack to a different cryptosystem.

7.1 Genetic Algorithm Attack Extensions

The Matthews attack was successful only for key length 7 - there were no successes at a keylength of 9 or 11. The per-text percentage of complete breaks at key length 7 ranged from 0 % to 20 %, with most texts in the 2 to 4 % range. Mutation and mating rates had little effect on decryption success, with more decryptions occurring at larger population sizes and/or more generations. The main problem with these two attacks, besides the low rate of success, is that only a key length of 7 was successfully attacked.

The Gründlingh & Van Vuuren attack was also only successful at a keylength of 7. Per-text decryption percentages at key length 7 ranged from 0 % to 28 %. Mutation rate and type had no noticeable effect, and more decryptions occurred with larger population sizes and/or more generations. The overall average for success rate was in the 6 % to 7 % range.

In comparison to the other attacks, Clark's attack of 1994 [4] was wildly successful. A block length of 2 always decrypted, while block lengths of 4 and 6 had success rates ranging from 5 % to 91 %. Most of the per-text decryption percentages at block length 4 were in the high seventies or low eighties, while the percentages at block length 6 were usually in the mid-forties. Overall, 66 % of the 100 character text tests, 73.5 % of the 500 character text tests, and 74.75 % of the 1000 character text tests were correctly decrypted. As above, larger population sizes and/or more generations increased the decryption rate, with no obvious effect from mutation rate.

7.1.1 Transposition Cipher Attacks

The two attacks on the transposition cipher were run with similar parameters. Population sizes of 100 and 200 were used, in combination with 1,000, 5,000, or 10,000 generations. The other parameters for each attack were:

• Matthews
  – A point mutation rate starting at 0.1 and finishing at 0.5
  – A shuffle mutation rate starting at 0.1 and finishing at 0.8
  – A mating rate starting at 0.8 and finishing at 0.5
• Gründlingh and Van Vuuren
  – A mutation rate of 0.05 or 0.10
  – Mutation using both point and shift mutation

As before, the following set of ciphertexts was used: Chapter 1 of Genesis from the King James Version of the Bible [2], Patrick Henry's "Give me liberty or give me death!" speech [3], Chapter 1/Part 1 of Louisa May Alcott's Little Women [1], Section 1/Part 1 of Charles Darwin's The Origin of Species [10], and Chapter 1 of H. G. Wells's The Time Machine [30]. Unfortunately, the larger values for the population size and number of generations did not result in any improvement in the results. The success rate for the Matthews [20] attack remained in the 2 % to 4 % range, while the rate for Gründlingh and Van Vuuren [13] was in the 6 % to 7 % range. Again, only texts at a key length of 7 were decrypted.

7.1.2 Permutation Cipher Attack

The attack on the permutation cipher, Clark's 1994 attack [4], was extended to attempt larger block sizes than before. To aid in this attempt, the parameter values were changed, mostly to larger values. The parameters used were:

• Block lengths of 8, 16, and 32
• Population sizes of 50, 100, and 200
• A number of generations of 200, 400, 1000 or 5000
• A mutation rate of 0.001, 0.01, 0.05 or 0.10

The mutation rate parameter was the only one lowered. The mutation rates used in the original versions of the genetic algorithm attacks are atypically high for genetic algorithm applications. More common values are in the 0.001 to 0.01 range, so these values were used in the extension of this attack. Again, the following set of ciphertexts was used: Chapter 1 of Genesis from the King James Version of the Bible [2], Patrick Henry's "Give me liberty or give me death!" speech [3], Section 1/Part 1 of Charles Darwin's The Origin of Species [10], Chapter 1 of H. G. Wells's The Time Machine [30], and Chapter 1/Part 1 of Louisa May Alcott's Little Women [1].

With a block length of 8, two ciphertexts were not decrypted at all, 9 ciphertexts were decrypted more than 50 % of the time, and 4 ciphertexts were decrypted anywhere from 12 % to 40 % of the time. For a block length of 16, several successes were seen against two of the 1000 character ciphertexts. No attacks were successful against a block length of 32. As before, mutation rate had no apparent effect, and a greater rate of success was achieved among larger populations with many generations. The ciphertext attacked seemed to play a larger role in the attack's success. The longer texts (500 and 1000 characters) were attacked more successfully than the shortest text (100 characters). The attack was erratically successful on a block length of 8, slightly successful on a block length of 16, and unsuccessful on a block length of 32.

7.1.3 Comments

The two transposition cipher attacks do not justify writing this style of genetic algorithm attack in their current form. Their success rates are stable, but low, even with extraordinarily large parameter values. The permutation cipher attack is possibly worthwhile at small block lengths, probably under 10 or so. It does not work at larger block lengths in its current form. This approach would need to be evaluated on a case-by-case basis, in order to determine if this genetic algorithm approach will be cost-effective in comparison with a more traditional, brute force approach.

7.2 Additions

A new attack was implemented on the substitution cipher. Two variants of this attack were tested. The difference between the variants was solely in the scoreboard used. This attack is a conversion of the Clark [4] attack from the permutation cipher to the

simple substitution cipher. The original version of this attack used order-based GAs, since each chromosome represented a permutation. Breeding was done using the position based crossover method for order-based genetic algorithms. In this method, a random bit string of the same length as the chromosomes is used. The random bit string indicates which elements to take from parent #1 by a 1 in the string, and the elements omitted from parent #1 are added in the order they appear in parent #2. The second child is produced by taking elements from parent #2 indicated by a 0 in the string, and filling in from parent #1 as before. Two possible mutation types may then be applied. The first type is a simple two-point mutation where two random chromosome elements are swapped. The second type is a shuffle mutation, where the permutation is shifted forward by a random number of places. Either one or both of these mutation types could occur, depending on the initial parameters. The population for the next generation was produced by merging the parent and child populations and keeping only the (population size) best solutions.

In order to apply this attack to a cipher that was not permutation-based, some changes had to be made. The major change made was the conversion of the chromosomes from numerical genes to alphabetical genes. Another change made was to allow both mutation types to occur all the time - the input parameter previously present was removed. This step was deemed reasonable as mutation type had no apparent effect in prior testing. Breeding and new population creation occurred as before. The only changes made were those necessary to apply the attack to a different cipher.

The difference between the two variants of the new attack is in the scoreboard used. The first variant uses the same scoreboard as the implementation of Clark's [4] attack on the permutation cipher. The original scoreboard contains the digrams and trigrams TH, HE, IN, ER, AN, ED, THE, ING, AND, and ENT. These values are +1 for digrams and +5 for trigrams, with the exception that TH is +2. The second variant adds several additional items to the scoreboard: RE, ON, HER, and EEE, with the same associated values as the original digrams and trigrams, except that EEE is −5.

The original variant was attacked using the following parameters:

• Population sizes of 10, 20, 40, and 50
• Number of generations of 25, 50, 100, 200, and 400
• Mutation rates of 0.05, 0.10, 0.20, or 0.40

The variant with the extended scoreboard was attacked with these parameters:

• Population sizes of 50, 100, and 200
• Number of generations of 200, 400, and 1000
• Mutation rates of 0.001, 0.01, 0.05, or 0.10

Both variants used the following set of ciphertexts: Chapter 1 of Genesis from the King James Version of the Bible [2], Patrick Henry's "Give me liberty or give me death!" speech [3], Chapter 1/Part 1 of Louisa May Alcott's Little Women [1], Section 1/Part 1 of Charles Darwin's The Origin of Species [10], and Chapter 1 of H. G. Wells's The Time Machine [30]. None of the ciphertexts were decrypted in any of the tests. Both attacks were completely unsuccessful. Additional modification of the parameters may improve performance, but that seems doubtful given the complete lack of success displayed in these tests. Another option would be to further tune the operators to the new cryptosystem.

7.2.1 Comments

This approach appeared promising at the outset. It achieved at least some success on both the transposition and permutation ciphers. If this attack had proved successful, it could have served as a springboard to attacking a modern cipher such as DES or AES with a GA. These ciphers are based on substitution and permutation principles. It may have been possible to attack the components of the system with a GA tuned for that component's cipher.


Chapter 8

Evaluation and Conclusion

8.1 Summary

The primary goals of this work were to produce a performance comparison between traditional cryptanalysis methods and genetic algorithm (GA) based methods, and to determine the validity of typical GA-based methods in the field of cryptanalysis. The focus was on classical ciphers, including substitution, transposition, permutation, knapsack and Vernam ciphers. The principles used in these ciphers form the foundation for many of the modern cryptosystems. A thorough search of the available literature resulted in GA-based attacks on only these ciphers. In many cases, these ciphers were among the simplest possible versions. Also, if a GA-based approach is unsuccessful on these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to more complicated systems.

Many of the GA-based attacks lacked information required for comparison to the traditional attacks. Dependence on parameters unique to one GA-based attack does not allow for effective comparison among the studied approaches. The most coherent and seemingly valid GA-based attacks were re-implemented so that a consistent, reasonable set of metrics could be collected.

The main metric for both classical and genetic algorithm attacks was elapsed time. In the classical algorithm case, the elapsed time was comprised almost completely of time

spent interacting with the user, while in the genetic algorithm case, it was comprised almost entirely of processing time. This difference in composition of elapsed time makes it difficult to compare the attacks on a time basis. The time measurement is irrelevant if the attack is unsuccessful, as a classical attack will run until the text is decrypted. A GA-based attack, on the other hand, runs for a specific number of generations, independent of the decryption of the ciphertext. Therefore, an additional metric was required. The secondary metric used was the percentage of successful attacks per test set. Success was defined as complete decryption of the ciphertext. This metric was only calculated for the GA-based attacks. Using this metric, as well as elapsed time, gives one a better feel for the usefulness of each attack. These two metrics were chosen so that traditional and GA-based attacks could be compared. Elapsed time measures the efficiency of the attack while the percentage of successful attacks measures how useful the attack is. Both of these metrics are needed to gain a complete picture of an attack.

Of the twelve genetic algorithm based attacks found in the literature, seven were selected for re-implementation. None of the knapsack or Vernam cipher attacks were re-implemented, for varying reasons. Of these seven, only three attacks were successful in decrypting any of the ciphertexts. The successful attacks were those GA-based attacks on the permutation and transposition ciphers. Both of the attacks which used a scoreboard for fitness calculation, Matthews [20] and Clark [4], achieved at least some success. The third successful GA-based attack was by Gründlingh and Van Vuuren [13], the only successful attack to use a fitness equation. Clark's [4] GA-based attack on the permutation cipher was by far the most successful.

These three successful attacks were extended to further examine their success rates. The two transposition attacks were tested using much larger population sizes and numbers of generations, while the permutation attack was attempted on larger block lengths. The two transposition cipher attacks obtained about the same success rate as before, while the permutation cipher attack experienced decreased success. Since

8.2 Future Work

There are several areas in which this work could be expanded. One of these is in the investigation of the scoreboard approach to fitness calculation. This method achieved the best results of the two methods used, yet does not seem to be in widespread use. Applying this approach to other problems may prove useful. This would determine whether the scoreboard approach is limited to specific applications only, or if it has widespread applicability. A comparison of the scoreboard approach to the fitness equation approach in other applications would be useful as well. Another area for investigation is the use of less common fitness calculation measures. The scoreboard approach, for example, proved useful in this application despite the rarity of its use. Use of other seldom-used approaches may prove additionally valuable. This work was coded in C and used perl/tcsh scripts. Re-working of the code to make it modular and re-usable for other projects would be a useful extension.

8.3 Conclusion

The primary goals of this work were to produce a performance comparison between traditional cryptanalysis methods and genetic algorithm (GA) based methods, and to determine the validity of typical GA-based methods in the field of cryptanalysis. The focus was on classical ciphers, including substitution, permutation, transposition, knapsack and Vernam ciphers. The principles used in these ciphers form the foundation for many of the modern cryptosystems. Also, if a GA-based approach is unsuccessful on these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to more complicated systems. A thorough search of the available literature resulted in GA-based attacks on only these ciphers. In many cases, these ciphers were among the simplest possible versions. Many of the GA-based attacks lacked information required for comparison to the traditional attacks. Dependence on parameters unique to one GA-based attack does not allow for effective comparison among the studied approaches. The most coherent and seemingly valid GA-based attacks were re-implemented so that a consistent, reasonable set of metrics could be collected. The two metrics used, elapsed time and percentage of successful attacks, were chosen so that traditional and GA-based attacks could be compared. Elapsed time measures the efficiency of the attack while the percentage of successful attacks measures how useful the attack is. Both of these metrics are needed to gain a complete picture of an attack.

Of the genetic algorithm attacks found in the literature, totaling twelve, seven were re-implemented. Of these seven, only three achieved any success. The successful attacks were those on the transposition and permutation ciphers by Matthews [20], Clark [4], and Gründlingh and Van Vuuren [13], respectively. These attacks were further investigated in an attempt to improve or extend their success. Unfortunately, this attempt was unsuccessful, as was the attempt to apply the Clark [4] attack to the monoalphabetic substitution cipher and achieve the same or indeed any level of success. Overall, the standard fitness equation genetic algorithm approach, and the scoreboard variant thereof, are not worth the extra effort involved. Traditional cryptanalysis methods are more successful, and easier to implement. While a traditional method takes more time, a faster unsuccessful attack is worthless.
The failure of the genetic algorithm approach indicates that supplementary research into traditional cryptanalysis methods may be more useful and valuable than additional modiﬁcation of GA-based approaches.


Bibliography

[1] Alcott, Louisa May. Little Women Parts I & II. Great Literature Online 1997-2003. Retrieved August 20, 2003 from http://www.underthesun.cc/Classics/Alcott/littlewomen/.

[2] The Holy Bible, King James Version. New York: American Bible Society: 1999; Bartleby.com, 2000. Retrieved August 20, 2003 from http://www.bartleby.com/108/.

[3] Bryan, William Jennings, ed. The World's Famous Orations. New York: Funk and Wagnalls, 1906; New York: Bartleby.com, 2003. Retrieved August 20, 2003 from http://www.bartleby.com/268/.

[4] Clark, A. (1994). Modern optimisation algorithms for cryptanalysis. In Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, November 29 - December 2, (pp. 258-262).

[5] Clark, A., & Dawson, Ed. (1997). A Parallel Genetic Algorithm for Cryptanalysis of the Polyalphabetic Substitution Cipher. Cryptologia, 21(2), 129-138.

[6] Clark, A., & Dawson, Ed. (1998). Optimisation Heuristics for the Automated Cryptanalysis of Classical Ciphers. Journal of Combinatorial Mathematics and Combinatorial Computing, 28, 63-86.

[7] Clark, A., Dawson, Ed, & Bergen, H. (1996). Combinatorial Optimisation and the Knapsack Cipher. Cryptologia, 20(1), 85-93.

[8] Clark, A., Dawson, Ed, & Nieuwland, H. (1996). Cryptanalysis of Polyalphabetic Substitution Ciphers Using a Parallel Genetic Algorithm. In Proceedings of IEEE International Symposium on Information and its Applications, September 17-20.

[9] Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2001). Introduction to Algorithms: Second Edition. Cambridge, Boston: MIT Press, McGraw-Hill.


[10] Darwin, Charles Robert. The Origin of Species. Vol. XI. The Harvard Classics. New York: P.F. Collier & Son, 1909-14; Bartleby.com, 2001. Retrieved August 20, 2003 from http://www.bartleby.com/11/.

[11] Gaines, H. F. (1956). Cryptanalysis: a Study of Ciphers and their Solutions. New York: Dover.

[12] Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Boston: Addison-Wesley.

[13] Gründlingh, W. & van Vuuren, J. H. (submitted 2002). Using Genetic Algorithms to Break a Simple Cryptographic Cipher. Retrieved March 31, 2003 from http://dip.sun.ac.za/~vuuren/abstracts/abstr_genetic.htm

[14] Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.

[15] Kahn, D. (1967). The Codebreakers: The Story of Secret Writing. New York: Macmillan.

[16] Kreher, D. L. & Stinson, D. R. (1999). Combinatorial Algorithms: Generation, Enumeration, and Search. Boca Raton: CRC Press.

[17] Kolodziejczyk, J., Miller, J., & Phillips, P. (1997). The application of genetic algorithm in cryptoanalysis of knapsack cipher. In Krasnoproshin, V.; Soldek, J.; Ablameyko, S.; Shmerko, V. (Eds.), Proceedings of Fourth International Conference PRIP '97 Pattern Recognition and Information Processing, May 20-22, (pp. 394-401). Poland: Wydawnictwo Uczelniane Politechniki Szczecinskiej.

[18] Lin, Feng-Tse, & Kao, Cheng-Yan. (1995). A genetic algorithm for ciphertext-only attack in cryptanalysis. In IEEE International Conference on Systems, Man and Cybernetics, 1995, (pp. 650-654, vol. 1).

[19] Lubbe, J. C. A. van der. (1998). Basic Methods of Cryptography. (S. Gee, Trans.). Cambridge: Cambridge University Press. (Original work published 1997).

[20] Matthews, R. A. J. (1993, April). The use of genetic algorithms in cryptanalysis. Cryptologia, 17(4), 187-201.

[21] Menezes, A., van Oorschot, P., & Vanstone, S. (1997). Handbook of Applied Cryptography. Boca Raton: CRC Press.

[22] Seberry, J., & Pieprzyk, J. (1989). Cryptography: An Introduction to Computer Security. Sydney: Prentice Hall.

[23] Shamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, 1982, (pp. 145-152).

[24] Sinkov, A. (1968). Elementary Cryptanalysis: A Mathematical Approach. New York: Random House.

[25] Spillman, R. (1993, October). Cryptanalysis of knapsack ciphers using genetic algorithms. Cryptologia, 17(4), 367-377.

[26] Spillman, R., Janssen, M., Nelson, B., & Kepner, M. (1993, January). Use of a genetic algorithm in the cryptanalysis of simple substitution ciphers. Cryptologia, 17(1), 31-44.

[27] Stinson, D. R. (2002). Cryptography: Theory and Practice. Boca Raton: CRC Press.

[28] Oliver, I. M., Smith, D. J., & Holland, J. R. C. (1987, July). A Study of Permutation Crossover Operators on the Traveling Salesman Problem. In John J. Grefenstette (Ed.), Proceedings of the 2nd International Conference on Genetic Algorithms, July 1987, (pp. 224-230). Cambridge, MA, USA: Lawrence Erlbaum Associates.

[29] Tomassini, M. (1999). Parallel and Distributed Evolutionary Algorithms: A Review. In K. Miettinen, M. M. Mäkelä, P. Neittaanmäki and J. Periaux (Eds.), Evolutionary Algorithms in Engineering and Computer Science (pp. 113-133). Chichester: J. Wiley and Sons.

[30] Wells, H. G. The Time Machine. New York: Bartleby.com, 2000. Retrieved August 20, 2003 from http://www.bartleby.com/1000/.

[31] Yaseen, I., & Sahasrabuddhe, H. (1999). A genetic algorithm for the cryptanalysis of Chor-Rivest knapsack public key cryptosystem (PKC). In Proceedings of Third International Conference on Computational Intelligence and Multimedia Applications, 1999, (pp. 81-85).
