
Cryptologia

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/ucry20

How we set new world records in breaking Playfair ciphertexts

Elonka Dunin, Magnus Ekhall, Konstantin Hamidullin, Nils Kopal, George Lasry & Klaus Schmeh

To cite this article: Elonka Dunin, Magnus Ekhall, Konstantin Hamidullin, Nils Kopal, George
Lasry & Klaus Schmeh (2021): How we set new world records in breaking Playfair ciphertexts,
Cryptologia, DOI: 10.1080/01611194.2021.1905734

To link to this article: https://doi.org/10.1080/01611194.2021.1905734

Published online: 13 Aug 2021.


ABSTRACT
The Playfair cipher is a well-known manual encryption method
developed in the 19th century. Until 2018, known cryptanalysis
techniques, with computer assistance, could solve non-keyword-based
Playfair ciphertexts if they had at least 60 letters
to work with. Shorter ciphertexts were effectively impossible
to solve in the absence of a crib. In this article, we show how
we introduced several improvements in these cryptanalysis
methods, which made it possible to do much better. This
resulted in the (unofficial) world record for the shortest
Playfair message broken going down from 60 via 50, 40, 32,
and 28 to 26 letters. The cryptanalysis techniques used include
hill climbing, simulated annealing, tabu search, and plaintext-based
dictionary attacks. For readers interested in improving
the current record, we also provide unsolved Playfair challenges
consisting of 24 and 22 letters.

KEYWORDS
challenges; cryptanalysis; dictionary attack; hill climbing; Playfair cipher;
simulated annealing; tabu search

1. The Playfair cipher


The Playfair cipher is a well-known manual encryption system based on
the substitution of letter pairs (digraphs). It was invented by Charles
Wheatstone (1802–1875) in 1854 and in the same year recommended to
the British military by Lord Lyon Playfair (1818–1898) (Figure 1), hence its
name (Kahn 1996, 198–202). The Playfair cipher can be regarded as a spe-
cial case of the general digraph substitution, which is based on an exhaust-
ive substitution table. If a 26-letter alphabet is used, such a data structure
creates a matrix with 26 × 26 = 676 entries, a key which is not very handy
and almost impossible to memorize. The Playfair cipher replaces this large
table with a much smaller matrix, and three simple substitution rules.
As the Playfair cipher is quite popular and certainly known to most readers,
we only give a brief introduction. For an example, we will encrypt the message
“to be or not to be.” The first step is to write it as a sequence of digraphs:

TO BE OR NO TT OB E

CONTACT Klaus Schmeh klaus@schmeh.org Private, Nikolaus-Groß-Str. 32, Gelsenkirchen 45886, Germany
© 2021 Taylor & Francis Group, LLC

As the rules of Playfair (provided below) require that there are no pairs
with identical letters in the plaintext, we will put an X between the two Ts:

TO BE OR NO TX TO BE
Sidenotes about a Playfair plaintext:
• If there was now a single letter at the end, we would have to add
  another padding letter, such as an X, at the last position. But this is not
  necessary here, as the number of letters after adding the X between the
  two Ts is even.
• The letter J may not appear in a Playfair plaintext; if one exists, this is
  usually handled by writing an I instead of a J.

Figure 1. The Playfair cipher is named for British politician and scientist Lyon Playfair
(1818–1898, left), who promoted the system. It was actually invented by his friend, inventor
and scientist Charles Wheatstone (1802–1875, right). Sources: Lock & Witfield, Public domain,
via Wikimedia Commons/Samuel Laurence, Public domain, via Wikimedia Commons.

The key of the Playfair cipher is a 5 × 5 matrix that contains the letters
of the alphabet (except the J) in an arbitrary order; in other words, a per-
mutation of a 25-letter alphabet. It is possible to use a keyword (e.g.,
MONDAY) for determining the order of the letters in the matrix:
    M O N D A
    Y B C E F
    G H I K L
    P Q R S T
    U V W X Z

However, this method is not applied in the following. Instead, in this art-
icle we assume that a Playfair matrix is always based on a random permu-
tation of the alphabet, as in the following example:

Figure 2. The Playfair cipher substitutes digraphs according to three simple rules.

To encrypt our message, we substitute the plaintext digraphs (TO, BE,
OR, …) according to three Playfair rules, defined as follows (see also
Figure 2):

1. If the two letters of a digraph are in neither the same column nor the
same row (this is the most frequent case), form a rectangle with the two
letters at opposite corners and replace the two letters by the other two
corner letters. The upper plaintext letter is replaced by the other upper
letter in the rectangle, the lower plaintext letter by the lower one. For
instance, if we use the above matrix, TO encrypts to HL.
2. If the two letters stand in the same row, each one is replaced by its
right neighbor, or wraps around to the first column as needed. In our
example, BE becomes KB.
3. If the two letters stand in the same column, each one is replaced by its
lower neighbor and wraps around to the top row as needed. In our
example, OR encrypts to NO.

In our example, the plaintext is encrypted as follows:


Plaintext: TO BE OR NO TX TO BE
Ciphertext: HL KB NO HN MH HL KB
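
The digraph preparation and the three rules can be sketched in Python as follows. This is a minimal illustration and not code from the authors; the function names are assumptions. Because the random matrix behind the ciphertext above is not reproduced in the text, the sketch uses the MONDAY keyword matrix shown earlier, so its output differs from the ciphertext above.

```python
# Illustrative sketch of Playfair encryption; all names are assumptions.
MATRIX = ["MONDA", "YBCEF", "GHIKL", "PQRST", "UVWXZ"]  # keyword MONDAY
POS = {ch: (r, c) for r, row in enumerate(MATRIX) for c, ch in enumerate(row)}


def to_digraphs(message):
    """Split a message into digraphs, writing I for J and padding with X."""
    letters = [ch for ch in message.upper().replace("J", "I") if ch in POS]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else "X"
        if a == b:
            # Identical letters (or a lone trailing X): insert a filler instead.
            b = "X" if a != "X" else "Q"
            i += 1
        else:
            i += 2
        pairs.append(a + b)
    return pairs


def encrypt_pair(pair):
    (r1, c1), (r2, c2) = POS[pair[0]], POS[pair[1]]
    if r1 == r2:   # rule 2: same row, take the right neighbors (with wrap-around)
        return MATRIX[r1][(c1 + 1) % 5] + MATRIX[r2][(c2 + 1) % 5]
    if c1 == c2:   # rule 3: same column, take the lower neighbors (with wrap-around)
        return MATRIX[(r1 + 1) % 5][c1] + MATRIX[(r2 + 1) % 5][c2]
    # rule 1: rectangle, each letter stays in its row and takes the other's column
    return MATRIX[r1][c2] + MATRIX[r2][c1]


def encrypt(message):
    return " ".join(encrypt_pair(p) for p in to_digraphs(message))


print(encrypt("to be or not to be"))
```

With the MONDAY matrix, the message splits into TO BE OR NO TX TO BE and encrypts to QA CF NQ DN SZ QA CF. The choice of Q as a filler for a lone trailing X is an assumption made only to keep the sketch well-defined.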

Key-space size:
As there are 25! ways to permute a 25-letter alphabet, there are as many
keys available for the Playfair cipher. However, cyclically rotating the rows
or the columns of the matrix doesn’t affect the encryption result. There are
5 ways to rotate the columns and 5 ways to rotate the rows, and therefore
5 × 5 = 25 possible rotations which result in equivalent keys. The effective
number of keys for the Playfair cipher is therefore 25!/25 = 24! ≈ 6.2 × 10^23,
which corresponds to a 79-bit key.
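
The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked calculation; the variable names are illustrative.

```python
import math

total_keys = math.factorial(25)      # all permutations of the 25-letter alphabet
effective_keys = total_keys // 25    # 5 row rotations x 5 column rotations are equivalent
bits = math.log2(effective_keys)     # key length in bits

print(f"{effective_keys:.3e} effective keys, about {bits:.1f} bits")
```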
Unicity distance:
The unicity distance of the Playfair cipher is 22.69 for the English lan-
guage (Deavours 1977). The unicity distance is the minimum length of a
ciphertext so that there is only one plausible plaintext (i.e., there exists
only one key that decrypts the ciphertext into a meaningful English plain-
text). This means that for a Playfair ciphertext consisting of 22 or fewer
letters, we can expect more than one key that decrypts the ciphertext so
that the decrypted text is meaningful, which could result in ambiguous
solutions. However, the unicity distance is, of course, a theoretical thresh-
old. It is therefore possible that an even shorter Playfair ciphertext
is breakable.
For more information about the Playfair cipher, see (Kahn 1996,
198–203) and (Bauer 2019, 320–326). A detailed look at how to break
such a cipher is provided in (Dunin and Schmeh 2020, 289–305). As can
be seen in these sources, many variants of the Playfair cipher have been
described since the invention of this system in the 19th century. For
instance, during the Second World War the U.S. Army used a Playfair ver-
sion that allowed for identical digraphs (Dunin and Schmeh 2020,
287–288), while the Germans employed a two-matrix variant, known as
two-square cipher or Doppelkasten, at the same time (David 1996). It is
also possible to have nonstandard versions, such as larger matrices of 5 × 6,
or rules that shift letters to the left instead of right, or up instead of down.
However, in this article, we only address the standard version as
described above.
The rest of this article is structured as follows: First, Section 2 introduces
conventional cryptanalysis approaches for attacking the Playfair cipher.
After that, Section 3 presents the concepts of hill climbing and simulated
annealing, which were successfully used for cryptanalyzing classical ciphers.
Then, in Section 4 we present different Playfair challenges and how authors
of this article solved these over the last few years. We also provide unsolved
Playfair challenges for readers interested in improving the current state-of-
the-art of attacking the Playfair cipher as presented in this article. Finally,
Section 5 concludes this article.

2. Conventional cryptanalysis approaches


Before the computer age, Playfair ciphertexts were often solved with the
help of cribs. Based on a crib, a cryptanalyst could guess some entries of
the matrix and then reconstruct additional ones by capitalizing on regular-
ities of the cipher. If the Playfair matrix is generated with a keyword (a
case we don’t cover in this article), keyword-based cryptanalytic approaches
can be employed. For instance, (Monge 1936) described how a Playfair
challenge ciphertext with only 30 letters can be solved with such a method.
Since the advent of computers, dictionary attacks on keyword-generated
key matrices have become possible.
A more general method to manually solve a Playfair ciphertext composed
of 800 or more letters is described in (Mauborgne 1918). This approach
matches the frequencies of the most common digraphs in a ciphertext
against those most common in the English language (for instance, TH, ER,
and ET). A tentative initial square is then built and completed in a trial-
and-error process.

3. Hill climbing and simulated annealing


Hill climbing is a method applied in computer science to solve certain opti-
mization problems, most of which are not crypto-related. Examples include
finding the shortest travel route that includes certain cities (the “traveling
salesman” problem) or determining the most efficient configuration of a
production facility. In order to find the optimal solution to a certain prob-
lem, we first need to assign a score to each solution. A method used for
this purpose is referred to as a scoring function. For the traveling salesman
problem, the score that hill climbing tries to optimize is the total distance
traveled, and the optimal solution is the path with the minimal
total distance.
Hill climbing requires a smooth scoring function, that is, a small change
in the solution results in only a small change in the scoring function. With
the traveling salesman problem, a change in the order of traversing a few
adjacent cities out of a total of hundreds of cities would typically result in
a small change in the total distance traveled. Hill climbing is efficient for
problems that have too many potential solutions for each one to be
checked by brute force. The method is named “hill climbing” because it
aims to improve a candidate solution iteratively until the “top of the hill”
(i.e., the optimal solution) is reached (see Figure 3).
It is easy to see that many classical ciphers have exactly the two proper-
ties that make hill climbing applicable and efficient. First, for almost all
conventional encryption methods—such as simple substitution, the
Vigenere cipher, and turning-grille encryption—there exist scoring

Figure 3. Hill climbing is a technique for solving optimization problems with a smooth scoring
function. The algorithm starts with a random solution candidate and iteratively tries to reach a
better solution by making small changes. The goal is to reach the solution with the highest
score, which is most often the correct solution.

functions which are smooth. This means that small changes in the key
cause only small changes in the decryption score (for example, using letter
n-graph statistics). Second, the number of potential solutions (i.e., the key
space) of many classical ciphers is too large for a brute-force attack. For
instance, it is nearly impossible to work through all 26! ≈ 10^27 ≈ 2^89 ways a
substitution table with 26 letters can be assembled. For introductions to hill
climbing attacks on ciphers see (Dunin and Schmeh 2020, 377–412), (Lasry
2018, 38–39), and (Bauer 2019, 358–360).
Cryptanalyzing a ciphertext with hill climbing works as follows:

1. Generate a random key candidate K1
2. Decrypt the ciphertext with K1 to get a plaintext candidate P1
3. Rate the plaintext candidate P1 with a scoring function (explanation is
given below)
4. Change the key candidate K1 slightly and obtain key candidate K2
5. Decrypt the ciphertext with K2 to get a new plaintext candidate P2
6. Rate the plaintext candidate P2 with the scoring function
7. If P2 has a higher score than P1, keep K2 as the new K1; otherwise K1
stays the same
8. If it appears that P1 can no longer be further improved with small
changes in K1, stop and display the results; otherwise repeat at step 4
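
The eight steps above can be sketched as a generic Python routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the cipher-specific parts are passed in as functions, "can no longer be improved" is approximated by a fixed number of consecutive failed attempts (max_stale), and the demonstration uses a toy 8-letter simple substitution rather than Playfair. The toy score compares against a known text purely so the example is self-checking; a real attack would use language statistics instead.

```python
import random


def hill_climb(ciphertext, random_key, decrypt, score, perturb,
               max_stale=2000, rng=None):
    """Generic hill climbing over keys; returns (best_key, best_score)."""
    rng = rng or random.Random()
    key = random_key(rng)                               # step 1
    best = score(decrypt(ciphertext, key))              # steps 2-3
    stale = 0                                           # failed attempts in a row
    while stale < max_stale:                            # step 8: stop when stuck
        candidate = perturb(key, rng)                   # step 4
        value = score(decrypt(ciphertext, candidate))   # steps 5-6
        if value > best:                                # step 7
            key, best, stale = candidate, value, 0
        else:
            stale += 1
    return key, best


# Toy demonstration (hypothetical data, simple substitution, not Playfair).
ALPHABET = "ABCDEFGH"
SECRET = "HGFEDCBA"
PLAIN = "HEADBADGEFACEDCAB"
CIPHER = PLAIN.translate(str.maketrans(ALPHABET, SECRET))


def random_key(rng):
    return "".join(rng.sample(ALPHABET, len(ALPHABET)))


def decrypt(text, key):
    return text.translate(str.maketrans(key, ALPHABET))


def score(candidate):
    # Toy stand-in for language statistics: letters matching the known text.
    return sum(a == b for a, b in zip(candidate, PLAIN))


def swap_two(key, rng):
    i, j = rng.sample(range(len(key)), 2)
    letters = list(key)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


key, best = hill_climb(CIPHER, random_key, decrypt, score, swap_two)
print(key, best, decrypt(CIPHER, key))
```

In this toy problem every non-optimal key has at least one improving swap, so a single run is enough. Real scoring functions have local maxima, which is why many restarts are needed in practice.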

After termination, the final key and plaintext candidates have some prob-
ability of being the correct ones. To increase this probability, the procedure
should be carried out many times with different starting points. In practice,

thousands of restarts or even more might be necessary. The candidate with the
highest score across all these runs has a high probability of being the correct one.
Cryptography experts have used hill climbing successfully to break a
wide range of encryption algorithms from simple substitutions via
advanced manual methods to machine ciphers such as the Enigma. The
scoring function is typically based on letter or n-graph frequencies of
the plaintext language. The closer the frequencies of the alphabet letters in
the plaintext candidate and in a corpus of text in the same language are,
the higher the score. The slight change of the key required in step 4 of the
procedure described above is usually accomplished by switching the values
of two positions in the substitution table, swapping two columns, or swap-
ping two rows.
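
A scoring function of this kind can be sketched as follows. The sketch assumes one common design, summing the logarithms of n-graph frequencies taken from a reference corpus, which is not necessarily the exact formula used by the programs discussed in this article. The one-line corpus and the penalty for unseen n-graphs are placeholders chosen for illustration.

```python
import math
from collections import Counter


def build_ngram_scorer(corpus, n=3):
    """Return a function that scores uppercase text by summed log n-graph frequencies."""
    letters = "".join(ch for ch in corpus.upper() if ch.isalpha())
    counts = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    total = sum(counts.values())
    floor = math.log(0.01 / total)      # assumed penalty for n-graphs never seen

    def score(text):
        return sum(
            math.log(counts[g] / total) if g in counts else floor
            for g in (text[i:i + n] for i in range(len(text) - n + 1))
        )

    return score


# Tiny illustrative corpus; a real attack would use a very large text collection.
SAMPLE = "the quick brown fox jumps over the lazy dog and then the other one"
score = build_ngram_scorer(SAMPLE, n=3)
print(score("THEOTHER"), score("XQZKVWPJ"))
```

Text made of n-graphs that occur in the corpus scores higher than text made of n-graphs that never occur, which is the property the search relies on.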
Another commonly used method for cryptanalyzing a cipher is simulated
annealing (Lasry 2018, 39–41), a generalization of hill climbing. The name
of this technique comes from annealing in metallurgy, a method involving
the heating and controlled cooling of a material to increase the size of its
crystals and reduce their defects. In simulated annealing, the new key
candidate K2 is accepted not only if its plaintext candidate P2 has a higher
score, but it also has a chance of being accepted if the score is lower (see Figure 4).
The probability of this happening, p, is based on both the difference between
the new and older scores (d), and a control parameter known as the
“temperature” (T). The formula for this is:
p = e^{-|d|/T}    (1)

Figure 4. Simulated annealing is a generalization of hill climbing. With simulated annealing,
new key candidates are accepted not only if they result in a higher score, but also, randomly, if
they produce a lower score. This allows simulated annealing to avoid being stuck at
local maxima.

So, if the temperature T is high, relatively many keys are accepted in
spite of a lower score; if the temperature is low, this happens only rarely.
The higher the difference d, the lower the probability of a key candidate
being kept, in spite of a lower score. If the difference is low, more keys are
accepted, in spite of a lower score. There are two common ways to use the
temperature:

• Diminishing temperature: We start with a high temperature, which
  allows the algorithm to make big jumps across the key space, including
  keys that result in a significantly lower score. Then, we decrease the
  temperature gradually. Solutions reducing the score are now less likely
  to be accepted.
• Fixed temperature: The probability that the algorithm jumps to a key
  with a lower score doesn’t change.
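
The acceptance rule, with probability p = exp(-|d|/T) for a candidate that scores lower, and the two ways of using the temperature can be sketched as follows. The function names and the geometric cooling schedule are assumptions made for illustration; the text does not prescribe a particular schedule.

```python
import math
import random


def accept(new_score, old_score, temperature, rng=random):
    """Decide whether a new key candidate replaces the current one."""
    if new_score >= old_score:
        return True                      # improvements are always kept
    d = new_score - old_score            # negative difference between the scores
    return rng.random() < math.exp(-abs(d) / temperature)


def diminishing(t_start, t_end, steps):
    """Yield `steps` temperatures falling geometrically from t_start to t_end (steps >= 2)."""
    ratio = (t_end / t_start) ** (1 / (steps - 1))
    t = t_start
    for _ in range(steps):
        yield t
        t *= ratio


# A score drop of 5 is accepted often at T = 100 and almost never at T = 0.1.
print(math.exp(-5 / 100), math.exp(-5 / 0.1))
```

For the fixed-temperature variant, the same accept function is simply called with a constant temperature throughout the run.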

Like hill climbing, simulated annealing requires a problem to have a
smooth scoring function. Simulated annealing is useful and efficient if a
brute-force attack is not feasible. Note that simulated annealing is a gener-
alization of hill climbing, because at a temperature close to 0, p becomes
very small and only solutions improving the score are likely to be accepted.
In general, simulated annealing is slower than hill climbing, but often
performs better if there is a particularly large key space.
A popular computer program that supports hill climbing and simulated
annealing for breaking certain ciphers is CrypTool 2 (Kopal et al. 2014).
Some of the authors of this article have contributed hill climbing or simu-
lated annealing code to this project, and use the platform to solve
cipher challenges.
It is important to note that both hill climbing and simulated annealing,
while being powerful tools for breaking historically important ciphers, are
not suited for attacking modern encryption algorithms such as AES or
DES. This is because it is an important requirement for modern encryption
algorithms, in fact, not to have any smooth scoring functions. Modern
algorithms are designed so that for any small change in the key or in the
plaintext—including the change of a single bit—about half of the bits in
the ciphertext output should change. In modern cryptology, this concept is
known as the “avalanche effect” (Schneier 2007, 273).
Contrary to modern encryption systems, there are scoring functions for
the Playfair cipher that are smooth with respect to the key matrix (e.g.,
switching the position of two letters), as small changes in the key matrix
cause only small changes in the decrypted text. In addition, the number of
keys available for Playfair encryption is much too high for a brute-force
search (remember that a key is represented by a permutation of a 25-letter

alphabet). These are the two conditions for a hill climbing or simulated
annealing attack to be most effective and useful. It therefore comes as no
surprise that these two methods have proven very efficient for the crypt-
analysis of Playfair.
For instance, (Cowan 2008) presents an attack on Playfair based on
simulated annealing with a constant temperature. It uses tetragraph (four-
letter group) frequencies for the scoring function. With this method,
Playfair ciphertexts as short as 80 letters can routinely be solved. In (Al-
Kazaz et al. 2018), a Playfair-breaking technique based on simulated
annealing is described and demonstrated on several ciphertexts. The short-
est one, with only 60 letters, was successfully deciphered with only two
errors. The scoring function is based on hexagraph statistics. To the
authors’ knowledge, as of early 2018 the deciphering of the aforementioned
60-letter message (yet with two errors) represented the world record for
the shortest random-matrix Playfair ciphertext ever cryptanalyzed.

4. Playfair challenges
The following sections describe Playfair challenges even shorter than 60
characters which were created by Schmeh and how they were solved by dif-
ferent co-authors of this article. At the end, we also present even smaller
unsolved challenges and invite the readers of this article to try to
solve them.

4.1. The 50-letter Playfair solution


Schmeh wanted to know whether the Playfair world record of 60 letters
could be broken. For this purpose, in April 2018 Schmeh created a 50-letter
Playfair ciphertext (not based on a keyword) and published it on
the internet:¹

MQ VS KP EV IS BA WK TP KP PN AU NU NE GL UZ TY UZ LY GC TZ
KN KU ST AG CT NQ

This cryptogram was solved on the same day by Lasry, thus breaking the
previous record. Lasry used a simulated annealing program (with constant
temperature) of Lasry’s own design, the scoring function being based on
hexagraph statistics. The details are provided in (Lasry 2019).
Contrary to most other simulated-annealing algorithms, the one imple-
mented by Lasry not only checked simple changes of the key—swapping

¹ Klaus Schmeh’s blog: Playfair cipher: Is it unbreakable, if the message has only 50 letters?
https://scienceblogs.de/klausis-krypto-kolumne/2018/04/07/playfair-cipher-is-it-unbreakable-if-the-message-has-only-50-letters

two elements in the key matrix—at each iteration, but also checked a larger
number of other possible small changes on the key matrix—namely swaps
of any two rows, swaps of any two columns, permutations of the five rows,
permutations of the five columns, permutations of the five elements of any
row, and permutations of the five elements of any column—and computed
the scoring function separately for each one. It took the algorithm only a
few seconds on a 10-core Intel Core i7 6950X 3.0 GHz PC to complete the
attack and to derive the correct solution:

WHILE IN PARIS I RECEIVED ORDERS TO REPORT X TO GENERAL FOSTER X
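
The set of key changes listed above can be sketched as a generator of candidate keys. This is one reading of the description, not Lasry's code: the key is stored as a 25-letter string in row order, the function names are hypothetical, and how the real program orders or combines these changes is not stated here.

```python
from itertools import combinations, permutations


def rows(key):
    return [key[r * 5:(r + 1) * 5] for r in range(5)]


def transpose(key):
    """Turn columns into rows (applying it twice restores the key)."""
    return "".join(key[r * 5 + c] for c in range(5) for r in range(5))


def row_variants(key):
    """Keys reached by permuting whole rows or the letters inside one row."""
    grid = rows(key)
    for order in permutations(range(5)):          # includes every swap of two rows
        yield "".join(grid[r] for r in order)
    for r in range(5):
        for order in permutations(grid[r]):
            yield "".join(grid[:r] + ["".join(order)] + grid[r + 1:])


def neighbors(key):
    """Yield candidate keys; duplicates and the unchanged key may appear."""
    for i, j in combinations(range(25), 2):       # swap any two elements
        k = list(key)
        k[i], k[j] = k[j], k[i]
        yield "".join(k)
    yield from row_variants(key)                  # row-based changes
    for t in row_variants(transpose(key)):        # column-based changes
        yield transpose(t)


candidates = list(neighbors("MONDAYBCEFGHIKLPQRSTUVWXZ"))
print(len(candidates))
```

Each candidate would then be scored separately, as described above. The sketch yields 1,740 candidates per key, including repeats.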

To our knowledge, this success set a new record in deciphering the
shortest Playfair cryptogram based on a random matrix. The simulated-annealing
routine used is now also available as open-source software
(Kopal 2018).

4.2. The 40-letter Playfair solution


In order to check whether the 50-letter world record could be improved,
Schmeh published another Playfair challenge, also on the internet, in
December 2018.² This one was based on a plaintext with 40 letters and
again on a random key matrix:

OF FC ER VU MW MO OM RU FI WC MA OG FV ZY FX YB HG UX ZV EH

Again, the record was quickly broken, as the challenge was solved
within a few hours. This time, it was Kopal who came up with
the solution.
For the deciphering work, Kopal applied the simulated-annealing algo-
rithm written by Lasry including the hexagraph-based scoring function.
Initial runs only produced spurious solutions. At some stage, the software
displayed a decryption (at the fourth position of the list), starting with
MEETYOU, but only for a few seconds, before the decryption quickly dis-
appeared from the list as new higher-score decryptions were inserted. At
this point, an additional feature of the software proved helpful, which sup-
ported simulated annealing with a crib. The scoring function was modified
such that the score, based on hexagraph statistics, would be increased for
each known-plaintext symbol (i.e., the crib, or a portion of it) correctly
reproduced when decrypting the ciphertext with a candidate key. With this

² Klaus Schmeh’s blog: Playfair cipher: Is it breakable, if the message has only 40 letters?
https://scienceblogs.de/klausis-krypto-kolumne/2018/12/08/playfair-cipher-is-it-breakable-if-the-message-has-only-40-letters

modification, ciphertexts with 40 letters could easily be solved given a crib
of 10 letters. When the crib-based attack was run with MEETYOU as a
crib, the solution was quickly found:

MEET YOU TOMORROW AT FOUR TWENTY AT MARKET PLACE


The details of this cryptanalysis are also provided in (Lasry 2019).
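
The crib-aware scoring described in this section can be sketched as follows. The size of the bonus, the crib position parameter, and the function name are assumptions; the text only states that the score is increased for each crib symbol that a candidate key reproduces.

```python
def crib_aware_score(candidate, base_score, crib, crib_start=0, bonus=1.0):
    """Add a bonus to the base score for each crib letter the candidate reproduces."""
    window = candidate[crib_start:crib_start + len(crib)]
    matches = sum(a == b for a, b in zip(window, crib))
    return base_score(candidate) + bonus * matches


# Demonstration with a dummy base score of zero (hypothetical values).
print(crib_aware_score("MEATYOXTOMORROW", lambda text: 0.0, "MEETYOU"))
```

In a real attack, base_score would be the hexagraph-based function, and the bonus would need to be scaled to that function's range.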

4.3. The 32-letter Playfair solution


To see whether a new record could be set, Schmeh decreased the number
of characters yet again, this time creating a Playfair cryptogram with 32 let-
ters (based on a 30-letter plaintext), again with a random key. This was
published on the internet in April 2019:³

SX CR ED BQ UG VZ RS MN DS IK RK WR SG NS NX VM

The challenge definitely proved more difficult than the previous ones, as
no solution popped up that day, week, or even month. However, five
months later, in September 2019, Ekhall deciphered the message and again
set a new record.⁴
For this cryptanalysis effort, Ekhall started by writing a simulated-anneal-
ing program, first in Python, then later rewritten in C. The scoring func-
tion was based on the frequencies of five-letter groups (pentagraphs).
Initial results were not successful, so Ekhall improved the simulated-annealing
algorithm by introducing two new functions:

• Restarts: Ekhall included regular restarts with newly generated key
  matrices. This was straightforward to implement.
• Memory: Ekhall changed the algorithm so that as it worked from one
  key matrix to another, it stored all key candidates that produced a score
  higher than a certain limit. The algorithm was not allowed to go back
  to a stored state, an approach also used in “tabu search” (Glover and
  Laguna 1998). The use of this memory function aims at preventing the
  program from visiting the same local maximum more than once, so that it
  instead continues searching for other maxima; the visited keys become
  “taboo”, hence the term “tabu search”.

³ Klaus Schmeh’s blog: Playfair cipher: Is it breakable, if the message has only 30 letters?
https://scienceblogs.de/klausis-krypto-kolumne/2019/04/15/playfair-cipher-is-it-breakable-if-the-message-has-only-30-letters
⁴ Klaus Schmeh’s blog: Magnus Ekhall solves Playfair challenge and sets a new world record.
https://scienceblogs.de/klausis-krypto-kolumne/2019/09/05/magnus-ekhall-solves-playfair-challenge-and-sets-a-new-world-record

The memory function required a large amount of storage for storing the
visited key matrices and appropriate memory management. Ekhall came to
the conclusion that a data structure with efficient insertion and search
performance was essential. Switching from C to C++ allowed the use of an
std::unordered_set, which has average constant-time search and insertion
complexity.
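
The memory function can be sketched in a few lines. Ekhall's program used C++ and std::unordered_set; the sketch below uses a Python set, which is also a hash-based container, purely for illustration. The class name, the method names, and the threshold value are hypothetical.

```python
class TabuMemory:
    """Remember high-scoring keys so that the search does not return to them."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.visited = set()             # hash set: average O(1) insert and lookup

    def is_tabu(self, key):
        return key in self.visited

    def record(self, key, score):
        if score > self.threshold:       # only keys above the limit are stored
            self.visited.add(key)


memory = TabuMemory(threshold=100.0)
memory.record("MONDAYBCEFGHIKLPQRSTUVWXZ", 150.0)
print(memory.is_tabu("MONDAYBCEFGHIKLPQRSTUVWXZ"))
```

Inside the annealing loop, a candidate key for which is_tabu returns True would be skipped before it is scored.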
Ekhall ran the simulated-annealing algorithm with the memory and
restart functions. All candidate solutions that scored better than a fixed
threshold were logged to a file. Afterwards, the large log file was investi-
gated both manually and with a simple software routine that highlighted
words listed in a dictionary. After the program had worked for over a week
with numerous restarts, Ekhall finally found a plausible message amongst
the large number of plaintext candidates:

TAKE THE LAST X TRAIN TO YORK ON SUNDAY X

This made sense. The first X was apparently included to avoid a
doubled T, the second one to make the number of letters in the message
even. Omitting the Xs of course leads to the following original
plaintext:

TAKE THE LAST TRAIN TO YORK ON SUNDAY

Schmeh confirmed that this was exactly the plaintext that had been
encrypted. A new record was reached: 32 letters. The plaintext did not
have a particularly high score; there were many false plaintext candidates
that scored better. A single score of a short message is more sensitive to
the presence of unusual letter combinations compared to longer messages
(Lasry 2018). As messages get shorter, the variance of the n-graph scoring
becomes larger.

4.4. The 28-letter Playfair solution


After Ekhall’s success, Schmeh created yet another Playfair challenge
in September 2019—this time with four fewer characters, only
28 letters:⁵

ZX LS EW HC HU CE LQ OE PN YR IW YC VQ LS

⁵ Klaus Schmeh’s blog: Can you solve this Playfair cryptogram and set a new world record?
https://scienceblogs.de/klausis-krypto-kolumne/2019/09/10/can-you-solve-this-playfair-cryptogram-and-set-a-new-world-record

Two months after publication, in November 2019, it was again Ekhall
who came up with the solution and set the new record.⁶
For this successful cryptanalysis, Ekhall had used the same simulated-
annealing program with restarts and tabu-search elements that had already
solved the previous Playfair challenge. The scoring function was again
based on pentagraphs.
When starting the project, Ekhall soon realized that a 28-letter
Playfair message was considerably more difficult to break than a challenge
of the same kind with 32 letters. So, the simulated annealer was run repeat-
edly, logging all plaintext candidates with a scoring function result over a
certain threshold. When this didn’t help, the scoring function was changed
from using pentagraph frequencies to hexagraph (6-character) frequencies,
though this required more memory. Running the improved code for
24 hours resulted in about 100,000 potential plaintexts that Ekhall needed
to sift through. Many of the output lines appeared to contain sequences of
proper English words, but they didn’t seem to connect to meaning-
ful sentences.
Ekhall then tried different techniques to comb through the many plain-
text candidates and to select the ones that seemed to have correct grammar.
A technique that proved helpful was to set up a list of common word pairs.
To assemble such a list, Ekhall wrote a software routine that worked
through the English part of Project Gutenberg (a not-for-profit organiza-
tion that digitizes non-copyrighted books and makes them accessible online
on gutenberg.org), identifying all consecutive pairs of words that appeared
in a subset of this corpus. This amounted to about eight million pairs of
words. Using these, Ekhall created another scoring function based on a
non-overlapping version of the Aho-Corasick algorithm.
The Aho-Corasick algorithm (Aho and Corasick 1975) is used to search an
input text for matches within a dictionary or set of strings. The benefit of using
Aho-Corasick is that it searches for all the words of its dictionary simultan-
eously. In this case the dictionary consisted of the list of all the above-men-
tioned word pairs extracted from Project Gutenberg. The Aho-Corasick
implementation first constructed a finite state machine out of all the word
pairs. The state machine required a considerable amount of memory (tens of
gigabytes) and took about 5 minutes to construct. Once it was constructed,
the actual search was quite fast, since Aho-Corasick searches with linear
time complexity with regard to the sum of the text length and the number
of found words. Using the Aho-Corasick search outlined above, the 100,000

⁶ Klaus Schmeh’s blog: Magnus Ekhall solves 28-letter Playfair challenge and sets new world record.
https://scienceblogs.de/klausis-krypto-kolumne/2019/11/14/magnus-ekhall-solves-28-letter-playfair-challenge-and-sets-new-world-record

potential plaintext candidates were each given a score based on how many
characters could be connected to a word pair. For example, in the input string:
KOTAHEREBEENMAKINGTHEATEMPTA

Two non-overlapping word pairs were spotted: “HERE BEEN”
and “MAKING THE”, whereas the rest of the text does not have
any recognizable words. Note that “THE AT” was not recognized as a valid
word pair since the letters that constitute “THE” are already part of a word
pair. The number of letters in these four words gives the score of 17. The
potential plaintext candidates were then sorted based on these new scores.
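
The word-pair score can be illustrated with a small sketch. It does not use Aho-Corasick; a plain dynamic-programming scan over a tiny hypothetical pair list stands in for it, and it assumes that the score is the largest number of letters that can be covered by non-overlapping word pairs. This simple scan checks every pair at every position, so it would be far too slow for eight million pairs.

```python
def word_pair_score(text, pairs):
    """Largest number of letters coverable by non-overlapping word pairs."""
    joined = {a + b for a, b in pairs}
    best = [0] * (len(text) + 1)          # best[i]: best coverage of text[:i]
    for end in range(1, len(text) + 1):
        best[end] = best[end - 1]         # option: leave the last letter uncovered
        for p in joined:
            start = end - len(p)
            if start >= 0 and text[start:end] == p:
                best[end] = max(best[end], best[start] + len(p))
    return best[-1]


# Hypothetical three-entry pair list, just enough for the example above.
PAIRS = [("HERE", "BEEN"), ("MAKING", "THE"), ("THE", "AT")]
print(word_pair_score("KOTAHEREBEENMAKINGTHEATEMPTA", PAIRS))
```

For the example string, the sketch finds “HERE BEEN” (8 letters) and “MAKING THE” (9 letters) and returns 17; “THE AT” is not counted because it would overlap with “MAKING THE”.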
With this method, Ekhall spotted a promising solution, about 2000 lines
from the top:

AWAYWHEREYOUAREUNTILTHURSDAY

Ekhall guessed that this was close to the solution, though the correct first
word might be STAY instead of AWAY. Using
STAYUNTILTHURSDAY as a crib, the program quickly con-
firmed Ekhall’s hypothesis and rendered the following solution:

STAY WHERE YOU ARE UNTIL THURSDAY

As mentioned above, with 28 letters it is difficult to differentiate between
the correct solution and false solutions also consisting of English words;
however, Schmeh affirmed that this was the correct plaintext, and that
Ekhall had set yet another new record for the shortest ciphertext of this
type (28 letters) ever decrypted. Ekhall calls this method of solving the challenge
“brute force with simulated-annealing support” due to the fact that quite a
lot of manual checking was needed to find the correct solution.

4.5. The 26-letter Playfair solution


After the 28-letter Playfair challenge had been solved, Schmeh, again seek-
ing to issue a new challenge, published the following 26-letter ciphertext in
November 2019:⁷

DB AQ IH KN RW VB KW NA DQ WR AM OQ IY

⁷ Klaus Schmeh’s blog: Can you solve this Playfair cryptogram and set a new world record?
https://scienceblogs.de/klausis-krypto-kolumne/2019/11/22/can-you-solve-this-playfair-cryptogram-and-set-a-new-world-record-2

Four weeks later the solution⁸ was posted by a person so far unknown to the
other authors of this article: Hamidullin. Surprisingly, Hamidullin had solved this
challenge with neither a hill climber nor a simulated annealer, but with a special
kind of dictionary attack on the first words of the plaintext. (A dictionary attack
on the key matrix would not have been possible, as the matrix was not derived
from a keyword.) As far as we know, this approach to deciphering a Playfair
ciphertext has never been discussed in the codebreaking literature before.
For the attack, Hamidullin required a list (i.e., a dictionary) of words that
stand at the beginning of English sentences. As Hamidullin couldn’t find an
existing collection that would suit his needs, he decided to create one. After
analyzing Ekhall’s solution of the previous two Playfair challenges, Hamidullin
realized that Project Gutenberg provided the input needed for this purpose.
Hamidullin wrote a software program in C++ that worked through
about three thousand English-language Project Gutenberg books and gener-
ated lists of the most frequent word n-graphs, with n running from 1 to 6.
Hamidullin’s codebreaking software then used the word n-graph database
that had been compiled, along with the following recursive algorithm that
would incrementally reconstruct plaintext one letter at a time and compare
the reconstructed portion with the word n-graphs to see if the reconstruc-
tion was a potential English expression:

procedure Decipher(Ciphertext, Plaintext)
begin
    if Length(Ciphertext) = Length(Plaintext) then begin
        yield Plaintext
    end
    else begin
        # Letters = all letters from 'A' to 'Z' except 'J'
        for ch in Letters begin
            NewPlaintext := Plaintext + ch
            if IsMeaningful(NewPlaintext)
               and IsConsistent(Ciphertext, NewPlaintext)
            then begin
                Decipher(Ciphertext, NewPlaintext)
            end
        end
    end
end

Decipher('DBAQIHKNRWVBKWNADQWRAMOQIY', '')
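
For readers who want to experiment, the recursion can be written as a Python generator. This is a minimal sketch, not Hamidullin's C++ code: the two filters are passed in as functions, and the toy versions shown here (a two-phrase list and a single "no letter encrypts to itself" rule) are hypothetical stand-ins for the word n-graph scoring and the matrix search.

```python
from typing import Callable, Iterator

LETTERS = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25-letter Playfair alphabet, no J


def decipher(ciphertext: str, plaintext: str,
             is_meaningful: Callable[[str], bool],
             is_consistent: Callable[[str, str], bool]) -> Iterator[str]:
    """Yield every full-length plaintext that survives both filters."""
    if len(plaintext) == len(ciphertext):
        yield plaintext
        return
    for ch in LETTERS:
        candidate = plaintext + ch
        if is_meaningful(candidate) and is_consistent(ciphertext, candidate):
            # Hand the results of the deeper recursion levels up to the caller.
            yield from decipher(ciphertext, candidate, is_meaningful, is_consistent)


# Toy stand-ins (assumptions), far weaker than the checks described in the text.
TOY_PHRASES = ["WAITFORFURTHERINSTRUCTIONS", "DONOTMOVEUNTILFURTHERORDER"]


def toy_is_meaningful(candidate: str) -> bool:
    # "Meaningful" here just means: a prefix of one of the known phrases.
    return any(phrase.startswith(candidate) for phrase in TOY_PHRASES)


def toy_is_consistent(ciphertext: str, candidate: str) -> bool:
    # A Playfair matrix never encrypts a letter to itself.
    return all(c != p for c, p in zip(ciphertext, candidate))


CIPHERTEXT = "DBAQIHKNRWVBKWNADQWRAMOQIY"
solutions = list(decipher(CIPHERTEXT, "", toy_is_meaningful, toy_is_consistent))
print(solutions)
```

The second toy phrase is rejected at its first letter, because the ciphertext also starts with D; only the first phrase reaches full length.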

⁸ Klaus Schmeh’s blog: Konstantin Hamidullin solves 26-letter Playfair challenge and sets new world record.
https://scienceblogs.de/klausis-krypto-kolumne/2019/12/21/konstantin-hamidullin-solves-26-letter-playfair-challenge-and-sets-new-world-record

This algorithm produces a list of plaintext candidates that are both con-
sistent with a Playfair matrix and have a certain degree of meaningfulness.
This raises the question of how “meaningfulness” can be measured with the
IsMeaningful function. The method Hamidullin applied was to store the
plaintext candidate as an array of words, and then assign each word a pen-
alty value depending on the context. The context was defined as a sequence
of previous words (but no more than five, as a word hexagraph database
was used). In this algorithm, the phrase was considered meaningful if the
sum of all penalties didn’t exceed a certain value. This procedure is effectively
a variant of a scoring function, as known from hill climbing and simulated
annealing. The penalty for using a word was equal to log(m/h),
where h is the frequency of the word in a given context and m is the fre-
quency of the most frequent word in the given context. For example, in the
database that was built, the most frequent word after the phrase “WAIT
FOR” was “THE”, so the penalty for using “THE” would be 0; the alternate
word “FURTHER” was 56 times less frequent than “THE” in this context
and hence would score log(56). The penalty for rare words (and even for
non-words) can also be defined, thus allowing almost any word combinations
(for example “WAIT FOR X”). However, the total score for randomly
constructed phrases would quickly exceed the allowed limit and the phrase
would be discarded.
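
The penalty log(m/h) and the threshold test can be illustrated as follows. This is a sketch under stated assumptions: the counts in FOLLOWERS are invented so that “THE” is 56 times more frequent than “FURTHER” after “WAIT FOR”, and the names UNKNOWN_PENALTY, penalty, and is_meaningful, as well as the penalty value for unseen words, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Context = Tuple[str, ...]

# Hypothetical counts of the words seen after a given context.
FOLLOWERS: Dict[Context, Dict[str, int]] = {
    ("WAIT", "FOR"): {"THE": 5600, "A": 2800, "FURTHER": 100},
}

UNKNOWN_PENALTY = 12.0  # assumed fixed penalty for rare words and non-words


def penalty(context: Context, word: str) -> float:
    """Return log(m/h): m = count of the top word, h = count of this word."""
    counts = FOLLOWERS.get(context, {})
    h = counts.get(word, 0)
    if h == 0:
        return UNKNOWN_PENALTY
    m = max(counts.values())
    return math.log(m / h)


def is_meaningful(steps: List[Tuple[Context, str]], limit: float) -> bool:
    """A phrase passes if its summed word penalties stay within the limit."""
    return sum(penalty(context, word) for context, word in steps) <= limit


print(penalty(("WAIT", "FOR"), "THE"))      # most frequent word: no penalty
print(penalty(("WAIT", "FOR"), "FURTHER"))  # log(56)
```

With a limit of, say, 10, a single unseen word such as “X” already pushes the phrase over the threshold in this toy setting, whereas “FURTHER” does not.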
To test consistency (IsConsistent), the algorithm starts with simple
checks (example: identical ciphertext letter pairs must encode identical
plaintext letter pairs; thus DBAQIH cannot encode THISIS) and then tries
to create a Playfair matrix with an exhaustive search. This was optimized
by rearranging letter quads (a pair of ciphertext letters and its respective
pair of plaintext letters): quads with fewer placement options should be
processed first.
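
The kind of simple check meant here can be sketched as below. The source gives only one example (identical ciphertext pairs must encode identical plaintext pairs); the other rules in this sketch are well-known Playfair properties that we assume would also be useful as cheap filters: no letter encrypts to itself, a plaintext pair cannot consist of a doubled letter, and reversing a plaintext pair reverses its ciphertext pair. The function names are hypothetical.

```python
from typing import Dict, List


def pairs(text: str) -> List[str]:
    """Split a text into letter pairs, ignoring a trailing single letter."""
    usable = len(text) - len(text) % 2
    return [text[i:i + 2] for i in range(0, usable, 2)]


def passes_simple_checks(ciphertext: str, plaintext: str) -> bool:
    """Cheap necessary conditions, tested before any matrix search."""
    forward: Dict[str, str] = {}   # plaintext pair  -> ciphertext pair
    backward: Dict[str, str] = {}  # ciphertext pair -> plaintext pair
    for c, p in zip(pairs(ciphertext), pairs(plaintext)):
        if p[0] == p[1] or p[0] == c[0] or p[1] == c[1]:
            return False
        # Record the pair and its mirror image (AB -> XY implies BA -> YX).
        for cc, pp in ((c, p), (c[::-1], p[::-1])):
            if forward.setdefault(pp, cc) != cc:
                return False
            if backward.setdefault(cc, pp) != pp:
                return False
    return True


print(passes_simple_checks("DBAQIH", "THISIS"))
print(passes_simple_checks("DBAQIHKNRWVBKW", "WAITFORFURTHER"))
```

The first call rejects THISIS, as in the example from the text; the second accepts WAITFORFURTHER, which would then go on to the exhaustive matrix search.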
For example, when checking whether the ciphertext is consistent with
the plaintext:

DBAQIHKNRWVBKW
WAITFORFURTHER

We have 7 quads, taking sequential letter pairs from each: (DB, WA),
(AQ, IT), (IH, FO), (KN, RF), (RW, UR), (VB, TH), and (KW, ER), all of
which are assembled in a key matrix. The first quad (DB, WA) can poten-
tially be placed on an empty matrix in 20 distinct (cryptologically nonequi-
valent) ways. All other ways of placing the quad, say by simply shifting it
left or right, or up and down without changing the distance between the
letters, would lead to an identical result.
All of the other quads would also have 20 distinct ways of being placed
except (RW, UR)—since a letter is shared between them, U, R, and W
must be adjacent according to the Playfair rules. Since there are only two
options to place (RW, UR), it should be positioned first. With W and R
located, quad (KW, ER) becomes the one with the fewest number of place-
ment possibilities, only requiring guesses for the positions K and E, so it
should be placed second. (KN, RF) can be positioned next by the same
logic. Out of the remaining four quads, (IH, FO) is taken next since F is
already positioned, followed by (AQ, IT) since I is positioned. Finally, (VB,
TH) and (DB, WA) close the list since there is only one placement option
per each in this attempt (Figures 5 and 6).
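
The placement counts for an empty matrix (20 for an ordinary quad, 2 for (RW, UR)) can be reproduced by brute force. The sketch below is our own illustration, not Hamidullin's code: it fixes the first plaintext letter in the top-left cell, so that placements differing only by a cyclic shift of rows or columns are counted once, and tries every assignment of the remaining letters to cells. The names encrypt_cells and count_placements are hypothetical.

```python
from itertools import permutations
from typing import Tuple

Cell = Tuple[int, int]
CELLS = [(r, c) for r in range(5) for c in range(5)]


def encrypt_cells(a: Cell, b: Cell) -> Tuple[Cell, Cell]:
    """Cells holding the Playfair encryption of the letters at cells a and b."""
    (ra, ca), (rb, cb) = a, b
    if ra == rb:                                   # same row: take right neighbors
        return (ra, (ca + 1) % 5), (rb, (cb + 1) % 5)
    if ca == cb:                                   # same column: take lower neighbors
        return ((ra + 1) % 5, ca), ((rb + 1) % 5, cb)
    return (ra, cb), (rb, ca)                      # rectangle: swap columns


def count_placements(cipher_pair: str, plain_pair: str) -> int:
    """Count nonequivalent placements of one quad on an empty matrix."""
    letters = list(dict.fromkeys(plain_pair + cipher_pair))  # distinct letters
    count = 0
    for rest in permutations(CELLS[1:], len(letters) - 1):
        pos = dict(zip(letters, ((0, 0),) + rest))  # first letter pinned at (0, 0)
        t1, t2 = encrypt_cells(pos[plain_pair[0]], pos[plain_pair[1]])
        if pos[cipher_pair[0]] == t1 and pos[cipher_pair[1]] == t2:
            count += 1
    return count


QUADS = [("DB", "WA"), ("AQ", "IT"), ("IH", "FO"), ("KN", "RF"),
         ("RW", "UR"), ("VB", "TH"), ("KW", "ER")]
for cipher_pair, plain_pair in sorted(QUADS, key=lambda q: count_placements(*q)):
    print(cipher_pair, plain_pair, count_placements(cipher_pair, plain_pair))
```

Sorting by this count puts (RW, UR) first. The later steps of the ordering depend on the letters already placed, which this sketch does not model.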

Figure 5. Straightforward matrix generation. (a) There are 20 distinct cryptologically nonequiva-
lent ways to place the quad (DB, WA) on an empty matrix, as D and B cannot be adjacent.
One of the possible options is shown. (b) Depending on the position of (DB, WA), there may
be up to 11 ways to place (AQ, IT). One of the possible options is shown.

Figure 6. Optimized matrix generation. (a) There are only two distinct ways to place quad (RW,
UR) on an empty matrix, one of which is shown. (b) Depending on the position of (RW, UR)
there may be up to five ways to place (KW, ER), one of which is shown. (c) No matter how
(RW, UR), (KW, ER), (KN, RF), (IH, FO), (AQ, IT) were positioned, (VB, TH) would have at most
one placement option in this example.

When Hamidullin’s software guessed the plaintexts WAIT, WAITFOR,
and WAITFORFURTHER, both the scoring function and the consistency
checks rendered positive results each time. So the following (correct) solu-
tion came out:

WAIT FOR FURTHER INSTRUCTIONS

Assuming that plaintext, the following key matrix was derived (the asterisks
stand for the remaining unknown letters; their positions are not relevant):

R W E K U
Y D * * S
N A C F I
* B * H O
* T M V Q
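
As a sanity check, the partial matrix can be used to re-encrypt the recovered plaintext with the standard Playfair rules. The following sketch is our own illustration (the names MATRIX, POS, encrypt_pair, and encrypt are hypothetical); the unknown cells are kept as asterisks, which works because none of them is needed for this message.

```python
MATRIX = ["RWEKU",
          "YD**S",
          "NACFI",
          "*B*HO",
          "*TMVQ"]

# Position of every known letter; the '*' placeholders are skipped.
POS = {ch: (r, c)
       for r, row in enumerate(MATRIX)
       for c, ch in enumerate(row)
       if ch != "*"}


def encrypt_pair(a: str, b: str) -> str:
    """Encrypt one plaintext letter pair with the standard Playfair rules."""
    (ra, ca), (rb, cb) = POS[a], POS[b]
    if ra == rb:                                   # same row: take right neighbors
        return MATRIX[ra][(ca + 1) % 5] + MATRIX[rb][(cb + 1) % 5]
    if ca == cb:                                   # same column: take lower neighbors
        return MATRIX[(ra + 1) % 5][ca] + MATRIX[(rb + 1) % 5][cb]
    return MATRIX[ra][cb] + MATRIX[rb][ca]         # rectangle: swap columns


def encrypt(plaintext: str) -> str:
    """Encrypt an even-length plaintext pair by pair."""
    return "".join(encrypt_pair(plaintext[i], plaintext[i + 1])
                   for i in range(0, len(plaintext), 2))


PLAINTEXT = "WAITFORFURTHERINSTRUCTIONS"
CIPHERTEXT = "DBAQIHKNRWVBKWNADQWRAMOQIY"
print(encrypt(PLAINTEXT) == CIPHERTEXT)
```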

Within a few days, Schmeh confirmed Hamidullin’s solution. As of the writing
of this Cryptologia article in February 2021, the 26-letter cryptogram
solved by Hamidullin is the shortest random-matrix Playfair ciphertext ever
publicly deciphered.

4.6. A 24-letter and a 22-letter Playfair challenge


We provide here two other shorter Playfair challenge ciphertexts that
remain to be solved, a 24-letter message created in January 2020, and a
22-letter message created in January 2021.
After the above 26-letter ciphertext had been solved, Schmeh published
the following 24-letter challenge⁹ in January 2020:

VYRSTKSVSDQLARMWTLRZNVUC

A ciphertext having a length of only 24 letters comes close to the unicity
distance of the Playfair cipher, which is equal to 22.69. This is the theoretical
minimum number of ciphertext letters allowing an unambiguous solution.
Ciphertexts shorter than 23 letters allow one to find different keys
generating meaningful decryptions. Deciphering this 24-letter ciphertext
would set a new world record.
A 22-letter ciphertext has a length below the unicity distance of the
Playfair cipher. Therefore, it should be possible to find different meaningful
solutions. Despite that theoretical limit, Schmeh created a challenge of
this length, which has not been published before:

NTKRKDLDGHISZWLICKHIAO

The solutions of the 24-letter and the 22-letter challenges are known
only to Schmeh and Dunin.

⁹ Klaus Schmeh’s blog: Can you solve this Playfair cryptogram and set a new world record?
https://scienceblogs.de/klausis-krypto-kolumne/2020/01/27/can-you-solve-this-playfair-cryptogram-and-set-a-new-world-record-3

5. Conclusion and further research


The cryptanalysis work described in this article took place in the years
2018 and 2019. As a result, the world records for deciphering the shortest
Playfair cryptogram based on a randomized key matrix decreased quickly
from 60 to 26 letters. This article documents the efforts of different
researchers that were involved in these activities.
An obvious next step would be to attack the 24-letter and 22-letter
Playfair challenges given above, in order to further improve the existing
record. It would be particularly interesting to know whether simulated
annealing, the dictionary-based method developed by Hamidullin, or some
other approach is the strongest way to attack the Playfair cipher. It also
seems possible that a combination of these techniques might be most
effective. Some of the authors of this article have analyzed these crypto-
grams, but they have not (yet) been successful in deciphering them. The
race for the next record is still open.

Notes on contributors
Elonka Dunin has co-authored the 2020 book Codebreaking: A Practical Guide together
with Klaus Schmeh. She is considered an authority on classical ciphers and is co-founder
and co-leader of a group of cryptographers who are working hard to crack the final cipher on the
famous Kryptos sculpture at CIA Headquarters. She maintains a list of the World’s most
famous unsolved codes on her elonka.com site, and also published The Mammoth Book of
Secret Codes and Cryptograms. Bestselling author Dan Brown named one of the characters
in his The Da Vinci Code sequel, The Lost Symbol, after her. “Nola Kaye” is an ana-
grammed form of “Elonka.” She is a member of the Board of Directors for the National
Cryptologic Foundation, and is a lifetime member of the International Game Developers
Association. Currently living in the Washington, D.C. area, she works as a management
consultant. As a public speaker, Elonka regularly gives talks on her favorite subjects:
Games, Wikipedia, cryptography, medieval history, Agile development, and geocaching.
Magnus Ekhall has an MSc in Computer Science and Engineering from Linköping
University, Sweden. Apart from cryptanalysis of classical ciphers, Magnus is interested in
the mechanization of cryptanalysis, especially the Turing Bombes used to break the Enigma
cipher. Magnus currently works for Sectra Communications.
Konstantin Hamidullin received his master’s degree in computer science in 2007 from the
University of Latvia. He is currently a programmer from Riga (Latvia) with years of experi-
ence in the gaming industry, his work being mainly related to statistics and optimization.
After learning about the Voynich Manuscript in 2015, he developed an interest in classical
cryptography, including Playfair ciphers.
Nils Kopal is a computer scientist and cryptanalyst working as a postdoc at the
University of Siegen, Germany. He specializes in cryptanalysis of classical ciphers and
distributed cryptanalysis. He is leading the development of the open-source software
CrypTool 2. In the DECRYPT project he is responsible for developing tools for
cryptanalysis of historical and classical ciphers and integrating these in the DECRYPT
pipeline and CT2.
George Lasry is a computer scientist in the high-tech industry in Israel, and a member of
the DECRYPT and CrypTool projects. He obtained his PhD in 2017 with the research
group “Applied Information Security” (AIS) at the University of Kassel. His primary inter-
est in cryptographic research is the application of specialized optimization techniques for
the computerized cryptanalysis of classical ciphers and cipher machines. In 2013, he solved
the Double Transposition (Doppelwürfel) cipher challenge. He also deciphered German
ADFGVX ciphertexts and diplomatic codes from World War I; World War II messages
encrypted using the German Siemens and Halske T52 teleprinter encryption device; a col-
lection of papal ciphers from the 16th, 17th, and 18th centuries; and transposition ciphers
from the Biafran War in the late 1960s.
Klaus Schmeh has co-authored the 2020 book Codebreaking: A Practical Guide
together with Elonka Dunin. He has written 15 other books (mainly in German)
about cryptology, as well as over 200 articles, 25 scientific articles, and 1,300 blog
posts, which probably makes him the most-published cryptology author in the world.
He is also a member of the editorial board of Cryptologia. Klaus’s main fields of
interest are codebreaking and the history of encryption. His blog Cipherbrain.net is
read by crypto enthusiasts all over the world. Klaus is a popular speaker, known for
his entertaining presentation style involving self-drawn cartoons and Lego models. He
has lectured at hundreds of conferences, including the NSA Cryptologic History
Symposium, HistoCrypt, the Charlotte International Cryptologic Symposium, and the
RSA Conference in San Francisco. In his day job, Klaus works for the German crypt-
ology company, cryptovision.

References
Aho, A. V., and M. J. Corasick. 1975. Efficient string matching: An aid to bibliographic
search. Communications of the ACM 18 (6):333–40. doi: 10.1145/360825.360855.
Al-Kazaz, N. R., S. A. Irvine, and W. J. Teahan. 2018. An automatic cryptanalysis of
Playfair ciphers using compression. In Proceedings of the 1st International Conference
on Historical Cryptology HistoCrypt, Vol. 149, 115–24. Linköping University Electronic
Press.
Cowan, M. J. 2008. Breaking short Playfair ciphers with the simulated annealing algorithm.
Cryptologia 32 (1):71–83. doi: 10.1080/01611190701743658.
Bauer, C. P. 2019. Unsolved!: The history and mystery of the world’s greatest ciphers from
ancient Egypt to online secret societies. Princeton, NJ: Princeton University Press.
David, C. 1996. A World War II German army field cipher and how we broke it.
Cryptologia 20 (1):55–76. doi: 10.1080/0161-119691884780.
Deavours, C. A. 1977. Unicity points in cryptanalysis. Cryptologia 1 (1):46–68. doi: 10.1080/
0161-117791832797.
Dunin, E., and K. Schmeh. 2020. Codebreaking: A practical guide. London: Robinson.
Glover, F., and M. Laguna. 1998. Tabu search. In Handbook of combinatorial
optimization, ed. P. M. Pardalos, D.-Z. Du, and R. L. Graham, 2093–2229.
Heidelberg, Germany: Springer.
Kahn, D. 1996. The Codebreakers: The comprehensive history of secret communication from
ancient times to the Internet. New York City, NY: Simon and Schuster.
Kopal, N. 2018. Solving classical ciphers with CrypTool 2. In Proceedings of the 1st
International Conference on Historical Cryptology HistoCrypt, Vol. 149, 29–38.
Linköping University Electronic Press.
Kopal, N., O. Kieselmann, A. Wacker, and B. Esslinger. 2014. CrypTool 2.0. Datenschutz
und Datensicherheit - DuD 38 (10):701–8. doi: 10.1007/s11623-014-0274-7.
Lasry, G. 2018. A methodology for the cryptanalysis of classical ciphers with search metaheur-
istics. Kassel, Germany: Kassel University Press GmbH.
Lasry, G. 2019. Solving a 40-letter Playfair challenge with CrypTool 2. In Proceedings of
the 2nd International Conference on Historical Cryptology, HistoCrypt 2019, June 23-26,
Mons, Belgium, Vol. 158, 87–96. Linköping University Electronic Press.
Monge, A. 1936. Solution of a Playfair cipher. Fort Gordon, GA: US Signal Corps.
Mauborgne, J. O. 1918. An advanced problem in cryptography and its solution. Fort
Leavenworth, KS: Army Service Schools Press.
Schneier, B. 2007. Applied cryptography: Protocols, algorithms, and source code in C.
Hoboken, NJ: John Wiley & Sons.
