Professional Documents
Culture Documents
Kevin Knight
Information Sciences Institute University of Southern California
Sources for this talk: Mary DImperio, The Voynich Manuscript, An Elegant Enigma (1978) Kennedy & Churchill, The Voynich Manuscript (2006) Prescott Currier, Some Important New Statistical Findings (1976) Rene Zandbergen, Currier A and B: Two Different Languages? (1997) Rene Zandbergen, http://www.voynich.nu/ http://www.voynich.ms/forum/ experiments at USC/ISI
MIT / September 2009
Outline
Voynich Manuscript VMS, for short
What is it? Where did it come from? What does it mean?
What is it?
Medieval illustrated manuscript Approx. 235 pages on vellum material Color drawings of plants, nymphs, stars, etc. Approx. 38,000 words written in an unknown script Undeciphered!!! Meaning is unknown Currently owned by Yale University
Many pictures look like grafting. Sunflower? Would date VMS as post-1492.
medicine jar?
??
??
Ethel Boole born in England WV born in Lithuania WV imprisoned, Polish nationalist WV & EB meet, marry in 1902 WV publishes first book list WV acquires VMS in ancient castle WV moves to USA, opens bookshop WV sends photostatic copies of VMS Copying reveals de Tepenecz signature WV writes to Bohemian State Archvs WV presents VMS + Marci letter mentioning Bacon, $160k price Newbold & WV announce decipherment WV dies. VMS placed in vault, $100k VMS appraised at $19,400 Ethel dies, VMS to secretary Ann Nill Castle revealed as Villa Mondragone NY dealer Hans Kraus buys for $24,500 Kraus donates VMS to Yale Brumbaugh finds WV letters in BSA Zandbergen finds 1639 Baresch letter in newly online Kircher archive
Newbold Decipherment
Marci letter Bacon Cabala letter doubling cipher Create 222 = 484 Latin letter pairs AAXX
these letter pairs are the cipher alphabet
Assign each plaintext Latin letter to a set of cipher-alphabet letter pairs (B AQ, RT, ) This gives the encipherer some freedom, while the recipient can still decipher by using the table Cleverly encipher plaintext in such a way as to construct a cover message that looks like Latin, to fool readers
Newbold System
Example:
a n n DO MI NU DOMINU
Too hard to assemble good cover text! So, make cipher letter-pairs overlap:
a n n AD DB BR ADBR
Now can construct a plausible looking cover text in Latin for our secret message (also in Latin) an ingenious system, to be sure!!
Newbold Decipherment
Hmm, by the method, both plaintext and ciphertext should be in Latin letters But the VMS doesnt have Latin letters
4OPCC89
apparent ciphertext
artists rendition
4OPCC89
apparent ciphertext
I M O D
artists rendition
PCC89
apparent ciphertext
DOMI
I M O D
doubling
DO OM MI OM DO MI a o n n n n
non-deterministic mapping from 11 Latin letters to full 22 non-deterministic anagramming
PCC89
apparent ciphertext
DOMI
I M O D
doubling
DO OM MI OM DO MI a o n n n n
non-deterministic mapping from 11 Latin letters to full 22 non-deterministic anagramming
Of course the 222 table isnt given, so we have to build it up through cryptanalysis. Wow, this is a lot of work!
Newbold Decipherment
1300 real ciphertext letters in first 3 lines Decipherment of those first lines: I, Roger Bacon, have written this (in Latin) Anagramming sets of 55 letters is sometimes required. Slow but steady progress Andromeda galaxy, ovaries & ova so Bacon must have had a microscope & telescope, hundreds of years before they were discovered!
The Text
Approx. 38,000 words, unknown script Writing style similar to 15th century Florentine humanist hand Between 23 and 40 distinct characters No corrections, likely to have been copied Writing was done after illustrations
Transcription
BSC8AE OPCC9 4OE FCC89 4OFCC9 4OP9 SCBS9 4OBSC9 EFAM OPAE29 2ZC9 4OFC89 4OFAM Z89 4OFCC9 SC89 4OFCC9 4OFCC9 ESC89 EOP9 8ZC9 4OPCCC9 8ARSC89 4OFC9 4OP9
BSC8AE OPCC9 4OE FCC89 4OFCC9 4OP9 SCBS9 4OBSC9 EFAM OPAE29 2ZC9 4OFC89 4OFAM Z89 4OFCC9 SC89 4OFCC9 4OFCC9 ESC89 EOP9 8ZC9 4OPCCC9 8ARSC89 4OFC9 4OP9 last paragraph, f103r
Introduction to Astrology and Its Use in Weather Prediction, Medicine, and Agriculture, in English. Manuscript on Paper. 1490.
IR IIR IIIR
There are several transcription schemes to choose from.
Z , or separate characters?
S S S S S S
If you didnt know English, how would you know if fi was the same as f i ? Suppose f i never occurred. Would that be evidence? Suppose f i did occur, with the same contexts as fi (e.g., *shing)? Suppose f i did occur, but never in the same context as fi ? Another common motif:
SOORSOE9S9
Letter Frequencies
count letter count letter count letter
25468 20227 17655 14281 12973 11008 10471 10026 6716 5994 5423 4501 4076
O C 9 A 8 S E F R P 4 Z M
O C 9 A 8 S E F R P 4 Z M
2886 1752 1413 1046 950 908 591 524 431 316 217 157 156
2 N B J Q X T * V I W D 3
2 N B J Q X T * V I W D 3
148 96 74 52 31 17 14 2 1 1
U 6 Y K G L H 1 5 0
U 6 Y K G L H 1 5 0
word
8AM OE SC89 AM ZC89 SOE OR AR SC9 8AR 4OFCC9 4OFCC89 ZC9 4OFAN 4OFC89 89 4OFAM AE 8AE 2 SOR 8AM OE SC89 AM ZC89 SOE OR AR SC9 8AR 4OFCC9 4OFCC89 ZC9 4OFAN 4OFC89 89 4OFAM AE 8AE 2 SOR
count word
212 211 191 186 177 174 172 155 155 154 152 151 151 150 147 144 144 144 143 141 140 OFAM 8AN 4OFAE ZOE OFCC9 SCC9 SCOE S9 OPC89 OPAM 4OFAR 9 4OE S89 4OF9 ZCC9 OFAN 2AM OPAE OPAR SX9 OFAM 8AN 4OFAE ZOE OFCC9 SCC9 SCOE S9 OPC89 OPAM 4OFAR 9 4OE S89 4OF9 ZCC9 OFAN 2AM OPAE OPAR SX9
count word
140 138 130 129 119 118 OPCC9 OFAE ZO OFAR ESC89 OFC89 OPCC9 OFAE ZO OFAR ESC89 OFC89
etc
English
Length 1 2 3 4 5 6 7 8 9 10 11 12 13 Distribution 0.03 0.15 0.16 0.15 0.11 0.09 0.11 0.08 0.05 0.03 0.01 0.006 0.002 Counts on word types
Cryptogram
Newbold (1921) Manly (1931) critique of Newbold Feely (1945), abbreviated Latin Strong (1945), polyalphabetic cipher, no details
might fall into hands of enemies of USA!
William Freidman
Most famous American cryptographer of World War II
broke key ciphers, including Japanese Purple code, led proto-NSA
Writing in Tongues
suggested in Kennedy & Churchill, 2005
Writing in Tongues?
Medium Helene Smith, investigated by Theodore Flournoy (1896) Under a trance, Smith was able to converse with Martians She learned their language and could speak and write it Looked like a genuine language Grammar closer to French than you might expect
Smiths Martian
Hoax
Previous hoaxes:
Hitler diaries Vinland map
Voynich Manuscript:
How? Why? Who?
How?
Gordon Rugg (Scientific American, 2004)
Proposed Cardan grille Elizabethan espionage tool If applied with randomness injected, claimed to generate VMS-like text
Why?
KPMG Forensics 2006 Survey of Fraud in Australia and New Zealand Most Popular Motives for Fraud:
greed/lifestyle (54%) gambling (22%) personal financial pressure (5%) other (5%) not specified (3.5%) opportunity (0.4%) substance abuse (0.4%)
BUT: what if Voynich knew that? BUT: same signature in other docs de Tepenecz signature suspiciously found during overexposure
Who?
member of Society of Friends of Russian Freedom Needed $
who doesnt?
tricky Marci letter very convenient faked to add a Roger Bacon connection? spoke 18 languages
said to have traded newer, better books for monks old dirty ones
BUT: Baresch letter later found in Kircher archive also mention Bacon
Experiments
Can computers help us make sense of VMS? Is VMS a kind of letter substitution cipher?
Originally in Latin? English? Ukrainian? Ukrainian written without vowels?
Substitution Cipher
Substitution Cipher
Substitution Cipher
Substitution Cipher
Substitution Cipher
Substitution Cipher
e he e of the fof ingcmpnqsnwf cv fpn owoktvcv e f o e o oe t hu ihgzsnwfv rqcffnw cw owgcnwf ef kowazoanv ...
Substitution Cipher
Substitution Cipher
e he e is the sis ingcmpnqsnwf cv fpn owoktvcv e s i e i ie t hu ihgzsnwfv rqcffnw cw owgcnwf es kowazoanv ...
Cryptodict abacdefb abacdefb abacdefb abacdefc abacdefc abacdefd abacdefd abacdefe abacdefe abacdeff ACADEMIC DEDICATE MEMBRANE ELECTRIC TUTELAGE ANARCHIC EVERYDAY ANALYSES ANALYSIS EYEGLASS
Substitution Cipher
e he e is the sis ingcmpnqsnwf cv fpn owoktvcv e s i e i ie t hu ihgzsnwfv rqcffnw cw owgcnwf es kowazoanv ...
Cryptodict abacdefb abacdefb abacdefb abacdefc abacdefc abacdefd abacdefd abacdefe abacdefe abacdeff ACADEMIC DEDICATE MEMBRANE ELECTRIC TUTELAGE ANARCHIC EVERYDAY ANALYSES ANALYSIS EYEGLASS
Substitution Cipher
decipherment is the analysis ingcmpnqsnwf cv fpn owoktvcv of documents written in ancient hu ihgzsnwfv rqcffnw cw owgcnwf languages ... kowazoanv ...
Generative Models
Spanish letter trigram model
quo_vade_brerte_ Train on Spanish web text. Parameters fixed. Probabilistic model that substitutes VMS letters for Latin letters. Initially uniform. EM Algorithm.
a {all Voynich letters} b {all Voynich letters} c {all Voynich letters} z {all Voynich letters} __
EM method demonstrated on many decipherment tasks in [Knight et al 2006]. Easy experiments in Carmel finite-state package:
% carmel --train-cascade corpus latin.wfsa subst.wfst
Substitution Cipher
Input Best decipherment assuming plaintext is Spanish
If plaintext is assumed to be Latin: quiss squm is onum pom quss hates s qum hatis
cevzren cnegr qry vatravbfb uvqnytb qba dhvwbgr qr yn znapun VAS92 9FAE AR APAM ZOE ZOR9 QOR92 9 FOR ZOE89
Spanish
Romanian
nonsense
Consonantal Writing
Input Best guess of plaintext language Best decipherment
Spanish
more nonsense
Generative Models
Okay, that didnt work Lets devise looser generative models, to mine for patterns.
Generative Models
Trigram model over {a, b, _ }
aa _bab_abaa_ Initially uniform What parameter settings result in highest P(corpus) ? EM algorithm.
Generative Models
Trigram model over {a, b, _ }
aa _bab_abaa_ Initially uniform What parameter settings result in highest P(corpus) ? EM algorithm.
Generative Models
Trigram model over {a, b, _ }
aa _bab_abaa_ Initially uniform What parameter settings result in highest P(corpus) ? EM algorithm.
a
Sample tagging with learned model:
b __
a b _ b b a _ b a b b _ i n _ t h e _ t o w n _ b b a b a _ a _ w h e r e _ i _
i n _ t h e _ t o w n _ w h e r e _ i _ was
Generative Models
Trigram model over {a, b, _ }
aa _bab_abaa_ Initially uniform What parameter settings result in highest P(corpus) ? EM algorithm.
? ?
Generative Models
Trigram model over {a, b, _ }
aa _bab_abaa_ Initially uniform What parameter settings result in highest P(corpus) ? EM algorithm.
a
Sample tagging with learned model:
b __
V A S 9 2 _ 9 F A E _ A R _ A P A M _
b b b b a _ a b b a _ b a _ V A S 9 2 _ 9 F A E _ A R _ b b b a _ b b a _ b b b a _ A P A M _ Z O E _ Z O R 9 _
Generative Models
P(letter | tag)
P(a)
P(tag | letter)
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 B D J KMN P Q VWX L R C F G T H S Y U E O A I
English a
P(a)
Voynich a
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 4 SWY X Q C A F P B I O 8 V * 2 H E G T R K U 6 J D 9 3 NM 5 L
Generative Models
Bigram model over {a, b}
aabababaa What parameter settings result in highest P(corpus) ? EM algorithm.
Generative Models
Bigram model over {a, b}
aabababaa That would be very interesting. Do words with similar contexts have similar spellings?!
Generative Models
Bigram model over {a, b}
aabababaa That would be very interesting. Do words with similar contexts have similar spellings?!
b WAIT, WHAT?
VAS92 9FAE AR APAM ZOE ZOR9 QRC2 9 ...
200 0
400
600
200
400
600
Generative Models
pages
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
200 0
400
600
200
400
600
Generative Models
pages
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
200 0
400
600
200
400
600
Generative Models
pages
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
Generative Models
Voynich words tagged as a
600 400 200 0 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221
pages
Herbal
Astro
Bio
Pharma
Stars
Known since Capt. Curriers analysis (1976): Two languages (in the formal sense). Several handwriting styles, supposedly similar breakdown.
For every pair of pages, how similar are they to each other?
Rene Zandbergen (1997)
same pages
Herbal
Astro
Bio Pharma
Stars
etc
etc
etc
etc
etc
etc
Class-Tag Sequences
Tagging of first VMS page:
f c c b f b c b i f g h c e g j c j d g d f c a b c c c i h h g c b j b c b d f f b b i c e c j i g g j e d b a c c d e i j a i e h i b b a d c i d e f d j j i b c d h e g b c j j b b f a d j b c c j j g h i c e b c j c d f d c a j b c c i g i c b c e b b d b d c j c e j j i j b b c c a c c d e j j c i h c c i a c c c d f b b d i b c c i g e j h d j c c d e a c f i j b b i e h c g d j j j d a f b d b j c c h b g j b j c c c f e b c j c b c c g e j c j b j c c b a c c j j j c c j h b c c c c c c c f j c b b c b c c g c h j j c j c c d c f c c b c c c b b g c h j h i c j j d c f c f d c j c b c g b g b c c b j b b j e j c c i j j j c a c c b d j c j b h c c e i j c c j f c c a d c c c c g c c h c c b c c i b c f b h e c c d j c g j f a c c i c c j c g h c b d b c j c b f b j b j b j c j h j c j c e c c j f j b j c a c c c h c j c b h
b
a 100 50 0 1 6 11 16 21 26 31 36 41 46 b
100 50 0 1 6 11 16 21 26 31 36 41 46
c
200 100 0 1 6 11 16 21 26 31 36 41 46 c 100 50 0
d
d 1 6 11 16 21 26 31 36 41 46
e
100 50 0 1 6 11 16 21 26 31 36 41 46 e 100 50 0
f
f 1 6 11 16 21 26 31 36 41 46
g
100 50 0 1 6 11 16 21 26 31 36 41 46 g 100 50 0
h
h 1 6 11 16 21 26 31 36 41 46
i
100 50 0 1 6 11 16 21 26 31 36 41 46 i 100 50 0
j
j 1 6 11 16 21 26 31 36 41 46
b
a 100 50 0 1 6 11 16 21 26 31 36 41 46 b
10 Classes of words: Voynich-B Tags per page. Bio words vs. Stars words
100 50 0 1 6 11 16 21 26 31 36 41 46
c
200 100 0 1 6 11 16 21 26 31 36 41 46 c 100 50 0
d
d 1 6 11 16 21 26 31 36 41 46
e
100 50 0 1 6 11 16 21 26 31 36 41 46 e 100 50 0
f
f 1 6 11 16 21 26 31 36 41 46
g
100 50 0 1 6 11 16 21 26 31 36 41 46 g 100 50 0
h
h 1 6 11 16 21 26 31 36 41 46
i
100 50 0 1 6 11 16 21 26 31 36 41 46 i 100 50 0
j
j 1 6 11 16 21 26 31 36 41 46
Conclusion
Voynich Manuscript
What it is Where it came from What it means pretty clear less clear totally unclear
thank you