Phylogenic Language Trees
Michele Fretta
Saint Michael’s College
Colchester, Vermont
Phylogenic Trees
Latin is the ancestor, and the “leaves” are the contemporary languages.
[Figure: tree rooted at Latin with leaves French, Italian, Portuguese, Spanish]
We will construct tree graphs which model the
relationships between the words of 12
languages.
Warning! The trees in this presentation are examples only. The data they are based on is
not statistically significant.
[Figure: tree with leaves Spanish, French, Italian, Portuguese, Russian, Polish, German, Dutch, English, Hindi, Nepali, Persian]
Here’s another method, which is often more reliable than Branch & Bound:
            Dutch      English    French     German     Hindi      Italian    Nepalese   Persian    Polish     Portuguese Russian    Spanish
Dutch       x
English     30%        x
French      10%        10%        x
German      70%        10%        1%         x
Hindi       1%         1%         1%         1%         x
Italian     10%        10%        90%        1%         1%         x
Nepalese    1%         10%        1%         1%         60%        1%         x
Persian     1%         11%        1%         1%         22%        1%         33%        x
Polish      20%        1%         20%        10%        1%         20%        1%         1%         x
Portuguese  10%        10%        80%        1%         1%         70%        1%         1%         20%        x
Russian     20%        1%         20%        10%        1%         20%        1%         1%         40%        20%        x
Spanish     10%        10%        99%        1%         1%         90%        1%         1%         30%        70%        20%        x
The Complete Graph
Each edge represents an entry in our cognate-percentage matrix. They are color-coded by
percentage similarity.
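As a quick check on the size of the complete graph, the sketch below lists every unordered pair of the 12 languages. The variable names are ours, chosen only for illustration; the language names follow the matrix.

```python
from itertools import combinations

languages = ["Dutch", "English", "French", "German", "Hindi", "Italian",
             "Nepalese", "Persian", "Polish", "Portuguese", "Russian", "Spanish"]

# One edge per unordered pair of languages, i.e. one per entry
# below the diagonal of the cognate-percentage matrix.
edges = list(combinations(languages, 2))

print(len(edges))  # 66 = 12 * 11 / 2
```

The count matches the 1 + 2 + ... + 11 = 66 percentage entries in the lower triangle of the matrix.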
[Figure: complete graph on the 12 languages; edge colors correspond to 99%, 90%, 80%, 70%, 60%, 40%, 33%, 30%, 22%, 20%, 11%, 10%, 1%]
Implementing Kruskal’s Algorithm
[Figure: graph on the vertices German, Hindi, French, English, Italian, Nepali, Dutch, Persian, Spanish, Russian, Polish, Portuguese]
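A minimal sketch of Kruskal's algorithm on our matrix follows. It rests on an assumption the slides do not state: each edge weight is taken to be 100 minus the cognate percentage, so that the most similar languages are the "closest". The function and variable names are illustrative only.

```python
# Lower triangle of the cognate-percentage matrix, row by row.
LANGS = ["Dutch", "English", "French", "German", "Hindi", "Italian",
         "Nepalese", "Persian", "Polish", "Portuguese", "Russian", "Spanish"]
ROWS = [
    [],
    [30],
    [10, 10],
    [70, 10, 1],
    [1, 1, 1, 1],
    [10, 10, 90, 1, 1],
    [1, 10, 1, 1, 60, 1],
    [1, 11, 1, 1, 22, 1, 33],
    [20, 1, 20, 10, 1, 20, 1, 1],
    [10, 10, 80, 1, 1, 70, 1, 1, 20],
    [20, 1, 20, 10, 1, 20, 1, 1, 40, 20],
    [10, 10, 99, 1, 1, 90, 1, 1, 30, 70, 20],
]

def kruskal(langs, rows):
    """Return the edges (a, b, distance) of one minimum spanning tree."""
    # Assumption: distance = 100 - cognate percentage.
    edges = sorted(
        (100 - pct, langs[j], langs[i])
        for i, row in enumerate(rows)
        for j, pct in enumerate(row)
    )
    parent = {lang: lang for lang in langs}

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:       # the edge joins two components: keep it
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree

mst = kruskal(LANGS, ROWS)
for a, b, dist in mst:
    print(f"{a} - {b}: distance {dist}")
```

Because the matrix contains many tied entries (for example, the 1%, 10%, and 20% values), the minimum spanning tree is not unique. A different tie-breaking order can give a different tree with the same total weight, so the output need not match the tree in the figure edge for edge.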
Our Minimum Spanning Tree
[Figure: minimum spanning tree; surviving vertex labels: Russian, Portuguese, Nepali, Polish, English, Italian, Hindi, German]
Modifying the Tree
Remember the tree with Latin?
Latin is an internal vertex because it is an
ancestor.
Some of the vertices in our minimum spanning
tree are internal vertices, but we want them all
to be leaves.
This is because leaves represent present-day
languages, which are what we are working with
in this case.
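One possible way to make this modification is sketched below. It is an assumption on our part, not necessarily the procedure used for the figures: every language that sits at an internal vertex is replaced by an unnamed placeholder ancestor, and the language itself is re-attached to that placeholder as a leaf. The function name and the "(ancestor of ...)" labels are hypothetical. The example edges are three of the strongest Romance entries in the matrix (French-Spanish 99%, French-Italian 90%, French-Portuguese 80%).

```python
from collections import defaultdict

def push_languages_to_leaves(edges):
    """Turn every internal language vertex into a placeholder plus a leaf."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Hypothetical labels for the placeholder vertices that take over
    # the position of each internal language.
    rename = {v: f"(ancestor of {v})" for v, d in degree.items() if d > 1}
    new_edges = [(rename.get(a, a), rename.get(b, b)) for a, b in edges]
    # Re-attach each displaced language as a leaf under its placeholder.
    new_edges += [(placeholder, v) for v, placeholder in rename.items()]
    return new_edges

romance = [("Spanish", "French"), ("Italian", "French"), ("Portuguese", "French")]
modified = push_languages_to_leaves(romance)
for a, b in modified:
    print(a, "-", b)
```

In this small example French starts as an internal vertex. Afterwards all four languages are leaves hanging from a single placeholder, which has the same shape as the tree with Latin.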
Modifying the Tree
[Figure: the modified tree; surviving vertex labels: Polish, English, Italian, Hindi, German]
Comparing our trees…
They are rooted differently.
[Figure: first tree, with vertices Portuguese, Polish, Spanish, French, Russian, German, Dutch, English, Hindi, Persian, Italian, Nepali]
[Figure: second tree, with vertices Portuguese, Nepali, Russian, Dutch, Persian, Spanish, French, Polish, English, Italian, Hindi, German]
Rooting the Tree
Our two trees would look more similar if they had the
same root.
Historians do this.
Based on historical records, they can rearrange the tree so
that the older languages are closer to the root.
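Re-rooting does not change which vertices are joined, only which one is drawn at the top. The sketch below illustrates this, assuming the tree is stored as a list of undirected edges. The function name is ours, and the two edges are taken from the matrix (Dutch-German 70%, Dutch-English 30%).

```python
from collections import defaultdict, deque

def root_tree(edges, root):
    """Return a child -> parent map for the tree rooted at `root`."""
    neighbors = defaultdict(list)
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    parent = {root: None}
    queue = deque([root])
    while queue:                      # breadth-first walk away from the root
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent

edges = [("German", "Dutch"), ("Dutch", "English")]
rooted_at_german = root_tree(edges, "German")
rooted_at_dutch = root_tree(edges, "Dutch")
print(rooted_at_german)
print(rooted_at_dutch)
```

The same two edges give a chain when rooted at German and a root with two children when rooted at Dutch, which is why two drawings of one tree can look quite different.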
Class Problem
There is a creole in India which is based on Portuguese.
If this language began as a synthesis of Portuguese and Hindi,
how would you place it in this tree?
Hint: There might be a problem with this ...
[Figure: tree with vertices Portuguese, Nepali, Russian, Dutch, Persian, Spanish, French, Polish, Italian, English, Hindi, German]
Supplements…
Applications
Parallels with modeling biological evolution
Mapping the migrations of human populations
Modeling the genetic similarities among human populations
Time divergences: using a decomposition “clock” to estimate the number of years which have separated languages.
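The slides do not say how such a clock is computed. As one illustration, the sketch below assumes the classical glottochronology formula t = ln(c) / (2 ln(r)), where c is the fraction of shared cognates and r is the fraction of core vocabulary a language retains per thousand years. The default r = 0.86 is a conventionally quoted value and is an assumption here, as is the function name.

```python
import math

def divergence_millennia(shared_fraction, retention=0.86):
    """Estimated separation time, in thousands of years.

    Assumes both languages lose core vocabulary independently at the
    same constant rate, so the shared fraction decays as retention**(2*t).
    """
    return math.log(shared_fraction) / (2 * math.log(retention))

# Sanity check: after exactly one millennium the expected shared
# fraction is retention squared, so the estimate should come back as 1.
print(divergence_millennia(0.86 ** 2))
```

Given the warning that our percentages are not statistically significant, a function like this should not be applied to the matrix in this presentation to produce real dates.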
Biology vs. Linguistics:
Inheritance and transference
In biological trees, genes are compared to find these relationships.
In language trees, words are compared.
[Figure: two trees, each descending from a common ancestor. Language tree: French, English, German, labeled “Word transfer”. Biological tree: Bacteria, Eukarya, Archaea, labeled “Gene transfer”.]