Author: Yanovich
Topic: Historical linguistics


PHYLOGENETIC LINGUISTIC EVIDENCE AND THE DENE-YENISEIAN HOMELAND

IGOR YANOVICH

Abstract. [Sicoli and Holton, 2014] (PLOS ONE 9:3, e91722) use computational phylogenetics to

argue that linguistic data from the putative, but likely, Dene-Yeniseian macro-family are more

compatible with a homeland in Beringia (i.e. northeastern Siberia plus northwestern Alaska) than

with one in central Siberia or deeper Asia. I show that a more careful examination invalidates that

conclusion: in fact, linguistic data do not support Beringia as the homeland. In the course of showing

that, I discuss, without requiring a deep mathematical background, a number of methodological

issues concerning computational phylogenetic analyses of linguistic data and drawing inferences

from them. I suggest current best practices for such issues; following them would have helped to avoid

some of the problems in the Dene-Yeniseian case.

Dene-Yeniseian is a proposed macro-family linking the Yeniseian languages of central Siberia and the Na-Dene languages of North America [Vajda, 2011], [Vajda, 2013]. The macro-family is still only putative, as many open questions remain (see [Campbell, 2011], [Starostin, 2012],

as well as the reply in [Vajda, 2012]). However, the family does appear to be quite likely, and has

been widely accepted as such (for instance, [Kiparsky, 2015]).1

[Sicoli and Holton, 2014] apply computational phylogenetic methods to typological data from

Dene-Yeniseian languages in order to address the question of where their homeland was. The test

that they apply is very simple: Sicoli and Holton examine the shape of the obtained family trees or

networks, and determine whether they support a basal split into separate Yeniseian and Na-Dene

branches. They find that their analysis does not support such a split, which effectively amounts

to saying that there was no Proto-Na-Dene stage that is ancestral to all Na-Dene languages but

excludes Yeniseian. In other words, Sicoli and Holton's analysis says that the basal split was

not between Yeniseian and Na-Dene, but between some Na-Dene I and [Na-Dene II + Yeniseian].

From this inferred history of splitting, Sicoli and Holton conclude that the homeland of the Dene-

Yeniseian family must have been in Beringia rather than in Siberia, or generally Asia excluding

Beringia.

Importantly, the conclusions of [Sicoli and Holton, 2014] have been accepted as valid by specialists outside linguistics. The so-called "Beringian standstill" hypothesis is an important issue in current studies of the peopling of the Americas that use genetic data. That hypothesis argues that the Americas were rapidly colonized by a single (though likely structured) human group, which had been isolated from other human populations for several thousand years even before entering the two new continents. Such a scenario appears likely given current results from genetics. A reasonable place where that isolation period could have happened would be Beringia: Northeastern Siberia, Western Alaska and the Bering land bridge, which is currently under water. [Watson, 2017] provides a popular review of the history and the evidence for the Beringian standstill hypothesis. That review cites [Sicoli and Holton, 2014] as implying that humans occupied Beringia during the Last Glacial Maximum (LGM), the period of maximum glacier extent at around 26-20 thousand years ago [Clark et al., 2009]. [Watson, 2017] cites a personal communication by Gary Holton, one of the authors of [Sicoli and Holton, 2014], saying that their study supports at least a period of occupation and diversification within the Beringian area, and probably somewhere within the southwestern Alaskan area. Similarly, [Hoffecker et al., 2016], a careful review examining multiple lines of evidence for the Beringian standstill, also relies on [Sicoli and Holton, 2014] for linguistics, stating that a recent analysis of the Na-Dene and Yeniseian languages indicates a back-migration from Beringia into Siberia and central Asia rather than the reverse. [Hoffecker et al., 2016] conclude their paper by saying that "many questions remain unanswered regarding the complicated movements of people and/or genes into and out of Beringia after the LGM. Some of the answers have been documented with archeological, linguistic, and genetic data, but others are problematic or disputed," where "linguistic data" refers primarily to [Sicoli and Holton, 2014]'s work. In other words, Sicoli and Holton's conclusion that linguistics firmly supports human occupation of Beringia has become very popular outside linguistics.

Date: Nov 16, 2017.

1 This paper has greatly benefitted from discussions with and comments by Chris Bentz, Johannes Dellert, Gerhard Jäger, Taraka Rama, and Johannes Wahle, from the help in locating some of the relevant literature by Alexei Kassian and Elena Krjukova, and from presentations at the EVOLAEMP project group http://www.evolaemp.uni-tuebingen.de/ and the DFG Center for Advanced Study "Words, Bones, Genes, Tools". Research reported here was supported by DFG under project FOR 2237, establishing the said Center for Advanced Study, which is hereby gratefully acknowledged.

While aiming to contribute linguistic evidence to the Beringian debate is commendable, there are, unfortunately, several problems with the argument of [Sicoli and Holton, 2014]. First, Sicoli and Holton's assessment that the shape of the Dene-Yeniseian language-family tree bears on the Beringian question is overly optimistic, as I discuss below in Section 1. Second, the tree structure that Sicoli and Holton obtained in their Bayesian phylogenetic analysis is not robust to the choice of tree priors: with a different tree prior than the one used by Sicoli and Holton, we obtain strong evidence for the traditional phylogeny of the macro-family, where the basal split is into the Yeniseian and the Na-Dene group. This means that the resulting shape of the tree crucially depends on a technical choice. It also means, importantly, that the linguistic data in Sicoli and Holton's dataset are insufficient to infer the true tree of the family: with large amounts of data, the linguistic information should in principle override the preferences induced by the tree prior. I discuss the general logic behind Bayesian MCMC inference of linguistic phylogenies (the computational method used by [Sicoli and Holton, 2014]) as well as the specific problem with tree priors in Section 2. Finally, even though computational methods are of no help in this case for deciding the general shape of the tree, there is plenty of historical-linguistic information that supports the traditional phylogeny against Sicoli and Holton's novel proposal (Section 3). In particular, the Yeniseian and Na-Dene language families are sufficiently different that historical linguists still express caution regarding whether they can be considered a macro-family, see e.g. [Campbell, 2011]. The data firmly rule out that a subclade of the Athabaskan languages within Na-Dene could be more closely related to Yeniseian than to the rest of Athabaskan, that is, the phylogeny that Sicoli and Holton defend.

The Dene-Yeniseian analysis by [Sicoli and Holton, 2014], despite being an innovative take on

the issue, thus suffers from three serious problems: (i) different homeland and migration hypotheses

as such are compatible with both types of linguistic phylogenies, so the latter cannot help us decide

between the former (Section 1 below); (ii) the family-tree structure inferred by Sicoli and Holton

is not robust to the choice of the tree prior: under a different reasonable prior, the alternative

tree structure is inferred, which the authors thought they could reject with certainty (Section 2);

(iii) while phylogenetic methods are of little help for deciding between the tree structures, the overall

linguistic evidence strongly points to the tree which Sicoli and Holton rejected (Section 3). Each of

these three problems alone would have made [Sicoli and Holton, 2014]'s Beringia inference invalid.

Section 4 concludes that linguistic evidence currently does not support either side in the Beringia

debate, and briefly summarizes methodological suggestions regarding obtaining and interpreting


computational phylogenetic inferences from linguistic data that would have helped to avoid the

pitfalls that the pioneering study of [Sicoli and Holton, 2014] fell victim to.

It should be particularly stressed that the technical problems [Sicoli and Holton, 2014] ran into

were likely due to treating the employed phylogenetic methodology more or less as a black box.

The authors provided complete logs of their analyses, which is precisely what allowed me to identify

some of their technical mistakes. They were completely transparent about what they did. There

is thus absolutely no question that [Sicoli and Holton, 2014] worked in good faith. I hope that an extensive but informal discussion of these not-very-transparent technical issues will help make computational phylogenetics less of a black box for our field of historical linguistics.

1. [Sicoli and Holton, 2014]'s argument for the Beringia homeland of Dene-Yeniseian

[Sicoli and Holton, 2014] use computationally inferred phylogenetic trees based on linguistic ev-

idence in an argument which, according to them, shows that the spread of the Dene-Yeniseian

macro-family proceeded from Beringia. Their linguistic data are 116 binary typological features,

described briefly in their Supplementary Materials 1. Conventionally, the presence of a feature is

coded as 1, its absence as 0, but the phylogenetic-inference model that the authors use does not

make an internal distinction between presences and absences, being only sensitive to match and

mismatch between the same feature in different languages.
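Insensitivity to the direction of coding can be made concrete with a small sketch. It assumes a symmetric two-state continuous-time Markov model with equal root frequencies; this model is chosen for illustration, and the text does not specify which binary model Sicoli and Holton configured. The sketch computes the likelihood of one feature on a hypothetical two-language tree with made-up branch lengths. Swapping every 0 for a 1 leaves the likelihood unchanged. Only agreement versus disagreement between the languages matters.

```python
import math

def transition_probs(rate, t):
    """Symmetric two-state model: (P(stay), P(switch)) over a branch of length t."""
    p_switch = 0.5 - 0.5 * math.exp(-2.0 * rate * t)
    return 1.0 - p_switch, p_switch

def two_leaf_likelihood(x, y, rate=1.0, t1=0.3, t2=0.5):
    """Likelihood of states x, y (0 or 1) at two leaves hanging from one root.

    Root frequencies are assumed equal (0.5, 0.5); t1 and t2 are hypothetical
    branch lengths chosen only for illustration.
    """
    stay1, switch1 = transition_probs(rate, t1)
    stay2, switch2 = transition_probs(rate, t2)
    total = 0.0
    for root in (0, 1):
        p1 = stay1 if root == x else switch1
        p2 = stay2 if root == y else switch2
        total += 0.5 * p1 * p2
    return total

# Recoding presences as absences (and vice versa) does not change the likelihood.
print(math.isclose(two_leaf_likelihood(0, 1), two_leaf_likelihood(1, 0)))  # True
print(math.isclose(two_leaf_likelihood(0, 0), two_leaf_likelihood(1, 1)))  # True
```

A match (0, 0) or (1, 1) is more likely than a mismatch under this model. That difference, and not the labels themselves, is what carries the phylogenetic signal.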

Sicoli and Holton's features (which can also be synonymously called "characters" or "sites" in

the context of phylogenetic inference) are highly correlated with each other. For example, features

1-18 concern the shape of the vowel system of the respective languages. Features 1, 4, 8, 12, and

16 are mutually exclusive, as they count how many vowels the system has overall: three (feature

1), four (feature 4), and so on. The 13 other features in the group 1-18 describe the more exact

shape of the vowel system, and are conditional on the number of vowels in it. Thus for the 3-vowel

systems, there are two binary features, "1-1-1" (feature 2) and "2-1" (feature 3). Obviously, these

can have value 1 only if feature 1 (=having exactly 3 vowels) is 1. Similarly, only one of those

two features can be 1 at the same time. Finally, if the system has three vowels overall, this means

that all features corresponding to systems with a different number of vowels, that is features 4-18,

must be 0. Summing up, features 1 and 2 and features 1 and 3 are positively correlated, while

features 1-3 and 4-18 are all pairwise negatively correlated. These 18 features are arguably the

most correlated subset in Sicoli and Holton's dataset, but similar problems occur on a smaller scale in the rest of the data as well. This has two consequences. First, the evolutionary model Sicoli and Holton use for their phylogenetic inference assumes feature independence, which does not hold here.2 Second, even if such statistical dependence does not bias the results, it still means that the effective amount of data in the dataset is smaller than 116 binary characters.3
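Taking the vowel block alone gives a rough idea of how much the dependence shrinks the data. The sketch rests on two assumptions. The first is a feature layout: only "1 -> features 2, 3" is stated in the text, and the remaining groupings are inferred from the numbering. The second is that every language has exactly one vowel-count feature and exactly one matching shape feature set to 1. Under these assumptions, 18 binary features admit only 13 joint codings. The block therefore carries at most log2(13), about 3.7 bits per language, rather than up to 18 bits for 18 independent binary characters.

```python
import math

# Hypothetical layout of features 1-18: each vowel-count feature maps to the shape
# features conditional on it (only 1 -> [2, 3] is stated explicitly in the text).
VOWEL_BLOCK = {1: [2, 3], 4: [5, 6, 7], 8: [9, 10, 11], 12: [13, 14, 15], 16: [17, 18]}
N_FEATURES = 18

def admissible_codings(block=VOWEL_BLOCK, n=N_FEATURES):
    """All 0/1 vectors with one count feature and one shape feature (of that count) on."""
    codings = []
    for count_feature, shape_features in block.items():
        for shape_feature in shape_features:
            vec = [0] * n
            vec[count_feature - 1] = 1  # exactly one vowel-count feature is 1
            vec[shape_feature - 1] = 1  # exactly one compatible shape feature is 1
            codings.append(tuple(vec))
    return codings

codings = admissible_codings()
print(len(codings))                       # 13 joint codings instead of 2**18
print(round(math.log2(len(codings)), 2))  # 3.7 bits per language instead of up to 18
```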

2 The same problem affects other linguistic phylogenetic studies, and I am not aware of a full-scale quantification of how serious the problem might be. The problem might be hard to notice for non-biologists because the assumption of character independence is so fundamental to phylogenetic inference that it rarely gets spelled out explicitly: for example, it is not

3 Sicoli and Holton also note that 26 out of their 116 binary characters feature the same value for all languages,

and say that they are therefore uninformative for phylogenetic inference. The latter statement is not completely true.

What is true is that a uniform feature does not give us any information about which languages belong to the same

clade within the family: only shared innovations and retentions that affect a part of the family are useful for that.

But uniform features still contribute information about the rates of change, and could also affect inference of likely

feature states in the proto-language. Through that, they can even affect tree topology, albeit not as directly as non-

uniform characters. In the main text, I only report analyses including all the features, unlike Sicoli and Holton who

excluded uniform ones. (I discuss the technical aspects of the issue a bit further in Section 2.) I checked whether this

difference would affect tree topologies by running one analysis in two variants. As there was no significant difference

in tree topologies, I believe that in the Dene-Yeniseian case, this choice is not particularly consequential. With other

datasets, however, it can be, as I discussed elsewhere [Author, 2017].


This is important, because 116 binary characters is not very much data to start with as far as

computational phylogenetics goes. As the effective number was even smaller, it should therefore

come as no surprise that Sicoli and Holton's phylogenetic results depend heavily on the choice of

prior distributions, as we will see in the next section. With large amounts of data, the signal in

the data can often overwhelm the biases of the prior, but the less data we have, the more influence

our prior assumptions will have on the final result.
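The simplest Bayesian model already shows that data outweigh the prior only when there is enough of them. The sketch below is not a phylogenetic model. It is the textbook conjugate case of a Beta prior on a single probability updated with binomial data. It compares a flat Beta(1, 1) prior with an opinionated Beta(20, 2) prior, both chosen purely for illustration, and applies them to data showing a hypothetical rate of 0.3.

```python
def posterior_mean(successes, n, a, b):
    """Posterior mean of a Beta(a, b) prior updated with `successes` out of `n` trials."""
    return (a + successes) / (a + b + n)

FLAT = (1, 1)          # no preference about the probability
OPINIONATED = (20, 2)  # strong prior belief that the probability is high

def prior_gap(n, observed_rate=0.3):
    """Distance between the two posterior means after n observations."""
    successes = round(n * observed_rate)
    flat = posterior_mean(successes, n, *FLAT)
    opinionated = posterior_mean(successes, n, *OPINIONATED)
    return abs(flat - opinionated)

for n in (10, 100, 1000):
    print(n, round(prior_gap(n), 3))
```

With 10 observations, the two posteriors differ by about 0.39. With 1000 observations, they differ by about 0.01.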

Sicoli and Holton's specific goal in their phylogenetic analysis is to compare two hypotheses about the shape of the Dene-Yeniseian tree. (Here, "hypothesis" is meant in the statistical sense, namely, as a theoretical possibility that we can study statistically; this sense is different from the general scientific sense of "hypothesis" as a possibility that is formulated to explain the present evidence.)

One hypothesis says that the Dene-Yeniseian tree will have the shape [[Yeniseian], [Na-Dene]], with

the first split separating the two traditionally postulated language families. The other hypothesis

says that the tree does not have that shape, and instead that some Na-Dene languages branch out

before the Yeniseian languages branch out from the stem of the tree. In other words, the second

hypothesis says that the tree has the shape [[Na-Dene I], [Na-Dene II, Yeniseian]]. This hypothesis

explicitly contradicts the traditional linguistic classification of the relevant languages, an issue

we will discuss below in Section 3.
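The two statistical hypotheses can be stated as a property of a rooted tree. The question is whether one of the two subtrees directly below the root contains exactly the Yeniseian languages. The sketch uses nested tuples as a hypothetical tree representation and placeholder leaf labels, not the languages of Sicoli and Holton's dataset. Given a sample of trees from a Bayesian analysis, the posterior support for the traditional hypothesis can be estimated as the fraction of sampled trees that show this basal split.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees

YENISEIAN = frozenset({"Yen_a", "Yen_b"})  # placeholder labels

def leaves(tree: Tree) -> frozenset:
    """Collect all leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset({tree})
    return frozenset().union(*(leaves(child) for child in tree))

def has_basal_yeniseian_split(tree: Tree, yeniseian=YENISEIAN) -> bool:
    """True for the shape [[Yeniseian], [Na-Dene]]: a root child holds exactly the Yeniseian leaves."""
    return any(leaves(child) == yeniseian for child in tree)

def posterior_support(trees) -> float:
    """Fraction of sampled trees showing the basal Yeniseian vs. Na-Dene split."""
    trees = list(trees)
    return sum(has_basal_yeniseian_split(t) for t in trees) / len(trees)

traditional = (("Yen_a", "Yen_b"), (("ND_a", "ND_b"), "ND_c"))
alternative = (("ND_a", "ND_b"), (("Yen_a", "Yen_b"), "ND_c"))
print(has_basal_yeniseian_split(traditional))  # True
print(has_basal_yeniseian_split(alternative))  # False
print(posterior_support([traditional, alternative, alternative, alternative]))  # 0.25
```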

Sicoli and Holton argue that the two different topologies correspond to different migration sce-

narios. As this is a crucial step in their argument, it merits a full citation:

We expect the two different migration hypotheses to exhibit different tree topologies. The "out of central/western Asia" hypothesis assumes that the Yeniseian languages (and potentially their extinct relatives) branched off of the Dene-Yeniseian family with Na-Dene subsequently diversifying. The tree topology for this hypothesis would place the Yeniseian languages outside of Na-Dene: [Yeniseian[Na-Dene]. The "radiation out of Beringia" hypothesis does not assume that Yeniseian necessarily branched first.

[Sicoli and Holton, 2014, p. 4]

What Sicoli and Holton assume in this passage can be usefully illustrated with Figure 1. If the

homeland of Dene-Yeniseian was in central or western Asia, they assume that the only possible

scenario following that would be (SH-I) a single Na-Dene migration into Beringia and then further

into more southerly North America. If this were indeed the only scenario compatible with a deep

Asian homeland, then Sicoli and Holton would have been right to equate the Asian homeland with

the tree topology [[Yeniseian], [Na-Dene]]. However, this is not so. There is no a priori reason why,

for example, the following, completely hypothetical scenario would be ruled out: (A1) Na-Dene

I splits from Proto-Dene-Yeniseian and occupies some territory in central Siberia; (A2) Yeniseian

and Na-Dene II split from each other, all staying in central Siberia; (A3) Na-Dene I and Na-Dene II

migrate to Beringia as a part of a larger migration of diverse peoples, including the speakers of all

non-Na-Dene and non-Eskimo-Inuit American languages. There are more hypothetical scenarios

that can be formulated. Let's spell out one more option: (B1) Na-Dene I splits from Proto-Dene-Yeniseian and moves to Beringia (for instance, to Western Alaska); (B2) Yeniseian and Na-Dene

II split; (B3) Na-Dene II moves into North America, and Na-Dene I moves around within North

America. Of course, some of such scenarios would be more far-fetched than others. However, all

of them represent logical possibilities. [Sicoli and Holton, 2014] do not discuss why exactly they

think all possibilities but one should be ruled out: the extract above is all that they say about the

issue.

Turning to the hypothesis of a Beringian homeland, Sicoli and Holton remain more cautious. They only say that a Beringian homeland is compatible with a tree topology other than [[Yeniseian], [Na-Dene]]. Of course, it is also compatible with [[Yeniseian], [Na-Dene]]: it can be that from a homeland in Beringia, the Yeniseian branch moves out first (presumably to the west), while the Na-Dene branch continues to develop in the homeland before moving out. One alternative scenario, which Sicoli and Holton end up arguing for, is (SH-II): there were several Na-Dene migrations out of the Beringian homeland and therefore several linguistic splits resulting in the modern Na-Dene groups, but the split of the Yeniseian languages occurred after some of those Na-Dene splits and before others. It is worth noting that Sicoli and Holton's favorite scenario has roughly the same level of a priori far-fetchedness as the hypothetical Asian scenarios sketched above. It requires there to be no Proto-Na-Dene stage excluding Yeniseian, just as (A) and (B) above do. It requires several separate migrations by Na-Dene groups out of their homeland towards their positions in North America, just as (A) and (B) do. Such multiple migrations into the American interior might seem somewhat suspect to a geneticist or an archaeologist, given the astonishing levels of genetic and technological similarity among most American populations (excluding Eskimo-Inuit).4 However, for a linguist, the presence of different migration streams would not be surprising, as the linguistic diversity of the Americas in terms of the number of apparently unrelated language families is equally astonishing [Nichols, 2008]. If we cannot see genetic or archaeological signs of the likely linguistic diversity even for the American language stocks as a whole, it is no surprise that we do not see clear evidence for several separate Na-Dene migrations either: after all, the former are much more diverse linguistically than the latter. However, an important point is that a Beringian homeland is also compatible with a basal split into Na-Dene and Yeniseian: for instance, we can form such a scenario by simply changing the order of out-of-Beringia migrations in (SH-II) in Figure 1.

[Figure 1: schematic maps of the migration scenarios (SH-I), (SH-II), (A) and (B); arrows labelled ND I, ND II and Yen, numbered (1) to (3).]

Figure 1. […] scenarios. Representation is schematic: specific migration targets within a region are not intended to convey information. The numbering on arrows indicates the intended temporal ordering. (SH-I) and (SH-II) are suggested by [Sicoli and Holton, 2014]. (A) and (B) are alternative scenarios with a Siberian homeland but the linguistic tree structure [[Na-Dene I], [Na-Dene II, Yeniseian]]. The existence of (A) and (B) shows that the true tree phylogeny does not bear directly on where the Dene-Yeniseian homeland could have been.

4 For the archaeological side, see the review of evidence in [Potter, 2011]. For the genetic side, see a recent review in [Skoglund and Reich, 2016].

Of course, even quite a far-fetched scenario may turn out to be a true one. The very fact that

the Americas feature such a high level of language-family diversity is a good example. If we did

not know about that diversity, we would have found it implausible that any continent could exhibit

such diversity, extrapolating from the rest of the Earth. Yet the American linguistic diversity exists. The

bottom line for the Dene-Yeniseian case is two-fold: (i) one should not rule out even far-fetched

scenarios without sufficient reason; (ii) arguably, logically possible migratory scenarios that were

ruled out without discussion by [Sicoli and Holton, 2014] are not much more far-fetched than the

scenario that they end up arguing for.

Thus if we look at the different logically possible migratory scenarios, we can see that either of the

homelands is compatible with both of the tree topologies considered by [Sicoli and Holton, 2014].

Linguistic phylogenetic trees do not help us decide whether the relevant population splits occurred

in central Asia or in Beringia. In other words, the very premise of Sicoli and Holton's main line of

argumentation is not valid: either topology is compatible with both homelands. There is therefore

no way to use Sicoli and Holton's linguistic framework to either support or refute the Beringia

story.
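The argument can be checked mechanically. The sketch encodes each scenario from Figure 1 as a pair of homeland and basal split, and then lists, for each homeland, the topologies that at least one scenario pairs it with. The labels are shorthand introduced here. "SH-II reordered" stands for the variant of (SH-II) in which the order of the out-of-Beringia migrations is changed, as described above.

```python
# (homeland, basal split) for each scenario discussed in the text;
# "traditional" = [[Yeniseian], [Na-Dene]],
# "alternative" = [[Na-Dene I], [Na-Dene II, Yeniseian]].
SCENARIOS = {
    "SH-I": ("Asia", "traditional"),
    "SH-II": ("Beringia", "alternative"),
    "A": ("Asia", "alternative"),
    "B": ("Asia", "alternative"),
    "SH-II reordered": ("Beringia", "traditional"),
}

def topologies_by_homeland(scenarios):
    """Map each homeland to the set of basal splits some scenario pairs it with."""
    table = {}
    for homeland, topology in scenarios.values():
        table.setdefault(homeland, set()).add(topology)
    return table

table = topologies_by_homeland(SCENARIOS)
for homeland, topologies in sorted(table.items()):
    print(homeland, sorted(topologies))
# Asia ['alternative', 'traditional']
# Beringia ['alternative', 'traditional']
```

Because every homeland co-occurs with both topologies, observing a topology cannot by itself single out a homeland.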

What if there were valid reasons to rule out all alternative migration scenarios for the Asian

homeland hypothesis? Even though Sicoli and Holton do not provide such reasons or acknowledge

the issue, let's assume for the sake of the argument that such reasons could be provided. In this

case, we would have a clear line of attack. Since, by Sicoli and Holton's assumption, the linguistic tree

topology [[Yeniseian], [Na-Dene]] is compatible with both homelands, we would not learn anything

if that topology is supported by the data. If, however, it's the other topology that is supported,

namely [[Na-Dene I], [Na-Dene II, Yeniseian]], then we can infer that the homeland was in Beringia.

That is the argument that [Sicoli and Holton, 2014] put forward. In the next two sections, we will

examine how valid that argument is even if we accept their premise.

The bottom line of this long section is very simple: while [Sicoli and Holton, 2014]'s original analysis did not support a basal split between Yeniseian and Na-Dene, if we change one of the parameters of the computational analysis, namely the tree prior, the results show exactly such a split. Because the two analyses both use a priori reasonable settings but disagree, computational-phylogenetic results cannot by themselves be used either to support or to refute Sicoli and Holton's position.

If you are only interested in the general structure of the argument for the Dene-Yeniseian home-

land, that information, also illustrated by the consensus trees in Fig. 2, is already enough, and you

can skip to Section 3. The rest of this section explains, in informal terms, how the computational

analysis works and how one sets up its settings. Using computational phylogenetic software can

be daunting, as the manuals and help pages often presuppose a great deal of knowledge about

the technical details, and are written for biologists and geneticists, not linguists. The goal of this

section is to somewhat demystify the process, and at the same time explain the problems with

[Sicoli and Holton, 2014]'s analysis, so that one could avoid running into similar problems in the

future. For a longer methodological introduction that, unlike the present article, gradually works

its way towards a mathematical presentation, see [Ronquist et al., 2009].

2.1. How Bayesian MCMC works. It is useful to start with a brief and informal discussion

of the statistical method that Sicoli and Holton used to obtain their trees for the Dene-Yeniseian

macro-family. That method is called Bayesian MCMC (abbreviated from Markov Chain Monte

Carlo). Inferring a language-family tree from observed data intuitively involves finding the tree or

trees which are most compatible with the data. To completely describe any given tree from scratch,


we need many parameters: the topology of the tree (a categorical parameter) and the length of

each single branch (many real-valued parameters). Furthermore, to connect the tree to our data,

we need even more parameters. To compute how likely our observations were to be generated if a

certain tree were the true tree, we need an evolutionary model that describes e.g. the rates of change

of our linguistic features. We thus need even more numbers. Inferring the optimal tree(s) and the

evolutionary parameters is a very complex task: the search space of possible trees, even forgetting

the evolutionary parameters, is astronomically large; the parameters are not independent from one

another, making the problem even harder; finally, rather than there being one unique absolutely

best tree, in this type of model there are usually very many different trees each of which explains the

data relatively well. Bayesian MCMC is precisely the kind of method to work with such complex

situations. It is able to search through very hard-to-analyze parameter spaces, and it outputs not

a single tree as its answer, but a sample of trees that are well compatible with the data.
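The size of the space of topologies alone gives a sense of the scale. A standard combinatorial result states that the number of distinct rooted binary tree topologies with n labelled leaves is (2n - 3)!! = 1 x 3 x 5 x ... x (2n - 3). The sketch below computes this number exactly. Branch lengths and model parameters, which are continuous, come on top of it.

```python
def rooted_binary_topologies(n_leaves: int) -> int:
    """Number of rooted, binary, leaf-labelled tree topologies: (2n - 3)!!."""
    if n_leaves < 2:
        return 1
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):  # odd numbers 3, 5, ..., 2n - 3
        count *= k
    return count

for n in (4, 10, 20):
    print(n, rooted_binary_topologies(n))
```

With only 10 languages there are already 34,459,425 rooted topologies. With 20 languages the count exceeds 10^21, which is why exhaustive search is out of the question.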

Here is how Markov Chain Monte Carlo works. The algorithm defines a Markov chain (hence the first "MC" in the name), a mathematical construct that moves through the search space of possible trees according to certain rules while retaining a degree of randomness (hence the second "MC", Monte Carlo, metaphorically referring to the randomness component

through association with casinos.) At each step of the chain, a new tree is picked together with

the evolutionary parameters, essentially as a guess.5 This is our new hypothesis. We compute the

probability that language change would have generated exactly the data that we observed assuming

that our new hypothesis is correct. That probability is called the likelihood of our hypothesis in

statistical parlance. Normally, that probability will be very low even for the best cases, because

there are many ways in which language change can proceed. (That is why that probability is

normally counted on a log scale: it is much harder to work with numbers like 10^(-1000) than with

log10(10^(-1000)), which is just -1000.) We also compute the prior probability of our hypothesis. In

the case of the tree, for example, its prior probability may be equal to the probability that a certain random evolutionary process would generate that tree. That process is said to induce a tree

prior. There are also other priors participating in the overall prior probability: for example, a

prior on the probabilities of different character values at the root of the tree, etc. As tree priors

will turn out to be important for the Dene-Yeniseian case, we will return to them in greater detail

below. For now, it suffices to note that the prior probability of the tree belongs to the prior rather

than the likelihood because it does not depend on our observed linguistic data.
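The reason for working on a log scale shows up directly in ordinary floating-point arithmetic. The sketch multiplies a thousand per-feature probabilities of 0.01, which are made-up numbers. The product underflows to zero, while the sum of their logarithms is perfectly representable. The sketch also shows that adding a log prior to a log likelihood is the log-scale counterpart of multiplying likelihood by prior, using a hypothetical value for the prior.

```python
import math

per_site = [0.01] * 1000  # hypothetical per-feature probabilities

product = 1.0
for p in per_site:
    product *= p
print(product)  # 0.0: the true value, about 10^-2000, is below the smallest positive double

log10_likelihood = sum(math.log10(p) for p in per_site)
print(round(log10_likelihood))  # -2000

log10_prior = -3.0  # hypothetical log10 prior probability of the same hypothesis
log10_unnormalized_posterior = log10_likelihood + log10_prior  # log10(likelihood * prior)
print(round(log10_unnormalized_posterior))  # -2003
```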

In this manner, we will have obtained the likelihood and the prior probabilities for our hypothesis.

It is the product likelihood * prior that is relevant in what follows. (Recall that a product on the

normal scale corresponds to a simple sum on the log scale.) In technical terms, that product is

proportional to the probability of our hypothesis given the data, which is what makes it a very useful

quantity. Even though the absolute value of likelihood * prior will be quite low for any hypothesis,

there will still be an enormous difference between more likely and less likely outcomes of language

change, and we want to see how exactly our new hypothesis fares compared to others. For that, we

compare the product likelihood * prior generated assuming that our hypothesis were true, with the

same quantity computed for our previous hypothesis. The fact that we only compare those two may

seem unintuitive at first: don't we need to compare our hypothesis with all others? The beauty of

MCMC is that even though we only use pairwise comparison, the resulting sample that we obtain in

the end contains the true information about how all possible different hypotheses fare comparatively.

What we do in our MCMC pairwise comparison is use a special rule to decide whether to keep the

old hypothesis or to adopt the new one instead: the higher the product likelihood * prior of our

new hypothesis, the more likely we are to adopt it. Importantly, the adopted hypothesis is not necessarily better than the old one (with goodness here measured by likelihood * prior). The point of the algorithm is crucially not monotonic improvement. This again might seem strange at first, but in fact it is needed to obtain the mathematical guarantee that in the end, we will have a sample from the true posterior distribution of our model, that is, the probability distribution over the space of our hypotheses (trees and evolutionary parameters) conditional on the data that we observed. In other words, MCMC allows us to determine which trees are more likely given our data; intuitively, that's of course exactly what we want. The interesting thing about the MCMC chain is not the final hypothesis that we observe, but the sequence of hypotheses that the chain passes through.

5 For the algorithm to be efficient, that guess has to be somewhat informed, but the details of that are not relevant for our purposes here.
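The pairwise keep-or-adopt rule described above is, in its simplest form, the Metropolis rule. The sketch below is a toy, not a phylogenetic sampler. It runs the rule over three hypothetical hypotheses with made-up values of likelihood * prior, using a symmetric proposal that picks one of the other hypotheses uniformly at random. Each step compares only two hypotheses. Even so, the long-run visit frequencies approach each hypothesis's share of the total, which is its posterior probability.

```python
import math
import random

# Hypothetical unnormalized posteriors (likelihood * prior), stored as logarithms.
LOG_WEIGHTS = {"H1": math.log(6.0), "H2": math.log(3.0), "H3": math.log(1.0)}

def metropolis(log_weights, n_steps, seed=0):
    """Return the fraction of steps the chain spends at each hypothesis."""
    rng = random.Random(seed)
    names = list(log_weights)
    current = names[0]
    counts = dict.fromkeys(names, 0)
    for _ in range(n_steps):
        # Symmetric proposal: any other hypothesis, chosen uniformly.
        proposal = rng.choice([h for h in names if h != current])
        log_ratio = log_weights[proposal] - log_weights[current]
        # Always adopt a better proposal; adopt a worse one with probability equal to the ratio.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        counts[current] += 1
    return {h: c / n_steps for h, c in counts.items()}

print(metropolis(LOG_WEIGHTS, 200_000))  # frequencies should be near 0.6, 0.3 and 0.1
```

Real phylogenetic samplers use more elaborate proposals over trees and parameters. The acceptance logic follows the same pattern, with the ratio adjusted when proposals are not symmetric.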

The Markov chain is defined in such a way that it can run literally forever. In practice, of course,

we are interested in actually getting the results out, so we want to stop it at some point and examine

what we've got. The mathematics of the chain guarantees an astonishing fact: if we run the MCMC

algorithm long enough, we are bound to start at some point sampling from the true posterior. In

other words, the trees and evolutionary parameters that we sequentially adopt as the chain works

will, after a certain point, all be draws from the posterior distribution given the data. However,

this will only happen after a certain moment. In a run-of-the-mill MCMC run, the chain will start

with some random hypothesis. Chances are that that random hypothesis would be pretty bad.

Technically, it will have a low likelihood, which means it explains the observed data very poorly,

and the product likelihood * prior will correspondingly also be very low. However, the setup of our

chain is such that it will start quickly moving towards better and better hypotheses, and at some

point we will usually see that the likelihoods of our hypotheses are not climbing up anymore, but

rather stay at roughly the same level. When we reach that plateau, we are likely to have started

sampling from the true posterior. It is said that the MCMC converged at this point. Because the

plateau is such a distinctive shape, a common way to estimate whether we have reached convergence

and started sampling from the true posterior is to simply examine the plot showing the likelihoods

of our sequentially drawn samples: if we see a plateau in that plot, were likely to be in the right

spot already. The reason people use such eyeballing to detect convergence is that unfortunately,

there is no theoretical way to detect mathematically, with absolute certainty, that we have reached

the posterior. Heuristics such as eyeballing the likelihood plot is all we have. Fortunately, however,

in practice we can use further tricks for determining (though not guaranteeing) likely convergence.

There is currently a broad agreement between MCMC practitioners that as a whole, detecting

convergence is not such a big problem in real-life studies. We will briefly discuss some commonly

used practical diagnostics in the next section, when we describe Dene-Yeniseian analyses.
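The burnin-and-plateau logic can be illustrated with the same kind of toy sampler, here started from a deliberately poor hypothesis. This is a sketch under assumptions: the target is again a standard normal standing in for likelihood * prior, window averages are a crude numeric substitute for eyeballing the plot, and none of the names come from MrBayes. The first window is dominated by the climb towards good hypotheses; after discarding the first 25%, the windows sit on a plateau around -0.5, the average log-target under a standard normal.

```python
import math
import random

def log_target(x: float) -> float:
    # Toy stand-in for log(likelihood * prior): a standard normal, up to a constant.
    return -0.5 * x * x

def run_chain(n_steps, start, step_size=2.0, seed=2):
    """Random-walk Metropolis; records log_target at every step (the 'likelihood trace')."""
    rng = random.Random(seed)
    x, trace = start, []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        delta = log_target(proposal) - log_target(x)
        if delta >= 0 or rng.random() < math.exp(delta):
            x = proposal
        trace.append(log_target(x))
    return trace

def window_means(trace, n_windows):
    """Average of the trace in consecutive windows: a crude numeric version of eyeballing the plot."""
    size = len(trace) // n_windows
    return [sum(trace[i * size:(i + 1) * size]) / size for i in range(n_windows)]

trace = run_chain(n_steps=20000, start=50.0)   # a deliberately bad starting hypothesis
kept = trace[len(trace) // 4:]                 # discard the first 25% as burnin
print("all windows:  ", [round(m, 1) for m in window_means(trace, 10)])
print("after burnin: ", [round(m, 2) for m in window_means(kept, 5)])
```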

Even though we start sampling from the posterior as MCMC progresses, this does not mean that all hypotheses that we sample are equal. Because many parameters in our hypotheses are continuous numbers, it is practically impossible for the chain to propose precisely the same hypothesis twice. In that uninteresting sense, all hypotheses are equal. However, some hypotheses may come from regions in the tree space and the parameter space that are densely populated with good hypotheses, while others may come from regions less likely on the whole. In the end, we are more interested in this density at the level of regions than at the level of individual hypotheses. For example, we may be interested in the question of which tree structure a clade of languages A, B, and C has: there are three logical possibilities. It could be that in the true posterior, shape ((A,B),C) occurs 30% of the time, shape (A,(B,C)) 70% of the time, and shape ((A,C),B) never occurs. If this is the case, then our samples in MCMC should also be roughly 30% ((A,B),C) and roughly 70% (A,(B,C)). (Why only roughly? In the limit of running our MCMC for an infinite amount of time, the proportions would match the true posterior exactly. But because in practice we only run MCMC for a finite time, we obtain a sample from the posterior rather than the full posterior. If you draw 10000 samples from an infinite can of green and red balls with a 30% share of greens, you will get very close to 30% green in your sample, but probably not exactly 30%.) This distribution over the clade's shapes is what is ultimately interesting for us as analysts, and not the fate of individual hypotheses. To study that distribution, we check how many trees in our posterior sample exhibit each of the shapes.
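Counting shapes in a posterior sample is simple bookkeeping. The sketch below assumes the hypothetical 30%/70% posterior from the text and, for simplicity, draws independent samples from it directly instead of running MCMC (real MCMC samples are correlated, as discussed later); the names are illustrative. The resulting proportions come out close to, but not exactly, 0.3 and 0.7, just as with the green and red balls.

```python
import random
from collections import Counter

SHAPES = ["((A,B),C)", "(A,(B,C))", "((A,C),B)"]
POSTERIOR = [0.3, 0.7, 0.0]   # the hypothetical posterior probabilities from the text

rng = random.Random(42)
# Stand-in for 10000 trees sampled by MCMC. In a real analysis, the shape of the
# A, B, C clade would be read off each sampled tree instead.
sample = rng.choices(SHAPES, weights=POSTERIOR, k=10000)

counts = Counter(sample)
frequencies = {shape: counts[shape] / len(sample) for shape in SHAPES}
for shape, freq in frequencies.items():
    print(f"{shape}: {freq:.3f}")
```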

2.2. MCMC inference on Sicoli and Holton's Dene-Yeniseian data: setting up the parameters. We are now ready to turn to the actual analysis of Dene-Yeniseian data. In this section, we explain the settings used to set the analysis up, and then in the next section, we discuss the results. The analysis was performed using MrBayes [Ronquist et al., 2012b], a popular, free and open-source MCMC software package specialized for phylogenetic inference. [Sicoli and Holton, 2014] published both the data file they used and the logs of many of their runs of MrBayes, which allows us to replicate their analyses exactly. (Generally, making the data and software parameters available together with the publication is very good practice, making it easy to replicate and build upon earlier results.)

[Sicoli and Holton, 2014] ran their MCMC chains for 2 million steps, also called generations (no connection to human generations). They discarded the initial 25% as burnin: that is, the initial portion of the chain where it was not likely to have started sampling from the actual posterior. The likelihood plots in their logs show clear plateaux, suggesting that the chain had indeed converged by the end of the burnin period. In my replicas of their analyses, in addition to eyeballing the likelihood plots, I used another common heuristic for detecting convergence provided in MrBayes. Instead of running a single MCMC analysis, in MrBayes one can simultaneously run two or more independent chains. Once they all converge, they should be sampling from one and the same distribution over trees. This suggests a simple and useful diagnostic: we can compare how similar the trees sampled by our multiple independent chains are. MrBayes does that comparison by measuring how much the relative frequency of each potential clade in the tree differs between the independent runs. In the limit, those frequencies should agree closely: in each run, the frequency of a clade should converge to its true frequency in the posterior distribution. I ran all analyses with 2 independent runs, using 2 million steps initially, but then adding more steps until the average standard deviation of clade frequencies between the two runs fell below 0.02 (2%). (This diagnostic is called average standard deviation of split frequencies in MrBayes's output; split here refers to a partition of all languages into a clade and everything outside of that clade.) In most cases, 2 million generations were already enough for that to happen. When two heuristics (clade frequencies and the visual examination of the likelihood plot) both point to likely convergence, it is safer to conclude that we have reached the true posterior.6
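The split-frequency diagnostic can be sketched as follows. This is a simplified version written for illustration, with hypothetical function names and toy data: each sampled tree is represented only by its set of clades, and the standard deviation of each clade's frequency across runs is averaged over clades. MrBayes applies a comparable minimum-frequency cutoff and may differ in other details, so its reported numbers should not be expected to match this code exactly.

```python
from statistics import mean, stdev

def split_frequencies(trees):
    """Share of sampled trees containing each clade; a tree is given as a set of clades,
    each clade a frozenset of language names."""
    counts = {}
    for tree in trees:
        for clade in tree:
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: n / len(trees) for clade, n in counts.items()}

def average_sd_split_frequencies(runs, min_freq=0.10):
    """Average, over clades, of the standard deviation of their frequencies across runs.
    Clades below min_freq in every run are ignored (a simplified filter)."""
    per_run = [split_frequencies(run) for run in runs]
    clades = set().union(*per_run)
    sds = [stdev([f.get(c, 0.0) for f in per_run])
           for c in clades
           if max(f.get(c, 0.0) for f in per_run) >= min_freq]
    return mean(sds) if sds else 0.0

AB, BC = frozenset("AB"), frozenset("BC")
# Two hypothetical independent runs of 100 sampled trees each.
run1 = [{AB}] * 70 + [{BC}] * 30
run2 = [{AB}] * 66 + [{BC}] * 34
value = average_sd_split_frequencies([run1, run2])
print(f"average SD of split frequencies: {value:.4f}")  # 0.0283: above 0.02, so keep sampling
```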

Not all of the samples in the 2 million generations minus the 25% burnin are actually stored and analyzed. Sicoli and Holton stored only every 500th sample, while I stored every 1000th one. The difference between our choices is not significant, but it is very important to thin out one's sample with a value of around that magnitude. For technical reasons, consecutive samples from phylogenetic MCMC are normally highly correlated with each other. In other words, they are not statistically independent. Even though they are drawn from the true posterior, the dependence means that they carry less information about the posterior's structure than independent samples would, and it can distort our estimates in random ways. To reduce this, we thin them out, in the hope that our stored samples are far enough apart in the chain to be largely independent. There is also a way to see, for each parameter of interest, how independent our samples really appear to be, even after thinning out.

This is summarized for each estimated parameter separately in the indicator ESS (for effective sample size), reported by MrBayes in the output of the sump command after an analysis has finished. Sicoli and Holton's 1.5 million post-burnin steps were sampled 3001 times, while each of my runs with the same number of steps was sampled 1501 times. For the inferred parameter of the tree height, the ESS for one of Sicoli and Holton's analyses was over 300, clearly much smaller than the actual number of samples. For my replica, it was about 500 for each run, even though the overall number of my samples was smaller. On the other hand, for another parameter Sicoli and Holton's ESS was much larger than mine. This illustrates that it is not trivial to predict in advance how to get a higher ESS. However, an ESS of 200-300 is generally considered to be sufficient for inference.

6 MrBayes has two different parameters that govern the number of MCMC processes run in a single analysis. One is called nruns, and it represents the number of fully independent runs whose primary purpose is to help us detect convergence, or a lack thereof, via the diagnostic average standard deviation of split frequencies. Another is called nchains. That is a very different parameter. With nchains=1, MrBayes runs standard MCMC. With nchains>1, MrBayes runs an improved version of the algorithm, called Metropolis-coupled MCMC (abbreviated MCMCMC, or MC3; see [Altekar et al., 2004] for the explanation in the context of MrBayes). That advanced algorithm creates, in addition to the true MCMC chain, several separate fake MCMC-like chains that follow looser rules for accepting a new hypothesis. The output of those chains cannot be used directly: taken as a whole, it is just junk. But what we can do with it is check whether a hypothesis from one of the junk chains would be accepted if it were to appear in the main, true chain. Nicely, this swap was shown not to invalidate the good properties of the true chain. Allowing such swaps of hypotheses with the junk chains helps the software to sample from the posterior more efficiently: the junk chains traverse the hypothesis space as scouts, and find some good hypotheses that the main chain would otherwise only find after a long time searching. The non-technical bottom line is that it is good to keep the default value nchains=4 in one's analyses. It is safe to do this, and it should increase the efficiency of the analysis.
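Thinning and ESS can be illustrated on a toy autocorrelated series. The sketch assumes an AR(1) process as a stand-in for MCMC output and uses one simple ESS estimator (summing autocorrelations until the first non-positive one); the function names are hypothetical, and MrBayes and other tools may use different estimators and report somewhat different numbers. In this toy case, thinning makes each stored sample much closer to independent (ESS per stored sample rises sharply), although it does not add information that the full chain lacked.

```python
import random

def ar1_chain(n, phi, seed=0):
    """Toy autocorrelated 'MCMC output': an AR(1) process with stationary variance 1."""
    rng = random.Random(seed)
    noise_sd = (1.0 - phi * phi) ** 0.5
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + noise_sd * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def effective_sample_size(xs, max_lag=1000):
    """ESS = N / (1 + 2 * sum of autocorrelations), summing lags until the first
    non-positive autocorrelation (one simple truncation rule among several)."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    var = sum(v * v for v in d) / n
    acf_sum = 0.0
    for lag in range(1, min(max_lag, n - 1)):
        rho = sum(d[i] * d[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)

chain = ar1_chain(50000, phi=0.9)
thinned = chain[::20]   # store only every 20th sample
ess_all = effective_sample_size(chain)
ess_thinned = effective_sample_size(thinned)
print(f"all samples: {len(chain)} stored, ESS ~ {ess_all:.0f}")
print(f"every 20th:  {len(thinned)} stored, ESS ~ {ess_thinned:.0f}")
```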

When data are loaded into MrBayes, the user needs to specify which coding scheme was used when collecting the data. For example, if we compiled in advance a list of typological features and recorded the values for them for each language, then we have coded our features exhaustively, regardless of what the values actually were. We tell this to MrBayes with coding=all. The default setting for this parameter, however, is coding=variable, which means that only features with non-uniform values were recorded. This would have been appropriate if we did not have a feature list in advance, but consciously included only interesting features on which we knew our languages had different values (and which are therefore more useful for phylogenetic classification). As another example, when lexical cognacy data are used for linguistic phylogenetic analyses (often obtained from Swadesh lists), each cognate class is usually recorded as a separate binary character. In this case, the proper coding scheme is coding=noabsencesites, meaning that we did not record characters that were absent in all languages in the sample. Indeed, if there is a cognate class in our family, but we did not see its representatives in our languages (for instance, it could have existed only in ancient languages for which we had no data), then we have no way of knowing it existed.7
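What the coding setting encodes can be shown on a toy matrix. MrBayes uses this setting to correct the likelihood for site patterns that could never have made it into the dataset; the sketch below, with hypothetical data and function names, only classifies which columns count as possible observations under each of the three schemes discussed, and does not compute that correction.

```python
# Hypothetical toy data: rows are languages, columns are binary characters.
MATRIX = {
    "Lang1": [1, 0, 1, 0, 1],
    "Lang2": [1, 0, 0, 0, 1],
    "Lang3": [1, 0, 1, 1, 0],
}

def character_columns(matrix):
    rows = list(matrix.values())
    return [tuple(row[j] for row in rows) for j in range(len(rows[0]))]

def observable(column, coding):
    """Could this column have ended up in a dataset collected under the given coding scheme?"""
    if coding == "all":             # every feature on the pre-compiled list is recorded
        return True
    if coding == "variable":        # constant features would have been left out
        return len(set(column)) > 1
    if coding == "noabsencesites":  # features absent (0) everywhere would never be seen
        return any(column)
    raise ValueError(f"unknown coding scheme: {coding}")

results = {}
for coding in ("all", "variable", "noabsencesites"):
    results[coding] = [j for j, col in enumerate(character_columns(MATRIX))
                       if observable(col, coding)]
    print(f"{coding}: columns {results[coding]}")
```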

7 In the current version of MrBayes, 3.2.6, the code prohibits setting coding to all or noabsencesites when the data type is standard, which is what is used in Sicoli and Holton's datafile. There are two ways to solve this technical issue. The simpler one is to edit Sicoli and Holton's .nex data file, replacing datatype=STANDARD with datatype=RESTRICTION. However, this has a subtle analytical consequence, described below in this footnote. The other way is to replace line 3679 in the source file model.c of MrBayes with the following stopgap line:

if((modelParams[i].dataType != RESTRICTION) && (modelParams[i].dataType != STANDARD))

and then re-compile the program from source.

Both standard and restriction data types allow binary characters. The reason MrBayes disallowed some coding schemes for standard probably has to do with biologists using the program mostly employing standard characters when recording biological morphology data (e.g., presence of wings, etc.). For such underlying data, the all coding scheme would not make much sense, so it is reasonable to disallow it. However, for our linguistic purposes, the prohibition currently implemented in MrBayes is not meaningful, and it's safe to remove it by using the line above.

If both standard and restriction can be used for binary characters, how do they differ? MrBayes uses different evolutionary models for the two. For standard data, it assumes that change from 0 to 1 has the same probability as change from 1 to 0. For restriction data, it allows non-equal rates of change in the two directions. For Dene-Yeniseian data, allowing the rates of change between 0 and 1 to be unequal makes the Yeniseian clade more distinct from Na-Dene than in Sicoli and Holton's original analysis, but this effect is mild compared to that of changing the tree prior. I report in the main text only analyses using the standard data type. First, this makes the comparison more favorable for Sicoli and Holton's results, which I am arguing against. Second, since 0s and 1s in different characters of Sicoli and Holton's dataset do not actually represent identical states, because they refer to very different linguistic entities, it is far from obvious that unequal rates are any better than equal rates: neither corresponds to reality, wherein most characters in the dataset would presumably each have their own true rates of change between 0 and 1.

Furthermore, we need to tell MrBayes what it should expect in terms of the probability of seeing a 0 or a 1 for each feature at the root node, that is, the most ancient proto-language. Why do we need this? Because once those probabilities at the root are set, it becomes possible to compute the exact probability of having generated our observed data if our current tree and evolutionary models were the true ones. In other words, it becomes possible to compute the likelihood of our MCMC hypothesis. But the process of letting MrBayes know the root frequencies is not trivial.
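To make the role of root frequencies concrete, here is a sketch of the standard pruning computation (Felsenstein's algorithm) for a single binary character on a hypothetical three-language tree. It assumes the symmetric two-state model with branch lengths measured in expected changes per character; the tree and function names are illustrative only. The root frequencies enter only at the last step, weighting the probability of the data given each possible root state, and the likelihoods of all eight possible patterns sum to 1 whatever root frequencies are used. Under the usual assumption that characters evolve independently, the likelihood of a whole dataset is the product of such per-character values.

```python
import math
from itertools import product

def p_transition(i, j, t):
    """Symmetric two-state model scaled so that a branch of length t carries t expected changes."""
    same = 0.5 + 0.5 * math.exp(-2.0 * t)
    return same if i == j else 1.0 - same

def partial(node, states):
    """Felsenstein pruning: [P(data below node | node in state 0), P(... | state 1)]."""
    if isinstance(node, str):  # a leaf: its observed state is known for certain
        return [1.0 if s == states[node] else 0.0 for s in (0, 1)]
    (left, t_left), (right, t_right) = node
    pl, pr = partial(left, states), partial(right, states)
    return [
        sum(p_transition(s, x, t_left) * pl[x] for x in (0, 1))
        * sum(p_transition(s, x, t_right) * pr[x] for x in (0, 1))
        for s in (0, 1)
    ]

def likelihood(tree, states, root_freqs):
    """Probability of one character's observed states given the tree, weighting by root frequencies."""
    return sum(f * p for f, p in zip(root_freqs, partial(tree, states)))

# Hypothetical rooted tree ((A,B),C): each child is paired with its branch length.
TREE = (((("A", 0.1), ("B", 0.1)), 0.2), ("C", 0.3))

observed = {"A": 0, "B": 0, "C": 1}
for root_freqs in ([0.5, 0.5], [0.8, 0.2]):
    total = sum(likelihood(TREE, dict(zip("ABC", pattern)), root_freqs)
                for pattern in product((0, 1), repeat=3))
    print(f"root frequencies {root_freqs}: P(observed) = "
          f"{likelihood(TREE, observed, root_freqs):.4f}, sum over all 8 patterns = {total:.6f}")
```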

When data are coded as standard, MrBayes does not allow us to say that, for example, 0s should be more frequent than 1s at the root. That is because biological standard characters do not have a fixed interpretation for either 0 or 1: those labels are arbitrary, so it does not make sense to assign them specific global probabilities. The same is true for the linguistic typological data in [Sicoli and Holton, 2014]'s dataset: even though 0s code absences and 1s code presences, there is hardly any real-world connection between the probability of having a 3-phoneme vowel system and that of having the plural expressed on pronouns. Instead of asking us for specific probabilities, with standard characters MrBayes draws the frequencies of 0s and 1s at the root (state frequencies) randomly for each feature. Parameter symdirihyperpr determines how far from equal those values are allowed to be. The default value for symdirihyperpr is fixed(infinity) in MrBayes's current version, which forces the probabilities for 0s and 1s to be exactly equal. If we do not want that, we can use a smaller number. With a number greater than 1, the preferred state frequencies are around 50%, but the smaller the number, the farther they are allowed to deviate on average. (Infinity is just the limit of this: with infinite symdirihyperpr, the frequencies are forced to be exactly 50%.) With symdirihyperpr=fixed(1), any combination of state frequencies is equally likely for every character. Finally, with symdirihyperpr smaller than 1, extreme frequency distributions become more likely: e.g., MrBayes will consider a frequency of 80% for 0s (or for 1s, as 0s and 1s are treated symmetrically) a priori more likely than a frequency of 50%. Sicoli and Holton used the default, i.e. equal state frequencies, in their analyses. I do the same in this paper: for one set of settings, I tested how using a more liberal state-frequency prior would affect the results, and found that changes in the inferred tree topology were small. (Sicoli and Holton's original result comes out a bit stronger under the equal state frequency setting, so keeping it makes the comparison more favorable to their argument.)8
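For two states, a symmetric Dirichlet distribution with parameter alpha is simply a Beta(alpha, alpha) distribution. The sketch below assumes that alpha corresponds to the fixed value given to symdirihyperpr and draws root frequencies for state 0 under three values, counting how often they are at least as skewed as 80/20; the function names are hypothetical. Large alpha keeps frequencies near 50%, alpha = 1 makes all frequencies equally likely (so about 40% of draws fall in the skewed tails), and alpha below 1 favors skewed frequencies.

```python
import random

def draw_state_freqs(alpha, n, seed=0):
    """n root frequencies of state 0 drawn from Beta(alpha, alpha), the two-state
    case of a symmetric Dirichlet distribution."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, alpha) for _ in range(n)]

def share_skewed(freqs, cutoff=0.2):
    """Fraction of draws at least as skewed as 80/20 in either direction."""
    return sum(1 for f in freqs if f <= cutoff or f >= 1 - cutoff) / len(freqs)

shares = {}
for alpha in (10.0, 1.0, 0.3):
    shares[alpha] = share_skewed(draw_state_freqs(alpha, 20000))
    print(f"alpha = {alpha}: share of draws at least 80/20 = {shares[alpha]:.3f}")
```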

Sicoli and Holton used so-called gamma rate heterogeneity across characters, set up in MrBayes by rates=gamma. This setting is common in today's phylogenetics. It allows us to somewhat correct for the fact that different features may change at different speeds. Contrary to a common misconception, gamma rate heterogeneity does not assign a special rate to each feature. Instead, for each feature it computes the likelihood separately for several possible rates of change, and then averages across them. This helps to account for true rate differences because if a certain feature changes very fast, the probability of observing its values will be much greater under a fast rate of change, and the corresponding term will dominate the others. The same holds for slow-changing characters. The reason we do such averaging instead of actually assigning a different rate to each feature is practical: with a separate rate for each feature, we would greatly increase the number of parameters to estimate, while our dataset would remain of the same size. This would lower the quality of our statistical inference. That is why we have to settle for the less precise, but more practical gamma rate heterogeneity.9
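The averaging behind gamma rate heterogeneity can be sketched for the simplest case, two languages at a fixed distance. The four rate multipliers below are hypothetical values chosen to average to 1; real software derives them from a gamma distribution whose shape parameter is estimated along with everything else, and the function names are illustrative. For a feature on which the two languages differ, the term for the fastest rate dominates the average; for a feature on which they agree, the slowest rate contributes most.

```python
import math

# Hypothetical rate multipliers for four categories, chosen to average to 1.
RATES = [0.1, 0.5, 1.0, 2.4]

def pattern_likelihood(same, distance, rate):
    """Probability of one specific pattern at two languages `distance` apart (in expected
    changes at rate 1), under the symmetric two-state model with equal root frequencies."""
    p_same = 0.5 + 0.5 * math.exp(-2.0 * rate * distance)
    return 0.5 * (p_same if same else 1.0 - p_same)

def averaged_likelihood(same, distance, rates=RATES):
    """Average the feature's likelihood over the rate categories; also return the terms."""
    terms = [pattern_likelihood(same, distance, r) for r in rates]
    return sum(terms) / len(terms), terms

DISTANCE = 0.1
avg_diff, terms_diff = averaged_likelihood(False, DISTANCE)
avg_same, terms_same = averaged_likelihood(True, DISTANCE)
print("languages differ:", [round(x, 4) for x in terms_diff], "average", round(avg_diff, 4))
print("languages agree: ", [round(x, 4) for x in terms_same], "average", round(avg_same, 4))
```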

8 If data are coded as restriction, then the state frequencies at the root cannot be set independently. Instead, they are determined by the rates of change from 0 to 1 and from 1 to 0. (Those rates must be equal for standard characters, but are allowed to be unequal for restriction ones.) Technically, state frequencies at the root in MrBayes's model for restriction characters are the stationary frequencies of the Markov process defined by those rates of change. In practice, if the rate of change from 0 to 1 is twice as large as that from 1 to 0, this means that 1s will be considered twice as likely as 0s to occur at the root. This is because the root itself is assumed to be the result of a very long process of language change. After a very long time, the probability for the change process to have 0 as its current value only depends on the rates of change between 0 and 1: after enough time, the process forgets which value it originally started from. That's why one is not allowed to set up the probabilities for 0 and 1 at the root separately from the rates of change for restriction characters in MrBayes.

9 One can further parametrize how gamma rate heterogeneity is implemented by setting up the number of gamma categories ngammacat: the fixed number of distinct rates of change used by the algorithm. MrBayes's default is 4 categories, which is generally considered to be sufficient. When they define gamma heterogeneity in MrBayes, [Sicoli and Holton, 2014] add the command nst=6. That command affects only DNA data, as it determines how many different change rates between the four nucleotides are allowed in the model. When data are binary, the command has no effect.

Furthermore, Sicoli and Holton set up the clocked uniform tree prior, proposed by [Ronquist et al., 2012a]. The tree prior determines the a priori probability of each language-family tree regardless of the observed data. Somewhat surprisingly, several large classes of tree priors, including all those commonly used in current phylogenetic software, assign equal probabilities to each tree topology, that is, to the shape of the tree without regard to branching times (see [Aldous, 2001], among many others). That is why the tree prior can be conveniently decomposed into two parts: the prior on tree topology (which, being uniform, does not have to be explicitly taken into account), and the prior on branching times, or equivalently, on branch lengths. Correspondingly, the tree prior is set up in MrBayes as brlenspr (for branch lengths prior). [Ronquist et al., 2012a]'s clocked uniform tree prior assumes that the branching times are uniformly distributed between the present and the root age. It is set up using the command brlenspr = clock:uniform. In the next section, we will see that the choice of tree prior greatly affects the results for Dene-Yeniseian, and therefore we will discuss tree priors in more detail below. Even more information intended to demystify different types of tree priors is provided in Online Appendix B.10

Under the hood, a tree prior is often conditional on the tree height: the temporal distance between the root and the leaves observed at the present. For the clocked uniform prior, the root age simply works as the upper bound on the age of any node. As all branching times below that bound are equally probable, the root age does not affect the relative probability of different choices of branching times. For other priors inducing non-uniform probabilities on branching times, those probabilities will often be relative to the root age. The tree height/root age itself is also a parameter. MrBayes reports its inferred values under the name TH. As with any parameter in the Bayesian setting, the tree height requires a prior, but fortunately, the default prior that MrBayes uses is very reasonable, so we do not need to worry about it.11

The tree height, as well as the length of each branch, is by default expressed in abstract units. To give them meaning, we need to explicitly connect them to actual language change. This is done through the molecular clock: the rule that says how the abstract units of time that we used as branch lengths correspond to actual changes in linguistic features. It is common to set the base rate of the molecular clock to exactly 1. If we do this, then, for example, a branch of length 0.05 will correspond to the amount of time during which approximately 0.05 changes per character occur. In other words, branch lengths can be read as the expected number of changes per character along that branch. This is the choice made in all analyses in [Sicoli and Holton, 2014] and in this paper.12
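With the base rate set to 1, the reading of branch lengths as expected numbers of changes per character can be checked by simulation. The sketch assumes a binary character that flips state at rate 1 (the symmetric two-state model); the names are hypothetical. On a branch of length 0.05 the average number of changes is close to 0.05. The share of characters whose end state differs from the starting state, 0.5(1 - exp(-2t)), is slightly lower, because a second change can undo the first; this gap grows on long branches.

```python
import math
import random

def simulate_branch(t, rate, rng):
    """One binary character along a branch of length t; it flips state at the given rate,
    so changes form a Poisson process with rate * t expected changes. Returns (changes, end state)."""
    state, n_changes, elapsed = 0, 0, 0.0
    while True:
        elapsed += rng.expovariate(rate)   # waiting time until the next change
        if elapsed > t:
            return n_changes, state
        state = 1 - state
        n_changes += 1

rng = random.Random(7)
t = 0.05
results = [simulate_branch(t, 1.0, rng) for _ in range(200000)]
mean_changes = sum(n for n, _ in results) / len(results)
share_different = sum(s for _, s in results) / len(results)
print(f"mean changes per character: {mean_changes:.4f} (expected {t})")
print(f"share ending in the other state: {share_different:.4f} "
      f"(theory {0.5 * (1 - math.exp(-2 * t)):.4f})")
```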

10 There is no standard name for this prior introduced by [Ronquist et al., 2012a]. I call it clocked uniform, but it can also be called simply uniform. The problem is then to avoid confusion with another common uniform prior, set up in MrBayes with the command brlenspr = unconstrained:uniform. See Online Appendix B for more information.

11 The prior currently used as the default is the exponential distribution with mean 1. It is reasonably non-informative, usually allowing the data to determine the right height. One might wonder why we couldn't simply say that any height is equally likely to occur. Technically, this amounts to a uniform prior distribution over the interval from 0 to infinity. Even though it might seem attractive, using that distribution is in fact quite a poor choice with rather unintuitive consequences: basically, it would mean that we expect super-high trees (corresponding, say, to millions of years of language change) to be just as likely as reasonable-length ones. That can't be right. In addition, there is a technical problem as well: the uniform prior from zero to infinity belongs to the category of improper priors. In practical terms, this is harmless for, say, tree topology inference, but using an improper prior makes the stepping-stone marginal likelihood estimation procedure undefined. What's worse, MrBayes won't report that fact either, as it does not check whether priors are proper.

12 One can also go further and try to translate expected numbers of changes into actual calendar years. This is a difficult enterprise on many levels, and at the minimum requires adding calibration points: dates of existence for some of the proto-languages or of the extinct languages of the family.
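The clocked uniform tree prior, combined with the default exponential tree-height prior described in footnote 11, can be sketched as a sampler of node ages. This assumes that all leaves are at the present, draws only node ages (the uniform prior on topology is not represented), and uses hypothetical names; MrBayes's internal implementation may differ. Internal node ages relative to the root come out uniformly spread, with mean about 0.5, and the root age averages about 1.

```python
import random

def sample_node_ages(n_leaves, rng):
    """Node ages for a rooted binary tree whose leaves are all at the present (age 0):
    root age from an exponential prior with mean 1, the other n_leaves - 2 internal
    nodes uniformly between 0 and the root age. Topology is not sampled here."""
    root_age = rng.expovariate(1.0)   # exponential with rate 1, i.e. mean 1
    inner = sorted((rng.uniform(0.0, root_age) for _ in range(n_leaves - 2)), reverse=True)
    return [root_age] + inner

rng = random.Random(3)
draws = [sample_node_ages(6, rng) for _ in range(20000)]
mean_root_age = sum(d[0] for d in draws) / len(draws)
relative = [age / d[0] for d in draws for age in d[1:]]
mean_relative = sum(relative) / len(relative)
print(f"mean root age ~ {mean_root_age:.3f}")
print(f"mean internal node age relative to root ~ {mean_relative:.3f}")
```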


We already discussed the method of gamma rate heterogeneity, which relaxes the assumption that all features in our dataset are subject to change at the same rate. However, we also know that language change does not necessarily proceed at the same rate in different languages either. A simple molecular clock that prohibits such rate-of-change variation between branches is called the strict clock. But there are many different methods that introduce variation in the speed of language change for different languages; the resulting clocks are called relaxed. [Sicoli and Holton, 2014] compare the strict clock model with a specific relaxed clock model called TK02 [Thorne and Kishino, 2002]. The TK02 model belongs to the family of autocorrelated relaxed clocks. What this means is that, other things being equal, average rates on contiguous branches will likely be similar (so the rate on one branch will correlate with the rate, i.e. with the same quantity, on the neighboring branch, hence autocorrelation). The amount of allowed difference between the rates is probabilistically controlled by the parameter called tk02var in MrBayes's output. The larger the parameter's value, the greater the variation between neighboring branches. The value of that parameter is chosen at random for each MCMC hypothesis. How those values are chosen is determined by the prior tk02varpr. The default value for that prior is sensible, and is the one I kept in my analyses. (Sicoli and Holton did not report theirs.)
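An autocorrelated relaxed clock can be sketched by letting each descendant rate scatter log-normally around its parent's rate, with a spread that grows with branch length. This is a simplified Thorne-Kishino-style sketch: var loosely plays the role of tk02var, but the exact parametrization used by MrBayes may differ, and the function names are hypothetical. Setting var to 0 recovers the strict clock exactly, which is the sense in which the strict clock is a special case of TK02.

```python
import math
import random

def child_rate(parent_rate, branch_length, var, rng):
    """Descendant rate drawn log-normally around the parent rate. The log-scale variance is
    var * branch_length; the -s2/2 shift keeps the expected descendant rate equal to parent_rate."""
    s2 = var * branch_length
    return parent_rate * math.exp(rng.gauss(-s2 / 2.0, math.sqrt(s2)))

def rates_along_path(n_branches, branch_length, var, seed=0):
    """Rates at successive nodes on one root-to-leaf path, starting from base rate 1."""
    rng = random.Random(seed)
    rates = [1.0]
    for _ in range(n_branches):
        rates.append(child_rate(rates[-1], branch_length, var, rng))
    return rates

strict = rates_along_path(8, branch_length=0.1, var=0.0)   # no variation: the strict clock
relaxed = rates_along_path(8, branch_length=0.1, var=2.0)
print("var = 0:", [round(r, 2) for r in strict])
print("var = 2:", [round(r, 2) for r in relaxed])

rng = random.Random(1)
mean_child = sum(child_rate(1.0, 1.0, 0.5, rng) for _ in range(100000)) / 100000
print(f"average descendant rate ~ {mean_child:.3f} (parent rate 1.0)")
```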

Congratulations: we have just reviewed all of Sicoli and Holton's analysis settings! Online Appendix A presents an example analysis description produced by MrBayes, and explains how its parts correspond to the notions we have just discussed. It is time to turn to the analyses themselves.

2.3. MCMC inference on Sicoli and Holton's Dene-Yeniseian data: analyzing the results. First, we look at the choice between the strict and the relaxed molecular clock. Before starting their main analysis, [Sicoli and Holton, 2014] wanted to determine whether the strict clock or the TK02 relaxed clock was the better model. Whether to apply computational methods of model selection in this case depends on the researcher's judgement: strictly speaking, the strict molecular clock is a special case of the relaxed TK02 clock, arising in the limit of low rate variation. So in one legitimate sense, TK02 cannot be worse than the strict clock. However, another legitimate way to compare the two models is to ask whether TK02 with the best-fitting parameters explains the data better than the strict clock. Answering this question involves comparing the likelihood (recall that this refers to the probability of generating our observed data assuming a particular evolutionary model and history) under the strict clock and under those parameters of TK02 that maximize the likelihood, or, in statistical parlance, under the maximum likelihood estimate. If this is our method of comparison, then usually the more flexible model will win: since TK02 has more parameters, it can be more finely tuned to fit the observed data, and we can expect its likelihood to be generally higher. This is indeed the case: while we do not explicitly compute maximum likelihood estimates, we can use the likelihood of the best MCMC sample as a proxy. MrBayes reports those values upon concluding an analysis as "Likelihood of best state for cold chain of run n". My replicas of Sicoli and Holton's analyses list log-likelihoods of ca. -1007 for the strict clock and ca. -993 for the TK02 clock. As expected, the more flexible TK02 clock wins under that measure.

However, such winning comes at a price. A model with more parameters might be tuned very

tightly to the observed data, but that does not necessarily mean that it is the true model. For

example, suppose I toss a coin four times, and it lands tails in three of them. If I were to tightly

fit my model of the coin to the observations, I would conclude that the next coin toss has a 75%

probability of landing tails. However, if I used a regular everyday coin, it is probably not as biased as

that, and a simpler model that does not allow any bias would probably predict the future outcomes

better. This is one of the reasons why statisticians often prefer to balance the maximization of the

likelihood of the observed data and the flexibility of the model.

One popular approach to model selection that implicitly favors simpler models involves comparing the models via Bayes factors, and that is the approach [Sicoli and Holton, 2014] used. The definition of the Bayes factor is deceptively simple: it is the ratio of the marginal likelihoods derived under the two models. The key here is the word marginal: it means that we are comparing not the best possible values of the likelihood, but rather the likelihood averaged over all possible parameter settings, weighted by their prior probabilities. Online Appendix C discusses this in a bit more detail, but the crucial point is that a model that derives very good likelihoods with good parameters, but terribly low likelihoods with many other parameter sets, may well have a lower marginal likelihood than a model that consistently derives only averagely good outcomes. We are thus comparing the performance of two models under a wide range of parameters, not under the ones that present each model in the most favorable light. From this it follows that the result of Bayes factor comparisons depends on the range of parameters we deem to be acceptable for each model: for instance, if we exclude beforehand some parameter values that we know to be terrible, the resulting narrower model will have on average better likelihoods. It is thus important to bear in mind that the Bayes factor method is sensitive to how we select the priors for our model parameters.
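The coin example can be turned into a small calculation of both kinds of comparison. The sketch assumes the three-tails-in-four-tosses data from the text and uniform priors on the probability p of tails, and computes exact values with Python's fractions module; the function names are hypothetical. The flexible model wins at its best-fitting p, loses to the fair coin once its likelihood is averaged over the prior, and beats the fair coin again when the prior excludes p < 1/2 beforehand. This illustrates both the penalty on flexibility and the sensitivity of Bayes factors to priors.

```python
from fractions import Fraction

# Data from the text: four tosses, three tails and one head.
# p is the probability of tails; likelihoods are for the exact sequence observed.
def likelihood(p):
    return p**3 * (1 - p)

def marginal_uniform(a, b):
    """Exact marginal likelihood under a uniform prior on p over [a, b]:
    the average of p^3 (1 - p) over that interval."""
    def antiderivative(x):
        return x**4 / 4 - x**5 / 5
    return (antiderivative(b) - antiderivative(a)) / (b - a)

fair = likelihood(Fraction(1, 2))                        # no free parameter: p = 1/2
best_fit = likelihood(Fraction(3, 4))                    # flexible model at its best-fitting p
free = marginal_uniform(Fraction(0), Fraction(1))        # flexible model, uniform prior on [0, 1]
narrow = marginal_uniform(Fraction(1, 2), Fraction(1))   # p < 1/2 excluded beforehand

print(f"fair coin: {fair}, flexible at best p: {best_fit}")
print(f"marginal, p ~ U(0, 1): {free}; marginal, p ~ U(1/2, 1): {narrow}")
print(f"Bayes factor, fair vs U(0, 1): {fair / free}; fair vs U(1/2, 1): {fair / narrow}")
```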

Though theoretically taking the ratio of marginal likelihoods (i.e. computing the Bayes factor) is straightforward, in practice it is not trivial to obtain marginal likelihoods. Fully accurate averaging across the whole parameter space is out of the question for all but the simplest and most well-behaved models. Because of that, most Bayes factors reported in the literature are only estimates of the true Bayes factors. Sicoli and Holton try two different methods for estimation, both of which are regularly used in phylogenetics: the harmonic mean and the stepping-stone methods.

The first method involves comparing, across models, the harmonic mean of the likelihoods of the samples from the posterior. That quantity is easy to compute, and is in fact reported in MrBayes's standard output. However, it is well known to statisticians to be a terrible, very bad, absolutely not good estimator of the likelihood value we are seeking. One way to show this is to simply note that the variance of that estimator may be infinite (e.g., [Raftery et al., 2007]). In informal terms, this means that you have no idea how far your estimated value for the likelihood is from its true value. Another way to explain the problem (requiring more mathematical background than the current paper assumes, but very convincing for those who can make it through) may be found in a blog post by statistician Radford Neal [Neal, 2008]. Intuitively, the issue is that to accurately estimate the overall likelihood, we need to gather likelihood values from regions of the tree space and evolutionary-parameter space that do not explain the data particularly well. By design, our MCMC chain will pass through such regions only rarely. So our MCMC sample would normally largely lack information crucial for the accurate computation of the marginal likelihood. The true marginal likelihood is usually much lower than the harmonic-mean estimate reported by MrBayes after a regular MCMC run.

Hence the second method that [Sicoli and Holton, 2014] use: stepping-stone sampling

[Xie et al., 2011], conveniently implemented in MrBayes. The mathematical idea of the stepping

stone method is not too difficult, but nevertheless lies beyond the level of the current largely in-

formal discussion. What is important is that stepping stone estimation (when defined) is far more

accurate than the harmonic mean method. In particular, when the two disagree, the stepping stone

results are very likely to be the more accurate ones.
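For readers who prefer code to prose, here is a minimal sketch of the stepping-stone idea of [Xie et al., 2011] on the same kind of toy model (again an assumption: y_i ~ N(θ, 1), θ ~ N(0, τ²), with exact draws standing in for the MCMC runs MrBayes would perform). The estimator walks along a ladder of "power posteriors", proportional to L(θ)^β times the prior, from β = 0 (the prior) to β = 1 (the posterior), and estimates each ratio of normalizing constants by importance sampling from the previous rung. Every rung only averages bounded quantities, which is one reason the estimate behaves well. The Beta(α, 1)-quantile spacing with α = 0.3 and the function names are illustrative choices.

```python
import math
import random

# Toy model (assumption): y_i ~ N(theta, 1), prior theta ~ N(0, tau^2).

def log_likelihood(theta, ys):
    return -0.5 * len(ys) * math.log(2 * math.pi) - 0.5 * sum((y - theta) ** 2 for y in ys)

def exact_log_marginal(ys, tau):
    n, s, ss = len(ys), sum(ys), sum(y * y for y in ys)
    t2 = tau * tau
    quad = ss - t2 * s * s / (1 + n * t2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * t2) - 0.5 * quad

def log_mean_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))

def sample_power_posterior(ys, tau, beta, n_samples, rng):
    """Exact draws from the density proportional to L(theta)^beta * prior(theta).
    In this conjugate toy model it is Gaussian; beta = 0 gives the prior."""
    precision = beta * len(ys) + 1.0 / tau ** 2
    mean = beta * sum(ys) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_samples)]

def stepping_stone_log_marginal(ys, tau, n_steps, n_samples, rng, alpha=0.3):
    # beta_k = (k / K)^(1 / alpha): evenly spaced quantiles of Beta(alpha, 1),
    # which place most rungs close to the prior, where the density changes fastest.
    betas = [(k / n_steps) ** (1.0 / alpha) for k in range(n_steps + 1)]
    log_z = 0.0  # log Z_0 = 0 because the prior integrates to 1
    for b_prev, b_next in zip(betas, betas[1:]):
        draws = sample_power_posterior(ys, tau, b_prev, n_samples, rng)
        # The mean of L^(b_next - b_prev) under rung b_prev estimates Z_next / Z_prev.
        log_z += log_mean_exp([(b_next - b_prev) * log_likelihood(th, ys) for th in draws])
    return log_z  # log Z_1 = log p(ys)

rng = random.Random(7)
ys = [rng.gauss(0.5, 1.0) for _ in range(50)]
exact = exact_log_marginal(ys, tau=100.0)
ss = stepping_stone_log_marginal(ys, tau=100.0, n_steps=32, n_samples=1000, rng=rng)
print(f"exact: {exact:.3f}   stepping stone: {ss:.3f}")
```

With 32 rungs and 1,000 draws per rung, the estimate should agree with the exact value to within a few tenths of a log unit, in contrast to the harmonic mean on the same problem.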

[Sicoli and Holton, 2014] obtained very close stepping stone estimates for the strict and the TK02

clock. This means that adding TK02 rate variation between languages did not on average help to

explain the data better, but at the same time it did not make matters worse on average either.

Based on that, we can conclude that a simpler model with the strict clock should be used on

practical grounds. Sicoli and Holton, however, also observe that the harmonic mean estimate for the strict clock was substantially higher than the harmonic mean estimate for TK02. Based on that difference, they declare that the strict clock model fitted the data better. This is incorrect: as explained above, when stepping-stone and harmonic mean estimates disagree, the stepping-stone ones should be accepted. So Sicoli and Holton's decision to use the strict clock is justifiable, but not by the argument they employ.
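The decision rule applied here can be written down explicitly. A minimal sketch, assuming stepping-stone log marginal likelihoods as inputs (harmonic-mean values should not be fed to it, for the reasons given above) and the widely used Kass and Raftery (1995) verbal scale for twice the log Bayes factor. The function names, the threshold of 2.0 for preferring the richer model, and the example numbers are illustrative assumptions, not values from [Sicoli and Holton, 2014].

```python
def kass_raftery_label(two_ln_bf):
    """Kass and Raftery (1995) verbal scale for the magnitude of 2 ln BF."""
    magnitude = abs(two_ln_bf)
    if magnitude < 2:
        return "not worth more than a bare mention"
    if magnitude < 6:
        return "positive"
    if magnitude < 10:
        return "strong"
    return "very strong"

def choose_model(log_ml_simple, log_ml_complex, min_two_ln_bf=2.0):
    """Compare two models by their log marginal likelihoods.

    Returns (2 ln BF in favor of the complex model, verbal label, chosen model).
    Unless the complex model is favored by at least `min_two_ln_bf`,
    the simpler model is kept on practical grounds.
    """
    two_ln_bf = 2.0 * (log_ml_complex - log_ml_simple)
    choice = "complex" if two_ln_bf >= min_two_ln_bf else "simple"
    return two_ln_bf, kass_raftery_label(two_ln_bf), choice

# Hypothetical values (e.g. strict clock vs. a relaxed clock), not published estimates.
print(choose_model(log_ml_simple=-1234.6, log_ml_complex=-1234.1))
print(choose_model(log_ml_simple=-1240.0, log_ml_complex=-1234.0))  # (12.0, 'very strong', 'complex')
```

The first call mirrors the situation in the text: near-identical stepping-stone estimates give no reason to pay for extra parameters, so the simpler model is kept.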


Next, Sicoli and Holton turn to the main computational question of their paper: which tree struc-

ture is more likely for the Dene-Yeniseian macro-family given their typological data, [[Yeniseian],

[Na-Dene]] or [[Na-Dene I], [Na-Dene II, Yeniseian]]? To answer that question, it is sufficient to

look at the posterior distribution of Sicoli and Holton's baseline analysis. It is summarized in the majority-rule consensus tree in Fig. 2(a), based on my (slightly modified) replica of Sicoli and Holton's analysis. The majority-rule consensus tree contains only clades that are present in more than 50% of the trees in the posterior sample. The fact that the Na-Dene family is not shown as one clade means that in more than half of the trees, the Na-Dene languages do not form a single subfamily to the exclusion of Yeniseian. In other words, the posterior probability of the topology [[Yeniseian], [Na-Dene]] is less than 50%. In fact, examining MrBayes's output in more detail, we can determine

that the posterior probability of that topology is about 18%. Just as [Sicoli and Holton, 2014]

argued, family structure [[Yeniseian], [Na-Dene]] is not strongly supported by their analysis. The

alternative structure with the most support, namely with 58% posterior probability, is [[Yeniseian,

Californian Athabaskan], [other Na-Dene]].13
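Both quantities used above, the support of a clade and the posterior probability of a whole topology, are simple frequencies over the posterior tree sample, and the majority-rule consensus keeps the clades whose frequency exceeds 50%. The sketch below shows the bookkeeping on a hypothetical five-tree sample written as nested tuples. The language codes are taken from Fig. 2, but the trees are invented for illustration and are not MrBayes output (on real tree files, MrBayes's `sumt` command produces such summaries). Trees are treated as rooted, and the trivial root clade is ignored.

```python
from collections import Counter

def clades(tree):
    """Non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade is shared by every tree
    return found

def clade_support(trees):
    """Fraction of sampled trees containing each clade."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c: k / len(trees) for c, k in counts.items()}

def topology_support(trees):
    """Fraction of sampled trees with each exact rooted topology."""
    counts = Counter(frozenset(clades(t)) for t in trees)
    return {topo: k / len(trees) for topo, k in counts.items()}

def majority_rule(trees):
    """Clades present in more than half of the sampled trees."""
    return {c for c, p in clade_support(trees).items() if p > 0.5}

# Hypothetical sample: A = [[Yeniseian], [Na-Dene]],
# B = [[Yeniseian, Californian Athabaskan], [other Na-Dene]].
A = (("ket", "zko"), (("hup", "kto"), ("aht", "gwi")))
B = ((("ket", "zko"), ("hup", "kto")), ("aht", "gwi"))
sample = [A, B, B, B, A]

na_dene = frozenset({"hup", "kto", "aht", "gwi"})
print(clade_support(sample)[na_dene])                  # 0.4
print(topology_support(sample)[frozenset(clades(A))])  # 0.4
for clade in sorted(majority_rule(sample), key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

In this toy sample the Na-Dene clade appears in 2 of 5 trees, so, as in Fig. 2(a), it is absent from the majority-rule consensus, while the clade joining Yeniseian with the two Californian languages is present.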

So far, we have reviewed what [Sicoli and Holton, 2014] did in their computational analysis. Now

we get to the crucial point of this section: something they did not. [Sicoli and Holton, 2014] did

not check the robustness of their analysis to different choices of tree prior. When the dataset is

large, evidence from the data would usually overwhelm the tree prior, though this always needs to be tested empirically by checking how different priors work on one's data. But the Dene-Yeniseian typological-feature dataset is very small, with only 116 binary features and 84 unique patterns of their distribution over the languages. To put this into perspective: [Ritchie et al., 2017] examine the effect of tree-prior choice on three empirical datasets. In one of them, they find that the tree prior strongly and adversely affected the results, while the other two were relatively unaffected. The problematic dataset was smaller than the other two in the sense that matters here: it contained 14K DNA letters, but only 188 unique patterns across the taxa. The two others contained, respectively, 14K DNA letters with 5765 unique patterns, and 21K DNA letters with 6355 unique patterns. The Dene-Yeniseian dataset is obviously much smaller than even the smallest, problematic dataset of [Ritchie et al., 2017]. It is therefore likely that the Dene-Yeniseian analysis is sensitive to the choice of tree prior.
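"Unique patterns" are the distinct columns of the data matrix. Under the usual assumption that characters evolve independently under one model, the likelihood depends only on which column patterns occur and how often, so features that distribute over the languages in exactly the same way add weight to an existing pattern rather than a new one. A minimal sketch for counting them, assuming the matrix is stored as one string of codes per language; the three-language, six-feature matrix is hypothetical.

```python
from collections import Counter

def site_patterns(matrix):
    """Count the distinct columns of a character matrix.

    matrix maps each language to a string of equal length, one code per feature.
    Returns a Counter from column pattern (languages in sorted order) to its frequency.
    """
    languages = sorted(matrix)
    n_features = len(matrix[languages[0]])
    if any(len(matrix[lang]) != n_features for lang in languages):
        raise ValueError("all rows must have the same number of features")
    return Counter(tuple(matrix[lang][i] for lang in languages) for i in range(n_features))

# Hypothetical 6-feature matrix for three languages.
toy = {"ket": "011010", "zko": "011010", "hup": "101101"}
patterns = site_patterns(toy)
print(len(toy["ket"]), "features,", len(patterns), "unique patterns")  # 6 features, 3 unique patterns
```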

The clocked uniform prior that [Sicoli and Holton, 2014] used is a mathematical abstraction that

does not arise from any known evolutionary process. It takes all branching times to be uniformly

distributed within the admissible intervals, which in our case simply means the interval between

the root of the tree and the present when the living languages are sampled. On the surface, this

might seem like a nice choice that does not bring any dangerous assumptions into our analysis. But

there are two problems with that reasoning. First, other tree priors can also be argued to be nice,

as we will demonstrate shortly. Second, it is in itself dangerous to feel safe without actually testing how safe one's assumptions are. In the good case, when we have enough data, the data should overwhelm the tree prior, making its choice generally unimportant and the results more or less the same no matter which tree prior was used. But in the bad case, when different tree priors lead to different results, each of them effectively imposes its own preferences onto the analysis. Because of that, it is always a good idea to test one's analysis for robustness against tree prior choice.

Here, I subject Sicoli and Holton's dataset to such a test, running exactly the same analysis, but

with the birth-death tree prior.14 Unlike the clocked uniform prior, the birth-death prior arises from

13Sicoli and Holton themselves argue for the [[Na-Dene I], [Na-Dene II, Yeniseian]] topology using a different

method, namely the Bayes factor comparison between marginal likelihoods, as described above for the choice between

two clock models. Unfortunately, their application of the method in that case was affected by a mathematical mistake:

their stepping stone likelihood estimation favored the [[Yeniseian], [Na-Dene]] topology, not the [[Na-Dene I], [Na-

Dene II, Yeniseian]] topology as Sicoli and Holton claimed. When comparison is done more carefully, however, it can

support their conclusion, under the clocked uniform tree prior. I further discuss the choice between the posterior probability and the Bayes factor methods, and the latter's application to this case, in Online Appendix C.

14 For the birth-death tree prior, MrBayes implicitly enforces the improper uniform root-age prior from 0 to infinity.

Even if the user tries to set a different prior, this is overridden by the program without any message informing her


[Figure 2: consensus-tree diagrams, panels (a) and (b), with clade support values; scale bar 0.02. Tree drawings not reproduced.]

Figure 2. Majority-rule consensus trees. The two Yeniseian languages Ket (ket) and Kott (zko) are highlighted in blue; the four Californian Athabaskan languages (Na-Dene) in orange. Numbers on clades show posterior probabilities in percent. (a) Modified replica of [Sicoli and Holton, 2014]'s analysis of Dene-Yeniseian, excluding the unrelated language Haida: gamma rate heterogeneity with 4 categories, strict molecular clock; branch lengths represent the expected number of changes per character. The replica differs from the original analysis in assuming that all sampled characters were included in the data, whereas [Sicoli and Holton, 2014] employed only non-uniform characters. Tree prior: [Ronquist et al., 2012a]'s clocked uniform prior. (b) The same, but with a different tree prior: birth-death. Trees prepared in FigTree, free software by Andrew Rambaut.

of that. This is all right for exploring the posterior, but makes MrBayes stepping-stone estimates undefined for


a specific biological model of the tree-generating process (see [Gernhard, 2008] for mathematical

analysis as well as references to earlier work). The birth-death model assumes that all languages (or species) are always equally likely to split into two, with constant rate λ, and also equally likely to die off, with constant rate μ. These assumptions lead to specific predictions regarding the probability of the timing of branching events in the tree. (The model accounts for the fact that there would often exist branches of the family that left no living descendants, in which case we can only detect a subset of all branching events.) One of the model's predictions is that branching events become more common as we approach the present. This is intuitively appealing: our family would usually have more languages closer to the present, and it is natural to assume that among 10 languages, the next split will happen sooner than if we have only 1 language. The birth-death prior thus prefers to have branching events closer to the tips, other things being equal. This differs from the clocked uniform prior, where branchings are distributed uniformly over time. In effect, the clocked uniform prior implicitly assumes that splitting probabilities per language were greater in the distant past than in the recent past. Arguably, then, the birth-death prior is no less natural a choice than the clocked uniform one, and is worth trying out. (I discuss common tree priors and the assumptions behind them in more detail in Online Appendix B.)
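The contrast between the two priors can be made concrete with a back-of-the-envelope calculation. The sketch below is deliberately simplified and rests on assumptions of its own: it uses a pure-birth (Yule) process, i.e. the birth-death model with μ = 0, works with expected waiting times rather than full distributions, and lets the last interval (with n lineages) run to the present. It is neither the exact conditioned process analysed in [Gernhard, 2008] nor MrBayes's implementation. For the clocked uniform prior it reads the description in the text literally, as independent uniform node ages between the root and the present. With 40 languages, half of the non-root nodes fall in the younger half of the tree under the uniform reading, versus 32 of 38 (about 84%) under the pure-birth calculation.

```python
# Simplified comparison (assumptions: pure birth, expected values only).

def clocked_uniform_heights(n_tips):
    """Expected heights (root = 1, present = 0) of the n_tips - 2 non-root internal
    nodes if their ages are i.i.d. uniform between present and root: the expected
    order statistics i / (n_tips - 1). Oldest first."""
    m = n_tips - 2
    return [i / (m + 1) for i in range(m, 0, -1)]

def yule_heights(n_tips):
    """Expected node heights divided by the expected root height under pure birth.
    While k lineages exist, the wait for the next split is exponential with rate
    k * lam, so its mean is 1 / (k * lam); lam cancels in the ratio. Oldest first."""
    intervals = [1.0 / k for k in range(2, n_tips + 1)]  # k = 2, ..., n_tips lineages
    root_height = sum(intervals)
    heights, remaining = [], root_height
    for length in intervals[:-1]:  # the final interval ends at the present, not at a node
        remaining -= length
        heights.append(remaining / root_height)
    return heights

def share_in_younger_half(heights):
    """Fraction of nodes closer to the present than to the root."""
    return sum(h < 0.5 for h in heights) / len(heights)

n = 40
print(f"clocked uniform: {share_in_younger_half(clocked_uniform_heights(n)):.2f}")  # 0.50
print(f"pure birth:      {share_in_younger_half(yule_heights(n)):.2f}")             # 0.84
```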

Fig. 2(b) shows the majority-rule consensus tree from an analysis that is exactly like my replica of

[Sicoli and Holton, 2014]'s, but with the birth-death tree prior instead of [Ronquist et al., 2012a]'s

clocked uniform prior. It is evident that the tree prior greatly affects the results! While the

clocked uniform prior, Fig. 2(a), did not strongly support the [[Yeniseian], [Na-Dene]] topology,

the birth-death prior does, Fig. 2(b). In fact, under the birth-death prior, the posterior support

for a Na-Dene clade, and consequently for the existence of Proto-Na-Dene, is 99%, so the results

under the birth-death prior are much more concentrated and therefore certain than under the

clocked uniform prior.
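Checks of this kind are easy to automate once clade supports from several runs are available. A minimal sketch, with a hypothetical function name and an arbitrary threshold of 0.2, that flags clades whose support differs across priors by more than the threshold. The input values are those reported in this section: about 18% for a Na-Dene clade under the clocked uniform prior (in this Haida-free dataset, that is the posterior probability of the [[Yeniseian], [Na-Dene]] topology), 99% under the birth-death prior, and 100% for the Yeniseian pair Ket and Kott in both panels of Fig. 2.

```python
def flag_prior_sensitive(support_by_prior, threshold=0.2):
    """Flag clades whose posterior support varies across priors by more than threshold.

    support_by_prior maps a prior's name to {clade name: posterior support in [0, 1]}.
    Only clades reported for every prior are compared: absence from a consensus tree
    means support below 50%, not zero support.
    """
    summaries = list(support_by_prior.values())
    shared = set(summaries[0]).intersection(*summaries[1:])
    flagged = {}
    for clade in sorted(shared):
        values = [summary[clade] for summary in summaries]
        spread = max(values) - min(values)
        if spread > threshold:
            flagged[clade] = spread
    return flagged

support = {
    "clocked uniform": {"Yeniseian (ket, zko)": 1.00, "Na-Dene": 0.18},
    "birth-death": {"Yeniseian (ket, zko)": 1.00, "Na-Dene": 0.99},
}
flagged = flag_prior_sensitive(support)
for clade, spread in flagged.items():
    print(f"{clade}: support differs by {spread:.2f} across tree priors")
```

Any clade flagged this way is a signal that the data alone do not settle the question, which is exactly the situation discussed next.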

Some of the differences in the consensus trees are not hard to trace back to the assumptions

induced by the tree prior. Under the clocked uniform prior, language splits are evenly distributed

through time, and indeed we can see in Fig. 2(a) that the nodes of the tree occur at very different

heights. The birth-death prior, on the other hand, favors trees where most splits are relatively

recent, and in accordance with that, most nodes in Fig. 2(b) occur relatively close to the leaves. However, this general observation does not by itself explain the topological difference between Fig. 2(a) and Fig. 2(b): it could well have been that a tree with the topology in Fig. 2(a) had a different distribution of node heights, on average closer to the present. This illustrates that the choice of the tree prior can affect our predictions about the language family's topology in

non-trivial ways. Note also that a topological difference results from the tree prior even though the

prior probability of all topologies is the same in both analyses: the tree prior induces that change

indirectly.

To reiterate, in the good case when we have a lot of data, the data should to a large extent

override any preferences induced by the tree prior, [Ritchie et al., 2017]. So the fact that we get

very different results under these two tree priors simply means that our data are very scarce.

Consequently, it is relatively pointless to argue on the basis of phylogenetics alone which tree prior

is better in this case: this would amount to an a priori discussion about the proper structure of

Dene-Yeniseian not informed by actual empirical evidence from that family. The conclusion that we

should derive from the phylogenetic analyses in this case is that we do not have enough evidence.15

birth-death priors, therefore meaningless. As a practical consequence, the stepping-stone technique should not be

applied to analyses with birth-death priors.

Importantly, MrBayes cannot and does not check this and therefore would report such estimates without error

when asked; the responsibility to avoid that is completely on the user. This serious issue is thus one of the hidden

perils of phylogenetic analysis.

15In principle, one could try to apply standard model selection techniques to determine which tree prior is a

better fit for the Dene-Yeniseian data, just as with the clock models we discussed above. However, this would not


Let us sum up. Sicoli and Holton's argument for the Beringian homeland of the Dene-Yeniseian

macro-family was based on their result that their computational-phylogenetic analysis did not

support the [[Yeniseian], [Na-Dene]] topology. However, it turns out that their computational

result depends on the choice of the tree prior. This in turn means that we do not have enough

linguistic data in the dataset, and that phylogenetics as such cannot actually help us to decide

which Dene-Yeniseian topology is closer to the truth.

tree

For the sake of argument, let's suppose that computational analyses provided strong evidence

against the [[Yeniseian], [Na-Dene]] topology (which is actually false, as we just discussed in Sec-

tion 2), and furthermore that the alternative tree topology [[Na-Dene I], [Na-Dene II, Yeniseian]]

strongly supported the Beringian homeland hypothesis (which is also false, as we showed in Sec-

tion 1). At least in this hypothetical case, could we conclude that the Dene-Yeniseian homeland

was in Beringia? Arguably, still not.

Sicoli and Holton's result implied that there was no Proto-Na-Dene that did not include the

Yeniseian languages. As in any statistical investigation, that result should be cross-validated: we

need to check whether it agrees with data external to the analysis. In the case at hand, it is

obvious that it does not. Yeniseian languages and Na-Dene languages are quite different from each

other, and there is no comparative-method reconstruction that supports the notion of Yeniseian

being a closer relative to some Na-Dene subfamily than the rest of Na-Dene. At the very least, the

genealogical unity of the Athabaskan subfamily within Na-Dene (torn into two by Sicoli and Holton)

is presupposed in the literature. In fact, the very reconstruction by [Vajda, 2011] that defends the

Dene-Yeniseian hypothesis employed Proto-Athabaskan forms reconstructed by [Leer, 2008] and

Vajda himself, and crucially not the material of individual Athabaskan languages.

Table 1 illustrates this using lexical data. One should note that the number of likely lexical correspondences between Yeniseian on the one hand and Na-Dene on the other is quite limited in the first place: overall, there are ca. 100 sets at present [Campbell, 2011]. This can be compared with ca. 800 cognate sets used by [Nikolaev, 2014] for the Na-Dene family. But even those Dene-Yeniseian correspondences that are likely can demonstrate the point. There is a great difference in the certainty of cognate correspondences within Na-Dene and between Na-Dene and Yeniseian. Consider the Yeniseian and Athabaskan words for 'fire' and 'foot' in Table 1. Within Athabaskan, the initial consonants of the relevant words represent one and the same pattern. Such regular correspondences are a hallmark of convincing arguments for lexical cognacy (that is, for words descending from the same ancestral word). In contrast, the correspondences between the Athabaskan words and their likely Yeniseian cognates are much less transparent. The initial consonant belongs to the same series in Athabaskan, but corresponds to q in 'fire' and to k in 'foot' in Yeniseian. Of course, this does not by itself mean that the words are not related, and [Vajda, 2011] proposes that the k sound in Ket kis is due to a differential development of uvulars before front vowels in early Yeniseian. Still, when such constructs are based on a small number of examples (and that number naturally depends on the overall number of known likely cognates), they have much less explanatory power than when we observe exact sound correspondences, as we do in the Athabaskan examples. This is, of course, not the only issue: the coda correspondences are not unproblematic either, as Vajda himself discusses. Finally, for 'fire', a meaning shift needs to be postulated. Similar points can be made with respect to the two other likely correspondences in Table 1.

be advisable. The fact that the results are so sensitive to the choice of tree prior is an important indicator: it shows

that we have too small an amount of data. It is not very meaningful to answer the question of which prior results in

a better fit for a dataset that is simply not informative enough.


Table 1. Selected likely lexical correspondences between Yeniseian and Na-Dene. All rows

represent items considered Dene-Yeniseian cognates in [Vajda, 2011] (i.e. indicates the

language is not reported to have the relevant cognate). Transliterations and within-family

cognacy judgements according to Global Lexicostatistical Database datasets for Yeniseian

[Starostin, 2013] and Athabaskan [Kassian, 2016]. For simplicity, only the first variant is

provided where several related items are available. Ket and Kott are Yeniseian languages.

Hupa and Kato are Californian Athabaskan, declared likely to be close relatives of Yeniseian

by Sicoli and Holton's analysis. Central Ahtena and Degexitan are Athabaskan languages spoken in Alaska. Under [Sicoli and Holton, 2014]'s analysis, Hupa and Kato are expected to be closer

to Ket and Kott than to Ahtena and Degexitan.

cloud qon dark ?? qos qUT

earth baN paN ninP neP

fire qoN daytime ?? xoNP khw o:NP qh oP qh UnP

foot kis ?? =xe-P =khw eP =qh e-P qh a:-P

Overall, the picture is clear: while the Athabaskan languages, crucially including Californian

Athabaskan like Hupa and Kato, show a clear genealogical relationship, the question of whether the Yeniseian languages are related to them is far from obvious, which is precisely why Vajda's argument was taken up with a lot of enthusiasm among historical linguists. But there is simply no evidence in the lexical and sound-correspondence data that would suggest that the Californian Athabaskan languages are more closely related to Yeniseian than to the other Athabaskan groups. [Sicoli and Holton, 2014]'s phylogenetic findings strongly conflict with that.

Let's assess the situation. The bulk of linguistic evidence points to the existence of the Na-Dene

language family on the one hand, and the Yeniseian family on the other. Specialists arguing for

the Dene-Yeniseian connection on classical historical-linguistic grounds accept this as fact, and

consequently compare the reconstructed proto-languages of the two families, not the modern lan-

guages. There is nothing in the published linguistic evidence known to me that would suggest that

Californian Athabaskan (or any other Athabaskan subgroup) is more closely related to Yeniseian

than to other Athabaskan. The comparative method, by which this conclusion is reached, has been

tested since the 19th century on multiple language families of the world, and rightly remains the

scientific standard for proving language relationship.

At the same time, [Sicoli and Holton, 2014] come to a different conclusion based on a computa-

tional phylogenetic analysis. Their data consist of 116 binary features per language, often highly

correlated with each other. Note that if we recode the words in Table 1 into the binary format, we will easily have more binary features than Sicoli and Holton used, though this represents only a tiny bit of the available data! (It is a different question how reliable such data would be, but then again we have no reason to believe that all of Sicoli and Holton's features are particularly informative about language genealogies.) On the basis of their limited dataset, Sicoli and Holton obtain the best support for topologies with a [Yeniseian, Californian Athabaskan] clade. But as we have shown in the previous section, this happens with the clocked uniform prior, but not with the birth-death prior, so their result is prior-dependent. Finally, even if Sicoli and Holton's binary dataset were sufficient for accurate topology inference, we have no guarantee that their evolutionary model would have uncovered the true history of the family. This point affects all phylogenetic methods, as the models employed in them are very simple and sometimes dictated by mathematical convenience, while the empirical processes of language change and language diversification have not yet been studied with enough precision. This is not to condemn attempting computational phylogenetic analyses, but when we interpret their output, we need to be aware that we do not have a guarantee that our results reflect the truth, and we had better cross-check what we get. Given all these


circumstances, it is clear that when Sicoli and Holton's phylogenetic analysis resulted in a topology

conflicting with the historical-linguistic body of knowledge on Yeniseian and Na-Dene, it is the

phylogenetic analysis that should be rejected as inaccurate. If the Dene-Yeniseian macro-family

exists at all, it features a basal split into Yeniseian and Na-Dene.
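The earlier remark about recoding the words of Table 1 into binary format refers to a standard way of turning cognate judgements into phylogenetic characters: each cognate set for a meaning becomes one presence/absence feature. A minimal sketch, assuming cognate-class labels have already been assigned; the languages L1 to L4 and their class labels are hypothetical, not the data of Table 1, and a missing word is coded as "?". Even two meanings yield four binary features here, which shows how quickly lexical data grow in this format.

```python
def binary_encode(cognates):
    """Turn cognate-class data into presence/absence characters.

    cognates maps meaning -> {language: cognate-class label, or None if no data}.
    Returns (feature_names, matrix), where matrix maps language -> string of '0', '1', '?'.
    """
    languages = sorted({lang for row in cognates.values() for lang in row})
    features, columns = [], {lang: [] for lang in languages}
    for meaning in sorted(cognates):
        row = cognates[meaning]
        for cls in sorted({c for c in row.values() if c is not None}):
            features.append(f"{meaning}:{cls}")
            for lang in languages:
                label = row.get(lang)
                columns[lang].append("?" if label is None else ("1" if label == cls else "0"))
    return features, {lang: "".join(bits) for lang, bits in columns.items()}

# Hypothetical cognate classes for two meanings in four languages.
data = {
    "fire": {"L1": "a", "L2": None, "L3": "a", "L4": "b"},
    "foot": {"L1": "c", "L2": "c", "L3": "d", "L4": "d"},
}
names, matrix = binary_encode(data)
print(names)   # ['fire:a', 'fire:b', 'foot:c', 'foot:d']
print(matrix)  # {'L1': '1010', 'L2': '??10', 'L3': '1001', 'L4': '0101'}
```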

There are in principle two reasons why Sicoli and Holton's computational analysis could have produced inaccurate results: (i) the dataset carries a wrong signal that does not reflect the true family structure (this would happen if the feature distribution is primarily determined not by shared descent, but by other factors, for example, by chance or by areal language contact); or

(ii) the dataset contains true genealogical signal, but not enough of it to show the true structure

when the prior discourages it. Our result that the birth-death prior produces the expected family

tree shows that the prior plays a role, but it does not allow us to decide between (i) and (ii). For

instance, while Ket and Kott are genealogically closely related, they are also close areally, and thus

whatever resemblances in the typological features they have may stem from areal effects and not

from shared descent. Thus at this point we cannot confirm whether the typological features of

Sicoli and Holton carry true phylogenetic signal.

4. Conclusion

[Sicoli and Holton, 2014] argued that computational phylogenetic analysis of their 116-binary-feature typological dataset supported the radiation of the likely Dene-Yeniseian macro-family out of Beringia with back-migration into central Asia, rather than a migration from central or western Asia to North America. This is incorrect. First, both migration hypotheses are compatible with a range of linguistic phylogenies: finding the true phylogeny does not by itself decide the Beringia-vs-deeper-Asia question. Second, Sicoli and Holton's phylogenetic results are sensitive to the choice of tree prior: when their clocked uniform prior is replaced with the birth-death prior, we obtain the topology with a basal split into Yeniseian and Na-Dene clades, and not the topology Sicoli and Holton found, where Californian Athabaskan is more closely related to Yeniseian than to other Athabaskan. Third, while phylogenetic analyses themselves do not tell us which result is closer to the truth, the bulk of linguistic evidence does. Linguistic data overwhelmingly support the Athabaskan languages forming a family, and allow us to firmly reject the hypothesis that Californian Athabaskan is more closely related to Yeniseian than to other Athabaskan. If any phylogenetic estimate based on Sicoli and Holton's small dataset has a chance to reflect the true family history, it would be an estimate showing the expected [[Yeniseian], [Na-Dene]] tree structure, and not the one argued for by Sicoli and Holton.

Why did an earnest attempt by [Sicoli and Holton, 2014] to answer an important question re-

sult in a demonstrably wrong answer? The authors did their best to be transparent about the

computational analyses and did ensure that their research was replicable. They attempted some

cross-validation strategies to check that their results were not spurious: in particular, they checked

what happened when they included or omitted the outgroup putative isolate Haida, or one of the

two Yeniseian languages. Thus in many ways they followed the best practices in the field. However,

there were several things they did not check which made their overall reasoning faulty. One of them

was the correctness of the predictions they spelled out for the two homeland hypotheses, Section 1.

The others are connected to the technicalities and the interpretation of computational phylogenetic

analyses. There are two important steps which any phylogenetic study should take:

Check whether the inferred results are sensitive to the choice of priors

In particular, analyses based on small datasets may be sensitive to, among others:

• the choice of tree prior;
• the choice of feature coding scheme;
• the choice of molecular clock;
• the choice of evolutionary substitution model.
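The four choices listed above can be crossed into a full grid of analyses. A minimal sketch with hypothetical labels: the options echo choices mentioned in this paper (clocked uniform vs. birth-death tree prior, all vs. only variable characters, strict vs. TK02 clock, gamma rate heterogeneity with 4 categories), while "equal rates" is an assumed alternative; the strings are descriptive labels, not MrBayes syntax. Two options per choice already mean 16 runs; varying one choice at a time from a baseline is cheaper when runs are expensive, at the cost of missing interactions between choices.

```python
from itertools import product

# Hypothetical option lists (descriptive labels only, not MrBayes commands).
SETTINGS = {
    "tree_prior": ["clocked uniform", "birth-death"],
    "coding": ["all characters", "variable characters only"],
    "clock": ["strict", "TK02"],
    "substitution": ["equal rates", "gamma, 4 categories"],
}

def analysis_grid(settings):
    """Return one dict per combination of options (a full factorial design)."""
    names = list(settings)
    return [dict(zip(names, combo)) for combo in product(*(settings[n] for n in names))]

runs = analysis_grid(SETTINGS)
print(len(runs), "runs")  # 16 runs
print(runs[0])
```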


Estimating the fit of different models to data, through stepping stone estimation of marginal

likelihoods, may provide important insight into which models work better. However, we cannot be

sure that our likelihood estimates are correct. Furthermore, even our best computational models

are probably at best approximately correct. So in addition to likelihood checks, it is important

to also check how stable our inferences are to different choices, especially when those inferences feed into further linguistic analyses. In Sicoli and Holton's case, the relevant inferred variable is the topology, but in other studies it may be something else, such as the proto-language's age in calendar years.

When our variable of interest is not inferred stably across different settings, this must be reported.

Care and caution should be taken when basing one's interpretation on a value that only comes out in a subset of analytical settings: this is often a sign that our data are not sufficient for deciding the question, and that the appearance of certainty in any single analysis is created not by the certainty of our linguistic evidence, but by the narrowness of our (sometimes implicit) assumptions.

Check whether the inferred results agree with external knowledge

Computational phylogenetics is good at drawing mathematically precise inferences from large

amounts of data that cannot be efficiently processed by a human analyst. However, it suffers from

two drawbacks. First, it can only work with formalized and relatively uniform data. Its strength is

in numbers. This differentiates computational phylogenetics from the comparative method, which

in some cases is able to draw categorical inferences about one-of-a-kind language-change events.

Second, we are not yet at the stage where we know for sure which, if any, computational phylogenetic

models fit the actual language-change processes well enough. We also do not know what amount

of language-family history is at least theoretically identifiable using current phylogenetic methods.

This does not mean such methods should be abandoned: it is only by studying many test cases that

we will be able to understand their value better. This is similar to how the value, and the limitations,

of the comparative method were understood as the result of many decades of research. However,

taken together, the two drawbacks mean that we cannot take phylogenetic results for granted even

when they have been meticulously cross-validated against small changes in the dataset, choice of

different priors, and other computational-phylogenetic factors.

Fortunately, historical-linguistic research often provides us with enough information to apply

sanity checks to computational phylogenetic results. Section 3 above is an example of that. In the

Dene-Yeniseian case, we can reject Sicoli and Holton's original phylogenetic results with certainty

because they are contradicted by a much more substantial body of evidence employed by historical

linguists studying the families in question. Similar checks need to be applied whenever possible,

which would usually require collaboration with historical linguists specializing in the families of

interest.

Conversely, the lack of such checks leads to peril: for example, [Bouckaert et al., 2012] report

strong support for the Anatolian homeland of the Indo-European language family based on phylo-

genetic geographical inference of the positions of proto-languages. The study was based on innova-

tive and a priori sensible statistical methodology, and the authors reported a sanity check of their

method using the Italic subfamily, whose proto-language was correctly inferred to have been spoken

in Italy. But for other Indo-European subfamilies, the results of their inference were obviously off

(unknown to the authors themselves): for example, no historical Iranian languages were inferred

to ever have existed in the Pontic and Siberian steppes, while historical sources are clear about the

Iranians' presence there. Given the method's failure to correctly infer the geographical positions of relatively recent historical Iranian languages, its geographical inferences about the much more temporally distant Indo-European homeland are without merit. Without thorough sanity checks, even the best analytic procedures may lead to obviously incorrect conclusions.

Of course, sometimes accepted positions among specialists may turn out to be wrong, and it is

possible that a computational phylogenetic analysis brings new insight into the history. However,


when such an analysis conflicts with the accepted, or at least prominent, positions by the specialists,

this should be explicitly reported. Moreover, one should attempt to weigh against each other the

evidence for the accepted position vs. the evidence on which the phylogenetic result was based. For

example, in our Dene-Yeniseian case, the accepted position was based on large amounts of data

demonstrating the genealogical affinity of the Athabaskan languages, while Sicoli and Holton's result

was obtained from a very small dataset and was not even stable to the choice of the tree prior.

This makes it obvious which position is to be preferred. In other cases, the choice may be more

difficult. This, however, is not different from many controversies in historical linguistics: it is often

the case that each competing position may cite considerable positive evidence in its favor, though

they obviously cannot be true together. In such cases, there is no reason to dismiss computational

phylogenetic results as a priori less reliable. Equally, there is no reason to assume that they should

be more accurate than those established with the help of classical historical-linguistic approaches.

Supplementary Materials

Online Appendix A: Annotated output of MrBayes's settings

Online Appendix B: Tree priors

Online Appendix C: Posterior probabilities or marginal likelihoods?

Supplementary Materials 1: MrBayes command files, logs with MrBayes's output, consensus trees stemming from MrBayes's analyses.

References

[Aldous, 2001] Aldous, D. J. (2001). Stochastic models and descriptive statistics for phylogenetic trees, from Yule to

today. Statistical Science, 16(1):2334.

[Altekar et al., 2004] Altekar, G., Dwarkadas, S., Huelsenbeck, J. P., and Ronquist, F. (2004). Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics, 20(3):407–415.

[Bouckaert et al., 2012] Bouckaert, R., Lemey, P., Dunn, M., Greenhill, S. J., Alekseyenko, A. V., Drummond, A. J., Gray, R. D., Suchard, M. A., and Atkinson, Q. D. (2012). Mapping the origins and expansion of the Indo-European language family. Science, 337:957–960.

[Campbell, 2011] Campbell, L. (2011). Review of The Dene-Yeniseian connection. International Journal of American Linguistics, 77(3):445–451.

[Clark et al., 2009] Clark, P. U., Dyke, A. S., Shakun, J. D., Carlson, A. E., Clark, J., Wohlfarth, B., Mitrovica, J. X., Hostetler, S. W., and McCabe, A. M. (2009). The Last Glacial Maximum. Science, 325(5941):710–714.

[Gernhard, 2008] Gernhard, T. (2008). The conditioned reconstructed process. Journal of Theoretical Biology, 253(4):769–778.

[Hoffecker et al., 2016] Hoffecker, J. F., Elias, S. A., O'Rourke, D. H., Scott, G. R., and Bigelow, N. H. (2016). Beringia and the global dispersal of modern humans. Evolutionary Anthropology: Issues, News, and Reviews, 25(2):64–78.

[Kassian, 2016] Kassian, A. (2016). Global lexicostatistical database. Na-Dene family: Athapaskan group. Available at starling.rinet.ru.

[Kiparsky, 2015] Kiparsky, P. (2015). New perspectives in historical linguistics. In Bowern, C. and Evans, B., editors, The Routledge handbook of historical linguistics, pages 64–102. Routledge.

[Leer, 2008] Leer, J. (2008). Recent advances in AET comparison. Available at http://www.uaf.edu/anla/collections/search/resultDetail.xml?id=CA965L2008b.

[Neal, 2008] Neal, R. (2008). The harmonic mean of the likelihood: Worst Monte Carlo method ever. University of Toronto. https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/.

[Nichols, 2008] Nichols, J. (2008). Language spread rates and prehistoric American migration rates. Current Anthropology, 49(6):1109–1117.

[Nikolaev, 2014] Nikolaev, S. (2014). Toward the reconstruction of Proto-Na-Dene. Journal of Language Relationship, 11:103–123.

[Potter, 2011] Potter, B. A. (2011). Archaeological patterning in Northeast Asia and Northwest North America: An examination of the Dene-Yeniseian hypothesis. In Kari, J., Potter, B. A., and Vajda, E., editors, The Dene-Yeniseian Connection, pages 138–167. ANLC, Fairbanks.

[Raftery et al., 2007] Raftery, A. E., Newton, M. A., Satagopan, J. M., and Krivitsky, P. N. (2007). Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. In Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M., and West, M., editors, Bayesian Statistics 8, pages 1–45. Oxford University Press.

[Ritchie et al., 2017] Ritchie, A. M., Lo, N., and Ho, S. Y. W. (2017). The impact of the tree prior on molecular dating of data sets containing a mixture of inter- and intraspecies sampling. Systematic Biology, 66(3):413–425.

[Ronquist et al., 2012a] Ronquist, F., Klopfstein, S., Vilhelmsen, L., Schulmeister, S., Murray, B. L., and Rasnitsyn, A. P. (2012a). A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Systematic Biology, 61(6):973–999.

[Ronquist et al., 2012b] Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A., and Huelsenbeck, J. P. (2012b). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology, 61(3):539–542.

[Ronquist et al., 2009] Ronquist, F., van der Mark, P., and Huelsenbeck, J. P. (2009). Bayesian phylogenetic analysis using MrBayes: theory and practice. In The Phylogenetic Handbook, pages 210–266. Cambridge University Press, 2nd edition.

[Sicoli and Holton, 2014] Sicoli, M. A. and Holton, G. (2014). Linguistic phylogenies support back-migration from Beringia to Asia. PLoS ONE, 9(3):e91722.

[Skoglund and Reich, 2016] Skoglund, P. and Reich, D. (2016). A genomic view of the peopling of the Americas. Current Opinion in Genetics and Development, 41:27–35.

[Starostin, 2012] Starostin, G. (2012). Dene-Yeniseian: a critical assessment. Journal of Language Relationship, 8:117–138.

[Starostin, 2013] Starostin, G. (2013). Global lexicostatistical database. Yeniseian family. Available at starling.rinet.ru.

[Thorne and Kishino, 2002] Thorne, J. L. and Kishino, H. (2002). Divergence time and evolutionary rate estimation with multilocus data. Systematic Biology, 51(5):689–702.

[Vajda, 2011] Vajda, E. (2011). A Siberian link with Na-Dene languages. In Kari, J., Potter, B. A., and Vajda, E., editors, The Dene-Yeniseian Connection, pages 33–99. ANLC, Fairbanks.

[Vajda, 2012] Vajda, E. (2012). The Dene-Yeniseian connection: a reply to G. Starostin. Journal of Language Relationship, 8:138–152.

[Vajda, 2013] Vajda, E. (2013). Vestigial possessive morphology in Na-Dene and Yeniseian. In Hargus, S., Vajda, E., and Hieber, D., editors, Working Papers in Athabaskan (Dene) Languages 2012, volume 11 of Alaska Native Language Center Working Papers. ANLC, Fairbanks.

[Watson, 2017] Watson, T. (2017). Is theory about peopling of the Americas a bridge too far? [news feature]. Proceedings of the National Academy of Sciences, 114(22):5554–5557.

[Xie et al., 2011] Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.
