Non-arguments about non-universals
G Longobardi & I Roberts
1. Introduction
Dunn et al (2011) apply computational phylogenetic techniques to cross-linguistic data taken from the World Atlas of Language Structures (Haspelmath et al 2008; WALS) and claim that the results of their analysis show that (i) “contrary to the generative account of parameter setting ... the evolution of only a few word order features of languages are strongly correlated” (1) and (ii) “contrary to the Greenbergian generalizations, ... most observed functional dependencies between traits are lineage-specific rather than universal tendencies” (1). They conclude that their “findings support the view that ... cultural evolution is the primary factor that determines linguistic structure”.

Dunn et al’s work has two merits. First, it illustrates the potential importance of increasingly adopting quantitative techniques in comparative and historical linguistics, as advocated e.g. by McMahon and McMahon (2005). Second, it brings to prominence a version of what Gianollo, Guardiano and Longobardi (2008) referred to (and began to address) as “Humboldt’s problem”: the question of the extent to which cross-grammatical generalisations can genuinely be made independently of historical factors, notably genetic affiliation, and vice versa.

We unreservedly share what might be taken as one of their ultimate goals: making cultural history into a science, and a quantitative one, at least no less than natural history is. However, we believe that the work is seriously flawed in a number of respects and fails to approximate such a goal. First, the database is too small and too superficial to permit any reliable conclusions to be drawn.
Second, the conclusion that grammatical structure reflects cultural history to a large extent (perhaps no less than vocabulary, which rather obviously does) may very well be correct; but, mainly as a consequence of the previous point, it is poorly supported by their data and can be much more strongly and unexpectedly corroborated by a phylogenetic analysis based on quantitatively wider and qualitatively deeper evidence: indeed Longobardi and Guardiano (2009) have argued for precisely this conclusion in much more detail on the grounds of parameters of generative syntax. Third, and most importantly, their conclusions in (i-ii) simply do not logically follow from their data or arguments: neither the Chomskyan nor the Greenbergian programme for the study of universals need necessarily be affected by the results reported in the article.

Moreover, the role of a completely undefined notion of “cultural evolution” in explaining any aspect of universals or language change remains entirely occult, beyond the truism of “the current state of a linguistic system shaping and constraining future states” (1), which is true of all known physical, biological and cultural systems at all times (except perhaps at the level of subatomic particles).

Together, these weaknesses combine to render the claims made either invalid or largely without substance. We now discuss these weaknesses in more detail.
2. The database
As mentioned above, the database used by Dunn et al is too small and too shallow to warrant reliable conclusions. First, the database is too small in that a mere eight characters were chosen (all surface word-order features: subject-verb, numeral-noun, adjective-noun, demonstrative-noun, genitive-noun, relative clause-noun, object-verb and adposition-noun). Second, although the choice of characters based on attested word-order variation is discussed and justified, the specific choice of these eight is not. The choice is in fact rather questionable: Dryer (1992) showed that adjective-noun probably does not correlate with other orders, contrary to what was earlier assumed by Greenberg (1966) and Hawkins (1983); relative-clause positioning in relation to nouns is known to show a much stronger "head-initial" (noun before relative clause) tendency than other orders (Hawkins 1994); it is not generally thought that subject-verb order correlates with other properties (although Dryer (1992) argues that it does). From the point of view of structural parameters, these pairs form a rather non-homogeneous set: only verb-object and adposition-noun are uncontroversially agreed to enter into the kind of relation (head-complement) that may be subject to “hard-wired parametrisation” in the sense of generative grammar: although some recent theories might (controversially) assimilate numeral-noun, demonstrative-noun and some cases of genitive-noun to this relation, no theory would assimilate subject-verb and adjective-noun to this relation. Furthermore, at the very least two classes of cardinal numerals (nominal and adjectival: Zweig 2005) must be distinguished, as well as two types of postnominal demonstrative positions (Guardiano 2010, Roberts 2011), and three distributionally very different genitives (Longobardi 2001).
Hence, from the perspective of almost all up-to-date theories of word-order variation, no trivial correlations are expected among many of these pairs.

Finally and more generally, the testing ground is too superficial in that the actual assignment of languages to given orders is dubious (to some degree, this is due to the uneven quality of the data in WALS, which can only be as good as its extremely heterogeneous sources and the necessarily oversimplifying analytical categories used). For example, merely looking at the more familiar Indo-European languages in their Figure 1, one sees "polymorphic states" assigned to verb-object order in Ancient Greek, Latin, Old English, German, Dutch, Afrikaans, Flemish and Frisian. For the Modern Germanic languages, this reflects the fact that the tensed verb form appears in second position (frequently, but by no means necessarily, before the object) in assertive main clauses and in final position in subordinate clauses introduced by a complementiser: however, in all generative work since Koster (1975), the view has been that the verb's position in main clauses is a "derived" one, not reflecting the true underlying order, which is always verb-final. Perhaps someone could dismiss this analysis as too abstract an artifice of generative theory, but note in this connection that even the application of Bayesian statistics, as seems accepted by the authors, is a form of equally abstract analysis; therefore, it is entirely unclear why this should be accepted while the consensus of almost 40 years' work in the syntax of the Germanic languages by native speakers expert in syntactic theory should be disregarded. In any case, if this were so, we would meet the same kind of methodologically narrow attitude as that which underlies many of the claims recently made by Evans & Levinson (2009): if you refuse to look beyond surface phenomena you may miss
hidden generalisations of some significance. For example, even a superficial description should admit that the purely lexical forms of the verb (i.e. those not synthetically combined with Tense) consistently occur after the object in, say, German: now, as pointed out by Haider (2010), it is precisely this property that should be considered as ‘basic’ in that it correlates with the same further word order properties as one finds in less controversial (i.e. without verb fronting) OV languages like Japanese. For Old English, the same point regarding verb-position holds, although there are further complications partly due to the impoverished database; however, here too the (less uniform) consensus is that the language, at least in the earlier Alfredian period, was verb-final. The same has been argued in a series of recent, theoretically informed, empirically detailed studies of Latin word order (see Salvi 2004, Devine & Stephens 2006, and, in particular, Ledgeway forthcoming). Ancient Greek is a less clear case, although the important work by Taylor (1990) shows that Homeric Greek was verb-final while Classical and New Testament Greek were VO. If such doubt can be cast on the reliability of the basic data in the case of the languages that are well-known and well-studied, how much faith can we have in the claims about the other orders in the other languages?

In short, the well-known principle of “garbage in, garbage out” may apply here. The set of characters chosen is small, arbitrary and non-uniform and the data reported regarding the states of those characters is of dubious quality. Hence, no firm conclusions can be drawn from any computational treatment of this data, sophisticated though it may be.
3. Humboldt’s problem
The question of the nature or even the existence of structural traits (a kind of ) which are similar across languages, though not attributable to a common historical source, but rather to some universal ‘type’, is a venerable one (first explicitly raised, as far as we know, by Humboldt (cf. Morpurgo-Davies 1998, ch. 5, especially fn. 5)). Dunn et al are to be credited for making one more attempt to re-address this issue and answer it. However, for the reasons just given, we believe that their results must be considered with scepticism. Drawing an analogy with genetics here, one could claim that the surface data Dunn et al use are really akin to the kind of surface observations of phenotypical features of peas that Mendel used, or to shades of skin colour as were once employed in rough and evolutionarily doubtful classifications of humans. We now have available much deeper and more fine-grained insights on diversity provided by formal grammar and the best parametric approaches.

In a series of recent papers by Longobardi and various collaborators (from Longobardi 2003 through Longobardi and Guardiano 2009, Bortolussi et al 2011), guided precisely by the model of taxonomic genetics, a version of this problem has been re-addressed ‘from the other end’ (not unlike Dunn et al in this respect, see below), i.e. trying to measure formally how much grammatical structure encodes an historical signal. This work demonstrates how a close analysis of a limited, but coherent, module of cross-linguistic syntax (roughly, the “nominal group”) can yield much more detailed data and, when coupled with computational phylogenetic techniques and statistical evaluations, real (i.e. independently known to be correct) insights into language history. The phylogenies based on parameter values for the

