Towards Establishing A New Basic Vocabulary List (Swadesh List) (Version 1) Carsten Peust, 2010

Towards establishing a new basic vocabulary list (Swadesh list)
(Version 1) Carsten Peust, 2010
Abstract
Basic vocabulary lists are an important tool in comparative and historical linguistics. They provide the base for estimating the time depth of language families by the technique of glottochronology1 or by other statistical methods.2 The lists are composed of vocabulary that is intended to be as stable as possible diachronically. Several such lists have been proposed, the most famous ones being those by Morris Swadesh.3 Swadesh did not elaborate on how and why he made exactly this selection of words.4 Swadeshs original lists have not convinced everyone, so that various modifications of his lists were proposed by others.5 Also in all of the modified lists, the choice of items has either not been justified at all or can be criticized for methodological reasons. I am proposing here a methodology as well as (limited) empirical data for ranking meanings according to their diachronic stability, in order to construct a revised basic vocabulary list. As a result, I propose a list of 57 items at the end of this paper. More empirical data can and should be added in the future in order to further improve on the list.
Lexical stability
A basic vocabulary list is defined by a set of meanings in a meta-language (such as English). Based on this template list, corresponding lists can be established for any language by translating each meaning into the target language. In the translation, according to Swadesh, the most frequent and most basic or general term of the target language has to be chosen. 6 The glottochronological method then involves counting the cognate terms in the basic vocabulary lists of two languages or of two diachronic stages of a single language. Unter the assumption that the average replacement rate per time is largely language independent for a given list of meanings, the cognate count allows for an estimation of the time distance between both languages. While in order to calibrate the glottochronological model, i.e. to determine the replacement rate per time for a given vocabulary list, languages should be chosen whose history is well known and which allow for good cognacy judgments, cognacy will be harder to judge, and perhaps rely only on sound similarity, when the method is applied on languages with no attested history or on long distance relationships. When the observed cognacy rate falls below a critical level, it may therefore become indistinguishable from random similarities between unrelated languages, so that the glottochronological method can no longer be applied. As the list is composed of more stable items, the limit for the applicability of glottochronology can be pushed further into the past. Diachronic stability of a term in a language during a certain time interval means that the most frequent and basic term for the given meaning is not replaced by any possible competitor term in that interval. 7 On the other hand, a replacement of a term takes place when a competitor term raises its frequency and generalizes its meaning to the degree that it in turn becomes the primary term for the given meaning. The competitor term may be either a native term with an originally different meaning or a loan word from another language. It can be assumed that two factors in particular contribute to the stability of a term: (1) Frequency. A term that is itself frequent is difficult to challenge, in terms of frequency, by a competitor term. In addition, frequent terms are firmly rooted in the memory of the speakers, and known to all speakers of the language community, which favours their stability.8 1 A method developed by Morris Swadesh which tries to measure the degree of language relationship based on the hypothesis that the lexical replacement rate of a given vocabulary list is approximately constant for all languages and ages. 2 A basic vocabulary list (in that case, Swadeshs 200-item list) also served as the basis for the statistical approach to decide upon language relationship developed by Kessler (2001). 3 Swadesh (1955). 4 Swadesh appearently selected items for his lists by a combination of intuition and experience (...). Swadesh calculated a percent persistence factor for each item, based on eight old-world languages, but these factors were not used in deciding what items to keep and what to drop (...) (Oswalt 1971: 422). 5 E.g. by Bender (1983: 266-281), Dolgopolsky (1986: 34f.), Elbert (1953: 150f.), Halayqa (2007), Holman (2008 et al.), Starostin (2000: 257 note 25), Tadmor (2009: 68-75), Woodward (1993: 17) and Yakhontov (cited in Starostin 1991: 59f.). 6 The choice of the best term for a meaning in a given language can, of course, sometimes be disputable, which forms one of the major points of criticism on Swadeshs use of vocabulary lists. While uncertainty about the most adequate translation adds some statistical noise on the results, it does not, in my view, invalidate the glottochronological method in any fundamental way. 7 The gradual phonetic evolution, which all words of a language continually undergo, does not count as a replacement. Also expansions of a term by affixes are not normally counted as a replacement. 8 This relationship is widely acknowledged, cf. e.g. Dyen (1960: 37): ... it is reasonable to suppose that the more common a word is, the less likely it is to be replaced; van Hout & Muysken (1994: 53): The more frequent a word in the Quechua data base, the less the chance that it is Spanish. This suggests indirectly that frequency in the recipient language may operate as an inhibiting factor [for borrowing, C.P.]; Tadmor (2009: 74): It seems logical that frequently used words would also be highly resistant to borrowing, because more time and effort would be needed for the borrowing to become established. A recent study which conforms the correlation for Indo-European languages by statistical methods is Pagel et al. (2007). 1
(2) Semantic distinctness. A term whose meaning is unsharp and highly conventional is apt to change more easily than a term whose meaning is clear-cut and expresses a concept that exists (more or less) a-priori. This is the reason why more nouns than verbs can be found among the most stable lexical items. The world of nouns often reflects notional concepts which have a more or less a-priori existence, whereas the world of verbal ideas often contains concepts whose definitions are more vague and arbitrary.9 As a result, the most stable lexical items should be such that are both frequent and stand for concepts with clear-cut meanings. (3) In addition to these language independent factors, there can be factors specific to a word in a given language which influence its prospects of remaining stable. If a word happens to be in some respect special, e.g. because it has an irregular inflexion, or if either through shortness or through accidental similarity it is in danger of homonymic clash with other items, the pressure will be high for it to be replaced in many daughter languages even if the meaning itself is a stable one.
Selection of language couples

It is evident that the stability of a meaning can only be determined empirically. 10 To this purpose, I use a data table which indicates for several candidate meanings how many cognates they share in a number of language couples. I pose three requirements on the selection of the language couples: (1) All the couples are independent from each other, (2) both languages of the couple are actually attested languages, (3) the languages of the couple have a well-known history so that (relatively) safe cognacy judgments are possible. A fourth potential requirement could be that the chosen language couples should be genetically and geographically diverse. I believe that this requirement, which in practice often contradicts requirement (3), is of lesser importance under the assumption that the glottochronological hypothesis of a language-independent replacement rate, as assumed by Swadesh, is correct. There have been studies where, as I do here, meanings were ranked according to the cognate preservation count in a number of language couples.11 In all studies I am aware of, however, the couples were chosen so that the three requirements mentioned above were not all met, particularly not the first one. The former studies typically used data from several interrelated couples out of a single genetic stock. I believe that this can seriously flaw the results. The independence requirement is important for at least two reasons. First, a word can be instable in a language for a language specific reason (as explained under 3 in the preceding section), so that it is at risk to be replaced in many daughter languages even though the semantic concept as such is a stable one. Second, some of the daughter languages may form an unrecognized subgroup within the language family. If a word happened to get lost in the proto-language of that subgroup, it would appear to be missing in all these daughter languages although historically only a single loss took place.
Selection of lexical entries

The lines of the table contain the candidate meanings. These are 174 meanings for which I considered it possible that they might end up in a reasonable basic vocabulary list. The candidate list includes almost all members of Swadeshs 100-item list with the exception of claw, which I replaced by (finger)nail 12, and to walk, which I replaced by to go 13, as well as several items selected from competing basic vocabulary lists. I have also put to test some words which Swadesh rejected as being cultural vocabulary (e.g. brother, house).
Cognate judgements
Entries are considered cognates if they are etymologically identical at least for their greater part (I accept different suffixes or compounding with another element, provided that there is a substantial common part). The symbols ] and [ indicate prefixed or suffixed additional material. Although I have attempted to select language couples whose mutual historical relationship is relatively well-known, the judgment on the cognacy of words is not always straightforward, and I have certainly not been able to 9 To give just one example, the borderline between meanings such as man and woman, or between dog and cat, has a higher a-priori reality than the borderline between to go and related meanings such as to run, to come, to move, etc. 10 It might become possible in the future to predict the stability of a meaning from, e.g., its frequency and its semantic distinctness, but there is so far no known way of measuring the latter. Frequency would, again, have to be measured empirically. 11 Dolgopolsky (1986); Dyen (1964: 242f.); Dyen & James & Cole (1975: 185f.); Holman et al. (2008); Oswalt (1971); Swadesh (1955: 133-137); probably also Lohr (1998), which was not accessible to me. Tadmor (2009: 68-75) provides a ranked 100-item basic vocabulary list which was created on a large statistical basis but considers diachronic stability only as one among several criteria. 12 Both are synonyms in many languages, but in case of divergence I decided to prefer the human term, as is generally so for the other body part terms of the Swadesh list. 13 As other users of the Swadesh lists have already done, because to walk has no obvious elementary translation in many languages. 2
eliminate errors completely. Apart from uncertainty about the linguistic history of the word, the judgment can be a matter of definition even where we are informed perfectly. I have adopted the following principles: (1) When one of both languages has borrowed a term directly from the other, the terms are considered non-cognate.14 (2) When both languages borrowed their terms independently from a third language, they are considered non-cognate.15 (3) When language A borrowed a term from C where again it is cognate to the term of B, the terms of A and B are considered non-cognate.16 (4) When both A and B borrowed a word from a third source C so early that the borrowing may well have taken place in the common ancestor of A and B, the terms are considered cognate17.
Ranking the items and extracting a basic vocabulary list

Based on the cognate counts of the list, the meanings can be ranked according to their diachronic stability. My measure of the stability of a meaning is simply the number of language couples within my sample that preserve it as cognates. This measure makes sense although the couples differ in their degrees of relationship: Some of them are related much more closely (e.g. English/German) than others (e.g. Finnish/Hungarian), as can be seen in the cognate summations at the bottom of the table. Anyway, one can assume that a meaning with a higher count is always likely to be more stable than a meaning with a lower count, irrespectively of which individual couples contribute to the counts. An intuitive argument for this could be the following: In many cases, a meaning will show up as a cognate in a close couple but not so in a more distant couple. If we encounter, for another meaning, the opposite case, namely the preservation as a cognate only in the distant couple but not in the close couple, one could argue either that this latter meaning is more stable (since it was preserved even in the distant couple) but also that it is less stable (since it was lost even from the close couple) than the first meaning. A more formal proof could look as follows: Proof: Under current glottochronological assumptions, for any concept w there will be a fixed probability p(w) for it to survive over a given time interval, say a millennium. Given a language couple l separated by m millennia and a word list w1, w2, w3, ..., the expected number of surviving cognates C(l) will be p(w1)m + p(w2)m + p(w3)m + ... . Given another couple l' separated by m' millennia, we expect C(l') = p(w1)m' + p(w2)m' + p(w3)m' + ... cognates. It is obvious that observed cognate counts C(l') > C(l) imply that m' < m (and vice versa), irrespectively of which individual cognates contribute to the counts. Once the meanings have been ranked, an n-item list can be extracted by selecting the top n items from the list. There is a tradeoff between the desire to have a high average stability on the one hand and to have a long list (to reduce statistical noise in the application of the list) on the other. There is no known way of how to ideally balance these competing desires, and the purpose for which the list is going to be used may be relevant here as well. In any case, it must be emphasized that the items of any list will not all have the same degree of stability,18 so that any stability rate that can be estimated for a given list is only an average value over all list items.
The data table

First column: Description of the word meaning Second column: Indicates for a number of important basic vocabulary lists whether the given word was included in them: 1 = Swadesh 100-item list; 2 = Swadesh 215-item list (both in Swadesh 1955); B = first 100item list by Bender (1983: 266ff.); D = 15-item list by Dolgopolsky (1986: 34f.); H = 40-item list by Holman et al. (2008); S = 55-item list by Starostin (2000: 257 note 25); T = 100-item list by Tadmor (2009: 6875); Y = 35-item list by Yakhontov cited from Starostin (1991: 59f.). Cells are marked by when either the entries would be non-cognate, or when one of the languages lacks an obvious unmarked word for the meaning. This is work in progress. I intend to add more language couples to the list in order to expand the empirical basis in the future. 14 Among the language couples chosen here, this situation arises particularly often for Hindi which has borrowed a lot of words, including basic vocabulary, from Persian. 15 E.g. English round and German rund, both from Old French. 16 E.g. English flower < French fleur = German Blume, or Amharic gur (older gwr) hair < Cushitic and here probably related to Hebrew sear. A borderline case, which I likewise count as non-cognate, is Engl. fruit < French fruit < Latin fructus and German Frucht < Latin fructus. 17 E.g. Irish clmh = Welsh plu feather, both from Latin pluma, or Finnish sata = Hungarian szz hundred, both from an early Indo-European language (cf. Sanskrit atam). 18 As was clear already to Swadesh (1952: 457): A stability score for individual items could be calculated, and this score taken into account in constructing [an] improved test list. 3
gloss
presence in Amharic = previous Modern lists Hebrew19 12B T 12BT 2T 2 12BS hullu=kol qrfit=klipa
Bahasa Indonesia = Malagasy20
Bulgarian = Latvian21
Egyptian (Old Kingdom) = Bohairic Coptic22 nb=nib[en ni=ini s3=soi msi=misi kmm=khame znf=snof qs=kas mn=mnot sn=son
English = German
Finnish = Hungarian
French = Romanian
Hindi = Persian Irish = Welsh23
Oromo = Somali24
Turkish = Yakut25
all ant ash(es) to ask back (of body) bad bark (of tree)
vski=viss ppel=pelni
all=alle ant=Ameise ashes=Asche
tout=tot
uile=holl luaith=lludw
hamaa=xun (xum-)
kl=kl sr=tr
fourmi=furnic ceindre=cenu amer=amar noir=negru sang=snge os=os sein=sn frre=frate
ber]tanya=man on]tny kulit=hodi[kz o26
ph=porsdan ph=pot janm=zydan bh=berdar rsc=rhisgl bolg=bol mr=mawr an=edn searbh=chwer w dubh=du
to bear / to give birth belly big bird to bite bitter black blood bone breast28 brother 19 20 21 22 23 24 25 26 27 28 12BS 12BST 12BT 12BT T 12BST
wlld=yalad
bear=ge]bren bite=beien bitter=bitter blood=Blut breast=Brust veri=vr
ala=dhal
simbirroo=shi mbir iniina=qaniin
burung=vrona
nkks=naax menggigit=man ikitra mrara=mar darah=ra

27
brat=brlis
haaa=qadha ac=ah adh kara=xara kan=xaan
12BHSTY dm=dam 12BHSTY ant=tsem 1BHT 2
iiga=dhiig
lafee=laf
tulang=tolana
brother=Brude
de]arthir=bra
Cf. Leslau (1969) who compared the same pair of languages. My transcription of Hebrew reflects the modern Israeli pronunciation. I consider the Austronesian Basic Vocabulary Database (http://language.psy.auckland.ac.nz/austronesian/). I consider the Latvian Swadesh list with etymological annotations by Holst (2001: 213-222). Data from personal knowledge. I cite both languages in their conventional transliterations which, as should be noted, must not be taken as a phonological rendering. In fact, the conventional transliteration of Egyptian suggest a greater phonetic similarity to Coptic than was actually the case (note in particular that 3 = /r/, = //, = /x/). In the few cases where the meaning is not yet attested in sources from the Old Kingdom, I have supplied words used in the Middle Kingdom (leaf, mouse, root, tear). I consider Lucht (2007). Somali is given in its standard orthography (note in particular c = //, dh = //, x = //), Oromo in a common orientalistic transcription. Turkish word choices largely adopted from Kessler (2001). I use a transcription of Yakut close to the orthography of modern Turkish. Basically the same cognate pair as for skin. Correspondence Indones. d = Malag. r as in leaf, two, winter. In case of conflict I prefer words for female breast(s). 4
r to burn (intr.) to carry child cloud cold to come to cut to die / dead to do dog to drink dry ear earth / soil to eat egg eight eye to fall / to drop far fat / grease 12BST T 2T 12B 12B 12BHT 2B 12BDHS T l=yled mot=met awan=rhona hari=ndro30 mati=mty tanah=tny nsja=nest den=diena u]mram=mirt suh=saus uh=auss zemj=zeme jgp=hpi qbb=khbob jwi=i =t hrw=ehoou mwt=mou jri=iri zwr=s w=oui msr=ma wnm=oum sw.t=souhi burn=brennen child=Kind cold=kalt
29
wd pilvi=felh venir=veni j[our=zi mourir=muri faire=face chien=cine boire=bea oreille=ureche karn=kardan skh=xok far=oer
31
guba[a=gub o bulut=blt
qabbanaa=qab ow gel=kel gn=kn l=l i=is kuru=kuraan kulak=kulgaax ye=sie
come=kommen day=Tag do=tun drink=trinken dry=trocken ear=Ohr earth=Erde eat=essen egg=Ei eight=acht eye=Auge fall=fallen fat=Fett kuolla=hal tehd=tesz juoda=iszik (iv-)32 syd=eszik33 silm=szem
day(=not night) 2
marn=mordan
danaim =gwn eud cluas=clust ubh=wy ocht=wyth gogaa=engeg
12BHSTY 12BHT 12BS 12T 12BST 12BSTY 2
12BHSTY
makan=mihna jam(jad-) na =st(d-) telur=atdy34 sem=astoi ok=acs
manger=mnca uf=ou huit=opt il=ochi h=hat dr36=dr
hanqaaquu=ug yumurta=smt ax35 saddeet=siddee sekiz=as d ija=il fagoo=fog d=ts ya=sa
smmnt=mon e mata=mso
mnw=mn

12BDHST ayn=yin Y 2T 2T 12B ruq=raxok
graisse=grsim
29 This etymology is not generally accepted, but I consider it to be true in view of identical semantics, gender and plural formation, with only an unexplained n~l-variation. 30 Despite some uncertainties I consider it probable that these words are cognate. The initial is as in liver or rain, Malagasy -ndr- can be the reflex of a former *-r- following an -n- (which is missing from the Indonesian form), cf. to spit for a similar situation. Tagalog raw day is probably related as well. 31 Old Irish do-gn-. 32 Root *ju-. 33 Root *sev-. 34 Correspondence Indones. l = Malag. d as in five, skin. 35 Probably cognate although the sound correspondences are not entirely clear. The form anqoqho egg of Gz seems to be a borrowing from a related older Cushitic language. 36 This word could formally be a borrowing from Persian, but it is common in most Indo-Aryan languages and thus probably inherited. 5
e father 2 abbat=av jtj=it father=Vater doigt=deget feu=foc poisson=pete pit=pedar ugl=angot mahl=mh pn=panj clmh=plu tine=tn cig=pump blth=blodyn abbaa=aabbe baalle=baal to fear/be afraid 2 feather finger fire fish five flower fly (animal) to fly foot four fruit full to give to go good grass green hair (of head) hand head to hear heart 37 38 39 40 41 42 43 12B 12BHTY 2 2 BT 12 12BST 2 1HSY 12BTY BT 12BT 2B 12 12BST 12BHTY 12BS 12BHT 12BD takut=ma]thot boj=baidties ra bulu=volo[mb rona38 api=fo prst=pirksts gn=uguns pet=pieci fear=frchten37 pelt=fl feather=Feder finger=Finger fire=Feuer fish=Fisch five=fnf fly=Fliege fly=fliegen foot=Fu four=vier full=voll give=geben42 go=gehen good=gut grass=Gras green=grn hair=Haar hand=Hand hear=hren heart=Herz kala=hal
at39=etsba
b=tb
djw=tiou
12BHSTY sat=e
qurummii=kall balk=balk uun40 an=shan be=bies ayak=atax drt=trt dolu=toloru ver=bier ot=ot el=ilii ba=bas iit=ihit yrek=srex
ammst=xame lima=dmy zmb=zvuv aratt=arba fre=pri mulu=male =yad ras=ro smma=ama lbb=lev
viisi (viite-)=t cinq=cinci nelj=ngy antaa=ad menn=megy ksi=kz p=fej kuulla=hall sydn=szv fleur=floare
bunga=voni[nk zo lalat=llitra empat=fatra mux=mua letj=lidot tiri=etri
ff=af
fdw=ftoou m=meh ri=ti m=e smw=sim w3=ouotouet sm=stem
mouche=musc makkh=magas cuileog=cleren titiisa=diqsi41 pied=picior quatre=patru fruit=fruct plein=plin donner=da bon=bun herbe=iarb vert=verde main=mn pair=p r=ahr pr=por dn=ddan hth=dast43 sir=sar
ceathair=pedw afur=afar ar ln=llawn far=gwair glas=glas lmh=llaw ceann=pen mataa=madax onnee=wadne
buah=voa[nkz o penuh=fno hijau=m]itso plen=pilns dvam=dot zeln=za glav=galva src=sirds
tangan=tnana rk=roka
cluinim=clywed
3tj=ht
Assuming that there is a connection between the Germanic roots *fr- and *furh-t-, which is not uncontroversial. Lit. hair of bird; volo in isolation changed its meaning to hair. From Gz bat. Somali has kalluum- in derivatives. The geminate -ll- points to an original consonant cluster which was probably -l- as still in Sidamo qilime fish. Somali -q- is here a development from *-- (cf. Rendille aassi fly), which was regularly lost in Oromo. The initial ti- of Oromo must be the result of a reduplication. The true English cognate is an earlier English form yive which was reshaped under Scandinavian influence. This is a borderline case which I count as related. Sanskrit hasta-. 6
heavy honey horn house hundred
T 1BHTY T
kbbad=kaved qnd=kren bet=byit mto=mea rab=raev
berat=ma]vsat ra med=medus tanduk=tndro rog=rags ka se]ratus=zto aku=ho sto=simts az=es led=ledus
bj.t=ebi
honey=Honig horn=Horn house=Haus
sarvi=szarv
miel=miere corne=corn faim=foame glace=ghea genou=genunc hi rire=rde coucher=culc a47 foie=ficat long=lung pou=pduche chair=carne lait=lapte lune=lun
sau=sad mai=man
trom=trym mil=ml teach=t cad=can m=mi oighear=i glin=glin
ulfaa[taa=culu ar=ar s gaafa=gees ani=ani[ga jilba=jilib kolfa=qosol46 iisa =jiif[so

48
boynuz=muos yz=ss a=ak ben=min buz=muus ldr=lr45 bil=bil gl=kl yat=st uzun=uhun bit=bt erkek=erkihi et=et ay=y
b=tap
44
n.t=e
hundred=hunde sata=szz rt hunger=Hunge r I=ich ice=Eis knee=Knie laugh=lachen lie=liegen live=leben liver=Leber long=lang louse=Laus man=Mann milk=Milch moon=Mond j=jg el=l maksa=mj ti=tet kuu=hold
hunger / (to be) hungry I ice to kill knee to know to laugh leaf to lie (down) to live/be alive liver long louse man (male) many meat / flesh milk moon 44 45 46 47 48
qr=hko
jnk=anok
12BDHST ne=ani Y 2 12B 1BHST 12BTY 2T 12BHST 12 2 12BHST 12BST 12 12BS 12BST 1BSY saq=tsaxak
membunuh=ma mno daun=ravna hati=ty laki=lehilhy koljno=celis znam=zint
jnn=dnesta n jn=zende ae=afu fear=gr
smja=smieties zb=sbi ivja=dzvot dlg=ilgs g3b.t=bi
duilleog=dalen
n=nx
jf=af jr.t=erti j=joh
eeraa=dheer
injiraan=injir jia=dayax
12BDHTY
banyak=btsak a bulan=vlana
aannan=caano st=t
Although the Egyptian consonant was normally lost by Coptic, there are some instances of preservation as a dental (also in to cut). Causative of to die. Regular sound shift s > f as well as a metathesis of adjacent consonants in Oromo (which still has kofla as a variant). Both have causative meaning: to lay down; the concept to lie is expressed by passive forms of this verb. Oromo has iif- before consonantic suffixes (e.g. iifta), s > f before C being a regular alternation pattern in the language. However, as Somali and other cognate languages show, the original root should be *iif- and the forms in -s- were created by false analogy with verbs of the alternating type. 7
mother mountain mouse mouth (finger)nail / claw49 name narrow navel near (adj.) neck new night nine nose not old54 one to open other
2 12BH 12BST 1BD
kuku=hho pusat=fitra baru=vo hidung=rona

55
mjka=mte nkt=nags not=nakts dvet=devii ne=ne ed]n=viens
mw.t=mau
mother=Mutter mouse=Maus mouth=Mund nail=Nagel name=Name navel=Nabel nea[r52=nahe new=neu night=Nacht nine=neun nose=Nase not=nicht53 old=alt one=ein open=ffnen other=anderer man=Men[sch hiiri=egr suu=szj nimi=nv uusi=j y=j[szaka yksi=egy muu=ms
mt=mdar
luch=llygoden ionga=ewin ainm=enw nua=newydd naoi=naw srn=trwyn n=ni sean=hen aon=un eile=ar]all duine=dyn deas=de
af[aan=af qeensa=ciddi50
trnak=trax
w=tou
pnw=phin r=ro rn=ran g3w=ou hp3=xelpi gr=rh psw=psit r.t=ai jz=ap]as ww=ouai
mont[agne=mu nte souris=oarece ongle=unghie nom=nume troit=strmt nm=nm
fr=tsipor
12BDHTY sm=em 2 T 2S 12BST 12BHST 2 12DT 2T 12BHTY 2B qrb=karov lelit=lyla and =exad
maqaa=magac hanuuraa=xu ndhur boyun=mooy yeni=saa dokuz=tous burun=murun bir=biir a=as yamur=samr kzl=khl
nombril=buric51 nbhi=nf proche=aproap e nouveau=nou nuit=noapte neuf=nou nez=nas vieux=vechi un=un autre=alt homme=om pluie=ploaie rouge=rou nay=now nau=noh nah=na k=yek dsr=digar56
ioo=dhow
sagal=sagaal mirga=midig
12BHSTY addis=xada
12BHSTY
membuka=mam otvrjam=atvr wn=oun ha t hujan=rana mrah=mna kj=ke rm=rme
person / human 12BH being rain red right (side) 12BT 12BT 2
wy.t=moun]h rain=Regen ou
dr=throre red=rot wnm.j=ouinam right=recht
kanan=havna na
droite=dreapt
49 50 51 52 53 54 55 56
As claw in Swadesh. Dialectal Somali also cinji. I assume both words to be cognate despite an irregular correspondence in the initial. From Latin umbilcus, with strong reshapening of the word form in French. Originally a comparative, the base form nigh now being obsolete. Both are independently created compounds from the same original elements *ne + *wiht. I count this as etymological identity. In case of conflict old (of things). The -n- is an irregular compensation of a lost -- (perhaps via *add). Both are derivatives from the word for two. 8
river root round salt sand to say sea to see seed57 seven to sew shadow
2 12BT 1S 2TY 12BT 1BT 2 12BHT 12B 2 2 T
sr=re zr=zra sbatt=va
jalan=llana akar=fka
sol=sls s[me=s[kla sdem=septii
jtrw=iaro mnj.t=nouni
way=Weg salt=Salz sand=Sand say=sagen see=sehen seed=Saat seven=sieben
abhainn=afon
yol=suol tuz=tuus kum=kumax de=die gr=kr
road=path=way 12BH
racine=rdcin rond=rotund sel=sare mer=mare voir=vedea
framh=gwraid hundee=xidid d cruinn=crwn salann=halen seacht=saith scth=cy]sgod canaim=canu arga=arag
m3.t=hmou
j=
d=
sfw=af
semence=sm n sept=apte coudre=coase ombre=umbr court=s]curt chanter=cnta st=haft hy=sye
torba=toddoba yedi=sette dik=tik glge=klk
menjahit=manj ja=t itra enam=nina kulit=hditra sedj=sdt est=ei neb=debess dim=dmi snjag=sniegs
la=tsel
a r=katsar58 sddst=e
shadow=Schatt en short=kurz59 sing=singen sit=sitzen six=sechs kuusi=hat
short (of things) 2S to sing to sit six skin sky to sleep small smoke snake snow to spit 57 58 59 60 61 62 2 12B 2 12BT 2 12B 12BT 12BT 2BS 2 2
gabaabaa=gaa ban otur=olor61 alt=alta deri=tirii uyu=utuy kar=xaar
si=hs msi=hemsi
sjsw=soou p.t=phe
as]seoir=edea six=ase peau=piele ciel=cer dormir=dormi fume=fum serpent=arpe hah=e sn=xbdan dhu=dd
suighim=eisted d60 s=chwech jaa=lix62 craiceann=croe n beag=bach nathair=neidr bofa=mas tuf=tufa
smay=amyi langit=lnitra m tidur=ma]try
sleep=schlafen snow=Schnee spit=spucken63
f3w=hof
meludah=mand pljvam=s]pau tf=hi]thaf
I attempt to choose words which mean both semen and vegetable seed / grain. Probably related despite an irregularity in the initial (which in Hebrew is an original q-). Gz has ir. The real German cognate is an older form scurz which seems to have been reshaped under the influence of Latin curtus. Cf. a similar variation between French and Romanian. Both words seem to contain the root *sed-, cf. Lucht (2007: 345f.). Cf. Uighur oltur to sit. Regular loss of in Oromo as well as a development l > j as in eye. 9
rra to stand star stone to suck summer sun sweet to swim tail tear(drop) ten that (far demonstrative) thin (of things) this (near demonstrative) three to tie/bind tongue tooth tree two warm to wash 63 64 65 66 67 68 12BT 12BHST qom=kam kokb=koxav
t stoj=stvt
=ohi
stand=stehen star=Stern stone=Stein suck=saugen
kivi=k
toile=stea pierre=piatr sucer=suge soleil=soare doux=dulce queue=coad
tr=setre sraj=xor[d
sigh=sugno
urjii=xiddig65
dur=tur yldz=sulus
bintang=kntan zvezd=zvaigzn sb3=siou a64 e batu=vto kmk=akmens jnr=ni snq=snk mw=m rw=r
12BHSTY 2T 12BHY T 12BS 12BSTY D 2 12 2BS 12BSTY 2B 2T
agaa=dhagax ta=taas
em=em yaz=sayn gne=kn
m=matsat s assr=ser sost=alo
summer=Somm er sun=Sonne sweet=s
samh[radh=haf milis=melys snmh=nofio deoir=deigryn deich=deg sin=hwnnw tr=tri
matahari=maso sl[nce=saule ndro66 manis67=mmy se]puluh=flo tipis=ma]nfy lidah=lla dua=ra sldk=salds
miaawaa=mac aan kuyruk=kuturuk
plvam=pel[dt nbi=nbi dset=desmit tri=trs zb=zobs dve=divi sd=sat rmy.t=erm mw=mt p[n=phai
swim=schwimm uida=szik en ten=zehn thin=dnn this=dieser three=drei tongue=Zunge tooth=Zahn two=zwei warm=warm kyynel=knny kolme=hrom puu=fa kaksi=kett
larme=lacrim s=a[k dix=zece ce=acest trois=trei lier=lega langue=limb dent=dinte arbre=arbore deux=doi chaud=cald das=dah yah=n tn=se
immimaan=ilm o kana=kan sadii=saddex hia=xidh on=uon o=ol ince=sinnyiges bu=bu =s bala=baay
mtw=omt
ns=las sn.wj=snau ji=ii
bndhn=basta n jbh=zabn dnt=dandn d=do teanga=tafod68 crann=pren d=dau
12BDHST Y 12BDHTY 12BHS 12B 2 12BDHSY
arraba=carrab dil=tl ilkaan=ilig lama=laba miia =maydh di=tiis iki=ikki yka=suuy
panas=ma]fna
wash=waschen
Derivatives from an underlying root *spi-. I assume both words to be cognate despite an irregular correspondence in the initial. Correspondence Oromo -r- = Somali -dd- as in seven. Both literally eye (of the) day, a compound that probably already existed in the common ancestor of both languages. From < *mamis, cf. Acehnese mamh sweet. Welsh -f- from *-gw-. 10
water we wet what? white who? wind wing winter
12BDHST Y 12BHS 2B 12BSTY 12BS 2TY 2T a=anxnu rb=ratov mn=ma knf=kanaf
putih=ftsy musim dingin=rirnin a72 tahun=tona
vod=dens bjal=balts koj=kas vj[tr=vj zma=ziema
mw=mou 71 m=ni]m
water=Wasser we=wir what=was white=wei who=wer wind=Wind winter=Winter
vesi=vz me=mi mik=mi kuka=ki talvi=tl
eau=ap69 nous=noi quoi=ce qui=cine vent=vnt aile=ari[p hiver=iarn
ham=m ky=e kaun=k
sinn=ni fliuch=gwlyb c=pwy
bis[aan=biyo70 nu=a/inna[ga maa[l=max adii=cad
su=uu biz=bihigi kim=kim kanat=knat k=khn
12BDSTY man=mi
3w=thou n=tenh
pr.t=phr
geimhreadh=ga eaf
woman work worm year yellow yesterday you (sg.) you (pl.)
12BS 2B 2S 2SY 12S T
tl=tola tlant74=etmol
godna=gads lt=dzeltens
m.t=s]himi
fn=fent rnp.t=rompi sf=saf ntk=nthok ntn=nthten 101
worm=Wurm year=Jahr yellow=gelb
femme=femeie ver=vierme anne=an jaune=galben hier=ieri toi=tu vous=voi 115
km=kr73 t=to 52
yl=sl sen=en siz=ehi[gi 91
bliain=blwyddy n t=ti sibh=chwi 78 kalee=shalay ati=adi[ga isin=idin[ka 64
ke]marin=oml vra=vakar y 63 ti=tu 71
yester[day=ges tern you=ihr 127 sin=te te=ti 45
12BDHST ant=ata Y 2B 174 57
Conclusions
Based on the, admittedly, limited set of language couples evaluated here, the word candidates can be grouped as follows with decreasing degree of stability: Survives in 11 couples: five, four, six Survives in 10 couples: three, two Survives in 9 couples: eight, fly (animal), full, hand, horn, hundred, I, one, seven, star, ten, who, you (sg.) Survives in 8 couples: blood, to die, nail, name, new, nine, stone, tongue, we, winter Survives in 7 couples: to eat, egg, eye, to give, head, heart, to laugh, moon, sun, water, year, yesterday, you (pl.) 69 70 71 72 73 74 Latin aqua. A more conservative form has been preserved in Rendille bie water. Cf. Quack (2002). Indones. dingin is the word for cold, musim dingin = cold season. Malagasy rirnina seems to be a similar composition of a hypothetical related term *rnina *cold plus an unidentified initial element. Same root as to do. < *tmalt. 11
Survives in 6 couples: all, bitter, brother, day, to do, dry, ear, finger, fire, fish, green, to hear, ice, navel, night, nose, other, right, root, salt, shadow, to sit, to suck, sweet, tear, this, tooth, what Survives in 5 couples: ashes, to bear, to bite, bone, to drink, father, grass, heavy, honey, hunger, knee, to live, liver, long, louse, milk, mouse, to open, rain, red, sky, to sleep, to spit, to stand, to swim Survives in 4 couples: black, cloud, cold, to come, to fear, feather, foot, to lie, man, mother, mouth, near, not, old, person, to see, seed, to sew, short, to sing, skin, snake, summer, to tie, to wash, white, wind, worm Survives in 3 couples: to ask, bark, bird, breast, earth, far, fat, flower, fruit, to go, house, to know, leaf, meat, road, sand, to say, smoke, snow, tail, thin, tree, warm, yellow Survives in 2 couples: back, to burn, child, to fall, to fly, good, to kill, mountain, narrow, river, round, that, wet, woman Survives in 1 couple: bad, belly, big, to cut, dog, hair, many, neck, sea, small, work It turns out that certain items which figure prominently in existing basic vocabulary lists are rather bad, such as dog, hair, many, neck, whereas some good words are rarely included in such lists, such as finger, fly (animal), winter, yesterday. Based on these data, a good (= diachronically stable) basic vocabulary list, which I herewith wish to propose, could be the following 57 item-list:75 all, bitter, blood, brother, day, to die, to do, dry, ear, to eat, egg, eye, finger, fire, fish, five, fly (animal), four, full, to give, green, hand, head, to hear, heart, horn, I, ice, to laugh, moon, nail, name, navel, new, night, one, right (side), root, salt, shadow, to sit, star, stone, sun, sweet, ten, this, three, tongue, tooth, two, water, we, who, year, yesterday, you (sg.).
Bibliography
Bender, M. Lionel 1983: Proto-Koman Phonology and Lexicon, Afrika und bersee 66: 259-297 Dolgopolsky, Aaron B. 1986: A probabilistic hypothesis concerning the oldest relationships among the language families in Northern Eurasia, in Shevoroshkin, Vitalij V. & Markey, Thomas L. (eds.), Typology, Relationship and Time, Ann Arbor, 27-50 Dyen, Isidore 1960: Comment on D.H. Hymes Lexicostatistics so far, Current Anthropology 1: 34-39 1964: On the Validity of Comparative Lexicostatistics, in Lunt, Horace G. (ed.), Proceedings of the Ninth International Congress of Linguistics, The Hague, 238-252 Dyen, Isidore & James, A.T. & Cole, J.W.L. 1975: Language Divergence and Estimated Word Retention Rate, in Dyen, Isidore, Linguistic Subgrouping and Lexicostatistics, The Hague, 181-207 Elbert, Samuel H. 1953: Internal Relationship of Polynesian Languages and Dialects, Southwestern Journal of Anthropology 9: 147-173 Halayqa, Issam K.H. 2007: Swadesh List (Basic Vocabulary List) for Ugaritic, Phoenician, Biblical Hebrew, Syriac and Classical Arabic, Ugarit-Forschungen 39: 319-380 Holman, Eric W. et al. 2008: Explorations in Automated Language Classification, Folia Linguistica 42: 331-354 Holst, Jan Henrik 2001: Lettische Grammatik, Hamburg van Hout, Roeland & Muysken, Pieter 1994: Modeling lexical borrowability, Language Variation and Change 6: 39-62 Kessler, Brett 2001: Significance of Word Lists, Stanford Leslau, Wolf 1969: Hebrew Cognates in Amharic, Wiesbaden Lohr, Marisa 1998: Methods for the genetic classification of languages, Diss. Cambridge [non vidi] Lucht, Martina 2007: Der Grundwortschatz des Altirischen, Dissertation Bonn Oswalt, Robert L. 1971: Towards the Construction of a Standard Lexicostatistic List, Anthropological Linguistics 13: 421-434 Pagel, Mark et al. 2007: Frequency of word-use predicts rates of lexical evolution throughout Indo-European history, Nature 449: 717-720 Quack, Joachim F. 2002: Die erste Person Plural des selbstndigen Personalpronomens im Mittelgyptischen, Lingua Aegyptia 10: 335-337 Starostin, Sergei 1991: Altajskaja problema i proisxodenie japonskogo jazyka, Moskva 2000: Comparative-historical linguistics and lexicostatistics, in Renfrew, Colin et al. (eds.), Time Depth in Historical Linguistics, Cambridge, vol. I: 223-265 Swadesh, Morris 1952: Lexico-Statistic Dating of Prehistoric Ethnic Contacts with Special Reference to North American Indians and Eskimos, Proceedings of the American Philosophical Society 96: 452-463 1955: Towards Greater Accuracy in Lexicostatistic Dating, International Journal of American Linguistics 21: 121-137 Tadmor, Uri 2009: Loanwords in the worlds languages: Findings and results, in Haspelmath, Martin & Tadmor, Uri (eds.): Loanwords in the worlds languages. A comparative handbook, Berlin: 55-75 Woodward, James 1993: The Relationship of Sign Language Varieties in India, Pakistan, & Nepal, Sign Language Studies 78: 15-22 75 These are all items that occur at least 6 times in my data, with the exception of: (1) some items that in many languages are dependent from other list items (numbers from 5 to 9 may be composed of lower numbers; hundred may be related to ten; other may be related to two; you (pl.) may be derived from you (sg.); what often from the same root as who; tear often expressed as water of eye or the like); (2) two items which tend to be onomatopoetic and can therefore be misleading when used as evidence in historical linguistics (nose, to suck); and (3) one item which, despite showing a good stability rate where it occurs, does not exist as a concept in a large part of the world (winter). 12

Towards Establishing A New Basic Vocabulary List (Swadesh List) (Version 1) Carsten Peust, 2010

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Towards Establishing A New Basic Vocabulary List (Swadesh List) (Version 1) Carsten Peust, 2010

Uploaded by

Copyright:

Available Formats

Towards establishing a new basic vocabulary list (Swadesh list)

(Version 1) Carsten Peust, 2010

Selection of language couples

Selection of lexical entries

Ranking the items and extracting a basic vocabulary list

The data table

Bahasa Indonesia = Malagasy20

Hindi = Persian Irish = Welsh23

all=alle ant=Ameise ashes=Asche

fourmi=furnic ceindre=cenu amer=amar noir=negru sang=snge os=os sein=sn frre=frate

ber]tanya=man on]tny kulit=hodi[kz o26

bear=ge]bren bite=beien bitter=bitter blood=Blut breast=Brust veri=vr

nkks=naax menggigit=man ikitra mrara=mar darah=ra

haaa=qadha ac=ah adh kara=xara kan=xaan

12BHSTY dm=dam 12BHSTY ant=tsem 1BHT 2

qabbanaa=qab ow gel=kel gn=kn l=l i=is kuru=kuraan kulak=kulgaax ye=sie

danaim =gwn eud cluas=clust ubh=wy ocht=wyth gogaa=engeg

12BHSTY 12BHT 12BS 12T 12BST 12BSTY 2

makan=mihna jam(jad-) na =st(d-) telur=atdy34 sem=astoi ok=acs

manger=mnca uf=ou huit=opt il=ochi h=hat dr36=dr

hanqaaquu=ug yumurta=smt ax35 saddeet=siddee sekiz=as d ija=il fagoo=fog d=ts ya=sa

12BDHST ayn=yin Y 2T 2T 12B ruq=raxok

bunga=voni[nk zo lalat=llitra empat=fatra mux=mua letj=lidot tiri=etri

ceathair=pedw afur=afar ar ln=llawn far=gwair glas=glas lmh=llaw ceann=pen mataa=madax onnee=wadne

buah=voa[nkz o penuh=fno hijau=m]itso plen=pilns dvam=dot zeln=za glav=galva src=sirds

heavy honey horn house hundred

kbbad=kaved qnd=kren bet=byit mto=mea rab=raev

berat=ma]vsat ra med=medus tanduk=tndro rog=rags ka se]ratus=zto aku=ho sto=simts az=es led=ledus

honey=Honig horn=Horn house=Haus

trom=trym mil=ml teach=t cad=can m=mi oighear=i glin=glin

ulfaa[taa=culu ar=ar s gaafa=gees ani=ani[ga jilba=jilib kolfa=qosol46 iisa =jiif[so

membunuh=ma mno daun=ravna hati=ty laki=lehilhy koljno=celis znam=zint

jnn=dnesta n jn=zende ae=afu fear=gr

smja=smieties zb=sbi ivja=dzvot dlg=ilgs g3b.t=bi

2 12BH 12BST 1BD

kuku=hho pusat=fitra baru=vo hidung=rona

mjka=mte nkt=nags not=nakts dvet=devii ne=ne ed]n=viens

mont[agne=mu nte souris=oarece ongle=unghie nom=nume troit=strmt nm=nm

membuka=mam otvrjam=atvr wn=oun ha t hujan=rana mrah=mna kj=ke rm=rme

2 12BT 1S 2TY 12BT 1BT 2 12BHT 12B 2 2 T

sr=re zr=zra sbatt=va

sol=sls s[me=s[kla sdem=septii

way=Weg salt=Salz sand=Sand say=sagen see=sehen seed=Saat seven=sieben

yol=suol tuz=tuus kum=kumax de=die gr=kr

racine=rdcin rond=rotund sel=sare mer=mare voir=vedea

framh=gwraid hundee=xidid d cruinn=crwn salann=halen seacht=saith scth=cy]sgod canaim=canu arga=arag

semence=sm n sept=apte coudre=coase ombre=umbr court=s]curt chanter=cnta st=haft hy=sye

torba=toddoba yedi=sette dik=tik glge=klk

shadow=Schatt en short=kurz59 sing=singen sit=sitzen six=sechs kuusi=hat

gabaabaa=gaa ban otur=olor61 alt=alta deri=tirii uyu=utuy kar=xaar

suighim=eisted d60 s=chwech jaa=lix62 craiceann=croe n beag=bach nathair=neidr bofa=mas tuf=tufa

smay=amyi langit=lnitra m tidur=ma]try

sleep=schlafen snow=Schnee spit=spucken63

meludah=mand pljvam=s]pau tf=hi]thaf

stand=stehen star=Stern stone=Stein suck=saugen

toile=stea pierre=piatr sucer=suge soleil=soare doux=dulce queue=coad

12BHSTY 2T 12BHY T 12BS 12BSTY D 2 12 2BS 12BSTY 2B 2T

m=matsat s assr=ser sost=alo

summer=Somm er sun=Sonne sweet=s

samh[radh=haf milis=melys snmh=nofio deoir=deigryn deich=deg sin=hwnnw tr=tri

matahari=maso sl[nce=saule ndro66 manis67=mmy se]puluh=flo tipis=ma]nfy lidah=lla dua=ra sldk=salds

miaawaa=mac aan kuyruk=kuturuk

immimaan=ilm o kana=kan sadii=saddex hia=xidh on=uon o=ol ince=sinnyiges bu=bu =s bala=baay