
Phoneme Inventory Size and Demography

James Winters
Centre for Language and Communication Research
Cardiff University
wintzis@gmail.com
http://www.replicatedtypo.com
1 Introduction

It has long been established that demography drives evolutionary processes. Similar attempts are also being made to describe cultural and linguistic processes by considering the effects of population size and other demographic variables. Even though these ideas are hardly new, until recently there was a ceiling on the amount of resources one person could draw upon. In linguistics, this paucity of data is being remedied through the implementation of large-scale projects, such as WALS, Ethnologue and UPSID, that bring together a vast body of linguistic fieldwork from around the world. This poster utilises these new resources to present preliminary work into the relationship between phoneme inventory size and the underlying speaker population.

1.1 Phoneme Inventory Size, Demography, and Utterance Selection

One explanation for the observed variance in phoneme inventory sizes is random variation: in other words, chance factors alone account for the outcomes we see across the world's languages. Another explanation, and one that is more congruent with other recent studies, is that demographic factors influence and shape the trajectory of languages (Lupyan & Dale, 2010). The question is: is there a correlation between the population size of a language and its number of phonemes? Despite work suggesting such a relationship (e.g. Trudgill, 2004), there is little in the way of empirical evidence to support such claims. Hay & Bauer (2007) perhaps represent the most comprehensive attempt at an investigation, reporting a statistical correlation between the number of speakers and a language's phoneme inventory.

Using their paper as a springboard, I decided to look at how other demographic factors might influence the phoneme inventory, namely: area, population density and the degree of social interconnectedness. The rationale is that socio-demographic differences in the underlying speaker community will lead to different levels of within-language variability. Experiments into the acquisition of phoneme categories suggest that when an infant (or adult) is exposed to tokens from a particular "phonetic space in a uni-modal distribution, they tend to learn this as a single category. When a distribution over the same phonetic space is bimodal, it is learned as two categories." (ibid). As Pierrehumbert, Beckman & Ladd (2000) note: "VARIABILITY CAUSES THE NEED FOR ABSTRACTION".

One way of increasing exposure, resulting in denser distributions, is through increasing the population size. In addition, the density of a population is also important, as this allows for more stable transmission chains between speakers. In summary: the larger and denser the speaker population, the more variation it will produce and be exposed to, and subsequently this will result in a larger phoneme inventory.

The figure below highlights how relative differences in the degree of social interconnectedness will result in decreases or increases in the amount of within-language speaker variation. More variation potentially provides a coordination problem for an individual within a speaker population. One solution is for the language to adapt, through the selection of utterances on the basis of their learnability, by increasing the number of phonemes. Of course, there will be other pressures keeping the phoneme inventory size from becoming unmanageably large. Abstraction is, after all, cognitively expensive and a speaker population will want to keep its number of phonemes as low as possible, without becoming underspecified.

2 Method

The first step was to gather demographic data and segment inventory data from two sources: Ethnologue and UPSID.

Ethnologue is a great resource for finding out a language's speaker population size and geographic spread, from which we can then work out the speaker density per km². The UCLA Phonological Segment Inventory Database (UPSID) contains statistical surveys of the phoneme inventories for 451 world languages. The final number of languages used in my sample was 397. The removal of some languages was based simply on the lack of data pertaining to geographic spread. Also, in the spirit of openness: some of the languages, particularly those originating in South America, required area data from outside of Ethnologue (government surveys etc.), and their reliability may be suspect. In line with Hay & Bauer I also removed any languages that fell more than four standard deviations from the mean, as these may exert undue influence on the statistics. In this particular sample these languages were !Xu (141 phonemes) and Archi (91 phonemes).

Next, I plugged the data into R and used this to perform several correlations and regressions on speaker size, geographic spread and speaker density (see §3).

3 Results/Findings

Correlation Results
• Population Size and Segment Inventory Size (rho=.33, p<.0001).
• Geographic Spread and Segment Inventory (rho=.06, p=.2069).
• Population Density and Segment Inventory (rho=.38, p<.0001).
• Social Interconnectedness and Segment Inventory (rho=.39, p<.0001).

As you can see, the relationships for population/segment (rho=.33) and density/segment (rho=.38) are highly significant (p<.0001). However, there appears to be no significant relationship between the geographic spread of the language (area) and the phoneme inventory size (rho=.06, p=.2069). Each point corresponds to a language. The red line shows a non-parametric scatterplot smoother fit through these points (Fox, 2002). To measure the degree of social interconnectedness (SI) I multiplied speaker size and density together and then took the log to limit the effect of outliers. The idea is that SI is a product of these two variables combined (Lycett & Norton, 2010). Again, this shows a highly significant correlation (rho=.39, p<.0001).

Multiple Regression Results:
• Intercept, Area and Population (p<.0001; R-Squared (adjusted): 0.132; Deviance explained: 14.2%).
• Statistical significance of each predictor: Area (F-Value = 40290; p<.0001) and Population (F-Value = 34364; p<.0001).

With the relationship of segment inventory to area and population appearing to be non-linear, we can also use a Generalised Additive Model (GAM). This model allows you to specify a distribution (in this case a Gaussian), which can then be viewed with a fitted surface for the additive non-parametric regression of segment inventory on area and population (see above). The surface shows more or less what I'd expect on the basis of my social interconnectedness scale: a large population spread over a small area has a larger phoneme inventory.

4 Discussion/Summary

The current poster presents some of the early work into examining the relationship between demography and a language's phoneme inventory size. Future work includes: controlling for the effects of language families, obtaining more fine-grained demographic information, and establishing the rate of change in phoneme inventories, among other factors. Ultimately, I think any future studies will have to build up a large body of mutually supporting evidence (through models and experiments) on top of a more rigorous study of the data.
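
For readers who want to retrace the derived measures described in the Method and Results sections, here is a minimal sketch. The analysis itself was run in R and no code is given on this poster, so this is a Python stand-in rather than the original script, and every function name is hypothetical. It assumes that density is speakers divided by area in km², that SI uses the natural log (the base is not stated), that the four-standard-deviation cut-off applies to phoneme inventory size, and that rho denotes Spearman's rank correlation (the poster does not say which coefficient was used).

```python
import math
from statistics import mean, stdev


def density(population, area_km2):
    """Speakers per square kilometre."""
    return population / area_km2


def social_interconnectedness(population, area_km2):
    """SI = log(population * density). Natural log is an assumption."""
    return math.log(population * density(population, area_km2))


def within_n_sd(values, n_sd=4.0):
    """Keep values no more than n_sd sample standard deviations from the mean."""
    centre, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical figures for illustration only; not Ethnologue or UPSID data.
populations = [1000, 50000, 2000000, 300]
areas_km2 = [10, 2500, 40000, 900]
inventory_sizes = [22, 31, 40, 18]

si = [social_interconnectedness(p, a) for p, a in zip(populations, areas_km2)]
print(spearman_rho(si, inventory_sizes))
```

Because SI is the log of population × density, it rises with population and falls with area, which matches the reading of the fitted surface in §3.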
Funded by the corporate bank of time
