The social stratification of tongue shape for postvocalic /r/ in Scottish English

Eleanor Lawson (a), James M. Scobbie (a) and Jane Stuart-Smith (b)

a. Queen Margaret University Edinburgh
b. University of Glasgow

The sociolinguistic modelling of phonological variation and change is
almost exclusively based on auditory and acoustic analyses of speech. One
phenomenon which has proved elusive when considered in these ways is
the variation in postvocalic /r/ in Scottish English. This study therefore shifts
to speech production: we present a socioarticulatory study of variation of
postvocalic /r/ in CVr (e.g. car) words, using a socially-stratified ultrasound
tongue imaging corpus of speech collected in eastern central Scotland in
2008. Our results show social stratification of /r/ at the articulatory level,
with middle-class speakers using bunched articulations, while working-
class speakers use greater proportions of tongue-tip and tongue-front raised
variants. Unlike articulatory variation of /r/ in American English, the
articulatory variants in our Scottish English corpus are both auditorily
distinct from one another, and correlate with strong and weak ends of an
auditory rhotic continuum, which also shows clear social stratification.
KEYWORDS: Sociophonetics, articulatory variation, postvocalic /r/,
Scottish English, ultrasound

1.1 Socioarticulatory research
Data on socially-varied speech have overwhelmingly been acoustic/auditory
records of speech output, with the articulatory mechanisms used by speakers
generally inferred from these data. Research into socially-relevant articulatory
variation is missing and the complex and unpredictable relationship that exists
between the sounds of speech and the vocal tract configurations that generate
them has remained the preserve of experimental laboratory-based phonetics
(but see Wright and Kerswill 1989; Kerswill and Wright 1990). In order to
study how speakers physically create the sounds that signal social meaning,
new analysis techniques are needed. We use ultrasound tongue imaging (UTI)
in order to investigate sociolinguistic variables at the articulatory level and
find unexpected socially-stratified articulatory variation. We will suggest that
articulatory data are an essential component in an integrated account of socially-
stratified variation.

1.2 Background: Social stratification of postvocalic Scottish /r/

Over the past four decades, sociolinguistic research carried out mainly in central
Scotland (encompassing the major cities of Glasgow and Edinburgh and their
associated conurbations) has shown that sociophonetic variation in postvocalic
/r/ is becoming increasingly complex. Romaine (1979) was perhaps the first to
point out that the increased sociolinguistic complexity of /r/ variation in eastern
central Scottish speech could be linguistic change in progress. The social indexical
function of weak and strong postvocalic /r/ variants, as observed by Romaine
(1979) and Speitel and Johnston (1983) in Edinburgh and by Macafee (1983) in
Glasgow, seems to have become more clearly defined over subsequent decades.
Contrary to the assumption that middle-class (MC) speech would become more
r-less after the Received Pronunciation (RP) prestige model (see Aitken 1979:
111–114), MC speakers have in fact become more r-ful and today typically use a
strongly rhotic postvocalic /r/ variant, usually labelled as retroflex, e.g. far [faɻ].
Working-class (WC) speakers, on the other hand, have become increasingly
derhoticised, i.e. they often produce /r/s with weak rhoticity and flat, vowel-like
formants. These variants are often weakened to the point where they are heard
as little more than a pharyngealised, or even a plain, vowel with no apparent
/r/ present, e.g. [f( ) ] (Stuart-Smith 2003; Lawson, Stuart-Smith and Scobbie
At the same time, even trying to achieve an adequate description of the
sociophonetic variation in postvocalic /r/, based only on auditory data, has
proved difficult, and this was not resolved with acoustic analysis (Stuart-
Smith 2007). In fact, we were confronted by a more general problem faced
by sociolinguists, namely that a deeper understanding of phonological variation
and change requires information closer to speech production itself, that is, an
investigation of articulation is required.
In 2007 and 2008, we collected ultrasound tongue imaging (UTI) corpora
in eastern central Scotland to investigate the variation from a more direct,
articulatory perspective. A study of tokens of prepausal derhoticised /r/ in the
2007 corpora showed that the tongue-tip raising gesture of (post)alveolar /r/
was still present, but delayed beyond the offset of voicing and therefore weak or
inaudible, explaining why auditory and acoustic analyses of these variants are
so difficult (Lawson, Stuart-Smith and Scobbie 2008; cf. McMahon, Foulkes and
Tollfree 1994: 304–305).
In this paper, we show how gestural delay only partly explains the social
patterning in auditory variation for Scottish English postvocalic /r/. Here, we
demonstrate that another aspect of articulation, fundamentally different tongue
configurations, is also a crucial factor. Our findings are particularly interesting
given that this difference has been claimed to be inaudible in other varieties of
English (Guenther et al. 1999; Twist et al. 2007).


The data used in this study come from a socially-stratified UTI corpus ECB08,
collected in the eastern Central Belt of Scotland in 2008 (Scobbie, Stuart-
Smith and Lawson 2008). Recordings were collected in two locations: the
city of Edinburgh and the New Town of Livingston, 12 km west of Edinburgh
(and around 30 km east of the western city of Glasgow). Adolescents from an
Edinburgh fee-paying school and from a state-sector school in an economically
deprived area of Livingston made up broadly distinguished socio-economic
groups of middle- and working-class speakers respectively. Four male and four
female volunteers from the school in Livingston and three male and four female
volunteers from the school in Edinburgh were recorded (UTI and audio) in same-
gender dyads having a conversation for 20 minutes and then individually reading
a word list aloud. The data presented here are taken from the word-list section of
the corpus.
Full methodological details can be found in Scobbie, Stuart-Smith and Lawson
(2008) and Lawson et al. (2010), but briefly, speakers wore a lightweight headset
to stabilise the ultrasound probe under the chin (see Scobbie, Wrench and van der
Linden 2008). Articulate Assistant Advanced software (Wrench 2007) was used
to capture audio and video data at 30 video frames per second. Written prompts
(lexical items containing postvocalic /r/ and /l/, vowels flanked by labial and
glottal consonants, and distracters) were presented to informants via a monitor.
The researchers were in an adjacent room during recordings. Subsequent
debriefing of informants showed that none had guessed our particular focus
on /r/.

Boyce and Espy-Wilson (1997) suggested that, in American English,
coarticulation may lead to varied articulatory strategies for the production of
/r/. In order to focus solely on socially-motivated variation, avoiding variation
that could be attributed to anticipatory coarticulation, 12 words containing a
variety of vowel qualities and ending in /r/: beer, bear, far, bar, par, purr, fur, for,
bore, poor, sure and pure were chosen from the corpus for analysis.
Impressionistic auditory and visual classifications of the audio and video data
were undertaken independently to investigate whether speakers from different
backgrounds were using different underlying tongue configurations and whether
variation in tongue configuration correlated with auditory impression of rhotic
strength. Auditory and visual UTI classification was undertaken by two of the
authors (EL and JMS), who are both phonetically-trained native speakers of
Scottish English.

3.1 Auditory classification

In order to avoid making judgments based on prior knowledge of speaker identity,
EL and JMS rated each token of /r/ using a Praat MFC (Multiple Forced Choice)
experiment interface. The raters were asked to choose the closest allophone to
the /r/ found in each of the 147 CVr## words played in random order. A set of
four allophonic categories plus a no /r/ category were arranged running from
weakly rhotic to strongly rhotic on the experiment interface as follows:
no /r/;
derhoticised /r/ [V];
alveolar approximant /r/ [ɹ];
retroflex /r/ [ɻ]; and
schwar [ɚ].
Up to twenty replays of each word were allowed.

3.2 Articulatory classification

A system of articulatory classification had to be established before visual
judgment of tongue configuration could take place. Initial attempts at
categorization made reference to Delattre and Freeman's (1968) classic
cineradiographic study of British and American /r/. However, the different
instrumental techniques and accent varieties used in Delattre and Freeman
(1968) and the present study made it very difficult to unambiguously match the
ECB08 corpus images to Delattre and Freeman's categories.
A new, broader classificatory system was therefore developed for the present
study, after several reviews of the data and careful comparison of tongue
shapes. No acoustic data were used during the articulatory classification, but
the identity of each speaker was known.
We were able to classify the tongue gestures at the point of maximum
constriction for /r/ into four categories: TIP UP, FRONT UP, FRONT BUNCHED and
MID BUNCHED.
TIP UP describes an articulation where the overall shape of the tongue surface
is either straight and steep, or a concave shape, suggesting retroflexion.
FRONT UP describes an articulation where the tongue surface forms a smooth
convex curve. There is no distinct bunching of the tongue front or dip behind
the front region.
FRONT BUNCHED describes an articulation where the front of the tongue has a
distinctly bunched configuration (the tip and blade remain lower than the
rest of the tongue front). A dip in the tongue's surface behind the bunched
section is also apparent.

MID BUNCHED describes an articulation where the front, blade and tip are low,
while the middle of the tongue is raised towards the hard palate.
The crucial distinguishing feature between tip/front up variants and bunched
variants is that in the former the tongue tip forms the primary constriction for
/r/, while in the latter, it is the bunched tongue front to mid-dorsum that forms
the primary constriction.
Although the tongue configurations in the present study have been assigned
to discrete categories after Delattre and Freeman (1968), these articulatory
categories might be viewed as ranges on a continuum. In their MRI-based
study of American English /r/, Zhou et al. (2008) also acknowledge the complex
articulatory variability of /r/, but point to the maximal contrast of the retroflex
and bunched /r/ variants (Zhou et al. 2008: 4466). This is also the case in
the present study, where tip up and mid bunched /r/ seem to represent poles at
either end of an articulatory continuum. For this reason, in the analysis of tongue
articulation (section 4.2), the categories are organized as a continuum, so that
general tendencies towards tip up or bunched articulations can be identified.
Waterfall diagrams in Figure 1, below, show tongue surface outlines for
consecutive video frames (about 30 ms apart) from the onset of the articulation
of the initial consonant in /Car/ words until the maximum of the /r/ gesture.

Figure 1: Sequences of tongue contours, sampled every 30 ms throughout words
ending in /ar/, showing the dynamic movement of the tongue. Time runs in the
direction of the arrows.
Top left: informant LM16's utterance of par, ending in a tip-up /r/;
Top right: LF2's utterance of far, ending with a front-up /r/;
Bottom left: EF6's utterance of far, ending with a front-bunched /r/;
Bottom right: EM5's utterance of bar, ending in a mid-bunched /r/.

4.1 Auditory classification results
Of the 147 auditorily rated tokens, only 136 tokens were used in this analysis,
corresponding to the 136 rateable articulatory tokens. In the auditory rating
study, EL and JMS were in exact agreement for 49 percent of the tokens of /r/.
With one category leeway, this became 90 percent agreement. Variation between
the two raters was not predictably in one direction or the other; for example,
tokens classified by EL as derhoticised might be classified as r-less by JMS or vice
versa. In order to incorporate the judgements of the two auditory raters, the five-
point classification scale was expanded to a nine-point classification scale, and
classifications that were one category apart were assigned to an intermediate
category. The remaining 14 tokens were listened to again independently by
EL and JMS using the same Praat MFC experiment interface and reclassified,
bringing EL and JMS's classification of 13 of the tokens within one category of
one another. The remaining token was jointly agreed upon.
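
The agreement figures and the merging of the two raters' judgements described above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study: the numeric coding of the five categories as 1 to 5, the rule that a pair of ratings a and b maps to a + b - 1 on the nine-point scale, the function names and the six toy ratings are all assumptions introduced here.

```python
from typing import Optional, Sequence, Tuple


def agreement_rates(rater1: Sequence[int], rater2: Sequence[int]) -> Tuple[float, float]:
    """Return (exact agreement, agreement within one category) as proportions."""
    pairs = list(zip(rater1, rater2))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one


def combine_ratings(a: int, b: int) -> Optional[int]:
    """Merge two five-point ratings (1..5) into one nine-point score (1..9).

    Assumed mapping: identical ratings land on odd scores (1, 3, 5, 7, 9) and
    ratings one category apart land on the even score between them.
    Ratings more than one category apart return None, meaning "re-rate".
    """
    if abs(a - b) > 1:
        return None
    return a + b - 1


# Hypothetical ratings for six tokens (1 = no /r/ ... 5 = schwar).
rater_el = [1, 2, 3, 4, 5, 2]
rater_jms = [1, 3, 3, 5, 5, 4]

exact, within_one = agreement_rates(rater_el, rater_jms)
combined = [combine_ratings(a, b) for a, b in zip(rater_el, rater_jms)]
print(f"exact: {exact:.0%}, within one category: {within_one:.0%}")
print("nine-point scores:", combined)
```

Under this assumed coding, the last toy token (ratings 2 and 4) is the kind of case that would be sent back to both raters for reclassification.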
Figure 2: The percentage of auditory variants used by each socio-economic and gender
group (N = 136). Paler grey bars represent r-less and weakly rhotic variants; darker
grey bars represent strongly rhotic variants

Table 1: Mean classification scores for the auditory strength of /r/

          Socio-economic group
Sex       Working class   Middle class
Male      3.9             6.6
Female    4.1             7.5

Figure 2 shows the percentage makeup of auditory variants used by each of
the socio-economic/gender groups in the study as judged by EL and JMS. The UTI
video recordings of MC male informant EM4's tongue movements were unclear,
possibly due to rotation of the probe during recording. Therefore, no attempt
was made to classify the variants found in his audio recordings. Percentages of
auditory variants used by each socio-economic/gender group were calculated
from raw scores. The lightest shades of grey represent auditory variants at the
r-less and derhoticised end of the auditory continuum, while the darkest shades
of grey represent retroflex and schwar-type auditory variants. At a glance, each
socio-economic group's preference for variants at opposite ends of the auditory
continuum is confirmed, with the WC group using more weakly rhotic variants
(pale bars) and the MC group preferring strongly rhotic variants (dark bars).
Female MC informants used a more restricted set of variants than the other
groups in the study and these variants were generally strongly rhotic in quality;
for instance, the MC females made greatest use of schwar-type variants. A similar
preference by female informants was also found by Plug and Ogden (2003) in
their study of post-vocalic /r/ in Dutch.
There is some gender differentiation between the WC males and females, with
WC males using the most weakly rhotic variants more often than WC females.
Figure 2 also seems to show that WC males use variants with a strong rhotic
quality more often than the WC females. However, these strongly rhotic variants
occurred mainly in the speech of one individual (see section 4.2).
The mean auditory classification scores for each socio-economic/gender group
are shown in Table 1. A two-way ANOVA was run including Social Class, Gender,
and their interaction. The interaction was not significant; therefore, the results
reported below are from a Social Class and Gender main effects model, which
showed a significant main effect for both Social Class F(1,135) = 141.948, p <
0.001 and Gender F(1,135) = 4.853, p < 0.05.
To summarise, the auditory analysis confirms that the informants in this study
conform to the pattern observed in earlier Central Belt studies (Romaine 1979;
Macafee 1983; Speitel and Johnston 1983; Stuart-Smith 1999, 2003, 2007),
whereby auditory strength of /r/ indexes Social Class and Gender.

4.2 Impressionistic articulatory analysis

Speaker EM4's recordings were not used in the study as they were not clearly
visible. EL and JMS were able to classify 136 of the remaining 147 dynamic
tongue movements according to the four categories detailed in section 3.2. The
remaining recordings were not classified because the quality of the UTI video
was poor.
Figure 3: The percentage of articulatory variants used by each socio-economic and
gender group (N = 136). The shades of grey ranging from pale to dark represent tip
up, front up, front bunched and mid bunched articulatory variants, respectively

Figure 3 shows the average percentages of each of the four dynamic
articulatory variants used in the classification system, organized according to
socio-economic/gender group. Paler shades of grey represent tip up and front up
articulations of /r/, while darker shades of grey represent front bunched and mid
bunched articulations of /r/. A divide along socio-economic class lines is again
apparent in Figure 3; a 2 × 2 Pearson's chi-square test showed that there was
a relationship between use of tip/front up and bunched variants and social class,
χ² = 55.479, p < 0.001 (N = 136); MC informants used greater proportions
of bunched variants than the WC informants. There was also a relationship
between use of tip/front up and bunched variants and gender, χ² = 11.230, p <
0.01 (N = 136), with females using greater proportions of bunched variants than
males (although this difference in usage was mainly found in the WC speaker
group). The articulations of WC male informant LM15 account for the entire
21 percent of bunched variants in the WC male group.
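
For readers unfamiliar with the test, a 2 × 2 Pearson chi-square of the kind reported above (tip/front up versus bunched, cross-classified by social class or by gender) can be computed as in the following sketch. It is not the analysis script used for this study: the cell counts are invented, the function name is hypothetical, and the sketch assumes that no continuity correction is applied. With one degree of freedom, the p-value follows directly from the complementary error function.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denominator = row1 * row2 * col1 * col2
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For df = 1 the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, NOT the study's data.
# Rows: social class (WC, MC); columns: (tip/front up, bunched).
chi2, p = chi_square_2x2(30, 10, 5, 35)
print(f"chi-square = {chi2:.3f}, p = {p:.3g}")
```

Collapsing the four articulatory categories into two groups, as in the test reported above, is what makes the table 2 × 2; a test on all four categories would have three degrees of freedom and would need a different tail calculation.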
For the purposes of illustration, comparison and transparency, Figure 4 shows
nine to 12 overlaid outlines of the tongue surface per informant, taken from
a UTI video frame at the point of maximum constriction for /r/. The tongue
surfaces were fitted using Articulate Assistant Advanced software, an automatic
edge-detection tool and manual correction. The number of tongue outlines per
informant varies because some UTI video recordings were not
sufficiently clear to allow unambiguous identification of the position of the tongue.
Most of the speakers in the study are consistent in their production of
postvocalic /r/, with the exception of LM17 who used a noticeably different
tongue configuration in his pronunciation of the /r/ in purr and fur, involving
simultaneous raising and retracting of the tongue dorsum, retraction of the
tongue root and retroflexion of the tongue tip and blade. The preceding vowel
environment seems to influence choice of articulation of /r/ for this speaker. The
resulting auditory quality was of a voiceless pharyngealised vowel.

Figure 4: All tongue surface splines (between nine and 12 splines per informant)
organized by socio-economic and gender group. Above each set of tongue splines is a
hard-palate surface trace from that speaker

Unlike other speakers in his socio-gender group, LM15 uses bunched variants
exclusively. To explore this unusual result further, this speaker's conversational
recordings were
examined in order to see if bunched /r/ was typical in his spontaneous speech
style. Eighteen tokens of stressed prepausal /r/ were found (14 were instances
of pejorative use of the word queer). Eight tokens contained bunched /r/, five
contained tip-up /r/ and five were r-less (i.e. with no clear /r/ gesture). The
occurrence of bunched /r/ in word-list speech style is therefore not anomalous
for this speaker, but he makes more use of the bunched variant in the more
formal speech style. In conversational recording, this speaker often used Scots
pronunciations and lexis typical of working-class speech in the Scottish Central
Belt, e.g. daein 'doing', oot 'out', the day 'today', fitball 'football', and therefore is
unlikely to have been miscategorised in this socio-economic group. Only one
other WC male used a front bunched variant, LM16 (once in the word beer).
Based on data from the UTI corpora we have collected to date and informal
observation in the field, bunched /r/ is rare among male working-class speakers.
It is unclear at this time why this speaker uses a bunched /r/ variant in word-list
and spontaneous speech style as we do not have detailed information on this
speaker's background, or on the general differences in /r/ production which
word-list vs. spontaneous speech style might cause.
Another interesting finding was that the MC tongue configurations in
articulations of prepausal /r/ are clearly neither alveolar nor retroflex
approximants, as has been previously assumed (Romaine 1979; Speitel and
Johnston 1983; Stuart-Smith 1999). What this study shows are tongue
configurations closer to Delattre and Freeman's American types 4 and 6,
respectively dorsal bunched with dip and fronted bunched (Delattre and
Freeman 1968: 41, 44), the former of which they identified as producing the
strongest auditory impression of /r/ (Delattre and Freeman 1968: 64) in our

4.3 Auditory-articulatory correlation

Although the auditory analysis was more fine-grained than the articulatory
analysis, a one-tailed Spearman's correlation test showed that there was a
strong positive correlation between tongue configuration classification and the
auditory variant classification: r = 0.637, N = 136, p < 0.001. This significant
correlation would appear to confirm that bunched articulations are associated
with a stronger auditory rhotic effect in comparison with other variants.
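
A rank correlation of this kind can be illustrated with a short sketch. The code below is not the analysis used in the study; it assumes, for illustration only, that the four articulatory categories are coded 1 (tip up) to 4 (mid bunched) and the auditory scale 1 to 9, and the eight token scores are invented. It computes the Spearman coefficient as a Pearson correlation over average ranks, which handles the many ties that ordinal categories produce, and it does not compute the one-tailed p-value.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for eight tokens, NOT the study's data.
articulatory = [1, 1, 2, 2, 3, 3, 4, 4]   # 1 = tip up ... 4 = mid bunched
auditory = [2, 1, 3, 5, 6, 4, 8, 9]       # 1 = no /r/ ... 9 = schwar
print(f"Spearman rho = {spearman_rho(articulatory, auditory):.3f}")
```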

Our initial auditory identification of a socially-stratified continuum of weaker
(WC) to stronger (MC) auditory variants of postvocalic /r/ was backed up by the
ultrasound investigation. Quantitative analysis of tongue configurations also
showed a continuum from WC boys tending to produce /r/ mainly with canonical
tip up and front up tongue configurations to MC girls always using bunched
tongue configurations. Thus, we find working- and middle-class speakers in the
eastern Central Belt of Scotland using both tongue configuration and gesture
timing in order to produce auditory variants of /r/ at opposite ends of the rhotic
spectrum. This socioarticulatory polarization for postvocalic /r/ is consistent with
the marked sociolinguistic polarization found in auditory data for postvocalic /r/
and seven other consonantal variables in Glasgow, in western central Scotland
(Stuart-Smith et al. 2007).
The indexicality of these different articulatory variants reaches beyond the /r/
itself, causing the vowel systems of these sociolects to diverge. It is likely that
bunched /r/ is responsible for the neutralization of /i/ // and // to [/] before
/r/ e.g. fir, fur, fern [f ], [f], [fn] to [f], [f], [fn] (Aitken 1979: 111), which
is typical of MC Scottish speech, and is untypical of WC speech. (Compare the
prerhotic vocalic tongue body postures of words ending in /ar/ in Figure 1).
Similar articulatory variants of /r/ were identified in American English in
Delattre and Freeman's cineradiographic study (Delattre and Freeman 1968).
Delattre and Freeman suggested that alternation between bunched and retroflex
/r/ had no acoustic consequence, so long as strictures were equally extreme
(Delattre and Freeman 1968: 53, 54), making a social-indexical use of tongue
shape unlikely. Subsequent research on American English postvocalic /r/ has
suggested that variation between retroflex and bunched variants is speaker-
specific and idiosyncratic (Mielke, Baker and Archangeli 2007) and that speakers
are only weakly aware of these articulatory variants (Twist et al. 2007: 215),
which again implies that these articulatory variants are unlikely to become
socially indexical. The results of the present socioarticulatory study suggest that
this systemisation of /r/ is contingent, and can be otherwise. In Scottish English,
such articulatory variation is in some way perceptible and can be exploited by
speakers to index socio-economic class, a finding which sheds more light on both
Scottish and American systems of variation.
The identification of socially-stratified articulatory variants of postvocalic /r/
in our corpus shows that an audio/acoustic approach to the analysis of variation
does not capture the full picture. With the advent of ultrasound tongue imaging,
a more integrated approach to variation is possible which can begin to help
address the fundamental question of how articulatory variation is transmitted
from speaker to hearer.

