
Clinical Linguistics & Phonetics


Tone and vowel disruptions in Mandarin aphasia and apraxia of speech

Wenjun Chen, Jeroen van de Weijer, Qian Qian, Shuangshuang Zhu & Manna Wang

To cite this article: Wenjun Chen, Jeroen van de Weijer, Qian Qian, Shuangshuang Zhu & Manna Wang (2023) Tone and vowel disruptions in Mandarin aphasia and apraxia of speech, Clinical Linguistics & Phonetics, 37:8, 742–765, DOI: 10.1080/02699206.2022.2081611

To link to this article: https://doi.org/10.1080/02699206.2022.2081611

© 2022 The Author(s). Published with license by Taylor & Francis Group, LLC.

Published online: 03 Jun 2022.


Tone and vowel disruptions in Mandarin aphasia and apraxia of speech

Wenjun Chen^a, Jeroen van de Weijer^b, Qian Qian^c, Shuangshuang Zhu^c, and Manna Wang^c

^a School of Foreign Languages, Ningbo University of Technology, Ningbo, Zhejiang, China; ^b School of Foreign Languages, Shenzhen University, Shenzhen, Guangdong, China; ^c Speech and Language Therapy Department, Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), School of Medicine, Tongji University, Shanghai, China

ABSTRACT
In this study, we investigated the lexical tones and vowels produced by ten speakers diagnosed with aphasia and coexisting apraxia of speech (A-AOS) and ten healthy participants, to compare their tone and vowel disruptions. We first judged the productions of both A-AOS and healthy participants and classified them into three categories, i.e. those by healthy speakers and rated as correct, those by A-AOS participants and rated as correct, and those by A-AOS participants and rated as incorrect. We then compared the perceptual results for the three groups based on their respective acoustic correlates to reveal the relations among different accuracy groups. Results showed that the numbers of tone and vowel disruptions by A-AOS speakers occurred on a comparable scale. In perception, approximately equal numbers of tones and vowels produced by A-AOS participants were identified as correct; however, acoustic parameters showed that, unlike vowels, the patients’ tones categorised as correct by native Mandarin listeners differed considerably from those of the healthy speakers, suggesting that for Mandarin A-AOS patients, tones were in fact more disrupted than vowels in acoustic terms. Native Mandarin listeners seemed to be more tolerant of less well-targeted tones than less well-targeted vowels. The clinical implication is that tonal and segmental practice should be incorporated for Mandarin A-AOS patients to enhance their overall motor speech control.

ARTICLE HISTORY
Received 4 October 2021; Revised 18 May 2022; Accepted 19 May 2022

KEYWORDS
Mandarin A-AOS speech; lexical tones; vowels; disruption

Introduction
Aphasia is a condition in which a person is unable to comprehend or formulate language due to impairment of specific brain regions. It affects one (or more) of the four communication components: auditory comprehension, verbal expression, reading and writing, and functional communication. Apraxia of speech (AOS), on the other hand, is a motor speech disorder in which the ability to coordinate the sequential articulatory actions required to produce speech sounds is impaired. AOS has been described as a disruption of motor planning or programming (Van Der Merwe, 2021), and is typically

CONTACT Wenjun Chen 0184101205@shisu.edu.cn School of Foreign Languages, Ningbo University of Technology,
Ningbo, China; Qian Qian flowersinmyheart@126.com Speech and Language Therapy Department, Shanghai
YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), School of Medicine, Tongji University, Shanghai,
China
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://
creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided
the original work is properly cited, and is not altered, transformed, or built upon in any way.

seen in patients with lesions to Broca’s area, the left frontal and temporoparietal cortex, the
left, superior, anterior region of the insula, as well as left subcortical structures (Ogar et al.,
2005; Ogar et al., 2006; Ackermann & Ziegler, 2010). Because AOS symptoms frequently
coexist with non-fluent aphasia, examples of aphasia with concomitant apraxia have been
well documented (Haley, 2002; Kurland et al., 2012; Weiss et al., 2016). Both conditions are
thought to affect articulation and prosody.
As a tone language, Mandarin Chinese differentiates words by both segmental and tonal
structures in the lexicon. Each syllable in Mandarin is a morpheme, carrying one of the four
tones and a vowel. In Mandarin monosyllabic words, Tone 1 (T1), Tone 2 (T2), Tone 3 (T3)
and Tone 4 (T4) have a high level, low rising, falling-rising and high falling fundamental
frequency (f0) contour, respectively. Segmentally identical syllables produced with different
tones convey different meanings in Mandarin. For example, the syllable /ma/ means
‘mother’, ‘hemp’, ‘horse’ and ‘scold’ when produced with the four different tones
(Duanmu, 2007). The primary and sufficient perceptual cue for Mandarin tone is the f0
contour. All other auditory cues, such as amplitude, vowel duration, or vocal quality, are
unimportant when f0 information is available, though they have been proven to serve as
cues in tone perception in the absence of f0 contours (Fu & Zeng, 2000;
Whalen & Xu, 1992). Mandarin tones are processed in the fronto-parietal areas (including
the left posterior IFG, adjacent PMC and left SMG/IPS), bilateral temporo-parietal and
subcortical regions (Myers et al., 2009). Impairments to these regions cause production
errors like tone substitutions (e.g. T2 where T3 should appear, Packard, 1986), abnormal
changes in f0 contours (e.g. a level tone with a sudden fall midway), f0 range (e.g. larger or
smaller compared to healthy speakers) or f0 position (e.g. elevated or lowered; Gandour
et al., 1988, 1992) and unusual tone duration (e.g. abnormally longer or shorter than healthy
speakers, Ryalls & Reinvang, 1986). In comparison to the four lexical tones, Mandarin has
a large number of vowels, 22 in all (Lee & Zee, 2003). Common sensorimotor activation
during vowel processing was observed on a left postero-dorsal stream, including the
opercular part of Broca’s area, the adjacent ventral premotor cortex, and the
temporo-parietal junction (Leff et al., 2009; Wilson et al., 2009). Damage to these areas
led to vowel substitution or distortion errors, such as incorrect tongue positioning,
abnormal prolongation, and so on, resulting in formant (e.g. the first and second formant,
i.e. F1, F2) or vowel duration aberrations (Ackermann et al., 1999; Haley et al., 2001; Odell
et al., 1991). The processing of lexical tones and vowels in Mandarin involves both the left
frontal (Broca) and temporal areas, as well as other sensorimotor areas. Disruptions of these
areas are likely seen in either non-fluent aphasic patients or patients with AOS.
Clinically, Mandarin speakers with brain damage face the risk that both tonal and
segmental structures are impaired. However, previous research has only focused on either
tonal or segmental disruption alone (Gandour et al., 1988; Kent & Rosenbek, 1983; Haley
et al., 2001; Yiu & Fok, 1995), with few comparisons of the two for tone-language speakers. This is surprising, given the importance of such a comparison in therapeutic
practice and linguistic theory. One possible reason for this could be that separate
mechanisms (articulatory vs. laryngeal) are involved in the generation of segmental and
tonal aspects (Liu et al., 2006; Kent et al., 2022), so the connection between them is not always
obvious. However, as Mandarin vowels and any voiced segment in syllable codas could also
carry tonal information (Howie & Howie, 1976), it is reasonable to assume that the vocalic
and tonal structures are interdependent in speech production. As both elements are part of

the motor control of speech, which is frequently impaired in patients with aphasia and
apraxia of speech, a thorough comparison of acoustic/perceptual characteristics of tone and
vowel disruptions would be informative in how speech impairments occur.
One technical issue is that most prior studies used the same acoustic measures, such as f0
range or f0 overall shape, to examine the pitch contours of all disrupted tones
(Gandour et al., 1988, 1989, 1992; Kadyamusuma et al., 2011). More exact acoustic features,
such as f0 height, f0 slope, and so on, should be used to describe the f0 contours of each tone
type, so that their extreme (highest or lowest) values may be better captured. As for vowels,
the static cues of vowel formants are currently the major acoustic parameters used to assess vowels produced by patients with brain damage (Haley et al., 2001; Jacks et al., 2010; Kurowski
et al., 1998; J. H. Ryalls, 1986). However, as dynamic properties also contribute to vowel
production and perception (Elvin et al., 2016; Renwick & Stanley, 2020; Hualde et al., 2021),
the acoustic measures in vowel analysis can be further improved for this topic.
Thus, the present study aims to provide a detailed account of the severity of tone vs.
vowel disruptions for Mandarin aphasic patients with coexisting apraxia of speech
(A-AOS). The specific aims include: (1) to provide a detailed acoustic account of the correct
and incorrect tones and vowels in monosyllabic words produced by Mandarin A-AOS
patients; (2) to determine whether tones and vowels are disrupted separately or in combination, and what impact the disruptions may have on the speech output.

Method
Design
This experiment compared the tones and vowels produced by Mandarin A-AOS patients
with healthy controls (All work was conducted with the formal approval of the human
subjects committee of Shanghai YangZhi Rehabilitation Hospital). To achieve this, we made
perceptual judgments and acoustic descriptions of participants’ tone and vowel
productions. The productions by A-AOS patients and healthy controls were first evaluated
and categorised by ten Mandarin Chinese listeners. Here we followed Wong’s (2012)
practice for the token categorisation, in which tones or vowels identified correctly by 8 to
10 raters were scored as ‘correct’, whereas those identified correctly by only 4 or fewer
judges were scored as ‘incorrect’. In this way, all the productions can be classified into three
groups: ‘Healthy speakers’ correct productions’ (HC), ‘Patients’ correct productions’ (PC)
and ‘patients’ incorrect productions’ (PI). Productions correctly identified by 5 to 7 judges
were excluded from the analysis. The acoustic parameters of the productions in each group
were then tested and compared. This was to determine if A-AOS patients’ ‘correct’ output
differed acoustically from that of the healthy speakers, and if the patients’ ‘incorrect’ productions had any acoustic regularity that distinguished them from the ‘correct’ ones. Finally, an overall
comparison of tone and vowel disruptions was conducted.
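Wong’s (2012) categorisation criterion, as applied here, can be sketched as a small function (a minimal illustration; the function name and the example counts are ours, not from the study):

```python
def categorise(num_correct_judgements):
    """Classify one production by how many of the 10 raters identified it
    as the intended category, following Wong's (2012) criterion:
    8-10 correct -> 'correct', 0-4 correct -> 'incorrect',
    5-7 correct -> excluded from analysis (None)."""
    if num_correct_judgements >= 8:
        return "correct"
    if num_correct_judgements <= 4:
        return "incorrect"
    return None  # ambiguous tokens (5-7 raters correct) are excluded

# Hypothetical judge counts for six tokens
counts = [10, 9, 6, 3, 7, 8]
labels = [categorise(c) for c in counts]
# -> ['correct', 'correct', None, 'incorrect', None, 'correct']
```

Tokens from healthy speakers rated ‘correct’ form the HC group; patients’ tokens rated ‘correct’ and ‘incorrect’ form PC and PI, respectively.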

Participants
Ten speakers diagnosed with A-AOS (age range = 25–63, M = 47.3, SD = 11.1) and ten
age-matched healthy controls (age range = 28–61, M = 46.7, SD = 9.8) participated in the
study (Table 1). All participants demonstrated aphasia with AOS at least six months

Table 1. Speaker characteristics for the healthy control and A-AOS groups.
Speaker Age Sex Schooling Handedness Dialect MPO Lesion site Aphasia AQ
HC1 42 F High Right M-BJ NA NA NA 100
HC2 39 F High Right M-BJ NA NA NA 98.4
HC3 58 F High Right M-HB NA NA NA 99.8
HC4 61 F High Right M-BJ NA NA NA 99.2
HC5 54 F High Right M-HB NA NA NA 100
HC6 28 M High Right M-LN NA NA NA 98.9
HC7 39 M University Right M-BJ NA NA NA 100
HC8 44 M High Right M-BJ NA NA NA 99.2
HC9 55 M High Right M-BJ NA NA NA 99.3
HC10 47 F High Right M-HB NA NA NA 99.8

A-AOS1 46 M High Right M-BJ 6 Frontal-parietal Broca 78.4
A-AOS2 44 M University Right M-BJ 15 Frontal-parietal-occipital Broca 70.5
A-AOS3 61 M High Right M-HB 22 Frontal Anomic 82.3
A-AOS4 63 M High Right M-BJ 25 Frontal-parietal-temporal Broca 68.5
A-AOS5 58 M High Right M-HB 13 Parietal Conduction 43.5
A-AOS6 25 F High Right M-LN 7 Parietal Anomic 80.4
A-AOS7 36 F High Right M-BJ 12 Frontal-parietal with lentiform nucleus Broca 66.2
A-AOS8 43 F High Right M-BJ 18 Frontal-parietal Broca 64.5
A-AOS9 52 F High Right M-BJ 10 Parietal Anomic 53.5
A-AOS10 45 M High Right M-HB 23 Frontal Anomic 67.2
Note. MPO = Months post onset, Aphasia type was determined based on the Western Aphasia Battery Chinese
adaptation, AQ = Western Aphasia Battery (Chinese adaptation) Quotients. For the healthy control group (HC), mean
AQ = 99.4, Std. = 0.55; for the A-AOS group, mean AQ = 67.5, Std. = 12. For the dialect column, M = Mandarin,
abbreviations after dashes stand for birthplaces: BJ = Beijing, HB = Hebei province, LN= Liaoning province.

post-onset (M = 15 months, SD = 6.4) of a left-hemisphere cerebrovascular accident. The following inclusion criteria were met: (1) Mandarin monolingual, (2) right-handed, (3) minimum of high school education, (4) passed an audiometric pure-tone screening at 35 dB HL at 500, 1000, and 2000 Hz in at least one ear, (5) assessed with the Mini-Mental State Examination (Folstein et al., 1975) to rule out dementia. If a participant spoke
a dialect other than Mandarin, had a medical history of depression or other psychiatric illness,
degenerative neurological illnesses, chronic medical illness, or dysarthria, they were excluded
from the study. When dysarthria was suspected, a speech mechanism evaluation was performed
(Duffy, 2019).
The type and severity of aphasia were determined by each subject’s performance on Western
Aphasia Battery Chinese adaptation (Gao, 2006). Presence of AOS was determined primarily by
characteristics observed during the performance on the Apraxia Battery of Adults (2nd edition,
subsets I, II, IV, Dabul, 2000). A motor speech evaluation (Duffy, 2019) was audio-recorded and
presented to three speech-language pathologists (native Mandarin speakers) for AOS diagnosis.
Each pathologist had more than ten years of experience identifying neurogenic communication
disorders and had no knowledge of the study purpose. The audio-recorded A-AOS speech
samples were mixed with those of the healthy controls. To diagnose AOS and assess severity,
speech-language pathologists utilised an 8-point rating scale ranging from 0 (no AOS) to
7 (severe AOS). Criteria for the presence of AOS included: (a) effortful speech with
self-correcting attempts; (b) frequent articulatory errors, such as substitutions, distortions,
omissions, and additions; (c) atypical prosody; and (d) articulatory variability across repeated
productions of the same utterance (Wertz et al., 1984). The pathologists first listened

independently and then discussed their findings until they reached an agreement on a diagnosis
and severity level. Rated severity of AOS ranged from 2 to 5 for the patients and 0 for all the
healthy controls.
All patients spoke only Mandarin at home and in their respective job settings. Each A-AOS participant was paired with a relative who was close in age and had a similar educational background; these relatives constituted the control group.

Stimuli
Twenty monosyllabic words were chosen, each having one of the four lexical tones
and a monophthong from the five point vowels /a, y, u, i, ɤ/ in standard Chinese
(Lee & Zee, 2003). These words were chosen after discussion between the authors
and the speech-language therapists, and also by consultation with patients’ families
or caregivers. The stimuli were related to all the patients’ daily routines, and thus
expected to be easily recognisable (Table 2).

Recording procedures
Recordings were made in the soundproof laboratory office at the hospital. All samples were
recorded as wave files using a Korg audio MR-1 recorder (AKG miniature condenser model
C520) with a headset microphone positioned 5 cm away from the participants’ mouths. Each
subject was provided with 20 monosyllabic words, represented by 20 photographs in random
order. Each photo had an image in the centre and a Chinese character in the bottom right
corner, representing the same word. The participants could thus access the meaning of each stimulus either from the image or from the written monosyllabic word. An experimenter held up one picture
at a time in front of the participant to encourage word production. A practice session was held
before the recording began, in which participants were asked to label 6 to 8 pictures with
monosyllabic words in isolation. If a bisyllabic or trisyllabic word was produced, they were
instructed to repeat it with only the first or last syllable (usually the target ones). For example,
if a participant identified a picture of a ‘duck’ as the disyllabic word ‘/ja/(T1) /zɨ/’, the
experimenter would say ‘请再试一遍, 注意这次只能说第一个字哦’, meaning ‘please try it
again, just say the first word this time’. Throughout the recording session, the patients were
instructed to pronounce the target words in isolation, with a normal voice and volume. If they
had difficulties retrieving the target words, the experimenter would provide prompts on the
words’ meanings or collocations, to aid elicitation. For example, when eliciting the target verb ‘/ba/ (T2)’ (which means ‘to pull’), the experimenter pointed to the picture and asked:
‘他在做一个动作, 我们叫‘什-么-’萝卜?’, meaning ‘He is performing an action that we call
‘w-h-a-t’ the radish’? The word ‘what’ was lengthened and the experimenter performed the
action of ‘pulling the radish’ at the same time. After being prompted by the experimenter, the
participants were given enough time to try again. They were also allowed to self-correct. All
participants labelled each picture twice. If the first production could not be used due to a lack
of isolation, the second was chosen. All the non-isolated words were eventually excluded. The
principal investigator was present during the recording process to ensure that all speakers
received the same instructions and procedures. The final set of usable productions involved
200 monosyllabic words produced by healthy speakers and 191 by A-AOS participants.
Table 2. Stimuli of the experiment.
Tone 1 Tone 2 Tone 3 Tone 4
Word Pinyin IPA Gloss Word Pinyin IPA Gloss Word Pinyin IPA Gloss Word Pinyin IPA Gloss
/a/ 鸭 ya /ja/ duck 拔 ba /ba/ pull 马 ma /ma/ horse 大 da /da/ big
/y/ 居 ju /tɕy/ live 鱼 yu /jy/ fish 雨 yu /jy/ rain 绿 lv /ly/ green
/u/ 书 shu /ȿu/ book 福 fu /fu/ fortune 斧 fu /fu/ axe 树 shu /ȿu/ tree
/i/ 一 yi /ji/ one 棋 qi /tɕhi/ chess 椅 yi /ji/ chair 戏 xi /ɕi/ opera
/ɤ/ 喝 he /xɤ/ drink 河 he /xɤ/ river 葛 ge /gɤ/ a surname 热 re /ɻɤ/ hot
Note: Four lexical tones and five point vowels were included.

Perceptual identification of tones and vowels


The 391 productions by A-AOS patients (191) and healthy controls (200) were excised and
saved as individual sound files. Ten Mandarin listeners were recruited to identify the tone
and vowel productions. The listeners were speech and language therapy students who were
all Mandarin natives trained in phonetic transcription. All listeners had normal hearing
(better ear pure-tone threshold ≤ 20 dB HL for octave frequencies of 0.125 kHz to 8 kHz).
They were unaware of the patients’ diagnosis, and were required to transcribe the stimuli
only with Pinyin.1 Methods for controlling lexical bias were used for the tone and vowel
judgments respectively: for tone judgments, all the productions were low-pass filtered at
400 Hz to remove most of the segmental information while keeping the fundamental
frequency information (Wong, 2012); for vowel judgments, the initial consonant of each
word was excised and only the vowel section was kept as a stimulus, to avoid any lexical influence (following Mousikou & Rastle, 2015; New et al., 2008; Singh et al., 2015). All
productions were normalised to the same root mean square value of 65 dB SPL using Praat
v6.1.24 (Boersma & Weenink, 2020).
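The two stimulus manipulations were performed in Praat; as a rough sketch of what they amount to, the fragment below applies a 400 Hz low-pass filter and an RMS normalisation to a waveform in Python. The filter order and the linear-scale target RMS are our own illustrative choices (the study normalised to 65 dB SPL):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_400hz(signal, fs, order=4):
    """Zero-phase Butterworth low-pass at 400 Hz: removes most segmental
    (formant) information while preserving the fundamental frequency."""
    b, a = butter(order, 400 / (fs / 2), btype="low")
    return filtfilt(b, a, signal)

def normalise_rms(signal, target_rms=0.05):
    """Scale a waveform to a common root-mean-square level so that all
    stimuli are presented at equal intensity."""
    rms = np.sqrt(np.mean(signal ** 2))
    return signal * (target_rms / rms)
```

For example, a signal containing both a 200 Hz and a 2000 Hz component keeps essentially only the 200 Hz component after `lowpass_400hz`, which is the effect exploited for the tone-judgment stimuli.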
Prior to the formal experiment, a practice session was conducted to familiarise
listeners with the task. During the tone identification task, the listeners were instructed to write down the tone categories (T1–T4) on an answer sheet, using the Pinyin tone marks ‘ˉ’ (T1), ‘ˊ’ (T2), ‘ˇ’ (T3) and ‘ˋ’ (T4). In the
vowel identification task, the same listeners were told that only vowels from the
Chinese Pinyin inventory would be presented to them, and they should write them
down on another answer sheet.2 All productions were delivered to these listeners via
PowerPoint slides in a sound-treated booth with high-quality headphones
(Sennheiser HDA200). They were presented in a quasi-random order so that listeners
would not hear samples from the same speaker consecutively. Listeners could repeat
a stimulus as many times as they wanted before writing down their response in this
self-paced identification task. They were also allowed to take breaks whenever they
wanted, and to spread out the duties across several days to avoid listening fatigue
(Vitti et al., 2021).
Although differential token manipulation for the vowel and tone identification tasks
(low pass filter vs. vowel extraction) might help avoid a lexical effect, an anonymous
reviewer pointed out that it could have different impacts on listeners’ judgment of tones
and vowels. Considering this, we conducted a second identification task on the
non-manipulated productions. With all other experimental conditions remaining the
same, the listeners were required to identify the tones and vowels of the non-manipulated
stimuli, using symbols in the Pinyin system.

1. The use of the International Phonetic Alphabet was discouraged, because the goal of the transcription was to identify the participants’ tone or vowel productions within the Mandarin phonological system. The identification, on the other hand, was not concerned with the degrees of distortion of each stimulus in its most delicate form.
2
2. The phonemes /i/ and /ɨ/ in Chinese phonology are represented by the same Pinyin symbol ‘i’, so raters were informed that when an ‘i’ was identified, they must specify whether it was as in ‘si’ (/sɨ/) or as in ‘xi’ (/ɕi/).

Acoustic measurements
Acoustic features of tones
To measure the acoustic properties of the tones, we investigated the f0 of each tone. We used
the custom-written Praat script ProsodyPro 3.1 (Xu, 2005–2010) to extract and measure f0 on the rime of each word.
First, for the level tone T1, ‘pitch shift’ was used to measure and compare the flatness of the
f0 contour. Then, the ‘height of mean f0’ and ‘height of min f0’, which indexed the mean and
minimum level of the pitch of the produced tone (token mean) relative to the speaker’s mean
f0 (speaker mean), were used to determine how the speaker achieved the high and low tonal
targets for T1 and T3 (dipping tone), respectively. We also utilised ‘directional excursion’ to
quantify the degree of positive and negative f0 span for T2 and T4, with the parameter ‘slope’
determining the range and steepness. Finally, the terms ‘timing of min f0’ and ‘timing of max
f0’ were used to measure when the maximum and minimum f0 arrived for both T2 and T4
targets. These tone parameters were obtained following Wong’s (2012) approach. The
algorithms for the f0 parameters (except duration) are presented in Table 3 (in semitones).
Duration values were extracted from the entire length of the tone stimuli.

Acoustic features of vowels


The current study looked at both the static and dynamic spectral features of vowel formants,
as the combination of the two proved to be highly effective in the identification of different
vowels (Chen, 2008; Hillenbrand et al., 2001; Zahorian & Jagharghi, 1991). Here we
incorporate duration, F1, F2 and F1, F2 spectral change values of each vowel as parameters
for acoustic analysis (Adank et al., 2007; Chen, 2008; Sarvasy et al., 2020).
Vowel duration was measured between the vowel onset and offset boundaries which
were identified manually (Ordin & Polyanskaya, 2015; Perterson & Lehiste, 1960): vowel
onset was placed at the onset of visible voicing. In the case of nasal or approximant initials,
the onset boundary was established at the point where formants began to change rapidly out
of the stricture of the preceding consonants. Changes in the waveform and the higher
formant regions (F3, F4) were also considered when determining the exact point; vowel

Table 3. Algorithms for f0 parameters.

Tone   Acoustic parameter                                              Purpose
T1     Pitch shift (St) = max f0 − min f0                              To test the flatness of the f0 in T1
T1     Height of mean f0 (St) = mean f0 (token) − mean f0 (speaker)    To test the tone target ‘High’ in T1
T1     Height of min f0 (St) = min f0 (token) − mean f0 (speaker)      To test the tone target ‘High’ in T1
T3     Height of mean f0 (St) = mean f0 (token) − mean f0 (speaker)    To test the tone target ‘Low’ in T3
T3     Height of min f0 (St) = min f0 (token) − mean f0 (speaker)      To test the tone target ‘Low’ in T3
T2/T4  Directional excursion (St) = ±(max f0 − min f0),                To test the amount of f0 displacement in T2 or T4
       negative if the max f0 precedes the min f0
T2/T4  Slope = Directional excursion / duration (max f0 − min f0)     To test the steepness of the f0 slope in T2 or T4
T2/T4  Timing of max f0 = Duration (onset − max f0) / Duration (rime)  To test the timing of max f0 in T2 and T4 by its proportional duration in the rime
T2/T4  Timing of min f0 = Duration (onset − min f0) / Duration (rime)  To test the timing of min f0 in T2 and T4 by its proportional duration in the rime
Note: St = Semitones.
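The Table 3 algorithms can be written out explicitly. The sketch below assumes an f0 track already extracted (e.g. by ProsodyPro) over the rime, with sample times in seconds; the semitone reference and all variable names are our own, not from the original script:

```python
import numpy as np

def hz_to_st(f0_hz, ref_hz=100.0):
    """Convert f0 from Hz to semitones relative to a reference frequency."""
    return 12 * np.log2(np.asarray(f0_hz) / ref_hz)

def tone_parameters(f0_st, times, speaker_mean_st):
    """Compute the Table 3 f0 parameters for one rime.

    f0_st: f0 samples in semitones; times: sample times (s);
    speaker_mean_st: the speaker's overall mean f0 in semitones.
    """
    f0, t = np.asarray(f0_st), np.asarray(times)
    i_max, i_min = int(np.argmax(f0)), int(np.argmin(f0))
    rime_dur = t[-1] - t[0]
    # directional excursion is negative when max f0 precedes min f0 (a fall)
    sign = -1.0 if i_max < i_min else 1.0
    excursion = sign * (f0[i_max] - f0[i_min])
    return {
        "pitch_shift": f0[i_max] - f0[i_min],              # flatness (T1)
        "height_of_mean_f0": f0.mean() - speaker_mean_st,  # High/Low target
        "height_of_min_f0": f0[i_min] - speaker_mean_st,
        "directional_excursion": excursion,                # f0 span (T2/T4)
        "slope": excursion / abs(t[i_max] - t[i_min]),     # steepness
        "timing_of_max_f0": (t[i_max] - t[0]) / rime_dur,  # proportion of rime
        "timing_of_min_f0": (t[i_min] - t[0]) / rime_dur,
    }
```

For a steadily rising contour (a T2-like movement), the directional excursion is positive and the max f0 arrives at the end of the rime, so `timing_of_max_f0` approaches 1.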

offset boundaries were identified by the termination of voicing and the formant structure.
The authors double-checked all segmentations for correctness and consistency to reduce
experimenter bias.
Based on formant trajectories, the values (in Hertz) at the 25%, 50% and 75% points of
the F1 and F2 duration were extracted (Chen, 2008). Therefore, the static values of the
formants were the ones taken at the midpoint of F1 and F2 (50% of the duration), whereas
the spectral change values of F1 and F2 were calculated as the difference in the formant
values between the 25% and 75% points.
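As a sketch of these measures, the fragment below takes a single formant trajectory sampled evenly over the vowel and returns the static (midpoint) value and the spectral change. The sign convention (value at 75% minus value at 25%) is our assumption:

```python
import numpy as np

def formant_measures(track_hz):
    """Static and dynamic measures from one formant trajectory.

    track_hz: formant values (Hz) sampled evenly across the vowel.
    Returns the midpoint (50% of duration) value and the spectral
    change, i.e. the value at 75% minus the value at 25%.
    """
    track = np.asarray(track_hz, dtype=float)
    n = len(track) - 1

    def at(proportion):
        # linear interpolation at a proportional time point
        return float(np.interp(proportion * n, np.arange(len(track)), track))

    return {"static": at(0.50), "spectral_change": at(0.75) - at(0.25)}
```

Applied to F1 and F2 separately, this yields the four spectral parameters used here (F1, F2, and their spectral-change values), alongside vowel duration.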

Results
Disruption of tones
The listeners identified 197 tone productions as ‘Healthy speakers’ correct (HC)’ (50, 50, 48
and 49 productions for T1, T2, T3 and T4, respectively), 125 as ‘Patients’ correct (PC)’ (35, 39,
13 and 38 productions for T1, T2, T3 and T4, respectively) and 35 as ‘Patients’ incorrect (PI)’
(9, 4, 12 and 10 productions for T1, T2, T3 and T4, respectively). The inter-rater reliability was
0.989 (Cronbach’s alpha). The average tone disruption rate was 18.3%, with by-speaker rates ranging from 0 to 35% (mean = 18%, SD = 0.11). The PI/PC ratio was 28%. The accuracy rates for T1, T2, T3 and T4 were 74% (35), 80% (39), 28% (13) and 79% (38), respectively. Among these, patients’ T3 was least identifiable to listeners. Overall, tone disruption was pervasive in the patients’ articulation, with the majority of the patients having two or more productions identified as ‘incorrect’ (for A-AOS1 to A-AOS10, the numbers of incorrect tokens produced were 3, 3, 2, 2, 3, 0, 6, 7, 6, 3, respectively). The numbers of the HC, PC and PI
groups are shown in Table 4.
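The inter-rater reliability figures reported here are Cronbach’s alpha coefficients computed across the ten raters. A minimal sketch of the computation, treating raters as ‘items’ scored per token (the binary scoring layout is our assumption), might look like this:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-rater consistency.

    ratings: 2-D array-like, rows = tokens, columns = raters
    (e.g. 1 = rater identified the intended category, 0 = did not).
    """
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]                       # number of raters
    rater_vars = x.var(axis=0, ddof=1)   # variance of each rater's scores
    total_var = x.sum(axis=1).var(ddof=1)  # variance of per-token totals
    return k / (k - 1) * (1 - rater_vars.sum() / total_var)
```

Perfectly agreeing raters yield an alpha of 1; values near 0.99, as reported here, indicate near-unanimous judgments.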
Time normalised f0 contours at the rime of the syllables were plotted for the HC, PC and
PI groups in Figure 1. For the HC group, we observe very clear f0 contours that are high and
level for T1, rising for T2, dipping for T3 and falling for T4. The PC group’s contours, on the
other hand, show the general shape of each tone type but are overall flatter than the HC
group. By contrast, the f0 contours in the PI group show considerable variation. To facilitate
a better comparison, we plotted the average f0s of the four tones in Figure 2 for each
accuracy group. The graphs show that the differences between the HC and PC groups are
more likely to be a matter of degree, but the contours of the HC and PI groups appear to be
different in nature.

Table 4. Numbers of tone and vowel productions in each group.

Tones (by-speaker disruption rate = 0 to 35%; mean = 18%, SD = 0.11):
       HC           PC         PI
T1     50 (100%)    35 (74%)   9 (19%)
T2     50 (100%)    39 (80%)   4 (8%)
T3     48 (96%)     13 (28%)   12 (26%)
T4     49 (98%)     38 (79%)   10 (20%)
Total  197 (98.5%)  125 (65%)  35 (18.3%)

Vowels (by-speaker disruption rate = 0 to 60%; mean = 25%, SD = 0.26):
       HC           PC         PI
/a/    40 (100%)    31 (78%)   6 (15%)
/y/    40 (100%)    27 (68%)   13 (32.5%)
/u/    40 (100%)    27 (69%)   10 (26%)
/i/    40 (100%)    34 (87%)   5 (13%)
/ɤ/    40 (100%)    16 (48%)   14 (42%)
Total  200 (100%)   135 (71%)  48 (25.1%)

Figure 1. f0 plots of the four tones produced by HC, PC and PI groups. The X-axes represent the
normalised duration of the tone productions, while the Y-axes show the mean fundamental
frequency of the pitch contours (in semitones). The rising-falling contours of two T2 and one T4
(all produced by A-AOS 9) were categorised as correct by listeners, as indicated by the red arrows.

As there were three accuracy groups in the study, and the assumption of normality and
homogeneity of variance was violated in the parametric statistics (the values of the acoustic
parameters are shown in Table 5), a Kruskal-Wallis test was carried out (SPSS version 24) to test for differences in the f0 parameters among the three groups. For each type of tone, the Kruskal-Wallis test provided evidence of a difference (p < 0.05) between the mean
ranks of at least one pair of groups. Group differences were then subjected to Dunn’s
pairwise test (Table 6). It turned out that Patients’ correct productions (PC) of T1 were
lower in height than healthy speakers, as indicated by the significantly smaller height of
mean f0 (p < 0.001) and height of min f0 (p < 0.001) in the f0 contour. The relative flatness
of patients’ correct T2 output compared to the HC group was also reflected by significantly
smaller degrees of slope (p < 0.001) and directional excursion (p < 0.001). T2 in the PC
group also had a shorter duration3 (p < 0.01) and reached its maximum f0 earlier than
healthy speakers (p < 0.001), suggesting that patients were not very good at keeping the f0
low and steep for an extended period. Both T3 parameters, ‘height of mean f0’ and ‘height of min f0’, were significantly smaller in the PC than in the HC group (p < 0.001),

3. Tone duration was measured for the entire word, which means that the type of onset consonant in each word affects the duration. For a more rigorous analysis, a subset of words for the HC group matched with the PC group should be considered.

Figure 2. Average f0 plots of the four tones produced by HC, PC and PI groups. The X-axes represent the
normalised time of the average tone production, while the mean fundamental frequency of the pitch
contour (in semitones) is given along the Y-axis.

indicating that the patients had difficulties in maintaining the f0 either as low or as dipped as
the healthy speakers. Patients’ correct T4 productions were also less ranged and steep
compared to healthy speakers, as seen by much lower values in directional excursion
(p < 0.001) and slope (p < 0.001).
Patients’ incorrect tone productions (PI) also differed significantly from
Patients’ correct (PC) and Healthy speakers’ correct (HC) tone productions. For
example, T1 in PI had a significantly larger pitch fluctuation (p < 0.001) and
a lower minimum f0 height (p < 0.05) than T1 in PC or HC, which distinguished it
from a level tone. T4 in PI also differed from T4 in PC and HC in the timing of
reaching the maximum f0 (p < 0.01).
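The statistical pipeline just described (Kruskal-Wallis followed by Dunn’s pairwise tests with Bonferroni correction) can be sketched in pure Python. The study used SPSS version 24, so this is only an illustrative re-implementation; the tie correction for the H statistic is omitted for brevity:

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks of the pooled sample, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_dunn(groups):
    """Kruskal-Wallis H over k groups, then Dunn's z-tests for each
    pair with Bonferroni-adjusted two-sided p-values."""
    labels = list(groups)
    pooled = [v for g in labels for v in groups[g]]
    ranks = average_ranks(pooled)
    mean_rank, size, pos = {}, {}, 0
    for g in labels:
        n = len(groups[g])
        mean_rank[g], size[g] = sum(ranks[pos:pos + n]) / n, n
        pos += n
    N = len(pooled)
    H = 12.0 / (N * (N + 1)) * sum(
        size[g] * (mean_rank[g] - (N + 1) / 2) ** 2 for g in labels)
    n_pairs = len(labels) * (len(labels) - 1) // 2
    adj_p = {}
    for a, b in combinations(labels, 2):
        se = math.sqrt(N * (N + 1) / 12 * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        adj_p[(a, b)] = min(1.0, p * n_pairs)   # Bonferroni correction
    return H, adj_p
```

For three groups of, say, T2 slope values, `kruskal_dunn({'HC': [...], 'PC': [...], 'PI': [...]})` returns the H statistic (to be compared against a χ² distribution with df = 2) and a Bonferroni-adjusted p-value for each pair, mirroring the layout of Table 6.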

Disruption of vowels
The raters identified 200 vowel productions as Healthy speakers’ correct (HC) (40 for each
of the vowels /a/, /y/, /u/, /i/ and /ɤ/), 135 as Patients’ correct (PC) (31, 27, 27, 34 and 16
productions for /a/, /y/, /u/, /i/ and /ɤ/, respectively) and 48 as Patients’ incorrect
(PI) (6, 13, 10, 5 and 14 productions for /a/, /y/, /u/, /i/ and /ɤ/, respectively). Inter-rater
reliability for vowel identification was also good (α = 0.974). The average vowel disruption rate
was 24%, with by-speaker rates ranging from 0 to 60% (mean = 25%, SD = 0.26). The PI/PC
ratio was 36%. Accuracy rates for the vowels /a/, /y/, /u/, /i/ and /ɤ/ were 78% (31), 68% (27),
69% (27), 87% (34) and 48% (16), respectively. The vowel /i/ had the largest number of
correct productions (34) and the smallest number of incorrect productions (5). Compared with
tones, the ‘incorrect’ vowels were more unevenly distributed, with most of them clustering
around several patients (for A-AOS5, A-AOS7, A-AOS8, A-AOS9 and A-AOS10, the
numbers of incorrect vowels produced were 12, 7, 11, 7 and 10, respectively); see also Table 4.
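For bookkeeping, the summary rates reported above follow directly from the raw counts. The denominators below reflect our reading of how the figures were derived (all 200 target vowel items for the disruption rate, the PC count for the PI/PC ratio) and are assumptions, not a stated procedure:

```python
def disruption_stats(n_correct, n_incorrect, n_targets):
    """Summary rates from raw counts. Denominators are assumed:
    disruption rate over all target items, PI/PC ratio over the
    number of correct productions."""
    return {
        "disruption_rate": n_incorrect / n_targets,
        "pi_pc_ratio": n_incorrect / n_correct,
    }
```

With the vowel counts above, `disruption_stats(135, 48, 200)` gives a disruption rate of 0.24 and a PI/PC ratio of about 0.36, matching the reported 24% and 36%.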
Table 5. Mean values of the acoustic parameters for tones and vowels.

Tone parameters (HC / PC / PI):
T1  Pitch shift (St)         1.32 (0.73)    1.9 (1.65)     6.5 (2.53)
    Height of mean f0 (St)   3.92 (1.96)    1.2 (1.6)      1.4 (1.21)
    Height of min f0 (St)    3.24 (1.9)     0.14 (1.77)    −2.94 (1.39)
    Syllable duration (S)    0.59 (0.16)    0.54 (0.18)    0.57 (0.2)
T2  D-excursion (St)         9.88 (2.33)    3.28 (4.28)    −0.94 (2.97)
    Slope                    33.1 (16.7)    17.8 (16)      −10.8 (29.9)
    Timing of max f0         0.98 (0.03)    0.9 (0.12)     0.44 (0.43)
    Timing of min f0         0.21 (0.14)    0.25 (0.22)    0.37 (0.46)
    Syllable duration (S)    0.63 (0.17)    0.54 (0.2)     0.49 (0.1)
T3  Height of mean f0 (St)   −5.51 (1.38)   −2.2 (1.93)    −0.1 (2)
    Height of min f0 (St)    −9.43 (2.16)   −4.4 (2.32)    −1.71 (1.75)
    Syllable duration (S)    0.71 (0.17)    0.63 (0.21)    0.46 (0.16)
T4  D-excursion (St)         −17.2 (3.83)   −7.93 (4.11)   0.1 (7.12)
    Slope                    −82.9 (24.1)   −38.9 (17)     −2.85 (35.5)
    Timing of max f0         0.08 (0.08)    0.1 (0.12)     0.58 (0.39)
    Timing of min f0         0.96 (0.07)    0.98 (0.03)    0.74 (0.32)
    Syllable duration (S)    0.45 (0.14)    0.46 (0.18)    0.6 (0.33)

Vowel parameters (HC / PC / PI):
/a/  F1 (Hz)     996 (192)     888 (258)     665 (228)
     F2 (Hz)     1406 (184)    1368 (209)    1387 (342)
     F1-C (Hz)   44.1 (157)    −49.7 (137)   88.1 (230)
     F2-C (Hz)   −36.2 (235)   −18.9 (187)   −101.4 (257)
     D (S)       0.4 (0.15)    0.4 (0.17)    0.36 (0.14)
/y/  F1 (Hz)     286 (46)      305 (40)      494 (203)
     F2 (Hz)     2196 (355)    2018 (246)    1645 (664)
     F1-C (Hz)   −6.65 (52)    −19.1 (47)    −11.8 (82)
     F2-C (Hz)   −12.5 (130)   −10.7 (104)   231 (781)
     D (S)       0.45 (0.16)   0.43 (0.18)   0.47 (0.24)
/u/  F1 (Hz)     340 (51)      368 (44)      557 (227)
     F2 (Hz)     695 (80)      755 (133)     1152 (467)
     F1-C (Hz)   −11.5 (51)    11.3 (67)     11.5 (140)
     F2-C (Hz)   61.8 (81)     −17.7 (269)   68.9 (420)
     D (S)       0.44 (0.14)   0.37 (0.17)   0.37 (0.11)
/i/  F1 (Hz)     286 (44)      383 (424)     380 (90)
     F2 (Hz)     2732 (335)    2489 (326)    2025 (460)
     F1-C (Hz)   −9.6 (45)     −20.1 (60)    −381.1 (690)
     F2-C (Hz)   13.6 (219)    34.8 (186)    −50.5 (178)
     D (S)       0.46 (0.17)   0.4 (0.14)    0.47 (0.28)
/ɤ/  F1 (Hz)     468 (74)      488 (79)      622 (242)
     F2 (Hz)     1238 (107)    1219 (89)     1326 (403)
     F1-C (Hz)   −163 (148)    −129.3 (48)   4.2 (207)
     F2-C (Hz)   37.1 (108)    −19.1 (116)   36.8 (237)
     D (S)       0.43 (0.15)   0.33 (0.09)   0.39 (0.13)

Note: F1-C = F1 change, F2-C = F2 change, D = duration (rime), St = semitones, S = seconds, Hz = Hertz. Numbers in brackets are standard deviations of the mean.

Table 6. Results of the Kruskal-Wallis test for tones and pairwise comparisons.

                               Kruskal-Wallis test        Pairwise comparison (Dunn’s test)
Tone  Parameter                H        df  Sig.          HC vs. PC    PC vs. PI    HC vs. PI
T1    Pitch shift              24.203   2   p = 0.000     p = 0.367    p = 0.000**  p = 0.000**
      Height of mean f0        36.305   2   p = 0.000     p = 0.000**  p = 1.000    p = 0.005**
      Height of min f0         52.282   2   p = 0.000     p = 0.000**  p = 0.049*   p = 0.000**
T2    Syllable duration        10.244   2   p = 0.006     p = 0.009**  p = 1.000    p = 0.272
      Directional excursion    61.901   2   p = 0.000     p = 0.000**  p = 0.423    p = 0.000**
      Slope                    33.562   2   p = 0.000     p = 0.000**  p = 0.17     p = 0.000**
      Timing of max f0         28.261   2   p = 0.000     p = 0.000**  p = 0.582    p = 0.003**
T3    Syllable duration        15.967   2   p = 0.000     p = 0.378    p = 0.146    p = 0.000
      Height of mean f0        41.651   2   p = 0.000     p = 0.000**  p = 0.728    p = 0.000**
      Height of min f0         45.702   2   p = 0.000     p = 0.000**  p = 0.6      p = 0.000**
T4    Directional excursion    63.151   2   p = 0.000     p = 0.000**  p = 0.106    p = 0.000**
      Slope                    58.937   2   p = 0.000     p = 0.000**  p = 0.174    p = 0.000**
      Timing of max f0         10.654   2   p = 0.005     p = 1.000    p = 0.006**  p = 0.006**
Note: Only the acoustic parameters with a main effect observed (p < 0.05) are presented. * indicates statistical significance at the 0.05 level after Bonferroni correction for multiple comparisons; ** indicates statistical significance at the 0.01 level after Bonferroni correction for multiple comparisons. HC, PC and PI stand for ‘Healthy speakers’ correct’, ‘Patients’ correct’ and ‘Patients’ incorrect’ tone productions, respectively.

Figure 3. Mean formant trajectories (F1 and F2) for each vowel uttered by different accuracy groups. The
numbers 1, 2, and 3 on the X-axes represent 25%, 50%, and 75% of the vowel duration, respectively,
while the Y-axes display the frequency (Hertz) value of the averaged formant trajectories.

Average F1 and F2 trajectories for each vowel were plotted at 25%, 50% and 75% of
the vowel duration (Figure 3). The F1 and F2 trajectories of the HC (solid lines) and PC (dashed
lines) groups are broadly similar in terms of formant position and shape. The trajectories of the
PI group (dotted lines) are, on the other hand, clearly distinct from those of the other two groups.
Specifically, there is a noticeable tendency towards vowel centralisation in the PI group.

A Kruskal-Wallis test (with Dunn’s post hoc pairwise tests) was also performed on the
differences in the spectral parameters among the three accuracy groups (Table 7). No
statistical difference was found in any of the spectral parameters between the HC and PC
groups, indicating that – unlike tones, where PC productions were perceptually judged as
correct but acoustically different from HC productions – PC productions of vowels were
both perceptually and acoustically very close to the HC productions.
By contrast, there were some spectral differences between the PC and PI outputs. For
example, the vowel /a/ in PI exhibited a significantly smaller F1 value (indicating a higher
tongue position) than in PC (p < 0.05), and the vowel /i/ in PI had significantly smaller F2
values (reflecting a more retracted tongue position) than in PC (p < 0.01). Furthermore, the
PI group had statistically greater F1 values for /u/ and /y/ (indicating a lower tongue
position) than the PC group (p < 0.05), and larger F1 spectral changes for /i/ and /ɤ/
than the HC group (p < 0.05), indicating that patients had difficulty maintaining the
intrinsic spectral dynamics of certain vowels. No difference was found in vowel
duration among the three accuracy groups.
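The vowel parameters can be sketched in the same spirit. Here F1 and F2 are taken at the vowel midpoint, spectral change as the 75%-minus-25% difference, and centralisation is indexed as the Euclidean distance from a vowel-space centre; these operational definitions, and the centroid value, are illustrative assumptions rather than the study’s exact procedure:

```python
import math

def vowel_measures(f1, f2):
    """f1, f2: (25%, 50%, 75%) formant values in Hz for one token."""
    return {
        "F1": f1[1], "F2": f2[1],        # midpoint values
        "F1_change": f1[2] - f1[0],      # spectral change across the vowel
        "F2_change": f2[2] - f2[0],
    }

def centralisation(f1_mid, f2_mid, centroid=(500.0, 1500.0)):
    """Distance (Hz) from an assumed centre of the vowel space;
    smaller values indicate a more centralised (schwa-like) token."""
    return math.hypot(f1_mid - centroid[0], f2_mid - centroid[1])
```

Under this index, a peripheral /a/ token sits farther from the centre than a centralised one, which is the pattern visible for the PI trajectories in Figure 3.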
Finally, we performed a Spearman correlation analysis in which the number of incorrect
tones was compared to the number of incorrect vowels for each patient. A strong,
significant correlation of 0.69 (p < 0.05) was observed, showing that incorrect production
of tones tended to coincide with incorrect production of vowels.
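The Spearman correlation between per-patient incorrect-tone and incorrect-vowel counts amounts to a Pearson correlation computed on tie-averaged ranks; a minimal sketch:

```python
import math

def tied_ranks(values):
    """1-based ranks, averaging over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = tied_ranks(x), tied_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                    sum((b - my) ** 2 for b in ry))
    return num / den
```

Feeding in the ten patients’ incorrect-tone and incorrect-vowel counts would reproduce a coefficient of the kind reported here (0.69); the monotonic cases below simply sanity-check the implementation.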

Differences in Mandarin tone and vowel disruption

In the current study, the numbers of disrupted tones and vowels for the A-AOS patients
were similar. The PI/PC ratios for tones and vowels were 28% and 36%, respectively,
across the 10 A-AOS participants. Chi-square analysis revealed no significant difference
between the numbers of patients’ incorrect tones and incorrect vowels (χ²(1) = 0.882,
p > 0.05). Thus, our findings suggest that, from a perceptual viewpoint, tones and
vowels were equally damaged for the patients. However, based on the acoustic parameters
measured, even the patients’ tones categorised as correct by native Mandarin listeners
differed acoustically from those of healthy speakers: Patients’ correct productions of T1
were lower (with smaller ‘height of mean f0’ and ‘height of min f0’ values), their T2 and T4

Table 7. Results of the Kruskal-Wallis test for vowels and pairwise comparisons.

                         Kruskal-Wallis test        Pairwise comparison (Dunn’s test)
Vowel  Parameter         H        df  Sig.          HC vs. PC   PC vs. PI    HC vs. PI
/a/    F1                18.042   2   p = 0.001     p = 0.056   p = 0.024*   p = 0.000**
/y/    F1                11.776   2   p = 0.003     p = 1.000   p = 0.002**  p = 0.011*
/u/    F1                7.779    2   p = 0.02      p = 1.000   p = 0.047*   p = 0.018*
       F2                8.219    2   p = 0.016     p = 1.000   p = 0.089    p = 0.012*
/i/    F2                10.422   2   p = 0.005     p = 1.000   p = 0.005**  p = 0.006**
       F1 change         6.802    2   p = 0.033     p = 1.000   p = 0.106    p = 0.029*
/ɤ/    F1 change         7.893    2   p = 0.019     p = 1.000   p = 0.085    p = 0.018*
Note: Only the acoustic parameters with a main effect observed (p < 0.05) are presented. * indicates statistical significance at the 0.05 level after Bonferroni correction for multiple comparisons; ** indicates statistical significance at the 0.01 level after Bonferroni correction for multiple comparisons. HC, PC and PI stand for ‘Healthy speakers’ correct’, ‘Patients’ correct’ and ‘Patients’ incorrect’ vowel productions, respectively.

were flatter (T2 was produced with smaller ‘slope’, ‘directional excursion’ and ‘timing of
maximum f0’ values, while T4 had smaller ‘directional excursion’ and ‘slope’ values), and their
T3 was less dipped (with smaller values for ‘height of mean f0’ and ‘height of min f0’) than those
of healthy speakers. By contrast, the acoustic correlates of vowels tended to be consistent with the
perceptual judgments across the three accuracy groups. Patients’ correct productions of
vowels did not differ significantly from those of healthy speakers in terms of the acoustic parameters
(F1, F2, F1 spectral change, and F2 spectral change), whereas Patients’ incorrect (PI)
productions showed acoustic discrepancies from both the PC and HC groups
(productions in the PI group had a lower F1 for /a/, a lower F2 for /i/, a higher F1 for
/y/ and /u/, and larger values for F1 spectral change in /i/ and /ɤ/). The relationships among the
three accuracy groups are presented schematically in Figure 4.
Unlike vowels, the patients’ tones identified as correct were acoustically different from those of
the healthy controls, indicating that Mandarin listeners were fairly tolerant of the less-targeted
tone productions, which went undetected. We even observed that two T2 tokens and one T4 token
(all by A-AOS9) exhibited anomalous rising-falling contours (see Figure 1), which is atypical
in the Mandarin tonal system. Surprisingly, these three cases were all categorised as correct by
the listeners. Visual examination of the three cases revealed that the maximum f0 of the two T2
tokens was reached rather early in the contours, whereas the maximum f0 of the T4 token was
reached very late (near the midpoint of the contour), but listeners ignored the atypical parts of
the contours and still perceived the tones as the targets. By contrast, the native listeners were
much less tolerant of deviations in the acoustic realisation of vowels (F1, F2, F1 spectral
change). Taken together, these findings suggest that tones are actually more damaged than
vowels.

Error types
Errors included the ‘less-targeted’ and the ‘substitution’ types. Less-targeted errors were
tones and vowels articulated in a pattern that was ‘less exact’ rather than ‘totally erroneous’. If the
intended sound was articulated as something completely different, however, it was
a substitution error. Specifically, many target vowels identified as diphthongs or triphthongs

Figure 4. Acoustic relationships for the three accuracy groups: left panel for tones and right panel for
vowels. Line length was calculated from the mean Euclidean distance between groups across the four
tones (or five vowels).

could be considered ‘distorted substitutions’ because these vowels were produced with
a prolonged duration (a typical AOS symptom; McNeil et al., 2004) and were ultimately
transcribed as combinations of several vowels. Both distorted substitutions and off-target
articulation reflected the patients’ articulatory control deficit, presumably stemming from
motor planning problems.
Many examples can be found in which tones were articulated in a less precise way
(e.g. T1’s pitch contour was less high and level, T2’s and T4’s pitch contours were less
steep, or T3’s pitch contour was less dipped). In contrast, there were also cases in which target
tones were entirely substituted by other tones, for example, T3 identified as T1, T2 or T4
(see Table 8). Vowels could also be less-targeted, as evidenced by the vowel centralisation
tendency in Figure 3. On the other hand, vowel substitution errors were common. For example,
some target vowels were identified as other monophthongs (e.g. /a/ was identified as /ɤ/, /y/ as
/u/, or /ɤ/ as /u/). Certain vowels were even identified as diphthongs or triphthongs (as distorted
substitutions, e.g. /a/ was identified as /ai/ or /au/, /u/ as /ao/ or /ou/, /y/ as /iao/ or /uei/; cf.
Table 8). In general, the less-targeted tones were very likely to be considered ‘correct’, whereas
the less-targeted vowels could be mistaken for other vowels when transcribed with Pinyin
symbols.

Table 8. Identification of the tones and vowels in the PI group.

A-AOS1    Tones:  T4 (T4, T1), T3 (T3, T1), T2 (T2, T1)
          Vowels: /y/ (/y/, /i/, /u/)
A-AOS2    Tones:  T1 (T4*), T4 (T1*), T3 (T1*)
A-AOS3    Tones:  T3 (T3, T2), T3 (T3, T2)
A-AOS4    Tones:  T1 (T1, T4*), T2 (T4*)
A-AOS5    Tones:  T1 (T1, T2*), T3 (T1*, T2, T4), T2 (T1*, T2, T4)
          Vowels: /a/ (/a/, /ɤ/, /au/), /y/ (/y/, /u/*), /y/ (/y/, /u/, /o/), /y/ (/u/*, /ɤ/),
                  /y/ (/a/, /ɤ/), /u/ (/i/*, /y/), /u/ (/a/, /ɤ/), /u/ (/a/, /ɤ/), /u/ (/u/, /ɤ/, /o/),
                  /ɤ/ (/ie/*, /ɤ/), /ɤ/ (/ie/, /ɤ/, /i/), /ɤ/ (/u/, /ɤ/, /o/)
A-AOS6    –
A-AOS7    Tones:  T1 (T2, T3), T2 (T2, T1, T3), T3 (T3, T2), T4 (T3*, T2), T1 (T1, T4*), T4 (T2, T3)
          Vowels: /a/ (/ai/, /a/, /i/), /a/ (/ja/, /a/), /y/ (/u/*), /y/ (/i/*, /y/),
                  /ɤ/ (/a/, /ɤ/), /ɤ/ (/a/, /ɤ/), /ɤ/ (/a/, /ɤ/)
A-AOS8    Tones:  T1 (T2, T3), T3 (T3, T2), T4 (T1*, T4), T1 (T1, T4*), T4 (T1*, T2), T4 (T2, T3), T3 (T3, T1)
          Vowels: /y/ (/u/*), /y/ (/iau/*, /ja/), /y/ (/ie/*), /u/ (/uo/, /u/, /o/), /u/ (/u/, /o/*),
                  /i/ (/y/*), /i/ (/y/*), /ɤ/ (/o/, /a/, /ɤ/), /ɤ/ (/i/, /ɤ/, /uei/), /ɤ/ (/a/, /ɤ/), /ɤ/ (/a/*)
A-AOS9    Tones:  T4 (T3, T2*), T4 (T2*), T1 (T2*, T3), T3 (T3, T2), T3 (T3, T2), T4 (T3, T1)
          Vowels: /a/ (/o/, /u/), /y/ (/i/*), /u/ (/a/, /ao/), /u/ (/u/, /ɤ/, /ou/),
                  /i/ (/ei/, /i/, /a/, /ɤ/), /ɤ/ (/ai/*), /ɤ/ (/iou/, /u/)
A-AOS10   Tones:  T1 (T1, T4*), T3 (T3, T2), T3 (T3, T2)
          Vowels: /a/ (/y/*, /u/), /a/ (/a/, /ao/, /ai/, /ɤ/), /y/ (/ao/*, /a/), /y/ (/uei/*),
                  /u/ (/ou/*, /iou/), /i/ (/ie/*), /u/ (/i/*), /i/ (/ɨ/*), /ɤ/ (/uo/, /ou/, /o/), /ɤ/ (/ao/*)
Note: Tones and vowels in brackets were the actual identifications made by the listeners. * indicates an
identification made by more than 8 raters.
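As a toy illustration of how identification data of this kind can be tallied (the asterisk convention in Table 8 marks identifications made by more than 8 of the raters), one might summarise each production as follows; the function name and threshold handling are ours, not the paper’s:

```python
from collections import Counter

def classify_identification(target, labels, strong_cutoff=8):
    """Summarise listener identifications of one production: the
    majority label, whether it is a substitution (majority differs
    from the target), and whether agreement is 'strong' (more than
    `strong_cutoff` raters chose the same non-target label, echoing
    the asterisk convention of Table 8)."""
    majority, votes = Counter(labels).most_common(1)[0]
    is_sub = majority != target
    return {
        "majority": majority,
        "is_substitution": is_sub,
        "strong": is_sub and votes > strong_cutoff,
    }
```

For instance, a T3 token heard as T1 by 9 of 10 listeners would be tallied as a strong substitution, while a token whose majority label matches the target would not count as a substitution at all.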

Tasks with non-manipulated stimuli


A second experiment was conducted to determine what happens when non-manipulated
stimuli are employed for identification. It turned out that, when presented with the
non-manipulated stimuli, listeners often responded faster and replayed the stimuli less
often. Listeners also showed more agreement on the identification of the stimuli: for
example, more items received a score of 10 (identified by all the listeners) or 0 (by none) in the
new task. Although this change in score pattern occurred for both tone and vowel identification,
it was more evident in the latter.
The listeners identified 197 as ‘Healthy speakers’ correct (HC)’ tone productions
(50, 50, 48 and 49 productions for T1, T2, T3 and T4, respectively), 124 ‘Patients’ correct
(PC)’ productions (32, 40, 15 and 37 productions for T1, T2, T3 and T4, respectively) and 31
‘Patients’ incorrect (PI)’ productions (10, 4, 8 and 9 productions for T1, T2, T3 and T4,
respectively). On the other hand, the listeners identified 200 ‘Healthy speakers’ correct
(HC)’ vowel productions (40 for each of /a/, /y/, /u/, /i/ and /ɤ/), 136 ‘Patients’ correct (PC)’
productions (31, 28, 28, 31, and 18 productions for /a/, /y/, /u/, /i/ and /ɤ/, respectively) and
44 ‘Patients’ incorrect (PI)’ productions (6, 11, 10, 4, and 13 productions for /a/, /y/, /u/, /i/
and /ɤ/, respectively). Chi-square analysis revealed no significant difference between the
numbers of patients’ incorrect tones and incorrect vowels (as well as the numbers of correct
tones and correct vowels) (χ2(1) = 0.947, p > 0.05).
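The chi-square comparison can be reproduced from the reported frequencies. Arranging the counts as a 2 × 2 table of incorrect/correct by tone/vowel – our assumed layout – recovers the reported χ²(1) = 0.947:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) for a 2x2 table
    [[a, b], [c, d]]; returns (chi2, p) with df = 1."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, r, cnum in ((a, row1, col1), (b, row1, col2),
                         (c, row2, col1), (d, row2, col2)):
        exp = r * cnum / n
        chi2 += (obs - exp) ** 2 / exp
    # for df = 1, chi2 = z**2, so p follows from the normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(math.sqrt(chi2) / math.sqrt(2))))
    return chi2, p
```

`chi_square_2x2(31, 124, 44, 136)` yields χ² ≈ 0.947 with p ≈ 0.33, consistent with the non-significant difference reported.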
Acoustic analysis showed that the tone properties (e.g. pitch shift, height of mean f0,
height of min f0, timing of max f0, or timing of min f0) differed significantly between the
‘Healthy speakers’ correct (HC)’ and ‘Patients’ correct (PC)’ groups, as well as between the
‘Patients’ correct (PC)’ and ‘Patients’ incorrect (PI)’ groups. By contrast, the vowel
properties of the three groups showed a different pattern: the ‘Healthy speakers’ correct (HC)’
and ‘Patients’ correct (PC)’ groups exhibited no statistical difference for any vowel parameter,
but both the ‘Healthy speakers’ correct (HC)’ and ‘Patients’ correct (PC)’ groups differed
significantly from the ‘Patients’ incorrect (PI)’ group in F1, F2, and F2 change values. These
results mirrored those of the identification task previously performed with the manipulated
stimuli.

Discussion
Tone vs. vowel disruptions in Mandarin A-AOS patients
Although Mandarin A-AOS patients may show similar amounts of perceived tone and
vowel disruption in their utterances, lexical tones were acoustically more damaged than
vowels. However, Mandarin listeners were relatively tolerant of the inaccurate tones, so
tone disruptions often went unnoticed. To understand this phenomenon, we could consider
that Mandarin vowels have different information entropy from tones (Do & Lai, 2021).
Vowels outperform tones in revealing the meaning of an utterance and accurately predict
speech intelligibility (Wiener & Turnbull, 2016). They have a high entropy value, carry
important information in words and are thus essential in conveying the linguistic message.
In contrast, tones may have a lower value in differentiating words, which explains Mandarin
listeners’ perceptual tolerance for them.

While Mandarin speakers may interpret tone and vowel disruption differently, the two may
be intrinsically connected. Previous studies on tone or vowel disruption in Mandarin-speaking
individuals with brain damage suggested that tone and vowel loss were two separate processes
that were unlikely to interact (Liang & van Heuven, 2004; Packard, 1986). This is because the
laryngeal structure responsible for vocal pitch generation and the supralaryngeal structure
responsible for vowel production were separate and essentially independent operating systems.
In the current study, we observed tonal impairment in many patients, whereas vowel disruption
appeared to cluster in a subset of patients. However, there was still a strong association (0.69)
between incorrect tones and incorrect vowels. This does not necessarily mean that the laryngeal
system interacted directly with the supralaryngeal system. Still, it is reasonable to suspect that
these two operating systems, which are both regulated by motor speech control, were impaired
side by side when this control was damaged in A-AOS patients, in such a way that tonal and
vowel disruption could occur on a comparable scale.
Neuroscience also adds support to the findings of the current research. Neural imaging studies
show that Mandarin tone processing occurs in the fronto-parietal areas (Myers et al., 2009): the
left posterior IFG, the adjacent PMC and the left SMG/IPS, to be exact. Activity in these areas
reflects phonological processing of the pitch contours of tones. Among other regions, the left IPS
has been linked to a variety of functions, including sensorimotor ones (Kouider et al., 2010),
which are frequently disrupted in A-AOS patients. Additionally, Mandarin speakers also display bilateral
temporo-parietal and subcortical activation specific to tone processing (including bilateral
AG/pMTG, SPL and PCC), which is likely for semantic purposes (Myers et al., 2009; Specht
et al., 2009). By contrast, vowel representations are distributed mainly over left sensorimotor
brain areas (Leff et al., 2009; Wilson et al., 2009). Common sensorimotor activation during vowel
processing is observed on a left postero-dorsal stream, including the opercular part of Broca’s
area, the adjacent ventral premotor cortex and the temporo-parietal junction, all of which are
important for vowel processing and also for motor control (Wilson et al., 2009). In the current
research, most A-AOS patients had brain lesions in the left posterior IFG (typical Broca’s area),
the left temporoparietal or the left sensorimotor area. Because these areas were involved in both
tone and vowel processing, damage to them could cause either tonal or vowel disruption, or
potentially both, with correlated impairment degrees. On the other hand, as tones are processed
through two channels, one phonological in the left fronto-parietal area and the other (automatic)
semantic in the bilateral temporo-parietal regions, they may be easier for Mandarin listeners to
perceive, thanks to the separation of phonological and semantic components and the
involvement of the right temporo-parietal lobes in tone processing. Therefore, in addition to
the claims that tones carry less information than vowels, differences in the neurological correlates
of tones and vowels may also contribute to the disparity in the listeners’ tolerance for their
disruptions.

Speech mechanisms involved in the disruption of Mandarin tones and vowels


The current research demonstrated that among the four tones, T3 was the most disrupted. This
finding is consistent with previous research indicating that T3 is the most challenging tone for
Chinese children to learn at a young age and the most likely to be disrupted in people with
language disabilities (Gandour et al., 1988; Wong, 2012). The phenomenon may have its roots in
the complexity of motor speech control. T3 might be the most difficult tone to generate because,
unlike the other tones, it necessitates complex coordination of two laryngeal
muscles: a pitch-raising muscle, the cricothyroid (CT), and a pitch-lowering muscle, the
sternohyoid (SH) (Hallé, 1994). At the start of the tone, the CT muscle relaxes and the SH
muscle activates to produce a very low initial f0; this is followed by the SH relaxing and the CT
activating to achieve the final high f0 (Hallé, 1994). Fine motor planning and programming could
be required for this sophisticated muscle coordination, which is problematic for A-AOS patients.
As for vowels, the highest accuracy rates were seen in the patients’ production of /i/ and
/a/. In particular, the word ‘/ji/ T1’ (Chinese character ‘一’, signifying ‘one’) was accurately
pronounced by all ten patients. We first suspected that this was due to a practice effect
because this word was regularly practised by the A-AOS patients in their daily counting
exercise (they always began with ‘/ji/ T1’). However, this does not explain why all other stimuli
involving /i/ were articulated equally well (‘qi’, T2, 7 correct; ‘yi’, T3, 9 correct; ‘xi’, T4, 8 correct),
nor why the /a/-related stimuli also had a high accuracy rate. In fact, the high accuracy rates
of the vowels /i/ and /a/ in the current study led to the assumption that vowels articulated with
overt oral postures (of the tongue, lips, etc.) would be easier for the patients to generate.
The vowel /i/ was articulated with the mouth relatively open, lips retracted (or not) and
tongue protracted to the high front (reflected by a comparatively low F1 and high F2 value).
On the other hand, the vowel /a/ was articulated with the mouth wide open and the body of
the tongue lowered (reflected by a high F1 value). In either situation, the posture of the
articulatory organs could be observed, so when the patients produced words containing
these vowels, the overt tongue (or lips, etc.) position in their memories may help them better
calibrate their articulatory organs. By contrast, the vowels /y/ and /u/ were generated with
rounded lips, which obscured the tongue position inside the mouth and may have made it
difficult for the patients to map out their articulatory postures. This may be why some patients
produced /y, u/ with clear lip rounding but ambiguous tongue placement. Whether the oral
posture of a vowel was overt enough for the patients to see (or recall) could be important
for its successful articulation, and this could be part of the reason why
Liang and van Heuven (2004) also found that /y, u/ were produced with the lowest accuracy
rates of all vowels by aphasic patients, and why the vowel /u/ was produced with
an extremely deviant formant pattern by patients with AOS in Kent and Rosenbek’s (1983)
research. Another possible explanation for the vowel accuracy differences might again be the
degree of muscle coordination complexity (as for the tones). The vowel /i/ was produced with an
even higher accuracy rate than /a/, perhaps because it mainly requires the voluntary activation of the
the vowel /a/ is produced with the coordination of the hyoglossus which pulls the tongue
backward and downward, and the anterior genioglossus which flattens the tongue tip and
maintains its contact with the inner surface of the mandible. The vowels /y/ and /u/, on the
other hand, require the activation of multiple tongue muscles as well as lip muscles
(orbicularis oris and mentalis), which are more difficult for A-AOS patients to manage
(Buchaillard et al., 2009).

Manipulated stimuli versus non-manipulated stimuli


When we used non-manipulated stimuli in the identification task, the rating scores were
slightly different from those in the task with manipulated stimuli, indicating that a lexical effect
likely played a role in the identification process. However, using non-manipulated stimuli
was insufficient to substantially alter the general pattern of tone or vowel perception, so
the findings in the identification test with non-manipulated stimuli followed the same
pattern as those in the perceptual task with manipulated stimuli.

Conclusion
Both tones and vowels could be impaired in Mandarin A-AOS speech. In perception, about
equal numbers of tones and vowels produced by A-AOS patients were identified as correct,
but acoustic parameters showed that even patients’ tones rated as correct by the native
Mandarin listeners were different from those of healthy speakers, implying that tones were
actually more disrupted than vowels in acoustic terms. These findings could be relevant to
the assessment and treatment of Mandarin patients with A-AOS (or aphasia). Unlike
vowels, disruption of Mandarin tones can go undetected more easily, indicating that the
tone output may be acoustically inaccurate but perceptually acceptable. The evaluation of
tone impairment should therefore be based on different goals and criteria, such as
determining whether the goal is to learn how well patients make themselves understood
(generate perceptually acceptable tones), or to assess how motor speech control is impaired
(by examining the acoustic deviation of tones). Different goals result in varying evaluation
measures.
Second, in the treatment of aphasic speech (with or without apraxia), many methods have
been explored to improve patients’ articulatory control (Ludlow et al., 2013; Maas et al., 2008),
because articulatory control provides information about kinematic parameters of speech
production, which are often specified at the planning and/or programming levels of articulation
(Haley et al., 2001). In Mandarin, lexical tones generated by laryngeal activities are also very
important for speech output, but there have been few attempts at tone training to improve
patients’ ‘laryngeal control’. Further, because Mandarin vocalic and tonal information is
combined to form the phonological identity, the different operating systems (oral vs. laryngeal)
must work together to generate meaningful utterances. The involvement of both articulatory and
laryngeal control could, however, make processing at the motor planning and programming stage
rather challenging for Mandarin A-AOS patients, so collaborative practice of tones and
vowels would also be necessary. In the current research, we found a comparable decline in
Mandarin tones and vowels. Thus, we propose that lexical tone training be devised and
incorporated into vowel (or other segmental) practice to enhance A-AOS patients’
overall motor speech control. Specifically, multiple combinations of tonal and vowel structures
could be developed to maximise the treatment effect, followed by a definitive test investigating
tonal and vowel enhancements. Future research should look into the effectiveness of tone
practice for the improvement of lexical tones, as well as the efficacy of combining tone and
vowel practice to improve overall speech control for A-AOS patients; neuroimaging, which
shows how changes in the brain occur under various treatment conditions (tone vs. vowel), can
also be used in future studies.

Acknowledgments
This research was supported by the Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine
Rehabilitation Center); we thank all the therapists who helped with data collection.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The author(s) reported there is no funding associated with the work featured in this article.

Ethical statement
This research was conducted ethically in accordance with the World Medical Association Declaration
of Helsinki. All subjects (speakers and listeners) have given their written informed consent.

References
Ackermann, H., Gräber, S., Hertrich, I., & Daum, I. (1999). Phonemic vowel length contrasts in
cerebellar disorders. Brain and Language, 67(2), 95–109. https://doi.org/10.1006/brln.1998.
2044
Ackermann, H., & Ziegler, W. (2010). Brain mechanisms underlying speech motor control. In W. J.
Hardcastle, J. Laver & F. E. Gibbon (Eds.), The handbook of phonetic sciences (pp. 202–250). Wiley
Blackwell.
Adank, P., Van Hout, R., & Velde, H. V. D. (2007). An acoustic description of the vowels of northern
and southern standard Dutch II: Regional varieties. The Journal of the Acoustical Society of
America, 121(2), 1130–1141. https://doi.org/10.1121/1.2409492
Boersma, P., & Weenink, D. (2020). Praat: Doing phonetics by computer [Computer program]
(Version 6.1. 24).
Buchaillard, S., Perrier, P., & Payan, Y. (2009). A biomechanical model of cardinal vowel production:
Muscle activations and the impact of gravity on tongue positioning. The Journal of the Acoustical
Society of America, 126(4), 2033–2051. https://doi.org/10.1121/1.3204306
Chen, Y. (2008). The acoustic realization of vowels of Shanghai Chinese. Journal of Phonetics, 36(4),
629–648. https://doi.org/10.1016/j.wocn.2008.03.001
Dabul, B. (2000). ABA-2: Apraxia battery for adults. Austin, TX: Pro-ed.
Do, Y., & Lai, R. K. Y. (2021). Accounting for lexical tones when modeling phonological distance.
Language, 97(1), e39–e67. https://doi.org/10.1353/lan.2021.0008
Duanmu, S. (2007). The phonology of standard Chinese. Oxford University Press.
Duffy, J. R. (2019). Motor speech disorders e-book: Substrates, differential diagnosis, and management.
Elsevier Health Sciences.
Elvin, J., Williams, D., & Escudero, P. (2016). Dynamic acoustic properties of monophthongs and
diphthongs in Western Sydney Australian English. The Journal of the Acoustical Society of America,
140(1), 576–581. https://doi.org/10.1121/1.4952387
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198. https://doi.org/10.1016/0022-3956(75)90026-6
Fu, Q., & Zeng, F. (2000). Identification of temporal envelope cues in Chinese tone recognition. Asia Pacific Journal of Speech, Language and Hearing, 5(1), 45–57. https://doi.org/10.1179/136132800807547582
Gandour, J., Petty, S. H., & Dardarananda, R. (1988). Perception and production of tone in aphasia.
Brain and Language, 35(2), 201–240. https://doi.org/10.1016/0093-934X(88)90109-5
Gandour, J., Petty, S. H., & Dardarananda, R. (1989). Dysprosody in Broca’s aphasia: A case study.
Brain and Language, 37(2), 232–257. https://doi.org/10.1016/0093-934X(89)90017-5
Gandour, J., Ponglorpisit, S., Khunadorn, F., Dechongkit, S., Boongird, P., Boonklam, R., &
Potisuk, S. (1992). Lexical tones in Thai after unilateral brain damage. Brain and Language, 43
(2), 275–307. https://doi.org/10.1016/0093-934X(92)90131-W
Gao, S. (2006). Aphasia. Peking University Medical Press.
Haley, K. L., Ohde, R. N., & Wertz, R. T. (2001). Vowel quality in aphasia and apraxia of speech:
Phonetic transcription and formant analyses. Aphasiology, 15(12), 1107–1123. https://doi.org/10.
1080/02687040143000519
Haley, K. L. (2002). Temporal and spectral properties of voiceless fricatives in aphasia and apraxia of
speech. Aphasiology, 16(4–6), 595–607. https://doi.org/10.1080/02687030244000257
Hallé, P. A. (1994). Evidence for tone-specific activity of the sternohyoid muscle in modern standard
Chinese. Language and Speech, 37(2), 103–123. https://doi.org/10.1177/002383099403700201
Hillenbrand, J. M., Clark, M. J., & Nearey, T. M. (2001). Effects of consonant environment on vowel
formant patterns. The Journal of the Acoustical Society of America, 109(2), 748–763. https://doi.org/
10.1121/1.1337959
Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones (Vol. 18). Cambridge University Press.
Hualde, J. I., Barlaz, M., & Luchkina, T. (2021). Acoustic differentiation of allophones of /aɪ/ in
Chicagoland English: Statistical comparison of formant trajectories. Journal of the International
Phonetic Association, 1–31. https://doi.org/10.1017/S0025100320000158
Jacks, A., Mathes, K. A., & Marquardt, T. P. (2010). Vowel acoustics in adults with apraxia of speech. Journal of Speech, Language, and Hearing Research, 53(1), 61–74. https://doi.org/10.1044/1092-4388(2009/08-0017)
Kadyamusuma, M. R., De Bleser, R., & Mayer, J. (2011). Lexical tone disruption in Shona after brain
damage. Aphasiology, 25(10), 1239–1260. https://doi.org/10.1080/02687038.2011.590966
Kent, R. D., & Rosenbek, J. C. (1983). Acoustic patterns of apraxia of speech. Journal of Speech,
Language, and Hearing Research, 26(2), 231–249. https://doi.org/10.1044/jshr.2602.231
Kent, R. D., Kim, Y., & Chen, L. M. (2022). Oral and laryngeal diadochokinesis across the life span: A scoping review of methods, reference data, and clinical applications. Journal of Speech, Language, and Hearing Research, 65(2), 574–623. https://doi.org/10.1044/2021_JSLHR-21-00396
Kouider, S., de Gardelle, V., Dehaene, S., Dupoux, E., & Pallier, C. (2010). Cerebral bases of subliminal speech priming. Neuroimage, 49(1), 922–929. https://doi.org/10.1016/j.neuroimage.2009.08.043
Kurland, J., Pulvermuller, F., Silva, N., Burke, K., & Andrianopoulos, M. (2012). Constrained versus unconstrained intensive language therapy in two individuals with chronic, moderate-to-severe aphasia and apraxia of speech: Behavioral and fMRI outcomes. American Journal of Speech-Language Pathology, 21(2), S65–S65. https://doi.org/10.1044/1058-0360(2012/11-0113)
Kurowski, K. M., Blumstein, S. E., & Mathison, H. (1998). Consonant and vowel production of
right hemisphere patients. Brain and Language, 63(2), 276–300. https://doi.org/10.1006/brln.
1997.1939
Lee, W.-S., & Zee, E. (2003). Standard Chinese (Beijing). Journal of the International Phonetic
Association, 33(1), 109–112. https://doi.org/10.1017/S0025100303001208
Leff, A. P., Iverson, P., Schofield, T. M., Kilner, J. M., Crinion, J. T., Friston, K. J., & Price, C. J. (2009). Vowel-specific mismatch responses in the anterior superior temporal gyrus: An fMRI study. Cortex, 45(4), 517–526. https://doi.org/10.1016/j.cortex.2007.10.008
Liang, J., & van Heuven, V. J. (2004). Evidence for separate tonal and segmental tiers in the lexical
specification of words: A case study of a brain-damaged Chinese speaker. Brain and Language, 91
(3), 282–293. https://doi.org/10.1016/j.bandl.2004.03.006
Liu, L., Peng, D., Ding, G., Jin, Z., Zhang, L., Li, K., & Chen, C. (2006). Dissociation in the neural basis
underlying Chinese tone and vowel production. Neuroimage, 29(2), 515–523. https://doi.org/10.
1016/j.neuroimage.2005.07.046
Ludlow, C., Morgan, N., Dold, G., Lowell, S., & Dietrich-Burns, K. (2013). U.S. Patent No. 8,388,561.
Washington, DC: U.S. Patent and Trademark Office.
Maas, E., Robin, D. A., Hula, S. N. A., Freedman, S. E., Wulf, G., Ballard, K. J., & Schmidt, R. A.
(2008). Principles of motor learning in treatment of motor speech disorders. American Journal of
Speech-Language Pathology, 17(3), 277–298. https://doi.org/10.1044/1058-0360(2008/025)
McNeil, M. R., Pratt, S. R., & Fossett, T. R. D. (2004). The differential diagnosis of apraxia of speech. In B. R. Maassen, R. Kent, H. Peters, P. van Lieshout, & W. Hulstijn (Eds.), Speech motor control in normal and disordered speech (pp. 389–413). Oxford University Press.
Mousikou, P., & Rastle, K. (2015). Lexical frequency effects on articulation: A comparison of picture naming and reading aloud. Frontiers in Psychology, 6, 1571. https://doi.org/10.3389/fpsyg.2015.01571
Myers, E. B., Blumstein, S. E., Walsh, E., & Eliassen, J. (2009). Inferior frontal regions underlie the
perception of phonetic category invariance. Psychological Science, 20(7), 895–903. https://doi.org/
10.1111/j.1467-9280.2009.02380.x
New, B., Araújo, V., & Nazzi, T. (2008). Differential processing of consonants and vowels in lexical
access through reading. Psychological Science, 19(12), 1223–1227. https://doi.org/10.1111/j.1467-
9280.2008.02228.x
Odell, K., McNeil, M. R., Rosenbek, J. C., & Hunter, L. (1991). Perceptual characteristics of vowel and
prosody production in apraxic, aphasic, and dysarthric speakers. Journal of Speech, Language, and
Hearing Research, 34(1), 67–80. https://doi.org/10.1044/jshr.3401.67
Ogar, J., Slama, H., Dronkers, N., Amici, S., & Gorno-Tempini, M. L. (2005). Apraxia of speech: An overview. Neurocase, 11(6), 427–432. https://doi.org/10.1080/13554790500263529
Ogar, J., Willock, S., Baldo, J., Wilkins, D., Ludy, C., & Dronkers, N. (2006). Clinical and anatomical
correlates of apraxia of speech. Brain and Language, 97(3), 343–350. https://doi.org/10.1016/j.
bandl.2006.01.008
Ordin, M., & Polyanskaya, L. (2015). Perception of speech rhythm in second language: The case of rhythmically similar L1 and L2. Frontiers in Psychology, 6, 316. https://doi.org/10.3389/fpsyg.2015.00316
Packard, J. L. (1986). Tone production deficits in nonfluent aphasic Chinese speech. Brain and
Language, 29(2), 212–223. https://doi.org/10.1016/0093-934X(86)90045-3
Peterson, G. E., & Lehiste, I. (1960). Duration of syllable nuclei in English. The Journal of the
Acoustical Society of America, 32(6), 693–703. https://doi.org/10.1121/1.1908183
Renwick, M. E., & Stanley, J. A. (2020). Modeling dynamic trajectories of front vowels in the
American South. The Journal of the Acoustical Society of America, 147(1), 579–595. https://doi.
org/10.1121/10.0000549
Ryalls, J., & Reinvang, I. (1986). Functional lateralization of linguistic tones: Acoustic evidence from
Norwegian. Language and Speech, 29(4), 389–398. https://doi.org/10.1177/002383098602900405
Ryalls, J. H. (1986). An acoustic study of vowel production in aphasia. Brain and Language, 29(1),
48–67. https://doi.org/10.1016/0093-934X(86)90033-7
Sarvasy, H., Elvin, J., Li, W., & Escudero, P. (2020). An acoustic phonetic description of Nungon
vowels. The Journal of the Acoustical Society of America, 147(4), 2891–2900. https://doi.org/10.
1121/10.0001003
Singh, L., Goh, H. H., & Wewalaarachchi, T. D. (2015). Spoken word recognition in early childhood: Comparative effects of vowel, consonant and lexical tone variation. Cognition, 142, 1–11. https://doi.org/10.1016/j.cognition.2015.05.010
Specht, K., Osnes, B., & Hugdahl, K. (2009). Detection of differential speech-specific processes in the
temporal lobe using fMRI and a dynamic “sound morphing” technique. Human Brain Mapping, 30
(10), 3436–3444. https://doi.org/10.1002/hbm.20768
Van Der Merwe, A. (2021). New perspectives on speech motor planning and programming in the
context of the four-level model and its implications for understanding the pathophysiology
underlying apraxia of speech and other motor speech disorders. Aphasiology, 35(4), 397–423.
https://doi.org/10.1080/02687038.2020.1765306
Vitti, E., Mauszycki, S., Bunker, L., & Wambaugh, J. (2021). Stability of speech intelligibility measures over repeated sampling times in speakers with acquired apraxia of speech. American Journal of Speech-Language Pathology, 30(3S), 1429–1445. https://doi.org/10.1044/2020_AJSLP-20-00135
Weiss, P. H., Ubben, S. D., Kaesberg, S., Kalbe, E., Kessler, J., Liebig, T., & Fink, G. R. (2016). Where language meets meaningful action: A combined behavior and lesion analysis of aphasia and apraxia. Brain Structure and Function, 221(1), 563–576. https://doi.org/10.1007/s00429-014-0925-3
Wertz, R. T., LaPointe, L. L., & Rosenbek, J. C. (1984). Apraxia of speech in adults: The disorder and its management. Grune & Stratton.
Whalen, D. H., & Xu, Y. (1992). Information for Mandarin tones in the amplitude contour and in
brief segments. Phonetica, 49(1), 25–47. https://doi.org/10.1159/000261901
Wiener, S., & Turnbull, R. (2016). Constraints of tones, vowels and consonants on lexical selection in
Mandarin Chinese. Language and Speech, 59(1), 59–82. https://doi.org/10.1177/0023830915578000
Wilson, S. M., Isenberg, A. L., & Hickok, G. (2009). Neural correlates of word production stages
delineated by parametric modulation of psycholinguistic variables. Human Brain Mapping, 30(11),
3596–3608. https://doi.org/10.1002/hbm.20782
Wong, P. (2012). Acoustic characteristics of three-year-olds’ correct and incorrect monosyllabic
Mandarin lexical tone productions. Journal of Phonetics, 40(1), 141–151. https://doi.org/10.1016/
j.wocn.2011.10.005
Xu, Y. (2005–2010). ProsodyPro.praat [Praat script]. http://www.phon.ucl.ac.uk/home/yi/ProsodyPro/
Yiu, E. M. L., & Fok, A. Y. Y. (1995). Lexical tone disruption in Cantonese aphasic speakers. Clinical Linguistics & Phonetics, 9(1), 79–92. https://doi.org/10.3109/02699209508985326
Zahorian, S. A., & Jagharghi, A. J. (1991). Speaker normalization of static and dynamic vowel spectral
features. The Journal of the Acoustical Society of America, 90(1), 67–75. https://doi.org/10.1121/1.
402350
