Speech-Sound Encoding

Advances in Clinical Neurophysiology
(Supplements to Clinical Neurophysiology. Vol. 57)

Editors: M. Hallett, L.H. Phillips, II, D.L. Schomer, J.M. Massey
© 2004 Elsevier B.V. All rights reserved
Chapter 66
Speech-sound encoding: physiological manifestations and behavioral ramifications
Trent Nicol^a and Nina Kraus^a,b,*

a Department of Communication Sciences, and b Departments of Neurobiology and Physiology;
Otolaryngology, Northwestern University, Frances Searle Building, 2240 Campus Drive, Evanston, IL 60208
(USA)
* Correspondence to: Dr. Nina Kraus, Department of Communication Sciences, Northwestern University, Frances Searle Building, 2240 Campus Drive, Evanston, IL 60208, USA.
Tel: +1 847 491-3165; Fax: +1 847 491-2523; E-mail: nkraus@northwestern.edu

1. Introduction

In order to fully process and understand speech, it is necessary that its neural encoding remain intact as the signal is transduced from the eighth nerve to the auditory cortex. Imperfections in this transduction may occur in the brainstem, midbrain, thalamus or the cortex itself. Unsatisfactory speech perception is experienced by everyone to some degree when exposed to noisy, or otherwise adverse, listening conditions. However, there are those who, despite normal peripheral hearing thresholds, experience speech perception difficulties even in relatively non-challenging listening conditions. In these populations, such as the elderly and individuals with auditory-based learning disabilities, there is an interest in identifying and localizing the defect and, more importantly, taking corrective steps to improve speech perception.

The speech signal, unlike most other naturally occurring sounds, is composed of harmonically rich components that rapidly change in frequency. The timing and direction of the frequency sweeps and the relative spacing of the harmonic components combine to form the consonant and vowel sounds that are relevant to language. This complex spectro-temporal structure requires an exquisitely well-coordinated ensemble neural response for accurate encoding. This coordinated timing of neural ensembles, in turn, is a property that is suited for measurement by evoked potential averaging.

This report will review research demonstrating how speech sounds are encoded both cortically and subcortically in normal and impaired populations. Perceptual improvements that arise from both stimulus manipulation and auditory training also are evident in the aggregate neural responses, revealing that basic encoding of sound structure in the nervous system can be altered. In some cases, success of a training regimen may be predicted by how speech is encoded by the auditory pathway before training.

2. Measuring speech-sound encoding

Described here are three examples of physiological manifestations of speech-sound encoding that inform us about the normal encoding process and are effective in distinguishing populations with auditory-based learning problems from normal-learning controls. These differences have been identified both at the cortical and subcortical levels, and signify deficits in the pre-conscious encoding of the elemental speech signal rather than a higher-level, cognitive shortcoming.

2.1. Cortical response to speech-sound change (Kraus et al., 1996)

A consonant-vowel continuum, ranging from /da/ to /ga/, was synthetically constructed so that only one aspect, onset frequency of the third formant (F3), differed among its members. Ninety-one school-age children who had been clinically diagnosed with a variety of language-based learning problems (LP) and 90 normal-learning controls (NL) were tested on their ability to distinguish between closely-spaced /da-ga/ pairs. On average, LP children had much higher discrimination thresholds for stop consonants than the controls, consistent with work showing similar perceptual weaknesses in this population (Tallal and Stark, 1981; Elliott et al., 1989). To control for LP children's ability to perform the task, a continuum composed of /ba/ and /wa/, differing in the transition duration of F1 and F2, was administered, and both groups performed equivalently.

The mismatch response, or mismatch negativity (MMN), is an auditory evoked response that signals stimulus change (Näätänen, 1992). It is a relevant reflection of speech encoding because the speech signal itself is characterized by acoustic change. The stimulus delivery protocol for MMN recording involves presentation of an "oddball" stimulus sequence. This consists of repeated presentation of one sound (i.e. standard), randomly replaced by another (i.e. deviant) in a small percentage of trials. Subjects are instructed to ignore the stimuli and to attend to the soundtrack of a movie in the non-test ear. If the mismatch response is present, it is seen as a negativity in the averaged response to the deviant stimulus, relative to that of the standard stimulus. A /ga-da/ pair from the same synthesized continuum used in the discrimination task was chosen as the standard and deviant stimuli for the MMN testing protocol. The chosen contrast, differing in F3 onset by 80 Hz, was discriminable, but near threshold, for most NL children. A near-threshold /ba-wa/ pair, differing in formant transition duration by 5 ms, was chosen as a control. Forty-two children underwent the MMN testing. Half were able to distinguish between the /ga-da/ pair, half were not; all were able to discriminate the /ba-wa/ pair.

Children in both groups had robust MMNs to the /ba-wa/ pair. Conversely, in response to the /ga-da/ pair, the group comprising the good /ga-da/ perceivers had robust MMNs while the poor /ga-da/ perceivers had small or absent MMNs (Fig. 1). This indicates that the difficulties that the LP children experienced in discriminating the stop consonants are manifested in a passively elicited preconscious neural response - independent of attention and cognition - and signals a breakdown of acoustic encoding of stimulus change along the afferent auditory pathway.

2.2. Cortical response to rapidly-presented speech sounds in noise (Wible et al., 2002)

Increased talker rate and background noise are two conditions known to adversely affect accurate perception and recognition of speech. Cortical responses to rapidly repeated speech sounds, both in quiet and noisy backgrounds, were investigated in LP children and NL controls.

Four-token /da/ trains were presented monaurally to the right ear at 80 dB. The stimulus was 40 ms in duration and was presented with an interstimulus interval of 360 ms, and an inter-train interval of 1060 ms. A continuous white noise masker, at a signal-to-noise ratio of +15 dB, was added to half of the trials. Averaged responses to the first and last stimulus in a train were compared, both when presented in quiet and with the masker. Inter-response correlations, which described relative changes in morphology between responses to stimuli in position 1 and position 4, and thus timing, were measured.

Under the combined stresses of repetition and noise, LP subjects demonstrated poorer inter-response correlations in noise than in quiet (Fig. 2),
consistent with our previous work (Cunningham et al., 2001; Warrier et al., 2004). NL controls demonstrated no differences between quiet and noise on this measure of correlation between repeated responses. Such an accurate manifestation of stimulus timing is a hallmark of the normal perceptual system. Poor correlation among LP children indicates that response morphology was not maintained to rapidly presented stimuli in noise, which could implicate inconsistency in the timing of response generators. Across groups, the inter-response correlations bore a significant positive relationship to a standardized measure of auditory processing. These results suggest that the speech-sound perception difficulties seen in LP children may be due to degraded cortical temporal processing - the auditory system's ability to respond precisely under conditions of rapid temporal stimulation - in challenging listening conditions.

[Figure 1: two waveform panels; time axis from -100 to 500 ms; amplitude scale bar 0.5 µV.]
Fig. 1. Mismatch response to a speech syllable pair /ga-da/. In subjects who perceived the difference between the syllables (left), the deviant response differed significantly from the standard response from about 200 ms onward. Boxes along abscissa represent region where waveforms significantly differed (p<0.05). Children who were unable to perceive the differences did not exhibit a mismatch response (right). Modified from Kraus et al., 1996.

[Figure 2: bar chart; groups LP and NL control; conditions quiet and noise.]
Fig. 2. Cortical inter-response correlations. In quiet, inter-response correlations are about the same for LP children (left) and normal controls (right). In background noise, however, the inter-response correlation is much poorer for the LPs. Modified from Wible et al., 2002.

2.3. Subcortical response to speech sounds (Cunningham et al., 2001; King et al., 2002; Russo et al., 2004a; Wible et al., 2004)

Short-latency - up to 10 ms - scalp-recorded auditory evoked responses have long been used to assess hearing sensitivity and auditory pathway integrity. The auditory brainstem response to simple stimuli such as clicks and tone pips consists of a series of well-characterized peaks that reflect neural responses originating from the eighth nerve to the posterior midbrain. A novel line of research focuses on activity occurring over a similar short latency in
response to speech sounds. This speech-evoked response, containing transient and sustained components, mimics acoustic aspects of speech itself. Although it may be an oversimplification, there are certain parallels between consonants and vowels, and transient and sustained evoked responses.

A 40 ms syllable /da/ was presented to children with auditory-based learning problems and normal controls. Stimuli were presented both in quiet and with a continuous background white noise masker. The short-latency evoked response to this complex sound comprises a series of transient onset peaks - much like the response to a click or tone pip - and a sustained frequency-following response (FFR), which is phase-locked to the fundamental frequency of the speech stimulus (Fig. 3).

[Figure 3: stimulus waveform (top) and response waveform (bottom) with peaks and the FFR region labeled; time axis from 0 to 60 ms.]
Fig. 3. Stimulus /da/ (top) and its subcortical response (bottom). The response to /da/ includes both transient and sustained components. The most stable transients, V, A, C and F, are labeled.

In addition to conventional latency and amplitude measurements of the stable discrete peaks, V, A, C, D, E and F, a series of analysis techniques was devised to describe the longer-lasting FFR as a whole. Overall activation was measured by RMS amplitude. A more precise measure of magnitude was the amplitude of the specific frequency content in the response corresponding to the fundamental frequency (F0) and the first formant (F1) of the /da/ stimulus. Precision of timing was assessed by stimulus-to-response correlation and inter-response correlation between quiet and noise conditions (Warrier et al., 2004).

There were differences between NLs and LPs on several measures. For LPs, in quiet, response latencies of peaks A and C were significantly later. The magnitude of the F0 component and the timing between the peaks occurring at the period of the stimulus F0 (D, E and F) were unaffected. The amplitude of the F1 component of the response was suppressed. In noise, the earlier onset peaks were more frequently eliminated in the LPs. Inter-response correlations of the FFR were poorer and the amplitude of the F0 component was reduced. A significant relationship was found between the subcortical response's F1 amplitude and reading, as well as the /da-ga/ speech discrimination task described in subsection 2.1, above.

Taken together, these findings reveal that normal speech perception depends on accurate encoding of sound structure - particularly the precision of response timing - in the auditory brainstem and cortex. Moreover, the speech-sound perception difficulties in LP children may be due, in part, to degraded temporal processing under conditions that remain relatively unchallenging to normal listeners.
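The FFR measures named in subsection 2.3 (RMS amplitude, spectral amplitude at F0 and F1, and inter-response correlation between quiet and noise) can be illustrated with a minimal sketch. It is not the analysis pipeline of the cited studies, whose windows and filters are not described here. The sampling rate, the F0 and F1 values, the noise level and the synthetic waveforms are all hypothetical, chosen only so that the 40 ms segment holds whole cycles of each component.

```python
import math
import random


def rms_amplitude(x):
    """Root-mean-square amplitude: a broad measure of overall activation."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def amplitude_at(x, freq_hz, fs_hz):
    """Amplitude of the component of x at freq_hz (single-frequency DFT)."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * i) for i, v in enumerate(x))
    im = sum(v * math.sin(w * i) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)


def pearson_r(a, b):
    """Pearson correlation between two equal-length waveforms."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical values, for illustration only (not taken from the studies).
FS_HZ = 10000.0   # sampling rate
F0_HZ = 100.0     # stand-in fundamental frequency
F1_HZ = 500.0     # stand-in first-formant frequency
N = 400           # 40 ms at 10 kHz

# Synthetic "response in quiet": an F0 component plus a weaker F1 component.
quiet = [
    math.sin(2.0 * math.pi * F0_HZ * i / FS_HZ)
    + 0.3 * math.sin(2.0 * math.pi * F1_HZ * i / FS_HZ)
    for i in range(N)
]

# Synthetic "response in noise": the same waveform plus seeded Gaussian noise.
rng = random.Random(0)
noisy = [v + rng.gauss(0.0, 0.5) for v in quiet]

print("RMS amplitude (quiet):", rms_amplitude(quiet))
print("F0 amplitude (quiet): ", amplitude_at(quiet, F0_HZ, FS_HZ))
print("F1 amplitude (quiet): ", amplitude_at(quiet, F1_HZ, FS_HZ))
print("Quiet-to-noise correlation:", pearson_r(quiet, noisy))
```

In this toy setup a lower quiet-to-noise correlation means that the waveform morphology, and hence its timing, is less well preserved in noise, which is the sense in which the correlation measures are used in the text.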
3. Improving speech-sound encoding

In recent years, there has been much interest in training programs designed to improve language skills in children with auditory-based learning problems (Tallal et al., 1996; Morrison, 1998; Diehl, 1999). Such programs may have utility for other populations, as well. Auditory training may be beneficial for older people whose speech perception has diminished and may serve as an aid in foreign language acquisition in the normal population. These programs involve intensive exposure to speech sounds; first using exaggerated cues and then gradually moving toward the subtler distinctions that are experienced in natural speech. We have been examining the underlying changes in physiology using two complementary designs. First, in a group of LP children and normal controls, the physiological and behavioral effects of cue-enhanced speech were examined. Second, a battery of physiological and behavioral tests was applied to LP children before entering a commercial training program, and then again following completion to determine whether physiological changes associated with more precise encoding of sound structure accompanied behavioral improvements.

3.1. Improving speech-sound encoding: Change the signal (Cunningham et al., 2001)

In subsection 2.1, above, it was noted that LP children have difficulty discriminating fine-grained differences between speech syllables. Not surprisingly, this deficit is exacerbated by background noise. Background noise also has been demonstrated to more severely degrade the cortical P2/N2 evoked response to a speech sound in the LP population. In order to establish the degree to which cue-enhancements to the speech signal improve behavioral discrimination and physiology, a study examining the effects of cue-enhancement was designed.

A synthetic /ada/ to /aga/ continuum, with members differing only in the frequency of the consonant's F3 onset, was constructed. Two cue-enhancement strategies were employed: lengthening the duration of the stop gap between the initial /a/ and the /da/, and increasing the release burst intensity of the consonant /d/. Both manipulations are used naturally by speakers when attempting to speak clearly (Picheny et al., 1986). Three additional continua were created, using each of those strategies separately and in combination. Subjects' discrimination thresholds, in quiet and with background noise, were established for all four continua. Cortical P2/N2 amplitudes also were measured to "conversational" and "clear" stimuli in quiet and with background noise.

In quiet, both NLs and LPs had equivalent discrimination scores and P2/N2 amplitudes to the conversational stimulus. With the addition of background noise, LPs' discrimination scores suffered and their cortical response amplitudes were diminished compared to the controls. In noise, the cue-enhanced clear speech stimuli restored LPs' discrimination ability and their P2/N2 response amplitudes to the same level as the controls (Fig. 4).

Thus the changes in speaking style that people naturally make when speaking to, for example, hearing impaired individuals or non-native speakers, have been demonstrated to effect a change in the cortical encoding of sound structure. Would analogous response changes occur in response to the same speech stimulus after an individual undergoes training to improve speech perception?

3.2. Improving speech-sound encoding: Change the response (King et al., 2002; Hayes et al., 2003; Russo et al., 2004b)

Twenty-seven children with learning disabilities who were enrolled in an independently directed commercial auditory training program were subjects. Prior to enrolling in the program, and again, within three months after its completion, a battery of behavioral speech-perception tests, standardized measures of learning and academic achievement, and cortical and brainstem evoked responses was administered to them. Fifteen controls underwent the battery twice within a similar time span but received no directed training.
[Figure 4: two panels, perceptual (left) and physiological (right); conditions on the horizontal axis: conversational speech in quiet, conversational speech in noise, clear speech in noise.]
Fig. 4. Perceptual (left) and physiological (right) effects of cue-enhanced speech. Background noise affects LPs' discrimination and cortical responses more severely than the controls'. Cue-enhanced speech stimuli restore discrimination and physiology to normal levels. Modified from Cunningham et al., 2001.

Several changes were seen between the pre- and post-training behavioral tasks and the physiological measures, and there were some noteworthy relationships between them. Furthermore, one pre-training physiology measure was predictive of behavioral gains following training.

The experimental group demonstrated significant gains on two standardized tests that assess auditory processing. In addition, changes were seen in the cortical response to /da/ in quiet. There is a normal maturational time course in both latency and amplitude of cortical evoked responses P1 and N2 (Oades et al., 1997; Sharma et al., 1997; Cunningham et al., 2000). Changes in these responses, consistent with normal maturation, were demonstrated over the short testing interval in the experimental group but not in the controls. Furthermore, the cortical response's resilience to background noise, as measured by inter-response correlation, improved following training only in the experimental group. The subjects whose subcortical response latencies were delayed were those who showed the most gains: their cortical responses in noise improved most in resilience, and their /da-ga/ discrimination improved. Finally, there is some evidence that brainstem encoding itself may change following training in some subjects.

Thus, a variety of physiological indicators either accompanied or predicted behavioral gains following commercial auditory training. The physiological data indicate that the preconscious encoding of sound structure is plastic; i.e. it is not hard-wired but is modifiable by learning and experience. Such preconscious, non-cognitive, physiological tests may be valuable in predicting which children may receive the most benefit from training programs and serve as tools for monitoring their progress.

4. Conclusions

Accurate speech-sound encoding requires an auditory pathway that maintains the precise timing features that compose speech; the phasic and tonic aspects of ensemble neural firing share many features of the speech signal itself. The transient responses recorded from the brainstem and cortex and the sustained responses originating in the midbrain in combination can tell us a great deal about the integrity of the speech-sound encoding mechanism. A number of physiological abnormalities of neural timing have been identified in a population of children who experience auditory-based learning problems, some of them linked to specific behavioral deficits.

Importantly, there is evidence that the neural encoding of elemental acoustic events can be altered. Cue-enhanced stimuli, themselves, are useful in effecting a normal-like response in an LP individual, and their use as a training tool can lead to improved neural timing to non-enhanced speech. The malleability of encoding of acoustic sound structure in the
auditory pathway suggests approaches that could be applied more generally in other instances where improved perception of sound is desirable, such as learning music or foreign languages.

Acknowledgments

The authors would like to thank Cynthia King, Jenna Cunningham, Bradley Wible and Erin Hayes. Supported by NIH NIDCD R01-01510.

References

Cunningham, J., Nicol, T., Zecker, S. and Kraus, N. Speech-evoked neurophysiologic responses in children with learning problems: development and behavioral correlates of perception. Ear and Hearing, 2000, 21: 554-568.
Cunningham, J., Nicol, T., Zecker, S.G. and Kraus, N. Neurobiologic responses to speech in noise in children with learning problems: Deficits and strategies for improvement. Clin. Neurophysiol., 2001, 112: 758-767.
Diehl, S. Listen and Learn? A software approach review of Earobics. Language, Speech and Hearing Services in Schools, 1999, 30: 108-116.
Elliott, L.L., Hammer, M.A. and Scholl, M.E. Fine-grained auditory discrimination in normal children and children with language-learning problems. J. Speech and Hearing Res., 1989, 32: 112-119.
Hayes, E., Warrier, C.M., Nicol, T., Zecker, S.G. and Kraus, N. Neural plasticity following auditory training in children with learning problems. Clin. Neurophysiol., 2003, 114: 673-684.
King, C., Warrier, C.M., Hayes, E. and Kraus, N. Deficits in auditory brainstem encoding of speech sounds in children with learning problems. Neurosci. Lett., 2002, 319: 111-115.
Kraus, N., McGee, T.J., Carrell, T.D., Zecker, S.G., Nicol, T.G. and Koch, D.B. Auditory neurophysiologic responses and discrimination deficits in children with learning problems. Science, 1996, 273: 971-973.
Morrison, S. Computer applications: Earobics Pro. Child Language Teaching and Therapy, 1998, 14: 279-284.
Näätänen, R. Attention and brain function. Erlbaum, New Jersey, 1992.
Oades, R.D., Dittmann-Balcar, A. and Zerbin, D. Development and topography of auditory event-related potentials (ERPs): mismatch and processing negativity in individuals 8-22 years of age. Psychophysiology, 1997, 34: 677-693.
Picheny, M.A., Durlach, N.I. and Braida, L.D. Speaking clearly for the hard of hearing. II: Acoustic characteristics of clear and conversational speech. J. Speech and Hearing Res., 1986, 29: 434-446.
Russo, N., Nicol, T., Musacchia, G. and Kraus, N. Brainstem responses to speech syllables. Clin. Neurophysiol., 2004a, 115: 2021-2030.
Russo, N., Nicol, T., Zecker, S., Hayes, E. and Kraus, N. Auditory training improves neural timing in the human brainstem. Behav. Brain Res., 2004b, in press.
Sharma, A., Kraus, N., McGee, T.J. and Nicol, T.G. Developmental changes in P1 and N1 central auditory responses elicited by consonant-vowel syllables. Electroencephalogr. Clin. Neurophysiol., 1997, 104: 540-545.
Tallal, P. and Stark, R.E. Speech acoustic-cue discrimination abilities of normally developing and language-impaired children. J. Acoust. Soc. America, 1981, 69: 568-574.
Tallal, P., Miller, S.L., Bedi, G., Byma, G., Wang, X., Nagarajan, S.S., Schreiner, C., Jenkins, W.M. and Merzenich, M.M. Language comprehension in language-learning impaired children improved with acoustically modified speech. Science, 1996, 271: 81-84.
Warrier, C.M., Johnson, K.L., Hayes, E.A., Nicol, T.G. and Kraus, N. Learning impaired children exhibit timing deficits and training-related improvements in auditory cortical responses to speech in noise. Exp. Brain Res., 2004, 157: 431-441.
Wible, B., Nicol, T.G. and Kraus, N. Abnormal neural encoding of repeated speech stimuli in noise in children with learning problems. Clin. Neurophysiol., 2002, 113: 485-494.
Wible, B., Nicol, T. and Kraus, N. Atypical brainstem representation of onset and formant structure of speech sounds in children with language-based learning problems. Biol. Psychol., 2004, 67: 299-317.
