
Auditory, Somatosensory, and Motor Interactions in Speech Production

Frank H. Guenther
Department of Cognitive and Neural Systems, Boston University; Division of Health Sciences and Technology, Harvard University / M.I.T.; Research Laboratory of Electronics, Massachusetts Institute of Technology

Collaborators
Satrajit Ghosh, Alfonso Nieto-Castanon, Jason Tourville, Oren Civier, Kevin Reilly, Jason Bohland, Jonathan Brumberg, Michelle Hampson, Joseph Perkell, Virgilio Villacorta, Majid Zandipour, Melanie Matthies, Shinji Maeda

HST 722 – Speech Motor Control

Supported by NIDCD and NSF.

CNS Speech Lab at Boston University
Primary goal is to elucidate the neural processes underlying:
• Learning of speech in children
• Normal speech in adults
• Breakdowns of speech in disorders such as stuttering and apraxia of speech

Methods of investigation include:
• Neural network modeling
• Functional brain imaging
• Motor and auditory psychophysics

These studies are organized around the DIVA model, a neural network model of speech acquisition and production developed in our lab.

Talk Outline
Overview of the DIVA model
• Mirror neurons in the model
• Learning in the model
• Simulating a hemodynamic response from the model

Feedback control subsystem
• Auditory perturbation fMRI experiment
• Somatosensory perturbation fMRI experiment

Feedforward control subsystem
• Sensorimotor adaptation to F1 perturbation
Summary


Schematic of the DIVA Model


Boxes in the schematic correspond to maps of neurons; arrows correspond to synaptic projections. The model controls movements of a “virtual vocal tract”, or articulatory synthesizer. Video shows random movements of the articulators in this synthesizer. Production of a speech sound in the model starts with activation of a speech sound map cell in left ventral premotor cortex (BA 44/6), which in turn activates feedforward and feedback control subsystems that converge on primary motor cortex.

Speech Sound Map – Mirror Neurons

Since its inception in 1992, the DIVA model has included a speech sound map that contains cells which are active during both perception and production of a particular speech sound (phoneme or syllable). During perception, these neurons are necessary to learn an auditory target or goal for the sound, and to a lesser degree somatosensory targets (limited to the visible articulators such as the lips).

[Schematic: DIVA model, speech sound map highlighted during perception]

Speech Sound Map – Mirror Neurons (continued)

After a sound has been learned (described next), activating the speech sound map cells for the sound leads to readout of the learned feedforward commands ("gestures") and the auditory and somatosensory targets for the sound (red arrows at right). These targets are compared to incoming sensory signals to generate corrective commands if needed (blue). The overall motor command (purple) combines feedforward and feedback components.

[Schematic: DIVA model, speech sound map highlighted during production]

Learning in the Model – Stage 1

In the first learning stage, the model learns the relationships between motor commands, somatosensory feedback, and auditory feedback. In particular, the model needs to learn how to transform sensory error signals into corrective motor commands. This is done with babbling movements of the vocal tract, which provide paired sensory and motor signals that can be used to tune these transformations.

[Schematic: DIVA model, sensory-motor transformations tuned during babbling]
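The babbling stage can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions (a linear articulatory-to-auditory relationship, least-squares fitting, and made-up dimensions), not the DIVA implementation itself, which uses a nonlinear articulatory synthesizer and learned neural mappings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_artic, n_aud = 7, 3                       # hypothetical: 7 articulators, 3 formants
J_true = rng.normal(size=(n_aud, n_artic))  # toy articulatory-to-auditory mapping

# Babbling: random articulator velocity commands paired with the auditory
# changes they produce (with a little measurement noise).
motor_cmds = rng.normal(size=(5000, n_artic))
aud_changes = motor_cmds @ J_true.T + 0.01 * rng.normal(size=(5000, n_aud))

# Tune the forward transformation (motor -> auditory) from the babbled pairs.
J_hat = np.linalg.lstsq(motor_cmds, aud_changes, rcond=None)[0].T

# Inverting the learned mapping turns a sensory error into a corrective
# articulator velocity command.
def corrective_command(auditory_error):
    return np.linalg.pinv(J_hat) @ auditory_error

print(corrective_command(np.array([50.0, -120.0, 0.0])))  # desired formant changes (Hz)
```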

Learning in the Model – Stage 2 (Imitation)

The model then needs to learn auditory and somatosensory targets for individual speech sounds, and feedforward motor commands ("gestures") for these sounds. This is done through an imitation process involving the speech sound map cells. Model projections tuned during the imitation process are shown in red.

[Schematic: DIVA model, projections tuned during imitation shown in red]

The Imitation Process

(1) The model learns an auditory target from a sound sample provided by a fluent speaker; this target is stored in the synaptic weights projecting from the speech sound map to the higher-order auditory cortical areas.

(2) The model practices production of the sound to tune the feedforward commands and learn a somatosensory target.

[Figure: auditory target for "ba"]

Simulation – Learning Feedforward Commands

The model first learns the auditory target for the sound by listening to someone produce it. Then it tries to repeat the target, initially under auditory feedback control. With each repetition, the model relies less on feedback control and more on feedforward control, resulting in better and better productions.

[Audio: sound sample presented to the model]

Top panel: spectrogram of the target utterance presented to the model. Remaining panels: spectrograms of the model's first few attempts to produce the utterance. Note the improvement of the auditory trajectories with each practice iteration, due to improved feedforward commands.

Simulating a Hemodynamic Response in the Model

Each model component corresponds to a particular region of the brain. The anatomical locations of the model's components have been fine-tuned by comparison to the results of previous neurophysiological and neuroimaging studies (Guenther, Ghosh, and Tourville, 2006).

[Figure: estimated anatomical locations of model components, including lip, jaw, tongue, larynx, palate, and respiratory maps, the speech sound map (SSM), auditory maps, and lateral cerebellum]

The model's cell activities during simulations can be directly compared to the results of fMRI and PET studies.


Feedback Control Subsystem

The model's feedback control subsystem compares learned auditory and somatosensory target regions for the current speech sound to incoming sensory information. If the current auditory or somatosensory state is outside the target region for the sound, error signals arise in higher-order auditory and/or somatosensory areas in the superior temporal lobe and parietal lobe.

[Schematic: DIVA model, feedback control pathways]

Feedback Control Subsystem (continued)

Auditory and somatosensory error signals are then transformed into corrective motor commands via projections from the sensory areas to the motor cortex.

[Schematic: DIVA model, error-to-motor projections]
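The region-based error computation can be sketched concretely. The region bounds, the toy Jacobian, and the pseudoinverse inversion below are illustrative assumptions; in the model, the error-to-motor transformation is learned during babbling rather than computed analytically:

```python
import numpy as np

def region_error(state, lo, hi):
    """Zero while the sensory state lies inside the target region;
    otherwise, the signed distance to the nearest region edge."""
    return np.minimum(state - lo, 0.0) + np.maximum(state - hi, 0.0)

# Hypothetical auditory target region for a vowel (F1, F2 bounds in Hz).
lo, hi = np.array([550.0, 1650.0]), np.array([650.0, 1850.0])

state = np.array([700.0, 1700.0])  # current auditory state: F1 too high
err = region_error(state, lo, hi)  # -> [50., 0.]; only F1 needs correcting

J = np.array([[1.0, 0.2],          # toy articulator-to-auditory Jacobian
              [0.1, 1.0]])
feedback_cmd = -np.linalg.pinv(J) @ err  # corrective articulator command
print(feedback_cmd)
```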

Prediction: Auditory Error Cells

The model hypothesizes an auditory error map in the higher-order auditory cortex of the posterior superior temporal gyrus and planum temporale. Cells in this map should become active if a subject's auditory feedback of his/her own speech is perturbed so that it mismatches the subject's auditory target. The model also predicts that this auditory error cell activation will give rise to increased activity in motor areas, where corrective articulator commands are generated.

fMRI Study of Unexpected Auditory Perturbation During Speech

To test these predictions, we performed an fMRI study involving 11 subjects in which the first formant frequency (an important acoustic cue for speech) was unexpectedly perturbed upward or downward by 30% in ¼ of the production trials. The perturbed feedback trials were randomly interspersed with normal feedback trials so the subject could not anticipate the perturbations. Perturbations were applied using a DSP device developed with colleagues at MIT (Villacorta, Perkell, and Guenther, 2004), which feeds the modified speech signal back to the subject in near real-time (~16 ms delay, not noticeable to the subject).

[Audio: sound before shift; sound after shift]

Unexpectedly shifting the feedback caused subjects to compensate within the same syllable as the shift. DIVA model productions in response to unexpected upward (dashed line) and downward (solid line) perturbations of F1 fall within the distribution of productions of the speakers in the fMRI study (shaded regions: 95% confidence intervals).

[Figure: normalized F1 over time for responses to F1 upshift and downshift]

=> Auditory feedback control is right-lateralized in the frontal cortex.

Prediction: Somatosensory Error Cells

The model also predicts that a sudden, unexpected perturbation of the jaw should cause an increase in error cell activity in somatosensory and (perhaps) auditory cortical areas. This in turn should lead to increased activity in motor areas where corrective commands are generated.

[Schematic: DIVA model, somatosensory error pathways]

fMRI Study of Unexpected Jaw Perturbation During Speech

13 subjects produced /aCi/ utterances (e.g., "abi", "agi", "ani") while in the MRI scanner. An event-triggered paradigm was used to avoid movement artifacts and scanner noise issues. On 1 in 7 utterances, a small balloon was rapidly inflated between the teeth during the initial vowel. The balloon inhibits upward jaw movement for the consonant and final vowel, causing the subject to compensate with larger tongue and/or lip movements.

Perturbed – Unperturbed Speech (p < 0.001)

[Figure: brain activation maps (left and right hemispheres) for the perturbed-minus-unperturbed contrast]


Feedforward Control in the Model

In addition to activating the feedback control subsystem, activating the speech sound map cells also causes the readout of feedforward commands for the sound to be produced. These commands are encoded in synaptic projections from premotor cortex to primary motor cortex, including both corticocortical (blue) and transcerebellar (purple) projections.

[Schematic: feedforward commands in the DIVA model]

Combining Feedforward and Feedback Commands

The commands generated by the feedforward system (red) and the feedback system (blue) are combined in motor cortex to form the overall motor command to the speech articulators (purple).

[Schematic: DIVA model, command convergence in motor cortex]

Tuning Feedforward Commands

Early in development, the feedforward commands are poorly tuned, so the feedback control subsystem is needed to "correct" the commands. On each attempt to produce a sound, the feedforward controller incorporates these feedback-based corrections into the feedforward command for the next attempt, resulting in better and better feedforward control with practice.
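This trial-to-trial update can be sketched with a toy scalar example (the "plant", the gains, and the learning rate below are made-up values, not model parameters). With each attempt, the feedback contribution shrinks as the feedforward command absorbs the corrections:

```python
# Toy feedforward tuning: a scalar "plant" whose output equals the motor
# command, with a target output of 1.0. Gains are illustrative assumptions.
target = 1.0
ff_cmd = 0.0                    # poorly tuned feedforward command at first
fb_gain, learn_rate = 0.8, 0.5

for attempt in range(8):
    fb_correction = fb_gain * (target - ff_cmd)  # feedback fixes the residual error
    output = ff_cmd + fb_correction              # combined motor command
    ff_cmd += learn_rate * fb_correction         # fold the correction into feedforward
    print(f"attempt {attempt}: output={output:.3f}, feedback share={fb_correction:.3f}")
```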

The interactions between the feedforward and feedback control subsystems in the model lead to the following predictions:
• If a speaker's auditory feedback is perturbed consistently over many consecutive productions of a syllable, corrective commands issued by the auditory feedback control subsystem will become incorporated into the feedforward commands for that syllable.
• If the perturbation is then removed, the speaker will show "after-effects" due to these adjustments to the feedforward command.
• Speakers with better hearing (auditory acuity) will adapt more than speakers with worse hearing.

This was investigated by Villacorta, Perkell, & Guenther (2004).

Sensorimotor Adaptation Study – F1 Perturbation

In each epoch of the adaptation study, the subject read a short list of words involving the vowel "eh" (e.g., "bet", "peck"). After a baseline phase of 15 epochs of reading with normal feedback, a shift of F1 was gradually applied to the subject's auditory feedback over the next 5 epochs. The shift was then held at the maximum level (a 30% shift) for 25 epochs. Finally, feedback was returned to normal in a 20-epoch post-test phase. The entire experimental session lasted approximately 60-90 minutes.
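Written out epoch by epoch, the perturbation schedule looks like the sketch below (the shift direction and the linear ramp shape are assumptions; the slide specifies only the phase lengths and the 30% maximum):

```python
import numpy as np

# F1 scaling factor applied to the subject's auditory feedback in each epoch.
baseline  = np.ones(15)                   # normal feedback
ramp      = np.linspace(1.0, 1.3, 6)[1:]  # shift grows gradually over 5 epochs
hold      = np.full(25, 1.3)              # held at the maximum 30% shift
post_test = np.ones(20)                   # feedback returned to normal
f1_scale  = np.concatenate([baseline, ramp, hold, post_test])  # 65 epochs total
```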

• Results for 20 subjects are shown by lines with standard error bars.
• The shaded region is the 95% confidence interval for the model simulation results (one simulation per speaker, with target region size determined by the speaker's auditory acuity).

[Figure: produced F1 across epochs for subjects and model simulations]

e. speech only gradually returns to normal values.. there is an after-effect in the first few trials after hearing returns to normal (evidence for feedforward command adaptation).Sensorimotor Adaptation Study Results • Sustained auditory perturbation leads to adjustments in feedforward commands for speech in order to cancel out the effects of the perturbation. HST 722 – Speech Motor Control 32 . • Amount of adaptation is correlated to speaker‟s auditory acuity: high acuity speakers adapt more completely to the perturbation. • When perturbation is removed. • The model provides a close quantitative fit to these processes. i.

Summary

The DIVA model elucidates several types of learning in speech acquisition, e.g.:
• Learning of the relationships between articulations and their acoustic and somatosensory consequences
• Learning of auditory targets for speech sounds in the native language from externally presented examples
• Learning of feedforward commands for new sounds through practice

The model elucidates the interactions between the motor, somatosensory, and auditory areas responsible for speech motor control. It spans behavioral and neural levels and makes predictions that are being tested using a variety of experimental techniques.

HST 722 – Speech Motor Control 34 . e. stored in the projections from premotor to motor cortex. addition of a false palate.Reconciling Gestural and Auditory Views of Speech Production In our view. These gestural scores are shaped by auditory experience in order to adhere to acceptable auditory bounds of speech sounds in the native language(s). syllables and syllable strings (in Levelt‟s terms. which consists of an optimized set of motor programs for the most frequently produced phonemes. the “gestural score” is the feedforward component of speech production. the “syllabary”). They are supplemented by auditory and somatosensory feedback control systems that constantly adjust the gestures when they detect errors in performance. or auditory feedback perturbation. due to growth of the vocal tract.g.

Collaborators

Alfonso Nieto-Castanon, Satrajit Ghosh, Jason Tourville, Kevin Reilly, Oren Civier, Jonathan Brumberg, Jason Bohland, Michelle Hampson, Joseph Perkell, Majid Zandipour, Virgilio Villacorta, Melanie Matthies, Shinji Maeda

Supported by NSF and NIDCD

Simulating a Hemodynamic Response from the Model

Model cell activities during simulations of speech are convolved with an idealized hemodynamic response function, generated using default settings of the function 'spm_hrf' from the SPM toolbox. This function was designed to characterize the transformation from cell activity to hemodynamic response in the brain. A brain volume is then constructed with the appropriate hemodynamic response values at each position and smoothed with a Gaussian kernel (FWHM = 12 mm); this approximates the smoothing carried out during standard SPM analysis of human-subject fMRI data. The resultant volume is then rendered using routines from the SPM toolbox.
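The pipeline can be approximated in a few lines of Python. This sketch substitutes a double-gamma approximation for SPM's spm_hrf and uses scipy for the smoothing; the voxel size, volume shape, and cell location are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import gamma

# Double-gamma approximation of SPM's canonical HRF (response peak ~6 s,
# undershoot ~16 s, undershoot ratio 1/6, as in spm_hrf defaults).
dt = 0.1                                  # seconds per simulation step
t = np.arange(0.0, 32.0, dt)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
hrf /= hrf.sum()

cell_activity = np.zeros(600)             # one model cell over a 60 s simulation
cell_activity[100:150] = 1.0              # active during a speech trial
bold = np.convolve(cell_activity, hrf)[:cell_activity.size]

# Drop the cell's response into a brain volume at its anatomical location,
# then smooth with a Gaussian kernel of FWHM = 12 mm (2 mm voxels assumed).
volume = np.zeros((91, 109, 91))
volume[45, 60, 50] = bold.max()
sigma_vox = 12.0 / (2.0 * np.sqrt(2.0 * np.log(2.0))) / 2.0
smoothed = gaussian_filter(volume, sigma_vox)
```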

An event-triggered paradigm was used to avoid movement artifacts and scanner noise issues.

[Figure: event-triggered fMRI paradigm timing]

Hypothesis regarding the onset of stuttering:

[Figure: error signal magnitude vs. age for a stuttering and a non-stuttering individual; stuttering onset occurs where the stuttering individual's error signal magnitude crosses the threshold for motor reset due to sensory error signals]

Brain regions active during cued production of 3-syllable strings

[Diagram: preSMA frame representations and frame signals; SMA trigger cells and trigger signals (overt speech only); IFS sequence working memory; BA 44 speech sound map selecting the next sound, projecting to motor cortex (DIVA model)]

Auditory Target Regions

The model's use of sensory target regions provides a unified account for a number of speech production phenomena, including aspects of:
• Articulatory variability
• Anticipatory coarticulation
• Carryover coarticulation
• Speaking rate effects
• Economy of effort (cf. Lindblom)

[Figure: tongue body height vs. horizontal position, schematizing the model's explanation of carryover coarticulation and economy of effort during production of /k/ in "luke" and "leak"; shows the target region for /k/ and the configurations for /u/ and /i/]

Two factors that could influence target region size:
(1) Perceptual acuity of the speaker: better perceptual acuity => smaller regions
(2) Speaking condition: clear speech (vs. fast speech) => smaller regions

[Figure: target regions for /i/ and /e/ in F1-F2 space, illustrating contrast distance]

Results of EMMA studies (Perkell et al., 2004a,b):
(1) Speakers with high (H) perceptual acuity show greater contrast distance in production of neighboring sound categories than low-acuity (L) speakers.
(2) There is a general tendency for greater contrast distance in clear speech and less in fast speech.
These results support the predictions on the preceding slide.

[Figures: acoustic and articulatory contrast distances for "who'd-hood", "cod-cud", "sod-shod", and "said-shed" across speaking conditions (F, N, C) for high- and low-acuity speakers, with discrimination results]

HST 722 – Speech Motor Control 43 .) used by a speaker to produce five vowels (iy.Ellipses indicating the range of formant frequencies (+/-1 s. uh. uw) during fast speech (light gray) and clear speech (dark gray) in a variety of phonetic contexts.d. aa. eh.

Motor Equivalence in American English /r/ Production

It has long been known that the American English phoneme /r/ is produced with a large amount of articulatory variability, both within a subject and between subjects.

[Figure: /r/ articulations from Delattre and Freeman (1968)]

Despite large articulatory variability, the key acoustic cue for /r/ remains relatively stable across phonetic contexts.

[Figure: /r/ acoustics across phonetic contexts, from Boyce and Espy-Wilson (1997)]

Motor Equivalence in the DIVA Model

The model's use of an auditory target for /r/ and a directional mapping between auditory and articulatory spaces leads to different articulatory gestures, and different vocal tract shapes, for /r/ depending on phonetic context (e.g., producing /r/ after /g/ vs. after /a/).

EMMA/Modeling Study:
(1) Collect EMMA data from speakers producing /r/ in different contexts
(2) Build speaker-specific vocal tract models (articulatory synthesizers) for two of the speakers
(3) Train the DIVA model to produce sounds with the speaker-specific vocal tracts
(4) Compare the model's /r/ productions to those of the EMMA subjects

A toy illustration of this context dependence appears below.
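This sketch demonstrates the motor equivalence property of the directional mapping under stated assumptions (a made-up 1x2 "Jacobian" relating two articulators to one acoustic cue, and an arbitrary target value): starting postures inherited from different phonetic contexts converge to the same acoustic value through different articulator configurations.

```python
import numpy as np

J = np.array([[1.0, 0.5]])   # toy mapping: two articulators -> one acoustic cue
target = 1.6                 # hypothetical acoustic target (e.g., a low F3 for /r/)

def reach(config, steps=50, rate=0.5):
    """Drive the acoustic cue to the target via the directional (pinv) mapping."""
    for _ in range(steps):
        err = target - (J @ config)
        config = config + rate * (np.linalg.pinv(J) @ err)
    return config

from_a = reach(np.array([0.2, 1.0]))   # starting posture after /a/
from_g = reach(np.array([1.0, 0.2]))   # starting posture after /g/

print(J @ from_a, J @ from_g)   # same acoustics: both reach the target
print(from_a, from_g)           # but via different articulator configurations
```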

Sketch of hypothesized trading relations for /r/: the acoustic effect of the larger front cavity (blue) is compensated by the effect of the longer and narrower constriction (red). This yields similar acoustics for "bunched" (red) and "retroflex" (blue) tongue configurations for /r/ (Stevens, 1998; Boyce & Espy-Wilson, 1997). All seven subjects in the EMMA study utilized similar trading relations (Guenther et al., 1999).

[Figure: lip and tongue constriction tracings for subjects S1-S7 producing /warav/, /wabrav/, or /wagrav/ (1 cm scale, front-back orientation)]

Building Speaker-Specific Vocal Tract Models from MRI Images

[Figure: vocal tract models for Subject 1 and Subject 2, showing shape changes for varying F1, F2, and F3]

Comparison of the model's articulations using speaker-specific vocal tracts to those speakers' actual articulations [Nieto-Castanon, Guenther, Perkell, and Curtin (2005), J Acoust Soc Am]:

[Figure: subject data vs. DIVA simulations for /ar/, /dr/, and /gr/ in Subject 1 and Subject 2]
