
TESL 566: Phonology Research Paper

Dee Matchett 12-10-13 1

Right Brain Processing and Accent Reduction Software

to Improve Prosody in Non-native Speakers of English

F. Dee Matchett

TESL 566 Phonology

Carson Newman University

Dr. Dan Hinson

December 10, 2013


Abstract

Latin served as the international language for hundreds of years and still exerts a major influence on medical vocabulary and legalese. However, English has replaced Latin as the global language and now serves as the primary language of economic trade. English presses its influence upon science and medicine as well; hence, professional journals are published primarily in English. It is also the language of the Internet and therefore exerts great influence throughout the World Wide Web.

The smaller our world becomes in terms of communication and the blending of cultures, the more non-native speakers find the need to speak English in a readily comprehensible manner. This paper explores the efficacy of using accent reduction software that stimulates right-brain processing to improve the prosody of non-native speakers of English.

The mental processes involved in the production of segmentals and suprasegmentals will be explained.

The relationship between the prosody of the speaker and the perception of the listener will be noted and the implications of how intelligibility affects them will be identified.

The benefits of stimulating right brain processing in relationship to pronunciation will be delineated.

An explanation of the Tomatis Method and SpeedLingua software will be offered.


Research regarding the mental processes involved in the production of language has been ongoing since 1861, when the surgeon Paul Broca identified the area of speech formation by examining a deceased man's brain.

Figure 1: Broca's area and Wernicke's area. Retrieved from http://thebrain.mcgill.ca, October 16, 2013

The man, although unable to speak, could hear and understand the speech of others. Broca rightly assumed that a lesion found in the brain had been responsible for the man's inability to articulate. The neurologist Carl Wernicke added to the knowledge of language production when he discovered another region of the brain that processes speech. People with lesions in this area could articulate, but their speech was largely incomprehensible. Both areas are located in the left hemisphere of the brain (Bruno, 2002) (see Figure 1). It was later discovered that numerous nerve fibers form a path of transmission between Broca's area and Wernicke's area. This connection allows Wernicke's area to analyze written words or auditory language, form contextual understandings, and transfer that information to Broca's area. Broca's area plans the pronunciation of words and sends that information to the motor cortex, which commands the muscle movements required for pronunciation. This is an oversimplification, since all these processes function simultaneously with a variety of other input such as semantic memory. Semantic memory stores definitions and the articulation pattern necessary to pronounce words, including tongue placement and mouth position (Dubuch, 2002). This explanation does, however, provide us with a basic understanding of how the segmental aspects of speech are produced. The production of the suprasegmentals that give a language its stress patterns, rhythm, and intonation requires assistance from a location in the right brain. Its interactive relationship with Broca's area and Wernicke's area has been difficult to describe adequately, but a dual pathway model proposed by the researchers Angela Friederici and Kai Alter has gained acclaim (Friederici & Alter, 2004; Friederici, 2011).

Figure 2: Lateralization of language processing. Retrieved from http://www.frontiersin.org/Journal/10.3389/fnene.2010.00013/full, October 21, 2013

The model has been further substantiated by the findings of Yuri Saito (Saito, Fukuhara, Aoyama, & Toshima, 2009). The complexity of this interaction is beyond the scope of this paper, but basically the dual pathway model describes the synergistic production of the segmental and suprasegmental aspects of speech (see Figure 2). This additional right-brain processing enables us to use speech for emotional expression. Raising voice pitch when surprised or lowering pitch when angry are both functions of right-brain language prosody. We use the left brain, where the segmentals of speech are centered, for analysis, and the right side of the brain, where suprasegmentals are centered, for the artistic expression of music, art, and dance. This correlates with English being a stress-timed language, quite musical in nature, with intricate patterns of rhythm, stress, and intonation that can be difficult for ELLs to master (Romer-Trillo, 2012). Engaging the right brain should help learners assimilate the lyrical nature of the language. When these lyrical features are lacking, comprehensibility is negatively affected. Studies indicate that music and speech are decoded by shared brain processes and that the brain responds to the same psychoacoustic cues, namely loudness, tempo and speech rate, melodic and prosodic contour, spectral centroid, and sharpness (Meng, 2009; Courinho, 2013). These are all areas that can be addressed through accent reduction training, which can be augmented with computer software that stimulates the right brain in the processing of suprasegmentals.

According to The New York View, a publication of Columbia University's Graduate School of Journalism, there is a marked increase in the number of non-native speakers seeking the services of accent modification coaches (Cheng, 2012). Statistics from training services confirm this report: Sankin Speech Improvement, LLC, recently reported a 35% increase in clients seeking accent reduction services (Sankin, 2013). A recent report from the US Census Bureau indicates that 21% of the American population speaks a language other than English at home (Census, 2011). This figure is only one percentage point higher than in 2009, but it continues the upward trend of several decades. The American Community Survey report summarizes the findings as follows: "This report provides illustrative evidence of the continuing and growing role of non-English languages as part of the national fabric. Fueled by both long-term historic immigration patterns and more recent ones, the language diversity of the country has increased over the past few decades. As the nation continues to be a destination for people from other lands, this pattern of language diversity will also likely continue" (Ryan, 2013).


Figure 3: US Census Bureau, American Community Survey. Retrieved from http://www.census.gov/prod/2013pubs/acs-22.pdf, October 31, 2013

As this population filters into the educational system and the work force, there is a need to address speech proficiency. It would be a mistake to inadequately address the language deficits of our increasing population of non-native speakers. The growing global economy, which requires greater communicative interaction in the marketplace, is another compelling element necessitating clear communication skills. The increase in the demand for accent reduction services noted previously shows that second language speakers are aware of the need to acquire that proficiency. However, they may not be able to determine what specific factors in their speech are creating difficulty (Derwing, 2003). The two primary factors needed for speech to be comprehensible are pronunciation and prosody (Baker, 2007). These correspond to the production of segmentals and suprasegmentals described earlier.


In a study of pronunciation and intelligibility (Levis & Levelle, 2011), speech recordings of native Spanish and Korean speakers of English were evaluated by pronunciation experts to determine what most impacted intelligibility. Although insufficient pronunciation skills did result in a loss of comprehension, panelists felt that correcting pronunciation would not improve intelligibility until the larger problems of rhythm, tempo, and word stress (misplaced or lacking) were addressed. These are all prosodic features of language. Munro and Derwing (2000) studied the effect of accents upon native English listeners (random people, not pronunciation experts) to determine how they perceived differences between accented speech and their own speech, the difficulty they experienced in understanding accented speech, and how much accented speech the hearer actually understood. The study showed that pronunciation was the least relevant factor in comprehensibility because listeners quickly adjusted to differences in English pronunciation: "This finding demonstrates empirically that the presence of a strong foreign accent does not necessarily result in reduced intelligibility or comprehensibility" (p. 19). A lack of prosody, on the other hand, is troublesome to the listener, since "two foreign-accented utterances may both be fully understood (and therefore be perfectly intelligible), but one may require more processing time than another" (p. 19). This slower processing time can be frustrating to the listener and, as a result, be interpreted as a lack of fluency. In the study, listeners rated this type of speech as having lower comprehensibility, even though they could transcribe the speaker perfectly. Generally, this perception is due to the missing suprasegmental elements that give the English language its characteristic qualities of rhythm, stress, and intonation. Without these qualities, speech becomes laborious to decipher.


Computer Assisted Language Learning (CALL) refers to the use of computer technology and software to teach all aspects of language: reading, writing, listening, and speaking. From CALL has grown a subset of technologies to improve speech: Computer Assisted Pronunciation Training (CAPT). Instead of imitating taped voice recordings on cassette or CD in a language lab, instruction has trended toward CAPT software programs. Much of the technology for CAPT has been borrowed from speech therapy research and adapted for language learners. Automatic Speech Recognition (ASR) technology has been adapted to map speech patterns for pronunciation comparison (Qooco, 2009). The learner's pronunciation of a given text is analyzed against an accepted speech model and rated for accuracy. Some CAPT programs combine ASR with speech waveforms. Prosody, speech rate, and loudness can be read from a waveform (McGregor, 2002). Actual phonemes cannot be read within a waveform unless frequency components are analyzed and displayed as a spectrogram (Carmell, 2013). Reading a spectrogram accurately requires training; for this reason, some linguists have argued against such displays, citing that they are presented "because of their flashy look, to impress the users" (Neri, Cucchiarini, Strik, & Boves, 2009). However, a spectrogram does give the learner a general visual comparison.

Figure 6: Waveform and spectrogram of the same word, "compute." Retrieved from http://www.cslu.ogi.edu/tutordemos/SpectrogramReading/waveform.html

The waveform of the model voice can be examined for conformity to the learner's voice. Since the addition of a visual display has been shown to increase error recognition, waveforms can be a good learning aid, as noted in a study of the Kay-Pentax Computerized Speech Laboratory. Learners were able to use visual feedback from spectrograms to recognize gaps in their language production that they had not noticed with imitation exercises alone (Pearson, 2011). A combination of aural and visual modalities produces increased effectiveness in speech production (Massaro, Cohen, & Gesi, 1993). This correlates with the previously mentioned stimulation of the right brain during the production of suprasegmentals and with the well-grounded pedagogical strategy of presenting instruction through a variety of modalities. In addition to aural and visual stimulation, by its very nature CAPT also engages the learner kinesthetically. CAPT software offering both aural and visual feedback is referred to as a dual-mode program, and numerous CAPT software programs are available with this type of technology. As a result of the FLUENCY project at Carnegie Mellon University, which used a SPHINX-II ASR system developed by Maxine Eskenazi (Eskenazi & Hansma, 1998), CAPT software became available that showed improvement in speech prosody error recognition and an improved user interface. It also allowed users to select a "golden voice" to imitate. The idea was that learners could choose a voice closer to their own as a model: males could select a voice with a lower F0 (pitch) and females a voice with a higher F0. An exciting new concept regarding the golden voice was introduced in 2007: since the technology now existed to modify a speaker's voice, why not modify the learner's own voice and let it become the closest acoustical model possible?
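The distinction drawn above between a waveform and a spectrogram can be made concrete in a few lines of code. The sketch below is illustrative only and is not taken from any CAPT product; the frame length, hop size, and 440 Hz test tone are arbitrary choices. It computes a magnitude spectrogram with a windowed short-time Fourier transform and shows that the frequency content a raw waveform hides becomes explicit on the spectrogram's frequency axis.

```python
import numpy as np

def spectrogram(signal, sr, frame_len=1024, hop=256):
    """Magnitude spectrogram via a windowed short-time Fourier transform.
    Rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000
t = np.arange(sr) / sr                 # one second of audio
tone = np.sin(2 * np.pi * 440 * t)     # a steady 440 Hz tone

# The raw waveform shows only amplitude over time (loudness, rhythm);
# the spectrogram makes the frequency content explicit.
spec = spectrogram(tone, sr)
freqs = np.fft.rfftfreq(1024, 1 / sr)
peak_hz = freqs[spec.mean(axis=0).argmax()]   # dominant frequency, near 440 Hz
```

In a real CAPT display the same transform is applied to recorded speech, where the formant bands that distinguish phonemes appear as dark ridges in the spectrogram.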


"Here we propose a voice transformation technique that can be used to generate the (arguably) ideal voice to imitate: the own voice of the learner with a native accent. Our work extends previous research, which suggests that providing learners with prosodically corrected versions of their utterances can be a suitable form of feedback in computer assisted pronunciation training. Our technique provides a conversion of both prosodic and segmental characteristics by means of a pitch-synchronous decomposition of speech into glottal excitation and spectral envelope. We apply the technique to a corpus containing parallel recordings of foreign-accented and native-accented utterances, and validate the resulting accent conversions through a series of perceptual experiments. Our results indicate that the technique can reduce foreign accentedness without significantly altering the voice quality properties of the foreign speaker" (Felps, Bortfeld, & Gutierrez-Osuna, 2009).

A uniquely individual approach had appeared, and the results indicated that after this morphing technique was applied, the perception of accentedness in the learner's voice was greatly reduced. Still, there were problems integrating the application for the purpose of pronunciation learning: to some degree, the inevitable segmental errors of the learner were transferred to their modified voice. Ruili Wang and Jingli Lu later developed a system that overcame this problem by morphing the features of the learner's voice with the teacher's voice in a way that eliminated learner pronunciation errors while retaining the voice qualities of the non-native speaker: "Because our voice modification is based on a teacher's voice, the re-synthesized utterances can be free from segmental errors" (Wang & Lu, 2011).
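As a rough illustration of the source-filter idea behind such accent conversion, the toy sketch below separates a frame's log-spectrum into a smooth spectral envelope (vocal-tract shape, carrying segmental identity) and a residual excitation term (source detail, carrying voice quality) using simple cepstral liftering. This is only a conceptual stand-in: it is not Felps et al.'s pitch-synchronous algorithm or Wang and Lu's patented method, and the frame data and lifter length are invented for the example.

```python
import numpy as np

def envelope_excitation(frame, n_lifter=30):
    """Split a frame's log-magnitude spectrum into a smooth envelope
    (low-quefrency cepstral components) and the residual excitation."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-9)
    ceps = np.fft.irfft(log_mag)          # real cepstrum
    ceps[n_lifter:-n_lifter] = 0.0        # discard high quefrencies
    envelope = np.fft.rfft(ceps).real     # smoothed log-spectrum
    return envelope, log_mag - envelope

rng = np.random.default_rng(0)
learner = rng.standard_normal(1024)   # stand-ins for two speakers' frames
teacher = rng.standard_normal(1024)

env_teacher, _ = envelope_excitation(teacher)
_, exc_learner = envelope_excitation(learner)

# A "morphed" log-spectrum: the teacher's (error-free) envelope carrying
# the segmental content, paired with the learner's excitation so the
# result keeps the learner's voice quality.
morphed = env_teacher + exc_learner
```

Resynthesizing audible speech from such a hybrid spectrum requires phase handling and overlap-add machinery that the published systems supply; the point here is only the envelope/excitation split the paragraph above describes.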


The patented process is now part of SpeedLingua software, which also incorporates the Tomatis Method (TOMATIS-Developpement, 2009). The Tomatis language learning method is based upon the research of Alfred Tomatis (1920-2001), an ear, nose, and throat specialist who spent much of his career exploring the relationship between hearing and speaking. In 1960 he earned the Gold Medal for Scientific Research when he presented his laboratory research to the Academy of Sciences and Academy of Medicine. That research defined the following set of laws: the voice contains only what the ear hears; if you change hearing, the voice is immediately and unconsciously modified; and it is possible to durably transform phonation by sustaining auditory stimulation for a specific given time (the law of remanence) (Tomatis, 1997).

As the chart below shows, there are variations in the frequency ranges of different languages. This is significant because during the acquisition of one's native language, the ear becomes very adept at hearing the frequencies of the mother tongue. In order to tune in on those frequencies, the ear tunes out other frequencies. This becomes problematic later when one is trying to learn a foreign language whose frequency range varies greatly from that of one's native tongue; the frequencies of the other language are then quite difficult to hear. It can take much time, effort, and immersion in the second language for the ear to tune in on its frequencies. Classroom time alone is insufficient, and language learners often do not have access to an environment where they can listen to the speech of native speakers. In a three-year study (1993-1995), retuning the ear with the Tomatis Method proved 50% more efficient than conventional methods of language learning (Gianni, 2000).

SpeedLingua software incorporates the Tomatis Method by engaging the learner in a receptive listening activity for 15 minutes prior to language learning activities. During this pre-exercise time, the learner hears music that is gradually filtered from the sound frequencies of their native language to those of the target language. This tunes the ear to hear the dominant rhythm and musical intonation of the language being practiced. SpeedLingua is the only software available that preconditions the right side of the brain for language learning and then morphs the user's own voice so that, while performing the learning exercises, they hear themselves speak the language as if they were a native speaker. Thus, with the ear pre-tuned, they hear themselves speaking in the wave frequency of the target language.

The significance of stimulating right-brain processing of suprasegmentals, and the ability of CAPT to facilitate that process through aural, visual, and kinesthetic modalities, has been made evident. The importance of prosody in making language intelligible to the listener has been shown, along with the pressing need for improved speech skills in our global community. The availability of dual-mode software programs that address this need has been noted; a comparison of some of these programs can be seen in Appendix 1. The benefit of tuning the ear to hear the frequencies of the target language and imitating a model voice has been shown. Accompanying this report is a CD with a PowerPoint presentation that includes a video demo of SpeedLingua software and before/after video excerpts of language learners using the software. As technology continues to progress, applications for language learning bring prospects of great benefit to language learners and instructors.
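At its core, the ear "retuning" pre-exercise described earlier amounts to restricting audio energy to a chosen frequency band. The sketch below illustrates that single step with a crude FFT band-pass filter; the band edges and test signal are invented for the example and are not Tomatis's published values or SpeedLingua's actual filtering schedule.

```python
import numpy as np

def retune(audio, sr, low_hz, high_hz):
    """Zero out spectral energy outside [low_hz, high_hz]: a crude
    stand-in for filtering music toward a target language's
    dominant frequency band."""
    spec = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), 1 / sr)
    spec[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spec, n=len(audio))

sr = 16000
t = np.arange(sr) / sr
# toy "music": a low component plus a high component
music = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 3000 * t)

# keep only an (illustrative) 2-12 kHz "target language" band;
# the 300 Hz component is suppressed, the 3000 Hz component remains
filtered = retune(music, sr, 2000, 12000)
```

A gradual transition such as the one the software performs would apply a sequence of such filters, shifting the pass band from the native language's range toward the target language's range over the 15-minute listening session.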


Bibliography

Baker, W. (2007). Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners' acquisition of five suprasegmentals. Applied Psycholinguistics. Retrieved from http://0search.proquest.com.library.acaweb.org/docview/200859527/fulltextPDF/1414C9067752B26F3C2/3?accountid=9900

Bruno, D. (2002). The brain from top to bottom: Language processing areas in the brain. Retrieved from http://thebrain.mcgill.ca/flash/d/d_10/d_10_cr/d_10_cr_lan/d_10_cr_lan.html

Census. (2011). Language other than English spoken at home. American FactFinder. Retrieved from http://factfinder2.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t

Cheng, H. (2012). Accent reduction classes now in demand. The New York View. Retrieved October 26, 2013, from http://newyorkview.net/2012/08/accentreduction-classes-now-in-demand/

Courinho, E. (2013). Psychoacoustic cues to emotion in speech prosody and music. Cognition and Emotion, 27(4), 658-684.

Derwing, T. M. (2003). ELL perceptions of their accent. The Canadian Modern Language Review, 59(4).

Eskenazi, M., & Hansma, S. (1998). The Fluency pronunciation trainer. In Proceedings of the STiLL Workshop (p. 6). Pittsburgh: Language Technology Institute, Carnegie Mellon University. Retrieved from http://www.cs.cmu.edu/~max/mainpage_files/Esk-Hans-98.pdf

Felps, D., Bortfeld, H., & Gutierrez-Osuna, R. (2009). Foreign accent conversion in computer assisted pronunciation training. Speech Communication, 51(10), 920-932. doi:10.1016/j.specom.2008.11.004

Friederici, A. D. (2011). The brain basis of language processing: From structure to function. Physiological Reviews, 91(4), 1357-1392. doi:10.1152/physrev.00006.2011

Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89(2), 267-276. doi:10.1016/S0093-934X(03)00351-1

Gianni, F. U. K. (2000). Audio Language (pp. 1-24).

Levis, J., & Levelle, K. (Eds.). (2011). Pronunciation and intelligibility: Issues in research and practice. Proceedings of the 2nd Annual Pronunciation in Second Language Learning and Teaching Conference (pp. 56-69). Iowa State University.

Massaro, D., Cohen, M., Gesi, A., & Heredia, R. (1993). Bimodal speech perception: An examination across languages. Journal of Phonetics, 21, 445-478.

McGregor, A. (2002). Pronunciation software review.

Meng, H. (2009). Developing speech recognition and synthesis technologies to support computer-aided pronunciation training for Chinese learners of English. In 23rd Pacific Asia Conference on Language, Information and Computation (pp. 40-42).

Munro, M. J., & Derwing, T. M. (2000). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners.

Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2009). The pedagogy-technology interface in computer assisted pronunciation training. In Computer Assisted Language Learning: Critical Concepts in Linguistics (Vols. I-IV, pp. 140-164). doi:10.1076/call.15.5.441.13473

Pearson, P. (2011). In Proceedings of the 2nd Annual Pronunciation in Second Language Learning and Teaching Conference (p. 169). Iowa State University.

Qooco. (2009). About ASR. Qooco Chinese Learning. Retrieved February 11, 2013, from http://www.qoocochinese.com/web/help_4.htm

Romer-Trillo, J. (2012). Pragmatics and prosody in English language teaching. Educational Linguistics, 15.

Ryan, C. (2013). Language use in the United States: 2011 (p. 16). Washington, D.C. Retrieved from http://www.mla.org/map_main

Saito, Y., Fukuhara, R., Aoyama, S., & Toshima, T. (2009). Frontal brain activation in premature infants' response to auditory stimuli in neonatal intensive care unit. Early Human Development, 85(7), 471-474. doi:10.1016/j.earlhumdev.2009.04.004

Sankin, S. (2013). Accent reduction training demand is increasing. PRWeb. Retrieved October 26, 2013, from http://www.prweb.com/releases/accent-reduction-nyc/regionalaccents/prweb10689963.htm

Tomatis, A. (1997). The ear and language (B. Thompson, Ed.). Paris: Moulin Publications. Retrieved from http://www.tomatiscalgary.ca/tomatis-method/history-anddevelopment

Tomatis-Developpement. (2009). The TOMATIS Method, a teaching process for listening (p. 14). Luxembourg: Tomatis Developpement S.A.

Wang, R., & Lu, J. (2011). Investigation of golden speakers for second language learners from imitation preference perspective by voice modification. Speech Communication, 53(2), 175-184. doi:10.1016/j.specom.2010.08.015


Logo Illustrations

SpeedLingua, Copyright 2010. Retrieved November 13, 2013, from http://www.learnissimo.com

TOMATIS Method. Retrieved December 4, 2013, from http://www.tomatis.com/en/index.html?gclid=CJzjveSVmLsCFWJo7AodTUoAhQ


Appendix 1

See attached comparison spreadsheet of currently available CAPT software.
