
Overview of The Speech Production and Recognition Process

EE 627 Speech Signal Processing
Lecture 1/2: Overview, Modeling Speech, and Categorization of Speech Sounds
R. Hegde, Dept. of Electrical Engg., IIT Kanpur

rhegde @ iitk.ac.in

EE 627 - Speech Signal Processing, Lecture 1/2


Outline

1. Overview of The Speech Production and Recognition Process
   - Speech Recognition Why and What
   - Modeling The Speech Production Mechanism
   - Categorization of Speech Sounds


Why?

- Speech is a natural mode of communication
- Human communication is device-driven these days: man to machine, machine to man, machine to machine
- Communication with machines must be made natural
- Interest of big companies
- Success stories
- $$$


Applications

- Telephone applications
- Hands-free operation
- Assistive living
- Dictation
- Translation
- Emergency response
- Situational awareness
- Multimodal processing
- Many more ...

Viewing the Speech Signal: Temporal and Spectral

What We Need to Know to Do Signal-Symbol Transformation

- Need to understand the speech production mechanism
- Identify and define units for the signal (acoustic) to symbol (language-specific sound units) transformation
- Choice of sound units:
  - Based on the signal (signal parameters and features)
  - Based on language (phonemes)
  - Based on the physiological model (articulatory phonetics)
  - Based on the source-system model of production (acoustic phonetics)
- Describe human sound production in terms of articulatory phonetics and acoustic phonetics
- Study categories of excitation: voiced/unvoiced/silence
- Nature of the speech signal in terms of source-system, temporal/spectral, segmental/suprasegmental

Why is Speech Recognition Difficult

- Word boundary hypothesis is still an unsolved problem due to continuity, variability, and disfluencies in speakers
- Coarticulation: take, stake, tray, straight, butter, Kate ...
- Speaking rate variability
- Large vocabularies in all languages
- Variability in ambient acoustics, microphone characteristics, channel characteristics, background noise
- Adaptation to the variability
- Practical usability of algorithms: training algorithms that run for days are not useful

Why is Speech Recognition Difficult

- Different phrases sound the same
  - It's not easy to wreck a nice beach / It's not easy to recognize speech
  - Sly drool / Slide rule
  - say s / say yes
- Semantic sense or nonsense
  - Carter plans swell deficit
  - Farmer Bill dies in house
  - Stud tires out


The Speech Production and Perception Mechanism

The Speech Recognition Layers

The SR Layers

Speech Recognition: Dimensions


The Physiological Model of Speech Production

The Source-System Model of Speech Production

Analogy between PHY and MATH Model

Descriptive Signal Level Analogy

Resonances of the Vocal Tract

Length of Waves and Tube Constraints

Relating Articulation and the Acoustic Spectrum

Schwa (Neutral Vowel) Fundamentals

- From sound wave theory, $f = c/\lambda$, with $f$ the frequency, $c$ the velocity of sound (35,000 cm/sec), and $\lambda$ the wavelength
- A sound with $\lambda$ = 10 meters has low frequency: $f$ = 35 Hz (35,000/1000)
- A sound with $\lambda$ = 2 centimeters has high frequency: $f$ = 17,500 Hz (35,000/2)
- The schwa has 3 resonant frequencies (formants). In general, $L$ = 17.5 cm.
  - $F_1 = c/\lambda_1 = c/(4L) = 35000/(4 \times 17.5) = 500$ Hz
  - $F_2 = c/\lambda_2 = c/(\tfrac{1}{3} \cdot 4L) = (3 \times 35000)/(4 \times 17.5) = 1500$ Hz
  - $F_3 = c/\lambda_3 = c/(\tfrac{1}{5} \cdot 4L) = (5 \times 35000)/(4 \times 17.5) = 2500$ Hz
- A neutral vowel has 3 resonances (formants) at 500, 1500, 2500 Hz
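The schwa arithmetic on this slide can be reproduced with a short Python sketch. It assumes the vocal tract behaves as a uniform tube closed at one end (a quarter-wavelength resonator), so that the n-th resonance is (2n - 1)c/(4L), which matches the three values computed above. The function names `frequency_from_wavelength` and `tube_formants` are illustrative and do not come from the lecture.

```python
# Speed of sound used on the slide, in centimeters per second.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def frequency_from_wavelength(wavelength_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Return f = c / lambda in Hz for a wavelength given in centimeters."""
    return c / wavelength_cm


def tube_formants(length_cm=17.5, count=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a uniform tube closed at one end.

    Assumption: F_n = (2n - 1) * c / (4 * L), i.e. odd multiples of the
    quarter-wavelength resonance, as in the slide's F1, F2, F3.
    """
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(frequency_from_wavelength(1000.0))  # wavelength of 10 m
    print(frequency_from_wavelength(2.0))     # wavelength of 2 cm
    print(tube_formants())                    # neutral vowel, L = 17.5 cm
```

Under this model every resonance scales as 1/L, so passing a smaller `length_cm` raises all formants proportionally.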


The Phoneme Set

Phonetic Transcription: Words

Phonetic Transcription: Digits

Classification Tree for Speech Sounds

The Vowel Space

The Vowel Triangle

The Diphthong Space

Articulatory Models for Vowels

Vowel Space of American English Vowels

/iY/ and /uW/

/ae/ and /aa/

Place and Manner of Articulation

- POA: Dental, Alveolar, Palatal, Velar, Uvular, Pharyngeal, Glottal
- MOA: Oral and nasal stops

Consonant Classification based on Place of Articulation

- Location where airflow is constricted is the place of articulation
- Labial (with lips), Coronal (using tip or blade of tongue), Dorsal (using back of tongue)
- Let's see if http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html works for a discussion
- Bilabial: p, b, m
- Labiodental: f, v
- Dental: th/dh
- Alveolar: t/d/s/z/l
- Post-alveolar: sh/zh/y
- Velar: k/g/ng
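The place-of-articulation groups listed on this slide can be held in a small lookup table. The Python sketch below is only an illustration: the phone symbols are copied from the slide, "Post" is read as post-alveolar, and the mapping from each fine-grained place to the broad labial/coronal/dorsal classes is an assumption based on standard phonetics, not something the slide spells out. The names `PLACE_TO_PHONES`, `PLACE_TO_BROAD_CLASS`, and `place_of` are hypothetical.

```python
# Fine-grained places of articulation and their phones, as listed on the slide.
PLACE_TO_PHONES = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "dental": ["th", "dh"],
    "alveolar": ["t", "d", "s", "z", "l"],
    "post-alveolar": ["sh", "zh", "y"],
    "velar": ["k", "g", "ng"],
}

# Assumed grouping of each fine place into the three broad classes
# named on the slide (labial, coronal, dorsal).
PLACE_TO_BROAD_CLASS = {
    "bilabial": "labial",
    "labiodental": "labial",
    "dental": "coronal",
    "alveolar": "coronal",
    "post-alveolar": "coronal",
    "velar": "dorsal",
}

# Inverted index: phone symbol -> fine-grained place.
PHONE_TO_PLACE = {
    phone: place
    for place, phones in PLACE_TO_PHONES.items()
    for phone in phones
}


def place_of(phone):
    """Return (fine place, broad class) for a phone; raises KeyError if unknown."""
    place = PHONE_TO_PLACE[phone]
    return place, PLACE_TO_BROAD_CLASS[place]


if __name__ == "__main__":
    for phone in ["p", "th", "sh", "ng"]:
        print(phone, place_of(phone))
```

Keeping the slide's table as the single source of truth and deriving the inverted index from it avoids the two views drifting apart when the phone set is extended.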

Consonant Classification based on Manner of Articulation

- Stops: No air through the mouth, with a complete closure of articulators
  - Oral stops: soft palate (velum) is raised, no air escapes through the nose. Air pressure builds up behind the closure and explodes when released /p, t, k, b, d, g/
  - Nasal stops: oral closure, but the soft palate is lowered, so air escapes through the nose /m, n, ng/
- Fricatives: A close approximation of two articulators results in turbulent airflow between them, producing a hissing sound /f, v, s, z, th, dh/
- Approximant: Not so close approximation of two articulators, and no turbulence /y, r/
- Lateral approximant: Obstruction of the airstream along the center of the oral tract, with opening around the sides of the tongue /l/
- Affricate: Stop immediately followed by a fricative /ch, jh/
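The manner-of-articulation classes above also lend themselves to a simple lookup. This is a minimal Python sketch using only the phone symbols given on the slide; the names `MANNER_TO_PHONES`, `manner_of`, and `is_stop` are hypothetical, and phones not listed on the slide simply return `None`.

```python
# Manner classes and their phones, as listed on the slide.
MANNER_TO_PHONES = {
    "oral stop": ["p", "t", "k", "b", "d", "g"],
    "nasal stop": ["m", "n", "ng"],
    "fricative": ["f", "v", "s", "z", "th", "dh"],
    "approximant": ["y", "r"],
    "lateral approximant": ["l"],
    "affricate": ["ch", "jh"],
}

# Inverted index: phone symbol -> manner class.
PHONE_TO_MANNER = {
    phone: manner
    for manner, phones in MANNER_TO_PHONES.items()
    for phone in phones
}


def manner_of(phone):
    """Return the manner class of a phone, or None if it is not in the table."""
    return PHONE_TO_MANNER.get(phone)


def is_stop(phone):
    """True for oral and nasal stops (complete oral closure)."""
    return manner_of(phone) in ("oral stop", "nasal stop")


if __name__ == "__main__":
    for phone in ["p", "ng", "s", "l", "jh", "aa"]:
        print(phone, manner_of(phone), is_stop(phone))
```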

POA and MOA Table

ARPA and IPA Transcriptions

References

- Thomas Quatieri, Discrete Time Speech Signal Processing, Prentice Hall
- James Glass, Speech Recognition Open Course Ware, MIT
- http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
- Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall