
SPEECH PROCESSING

BINIT MOHANTY
binit.mohanty@gmail.com
Why Speech?
• No visual contact required
• No special equipment required
• Can be done while doing other things

• Telephones – AT&T
• Mobile Phones (1G and 2G)
Speech Processing
• Speech Coding
• Speech Synthesis
• Speech Recognition
• Speaker Recognition/Verification
• Dyslexia and Auditory problems

• Audio Engineering
Speech Coding
• Compress a Speech File
• Why not use standard compression techniques?

• MP3 Format
– Perceptual Coding
– Exploits sensory organ biases
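One way to see why speech is not simply handed to a general-purpose compressor: the ear is roughly logarithmic in loudness, so a coder can spend its bits unevenly. The sketch below uses mu-law companding, the idea behind telephone codecs such as G.711, as an illustration; it is not a method named on the slides. It applies the continuous mu-law formula followed by uniform 8-bit rounding, which is a simplification of the real G.711 bit layout, and the function names are hypothetical.

```python
import math

MU = 255  # companding constant used by mu-law telephony


def mulaw_encode(x: float) -> int:
    """Map a sample in [-1, 1] to an 8-bit code (0..255) via mu-law companding."""
    x = max(-1.0, min(1.0, x))
    # Logarithmic compression: quiet samples get more of the code range.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))


def mulaw_decode(code: int) -> float:
    """Invert mulaw_encode (up to quantization error)."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def linear_roundtrip(x: float) -> float:
    """Plain uniform 8-bit quantization, for comparison."""
    code = int(round((x + 1) / 2 * 255))
    return code / 255 * 2 - 1


if __name__ == "__main__":
    quiet = 0.01  # a quiet sample, where speech spends much of its time
    print("mu-law error:", abs(mulaw_decode(mulaw_encode(quiet)) - quiet))
    print("linear error:", abs(linear_roundtrip(quiet) - quiet))
```

With the same 8 bits per sample, the quiet sample is reproduced more accurately by the companded code than by the uniform one, at the cost of coarser steps for loud samples.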
Speech Synthesis
• Construct Speech waveform from words
• Speaker Quality and Accent
• Prosody?

• http://www.research.att.com/~ttsweb/tts/demo.php
Speech Recognition
• Convert a sound waveform to words
• The most relevant and important task in the industry
• About 90% accuracy in lab conditions, much lower in factory conditions

• Sphinx by CMU, ViaVoice by IBM, and the Speech SDK by Microsoft
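Accuracy figures like the one above are usually reported at the word level. As a minimal sketch, assuming the figure refers to word accuracy, the word error rate (WER) can be computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
```

One wrong word in a ten-word utterance gives a WER of 0.1, which corresponds to 90% word accuracy.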
Speaker Recognition
• Concerned with Biometrics
• Acceptable as a verification technique
• How would this be different from Speech recognition?
– Speaker Quality
– Prosody
– Pitch, Accent etc.
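Pitch is one of the speaker-dependent cues listed above. The sketch below estimates the fundamental frequency of a frame by picking the lag with the strongest autocorrelation. It is an illustration on synthetic sine waves, not a speaker recognition system; the function names, the 8 kHz sample rate, and the 50 to 400 Hz search range are assumptions.

```python
import math


def sine_wave(freq_hz: float, sample_rate: int, n_samples: int) -> list:
    """Synthetic stand-in for a voiced speech frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=400.0) -> float:
    """Return the frequency whose period (lag) maximizes the autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(samples) <= max_lag:
        raise ValueError("frame too short for the requested pitch range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000
    for true_pitch in (100.0, 200.0):  # two hypothetical voices
        frame = sine_wave(true_pitch, rate, 800)
        print(true_pitch, "->", estimate_pitch_hz(frame, rate))
```

A speech recognizer would try to ignore this value, since the words are the same at any pitch; a speaker recognizer can use it as one feature among several.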
Dyslexia & Auditory Problems
• Study Voice and Ear defects
• Detect and correct Speech Disfluencies – CMU
• Development of better Ear substitutes – Cochlear Implants
Audio Engineering
• Adding effects to sound
• Clarity of reproduction
• A big industry with players like Dolby, Bose, Philips, etc.

• Voice Morphing!

[Audio samples: Source, Target, Conv 1, Conv 2]

Courtesy: Hui Ye & Steve Young, Cambridge


Automatic Speech Recognition
• Most Important Task
• Hardest Task
– Co-articulation: the pronunciation of a sound changes with its neighbouring sounds
– Overlapping speech: two speakers speaking at the same time
– Speaker Variation
– Spontaneity
– Language Modeling
– Noise Robustness
ASR: Problems

© James Glass, MIT


ASR: Method

© James Glass, MIT


ASR: Application

© James Glass, MIT


Automatic Speech Recognition

© James Glass, MIT



Speech Production
