Speech Processing
Speech measurements:
  Short-time energy (STE)
  Zero crossing rate (ZCR)
  Autocorrelation (AC)
  Pitch period or frequency
  Formants
Speech signal components:
  Speech vs. silence (non-speech)
  Voiced speech vs. unvoiced speech
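The first measurements listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the frame is a list of samples, the function names are hypothetical, and the test signal (a 100 Hz sine sampled at 8 kHz) is invented for the example. The pitch period is estimated as the lag with the largest autocorrelation inside a search range.

```python
import math

def short_time_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def pitch_period(frame, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(frame, lag))

# Hypothetical test signal: 100 Hz sine at 8 kHz, so one period is 80 samples.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(400)]

ste = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
period = pitch_period(frame, min_lag=20, max_lag=160)
pitch_hz = SAMPLE_RATE / period
```

Voiced frames typically show high energy and a low zero crossing rate, while unvoiced frames show the opposite, which is why these two measurements are commonly paired for voiced/unvoiced decisions.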
Speech representations or models
Temporal features:
  Low energy rate
  Zero crossing rate (ZCR)
  4 Hz modulation energy
  Pitch contour
Spectral features:
  Spectral Centroid (sharpness)
  Spectral Flux (rate of change)
  Spectral Roll-Off (spectral shape)
  Spectral Flatness (deviation of the spectral form)
  Linear Predictive Coefficients (LPC)
  Cepstral coefficients
  Mel Frequency Cepstral Coefficients (MFCC): human auditory system
Harmonic features: sinusoidal harmonic modelling
Perceptual features: model of the human hearing process
First order derivative (DELTA)
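Four of the spectral features above have compact textbook definitions, sketched here in plain Python. The assumptions are: the input is a magnitude spectrum given as a list (with matching bin frequencies in Hz), the 85% roll-off fraction is a common but arbitrary choice, flatness is taken as the ratio of geometric to arithmetic mean of the power spectrum, and all function names are hypothetical.

```python
import math

def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

def spectral_rolloff(mags, freqs, fraction=0.85):
    """Lowest frequency below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= target:
            return f
    return freqs[-1]

def spectral_flatness(mags):
    """Geometric mean over arithmetic mean of the power spectrum.

    Requires strictly positive magnitudes; 1.0 means perfectly flat.
    """
    power = [m * m for m in mags]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic

def spectral_flux(prev_mags, mags):
    """Sum of squared bin-wise changes between two successive frames."""
    return sum((b - a) ** 2 for a, b in zip(prev_mags, mags))

# Hypothetical four-bin spectra used only to exercise the functions.
freqs = [0.0, 1000.0, 2000.0, 3000.0]
flat = [1.0, 1.0, 1.0, 1.0]
peaky = [0.1, 4.0, 0.1, 0.1]
```

A noise-like (flat) spectrum gives a flatness near 1, while a spectrum dominated by one peak gives a value near 0.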
Speaker Recognition
Speaker identification: the process of determining which registered speaker provides input speech sounds
Figure: speaker identification block diagram. Input speech -> Feature Extraction -> Similarity (several blocks in parallel) -> Maximum selection
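The identification scheme (feature extraction, one similarity score per registered speaker, then maximum selection) can be sketched as follows. The slides do not say which similarity measure is used, so cosine similarity between fixed-length feature vectors is an assumption here, and the function names and speaker models are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(features, speaker_models):
    """Maximum selection: return the registered speaker with the top score."""
    return max(
        speaker_models,
        key=lambda name: cosine_similarity(features, speaker_models[name]),
    )

# Hypothetical registered speakers, each represented by one reference vector.
speaker_models = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.0, 0.0, 1.0],
}
best = identify([0.9, 0.1, 0.2], speaker_models)
```

In a closed-set setting the top-scoring speaker is always returned; an open-set system would additionally need a rejection rule for unknown speakers.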
Speaker verification: the process of accepting or rejecting the identity claim of a speaker.
Figure: speaker verification block diagram. Input speech -> Feature Extraction -> Similarity -> Decision (using a Threshold) -> Verification result (Accept/Reject)
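Verification differs from identification in that only the claimed speaker's model is scored, and the score is compared with a threshold. A minimal sketch, assuming a distance-based similarity of 1 / (1 + Euclidean distance) and a hypothetical threshold of 0.5; neither choice comes from the slides.

```python
import math

def similarity(features, model):
    """Assumed score in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(features, model))

def verify(features, claimed_model, threshold):
    """Accept the identity claim only if the score reaches the threshold."""
    return similarity(features, claimed_model) >= threshold

# Hypothetical reference vector for the claimed speaker.
claimed_model = [1.0, 2.0]

accepted = verify([1.0, 2.0], claimed_model, threshold=0.5)   # distance 0
rejected = verify([4.0, 6.0], claimed_model, threshold=0.5)   # distance 5
```

Raising the threshold lowers false acceptances at the cost of more false rejections, so in practice it is tuned on held-out data.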
Open-set and closed-set recognition
Text-dependent and text-independent recognition
Vector quantization
Gaussian mixture models (GMM)
Dynamic time warping (DTW)
Hidden Markov model (HMM)
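Of the matching techniques listed above, dynamic time warping is the simplest to show in code. The sketch below is the classic dynamic-programming formulation on one-dimensional sequences with absolute difference as the local cost; real systems would compare sequences of feature vectors, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Minimum cumulative alignment cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# The same contour spoken more slowly aligns at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different = dtw_distance([0, 0], [1, 1])
```

Because the warping path may repeat samples of either sequence, DTW tolerates differences in speaking rate, which suits text-dependent recognition where the same phrase is compared across utterances.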
Figure: text-to-speech synthesis block diagram.
  Text Analysis: Document Structure Detection, Text Normalization, Linguistic Analysis
  Phonetic Analysis: Homograph Disambiguation, Grapheme-to-Phoneme Conversion
  Prosodic Analysis: Pitch, Duration
  Speech Synthesis: Voice Rendering
  Output: Synthetic Speech
Speech Synthesis
Text-to-speech (TTS) synthesis systems
  Approach
  TTS system performance measures:
    Synthetic speech intelligibility
    Synthetic speech naturalness
Speech intelligibility tests
  Segmental-level analysis:
    the Rhyme Test
    the Modified Rhyme Test
    the Diagnostic Rhyme Test
  Supra-segmental analysis:
    the Harvard Psychoacoustic Sentences (HPS)
    the Haskins syntactic sentences
Speech-Assisted Translation Corrector System
Objective: Develop a speech-assisted translation corrector (SATC) system that produces a grammatically correct sentence from a sentence translated by a machine translation (MT) system.
Figure: SATC block diagram. Input sentence -> Translator -> translated sentence with grammatical errors -> speech-assisted translation corrector system -> grammatically correct sentence. Other labels in the figure: text, speech, storage. Example sentence: "He came here".
An MT system is correct and complete if it can analyze all of the grammatical structures encountered in the source language and can generate all of the grammatical structures necessary in the target-language translation.
8/25/2011
Speaking variability: the same speaker may speak normally, shout, whisper, use a creaky voice, or have a cold.
Speaker variability: different speakers have different timbres and different speaking habits.