
Speech recognition through the analysis of spoken syllables using autocorrelation function parameters

ALAN RUBELLIN, ANDRES SABATER


aurubellin@gmail.com

Universidad Nacional de Tres de Febrero


www.untref.edu.ar

Buenos Aires – September 5-9, 2016


Acoustics for the 21st Century… 1/17
INTRODUCTION

• Why is the ACF applied in speech recognition?

– Features of the ACF parameters correspond to perceptual qualities related to pitch, timbre, duration and loudness.

– Timbral distinctions:
   Vowels: changes in the spectra of stationary sounds.
   Consonants: transient fluxes in amplitude, frequency and phase.

Rubellin A., Sabater A. – Spoken syllables and rACF parameters 2/ 17


AUTOCORRELATION FUNCTION

Monaural ACF parameters:

• Φ(0): Autocorrelation amplitude at zero delay (τ = 0)
• Wφ(0): Width of the ACF peak around zero delay. Related to timbre.
• φ1: Magnitude of the first maximum. “Pitch strength”.
• τe: Effective duration of the autocorrelation
• τ1: Delay of the first maximum. “Pitch” frequency (1/τ1).

Ando, Y.; Cariani, P.; Auditory and Visual Sensations; Springer, 2009.
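
The parameters listed above can be illustrated with a minimal Python sketch for a single signal frame. It is not the Matlab code used in this study. The function name, the 0.5 threshold used for Wφ(0), the choice of the largest peak after the first zero crossing for τ1 and φ1, and the straight-line fit to the peaks above -5 dB (extrapolated to -10 dB) for τe are all simplifying assumptions made for illustration.

```python
import numpy as np

def acf_parameters(frame, fs, max_lag_s=0.02):
    """Rough estimates of Phi(0), W_phi(0), tau_1, phi_1 and tau_e for one frame.

    Hypothetical helper: thresholds and peak picking are illustrative assumptions.
    """
    x = np.asarray(frame, dtype=float)
    n_lags = int(max_lag_s * fs)
    # Biased autocorrelation for lags 0 .. n_lags-1
    full = np.correlate(x, x, mode="full")
    acf = full[len(x) - 1 : len(x) - 1 + n_lags] / len(x)
    phi0 = acf[0]                      # Phi(0): energy at zero delay (linear units)
    phi = acf / phi0                   # normalised ACF, phi(0) = 1
    lags = np.arange(n_lags) / fs      # delay axis in seconds

    # W_phi(0): twice the first delay at which phi falls below 0.5 (assumed definition)
    below = np.nonzero(phi < 0.5)[0]
    w_phi0 = 2.0 * lags[below[0]] if below.size else np.nan

    # tau_1, phi_1: largest maximum after the first zero crossing (simplification)
    negative = np.nonzero(phi < 0.0)[0]
    if negative.size:
        start = negative[0]
        k = start + int(np.argmax(phi[start:]))
        tau1, phi1 = lags[k], phi[k]
    else:
        tau1, phi1 = np.nan, np.nan

    # tau_e: fit a line to the local maxima of 10*log10|phi| that lie above -5 dB
    # and extrapolate it to -10 dB (simplified ten-percentile delay)
    db = 10.0 * np.log10(np.maximum(np.abs(phi), 1e-12))
    inner = np.arange(1, n_lags - 1)
    is_peak = (db[inner] > db[inner - 1]) & (db[inner] >= db[inner + 1])
    peaks = np.concatenate(([0], inner[is_peak]))
    peaks = peaks[db[peaks] > -5.0]
    if peaks.size >= 2:
        slope, intercept = np.polyfit(lags[peaks], db[peaks], 1)
        tau_e = (-10.0 - intercept) / slope if slope < 0 else np.inf
    else:
        tau_e = np.nan

    return {"phi0": phi0, "w_phi0": w_phi0, "tau1": tau1, "phi1": phi1, "tau_e": tau_e}

# Usage: a 200 Hz tone has a 5 ms period, so tau_1 should come out near 5 ms
fs = 8000
t = np.arange(800) / fs
params = acf_parameters(np.sin(2 * np.pi * 200 * t), fs)
print(params["tau1"], 1.0 / params["tau1"])
```

Φ(0) is returned here in linear units; expressing it in dB requires a reference level, which the slides do not state.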



RUNNING ACF

Bidondo, A. et al.; Speaker recognition analysis using running autocorrelation function parameters; Proceedings of Meetings on Acoustics, ICA 2013.



PURPOSE OF THE STUDY

• Changes in the frequency spectrum and waveform can be correlated with the temporal behaviour of the rACF parameters.

• Identify consonants by analysing different CV syllables formed with the same vowel.

• Establish a classification of consonants according to the temporal pattern of the rACF parameters.



VOICE RECORDING

• 20 male and 20 female speakers.

• Pronunciation in Spanish of the syllables “a”, “la”, “ma”, “na”, “ta” and “sa”.

• 16 bits, 44100 Hz.

• Normalized

• A-Weighted Filter

• Low background noise and short reverberation time
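
The pre-processing steps listed above (A-weighting and normalization) could be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not say how the filter was implemented or in which order the steps were applied, so the function names, the zero-phase frequency-domain filtering and the peak normalization are all hypothetical choices. The gain curve is the standard analytic A-weighting magnitude response.

```python
import numpy as np

def a_weighting_db(f_hz):
    """A-weighting gain in dB from the standard analytic magnitude response."""
    f2 = np.asarray(f_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    with np.errstate(divide="ignore"):   # 0 Hz maps to -inf dB, i.e. zero gain
        return 20.0 * np.log10(num / den) + 2.0

def preprocess(x, fs):
    """Apply A-weighting in the frequency domain, then peak-normalise (assumed order)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    gain = 10.0 ** (a_weighting_db(freqs) / 20.0)
    weighted = np.fft.irfft(spectrum * gain, n=len(x))
    peak = np.max(np.abs(weighted))
    return weighted / peak if peak > 0 else weighted

# Usage: the gain is close to 0 dB at 1 kHz and strongly negative at low frequencies
print(a_weighting_db([100.0, 1000.0, 4000.0]))
```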



DATA PROCESSING

• rACF parameters were calculated using Matlab software according to the Sato and Wu method:

–Integration interval: 0.1 s
–Maximum time lag: 0.2 s
–Running step: 0.01 s

• Spectrograms of the audio files were plotted with a DAW.

• Statistical parameters were calculated using Microsoft Excel: median, variance and standard deviation.
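
The framing implied by these settings can be sketched in Python as shown below. This is a hedged illustration, not the Sato and Wu Matlab implementation: the function name `running_acf` and the direct (non-FFT) correlation are assumptions. Each frame correlates a 0.1 s segment against the signal that follows it for delays up to 0.2 s, and the frame start advances by 0.01 s.

```python
import numpy as np

def running_acf(x, fs, integration_s=0.1, max_lag_s=0.2, step_s=0.01):
    """Running ACF: one row of lag values per analysis frame (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n_int = int(round(integration_s * fs))   # samples in the integration interval
    n_lag = int(round(max_lag_s * fs))       # number of lags evaluated
    hop = int(round(step_s * fs))            # running step in samples
    n_frames = (len(x) - n_int - n_lag) // hop + 1
    if n_frames < 1:
        raise ValueError("signal is shorter than integration interval plus maximum lag")
    times = np.arange(n_frames) * hop / fs   # start time of each frame in seconds
    acf = np.empty((n_frames, n_lag))
    for i in range(n_frames):
        start = i * hop
        seg = x[start : start + n_int]
        ext = x[start : start + n_int + n_lag]
        # np.correlate(ext, seg, "valid")[k] equals sum_n seg[n] * ext[n + k]
        acf[i] = np.correlate(ext, seg, mode="valid")[:n_lag] / n_int
    return times, acf

# Usage with a synthetic 200 Hz tone sampled at 8 kHz
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
times, acf = running_acf(np.sin(2 * np.pi * 200 * t), fs)
print(acf.shape, times[:3])
```

Dividing each row by its first element gives the normalised ACF from which τ1, φ1, Wφ(0) and τe are read frame by frame. At 44100 Hz the direct correlation is slow, so an FFT-based correlation would be a reasonable substitute.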

• Sato, S.; Wu, S.; Definition of the effective duration (τe) of the running autocorrelation function of music signals; Acta Acustica united with Acustica; vol. 97, 2010, pp. 432-440.



MATLAB SOFTWARE

Matlab software used to calculate the rACF parameters (Sato, S.; Wu, S., 2010).



RESULTS

Median values of rACF parameters

Syllable   Φ(0) [dB]   τ1 [ms]   φ1      Wφ(0) [ms]
“a”        9.567       5.501     0.716   0.371
“la”       8.857       6.504     0.755   0.477
“ma”       2.684       6.878     0.737   0.656
“na”       2.272       6.646     0.731   0.485
“sa”       7.086       4.518     0.729   0.179
“ta”       15.105      5.562     0.768   0.404

Median values showed no significant differences between the “a” vowel and the CV syllables, with the exception of Φ(0).
Lowest loudness: “na” (Φ(0) = 2.272 dB). Highest loudness: “ta” (Φ(0) = 15.105 dB).
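
Since the pitch frequency is given by 1/τ1, the median τ1 values in the table can be converted to frequencies with a short Python sketch. The variable and function names are illustrative. The medians pool male and female speakers, so the resulting frequencies are only indicative and do not describe any individual voice.

```python
# Median tau_1 values in milliseconds, copied from the table
median_tau1_ms = {
    "a": 5.501, "la": 6.504, "ma": 6.878,
    "na": 6.646, "sa": 4.518, "ta": 5.562,
}

def pitch_hz(tau1_ms):
    """Pitch frequency in Hz for a delay tau_1 given in milliseconds (f = 1 / tau_1)."""
    return 1000.0 / tau1_ms

pitch = {syllable: pitch_hz(tau) for syllable, tau in median_tau1_ms.items()}
for syllable, f in pitch.items():
    print(f"{syllable}: {f:.1f} Hz")
```

The shortest median delay (“sa”) maps to the highest frequency and the longest (“ma”) to the lowest.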



RESULTS

• Sound energy is similarly distributed within the same bandwidth across the “a”, “la”, “ma” and “na” syllables.

[Figure panels: “a” and “la”]



RESULTS

• Sound energy is similarly distributed within the same bandwidth across the “a”, “la”, “ma” and “na” syllables.

[Figure panels: “ma” and “na”]



RESULTS

• τ1 shows a similar stationary pattern over time (related to the pitch).

• Formant frequencies and tonal components of these CV syllables can also be identified with a stationary behaviour of τ1.



RESULTS

• Pronunciation of the “s” produces sound energy over a wide frequency bandwidth with no significant tonal components.

Initial low values of τ1 are related to the pronunciation of the letter “s”, which involves a wide frequency bandwidth with no significant tonal components.



RESULTS

• The syllable “ta” shows an opposite temporal pattern of τ1, which may be caused by the pronunciation of the letter “t”.



CONCLUSIONS

• It was possible to identify characteristic temporal behaviours of τ1 for different CV syllables that include the same single vowel, regardless of the speaker.

• It was also possible to relate the temporal patterns of τ1 to the spectral characteristics of the consonant involved in the CV syllable.

• This can lead to a primary classification of consonants according to the temporal response of τ1, related to the pitch percept in the pronunciation of the syllable.



CONCLUSIONS

• Possible classification of consonants according to the different loudness characteristics generated during the pronunciation of syllables (parameters Φ(0) and τe).

• Future work may include further investigation of the rACF parameters for the remaining single vowels, CV syllables with different consonants, and syllables that feature more than one vowel.



Thanks for your attention
Questions or suggestions?
arubellin@gmail.com
