Detection Algorithms
1. Introduction
2. Time Domain Algorithms
1. Introduction
Prior Definitions
– Pitch: the fundamental frequency of the complex speech tone, also
known as f0. It lies in the range of 100-120 Hz for men, but
variations outside this range can occur. The f0 for women is
approximately one octave higher. For children, f0 is around 300 Hz.
– Frequency: a physical attribute of a sound or any other type of signal.
It describes the number of times a repeating event occurs per unit of time.
– Fundamental Frequency: in a complex sound or signal, the
lowest-frequency partial.
– Pitch period = 1 / fundamental frequency
– If we can estimate the pitch period (also known as pitch delay or pitch
lag), we can determine the pitch (the fundamental frequency). Estimating
the pitch is known as pitch tracking.
https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility
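The relation between pitch period and fundamental frequency can be sketched in a few lines. This is only an illustration: the function names and the idea of measuring the period in samples at a given sample rate are assumptions, not part of the slides.

```python
def period_to_f0(period_samples: float, sample_rate: float) -> float:
    """Convert a pitch period given in samples to f0 in Hz.

    f0 = 1 / period_in_seconds = sample_rate / period_in_samples.
    """
    if period_samples <= 0:
        raise ValueError("pitch period must be positive")
    return sample_rate / period_samples


def f0_to_period(f0_hz: float, sample_rate: float) -> float:
    """Convert f0 in Hz to a pitch period in samples."""
    if f0_hz <= 0:
        raise ValueError("f0 must be positive")
    return sample_rate / f0_hz
```

For example, at an assumed sample rate of 8000 Hz, a pitch period of 80 samples corresponds to an f0 of 100 Hz, the lower end of the range quoted above for men.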
Applications of Pitch Tracking
– Automatic music transcription, from audio signals to
common music notation or to MIDI note numbers
– Score following
– Musical queries by singing or humming
– Acoustic feature for human-computer interaction
– Sound-editing programs, for operations such as pitch shifting and
time scaling
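For the transcription use case, an estimated f0 can be mapped to a MIDI note number with the standard equal-temperament formula. The sketch below assumes the common A4 = 440 Hz tuning (MIDI note 69). The function names are illustrative.

```python
import math

A4_HZ = 440.0   # assumed reference tuning for A4
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz: float, a4_hz: float = A4_HZ) -> float:
    """Return the (fractional) MIDI note number for a frequency in Hz."""
    if f0_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave; one octave is a doubling of frequency.
    return A4_MIDI + 12.0 * math.log2(f0_hz / a4_hz)


def nearest_midi_note(f0_hz: float) -> int:
    """Round to the nearest integer MIDI note."""
    return int(round(hz_to_midi(f0_hz)))
```

The fractional part of `hz_to_midi` indicates how far the pitch is from the nearest tempered note, which a transcription system might use to detect detuning or vibrato.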
Non-Exclusive Classification
– Voice ( Speech, Singing )
– Instrumental
– Monophonic
– Polyphonic
– Time-Based Algorithm
– Spectral-Based Algorithm
– Alternative
2. Time Domain Algorithms
• Zero-Crossing Detection
• Autocorrelation Function
• Average Magnitude Difference Function
2.1 Zero-Crossing Detection
– Based on a direct application of the definition of
periodicity
– Counts the number of times that the signal crosses a
reference level
– Computationally inexpensive
– Sensitive to noise
– Performs poorly on signals with significant
high-frequency energy
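A minimal sketch of zero-crossing pitch estimation follows. It assumes a clean, roughly sinusoidal signal and counts only upward crossings of the reference level, so that each period contributes one crossing. The function names, the `reference` parameter, and the `make_tone` helper are illustrative.

```python
import math


def zero_crossing_f0(samples, sample_rate, reference=0.0):
    """Estimate f0 in Hz from upward crossings of the reference level.

    Returns None when fewer than two crossings are found.
    """
    # Indices where the signal goes from below the reference to at/above it.
    rising = [
        i for i in range(1, len(samples))
        if samples[i - 1] < reference <= samples[i]
    ]
    if len(rising) < 2:
        return None
    # Number of full periods between the first and last crossing,
    # divided by the time they span.
    span_seconds = (rising[-1] - rising[0]) / sample_rate
    return (len(rising) - 1) / span_seconds


def make_tone(f0_hz, sample_rate, n_samples, phase=0.3):
    """Hypothetical test helper: a pure sine tone."""
    step = 2.0 * math.pi * f0_hz / sample_rate
    return [math.sin(step * n + phase) for n in range(n_samples)]
```

With a pure tone from `make_tone(100.0, 8000, 4000)`, the estimate lands on about 100 Hz. Adding noise or a strong high-frequency component can introduce extra crossings within one period, which is the weakness listed above.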
2.2 Autocorrelation Technique
– Cross-correlation is an operation that measures the
similarity between two signals.
– Corresponding samples of one signal and a time-shifted
version of the other are multiplied, and the products are
added together. In autocorrelation, the signal is compared
with a time-shifted version of itself.
Samples of Xm (within the grey box) are multiplied with a set of aligned samples of the same
audio signal segment (Sm) to produce a correlation value (at dotted line in Fig B2, C2).
The process is repeated for different delay values (k), producing a set of correlation values (Z).
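The multiply-and-add procedure described above can be sketched as a direct time-domain autocorrelation. Here `segment` plays the role of the audio segment, `k` is the delay, and the returned list corresponds to the set of correlation values Z. The search range `f0_min` to `f0_max` and the function names are assumptions made for illustration.

```python
import math


def autocorrelation(segment, max_lag):
    """Return correlation values z[k] for delays k = 0..max_lag."""
    n = len(segment)
    return [
        # Multiply each sample with the sample k positions later, then sum.
        sum(segment[i] * segment[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def autocorrelation_f0(segment, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate f0 in Hz as sample_rate / (lag of the strongest peak).

    Only lags corresponding to f0_min..f0_max are searched, which skips
    the trivial maximum at lag 0. Returns None if no usable lag exists.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(segment) - 1)
    if min_lag < 1 or min_lag > max_lag:
        return None
    z = autocorrelation(segment, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: z[k])
    if z[best_lag] <= 0:
        return None
    return sample_rate / best_lag


def make_sine(f0_hz, sample_rate, n_samples):
    """Hypothetical test helper: a pure sine tone."""
    step = 2.0 * math.pi * f0_hz / sample_rate
    return [math.sin(step * n) for n in range(n_samples)]
```

For a 100 Hz tone sampled at 8000 Hz, the strongest peak in the searched range falls at a lag of 80 samples, giving 100 Hz. The nested sum makes the cost grow with segment length times the number of lags.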
– Not very efficient for high fundamental frequencies.
– Convolution is a very expensive process.
– Computational efficiency can be improved by using the FFT
algorithm instead of convolution. It reduces calculation