Pitch Estimation Explanation

An Overview of Pitch Detection Algorithms
1. Introduction
2. Time Domain Algorithms
1. Introduction
Prior Definitions
– Pitch: the fundamental frequency (f0) of the complex speech tone. It lies
roughly in the range of 100-120 Hz for men, although values outside this
range can occur. The f0 for women is approximately one octave higher.
For children, f0 is around 300 Hz.
– Frequency: a physical attribute of a sound or any other type of signal.
It describes the number of times a repeating event occurs per unit of time.
– Fundamental frequency: in a complex sound or signal, the lowest partial.
– Pitch period = 1 / fundamental frequency
– If we can estimate the pitch period (also known as pitch delay or pitch
lag), we can detect the pitch (the fundamental frequency). Estimating the
pitch is known as pitch tracking.
https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility
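The relation between pitch period and fundamental frequency can be sketched in a few lines. The function names and the 8000 Hz sampling rate below are illustrative assumptions, not part of the original slides.

```python
def pitch_period_seconds(f0_hz):
    """Pitch period T0 = 1 / f0, in seconds."""
    return 1.0 / f0_hz


def lag_to_f0(lag_samples, sample_rate_hz):
    """Convert a pitch lag measured in samples to a fundamental frequency in Hz."""
    return sample_rate_hz / lag_samples


# A 100 Hz voice has a pitch period of 0.01 s.
period = pitch_period_seconds(100.0)

# At an assumed sampling rate of 8000 Hz, that period spans 80 samples,
# so a detected lag of 80 samples maps back to 100 Hz.
f0 = lag_to_f0(80, 8000)
```

In a sampled signal the lag is an integer number of samples, so the recoverable f0 values are quantized to sample_rate / lag.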
1. Introduction
Application of Pitch Tracking
– Automatic music transcription from audio signals to
common music notation or to MIDI note numbers
– Score following
– Musical queries by singing or humming
– Acoustic feature for human-computer interaction
– Sound-editing programs, e.g. pitch-shifting and
time-scaling operations
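As an illustration of the transcription use case, a detected f0 can be mapped to a MIDI note number. The sketch below assumes equal temperament with A4 = 440 Hz assigned to MIDI note 69, the usual convention; the function name is hypothetical.

```python
import math


def hz_to_midi(f0_hz, a4_hz=440.0):
    """Map a frequency in Hz to a (fractional) MIDI note number.

    Assumes equal temperament with A4 = a4_hz at MIDI note 69.
    """
    return 69.0 + 12.0 * math.log2(f0_hz / a4_hz)


# Round to the nearest semitone to get a note number for transcription.
a4_note = round(hz_to_midi(440.0))
octave_up = round(hz_to_midi(880.0))
```

The fractional part before rounding indicates how far the detected pitch is from the nearest tempered note, which can be useful when transcribing singing.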
1. Introduction
Non-Exclusive Classification
– Voice ( Speech, Singing )
– Instrumental
– Monophonic
– Polyphonic
– Time-Based Algorithm
– Spectral-Based Algorithm
– Alternative
2. Time Domain Algorithms
• Zero-Crossing Detection
• Autocorrelation Function
• Average Magnitude Difference Function
2.1 Zero-Crossing Detection
– Based on a direct application of the definition of
periodicity
– Counts the number of times the signal crosses a
reference level
– Computationally inexpensive
– Sensitive to noise
– Unreliable for signals with significant energy at high
frequencies
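The points above can be sketched as follows. This is a minimal illustration, assuming a zero reference level and that the signal crosses it twice per period; the function name and test signals are made up for the example.

```python
import math


def zero_crossing_f0(samples, sample_rate_hz):
    """Estimate f0 by counting sign changes between consecutive samples.

    Assumes two zero crossings per period, which holds for a clean sinusoid.
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2.0 * duration_s)


fs = 8000  # assumed sampling rate in Hz

# One second of a clean 100 Hz tone (small phase offset avoids samples at exactly 0).
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.5) for n in range(fs)]
clean_estimate = zero_crossing_f0(tone, fs)

# The same tone with a strong 1000 Hz component added: the extra crossings
# push the estimate far above the true 100 Hz fundamental.
rich = [s + 1.5 * math.sin(2 * math.pi * 1000 * n / fs) for n, s in enumerate(tone)]
rich_estimate = zero_crossing_f0(rich, fs)
```

The second signal shows the weakness listed above: energy at high frequencies adds crossings that have nothing to do with the fundamental period.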
Zero-Crossing Detection
2.2 Autocorrelation Technique
– Cross-correlation is an operation that measures the
similarity between two signals.
– The corresponding samples of one signal and a
time-shifted version of another are multiplied and
added together.
– The cross-correlation function will then have a peak at
the offset value that corresponds to the maximum
similarity.
Autocorrelation Technique
– Autocorrelation is the cross-correlation of a signal with
itself.
– The maximum similarity occurs at a time shift of
zero.
– In theory, another maximum should occur when the
time shift of the signal corresponds to the
fundamental period.
- Periodic signals are signals that repeat many times per second (see
the definition of Frequency in slide No. 2).
- Voice is a continuous signal. To process it, we have to digitize the
voice signal by sampling it. For example, if the maximum frequency of
a complex voice signal is 3400 Hz, the sampling frequency has to be
at least twice that, e.g. 8000 Hz, i.e. 8000 samples per second.
- N in the autocorrelation equation is the number of samples in the voice
segment used for pitch detection. N is smaller than the number of
samples per second because the segment duration is less than 1
second.
- In autocorrelation, the original signal is cross-correlated with a
time-shifted copy of itself. If the time shift (delay/lag, denoted as "k"
in slide 11) is equal or close to the pitch period, the autocorrelation
value should be the largest one apart from the peak at zero shift. That
means: by trying different time shifts, we calculate different
autocorrelation values and then select the shift with the highest value.
That shift is approximately the pitch period, which gives the
approximate pitch.
- Slides 11 and 12 illustrate the above explanation.

In this example, Xm represents a segment of the original audio signal.
Samples of Xm (within the grey box) are multiplied with a set of aligned samples of the same
audio signal segment (Sm) to produce a correlation value (at dotted line in Fig B2, C2).
The process is repeated for different delay values (k), producing a set of correlation values (Z).
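The procedure described above can be sketched in plain Python. The slide's own equation is not reproduced in this text, so the sketch assumes one common definition, r(k) = sum of x[n] * x[n + k] over the samples where both terms fall inside the frame. The function names, the 60-500 Hz search range, and the test signal are illustrative assumptions.

```python
import math


def autocorrelation(frame, lag):
    """r(k): sum of x[n] * x[n + k] over the overlapping part of the frame.

    Assumed definition; other variants normalize by the overlap length.
    """
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_f0_autocorr(frame, sample_rate_hz, f0_min=60.0, f0_max=500.0):
    """Try every lag in a plausible pitch range and keep the one with the largest r(k)."""
    min_lag = int(sample_rate_hz / f0_max)  # shortest pitch period considered
    max_lag = int(sample_rate_hz / f0_min)  # longest pitch period considered
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda k: autocorrelation(frame, k))
    return sample_rate_hz / best_lag


fs = 8000   # assumed sampling rate in Hz
N = 400     # 50 ms segment, so N is well below fs

# Synthetic "voiced" frame: a 100 Hz fundamental plus its second harmonic.
frame = [math.sin(2 * math.pi * 100 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(N)]

f0 = estimate_f0_autocorr(frame, fs)  # best lag is 80 samples for this frame
```

Lag zero is excluded by starting the search at min_lag, since the autocorrelation is always largest there. Restricting the lag range to plausible pitch periods also limits the cost, because each lag needs its own sum over the frame.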
– Not very efficient for high fundamental frequencies.
– Direct computation, a convolution-style sum of products
for every lag, is very expensive.
– Computational efficiency can be improved by using the FFT
algorithm instead. This reduces the cost from the order of
N squared to N log N operations.
– Most variations of this technique differ in the
mathematical definition of the autocorrelation used, the
way the maxima are located, and how errors in
maximum identification are attenuated.
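The FFT shortcut mentioned above can be sketched with NumPy: the autocorrelation is obtained as the inverse transform of the power spectrum. The function names are illustrative, and the zero-padding length is one reasonable choice rather than the only one.

```python
import numpy as np


def autocorr_direct(x):
    """Order N squared: one sum of products per lag."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(n)])


def autocorr_fft(x):
    """Order N log N: inverse FFT of the power spectrum."""
    n = len(x)
    # Zero-pad to at least 2n - 1 samples so the circular correlation
    # computed by the FFT does not wrap around onto the lags we keep.
    size = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, size)
    return np.fft.irfft(spectrum * np.conj(spectrum), size)[:n]


fs = 8000                         # assumed sampling rate in Hz
t = np.arange(400) / fs           # 50 ms frame
x = np.sin(2 * np.pi * 100 * t)   # 100 Hz test tone

same = np.allclose(autocorr_direct(x), autocorr_fft(x))
best_lag = 40 + int(np.argmax(autocorr_fft(x)[40:120]))  # search around the expected period
```

Both functions return the same values up to rounding error; only the amount of arithmetic differs, which matters as the frame length grows.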
2.3 Average Magnitude Difference Function
– It is an alternative to the autocorrelation function.
– It computes the average absolute difference between the
signal and a time-shifted version of itself.
– While autocorrelation has peaks at maximum
similarity, the average magnitude difference function
has valleys there.
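A minimal sketch of this idea, assuming the difference is averaged over the overlapping part of the frame and the pitch lag is taken as the deepest valley within a 60-500 Hz search range. The names and the range are assumptions made for illustration.

```python
import math


def amdf(frame, lag):
    """Average of |x[n] - x[n + lag]| over the overlapping part of the frame."""
    count = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count


def estimate_f0_amdf(frame, sample_rate_hz, f0_min=60.0, f0_max=500.0):
    """Pick the lag with the smallest AMDF value (a valley, not a peak)."""
    min_lag = int(sample_rate_hz / f0_max)  # shortest pitch period considered
    max_lag = int(sample_rate_hz / f0_min)  # longest pitch period considered
    best_lag = min(range(min_lag, max_lag + 1), key=lambda k: amdf(frame, k))
    return sample_rate_hz / best_lag


fs = 8000   # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 100 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(400)]

f0 = estimate_f0_amdf(frame, fs)  # deepest valley is at a lag of 80 samples here
```

The function needs only subtractions and absolute values, no multiplications, which is its main appeal compared with autocorrelation. The search range is kept below two pitch periods here because a perfectly periodic frame also has a valley at twice the period.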
Other Temporal Algorithms
– Waveform Maximum Detection
– Sum Magnitude Difference Squared Function
– Average Squared Difference Function
– Cumulative Mean Normalized Difference Function
– Circular Average Magnitude Difference Function
– Adaptive Filter
– Super Resolution Pitch Determination
