Table of Contents
01 Abstract
02 Introduction
03 Problem
04 Methodology
05 Analysis
06 Conclusion
07 References
ABSTRACT
Pitch detection is a pivotal component of speech signal processing, integral to applications such as
voice recognition, music analysis, and diverse audio processing tasks. This comprehensive report
delves into an array of pitch detection methodologies, encompassing both time and frequency
domain analyses, as well as auto-correlation and cepstrum techniques. The primary objective of
this study is to elevate the precision of fundamental frequency estimation, thereby fostering
advancements in pitch detection within speech signals.
By combining various analytical approaches, this research strives to refine the understanding of
pitch-related features and contribute to the ongoing evolution of pitch detection methods. The
significance of accurate pitch detection reverberates across domains, influencing fields like
telecommunications, entertainment, and human-computer interaction. As technological
applications relying on vocal interfaces continue to burgeon, the outcomes of this study hold the
potential to augment the efficacy of speech-based systems and enhance user experiences in
manifold domains.
INTRODUCTION
Everyday sounds reach our ears as a mixture of sources, each with its own pitch. In speech, pitch is
governed by the rate at which the vocal folds vibrate, known as the fundamental frequency, and it is
a crucial part of how we speak. As science and technology advance, many tasks become simpler, but
progress also uncovers new problems to solve. This report explores how pitch in sound is perceived
and measured, striking a balance between the straightforward and the intricate as it moves between
human perception and technological innovation.
METHODOLOGY:
1. Data Collection:
Source: https://www.kaggle.com/competitions/pitch/data
Our investigation into pitch detection techniques begins with a comprehensive dataset from
Kaggle's Pitch Competition. This dataset encompasses a diverse range of sounds, forming the basis
for our analysis.
2. Data Preprocessing:
Specific sound samples were meticulously chosen from the Kaggle dataset for the application of
pitch detection techniques. These samples were strategically selected to represent a variety of
acoustic scenarios and challenges.
3. Implementation of Pitch Detection Techniques:
Various techniques have been developed to address the challenges associated with pitch detection.
These methods fall into three distinct categories:
1. Time Domain Detection
2. Frequency Domain Detection
3. Statistical Detection (Machine Detectors based on models of Ears)
A time-domain pitch detector estimates the pitch period by identifying glottal closure instants
(GCIs) and measuring the time intervals between them. Frequency-domain pitch detection, by
contrast, processes the speech signal frame by frame and estimates the pitch from the spectrum of
each frame. In our project, we use both Time Domain Detection and Frequency Domain Detection
to determine the pitch of the input speech signals.
• Autocorrelation:
The autocorrelation function is a fundamental element of time-domain pitch detection algorithms.
The key idea is to generate a representation that exhibits significant peaks at lags corresponding
to the period of the waveform and its multiples, with the largest peak occurring first, at zero lag.
The technique works in the time domain by correlating a segment of a signal with a delayed copy of
itself, and it is particularly useful for low-frequency components. The essence of the method lies
in measuring the distance between the positions of the maximum and the second maximum of the
correlation, which gives the pitch period.
In the context of the autocorrelation function (ACF), correlation measures the similarity between
two input functions. For the autocorrelation function Γ(d), the two input functions are identical,
represented by the same signal x(n), as illustrated in the equation:
\[ \Gamma(d) = \sum_{n=0}^{N-1-d} x(n)\, x(n+d) \]
Here, 'd' represents the lag or delay between the signal and a delayed segment, and 'N' denotes the
number of samples in the input signal. When the signal is periodic or quasi-periodic, the
similarity between x(n) and x(n+d) is greatest, and the correlation coefficients take high
values, when the lag equals a period or a multiple thereof. The pitch is therefore estimated as
the frequency \( \frac{fs}{d} \), where d is the nonzero lag at which the ACF reaches its maximum
and \( fs \) is the sampling frequency of the speech signal. Because the autocorrelation function
is the inverse Fourier Transform of the power spectrum of the input signal, it carries no phase
information. Notably, this technique is independent of unknown phase
relations and formant structures, avoiding complications associated with these parameters.
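The lag search described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the implementation used in this project: the function name `autocorr_pitch`, the 60 to 400 Hz search range, and the synthetic test frame are all assumptions made for the example.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its ACF.

    fmin and fmax bound the search range; they are illustrative defaults.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove any DC offset
    n = len(x)
    # Gamma(d) for d = 0 .. N-1; index n-1 of the full correlation is zero lag
    acf = np.correlate(x, x, mode="full")[n - 1:]
    d_min = int(fs / fmax)                # shortest period considered
    d_max = min(int(fs / fmin), n - 1)    # longest period considered
    # Skip the zero-lag peak and take the largest value in the search range
    d = d_min + int(np.argmax(acf[d_min:d_max + 1]))
    return fs / d

# Synthetic 40 ms frame: 200 Hz fundamental plus one harmonic (assumed test data)
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0_acf = autocorr_pitch(frame, fs)
print(f"Estimated pitch: {f0_acf:.1f} Hz")
```

For this frame the period is 40 samples at fs = 8000 Hz, so the estimate should land at or very near 200 Hz. Restricting the search to a plausible lag range is one simple way to avoid picking the zero-lag peak.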
Within frequency-based methods, the signal frame undergoes transformation into the frequency
domain, often facilitated by the Fourier transform. A notable technique employed in frequency
domain pitch detection is the Cepstrum Method.
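As a rough illustration of moving a frame into the frequency domain, the sketch below windows one frame and takes its magnitude spectrum with NumPy's FFT. The helper name `magnitude_spectrum`, the Hann window, and the synthetic frame are assumptions for the example and are not taken from the project.

```python
import numpy as np

def magnitude_spectrum(frame, fs):
    """Return (frequencies in Hz, magnitude spectrum) for one real-valued frame."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))            # taper the frame to reduce leakage
    spectrum = np.fft.rfft(frame * window)     # FFT of a real signal
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs, np.abs(spectrum)

# Synthetic 50 ms frame: 200 Hz fundamental plus a weaker harmonic (assumed data)
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
freqs, mag = magnitude_spectrum(frame, fs)
peak_hz = freqs[np.argmax(mag)]
print(f"Strongest spectral component: {peak_hz:.1f} Hz")
```

Picking the strongest spectral bin works for this clean example, but in real speech the strongest component is often a harmonic rather than the fundamental, which is one motivation for methods such as the cepstrum.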
• Cepstrum Method:
A cepstrum is the result of taking the Inverse Fourier Transform (IFT) of the logarithm of the
estimated spectrum of a signal. This technique finds application in the analysis of human
speech. The term 'cepstrum' is derived by reversing the first four letters of 'spectrum'. Operations
on cepstra are named in the same spirit: quefrency analysis, liftering, and cepstral analysis. The
power cepstrum is useful for exploring the periodicity of harmonics in the frequency
representation. The harmonics of a voiced signal are evenly spaced in the log spectrum, so
transforming the log spectrum again (with the FFT or its inverse) reveals a peak at the quefrency
corresponding to the fundamental period. This process can be interpreted as a deconvolution when
the input signal is produced by a train of impulses convolved with a filter. Convolution in time
becomes multiplication in the frequency domain, the logarithm turns that multiplication into
addition, and the second transform separates the slowly varying filter contribution from the
periodic excitation, ultimately revealing the fundamental frequency.
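The cepstral procedure described above can be sketched as follows. This is a minimal example under stated assumptions rather than the project's code: the function name `cepstrum_pitch`, the Hamming window, the 60 to 400 Hz search range, and the synthetic impulse-train signal are all chosen for illustration.

```python
import numpy as np

def cepstrum_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its real cepstrum."""
    x = np.asarray(x, dtype=float)
    windowed = x * np.hamming(len(x))
    spectrum = np.fft.fft(windowed)
    # Small constant avoids log(0); the log turns products into sums
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real
    q_min = int(fs / fmax)                      # shortest period, in samples
    q_max = min(int(fs / fmin), len(x) // 2)    # longest period, in samples
    # The peak quefrency in the search range is taken as the pitch period
    q = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / q

# Assumed test signal: impulse train (period 80 samples, i.e. 100 Hz at 8 kHz)
# convolved with a decaying exponential standing in for the vocal-tract filter
fs = 8000
period = 80
excitation = np.zeros(1024)
excitation[::period] = 1.0
impulse_response = 0.9 ** np.arange(64)
frame = np.convolve(excitation, impulse_response)[:1024]
f0_ceps = cepstrum_pitch(frame, fs)
print(f"Estimated pitch: {f0_ceps:.1f} Hz")
```

The low-quefrency part of the cepstrum, which mostly reflects the filter, is excluded by starting the search at `q_min`. For this synthetic signal the estimate should fall close to 100 Hz.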
ANALYSIS
Exploring effective pitch detection methods is essential in the realm of sound analysis. Two
primary techniques, Autocorrelation and Cepstrum, stand out for their unique characteristics. This
empirical evaluation aims to provide insights into their complexities and efficiencies.
1. Autocorrelation:
Advantages:
• Simplicity and Efficiency
• Conceptual Ease in Mathematical Modeling
Disadvantages:
• Challenge in Peak Level Selection
• Moderate Error in Pitch Calculation
2. Cepstrum Analysis:
Advantages:
Disadvantages:
Empirical Comparison
Time Complexity:
Autocorrelation (AUTO): O(n) per lag, O(n log n) over all lags when computed via the FFT
Cepstrum (CEPS): O(n log n)
Latency Ranking:
Cepstrum (CEPS) >> Autocorrelation (AUTO)
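One way such a latency comparison could be measured is sketched below. The helper names (`acf_direct`, `real_cepstrum`, `mean_latency`), the frame length of 1024 samples, and the random test frame are assumptions for illustration. The numbers it prints depend on the machine and are not the results reported in this study.

```python
import timeit
import numpy as np

def acf_direct(x):
    """Autocorrelation for all non-negative lags, computed directly."""
    n = len(x)
    return np.correlate(x, x, mode="full")[n - 1:]

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    return np.fft.ifft(log_mag).real

def mean_latency(func, x, repeats=20):
    """Average wall-clock time in seconds of func(x) over several runs."""
    return timeit.timeit(lambda: func(x), number=repeats) / repeats

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)   # stand-in for one speech frame
latencies = {
    "AUTO": mean_latency(acf_direct, frame),
    "CEPS": mean_latency(real_cepstrum, frame),
}
for name, seconds in latencies.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds per frame")
```

Only the core transforms are timed here. A fuller comparison would also include windowing, peak picking, and any pre-processing applied to the recordings.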
Trade-offs:
Autocorrelation: Simplicity and Efficiency
Cepstrum: Computational Intensity with Enhanced Pitch Analysis
CONCLUSION
In summary, our exploration of pitch detection techniques, focusing on Autocorrelation and
Cepstrum analysis, revealed distinct trade-offs. Autocorrelation excelled in simplicity and real-
time applications, while Cepstrum offered enhanced accuracy despite higher computational
demands. The empirical evaluation showcased each method's strengths and weaknesses, guiding
their applicability based on specific requirements. Looking ahead, potential hybrid approaches and
the integration of machine learning could further refine pitch detection in diverse acoustic
scenarios. This project lays a foundational understanding, contributing to ongoing advancements
in sound analysis methodologies.
REFERENCES
https://en.wikipedia.org/wiki/Pitch_detection
https://www.kaggle.com/competitions/pitch/data
https://www.ijsr.net/archive/v4i3/SUB151957.pdf
https://mural.maynoothuniversity.ie/14192/1/JT_an%20investigation.pdf
https://ccrma.stanford.edu/~pdelac/154/m154paper.htm
https://www.section.io/engineering-education/machine-learning-for-audio-classification/
https://ieeexplore.ieee.org/document/9277448