
Proceedings of 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN 2011)

Vocal Pitch Detection For Musical Transcription


Mrs. V. Bharathi, Asst. Professor, Dept. of ECE, Sri Manakula Vinayagar Engineering College, bharathime@rediffmail.com
Asaph Abraham. A, Student, Sri Manakula Vinayagar Engineering College, yloz.ash@gmail.com
Mrs. R. Ramya, Asst. Professor, Dept. of ECE, Sri Manakula Vinayagar Engineering College, mithulramya@rediffmail.com

Abstract—This paper considers an improved method for detecting vocal pitch for musical transcription; the detected pitch is used to extract musical data from the given audio input. Harmonics present in the voice signal are eliminated by filtering, and the resulting signal is divided into a number of segments according to the timing sequence supplied by a metronome input. The pitch of each segment is calculated and compared with a stored frequency table to assign the corresponding musical note. The assigned notes are used for automatic annotation of singing voice in monophonic audio.

Keywords— Pitch Detection, Vocal signals, musical transcription, Linear predictive coding

I. INTRODUCTION

New hardware and software enable new forms of interaction with sound. Both composers and listeners can experiment with new relations to sound objects and music. The use of symbolic notations in music composition and production environments has been growing over the past decades. Meanwhile, several research areas of the music community are driven towards the extraction of semantic meaning from a musical stream. However, little has been done to link the extraction of this semantic information to its applications in composition and recording.

There are four musically relevant audio descriptors: onset times, pitch, beats and notes. This paper deals with the extraction of the pitch of vocal signals and its use in musical transcription.

Pitch estimation is one of the most heavily investigated and developed areas of signal processing, and it has remained a research problem throughout the evolution of digital signal processing. Pitch determination has numerous applications in speech processing. Accurate pitch extraction has been demonstrated to play a very important role in speech coding, speech compression, speech synthesis, speech recognition and speaker identification, as well as in music. A good estimate of the pitch period is crucial to improving the performance of speech analysis and synthesis systems. Until now, these techniques have been used for the transcription of other musical signals. Detecting the pitch of a human singing voice is an area of active research.

II. PITCH DETECTION ALGORITHMS

Pitch detection methods are essential for the analysis of harmony in music signals. They are used in different systems, such as music transcription and score following, music recognition and classification, melody modification, time-stretching and other audio effects. A large number of methods have been proposed for estimating the fundamental frequency of speech signals, and these are now used in various applications, from speaker recognition to sound transformations.

Algorithms for pitch estimation are generally classified into two main categories: methods that estimate periodicities in the waveform of the signal and methods that look for harmonic patterns in the spectrum. These can be described as working in the time domain and the frequency domain, respectively. Some of the many available pitch detection algorithms include the short-time average magnitude difference function (AMDF), the short-term autocorrelation function (ACF), direct time domain fundamental frequency estimation (DFE), weighted autocorrelation (WAC), and zero-crossing rate with autocorrelation. Although many pitch detection algorithms have been developed, few of them have been built in special-purpose digital hardware able to work in noisy environments and in real time.

In general, the short-time AMDF and short-term ACF are often employed in pitch detection. AMDF based methods exhibit lower computational complexity, while ACF based methods perform better on noisy speech. AMDF based methods are thus widely used in real-time systems because of their time efficiency. With AMDF based methods, however, two types of estimation error often occur. One is that the estimated pitch period is a multiple of the actual period, while the other is that the actual period is a multiple of the estimated one. We refer to the first as double pitch error and to the second as half pitch error. These errors arise mainly from the complexity of speech waveforms and from the falling trend of the AMDF peaks at higher lags. For noisy speech signals, this tendency increases the occurrence of octave errors to a greater degree.

III. LINEAR PREDICTIVE CODING

Linear predictive coding (LPC) is a tool for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most useful methods for encoding good quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.
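The two time-domain functions compared in Section II can be illustrated with a short sketch. The Python code below is not the authors' implementation. It assumes a synthetic 200 Hz test tone with a second harmonic, sampled at 8 kHz, a 400-sample frame and a 120 to 400 Hz search range; all of these are illustrative choices, and the function names are hypothetical. It picks the lag at which the AMDF is smallest and the lag at which the ACF is largest, then converts each lag to a frequency.

```python
import math


def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


def acf(frame, max_lag):
    """Short-term (unnormalised) autocorrelation for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def pitch_from_amdf(frame, fs, f_min, f_max):
    """Pitch estimate from the deepest AMDF valley inside the search range."""
    lo, hi = int(fs / f_max), int(fs / f_min)
    d = amdf(frame, hi)
    lag = min(range(lo, hi + 1), key=lambda k: d[k])
    return fs / lag


def pitch_from_acf(frame, fs, f_min, f_max):
    """Pitch estimate from the highest ACF peak inside the search range."""
    lo, hi = int(fs / f_max), int(fs / f_min)
    r = acf(frame, hi)
    lag = max(range(lo, hi + 1), key=lambda k: r[k])
    return fs / lag


# Assumed test signal: 200 Hz fundamental plus a weaker second harmonic.
fs = 8000
f0 = 200.0
frame = [
    math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
    for n in range(400)
]

print("AMDF estimate:", pitch_from_amdf(frame, fs, 120.0, 400.0))
print("ACF estimate: ", pitch_from_acf(frame, fs, 120.0, 400.0))
```

With this exactly periodic tone the period is 40 samples, so both estimates should come out at 200 Hz. If the search range were widened to include a lag of 80 samples, the AMDF would have a second near-zero dip there, which is one way the double pitch error described in Section II can arise.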

978-1-61284-653-8/11/$26.00 ©2011 IEEE 724



LPC analyses the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal that remains after subtracting the filtered modelled signal is called the residue. This property makes the LPC algorithm a valuable tool for detecting the pitch of a voiced vocal signal. Speech signals vary with time, so this process is carried out on short chunks of the speech signal, called frames; generally 30 to 50 frames per second give intelligible speech with good compression. The LPC algorithm approximates each sample of the vocal signal as a linear combination of past samples.

Thus, according to the LPC algorithm, a voiced vocal signal can be represented by equation 1 as

$$X[n] = \sum_{k=1}^{p} a_k X[n-k] + e[n] \qquad (1)$$

where,
X[n]: present speech sample
X[n-k]: previous speech samples
p: order of the model
a_k: prediction coefficient
e[n]: prediction error

For the detection of the pitch of vocal signals, the prediction model used is the autoregressive (AR) model, a type of random process used to model and predict the output of a system based on its previous outputs.

The autoregressive model can be expressed as in equation 2:

$$X[n] = c + \sum_{k=1}^{p} \varphi_k X[n-k] + \varepsilon[n] \qquad (2)$$

where,
\varphi_1, ..., \varphi_p are the parameters of the model
c is a constant (often omitted for simplicity)
\varepsilon[n] is white noise
p is the order of the model

Thus a first order AR model can be written as

$$X[n] = c + \varphi X[n-1] + \varepsilon[n] \qquad (3)$$

where \varepsilon[n] is a white noise process with zero mean and variance \sigma^2.

To increase the accuracy of the LPC prediction, the order of the model is usually high. The parameters of the model are determined using the Yule-Walker equations, which give a direct correspondence between these parameters and the covariance function of the process; this correspondence can be inverted to determine the parameters from the autocorrelation function.

IV. PITCH DETECTION USING LPC

Before the LPC algorithm is applied to the voice signal whose pitch is to be extracted, a window function is applied, because the voice signal is a time varying random signal: its pitch does not remain constant throughout the length of the song or speech. The window function therefore segments the entire input audio into a number of smaller chunks of sample lengths ranging from 30 to 50.

Fig. 1 explains the pitch detection process using the LPC method for vocal signals.

FIGURE 1: BLOCK DIAGRAM OF PITCH DETECTOR USING LPC

The window functions generally used in signal processing include the rectangular, Hann, Hamming, Tukey, cosine, Lanczos, triangular, Gaussian, Bartlett–Hann, Blackman and Kaiser windows.

Of these, the rectangular window is generally used for pitch estimation, as it provides a constant value inside the interval and zero outside it. The rectangular window is the best choice for detecting a sinusoid at low signal-to-noise ratios.

The choice of window function depends on a number of factors, of which the attenuation in the stop band plays an important role.

The LPC algorithm is preferred over other pitch detection algorithms for vocal signals because the voice signal is made up of formant frequencies, which are the harmonics of the original pitch of the signal. The LPC algorithm eliminates these formant frequencies from the vocal signal and detects the main envelope of the signal.

The detected envelope is then passed through a peak detector, where peaks are detected based on a preset threshold value. The detected peaks are then counted using a counter.

The pitch of the signal is calculated as the ratio of the number of peaks to the total time period of the window:

$$f_0 = \frac{N}{T} \qquad (4)$$

where,
f_0: pitch of the signal
N: number of peaks detected
T: time period of the window

Fig. 2 shows the varied pitch detected by using different window functions. It can be seen from Fig. 2 that the rectangular window offers exact segmentation of the audio signal with minimum overlap and minimum computations


which makes the model an optimal one that can be realised in a real time scenario.

FIGURE 2: PITCH OUTPUT FOR DIFFERENT WINDOW FUNCTIONS

It can also be seen from Fig. 2 that the rectangular window offers a flat response for the pitch of a given vocal signal supplied as input to the LPC pitch detector. The Blackman-Harris window offers the same accuracy, but it requires greater computational capacity and hence takes more time.

V. EXPERIMENTAL RESULTS OBTAINED

The tonal waveforms of a musical note and a vocal note vary greatly in terms of intensity and formant frequency. To analyse the performance of the linear predictive coding algorithm and to determine its error rate in actual scenarios, pitch detection using the proposed model was carried out on a piano signal and a vocal signal of the same note category.

Some of the experimental results obtained by using linear predictive coding to extract the pitch of a vocal sound and a piano sound are summarised in Table 1.

TABLE 1: PITCH OUTPUT

Note   Original frequency,       Detected voice     Detected musical note
       theoretical value (Hz)    frequency (Hz)     frequency (Hz)
C       523.25                   486.63              496.36
D       587.33                   552.11              550.45
E       659.26                   616.19              650.12
F       698.46                   672.13              669.18
G       783.99                   744.36              753.23
A       880.00                   834.12              865.67
B       987.77                   902.13              970.54
C      1046.50                   989.36             1036.21

VI. CONCLUSION AND FUTURE SCOPE

In this paper we have proposed and implemented a novel pitch detection algorithm for detecting the pitch of singing voice. The LPC technique discussed here uses spectral analysis to detect pitch after finding the main envelope of the vocal signal for a given window duration. The main advantage of using the LPC algorithm for pitch detection is that, unlike the AMDF and ACF techniques, it is not affected by the harmonic frequencies that form the main formants in vocal signals: it eliminates the unwanted formants and thus helps in accurate pitch detection of vocal signals, which is not possible with other pitch detection techniques. The technique was implemented in LabVIEW, and the experimental results show that the detected pitch of the vocal signals lies in a frequency band and can be used for automatic annotation of music by comparing each musical note to a given band of frequencies.
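The note-assignment step, in which a detected pitch is compared with a stored frequency table, can be sketched as follows. This is a minimal Python illustration rather than the LabVIEW implementation used in the paper. It assumes the stored table is the theoretical column of Table 1 and that a pitch is assigned to the entry nearest to it on a logarithmic (cents) scale. The names `NOTE_TABLE`, `cents` and `assign_note`, and the octave labels C5 and C6 used to tell the two C rows apart, are hypothetical.

```python
import math

# Theoretical frequencies (Hz) taken from Table 1. The labels C5 and C6 are
# added here only to distinguish the two rows that the table labels "C".
NOTE_TABLE = [
    ("C5", 523.25), ("D", 587.33), ("E", 659.26), ("F", 698.46),
    ("G", 783.99), ("A", 880.00), ("B", 987.77), ("C6", 1046.50),
]


def cents(freq, ref):
    """Signed distance from ref to freq in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(freq / ref)


def assign_note(freq, table=NOTE_TABLE):
    """Return (name, deviation in cents) for the table entry nearest to freq."""
    name, ref = min(table, key=lambda entry: abs(cents(freq, entry[1])))
    return name, cents(freq, ref)


# Two detected piano values from Table 1 (rows A and the upper C).
for detected in (865.67, 1036.21):
    name, deviation = assign_note(detected)
    print(f"{detected:8.2f} Hz -> {name} ({deviation:+.1f} cents)")
```

Under this nearest-entry rule the two piano values above map back to their own rows. Several detected voice values in Table 1 sit far enough below the theoretical frequency that the rule would pick a neighbouring entry; for example, 552.11 Hz (row D) is slightly closer to C than to D in cents. A practical frequency band per note would need to allow for such offsets.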

