III. SOURCE TRACKING SYSTEM

This section explains how the partial space is clustered into source subspaces and how these clusters are tracked in time, using harmonic as well as dynamic constraints. We call these clusters combs to signify their harmonic structure. The main modules of the source tracking system are illustrated in Fig. 3. The state of the $k$th comb at the $t$th instant (time frame) is the set of amplitudes and frequencies of its $N_h$ partials. The amplitude and frequency of the $h$th partial in the $k$th comb at the $t$th instant are represented by $a_{k,h}(t)$ and $f_{k,h}(t)$, respectively.

A. Comb Tracking

The state of an active comb, which is tracking a cluster of partials, has to be updated at the next time instant. This task is accomplished using prediction and measurement updates, as in the Kalman filter framework [32].

Prediction update: The a priori state of the comb is estimated using first-order prediction as
$$\hat{f}_{k,h}(t) = f_{k,h}(t-1) + \alpha_f\,[f_{k,h}(t-1) - f_{k,h}(t-2)], \quad (3)$$
$$\hat{a}_{k,h}(t) = a_{k,h}(t-1) + \alpha_a\,[a_{k,h}(t-1) - a_{k,h}(t-2)], \quad (4)$$
for all $h$. Here, $\alpha_f$ and $\alpha_a$ are prediction coefficients, and the leader harmonic of the comb is defined as its strongest (maximum-amplitude) partial.
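As a concrete illustration, here is a minimal sketch of this prediction step, assuming a comb is represented by NumPy arrays of partial frequencies and amplitudes; the function name and the coefficient values are ours, not the paper's:

```python
import numpy as np

def predict_comb(freqs_t1, freqs_t2, amps_t1, amps_t2,
                 alpha_f=0.5, alpha_a=0.5):
    """A priori comb state by first-order prediction, as in (3)-(4):
    each partial is extrapolated from its last two states.
    alpha_f and alpha_a are hypothetical placeholder values."""
    freq_hat = freqs_t1 + alpha_f * (freqs_t1 - freqs_t2)   # eq. (3)
    amp_hat = amps_t1 + alpha_a * (amps_t1 - amps_t2)       # eq. (4)
    # Clamping amplitudes at zero is a practical guard, not from the paper.
    return freq_hat, np.maximum(amp_hat, 0.0)
```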
Measurement update: The leader harmonic is updated first, by matching it to the observed partials using the measurement update equations (6), (7). For the other harmonics, the constraints of harmonic structure are used, based on the F0 and the frequency change calculated from the leader harmonic. The other harmonics of this comb are found using the harmonic constraint, i.e., the $h$th harmonic is predicted to be located at a frequency equal to $h$ times the F0; the updated frequency of the leader harmonic, divided by its harmonic index, is considered as the F0 for this step. For each such harmonic, the measured partial is selected by (10), which selects the partial that is close to the predicted harmonic frequency and has a large amplitude. Underlying this step is the assumption that the frequencies of the other harmonics change by the same ratio as that of the leader harmonic.
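The exact forms of (6), (7) and (10) are not reproduced above; the following sketch only illustrates the idea of (10), with a Gaussian frequency-proximity weight standing in for the paper's likelihood (the weighting form, the names and the tolerance value are our assumptions):

```python
import numpy as np

def update_harmonic(pred_freq, partial_freqs, partial_amps, sigma=0.03):
    """Pick the measured partial for one predicted harmonic: maximize
    amplitude weighted by a Gaussian proximity to the predicted frequency
    (a stand-in for the likelihood used in (10)). sigma is a hypothetical
    relative-frequency tolerance."""
    proximity = np.exp(-0.5 * ((partial_freqs / pred_freq - 1.0) / sigma) ** 2)
    best = int(np.argmax(partial_amps * proximity))
    return partial_freqs[best], partial_amps[best]
```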
B. Comb Initialization

If any of the three strongest (in amplitude) partials in the partial space is not being tracked by one of the already active combs, then a new comb is initialized, using the corresponding features of that strongest partial. Note that, in the algorithm, this step is performed after all the active comb states have been updated at the current instant. The maximum number of combs is fixed, and each comb tracks a fixed number $N_h$ of harmonics. If all the combs are active, then one of the existing combs is terminated to start the new comb.
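A minimal sketch of this initialization test, with combs represented simply as dictionaries of partial arrays (the representation, the F0 range and the tolerance are our assumptions, not the paper's values):

```python
import numpy as np

def maybe_init_comb(partial_freqs, partial_amps, combs,
                    f0_min=80.0, f0_max=1000.0, tol=0.03):
    """Start a new comb if one of the three strongest partials in the
    partial space is untracked and lies in the allowed F0 range
    (Section III-B). Only the trigger is sketched; filling in the other
    harmonics would follow the harmonic constraint of Section III-A."""
    strongest = np.argsort(partial_amps)[::-1][:3]   # three strongest partials
    for p in strongest:
        f = partial_freqs[p]
        tracked = any(np.any(np.abs(c["freqs"] / f - 1.0) < tol) for c in combs)
        if not tracked and f0_min <= f <= f0_max:
            combs.append({"freqs": np.array([f]),
                          "amps": np.array([partial_amps[p]])})
            break
    return combs
```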
C. Comb Termination

The playing of any instrument continues for a short period of time and then breaks. Correspondingly, our combs should also get terminated when the source power decays below a certain level. The comb gets terminated if the sum of its partial amplitudes falls below a threshold $\gamma$ times $a_{\max}(t)$, the amplitude of the strongest partial in the partial space:
$$\sum_{h=1}^{N_h} a_{k,h}(t) < \gamma\, a_{\max}(t). \quad (11)$$

There is also a possibility that two combs start tracking the same source, due to tracking error. In such a case, when two combs have the same $f_{k,h}(t)$ for all $h$ at the last two time instants, we terminate the one with the lesser temporal length. We consider two time instants so as to avoid termination when the F0 trajectories of two sources collide, as we are using the last two states in the prediction step.
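The decay test of (11) reduces to a one-line check; the threshold value below is a hypothetical placeholder, not the paper's:

```python
import numpy as np

def should_terminate(comb_amps, a_max, gamma=0.05):
    """Termination test of (11): the comb dies when the sum of its partial
    amplitudes falls below gamma times the amplitude a_max of the strongest
    partial in the partial space. gamma = 0.05 is a hypothetical value."""
    return np.sum(comb_amps) < gamma * a_max
```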
To make this system more specific to a particular instrument, i.e., vocals in this work, we specify two more constraints:
1) The vocal F0 range is limited to a fixed interval.
2) The first harmonic is very predominant.

Under the first constraint, the F0 candidates are searched only within the limited range in the comb initialization step. Also, the combs whose F0 falls outside this range are terminated, because this work is not concerned with tracking the accompaniments. The second constraint is required only for the initialization of a comb and is not needed once the comb is initialized, as the algorithm can depend upon higher harmonics for tracking.
IV. VOCAL COMB IDENTIFICATION

Among all the combs present at the current time instant, the one which corresponds to the vocal source has to be identified.

To determine the vocal contour, the harmonic strength criterion is used. The vocal recognition score for the $k$th comb at the $t$th instant is defined as
$$s_k(t) = \sum_{h=1}^{N_h} a_{k,h}(t). \quad (12)$$

While other researchers have used recognition criteria over individual frames, we use the knowledge that a comb tracks the same source to smooth the score over time, using a first-order linear filter represented in the z-domain as
$$H(z) = \frac{1}{1 - \lambda z^{-1}}. \quad (13)$$
This reinforces the probability of selecting a source as vocal at the next time instant too, if it is selected as vocal at the current time instant. Thus, it helps in identifying a vocal comb even if it has less salience at a particular instant, due to a momentary rise in accompaniment strength (e.g., during an accompaniment onset). Here, $\lambda$ is a positive constant less than unity.
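One step of this recursion could look as follows; the value of lam is hypothetical, and the note on unit DC gain is ours, not the paper's:

```python
def smooth_score(prev_smoothed, raw_score, lam=0.9):
    """One step of the first-order recursive smoothing of (13):
    y(t) = lam * y(t-1) + s(t), with 0 < lam < 1. A (1 - lam) input
    gain could be added if unit DC gain is desired."""
    return lam * prev_smoothed + raw_score
```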
Many times, it is seen that some loud pitched instruments have a greater score, and hence they deteriorate the recognition quality. To reduce their score, we develop a filter based on the idea presented in [25] that these instruments mostly have a stable pitch contour, whereas the vocal pitch contour has an involuntary instability called jitter. The stability of the pitch contour is quantified using its standard deviation (SD), calculated over a finite number of previous instants (here 20). If the SD of the $k$th comb's pitch contour, $\mathrm{SD}_k(t)$, is less than a threshold $\sigma_{\mathrm{th}}$, the recognition score is attenuated using the filter
$$\tilde{s}_k(t) = \begin{cases} s_k(t)\,\mathrm{SD}_k(t)/\sigma_{\mathrm{th}}, & \text{if } \mathrm{SD}_k(t) < \sigma_{\mathrm{th}},\\ s_k(t), & \text{otherwise.} \end{cases} \quad (14)$$
This weakens the instrumental comb strength. Sometimes, the vocal pitch contour also happens to have an SD a little less than the threshold. Pruning a comb with a low SD altogether, as in [25], severely reduces the accuracy, but attenuating such combs proportionately (as in (14)) makes the vocal combs compete better. The vocal comb is selected as the one having the maximum score after filtering, $\tilde{s}_k(t)$, among all the combs present at time $t$:
$$\hat{k}(t) = \arg\max_{k}\, \tilde{s}_k(t). \quad (15)$$

Fig. 5. Block schematic for obtaining the score for vocal comb identification.

The scheme for obtaining the score for vocal comb recognition is illustrated in Fig. 5. To summarize, the scheme consists of calculating the harmonic salience score for each comb, filtering this score through a first-order smoothing filter and then a jitter-based filter. The comb having the maximum score after filtering is selected to be the one tracking the vocal source. The algorithm for the entire melody extraction scheme is given in the form of pseudo-code in Fig. 6.
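A sketch of the jitter-based attenuation of (14) and the selection of (15), following the reconstruction above (names and the proportional-attenuation form are ours):

```python
import numpy as np

def jitter_filter(score, f0_history, sd_threshold):
    """Attenuate the score of a comb with a stable pitch contour, as in
    (14). f0_history holds the comb's F0 over the last few instants
    (the paper uses 20); attenuation is proportional below the threshold."""
    sd = np.std(f0_history)
    return score * (sd / sd_threshold) if sd < sd_threshold else score

def select_vocal_comb(filtered_scores):
    """Vocal comb = comb with the maximum filtered score, as in (15)."""
    return int(np.argmax(filtered_scores))
```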
V. EVALUATION

The proposed scheme of harmonic cluster tracking for melody extraction (HCTM henceforth) accomplishes the melody extraction task in two steps, namely harmonic source tracking and vocal pitch selection. So we evaluate the performance accuracies for melody extraction without vocal source selection as well as with vocal source selection. The former tells whether one of the combs is tracking the vocal melody; this performance is not compared with any existing system because this work is not concerned with multi-F0 tracking. The latter tells the tracking accuracy of the complete HCTM system. We used fixed values of $\lambda$ and $\sigma_{\mathrm{th}}$, and the values of the various other parameters chosen for our implementation of HCTM are shown in Table I. These parameters were chosen heuristically.
TABLE II: DATA DESCRIPTION

TABLE III: PITCH AND CHROMA ACCURACIES
A. Results

1) Melody Extraction Accuracy: Table III shows the pitch and chroma accuracies for melody extraction by the TWMDP and HCTM algorithms for all the datasets, with dataset 1 having different SNRs of -5 dB, 0 dB and 5 dB for all the six singers. Fig. 7 illustrates the overall comparison of the HCTM with the TWMDP algorithm over the entire dataset 1, consisting of all the six singers, at different SNRs.

Fig. 7. Overall pitch and chroma accuracies (%) for all the singers of dataset 1, at different SNRs.

For dataset 1, at 0 dB and -5 dB SNRs, both the raw pitch as well as raw chroma accuracies of the complete HCTM system are significantly higher than those of the TWMDP, for all the singers. At 5 dB SNR, the raw pitch accuracy of the complete HCTM system is larger than that of the TWMDP for all the singers. Better results are obtained with the HCTM over the other two datasets also. The overall accuracies, as compared in Fig. 7, show that the overall pitch and chroma accuracies are better for the HCTM than for the TWMDP. All these results show that the HCTM significantly outperforms our implementation of the TWMDP algorithm. To check the statistical significance of these results, we performed a paired-sample t-test to compare the accuracies of the two algorithms, and found the p-values to be less than 0.05, which implies that the improvements due to the HCTM are statistically significant. Also, the same set of parameters was used for all the three datasets, which shows the robustness of the HCTM.
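Such a significance check can be run with SciPy's paired t-test; the helper below is our sketch, taking hypothetical arrays of matched per-excerpt accuracies:

```python
import numpy as np
from scipy import stats

def improvement_is_significant(hctm_acc, twmdp_acc, alpha=0.05):
    """Paired-sample t-test over matched per-excerpt accuracies.
    Returns True when HCTM's mean accuracy is higher and the paired
    difference is significant at level alpha."""
    t_stat, p_value = stats.ttest_rel(hctm_acc, twmdp_acc)
    return p_value < alpha and np.mean(hctm_acc) > np.mean(twmdp_acc)
```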
2) Effectiveness of Score Filtering: To show the importance of the vocal recognition score filtering, Fig. 8 shows the effect of the filters on the scores of two combs, one of which tracks the vocal pitch while the other tracks a strong accompaniment pitch. The score for the instrumental comb (comb 1) spikes at some points and becomes larger than that for the vocal comb (comb 2), leading to mis-classification (Fig. 8(b)). This identification error is reduced by using the first-order filter, which effectively uses the information that a comb tracks the same source (Fig. 8(c)). The error is further reduced by the jitter filter (Fig. 8(d)), since the instrumental F0 contour is stable and has a low SD.

To evaluate the overall effect of score filtering (both the smoothing as well as the jitter filters), Fig. 9 presents the accuracy of the HCTM method with and without the use of the score filters. This shows that the capability of the recognition score, $s_k(t)$, in telling the predominant melody is improved when we incorporate source information to enhance the score, by filtering out noisy fluctuations over individual sources and by using the jitter information when the predominant melody is vocal.

Fig. 8. (a) F0 for two combs, 1-instrumental (solid) and 2-vocal (dotted); (b) unfiltered vocal recognition score: the score for comb 1 spikes and becomes more than that of comb 2, causing mis-identification; (c) score filtered using the first-order filter: mis-identification reduced; (d) final score filtered with both the filters: mis-identification further reduced. Here the SD for comb 1 was too small, hence its score is reduced to almost zero.

Fig. 9. Pitch and chroma accuracies (%) without and with score filtering for various singers (at 0 dB SNR).
between the vocal and the instrumental pitch contours. After the collision, the prediction step helps the comb in tracking the same source as it was tracking just before the collision. On the contrary, in TWMDP, at the time of such a collision, the dual-F0 state is not able to find the second F0 and hence starts tracking a spurious F0, resulting in a decrease in accuracy.
Most of the algorithms falling under the general architecture (including TWMDP) first find salience and then apply smoothness constraints. The HCTM algorithm, in contrast, simultaneously uses the smoothness constraints as well as the harmonic-based salience, in the form of source comb tracking. It then computes the score for determining the vocal F0, filtered over time for each individual source. This reduces errors when the accompaniments momentarily become stronger than the vocal source, e.g., during onsets. It also helps in persistently selecting a source as vocal by filtering out the fluctuations in the recognition score. Another important advantage is that HCTM also relies upon timbral features for tracking, although in a naive way, as the measurement update equations ((6), (7)) tend to minimize the distance between successive spectral envelopes of a comb. However, the architecture allows more sophisticated ways of capturing timbral features to be incorporated easily.
Notably, the accuracy of HCTM is higher than that of TWMDP even though the former uses online processing whereas the latter uses offline dynamic programming for determining the melody. A typical example of melody extraction using the two algorithms is given in Fig. 11. It can be seen that HCTM gives a smoother estimate of the F0 contours because it relies on multiple harmonics for F0 estimation. Also, some combs of HCTM can be seen tracking the higher harmonics of the vocal comb, but the octave error is taken care of by the identification score, as explained ahead.

Fig. 11. Example of vocal melody (shown by dark crosses) extracted by TWMDP and HCTM, along with other F0s (shown by dots), for a 4 s long excerpt from the song titled 'Kenshin_2_04' at 0 dB SNR. The actual vocal melody (ground truth) is shown by light circles.
2) Octave Error: Many works [10], including TWMDP, use an F0 likelihood criterion of the form
$$L(F_0) = \sum_{h=1}^{N(F_0)} g(h, F_0), \quad (16)$$
where $F_0$ is the F0 candidate, $N(F_0)$ is the predicted harmonic count (here, the number of harmonics of $F_0$ lying below the maximum analysis frequency), and $g(\cdot)$ is a function which measures the likelihood of an individual partial peak being the $h$th harmonic. So the overall likelihood is measured over a variable number of partials. Such a formulation is prone to octave error. For example, when the even harmonics are stronger than the odd harmonics, $L(F_0/2)$ may become more than $L(F_0)$, resulting in $F_0/2$ being selected as the pitch in place of $F_0$. We note that this problem occurs because $N(F_0/2)$ is simply twice $N(F_0)$.
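To make the harmonic-count argument concrete, consider an illustrative case (the choice of $g$ and the numbers are ours, not the paper's): let $g$ return the amplitude of the partial matched to harmonic $h$, let the tone have unit-amplitude partials exactly at the harmonics of $F_0$, and let the analysis band hold $N(F_0) = 8$ harmonics. Then

```latex
\begin{align*}
L(F_0)   &= \sum_{h=1}^{8} 1 = 8,\\
L(F_0/2) &= \sum_{h=1}^{16} g(h, F_0/2)
          = \underbrace{8}_{\text{even slots } h=2,4,\dots,16}
          + \underbrace{\varepsilon}_{\text{odd slots}},
\end{align*}
```

so any spurious energy $\varepsilon > 0$ matched at the odd slots of $F_0/2$ pushes $L(F_0/2)$ above $L(F_0)$, and the lower octave wins even though $F_0$ explains the spectrum equally well.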
However, the HCTM algorithm uses a fixed $N_h$. This effectively prevents this problem if $N_h$ is large enough to capture most of the energy in the vocal spectrum, which is generally carried by the first few harmonics. The use of a fixed $N_h$ is also supported empirically by the trained weight models in [12], where the weight of the harmonics, for the weighted-sum based harmonic salience function, decreases with increasing harmonic index, showing that the harmonics with a large harmonic index contribute less to the salience function and hence can be dropped.

Another reason for the octave error is the distortion of the first harmonic. This can make the first harmonic unavailable for selection as the candidate F0, or may reduce its salience, because a small error in its frequency gets multiplied by integer factors while matching with the higher harmonic partials. TWMDP takes care of this problem by taking various integer sub-multiples of a well-formed harmonic partial, which however increases the number of computations. The HCTM algorithm is capable of depending upon higher harmonics for tracking and hence tackles this problem in a computationally efficient way.

The reduction in octave errors can be seen in the results (Table III) in the form of a reduced difference between the raw pitch and raw chroma accuracies for the HCTM algorithm. Especially for the singers Fdps and Geniusturtle, the chroma accuracies of the TWMDP and the complete HCTM algorithm are close, but the HCTM pitch accuracy is much better than that of the former. The overall results (Fig. 7) also illustrate the same effect, as the difference in the RPAs for the two algorithms is quite large compared to that between the RCAs. This shows that the HCTM is better able to reduce octave errors.

TABLE IV: MIREX AUDIO MELODY EXTRACTION RESULTS
3) Comb Initialization: HCTM simply uses a fixed number (3 here) of maximum-amplitude partials for comb initialization, under the assumption that the first harmonic is the strongest of all the harmonics. Even if it is not the strongest, but is one of the top three partials, the algorithm is able to initialize a new comb. The effectiveness of this unsophisticated but fast approach for vocal sources can be seen in the high accuracies obtained without vocal selection. However, some instrumental sounds may not satisfy this constraint, for which this method can be improved by using other functions, like the inverse Fourier transform of the log magnitude spectrum, or by using the partials having a high static likelihood (as in (10)), but at the cost of extra computations.
VI. CONCLUSION

In this work, we have described a harmonic cluster tracking system for tracking various harmonic sources in polyphonic music and, among them, identifying the predominant vocal melody based on various heuristics.

The novel contributions of this work are:
(i) Unified approach: Most of the previous approaches apply the static and the dynamic constraints separately, applying the static constraints first, followed by the dynamic ones in the form of dynamic programming. Our approach is a unified one, using both these constraints simultaneously at every step of tracking. Thus, each frame is traversed only once, which is required for on-line processing.
(ii) Tracking strong higher harmonics: While previous approaches track the 'F0 trajectories,' using the Viterbi algorithm etc., our algorithm depends upon the 'strong higher harmonics' for tracking.
(iii) Vocal selection filters: Instead of the commonly used dynamic programming based offline methods, our approach uses score filters which select the vocal source based on strength and jitter constraints.
(iv) Real-time method: The proposed method is implementable in real time (it runs 10 times faster than real time), which makes it suitable for various novel applications.

The algorithm presented in this work was submitted to the Music Information Retrieval Evaluation eXchange (MIREX) 2012 campaign for the automatic melody extraction task [39],² and the results are available at http://www.music-ir.org/mirex/wiki/2012:MIREX2012_Results. The HCTM algorithm was extended to track the instrumental melodies as well, by using the inverse Fourier transform (IFT) of the log magnitude spectrum in the initialization and predominant-source identification functions. Initialization was based on the top peaks in the IFT, instead of the STFT. For predominant source identification, the R.H.S. of (12) was multiplied with the inverse Fourier transform of the log magnitude spectrum at the $t$th time instant, evaluated at the index corresponding to the comb's F0.
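The IFT of the log magnitude spectrum is the real cepstrum, which for a harmonic source peaks at the lag corresponding to its F0 period; a minimal sketch of such a weighting term (the function name, FFT size and exact indexing are our assumptions):

```python
import numpy as np

def cepstrum_weight(frame, sample_rate, f0, n_fft=4096):
    """Real cepstrum of one analysis frame: the IFT of the log magnitude
    spectrum. Returns its value at the lag (quefrency) corresponding to f0,
    which can multiply the R.H.S. of (12) as in the extended HCTM."""
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) + 1e-12   # avoid log(0)
    cepstrum = np.fft.irfft(np.log(spectrum))
    lag = int(round(sample_rate / f0))                     # F0 period in samples
    return cepstrum[lag]
```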
Notably, the accuracies achieved by the submitted HCTM algorithm were better than those achieved by the version of the TWMDP algorithm submitted by Rao & Rao [40] in 2009, over the same datasets.³ The achieved RPA and RCA are shown in Table IV.

The evaluation results clearly indicate that the proposed on-line real-time system is able to achieve significant accuracy levels as compared to the existing state-of-the-art offline method. The current approach identifies the desired sound source based mainly on the harmonic strength, but using timbral features for this task may further improve the accuracy.

ACKNOWLEDGMENT

The authors would like to thank Dr. P. Rao and Dr. V. Rao for sharing their audio dataset for the experiments reported in this paper. The authors are grateful to the reviewers for their helpful comments in improving this manuscript.

²The submitted code is available for research purposes at http://home.iitk.ac.in/~lbehera/isl/AB1.rar
³http://www.music-ir.org/mirex/wiki/2009:Audio_Melody_Extraction_Results
REFERENCES

[1] A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Amer., vol. 111, no. 4, pp. 1917–1930, 2002.
[2] A. Klapuri, "Automatic music transcription as we know it today," J. New Music Res., vol. 33, no. 3, pp. 269–282, 2004.
[3] A. Klapuri and M. Davy, Signal Processing Methods for Music Transcription. Secaucus, NJ: Springer-Verlag, 2006.
[4] T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice. Upper Saddle River, NJ: Prentice-Hall, 2002.
[5] P. Boersma and D. Weenink, Praat: Doing Phonetics by Computer, 2004.
[6] G. Hu and D. L. Wang, "A tandem algorithm for pitch estimation and voiced speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2067–2079, Nov. 2010.
[7] E. Vincent, N. Bertin, and R. Badeau, "Harmonic and inharmonic nonnegative matrix factorization for polyphonic pitch transcription," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2008, pp. 109–112.
[8] J. Durrieu, G. Richard, B. David, and C. Fevotte, "Source/filter model for unsupervised main melody extraction from polyphonic audio signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 3, pp. 564–575, Mar. 2010.
[9] J. Durrieu, B. David, and G. Richard, "A musically motivated mid-level representation for pitch estimation and musical audio source separation," IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp. 1180–1191, Oct. 2011.
[10] J. Wise, J. Caprio, and T. Parks, "Maximum likelihood pitch estimation," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 5, pp. 418–423, Oct. 1976.
[11] C. Yeh, A. Roebel, and X. Rodet, "Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1116–1126, Aug. 2010.
[12] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. 7th Int. Symp. Music Inf. Retrieval (ISMIR'06), Oct. 2006, pp. 216–221.
[13] K. Dressler, "Pitch estimation by the pair-wise evaluation of spectral peaks," in Proc. AES 42nd Int. Conf., 2011.
[14] J. Salamon, E. Gomez, and J. Bonada, "Sinusoid extraction and salience function design for predominant melody estimation," in Proc. Int. Conf. Digital Audio Effects (DAFx), 2011, pp. 73–80.
[15] R. C. Maher and J. W. Beauchamp, "Fundamental frequency estimation of musical signals using a two-way mismatch procedure," J. Acoust. Soc. Amer., vol. 95, no. 4, pp. 2254–2263, 1994.
[16] V. Rao and P. Rao, "Vocal melody extraction in the presence of pitched accompaniment in polyphonic music," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 2145–2154, Nov. 2010.
[17] B. Doval and X. Rodet, "Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1993, vol. 1, pp. 221–224.
[18] M. P. Ryynänen and A. Klapuri, "Automatic transcription of melody, bass line, and chords in polyphonic music," Comput. Music J., vol. 32, no. 3, pp. 72–86, Sep. 2008.
[19] J. Wu, E. Vincent, S. A. Raczynski, T. Nishimoto, N. Ono, and S. Sagayama, "Polyphonic pitch estimation and instrument identification by joint modeling of sustained and attack sounds," IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp. 1124–1132, Oct. 2011.
[20] W. H. Liao, A. W. Y. Su, C. Yeh, and A. Roebel, "On the use of perceptual properties for melody estimation," in Proc. Int. Conf. Digital Audio Effects (DAFx-11), Paris, France, 2011, pp. 141–145.
[21] H. Kameoka, T. Nishimoto, and S. Sagayama, "Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering," in Proc. Int. Conf. Spoken Lang. Process., Oct. 2004, vol. 1, pp. 2433–2436.
[22] M. Goto, "A real-time music scene description system: Predominant-F0 estimation for detecting melody and bass lines in real-world audio signals," Speech Commun. (ISCA J.), vol. 43, no. 4, pp. 311–329, 2004.
[23] T. Virtanen, "Audio signal modeling with sinusoids plus noise," M.S. thesis, Dept. of Inf. Technol., Tampere Univ. of Technol., Tampere, Finland, 2000.
[24] J. G. A. Barbedo and G. Tzanetakis, "Musical instrument classification using individual partials," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 111–122, Jan. 2011.
[25] V. Rao, S. Ramakrishnan, and P. Rao, "Singing voice detection in polyphonic music using predominant pitch," in Proc. INTERSPEECH'09, 2009, pp. 1131–1134.
[26] H. Tachibana, T. Ono, N. Ono, and S. Sagayama, "Melody extraction in music audio signal by melodic component enhancement and pitch tracking," MIREX, 2009.
[27] C. L. Hsu, D. Wang, and J. S. R. Jang, "A trend estimation algorithm for singing pitch detection in musical recordings," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2011, pp. 393–396.
[28] P. de la Cuadra and A. Master, "Efficient pitch detection techniques for interactive music," in Proc. Int. Comput. Music Conf., 2001.
[29] P. McLeod, "Fast, accurate pitch detection tools for music analysis," Ph.D. dissertation, Univ. of Otago, Dunedin, New Zealand, 2008.
[30] P. Verma and P. Rao, "Real-time melodic accompaniment system for Indian music using TMS320C6713," in Proc. Int. Conf. VLSI Design and Embedded Syst., 2012.
[31] K. Dressler, "Sinusoidal extraction using an efficient implementation of a multi-resolution FFT," in Proc. Int. Conf. Digital Audio Effects (DAFx), 2006, pp. 247–252.
[32] G. Welch and G. Bishop, "An introduction to the Kalman filter," Tech. Rep., Chapel Hill, NC, 1995.
[33] Y. Li and D. L. Wang, "Separation of singing voice from music accompaniment for monaural recordings," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1475–1487, May 2007.
[34] M. P. Ryynänen and A. Klapuri, "Polyphonic music transcription using note event modeling," in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., Oct. 2005, pp. 319–322.
[35] M. Goto, "PreFEst: A predominant-F0 estimation method for polyphonic musical audio signals," MIREX, 2005.
[36] C. L. Hsu and J. S. R. Jang, "On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 310–319, Feb. 2010.
[37] W.-H. Tsai and H.-M. Wang, "Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 330–341, Jan. 2006.
[38] G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S. Streich, and B. Ong, "Melody transcription from music audio: Approaches and evaluation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1247–1256, May 2007.
[39] V. Arora and L. Behera, "Online melody extraction: MIREX 2012," MIREX, 2012.
[40] V. Rao and P. Rao, "Melody extraction using harmonic matching," MIREX, 2009.

Vipul Arora received the B.Tech. degree in electrical engineering from the Indian Institute of Technology (IIT) Kanpur, India. Currently, he is working towards the Ph.D. degree at IIT Kanpur. His research interests include music information retrieval and semantic signal processing.

Laxmidhar Behera (S'92–M'03–SM'03) received the B.Sc. (engineering) and M.Sc. (engineering) degrees from NIT Rourkela in 1988 and 1990, respectively. He received the Ph.D. degree from IIT Delhi. He worked as an assistant professor at BITS Pilani during 1995–1999 and pursued postdoctoral studies at the German National Research Center for Information Technology (GMD), Sankt Augustin, Germany, during 2000–2001. He is currently working as a professor in the Department of Electrical Engineering, IIT Kanpur. He joined the Intelligent Systems Research Center (ISRC), University of Ulster, United Kingdom, as a reader on sabbatical from IIT Kanpur during 2007–2009. He has also worked as a visiting researcher/professor at FHG, Germany, and ETH Zurich, Switzerland. He has more than 150 papers to his credit, published in refereed journals and presented at conferences. His research interests include intelligent control, robotics, information processing, neural networks, and cognitive modeling. He is a senior member of the IEEE.