
2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI) Guadalajara, Jalisco, Mexico, November 7-9, 2018

A comparative between Mel Frequency Cepstral Coefficients (MFCC) and Inverse Mel Frequency Cepstral Coefficients (IMFCC) features for an Automatic Bird Species Recognition System
Angel David Pedroza Ramirez, Unidad Academica de Ingenieria Electrica, Universidad Autonoma de Zacatecas, Zacatecas, Mexico (P.A.D 16@hotmail.com)
Jose Ismael de la Rosa Vargas, Unidad Academica de Ingenieria Electrica, Universidad Autonoma de Zacatecas, Zacatecas, Mexico (ismaelrv@yahoo.com)
Rogelio Rosas Valdez, Unidad Academica de Ciencias Biologicas, Universidad Autonoma de Zacatecas, Zacatecas, Mexico (rogrosas@gmail.com)
Aldonso Becerra, Unidad Academica de Ingenieria Electrica, Universidad Autonoma de Zacatecas, Zacatecas, Mexico (a7donso@hotmail.com)

Abstract—In this paper, a comparison between Mel Frequency Cepstral Coefficients (MFCC) and Inverse Mel Frequency Cepstral Coefficients (IMFCC) features for an automatic bird species recognition system is proposed, with the aim of validating IMFCC as a feature that can also be extracted for bird species recognition. Among biodiversity monitoring tasks there are some traditional techniques; bioacoustics studies biodiversity in a noninvasive way, based on the relationship between animal species and their sounds. Bioacoustics methodology for avian conservation is based on automatic speech recognition techniques, and one of the features traditionally extracted in this area is MFCC. Nevertheless, some new studies use IMFCC as complementary frequency information. From the results, it is concluded that IMFCC features perform better than traditional MFCC features, but performance still depends on the bird sound to be recognized.

Index Terms—bioacoustics, bird classification, HMM, IMFCC, MFCC.

I. INTRODUCTION

Biodiversity monitoring represents an invaluable tool in conservation and in reducing the impact of climate change [1], [2]. Although there are some traditional invasive monitoring techniques, a noninvasive method is preferred. Bioacoustics studies biodiversity based on the relationship between animal species and their sounds.

Bird species are very sensitive to climate change, and their conservation and understanding represent an important challenge [3]. In this sense, traditional bioacoustics methodology for avian conservation is based on automatic speech recognition techniques [4], [5], [6], [7]. Although there is a huge variety of techniques and features that can be extracted from sound, one of the traditional features in speech recognition is the set of Mel Frequency Cepstral Coefficients (MFCC). Despite the high efficiency obtained with MFCC features, some new studies propose the use of Inverse Mel Frequency Cepstral Coefficients (IMFCC) as a new perspective that takes into account frequency information missed by MFCC features, improving the spectral modeling [8], [9], [10], [11].

Even though MFCC features have been applied in bioacoustics, IMFCC features have not yet been applied in avian conservation systems. In this paper, a comparison between MFCC and IMFCC features for an automatic bird species recognition system is conducted, with the aim of validating IMFCC as a feature that can also be extracted for bird species recognition. From the results, it can be observed that IMFCC features give better performance than traditional MFCC features; however, the performance still depends on the bird species to be recognized.

In the following: Section II explains the basic mathematical background of MFCC and IMFCC features. Section III presents the automatic bird species recognition system used. Section IV summarizes the comparative results between MFCC and IMFCC features, with some discussion. Finally, Section V concludes this paper.

The authors would like to thank CONACYT and the Programa de Doctorado en Ciencias de la Ingenieria of the Universidad Autonoma de Zacatecas.

978-1-5386-4626-7/18/$31.00 © 2018 IEEE

II. FEATURE EXTRACTION: MFCC AND IMFCC

A. MFCC

Human perception of sound can be described by the Mel scale, and MFCCs are based on this configuration. The basic idea is to capture sound frequencies with a filter bank and determine the response according to a certain filter gain, whose resolution is high at low frequencies and low at high frequencies [5], [12].
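To make the layout of the two banks concrete, the following NumPy sketch builds p Mel-spaced triangular filters and derives the inverted bank by flipping the Mel bank in both filter index and frequency, as is commonly done for IMFCC. This is an illustrative construction, not the authors' implementation; the sample rate, FFT size, and number of filters used below are assumed example values.

```python
import numpy as np

def hz_to_mel(f):
    # Standard Mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(p, n_fft, fs):
    """p triangular filters, Mel-spaced: dense at low frequencies."""
    # p + 2 boundary points equally spaced on the Mel axis.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), p + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((p, n_fft // 2 + 1))
    for j in range(p):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising edge of the triangle
            fb[j, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[j, k] = (right - k) / max(right - center, 1)
    return fb

def inverted_mel_filter_bank(p, n_fft, fs):
    """IMFCC bank: the Mel bank flipped in both filter index and
    frequency, so resolution is high at HIGH frequencies instead."""
    return mel_filter_bank(p, n_fft, fs)[::-1, ::-1]
```

With this construction, the low-index Mel filters are narrow and the high-frequency ones wide; the inverted bank reverses that behavior, which is exactly the contrast between Fig. 1 and Fig. 2.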
To compute the MFCC features, once the bird sound signal y(n) is framed and windowed (using a Hamming window), the power spectrum y(k) of each frame is calculated according to:

    y(k) = \frac{1}{N_s} \left| \sum_{n=1}^{N_s} y(n) W_N^{kn} \right|^2,    (1)

where W_N = \exp(-j 2\pi / N_s), N_s is the number of points of the 1D Discrete Fourier Transform (1D DFT), and 1 <= k <= N_s. Next, a set of p Mel-spaced triangular filters is traditionally constructed (see Fig. 1). Then, the output of the filter bank, E(j), is calculated by multiplying each frame power spectrum y(k) by the filter gains M_j(k):

    E(j) = \sum_{k=1}^{N_s/2 - 1} y(k) M_j(k),    (2)

where j is the index of each filter in the filter bank and 0 <= j < p.

Fig. 1. MFCC filter bank.

After that, the Discrete Cosine Transform (DCT-II) is calculated according to:

    C_l(t) = \sum_{j=0}^{p-1} \log[E(j)] \cos\left[ t \left( j - \frac{1}{2} \right) \frac{\pi}{p} \right],    (3)

where C_l(t) is the t-th order MFCC of the l-th frame and t = 1, ..., 13. Finally, the mean value of the coefficients over all frames is calculated (this process is repeated per bird sound audio file).

B. IMFCC

On the other hand, to capture the acoustic information missed by the MFCC features, Inverted Mel Frequency Cepstral Coefficient (IMFCC) features are traditionally calculated. In other words, the IMFCC filter bank is configured to have more frequency resolution at high frequencies and a low resolution at low frequencies [8], [9].

To compute the IMFCC features, once the power spectrum y(k) is calculated, a set of p Imel-spaced triangular filters is constructed (see Fig. 2). Next, the output of the IMFCC filter bank, E'(j), is calculated according to:

    E'(j) = \sum_{k=1}^{N_s/2 - 1} y(k) M'_j(k),    (4)

where M'_j(k) is the Imel filter gain.

Fig. 2. IMFCC filter bank.

Then, the Discrete Cosine Transform (DCT-II) is calculated according to:

    C'_l(t) = \sum_{j=0}^{p-1} \log[E'(j)] \cos\left[ t \left( j - \frac{1}{2} \right) \frac{\pi}{p} \right],    (5)

where C'_l(t) is the t-th order IMFCC of the l-th frame and t = 1, ..., 13. Finally, as with the MFCC features, the mean value of the coefficients over all frames is calculated (the process is repeated per bird sound audio file).

III. BIRD SPECIES RECOGNITION ALGORITHM

The basic idea behind the proposed algorithm is to identify, by analyzing bird sounds (songs and calls), a specific bird species among the huge variability of bird species. This task represents a big challenge, especially under hard environmental recording conditions. From this perspective, many techniques and automatic bird species recognition system configurations have been developed to complete this task.

In this section, the automatic bird species recognition system used to perform the comparisons is described (see Fig. 3).

A. Pre-processing

1) "Most important" region extraction and noise elimination: The proposed step is presented as a new "visual filter" technique that, through the user's observation-based selection of two points on the spectrogram, extracts the "most important" information and/or eliminates a "noise region". Let f(n) be an N-point 1D sequence (with index range n = 0, ..., N - 1). Then, from a user-selected "most important" region of the spectrogram, the 1D DFT of the sequence, F(k), is calculated, and the selected frequency indices of the region are mapped onto 1D DFT coordinates. Then, the magnitude of the frequencies that do not correspond to the selected region is replaced with the minimum of the 1D DFT. After that, the 1D Inverse Discrete Fourier Transform (1D IDFT) of the signal is calculated according to:

    f'(n) = \frac{1}{N} \sum_{k=0}^{N-1} F(k) W_N^{-kn},    (6)

where f'(n) is the frequency region of interest of the bird sound signal. After that, the time-domain region of interest is mapped and then extracted into time-domain coordinates. Finally, after normalization, based on an audio playback, the process is repeated until the user decides that the resulting audio record has the desired "most important" information.
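As an illustration of this DFT masking step, the sketch below keeps a user-chosen range of DFT bins, floors the magnitude of every other bin to the minimum DFT magnitude, and rebuilds the time signal with the inverse DFT, in the spirit of Eq. (6). The bin range and the signal are hypothetical; the real system works from a spectrogram selection rather than raw bin indices.

```python
import numpy as np

def extract_frequency_region(f_sig, k_lo, k_hi):
    """Keep DFT bins k_lo..k_hi (and their conjugate mirrors); replace the
    magnitude of every other bin with the minimum DFT magnitude, then
    rebuild the time signal with the inverse DFT."""
    N = len(f_sig)
    F = np.fft.fft(f_sig)
    mag, phase = np.abs(F), np.angle(F)
    keep = np.zeros(N, dtype=bool)
    for k in range(k_lo, k_hi + 1):
        keep[k] = True
        keep[(-k) % N] = True  # conjugate-symmetric bin keeps the output real
    mag[~keep] = mag.min()     # floor everything outside the selected region
    return np.real(np.fft.ifft(mag * np.exp(1j * phase)))
```

On a two-tone test signal, for example, selecting the bins around the lower tone suppresses the higher one while leaving the selected component intact.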
On the other hand, a "noise region" is eliminated through the user's selection of two points on the spectrogram. In this sense, as in the previous step, the 1D DFT of the region, G(k), is calculated; the selected spectrogram frequency information (from the "noise region") is replaced with the minimum of the 1D DFT magnitude; and the IDFT, g'(n), is calculated. Finally, the resulting region g'(n) replaces the corresponding samples in f'(n). Again, the process is repeated until the user decides that the resulting audio record has an adequate noise elimination.

2) Spectrum threshold: A spectrum threshold is proposed, since not all the information in the spectrogram is important enough to be processed. After f'(n) is framed and windowed, the energy spectrum y(k) of each frame is calculated by:

    y(k) = \frac{1}{N_s} \left| \sum_{n=1}^{N_s} y(n) W_N^{kn} \right|^2,    (7)

where W_N = \exp(-j 2\pi / N_s) and 1 <= k <= N_s. After normalization, an "adequate" magnitude threshold value (thr) is calculated and, for each frame, the frequency magnitude values in y(k) below thr are set to a minimum value. The optimal spectrum threshold value in the experiments was a constant value of thr = 0.01.

B. Bird sound classification

Once features are extracted from the bird sound signals, traditional Hidden Markov Models (HMM) are used for training and testing [13], [?]. The parameters of a model based on the extracted features (MFCC or IMFCC) are calculated in the training phase: a vector of probabilities for each initial state (pi), a probability matrix of the transitions between the states (A), and a probability matrix of the emissions given the states (B). These parameters define a bird sound species model lambda according to:

    \lambda = (A, B, \pi).    (8)

The Baum-Welch algorithm is used for training this parameter set through an iterative process to obtain optimized A, B, and pi parameters. In other words, a model is calculated for each species from its extracted bird sound features. Finally, given a bird sound in the test step, the probability that the given sound was produced by a specific bird species model (lambda) is calculated. The HMM method was implemented using free software [14].

Fig. 3. Bird sound recognition algorithm.

IV. RESULTS AND DISCUSSION

To compare the efficiencies of MFCC and IMFCC features with the proposed algorithm, bird sound files were collected from the following free online bird sound databases:
• INECOL [15]: This bird sound database is offered as a toolkit for ornithology research and was constructed with the aim of studying and disseminating bird sound recordings in Mexico.
• xeno-canto [16]: A bird sound recording database covering a huge variety of bird species from around the world.

Since xeno-canto recordings are volunteer-submitted, there are no uniform recording conditions; and, since bird species can produce many types of sounds, recordings of the same type were collected from the database to create models of specific bird sounds. Also, the INECOL bird sound database is nowadays limited, and therefore only a partial selection of bird species recordings was made, selecting the bird species whose type of bird sound has more records in the xeno-canto database (see Table I).

Evaluation of the comparisons was made with the following parameters:
• True Acceptance Rate (TAR): probability that a bird sound is correctly identified;
• True Rejection Rate (TRR): probability that a bird sound is correctly rejected.

The training set for the TAR test was 70% of the bird sound records per bird species; for the TRR test, all the records from the other bird species (different from the accepted bird sound) were used. The results of the comparisons are shown in Table II.

As expected, Table II shows that the efficiency of the algorithms changes depending on the extracted features. First, for Automolus rubiginosus, an improvement in both TAR and TRR performance is shown when using IMFCC features (36.37% and 8.69%, respectively). Then, for Synallaxis erythrothorax, TAR performance is equal for both techniques, but TRR performance improves when using IMFCC features. Also, for Cardinalis cardinalis, TAR performance improves (7.69%) when using IMFCC features, but TRR performance improves (23.80%) when using MFCC features. On the other hand, a random recognition efficiency is shown for Cercomacra tyrannina.

From the general efficiencies in Table II, in contrast with the efficiencies traditionally obtained in automatic speech recognition using MFCC and IMFCC features (which usually assumes large-scale training datasets), it is shown that the amount of data at the training step plays a significant role. However, the high variability of bird sounds among all the bird species makes the use of large-scale training datasets difficult (and therefore limited data is a usual condition). In addition, the complexity of the bird sounds to be recognized affects the efficiency of the algorithm. Therefore, the efficiency of the extracted feature depends on the bird sound to be recognized (see Fig. 4).
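The classification rule of Section III-B (score a test sound against each species model lambda = (A, B, pi) and pick the best) can be sketched with a forward-algorithm likelihood. The paper's actual implementation uses the HMM toolbox cited as [14], so the discrete-observation NumPy version below is only an assumed illustration, with symbolic observation indices in place of the real continuous features.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """log P(obs | lambda) for a discrete-emission HMM, computed with the
    scaled forward algorithm.  obs: sequence of symbol indices;
    pi: (S,) initial-state probabilities; A: (S, S) transition matrix;
    B: (S, M) emission probabilities."""
    alpha = pi * B[:, obs[0]]          # forward variable at t = 0
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()        # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, then emit
        s = alpha.sum()
        loglik += np.log(s)
        alpha = alpha / s
    return loglik

def classify(obs, models):
    """Pick the species whose model gives the highest log-likelihood."""
    return max(models, key=lambda sp: forward_loglik(obs, *models[sp]))
```

Here `models` would map each species name to its trained (pi, A, B) triple, with Baum-Welch producing those parameters in the training phase.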
TABLE I
BIRD SOUND FILES IN COLLECTED DATABASE

Order                          | Family         | Species                  | Bird sound files
Passeriformes (Perching Birds) | Furnariidae    | Automolus rubiginosus    | 35
                               |                | Synallaxis erythrothorax | 54
                               | Cardinalidae   | Cardinalis cardinalis    | 41
                               | Thamnophilidae | Cercomacra tyrannina     | 15
                               | Tyrannidae     | Myiozetetes similis      | 33

TABLE II
RESULTS OF BIRD IDENTIFICATION TEST (%)

                         |     MFCC      |     IMFCC
Bird species             | TAR   | TRR   | TAR   | TRR
Automolus rubiginosus    | 63.63 | 34.78 | 100   | 43.47
Synallaxis erythrothorax | 100   | 28.57 | 100   | 61.90
Cardinalis cardinalis    | 76.92 | 23.80 | 84.61 | 0
Cercomacra tyrannina     | 0     | 0     | 0     | 0
Myiozetetes similis      | 30    | 4.1   | 10    | 61.90

Fig. 4. Bird sound spectrogram examples: a) Cercomacra tyrannina song and b) Myiozetetes similis call.
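The TAR/TRR figures in Table II can be computed with a small scoring helper. The decision rule below (thresholding a per-recording model score) is a hypothetical one, since the paper does not spell out how acceptance is decided; the scores passed in would be, for instance, the HMM log-likelihoods from the test step.

```python
def tar_trr(target_scores, other_scores, threshold):
    """TAR: fraction of target-species recordings whose model score
    clears the threshold (correctly identified).
    TRR: fraction of other-species recordings below it (correctly
    rejected).  Scores are hypothetical per-recording log-likelihoods."""
    tar = sum(s >= threshold for s in target_scores) / len(target_scores)
    trr = sum(s < threshold for s in other_scores) / len(other_scores)
    return tar, trr
```

For example, with target scores [-10, -5, -1], other-species scores [-20, -30, -2], and a threshold of -6, two of three targets are accepted and two of three others are rejected, giving TAR = TRR = 66.7%.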

V. CONCLUSION

In this paper, a comparison between MFCC and IMFCC features for automatic bird species recognition was presented. Although both features are traditionally extracted in speech recognition systems, it is shown that IMFCC features can also be applied in a bird species recognition system. From the results, IMFCC features have a better general efficiency (in contrast with traditional MFCC features), but it still depends on the bird species to be recognized. In the same manner, the results show that the "most important" by-observation region extraction plays an important role, in the sense that better performance is expected when bird sounds are easy to extract from the spectrogram.

In future work, a traditional fusion feature model (MFCC-IMFCC) will be applied to an automatic bird species recognition system.

REFERENCES

[1] C. Bellard, C. Bertelsmeier, P. Leadley, W. Thuiller, and F. Courchamp, "Impacts of climate change on the future of biodiversity," Ecology Letters, vol. 15, no. 4, pp. 365–377, 2012.
[2] C. Duncan, J. R. Thompson, and N. Pettorelli, "The quest for a mechanistic understanding of biodiversity–ecosystem services relationships," Proc. R. Soc. B, vol. 282, no. 1817, p. 20151348, 2015.
[3] J. Wormworth and K. Mallon, "Bird species and climate change: the global status report - a synthesis of current scientific understanding of anthropogenic climate change impacts on global bird species now, and projected future effects," 2006.
[4] P. C. Caycedo-Rosales, J. F. Ruiz-Muñoz, and M. Orozco-Alzate, "Reconocimiento automatizado de señales bioacústicas: una revisión de métodos y aplicaciones," Ingeniería y Ciencia, vol. 9, no. 18, 2013.
[5] C. Chou, P. Liu, and B. Cai, "On the studies of syllable segmentation and improving MFCCs for automatic birdsong recognition," in 2008 IEEE Asia-Pacific Services Computing Conference, 2008, pp. 745–750.
[6] T. S. Brandes, "Feature vector selection and use with hidden Markov models to identify frequency-modulated bioacoustic signals amidst noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 6, pp. 1173–1180, 2008.
[7] T. M. Aide, C. Corrada-Bravo, M. Campos-Cerqueira, C. Milan, G. Vega, and R. Alvarez, "Real-time bioacoustics monitoring and automated species identification," PeerJ, vol. 1, p. e103, 2013.
[8] S. Chakroborty and G. Saha, "Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter," International Journal of Signal Processing, vol. 5, no. 1, pp. 11–19, 2009.
[9] D. Sharma and I. Ali, "A modified MFCC feature extraction technique for robust speaker recognition," in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2015, pp. 1052–1057.
[10] N. Sen, T. Basu, and S. Chakroborty, "Comparison of features extracted using time-frequency and frequency-time analysis approach for text-independent speaker identification," in 2011 National Conference on Communications (NCC), 2011, pp. 1–5.
[11] S. Memon, M. Lech, N. Maddage, and L. He, "Application of the vector quantization methods and the fused MFCC-IMFCC features in the GMM based speaker recognition," in Recent Advances in Signal Processing. IntechOpen, 2009.
[12] C. Kwan, G. Mei, X. Zhao, Z. Ren, R. Xu, V. Stanford, C. Rochet, J. Aube, and K. C. Ho, "Bird classification algorithms: theory and experimental results," in 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 2004, pp. V–289.
[13] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, Inc., 1993.
[14] K. Murphy, "Hidden Markov Model (HMM) Toolbox for MATLAB," https://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html, 1998.
[15] "INECOL," http://www1.inecol.edu.mx/sonidos/menu.htm, accessed 16 March 2017.
[16] "xeno-canto," https://www.xeno-canto.org, accessed 7 February 2017.
