
(IJARAI) International Journal of Advanced Research in Artificial Intelligence,
Vol. 1, No. 4, 2012
www.ijacsa.thesai.org
Automatic Melakarta Raaga Identification System:
Carnatic Music
B. Tarakeswara Rao (1), Sivakoteswararao Chinnam (2), P Lakshmi Kanth (3), M. Gargi (4)
(1,2) School of Computing, Vignan University
(3,4) Department of IT, Vignan Lara Institute of Technology


Abstract- Experience shows that the Nearest Neighbour classifier
is among the most straightforward classifiers in the arsenal of
machine learning techniques. An automatic Melakarta raaga
identification system is built by identifying the nearest
neighbours of a query example and using those neighbours to
determine the class of the query. This approach to classification is
of particular importance today because poor run-time performance
is less of a problem given the computational power now available.
This paper presents an overview of Nearest Neighbour
classification, focusing on mechanisms for measuring the distance
between neighbours using Cosine Distance and Earth Movers
Distance, the formulas used to identify nearest neighbours, and the
training and testing algorithm for identifying Melakarta raagas in
Carnatic music. The results indicate that the Earth Movers Distance
produces better results than the Cosine Distance measure.
Keywords- Music; Melakarta Raaga; Cosine Distance; Earth
Movers Distance; K-NN;
I. INTRODUCTION
Performance in Indian classical music is always within a
Melakarta raaga, except for solo percussion. Melakarta raaga
is a system within which performers improvise and compose.
Melakarta raagas are often summarized by the notes they use,
though many Melakarta raagas in fact share the same notes.
Melakarta raaga recognition is a difficult task even for
humans. A Melakarta raaga is popularly defined as a specified
combination, decorated with embellishments and graceful
consonances of notes within a mode which has the power of
evoking a unique feeling distinct from all other joys and
sorrows. It possesses something of a transcendental element.
A Melakarta raaga is characterized by several attributes,
like its Vaadi-Samvaadi, Aarohana-Avrohana and Pakad [17],
besides the sequence of notes. It is of utmost importance to
note here that no two performances of the same Melakarta
raaga, even two performances by the same artist, will be
identical. A certain music piece is considered a certain
Melakarta raaga, as long as the attributes associated with it are
satisfied. This concept of Indian classical music, in that way,
is very open.
This work makes the following major contributions to the
study of musical raagas and KNN with CD and EMD. First, our
solutions are based primarily on techniques from speech
processing and pattern matching, which shows that techniques
from other domains can be purposefully extended to solve
computational problems concerning musical raagas. Secondly,
the two note transcription methods presented are novel ways to
extract notes from sample raagas of Indian classical music. This
approach has given very encouraging results.
The rest of the paper is organized as follows. Section 2
highlights some of the useful and related previous research
work in the area. The solution strategy is discussed in detail in
Section 3. The test procedures and experimental results are
presented in Section 4. Finally, Section 5 lists the
conclusions.
II. PREVIOUS WORK
Work on Carnatic music retrieval has progressed at a slow
pace compared to western music. Some work has been done on
Swara identification [1] and Singer identification [2] of
Carnatic music. For Hindustani music, work has been done on
identifying the Melakarta raaga of Hindustani music [3]. In
[3][19] the authors created an HMM with which they identified
two raagas of Hindustani music. The fundamental difference
between the Hindustani raaga pattern and the Carnatic raaga
pattern is that in Hindustani R1, R2 are present, as against
R1, R2, R3 in Carnatic. Similarly, G, D and N each have
three distinct frequencies in Carnatic music as compared to
two frequencies in Hindustani [8]. This reduces the confusion
in identifying the distinct frequencies in Hindustani music as
compared to Carnatic music. The authors did not use
polyphonic music signals and assumed that the input
music signal is a voice-only signal.
The fundamental frequency of the signal was also assumed,
and based on these features the Melakarta raaga identification
process was carried out for two Hindustani raagas. For western
music, melody retrieval is being performed by researchers.
The approach proposed in [9] is based on identifying
the change in frequency in the given query.
The query is received in the form of a hummed tune and,
based on the rise and fall in the pitch of the received query, the
melody pattern that matches the query’s rise and fall of
pitch is retrieved. Melody retrieval has also been based on
features like distance measures and gestalt principles. Our
approach is based on low-level signal features, and the
Melakarta raaga is identified by considering different
instrument signals as input to our system. In the present work
Melakarta raaga identification is done using KNN with two
different distance metrics, CD and EMD.
III. PRESENT WORK
K Nearest Neighbour (KNN) has rapidly become a widely used
technique for developing complex control systems. Melakarta
raaga recognition is one of the fascinating applications of KNN:
in many cases Melakarta raaga detection can be treated as a
rudimentary nearest neighbour problem. The problem becomes
more interesting because the content is audio: given an audio
query, find the audio closest to the query in the trained database.
The intuition underlying Nearest Neighbour Classification
is quite straightforward: examples are classified based on the
class of their nearest neighbours. It is often useful to take more
than one neighbour into account, so the technique is more
commonly referred to as k-Nearest Neighbour (k-NN)
Classification, where the k nearest neighbours are used in
determining the class. Since the training examples are needed at
run-time, i.e. they need to be in memory at run-time, it is
sometimes also called Memory-Based Classification. Because
induction is delayed to run time, it is considered a Lazy Learning
technique. Because classification is based directly on the training
examples, it is also called Example-Based Classification or
Case-Based Classification.
The basic idea is shown in Figure 1, which depicts a
3-Nearest Neighbour Classifier on a two-class problem in a
two-dimensional feature space. In this example the decision for
q_1 is straightforward: all three of its nearest neighbours are of
class O, so it is classified as an O. The situation for q_2 is a bit
more complicated, as it has two neighbours of class X and one
of class O. This can be resolved by simple majority voting or
by distance weighted voting (see below). So k-NN
classification has two stages: the first is the determination of
the nearest neighbours and the second is the determination of
the class using those neighbours. The following sections
describe the CD and EMD techniques used for raaga
classification.

Fig. 1 A simple example of 3-Nearest Neighbour Classification
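The two-stage procedure described above (find the nearest neighbours, then vote) can be sketched in a few lines of Python. This is a minimal illustration, not the system used in this paper: the function name `knn_predict`, the Euclidean distance and the sample points are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by simple majority vote among its k nearest points.

    `train` is a list of ((x, y), label) pairs. Euclidean distance is an
    assumption of this sketch; any distance function could be substituted.
    """
    # Stage 1: determine the k nearest neighbours of the query.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Stage 2: determine the class from those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class data in a two-dimensional feature space.
train = [
    ((1.0, 1.0), "O"), ((1.5, 2.0), "O"), ((2.0, 1.0), "O"),
    ((5.0, 5.0), "X"), ((6.0, 5.5), "X"), ((5.5, 4.0), "X"),
]

q1 = (1.5, 1.2)  # all three nearest neighbours are of class O
q2 = (3.8, 3.0)  # two neighbours of class X and one of class O

label_q1 = knn_predict(train, q1)
label_q2 = knn_predict(train, q2)
```

With these points, `q1` mirrors the unanimous case in Figure 1 and `q2` mirrors the mixed case that majority voting resolves.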
A. COSINE DISTANCE
Cosine similarity (CD) between two vectors x and y is defined as:
$$ CD(x, y) = \frac{x^{T} y}{\|x\| \, \|y\|} \qquad (1) $$
Cosine similarity has a special property that makes it
suitable for metric learning: the resulting similarity measure is
always within the range of -1 and +1. This property allows the
objective function to be simple and effective.
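Equation (1) translates directly into code. The sketch below is illustrative only; the function names are hypothetical, and turning the similarity into a distance as 1 minus the similarity is a common convention assumed here, since the paper does not state which conversion it uses.

```python
from math import sqrt


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y, as in Eq. (1)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """Assumed convention: distance = 1 - similarity, so it lies in [0, 2]."""
    return 1.0 - cosine_similarity(x, y)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

The three sample calls cover the extremes of the range: parallel vectors, orthogonal vectors and opposite vectors.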
B. EARTH MOVER DISTANCE (EMD)
The Earth Mover Distance is based on the solution to a
discrete optimal mass transportation problem. EMD represents
the minimum cost of moving earth from some source locations
to fill up holes at some sink locations. In other words, given
any two mass (or probability) distributions, one of them can be
viewed as a distribution of earth and the other as a distribution
of holes; the EMD between the two distributions is then the
minimum cost of rearranging the mass in one distribution to
obtain the other. In the continuous setting, this problem is
known as the Monge-Kantorovich optimal mass transfer
problem and has been well studied over the past 100 years. The
importance here is that EMD can be used to measure the
discrepancy between two multidimensional distributions.
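The earth-and-holes picture is easiest to see in one dimension. The sketch below covers only a special case and is not the general signature-based EMD used later in the paper: it assumes two histograms over the same equally spaced bins, equal total mass, and a ground distance of 1 between adjacent bins. Under those assumptions the minimum work equals the sum of the absolute differences of the running totals. The function name `emd_1d` is hypothetical.

```python
def emd_1d(p, q):
    """EMD between two 1-D histograms over the same equally spaced bins.

    Assumes equal total mass and unit distance between adjacent bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    work = 0.0
    carried = 0.0  # earth that still has to be carried past the current bin
    for earth, hole in zip(p, q):
        carried += earth - hole
        work += abs(carried)  # carrying it one bin further costs |carried|
    return work


# Hypothetical distributions: earth in p, holes in q.
shifted = emd_1d([0.6, 0.4, 0.0], [0.5, 0.3, 0.2])
far_apart = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

In the first call 0.1 units move one bin and 0.2 units move one further bin, which gives a total work of about 0.3; in the second call all the mass moves two bins.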
C. METHODOLOGY/ALGORITHM FOR MELAKARTA RAAGA RECOGNITION SYSTEM
The following methodology is used for training and testing
in Melakarta raaga recognition. First, a k-Nearest Neighbour
classifier is set up on a two-class problem in a two-dimensional
feature space, as shown in the following diagram, with raagas
on the horizontal axis and neighbours of the raaga on the
vertical axis. In this proposed approach the decision for a raaga
is straightforward: one of its nearest neighbours is of class O
and one of class X.

Fig. 2 1-Nearest Neighbour classification of Raagas
A training dataset D is made up of training samples x_i,
i ∈ [1, |D|], where x_i is a raaga. Each raaga is divided into 15
samples after eliminating unwanted frequencies (disturbances,
accompanying instruments) using a low-level filter, the Fourier
Transform of a Signal (Spft). The same process is repeated for
each raaga in database D. These samples are then trained
using Self-Organizing and Learning Vector Quantization
Nets. The grouping process is carried out by us. Each training
example is labeled with a class label y_j ∈ Y. Our objective is
to classify an unknown example raaga q. This completes the
training process. Next, the testing phase is performed using
KNN classification.
The KNN approach is carried out in two phases:
1. Determination of nearest neighbours
2. Determination of the class using those neighbours
DETERMINATION OF NEAREST NEIGHBOURS:
For each x_i ∈ D the distance between q and x_i is calculated
as follows:
$$ d(q, x_i) = \sum_{f \in F} w_f \, \delta(q_f, x_{if}) \qquad (2) $$
where x_i = trained raaga,
q = testing raaga,
f = feature (flow pattern),
w_f = weight of feature f of the raaga.
There is a wide range of possibilities for this distance
metric; a basic version for continuous and discrete attributes
would be:
$$ \delta(q_f, x_{if}) = \begin{cases} 0 & f \text{ discrete and } q_f = x_{if} \\ 1 & f \text{ discrete and } q_f \neq x_{if} \\ |q_f - x_{if}| & f \text{ continuous} \end{cases} \qquad (3) $$
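Equations (2) and (3) can be sketched together as a weighted sum of per-feature differences. The code below is illustrative; the function names, the feature values, the weights and the choice of which features count as continuous are all hypothetical and not taken from the paper's data.

```python
def delta(q_f, x_f, continuous):
    """Per-feature difference of Eq. (3)."""
    if continuous:
        return abs(q_f - x_f)
    return 0.0 if q_f == x_f else 1.0  # discrete: match or mismatch


def distance(q, x, weights, continuous_flags):
    """Weighted distance of Eq. (2), summed over all features f."""
    return sum(
        w * delta(q_f, x_f, is_cont)
        for q_f, x_f, w, is_cont in zip(q, x, weights, continuous_flags)
    )


# Hypothetical raaga descriptions with one discrete and two continuous features.
q = (0.30, "Sa", 2.0)           # testing raaga
x_i = (0.10, "Ri", 2.5)         # trained raaga
weights = (1.0, 0.5, 2.0)       # w_f for each feature
continuous_flags = (True, False, True)

d = distance(q, x_i, weights, continuous_flags)
```

For these values the three terms are roughly 1.0 * 0.2, 0.5 * 1 and 2.0 * 0.5, so `d` comes to about 1.7.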
The k nearest neighbours are selected based on this distance
metric. To determine the class of q, the majority class
among the nearest neighbours is assigned to the query. It will
often make sense to assign more weight to the nearer
neighbours in deciding the class of the query.
DETERMINATION OF THE CLASS USING THOSE NEIGHBOURS:
If the neighbours belong to more than one class, the class
can be resolved by simple majority voting or by distance
weighted voting. A fairly general technique is distance
weighted voting, where the neighbours get to vote on the class
of the query case with votes weighted by the inverse of their
distance to the query:
$$ Vote(y_j) = \sum_{c=1}^{k} \frac{1}{d(q, x_c)^{n}} \, 1(y_j, y_c) \qquad (4) $$
Thus the vote assigned to class y_j by neighbour x_c is 1
divided by the distance to that neighbour, where 1(y_j, y_c)
returns 1 if the class labels match and 0 otherwise. In the above
equation n would normally be 1, but values greater than 1 can be
used to further reduce the influence of more distant
neighbours. The application of the Cosine and EMD distance
measures to our KNN process is discussed next.
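The distance weighted vote of Eq. (4) can be sketched as follows. This is a minimal illustration; the function name, the small `eps` guard against a zero distance and the neighbour distances are assumptions for the example, with raaga names borrowed from the result tables only as labels.

```python
from collections import defaultdict


def weighted_vote(neighbours, n=1, eps=1e-12):
    """Distance weighted voting as in Eq. (4).

    `neighbours` is a list of (distance, label) pairs for the k nearest
    neighbours. `eps` avoids division by zero when the query coincides with
    a training example; it is an assumption of this sketch.
    """
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / (max(d, eps) ** n)
    winner = max(votes, key=votes.get)
    return winner, dict(votes)


# Hypothetical 3 nearest neighbours of a query raaga.
neighbours = [(0.5, "Begada"), (2.0, "Desh"), (2.5, "Desh")]

winner, votes = weighted_vote(neighbours, n=1)
winner_n2, votes_n2 = weighted_vote(neighbours, n=2)
```

Here simple majority voting would pick Desh (two of three neighbours), while the weighted vote picks Begada because its single neighbour is much closer. Raising n to 2 widens that gap further.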
1) COSINE DISTANCE MEASURE
The cosine similarity measure is the cosine of the angle
between two vectors. Suppose d_i and d_j are the paths
between a_i and a_j in instance x_i and instance x_j,
respectively. d_i and d_j are represented as vectors of term
frequencies in the vector-space model. The cosine is calculated
by using the following formula:
$$ \cos(d_i, d_j) = \frac{\sum_{t} d_i(t) \, d_j(t)}{\sqrt{\sum_{t} d_i(t)^2} \, \sqrt{\sum_{t} d_j(t)^2}} \qquad (5) $$
where t indexes the components (term frequencies) of the two
vectors.
2) EARTH MOVER DISTANCE
The Earth Mover Distance (EMD) is a distance measure
that overcomes many of the problems that arise from the
arbitrariness of binning. As the name implies, the distance is
based on the amount of effort required to convert one piece of
instrumental music to another, by analogy with transporting
mass from one distribution to another. If two pieces of
instrumental music are viewed as distributions, with one
distribution regarded as a mass of earth in space and the other
as a hole (or set of holes) in the same space, then the EMD is
the minimum amount of work involved in filling the holes with
the earth. Some researchers' analyses of the EMD argue that a
measure based on the notion of a signature is better than one
based on a histogram. A signature {s_j = (m_j, w_{m_j})} is a
set of j clusters, where m_j is a vector describing the mode of
cluster j and w_{m_j} is the fraction of features falling into
that cluster.
Thus, a signature is a generalization of the notion of a
histogram in which the boundaries and the number of partitions
are not set in advance; instead j should be ‘appropriate’ to the
complexity of the instrumental music. The example in Figure
3 illustrates this idea. The clustering can be thought of as a
quantization of the instrumental music in some frequency
space, so that the instrumental music is represented by a set of
cluster modes and their weights. In the figure the source
instrumental music is represented in a 2D space as two points
of weights 0.6 and 0.4; the target instrumental music is
represented by three points with weights 0.5, 0.3 and 0.2. In
this example the EMD is calculated as the sum of the
amounts moved (0.2, 0.2, 0.1 and 0.5) multiplied by the
distances they are moved. Calculating the EMD involves
discovering an assignment that minimizes this amount.

Fig. 3. An example of the EMD between two 2D signatures with two points
(clusters) in one signature and three in the other.
For two pieces of instrumental music described by signatures
S = {m_j, w_{m_j}}, j = 1, ..., n and Q = {p_k, w_{p_k}},
k = 1, ..., r, the work required to transfer from one to the other
for a given flow pattern F is:
$$ WORK(S, Q, F) = \sum_{j=1}^{n} \sum_{k=1}^{r} d_{jk} \, f_{jk} \qquad (6) $$
where d_{jk} is the distance between clusters m_j and p_k, and
f_{jk} is the flow between m_j and p_k that minimizes the
overall cost. Once the transportation problem of identifying the
flow that minimizes effort has been solved using dynamic
programming, the EMD is defined as:
$$ EMD(S, Q) = \frac{\sum_{j=1}^{n} \sum_{k=1}^{r} d_{jk} \, f_{jk}}{\sum_{j=1}^{n} \sum_{k=1}^{r} f_{jk}} \qquad (7) $$
EMD is expensive to compute, with cost increasing more
than linearly with the number of clusters. Nevertheless it is an
effective measure for capturing similarity between pieces of
instrumental music. In our experiments the EMD approach
gives better results than the Cosine measure.
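Equations (6) and (7) can be evaluated for a given flow as sketched below. The sketch does not search for the minimizing flow, which is the expensive step; it only checks that a flow is feasible and computes its work and normalized cost. The cluster weights and amounts moved follow the Figure 3 example (0.6 and 0.4 moved to 0.5, 0.3 and 0.2), but the cluster positions, the way the four amounts are paired with clusters and the function names are assumptions, so the resulting number is illustrative rather than the value for the figure.

```python
from math import dist


def work(flow, ground_dist):
    """Eq. (6): sum over j, k of d_jk * f_jk for a given flow."""
    return sum(
        f * d
        for flow_row, dist_row in zip(flow, ground_dist)
        for f, d in zip(flow_row, dist_row)
    )


def emd_from_flow(flow, ground_dist):
    """Eq. (7): work normalized by the total flow."""
    total_flow = sum(sum(row) for row in flow)
    return work(flow, ground_dist) / total_flow


def is_feasible(flow, w_src, w_dst, tol=1e-9):
    """Check the usual transportation constraints on a flow."""
    non_negative = all(f >= 0 for row in flow for f in row)
    rows_ok = all(sum(row) <= w + tol for row, w in zip(flow, w_src))
    cols_ok = all(sum(col) <= w + tol for col, w in zip(zip(*flow), w_dst))
    total = sum(sum(row) for row in flow)
    total_ok = abs(total - min(sum(w_src), sum(w_dst))) <= tol
    return non_negative and rows_ok and cols_ok and total_ok


# Weights from the Figure 3 example; positions are hypothetical.
src_points, src_weights = [(0, 0), (4, 0)], [0.6, 0.4]
dst_points, dst_weights = [(0, 3), (4, 3), (8, 3)], [0.5, 0.3, 0.2]

# d_jk: Euclidean distance between source cluster j and target cluster k.
ground_dist = [[dist(m, p) for p in dst_points] for m in src_points]

# Amounts moved: 0.5, 0.1, 0.2 and 0.2 (an assumed pairing of clusters).
flow = [[0.5, 0.1, 0.0],
        [0.0, 0.2, 0.2]]

feasible = is_feasible(flow, src_weights, dst_weights)
cost = emd_from_flow(flow, ground_dist)
```

With these assumed positions the four moves cover distances 3, 5, 3 and 5, so the cost of this particular flow is about 3.6. A full EMD implementation would additionally minimize over all feasible flows.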
IV. RESULTS AND DISCUSSION
The input signal is sampled at 44.1 kHz. Different raagas
are identified for the purpose of evaluating this algorithm. For
Melakarta raaga identification, seven different instruments are
considered. The signal is passed through the signal separation
algorithm and the segmentation algorithm.
The segmentation points for one input are shown in the
figures below. This is the first level of segmentation, where the
protruding lines indicate the points of segmentation. After
identifying the segmentation points, the frequency components
are determined using the HPS algorithm, and the frequency
values with the dominant energy are tabulated. Using the raaga
identification system, the confusion matrix is determined.
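The frequency estimation step can be sketched as follows, assuming that HPS refers to the harmonic product spectrum; the paper does not spell out the abbreviation or its implementation. The sketch multiplies the magnitude spectrum with copies of itself sampled at integer multiples of each bin and picks the bin with the largest product. The function names, the naive DFT, the low sample rate and the synthetic tone are all choices made to keep the example small and dependency-free.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of the first len(signal) // 2 DFT bins (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(signal)))
        for k in range(n // 2)
    ]


def hps_pitch(signal, sample_rate, harmonics=3):
    """Estimate the fundamental frequency (Hz) by harmonic product spectrum."""
    mags = magnitude_spectrum(signal)
    limit = len(mags) // harmonics  # keeps k * harmonics inside the spectrum

    def product(k):
        return math.prod(mags[k * r] for r in range(1, harmonics + 1))

    best_bin = max(range(1, limit), key=product)  # bin 0 (DC) is skipped
    return best_bin * sample_rate / len(signal)


# Synthetic tone (hypothetical data): 200 Hz fundamental plus two harmonics.
sample_rate = 2560
n = 256
tone = [
    math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(n)
]

estimated = hps_pitch(tone, sample_rate)
```

For a real recording at 44.1 kHz an FFT routine would replace the naive DFT, and each segment between detected segmentation points would be analysed separately.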
The following figures show the plot graphs and edge
detection graphs:

Fig.4 Plot Graph for Begada Raaga

Fig. 5 Plot Graph for Kharaharapriya Raaga

Fig. 6 Bhiravi raaga for Edge detection

Fig. 7 Bhiravi raaga for Song Filtered
[Plots for Figs. 4 to 7 omitted. Figs. 6 and 7 show the panels "Song & Detected Edges" (Time (s) vs. Amplitude / Edge) and "Song Filtered" (Time (s) vs. Amplitude).]

Fig. 8 Malahari raaga for Edge detection

Fig. 9 Malahari raaga for Song Filtered
[Plots for Figs. 8 and 9 omitted: panels "Song & Detected Edges" (Time (s) vs. Amplitude / Edge) and "Song Filtered" (Time (s) vs. Amplitude).]
The following are the results obtained by applying the Cosine
Distance measure.
Cosine Distance: The data is the same for Train and Sample
Table 1 Confusion Matrix: Same data for Train and Sample
Name of the Raaga   Recognized Raagas (%)
                    Begada   Vanasapathi   Sundavinodini   Desh   Hindilom
Begada              92       56            56              60     65
Vanasapathi         58       92            63              70     72
Sundavinodini       58       68            92              68     70
Desh                62       70            76              92     75
Hindilom            65       72            70              85     92

Cosine Distance: The data is different for Train and Sample
Table 2 Confusion Matrix: Different data for Train and Sample
Name of the Raaga   Recognized Raagas (%)
                    Begada   Vanasapathi   Sundavinodini   Desh   Hindilom
Sri                 90       58            58              62     65
Bhiravi             58       88            63              70     72
Abheri              58       68            78              68     70
Malahari            62       70            76              84     75
Sahana              65       72            70              85     86

The following are the results obtained by applying the EMD
Distance measure.
EMD: The data is the same for both Train and Sample
Table 3 Confusion Matrix: Same data for Train and Sample
Name of the Raaga   Recognized Raagas (%)
                    Begada   Vanasapathi   Sundavinodini   Desh   Hindilom
Begada              96       66            57              74     68
Vanasapathi         78       96            65              82     80
Sundavinodini       72       78            96              88     70
Desh                72       70            76              96     85
Hindilom            66       74            72              86     96

EMD: The data is different for Train and Sample
Table 4 Confusion Matrix: Different data for Train and Sample
Name of the Raaga   Recognized Raagas (%)
                    Begada   Vanasapathi   Sundavinodini   Desh   Hindilom
Sahana              89       68            78              62     65
Todi                68       88            63              70     72
Sri                 58       68            78              68     70
Bhiravi             72       70            76              84     75
Abheri              75       72            70              85     86
Malahari            70       75            68              78     80
V. CONCLUSION
K-NN is very simple to understand and easy to implement,
so it should be considered when seeking a solution to any
classification problem. In circumstances where an
explanation of the classifier's output is useful, K-NN can
be very effective if an analysis of the neighbours serves as the
explanation. To improve the classification process, an
EMD approach is used for fast convergence. K-NN is very
sensitive to irrelevant or redundant features because all
features contribute to the similarity and thus to the
classification. This can be ameliorated by the EMD approach
and by feature selection or feature weighted voting. The EMD
results were compared with those of the Cosine distance
measure, and EMD was observed to give better results.
REFERENCES
[1] Rajeswari Sridhar, Geetha T. V, “Swara Indentification for South Indian
Classical Music”, ICIT '06 Proceedings of the 9th International
Conference on Information Technology, IEEE Computer Society,
ISBN: 0-7695-2635-7.
[2] Youngmoo E. Kim, Brian Whitman, “Singer Identification in Popular
Music Recordings Using Voice Coding Features”,
citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.
[3] Paul G., Corine G.M., Christophe C., Vincent F., “Automatic
classification of environmental noise events by hidden Markov models”,
citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52
[4] Berger, “Some factors in the recognition of timbre”, J. Audio. Eng. Soc.
30, pp. 396-406.
[5] Clark, Milner, “Dependence of timbre on the tonal loudness produced by
musical instruments”, J. Audio. Eng. Soc. 12, pp. 28-31.
[6] Eagleson, H. W., Eagleson, O. W., “Identification of musical instruments
when heard directly and over a public-address system”, J. Acoust. Soc.
Am. 19, pp. 338-342.
[7] Strong, Clark, “Perturbations of synthetic orchestral wind instrument
tones”, J. Acoust. Soc. Am., Vol. 41, pp. 277-285.
[8] Bhatkande. V, Hindusthani Sangeet Paddhati, Sangeet Karyalaya,
1934.
[9] Schroeder. M. R, “Period histogram and product spectrum: New
methods for fundamental-frequency measurement”, Journal of the
Acoustical Society of America, vol. 43, no. 4, 1968.
[10] A. Ghias, J. Logan, D. Chamberlin and B. C. Smith, “Query by
Humming – Musical Information Retrieval in an Audio Database”,
Proc. ACM Multimedia, pp. 231-236, 1995.
[11] H. Deshpande, U. Nam and R. Singh, “MUGEC: Automatic Music
Genre Classification”, Technical Report, Stanford University, June
2001.
[12] S. Dixon, “Multiphonic Note Identification”, Proc. 19th Australasian
Computer Science Conference, Jan-Feb 2003.
[13] W. Chai and B. Vercoe, “Folk Music Classification Using Hidden
Markov Models”, Proc. International Conference on Artificial
Intelligence, June 2001.
[14] A. J. Viterbi, “Error bounds for convolutional codes and an
asymptotically optimal decoding algorithm”, IEEE Transactions on
Information Theory, Volume IT-13, pp. 260-269, April 1967.
[15] L. E. Baum, “An inequality and associated maximization technique in
statistical estimation for probabilistic functions of Markov processes”,
Inequalities, Volume 3, pp. 1-8, 1972.
[16] L. E. Baum and T. Petrie, “Statistical inference for probabilistic
functions of finite state Markov chains”, Ann. Math. Stat., Volume 37,
pp. 1554-1563, 1966.
[17] Gaurav Pandey, Chaitanya Mishra, and Paul Ipe, “TANSEN: a system
for automatic raga identification”.
[18] A. Prasad et al., “Gender Based Emotion Recognition System for Telugu
Rural Dialects using Hidden Markov Models”, Journal of Computing: An
International Journal, Volume 2, Number 6, June 2010, NY, USA, ISSN:
2151-9617.
[19] Tarakeswara Rao B et al., “A Novel Process for Melakartha Raaga
Recognition using Hidden Markov Models (HMM)”, International
Journal of Research and Reviews in Computer Science (IJRRCS),
Volume 2, Number 2, April 2011, ISSN: 2079-2557.