Feature Selection For Traditional Malay Musical Instruments Sounds Classification Using Rough Set
Abstract— Finding the most relevant features is crucial in data mining tasks, including the problem of classifying musical instrument sounds. Various feature selection techniques have been proposed in this domain, focusing on Western musical instruments. However, the study of rough set theory for feature selection of non-Western musical instrument sounds is insufficient and still needs further exploration. Thus, in this paper, an alternative feature selection technique based on maximum attribute dependency in rough set theory is proposed for Traditional Malay musical instrument sounds. The modelling process comprises eight phases: data acquisition, sound editing, data representation, feature extraction, data discretization, data cleansing, feature selection using the proposed technique, and finally feature evaluation via classifiers. The results show that the features selected by the proposed technique are able to reduce the complexity of the process and improve the classification performance significantly.
Index Terms— Rough set theory; Attribute dependency; Feature selection; Classification; Traditional Malay musical instrument sounds.
Given an arbitrary subset X ⊆ U, in general, X might not be representable as a union of some equivalence classes in U. This means that it may not be possible to describe X precisely in AS. Instead, X may be characterized by a pair of its approximations, called the lower and upper approximations. It is here that the notion of a rough set emerges.

2.3 Set Approximations
The indiscernibility relation is used next to define approximations, the basic concepts of rough set theory. The notions of lower and upper approximations of a set are defined as follows.

Definition 2.2. Let S = (U, A, V, f) be an information system, let B be any subset of A and let X be any subset of U. The B-lower approximation of X, denoted by $\underline{B}(X)$, and the B-upper approximation of X, denoted by $\overline{B}(X)$, respectively, are defined by

$\underline{B}(X) = \{x \in U : [x]_B \subseteq X\}$ and $\overline{B}(X) = \{x \in U : [x]_B \cap X \neq \emptyset\}$.

The accuracy of approximation of X with respect to B is defined by

$\alpha_B(X) = \dfrac{|\underline{B}(X)|}{|\overline{B}(X)|}$.

If $\alpha_B(X) = 1$, X is crisp (exact) with respect to B; otherwise X is rough (imprecise) with respect to B [11]. This means that the higher the accuracy of approximation of a subset X ⊆ U, the more precise (the less imprecise) it is.

Example 2.2. Let us illustrate the above notions by an example referring to Table 2. Consider the concept "Decision", i.e., the set X(Decision = accept) = {1, 2, 3, 6}, and the set of attributes C = {Analysis, Algebra, Statistics}. The corresponding lower and upper approximations of X are

$\underline{C}(X) = \{1, 3, 6\}$ and $\overline{C}(X) = \{1, 2, 3, 5, 6\}$.

Thus, the concept "Decision" is imprecise (rough). For this case, $\alpha_C(X) = \frac{3}{5}$ is obtained. It means that the concept "Decision" can be characterized only partially by employing the attributes Analysis, Algebra and Statistics.
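To make Definition 2.2 and the accuracy measure concrete, the short Python sketch below computes the B-lower and B-upper approximations and the accuracy directly from an equivalence-class partition. Since Table 2 itself is not reproduced in this excerpt, the partition used here is a hypothetical stand-in chosen only so that the numbers of Example 2.2 come out; it is not the authors' original data.

    def lower_approximation(partition, X):
        # Union of the equivalence classes that are fully contained in X.
        return {x for c in partition if c <= X for x in c}

    def upper_approximation(partition, X):
        # Union of the equivalence classes with a non-empty intersection with X.
        return {x for c in partition if c & X for x in c}

    def accuracy(partition, X):
        # alpha_B(X) = |B-lower(X)| / |B-upper(X)|, as defined above.
        return len(lower_approximation(partition, X)) / len(upper_approximation(partition, X))

    # Hypothetical partition U/C and the concept X (Decision = accept) of Example 2.2.
    U_C = [{1}, {3, 6}, {2, 5}, {4}]
    X = {1, 2, 3, 6}
    print(lower_approximation(U_C, X))  # {1, 3, 6}
    print(upper_approximation(U_C, X))  # {1, 2, 3, 5, 6}
    print(accuracy(U_C, X))             # 0.6, i.e. alpha_C(X) = 3/5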
Another important issue in data analysis is discovering dependencies between attributes. Intuitively, a set of attributes D depends totally on a set of attributes C, denoted C ⇒ D, if all values of attributes from D are uniquely determined by values of attributes from C. In other words, D depends totally on C if there is a functional dependency between the values of D and C. The formal definition of attribute dependency is given as follows.

Definition 2.3. Let S = (U, A, V, f) be an information system and let C and D be subsets of A. D depends on C in a degree k (0 ≤ k ≤ 1), denoted $C \Rightarrow_k D$, where

$k = \dfrac{\left|\sum_{X \in U/D} \underline{C}(X)\right|}{|U|}$.   (1)

Obviously, 0 ≤ k ≤ 1. If all sets X ∈ U/D are crisp with respect to C, then k = 1. The expression $\sum_{X \in U/D} \underline{C}(X)$, called the lower approximation of the partition U/D with respect to C, is the set of all elements of U that can be uniquely classified to blocks of the partition U/D by means of C. D is said to depend fully (in a degree k) on C if k = 1; otherwise, D depends partially on C. Thus, D fully (partially) depends on C if all (some) elements of the universe U can be uniquely classified to equivalence classes of the partition U/D by employing C.

Example 2.3. In Table 2 there are no total dependencies whatsoever. If the value of the attribute Statistics for student 5 were "bad" instead of "medium", there would be a total dependency {Statistics} ⇒ {Decision}, because to each value of the attribute Statistics there would correspond a unique value of the attribute Decision. For example, for the dependency {Analysis, Algebra, Statistics} ⇒ {Decision}, k = 4/6 = 2/3 is obtained, because four out of six students can be uniquely classified as having Decision or not by employing the attributes Analysis, Algebra and Statistics.

Note that a table may be redundant in two ways. The first form of redundancy is easy to notice: some objects may have the same features. This is the case for tuples 2 and 3 of Table 2. A way of reducing the data size is to store only one representative object for every set of so-called indiscernible tuples, as in Definition 2.1. The second form of redundancy is more difficult to locate, especially in large data tables: some columns of a table may be erased without affecting the classification power of the system. This concept can also be extended to information systems in which the condition and decision attributes are not distinguished. Using the entire attribute set for describing the property is time-consuming, and the constructed rules may be difficult to understand, to apply or to verify [13]. To deal with this problem, attribute reduction is required. The objective of reduction is to reduce the number of attributes and, at the same time, preserve the property of information.

2.5 Reducts and Core
A reduct is a minimal set of attributes that preserves the indiscernibility relation. The core is the common part of all reducts. In order to express this idea more precisely, some preliminary definitions are needed. In attribute reduction, superfluous attributes are eliminated in such a way that the objects in the table can still be discerned as in the original one.

Definition 2.6. Let S = (U, A, V, f) be an information system and let B be any subset of A. B is called an independent (orthogonal) set if all its attributes are indispensable.

Definition 2.7. Let S = (U, A, V, f) be an information system and let B be any subset of A. A subset B* of B is a reduct of B if B* is independent and U/B* = U/B.

Thus, a reduct is a set of attributes that preserves the partition. This means that a reduct is a minimal subset of attributes that enables the same classification of elements of the universe as the whole set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe. While computing equivalence classes is straightforward, the problem of finding minimal reducts in information systems is NP-hard. Reducts have several important properties. One of them is the core.

Definition 2.8. Let S = (U, A, V, f) be an information system and let B be any subset of A. The intersection of all reducts of B is called the core of B, i.e.,

$\mathrm{Core}(B) = \bigcap \mathrm{Red}(B)$.

Thus, the core of B is the set of all indispensable attributes of B. Because the core is the intersection of all reducts, it is included in every reduct, i.e., each element of the core belongs to every reduct. Thus, in a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of the attributes.

Example 2.4. To illustrate the finding of the reducts and the core, the information system shown in Table 3 is considered. This information system is modified from Example 3 in [14].
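As a concrete illustration of Definitions 2.6-2.8, the brute-force sketch below enumerates the reducts and the core of a small information system. Since Table 3 is only referenced here and its contents are not shown in this excerpt, the toy table in the example is hypothetical; the exhaustive search over attribute subsets is also only feasible for a handful of attributes, in line with the NP-hardness remark above.

    from itertools import combinations

    def partition(table, attrs):
        # Equivalence classes of the indiscernibility relation on attrs.
        classes = {}
        for obj, row in table.items():
            classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
        return sorted(classes.values(), key=min)

    def reducts(table, B):
        # Subsets of B that induce the same partition as B and are minimal
        # with respect to inclusion (Definition 2.7).
        full = partition(table, B)
        candidates = [set(sub) for r in range(1, len(B) + 1)
                      for sub in combinations(B, r)
                      if partition(table, sub) == full]
        return [c for c in candidates if not any(o < c for o in candidates)]

    def core(table, B):
        # The core is the intersection of all reducts (Definition 2.8).
        rs = reducts(table, B)
        return set.intersection(*rs) if rs else set()

    # A hypothetical four-object, three-attribute information system.
    table = {1: {'A': 0, 'B': 0, 'C': 0},
             2: {'A': 0, 'B': 1, 'C': 1},
             3: {'A': 1, 'B': 0, 'C': 1},
             4: {'A': 1, 'B': 1, 'C': 0}}
    print(reducts(table, ['A', 'B', 'C']))  # [{'A', 'B'}, {'A', 'C'}, {'B', 'C'}]
    print(core(table, ['A', 'B', 'C']))     # set(): here no attribute is indispensable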
From Table 3, the partitions induced by the single attributes are

U/A = {{1, 2, 5}, {3, 4}}, U/B = {{1}, {2, 3, 4, 5}}, U/C = {{1, 2, 3, 4}, {5}} and U/D = {{1}, {2, 5}, {3, 4}}.

Based on the formula in equation (1), the degree of dependency of attribute B on attribute A, denoted $A \Rightarrow_k B$, can be calculated as follows:

$A \Rightarrow_k B, \quad k = \dfrac{\left|\sum_{X \in U/B} \underline{A}(X)\right|}{|U|} = \dfrac{|\{3,4\}|}{|\{1,2,3,4,5\}|} = 0.4$.

In the same way, the following degrees are obtained:

$B \Rightarrow_k C, \quad k = \dfrac{\left|\sum_{X \in U/C} \underline{B}(X)\right|}{|U|} = \dfrac{|\{1\}|}{|\{1,2,3,4,5\}|} = 0.2$,

$C \Rightarrow_k D, \quad k = \dfrac{\left|\sum_{X \in U/D} \underline{C}(X)\right|}{|U|} = \dfrac{|\{5\}|}{|\{1,2,3,4,5\}|} = 0.2$.

The degrees of dependency of all attributes of Table 3 are summarized in Table 6.

TABLE 6
THE DEGREE OF DEPENDENCY OF ATTRIBUTES OF TABLE 3

Attribute   Depends on (degree of dependency)   Maximum dependency
A           B (0.2), C (0.2), D (1)             1
B           A (0.4), C (0.2), D (1)             1
C           A (0.4), B (0.2), D (0.6)           0.6
D           A (0.4), B (0.2), C (0.2)           0.4

From Table 6, the attributes A, B, C and D are ranked based on the maximum degree of dependency. It can be seen that the attributes A and B have the same maximum degree of dependency. In order to select the best attributes and reduce the dimensionality, only one attribute from those with equal maximum degree of dependency is selected.
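The degrees in Table 6 can be reproduced mechanically from equation (1). The Python sketch below is a direct transcription of that formula applied to the single-attribute partitions listed above; the function names are mine, not the authors'.

    def lower(partition, X):
        # C-lower approximation of X: union of the classes contained in X.
        return {x for c in partition if c <= X for x in c}

    def dependency_degree(P_C, P_D, U):
        # Equation (1): k = |union of C-lower approximations of the D-classes| / |U|.
        pos = set().union(*(lower(P_C, X) for X in P_D))
        return len(pos) / len(U)

    U = {1, 2, 3, 4, 5}
    P = {'A': [{1, 2, 5}, {3, 4}],
         'B': [{1}, {2, 3, 4, 5}],
         'C': [{1, 2, 3, 4}, {5}],
         'D': [{1}, {2, 5}, {3, 4}]}

    print(dependency_degree(P['A'], P['B'], U))  # 0.4  (A => B, as computed above)
    print(dependency_degree(P['B'], P['C'], U))  # 0.2  (B => C)
    print(dependency_degree(P['C'], P['D'], U))  # 0.2  (C => D)
    print(dependency_degree(P['D'], P['A'], U))  # 1.0  (the maximum in row A of Table 6)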
3.5 Feature Evaluation via Classification
The performance of the best features generated in Section 3.4 is then further evaluated using two different classifiers, rough set and MLP. The classifiers are used to verify the performance of the selected features: the accuracy rate and response time achieved by each classifier are analysed to identify the effectiveness of the selected features. A high accuracy rate is important to ensure that the selected features are the most relevant features, serving a classification architecture that is able to produce good results, while a low response time allows the classifier to operate more efficiently. At the end of this phase, the results for the full features and the selected features are compared in order to identify the effectiveness of the selected features in handling the classification problem.

4 RESULTS AND DISCUSSION
The main objective of this study is to select the best features using the proposed technique. Afterwards, the performance of the selected features is assessed using two different classifiers, rough set and MLP. As mentioned, the assessment of the performance is based on the accuracy rate and the response time achieved. The results of this study are presented as follows.

4.1 Determining the Best k Value for Discretization
The original dataset, in continuous values, is discretized into categorical form in order to employ rough set theory. For that, the modified equal width binning technique is employed. In this study, k (the number of intervals) is varied from 2 to 10, and the best k value is determined by the highest classification accuracy achieved by the rough set classifier. The finding reveals that k = 3 is able to generate the highest classification accuracy, up to 99%, as shown in Table 7. This k value is then applied in the proposed feature selection technique to identify the best features for the dataset.
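Section 4.1 uses a modified equal width binning technique whose exact modification is not detailed in this excerpt. As a baseline, the sketch below shows plain equal-width binning with k intervals; treat it as a hypothetical stand-in for the authors' variant rather than their actual procedure.

    def equal_width_bins(values, k):
        # Split [min, max] into k equal-width intervals and map each value
        # to its interval index 0 .. k-1.
        lo, hi = min(values), max(values)
        width = (hi - lo) / k or 1.0  # guard against a constant feature column
        return [min(int((v - lo) / width), k - 1) for v in values]

    # Discretizing one hypothetical feature column with the best k found (k = 3):
    feature = [0.12, 0.40, 0.43, 0.71, 0.98]
    print(equal_width_bins(feature, k=3))  # [0, 0, 1, 2, 2]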
4.2 Eliminating Irrelevant Features
The dataset is represented in decision table form as
S = (U, A ∪ {d}, V, f). There are 1116 instances in the universe U, with the family of the instruments as the decision attribute d and all other attributes shown in Table 5 as the set of condition attributes A. The distribution of the instances over the classes is uniform, with no missing values in the data. From the data cleansing step, it is found that {MMFCC1, SMFCC1} is the dispensable (irrelevant) set of features. This means that the number of relevant features is 35 out of the 37 original full features. Thus, the relevant features can be represented as A − {MMFCC1, SMFCC1}.

4.3 Finding the Best Features
In this experiment, the proposed technique is employed to identify the best features for Traditional Malay musical instrument sounds classification. As demonstrated in Table 8, all 35 relevant features are ranked in descending order of the maximum degree of attribute dependency. From the table, it is interesting to see that some of the features adopted in this study are redundant. In order to reduce the dimensionality of the dataset, only one of each group of redundant features is selected. It is revealed that the proposed feature selection technique is able to successfully select the best 17 features out of the 35 available features. The best selected features are given in Table 9.
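The selection step of Section 4.3 can be pictured as follows: sort the features by their maximum degree of dependency and, where several features share exactly the same degree (the redundant features mentioned above), keep only one representative, discarding degree-0 features along the way. The sketch below encodes this reading of the text on a few rows of Table 8; treating equal degrees as the redundancy criterion is my assumption, not a procedure stated by the authors.

    def select_features(ranking):
        # ranking: (feature, maximum degree of dependency) pairs, as in Table 8.
        seen, selected = set(), []
        for feat, deg in sorted(ranking, key=lambda t: t[1], reverse=True):
            if deg > 0 and deg not in seen:  # skip repeats and zero-dependency features
                seen.add(deg)
                selected.append(feat)
        return selected

    # A few rows of Table 8 as input:
    ranking = [('SMFCC10', 0.108423), ('SMFCC11', 0.108423),
               ('SMFCC3', 0.087814), ('SMFCC5', 0.087814),
               ('STDRMS', 0.021505), ('MEANZCR', 0.0)]
    print(select_features(ranking))  # ['SMFCC10', 'SMFCC3', 'STDRMS']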
4.4 The Performance of the Selected Features
In this study, two datasets, consisting of the full features and the selected features (generated by the proposed technique), are used as input to classify the Traditional Malay musical instrument sounds into four families: membranophone, idiophone, chordophone and aerophone. This approach is meant to assess the performance of the selected features compared to the full features. The performance is determined by two factors, the accuracy rate and the response time. For that, two different classifiers, rough set and MLP, are exploited. From Table 10, there is a slight improvement in both accuracy rate and response time with the best 17 features compared to the 35 full features when using the MLP classifier, and the overall performance of this classifier is quite satisfactory, up to 95%. With the rough set classifier, the accuracy rate of 99% achieved by the selected features is identical to that of the full features; however, it is notable that the response time is up to 80% faster for the selected features than for the full features. These results show that the proposed feature selection technique is able to select the best features for Traditional Malay musical instrument sounds and improve the classification performance, especially the response time.
TABLE 7
FINDING THE BEST K VALUE FOR DISCRETIZATION

k                             2     3     4     5     6     7     8     9     10
Classification accuracy (%)   93.6  98.9  98.6  98.4  98.4  98.3  98.3  98.3  98.3
TABLE 8
FEATURE RANKING USING THE PROPOSED METHOD

No.  Feature    Maximum degree of dependency
34   SMFCC10    0.108423
35   SMFCC11    0.108423
27   SMFCC3     0.087814
29   SMFCC5     0.087814
11   STDFLUX    0.077061
21   MMFCC10    0.077061
20   MMFCC9     0.074373
6    MEANC      0.065412
19   MMFCC8     0.065412
18   MMFCC7     0.056452
28   SMFCC4     0.056452
7    STDC       0.042115
8    MEANB      0.042115
9    STDB       0.042115
13   MMFCC2     0.031362
16   MMFCC5     0.031362
17   MMFCC6     0.031362
5    STDRMS     0.021505
10   MEANFLUX   0.011649
2    MEANZCR    0
4    MEANRMS    0
14   MMFCC3     0
15   MMFCC4     0
26   SMFCC2     0
TABLE 9
THE BEST SELECTED FEATURES

TABLE 10
COMPARISON OF FEATURE CAPABILITIES VIA CLASSIFICATION PERFORMANCE
5 CONCLUSION AND FUTURE WORKS

In this study, an alternative feature selection technique using rough set theory, based on the maximum dependency of attributes, is proposed for Traditional Malay musical instrument sounds. A non-benchmark dataset of Traditional Malay musical instrument sounds is utilized. Two categories of feature schemes, perception-based features and MFCCs, consisting of 37 attributes in total, are extracted. Afterwards, the dataset is discretized into 3 categorical values. The proposed technique is then adopted for feature selection through feature ranking and dimensionality reduction. Finally, two classifiers, rough set and MLP, are employed to evaluate the performance of the selected features in terms of the accuracy rate and response time produced. Overall, the findings show that the relevant features selected by the proposed model are able to reduce the complexity of the process and produce the highest classification accuracy. Future work will investigate the effectiveness of the proposed technique in other musical instrument sound domains and apply different types of classifiers to validate the performance of the selected features.

ACKNOWLEDGEMENT

This work was supported by Universiti Tun Hussein Onn Malaysia (UTHM).

REFERENCES

[1] Liu, M., and Wan, C.: Feature Selection for Automatic Classification of Musical Instrument Sounds. In Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '01, 247−248 (2001)
[2] Deng, J.D., Simmermacher, C., and Cranefield, S.: A Study on Feature Analysis for Musical Instrument Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (2), 429−438 (2008)
[3] Benetos, E., Kotti, M., and Kotropoulos, C.: Musical Instrument Classification using Non-Negative Matrix Factorization Algorithms and Subset Feature Selection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, 5, 221−224 (2006)
[4] Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences 11, 341−356 (1982)
[5] Banerjee, M., Mitra, S., and Anand, A.: Feature Selection using Rough Sets. In: Multi-Objective Machine Learning, Studies in Computational Intelligence 16, 3−20 (2006)
[6] Modrzejewski, M.: Feature Selection using Rough Sets Theory. In Proceedings of the European Conference on Machine Learning, LNCS 667, 213−226 (1993)
[7] Li, H., Zhang, W., Xu, P., and Wang, H.: Rough Set Attribute Reduction in Decision Systems. In Proceedings of the 7th International Conference on Artificial Immune Systems, LNCS 5132, 132−141 (2008)
[8] Herawan, T., Mustafa, M.D., and Abawajy, J.H.: Rough Set Approach for Selecting Clustering Attribute. Knowledge-Based Systems 23 (3), 220−231 (2010)
[9] Senan, N., Ibrahim, R., Nawi, N.M., and Mokji, M.M.: Feature Extraction for Traditional Malay Musical Instruments Classification. In Proceedings of the International Conference of Soft Computing and Pattern Recognition, SOCPAR '09, 454−459 (2009)
[10] Palaniappan, S., and Hong, T.K.: Discretization of Continuous Valued Dimensions in OLAP Data Cubes. International Journal of Computer Science and Network Security 8, 116−126 (2008)
[11] Pawlak, Z.: Rough Sets and Fuzzy Sets. Fuzzy Sets and Systems 17, 99−102 (1985)
[12] Pawlak, Z., and Skowron, A.: Rudiments of Rough Sets. Information Sciences 177 (1), 3−27 (2007)
[13] Zhao, Y., Luo, F., Wong, S.K.M., and Yao, Y.Y.: A General Definition of an Attribute Reduct. LNAI 4481, 101−108 (2007)
[14] Pawlak, Z.: Rough Classification. International Journal of Human-Computer Studies 51, 369−383 (1999)
[15] Warisan Budaya Malaysia: Alat Muzik Tradisional, http://malaysiana.pnm.my/kesenian/Index.htm
[16] Shriver, R.: Webpage, www.rickshriver.net/hires.htm
[17] Senan, N., Ibrahim, R., Nawi, N.M., Mokji, M.M., and Herawan, T.: The Ideal Data Representation for Feature Extraction of Traditional Malay Musical Instrument Sounds Classification. To appear in: ICIC 2010, LNCS (2010)
Norhalina Senan received her B.Sc. and M.Sc. degrees in Computer Science from Universiti Teknologi Malaysia. She is currently pursuing her Ph.D. degree in Feature Selection of Traditional Malay Musical Instruments Sounds Classification using Rough Set at Universiti Tun Hussein Onn Malaysia. She is a lecturer at the Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia. Her research areas include data mining, multimedia and rough sets.

Rosziati Ibrahim is with the Software Engineering Department, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM). She obtained her PhD in Software Specification from the Queensland University of Technology (QUT), Brisbane, and her MSc and BSc (Hons) in Computer Science and Mathematics from the University of Adelaide, Australia. Her research area is in Software Engineering, covering Software Specification, Software Testing, Operational Semantics, Formal Methods, Data Mining, Image Processing and Object-Oriented Technology.

Iwan Tri Riyadi Yanto received his B.Sc. degree in Mathematics from Universitas Ahmad Dahlan, Yogyakarta. He is a Master's candidate in Data Mining at Universiti Tun Hussein Onn Malaysia (UTHM). His research areas include data mining, KDD, and numerical computation.

Tutut Herawan received his B.Ed. and M.Sc. degrees in Mathematics from Universitas Ahmad Dahlan and Universitas Gadjah Mada, Yogyakarta, respectively. He obtained his Ph.D. from Universiti Tun Hussein Onn Malaysia. Currently, he is a senior lecturer in the Computer Science Program, Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang (UMP). He has published more than 40 research papers in journals and conferences. His research areas include data mining, KDD, and rough and soft set theories.