DATA MINING
B. M. Vidyavathi
Department of Computer Science
Bellary Engineering College
Bellary, Karnataka State, India
Vidyabm1@yahoo.co.in
Dr. C. N. Ravikumar
Department of Computer Science
Sri Jayachamarajendra College of Engineering
Mysore, Karnataka State, India
kumarcnr@yahoo.com
ABSTRACT
In recent years, many data mining applications have had to deal with high-dimensional
data (a very large number of features), which imposes a high computational cost as
well as the risk of “overfitting”. In these cases, it is common practice to adopt a
feature selection method to improve generalization accuracy. Feature selection has
therefore become a focus of data mining research wherever high-dimensional data arise.
In this paper, we propose a novel feature selection method based on a two-stage
analysis of the Fisher Ratio and Mutual Information. The two-stage analysis is carried
out in the feature domain to reject noisy feature indexes and then select the most
informative combination from those that remain. In this approach, we develop two
practical solutions that avoid the difficulties of using high-dimensional Mutual
Information in practice: clustering of feature indexes using cross Mutual Information,
and estimation of the Mutual Information based on a conditional empirical PDF. The
effectiveness of the proposed method is evaluated with an SVM classifier on datasets
from the UCI Machine Learning Repository. Experimental results show that the proposed
method outperforms several classical feature selection methods and achieves higher
prediction accuracy with a small number of features. The results are highly promising.
Keywords: Pattern recognition, feature selection, data mining, Fisher ratio, mutual
information.
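The two-stage idea described in the abstract (reject noisy feature indexes, then keep the most informative ones) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' method. The multi-class Fisher ratio (between-class over within-class scatter), the hypothetical threshold `fisher_threshold`, equal-width binning with `n_bins` to obtain an empirical PDF, and ranking each surviving feature by its own mutual information with the class label are all choices made here. The helper names `fisher_ratio`, `discretize`, `mutual_information` and `two_stage_select` are hypothetical. The paper's cross Mutual Information clustering of feature indexes and its conditional empirical PDF estimator are not reproduced.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Between-class over within-class scatter for one feature (multi-class form)."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        xs = [v for v, y in zip(values, labels) if y == c]
        between += len(xs) * (mean(xs) - overall) ** 2
        within += len(xs) * pvariance(xs)
    if within == 0:
        # Zero within-class spread: infinite ratio if classes differ, else 0.
        return math.inf if between > 0 else 0.0
    return between / within


def discretize(values, n_bins):
    """Equal-width binning (assumed) so the empirical PDF can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in nats from empirical joint and marginal frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def two_stage_select(X, labels, fisher_threshold, k, n_bins=4):
    """X is a list of samples; each sample is a list of feature values."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Stage 1 (assumed): reject feature indexes whose Fisher ratio is below the threshold.
    survivors = [j for j in range(n_features)
                 if fisher_ratio(columns[j], labels) >= fisher_threshold]
    # Stage 2 (simplified): rank survivors by MI with the class label, keep the top k.
    ranked = sorted(survivors,
                    key=lambda j: mutual_information(discretize(columns[j], n_bins), labels),
                    reverse=True)
    return ranked[:k]


X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [3.0, 5.5, 0.4],
    [3.2, 4.5, 0.8],
    [2.9, 5.0, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]
print(two_stage_select(X, labels, fisher_threshold=1.0, k=2))  # prints [0]
```

In this toy data only feature 0 separates the two classes, so it is the only index that survives the Fisher ratio stage. Scoring each feature independently ignores redundancy between features. That redundancy is what the paper targets with cross Mutual Information, and this sketch makes no attempt to model it.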