International Journal of Computer Science and Information Security, 2011
The important corresponding supervised approach
is Linear Discriminant Analysis (LDA).
The primary purpose of my work is to develop an
efficient method of feature extraction for reducing the
dimension. For this I have worked on a new approach of k-means
clustering for feature extraction. This method extracts the
features on the basis of the cluster centers.
In the field of data mining, feature extraction has many applications, such as dimension reduction, pattern classification, data visualization, and automatic exploratory data analysis. Extracting the proper features from a rich data set is the major issue. Much work has been done before to reduce dimension, mainly using PCA and LDA, and the identification of important attributes or features has been a major area of research for the last several years. This work aims to give a new solution to some long-standing necessities of feature extraction and to work with a new approach to dimension reduction. PCA finds a set of the most representative projection vectors such that the projected samples retain the most information about the original samples. LDA uses the class information and finds a set of vectors that maximize the between-class scatter while minimizing the within-class scatter. Clustering is another technique, which groups the different objects present in the data set. The cluster centers can also be used to find the necessary features of the data set. In my present work I use this new approach to extracting features.
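The idea of extracting features from cluster centers can be sketched as follows. This is a minimal illustration, not the exact procedure of this work: it assumes plain k-means (Lloyd's algorithm) and assumes that the extracted features are the distances from each sample to the k cluster centers, so that a sample with many attributes is reduced to k values. The function names `kmeans` and `center_distance_features` and the synthetic two-blob data are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: initialise the centers with k randomly chosen samples.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center = attribute-wise mean of the cluster members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the old center if a cluster becomes empty.
                new_centers.append(centers[j])
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers, assign(points, centers)

def center_distance_features(points, centers):
    """Assumed feature map: each sample becomes its k distances to the centers."""
    return [[math.sqrt(sq_dist(p, c)) for c in centers] for p in points]

# Synthetic data: two well-separated groups with 10 attributes each.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.5) for _ in range(10)] for _ in range(50)]
blob_b = [[rng.gauss(5.0, 0.5) for _ in range(10)] for _ in range(50)]
X = blob_a + blob_b

centers, labels = kmeans(X, k=2)
Z = center_distance_features(X, centers)
print(len(X[0]), "attributes reduced to", len(Z[0]), "features")
```

Unlike LDA, this sketch never looks at class labels, and unlike PCA the new features are not linear projections. Which of these trade-offs matters depends on the data set.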
An Overview of Data Mining and
Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers.
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group and so may be considered as a form of data compression. Although classification is an effective means for distinguishing groups or classes of objects, it requires the often costly collection and labeling of a large set of training tuples or patterns, which the classifier uses to model each group. It is often more desirable to proceed in the reverse direction: first partition the set of data into groups based on data similarity (e.g., using clustering), and then assign labels to the relatively small number of groups. Additional advantages of such a clustering-based process are that it is adaptable to changes and helps single out useful features that distinguish different groups.
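The "reverse direction" described above can be sketched in a few lines. The example assumes that a clustering step has already produced one cluster id per object. The cluster ids, the label names, and the helper `propagate_labels` are hypothetical and serve only to show that a few group-level labels can cover many objects.

```python
def propagate_labels(cluster_ids, cluster_to_label):
    """Give every object the label that was assigned to its cluster."""
    return [cluster_to_label[c] for c in cluster_ids]

# Hypothetical output of a clustering step: one cluster id per object.
cluster_ids = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]

# A human expert labels the three groups instead of the ten objects.
cluster_to_label = {0: "low", 1: "medium", 2: "high"}

object_labels = propagate_labels(cluster_ids, cluster_to_label)
print(len(cluster_to_label), "manual labels cover", len(object_labels), "objects")
```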
As a branch of statistics, cluster analysis has been extensively studied for many years, focusing mainly on distance-based cluster analysis. Cluster analysis tools based on k-means, k-medoids, and several other methods have also been built into many statistical analysis software packages.
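The two distance-based methods named above differ in how a cluster is represented: k-means uses the mean of the members, while k-medoids uses an actual member (the medoid) whose total distance to the other members is smallest. The following sketch contrasts the two representatives on a small hypothetical cluster that contains one outlier. The function names and the data are illustrative assumptions.

```python
import math

def mean_center(points):
    """k-means style representative: attribute-wise mean of the members."""
    return [sum(col) / len(points) for col in zip(*points)]

def medoid(points):
    """k-medoids style representative: the member with the smallest
    total Euclidean distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Four nearby objects plus one outlier.
cluster = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0], [10.0, 10.0]]

print("mean:  ", mean_center(cluster))
print("medoid:", medoid(cluster))
```

The mean is pulled toward the outlier, whereas the medoid stays on one of the four nearby objects.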
In machine learning, clustering is an example of unsupervised learning. Unlike classification, clustering and unsupervised learning do not rely on predefined classes and class-labeled training examples. For this reason, clustering is a form of learning by observation, rather than learning by examples. In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases.