ANALYSIS OF DATA MINING VISUALIZATION TECHNIQUES USING ICA AND SOM CONCEPTS
K.S. RATHNAMALA [1], Dr. R.S.D. WAHIDA BANU [2]

[1] Research Scholar, Mother Teresa Women's University, Kodaikanal
[2] Professor & Head, Dept. of Electronics & Communication Engg., GCE.
ABSTRACT:
This research paper is about data mining (DM) and visualization methods that use independent component analysis and the self-organizing map for gaining insight into multidimensional data. A new method is presented for interactive visualization of cluster structures in a self-organizing map: by using a contraction model, the regular grid of the self-organizing map visualization is smoothly changed toward a presentation that better shows the proximities in the data space. A novel visual data mining method is proposed for investigating the reliability of estimates resulting from a stochastic independent component analysis (ICA) algorithm. Two algorithms that can be used in a general context are presented in this paper. FastICA for independent binary sources is described; the model resembles the ordinary ICA model, but summation is replaced by the Boolean operator OR and multiplication by AND. A heuristic method for estimating the binary mixing matrix is also proposed. Furthermore, the differences in the results when using different objective functions in the FastICA estimation algorithm are also discussed.
KEY WORDS:
Independent component analysis, self-organizing map, vector quantization, patterns, agglomerative hierarchical methods, time series segmentation, finding patterns by proximity, clustering validity indices, feature selection and weighting, FastICA.
1. INTRODUCTION
The tasks that are encountered within data mining research are predictive modeling, descriptive modeling, discovering rules and patterns, exploratory data analysis, and retrieval by content. Predictive modeling includes many typical tasks of machine learning, such as classification and regression. Descriptive modeling is ultimately about modeling all of the data, e.g., estimating its probability distribution. Finding a clustering, a segmentation, or an informative linear representation are common subtasks of descriptive modeling. Particular methods for discovering rules and patterns emphasize finding interesting local characteristics and patterns instead of global models.

Descriptive data mining techniques for data description can be divided roughly into three groups:
- Proximity-preserving projections for (visual) investigation of the structure of the data.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 1, January 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500
 
- Partitioning the data by clustering and segmentation.
- Linear projections for finding interesting linear combinations of the original variables, using principal component analysis and independent component analysis.

A clustering is a partition of the set of all data items C = {1, 2, ..., N} into K disjoint clusters C_1, ..., C_K, i.e.,

    C_1 ∪ C_2 ∪ ... ∪ C_K = C,   C_k ∩ C_l = ∅ for k ≠ l.
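A disjoint partition of this kind is produced, for instance, by batch K-means. The following is a minimal illustrative sketch, not the paper's own code:

```python
import numpy as np

def kmeans_partition(X, K, iters=20, seed=0):
    """Batch K-means: partition the N items into K disjoint clusters."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # Assignment step: each item joins exactly one cluster C_k.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster.
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(20, 2)) for m in (0.0, 1.0, 2.0)])
labels = kmeans_partition(X, K=3)
# Every item belongs to exactly one cluster: the C_k are disjoint and cover C.
assert len(labels) == len(X) and set(labels) <= {0, 1, 2}
```

Because every item receives exactly one label, the clusters are disjoint and their union is the whole set C, matching the definition above.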
2. SELF-ORGANIZING MAP

The basic self-organizing map is formed of K map units organized on a regular k x l low-dimensional grid, usually two-dimensional for visualization. Associated with each map unit i there are:
1. a neighborhood kernel h(d_ij, σ(t)), where the distance d_ij is measured from map unit i to the others along the grid (the output space), and
2. a codebook vector c_i that quantizes the data space (the input space).

The magnitude of the neighborhood kernel decreases monotonically with the distance d_ij. A typical choice is the Gaussian kernel.
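The grid distances and the Gaussian neighborhood kernel can be sketched as follows (an illustrative sketch, not code from the paper; the function names are our own):

```python
import numpy as np

def grid_distances(k, l):
    """Pairwise distances d_ij between the units of a k x l grid."""
    coords = np.array([(r, c) for r in range(k) for c in range(l)], dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gaussian_kernel(d, sigma):
    """h(d_ij, sigma): decreases monotonically with the grid distance."""
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

d = grid_distances(3, 4)              # K = 12 map units on a 3 x 4 grid
h = gaussian_kernel(d, sigma=1.5)
assert h.shape == (12, 12)
assert np.allclose(np.diag(h), 1.0)   # the kernel peaks at d_ii = 0
```

The kernel equals 1 at zero grid distance and decays toward 0 for distant units, which is exactly the monotone decrease the text describes.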
Batch algorithm
One possibility to implement a batch SOMalgorithm is to add an extra step to the batch K-means procedure.
In each iteration the codebook vectors are recomputed as kernel-weighted means of the data:

    c_i = Σ_j h(d_{i,b(j)}, σ(t)) x_j / Σ_j h(d_{i,b(j)}, σ(t)),

where the sums run over all data vectors x_j, b(j) is the index of the best-matching unit of x_j, and d_{i,b(j)} is the grid distance from unit i to that best-matching unit.
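One iteration of the batch update can be sketched as below. This is an illustrative implementation of the standard batch SOM step under our own naming, not the authors' code:

```python
import numpy as np

def batch_som_step(X, codebook, grid_dist, sigma):
    """One batch SOM iteration: kernel-weighted means of the data vectors."""
    # b(j): index of the best-matching (nearest) unit for every x_j.
    diffs = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    bmu = diffs.argmin(axis=1)
    # h[i, j] = h(d(i, b(j)), sigma), shape (K, N).
    h = np.exp(-grid_dist[:, bmu] ** 2 / (2.0 * sigma ** 2))
    # c_i = sum_j h_ij x_j / sum_j h_ij
    return (h @ X) / h.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
units = np.arange(4.0)                           # a 4 x 1 grid
grid_dist = np.abs(units[:, None] - units[None, :])
codebook = batch_som_step(X, rng.normal(size=(4, 2)), grid_dist, sigma=1.0)
assert codebook.shape == (4, 2)
```

With sigma driven to zero the kernel becomes an indicator of the best-matching unit, so each codebook vector reduces to the plain mean of its cluster, i.e., batch K-means.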
A relatively large neighborhood radius in the beginning gives a global ordering for the map. The kernel width σ(t) is then decreased monotonically along with the iteration steps, which increases the flexibility of the map to provide a lower quantization error in the end. If the radius is run to zero, the batch SOM becomes identical to K-means. The batch SOM is a computational short-cut version of the basic algorithm. Despite the intuitive clarity and elegance of the basic SOM, its mathematical analysis has turned out to be rather complex. This comes from the fact that there exists no cost function that the basic SOM would minimize for a probability distribution.

In general, the number of map codebook vectors governs the computational complexity of one iteration step of the SOM. If the size of the SOM is scaled linearly with the number of data vectors, the load scales to O(MN^2). On the other hand, the selection of K can be made, e.g., as suggested in, and the load then decreases to O(MN^1.5). It is suggested that the SOM Toolbox applies to small-to-medium data sets of up to, say, 10,000-100,000 records. A specific problem is that the memory consumption in the SOM Toolbox grows quadratically with the map size K.

In practice, the SOM and its variants have been successful in a considerable number of application fields and individual applications. In the context of this paper, interesting application areas close to VDM include
- Visualization and UI techniques, especially in information retrieval and in exploratory data analysis in general.
- Context-aware computing.
- Industrial applications for process monitoring and analysis.

Visualization capabilities, data and noise reduction by topologically restricted vector quantization, and the practical robustness of the SOM are of benefit to data mining. There are also methods for additional speed-ups in the SOM for especially large datasets in data mining and in document retrieval applications. The SOM framework is not restricted to the Euclidean space or real vectors. A variant of the SOM in a non-Euclidean space is presented to enhance modeling and visualizations of hierarchically distributed data; this method uses a fisheye distortion in the visualization. Self-organizing maps and similar structures for symbolic data also exist and have been applied to context-aware computation.
3. AGGLOMERATIVE HIERARCHICAL METHODS

Some clustering methods construct a model of the input data space that inherently allows classifying a new sample into one of the determined clusters; K-means partitions the input data space in this manner. Some other methods merely provide a partition of the items in the sample: the agglomerative hierarchical methods are an example of this case.

The family of partitional methods is often opposed to the hierarchical methods. Agglomerative hierarchical methods do not aim at minimizing a global criterion for partitioning, but join data items into bigger clusters in a bottom-up manner. In the beginning, all samples are considered to form their own clusters. After this, at each of N-1 steps the pair of clusters having the minimal pairwise dissimilarity δ is joined, which reduces the number of remaining clusters by one. The merging is repeated until all data is in one cluster. This gives a set of nested partitions, and a tree presentation is quite a natural way of representing the result.

Here we list the between-cluster dissimilarities δ of some of the most common agglomeration strategies: the single linkage (SL), complete linkage (CL), and average linkage (AL) criteria.
    δ_SL(C_k, C_l) = min {d_ij : i ∈ C_k, j ∈ C_l}                      (1)

    δ_CL(C_k, C_l) = max {d_ij : i ∈ C_k, j ∈ C_l}                      (2)

    δ_AL(C_k, C_l) = (1 / (N_k N_l)) Σ_{i ∈ C_k} Σ_{j ∈ C_l} d_ij        (3)

where C_k, C_l (k ≠ l) are any two distinct clusters and N_k, N_l are their sizes. SL and CL are invariant under monotone transformations of the dissimilarity. SL is reported to be noise sensitive but capable of producing elongated or chained clusters, while CL and AL
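The three linkage criteria can be computed directly from a dissimilarity matrix, as in this small illustrative sketch (our own code, not the authors'):

```python
import numpy as np

def single_linkage(D, Ck, Cl):
    """delta_SL: minimum pairwise dissimilarity between the two clusters."""
    return D[np.ix_(Ck, Cl)].min()

def complete_linkage(D, Ck, Cl):
    """delta_CL: maximum pairwise dissimilarity between the two clusters."""
    return D[np.ix_(Ck, Cl)].max()

def average_linkage(D, Ck, Cl):
    """delta_AL: mean of all N_k * N_l pairwise dissimilarities."""
    return D[np.ix_(Ck, Cl)].mean()

# Toy dissimilarity matrix for 4 items on a line; clusters {0, 1} and {2, 3}.
pts = np.array([[0.0], [1.0], [4.0], [6.0]])
D = np.abs(pts - pts.T)
Ck, Cl = [0, 1], [2, 3]
assert single_linkage(D, Ck, Cl) == 3.0    # closest pair: d(1, 2)
assert complete_linkage(D, Ck, Cl) == 6.0  # farthest pair: d(0, 3)
assert average_linkage(D, Ck, Cl) == 4.5   # (4 + 6 + 3 + 5) / 4
```

At each agglomeration step, the pair of clusters minimizing the chosen δ would be merged; the three criteria generally select different pairs, which is what gives the methods their different cluster shapes.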