18 views

Uploaded by idescitation

Clustering analysis method is one of the main
analytical methods in data mining; the method of clustering
algorithm will influence the clustering results directly. This
paper discusses the standard k-means clustering algorithm
and analyzes the shortcomings of standard k-means
algorithm. This paper also focuses on web usage mining to
analyze the data for pattern recognition. With the help of k-
means algorithm, pattern is identified.

save

- Deterministic Algorithm With Agglomerative Heurist
- Tumor Detection using Particle Swarm Optimization to Initialize Fuzzy C-means
- Px c 3874611
- A Prototype for Intrusion Detection in Wireless Sensor Networks Using Data Mining Methods
- A Survey on Hard Computing Techniques based on Classification Problem
- 06991841
- paper 1 cloud security using NN.pdf
- 1506877
- Bioinformatics problems
- Foundations 6 Data Minimg
- Modifying Insurance Rating Territories via Clustering
- Vedaldi sift.pdf
- Cluster by Evan
- Data Integration Face to Face Meeting Budapest 24 -25 Schedule v1w
- [TAM-12]Top-down vs Bottom-up Methods of Linkage for Asymmetric Agglomerative Clustering
- FUZZY CLUSTERING APPROACH TO STUDY THE DEGREE OF AGGRESSIVENESS IN YOUTH VIOLENCE
- DWDM
- Couple Stady
- AUTOMATIC UNSUPERVISED DATA CLASSIFICATION USING JAYA EVOLUTIONARY ALGORITHM
- Applied Data Analysis (With SPSS)
- Effect of Electrostatic Discharge on Analog Circuits
- Simulation based Study of FPGA Controlled Single- Phase Matrix Converter with Different Types of Loads
- Testing of Boost Rectification of Back-to-Back Converter for Doubly Fed Induction Generator
- A Low Cost Wireless Interfacing Device between PS/2 Keyboard and Display
- Isolated Flyback Converter Designing, Modeling and Suitable Control Strategies
- Computational Analysis of 3-Dimensional Transient Heat Conduction in the Stator of an Induction Motor during Reactor Starting using Finite Element Method
- 581-517-525DNA based Password Authentication Scheme using Hyperelliptic Curve Cryptography for Smart Card
- Microcontroller based Digital Trigger Circuit for Converter
- Selective Harmonic Elimination of Three Phase Bidirectional AC Buck Converter for Power Quality Improvement
- Delta Domain Modeling and Identification Using Neural Network
- Congestion Management using Optimal Choice and Allocation of FACTS Controllers
- Simulation and Analysis of Electric Control System for Metal Halide High Intensity Discharge Lamps
- Study of Logic Gates Using Quantum Cellular Automata
- Low Power Full Adder using 9T Structure
- Modified Adaptive Lifting Structure Of CDF 9/7 Wavelet With Spiht For Lossy Image Coding
- Design of Switched Reluctance Motor for Three Wheeler Electric Vehicle
- Comparison of CMOS Current Mirror Sources
- Design of Series Feed Microstrip Patch Antenna Array using HFSS Simulator
- MTJ-Based Nonvolatile 9T SRAM Cell
- Two Bit Arithmetic Logic Unit (ALU) in QCA
- Static-Noise-Margin Analysis of Modified 6T SRAM Cell during Read Operation
- Three Degree of Freedom Control of Inverted Pendulum by using Repulsive Magnetic Levitation
- Power Quality Parameters Measurement Techniques
- A Three-to-Five-Phase Matrix Converter BasedFive- Phase Induction Motor Drive System

You are on page 1of 3

on Advances in Computer Science and Application 2013

**K-means Clustering Method for the Analysis of Log Data
**

Prof. Rudresh Shirwaikar1 and Chaitali Bhandari2

1

Shree Rayeshwar Institute of Engg. And Information Technology (Dept. of Information Technology), Goa, India Email: rudreshshirwaikar@gmail.com 2 Shree Rayeshwar Institute of Engg. And Information Technology (Dept. of Information Technology), Goa, India Email: chaitalibhandari9@gmail.com

Abstract —Clustering analysis method is one of the main analytical methods in data mining; the method of clustering algorithm will influence the clustering results directly. This paper discusses the standard k-means clustering algorithm and analyzes the shortcomings of standard k-means algorithm. This paper also focuses on web usage mining to analyze the data for pattern recognition. With the help of kmeans algorithm, pattern is identified. Index terms — Pattern recognition, web mining, k-means clustering, nearest neighbour, pattern recovery.

Update Step Calculate the new means to be the centroids of the observations in the new clusters (“Eq. (3)”). (3) The algorithm has converged when the assignments no longer change. Commonly used initialization methods are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean to be the centroid of the cluster’s randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the data set. According to Hamerly, the Random Partition method is generally preferable for algorithms such as the k-harmonic means and fuzzy k-means. For expectation maximization and standard k-means algorithms, the Forgy method of initialization is preferable. III. .DEMONSTRATION OF STANDARD ALGORITHM Following steps shows the demonstration of k-means algorithm: 1.) k initial “means” are randomly generated within the data domain. 2.) k clusters are created by associating every observation with the nearest mean. 3.) The centroid of each of the k-clusters becomes the new mean. 4.) Steps 2 and 3 are repeated until convergence has been reached. Fig.1. to fig 4 demonstrates above steps: Input for the algorithm includes data from log file shown in table 1. We consider x-axis as patterns such as p1,p2,p3 etc for pattern 1,pattern 2,pattern 3 etc respectively and then we proceed with the algorithm. The output of this algorithm will consists of clusters of patterns as specified by the user. The input table is shown in table I. This input is obtained by performing preprocessing steps on the log file shown in fig. 2. IV. ADVANTAGES AND DISADVANTAGES Strength of K-means algorithm:

I. INTRODUCTION Clustering problems arise in many different applications, such as data mining, web mining, data compression, pattern recognition pattern classification etc. The notion of what constitutes a good cluster depends on the application and there are many methods for finding clusters subject to various criteria’s. Among clustering formulations that are based minimizing a formal objective function, perhaps the most widely used and studied is k-means clustering. K-mean clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k d” n). See “Eq. (1)”. S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS): (1) Where µi is the mean of points in Si. II. STANDARD ALGORITHM The most common algorithm uses iterative refinement technique. Given an initial set of k means m1,m2…, the algorithm proceeds by alternatig between two steps: Assignment Step Assign each observation to the cluster whose mean is closest to it (“Eq. (2)”). (2) © 2013 ACEEE DOI: 03.LSCS.2013.3.578 56

Poster Paper Proc. of Int. Conf. on Advances in Computer Science and Application 2013

**Fig 4. Step 4 Fig 1. Step1 (k=2) TABLE I. INPUT
**

Pattern

FOR ALFORITHM

Time minutes

in

P1 P2 P3 P4 P5

120 50 40 12 37

Fig 2.Step 2

Fig 5. Log data

CONCLUSIONS

Fig 3.Step 3

The method is relatively scalable. Efficient in processing large data sets. Complexity of algorithm is O(nkt),where n is the total number of objects, k is the number of clusters, and t is the number of iterations. Weakness of K-means algorithm: Need to specify k, the number of clusters, in advance. Unable to handle noisy data and outliers. © 2013 ACEEE DOI: 03.LSCS.2013.3.578 57

An efficient form of k-means algorithm is provided. The algorithm is easy to implement and relatively scalable as complexity of algorithm is O. Only problem with the algorithm is number of clusters have to be specified in advance. The output of algorithm depends upon the no. of clusters specified. For k=2,we get following clusters: Cluster1: P1

Poster Paper Proc. of Int. Conf. on Advances in Computer Science and Application 2013 P1 P2 P4 Cluster2: P2 P2 P3 P5 P1 ACKNOWLEDGEMENT First, I thank my project Guide and Advisor Mr.Rudresh Shirwaikar, for enlightening me with the subject of data mining throughout. He has always listened to my ideas and supported them and made me think about the subject with a data miner’s perspective. Finally, I am grateful to Shree Rayeshwar Institute of Engineering And Information Technology for supporting me in Information Technology program. REFERENCES

[1] Hai Dong, Farookh Khadeer Hussain,Elizabeth Chang “A Survey in Traditional Information Retrieval Models”, 2008 Second IEEE International Conference on Digital Ecosystems and Technologies. [2] Determining Usage Patterns on Web Data ,MS Project Report, “Web Usage Mining: Data Preprocessing, Pattern Discovery and Pattern Analysis on the Web Data” by Anand Sharma. [3] Han, J. and Kamber, M. Data Mining: Concepts and Techniques, 2000 (Morgan Kaufmann, San Francisco,California). [4] J. Breckling, Ed., the Analysis of Directional Time Series: Applications to Wind Speed and Direction, ser. Lecture Notes in Statistics. Berlin, Germany: Springer, 1989, vol. 61. [5] S. Zhang, C. Zhu, J. K. O. Sin, and P. K. T. Mok, “A novel ultrathin elevated channel low-temperature poly-Si TFT,” IEEE Electron Device Lett., vol. 20, pp. 569–571, Nov. 1999. [6] E. Schikuta. Grid Clustering: An Efficient Hierarchical Clustering Method for Very Large Data Sets. Proc. 13 th Int’l.Conference on Pattern Recognition, 2, 1996802.11, 1997. [7] K. Alsabti, S. Ranka, and V. Singh. An Efûcient KMeansClusteringAlgorithm. http://www.cise.uû.edu/ranka/ ,1997. [8] IEEE Xplore - Research on k-means Clustering Algorithm. http://ieeexplore.ieee.org/xpls/abs_all. jsp?arnumber=5453745 [9] “k-means clustering - Wikipedia, the free encyclopedia.”http:/ /en.wikipedia.org/wiki/Kmeans_clustering [10] Zhang T,Ramakrishnan R,Livny M.BIRCH:An efficient data clustering method for very large databases.In:Jagadish HV,Mumick IS,eds.Proc.of the 1996 ACM SIGMOD Int’l Conf.on Management of Data.Montreal:ACM Press,1996.103-114. [11] Hinneburg A,Keim D.An efficient approach to clustering in large multimedia databases with noise.In:Agrawal R,Stolorz PE,Piatetsky- Shapiro G,eds.Proc.of the 4th Int’l Conf.on Knowledge Discovery and Data Mining(KDD’98).New York:AAAIPress,1998.58-65 [12] Ding C, He X. K-Nearest-Neighbor in data clustering: Incorporating local information into global optimization. In: Proc. of the ACM Symp. on Applied Computing. Nicosia: ACM Press, 2004. 584-589. [13] Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. Tucson, 1997. 146-151. [14] Fred ALN, LeitaÞo JMN. Partitional vs hierarchical clustering using a minimum grammar complexity approach.

© 2013 ACEEE DOI: 03.LSCS.2013.3.578

58

- Deterministic Algorithm With Agglomerative HeuristUploaded byMV Tejaxlrai
- Tumor Detection using Particle Swarm Optimization to Initialize Fuzzy C-meansUploaded byInternational Journal for Scientific Research and Development - IJSRD
- Px c 3874611Uploaded byAdi Dian
- A Prototype for Intrusion Detection in Wireless Sensor Networks Using Data Mining MethodsUploaded byEditor IJRITCC
- A Survey on Hard Computing Techniques based on Classification ProblemUploaded byInternational Journal for Scientific Research and Development - IJSRD
- 06991841Uploaded byFlorin Nastasa
- paper 1 cloud security using NN.pdfUploaded byAmrYassin
- 1506877Uploaded byDe Bu
- Bioinformatics problemsUploaded byRitajit Majumdar
- Foundations 6 Data MinimgUploaded bynoorahmed1991
- Modifying Insurance Rating Territories via ClusteringUploaded byganamaneni
- Vedaldi sift.pdfUploaded byBraulio Otavalo
- Cluster by EvanUploaded bymailtorohitaqua
- Data Integration Face to Face Meeting Budapest 24 -25 Schedule v1wUploaded byMelinda Tokai
- [TAM-12]Top-down vs Bottom-up Methods of Linkage for Asymmetric Agglomerative ClusteringUploaded byrizalespe
- FUZZY CLUSTERING APPROACH TO STUDY THE DEGREE OF AGGRESSIVENESS IN YOUTH VIOLENCEUploaded byIntegrated Intelligent Research
- DWDMUploaded byrajesh_34
- Couple StadyUploaded byOlesea Caraion
- AUTOMATIC UNSUPERVISED DATA CLASSIFICATION USING JAYA EVOLUTIONARY ALGORITHMUploaded byacii journal
- Applied Data Analysis (With SPSS)Uploaded byБошко Матовић

- Effect of Electrostatic Discharge on Analog CircuitsUploaded byidescitation
- Simulation based Study of FPGA Controlled Single- Phase Matrix Converter with Different Types of LoadsUploaded byidescitation
- Testing of Boost Rectification of Back-to-Back Converter for Doubly Fed Induction GeneratorUploaded byidescitation
- A Low Cost Wireless Interfacing Device between PS/2 Keyboard and DisplayUploaded byidescitation
- Isolated Flyback Converter Designing, Modeling and Suitable Control StrategiesUploaded byidescitation
- Computational Analysis of 3-Dimensional Transient Heat Conduction in the Stator of an Induction Motor during Reactor Starting using Finite Element MethodUploaded byidescitation
- 581-517-525DNA based Password Authentication Scheme using Hyperelliptic Curve Cryptography for Smart CardUploaded byidescitation
- Microcontroller based Digital Trigger Circuit for ConverterUploaded byidescitation
- Selective Harmonic Elimination of Three Phase Bidirectional AC Buck Converter for Power Quality ImprovementUploaded byidescitation
- Delta Domain Modeling and Identification Using Neural NetworkUploaded byidescitation
- Congestion Management using Optimal Choice and Allocation of FACTS ControllersUploaded byidescitation
- Simulation and Analysis of Electric Control System for Metal Halide High Intensity Discharge LampsUploaded byidescitation
- Study of Logic Gates Using Quantum Cellular AutomataUploaded byidescitation
- Low Power Full Adder using 9T StructureUploaded byidescitation
- Modified Adaptive Lifting Structure Of CDF 9/7 Wavelet With Spiht For Lossy Image CodingUploaded byidescitation
- Design of Switched Reluctance Motor for Three Wheeler Electric VehicleUploaded byidescitation
- Comparison of CMOS Current Mirror SourcesUploaded byidescitation
- Design of Series Feed Microstrip Patch Antenna Array using HFSS SimulatorUploaded byidescitation
- MTJ-Based Nonvolatile 9T SRAM CellUploaded byidescitation
- Two Bit Arithmetic Logic Unit (ALU) in QCAUploaded byidescitation
- Static-Noise-Margin Analysis of Modified 6T SRAM Cell during Read OperationUploaded byidescitation
- Three Degree of Freedom Control of Inverted Pendulum by using Repulsive Magnetic LevitationUploaded byidescitation
- Power Quality Parameters Measurement TechniquesUploaded byidescitation
- A Three-to-Five-Phase Matrix Converter BasedFive- Phase Induction Motor Drive SystemUploaded byidescitation