Uploaded by Shubham Gupta

lab assignment

© All Rights Reserved


PART A

(PART A: TO BE REFERRED BY STUDENTS)

Experiment No.05

Aim: Implementation of the two-dimensional K-means algorithm for clustering.

Prerequisites: C/C++/Java Programming

Learning Outcomes:

Concepts of K-means Algorithm and Clustering.

Theory:

Algorithm:

Example:

Problem Statement: Given {2,4,10,12,3,20,30,11,25}, k=2. Randomly assign means: m1=3, m2=4

K1={2,3}, K2={4,10,12,20,30,11,25}, m1=2.5,m2=16

K1={2,3,4},K2={10,12,20,30,11,25}, m1=3,m2=18

K1={2,3,4,10},K2={12,20,30,11,25}, m1=4.75,m2=19.6

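K1={2,3,4,10,11,12}, K2={20,30,25}, m1=7, m2=25

Reassigning every value to the nearer of m1=7 and m2=25 leaves both clusters unchanged, so the algorithm stops here.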
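The hand trace above can be reproduced with a short program. The sketch below is a minimal one-dimensional, two-cluster version written only for illustration; the class name `OneDimKMeans`, the method `run` and the rule that a tie goes to K1 are assumptions, not part of the lab handout. It alternates an assignment step and a mean-update step and stops when an assignment pass moves no value between clusters.

```java
import java.util.Arrays;

// Minimal 1-D, two-cluster k-means sketch (hypothetical names).
// Assumption: a value equidistant from both means is put in K1.
class OneDimKMeans {

    // Returns the final means {m1, m2}, printing the means after each update.
    static double[] run(double[] data, double m1, double m2) {
        int[] label = new int[data.length];
        Arrays.fill(label, -1);                 // -1 = not yet assigned
        while (true) {
            boolean changed = false;
            // Assignment step: each value joins the cluster with the nearer mean.
            for (int i = 0; i < data.length; i++) {
                int c = Math.abs(data[i] - m1) <= Math.abs(data[i] - m2) ? 0 : 1;
                if (c != label[i]) {
                    label[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                return new double[] {m1, m2};   // assignments stable: converged
            }
            // Update step: each mean becomes the average of its cluster.
            double s1 = 0, s2 = 0;
            int n1 = 0, n2 = 0;
            for (int i = 0; i < data.length; i++) {
                if (label[i] == 0) { s1 += data[i]; n1++; }
                else               { s2 += data[i]; n2++; }
            }
            if (n1 > 0) m1 = s1 / n1;           // an empty cluster keeps its old mean
            if (n2 > 0) m2 = s2 / n2;
            System.out.println("m1=" + m1 + ", m2=" + m2);
        }
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        double[] means = run(data, 3, 4);
        System.out.println("Final means: m1=" + means[0] + ", m2=" + means[1]);
    }
}
```

Running `main` prints the same sequence of means as the hand trace: 2.5 and 16, then 3 and 18, 4.75 and 19.6, and finally 7 and 25.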

PART B

(PART B : TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical slot. The soft copy must be uploaded on Blackboard, or emailed to the concerned lab-in-charge faculty at the end of the practical in case there is no Blackboard access available.)

Roll No. E059

Class : B.Tech CS

Date of Experiment:

Grade :

Date of Grading:

Batch : E3

Date of Submission:

Time of Submission:

(Paste your c/c++/java code completed during the 2 hours of practical in the lab here)

/**
 * Two-dimensional K-means clustering (Experiment No. 05).
 *
 * @author mpstme.student
 */

//package means;

import java.util.ArrayList;
import java.util.Scanner;

public class KMeans {

    public static int NUM_CLUSTERS;
    public static int TOTAL_DATA;

    private static ArrayList<Data> dataSet = new ArrayList<>();
    private static ArrayList<Centroid> centroids = new ArrayList<>();

    // Pick NUM_CLUSTERS distinct samples at random as the initial centroids.
    private static void initialize(double SAMPLES[][]) {
        ArrayList<Integer> temp = new ArrayList<>();
        for (int i = 0; i < NUM_CLUSTERS; i++) {
            int t = (int) Math.floor(Math.random() * TOTAL_DATA);
            if (!temp.contains(t)) {
                temp.add(t);
                centroids.add(new Centroid(SAMPLES[t][0], SAMPLES[t][1]));
            } else {
                i--;    // index already used: draw again
            }
        }
        System.out.println("Centroids initialized at:");
        for (int i = 0; i < NUM_CLUSTERS; i++) {
            System.out.println(" (" + centroids.get(i).X() + ", " + centroids.get(i).Y() + ")");
        }
    }

    private static void kMeanCluster(double SAMPLES[][]) {
        final double bigNumber = Math.pow(10, 10);   // larger than any distance in the data
        double minimum = bigNumber;
        double distance = 0.0;
        int sampleNumber = 0;
        int cluster = 0;
        boolean isStillMoving = true;
        Data newData = null;

        // Phase 1: add the samples one at a time, assign each to its nearest
        // centroid, and recompute the centroids after every addition.
        while (dataSet.size() < TOTAL_DATA) {
            newData = new Data(SAMPLES[sampleNumber][0], SAMPLES[sampleNumber][1]);
            dataSet.add(newData);
            minimum = bigNumber;
            for (int i = 0; i < NUM_CLUSTERS; i++) {
                distance = dist(newData, centroids.get(i));
                if (distance < minimum) {
                    minimum = distance;
                    cluster = i;
                }
            }
            newData.cluster(cluster);
            updateCentroids();
            sampleNumber++;
        }

        // Phase 2: recompute centroids and reassign every point until no
        // point changes cluster.
        while (isStillMoving) {
            updateCentroids();
            isStillMoving = false;
            for (int i = 0; i < dataSet.size(); i++) {
                Data tempData = dataSet.get(i);
                minimum = bigNumber;
                for (int j = 0; j < NUM_CLUSTERS; j++) {
                    distance = dist(tempData, centroids.get(j));
                    if (distance < minimum) {
                        minimum = distance;
                        cluster = j;
                    }
                }
                // Compare before reassigning; otherwise the test is always false.
                if (tempData.cluster() != cluster) {
                    tempData.cluster(cluster);
                    isStillMoving = true;
                }
            }
        }
    }

    // Move each centroid to the mean of the points currently assigned to it.
    private static void updateCentroids() {
        for (int i = 0; i < NUM_CLUSTERS; i++) {
            double totalX = 0.0;
            double totalY = 0.0;
            int totalInCluster = 0;
            for (Data d : dataSet) {
                if (d.cluster() == i) {
                    totalX += d.X();
                    totalY += d.Y();
                    totalInCluster++;
                }
            }
            if (totalInCluster > 0) {   // an empty cluster keeps its old centroid
                centroids.get(i).X(totalX / totalInCluster);
                centroids.get(i).Y(totalY / totalInCluster);
            }
        }
    }

    // Euclidean distance between a data point and a centroid.
    private static double dist(Data d, Centroid c) {
        return Math.sqrt(Math.pow(c.Y() - d.Y(), 2) + Math.pow(c.X() - d.X(), 2));
    }

    // A sample point and the index of the cluster it is assigned to.
    private static class Data {
        private double mX = 0;
        private double mY = 0;
        private int mCluster = 0;

        public Data() {
        }

        public Data(double x, double y) {
            this.X(x);
            this.Y(y);
        }

        public void X(double x) { this.mX = x; }
        public double X() { return this.mX; }
        public void Y(double y) { this.mY = y; }
        public double Y() { return this.mY; }
        public void cluster(int clusterNumber) { this.mCluster = clusterNumber; }
        public int cluster() { return this.mCluster; }
    }

    // The current position of one cluster centre.
    private static class Centroid {
        private double mX = 0.0;
        private double mY = 0.0;

        public Centroid() {
        }

        public Centroid(double newX, double newY) {
            this.mX = newX;
            this.mY = newY;
        }

        public void X(double newX) { this.mX = newX; }
        public double X() { return this.mX; }
        public void Y(double newY) { this.mY = newY; }
        public double Y() { return this.mY; }
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter total no of clusters");
        NUM_CLUSTERS = sc.nextInt();
        do {
            System.out.println("Total No of data");
            TOTAL_DATA = sc.nextInt();
            if (TOTAL_DATA < NUM_CLUSTERS) {
                System.out.println("Number of data should be at least equal to number of clusters");
            }
        } while (TOTAL_DATA < NUM_CLUSTERS);

        // Read TOTAL_DATA (x, y) pairs.
        double SAMPLES[][] = new double[TOTAL_DATA][2];
        System.out.println("Enter sample values");
        for (int i = 0; i < TOTAL_DATA; i++) {
            for (int j = 0; j < 2; j++) {
                SAMPLES[i][j] = sc.nextDouble();
            }
        }

        initialize(SAMPLES);
        kMeanCluster(SAMPLES);

        for (int i = 0; i < NUM_CLUSTERS; i++) {
            System.out.println("Cluster " + i + " includes:");
            for (int j = 0; j < TOTAL_DATA; j++) {
                if (dataSet.get(j).cluster() == i) {
                    System.out.println(" (" + dataSet.get(j).X() + ", " + dataSet.get(j).Y() + ")");
                }
            }
            System.out.println();
        }

        System.out.println("Centroids finalized at:");
        for (int i = 0; i < NUM_CLUSTERS; i++) {
            System.out.println(" (" + centroids.get(i).X() + ", " + centroids.get(i).Y() + ")");
        }
        System.out.print("\n");
    }
}

(Paste your program input and output in the following format. If there is an error, paste the specific error in the output part. In case of an error, with due permission of the faculty, an extension can be given to submit the error-free code with output in due course of time. Students will be graded accordingly.)

Input Data:

debug:

Enter total no of clusters

5

Total No of data

7

Enter sample values

1 2

3 4

5 6

7 8

8 9

8 8

9 9

Centroids initialized at:

(3.0, 4.0)

(1.0, 2.0)

(5.0, 6.0)

(8.0, 9.0)

(9.0, 9.0)

Cluster 0 includes:

(3.0, 4.0)

Cluster 1 includes:

(1.0, 2.0)

Cluster 2 includes:

(5.0, 6.0)

Cluster 3 includes:

(7.0, 8.0)

(8.0, 8.0)

Cluster 4 includes:

(8.0, 9.0)

(9.0, 9.0)

Centroids finalized at:

(3.0, 4.0)

(1.0, 2.0)

(5.0, 6.0)

(7.5, 8.0)

(8.5, 9.0)

BUILD SUCCESSFUL (total time: 55 seconds)

Output Data:

(Students are expected to comment on the output obtained with clear observations and learning for each task/sub-part assigned.)

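After successful completion of this experiment, we learned to implement the k-means method for clustering the given objects using centroids. We observe that each object is placed in the cluster whose centroid is nearest to it: the initial centroids are chosen at random from the samples, and each centroid is then recomputed as the mean of the objects assigned to it until no object changes cluster.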

B.4 Conclusion:

(Students must write the conclusions based on their learning)

We have thus implemented the K-means method of clustering.

Q1. Summarize the approaches that are used for clustering, with their advantages and limitations.

1) Partitioning algorithms: Construct various partitions and then evaluate them by some criterion.

Advantages:

- Relatively efficient

- Always terminates, although usually at a local rather than a global optimum

Disadvantages:

- Need to specify the number of clusters in advance

- Applicable only when a mean is defined

2) Hierarchical algorithms: Create a hierarchical decomposition of the set of data using some criterion.

Advantages:

- Produces a more informative structure (a hierarchy of clusters)

- Does not require the number of clusters to be specified in advance

Disadvantages:

- Selection of merge points is critical.

- Split decisions, if not well chosen, may lead to low-quality clusters.

3) Density-based algorithms: Based on connectivity and density functions.

Advantage:

- Clusters are grown by connecting points that lie within certain distance thresholds, so they can take arbitrary shapes.

Disadvantage:

- These methods expect some kind of density drop to detect cluster borders.
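DBSCAN is a well-known density-based method. The sketch below is a simplified, hypothetical Java version for 2-D points, not a reference implementation: `eps` (neighbourhood radius) and `minPts` (minimum number of neighbours, counting the point itself, for a core point) are illustrative parameters, and the quadratic neighbour search is chosen for clarity rather than speed. A point that is neither a core point nor within `eps` of one is labelled noise (-1).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified DBSCAN sketch for 2-D points (hypothetical class, O(n^2) neighbour search).
class DensitySketch {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    // Indices of all points within eps of point p, including p itself.
    static List<Integer> neighbours(double[][] pts, int p, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < pts.length; i++) {
            double dx = pts[i][0] - pts[p][0];
            double dy = pts[i][1] - pts[p][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) {
                out.add(i);
            }
        }
        return out;
    }

    // Returns a cluster id (0, 1, ...) for each point, or NOISE.
    static int[] dbscan(double[][] pts, double eps, int minPts) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNVISITED);
        int clusterId = 0;
        for (int p = 0; p < pts.length; p++) {
            if (label[p] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbours(pts, p, eps);
            if (seeds.size() < minPts) {
                label[p] = NOISE;              // may later be claimed as a border point
                continue;
            }
            label[p] = clusterId;              // p is a core point: start a new cluster
            ArrayDeque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (label[q] == NOISE) {
                    label[q] = clusterId;      // border point: joins but does not expand
                }
                if (label[q] != UNVISITED) {
                    continue;
                }
                label[q] = clusterId;
                List<Integer> qNeighbours = neighbours(pts, q, eps);
                if (qNeighbours.size() >= minPts) {
                    queue.addAll(qNeighbours); // q is also a core point: keep growing
                }
            }
            clusterId++;
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 2}, {3, 4}, {5, 6}, {7, 8}, {8, 9}, {8, 8}, {9, 9}};
        System.out.println(Arrays.toString(dbscan(pts, 1.5, 3)));
    }
}
```

With the seven points from the sample run of this experiment, `eps = 1.5` and `minPts = 3`, the points (7, 8), (8, 9), (8, 8) and (9, 9) form one cluster, while the sparse points (1, 2), (3, 4) and (5, 6) are labelled noise. Unlike k-means, the number of clusters is not given in advance; it follows from the density parameters.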

Q2. Explain hierarchical algorithms for clustering with an example.

Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters.

Strategies for hierarchical clustering generally fall into two types:

Agglomerative: Start with the points as individual clusters. At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left.

Divisive: Start with one, all-inclusive cluster. At each step, split a cluster until each cluster contains a single point (or there are k clusters).
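As an example of the agglomerative strategy, the sketch below applies single-linkage merging (the distance between two clusters is taken as the distance between their closest members) to the one-dimensional data set from Part A. Single linkage, the class name `Agglomerative` and the rule of stopping at k clusters are choices made for this illustration; other linkage criteria, such as complete or average linkage, are equally common and can give different hierarchies.

```java
import java.util.ArrayList;
import java.util.List;

// Agglomerative (bottom-up) clustering of 1-D values with single linkage.
// Hypothetical class for illustration; O(n^3), which is fine for tiny inputs.
class Agglomerative {

    // Single-link distance: gap between the closest members of two clusters.
    static double singleLink(List<Double> a, List<Double> b) {
        double best = Double.MAX_VALUE;
        for (double x : a) {
            for (double y : b) {
                best = Math.min(best, Math.abs(x - y));
            }
        }
        return best;
    }

    // Start with one cluster per value; merge the closest pair until k remain.
    static List<List<Double>> cluster(double[] data, int k) {
        List<List<Double>> clusters = new ArrayList<>();
        for (double x : data) {
            List<Double> single = new ArrayList<>();
            single.add(x);
            clusters.add(single);
        }
        while (clusters.size() > k) {
            int bestA = 0, bestB = 1;
            double best = Double.MAX_VALUE;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = singleLink(clusters.get(a), clusters.get(b));
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            System.out.println("merge " + clusters.get(bestA) + " and "
                    + clusters.get(bestB) + " (distance " + best + ")");
            // bestB > bestA, so removing bestB does not shift bestA.
            List<Double> absorbed = clusters.remove(bestB);
            clusters.get(bestA).addAll(absorbed);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        System.out.println(cluster(data, 2));
    }
}
```

With k = 2 the result is {2, 3, 4, 10, 11, 12} and {20, 25, 30}: the largest gap in the sorted data (8, between 12 and 20) is the only one never bridged. The printed merge sequence is the bottom part of the hierarchy (dendrogram); continuing to k = 1 would add the final merge.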

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to

many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas

such as medicine, where DNA microarray technology can produce a large number of measurements at

once, and the clustering of text documents, where, if a word-frequency vector is used, the number of

dimensions equals the size of the vocabulary.

BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large datasets. An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric data points in an attempt to produce the best-quality clustering for a given set of resources (memory and time constraints). In most cases, BIRCH requires only a single scan of the database. BIRCH is also claimed to be the "first clustering algorithm proposed in the database area to handle 'noise' (data points that are not part of the underlying pattern) effectively", having been published two months before DBSCAN.
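BIRCH's incremental, single-scan behaviour rests on summarising each sub-cluster by a clustering feature CF = (N, LS, SS): the number of points, their linear sum and their sum of squares. A CF can be updated one point at a time and two CFs can simply be added when sub-clusters merge, so earlier points never need to be revisited. The sketch below shows only this summary for 2-D points; the class and method names are hypothetical, and the CF tree, its thresholds and the later phases of BIRCH are omitted.

```java
// BIRCH-style clustering feature CF = (N, LS, SS) for 2-D points.
// Hypothetical class for illustration; the CF tree itself is not shown.
class ClusteringFeature {
    long n;            // N: number of points summarised
    double lsX, lsY;   // LS: linear sum of the coordinates
    double ss;         // SS: sum of squared norms, x^2 + y^2

    // Absorb one point in O(1); the point itself need not be kept.
    void add(double x, double y) {
        n++;
        lsX += x;
        lsY += y;
        ss += x * x + y * y;
    }

    // CFs are additive, so merging two sub-clusters is a field-wise sum.
    void merge(ClusteringFeature other) {
        n += other.n;
        lsX += other.lsX;
        lsY += other.lsY;
        ss += other.ss;
    }

    double centroidX() { return lsX / n; }
    double centroidY() { return lsY / n; }

    // Radius: root-mean-square distance of the members from the centroid,
    // R^2 = SS/N - |LS/N|^2 (clamped at 0 against rounding error).
    double radius() {
        double cx = centroidX(), cy = centroidY();
        return Math.sqrt(Math.max(0.0, ss / n - (cx * cx + cy * cy)));
    }

    public static void main(String[] args) {
        ClusteringFeature cf = new ClusteringFeature();
        cf.add(7, 8);
        cf.add(8, 8);
        System.out.println("centroid (" + cf.centroidX() + ", " + cf.centroidY()
                + "), radius " + cf.radius());
    }
}
```

For instance, after adding (7, 8) and (8, 8) the centroid is (7.5, 8.0) and the radius is 0.5, computed from the three summary fields alone.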
