Intelligent Data Analysis and Probabilistic Inference
Data Mining Tutorial 3: Clustering and Association Rules
1. i.  Explain the operation of the k-means clustering algorithm using pseudo code.
   ii. Given the following eight points, and assuming initial cluster centroids given by A, B, C,
       and that a Euclidean distance function is used for measuring distance between points, use
       k-means to show the three clusters and calculate their new centroids after the second
       round of execution.

       ID  X  Y
       A   2  10
       B   2  5
       C   8  4
       D   5  8
       E   7  5
       F   6  4
       G   1  2
       H   4  9

2. i.  Explain the meaning of support and confidence in the context of association rule
       discovery algorithms, and explain how the a priori heuristic can be used to improve the
       efficiency of such algorithms.
   ii. Given the transactions described below, find all rules between single items that have
       support >= 60%. For each rule report both support and confidence.

       1:  (Beer)
       2:  (Cola, Beer)
       3:  (Cola, Beer)
       4:  (Nuts, Beer)
       5:  (Nuts, Cola, Beer)
       6:  (Nuts, Cola, Beer)
       7:  (Crisps, Nuts, Cola)
       8:  (Crisps, Nuts, Cola, Beer)
       9:  (Crisps, Nuts, Cola, Beer)
       10: (Crisps, Nuts, Cola, Beer)

yg@doc.ic.ac.uk, mmg@doc.ic.ac.uk                                        16th Dec 2003
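The pseudo code asked for in question 1.i can be sketched as a short Python implementation. This is a minimal sketch in plain Python (no libraries); the function and variable names are illustrative, not part of the tutorial. Run on the eight points above with A, B, C as initial centroids, it reproduces the clusters asked for in 1.ii.

```python
def euclidean(p, q):
    # Euclidean distance between two 2-D points.
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def k_means(points, centroids, max_iters=100):
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points.values():
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters
        ]
        if new_centroids == centroids:  # converged: assignments stable
            break
        centroids = new_centroids
    return clusters, centroids

points = {"A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
          "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9)}
clusters, centroids = k_means(points,
                              [points["A"], points["B"], points["C"]])
```

With these starting centroids the algorithm converges after the first update: the second assignment round leaves every point in the same cluster, so the centroids stop moving.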
 
3. a. Explain how hierarchical clustering algorithms work; make sure your answer describes what is
      meant by a linkage method and how it is used.
   b. Explain the advantages and disadvantages of hierarchical clustering compared to k-means
      clustering.

4. The following table shows the distance matrix between five genes:

          G1  G2  G3  G4  G5
      G1  0
      G2  9   0
      G3  3   7   0
      G4  6   5   9   0
      G5  11  10  2   8   0

   i.   Based on a complete linkage method, show the distance matrix between the first formed
        cluster and the other data points.
   ii.  Draw a dendrogram showing the full hierarchical clustering tree for the five points based
        on complete linkage.
   iii. Draw a dendrogram showing the full hierarchical tree for the five points based on single
        linkage.
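The mechanics of questions 3 and 4 can be sketched as a small agglomerative-clustering routine working directly on the gene distance matrix. All names here are illustrative. The linkage method is just the rule that derives a cluster-to-cluster distance from the point-to-point distances: passing `max` gives complete linkage, `min` gives single linkage.

```python
# Symmetric distance matrix from question 4, stored as unordered pairs.
D = {("G1", "G2"): 9, ("G1", "G3"): 3, ("G1", "G4"): 6, ("G1", "G5"): 11,
     ("G2", "G3"): 7, ("G2", "G4"): 5, ("G2", "G5"): 10,
     ("G3", "G4"): 9, ("G3", "G5"): 2, ("G4", "G5"): 8}

def d(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def cluster_dist(ca, cb, linkage):
    # linkage = max -> complete linkage; linkage = min -> single linkage.
    return linkage(d(a, b) for a in ca for b in cb)

def agglomerate(items, linkage):
    """Repeatedly merge the two closest clusters; return the merge history
    as (merged cluster, merge distance) pairs, i.e. the dendrogram levels."""
    clusters = [frozenset([g]) for g in items]
    merges = []
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]],
                                               clusters[ij[1]], linkage))
        dist = cluster_dist(clusters[i], clusters[j], linkage)
        merged = clusters[i] | clusters[j]
        merges.append((set(merged), dist))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

genes = ["G1", "G2", "G3", "G4", "G5"]
complete = agglomerate(genes, max)
single = agglomerate(genes, min)
```

Under both linkages the first merge is {G3, G5} at distance 2; with complete linkage its distances to the remaining points become G1: 11, G2: 10, G4: 9 (the answer to 4.i), and the merge distances in `complete` and `single` give the heights of the two dendrograms.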
 
Data Mining Tutorial 3: Answers
1. Clusters after 1st iteration:
   Cluster 1: A (2,10), D (5,8), H (4,9)
   Cluster 2: B (2,5), G (1,2)
   Cluster 3: C (8,4), E (7,5), F (6,4)

   Centroids after 1st iteration:
   Cluster 1 centroid: (3.66, 9)
   Cluster 2 centroid: (1.5, 3.5)
   Cluster 3 centroid: (7, 4.33)

   Clusters after 2nd iteration (no change):
   Cluster 1: A (2,10), D (5,8), H (4,9)
   Cluster 2: B (2,5), G (1,2)
   Cluster 3: C (8,4), E (7,5), F (6,4)

   Centroids after 2nd iteration (no change):
   Cluster 1 centroid: (3.66, 9)
   Cluster 2 centroid: (1.5, 3.5)
   Cluster 3 centroid: (7, 4.33)
2. Initial supports:
   Beer:   support = 9/10
   Cola:   support = 8/10
   Nuts:   support = 7/10
   Crisps: support = 4/10  (drop Crisps: below the 60% threshold)

   Frequent pairs:
   Beer, Cola: support = 7/10
   Beer, Nuts: support = 6/10
   Cola, Nuts: support = 6/10

   Rules:
   Beer -> Cola (support = 70%, confidence = 7/9 = 77.8%)
   Cola -> Beer (support = 70%, confidence = 7/8 = 87.5%)
   Beer -> Nuts (support = 60%, confidence = 6/9 = 66.7%)
   Nuts -> Beer (support = 60%, confidence = 6/7 = 85.7%)
   Cola -> Nuts (support = 60%, confidence = 6/8 = 75%)
   Nuts -> Cola (support = 60%, confidence = 6/7 = 85.7%)
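The figures for question 2 can be checked with a brute-force Python sketch (names are illustrative): compute the support of every single item, drop infrequent items before forming pairs (the a priori pruning step), then report support and confidence for both directions of each frequent pair.

```python
from itertools import combinations

transactions = [
    {"Beer"}, {"Cola", "Beer"}, {"Cola", "Beer"}, {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"}, {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"}, {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"}, {"Crisps", "Nuts", "Cola", "Beer"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions) / n

items = {i for t in transactions for i in t}
# A priori pruning: a pair can only be frequent if both items are
# frequent on their own, so infrequent items are dropped first.
frequent = [i for i in items if support({i}) >= 0.6]

rules = {}  # (antecedent, consequent) -> (support, confidence)
for a, b in combinations(sorted(frequent), 2):
    s = support({a, b})
    if s >= 0.6:
        rules[(a, b)] = (s, s / support({a}))  # rule a -> b
        rules[(b, a)] = (s, s / support({b}))  # rule b -> a
```

Crisps (support 4/10) is pruned before any pair is counted, so only the six rules listed above survive, matching the answer.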